| $a, b, c, \alpha, \beta, \gamma$ | Scalars are lowercase |
| $\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z}$ | Vectors are bold lowercase |
| $\boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}$ | Matrices are bold uppercase |
| $\boldsymbol{x}^\top, \boldsymbol{A}^\top$ | Transpose of a vector or matrix |
| $\boldsymbol{A}^{-1}$ | Inverse of a matrix |
| $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$ | Inner product of $\boldsymbol{x}$ and $\boldsymbol{y}$ |
| $\boldsymbol{x}^\top \boldsymbol{y}$ | Dot product of $\boldsymbol{x}$ and $\boldsymbol{y}$ |
| $\mathbb{Z}, \, \mathbb{N}, \, \mathbb{R}$ | Integers, natural numbers, and real numbers, respectively |
| $\mathbb{R}^n$ | $n$-dimensional vector space of real numbers |
| $a := b$ | $a$ is defined as $b$ |
| $a \approx b$ | $a$ is approximately equal to $b$ |
| $a \simeq b$ | $a$ is approximately equivalent to $b$ (stronger than $\approx$) |
| $a \in \mathcal{A}$ | $a$ is an element of set $\mathcal{A}$ |
| $A \subseteq B$ | $A$ is a subset of $B$ (possibly equal to $B$) |
| $A \setminus B$ | Set difference: elements in $A$ that are not in $B$ |
| $\forall x \in \mathbb{R}$ | For all $x$ in the set of real numbers |
| $\boldsymbol{I}_m$ | Identity matrix of size $m \times m$ |
| $\boldsymbol{0}_{m,n}$ | Matrix of zeros of size $m \times n$ |
| $\boldsymbol{1}_{m,n}$ | Matrix of ones of size $m \times n$ |
| $\text{rk}(\boldsymbol{A})$ | Rank of matrix $\boldsymbol{A}$ |
| $\text{Im}(\Phi)$ | Image of linear mapping $\Phi$ |
| $\text{ker}(\Phi)$ | Kernel (null space) of a linear mapping $\Phi$ |
| $\text{span}[\boldsymbol{b}_1]$ | Span (generating set) of $\boldsymbol{b}_1$ |
| $\text{det}(\boldsymbol{A})$ | Determinant of $\boldsymbol{A}$ |
| $\lVert \cdot \rVert$ | Norm; Euclidean unless otherwise specified |
| $\boldsymbol{x} \perp \boldsymbol{y}$ | Vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are orthogonal |
| $V$ | Vector space |
| $\boldsymbol{\theta}$ | Parameters |
| $\nabla$ | Gradient (nabla) |
| $\sum_{n=1}^N x_n$ | Sum of the $x_n$: $x_1 + \cdots + x_N$ |
| $\prod_{n=1}^N x_n$ | Product of the $x_n$: $x_1 \times \cdots \times x_N$ |
| $\frac{\partial f}{\partial x}$ | Partial derivative of $f$ with respect to $x$ |
| $\frac{df}{dx}$ | Total derivative of $f$ with respect to $x$ |
| $P(A)$ | Probability of event $A$ |
| $P(A \mid B)$ | Conditional probability of $A$ given $B$ |
| $\mathbb{E}[X]$ | Expected value of random variable $X$ |
| $\sigma$ | Standard deviation |
| $\mathcal{N}(\mu, \sigma^2)$ | Normal distribution with mean $\mu$ and variance $\sigma^2$ |
| $\arg\max_x f(x)$ | Value of $x$ that maximizes $f(x)$ |
| $\arg\min_x f(x)$ | Value of $x$ that minimizes $f(x)$ |
| $L(y, \hat{y})$ | Loss function comparing true value $y$ and predicted value $\hat{y}$ |
| $\hat{y}$ | Predicted value of $y$ ("y hat") |
| $\boldsymbol{w}$ | Weights |
| $b$ | Bias |
| $\eta$ | Learning rate of the optimizer (eta) |
| $g(x)$ | Activation function |
| $x \sim p(x)$ | Random variable $x$ is distributed according to the probability distribution $p(x)$ |
| $\mathbb{E}_{p_{x}}[\,\cdot\,]$ | Expected value where $x$ is sampled from the data distribution |
| $\mathbf{X} = \{x^{(i)}\}_{i=1}^N$ | Dataset of $N$ samples, where $x^{(i)}$ is the $i$-th sample |
| $\odot$ | Element-wise product |
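
A few of the symbols above map directly onto short computations. The following is a minimal sketch in plain Python, not a reference implementation: the vectors `x` and `y`, the functions `f` and `g`, and the helper names `dot`, `norm`, `hadamard`, `argmax`, and `argmin` are all hypothetical and chosen only for illustration. It treats vectors as lists of floats and restricts $\arg\max$ and $\arg\min$ to a finite candidate set.

```python
import math

# Hypothetical example vectors (not taken from the table above).
x = [1.0, 2.0, 2.0]
y = [2.0, -1.0, 0.0]


def dot(u, v):
    """Dot product u^T v: sum of pairwise products."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Euclidean norm ||u||, the default norm in the table."""
    return math.sqrt(dot(u, u))


def hadamard(u, v):
    """Element-wise product u ⊙ v."""
    return [ui * vi for ui, vi in zip(u, v)]


def argmax(func, candidates):
    """Value in a finite candidate set that maximizes func."""
    return max(candidates, key=func)


def argmin(func, candidates):
    """Value in a finite candidate set that minimizes func."""
    return min(candidates, key=func)


def f(t):
    # Illustrative concave function with its maximum at t = 2.
    return -(t - 2) ** 2


def g(t):
    # Illustrative convex function with its minimum at t = 3.
    return (t - 3) ** 2


print(dot(x, y))              # 0.0, so x and y are orthogonal
print(norm(x))                # 3.0
print(hadamard(x, y))         # [2.0, -2.0, 0.0]
print(sum(x), math.prod(x))   # 5.0 4.0 (sum and product of the entries)
print(argmax(f, range(5)))    # 2
print(argmin(g, range(5)))    # 3
```

In practice an array library would replace these helpers; the point here is only to connect each symbol to one concrete operation.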