$a, b, c, \alpha, \beta, \gamma$ |
Scalars are lowercase |
$\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z}$ |
Vectors are bold lowercase |
$\boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}$ |
Matrices are bold uppercase |
$\boldsymbol{x}^\top, \boldsymbol{A}^\top$ |
Transpose of a vector or matrix |
$\boldsymbol{A}^{-1}$ |
Inverse of a matrix |
$\langle \boldsymbol{x}, \boldsymbol{y} \rangle$ |
Inner product of $\boldsymbol{x}$ and $\boldsymbol{y}$ |
$\boldsymbol{x}^\top \boldsymbol{y}$ |
Dot product of $\boldsymbol{x}$ and $\boldsymbol{y}$ |
$\mathbb{Z}, \, \mathbb{N}, \, \mathbb{R}$ |
Integers, natural numbers, and real numbers, respectively |
$\mathbb{R}^n$ |
$n$-dimensional vector space of real numbers |
$a := b$ |
$a$ is defined as $b$ |
$a \approx b$ |
$a$ is approximately equal to $b$ |
$a \simeq b$ |
$a$ is approximately equivalent to $b$ (stronger than $\approx$) |
$a \in \mathcal{A}$ |
$a$ is an element of set $\mathcal{A}$ |
$A \subseteq B$ |
$A$ is a subset of $B$ (or equal to $B$) |
$A \setminus B$ |
Set difference: elements in $A$ that are not in $B$ |
$\forall x \in \mathbb{R}$ |
For all $x$ in the set of real numbers |
$\boldsymbol{I}_m$ |
Identity matrix of size $m \times m$ |
$\boldsymbol{0}_{m,n}$ |
Matrix of zeros of size $m \times n$ |
$\boldsymbol{1}_{m,n}$ |
Matrix of ones of size $m \times n$ |
$\text{rk}(\boldsymbol{A})$ |
Rank of matrix $\boldsymbol{A}$ |
$\text{Im}(\Phi)$ |
Image of linear mapping $\Phi$ |
$\text{ker}(\Phi)$ |
Kernel (null space) of a linear mapping $\Phi$ |
$\text{span}[\boldsymbol{b}_1]$ |
Span of $\boldsymbol{b}_1$: the set of all scalar multiples of $\boldsymbol{b}_1$, for which $\{\boldsymbol{b}_1\}$ is a generating set |
$\text{det}(\boldsymbol{A})$ |
Determinant of $\boldsymbol{A}$ |
$\|\cdot\|$ |
Norm; Euclidean unless specified otherwise |
$\boldsymbol{x} \perp \boldsymbol{y}$ |
Vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are orthogonal |
$V$ |
Vector space |
$\boldsymbol{\theta}$ |
Parameters |
$\nabla$ |
Gradient (Nabla) |
$\sum_{n=1}^N x_n$ |
Sum of the $x_n$: $x_1 + \ldots + x_N$ |
$\prod_{n=1}^N x_n$ |
Product of the $x_n$: $x_1 \times \ldots \times x_N$ |
$\frac{\partial f}{\partial x}$ |
Partial derivative of $f$ with respect to $x$ |
$\frac{df}{dx}$ |
Total derivative of $f$ with respect to $x$ |
$P(A)$ |
Probability of event $A$ |
$P(A \mid B)$ |
Conditional probability of $A$ given $B$ |
$\mathbb{E}[X]$ |
Expected value of random variable $X$ |
$\sigma$ |
Standard deviation |
$\mathcal{N}(\mu, \sigma^2)$ |
Normal distribution with mean $\mu$ and variance $\sigma^2$ |
$\arg\max_x f(x)$ |
Value of $x$ that maximizes $f(x)$ |
$\arg\min_x f(x)$ |
Value of $x$ that minimizes $f(x)$ |
$L(y, \hat{y})$ |
Loss function comparing true value $y$ and predicted value $\hat{y}$ |
$\hat{y}$ |
Predicted value of $y$ (y hat) |
$\boldsymbol{w}$ |
Weights |
$b$ |
Bias |
$\eta$ |
Learning rate of the optimizer (eta) |
$g(x)$ |
Activation function |
$x \sim p(x)$ |
Random variable $x$ is distributed according to a probability distribution $p(x)$ |
$\mathbb{E}_{p_x}[\,\cdot\,]$ |
Expected value of the argument, with $x$ sampled from the data distribution $p_x$ |
$\mathbf{X} = \{x^{(i)}\}_{i=1}^N$ |
Dataset of $N$ samples, where $x^{(i)}$ is the $i$-th sample |
$\odot$ |
Element-wise (Hadamard) product |
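
A few of the symbols above can be made concrete with a short sketch. The Python below uses only the standard library; the vectors `x` and `y`, the function `f`, the candidate set, and the helper names (`dot`, `norm`, `hadamard`, `argmax`, `argmin`) are illustrative assumptions, not part of the notation itself. Note that `argmax` and `argmin` here search a finite candidate list, which is a simplification of $\arg\max_x f(x)$ over a continuous domain.

```python
import math

# Hypothetical example vectors, chosen only for illustration.
x = [1.0, 2.0, 2.0]
y = [2.0, 0.0, -1.0]


def dot(u, v):
    """Dot product u^T v: sum of the element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm ||u|| = sqrt(u^T u)."""
    return math.sqrt(dot(u, u))


def hadamard(u, v):
    """Element-wise (Hadamard) product of u and v."""
    return [a * b for a, b in zip(u, v)]


def argmax(f, candidates):
    """Candidate that maximizes f (finite search, an assumption of this sketch)."""
    return max(candidates, key=f)


def argmin(f, candidates):
    """Candidate that minimizes f (finite search, an assumption of this sketch)."""
    return min(candidates, key=f)


total = sum(x)                  # sum_{n=1}^N x_n
product = math.prod(x)          # prod_{n=1}^N x_n
orthogonal = dot(x, y) == 0.0   # x is orthogonal to y when x^T y = 0


def f(t):
    """Hypothetical objective: a downward parabola with its peak at t = 1."""
    return -(t - 1) ** 2


candidates = [-2, -1, 0, 1, 2]
best = argmax(f, candidates)    # candidate with the largest f
worst = argmin(f, candidates)   # candidate with the smallest f
```

With these inputs, `dot(x, y)` is zero, so `x` and `y` are orthogonal in the sense of $\boldsymbol{x} \perp \boldsymbol{y}$. For real data, an exact comparison with `0.0` is fragile under floating-point rounding, and a tolerance check such as `math.isclose(dot(x, y), 0.0, abs_tol=1e-9)` is usually the safer choice.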