
Math Notation Cheat Sheet

09/08/2024

Commonly used mathematical notation in the ML ...

Math Notation	Typical Meaning
$a, b, c, \alpha, \beta, \gamma$	Scalars are lowercase
$\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z}$	Vectors are bold lowercase
$\boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}$	Matrices are bold uppercase
$\boldsymbol{x}^\top, \boldsymbol{A}^\top$	Transpose of a vector or matrix
$\boldsymbol{A}^{-1}$	Inverse of a matrix
$\langle \boldsymbol{x}, \boldsymbol{y} \rangle$	Inner product of $\boldsymbol{x}$ and $\boldsymbol{y}$
$\boldsymbol{x}^\top \boldsymbol{y}$	Dot product of $\boldsymbol{x}$ and $\boldsymbol{y}$
$\mathbb{Z}, \, \mathbb{N}, \, \mathbb{R}$	Integers, natural numbers and real numbers respectively
$\mathbb{R}^n$	$n$-dimensional vector space of real numbers
$a := b$	$a$ is defined as $b$
$a \approx b$	$a$ is approximately equal to $b$
$a \simeq b$	$a$ is asymptotically equal (or equivalent) to $b$; the exact meaning varies by author
$a \in \mathcal{A}$	$a$ is an element of set $\mathcal{A}$
$A \subseteq B$	$A$ is a subset of $B$ (or equal to $B$)
$A \setminus B$	Set difference: elements in $A$ that are not in $B$
$\forall x \in \mathbb{R}$	For all $x$ in the set of real numbers
$\boldsymbol{I}_m$	Identity matrix of size $m \times m$
$\boldsymbol{0}_{m,n}$	Matrix of zeros of size $m \times n$
$\boldsymbol{1}_{m,n}$	Matrix of ones of size $m \times n$
$\text{rk}(\boldsymbol{A})$	Rank of matrix $\boldsymbol{A}$
$\text{Im}(\Phi)$	Image of linear mapping $\Phi$
$\text{ker}(\Phi)$	Kernel (null space) of a linear mapping $\Phi$
$\text{span}[\boldsymbol{b}_1]$	Span (generating set) of $\boldsymbol{b}_1$
$\text{det}(\boldsymbol{A})$	Determinant of $\boldsymbol{A}$
$\lVert \cdot \rVert$	Norm; Euclidean unless otherwise specified
$\boldsymbol{x} \perp \boldsymbol{y}$	Vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are orthogonal
$V$	Vector space
$\boldsymbol{\theta}$	Parameters
$\nabla$	Gradient (Nabla)
$\sum_{n=1}^N x_n$	Sum of the $x_n$: $x_1 + \ldots + x_N$
$\prod_{n=1}^N x_n$	Product of the $x_n$: $x_1 \times \ldots \times x_N$
$\frac{\partial f}{\partial x}$	Partial derivative of $f$ with respect to $x$
$\frac{df}{dx}$	Total derivative of $f$ with respect to $x$
$P(A)$	Probability of event $A$
$P(A \mid B)$	Conditional probability of $A$ given $B$
$\mathbb{E}[X]$	Expected value of random variable $X$
$\sigma$	Standard deviation
$\mathcal{N}(\mu, \sigma^2)$	Normal distribution with mean $\mu$ and variance $\sigma^2$
$\arg\max_x f(x)$	Value of $x$ that maximizes $f(x)$
$\arg\min_x f(x)$	Value of $x$ that minimizes $f(x)$
$L(y, \hat{y})$	Loss function comparing true value $y$ and predicted value $\hat{y}$
$\hat{y}$	Predicted value of $y$ (y hat)
$\boldsymbol{w}$	Weights
$b$	Bias
$\eta$	Learning rate in optimizer (Eta)
$g(x)$	Activation function
$x \sim p(x)$	Random variable $x$ is distributed according to a probability distribution $p(x)$
$\mathbb{E}_{p_x}[\,\cdot\,]$	Expected value where $x$ is sampled from the data distribution $p_x$
$\mathbf{X} = \{x^{(i)}\}_{i=1}^N$	Dataset of $N$ samples, where $x^{(i)}$ is the $i$-th sample
$\odot$	Element-wise product
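
To make a few of the rows above concrete, here is a minimal sketch in plain Python (standard library only) that evaluates several of the symbols on small example values. The vectors `x`, `y`, `u`, `v`, the function `f`, and the candidate grid are hypothetical values chosen for illustration; they are not part of the table.

```python
import math

# Hypothetical example vectors in R^3 (illustrative values only).
x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]

# Dot product x^T y: the sum of the element-wise products.
dot = sum(xi * yi for xi, yi in zip(x, y))

# Element-wise product x ⊙ y: multiply matching entries, keep a vector.
hadamard = [xi * yi for xi, yi in zip(x, y)]

# Euclidean norm ||x||: square root of the sum of squared entries.
norm_x = math.sqrt(sum(xi * xi for xi in x))

# Sum and product of the entries x_1, ..., x_N.
total = sum(x)
product = math.prod(x)

# arg max_x f(x), restricted to a small candidate grid.
# f is a hypothetical function with its maximum at x = 2.
def f(value):
    return -(value - 2.0) ** 2

candidates = [0.0, 1.0, 2.0, 3.0]
arg_max = max(candidates, key=f)

# Orthogonality: u ⊥ v holds when their dot product is zero.
u = [1.0, 0.0]
v = [0.0, 1.0]
orthogonal = sum(ui * vi for ui, vi in zip(u, v)) == 0.0
```

Note the difference between the dot product, which collapses two vectors into a scalar, and $\odot$, which returns a vector of the same length. In practice an array library would replace the list comprehensions, but the definitions are the same.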