Combined notes for EECS 501 & MECHENG 549

Probability Basics

Probability Space

• Notation: $(\Omega,\mathcal{F},\mathbb{P})$
• $\Omega$: Sample space
• $\mathcal{F}$: Event space. Require 3 axioms (to be σ-field).
• $\mathbb{P}$: Probability measure. Require 3 axioms.
• Product Space: Probability spaces can be combined using Cartesian product.
• Independence: $\mathbb{P}(A_{k_1}\cap A_{k_2}\cap …\cap A_{k_l})=\prod_{i=1}^l \mathbb{P}(A_i),\;\forall \{k_i\}_1^l\subset\{ 1..n\}$
• Conditional probability: $\mathbb{P}\left(A_i \middle| A_j\right)=\mathbb{P}(A_i\cap A_j)/\mathbb{P}(A_j)$
• Total probability: $\mathbb{P}(B)=\sum_{i=1}^n \mathbb{P}(B\cap A_i)=\sum_{i=1}^n \mathbb{P}\left(B\middle| A_i\right)\mathbb{P}(A_i)$, where $\{A_1,\cdots,A_n\}$ are disjoint and partition of $\Omega$.
• Bayes: $\mathbb{P}(A_j|B)=\frac{P(B|A_j)P(A_j)}{\sum^n_{i=1} P(B|A_i)P(A_i)}$
• Priori: $\mathbb{P}(B|A_j)$
• Posteriori: $\mathbb{P}(A_j|B)$

Random Variables

• Scalar Random Variable: $X: \Omega\rightarrow\mathbb{F}$
• Formal definition: $X: (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\Omega_X\subset\mathbb{F},\mathcal{F}_X\subset\left\{\omega\middle| \omega\subset\Omega_X\right\},\mathbb{P}_X:\mathcal{F}_X\rightarrow[0,1])$
• Cumulative Distribution Function (CDF): $F_X(x)=\mathbb{P}(X(\omega)\leqslant x)$
• Probability Mass Function (PMF): $p_X(x)=\mathbb{P}(\{x\})$
• Probability Density Function (PDF): $$f_X(x)=\mathbb{P}(x< X(\omega)\leqslant x+ \mathrm{d}x)=dF_X(x)/ \mathrm{d}x$$
• Vector Random Variable: $X: \Omega\rightarrow\mathbb{F}^n$
• Formal definition $X: (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\Omega_X\subset\mathbb{F}^n,\mathcal{F}_X\subset\left\{\omega\middle| \omega\subset\Omega_X\right\},\mathbb{P}_X:\mathcal{F}_X\rightarrow[0,1])$
• Cumulative Distribution Function (CDF): $F_X(x)=\mathbb{P}(\{X(\omega)\}_i\leqslant x_i), i=1\ldots n$
• Probability Mass Function (PMF): $p_X(x)=\mathbb{P}(\{x\})$
• Probability Density Function (PDF): $f_X(x)=\mathbb{P}(x_i< \{X(\omega)\}_i\leqslant x_i+ \mathrm{d}x_i)=\frac{\partial^n}{\partial x_1\partial x_2\ldots\partial x_n}F_X(x)$
• Common distributions
• Uniform: $$f(x)= \begin{cases} 1/(b-a), & a< x< b \\ 1, & \text{otherwise} \end{cases}$$
• Normal (multivariate):
$$f(x)=\frac{1}{\sqrt{(2\pi)^n \det(S)}}\exp\left\{-\frac{1}{2}(x-\mu)^\top S^{-1}(x-\mu)\right\}$$
• Rayleigh: $$f(x)=\frac{x}{\sigma^2}\exp\left\{-\frac{x^2}{2\sigma^2}\right\}H(x)$$ where $H(x)$ is Heaviside step function
• Exponential: $$f(x)=\frac{1}{\mu}\exp\left\{-\frac{x}{\mu}\right\}H(x)$$
• Laplacian: $$f(x)=\frac{c}{2}\exp\left\{-c|x|\right\}$$
• Marginalization

• Marginal distribution function: $F_{1:m}(x_{1:m})\equiv F\left(\begin{bmatrix}x_1&x_2&\ldots&x_m&\infty&\ldots&\infty\end{bmatrix}^\top\right)$
• Marginal density function: $f_{1:m}(x_{1:m})=\int^\infty_{-\infty}\cdots\int^\infty_{-\infty} f(x) \mathrm{d}x_{m+1}\cdots \mathrm{d}x_n$
• Conditional

• Conditional probability (continuous random variable): $$\mathbb{P}(X\in\mathcal{D}_ 1|X\in\mathcal{D}_ 2)=\frac{\mathbb{P}(X\in\mathcal{D}_ 1\cap\mathcal{D}_ 2)}{\mathbb{P}(X\in\mathcal{D}_ 2)}=\frac{\int_{\mathcal{D}_1\cap\mathcal{D}_2}f_X(x) \mathrm{d}x_1\ldots\mathrm{d}x_n}{\int_{\mathcal{D}_2}f_X(x) \mathrm{d}x_1\ldots \mathrm{d}x_n}$$
• Conditional distribution: $F_X(x|X\in\mathcal{D})=\mathbb{P}(X_i\leqslant x_i|X\in\mathcal{D})$
• Conditional density: $f_X(x|X\in\mathcal{D})=\frac{\partial^n}{\partial x_1\partial x_2\ldots\partial x_n}F_X(x|X\in\mathcal{D})$

Expectation & Moments

• Expectation: $\mathbb{E}\psi(X)=\int^\infty_\infty\cdots\int^\infty_\infty\psi(x)f_X(x) \mathrm{d}x_1\ldots \mathrm{d}x_n$
• Moment (p-th order): $\mathbb{E}[X^p]$
• Mean: $\mu_X=\mathbb{E}[X]$
• Central moment (p-th order): $\mathbb{E}[(X-\mu_X)^p]$
• Variance: $\sigma_X^2=\mathbb{E}[(X-\mu_X)^2]$
• Skewness: $\sigma_X^3=\mathbb{E}[(X-\mu_X)^3]$
• Kurtosis: $\sigma_X^4=\mathbb{E}[(X-\mu_X)^4]$
• Covariance: $\text{cov}(X_i,X_j)=\mathbb{E}[(X_i-\mu_i)(X_j-\mu_j)]$
• Covariance matrix: $S=\mathbb{E}\left[(X-\mu_X)(X-\mu_X)^\top\right]$
• Conditional expectation: $\mathbb{E}[Y|X=x]=\int^\infty_{-\infty}yf_{Y|X}(y|x)\mathrm{d}y$
• Centered variable: $\tilde{X}=X-\mu_X$

Transform methods

• Probability generating function(PGF): similiar to Z-tranform $$G_X(z)\equiv\mathbb{E}[z^X]=\sum_{x_i}z^{x_i}p_X(x_i)$$
• For $\mathcal{F}_X=\mathbb{N}$, we have $$\frac{\mathrm{d}^k}{\mathrm{d}z^k}G_X(z)\Biggr|_{z=1}=\mathbb{E}[X(X-1)\cdots(X-(k-1))]$$
• Moment generating function(MGF): similar to Laplace transform $$M_X(z)\equiv\mathbb{E}[e^{sX}]=\int^\infty_{-\infty}e^{sx}f_X(x)\mathrm{d}x$$
• Generally we have $$\frac{\mathrm{d}^k}{\mathrm{d}s^k}M_X(s)\Biggr|_{s=0}=\mathbb{E}[X^k]$$
• Characteristic function: similar to Fourier transform $$\phi_X(\omega)\equiv\mathbb{E}[e^{j\omega X}]=\int^\infty_{-\infty}e^{j\omega x}f_X(x)\mathrm{d}x$$
• Generally we have $$\frac{\mathrm{d}^k}{\mathrm{d}\omega^k}\phi_X(\omega)\Biggr|_{\omega=0}=j^k\mathbb{E}[X^k]$$
• Joint characteristic function: for vector case $X\in\mathbb{R}^n$, we define vector $u$ and $$\phi_X(u)\equiv\mathbb{E}[e^{ju^\top X}]$$
• Trivial usage: if $Y=X_1+X_2$, then $\phi_Y(u)=\phi_{X_1}(u)\phi_{X_2}(u)$

Famous Inequalities

• Cauchy-Schwarz Inequality: $S=\left(\mathbb{E}[(X_i-\mu_i)(X_j-\mu_j)]\right)^2\leqslant\left(\mathbb{E}[X_i-\mu_i]\right)^2\left(\mathbb{E}[X_j-\mu_j]\right)^2$
• Markov Inequality: $\mathbb{P}(X\geqslant a)\leqslant \mathbb{E}X/a,\;a>0$
• Chebychev Inequality: $\mathbb{P}(|X-\mu|\geqslant\delta)\leqslant\sigma^2/\delta^2$
• Chernoff bound: $\mathbb{P}(X\geqslant a)\leqslant \underset{s\geqslant 0}{\min}\,e^{-as}M_X(s)$
• Law of Large Numbers: let $X_i$ be samples drawn from $(\mathbb{R}^n,\mathcal{B}^n,\mathbb{P})$, and $\mathbb{P}$ is such that $X_k$ has mean $\mu$ and covariance $S$
• Weak version: if $Y_k=\frac{1}{k}\sum^k_{j=1}X_j$ then $$\underset{k\rightarrow\infty}{\lim} \mathbb{P}\left(\left\lVert Y_k-\mu\right\rVert>\epsilon\right)=0$$
• Strong version: if $Y_k=\frac{1}{k}\sum^k_{j=1}X_j$ then $$\underset{k\rightarrow\infty}{\lim} Y_k=\mu$$
• Central Limit Theorem: if $Y_k=\frac{1}{\sqrt{k}}\sum^k_{j=1}S^{-1/2}\left(X_j-\mu\right)$ then $$\underset{k\rightarrow\infty}{\lim} f_{Y_k}(y_k)=\mathcal{N}(0,I)$$

Uncertainty Propagation

Suppose $Y=\psi(X)$ or specifically $y=\psi(x_1)=\psi(x_2)=\cdots=\psi(x_K)$

• Scalar case: $$f_Y(y)=\sum^K_{k=1} f_X(\psi^{-1}_ k(y))\left| \frac{\partial\psi}{\partial x}\biggr|_{x=\psi^{-1}_k(y)} \right|^{-1}$$
• Vector case: $$f_Y(y)=\sum^K_{k=1} f_X(\psi^{-1}_ k(y))\left|\det(J)\right|^{-1},\;\text{Jacobian }J=\frac{\partial\psi}{\partial x}\biggr|_{x=\psi^{-1}_k(y)}$$
• Trivial Case — Summation: $Y=X_1+X_2$, then $f_Y=\int^\infty_{-\infty}f_{X_1}(x_1)f_{X_2}(y-x_1) \mathrm{d}x_1$

Discrete-Time Stochastic System

Stochastic Sequences

• Definition: Given $k\in\mathcal{K}\subseteq\mathbb{Z}$ a sequence of integers, $X(k,\omega): (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\mathbb{R}^n,\mathcal{B},\mathbb{P}_X)$ is a random/stochastic sequence.
• Uncertainties: Consider a casual system $F$ relates some scalar inputs $u(k)$ to output $x(k)$
• Epistemic/Model uncertainty: $X(k,\omega)=F(k,u(k),u(k-1),\ldots,\omega)$. (system is stochastic and input is deterministic).
• Aleatoric/Input uncertainty: $X(k,\omega)=f(k,U(k,\omega),u(k-1,\omega),\ldots)$ (system is deterministic and input is stochastic).
• Realization: An outcome $X(k,\omega)=x(k)$ is called a realization of stochastic sequence $X$
• Terminology and convention
• $X(k,\omega)$ is often written as $X(k)$ when there’s no ambiguity.
• $\mathcal{K}=\mathbb{Z}$ if not specified.
• Sequence over a set $\mathcal{K}_1\subseteq\mathcal{K}$ are denoted $X(\mathcal{K}_1)$.
• $X$ denotes $X(\mathcal{K})$ if not specified.
• Consecutive subsequence: $$X(k:l)=\{X(k),X(k+1),\ldots,X(l)\},\;x(k:l)=\{x(k),x(k+1),\ldots,x(l)\}$$
• Abbreviations:
• SS - stochastic sequence
• IID - independent indentically distributed

Probabilistic characterization

• Distribution and density: $$F_X\left(k:l;x(k:l)\right)\equiv\mathbb{P}((X_i(k)\leqslant x_i(k))\cap\cdots\cap(X_i(l)\leqslant x_i(l)),\;i=1\ldots n)$$ $$f_X\left(k:l;x(k:l)\right)\equiv \frac{\partial^{n(l-k+1)}}{\partial x_1(k)\cdots\partial x_n(l)}F_X(k:l;x(k:l))$$
Here $k:l$ actually denotes a set of consecutive integers, it can be also changed to ordinary sets $\{k,l\}$ or single scalar $k$.
• Ensemble Average: $\mathbb{E}[\psi(X(k))]$, doing summation over different realization at same time $k$
• Mean: $\mu_X(k)\equiv\mathbb{E}[X(k)]=\int^\infty_{-\infty}x(k)f_X(k;x(k))\mathrm{d}x(k)$
• Conditional Mean: $\mu_X(l|k)\equiv\mathbb{E}[X(l)|X(k)=x(k)]$
• Time Average: $\frac{1}{2N+1}\sum^N_{k=-N}\psi(X(k))$, doing summation over different time k of same realization
• Autocorrelation:
• Scalar case: $r_X(k,l)\equiv\mathbb{E}[X(k)X(l)]=\int^\infty_{-\infty}\int^\infty_{-\infty}x(k)x(l)f_X(k,l;x(k,l))\mathrm{d}x(k)\mathrm{d}x(l)$
• Vector case: $R_X(k,l)\equiv\mathbb{E}[X(k)X^\top(l)]$
• Conditional autocorrelation: $R_X(k,l|q)\equiv\mathbb{E}[X(k)X^\top(l)|X(q)=x(q)]$
• Autocovariance:
• Scalar case: $\kappa_X(k,l)\equiv\mathbb{E}[(X(k)-\mu_X(k))(X(l)-\mu_X(l))]$
• Vector case: $\mathrm{K}_X(k,l)\equiv\mathbb{E}[(X(k)-\mu_X(k))(X(l)-\mu_X(l))^\top]$
• Conditional autocovariance: $\mathrm{K}_X(k,l|q)\equiv\mathbb{E}[(X(k)-\mu_X(k|q))(X(l)-\mu_X(l|q))^\top]$
• Useful conclusion: $\mathrm{K}(a,b)=\mathrm{K}(b,a)^T$
• Stationarity: (identically distributed over time) $$\forall x(k:l)\in\mathbb{R}^{n(l-k+1)},\;\forall s\in\mathbb{Z},\;f_X(k:l;x(k:l))=f_X(k+s:l+s;x(k:l))$$
• Weak Stationarity: $\forall k,l$ if $$\mu_X(k)=\mu_X(l)\;\text{and}\;\mathrm{K}_X(k,l)=\mathrm{K}_X(k+s,l+s)\equiv\bar{\mathrm{K}}_X(s)$$ Weak stationarity is necessary condition for stationarity. (Equal when Gaussian distributed)
• Ergodicity: $X$ is called ergodic in $\psi$ if
1. $\mathbb{E}[\psi(X)]$ is stationary
2. Ensemble average is equal to Time average, that is $$\frac{1}{2N+1}\sum^N_{k=-N}\psi(X(k))\rightarrow\mathbb{E}[\psi(X)]\;\text{as}\;l\rightarrow \infty$$

Markov Sequences

• Markov Sequence: A ss. $X(k)$ is called a (Discrete-time) Markov sequence if $$f_X\left(k,x(k)\middle|X(k-1)=x(k-1),X(k-2)=x(k-2),\ldots\right)=f_X(k;x(k)|X(k-1)=x(k-1))$$
• Guassian-Markov Sequence (GMS): $X(k+1)=g(k,X(k),W(k))$ where $W(k)$ is iid. Guassian
• In LTV system, the form is usually $X(k+1)=A(k)X(k)+B(k)W(k)$
• For linear Markov sequences, the deterministic mean sequence and centered uncertain sequence completely decouple.
• We often assume that ss. is centered in this case.
• Hidden Markov model: Sequence $Y$ is called a Hidden Markov Model if it’s modeled by a system of the form \begin{align}X(k+1)&=g(k,X(k),W(k)) \\ Y(k)&=h(k,X(k),W(k))\end{align}
We also say that $Y$ has a (discrete-time) stochastic state space.

Linear Markov Model

Given the model \begin{align}X(k+1)&=A(k)X(k)+B(k)W(k) \\ Y(k)&=C(k)X(k)+D(k)W(k)\end{align}
Note that we assume $X(k)$ and $Y(k)$ are centered with regard to deterministic inputs.

• Explicit state transition: By recursive substitution, $$X(k)=\Psi(k,q)X(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)W(i)$$ where state transition matrix $\Psi(k,q)=\begin{cases}I, &k=q \\ \prod^{k-1}_ {i=q}A_i,& k>q\end{cases}$ and $\Gamma(k,i)=\Psi(k,i+1)B(i)$.
• Conditioned Mean sequence: $\mu_X(k|q)=\Psi(k,q)\mu_X(q)+\sum^{k-1}_{i=q}\Gamma(k,i)\mu_W(i)$
• Conditioned Autocovariance Matrix: $$\mathrm{K}_ X(k,l|q)=\Psi(k,q)S_X(q)\Psi^\top(l,q)+\sum^{min\{k,l\}-1}_{i=q}\Gamma(k,i)S_W(i)\Gamma^\top(l,i)$$
• A special case: $S_X(k|q)=\mathrm{K}_ X(k,k|q)=\sum^{k-1}_{i=q}\Gamma(k,i)S_W(i)\Gamma^\top(k,i)$
• Useful equation (stationary centered case): $$\mathrm{K}_X(k,l)=\begin{cases} S_X(k)\cdot(A^\top)^{(l-k)}&,l>k\\ S_X(k)&,l=k\\ A^{(k-l)}S_X(k)&,l< k\end{cases}$$
• Conditional Autocorrelation Matrix: $R_X(k,l|q)=\mathrm{K}_X(k,l|q)+\mu_X(k|q)\mu_X^\top(l|q)$
• Recursive Form:
• Mean sequence: $\mu_X(k+1|q)=A(k)\mu_X(k|q)+B(k)\mu_W(k)$
• (Discrete-time algebraic) Lyapunov difference equation (algebraic Stein difference equation): $S_X(k+1|q)=A(k)S_X(k|q)A^\top(k)+B(k)S_W(k)B^\top(k)$
• Convergence when $k\rightarrow\infty$:
• Mean convergence: $\mu_X(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$ ($\lambda$ denotes eigenvalue)
• Covariance convergence: $S_X(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$
• (Discrete-time) Lyapunov equation (Stein equation): $\bar{S}_X=A\bar{S}_XA^\top+B\bar{S}_W(k)B^\top$. Solution for this equation exists iff. $A$ is asymptotically stable (characterizing sequence is stationary).
• Observation property:
• Mean: $\mu_Y(k|q)=C(k)\mu_X(k|q)+D(k)\mu_W(k)$
• Covariance: $$\mathrm{K}_Y(k,l|q)=\begin{cases}C(k)\mathrm{K}_X(k,l|q)C^\top(l)+C(k)\Gamma(k,l)S_W(l)D^\top(l)&:k>l \\C(k)S_XC^\top(k)+D(k)S_W(k)D^\top(k)&:k=l\\ C(k)\mathrm{K}_X(k,l|q)C^\top(l)+D(k)S_W(k)\Gamma^\top(l,k)C^\top(l)&:k< l\end{cases}$$
• Stationary covariance: $$\mathrm{K}_Y(s)=\begin{cases}CA^{|s|}\bar{S}_XC^\top+CA^{|s|-1}B\bar{S}_WD^\top&:s\neq 0 \\C\bar{S}_XC^\top+D\bar{S}_WD^\top&:s=0\end{cases}$$

Gaussian Stochastic Sequence

• (Theorem 6) A linear controllable GMS $X(k)$ is stationary iff. $A(k)=A, B(k)=B$ (time-invariant) and $A$ is asymptotically stable.
• (Corollary 1) All LTI stationary GMS are also ergodic in all finite momoents
• Solve $X(k+1)=AX(k)+BW(k)$ when $X$ is stationary ($\max_i|\lambda_i(A)|<1$)
1. solve $\bar{\mu}_X=A\bar{\mu}_X+B\bar{\mu}_W$ for $\bar{\mu}$
2. solve $\bar{S}_X=A\bar{S}_XA^\top+B\bar{S}_WB^\top$ for $\bar{S}_X$
3. calculate $\Sigma_X(k:l|q)=\begin{bmatrix}\mathrm{K}_X(k,k|q)&\cdots&\mathrm{K}_X(k,l|q)\\ \vdots&\ddots&\vdots\\ \mathrm{K}_X(l,k|q)&\cdots&\mathrm{K}_X(l,l|q)\end{bmatrix}$ using $\bar{S}_X$
4. Then $f_X(k:l;x(k:l))$ is determined with $\mu_X(k:l)=\{\bar{\mu}_X,\bar{\mu}_X\ldots\bar{\mu}_X\}$ and $\Sigma_X(k:l)$ above.

Observation & Filtering

• (LTI) Luenberger observer: \begin{align}\hat{X}(k+1)&=A\hat{X}(k)+L(\hat{Y}(k)-Y(k))+B\bar{\mu}_W \\ \hat{Y}(k)&=C\hat{X}(k)+D\bar{\mu}_W\end{align} where $L$ is the observer gain.
• Note: We often assume that the process & output noise are decoupled and independent
• Combined form: $\hat{X}(k+1)=[A+LC]\hat{X}(k)-LY(k)+B\mu_W(k)$
• State estimation residual $r(k)=X(k)-\hat{X}(k)$
• Combined form: $r(k+1)=[A+LC]r(k)+[B+LD]\tilde{W}(k)$
• Stationary covariance can be solved by a Lyapunov equation $$\bar{S}_r=[A+LC]\bar{S}_r[A+LC]^T+[LD+B]\bar{S}_W[LD+B]^T$$
• Objective: minimize $\bar{S}_r=\mathbb{E}(rr^\top)$
• Solutions: $L=-A\bar{S}_rC^\top[C\bar{S}_rC^\top+D\bar{S}_WD^\top]^{-1}$ (Kalman observer gain)
• Or solve (Discrete-time) Algebraic Riccati equation $$\bar{S}_r=A\bar{S}_rA^\top+B\bar{S}_WB^\top-A\bar{S}_rC^\top[C\bar{S}_rC^\top+D\bar{S}_WD^\top]^{-1}C\bar{S}_rA^\top$$
• Innovation sequence $e(k)=\hat{Y}(k)-Y(k)$ with $L$ is optimal
• We can find that $\mu_e=0$ and $\mathrm{K}_e(k+s,k)=\begin{cases}C\bar{S}_rC^\top+D\bar{S}_WD^T&:s=0 \\0&:s\neq0\end{cases}$. So the innovation sequence is iid. (only in Kalman observer)
• (Output) Probabilistically-equivalent model: \begin{align}X(k+1)&=AX(k)+Le(k) \\ Y(k)&=CX(k)-e(k)\end{align}

Notes for math typing in Hexo

1. Escape \ by \\. Especially escape { by \\{ instead of \{, and escape \\ by \\\\.
2. Be careful about _, it’s used in markdown as italic indicator. Add space after _ is a useful solution.
3. Some useful Mathjax tricks at StackExchange
4. Several capital Greek characters should directly use its related Latin alphabet with \mathrm command.
Shoot me some coffee money XD
0%