Stochastic System Notes

Combined notes for EECS 501 & MECHENG 549

Probability Basics

Probability Space

  • Notation: $(\Omega,\mathcal{F},\mathbb{P})$
    • $\Omega$: Sample space
    • $\mathcal{F}$: Event space. Require 3 axioms (to be σ-field).
    • $\mathbb{P}$: Probability measure. Require 3 axioms.
  • Product Space: Probability spaces can be combined using Cartesian product.
  • Independence: $\mathbb{P}(A_{k_1}\cap A_{k_2}\cap \cdots \cap A_{k_l})=\prod_{i=1}^l \mathbb{P}(A_{k_i}),\;\forall \{k_i\}_1^l\subset\{ 1..n\}$
  • Conditional probability: $\mathbb{P}\left(A_i \middle| A_j\right)=\mathbb{P}(A_i\cap A_j)/\mathbb{P}(A_j)$
  • Total probability: $\mathbb{P}(B)=\sum_{i=1}^n \mathbb{P}(B\cap A_i)=\sum_{i=1}^n \mathbb{P}\left(B\middle| A_i\right)\mathbb{P}(A_i)$, where $\{A_1,\cdots,A_n\}$ is a partition of $\Omega$ (disjoint and exhaustive).
  • Bayes: $\mathbb{P}(A_j|B)=\frac{P(B|A_j)P(A_j)}{\sum^n_{i=1} P(B|A_i)P(A_i)}$
    • Prior (a priori): $\mathbb{P}(A_j)$; likelihood: $\mathbb{P}(B|A_j)$
    • Posterior (a posteriori): $\mathbb{P}(A_j|B)$

Random Variables

  • Scalar Random Variable: $X: \Omega\rightarrow\mathbb{F}$
    • Formal definition: $X: (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\Omega_X\subset\mathbb{F},\mathcal{F}_X\subset\left\{\omega\middle| \omega\subset\Omega_X\right\},\mathbb{P}_X:\mathcal{F}_X\rightarrow[0,1])$
    • Cumulative Distribution Function (CDF): $F_X(x)=\mathbb{P}(X(\omega)\leqslant x)$
    • Probability Mass Function (PMF): $p_X(x)=\mathbb{P}_X(\{x\})=\mathbb{P}(X=x)$
    • Probability Density Function (PDF): $$f_X(x)\,\mathrm{d}x=\mathbb{P}(x< X(\omega)\leqslant x+ \mathrm{d}x),\quad f_X(x)= \mathrm{d}F_X(x)/ \mathrm{d}x$$
  • Vector Random Variable: $X: \Omega\rightarrow\mathbb{F}^n$
    • Formal definition $X: (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\Omega_X\subset\mathbb{F}^n,\mathcal{F}_X\subset\left\{\omega\middle| \omega\subset\Omega_X\right\},\mathbb{P}_X:\mathcal{F}_X\rightarrow[0,1])$
    • Cumulative Distribution Function (CDF): $F_X(x)=\mathbb{P}(\{X(\omega)\}_i\leqslant x_i), i=1\ldots n$
    • Probability Mass Function (PMF): $p_X(x)=\mathbb{P}_X(\{x\})=\mathbb{P}(X=x)$
    • Probability Density Function (PDF): $f_X(x)\prod_{i=1}^n\mathrm{d}x_i=\mathbb{P}(x_i< \{X(\omega)\}_i\leqslant x_i+ \mathrm{d}x_i,\;i=1\ldots n)$, where $f_X(x)=\frac{\partial^n}{\partial x_1\partial x_2\ldots\partial x_n}F_X(x)$
  • Common distributions (evaluated numerically in the sketch after this list)
    • Uniform: $$f(x)=
      \begin{cases}
      1/(b-a), & a< x< b \\
      0, & \text{otherwise}
      \end{cases}$$
    • Normal (multivariate):
      $$f(x)=\frac{1}{\sqrt{(2\pi)^n \det(S)}}\exp\left\{-\frac{1}{2}(x-\mu)^\top S^{-1}(x-\mu)\right\}$$
    • Rayleigh: $$f(x)=\frac{x}{\sigma^2}\exp\left\{-\frac{x^2}{2\sigma^2}\right\}H(x)$$ where $H(x)$ is Heaviside step function
    • Exponential: $$f(x)=\frac{1}{\mu}\exp\left\{-\frac{x}{\mu}\right\}H(x)$$
    • Laplacian: $$f(x)=\frac{c}{2}\exp\left\{-c|x|\right\}$$
  • Marginalization

    • Marginal distribution function: $F_{1:m}(x_{1:m})\equiv F\left(\begin{bmatrix}x_1&x_2&\ldots&x_m&\infty&\ldots&\infty\end{bmatrix}^\top\right)$
    • Marginal density function: $f_{1:m}(x_{1:m})=\int^\infty_{-\infty}\cdots\int^\infty_{-\infty} f(x) \mathrm{d}x_{m+1}\cdots \mathrm{d}x_n$
  • Conditional

    • Conditional probability (continuous random variable): $$\mathbb{P}(X\in\mathcal{D}_ 1|X\in\mathcal{D}_ 2)=\frac{\mathbb{P}(X\in\mathcal{D}_ 1\cap\mathcal{D}_ 2)}{\mathbb{P}(X\in\mathcal{D}_ 2)}=\frac{\int_{\mathcal{D}_1\cap\mathcal{D}_2}f_X(x) \mathrm{d}x_1\ldots\mathrm{d}x_n}{\int_{\mathcal{D}_2}f_X(x) \mathrm{d}x_1\ldots \mathrm{d}x_n}$$
    • Conditional distribution: $F_X(x|X\in\mathcal{D})=\mathbb{P}(X_i\leqslant x_i|X\in\mathcal{D})$
    • Conditional density: $f_X(x|X\in\mathcal{D})=\frac{\partial^n}{\partial x_1\partial x_2\ldots\partial x_n}F_X(x|X\in\mathcal{D})$
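
The densities above map directly onto `scipy.stats`. A minimal sketch (the parameter values are illustrative, and SciPy's `loc`/`scale` conventions are noted in the comments):

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 5, 9)

# Uniform on (a, b): SciPy parameterizes it as loc=a, scale=b-a
print(stats.uniform.pdf(x, loc=0.0, scale=2.0))          # 1/(b-a) on (0, 2), else 0

# Multivariate normal with mean mu and covariance S
mu, S = np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]])
print(stats.multivariate_normal.pdf([0.3, -0.1], mean=mu, cov=S))

# Rayleigh (scale = sigma), exponential (scale = mu), Laplacian (scale = 1/c)
print(stats.rayleigh.pdf(x, scale=1.5))                   # (x/sigma^2) exp(-x^2 / 2 sigma^2), x >= 0
print(stats.expon.pdf(x, scale=2.0))                      # (1/mu) exp(-x/mu), x >= 0
print(stats.laplace.pdf(x, scale=1.0 / 3.0))              # (c/2) exp(-c |x|)
```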

Expectation & Moments

  • Expectation: $\mathbb{E}\psi(X)=\int^\infty_{-\infty}\cdots\int^\infty_{-\infty}\psi(x)f_X(x) \mathrm{d}x_1\ldots \mathrm{d}x_n$
  • Moment (p-th order): $\mathbb{E}[X^p]$
    • Mean: $\mu_X=\mathbb{E}[X]$
  • Central moment (p-th order): $\mathbb{E}[(X-\mu_X)^p]$
    • Variance: $\sigma_X^2=\mathbb{E}[(X-\mu_X)^2]$
    • Skewness (normalized third central moment): $\mathbb{E}[(X-\mu_X)^3]/\sigma_X^3$
    • Kurtosis (normalized fourth central moment): $\mathbb{E}[(X-\mu_X)^4]/\sigma_X^4$
  • Covariance: $\text{cov}(X_i,X_j)=\mathbb{E}[(X_i-\mu_i)(X_j-\mu_j)]$
    • Covariance matrix: $S=\mathbb{E}\left[(X-\mu_X)(X-\mu_X)^\top\right]$
  • Conditional expectation: $\mathbb{E}[Y|X=x]=\int^\infty_{-\infty}yf_{Y|X}(y|x)\mathrm{d}y$
  • Centered variable: $\tilde{X}=X-\mu_X$
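
As a sanity check, the moments defined above can be estimated from samples. A minimal sketch using NumPy/SciPy estimators (the Gaussian parameters are illustrative; `scipy.stats.skew` and `kurtosis` return the normalized versions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[2.0, 0.6], [0.6, 1.0]], size=100_000)

mu_hat = X.mean(axis=0)                            # mean vector mu_X
S_hat = np.cov(X, rowvar=False)                    # covariance matrix E[(X - mu)(X - mu)^T]
skew_hat = stats.skew(X[:, 0])                     # E[(X - mu)^3] / sigma^3
kurt_hat = stats.kurtosis(X[:, 0], fisher=False)   # E[(X - mu)^4] / sigma^4

print(mu_hat, S_hat, skew_hat, kurt_hat)           # ~[1, -2], ~[[2, .6], [.6, 1]], ~0, ~3
```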

Transform methods

  • Probability generating function (PGF): similar to the Z-transform $$G_X(z)\equiv\mathbb{E}[z^X]=\sum_{x_i}z^{x_i}p_X(x_i)$$
    • For $\Omega_X\subseteq\mathbb{N}$, we have $$\frac{\mathrm{d}^k}{\mathrm{d}z^k}G_X(z)\Biggr|_{z=1}=\mathbb{E}[X(X-1)\cdots(X-(k-1))]$$
  • Moment generating function (MGF): similar to the Laplace transform (see the sketch after this list) $$M_X(s)\equiv\mathbb{E}[e^{sX}]=\int^\infty_{-\infty}e^{sx}f_X(x)\mathrm{d}x$$
    • Generally we have $$\frac{\mathrm{d}^k}{\mathrm{d}s^k}M_X(s)\Biggr|_{s=0}=\mathbb{E}[X^k]$$
  • Characteristic function: similar to Fourier transform $$\phi_X(\omega)\equiv\mathbb{E}[e^{j\omega X}]=\int^\infty_{-\infty}e^{j\omega x}f_X(x)\mathrm{d}x$$
    • Generally we have $$\frac{\mathrm{d}^k}{\mathrm{d}\omega^k}\phi_X(\omega)\Biggr|_{\omega=0}=j^k\mathbb{E}[X^k]$$
  • Joint characteristic function: for vector case $X\in\mathbb{R}^n$, we define vector $u$ and $$\phi_X(u)\equiv\mathbb{E}[e^{ju^\top X}]$$
    • Trivial usage: if $Y=X_1+X_2$ with $X_1,X_2$ independent, then $\phi_Y(u)=\phi_{X_1}(u)\phi_{X_2}(u)$
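
A quick numerical illustration of the MGF moment relation, assuming an exponential example (not from the notes): for $X\sim\text{Exp}(\mu)$ the MGF is $M_X(s)=1/(1-\mu s)$, so the $k$-th derivative at $s=0$ is $k!\,\mu^k$ and should equal $\mathbb{E}[X^k]$:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
mu = 2.0
samples = rng.exponential(scale=mu, size=1_000_000)

for k in range(1, 5):
    analytic = math.factorial(k) * mu**k    # k-th derivative of M_X(s) = 1/(1 - mu*s) at s = 0
    empirical = np.mean(samples**k)         # Monte Carlo estimate of E[X^k]
    print(k, analytic, empirical)
```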

Famous Inequalities

  • Cauchy-Schwarz Inequality: $\left(\mathbb{E}[(X_i-\mu_i)(X_j-\mu_j)]\right)^2\leqslant\mathbb{E}[(X_i-\mu_i)^2]\,\mathbb{E}[(X_j-\mu_j)^2]$
  • Markov Inequality: for nonnegative $X$, $\mathbb{P}(X\geqslant a)\leqslant \mathbb{E}[X]/a,\;a>0$
  • Chebyshev Inequality: $\mathbb{P}(|X-\mu|\geqslant\delta)\leqslant\sigma^2/\delta^2$
  • Chernoff bound: $\mathbb{P}(X\geqslant a)\leqslant \underset{s\geqslant 0}{\min}\,e^{-as}M_X(s)$
  • Law of Large Numbers: let $X_i$ be samples drawn from $(\mathbb{R}^n,\mathcal{B}^n,\mathbb{P})$, and $\mathbb{P}$ is such that $X_k$ has mean $\mu$ and covariance $S$
    • Weak version: if $Y_k=\frac{1}{k}\sum^k_{j=1}X_j$ then $$\underset{k\rightarrow\infty}{\lim} \mathbb{P}\left(\left\lVert Y_k-\mu\right\rVert>\epsilon\right)=0$$
    • Strong version: if $Y_k=\frac{1}{k}\sum^k_{j=1}X_j$ then, almost surely, $$\underset{k\rightarrow\infty}{\lim} Y_k=\mu$$
    • Central Limit Theorem: if $Y_k=\frac{1}{\sqrt{k}}\sum^k_{j=1}S^{-1/2}\left(X_j-\mu\right)$ then $Y_k$ converges in distribution to $\mathcal{N}(0,I)$: $$\underset{k\rightarrow\infty}{\lim} f_{Y_k}(y_k)=\mathcal{N}(0,I)$$
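
A Monte Carlo sanity check of the bounds and limit theorems above; a minimal sketch assuming iid $\text{Exp}(1)$ samples (mean and variance both equal to 1):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=(10_000, 500))   # 10k realizations, 500 samples each

# Markov: P(X >= a) <= E[X]/a ;  Chebyshev: P(|X - mu| >= d) <= sigma^2 / d^2
a, d = 3.0, 2.0
print(np.mean(X >= a), 1.0 / a)
print(np.mean(np.abs(X - 1.0) >= d), 1.0 / d**2)

# Law of Large Numbers: sample means concentrate around mu = 1
Y = X.mean(axis=1)
print(np.mean(np.abs(Y - 1.0) > 0.1))                # small, shrinks as the sample length grows

# Central Limit Theorem: sqrt(k) (Y - mu) / sigma is approximately N(0, 1)
Z = np.sqrt(X.shape[1]) * (Y - 1.0)
print(Z.mean(), Z.std())                             # ~0 and ~1
```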

Uncertainty Propagation

Suppose $Y=\psi(X)$, where for a given $y$ the equation $y=\psi(x)$ has $K$ roots, i.e., $y=\psi(x_1)=\psi(x_2)=\cdots=\psi(x_K)$. (The scalar formula is checked numerically in the sketch after the list below.)

  • Scalar case: $$f_Y(y)=\sum^K_{k=1} f_X(\psi^{-1}_ k(y))\left| \frac{\partial\psi}{\partial x}\biggr|_{x=\psi^{-1}_k(y)} \right|^{-1}$$
  • Vector case: $$f_Y(y)=\sum^K_{k=1} f_X(\psi^{-1}_ k(y))\left|\det(J)\right|^{-1},\;\text{Jacobian }J=\frac{\partial\psi}{\partial x}\biggr|_{x=\psi^{-1}_k(y)}$$
  • Trivial Case — Summation: if $Y=X_1+X_2$ with $X_1,X_2$ independent, then $f_Y(y)=\int^\infty_{-\infty}f_{X_1}(x_1)f_{X_2}(y-x_1) \mathrm{d}x_1$
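
A numerical check of the scalar formula, assuming the illustrative case $Y=X^2$ with $X\sim\mathcal{N}(0,1)$: the two roots are $x=\pm\sqrt{y}$ and $|\partial\psi/\partial x|=2\sqrt{y}$, giving $f_Y(y)=[f_X(\sqrt{y})+f_X(-\sqrt{y})]/(2\sqrt{y})$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.standard_normal(1_000_000) ** 2              # samples of Y = X^2

grid = np.linspace(0.05, 3.0, 5)
f_analytic = (stats.norm.pdf(np.sqrt(grid)) + stats.norm.pdf(-np.sqrt(grid))) / (2 * np.sqrt(grid))

hist, edges = np.histogram(y, bins=200, range=(0.0, 3.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_empirical = np.interp(grid, centers, hist)

print(np.c_[grid, f_analytic, f_empirical])          # analytic and empirical densities agree
```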

Discrete-Time Stochastic System

Stochastic Sequences

  • Definition: Given $k\in\mathcal{K}\subseteq\mathbb{Z}$ a sequence of integers, $X(k,\omega): (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\mathbb{R}^n,\mathcal{B},\mathbb{P}_X)$ is a random/stochastic sequence.
  • Uncertainties: Consider a causal system $F$ that relates scalar inputs $u(k)$ to the output $x(k)$
    • Epistemic/Model uncertainty: $X(k,\omega)=F(k,u(k),u(k-1),\ldots,\omega)$ (the system is stochastic and the input is deterministic).
    • Aleatoric/Input uncertainty: $X(k,\omega)=F(k,U(k,\omega),U(k-1,\omega),\ldots)$ (the system is deterministic and the input is stochastic).
  • Realization: An outcome $X(k,\omega)=x(k)$ is called a realization of stochastic sequence $X$
  • Terminology and convention
    • $X(k,\omega)$ is often written as $X(k)$ when there’s no ambiguity.
    • $\mathcal{K}=\mathbb{Z}$ if not specified.
    • Sequences over a set $\mathcal{K}_1\subseteq\mathcal{K}$ are denoted $X(\mathcal{K}_1)$.
    • $X$ denotes $X(\mathcal{K})$ if not specified.
    • Consecutive subsequence: $$X(k:l)=\{X(k),X(k+1),\ldots,X(l)\},\;x(k:l)=\{x(k),x(k+1),\ldots,x(l)\}$$
    • Abbreviations:
      • SS - stochastic sequence
      • IID - independent, identically distributed

Probabilistic characterization

  • Distribution and density: $$F_X\left(k:l;x(k:l)\right)\equiv\mathbb{P}((X_i(k)\leqslant x_i(k))\cap\cdots\cap(X_i(l)\leqslant x_i(l)),\;i=1\ldots n)$$ $$f_X\left(k:l;x(k:l)\right)\equiv \frac{\partial^{n(l-k+1)}}{\partial x_1(k)\cdots\partial x_n(l)}F_X(k:l;x(k:l))$$
    Here $k:l$ denotes a set of consecutive integers; it may also be replaced by an ordinary set $\{k,l\}$ or a single index $k$.
  • Ensemble Average: $\mathbb{E}[\psi(X(k))]$, averaging over different realizations at the same time $k$
    • Mean: $\mu_X(k)\equiv\mathbb{E}[X(k)]=\int^\infty_{-\infty}x(k)f_X(k;x(k))\mathrm{d}x(k)$
    • Conditional Mean: $\mu_X(l|k)\equiv\mathbb{E}[X(l)|X(k)=x(k)]$
  • Time Average: $\frac{1}{2N+1}\sum^N_{k=-N}\psi(X(k))$, averaging over time $k$ within a single realization
  • Autocorrelation:
    • Scalar case: $r_X(k,l)\equiv\mathbb{E}[X(k)X(l)]=\int^\infty_{-\infty}\int^\infty_{-\infty}x(k)x(l)f_X(k,l;x(k,l))\mathrm{d}x(k)\mathrm{d}x(l)$
    • Vector case: $R_X(k,l)\equiv\mathbb{E}[X(k)X^\top(l)]$
    • Conditional autocorrelation: $R_X(k,l|q)\equiv\mathbb{E}[X(k)X^\top(l)|X(q)=x(q)]$
  • Autocovariance:
    • Scalar case: $\kappa_X(k,l)\equiv\mathbb{E}[(X(k)-\mu_X(k))(X(l)-\mu_X(l))]$
    • Vector case: $\mathrm{K}_X(k,l)\equiv\mathbb{E}[(X(k)-\mu_X(k))(X(l)-\mu_X(l))^\top]$
    • Conditional autocovariance: $\mathrm{K}_X(k,l|q)\equiv\mathbb{E}[(X(k)-\mu_X(k|q))(X(l)-\mu_X(l|q))^\top]$
    • Useful conclusion: $\mathrm{K}(a,b)=\mathrm{K}(b,a)^T$
  • Stationarity: (identically distributed over time) $$\forall x(k:l)\in\mathbb{R}^{n(l-k+1)},\;\forall s\in\mathbb{Z},\;f_X(k:l;x(k:l))=f_X(k+s:l+s;x(k:l))$$
  • Weak Stationarity: $X$ is weakly stationary if for all $k,l,s$ $$\mu_X(k)=\mu_X(l)\;\text{and}\;\mathrm{K}_X(k,l)=\mathrm{K}_X(k+s,l+s)\equiv\bar{\mathrm{K}}_X(s)$$ Weak stationarity is a necessary condition for stationarity; the two are equivalent for Gaussian-distributed sequences.
  • Ergodicity: $X$ is called ergodic in $\psi$ if
    1. $\psi(X(k))$ is stationary (so the ensemble average $\mathbb{E}[\psi(X(k))]$ does not depend on $k$)
    2. The time average equals the ensemble average, that is $$ \frac{1}{2N+1}\sum^N_{k=-N}\psi(X(k))\rightarrow\mathbb{E}[\psi(X)]\;\text{as}\;N\rightarrow \infty$$
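
A small illustration of ergodicity in the mean, assuming an AR(1)-type sequence with a constant input (an example not in the notes): the ensemble average across realizations and the time average over one long realization should coincide:

```python
import numpy as np

rng = np.random.default_rng(4)
a, n_real, n_time = 0.8, 1_000, 4_000

X = np.zeros((n_real, n_time))
for k in range(1, n_time):
    X[:, k] = a * X[:, k - 1] + rng.standard_normal(n_real) + 0.5   # constant input 0.5

ensemble_avg = X[:, -1].mean()          # average over realizations at a fixed (large) time
time_avg = X[0, n_time // 2:].mean()    # average over time for a single realization
print(ensemble_avg, time_avg)           # both ~ 0.5 / (1 - a) = 2.5
```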

Markov Sequences

  • Markov Sequence: A SS $X(k)$ is called a (Discrete-time) Markov sequence if $$f_X\left(k;x(k)\middle|X(k-1)=x(k-1),X(k-2)=x(k-2),\ldots\right)=f_X(k;x(k)|X(k-1)=x(k-1))$$
  • Gaussian-Markov Sequence (GMS): $X(k+1)=g(k,X(k),W(k))$ where $W(k)$ is iid Gaussian (simulated in the sketch after this list)
    • For an LTV system, the form is usually $X(k+1)=A(k)X(k)+B(k)W(k)$
    • For linear Markov sequences, the deterministic mean sequence and centered uncertain sequence completely decouple.
    • We often assume that ss. is centered in this case.
  • Hidden Markov model: Sequence $Y$ is called a Hidden Markov Model if it’s modeled by a system of the form $$\begin{align}X(k+1)&=g(k,X(k),W(k)) \\ Y(k)&=h(k,X(k),W(k))\end{align}$$
    We also say that $Y$ has a (discrete-time) stochastic state space.
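
A minimal simulation sketch of a linear Gauss-Markov sequence $X(k+1)=AX(k)+BW(k)$ with iid standard-normal $W(k)$; the matrices $A$, $B$ are illustrative, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(5)
A = np.array([[0.9, 0.2], [0.0, 0.7]])   # asymptotically stable (|eigenvalues| < 1)
B = np.array([[1.0], [0.5]])

n_steps = 10_000
X = np.zeros((n_steps, 2))
for k in range(n_steps - 1):
    W = rng.standard_normal(1)           # iid Gaussian driving noise
    X[k + 1] = A @ X[k] + B @ W

print(np.cov(X[1_000:], rowvar=False))   # empirical stationary covariance, compare with the Stein solution below
```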

Linear Markov Model

Given the model $$\begin{align}X(k+1)&=A(k)X(k)+B(k)W(k) \\ Y(k)&=C(k)X(k)+D(k)W(k)\end{align}$$
Note that we assume $X(k)$ and $Y(k)$ are centered with regard to deterministic inputs.

  • Explicit state transition: By recursive substitution, $$X(k)=\Psi(k,q)X(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)W(i)$$ where the state transition matrix $\Psi(k,q)=\begin{cases}I, &k=q \\ A(k-1)A(k-2)\cdots A(q),& k>q\end{cases}$ and $\Gamma(k,i)=\Psi(k,i+1)B(i)$.
  • Conditioned Mean sequence: $\mu_X(k|q)=\Psi(k,q)\mu_X(q)+\sum^{k-1}_{i=q}\Gamma(k,i)\mu_W(i)$
  • Conditioned Autocovariance Matrix: $$\mathrm{K}_ X(k,l|q)=\Psi(k,q)S_X(q)\Psi^\top(l,q)+\sum^{\min\{k,l\}-1}_{i=q}\Gamma(k,i)S_W(i)\Gamma^\top(l,i)$$
    • A special case: $S_X(k|q)=\mathrm{K}_ X(k,k|q)=\sum^{k-1}_{i=q}\Gamma(k,i)S_W(i)\Gamma^\top(k,i)$
    • Useful equation (stationary centered case): $$\mathrm{K}_X(k,l)=\begin{cases} S_X(k)\,(A^\top)^{l-k}, & l>k\\ S_X(k), & l=k\\ A^{k-l}S_X(k), & l< k\end{cases}$$
  • Conditional Autocorrelation Matrix: $R_X(k,l|q)=\mathrm{K}_X(k,l|q)+\mu_X(k|q)\mu_X^\top(l|q)$
  • Recursive Form:
    • Mean sequence: $\mu_X(k+1|q)=A(k)\mu_X(k|q)+B(k)\mu_W(k)$
    • (Discrete-time) Lyapunov difference equation (Stein difference equation): $S_X(k+1|q)=A(k)S_X(k|q)A^\top(k)+B(k)S_W(k)B^\top(k)$
  • Convergence when $k\rightarrow\infty$:
    • Mean convergence: convergence of $\mu_X(k|q)$ requires $\max_i|\lambda_i(A)|<1$ ($\lambda_i$ denotes an eigenvalue)
    • Covariance convergence: convergence of $S_X(k|q)$ requires $\max_i|\lambda_i(A)|<1$
    • (Discrete-time) Lyapunov equation (Stein equation): $\bar{S}_X=A\bar{S}_XA^\top+B\bar{S}_WB^\top$. A valid stationary solution exists iff $A$ is asymptotically stable; it characterizes the stationary covariance (see the sketch after this list).
  • Observation property:
    • Mean: $\mu_Y(k|q)=C(k)\mu_X(k|q)+D(k)\mu_W(k)$
    • Covariance: $$\mathrm{K}_Y(k,l|q)=\begin{cases}C(k)\mathrm{K}_X(k,l|q)C^\top(l)+C(k)\Gamma(k,l)S_W(l)D^\top(l)&:k>l \\C(k)S_X(k|q)C^\top(k)+D(k)S_W(k)D^\top(k)&:k=l\\ C(k)\mathrm{K}_X(k,l|q)C^\top(l)+D(k)S_W(k)\Gamma^\top(l,k)C^\top(l)&:k< l\end{cases}$$
    • Stationary covariance: $$\mathrm{K}_Y(s)=\begin{cases}CA^{|s|}\bar{S}_XC^\top+CA^{|s|-1}B\bar{S}_WD^\top&:s\neq 0 \\C\bar{S}_XC^\top+D\bar{S}_WD^\top&:s=0\end{cases}$$
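
The stationary Lyapunov/Stein equation can be solved directly with `scipy.linalg.solve_discrete_lyapunov`. A minimal sketch, reusing the illustrative $A$, $B$ from the simulation above and assuming $S_W=I$:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
S_W = np.eye(1)

# solve_discrete_lyapunov(A, Q) returns S satisfying S = A S A^T + Q
S_X = solve_discrete_lyapunov(A, B @ S_W @ B.T)

s = 3                                          # lag l - k > 0
K_X = S_X @ np.linalg.matrix_power(A.T, s)     # K_X(k, k+s) = S_X (A^T)^s in the stationary case
print(S_X)
print(K_X)
```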

Gaussian Stochastic Sequence

  • (Theorem 6) A linear controllable GMS $X(k)$ is stationary iff $A(k)=A,\;B(k)=B$ (time-invariant) and $A$ is asymptotically stable.
  • (Corollary 1) All LTI stationary GMS are also ergodic in all finite moments.
  • Solve $X(k+1)=AX(k)+BW(k)$ when $X$ is stationary ($\max_i|\lambda_i(A)|<1$)
    1. solve $\bar{\mu}_X=A\bar{\mu}_X+B\bar{\mu}_W$ for $\bar{\mu}_X$
    2. solve $\bar{S}_X=A\bar{S}_XA^\top+B\bar{S}_WB^\top$ for $\bar{S}_X$
    3. calculate $\Sigma_X(k:l|q)=\begin{bmatrix}\mathrm{K}_X(k,k|q)&\cdots&\mathrm{K}_X(k,l|q)\\ \vdots&\ddots&\vdots\\ \mathrm{K}_X(l,k|q)&\cdots&\mathrm{K}_X(l,l|q)\end{bmatrix}$ using $\bar{S}_X$
    4. Then $f_X(k:l;x(k:l))$ is determined with $\mu_X(k:l)=\{\bar{\mu}_X,\bar{\mu}_X\ldots\bar{\mu}_X\}$ and $\Sigma_X(k:l)$ above.
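
A sketch of the four-step procedure above, with the same illustrative matrices and an assumed nonzero noise mean:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
mu_W, S_W = np.array([0.3]), np.eye(1)
n, m = A.shape[0], 4                           # state dimension and window length l - k + 1

# 1. stationary mean: mu = A mu + B mu_W  =>  (I - A) mu = B mu_W
mu_X = np.linalg.solve(np.eye(n) - A, B @ mu_W)

# 2. stationary covariance: S = A S A^T + B S_W B^T
S_X = solve_discrete_lyapunov(A, B @ S_W @ B.T)

# 3. block covariance Sigma_X(k:l), using K_X(k, k+s) = S_X (A^T)^s and K_X(k+s, k) = A^s S_X
Sigma = np.zeros((m * n, m * n))
for i in range(m):
    for j in range(m):
        s = j - i
        block = S_X @ np.linalg.matrix_power(A.T, s) if s >= 0 else np.linalg.matrix_power(A, -s) @ S_X
        Sigma[i * n:(i + 1) * n, j * n:(j + 1) * n] = block

# 4. f_X(k:l) is the Gaussian density with the stacked mean and this covariance
mu_stack = np.tile(mu_X, m)
print(mu_stack)
print(Sigma)
```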

Observation & Filtering

  • (LTI) Luenberger observer: $$\begin{align}\hat{X}(k+1)&=A\hat{X}(k)+L(\hat{Y}(k)-Y(k))+B\bar{\mu}_W \\ \hat{Y}(k)&=C\hat{X}(k)+D\bar{\mu}_W\end{align}$$ where $L$ is the observer gain.
    • Note: We often assume that the process & output noise are decoupled and independent
    • Combined form: $\hat{X}(k+1)=[A+LC]\hat{X}(k)-LY(k)+[B+LD]\bar{\mu}_W$
  • State estimation residual $r(k)=X(k)-\hat{X}(k)$
    • Combined form: $r(k+1)=[A+LC]r(k)+[B+LD]\tilde{W}(k)$
    • Stationary covariance can be solved by a Lyapunov equation $$\bar{S}_r=[A+LC]\bar{S}_r[A+LC]^T+[LD+B]\bar{S}_W[LD+B]^T$$
  • Objective: minimize $\bar{S}_r=\mathbb{E}(rr^\top)$
    • Solutions: $L=-A\bar{S}_rC^\top[C\bar{S}_rC^\top+D\bar{S}_WD^\top]^{-1}$ (Kalman observer gain)
    • Or solve the (discrete-time) algebraic Riccati equation (a SciPy sketch follows this list): $$\bar{S}_r=A\bar{S}_rA^\top+B\bar{S}_WB^\top-A\bar{S}_rC^\top[C\bar{S}_rC^\top+D\bar{S}_WD^\top]^{-1}C\bar{S}_rA^\top$$
  • Innovation sequence $e(k)=\hat{Y}(k)-Y(k)$ when $L$ is optimal
    • We can find that $\mu_e=0$ and $\mathrm{K}_e(k+s,k)=\begin{cases}C\bar{S}_rC^\top+D\bar{S}_WD^\top&:s=0 \\0&:s\neq0\end{cases}$. So the innovation sequence is white (uncorrelated in time); this holds only for the Kalman observer gain.
  • (Output) Probabilistically-equivalent model: $$\begin{align}X(k+1)&=AX(k)+Le(k) \\ Y(k)&=CX(k)-e(k)\end{align}$$
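
A minimal sketch of computing the steady-state residual covariance and the observer gain with `scipy.linalg.solve_discrete_are`, under the decoupled-noise assumption noted above (so $B\bar{S}_W D^\top=0$) and with illustrative system matrices:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[1.0, 0.0], [0.5, 0.0]])    # process noise uses the first component of W
C = np.array([[1.0, 0.0]])
D = np.array([[0.0, 0.3]])                # output noise uses the second component, so B S_W D^T = 0
S_W = np.eye(2)

Q = B @ S_W @ B.T                         # process-noise covariance B S_W B^T
R = D @ S_W @ D.T                         # output-noise covariance D S_W D^T

# solve_discrete_are(A.T, C.T, Q, R) returns S_r satisfying the Riccati equation above
S_r = solve_discrete_are(A.T, C.T, Q, R)

# Observer gain with this note's sign convention (L multiplies Yhat - Y)
L = -A @ S_r @ C.T @ np.linalg.inv(C @ S_r @ C.T + R)
print(S_r)
print(L)
```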

Notes for math typing in Hexo

  1. Escape \ with \\. In particular, escape { as \\{ rather than \{, and escape \\ as \\\\.
  2. Be careful with _, which Markdown treats as an italics marker; adding a space after _ is a useful workaround.
  3. Some useful MathJax tricks at StackExchange
  4. Some capital Greek letters should be typeset as the corresponding Latin letter with the \mathrm command.