Radon-Nikodym and Conditional Expectation
Radon-Nikodym Change of Measure
In this section, we will make use of a central theorem of Functional Analysis applied in the special case of Hilbert spaces.
Note
A (real) Hilbert space is a vector space \(H\) together with a bilinear form \(\langle\cdot,\cdot\rangle:H\times H \to \mathbb{R}\), linear in the first as well as in the second argument, such that \(\langle x,y\rangle =\langle y,x\rangle\) and \(\langle x,x\rangle \geq 0\) with equality if and only if \(x=0\), and which is complete with respect to the induced norm.
Indeed, \(\left\Vert x\right\Vert=\langle x,x\rangle^{1/2}\) defines a norm on \(H\); the triangle inequality follows from the Cauchy-Schwarz inequality, which states that \(\left\vert\langle x,y\rangle\right\vert\leq \left\Vert x\right\Vert\left\Vert y\right\Vert\).
The finite-dimensional vector space \(\mathbb{R}^d\) is a Hilbert space for the dot product \(\langle x,y\rangle=\sum_{k\leq d} x_ky_k\), which induces the Euclidean norm \(\left\Vert x\right\Vert=\sqrt{\sum_{k\leq d}x_k^2}\).
More importantly for our purposes, given a finite measure \(P\), the space \(L^2(P)\) with the bilinear form \(\langle X,Y\rangle=\int XY \,dP\), which is well defined by Hölder's inequality, is
a Hilbert space with resulting norm \(\left\Vert X\right\Vert_{L^2(P)}=\left(\int X^2 \,dP\right)^{1/2}\), which is the \(L^2\) norm as defined before.
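As a quick sanity check, here is a minimal numerical sketch (not part of the formal development) of the \(L^2(P)\) inner product on a finite probability space, where all integrals reduce to weighted sums; the weights and random variables below are arbitrary choices made for the illustration.

```python
# A minimal numerical sketch: on a finite probability space with atoms
# {0, ..., n-1} and weights P, the L^2(P) inner product <X, Y> = E[XY]
# reduces to a weighted sum.  We verify the Cauchy-Schwarz inequality for
# arbitrary (made-up) random variables X and Y.
import numpy as np

rng = np.random.default_rng(0)
n = 6
P = rng.random(n)
P /= P.sum()                       # a probability measure on a 6-point space

def inner(X, Y):
    """L^2(P) inner product <X, Y> = E[X Y]."""
    return float(np.sum(X * Y * P))

def norm(X):
    """Induced norm ||X|| = <X, X>^{1/2}."""
    return inner(X, X) ** 0.5

X = rng.normal(size=n)             # two arbitrary random variables on Omega
Y = rng.normal(size=n)

# Cauchy-Schwarz: |<X, Y>| <= ||X|| ||Y|| (small slack for rounding).
assert abs(inner(X, Y)) <= norm(X) * norm(Y) + 1e-12
```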
Recall that a linear functional \(T:H\to \mathbb{R}\), where \((H,\langle \cdot,\cdot\rangle)\) is a Hilbert space, is continuous if and only if it is bounded, that is, if there exists a constant \(C\geq 0\) such that \(\left\vert T(x)\right\vert\leq C\left\Vert x\right\Vert\) for every \(x\) in \(H\).
Riesz Representation Theorem
Let \(H\) be a Hilbert space, and \(T:H\to \mathbb{R}\) be a continuous linear functional. Then there exists a unique \(y\) in \(H\) such that \(T(x)=\langle y,x\rangle\) for every \(x\) in \(H\).
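On a finite probability space, the Riesz representation can be made completely explicit: the representer \(y\) is recovered by evaluating the functional on the indicators of the atoms. The following sketch illustrates this in the finite-dimensional case only, for an arbitrarily chosen functional; it is an illustration, not a proof of the theorem.

```python
# Riesz representation in the finite-dimensional case: on the space L^2(P)
# from the previous sketch, any linear functional T has the form
# T(X) = <Y, X>, and the representer Y can be read off from the values of T
# on the indicators of the atoms.  The functional chosen below is arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n = 6
P = rng.random(n)
P /= P.sum()

def inner(X, Y):
    return float(np.sum(X * Y * P))

a = rng.normal(size=n)

def T(X):
    """Some continuous linear functional on L^2(P), chosen for illustration."""
    return float(np.sum(a * X))

# Since <Y, 1_{k}> = Y_k * P_k, the representer is Y_k = T(1_{k}) / P_k.
indicators = np.eye(n)
Y = np.array([T(indicators[k]) for k in range(n)]) / P

X = rng.normal(size=n)
assert abs(T(X) - inner(Y, X)) < 1e-10
```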
This theorem allows us to treat the following central theorem of measure theory in a rather simple way.
Radon-Nikodym Theorem
Let \((\Omega, \mathcal{F})\) be a measurable space and \(P, Q\) two finite measures on \(\mathcal{F}\) such that \(Q \ll P\). Then there exists a \(P\)-almost surely unique and positive random variable \(Z\) in \(L^1(P)\) such that
\[ Q[A]=\int_A Z\,dP \quad\text{for every } A \in \mathcal{F}. \]
The random variable \(Z\) is called the Radon-Nikodym derivative of \(Q\) with respect to \(P\) and denoted by \(dQ/dP\).
In particular, it holds
\[ \int X \,dQ=\int X \frac{dQ}{dP}\,dP \]
for every random variable \(X\) which is positive or in \(L^1(Q)\).
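On a finite space that \(P\) charges entirely, the Radon-Nikodym derivative is simply the pointwise ratio of the atom masses, and the change-of-measure formula can be verified directly. The following sketch assumes such a finite setting with made-up measures.

```python
# A sketch of the change of measure on a finite space (an assumption made for
# the illustration): if P charges every atom, then Q << P and the density is
# the pointwise ratio Z = Q({w}) / P({w}).  Both the defining property
# Q[A] = E_P[Z 1_A] and the formula E_Q[X] = E_P[Z X] can be checked directly.
import numpy as np

rng = np.random.default_rng(2)
n = 5
P = rng.random(n); P /= P.sum()        # P charges every atom
Q = rng.random(n); Q /= Q.sum()        # hence Q << P automatically

Z = Q / P                              # the Radon-Nikodym derivative dQ/dP

X = rng.normal(size=n)                 # an arbitrary random variable
assert abs(np.sum(X * Q) - np.sum(Z * X * P)) < 1e-12    # E_Q[X] = E_P[Z X]

A = np.array([1.0, 0.0, 1.0, 1.0, 0.0])                  # an event as indicator
assert abs(np.sum(A * Q) - np.sum(Z * A * P)) < 1e-12    # Q[A] = E_P[Z 1_A]
```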
Proof
The proof is based on an argument due to John von Neumann. The measure
\[ \tilde{Q}:=P+Q \]
is such that \(\tilde{Q}\) is equivalent to \(P\). Indeed, for any event \(A\), on the one hand, \(P[A]=0\) implies that \(Q[A]=0\) and therefore \(\tilde{Q}[A]=P[A]+Q[A]=0\). On the other hand, the positivity of \(Q\) and \(P\) implies \(P[A]=0\) whenever \(\tilde{Q}[A]=0\), showing \(P\sim \tilde{Q}\). The equivalence between \(P\) and \(\tilde{Q}\) implies in particular that two random variables \(X\) and \(Y\) agree \(P\)-almost surely if and only if they agree \(Q\)-almost surely. Hence, \(L^0(P)=L^0(\tilde{Q})\). For any random variable \(X\) in \(L^0(P)=L^0(\tilde{Q})\), it holds
\[ \int X^2 \,dP\leq \int X^2 \,dP+\int X^2 \,dQ=\int X^2 \,d\tilde{Q}, \]
showing that \(L^2(\tilde{Q})\subseteq L^2(P)\).
Define now the linear functional
\[ T:L^2(\tilde{Q})\to \mathbb{R}, \quad X\mapsto T(X)=\int X \,dP, \]
which is well-defined since \(L^2(\tilde{Q})\subseteq L^2(P)\subseteq L^1(P)\), the latter inclusion because \(P\) is finite. Using Jensen's inequality and the fact that \(\tilde{Q}/\tilde{Q}(\Omega)\) is a probability measure, it holds
\[ \left( \int \left\vert X\right\vert \frac{d\tilde{Q}}{\tilde{Q}(\Omega)} \right)^{2}\leq \int X^{2}\, \frac{d\tilde{Q}}{\tilde{Q}(\Omega)}. \]
As a consequence,
\[ \left\vert T(X)\right\vert \leq \int \left\vert X\right\vert dP\leq \int \left\vert X\right\vert d\tilde{Q}\leq \tilde{Q}(\Omega)^{1/2}\left( \int X^{2}\,d\tilde{Q} \right)^{1/2}=\tilde{Q}(\Omega)^{1/2}\left\Vert X\right\Vert_{L^{2}(\tilde{Q})}, \]
showing that \(T\) is a continuous linear functional on the Hilbert space \(L^2(\tilde{Q})\). Applying the Riesz Representation Theorem, there exists \(Y\) in \(L^2(\tilde{Q})\) such that
\[ T(X)=\int X \,dP=\int XY \,d\tilde{Q} \quad\text{for every } X \in L^2(\tilde{Q}). \]
Taking \(X=1_A\) for \(A=\{Y\leq 0\}\), it follows that \(P[A]=0\), showing that \(Y>0\) \(P\)-almost surely. Similarly, we obtain \(0<Y\leq 1\) \(\tilde{Q}\)- and \(P\)-almost surely. Defining \(Z=1/Y-1\), which is a positive measurable function in \(L^1(P)\), it follows that
\[ Q[A]=\int_A \frac{1-Y}{Y}\,dP=\int_A Z \,dP \quad\text{for every } A \in \mathcal{F}, \]
which ends the proof of existence.
Uniqueness is left as an exercise.
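The device of the proof can also be observed numerically: on a finite space, the Riesz representer of \(T\) is \(Y=dP/d\tilde{Q}\), and \(Z=1/Y-1\) indeed coincides with \(dQ/dP\). The sketch below assumes a finite space with strictly positive \(P\)-masses and made-up measures.

```python
# A numerical look at the device of the proof (finite space, all P-masses
# assumed strictly positive): with Qtilde = P + Q, the Riesz representer of
# T(X) = E_P[X] on L^2(Qtilde) is Y = dP/dQtilde, it takes values in (0, 1],
# and Z = 1/Y - 1 recovers dQ/dP.
import numpy as np

rng = np.random.default_rng(3)
n = 5
P = rng.random(n); P /= P.sum()
Q = rng.random(n); Q /= Q.sum()

Qtilde = P + Q                     # the auxiliary finite measure of the proof
Y = P / Qtilde                     # dP/dQtilde, with values in (0, 1]
Z = 1.0 / Y - 1.0                  # the construction Z = 1/Y - 1

assert np.allclose(Z, Q / P)       # Z is indeed the density dQ/dP

# Y represents T(X) = E_P[X] as an inner product in L^2(Qtilde).
X = rng.normal(size=n)
assert abs(np.sum(X * P) - np.sum(X * Y * Qtilde)) < 1e-12
```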
Conditional Expectation
The Radon-Nikodym Theorem allows for a simple proof of the existence of conditional expectations.
Theorem: Conditional Expectation
Let \((\Omega,\mathcal{F},P)\) be a probability space and \(\mathcal{G}\subseteq \mathcal{F}\) be a sub-\(\sigma\)-algebra. For every integrable random variable \(X\), there exists a \(P\)-almost surely unique \(\mathcal{G}\)-measurable and integrable random variable \(Y\) such that
\[ E\left[ X1_A \right]=E\left[ Y1_A \right] \quad\text{for every } A \in \mathcal{G}. \]
Denoting \(E[X|\mathcal{G}]:=Y\), and provided the random variables below are all in \(L^1\), it holds:
- \(X\mapsto E[X | \mathcal{G}]\) is linear, monotone, and \(E[c|\mathcal{G}]=c\) for every constant \(c\).
- \(E[X|\mathcal{G}]=E[X]\) if \(\mathcal{G}=\{\emptyset, \Omega\}\).
- \(E[X_n|\mathcal{G}]\nearrow E[X|\mathcal{G}]\) whenever \(0\leq X_n\nearrow X\).
- \(E[YX|\mathcal{G}]=YE[X|\mathcal{G}]\) and \(E[Y|\mathcal{G}]=Y\) whenever \(Y\) is \(\mathcal{G}\)-measurable.
- \(E[E[X|\mathcal{G}_2]|\mathcal{G}_1]=E[X|\mathcal{G}_1]\) for every two \(\sigma\)-algebras \(\mathcal{G}_1\subseteq \mathcal{G}_2\subseteq \mathcal{F}\).
- \(\varphi(E[X|\mathcal{G}])\leq E[\varphi(X)|\mathcal{G}]\) if \(\varphi\) is convex with \(\varphi(X)\in L^1\).
- \(E[\left\vert E[X|\mathcal{G}]\right\vert]\leq E[\left\vert X\right\vert]\).
- \(E[\liminf X_n|\mathcal{G}]\leq \liminf E[X_n|\mathcal{G}]\) if \(X_n\geq Y \in L^1\) and \(\liminf X_n \in L^1\).
- \(E[X|\mathcal{G}]=\lim E[X_n |\mathcal{G}]\) if \(X_n \to X\) almost surely and \(|X_n|\leq Y\in L^1\).
- \(E[XE[Y|\mathcal{G}]]=E[E[X|\mathcal{G}]Y]=E[E[X|\mathcal{G}]E[Y|\mathcal{G}]]\).
This unique random variable is called the \(\mathcal{G}\)-conditional expectation of \(X\), and is denoted by \(E[X|\mathcal{G}]\).
Proof
For \(X\) in \(L^1\), the positive and negative parts \(X^{\pm}\) define two finite measures on \(\mathcal{G}\) given by
\[ Q^{\pm}[A]:=E\left[ X^{\pm}1_A \right], \quad A \in \mathcal{G}, \]
which are by definition both absolutely continuous with respect to \(P\) restricted to \(\mathcal{G}\). It follows from the Radon-Nikodym Theorem that there exist two \(P\)-almost surely unique positive \(\mathcal{G}\)-measurable random variables \(Z^{\pm}\in L^1(\mathcal{G})\) such that
\[ Q^{\pm}[A]=E\left[ Z^{\pm}1_A \right] \quad\text{for every } A \in \mathcal{G}. \]
Defining \(E\left[ X|\mathcal{G} \right]=Z^+-Z^-\in L^1(\mathcal G)\) as the conditional expectation ends the proof of existence and uniqueness.
In the following, we assume that the integrability conditions required for each assertion are satisfied.
- Point 1: Follows immediately from the linearity and monotonicity of the expectation.
- Point 2: Let \(Y=E[X|\mathcal{G}]\). Since \(\mathcal{G}\) is the trivial \(\sigma\)-algebra, it follows that \(Y\) is constant. Furthermore, \(E[Y1_\Omega]=E[X1_{\Omega}]=E[X]\), hence \(Y=E[X]\) by definition of the conditional expectation.
- Point 3: For \(X_n \nearrow X\) almost surely, the monotonicity of the conditional expectation implies that \(\left(E[X_n|\mathcal{G}]\right)\) is an increasing sequence bounded above by \(E[X|\mathcal{G}]\); denote its limit by \(Y\).
For \(A=\{Y < E[X|\mathcal{G}]\}\), which is in \(\mathcal{G}\), we have by monotone convergence
\[ E[ 1_A (E[ X|\mathcal{G} ]-Y) ]=\lim E[ 1_A E[X-X_n|\mathcal{G}] ]=\lim E[ 1_A(X-X_n)]=0, \]
showing that \(P[A]=0\) since \(E[X|\mathcal{G}]\geq Y\) with strict inequality on \(A\).
- Point 4: For \(Y=1_A\) with \(A \in \mathcal{G}\), it holds
\[ E[ 1_BE[ YX|\mathcal{G} ]]=E[ 1_B YX ]=E[ 1_{B\cap A}X ]=E[ 1_{B\cap A}E[ X|\mathcal{G} ]]=E[1_B YE[ X|\mathcal{G} ]] \quad\text{for every } B \in \mathcal{G}. \]
Since \(YE[X|\mathcal{G}]\) is \(\mathcal{G}\)-measurable, by uniqueness of the conditional expectation we get \(YE[X|\mathcal{G}]=E[YX|\mathcal{G}]\) for every \(Y\) of the form \(Y=1_A\) with \(A\) in \(\mathcal{G}\). By linearity of the conditional expectation, the identity extends to every simple \(\mathcal{G}\)-measurable \(Y\). Approximating from below by simple functions and using Point 3, it holds for every positive \(\mathcal{G}\)-measurable random variable \(Y\).
The general case follows from the linearity of the conditional expectation and the decomposition \(Y=Y^+-Y^-\).
- Point 5: Let \(A\) be in \(\mathcal{G}_1\subseteq \mathcal{G}_2\). Applying the definition of the conditional expectation twice, it follows that
\[ E[1_A E[E[X|\mathcal{G}_2]|\mathcal{G}_1]]=E[1_A E[ X|\mathcal{G}_2]]=E[1_AX], \]
showing that \(E[E[X|\mathcal{G}_2]|\mathcal{G}_1]=E[X|\mathcal{G}_1]\) by uniqueness of the conditional expectation.
- Point 6: Writing the convex function \(\varphi\) as the supremum of countably many affine functions, this follows from the linearity and monotonicity of the conditional expectation from Point 1, together with the monotone convergence from Point 3.
- Point 7: Applying Point 6 with \(\varphi(x)=|x|\) and taking expectations yields \(E[\left\vert E[X|\mathcal{G}]\right\vert]\leq E[E[|X|\,|\mathcal{G}]]=E[|X|]\).
- Point 8: For the Fatou assertion, it follows from Point 3 by considering the increasing sequence \(Z_n=\inf_{k\geq n}X_k\), which increases to \(\liminf X_k\), after subtracting \(Y\) to reduce to the case of positive random variables.
- Point 9: This is the conditional version of the dominated convergence theorem; it follows from Point 8 applied to the positive sequences \(Y+X_n\) and \(Y-X_n\).
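To see the defining property and the tower property at work, consider the elementary case where \(\mathcal{G}\) is generated by a finite partition: on each cell, the conditional expectation is the \(P\)-weighted average of \(X\) over that cell. The following sketch checks this on a small, arbitrarily chosen example; the cell-average formula used here is specific to this finite setting.

```python
# A sketch of the conditional expectation when the sigma-algebra is generated
# by a finite partition (an assumption made purely for illustration): on each
# cell, E[X|G] equals the P-weighted average of X over that cell.  We check
# the defining property E[1_A E[X|G]] = E[1_A X] on the generating cells and
# the tower property with a coarser partition.
import numpy as np

rng = np.random.default_rng(4)
n = 8
P = rng.random(n); P /= P.sum()          # a probability on an 8-point space
X = rng.normal(size=n)                   # an arbitrary random variable

cells = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]  # partition

def cond_exp(X, P, cells):
    """E[X|G] for the sigma-algebra G generated by the partition `cells`."""
    Y = np.empty_like(X)
    for cell in cells:
        Y[cell] = np.sum(X[cell] * P[cell]) / np.sum(P[cell])
    return Y

Y = cond_exp(X, P, cells)

# Defining property on every generating cell A of G.
for cell in cells:
    assert abs(np.sum(Y[cell] * P[cell]) - np.sum(X[cell] * P[cell])) < 1e-12

# Taking A = Omega gives E[E[X|G]] = E[X].
assert abs(np.sum(Y * P) - np.sum(X * P)) < 1e-12

# Tower property: G_1 generated by the coarser partition {{0,...,4}, {5,6,7}}.
coarse = [np.array([0, 1, 2, 3, 4]), np.array([5, 6, 7])]
assert np.allclose(cond_exp(Y, P, coarse), cond_exp(X, P, coarse))
```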
For your interest, here is a proof of the existence of the conditional expectation using Hilbert space projections.
Proof
Suppose first that \(X \in L^2(\mathcal{F})\). Note that \(L^2(\mathcal{F})\) is a Hilbert space for the norm \(\left\Vert\cdot\right\Vert_2\) and \(L^2(\mathcal{G})\) is a closed linear subspace of \(L^2(\mathcal{F})\). Hence, by Hilbert's projection theorem, there exists a unique \(Y \in L^2(\mathcal{G})\) such that \(X-Y\) is orthogonal to \(L^2(\mathcal{G})\). Since \(1_A \in L^2(\mathcal{G})\) for every \(A \in \mathcal{G}\), it follows that
\[ E[(X-Y)1_A]=\langle X-Y,1_A\rangle=0, \quad\text{that is,}\quad E[X1_A]=E[Y1_A] \quad\text{for every } A \in \mathcal{G}, \]
showing the main assertion for \(X\) in \(L^2(\mathcal{F})\). The general case \(X\) in \(L^1\) can then be obtained by approximating the positive and negative parts \(X^{\pm}\) from below by the bounded random variables \(X^{\pm}\wedge n\) and passing to the limit by monotone convergence.
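The projection point of view can also be checked numerically in the finite setting of the previous sketch: projecting \(X\) orthogonally, in the \(L^2(P)\) inner product, onto the span of the cell indicators reproduces the cell averages. The weighted least-squares formulation below is one possible way to compute that projection; it is only an illustration of the idea, not part of the proof.

```python
# The Hilbert-projection viewpoint on the same finite setting (illustrative
# only): E[X|G] is the orthogonal projection of X, in the L^2(P) inner
# product, onto the span of the indicators of the partition cells.  A
# weighted least-squares solve reproduces the cell-average formula.
import numpy as np

rng = np.random.default_rng(4)
n = 8
P = rng.random(n); P /= P.sum()
X = rng.normal(size=n)

cells = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]

# Design matrix whose columns are the indicators 1_{A_i} of the cells.
B = np.zeros((n, len(cells)))
for i, cell in enumerate(cells):
    B[cell, i] = 1.0

# Minimize sum_k P_k * (X_k - (B c)_k)^2, i.e. an ordinary least-squares
# problem after rescaling both sides by sqrt(P).
w = np.sqrt(P)
c, *_ = np.linalg.lstsq(B * w[:, None], X * w, rcond=None)
Y_proj = B @ c

# The explicit cell-average formula for a partition-generated sigma-algebra.
Y_avg = np.empty(n)
for cell in cells:
    Y_avg[cell] = np.sum(X[cell] * P[cell]) / np.sum(P[cell])

assert np.allclose(Y_proj, Y_avg)
```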