Probability
Probability Space
In probability, we consider:
- A state space \( \Omega \) of states \( \omega \in \Omega \): a description of the possible outcomes about which there is uncertainty.
- Events \( A \subseteq \Omega \): collections of states that can occur. The family of all considered events \( A \subseteq \Omega \) is denoted by \( \mathcal{F} \).
Examples
- Coin Flipping:
- State space: \( \Omega = \{H, T\} \), where \( H \) and \( T \) denote the states "Head occurs" and "Tail occurs" as the possible outcomes of throwing a coin.
- Events: \( A = \{H\} \) is the event that head will occur.
- Temperature tomorrow:
- State space: \( \Omega = \mathbb{R} \), where \( x \in \Omega \) represents the possible temperature at 8:00 am tomorrow.
- Events: \( A = [13,19] \) is the event that tomorrow at 8:00 am, the temperature will lie between \( 13 \) and \( 19 \) degrees.
- Financial decision:
- State space: \( \Omega = [-1,10]^2 \), where for \( (x, y) \in \Omega \), \( x \) and \( y \) represent the interest rates that the central banks of the USA and EU, respectively, will fix next month.
- Events: \( A = \left([0.25,0.75] \times [0.9,1.8]\right) \cup \left(\{1\} \times [1.7,2.1]\right) \) is the event that next month the USA fixes an interest rate between \( 0.25\% \) and \( 0.75\% \) while the EU fixes one between \( 0.9\% \) and \( 1.8\% \), OR the USA fixes an interest rate of \( 1\% \) while the EU fixes one between \( 1.7\% \) and \( 2.1\% \).
- Texas Hold'em:
For Texas Hold'em, we have a 52-card deck \(D\).
After the pre-flop, flop, turn and river (if you are still in the hand), you form your best 5-card hand out of the five cards on the table and the two in your hand.
- State space: \( \Omega = \{\{c_1, c_2, c_3, c_4, c_5\} \colon c_i \in D \text{ and } c_i \neq c_j \text{ for } i \neq j\} \). Note here the notation in terms of sets, since the order does not count. Furthermore, each card is different since dealing occurs without replacement.
- Events: the event \(A\) that I have a royal flush corresponds to \(A\) consisting of the four hands \(\{10, J, Q, K, A\}\) of a single suit, one for each of the four suits.
Events are to be measured afterwards; however, we first require some structure among them. We want to speak about the occurrence of one event or another, about two events happening simultaneously, or about an event not happening. This leads to the definition of a measurable space.
Measurable Space
A measurable space is a tuple \( (\Omega, \mathcal{F}) \), where
- \(\Omega\) is a set (state space)
- \(\mathcal{F}\) is an algebra of subsets of \(\Omega\) (Events)
An algebra is a collection of sets satisfying the following properties
- \( \emptyset \) (nothing happens) and \( \Omega \) (anything can happen) are events.
- If \(A\) is an event, then so is \(A^c\);
- If \(A\) and \(B\) are events, then \(A\cup B\) is an event (the event that \(A\) or \(B\) happen is itself an event)
Warning
Note that this is the intuitive definition of a measurable space, but for mathematical reasons we require the collection of events \(\mathcal{F}\) to be \(\sigma\)-stable: instead of only requiring that the union of two (or finitely many) events is an event, we also require that the union \(\cup_n A_n\) of any countable sequence of events \((A_n) \subseteq \mathcal{F}\) is an event. Such a collection is called a \(\sigma\)-algebra.
If the state space \(\Omega\) is finite or countable, the classical assumption is to take as the algebra of events the power set \(2^{\Omega}\), which is the collection of all subsets. If the state space is uncountable, such as \(\mathbb{R}\), the power set would be too large and would lead to mathematical issues. In the case of \(\mathbb{R}\) for instance, the measurable sets are those generated by the intervals.
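For a finite state space, the power set and the algebra axioms can be checked directly by brute force. Below is a minimal sketch in Python (not part of the lecture; the state space and helper names are chosen purely for illustration).

```python
from itertools import combinations

def power_set(omega):
    """Return the power set 2^Omega as a list of frozensets."""
    return [frozenset(c) for r in range(len(omega) + 1)
            for c in combinations(omega, r)]

omega = {"H", "T"}          # coin-flipping state space from the example above
F = power_set(omega)        # the events: {}, {H}, {T}, {H, T}

# Check the algebra axioms on this finite collection of events.
assert frozenset() in F and frozenset(omega) in F     # the empty set and Omega are events
assert all(frozenset(omega - A) in F for A in F)      # closed under complementation
assert all(A | B in F for A in F for B in F)          # closed under (finite) union
print(len(F))  # 2^2 = 4 events
```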
Proposition
In the definition of an algebra, the third assumption can equivalently be replaced by
- If \(A\) and \(B\) are events, then \(A\cap B\) is an event.
Proof
Let \(A\) and \(B\) be events. From the second assumption it follows that \(A^c\) and \(B^c\) are events. The equivalence between the two assertions (intersection vs. union) now follows from De Morgan's law \( (A \cup B)^c = A^c \cap B^c \) and \( (A \cap B)^c = A^c \cup B^c \).
Examples
Here are some classical examples we will see throughout the lecture.
- Coin toss:
- State Space: \(\Omega = \{-1, 1\}\) two states for head and tail
- Events: \(\mathcal{F} = 2^\Omega = \{\emptyset, \Omega, \{1\}, \{-1\}\}\)
There are here exactly \(2^2 =4\) events.
- Finite state space:
- State Space: \(\Omega = \{\omega_1, \ldots, \omega_N\}\)
- Events: \(\mathcal{F} = 2^\Omega\)
There are here exactly \(2^{\#\Omega} = 2^N\) events (already with \(N\) beyond 100 this is more than a computer can take).
- Random Walk:
The random walk consists of tossing a coin several times in a row, recording every single result.
- State space: \(\Omega = \{\omega = (\omega_1, \ldots, \omega_T)\colon \omega_i = \pm 1\}\), where each state is the sequence of results of the coin tosses.
- Events: \(\mathcal{F}=2^\Omega\).
As above, the cardinality of \(\mathcal{F}\) is equal to \(2^{\# \Omega}\). However, there are \(2^T\) possible sequences, and so the cardinality of the collection of events is equal to \(2^{2^T}\). You can imagine that even for small \(T\) this size is already gigantic.
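As a quick illustration of these cardinalities, here is a minimal Python sketch (not part of the lecture; the value of \(T\) and the variable names are arbitrary choices):

```python
from itertools import product

T = 4  # number of coin tosses, kept deliberately tiny

# State space: all sequences of +/-1 of length T.
omega = list(product([-1, 1], repeat=T))
print(len(omega))        # 2**T = 16 states

# Number of events in the power set 2^Omega (computed, not enumerated!).
print(2 ** len(omega))   # 2**(2**T) = 65536 events already for T = 4
```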
Random Variables
Aside from being able to measure events, we also want to measure events described in terms of a function of the state. For instance, in the case of the coin toss, suppose that you play a game where you win 100 if head occurs and lose everything if tail occurs. As a function of the state, it reads \(X \colon \Omega \to \mathbb{R}\), where \(X(\omega) = 100\) if \(\omega = 1\) and \(X(\omega) = 0\) otherwise. We want to be able to speak about the event \(A\) that you strictly win something, which is clearly \(\{1\}\). In the general case, we define random variables as those functions for which the event that the function lies below any given level can be measured.
Definition
Let \( (\Omega, \mathcal{F}) \) be a measurable space. A function
\[ X \colon \Omega \longrightarrow \mathbb{R} \]
is called a random variable if for every level \(x \in \mathbb{R}\), the set
\[ \{X \leq x\} := \{\omega \in \Omega \colon X(\omega) \leq x\} \]
is an event, that is, \( \{X \leq x\} \in \mathcal{F}\).
The fact that we only require the sets \(\{X \leq x\}\) to be events may seem arbitrary; however, since \(\mathcal{F}\) is a \(\sigma\)-algebra, this is in fact quite general.
Proposition
It is equivalent for \( X: \Omega \to \mathbb{R} \) to be a random variable to require:
- \( \{X > x\} \in \mathcal{F} \) for any \( x \).
- \( \{X < x\} \in \mathcal{F} \) for any \( x \).
- \( \{X \geq x\} \in \mathcal{F} \) for any \( x \).
- \( \{x \leq(<) X \leq(<) y\} \in \mathcal{F} \) for any \( x \leq y \).
Proof
- Follows from \( \{X > x\} = \{X \leq x\}^c \), and \( \mathcal{F} \) is closed under complementation.
- \( \{X < x\} = \cup_{n} \{X \leq x - 1/n\} \), and \( \mathcal{F} \) is closed under countable union.
The other assertions follow from similar arguments.
This definition is compatible with many of the standard operations. In other terms, sums, products, and compositions with continuous functions of random variables remain random variables.
Proposition
Let \( X \) be a random variable and \( f:\mathbb{R}\to \mathbb{R} \) be a continuous function. Then
\[ \omega \longmapsto f(X(\omega)) \]
is a random variable, denoted \( Y = f(X) \).
Let \( X, Y \) be random variables and \( (X_n) \) a converging sequence of random variables. The following are random variables:
- \( aX + bY \) for every \( a, b \in \mathbb{R} \);
- \( XY \);
- \( \max(X, Y) \) and \( \min(X, Y) \);
- \( \sup X_n \) and \( \inf X_n \);
- \( \lim X_n \).
Proof
The first part of the proof is not trivial and has to do with topology as well as the definition of continuous functions. The argument goes as follows: for \(y\) in \(\mathbb{R}\), the set \(F = \{x \in \mathbb{R}\colon f(x) \leq y\}\) is a closed set since \(f\) is continuous (lower semi-continuous would be enough). Now it is possible to show, following the previous proposition, that if \(X\) is a random variable, then \(\{X \in F\}\) is an event since \(F\) is closed. It follows that \(\{f(X) \leq y\} = \{X \in F\}\) is an event, and \(f(X)\) is therefore a random variable.
The first three points follow from the continuity of the corresponding functions. For the \(\sup\) and \(\inf\), the claim follows from \(\{\sup X_n \leq x\} = \cap \{X_n \leq x\}\) and \(\{\inf X_n <x\} = \cup \{X_n <x\}\), with a similar argument for the limit of a converging sequence of random variables.
If you are interested, you can ask for lecture notes on probability.
Example: Indicator and Simple Random Variables
We turn to the simplest yet one of the most important examples of random variables in probability.
- Indicator Function
Definition
Let \( (\Omega, \mathcal{F}) \) be a measurable space and let \( A \in \mathcal{F} \) be an event. The function
\[ \begin{equation*} \begin{split} 1_A \colon & \Omega \longrightarrow \mathbb{R}\\ \omega & \longmapsto 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A \end{cases} \end{split} \end{equation*} \]is called the indicator function of \( A \).
Exercise
The indicator function \(1_A\) of an event \(A\) is a random variable. Indeed, let \(x\) be in \(\mathbb{R}\). It follows that
\[ \{1_A \leq x\} = \begin{cases} \emptyset & \text{if } x<0\\ A^c & \text{if }0\leq x <1 \\ \Omega & \text{if }x \geq 1 \end{cases} \]
Plot: graph of the indicator function \(1_A\).
This definition is strongly related to a truth table: \( 1 \) for true, \( 0 \) for false.
Clearly \( 1_{\emptyset} = 0 \) and \( 1_{\Omega} = 1 \).
Show that:
- If \( A \) and \( B \) are events such that \( A \cap B = \emptyset \), then \( 1_{A \cup B} = 1_A + 1_B \).
- If \( A \) and \( B \) are events, then \( 1_{A \cap B} = 1_A 1_B \).
- If \( A \subseteq B \) are events, then \( 1_A \leq 1_B \).
- Simple Random Variable
Definition: Simple Random Variables
For a family \( A_1, A_2, \ldots, A_n \) of disjoint events and numbers \( \alpha_1, \ldots, \alpha_n \), we can define the simple random variable
\[ X(\omega) = \sum_{k=1}^n \alpha_k 1_{A_k}(\omega) = \begin{cases} \alpha_k & \text{if } \omega \in A_k, \\ 0 & \text{otherwise} \end{cases} \]According to the previous proposition, it follows that \( X \) is also a random variable.
Note that, intuitively, sums and products of simple random variables remain simple random variables; however, one has to be careful and prove it using a common partition of events on which both random variables are constant.
Plot: graph of a simple random variable.
Example: Random Variable on Finite State Space
Let \( \Omega = \{\omega_1, \omega_2, \ldots, \omega_N\} \) be a finite state space and \( \sigma \)-algebra \( \mathcal{F} = 2^\Omega \). We consider a financial market with one stock \( S \) where \( S_0 > 0 \) denotes the price today and \( S_1 \) represents the possible price of the stock tomorrow. The possible evolution of the stock is given as a function:
\[ S_1 \colon \Omega \longrightarrow \mathbb{R}, \qquad \omega_n \longmapsto S_1(\omega_n) = s_n \]
We can also write the stock price as a simple random variable (showing therefore that it is a random variable):
\[ S_1 = \sum_{n=1}^N s_n 1_{A_n} \]
where \( A_n = \{\omega_n\} \). In other terms, the stock price is entirely given by the vector \( (s_1, \ldots, s_N) \). Without any loss of generality, since we have one stock, we may assume that \( s_1 < s_2 < \ldots < s_N \). Also, since the stock price is positive, we also have \( 0 \leq s_1 \). The return \( R_1 = \frac{S_1 - S_0}{S_0} \) is also a random variable that can be described as a vector \( (r_1, \ldots, r_N) \), where
\[ r_n = \frac{s_n - S_0}{S_0} \]
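As a small illustration of this example, here is a minimal Python sketch (not from the lecture; the numerical values of \(S_0\) and the \(s_n\) are assumptions chosen for illustration):

```python
# Stock price tomorrow and its return on a finite state space,
# each represented as a plain list indexed by the states omega_1, ..., omega_N.
S0 = 100.0                         # price today (assumed value)
s = [90.0, 100.0, 110.0, 120.0]    # s_n = S_1(omega_n), one value per state

# Return in each state: r_n = (s_n - S0) / S0
r = [(sn - S0) / S0 for sn in s]
print(r)   # [-0.1, 0.0, 0.1, 0.2]
```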
Probability Measure
Definition: Probability Measure
A probability measure \( P \) on the measurable space \( (\Omega, \mathcal{F}) \) is a function \( P: \mathcal{F} \to [0,1] \) that associates to each event \(A\) the likelihood of this event.
It has the following basic properties:
- \( P[\emptyset] = 0 \) and \( P[\Omega] = 1 \): clearly, the probability that nothing happens is \(0\) and the probability that anything can happen is \(1\).
- \(P[A \cup B] = P[A] + P[B]\) if \(A\) and \(B\) are two disjoint events. In fact, the countable version of this property is assumed, that is, \( P[\cup A_n] = \sum P[A_n] \) for every sequence of pairwise disjoint events \( (A_n) \subseteq \mathcal{F} \).
The triple \( (\Omega, \mathcal{F}, P) \) is called a probability space.
The assumptions for a probability measure are few, however together with the definition of the algebra we can rapidly derive classical properties that are common knowledge.
Lemma
Let \( P \) be a probability measure. For any events \( A \), \( B \), or sequence \( (A_n) \) of events, the following hold:
- \( P[B] = P[A] + P[B \setminus A] \geq P[A] \) whenever \( A \subseteq B \);
- \( P[A^c] = 1 - P[A] \);
- \( P[A \cup B] + P[A \cap B] = P[A] + P[B] \);
- If \( A_1 \subseteq A_2 \subseteq \ldots \subseteq A_n \subseteq \ldots \), then:
\[ P\left[ \cup A_n \right] = \lim P[A_n] \]
- If \( A_1 \supseteq A_2 \supseteq \ldots \supseteq A_n \supseteq \ldots \), then:
\[ P\left[ \cap A_n \right] = \lim P[A_n] \]
In particular, it equals \( 0 \) if \( \cap A_n = \emptyset \).
Proof
We prove some of the points, leaving the others as an exercise.
For the first point, let \( A \subseteq B \). We have \( B = A \cup (B \setminus A) \), where this union is disjoint. By the second property of a probability measure and the positivity of probability:
\[ P[B] = P[A] + P[B \setminus A] \geq P[A] \]
Taking \( B = \Omega \), and using \( P[\Omega] = 1 \), the second point follows.
Using similar arguments, prove the third point.
For the fourth point, construct the sequence of disjoint sets:
\[ B_1 = A_1, \qquad B_n = A_n \setminus A_{n-1} \text{ for } n \geq 2 \]
By induction, it is easy to show:
\[ A_n = \cup_{k \leq n} B_k \quad \text{and} \quad \cup A_n = \cup B_n \]
By additivity of the probability measure:
\[ P[A_n] = \sum_{k \leq n} P[B_k] \]
Thus:
\[ \lim P[A_n] = \sum_{k} P[B_k] \]
By the second property of a probability measure:
\[ P\left[\cup A_n\right] = P\left[\cup B_n\right] = \sum_k P[B_k] \]
Combining these equations shows \( \lim P[A_n] = P[\cup A_n] \).
Follow similar reasoning to prove the last point.
Note: Shorthand Notations in Probability
In probability theory, the following shorthand notations are commonly used: for a random variable \(X\) and a set \(B \subseteq \mathbb{R}\), one writes \(\{X \in B\} := \{\omega \in \Omega \colon X(\omega) \in B\}\) and \(P[X \in B] := P[\{X \in B\}]\); for instance, \(P[X \leq x] = P[\{\omega \colon X(\omega) \leq x\}]\) and \(P[X = x] = P[\{\omega \colon X(\omega) = x\}]\).
Examples
- Probability on Finite Sets: Suppose \( \Omega = \{\omega_1, \ldots, \omega_N\} \) is finite. Each probability measure \( P \) on \( \mathcal{F} = 2^\Omega \) is entirely determined by the values \( p_n = P[\{\omega_n\}] \) for \( n = 1, \ldots, N \). Indeed, every event \(A\) is of the form \(A=\{\omega_n\colon n \in I\}\) for some \(I\subseteq \{1, \ldots, N\}\). It follows that
\[ P[A] = \sum_{\omega \in A} P[\{\omega\}] = \sum_{n \in I} p_n \]This vector \(\boldsymbol{p}=(p_1, \ldots, p_N) \) has the property that \(p_n = P[\{\omega_n\}]\geq 0\) and \(\sum p_n =P[\Omega] = 1\).
Reciprocally, if you give yourself a vector \(\boldsymbol{p}=(p_1, \ldots, p_N)\) with \(p_n \geq 0\) and \(\sum p_n = 1\), it defines a probability \(P\) on \(\mathcal{F}\) by the definition
\[ P[A]:=\sum_{n \in I} p_n \]where \(A = \{\omega_n \colon n \in I\}\). As an exercise, verify that this defines a probability measure.
The set of such vectors is denoted by
\[ \Delta := \left\{ \boldsymbol{p} \in \mathbb{R}^N \colon p_n \geq 0, \, \sum p_n = 1 \right\} \]An important case is when \( p_n = 1/N \) for all \( n \). This is called the uniform probability distribution.
- Probability on the Coin Toss Space: Let \( \Omega = \{\omega = (\omega_1, \ldots, \omega_T) : \omega_t = \pm 1\} \), a finite state space. Assuming that the probability of heads is \( p \) and that the coin tosses are independent, the probability is:
\[ P[\{\omega = (\omega_1, \ldots, \omega_T)\}] = p^l q^{T-l} \]where \( q = 1 - p \) and \( l \) is the number of times \( \omega_t = 1 \) for \( t = 1, \ldots, T \).
- Normal Distribution: For \( \Omega = \mathbb{R} \) and \( \mathcal{F} \) the \( \sigma \)-algebra of \( \mathbb{R} \) generated by the intervals, define for any event \(A\) the probability
\[ P[A] = \frac{1}{\sigma \sqrt{2\pi}} \int_A e^{-\frac{(x-\mu)^2}{2\sigma^2}} \lambda(dx) \]where \( \lambda \) is the Lebesgue measure on \( \mathbb{R} \), the one measuring intervals. This is the normal distribution. For example, temperatures in Shanghai at this time of year may follow a normal distribution around 24°C with variance 1.
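To make the last example concrete, here is a minimal Python sketch (not from the lecture) computing the probability of an interval under a normal distribution; the interval \([23, 25]\) and the use of the error function are illustrative choices:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P[X <= x] for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Shanghai temperature example: mean 24 and variance 1 (so sigma = 1).
mu, sigma = 24.0, 1.0

# P[A] for the event A = [23, 25]: the temperature lies within one degree of 24.
p = normal_cdf(25.0, mu, sigma) - normal_cdf(23.0, mu, sigma)
print(round(p, 4))   # approximately 0.6827
```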
Integration
The historical idea behind integration was to measure areas below a function. The expectation in probability brings exactly the same intuition to this more abstract level.
Consider the simple example of the indicator function \(1_A\): it represents a rectangle of height \(1\) and width given by the measure of \(A\), that is, \(P[A]\). Hence, the area of the rectangle, or expectation of the indicator function, is given by \(E[1_A]=1 \times P[A]\).
Extending this concept is straightforward for any positive simple random variable.
- Integration of Simple Random Variable
Definition: Expectation 1.0
Let \((\Omega,\mathcal{F},P)\) be a probability space. Given a simple random variable
\[ X = \sum_{k\leq n} \alpha_k 1_{A_k} \]we define the expectation of \(X\) with respect to \(P\) as
\[ E[X]:=\sum_{k\leq n} \alpha_k P[A_k] \]
Plot: a simple random variable and the area below its graph.
Warning
One needs to be careful that this definition is independent of the representation of the simple random variable. Indeed, we have \(X= 1_A + 1_B = 1_{A\cup B}\) if \(A\) and \(B\) are disjoint for instance. Luckily, by the properties of the probability measure, this random variable has the same expectation for the two representations.
Proposition
The two following important properties of the expectation of simple random variables can be rapidly checked.
- Monotonicity: \(E[X]\leq E[Y]\) whenever \(X\leq Y\).
- Linearity: \(E[aX+bY]=aE[X]+bE[Y]\).
The proof is easy and left to you.
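The definition of the expectation of simple random variables and these two properties can also be checked numerically. Here is a minimal Python sketch (not from the lecture; the state space, probabilities and coefficients are illustrative assumptions):

```python
omega = ["w1", "w2", "w3", "w4"]
P = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}   # assumed probability of each singleton

def indicator(A):
    """The indicator function 1_A as a dict state -> 0 or 1."""
    return {w: 1.0 if w in A else 0.0 for w in omega}

def expectation(X):
    """E[X] = sum over states of X(omega) * P[{omega}]."""
    return sum(X[w] * P[w] for w in omega)

# Simple random variable X = 5 * 1_{w1, w2} + 2 * 1_{w3}
X = {w: 5.0 * indicator({"w1", "w2"})[w] + 2.0 * indicator({"w3"})[w] for w in omega}
print(expectation(X))   # 5 * P[{w1, w2}] + 2 * P[{w3}] = 5 * 0.3 + 2 * 0.3 = 2.1

# Linearity: E[2X + 3] = 2 E[X] + 3 (here 3 stands for 3 * 1_Omega)
Y = {w: 2.0 * X[w] + 3.0 for w in omega}
print(round(expectation(Y), 10), round(2 * expectation(X) + 3, 10))   # both 7.2
```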
Exercise
Given a simple random variable \(X\) show that
- If \(X\) is positive, then \(E[X]>0\) if and only if \(P[X>0] >0\).
- If \(X\) is positive, then \(E[X] = 0\) if and only if \(P[X = 0]=1\).
We can now define the expectation of an arbitrary positive random variable. The idea is to approximate this random variable from below by simple ones and take the limit.
Plots: a first and a second approximation of a positive random variable from below by simple random variables.
Note
Though the definition of the expectation does not rely on the explicit construction of an approximating sequence, it is possible to formalize the idea illustrated in the plots above.
Given a positive random variable \(X\), the strategy is as follows: for every natural number \(n\), divide the ever growing vertical interval \([0, n)\) into \(n2^n\) sub-intervals \(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right)\) for \(k=0, \ldots, n2^n-1\). Define now
\[ A_{n,k} = \left\{ \frac{k}{2^n} \leq X < \frac{k+1}{2^n} \right\} \]
It follows that the sequence \((X_n)\) of simple random variables defined as
\[ X_n = \sum_{k=0}^{n2^n-1} \frac{k}{2^n} 1_{A_{n,k}} \]
is increasing and converges to \(X\).
Definition: Expectation 1.5
Given a positive random variable \(X\), its expectation is defined as
\[ E[X] := \sup\left\{ E[Y] \colon Y \text{ simple random variable with } 0 \leq Y \leq X \right\} \]
This is well defined but possibly equal to \(\infty\). For this definition it also holds that, for two positive random variables \(X\) and \(Y\) and positive numbers \(a\) and \(b\), \(E[aX + bY] = aE[X] + b E[Y]\), as well as \(E[X]\leq E[Y]\) if \(X\leq Y\).
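The approximation from below described in the note can be illustrated numerically. Here is a minimal Python sketch (not from the lecture; the finite state space, the probability weights and the values of \(X\) are assumptions) showing that \(E[X_n]\) increases towards \(E[X]\):

```python
from math import floor

omega_probs = [0.2, 0.3, 0.5]    # assumed probability weights p_n
x_values = [0.7, 1.9, 3.14]      # X(omega_n), a positive random variable

def expectation(values):
    return sum(p * v for p, v in zip(omega_probs, values))

def dyadic_approx(values, n):
    """X_n(omega) = floor(2^n X(omega)) / 2^n if X(omega) < n, and 0 otherwise."""
    return [floor(2**n * v) / 2**n if v < n else 0.0 for v in values]

for n in (1, 2, 4, 8):
    print(n, expectation(dyadic_approx(x_values, n)))
print("limit:", expectation(x_values))   # the values above increase towards E[X] = 2.28
```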
To consider general random variables, we need to assume integrability.
Definition: Expectation 2.0
A random variable \(X\) is called integrable if \(E[X^+]<\infty\) and \(E[X^-]<\infty\), where \(X^+ = \max(X, 0)\) and \(X^- = \max(-X, 0)\) denote the positive and negative parts of \(X\). The expectation of an integrable random variable is then defined as
\[ E[X] := E[X^+] - E[X^-] \]
On the set of integrable random variables, which is a vector space, the expectation is also linear and monotone.
The following fundamental theorem is due to Lebesgue. It tells under which conditions it is possible to swap limit and expectation.
Theorem
Let \((X_n)\) be a sequence of random variables. The following holds true
- Monotone Convergence: If \((X_n)\) are positive and increasing, that is, \(X_1\leq X_2 \leq \cdots\), it holds that
\[ \sup E[X_n] = \lim E[X_n] = E[\sup X_n] = E[\lim X_n] \]
- Fatou's Lemma: If \((X_n)\) are positive, then it holds that
\[ E\left[ \liminf X_n \right]:=E\left[ \sup_n \inf_{k\geq n} X_k\right] \leq \liminf E[X_n] \]
- Lebesgue's Dominated Convergence: If \(X_n(\omega) \to X(\omega)\) for all \(\omega\) (or at least in probability) and \(|X_n|\leq Y\) for some integrable random variable \(Y\), then it holds that
\[ \lim E[X_n] = E[\lim X_n] = E[X] \]
Proof
We start with the monotone convergence and write \(X = \sup X_n = \lim X_n\).
By monotonicity, we clearly have \(E[X_n]\leq E[X]\) for every \(n\), therefore \(\sup E[X_n]\leq E[X]\).
Reciprocally, suppose that \(E[X]<\infty\) and pick \(\varepsilon>0\) and a positive simple random variable \(Y\) such that \(Y\leq X\) and \(E[X]-\varepsilon\leq E[Y]\). For \(0<c<1\), define the sets \(A_n=\{X_n\geq cY\}\). Since \(X_n\) is increasing to \(X\), it follows that \(A_n\) is an increasing sequence of events. Furthermore, since \(cY\leq Y\leq X\) and \(cY<X\) on \(\{X>0\}\), it follows that \(\cup A_n=\Omega\). By non-negativity of \(X_n\) and monotonicity, it follows that
\[ E[X_n] \geq E[X_n 1_{A_n}] \geq E[cY 1_{A_n}] = c E[Y 1_{A_n}] \]
and so
\[ \sup_n E[X_n] \geq c \sup_n E[Y 1_{A_n}] \]
Since \(Y=\sum_{l\leq k} \alpha_l 1_{B_l}\) for positive numbers \(\alpha_1,\ldots,\alpha_k\) and events \(B_1,\ldots, B_k\), it follows that
\[ E[Y 1_{A_n}] = \sum_{l\leq k} \alpha_l P[A_n \cap B_l] \]
However, since \(P\) is a probability measure, and \(A_n\) is increasing to \(\Omega\), it follows from the lower semi-continuity of probability measures that \(P[A_n\cap B_l]\nearrow P[\Omega\cap B_l]=P[B_l]\), and so
\[ E[Y 1_{A_n}] \nearrow \sum_{l\leq k} \alpha_l P[B_l] = E[Y] \]
Consequently
\[ \sup_n E[X_n] \geq c E[Y] \geq c\left(E[X] - \varepsilon\right) \]
which, by letting \(c\) converge to \(1\) and \(\varepsilon\) to \(0\), yields the result.
The case where \(E[X]=\infty\) is similar and left to the reader.
As for Fatou's lemma, define \(Y_n =\inf_{k\geq n} X_k\), which by construction is an increasing sequence of positive random variables. It follows from monotone convergence that
\[ E\left[\liminf X_n\right] = E\left[\sup_n Y_n\right] = \sup_n E[Y_n] = \lim E[Y_n] \]
On the other hand, it clearly holds that \(X_k \geq Y_n\) for every \(k\geq n\) and therefore \(\inf_{k\geq n} E[X_k] \geq E[Y_n]\). Combined with the previous inequality we get
\[ E\left[\liminf X_n\right] = \lim E[Y_n] \leq \lim_n \inf_{k\geq n} E[X_k] = \liminf E[X_n] \]
As for the dominated convergence of Lebesgue, we have by assumption that \(X_n+Y\) is a sequence of positive random variables, which by Fatou's lemma yields
\[ E[X] + E[Y] = E\left[\liminf (X_n + Y)\right] \leq \liminf E[X_n + Y] = \liminf E[X_n] + E[Y] \]
Reciprocally, \(Y - X_n\) is a sequence of positive random variables for which it also holds that
\[ E[Y] - E[X] = E\left[\liminf (Y - X_n)\right] \leq \liminf E[Y - X_n] = E[Y] - \limsup E[X_n] \]
Combining both inequalities yields
\[ \limsup E[X_n] \leq E[X] \leq \liminf E[X_n] \]
Since \(\liminf \leq \limsup\) always holds, with equality if and only if the limit exists, we deduce that \(E[X] = \lim E[X_n]\).
Example
- Integration for the simple coin toss: Let \(\Omega =\{\omega_1,\omega_2\}\), \(p=P[\{\omega_1\}]\) and \(q=1-p=P[\{\omega_2\}]\).
Every random variable \(X:\Omega \to \mathbb{R}\) is entirely determined by the values \(X(\omega_1) = x_1\) and \(X(\omega_2)=x_2\).
It follows that
\[ E[X]=pX(\omega_1)+qX(\omega_2) = p x_1 + (1-p)x_2 \]
- Integration in the finite state case: Let \(\Omega=\{\omega_1,\ldots,\omega_N\}\) be a finite state space.
The probability measure is entirely given by the vector \(\boldsymbol{p}=(p_1,\ldots,p_N)\in \mathbb{R}^N\), where \(p_n=P[\{\omega_n\}]\geq 0\) and \(\sum p_n=1\). Every random variable \(X:\Omega \to \mathbb{R}\) can be seen as a vector \(\boldsymbol{x} \in \mathbb{R}^N\), where \(x_n=X(\omega_n)\). It follows that the expectation of \(X\) under \(P\) is given by\[ E[X]=\sum p_n X(\omega_n)=\sum p_n x_n=\boldsymbol{p}\cdot \boldsymbol{x} \]In other terms, the expectation of \(X\) boils down to the scalar product of the probability vector \(\boldsymbol{p}\) with the vector of values \(\boldsymbol{x}\) of the random variable.
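A minimal Python sketch of this scalar-product formula (not from the lecture; the probability vector and the values are illustrative assumptions):

```python
p = [0.1, 0.2, 0.3, 0.4]       # assumed probability vector, sums to 1
x = [10.0, -5.0, 0.0, 2.5]     # values x_n = X(omega_n)

# E[X] = sum_n p_n x_n, the scalar product of p and x.
E_X = sum(pn * xn for pn, xn in zip(p, x))
print(E_X)   # 0.1*10 - 0.2*5 + 0.3*0 + 0.4*2.5 = 1.0
```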
Measure Change
The concept of the expectation of a random variable \( E[X] \) depends, by definition, on the probability measure \( P \). We should therefore write \( E^P[X] \) to signify this dependence. If, on the same measurable space \( (\Omega, \mathcal{F}) \), we are given another probability \( Q \), the question arises: how is \( E^P[X] \) related to \( E^Q[X] \)?
Remark
Before diving into this question, let us first see how, starting from a probability \( P \), we can define a new probability \( Q \). Suppose we are given a random variable \( Z \) such that:
- \( Z \) is positive.
- \( E^P[Z] = 1 \).
We can define the function:
\[ Q[A] := E^P[Z \cdot 1_A], \quad A \in \mathcal{F} \]
This function, for any event \( A \), returns the expectation of \( Z \) over \( A \).
It turns out that this function, under the assumptions on \( Z \), defines a new probability measure.
Specifically:
- \( Q[\emptyset] = E^P[Z \cdot 1_\emptyset] = E^P[0] = 0 \),
- \( Q[\Omega] = E^P[Z \cdot 1_\Omega] = E^P[Z] = 1 \).
Additivity also holds: for any two disjoint events \( A \) and \( B \), \( 1_{A \cup B} = 1_A + 1_B \). Hence:
\[ Q[A \cup B] = E^P[Z \cdot 1_{A \cup B}] = E^P[Z \cdot 1_A] + E^P[Z \cdot 1_B] = Q[A] + Q[B] \]
Warning
To fully define \( Q \) as a probability measure, you must also check \(\sigma\)-additivity. That is, for every sequence \( (A_n) \) of pairwise disjoint events, it must hold:
\[ Q\left[\cup A_n\right] = \sum Q[A_n] \]
Define the random variables \( X_n = Z \cdot 1_{\cup_{k \leq n} A_k} = Z \cdot \left( \sum_{k \leq n} 1_{A_k} \right) \) and let \( X = Z \cdot 1_{\cup A_n} \). Since \( |X_n| \leq Z \), where \( Z \) is integrable, dominated convergence implies:
\[ E^P[X_n] \longrightarrow E^P[X] \]
Meanwhile:
\[ E^P[X_n] = \sum_{k \leq n} E^P[Z \cdot 1_{A_k}] = \sum_{k \leq n} Q[A_k] \longrightarrow \sum Q[A_n] \]
and \( E^P[X] = Q\left[\bigcup A_n\right] \).
Hence, any positive random variable \( Z \) with expectation 1 under \( P \) defines a new probability measure \( Q \).
Furthermore, for any bounded random variable \( X \), it holds that:
\[ E^Q[X] = E^P[Z \cdot X] \]
To see this, consider a simple random variable \( X = \sum \alpha_k \cdot 1_{A_k} \):
\[ E^Q[X] = \sum \alpha_k Q[A_k] = \sum \alpha_k E^P[Z \cdot 1_{A_k}] = E^P\left[Z \cdot \sum \alpha_k 1_{A_k}\right] = E^P[Z \cdot X] \]
The general case follows by approximating \( X \) with simple random variables.
Additionally, \( Q \) is dominated by \( P \) in the sense that \( P[A] = 0 \) implies \( Q[A] = E^P[Z \cdot 1_A] = 0 \).
From this, we see that a positive random variable \( Z \) with expectation 1 allows us to define a new probability \( Q \), dominated by \( P \), and connects expectations under \( Q \) to those under \( P \). The challenging and powerful task is to establish the reciprocal relationship. The key lies in the concepts of absolute continuity or equivalence between probability measures, and the Radon-Nikodym Theorem.
Definition
Given two probability measures \( P \) and \( Q \), we define:
- \( Q \) is absolutely continuous with respect to \( P \) (\( Q \ll P \)) if:
\[ P[A] = 0 \quad \text{implies} \quad Q[A] = 0. \]
- \( Q \) is equivalent to \( P \) (\( Q \sim P \)) if both \( Q \ll P \) and \( P \ll Q \), i.e.:
\[ P[A] = 0 \quad \text{if and only if} \quad Q[A] = 0. \]
By definition:
\[ Q \ll P \quad \text{if and only if} \quad \left( P[A] = 0 \implies Q[A] = 0 \right) \]
or equivalently:
\[ Q[A] > 0 \implies P[A] > 0 \]
In the equivalent case:
\[ Q \sim P \quad \text{if and only if} \quad \left( P[A] = 0 \iff Q[A] = 0 \right) \]
or equivalently:
\[ P[A] > 0 \iff Q[A] > 0 \]
Absolute continuity implies that events unlikely under \( P \) are also unlikely under \( Q \). Equivalence means that \( P \) and \( Q \) agree on which sets are unlikely.
Radon-Nikodym Theorem
On a measurable space \( (\Omega, \mathcal{F}) \), if a probability measure \( Q \) is absolutely continuous with respect to another probability measure \( P \), there exists a (\( P \)-almost surely) unique random variable \( Z \) such that:
\[ Q[A] = E^P[Z \cdot 1_A] \quad \text{for every } A \in \mathcal{F} \]
This unique random variable is called the density of \( Q \) with respect to \( P \) and is denoted \( \frac{dQ}{dP} \).
The notation \( \frac{dQ}{dP} \) is cosmetic; it does not represent a literal ratio. It simplifies expressions such as:
\[ E^Q[X] = E^P\left[\frac{dQ}{dP} \, X\right] \]
This theorem underpins many results in stochastic processes and finance, such as the Black-Scholes-Merton formula. However, proving it requires knowledge of functional analysis, which is beyond this lecture's scope. The proof is simpler in a finite state space.
Exercise
Let \( \Omega = \{\omega_1, \ldots, \omega_n\} \) be a finite state space with \( \sigma \)-algebra \( \mathcal{F} = 2^\Omega \). Suppose \( P \) is a probability measure given by \( \boldsymbol{p} = (p_1, \ldots, p_n) \), where \( P[\{\omega_i\}] = p_i > 0 \) and \( \sum p_i = 1 \). Let \( Q \) be another probability measure on \( (\Omega, \mathcal{F}) \) given by \( \boldsymbol{q} = (q_1, \ldots, q_n) \), where \( Q[\{\omega_i\}] = q_i \geq 0 \) and \( \sum q_i = 1 \).
Since \( P[A] = 0 \) implies \( A = \emptyset \), it follows that \( Q[A] = Q[\emptyset] = 0 \).
Hence, \( Q \ll P \).
Find a random variable \( \frac{dQ}{dP} \colon \Omega \to \mathbb{R} \) such that \( \frac{dQ}{dP} \geq 0 \), \( E^P\left[\frac{dQ}{dP}\right] = 1 \), and:
\[ E^Q[X] = E^P\left[\frac{dQ}{dP} \, X\right] \]
for every random variable \( X \colon \Omega \to \mathbb{R} \). Show that \( \frac{dQ}{dP} \) is unique.
In this finite setting, \( \frac{dQ}{dP} \) can be represented by a vector \( \boldsymbol{z} = (z_1, \ldots, z_n) \) with \( z_i = \frac{dQ}{dP}(\omega_i) \). The conditions reduce to finding \( \boldsymbol{z} \) such that \( z_i \geq 0 \), \( \sum z_i p_i = 1 \), and for every vector \( \boldsymbol{x} = (x_1, \ldots, x_n) \):
\[ \sum_i q_i x_i = \sum_i z_i p_i x_i \]
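In this finite setting one natural candidate can be checked numerically. The following Python sketch is an illustration only (the vectors \(\boldsymbol{p}\), \(\boldsymbol{q}\) and \(\boldsymbol{x}\) are assumptions, and the state-by-state ratio used below is the candidate suggested by the exercise; proving that it works and is unique remains the exercise):

```python
p = [0.25, 0.25, 0.25, 0.25]   # assumed P, uniform
q = [0.10, 0.20, 0.30, 0.40]   # assumed Q

z = [qi / pi for qi, pi in zip(q, p)]   # candidate density dQ/dP, state by state

# Check: z_i >= 0 and E^P[dQ/dP] = sum_i z_i p_i = 1.
assert all(zi >= 0 for zi in z)
assert abs(sum(zi * pi for zi, pi in zip(z, p)) - 1.0) < 1e-12

# Check E^Q[X] = E^P[(dQ/dP) X] for an arbitrary random variable X.
x = [3.0, -1.0, 0.5, 7.0]
E_Q = sum(qi * xi for qi, xi in zip(q, x))
E_P_ZX = sum(pi * zi * xi for pi, zi, xi in zip(p, z, x))
print(E_Q, E_P_ZX)   # the two expectations coincide
```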
Independence
A fundamental concept in probability, distinct from general measure theory, is independence. Intuitively, two events \( A \) and \( B \) are independent if their probability of joint occurrence equals the product of their respective probabilities.
This concept can be extended to random variables and families of events, with significant implications for results in probability theory.
Definition
Given a probability space \( (\Omega, \mathcal{F}, P) \):
- Two events \( A \) and \( B \) are called independent if:
\[ P[A \cap B] = P[A] P[B]. \]
- Two families of events \( \mathcal{C} \) and \( \mathcal{D} \) are independent if any event \( A \) in \(\mathcal{C}\) is independent of any event \( B \) in \(\mathcal{D}\).
- Two random variables \( X \) and \( Y \) are independent if the \(\sigma\)-algebras generated by their information,
\[ \sigma(X) = \sigma(\{X \leq x\} : x \in \mathbb{R}) \quad \text{and} \quad \sigma(Y) = \sigma(\{Y \leq x\} : x \in \mathbb{R}), \]
are independent.
- A collection of families of events \( \mathcal{C}^i \) (with \( i \) indexing the families) is independent if for every finite selection of events \( A^{i_1}, \ldots, A^{i_n} \), where \( A^{i_k}\) is in \(\mathcal{C}^{i_k} \), it holds that:
\[ P\left[ A^{i_1} \cap \cdots \cap A^{i_n} \right] = \prod_{k=1}^n P[A^{i_k}]. \]
- A family (or sequence) of random variables \( (X_i) \) is independent if the family of \(\sigma\)-algebras \( \sigma(X_i) \) is independent.
Warning
The first three points focus on pairwise independence for events, families, or random variables. However, for collections with more than two elements, pairwise independence is insufficient. For example, a sequence of random variables requires a stronger notion of independence that accounts for all finite subsets.
Exercise
Consider a four-element probability space \( \Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\} \) with uniform probability \( P[\{\omega_i\}] = \frac{1}{4} \). Construct three events \( A_1 \), \( A_2 \), and \( A_3 \) such that:
- \( A_1 \) is independent of \( A_2 \),
- \( A_1 \) is independent of \( A_3 \),
- \( A_2 \) is independent of \( A_3 \),
- but \( A_1 \), \( A_2 \), and \( A_3 \) together are not independent.
Formally:
\[ P[A_i \cap A_j] = P[A_i]\, P[A_j] \text{ for } i \neq j, \qquad \text{but} \qquad P[A_1 \cap A_2 \cap A_3] \neq P[A_1]\, P[A_2]\, P[A_3]. \]
If you struggle, ask ChatGPT—it can handle this.
Independence is a strong assumption, but it depends on the probability measure. Even if two events are independent under a specific \( P \), independence might fail under a different measure. This concept is crucial in foundational results such as the law of large numbers and the central limit theorem, which are cornerstones of Monte Carlo methods.
Let us now present a proposition related to independent random variables, which will be further explored in the context of stochastic processes and conditional expectations.
Proposition
Let \( X \) and \( Y \) be two independent bounded random variables. Then:
\[ E[XY] = E[X]\, E[Y]. \]
Proof sketch
Consider the case where \( X = 1_A \) and \( Y = 1_B \) are indicator functions.
Independence of \( X \) and \( Y \) implies that \( A \) and \( B \) are independent.
Hence:
\[ E[XY] = E[1_A 1_B] = E[1_{A \cap B}] = P[A \cap B] = P[A] P[B] = E[X] E[Y]. \]
This reasoning extends easily to simple random variables, as the \(\sigma\)-algebras generated by \( X \) and \( Y \) correspond to the events on which they are defined.
For the general case, approximate \( X \) and \( Y \) by sequences of simple random variables \( (X_n) \) and \( (Y_n) \), and use the properties of independence and limits of expectations.
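A Monte Carlo check of this product formula can be done in a few lines of Python. This is a minimal sketch (not from the lecture); the two distributions and the sample size are arbitrary choices:

```python
import random

random.seed(0)
N = 200_000

# Two independent samples: X uniform on (0, 1), Y normal with mean 2 and standard deviation 1.
xs = [random.uniform(0.0, 1.0) for _ in range(N)]
ys = [random.gauss(2.0, 1.0) for _ in range(N)]

E_X = sum(xs) / N
E_Y = sum(ys) / N
E_XY = sum(x * y for x, y in zip(xs, ys)) / N

print(round(E_XY, 3), round(E_X * E_Y, 3))   # both close to 0.5 * 2 = 1.0
```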
Conditional Expectation
The conditional expectation is the first step towards stochastic processes. It is basically the best approximation, in terms of expectation, given some information. In other terms, given a sub-\(\sigma\)-algebra of events \(\mathcal{G}\subseteq \mathcal{F}\), what is the best approximation of \(X\) knowing only the events in \(\mathcal{G}\)?
Conditional Expectation
Let \((\Omega, \mathcal{F}, P)\) be a probability space, \(X\) a random variable and \(\mathcal{G}\subseteq \mathcal{F}\) a \(\sigma\)-algebra.
Then, there exists a (\(P\)-almost surely) unique random variable \(Y\) with the properties
- \(Y\) is \(\mathcal{G}\)-measurable;
- \(E[Y1_A] = E[X1_A]\) for any event \(A\) in \(\mathcal{G}\).
Proof
The proof of the theorem is a consequence of the Radon-Nikodym theorem. Indeed, define the measures \(Q^+\) and \(Q^-\) by
\[ Q^{\pm}[A] := E[X^{\pm} 1_A], \quad A \in \mathcal{G}, \]
which are measures defined on the smaller \(\sigma\)-algebra of events \(\mathcal{G}\). These measures are absolutely continuous with respect to \(P\), and therefore there exist unique densities \(dQ^{\pm}/dP\) which are \(\mathcal{G}\)-measurable.
Defining
\[ Y = \frac{dQ^+}{dP} - \frac{dQ^-}{dP} \]
gives a unique \(\mathcal{G}\)-measurable random variable satisfying, by definition, the expectation property.
Since the random variable satisfying the two conditions is (\(P\)-almost surely) unique, we can therefore use it as a definition.
Conditional Expectation
The conditional expectation of a random variable \(X\) with respect to \(\mathcal{G}\) is denoted by \(E[X|\mathcal{G}]\) and is defined as the unique random variable which is \(\mathcal{G}\)-measurable and such that \( E[ E[X |\mathcal{G}]1_A] = E[X 1_A]\) for all events \(A\) in \(\mathcal{G}\).
The conditional expectation shares most of the properties of the traditional expectation
Proposition
Let \(X\) be a random variable and \(\mathcal{G} \subseteq \mathcal{F}\) a sub-\(\sigma\)-algebra. The following hold (a small numerical sketch follows the list):
- Expectation: \(E[E[X|\mathcal{G}]] = E[X]\)
- Conditional Linearity: \(E[Y X + Z |\mathcal{G}] = Y E[X |\mathcal{G}] + Z\) for any random variables \(Y\) and \(Z\) which are \(\mathcal{G}\)-measurable.
- Tower Property: \(E[E[X | \mathcal{G}_2] | \mathcal{G}_1] = E[X|\mathcal{G}_1]\) if \(\mathcal{G}_1\subseteq \mathcal{G}_2\);
- Trivial: \(E[X |\mathcal{F}_0] = E[X]\) if \(\mathcal{F}_0 = \{\emptyset, \Omega\}\);
- Independence: \(E[X | \mathcal{G}] = E[X]\) if \(X\) is independent of \(\mathcal{G}\).
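On a finite state space where \(\mathcal{G}\) is generated by a partition, the conditional expectation is simply the probability-weighted average of \(X\) on each block of the partition, which makes the properties above easy to verify numerically. Here is a minimal Python sketch (not from the lecture; the probabilities, the values of \(X\) and the partition are assumptions):

```python
P = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}   # assumed probabilities of singletons
X = {"w1": 10.0, "w2": 2.0, "w3": 4.0, "w4": 8.0}  # assumed random variable

partition = [{"w1", "w2"}, {"w3", "w4"}]           # blocks generating the sub-sigma-algebra G

def cond_exp(X, partition):
    """E[X | G] as a function of the state, constant on each block of the partition."""
    Y = {}
    for block in partition:
        p_block = sum(P[w] for w in block)                       # P[block]
        avg = sum(P[w] * X[w] for w in block) / p_block          # E[X 1_block] / P[block]
        for w in block:
            Y[w] = avg
    return Y

def expectation(Z):
    return sum(P[w] * Z[w] for w in P)

Y = cond_exp(X, partition)
print(Y)                               # constant on {w1, w2} and on {w3, w4}
print(expectation(Y), expectation(X))  # E[E[X|G]] = E[X] = 5.8
```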