A few common concentration inequalities, collected and pasted together from Wikipedia and machine-learning textbooks.
Markov’s inequality
statement
If $X$ is a nonnegative random variable and $a > 0$, then the probability that $X$ is at least $a$ is at most the expectation of $X$ divided by $a$:
$$\Pr(X \ge a) \le \frac{\mathbb{E}[X]}{a}.$$
intuition
A nonnegative random variable cannot place much probability mass far above its mean: if $\Pr(X \ge a)$ exceeded $\mathbb{E}[X]/a$, that mass alone would contribute more than $\mathbb{E}[X]$ to the expectation.
proof
For any event $E$, let $I_E$ be the indicator random variable of $E$; that is, $I_E = 1$ if $E$ occurs and $I_E = 0$ otherwise. Using this notation, we have $I_{(X \ge a)} = 1$ if the event $X \ge a$ occurs, and $I_{(X \ge a)} = 0$ if $X < a$. Then, given $a > 0$,
$$a\, I_{(X \ge a)} \le X,$$
which is clear if we consider the two possible values of $I_{(X \ge a)}$. If $X < a$, then $I_{(X \ge a)} = 0$, and so $a\, I_{(X \ge a)} = 0 \le X$. Otherwise, we have $X \ge a$, for which $I_{(X \ge a)} = 1$ and so $a\, I_{(X \ge a)} = a \le X$.
Since expectation is monotone, taking the expectation of both sides of an inequality cannot reverse it. Therefore,
$$\mathbb{E}\bigl[a\, I_{(X \ge a)}\bigr] \le \mathbb{E}[X].$$
Now, using linearity of expectation, the left side of this inequality is the same as
$$a\, \mathbb{E}\bigl[I_{(X \ge a)}\bigr] = a\bigl(1 \cdot \Pr(X \ge a) + 0 \cdot \Pr(X < a)\bigr) = a \Pr(X \ge a).$$
Thus we have
$$a \Pr(X \ge a) \le \mathbb{E}[X],$$
and since $a > 0$, we can divide both sides by $a$ to obtain $\Pr(X \ge a) \le \mathbb{E}[X]/a$.
corollaries
If $\varphi$ is a nondecreasing nonnegative function, $X$ is a (not necessarily nonnegative) random variable, and $\varphi(a) > 0$, then
$$\Pr(|X| \ge a) \le \frac{\mathbb{E}\bigl[\varphi(|X|)\bigr]}{\varphi(a)}.$$
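As a quick sanity check (an illustrative Monte Carlo sketch, not part of the source; the exponential distribution is an arbitrary choice of nonnegative variable), one can compare an empirical tail probability against the Markov bound:

```python
import random

# Monte Carlo check of Markov's inequality: P(X >= a) <= E[X] / a
# for a nonnegative random variable X.  Here X ~ Exponential(1),
# an illustrative choice (mean 1, nonnegative).
random.seed(0)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

a = 3.0
empirical_tail = sum(x >= a for x in samples) / n  # estimate of P(X >= a)
mean = sum(samples) / n                            # estimate of E[X]
markov_bound = mean / a

print(empirical_tail, markov_bound)
```

For this distribution the true tail is $e^{-3} \approx 0.05$ while the bound is about $1/3$, illustrating that Markov's inequality holds but is typically loose.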
Chebyshev’s inequality
statement
Let $X$ be a random variable with finite expectation $\mu = \mathbb{E}[X]$ and finite nonzero variance $\sigma^2 = \operatorname{Var}(X)$. Then for any $a > 0$,
$$\Pr\bigl(|X - \mu| \ge a\bigr) \le \frac{\sigma^2}{a^2}.$$
proof
For any $a > 0$,
$$\Pr\bigl(|X - \mu| \ge a\bigr) = \Pr\bigl((X - \mu)^2 \ge a^2\bigr) \le \frac{\mathbb{E}\bigl[(X - \mu)^2\bigr]}{a^2} = \frac{\sigma^2}{a^2},$$
because Markov's inequality applies to the nonnegative random variable $(X - \mu)^2$.
This argument can be summarized (where "MI" indicates use of Markov's inequality):
$$\Pr\bigl(|X - \mu| \ge a\bigr) = \Pr\bigl((X - \mu)^2 \ge a^2\bigr) \overset{\text{MI}}{\le} \frac{\mathbb{E}\bigl[(X - \mu)^2\bigr]}{a^2} = \frac{\operatorname{Var}(X)}{a^2}.$$
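The same kind of sanity check works for Chebyshev's inequality (again a hypothetical sketch; the uniform distribution is an arbitrary choice with known mean and variance):

```python
import random

# Monte Carlo check of Chebyshev's inequality:
# P(|X - mu| >= a) <= Var(X) / a^2, here for X ~ Uniform(0, 1),
# with mu = 1/2 and Var(X) = 1/12.
random.seed(0)
n = 100_000
samples = [random.random() for _ in range(n)]

mu, var, a = 0.5, 1 / 12, 0.4
empirical = sum(abs(x - mu) >= a for x in samples) / n
chebyshev_bound = var / a**2  # (1/12) / 0.16 ~= 0.52

print(empirical, chebyshev_bound)
```

The true probability here is $0.2$, against a bound of roughly $0.52$; the bound holds but, as with Markov, is not tight.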
Hoeffding’s inequality
Let $X_1, \dots, X_m$ be independent random variables, with $X_i$ taking values in $[a_i, b_i]$ for all $i \in [1, m]$. Then for any $\epsilon > 0$, the following inequalities hold for $S_m = \sum_{i=1}^{m} X_i$:
$$\Pr\bigl[S_m - \mathbb{E}[S_m] \ge \epsilon\bigr] \le \exp\!\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m}(b_i - a_i)^2}\right),$$
$$\Pr\bigl[S_m - \mathbb{E}[S_m] \le -\epsilon\bigr] \le \exp\!\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m}(b_i - a_i)^2}\right).$$
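A Monte Carlo sketch of the upper-tail bound (an illustrative example, not from the source; Bernoulli(1/2) variables give $a_i = 0$, $b_i = 1$):

```python
import math
import random

# Monte Carlo check of Hoeffding's inequality for S_m = sum of m
# independent Bernoulli(1/2) variables (a_i = 0, b_i = 1):
# P(S_m - E[S_m] >= eps) <= exp(-2 eps^2 / m).
random.seed(0)
m, trials, eps = 100, 20_000, 10.0

def deviation_once():
    s = sum(random.random() < 0.5 for _ in range(m))  # Binomial(m, 1/2)
    return s - m / 2                                  # S_m - E[S_m]

empirical = sum(deviation_once() >= eps for _ in range(trials)) / trials
hoeffding_bound = math.exp(-2 * eps**2 / m)  # exp(-2) ~= 0.135

print(empirical, hoeffding_bound)
```

Here $\epsilon = 10$ is two standard deviations of $S_m$, so the true tail is near $0.03$, comfortably below $e^{-2} \approx 0.135$.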
Hoeffding’s lemma
statement
Let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $a \le X \le b$ with $b > a$. Then, for any $t > 0$, the following inequality holds:
$$\mathbb{E}\bigl[e^{tX}\bigr] \le e^{\frac{t^2 (b - a)^2}{8}}.$$
proof
By the convexity of $x \mapsto e^{tx}$, for all $x \in [a, b]$, the following holds:
$$e^{tx} \le \frac{b - x}{b - a}\, e^{ta} + \frac{x - a}{b - a}\, e^{tb}.$$
Thus, using $\mathbb{E}[X] = 0$,
$$\mathbb{E}\bigl[e^{tX}\bigr] \le \frac{b}{b - a}\, e^{ta} + \frac{-a}{b - a}\, e^{tb} = e^{\phi(t)},$$
where
$$\phi(t) = \log\!\left(\frac{b}{b - a}\, e^{ta} + \frac{-a}{b - a}\, e^{tb}\right) = ta + \log\!\left(\frac{b}{b - a} + \frac{-a}{b - a}\, e^{t(b - a)}\right).$$
For any $t$, the first and second derivatives of $\phi$ are given below:
$$\phi'(t) = a + \frac{-a\, e^{t(b - a)}}{\frac{b}{b - a} + \frac{-a}{b - a}\, e^{t(b - a)}}, \qquad \phi''(t) = \rho(1 - \rho)(b - a)^2,$$
where $\rho$ denotes $\dfrac{-a\, e^{t(b - a)}}{b - a\, e^{t(b - a)}}$. Note that $\phi(0) = \phi'(0) = 0$ and that $\rho \in [0, 1]$. Since $\rho$ is in $[0, 1]$, $\rho(1 - \rho)$ is upper bounded by $\frac{1}{4}$ and $\phi''(t) \le \frac{(b - a)^2}{4}$. Thus, by the second-order expansion of the function $\phi$, there exists $\theta \in [0, t]$ such that:
$$\phi(t) = \phi(0) + t\,\phi'(0) + \frac{t^2}{2}\,\phi''(\theta) \le \frac{t^2 (b - a)^2}{8}.$$
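The lemma can be checked numerically on a concrete zero-mean variable (a hypothetical example, not from the source): take $X \in \{-1, 2\}$ with $\Pr(X = -1) = 2/3$, so that $\mathbb{E}[X] = 0$, $a = -1$, $b = 2$.

```python
import math

# Numeric check of Hoeffding's lemma: for a zero-mean X with
# a <= X <= b,  E[exp(t X)] <= exp(t^2 (b - a)^2 / 8).
# Example: P(X = -1) = 2/3, P(X = 2) = 1/3, so E[X] = 0.
a, b = -1.0, 2.0
p_neg = 2 / 3  # (2/3)(-1) + (1/3)(2) = 0

for t in (0.1, 0.5, 1.0):
    mgf = p_neg * math.exp(t * a) + (1 - p_neg) * math.exp(t * b)
    lemma_bound = math.exp(t**2 * (b - a) ** 2 / 8)
    assert mgf <= lemma_bound
    print(t, mgf, lemma_bound)
```

For small $t$ the two sides are very close (at $t = 0.1$ the moment-generating function is about $1.0103$ against a bound of about $1.0113$), reflecting that the lemma is derived from a second-order expansion around $t = 0$.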
proof
Using Chernoff's bounding technique and applying Hoeffding's lemma to each $X_i - \mathbb{E}[X_i]$, for any $t > 0$:
$$\Pr\bigl[S_m - \mathbb{E}[S_m] \ge \epsilon\bigr] = \Pr\bigl[e^{t(S_m - \mathbb{E}[S_m])} \ge e^{t\epsilon}\bigr] \le e^{-t\epsilon}\, \mathbb{E}\bigl[e^{t(S_m - \mathbb{E}[S_m])}\bigr]$$
$$= e^{-t\epsilon} \prod_{i=1}^{m} \mathbb{E}\bigl[e^{t(X_i - \mathbb{E}[X_i])}\bigr] \le e^{-t\epsilon} \prod_{i=1}^{m} e^{t^2 (b_i - a_i)^2 / 8} = e^{-t\epsilon}\, e^{t^2 \sum_{i=1}^{m} (b_i - a_i)^2 / 8} \le e^{-2\epsilon^2 / \sum_{i=1}^{m}(b_i - a_i)^2},$$
where the product step uses the independence of the $X_i$, and the last inequality follows from choosing $t = 4\epsilon / \sum_{i=1}^{m}(b_i - a_i)^2$, which minimizes the upper bound. The second statement is proved in a similar way.
McDiarmid’s inequality
statement
Let $X_1, \dots, X_m \in \mathcal{X}^m$ be a set of $m \ge 1$ independent random variables and assume that there exist $c_1, \dots, c_m > 0$ such that $f \colon \mathcal{X}^m \to \mathbb{R}$ satisfies the following conditions:
$$\bigl|f(x_1, \dots, x_i, \dots, x_m) - f(x_1, \dots, x_i', \dots, x_m)\bigr| \le c_i,$$
for all $i \in [1, m]$ and any points $x_1, \dots, x_m, x_i' \in \mathcal{X}$. Let $f(S)$ denote $f(X_1, \dots, X_m)$; then, for all $\epsilon > 0$, the following inequalities hold:
$$\Pr\bigl[f(S) - \mathbb{E}[f(S)] \ge \epsilon\bigr] \le \exp\!\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m} c_i^2}\right),$$
$$\Pr\bigl[f(S) - \mathbb{E}[f(S)] \le -\epsilon\bigr] \le \exp\!\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m} c_i^2}\right).$$
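A Monte Carlo sketch (an illustrative example, not from the source): the empirical mean of $m$ variables in $[0, 1]$ has bounded differences $c_i = 1/m$, so $\sum_i c_i^2 = 1/m$ and the bound becomes $\exp(-2\epsilon^2 m)$.

```python
import math
import random

# Monte Carlo check of McDiarmid's inequality for
# f(x_1,...,x_m) = (1/m) * sum(x_i) with x_i in [0, 1]:
# changing one coordinate moves f by at most c_i = 1/m, so
# P(f - E[f] >= eps) <= exp(-2 eps^2 / sum(c_i^2)) = exp(-2 eps^2 m).
random.seed(0)
m, trials, eps = 50, 20_000, 0.1

def f_once():
    return sum(random.random() for _ in range(m)) / m

expected = 0.5  # E[f] for Uniform(0, 1) coordinates
empirical = sum(f_once() - expected >= eps for _ in range(trials)) / trials
mcdiarmid_bound = math.exp(-2 * eps**2 * m)  # exp(-1) ~= 0.37

print(empirical, mcdiarmid_bound)
```

Hoeffding's inequality is the special case $f(x_1, \dots, x_m) = \sum_i x_i$ with $c_i = b_i - a_i$.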
Definition Martingale Difference
A sequence of random variables $V_1, V_2, \dots$ is a martingale difference sequence with respect to $X_1, X_2, \dots$ if, for all $i > 0$, $V_i$ is a function of $X_1, \dots, X_i$ and
$$\mathbb{E}\bigl[V_{i+1} \mid X_1, \dots, X_i\bigr] = 0.$$
Lemma
Let $V$ and $Z$ be random variables satisfying $\mathbb{E}[V \mid Z] = 0$ and, for some function $f$ and constant $c \ge 0$, the inequalities:
$$f(Z) \le V \le f(Z) + c.$$
Then, for all $t > 0$, the following upper bound holds:
$$\mathbb{E}\bigl[e^{tV} \mid Z\bigr] \le e^{t^2 c^2 / 8}.$$
Theorem Azuma’s inequality
Let $V_1, V_2, \dots$ be a martingale difference sequence with respect to the random variables $X_1, X_2, \dots$, and assume that for all $i > 0$ there are a constant $c_i \ge 0$ and a
random variable $Z_i$, which is a function of $X_1, \dots, X_{i-1}$, that satisfy
$$Z_i \le V_i \le Z_i + c_i.$$
Then, for all $\epsilon > 0$ and $m$, the following inequalities hold:
$$\Pr\!\left[\sum_{i=1}^{m} V_i \ge \epsilon\right] \le \exp\!\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m} c_i^2}\right),$$
$$\Pr\!\left[\sum_{i=1}^{m} V_i \le -\epsilon\right] \le \exp\!\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m} c_i^2}\right).$$
proof
For any $k \in [1, m]$, let $S_k = \sum_{i=1}^{k} V_i$. Then, using Chernoff's bounding technique and the previous lemma, for any $t > 0$, we can write
$$\Pr[S_m \ge \epsilon] \le e^{-t\epsilon}\, \mathbb{E}\bigl[e^{tS_m}\bigr] = e^{-t\epsilon}\, \mathbb{E}\Bigl[e^{tS_{m-1}}\, \mathbb{E}\bigl[e^{tV_m} \mid X_1, \dots, X_{m-1}\bigr]\Bigr]$$
$$\le e^{-t\epsilon}\, \mathbb{E}\bigl[e^{tS_{m-1}}\bigr]\, e^{t^2 c_m^2 / 8} \le e^{-t\epsilon}\, e^{t^2 \sum_{i=1}^{m} c_i^2 / 8} = e^{-2\epsilon^2 / \sum_{i=1}^{m} c_i^2},$$
where we chose $t = 4\epsilon / \sum_{i=1}^{m} c_i^2$ to minimize the upper bound. This proves the first statement of the theorem, and the second statement is shown in a similar way.
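A Monte Carlo sketch of Azuma's inequality (an illustrative example, not from the source): the simple $\pm 1$ random walk has increments $V_i \in \{-1, +1\}$ with conditional mean zero, so $Z_i = -1$ and $c_i = 2$.

```python
import math
import random

# Monte Carlo check of Azuma's inequality for the simple +/-1 random
# walk: V_i in {-1, +1} with conditional mean 0 is a martingale
# difference sequence with Z_i = -1 and c_i = 2, so
# P(sum V_i >= eps) <= exp(-2 eps^2 / (4 m)).
random.seed(0)
m, trials, eps = 100, 20_000, 20.0

def walk_once():
    return sum(random.choice((-1, 1)) for _ in range(m))

empirical = sum(walk_once() >= eps for _ in range(trials)) / trials
azuma_bound = math.exp(-2 * eps**2 / (4 * m))  # exp(-2) ~= 0.135

print(empirical, azuma_bound)
```

Since the walk has standard deviation $\sqrt{m} = 10$, the event $\sum_i V_i \ge 20$ is a two-standard-deviation deviation, with true probability near $0.03$, below the bound $e^{-2} \approx 0.135$.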
proof
Define a sequence of random variables $V_k$, $k \in [1, m]$, as follows: $V = f(S) - \mathbb{E}[f(S)]$, $V_1 = \mathbb{E}[V \mid X_1] - \mathbb{E}[V]$, and for $k > 1$,
$$V_k = \mathbb{E}\bigl[V \mid X_1, \dots, X_k\bigr] - \mathbb{E}\bigl[V \mid X_1, \dots, X_{k-1}\bigr].$$
Note that $V = \sum_{k=1}^{m} V_k$. Furthermore, the random variable $\mathbb{E}[V \mid X_1, \dots, X_k]$ is a function of $X_1, \dots, X_k$. Conditioning on $X_1, \dots, X_{k-1}$ and taking its expectation therefore gives:
$$\mathbb{E}\Bigl[\mathbb{E}\bigl[V \mid X_1, \dots, X_k\bigr] \,\Big|\, X_1, \dots, X_{k-1}\Bigr] = \mathbb{E}\bigl[V \mid X_1, \dots, X_{k-1}\bigr],$$
which implies $\mathbb{E}[V_k \mid X_1, \dots, X_{k-1}] = 0$. Thus, the sequence $(V_k)$ is a martingale difference sequence. Next, observe that, since $\mathbb{E}[f(S)]$ is a scalar, $V_k$ can be expressed as follows:
$$V_k = \mathbb{E}\bigl[f(S) \mid X_1, \dots, X_k\bigr] - \mathbb{E}\bigl[f(S) \mid X_1, \dots, X_{k-1}\bigr].$$
Now define the random variables $W_k$ and $U_k$ for each $k \in [1, m]$:
$$W_k = \sup_{x}\, \mathbb{E}\bigl[f(S) \mid X_1, \dots, X_{k-1}, x\bigr] - \mathbb{E}\bigl[f(S) \mid X_1, \dots, X_{k-1}\bigr],$$
$$U_k = \inf_{x}\, \mathbb{E}\bigl[f(S) \mid X_1, \dots, X_{k-1}, x\bigr] - \mathbb{E}\bigl[f(S) \mid X_1, \dots, X_{k-1}\bigr].$$
Since $X_1, \dots, X_m$ are independent of each other, conditioning on $X_k = x$ does not affect the distribution of the other variables, so these conditional expectations are just expectations over $X_{k+1}, \dots, X_m$ with $X_k$ fixed to $x$; by the bounded-difference condition on $f$, this gives $W_k - U_k \le c_k$.
$U_k$ is a random variable that is a function of $X_1, \dots, X_{k-1}$; thus, $U_k \le V_k \le U_k + c_k$. In view of these inequalities, we can apply Azuma's inequality to $V = \sum_{k=1}^{m} V_k$, which yields both statements of McDiarmid's inequality.