Appendix B Entropy, Diffusion, Denoising, and Lossy Coding

The increase of disorder or entropy with time is one example of what is called an arrow of time, something that distinguishes the past from the future, giving a direction to time.

  – A Brief History of Time, Stephen Hawking

In this appendix we provide proofs for several facts, mentioned in Chapter 3, which are related to differential entropy, how it evolves under diffusion processes, and its connections to lossy coding. We will make the following mild assumption about the random variable representing the data source, denoted 𝒙\bm{x}bold_italic_x.

Assumption B.1.

𝒙\bm{x}bold_italic_x is supported on a compact set 𝒮D\mathcal{S}\subseteq\mathbb{R}^{D}caligraphic_S ⊆ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT of radius at most RRitalic_R, i.e., Rsup𝝃𝒮𝝃2R\doteq\sup_{\bm{\xi}\in\mathcal{S}}\|\bm{\xi}\|_{2}italic_R ≐ roman_sup start_POSTSUBSCRIPT bold_italic_ξ ∈ caligraphic_S end_POSTSUBSCRIPT ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.

In particular, since compact sets in Euclidean space are bounded, it holds R<R<\inftyitalic_R < ∞. We will consistently use the notation Br(𝝃){𝒖D:𝝃𝒖2r}B_{r}(\bm{\xi})\doteq\{\bm{u}\in\mathbb{R}^{D}\colon\|\bm{\xi}-\bm{u}\|_{2}\leq r\}italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_italic_ξ ) ≐ { bold_italic_u ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT : ∥ bold_italic_ξ - bold_italic_u ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ italic_r } to denote the Euclidean ball of radius rritalic_r centered at 𝝃\bm{\xi}bold_italic_ξ. In this sense, Assumption B.1 has 𝒮BR(𝟎)\mathcal{S}\subseteq B_{R}(\bm{0})caligraphic_S ⊆ italic_B start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( bold_0 ).

Notice that this assumption holds for (almost) all variables we care about in practice, as it is (often) imposed by a normalization step during data pre-processing.

B.1 Differential Entropy of Low-Dimensional Distributions

In this short appendix we discuss the differential entropy of low-dimensional distributions. By definition, the differential entropy of a random variable 𝒙\bm{x}bold_italic_x which does not have a density is -\infty- ∞; this includes all random variables supported on low-dimensional sets. The objective of this section is to discuss why this is a “morally correct” value.

In fact, let 𝒙\bm{x}bold_italic_x be any random variable such that Assumption B.1 holds, the support 𝒮\mathcal{S}caligraphic_S of 𝒙\bm{x}bold_italic_x has 0 volume.111Formally this means that 𝒮\mathcal{S}caligraphic_S is Borel measurable with Borel measure 0. We will consider the case that 𝒙\bm{x}bold_italic_x is uniform on 𝒮\mathcal{S}caligraphic_S.222Say, w.r.t.  the Hausdorff measure on 𝒮\mathcal{S}caligraphic_S. Our goal is to compute h(𝒙)h(\bm{x})italic_h ( bold_italic_x ).

In this case, 𝒙\bm{x}bold_italic_x would not have a density; in the counterfactual world where we did not know h(𝒙)=h(\bm{x})=-\inftyitalic_h ( bold_italic_x ) = - ∞, we could not directly define it using the standard definition of differential entropy. Instead, as in the rest of analysis and information theory it would be reasonable to consider the limit of entropies of successively better approximations 𝒙ε\bm{x}_{\varepsilon}bold_italic_x start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT of 𝒙\bm{x}bold_italic_x which have densities, i.e.,

h(𝒙)“=”limε0h(𝒙ε).h(\bm{x})\ \text{``=''}\ \lim_{\varepsilon\searrow 0}h(\bm{x}_{\varepsilon}).italic_h ( bold_italic_x ) “=” roman_lim start_POSTSUBSCRIPT italic_ε ↘ 0 end_POSTSUBSCRIPT italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) . (B.1.1)

To this end, the basic idea is to take an ε\varepsilonitalic_ε-thickening of 𝒮\mathcal{S}caligraphic_S, say 𝒮ε\mathcal{S}_{\varepsilon}caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT defined as

Sε=𝝃𝒮Bε(𝝃)S_{\varepsilon}=\bigcup_{\bm{\xi}\in\mathcal{S}}B_{\varepsilon}(\bm{\xi})italic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT = ⋃ start_POSTSUBSCRIPT bold_italic_ξ ∈ caligraphic_S end_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ( bold_italic_ξ ) (B.1.2)

and visualized in Figure B.1.

𝒮\mathcal{S}caligraphic_Sxxitalic_x𝒮ε=𝝃SBε(𝝃)\mathcal{S}_{\varepsilon}=\bigcup_{\bm{\xi}\in S}B_{\varepsilon}(\bm{\xi})caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT = ⋃ start_POSTSUBSCRIPT bold_italic_ξ ∈ italic_S end_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ( bold_italic_ξ )
Figure B.1: Illustration of the ε\varepsilonitalic_ε-thickening 𝒮ε\mathcal{S}_{\varepsilon}caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT of a curve 𝒮2\mathcal{S}\subseteq\mathbb{R}^{2}caligraphic_S ⊆ blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

We will work with random variables whose support is 𝒮ε\mathcal{S}_{\varepsilon}caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT, which is fully-dimensional, and take the limit as ε0\varepsilon\to 0italic_ε → 0. Indeed, define 𝒙ε𝒰(𝒮ε)\bm{x}_{\varepsilon}\sim\operatorname{\mathcal{U}}(\mathcal{S}_{\varepsilon})bold_italic_x start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ∼ caligraphic_U ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ). Since 𝒮ε\mathcal{S}_{\varepsilon}caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT has positive volume, 𝒙ε\bm{x}_{\varepsilon}bold_italic_x start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT has a density pεp_{\varepsilon}italic_p start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT equal to

pε(𝝃)=𝟏(𝝃𝒮ε)1vol(𝒮ε).p_{\varepsilon}(\bm{\xi})=\mathbf{1}(\bm{\xi}\in\mathcal{S}_{\varepsilon})\cdot\frac{1}{\operatorname{vol}(\mathcal{S}_{\varepsilon})}.italic_p start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ( bold_italic_ξ ) = bold_1 ( bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) ⋅ divide start_ARG 1 end_ARG start_ARG roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) end_ARG . (B.1.3)

Computing the entropy of 𝒙ε\bm{x}_{\varepsilon}bold_italic_x start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT using the convention that 0log0=00\log 0=00 roman_log 0 = 0, it holds

h(𝒙ε)\displaystyle h(\bm{x}_{\varepsilon})italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) =Dpε(x)logpε(x)dx\displaystyle=-\int_{\mathbb{R}^{D}}p_{\varepsilon}(x)\log p_{\varepsilon}(x)\mathrm{d}x= - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ( italic_x ) roman_log italic_p start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ( italic_x ) roman_d italic_x (B.1.4)
=𝒮ε1vol(𝒮ε)log(1vol(𝒮ε))d𝝃\displaystyle=-\int_{\mathcal{S}_{\varepsilon}}\frac{1}{\operatorname{vol}(\mathcal{S}_{\varepsilon})}\log\left(\frac{1}{\operatorname{vol}(\mathcal{S}_{\varepsilon})}\right)\mathrm{d}\bm{\xi}= - ∫ start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) end_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) end_ARG ) roman_d bold_italic_ξ (B.1.5)
=log(vol(𝒮ε))vol(𝒮ε)𝒮εd𝝃\displaystyle=\frac{\log(\operatorname{vol}(\mathcal{S}_{\varepsilon}))}{\operatorname{vol}(\mathcal{S}_{\varepsilon})}\int_{\mathcal{S}_{\varepsilon}}\mathrm{d}\bm{\xi}= divide start_ARG roman_log ( roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) ) end_ARG start_ARG roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) end_ARG ∫ start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_d bold_italic_ξ (B.1.6)
=log(vol(𝒮ε)).\displaystyle=\log(\operatorname{vol}(\mathcal{S}_{\varepsilon})).= roman_log ( roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) ) . (B.1.7)

Since 𝒮\mathcal{S}caligraphic_S is compact vol(𝒮ε)\operatorname{vol}(\mathcal{S}_{\varepsilon})roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) is finite and tends to 0 as ε0\varepsilon\searrow 0italic_ε ↘ 0. Thus

h(𝒙)=limε0h(𝒙ε)=limε0log(vol(𝒮ε))=,h(\bm{x})=\lim_{\varepsilon\searrow 0}h(\bm{x}_{\varepsilon})=\lim_{\varepsilon\searrow 0}\log(\operatorname{vol}(\mathcal{S}_{\varepsilon}))=-\infty,italic_h ( bold_italic_x ) = roman_lim start_POSTSUBSCRIPT italic_ε ↘ 0 end_POSTSUBSCRIPT italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) = roman_lim start_POSTSUBSCRIPT italic_ε ↘ 0 end_POSTSUBSCRIPT roman_log ( roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT ) ) = - ∞ , (B.1.8)

as desired.

The above calculation is actually a corollary of a much more famous and celebrated set of results about the maximum possible entropy of 𝒙\bm{x}bold_italic_x subject to certain constraints on the distribution of 𝒙\bm{x}bold_italic_x. We would be remiss to not provide the results here; the proofs are provided in Chapter 2 of [PW22], for example.

Theorem B.1.

Let 𝐱\bm{x}bold_italic_x be a random variable on D\mathbb{R}^{D}blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT.

  1. 1.

    If 𝒙\bm{x}bold_italic_x is supported on a compact set 𝒮D\mathcal{S}\subseteq\mathbb{R}^{D}caligraphic_S ⊆ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT (i.e., Assumption B.1) then

    h(𝒙)h(𝒰(𝒮))=logvol(𝒮).h(\bm{x})\leq h(\operatorname{\mathcal{U}}(\mathcal{S}))=\log\operatorname{vol}(\mathcal{S}).italic_h ( bold_italic_x ) ≤ italic_h ( caligraphic_U ( caligraphic_S ) ) = roman_log roman_vol ( caligraphic_S ) . (B.1.9)
  2. 2.

    If 𝒙\bm{x}bold_italic_x has finite covariance such that, for a PSD matrix 𝚺𝖯𝖲𝖣(D)\bm{\Sigma}\in\mathsf{PSD}(D)bold_Σ ∈ sansserif_PSD ( italic_D ), it holds Cov(𝒙)𝚺\operatorname{Cov}(\bm{x})\preceq\bm{\Sigma}roman_Cov ( bold_italic_x ) ⪯ bold_Σ (w.r.t. the PSD ordering, i.e., 𝚺Cov(𝒙)\bm{\Sigma}-\operatorname{Cov}(\bm{x})bold_Σ - roman_Cov ( bold_italic_x ) is PSD), then

    h(𝒙)h(𝒩(𝟎,𝚺))=12log((2πe)Ddet𝚺).h(\bm{x})\leq h(\operatorname{\mathcal{N}}(\bm{0},\bm{\Sigma}))=\frac{1}{2}\log((2\pi e)^{D}\det\bm{\Sigma}).italic_h ( bold_italic_x ) ≤ italic_h ( caligraphic_N ( bold_0 , bold_Σ ) ) = divide start_ARG 1 end_ARG start_ARG 2 end_ARG roman_log ( ( 2 italic_π italic_e ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT roman_det bold_Σ ) . (B.1.10)
  3. 3.

    If 𝒙\bm{x}bold_italic_x has finite second moment such that, for a constant a0a\geq 0italic_a ≥ 0, it holds 𝔼𝒙22a\operatorname{\mathbb{E}}\|\bm{x}\|_{2}^{2}\leq ablackboard_E ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_a, then

    h(𝒙)h(𝒩(𝟎,aD𝑰))=D2log2πeaD.h(\bm{x})\leq h\left(\operatorname{\mathcal{N}}\left(\bm{0},\frac{a}{D}\bm{I}\right)\right)=\frac{D}{2}\log\frac{2\pi ea}{D}.italic_h ( bold_italic_x ) ≤ italic_h ( caligraphic_N ( bold_0 , divide start_ARG italic_a end_ARG start_ARG italic_D end_ARG bold_italic_I ) ) = divide start_ARG italic_D end_ARG start_ARG 2 end_ARG roman_log divide start_ARG 2 italic_π italic_e italic_a end_ARG start_ARG italic_D end_ARG . (B.1.11)

B.2 Diffusion and Denoising Processes

In the main body (Chapter 3), we considered a random variable 𝒙\bm{x}bold_italic_x, and a stochastic process defined by (3.2.1), i.e.,

𝒙t=𝒙+t𝒈,t[0,T]\bm{x}_{t}=\bm{x}+t\bm{g},\qquad\forall t\in[0,T]bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = bold_italic_x + italic_t bold_italic_g , ∀ italic_t ∈ [ 0 , italic_T ] (B.2.1)

where 𝒈𝒩(𝟎,𝑰)\bm{g}\sim\operatorname{\mathcal{N}}(\bm{0},\bm{I})bold_italic_g ∼ caligraphic_N ( bold_0 , bold_italic_I ) independently of 𝒙\bm{x}bold_italic_x.

The structure of this section is as follows. In Section B.2.1 we provide a formal theorem and crisp proof which shows that under Equation B.2.1 the entropy increases, i.e., ddth(𝒙t)>0\frac{\mathrm{d}}{\mathrm{d}t}h(\bm{x}_{t})>0divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) > 0. In Section B.2.2 we provide a formal theorem and crisp proof which shows that under Equation B.2.1, the entropy decreases during denoising, i.e., h(𝔼[𝒙s𝒙t])<h(𝒙t)h(\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}])<h(\bm{x}_{t})italic_h ( blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] ) < italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) for all s<ts<titalic_s < italic_t. In Section B.2.3 we provide proofs for technical lemmas that are needed to establish the claims in the previous subsections.

Before we start, we introduce some key notations. First, let φt\varphi_{t}italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT be the density of 𝒩(𝟎,t2𝑰)\operatorname{\mathcal{N}}(\bm{0},t^{2}\bm{I})caligraphic_N ( bold_0 , italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I ), i.e.,

φt(𝝃)1(2π)D/2tDexp(𝝃222t2).\varphi_{t}(\bm{\xi})\doteq\frac{1}{(2\pi)^{D/2}t^{D}}\exp\left(-\frac{\|\bm{\xi}\|_{2}^{2}}{2t^{2}}\right).italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≐ divide start_ARG 1 end_ARG start_ARG ( 2 italic_π ) start_POSTSUPERSCRIPT italic_D / 2 end_POSTSUPERSCRIPT italic_t start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_ARG roman_exp ( - divide start_ARG ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) . (B.2.2)

Next, 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is supported on all of D\mathbb{R}^{D}blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT, so it has a density, which we denote ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT (as in the main body). A quick calculation shows that

pt(𝝃)=𝔼[φt(𝝃𝒙)],p_{t}(\bm{\xi})=\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})],italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) = blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] , (B.2.3)

and from this representation it is possible to deduce (i.e., from Proposition B.4) that ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is smooth (i.e., infinitely differentiable) in 𝝃\bm{\xi}bold_italic_ξ, also smooth in ttitalic_t, and positive everywhere. This fact is somewhat remarkable at first sight: even for a completely irregular random variable 𝒙\bm{x}bold_italic_x (say, a Bernoulli random variable, which does not have a density), its Gaussian smoothing admits a density for every (arbitrarily small) t>0t>0italic_t > 0. The proof is left as an exercise for readers well-versed in mathematical analysis.

However, we also need to add an assumption about the smoothness of the distribution of 𝒙\bm{x}bold_italic_x, which will eliminate some technical problems that occur around t=0t=0italic_t = 0 with low-dimensional distributions.333As then various quantities become highly irregular and dealing with them would require significant additional analysis. Despite this, we expect that our results hold under milder assumptions with additional work. For now, let us assume:

Assumption B.2.

𝒙\bm{x}bold_italic_x has a twice continuously differentiable density, denoted ppitalic_p.

B.2.1 Diffusion Process Increases Entropy Over Time

In this section appendix we provide a proof of Theorem B.2. For convenience, we restate it as follows.

Theorem B.2 (Diffusion Increases Entropy).

Let 𝐱\bm{x}bold_italic_x be any random variable such that Assumptions B.1 and B.2 hold, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1). Then

h(𝒙s)<h(𝒙t),s,t:0s<tT.h(\bm{x}_{s})<h(\bm{x}_{t}),\qquad\forall s,t\colon 0\leq s<t\leq T.italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) < italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) , ∀ italic_s , italic_t : 0 ≤ italic_s < italic_t ≤ italic_T . (B.2.4)
Proof.

Before we start, we must ask: when does the inequality in (B.2.4) make sense? We will show in Lemma B.1 that under our assumptions, the differential entropy is well-defined, is never ++\infty+ ∞, and for t>0t>0italic_t > 0 is finite, so the (strict) inequality in (B.2.4) makes sense.

The question of well-definedness aside, the crux of this proof is to show that the density ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT of 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT satisfies a particular partial differential equation, which is very similar to the heat equation. The heat equation is a famous PDE which describes the diffusion of heat through space. This intuitively should make sense, and paints a mental picture: as the time ttitalic_t increases, the probability from the original (perhaps tightly concentrated) 𝒙\bm{x}bold_italic_x disperses across all of D\mathbb{R}^{D}blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT like heat radiating from a source in a vacuum.

Such PDEs for ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, known as Fokker-Planck equations for more general stochastic processes, are very powerful tools, as they allow us to describe the instantaneous temporal derivatives of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT in terms of the instantaneous spatial derivatives of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, and vice versa, providing a concise description of the regularity and dynamics of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. Once we obtain dynamics for ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, we can then use the system to obtain another one which describes the dynamics of h(𝒙t)h(\bm{x}_{t})italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ), which after all is just a functional of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT.

The description of the PDE involves a mathematical object called the Laplacian Δ\Deltaroman_Δ. Recall from your multivariable calculus class that the Laplacian operating on a differentiable-in-time and twice-differentiable-in-space function f:(0,T)×Df\colon(0,T)\times\mathbb{R}^{D}\to\mathbb{R}italic_f : ( 0 , italic_T ) × blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → blackboard_R is given by

Δft(𝝃)=tr(2ft(𝝃))=i=1D2ftξi2(𝝃).\Delta f_{t}(\bm{\xi})=\operatorname{tr}(\nabla^{2}f_{t}(\bm{\xi}))=\sum_{i=1}^{D}\frac{\partial^{2}f_{t}}{\partial\xi_{i}^{2}}(\bm{\xi}).roman_Δ italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) = roman_tr ( ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT divide start_ARG ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( bold_italic_ξ ) . (B.2.5)

Namely, from using the integral representation of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and differentiating under the integral, we can compute the derivatives of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT (which we do in Proposition B.1) and observe that ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT satisfies the heat-like PDE

ptt(𝝃)=tΔpt(𝝃).\frac{\partial p_{t}}{\partial t}(\bm{\xi})=t\Delta p_{t}(\bm{\xi}).divide start_ARG ∂ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG ( bold_italic_ξ ) = italic_t roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) . (B.2.6)

Then for finding the dynamics of h(𝒙t)h(\bm{x}_{t})italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ), we can use Proposition B.3 again as well as the heat-like PDE to get

ddth(𝒙t)\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}h(\bm{x}_{t})divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) =ddtDpt(𝝃)logpt(𝝃)d𝝃\displaystyle=-\frac{\mathrm{d}}{\mathrm{d}t}\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\log p_{t}(\bm{\xi})\mathrm{d}\bm{\xi}= - divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ (B.2.7)
=Dt[pt(𝝃)logpt(𝝃)]d𝝃\displaystyle=-\int_{\mathbb{R}^{D}}\frac{\partial}{\partial t}\left[p_{t}(\bm{\xi})\log p_{t}(\bm{\xi})\right]\mathrm{d}\bm{\xi}= - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG [ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] roman_d bold_italic_ξ (B.2.8)
=Dptt(𝝃)[1+logpt(𝝃)]d𝝃\displaystyle=-\int_{\mathbb{R}^{D}}\frac{\partial p_{t}}{\partial t}(\bm{\xi})[1+\log p_{t}(\bm{\xi})]\mathrm{d}\bm{\xi}= - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG ( bold_italic_ξ ) [ 1 + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] roman_d bold_italic_ξ (B.2.9)
=tDΔpt(𝝃)[1+logpt(𝝃)]d𝝃.\displaystyle=-t\int_{\mathbb{R}^{D}}\Delta p_{t}(\bm{\xi})[1+\log p_{t}(\bm{\xi})]\mathrm{d}\bm{\xi}.= - italic_t ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ 1 + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] roman_d bold_italic_ξ . (B.2.10)

By using a slightly involved integration by parts argument (Lemma B.2), we obtain

ddth(𝒙t)\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}h(\bm{x}_{t})divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) =tDlogpt(𝝃),pt(𝝃)d𝝃\displaystyle=t\int_{\mathbb{R}^{D}}\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\mathrm{d}\bm{\xi}= italic_t ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ roman_d bold_italic_ξ (B.2.11)
=tDpt(𝝃)22pt(𝝃)d𝝃\displaystyle=t\int_{\mathbb{R}^{D}}\frac{\|\nabla p_{t}(\bm{\xi})\|_{2}^{2}}{p_{t}(\bm{\xi})}\mathrm{d}\bm{\xi}= italic_t ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG ∥ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG roman_d bold_italic_ξ (B.2.12)
>0\displaystyle>0> 0 (B.2.13)

where strict inequality holds in the last line because, for it to not hold, pt(𝝃)\nabla p_{t}(\bm{\xi})∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) would need to vanish almost everywhere (i.e., everywhere except possibly on a set of zero volume), but this would imply that ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT would be constant almost everywhere, a contradiction with a fact that ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is a density.

To complete the proof we just use the fundamental theorem of calculus

h(𝒙t)=h(𝒙s)+stdduh(𝒙u)du>h(𝒙s),h(\bm{x}_{t})=h(\bm{x}_{s})+\int_{s}^{t}\frac{\mathrm{d}}{\mathrm{d}u}h(\bm{x}_{u})\mathrm{d}u>h(\bm{x}_{s}),italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) + ∫ start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT divide start_ARG roman_d end_ARG start_ARG roman_d italic_u end_ARG italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT ) roman_d italic_u > italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) , (B.2.14)

which proves the claim. (Note that this does not make sense when h(𝒙s)=h(\bm{x}_{s})=-\inftyitalic_h ( bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) = - ∞, which can only happen when s=0s=0italic_s = 0 and h(𝒙)=h(\bm{x})=-\inftyitalic_h ( bold_italic_x ) = - ∞, but in this case h(𝒙t)>h(\bm{x}_{t})>-\inftyitalic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) > - ∞ so the claim is vacuously true anyways.) ∎

B.2.2 Denoising Process Reduces Entropy Over Time

Recall that in Section 3.2.1 we start with the random variable 𝒙T\bm{x}_{T}bold_italic_x start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT and iteratively denoise it using iterations of the form

𝒙^s𝔼[𝒙s𝒙t=𝒙^t]=st𝒙^t+(1st)𝒙¯(t,𝒙^t).\hat{\bm{x}}_{s}\doteq\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}=\hat{\bm{x}}_{t}]=\frac{s}{t}\hat{\bm{x}}_{t}+\left(1-\frac{s}{t}\right)\bar{\bm{x}}^{\ast}(t,\hat{\bm{x}}_{t}).over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ≐ blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t , over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) . (B.2.15)

for s,t{t0,t1,,tL}s,t\in\{t_{0},t_{1},\dots,t_{L}\}italic_s , italic_t ∈ { italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_t start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } with s<ts<titalic_s < italic_t and 𝒙T=𝒙^T\bm{x}_{T}=\hat{\bm{x}}_{T}bold_italic_x start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT = over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT. We wish to prove that h(𝒙^s)<h(𝒙^t)h(\hat{\bm{x}}_{s})<h(\hat{\bm{x}}_{t})italic_h ( over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) < italic_h ( over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ), showing that the denoising process actually reduces the entropy.

Before we go about doing this, we make several remarks about the problem statement. First, Tweedie’s formula (3.2.20) says that

𝒙¯(t,𝒙t)=𝒙t+t2pt(𝒙t),\bar{\bm{x}}^{\ast}(t,\bm{x}_{t})=\bm{x}_{t}+t^{2}\nabla p_{t}(\bm{x}_{t}),over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t , bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) , (B.2.16)

which likens a full denoising step from time ttitalic_t to time 0 to a gradient step on the log-density of 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. Can we get a similar result for the full denoising step from time ttitalic_t to time ssitalic_s in (B.2.15)? It turns out that indeed we can, and it is pretty simple. By using (B.2.15) and Tweedie’s formula (3.2.20), we obtain

𝔼[𝒙s𝒙t]=st𝒙t+(1st)(𝒙t+t2𝒙tlogpt(𝒙t))=𝒙t+(1st)t2𝒙tlogpt(𝒙t).\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}]=\frac{s}{t}\bm{x}_{t}+\left(1-\frac{s}{t}\right)\left(\bm{x}_{t}+t^{2}\nabla_{\bm{x}_{t}}\log p_{t}(\bm{x}_{t})\right)=\bm{x}_{t}+\left(1-\frac{s}{t}\right)t^{2}\nabla_{\bm{x}_{t}}\log p_{t}(\bm{x}_{t}).blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∇ start_POSTSUBSCRIPT bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ) = bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∇ start_POSTSUBSCRIPT bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) . (B.2.17)

So this iterative denoising step is again a gradient step on the perturbed log-density logpt\log p_{t}roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT with a shrunken step size. In particular, this step can be seen as a perturbation of the distribution of the random variable 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT by the score function vector field, suggesting a connection to stochastic differential equations (SDEs) and the theory of diffusion models [SSK+21]. Indeed, a proof of the following result Theorem B.3 can be developed using this powerful machinery and a limiting argument (e.g., following the technical approach in the exposition of [CCL+23]). We will give a simpler proof here, which will use only elementary tools and thereby illuminate some of the key quantities behind the process of entropy reduction via denoising. On the other hand, we will need to deal with some slightly technical calculations due to the fact that the denoising process in Theorem B.3 does not correspond exactly to the reverse process associated to the noise addition process that generates the observation 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT.444For those familiar with diffusion models, we refer here to the time-reversed forward process not coinciding with the sequence of iterates generated by the process defined by Theorem B.3. These processes coincide in a certain limit where infinitely many steps of Theorem B.3 are taken with infinitely small levels of noise added at each step; for general, finite steps, we must introduce some approximations regardless of the level of sophistication of our tools.

We want to prove that h(𝔼[𝒙s𝒙t])<h(𝒙t)h(\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}])<h(\bm{x}_{t})italic_h ( blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] ) < italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ), i.e., formally:

Theorem B.3.

Let 𝐱\bm{x}bold_italic_x be any random variable such that Assumptions B.1 and B.2 hold, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1). Then

h(𝔼[𝒙s𝒙t])<h(𝒙t),s,t[0,T]:0<tR2D,0s<tmin{1,R2/D2t2R2/Dt2}.h(\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}])<h(\bm{x}_{t}),\qquad\forall s,t\in[0,T]\colon\quad 0<t\leq\frac{R}{\sqrt{2D}},\quad 0\leq s<t\cdot\min\left\{1,\frac{R^{2}/D-2t^{2}}{R^{2}/D-t^{2}}\right\}.italic_h ( blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] ) < italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) , ∀ italic_s , italic_t ∈ [ 0 , italic_T ] : 0 < italic_t ≤ divide start_ARG italic_R end_ARG start_ARG square-root start_ARG 2 italic_D end_ARG end_ARG , 0 ≤ italic_s < italic_t ⋅ roman_min { 1 , divide start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_D - 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_D - italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } . (B.2.18)
Proof.

This proof uses two main ideas:

  1. 1.

    First, write down a density for 𝔼[𝒙s𝒙t]\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}]blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] using a change-of-variables formula.

  2. 2.

    Second, bound this density to control the entropy.

The change of variables is justified by Corollary B.1, which was originally derived in [Gri11].

We execute these ideas now. From Corollary B.1, we obtain that the function 𝒙¯\bar{\bm{x}}over¯ start_ARG bold_italic_x end_ARG defined as 𝒙¯(𝝃)𝔼[𝒙s𝒙t=𝝃]\bar{\bm{x}}(\bm{\xi})\doteq\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}=\bm{\xi}]over¯ start_ARG bold_italic_x end_ARG ( bold_italic_ξ ) ≐ blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = bold_italic_ξ ] is differentiable, injective, and thus invertible on its range, which we henceforth denote 𝒳D\mathcal{X}\subseteq\mathbb{R}^{D}caligraphic_X ⊆ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT. We denote its inverse as 𝒙¯1\bar{\bm{x}}^{-1}over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. Using a change-of-variables formula, the density p¯\bar{p}over¯ start_ARG italic_p end_ARG of 𝒙¯(𝒙t)\bar{\bm{x}}(\bm{x}_{t})over¯ start_ARG bold_italic_x end_ARG ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) is given by

p¯(𝝃)(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃))),\bar{p}(\bm{\xi})\doteq\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))},over¯ start_ARG italic_p end_ARG ( bold_italic_ξ ) ≐ divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG , (B.2.19)

where (recall, from Section A.2) 𝒙¯\bar{\bm{x}}^{\prime}over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is the Jacobian of 𝒙¯\bar{\bm{x}}over¯ start_ARG bold_italic_x end_ARG. Since from Lemma B.3 we know 𝒙¯\bar{\bm{x}}^{\prime}over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is a positive definite matrix, the determinant is positive and so the whole density is positive. Then it follows that

h(𝒙¯(𝒙t))\displaystyle h(\bar{\bm{x}}(\bm{x}_{t}))italic_h ( over¯ start_ARG bold_italic_x end_ARG ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ) =𝒳(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))log(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))d𝝃\displaystyle=-\int_{\mathcal{X}}\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\log\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\mathrm{d}\bm{\xi}= - ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_log divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_d bold_italic_ξ (B.2.20)
=𝒳(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))log((pt𝒙¯1)(𝝃))d𝝃\displaystyle=-\int_{\mathcal{X}}\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\log((p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi}))\mathrm{d}\bm{\xi}= - ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_log ( ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) ) roman_d bold_italic_ξ (B.2.21)
+𝒳(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))logdet(𝒙¯(𝒙¯1(𝝃)))d𝝃\displaystyle\qquad+\int_{\mathcal{X}}\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\log\det\left(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi}))\right)\mathrm{d}\bm{\xi}+ ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_log roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) roman_d bold_italic_ξ (B.2.22)
=Dpt(𝝃)logpt(𝝃)d𝝃+𝒳(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))logdet(𝒙¯(𝒙¯1(𝝃)))d𝝃\displaystyle=-\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\log p_{t}(\bm{\xi})\mathrm{d}\bm{\xi}+\int_{\mathcal{X}}\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\log\det\left(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi}))\right)\mathrm{d}\bm{\xi}= - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ + ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_log roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) roman_d bold_italic_ξ (B.2.23)
=h(𝒙t)𝒳(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))log(1det(𝒙¯(𝒙¯1(𝝃))))d𝝃.\displaystyle=h(\bm{x}_{t})-\int_{\mathcal{X}}\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\log\left(\frac{1}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\right)\mathrm{d}\bm{\xi}.= italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) - ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG ) roman_d bold_italic_ξ . (B.2.24)

We will study the last term (including the --), and show that it is negative.

By concavity, one has xlogx1x-x\log x\leq 1-x- italic_x roman_log italic_x ≤ 1 - italic_x for every x0x\geq 0italic_x ≥ 0. Hence

h(𝒙¯(𝒙t))h(𝒙t)\displaystyle h(\bar{\bm{x}}(\bm{x}_{t}))-h(\bm{x}_{t})italic_h ( over¯ start_ARG bold_italic_x end_ARG ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ) - italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) =𝒳(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))log(1det(𝒙¯(𝒙¯1(𝝃))))d𝝃\displaystyle=-\int_{\mathcal{X}}\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\log\left(\frac{1}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\right)\mathrm{d}\bm{\xi}= - ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG ) roman_d bold_italic_ξ (B.2.25)
𝒳(pt𝒙¯1)(𝝃)(11det(𝒙¯(𝒙¯1(𝝃))))d𝝃\displaystyle\leq\int_{\mathcal{X}}(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})\cdot\left(1-\frac{1}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\right)\mathrm{d}\bm{\xi}≤ ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) ⋅ ( 1 - divide start_ARG 1 end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG ) roman_d bold_italic_ξ (B.2.26)
=𝒳(pt𝒙¯1)(𝝃)d𝝃𝒳(pt𝒙¯1)(𝝃)det(𝒙¯(𝒙¯1(𝝃)))d𝝃\displaystyle=\int_{\mathcal{X}}(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})\mathrm{d}\bm{\xi}-\int_{\mathcal{X}}\frac{(p_{t}\circ\bar{\bm{x}}^{-1})(\bm{\xi})}{\det(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi})))}\mathrm{d}\bm{\xi}= ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) roman_d bold_italic_ξ - ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT divide start_ARG ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∘ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) ( bold_italic_ξ ) end_ARG start_ARG roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) end_ARG roman_d bold_italic_ξ (B.2.27)
=Dpt(𝝃)det(𝒙¯(𝒙¯1(𝝃)))d𝝃𝒳p¯(𝝃)d𝝃\displaystyle=\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\det\left(\bar{\bm{x}}^{\prime}(\bar{\bm{x}}^{-1}(\bm{\xi}))\right)\mathrm{d}\bm{\xi}-\int_{\mathcal{X}}\bar{p}(\bm{\xi})\mathrm{d}\bm{\xi}= ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_det ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) ) ) roman_d bold_italic_ξ - ∫ start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT over¯ start_ARG italic_p end_ARG ( bold_italic_ξ ) roman_d bold_italic_ξ (B.2.28)
=Dpt(𝝃)det(𝑰+(1st)t22logpt(𝝃))d𝝃1.\displaystyle=\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\det\left(\bm{I}+\left(1-\frac{s}{t}\right)t^{2}\nabla^{2}\log p_{t}(\bm{\xi})\right)\mathrm{d}\bm{\xi}-1.= ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_det ( bold_italic_I + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) roman_d bold_italic_ξ - 1 . (B.2.29)

Now, by the AM-GM inequality on eigenvalues, we have for any symmetric positive definite matrix 𝑴𝖯𝖲𝖣(D)\bm{M}\in\mathsf{PSD}(D)bold_italic_M ∈ sansserif_PSD ( italic_D ) the bound

det(𝑴)1/D=i=1Dλi(𝑴)1/Di=1Dλi(𝑴)D=tr(𝑴)D,\det(\bm{M})^{1/D}=\prod_{i=1}^{D}\lambda_{i}(\bm{M})^{1/D}\leq\frac{\sum_{i=1}^{D}\lambda_{i}(\bm{M})}{D}=\frac{\operatorname{tr}(\bm{M})}{D},roman_det ( bold_italic_M ) start_POSTSUPERSCRIPT 1 / italic_D end_POSTSUPERSCRIPT = ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_M ) start_POSTSUPERSCRIPT 1 / italic_D end_POSTSUPERSCRIPT ≤ divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_M ) end_ARG start_ARG italic_D end_ARG = divide start_ARG roman_tr ( bold_italic_M ) end_ARG start_ARG italic_D end_ARG , (B.2.30)

which we can apply to the above expression and obtain

Dpt(𝝃)det(𝑰+(1st)t22logpt(𝝃))d𝝃\displaystyle\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\det\left(\bm{I}+\left(1-\frac{s}{t}\right)t^{2}\nabla^{2}\log p_{t}(\bm{\xi})\right)\mathrm{d}\bm{\xi}∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_det ( bold_italic_I + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) roman_d bold_italic_ξ (B.2.31)
Dpt(𝝃)tr(1D[𝑰+(1st)t22logpt(𝝃)])Dd𝝃\displaystyle\leq\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\operatorname{tr}\left(\frac{1}{D}\left[\bm{I}+\left(1-\frac{s}{t}\right)t^{2}\nabla^{2}\log p_{t}(\bm{\xi})\right]\right)^{D}\mathrm{d}\bm{\xi}≤ ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_tr ( divide start_ARG 1 end_ARG start_ARG italic_D end_ARG [ bold_italic_I + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT roman_d bold_italic_ξ (B.2.32)
=Dpt(𝝃)(1+(1st)t2Dtr(2logpt(𝝃)))Dd𝝃\displaystyle=\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\left(1+\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}\operatorname{tr}(\nabla^{2}\log p_{t}(\bm{\xi}))\right)^{D}\mathrm{d}\bm{\xi}= ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ( 1 + divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG roman_tr ( ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT roman_d bold_italic_ξ (B.2.33)
=Dpt(𝝃)(1+(1st)t2DΔlogpt(𝝃))Dd𝝃.\displaystyle=\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\left(1+\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}\Delta\log p_{t}(\bm{\xi})\right)^{D}\mathrm{d}\bm{\xi}.= ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ( 1 + divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT roman_d bold_italic_ξ . (B.2.34)

From Lemma B.5, it holds (where, recall, RRitalic_R is the radius of the support of 𝒙\bm{x}bold_italic_x as in Assumption B.1)

|Δlogpt(𝝃)|max(Dt2,|R2t4Dt2|)=:Ut.\lvert\Delta\log p_{t}(\bm{\xi})\rvert\leq\max\left(\frac{D}{t^{2}},\left\lvert\frac{R^{2}}{t^{4}}-\frac{D}{t^{2}}\right\rvert\right)=:U_{t}.| roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) | ≤ roman_max ( divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , | divide start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG | ) = : italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT . (B.2.35)

Then it holds

(1st)t2DUt(1st)t2DΔlogpt(𝝃)(1st)t2DUt.-\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}U_{t}\leq\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}\Delta\log p_{t}(\bm{\xi})\leq\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}U_{t}.- divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ≤ divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≤ divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT . (B.2.36)

Meanwhile, the function x(1+x)Dx\mapsto(1+x)^{D}italic_x ↦ ( 1 + italic_x ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT is convex on [1,)[-1,\infty)[ - 1 , ∞ ), so for (1s/t)t2Ut/Dx(1s/t)t2Ut/D-(1-s/t)t^{2}U_{t}/D\leq x\leq(1-s/t)t^{2}U_{t}/D- ( 1 - italic_s / italic_t ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT / italic_D ≤ italic_x ≤ ( 1 - italic_s / italic_t ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT / italic_D we have

(1+x)d\displaystyle(1+x)^{d}( 1 + italic_x ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT (1(1st)t2UtD)D+[(1+(1st)t2UtD)D(1(1st)t2UtD)D]M(s,t,D)x\displaystyle\leq\left(1-\frac{\left(1-\frac{s}{t}\right)t^{2}U_{t}}{D}\right)^{D}+\underbrace{\left[\left(1+\frac{\left(1-\frac{s}{t}\right)t^{2}U_{t}}{D}\right)^{D}-\left(1-\frac{\left(1-\frac{s}{t}\right)t^{2}U_{t}}{D}\right)^{D}\right]}_{M(s,t,D)}x≤ ( 1 - divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_D end_ARG ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT + under⏟ start_ARG [ ( 1 + divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_D end_ARG ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT - ( 1 - divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_D end_ARG ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT ] end_ARG start_POSTSUBSCRIPT italic_M ( italic_s , italic_t , italic_D ) end_POSTSUBSCRIPT italic_x (B.2.37)
1+M(s,t,D)x.\displaystyle\leq 1+M(s,t,D)x.≤ 1 + italic_M ( italic_s , italic_t , italic_D ) italic_x . (B.2.38)

Here M(s,t,D)>0M(s,t,D)>0italic_M ( italic_s , italic_t , italic_D ) > 0 since Ut>0U_{t}>0italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT > 0. In the above bound, we need to verify that the lower bound for xxitalic_x is 1\geq-1≥ - 1. Indeed,

(1st)t2DUt\displaystyle-\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}U_{t}- divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT =(1st)t2Dmax(Dt2,|R2t4Dt2|)\displaystyle=-\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}\max\left(\frac{D}{t^{2}},\left\lvert\frac{R^{2}}{t^{4}}-\frac{D}{t^{2}}\right\rvert\right)= - divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG roman_max ( divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , | divide start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG | ) (B.2.39)
=(1st)max(1,|R2Dt21|)\displaystyle=-\left(1-\frac{s}{t}\right)\max\left(1,\left\lvert\frac{R^{2}}{Dt^{2}}-1\right\rvert\right)= - ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) roman_max ( 1 , | divide start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG - 1 | ) (B.2.40)

Notice that this is 1\geq-1≥ - 1 if and only if (1st)(R2Dt21)1\left(1-\frac{s}{t}\right)\cdot\left(\frac{R^{2}}{Dt^{2}}-1\right)\geq 1( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) ⋅ ( divide start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG - 1 ) ≥ 1, i.e., 0<t<R/2D0<t<R/\sqrt{2D}0 < italic_t < italic_R / square-root start_ARG 2 italic_D end_ARG and 0stR2/D2t2R2/Dt20\leq s\leq t\cdot\frac{R^{2}/D-2t^{2}}{R^{2}/D-t^{2}}0 ≤ italic_s ≤ italic_t ⋅ divide start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_D - 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_D - italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG, as granted by the assumptions.

Applying this bound, we obtain

Dpt(𝝃)(1+(1st)t2DΔlogpt(𝝃))Dd𝝃\displaystyle\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\left(1+\frac{\left(1-\frac{s}{t}\right)t^{2}}{D}\Delta\log p_{t}(\bm{\xi})\right)^{D}\mathrm{d}\bm{\xi}∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ( 1 + divide start_ARG ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_D end_ARG roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT roman_d bold_italic_ξ (B.2.41)
Dpt(𝝃)(1+M(s,t,D)Δlogpt(𝝃))d𝝃\displaystyle\leq\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\left(1+M(s,t,D)\Delta\log p_{t}(\bm{\xi})\right)\mathrm{d}\bm{\xi}≤ ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ( 1 + italic_M ( italic_s , italic_t , italic_D ) roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) roman_d bold_italic_ξ (B.2.42)
=1+M(s,t,D)Dpt(𝝃)Δlogpt(𝝃)d𝝃\displaystyle=1+M(s,t,D)\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\Delta\log p_{t}(\bm{\xi})\mathrm{d}\bm{\xi}= 1 + italic_M ( italic_s , italic_t , italic_D ) ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ (B.2.43)
=1M(s,t,D)Dpt(𝝃),logpt(𝝃)d𝝃\displaystyle=1-M(s,t,D)\int_{\mathbb{R}^{D}}\langle\nabla p_{t}(\bm{\xi}),\nabla\log p_{t}(\bm{\xi})\rangle\mathrm{d}\bm{\xi}= 1 - italic_M ( italic_s , italic_t , italic_D ) ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ roman_d bold_italic_ξ (B.2.44)
=1M(s,t,D)Dpt(𝝃)22pt(𝝃)d𝝃,\displaystyle=1-M(s,t,D)\int_{\mathbb{R}^{D}}\frac{\|\nabla p_{t}(\bm{\xi})\|_{2}^{2}}{p_{t}(\bm{\xi})}\mathrm{d}\bm{\xi},= 1 - italic_M ( italic_s , italic_t , italic_D ) ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG ∥ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG roman_d bold_italic_ξ , (B.2.45)

where the last few lines are the same as in the proof of Theorem B.2. Combining this result with our previous estimate,

h(𝒙¯(𝒙t))h(𝒙t)M(s,t,D)Dpt(𝝃)22pt(𝝃)d𝝃<0h(\bar{\bm{x}}(\bm{x}_{t}))-h(\bm{x}_{t})\leq-M(s,t,D)\int_{\mathbb{R}^{D}}\frac{\|\nabla p_{t}(\bm{\xi})\|_{2}^{2}}{p_{t}(\bm{\xi})}\mathrm{d}\bm{\xi}<0italic_h ( over¯ start_ARG bold_italic_x end_ARG ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ) - italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ≤ - italic_M ( italic_s , italic_t , italic_D ) ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG ∥ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG roman_d bold_italic_ξ < 0 (B.2.46)

where the inequality is strict by the same argument as in Theorem B.2. ∎

Notice that the bounds for ssitalic_s and ttitalic_t depend on the radius RRitalic_R of the data distribution, and are not so general as the bounds in Theorem B.2. The result is actually “as general as needed” in the following sense. Note that if 𝒙\bm{x}bold_italic_x has a twice continuously differentiable density supported on the ball of radius RRitalic_R centered at 𝟎\bm{0}bold_0, then it does for 2R2R2 italic_R, and 3R3R3 italic_R, and so on, i.e., for any ball of radius R>RR^{\prime}>Ritalic_R start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT > italic_R. Thus one strategy to get the appropriate denoising guarantee is: fix a data dimension DDitalic_D and discretization schedule, and then set (in the analysis) the data radius RRitalic_R to be very large such that each denoising step satisfies the requirements for entropy decrease given in Theorem B.3. Then each step of the denoising process will indeed reduce the entropy, as desired.

B.2.3 Technical Lemmas and Intermediate Results

In this subsection we present technical results which power our main two conceptual theorems. Our presentation will be more or less standard for mathematics; we will start with the higher-level results first, and gradually move back to the more incremental results. The higher-level results will use the incremental results, and in this way we have an easy-to-read dependency ordering of the results: no result depends on those before it. Results which do not depend on each other are generally ordered by the place they appear in the above pair of proofs.

Finitneness of the Differential Entropy

We first show that the entropy exists along the stochastic process and is finite.

Lemma B.1.

Let 𝐱\bm{x}bold_italic_x be any random variable, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1).

  1. 1.

    For t>0t>0italic_t > 0, the differential entropy h(𝒙t)h(\bm{x}_{t})italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) exists and is >>-\infty> - ∞.

  2. 2.

    If in addition Assumption B.1 holds for 𝒙\bm{x}bold_italic_x, then h(𝒙)<h(\bm{x})<\inftyitalic_h ( bold_italic_x ) < ∞ and h(𝒙t)<h(\bm{x}_{t})\ <\inftyitalic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) < ∞.

Proof.

To prove Lemma B.1.1, we use a classic yet tedious analysis argument. Since 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT has a density, we can write

h(𝒙t)=Dpt(𝝃)logpt(𝝃)d𝝃.h(\bm{x}_{t})=-\int_{\mathbb{R}^{D}}p_{t}(\bm{\xi})\log p_{t}(\bm{\xi})\mathrm{d}\bm{\xi}.italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ . (B.2.47)

Accordingly, let g:Dg\colon\mathbb{R}^{D}\to\mathbb{R}italic_g : blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → blackboard_R be defined as

g(𝝃)pt(𝝃)logpt(𝝃)h(𝒙t)=Dg(𝝃)d𝝃.g(\bm{\xi})\doteq-p_{t}(\bm{\xi})\log p_{t}(\bm{\xi})\implies h(\bm{x}_{t})=\int_{\mathbb{R}^{D}}g(\bm{\xi})\mathrm{d}\bm{\xi}.italic_g ( bold_italic_ξ ) ≐ - italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟹ italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g ( bold_italic_ξ ) roman_d bold_italic_ξ . (B.2.48)

As usual to bound the value of an integral in analysis, define

g+(𝝃)max(g(𝝃),0),g(𝝃)max(g(𝝃),0)g=g+gandg+,g0.g_{+}(\bm{\xi})\doteq\max(g(\bm{\xi}),0),\quad g_{-}(\bm{\xi})\doteq\max(-g(\bm{\xi}),0)\quad\implies\quad g=g_{+}-g_{-}\quad\text{and}\quad g_{+},g_{-}\geq 0.italic_g start_POSTSUBSCRIPT + end_POSTSUBSCRIPT ( bold_italic_ξ ) ≐ roman_max ( italic_g ( bold_italic_ξ ) , 0 ) , italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) ≐ roman_max ( - italic_g ( bold_italic_ξ ) , 0 ) ⟹ italic_g = italic_g start_POSTSUBSCRIPT + end_POSTSUBSCRIPT - italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT and italic_g start_POSTSUBSCRIPT + end_POSTSUBSCRIPT , italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ≥ 0 . (B.2.49)

Then

h(𝒙t)=Dg+(𝝃)d𝝃Dg(𝝃)d𝝃,h(\bm{x}_{t})=\int_{\mathbb{R}^{D}}g_{+}(\bm{\xi})\mathrm{d}\bm{\xi}-\int_{\mathbb{R}^{D}}g_{-}(\bm{\xi})\mathrm{d}\bm{\xi},italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT + end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ , (B.2.50)

and both integrals are guaranteed to be non-negative since their integrands are.

In order to show that h(𝒙t)h(\bm{x}_{t})italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) is well-defined, we need to show that Dg+(𝝃)d𝝃<\int_{\mathbb{R}^{D}}g_{+}(\bm{\xi})\mathrm{d}\bm{\xi}<\infty∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT + end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ < ∞ or Dg(𝝃)d𝝃<\int_{\mathbb{R}^{D}}g_{-}(\bm{\xi})\mathrm{d}\bm{\xi}<\infty∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ < ∞. To show that h(𝒙t)>h(\bm{x}_{t})>-\inftyitalic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) > - ∞, it merely suffices to show that Dg(𝝃)d𝝃<\int_{\mathbb{R}^{D}}g_{-}(\bm{\xi})\mathrm{d}\bm{\xi}<\infty∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ < ∞. To bound the integral of gg_{-}italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT we need to understand the quantity gg_{-}italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT, namely, we want to characterize when ggitalic_g is negative.

g(𝝃)0pt(𝝃)logpt(𝝃)0logpt(𝝃)0pt(𝝃)1.g(\bm{\xi})\leq 0\iff p_{t}(\bm{\xi})\log p_{t}(\bm{\xi})\geq 0\iff\log p_{t}(\bm{\xi})\geq 0\iff p_{t}(\bm{\xi})\geq 1.italic_g ( bold_italic_ξ ) ≤ 0 ⇔ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≥ 0 ⇔ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≥ 0 ⇔ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≥ 1 . (B.2.51)

Thus, it holds that

g(𝝃)=𝟏(pt(𝝃)1)(g(𝝃))=𝟏(pt(𝝃)1)pt(𝝃)logpt(𝝃).g_{-}(\bm{\xi})=\mathbf{1}(p_{t}(\bm{\xi})\geq 1)\cdot(-g(\bm{\xi}))=\mathbf{1}(p_{t}(\bm{\xi})\geq 1)p_{t}(\bm{\xi})\log p_{t}(\bm{\xi}).italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) = bold_1 ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≥ 1 ) ⋅ ( - italic_g ( bold_italic_ξ ) ) = bold_1 ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≥ 1 ) italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) . (B.2.52)

In order to bound the integral of g(𝝃)g_{-}(\bm{\xi})italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ), we need to show that ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is “not too concentrated,” namely that ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is not too large. To prove this, in this case we are lucky enough to be able to bound the function g(𝝃)g_{-}(\bm{\xi})italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) itself. Namely, notice that

max𝝃Dφt(𝝃𝒙)=φt(𝟎)=1(2π)D/2tD=:Ct.\max_{\bm{\xi}\in\mathbb{R}^{D}}\varphi_{t}(\bm{\xi}-\bm{x})=\varphi_{t}(\bm{0})=\frac{1}{(2\pi)^{D/2}t^{D}}=:C_{t}.roman_max start_POSTSUBSCRIPT bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) = italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_0 ) = divide start_ARG 1 end_ARG start_ARG ( 2 italic_π ) start_POSTSUPERSCRIPT italic_D / 2 end_POSTSUPERSCRIPT italic_t start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_ARG = : italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT . (B.2.53)

which blows up as t0t\to 0italic_t → 0 but is finite for all finite ttitalic_t. Therefore

pt(𝝃)=𝔼φt(𝝃𝒙)𝔼Ct=Ct.p_{t}(\bm{\xi})=\operatorname{\mathbb{E}}\varphi_{t}(\bm{\xi}-\bm{x})\leq\operatorname{\mathbb{E}}C_{t}=C_{t}.italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) = blackboard_E italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ≤ blackboard_E italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT . (B.2.54)

Now there are two cases.

  • If Ct<1C_{t}<1italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT < 1, then pt(𝝃)<1p_{t}(\bm{\xi})<1italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) < 1, so the indicator is never 111, hence g=0g_{-}=0italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT = 0 identically and its integral is also 0.

  • If Ct1C_{t}\geq 1italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ≥ 1, then logCt0\log C_{t}\geq 0roman_log italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ≥ 0, so since the logarithm is monotonically increasing,

    Dg(𝝃)d𝝃\displaystyle\int_{\mathbb{R}^{D}}g_{-}(\bm{\xi})\mathrm{d}\bm{\xi}∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ =D𝟏(pt(𝝃)1)pt(𝝃)logpt(𝝃)d𝝃\displaystyle=\int_{\mathbb{R}^{D}}\mathbf{1}(p_{t}(\bm{\xi})\geq 1)p_{t}(\bm{\xi})\log p_{t}(\bm{\xi})\mathrm{d}\bm{\xi}= ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT bold_1 ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≥ 1 ) italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ (B.2.55)
    =𝔼[𝟏(pt(𝒙t)1)logpt(𝒙t)]\displaystyle=\operatorname{\mathbb{E}}[\mathbf{1}(p_{t}(\bm{x}_{t})\geq 1)\log p_{t}(\bm{x}_{t})]= blackboard_E [ bold_1 ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ≥ 1 ) roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] (B.2.56)
    𝔼[𝟏(pt(𝒙t)1)logCt]\displaystyle\leq\operatorname{\mathbb{E}}[\mathbf{1}(p_{t}(\bm{x}_{t})\geq 1)\log C_{t}]≤ blackboard_E [ bold_1 ( italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ≥ 1 ) roman_log italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] (B.2.57)
    =[pt(𝒙t)1]logCt.\displaystyle=\operatorname{\mathbb{P}}[p_{t}(\bm{x}_{t})\geq 1]\log C_{t}.= blackboard_P [ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ≥ 1 ] roman_log italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT . (B.2.58)

Hence we have Dg(𝝃)d𝝃<\int_{\mathbb{R}^{D}}g_{-}(\bm{\xi})\mathrm{d}\bm{\xi}<\infty∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT - end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ < ∞, so the differential entropy h(𝒙t)h(\bm{x}_{t})italic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) exists and is >>-\infty> - ∞.

To prove Lemma B.1.2, suppose that Assumption B.1 holds. We want to show that h(𝒙)<h(\bm{x})<\inftyitalic_h ( bold_italic_x ) < ∞ and h(𝒙t)<h(\bm{x}_{t})<\inftyitalic_h ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) < ∞. The mechanism for doing this is the same, and involves the maximum entropy result Theorem B.1. Namely, since 𝒙\bm{x}bold_italic_x is absolutely bounded, it has a finite covariance which we will denote 𝚺\bm{\Sigma}bold_Σ. Then the covariance of 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is 𝚺+t2𝑰\bm{\Sigma}+t^{2}\bm{I}bold_Σ + italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I. Thus the entropy of 𝒙\bm{x}bold_italic_x and 𝒙t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT can be upper bounded by the entropy of normal distributions with the respective covariances, i.e., log[(2πe)Ddet(𝚺)]\log[(2\pi e)^{D}\det(\bm{\Sigma})]roman_log [ ( 2 italic_π italic_e ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT roman_det ( bold_Σ ) ] or log[(2πe)Ddet(𝚺+t2𝑰)]\log[(2\pi e)^{D}\det(\bm{\Sigma}+t^{2}\bm{I})]roman_log [ ( 2 italic_π italic_e ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT roman_det ( bold_Σ + italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I ) ], and both are <<\infty< ∞. ∎

Integration by Parts in De Brujin Identity

Finally, we fill in the integration-by-parts argument alluded to in the proofs of Theorems B.2 and B.3. The argument is conceptually pretty simple but requires some technical estimates to show that the boundary term in the integration-by-parts vanishes.

Lemma B.2.

Let 𝐱\bm{x}bold_italic_x be any random variable such that Assumptions B.1 and B.2 hold, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1). For t0t\geq 0italic_t ≥ 0, let ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT be the density of 𝐱t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. Then for a constant cc\in\mathbb{R}italic_c ∈ blackboard_R it holds

DΔpt(𝝃)[c+logpt(𝝃)]d𝝃=Dlogpt(𝝃),pt(𝝃)d𝝃.\int_{\mathbb{R}^{D}}\Delta p_{t}(\bm{\xi})[c+\log p_{t}(\bm{\xi})]\mathrm{d}\bm{\xi}=-\int_{\mathbb{R}^{D}}\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\mathrm{d}\bm{\xi}.∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] roman_d bold_italic_ξ = - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ roman_d bold_italic_ξ . (B.2.59)
Proof.

The basic idea of this proof is in two steps:

  • First, apply Green’s theorem to do integration by parts over a compact set;

  • Second, send the radius of this compact set to ++\infty+ ∞, to get integrals over all of D\mathbb{R}^{D}blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT.

Green’s theorem says that for any compact set 𝒦D\mathcal{K}\subseteq\mathbb{R}^{D}caligraphic_K ⊆ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT, twice continuously differentiable ϕ:D\phi\colon\mathbb{R}^{D}\to\mathbb{R}italic_ϕ : blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → blackboard_R, and continuously differentiable ψ:D\psi\colon\mathbb{R}^{D}\to\mathbb{R}italic_ψ : blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → blackboard_R,

𝒦{ψ(𝝃)Δϕ(𝝃)+ψ(𝝃),ϕ(𝝃)}d𝝃=𝒦ψ(𝝃)ϕ(𝝃),𝒏(𝝃)dσ(𝝃)\int_{\mathcal{K}}\left\{\psi(\bm{\xi})\Delta\phi(\bm{\xi})+\langle\nabla\psi(\bm{\xi}),\nabla\phi(\bm{\xi})\rangle\right\}\mathrm{d}\bm{\xi}=\int_{\partial\mathcal{K}}\psi(\bm{\xi})\langle\nabla\phi(\bm{\xi}),\bm{n}(\bm{\xi})\rangle\mathrm{d}\sigma(\bm{\xi})∫ start_POSTSUBSCRIPT caligraphic_K end_POSTSUBSCRIPT { italic_ψ ( bold_italic_ξ ) roman_Δ italic_ϕ ( bold_italic_ξ ) + ⟨ ∇ italic_ψ ( bold_italic_ξ ) , ∇ italic_ϕ ( bold_italic_ξ ) ⟩ } roman_d bold_italic_ξ = ∫ start_POSTSUBSCRIPT ∂ caligraphic_K end_POSTSUBSCRIPT italic_ψ ( bold_italic_ξ ) ⟨ ∇ italic_ϕ ( bold_italic_ξ ) , bold_italic_n ( bold_italic_ξ ) ⟩ roman_d italic_σ ( bold_italic_ξ ) (B.2.60)

where dσ(𝝃)\mathrm{d}\sigma(\bm{\xi})roman_d italic_σ ( bold_italic_ξ ) denotes an integral over the “surface measure”, i.e., the inherited measure on 𝒦\partial\mathcal{K}∂ caligraphic_K, namely the boundary of 𝒦\mathcal{K}caligraphic_K, and accordingly 𝝃\bm{\xi}bold_italic_ξ takes values on this surface and 𝒏(𝝃)\bm{n}(\bm{\xi})bold_italic_n ( bold_italic_ξ ) is the unit normal vector to 𝒦\mathcal{K}caligraphic_K at the surface point 𝝃\bm{\xi}bold_italic_ξ. Now, taking ϕ(𝝃)pt(𝝃)\phi(\bm{\xi})\doteq p_{t}(\bm{\xi})italic_ϕ ( bold_italic_ξ ) ≐ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) and ψ(𝝃)c+logpt(𝝃)\psi(\bm{\xi})\doteq c+\log p_{t}(\bm{\xi})italic_ψ ( bold_italic_ξ ) ≐ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ), over a ball Br(𝟎)B_{r}(\bm{0})italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) of radius r>0r>0italic_r > 0 centered at 𝟎\bm{0}bold_0 (so that Br(𝟎)\partial B_{r}(\bm{0})∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) is the sphere of radius rritalic_r centered at 𝟎\bm{0}bold_0 and 𝒏(𝝃)=𝝃/𝝃2=𝝃/r\bm{n}(\bm{\xi})=\bm{\xi}/\|\bm{\xi}\|_{2}=\bm{\xi}/rbold_italic_n ( bold_italic_ξ ) = bold_italic_ξ / ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = bold_italic_ξ / italic_r):

Br(𝟎){Δpt(𝝃)[c+logpt(𝝃)]+logpt(𝝃),pt(𝝃)}d𝝃\displaystyle\int_{B_{r}(\bm{0})}\left\{\Delta p_{t}(\bm{\xi})[c+\log p_{t}(\bm{\xi})]+\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\right\}\mathrm{d}\bm{\xi}∫ start_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT { roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] + ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ } roman_d bold_italic_ξ (B.2.61)
=Br(𝟎)[c+logpt(𝝃)]pt(𝝃),𝝃rdσ(𝝃)\displaystyle=\int_{\partial B_{r}(\bm{0})}[c+\log p_{t}(\bm{\xi})]\left\langle\nabla p_{t}(\bm{\xi}),\frac{\bm{\xi}}{r}\right\rangle\mathrm{d}\sigma(\bm{\xi})= ∫ start_POSTSUBSCRIPT ∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , divide start_ARG bold_italic_ξ end_ARG start_ARG italic_r end_ARG ⟩ roman_d italic_σ ( bold_italic_ξ ) (B.2.62)
=1rBr(𝟎)[c+logpt(𝝃)]pt(𝝃),𝝃dσ(𝝃).\displaystyle=\frac{1}{r}\int_{\partial B_{r}(\bm{0})}[c+\log p_{t}(\bm{\xi})]\left\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\right\rangle\mathrm{d}\sigma(\bm{\xi}).= divide start_ARG 1 end_ARG start_ARG italic_r end_ARG ∫ start_POSTSUBSCRIPT ∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ roman_d italic_σ ( bold_italic_ξ ) . (B.2.63)

Sending rr\to\inftyitalic_r → ∞, it holds that

D{Δpt(𝝃)[c+logpt(𝝃)]+logpt(𝝃),pt(𝝃)}d𝝃\displaystyle\int_{\mathbb{R}^{D}}\left\{\Delta p_{t}(\bm{\xi})[c+\log p_{t}(\bm{\xi})]+\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\right\}\mathrm{d}\bm{\xi}∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] + ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ } roman_d bold_italic_ξ (B.2.64)
=limrBr(𝟎){Δpt(𝝃)[c+logpt(𝝃)]+logpt(𝝃),pt(𝝃)}d𝝃\displaystyle=\lim_{r\to\infty}\int_{B_{r}(\bm{0})}\left\{\Delta p_{t}(\bm{\xi})[c+\log p_{t}(\bm{\xi})]+\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\right\}\mathrm{d}\bm{\xi}= roman_lim start_POSTSUBSCRIPT italic_r → ∞ end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT { roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] + ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ } roman_d bold_italic_ξ (B.2.65)
=limrBr(𝟎){Δpt(𝝃)[c+logpt(𝝃)]+logpt(𝝃),pt(𝝃)}d𝝃\displaystyle=\lim_{r\to\infty}\int_{\partial B_{r}(\bm{0})}\left\{\Delta p_{t}(\bm{\xi})[c+\log p_{t}(\bm{\xi})]+\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\right\}\mathrm{d}\bm{\xi}= roman_lim start_POSTSUBSCRIPT italic_r → ∞ end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT ∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT { roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] + ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ } roman_d bold_italic_ξ (B.2.66)
=limr1rBr(𝟎)[c+logpt(𝝃)]pt(𝝃),𝝃dσ(𝝃),\displaystyle=\lim_{r\to\infty}\frac{1}{r}\int_{\partial B_{r}(\bm{0})}[c+\log p_{t}(\bm{\xi})]\left\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\right\rangle\mathrm{d}\sigma(\bm{\xi}),= roman_lim start_POSTSUBSCRIPT italic_r → ∞ end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_r end_ARG ∫ start_POSTSUBSCRIPT ∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ roman_d italic_σ ( bold_italic_ξ ) , (B.2.67)

where the first inequality follows by dominated convergence on the integrand. It remains to compute the last limit. For this, we take asymptotic expansions of each term. The main device is as follows: for 𝝃Br(𝟎)\bm{\xi}\in\partial B_{r}(\bm{0})bold_italic_ξ ∈ ∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ), we have 𝝃2=r\|\bm{\xi}\|_{2}=r∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_r, so

pt(𝝃)\displaystyle p_{t}(\bm{\xi})italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =𝔼[φt(𝝃𝒙)]\displaystyle=\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})]= blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] (B.2.68)
=𝔼[1(2π)D/2tDCte𝝃𝒙22/(2t2)]\displaystyle=\operatorname{\mathbb{E}}\left[\underbrace{\frac{1}{(2\pi)^{D/2}t^{D}}}_{\doteq C_{t}}e^{-\|\bm{\xi}-\bm{x}\|_{2}^{2}/(2t^{2})}\right]= blackboard_E [ under⏟ start_ARG divide start_ARG 1 end_ARG start_ARG ( 2 italic_π ) start_POSTSUPERSCRIPT italic_D / 2 end_POSTSUPERSCRIPT italic_t start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_ARG end_ARG start_POSTSUBSCRIPT ≐ italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT - ∥ bold_italic_ξ - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT ] (B.2.69)
=Ct𝔼[e(𝝃222𝝃,𝒙+𝒙22)/(2t2)]\displaystyle=C_{t}\operatorname{\mathbb{E}}\left[e^{-(\|\bm{\xi}\|_{2}^{2}-2\langle\bm{\xi},\bm{x}\rangle+\|\bm{x}\|_{2}^{2})/(2t^{2})}\right]= italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT blackboard_E [ italic_e start_POSTSUPERSCRIPT - ( ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 ⟨ bold_italic_ξ , bold_italic_x ⟩ + ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT ] (B.2.70)
=Ct𝔼[e(r22𝝃,𝒙+𝒙22)/(2t2)]\displaystyle=C_{t}\operatorname{\mathbb{E}}\left[e^{-(r^{2}-2\langle\bm{\xi},\bm{x}\rangle+\|\bm{x}\|_{2}^{2})/(2t^{2})}\right]= italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT blackboard_E [ italic_e start_POSTSUPERSCRIPT - ( italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 ⟨ bold_italic_ξ , bold_italic_x ⟩ + ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT ] (B.2.71)
=Cter2/(2t2)𝔼[e(2𝝃,𝒙𝒙22)/(2t2)].\displaystyle=C_{t}e^{-r^{2}/(2t^{2})}\operatorname{\mathbb{E}}[e^{(2\langle\bm{\xi},\bm{x}\rangle-\|\bm{x}\|_{2}^{2})/(2t^{2})}].= italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT blackboard_E [ italic_e start_POSTSUPERSCRIPT ( 2 ⟨ bold_italic_ξ , bold_italic_x ⟩ - ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT ] . (B.2.72)

Note that because 𝝃2=r\|\bm{\xi}\|_{2}=r∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_r, we have by Cauchy-Schwarz that

2r𝒙2𝒙222𝝃,𝒙𝒙222r𝒙2𝒙22.-2r\|\bm{x}\|_{2}-\|\bm{x}\|_{2}^{2}\leq 2\langle\bm{\xi},\bm{x}\rangle-\|\bm{x}\|_{2}^{2}\leq 2r\|\bm{x}\|_{2}-\|\bm{x}\|_{2}^{2}.- 2 italic_r ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ 2 ⟨ bold_italic_ξ , bold_italic_x ⟩ - ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ 2 italic_r ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (B.2.73)

Recall that by Assumption B.1, 𝒙\bm{x}bold_italic_x is supported on a compact set 𝒮\mathcal{S}caligraphic_S of radius RRitalic_R. Thus

2R(r+R)2𝝃,𝒙𝒙222Rr.-2R(r+R)\leq 2\langle\bm{\xi},\bm{x}\rangle-\|\bm{x}\|_{2}^{2}\leq 2Rr.- 2 italic_R ( italic_r + italic_R ) ≤ 2 ⟨ bold_italic_ξ , bold_italic_x ⟩ - ∥ bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ 2 italic_R italic_r . (B.2.74)

In other words, it holds

Cte[r2+2R(r+R)]/(2t2)pt(𝝃)Cte[r2+2Rr]/(2t2).C_{t}e^{-[r^{2}+2R(r+R)]/(2t^{2})}\leq p_{t}(\bm{\xi})\leq C_{t}e^{[-r^{2}+2Rr]/(2t^{2})}.italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT - [ italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_R ( italic_r + italic_R ) ] / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT ≤ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≤ italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT [ - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_R italic_r ] / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT . (B.2.75)

Now to compute the gradient, we can use Proposition B.1 and linearity of expectation to compute

pt(𝝃),𝝃\displaystyle\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ =1t2𝔼[(𝝃𝒙)φt(𝝃𝒙)],𝝃\displaystyle=\left\langle-\frac{1}{t^{2}}\operatorname{\mathbb{E}}\left[\left(\bm{\xi}-\bm{x}\right)\varphi_{t}(\bm{\xi}-\bm{x})\right],\bm{\xi}\right\rangle= ⟨ - divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ( bold_italic_ξ - bold_italic_x ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] , bold_italic_ξ ⟩ (B.2.76)
=1t2𝔼[𝝃𝒙,𝝃φt(𝝃𝒙)]\displaystyle=-\frac{1}{t^{2}}\operatorname{\mathbb{E}}\left[\langle\bm{\xi}-\bm{x},\bm{\xi}\rangle\varphi_{t}(\bm{\xi}-\bm{x})\right]= - divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ⟨ bold_italic_ξ - bold_italic_x , bold_italic_ξ ⟩ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] (B.2.77)
=1t2𝔼[(𝝃22𝝃,𝒙)φt(𝝃𝒙)]\displaystyle=-\frac{1}{t^{2}}\operatorname{\mathbb{E}}\left[\left(\|\bm{\xi}\|_{2}^{2}-\langle\bm{\xi},\bm{x}\rangle\right)\varphi_{t}(\bm{\xi}-\bm{x})\right]= - divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ( ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - ⟨ bold_italic_ξ , bold_italic_x ⟩ ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] (B.2.78)
=1t2𝔼[(r2𝝃,𝒙)φt(𝝃𝒙)]\displaystyle=-\frac{1}{t^{2}}\operatorname{\mathbb{E}}\left[\left(r^{2}-\langle\bm{\xi},\bm{x}\rangle\right)\varphi_{t}(\bm{\xi}-\bm{x})\right]= - divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ( italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - ⟨ bold_italic_ξ , bold_italic_x ⟩ ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] (B.2.79)
=1t2𝔼[(𝝃,𝒙r2)φt(𝝃𝒙)].\displaystyle=\frac{1}{t^{2}}\operatorname{\mathbb{E}}\left[\left(\langle\bm{\xi},\bm{x}\rangle-r^{2}\right)\varphi_{t}(\bm{\xi}-\bm{x})\right].= divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ( ⟨ bold_italic_ξ , bold_italic_x ⟩ - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] . (B.2.80)

Using Cauchy-Schwarz and the representation pt(𝝃)𝔼[φt(𝝃𝒙)]p_{t}(\bm{\xi})\doteq\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})]italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≐ blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] again, it holds

1t2𝔼[(Rrr2)φt(𝝃𝒙)]pt(𝝃),𝝃1t2𝔼[(Rrr2)φt(𝝃𝒙)]\displaystyle\frac{1}{t^{2}}\operatorname{\mathbb{E}}\left[\left(-Rr-r^{2}\right)\varphi_{t}(\bm{\xi}-\bm{x})\right]\leq\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle\leq\frac{1}{t^{2}}\operatorname{\mathbb{E}}\left[\left(Rr-r^{2}\right)\varphi_{t}(\bm{\xi}-\bm{x})\right]divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ( - italic_R italic_r - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] ≤ ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ ≤ divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ( italic_R italic_r - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] (B.2.81)
1t2(Rrr2)𝔼[φt(𝝃𝒙)]pt(𝝃),𝝃1t2(Rrr2)𝔼[φt(𝝃𝒙)]\displaystyle\frac{1}{t^{2}}\left(-Rr-r^{2}\right)\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\right]\leq\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle\leq\frac{1}{t^{2}}\left(Rr-r^{2}\right)\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\right]divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( - italic_R italic_r - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] ≤ ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ ≤ divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( italic_R italic_r - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] (B.2.82)
r(R+r)t2pt(𝝃)pt(𝝃),𝝃r(rR)t2pt(𝝃).\displaystyle-\frac{r(R+r)}{t^{2}}p_{t}(\bm{\xi})\leq\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle\leq-\frac{r(r-R)}{t^{2}}p_{t}(\bm{\xi}).- divide start_ARG italic_r ( italic_R + italic_r ) end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≤ ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ ≤ - divide start_ARG italic_r ( italic_r - italic_R ) end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) . (B.2.83)

For r>R>0r>R>0italic_r > italic_R > 0 (as is suitable, because we are going to take the limit rr\to\inftyitalic_r → ∞ while RRitalic_R is fixed), both sides are negative. This makes sense: most of the probability mass is contained within the ball of radius RRitalic_R and thus the score points inwards, having a negative inner product with the outward-pointing vector 𝝃\bm{\xi}bold_italic_ξ. Thus using the appropriate bounds for pt(𝝃)p_{t}(\bm{\xi})italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ),

r(R+r)t2Cte[r2+2Rr]/(2t2)pt(𝝃),𝝃r(rR)t2Cte[r2+2R(r+R)]/(2t2).-\frac{r(R+r)}{t^{2}}\cdot C_{t}e^{[-r^{2}+2Rr]/(2t^{2})}\leq\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle\leq-\frac{r(r-R)}{t^{2}}\cdot C_{t}e^{-[r^{2}+2R(r+R)]/(2t^{2})}.- divide start_ARG italic_r ( italic_R + italic_r ) end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ⋅ italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT [ - italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_R italic_r ] / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT ≤ ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ ≤ - divide start_ARG italic_r ( italic_r - italic_R ) end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ⋅ italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT - [ italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_R ( italic_r + italic_R ) ] / ( 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT . (B.2.84)

Then, noting that Ct=poly(t1)C_{t}=\mathrm{poly}(t^{-1})italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = roman_poly ( italic_t start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ), we can compute

[c+logpt(𝝃)]pt(𝝃),𝝃=poly(r,R,t1,c)eΘr(r2)[c+\log p_{t}(\bm{\xi})]\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle=\mathrm{poly}(r,R,t^{-1},c)e^{-\Theta_{r}(r^{2})}[ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ = roman_poly ( italic_r , italic_R , italic_t start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , italic_c ) italic_e start_POSTSUPERSCRIPT - roman_Θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT (B.2.85)

So one can see that, letting the surface area of Br(𝟎)\partial B_{r}(\bm{0})∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) be ωDrD1\omega_{D}r^{D-1}italic_ω start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT italic_r start_POSTSUPERSCRIPT italic_D - 1 end_POSTSUPERSCRIPT where ωD\omega_{D}italic_ω start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT is a function ofDDitalic_D, it holds

1rBr(𝟎)[c+logpt(𝝃)]pt(𝝃),𝝃d𝝃=poly(r,R,t1,c)eΘr(r2)\frac{1}{r}\int_{\partial B_{r}(\bm{0})}[c+\log p_{t}(\bm{\xi})]\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle\mathrm{d}\bm{\xi}=\mathrm{poly}(r,R,t^{-1},c)e^{-\Theta_{r}(r^{2})}divide start_ARG 1 end_ARG start_ARG italic_r end_ARG ∫ start_POSTSUBSCRIPT ∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ roman_d bold_italic_ξ = roman_poly ( italic_r , italic_R , italic_t start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , italic_c ) italic_e start_POSTSUPERSCRIPT - roman_Θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( italic_r start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT (B.2.86)

and therefore the exponentially decaying tails mean

limr1rBr(𝟎)[c+logpt(𝝃)]pt(𝝃),𝝃d𝝃=0.\lim_{r\to\infty}\frac{1}{r}\int_{\partial B_{r}(\bm{0})}[c+\log p_{t}(\bm{\xi})]\langle\nabla p_{t}(\bm{\xi}),\bm{\xi}\rangle\mathrm{d}\bm{\xi}=0.roman_lim start_POSTSUBSCRIPT italic_r → ∞ end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_r end_ARG ∫ start_POSTSUBSCRIPT ∂ italic_B start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( bold_0 ) end_POSTSUBSCRIPT [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] ⟨ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , bold_italic_ξ ⟩ roman_d bold_italic_ξ = 0 . (B.2.87)

Finally, plugging into (B.2.64), we have

D{Δpt(𝝃)[c+logpt(𝝃)]+logpt(𝝃),pt(𝝃)}d𝝃=0\displaystyle\int_{\mathbb{R}^{D}}\left\{\Delta p_{t}(\bm{\xi})[c+\log p_{t}(\bm{\xi})]+\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\right\}\mathrm{d}\bm{\xi}=0∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] + ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ } roman_d bold_italic_ξ = 0 (B.2.88)
\displaystyle\implies DΔpt(𝝃)[c+logpt(𝝃)]d𝝃=Dlogpt(𝝃),pt(𝝃)d𝝃\displaystyle\int_{\mathbb{R}^{D}}\Delta p_{t}(\bm{\xi})[c+\log p_{t}(\bm{\xi})]\mathrm{d}\bm{\xi}=-\int_{\mathbb{R}^{D}}\langle\nabla\log p_{t}(\bm{\xi}),\nabla p_{t}(\bm{\xi})\rangle\mathrm{d}\bm{\xi}∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) [ italic_c + roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ] roman_d bold_italic_ξ = - ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ⟨ ∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) , ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⟩ roman_d bold_italic_ξ (B.2.89)

as claimed. ∎

Local Invertibility of the Denoiser 𝒙¯\bar{\bm{x}}over¯ start_ARG bold_italic_x end_ARG

Here we provide some results used in the proof of Theorem B.3 which are appropriate generalizations of corresponding results in [Gri11].

Lemma B.3 (Generalization of [Gri11], Lemma A.1).

Let 𝐱\bm{x}bold_italic_x be any random variable such that Assumptions B.1 and B.2 hold, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1). Let s,t[0,T]s,t\in[0,T]italic_s , italic_t ∈ [ 0 , italic_T ] be such that 0s<tT0\leq s<t\leq T0 ≤ italic_s < italic_t ≤ italic_T, and let 𝐱¯(𝛏)𝔼[𝐱s𝐱t=𝛏]\bar{\bm{x}}(\bm{\xi})\doteq\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}=\bm{\xi}]over¯ start_ARG bold_italic_x end_ARG ( bold_italic_ξ ) ≐ blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = bold_italic_ξ ]. The Jacobian 𝐱¯(𝛏)\bar{\bm{x}}^{\prime}(\bm{\xi})over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( bold_italic_ξ ) is symmetric positive definite.

Proof.

We have

𝒙¯(𝝃)=𝑰+(1st)t22logpt(𝝃).\bar{\bm{x}}^{\prime}(\bm{\xi})=\bm{I}+\left(1-\frac{s}{t}\right)t^{2}\nabla^{2}\log p_{t}(\bm{\xi}).over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( bold_italic_ξ ) = bold_italic_I + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) . (B.2.90)

Here we expand

2logpt(𝝃)=pt(𝝃)2pt(𝝃)(pt(𝝃))(pt(𝝃))pt(𝝃)2.\nabla^{2}\log p_{t}(\bm{\xi})=\frac{p_{t}(\bm{\xi})\nabla^{2}p_{t}(\bm{\xi})-(\nabla p_{t}(\bm{\xi}))(\nabla p_{t}(\bm{\xi}))^{\top}}{p_{t}(\bm{\xi})^{2}}.∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) = divide start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) - ( ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) ( ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (B.2.91)

So we need to ensure that

𝒙¯(𝝃)\displaystyle\bar{\bm{x}}^{\prime}(\bm{\xi})over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( bold_italic_ξ ) =𝑰+(1st)t2pt(𝝃)2pt(𝝃)(pt(𝝃))(pt(𝝃))pt(𝝃)2\displaystyle=\bm{I}+\left(1-\frac{s}{t}\right)t^{2}\frac{p_{t}(\bm{\xi})\nabla^{2}p_{t}(\bm{\xi})-(\nabla p_{t}(\bm{\xi}))(\nabla p_{t}(\bm{\xi}))^{\top}}{p_{t}(\bm{\xi})^{2}}= bold_italic_I + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT divide start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) - ( ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) ( ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG (B.2.92)
=pt(𝝃)2𝑰+(1st)t2[pt(𝝃)2pt(𝝃)(pt(𝝃))(pt(𝝃))]pt(𝝃)2\displaystyle=\frac{p_{t}(\bm{\xi})^{2}\bm{I}+\left(1-\frac{s}{t}\right)t^{2}\left[p_{t}(\bm{\xi})\nabla^{2}p_{t}(\bm{\xi})-(\nabla p_{t}(\bm{\xi}))(\nabla p_{t}(\bm{\xi}))^{\top}\right]}{p_{t}(\bm{\xi})^{2}}= divide start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT [ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) - ( ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) ( ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG (B.2.93)

is symmetric positive semidefinite. Indeed it is obviously symmetric (by Clairaut’s theorem). To show its positive semidefiniteness, we plug in the expectation representation of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT given by (B.2.3) (and pt\nabla p_{t}∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, Δpt\Delta p_{t}roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT by Proposition B.1) to obtain (where 𝒙\bm{x}bold_italic_x is as defined and 𝒚\bm{y}bold_italic_y is i.i.d. as 𝒙\bm{x}bold_italic_x),

𝒗[𝒙¯(𝝃)]𝒗\displaystyle\bm{v}^{\top}[\bar{\bm{x}}^{\prime}(\bm{\xi})]\bm{v}bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT [ over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( bold_italic_ξ ) ] bold_italic_v (B.2.94)
=pt(𝝃)2𝒗{pt(𝝃)2𝑰+(1st)t2𝔼[φt(𝝃𝒙)]𝔼[φt(𝝃𝒙)(𝝃𝒙)(𝝃𝒙)t2𝑰t4]\displaystyle=p_{t}(\bm{\xi})^{-2}\bm{v}^{\top}\Bigg{\{}p_{t}(\bm{\xi})^{2}\bm{I}+\left(1-\frac{s}{t}\right)t^{2}\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})]\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\cdot\frac{(\bm{\xi}-\bm{x})(\bm{\xi}-\bm{x})^{\top}-t^{2}\bm{I}}{t^{4}}\right]= italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT { italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I + ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ⋅ divide start_ARG ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG ] (B.2.95)
(1st)t2𝔼[φt(𝝃𝒙)𝝃𝒙t2]𝔼[φt(𝝃𝒙)𝝃𝒙t2]}𝒗\displaystyle\qquad\qquad\qquad-\left(1-\frac{s}{t}\right)t^{2}\operatorname{\mathbb{E}}\left[-\varphi_{t}(\bm{\xi}-\bm{x})\cdot\frac{\bm{\xi}-\bm{x}}{t^{2}}\right]\operatorname{\mathbb{E}}\left[-\varphi_{t}(\bm{\xi}-\bm{x})\cdot\frac{\bm{\xi}-\bm{x}}{t^{2}}\right]^{\top}\Bigg{\}}\bm{v}- ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ - italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ⋅ divide start_ARG bold_italic_ξ - bold_italic_x end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] blackboard_E [ - italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ⋅ divide start_ARG bold_italic_ξ - bold_italic_x end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT } bold_italic_v (B.2.96)
=pt(𝝃)2𝒗{𝔼[φt(𝝃𝒙)φt(𝝃𝒚)𝑰]\displaystyle=p_{t}(\bm{\xi})^{-2}\bm{v}^{\top}\Bigg{\{}\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})\varphi_{t}(\bm{\xi}-\bm{y})\bm{I}]= italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT { blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_y ) bold_italic_I ] (B.2.97)
+(1st)t2𝔼[φt(𝝃𝒙)φt(𝝃𝒚)(𝝃𝒚)(𝝃𝒚)t2𝑰t4]\displaystyle\qquad\qquad\qquad+\left(1-\frac{s}{t}\right)t^{2}\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\varphi_{t}(\bm{\xi}-\bm{y})\cdot\frac{(\bm{\xi}-\bm{y})(\bm{\xi}-\bm{y})^{\top}-t^{2}\bm{I}}{t^{4}}\right]+ ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_y ) ⋅ divide start_ARG ( bold_italic_ξ - bold_italic_y ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG ] (B.2.98)
(1st)t2𝔼[φt(𝝃𝒙)φt(𝝃𝒚)(𝝃𝒙)(𝝃𝒚)t4]}\displaystyle\qquad\qquad\qquad-\left(1-\frac{s}{t}\right)t^{2}\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\varphi_{t}(\bm{\xi}-\bm{y})\cdot\frac{(\bm{\xi}-\bm{x})(\bm{\xi}-\bm{y})^{\top}}{t^{4}}\right]\Bigg{\}}- ( 1 - divide start_ARG italic_s end_ARG start_ARG italic_t end_ARG ) italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_y ) ⋅ divide start_ARG ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG ] } (B.2.99)
=1s/tpt(𝝃)2𝒗𝔼[φt(𝝃𝒙)φt(𝝃𝒚){11s/t𝑰+(𝝃𝒚)(𝝃𝒚)t2𝑰(𝝃𝒙)(𝝃𝒚)t2}]𝒗\displaystyle=\frac{1-s/t}{p_{t}(\bm{\xi})^{2}}\bm{v}^{\top}\operatorname{\mathbb{E}}\mathopen{}\Bigg{[}\varphi_{t}(\bm{\xi}-\bm{x})\varphi_{t}(\bm{\xi}-\bm{y})\left\{\frac{1}{1-s/t}\bm{I}+\frac{(\bm{\xi}-\bm{y})(\bm{\xi}-\bm{y})^{\top}}{t^{2}}-\bm{I}-\frac{(\bm{\xi}-\bm{x})(\bm{\xi}-\bm{y})^{\top}}{t^{2}}\right\}\Bigg{]}\bm{v}= divide start_ARG 1 - italic_s / italic_t end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_y ) { divide start_ARG 1 end_ARG start_ARG 1 - italic_s / italic_t end_ARG bold_italic_I + divide start_ARG ( bold_italic_ξ - bold_italic_y ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG - bold_italic_I - divide start_ARG ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } ] bold_italic_v (B.2.100)
=tstpt(𝝃)2𝒗𝔼[sts𝑰+(𝝃𝒙)(𝝃𝒙)2t2+(𝝃𝒚)(𝝃𝒚)2t2(𝝃𝒙)(𝝃𝒚)t2]𝒗\displaystyle=\frac{t-s}{tp_{t}(\bm{\xi})^{2}}\bm{v}^{\top}\operatorname{\mathbb{E}}\left[\frac{s}{t-s}\bm{I}+\frac{(\bm{\xi}-\bm{x})(\bm{\xi}-\bm{x})^{\top}}{2t^{2}}+\frac{(\bm{\xi}-\bm{y})(\bm{\xi}-\bm{y})^{\top}}{2t^{2}}-\frac{(\bm{\xi}-\bm{x})(\bm{\xi}-\bm{y})^{\top}}{t^{2}}\right]\bm{v}= divide start_ARG italic_t - italic_s end_ARG start_ARG italic_t italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT blackboard_E [ divide start_ARG italic_s end_ARG start_ARG italic_t - italic_s end_ARG bold_italic_I + divide start_ARG ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + divide start_ARG ( bold_italic_ξ - bold_italic_y ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG - divide start_ARG ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] bold_italic_v (B.2.101)
=tstpt(𝝃)2𝒗𝔼[sts𝑰+12t2((𝝃𝒙)(𝝃𝒙)+(𝝃𝒚)(𝝃𝒚)2(𝝃𝒙)(𝝃𝒚))]𝒗\displaystyle=\frac{t-s}{tp_{t}(\bm{\xi})^{2}}\bm{v}^{\top}\operatorname{\mathbb{E}}\left[\frac{s}{t-s}\bm{I}+\frac{1}{2t^{2}}\left((\bm{\xi}-\bm{x})(\bm{\xi}-\bm{x})^{\top}+(\bm{\xi}-\bm{y})(\bm{\xi}-\bm{y})^{\top}-2(\bm{\xi}-\bm{x})(\bm{\xi}-\bm{y})^{\top}\right)\right]\bm{v}= divide start_ARG italic_t - italic_s end_ARG start_ARG italic_t italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT blackboard_E [ divide start_ARG italic_s end_ARG start_ARG italic_t - italic_s end_ARG bold_italic_I + divide start_ARG 1 end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + ( bold_italic_ξ - bold_italic_y ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - 2 ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) ] bold_italic_v (B.2.102)
=tstpt(𝝃)2𝔼[sts𝒗22+12t2([(𝝃𝒙)𝒗]2+[(𝝃𝒚)𝒗]22[(𝝃𝒙)𝒗][(𝝃𝒚)𝒗])]\displaystyle=\frac{t-s}{tp_{t}(\bm{\xi})^{2}}\operatorname{\mathbb{E}}\left[\frac{s}{t-s}\|\bm{v}\|_{2}^{2}+\frac{1}{2t^{2}}\left([(\bm{\xi}-\bm{x})^{\top}\bm{v}]^{2}+[(\bm{\xi}-\bm{y})^{\top}\bm{v}]^{2}-2[(\bm{\xi}-\bm{x})^{\top}\bm{v}][(\bm{\xi}-\bm{y})^{\top}\bm{v}]\right)\right]= divide start_ARG italic_t - italic_s end_ARG start_ARG italic_t italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ divide start_ARG italic_s end_ARG start_ARG italic_t - italic_s end_ARG ∥ bold_italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( [ ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + [ ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 [ ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] [ ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] ) ] (B.2.103)
=tstpt(𝝃)2𝔼[sts𝒗22+12t2([(𝝃𝒙)𝒗]2+[(𝝃𝒚)𝒗]22[(𝝃𝒙)𝒗][(𝝃𝒚)𝒗])]\displaystyle=\frac{t-s}{tp_{t}(\bm{\xi})^{2}}\operatorname{\mathbb{E}}\left[\frac{s}{t-s}\|\bm{v}\|_{2}^{2}+\frac{1}{2t^{2}}\left([(\bm{\xi}-\bm{x})^{\top}\bm{v}]^{2}+[(\bm{\xi}-\bm{y})^{\top}\bm{v}]^{2}-2[(\bm{\xi}-\bm{x})^{\top}\bm{v}][(\bm{\xi}-\bm{y})^{\top}\bm{v}]\right)\right]= divide start_ARG italic_t - italic_s end_ARG start_ARG italic_t italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ divide start_ARG italic_s end_ARG start_ARG italic_t - italic_s end_ARG ∥ bold_italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( [ ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + [ ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 [ ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] [ ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] ) ] (B.2.104)
=tstpt(𝝃)2𝔼[sts𝒗22+12t2([(𝝃𝒙)𝒗][(𝝃𝒚)𝒗])2]\displaystyle=\frac{t-s}{tp_{t}(\bm{\xi})^{2}}\operatorname{\mathbb{E}}\left[\frac{s}{t-s}\|\bm{v}\|_{2}^{2}+\frac{1}{2t^{2}}\left([(\bm{\xi}-\bm{x})^{\top}\bm{v}]-[(\bm{\xi}-\bm{y})^{\top}\bm{v}]\right)^{2}\right]= divide start_ARG italic_t - italic_s end_ARG start_ARG italic_t italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ divide start_ARG italic_s end_ARG start_ARG italic_t - italic_s end_ARG ∥ bold_italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( [ ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] - [ ( bold_italic_ξ - bold_italic_y ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] (B.2.105)
=tstpt(𝝃)2𝔼[sts𝒗22+12t2[(𝒚𝒙)𝒗]2]\displaystyle=\frac{t-s}{tp_{t}(\bm{\xi})^{2}}\operatorname{\mathbb{E}}\left[\frac{s}{t-s}\|\bm{v}\|_{2}^{2}+\frac{1}{2t^{2}}[(\bm{y}-\bm{x})^{\top}\bm{v}]^{2}\right]= divide start_ARG italic_t - italic_s end_ARG start_ARG italic_t italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ divide start_ARG italic_s end_ARG start_ARG italic_t - italic_s end_ARG ∥ bold_italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG [ ( bold_italic_y - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] (B.2.106)
=stpt(𝝃)2𝒗22+ts2t3pt(𝝃)𝔼[{(𝒚𝒙)𝒗}2]\displaystyle=\frac{s}{tp_{t}(\bm{\xi})^{2}}\|\bm{v}\|_{2}^{2}+\frac{t-s}{2t^{3}p_{t}(\bm{\xi})}\operatorname{\mathbb{E}}[\{(\bm{y}-\bm{x})^{\top}\bm{v}\}^{2}]= divide start_ARG italic_s end_ARG start_ARG italic_t italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ bold_italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG italic_t - italic_s end_ARG start_ARG 2 italic_t start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG blackboard_E [ { ( bold_italic_y - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_v } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] (B.2.107)

Since 𝒙\bm{x}bold_italic_x and 𝒚\bm{y}bold_italic_y are i.i.d., the whole integral (i.e., the original quadratic form) is 0 if and only if s=0s=0italic_s = 0 and 𝒙\bm{x}bold_italic_x has support entirely contained in an affine subspace which is orthogonal to 𝒗\bm{v}bold_italic_v. But this is ruled out by assumption (i.e., that 𝒙\bm{x}bold_italic_x has a density on D\mathbb{R}^{D}blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT), so the Jacobian 𝒙¯(𝝃)\bar{\bm{x}}^{\prime}(\bm{\xi})over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( bold_italic_ξ ) is symmetric positive definite. ∎

Lemma B.4 (Generalization of [Gri11] Corollary A.2, Part 1).

Let f:DDf\colon\mathbb{R}^{D}\to\mathbb{R}^{D}italic_f : blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT be any differentiable function whose Jacobian f(𝐱)f^{\prime}(\bm{x})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( bold_italic_x ) is symmetric positive definite. Then ffitalic_f is injective, and hence invertible as a function D(f)\mathbb{R}^{D}\to\mathcal{R}(f)blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → caligraphic_R ( italic_f ) where (f)\mathcal{R}(f)caligraphic_R ( italic_f ) is the range of ffitalic_f.

Proof.

Suppose that ffitalic_f were not injective, i.e., there exists 𝒙,𝒙\bm{x},\bm{x}^{\prime}bold_italic_x , bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT such that f(𝒙)=f(𝒙)f(\bm{x})=f(\bm{x}^{\prime})italic_f ( bold_italic_x ) = italic_f ( bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) while 𝒙𝒙\bm{x}\neq\bm{x}^{\prime}bold_italic_x ≠ bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. Define 𝒗(𝒙𝒙)/𝒙𝒙2\bm{v}\doteq(\bm{x}^{\prime}-\bm{x})/\|\bm{x}^{\prime}-\bm{x}\|_{2}bold_italic_v ≐ ( bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_italic_x ) / ∥ bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Define the function g:g\colon\mathbb{R}\to\mathbb{R}italic_g : blackboard_R → blackboard_R as g(t)𝒗f(𝒙+t𝒗)g(t)\doteq\bm{v}^{\top}f(\bm{x}+t\bm{v})italic_g ( italic_t ) ≐ bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_f ( bold_italic_x + italic_t bold_italic_v ). Then g(0)=𝒗f(𝒙)=𝒗f(𝒙)=g(𝒙𝒙2)g(0)=\bm{v}^{\top}f(\bm{x})=\bm{v}^{\top}f(\bm{x}^{\prime})=g(\|\bm{x}^{\prime}-\bm{x}\|_{2})italic_g ( 0 ) = bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_f ( bold_italic_x ) = bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_f ( bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = italic_g ( ∥ bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ). Since ffitalic_f is differentiable, ggitalic_g is differentiable, so the derivative gg^{\prime}italic_g start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT must vanish for some t(0,𝒙𝒙2)t^{\ast}\in(0,\|\bm{x}^{\prime}-\bm{x}\|_{2})italic_t start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ ( 0 , ∥ bold_italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) by the mean value theorem. However,

g(t)𝒗[f(𝒙+t𝒗)]𝒗>0g^{\prime}(t^{\ast})\doteq\bm{v}^{\top}\left[f^{\prime}(\bm{x}+t^{\ast}\bm{v})\right]\bm{v}>0italic_g start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_t start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ≐ bold_italic_v start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT [ italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( bold_italic_x + italic_t start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT bold_italic_v ) ] bold_italic_v > 0 (B.2.108)

since the Jacobian is positive definite. Thus we arrive at a contradiction, as claimed. ∎

Combining the above two results, we obtain the following crucial result.

Corollary B.1 (Generalization of [Gri11] Corollary A.2, Part 2).

Let 𝐱\bm{x}bold_italic_x be any random variable such that Assumptions B.1 and B.2 hold, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1). Let s,t[0,T]s,t\in[0,T]italic_s , italic_t ∈ [ 0 , italic_T ] be such that 0s<tT0\leq s<t\leq T0 ≤ italic_s < italic_t ≤ italic_T, and let 𝐱¯(𝛏)𝔼[𝐱s𝐱t=𝛏]\bar{\bm{x}}(\bm{\xi})\doteq\operatorname{\mathbb{E}}[\bm{x}_{s}\mid\bm{x}_{t}=\bm{\xi}]over¯ start_ARG bold_italic_x end_ARG ( bold_italic_ξ ) ≐ blackboard_E [ bold_italic_x start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∣ bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = bold_italic_ξ ]. Then 𝐱¯\bar{\bm{x}}over¯ start_ARG bold_italic_x end_ARG is injective, and therefore invertible onto its range.

Proof.

The only thing left to show is that 𝒙¯\bar{\bm{x}}over¯ start_ARG bold_italic_x end_ARG is differentiable, but this is immediate from Tweedie’s formula (Theorem 3.3) which shows that 𝒙¯\bar{\bm{x}}over¯ start_ARG bold_italic_x end_ARG is differentiable if and only if logpt\nabla\log p_{t}∇ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is differentiable, and this is provided by Equation B.2.3. ∎

Controlling the Laplacian Δlogpt\Delta\log p_{t}roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT

Finally, we develop a technical estimate which is required for the proof of Theorem B.3 and actually motivates the assumption for the viable ttitalic_t.

Lemma B.5.

Let 𝐱\bm{x}bold_italic_x be any random variable such that Assumptions B.1 and B.2 hold, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1). Let ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT be the density of 𝐱t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. Then, for t>0t>0italic_t > 0 it holds

sup𝝃D|Δlogpt(𝝃)|max(Dt2,|Rt4Dt2|).\sup_{\bm{\xi}\in\mathbb{R}^{D}}\lvert\Delta\log p_{t}(\bm{\xi})\rvert\leq\max\left(\frac{D}{t^{2}},\left\lvert\frac{R}{t^{4}}-\frac{D}{t^{2}}\right\rvert\right).roman_sup start_POSTSUBSCRIPT bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) | ≤ roman_max ( divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , | divide start_ARG italic_R end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG | ) . (B.2.109)
Proof.

By chain rule, a simple exercise computes

Δlogpt(𝝃)=Δpt(𝝃)pt(𝝃)pt(𝝃)22pt(𝝃)2.\Delta\log p_{t}(\bm{\xi})=\frac{\Delta p_{t}(\bm{\xi})}{p_{t}(\bm{\xi})}-\frac{\|\nabla p_{t}(\bm{\xi})\|_{2}^{2}}{p_{t}(\bm{\xi})^{2}}.roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) = divide start_ARG roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG - divide start_ARG ∥ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (B.2.110)

Using Proposition B.1 to write the terms in Δpt(𝝃)\Delta p_{t}(\bm{\xi})roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ), we obtain

Δpt(𝝃)pt(𝝃)\displaystyle\frac{\Delta p_{t}(\bm{\xi})}{p_{t}(\bm{\xi})}divide start_ARG roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG =𝔼[𝝃𝒙22Dt2t4φt(𝝃𝒙)]𝔼[φt(𝝃𝒙)]\displaystyle=\frac{\operatorname{\mathbb{E}}\left[\frac{\|\bm{\xi}-\bm{x}\|_{2}^{2}-Dt^{2}}{t^{4}}\cdot\varphi_{t}(\bm{\xi}-\bm{x})\right]}{\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})]}= divide start_ARG blackboard_E [ divide start_ARG ∥ bold_italic_ξ - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG ⋅ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] end_ARG start_ARG blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] end_ARG (B.2.111)
=D{𝝃𝒖22Dt2t4}φt(𝝃𝒖)p(𝒖)d𝒖Dφt(𝝃𝒖)p(𝒖)d𝒖.\displaystyle=\frac{\int_{\mathbb{R}^{D}}\left\{\frac{\|\bm{\xi}-\bm{u}\|_{2}^{2}-Dt^{2}}{t^{4}}\right\}\varphi_{t}(\bm{\xi}-\bm{u})p(\bm{u})\mathrm{d}\bm{u}}{\int_{\mathbb{R}^{D}}\varphi_{t}(\bm{\xi}-\bm{u})p(\bm{u})\mathrm{d}\bm{u}}.= divide start_ARG ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { divide start_ARG ∥ bold_italic_ξ - bold_italic_u ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG } italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_u ) italic_p ( bold_italic_u ) roman_d bold_italic_u end_ARG start_ARG ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_u ) italic_p ( bold_italic_u ) roman_d bold_italic_u end_ARG . (B.2.112)

This looks like a Bayesian marginalization, so let us define the appropriate normalized density

q𝝃(𝒖)=φt(𝝃𝒖)p(𝒖)Dφt(𝝃𝒗)p(𝒗)d𝒗=φt(𝝃𝒖)p(𝒖)𝔼[φt(𝝃𝒙)]=φt(𝝃𝒖)p(𝒖)pt(𝝃)q_{\bm{\xi}}(\bm{u})=\frac{\varphi_{t}(\bm{\xi}-\bm{u})p(\bm{u})}{\int_{\mathbb{R}^{D}}\varphi_{t}(\bm{\xi}-\bm{v})p(\bm{v})\mathrm{d}\bm{v}}=\frac{\varphi_{t}(\bm{\xi}-\bm{u})p(\bm{u})}{\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})]}=\frac{\varphi_{t}(\bm{\xi}-\bm{u})p(\bm{u})}{p_{t}(\bm{\xi})}italic_q start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ( bold_italic_u ) = divide start_ARG italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_u ) italic_p ( bold_italic_u ) end_ARG start_ARG ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_v ) italic_p ( bold_italic_v ) roman_d bold_italic_v end_ARG = divide start_ARG italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_u ) italic_p ( bold_italic_u ) end_ARG start_ARG blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] end_ARG = divide start_ARG italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_u ) italic_p ( bold_italic_u ) end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG (B.2.113)

Then, defining 𝒚𝝃q𝝃\bm{y}_{\bm{\xi}}\sim q_{\bm{\xi}}bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ∼ italic_q start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT, we can write

Δpt(𝝃)pt(𝝃)=D{𝝃𝒖22Dt2t4}q𝝃(𝒖)d𝒖=1t4𝔼[𝝃𝒚𝝃22]Dt2.\frac{\Delta p_{t}(\bm{\xi})}{p_{t}(\bm{\xi})}=\int_{\mathbb{R}^{D}}\left\{\frac{\|\bm{\xi}-\bm{u}\|_{2}^{2}-Dt^{2}}{t^{4}}\right\}q_{\bm{\xi}}(\bm{u})\mathrm{d}\bm{u}=\frac{1}{t^{4}}\operatorname{\mathbb{E}}[\|\bm{\xi}-\bm{y}_{\bm{\xi}}\|_{2}^{2}]-\frac{D}{t^{2}}.divide start_ARG roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG = ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { divide start_ARG ∥ bold_italic_ξ - bold_italic_u ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG } italic_q start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ( bold_italic_u ) roman_d bold_italic_u = divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG blackboard_E [ ∥ bold_italic_ξ - bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (B.2.114)

Similarly, writing out the second term (non-squared) we obtain

pt(𝝃)pt(𝝃)=𝝃𝔼[𝒚𝝃]t2.\frac{\nabla p_{t}(\bm{\xi})}{p_{t}(\bm{\xi})}=-\frac{\bm{\xi}-\operatorname{\mathbb{E}}[\bm{y}_{\bm{\xi}}]}{t^{2}}.divide start_ARG ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG = - divide start_ARG bold_italic_ξ - blackboard_E [ bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ] end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (B.2.115)

Letting 𝒛𝝃𝒚𝝃𝝃\bm{z}_{\bm{\xi}}\doteq\bm{y}_{\bm{\xi}}-\bm{\xi}bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ≐ bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT - bold_italic_ξ, it holds

Δpt(𝝃)pt(𝝃)=𝔼[𝒛𝝃22]t4Dt2,pt(𝝃)pt(𝝃)=𝔼[𝒛𝝃]t2.\frac{\Delta p_{t}(\bm{\xi})}{p_{t}(\bm{\xi})}=\frac{\operatorname{\mathbb{E}}[\|\bm{z}_{\bm{\xi}}\|_{2}^{2}]}{t^{4}}-\frac{D}{t^{2}},\qquad\frac{\nabla p_{t}(\bm{\xi})}{p_{t}(\bm{\xi})}=\frac{\operatorname{\mathbb{E}}[\bm{z}_{\bm{\xi}}]}{t^{2}}.divide start_ARG roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG = divide start_ARG blackboard_E [ ∥ bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , divide start_ARG ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) end_ARG = divide start_ARG blackboard_E [ bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ] end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (B.2.116)

Thus writing Δlogpt\Delta\log p_{t}roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT out fully, we have

Δlogpt(𝝃)\displaystyle\Delta\log p_{t}(\bm{\xi})roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =𝔼[𝒛𝝃22]t4Dt2𝔼[𝒛𝝃]22t4\displaystyle=\frac{\operatorname{\mathbb{E}}[\|\bm{z}_{\bm{\xi}}\|_{2}^{2}]}{t^{4}}-\frac{D}{t^{2}}-\frac{\|\operatorname{\mathbb{E}}[\bm{z}_{\bm{\xi}}]\|_{2}^{2}}{t^{4}}= divide start_ARG blackboard_E [ ∥ bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG - divide start_ARG ∥ blackboard_E [ bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ] ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG (B.2.117)
=𝔼[𝒛𝝃22]𝔼[𝒛𝝃]22t4Dt2\displaystyle=\frac{\operatorname{\mathbb{E}}[\|\bm{z}_{\bm{\xi}}\|_{2}^{2}]-\|\operatorname{\mathbb{E}}[\bm{z}_{\bm{\xi}}]\|_{2}^{2}}{t^{4}}-\frac{D}{t^{2}}= divide start_ARG blackboard_E [ ∥ bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] - ∥ blackboard_E [ bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ] ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG (B.2.118)
=tr(Cov(𝒛𝝃))t4Dt2\displaystyle=\frac{\operatorname{tr}(\operatorname{Cov}(\bm{z}_{\bm{\xi}}))}{t^{4}}-\frac{D}{t^{2}}= divide start_ARG roman_tr ( roman_Cov ( bold_italic_z start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ) ) end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG (B.2.119)
=tr(Cov(𝒚𝝃))t4Dt2.\displaystyle=\frac{\operatorname{tr}(\operatorname{Cov}(\bm{y}_{\bm{\xi}}))}{t^{4}}-\frac{D}{t^{2}}.= divide start_ARG roman_tr ( roman_Cov ( bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ) ) end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (B.2.120)

A trivial lower bound on this trace is 0, since covariance matrices are positive semidefinite. To find an upper bound, note that 𝒚𝝃\bm{y}_{\bm{\xi}}bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT takes values only in the support of 𝒙\bm{x}bold_italic_x (since ppitalic_p is a factor of the density q𝝃q_{\bm{\xi}}italic_q start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT of 𝒚𝝃\bm{y}_{\bm{\xi}}bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT), which by Assumption B.1 is a compact set 𝒮\mathcal{S}caligraphic_S with radius Rsup𝝃D𝝃2R\doteq\sup_{\bm{\xi}\in\mathbb{R}^{D}}\|\bm{\xi}\|_{2}italic_R ≐ roman_sup start_POSTSUBSCRIPT bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. So

tr(Cov(𝒚𝝃))=𝔼[𝒚𝝃22]𝔼[𝒚𝝃]22𝔼[𝒚𝝃22]R2.\operatorname{tr}(\operatorname{Cov}(\bm{y}_{\bm{\xi}}))=\operatorname{\mathbb{E}}[\|\bm{y}_{\bm{\xi}}\|_{2}^{2}]-\|\operatorname{\mathbb{E}}[\bm{y}_{\bm{\xi}}]\|_{2}^{2}\leq\operatorname{\mathbb{E}}[\|\bm{y}_{\bm{\xi}}\|_{2}^{2}]\leq R^{2}.roman_tr ( roman_Cov ( bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ) ) = blackboard_E [ ∥ bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] - ∥ blackboard_E [ bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ] ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ blackboard_E [ ∥ bold_italic_y start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] ≤ italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (B.2.121)

Therefore

Dt2Δlogpt(𝝃)R2t4Dt2,-\frac{D}{t^{2}}\leq\Delta\log p_{t}(\bm{\xi})\leq\frac{R^{2}}{t^{4}}-\frac{D}{t^{2}},- divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ≤ roman_Δ roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ≤ divide start_ARG italic_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG - divide start_ARG italic_D end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , (B.2.122)

which shows the claim. ∎

Derivative Computations

Here we calculate some useful derivatives which will be reused throughout the appendix.

Proposition B.1.

Let 𝐱\bm{x}bold_italic_x be any random variable such that Assumptions B.1 and B.2 hold, and let (𝐱t)t[0,T](\bm{x}_{t})_{t\in[0,T]}( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_t ∈ [ 0 , italic_T ] end_POSTSUBSCRIPT be the stochastic process (B.2.1). For t0t\geq 0italic_t ≥ 0, let ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT be the density of 𝐱t\bm{x}_{t}bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. Then

ptt(𝝃)\displaystyle\frac{\partial p_{t}}{\partial t}(\bm{\xi})divide start_ARG ∂ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG ( bold_italic_ξ ) =𝔼[φt(𝝃𝒙)𝝃𝒙22Dt2t3]\displaystyle=\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\cdot\frac{\|\bm{\xi}-\bm{x}\|_{2}^{2}-Dt^{2}}{t^{3}}\right]= blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ⋅ divide start_ARG ∥ bold_italic_ξ - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ] (B.2.123)
pt(𝝃)\displaystyle\nabla p_{t}(\bm{\xi})∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =𝔼[φt(𝝃𝒙)𝝃𝒙t2]\displaystyle=-\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\cdot\frac{\bm{\xi}-\bm{x}}{t^{2}}\right]= - blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ⋅ divide start_ARG bold_italic_ξ - bold_italic_x end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] (B.2.124)
2pt(𝝃)\displaystyle\nabla^{2}p_{t}(\bm{\xi})∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =𝔼[φt(𝝃𝒙)(𝝃𝒙)(𝝃𝒙)t2𝑰t4]\displaystyle=\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\cdot\frac{(\bm{\xi}-\bm{x})(\bm{\xi}-\bm{x})^{\top}-t^{2}\bm{I}}{t^{4}}\right]= blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ⋅ divide start_ARG ( bold_italic_ξ - bold_italic_x ) ( bold_italic_ξ - bold_italic_x ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG ] (B.2.125)
Δpt(𝝃)\displaystyle\Delta p_{t}(\bm{\xi})roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =𝔼[φt(𝝃𝒙)𝝃𝒙22Dt2t4].\displaystyle=\operatorname{\mathbb{E}}\left[\varphi_{t}(\bm{\xi}-\bm{x})\cdot\frac{\|\bm{\xi}-\bm{x}\|_{2}^{2}-Dt^{2}}{t^{4}}\right].= blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ⋅ divide start_ARG ∥ bold_italic_ξ - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG ] . (B.2.126)
Proof.

We use the convolution representation of ptp_{t}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, namely (B.2.3). First taking the time derivative, a computation yields that Proposition B.3 applies,555We use ft(𝝃)=p(𝝃)φt(𝝃𝒙)f_{t}(\bm{\xi})=p(\bm{\xi})\varphi_{t}(\bm{\xi}-\bm{x})italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) = italic_p ( bold_italic_ξ ) italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ), noting that it is twice continuously differentiable in 𝝃\bm{\xi}bold_italic_ξ and (more than) twice continuously differentiable in ttitalic_t. Then to check the local integrability of ftf_{t}italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT we compute ftt(𝝃)=ft(𝝃)1t3(𝝃𝒙22Dt2)\frac{\partial f_{t}}{\partial t}(\bm{\xi})=f_{t}(\bm{\xi})\cdot\frac{1}{t^{3}}(\|\bm{\xi}-\bm{x}\|_{2}^{2}-Dt^{2})divide start_ARG ∂ italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG ( bold_italic_ξ ) = italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⋅ divide start_ARG 1 end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ( ∥ bold_italic_ξ - bold_italic_x ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ), which is is easy to check integrable over 𝝃\bm{\xi}bold_italic_ξ and t[tmin,tmax]t\in[t_{\min},t_{\max}]italic_t ∈ [ italic_t start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT ] where tmin>0t_{\min}>0italic_t start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT > 0. (Indeed, ftf_{t}italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT has exponentially decaying tails, so the quadratic term in the product is of no issue.) so we can bring the derivative inside the integral/expectation as:

ptt(𝝃)=t𝔼[φt(𝝃𝒙)]=𝔼[tφt(𝝃𝒙)]=φttp.\frac{\partial p_{t}}{\partial t}(\bm{\xi})=\frac{\partial}{\partial t}\operatorname{\mathbb{E}}[\varphi_{t}(\bm{\xi}-\bm{x})]=\operatorname{\mathbb{E}}\left[\frac{\partial}{\partial t}\varphi_{t}(\bm{\xi}-\bm{x})\right]=\frac{\partial\varphi_{t}}{\partial t}*p.divide start_ARG ∂ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG ( bold_italic_ξ ) = divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG blackboard_E [ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] = blackboard_E [ divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ - bold_italic_x ) ] = divide start_ARG ∂ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG ∗ italic_p . (B.2.127)

Meanwhile, by properties of convolutions (Proposition B.4) and using the fact that ppitalic_p is compactly supported (Assumption B.1),

pt=φtppt=φtp2pt=2φtpΔpt=Δφtp.p_{t}=\varphi_{t}*p\implies\nabla p_{t}=\nabla\varphi_{t}*p\implies\nabla^{2}p_{t}=\nabla^{2}\varphi_{t}*p\implies\Delta p_{t}=\Delta\varphi_{t}*p.italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∗ italic_p ⟹ ∇ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ∇ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∗ italic_p ⟹ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∗ italic_p ⟹ roman_Δ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = roman_Δ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∗ italic_p . (B.2.128)

The rest of the computation follows from Proposition B.2. ∎

Proposition B.2.

For t>0t>0italic_t > 0 and 𝛏D\bm{\xi}\in\mathbb{R}^{D}bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT it holds

tφt(𝝃)\displaystyle\frac{\partial}{\partial t}\varphi_{t}(\bm{\xi})divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =φt(𝝃)𝝃22Dt2t3\displaystyle=\varphi_{t}(\bm{\xi})\cdot\frac{\|\bm{\xi}\|_{2}^{2}-Dt^{2}}{t^{3}}= italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⋅ divide start_ARG ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG (B.2.129)
φt(𝝃)\displaystyle\nabla\varphi_{t}(\bm{\xi})∇ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =φt(𝝃)𝝃t2\displaystyle=-\varphi_{t}(\bm{\xi})\cdot\frac{\bm{\xi}}{t^{2}}= - italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⋅ divide start_ARG bold_italic_ξ end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG (B.2.130)
2φt(𝝃)\displaystyle\nabla^{2}\varphi_{t}(\bm{\xi})∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =φt(𝝃)𝝃𝝃t2𝑰t4\displaystyle=\varphi_{t}(\bm{\xi})\cdot\frac{\bm{\xi}\bm{\xi}^{\top}-t^{2}\bm{I}}{t^{4}}= italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⋅ divide start_ARG bold_italic_ξ bold_italic_ξ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT - italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_I end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG (B.2.131)
Δφt(𝝃)\displaystyle\Delta\varphi_{t}(\bm{\xi})roman_Δ italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) =φt(𝝃)𝝃22Dt2t4.\displaystyle=\varphi_{t}(\bm{\xi})\cdot\frac{\|\bm{\xi}\|_{2}^{2}-Dt^{2}}{t^{4}}.= italic_φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) ⋅ divide start_ARG ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_D italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG . (B.2.132)
Proof.

Direct computation. ∎

Differentiating Under the Integral Sign

In this appendix, we differentiate under the integral sign many times, and it is important to know when we can do this. There are two kinds of differentiating under the integral sign:

  1. 1.

    Differentiating an integral ft(𝝃)d𝝃\int f_{t}(\bm{\xi})\mathrm{d}\bm{\xi}∫ italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ with respect to the auxiliary parameter ttitalic_t.

  2. 2.

    Differentiating a convolution (fg)(𝝃)=f(𝝃)g(𝝃𝒖)du(f*g)(\bm{\xi})=\int f(\bm{\xi})g(\bm{\xi}-\bm{u})\mathrm{d}u( italic_f ∗ italic_g ) ( bold_italic_ξ ) = ∫ italic_f ( bold_italic_ξ ) italic_g ( bold_italic_ξ - bold_italic_u ) roman_d italic_u with respect to the variable 𝝃\bm{\xi}bold_italic_ξ.

For the first category, we give a concrete result, stated without proof but attributable to the linked source, which derives the following result as a special case of a more general theorem about the interaction of differential operators and tempered distributions, much beyond the scope of the book. A full formal reference can be found in [Jon82].

Proposition B.3 ([Jon82], Section 11.12).

Let f:(0,T)×Df\colon(0,T)\times\mathbb{R}^{D}\to\mathbb{R}italic_f : ( 0 , italic_T ) × blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → blackboard_R be such that:

  • ffitalic_f is a jointly measurable function of (t,𝝃)(t,\bm{\xi})( italic_t , bold_italic_ξ );

  • For Lebesgue-almost every 𝝃D\bm{\xi}\in\mathbb{R}^{D}bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT, the function tft(𝝃)t\mapsto f_{t}(\bm{\xi})italic_t ↦ italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) is absolutely continuous;

  • ftt\frac{\partial f_{t}}{\partial t}divide start_ARG ∂ italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG is locally integrable, i.e., for every [tmin,tmax](0,T)[t_{\min},t_{\max}]\subseteq(0,T)[ italic_t start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT ] ⊆ ( 0 , italic_T ) it holds

    tmintmaxD|ftt(𝝃)|d𝝃<.\int_{t_{\min}}^{t_{\max}}\int_{\mathbb{R}^{D}}\left\lvert\frac{\partial f_{t}}{\partial t}(\bm{\xi})\right\rvert\mathrm{d}\bm{\xi}<\infty.∫ start_POSTSUBSCRIPT italic_t start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | divide start_ARG ∂ italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG ( bold_italic_ξ ) | roman_d bold_italic_ξ < ∞ . (B.2.133)

Then tDft(𝛏)d𝛏t\mapsto\int_{\mathbb{R}^{D}}f_{t}(\bm{\xi})\mathrm{d}\bm{\xi}italic_t ↦ ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ is an absolutely continuous function on (0,T)(0,T)( 0 , italic_T ), and its derivative is

ddtDft(𝝃)d𝝃=Dtft(𝝃)d𝝃,\frac{\mathrm{d}}{\mathrm{d}t}\int_{\mathbb{R}^{D}}f_{t}(\bm{\xi})\mathrm{d}\bm{\xi}=\int_{\mathbb{R}^{D}}\frac{\partial}{\partial t}f_{t}(\bm{\xi})\mathrm{d}\bm{\xi},divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ = ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_ξ ) roman_d bold_italic_ξ , (B.2.134)

defined for almost every t(0,T)t\in(0,T)italic_t ∈ ( 0 , italic_T ).

For the second category, we give another concrete result, stated without proof but fully formalized in [BB11].

Proposition B.4 ([BB11], Proposition 4.20).

Let ffitalic_f be kkitalic_k-times continuously differentiable with compact support, and let ggitalic_g be locally integrable. Then the convolution fgf*gitalic_f ∗ italic_g defined by

(fg)(𝝃)Df(𝒖)g(𝝃𝒖)d𝒖(f*g)(\bm{\xi})\doteq\int_{\mathbb{R}^{D}}f(\bm{u})g(\bm{\xi}-\bm{u})\mathrm{d}\bm{u}( italic_f ∗ italic_g ) ( bold_italic_ξ ) ≐ ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_f ( bold_italic_u ) italic_g ( bold_italic_ξ - bold_italic_u ) roman_d bold_italic_u (B.2.135)

is kkitalic_k-times continuously differentiable, and its derivative of order kkitalic_k is

k(fg)=(kf)g.\nabla^{k}(f*g)=(\nabla^{k}f)*g.∇ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ( italic_f ∗ italic_g ) = ( ∇ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_f ) ∗ italic_g . (B.2.136)

Although not in the book, a simple integration by parts argument shows that if ggitalic_g is also kkitalic_k-times differentiable, then we can “trade off” the regularity:

k(fg)=f(kg).\nabla^{k}(f*g)=f*(\nabla^{k}g).∇ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ( italic_f ∗ italic_g ) = italic_f ∗ ( ∇ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_g ) . (B.2.137)

B.3 Lossy Coding and Sphere Packing

In this section, we prove Theorem 3.6. Following our conventions throughout this appendix, we write 𝒮=Supp(𝒙)\mathcal{S}=\operatorname{Supp}(\bm{x})caligraphic_S = roman_Supp ( bold_italic_x ) for the compact support of the random variable 𝒙\bm{x}bold_italic_x.

As foreshadowed, we will make a regularity assumption on the support set 𝒮\mathcal{S}caligraphic_S to prove Theorem 3.6. One possibility for proceeding under minimal assumptions would be to instantiate the results of [RBK18, RKB23] in our setting, since these results apply to sets 𝒮\mathcal{S}caligraphic_S with very low regularity (e.g., Cantor-like sets with fractal structure). However, we have found precisely computing the constants in these results, a necessary endeavor to assert a conclusion like Theorem 3.6, to be somewhat onerous in our setting. Our approach is therefore to add a geometric regularity assumption on the set 𝒮\mathcal{S}caligraphic_S that sacrifices some generality, but allows us to develop a more transparent argument. To avoid sacrificing too much generality, we must ensure that low-dimensionality in the set 𝒮\mathcal{S}caligraphic_S is not prohibited. We therefore consider the running example we have used throughout the book, the mixture of low-rank Gaussian distributions. In this geometric setting, we will enforce this via assuming that 𝒮\mathcal{S}caligraphic_S is a union of hyperspheres, which is equivalent to the Gaussian assumption in high dimensions with overwhelming probability.

Assumption B.3.

The support 𝒮D\mathcal{S}\subset\mathbb{R}^{D}caligraphic_S ⊂ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT of the random variable 𝒙\bm{x}bold_italic_x is a finite union of KKitalic_K spheres, each with dimension dkd_{k}italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, k[K]k\in[K]italic_k ∈ [ italic_K ]. The probability that 𝒙\bm{x}bold_italic_x is drawn from the kkitalic_k-th sphere is given by πk[0,1]\pi_{k}\in[0,1]italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ [ 0 , 1 ], and conditional on being drawn from the kkitalic_k-th sphere, 𝒙\bm{x}bold_italic_x is uniformly distributed on that sphere. The supports satisfy that each sphere is mutually orthogonal with all others.

We proceed under the simplifying Assumption B.3 in order to simplify excessive technicality, and to connect to an important running example used throughout the monograph. We believe our results can be generalized to support 𝒮\mathcal{S}caligraphic_S from the class of sets with positive reach with additional technical effort, but leave this for the future.

B.3.1 Proof of Relationship Between Rate Distortion and Covering

We briefly sketch the proof, then proceed to establishing three fundamental lemmas, then give the proof. The proof will depend on notions introduced in the sketch below.

Obtaining an upper bound on the rate distortion function (3.3.3) is straightforward: by the rate characterization (i.e., the rate distortion function is the minimum rate of a code for 𝒙\bm{x}bold_italic_x with expected squared 2\ell^{2}roman_ℓ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT distortion ϵ\epsilonitalic_ϵ), upper bounding ϵ(𝒙)\mathcal{R}_{\epsilon}(\bm{x})caligraphic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) only requires demonstrating one code for 𝒙\bm{x}bold_italic_x that achieves this target distortion, and any ϵ\epsilonitalic_ϵ-covering of Supp(𝒙)\operatorname{Supp}(\bm{x})roman_Supp ( bold_italic_x ) achieves this, with rate equal to the base-222 logarithm of the cardinality of the covering. The lower bound is more subtle. We make use of the Shannon lower bound, discussed in Remark 3.7: working out the constants in [LZ94, §III, (22)] gives a more precise version of the result quoted in Equation 3.3.10 (in bits, of course): for any random variable 𝒙\bm{x}bold_italic_x with compact support and a density, it holds

ϵ(𝒙)h(𝒙)logvol(Bϵ)+log(2DΓ(D/2)(D2e)D/2),\mathcal{R}_{\epsilon}(\bm{x})\geq h(\bm{x})-\log\operatorname{vol}(B_{\epsilon})+\log\left(\frac{2}{D\Gamma(D/2)}\left(\frac{D}{2e}\right)^{D/2}\right),caligraphic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) ≥ italic_h ( bold_italic_x ) - roman_log roman_vol ( italic_B start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ) + roman_log ( divide start_ARG 2 end_ARG start_ARG italic_D roman_Γ ( italic_D / 2 ) end_ARG ( divide start_ARG italic_D end_ARG start_ARG 2 italic_e end_ARG ) start_POSTSUPERSCRIPT italic_D / 2 end_POSTSUPERSCRIPT ) , (B.3.1)

with entropy (etc.) in nats in this expression. The constant can be easily estimated using Stirling’s approximation. A quantitative form of Stirling’s approximation which is often useful gives for any x>0x>0italic_x > 0 [Jam15]

Γ(x)2πxx1/2exe1/(12x).\Gamma(x)\leq\sqrt{2\pi}x^{x-1/2}e^{-x}e^{1/(12x)}.roman_Γ ( italic_x ) ≤ square-root start_ARG 2 italic_π end_ARG italic_x start_POSTSUPERSCRIPT italic_x - 1 / 2 end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_x end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT 1 / ( 12 italic_x ) end_POSTSUPERSCRIPT . (B.3.2)

We will apply this bound to Γ(D/2)\Gamma(D/2)roman_Γ ( italic_D / 2 ) in Equation B.3.1. We get

log(2DΓ(D/2)(D2e)D/2)\displaystyle\log\left(\frac{2}{D\Gamma(D/2)}\left(\frac{D}{2e}\right)^{D/2}\right)roman_log ( divide start_ARG 2 end_ARG start_ARG italic_D roman_Γ ( italic_D / 2 ) end_ARG ( divide start_ARG italic_D end_ARG start_ARG 2 italic_e end_ARG ) start_POSTSUPERSCRIPT italic_D / 2 end_POSTSUPERSCRIPT ) 16D+log(2D(D2e)D/2D4π(D2e)D/2)\displaystyle\geq-\frac{1}{6D}+\log\left(\frac{2}{D}\left(\frac{D}{2e}\right)^{D/2}\cdot\sqrt{\frac{D}{4\pi}}\left(\frac{D}{2e}\right)^{-D/2}\right)≥ - divide start_ARG 1 end_ARG start_ARG 6 italic_D end_ARG + roman_log ( divide start_ARG 2 end_ARG start_ARG italic_D end_ARG ( divide start_ARG italic_D end_ARG start_ARG 2 italic_e end_ARG ) start_POSTSUPERSCRIPT italic_D / 2 end_POSTSUPERSCRIPT ⋅ square-root start_ARG divide start_ARG italic_D end_ARG start_ARG 4 italic_π end_ARG end_ARG ( divide start_ARG italic_D end_ARG start_ARG 2 italic_e end_ARG ) start_POSTSUPERSCRIPT - italic_D / 2 end_POSTSUPERSCRIPT ) (B.3.3)
=16D12logDπ,\displaystyle=-\frac{1}{6D}-\frac{1}{2}\log D\pi,= - divide start_ARG 1 end_ARG start_ARG 6 italic_D end_ARG - divide start_ARG 1 end_ARG start_ARG 2 end_ARG roman_log italic_D italic_π , (B.3.4)

which we can take for the explicit value of the constant CDC_{D}italic_C start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT in Equation 3.3.10. Summarizing the fully quantified Shannon lower bound (in bits):

ϵ(𝒙)h(𝒙)log2vol(Bϵ)O(logD).\mathcal{R}_{\epsilon}(\bm{x})\geq h(\bm{x})-\log_{2}\operatorname{vol}(B_{\epsilon})-O(\log D).caligraphic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) ≥ italic_h ( bold_italic_x ) - roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_vol ( italic_B start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ) - italic_O ( roman_log italic_D ) . (B.3.5)

Now, the important constraint for our current purposes is that the Shannon lower bound requires the random variable 𝒙\bm{x}bold_italic_x to have a density, which rules out many low-dimensional distributions of interest. But let us momentarily consider the situation when 𝒙\bm{x}bold_italic_x does admit a density. The assumption that 𝒙\bm{x}bold_italic_x is uniformly distributed on its support is easily formalized in this setting: for any Borel set A𝒮A\subset\mathcal{S}italic_A ⊂ caligraphic_S, we have

[𝒙A]=A1vol(𝒮)d𝒙.\operatorname{\mathbb{P}}[\bm{x}\in A]=\int_{A}\frac{1}{\operatorname{vol}(\mathcal{S})}\mathrm{d}\bm{x}.blackboard_P [ bold_italic_x ∈ italic_A ] = ∫ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG roman_vol ( caligraphic_S ) end_ARG roman_d bold_italic_x . (B.3.6)

Then the entropy h(𝒙)h(\bm{x})italic_h ( bold_italic_x ) is just

h(𝒙)=log2vol(𝒮).h(\bm{x})=\log_{2}\operatorname{vol}(\mathcal{S}).italic_h ( bold_italic_x ) = roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_vol ( caligraphic_S ) . (B.3.7)

The proof then concludes with a lemma that relates the ratio vol(𝒮)/vol(Bϵ)\operatorname{vol}(\mathcal{S})/\operatorname{vol}(B_{\epsilon})roman_vol ( caligraphic_S ) / roman_vol ( italic_B start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ) to the ϵ\epsilonitalic_ϵ-covering number of 𝒮\mathcal{S}caligraphic_S by ϵ\epsilonitalic_ϵ balls.

To extend the program above to degenerate distributions satisfying Assumption B.3, our proof of the lower bound in Theorem 3.6 will leverage an approximation argument of the actual low-dimensional distribution 𝒙\bm{x}bold_italic_x by “nearby” distributions which have densities, similarly but not exactly the same as the proof sketch preceding Theorem B.1. We will then link the parameter introduced in the approximating sequence to the distortion parameter ϵ\epsilonitalic_ϵ in order to obtain the desired conclusion in Theorem 3.6.

Definition B.1.

Let 𝒮\mathcal{S}caligraphic_S be a compact set. For any δ>0\delta>0italic_δ > 0, define the δ\deltaitalic_δ-thickening of 𝒮\mathcal{S}caligraphic_S, denoted 𝒮δ\mathcal{S}_{\delta}caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT, by

𝒮δ={𝝃Ddist(𝝃,𝒮)δ}.\mathcal{S}_{\delta}=\left\{\bm{\xi}\in\mathbb{R}^{D}\mid\mathrm{dist}(\bm{\xi},\mathcal{S})\leq\delta\right\}.caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT = { bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT ∣ roman_dist ( bold_italic_ξ , caligraphic_S ) ≤ italic_δ } . (B.3.8)

The distance function referenced in Definition B.1 is defined by

dist(𝝃,𝒮)=inf𝝃𝒮𝝃𝝃2.\operatorname{dist}(\bm{\xi},\mathcal{S})=\inf_{\bm{\xi}^{\prime}\in\mathcal{S}}\left\|\bm{\xi}-\bm{\xi}^{\prime}\right\|_{2}.roman_dist ( bold_italic_ξ , caligraphic_S ) = roman_inf start_POSTSUBSCRIPT bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_S end_POSTSUBSCRIPT ∥ bold_italic_ξ - bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT . (B.3.9)

For a compact set 𝒮\mathcal{S}caligraphic_S, Weierstrass’s theorem implies that for any 𝝃D\bm{\xi}\in\mathbb{R}^{D}bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT, there is always some 𝝃𝒮\bm{\xi}^{\prime}\in\mathcal{S}bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_S attaining the infimum in the distance function. Compactness of 𝒮δ\mathcal{S}_{\delta}caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT follows readily from compactness of 𝒮\mathcal{S}caligraphic_S, so vol(𝒮δ)\operatorname{vol}(\mathcal{S}_{\delta})roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) is finite for any δ>0\delta>0italic_δ > 0. It is then possible to make the following definition of a thickened random variable, specialized to Assumption B.3.

Definition B.2.

Let 𝒙\bm{x}bold_italic_x be a random variable such that Supp(𝒙)=𝒮\operatorname{Supp}(\bm{x})=\mathcal{S}roman_Supp ( bold_italic_x ) = caligraphic_S is a union of KKitalic_K hyperspheres, distributed as in Assumption B.3. Denote the support of each component of the mixture by 𝒮k\mathcal{S}_{k}caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. Define the thickened random variable 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT as the mixture of measures where each component measure is uniform on the thickened set 𝒮k,δ\mathcal{S}_{k,\delta}caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT (Definition B.1), for k[K]k\in[K]italic_k ∈ [ italic_K ], with mixing weights πk\pi_{k}italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.

Lemma B.6.

Suppose the random variable 𝐱\bm{x}bold_italic_x satisfies Assumption B.3. Then if 0<δ<120<\delta<\tfrac{1}{2}0 < italic_δ < divide start_ARG 1 end_ARG start_ARG 2 end_ARG, the thickened random variable 𝐱δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT (Definition B.2) satisfies for any ϵ>0\epsilon>0italic_ϵ > 0

Rδ+ϵ(𝒙δ)Rϵ(𝒙).R_{\delta+\epsilon}(\bm{x}_{\delta})\leq R_{\epsilon}(\bm{x}).italic_R start_POSTSUBSCRIPT italic_δ + italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ≤ italic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) . (B.3.10)

The proof of Lemma B.6 is deferred to Section B.3.2. Using Lemma B.6, the above program can be realized, because the random variable 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT has a density that is uniform with respect to the Lebesgue measure.

(Proof of Theorem 3.6).

The upper bound is readily shown. If SSitalic_S is any ϵ\epsilonitalic_ϵ-cover of the support of 𝒙\bm{x}bold_italic_x with cardinality 𝒩ϵ(Supp(𝒙))\mathcal{N}_{\epsilon}(\operatorname{Supp}(\bm{x}))caligraphic_N start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( roman_Supp ( bold_italic_x ) ), then consider the coding scheme assigning to each 𝝃Supp(𝒙)\bm{\xi}\in\operatorname{Supp}(\bm{x})bold_italic_ξ ∈ roman_Supp ( bold_italic_x ) the reconstruction 𝝃^=argmin𝝃S𝝃𝝃2\hat{\bm{\xi}}=\operatorname*{arg\ min}_{\bm{\xi}^{\prime}\in S}\,\|\bm{\xi}-\bm{\xi}^{\prime}\|_{2}over^ start_ARG bold_italic_ξ end_ARG = start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ italic_S end_POSTSUBSCRIPT ∥ bold_italic_ξ - bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, with ties broken arbitrarily. Then ties occur with probability zero, and the fact that SSitalic_S covers Supp(𝒙)\operatorname{Supp}(\bm{x})roman_Supp ( bold_italic_x ) at scale ϵ\epsilonitalic_ϵ guarantees distortion no larger than ϵ\epsilonitalic_ϵ; the rate of this scheme is log2𝒩ϵ(Supp(𝒙))\log_{2}\mathcal{N}_{\epsilon}(\operatorname{Supp}(\bm{x}))roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT caligraphic_N start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( roman_Supp ( bold_italic_x ) ).

For the lower bound, let 0<δ<120<\delta<\tfrac{1}{2}0 < italic_δ < divide start_ARG 1 end_ARG start_ARG 2 end_ARG, and consider the thickened random variable 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT. By Lemma B.6, we have

Rδ+ϵ(𝒙δ)Rϵ(𝒙).R_{\delta+\epsilon}(\bm{x}_{\delta})\leq R_{\epsilon}(\bm{x}).italic_R start_POSTSUBSCRIPT italic_δ + italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ≤ italic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) . (B.3.11)

Since 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT has a Lebesgue density that is uniform, we can then apply the Shannon lower bound, in the form (B.3.5), to get

log2vol(Supp(𝒙δ))log2vol(Bδ+ϵ)O(logD)Rϵ(𝒙).\log_{2}\operatorname{vol}(\operatorname{Supp}(\bm{x}_{\delta}))-\log_{2}\operatorname{vol}(B_{\delta+\epsilon})-O(\log D)\leq R_{\epsilon}(\bm{x}).roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_vol ( roman_Supp ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) - roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_vol ( italic_B start_POSTSUBSCRIPT italic_δ + italic_ϵ end_POSTSUBSCRIPT ) - italic_O ( roman_log italic_D ) ≤ italic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) . (B.3.12)

Finally, we need to lower bound the ratio

vol(Supp(𝒙δ))vol(Bδ+ϵ)\frac{\operatorname{vol}(\operatorname{Supp}(\bm{x}_{\delta}))}{\operatorname{vol}(B_{\delta+\epsilon})}divide start_ARG roman_vol ( roman_Supp ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) end_ARG start_ARG roman_vol ( italic_B start_POSTSUBSCRIPT italic_δ + italic_ϵ end_POSTSUBSCRIPT ) end_ARG (B.3.13)

in terms of the covering number. Since Supp(𝒙δ)=Supp(𝒙)+Bδ\operatorname{Supp}(\bm{x}_{\delta})=\operatorname{Supp}(\bm{x})+B_{\delta}roman_Supp ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) = roman_Supp ( bold_italic_x ) + italic_B start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT, where +++ here denotes the Minkowski sum, a standard application of volume bound arguments (see e.g. [Ver18, Proposition 4.2.12]) gives

vol(Supp(𝒙δ))𝒩2δ(Supp(𝒙))vol(Bδ).\operatorname{vol}(\operatorname{Supp}(\bm{x}_{\delta}))\geq\mathcal{N}_{2\delta}(\operatorname{Supp}(\bm{x}))\operatorname{vol}(B_{\delta}).roman_vol ( roman_Supp ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) ≥ caligraphic_N start_POSTSUBSCRIPT 2 italic_δ end_POSTSUBSCRIPT ( roman_Supp ( bold_italic_x ) ) roman_vol ( italic_B start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) . (B.3.14)

Hence

vol(Supp(𝒙δ))vol(Bδ+ϵ)\displaystyle\frac{\operatorname{vol}(\operatorname{Supp}(\bm{x}_{\delta}))}{\operatorname{vol}(B_{\delta+\epsilon})}divide start_ARG roman_vol ( roman_Supp ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) end_ARG start_ARG roman_vol ( italic_B start_POSTSUBSCRIPT italic_δ + italic_ϵ end_POSTSUBSCRIPT ) end_ARG 𝒩2δ(Supp(𝒙))vol(Bδ)vol(Bδ+ϵ)\displaystyle\geq\mathcal{N}_{2\delta}(\operatorname{Supp}(\bm{x}))\frac{\operatorname{vol}(B_{\delta})}{\operatorname{vol}(B_{\delta+\epsilon})}≥ caligraphic_N start_POSTSUBSCRIPT 2 italic_δ end_POSTSUBSCRIPT ( roman_Supp ( bold_italic_x ) ) divide start_ARG roman_vol ( italic_B start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) end_ARG start_ARG roman_vol ( italic_B start_POSTSUBSCRIPT italic_δ + italic_ϵ end_POSTSUBSCRIPT ) end_ARG (B.3.15)
=𝒩2δ(Supp(𝒙))(δδ+ϵ)D.\displaystyle=\mathcal{N}_{2\delta}(\operatorname{Supp}(\bm{x}))\left(\frac{\delta}{\delta+\epsilon}\right)^{D}.= caligraphic_N start_POSTSUBSCRIPT 2 italic_δ end_POSTSUBSCRIPT ( roman_Supp ( bold_italic_x ) ) ( divide start_ARG italic_δ end_ARG start_ARG italic_δ + italic_ϵ end_ARG ) start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT . (B.3.16)

Choosing δ=ϵ/2\delta=\epsilon/2italic_δ = italic_ϵ / 2 gives from the Shannon lower bound (B.3.12) and the above estimates

log2𝒩ϵ(Supp(𝒙))O(D)Rϵ(𝒙),\log_{2}\mathcal{N}_{\epsilon}(\operatorname{Supp}(\bm{x}))-O(D)\leq R_{\epsilon}(\bm{x}),roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT caligraphic_N start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( roman_Supp ( bold_italic_x ) ) - italic_O ( italic_D ) ≤ italic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) , (B.3.17)

as was to be shown.

B.3.2 Proof of Lemma B.6

(Proof of Lemma B.6).

It suffices to show that any code for 𝒙\bm{x}bold_italic_x with expected squared distortion ϵ2\epsilon^{2}italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT produces a code for 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT with the same rate and distortion not much larger, for a suitable choice of δ\deltaitalic_δ. So fix such a code for 𝒙\bm{x}bold_italic_x, achieving rate RRitalic_R and expected squared distortion ϵ2\epsilon^{2}italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. We write 𝒙^\hat{\bm{x}}over^ start_ARG bold_italic_x end_ARG for the reconstructed random variable using this code, and q:Supp(𝒙)Supp(𝒙)\mathrm{q}:\operatorname{Supp}(\bm{x})\to\operatorname{Supp}(\bm{x})roman_q : roman_Supp ( bold_italic_x ) → roman_Supp ( bold_italic_x ) for the associated encoding-decoding mapping (i.e., 𝒙^=q(𝒙)\hat{\bm{x}}=\mathrm{q}(\bm{x})over^ start_ARG bold_italic_x end_ARG = roman_q ( bold_italic_x )).

Now let 𝒮k\mathcal{S}_{k}caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT denote the kkitalic_k-th hypersphere in the support of 𝒙\bm{x}bold_italic_x. There is an orthonormal basis 𝑼kD×dk\bm{U}_{k}\in\mathbb{R}^{D\times d_{k}}bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_D × italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT such that Span(𝒮k)=Span(𝑼k)\operatorname{Span}(\mathcal{S}_{k})=\operatorname{Span}(\bm{U}_{k})roman_Span ( caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = roman_Span ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ). The following orthogonal decomposition of the support set 𝒮\mathcal{S}caligraphic_S will be used repeatedly throughout the proof. We have

𝒮δ\displaystyle\mathcal{S}_{\delta}caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ={𝝃Dk[K]:dist(𝝃,𝒮k)δ}\displaystyle=\{\bm{\xi}\in\mathbb{R}^{D}\mid\exists k\in[K]\>:\>\operatorname{dist}(\bm{\xi},\mathcal{S}_{k})\leq\delta\}= { bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT ∣ ∃ italic_k ∈ [ italic_K ] : roman_dist ( bold_italic_ξ , caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ≤ italic_δ } (B.3.18)
=k[K]{𝝃Ddist(𝝃,𝒮k)δ}.\displaystyle=\bigcup_{k\in[K]}\{\bm{\xi}\in\mathbb{R}^{D}\mid\operatorname{dist}(\bm{\xi},\mathcal{S}_{k})\leq\delta\}.= ⋃ start_POSTSUBSCRIPT italic_k ∈ [ italic_K ] end_POSTSUBSCRIPT { bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT ∣ roman_dist ( bold_italic_ξ , caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ≤ italic_δ } . (B.3.19)

By orthogonal projection, for any k[K]k\in[K]italic_k ∈ [ italic_K ] any 𝝃D\bm{\xi}\in\mathbb{R}^{D}bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT can be written as 𝝃=𝝃+𝝃\bm{\xi}=\bm{\xi}^{\|}+\bm{\xi}^{\perp}bold_italic_ξ = bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT + bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT, with 𝝃Span(𝒮k)\bm{\xi}^{\|}\in\operatorname{Span}(\mathcal{S}_{k})bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∈ roman_Span ( caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) and 𝝃,𝝃=0\langle\bm{\xi}^{\|},\bm{\xi}^{\perp}\rangle=0⟨ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT , bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ⟩ = 0. Then for any 𝝃𝒮k\bm{\xi}^{\prime}\in\mathcal{S}_{k}bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, we have

𝝃𝝃22=𝝃+𝝃𝝃22\displaystyle\left\|\bm{\xi}-\bm{\xi}^{\prime}\right\|_{2}^{2}=\left\|\bm{\xi}^{\|}+\bm{\xi}^{\perp}-\bm{\xi}^{\prime}\right\|_{2}^{2}∥ bold_italic_ξ - bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT + bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT - bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT =𝝃22+𝝃22+𝝃222𝝃,𝝃\displaystyle=\left\|\bm{\xi}^{\|}\right\|_{2}^{2}+\left\|\bm{\xi}^{\perp}\right\|_{2}^{2}+\left\|\bm{\xi}^{\prime}\right\|_{2}^{2}-2\left\langle\bm{\xi}^{\|},\bm{\xi}^{\prime}\right\rangle= ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 ⟨ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT , bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟩ (B.3.20)
𝝃22+𝝃222𝝃,𝝃\displaystyle\geq\left\|\bm{\xi}^{\|}\right\|_{2}^{2}+\left\|\bm{\xi}^{\prime}\right\|_{2}^{2}-2\left\langle\bm{\xi}^{\|},\bm{\xi}^{\prime}\right\rangle≥ ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 ⟨ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT , bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟩ (B.3.21)
=𝝃𝝃22.\displaystyle=\left\|\bm{\xi}^{\|}-\bm{\xi}^{\prime}\right\|_{2}^{2}.= ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT - bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (B.3.22)

Further, it is known that for any nonzero 𝝃Span(𝒮k)\bm{\xi}^{\|}\in\operatorname{Span}(\mathcal{S}_{k})bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∈ roman_Span ( caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ),

inf𝝃𝒮k𝝃𝝃22=𝝃𝝃𝝃222.\inf_{\bm{\xi}^{\prime}\in\mathcal{S}_{k}}\,\left\|\bm{\xi}^{\|}-\bm{\xi}^{\prime}\right\|_{2}^{2}=\left\|\bm{\xi}^{\|}-\frac{\bm{\xi}^{\|}}{\|\bm{\xi}^{\|}\|_{2}}\right\|_{2}^{2}.roman_inf start_POSTSUBSCRIPT bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT - bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT - divide start_ARG bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT end_ARG start_ARG ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (B.3.23)

If 𝝃\bm{\xi}^{\|}bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT is zero, it is clear that the above distance is equal to 111 for every 𝝃𝒮k\bm{\xi}^{\prime}\in\mathcal{S}_{k}bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. Hence, if we define a projection mapping π𝒮k(𝝃)\pi_{\mathcal{S}_{k}}(\bm{\xi})italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_italic_ξ ) by

πSk(𝝃)=𝑼k𝑼k𝝃𝑼k𝝃2\pi_{S_{k}}(\bm{\xi})=\frac{\bm{U}_{k}\bm{U}_{k}^{\top}\bm{\xi}}{\|\bm{U}_{k}^{\top}\bm{\xi}\|_{2}}italic_π start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_italic_ξ ) = divide start_ARG bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ end_ARG start_ARG ∥ bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG (B.3.24)

for any 𝝃D\bm{\xi}\in\mathbb{R}^{D}bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT with 𝑼k𝝃𝟎\bm{U}_{k}^{\top}\bm{\xi}\neq\mathbf{0}bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ ≠ bold_0, then π𝒮k(𝝃)=argmin𝝃𝒮k𝝃𝝃2\pi_{\mathcal{S}_{k}}(\bm{\xi})=\operatorname*{arg\ min}_{\bm{\xi}^{\prime}\in\mathcal{S}_{k}}\left\|\bm{\xi}^{\prime}-\bm{\xi}\right\|_{2}italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_italic_ξ ) = start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. We choose 0<δ<10<\delta<10 < italic_δ < 1, so that the thickened set 𝒮δ\mathcal{S}_{\delta}caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT contains no points 𝝃D\bm{\xi}\in\mathbb{R}^{D}bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT at which any of the projection maps π𝒮k\pi_{\mathcal{S}_{k}}italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT is not well-defined. So the thickened set 𝒮δ\mathcal{S}_{\delta}caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT satisfies

𝒮δ\displaystyle\mathcal{S}_{\delta}caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT =k[K]{𝝃D𝝃𝑼k𝑼k𝝃𝑼k𝝃22δ}.\displaystyle=\bigcup_{k\in[K]}\left\{\bm{\xi}\in\mathbb{R}^{D}\mid\left\|\bm{\xi}-\frac{\bm{U}_{k}\bm{U}_{k}^{\top}\bm{\xi}}{\|\bm{U}_{k}^{\top}\bm{\xi}\|_{2}}\right\|_{2}\leq\delta\right\}.= ⋃ start_POSTSUBSCRIPT italic_k ∈ [ italic_K ] end_POSTSUBSCRIPT { bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT ∣ ∥ bold_italic_ξ - divide start_ARG bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ end_ARG start_ARG ∥ bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ italic_δ } . (B.3.25)

These distances can be rewritten in terms of the orthogonal decomposition as

𝝃𝑼k𝑼k𝝃𝑼k𝝃222\displaystyle\left\|\bm{\xi}-\frac{\bm{U}_{k}\bm{U}_{k}^{\top}\bm{\xi}}{\|\bm{U}_{k}^{\top}\bm{\xi}\|_{2}}\right\|_{2}^{2}∥ bold_italic_ξ - divide start_ARG bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ end_ARG start_ARG ∥ bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT =𝝃222𝑼k𝝃2+1\displaystyle=\left\|\bm{\xi}\right\|_{2}^{2}-2\|\bm{U}_{k}^{\top}\bm{\xi}\|_{2}+1= ∥ bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 ∥ bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + 1 (B.3.26)
=𝝃22+𝝃222𝑼k𝝃2+1\displaystyle=\left\|\bm{\xi}^{\|}\right\|_{2}^{2}+\left\|\bm{\xi}^{\perp}\right\|_{2}^{2}-2\|\bm{U}_{k}^{\top}\bm{\xi}^{\|}\|_{2}+1= ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 ∥ bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + 1 (B.3.27)
=𝝃22+(𝝃21)2.\displaystyle=\left\|\bm{\xi}^{\perp}\right\|_{2}^{2}+\left(\left\|\bm{\xi}^{\|}\right\|_{2}-1\right)^{2}.= ∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (B.3.28)

We are going to show next that every such 𝝃𝒮δ\bm{\xi}\in\mathcal{S}_{\delta}bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT can be uniquely associated to a projection onto a single subspace in the mixture, which will allow us to define a corresponding projection onto 𝒮\mathcal{S}caligraphic_S. Given a 𝝃𝒮δ\bm{\xi}\in\mathcal{S}_{\delta}bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT, by the above, we can find a subspace 𝑼k\bm{U}_{k}bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT such that the orthogonal decomposition 𝝃=𝝃k+𝝃k\bm{\xi}=\bm{\xi}^{\|}_{k}+\bm{\xi}^{\perp}_{k}bold_italic_ξ = bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT satisfies

𝝃k22+(𝝃k21)2δ2.\left\|\bm{\xi}^{\perp}_{k}\right\|_{2}^{2}+\left(\left\|\bm{\xi}^{\|}_{k}\right\|_{2}-1\right)^{2}\leq\delta^{2}.∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_δ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (B.3.29)

Consider the decomposition 𝝃=𝝃j+𝝃j\bm{\xi}=\bm{\xi}_{j}^{\|}+\bm{\xi}_{j}^{\perp}bold_italic_ξ = bold_italic_ξ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT + bold_italic_ξ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT for some jkj\neq kitalic_j ≠ italic_k. We have

𝝃j2=𝑼j𝑼j𝝃2\displaystyle\left\|\bm{\xi}_{j}^{\|}\right\|_{2}=\left\|\bm{U}_{j}\bm{U}_{j}^{\top}\bm{\xi}\right\|_{2}∥ bold_italic_ξ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = ∥ bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT =𝑼j𝑼j(𝑼k𝑼k𝝃+(𝑰𝑼k𝑼k)𝝃)2\displaystyle=\left\|\bm{U}_{j}\bm{U}_{j}^{\top}(\bm{U}_{k}\bm{U}_{k}^{\top}\bm{\xi}+(\bm{I}-\bm{U}_{k}\bm{U}_{k}^{\top})\bm{\xi})\right\|_{2}= ∥ bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_ξ + ( bold_italic_I - bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) bold_italic_ξ ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (B.3.30)
=𝑼j𝑼j(𝑰𝑼k𝑼k)𝝃2\displaystyle=\left\|\bm{U}_{j}\bm{U}_{j}^{\top}(\bm{I}-\bm{U}_{k}\bm{U}_{k}^{\top})\bm{\xi}\right\|_{2}= ∥ bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_I - bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (B.3.31)
(𝑰𝑼k𝑼k)𝝃2=𝝃k2δ,\displaystyle\leq\left\|(\bm{I}-\bm{U}_{k}\bm{U}_{k}^{\top})\bm{\xi}\right\|_{2}=\left\|\bm{\xi}_{k}^{\perp}\right\|_{2}\leq\delta,≤ ∥ ( bold_italic_I - bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) bold_italic_ξ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = ∥ bold_italic_ξ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ italic_δ , (B.3.32)

where the second line uses the orthogonality assumption on the subspaces 𝑼k\bm{U}_{k}bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and the third uses the fact that orthogonal projections are nonexpansive. Hence, the jjitalic_j-th distance satisfies

𝝃j22+(𝝃j21)2(1δ)2.\left\|\bm{\xi}^{\perp}_{j}\right\|_{2}^{2}+\left(\left\|\bm{\xi}^{\|}_{j}\right\|_{2}-1\right)^{2}\geq(1-\delta)^{2}.∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≥ ( 1 - italic_δ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (B.3.33)

This implies that if 0<δ<1/20<\delta<1/20 < italic_δ < 1 / 2, every 𝝃𝒮δ\bm{\xi}\in\mathcal{S}_{\delta}bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT has a unique closest subspace in the mixture. Hence, under this condition, the following mapping π𝒮:𝒮δ𝒮\pi_{\mathcal{S}}:\mathcal{S}_{\delta}\to\mathcal{S}italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT : caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT → caligraphic_S is well-defined:

π𝒮(𝝃)=π𝒮k(𝝃),wherek=argmink[K]dist(𝝃,𝒮k).\pi_{\mathcal{S}}(\bm{\xi})=\pi_{\mathcal{S}_{k_{\star}}}(\bm{\xi}),\enspace\text{where}\enspace k_{\star}=\operatorname*{arg\ min}_{k\in[K]}\,\operatorname{dist}(\bm{\xi},\mathcal{S}_{k}).italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_ξ ) = italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_italic_ξ ) , where italic_k start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT = start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT italic_k ∈ [ italic_K ] end_POSTSUBSCRIPT roman_dist ( bold_italic_ξ , caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) . (B.3.34)

Now, we define a code for 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT by

𝒙^δ=q(π𝒮(𝒙δ)).\hat{\bm{x}}_{\delta}=\mathrm{q}(\pi_{\mathcal{S}}(\bm{x}_{\delta})).over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT = roman_q ( italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) . (B.3.35)

Clearly this is associated to a rate-RRitalic_R code for 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT, because it uses the encoding-decoding mappings from the rate-RRitalic_R code for 𝒙\bm{x}bold_italic_x. We have to show that it achieves small distortion. We calculate

𝔼[𝒙δ𝒙^δ22]\displaystyle\mathbb{E}\left[\left\|\bm{x}_{\delta}-\hat{\bm{x}}_{\delta}\right\|_{2}^{2}\right]blackboard_E [ ∥ bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT - over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] =𝔼[𝒙δq(π𝒮(𝒙δ))22]\displaystyle=\mathbb{E}\left[\left\|\bm{x}_{\delta}-\mathrm{q}(\pi_{\mathcal{S}}(\bm{x}_{\delta}))\right\|_{2}^{2}\right]= blackboard_E [ ∥ bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT - roman_q ( italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] (B.3.36)
(𝔼[𝒙δπ𝒮(𝒙δ)22]1/2+𝔼[π𝒮(𝒙δ)q(π𝒮(𝒙δ))22]1/2)2,\displaystyle\leq\left(\mathbb{E}\left[\left\|\bm{x}_{\delta}-\pi_{\mathcal{S}}(\bm{x}_{\delta})\right\|_{2}^{2}\right]^{1/2}+\mathbb{E}\left[\left\|\pi_{\mathcal{S}}(\bm{x}_{\delta})-\mathrm{q}(\pi_{\mathcal{S}}(\bm{x}_{\delta}))\right\|_{2}^{2}\right]^{1/2}\right)^{2},≤ ( blackboard_E [ ∥ bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT - italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT + blackboard_E [ ∥ italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) - roman_q ( italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (B.3.37)

where the inequality uses the Minkowski inequality. Now, by Definitions B.1 and B.2, we have deterministically

𝒙δπ𝒮(𝒙δ)22δ2,\left\|\bm{x}_{\delta}-\pi_{\mathcal{S}}(\bm{x}_{\delta})\right\|_{2}^{2}\leq\delta^{2},∥ bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT - italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_δ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (B.3.38)

so the expectation also satisfies this estimate. For the second term, it will suffice to characterize the density of the random variable π𝒮(𝒙δ)\pi_{\mathcal{S}}(\bm{x}_{\delta})italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) as being sufficiently close to the density of 𝒙\bm{x}bold_italic_x—which, as Assumption B.3 implies, is a mixture of uniform distributions on each sub-sphere 𝒮k\mathcal{S}_{k}caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. By the argument above, every point 𝝃𝒮δ\bm{\xi}\in\mathcal{S}_{\delta}bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT can be associated to one and only one subspace 𝑼k\bm{U}_{k}bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, which means that the mixture components in the definition of 𝒮δ\mathcal{S}_{\delta}caligraphic_S start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT (recall Definition B.2) do not overlap. Hence, the density π𝒮(𝒙δ)\pi_{\mathcal{S}}(\bm{x}_{\delta})italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) can be characterized by studying the effect of π𝒮k\pi_{\mathcal{S}_{k}}italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT on the conditional random variable 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT, conditioned on being drawn from 𝒮k,δ\mathcal{S}_{k,\delta}caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT. Denote this measure by μk,δ\mu_{k,\delta}italic_μ start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT. We claim that the pushforward of this measure under π𝒮k\pi_{\mathcal{S}_{k}}italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT is uniform on 𝒮k\mathcal{S}_{k}caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. To see that this holds, we recall Equation B.3.28, which gives the characterization

𝒮k,δ={𝝃+𝝃𝝃Span(𝑼k),𝝃Span(𝑼k),𝝃22+(𝝃21)2δ}.\mathcal{S}_{k,\delta}=\left\{\bm{\xi}^{\|}+\bm{\xi}^{\perp}\mid\bm{\xi}^{\|}\in\operatorname{Span}(\bm{U}_{k}),\bm{\xi}^{\perp}\in\operatorname{Span}(\bm{U}_{k})^{\perp},\left\|\bm{\xi}^{\perp}\right\|_{2}^{2}+\left(\left\|\bm{\xi}^{\|}\right\|_{2}-1\right)^{2}\leq\delta\right\}.caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT = { bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT + bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∣ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∈ roman_Span ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∈ roman_Span ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT , ∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_δ } . (B.3.39)

The conditional distribution in question is uniform on this set; we need to show that the projection π𝒮k\pi_{\mathcal{S}_{k}}italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT applied to this conditional random variable yields a random variable that is uniform on 𝒮k\mathcal{S}_{k}caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. With respect to these coordinates, we have seen that π𝒮k(𝝃+𝝃)=𝝃/𝝃2\pi_{\mathcal{S}_{k}}(\bm{\xi}^{\|}+\bm{\xi}^{\perp})=\bm{\xi}^{\|}/\|\bm{\xi}^{\|}\|_{2}italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT + bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ) = bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT / ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Hence, for any 𝝃𝒮k\bm{\xi}\in\mathcal{S}_{k}bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, we have that the preimage of 𝝃\bm{\xi}bold_italic_ξ in 𝒮k,δ\mathcal{S}_{k,\delta}caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT under π𝒮k\pi_{\mathcal{S}_{k}}italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT is

π𝒮k1(𝝃)={r𝝃+𝝃r>0,𝝃Span(𝑼k),𝝃22+(r1)2δ}.\pi_{\mathcal{S}_{k}}^{-1}(\bm{\xi})=\left\{r\bm{\xi}+\bm{\xi}^{\perp}\mid r>0,\bm{\xi}^{\perp}\in\operatorname{Span}(\bm{U}_{k})^{\perp},\left\|\bm{\xi}^{\perp}\right\|_{2}^{2}+\left(r-1\right)^{2}\leq\delta\right\}.italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) = { italic_r bold_italic_ξ + bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∣ italic_r > 0 , bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∈ roman_Span ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT , ∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( italic_r - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_δ } . (B.3.40)

To show that (π𝒮k)μk,δ(\pi_{\mathcal{S}_{k}})_{\sharp}\mu_{k,\delta}( italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT ♯ end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT is uniform, we need to decompose the integral of the uniform density on 𝒮k,δ\mathcal{S}_{k,\delta}caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT in a way that makes it clear that each of the fibers π𝒮k1(𝝃)\pi_{\mathcal{S}_{k}}^{-1}(\bm{\xi})italic_π start_POSTSUBSCRIPT caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_ξ ) (for each 𝝃𝒮k\bm{\xi}\in\mathcal{S}_{k}bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT) “contributes” equally to the integral.666More rigorously, this corresponds to decomposing the uniform density on 𝒮k,δ\mathcal{S}_{k,\delta}caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT into a regular conditional density corresponding to 𝝃𝒮k\bm{\xi}\in\mathcal{S}_{k}bold_italic_ξ ∈ caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and showing that the corresponding density on 𝝃\bm{\xi}bold_italic_ξ is uniform. The proof makes it clear this is true. We have by Definition B.2

vol(𝒮k,δ)=Span(𝑼k)×Span(𝑼k)𝟏𝝃22+(𝝃21)2δd𝝃d𝝃.\operatorname{vol}(\mathcal{S}_{k,\delta})=\iint_{\operatorname{Span}(\bm{U}_{k})\times\operatorname{Span}(\bm{U}_{k})^{\perp}}\mathbf{1}_{\left\|\bm{\xi}^{\perp}\right\|_{2}^{2}+\left(\left\|\bm{\xi}^{\|}\right\|_{2}-1\right)^{2}\leq\delta}\mathrm{d}\bm{\xi}^{\|}\mathrm{d}\bm{\xi}^{\perp}.roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT ) = ∬ start_POSTSUBSCRIPT roman_Span ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) × roman_Span ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT bold_1 start_POSTSUBSCRIPT ∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_δ end_POSTSUBSCRIPT roman_d bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT roman_d bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT . (B.3.41)

In particular, the integration over the orthogonal coordinates factors. Let d𝜽d\mathrm{d}\bm{\theta}^{d}roman_d bold_italic_θ start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT denote the uniform (Haar) measure on the sphere of radius 111 in d\mathbb{R}^{d}blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT. Converting the 𝝃\bm{\xi}^{\|}bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT integral to polar coordinates, we have

vol(𝒮k,δ)=[0,)𝕊dk1Span(𝑼k)rdk1𝟏𝝃22+(r1)2δdrd𝜽dkd𝝃.\operatorname{vol}(\mathcal{S}_{k,\delta})=\int_{[0,\infty)}\int_{\mathbb{S}^{d_{k}-1}}\int_{\operatorname{Span}(\bm{U}_{k})^{\perp}}r^{d_{k}-1}\mathbf{1}_{\left\|\bm{\xi}^{\perp}\right\|_{2}^{2}+\left(r-1\right)^{2}\leq\delta}\mathrm{d}r\mathrm{d}\bm{\theta}^{d_{k}}\mathrm{d}\bm{\xi}^{\perp}.roman_vol ( caligraphic_S start_POSTSUBSCRIPT italic_k , italic_δ end_POSTSUBSCRIPT ) = ∫ start_POSTSUBSCRIPT [ 0 , ∞ ) end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT blackboard_S start_POSTSUPERSCRIPT italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT roman_Span ( bold_italic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_r start_POSTSUPERSCRIPT italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT bold_1 start_POSTSUBSCRIPT ∥ bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( italic_r - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_δ end_POSTSUBSCRIPT roman_d italic_r roman_d bold_italic_θ start_POSTSUPERSCRIPT italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT roman_d bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT . (B.3.42)

Comparing to the fiber representation (B.3.40), we see that we need to “integrate out” over the rritalic_r and 𝝃\bm{\xi}^{\perp}bold_italic_ξ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT components of the preceding integral in order to verify that the pushforward is uniform. But this is evident, as the previous expression shows that the value of this integral is independent of 𝝃\bm{\xi}^{\|}bold_italic_ξ start_POSTSUPERSCRIPT ∥ end_POSTSUPERSCRIPT—or, equivalently in context, the value of the spherical component 𝜽dk\bm{\theta}^{d_{k}}bold_italic_θ start_POSTSUPERSCRIPT italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT.

Thus it follows from the above argument that π𝒮(𝒙δ)\pi_{\mathcal{S}}(\bm{x}_{\delta})italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) is uniform. Because the assumption on δ\deltaitalic_δ implies that the mixture components in the distribution of 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT do not overlap, the mixing weights πk\pi_{k}italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT are also preserved in the image π𝒮(𝒙δ)\pi_{\mathcal{S}}(\bm{x}_{\delta})italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ), and in particular, the distribution of π𝒮(𝒙δ)\pi_{\mathcal{S}}(\bm{x}_{\delta})italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) is equal to the distribution of 𝒙\bm{x}bold_italic_x. Hence the second term in Equation B.3.37 satisfies

𝔼[π𝒮(𝒙δ)q(π𝒮(𝒙δ))22]=𝔼[𝒙q(𝒙)22]ϵ2,\mathbb{E}\left[\left\|\pi_{\mathcal{S}}(\bm{x}_{\delta})-\mathrm{q}(\pi_{\mathcal{S}}(\bm{x}_{\delta}))\right\|_{2}^{2}\right]=\mathbb{E}\left[\left\|\bm{x}-\mathrm{q}(\bm{x})\right\|_{2}^{2}\right]\leq\epsilon^{2},blackboard_E [ ∥ italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) - roman_q ( italic_π start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] = blackboard_E [ ∥ bold_italic_x - roman_q ( bold_italic_x ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] ≤ italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (B.3.43)

because q\mathrm{q}roman_q is a distortion-ϵ\epsilonitalic_ϵ code for 𝒙\bm{x}bold_italic_x.

We have thus shown that the hypothesized rate-RRitalic_R, (expected squared) distortion-ϵ2\epsilon^{2}italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT code for 𝒙\bm{x}bold_italic_x produces a rate-RRitalic_R, (expected squared) distortion δ+ϵ\delta+\epsilonitalic_δ + italic_ϵ code for 𝒙δ\bm{x}_{\delta}bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT. This establishes that

Rδ+ϵ(𝒙δ)Rϵ(𝒙),R_{\delta+\epsilon}(\bm{x}_{\delta})\leq R_{\epsilon}(\bm{x}),italic_R start_POSTSUBSCRIPT italic_δ + italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT ) ≤ italic_R start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ( bold_italic_x ) , (B.3.44)

as was to be shown.