Annali di Matematica (2007) 186(3):381–417 DOI 10.1007/s10231-006-0011-4
M. Fedrigo · F. Flandoli · F. Morandin
A Large Deviation Principle for the free energy of random Gibbs measures with application to the REM
Received: 15 February 2005 / Published online: 24 March 2006 © Springer-Verlag 2006

Abstract A Large Deviation Principle (LDP) for the free energy of random Gibbs measures is proved in the form of a general LDP for random log-Laplace integrals. The principle is then applied to an extended version of the Random Energy Model (REM). The rate of exponential decay for the classical REM is stronger than the known concentration exponent, and probabilities of negative deviations are superexponentially small.

Keywords Large deviations · REM · Free energy · Random measure
1 Introduction

Random measures of Gibbs type appear in various fields, including disordered systems of statistical mechanics (cf. [13] as a general reference) and random coding theory (cf. the pioneering work [10]). Their random free energies are the main subject of investigation, and the results of primary interest are the concentration of the free energy and its limit value. Restricting the discussion to one of the simplest models, the Random Energy Model (REM), this task has been accomplished and fluctuation properties have also been understood; see [1, 3, 5, 13] and references therein, among the wide literature. The large deviations of the REM, however, seem to be an open problem, of interest at least in the applications to random coding theory; see [7–11].

M. Fedrigo: IBB-GSF, Neuherberg, Munich, Germany
F. Flandoli: University of Pisa, Pisa, Italy
F. Morandin (B): University of Parma, Parma, Italy. E-mail: [email protected]
With the aim of understanding some general ideas behind this problem, we have devised a general Large Deviation Principle (LDP) for random free energies and tested its applicability on the REM. In fact, we consider an extended version of the REM which arises in connection with the Shannon random code; see [7]. The general result we prove on free energy densities takes the form of a generalization, to random probability measures, of the classical Varadhan lemma on log-Laplace integrals. We explain the idea in Sects. 1.1 and 1.2 and give the details in Sects. 2 and 3. The main result on the application to the extended REM is described in Sects. 1.3 and 1.4, with details of the proof in Sect. 4.
1.1 Free energy densities

For the purpose of this introduction, let us fix some notation. Let (Ω, F, P) be a probability space with expectation E (ω ∈ Ω will be the random parameter of the Gibbs measures). For every n ∈ N let (S_n, B_n, λ_n) be a measure space (the space of configurations), with λ_n a finite measure, and let H_n : S_n × Ω → R be a measurable function (the energy). Given β > 0 (the inverse temperature) we set

    Z^ω_{β,n} := ∫_{S_n} e^{−β H_n(s,ω)} dλ_n(s)

and introduce the Gibbs measure on (S_n, B_n), depending on ω ∈ Ω, defined as

    dν_n^ω(s) = (1/Z^ω_{β,n}) e^{−β H_n(s,ω)} dλ_n(s).

Finally, we introduce the random free energy density

    f^ω_{β,n} = (1/(βn)) log Z^ω_{β,n}
and we want to study its limit as n → ∞ and its fluctuations, in particular its large deviations. For instance, one can see in [7, 10] that the LD of f^ω_{β,n} entirely characterize the possibility to transmit error-free and the rate of decay of the error probabilities.

One natural approach to analyze the LD of f^ω_{β,n} is by the Gärtner–Ellis theorem, since

    E[e^{nα f^ω_{β,n}}] = E[(Z^ω_{β,n})^{α/β}].

For positive integer values of α/β this approach is often feasible and resembles the replica approach. However, it is necessary to compute the previous expected value also for noninteger α/β, and this is seldom an easy task. We bypass this difficulty by another approach, although somewhat longer in principle. It has the conceptual appeal of being based on properties of the number of states with a given energy, a well-known quantity both in statistical mechanics and information sciences (where it is related to the so-called spectral shape function).
Let us introduce the (random) occupation measure of H_n(·, ω):

    ρ_n^ω = law of H_n(·, ω) under λ_n,

which, in a finite space with uniform measure (S_n, B_n, λ_n), is proportional to the number of configurations s ∈ S_n with a given energy. Let O_a : R → R be the homothety O_a(b) = ab, acting on measures by (O_a ρ)(B) = ρ(O_a(B)). We may think that O_n shrinks sets and measures by a factor n. Then

    Z^ω_{β,n} = ∫_R e^{−βσ} dρ_n^ω(σ) = ∫_R e^{−βnσ} d(O_n ρ_n^ω)(σ),

where

    (O_n ρ_n^ω)([a, b]) = ρ_n^ω([na, nb]).

We see that the limit properties of f^ω_{β,n} become the analogous properties of

    (1/(βn)) log ∫_R e^{−βnσ} d(O_n ρ_n^ω)(σ).
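As a toy illustration (our own, not from the paper), the rescaling identity Z^ω_{β,n} = ∫ e^{−βnσ} d(O_n ρ_n)(σ) can be checked numerically on a small deterministic system, here S_n = {−1,+1}^n with energy H_n(s) = −Σᵢ sᵢ and counting measure λ_n:

```python
import math

# Toy deterministic system (our own illustration): S_n = {-1,+1}^n with
# counting measure lambda_n and energy H_n(s) = -sum_i s_i.  The occupation
# measure rho_n puts mass C(n,k) at energy -(n-2k), where k = #{i : s_i = -1}.
n, beta = 10, 0.7

# Z = int e^{-beta*sigma} d rho_n(sigma), integrating over the energy levels
Z_rho = sum(math.comb(n, k) * math.exp(beta * (n - 2 * k)) for k in range(n + 1))

# After the rescaling O_n, the mass C(n,k) sits at sigma = -(n-2k)/n and the
# integrand is e^{-beta*n*sigma}:  Z = int e^{-beta*n*sigma} d(O_n rho_n)(sigma)
Z_On = sum(math.comb(n, k) * math.exp(-beta * n * (-(n - 2 * k) / n))
           for k in range(n + 1))

Z_exact = (2.0 * math.cosh(beta)) ** n   # closed form for this toy model
print(Z_rho, Z_On, Z_exact)
```

Both integrals agree with the closed form (2 cosh β)^n, as expected for this model.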
Without randomness, the asymptotics of this quantity can be studied, in principle, by means of the Varadhan lemma on log-Laplace integrals. Therefore, our approach consists in a generalization of the Varadhan lemma to random measures. For the choice of this approach we have been particularly inspired by [5]. Notice that, when λ_n is not a probability measure, O_n ρ_n^ω is not a probability measure either, so we have to normalize it in order to work in the usual framework of the Varadhan lemma. Denote by µ_n^ω the normalization of O_n ρ_n^ω. If the scaling factor between O_n ρ_n^ω and µ_n^ω is deterministic and exponential in n, it is sufficient to understand the LD of

    (1/(βn)) log ∫_R e^{−βnσ} dµ_n^ω(σ)

and then translate the result (this is the translation performed at the beginning of Sect. 4).

1.2 LDP for log-Laplace integrals

With the previous motivations in mind, in Sects. 2 and 3 we consider a regular topological space Σ, a sequence of random measures µ_n^ω on it and a continuous function φ : Σ → R. Theorems A and B describe the LD of

    (1/n) log ∫_Σ e^{nφ(σ)} µ_n^ω(dσ)

under suitable assumptions on µ_n^ω. Let us explain the result in rough terms. We need to distinguish between positive and negative fluctuations, since their probabilities are governed by different principles.
1.2.1 Probabilities of positive fluctuations

Let us start with positive fluctuations, which we expect to give in many cases the most interesting quantitative information. We want to estimate probabilities of the form

    P( (1/n) log ∫ e^{nφ(σ)} µ_n^ω(dσ) > a )    (1)

for every real number a. Very heuristically, let J(σ, x) be a function such that

    J(σ, x) ∼ −(1/n) log P( (1/n) log µ_n(σ) > x )

for large n (assumptions (4) and (5) given later are the rigorous formulation of this idea). We have

    J(σ, x − φ(σ)) ∼ −(1/n) log P( (1/n) log(e^{nφ(σ)} µ_n(σ)) > x ).

It is then natural to expect

    inf_{x>a} inf_{σ∈Σ} J(σ, x − φ(σ)) ∼ −(1/n) log P( (1/n) log ∫ e^{nφ(σ)} µ_n^ω(dσ) > a ).

Theorem A makes this intuition rigorous. Call I⁺(x) the function inf_{σ∈Σ} J(σ, x − φ(σ)). It is the rate function governing the probabilities of positive fluctuations.

At the heuristic level we may elaborate more. Let I_µ : Σ → [0, ∞] be defined as I_µ(σ) = −inf{x : J(σ, x) > 0}. In easy cases, when we expect µ_n^ω to concentrate around a deterministic point, we think that I_µ(σ) is the LD rate function of µ_n^ω, namely

    I_µ(σ) ∼ −(1/n) log µ_n^ω(σ)

in very intuitive terms. Under appropriate tail conditions, the Varadhan lemma gives us

    f̃ := sup_{σ∈Σ} (φ(σ) − I_µ(σ)) ∼ (1/n) log ∫ e^{nφ(σ)} µ_n^ω(dσ).

Since, from another direction, we know the probabilities of positive fluctuations of ∫ e^{nφ(σ)} µ_n^ω(dσ), we guess

    f̃ = inf{x : I⁺(x) > 0}.

We do not believe this relation is general, but we prove it in our particular example (Lemma 32).
1.2.2 Probabilities of negative fluctuations

Let us come to the negative fluctuations. We want to estimate probabilities of the form

    P( (1/n) log ∫ e^{nφ(σ)} µ_n^ω(dσ) < b )    (2)

for every real number b. To understand the idea, consider the case when

    (1/n) log ∫ e^{nφ(σ)} µ_n^ω(dσ)

converges a.s. to a constant f̃. The interesting values of b are then b < f̃. In order that this quantity takes values < b, it is necessary that all the exponential contributions to the integral over Σ have a negative fluctuation (with respect to their typical values). This joint event, in the case when there is a sufficient degree of independence between the µ_n^ω(A) for different sets A, is very difficult to realize. So we expect a super-exponential decay of the probabilities (2). This, of course, is not a general fact. However, it will be realized in our example of REM type. Moreover, Theorem B describes the general principle based on suitable joint events, which boils down to a super-exponential behavior in certain particular cases.

1.3 Main result for an extended REM model

Let (Ω, F, P) be a probability space with expectation E and {X_n} a sequence of random variables on (Ω, F, P), with symmetric distribution, such that the limit

    Λ(β) = lim_{n→∞} (1/n) log E[e^{nβX_n}] ∈ [0, ∞)

exists (finite) for all real numbers β, and Λ(β) is a differentiable, strictly convex function.

Remark 1 Most of our results hold true under slightly more general assumptions, at the expense of simplicity of statements and proofs. Precisely, we could assume that Λ(β) ∈ [0, ∞] exists for all β > 0, is finite just for some β > 0, and is a lower semicontinuous and essentially smooth function (see [2]). However, under these conditions, we can prove the results only for β strictly inside the domain where Λ is finite, and often the proofs divide into several cases, depending on properties of the domain of Λ. Also the assumption of symmetry is imposed only for the sake of simplicity.

Since in all our applications the restricted assumptions mentioned earlier hold true, we prefer to work in a readable set-up. Under these assumptions the sequence (X_n) satisfies a LDP with rate function Λ*(a), the Fenchel–Legendre (F–L) transform of Λ(β):

    Λ*(a) = sup_{β∈R} (aβ − Λ(β)).
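As a quick numerical illustration (not from the paper), the F–L transform can be approximated by a supremum over a grid of β values; for the Gaussian case Λ(β) = β²/2 of Sect. 1.4 it reproduces Λ*(a) = a²/2:

```python
def legendre_transform(Lam, a, lo=-10.0, hi=10.0, steps=20000):
    """Grid approximation of the Fenchel-Legendre transform
    Lam*(a) = sup_beta (a*beta - Lam(beta)), beta restricted to [lo, hi]."""
    h = (hi - lo) / steps
    return max(a * (lo + i * h) - Lam(lo + i * h) for i in range(steps + 1))

Lam = lambda b: b * b / 2.0   # Gaussian case of Sect. 1.4

for a in (0.0, 0.5, 1.0, 2.0):
    print(a, legendre_transform(Lam, a), a * a / 2.0)  # supremum attained at beta = a
```

The grid values match the closed form a²/2 because the concave function β ↦ aβ − β²/2 is maximized at β = a.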
Let R ∈ (0, 1] be a given real number. Let {S_n} be a sequence of finite sets such that

    lim_{n→∞} (1/n) log card(S_n) = R log 2.

We could incorporate log 2 in the constant R (changing its range), but both in the application to the REM and to the Shannon ensemble [7] this normalization gives more classical expressions. For every n ∈ N, let (X_{s,n}; s ∈ S_n) be a family of i.i.d. random variables on (Ω, F, P), distributed as X_n. We consider an extended version of the classical REM with energies n X_{s,n}: its partition function at inverse temperature β > 0 is defined as

    Z^ω_{β,n} = Σ_{s∈S_n} e^{nβ X_{s,n}(ω)}.
The number R ∈ (0, 1] is a parameter of secondary importance in this paper, but it is useful in the application to the Shannon ensemble treated in [7] (it is the rate of the code).

Remark 2 Let us start with a very heuristic argument in which we do not try to justify any one of our claims. The maximum of X_{s,n} over s converges, as n → ∞, to

    a* = sup{a : Λ*(a) < R log 2}.

For large β, Z^ω_{β,n} behaves as e^{nβa*}. For small β, it behaves as

    card(S_n) E[e^{nβX_n}] ∼ e^{nβ(Λ(β)/β + R log 2/β)}.

We always have

    a* < (Λ(β) + R log 2)/β

except at

    β* = dΛ*/da (a*)

(which may be ∞). The transition point between the two regimes is β* (no transition when β* = ∞). The sequence

    f^ω_{β,n} = (1/(βn)) log Z^ω_{β,n}

converges a.s. to

    f_β = (Λ(β) + R log 2)/β   if β < β*,
    f_β = a*                   if β ≥ β*

(the second line is missing when β* = ∞). Our problem is to understand the LD of f^ω_{β,n} around f_β.
Remark 3 Let us rigorously clarify a few facts about Λ*, a*, β* and f_β. We do not give here the proofs, but simply list the facts, proved in Sect. 4. Let D_{Λ*} = {a : Λ*(a) < ∞} and D̊_{Λ*} be its interior. The set D_{Λ*} is either a symmetric interval of the form [−ā, ā] with ā > 0, or D_{Λ*} = R; Λ* is differentiable in D̊_{Λ*}, and it has infinite lateral derivatives at ±ā (when applicable, namely when D_{Λ*} ≠ R). The definition

    β* = dΛ*/da (a*)

is a shortening of

    β* = dΛ*/da (a*)   if a* ∈ D̊_{Λ*},
    β* = ∞             if a* ∉ D̊_{Λ*}.

About the definition of a*: since Λ*(a) is strictly increasing for a > 0 and lower semicontinuous, we have |a| < a* if and only if Λ*(a) < R log 2, |a| > a* if and only if Λ*(a) > R log 2 (possibly Λ*(a) = ∞), and Λ*(a*) ≤ R log 2, with equality holding when a* ∈ D̊_{Λ*}. Finally, about f_β, we simply state the fact that f_β ≥ a* for all β > 0.

Our main result is the following theorem.

Theorem 4 Let a*, β* and f_β be defined as mentioned earlier. Then f^ω_{β,n} = (1/(βn)) log Z^ω_{β,n} satisfies a LDP with the good rate function

    I(t) = +∞                 if t < f_β,
    I(t) = 0                  if t = f_β,
    I(t) = Λ*(t) − R log 2    if t > f_β.

Moreover, I(t) > 0 (possibly infinite) for t > f_β. Since I(t) > 0 for all t ≠ f_β, f^ω_{β,n} converges to f_β in the mean and P-a.s. The proof of Theorem 4 is given in Sect. 4.
Remark 5 At the heuristic level, we understand the previous theorem as follows. Let N_n = card(S_n) and W_n = max_{s∈S_n} X_{s,n}. In the low-temperature regime β > β*, Z_{β,n} behaves like e^{nβW_n}. Note that

    N_n P(X_n > t) − N_n² P(X_n > t)² ≤ P(W_n > t) ≤ N_n P(X_n > t),

so that, if t > a*, yielding in particular Λ*(t) > R log 2, we have

    lim_n (1/n) log P(W_n > t) = R log 2 − Λ*(t).

If t < a*, since lim_n (1/n) log(N_n P(X_n > t)) > 0, and since the independence of the X_{s,n}'s gives P(W_n < t) = P(X_n < t)^{N_n}, we get

    log P(W_n < t) = N_n log(1 − P(X_n > t)) ∼ −N_n P(X_n > t),

showing the super-exponential behavior of negative deviations. If β < β*, this phenomenon is even stronger, since Z_{β,n} is typically much bigger than e^{nβW_n}. This increases the critical value to f_β > a*. The rate of positive deviations is the same because it is still linked to the fluctuations of W_n.
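The two elementary bounds on P(W_n > t) and the super-exponential estimate log P(W_n < t) ∼ −N_n P(X_n > t) used in Remark 5 can be checked with a short script (toy numbers, our own illustration):

```python
import math

# Elementary bounds for the maximum W of N i.i.d. variables with tail
# probability p = P(X > t):
#   N*p - (N*p)^2  <=  P(W > t) = 1 - (1-p)^N  <=  N*p
for N, p in [(10, 0.01), (1000, 1e-4), (10**6, 1e-7)]:
    tail = 1.0 - (1.0 - p) ** N          # exact, by independence
    assert N * p - (N * p) ** 2 <= tail <= N * p, (N, p, tail)
    print(N, p, tail)

# Super-exponential smallness of negative deviations:
# log P(W < t) = N * log(1 - p) ~ -N*p when N*p is large
N, p = 10**6, 1e-3
print(N * math.log(1.0 - p), -N * p)   # both of order -1000
```

The upper bound is the union bound, the lower one a Bonferroni-type inequality; the last two printed numbers illustrate that P(W_n < t) decays like e^{−N_n p}, much faster than any e^{−cn}.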
Let us comment on the method of proof. In order to apply Theorem A (similar considerations are true for Theorem B), we need the exponential asymptotic control of probabilities of the form

    P( (1/n) log µ_n(a, b) > v )    (3)

where

    µ_n^ω = (1/card(S_n)) Σ_{s∈S_n} δ_{X_{s,n}(ω)}.

Since

    P( (1/n) log µ_n(a, b) > v ) = P( Σ_{s∈S_n} 1_{X_{s,n}∈(a,b)} > card(S_n) e^{nv} ),
the computations will reduce to elementary facts about binomial distributions.

1.4 Application to the classical REM

When R = 1 and the √n X_{s,n} are all N(0, 1) (the classical case of Derrida [3]) we have

    Λ(β) = β²/2,   Λ*(a) = a²/2.
In fact, this is not a mere particular case: one can show that if the law of √n X_{s,n} does not depend on n, then Λ(β) must be a multiple of β² (possibly zero). For the classical REM it is well known that (1/(βn)) log Z_n converges a.s. to

    f_β = β/2 + log 2/β   if β < √(2 log 2),
    f_β = √(2 log 2)      if β ≥ √(2 log 2),

towards which also E[(1/(βn)) log Z_n] converges, and the concentration inequality

    P( |(1/(βn)) log Z_n − E[(1/(βn)) log Z_n]| > ε ) ≤ e^{−(1/2)ε²n}

takes place (see for instance [13]), for ε > 0. Our result gives us

    lim_{n→∞} (1/n) log P( (1/(βn)) log Z_n > f_β + ε ) = −I(f_β + ε)

with

    I(f_β + ε) = (β/2 + log 2/β)ε + ε²/2 + (1/2)(β/2 − log 2/β)²   if β ≤ √(2 log 2),
    I(f_β + ε) = √(2 log 2)·ε + ε²/2                               if β > √(2 log 2)
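As a sanity check of the limit f_β (our own simulation, not part of the paper), one can sample the classical Gaussian REM at a moderate size; the values of n and β below are illustrative, with β on the high-temperature side:

```python
import math, random

random.seed(0)

n = 18
beta = 0.5                   # high-temperature side: beta < sqrt(2*log 2) ~ 1.177
N = 2 ** n                   # card(S_n) with R = 1

# sqrt(n)*X_{s,n} ~ N(0,1), so the exponent n*beta*X_{s,n} equals
# beta*sqrt(n)*g_s with g_s standard Gaussian.
log_terms = [beta * math.sqrt(n) * random.gauss(0.0, 1.0) for _ in range(N)]

# Stable log-sum-exp for log Z_n = log sum_s exp(n*beta*X_{s,n})
m = max(log_terms)
log_Z = m + math.log(sum(math.exp(t - m) for t in log_terms))

f_n = log_Z / (beta * n)                  # finite-size free energy density
f_beta = beta / 2 + math.log(2) / beta    # limit value for beta < sqrt(2*log 2)
print(f_n, f_beta)
```

At this size the empirical value is already very close to f_β = β/2 + log 2/β, in line with the strong concentration recalled above.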
and

    lim_{n→∞} (1/n) log P( (1/(βn)) log Z_n < f_β − ε ) = −∞.

The same is true with E[(1/(βn)) log Z_n] in place of f_β (see Lemma 25, (iii)). Therefore, the exponent in the LDP is strictly greater than the one known from concentration inequalities.

2 LDP for positive fluctuations of log-Laplace integrals

In this section, we analyze the LDP corresponding to positive fluctuations only. We also include a subsection dealing with the tail condition (Sect. 2.2).

2.1 General principle

Let Σ be a regular topological space with Borel σ-field B(Σ) (recall that a Hausdorff space is a topological space where every two distinct points admit disjoint neighborhoods, and such a space is called regular when the same separation property holds true between any closed set C and any point σ with σ ∉ C). Let B(R) be the Borel σ-field on R, (Ω, F, P) a probability space with expectation E, and {µ_n^ω; n ∈ N, ω ∈ Ω} a sequence of random probability measures on Σ (we say that a probability measure µ^ω depending on ω ∈ Ω is a random probability measure if

    ∫_Σ g(σ) µ^ω(dσ)

is a random variable for every bounded continuous function g : Σ → R).
is a random variable, for every bounded continuous function g : → R). Let A be a base of the topology of . Let φ : → R be a continuous function. Let J : × R → [0, ∞] be a function such that for every A ∈ A and every v ∈ R we have 1 1 log µn (A) > v − inf J (σ, x) ≤ limn→∞ log P ◦ n n σ ∈ A,x>v 1 ≤ limn→∞ log P n
1 log µn (A) > v ≤ − inf J (σ, x). n σ ∈A,x≥v
(4)
(5)
We also assume that J is lower semicontinuous and enjoys the following form of the good-rate-function property: given two real numbers v and γ, the set {(σ, x) : x ≥ v, J(σ, x) ≤ γ} is compact.
Remark 6 Since log µ_n^ω(A) ≤ 0, we could avoid considering v ≥ 0 in this definition. With the present definition we readily have (from the lower bound) J(σ, x) = ∞ for all (σ, x) ∈ Σ × (0, ∞).

Remark 7 Let I_µ : Σ → [0, ∞] be defined as I_µ(σ) = −inf{x : J(σ, x) > 0}. Heuristically, I_µ is the rate function of µ_n^ω for P-a.e. ω ∈ Ω. A rigorous statement has to involve also large deviations from below, treated in Theorem B given later, so it is not given here. Here we could give only an inequality, which is of little use. But in the applications, it is very useful to have the right guess of the rate function I_µ from the previous formula.

Remark 8 We suggest the reader specialize assumptions (4) and (5) to the case of a finite space Σ, to help the intuition of both the assumptions and the result of Theorem A given later. In fact, an alternative proof of the theorem consists in proving it first when Σ is finite and then extending to the general case. In the end, we have found it more elegant and compact to give statement and proof directly in the general set-up.

In addition to the previous assumptions, we need a tail condition, since we treat possibly unbounded functions φ. We assume that there exists a sequence (K_j)_{j∈N} of compact subsets of Σ such that

    lim_{j→∞} limsup_{n→∞} (1/n) log P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > L ) = −∞    (6)

for all L ∈ R. We state in the next subsection a sufficient condition for this tail condition, very easy to check at least in the application to the REM.

Theorem 9 (A) Under the previous assumptions, let I⁺ : R → [0, ∞] be defined as

    I⁺(x) = inf_{σ∈Σ} J(σ, x − φ(σ)).

Then, for every a ∈ R, we have

    liminf_{n→∞} (1/n) log P( (1/n) log ∫ e^{nφ(σ)} µ_n(dσ) > a ) ≥ −inf_{(a,∞)} I⁺    (7)

    limsup_{n→∞} (1/n) log P( (1/n) log ∫ e^{nφ(σ)} µ_n(dσ) > a ) ≤ −inf_{[a,∞)} I⁺.    (8)

Proof Throughout the proof, we shall denote ∫ e^{nφ(σ)} µ_n^ω(dσ) by Z_n^ω.
Step 1 (lower bound (7)). We can write (7) as

    liminf_{n→∞} (1/n) log P( (1/n) log Z_n > a ) ≥ −inf_{σ∈Σ, x>a} J(σ, x − φ(σ)).

For every A ∈ A, setting φ_A^− = inf_A φ, we have

    ∫ e^{nφ(σ)} µ_n^ω(dσ) ≥ e^{nφ_A^−} µ_n^ω(A).

Hence,

    P( (1/n) log Z_n > a ) ≥ P( e^{nφ_A^−} µ_n(A) > e^{na} ) = P( (1/n) log µ_n(A) > a − φ_A^− ).

Therefore, from our assumptions,

    liminf_{n→∞} (1/n) log P( (1/n) log Z_n > a ) ≥ liminf_{n→∞} (1/n) log P( (1/n) log µ_n(A) > a − φ_A^− ) ≥ −inf_{σ∈Å, x>a−φ_A^−} J(σ, x)

and thus

    liminf_{n→∞} (1/n) log P( (1/n) log Z_n > a ) ≥ −inf_{A∈A} inf_{σ∈Å, x>a−φ_A^−} J(σ, x).

The assertion of this step follows from the identity

    inf_{A∈A} inf_{σ∈Å, x>a−φ_A^−} J(σ, x) = inf_{σ∈Σ, x>a} J(σ, x − φ(σ)).

In order to prove it, note that, substituting x with x − φ(σ), the left-hand side equals the infimum of J(σ, x − φ(σ)) over the set Γ of all (σ, x) such that σ ∈ Å for some A ∈ A and x > a + (φ(σ) − φ_A^−); let Δ be the set Σ × (a, ∞). We prove that Γ = Δ. To prove Γ ⊂ Δ we only have to prove that (σ, x) ∈ Γ implies x > a; but (σ, x) ∈ Γ implies x > a + (φ(σ) − φ_A^−), and φ(σ) − φ_A^− ≥ 0, so x > a. Conversely, let (σ, x) ∈ Δ. Since x > a, there is ε > 0 such that x > a + ε. Let A_ε ∈ A be such that σ ∈ Å_ε and φ_{A_ε}^+ − φ_{A_ε}^− ≤ ε, where φ_{A_ε}^+ = sup_{A_ε} φ and φ_{A_ε}^− = inf_{A_ε} φ. Then x > a + ε implies

    x > a + φ_{A_ε}^+ − φ_{A_ε}^− ≥ a + φ(σ) − φ_{A_ε}^−.

The proof of the identity is complete.

Step 2 (preparation for (8)). Recall the tail condition (6) and the compact sets (K_j). For every j ∈ N, let A_j ⊂ A be a finite family with the two properties:
• K_j ⊂ ∪_{A∈A_j} A;
• φ_A^+ − φ_A^− ≤ 1/j for every A ∈ A_j,

where φ_A^+ = sup_A φ, φ_A^− = inf_A φ. In the next step, we use the following property: for every sequence (c_j)_{j∈N} such that lim_{j→∞} c_j = ∞ we have
    sup_{j∈N} [ ( inf_{A∈A_j} inf_{σ∈Ā, x ≥ a − 1/j − φ_A^+} J(σ, x) ) ∧ c_j ] ≥ inf_{σ∈Σ, x≥a} J(σ, x − φ(σ)).    (9)

Since c_j → ∞, substituting x with x − φ(σ) in the left-hand side, it is sufficient to prove

    liminf_{j→∞} inf_{A∈A_j} inf_{σ∈Ā, x ≥ a − 1/j + (φ(σ) − φ_A^+)} J(σ, x − φ(σ)) ≥ inf_{σ∈Σ, x≥a} J(σ, x − φ(σ)).

We have

    inf_{A∈A_j} inf_{σ∈Ā, x ≥ a − 1/j + (φ(σ) − φ_A^+)} J(σ, x − φ(σ)) ≥ inf_{σ∈Σ, x ≥ a − 2/j} J(σ, x − φ(σ))

because the set Γ_j of all (σ, x) such that σ ∈ Ā for some A ∈ A_j and x ≥ a − 1/j + (φ(σ) − φ_A^+) is contained in the set Δ_j = Σ × [a − 2/j, ∞). Let us check that Γ_j ⊂ Δ_j: if (σ, x) ∈ Γ_j then

    x ≥ a − 1/j + (φ(σ) − φ_A^+) ≥ a − 1/j − 1/j = a − 2/j,

since φ_A^+ − φ(σ) ≤ φ_A^+ − φ_A^− ≤ 1/j. From the previous inequality, it is sufficient to prove

    liminf_{j→∞} inf_{σ∈Σ, x ≥ a − 2/j} J(σ, x − φ(σ)) ≥ inf_{σ∈Σ, x≥a} J(σ, x − φ(σ)).

This is a consequence of the lower semicontinuity of J. Indeed, given ε > 0, for every j let (σ_j, x_j) ∈ Σ × [a − 2/j, ∞) be such that

    J(σ_j, x_j − φ(σ_j)) ≤ inf_{σ∈Σ, x ≥ a − 2/j} J(σ, x − φ(σ)) + ε.

Then

    liminf_{j→∞} inf_{σ∈Σ, x ≥ a − 2/j} J(σ, x − φ(σ)) ≥ liminf_{j→∞} J(σ_j, x_j − φ(σ_j)) − ε.

Assume there is a subsequence (x_{j_k}) of (x_j) such that x_{j_k} ∈ [a − 2/j_k, a). If not, eventually (σ_j, x_j) ∈ Σ × [a, ∞) and we easily get the result by the arbitrariness of ε > 0. The same argument applies to the subsequence complementary to (x_{j_k}), so we concentrate on (x_{j_k}). We have x_{j_k} → a. From the assumption on the level sets of J over a positive half-line of x, there is a subsequence of (σ_{j_k}), say (σ_{j_k}) again for simplicity of notation, such that σ_{j_k} → σ* for
some σ*. Indeed, the sequence (σ_j, x_j − φ(σ_j)) has to live in a compact set; so, in particular, this is true for the first component. But then

    liminf_{j→∞} J(σ_j, x_j − φ(σ_j)) ≥ liminf_{k→∞} J(σ_{j_k}, x_{j_k} − φ(σ_{j_k})),

which is ≥ J(σ*, a − φ(σ*)) by the lower semicontinuity. The proof of the claim is complete.

Step 3 (upper bound (8)). We can rewrite (8) as

    limsup_{n→∞} (1/n) log P( (1/n) log Z_n > a ) ≤ −inf_{σ∈Σ, x≥a} J(σ, x − φ(σ)).

For every fixed j ∈ N, the following implication is true:

    (1/n) log Z_n^ω > a  ⇒  ∫_{K_j^c} e^{nφ(σ)} µ_n^ω(dσ) > e^{na}/N  or  ∃A ∈ A_j : e^{nφ_A^+} µ_n^ω(A) > e^{na}/N,

where N is equal to the cardinality of A_j plus one. This implication is true because, by contradiction, if e^{nφ_A^+} µ_n^ω(A) ≤ e^{na}/N for every A ∈ A_j and ∫_{K_j^c} e^{nφ(σ)} µ_n^ω(dσ) ≤ e^{na}/N, then

    ∫ e^{nφ(σ)} µ_n^ω(dσ) ≤ Σ_{A∈A_j} ∫_A e^{nφ(σ)} µ_n^ω(dσ) + ∫_{K_j^c} e^{nφ(σ)} µ_n^ω(dσ)
    ≤ Σ_{A∈A_j} e^{nφ_A^+} µ_n^ω(A) + ∫_{K_j^c} e^{nφ(σ)} µ_n^ω(dσ) ≤ e^{na}.

Therefore,

    P( (1/n) log Z_n > a ) ≤ Σ_{A∈A_j} P( e^{nφ_A^+} µ_n(A) > e^{na}/N ) + P( ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > e^{na}/N )
    = Σ_{A∈A_j} P( (1/n) log µ_n(A) > a − φ_A^+ − (1/n) log N ) + P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > a − (1/n) log N )

and therefore

    ≤ Σ_{A∈A_j} P( (1/n) log µ_n(A) > a − 1/j − φ_A^+ ) + P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > a − 1 )
for sufficiently large n. By our assumptions, for every A ∈ A_j,

    limsup_{n→∞} (1/n) log P( (1/n) log µ_n(A) > a − 1/j − φ_A^+ ) ≤ −inf_{σ∈Ā, x ≥ a − 1/j − φ_A^+} J(σ, x)

and

    limsup_{n→∞} (1/n) log P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > a − 1 ) = −c_j

with lim_{j→∞} c_j = ∞. Therefore,

    limsup_{n→∞} (1/n) log P( (1/n) log Z_n > a ) ≤ −sup_{j∈N} [ ( inf_{A∈A_j} inf_{σ∈Ā, x ≥ a − 1/j − φ_A^+} J(σ, x) ) ∧ c_j ].

By (9) we have the result of this step. The proof is complete.
2.2 On the tail condition

Lemma 10 Assume that for some λ > 1 we have

    lim_{L→∞} limsup_{n→∞} (1/n) log P( (1/n) log ∫ e^{nλφ(σ)} µ_n(dσ) > L ) = −∞    (10)

and

    lim_{j→∞} inf_{σ∈K_j^c, x≥L} J(σ, x) = +∞    (11)

for every L ∈ R, where J and the sets K_j are given in the previous section. Then the tail condition (6) holds true.

Proof Since, by the Hölder inequality,

    ∫_{K_j^c} e^{nφ(σ)} µ_n^ω(dσ) ≤ µ_n^ω(K_j^c)^{(λ−1)/λ} ( ∫ e^{nλφ(σ)} µ_n^ω(dσ) )^{1/λ},

we have

    P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > L ) ≤ P( ((λ−1)/λ)(1/n) log µ_n(K_j^c) + (1/(λn)) log ∫ e^{nλφ(σ)} µ_n(dσ) > L ).
Taking any two real numbers L₁ and L₂ such that L₁ + L₂ = L, we have

    P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > L ) ≤ P( ((λ−1)/λ)(1/n) log µ_n(K_j^c) > L₁ ) + P( (1/(λn)) log ∫ e^{nλφ(σ)} µ_n(dσ) > L₂ ).

Therefore,

    limsup_{n→∞} (1/n) log P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > L )
    ≤ limsup_{n→∞} (1/n) log P( (1/n) log µ_n(K_j^c) > L₁ λ/(λ−1) ) ∨ limsup_{n→∞} (1/n) log P( (1/n) log ∫ e^{nλφ(σ)} µ_n(dσ) > λL₂ )
    ≤ −inf_{σ∈K_j^c, x ≥ L₁λ/(λ−1)} J(σ, x) ∨ limsup_{n→∞} (1/n) log P( (1/n) log ∫ e^{nλφ(σ)} µ_n(dσ) > λL₂ ).
The result is now a direct consequence of the two assumptions.

Theorem 11 Consider the sequence of probability measures ν_n = Eµ_n and assume that

    limsup_{n→∞} (1/n) log ∫ e^{nλφ(σ)} ν_n(dσ) < ∞    (12)

for some λ > 1. Then condition (10) is verified. Therefore, if (11) is also verified, the tail condition (6) holds true.

Proof The assertion easily follows from the Markov inequality:

    P( (1/n) log ∫ e^{nλφ(σ)} µ_n(dσ) > L ) = P( ∫ e^{nλφ(σ)} µ_n(dσ) > e^{nL} ) ≤ e^{−nL} E[ ∫ e^{nλφ(σ)} µ_n(dσ) ] = e^{−nL} ∫ e^{nλφ(σ)} ν_n(dσ).

Remark 12 In the application to the REM (and, we think, in many others) one has an easy control of ν_n, so that one can prove the (deterministic!) tail condition (12).
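The Markov-inequality step in the proof of Theorem 11 can be illustrated numerically; the log-normal variable below is an arbitrary stand-in (our own choice) for the nonnegative integral ∫ e^{nλφ} dµ_n:

```python
import math, random

random.seed(1)

# Arbitrary nonnegative toy variable Y (log-normal, our own stand-in):
# check the Markov bound P(Y > t) <= E[Y] / t, with t = exp(n*L).
samples = [math.exp(random.gauss(0.0, 2.0)) for _ in range(200000)]
EY = sum(samples) / len(samples)

n, L = 5, 1.0
t = math.exp(n * L)
tail = sum(1 for y in samples if y > t) / len(samples)
print(tail, EY / t)
assert tail <= EY / t   # Markov bound
```

The empirical tail is well below the Markov bound, which is all the proof of Theorem 11 needs: taking logs and dividing by n turns the bound into condition (10).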
2.3 A simplified version on the real line

In the application to the REM, the general idea of what we have to check is not difficult (certain properties of binomial distributions), but one has to distinguish several cases depending on the sets of the base A of the topology and on the value of v in assumptions (4) and (5). It is therefore very useful to state a simplified version of these assumptions, less general but easy to handle.

Consider the case Σ = R, with the Euclidean topology. Let A be a base of the topology given by all open intervals (a, b) with a and b taken in a dense subset of R (in the application we shall exclude a few isolated critical values). Let J : Σ × R → [0, ∞] be a lower semicontinuous function. Our aim is to verify that for every (a, b) ∈ A and every v ∈ R we have

    −inf_{σ∈(a,b), x>v} J(σ, x) ≤ liminf_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v )    (13)

    ≤ limsup_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v ) ≤ −inf_{σ∈[a,b], x≥v} J(σ, x).    (14)

We want to give a simple set of conditions implying these inequalities. The following propositions are, in a sense, well known in LD theory, but here we deal with a functional J(σ, x) of two variables, where σ has no classical meaning in LD theory, so we state and prove these simple facts for completeness.

Lemma 13 Assume that, for every σ ∈ Σ, the function x ↦ J(σ, x) is nondecreasing, and that, for every x, the function σ ↦ J(σ, x) is nondecreasing (resp. nonincreasing) on some interval A ⊂ Σ. If, for a triple (a, b, v) with [a, b] ⊂ A, we have

    lim_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v ) = −J(a, v)

(resp. = −J(b, v)), then for this triple we have (13) and (14).

Proof We have

    inf_{σ∈[a,b], x≥v} J(σ, x) = J(a, v)  and  inf_{σ∈(a,b), x>v} J(σ, x) ≥ J(a, v).

The claim easily follows from this fact.
Lemma 14 Assume that, for an interval (a, b) ∈ A, (13) holds true for all v except those of a discrete set. Then (13) holds true for every v. The same property applies to (14).

Proof Step 1. Let v be a value in the exceptional set and let ε* > 0 be such that v ± ε* lie in a connected neighborhood of v containing no other exceptional points. For ε ∈ (0, ε*) we have

    limsup_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v ) ≤ limsup_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v − ε ) ≤ −inf_{σ∈[a,b], x≥v−ε} J(σ, x),
so

    limsup_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v ) ≤ −sup_{ε∈(0,ε*)} inf_{σ∈[a,b], x≥v−ε} J(σ, x).

The proof is complete if we show that

    sup_{ε∈(0,ε*)} inf_{σ∈[a,b], x≥v−ε} J(σ, x) ≥ inf_{σ∈[a,b], x≥v} J(σ, x).

By contradiction, if there is δ > 0 such that

    sup_{ε∈(0,ε*)} inf_{σ∈[a,b], x≥v−ε} J(σ, x) ≤ inf_{σ∈[a,b], x≥v} J(σ, x) − δ,

then, taking a sequence ε_n ∈ (0, ε*) with ε_n → 0, there exist σ_n ∈ [a, b] and x_n ≥ v − ε_n such that

    J(σ_n, x_n) ≤ inf_{σ∈[a,b], x≥v} J(σ, x) − δ/2.

There is then a subsequence (σ_{n_k}, x_{n_k}) → (σ*, v) for some σ* ∈ [a, b]. By the lower semicontinuity,

    liminf_{k→∞} J(σ_{n_k}, x_{n_k}) ≥ J(σ*, v) ≥ inf_{σ∈[a,b], x≥v} J(σ, x).

This contradicts the previous inequality, so the proof of Step 1 is complete.

Step 2. Let v, ε* and ε be as in the previous step. We have

    liminf_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v ) ≥ liminf_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v + ε ) ≥ −inf_{σ∈(a,b), x>v+ε} J(σ, x),

so

    liminf_{n→∞} (1/n) log P( (1/n) log µ_n(a, b) > v ) ≥ −inf_{ε∈(0,ε*)} inf_{σ∈(a,b), x>v+ε} J(σ, x).

The proof is complete because

    inf_{ε∈(0,ε*)} inf_{σ∈(a,b), x>v+ε} J(σ, x) = inf_{σ∈(a,b), x>v} J(σ, x).
3 LDP for negative fluctuations

We always assume the tail condition (6); (K_j)_{j∈N} will be the sequence of compact subsets of Σ described there. Moreover, we assume to have a functional I⁻ : R → [0, ∞], a positive integer j₀, a finite family B_j ⊂ B(Σ) for every j ≥ j₀ (j ∈ N) such that Σ = K_j^c ∪ (∪_{B∈B_j} B), and two sequences ε_j → 0 and R_j → −∞ such that

    −inf_{y<x} I⁻(y) − ε_j ≤ liminf_{n→∞} (1/n) log P( (1/n) log µ_n(B) < (x − φ)_B^− ∀B ∈ B_j )    (15)

    ≤ limsup_{n→∞} (1/n) log P( (1/n) log µ_n(B) < (x − φ)_B^+ ∀B ∈ B_j ) ≤ ( −inf_{y≤x} I⁻(y) + ε_j ) ∨ R_j    (16)

for every x ∈ R and j ≥ j₀.

Remark 15 The previous conditions correspond to the intuitive requirement that for every x ∈ R we have (we cannot speak of µ_n(σ))

    −inf_{y<x} I⁻(y) ∼ (1/n) log P( (1/n) log µ_n(σ) ≤ x − φ(σ) for all σ ∈ Σ ).

When Σ is finite, this formulation is meaningful and may help to understand the proof of Theorem B given later.

Remark 16 Let T_φ = −φ + R be the set of all functions of the form −φ + c, c ∈ R. Since T_φ is isomorphic to R, the following formulation is equivalent to the main assumption (the correspondence is I⁻(x) = J*(x − φ(·))): assume to have a functional J* : T_φ → [0, ∞], a positive integer j₀ and a finite family B_j ⊂ B(Σ), for every j ≥ j₀, such that Σ = K_j^c ∪ (∪_{B∈B_j} B) and

    −inf_{g∈T_φ, g<f} J* − ε_j ≤ liminf_{n→∞} (1/n) log P( (1/n) log µ_n(B) < f_B^− ∀B ∈ B_j )
    ≤ limsup_{n→∞} (1/n) log P( (1/n) log µ_n(B) < f_B^+ ∀B ∈ B_j ) ≤ ( −inf_{g∈T_φ, g≤f} J* + ε_j ) ∨ R_j

for every f ∈ T_φ and j ≥ j₀. This more cumbersome scheme underlines the functional dependence on φ of the rate I⁻(x), which is somewhat similar to that of the theorems for positive fluctuations.
Theorem 17 (B) Under the previous assumptions, for every b ∈ R we have

    liminf_{n→∞} (1/n) log P( (1/n) log ∫ e^{nφ(σ)} µ_n(dσ) < b ) ≥ −inf_{x<b} I⁻(x)    (17)

    limsup_{n→∞} (1/n) log P( (1/n) log ∫ e^{nφ(σ)} µ_n(dσ) < b ) ≤ −inf_{x≤b} I⁻(x).    (18)

Proof Along the proof, we shall denote ∫ e^{nφ(σ)} µ_n^ω(dσ) by Z_n^ω.

Step 1 (upper bound (18)). For every j ≥ j₀, we have

    P( (1/n) log Z_n < b ) = P( ∫ e^{nφ(σ)} µ_n(dσ) < e^{nb} ) ≤ P( ∫_B e^{nφ(σ)} µ_n(dσ) < e^{nb} ∀B ∈ B_j )
    ≤ P( e^{nφ_B^−} µ_n(B) < e^{nb} ∀B ∈ B_j ) = P( (1/n) log µ_n(B) < b − φ_B^− ∀B ∈ B_j ).

Therefore, from our assumptions (notice that b − φ_B^− = (b − φ)_B^+),

    limsup_{n→∞} (1/n) log P( (1/n) log Z_n < b ) ≤ limsup_{n→∞} (1/n) log P( (1/n) log µ_n(B) < (b − φ)_B^+ ∀B ∈ B_j ) ≤ ( −inf_{y≤b} I⁻(y) + ε_j ) ∨ R_j.

The claim easily follows letting j → ∞.

Step 2 (lower bound (17)). If N_j = card(B_j) + 1, we have

    P( (1/n) log Z_n < b ) = P( ∫ e^{nφ(σ)} µ_n(dσ) < e^{nb} )
    ≥ P( ∫_B e^{nφ(σ)} µ_n(dσ) < e^{nb}/N_j ∀B ∈ B_j  and  ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) < e^{nb}/N_j )
    ≥ P( (1/n) log µ_n(B) < b − φ_B^+ − (1/n) log N_j ∀B ∈ B_j  and  (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) < b − (1/n) log N_j )
and therefore, given ε > 0, there exists n₀ (depending on j) such that for every n ≥ n₀

    P( (1/n) log Z_n < b ) ≥ P( (1/n) log µ_n(B) < b − φ_B^+ − ε ∀B ∈ B_j  and  (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) < b − ε )
    ≥ P( (1/n) log µ_n(B) < b − φ_B^+ − ε ∀B ∈ B_j ) − P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) ≥ b − ε ).

Assume inf_{x<b} I⁻(x) < ∞, otherwise the proof is complete. Given ε₀ > 0, let x̄ < b be such that I⁻(x̄) ≤ inf_{x<b} I⁻(x) + ε₀, and restrict to ε < b − x̄, so that b − ε > x̄. By the tail condition, let j₀' ≥ j₀ be such that for every j ≥ j₀'

    limsup_{n→∞} (1/n) log P( (1/n) log ∫_{K_j^c} e^{nφ(σ)} µ_n(dσ) > b − ε ) < −inf_{x<b} I⁻(x) − 2.

Since, applying (15) with x = b − ε,

    liminf_{n→∞} (1/n) log P( (1/n) log µ_n(B) < b − φ_B^+ − ε ∀B ∈ B_j ) ≥ −inf_{x<b−ε} I⁻(x) − ε_j ≥ −I⁻(x̄) − ε_j,

by the previous choice of j₀' we have, for all j ≥ j₀',

    liminf_{n→∞} (1/n) log P( (1/n) log Z_n < b ) ≥ −I⁻(x̄) − ε_j ≥ −inf_{x<b} I⁻(x) − ε₀ − ε_j.

Letting j → ∞ (so that ε_j → 0), we get

    liminf_{n→∞} (1/n) log P( (1/n) log Z_n < b ) ≥ −inf_{x<b} I⁻(x) − ε₀;

hence, by the arbitrariness of ε₀ > 0, the lower bound (17) follows.
3.1 How to find I⁻(x)

In the application to the REM it is not too difficult to identify the correct functional J(σ, x) of Theorem A by a direct computation of the probabilities related to it. In contrast, a direct analysis of the conditions of Theorem B is hard for two reasons: (1) they involve joint distributions; (2) consequently, one needs an a priori guess of I⁻(x).

About (2), it is helpful to detect the region where I⁻(x) = 0. The first lemma of this section is devoted to this purpose. About (1), a lucky occurrence is when, outside the region where I⁻(x) = 0, it is not necessary to compute joint probabilities, since just one marginal is sufficient to get I⁻(x) = ∞ (a fortiori the same is true for the joint probabilities). This is the content of the second lemma.

Lemma 18 Assume that conditions (4) and (5) of Theorem A hold true and let I_µ : Σ → [0, ∞] be defined as

    I_µ(σ) = −inf{x : J(σ, x) > 0}    (19)

(I_µ(σ) = +∞ if {x : J(σ, x) > 0} = ∅). Let

    f̄ = sup_{σ∈Σ} (φ(σ) − I_µ(σ)).    (20)

Assume there exist a function I⁻ : (−∞, f̄) → [0, ∞], a positive integer j₀, a finite family B_j ⊂ B(Σ) for every j ≥ j₀ (j ∈ N) such that Σ = K_j^c ∪ (∪_{B∈B_j} B), and two sequences ε_j → 0 and R_j → −∞ such that (15)–(16) hold true for every x < f̄. Assume the following technical conditions:

(i) For every j ≥ j₀ and B ∈ B_j there is U_B ∈ A such that B ⊂ U_B and

    lim_{j→∞} max_{B∈B_j} (φ_B^+ − φ_{U_B}^−) = 0,

where A is the base of the topology of Σ used in Theorem A.

(ii) For every ε > 0 there exists j_ε ≥ j₀ such that for every j ≥ j_ε and B ∈ B_j we have

    inf_{σ∈U_B, y ≥ −I_µ(σ)+ε} J(σ, y) > 0.

Extend I⁻ to [f̄, ∞) by setting I⁻(x) = 0 for every x ≥ f̄. Then (15)–(16) hold true also for every x ≥ f̄, with the same B_j, ε_j, R_j and possibly a new j₀.

Proof Step 1 (upper estimate). Since for every x ≥ f̄

    limsup_{n→∞} (1/n) log P( (1/n) log µ_n(B) < (x − φ)_B^+ ∀B ∈ B_j ) ≤ 0 ≤ ε_j = ( −inf_{y≤x} I⁻(y) + ε_j ) ∨ R_j,

the upper estimate is obvious.
Step 2 (lower estimate, $x>f$). We have to prove that for sufficiently large $j\in\mathbb N$ and all $x>f$,
$$\liminf_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(B)<(x-\phi)_B^-\ \ \forall B\in\mathcal B_j\Big)\ge-\varepsilon_j.$$
This is true if we prove that there exists $j_0$ such that for every $j\ge j_0$
$$\lim_{n\to\infty}P\Big(\frac1n\log\mu_n(B)<(x-\phi)_B^-\ \ \forall B\in\mathcal B_j\Big)=1,$$
which in turn is implied by
$$\lim_{n\to\infty}P\Big(\frac1n\log\mu_n(B)<(x-\phi)_B^-\Big)=1\quad\text{for every }B\in\mathcal B_j,$$
or equivalently
$$\lim_{n\to\infty}P\Big(\frac1n\log\mu_n(B)\ge(x-\phi)_B^-\Big)=0.$$
Therefore, it is sufficient to prove that there exists $j_0$ such that for every $j\ge j_0$ and $B\in\mathcal B_j$,
$$\lim_{n\to\infty}P\Big(\frac1n\log\mu_n(U_B)\ge(x-\phi)_B^-\Big)=0.$$
From (5) (recall that $U_B$ was chosen in $\mathcal A$), this limit is zero in particular when
$$\inf_{\sigma\in U_B,\ y\ge(x-\phi)_B^-}J(\sigma,y)>0.$$
Set $\varepsilon=\frac{x-f}2$. Let $j_0$ be such that for every $j\ge j_0$ we have $\phi_B^+-\phi_{U_B}^-<\varepsilon$ for all $B\in\mathcal B_j$. Then for every $\sigma\in U_B$,
$$(x-\phi)_B^-=x-\phi_B^+>x-\phi_{U_B}^--\varepsilon\ge x-\phi(\sigma)-\varepsilon=x-\big(\phi(\sigma)-I_\mu(\sigma)\big)-I_\mu(\sigma)-\varepsilon\ge x-f-I_\mu(\sigma)-\varepsilon=-I_\mu(\sigma)+\varepsilon.$$
Therefore,
$$\inf_{\sigma\in U_B,\ y\ge(x-\phi)_B^-}J(\sigma,y)\ge\inf_{\sigma\in U_B,\ y\ge-I_\mu(\sigma)+\varepsilon}J(\sigma,y),$$
and the latter is positive by assumption (ii).

Step 3 (lower estimate, $x=f$). Given $j$, we have
$$\liminf_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(B)<(f-\phi)_B^-\ \ \forall B\in\mathcal B_j\Big)\ge\sup_{\varepsilon>0}\liminf_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(B)<(f-\varepsilon-\phi)_B^-\ \ \forall B\in\mathcal B_j\Big)$$
$$\ge-\inf_{\varepsilon>0}\inf_{y<f-\varepsilon}I^-(y)=-\inf_{y<f}I^-(y).$$
The proof is complete. □
Lemma 19. Given a number $b$, a positive integer $j_0$, a finite family $\mathcal B_j\subset\mathcal B(\Sigma)$ for every $j\ge j_0$ ($j\in\mathbb N$) such that $\Sigma=K_j^c\cup(\cup_{B\in\mathcal B_j}B)$, and a sequence $\varepsilon_j\to0$, assume that for every $x<b$ there is a sequence $B_j\in\mathcal B_j$ such that $\lim_{j\to\infty}R_j=-\infty$, where
$$R_j=\limsup_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(B_j)<(x-\phi)_{B_j}^+\Big).$$
Define
$$I^-(x)=\infty\quad\text{for every }x<b.$$
Then, with this sequence $R_j$, conditions (15)–(16) hold true for every $x<b$.

Proof. With $I^-(x)=\infty$ for every $x<b$, in order to prove (15)–(16) we only have to prove that
$$\limsup_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(B)<(x-\phi)_B^+\ \ \forall B\in\mathcal B_j\Big)\le R_j$$
for some sequence $R_j\to-\infty$. This is obvious with the choice of $R_j$ made in the statement, since the event on the left is contained in the corresponding event for the single set $B_j$; hence the lemma is proved. □

Remark 20. The good sets $B_j$ to which the previous lemma applies are expected to be small sets around the maximum points of $\phi-I_\mu$.
3.2 Full LDP

Theorem 21. Under the assumptions of both Theorems A and B, assume that there exists a number $f$ such that the function
$$I(x)=\begin{cases}I^+(x)&\text{if }x>f\\ 0&\text{if }x=f\\ I^-(x)&\text{if }x<f\end{cases}$$
is strictly increasing for $x\ge f$ and strictly decreasing for $x\le f$. Then
$$\frac1n\log\int e^{n\phi(\sigma)}\mu_n^\omega(d\sigma)$$
satisfies a LDP with rate function $I$.

In the previous statement, we understand that strict monotonicity has to hold only in the regions where $I$ is finite. The passage from half-lines to general Borel sets under strict monotonicity of the rate function is standard, so the proof of the theorem is omitted.
4 Application to the (extended) REM

4.1 Preliminaries on $\Lambda$ and $\Lambda^*$

Let $\{X_n\}$ and $(X_{s,n};\,s\in S_n)$ be the random variables introduced in Sect. 1.3, with the basic assumptions on $\Lambda$ imposed there (existence of a finite value $\Lambda(\beta)$ for every $\beta$, differentiability and strict convexity of $\Lambda$). For shortness, here we shall always call REM such a general model. Recall that $\Lambda$ is always convex (see Lemma 2.3.9 of [2]), so the strict convexity is the true assumption.

We recall some basic facts about the F–L (Fenchel–Legendre) transform which will come in handy. The following statement can be found, for example, in [6], Theorems VI.5.3 and VI.5.6 and the subsequent discussion.

Theorem 22. Let $f$ be a convex lower semicontinuous function on $\mathbb R$ and let $*$ denote the F–L transform. Then the following conclusions hold.

1. $f^*$ is a convex lower semicontinuous function on $\mathbb R$.
2. $xy\le f(x)+f^*(y)$ for all $x$ and $y$ in $\mathbb R$.
3. $xy=f(x)+f^*(y)$ if and only if $y$ lies between the left and the right derivative of $f$ at $x$.
4. $f^{**}=f$.
5. $f$ is strictly convex on its domain if and only if $f^*$ is essentially smooth. A function $f$ is essentially smooth if: (i) the interior $D_f^\circ$ of its domain $D_f=\{x:f(x)<\infty\}$ is nonempty; (ii) $f$ is differentiable on $D_f^\circ$; (iii) at finite boundary points of $D_f$, the lateral derivative is infinite.

Therefore, by our assumptions on $\Lambda$, all these properties hold true both for $\Lambda$ and $\Lambda^*$. By elementary inspection, $\Lambda$ and $\Lambda^*$ are symmetric and $\Lambda(0)=\Lambda^*(0)=0$. Hence, by strict convexity, both are strictly positive away from zero, increasing on the positive half-line, and $\lim_{a\to\infty}\Lambda^*(a)=+\infty$, $\lim_{\beta\to\infty}\Lambda(\beta)=+\infty$. For the same reason, they are both good rate functions. The assumptions of the Gärtner–Ellis theorem apply (Theorem 2.3.6 in [2]), so $\{X_n\}$ satisfies the LDP with rate function $\Lambda^*$. From the continuity of $\Lambda^*$ on $D_{\Lambda^*}^\circ$, we have
$$\Lambda^*(a)=-\lim_{n\to\infty}\frac1n\log P\big(X_n\in(a,b)\big)$$
for $0\le a<b$, independently of $b$ (even infinite), possibly with the exception of the value $a=\sup D_{\Lambda^*}$.

Finally, let us prove the less obvious claims of Remark 3. The facts about $D_{\Lambda^*}$, $\Lambda^*$, and their consequences on $a^*$ have already been explained. The final claim that $f_\beta\ge a^*$ for all $\beta>0$ is immediate because, by Theorem 22, $a\beta\le\Lambda^*(a)+\Lambda(\beta)$ for all values of $a$ and $\beta$; we choose $a=a^*$, recall that $\Lambda^*(a^*)\le R\log2$, and look at the definition of $f_\beta$.
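These conjugacy facts are easy to probe numerically. As a concrete illustration (our choice, not part of the model's assumptions) take the Gaussian case $\Lambda(\beta)=\beta^2/2$, whose F–L transform is $\Lambda^*(a)=a^2/2$; a grid-based sketch of items 2 and 4 of Theorem 22:

```python
# Numerical sketch of the F-L (Fenchel-Legendre) transform in the
# Gaussian case Lambda(beta) = beta^2/2 (illustrative choice of ours).

def legendre(f, y, grid):
    # f*(y) = sup_x (x*y - f(x)), approximated over a finite grid
    return max(x * y - f(x) for x in grid)

Lambda = lambda b: b * b / 2.0
Lambda_star = lambda a: a * a / 2.0               # closed form of the transform

fine = [i / 1000.0 for i in range(-5000, 5001)]   # step 0.001 on [-5, 5]
coarse = [i / 20.0 for i in range(-100, 101)]     # step 0.05  on [-5, 5]

# the numerical transform matches the closed form
for a in (0.0, 0.5, 1.3, 2.0):
    assert abs(legendre(Lambda, a, fine) - Lambda_star(a)) < 1e-3

# Theorem 22, item 2: Young's inequality a*b <= Lambda(b) + Lambda_star(a),
# with equality when a = Lambda'(b) = b
for a in (0.3, 1.0):
    for b in (0.2, 1.7):
        assert a * b <= Lambda(b) + Lambda_star(a) + 1e-12
assert abs(1.0 * 1.0 - (Lambda(1.0) + Lambda_star(1.0))) < 1e-12

# Theorem 22, item 4: involutivity, Lambda** = Lambda
for b in (0.0, 0.7, 1.5):
    double = legendre(lambda a: legendre(Lambda, a, coarse), b, coarse)
    assert abs(double - Lambda(b)) < 1e-2
```

The same sketch works for any strictly convex $\Lambda$ with full domain, which is exactly the standing assumption of this section.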
4.2 Translation of the problem

Let $Z_n^\omega$ and $\mu_n^\omega$ be the objects introduced in Sect. 1.3:
$$Z_n^\omega=\sum_{s\in S_n}e^{n\beta X_{s,n}(\omega)},\qquad \mu_n^\omega=\frac1{\operatorname{card}(S_n)}\sum_{s\in S_n}\delta_{X_{s,n}(\omega)}.$$
We set
$$\tilde Z_n^\omega=\int_{\mathbb R}e^{n\beta\sigma}\mu_n^\omega(d\sigma),$$
so that (as we explained in Sect. 1.1) $Z_n^\omega=\operatorname{card}(S_n)\,\tilde Z_n^\omega$. The strategy is to apply Theorems A and B to $\tilde Z_n^\omega$ and then to get, with a suitable translation, a LDP for $Z_n^\omega$.

Proposition 23. With the notations of Theorem 4, $\frac1n\log\tilde Z_n^\omega$ satisfies a LDP with the good rate function
$$\tilde I(t)=I\Big(\frac{t+R\log2}\beta\Big)=\begin{cases}+\infty&\text{if }t<\tilde f_\beta\\ 0&\text{if }t=\tilde f_\beta\\ \Lambda^*\big(\frac{t+R\log2}\beta\big)-R\log2&\text{if }t>\tilde f_\beta\end{cases}$$
where
$$\tilde f_\beta=\beta f_\beta-R\log2=\begin{cases}\Lambda(\beta)&\text{if }\beta<\beta^*\\ \beta a^*-R\log2&\text{if }\beta\ge\beta^*.\end{cases}$$
Moreover, $\tilde I(t)>0$ (possibly infinite) for $t>\tilde f_\beta$.

The proof of this proposition will be given in Sects. 4.3 and 4.4. More precisely, in Sect. 4.3 we shall apply Theorem A to prove the LDP for $\frac1n\log\tilde Z_n^\omega$ for positive fluctuations, and we shall find the rate $I^+$ equal to $\Lambda^*\big(\frac{t+R\log2}\beta\big)-R\log2$ for $t>\tilde f_\beta$ (and zero otherwise). Then, in Sect. 4.4, we shall apply Theorem B to prove the LDP for negative fluctuations, with rate $I^-$ equal to $\infty$ for $t<\tilde f_\beta$ and $0$ for $t\ge\tilde f_\beta$. We then match the rates at $t=\tilde f_\beta$ as in Theorem 21. The new global rate, which is the function $\tilde I(t)$ defined earlier, satisfies the assumption of strict monotonicity of Theorem 21 (for the reason described next), so $\frac1n\log\tilde Z_n^\omega$ satisfies the LDP with rate $\tilde I$. Moreover, $\tilde I$ is a good rate function, since $\Lambda^*$ is a good rate function (see Sect. 4.1). Finally, the property $\tilde I(t)>0$ for $t>\tilde f_\beta$ amounts to $\Lambda^*(t')>R\log2$ for $t'=\frac{t+R\log2}\beta>f_\beta$, which is true, since $t'>f_\beta$ implies $t'>a^*$, and $t'>a^*$ implies $\Lambda^*(t')>R\log2$ (see Remark 3 for both facts).

We promised to prove that $\tilde I(t)$ is strictly increasing for $t\ge\tilde f_\beta$. The claim is algebraically equivalent to proving that, for $t>f_\beta$, $\Lambda^*(t)>R\log2$ and $\Lambda^*(t)$ is
strictly increasing. The first fact has been proved earlier; the second claim is true for all positive values of $t$, and of course $f_\beta\ge a^*>0$. The proof is complete.

The relation with Theorem 4 is given by the following proposition.

Proposition 24. Theorem 4 and Proposition 23 are equivalent.

The equivalence is simply based on the identity
$$\frac1{\beta n}\log Z_n^\omega=\frac1\beta\,\frac1n\log\tilde Z_n^\omega+\frac{R\log2}\beta+\frac1\beta\Big(\frac1n\log\operatorname{card}(S_n)-R\log2\Big)$$
and the first two claims of the following lemma.

Lemma 25. Let $\{\alpha_n\}$ be a sequence of real numbers such that $\lim_{n\to\infty}\alpha_n=0$. Let $\lambda\ne0$ and $\eta$ be two real numbers. Let $\{Y_n\}$ be a sequence of real random variables satisfying the LDP with the good rate function $G$. Then:
(i) $\{Y_n+\alpha_n\}$ satisfies the LDP with the same rate function $G$;
(ii) $\{\lambda Y_n+\eta\}$ satisfies the LDP with the good rate function $H$ defined as $H(x)=G\big(\frac{x-\eta}\lambda\big)$;
(iii) if $E[Y_n]$ converges to a real number $M$, then $\{Y_n-E[Y_n]\}$ satisfies the LDP with the rate function $G(\cdot+M)$.

Proof (of (i)). We do not give all the standard details but just notice that the upper estimate boils down to proving (easy, by contradiction, using the goodness of $G$) that
$$\sup_n\inf_{A^{1/n}}G\ge\inf_{\overline A}G$$
for any Borel set $A$, where $A^\varepsilon$ denotes the $\varepsilon$-neighborhood of $A$; the lower estimate reduces to proving (again easy, by contradiction) that
$$\inf_{U\in\mathcal C}\inf_UG\le\inf_{A^\circ}G$$
for any Borel set $A$ with $A^\circ\ne\emptyset$, where $\mathcal C$ is the class of all open sets $U\subset A^\circ$ having positive distance from $A^c$. The proof of (ii) is trivial, and the proof of (iii) follows from (i) and (ii). □

Assertion (iii) has been used in Sect. 1.4.

4.3 LDP for $\frac1n\log\tilde Z_n^\omega$, positive fluctuations

The aim of this section is to apply Theorem A to
$$\frac1n\log\tilde Z_n^\omega=\frac1n\log\int_{\mathbb R}e^{n\beta\sigma}\mu_n^\omega(d\sigma).$$
To verify the assumptions of Theorem A we apply Lemmas 13 and 14. In Sect. 4.3.1 we prove that the random measure $\mu_n^\omega$ satisfies the assumptions of Lemma 10. Then, in Sect. 4.3.2, we identify $I^+$ as
$$I^+(x)=\begin{cases}0&\text{if }x\le\tilde f_\beta\\ \Lambda^*\big(\frac{x+R\log2}\beta\big)-R\log2&\text{if }x>\tilde f_\beta.\end{cases}$$
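To make this rate concrete, consider again the Gaussian illustration $\Lambda(\beta)=\beta^2/2$ with $R=1$ (parameter choices are ours); then $\Lambda^*(a)=a^2/2$, $a^*=\sqrt{2R\log2}$ and $\beta^*=a^*$, and $I^+$ can be tabulated and checked to vanish up to $\tilde f_\beta$ and to be strictly positive and increasing beyond it:

```python
import math

# Gaussian REM illustration (parameter choices are ours)
R = 1.0
RLOG2 = R * math.log(2.0)

Lambda_star = lambda a: a * a / 2.0
a_star = math.sqrt(2.0 * RLOG2)        # Lambda_star(a_star) = R log 2
beta_star = a_star                     # Lambda'(beta) = beta in the Gaussian case

def f_tilde(beta):
    # threshold of Proposition 23: Lambda(beta) before freezing, linear after
    return beta * beta / 2.0 if beta < beta_star else beta * a_star - RLOG2

def I_plus(x, beta):
    # rate of positive fluctuations of (1/n) log Z~_n
    if x <= f_tilde(beta):
        return 0.0
    return Lambda_star((x + RLOG2) / beta) - RLOG2

for beta in (0.5, 2.0):                # one beta on each side of beta*
    ft = f_tilde(beta)
    assert I_plus(ft, beta) == 0.0     # zero at the typical value
    rates = [I_plus(ft + 0.1 * k, beta) for k in range(1, 20)]
    assert all(r > 0.0 for r in rates)                   # positive above f~_beta
    assert all(b > a for a, b in zip(rates, rates[1:]))  # strictly increasing
```

Note that for $\beta<\beta^*$ the sketch exhibits a jump of $I^+$ at $\tilde f_\beta$, consistent with the claim $\tilde I(t)>0$ for $t>\tilde f_\beta$ in Proposition 23.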
Fig. 1 Values of the function J (σ, x).
4.3.1 Check of the assumptions of Theorem A

Let $\Sigma=\mathbb R$ with the Euclidean topology. Let $\mathcal A$ be the base of the topology given by all open intervals $(a,b)$ such that
$$a\ne0,\quad \Lambda^*(a)\ne R\log2,\qquad b\ne0,\quad \Lambda^*(b)\ne R\log2.$$
Let $J:\Sigma\times\mathbb R\to[0,\infty]$ be defined as
$$J(\sigma,x)=\begin{cases}\big(\Lambda^*(\sigma)-R\log2\big)^+&\text{for }x\le-\big(\Lambda^*(\sigma)\wedge R\log2\big)\\ \infty&\text{for }x>-\big(\Lambda^*(\sigma)\wedge R\log2\big).\end{cases}$$
It is easy to verify that $J$ is lower semicontinuous and that for every $v$ and $\gamma$, the set of $(\sigma,x)$ such that $x\ge v$ and $J(\sigma,x)\le\gamma$ is compact, so $J$ satisfies the condition, similar to the property of a good rate function, assumed in Sect. 2. Moreover, for every $x$, $\sigma\mapsto J(\sigma,x)$ is convex, symmetric and nondecreasing over positive $\sigma$, and for every $\sigma$, $x\mapsto J(\sigma,x)$ is nondecreasing and piecewise constant (Fig. 1).

The function $I_\mu$ defined by (19) takes the form
$$I_\mu(\sigma)=\begin{cases}\Lambda^*(\sigma)&\text{if }\Lambda^*(\sigma)\le R\log2\\ \infty&\text{if }\Lambda^*(\sigma)>R\log2.\end{cases}$$
It will turn out to be the rate function of $\mu_n^\omega$, for $P$-a.e. $\omega\in\Omega$.

Lemma 26. The limit
$$\lim_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(a,b)>v\Big)=-J(a,v) \qquad(21)$$
holds true for every triple $(a,b,v)$ such that $0<a<b$, $\Lambda^*(a)\ne R\log2$, $a\ne\sup D_{\Lambda^*}$, $v\ne0$, $v\ne-\Lambda^*(a)$, $v\ne-R\log2$.
Proof. As we mentioned in Sect. 1,
$$P\Big(\frac1n\log\mu_n(a,b)>v\Big)=P\Big(\sum_{s\in S_n}1_{\{X_{s,n}\in(a,b)\}}>\operatorname{card}(S_n)\,e^{nv}\Big)=P(B_{N_n,p_n}>x_n)$$
where $B_{N_n,p_n}$ is a binomial of parameters $N_n=\operatorname{card}(S_n)$ and $p_n=P(X_n\in(a,b))$, and $x_n=N_ne^{nv}$. We have
$$\lim_{n\to\infty}\frac1n\log N_n=R\log2,\qquad \lim_{n\to\infty}\frac1n\log p_n=-\Lambda^*(a),\qquad \lim_{n\to\infty}\frac1n\log x_n=v+R\log2.$$
Therefore, we use the results of the appendix with
$$\alpha=R\log2,\qquad \beta=\Lambda^*(a),\qquad \gamma=v+R\log2.$$
This is the table of correspondence between the results of the appendix and what we need:
- $J(a,v)=0$ when $v\le-(\Lambda^*(a)\wedge R\log2)$ and $\Lambda^*(a)\le R\log2$, namely when $v\le-\Lambda^*(a)$ and $\Lambda^*(a)\le R\log2$; this case corresponds to $\gamma\le\alpha-\beta$ and $\alpha-\beta\ge0$ and hence is covered by (24) and (29), since $v\ne-\Lambda^*(a)$, $\Lambda^*(a)\ne R\log2$ and $v\ne-R\log2$;
- $J(a,v)=\Lambda^*(a)-R\log2$ when $v\le-R\log2$ and $R\log2<\Lambda^*(a)$; this case is covered by (23), since $v\ne-R\log2$;
- $J(a,v)=\infty$ when $v>-(\Lambda^*(a)\wedge R\log2)$, that is, when $v+R\log2>(R\log2-\Lambda^*(a))^+$; since $v\ne0$, this case is covered by (22) if $v>0$ and by (25) if $v<0$.
The proof is complete. □

Lemma 27. With $K_j=[-j,j]$, the tail condition (6) holds true.

Proof. We have
$$\inf_{\sigma\in K_j^c,\ x\ge L}J(\sigma,x)=\inf_{|\sigma|\ge j,\ x\ge L}J(\sigma,x)=J(j,L),$$
so, for $j$ sufficiently large ($j>a^*$),
$$\inf_{\sigma\in K_j^c,\ x\ge L}J(\sigma,x)=\begin{cases}\big(\Lambda^*(j)-R\log2\big)^+&\text{for }L\le-R\log2\\ \infty&\text{for }L>-R\log2\end{cases}$$
and therefore $\lim_{j\to\infty}\inf_{\sigma\in K_j^c,\,x\ge L}J(\sigma,x)=+\infty$ for every $L$. This is condition (11).

As to (12), for bounded continuous functions $g$ we have
$$E\Big[\int g(\sigma)\,\mu_n^\omega(d\sigma)\Big]=\frac1{\operatorname{card}(S_n)}\sum_{s\in S_n}E[g(X_{s,n})]=E[g(X_n)]=\int_{\mathbb R}g(x)\,\nu_n(dx)$$
where $\nu_n$ is the law of $X_n$. Thus, $E[\mu_n]=\nu_n$. Given any $\lambda>1$, we have
$$\int_{\mathbb R}e^{n\lambda\beta x}\,\nu_n(dx)=E[\exp(n\lambda\beta X_n)],$$
so condition (12) is a consequence of our assumptions on $\Lambda(\beta)$. By Theorem 11, the tail condition (6) holds true. □

Corollary 28. The assumptions of Theorem A hold true, with $J$ given earlier.

Proof. The function $J$, restricted to $\sigma\in(0,\infty)$, satisfies the assumptions of Lemma 13. Hence, for the triples $(a,b,v)$ of Lemma 26, we have (13)–(14). The same is true for the triples $(a,b,v)$ such that $a<b<0$, $\Lambda^*(b)\ne R\log2$, $b\ne\inf D_{\Lambda^*}$, $v\ne0$, $v\ne-\Lambda^*(b)$, $v\ne-R\log2$, because the random variables $\mu_n(a,b)$ and $\mu_n(-b,-a)$ have the same law under $P$, and $J$ is symmetric in $\sigma$. In this way, for every interval $(a,b)$ of $\mathcal A$ except those with $a<0<b$, (13)–(14) are satisfied for all $v$ except three values. By Lemma 14, they are then satisfied for every $v$.

When $a<0<b$, the case $v>0$ is trivial as before. When $v<0$,
$$\inf_{\sigma\in(a,b),\ x>v}J(\sigma,x)=\inf_{\sigma\in[a,b],\ x\ge v}J(\sigma,x)=\inf_{x>v}J(0,x)=0.$$
Thus, it suffices to show that
$$P\Big(\frac1n\log\mu_n(a,b)>v\Big)\to1.$$
Since
$$P\big(\mu_n(a,b)>e^{nv}\big)\ge P\Big(\mu_n[b,\infty)<\frac{1-e^{nv}}2,\ \mu_n(-\infty,a]<\frac{1-e^{nv}}2\Big),$$
it is sufficient to know that $P(\mu_n(\alpha,\infty)>C)\to0$ for every $\alpha>0$ and $C\in(0,1)$. This is guaranteed by (30) and completes the proof when $v<0$. The case $v=0$ follows again from Lemma 14. Finally, the tail condition (6) has been verified in the previous lemma. The proof of the corollary is complete. □
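The key quantitative input above, the rate $\lim_n\frac1n\log p_n=-\Lambda^*(a)$ for $p_n=P(X_n\in(a,b))$, can be checked directly in the Gaussian illustration $X_n\sim N(0,1/n)$ (so that $\Lambda(\beta)=\beta^2/2$; the scaling and the parameter values are our assumptions):

```python
import math

def p_n(n, a, b):
    # P(X_n in (a,b)) for X_n ~ N(0, 1/n), via the complementary error function
    # (erfc keeps precision where 1 - Phi would round to zero)
    upper_tail = lambda t: 0.5 * math.erfc(t * math.sqrt(n / 2.0))
    return upper_tail(a) - upper_tail(b)

a, b = 0.8, 1.5
rate_target = a * a / 2.0        # Lambda*(a): the lower endpoint dominates

rates = [-math.log(p_n(n, a, b)) / n for n in (50, 200, 800)]
# convergence is polynomially slow (Gaussian prefactors), hence loose tolerances
assert abs(rates[-1] - rate_target) < 0.01
assert abs(rates[0] - rate_target) > abs(rates[-1] - rate_target)
```

The same computation with $a<0<b$ gives $p_n\to$ const and rate $0$, which is why the intervals straddling the origin had to be treated separately in Corollary 28.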
Remark 29. It is natural to ask whether the properties of $\mu_n$ just discussed are a byproduct of Sanov's theorem or of the $k$-dimensional Cramér theorem since, at least in the particular case when all the $X_{s,n}$ have the same law, $\mu_n$ is an empirical measure of the kind treated by these theorems. Given Borel subsets $A_1,\dots,A_k$ and $F_1,\dots,F_k$ of $\mathbb R$, these theorems allow us to compute asymptotically
$$P\big(\mu_n(A_1)\in F_1,\dots,\mu_n(A_k)\in F_k\big).$$
This is not sufficient for us, since we have to deal with sets $A_{j,n}$ and $F_{j,n}$ which themselves depend on $n$.

4.3.2 Computation of $I^+$

We first have to show that $\tilde f_\cdot$ is the F–L transform of $I_\mu$. Recall the definitions of $a^*$ and $\beta^*$ given in Theorem 4.

Lemma 30. $I_\mu$ and $\tilde f_\cdot$ are F–L transforms of each other.

Proof. Convexity and the F–L transform of a function $f$ are better understood by introducing the set of lines lying below $f$. Let
$$R(f)=\{(a,b):ax+b\le f(x)\ \forall x\in\mathbb R\}.$$
Then $f$ is convex and lower semicontinuous iff $f(x)=\sup\{ax+b:(a,b)\in R(f)\}$, and moreover its transform is given by
$$f^*(a)=-\inf_x\big(f(x)-ax\big)=-\sup\{b:b\le f(x)-ax\ \forall x\in\mathbb R\}=-\sup\{b:(a,b)\in R(f)\}.$$
We claim that $R(\tilde f)$ is the subset of $R(\Lambda)$ with $a\le a^*$ (by symmetry, the constraint is really $|a|\le a^*$). This is quite obvious, since $\tilde f$ and $\Lambda$ coincide until the derivative of $\Lambda$ reaches $a^*$, and from there on the former grows with constant slope. More formally, note that $\Lambda(\beta)$ is nowhere below $a^*\beta-\Lambda^*(a^*)\ge\beta a^*-R\log2$, and the two are tangent at $\beta=\beta^*$ (possibly infinite, with an asymptote), so that $\tilde f\le\Lambda$ and hence $R(\tilde f)\subseteq R(\Lambda)$. By the same argument, even when $\beta^*=\infty$, if $(a,b)\in R(\tilde f)$, then the line $a\beta+b$ must eventually lie below $a^*\beta-\Lambda^*(a^*)$, so $a\le a^*$. The other direction is even simpler. Suppose $(a,b)\in R(\Lambda)$ and $a\le a^*$; for all $\beta$ we have $a\beta+b\le\Lambda(\beta)$. On the set $\beta<\beta^*$ we are done; then, supposing $\beta^*<\infty$, at $\beta=\beta^*$ we have $a\beta^*+b\le\Lambda(\beta^*)=a^*\beta^*-R\log2$; since the line $a\beta+b$ starts below and has smaller slope, it stays below the line $a^*\beta-R\log2$ on all of $[\beta^*,\infty)$. Using the claim,
$$\tilde f^*(a)=-\sup\{b:(a,b)\in R(\tilde f)\}=\begin{cases}-\sup\{b:(a,b)\in R(\Lambda)\}&\text{if }|a|\le a^*\\ -\sup\emptyset&\text{if }|a|>a^*\end{cases}=\begin{cases}\Lambda^*(a)&\text{if }|a|\le a^*\\ +\infty&\text{if }|a|>a^*.\end{cases}$$
This shows that $I_\mu$ is the F–L transform of $\tilde f_\cdot$. To see the converse, we simply note that $\tilde f_\beta$ is convex and lower semicontinuous in $\beta$, so that $\tilde f=\tilde f^{**}=I_\mu^*$. □

Let us now compute the function
$$I^+(x)=\inf_{\sigma\in\Sigma}J\big(\sigma,x-\phi(\sigma)\big)=\inf_{\sigma\in\mathbb R}J(\sigma,x-\beta\sigma).$$

Lemma 31. The function $I^+$ is given by
$$I^+(x)=\begin{cases}\Lambda^*\big(\frac{x+R\log2}\beta\big)-R\log2&\text{if }x>\tilde f_\beta\\ 0&\text{if }x\le\tilde f_\beta.\end{cases}$$

Proof. By the definition of $I^+$ and $I_\mu$, and by Lemma 9, we have
$$I^+(x)>0\iff J(\sigma,x-\beta\sigma)>0\ \forall\sigma\in\mathbb R\iff I_\mu(\sigma)>-x+\beta\sigma\ \forall\sigma\in\mathbb R\iff x>\tilde f_\beta.$$
Suppose $x>\tilde f_\beta$, so that for all $\sigma$, $\beta\sigma-x<I_\mu(\sigma)$, and hence $\beta\sigma-x<\Lambda^*(\sigma)$ on the set $\{|\sigma|\le a^*\}=\{\sigma:\Lambda^*(\sigma)\le R\log2\}$. Since in general
$$I^+(x)=\inf_{\sigma\in D_x}\big(\Lambda^*(\sigma)-R\log2\big)^+,\qquad\text{where}\quad D_x=\big\{\sigma\in\mathbb R:\beta\sigma-x\ge\Lambda^*(\sigma)\wedge R\log2\big\},$$
$D_x$ reduces to the half-line
$$\big\{\sigma\in\mathbb R:\beta\sigma-x\ge R\log2\big\}=\Big[\frac{x+R\log2}\beta,\infty\Big).$$
By the monotonicity of $\Lambda^*$, we get the thesis. □

4.4 LDP for $\frac1n\log\tilde Z_n^\omega$, negative fluctuations
The aim of this section is to apply Theorem B to
$$\frac1n\log\tilde Z_n^\omega=\frac1n\log\int_{\mathbb R}e^{n\beta\sigma}\mu_n^\omega(d\sigma)$$
and to identify $I^-$ as
$$I^-(x)=\begin{cases}\infty&\text{if }x<\tilde f_\beta\\ 0&\text{if }x\ge\tilde f_\beta.\end{cases}$$
To this purpose, we use Lemmas 18 and 19. In Lemma 19, we take $b=\tilde f_\beta$. In Lemma 18, we have to prove that $f$, defined by (20), is equal to $\tilde f_\beta$. As a simple consequence of the results of the previous section we have the following corollary.

Corollary 32. The number $f$ defined by (20), namely the value at $\beta$ of the F–L transform of $I_\mu$, is equal to $\tilde f_\beta$.

For every $j\in\mathbb N$, let $K_j=[-j,j]$ as mentioned earlier and let $\mathcal B_j$ be the family
$$\Big\{\big[-j,-j+\tfrac1j\big],\ \big[-j+\tfrac1j,-j+\tfrac2j\big],\ \dots,\ \big[-\tfrac1j,0\big],\ \big[0,\tfrac1j\big],\ \big[\tfrac1j,\tfrac2j\big],\ \dots,\ \big[j-\tfrac1j,j\big]\Big\}.$$
Let us verify (at the same time) the assumptions of Lemmas 18 and 19 with $b=\tilde f_\beta$. The assumptions (4) and (5) of Theorem A have already been verified. For every $j\in\mathbb N$ and $B\in\mathcal B_j$, let $U_B\in\mathcal A$, $U_B\supset B$, be any open interval having distance from $B$ at most $1/(2j)$. We have
$$\phi_B^+-\phi_{U_B}^-\le\beta\,\frac2j,$$
hence (technical condition (i))
$$\lim_{j\to\infty}\max_{B\in\mathcal B_j}\big(\phi_B^+-\phi_{U_B}^-\big)=0.$$
As to condition (ii), given $\varepsilon>0$, for every $\sigma\in\Sigma$ the quantity $\inf_{y\ge-I_\mu(\sigma)+\varepsilon}J(\sigma,y)$ equals $\infty$ for $|\sigma|\le a^*$ and $\Lambda^*(\sigma)-R\log2$ for $|\sigma|>a^*$, so
$$\inf_{\sigma\in U_B,\ y\ge-I_\mu(\sigma)+\varepsilon}J(\sigma,y)\ge\inf_{\sigma\in\Sigma,\ y\ge-I_\mu(\sigma)+\varepsilon}J(\sigma,y)>0.$$
Therefore, as soon as we have identified $I^-(x)$ for every $x<\tilde f_\beta$, the choice $I^-(x)=0$ for every $x\ge\tilde f_\beta$ is correct.

For the complementary region, $x<\tilde f_\beta$, we apply Lemma 19 with $b=\tilde f_\beta$. To this end, given $x<\tilde f_\beta$, it is sufficient to find a sequence $B_j\in\mathcal B_j$ such that
$$\lim_{j\to\infty}\limsup_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(B_j)<(x-\phi)_{B_j}^+\Big)=-\infty.$$
Good sets $B_j$ are expected to be those around the values of $\sigma$ which realize $\sup V$, where
$$V(\sigma)=\phi(\sigma)-I_\mu(\sigma)=\beta\sigma-I_\mu(\sigma).$$
Denote by $\sigma^*$ the value which maximizes $V(\sigma)$. By Theorem 22 and Lemma 30, $V(\sigma)\le\tilde f_\beta$, with equality holding only for
$$\sigma=\sigma^*=\frac{d}{d\beta}\tilde f_\beta.$$
Note that $\sigma^*\in[0,a^*]$ and that $\sigma^*=a^*$ for $\beta\ge\beta^*$. We take as $B_j$ the interval of $\mathcal B_j$ (or the left one of the two such intervals) which contains $\sigma^*$. In particular, with this choice, if $\sigma^*=a^*$ then $a_j:=\inf B_j<a^*$. Take $x<\tilde f_\beta$, so of the form
$$x=\tilde f_\beta-\varepsilon$$
for a suitable $\varepsilon>0$. Then, with $\delta_j\le\frac1j$,
$$(x-\phi)_{B_j}^+=x-\beta a_j=x-\beta\sigma^*+\beta\delta_j=\tilde f_\beta-\beta\sigma^*-\varepsilon+\beta\delta_j=-I_\mu(\sigma^*)-\varepsilon+\beta\delta_j.$$
It is now sufficient to prove
$$\lim_{n\to\infty}\frac1n\log P\Big(\frac1n\log\mu_n(B_j)<-I_\mu(\sigma^*)-\varepsilon+\beta\delta_j\Big)=-\infty$$
for all $j$ sufficiently large. Take the range of $j$ such that $-\varepsilon+\beta\delta_j\le-\varepsilon/2$. Write $B_j=[a_j,b_j]$. Then
$$P\Big(\frac1n\log\mu_n(B_j)<-I_\mu(\sigma^*)-\varepsilon+\beta\delta_j\Big)\le P\Big(\frac1n\log\mu_n(B_j)<-I_\mu(\sigma^*)-\frac\varepsilon2\Big)$$
$$=P\Big(\sum_{s\in S_n}1_{\{X_{s,n}\in[a_j,b_j]\}}<\operatorname{card}(S_n)\,e^{-n(I_\mu(\sigma^*)+\frac\varepsilon2)}\Big)=P(B_{N_n,p_n}<x_n)$$
where $B_{N_n,p_n}$ is a binomial of parameters $N_n=\operatorname{card}(S_n)$ and $p_n=P(X_n\in[a_j,b_j])$, and $x_n=N_ne^{nv}$ with
$$v=-\Big(I_\mu(\sigma^*)+\frac\varepsilon2\Big).$$
We have
$$\lim_{n\to\infty}\frac1n\log N_n=R\log2,\qquad \lim_{n\to\infty}\frac1n\log p_n=-\Lambda^*(a_j)$$
(recall from the earlier discussion that $a_j<a^*$), and
$$\lim_{n\to\infty}\frac1n\log x_n=v+R\log2.$$
Therefore, we use the results of the appendix with
$$\alpha=R\log2,\qquad \beta=\Lambda^*(a_j),\qquad \gamma=v+R\log2.$$
Since $a_j<\sigma^*\le a^*$ and $\Lambda^*$ is increasing on the positive half-line,
$$\Lambda^*(a_j)<\Lambda^*(\sigma^*)\le\Lambda^*(a^*)\le R\log2,$$
so that $\alpha>\beta$. The same argument shows that $\gamma<\alpha-\beta$, since this is equivalent to
$$\Lambda^*(a_j)<I_\mu(\sigma^*)+\frac\varepsilon2.$$
Equations (31) and (32) of the appendix complete the proof as long as $\gamma\ne0$, i.e. when $I_\mu(\sigma^*)+\varepsilon/2\ne R\log2$. If this is not the case, one can simply use $\varepsilon/3$ instead of $\varepsilon/2$ and get the desired result.
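In the Gaussian illustration the quantities used in this section are explicit ($a^*=\sqrt{2R\log2}$, $\sigma^*=\min(\beta,a^*)$, and the free energy of Theorem 4 is $f_\beta=(\Lambda(\beta)+R\log2)/\beta$ before freezing and $a^*$ after); a sketch, under these illustrative choices of ours, checking continuity at $\beta^*$ and the claim $f_\beta\ge a^*$ of Remark 3:

```python
import math

# Gaussian REM freezing transition (parameter choices are ours)
R = 1.0
RLOG2 = R * math.log(2.0)
a_star = math.sqrt(2.0 * RLOG2)
beta_star = a_star

def f(beta):
    # free energy of Theorem 4: (Lambda(beta) + R log 2)/beta, then frozen at a*
    return (beta * beta / 2.0 + RLOG2) / beta if beta < beta_star else a_star

# continuity at the critical temperature
assert abs(f(beta_star - 1e-9) - f(beta_star)) < 1e-6

# f_beta >= a* for every beta > 0 (Remark 3, proved via Young's inequality)
for k in range(1, 100):
    assert f(k / 10.0) >= a_star - 1e-12

# sigma* = argmax_sigma (beta*sigma - I_mu(sigma)) = min(beta, a*)
sigma_star = lambda beta: min(beta, a_star)
assert sigma_star(0.5) == 0.5 and sigma_star(5.0) == a_star
```

By AM–GM, $(\beta/2)+(R\log2)/\beta$ is minimized exactly at $\beta=\beta^*$ with value $a^*$, which is what the loop verifies pointwise.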
Appendix A

In this appendix, given a positive integer $N$ and a number $p\in[0,1]$, we denote by $B_{N,p}$ a binomial random variable with parameters $N$ and $p$. Given three sequences $N_n\to\infty$ (of positive integers), $p_n\to0$ (of numbers in $[0,1]$) and $x_n\ge0$, we are interested in the exponential asymptotic behavior of $P(B_{N_n,p_n}>x_n)$ and $P(B_{N_n,p_n}<x_n)$. We assume that the following limits exist:
$$\alpha=\lim_{n\to\infty}\frac1n\log N_n,\qquad \beta=-\lim_{n\to\infty}\frac1n\log p_n,\qquad \gamma=\lim_{n\to\infty}\frac1n\log x_n.$$
We write $a_n\sim b_n$ if $a_n/b_n\to1$ as $n\to\infty$. Independently of $\beta$, we have
$$\gamma>\alpha\ \Rightarrow\ \lim_{n\to\infty}\frac1n\log P(B_{N_n,p_n}>x_n)=-\infty \qquad(22)$$
(indeed $x_n>N_n$ for $n$ large, so the probability eventually vanishes).
Moreover, we have the following lemma.

Lemma 33.
(i) If $\gamma<0$ and $\alpha<\beta$, then
$$\lim_{n\to\infty}\frac1n\log P(B_{N_n,p_n}>x_n)=\alpha-\beta. \qquad(23)$$
(ii) If $\gamma<0$ and $\alpha>\beta$, then
$$\lim_{n\to\infty}\frac1n\log P(B_{N_n,p_n}>x_n)=0. \qquad(24)$$

Proof. For $\gamma<0$ and $n$ so large that $x_n<1$, we have
$$P(B_{N_n,p_n}>x_n)=1-(1-p_n)^{N_n}=1-e^{N_n\log(1-p_n)}=1-e^{-N_n(p_n+o(p_n))}=1-e^{-N_np_n+o(N_np_n)}.$$
This easily implies the result: if $\alpha<\beta$ then $N_np_n\to0$ and the right-hand side equals $N_np_n(1+o(1))=e^{n(\alpha-\beta)+o(n)}$, while if $\alpha>\beta$ then $N_np_n\to+\infty$ and the right-hand side tends to $1$. □

Lemma 34. If $(\alpha-\beta)^+<\gamma<\alpha$, then
$$\frac1n\log P(B_{N_n,p_n}>x_n)\sim(\alpha-\beta-\gamma)e^{n\gamma}\to-\infty. \qquad(25)$$
Proof. We give this proof in all details, although it is elementary.

Step 1. We analyze
$$P(B_{N_n,p_n}=k_n)=\binom{N_n}{k_n}p_n^{k_n}(1-p_n)^{N_n-k_n},$$
where $k_n=\lceil x_n\rceil$ denotes the smallest integer strictly greater than $x_n$. Let us denote by $\varepsilon_n$ any sequence such that $\varepsilon_n\to0$ as $n\to\infty$, and apply to them Landau rules. Recall Stirling's formula
$$N!=\sqrt{2\pi N}\,\Big(\frac Ne\Big)^N(1+\varepsilon_N).$$
Since $k_n\to\infty$ and $N_n-k_n\to\infty$, we have
$$P(B_{N_n,p_n}=k_n)=\sqrt{\frac{N_n}{2\pi k_n(N_n-k_n)}}\,\Big(\frac{N_np_n}{k_n}\Big)^{k_n}\Big(\frac{N_n(1-p_n)}{N_n-k_n}\Big)^{N_n-k_n}(1+\varepsilon_n).$$
Moreover,
$$\lim_{n\to\infty}\frac1n\log\sqrt{\frac{N_n}{2\pi k_n(N_n-k_n)}}=-\frac\gamma2, \qquad(26)$$
$$\log\Big(\frac{N_n(1-p_n)}{N_n-k_n}\Big)^{N_n-k_n}=(N_n-k_n)\Big(\frac{N_n(1-p_n)}{N_n-k_n}-1\Big)\,\frac{\log\big(1+\frac{N_n(1-p_n)}{N_n-k_n}-1\big)}{\frac{N_n(1-p_n)}{N_n-k_n}-1}=(-N_np_n+k_n)(1+\varepsilon_n), \qquad(27)$$
hence, noting that $N_np_n=o(k_n)$,
$$\frac1n\log\Big(\frac{N_n(1-p_n)}{N_n-k_n}\Big)^{N_n-k_n}=\frac1nk_n(1+\varepsilon_n),$$
and finally
$$\frac1n\log\Big(\frac{N_np_n}{k_n}\Big)^{k_n}=\frac{k_n}n\log\frac{N_np_n}{k_n}=k_n(\alpha-\beta-\gamma+\varepsilon_n). \qquad(28)$$
Therefore,
$$\frac1n\log P(B_{N_n,p_n}=k_n)=-\frac\gamma2+\varepsilon_n+k_n(\alpha-\beta-\gamma+\varepsilon_n),$$
so
$$\frac1n\log P(B_{N_n,p_n}=k_n)\sim(\alpha-\beta-\gamma)e^{n\gamma}\to-\infty.$$

Step 2. We now show that $P(B_{N_n,p_n}=k_n)$ gives the correct asymptotics of $P(B_{N_n,p_n}>x_n)$. Given a positive integer $k$, let $C_{k,n}=\frac{N_n-k}{k+1}\cdot\frac{p_n}{1-p_n}$, so that
$$P(B_{N_n,p_n}=k+1)=C_{k,n}\,P(B_{N_n,p_n}=k),$$
and notice that $C_{k,n}$ is decreasing in $k$. Hence
$$k\ge k_n\ \Rightarrow\ P(B_{N_n,p_n}=k)\le(C_{k_n,n})^{k-k_n}P(B_{N_n,p_n}=k_n),$$
$$P(B_{N_n,p_n}>x_n)\le P(B_{N_n,p_n}=k_n)\sum_{k=k_n}^{N_n}(C_{k_n,n})^{k-k_n}\le P(B_{N_n,p_n}=k_n)\,\frac1{1-C_{k_n,n}}.$$
On the other hand, of course, $P(B_{N_n,p_n}>x_n)\ge P(B_{N_n,p_n}=k_n)$. The final result follows from the fact that
$$C_{k_n,n}=\frac{N_n-k_n}{k_n+1}\cdot\frac{p_n}{1-p_n}\to0.$$
The proof is complete. □

When $\alpha>\beta$ and $0<\gamma<\alpha-\beta$, both $E[B_{N_n,p_n}]=N_np_n$ and $x_n$ diverge to $+\infty$, but $x_n$ is much smaller than $E[B_{N_n,p_n}]$, so that for large enough $n$, $P(B_{N_n,p_n}>x_n)\ge\frac12$. This proves the following lemma.

Lemma 35. If $\alpha>\beta$ and $0<\gamma<\alpha-\beta$, then
$$\lim_{n\to\infty}\frac1n\log P(B_{N_n,p_n}>x_n)=0. \qquad(29)$$
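The elementary regimes of Lemma 33 are easy to probe numerically, since for $x_n<1$ the tail probability is exactly $1-(1-p_n)^{N_n}$; a small sketch (the parameter values are ours):

```python
import math

def log_tail_ge1(N, p):
    # log P(B_{N,p} >= 1) = log(1 - (1-p)^N), evaluated stably with
    # log1p/expm1 to survive the tiny p and huge N of these regimes
    return math.log(-math.expm1(N * math.log1p(-p)))

n = 50

# Lemma 33(i): gamma < 0, alpha < beta  ->  rate alpha - beta
alpha, beta = 0.5, 0.8
N, p = int(round(math.exp(alpha * n))), math.exp(-beta * n)
assert abs(log_tail_ge1(N, p) / n - (alpha - beta)) < 1e-3

# Lemma 33(ii): gamma < 0, alpha > beta  ->  rate 0
alpha, beta = 0.8, 0.5
N, p = int(round(math.exp(alpha * n))), math.exp(-beta * n)
assert abs(log_tail_ge1(N, p) / n) < 1e-3
```

The regime of Lemma 34 is genuinely superexponential and not visible at such moderate $n$, which is consistent with the factor $e^{n\gamma}$ in (25).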
Finally, arguing as previously, we have (notice that $CN_n$ is much larger than $E[B_{N_n,p_n}]$) the following lemma.

Lemma 36. For every $C\in(0,1)$,
$$\lim_{n\to\infty}P(B_{N_n,p_n}>CN_n)=0. \qquad(30)$$

Now we turn our attention to $P(B_{N_n,p_n}<x_n)$. Note that
$$\lim_{n\to\infty}\frac1n\log P(B_{N_n,p_n}>x_n)<0\ \Rightarrow\ \lim_{n\to\infty}\frac1n\log P(B_{N_n,p_n}<x_n)=0.$$
There are only two cases left. In both it turns out that the rate is $-\infty$.

Lemma 37. If $\gamma<0$ and $\alpha>\beta$, then
$$\lim_{n\to\infty}\frac1n\log P(B_{N_n,p_n}<x_n)=-\infty. \qquad(31)$$

Proof. If $n$ is so large that $x_n<1$, we have
$$\frac1n\log P(B_{N_n,p_n}<x_n)=\frac1n\log(1-p_n)^{N_n}=\frac{N_n}n\log(1-p_n)\le-\frac{N_np_n}n.$$
This easily implies the result, since $N_np_n/n\to\infty$ when $\alpha>\beta$. □

Lemma 38. If $\alpha>\beta$ and $0<\gamma<\alpha-\beta$, then
$$\frac1n\log P(B_{N_n,p_n}<x_n)\sim-\frac1nN_np_n\to-\infty. \qquad(32)$$

Proof. The proof is completely analogous to that of (25).

Step 1. We analyze
$$P(B_{N_n,p_n}=k_n)=\binom{N_n}{k_n}p_n^{k_n}(1-p_n)^{N_n-k_n},$$
where $k_n=\lfloor x_n\rfloor$ now denotes the greatest integer strictly smaller than $x_n$. Equations (26)–(28) are still valid, but here $k_n=o(N_np_n)$, so that
$$\frac1n\log\Big(\frac{N_n(1-p_n)}{N_n-k_n}\Big)^{N_n-k_n}=-\frac1nN_np_n(1+\varepsilon_n),$$
and therefore
$$\frac1n\log P(B_{N_n,p_n}=k_n)=-\frac\gamma2+\varepsilon_n-\frac1nN_np_n\Big(1+\varepsilon_n-\frac{nk_n}{N_np_n}(\alpha-\beta-\gamma+\varepsilon_n)\Big),$$
so
$$\frac1n\log P(B_{N_n,p_n}=k_n)\sim-\frac1nN_np_n\to-\infty.$$

Step 2. Given a positive integer $k$, let $C_{k,n}=\frac{k}{N_n-k+1}\cdot\frac{1-p_n}{p_n}$, so that $P(B_{N_n,p_n}=k-1)=C_{k,n}P(B_{N_n,p_n}=k)$; then $C_{k,n}$ is increasing in $k$. Hence
$$k\le k_n\ \Rightarrow\ P(B_{N_n,p_n}=k)\le(C_{k_n,n})^{k_n-k}P(B_{N_n,p_n}=k_n),$$
$$P(B_{N_n,p_n}<x_n)\le P(B_{N_n,p_n}=k_n)\sum_{k=0}^{k_n}(C_{k_n,n})^{k_n-k}\le P(B_{N_n,p_n}=k_n)\,\frac1{1-C_{k_n,n}}.$$
Again $P(B_{N_n,p_n}<x_n)\ge P(B_{N_n,p_n}=k_n)$, so the thesis follows from
$$C_{k_n,n}=\frac{k_n}{N_n-k_n+1}\cdot\frac{1-p_n}{p_n}\to0.$$
□
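Equation (31) admits a direct numerical check: in its regime $x_n<1$, so $P(B_{N_n,p_n}<x_n)=(1-p_n)^{N_n}$ and the normalized logarithm is $\frac{N_n}n\log(1-p_n)\approx-\frac{N_np_n}n\to-\infty$. A sketch (parameter values are ours):

```python
import math

def log_P_zero(N, p):
    # log P(B_{N,p} = 0) = N log(1 - p), via log1p for accuracy at small p
    return N * math.log1p(-p)

alpha, beta = 0.8, 0.5          # regime of Lemma 37: gamma < 0, alpha > beta
rates = []
for n in (10, 20, 30):
    N = int(round(math.exp(alpha * n)))
    p = math.exp(-beta * n)
    rate = log_P_zero(N, p) / n
    # agrees with the prediction -N*p/n used in the proof
    assert abs(rate / (-N * p / n) - 1.0) < 1e-2
    rates.append(rate)

assert rates[0] > rates[1] > rates[2]   # diverging to -infinity
```

This superexponential decay of negative deviations is exactly the phenomenon announced for the REM in the abstract.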
Acknowledgements The author M. Fedrigo is supported by the HASSIP network, a European RTN under the contract HPRN-CT-2002-00285.
References

1. Bovier, A., Kurkova, I.: Rigorous results on some simple spin glass models. Markov Proc. Relat. Fields 9, 209–242 (2003)
2. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Springer-Verlag, New York (1998)
3. Derrida, B.: Random-energy model: an exactly solvable model of disordered systems. Phys. Rev. B 24(5), 2613–2626 (1981)
4. Deuschel, J.-D., Stroock, D.W.: Large Deviations. Academic Press, Boston, MA (1989)
5. Dorlas, T.C., Wedagedera, J.R.: Large deviations and the random energy model. Int. J. Modern Phys. B 15(1), 1–15 (2001)
6. Ellis, R.S.: Entropy, Large Deviations, and Statistical Mechanics. Springer-Verlag, New York (1985)
7. Fedrigo, M., Flandoli, F., Morandin, F.: LDP for free energy densities and a phase transition in Shannon capacity. Manuscript in preparation
8. Montanari, A.: The glassy phase of Gallager codes. Eur. Phys. J. B 23, 121–136 (2001)
9. Montanari, A., Sourlas, N.: The statistical mechanics of turbo codes. Eur. Phys. J. B 18, 107–119 (2000)
10. Sourlas, N.: Spin-glass models as error-correcting codes. Nature 339, 693–695 (1989)
11. Sourlas, N.: Statistical mechanics and capacity-approaching error-correcting codes. cond-mat/0106570
12. Talagrand, M.: A first course on spin glasses. In: Lectures on Probability Theory and Statistics, Saint-Flour 2000. Lecture Notes in Mathematics, vol. 1816, pp. 181–285. Springer (2003)
13. Talagrand, M.: Spin Glasses, a Challenge to Mathematicians. Springer, New York (2003)