MATHEMATICAL METHODS IN PATTERN RECOGNITION
The Theory of Entropy Potentials: Basic Concepts, Results, and Applications

V. L. Lazarev
St. Petersburg State University of Low Temperature and Food Technologies, St. Petersburg, Russia
email: [email protected]

Abstract—The basic concepts of the theory of entropy potentials, an instrument for analyzing objects and systems of different natures, are presented. Ways of applying the theory to the control and inspection of systems, as well as to constructing procedures for processing observations, are examined.

Keywords: abeyance, artificial intelligence, entropy potential, complex entropy potential, multidimensional complex entropy potential.

DOI: 10.1134/S1054661811040122
1. INTRODUCTION

In order to develop and introduce smart technologies for information processing, situation estimation, and decision making, it is necessary to develop and refine approaches and methods that make it possible to properly describe an object's abeyance. This is one of the most topical problems in the world today and is stimulating intensive development of applicable methods and theories [1–6].

Below, we present a promising approach to investigating the states of different systems and objects by using the theory of entropy potentials (TEP). TEP is a development of the entropy approach to describing the abeyance of systems. C. Shannon [7] introduced the concept of information (probabilistic) entropy for characterizing a coded signal and message comprehension. For a parameter x that takes discrete values Xi (i ∈ I), the entropy H(x) is determined as follows:

H(x) = −Σi P(Xi) log_a P(Xi),   (1)
where P(Xi) is the probability of the value Xi and i is the index over the set of values of parameter x. For continuous values,

H(x) = −∫_{−∞}^{+∞} p(x) log_a p(x) dx,   (2)
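As a small numerical illustration of Eq. (1) and of the choice of logarithmic base a, the following sketch (plain Python, with a hypothetical two-value distribution) computes the same entropy in bits, nits, and dits:

```python
import math

def discrete_entropy(probs, base=math.e):
    """Shannon entropy of a discrete distribution, Eq. (1)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair two-position system: H = 1 bit = ln 2 nit = log10(2) dit.
probs = [0.5, 0.5]
h_bits = discrete_entropy(probs, base=2)   # a = 2
h_nits = discrete_entropy(probs)           # a = e
h_dits = discrete_entropy(probs, base=10)  # a = 10

# The conversion coefficient between units is the logarithm of one
# base taken in the other, e.g. H[nit] = H[bit] * ln 2.
print(h_bits, h_nits, h_dits)
```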
where p(x) is the probability density function of the value x. The logarithmic base in Eqs. (1) and (2) can be arbitrary. Usually, a = 2 (the unit of entropy is the bit), which is convenient for analyzing two-position systems, or a = e (the unit is the nit). In order to analyze the state of systems that operate in decimal code, it is convenient to use a = 10 (the unit is the dit). If necessary, it is possible to pass from one unit of entropy to another by using a conversion coefficient calculated from the respective logarithmic bases. Hereinafter, we use the natural logarithm ln unless otherwise noted.

Received March 17, 2011

From the definition it follows that the value of the entropy depends only on a parameter's distribution law and characterizes the parameter's abeyance in terms of the destabilizing properties of that law. Such a value makes it possible to predict a parameter's manifestation without considering the parameter's spread or the basic values against whose background the respective variations are examined. That is why the approach presented by Shannon is used for solving problems of signal coding and decoding and for estimating the operating efficiency of sources, receivers, information circuits, etc.; i.e., problems whose solution is obtained by predicting the occurrence of an event (for example, the appearance of a signal) when monitoring technical, social, and other systems, for which the mentioned approach is acceptable [8, 9].

The classical concept of entropy is not widely used in practice for the following reasons.

The first reason is as follows: in order to calculate the entropy, it is necessary to know the parameter's distribution law p(x), or the probability distribution P(Xi) (i ∈ I) if the parameter takes discrete values Xi. If the law is not known a priori, the required information can be obtained experimentally by processing a representative sample from the general population. In order to obtain such a sample, it is necessary to perform many measurements. The procedure for measuring several parameters, for example, parameters that characterize substance properties, can be expensive, since for this purpose it is necessary to
ISSN 1054-6618, Pattern Recognition and Image Analysis, 2011, Vol. 21, No. 4, pp. 637–648. © Pleiades Publishing, Ltd., 2011.
use complicated, unique equipment, and the measurements should be performed by qualified staff. Thus, economic factors limit the use of entropy for such investigations.
The second reason is that the entropy value contains no components that characterize the spread of the parameters or the basic values against whose background the abeyance is examined. For example, if we examine the entropy of a two-position trigger, which can take only two values (0 and U) with equal probabilities P(0) = P(U) = 0.5, according to Eq. (1) we have H = 1 (bit). Such a characteristic of abeyance is invariant to the parameter's range of variation, to its active value, etc., which decreases its information capacity.

It is possible to overcome these disadvantages by introducing the concept of entropy potentials and developing the respective theory.

2. THE ESSENCE AND THE BASIC CONCEPTS OF THE ENTROPY POTENTIAL THEORY

The basic idea of the entropy potential theory is as follows: to pass from the entropy value to another value connected with the entropy. According to the law of parameter distribution, this value should be expressed (directly or indirectly) via the set of described and other characteristics of abeyance, and it should be determinable when the initial data are limited. It is evident that among these characteristics are the values describing the parameter's spread and the real range of its variation. The problem is to integrate entropy into this ensemble of characteristics for describing parameter abeyance and thereby to extend the field of its use. We suggest solving this problem by introducing the concept of a parameter's entropy potential (EP), with the following definition.

Definition 1. The entropy potential Δe of a parameter x is half of the range of variation of a limited distribution that has the same entropy H(x) as the distribution law of this parameter.

From the definition it follows that, as a base for determining the entropy potential, we should choose a distribution with a limited range of variation equal to 2Δe. In this case, the respective probability density function depends on Δe, i.e., p(x) = p(x, Δe). (As p(x), it is reasonable to use a function that is symmetric with respect to the center of the range [−Δe, Δe].) According to Eq. (2), the entropy of the basic distribution also depends on Δe. If we equate the entropy of the examined parameter with an arbitrary distribution law to the entropy H(x, Δe) of a basic distribution with a limited range of variation, we get

H(x) = H(x, Δe).   (3)

If we solve Eq. (3) with respect to Δe, we obtain the following expression for determining the entropy potential:

Δe = F{H(x)}.   (4)

It is evident that the value Δe depends on the chosen basic distribution law with a limited range of parameter variation. As such a law, it is possible to use the laws presented in Table 1 (for convenience, in Table 1 the boundaries of the variation range are designated as −a and a). Let us examine different variants of implementing the presented approach.

1. Let us determine an expression for the entropy potential Δe1 based on the law of uniform density. If we equate a parameter's entropy with an arbitrary distribution law, H(x), to the entropy of a parameter distributed uniformly over the range x ∈ [−Δe1, Δe1], we get

H(x) = −∫_{−∞}^{+∞} p(x) ln p(x) dx = −∫_{−Δe1}^{+Δe1} (1/(2Δe1)) ln(1/(2Δe1)) dx = ln 2Δe1.   (5)

From here, we find the unknown expression for Δe1 in the form of Eq. (4):

Δe1 = (1/2) e^{H(x)}.   (6)

2. By analogy, let us determine an expression for the entropy potential Δe2 based on the triangular distribution law (Simpson distribution):

H(x) = −∫_{−∞}^{+∞} p(x) ln p(x) dx = −2∫_{0}^{Δe2} ((Δe2 − x)/Δe2²) ln((Δe2 − x)/Δe2²) dx = ln Δe2 + 1/2 = ln(Δe2 √e).   (7)

From here, we determine the unknown expression for Δe2:

Δe2 = (1/√e) e^{H(x)}.   (8)

3. By analogy, let us determine an expression for the entropy potential Δe3 based on the arcsine distribution law:

H(x) = ln(πΔe3/2).   (9)
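The three closed forms (6), (8), and (10) can be checked numerically: feeding each formula the known differential entropy of its own basic law with half-range a must recover a. A minimal sketch (the entropies of the uniform, Simpson, and arcsine laws are the closed forms used in the derivations above):

```python
import math

def ep_uniform(H):   # Eq. (6), uniform base
    return 0.5 * math.exp(H)

def ep_simpson(H):   # Eq. (8), triangular (Simpson) base
    return math.exp(H) / math.sqrt(math.e)

def ep_arcsine(H):   # Eq. (10), arcsine base
    return 2.0 * math.exp(H) / math.pi

# Differential entropies of the basic laws on [-a, a]:
a = 1.7
H_uni = math.log(2 * a)            # uniform density, Eq. (5)
H_tri = math.log(a) + 0.5          # Simpson law, Eq. (7)
H_arc = math.log(math.pi * a / 2)  # arcsine law, Eq. (9)

# Each formula must recover the half-range a of its own basic law.
print(ep_uniform(H_uni), ep_simpson(H_tri), ep_arcsine(H_arc))
```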
Table 1. Properties of typical laws of probability distribution (the probability density curves shown in the original table are omitted here)

1. Discrete two-valued distribution: p(x) = 0.5 at x = ±a; 0 otherwise.
   Entropy potential Δe = σ; entropy coefficient Ke = 1.00.
2. Arcsine distribution law: p(x) = 1/(πa √(1 − (x/a)²)), |x| ≤ a.
   Δe = (π/(2√2)) σ; Ke = π/(2√2) ≈ 1.11.
3. Law of uniform density: p(x) = 1/(2a) at |x| ≤ a; 0 at |x| > a.
   Δe = √3 σ; Ke = √3 ≈ 1.73.
4. Triangular law (Simpson law): p(x) = (a − |x|)/a² at |x| ≤ a; 0 at |x| > a.
   Δe = √(3e/2) σ; Ke = √(3e/2) ≈ 2.02.
5. Normal distribution law: p(x) = (1/(σ√(2π))) e^{−x²/(2σ²)}.
   Δe = √(πe/2) σ; Ke = √(πe/2) ≈ 2.07.
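The entropy coefficients of Table 1 can be reproduced directly from the definition Ke = Δe/σ with Δe = e^{H(x)}/2. The sketch below does this by crude numerical integration of the differential entropy for the uniform and Simpson laws (for the arcsine law, whose density is singular at ±a, the closed-form entropy of Eq. (9) is used instead); it is an illustrative check, not a method from the text:

```python
import math

def diff_entropy(p, lo, hi, n=100000):
    """Midpoint-rule estimate of the differential entropy -∫ p ln p dx."""
    h = (hi - lo) / n
    s = 0.0
    for i in range(n):
        px = p(lo + (i + 0.5) * h)
        if px > 0:
            s -= px * math.log(px) * h
    return s

def entropy_coeff(H, sigma):
    # Ke = Δe / σ with Δe = e^{H(x)} / 2 (uniform base, Eq. (6))
    return math.exp(H) / (2 * sigma)

a = 1.0
uniform = lambda x: 1 / (2 * a)
simpson = lambda x: (a - abs(x)) / a ** 2

ke_uni = entropy_coeff(diff_entropy(uniform, -a, a), a / math.sqrt(3))
ke_tri = entropy_coeff(diff_entropy(simpson, -a, a), a / math.sqrt(6))
ke_arc = entropy_coeff(math.log(math.pi * a / 2), a / math.sqrt(2))
print(ke_uni, ke_tri, ke_arc)  # ≈ 1.73, ≈ 2.02, ≈ 1.11
```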
From here, we determine the unknown expression for Δe3:

Δe3 = (2/π) e^{H(x)}.   (10)

Thus, by analogy, it is possible to obtain expressions for entropy potentials based on other distribution laws with a limited range of parameter variation for a parameter with an arbitrary distribution law (4).

By using the introduced concept of the entropy potential, it is possible to "unify" a parameter's abeyance on the basis of a specific distribution law. By using the transition coefficient Ki,j, it is possible to pass from one basic entropy potential Δei to another one Δej:

Ki,j = Δej/Δei   (i ∈ I; j ∈ J),   (11)

where i and j are the indices of the elements of the entropy potential set. For example, Δe2 = Δe1 K1,2, where K1,2 = Δe2/Δe1 = 2/√e ≈ 1.21. The inverse transition coefficient from Δej
Table 2. Transition coefficients for three basic entropy potentials (rows: initial entropy potential Δei; columns: final entropy potential Δej)

         Δe1                    Δe2                     Δe3
Δe1      K1,1 = 1               K1,2 = 2/√e ≈ 1.21      K1,3 = 4/π ≈ 1.27
Δe2      K2,1 = √e/2 ≈ 0.824    K2,2 = 1                K2,3 = 2√e/π ≈ 1.05
Δe3      K3,1 = π/4 ≈ 0.785     K3,2 = π/(2√e) ≈ 0.953  K3,3 = 1
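Because every basic entropy potential has the form (prefactor) × e^{H(x)}, the transition coefficients of Eq. (11) are simply ratios of the prefactors from Eqs. (6), (8), and (10), and the e^{H(x)} factor cancels. A small sketch reproducing Table 2:

```python
import math

# Prefactors of Δe = (prefactor) * e^{H(x)} for the three bases:
prefactor = {1: 0.5,                     # uniform, Eq. (6)
             2: 1 / math.sqrt(math.e),   # Simpson, Eq. (8)
             3: 2 / math.pi}             # arcsine, Eq. (10)

def transition(i, j):
    # K_{i,j} = Δej / Δei, Eq. (11); the entropy factor cancels
    return prefactor[j] / prefactor[i]

K = [[transition(i, j) for j in (1, 2, 3)] for i in (1, 2, 3)]
for row in K:
    print(["%.3f" % v for v in row])
# The principal diagonal is 1, and K_{j,i} = 1 / K_{i,j}.
```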
to Δei is Kj,i = 1/Ki,j; i.e., K2,1 = √e/2 ≈ 0.824. For n entropy potentials, it is possible to calculate the transition coefficients in advance and tabulate them. In this case, the table is a square transition matrix K = ||Ki,j|| (i ∈ I; j ∈ J) of dimension n whose principal diagonal elements are equal to unity. Table 2 presents the transition coefficients for the three entropy potentials determined above: Δe1, Δe2, and Δe3.

In order to investigate and compare the obtained results, it is necessary to specify the basic distribution law and the value of the entropy potential. The basic distribution law should be chosen by considering the problem's specifics, the physical phenomena that define the formation of its parameters, etc. Hereinafter, we assume that the entropy potential is determined on the basis of uniform density (i = 1) according to Eq. (6) and designate it as Δe = Δe1, if there are no special notes. Such a variant of determining the entropy potential was first introduced in [10]. By using Eqs. (3), (4), and (6), it is possible to determine the entropy potential value for any law of parameter distribution. Examples of such calculations based on the law of uniform density, which are widely used in practice, are presented in [11, 12].

If the entropy potential rises, the parameter's abeyance increases, and vice versa. We prove this assertion by means of the following theorem.

Theorem 1. If the entropy potential increases, the abeyance increases (in terms of entropy), and vice versa.

Proof. Expression (6), which describes the relationship between the entropy potential Δe and the entropy H(x), is an exponentially increasing function; i.e.,

∀H(x): H1(x) > H2(x) ⇒ Δe{H1(x)} > Δe{H2(x)},   (12)

which proves the assertion for the value Δe. Let us extend property (12) to the other entropy potentials Δej (j ∈ J) in the following way. From Definition 1, it follows that Δej > 0; and since K1,j = Δej/Δe, K1,j > 0. Thus, all entropy potential values Δej (j ∈ J) are positive scale images of the value Δe. Their dependence on the entropy as an argument is described by an increasing exponential function. From here, we have

∀H(x): H1(x) > H2(x) ⇒ Δej{H1(x)} > Δej{H2(x)}   (j ∈ J),   (13)
which is what we set out to prove.

According to the definition, the entropy potential's dimensionality is that of the examined parameter. Therefore, for each specific distribution law, it can be expressed as a scale image of the mean square deviation (MSD) σ, which has the same dimensionality. The respective scale coefficient is called the entropy coefficient and is designated Ke = Ke1. From here,

Δe = Ke σ.   (14)

Expression (14) makes it possible to present parameter abeyance via its scattering characteristic σ and the coefficient Ke, which characterizes the destabilizing properties of its distribution law. The validity of such a presentation is confirmed by the fact that, for many typical distribution laws whose probability density is described analytically, Eq. (14) can be written in explicit form if Δe is determined via the basic expression (6). The entropy coefficients, as the multiplicands of the value σ, are determined unambiguously from these dependences. Table 1 presents the analytical expressions for the entropy potential and the expressions and numerical values of the respective entropy coefficients for several typical distribution laws. In the cases when it is impossible to obtain an analytical expression for the entropy potential (for example, if it is determined from limited observation samples), the Ke values can be calculated by using Eqs. (1), (2), (6), and (14):

Ke = Δe/σ = e^{H(x)}/(2σ).   (15)
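Equation (15) can be applied directly to data. The following sketch is a deliberately naive illustration (a histogram entropy estimate, not the robust procedures of [13]): it estimates Ke from samples drawn from the uniform and normal laws and recovers values close to the √3 ≈ 1.73 and ≈ 2.07 of Table 1.

```python
import math, random

def entropy_coeff_estimate(sample, bins=30):
    """Naive estimate of Ke = e^{H(x)} / (2σ), Eq. (15), from data.

    H(x) is approximated by a histogram-based differential entropy;
    this is an illustrative sketch only.
    """
    n = len(sample)
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    # histogram estimate of H(x): -Σ (c/n) ln(density in the bin)
    h = -sum((c / n) * math.log(c / (n * width)) for c in counts if c)
    mean = sum(sample) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return math.exp(h) / (2 * sigma)

random.seed(1)
uniform_sample = [random.uniform(-1, 1) for _ in range(100000)]
normal_sample = [random.gauss(0, 1) for _ in range(100000)]
ke_u = entropy_coeff_estimate(uniform_sample)
ke_n = entropy_coeff_estimate(normal_sample)
print(round(ke_u, 2), round(ke_n, 2))
```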
From Eq. (15), it is seen that the value Ke is determined by the parameter's entropy and, therefore, by its distribution law. It is known that, for the same mean square deviation, the normal distribution law is characterized by maximal entropy. Therefore, the maximal possible value Ke ≈ 2.07 corresponds to the normal distribution law. Hereby, the range of variation of the entropy coefficient is 2.07 ≥ Ke > 0. (For real situations, this range is narrower: 2.07 ≥ Ke ≥ 1.) The entropy coefficient describes the multiplicative abeyance component caused by the type of distribution law: the higher Ke, the less predictable a parameter's manifestation, and vice versa. It is necessary to point out that the mapping of the destabilizing properties of a distribution law into an entropy coefficient value is a surjection, since different distribution laws can have the same Ke.

In the present work, we develop approaches and procedures that make it possible to determine the values Δe and Ke for different situations without their direct calculation, by using various indirect estimates and characteristics [11–15]. In [11], a procedure is presented for determining the resulting entropy coefficients for values with different distribution laws. In [13], correlations are found between the value Ke and the ratio between the amplitude of a limited observation sample and the mean square deviation. This dependence is nonlinear and is characterized by a certain spread caused by the surjection of the set of distribution laws onto Ke. Based on the dependence revealed in [13], procedures are presented for a robust estimation of Ke when the sample of observations is limited. In [12, 14, 15], the situations when it is possible to determine Δe and Ke indirectly by using other characteristics are analyzed and systematized. Such characteristics are the coefficients of statistical linearization of nonlinear links, the frequency characteristics of the object, and the characteristics of spectral disturbances.
These developments make it possible to estimate the entropy potentials and entropy coefficients efficiently by using limited initial data.

In the case when the specifics of the examined problem make it reasonable to pass to another basic value of the entropy potential Δej (j ∈ J), it is possible to determine the value of the entropy coefficient Kej by using the transition coefficient K1,j. According to Eqs. (11) and (14), for i = 1, it is possible to write

Δej = K1,j Δe = K1,j Ke σ = Kej σ.   (16)

From here, it follows that Kej = K1,j Ke. By analogy, it is possible to deduce an expression for determining the entropy coefficients when passing to other bases.

The approach to describing the abeyance of parameters by using the entropy potential concept has the following advantage: the abeyance is inspected more completely and objectively, since both the destabilizing property of the distribution law (in the form of the corresponding entropy coefficient) and the parameter's spread (in the form of the mean square deviation) are taken into account.
However, in some cases this is not enough, since it is necessary to consider a certain basic value Xn with respect to which the abeyance is examined. In these cases, we suggest describing a parameter's abeyance by the value of the complex entropy potential (CEP) LΔ, with the following definition.

Definition 2. The complex entropy potential (CEP) is a value determined as follows:

LΔi = Δei/Xn = Kei σ/Xn,   (17)

where Xn is the basic value with respect to which the abeyance is examined and i is the number of the basic distribution law used for determining the corresponding entropy potential Δei (i ∈ I).

As a basic value, it is possible to choose a parameter's mathematical expectation mx or its rated value. However, if a parameter varies near zero, it is possible to use as Xn values from the range of parameter variation, the maximal permissible value, etc. In particular, as Xn, it is possible to choose any basic value of the entropy potential Δeb. If the chosen base is negative (for example, when the abeyance of the heat modes of an object is examined in the area of negative temperatures), we choose as Xn the magnitude of this value. Therefore, if definition (17) is used, the condition Xn > 0 is always true; and, considering that Ke > 0 and σ > 0, we have LΔi > 0 (i ∈ I).

Hereinafter, unless otherwise specified, we assume that the complex entropy potential is determined by using the law of uniform density (i = 1) and designate it as LΔ = LΔ1.

According to the definition, the LΔi (i ∈ I) values are dimensionless. They can be used as similarity criteria for describing parameters' abeyance. (If necessary, it is possible to pass from one basic LΔi to another LΔj by using the respective transition coefficients Ki,j (i ∈ I; j ∈ J).) It is evident that if LΔ increases, the abeyance increases, and vice versa.

If we use the concept of the complex entropy potential, it is possible to describe a parameter's abeyance by an integrated complex of three illustrative informative characteristics (σ, Ke, Xn). If we use these characteristics as phase space coordinates, the parameter's abeyance is represented by the position of a representative point in the 3D Cartesian coordinate system. The properties of this space and the specifics of describing the system's state in it are examined in detail in [12, 16].
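Equation (17) is a one-line computation; the sketch below applies it to a hypothetical temperature parameter with a negative base, replaced by its magnitude as the text prescribes (the numbers are illustrative, not from the paper):

```python
import math

def complex_entropy_potential(sigma, ke, xn):
    """LΔ = Ke·σ / Xn, Eq. (17); a negative base Xn is taken by
    magnitude, so LΔ is always positive and dimensionless."""
    return ke * sigma / abs(xn)

# Hypothetical example: σ = 0.8 K around a base of -20 °C, with a
# normally distributed parameter (Ke ≈ 2.07).
ld = complex_entropy_potential(sigma=0.8, ke=2.07, xn=-20.0)
print(ld)
```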
In particular, the following results are obtained:

1. It is revealed that a system's abeyance is constant during system evolution in this space. A geometrical interpretation of the abeyance and of the transition to another abeyance level is presented.

2. A mathematical model of metrological support for the results of investigations using the entropy potentials is developed.
Below, for continuity of the description, we use fragments from the work [Analysis of Systems Based on Entropy and Information Characteristics, V.L. Lazarev, Technical Physics, Vol. 55, No. 2, pp. 159–165. © Pleiades Publishing, Ltd., 2010], which relate to Theorem 2 and partially to Theorem 3, and also some definitions, in accordance with the author's contract.

The conditions LΔ = Cj = const (j ∈ J), which correspond to different abeyance levels of the systems, divide the initial set of points of the 3D space into classes of nonintersecting subsets of points M(j)(σ, K, X). These points belong to one of the surfaces of constant complex entropy potential: an isotropic surface, whose properties are given in Theorem 2.

Theorem 2. The isotropic surfaces have no points of intersection.

Proof. Let us assume that two isotropic surfaces LΔ(1) = const and LΔ(2) = const (LΔ(1) ≠ LΔ(2)) intersect at some point K. Then LΔ(1) = LΔ(2). We have a contradiction, which is what we needed to prove.

By using the concept of entropy potentials, it is possible to characterize numerically how the system varies over the examined parameter according to its "information track." We prove this concept by means of the following theorem.

Theorem 3. Let LΔ(1), Δe(1) and LΔ(2), Δe(2) be the entropy potentials characterizing two abeyances of the system. Then the amount of information I produced by the system's transition from one abeyance to the other is invariant with respect to the basic values Xn1 and Xn2 and is equal to I = ln(Δe(1)/Δe(2)).

Proof. Let us determine the increment of the complex entropy potential of a system's parameter and express Δe(1) and Δe(2) via the respective entropy values H1(x) and H2(x) from Eq. (6). The result is

LΔ(1) − LΔ(2) = Δe(1)/Xn1 − Δe(2)/Xn2 = (Δe(1)Xn2 − Δe(2)Xn1)/(Xn1 Xn2)
= (Δe(2)Xn1/(Xn1 Xn2)) ((Δe(1)Xn2)/(Δe(2)Xn1) − 1)
= LΔ(2) ((Xn2/Xn1) e^{H1(x) − H2(x)} − 1) = LΔ(2) ((Xn2/Xn1) e^{I} − 1).   (18)

In Eq. (18), the value I = H1(x) − H2(x) is a measure of the amount of information created by the variation of the system's abeyance, i.e., by its information track at this stage. From Eq. (18), it follows that

LΔ(1) − LΔ(2) = LΔ(2) ((Xn2/Xn1) e^{I} − 1), or LΔ(1)/LΔ(2) = (Xn2/Xn1) e^{I}, or (Δe(1)/Δe(2)) (Xn2/Xn1) = (Xn2/Xn1) e^{I}.

From here,

I = ln(Δe(1)/Δe(2)).   (19)

This is what we had to prove.

Equation (19) can be developed. For this purpose, let us express the entropy potentials via the corresponding scattering characteristics according to Eq. (14):

I = ln(Δe(1)/Δe(2)) = ln(Ke(1)σ1/(Ke(2)σ2)) = ln kke + ln(σ1/σ2) = ln kke + (1/2) ln(D1/D2),   (20)

where kke = Ke(1)/Ke(2) is the conversion coefficient of a parameter's distribution law, which is one of the concepts of the theory of entropy potentials. The kke concept is introduced in [12], where recommendations for its determination are given. In several cases, the kke values can be determined theoretically, in accordance with the physical sense, by using analogies, etc., and for several typical situations they can be calculated and tabulated. The kke value characterizes the transformation of a parameter's distribution law under static conditions. Under dynamic conditions, such a process is described by a differential equation or by the corresponding transfer function Wke(p). That is why the kke value can be determined from the transfer function by passing to the static conditions: kke = Wke(p)|p=0 = Wke(0). The D1 and D2 values in Eq. (20) are the variances D = σ². The variance characterizes the averaged power of the whole harmonic spectrum of the dynamic component of the examined parameter: D = (1/2π) ∫_{−∞}^{+∞} S(ω) dω, where S(ω) is the spectral density function, which describes the mean power distribution over the harmonics of the random process. Taking this fact into account, the result (20) can be accepted as a basic model of the information–energy state of the system. This model can be a component for describing the substance–energy–information interactions as a whole for different structures and substances.

As proven in Theorem 3, the basic values Xnl (l ∈ L) do not directly influence the amount of information I. Nevertheless, as follows from the definition of the complex entropy potential, these values can be expressed indirectly via Δe and LΔ. That is why it is natural to assume that the ratio between these values influences I. We answer this question by means of the following theorem.
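The invariance claimed in Theorem 3 and the equivalence of Eqs. (19) and (20) can be checked with a few lines of arithmetic (illustrative numbers; the Ke values are taken from Table 1):

```python
import math

def info_track(delta_e1, delta_e2):
    """I = ln(Δe(1)/Δe(2)), Eq. (19): the information produced by a
    transition between two abeyance states; the bases Xn1, Xn2 cancel."""
    return math.log(delta_e1 / delta_e2)

de1, de2 = 3.0, 1.5
i1 = info_track(de1, de2)

# The same transition via Eq. (20): I = ln kke + (1/2) ln(D1/D2),
# with σ chosen so that Ke·σ reproduces the same Δe values.
ke1, ke2 = 2.07, 1.73
s1, s2 = de1 / ke1, de2 / ke2
i2 = math.log(ke1 / ke2) + 0.5 * math.log(s1 ** 2 / s2 ** 2)
print(i1, i2)  # both equal ln 2
```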
Theorem 4. Let LΔ(1) and LΔ(2) be the complex entropy potentials that characterize separate abeyances of the system according to some parameter. Then, if LΔ(1) > LΔ(2), the amount of information I caused by the variation in the system's abeyance is greater than ln(Xn1/Xn2), and vice versa. If LΔ(1) = LΔ(2), then I = ln(Xn1/Xn2).

Proof. From the condition LΔ(1) > LΔ(2), it follows that

LΔ(1) − LΔ(2) = Δe(1)/Xn1 − Δe(2)/Xn2 = (Xn2 Δe(1) − Xn1 Δe(2))/(Xn1 Xn2) > 0.   (21)

And since Xn1 > 0 and Xn2 > 0, from Eq. (21) it follows that

Xn2 Δe(1) − Xn1 Δe(2) > 0, or Δe(1)/Δe(2) > Xn1/Xn2.   (22)

Let us express the entropy potentials entering into Eq. (22) via the corresponding entropies by using Eq. (6). As a result, we obtain

e^{H1(x)}/e^{H2(x)} > Xn1/Xn2, or e^{H1(x) − H2(x)} > Xn1/Xn2, or e^{I} > Xn1/Xn2,   (23)

where I = H1(x) − H2(x) is the amount of information created by the variation of the system's abeyance. Taking the logarithm of both parts of Eq. (23), we get

I > ln(Xn1/Xn2).   (24)

By analogy, for LΔ(1) < LΔ(2), we have

I < ln(Xn1/Xn2).   (25)

From the condition LΔ(1) = LΔ(2), we also obtain

I = ln(Xn1/Xn2),   (26)

which is what we set out to prove.

As comments to Theorems 3 and 4, it is necessary to point out the following:

1. Result (26) does not conflict with Theorem 3. As follows from Definition 2, Xn = Δe/LΔ. Therefore, if we substitute this expression for Xn in Eq. (26) and consider that LΔ(1) = LΔ(2), we obtain I = ln(Xn1/Xn2) = ln(Δe(1)/Δe(2)), which is simply the assertion of Theorem 3.

2. The conclusions of Theorems 2 and 3 are true for any basic distribution laws and can be taken as a principle for defining the entropy potentials and complex entropy potentials. If we pass to a new base of the distribution law, the entropy potentials in Eqs. (19)–(22) will be multiplied by the same transition coefficient. After canceling it, we obtain the same initial expressions. That is why the final results of the mentioned theorems contain no transition coefficients.

3. According to the obtained results (19) and (24)–(26), the amount of information created by a variation of abeyance can be negative. At first glance, this conflicts with the known concepts of information theory. However, in reality there is no conflict. This is because the classical definition of information is based on the concepts of a priori and a posteriori entropies. The a priori entropy characterizes the initial abeyance of the object. The a posteriori entropy characterizes the abeyance of the object after the occurrence of an event that specifies the abeyance. Such events are, for example, a measuring procedure or receiving a signal that contains additional information about the object. That is why the value of the a posteriori entropy is always lower than the value of the a priori entropy. The classical definition of the amount of information I created by the occurrence of such events is based on estimating the decrease in abeyance, which is determined by the difference between the a priori and a posteriori entropies. Naturally, the amount of information defined in this way is not a negative value (I ≥ 0). In the general case, we examine how the abeyance of the object varies when the object is subjected to different impacts that depend on the space and time coordinates. In this case, the abeyance can increase or decrease. That is why the difference between the entropies of two compared states can be either positive or negative. In particular, when such factors cause the a posteriori entropy, or the entropy of the next state, to decrease, the amount of information is not negative.
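The statement of Theorem 4 amounts to a sign condition that is easy to verify numerically. The following sketch, with hypothetical values, checks that the side of I relative to ln(Xn1/Xn2) matches the sign of LΔ(1) − LΔ(2):

```python
import math

def theorem4_holds(de1, de2, xn1, xn2):
    """Check Theorem 4: sign of LΔ(1) - LΔ(2) matches the side of
    I = ln(Δe(1)/Δe(2)) relative to the bound ln(Xn1/Xn2)."""
    ld1, ld2 = de1 / xn1, de2 / xn2       # Eq. (17) with Ke·σ = Δe
    i = math.log(de1 / de2)               # Eq. (19)
    bound = math.log(xn1 / xn2)
    if ld1 > ld2:
        return i > bound                  # Eq. (24)
    if ld1 < ld2:
        return i < bound                  # Eq. (25)
    return abs(i - bound) < 1e-12         # Eq. (26)

# Hypothetical numbers: LΔ(1) = 2/3 > LΔ(2) = 1/2, so I > ln(3/2).
ok_gt = theorem4_holds(2.0, 1.0, 3.0, 2.0)
# Bases proportional to the potentials: LΔ(1) = LΔ(2), so I = ln 2.
ok_eq = theorem4_holds(2.0, 1.0, 4.0, 2.0)
print(ok_gt, ok_eq)
```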
Hereinafter, where it is necessary to exclude an ambiguous interpretation, we designate the concept of the amount of information generalized on the basis of the entropy potential theory as IL.

4. Based on the results (24)–(26) of Theorem 4, it is possible to say that, if the complex entropy potential is used for estimating the abeyance, the condition LΔ(1) = LΔ(2) and the respective expression IL = ln(Xn1/Xn2) determine the bifurcation line of the information field of the system. Figure 1 illustrates this statement. Here, in the Cartesian coordinates Xn1/Xn2 and IL, the line IL = ln(Xn1/Xn2) separates the space of possible IL of the system into two areas. In the area that corresponds to the condition LΔ(1) > LΔ(2), the band of possible values of IL lies above the curve IL = ln(Xn1/Xn2), and in the other area it lies below the curve. If the representative point is on the mentioned discriminating line, both compared abeyances are constant; in this case, the IL value is determined unambiguously from Eq. (26) by the ratio between the basic parameters. Hereby, the information field is fragmented with respect to this line.

Fig. 1. Structure fragmentation of the information field of the system as a function of the ratio between the complex entropy potential and a parameter's basic value: the bifurcation line LΔ(1) = LΔ(2), IL = ln(Xn1/Xn2) separates the area LΔ(1) > LΔ(2), IL > ln(Xn1/Xn2) from the area LΔ(1) < LΔ(2), IL < ln(Xn1/Xn2).

5. Theorem 4 is an addition to and development of Theorem 3. Its substance is as follows: based on it, it is possible to estimate the range of variation of a parameter's information track by using a minimal amount of simply determined data, i.e., the Xn1 and Xn2 values.

The presented approach to investigating a system's abeyance is characterized by increased sensitivity and makes it possible to detect variations of abeyance at the level of variation of the distribution law, even when the other characteristics σ and Xn are constant. The clarity of describing the variation of abeyance also rises. The entropy potentials are an efficient instrument for investigating different systems.

In the general case, when the system's abeyance is described by an m-dimensional vector whose components are different parameters xi (i = 1, 2, …, m), in order to characterize the abeyance, it is necessary to choose and introduce other concepts. It is not reasonable to use the multidimensional entropy

H(x1, x2, …, xm) = −∫_{−∞}^{+∞} … ∫_{−∞}^{+∞} p(x1, x2, …, xm) ln p(x1, x2, …, xm) dx1 dx2 … dxm   (27)

due to the reasons mentioned above and also due to the additional difficulty of determining the multidimensional probability distribution law p(x1, x2, …, xm). It is also not reasonable to introduce multidimensional entropy potentials, because of the problems of determining the values of the multidimensional entropy coefficients and mean square deviations. (These problems are especially important if these values are determined experimentally.) To describe the abeyance of a k-dimensional vector, we suggest introducing the concept of a multidimensional complex entropy potential (MDCEP), with the following definition.
Definition 3. A multidimensional complex entropy potential (MDCEP) of m dimensional vector is Lazj, which is determined as follows: 1
1
z z⎞ z ⎛ k ⎛ k z⎞ eij ⎞ La zj = ⎜ ( c i L Δij ) ⎟ = ⎜ ⎛ c i Δ ⎟ ⎝i = 1 ⎠ ⎝ i = 1 ⎝ X ni ⎠ ⎠
∑
∑
(28)
1 z⎞ z
⎛ K eij σ i⎞ = ⎜ ⎛ c i ⎟ . ⎝ X ni ⎠ ⎠ ⎝i = 1 k
∑
In Eq. (28), the following designations are used: L_Δij is the complex entropy potential of the i-th parameter; Δ_eij is the entropy potential of the i-th parameter; j is an index, the number of the distribution law, by which the entropy coefficients, entropy potentials, and complex entropy potentials are set; c_i are the weight coefficients, which characterize the significance and priority of each i-th parameter used for describing the system's state, c_i ≥ 0 (i = 1, 2, …, k); and z is the number of the criterion variant, z = 1 or z = 2. Under z = 1, we have the criterion variant La_1j = Σ_{i=1}^{k} c_i L_Δij, and under z = 2 we have La_2j = (Σ_{i=1}^{k} (c_i L_Δij)²)^{1/2}.

The geometrical interpretation of the La_2j value is the magnitude, or length, of the k-dimensional vector consisting of the complex entropy potentials of the separate parameters of the system, presented in the scale of the weight coefficients; the La_1j value is the sum of the lengths of these components. Therefore, La_1j ≥ La_2j, and equality takes place if k = 1. The user chooses the variant. It is possible to choose the variant according to the following considerations: La_2j makes it possible to estimate the abeyance, and the estimation is less dependent on the system dimension k than in the case of La_1j; therefore, under sufficiently high k (k ≥ 5), the La_2j criterion is better. For both variants, if La_zj increases, the system's abeyance increases, and vice versa. In all cases, La_zj is dimensionless (similar to L_Δ), and therefore it is possible to use it as a criterion of entropy similarity when we investigate the abeyance of different systems. It is evident that under k = 1 the La_zj value degenerates into the magnitude L_Δj. The advantages of the introduced multidimensional complex entropy potential criterion are as follows: it is based on the concept of entropy potential introduced above, it is simple to determine, and it has a clear physical sense. Determining La_zj from experimental results therefore requires a minimal amount of data. (The number of measurements needed for determining the multidimensional complex entropy potential is equal to the sum of the separate measurements required for determining the entropy potentials of each parameter.)

The criterion La_zj is a development of the entropy potential theory and does not conflict with the concepts introduced above. For example, if we pass to the one-dimensional system, when k = 1, the La_zj value degenerates into the magnitude L_Δj. If the basic value of parameter X_n is constant, the value L_Δj is the dimensionless scale image of the entropy potential Δ_ej. If we neglect the transformation of the distribution law during evolution or control, which corresponds to the condition K_e = const (for example, if we accept that the parameter distribution law is always normal with K_e = 2.066), then the value L_Δj is the scale image of the mean square deviation. In this particular case, it is possible to investigate the system's state by using the known procedures of variance analysis.

For investigations, it is necessary to choose the distribution law (i.e., to determine index j).
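As a small numerical illustration of Eq. (28), the sketch below (the function and the sample figures are ours, purely hypothetical) computes both criterion variants from per-parameter entropy coefficients K_eij, mean square deviations σ_i, basic values X_ni, and weight coefficients c_i:

```python
# Illustrative sketch of Eq. (28); the sample figures are hypothetical.
def la(z, c, ke, sigma, xn):
    # complex entropy potential of the i-th parameter: L_di = K_ei * sigma_i / X_ni
    terms = [(ci * ki * si / xi) ** z for ci, ki, si, xi in zip(c, ke, sigma, xn)]
    return sum(terms) ** (1.0 / z)

# two parameters under the uniform law (K_e = sqrt(3)), equal weights
c, ke, sigma, xn = [1.0, 1.0], [3 ** 0.5, 3 ** 0.5], [0.2, 0.3], [10.0, 15.0]
la1 = la(1, c, ke, sigma, xn)  # linear variant, z = 1
la2 = la(2, c, ke, sigma, xn)  # quadratic variant, z = 2
assert la1 >= la2              # La_1j >= La_2j, with equality only for k = 1
```

For k = 1 both variants coincide and reduce to the complex entropy potential L_Δ of the single parameter, as the text notes.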
Hereinafter, unless otherwise stated, we assume that the value La_zj is determined according to the law of uniform density, and we designate it as La_zj = La_z.

From the presented material, it follows that the concepts of entropy potential, complex entropy potential, and multidimensional complex entropy potential are unified structures based on the folding and embedding principles. They are simple to determine, and they make it possible to describe abeyance rigorously by using different characteristics. In general terms, a set of entropy potential concepts E can be determined as follows:

$$
E = \langle X, N_E, L_{E,Z}, P_E \rangle, \qquad (29)
$$
where X = {Ω_j}, j = 1, 2, …, m, is the set of elements or parameters used for describing the object's state. The number of elements m defines the dimension of the state vector. This finite set can consist of a collection of separate clusters ω_i(ξ) (i = 1, 2, …, k) with a respective range of variation D_i; i.e., the whole set of states of any j-th parameter is Ω_j = ∪_i ω_i(ξ), where ξ is the vector of factors that determines the parameter variation (for example, a factor of time or any space coordinates). N_E is a set of images for a subset of parameters from Ω_j; in entropy potential models, we use images for obtaining the scattering characteristics σ_i, the basic values X_ni of the parameters, the entropy coefficients K_eij, and also the weight coefficients c_i. L_{E,Z} is the set of ratio forms for mapping elements from N_E into P_E according to the scheme N_E → P_E, and Z is the number of the form variant; for example, for mapping the elements from X into La_zj, two variants (Z = 1 and Z = 2) corresponding to linear and quadratic forms are used. P_E is the set of estimations and criteria characterizing the abeyance of elements from X: Δ_e, L_Δ, La_zj. Definition (29) allows further development and replenishment of the list of concepts for describing abeyance in different problems and also for using different groups of parameters.
3. THE RESULTS OF THE APPLICATION OF THE THEORY OF ENTROPY POTENTIALS FOR SOLVING SPECIFIC PROBLEMS

The theory of entropy potentials makes it possible to increase the efficiency of information processing, monitoring, and control in different systems under uncertain conditions, since the destabilizing properties of the distribution laws of parameters, basic values, and performance variations are taken into account. Such methods make it possible to increase the intellectualization level of investigations and control. That is why they are widely used for solving problems in different fields. Below, we present the results of the application of such an approach in practice.

One field of application of the theory of entropy potentials is the informational theory of instruments' errors, where the concept of the entropy error value is used [17]. The entropy error value is determined as half of the range of the uniform distribution that has the same entropy as the error distribution law of a specific measuring system. From here, it follows that the entropy error value is a particular case of the entropy potential, when an absolute error is examined as a parameter. In terms of the entropy potential theory, the entropy error value can be examined as the numerical estimation that characterizes the abeyance level of a parameter measurement. In this respect, it is possible to consider the information theory of errors as a precondition of the entropy potential theory.

The theoretical and applied aspects of the methods of the entropy potential theory for investigating and solving several problems of physics and cybernetics are examined in [12, 16]. In these works, specific examples that illustrate the approach's validity for investigations, from atomic structures up to planetary systems, are presented. The entropy potential theory also makes it possible to solve problems of controlling complicated systems. The novelty and validity of the presented approaches are confirmed by two invention patents. The idea of such control is as follows: to form the control channels and introduce the control impacts that make it possible to vary the basic components of the entropy potentials [10, 12, 14]. As a result, a new class of control systems adapted to uncertainty conditions has been developed [18].

The possibilities of the entropy potential theory methods are used in [19] for investigating the inflation rate in a large economic region. The consumer price index is analyzed, and the state of affairs in a socioeconomic system is estimated by using the entropy and complex entropy potentials. In addition, the tendencies for forming schemes and methods of economic control for the region are revealed by analyzing how the components of the complex entropy potentials vary at different stages of development. The obtained results are of interest for predicting development and for decision making in organizing and managing the economy. The same approach is used in [16] for investigating St. Petersburg's ecological system according to the temperature parameter. As a result, the tendencies in climate development and temperature conditions are predicted. In particular, we analyze how the basic temperatures (mathematical expectations) vary over three time stages (43 years), and we conclude that there is no tendency of global warming in the region. By analyzing how the entropy potential and complex entropy potential and their defining characteristics vary, and by analyzing the information track of the ecological system, we draw conclusions on climate variation, as well as on temperature instability and abeyance. Further information about temperature modes, especially for the winter and summer periods, confirms the validity of the results.
We note that it is topical to use the entropy potential theory for solving problems of system qualimetry. The specificity of such problems is as follows: one of the dominating factors in decision making for cost optimization or for modernizing a system's elements is the uncertainty of the planning results. Therefore, the models which are used for solving such problems should take this component into account, and it can be described rigorously by using the entropy potential concepts. The theoretical and practical aspects of the application of the entropy potential theory for solving qualimetry problems are presented in [20], where they are illustrated by examples and calculations.

The methods of the entropy potential theory can be used for investigating different systems: social, economic, technical, biological, informational, etc. Below, we present one promising approach for processing observation results.
4. OBSERVATION PROCESSING BY USING THE ENTROPY POTENTIAL THEORY

It is necessary to develop the measuring means and to increase the processing efficiency of the initial information. Nowadays, technologies and algorithms for data processing and intellectual analysis are being developed intensively; Data Mining, OLAP, and other developments [21] are examples of such technologies. By analyzing the main stages of data processing, it is possible to draw the following conclusion. In practice, the regression technique based on the least squares method (LSM) is used to solve the problems of information filtration, experimental data approximation, local identification in cluster analysis, simulation of regularities, etc. Therefore, by improving this instrument of investigation, we increase the efficiency of observation processing.

The LSM is preferable to other approaches and methods only if several requirements are satisfied [22, 23]: (1) the distribution of the observation errors should be described by the normal law; (2) the observation errors should be independent; etc. It is impossible to meet these requirements ideally due to the nature of the examined phenomena and the physical core of the experiment. According to the normal law, a random value varies from –∞ up to ∞, which is not typical of the nature of the error. For example, the errors caused by a kinematic chain or by trigger hysteresis are distributed according to the discrete double-valued law and are therefore limited; the errors caused by signal quantization are distributed according to the uniform density law; and the errors caused by a sinusoidal voltage at a device's input are distributed according to the arcsinusoidal law, whose properties and characteristics are antipodal to the normal law and which is also limited. In this respect, if we describe the random variations of the examined parameters by the normal distribution law, the core of the studied phenomenon can be presented incorrectly.
It is possible to say that there is no normal law in nature: it is a product of mathematical abstraction, which is convenient for analytical investigations. The real distribution laws, according to their destabilizing properties and the predictability of some values, are close to the normal law with different degrees of agreement. There is only the central limit theorem, which reveals that the distribution of the resulting deviation approaches the normal law if it is formed by the impacts of a high number of component distributions, each of which is characterized by an insignificant weight. In this case, the problem of how to estimate numerically the degree of approximation for a different number of disturbances, weight insufficiency, etc., has not been solved. All the laws mentioned above are illustrated by examples of typical distribution laws, which are widely used in practice and which are presented in Table 2.
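The quoted value K_e = 2.066 for the normal law comes from the entropy coefficient, which we take here (an assumption consistent with the definitions above) to be K_e = exp(H)/(2σ), the entropy potential divided by the mean square deviation, with H the differential entropy in natural units. A short sketch compares the closed-form values for three of the laws mentioned:

```python
import math

# Entropy coefficient K_e = exp(H) / (2 * sigma): the entropy potential
# (half-range of the uniform law with the same entropy H) over sigma.

def ke_normal():
    # H = 0.5 * ln(2*pi*e*sigma^2), so exp(H) = sigma * sqrt(2*pi*e)
    return math.sqrt(2 * math.pi * math.e) / 2

def ke_uniform():
    # uniform law on a range of width 2a: H = ln(2a), sigma = a / sqrt(3)
    return math.sqrt(3)

def ke_arcsine():
    # arcsinusoidal law on a range of width 2a: H = ln(pi*a/2), sigma = a / sqrt(2)
    return math.pi * math.sqrt(2) / 4

print(round(ke_normal(), 3), round(ke_uniform(), 3), round(ke_arcsine(), 3))
# prints: 2.066 1.732 1.111
```

The spread between 2.066 (normal), 1.732 (uniform), and 1.111 (arcsinusoidal) is precisely the law dependence that a description with a fixed K_e ignores.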
Under the conditions of a real experiment, the estimation errors are not always independent. For example, the noises caused by an alternating voltage in different segments of a measuring system can cause correlated errors. The same situation can occur if heat, magnetic, and other disturbances act in combination. In these cases, the efficiency of the LSM used for observation processing can decrease greatly, and the level of the obtained information drops.

That is why it is necessary to develop new, improved approaches and methods for generating mathematical models based on experimental data. We note that alternative approaches for solving this problem have been developed for a long time: for example, approaches that use maximum-likelihood parameter estimations, as well as Bayesian and minimax estimations. The best solution has still not been found, and the increasing number of publications verifies this fact. We suggest one way of solving this problem, based on the theory of entropy potentials. The idea is as follows: observations based on measurement results have random components; that is why the generated mathematical model is characterized by uncertainty with respect to the examined phenomenon. From here, it follows that, to determine the parameters of a model which approximates the unknown relation, it is necessary to minimize a criterion that characterizes its abeyance. We note that a similar approach is used in the LSM [22, 23]: the sum of squared misalignments, the disagreements between the experimental and calculated values, is used as a criterion of model abeyance. (In terms of the theory of errors, this value is called the error function.) It is also assumed that the probability of the appearance of separate misalignments is distributed according to the normal law.
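The link between the two criteria can be checked numerically. In the sketch below (our illustration, not code from the paper; the data, search window, and function names are hypothetical), the model y = ax + b is chosen by minimizing the entropy potential Δ_e = K_e·σ of the misalignments; with K_e held constant, the criterion is proportional to the RMS misalignment, so the minimizer coincides with the least-squares fit:

```python
import math

# Entropy potential of the misalignments, with K_e frozen at the normal-law
# value 2.066 (i.e., neglecting the transformation of the distribution law).
def entropy_potential(residuals, ke=2.066):
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))  # RMS misalignment
    return ke * sigma

# Brute-force grid search over (a, b); the window and step are assumptions.
def fit_grid(xs, ys):
    best = None
    for ia in range(150, 251):          # a in [1.50, 2.50], step 0.01
        for ib in range(-50, 51):       # b in [-0.50, 0.50], step 0.01
            a, b = ia * 0.01, ib * 0.01
            crit = entropy_potential([y - (a * x + b) for x, y in zip(xs, ys)])
            if best is None or crit < best[0]:
                best = (crit, a, b)
    return best[1], best[2]

xs = [0, 1, 2, 3, 4]
ys = [0.1, 1.9, 4.1, 5.9, 8.1]
a, b = fit_grid(xs, ys)   # coincides with the LSM solution a = 2.0, b = 0.02
```

With a variable K_e, the same search would also react to a change of the distribution law of the misalignments, which the plain sum of squares cannot do.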
However, this criterion is not perfect, since it does not take into account the real distribution law of the parameter misalignments for separate clusters of observations, the rated values of the examined parameters against whose background the abeyance is examined, or the cluster significance; this causes the mentioned disadvantages of the method.

Thus, we suggest using the value La_z as the criterion that characterizes the model's abeyance. In this case, the disagreements between the experimental and modeling data are the examined parameters. The set of misalignments can consist of separate clusters caused, for example, by the results of unequal and multiple observations of any parameter in separate segments of its variation or time intervals, and also obtained by different measuring means and methods. The weight coefficients characterizing the weight and priority of each i-th cluster of observations during model generation, c_i ≥ 0 (i = 1, 2, …, k), can be determined in different ways: analytically, based on statistical analysis of the observation results, on metrological instrument characteristics, etc. When we estimate the abeyance of a model generated from unequal observations, the weight coefficients are inverse to the values of the respective variances, which corresponds to the measure of accuracy of these sets of measurements [22].

The idea of using the La_z criterion for determining the model's parameters does not conflict with the existing approaches. If we simulate any phenomenon based on one sequence of observations, when k = 1, La_z degenerates into the magnitude L_Δ. If the basic value of the X_n parameter is constant, L_Δ is the dimensionless scale image of the entropy potential Δ_e of the parameter misalignments. In this particular case, the procedure of model improvement consists in decreasing the entropy potential, and therefore the decrease in entropy after any iteration stage is the measure of the obtained information, i.e., the increment of knowledge included in the model. If we neglect the transformation of the distribution law during simulation, which corresponds to the condition K_e = const (for example, if we accept that the distribution law of the parameter misalignments with respect to the modeling values is always normal with K_e = 2.066), the Δ_e value is the scale image of the mean square deviation. In this more particular case, when we determine the parameters of the mathematical model, the method based on the minimization of the entropy potential degenerates into the LSM or a derived method. Such an approach should be examined as an additional instrument for observation processing, which makes it possible to increase the level of attainable knowledge.

CONCLUSIONS

The main concepts of the entropy potential theory, whose methods are instruments for smart analysis and system control, are presented. An approach that makes it possible to determine the parameters of mathematical models according to the results of observation is suggested. The obtained models are adapted to a variation of the error distribution law for different observation series with respect to the basic values of the examined phenomena, which makes it possible to increase their adequacy.

REFERENCES
1. Intellectual Systems of Automatic Control, Ed. by I. M. Makarov and V. M. Lokhin (Fizmatlit, Moscow, 2001) [in Russian].
2. A. V. Kolesnikov, Hybrid Intellectual Systems: Theory and Design Technology, Ed. by A. M. Yashin (SPbGTU, St. Petersburg, 2001) [in Russian].
3. Yu. I. Zhuravlev, "On Algebraic Approach for Solving the Recognition or Classification Problems," in Cybernetic Problems (Nauka, Moscow, 1978), pp. 5–68 [in Russian].
4. E. A. Patrick, Fundamentals of Pattern Recognition (Prentice-Hall, Englewood Cliffs, NJ, 1972; Sov. Radio, Moscow, 1980).
5. L. A. Zadeh, "Fuzzy Sets and Their Application to Pattern Classification and Cluster Analysis," in Classification and Clustering, Ed. by J. Van Ryzin (Academic Press, 1977; Mir, Moscow, 1980), pp. 208–247.
6. S. N. Vasil'ev, "From Classical Control Problems to Intelligent Control," Izv. Akad. Nauk, Teor. Sist. Upr., Nos. 1, 2 (2001).
7. C. E. Shannon, "A Mathematical Theory of Communication," Bell Syst. Tech. J. 27, Part I, 379–423; Part II, 623–656 (1948).
8. I. V. Prangishvili, Entropy and Other System Regularities: Control Problems of Complicated Systems (Nauka, Moscow, 2003) [in Russian].
9. E. T. Jaynes, Probability Theory: The Logic of Science, Ed. by G. L. Bretthorst (Cambridge Univ. Press, Cambridge, 2003).
10. V. L. Lazarev, "The Way to Use the Probability–Information Criteria for Control under Uncertain Conditions," in Proc. Int. Conf. on Soft Computing and Measurements SCM'2003 (SPbGTU "LETI," St. Petersburg, 2003), Vol. 2, pp. 78–81.
11. A. M. Turichin, P. V. Novitskii, E. S. Levshina, et al., Electric Measurement of Nonelectrical Parameters, Ed. by P. V. Novitskii (Energiya, Leningrad, 1975) [in Russian].
12. V. L. Lazarev, "Entropy Approach for Organizing the Monitoring and Control," Izv. Akad. Nauk, Teor. Sist. Upr., No. 6, 61–68 (2005).
13. V. L. Lazarev, Simulation of Entropy Potentials of Dynamical System Parameters under Expected Uncertainty, Available from VINITI, No. 1443–V2006 (2006).
14. V. L. Lazarev, Entropy Models in Monitoring and Control Systems, Available from VINITI, No. 233–V05 (2005).
15. V. L. Lazarev, "Complex Objects Control and Inspection on the Base of Information-Entropy Models," in Proc. Int. Conf. on Soft Computing and Measurements SCM'2004 (SPbGTU "LETI," St. Petersburg, 2004), Vol. 2, pp. 57–61.
16. V. L. Lazarev, "Research of the Systems on the Base of Entropy and Information Characteristics," Zh. Tekh. Fiz., No. 2, 1–7 (2010).
17. P. V. Novitskii, Foundations of Information Theory of Measuring Devices (Energiya, Leningrad, 1968) [in Russian].
18. V. L. Lazarev, "Complex Systems Robust Control on the Base of Entropy Potentials Theory," in Proc. 6th Sci. Conf. "Control and Information Technologies" (CIT 2010) (OAO "Kontsern TsNII Elektropribor," 2010), pp. 74–78.
19. V. G. Kulakov, V. L. Lazarev, and V. A. Fedulin, "Simulation and Research of Socio-Economic Systems by Using Entropy Potentials Theory Methods. Application to the Problems of Tver Region," in Proc. Int. Conf. on Soft Computing and Measurements SCM'2010 (SPbGTU "LETI," St. Petersburg, 2010), Vol. 2, pp. 93–97.
20. V. L. Lazarev, "Qualimetry of the Systems on the Base of Entropy Potentials Parameters. Application Aspects for Food Industry and Nanotechnologies," Vestn. Mezhdunar. Akad. Kholoda, No. 4, 48–52 (2009).
21. A. A. Barsegyan, M. S. Kupriyanov, V. V. Stepanenko, and I. I. Kholod, OLAP and Data Mining Methods for Data Analysis (BKhV–Peterburg, St. Petersburg, 2004) [in Russian].
22. Yu. V. Linnik, Least-Squares Method and Foundations of Mathematical–Statistical Theory of Observations Processing, 2nd ed. (Fizmatlit, Moscow, 1962) [in Russian].
23. L. N. Bol'shev, "Least-Squares Method," in Mathematical Encyclopedia, Ed. by I. M. Vinogradov (Sovetskaya Entsiklopediya, Moscow, 1982), Vol. 3, pp. 876–882 [in Russian].

Viktor Lazarevich Lazarev is a candidate of technical science and an associate professor in the Department of Automatics and Automation of Industrial Processes at St. Petersburg State University of Low Temperature and Food Technologies. In 1975 he graduated from the Kalinin Leningrad Polytechnic Institute, specializing in Information-Measurement Technology. After graduating, he worked as an engineer, scientific fellow, and senior scientist. He defended his dissertation in 1980. He has been teaching since 1983 and became an associate professor in 1988. Lazarev is the author of more than 100 scientific and methodical works, and he has been awarded exhibition prizes for his research. His fields of scientific interest include the following:
—Object and system monitoring and control under uncertain conditions;
—Intellectualization of algorithms and information processing;
—The theory of entropy potentials and its application in different fields.
He has published more than 40 scientific works in these fields.
PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 21 No. 4 2011