REPRESENTATION, PROCESSING, ANALYSIS, AND UNDERSTANDING OF IMAGES
On the Probability of the Formation of Local Groups in Random Point Images A. L. Reznik, A. A. Solov’ev, and A. V. Torgov Institute of Automation and Electrometry, Siberian Branch, Russian Academy of Sciences, Novosibirsk, 630090 Russia e-mail:
[email protected] Abstract—Original programming, combinatorial, and geometric schemes are presented. They have been developed and used by the authors to calculate exact analytical formulas that describe the probability of the formation of local groups of a given size in random point images. Formulas, which will be discussed below, arise in the assessment of the reliability detection of point images, when they are detected by scanning the aperture that has a limited number of thresholds. In this paper, significant attention is paid to the formulation and solution of difficult combinatorial problems that have been encountered in the course of the investigation and that are associated with new generalization of the Catalan numbers. Keywords: random image, computer analytical calculations, generalized Catalan numbers, random local groups DOI: 10.1134/S1054661816040155
INTRODUCTION The paper discusses the problems associated with the estimation of the probability of the formation of local concentrations in random point images. Such problems arise in many scientific and engineering disciplines and can be purely applied and purely theoretical. In our case, these problems were encountered in the investigation of the detection of random point fields using a scanning aperture that has a limited number of thresholds. When recording random coordinates of small (ideally, point) objects that form such a field, failure occurs at a time when the number of objects within the scanning aperture exceeds a predetermined threshold. In the basic case, when the analyzed image is formed by a random Poisson flow of constant intensity, the 2D problem of estimating the probability that the detection will be carried out without interruption can easily be reduced to the following seemingly simple one-dimensional problem (it is achieved by a standard factorization), “It is necessary to find the probability Pn,k (ε) of the event consisting in the fact that in the case of the random throwing of n points on the closed interval (0,1), no groups will be formed with more than k points and concentrated inside a subinterval Ω ε ⊂ (0,1) with a length of ε.” This problem will be called the main problem, and the development of mathematical methods for finding exact analytical formulas describing its solution is the purpose of this paper.
Received February 2, 2016
OBTAINING OF PARTIAL SOLUTIONS OF THE MAIN PROBLEM USING MACHINE INTELLIGENCE SOFTWARE The simplicity of the posed problem is deceptive, and its analytical solution is known only for the simplest case k = 1 [1, 2]:
Pn,1(ε) = (1 − (n − 1)ε) n ,
(0 ≤ ε ≤ 1/(n − 1)). (1)
Equation (1) describes the probability of the event consisting in the fact that, in the case of the random throwing of n points on the closed interval (0,1), no εgroups of at least two points will be formed, i.e., that all the thrown points will scatter at a distance greater than ε from each other. The classical method for the solution of (1) is to provide the desired probability Pn,1(ε) in the form of an easily integrated iterated integral [1]: 1
Pn,1(ε) = n !
∫
dx n
(n −1)ε
⎧⎪ x n −ε ⎧⎪ x4 −ε ⎧⎪ x3 −ε ⎧⎪ x2 −ε ⎫⎫⎫ ⎪⎪⎪⎪⎫ dx n−1 ⎨ dx 3 ⎨ dx 2 ⎨ dx1 ⎬⎬⎬⎬ . ×⎨ ⎪⎩(n−2)ε ⎪⎪⎪⎪⎭ ⎩⎪ 2ε ⎩⎪ ε ⎩⎪ 0 ⎭⎭⎭
∫
∫
∫
∫
The solution of (1) can be obtained in other ways. For example, recently in [3], a simple probabilistic and geometrical method was proposed to solve (1) without resorting to multidimensional integration. Thus, it is easy to solve the main problem when k = 1. It is much more difficult to solve the same problem when k > 1. Here, by now, we have achieved [3–8] the following results.
ISSN 1054-6618, Pattern Recognition and Image Analysis, 2016, Vol. 26, No. 4, pp. 714–719. © Pleiades Publishing, Ltd., 2016.
ON THE PROBABILITY OF THE FORMATION
Note that for arbitrary values of n and k the desired solution can be represented in the form of the n-fold integral
Pn,k (ε) = n !
∫
… Dn, k (ε)
∫ dx … dx 1
n,
(2)
715
Pn,k (ε) for the case of k > 1, if such will be identified. A number of such analytical regularities were actually discovered and later rigorously proved. Firstly, for the k = n – 1, a simple dependence was identified and then easily proved
Pn,n−1(ε ) = 1 − nε n−1 + (n − 1)ε n.
where the integration region Dn,k (ε) is defined by the sysn
tem of linear inequalities in the Euclidean space R :
⎧0 < x1 < x 2 < … < x n−1 < x n < 1 , ⎪x − x > ε , 1 ⎪⎪ k +1 ⎨ x k +2 − x2 > ε , ⎪ ⎪ ⎪⎩ x n − x n− k > ε . In order to find integral (2), we developed two methods of successive dimension reduction based on the recursive replacement of the original n-fold integral by a set of integrals which are structurally similar but have one dimension less. Further, by formalizing these methods and applying cyclic recursion, two systems of analytical calculation of probabilities Pn,k (ε) were designed and software implemented, which calculate the desired piecewise polynomial dependence in the form of functions of the continuous parameter ε. The first system constructively calculates the integration limits for each of the iterated integrals into which source n-fold integral (2) is decomposed. The second software system is based on the repeated differentiation of integral (2) by the parameter ε. The third algorithmic scheme for the calculation of probability formulas Pn,k (ε) was developed based on the discrete combinatorial model and software implemented as well. Analytical calculations made using the described software systems made it possible to find a complete set of partial formulas Pn,k (ε) in all ranges of the continuous parameter ε for all integer-valued parameters n and k up to n = 14. Note that their calculation involves a huge number of routine operations of the arrangement of integration limits, verification of intermediate systems for the compatibility of the inequalities, and direct integration in the n-dimensional space, which is almost impossible in the manual mode already at n = 4. THE CALCULATION OF PROBABILITY FORMULAS Pn,k (ε) BY THE DIRECT INTEGRATION METHOD In the next step, we analyzed the obtained partial results to establish and, if possible, to prove the general formation of regularities of probability formulas PATTERN RECOGNITION AND IMAGE ANALYSIS
(3)
(Recall that formula (3) describes the probability that all n points thrown on the interval (0,1) will not form one compact ε-group.) For k = n – 2, the dependence Pn,k (ε) is more complex:
⎧1 − 2C n2ε n−2(1 − ε) 2 − 2ε n, 0 ≤ ε ≤ 1/2; ⎪ Pn,n−2(ε) = ⎨1 − 2C n2ε n−2(1 − ε) 2 − 2ε n + (2ε − 1) n, (4) ⎪1/2 ≤ ε ≤ 1. ⎩ For k = n – 3, the dependence Pn,k (ε) is so complicated that it is impossible to identify it by the analysis of partial software solutions:
Pn,n−3(ε) (n >6)
⎧1 − 2ε n + C n1(6ε n − 4ε n−1) + C n2(− 3ε n + ε n−2 ) ⎪ 3 n n −1 n−2 n −3 ⎪+ C n (9ε − 18ε + 12ε − 3ε ) , (5) ⎪⎪0 ≤ ε ≤ 1/2 ; =⎨ 1 n n n −1 n −1 ⎪1 − 2ε + (2ε − 1) + C n(1 − ε)(− 2ε + 2(2ε − 1) ) ⎪+C 2(1 − ε) 2(ε n−2 + (2ε − 1) n−2 ) − 3C 3ε n−3(1 − ε)3 , n ⎪ n 1/2 ≤ ε ≤ 1 . ⎩⎪ As previous dependences (3)–(4), formula (5) was found analytically and proven by direct integration. DISCRETE AND COMBINATORIAL METHODS FOR FINDING FORMULAS Next, by analogy with formula (1) valid for k = 1 we tried to find a general solution Pn,k (ε) for k = 2. Unfortunately, this task appeared to be much more difficult than it seemed before the beginning of the studies. This is primarily because of the fact that, unlike the case of k = 1, the probability Pn,2(ε) consists not of one but of several piecewise homogeneous fragments that are continuously coupled at merging points. Secondly, the formula Pn,2(ε) itself transforms depending on the parity of n. Thirdly, finding the regularities on each of the ranges of the parameter ε requires the development of an individual reduction scheme of each continuous problem that corresponds to a particular range, to its own and often very complex discrete probability problem. In the proposed reduction scheme, generalized
Vol. 26
No. 4
2016
716
REZNIK et al.
Catalan numbers occur in all subproblems (i.e., in all ranges of the parameter ε). It is necessary to know their explicit form in the ordering of interdependent random number sequences. While all the elements of the analyzed sequences are random uniformly distributed real quantities, it appeared to be more convenient to formulate and solve problems with generalized Catalan numbers in the lexical-linguistic form. The statements and solutions of each of these problems, which were encountered in finding the relations Pn,2(ε), are given below. The first of them is well known; the problem involves classical Catalan numbers (in this context, only its lexical-linguistic formulation is of interest). The second and third problems are not very complex, since we did not encounter them in the literature, we found it appropriate to adduce their solutions here. Problems 4–6 are much more difficult. However, they are given without proof (not to overwhelm this paper with details). PROBLEM 1 HAS A LEXICAL-LINGUISTIC FORMULATION AND LEADS TO THE CLASSICAL CATALAN NUMBERS Problem 1 The word is formed by N symbols a and N symbols b. How many of these words are such that, when the word is viewed from left to right, the number of encountered symbols b never exceeds the number of encountered symbols a? Solution:
WORD1(N ) (6) ⎛ 2N ⎞ (2N )! (2N )! − = 1 ⎜ . ⎟ N ! N ! (N + 1)!(N − 1)! N + 1 ⎝ N ⎠ Formula (6) is found in the works of Euler (many applied probability problems lead to it). However, in combinatorics this sequence is known under the name of the Belgian mathematician Catalan, who lived a century later.
=
COMBINATORIAL AND LINGUISTIC PROBLEMS ARISING IN THE STUDY OF RANDOM POINT STRUCTURES AND LEADING TO GENERALIZED CATALAN NUMBERS Problem 2 The word is formed by Na symbols a and Nb symbols b. How many of these words are such that, when the word is viewed from left to right, the number of encountered symbols b never exceeds the number of encountered symbols a? (It is assumed that Na ≥ Nb.) Solution:
(N a + N b )! (N a + N b )! . WORD2(N a, N b ) = − Na !Nb ! (N a + 1)!(N b − 1)!
This is the simplest extension of the Catalan numbers. When Na = Nb, we have the classical Catalan sequence.
Problem 3 The word is formed by Na symbols a and Nb symbols b. How many of these words are such that, when the word is viewed from left to right, the number of encountered symbols b never exceeds the number of encountered symbols a by more than k? (It is assumed that k ≥ 0 and Na+ k ≥ Nb). Solution: WORD3 k (N a, N b ) (N a + N b )! (N a + N b )! = − . (N a + k + 1)!(N b − k − 1)! Na !Nb !
This is a further extension of the Catalan numbers. Note that for k = 0, problem 3 is equivalent to problem 2, and if Na = Nb and k = 0, we obtain problem 1. Problem 4 The word is formed by Na symbols a, Nb symbols b, and Nc symbols c. How many of these words are such that the two conditions hold simultaneously. (1) When viewed from left to right, the number encountered symbols b never exceeds the number of encountered symbols a? (2) When viewed from right to left the number of encountered symbols c never exceeds the number of encountered symbols a? Solution. In the considered problems with the detection of random point fields [9], the condition Na > N b + N c − 2 always held (generally speaking, even stricter condition held: N a ≥ N b + N c ). Thus, the following formula is valid for these problems:
WORD4.1(N a, N b, N c ) (N a + N b + N c )! (N a + N b + N c )! = − (N a + 1)!(N b − 1)! N c ! Na !Nb !Nc ! (N a + N b + N c )! (N a + N b + N c )! . − + (N a + 1)! N b!(N c − 1)! (N a + 2)!(N b − 1)!(N c − 1)! It was obtained by us purely geometrically in [3] using the classical André’s reflection method [10]. However, more general formula
PATTERN RECOGNITION AND IMAGE ANALYSIS
Vol. 26
No. 4
2016
ON THE PROBABILITY OF THE FORMATION
WORD4(N a, N b, N c ) (N a + N b + N c )! (N a + N b + N c )! = − Na !Nb !Nc ! (N a + 1)!(N b − 1)! N c ! (N a + N b + N c )! (N a + N b + N c )! − + (N a + 1)! N b!(N c − 1)! (N a + 2)!(N b − 1)!(N c − 1)! (N a + N b + N c )! + (N b + N c − N a − 2)!(N a + 1)!(N a + 1)! (N a + N b + N c )! − (N b + N c − N a − 2)! N a !(N a + 2) ! could not be obtained in a similar manner. Thus, in order to find it, we used the methods developed in [11]. (Note that for Nc = 0 problem 4 becomes problem 2, so that we have one more extension of Catalan numbers.) Problem 5 Six symbol alphabet {“a,”“b,”“c,”“d,”“e,”“f”} was used to compile different words with the length of 2m such that:
N a + N d + N e = m, N b + N c + N f = m, N a + N c ≤ m, where N a , N b , N c , N d , N e , and N f are the number of symbols “a,”“b,”“c,”“d,”“e,” and “f” used in this word, respectively. For each admissible set {Na, Nb, N c , N d , N c , N f }, it is necessary to find the total number of words WORD5 m(N a, N b, N c , N d , N e, N f ), which, when viewed from left to right, simultaneously satisfy three conditions. (1) The number of encountered symbols b never exceeds the number of encountered symbols a. (2) The number of encountered symbols d never exceeds the number of encountered symbols c. (3) The number of encountered symbols f never exceeds the number of encountered symbols e by more than (N a − N b ).
Solution: Using the classical André’s reflection method [10], we found the solution of the problem:
WORD5 m(N a, N b, N c , N d , N e, N f ) ⎧ 1 = (2m)!⎨ ⎩(m − (N b + N c ))!(m − (N a + N d ))! ⎫ 1 − ⎬ (m + 1 − (N b + N d ))!(m − 1 − (N a + N c ))!⎭ ⎧ ⎫ 1 1 ×⎨ − ⎬ ⎩N a ! N b ! (N a + 1)! (N b − 1)!⎭ ⎧ ⎫ 1 1 ×⎨ − ⎬. ⎩N c ! N d ! (N c + 1)! (N d − 1)!⎭ Problem 6 A six-symbol alphabet {“a,”“b,”“c,”“d,”“e,”“f”} was used to compile different words with the length of 2m such that:
N a + N d + N e = m, N b + N c + N f = m, N a + N c ≥ m + 1, where N a , N b , N c , N d , N e , and N f are the number of symbols “a”,“b”,“c”,“d”,“e”, and “f” used in this word, respectively. For each admissible set {Na, Nb, N c , N d , N c , N f }, it is necessary to find the total number of words WORD6 m(N a, N b, N c , N d , N e, N f ) , which, when viewed from left to right, simultaneously satisfy three conditions. (1) The number of encountered symbols b never exceeds the number of encountered symbols a. (2) The number of encountered symbols d never exceeds the number of encountered symbols c. (3) The number of encountered symbols a never exceeds the number of encountered symbols c by more than (m + 1 − N c ). Solution. Using the approach described in [11], we found the solution to the problem:
WORD6 m(N a, N b, N c , N d , N e, N f ) =
(2m)! Ne !N f
⎧⎡ ⎤ 1 1 − ⎨ ! ⎩⎣⎢N a ! N b ! N c ! N d ! N a ! N b !(N c + 1)!(N d − 1)!⎥⎦
⎡ ⎤ 1 1 +⎢ − ⎥ ⎣(N a + 1)!( N b − 1)!(N c + 1)!(N d − 1)! (N a + 1)!( N b -1)! N c ! N d !⎦ ⎡ ⎤ 1 1 +⎢ − ⎥ ⎣(m + 3)! N b !(N d − 1)!(N a + N c − (m + 2))! (m + 2)! N b ! N d !(N a + N c − (m + 2))!⎦ ⎡ ⎤⎫ 1 1 +⎢ − ⎥⎬ . ( m + 3)!( N − 1)! N !( N + N − ( m + 2))! ( m + 4)!( N − 1)!( N − 1)!( N + N − ( m + 2))! ⎣ ⎦⎭ b d a c b d a c PATTERN RECOGNITION AND IMAGE ANALYSIS
717
Vol. 26
No. 4
2016
718
REZNIK et al.
FORMULAS Pn,2(ε) FOUND USING SOFTWARE, ANALYTIC, AND DISCRETE AND COMBINATORIAL ALGORITHMS For the probabilities Pn,k (ε) for k = 2, we could not find a closed analytic formula similar to formula (1) for k = 1. However, with all of the above computer and discrete and combinatorial tools including software and analytical calculations and generalized Catalan numbers, we established and later proved a number of new previously unknown analytical dependences. In particular, for even values of n = 2m on the interval 1 < ε < 1 , we strictly proved [6] the previously m m −1 expressed formula-hypothesis [9]
1 C m (1 − (m − 1)ε)2m . 2m m +1 For even values of n = 2m on the interval, the following formula was found [12] P2m,2(ε ) =
P2m,2(ε) = C 2mm(1 − (m − 1)ε)2m m −1
− C 2m (1 − (m − 1)ε)
2m
− C 2mm−2(1 − mε) m + 2(1 − (m − 2)ε) m −2 + 2C 2mm−3(1 − mε) m +3(1 − (m − 2)ε) m −3 −
m−4 C 2m (1 −
mε)
m+4
(1 − (m − 2)ε)
m −4
.
For odd values of n = 2m + 1 on the interval 1 < ε < 1 , the following formula was found [7] m +1 m
P2m +1,2(ε) = C 2mm++11(1 − mε) m +1(1 − (m − 1)ε) m m+2
− 2C 2m +1(1 − mε) m +3
+ C 2m +1(1 − mε)
m+2
m +3
(1 − (m − 1)ε)
(1 − (m − 1)ε)
m −1
m−2
.
CONCLUSIONS The results provided in this paper were obtained using specially designed machine analytical tools, as well as using generalized Catalan numbers. It made it possible to transform an inherently continuous problem of finding the probability formulas Pn,k (ε) into the category of discrete and combinatorial formulas. The efficiency of the proposed methods gives hope for further progress in solving the main problem, until finding its general analytical solutions. ACKNOWLEDGMENTS This work was partially supported by the Russian Foundation for Basic Research, project no. 16-0100313, the Russian Academy of Sciences (project no. 224 of the RAS Presidium Program I.5P), and the Siberian Branch of the Russian Academy of Sciences (project no. 24/2015).
REFERENCES 1. E. Parzen, Modern Probability Theory and Its Applications (John Wiley and Sons, New York-London, 1960), p. 464. 2. S. S. Wilks, Mathematical Statistics (Wiley, New York, 1962). 3. A. L. Reznik, V. M. Efimov, A. A. Solov’ev, and A. V. Torgov, “On reading probability of random point objects under limit number of threshold levels of scanning aperture,” Avtometriya 50 (6), 61–68 (2014). 4. A. L. Reznik and V. M. Efimov, Computer Analytic and the Catalan Generalized Numbers in the Problems on Recording the Random Discrete Objects (Siberian Branch RAS, 2013) [in Russian]. 5. A. L. Reznik, V. M. Efimov, A. A. Solov’ev, and A. V. Torgov, “On the reliable readout of random discrete-point structures,” Pattern Recogn. Image Anal. Adv. Math. Theory Appl. 25 (1), 84–88 (2015). 6. A. L. Reznik, V. M. Efimov, and A. A. Solov’ev, “On the reliable readout of random discrete-point structures,” Avtometriya 47 (1), 10–16 (2011). 7. A. L. Reznik, V. M. Efimov, A. A. Solov’ev, and A. V. Torgov, “Catalan generalized numbers in the problems for processing the random discrete images,” Avtometriya 47 (6), 11–16 (2011). 8. A. L. Reznik, V. M. Efimov, and A. A. Solov’ev, “The way to estimate the reading probability of random discrete images by using computer analytic means,” Vestn. Novosib. Gos. Univ. Ser. Fiz. 5 (2), 104–110 (2010). 9. A. L. Reznik, “Computer simulation of continuous reading for images with discrete structure,” Avtometriya, No. 6, 3–6 (1981). 10. D. Andre, Solution directe du probleme resola par M. Bertrand, C.R. Acad. Sci., Paris 105, 436–437 (1887). 11. I. M. Gessel and D. Zeilberger, “Random walk in a Weyl chamber,” Proc. Amer. Math. Soc. 115 (1), 27– 31 (1992). 12. A. L. Reznik, A. A. Solov’ev, and A. V. Torgov, “Program-combinatory approach for solving problems on proper reading the random point images,” Avtometriya 52 (2), 20–27 (2016).
Translated by O. Pismenov Aleksandr L’vovich Reznik. Born in 1948. Graduated from the Novosibirsk State University in 1969. Received candidate’s degree in 1981 and doctoral degree in 2006. Head of Laboratory of the Institute of Automation and Electrometry of the Siberian Branch of the Russian Academy of Sciences. Scientific interests: analytical and numerical methods for solving complex probabilistic problems using computer technology. Author of 95 papers.
PATTERN RECOGNITION AND IMAGE ANALYSIS
Vol. 26
No. 4
2016
ON THE PROBABILITY OF THE FORMATION
Andrei Vladislavovich Torgov. Born in 1959. Graduated from the Novosibirsk State University in 1983. Researcher at the Institute of Automation and Electrometry of the Siberian Branch of the Russian Academy of Sciences. Scientific interests: image processing and digital filtration. Coauthor of 35 papers.
Aleksandr Anatol’evich Solov’ev. Born in 1980. Graduated from the Novosibirsk State University in 2002. Junior Researcher at the Institute of Automation and Electrometry of the Siberian Branch of the Russian Academy of Sciences. Scientific interests: image processing and mathematical statistics. Coauthor of 15 papers.
PATTERN RECOGNITION AND IMAGE ANALYSIS
Vol. 26
719
No. 4
2016