Vol.1
No.2
J. of Compt. Sci. & Technol.
1986
A l m o s t Optimal D y n a m i c 2-3 Trees Li Wanxue (~-Y~@) (~,ankai University, Tian,iin) Received November 8, 1983.
Abstract T h i s p a p e r p r e s e n t s a p r i n c i p l e t o c r e a t e A l m o s t O p t i m a l D y n a m i c a l 2-3 t r e e s b a s e d o n t h e t h e o r y o f M i l l e r e t a l . , ['U a n d g i v e s a s e a r c h i n g a l g o r i t h m , a n i n s e r t i o n a l g o r i t h m a n d a d e l e t i o n a l g o r i t h m f o r t h e s e 2-3 t r e e s . E x p e r i m e n t a l r e s u l t g i v e n in t h i s p a p e r i n d i c a t e s t h a t t h e s e 2-3 t r e e s h a v e v e r y g o o d p e r f o r m a n c e a t n o d e - v i s i t c o s t . W e d i s c u s s a s y m p t o t i c p r o p e r t y o f t h e 2-3 t r e e s a s N --* co, a n d e v a l u a t e i t s a p p r o x i m a t e h e i g h t , h = 1og2.45(:'r + 1), w h e r e :V is t h e n u m b e r o f n o d e s o f a 2-3 t r e e . F i n a l l y , t h i s p a p e r a n a l y s e s t h e t i m e c o m p l e x i t i e s o f t h e a l g o r i t h m s , w h i c h a r e O(log2 ~5(1~ + 1)).
The 2-3 trees is a kind of very important balance trees. Its definition and property are described in [1-3]. Miller et al. developed the concept cf an optimaI 2-3 trees on nodevisit cost in [4], and presented a linear algorithm for these trees on statically ordered set of keys. But, in many eases, these data are dynamical and disordered. Thus, the use of algorithm in [4] is limited. In this paper, we discuss problems which create an Almost Cost-Optimal Dynamical (ACOD) 2 4 trees of height h = Iog2.,,5(N + I}, by w a y ' o f limiting the increment of tree height. 1.
Definition
A 2-3 tree is a rooted, oriented tree in which every, nonleaf node contains either 1 or 2 keys, and has either 2 or 3 sons and all the leaves are at the same level. The keys in the left (or right) subtree rooted at a given node are less (or greater respectively) than those resident in the node, and the keys in the middle subtree are greater than the first key and less than the second in the node. The root of a 2-3 tree is said to be at level 0. The sons of a node at level l are said to be at level l + 1. The depth of the tree, denoted as d, is the level of its leaves, denoted as h. "['he height of a 2-3 tree is d-1. Let :q denote the number of 1-key nodes at level l, fll denote the number of 2-key nodes at level l. We define d-1
c,,st(7~ = Z (l + 1)(~, + 2/~,), l=l
as the node-visit cost of a 2-3 tree T.
No.2
J.of Compt. Sci. & Technol.
6[
The 2-3 tree T, containing N keys is cost-optima! if its cost is minimum among all Nkey 2-3 trees. In other words, if T is an arbitrary N-key 2-3 tree, then cost(T.) ~< cost(T). The following two properties have been proven in [-4]: i) Let T and T' be the 2-3 trees, both containing k keys, if h(T) < h(T'), then cost(7") < cost(T'). ii) If a 2-3 tree has 1-key node at level i, and two (or more) 2-key nodes at level j > i, then it is not cost-optimal. According to the two properties, we develop a principle of creating almost costoptimal dynamical (ACOD) 2-3 trees. The following assumptions are made. i) The keys are given dynamically and randomly. In generating a 2-3 tree, the height of the tree is as low as possible. In order to decrease the number of splitting nodes, limit the increment of height, and make the three sons of every node have, if possible, 2keys. ii) The case that one father node contains 1 key and two son nodes contain 2 keys does not occur. iii) When this tree is generated or updated, all reconstructions must result in a 2-3 tree.
2.
Algorithms
In this section, we discuss the algorithms of searching, inserting and deleting keys. These algorithms reflect the ACOD principle mentioned above. The process of generating dynamically 2-3 tree is that all keys are successively inserted to the tree. Updating a tree is that an old key is deleted and a new key is inserted. In both deleting and inserting, we must first search the tree to find the location where a key is deleted or inserted. Before discussing these algorithms, we first give the data structure of a 2-3 tree node (using Pascal data type) as follows: Type Nodeptr = T Node Node = Record KI: integer; lp, rp: Nodeptr; Case NK: 1-.-2 of
1: 2:
(
{first key} {left, right subtree} {No. of keys at a node}
);
(K2: integer rap: Nodptr) END
{second key} {middle subtree}
Because the leaf node does not contain any key, it is a dummy node. Thus the values of lp, rp and rap pointing to the leaf node are nil. The algorithms use a stack to keep the locations of nodes encountered in searching and the directions along the left, middle or right branches at the node. The data type of an element in the stack is
J. of Compt. Sci.& Techno].
62
Vot. l
ENTRY = Record p: Nodeptr: i: 0 . - . 3 {No. of key} END The variable declaration of the stack is Var stack: array 1.-. n of ENTRY; t: ENTRY The operators on the stack are: p u s h ( t ) - - s t o r e the value of variable t into the stack top; p o p ( t ) - - r e t u r n the value of the stack top to the variable t and remove the element of stack top.
2.1.
Algorithm Search (r, K, B)
Find the location of a node containing the given key K from the tree with root r. I l K is found, then B = true, t.p points out the location of this node, t.i is the number of the key which equals K. If K is not found, then B = false, the stack stores the searching path from the root to a leaf indicating the locations of the nodes and the direction.
2.2.
Algorithm Insert (r, K, T)
Insert a new key K in the 2-3 tree with root r. First, call search (r, K, T), if T = TRUE, then key K has already been contained in the 2-3 tree, which is an error. Otherwise, the node p, where the key K is inserted, has been stored in the stack top, it is a node at the lowest level. Three cases should be considered.
Case
1.
If p = nil, then this 2-3 tree is empty, the new key K is the first key on the 2-3 tree, the node containing K becomes the root of the 2-3 tree.
Case
2.
If n o d e p has only one key, then put the new key K on the nodep. Let q be tile father node of the nodep. I f q is not dummy, q. N K = 1, a n d f . N K = 2, w h e r e f i s the brothel of node p, then the reconstruction, is shown in Fig. 1.
q
Pl P~ P~
P4 P~ P,~
Fig. 1
P~ p: p~ p,
p,, p,~
No.2
J.of Compt. Sci. & Technol.
63
Case 3. If node p has contained 2 keys, then the new key K will logically be inserted into the node p, making node p illegal. The necessary reconstruction is as follows: i) If n o d e p does not have father node, then this node is split into three nodes, the middle key goes up as father node (the generated new root), the other 2 keys are in the two son nodes, as shown in Fig. 2.
P~ P~ P:~ P4 Pl
P'-' Ps P4
PL P~ P:~ P: P:, P,~
Fig. 2
Pl P~ P:~ P4
P~
P~;
Fig. 3
ii) If in father node q, q. N K = 1, then the node p is split, the middle key goes up into father node, the other two keys are in the two son nodes, so the node q has three son nodes, as shown in Fig. 3. iii) If q. N K = 2, and there are at least a brother n o d e f o f n o d e p , such t h a t f . N K = 1, then the reconstruction is as shown in Fig. 4.
Pl P~ P, P~ P5 P,~ P7 p~
P L Pz P3
P4 Ps P~ P7 P~
ta)
P~ P~ p:~ p.~ P.~ P~ P7
Ps P~
Pl P2 P3 P4 P5 Ps
Fig. 4
P7 Ps P~
64
J. of Compt. Sci. & Technol.
Vol.1
There are other four similar cases. iv) If the father node and brother nodes all have N K = 2, then n o d e p will be split, middle key goes up into its father node q, other two keys are in the two new son nodes of node q, the father node q becomes illegal which is a node with three keys. The process of reconstruction goes up a level p: = q, as shown in Fig. 5.
Pl
P2 P,~ P.~
Ps
P~ P7
Ps P~ Pm
P] P'-' P~ P.~
Ps Ps
P7 Ps P, Pm
(a)
(b) Fig. 5
2.3. Algorithm Delete (r, K, T) Delete a given key K on the 2-3 tree with root r. First, call search (r, K, T) to find the node q containing the given key K, if q is not a node at the lowest level, then call (rs, K, B), (if q. N K = 2, and q. K f = K, then rs is q. rap, otherwise rs is q. rp). Find a node p at the lowest level, in which the first key K' is the least of those greater than given K. The key K' replaces K in the node q, not to delete internal node key K, but to delete the key K' in the lowest level node. The reconstruction path of deleting key in the lowest level node has been stored in the stack by two (or one) call of algorithm search. To delete key in the lowest level node p., two cases are considered.
Case 1. If the key is contained in a 2-key node, then this node is changed into Lkey node. Case 2. If p. N K = 1, after the key is deleted from the node p, node p has no key, this is illegal. The necessary reconstruction is as follows: i) Ifp has no father node, then the 2-3 tree reduces a level, the son node o f n o d e p is a root of 2-3 tree.
No.2
J.of Compt. Sci. & Technol.
65
ii) L e t f b e the father node of nodep,f. NK = 2, then the reconstruction proceeds between father and son nodes, as shown in Fig. 6:
Pt
P_~ P3 P4 Ps Ps
PJ Pz P3 P~ P5 Ps
(,)
Pz
Pz P3
P~ Ps P,~
P~ Pz P3 P4 Ps P6
(b)
P~
Pz P3 P4 Ps
PL Pz !33 P' P~
(r Fig. 6
iii) I f f . N K = I , and for the brother node q of node p, q. N K = 2 , then reconstruction is as in Fig. 7.
Pl
P~ P:~ P'~
PJ P~ P:* P4
r'~.7
Pl
P:~ Pz
rig. 8
Pl
P:' P3
66
J. of Compt. Sci. & Technol.
Vol.1
iv) Iff. N K = 1 and of. ArK = 1 then father and son nodes are combined into a son node," as shown in Fig. 8. The father node becomes illegal node without any key. The process of reconstruction goes up a level, p: = f . Repeat i)-iv) of the algorithm delete. The 2-3 tree created by these three algorithms is ACOD 2-3 tree.
3. 3.1
Experiment and Analysis
Preliminary Experimental Result
We have created both a random 2-3 tree and an ACOD 2-3 tree with pseudorandom numbers generated by computer as keys. Whenever inserting a key, we calculate the node-visit cost for the two trees. We construct Table 1 for the node-visit cost of four 2-3 trees which are of different kinds and all contain the same keys. The step lengths of keys are increasing. In the Table, the values for the cost optimal 2-3 tree and pessimal 2-3 tree are the result calculated from mathematical formula, others are experimental results.
Table 1. The N o d e - v i s i t Costs o f V a r i o u s 2 4 T r e e s w i t h t i n S a m e K e y s
Optimal
aCOD
Random
Worst
10
25
25
26
26
25 5o
65
65
87
88
166
173
221
2~7
8O
284
351
352
426
115
382
504
611
626
150
640
662
796
936
190
830
840
999
1216
235
1059
1267
1469
1510
280
1421
1512
1758
1979
33o 39o
1701 2030
1781 2104
2074 2458
2399 2876
455
2387
2453
2860
3375
52O
2762
3315
2774
4184
From Table 1, we can see that the node-visit cost of ACOD 2-3 tree is between the cost-optimal and random 2-3 trees and is near the cost of the cost-optimal 2-3 tree. In addition we see that the maximum numbers of keys for different 2-3 trees with the same height can differ very much, which are listed in Table 2.
No.2
J.of Compt. Sci. & Technol. T a b l e 2. M a x i m u m Height 1 2 3 4 5 6 7
67
N u m b e r o f K e y s f o r V a r i o u s 2-3 T r e e s w i t h t h e S a m e H e i g h t
Optimal
ACOD
Random
Worst
2 8 26 80 242 728 2186
2 8 25 60 186 458 ?
2 7 24 40 113 225 502
2 6 14 30 62 126 254
In Table 2, the optimal and the worst results are calculated from mathematical formula, and others are experimental results. If the various levels of a tree keep balance and growing, then the n u m b e r of keys N and tree height h will satisfy the relation a h = N + 1, i.e. h = log~(N + 1). For col.3 of Table 2, when h = 5 or 6, then N + 1 = 187 or 459, we can deduce a = 2.85 or 2.70, respectively. Also, for the col.4, h = 6 or 7, then N + 1 = 226 or 503, deduce a = 2.47 or 2.43, respectively. Therefore, we can see that the ACOD 2-3 tree is more like the costoptimal 2-3 tree as far as the relation between the height h and the n u m b e r of keys N is concerned.
3.2. Asymptotic Analysis We use a technique of fringe analysis, which is first presented by Yao [31. Consider the internal nodes on the lowest 2 level of ACOD 2-3 tree. There are 7 type subtree structures, as shown in Fig. 9.
type I
type2
type3
type4
type5
type6
type7
Fig. 9
Tis of class (x 1, ..., XT), if there are xi 2-height subtrees of type i for each i. ACOD 23 trees with no fewer than 3 keys can be classified according to 2-height subtrees. Let ~ N ) be an N-key ACOD 2-3 tree of class (xt, ..., xT). Because the number of leaf nodes is N + 1 , so 4 x 1 -4- 5 x 2 A- 6 x 3 -4- 6 x 4 +
7x s + 8X 6 31- 9x v = N + 1.
(1)
We use PN(Xt,..., XT) to denote the probability for an n-key ACOD 2-3 tree to be of class (xl,--.,XT). For each i(1 ~
A,=
E x I .... , x 7
x,p~xl,'",xT),
68
J. of Compt. Sci. & Technol.
Vol.1
Ai(N ) is the average value of x, for all T(N). To compute Ai(N)., we will insert a new key into any ( N - 1)-key ACOD 2-3 tree T~N) of class (x 1, ...,xv) to make it an N-key ACOD 2-3 tree ~ N ) of class (x 1, ...,x,). Because the new key always goes into a leaf node, the probability of adding a new key to the same leaf node is 1/N + 1. According to the. inserting algorithm, the 2nd type subtree can only become the 4th-type subtree, to insert a new key into a 7th-type subtree makes it split into two subtrees. Table 3 lists all the cases of changes from TIN - 1) to TIN). The right column lists its probability of occurrence. T a b l e 3. T r a n s i t i o n U n d e r a R a n d o m
I n s e r t i o n : a n ( N - - 1)-key A C O D 2-3 T r e e o f C l a s s (xl, ...,xT) B e c o m e a n N-key A C O D 2-3 T r e e of C l a s s of C l a s s (x~, .-.,x7)
x'l
x' 2
x'3
x',t
x'~
x' 6
x'7
xx - x
x2 +1
x3
x4
x5
x6
x7
X1
3s2 - I
X3
~4 + '
~5
X6
X7
Probability 4xl/N 5x2/N
X!
X2
X3 - 1
X~
X5 + 1
X6
~7
6x3/N
xt
x2
xs
x,- 1
xs +1
x6
x7
6x,/N
Xl
x2
xs
x4
xs
x6
x7
7xJN
xl
x2
x3
x4
x5
x6- x
x7+ t
8x6/N
X I
3r
X3
YgA
X$
3~6
X7 - 1
XI
X2 * 2
X3
X4
X$
176
X7 - 1
-
~
+
x
3X7/N
By Table 3, compute
A,(U) =
(xl - 4x~/N + 6xv/N)PN_ ,(x,, .--,xT)
2 x 1 , .... x 7
= A,(N-1)-4Ai(N-
1)/N + 6AT(N)/N.
Similarly we can compute A2(N) = A2(N - 1) + 4 A t ( N - 1)/N- 5 A 2 ( N - 1)/N + 6 A v ( N - 1)/N,
A (S) = A 3 ( N -
1 ) - 6As(N- 1)IN + 6Av(N - 1)/N,
A4(N) = A 4 ( N - 1) + 5A2(N- 1)/N -- 6A4(N- 1)/N,
A,(U) = A s ( N -
1) + 6 A 3 ( N - 1)/N + 6 A 4 ( N - 1)/N- 7As(N- 1)/N,
A6(S) = A 6 ( N -
1) + 7As(N - 1)/N- 8 A 6 ( N - 1)/U,
= A v ( N - 1 ) + 8 A 6 ( N - 1)/N- 9 A T ( N - 1)/N. Let A(N) be the 7-component column vector (Ai(IV)), then
A(N) = ([ + 1D) A(N - 1),
(2)
No.2
J.of Compt. Sci. & Technol.
69
where I is the 7 • 7 identity maxtrix and D is given by -4
D=
0
0
0
0
0
6
4
5
0
0
0
0
6
0
0
0
0
0
6
0
5
0
0
0
0
0
0
6
6
--7
0
0
0
0
0
0
7
--8
0
0
0
0
0
0
8
-6 -6
-9
To solve A(N) from (2), we define
ai(N ) = A,(N)/(N + 1).
(3)
Let a(N)denote column vector a ( N ) = (ai(N)) , then (2) can be written as
(
,
)
a(N) = I + - ~ - - ~ ( D - I) a ( N - 1).
(4,)
To solve (4), the following Lemma is useful.
Let G be a t x t real matrix with simple eigenvalues 2o, 21, "", 27_ 1, where 20 = O, and Re 2~_ 1 ~ Re 2,_ 2 ~ - . ~ Re 2~ < O. I f v(1), v(2), ..., v{N),-., is a Lemma.
1
sequence of t-component vector satisfying v(j) = (1 + ] - - ~ G ) v ( j - 1), then there exists a vector u such that (i) Gu = O, (ii) I(v(N))i - nil < CN ne~l for some constant C and all i, N, where (v(N))i, u~ denote the i-th component of v(N) and U, respectively. The proof of the Lemma can be found in Knuth (2, pp. 6 7 9 ~ 6 8 0 , answer to ex. 10). The characteristic polynomial of D-I is - 2(2 + 7)(25 + 4524 + 83523 + 817022 + 424832 + 95604), the eigenvalues of D-Iare 0, - 6 . 8 8 + 5.775i, - 7, - 8.37 + 3.41i, 14.499. Thus D-[ satisfies the conditions on G in Lemma, therefore, there exists a vector u = (ui) such that -
l a i ( N ) - uif < Co N-6.ss, for some constant Co where u
(5)
satisfies (D - I)u = 0. In addition, (1) and (3), as N ~ oe, can lead to the equation
4u~ + 5uz + 6u3 + 6u,L + 7u 5 + 8u 6 +
9u 7 =
1.
(6)
Therefore, solving equations (5) and (6), we obtain u, and then the average numbers A(N) of subtrees of various types.
70
J. of Compt. Sci. & Tcchnol.
Vol.1
A~(N) = 0.021(N + 1)
A E(YV) = 0.032(N + 1)
A3(N) = 0.01Z(N + 1)
A4(N ) = 0.023(N + 1)
(7) As(N ) = 0.028(N + l)
A6(N ) = 0.022(N + 1)
A,(N) = o.ola(N + i) Therefore, we can obtain the following results: Theorem 1. The numbers of keys Kh(N), Kh- ~(~ at the lowest level and the second lowest level of an N-key ACOD 2-3 tree are 0.5933(N + 1) and 0.2485(N + 1), respectively. Proof. Because Kh(N) = 2Ax(N) + 3A2(N) + 4A3(N) + 3A4(N) + 4As(N) + 5A6(N) + 6Av(N) and Kh_ l(N) = AI(N) + A2(N) + A3(N) + 2(A4(N) + As(N) + A6(N) + Av(N)) and (7), we can deduce these results. Corollary, The number of ke)s K(N) at the lowest two levels of an N-key ACOD 2-3 tree is 0,8418(N + 1). We.model the relation between the height h and the number of keys N for an N-key ACOD 2-3 tree by a h ~ N + l. From these results of Theorem 1 and its corollary, we can derive a in the following three ways: i)
because
1/we ob.n 1)/o.,(1 1)we ob.i.o.239, \
.>
a~ \
-
-
/
\
-~ \
)
/
0.8418(N + 1 ) = (N + 1) 1 - ~ - y , we obtain a ~ 2.52. The three different a's obtained show that a h = N + 1 is only an approximate expression. We use their average value, a ~ 2.45. Therefore, the asymptotic height of an N-key ACOD 2-3 tree is h ~ log2.4s(N + 1). Ref.[-4] indicates the height of random 2-3 tree is lOgT/3(N + 1) ~ logz.a3(N + 1), which is reduced from the result of Ref.[3]. Our analysis is an improvement of the methods in Ref. [3].
3.3, Performance Analysis of Algorithms i) Searching algorithm is O(h), i.e. O(log2.45(N + 1)). ii) The time of inserting = searching time + reconstruction time. As reconstruction proceeds level by level upward, at each level, the level at most 4 comparisons is made. In this ACOD 2-3 tree, 84% keys reside in the lowest two levels, therefore, inserting these
No.2
J.of Compt. Sci. & Technol.
71
84% keys, the reconstruction requires at most 4 ~ 8 comparisons. For the other 16% keys, the time of reconstruction is not more than 4 times of h. In a word, the time of inserting an ACOD 2-3 tree is O(h). The time of dynamical generating an ACOD 2-3 tree containing N random keys is not more than O(N" h), i.e. O(Nlogz.,,5(N + 1)). iii) The time of deleting = the searching time + reconstruction time of deleting, we can easily see that is O(log2.,5(N + 1))too, Therefore, this ACOD 2-3 tree is better than the random 2-3 tree in the case of frequently updating, and even more better for more searchings than updatings.
References [1] [2] [3] 1-4]
A.V. Aho, J. E. Hopcroft and J-:.D. Ullman, The design and analysis of computer algorithms, Addison-Wesley, 1974. D. E. Knuth, The art of computer programming, Vol.3: Sorting and Searching, Addison-Wesley, 1973. A. chi-ehih Yao, On random 2-3 tree, Acta Informatica, 9(197B), 159-170. R. Miller, N, Pippenger, A, Roseaber~ and L. Snyder, Optimal 2,3-tree, SIAM J. Comp., 8:1(1979), 42--59.