c Pleiades Publishing, Ltd., 2017. ISSN 1990-4789, Journal of Applied and Industrial Mathematics, 2017, Vol. 11, No. 1, pp. 40–48. c V.G. Gasenko, 2017, published in Sibirskii Zhurnal Industrial’noi Matematiki, 2017, Vol. XX, No. 1, pp. 21–30. Original Russian Text
The Differential Fourier Transform Method V. G. Gasenko* Kutateladze Institute of Thermophysics, pr. Akad. Lavrent’eva 1, Novosibirsk, 630090 Russia Received February 1, 2016
Abstract—We suggest the two new discrete differential sine and cosine Fourier transforms of a complex vector which are based on solving by a finite difference scheme the inhomogeneous harmonic differential equations of the first order with complex coefficients and of the second order with real coefficients, respectively. In the basic version, the differential Fourier transforms require by several times less arithmetic operations as compared to the basic classical method of discrete Fourier transform. In the differential sine Fourier transform, the matrix of the transformation is complex, with the real and imaginary entries being alternated, whereas in the cosine transform, the matrix is purely real. As in the classical case, both matrices can be converted into the matrices of cyclic convolution; thus all fast convolution algorithms including the Winograd and Rader algorithms can be applied to them. The differential Fourier transform method is compatible with the Good–Thomas algorithm of the fast Fourier transform and can potentially outperform all available methods of acceleration of the fast Fourier transform when combined with the fast convolution algorithms. DOI: 10.1134/S1990478917010057 Keywords: discrete Fourier transform, fast Fourier transform, harmonic differential equation, Good–Thomas algorithm, Winograd method
INTRODUCTION Given a complex vector Y = (y0 , y1 , . . . , yN −1 ), the direct and inverse discrete Fourier transforms (DFT) with a complex vector Z = (z0 , z1 , . . . , zN −1 ) are defined as follows: zm =
N −1
−1 −(N −1) yk a−k , m = y0 + y1 am + · · · + yN −1 am
(1)
k=0 N −1 1 1 −1 z0 + z1 am + · · · + zN −1 aN zm akm = , (2) m N N m=0 √ where am = exp(2πmj/N ), and j = −1 is the imaginary unit. From now on, the classical DFT (1) is called the basic DFT [1].
yk =
For a continuous periodic function y(x) with period L which is, in general, analytic complex function of real argument, the Fourier series is defined as y(x) =
∞ m=0
2πmjx/L
z˜m e
,
1 z˜m = L
L
y(x)e−2πmjx/L dx.
(3)
0
Here the first N/2 coefficients are connected with (1) by the relation z˜m = zm /N provided the values of function (3) at the mesh nodes y(xk ) = yk coincide with the entries of (2), where xk = Lk/N for k = 0, 1, . . . , N − 1. *
E-mail:
[email protected]
40
THE DIFFERENTIAL FOURIER TRANSFORM METHOD
41
Consider the Cauchy problem for the inhomogeneous first order differential equation with complex coefficients and the zero initial condition dSm − jαm Sm = y(x), Sm (0) = 0, (4) dx whose solution x jαm x e−jαm x y(x ) dx (5) Sm (x) = e 0
for x = L and αm = 2πm/L is connected with the value of the terms of (3); i.e., Sm (L) = z˜m L. If we solve (4) by the finite-difference methods; then, obviously, its solutions on the uniform mesh xk = Lk/N , k = 0, 1, . . . , N − 1, is connected with the coefficients of the DFT. We will demonstrate further in the paper that this exact relation exists, and the DFT coefficients can be obtained by a different, other than (1) method, which we will call the differential Fourier transform method (DFTM). 1. AN ALGORITHM FOR THE DISCRETE DIFFERENTIAL SINE FOURIER TRANSFORM Consider the following explicit finite-difference scheme of the second order of accuracy with respect to O(h2 ) to approximate (4): Sm,k − Sm,k−2 2πm − 2j sin Sm,k−1 = yk , k = 0, 1, . . . , N − 1, (6) 2h N where h is the meshsize with respect to x. If we define the translation operator r for the finite-difference scheme (6) as Sm,k = rSm,k−1 then the stability condition |r| ≤ 1 assumes |2h sin(2πm/N )| ≤ 1 or 2h ≤ 1. Putting 2h = 1 and 2j sin(2πm/N ) = am − a−1 m , we obtain the following stable recurrence formula for calculating the integral sums Sm,k : k = 0, 1, . . . , N − 1. (7) Sm,k = am − a−1 m Sm,k−1 + Sm,k−2 + yk , In what follows, we put Sm,−2 = Sm,−1 = 0. After some simple transformations, we can obtain from (7) the explicit connection between Sm,k and yk for m = 0, 1, . . . , N − 1: Sm,k =
k −(k+1−n) ak+1−n + (−1)n+k am m
am + a−1 m
n=0
(8)
yn .
For clarity, let us write out the expressions for the first integral sums resulting from (7): Sm,1 = am − a−1 Sm,0 = y0 , m y0 + y1 , −1 Sm,2 = a2m − 1 + a−2 m y0 + am − am y1 + y2 , 2 −3 −2 −1 Sm,3 = a3m − am + a−1 m − am y0 + am − 1 + am y1 + am − am y2 + y3 ,
(9)
and for the last two integral sums following from (8): −(N −1)
Sm,N −2 =
−1 + (−1)N a aN m m −1 am + am
−(N −2)
−2 − (−1)N a aN m m y1 −1 am + am a3 + a−3 a2m − a−2 m m + ··· + m y + yN −3 + yN −2 , N −4 am + a−1 am + a−1 m m
y0 +
−(N −1)
Sm,N −1 =
N −N −1 + (−1)N a aN aN m m − (−1) am m y + y1 + . . . 0 −1 −1 am + am am + am a2m − a−2 a3 + a−3 m m y + yN −2 + yN −1 . + m N −3 am + a−1 am + a−1 m m
JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS
Vol. 11 No. 1
2017
(10)
42
GASENKO
From (9) and (10), we find the connection of zm with Sm,k : zm = am Sm,N −1 + Sm,N −2 . The above algorithm of calculating the Fourier coefficients of a complex vector Y will be called the differential sine Fourier method (DSFM) . Taking into account the oddness of N and the two types of evenness, when N is divisible by 2 or 4, this algorithm finally assumes the form k = 0, 1, . . . , N − 1, Sm,k = 2j sin 2πm/N Sm,k−1 + Sm,k−2 + yk , zm = am Sm,N −1 + Sm,N −2 ,
m = 0, 1, . . . , N − 1,
if N is odd,
m = N/4, . . . , 3N/4, N = 4t, if N is even, zm = am Sm,N −1 + Sm,N −2 , zN/2−m = zm − 2 cos 2πm/N Sm,k−1 , m = (N + 2)/4, . . . , (3N − 2)/4, N = 2(2t + 1).
(11)
By counting the number of real multiplications M and additions A in the operations with a complex vector Y in the DSFM algorithm (11) for odd N , we respectively obtain M = 2(N − 1)(N − 2) + 8,
A = 4(N − 1)(N − 2) + 10.
(12)
In the case of N even, since sin(2πm/N ) = sin(2π(N/2 − m)/N ) and the two Fourier coefficients zm and zN/2−m are calculated with the help of one pair of the integral sums Sm,N −1 and Sm,N −2 , the number of operations is reduced to M = N 2 − 4,
A = 2N 2 − 4.
(13)
Comparing the number of real operations of algorithm (1) given in [1]: M = 4(N − 1)2 A = 2(2N − 1)(N − 1),
(14)
with (12) and (13), we can see that for N 1 the DSFM algorithm for odd N is twice more efficient than the basic one according to the number of multiplications, while for N even, by four times. The exact numbers of operations for N small will be given in the subsequent sections. The finite-difference scheme (6) is not the only possible for approximation of (4). Consider the following implicit absolutely stable finite-difference scheme of the second order accuracy with respect to O(h2 ): Sm,k + Sm,k−1 Sm,k − Sm,k−1 − 2jαm = y(xk−1/2 ), h 2
k = 0, 1, . . . , N,
(15)
with the recurrence relation Sm,k =
1 + jαm h h Sm,k−1 + y(xk−1/2 ). 1 − jαm h 1 − jαm h
After introducing the notation , 1 ± jαm h = 1 + α2m h2 e±j arctan αm h = ra±1/2 m
hr −1 a1/2 m y(xk−1/2 ) = yk
we obtain the algorithm different from (11) of calculating the DFT of a complex vector: k = 0, 1, . . . , N, Sm,k = am Sm,k−1 + yk , m = 0, 1, . . . , N − 1. zm = Sm,N ,
(16)
Here, we also have S−1 = 0 and yN = 0. Despite the simplicity of algorithm (16) and the exact implementation of the idea of (5), this algorithm has no advantages in terms of the number of operations as compared to the basic algorithm (1). Its only positive feature is the multiplication by the constant am rather than by the variable a−k m as in (1). Algorithm (16) is not under consideration further. Unfortunately, the advantage of the DSFM algorithm (11) is diminished by the condition of evenness of N . Moreover, in the operations with real vectors, the integral sums remain complex and require JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS Vol. 11 No. 1 2017
THE DIFFERENTIAL FOURIER TRANSFORM METHOD
43
extraction of real and imaginary parts from them, for example, in the form Am
N/2−1 N/2−1 Sm,N + Sm,N −2 2πm2k 2πm(2k + 1) = y2k − j y2k+1 , = cos sin 2 N N
Bm
N/2−1 2πm 2πm(2k + 1) 2πm2k Sm,N −1 = y2k+1 − j y2k , = cos cos sin N N N
k=0
k=0
N/2−1
k=0
zm = Am + Bm ,
k=0
m = 0, 1, . . . , N/2.
The above relations are useful for calculating the DFT of real vectors with additional symmetry. 2. AN ALGORITHM FOR THE DISCRETE DIFFERENTIAL COSINE FOURIER TRANSFORM By analogy to (4), we consider the Cauchy problem for an inhomogeneous differential harmonic equation of the second order with real coefficients and the zero initial conditions: d2 Sm + α2m Sm = y(x), Sm (0) = Sm (0) = 0. dx2 The general solution of (17) has the form x x sin(αm x) cos αm x y(x ) cos(αm x ) dx − y(x ) sin(αm x ) dx . Sm (x) = αm αm 0
(17)
0
On the finite intervals x = L = π(2m + 1)/αm and x = L = π(4m + 1)/(2αm ), this solution is also connected with the coefficients of the Fourier series (3) of the continuous function y(x) on the interval (0, L) and, hence, with the DFT (1). Let us find the relation of the finite-difference solution (17) with the DFT. Consider the following explicit finite-difference scheme of the second order of accuracy with respect to O(h2 ) for approximation of (17): Sm,k − 2Sm,k−1 + Sm,k−2 + α2m Sm,k = y(xk ), k = 0, . . . , N − 1. (18) h2 Analyzing (18) for the stability condition similarly to (6), we obtain the condition 1 − α2m h2 /2 ≤ 1. 2 Denoting 2 − α2m h2 = am + a−1 m , which satisfies the stability condition, and yk = h y(xk ), we obtain the following recurrence relation for calculating the integral sums Sm,k : k = 0, . . . , N − 1. (19) Sm,k = am + a−1 m Sm,k−1 − Sk−2 + yk , The recurrence formula (18) admits the explicit solution with respect to yk : Sm,k =
k −(k+1−n) ak+1−n − am m yn . −1 a − a m m n=0
(20)
For clarity, we separately write the expressions for the first integral sums resulting from (19): Sm,1 = am + a−1 Sm,0 = y0 , m y0 + y1 , −1 Sm,2 = a2m + 1 + a−2 m y0 + am + am y1 + y2 , 2 −3 −2 −1 Sm,3 = a3m + am + a−1 m + am y0 + am + 1 + am y1 + am + am y2 + y3 , and for the last integral sums following from (20): −(N −1)
Sm,N −2 =
−1 − a aN m m am − a−1 m
−(N −2)
y0 +
−2 − a aN m m y1 + . . . am − a−1 m a2m − a−2 a3 − a−3 m m y + yN −3 + yN −2 , + m N −4 am − a−1 am − a−1 m m
JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS
Vol. 11 No. 1
2017
(21)
44
GASENKO −(N −1)
Sm,N −1 =
−N −1 − a aN aN m m − am m −1 y0 + am − am am − a−1 m
+
y1 + . . . a2m − a−2 a3m − a−3 m m y + N −3 −1 yN −2 + yN −1 . am − a−1 a − a m m m
(22)
Hence, the algorithm of computing the Fourier coefficients of a complex vector on the basis of (19)–(22), which we will call the differential cosine Fourier method (DCFM), assumes the form k = 0, . . . , N − 1, Sm,k = 2 cos 2πm/N Sm,k−1 − Sm,k−2 + yk , (23) zN −m = zm − 2j sin 2πm/N Sm,N −1 , zm = am Sm,N −1 − Sm,N −2 , m = 0, . . . , [N/2]. Counting the number of real operations in the DCFM algorithm (23) in the case of N even and odd, we infer M = N 2 − 4, M = (N − 1)(N + 2),
A = 2N 2 − 4
for N even,
A = 2(N − 1) 2
for N odd.
(24)
As it follows from (24), the DCFM algorithm, unlike the DSFM one, requires for the calculation of the integral sums Sm for N 1 four times less multiplications as compared to the basic algorithm (1) for both even and odd N . Moreover, (19) simplifies the DFT operations with the real vector. If the initial vector Y is real then the vector Z = U + jV is Hermitian, i.e., its real part is symmetric: uN −m = um , while the imaginary part is skew-symmetric: vN −m = −vm . Therefore, the DFT of the real vector is limited to its compact representation with m = 0, . . . , N/2 for even N or m = 0, . . . , (N − 1)/2 for odd N . The DFT of the real vector can be calculated directly from (23), extracting the real and imaginary parts from zm = um + jvm ; however, it is easier to expand the calculation cycle for integral sums (23) up to Sm,N for yN = 0. Then, the DCFM algorithm for the real vector assumes the form k = 0, . . . , N, Sm,k = 2 cos 2πm/N Sm,k−1 − Sm,k−2 + yk , (25) 1 vm = sin 2πm/N Sm,N −1 , m = 0, . . . , [N/2]. um = (Sm,N − Sm,N −2 ), 2 Note that the DCFM algorithm for the real vector (25) is of practical significance only for short vectors. On the other hand, the long real vector, after the preliminary processing, is packaged into a complex vector of the twice or four times shorter length [2], provided the initial vector has an additional type of symmetry of the form yk = yN −k or yk = −yN −k . Then, the resulting complex vector is subjected to the complex DFT and, after post-processing, is either the DFT of a real vector or the discrete cosine or sine Fourier transform of some real vector in the presence of the indicated symmetry. This approach [3] is practiced everywhere since the DFT of a complex vector can be accelerated many times by the fast Fourier transform (FFT) considered below in the application of the DFTM. 3. APPLICATION OF THE DFTM TO THE FFT ALGORITHM The above algorithms of the DSFM (11) and DCFM (23), (25) are exact analogs of the DFT of complex and real vectors with at least three-fold gain in the number of operations as compared to the basic DFT, and as such can be used independently. However, of particular interest is the application of the DFTM to the FFT algorithms. The two FFT algorithms are known: the Cooley–Tukey algorithm [4] and the earlier but less popular algorithm due to Hood–Thomas [5–7]. The idea of the FFT is that if N equals the product of two positive integers N = N1 N2 then the number of DFT operations can be significantly reduced by presenting (1) in the form of a multiple series. Let us write k and m as follows: k = k1 N2 + k2 , m = m1 + m2 N1 ,
k1 = 0, . . . , N1 − 1, m1 = 0, . . . , N1 − 1,
k2 = 0, . . . , N2 − 1, m2 = 0, . . . , N2 − 1,
(26)
JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS Vol. 11 No. 1 2017
THE DIFFERENTIAL FOURIER TRANSFORM METHOD
45
as in the algorithm [4]. Then the DFT (1) assumes the form of a double series ⎛ ⎞ N N 2 −1 1 −1 ⎝ zm = zm1 ,m2 = yk1 ,k2 bkN1 m1 N2 ⎠ bkN2 m , k2 =0
(27)
k1 =0
where bN = exp(−2πj/N ) or, which is the same, is reduced to the successive finding of the sum of the series xm1 ,k2 =
N 1 −1
yk1 ,k2 bkN11,m1 ,
zm =
k1 =0
N 1 −1
(m1 +m2 N1 )k2
xm1 ,k2 bN
.
(28)
k2 =0
It is easy to calculate that if the number of complex multiplications and additions in (1) is equal to N 2 then in (23) it is N (N1 + N2 ). Next, repeatedly factorizing N2 , we obtain in the FFT algorithm [4] the number of operations N (N1 + N2 + · · · + NS ) that is smaller by several orders of magnitude. It is easy to see that the DFTM can only be applied to the first sum in (28), and hence, generally, the DFTM algorithm cannot be applied to the Cooley–Tukey algorithm of the fast Fourier transform [4]. The fully functional application of the DFTM algorithm is achieved in the Good–Thomas FFT algorithm [5–7] where N is also decomposed into a product, but a product of quasiprime numbers N = N1 N2 . . . NS so that no pair of factors has common divisors. Indeed, if N = N1 N2 , where N1 and N2 are quasiprime numbers, then for k and m the following representation is possible: k1 = 0, . . . , N1 − 1,
k = k1 N2 + k2 N1 ,
k2 = 0, . . . , N2 − 1,
m1 = 0, . . . , N1 − 1,
m = m1 N2 + m2 N1 ,
m2 = 0, . . . , N2 − 1,
(29)
which follows from the relation bkN = bkN , where k = mod(k, N ) is the remainder of the division of k by N . In view of (29) and relations bkN1 m2 N1 N2 = 1 and bkN2 m1 N1 N2 = 1, the double series (27) assumes the form ⎛ ⎞ N N 2 −1 1 −1 ⎝ yk1 ,k2 bkN11m1 N2 ⎠ bkN22m2 N1 . (30) zm = zm1 ,m2 = k2 =0
k1 =0
Considering in (30) that the indices k1 = mod(k1 N2 , N1 ) and k2 = mod(k2 N1 , N2 ) assume the same series of values as k1 and k2 , though in a different order, the double sum (30) can be written as the successive summation over the equivalent indices k1 and k2 : xm1 ,k2 =
N 1 −1 k1 =0
m k
yk ,k bN11 1 , 1
2
zm =
N 2 −1 k2 =0
m k
xm1 ,k2 bN22 2 .
(31)
Here, the vector Y is obtained from the initial vector Y by the corresponding permutation of entries. Both sums in (31) are the completed DFT of a two-dimensional vector with the dimension of N1 × N2 . Continuing the recursive decomposition of factors, we arrive at the DFT of a multidimensional vector N1 × N2 × · · · × NS with the dimensions of quasiprime numbers. In the literature, this method of calculating the FFT is called the Algorithm of prime factors (APF) [1], not by the names of the authors of this algorithm. Obviously, the APF is fully compatible with the DFTM algorithm, while the gain in the number of operations remains the same. Let us indicate some features of the application of the DFTM in the APF algorithm. Firstly, the representation N = 3 · 4 · N3 · · · NS is the most advantageous because for N1 = 3 almost half of multiplications and additions in the DCFM algorithm degenerate, while for N2 = 4 all multiplications and half of the operations of addition in the DSFM algorithm degenerate as compared with (13) and (23). Secondly, all other factors in the presentation of N are odd, and to calculate the DFT, we should use the DCFM algorithm for all m with the exception of m = 0, where, due to the degeneracy of the multiplication operations and half of the addition operations, the DSFM algorithm is preferable. JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS
Vol. 11 No. 1
2017
46
GASENKO
4. COMPARISON OF THE EFFICIENCY OF THE DFTM ALGORITHM The exact numbers of real operations of multiplications M and additions A in the basic algorithm (1), the DSFM and the DCFM in the Fourier transform of the short complex vectors defined by formulas (14), (13), and (24) are shown in the table. For N = 3 and N = 4, these numbers are adjusted by the number of degenerations of the operations of addition and multiplication in view of sin 0 = cos(π/2) = 0 and 2 cos(2π/3) = −1 in the algorithms (11) and (23). The table also presents the data on the number of operations in the Winograd algorithm [8]. Comparison of the number of DFT operations performed by different methods APF
N
DFTM
Winograd
M
A
M
A
M
A
3
16
20
6
12
4
12
4
0
24
0
16
0
16
5
64
72
28
48
10
34
7
144
156
54
96
16
72
11
400
420
130
240
40
168
13
576
600
180
336
40
186
17
1024
1056
304
576
70
314
1092/1024
76672
88568
25728
51936
7312
36600
T1092
1536
700
446
FFT N = 2n M
A
10248
30728 —
As it follows from the table, the DFTM algorithm is, indeed, superior here to the basic APF algorithm in terms of the number of multiplication and addition operations, on average, by three or two times respectively, but bears no comparison with the Winograd algorithm. The essence of the Winograd algorithm, as well as many others like it which are described in detail in [1, 3], is bringing of the DFT transformation matrix to a cyclic convolution matrix and applying the algorithms of the convolution acceleration. For short vectors, this reduction is possible, and practically it is equivalent to the following matrix form of the DFT: ⎞⎛ ⎞ ⎛ y 1 1 . . . 1 ⎟⎜ 0 ⎟ ⎜ ⎟ ⎜ −(N −1) ⎟ ⎜ −1 ⎟ ⎜ y1 ⎟ ⎜1 a1 . . . a 1 ⎟⎜ ⎟ = CDBY. (32) Z = AY = ⎜ ⎟⎜ ⎟ ⎜ ⎜ . . . . . . . . . . . . . . . . . . . .⎟ ⎜ . . . ⎟ ⎠⎝ ⎠ ⎝ −(N −1) . . . a 1 a−1 y N −1 N −1 N −1 Here, the initial matrix A of the basic DFT (1) with the reduced elements is transformed into the three matrices: The matrix B consists only of the entries 0 and ±1 and generates a new vector, usually, of a greater length than the initial. The matrix D is diagonal, while C consists of the entries 0, ±1, and ±j and forms the final vector Z. The number of operations of the long vector, expressed as the product of quasi-prime numbers N = N1 N2 · · · NS , is calculated in the APF on the basis of the number of real operations of the twodimensional vector N1 × N2 [1]: A = N1 A2 + N2 A1 , (33) M = N1 M2 + N2 M1 , which is recursively generalized on arbitrary dimension. Here, A1 , M1 and A2 , M2 are the numbers of the addition and multiplication operations for the vectors N1 and N2 respectively. In the table, there are JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS Vol. 11 No. 1 2017
THE DIFFERENTIAL FOURIER TRANSFORM METHOD
47
also given the numbers of operations calculated on the basis of (33) for the long complex vector with N = 3 · 4 · 7 · 13 = 1092 and all types of algorithms. For comparison, we also present there the number of real operations in the Cooley–Tukey FFT algorithm of a similar length with N = 210 = 1024. A separate row at the bottom of the table shows the total number of cycles, i.e., the sums of the number of additions and multiplications performed simultaneously in the parallelization of the DFT operations for a vector of length N = 1092 in the APF, DFT, and Winograd’s algorithms, i.e., wherever this is possible. In the general case, to parallelize the computation process means to create its time graph and then to divide the initial data and operations on them between several processors so that some amount of calculations can be carried out simultaneously without any exchange of intermediate results between the processors. For the DFT operations by the APF algorithms, these graphs are shown, for example, in [1, 3]. It follows from them that the DFT of a multidimensional vector, to which the APF algorithm is reducible, can be perfectly parallelized. The essence of parallelization of the DFT operations for the given vector as a multidimensional one is the carrying out, first, the complete DFT with N = 3 for the data divided in accordance with (29) and (31) between 4 · 7 · 13 = 364 processors; then, a new division of the data between the 3 · 7 · 13 = 273 processors and simultaneous performance of DFT with N = 4, and so on. The total number of cycles, i.e., the elapsed time of the multiprocessor computer for performing the DFT for a vector of dimension N1 × N2 × · · · × NS , in general is equal to TN =
S
(34)
Mk + Ak .
k=1
In the case of N = 1092, the FFT execution time in the parallelization is reduced according to the table by two orders of magnitude in all three APF, DFM, and Winograd’s algorithms. It does not seem feasible to parallelize the graphs of the Cooley–Tukey algorithm both with frequency decimation and time decimation [3], since the interprocessor data exchange is required after each operation of complex multiplication and addition. The number of FFT operations for the long vector in the table and formula (34) give a false idea of the absolute leadership of the Winograd algorithm. In the years 1970s and 1980s, when the multiplication operations were performed on the computer almost an order of magnitude longer than the addition operations, the Winograd algorithm was unchallenged despite the great length of its code. However, in modern computers where all operations are performed in a single cycle, only the total number of operations, the possibility of parallelization, and the code length are significant. Since the DFTM algorithm has extremely compact code and can be parallelized, while in terms of the total number of operations it is only one and a half times below the Winograd algorithm, it can perfectly compete with the existing FFT algorithms. The Cooley–Tukey method [4] has a compact code and a small number of operations; therefore, it remains the basic one for the single-processor computers. However, because of the impossibility of its parallelization, it drops out of competition in the case of multiprocessor computers. CONCLUSION. PROSPECTS OF THE DFTM ALGORITHM Consider the possibility of accelerating the DFTM algorithm. According to (19), calculating vector Z requires two vectors SN −2 and SN −1 that, by (16), can be represented in matrix form, for example, SN −1 = SY . The entries of S have the form m, k = 0, . . . , N − 1, sm,k = − sin 2πmk/N sin 2πm/N , which does not allow us to transform it into a matrix of cyclic convolution and thus using the methods of acceleration of the convolution computation. Consider the matrix equation R = RY , where the matrix R of the rank by one less than S consists of the entries rm,k = sin(2πmk/N ),
m, k = 1, . . . , N − 1,
and forms the vector R whose entries are connected with the vector SN −1 by the relation Rm = Sm,N −1 sin(2πm/N ). Obviously, R can already be transformed into a matrix of cyclic convolution. Analogously, through the matrix T whose rank is by one less than S and which is also transformable JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS
Vol. 11 No. 1
2017
48
GASENKO
into the matrix of cyclic convolution, we can find the entries of vector T : Tm = Sm,N −2 sin(2πm/N ). Now, the sought vector Z is calculated by R and T by an analogous and not more difficult manner than in (19). Thus, DCFM will be reduced to a problem of DFT for a complex vector with two real matrices R and T, which allows rapid calculation of the vectors R and T , for example, using the Rader algorithm [1]. DSMF is also transformed into an analogous problem, but with the matrices in which the real and imaginary entries alternate. The solutions of the formulated problems of accelerating the differential Fourier transform method will be presented in the subsequent papers. ACKNOWLEDGMENTS The author was supported by the Russian Science Foundation (project no. 15–19–10025). REFERENCES 1. H. J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms (Springer, Heidelberg, 1982; Radio i Svyaz’, Moscow, 1985). 2. P. N. Swarztrauber, “Symmetric FFTs,” Math. Comp. (1986) 47 (175), 323–346. 3. R. E. Blahut, Fast Algorithms for Digital Signal Processing (Addison-Wesley, Reading, 1985; Mir, Moscow, 1989). 4. J. W. Cooley and J. W. Tukey, “An Algorithm for Machine Calculation of Complex Fourier Series,” Math. Comp. 19, 297–301 (1965). 5. I. J. Good, “The Interaction Algorithm and Practical Fourier Analysis,” J. Roy. Statist. Soc. Ser. B 20 (2), 361–375 (1958). 6. I. J. Good, “The Interaction Algorithm and Practical Fourier Analysis: An Addendum,” J. Roy. Statist. Soc. Ser. B 22 (2), 372–375 (1960). 7. L. H. Thomas, Using a Computer to Solve Problems in Physics, Applications of Digital Computers (Ginn, Boston, 1963). 8. S. Winograd, “On Computing the Discrete Fourier Transform,” Math. Comp. 32, 175–199 (1978).
JOURNAL OF APPLIED AND INDUSTRIAL MATHEMATICS Vol. 11 No. 1 2017