Arab J Sci Eng DOI 10.1007/s13369-017-2833-3
RESEARCH ARTICLE - COMPUTER ENGINEERING AND COMPUTER SCIENCE
A Hybrid Intelligent System Integrating the Cascade Forward Neural Network with Elman Neural Network
Mutasem Sh. Alkhasawneh(1) · Lea Tien Tay(2)
Received: 26 April 2017 / Accepted: 21 August 2017 © King Fahd University of Petroleum & Minerals 2017
Abstract  The cascade forward neural network (CFNN) is a well-known static neural network in which the signals move in the forward direction only. A dynamic neural network such as the Elman neural network (ENN) is built in a way that allows the signals to travel in both directions. Dynamic neural networks have been used widely in applications such as speech recognition and time series, and are rarely used in static applications because of their poor performance there. This paper proposes to hybridize the CFNN with the ENN in order to take advantage of both networks, with signals traveling in both directions. The proposed system is named HECFNN, and its effectiveness is evaluated using a number of benchmarks. The benchmarks investigated are the Wine, Ionosphere, Iris, Wisconsin breast cancer, Glass and Pima Indians diabetes datasets. Firstly, the performance of the hybrid system is compared with those of the CFNN and ENN. The simulations demonstrate that the proposed hybrid network structure can effectively model both linear and nonlinear static systems with high accuracy, and the proposed system achieves an improvement in accuracy over the CFNN and ENN. Secondly, the results are compared with the different methods reported by Hoang; the accuracy of the proposed system is found to be as good as, if not better than, the other methods. Thirdly, the HECFNN results are also compared with the best results reported by different methods in the literature, and the HECFNN results are generally found to be better. Based on the results obtained, the proposed HECFNN system demonstrates better performance, thus justifying its potential as a useful and effective system for prediction and classification.

Keywords  Elman neural network · Cascade forward neural network · Hybrid network · Classification · Benchmark dataset

Corresponding author: Mutasem Sh. Alkhasawneh, [email protected]; Lea Tien Tay, [email protected]

(1) Software Engineering Department, Faculty of Information and Technology, Ajloun National University, P.O. Box 43, Ajloun 26810, Jordan
(2) School of Electrical and Electronic Engineering, Universiti Sains Malaysia, Engineering Campus, 14300 Nibong Tebal, Penang, Malaysia
1 Introduction

The first model of a computational neural network based on mathematics and algorithms was presented in 1943 [1]. Since then, neural networks have become a subject of interest for many researchers. Nevertheless, the ideal neural network has not yet been achieved, and different models of artificial neural networks (ANNs) have been designed and tested. ANNs have been used to solve modeling, prediction and classification tasks in many fields such as power engineering [2], the materials industry [3], medical diagnosis [4], signal and image processing [5] and weather forecasting [6]. The unique characteristics and robustness of the ANN make it a flexible approach: it learns from experience via training samples and later uses this experience to identify and classify new data. ANNs can be divided into two types, i.e., static and dynamic. The multilayer perceptron (MLP), the hybrid multilayer perceptron (HMLP), the cascade forward neural network (CFNN) and the feedforward neural network (FNN) are well-known static neural networks in which the signals move in the forward direction only, i.e., the signals travel from input to output with no loops or feedback. The output
of any layer also does not affect the outputs of the same layer or of preceding layers [7]. Dynamic neural networks such as the Elman neural network (ENN) and the Hopfield neural network are built such that the signals can travel in both directions, i.e., the output of any layer can be fed back to the same layer or to neurons in the preceding layers. Dynamic neural networks have been used widely in applications such as speech recognition and time series, and are rarely used in static applications because of their poor performance there [8,9]. The ENN was proposed by Elman [10]; it is a feedforward neural network with feedback for every hidden layer through an extra layer called the context layer. The CFNN was proposed by Demuth [11], and it has shown good performance in different applications such as speech recognition [12], vapor–liquid [13], photo energy [14], the food industry [15] and others [16,17]. The CFNN can be considered a feedforward multilayer perceptron with extra connections coming from the input to the successive layers and from every hidden layer to the output layer (in the case of more than one hidden layer). The first direct connection from the input to the output layer was proposed and applied to radial basis function (RBF) networks [18]. The RBF network with linear input connections has been shown to be significantly better than the standard RBF network [19]. Mashor added a connection from the input layer to the output layer of the multilayer perceptron; the new connection improved the performance of the neural network without increasing its complexity [20]. This paper presents a hybrid of the CFNN and ENN, named HECFNN. The proposed network structure uses every possible connection in the network. The capability of the proposed hybrid network architecture is demonstrated on six benchmark datasets extracted from the University of California at Irvine (UCI) machine learning repository [21]. Nine training algorithms were used to train the HECFNN to achieve the optimum structure. The proposed HECFNN is compared with the traditional CFNN and ENN trained with the same algorithms, as well as with 11 other popular supervised classifiers as reported in [4] and [5,6], namely the C4.5 and C4.5 rules decision tree classifiers, the incremental decision tree induction (ITI), the linear machine decision tree (LMDT), the rule-based inductive learning system (CN2), the learning vector quantization (LVQ), the oblique classifier (OC1), the Nevada feedforward backpropagation multilayered perceptron simulator (Nevprop), the k-nearest neighbor with k = 5 (K5), the dynamically created prototype algorithm (Q*) and the radial basis function (RBF) network. Furthermore, the maximum classification accuracies reported for the six benchmark datasets, taken from the Department of Informatics, Nicolaus Copernicus University [3], and from other literature, were compared with the values achieved by the proposed HECFNN. In this section, a description of the background of NNs and some computational NN methods has been provided. In Sect. 2,
a brief overview of the structure of the original CFNN and ENN is given. The proposed network model is detailed in Sect. 3. Sections 4 and 5 explain the experimental studies and performance evaluation, while Sect. 6 presents the results and discussion. Finally, conclusions are presented in Sect. 7.
2 Cascade Forward Neural Network and Elman Neural Network

CFNN and ENN represent the static and dynamic neural networks, respectively. Sections 2.1 and 2.2 explain these neural networks in detail.

2.1 Cascade Forward Neural Network

CFNN is a multilayer feedforward neural network proposed by Demuth [11]. It consists mainly of input, output and one or more hidden layers, with two additional sets of connections: the first from the input to each layer in the network, and the second from each layer to the successive layers in the network. Figure 1 depicts a four-layer CFNN, i.e., input layer (I), hidden layer 1 (H1), hidden layer 2 (H2) and output layer (OL). H1 has one input connection from the input and two output connections going toward H2 and OL. H2 has two input connections, coming from the input and H1, and one output connection toward OL. OL has one output and three input connections, from the input, H1 and H2. The additional connections may improve data distribution over the neural network, which increases its generalization [22]. A neural network with one hidden layer is sufficient to solve most classification and prediction problems [23,24]. Figure 2 shows the topology of a CFNN with one hidden layer. The notation used in this paper is as follows:

– $W^1_{i,j}$ is the weight that connects node $i$ in the input layer to node $j$ in the hidden layer
– $W^2_{j,n}$ is the weight that connects node $j$ in the hidden layer to node $n$ in the output layer
– $W^3_{i,n}$ is the weight that connects node $i$ in the input layer to node $n$ in the output layer
– $net\_h_j$ is the input of node $j$ in the hidden layer
– $O\_h_j$ is the output of node $j$ in the hidden layer
– $net\_c_i$ is the input of context node $i$
– $O\_c_i$ is the output of context node $i$
– $M$, $L$, $N$ are the numbers of nodes in the input, hidden and output layers, respectively
– $u_i^{(k)}$ and $y_n^{(k)}$ are the inputs and outputs of the neural network, where $i = 1, 2, \ldots, M$ and $n = 1, 2, \ldots, N$, respectively
– $k$ represents the time instant
– $f(\cdot)$ and $g(\cdot)$ are the linear or nonlinear output functions of the hidden layer and the output layer, respectively.
Arab J Sci Eng Fig. 1 Four layers CFNN architectures H1
I
H2
Output layer
Output
W3in u1
1
1 1
u2
• •
2
2
• •
• •
M
uI
• • N
L
W2jn
1
W ij Input Layer
Hidden Layer
Output Layer
Fig. 2 Three layers CFNN topology
Equation (1) gives the total input of the $j$th hidden neuron at time $k$, coming from the input $u_i^{(k)}$ to the hidden layer:

$$net\_h_j^{(k)} = \sum_{i=1}^{M} w^1_{i,j}\, u_i^{(k)} \quad \text{for } 1 \le j \le L \qquad (1)$$

For the one-hidden-layer CFNN shown in Fig. 2, the output of the $j$th hidden neuron at time $k$ is given by

$$O\_h_j^{(k)} = f\!\left(net\_h_j^{(k)}\right) \qquad (2)$$

where $f(\cdot)$ is always in the range $[-1, 1]$. From Eqs. (1) and (2), the outputs of the CFNN with one hidden layer can be expressed by

$$y_n^{(k)} = g\!\left(\sum_{j=1}^{L} w^2_{j,n}\, O\_h_j^{(k)} + \sum_{i=0}^{M} w^3_{i,n}\, u_i^{(k)}\right) \quad \text{for } 1 \le n \le N \qquad (3)$$

Note that when $i = 0$ in Eq. (3), the second term represents the weight and input of the bias.

2.2 Elman Neural Network

ENN was proposed by Elman [10]. In principle, ENN is set up as a regular feedforward neural network consisting of input, output and hidden layers, where all neurons in one layer are connected to all neurons in the next layer. In addition, ENN has a feedback path in which each hidden layer feeds itself back through an extra set of context nodes, also called the context layer. The context layer stores the output of the hidden layer and feeds it back to the hidden layer later. Figure 3 shows a six-layer ENN, i.e., input layer (I), two hidden layers (H1 and H2), two context layers (C1 and C2) and output layer (OL). Each layer is fed from the previous layer, but H1 and H2 have an additional feed from C1 and C2, respectively. Figure 4 shows the architecture of an ENN with one hidden layer associated with its own context layer. For the first iteration, the output of the context layer is initialized and set arbitrarily to a fixed value as in Eq. (4):

$$O\_c_i^{(k)} = net\_c_i^{(k)} \qquad (4)$$

The input of the $j$th neuron in the hidden layer is given by Eq. (5):

$$net\_h_j^{(k)} = \sum_{i=1}^{M} w^1_{i,j}\, u_i^{(k)} + \sum_{i=1}^{L} w^3_{i,j}\, O\_c_i^{(k)} \quad \text{for } 1 \le j \le L \qquad (5)$$
[Fig. 3 Six-layer ENN architecture: input layer, hidden layers 1 and 2, context layers 1 and 2, and output layer]

[Fig. 4 Four-layer ENN topology: input layer (M nodes), hidden layer (L nodes) with its context layer (fed back through a unit delay D^{-1}), and output layer (N nodes), with weights W1 (input to hidden), W3 (context to hidden) and W2 (hidden to output)]
Equation (6) gives the output of the $j$th hidden neuron at time $k$:

$$O\_h_j^{(k)} = f\!\left(net\_h_j^{(k)}\right) \quad \text{for } 1 \le j \le L \qquad (6)$$

From Eqs. (5) and (6), the output of the ENN with one hidden layer can be expressed by Eq. (7):

$$y_n^{(k)} = g\!\left(\sum_{j=1}^{L} w^2_{j,n}\, O\_h_j^{(k)}\right) \quad \text{for } 1 \le n \le N \qquad (7)$$

For the second iteration, the same method of calculation is applied, except for the context nodes, where the values of the hidden units are copied into the corresponding context units through a unit delay ($D^{-1}$). Therefore, the context node inputs in the second iteration can be expressed as in Eq. (8):

$$net\_c_i^{(k)} = O\_h_i^{(k-1)} \qquad (8)$$

Figure 4 shows that every unit in the hidden layer is connected to its associated unit in the context layer, while every unit in the context layer is connected to all units in the hidden layer. ENN can be used for dynamic and static problems. For dynamic problems, ENN has shown good performance in many applications such as speech recognition [10], control [25] and many others [26]. A new modification of the ENN, a hybrid of the CFNN and ENN that takes the advantages of both methods, is proposed in the next section.
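To make the flow of Eqs. (4)–(8) concrete, the following is a minimal NumPy sketch of a one-hidden-layer ENN forward pass. The weight names (W1 for input-to-hidden, W3 for context-to-hidden, W2 for hidden-to-output), the tanh hidden activation and the linear output are illustrative assumptions and are not specified by the paper.

```python
import numpy as np

def enn_forward(u, context, W1, W3, W2):
    """One time step of a single-hidden-layer Elman network.

    u       : input vector, shape (M,)
    context : previous hidden output O_c, shape (L,)   (Eq. 8)
    W1      : input-to-hidden weights, shape (M, L)    (Eq. 5, first sum)
    W3      : context-to-hidden weights, shape (L, L)  (Eq. 5, second sum)
    W2      : hidden-to-output weights, shape (L, N)   (Eq. 7)
    """
    net_h = u @ W1 + context @ W3      # Eq. (5): total hidden input
    o_h = np.tanh(net_h)               # Eq. (6): hidden output, f in [-1, 1]
    y = o_h @ W2                       # Eq. (7): linear output g
    new_context = o_h.copy()           # Eq. (8): context copies the hidden output
    return y, new_context

# Example usage with arbitrary sizes: M = 4 inputs, L = 3 hidden nodes, N = 2 outputs
rng = np.random.default_rng(0)
M, L, N = 4, 3, 2
W1, W3, W2 = rng.normal(size=(M, L)), rng.normal(size=(L, L)), rng.normal(size=(L, N))
context = np.zeros(L)                  # Eq. (4): arbitrary fixed initial value
for u in rng.normal(size=(5, M)):      # a short sequence of input vectors
    y, context = enn_forward(u, context, W1, W3, W2)
```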
3 Hybrid CFNN–ENN Model

To take advantage of both the CFNN and ENN methods, as mentioned earlier, the hybrid CFNN–ENN model, named HECFNN, is proposed to train and test static data for performance assessment. Figure 5 shows an example of a six-layer HECFNN model. It consists of an input layer, two hidden layers and an output layer, in addition to two context layers. A neural network with one hidden layer is sufficient to solve most classification and prediction problems [24,27]. Figure 6 shows the topology of the hybrid CFNN–ENN model with one hidden layer.

Structure: The proposed idea is to merge the characteristics of the static neural network (CFNN) and the dynamic neural network (ENN). The hybrid network is similar to the CFNN in having full forward connections from the input to the output layer, and differs from it in having a recurrent connection on the hidden layer. On the other hand, the HECFNN model is similar to the ENN in having a recurrent connection on the hidden layer. Unlike the ENN, the proposed HECFNN model has connections from the input and hidden layers to all following layers in the forward path. This structure helps the HECFNN model to reorganize and distribute the weights of the hidden layer. The $O\_c_i^{(k)}$, $net\_h_j^{(k)}$, $O\_h_j^{(k)}$ and output $y_n^{(k)}$ for the proposed model are given in Eqs. (9)–(12), respectively:

$$O\_c_i^{(k)} = net\_c_i^{(k)} \qquad (9)$$
[Fig. 5 Six-layer HECFNN model architecture: input layer, hidden layers 1 and 2 (each with its own context layer) and output layer]

[Fig. 6 Four-layer HECFNN model topology: input layer (M nodes), hidden layer (L nodes) with its context layer, and output layer (N nodes), with weights W1 (input to hidden), W3 (context to hidden), W2 (hidden to output) and W4 (input to output)]
$$net\_h_j^{(k)} = \sum_{i=1}^{M} w^1_{i,j}\, u_i^{(k)} + \sum_{i=1}^{L} w^3_{i,j}\, O\_c_i^{(k)} \quad \text{for } 1 \le j \le L \qquad (10)$$

$$O\_h_j^{(k)} = f\!\left(net\_h_j^{(k)}\right) \quad \text{for } 1 \le j \le L \qquad (11)$$

$$y_n^{(k)} = g\!\left(\sum_{j=1}^{L} w^2_{j,n}\, O\_h_j^{(k)} + \sum_{i=1}^{M} w^4_{i,n}\, u_i^{(k)}\right) \quad \text{for } 1 \le n \le N \qquad (12)$$
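As a sketch of Eqs. (9)–(12), the HECFNN forward pass differs from the ENN only in the extra direct input-to-output term weighted by W4. The weight names, the tanh hidden activation and the linear output below are illustrative assumptions, not part of the paper's specification.

```python
import numpy as np

def hecfnn_forward(u, context, W1, W3, W2, W4):
    """One time step of the one-hidden-layer hybrid CFNN-ENN (HECFNN).

    W1: input-to-hidden (M, L),  W3: context-to-hidden (L, L),
    W2: hidden-to-output (L, N), W4: direct input-to-output (M, N).
    """
    net_h = u @ W1 + context @ W3   # Eq. (10): hidden input with context feedback
    o_h = np.tanh(net_h)            # Eq. (11): hidden output
    y = o_h @ W2 + u @ W4           # Eq. (12): cascade term u @ W4 added at the output
    return y, o_h                   # o_h becomes the next context value (Eq. 9 update)
```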
The methodology and the principles used to train the HECFNN are the same as those used to train the ENN and the CFNN. The training procedure of the HECFNN can be explained by assuming that the input to the HECFNN is a sequence of vectors synchronized by a clock. The training procedure over time for the HECFNN network can be summarized as follows.

First iteration: Run all input data once through the HECFNN model. At a given time t, the following steps are performed.

1. The first input vector is presented to the hidden and output layers.
2. The context units are initialized.
3. The input vectors of the context units represent the outputs of the hidden nodes.
4. Following that, the input vectors and the hidden-node outputs become the input of the output layer.
5. The weights are updated, and the hidden-node outputs are stored in the context units.
6. At the end of the first iteration, the network output is checked. If it is correct, the algorithm proceeds to the second iteration.

Second iteration: In the second iteration (k + 1), the procedure from the first iteration is repeated with one exception: the value of the context units is the value stored from the previous iteration, which can be found using Eq. (11). By substituting Eqs. (10) and (11) into Eq. (12), the output $y_n^{(k)}$ can be calculated by Eq. (13):

$$y_n^{(k)} = g\!\left(\sum_{j=1}^{L} w^2_{j,n}\, f\!\left(\sum_{i=1}^{M} w^1_{i,j}\, u_i^{(k)} + \sum_{i=1}^{L} w^3_{i,j}\, O\_c_i^{(k)}\right) + \sum_{i=1}^{M} w^4_{i,n}\, u_i^{(k)}\right) \quad \text{for } 1 \le n \le N \qquad (13)$$

Equation (13) shows that the first and second terms have the same form, and both sets of weights can be estimated using the same training algorithm. Since there are no interactions between the two types of neurons and the network can be unfolded into a CFNN and an ENN, the proposed HECFNN model can in principle be trained with the same algorithms normally used for the CFNN and the ENN.
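The point made after Eq. (13), that the cascade and recurrent parts share the same functional form and can be fitted by standard algorithms, can be illustrated with a plain gradient-descent step. The sketch below assumes a linear output g and an MSE loss and, for brevity, updates only the output-side weights W2 and W4; it is not the training routine used in the paper, which relies on the nine standard algorithms listed in Sect. 5.1.

```python
import numpy as np

def output_weight_step(u, context, target, W1, W3, W2, W4, lr=0.01):
    """One gradient-descent update of W2 and W4 under a linear output and MSE loss."""
    o_h = np.tanh(u @ W1 + context @ W3)   # inner part of Eq. (13)
    y = o_h @ W2 + u @ W4                  # Eq. (13) with linear g
    err = y - target                       # prediction error, shape (N,)
    # Gradients of 0.5 * ||err||^2 with respect to the two output-side weight blocks:
    W2 -= lr * np.outer(o_h, err)          # dE/dW2[j, n] = err[n] * o_h[j]
    W4 -= lr * np.outer(u, err)            # dE/dW4[i, n] = err[n] * u[i]
    return W2, W4, float(0.5 * np.sum(err ** 2))
```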
4 Experimental Datasets

In this paper, a hybrid system consisting of the CFNN and the ENN for data prediction and classification is designed and developed. The performance of the proposed CFNN–ENN model was evaluated using six benchmark datasets obtained from the University of California, Irvine (UCI) machine learning repository [21]. The datasets used were Wine, Ionosphere, Iris, Wisconsin breast cancer (WBC), Glass and Pima Indians diabetes (PID). Table 1 summarizes the number of samples, the number of prediction features and the number of classes for the benchmark datasets.

Table 1 Benchmark datasets

Benchmark data | Cases | Attributes | Classes
Wine           | 178   | 13         | 3
Ionosphere     | 351   | 34         | 2
Iris           | 150   | 4          | 3
WBC            | 699   | 9          | 2
Glass          | 214   | 9          | 6
PID            | 768   | 8          | 2

The description of each dataset is as follows.
1. Wine: Three different types of wine, with 178 samples collected from the same region in Italy. Each type represents one class: class one has 59 samples, class two has 71 samples, and class three has 48 samples. Each sample has 13 features: alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavonoids, non-flavonoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines and proline.
2. Ionosphere: This dataset has two classes of radar data, good and bad. The good class of returns shows some evidence of structure in the ionosphere; the bad class does not. 351 samples of the Ionosphere dataset with 34 features were used in this study.
3. Iris: The Iris data is a three-class problem with 4 features and 150 samples. The three classes refer to different types of Iris plants, Iris setosa, Iris versicolor and Iris virginica, each with 50 samples. Each sample comprises four input features: sepal length, sepal width, petal length and petal width.
4. WBC: The WBC dataset contains data for 699 breast cancer patients. The patients are classified into two classes, i.e., a benign class with 458 samples and a malignant class with 241 samples. Each sample has nine features, namely clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitoses.
5. Glass: The Glass dataset is a popular dataset that contains six classes, nine features and 214 samples. The six classes are building windows float processed (70 cases), building windows non-float processed (76 cases), vehicle windows float processed (17 cases), containers (13 cases), tableware (9 cases) and headlamps (29 cases).
6. PID: The PID is a two-class dataset. The classes are the presence or absence of diabetes among Pima Indian females. The dataset contains 768 samples, with 500 healthy and 268 diabetes cases. Each sample has eight features.
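Since the network output is a vector of N class nodes, each dataset's class label has to be turned into a target vector before training. The sketch below shows one common way to do this (one-hot targets and a random 80/20 train/test split); the paper does not prescribe a specific encoding, so this is an illustrative assumption.

```python
import numpy as np

def prepare_split(features, labels, train_fraction=0.8, seed=0):
    """Shuffle samples, one-hot encode class labels and split into train/test sets.

    features: numpy array, shape (samples, attributes)
    labels:   numpy array of class labels, shape (samples,)
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    X, y = features[order], labels[order]
    classes = np.unique(y)
    targets = (y[:, None] == classes[None, :]).astype(float)  # one-hot, shape (samples, N)
    cut = int(train_fraction * len(X))
    return (X[:cut], targets[:cut]), (X[cut:], targets[cut:])
```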
5 Classification Performance Analysis

The performance analysis of the proposed HECFNN is based on two parameters obtained after training and testing: the classification accuracy and the average mean square error (MSE).
For this study, the experimental work on the HECFNN evaluation can be divided into three major parts:

1. Finding the optimum HECFNN structure by comparing it with the CFNN and ENN.
2. Comparing the performance of the HECFNN with the average classification accuracy obtained by different techniques reported in [4,29]. These classification methods include 11 classifiers from different categories, such as instance-based (Q* and K5), neural networks (LVQ, Nevprop and RBF), decision trees (C4.5, C4.5 rules, ITI, LMDT and OC1) and rule-based (CN2).
3. Comparing the accuracy achieved by the HECFNN with the maximum results reported in the literature for the six benchmark datasets.

The same experimental procedure implemented in [4,29] was employed in this study. The experiments were run 10 times for every benchmark dataset, and the average results were obtained. For each experiment, 80% of the data were selected randomly from the entire dataset for training and the remaining 20% were used for testing. In addition, mini-batch training as in [4] was adopted, in which the data records are divided into groups of approximately equal size and the weights are updated after passing one group.

5.1 The Hybrid CFNN–ENN Structure

To determine the optimal structure of the HECFNN and the optimal training algorithm, the proposed HECFNN was trained and tested using nine machine learning algorithms. As mentioned earlier, the HECFNN can in principle be trained with the same algorithms that are usually used to train the CFNN and ENN. The common algorithms used to train the CFNN and ENN are: Levenberg–Marquardt (LM), resilient backpropagation (RP), scaled conjugate gradient (SCG), conjugate gradient with Powell–Beale restarts (CGB), conjugate gradient with Fletcher–Reeves updates (CGF), conjugate gradient with Polak–Ribiere updates (CGP), gradient descent (GD), gradient descent with momentum (GDM) and gradient descent with adaptive learning rate (GDA). In addition, another factor that often arises in artificial neural network applications is determining the optimal network structure, i.e., the number of neurons in the hidden layer. In order to determine the CFNN, ENN and HECFNN network architectures, the experiments were carried out as implemented in [28–31]. The number of hidden neurons was varied from 1 to 100, and for each number of hidden neurons the network was trained 10 times. The number of epochs was varied from 1 to 10,000, with the purpose of finding the number of epochs that produced the best generalization for each number of hidden neurons. The MSE test indicates how fast the prediction error converges with the number of training data [20]. The MSE is defined as the average of the squared error between the actual output and the predicted output. The MSE at the $t$-th training step is given by:

$$\mathrm{MSE}\left(t, \emptyset(t)\right) = \frac{1}{n_d}\sum_{i=1}^{n_d}\left(\text{Actual output}(i) - \text{Predicted output}(i, \emptyset(t))\right)^2 \qquad (14)$$

where Actual output$(i)$ and Predicted output$(i, \emptyset(t))$ are the actual output and the predicted output for a given set of estimated parameters $\emptyset(t)$ after $t$ training steps, respectively, and $n_d$ is the number of data points used to calculate the MSE. The average MSE is calculated after 10 repetitions of testing based on Eq. (15):

$$\text{Average MSE} = \frac{1}{10}\sum_{i=1}^{10}\mathrm{MSE}_i \qquad (15)$$
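The two evaluation quantities in Eqs. (14) and (15) can be computed directly from the predicted and actual outputs gathered over the 10 independent train/test runs described above. The helper names in the sketch below are illustrative assumptions.

```python
import numpy as np

def mse(actual, predicted):
    """Eq. (14): mean squared error over the n_d evaluated samples."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean((actual - predicted) ** 2))

def average_mse(run_results):
    """Eq. (15): average MSE over the repeated test runs (10 in this study).

    run_results: iterable of (actual, predicted) pairs, one pair per run.
    """
    scores = [mse(a, p) for a, p in run_results]
    return sum(scores) / len(scores)
```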
6 Results and Discussion

In the following subsections, the results of the HECFNN performance evaluation are discussed in detail in three phases. Firstly, the optimum structure of the HECFNN is discussed and the optimum HECFNN results are compared against the CFNN and ENN. Secondly, the optimum HECFNN results are compared with the results reported by Hoang [4]. Thirdly, a comparison with the best results in the literature is reported.

6.1 Optimum Structures for CFNN, ENN and HECFNN

Table 2 shows the first phase of the three evaluations. The performance of the CFNN, ENN and HECFNN after training with nine learning algorithms on the six datasets is presented. Table 2 shows that the classification performance of the three networks on the six datasets depends strongly on the learning algorithm and varies with the data complexity and the type and structure of the network. In this phase, the comparison between the CFNN, ENN and HECFNN is discussed from three aspects, i.e., classification accuracy, MSE and the number of neurons in the hidden layer, across the nine learning algorithms. The performance of the HECFNN network is compared with that of the other two networks, i.e., the CFNN and ENN. The priority in choosing the best network is based on the accuracy, followed by the MSE and the number of hidden-layer neurons.
Table 2 CFNN, ENN and HECFNN structures with testing accuracy (%), MSE and optimum number of hidden nodes for classifying the six benchmark datasets

Data       | Method | Metric   | LM     | RP     | SCG    | CGB    | CGF    | CGP    | GD     | GDM    | GDA
Wine       | CFNN   | Accuracy | 100.00 | 99.65  | 100    | 100    | 100    | 100    | 99.65  | 100    | 99.48
Wine       | CFNN   | MSE      | 0.0202 | 0.0270 | 0.0264 | 0.0208 | 0.024  | 0.022  | 0.0391 | 0.0261 | 0.0501
Wine       | CFNN   | Hidden   | 25     | 10     | 48     | 14     | 30     | 14     | 37     | 43     | 4
Wine       | ENN    | Accuracy | 95.00  | 94.13  | 89.65  | 94.82  | 94.48  | 94.82  | 94.82  | 94.48  | 89.33
Wine       | ENN    | MSE      | 0.0291 | 0.0383 | 0.0367 | 0.0312 | 0.0344 | 0.0316 | 0.0350 | 0.0486 | 0.0363
Wine       | ENN    | Hidden   | 98     | 24     | 16     | 99     | 80     | 89     | 49     | 66     | 33
Wine       | HECFNN | Accuracy | 100    | 99.82  | 100    | 100    | 100    | 100    | 100    | 100    | 99.82
Wine       | HECFNN | MSE      | 0.0201 | 0.0227 | 0.0166 | 0.0178 | 0.0223 | 0.0168 | 0.0248 | 0.0188 | 0.0227
Wine       | HECFNN | Hidden   | 99     | 62     | 58     | 92     | 86     | 86     | 84     | 98     | 56
Ionosphere | CFNN   | Accuracy | 97.42  | 97.14  | 98.19  | 98.09  | 97.85  | 97.85  | 64.25  | 65.14  | 96.85
Ionosphere | CFNN   | MSE      | 0.0140 | 0.0177 | 0.0111 | 0.0123 | 0.0151 | 0.0141 | 0.1646 | 0.1751 | 0.0154
Ionosphere | CFNN   | Hidden   | 97     | 37     | 83     | 86     | 45     | 42     | 86     | 90     | 74
Ionosphere | ENN    | Accuracy | 92.14  | 92.57  | 92.42  | 92.42  | 92.61  | 92.41  | 71.66  | 71.64  | 91.19
Ionosphere | ENN    | MSE      | 0.0359 | 0.0336 | 0.0347 | 0.0355 | 0.0321 | 0.0347 | 0.0495 | 0.0496 | 0.0369
Ionosphere | ENN    | Hidden   | 21     | 54     | 78     | 10     | 55     | 66     | 49     | 64     | 11
Ionosphere | HECFNN | Accuracy | 99.82  | 97.619 | 98.57  | 98.57  | 98.57  | 98.09  | 81.42  | 78.81  | 98.57
Ionosphere | HECFNN | MSE      | 0.0097 | 0.0024 | 0.0133 | 0.0124 | 0.0100 | 0.0151 | 0.0201 | 0.1011 | 0.0166
Ionosphere | HECFNN | Hidden   | 81     | 72     | 18     | 18     | 59     | 36     | 55     | 66     | 77
Iris       | CFNN   | Accuracy | 99.27  | 99.40  | 99.16  | 99.29  | 98.94  | 99.41  | 98.87  | 82.17  | 98.89
Iris       | CFNN   | MSE      | 0.0016 | 0.0019 | 0.0029 | 0.0025 | 0.0019 | 0.0019 | 0.0016 | 0.0033 | 0.0029
Iris       | CFNN   | Hidden   | 91     | 46     | 62     | 68     | 57     | 68     | 58     | 92     | 30
Iris       | ENN    | Accuracy | 93.99  | 93.98  | 93.74  | 93.51  | 93.81  | 93.92  | 93.15  | 68.83  | 93.60
Iris       | ENN    | MSE      | 0.0120 | 0.0145 | 0.0180 | 0.0121 | 0.0111 | 0.0109 | 0.0121 | 0.0129 | 0.0122
Iris       | ENN    | Hidden   | 47     | 96     | 67     | 27     | 60     | 48     | 61     | 49     | 44
Iris       | HECFNN | Accuracy | 99.074 | 99.02  | 99.60  | 99.75  | 99.16  | 99.09  | 99.09  | 75.59  | 98.96
Iris       | HECFNN | MSE      | 0.0092 | 0.0018 | 0.0023 | 0.0028 | 0.0077 | 0.004  | 0.0034 | 0.0056 | 0.0009
Iris       | HECFNN | Hidden   | 93     | 89     | 49     | 91     | 44     | 32     | 64     | 24     | 37
WBC        | CFNN   | Accuracy | 97.70  | 97.19  | 97.26  | 97.33  | 97.69  | 97.48  | 93.88  | 93.00  | 96.27
WBC        | CFNN   | MSE      | 0.0051 | 0.0081 | 0.0050 | 0.0074 | 0.0060 | 0.0032 | 0.0089 | 0.0032 | 0.0051
WBC        | CFNN   | Hidden   | 43     | 39     | 13     | 87     | 39     | 11     | 11     | 5      | 9
WBC        | ENN    | Accuracy | 88.201 | 86.69  | 83.08  | 87.60  | 87.84  | 88.32  | 88.76  | 85.20  | 91.53
WBC        | ENN    | MSE      | 0.0020 | 0.0022 | 0.0106 | 0.0079 | 0.0195 | 0.0105 | 0.028  | 0.0189 | 0.0031
WBC        | ENN    | Hidden   | 20     | 34     | 33     | 47     | 40     | 49     | 31     | 97     | 44
WBC        | HECFNN | Accuracy | 97.12  | 97.60  | 97.51  | 97.84  | 97.36  | 97.94  | 97.12  | 96.40  | 97.85
WBC        | HECFNN | MSE      | 0.0179 | 0.0104 | 0.0113 | 0.0078 | 0.0005 | 0.0106 | 0.0130 | 0.0184 | 0.0207
WBC        | HECFNN | Hidden   | 4      | 46     | 50     | 61     | 17     | 19     | 61     | 74     | 74
Glass      | CFNN   | Accuracy | 84.98  | 80.82  | 84.28  | 82.38  | 82.14  | 82.73  | 70.71  | 62.55  | 72.10
Glass      | CFNN   | MSE      | 0.0045 | 0.0172 | 0.0049 | 0.0058 | 0.0077 | 0.0061 | 0.0077 | 0.0339 | 0.0235
Glass      | CFNN   | Hidden   | 64     | 50     | 9      | 19     | 34     | 6      | 32     | 51     | 51
Glass      | ENN    | Accuracy | 73.33  | 68.31  | 75.206 | 74.43  | 72.58  | 75.00  | 57.96  | 55.55  | 67.56
Glass      | ENN    | MSE      | 0.0229 | 0.0259 | 0.0156 | 0.0175 | 0.018  | 0.017  | 0.033  | 0.0245 | 0.0264
Glass      | ENN    | Hidden   | 6      | 21     | 51     | 43     | 51     | 46     | 51     | 39     | 32
Glass      | HECFNN | Accuracy | 85.309 | 83.57  | 85.62  | 80.81  | 80.76  | 83.02  | 63.23  | 64.64  | 76.66
Glass      | HECFNN | MSE      | 0.0229 | 0.0259 | 0.0156 | 0.0175 | 0.0189 | 0.0175 | 0.0330 | 0.0245 | 0.0264
Glass      | HECFNN | Hidden   | 51     | 10     | 53     | 62     | 44     | 29     | 71     | 49     | 38
PID        | CFNN   | Accuracy | 77.89  | 79.60  | 80.19  | 78.81  | 84.66  | 81.33  | 69.73  | 76.31  | 78.75
PID        | CFNN   | MSE      | 0.0160 | 0.0352 | 0.0170 | 0.0142 | 0.0227 | 0.0398 | 0.196  | 0.014  | 0.0126
PID        | CFNN   | Hidden   | 2      | 41     | 16     | 25     | 47     | 86     | 45     | 1      | 15
PID        | ENN    | Accuracy | 75.04  | 73.026 | 74.38  | 65.39  | 73.68  | 73.42  | 58.68  | 68.33  | 56.84
PID        | ENN    | MSE      | 0.0556 | 0.0138 | 0.0279 | 0.107  | 0.0151 | 0.0177 | 0.0367 | 0.0701 | 0.176
PID        | ENN    | Hidden   | 3      | 12     | 61     | 68     | 33     | 18     | 15     | 28     | 59
PID        | HECFNN | Accuracy | 80.09  | 82.9   | 84.67  | 82.66  | 85.01  | 84.01  | 76.80  | 74.01  | 82
PID        | HECFNN | MSE      | 0.0118 | 0.0413 | 0.0441 | 0.0708 | 0.0267 | 0.0299 | 0.163  | 0.181  | 0.0511
PID        | HECFNN | Hidden   | 18     | 94     | 89     | 18     | 99     | 7      | 58     | 34     | 88

Summary of Table 2

Method | Metric   | Wine   | Ionosphere | Iris   | WBC    | Glass  | PID
CFNN   | Accuracy | 100.00 | 98.19      | 99.41  | 97.70  | 84.98  | 84.66
CFNN   | MSE      | 0.0202 | 0.0111     | 0.0019 | 0.0051 | 0.0045 | 0.0227
ENN    | Accuracy | 95.00  | 92.61      | 93.99  | 91.53  | 75.206 | 75.04
ENN    | MSE      | 0.0291 | 0.0321     | 0.0120 | 0.0031 | 0.0156 | 0.0556
HECFNN | Accuracy | 100    | 99.82      | 99.75  | 97.94  | 85.62  | 85.01
HECFNN | MSE      | 0.0166 | 0.0097     | 0.0028 | 0.0106 | 0.0156 | 0.0267

In the original typeset table, bold indicates the maximum or minimum accuracy and the corresponding MSE values.
1. Based on the Wine dataset:
– The CFNN achieved the best results using the LM algorithm, with 100% accuracy, an MSE of 0.0202 and 25 hidden neurons. The CFNN achieved its lowest results with the GDA algorithm, with 99.48%, 0.0501 and 4 for the accuracy, MSE and number of hidden neurons, respectively. As with LM, the SCG, CGB, CGF, CGP and GDM algorithms achieved 100% accuracy with slight differences in the MSE values. GDA and SCG show the lowest and highest numbers of neurons in the hidden layer, respectively.
– The ENN produced its highest accuracy of 95% and lowest MSE of 0.0291 with the LM algorithm, using 98 hidden neurons. The lowest accuracy is 89.33% with an MSE of 0.0363, achieved by the GDA algorithm using 33 neurons in the hidden layer. The minimum number of hidden neurons, 16, is achieved by the SCG algorithm.
– The HECFNN obtained its highest accuracy of 100%, lowest MSE of 0.0166 and 58 hidden neurons with the SCG algorithm, while the RP and GDA algorithms gave the same lowest accuracy of 99.82%. The other six algorithms that also achieved 100% classification accuracy are LM, CGB, CGF, CGP, GD and GDM.
– The CFNN, ENN and HECFNN all showed good performance when used to classify the Wine dataset; the CFNN and HECFNN achieved the maximum classification accuracy of 100%. The HECFNN showed better performance in terms of MSE, since it achieved a lower MSE. The other advantage of the HECFNN over the other two networks is the number of learning algorithms that obtained 100% accuracy: seven of the learning algorithms used with the HECFNN achieved 100%, compared with zero for the ENN and six for the CFNN. Among the three neural networks, the CFNN requires the fewest hidden neurons for its best performance (25), followed by the HECFNN with 58 neurons, and the ENN is last with 98 neurons.

2. Based on the Ionosphere dataset:
– The highest accuracy achieved using the CFNN is 98.19% with the SCG algorithm, associated with the lowest MSE of 0.0111 and 83 hidden neurons. The CFNN achieved its lowest accuracy of 64.25% and an MSE of 0.1646 with the GD algorithm. The rest of the learning algorithms achieved accuracies higher than 90%, except GD and GDM.
– The ENN produced its highest accuracy of 92.61% and lowest MSE of 0.0321 using the CGF algorithm with 55 hidden neurons. 71.64% and 0.0496 are the lowest accuracy and highest MSE, produced by the GDM algorithm. The GD and GDM algorithms achieved the worst accuracy results, with a very small difference between them (0.02%).
– The HECFNN obtained the highest accuracy of 99.82%, with a corresponding MSE of 0.0097 and 81 neurons in the hidden layer, for the LM algorithm. 78.81%, 0.1011 and 66 are the lowest accuracy, highest MSE and corresponding number of hidden neurons, achieved using the GDM learning algorithm.
– Comparing the three neural networks, the best performance achieved for the Ionosphere dataset is 99.82% accuracy and 0.0097 MSE using the HECFNN with the LM algorithm, followed by the CFNN and finally the ENN.

3. Based on the Iris dataset:
– The CFNN achieved its highest accuracy of 99.41% with an MSE of 0.0019 and 68 hidden neurons using the CGP algorithm. The GDM algorithm produced the lowest performance, with 82.17% accuracy, an MSE of 0.0033 and 92 neurons in the hidden layer. The CFNN trained using LM, RP, SCG, CGB and CGP achieved accuracies of 99% and above, while CGF, GD and GDA achieved results between 98.00 and 99.00%.
– The ENN achieved its best accuracy using LM, with 93.99%, an MSE of 0.0120 and 47 neurons in the hidden layer. The lowest performance for the ENN was obtained using GDM, i.e., 68.83%, 0.0129 and 49 for accuracy, MSE and hidden neurons, respectively. The ENN results show that GDM is the only algorithm with an accuracy below 93.0%.
– The HECFNN with the CGB algorithm produced the highest accuracy of 99.75%, with a corresponding MSE of 0.0028 and 91 hidden neurons. The HECFNN with GDM produced the lowest accuracy, i.e., 75.59%, with an MSE of 0.0056 and 24 hidden neurons. Moreover, of the nine learning algorithms used by the HECFNN to classify the Iris dataset, eight achieved accuracies higher than 98%.
– Comparing the three neural networks, the difference in the best accuracy among the CFNN, ENN and HECFNN is very small, as seen in Table 2. The CFNN and HECFNN showed better performance than the ENN for the nine learning algorithms. The HECFNN showed the best performance among the three networks in general, even though the CFNN showed better performance when trained with the LM, RP, CGP and GDM learning algorithms.

4. Based on the Wisconsin breast cancer dataset:
– The CFNN with LM obtained the best performance, i.e., 97.70% accuracy and an MSE of 0.0051. The GDM algorithm gives the lowest performance achieved by the CFNN, with 93.00% classification accuracy and an MSE of 0.0032.
– The ENN achieved its lowest accuracy of 83.08%, with a corresponding MSE of 0.0106, using the SCG algorithm. On the other hand, GDA achieved the best results, with 91.53% and 0.0031 for accuracy and MSE, respectively.
– The HECFNN achieved the best accuracy of 97.94% with an MSE of 0.0106 using the CGP algorithm, trained with 19 neurons in the hidden layer. GDM achieved the lowest performance, i.e., 96.40% accuracy and an MSE of 0.0184.
– The differences between the best performances obtained by the CFNN, ENN and HECFNN are small. The HECFNN achieved the best classification accuracy of 97.94%, followed by the CFNN with 97.70%, and the ENN is last with 91.53%. GDM gives the worst performance when used to train the HECFNN and CFNN.

5. Based on the Glass dataset:
– The CFNN showed the best classification accuracy of 84.98% and the lowest MSE of 0.0045 when trained by the LM algorithm with 64 hidden neurons. On the contrary, the CFNN achieved its lowest accuracy of 62.55% and an MSE of 0.0339 using the GDM algorithm trained with 51 neurons in the hidden layer.
– The ENN trained by SCG with 51 neurons in the hidden layer showed the best performance when used to classify the Glass dataset; the accuracy achieved is 75.206% and the MSE is 0.0156.
– The HECFNN classified the Glass dataset with the highest accuracy of 85.62% and the lowest MSE of 0.0156, using the SCG algorithm with 53 neurons in the hidden layer.
– The results obtained for the CFNN, ENN and HECFNN show that the HECFNN had the best accuracy, 85.62%, achieved by training with the SCG algorithm and 53 hidden neurons. On the other hand, the lowest performance was achieved by training the ENN with the GDM algorithm.

6. Based on the Pima Indians diabetes dataset:
– The CFNN has its best classification accuracy of 84.66% after training with the CGF algorithm. In contrast, the lowest accuracy of 69.73% and an MSE of 0.196 were achieved by the GD algorithm. The ENN has its highest and lowest accuracy after training with LM and GDA, with accuracies of 75.04% and 56.84%, respectively. Moreover, the ENN showed the lowest performance among the three networks for all nine learning algorithms. The HECFNN obtained the best performance, i.e., 85.01% accuracy and an MSE of 0.0267, using CGF with 99 neurons in the hidden layer. On the other hand, 74.01% is the lowest accuracy, achieved with the GDM algorithm together with the highest MSE of 0.181. Furthermore, the HECFNN performs better than the ENN and CFNN regardless of the learning algorithm, except for the GDM algorithm, which performs better when used to train the CFNN.
– Comparing the results of the three networks in Table 2, the ENN has the lowest accuracy, 56.84%, among the three networks, and it showed the lowest performance among the three neural networks for all nine learning algorithms. The performance obtained using the HECFNN has higher accuracy than the ENN and CFNN for most of the learning algorithms used. Nevertheless, the HECFNN needs more neurons in the hidden layer than the CFNN and ENN when trained by most of the algorithms.

Table 3 shows the most optimum learning algorithm used to train the HECFNN, with the corresponding classification accuracy.

Table 3 HECFNN best classification accuracy with corresponding learning algorithm

HECFNN             | Wine    | Ionosphere | Iris   | WBC    | Glass  | PID
Learning algorithm | SCG     | LM         | CGB    | CGP    | SCG    | CGF
Accuracy           | 100.00% | 99.82%     | 99.75% | 97.94% | 85.62% | 85.01%
6.2 Comparison of HECFNN Results with the Average Results of Different Methods

The performance of the newly proposed HECFNN was compared against the ENN and CFNN in the previous section. In this section, the performance of the HECFNN is compared against the average performance achieved by the 11 classification techniques reported by Hoang [4]. The training and testing of the HECFNN followed the same procedure used by Hoang, i.e., 20% and 80% of the data samples were used for testing and training, respectively. The experiment was also conducted 10 times, with the results averaged. The results in Table 4 show that the performance obtained using the proposed HECFNN is higher than the average performance obtained by the other 11 methods for the six benchmark datasets. The best performance achieved by Hoang for each dataset was as follows: Wine 95.41% by Nevprop, Ionosphere 93.65% by ITI, Iris 95.45% by LMDT, WBC 96.38% by K5, Glass 74.78% by Q* and finally PID 73.51% by LMDT. The results achieved by the HECFNN for the six datasets are Wine 100%, Ionosphere 99.82%, Iris 99.75%, WBC 97.94%, Glass 85.62% and finally PID 85.01%.

Table 4 Comparison of the testing accuracy (%) of HECFNN with the models published in [4]

Methods    | Wine   | Ionosphere | Iris  | WBC   | Glass  | PID
RBF        | 67.87  | 87.60      | 85.64 | 94.89 | 69.54  | 70.57
Q*         | 74.35  | 89.70      | 92.10 | 95.46 | 74.78  | 68.50
K5         | 69.49  | 85.91      | 91.94 | 96.38 | 69.09  | 71.37
Nevprop    | 95.41  | 83.80      | 90.34 | 95.05 | 44.08  | 68.52
OC1        | 87.31  | 88.29      | 93.89 | 93.24 | 57.72  | 50.00
LVQ        | 68.90  | 88.58      | 92.55 | 94.82 | 60.69  | 71.28
CN2        | 91.09  | 90.98      | 91.92 | 94.39 | 70.23  | 72.19
LMDT       | 95.40  | 86.89      | 95.45 | 95.74 | 60.59  | 73.51
ITI        | 91.09  | 93.65      | 91.25 | 91.14 | 67.49  | 73.16
C4.5 rules | 91.90  | 91.82      | 91.58 | 94.68 | 67.96  | 71.55
C4.5       | 91.09  | 91.56      | 91.60 | 94.25 | 70.23  | 71.02
CFNN       | 100.00 | 98.19      | 99.41 | 97.70 | 84.98  | 84.66
ENN        | 95.00  | 92.61      | 93.99 | 91.53 | 75.206 | 75.04
HECFNN     | 100.00 | 99.82      | 99.75 | 97.94 | 85.62  | 85.01

In the original typeset table, the best results are highlighted in boldface type.
6.3 Comparison with the Optimum Results

The maximal reported performance, the corresponding classification techniques and the references for the six datasets are listed in Table 5. The maximal reported performance for the six benchmark datasets is compared with that of the HECFNN. It is clearly seen that the HECFNN obtained better results for five of the datasets: Wine, Ionosphere, Iris, Glass and PID. WBC is the only dataset for which the HMLP with MRLS technique proposed by Al-Batah et al. [29] achieved a better accuracy, 99.82%, compared with the 97.94% achieved by the HECFNN. The comparisons in Sects. 6.1, 6.2 and 6.3 show that the proposed HECFNN is stable and produces good performance with low MSE.

Table 5 Maximum accuracy (%) of the six datasets in related references, used for comparison with the proposed HECFNN

Methods                 | Wine  | Ionosphere | Iris  | WBC   | Glass | PID
QDA [32]                | 99.40 | –          | –     | –     | –     | –
3-NN + simplex [3]      | –     | 98.70      | –     | –     | –     | –
C-MLP2LN [33]           | –     | –          | 98.00 | –     | –     | –
C-MLP2LN [33]           | –     | –          | –     | 99.00 | –     | –
Adaptive metric NN [34] | –     | –          | –     | –     | 75.20 | –
Logdisk [35]            | –     | –          | –     | –     | –     | 77.70
HMLP with MRLS [29]     | 99.94 | 96.37      | 99.62 | 99.82 | 77.27 | 83.22
MACS-CBS [36]           | –     | –          | –     | 97.56 | 83.32 | 76.58
HECFNN                  | 100   | 99.82      | 99.75 | 97.94 | 85.62 | 85.01
7 Conclusions

The CFNN and ENN represent two major branches of neural networks, i.e., static and dynamic ANNs, respectively. An integration of the CFNN and the ENN, called HECFNN, is proposed in this paper. The proposed HECFNN is applied to classification and prediction tasks, with six benchmark datasets selected as case studies. The results have been compared with those of the CFNN, ENN and the other methods reported by Hoang [4]. The performance of the proposed hybrid system is promising in terms of accuracy. From the results obtained, the HECFNN has better prediction capability than the standard CFNN and ENN. In addition, the proposed system yields improvements in accuracy compared with the other methods reported by Hoang on the benchmark problems. The proposed HECFNN improved the prediction accuracy by 4.59, 6.17, 4.30, 1.56, 10.84 and 11.50% over Nevprop, ITI, LMDT, K5, Q* and LMDT for the Wine, Ionosphere, Iris, WBC, Glass and PID benchmark datasets, respectively.
References

1. McCulloch, W.S.; Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943)
2. Wong, K.P.: Artificial intelligence and neural network applications in power systems. In: 2nd International Conference on Advances in Power System Control, Operation and Management (APSCOM-93) (1993)
3. Datasets used for classification: Comparison of results. Department of Informatics, Nicolaus Copernicus University (2008). http://www.fizyka.umk.pl/kmk/projects/datasets.html
4. Hoang, A.: Supervised classifier performance on the UCI database. Master Thesis, Department of Computer Science, University of Adelaide, Adelaide (1997)
5. Eklund, P.; Hoang, A.: A comparative study of public domain supervised classifier performance on the UCI database. Research Online (2006)
6. Eklund, P.; Hoang, A.: A performance survey of public domain machine learning algorithms. Technical Report, School of Information Technology, Griffith University (2002)
7. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan, Basingstoke (1994)
8. Kim, D.-S.; Lee, S.-Y.: Intelligent judge neural network for speech recognition. Neural Process. Lett. 1(1), 17–20 (1994)
9. Liu, Y.; et al.: Speech recognition using dynamic time warping with neural network trained templates. In: International Joint Conference on Neural Networks (IJCNN) (1992)
10. Elman, J.L.: Finding structure in time. Cognit. Sci. 14, 179–211 (1990)
11. Demuth, H.; Beale, M.H.; Hagan, M.T.: Neural Network Toolbox User's Guide. The MathWorks, Inc., Natick (2009)
12. Abdul-Kadir, N.A.; et al.: Applications of cascade-forward neural networks for nasal, lateral and trill Arabic phonemes. In: 2012 8th International Conference on Information Science and Digital Content Technology (ICIDT) (2012)
13. Lashkarbolooki, M.; Shafipour, Z.S.: Trainable cascade-forward back-propagation network modeling of spearmint oil extraction in a packed bed using SC-CO2. J. Supercrit. Fluids 73, 108–115 (2013)
14. Khatib, T.; Mohamed, A.; Sopian, K.; Mahmoud, M.: Assessment of artificial neural networks for hourly solar radiation prediction. Int. J. Photoenergy 2012, Article ID 946890 (2012). doi:10.1155/2012/946890
15. Sumit, G.; Kumar, G.G.: Cascade and feedforward backpropagation artificial neural network models for prediction of sensory quality of instant coffee flavoured sterilized drink. Can. J. Artif. Intell. Mach. Learn. Pattern Recognit. 2(6), 78–82 (2011)
16. Al-Allaf, O.N.A.; Tamimi, A.A.; Mohammad, M.A.: Face recognition system based on different artificial neural networks models and training algorithms. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 4(6), 40–47 (2013)
17. Al-Allaf, O.N.A.: Cascade-forward vs. function fitting neural network for improving image quality and learning time in image compression system. In: Proceedings of the World Congress on Engineering, vol. 2 (2012)
18. Geiger, H.: Storing and processing information in connectionist systems. In: Eckmiller, R. (ed.) Advanced Neural Computers, pp. 271–277. North-Holland, Amsterdam (1990)
19. Poggio, T.; Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1496 (1990)
20. Mashor, M.Y.: Hybrid multilayered perceptron networks. Int. J. Syst. Sci. 31(6), 771–785 (2000)
21. Newman, D.J.; Hettich, S.; Blake, C.L.; Merz, C.J.; Aha, D.W.: UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine (1998). http://archive.ics.uci.edu/ml/datasets.html
22. Gao, X.Z.; Gao, X.M.; Ovaska, S.J.: A modified Elman neural network model with application to dynamical systems identification. In: IEEE International Conference on Systems, Man, and Cybernetics (1996)
23. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989)
24. Patuwo, E.; Hu, M.Y.; Hung, M.S.: Two-group classification using neural networks. Decis. Sci. 24(4), 825–845 (1993)
25. Liu, C.; Jiang, D.; Zhao, M.: Application of RBF and Elman neural networks on condition prediction in CBM. In: Wang, H.; et al. (eds.) The 6th International Symposium on Neural Networks (ISNN 2009), pp. 847–855. Springer, Berlin (2009)
26. Wang, L.; et al.: An improved OIF Elman neural network and its applications to stock market. In: Gabrys, B., Howlett, R., Jain, L. (eds.) Knowledge-Based Intelligent Information and Engineering Systems, pp. 21–28. Springer, Berlin (2006)
27. Baum, E.B.; Haussler, D.: What size net gives valid generalization? Neural Comput. 1, 151 (1989)
28. Isa, N.A.M.; et al.: Suitable features selection for the HMLP and MLP networks to identify the shape of aggregate. Constr. Build. Mater. 22(3), 402–410 (2008)
29. Al-Batah, M.S.; et al.: Modified recursive least squares algorithm to train the hybrid multilayered perceptron (HMLP) network. Appl. Soft Comput. 10(1), 236–244 (2010)
30. Mat-Isa, N.A.; Mashor, M.Y.; Othman, N.H.: An automated cervical pre-cancerous diagnostic system. Artif. Intell. Med. 42(1), 1–11 (2008)
31. Al-Batah, M.S.; et al.: A novel aggregate classification technique using moment invariants and cascaded multilayered perceptron network. Int. J. Miner. Process. 92(1–2), 92–102 (2009)
32. Aeberhard, S.; Coomans, D.; de Vel, O.: Comparison of classifiers in high dimensional settings. Technical Report 92-02, Department of Computer Science and Department of Mathematics and Statistics, James Cook University of North Queensland (1992)
33. Duch, W.; Adamczak, R.; Grabczewski, K.: A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Trans. Neural Netw. 12, 277–306 (2001)
34. Domeniconi, C.; Peng, J.; Gunopulos, D.: An adaptive metric machine for pattern classification. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 458–464. MIT Press (2001)
35. Michie, D.; Spiegelhalter, D.; Taylor, C.C. (eds.): Machine Learning, Neural and Statistical Classification. Ellis Horwood, London (1994)
36. Mohammed, M.; Lim, C.; Quteishat, A.: A novel trust measurement method based on certified belief in strength for a multi-agent classifier system. Neural Comput. Appl. 24, 421–429 (2014)