Vol.26 No.3
JOURNAL OF ELECTRONICS (CHINA)
May 2009
RESEARCH AND APPLICATION OF A NEURAL NETWORK CLASSIFIER BASED ON DYNAMIC THRESHOLD

Zhang Li  Luo Jianhua  Yang Suying
(School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, China)

Abstract  In this study, a Multi-Layer BP neural network (MLBP) with dynamic thresholds is employed to build a classifier model. In designing the network structure, theoretical guidance is combined with extensive experiments to optimize the hidden-layer parameters, namely the number of hidden layers and the number of nodes in each. Dynamic thresholds are used to standardize the classifier output for the first time, which substantially improves the robustness of the model. Finally, the classifier is applied to forecast the box office revenue of a movie before its theatrical release. Comparison with the MLP method shows that the MLBP classifier model achieves more satisfactory results and solves the problem more reliably and effectively.

Key words  Neural network classifier; Dynamic threshold; Forecasting; Box office revenue

CLC index  TP183

DOI  10.1007/s11767-008-0028-5

Manuscript received March 18, 2008; revised June 24, 2008. Supported by the National Natural Science Foundation of China (No. 60573172). Corresponding author: Zhang Li, born in 1971, male, Ph.D., School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, China. Email: [email protected].
I. Introduction

BP neural networks are the most widely used networks and are considered the workhorse of ANNs[1]. Because of their simplicity and their power to extract useful information from samples, the BP model has recently become the most widely applied[2]. It allows multiple input criteria to be specified and multiple output recommendations to be generated, with no assumption about the form of the functions relating the input and output variables. The BP model thus eliminates the limitations of regression methods and establishes an accurate mapping between the input and output variables[3]. Owing to their strong learning ability and generalization capability, BP networks have been used in a great many domains, especially in classification and prediction. Researchers have found that the BP model displays more robust performance than other models in classification problems[4]. BP networks have also been used successfully to forecast financial problems, for example, predicting stock market returns and price indices[5], loan risk warning[6], forecasting firm bankruptcy[7], as well as in decision support systems and management science[8].

In our study, we use the Multi-Layer BP neural network (MLBP) with dynamic thresholds to build the classifier. This classification method is used here for the first time. In the training stage, the BP algorithm for multi-layer neural networks is employed as the learning method, but in the classification stage we use a transfer function, called the dynamic threshold function, which differs from the traditional ones and greatly improves the classifier's generalization ability. After designing the classifier, we use it to forecast a movie's box office revenue before its theatrical release, which is also the first attempt to apply a neural network to this field in China. The results we obtain are acceptable and better than those of the traditional MLP method.
II. Classification Method
1. The structure of the MLBP classifier

When we design a neural network, the design of the hidden layers is a difficult and complex problem, especially fixing the number of hidden layers and the number of nodes in each. Cybenko[9] proved that when every Processing Element (PE) uses a sigmoid transfer function, one hidden layer is enough to solve any discriminant classification problem, and two hidden layers are capable of representing arbitrary output functions of the input pattern. Lippmann[10] estimated the number of hidden units from his geometric interpretation of multilayer network functions: he pointed out that the node number of the second hidden layer should be M×2, where M is the node number of the output layer. For the first hidden layer, the best ratio of its size to that of the second hidden layer is 3:1 when the input vector is high dimensional[11]. But the number of hidden layers and their unit counts should also be determined by statistical estimation and actual experiment[12]. In this classifier, two hidden layers with sigmoid transfer functions are used based on the theories above, and our extensive experiments also showed that two hidden layers yield better results than one. The classifier model was thus established successfully; its structure is shown in Fig.1.
Fig.1 The structure of the classifier
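For illustration, the following is a minimal sketch of such a two-hidden-layer structure in Python with scikit-learn; the paper's own implementation uses MATLAB 7.0, so this is only an assumed equivalent. The hidden-layer sizes (30, 10) are the ones given in Section III.1; the learning rate and variable names are placeholders.

from sklearn.neural_network import MLPClassifier

# Two sigmoid hidden layers (30 and 10 nodes, the sizes given in
# Section III.1) in front of the output layer; scikit-learn adds the
# output layer automatically from the number of distinct labels in y.
mlbp = MLPClassifier(
    hidden_layer_sizes=(30, 10),  # first and second hidden layers
    activation="logistic",        # sigmoid transfer function
    solver="sgd",                 # plain gradient descent, as in classic BP
    learning_rate_init=0.1,       # placeholder learning rate
    max_iter=2000,
)
# mlbp.fit(X_train, y_train) would then train the network by
# back-propagation, with X_train the movie feature vectors and
# y_train the class labels 1..6.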
2. Training algorithm with dynamic thresholds

In traditional classification methods, the most commonly used threshold functions are the hard limiting function, the symmetrical hard limiting function, the sigmoid function, and the hyperbolic tangent sigmoid function, shown as Eqs.(1)~(3).

f(x) = \begin{cases} 0, & x < \theta \\ 1, & x \geq \theta \end{cases}   (1)

f(x) = \frac{1}{1 + e^{-x}}   (2)

f(x) = \frac{2}{1 + e^{-x}} - 1   (3)
Here Eq.(1) is the symmetrical hard limiting function; when θ = 0, it becomes the hard limiting function. Both have a fixed threshold value. Eq.(2) is the sigmoid function, which limits the output to the interval (0, 1). Eq.(3) is the hyperbolic tangent sigmoid function, whose output range is (–1, 1).

In our study, the BP algorithm for multi-layer neural networks is employed to train the network[13], but in the output layer we use a different transfer function with dynamic thresholds to translate every output vector into standard classification data. The transformation function is shown as Eq.(4).

f(y_i) = \begin{cases} 1, & y_i = \mathrm{Max}[y_1, y_2, \ldots, y_m] \\ 0, & \text{others} \end{cases}   (4)
in which i = 1, 2, …, m, and Y = [y_1 y_2 … y_m] is the neural network output vector for an input. Here we set the maximum output to 1 and the others to 0. In classification problems, especially multiclass ones, the neural network model usually has multiple outputs, and each output node stands for a certain class. In an output vector, we generally expect the single positive class to have a positive value (usually 1) and the rest to have a negative value (such as 0 or –1). However, the first two threshold functions above have a fixed threshold value θ, so the network must set only the correct output above θ and all others below θ; otherwise we cannot obtain correct classification data, and the training performance requirement is therefore very strict. The sigmoid function and the hyperbolic tangent sigmoid function cannot produce standard classification data either, and they also demand high training accuracy. The dynamic threshold function, by contrast, accords with the way the BP method trains a neural network classifier, which drives the outputs in two opposite directions: the positive output approaches 1 and the rest approach 0. The threshold function simply selects the largest output as the positive class output, so the correct standard output data can always be obtained. In the application section below, the experimental data show that this method significantly improves classification performance and does not need high training precision.
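As a concrete illustration, here is a minimal Python sketch of the threshold functions of Eqs.(1)~(4); the function names are our own, and the paper's implementation is in MATLAB 7.0.

import numpy as np

def hard_limiting(x, theta=0.0):
    # Eq.(1): fixed threshold theta; output is 0 below theta, 1 at or above it
    return np.where(x < theta, 0.0, 1.0)

def sigmoid(x):
    # Eq.(2): squashes the output into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh_sigmoid(x):
    # Eq.(3): hyperbolic tangent sigmoid, output range (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def dynamic_threshold(y):
    # Eq.(4): the largest component of the output vector becomes 1,
    # all other components become 0 (ties resolved to the first maximum)
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    out[np.argmax(y)] = 1.0
    return out

# Example: a raw network output far from the ideal 0/1 targets is still
# standardized to the correct class code.
print(dynamic_threshold([0.31, 0.74, 0.52, 0.08, 0.11, 0.02]))
# -> [0. 1. 0. 0. 0. 0.]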
III. Application in Forecasting Box Office Revenue

1. Experiment method
Forecasting the box office revenue of a movie before its theatrical release is a difficult and challenging problem. In this study, box office data for the two years 2005 and 2006 were obtained from the Wanda Cinema Line Company in China, and 241 movies remained after preprocessing the original data. These movies were divided into six categories ranging from "blob" to "bomb" according to their box office incomes, and the goal is to predict the right class for a film. The classification rule is detailed in Tab.1. The input variables were selected on the basis of a market survey, and their weight values were determined by statistical methods. As to the structure parameters of the classifier, 30 nodes are given to the first hidden layer and 10 nodes to the second one.

Tab.1  Movie classes

Class no.   Range (×10^4 RMB)   Number of movies
1           (blob, 10)           33
2           [10, 30)             47
3           [30, 80)             49
4           [80, 140)            46
5           [140, 200)           35
6           [200, bomb)          31

Reliable estimates of classification accuracy are important, not only for estimating the true accuracy of a classifier, but also for finding the best classifier from a set of competing ones (model selection). There is no universal learning algorithm that gives the best performance in all possible learning situations[14]. In this paper we adopt a deterministic approach to k-fold cross-validation that constructs representative rather than random folds. The cross-validation estimate of the overall accuracy is calculated simply as the average of the k individual accuracy measures. With these methods we attempt to reduce the effects of using fewer instances for training. In our study, all samples are divided into six stratified groups, so six-fold cross-validation is used.

2. Performance indexes

We used the percentage success rate to measure the predictive performance of our neural network approach; the main performance indexes are absolute accuracy and relative accuracy. Both are Average Percent Hit Rates (APHR)[15]. The absolute accuracy is the exact (Bingo) hit rate, and the relative accuracy is the within-one-class (1-Away) hit rate, which also counts a movie predicted into an adjacent class. Algebraically, APHR can be formulated by Eqs.(5)~(7).

\mathrm{APHR} = \frac{\text{Correctly classified number}}{\text{Total number of samples}}   (5)

\mathrm{Bingo} = \frac{1}{n} \sum_{i=1}^{C} p_i   (6)

\mathrm{1\text{-}Away} = \frac{1}{n} \sum_{i=1}^{C} \left( p_{i-1} + p_i + p_{i+1} \right)   (7)

where C is the total number of classes (here 6), n is the number of samples belonging to class i, p_i is the number of those samples classified as class i, and p_i = 0 if i < 1 or i > 6.
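As a worked illustration of Eqs.(5)~(7), the sketch below computes the per-class Bingo and 1-Away hit rates from a confusion matrix. This is our reading of the formulas (row i of the confusion matrix divided by the class-i sample count, which reproduces per-class rates of the kind reported in Tab.2); the function name and the example matrix are hypothetical.

import numpy as np

def hit_rates(conf):
    # conf[i, j] = number of class-(i+1) samples predicted as class (j+1).
    # Returns per-class Bingo (Eq.(6)) and 1-Away (Eq.(7)) rates;
    # out-of-range terms (i < 1 or i > C) are treated as 0.
    conf = np.asarray(conf, dtype=float)
    C = conf.shape[0]
    n = conf.sum(axis=1)               # samples belonging to each class
    bingo = np.diag(conf) / n          # exact hits
    one_away = np.array([conf[i, max(0, i - 1):min(C, i + 2)].sum() / n[i]
                         for i in range(C)])
    return bingo, one_away

# Hypothetical 3-class confusion matrix, rows = actual, columns = predicted.
conf = np.array([[8, 2, 0],
                 [1, 7, 2],
                 [0, 3, 7]])
bingo, one_away = hit_rates(conf)
print(bingo.mean(), one_away.mean())   # average Bingo and 1-Away rates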
3. Results

In our study, MATLAB 7.0 is used to implement the neural network model and the algorithm. MATLAB is a powerful simulation platform developed by MathWorks and is well suited to simulating intelligent algorithms and solving complex problems. In classification problems, a confusion matrix is commonly used to represent the results; this representation is intuitive and easy to understand, so we also use it to present our results. Tab.2 shows the aggregated six-fold cross-validation results of the neural network classifier as a confusion matrix.

Tab.2  The BP neural network classification results

Actual       MLBP predicted categories                 Bingo (%)   1-Away (%)
categories    1     2     3     4     5     6
1            24     6     3     0     0     0           72.7        90.9
2             9    25    13     0     0     0           53.2       100
3             3    10    26    13     0     0           53.1        93.9
4             0     1     5    34     6     0           73.9        97.8
5             0     0     0     4    24     7           68.6       100
6             0     0     0     0     4    27           87.1       100
Average       –     –     –     –     –     –           68.1        97.1
Ramesh Sharda and Dursun Delen attempted to predict box office revenue with an MLP method[15]. We also applied that method; the aggregated MLP results are shown in Tab.3. The results of both methods are also presented as bar charts in Fig.2, which shows the Bingo and 1-Away accuracies respectively. As the results indicate, our classifier on average achieves significantly better classification accuracy than the MLP method.

Tab.3  The results of the MLP method

Actual       MLP predicted categories                  Bingo (%)   1-Away (%)
categories    1     2     3     4     5     6
1            10    11     3     6     1     2           30.3        63.6
2             3    15    17     7     5     0           31.9        74.5
3             1    11    16    13     4     4           32.7        81.6
4             1     3     2    20    13     7           43.5        76.1
5             0     3     4     8    15     5           42.9        80.0
6             0     0     0     4     9    18           58.1        87.1
Average       –     –     –     –     –     –           39.9        77.2

Fig.2 Comparison results of different methods

In the classifier, we utilize dynamic thresholds rather than fixed ones to standardize the output: we set the maximum output of the neural network to 1 and the others to 0. The main advantage of this classification method is that its generalization ability remains strong even when the training precision is not very high. The training accuracy (P_Train) is computed by Eq.(8).

P_{\mathrm{Train}} = \frac{\text{Right training sample number}}{\text{Total number of training samples}}   (8)
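A minimal sketch of Eq.(8) follows, assuming a training sample counts as "right" when its dynamically thresholded output (Eq.(4)) matches its one-hot target; the function name is ours.

import numpy as np

def p_train(outputs, targets):
    # Eq.(8): share of training samples whose standardized output
    # (Eq.(4), i.e. the winning output node) matches the target class.
    pred = np.argmax(outputs, axis=1)   # dynamic threshold per sample
    true = np.argmax(targets, axis=1)   # targets given as one-hot vectors
    return np.mean(pred == true)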
Finally, we ran simulations at different training precisions; the results are shown in Fig.3, from which we can see that our model is highly robust and achieves high forecasting performance even when the training accuracy is not very high. In our study, the best training accuracy is 95.13%, which corresponds to the final results.

Fig.3 Comparison results of the MLBP classifier on different training accuracies

IV. Conclusion and Discussion

In this paper, a neural network classifier with dynamic thresholds is proposed and applied to predict the box office success of a movie before its theatrical release. The results of this study are very attractive and demonstrate the continuing value of BP neural networks in solving difficult classification problems. Beyond the accuracy of our results in predicting box office success, the classifier model could also be adapted to forecast the success rates of other media products. From an application perspective, once developed into a production system or a commercial software platform, such a classifier model could be made available (via a web server or an application service provider) to industry decision makers, where individual users could plug in their own movie parameters to forecast the potential success of a motion picture before its theatrical release. Recently, researchers have been developing knowledge reduction methods, for example rough set theory and genetic algorithms, to optimize the parameters of neural networks; applying such methods could improve the results obtained in this study. Much additional work remains to be done in terms of modeling extensions, further performance experiments, and applications to demand forecasting for other media products, and this is the direction of our future research.
References

[1] I. A. Basheer and M. Hajmeer. Artificial neural networks: fundamentals, computing, design, and application. Journal of Microbiological Methods, 43(2000)1, 3–31.
[2] Qiang Li, Jing-Yuan Yu, Bai-Chun Mu, and Xu-Dong Sun. BP neural network prediction of the mechanical properties of porous NiTi shape memory alloy prepared by thermal explosion reaction. Materials Science and Engineering: A, 419(2006)1/2, 214–217.
[3] Subhash Kak. A class of instantaneously trained neural networks. Information Sciences, 148(2002)1/4, 97–102.
[4] Colin O. Benjamin, Sheng-Chai Chi, Tarek Gaber, and Catherine A. Riordan. Comparing BP and ART II neural network classifiers for facility location. Computers & Industrial Engineering, 28(1995)1, 43–50.
[5] Tae Hyup Roh. Forecasting the volatility of stock price index. Expert Systems with Applications, 33(2007)4, 916–922.
[6] Baoan Yang, Ling X. Li, Hai Ji, and Jing Xu. An early warning system for loan risk assessment using artificial neural networks. Knowledge-Based Systems, 14(2001)5/6, 303–306.
[7] Kidong Lee, David Booth, and Pervaiz Alam. A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications, 29(2005)1, 1–16.
[8] D. Delen, R. Sharda, and P. Kumar. Movie forecast Guru: A web-based DSS for Hollywood managers. Decision Support Systems, 43(2007)4, 1151–1170.
[9] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(1989), 303–314.
[10] R. P. Lippmann. Pattern classification using neural networks. IEEE Communications Magazine, 27(1989)11, 47–64.
[11] S. Y. Kung and J. N. Hwang. An algebraic projection analysis for optimal hidden units size and learning rate in back-propagation learning. IEEE International Conference on Neural Networks, San Diego, CA, July 24–27, 1988, Vol.1, 363–370.
[12] Yoshio Hirose, Koichi Yamashita, and Shimpei Hijiya. Back-propagation algorithm which varies the number of hidden units. Neural Networks, 4(1991)1, 61–66.
[13] Song Yibin. Quick training method for multi-layer BP neural network and its application. Control and Decision, 15(2000)1, 125–127 (in Chinese).
[14] C. Schaffer. A conservation law for generalization performance. Proc. 11th International Conference on Machine Learning, San Mateo, CA, July 10–15, 1994, 259–267.
[15] Ramesh Sharda and Dursun Delen. Predicting box-office success of motion pictures with neural networks. Expert Systems with Applications, 30(2006)2, 243–254.