Australas Phys Eng Sci Med (2014) 37:439–456 DOI 10.1007/s13246-014-0264-y
TECHNICAL PAPER
Optimal selection of mother wavelet for accurate infant cry classification

J. Saraswathy · M. Hariharan · Thiyagar Nadarajaw · Wan Khairunizam · Sazali Yaacob

Received: 31 July 2013 / Accepted: 19 March 2014 / Published online: 2 April 2014
© Australasian College of Physical Scientists and Engineers in Medicine 2014
Abstract Wavelet theory is emerging as one of the prevalent tools in signal and image processing applications. However, the most suitable mother wavelet for these applications remains an open question amongst researchers. Selecting the best mother wavelet through parameterization leads to better findings than random selection. The objective of this article is to compare the performance of the existing families of mother wavelets and to select the most suitable mother wavelet for accurate infant cry classification. The optimal wavelet is found using three different criteria, namely the degree of similarity of the mother wavelets, the regularity of the mother wavelets and the accuracy of correct recognition during the classification processes. Recorded normal and pathological infant cry signals are decomposed into five levels using the wavelet packet transform. Energy and entropy features are extracted at different sub-bands of the cry signals and their effectiveness is tested with four supervised neural network architectures. The findings of this study show that the finite impulse response based approximation of Meyer is the best wavelet candidate for accurate infant cry classification analysis.

Keywords Infant cries · Mother wavelets · Similarity · Regularity · Classification accuracy
J. Saraswathy (corresponding author) · M. Hariharan · W. Khairunizam · S. Yaacob
School of Mechatronic Engineering, University Malaysia Perlis (UniMAP), Campus Pauh Putra, 02600 Arau, Perlis, Malaysia
e-mail: [email protected]

T. Nadarajaw
Department of Pediatrics, Hospital Sultanah Bahiyah, 05460 Alor Setar, Kedah, Malaysia
Introduction

Crying is a form of biological siren for an infant. It is their only means of communication, and infants generally attract the attention of their vicinity by crying to express their needs. Naturally, the cry is highly non-deterministic and carries numerous levels of information about an infant, as shown in Fig. 1 [1]. Consequently, identifying the exact purpose of a cry signal is a confusing task. Investigations of newborn cry signals were previously performed mainly to detect the pathological status of newly born infants using various conventional methods: auditory analysis, one of the more common methods in infant cry recognition, in which the main tool is the human ear, which can distinguish different types of signals after some repetition and experience; time domain analysis, a discrimination method which uses time-domain features of the signals such as latency and amplitude; frequency domain analysis, classification based on the frequency information of the signals; and spectrographic analysis, an amalgamation of time and frequency domain analysis which has been an imperative tool in the acoustic analysis of infant cry [2]. Although these existing methods have made noteworthy impacts in the infant cry classification area, they are based entirely on subjective evaluation, require good expertise, and are intangible and time consuming. Figure 1 briefly illustrates the drawbacks and limitations of the aforementioned conventional methods. Hence, for immaculate diagnosis, the need for automatic classification of infant cry signals is growing rapidly owing to its significant advantages. In a fully automated system, the diagnostic judgment and results will be accurate and fast, and not limited by the quantity of infant cry signals under diagnosis. Manual inspection by experts
Fig. 1 Diverse levels of information conveyed in infant cry (weight, health, identity, first cry, gender, emotions, preterm vs full-term, pathology) and the existing conventional classification methods; auditory analysis is not reproducible, provides only a meager proportion of the information, and its rate of correct classification depends highly on experience and expertise
will no longer be required. Moreover, the approach is easily repeatable, completely harmless and not distressing to the infant. This non-invasive method has been widely used in infant cry signal analysis and has shown very promising results. In the development of automated infant cry classification systems, considerable research has been carried out, successfully detecting certain pathological conditions among newborn babies such as brain damage [3], cleft palate [4], hydrocephalus [5], sudden infant death syndrome [6] and others [7, 8]. Recently, the classification of two or three classes of infant cry signals, especially normal, asphyxia and deaf cries, has received particular attention. This is because asphyxia is a type of respiratory disorder which may cause perilous long-lasting problems such as cerebral palsy, mental retardation, and speaking, hearing, visual and
(Fig. 1, continued: time domain analysis provides limited, only time-based information; frequency domain analysis provides a crude representation of the frequency spectrum without time-based information; both require analysis by an expert; spectrographic analysis requires manual and wary inspection by an expert and is restricted by the large quantity of signals under analysis)
learning disabilities, and even fatality, if not subjected to early diagnosis and treatment. According to World Health Organization forecasts, 4 to 9 million cases of newborn asphyxia are reported annually worldwide, and 20 % of all newborn deaths are attributed to this condition [9, 10]. In addition, deafness or 'hypo-acoustic' disorder, defined as an insufficiency of hearing ability, may hinder a child's learning and development, especially in school life, if not subjected to early diagnosis and treatment [11]. Rosales et al. [12] investigated the application of a fuzzy relational neural network (FRNN) in discriminating the extracted Mel frequency cepstral coefficients (MFCCs) of normal and asphyxia cry signals and achieved a best accuracy of 88.67 %. Zabidi et al. [13] presented an analysis of binary particle swarm optimization for selection of MFCCs
in the recognition of infant cries with asphyxia. The highest correct recognition rate of 95.07 % was reported using a multilayer perceptron (MLP) neural network trained with the scaled conjugate gradient algorithm. Wavelet packet transform (WPT) based features were used to characterize normal and pathological (asphyxia) infant cry signals; that study reported an optimal recognition rate of 99 % using a probabilistic neural network (PNN) [14]. Rosales et al. [15] analyzed the effectiveness of their proposed genetic selection of a fuzzy model (GSFM), which was modeled with an optimized combination of feature selection method, type of fuzzy processing and learning algorithm using a genetic algorithm, in discriminating the MFCCs extracted from normal and asphyxia cries; the best diagnosis accuracy of their proposed method was 90.68 %. PNN and general regression neural network (GRNN) classifiers were used to classify normal and asphyxia cries using time-frequency based statistical features, and 99 % was reported as the best achieved accuracy, employing principal component analysis (PCA) as the feature reduction method [16]. A maximum accuracy of 97.55 % was presented in the discrimination of normal and deaf infants using MFCCs and FRNN [12]. Time-frequency based statistical features were proposed for the automatic classification of normal and deaf cry signals, and the best performance of the proposed features was reported as 99 % using the GRNN classifier [17]. MFCCs and GSFM, designed with an optimal combination of feature selection method, fuzzy processing type and learning algorithm, were implemented to distinguish normal and deaf cries, resulting in an optimum accuracy of 99.42 % [15]. Hariharan et al. [9] developed a method based on weighted linear prediction cepstral coefficients (WLPCC) and PNN for the detection of normal and pathological (asphyxia and deaf) status from infant cry signals.
Due to the highly non-stationary characteristics of infant cry signals, time-frequency analysis is an excellent approach for analyzing them in time and frequency simultaneously, without loss of any prominent information [17]. To the best of our knowledge, there is no research on the selection of a suitable mother wavelet for accurate classification of different classes of infant cry signals based on time-frequency analysis. The aim of the proposed work is to select the best mother wavelet for infant cry classification by investigating the effectiveness of different mother wavelets (haar, daubechies, symlet, coiflet, biorthogonal, reverse biorthogonal and finite impulse response (FIR) based approximation of Meyer) against three different criteria: the degree of similarity of the mother wavelet with the cry signal, assessed via the cross-correlation coefficient; the regularity of the mother wavelet, in terms of the distribution of significant wavelet packet based features extracted with it; and the classification accuracy of binary (experiment 1: normal vs
asphyxia and experiment 2: normal vs deaf) and multi-class problems (experiment 3: normal vs asphyxia vs deaf) using the wavelet packet based features from different mother wavelets as inputs to various supervised classifiers. The rest of the paper is organized as follows. The "Infant cry database" section gives a brief explanation of the infant cry database used in this work. The "Proposed methodology for selection of best mother wavelet" section presents the proposed methodology, including an introduction to the mother wavelet, WPT and the extraction of the energy and Shannon entropy features, together with the employed classifiers. The results and discussion from the three different selection methods of this study are presented in the "Results and discussion" section. Finally, the work is concluded in the "Conclusion" section with some future directions.
Infant cry database

The infant cry signals under investigation are obtained from a standard Mexican database which is the property of the Instituto Nacional de Astrofisica Optica y Electronica (INAOE)-CONACYT, Mexico [18]. It consists of 507 normal cry signals, 340 asphyxia cry signals and 879 deaf cry signals, each of 1 s length. The infant cry samples were recorded directly by specialized physicians from babies ranging from newborn to 6 months old. The samples are labeled at the moment of their recording; labels contain information about the cause of the cry or the pathology present. Asphyxia is determined by the presence of metabolic acidosis (pH < 7.00), an Apgar score of 0-3 at 5 min and neurological manifestations such as convulsions, coma or hypotonia, as well as evidence of multi-organ dysfunction with cellular and biochemical damage and circulatory alterations. The deaf samples were collected from babies already diagnosed as deaf by a group of doctors specialized in communication disorders [19, 20]. All the cry signals used in our analysis are resampled to 16 kHz [14]. Table 1 tabulates the characteristics of the database and the samples used for the three experiments (experiment 1, experiment 2 and experiment 3) of our analysis. Figure 2 shows the estimated energy spectrum of the infant cry signals (normal, deaf and asphyxia). By visually inspecting Fig. 2, one may distinguish the different patterns of cry signals. Nevertheless, visual inspection may lead to incorrect interpretation of the spectrum plot, or to misclassification, since there are high degrees of overlap between the spectra of the cry signals in certain frequency bands, and it strictly requires good knowledge and expertise. Hence, automatic recognition of infant cry signals is desired, using advanced signal
Table 1 Characteristics of database

Features                      Original database          Experiment 1       Experiment 2     Experiment 3
                              Normal  Asphyxia  Deaf     Normal  Asphyxia   Normal  Deaf     Normal  Asphyxia  Deaf
Number of samples             507     340       879      340     340        507     507      340     340       340
Sampling frequency, fs (Hz)   22,050  11,025    8,000    16,000  16,000     16,000  16,000   16,000  16,000    16,000
Sample length (s)             1       1         1        1       1          1       1        1       1         1

Experiment 1, normal vs asphyxia; experiment 2, normal vs deaf; experiment 3, normal vs asphyxia vs deaf
processing techniques capable of mining the useful information in cry signals for their quantification and efficient discrimination.
Proposed methodology for selection of best mother wavelet

Due to the highly non-stationary characteristics of infant cry signals, the performance of different mother wavelets on the newborn signals is investigated under different circumstances to strengthen the selection result. In the current study, the best mother wavelet for infant cry classification is selected by evaluating the performance of the different mother wavelets against three distinguishable criteria: the degree of similarity of the mother wavelets with the cry signals, the regularity of the mother wavelets, and the experimental classification results. Figure 3 illustrates the overall block diagram of the proposed methodology, which incorporates the respective methods of the selected criteria. The methodologies used in the present study are described briefly in the following sections.

Method 1: similarity of mother wavelet with cry signals

One of the most important elements that must be considered in wavelet domain studies is the similarity of the signal under investigation with the wavelet used to analyze it. Good similarity between the two waveforms is necessary for better analysis and consistent results. A mother wavelet is said to be similar to a signal if, when correlated with the signal, the wavelet reveals frequency components that are also contained in the signal under analysis [21, 22]. Cross correlation is a well-suited tool for measuring the similarity of two waveforms as a function of time, as it is insensitive to noise, simple and versatile. Hence, the cross-correlation technique is used to evaluate and assess the degree of similarity between the mother wavelets and the different (normal and pathological) cry signals. In this study, the low-pass wavelet filter from the MATLAB wavelet filter bank [23] and one unit sample of infant cry signal from each class are cross-correlated. All the signals are normalized to the range 0 to 1 before cross
correlation. Hence, the coefficient value for each cross correlation has a maximum value of 1 and a minimum value of 0; a cross-correlation coefficient nearer to 1 indicates good similarity, whereas a value near 0 indicates poor similarity of the two waveforms. Accordingly, after cross correlation, the maximum values amongst all cross-correlation coefficients are considered for selection of the best mother wavelet [21, 22]. Thus the following steps are followed to select the best wavelet by cross-correlation coefficient:
1. A specific mother wavelet (low-pass decomposition filter) is selected from the MATLAB wavelet filter bank library.
2. The cross-correlation coefficient is computed between the normalized cry signal and the normalized selected mother wavelet filter.
3. The mother wavelet which maximizes the cross-correlation coefficient is selected as the best.
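The steps above can be sketched in a few lines of numpy. This is an illustrative stand-in for the paper's MATLAB workflow: the haar and db2 low-pass decomposition filter coefficients are hardcoded here (normally they would come from a wavelet filter bank), and the peak of a norm-bounded cross correlation serves as the similarity score in [0, 1].

```python
import numpy as np

# Candidate low-pass decomposition filters (hardcoded; an assumption of this
# sketch -- the paper draws them from the MATLAB wavelet filter bank).
CANDIDATES = {
    "haar": np.array([0.70710678, 0.70710678]),
    "db2": np.array([-0.12940952, 0.22414387, 0.83651630, 0.48296291]),
}

def similarity(signal, filt):
    """Peak cross-correlation between a cry signal and a mother-wavelet
    low-pass filter, with both vectors scaled to unit norm so the score
    is bounded by 1 (closer to 1 = more similar in shape)."""
    s = np.asarray(signal, float)
    f = np.asarray(filt, float)
    s = s / (np.linalg.norm(s) + 1e-12)
    f = f / (np.linalg.norm(f) + 1e-12)
    return float(np.abs(np.correlate(s, f, mode="full")).max())

def best_wavelet(signal, candidates=CANDIDATES):
    """Step 3: pick the candidate filter maximizing the coefficient."""
    return max(candidates, key=lambda name: similarity(signal, candidates[name]))
```

The paper normalizes the signals to [0, 1] before correlating; the unit-norm scaling above is an equivalent way to keep the resulting coefficient bounded by 1.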
Method 2: regularity of mother wavelets

Regularity is one of the most vital properties of a wavelet basis because it is responsible for a number of key wavelet properties such as the vanishing moments, the order of approximation, the smoothness of the mother wavelet and the reproduction of polynomials. It is also useful for obtaining significant features, such as smoothness of the reconstructed signal, and for the estimated function in nonlinear regression analysis [21-24]. Normally, in image processing applications, the regularity of mother wavelets is determined by analyzing the smoothness of the reconstructed signal and by calculating significant parameters such as the compression ratio, distortion, root mean square error and cross correlation [24, 25]. Theoretically, the decomposed wavelet coefficients are used to reconstruct the original signal; good wavelet coefficients, which retain the maximal originality of the signal with minimal distortion from artifacts or unwanted noise originating from the decomposition algorithm, will reproduce a smoother signal. In this study, in order to identify and assess the regularity level of different mother wavelets, the significance of the wavelet packet based features of the different datasets (normal vs asphyxia and normal vs deaf), computed from the wavelet coefficients, is considered. An
Fig. 2 Estimated spectrum of the corresponding cry signals (deaf, normal and asphyxia): magnitude in dB over the frequency range 0-8,000 Hz
Fig. 3 Block diagram of the proposed best mother wavelet selection methodology: the infant cry signal is decomposed by the wavelet packet transform (convolution of the cry signal with the mother wavelet); features are extracted using haar, db2-db10, db20, sym2-sym10, coif1-coif5, bior1.1, bior1.3, bior1.5, bior2.2, bior2.4, bior2.6, bior2.8, bior3.1, bior3.3, bior3.5, bior3.7, bior3.9, bior4.4, bior5.5, bior6.8, rbio1.1, rbio1.3, rbio1.5, rbio2.2, rbio2.4, rbio2.6, rbio2.8, rbio3.1, rbio3.3, rbio3.5, rbio3.7, rbio3.9, rbio4.4, rbio5.5, rbio6.8 and dmey at the 5th level of decomposition; the extracted wavelet coefficients (energy & entropy) feed method 1 (similarity of mother wavelets with cry signals), method 2 (regularity of mother wavelets) and method 3 (classification results), the latter classifying the cries (PNN, GRNN, MLP and TDNN) into normal versus asphyxia, normal versus deaf, and normal versus asphyxia versus deaf
independent-samples t test (p < 0.0001, i.e. a 99.99 % confidence level) is performed to evaluate the number of significant wavelet-based features extracted with each mother wavelet. The features (energy and entropy) are extracted at different sub-bands using the wavelet packet transform with different mother wavelets, namely haar, daubechies, symlet, coiflet, biorthogonal, reverse biorthogonal and the FIR based approximation of Meyer (dmey). The number of decomposition levels is chosen as five based on the previous work by Hariharan et al. [14], who reported that the maximum accuracies are obtained from the fifth level of wavelet packet decomposition using PNN in classifying normal and asphyxia cry signals (please refer to the "Extracted wavelet coefficients" section for further information on the extraction of the energy and entropy features). Thus the following steps are followed to select the optimal wavelet by number of significant features:
1. The cry signal is decomposed to the fifth level using WPT with a specific mother wavelet.
2. Energy and entropy features are computed at different sub-bands of the cry signal.
3. The independent t test (p < 0.0001) is performed on the extracted wavelet packet based features of the datasets (normal vs asphyxia and normal vs deaf).
4. The mother wavelet which maximizes the number of significant features is selected.
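The feature-counting step can be sketched with scipy's independent-samples t test. This is a minimal sketch under my own assumptions: each class is a samples-by-features matrix (e.g. the 32 energy plus 32 entropy features per cry), the classical equal-variance t test is used, and the wavelet with the most columns significant at p < 0.0001 wins.

```python
import numpy as np
from scipy import stats

def count_significant(class_a, class_b, alpha=1e-4):
    """Column-wise independent-samples t test between two feature matrices;
    returns how many features separate the classes at p < alpha."""
    _, p = stats.ttest_ind(class_a, class_b, axis=0)
    return int(np.sum(p < alpha))

def best_by_regularity(features_per_wavelet, alpha=1e-4):
    """features_per_wavelet maps a wavelet name to a (normal, pathological)
    pair of feature matrices; step 4 picks the wavelet with the most
    significant features."""
    return max(features_per_wavelet,
               key=lambda w: count_significant(*features_per_wavelet[w], alpha))
```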
Method 3: classification results

In the medical diagnostic area, it is necessary to discriminate different patterns of samples effectively, with a high rate of correct classification. Hence, the classification result is used as one of the selection methods for identifying the best mother wavelet for infant cry classification. Four different types of artificial neural networks (PNN, GRNN, MLP and TDNN; please refer to the "Classifiers" section for further information) are trained to classify the wavelet packet based cry features extracted from the fifth level of decomposition in three different experiments (experiment 1: normal vs asphyxia, experiment 2: normal vs deaf and experiment 3: normal vs asphyxia vs deaf). Two classification validation schemes (conventional and 10-fold cross validation) are used to confirm the reliability of the classification results. Thus the following steps are followed to select the optimal wavelet by empirical accuracy:
1. The cry signal is decomposed to the fifth level using the wavelet packet transform with a specific mother wavelet.
2. Energy and entropy features are computed at different sub-bands of the cry signal.
3. The extracted feature vectors are discriminated using the PNN, GRNN, MLP and TDNN classifiers through conventional and 10-fold cross validation schemes.
4. The mother wavelet which maximizes the classification accuracy of the three experiments is selected.
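The PNN branch of these steps can be sketched in numpy. This is an illustrative stand-in for the authors' MATLAB implementation, under stated assumptions: a minimal Gaussian-kernel PNN represents the four classifiers, features are assumed normalized so the paper's sigma range (0.04-0.085) is meaningful, and the fold split is a simple random permutation.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.06):
    """Probabilistic neural network: score each class by the mean Gaussian
    kernel response of a test point to that class's training patterns,
    then predict the highest-scoring class."""
    classes = np.unique(y_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = np.stack([kernel[:, y_train == c].mean(axis=1) for c in classes],
                      axis=1)
    return classes[scores.argmax(axis=1)]

def kfold_accuracy(X, y, k=10, sigma=0.06, seed=0):
    """Mean accuracy over a k-fold cross-validation split (step 3)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        pred = pnn_predict(X[train], y[train], X[fold], sigma)
        accs.append(float((pred == y[fold]).mean()))
    return float(np.mean(accs))
```

Step 4 then amounts to computing this accuracy per candidate wavelet's feature set and keeping the maximizer.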
Mother wavelet and wavelet packet transform

A mother wavelet is a basic wave-shaped signal which is associated with translation and dilation operations when involved in a signal decomposition algorithm. If ψ(t) is a mother wavelet, the basis function at discrete scale a and discrete translation b is given by Eq. (1):

ψ_{a,b}(t) = 2^{a/2} ψ(2^a t − b)    (1)

where a and b are the discrete dilation and discrete translation, respectively. The inner product of the basis function with the signal at different scales and translations provides the complete spectrum of wavelet coefficients [26]. For the present investigation, a set of different types of mother wavelets (haar, daubechies, symlet, coiflet, biorthogonal, reverse biorthogonal and FIR based approximation
Table 2 Characteristics of different mother wavelets

Wavelet                     Surname  Biorthogonal  Symmetry   Orthogonality  Compact support  Vanishing order  Filter length
Haar                        'haar'   Yes           Yes        Yes            Yes              1                2
Daubechies                  'db'     Yes           Far from   Yes            Yes              N                2N
Symlet                      'sym'    Yes           Near from  Yes            Yes              N                2N
Coiflet                     'coif'   Yes           Near from  Yes            Yes              N                6N
Biorthogonal                'bior'   Yes           Yes        No             Yes              Nr, Nd           Max(2Nr, 2Nd) + 2
Reverse biorthogonal        'rbio'   Yes           Yes        No             Yes              Nr, Nd           Max(2Nr, 2Nd) + 2
FIR based approximation of Meyer  'dmey'  Yes     Yes        Yes            Yes              –                62

N, order of wavelet; Nr, reconstruction order; Nd, decomposition order
Fig. 4 Wavelet packet based features (energy and entropy) extraction: the infant cry signal is decomposed through five WPT levels into 32 fifth-level sub-bands (f1-f32), and energy and entropy are computed from each
of Meyer) was considered. These wavelet families are suitable for both the continuous and the discrete wavelet transform (DWT); however, they differ in their characteristics. Table 2 presents the crucial characteristics of the mother wavelets, namely symmetry (useful in avoiding de-phasing), compact support (allows efficient implementation), orthogonality (allows fast algorithms), filter length (determines the degree of smoothness), biorthogonality (provides phase linearity) and vanishing order [24, 25]. Further information regarding these wavelet functions can be found in earlier research works [27-29]. WPT is an extension of the wavelet transform (WT) which requires a mother wavelet for its algorithm [30]. It has been widely and successfully applied in different applications [30-32], since WPT splits the original signal into both low and high frequency bands and provides more and better frequency-resolution features about the original signal under analysis. Furthermore, the multi-resolution property of WPT is very useful in voice signal processing [32]. The major difference between WT and WPT is the structure of the binary tree: WT gives a left-recursive binary tree structure by decomposing only the lower frequency band, whereas WPT gives a balanced binary tree structure by decomposing both the lower (approximation coefficients) and higher frequency bands (detail coefficients) [14].

Extracted wavelet coefficients

In the present work, the normal and pathological infant cry signals are decomposed into five levels by different mother wavelets: haar, daubechies (order 2-10 & 20), symlet (order 2-10), coiflet (order 1-5), biorthogonal (order 1.1, 1.3, 1.5, 2.2, 2.4, 2.6, 2.8, 3.1, 3.3, 3.5, 3.7, 3.9, 4.4, 5.5 and 6.8),
reverse biorthogonal (order 1.1, 1.3, 1.5, 2.2, 2.4, 2.6, 2.8, 3.1, 3.3, 3.5, 3.7, 3.9, 4.4, 5.5 and 6.8) and FIR based approximation of Meyer (dmey). Energy and Shannon entropy are computed using the extracted wavelet packet coefficients, as shown in Fig. 4. Equations (2) and (3) are used to extract the sub-band energy and entropy features, respectively:

Energy = log10 [ ( Σ_{i=1}^{L} (C^p_{5,k})^2 ) / L ],  m = 1, 2, 3, …, 5; k = 0, 1, 2, …, 2^5 − 1    (2)

where k is the wavelet packet node, m represents the number of the decomposition level, p is the scale index and L is the number of wavelet coefficients of the corresponding sub-band.

Table 3 The learning parameters of the MLP and TDNN neural networks

Network function           MLP                               TDNN
Number of layers           3                                 3 with input delay (0,1)
Number of input neurons    32                                32
Number of hidden neurons   23/24                             23/24
Number of output neurons   2* and 3*                         2* and 3*
Performance goal           0.001                             0.001
Learning rate              0.1                               0.1
Momentum factor            0.9                               0.9
Training algorithm         Scaled conjugate algorithm        Scaled conjugate algorithm
Activation function        'logsig', 'logsig'                'logsig', 'logsig'

2*, for normal versus asphyxia & normal versus deaf; 3*, for normal versus asphyxia versus deaf
Fig. 5 Comparative plots of correlation coefficients with different mother wavelet filters for normal and pathological cry signals
Fig. 6 Number of the significant features of different datasets from the chosen mother wavelets
Entropy = − Σ_{i=1}^{L} (C^p_{5,k})^2 log (C^p_{5,k})^2,  m = 1, 2, 3, …, 5; k = 0, 1, 2, …, 2^5 − 1    (3)

where k is the wavelet packet node, m represents the number of the decomposition level and p is the scale index. Through this process, 32 energy and 32 entropy features are extracted from the different cry signals.

Classifiers

Artificial neural networks are artificially designed decision-making tools built from many interconnected neurons. Recently, the importance of artificial neural networks in multidisciplinary areas cannot be denied [33-35]. In the present study, two radial basis neural networks, namely PNN and GRNN, are used for the classification of normal and pathological cries, since they have advantages such as being relatively robust to external
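The full extraction pipeline behind Eqs. (2) and (3) can be sketched in numpy. This is a minimal sketch under my own assumptions: Haar filters stand in for the mother wavelet (any candidate filter pair could be substituted), the input is a short synthetic signal rather than a 16 kHz cry, and the 32 leaves come out in natural (Paley) order.

```python
import numpy as np

H_LO = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar low-pass filter
H_HI = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar high-pass filter

def wp_split(x):
    """One wavelet-packet node split: filter, then downsample by 2."""
    return np.convolve(x, H_LO)[1::2], np.convolve(x, H_HI)[1::2]

def wp_leaves(x, level=5):
    """Full WPT tree: both halves of every node are split at each level,
    giving the balanced binary tree with 2**level leaf sub-bands."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nodes = [half for node in nodes for half in wp_split(node)]
    return nodes

def energy_entropy_features(x, level=5):
    """32 log-energy (Eq. 2) and 32 Shannon-entropy (Eq. 3) features;
    the small epsilon guards the logarithms against zero coefficients."""
    energies, entropies = [], []
    for c in wp_leaves(x, level):
        c2 = c ** 2
        energies.append(np.log10(c2.sum() / len(c) + 1e-12))
        entropies.append(float(-np.sum(c2 * np.log(c2 + 1e-12))))
    return np.array(energies), np.array(entropies)
```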
disturbances, as reported in [36-39]. The networks comprise four different layers: an input layer, a pattern layer (activated by an exponential function in this analysis), a summation layer and an output layer. The smoothing parameter (σ) is a key element of these radial basis networks because their performance depends strongly on it [16]. Accordingly, the smoothing parameter for PNN and GRNN is varied between 0.04 and 0.085 in steps of 0.005, based on experimental investigation. Detailed mathematical derivations of these radial basis neural networks can be found in [36-39]. To compare the reliability of the classification results of the radial basis neural networks (PNN and GRNN), two commonly used neural network models, the multilayer perceptron and the time-delay neural network, are also used as classifiers. The number of hidden neurons for the MLP and TDNN structures is chosen based on the rule of thumb that the number of hidden neurons should be 2/3 the size of the input layer plus the size of the output layer (23 hidden neurons for two-class and 24 hidden neurons for three-class problems) [40]. The other learning parameters of MLP and TDNN are listed in Table 3.
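The σ sweep and the hidden-neuron rule of thumb quoted above can be written out as a quick check (a sketch; the variable names are mine):

```python
import numpy as np

# sigma grid for PNN/GRNN: 0.040, 0.045, ..., 0.085 in steps of 0.005
sigmas = np.arange(0.04, 0.085 + 1e-9, 0.005)

def hidden_neurons(n_inputs, n_outputs):
    """Rule of thumb [40]: 2/3 of the input layer size plus the size
    of the output layer."""
    return round(2 * n_inputs / 3) + n_outputs
```

With 32 inputs this gives 23 hidden neurons for the two-class and 24 for the three-class experiments, matching Table 3.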
-2.70 ± 0.60
-3.08 ± 0.57
-4.09 ± 0.64
-3.78 ± 0.68
-3.24 ± 0.88
7
8
9
10
11
4.91 ± 5.08
32
-540 ± 1.10 -4.95 ± 0.87 -6.51 ± 1.13 -6.24 ± 1.25 -5.78 ± 1.27 -6.10 ± 1.29
-4.55 ± 1.17 \0.001
-5.77 ± 1.15 \0.001
-5.29 ± 1.20 \0.001
-4.71 ± 1.21 \0.001
-4.98 ± 1.22 \0.001
-4.74 ± 0.78
-4.35 ± 1.02 \0.001
-4.40 ± 1.08 \0.001
-7.05 ± 0.92 -4.85 ± 0.82
-7.16 ± 0.94
-7.22 ± 1.20 [0.001*
-7.05 ± 1.19 [0.001*
-6.64 ± 1.05 -6.84 ± 0.96
-6.10 ± 1.12 \0.001 -6.57 ± 1.12 \0.001
-4.37 ± 0.99 \0.001
-7.26 ± 0.86
-7.55 ± 0.86 -7.41 ± 0.90
-7.87 ± 1.02
-8.14 ± 1.30 \0.001
-7.95 ± 1.25 \0.001
-7.45 ± 1.21 [0.001*
-2.98 ± 0.88
-3.76 ± 1.37 \0.001
-7.60 ± 1.21 [0.001*
-2.94 ± 0.87
-2.94 ± 0.81 -3.18 ± 0.92
-3.74 ± 0.82
-4.25 ± 1.23 \0.001
-3.48 ± 1.24 \0.001
-3.63 ± 1.36 \0.001
-3.40 ± 0.90
-4.07 ± 1.31 \0.001
-3.90 ± 1.32 \0.001
-3.90 ± 0.78
-4.32 ± 1.12 \0.001
-2.53 ± 0.75
-2.45 ± 1.34 [0.001* -2.9 ± 0.77
-2.97 ± 0.75 -3.12 ± 0.76
-3.38 ± 1.19 \0.001 -3.14 ± 1.31 [0.001*
-4.15 ± 0.74
-1.94 ± 0.70
-2.68 ± 1.18 \0.001
-4.37 ± 1.02 \0.001
-1.81 ± 0.63
-2.41 ± 1.27 \0.001
-2.74 ± 1.47 \0.001
-2.60 ± 0.49 -1.98 ± 0.71
Normal
-2.02 ± 0.91 \0.001
P value
-2.69 ± 1.20 \0.001
Table 4 Summary of statistics of extracted energy features (dmey). For each of the 32 sub-band energy features the table lists the mean ± standard deviation for the normal, asphyxia and deaf cry groups, together with independent-sample t-test p values for the pairwise comparisons (a: 507 normal + 507 deaf; b: 340 normal + 340 deaf). Nearly every feature is significant at p < 0.001. * Insignificant features.
Australas Phys Eng Sci Med (2014) 37:439–456 447
Table 5 Summary of statistics of extracted entropy features (dmey). For each of the 32 sub-band entropy features the table lists the mean ± standard deviation for the normal, asphyxia and deaf cry groups, together with independent-sample t-test p values for the pairwise comparisons (a: 507 normal + 507 deaf; b: 340 normal + 340 deaf). Nearly every feature is significant at p < 0.001. * Insignificant features.
Table 6 Training and testing datasets of 10-fold and conventional validation schemes for three different experiments

Experiments | 10-fold cross validation | Conventional (60 % training, 40 % testing)
Experiment 1 (340 normal + 340 asphyxia) | Samples were segregated randomly into 10 sets and training was repeated 10 times | Training = 408 samples; Testing = 272 samples
Experiment 2 (507 normal + 507 deaf) | Samples were segregated randomly into 10 sets and training was repeated 10 times | Training = 608 samples; Testing = 406 samples
Experiment 3 (340 normal + 340 asphyxia + 340 deaf) | Samples were segregated randomly into 10 sets and training was repeated 10 times | Training = 612 samples; Testing = 408 samples
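The dataset segregation in Table 6 can be reproduced with a short routine. The sketch below is in Python (the paper's implementation was in MATLAB) and uses the experiment 1 sample counts; the function names are illustrative, not the paper's.

```python
import numpy as np

# Sketch of the two validation schemes of Table 6, using the experiment 1
# sample counts (340 normal + 340 asphyxia = 680 cry samples).

def conventional_split(n_samples, train_frac=0.6, seed=0):
    """Random 60/40 split into training and testing index arrays."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    cut = int(round(n_samples * train_frac))
    return idx[:cut], idx[cut:]

def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Random segregation into 10 sets; each set serves once as test data."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test

train, test = conventional_split(680)
print(len(train), len(test))  # 408 272, as in Table 6 for experiment 1
for tr, te in ten_fold_splits(680):
    assert len(tr) + len(te) == 680
```

With 680 samples the 60/40 cut reproduces exactly the 408 training and 272 testing samples of Table 6; the 10-fold generator yields ten folds of 68 samples each.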
TDNN classifiers are tabulated in Table 3. The signal processing and classification algorithms were developed in the MATLAB environment [23].
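As an illustration of the feature-extraction stage described above, the sketch below performs a five-level wavelet packet decomposition and computes sub-band energy and Shannon entropy. It substitutes the Haar filter pair for 'dmey' so the example stays self-contained with NumPy only, and the feature definitions are common textbook forms, not necessarily the paper's exact ones.

```python
import numpy as np

# Haar analysis filters stand in for the paper's 'dmey' wavelet (assumption).
LO = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (approximation)
HI = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass (detail)

def wavelet_packet(x, level=5):
    """Return the 2**level terminal sub-bands of a wavelet packet tree."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nxt = []
        for n in nodes:
            # convolve, then keep odd-indexed samples (dyadic downsampling)
            nxt.append(np.convolve(n, LO)[1::2])
            nxt.append(np.convolve(n, HI)[1::2])
        nodes = nxt
    return nodes

def band_features(band):
    """Energy and Shannon entropy of one sub-band (one common definition)."""
    energy = float(np.sum(band ** 2))
    p = band ** 2 / energy if energy > 0 else np.zeros_like(band)
    p = p[p > 0]
    entropy = float(-np.sum(p * np.log(p)))
    return energy, entropy

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)       # stand-in for a cry segment
bands = wavelet_packet(signal, level=5)  # 32 sub-bands, as in the paper
features = [band_features(b) for b in bands]
print(len(bands))  # 32
```

Because the Haar pair is orthogonal, the total energy of the 32 sub-bands equals the energy of the input segment, which gives a quick sanity check on the decomposition.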
Results and discussion

The following section presents the empirical results of the analysis, together with details of the training and testing dataset segregation and the statistical analysis of the extracted feature vectors. Owing to the high computational cost of feature extraction with higher-order mother wavelets, only the results obtained with lower-order mother wavelets are presented.

Selection through similarity of mother wavelet with cry signals

Figure 5 shows the comparative plot of correlation coefficients between normal and pathological (asphyxia and deaf) infant cry signals and different low-index mother wavelets together with their higher-index counterparts. The cross-correlation results in Fig. 5 are MATLAB-generated [23] cross-correlation coefficients of one unit sample of cry signal against the various mother wavelet filters available in the MATLAB library. From Fig. 5 it was observed that the correlation coefficients of the infant cry signals (asphyxia, deaf and normal) with the 'dmey' mother wavelet were greater (above 0.9) than with the other mother wavelets, including db20, which was used in [14] to achieve a maximum accuracy of 99 % in discriminating normal and pathological cry signals. It was therefore inferred that the 'dmey' mother wavelet is closely matched to the highly non-linear infant cry signals. This result consolidates the aptness of 'dmey' as the best wavelet for infant cry classification and further supports our findings on mother wavelet selection.

Selection through regularity of mother wavelet: number of significant features

A comparative analysis was performed to determine the number of significant and useful cry features, validated through the independent t test (p < 0.001), extracted by the different wavelet families from two datasets (normal vs asphyxia and normal vs deaf) (Fig. 6). As seen in Fig. 6, the 'dmey' wavelet family yielded the maximum number of useful and significant features in both cases, normal vs asphyxia (24) and normal vs deaf (32), compared to the other mother wavelets. The results attest that the 'dmey' mother wavelet retained the original characteristics of the cry signals with little loss of salient information even after five levels of wavelet packet decomposition. The good regularity property of the 'dmey' mother wavelet is also demonstrated in [41]. Tables 4 and 5 present the discriminatory ability of the wavelet packet features (energy and entropy) extracted with the 'dmey' mother wavelet, in terms of mean, standard deviation and p values from the independent-sample t test. The statistics of 'dmey' are tabulated because it outperformed the other mother wavelets under all the selection methods. The p values for the different datasets, normal vs asphyxia, normal vs deaf (507 normal + 507 deaf), normal vs deaf (340 normal + 340 deaf) and asphyxia vs deaf, were analyzed at a 99.90 % confidence level.
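The similarity criterion can be sketched as a peak normalized cross-correlation between a signal and a candidate wavelet waveform. The kernels below are illustrative stand-ins, not the MATLAB wavelet filters used in the paper.

```python
import numpy as np

# Sketch of the similarity criterion: the mother wavelet whose waveform has
# the highest normalized cross-correlation with the cry signal is preferred.
# The hand-coded kernels here are assumptions for illustration only.

def max_norm_xcorr(signal, psi):
    """Peak absolute normalized cross-correlation over all lags (0..1)."""
    s = np.asarray(signal, float) - np.mean(signal)
    w = np.asarray(psi, float) - np.mean(psi)
    r = np.correlate(s, w, mode="full")
    return float(np.max(np.abs(r)) / (np.linalg.norm(s) * np.linalg.norm(w)))

t = np.linspace(-4, 4, 256)
mexhat = (1 - t ** 2) * np.exp(-t ** 2 / 2)  # smooth, oscillatory kernel
haar_like = np.where(t < 0, 1.0, -1.0)       # step-shaped kernel

smooth_signal = np.sin(2 * np.pi * t / 4) * np.exp(-t ** 2 / 8)
print(max_norm_xcorr(smooth_signal, mexhat))
print(max_norm_xcorr(smooth_signal, haar_like))
print(max_norm_xcorr(mexhat, mexhat))  # identical shapes give 1.0
```

By the Cauchy-Schwarz inequality the score is bounded by 1, and identical waveforms score exactly 1, mirroring how the paper ranks mother wavelets by their correlation coefficient with the cry signals.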
Tables 4 and 5 indicate that the features extracted from normal and pathological (asphyxia and deaf) cry signals are well separated: they show large variation between the groups, and most of the features are statistically significant (p < 0.001). In the current work, k-fold cross validation (10-fold), sometimes known as rotation estimation, and a conventional validation scheme were used to establish the reliability of the classification accuracy [42]. Table 6 shows the segregation of infant cry samples into training and testing sets for the two validation schemes. Tables 7, 8 and 9 summarize the simulation results of the three experiments using the different supervised classifiers. The maximum accuracies of 99.11 ± 0.18 % (conventional validation, entropy, PNN) and 99.10 ± 0.22 % (10-fold cross validation, entropy, PNN) were obtained with the 'dmey' mother wavelet, as seen in Table 7. Table 7 also shows that the maximum TDNN accuracy of 97.28 ± 0.47 % (conventional validation,
Table 7 Results of PNN, GRNN, MLP and TDNN for experiment 1 (conventional validation and 10-fold cross validation). Classification accuracies (%, mean ± SD) are reported for energy and entropy features under both validation schemes across the haar, db2, sym2, coif1, bior1.1, rbio1.1 and dmey mother wavelets. The values in bold highlight the maximum obtained accuracy: 99.11 ± 0.18 (dmey, entropy, PNN, conventional validation) and 99.10 ± 0.22 (dmey, entropy, PNN, 10-fold cross validation).
Table 8 Results of PNN, GRNN, MLP and TDNN for experiment 2 (conventional validation and 10-fold cross validation). Classification accuracies (%, mean ± SD) are reported for energy and entropy features under both validation schemes across the haar, db2, sym2, coif1, bior1.1, rbio1.1 and dmey mother wavelets. The values in bold highlight the maximum obtained accuracy: 99.80 ± 0.06 (dmey, energy, PNN, 10-fold cross validation) and 99.66 ± 0.08 (dmey, entropy, PNN, conventional validation).
Table 9 Results of PNN, GRNN, MLP and TDNN for experiment 3 (conventional validation and 10-fold cross validation). Classification accuracies (%, mean ± SD) are reported for energy and entropy features under both validation schemes across the haar, db2, sym2, coif1, bior1.1, rbio1.1 and dmey mother wavelets. The values in bold highlight the maximum obtained accuracy: 99.20 ± 0.11 (dmey, entropy, PNN, 10-fold cross validation) and 98.99 ± 0.11 (dmey, entropy, PNN, conventional validation).
Fig. 7 Comparative performance of different classifiers through conventional validation scheme
Fig. 8 Comparative performance of different classifiers through 10-fold cross validation scheme
entropy, TDNN) and 97.19 ± 0.44 % (10-fold cross validation, entropy, TDNN) were recorded with 'dmey'. As seen in Table 8, the highest accuracies of 99.66 ± 0.08 % (conventional validation, entropy, PNN) and 99.80 ± 0.06 % (10-fold cross validation, energy, PNN) were obtained with the Meyer mother wavelet; the maximum TDNN accuracies of 98.00 ± 0.89 % (conventional validation, entropy) and 97.85 ± 0.24 % (10-fold cross validation, entropy) were likewise registered with 'dmey'. From Table 9 it can be seen that the best results of the PNN, GRNN, MLP and TDNN classifiers were again attained with the 'dmey' mother wavelet: maximum classification accuracies of 98.99 ± 0.11 % (conventional validation, entropy, PNN) and 99.20 ± 0.11 % (10-fold cross validation, entropy, PNN), and highest discrimination accuracies of 98.75 ± 0.47 % (conventional validation, energy, MLP) and 98.87 ± 0.16 % (10-fold cross validation, energy, TDNN). The performance of the PNN, GRNN, MLP and TDNN classifiers proves the robustness of the wavelet packet based features extracted at the fifth decomposition level for discriminating the different types of infant cry signals. Figures 7 and 8 present the comparative performance of the classifiers for the three experiments using the 'dmey' mother wavelet; PNN and GRNN outperformed the MLP and TDNN networks. Based on the simulation results above, the 'dmey' mother wavelet proved to be the most appropriate of the tested mother wavelets. Table 10 presents a performance comparison of the proposed study with other existing classification works. Hariharan et al. [14] discriminated normal and asphyxia cry signals with a best accuracy of 99 % using WPT and PNN; however, only different orders of the Daubechies mother wavelet were investigated, and the best accuracy came from a higher-order wavelet (db20). In [15], the authors showed the effectiveness of their MFCC and GSFM approaches, designed with an optimal combination of feature selection method, fuzzy processing type and learning algorithm, reporting a maximum accuracy of 90.68 % under the 10-fold cross validation scheme; in our work, a maximum accuracy above 99 % was achieved under both the conventional and 10-fold cross validation schemes. A best recognition rate of 99 % was reported using time-frequency based statistical features derived from normal and asphyxia cry signals with PCA and PNN [16]. Hariharan et al. [17] investigated the use of
123
Table 10 A performance comparison of the proposed methodology and other automated infant cry classification studies
Studies | Signal processing method | Classifier | Accuracy (%)

Normal and asphyxia cry (experiment 1)
Hariharan et al. [14] | WPT (only Daubechies mother wavelet was considered) | PNN | 99 (60 % training, 40 % testing)
Rosales-Perez et al. [15] | MFCC | Genetic selection of fuzzy model | 90.68 (10-fold)
Hariharan et al. [16] | Time-frequency analysis based statistical features + PCA | PNN and GRNN | 99.19 (10-fold)
Proposed methodology | WPT (7 types of different mother wavelet was considered) | PNN and GRNN | 98.88 (60 % training, 40 % testing); 99.10 (10-fold); 99.11 (60 % training, 40 % testing)

Normal and deaf cry (experiment 2)
Hariharan et al. [17] | Time-frequency analysis based statistical features | GRNN | 99.31 (10-fold)
Rosales-Perez et al. [15] | MFCC | Genetic selection of fuzzy model | 93.90 (data independent, 670 segments for training and 344 segments for testing); 99.42 (10-fold)
Proposed methodology | WPT (7 types of different mother wavelet was considered) | PNN and GRNN | 99.80 (10-fold); 99.66 (60 % training, 40 % testing)

Normal, asphyxia and deaf cry (experiment 3)
Hariharan et al. [9] | WLPCC | PNN | 99 (70 % training, 30 % testing)
Proposed methodology | WPT (7 types of different mother wavelet was considered) | PNN and GRNN | 99.20 (10-fold); 98.99 (60 % training, 40 % testing)
GRNN to discriminate normal from deaf cries and achieved 99 % diagnostic accuracy using statistical features derived from time-frequency plots. The GSFM approach, trained and tested with an optimal combination of feature selection method, fuzzy processing type and learning algorithm on extracted MFCC feature vectors of normal and deaf cries, acquired a maximum classification accuracy of 99.42 % [15]. Hariharan et al. [9] developed an infant cry based multi-class automated system using WLPCC for signal processing and PNN for classification, with an optimal classification result of 99 %. In the present study, emphasizing the time-frequency approach, the best mother wavelet ('dmey') for infant cry recognition was selected through a combination of selection criteria, using statistically validated wavelet packet based feature vectors and supervised neural networks. Maximum recognition accuracy
of above 99 % was obtained for all the experiments, which is comparable with most works in the literature (Table 10). Our proposed system nevertheless appears superior, since it was designed as a single block system that handles both the binary and the multi-class recognition tasks, which has not yet been attempted in other existing systems, particularly with normal, asphyxia and deaf infant cry signals. Although numerous automated classification systems have recently been proposed in the infant cry classification area, only a few multi-class recognition systems with success rates around 99 %, sufficient for implementation in clinical trials for problems of up to three classes, have been documented (Table 10). Hence, it can be deduced from the study results that our proposed system may be significant for clinical applications and improved diagnostic results. In addition, the proposed methodologies, especially the
cry features (sub-band energies and entropies), which contributed most to the efficacy of the proposed system through their good discriminatory ability, may be adopted by medical professionals, alongside their clinical experience, for diagnosing the pathological status of infants from their cry signals.
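For readers who wish to experiment with the best-performing classifier in this study, a minimal Parzen-window PNN in the spirit of Specht [38] can be written in a few lines. The smoothing parameter and synthetic data below are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

# Minimal probabilistic neural network (PNN): each class score is the mean
# Gaussian kernel response of that class's training patterns (Specht, 1990).
class PNN:
    def __init__(self, sigma=0.5):
        self.sigma = sigma  # kernel width; an assumed value, not the paper's

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        # squared distances between every query and every training pattern
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        k = np.exp(-d2 / (2.0 * self.sigma ** 2))
        scores = np.stack(
            [k[:, self.y == c].mean(axis=1) for c in self.classes], axis=1
        )
        return self.classes[np.argmax(scores, axis=1)]

# Two well-separated synthetic clusters stand in for normal/pathological features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
acc = float((PNN(sigma=0.5).fit(X, y).predict(X) == y).mean())
print(acc)  # 1.0
```

The single hyperparameter (the kernel width sigma) and the absence of iterative training are the practical reasons such networks train quickly on moderate-sized cry feature sets.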
Conclusion

Infant cry is a good indicator of an infant's physical and physiological status. Recently, infant cry has attracted considerable attention from the research community, which has moved towards cry-based automatic classification algorithms adopting various digital signal processing and artificial intelligence techniques. Within this development, the present study concentrated on a time-frequency technique, emphasizing the selection of the best mother wavelet among 56 different wavelet bases for infant cry classification, incorporating the wavelet packet transform, decomposition of the cry signals at the best decomposition level and classification using supervised neural networks. Similarity of the mother wavelets, regularity of the mother wavelets and the simulation results were the criteria used to select the best mother wavelet. On these criteria it was inferred that the Meyer wavelet ('dmey') is the best candidate among the tested mother wavelets (haar, daubechies, symlet, coiflet, biorthogonal and reverse biorthogonal) for accurate neonatal cry classification: it matched both the normal and the pathological cry signals with the highest cross-correlation coefficients, exhibited higher regularity by yielding the maximum number of significant features for the two datasets, and produced the best empirical results in all three experiments. In future, this study will be extended to other types of infant cry signals, tested on different pathological cry databases, and used to study the severity levels of disordered cry signals (mild, moderate and severe). A comparison of the present work with other time-frequency approaches, for example the Wigner-Ville and Choi-Williams distributions, will also be undertaken.
Acknowledgments The Baby Chillanto Data Base is a property of the Instituto Nacional de Astrofisica Optica y Electronica (INAOE)-CONACYT, Mexico. We would like to thank Dr. Carlos A. Reyes-Garcia (CCC-INAOE, Mexico), Dr. Emilio Arch-Tirado and his INR-Mexico group, and Dr. Edgar M. Garcia-Tamayo for their dedication in collecting the infant cry database and for making it available. All authors declare that they have no financial or commercial conflicts of interest.
References

1. Saraswathy J, Hariharan M, Yaacob S, Khairunizam W (2012) Automatic classification of infant cry: a review. In: Proceedings of the 2012 international conference on biomedical engineering (ICOBE-2012), pp 543-548
2. Lederman D (2002) Automatic classification of infants' cry. Dissertation, University of Negev, Department of Electrical and Computer Engineering
3. Boukydis CFZ, Lester BM (1985) Infant crying. Plenum Press, New York
4. Lederman D, Cohen A, Zmora E, Wermke K, Hauschildt S, Stellzig-Eisenhauer A (2008) Classification of cries of infants with cleft-palate using parallel hidden Markov models. Med Biol Eng Comput 46:965-975
5. Michelsson K, Kaskinn H, Aulanko R, Rinne A (1984) Sound spectrographic cry analysis of infants with hydrocephalus. Acta Paediatr Scand 73:65-68
6. Colton RH, Steinschneider A (1981) The cry characteristics of an infant who died of sudden infant death syndrome. J Speech Hear Dis 46:359-363
7. Ruiz Diaz MA, Reyes Garcia CA, Altamirano Robles LC, Xaltena Altamirano JE (2012) Automatic infant cry analysis for the identification of qualitative features to help opportune diagnosis. Biomed Signal Process Control 7:43-49
8. Verduzco-Mendoza A, Arch-Tirado E, Reyes-Garcia CA, Leybon-Ibarra J, Licona-Bonilla J (2012) Spectrographic infant cry analysis in newborns with profound hearing loss and perinatal high-risk newborns. Cir Cir 80:3-10
9. Hariharan M, Sin Chee L, Yaacob S (2012) Analysis of infant cry through weighted linear prediction cepstral coefficients and probabilistic neural network. J Med Syst 36:1309-1315
10. Office of Health and Nutrition, Bureau for Global Programs Field Support and Research, U.S. (2004) Detecting and treating newborn asphyxia. Maternal, neonatal & health, pp 1-2. http://pdf.usaid.gov/pdf_docs/Pnacy993.pdf. Accessed 18 June 2012
11. Cunningham M, Edward O (2003) Hearing assessment in infants and children: recommendations beyond neonatal screening. Am Acad Pediatr Clin Rep 111:436-440
12. Rosales-Perez A, Reyes-Garcia CA, Gomez-Gil P (2011) Genetic fuzzy relational neural network for infant cry classification. Lect Notes Comput Sci 6718:288-296
13. Zabidi A, Mansor W, Khuan LY, Yassin IM, Sahak R (2011) Binary particle swarm optimization for selection of features in the recognition of infants cries with asphyxia. In: Proceedings of the IEEE international colloquium on signal processing and its applications, pp 272-276
14. Hariharan M, Yaacob S, Awang SA (2011) Pathological infant cry analysis using wavelet packet transform and probabilistic neural network. Exp Syst Appl 38:15377-15382
15. Rosales-Perez A, Reyes-Garcia CA, Gonzalez JA, Arch-Tirado E (2012) Infant cry classification using genetic selection of a fuzzy model. Lect Notes Comput Sci 7441:212-219
16. Hariharan M, Saraswathy J, Sindhu R, Wan Khairunizam, Yaacob S (2012) Infant cry classification to identify asphyxia using time-frequency analysis and radial basis neural networks. Exp Syst Appl 39:9515-9523
17. Hariharan M, Sindhu R, Yaacob S (2012) Normal and hypoacoustic infant cry signal classification using time-frequency analysis and general regression neural network. Comput Methods Programs Biomed 108:559-569
18. http://ingenieria.uatx.mx/orionfrg/cry/. Accessed 7 May 2010
19. Reyes-Galaviz OF, Cano-Ortiz S, Reyes-Garcia C (2009) Evolutionary-neural system to classify infant cry units for pathologies identification in recently born babies. In: Proceedings of the 8th Mexican international conference on artificial intelligence, pp 330-335
20. Verduzco-Mendoza A, Arch-Tirado E, Reyes-Garcia CA, Leybon-Ibarra J, Licona-Bonilla J (2009) Qualitative and quantitative crying analysis of new born babies delivered under high risk gestation. Lect Notes Artif Intell 5398:320-327
21. Singh BN, Tiwari AK (2006) Optimal selection of wavelet basis function applied to ECG signal denoising. Digit Signal Process 16:275-287
22. Kumari A, Bisht M (2013) Optimal wavelet filter maximizes the cross correlation coefficient with an ECG signal. Int J Innov Technol Res 1(2):191-193
23. Matlab Documentation, version 7.0, Release 14 (2004) The MathWorks, Inc
24. Kale VN, Khalsa NN (2010) Performance evaluation of various wavelets for image compression of natural and artificial images. Int J Comput Sci Commun 1(1):179-184
25. Goudarzi MM, Taheri A, Pooyan M, Mahboobi R (2006) Multiwavelet and biological signal processing. Int J Inf Technol 2(4):264-272
26. Martis RJ, Acharya UR, Ray AK, Chakraborty C (2011) Application of higher order cumulants to ECG signals for the cardiac health diagnosis. In: Proceedings of the 33rd international conference of the IEEE EMBS, pp 1697-1700
27. Gandhi T, Panigrahi BK, Anand S (2011) A comparative study of wavelet families for EEG signal classification. Neurocomputing 74:3051-3057
28. Nagaria B, Hashmi F, Dhakad P (2011) Comparative analysis of fast wavelet transform for image compression for optimal image quality and higher compression ratio. Int J Eng Sci Technol 3:4014-4019
29. Chaudhari A, Chaudhary P, Cheeran AN, Aswani Y (2012) Improving signal to noise ratio of low-dose CT image using wavelet transform. Int J Comput Sci Eng 4:779-789
30. Hariharan M, Fook CY, Sindhu R, Ilias B, Yaacob S (2012) A comparative study of wavelet families for classification of wrist motions. Comput Electr Eng 38:1798-1807
31. Murugesapandian P, Yaacob S, Hariharan M (2008) Feature extraction based on Mel-scaled wavelet packet transform for the diagnosis of voice disorders. Proc IFMBE 21:790-793
32. Burrus CS, Gopinath RA, Guo H (1998) Introduction to wavelets and wavelet transforms: a primer. Prentice Hall, Englewood Cliffs
33. Gopinath B, Shanthi N (2013) Computer-aided diagnosis system for classifying benign and malignant thyroid nodules in multi-stained FNAB cytological images. Australas Phys Eng Sci Med. doi:10.1007/s13246-013-0199-8
34. Wang Y, Yang Z, Hao D, Zhang S, Yang Y, Zeng Y (2013) Measurement of subcutaneous adipose tissue thickness by near-infrared. Australas Phys Eng Sci Med. doi:10.1007/s13246-013-0196-y
35. Mahmoodian H (2012) Predicting the continuous values of breast cancer relapse time by type-2 fuzzy logic system. Australas Phys Eng Sci Med 35:193-204
36. Hariharan M, Paulraj MP, Yaacob S (2010) Time-domain features and probabilistic neural network for the detection of vocal fold pathology. Malays J Comput Sci 23:60-67
37. Hariharan M, Paulraj MP, Yaacob S (2011) Detection of vocal fold paralysis and oedema using time-domain features and probabilistic neural network. Int J Biomed Eng Technol 6:46-57
38. Specht DF (1990) Probabilistic neural networks. IEEE Trans Neural Netw 3:109-118
39. Specht DF (1991) A general regression neural network. IEEE Trans Neural Netw 2:568-576
40. Heaton J (2008) Introduction to neural networks for Java, 2nd edn. Heaton Research, St. Louis
41. Cohen I (2001) Enhancement of speech using bark-scaled wavelet packet decomposition. In: Proceedings of the 7th European conference on speech communication and technology (Eurospeech/INTERSPEECH), Aalborg, Denmark, Sept 3-7
42. Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th international joint conference on artificial intelligence, vol 2, Montreal, Quebec, Canada, pp 1137-1143