Environ Monit Assess (2018) 190:260 https://doi.org/10.1007/s10661-018-6635-1
Assessment of surface water quality using a growing hierarchical self-organizing map: a case study of the Songhua River Basin, northeastern China, from 2011 to 2015 Mingcen Jiang & Yeyao Wang & Qi Yang & Fansheng Meng & Zhipeng Yao & Peixuan Cheng
Received: 24 July 2017 / Accepted: 22 March 2018 © Springer International Publishing AG, part of Springer Nature 2018
Abstract The analysis of large volumes of multidimensional surface water monitoring data to extract latent information plays an important role in water quality management. In this study, a growing hierarchical self-organizing map (GHSOM) was applied to a water quality assessment of the Songhua River Basin in China using 22 water quality parameters monitored monthly at 13 monitoring sites from 2011 to 2015 (14,782 observations). The spatial and temporal features and the correlations between the water quality parameters were explored, and the major contaminants were identified. The results showed that the downstream reach of the Second Songhua River had the worst water quality in the Songhua River Basin, whereas the upstream and midstream reaches of the Nenjiang River and the Second Songhua River had the best. The major contaminants of the Songhua River were chemical oxygen demand (COD), ammonia nitrogen (NH3-N), total phosphorus (TP), and fecal coliform (FC). In the Songhua River, downstream water pollution has gradually eased over the years. However, FC and biochemical oxygen demand (BOD5) increased over time. The component planes showed that three sets of parameters were positively correlated with each other. GHSOM was found to have the following advantages over self-organizing maps and hierarchical clustering analysis: (1) it automatically generates the necessary neurons, (2) it intuitively exhibits the hierarchical inheritance relationships within the original data, and (3) it depicts the classification boundaries much more clearly. Therefore, applying GHSOM to water quality assessments, especially those with large amounts of monitoring data, enables more information to be extracted and provides strong support for water quality management.
Keywords Water quality assessment · Growing hierarchical self-organizing map · Major contaminant identification · Spatial feature · Temporal feature · Water quality management
M. Jiang · Y. Wang · Q. Yang · P. Cheng Beijing Key Laboratory of Water Resources & Environmental Engineering, China University of Geosciences (Beijing), Beijing 100083, People’s Republic of China
Y. Wang (*) · Z. Yao China National Environmental Monitoring Center, Beijing 100012, People’s Republic of China e-mail: [email protected]
F. Meng State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, People’s Republic of China
Introduction
Surface water pollution remains a major global problem, especially in developing countries (Shukla et al. 2017; Hu et al. 2017; Daou et al. 2016). To prevent and control pollution from industry, agriculture, domestic sewage, and other human activities, surface water quality is regularly monitored. The monitoring data can assist in understanding water quality trends (Anny et al. 2017; Griffiths
et al. 2017). Mathematical methods and models are used to better explore the information hidden in large volumes of multidimensional water quality data, because the correlations between water quality indicators cannot be described directly (Singh et al. 2005; Zou et al. 2006; Yin & Xu, 2008; Koklu et al. 2010; Tyagi et al. 2013; Almeida et al. 2014). In addition, the periodicity of water quality variations and the spatial features of major pollutants can be revealed by spatiotemporal analysis. Linear multivariate statistical methods, such as cluster analysis (CA), principal component analysis/factor analysis (PCA/FA), and discriminant analysis (DA), have been well reviewed in surface water quality assessments during the last decade because of their excellent ability to process and analyze large amounts of data (Templ et al. 2008; Shrestha & Kazama, 2007; Wahed et al. 2015; Juahir et al. 2011; Simeonov et al. 2003). However, these methods are restricted by the assumption of linearity (Gamble & Babbar-Sebens, 2012). More recently, the self-organizing map (SOM), a type of artificial neural network, has received attention from researchers. SOM, in the form of the Kohonen clustering network, was first proposed by Professor Teuvo Kohonen of the University of Helsinki, Finland, in 1981 (Kohonen 1981). SOM is an unsupervised, self-organizing network composed of fully connected neuron arrays that map a high-dimensional input space onto a two-dimensional space, which also allows the output to be visualized. Owing to its non-linear characteristics, highly parallel distributed structure, and capacity for learning and induction, SOM has been successfully applied in many fields related to water resources, including water quality assessment, leakage detection, and water region identification in LANDSAT images (Sengorur et al. 2015; Aksela et al. 2009; Janahiraman & Kong 2011; Voutilainen et al. 2012; Kalteh et al. 2008; Céréghino & Park, 2009). Nevertheless, the number and arrangement of neurons must be defined before the network is trained, because they determine the topology, which directly affects the classification results. Wu et al. (2015) stated that if the map size is too small, SOM might fail to reveal important differences between the data that should have been detected. In addition, SOM cannot express the hierarchy of the input data intuitively.
To address this problem, new models have been derived from the basic SOM model (Wu & Yen, 2003). Fritzke (1994) presented the growing cell structures model, which starts with three nodes forming a triangle. The growing neural gas algorithm added a growth mechanism to the hierarchy of SOMs (Costa & de Andrade Netto, 1999). The growing self-organizing map (GSOM) introduced a spread factor to control the growth of the map, and its outcome can present hierarchical clusters (Alahakoon et al. 2000; Matharage & Alahakoon, 2014). Generally, the GSOM starts with very few nodes, and the final structure does not need to be pre-defined but is obtained through learning and competition between the units of the growing mechanism. The growing hierarchical self-organizing map (GHSOM) is an improved SOM that incorporates the developments described above; it is an artificial neural network model with a hierarchical architecture composed of independent GSOMs (Dittenbach et al. 2000). It has been widely and successfully used in many fields, including the visualization of network forensics traffic data (De la Hoz et al. 2014), clustering of text documents (Sarnovsky & Ulbrik, 2013), network anomaly detection (Ippoliti & Zhou, 2012), and analysis of multi-dimensional stream habitat datasets (Bizzi et al. 2009). In particular, GHSOM has been applied to the recognition of sea surface temperature patterns (Liu et al. 2006), the study of seasonal changes, and the variability analysis of the Kuroshio intrusion (Wu et al. 2014; Tsui & Wu, 2012). These reports showed that GHSOM had a shorter training time than SOM and could reveal the hierarchical structure of large amounts of data, which allowed it to extract more information on characteristics and classification than SOM. Applying GHSOM to surface water quality assessment should therefore demonstrate the features of surface water quality more clearly.
However, GHSOM has rarely been applied to the analysis of surface water quality monitoring data in published research articles. In this study, the feasibility of applying GHSOM to the assessment of surface water quality was studied using the case of the Songhua River Basin in China. The spatial and temporal water quality features of the basin were analyzed based on monthly monitoring data from 13 sites with 22 water quality parameters from 2011 to 2015 (14,782 observations).
Materials and methods
Study area
The Songhua River Basin lies in northeastern China and includes three major water resource areas that are drained by the Nenjiang, Songhua, and Second Songhua Rivers. The basin drains a total catchment area of 557,000 km2, including 267,000 km2 of Nenjiang River, 73,000 km2 of Songhua River, and 187,000 km2 of the Second Songhua River. The water system of the Songhua River Basin is well developed and has two sources, the Nenjiang River and the Second Songhua River, which originate in the north and the south, respectively. The two sources meet to form the Songhua River, which feeds into the Heilongjiang River, a border river between China and Russia. The Heilongjiang River is referred to as the Amur River in Russia. The locations of the monitoring sites in the Songhua River Basin are depicted in Fig. 1. The Songhua River Basin is in the northern temperate monsoon climate zone and has well-defined climate characteristics. The temperature difference in the basin
Fig. 1 Study area and position of monitoring sites
is relatively large. The maximum temperature occurs in July, with an average of 21 to 25 °C, while the minimum temperature generally occurs in January, with an average below −20 °C. Most of the rainfall occurs in summer, and winter rainfall accounts for only 5% of the total. The Songhua River Basin is surrounded by three major mountain ranges. The Greater Khingan Mountains lie on the north side of the basin, and the Lesser Khingan Mountains and the Changbai Mountains lie on the northeast and southeast, respectively. These three parts of the basin contain large areas of brown and dark brown soil, and the Greater Khingan Mountains are mainly composed of volcanic rocks. The arid and semi-arid steppe zones in the west of the basin contain meadow soil and swamp soil, including salinized and alkalized meadow soil and swamp soil. The southwest is hilly terrain. The middle of the basin is the densely populated Songnen Plain. The cities are concentrated in this area, which is also a developed area of industry and agriculture. The land-use types of the Songhua River Basin have obvious spatial characteristics (Cao & Xu, 2014). The land-use types in the west, north, east, and northeast are
dominated by forest land. In the northwest, grassland and cultivated land are the main types. The main agricultural areas in the basin are concentrated in the middle and the northeast, where the land-use type is arable land. Land for urban construction is clustered around large and medium-sized cities, while land for rural construction is sporadically distributed across the plain area. Water from 13 monitoring sites (Fig. 1) was sampled and analyzed monthly. Essential information on the monitoring sites is listed in Table 1.
Data
To study the spatial and temporal features of water pollution, 22 water quality parameters were analyzed monthly over 5 years (2011–2015). The parameters analyzed were pH, electrical conductivity (EC), dissolved oxygen (DO), chemical oxygen demand (CODCr), 5-day biochemical oxygen demand (BOD5), ammonia nitrogen (NH3-N), total phosphorus (TP), petroleum, volatile phenol (VP), anionic surfactant (AiS), fluoride, cyanide, sulfide, mercury (Hg), selenium (Se), arsenic (As), lead (Pb), copper (Cu), zinc (Zn), cadmium (Cd), hexavalent chromium (CrVI), and fecal coliform (FC). It should be noted that the Pb, Cu, Zn, Cd, and CrVI measurements represent the soluble content of the metal in the water, whereas the Hg, Se, and As measurements represent the total content in water and therefore include the suspended and dissolved organic and inorganic compounds. In addition, the cyanide measurements express the free-state content in water rather than the total content, and the fluoride and sulfide measurements express the dissolved content in water. All the data were provided by the China National Environmental Monitoring Center. The basic descriptive statistics of the data are listed in Table 2.
Table 1 Monitoring site essential information
River | Monitoring site name | Code name | Longitude | Latitude
--- | --- | --- | --- | ---
The Second Songhua River | Baiqi | S1 | 126.4769° E | 44.4011° N
The Second Songhua River | Songhua River Village | S2 | 126.0240° E | 44.7874° N
The Second Songhua River | Songlin | S3 | 124.7242° E | 45.3433° N
Nenjiang River | Liuyuan | S4 | 123.9178° E | 47.3788° N
Nenjiang River | Jiangqiao | S5 | 123.6996° E | 46.7838° N
Nenjiang River | Baishatan | S6 | 123.8531° E | 46.2961° N
Nenjiang River | Nenjiang River Estuary | S7 | 124.6469° E | 45.4364° N
Songhua River | Zhaoyuan | S8 | 124.9867° E | 45.4894° N
Songhua River | Zhushun Village | S9 | 126.5364° E | 45.7567° N
Songhua River | Dadingzi Mountains | S10 | 127.1183° E | 46.0047° N
Songhua River | Upstream of Kiamusze | S11 | 129.9118° E | 46.6517° N
Songhua River | Jiangnan Village | S12 | 130.8225° E | 47.1860° N
Songhua River | Tongjiang | S13 | 132.4710° E | 47.6449° N
Growing hierarchical self-organizing map (GHSOM)
GHSOM is an artificial neural network method derived from the SOM. GHSOM addresses two shortcomings of the original SOM approach: (1) the size of the map is generated automatically from the data structure and does not need to be predetermined, and (2) the outcomes clearly demonstrate the inherent hierarchical structure of the input data set (Rauber et al. 2002). The estimated values representing the original quality data are used to adjust the structure of the GHSOM adaptively so that it conforms to the inherent structure of the input data as closely as possible. As with the SOM, GHSOM projects multidimensional data onto a two-dimensional plane. The difference between the two methods is that GHSOM has more than one two-dimensional plane: it contains several SOMs independently distributed in different layers. If some categories in a layer are not sufficiently detailed after clustering, they are extended to the next layer for more detailed classification. In general, the growth of the structure can be controlled by two measures: the quantization error (qe) and the mean quantization error (mqe). Liu et al. (2004) found that using mqe to control the growth of the model structure provides more detailed information about the original data and enables high classification accuracy. The deviation $D_{u_i}$ between the input vectors $x_j$ and the model vector $m_i$, which is the $mqe_i$ of unit $i$, can be calculated using Eq. 1.

$$D_{u_i} = mqe_i = \frac{1}{n} \sum_{x_j \in U_i} \lVert m_i - x_j \rVert \qquad (1)$$

where $n$ refers to the total number of input vectors $x_j$ in the set $U_i$ mapped onto unit $i$. The mean
Table 2 Descriptive statistics of the monitoring data
Variable | Statistic | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
pH | Mean | 7.49 | 7.20 | 7.82 | 7.91 | 7.84 | 7.77 | 7.52 | 7.63 | 7.76 | 7.78 | 7.52 | 7.51 | 7.12
pH | S.D. | 0.23 | 0.25 | 0.32 | 0.41 | 0.37 | 0.33 | 0.28 | 0.34 | 0.39 | 0.38 | 0.53 | 0.50 | 0.50
EC (ms/s) | Mean | 21.60 | 20.88 | 26.86 | 12.95 | 15.31 | 18.17 | 26.40 | 31.96 | 23.74 | 24.48 | 22.09 | 21.33 | 20.98
EC (ms/s) | S.D. | 4.36 | 2.79 | 4.68 | 2.30 | 2.92 | 8.81 | 15.32 | 13.42 | 9.51 | 5.58 | 4.97 | 6.12 | 6.01
DO (mg/L) | Mean | 8.36 | 10.06 | 7.29 | 9.09 | 8.97 | 10.23 | 8.42 | 8.43 | 8.13 | 8.52 | 9.77 | 9.68 | 8.36
DO (mg/L) | S.D. | 2.05 | 1.75 | 0.14 | 1.64 | 1.63 | 1.94 | 1.96 | 1.56 | 1.09 | 1.26 | 2.08 | 2.11 | 1.49
CODCr (mg/L) | Mean | 12.66 | 15.98 | 13.22 | 16.35 | 15.51 | 16.85 | 19.09 | 21.15 | 17.06 | 17.59 | 20.13 | 20.46 | 18.75
CODCr (mg/L) | S.D. | 2.23 | 2.36 | 1.10 | 3.55 | 4.72 | 4.01 | 4.90 | 4.49 | 2.72 | 2.78 | 3.85 | 4.14 | 2.18
BOD5 (mg/L) | Mean | 1.61 | 1.89 | 3.07 | 1.66 | 1.88 | 2.75 | 2.26 | 2.94 | 2.57 | 2.77 | 3.12 | 2.70 | 2.30
BOD5 (mg/L) | S.D. | 0.83 | 0.86 | 0.26 | 0.60 | 0.72 | 0.77 | 0.98 | 1.02 | 0.39 | 0.45 | 1.07 | 1.33 | 0.26
NH3-N (mg/L) | Mean | 0.447 | 0.522 | 0.794 | 0.292 | 0.425 | 0.713 | 0.621 | 0.749 | 0.612 | 0.671 | 0.571 | 0.619 | 0.488
NH3-N (mg/L) | S.D. | 0.232 | 0.188 | 0.397 | 0.139 | 0.298 | 0.315 | 0.368 | 0.402 | 0.384 | 0.399 | 0.320 | 0.330 | 0.270
TP (mg/L) | Mean | 0.089 | 0.084 | 0.181 | 0.095 | 0.102 | 0.082 | 0.162 | 0.193 | 0.111 | 0.122 | 0.127 | 0.124 | 0.142
TP (mg/L) | S.D. | 0.037 | 0.027 | 0.021 | 0.057 | 0.081 | 0.075 | 0.064 | 0.050 | 0.044 | 0.039 | 0.043 | 0.035 | 0.052
Petroleum (mg/L) | Mean | 0.018 | 0.005 | 0.033 | 0.006 | 0.006 | 0.009 | 0.022 | 0.022 | 0.025 | 0.032 | 0.026 | 0.020 | 0.022
Petroleum (mg/L) | S.D. | 0.012 | <0.001 | 0.008 | 0.004 | 0.004 | 0.010 | 0.027 | 0.029 | 0.017 | 0.020 | 0.016 | 0.015 | 0.003
VP (mg/L) | Mean | 0.00073 | 0.00103 | 0.00297 | 0.00015 | 0.00015 | 0.00068 | 0.00033 | 0.00028 | 0.00023 | 0.00020 | 0.00082 | 0.00113 | 0.00039
VP (mg/L) | S.D. | 0.00025 | 0.00034 | 0.00056 | <0.00001 | <0.00001 | 0.00097 | 0.00049 | 0.00040 | 0.00022 | 0.00003 | 0.00079 | 0.00096 | 0.00038
AiS (mg/L) | Mean | 0.026 | 0.101 | 0.057 | 0.025 | 0.026 | 0.027 | 0.039 | 0.046 | 0.025 | 0.025 | 0.029 | 0.030 | 0.025
AiS (mg/L) | S.D. | 0.005 | 0.030 | 0.005 | <0.001 | 0.004 | 0.007 | 0.024 | 0.038 | <0.001 | <0.001 | 0.013 | 0.017 | <0.001
Fluoride (mg/L) | Mean | 0.362 | 0.265 | 0.321 | 0.210 | 0.258 | 0.709 | 0.333 | 0.359 | 0.244 | 0.255 | 0.334 | 0.340 | 0.274
Fluoride (mg/L) | S.D. | 0.102 | 0.105 | 0.072 | 0.074 | 0.093 | 0.285 | 0.179 | 0.148 | 0.048 | 0.033 | 0.169 | 0.156 | 0.093
Cyanide (mg/L) | Mean | 0.0020 | 0.0010 | 0.0020 | 0.0020 | 0.0020 | 0.0014 | 0.0020 | 0.0020 | 0.0010 | 0.0010 | 0.0020 | 0.0020 | 0.0020
Cyanide (mg/L) | S.D. | <0.0001 | <0.0001 | 0.0003 | <0.0001 | <0.0001 | 0.0007 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001
Sulfide (mg/L) | Mean | 0.0100 | 0.0029 | 0.0563 | 0.0076 | 0.0074 | 0.0102 | 0.0116 | 0.0123 | 0.0059 | 0.0055 | 0.0314 | 0.0307 | 0.0100
Sulfide (mg/L) | S.D. | <0.0001 | 0.0019 | 0.0262 | 0.0035 | 0.0036 | 0.0263 | 0.0204 | 0.0264 | 0.0037 | 0.0037 | 0.0282 | 0.0244 | <0.0001
Hg (mg/L) | Mean | 0.5E-05 | 2.5E-05 | 5.1E-0 | 0.6E-05 | 0.6E-05 | 0.9E-05 | 1.1E-05 | 1.1E-05 | 0.5E-05 | 0.5E-05 | 2.8E-05 | 3.2E-05 | 2.3E-05
Hg (mg/L) | S.D. | <1E-06 | <1E-06 | 0.6E-05 | 0.3E-05 | 0.5E-05 | 0.7E-05 | 0.8E-05 | 0.8E-05 | <1E-06 | <1E-06 | 1.6E-05 | 1.8E-05 | 0.2E-05
Se (mg/L) | Mean | 0.00012 | 0.00013 | 0.00150 | 0.00039 | 0.00039 | 0.00170 | 0.00024 | 0.00024 | 0.00005 | 0.00006 | 0.00016 | 0.00015 | 0.00003
Se (mg/L) | S.D. | 0.00011 | 0.00001 | 0.00019 | 0.00024 | 0.00023 | 0.00100 | 0.00002 | 0.00002 | <0.00001 | 0.00003 | 0.00018 | 0.00016 | <0.00001
As (mg/L) | Mean | 0.00072 | 0.00100 | 0.00100 | 0.00063 | 0.00066 | 0.00350 | 0.00320 | 0.00360 | 0.00005 | 0.00006 | 0.00099 | 0.00110 | 0.00003
As (mg/L) | S.D. | 0.00037 | 0.00042 | 0.00013 | 0.00051 | 0.00049 | <0.00001 | 0.00410 | 0.00330 | <0.00001 | 0.00002 | 0.00050 | 0.00057 | <0.00001
< 0.001
7658
9427
< 0.001
74,548
73,153
< 0.001
22,419
35,610
0.00007
0.002
< 0.001
9235
7877
0.00007 < 1E-06
0.002
< 0.001
12,348
37,844
< 1E-06
0.002
< 0.001
1488
1088
4485
0.002 0.002 0.002
0.00010 0.00011 0.00005
4605
0.00018 < 1E-06 0.00001
< 0.0001 < 0.0001 < 0.0001
0.00005
< 0.0001
0.0250
< 0.001
0.0060
0.00009
0.0197
0.00002
0.0433
0.00002
0.0100
< 0.0001
0.0250
< 0.0001
0.002
0.0051
0.0033
0.0039
0.0156
0.0030
0.0343
< 0.0001
0.0100
0.0041 0.0042 0.0005 0.0005 0.0005
< 0.0001
0.00026
0.0050
0.00190
0.0005
0.00056 0.00160 0.00160
0.00140
0.00050
< 1E-06
0.00050 0.00050
< 1E-06
0.00050
< 1E-06
< 1E-06
S13 S11 S10 S8
S9
S12
S7
quantization error of a SOM, that is, the deviation $D_{m_k}$ between the input vectors and the mapping SOM $k$, can be regarded as the mean of the deviations of all its units and can be calculated using Eq. 2.

$$D_{m_k} = \frac{1}{m} \sum_{i \in G} D_{u_i} \qquad (2)$$

where $m$ refers to the total number of units in the set $G$ mapped onto SOM $k$. To control the number of mapping units in each SOM, Eq. 3 is used as the criterion function.

$$D_{m_k} < \varphi_1 D_{u_l} \qquad (3)$$
1053
3713 21,694 3461
9908
6714 62,136
(CFU/L)
13,272 Mean
S.D.
FC
20,330
79,984
1717
35,763
< 0.001 < 0.001 S.D. (mg/L)
< 0.001
0.001
0.002
< 0.001
0.002
< 1E-06 < 1E-06
0.002
< 1E-06
0.002
0.00001
0.010
< 1E-06
0.002 0.002 CrVI
0.00023 S.D.
Mean
(mg/L)
0.00050
0.0155
0.00005 0.00005
< 0.0001 < 0.0001 0.0092
0.00050 Cd
0.00024
S.D.
Mean
(mg/L)
0.0253
< 0.0001
0.00005
0.0120
0.0055 < 0.0001
0.0250 0.0183
0.0250
< 0.0001 0.0032
0.0250
0.0007
Mean Zn
0.0231
S.D. (mg/L)
0.0062
0.0257 0.0250 Mean Cu
0.0076
0.0007
0.0254
0.0250
< 1E-06
0.00500
< 1E-06
< 1E-06
0.00050
< 1E-06 < 1E-06 S.D. (mg/L)
0.00071 Mean Pb
0.00067
0.0050
0.00280
0.00050
S6 S5 S4 S3 S2 S1 Site Variable
Table 2 (continued)
where $D_{u_l}$ is the mqe of the corresponding unit $l$ in the upper layer, and $\varphi_1$ is a preset fraction. Each SOM has 2 × 2 units at the beginning. During training, $D_{m_k}$ is calculated once after every $\theta$ training iterations, and the unit $e$ with the maximum $D_{u_i}$ is marked as the error unit, while its most dissimilar unit $b$ is found among the neighboring units of unit $e$. The unit $b$ is determined using Eq. 4.

$$b = \arg\max_{i} \left( \lVert m_e - m_i \rVert \right) \qquad (4)$$

where $m_e$ and $m_i$ are the model vectors of unit $e$ and its neighboring units, respectively. If $D_{m_k}$ does not satisfy Eq. 3, a row or a column of units is inserted between unit $e$ and unit $b$, with the model vectors initialized as the means of their neighboring units (Chan & Pampalk, 2002). The model repeats the steps above until $D_{m_k}$ satisfies Eq. 3. Clearly, $\varphi_1$ controls the size of each SOM: the smaller the preset $\varphi_1$, the larger the SOM obtained. Another criterion function, given in Eq. 5, is needed to stop the global growth of the structure by deciding whether a cluster needs to be extended into the next layer.

$$D_{u_i} < \varphi_2 D_{u_1} \qquad (5)$$

where $D_{u_1}$ is the mqe of the unit composing the SOM in the first layer, and $\varphi_2$, like $\varphi_1$, is a preset fraction. During the training of GHSOM, after the size of each SOM has been adjusted, the mqe of every existing unit is checked against Eq. 5. For units whose mqe does not satisfy Eq. 5, new SOMs (belonging to the next layer) are established to cluster the data mapped onto these units in more detail. A GHSOM is thus formed according to these rules.
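The quantities in Eqs. 1, 2, 3, and 5 can be illustrated with a short NumPy sketch. This is not the GHSOM Toolbox implementation used in the study; the function names, the toy vectors, and the choice to average only over units that have inputs mapped onto them are assumptions made for illustration.

```python
import numpy as np


def unit_mqe(model_vector, mapped_inputs):
    """Eq. 1: mean Euclidean distance between a unit's model vector
    and the input vectors mapped onto that unit."""
    mapped_inputs = np.atleast_2d(np.asarray(mapped_inputs, dtype=float))
    distances = np.linalg.norm(mapped_inputs - model_vector, axis=1)
    return float(distances.mean())


def map_mqe(unit_mqes):
    """Eq. 2: mean of the unit deviations of one SOM.
    Assumption: unit_mqes holds only units with at least one mapped input."""
    return float(np.mean(unit_mqes))


def map_should_grow(d_mk, d_ul, phi1):
    """Eq. 3 not yet satisfied: keep inserting rows/columns into this SOM."""
    return d_mk >= phi1 * d_ul


def unit_should_expand(d_ui, d_u1, phi2):
    """Eq. 5 not yet satisfied: expand this unit into a SOM on the next layer."""
    return d_ui >= phi2 * d_u1


# Toy example with two units (hypothetical numbers, not monitoring data).
mqe_a = unit_mqe(np.array([0.0, 0.0]), [[3.0, 4.0], [0.0, 0.0]])
mqe_b = unit_mqe(np.array([1.0, 1.0]), [[1.0, 1.0]])
d_mk = map_mqe([mqe_a, mqe_b])
print(mqe_a, mqe_b, d_mk)
print(map_should_grow(d_mk, d_ul=2.0, phi1=0.5))
print(unit_should_expand(mqe_a, d_u1=4.0, phi2=0.5))
```

Smaller values of `phi1` make the first test harder to pass, so each map keeps growing for longer; smaller values of `phi2` send more units to a deeper layer.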
In this paper, the GHSOM modeling was conducted using Matlab R2014b with the GHSOM Toolbox developed by the Department of Software Technology at the Vienna University of Technology (http://www.ifs.tuwien.ac.at/~andi/ghsom/). The hierarchical clustering and the correlation analysis were performed using IBM® SPSS Statistics® 22.
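For readers working outside the Matlab toolbox, the map-growth step described in the method (find the error unit e, find its most dissimilar neighbor b with Eq. 4, then insert a row or column between them) might be prototyped as follows. This is a sketch under stated assumptions, not the toolbox code: the function name `grow_once`, the rectangular grid layout `(rows, cols, dim)`, and the use of 4-connected neighbors are all illustrative choices.

```python
import numpy as np


def grow_once(weights, unit_mqes):
    """Insert one row or column between the error unit and its most
    dissimilar neighbor.

    weights   : float array of shape (rows, cols, dim), the model vectors
    unit_mqes : float array of shape (rows, cols), the D_ui of each unit
    """
    rows, cols, _ = weights.shape
    # Error unit e: the unit with the largest mean quantization error.
    e = tuple(int(v) for v in np.unravel_index(np.argmax(unit_mqes), unit_mqes.shape))
    # Assumption: neighbors are the up/down/left/right units on the grid.
    neighbours = [
        (e[0] + dr, e[1] + dc)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
        if 0 <= e[0] + dr < rows and 0 <= e[1] + dc < cols
    ]
    # Eq. 4: neighbor whose model vector is farthest from that of e.
    b = max(neighbours, key=lambda n: np.linalg.norm(weights[e] - weights[n]))
    if b[0] != e[0]:
        # e and b are vertical neighbors: insert a row between them.
        upper = min(e[0], b[0])
        new_row = (weights[upper] + weights[upper + 1]) / 2.0
        grown = np.insert(weights, upper + 1, new_row, axis=0)
    else:
        # e and b are horizontal neighbors: insert a column between them.
        left = min(e[1], b[1])
        new_col = (weights[:, left] + weights[:, left + 1]) / 2.0
        grown = np.insert(weights, left + 1, new_col, axis=1)
    return grown, e, b


# Toy 2 x 2 map with one-dimensional model vectors (hypothetical values).
w = np.array([[[0.0], [1.0]], [[5.0], [6.0]]])
q = np.array([[0.1, 0.2], [0.9, 0.3]])
grown, e, b = grow_once(w, q)
print(e, b, grown.shape)  # (1, 0) (0, 0) (3, 2, 1)
```

New model vectors are initialized as the mean of the two units they separate, which follows the initialization described in the method; after each insertion the map would be retrained before Eq. 3 is checked again.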
Results and discussion
Clustering result
The classification of the monitoring sites by the GHSOM model is shown in Fig. 2. Monitoring sites in the same rectangle belong to one cluster (Chan & Pampalk, 2002). Four layers were obtained; the first layer is shown in the darkest color and the last layer in the lightest. This representation of the classification result is much easier to read than the output of SOM (Rauber et al. 2002; Palomo et al. 2012). The Surface Water Environmental Quality Standard of China, provided in Table 3, was used to support the feature analysis of the classification.
Fig. 2 Clusters of monitoring sites obtained from the GHSOM model
Spatial features analysis
The classification results in layer 1 mainly reflect the spatial features of the water quality. The layer 1 component planes are shown in Fig. 3, in which high values are shown in white and low values in black. The component planes of the water quality parameters were an excellent aid for interpreting the clusters obtained (Alvarez-Guerra et al. 2008). Cluster I contained four monitoring sites (S7, S8, S11, and S12, except for S11–2015 and S12–2015) from the upstream and downstream of the Songhua River, which in general showed high concentrations of COD, NH3-N, TP, Zn, and As. According to the standards in Table 3, the concentrations of Zn and As at the four sites were all below level I; however, the concentrations of COD, NH3-N, and TP exceeded level II. This result may be due to the emission of large amounts of organic pollutants from industrial plants along the river (Wang et al. 2012a). High concentrations of FC were also found in cluster I, probably as a result of the dense urban population (Gao et al. 2014). Therefore, the features of cluster I can be summarized as organic and domestic sewage pollution.
Table 3 Surface Water Environmental Quality Standard of China
Variable | Level I | Level II | Level III | Level IV | Level V
--- | --- | --- | --- | --- | ---
DO (mg/L) ≥ | 7.5 | 6 | 5 | 3 | 2
COD (mg/L) ≤ | 15 | 15 | 20 | 30 | 40
BOD5 (mg/L) ≤ | 3 | 3 | 4 | 6 | 10
NH3-N (mg/L) ≤ | 0.15 | 0.5 | 1.0 | 1.5 | 2.0
TP (mg/L) ≤ | 0.02 | 0.1 | 0.2 | 0.3 | 0.4
Petroleum (mg/L) ≤ | 0.05 | 0.05 | 0.05 | 0.5 | 1.0
VP (mg/L) ≤ | 0.002 | 0.002 | 0.005 | 0.01 | 0.1
Hg (mg/L) ≤ | 0.00005 | 0.00005 | 0.0001 | 0.001 | 0.001
Pb (mg/L) ≤ | 0.01 | 0.01 | 0.05 | 0.05 | 0.1
Cu (mg/L) ≤ | 0.01 | 1.0 | 1.0 | 1.0 | 1.0
Zn (mg/L) ≤ | 0.05 | 1.0 | 1.0 | 2.0 | 2.0
Se (mg/L) ≤ | 0.01 | 0.01 | 0.01 | 0.02 | 0.02
As (mg/L) ≤ | 0.05 | 0.05 | 0.05 | 0.1 | 0.1
Cd (mg/L) ≤ | 0.001 | 0.005 | 0.005 | 0.005 | 0.01
Cr VI (mg/L) ≤ | 0.01 | 0.05 | 0.05 | 0.05 | 0.1
Fluoride (mg/L) ≤ | 1.0 | 1.0 | 1.0 | 1.5 | 1.5
Cyanide (mg/L) ≤ | 0.005 | 0.05 | 0.2 | 0.2 | 0.2
Sulfide (mg/L) ≤ | 0.05 | 0.1 | 0.2 | 0.2 | 1.0
AiS (mg/L) ≤ | 0.2 | 0.2 | 0.2 | 0.3 | 0.3
FC (CFU/L) ≤ | 200 | 2000 | 10,000 | 20,000 | 40,000
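The comparison of measured concentrations with the limits in Table 3 amounts to a simple lookup. The sketch below is an illustration only: the function name and data layout are hypothetical, only five parameters are included, and the limits are copied from Table 3. DO is treated as a lower bound and the other parameters as upper bounds.

```python
LEVELS = ("I", "II", "III", "IV", "V")

# Limits copied from Table 3 (mg/L); the flag says whether the limit is an
# upper bound (True, "<=") or a lower bound (False, ">=").
LIMITS = {
    "DO": ((7.5, 6, 5, 3, 2), False),
    "COD": ((15, 15, 20, 30, 40), True),
    "BOD5": ((3, 3, 4, 6, 10), True),
    "NH3-N": ((0.15, 0.5, 1.0, 1.5, 2.0), True),
    "TP": ((0.02, 0.1, 0.2, 0.3, 0.4), True),
}


def water_quality_level(parameter, value):
    """Return the best (lowest-numbered) level whose limit the value meets."""
    limits, is_upper_bound = LIMITS[parameter]
    for level, limit in zip(LEVELS, limits):
        meets = value <= limit if is_upper_bound else value >= limit
        if meets:
            return level
    return "worse than V"


# Example with the S3 site means for NH3-N and DO reported in Table 2.
print(water_quality_level("NH3-N", 0.794))  # III
print(water_quality_level("DO", 7.29))      # II
```

Because levels I and II share the same limit for some parameters (for example COD), a value that meets both is reported as level I.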
It is apparent that the features of cluster II were the characteristics of S3. More than half of the parameters showed high values. Among the parameters with high concentrations, BOD5, NH3-N, TP, VP, Hg, and FC were all higher than the level II standards, and Cu, Cr VI, and sulfide exceeded level I. In addition, the values of DO in cluster II were lower than 7.5 mg/L; that is, DO did not meet the level I limit shown in Table 3. Overall, cluster II represented the worst water quality of the four clusters. Cluster III grouped three monitoring sites (S9, S10, and S13) from the midstream and estuary of the Songhua River, together with 2 years of data from S1. In contrast with cluster II, half of the 22 water quality parameters showed low concentrations. The three sites were characterized by low values of VP, Hg, Pb, Cu, Zn, Se, As, Cr VI, fluoride, cyanide, and AiS. The positions of the sites in cluster III revealed the self-purification of the water body (González et al. 2014). In addition, the concentrations of COD, NH3-N, and TP in cluster III were higher than level II. Clusters I and III include all the sites of the Songhua River, with high values of COD, NH3-N, and TP. This demonstrated that the Songhua River was polluted by industrial wastewater, as also reported in other articles (Wang et al. 2012a; Wang et al. 2012b; Gao et al. 2014). In general, cluster III indicated relatively clean water quality. Finally, five monitoring sites (S1, S2, S4, S5, and S6, except for S1–2013, S1–2014, and S6–2011) from the upstream and midstream of the Nenjiang River and the Second Songhua River were grouped in cluster IV. High values of DO, Cd, and fluoride and low values of EC, BOD5, NH3-N, TP, and petroleum were shown in this cluster. The concentrations of Cd and fluoride did not exceed level I. Therefore, cluster IV had the best water quality of the four clusters.
Temporal features analysis
The temporal features of the water quality can be summarized from the classification results in layer 2. The component planes of layer 2 are shown in Figs. 4–6 for exploring the temporal features in more detail. S2, S3, S4, S5, and S13 showed no clear temporal variations during 2011–2015.
Fig. 3 Component planes of 22 water quality parameters in layer 1
S7 had the same temporal variations as S8, and S11 was consistent with S12 (Fig. 2). At these four monitoring sites, VP, Pb, and FC increased over time (Fig. 4). FC was a major pollution characteristic of cluster I, as described above. This may be due to the rapidly growing population in the Songhua River Basin (Shen et al. 2017). BOD5 was also a concern at S7 and S8. By 2015, the concentrations of BOD5 had risen beyond level II of the Surface Water Environmental Quality Standard of China (Table 3). At S11 and S12, COD decreased over time, and the water quality was markedly improved in 2015. In general, water pollution in the downstream Songhua River has gradually eased over the years (Wei et al. 2017). The temporal variations of S9 were consistent with those of S10 (Fig. 2). The concentrations of TP and petroleum in 2013–2015 were higher than in 2011–2012, and Cd and sulfide decreased significantly (Fig. 5). In addition, FC increased over time, which indicates the importance of controlling the discharge of domestic sewage into the Songhua River. S1 and S6 each have their own temporal features. The concentration of COD at S1 rose in 2013–2014 and declined in 2015. Although the concentrations of some parameters, namely Cu, Zn, and TP, increased in 2015, the water quality was still better than in 2011–2012 (Fig. 6). The temporal variations of S6 were not obvious during 2012–2015. However, the water quality improved considerably after 2011 because more than half of the parameters decreased.
Correlation analysis between parameters
Further information can also be obtained from the component planes: the relationships between the parameters can be inferred from their distribution characteristics. For example, Fig. 7 shows that the component planes of VP, Hg, and CrVI are very similar. Cu, Se, and Pb also have a similar pattern. Parameters with similar distributions in the component planes may therefore be positively correlated with each other. Other articles have also reported that comparing component planes can reveal correlations between parameters (Alvarez-Guerra et al. 2008; Hentati et al. 2010; Jin et al. 2011). Three sets of parameters of the 13
Fig. 4 Component planes of 22 water quality parameters of cluster I in layer 2
monitoring sites that are positively correlated with each other are listed in Fig. 7. It should be noted that only the component planes in layer 1 contained all the monitoring data. Thus, this study considered only the similar distributions in layer 1.
Validation of results
As hierarchical clustering analysis (HCA) is a mature and widely accepted method of data classification (Vega et al. 1998; Güler et al. 2002; Shrestha & Kazama, 2007; Yidana et al. 2008), it was used to classify the same 13 monitoring sites in order to verify the classification result of GHSOM. The result is shown in Fig. 8, a dendrogram of the monitoring sites based on the water quality parameters, in which the sites are classified into three obvious clusters: cluster I (S1, S4, S5, S7, S8, S9, S10, S11, S12, and S13), cluster II (S2 and S6), and cluster III (S3). The result is mostly consistent with the classification obtained from the GHSOM. However, except for monitoring sites with obvious characteristics, the sites cannot be easily distinguished in HCA. For example, HCA clustered S7, S8, S9, and S10 together, as well as S11, S12, and S13, because they had the same pollution characteristics. However, the better water quality of S9, S10, and S13 was not revealed, so the self-purification of the water body is not reflected. In addition, the temporal features cannot be clearly represented when multiple monitoring sites are analyzed at one time. The outcome of the hierarchical cluster analysis not only verified the classification of GHSOM but also revealed the advantages of the GHSOM model (Gamble & Babbar-Sebens, 2012). The correlations between parameters were tested using Pearson’s coefficient. Table 4 lists the Pearson’s coefficients between the water quality parameters with similar distributions in the component planes. The results in Table 4 strongly support the results depicted in Fig. 7: 88% of the pairs of water quality parameters with similar distributions in the component planes are significantly correlated (p < 0.01). The test illustrates that GHSOM can give a quick assessment of water quality parameter correlation (Alvarez-Guerra et al. 2008; Hentati et al. 2010; Jin et al. 2011).
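The Pearson test used for this validation can be reproduced in a few lines. The study itself used SPSS; the sketch below is only an illustration with NumPy, and the two short series are made-up numbers, not monitoring data. The two-tailed significance follows from comparing the t statistic with a Student t distribution with n − 2 degrees of freedom, which is left to a statistics package here.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


def t_statistic(r, n):
    """t statistic for testing r against zero with n - 2 degrees of freedom.
    Undefined for |r| = 1."""
    return float(r * np.sqrt((n - 2) / (1.0 - r ** 2)))


# Hypothetical concentrations of two parameters at five sampling times.
param_a = [1.0, 2.0, 3.0, 4.0, 5.0]
param_b = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(param_a, param_b)
print(round(r, 3), round(t_statistic(r, len(param_a)), 3))
```

With real monitoring data, each pair of parameters whose component planes look alike would be passed through the same two functions, and the share of pairs with p < 0.01 would then be counted.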
Fig. 5 Component planes of 22 water quality parameters of cluster III in layer 2
Conclusions
In this study, GHSOM was applied to a water quality assessment of the Songhua River Basin in China using 22 water quality parameters monitored monthly at 13 monitoring sites from 2011 to 2015 (14,782 observations). The spatial and temporal features and the correlations between the water quality parameters were explored, and the major contaminants were identified. GHSOM clustered the 13 monitoring sites into four groups in the first layer. Three of the four groups were sub-clustered in the second layer. The clusters in the first layer mainly revealed the spatial features of the rivers. The results showed that the downstream reach of the Second Songhua River had the worst water quality in the Songhua River Basin. Conversely, the water quality of the upstream and midstream reaches of the Nenjiang River and the Second Songhua River was the best. The major contaminants of the Songhua River were COD, NH3-N, TP, and FC, owing to industrial and domestic wastewater. The main discharge areas were upstream of S7, S8, S11, and S12. The sub-clusters in the second layer revealed the temporal features. In the Songhua River, downstream water pollution has gradually eased over the years. However, FC and BOD5 increased over time and deserve more attention in the future. The classification of the monitoring sites obtained from GHSOM was verified by hierarchical clustering analysis. The component planes were compared to find correlations between water quality parameters. The results showed that three sets of parameters were positively correlated with each other. Correlation analysis was used to verify the outcomes. It revealed that 88% of the pairs of water quality parameters had significant positive correlations (p < 0.01). The component planes of GHSOM can thus give a quick, visual assessment of water quality parameter correlation. This study shows that GHSOM is well suited to surface water quality assessment. The key concept of GHSOM is a hierarchical structure with multiple layers, each containing several independent SOMs. Moreover, GHSOM proved more useful than SOM and hierarchical
260
Environ Monit Assess (2018) 190:260
Page 12 of 15
Fig. 6 Component planes of 22 water quality parameters of cluster IV in layer 2
clustering analysis because of the advantages as follows. (1) It can automatically generate the necessary neurons to achieve the specified clustering accuracy according to the characteristics of the input
Fig. 7 Parameters with similar distribution in component planes
data. (2) GHSOM allows the hierarchical inheritance relationship between the original data to be intuitively exhibited in the surface water quality assessment. And (3) GHSOM enables the boundaries of
Environ Monit Assess (2018) 190:260
Page 13 of 15 260 Table 4 Correlation test of the water quality parameters with similar distribution in component planes Parameter
Pearson correlation
Significance (two-tailed)
Se–Pb
0.550**
0.000
Se–Cu
0.712**
0.000
Cu–Pb
0.238
0.056
NH3-N–TP
0.486**
0.000
NH3-N–BOD5
0.357**
0.003
TP–BOD5
0.441**
0.000
VP–Hg
0.775**
0.000
VP–CrVI
0.843**
0.000
Hg–CrVI
0.723**
0.000
**Correlation is significant at the 0.01 level (two-tailed)
quality assessment, especially with large amounts of monitoring data, can facilitate the extraction more information, and provide strong support for water quality management. Acknowledgements We would like to thank the China National Environmental Monitoring Center for providing the water quality monitoring data. Funding information This research was a part of Major Science and Technology Program for Water Pollution Control and Treatment of China entitled BNational Water Environment Monitoring Intelligent Management Integrated Platform Construction Technology and Operational Demonstration^ (2014ZX07502-002) and was also supported by the Fundamental Research Funds for the Central Universities (2652016084).
References
Fig. 8 Hierarchical cluster tree of the monitoring sites
the classification to be depicted much more clearly. Therefore, the application of GHSOM in water
Alahakoon, D., Halgamuge, S. K., & Srinivasan, B. (2000). Dynamic self-organizing maps with controlled growth for knowledge discovery. IEEE Transactions on Neural Networks, 11(3), 601–614.
Almeida, S. F., Elias, C., Ferreira, J., Tornés, E., Puccinelli, C., Delmas, F., et al. (2014). Water quality assessment of rivers using diatom metrics across Mediterranean Europe: a methods intercalibration exercise. Science of the Total Environment, 476, 768–776.
Alvarez-Guerra, M., González-Piñuela, C., Andrés, A., Galán, B., & Viguri, J. R. (2008). Assessment of Self-Organizing Map artificial neural networks for the classification of sediment quality. Environment International, 34(6), 782–790.
Anny, F., Kabir, M., & Bodrud-Doza, M. (2017). Assessment of surface water pollution in urban and industrial areas of Savar Upazila, Bangladesh. Pollution, 3(2), 243–259.
Aksela, K., Aksela, M., & Vahala, R. (2009). Leakage detection in a real distribution network using a SOM. Urban Water Journal, 6(4), 279–289.
Bizzi, S., Harrison, R. F., & Lerner, D. N. (2009). The Growing Hierarchical Self-Organizing Map (GHSOM) for analysing multi-dimensional stream habitat datasets. In 18th World IMACS/MODSIM Congress.
Cao, H., & Xu, D. (2014). Spatial-temporal variation of land-use in Songhua River Basin. Chinese Agricultural Science Bulletin, 30(8), 144–149.
Céréghino, R., & Park, Y. S. (2009). Review of the self-organizing map (SOM) approach in water resources: commentary. Environmental Modelling & Software, 24(8), 945–947.
Chan, A., & Pampalk, E. (2002). Growing hierarchical self organising map (ghsom) toolbox: visualisations and enhancements. In Neural Information Processing, 2002. ICONIP'02. Proceedings of the 9th International Conference on (Vol. 5, pp. 2537–2541). IEEE.
Costa, J. A. F., & de Andrade Netto, M. L. (1999). Automatic data classification by a hierarchy of self-organizing maps. In Systems, Man, and Cybernetics, 1999. IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on (Vol. 5, pp. 419–424). IEEE.
Daou, C., Nabbout, R., & Kassouf, A. (2016). Spatial and temporal assessment of surface water quality in the Arka River, Akkar, Lebanon. Environmental Monitoring and Assessment, 188(12), 684.
De la Hoz, E., de la Hoz, E., Ortiz, A., Ortega, J., & Martínez-Álvarez, A. (2014). Feature selection by multi-objective optimisation: application to network anomaly detection by hierarchical self-organising maps. Knowledge-Based Systems, 71, 322–338.
Dittenbach, M., Merkl, D., & Rauber, A. (2000). The growing hierarchical self-organizing map. In Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on (Vol. 6, pp. 15–19). IEEE.
Fritzke, B. (1994). Growing cell structures—a self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9), 1441–1460.
Gamble, A., & Babbar-Sebens, M. (2012). On the use of multivariate statistical methods for combining in-stream monitoring data and spatial analysis to characterize water quality conditions in the White River Basin, Indiana, USA. Environmental Monitoring and Assessment, 184(2), 845–875.
Gao, D., Li, Z., Wen, Z., & Ren, N. (2014). Occurrence and fate of phthalate esters in full-scale domestic wastewater treatment plants and their impact on receiving waters along the Songhua River in China. Chemosphere, 95, 24–32.
González, S. O., Almeida, C. A., Calderón, M., Mallea, M. A., & González, P. (2014). Assessment of the water self-purification capacity on a river affected by organic pollution: application of chemometrics in spatial and temporal variations. Environmental Science and Pollution Research, 21(18), 10583–10593.
Griffiths, J. A., Chan, F. K. S., Zhu, F., Wang, V., & Higgitt, D. L. (2017). Reach-scale variation surface water quality in a reticular canal system in the lower Yangtze River Delta region, China. Journal of Environmental Management, 196, 80–90.
Güler, C., Thyne, G. D., McCray, J. E., & Turner, K. A. (2002). Evaluation of graphical and multivariate statistical methods for classification of water chemistry data. Hydrogeology Journal, 10(4), 455–474.
Hentati, A., Kawamura, A., Amaguchi, H., & Iseri, Y. (2010). Evaluation of sedimentation vulnerability at small hillside reservoirs in the semi-arid region of Tunisia using the Self-Organizing Map. Geomorphology, 122(1), 56–64.
Hu, J., Liu, C., Guo, Q., Yang, J., Okoli, C. P., Lang, Y., et al. (2017). Characteristics, source, and potential ecological risk assessment of polycyclic aromatic hydrocarbons (PAHs) in the Songhua River Basin, Northeast China. Environmental Science and Pollution Research, 1–13.
Ippoliti, D., & Zhou, X. (2012). A-GHSOM: An adaptive growing hierarchical self organizing map for network anomaly detection. Journal of Parallel and Distributed Computing, 72(12), 1576–1590.
Janahiraman, T. V., & Kong, W. (2011). SOM based segmentation method to identify water region in LANDSAT images. International Journal of Electronics, Computer and Communications Technologies, 2(1), 13–18.
Jin, Y. H., Kawamura, A., Park, S. C., Nakagawa, N., Amaguchi, H., & Olsson, J. (2011). Spatiotemporal classification of environmental monitoring data in the Yeongsan River basin, Korea, using self-organizing maps. Journal of Environmental Monitoring, 13(10), 2886–2894.
Juahir, H., Zain, S. M., Yusoff, M. K., Hanidza, T. I. T., Armi, A. S. M., Toriman, M. E., & Mokhtar, M. (2011). Spatial water quality assessment of Langat River Basin (Malaysia) using environmetric techniques. Environmental Monitoring and Assessment, 173(1), 625–641.
Kalteh, A. M., Hjorth, P., & Berndtsson, R. (2008). Review of the self-organizing map (SOM) approach in water resources: analysis, modelling and application. Environmental Modelling & Software, 23(7), 835–845.
Kohonen, T. (1981). Automatic formation of topological maps of patterns in a self-organizing system. In Processing 2nd Scandinavian Conference on Image Analysis (pp. 214–220). Oja, E., Simula, O. (eds.).
Koklu, R., Sengorur, B., & Topal, B. (2010). Water quality assessment using multivariate statistical methods—a case study: Melen River System (Turkey). Water Resources Management, 24(5), 959–978.
Liu, H., Wang, J., & Zheng, C. (2004). Growing hierarchical self-organizing map models for mental task classification. Shengwu Wuli Xuebao, 21(6), 443–448.
Liu, Y., Weisberg, R. H., & He, R. (2006). Sea surface temperature patterns on the West Florida Shelf using growing hierarchical self-organizing maps. Journal of Atmospheric and Oceanic Technology, 23(2), 325–338.
Matharage, S., & Alahakoon, D. (2014). Growing self organising map based exploratory analysis of text data. World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, 8(4), 639–646.
Palomo, E. J., North, J., Elizondo, D., Luque, R. M., & Watson, T. (2012). Application of growing hierarchical SOM for visualisation of network forensics traffic data. Neural Networks, 32, 275–284.
Rauber, A., Merkl, D., & Dittenbach, M. (2002). The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data. IEEE Transactions on Neural Networks, 13(6), 1331–1341.
Sarnovsky, M., & Ulbrik, Z. (2013). Cloud-based clustering of text documents using the GHSOM algorithm on the GridGain platform. In Applied Computational Intelligence and Informatics (SACI), 2013 IEEE 8th International Symposium on (pp. 309–313). IEEE.
Sengorur, B., Koklu, R., & Ates, A. (2015). Water quality assessment using artificial intelligence techniques: SOM and ANN—a case study of Melen River Turkey. Water Quality, Exposure and Health, 7(4), 469–490.
Shen, Y., Cao, H., Tang, M., & Deng, H. (2017). The human threat to river ecosystems at the watershed scale: an ecological security assessment of the Songhua River Basin, Northeast China. Water, 9(3), 219.
Shukla, A. K., Ojha, C. S. P., & Garg, R. D. (2017). Application of overall index of pollution (OIP) for the assessment of the surface water quality in the Upper Ganga River Basin, India. In Development of Water Resources in India (pp. 135–149). Springer, Cham.
Shrestha, S., & Kazama, F. (2007). Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji river basin, Japan. Environmental Modelling & Software, 22(4), 464–475.
Simeonov, V., Stratis, J. A., Samara, C., Zachariadis, G., Voutsa, D., Anthemidis, A., Sofoniou, M., & Kouimtzis, T. (2003). Assessment of the surface water quality in Northern Greece. Water Research, 37(17), 4119–4124.
Singh, K. P., Malik, A., & Sinha, S. (2005). Water quality assessment and apportionment of pollution sources of Gomti river (India) using multivariate statistical techniques—a case study. Analytica Chimica Acta, 538(1), 355–374.
Templ, M., Filzmoser, P., & Reimann, C. (2008). Cluster analysis applied to regional geochemical data: problems and possibilities. Applied Geochemistry, 23(8), 2198–2213.
Tsui, I. F., & Wu, C. R. (2012). Variability analysis of Kuroshio intrusion through Luzon Strait using growing hierarchical self-organizing map. Ocean Dynamics, 62(8), 1187–1194.
Tyagi, S., Sharma, B., Singh, P., & Dobhal, R. (2013). Water quality assessment in terms of water quality index. American Journal of Water Resources, 1(3), 34–38.
Vega, M., Pardo, R., Barrado, E., & Debán, L. (1998). Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Research, 32(12), 3581–3592.
Voutilainen, A., Rahkola-Sorsa, M., Parviainen, J., Huttunen, M. J., & Viljanen, M. (2012). Analysing a large dataset on long-term monitoring of water quality and plankton with the SOM clustering. Knowledge and Management of Aquatic Ecosystems, (406), 04.
Wahed, M. S. A., Mohamed, E. A., Wolkersdorfer, C., El-Sayed, M. I., M’nif, A., & Sillanpää, M. (2015). Assessment of water quality in surface waters of the Fayoum watershed, Egypt. Environmental Earth Sciences, 74(2), 1765–1783.
Wang, C., Feng, Y., Sun, Q., Zhao, S., Gao, P., & Li, B. L. (2012a). A multimedia fate model to evaluate the fate of PAHs in Songhua River, China. Environmental Pollution, 164, 81–88.
Wang, C., Feng, Y., Zhao, S., & Li, B. L. (2012b). A dynamic contaminant fate model of organic compound: a case study of Nitrobenzene pollution in Songhua River, China. Chemosphere, 88(1), 69–76.
Wei, C., Gao, C., Han, D., Zhao, W., Lin, Q., & Wang, G. (2017). Spatial and temporal variations of water quality in Songhua River from 2006 to 2015: implication for regional ecological health and food safety. Sustainability, 9(9), 1502.
Wu, C. R., Hsin, Y. C., Chiang, T. L., Lin, Y. F., & Tsui, I. (2014). Seasonal and interannual changes of the Kuroshio intrusion onto the East China Sea Shelf. Journal of Geophysical Research: Oceans, 119(8), 5039–5051.
Wu, M. L., Wang, Y. S., & Gu, J. D. (2015). Assessment for water quality by artificial neural network in Daya Bay, South China Sea. Ecotoxicology, 24(7–8), 1632–1642.
Wu, Z., & Yen, G. G. (2003). A SOM projection technique with the growing structure for visualizing high-dimensional data. International Journal of Neural Systems, 13(05), 353–365.
Yidana, S. M., Ophori, D., & Banoeng-Yakubo, B. (2008). A multivariate statistical analysis of surface water chemistry data—the Ankobra Basin, Ghana. Journal of Environmental Management, 86(1), 80–87.
Yin, H. L., & Xu, Z. X. (2008). Comparative study on typical river comprehensive water quality assessment methods. Resources and Environment in the Yangtze Basin, 17(5), 729–733.
Zou, Z. H., Yi, Y., & Sun, J. N. (2006). Entropy method for determination of weight of evaluating indicators in fuzzy synthetic evaluation for water quality assessment. Journal of Environmental Sciences, 18(5), 1020–1023.