Data mining based quality analysis on informants involved applied research


Cluster Comput DOI 10.1007/s10586-016-0657-7

Jinlou Xie1 · Jianjian Luo2 · Qingyuan Zhou3

Received: 29 July 2016 / Revised: 15 September 2016 / Accepted: 19 September 2016 © Springer Science+Business Media New York 2016

Abstract Attitudinal questions are widely used in statistical questionnaire surveys, but the reliability of the answers is in doubt because the psychological tendencies of informants introduce systematic psychological errors. To investigate this, the work set up control and experimental groups and analyzed their answers. The selection tendencies behind the systematic psychological errors were found to derive from the way the questionnaire answers are set, and they follow identifiable rules. Researchers conducting statistical surveys should therefore respect the psychological tendency laws of informants and design questionnaires scientifically and rationally. In this way, the overall quality of survey data can be enhanced.

Keywords Statistical questionnaire survey · Psychological tendency laws · Data quality · Application

1 Introduction

“In God we trust; everyone else must bring data,” said Edwards Deming, the American management scholar and statistician. The statement points to how widely the importance of data is recognized. As the social economy keeps developing and growing in scale, it becomes more difficult to gain comprehensive knowledge of it, and the possibility of mastering accurate knowledge of objects is diminishing. As a result, the significance of sample surveys has become increasingly evident, and the theory of sampling techniques has attracted growing attention from the community. In the transition from possessing complete firsthand materials to sampling investigation, the reliability of sampling naturally becomes the core issue in the development of sample surveys. Although the statistical confidence trap puts survey estimation in an awkward position, estimation remains of great significance when the confidence level is very high. At a constant confidence level, improving other estimation or investigation methods brings the results closer to the reality of the objects. This not only avoids the pitfalls of statistical confidence but also improves the efficiency of the survey. In the development of statistical sampling theory, PPS sampling estimation is a major advance and a qualitative leap compared with simple random sampling. However, pure theory means little on its own; a theory gains social significance only through the test of practice. Therefore, this paper takes a practical point of view and focuses on an empirical test of the two most commonly used and practical sampling methods.

Qingyuan Zhou [email protected]

1 Changzhou Institute of Technology, Changzhou, People’s Republic of China
2 Changzhou College of Information Technology, Changzhou, People’s Republic of China
3 Changzhou Administrative College, Changzhou, People’s Republic of China

2 Related works

Some scholars have rigorously proved the theoretical superiority of PPS sampling by mathematical derivation. On the whole, existing research focuses on improving statistical sampling methods, and the requirements for data quality have risen accordingly. Against this background, the evaluation of data accuracy became a crucial part of research [1–4]. Evaluation methods for macroeconomic data were also developed, including equivalent calculation index system evaluation, account balance index system evaluation and “problem” evaluations [5]. Statistical surveys proved to be a vigorous means of understanding the accuracy of data [6]. Since statistical data quality has been highly valued recently, improving the quality of government statistical data should be treated as a management priority, and the government is expected to adopt more diversified evaluation methods [7,8].

As for data sources, statistical survey data have proved more significant for research. For both the government and academic circles, survey data are more practical because they are convenient, flexible and easy to obtain. Even so, survey data are never flawless. Because stochastic factors are hard to control, it is very difficult to make all the data perfectly precise. Fortunately, the data can gradually approach precision as the survey plan design improves and the investigation process is progressively controlled [9]. There is also relatively mature research on top-level survey plans within the range of sample investigations [2]. Numerous tactics are thus adopted in questionnaire design, concerning the number of questions, the question stems, the text length of alternative choices and the order of questions. An important hypothesis is implied behind this research on the validity, reliability and application of survey questionnaires: data are obtained from survey questionnaires without the impact of psychological factors.
However, it cannot be denied that the impact of this hypothesis on data quality is no less than that of actual incidents with small or medium probabilities. The randomness of samples was considered to give rise to systematic errors. Beyond this, context problems should also be incorporated into research on survey data quality [10]. This recognition can be regarded as an important achievement of research from the perspective of informants’ psychology. However, mental factors are bound to affect an investigation throughout the entire process, so psychological influences are difficult to avoid if psychological factors are controlled by means of context alone. For this reason, the authors maintain that the psychic-reflex factors of informants should be taken into consideration when investigating survey data quality. The problem is that even if the psychological tendency factors are incorporated into the investigation, as mentioned above, the objective existence of psychological tendency is not completely resolved [11]. Hence, data surveys that integrate psychological selection tendency into each answer are not carried out to evade the risks of psychological influencing factors. Instead, they are conducted to investigate and analyze the objective target of research by making use of psychic-reflex factors in accordance with objective laws [12–14], because questionnaire surveys that utilize “the psychological tendency of informants” seem more effective and significant than those that reduce survey errors by improving investigation methods.

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use [15]. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating [16]. Data mining is the analysis step of the “knowledge discovery in databases” process [17].

Based on a comparative experiment on psychological selection tendency, this work uses the practical results to study the psychological tendency that arises when informants answer questionnaires. In the final part, a practice plan is proposed in accordance with the objective psychological laws.

3 Hypotheses of psychological errors

3.1 Error hypothesis

Psychological selection tendency of informants in questionnaire surveys: error hypothesis. Statistical investigations are affected by the individual mental activities of both investigators and informants. Such influences rarely result from a single source; they arise from the combined characteristics of mental activities. Following the decomposition of statistical errors, the compound factors from which the errors of mental activities originate can be decomposed into three types: psychological registration errors, psychological random errors and psychological systematic deviations.

3.1.1 Psychological random errors

Randomness of psychological selection of informants in questionnaire surveys: psychological random errors. According to this hypothesis, informants are sometimes uncertain about their responses to the questions. As a result, the same informant may give different answers to the same question at different times and under different conditions, and such discrepancies are strongly correlated with variations in the environment. This is mainly attributable to the volatility of informants’ mental activities. If mental activities were fixed, these psychological errors would be eliminated. In other words, the choice among alternative answers is correlated with the volatility that environmental influences induce in mental activities. The difference conforms to the principle of statistical randomness, as it fluctuates within a certain range and in a steady sequence.

3.1.2 Psychological registration errors

Inconsistency between mental and behavioral activities of informants in questionnaire surveys: psychological registration errors. Unlike the hypothesis of psychological random errors, this hypothesis focuses on the correlation between individual mental activities and behavioral activities under the same circumstances. According to it, people’s behavior may differ even when their mental activities are fixed. The discrepancy, similar to a slip between thinking and writing, is mainly generated by the inconsistency between mental and behavioral activities, hence the name psychological registration error.

3.1.3 Psychological systematic deviations

Informants’ special preference for the design of alternative answers: psychological systematic deviations. The previous two hypotheses are based on the relationship between the mental factors and behavioral activities of informants, whereas this hypothesis attaches greater importance to the selection tendencies of informants faced with different designs of alternative answers. According to this hypothesis, the surface appearance of the objective target forms a tendentious conditioned response in people’s psychology, and this conditioned reflex acts on behavioral choices. More specifically, the deviation is rooted in the psychic-reflex process.
Because the deviation is stereotyped by the psychic reflex and follows certain laws, the authors treat it as a systematic deviation, manifested in a systematic tendency among the alternative choices in survey questionnaires.

3.2 Contents defined in the research

The error solutions adopted in statistical investigations can be used to reduce psychological registration errors and psychological random errors. Psychological systematic deviations, however, are determined by the survey questionnaire itself; that is, the survey questionnaire affects the quality of survey data. The psychological registration errors and random errors are assumed to form a white noise sequence, so these two errors have an inconspicuous influence on the quality of statistical survey data, whereas psychological systematic deviations have a marked impact on it. For this reason, the work studies the psychological systematic deviations that exist throughout the questions of statistical questionnaire surveys. To simplify the research, the scope is narrowed further to whether the psychological tendency of informants is affected by the answer design of attitudinal questions.
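One way to read this decomposition, given here as an illustrative assumption rather than a model stated by the authors, is an additive form. The answer $x_{ij}$ of informant $j$ under answer design $i$ is split into a common attitude level $\mu$, a design-specific systematic deviation $\delta_i$, and a zero-mean noise term $\varepsilon_{ij}$ that lumps together the random and registration errors. The symbols $\mu$, $\delta_i$ and $\varepsilon_{ij}$ are hypothetical and do not appear in the original text.

```latex
x_{ij} = \mu + \delta_i + \varepsilon_{ij}, \qquad \mathbb{E}[\varepsilon_{ij}] = 0
```

Under this reading, comparing two answer designs $i$ and $i'$ amounts to asking whether $\delta_i - \delta_{i'}$ differs from zero, which is what a comparison of group means can detect when the noise averages out.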

4 Experiment design and investigation

The study of psychological tendency in this questionnaire survey was built on experimental test and verification. The research used experiment design, investigation and analysis to reveal the influence of psychological systematic deviations on statistical surveys.

4.1 Related requirements for experiment design

The experiment is an empirical study in the social sciences. It was designed to meet the following four requirements in accordance with the nature of the research.

4.1.1 Principle of avoiding interference with alternative answer options

Despite the psychological differences among informants involved in sample questionnaire surveys, the discrepancies are greatly diminished through the analysis of behavioral results. If interference still exists, the essential phenomenon will be heavily concealed, even more deeply hidden than in other social sciences. Disturbance terms, such as different survey times and research targets, should therefore be kept as few as possible in the experimental design.

4.1.2 Distractions of survey questions

Distraction means that the experimental questions in the questionnaire should be masked by other items. There can be a single authentic experimental question. To obscure the research content from informants and to reduce any special emotion towards the research question, several other questions are interspersed with the real survey content.

4.1.3 Relative comparability

The experiment design should contain two parts, a control group and an experimental group. In natural science experiments it is easy to carry out conventional contrast experiments, and some social sciences with suitable properties can also set up specific experimental and control groups. However, the psychological intervention problems in this research have no standard reference, so the designed control group can also be taken as the experimental group, and conversely the experimental group as the control group.

4.1.4 Neutral survey questions

The survey questions are neutrally designed. In other words, the questions are allowed to have little transitory volatility for a specific group of people. If this transitory volatility has no correlation, positive or negative, with the variation of the psychological qualities of informants, the questions are considered neutral.

4.2 Experiment design

In line with the hypotheses and requirements, the work divided the psychological experimental groups of the statistical investigation into four types. The first type was the experimental group on the order of alternative answers. In this group the alternative choices were symmetrical and the order was completely opposite. The second type was the experimental group with asymmetrical alternative answers to the same experiment question as in the first type. The alternative choices were designed to be asymmetrical, and the two answer sets, which served as control groups to each other, inclined in two opposite directions. The third type was the experimental group comparing a LIKERT scale with text descriptions. It was mainly designed to investigate the differences when informants were required to express their answers in digits or in words. The fourth type was the magnitude experimental group of the LIKERT scale. It investigated the influence of magnitude (the number of scale levels) on the selection results by designing alternative answers with different magnitudes. The questions were designed as follows:

(1) Do you think that opportunities are important for job hunting?
Not important at all 1 2 3 4 5 6 7 Very important
(2) Do you think that opportunities are important for job hunting?
Very important 7 6 5 4 3 2 1 Not important at all
(3) Do you think that opportunities are important for job hunting?
A. Not important B. Just so-so C. Relatively important D. Important E. Very important
(4) Do you think that opportunities are important for job hunting?
A. Not important at all B. Not important C. Relatively unimportant D. Just so-so E. Important
(5) To what degree are you concerned about the employment situation of college students?
A. Not concerned at all B. Not concerned C. Basically not concerned D. Just so-so E. Relatively concerned F. Concerned G. Very concerned
(6) To what degree are you concerned about the employment situation of college students?
A. Very concerned B. Concerned C. Relatively concerned D. Just so-so E. Basically not concerned F. Not concerned G. Not concerned at all
(7) To what degree are you concerned about the employment situation of college students?
Not concerned 1 2 3 4 5 Often concerned
(8) To what degree are you concerned about the employment situation of college students?
Often concerned 5 4 3 2 1 Not concerned
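Section 5.1 assigns numerical values to the text options of questions (3) to (6) before analysis. The following is a minimal sketch of that coding, assuming lightly normalized English labels. The names `IMPORTANCE_CODES`, `CONCERN_CODES` and `encode` are hypothetical and are not taken from the paper.

```python
# Hypothetical coding tables: labels follow questions (3) to (6) in lightly
# normalized English, values follow the scheme described in Section 5.1.

# Questions 3 and 4: signed scores centred on the neutral option.
IMPORTANCE_CODES = {
    "Not important at all": -3,
    "Not important": -2,
    "Relatively unimportant": -1,
    "Just so-so": 0,
    "Relatively important": 1,
    "Important": 2,
    "Very important": 3,
}

# Questions 5 and 6: 1 (least concerned) to 7 (most concerned).
CONCERN_CODES = {
    "Not concerned at all": 1,
    "Not concerned": 2,
    "Basically not concerned": 3,
    "Just so-so": 4,
    "Relatively concerned": 5,
    "Concerned": 6,
    "Very concerned": 7,
}


def encode(answers, codes):
    """Map a list of chosen option labels to their numerical scores."""
    return [codes[label] for label in answers]


# The score depends on the label, not on where the option is printed,
# so the reversed layout of question (6) needs no separate table.
question_6_layout = [
    "Very concerned", "Concerned", "Relatively concerned", "Just so-so",
    "Basically not concerned", "Not concerned", "Not concerned at all",
]
print(encode(question_6_layout, CONCERN_CODES))  # [7, 6, 5, 4, 3, 2, 1]
```

Keying the scores on labels rather than on option positions keeps answers to the ascending and reversed layouts directly comparable.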

4.3 Experiment design and investigation explanation

Among the above eight questions, the first and second were taken as a control group. They were used to study whether the order of alternative answers on the scale affects the psychological behavior of informants and gives rise to systematic psychological errors when questionnaire surveys use a LIKERT scale. The fifth and sixth questions were taken as a control group, used to study whether the order of alternative answers affects the responses to attitudinal questions stated with text descriptions. These first two control groups jointly verified the influence of answer order on the psychological tendency of informants. The third and fourth were taken as a control group, used to study the impact of asymmetry on psychological tendency when the alternative answers are stated with text descriptions. The first and fifth, as well as the second and sixth, were taken as control groups, used to study the different influences on individual psychological tendency when the alternative answers are stated on a LIKERT scale and with text descriptions, respectively. The first and seventh, as well as the second and eighth, were taken as control groups, used to study the impact of the number of LIKERT scale levels on survey results.

To satisfy the second requirement for questionnaire design, another eight questions related to job hunting were set as disturbance terms. The eight experiment questions were divided into four groups; that is, every two experiment questions were combined with the eight disturbance questions to form four types of survey questionnaires. Forty students were randomly selected as the fixed informants from college students in the first semester of their junior year. The two questionnaires that were control groups to each other were sent to the same student. The student was then required to complete the whole survey twice, with an interval of 2 months between the two surveys.

The following points explain the experiment design and investigation:

(1) Each question in the two groups was regarded as both a control item and an experimental item on the same level.
(2) The experiment covered only eight questions, plus another eight disturbance questions about the perception of the employment situation of college students and about job intentions. The experiment questions were inserted separately among the eight disturbance questions to form four types of survey questionnaires. In this way the real investigation question was kept implicit. In accordance with the requirement of distraction, students were kept from recalling their earlier answers.
(3) Informants were chosen from college students in the first semester of their junior year. Freshmen and sophomores basically have no idea about job hunting, while students currently struggling with job hunting are prone to more radical views that may influence survey quality. Junior students, in contrast, have just begun to pay attention to job hunting and have not really sought a job yet. Hopeful and sufficiently concerned, they are objective enough to meet the requirement of avoiding interference.
(4) The same students were surveyed with the different questionnaires in the same control group. The interval was kept at 2 months so as to weaken students’ memories of the first questions. The interval should not be longer, lest students acquire new information about employment and change their choices.
(5) The first group of questions took the form of a LIKERT scale, which helps to study how the psychological tendency towards order influences behavior and reduces informants’ memories. The LIKERT scale was not adopted in the second group, because that group targeted inconsistency problems that a LIKERT scale cannot reflect. Nonetheless, the statistical analysis still relies on the LIKERT scale.
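The control groups described in Section 4.3 can be written down as a small lookup structure. This is a minimal sketch: the pairings come from the text above, while the purpose labels are paraphrases and the names `CONTROL_PAIRS` and `partners` are hypothetical.

```python
# Control pairs from Section 4.3: (question numbers, what the contrast studies).
# The purpose strings are paraphrases, not wording from the paper.
CONTROL_PAIRS = [
    ((1, 2), "answer order on a LIKERT scale"),
    ((5, 6), "answer order with text descriptions"),
    ((3, 4), "asymmetry with text descriptions"),
    ((1, 5), "LIKERT scale versus text descriptions"),
    ((2, 6), "LIKERT scale versus text descriptions"),
    ((1, 7), "number of LIKERT scale levels"),
    ((2, 8), "number of LIKERT scale levels"),
]


def partners(question):
    """Return the questions that act as a control for the given question."""
    found = []
    for (first, second), _purpose in CONTROL_PAIRS:
        if first == question:
            found.append(second)
        elif second == question:
            found.append(first)
    return sorted(found)


print(partners(1))  # [2, 5, 7]
```

Listing the pairs explicitly makes it easy to see that questions 1 and 2 each take part in three contrasts, while questions 3, 4, 7 and 8 each take part in one.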

5 Result analysis

5.1 Experiment surveys and results

All survey questionnaires were collected strictly in accordance with the investigation procedure above. Since the experiment questions all yield ordinal attitudinal data, each attitudinal status was, for convenience of analysis, assigned a numerical value according to its degree from pessimistic to optimistic. This method suits quantitative analysis and is consistent with LIKERT scale analysis. The organized survey data are shown in Table 1. In Table 1, the informants numbered 1 to 10 are those who received the first survey, and those numbered 11 to 20 are those who took the second survey. Since questions 1, 2, 7 and 8 use a LIKERT scale, the table directly records the informants’ answers to these questions. When the answers to questions 3 and 4 are quantified, the unbiased intermediate state “general” is coded as zero. The values 1, 2 and 3 are assigned to the three levels in the “general”-to-optimism direction: “comparatively important”, “important” and “very important”. The values −1, −2 and −3 are assigned to the three levels in the “general”-to-pessimism direction: “comparatively unimportant”, “unimportant” and “completely unimportant”. For questions 5 and 6, to maintain the comparability of the statistics, the values 1, 2, 3, 4, 5, 6 and 7 are assigned in ascending order of concern to “completely unconcerned, unconcerned, basically unconcerned, generally concerned, relatively concerned, concerned and extremely concerned”. Equivalently, when the options are listed in descending order of concern, the values 7, 6, 5, 4, 3, 2 and 1 are assigned to “extremely concerned, concerned, relatively concerned, generally concerned, basically unconcerned, unconcerned and completely unconcerned”.

5.2 Testing methods for the survey results

The work studied the psychological tendency reflected by informants in questionnaire surveys.
In other words, the research asked whether different schemes of alternative answers induce systematic psychological deviations in informants. This can be judged from the statistical significance of the difference between the mean levels of the survey data. To test statistical significance, it is appropriate to adopt the analysis of variance proposed by R.A. Fisher for analyzing differences between two or more samples. The verification is achieved by contrasting the variation within groups with the variation between groups. The work used one-way analysis of variance to test the data of each control group separately.

Let $x_{ij}$ denote the answer of informant $j$ to question $i$. By question number, all the survey data can be divided into eight groups $x_{1j}, x_{2j}, \ldots, x_{8j}$, where $j = 1, 2, \ldots, 20$. Equations 1 and 2 below give the within-group and between-group sums of squares for each set of data $x_{ij}$. (Each control group is tested separately, so here the subscripts $i$ and $j$ of $x_{ij}$ refer only to the two questions that are control groups to each other. The data in the calculations are independent of the data outside the control group.)

$$SSE = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 \tag{1}$$

where $x_{ij}$ refers to the survey data in group $i$ and $\bar{x}_i$ to the mean value within group $i$; $i$ is the subscript of the questions that are control groups to each other. This error corresponds to the psychological random error in the hypotheses of psychological errors.

$$SSA_{i,i+k} = \sum_i \sum_j (\bar{x}_i - \bar{x})^2 = \sum_i n_i (\bar{x}_i - \bar{x})^2 \tag{2}$$

where $\bar{x}$ is the mean of all $x_{ij}$ in the control group, $\bar{x}_i$ is the mean value within group $i$, and $n_i$ is the number of answers in group $i$; $i$ is again the subscript of the questions that are control groups to each other.

To analyze whether the option settings induce systematic errors in the psychological behavior of informants, the variation between groups must be contrasted with the variation within groups. Following the principle of the variance test, the F statistic is constructed to test the statistical significance of the difference:

$$F = \frac{SSA/(r-1)}{SSE/(n-r)} = \frac{\sum_i \sum_j (\bar{x}_i - \bar{x})^2/(r-1)}{\sum_i \sum_j (x_{ij} - \bar{x}_i)^2/(n-r)} \tag{3}$$

where $r$ is the number of groups in the contrast and $n$ is the total number of answers in those groups. The work contrasts two related groups at a time, so $r = 2$. The F statistic is compared with the corresponding F-distribution. According to statistical hypothesis testing, if the P value is less than 0.05 (or 0.10 under a relaxed requirement), the difference is judged statistically significant. In that case it is concluded that different designs of options do affect the psychological tendency of informants, implying the existence of systematic psychological deviations.

Table 1 Result data of experiment survey

Number of informant  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8
 1                    4   4   2   1   3   4   1   3
 2                    7   4   0  −2   5   5   5   3
 3                    7   4   3   1   6   6   4   3
 4                    4   7   2   0   3   5   3   4
 5                    6   5   0   1   7   6   4   1
 6                    5   5   1   1   4   5   2   5
 7                    7   4   2   0   3   4   3   5
 8                    5   6   1   1   5   4   4   3
 9                    7   5   1   1   3   4   5   3
10                    5   4   2   1   7   6   3   4
11                    6   3   1  −1   5   4   6   5
12                    6   5   0   0   5   5   3   4
13                    7   4   1   1   3   5   3   4
14                    5   6   2   1   6   5   4   3
15                    6   4   2   2   6   6   2   5
16                    4   4   3   0   7   4   1   5
17                    4   5   0   0   3   6   5   2
18                    5   4   1   1   4   4   4   1
19                    7   7   2   1   4   5   3   3
20                    7   6   2   0   3   5   3   2

(Q1 to Q8 stand for Question 1 to Question 8.)

5.3 Analysis of experiment results

On the basis of the variance analysis principle, EXCEL software was used to calculate the F statistic and its P value (see Table 2).
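As a cross-check of the procedure in Section 5.2, the sketch below computes the between-group sum of squares, the within-group sum of squares and the F statistic in plain Python rather than EXCEL. The function name `one_way_f` is hypothetical. The sketch assumes that the contrast of questions 1 and 7 used the Table 1 answers of informants 1 to 10, giving n = 20 and r = 2. The paper does not state which rows entered each test, so this subset is an assumption.

```python
def one_way_f(groups):
    """Return (SSA, SSE, F) for a one-way analysis of variance."""
    n = sum(len(g) for g in groups)   # total number of answers
    r = len(groups)                   # number of groups in the contrast
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares (Equation 2).
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares (Equation 1).
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # F statistic (Equation 3).
    f = (ssa / (r - 1)) / (sse / (n - r))
    return ssa, sse, f


# Assumed subset: answers of informants 1 to 10 in Table 1.
question_1 = [4, 7, 7, 4, 6, 5, 7, 5, 7, 5]
question_7 = [1, 5, 4, 3, 4, 2, 3, 4, 5, 3]

ssa, sse, f = one_way_f([question_1, question_7])
print(round(ssa, 2), round(sse, 2), round(f, 5))  # 26.45 28.5 16.70526
```

Under the stated assumption the group means are 5.7 and 3.4 and the F statistic is about 16.705, which agrees with the value reported in Table 2 for the contrast of questions 1 and 7. The P value would additionally require the F-distribution with (1, 18) degrees of freedom, which is left out to keep the sketch free of dependencies.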

Table 2 Results of variance analysis

Experiment    Question    Mean value  F statistic  Corresponding P value
Experiment 1  Question 1  5.7         9.218182     0.007104
              Question 2  4.4
Experiment 2  Question 3  1.4         4.313609     0.052402
              Question 4  0.5
Experiment 3  Question 1  5.7         4.986885     0.038689
              Question 5  4.4
Experiment 4  Question 2  4.4         2.419355     0.137251
              Question 6  4.9
Experiment 5  Question 5  4.4         0.965665     0.338789
              Question 6  4.9
Experiment 6  Question 1  5.7         16.70526     0.000692
              Question 7  3.4
Experiment 7  Question 2  4.4         6.081081     0.023939
              Question 8  3.4

According to the results of the seven control groups in Table 2, the first and fifth experiments indicate that the influence of answer order on the psychological tendency of individual choices depends largely on whether the answers are stated with text descriptions. The first experiment contrasted the two orders of the LIKERT scale. The P value of the variance analysis of the mean difference is 0.007104. In other words, the psychological tendency of informants is affected by the order of options in questionnaires designed on a LIKERT scale, and there are relatively conspicuous systematic psychological deviations. In addition, the mean value for the ascending arrangement (5.7) is larger than that for the reverse order (4.4). From the perspective of individual psychology, this suggests that such systematic psychological deviations are rooted in the habit of choosing the options on the right. The fifth experiment fails the significance test, with a P value (0.338789) higher than 0.05. This implies that the choices made by informants are independent of the order of alternative answers when attitudinal questions are stated with text descriptions rather than on a LIKERT scale.

The second experiment, with a P value of 0.052402, passes the significance test only when the critical probability is relaxed to 0.1. In this case, it can be concluded that the consistency (symmetry) of the alternative answers also affects the final results even when the answers are stated with text descriptions.

The third and fourth experiments were jointly used to analyze the differences between answers stated on a LIKERT scale and answers stated with text descriptions. The third experiment passes the significance test while the fourth fails, which seems contradictory. Combined with the analysis of the first experiment, however, it can be seen that passing the test does not depend on how the alternative answers are stated but on whether the options are arranged in ascending or reverse order. That is to say, it is the order of answers that affects the survey results. Whether the survey results agree is unrelated to the way the alternative answers are stated.

The sixth and seventh experiments tested whether the magnitude of the LIKERT scale affects the survey results. Both pass the significance test. But this merely shows that the numbers differ and does not fully settle the question, since the differences may simply stem from the larger or smaller range of the scale. Therefore, a further contrast with the theoretical mean value of each question is needed. The mean value on the seven-magnitude scale in the sixth experiment is 5.7, lying between the theoretical mean 4 and the maximum 7. The mean value on the five-magnitude scale is 3.4, closer to its theoretical mean 3 than 5.7 is to 4. Hence, it can be concluded that increasing the number of scale magnitudes inflates the survey results while making the evaluation scale more precise. The difference diminishes in the seventh experiment, where the alternative answers are stated in reverse order.
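The comparison with the theoretical mean can be made concrete with a small calculation. The measure below is an illustrative assumption, not one defined in the paper. The hypothetical function `relative_offset` expresses how far an observed mean lies above the scale midpoint, as a share of the distance from the midpoint to the top of a 1-to-`levels` scale, using the mean values reported in Table 2.

```python
def relative_offset(mean, levels):
    """Offset of a mean from the scale midpoint, as a share of the
    distance between the midpoint and the top of a 1..levels scale."""
    midpoint = (1 + levels) / 2
    return (mean - midpoint) / (levels - midpoint)


# Mean values as reported in Table 2: (mean, number of scale levels).
reported = {
    "question 1, seven levels, ascending": (5.7, 7),
    "question 7, five levels, ascending": (3.4, 5),
    "question 2, seven levels, reverse": (4.4, 7),
    "question 8, five levels, reverse": (3.4, 5),
}

for label, (mean, levels) in reported.items():
    print(label, round(relative_offset(mean, levels), 3))
```

Under this measure the ascending seven-level scale sits about 0.57 of the way from the midpoint to the maximum, against 0.2 for the five-level scale. In reverse order the two values are much closer (about 0.13 against 0.2), in line with the remark that the difference diminishes when the answers are stated in reverse order.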

6 Conclusions The differences in the psychological tendency of informants were partly caused by the way in which the alternative answers to attitudinal questions were stated in the survey questionnaires. In addition, systematic psychological deviations were still influenced by how the alternative answers were stated.

Such psychological deviations were ultimately carried over from psychological tendency into actual choice behavior, which was mainly manifested in differences in the selection of alternative answers. The major features are as follows. (1) When alternative answers were stated in a Likert scale, the order of the scale affected the individual psychological selection tendency: people tended to choose the options on the right. This feature was not statistically significant when alternative answers were stated with text descriptions. (2) When alternative answers were stated with text descriptions, consistency affected the individual psychological selection tendency, which was manifested in inconsistent selections of alternative answers. (3) Increasing the number of scale points inflated the survey results while making the evaluation scale more precise. The difference diminished when alternative answers were stated in reverse order. The research above indicates that, in survey research with attitudinal questions, the choices of informants are affected by the characteristics of the answer settings. Therefore, alternative answers to attitudinal questions should be set according to the laws of systematic psychological deviations. If problems or deficiencies are found in the surveyed phenomenon, the alternative answers can be set so as to make the existing problems more salient. For example, the number of scale points can be increased; the options can be arranged in order from optimistic to pessimistic; and the alternative answers can be set to be inconsistent and inclined toward the pessimistic side of the question. In research on the advantages and merits of the target objects, the alternative answers can be set in the opposite direction.
To obtain more accurate data on the contents stated in the survey questionnaires, the number of Likert scale points can be used for regulation and control. With more scale points, the survey results become more precise, but they also deviate further to the right. Using the reverse order helps to reduce the deviation to the right, and vice versa.

Acknowledgments This work is supported by the Sciences Foundation of Jiangsu Province (No. 14ZWD001).


Jinlou Xie received a Master's degree from Soochow University, Suzhou, in 2008. Currently, he is an associate professor at Changzhou Institute of Technology, China. He has participated in and completed 6 provincial research programs. He has also authored or coauthored over 20 papers in international and national journals and conferences. His current research interests include financial engineering and cluster computing.

Jianjian Luo received a Ph.D. degree from Shanghai University, Shanghai, in 2007. Currently, he is an associate professor at Changzhou College of Information Technology, China. He has participated in and completed 2 national and 9 provincial research programs. He has also authored or coauthored over 30 papers in international and national journals and conferences. His current research interests include optimization theory and cluster computing.

Qingyuan Zhou received a Ph.D. degree from Sichuan University (SCU), Chengdu, in 2014. Currently, he is an associate professor at Changzhou Administrative College, China. He has participated in and completed 5 national and 6 provincial research programs. He has also authored or coauthored over 50 papers in international and national journals and conferences. His current research interests include cluster computing, computational economics and optimization theory.
