J Market Anal https://doi.org/10.1057/s41270-018-0039-5
ORIGINAL ARTICLE
Item placement for questionnaire design for optimal reliability

Pushkin Kachroo (Department of Electrical and Computer Engineering, UNLV, Las Vegas, USA)
Sheen Kachen (Department of International and Area Studies, University of California, Berkeley, Berkeley, USA)
Revised: 12 May 2018 / © Macmillan Publishers Ltd., part of Springer Nature 2018
Abstract

This paper presents a methodology for placing the items of a survey so as to obtain optimal reliability. A mathematical model is developed based on specific structured assumptions about reliability and consistency, and the problem is transformed into a mathematical optimization problem. We present a solution methodology for the problem along with properties of the algorithm, and an example is presented to illustrate the methodology.

Keywords: Optimization · Questionnaire · Sequencing
Introduction

Questionnaire design encompasses general principles, such as a survey's objective and the ordering of questions, as well as more specific principles, such as a survey's layout and the wording of questions (Labaw 1981). Elements of questionnaire design like numbering and physical presentation influence the way survey respondents perceive and answer questions. Similarly, the types of response formats offered for each question and the structure of the questions themselves are greatly important in questionnaire design (Bradburn et al. 2004). The phrasing of questions and responses should avoid leading questions,
thus precluding response error (Willis 2004). In survey design, question spacing is integral to measuring a question's salience from the perspective of respondents and, thereby, its overall effectiveness. Question sequencing is one mechanism by which a survey can optimize the quality of responses: the quality of a response depends on the placement of each question within the group of questions offered in a survey. Validity and reliability are the metrics used to evaluate questionnaires, the former measuring whether a question poses a relevant query and the latter measuring a questionnaire's internal consistency (Rattray and Jones 2007). Both metrics can help determine the optimal placement of questions in a survey so as to maximize the quality of responses.

A proper and effective survey design is very important (Lavrakas 2008), and item placement is one of the issues involved. Item placement has been studied experimentally to evaluate its effect in testing settings (Mollenkopf 1950), and also in more general contexts (Ory 1982; McFarland et al. 2002). Validity and reliability are extremely important to any questionnaire designer (Rattray and Jones 2007; Cuber and Gerberich 1946; Santos 1999), but specifically using item placement as a mechanism to address reliability, and solving it as an optimization problem, is the new contribution of this paper.

The length of a questionnaire has a definite effect on response quality (Galesic and Bosnjak 2009; Herzog and Bachman 1981; Burchell and Marsh 1992). How the quality of responses deteriorates in the later part of a questionnaire has been studied systematically (Galesic and Bosnjak 2009). This principle is used in this paper to show how the mathematical optimization problem for item placement can be solved for a "chosen" deterioration rate. However, more research needs to be
performed to find out what specific rate would be valid for a given questionnaire.
Reliability

The internal consistency of the questions in a questionnaire is vital in measuring its reliability, which is one of the factors used to determine the quality of responses by questionnaire respondents. It is imperative to assess the accuracy of the information recorded in questionnaires in order to determine the reliability of the data (Cuber and Gerberich 1946). Reliability can be assessed using Cronbach's alpha statistic, which measures whether constituent items belong to the same domain within a questionnaire's distinct domains. Cronbach's alpha tests the average correlation of items in a survey to determine its internal consistency, and thus its reliability (Santos 1999). Additionally, reliability is sometimes measured using alternative correlation statistics. Both Cronbach's alpha and these alternatives indicate a questionnaire's internal consistency, which measures its reliability. Questions with similar phrasing and content, or identical questions repeated throughout a survey, increase reliability and the quality of the data collected. Reliability is extremely important for a survey; for instance, in surveys that try to capture customer satisfaction, the designers would like to know the answers to their questions with high reliability and validity.
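Cronbach's alpha is the standard statistic referred to above; as a concrete reference, the following is a minimal Python sketch of its textbook computation from a respondents-by-items score matrix (the function name and toy data are illustrative and not taken from the paper).

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Example: four respondents answering three similar items on a 5-point scale.
responses = [[4, 5, 4],
             [2, 2, 3],
             [5, 4, 5],
             [3, 3, 3]]
print(round(cronbach_alpha(responses), 2))  # about 0.92 for this consistent toy data
```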
Questionnaire length and fatigue

As respondents work through a survey, the attention they afford to each individual question varies as a function of multiple factors. Alertness depends on various inputs, such as sleep and the amount of time an individual spends working, both of which affect respondents' performance (Mallis et al. 2004). Survey length can be negatively related to participants' willingness to participate in a survey and, as such, lower the quality of data collected from survey responses. Questions situated later rather than earlier in a survey are associated with four indicators of lower data quality (Galesic and Bosnjak 2009):

(a) shorter response times,
(b) higher item-nonresponse rates,
(c) shorter answers to open-ended questions, and
(d) less variability in items arranged in grids.
Each of these metrics can be related to data quality of survey responses as a function of questionnaire length. As such, the issue of question placement in regard to survey length is important to consider in order to maximize data quality.
Mathematical problem description

The item placement problem for a questionnaire is an important one, as it is related to the quality of the data obtained from the responses. The placement problem is linear when the questions are written in a sequential fashion, one after another. The placement problem can become more complicated if the questionnaire has a nonlinear structure, as in the case of some web-based questionnaires where the questions can be laid out geometrically rather than sequentially. This paper addresses the item placement problem strictly for the linear case.

Model assumptions

We present the model framework on which the optimization problem is built. The following are the assumptions used in our model.

A1 Reliability of an answer to a question is higher if one gets the same answer when the same question is asked more times.
A2 Reliability of an answer to a question is higher if one gets the same answer when the repetitions of the question are placed far from each other in the sequence of questions, up to a threshold distance.
A3 Reliability of an answer to a question is lower if the question is asked later in the sequence of questions, due to respondent fatigue.
A4 Reliability is additive, in the sense that the total reliability of a questionnaire is the sum of the reliabilities of the responses to each question (with its multiple placements).
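Assumptions A1–A4 constrain, but do not fully determine, a reliability function. Purely as an illustration, the following Python sketch scores a questionnaire under one assumed functional form (a count of consistent repetitions, a saturating bonus for spacing, and an exponential fatigue decay); every function name, parameter, and functional form here is a hypothetical choice rather than the paper's calibrated model.

```python
import math

def item_reliability(positions, answers, fatigue_rate=0.1, spacing_scale=5.0):
    """Reliability contribution of one question asked at the 1-based survey
    `positions`, with the recorded `answers` (one per position)."""
    # A1: each repetition that agrees with the first answer adds weight.
    consistent = sum(a == answers[0] for a in answers)
    # A2: larger gaps between consecutive repetitions help, with saturation.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    spacing = sum(1 - math.exp(-g / spacing_scale) for g in gaps) / len(gaps) if gaps else 0.0
    # A3: later placement lowers reliability (respondent fatigue).
    fatigue = sum(math.exp(-fatigue_rate * p) for p in positions) / len(positions)
    return consistent * (1 + spacing) * fatigue

def questionnaire_reliability(placement, responses):
    """A4: total reliability is the sum of the per-question contributions.
    `placement` maps a question id to its list of positions; `responses`
    maps a question id to the answers given at those positions."""
    return sum(item_reliability(placement[q], responses[q]) for q in placement)

# Example: q1 at positions 1 and 5, q2 at 2, 3 and 4, with consistent answers to q1.
print(questionnaire_reliability({"q1": [1, 5], "q2": [2, 3, 4]},
                                {"q1": ["T", "T"], "q2": ["T", "F", "T"]}))
```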
Based on these assumptions, various optimization problems can be framed to obtain a desired questionnaire design reliability. Similarly, given a questionnaire, we can compute its reliability using a formula derived from the model assumptions. We present the reliability function R to quantitatively represent the assumption metrics, then present the analysis problem, followed by various optimization problems. It is important to realize that in practice we do not ask the exact same question; rather, we ask a set of similar questions, and if the answers to those questions are similar, we have higher confidence in the responses from the person taking the survey.

Reliability function

The reliability function maps a list of responses to questions, together with their corresponding placement in a sequence, to the set of nonnegative real numbers $\mathbb{R}_{+}$. A few examples of
some members of the domain, with their meanings, are listed in Table 1. The symbol $q_1$ represents a question, and the natural numbers following it represent the placements of that question. $r_1(\{(q_1, 1)\})$ denotes one response to the questionnaire represented by $\{(q_1, 1)\}$, whereas $r_2(\{(q_1, 1)\})$ denotes some other response. The reliability function $R$ acts on these responses to give a nonnegative number as an outcome that we can compare for reliability. We write $r_1(q_1(1))$ for the response, within response $r_1$ of the questionnaire, to the instance of question $q_1$ placed as the first item of the survey. Similarly, we write $r_1(q_1(2))$ for the response, within $r_1$, to the instance of $q_1$ placed as the second item of the survey. Using this notation and the model assumptions from the "Model assumptions" section, we present some example relationships:

$$ r_1(q_1(1)) = r_1(q_1(2)) = r_2(q_1(1)) \;\Rightarrow\; R(r_1(\{q_1; 1, 2\})) \ge R(r_2(\{q_1; 1, 2\})) \qquad (1) $$

$$ r_1(q_1(1)) = r_1(q_1(3)) = r_2(q_1(1)) = r_2(q_1(2)) \;\Rightarrow\; R(r_1(\{q_1; 1, 3\})) > R(r_2(\{q_1; 1, 2\})) \qquad (2) $$

$$ r_1(q_1(1)) = r_2(q_1(2)) \;\Rightarrow\; R(r_1(\{q_1; 1\})) > R(r_2(\{q_1; 2\})). \qquad (3) $$
Table 1 Some domain members and their meanings

$r(\{(q_1, 1)\})$                        Question identified as $q_1$ at line 1
$r(\{(q_1, 1, 5, 7)\})$                  Question identified as $q_1$ at lines 1, 5 and 7
$r(\{(q_1, 1, 6), (q_2, 2, 4)\})$        Question identified as $q_1$ at lines 1 and 6, and $q_2$ at lines 2 and 4

Fig. 1 Likert scale (example item: "This paper is well written." with response choices 1. Strongly Disagree, 2. Disagree, 3. Neutral, 4. Agree, 5. Strongly Agree)

Reliability for placement

Reliability is a function of the responses to a given survey of questions, with specific repeated questions in a fixed placement. As a specific example, consider $r(\{(q_1, 1, 6), (q_2, 2, 4)\})$. Here the element of the domain is a set of two ordered lists. The first one, $(q_1, 1, 6)$, shows that question $q_1$ is placed at positions 1 and 6. Similarly, $(q_2, 2, 4)$ shows that question $q_2$ is placed at positions 2 and 4. The response to these four statements (two questions with two instances each) is given by the response function $r(\cdot)$ applied to $\{(q_1, 1, 6), (q_2, 2, 4)\}$. The response set for each question can be of many different types, depending on the scale. For instance, consider the Likert scale, which typically offers choices such as the ones shown in Fig. 1. Each response to such a question will be one of five choices, as only one answer should be allowed.

Optimization problem

The problem is to find the placement of the given survey questions, with given multiplicities, so as to maximize expected reliability over all possible responses. The reliability function R we have developed is a function of two aspects:

1. the question placement, and
2. the specific response.

This means that even if we have fixed the exact questions as well as their repetition quantities, changing the placement of the questions while receiving exactly the same responses yields a different reliability, based purely on where the questions were placed. Similarly, if we keep the questions fixed, with their repetitions and placements, but receive different responses, then the reliability score will again differ. The problem of choosing the most reliable placement therefore has to be solved over the set of possible responses, i.e., we have to perform an operation that removes the dependence on each individual response, so that we obtain the "best" solution irrespective of the specific responses. This can, in general, be done in multiple ways, such as the following (a small computational sketch of these objectives follows the list):

1. Best of the best: Choose the placement with the highest top-rated reliability over all responses. That is, take the highest reliability for each fixed placement over all of its responses, and then choose the placement with the highest such value. This can, of course, yield multiple solutions and, moreover, would not give the best performance for "average" responses. This strategy tries to get the best overall performance out of all responses.
2. Best of the worst: Choose the placement with the highest worst-rated reliability over all responses. That is, take the lowest reliability for each fixed placement over all of its responses, and then choose the placement with the highest of these worst-case values. This can also yield multiple solutions and, again, would not give the best performance for "average" responses. This strategy tries to minimize the impact of the worst reliability.
3. Best average: Choose the placement with the highest average reliability over all responses. That is, take the average reliability for each fixed placement over all of its responses, and then choose the placement with the highest average reliability. This can also yield multiple solutions, but it gives the best performance for "average" responses. This strategy aims to maximize average performance.
4. Best expected value: We can use different weights for different responses (for instance, based on which ones are more likely) and then choose the placement with the highest expected reliability.
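As a concrete illustration of these four objectives, the sketch below assumes a generic reliability(placement, response) scoring function and exhaustively enumerates the finite response set of each placement; the helper names and the enumeration strategy are illustrative assumptions, not the paper's implementation.

```python
from itertools import product
from statistics import mean

def all_responses(placement, choices_for):
    """Enumerate every possible response pattern for a placement, where
    `choices_for(question)` returns that question's answer choices."""
    return product(*(choices_for(q) for q in placement))

def best_placement(placements, choices_for, reliability,
                   objective="best_average", weight=None):
    """Pick a placement under one of the four objectives discussed above."""
    def score(p):
        rs = list(all_responses(p, choices_for))
        if objective == "best_of_best":
            return max(reliability(p, r) for r in rs)        # 1. max over responses
        if objective == "best_of_worst":
            return min(reliability(p, r) for r in rs)        # 2. min over responses
        if objective == "best_average":
            return mean(reliability(p, r) for r in rs)       # 3. mean over responses
        if objective == "best_expected":                     # 4. likelihood-weighted
            return sum(weight(p, r) * reliability(p, r) for r in rs)
        raise ValueError(f"unknown objective: {objective}")
    return max(placements, key=score)
```

Because the number of placements and of responses grows combinatorially (see the complexity analysis that follows), this brute-force enumeration is practical only for small instances.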
Mathematical analysis

Most of the problems in optimal survey question placement are combinatorial, and hence require combinatorial algorithms for their solution. We can perform a complexity analysis for the different optimization problems. We are given $n$ questions $q_i$, $i = 1, 2, \ldots, n$, where each question $q_i$ has multiplicity $n_i$. Even the multiplicity of each question could be obtained as the outcome of another mathematical design problem. For instance, assume that we are only given the list of questions $q_i$, $i = 1, 2, \ldots, n$, and we are also told that we need a certain minimum reliability for each question or, alternatively, we are given the relative importance of each question. Additionally, we are given the total real estate in terms of the maximum number of total questions, including multiplicity, allowed in the questionnaire. Based on these data, we can solve the problem of deciding the optimal $n_i$, $i = 1, 2, \ldots, n$.

Complexity analysis

For optimal placement, we have to be able to compute the reliability score (of a certain type, such as "Best of the best") for each of the different placements, unless we can eliminate some subset automatically via analysis or a heuristic. The total number of placements $N_{p\ell}$ is given by

$$ N_{p\ell} = \frac{\left(\sum_{i=1}^{n} n_i\right)!}{\prod_{i=1}^{n} n_i!}. \qquad (4) $$

Now, for each placement, we can compute the reliability score for each possible response. Let us assume that each question $q_i$ has $c_i$ choices, as on a Likert scale. Hence the total number of different responses $N_R$ for each fixed placement is

$$ N_R = \prod_{i=1}^{n} c_i^{n_i}. \qquad (5) $$
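Equations (4) and (5) can be evaluated directly; the short sketch below (with function names of our choosing) reproduces the counts that appear later in the numerical example.

```python
from math import factorial, prod

def num_placements(multiplicities):
    """Eq. (4): number of distinct orderings of the repeated questions."""
    return factorial(sum(multiplicities)) // prod(factorial(m) for m in multiplicities)

def num_responses(multiplicities, choices):
    """Eq. (5): number of possible response patterns for one fixed placement."""
    return prod(c ** m for m, c in zip(multiplicities, choices))

# Two questions with multiplicities 2 and 3, each with two (T/F) choices:
print(num_placements([2, 3]))         # 10 placements, cf. Eq. (13)
print(num_responses([2, 3], [2, 2]))  # 32 response patterns, cf. Eq. (14)
```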
Differential analysis

If we have a differentiable function $f: \mathbb{R} \to \mathbb{R}$, where $\mathbb{R}$ denotes the set of real numbers, we can use calculus for optimization. Specifically, if a differentiable function has an optimum at an interior point, its derivative is zero there. This is what we refer to as "differential analysis" for optimization. For instance, we have a monotonic fatigue function showing that fatigue increases with the number of questions already answered in a survey. Moreover, we have a consistency function showing that if the "distance" between repetitions of the same question increases in the placement and we still get the same response, then that response is more reliable. The balance between these two functions can provide the optimal placement of the question. Suppose the first instance of a question is placed at the first location; then the reliability contributed by the second instance is a function of its placement $x$, treated as a continuous variable. We can write

$$ \mathcal{R}(x) = F(x) + R(x), \qquad (6) $$

where $F(x)$ is the fatigue function, $R(x)$ is the distance-based reliability for this specific situation, and $\mathcal{R}(x)$ is the overall reliability for this question with two repetitions and this specific placement. We can solve this subproblem by solving

$$ \frac{d\mathcal{R}(x)}{dx} = \frac{dF(x)}{dx} + \frac{dR(x)}{dx} = 0. \qquad (7) $$

Numerical differential analysis example

As an example of using differential analysis to find the optimal placement, assume we have twenty places for questions in a survey. We have the most important question, with a repetition of two, to be placed. We make it the first question on the survey, and we now want to find where the second instance of the question should be placed. We treat the question placement as a continuous variable $x$ taking values between 0 and 1, where $x = 0$ indicates the first question place and $x = 1$ the twentieth question place. We would like to point out that, in general, more than two versions of a question can be used in a survey design; we use this specific example only to illustrate the method on a simple problem. We use the following functions to represent the monotonic relationships and illustrate the technique; obtaining exact functions would require extensive experimental studies, which are beyond the scope of this paper and will be the subject of future work. Let

$$ F(x) = e^{-2x} \qquad (8) $$

$$ R(x) = 1 - e^{-5x^2}. \qquad (9) $$

Taking their derivatives, we see that their signs do not change in the domain, which establishes their monotonicity:

$$ \frac{dF(x)}{dx} = -2 e^{-2x} < 0, \qquad \forall x \in [0, 1] \qquad (10) $$

$$ \frac{dR(x)}{dx} = 10 x e^{-5x^2} \ge 0, \qquad \forall x \in [0, 1]. \qquad (11) $$

To find the optimal placement, we take the derivative of $\mathcal{R}(x)$ and equate it to zero:

$$ \frac{d\mathcal{R}(x)}{dx} = 10 x e^{-5x^2} - 2 e^{-2x} = 0. \qquad (12) $$

This is a nonlinear equation and requires a numerical solver. One easily available option is the online Octave environment that we used; the code is shown in Fig. 2. The two functions $F(x)$ and $R(x)$, individually as well as together with their sum $\mathcal{R}(x)$, are shown in the plots of Fig. 3. We can see that the slope is horizontal at the peak reliability, obtained at $x = 0.75$. Rescaling back to the twenty-question scale, this indicates that the second instance of the question should be placed at the 15th position.

Fig. 2 Octave code used at https://octave-online.net/
Fig. 3 Optimized location: plots of $F(x)$, $R(x)$, and their sum $\mathcal{R}(x)$ (panels a–d)
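The Octave code of Fig. 2 is not reproduced in this text. The following Python sketch solves the same stationarity condition, Eq. (12), for the assumed functions (8) and (9); the bracketing interval and the rescaling rule slot = 1 + 19x are our reading of the setup described above.

```python
import numpy as np
from scipy.optimize import brentq

def fatigue(x):            # F(x), Eq. (8)
    return np.exp(-2.0 * x)

def spacing(x):            # R(x), Eq. (9)
    return 1.0 - np.exp(-5.0 * x ** 2)

def d_total(x):            # derivative of F(x) + R(x), Eq. (12)
    return 10.0 * x * np.exp(-5.0 * x ** 2) - 2.0 * np.exp(-2.0 * x)

# The derivative is positive at x = 0.5 and negative at x = 1, so the
# maximizing stationary point lies in that bracket.
x_star = brentq(d_total, 0.5, 1.0)
slot = 1 + 19 * x_star     # map x in [0, 1] back to question slots 1..20
print(round(x_star, 2), round(slot), round(fatigue(x_star) + spacing(x_star), 2))
# prints approximately 0.75, 15 and 1.16
```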
Numerical example

The contribution of this paper is optimizing the placement of survey items, using the fact that if we ask the same question multiple times and get the same reply, the reliability of that answer is higher, especially if the distance (in terms of item steps) between the two placements is larger. However, the exact placement depends on how that distance affects reliability, and also on how much "fatigue" comes into play in the reliability of the answers. From a practical point of view, simple models, such as linear ones, can be chosen to arrive at the optimal placement under the corresponding assumption. Experimental studies can then be performed to calibrate accurate functions for these relationships and their parameters; those studies are considered future topics in this line of research.

To understand some of the mechanics of optimal placement, we now take a simple example that shows the details of the process.
We keep the problem intentionally small so that the computations can be performed easily, and we also use simple functions. The example nevertheless illustrates the important steps that must be taken in a larger problem. Let us take as input the question list, with the multiplicities of each question also provided. Multiplicities refer to how many times each question is repeated (even if stated in a different way). Let us take $\{q_1, q_1, q_2, q_2, q_2\}$ as the input data. This shows we have only two questions, $q_1$ and $q_2$, to place, where $q_1$ has multiplicity 2 and $q_2$ has multiplicity 3. Let us also assume that each question has only two possible responses: true (T) or false (F). Hence, we have $n = 2$ for our problem, with $n_1 = 2$ and $n_2 = 3$. We also have $c_1 = c_2 = 2$. An example of a placement would be $(q_1, q_2, q_2, q_2, q_1)$; another would be $(q_1, q_1, q_2, q_2, q_2)$. The total number of unique placements is

$$ N_{p\ell} = \frac{\left(\sum_{i} n_i\right)!}{\prod_{i} n_i!} = \frac{(2 + 3)!}{2!\,3!} = \frac{5!}{2!\,3!} = 10. \qquad (13) $$

Hence, the total number of different responses $N_R$ for each fixed placement is

$$ N_R = \prod_{i} c_i^{n_i} = 2^2 \cdot 2^3 = 32. \qquad (14) $$

Now we need a scoring function for each response for each placement. For instance, for the placement $(q_1, q_2, q_2, q_2, q_1)$ we can get 32 different responses; one of them is (T, T, F, F, T). The reliability function would map $((q_1, q_2, q_2, q_2, q_1), (\mathrm{T}, \mathrm{T}, \mathrm{F}, \mathrm{F}, \mathrm{T}))$ to a real number. We would use any of the optimization objectives discussed in the "Optimization problem" section and then numerically (using software built for the specific optimization) pick the best solution. If we choose a reliability scoring function that is independent of the survey answers, the solution process simplifies further. In our example, let us say that the scoring function multiplies the distance between the $q_1$ placements by 2 and the distances between the $q_2$ placements by 1. The scores given to the 10 different placements are then as shown in Table 2.

Table 2 Placement and scores

Placement                         Score
$(q_1, q_1, q_2, q_2, q_2)$         4
$(q_1, q_2, q_1, q_2, q_2)$         7
$(q_1, q_2, q_2, q_1, q_2)$         9
$(q_1, q_2, q_2, q_2, q_1)$        10
$(q_2, q_1, q_1, q_2, q_2)$         6
$(q_2, q_1, q_2, q_1, q_2)$         8
$(q_2, q_1, q_2, q_2, q_1)$         9
$(q_2, q_2, q_1, q_1, q_2)$         6
$(q_2, q_2, q_1, q_2, q_1)$         7
$(q_2, q_2, q_2, q_1, q_1)$         4

It is clear that, for this example and with the choices we have made, the placement with the highest reliability is $(q_1, q_2, q_2, q_2, q_1)$.
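The scores in Table 2 can be reproduced with a short enumeration. The sketch below adopts one concrete reading of the scoring rule, namely the sum of gaps between consecutive occurrences of each question, weighted 2 for $q_1$ and 1 for $q_2$; under this reading it matches every entry of Table 2. The names are illustrative.

```python
from itertools import permutations

QUESTIONS = ("q1", "q1", "q2", "q2", "q2")
WEIGHT = {"q1": 2, "q2": 1}   # assumed weights from the scoring rule above

def score(placement):
    """Weighted sum of the gaps between consecutive occurrences of each question."""
    total = 0
    for q, w in WEIGHT.items():
        slots = [i for i, item in enumerate(placement, start=1) if item == q]
        total += w * sum(b - a for a, b in zip(slots, slots[1:]))
    return total

# The 10 distinct placements of Eq. (13), scored and ranked.
for p in sorted(set(permutations(QUESTIONS)), key=score, reverse=True):
    print(p, score(p))   # best: ('q1', 'q2', 'q2', 'q2', 'q1') with score 10
```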
Conclusions

This paper presented a mathematical study of the problem of question placement in a survey to obtain optimal reliability from questionnaire responses. This is a significant problem to study, as surveys are instruments for collecting data that require interpretation and whose reliability is of utmost importance. The paper presented a mathematical framework that can be used to solve the problem, illustrated different aspects of the problem, and offered several design choices. It also worked out certain details numerically through sample study problems. This is the first paper studying this problem in this mathematical framework, and subsequent papers will look at the various issues this paper has raised.

The main limitation of the paper is that it does not provide the exact reliability function to use. The most important immediate future research issue is to obtain the exact form of the reliability function by conducting controlled experiments that vary the locations of items and then determining the relationship between those locations and reliability. This will require multiple experiments, and the functions will likely differ across survey types. For instance, the reliability function for a customer satisfaction survey of a certain kind might differ from that of a survey targeting a different audience or topic.
References

Bradburn, Norman M., Seymour Sudman, and Brian Wansink. 2004. Asking Questions: The Definitive Guide to Questionnaire Design - for Market Research, Political Polls, and Social and Health Questionnaires. New York: Wiley.
Burchell, Brendan, and Catherine Marsh. 1992. The effect of questionnaire length on survey response. Quality and Quantity 26 (3): 233–244.
Cuber, John F., and John B. Gerberich. 1946. A note on consistency in questionnaire responses. American Sociological Review 11 (1): 13–15.
Galesic, Mirta, and Michael Bosnjak. 2009. Effects of questionnaire length on participation and indicators of response quality in a web survey. Public Opinion Quarterly 73 (2): 349–360.
Herzog, A. Regula, and Jerald G. Bachman. 1981. Effects of questionnaire length on response quality. Public Opinion Quarterly 45 (4): 549–559.
Labaw, Patricia J. 1981. Advanced Questionnaire Design. Abt Books.
Lavrakas, Paul J. 2008. Encyclopedia of Survey Research Methods. Thousand Oaks: Sage Publications.
Mallis, Melissa M., et al. 2004. Summary of the key features of seven biomathematical models of human fatigue and performance. Aviation, Space, and Environmental Medicine 75 (3): A4–A14.
McFarland, Lynn A., Ann Marie Ryan, and Aleksander Ellis. 2002. Item placement on a personality measure: Effects on faking behavior and test measurement properties. Journal of Personality Assessment 78 (2): 348–369.
Mollenkopf, William G. 1950. An experimental study of the effects on item-analysis data of changing item placement and test time limit. Psychometrika 15 (3): 291–315.
Ory, John C. 1982. Item placement and wording effects on overall ratings. Educational and Psychological Measurement 42 (3): 767–775.
Rattray, Janice, and Martyn C. Jones. 2007. Essential elements of questionnaire design and development. Journal of Clinical Nursing 16 (2): 234–243.
Santos, J. Reynaldo. 1999. Cronbach's alpha: A tool for assessing the reliability of scales. Journal of Extension 37 (2): 1–5.
Willis, Gordon B. 2004. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks: Sage Publications.