MARKET BASKET ANALYSIS
Yasemin Boztuğ/Lutz Hildebrandt*
MODELING JOINT PURCHASES WITH A MULTIVARIATE MNL APPROACH**
ABSTRACT
Our research examines the hypothesis that products chosen on a shopping trip to a supermarket indicate the preference interdependencies of consumers between different products or brands. The bundle chosen on the trip can be regarded as an indicator of a global utility function. This function implies a cross-category dependence of brand choice behavior. We hypothesize that the global utility function related to a product bundle is the result of the marketing-mix of the underlying brands. The structure of the chosen products allows us to uncover the impact of certain marketing-mix variables on product bundle buying behavior.
JEL-Classification: C31, C33, M31.
Keywords: Choice Model; Market Basket Analysis; Spatial Statistics; Theorem of Besag.
1 INTRODUCTION
Marketing researchers often focus on brand choice models. Guadagni and Little (1983), who introduced the multinomial logit model (MNL) in marketing, can be considered the starting point of modeling brand choice in marketing. A discrete choice model seems to be the most appropriate approach to describe the choice behavior, but most of the existing applications of this model type are only suitable to explain the choice of a single brand in one product category. The MNL and its sophisticated developments are all based on the implicit assumption that purchases in one particular category are independent of purchases in other categories. Usually, however, when a consumer buys more than one product, s/he chooses a bundle of products on a single shopping trip. The combination
* Yasemin Boztuğ, Associate Professor, MAPP, Department of Marketing and Statistics, Aarhus School of Business, University of Aarhus, Haslegaardsvej 10, DK-8210 Aarhus V, Denmark, e-mail:
[email protected]; Lutz Hildebrandt, Professor, Institute of Marketing, Humboldt University Berlin, Spandauer Str. 1, D-10178 Berlin, Germany, e-mail:
[email protected]. ** Financial support by the German Research Foundation (DFG) through the research project #B0 1952/1, and through the Sonderforschungsbereich 649 is gratefully acknowledged. The authors thank two anonymous referees for valuable comments.
sbr 60 October 2008 400-422
of these products might indicate preference relations or interdependencies in the production function of the household. An increasing number of researchers call for a comprehensive consideration of cross-category purchase behavior, from both an empirical and a theoretical point of view (e.g., Shocker et al. (2004)). Retailers are interested in multicategory models because such models can help maximize store profits by coordinating marketing-mix actions across categories¹. The relevance of joint purchases for optimizing prices is therefore an old research question (e.g., Böcker (1975); Müller-Hagedorn (1978); Hruschka (1991)). The analysis of market baskets can also be viewed as a “consumer orientation” (e.g., Julander (1992)), which means that it captures the shopping behavior of consumers by analyzing their purchases in detail². There is much discussion about which model type is most appropriate to describe multicategory choice behavior (e.g., Seetharaman et al. (2005)). In this paper we discuss some features of these models and show that a model based on the MNL type is most suitable. We then use an empirical approach to provide evidence that the MNL model gives reliable estimates and meaningful results for the manager. The model we propose overcomes the limitation of measuring category dependencies only from bivariate joint-purchase observations (e.g., Hruschka (1991)). Furthermore, we include additional variables in the model. Standard choice models do not yield unbiased parameter estimates, because they assume that a purchase decision in one category is uncorrelated with those in other categories. Models based on spatial statistics provide an appropriate specification for dependent observations such as bundled purchases. Only recently have spatial models been used in marketing. They can capture several effects, e.g., spatially correlated errors or spatial lags³.
One aim of our study is to compare our results with those of the Russell and Petersen (2000) MNL model. We do so by using the same model type and analyzing the same product class from Germany. We concentrate on a limited number of products that a consumer buys during the same shopping trip where the products are physically related. Second, we expand the approach by using additional explanatory variables and apply it to some other data sets to develop a broader application and to ensure that our method gives conclusive results when it is applied in a wider context. The paper is structured as follows. In Section 2, we describe market-basket models in general and their representation using models belonging to spatial statistics. We present the data description and estimation results in Section 3. Section 4 concludes.
1 See, e.g., Mulhern and Leone (1991) and Walters (1991). Walters and Jamil (2002) conducted a broad study to account for consumer purchasing of cross-category specials. They find significant evidence that consumers purchase promoted items and also have non-promoted products in their basket.
2 It follows Rathneswar et al. (1999), who advocate incorporating the individual consumer in the theory of market behavior.
3 See, e.g., Bradlow et al. (2005) for an overview of the usage of spatial statistics in marketing.
Y. BOZTUĞ/L. HILDEBRANDT
2 MARKET-BASKET MODELS
Two types of models describing multiple purchases can be distinguished: market-basket approaches and market structure analysis. The market-basket approach rests on relations in purchase behavior across different categories. Consumers have a variety of ways in which to choose products across different categories. Market baskets are treated as a “pick-any” choice (e.g., Levine (1979)), which means that the consumer chooses no item, one item, or several items on his/her shopping trip. Consumers are influenced not only by the own-category marketing mix, but also by marketing efforts in other categories⁴. More generally speaking, managers can use cross-category studies to design optimal prices simultaneously across product categories, and to maximize store profits (e.g., Chib et al. (2002)). It is also possible to identify categories that generate store traffic. The relation between the categories allows the retailer to design direct mailing programs, such as cross promotions (e.g., Manchanda et al. (1999)). Market structure analysis (e.g., Elrod et al. (2002)) can be used to explain substitutional or complementary relations in market offerings. In market structure analysis no difference is made between inner or outer category comparisons. The limitation of most approaches in this area is their assumption that the observed market consists of only one product category. Only recently have researchers started to investigate multi-item purchases across categories. These analyses examine different types of dependence between the purchased products, e.g., cross-category consideration, cross-category learning, and product bundling (e.g., Russell et al. (1999)). It is assumed that a purchase in one category affects the choice in other categories (Manchanda et al. (1999)). Russell et al. (1997) give an overview of several definitions and assumptions for market baskets.
Market-basket models follow two research avenues (e.g., Boztug and Silberhorn (2006)). One research direction is oriented towards data mining, where the researcher inspects the interdependent structure and the focus is mainly on how the categories relate to one another. Such models reduce the large amount of data arising from product bundling and are rarely used for forecasting. For a classification of different model types, see Figure 1. Here, we cite only a few examples of research. In the beginning, researchers used only pairwise analysis (e.g., Böcker (1978); Hruschka (1985); Julander (1992); Schnedlitz and Kleinberg (1994)) or cluster analysis (e.g., Schnedlitz et al. (2001)) to describe category associations. More recently, researchers have applied more complex methods, such as association rules (e.g., Decker and Schimmelpfennig (2002)) or neural networks (e.g., Decker and Monien (2003); Mild and Reutterer (2001; 2003)), to market-basket analysis. The second research direction is oriented to explanatory models. It focuses on the quantification of complementary and substitutional effects between categories. These effects are driven by marketing-mix variables. The approaches can be distinguished regarding
4 For a substantive discussion about the importance of other categories, see Shocker et al. (2004).
the dependent variable⁵ (see Figure 2). Again, we cite only a subset of the available studies. Most of these models focus on describing the purchase incidence outcome (e.g., Manchanda et al. (1999); Russell and Petersen (2000); Chib et al. (2002)). Other studies try to model the brand choice (e.g., Russell and Kamakura (1997); Ainslie and Rossi (1998)), store choice (e.g., Bodapati and Srinivasan (2001)) or even several outcomes simultaneously (e.g., Chib et al. (2004); Deepak et al. (2007); Song and Chintagunta (2007); Niraj et al. (2008))⁶. Nearly all models are based on the multivariate probit approach (e.g., Chib and Greenberg (1998)) or the multivariate logit approach (e.g., Russell and Petersen (2000)).

Figure 1: Classification of different market-basket model types with data mining techniques
Market basket models based on data mining techniques:
Pairwise associations (Böcker (1978); Hruschka (1985); Dickinson et al. (1992); Julander (1992))
Neural networks (Decker and Monien (2003); Mild and Reutterer (2001; 2003))
Cluster analysis (Schnedlitz et al. (2001))
Association rules (Agrawal and Srikant (1994); Decker and Schimmelpfennig (2002))

Figure 2: Classification of different explanatory market-basket models
Explanatory market basket models:
Purchase incidence (Manchanda et al. (1999); Russell and Petersen (2000); Chib et al. (2002))
Brand choice (Russell and Kamakura (1997); Ainslie and Rossi (1998))
Store choice (Bodapati and Srinivasan (2001))
Multiple outcomes (Chib et al. (2004); Deepak et al. (2007); Song and Chintagunta (2007))

5 They can also be viewed as extensions on market baskets of Gupta’s famous question (1988): “[…] When, What and How Much to Buy”.
6 For a more detailed description of the model types, see Seetharaman et al. (2005).
The multivariate logit and probit approaches are closely related. Both allow the modeling of a relation between categories, which can be complementary or substitutive. With a multivariate probit model, the researcher often uses a hierarchical Bayes framework estimated with Markov chain Monte Carlo (MCMC) methods (e.g., Ainslie and Rossi (1998); Manchanda et al. (1999); Seetharaman et al. (1999); Chib et al. (2002)). The likelihood of a multivariate probit model has no closed-form solution, so the computational effort for the model estimation is high. Also, a separate estimation for the no-purchase outcomes is not possible⁷. The multivariate logit approach is used by Russell and Petersen (2000) and others⁸. Its likelihood has a closed-form solution and needs less computational effort. The main advantage of using the logit approach is its closeness to the well-known and established MNL model (e.g., Guadagni and Little (1983)), which describes a single category choice.

2.1 GENERAL ASSUMPTIONS OF THE MULTIVARIATE MNL MODEL
In our model, we consider only product bundling without incorporating learning effects, that is, the assumption that the choice in one category is influenced by, e.g., usage or marketing activity of products in other categories (Russell et al. (1999)). The learning effects are induced by earlier choices. When inspecting the market basket of consumers, we need to model the choice behavior for one product conditional on the decisions made for the other categories or products. Consider a basket that contains three products {A, B, C}. The products can be placed in the basket in several different orders. Taking only the last choice into account, three possible last choices exist: Pr(A | B, C), Pr(B | A, C), or Pr(C | A, B). These probabilities are full conditional distributions that we combine to generate a joint distribution Pr(A, B, C). This joint distribution describes the market-basket model. With scanner panel data, we have no information about the order in which products join the bundle of a specific consumer. We therefore have to make assumptions to model a process that we cannot observe. In addition, we assume dependence between the choices in the categories. To obtain unbiased estimates, dependent observations must be modeled with methods based on spatial statistics. Spatial modeling gives us the opportunity to account for conditional observations without having any information about the concrete purchase sequence, and to capture the dependency structure. With the Theorem of Besag (1974), we can verify the consistency of the full conditional distributions and mathematically derive the form of the joint distribution.

7 For further details, see, e.g., Seetharaman et al. (2005).
8 The multivariate logistic approach is also used by Hruschka (1991) and Hruschka et al. (1999). Hruschka (1991) uses a probabilistic approach which measures the relationship between many categories, but includes no additional explanatory marketing-mix variables. The model introduced by Hruschka et al. (1999) is an extension of Hruschka (1991), in which they also include cross-category sales promotion effects in their multivariate logit model. The main difference between Hruschka and the approach of Russell and Petersen (2000) is that the final market-basket model of Russell and Petersen is developed from single category purchase models using the Theorem of Besag (Cressie (1991)), which has its roots in spatial statistics. Based on this theorem, we get a model in which the dependency of the categories is included explicitly via a parameter. In Seetharaman et al. (2005), and in other newer empirical studies, the multivariate logit approach is now used more often than the multivariate probit approach.

2.2 APPLICATION TO MARKET BASKET MODELS
We use the autologistic process from equation (A8) to estimate market-basket data with spatial models. We provide the detailed derivation of the market-basket model based on techniques from spatial statistics in the appendix. The model’s functional form must be specified. To construct a model that is close to standard approaches of choice decisions, we use a utility function that includes marketing-mix parameters and household specific variables. Here, we adopt the function used by Russell and Petersen (2000), but extend their model by using two additional variables to predict the utility function $U(i,k,t)$ for category $i$ and for consumer $k$ at time $t$. Formally, the model has the following structure:

$$U(i,k,t) = \beta_i + \mathrm{HH}_{ikt} + \mathrm{MIX}_{ikt} + \sum_{j \neq i} \theta_{ijk}\, C(j,k,t) + \varepsilon_{ikt} = V(i,k,t) + \varepsilon_{ikt} \tag{1}$$
with $\mathrm{HH}_{ikt}$ the household specific effects and $\mathrm{MIX}_{ikt}$ the marketing-mix effects for category $i$, consumer $k$, and time $t$. The marketing-mix effects are price, display, and promotion. $\beta_i$ is a category-specific intercept and corresponds to the spatial trend. $\theta_{ijk}$ is the cross-category parameter (see the appendix for a detailed derivation); it describes the spatial dependence. $\varepsilon_{ikt}$ is the stochastic error term, which we assume to be extreme value distributed, as in a standard multinomial logit model. The utility in equation (1) is close to the standard multinomial logit formulation used to describe a brand choice utility in a single category. Only the cross-category term is added to the common approach. $C(j,k,t)$ is a binary variable that equals one if consumer $k$ purchases category $j$ at time $t$, and zero otherwise. $V(i,k,t)$ describes the deterministic utility function for consumer $k$ at time $t$ for product category $i$. The household specific variable $\mathrm{HH}_{ikt}$ includes a time variable and a measure of loyalty for category $i$ in the specification below

$$\mathrm{HH}_{ikt} = \delta_{1i} \ln[\mathrm{TIME}_{ikt} + 1] + \delta_{2i}\, \mathrm{LOYAL}_{ik} \tag{2}$$

with $\mathrm{TIME}_{ikt}$ the time in weeks since the consumer's last purchase in the category. We use $\ln(\mathrm{TIME}_{ikt} + 1)$ to capture nonlinearities. We define $\mathrm{LOYAL}_{ik}$ as

$$\mathrm{LOYAL}_{ik} = \ln\left[\frac{n(i,k) + 0.5}{n(k) + 1}\right],$$

where $n(k)$ counts the purchases of a consumer in the initial period, and $n(i,k)$ is the number of purchases in category $i$ during the initial period. $\mathrm{LOYAL}_{ik}$ is a measure of a consumer's loyalty to one category. The loyalty variable is measured in the initialization period and does not change over time. For the household variable, we
estimate two parameters for each category ($\delta_{1i}$ and $\delta_{2i}$). We expect these parameters to be positive, because the higher the loyalty to a specific category, the higher the purchase probability should be. The same argument holds for the time variable: the more time that has passed since the last purchase, the higher the probability that the consumer will purchase in that category again. As explained above, the marketing-mix variable $\mathrm{MIX}_{ikt}$ captures price, display, and promotion components:

$$\mathrm{MIX}_{ikt} = \gamma_i \ln(\mathrm{PRICE}_{ikt}) + \xi_i\, \mathrm{DISPLAY}_{ikt} + \psi_i\, \mathrm{PROMOTION}_{ikt}. \tag{3}$$
$\mathrm{PRICE}_{ikt}$ is measured by a price index for the category. We use $\ln(\mathrm{PRICE}_{ikt})$ to capture nonlinearities. We calculate the index as the mean of prices of all purchased items in a specific category in a week⁹. $\mathrm{DISPLAY}_{ikt}$ is the mean number of available displays per category calculated for each week. $\mathrm{PROMOTION}_{ikt}$ describes promotional activities per week calculated in the same manner as display. We expect the price parameter to be negative, and the parameters for display and promotion to be positive. The assumptions are similar to those made in a common brand-choice model setting. The cross-category parameter $\theta_{ijk}$ is

$$\theta_{ijk} = \kappa_{ij} + \eta\, \mathrm{SIZE}_k, \tag{4}$$
where $\mathrm{SIZE}_k$ is the mean basket size for consumer $k$ in the initial period. We assume that $\theta_{ijk}$ is symmetric, so $\kappa_{ij}$ must be constrained to be symmetric. We expect $\eta$ to be positive, because a larger basket size implies stronger cross-category relations, as more categories are purchased at the same time. We estimate the probability of choosing (buying) a product of one category ($C(i,k,t) = 1$) conditional on other category purchases ($C(j,k,t)$) as

$$\Pr\bigl(C(i,k,t) = 1 \mid C(j,k,t) \text{ for } j \neq i\bigr) = \frac{1}{1 + \exp\{-V(i,k,t)\}}. \tag{5}$$
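The single-category building blocks in equations (1) to (5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' estimation code: the function names and all numeric inputs are hypothetical, and the intercept $\beta_i$ is read as an additive term, as in equation (7).

```python
import math


def household_term(delta1, delta2, weeks_since_purchase, n_cat, n_total):
    """Equation (2): delta1 * ln(TIME + 1) + delta2 * LOYAL."""
    # LOYAL = ln[(n(i,k) + 0.5) / (n(k) + 1)], computed from the initial period.
    loyal = math.log((n_cat + 0.5) / (n_total + 1.0))
    return delta1 * math.log(weeks_since_purchase + 1.0) + delta2 * loyal


def mix_term(gamma, xi, psi, price, display, promotion):
    """Equation (3): gamma * ln(PRICE) + xi * DISPLAY + psi * PROMOTION."""
    return gamma * math.log(price) + xi * display + psi * promotion


def conditional_purchase_prob(i, beta, hh, mix, kappa, eta, size, purchased):
    """Equation (5): Pr(C_i = 1 | purchases in the other categories)."""
    v = beta[i] + hh[i] + mix[i]  # intercept read as additive (assumption)
    for j, c_j in enumerate(purchased):
        if j != i:
            theta_ij = kappa[i][j] + eta * size  # equation (4)
            v += theta_ij * c_j
    return 1.0 / (1.0 + math.exp(-v))


# Illustrative values only (not estimates from the paper): two categories.
beta = [0.3, -0.2]
hh = [household_term(0.5, 1.0, 3, 2, 10), household_term(0.4, 0.8, 1, 5, 10)]
mix = [mix_term(-0.7, 0.2, 0.0, 1.10, 1.0, 0.0),
       mix_term(-0.5, 0.0, 0.0, 0.95, 0.0, 0.0)]
kappa = [[0.0, -0.6], [-0.6, 0.0]]  # symmetric cross effects

alone = conditional_purchase_prob(0, beta, hh, mix, kappa, 0.1, 1.5, [0, 0])
joint = conditional_purchase_prob(0, beta, hh, mix, kappa, 0.1, 1.5, [0, 1])
print(alone, joint)
```

With a negative net cross effect, as in this made-up example, the purchase probability in category 0 is lower when category 1 is already in the basket, which is the substitution pattern that the cross-category term is meant to capture.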
After developing the single category model, we extend this model to the final market-basket model. Looking at four product categories, we describe the market basket of a consumer $k$ at time $t$ by a quadruple $B(k,t) = \{C(1,k,t), \ldots, C(4,k,t)\}$ representing our four product categories, with $C(i,k,t) = 1$ if consumer $k$ purchases in category $i$ at time $t$. This kind of choice representation induces $2^4 = 16$ different baskets
9 The generation of the price index is slightly different from the approach of Russell and Petersen (2000). Since our database has no information about items not purchased, we build the index only based on purchased items. The difference between our study and that of Russell and Petersen is small, because they weight the prices of all items by the purchased quantities in the initial period, so products that are purchased infrequently have only a small influence on the index.
for possible purchases in four categories. We exclude the Null basket (no choice in any of the four categories) from our analysis. Using Besag’s (1974) theorem, the utility function from equation (1), and the binary description of a choice for a category, we obtain the probability of choosing a specific basket $b$ from the choice set

$$\Pr(B(k,t) = b) = \frac{\exp\{\mu(b,k,t)\}}{\sum_{b^*} \exp\{\mu(b^*,k,t)\}} \tag{6}$$

with

$$\mu(b,k,t) = \sum_i \beta_i X(i,b) + \sum_i \mathrm{HH}_{ikt} X(i,b) + \sum_i \mathrm{MIX}_{ikt} X(i,b) + \sum_{j \neq i} \theta_{ijk} X(i,b) X(j,b). \tag{7}$$
$b^*$ ranges over all possible baskets (without the Null basket). $X(i,b)$ is a dummy variable, which takes the value of one if category $i$ is included in basket $b$, and zero otherwise. Equation (7) extends equation (1) to a basket-wise examination.

3 DATA DESCRIPTION AND ESTIMATION RESULTS
For our analysis, we use the GfK ConsumerScan Household panel data set¹⁰ with several categories. To ensure comparability, we first analyze the same categories as in Russell and Petersen (2000). In addition, we inspect bundles of other categories. In this section we therefore compare the estimation results of Russell and Petersen, who apply the market-basket model to a Canadian data set, with our results¹¹. We want to see if we can detect similar relations between the paper-products categories for the German data set as were observed in the Canadian one. Furthermore, to gain a better understanding of the modeling, we include two additional explanatory variables (display and promotion), and therefore analyze several extensions of the model of Russell and Petersen. We find that the DISPLAY variable gives the most additional explanation in the model. Additionally, we expand the application to three other closely related product groups. We extend our examination to other product combinations because consumers usually buy category bundles not only for paper products during their shopping trips. We want to compare the estimation results for the other category ensembles.
10 The data we use for this analysis are part of a subsample of the 1995 GfK ConsumerScan Household panel data and were made accessible by ZUMA. The ZUMA data set includes all households having continuously reported product purchases during the entire year of 1995. For a description of this data set cf. Papastefanou (2001).
11 Unfortunately, we did not have the same data set Russell and Petersen used in their study.
3.1 DESCRIPTION OF THE DATA SET
From the available data sets, we take one bundle that includes paper products such as paper towels, toilet tissue, facial tissue, and paper napkins. Doing so allows us to make a direct comparison between our results and those of Russell and Petersen (2000). The second bundle consists of breakfast beverages, e.g., coffee, instant coffee, tea, canned milk, and filter paper. Several types of detergent and softener comprise a third bundle, and our fourth bundle includes beverages such as beer, wine, and non-alcoholic drinks. The observations in the data are from a one-year period. We use the first 15 weeks as the initialization period, from which we compute the LOYAL variable. For calibration we use 23 weeks, and the holdout data set includes the remaining 14 weeks of the sample. For the different bundles, approximately 4,000 households purchased between 12,000 and 50,000 baskets. Modeling four product categories results in 15 different possible market baskets, and for five categories, we get 31 baskets. We number the baskets for the four-category case of paper categories as shown in Table 1. We also present the percentage of purchased baskets as observed in the calibration data set and as estimated under the assumption of no interaction between the categories.

Table 1: Market-basket explanation and distribution for paper categories

| Basket number | Paper towels | Facial tissue | Toilet tissue | Paper napkins | % in observations | % estimated for independence |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 13.7 | 15.1 |
| 2 | 0 | 1 | 0 | 0 | 2.3 | 1.8 |
| 3 | 0 | 0 | 1 | 0 | 63.1 | 47.3 |
| 4 | 0 | 0 | 0 | 1 | 5.9 | 4.0 |
| 5 | 1 | 1 | 0 | 0 | 0.3 | 0.4 |
| 6 | 1 | 0 | 1 | 0 | 10.2 | 9.5 |
| 7 | 1 | 0 | 0 | 1 | 0.6 | 0.8 |
| 8 | 0 | 1 | 1 | 0 | 1.1 | 1.2 |
| 9 | 0 | 1 | 0 | 1 | 0.1 | 0.1 |
| 10 | 0 | 0 | 1 | 1 | 2.1 | 2.6 |
| 11 | 1 | 1 | 1 | 0 | 0.2 | 0.2 |
| 12 | 1 | 1 | 0 | 1 | 0.0 | 0.0 |
| 13 | 1 | 0 | 1 | 1 | 0.5 | 0.5 |
| 14 | 0 | 1 | 1 | 1 | 0.0 | 0.1 |
| 15 | 1 | 1 | 1 | 1 | 0.0 | 0.0 |
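The basket enumeration behind Table 1 and the basket probability of equations (6) and (7) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names and parameter values are hypothetical, the own effects $\beta_i + \mathrm{HH}_{ikt} + \mathrm{MIX}_{ikt}$ are collapsed into one number per category, and each unordered category pair is counted once in the cross-effect sum so that the implied conditional probabilities agree with equation (5). The enumeration order produced here does not follow the basket numbering of Table 1.

```python
import itertools
import math


def enumerate_baskets(n_categories):
    """All non-empty 0/1 purchase vectors (the Null basket is excluded)."""
    return [b for b in itertools.product((0, 1), repeat=n_categories) if any(b)]


def basket_utility(b, v_own, theta):
    """Equation (7) with own effects collapsed into v_own[i].

    Each unordered pair (i, j) is counted once; this reading of the
    cross-effect sum is an assumption of this sketch.
    """
    n = len(b)
    mu = sum(v_own[i] * b[i] for i in range(n))
    mu += sum(theta[i][j] * b[i] * b[j]
              for i in range(n) for j in range(i + 1, n))
    return mu


def basket_probabilities(v_own, theta):
    """Equation (6): multinomial logit over all non-empty baskets."""
    baskets = enumerate_baskets(len(v_own))
    weights = [math.exp(basket_utility(b, v_own, theta)) for b in baskets]
    total = sum(weights)
    return {b: w / total for b, w in zip(baskets, weights)}


# Illustrative values only (not estimates from the paper).
v_own = [0.2, -0.5, 1.0, -1.0]
theta = [[0.0, -0.4, -0.3, -0.2],
         [-0.4, 0.0, -0.5, -0.1],
         [-0.3, -0.5, 0.0, -0.6],
         [-0.2, -0.1, -0.6, 0.0]]
probs = basket_probabilities(v_own, theta)

# Consistency check: the joint model reproduces the conditional logit of
# equation (5) for category 0, given that only category 2 is purchased.
with_cat = (1, 0, 1, 0)
without_cat = (0, 0, 1, 0)
conditional = probs[with_cat] / (probs[with_cat] + probs[without_cat])
v_0 = v_own[0] + theta[0][2]
print(len(probs), conditional, 1.0 / (1.0 + math.exp(-v_0)))
```

The last lines illustrate the role of Besag's theorem in this setting: conditioning the joint basket distribution on the purchases in the other categories returns the binary logit form of the single-category model.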
A single purchase of toilet tissue is the basket chosen most often, representing more than 60% of all purchases. The number of baskets with toilet tissue and paper towels is about twice as large as the number of single baskets that include paper napkins and four times as large as the number for
facial tissues. The full basket and the triple baskets without paper towels or without toilet tissue are only rarely represented in the calibration data set, and the baskets with three categories play only a very small role in all purchases. Our baskets are less evenly distributed than those in Russell and Petersen (2000): in their data set, the shares of the double and triple combinations are more equal, whereas our distribution is more skewed. Due to this difference, we may also see different estimation results. For several baskets, the share estimated under independence between the categories is significantly lower than the observed share (e.g., for toilet tissue). For all other bundles, the number of observed baskets also differs from the number of baskets estimated under independence. The detailed results can be obtained from the authors on request. We estimate four different model types. For all models, equation (6) gives the market-basket probability. The specification of equation (7) differs across models in the terms that are included or excluded, as shown below. First, we have a model without any cross-effect parameters (M1). This model can be seen as four (five) single models, one for each category, in which we include only the direct effects $\mathrm{PRICE}_{ikt}$, $\mathrm{TIME}_{ikt}$, $\mathrm{LOYAL}_{ik}$, and an intercept.

$$\begin{aligned}
\mu(b,k,t) ={} & \sum_i \beta_i X(i,b) \\
& + \sum_i \bigl(\delta_{1i} \ln[\mathrm{TIME}_{ikt} + 1] + \delta_{2i}\, \mathrm{LOYAL}_{ik}\bigr) X(i,b) \\
& + \sum_i \gamma_i \ln(\mathrm{PRICE}_{ikt})\, X(i,b)
\end{aligned} \tag{M1}$$
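The first step in considering cross-effects is to include only the parameters $\kappa_{ij}$, while ignoring the $\mathrm{SIZE}_k$ effect (M2).

$$\begin{aligned}
\mu(b,k,t) ={} & \sum_i \beta_i X(i,b) \\
& + \sum_i \bigl(\delta_{1i} \ln[\mathrm{TIME}_{ikt} + 1] + \delta_{2i}\, \mathrm{LOYAL}_{ik}\bigr) X(i,b) \\
& + \sum_i \gamma_i \ln(\mathrm{PRICE}_{ikt})\, X(i,b) \\
& + \sum_{j \neq i} \kappa_{ij} X(i,b) X(j,b)
\end{aligned} \tag{M2}$$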
The full cross-effects model contains all variables mentioned previously, including the SIZEk-effect (M3).
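$$\begin{aligned}
\mu(b,k,t) ={} & \sum_i \beta_i X(i,b) \\
& + \sum_i \bigl(\delta_{1i} \ln[\mathrm{TIME}_{ikt} + 1] + \delta_{2i}\, \mathrm{LOYAL}_{ik}\bigr) X(i,b) \\
& + \sum_i \gamma_i \ln(\mathrm{PRICE}_{ikt})\, X(i,b) \\
& + \sum_{j \neq i} \bigl(\kappa_{ij} + \eta\, \mathrm{SIZE}_k\bigr) X(i,b) X(j,b)
\end{aligned} \tag{M3}$$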
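Going beyond the Russell and Petersen (2000) approach, we add the explanatory variables $\mathrm{DISPLAY}_{ikt}$ and $\mathrm{PROMOTION}_{ikt}$ to model M3, which yields the full model M4.

$$\begin{aligned}
\mu(b,k,t) ={} & \sum_i \beta_i X(i,b) \\
& + \sum_i \bigl(\delta_{1i} \ln[\mathrm{TIME}_{ikt} + 1] + \delta_{2i}\, \mathrm{LOYAL}_{ik}\bigr) X(i,b) \\
& + \sum_i \bigl(\gamma_i \ln(\mathrm{PRICE}_{ikt}) + \xi_i\, \mathrm{DISPLAY}_{ikt} + \psi_i\, \mathrm{PROMOTION}_{ikt}\bigr) X(i,b) \\
& + \sum_{j \neq i} \bigl(\kappa_{ij} + \eta\, \mathrm{SIZE}_k\bigr) X(i,b) X(j,b)
\end{aligned} \tag{M4}$$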
Table 2 shows that for all bundles, the full model M4¹² has a significantly better log-likelihood and better AIC values than the models with fewer parameters. These results for our models M1 to M3 are comparable to the analysis of Russell and Petersen (2000). Additionally, we can show that adding the variable $\mathrm{DISPLAY}_{ikt}$ leads to even better AIC values than ignoring it. Ignoring cross-effects results in a worse fit.

Table 2: Comparison of the fit parameters for the different model types with and without cross-effects for all bundles

| Model | Paper products LL | Paper products AIC | Breakfast beverages LL | Breakfast beverages AIC | Detergents LL | Detergents AIC | Beverages LL | Beverages AIC |
|---|---|---|---|---|---|---|---|---|
| M1 | –20,468.6 | 40,969.3 | –74,529.6 | 149,099.2 | –23,063.1 | 46,166.3 | –103,091.4 | 206,222.7 |
| M2 | –12,808.0 | 25,660.1 | –62,269.0 | 124,598.0 | –14,005.5 | 28,071.0 | –97,666.2 | 195,392.5 |
| M3 | –12,752.1 | 25,550.2 | –61,470.1 | 123,002.1 | –13,945.3 | 27,952.7 | –95,684.1 | 191,430.2 |
| M4 | –12,741.8 | 25,537.5 | –61,458.1 | 122,988.2 | –13,937.5 | 27,947.0 | –95,665.5 | 191,403.1 |
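The fit measures in Table 2 can be cross-checked with a short Python sketch. It assumes the standard definition AIC = –2 LL + 2k, which the text does not state explicitly, and uses only the paper-products columns of Table 2; the function names are hypothetical.

```python
# Log-likelihood and AIC for the paper-products bundle, copied from Table 2.
table2_paper = {
    "M1": (-20468.6, 40969.3),
    "M2": (-12808.0, 25660.1),
    "M3": (-12752.1, 25550.2),
    "M4": (-12741.8, 25537.5),
}


def implied_parameter_count(log_lik, aic):
    """Invert AIC = -2 * LL + 2 * k (assumed definition) to recover k."""
    return (aic + 2.0 * log_lik) / 2.0


def likelihood_ratio(log_lik_restricted, log_lik_full):
    """Likelihood-ratio statistic 2 * (LL_full - LL_restricted) for nested models."""
    return 2.0 * (log_lik_full - log_lik_restricted)


implied_k = {m: round(implied_parameter_count(ll, aic))
             for m, (ll, aic) in table2_paper.items()}
lr_m1_m2 = likelihood_ratio(table2_paper["M1"][0], table2_paper["M2"][0])
lr_m3_m4 = likelihood_ratio(table2_paper["M3"][0], table2_paper["M4"][0])
print(implied_k, lr_m1_m2, lr_m3_m4)
```

Under this assumption the rounded parameter counts come out as 16, 22, 23, and 27, which agrees with four parameters per category in M1, six pairwise cross effects added in M2, one size parameter added in M3, and four display parameters added in M4. The likelihood-ratio statistic for adding the cross effects (M1 to M2) is far larger than the one for adding display (M3 to M4), which mirrors the ranking of improvements described in the text.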
Thus, we reach the same conclusion as Russell and Petersen (2000): including the cross-effects, and therefore assuming dependence between the categories, represents the data sets much better than assuming independence.

12 For the following descriptions and tables we drop the $\mathrm{PROMOTION}_{ikt}$ variable in the parameter estimations, because it does not deliver any model improvement for several fit indexes, and the estimated parameter values are not significant. Therefore, only $\mathrm{DISPLAY}_{ikt}$ enters model M4.
To ensure a fair comparison between our results and those of Russell and Petersen (2000), we make a direct comparison between their final model and our model (M3) in Table 3. Significance at the 5% level is marked with **, and at the 10% level with *.

Table 3: Parameter estimation from Russell and Petersen (2000) compared to our results of model M3 for the paper categories bundle

| Variable | Paper towels | Toilet tissue | Facial tissue | Paper napkins |
|---|---|---|---|---|
| Base-level parameters (Russell and Petersen (2000)) | | | | |
| Intercept | 0.48* | 0.31* | –0.12* | 0.25 |
| Loyalty | 1.83** | 1.67** | 1.54** | 1.85** |
| Time | 0.31** | 0.18** | 0.17** | 0.88** |
| Price | –0.73** | –0.55** | –0.96** | –1.24* |
| Cross-effect parameters (Russell and Petersen (2000)) | | | | |
| Basket size loyalty | 0.64** | 0.64** | 0.64** | 0.64** |
| Paper towels cross effects | – | –0.66** | –0.86** | –1.17** |
| Toilet tissue cross effect | –0.66** | – | –0.76** | –0.72** |
| Facial tissue cross effects | –0.86** | –0.76** | – | –1.37** |
| Paper napkins cross effects | –1.17** | –0.72** | –1.37** | – |
| Base-level parameters (our results) | | | | |
| Intercept | 18.94** | 3.29* | 0.76** | –1.71** |
| Loyalty | 0.36** | 0.67** | 0.76** | 0.06 |
| Time | 0.88** | 0.70** | 1.10** | 0.93** |
| Price | –3.96** | –0.11 | –0.42 | 1.20** |
| Cross-effect parameters (our results) | | | | |
| Basket size loyalty | 0.68** | 0.68** | 0.68** | 0.68** |
| Paper towels cross effects | – | –4.52** | –4.18** | –4.35** |
| Toilet tissue cross effect | –4.52** | – | –4.71** | –4.99** |
| Facial tissue cross effects | –4.18** | –4.71** | – | –4.42** |
| Paper napkins cross effects | –4.35** | –4.99** | –4.42** | – |
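To see how the size parameter and the pairwise parameters in Table 3 combine, equation (4) can be evaluated for one category pair. The Python sketch below is an illustration only: it plugs in the average basket sizes of 1.54 (Canadian data) and 1.17 (German data) reported in the discussion of Table 3 as stand-ins for the consumer-specific $\mathrm{SIZE}_k$, which is an assumption made here for illustration, and the function name is hypothetical.

```python
def cross_effect(kappa_ij, eta, size):
    """Equation (4): theta_ijk = kappa_ij + eta * SIZE_k."""
    return kappa_ij + eta * size


# Paper towels / toilet tissue pair, parameter values copied from Table 3.
# SIZE is set to the reported average basket size (assumption).
theta_canada = cross_effect(-0.66, 0.64, 1.54)
theta_germany = cross_effect(-4.52, 0.68, 1.17)
print(theta_canada, theta_germany)
```

For a consumer with an average basket size, the net cross effect for this pair is slightly positive with the Canadian estimates and clearly negative with the German estimates, because the much larger negative pairwise parameter in the German data is not offset by the size term.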
We find similar estimates, especially for the signs, in both Russell and Petersen (2000) and our data set. When we examine the direct effects, i.e., loyalty, time, and price, we see that in absolute values, the loyalty values for the Canadian data are two to three times as large as the German values. This effect could be due to a larger average basket size (1.54) for Canadian consumers compared to 1.17 for the Germans. The time effects from Russell and Petersen are slightly smaller than ours, which can be explained by a slightly higher purchase frequency of the Canadians. Our price parameters are all negative (except for
paper napkins), as they are in Russell and Petersen. Unfortunately, only our paper towels parameter is significantly negative. The absolute values are difficult to compare due to possibly different price levels. When we look at the cross-effect parameters, we find a similar parameter estimate for size. However, to calculate the absolute influence of size, we must multiply this value by the size value of each consumer, which is higher in the Canadian data set (as the comparison of the average Canadian and German size values shows). The cross-effect parameters are much larger in absolute value for our data set, which may be the result of smaller market baskets bought in Germany. This result must lead indirectly to larger substitutional relationships, because these items are rarely purchased together. After inspecting the direct comparison, we look at the results that include the DISPLAY parameters, denoted as model M4. We present some of the parameter estimates for all four bundles in Table 4. The complete results are available on request.

Table 4: Selected parameter estimates of the own-effects for the full model (M4) for all bundles
Variable
Loyalty
Time
Price
Display
Paper towels
0.36**
0.89**
–3.60**
Facial tissue
0.67**
1.10**
–0.47
–0.40
Toilet tissue
0.76**
0.70**
–0.18
–1.44
Paper napkins
0.06
0.94**
Coffee
0.74**
Instant-coffee
0.75**
Tea
Category Bundle 1 1.16
1.60**
3.11**
0.21**
0.60*
2.96**
0.51**
–0.39**
0.76**
0.56**
–0.64**
Canned milk
0.75**
0.19**
–0.17
Filter paper
–0.12**
0.97**
Mild detergent
0.22**
0.93**
Softener
0.43**
0.95**
4.20**
Conventional detergent
0.17*
0.70**
–2.76**
Detergent for colored laundry
0.19**
0.93**
–1.93**
0.94
Wool wash
0.00
1.12**
0.00
–0.87
8.32**
–1.33*
Bundle 2
2.24**
1.17* –0.76 0.60 3.57**
Bundle 3 –0.68
3.83** 2.36** 0.94
Bundle 4
Beer
0.50**
0.04**
Juices
0.50**
–0.14**
–0.18
2.30**
Wine
0.55**
0.26**
0.06
0.92**
Lemonade
0.48**
0.20**
Water
0.46**
1.04**
1.62** –0.64*
–0.82 –1.18
We note that, compared to the results in Table 3, we find stable effects for loyalty, price, and time for the paper products. The loyalty parameters for all product categories are positive and significant. The positive signs are as expected, because higher product loyalty should induce a higher probability of purchasing a product in a specific category. The estimates for the time variable are also all positive, again as expected (e.g., for paper towels, toilet tissue, filter paper). The results for loyalty and time are still in line with the results of Russell and Petersen (2000). This outcome holds for the same category bundle (paper categories), and also for the other category ensembles that we examine.

The price estimates also show a consistent picture. For most categories, the parameters are negative and significant (e.g., for paper towels, instant coffee, tea, conventional detergents, detergent for colored laundry, water). We have a few categories with a positive price parameter (e.g., paper napkins, filter paper, beer, and lemonade). Consumers equating a high price with higher quality could drive this result. The remaining price effects are negative, but not significant. In most of these categories, retailers run only a few price promotions.

Display is significant in many categories (e.g., paper napkins, coffee, instant coffee, filter paper, mild detergent, softener, juices, and wine). These are categories in which marketing-mix actions often take place. The positive display effects can sometimes be observed along with a positive price effect. This result leads us to conclude that in these categories, display actions coincide with price increases, but the products are still purchased. We find similar effects not only between our results in the paper categories and those of Russell and Petersen (2000), but also for the other augmented category bundles.
In examining the cross-category parameter estimates, we do not present the values for κij and η separately, but instead show the estimates for the full cross-effect parameter θijk for an average consumer in Tables 5 to 8, using equation (4) to calculate the cross-effects. To obtain the results for an average consumer, we use the average basket size SIZEk from the initial data set.

Table 5: Parameter estimations of the cross-effects for the full model (M4) for an average consumer for the paper products bundle

               Paper towels   Facial tissue  Toilet tissue  Paper napkins
Paper towels   –              –2.60**        –3.73**        –3.55**
Facial tissue  –2.60**        –              –3.92**        –3.64**
Toilet tissue  –3.73**        –3.92**        –              –4.20**
Paper napkins  –3.55**        –3.64**        –4.20**        –
Table 6: Parameter estimations of the cross-effects for the full model (M4) for an average consumer for the breakfast beverages bundle

                Coffee          Instant-coffee  Tea             Canned milk     Filter paper
Coffee          –               –1.02**         –1.12**         –1.01**         –0.35**
Instant-coffee  –1.02**         –               –1.25**         –0.86**         –0.43**
Tea             –1.12**         –1.25**         –               –1.11**         –0.33**
Canned milk     –1.01**         –0.86**         –1.11**         –               –0.31**
Filter paper    –0.35**         –0.43**         –0.33**         –0.31**         –
Table 7: Parameter estimations of the cross-effects for the full model (M4) for an average consumer for the detergents bundle

                               Mild detergent  Softener  Conventional detergent  Detergent for colored laundry  Wool wash
Mild detergent                 –               –3.70**   –3.74**                 –4.00**                        –4.11**
Softener                       –3.70**         –         –3.48**                 –3.46**                        –3.29**
Conventional detergent         –3.74**         –3.48**   –                       –3.76**                        –3.78**
Detergent for colored laundry  –4.00**         –3.46**   –3.76**                 –                              –3.51**
Wool wash                      –4.11**         –3.29**   –3.78**                 –3.51**                        –
Table 8: Parameter estimations of the cross-effects for the full model (M4) for an average consumer for the beverages bundle

          Beer      Juices    Wine      Lemonade  Water
Beer      –         0.13**    0.66**    0.54**    0.08
Juices    0.13**    –         0.60**    0.16**    –0.01
Wine      0.66**    0.60**    –         0.50**    0.06
Lemonade  0.54**    0.16**    0.50**    –         –0.07
Water     0.08      –0.01     0.06      –0.07     –
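One way to read the signs and magnitudes in Tables 5 to 8 is through the conditional form of the model in equation (A7): holding everything else fixed, the presence of category j in the basket multiplies the odds of buying category i by exp(θij). The following minimal Python sketch applies this to two table entries. The helper name `odds_multiplier` is hypothetical and only illustrative, and the reading applies to the average-consumer values reported in the tables.

```python
import math


def odds_multiplier(theta_ij: float) -> float:
    """Factor by which the conditional odds of buying category i change
    when category j is also in the basket (autologistic form, eq. (A7))."""
    return math.exp(theta_ij)


# Cross-effect values transcribed from Tables 5 and 8 (average consumer).
examples = {
    "paper towels / facial tissue (Table 5)": -2.60,
    "beer / wine (Table 8)": 0.66,
}

for label, theta in examples.items():
    factor = odds_multiplier(theta)
    print(f"{label}: theta = {theta:+.2f}, odds multiplier = {factor:.3f}")
```

A negative cross-effect shrinks the odds (a factor of roughly 0.07 for paper towels and facial tissue), while a positive one inflates them (roughly 1.93 for beer and wine). This is only the marginal, conditional effect and says nothing about total effects.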
For paper products and detergents, we find strong direct substitutional relations. We observe the same effect for the breakfast beverage bundle. The direct effects tell us whether the joint purchase of two categories induces a higher or a lower purchase probability for the bundle. In the beverage bundle, the categories are complementary to each other. The cross-effect estimates should not be interpreted in isolation, because they show only the marginal influence on the purchase probability and give no information about the total effects. For managers, cross-price elasticities are more valuable than the direct cross-effects, because they deliver information about the consumers' purchase choices in a typical week. Therefore, in Tables 9 to 12 we present the aggregated cross-price elasticities for all bundles. Each row in these tables shows the percentage change in choices of the row category for a 1% price increase in the column category. These elasticities are no longer symmetric. We can see clearly how price changes affect the other categories. In particular, we can detect which categories are the "draw" categories.
Table 9: Cross-price elasticities for the full model (M4) for the paper products bundle

               Paper towels   Facial tissue  Toilet tissue  Paper napkins
Paper towels   –2.40          0.00           0.03           –0.05
Facial tissue  –0.30          –0.35          0.04           –0.06
Toilet tissue  0.14           0.01           –0.02          –0.04
Paper napkins  –0.32          –0.01          –0.06          –1.30
Table 10: Cross-price elasticities for the full model (M4) for the breakfast beverages bundle

                Coffee          Instant-coffee  Tea             Canned milk     Filter paper
Coffee          –0.28           –0.01           –0.01           –0.01           0.02
Instant-coffee  –0.06           –0.27           0.02            0.01            –0.03
Tea             –0.07           0.02            –0.48           0.02            –0.01
Canned milk     –0.05           0.01            0.01            –0.08           –0.02
Filter paper    0.02            0.00            0.00            0.00            –1.61
Table 11: Cross-price elasticities for the full model (M4) for the detergents bundle

                               Mild detergent  Softener  Conventional detergent  Detergent for colored laundry  Wool wash
Mild detergent                 –0.49           –0.55     0.66                    0.22                           0.00
Softener                       –0.03           –2.21     –0.49                   –0.16                          0.00
Conventional detergent         0.03            –0.42     –1.07                   0.18                           0.00
Detergent for colored laundry  0.04            –0.46     0.62                    –1.17                          0.00
Wool wash                      –0.04           0.46      –0.72                   –0.22                          –0.01
Table 12: Cross-price elasticities for the full model (M4) for the beverages bundle

          Beer      Juices    Wine      Lemonade  Water
Beer      –2.57     0.00      0.00      –0.05     0.00
Juices    0.03      –0.04     0.00      0.00      0.00
Wine      –0.45     0.00      –0.03     –0.07     0.00
Lemonade  –0.22     0.00      0.00      –0.48     0.00
Water     0.12      0.00      0.00      0.03      –0.37
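To make the reading of the elasticity tables concrete, the sketch below treats the Table 9 entries as a matrix (rows: responding category, columns: category whose price changes) and computes the approximate percentage change in choice shares for a given vector of percentage price changes. This is only a first-order approximation that is reasonable for small price changes; the function name `share_change` and the price scenario are hypothetical and not part of the original analysis.

```python
import numpy as np

categories = ["paper towels", "facial tissue", "toilet tissue", "paper napkins"]

# Rows: category whose choice share responds.
# Columns: category whose price increases by 1%.
# Values transcribed from Table 9.
E = np.array([
    [-2.40,  0.00,  0.03, -0.05],
    [-0.30, -0.35,  0.04, -0.06],
    [ 0.14,  0.01, -0.02, -0.04],
    [-0.32, -0.01, -0.06, -1.30],
])


def share_change(elasticities, price_change_pct):
    """First-order approximation of the % change in choice shares
    for a vector of % price changes (one entry per category)."""
    return elasticities @ np.asarray(price_change_pct, dtype=float)


# Hypothetical scenario: paper towels +1%, all other prices unchanged.
delta = share_change(E, [1.0, 0.0, 0.0, 0.0])
for name, d in zip(categories, delta):
    print(f"{name}: {d:+.2f}%")
```

With a price change in a single category, the result is simply the corresponding column of the table, which is why the paper towels column shows the comparatively large spillovers discussed in the text.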
In comparison to Russell and Petersen (2000), Table 9 shows that for the paper categories there are some own-price elasticities larger than one in absolute value (elastic demand), while others are smaller than one (inelastic demand). Further, price increases in the paper towels category have a large influence on the choice share of all other categories. Price changes in the remaining categories have only a very small impact on the other categories. Table 10 shows that the cross-price elasticities for the breakfast bundle are all very small, so the spillover effects that can be attributed to price are almost negligible here. For the last two category groups (detergents in Table 11 and beverages in Table 12), we again find that price strongly influences changes in the choice share. In the detergent bundle, a price increase has a major effect on the purchases in the remaining three categories, softener, conventional detergent, and detergent for colored laundry. In the beverage bundle, we see that beer affects all other categories except for juices, where we find only a weak effect.

Our study of the inter-relatedness of several product bundles leads to the following conclusions. From the modeling side, it is necessary to include cross-category effects. Omitting cross-category effects could lead to misleading parameter estimates. We see small values of price elasticity for some categories, while others are quite large. Small price elasticities suggest that the cross-category relations are primarily due to the shopping style of the consumers. But even if categories are related only through the shopping style, category managers should keep these relations in mind (e.g., Manchanda et al. (1999)).

4 CONCLUSION

When modeling the purchase of market baskets, it is necessary to consider relations between the categories and to use alternatives to the standard multinomial logit model. One possible approach is to build a model on n-dimensional decisions that are related to each other.
Another approach is, e.g., a bivariate analysis, as is usually used in data-mining approaches. In our study, we replicate the study of Russell and Petersen (2000), extend the model with easily obtainable additional explanatory variables, and extend their approach to other category ensembles. Here, we employ a multivariate logit model using methods of spatial statistics. This approach allows an explicit modeling of cross-category dependencies. When we compare the general results of our German data set of purchases in paper categories with the Russell and Petersen (2000) results based on a Canadian data set, we find significant effects for the cross-category variables. We can also show these effects for three other bundles. For all category bundles, the model including cross-effects gives the best overall fit. The categories are either substitutes, complements, or have no direct relation to each other. Ignoring these effects could result in misleading parameter estimates and therefore in the under- or overestimation of marketing-mix effects. Additionally, optimal prices that take the cross-category effects into account could be set to maximize store profits.
Because of model restrictions, we examine only small bundles. Almost all market-basket studies also focus on only a few categories. Generally, the analysis of larger baskets is needed. Additionally, the categories forming a bundle should not be chosen ad hoc, but in a data-driven manner. Until now, researchers have built the bundles using available data or by intuition. But as was shown by Chib et al. (2002), parameters could change rapidly if the researcher changes categories. Here, more research is needed to obtain stable estimates. Another open issue is to integrate a heterogeneity concept into the model approach. Several methods, e.g., finite mixture approaches, are possible. As Russell and Petersen (2000) have done in their analysis, we also ignore the heterogeneity question and leave it for further analyses. One other possible direction of model extension is to use non- or semiparametric models, e.g., a generalized additive model (Hastie and Tibshirani (1990)), which has also been successfully used as an extension of a standard multinomial logit approach. Methods based on data-mining techniques could also be incorporated in the multivariate MNL approach to identify relevant product groups.

APPENDIX: SPATIAL MODELS

Statistics for explanatory data analysis usually rely on stochastic models. For spatial models, we need a parameter space that is a subspace of at least ℝ². Also, we must account for the dependence of the observations. We describe a stochastic process through a family of random quantities Xt, which we define on a joint probability space (Ω, F, P) with t ∈ D and D as an index set. For a fixed ω ∈ Ω, X(ω) is called a realization of the stochastic process. We denote Z(x) as a (spatial) stochastic process with the parameter x = (x1, x2) ∈ D ⊂ ℝ². We focus on spatial models on a lattice. We call D a lattice if it contains observations at a countable collection of spatial sites (e.g., Cressie (1991)). The neighboring information belongs to the data structure.
For neighboring relationships, the spatial stochastic process is no longer continuous, because no realizations are possible between two points of the lattice. The problem is to apply some asymptotic theory that can model the stochastic process of Z. With a Markov random field it is possible to develop information about the distribution of the stochastic process. We define the Markov random field as a probability measure whose conditional distribution defines a neighborhood structure {Ni: i = 1, ..., n}, with k as a neighbor of i if the conditional distribution of Z(xi) depends functionally on z(xk) for k ≠ i. Here, we assume pairwise-only dependence for the neighborhood structure. Pairwise-only dependence means only that some of the potentially nonzero G functions in equation (A3) turn out to be zero (Cressie (1991)).
The negpotential function Q is defined by:

$$Q(z) = \ln\left[\frac{\Pr(z)}{\Pr(0)}\right], \quad z \in \xi \tag{A1}$$

with ξ ≡ {z: Pr(z) > 0} and ξi ≡ {z(xi): Pr(z(xi)) > 0} as a support function (Besag (1974)). Without loss of generality, we assume that 0 ∈ ξ holds. The positivity condition is fulfilled if ξ = ξ1 × ... × ξn (e.g., Cressie (1991)). In the discrete case, the knowledge of Q(·) implies the knowledge of Pr(·), because we can rewrite equation (A1) as Pr(z) = exp(Q(z)) / Σ_{y ∈ ξ} exp(Q(y)). With this notation, Besag's (1974) factorization theorem can be stated as:

Factorization Theorem of Besag
Suppose the variables {Z(xi); i = 1, ..., n} have joint mass function Pr(·), whose support ξ satisfies the positivity condition. Then,

$$\frac{\Pr(z)}{\Pr(y)} = \prod_{i=1}^{n} \frac{\Pr\big(z(x_i) \mid z(x_1), \ldots, z(x_{i-1}), y(x_{i+1}), \ldots, y(x_n)\big)}{\Pr\big(y(x_i) \mid z(x_1), \ldots, z(x_{i-1}), y(x_{i+1}), \ldots, y(x_n)\big)}, \quad z, y \in \xi \tag{A2}$$
where y ≡ (y(x1), ..., y(xn)) and z ≡ (z(x1), ..., z(xn)) are possible realizations of Z. Using this theorem, it is possible to obtain the joint probability from the conditional probabilities with Σ_{y ∈ ξ} Pr(y) = 1. But the joint probability is unique only if the full conditional distributions are restricted (Russell and Petersen (2000)). For the negpotential function Q, the following properties also hold (Cressie (1991)):

• Q can be obtained by

$$\frac{\Pr\big(z(x_i) \mid \{z(x_j): j \neq i\}\big)}{\Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big)} = \frac{\Pr(z)}{\Pr(z_i)} = \exp\big(Q(z) - Q(z_i)\big)$$

with 0(xi) describing the realization Z(xi) = 0 and zi ≡ (z(x1), ..., z(xi–1), 0, z(xi+1), ..., z(xn)).

• Q can be expanded uniquely on ξ as

$$\begin{aligned} Q(z) ={} & \sum_{1 \le i \le n} z(x_i)\, G_i\big(z(x_i)\big) + \sum\sum_{1 \le i < j \le n} z(x_i) z(x_j)\, G_{ij}\big(z(x_i), z(x_j)\big) \\ & + \sum\sum\sum_{1 \le i < j < k \le n} z(x_i) z(x_j) z(x_k)\, G_{ijk}\big(z(x_i), z(x_j), z(x_k)\big) \\ & + \ldots + z(x_1) \cdots z(x_n)\, G_{1 \ldots n}\big(z(x_1), \ldots, z(x_n)\big), \quad z \in \xi. \end{aligned} \tag{A3}$$
• Q can be expressed for binary data and only pairwise dependence as

$$Q(z) = \sum_{i=1}^{n} \alpha_i\, z(x_i) + \sum\sum_{1 \le i < j \le n} \theta_{ij}\, z(x_i) z(x_j) \tag{A4}$$
with θij = 0 unless i and j are neighbors. θijk in equation (1) is the same cross-category parameter as in equation (A4). Each category can be a neighbor of the remaining ones, but we still have only a pairwise neighborhood. The model in equation (A4) is called the 'autologistic model'. For reasons of identification, θii = 0 is fixed; θij = θji follows from the Theorem of Besag (1974), and this equality guarantees that the pairwise G function is well defined. The parameter αi describes the spatial trend (also called large-scale variation) and θij the spatial dependence (also called small-scale variation) of the observations. From the properties of Q, it follows that

$$\frac{\Pr\big(z(x_i) \mid \{z(x_j): j \neq i\}\big)}{\Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big)} = \exp\Big\{\alpha_i\, z(x_i) + \sum_{j=1}^{n} \theta_{ij}\, z(x_i) z(x_j)\Big\}. \tag{A5}$$
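Because z(xi) is binary, it holds that (Cressie (1991))

$$\frac{1 - \Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big)}{\Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big)} = \exp\Big\{\alpha_i + \sum_{j=1}^{n} \theta_{ij}\, z(x_j)\Big\} \tag{A6}$$

$$\Leftrightarrow \quad 1 + \frac{1 - \Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big)}{\Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big)} = 1 + \exp\Big\{\alpha_i + \sum_{j=1}^{n} \theta_{ij}\, z(x_j)\Big\}$$

$$\Leftrightarrow \quad \frac{1}{\Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big)} = 1 + \exp\Big\{\alpha_i + \sum_{j=1}^{n} \theta_{ij}\, z(x_j)\Big\}$$

$$\Leftrightarrow \quad \Pr\big(0(x_i) \mid \{z(x_j): j \neq i\}\big) = \frac{1}{1 + \exp\big\{\alpha_i + \sum_{j=1}^{n} \theta_{ij}\, z(x_j)\big\}}$$

and then it follows that

$$\Pr\big(z(x_i) \mid \{z(x_j): j \neq i\}\big) = \frac{\exp\big\{\alpha_i\, z(x_i) + \sum_{j=1}^{n} \theta_{ij}\, z(x_i) z(x_j)\big\}}{1 + \exp\big\{\alpha_i + \sum_{j=1}^{n} \theta_{ij}\, z(x_j)\big\}}, \quad z(x_i) = 0, 1. \tag{A7}$$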
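The step from the negpotential function in (A4) to the conditional probabilities in (A7) can be checked numerically on a small example. The sketch below enumerates all binary baskets for three categories, builds the joint distribution from Q(z), and compares the conditional probabilities implied by that joint distribution with the closed form in (A7). The parameter values are hypothetical and chosen for illustration only; the function names are ours and not taken from any library.

```python
import itertools
import numpy as np


def negpotential(z, alpha, theta):
    """Q(z) from equation (A4); theta is symmetric with zero diagonal."""
    z = np.asarray(z, dtype=float)
    # 0.5 * z' theta z equals the sum over pairs i < j of theta_ij z_i z_j.
    return alpha @ z + 0.5 * z @ theta @ z


def joint_pmf(alpha, theta):
    """Pr(z) = exp(Q(z)) / sum_y exp(Q(y)) over all binary baskets."""
    n = len(alpha)
    baskets = list(itertools.product([0, 1], repeat=n))
    weights = np.array([np.exp(negpotential(z, alpha, theta)) for z in baskets])
    return dict(zip(baskets, weights / weights.sum()))


def conditional_a7(i, z, alpha, theta):
    """Pr(z_i | z_j, j != i) from equation (A7)."""
    z = np.asarray(z, dtype=float)
    eta = alpha[i] + theta[i] @ z  # theta[i, i] = 0, so z_i drops out
    return np.exp(z[i] * eta) / (1.0 + np.exp(eta))


# Hypothetical parameter values for three categories (illustration only).
alpha = np.array([-0.5, 0.2, -1.0])
theta = np.array([[0.0, -1.2, 0.4],
                  [-1.2, 0.0, 0.7],
                  [0.4, 0.7, 0.0]])

pmf = joint_pmf(alpha, theta)
max_gap = 0.0
for z in pmf:
    for i in range(len(alpha)):
        z_flip = list(z)
        z_flip[i] = 1 - z[i]
        # Conditional probability computed directly from the joint distribution.
        from_joint = pmf[z] / (pmf[z] + pmf[tuple(z_flip)])
        gap = abs(from_joint - conditional_a7(i, z, alpha, theta))
        max_gap = max(max_gap, gap)

print(f"largest discrepancy between joint-based and (A7) conditionals: {max_gap:.2e}")
```

The discrepancy should be at the level of floating-point rounding, which reflects that (A7) is simply the conditional distribution of the joint model defined by (A4).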
The common estimation routine for the autologistic model in equation (A7) is a likelihood-based approach. We describe the likelihood as:
$$\Pr(z) = \frac{\exp\big(Q(z)\big)}{\sum_{y \in \xi} \exp\big(Q(y)\big)}. \tag{A8}$$
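The difference between the exact likelihood in (A8) and a pseudolikelihood built from the conditionals in (A7) can be illustrated with a short sketch. The exact version needs the normalizing sum over all 2^n baskets, whereas the pseudo-log-likelihood is a sum of logistic terms, one per basket and category. The toy data, parameter values, and function names below are hypothetical; this is not the estimation code used in the study, only an evaluation of the two objective functions.

```python
import itertools
import numpy as np


def q_value(z, alpha, theta):
    """Negpotential Q(z) of equation (A4) for a binary basket z."""
    z = np.asarray(z, dtype=float)
    return alpha @ z + 0.5 * z @ theta @ z


def exact_loglik(data, alpha, theta):
    """Sum of log Pr(z) from (A8); the normalizer sums over all 2^n baskets."""
    n = len(alpha)
    all_q = [q_value(y, alpha, theta) for y in itertools.product([0, 1], repeat=n)]
    log_norm = np.log(np.sum(np.exp(all_q)))
    return float(sum(q_value(z, alpha, theta) - log_norm for z in data))


def pseudo_loglik(data, alpha, theta):
    """Sum over baskets and categories of log Pr(z_i | rest) from (A7)."""
    Z = np.asarray(data, dtype=float)
    # eta[k, i] = alpha_i + sum_j theta_ij * z_kj (theta symmetric, zero diagonal)
    eta = alpha + Z @ theta
    return float(np.sum(Z * eta - np.log1p(np.exp(eta))))


# Hypothetical toy data: four baskets over three categories.
data = [(1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 0)]

# Without cross-effects the categories are independent and both criteria agree.
alpha0, theta0 = np.zeros(3), np.zeros((3, 3))
print(exact_loglik(data, alpha0, theta0), pseudo_loglik(data, alpha0, theta0))

# With a nonzero cross-effect between categories 1 and 2 they differ.
theta1 = np.zeros((3, 3))
theta1[0, 1] = theta1[1, 0] = 1.0
print(exact_loglik(data, alpha0, theta1), pseudo_loglik(data, alpha0, theta1))
```

Because each conditional in (A7) has the form of a logistic regression of one category on the others, the pseudo-log-likelihood can be maximized with standard logistic regression routines, which avoids the 2^n-term normalizer.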
For this estimation, Q(z) is fixed through equation (A4). The estimation can be carried out in common statistical software packages using a pseudolikelihood approach. To achieve unity and avoid arbitrariness, Cox (1985) calls for statistics to concentrate on likelihood-based estimation procedures, but the exact likelihood is not available in closed form. Here, assuming a so-called isotropic Ising model, the maximum pseudolikelihood estimates are the same as those obtained from a formal maximum likelihood estimation of the logistic regression model. Therefore, standard software packages can be used to estimate the model.

REFERENCES

Ainslie, Andrew and Peter E. Rossi (1998), Similarities in choice behavior across product categories, Marketing Science 17, 91-106.
Agrawal, Rakesh and Ramakrishnan Srikant (1994), Fast algorithms for mining association rules, in Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo (eds.), VLDB'94, Proceedings of 20th International Conference on Very Large Data Bases, Santiago de Chile: Morgan Kaufmann, 487-499.
Besag, Julian (1974), Spatial interaction and the statistical analysis of lattice systems, Journal of the Royal Statistical Society, Series B 36, 192-236.
Böcker, Franz (1975), Die Analyse des Kaufverbunds – Ein Ansatz zur bedarfsorientierten Warentypologie, zfbf 27, 290-306.
Böcker, Franz (1978), Die Bestimmung der Kaufverbundenheit von Produkten, Berlin: Duncker & Humblot.
Bodapati, Anand V. and V. Seenu Srinivasan (2001), The impact of out-of-store advertising on store sales, Working Paper, University of California at Los Angeles.
Boztuğ, Yasemin (2002), Die Analyse der Preiswirkung auf die Markenwahl – Eine nichtparametrische Modellierung, Wiesbaden: Deutscher Universitätsverlag.
Boztuğ, Yasemin and Lutz Hildebrandt (2001), Nonparametric modeling of buying behavior in fast moving consumer goods markets, in Georgios Papastefanou, Peter Schmidt, Axel Börsch-Supan, Hartmut Lüdtke, and Ulrich Oltersdorf (eds.), Social and Economic Research with Consumer Panel Data, Mannheim: ZUMA, 189-205.
Boztuğ, Yasemin and Nadja Silberhorn (2006), Modellierungsansätze in der Warenkorbanalyse im Überblick, Journal für Betriebswirtschaft 56, 105-128.
Bradlow, Eric T., Bart Bronnenberg, Gary J. Russell, Neeraj Arora, David Bell, Sri Devi Deepak, Frenkel ter Hofstede, Catarina Sismeiro, Raphael Thomadsen, and Sha Yang (2005), Spatial models in marketing, Marketing Letters 16, 267-278.
Chib, Siddhartha and Edward Greenberg (1998), Analysis of multivariate probit models, Biometrika 85, 347-361.
Chib, Siddhartha, P.B. Seetharaman, and Andrei Strijnev (2002), Analysis of multi-category purchase incidence decisions using IRI market basket data, in Philip Hans Franses and Alan L. Montgomery (eds.), Econometric Models in Marketing, Vol. 16, Amsterdam: Elsevier, 57-92.
Chib, Siddhartha, P.B. Seetharaman, and Andrei Strijnev (2004), Model of brand choice with a no-purchase option calibrated to scanner-panel data, Journal of Marketing Research 41, 184-196.
Cox, David R. (1985), Theory of statistics: Some current themes, Bulletin of the International Statistical Institute 51, Book 1, Section 6.3., International Statistical Institute: Amsterdam.
Cressie, Noel A. C. (1991), Statistics for spatial data, New York: John Wiley & Sons Ltd.
Decker, Reinhold and Heiko Schimmelpfennig (2002), Alternative Ansätze zur datengestützten Verbundmessung im Electronic Retailing, in Rainer Olbrich and Hendrik Schröder (eds.), Electronic Retailing, Frankfurt: Deutscher Fachverlag, 193-212.
Decker, Reinhold and Katharina Monien (2003), Market basket analysis with neural gas networks and self-organising maps, Journal of Targeting, Measurement and Analysis for Marketing 11, 373-386.
Deepak, Sri Devi, Asim Ansari, and Sunil Gupta (2007), Consumers' Price Sensitivities Across Complementary Categories, Management Science 53, 1933-1945.
Dickinson, Roger, Frederick Harris, and Sumit Sircar (1992), Merchandise compatibility: an exploratory study of its measurement and effect on department store performance, International Review of Retail, Distribution and Consumer Research 2, 351-379.
Elrod, Terry, Gary J. Russell, Allan D. Shocker, Rick L. Andrews, Lynd Bacon, Barry L. Bayus, Douglas J. Carroll, Richard M. Johnson, Wagner A. Kamakura, Peter Lenk, Josef A. Mazanec, Vithala R. Rao, and Venkatesh Shankar (2002), Inferring Market Structure from Customer Response to Competing and Complementary Products, Marketing Letters 13, 219-230.
Guadagni, Peter M. and John D. C. Little (1983), A logit model of brand choice calibrated on scanner data, Marketing Science 2, 203-238.
Gupta, Sunil (1988), Impact of Sales Promotions on When, What, and How Much to Buy, Journal of Marketing Research 25, 342-355.
Hastie, Trevor and Robert Tibshirani (1990), Generalized Additive Models, London: Chapman & Hall.
Hruschka, Harald (1985), Der Zusammenhang zwischen Verbundbeziehungen und Kaufakt- bzw. Käuferstrukturmerkmalen, zfbf 37, 218-231.
Hruschka, Harald (1991), Bestimmung der Kaufverbundenheit mit Hilfe eines probabilistischen Messmodells, zfbf 43, 418-434.
Hruschka, Harald, Martin Lukanowicz, and Christian Buchta (1999), Cross-category sales promotion effects, Journal of Retailing and Consumer Services 6, 99-105.
Julander, Claes-Robert (1992), Basket analysis, International Journal of Retail and Distribution Management 20, 10-18.
Levine, Joel H. (1979), Joint-space analysis of pick-any data: analysis of choices from an unconstrained set of alternatives, Psychometrika 44, 85-92.
Manchanda, Puneet, Asim Ansari, and Sunil Gupta (1999), The "shopping basket": A model for multicategory purchase incidence decisions, Marketing Science 18, 95-114.
Mild, Andreas and Thomas Reutterer (2001), Collaborative Filtering Methods for Binary Market Basket Analysis, in Jiming Liu, Pong C. Yuen, Chung-Hung Li, Joseph Ng, and Toru Ishida (eds.), Active Media Technology, Lecture Notes in Computer Science, Berlin: Springer, 302-313.
Mild, Andreas and Thomas Reutterer (2003), An improved collaborative filtering approach for predicting cross-category purchases based on binary market data, Journal of Retailing and Consumer Services 10, 123-133.
Mulhern, Francis J. and Robert P. Leone (1991), Implicit price bundling of retail products: a multiproduct approach to maximizing store profitability, Journal of Marketing 55, 63-76.
Müller-Hagedorn, Lothar (1978), Das Problem des Nachfrageverbundes in erweiterter Sicht, zfbf 30, 181-193.
Niraj, Rakesh, V. Padmanabhan, and P.B. Seetharaman (2008), A cross-category model of households' incidence and quantity decisions, Marketing Science 27, 225-235.
Papastefanou, Georgios (2001), The ZUMA data file version of the GfK ConsumerScan Household Panel, in Georgios Papastefanou, Peter Schmidt, Axel Börsch-Supan, Hartmut Lüdtke, and Ulrich Oltersdorf (eds.), Social and Economic Analyses of Consumer Panel Data, Mannheim: ZUMA, 206-212.
Ratneshwar, S., Allan D. Shocker, June Cotte, and Rajendra K. Srivastava (1999), Product, person, and purpose: putting the consumer back into theories of dynamic market behavior, Journal of Strategic Marketing 7, 191-208.
Russell, Gary J., David Bell, Anand Bodapati, Christina L. Brown, Jeongwen Chiang, Gary Gaeth, Sunil Gupta, and Puneet Manchanda (1997), Perspectives on multiple category choice, Marketing Letters 8, 297-305.
Russell, Gary J. and Wagner A. Kamakura (1997), Modeling multiple category brand preference with household basket data, Journal of Retailing 73, 439-461.
Russell, Gary J. and Ann Petersen (2000), Analysis of cross category dependence in market basket selection, Journal of Retailing 76, 367-392.
Russell, Gary J., S. Ratneshwar, Allan D. Shocker, David Bell, Anand Bodapati, Alex Degeratu, Lutz Hildebrandt, Namwoon Kim, S. Ramaswami, and Venkatesh H. Shankar (1999), Multiple-category decision-making: Review and synthesis, Marketing Letters 10, 319-332.
Schnedlitz, Peter and Michael Kleinberg (1994), Einsatzmöglichkeiten der Verbundanalyse im Lebensmittelhandel, Der Markt 33, 31-39.
Schnedlitz, Peter, Thomas Reutterer, and Walter Joos (2001), Data-Mining und Sortimentsverbundanalyse im Einzelhandel, in Hajo Hippner, Ulrich Küsters, Matthias Meyer, and Klaus Wilde (eds.), Handbuch Data Mining im Marketing, Wiesbaden: Vieweg, 951-970.
Seetharaman, P.B., Andrew Ainslie, and Pradeep K. Chintagunta (1999), Investigating household state dependence effect across categories, Journal of Marketing Research 36, 488-500.
Seetharaman, P.B., Siddhartha Chib, Andrew Ainslie, Peter Boatwright, Tat Chan, Sachin Gupta, Nitin Mehta, Vithala Rao, and Andrei Strijnev (2005), Models of multi-category choice behavior, Marketing Letters 16, 239-254.
Shocker, Allan D., Barry L. Bayus, and Namwoon Kim (2004), Product complements and substitutes in the real world: The relevance of other products, Journal of Marketing 68, 28-40.
Song, Inseong and Pradeep K. Chintagunta (2007), A Discrete-Continuous Model for Multicategory Purchase Behavior of Households, Journal of Marketing Research 44, 595-612.
Walters, Rockney (1991), Assessing the impact of retail price promotions on product substitution, complementary purchase, and interstore sales displacement, Journal of Marketing 55, 17-28.
Walters, Rockney and Maqbul Jamil (2002), Measuring cross-category specials purchasing: theory, empirical results, and implications, Journal of Market-Focused Management 5, 25-42.