J Real Estate Finan Econ DOI 10.1007/s11146-017-9636-x
Information Entropy-Based Housing Spatiotemporal Dependence
Jin Zhao¹
© Springer Science+Business Media, LLC 2017
Abstract In the existing housing literature, there is no academic consensus on how to combine the spatial dependence and the temporal dependence between housing transactions. The combination depends heavily on the researcher's prior knowledge of the market in question. This paper combines them using an information entropy-based spatiotemporal approach. The validity of the approach is tested with spatiotemporal regressions in terms of price estimation accuracy, using data on dwelling transactions from the San Francisco Bay Area. The empirical results suggest that the proposed information entropy-based modeling technique is a reasonable and efficient way to combine spatial and temporal dependence.
Keywords Information entropy · Hedonic model · Spatiotemporal dependence · Bayesian estimation · Gibbs sampling
Introduction
This paper attempts to estimate the spatiotemporal dependence between observations of dwellings using an information entropy-based method. As in the existing literature (e.g., Tu et al. 2004; Sun et al. 2005), the spatiotemporal dependence in this paper consists of spatial dependence and temporal dependence. The temporal lags are motivated by the idea, put forth by Pace et al. (1998, 2000), that the current price of a dwelling is partially determined by its historical prices.
* Jin Zhao
¹ Department of Finance, The Shanghai Lixin University of Accounting and Finance, SongLin Lu 333 Nong 17 Lou 501 Shi, Shanghai 200122, China
To quantify the spatial dependence more precisely, this paper follows the existing literature (Tu et al. 2004; Sun et al. 2005) and splits the spatial effect into two sub-effects: the building effect and the regional effect. The split is motivated by the selected dataset, which is composed of multiple dwelling types, including detached single-unit dwellings, semi-detached dwellings, and attached multi-unit dwellings. In attached multi-unit buildings, units located in the same building share common features, including similar layouts, facilities, views, and exterior and interior designs. Prospective purchasers with similar preferences are likely to be interested in units with similar attributes in the same building, and their aggregate behavior may induce dependence between the prices of those units. The building effect therefore refers to the spatial autocorrelation between units in the same building. It does not exist between detached single-unit dwellings, because each such dwelling has a unique geographic location. Conversely, the building effect plays an irreplaceable role in the valuation of attached multi-unit dwellings, because a single geographic coordinate is shared by more than one unit. For this reason, this paper uses the doorplate numbers of units in the same building to construct the building effect weight matrix. The building effect is only considered for units in the same building, because the effect across different buildings is difficult to identify owing to severe heterogeneity. Tu et al. (2004) and Sun et al. (2005) estimate a building effect for the Singaporean real estate market using a second-order spatial filtering process, which filters the building effect and the regional effect jointly.
The regional effect is more familiar to researchers; it typically relates to the two-dimensional geographic distance between two dwellings on different sites. It is often assumed that spatial autocorrelation diminishes as the geographic distance between two sites increases. In spatiotemporal research, a commonly encountered question is how to combine the spatial effect and the temporal effect (and other factors) reasonably. Because spatial dependence and temporal dependence are different concepts, combining them poses a considerable challenge. Pace et al. (1998) propose a compound filtering process. This process includes a linear combination of spatial and temporal weight matrices, as well as an interactive compound filtering process that filters the temporal effect first and then the spatial effect, and then reverses the order, filtering the spatial effect first and the temporal effect second. Pace et al. (1998) point out that the optimal filtering process is a convex combination of the two compound filtering processes. This compound filtering process is a hybrid of the time lag autoregressive model in time series analysis and the spatial lag autoregressive model in spatial econometrics (Ord 1975; Anselin 1988); through it, the spatial model and the temporal model are combined into a spatiotemporal model. However, Pace et al. do not specify how to calculate the weight coefficient of each matrix; instead, they include the combination of the different matrices, with their coefficients, in the regression. With their compound filtering process, the final regression is a conventional regression rather than a typical spatial regression, although the spatiotemporal information is incorporated into the variables. Although an improvement in regression results has been found, multicollinearity caused by the large number of spatiotemporal interactive terms produced by the compound filtering process could be a major contributor to the increase in goodness of fit. The results, particularly the coefficients, could therefore be less reliable. To date, no research has identified the economic meaning of the coefficients of these spatiotemporal interactive terms. In addition, with a redundant list of independent variables, it is more difficult to produce a meaningful forecast with their method. This paper therefore drops the spatiotemporal interactive terms, which provide little intuitive information. Another way to combine the spatial and temporal effects is a statistical iteration method. However, besides offering little economic intuition, statistical iteration is computationally too costly, particularly for large datasets. Instead, the different weight matrices can be combined by an information entropy-based method, since information entropy provides a reasonable way to estimate the coefficient of each weight matrix in the compound filtering process proposed by Pace et al. (1998, 2000). In this paper, an information entropy-based approach is designed to simplify the hedonic equation, alleviate the concern about multicollinearity, and produce more reasonable results. It is well known in the literature that property valuation involves some uncertainty (French 2007). Information entropy is designed to measure the randomness or uncertainty of a random process (Pandurangan and Upfal 2007). It is one of the methods used for rating (Shannon and Weaver 1963) and provides a quantitative measure of uncertainty (Kapur 1989). Chan et al. (1999) suggest that the entropy method has less stringent information requirements when used for ranking.
The method has been applied in areas such as multi-attribute decision-making under uncertainty (Zanakis et al. 1998; Dyer and Sarin 1979). Information entropy-based methods have also been used in the real estate literature: Ge et al. (2005) apply the entropy method to identify which attributes affect property values most significantly in the Hong Kong residential housing market. Within the large literature on spatiotemporal analysis, the specification of the elements of the weight matrix plays a crucial role. A wide variety of specifications have been tried in housing price models, including minimum distance, k-nearest neighbors (Can and Megbolugbe 1997), inverse distances (Pace and Gilley 1997), contiguity based on Delaunay triangulation, and contiguity based on the bordering of areal units. In this paper, an adaptive Gaussian kernel function with a minimum distance method is applied to construct the spatial and temporal weight matrices. To avoid the shortcomings of a locally linear spatial model and to incorporate the proposed information entropy-based process, this paper adopts the Bayesian estimation method with Gibbs sampling proposed by Geweke (1993). Tu et al. (2004) and Sun et al. (2005) incorporate this approach into their spatiotemporal analyses and suggest that a Bayesian method is more suitable than non-Bayesian methods for estimating a real estate hedonic model. A bootstrap method is adopted to test the overall performance of the proposed model for comparison purposes. The remainder of this paper is organized as follows. Section “Research Methods” develops an information entropy-based autoregressive model with a Bayesian treatment that removes residual autocorrelation and corrects residual heteroscedasticity by considering the proposed building, regional and temporal effects. Section “Data Description” describes the selected dataset.
Section “Regression Results” provides both in-sample and out-of-sample empirical results based on the selected dataset. Section “Concluding Remarks” concludes the paper, discusses its limitations and points to several directions for future extensions.
Research Methods
Autoregressive Error Process
To start with, the hedonic regression model in this paper is assumed to be a spatial error model:
$$Y = X\beta + \varepsilon, \tag{1}$$
$$\varepsilon = \rho W \varepsilon + \epsilon, \tag{2}$$
where $Y$ is the $n \times 1$ vector of the dependent variable; $X$ is the $n \times k$ matrix of independent variables, with $k$ the number of independent variables; $\beta$ is the $k \times 1$ vector of their coefficients; $n$ denotes the number of observations; $W$ is the $n \times n$ weight matrix; $\rho$ is the scalar autoregressive coefficient; $\varepsilon$ is the $n \times 1$ vector of error terms in Eq. (1); and $\epsilon$ is the $n \times 1$ vector of error terms in Eq. (2). Observations are arranged in chronological order so that only ex ante observations can be used to estimate ex post observations. Therefore, $W_{i,j} = 0$ whenever $j \ge i$, so $W$ is strictly lower triangular. All elements of $W$ are nonnegative. For $\beta$ to be identified, $X$ should have full column rank (for a sufficiently large $n$). In what follows, the building effect, the regional effect and the temporal effect are introduced in turn, and each is converted to correlations, which are used to construct the weight matrices.
Building Effect
In this subsection, the building effect between the ith and the jth observations is discussed. The building effect is measured using the doorplate numbers of units in a building. Tu et al. (2004) and Sun et al. (2005) show, for Singapore's real estate market, that unit price has a significant positive relation to floor level. This paper assumes a nonlinear relationship between the prices of units in the same building, measured by the Gaussian function. To avoid multicollinearity, the building effect between different buildings is not considered, because it is partially included in the regional effect between different buildings.¹ Units in the same building are highly homogeneous in terms of structural and environmental characteristics.
These facts help eliminate the concern about spatial heterogeneity when the building effect is considered. In the dataset, each building has only one pair of geographical coordinates, so the building effect between units in the same building cannot be calculated from geographical coordinates. Instead, the building effect is considered to be related to the differences between the units' floor levels and their positions on the floor.

¹ This problem can be seen in the filtering process in Subsection “The Combination of Building, Regional and Temporal Effects”.

Definition 2 The building effect is defined as
$$BUI_{i,j} \equiv C_B \left( \left| F_i - F_j \right| + \left| S_i - S_j \right| \right), \tag{3}$$
where $C_B$ denotes the constant term, which is identical for all building effects; $BUI_{i,j}$ denotes the building effect between the ith and the jth observations if the geographical locations of the two observations are identical; $S_i$ and $S_j$ denote the sequence of the dwelling unit on its floor recorded in the ith and the jth observations, respectively; and $F_i$ and $F_j$ denote the floor levels recorded in the ith and the jth observations, respectively.
If the weight matrix is row-standardized (row stochastic),² $C_B$ cancels out of each element, so its value does not affect the weight assigned to each element.³ $BUI_{i,j}$ is the element in the ith row and the jth column of matrix BUI. In the dataset, each unit in a building has a unique doorplate number, which is divided into two parts: the first part indicates the floor level and the second part indicates the sequence of the unit on the floor. For example, Unit 507 is the seventh flat on the fifth floor. The distance between two units in the same building equals the difference between their floor levels plus the difference between their sequences on their respective floors. For example, the distance between Unit 507 and Unit 201 is (5 − 2) + (7 − 1) = 9, and the distance between Unit 301 and Unit 201 is (3 − 2) + (1 − 1) = 1. The building effect only matters for dwellings located in the same building; the building effect distance between dwellings at different sites is assumed to be zero. In this paper, an adaptive Gaussian kernel function with the minimum distance method, which transforms distances into correlations between observations, is applied to construct the building, regional and temporal effect weight matrices. The decay of the effect between units in the same building is assumed to be nonlinear, to be consistent with the reality of real estate markets, and the Gaussian function is adopted to capture this nonlinearity.

² A matrix is row stochastic if each of its rows sums to one.
³ Because the constant term is a common factor of every element in a row, it cancels out when the row is standardized.

The elements in the building weight matrix are defined as
$$w_1[i,j] = \begin{cases} \phi\left( \dfrac{BUI_{i,j}}{\sigma_{W_1} \theta_{W_1}} \right), & \text{if } j < i, \\ 0, & \text{if } j \ge i, \end{cases} \tag{4}$$
where $w_1[i,j]$ refers to the element in the ith row and the jth column of weight matrix $W_1$; $\phi$ denotes the Gaussian kernel function; $\sigma_{W_1}$ represents the standard deviation of the ith row vector in matrix BUI; and $\theta_{W_1}$ is a positive decay parameter known as the bandwidth, which is calculated by a cross-validation procedure.
To ensure that the weight matrix $W$ in Eq. (2) is lower triangular, the building effect matrix $W_1$ is designed to be lower triangular as well. The Gaussian function places more weight on neighboring observations and less weight on remote observations. If the ith observation records a transaction of a detached or semi-detached single-unit dwelling, the ith row of the building effect matrix is a zero vector.
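The doorplate rule and Eq. (4) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names `doorplate_distance` and `building_weight_matrix` are hypothetical, doorplates are assumed to end in a two-digit unit sequence, $\phi$ is taken as the Gaussian kernel $\exp(-z^2/2)$ (its normalizing constant cancels after row standardization), $\sigma$ for row i is computed over that row's same-building predecessors, and detached dwellings are marked with a building ID of `None`. Subtracting the smallest squared score before exponentiating leaves the row-standardized weights unchanged but avoids underflow.

```python
import numpy as np


def doorplate_distance(plate_a, plate_b):
    """Hypothetical helper: |floor difference| + |sequence difference| (Eq. 3, C_B = 1).

    Assumes the last two digits are the unit's sequence on its floor and the
    leading digits are the floor level (507 -> floor 5, unit 7).
    """
    floor_a, seq_a = divmod(int(plate_a), 100)
    floor_b, seq_b = divmod(int(plate_b), 100)
    return abs(floor_a - floor_b) + abs(seq_a - seq_b)


def building_weight_matrix(building_ids, plates, theta):
    """Hypothetical helper: row-stochastic, strictly lower-triangular W1 (Eq. 4).

    Observations must be in chronological order. building_ids[i] is None for a
    detached or semi-detached dwelling, whose row is left as zeros.
    """
    n = len(plates)
    W = np.zeros((n, n))
    for i in range(1, n):
        if building_ids[i] is None:
            continue
        # Earlier transactions (j < i) in the same building.
        js = [j for j in range(i) if building_ids[j] == building_ids[i]]
        if not js:
            continue
        d = np.array([doorplate_distance(plates[i], plates[j]) for j in js], dtype=float)
        sigma = d.std() if d.std() > 0 else 1.0   # assumed guard for a constant row
        z2 = (d / (sigma * theta)) ** 2
        w = np.exp(-0.5 * (z2 - z2.min()))        # Gaussian kernel, shifted for stability
        W[i, js] = w / w.sum()                    # row-standardize
    return W


print(doorplate_distance(507, 201), doorplate_distance(301, 201))  # 9 1
ids = ["A", "A", None, "A"]
plates = [201, 301, 101, 507]
W1 = building_weight_matrix(ids, plates, theta=1.0)
```

In the usage lines, the last sale (Unit 507) places almost all of its weight on Unit 301, which is closer in doorplate distance (8 versus 9), while the detached dwelling's row stays at zero.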
Regional Effect
In this subsection, the regional effect between the ith and the jth observations is discussed. Spatial dependence between dwellings on different sites is used throughout most of the literature; in this paper, it is attributed to the regional effect. As with the building effect, the Gaussian function is applied to the regional effect, which is assumed to have a nonlinear distance decay pattern.
Definition 3 The regional effect is defined as
$$REG_{i,j} \equiv C_R \cdot \sqrt{\left( X_i - X_j \right)^2 + \left( Y_i - Y_j \right)^2}, \tag{5}$$
where $C_R$ is the constant term, which is assumed to be identical for all regional effects; $X_i$ and $X_j$ denote the longitudes of the ith and the jth observations; $Y_i$ and $Y_j$ denote their latitudes; and $REG_{i,j}$ denotes the regional effect between the two observations.
If the weight matrix is row-standardized, $C_R$ cancels out of each element, so its value does not affect the weight assigned to each element. The regional distance between dwellings in the same building is assumed to be zero. $REG_{i,j}$ is the element in the ith row and the jth column of matrix REG. The elements in the regional weight matrix are defined as
$$w_2[i,j] = \begin{cases} \phi\left( \dfrac{REG_{i,j}}{\sigma_{W_2} \theta_{W_2}} \right), & \text{if } j < i, \\ 0, & \text{if } j \ge i, \end{cases} \tag{6}$$
where $w_2[i,j]$ denotes the element in the ith row and the jth column of weight matrix $W_2$; $\phi$ denotes the Gaussian kernel function; $\sigma_{W_2}$ represents the standard deviation of $REG_i$, the ith row vector of matrix REG; and $\theta_{W_2}$ is the bandwidth, calculated by a cross-validation procedure.
Similar to the building effect matrix, to ensure that the weight matrix W in Eq. (2) is lower triangular, the regional effect matrix W2 is designed to be lower triangular as well.
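A comparable sketch of Eqs. (5) and (6) follows. It assumes, for illustration, that longitudes and latitudes are used directly as planar coordinates (as Eq. (5) is written, with $C_R = 1$), that $\sigma$ for row i is the standard deviation of the distances to the i earlier observations, and that rows are row-standardized; it does not attempt to reproduce the minimum distance component mentioned in the text. `regional_distance` and `gaussian_lower_weights` are hypothetical names.

```python
import numpy as np


def regional_distance(lon, lat):
    """Hypothetical helper: Eq. (5) with C_R = 1, planar distance on (longitude, latitude)."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    return np.hypot(lon[:, None] - lon[None, :], lat[:, None] - lat[None, :])


def gaussian_lower_weights(D, theta):
    """Hypothetical helper: row-stochastic weights phi(D[i, j] / (sigma_i * theta)) for j < i (Eq. 6).

    sigma_i is the standard deviation of D[i, :i]. Row 0 has no predecessors and
    stays zero, so the result is strictly lower triangular.
    """
    n = D.shape[0]
    W = np.zeros((n, n))
    for i in range(1, n):
        d = D[i, :i]
        sigma = d.std() if d.std() > 0 else 1.0    # assumed guard for a constant row
        z2 = (d / (sigma * theta)) ** 2
        w = np.exp(-0.5 * (z2 - z2.min()))         # Gaussian kernel; the shift cancels after normalizing
        W[i, :i] = w / w.sum()
    return W


lon = [-122.41, -122.40, -122.30, -122.405]
lat = [37.77, 37.78, 37.87, 37.775]
W2 = gaussian_lower_weights(regional_distance(lon, lat), theta=1.0)
```

The fourth observation lies next to the first two and far from the third, so the third receives only a small share of its row weight.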
Temporal Effect
In this subsection, the temporal effect between observations is discussed. The temporal effect between observations is measured by the time interval between their transaction dates. As with the previous dependence measures, the Gaussian function is applied to the temporal effect, which is assumed to have a nonlinear decay pattern.
Definition 4 The temporal effect is defined as
$$TEM_{i,j} = \begin{cases} C_T \left| D_{T,i} - D_{T,j} \right|, & \text{if } j < i, \\ 0, & \text{if } j \ge i, \end{cases} \tag{7}$$
where $C_T$ denotes the constant term, which is assumed to be identical for all temporal effects; $TEM_{i,j}$ denotes the temporal effect between the ith and the jth observations; $D_{T,i}$ denotes the transaction date of the ith observation; and $D_{T,j}$ denotes the transaction date of the jth observation.
If the weight matrix is row-standardized, $C_T$ cancels out of each element, so its value does not affect the weight assigned to each element. $TEM_{i,j}$ is the element in the ith row and the jth column of matrix TEM. Each observation is assumed to be influenced only by its prior observations.
The elements in the temporal weight matrix are defined as
$$t_{i,j} = \begin{cases} \phi\left( \dfrac{TEM_{i,j}}{\sigma_T \theta_T} \right), & \text{if } j < i, \\ 0, & \text{if } j \ge i, \end{cases} \tag{8}$$
where $t_{i,j}$ refers to the element in the ith row and the jth column of temporal weight matrix $T$; $\phi$ denotes the Gaussian kernel function; $\sigma_T$ represents the standard deviation of the ith row vector in matrix TEM; and $\theta_T$ denotes the bandwidth.
Similar to the previous two effect matrices, to ensure that the weight matrix W in Eq. (2) is lower triangular, the temporal effect matrix T is designed to be lower triangular as well.
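The temporal matrices in Eqs. (7) and (8) can be sketched the same way, measuring gaps in days. The day unit, $C_T = 1$, the per-row $\sigma$ and the function name `temporal_weight_matrix` are assumptions made for illustration. The function rejects unsorted input, because the lower-triangular structure relies on chronological order.

```python
import numpy as np


def temporal_weight_matrix(dates, theta):
    """Hypothetical helper returning (TEM, T): day gaps (Eq. 7, C_T = 1) and kernel weights (Eq. 8)."""
    d = np.asarray(dates, dtype="datetime64[D]")
    if np.any(d[1:] < d[:-1]):
        raise ValueError("observations must be sorted chronologically")
    days = d.astype(np.int64)                                   # days since 1970-01-01
    n = len(days)
    # D_i - D_j is nonnegative for j < i because the dates are sorted; zero on and above the diagonal.
    TEM = np.tril(days[:, None] - days[None, :], k=-1).astype(float)
    T = np.zeros((n, n))
    for i in range(1, n):
        gaps = TEM[i, :i]
        sigma = gaps.std() if gaps.std() > 0 else 1.0            # assumed guard for a constant row
        z2 = (gaps / (sigma * theta)) ** 2
        w = np.exp(-0.5 * (z2 - z2.min()))                      # Gaussian kernel, shifted for stability
        T[i, :i] = w / w.sum()                                   # row-standardize
    return TEM, T


dates = ["2007-01-05", "2007-03-01", "2007-03-02", "2007-03-10"]
TEM, T = temporal_weight_matrix(dates, theta=1.0)
print(TEM[3, :3])   # gaps of 64, 9 and 8 days
```

The most recent earlier sale (8 days before) receives the largest weight in the last row, and the sale 64 days earlier the smallest.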
The Determination of the Bandwidth
In this subsection, the determination of the bandwidths $\theta_{W_1}$, $\theta_{W_2}$ and $\theta_T$ is discussed. Changing a bandwidth changes the decay pattern of the kernel and therefore produces estimates that vary more or less rapidly over space or time. In principle, the optimal values of $\theta_{W_1}$, $\theta_{W_2}$ and $\theta_T$ are found by running the regression over alternative values and identifying the values with the best model fit. A model-fit-based grid search is therefore conducted to select the most appropriate bandwidths. Given the characteristics of the dataset, the candidate bandwidths range from a minimum of 0.5 to a maximum of 20 in steps of 0.5. The grid search outputs the settings that achieve the best score in the validation procedure. Cleveland (1979) and Fotheringham et al. (2002) point out that the best bandwidth can be selected using the least squares cross-validation method. Therefore, a cross-validation procedure is implemented using the score function
$$\sum_{i=1}^{n} \left[ y_i - \hat{y}_i\left( \theta_{W_1}, \theta_{W_2}, \theta_T \right) \right]^2 \tag{9}$$
where $y_i$ is the observed value of the dependent variable in Eq. (1), $\hat{y}_i(\theta_{W_1}, \theta_{W_2}, \theta_T)$ is its fitted value given the bandwidths, and $n$ is the number of observations. The parameters $\theta_{W_1}$, $\theta_{W_2}$ and $\theta_T$ enter the building, regional and temporal effect weight matrices, respectively. The score function is evaluated at different values of $\theta_{W_1}$, $\theta_{W_2}$ and $\theta_T$ to identify the combination that minimizes it.
The Combination of Building, Regional and Temporal Effects
In this paper, the different effects are combined using an information entropy-based measure. In information theory, entropy is a measure of uncertainty; it can be used to measure the degree of randomness and disorder. Information entropy measures the information contained in a message as opposed to the portion of the message that is determined. If an effect introduced above provides little information on house price determination, it is associated with a higher degree of uncertainty and its entropy will be large; in this case, the effect should be assigned a smaller weight coefficient. Conversely, if an effect provides more information on price determination, it is associated with a lower degree of uncertainty, its entropy will be small, and the effect should be assigned a larger weight coefficient. Information entropy, originally devised by Claude Shannon in 1948 to study the amount of information in a transmitted message, is sometimes referred to as Shannon entropy. It resembles the “physical entropy” of statistical mechanics developed by Ludwig Boltzmann in the 1870s. Information entropy is the measure of the amount of information that is missing before reception; in the information-theoretic interpretation of statistical mechanics, entropy is the amount of information that would be needed to specify the full microstate of a system. Typically, the entropy $H(f)$ of a discrete-alphabet random variable $f$ (Balian 2004, among others) defined on the probability space $(\Omega, \mathcal{B}, P)$ is
$$H(f) = -q \sum_{a \in A} P(f = a) \ln P(f = a), \qquad q = \frac{1}{\ln(m)}, \tag{10}$$
where $A = \{a_1, a_2, \ldots, a_{\|A\|}\}$ denotes the finite alphabet of $f$; $\Omega$ refers to the sample space; $\mathcal{B}$ denotes a σ-field of subsets of $\Omega$ (the event space); $P$ denotes the probability measure; and $m = \|A\|$ denotes the number of values that $f$ can take.
The units of entropy are “nats” when the natural logarithm is used and “bits” for base-2 logarithms. In this paper, the discrete-alphabet random variable is replaced with an observation of a factor, and the information entropy of a factor is defined below. In this section, the elements with $j < i$ computed above for the building, regional and temporal effects are used to determine the coefficient of each effect weight matrix with the proposed information entropy-based method. These elements are extracted to construct a variable matrix from which the information entropy is calculated. The dimension of the variable matrix is $\frac{n^2 - n}{2} \times 3$: the elements of BUI ($j < i$) form its first column, and the elements of REG ($j < i$) and TEM ($j < i$) form its second and third columns, respectively. The three factors considered in this paper are the building, regional and temporal correlations, so each column of the variable matrix represents a factor vector. For each factor, the order of the variables in the vector does not affect the calculated information entropy.
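The construction of the variable matrix can be sketched with NumPy's `tril_indices`, which enumerates every position with j < i in row-major order. `variable_matrix` is a hypothetical helper, and the random inputs merely stand in for BUI, REG and TEM.

```python
import numpy as np


def variable_matrix(BUI, REG, TEM):
    """Hypothetical helper: stack the j < i entries of the three effect matrices into an m x 3 matrix."""
    n = BUI.shape[0]
    rows, cols = np.tril_indices(n, k=-1)      # every pair (i, j) with j < i
    return np.column_stack([BUI[rows, cols], REG[rows, cols], TEM[rows, cols]])


# n = 5 observations give m = (25 - 5) / 2 = 10 rows.
rng = np.random.default_rng(0)
BUI, REG, TEM = (rng.random((5, 5)) for _ in range(3))
V = variable_matrix(BUI, REG, TEM)
print(V.shape)   # (10, 3)
```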
In this paper, $x_{sk}$ ($s = 1, \cdots, m$; $k = 1, \cdots, 3$; $m = \frac{n^2 - n}{2}$) denotes the element in the sth row and the kth column of the variable matrix. The following formula is used to standardize each column:
$$x'_{sk} = \frac{x_{sk} - \min\{x_{1k}, \cdots, x_{mk}\}}{\max\{x_{1k}, \cdots, x_{mk}\} - \min\{x_{1k}, \cdots, x_{mk}\}}. \tag{11}$$
After obtaining $x'_{sk}$, the elements of the variable matrix are updated by
$$p\left(f_{s,k}\right) = \frac{x'_{sk}}{\sum_{s=1}^{m} x'_{sk}}, \qquad s = 1, \cdots, m;\ k = 1, \cdots, 3. \tag{12}$$
In what follows, the information entropy of each factor is calculated from $p(f_{s,k})$ in Eq. (12).
Definition 5 The information entropy of the kth factor in terms of a discrete set of probabilities $p$ is defined as
$$H\left(f_k\right) = -q \sum_{s=1}^{m} p\left(f_{s,k}\right) \ln p\left(f_{s,k}\right), \qquad q = \frac{1}{\ln(m)}, \tag{13}$$
where $H(f_k)$ denotes the information entropy of the kth factor; $x_{s,k}$ refers to the sth variable of the kth factor; and $m$ denotes the number of variables of the kth factor.⁴
Note that $p(f_{s,k})$ must not be zero. If $p(f_{s,k})$ is zero, this paper sets it to 0.0001 so that $\ln p(f_{s,k})$ can be calculated. If the information entropy of a factor is low, the factor's potential randomness is low and the factor is considered more suitable for explaining the state of the system; hence it should receive a large weight coefficient. Consequently, the contribution of a factor is expressed as
$$d_k = 1 - H\left(f_k\right), \tag{14}$$
where $d_k$ denotes the contribution of the kth factor. The weight of the factor is then expressed as
$$E_k = \frac{d_k}{\sum_{k=1}^{3} d_k}, \tag{15}$$
where the sum runs over the three factors (the number of attributes), and $E_k$ denotes the weight coefficient of the kth factor based on the information entropy measure. As a result, the weight matrix $W$ in Eq. (2) is the linear combination of the factor weight matrices with their respective weight coefficients:
$$W = E_1 W_1 + E_2 W_2 + E_3 T, \tag{16}$$
where $W$ represents the weight matrix in the regression equation, and $E_1$, $E_2$ and $E_3$ denote the weight coefficients of the building effect weight matrix $W_1$, the regional effect weight matrix $W_2$ and the temporal effect weight matrix $T$, respectively.

⁴ The numbers of variables of the different factors are identical in this paper.
The information entropy-based method is particularly preferable when the number of weight matrices is large, since it is difficult to specify the weight coefficient of each matrix conveniently and intuitively by other means. The proposed method may not provide the optimal weight coefficients; however, it is much easier to implement than statistical iteration methods, particularly for large samples. To demonstrate the impact of the information entropy-based spatiotemporal dependence accurately, the resulting weights are not adjusted by any prior knowledge.
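Eqs. (11)–(16) can be sketched as a short NumPy function. The names `entropy_weights` and `combine_weight_matrices` are hypothetical; the replacement of zero shares by 0.0001 follows the text, and the function assumes each factor column takes at least two distinct values so that Eq. (11) is defined. The toy usage shows the intended ranking: the column whose mass is concentrated in a single entry has low entropy and receives the largest weight.

```python
import numpy as np


def entropy_weights(factors, eps=1e-4):
    """Hypothetical helper: entropy weights E_k for the columns of an m x K factor matrix (Eqs. 11-15)."""
    X = np.asarray(factors, dtype=float)
    m = X.shape[0]
    lo, hi = X.min(axis=0), X.max(axis=0)
    if np.any(hi == lo):
        raise ValueError("each factor column needs at least two distinct values")
    Xs = (X - lo) / (hi - lo)                      # Eq. (11): min-max standardization
    P = Xs / Xs.sum(axis=0)                        # Eq. (12): shares within each column
    P = np.where(P == 0.0, eps, P)                 # zero shares set to 0.0001, as in the text
    H = -(P * np.log(P)).sum(axis=0) / np.log(m)   # Eq. (13) with q = 1 / ln(m)
    d = 1.0 - H                                    # Eq. (14): contribution of each factor
    return d / d.sum()                             # Eq. (15): weights summing to one


def combine_weight_matrices(E, W1, W2, T):
    """Hypothetical helper for Eq. (16): W = E1 W1 + E2 W2 + E3 T."""
    return E[0] * W1 + E[1] * W2 + E[2] * T


# Column 0 is concentrated (low entropy); columns 1 and 2 are more even.
F = np.array([[0, 0, 1],
              [0, 1, 2],
              [0, 1, 3],
              [10, 1, 4]])
E = entropy_weights(F)
print(E.round(3))
```

If the three input matrices are row-stochastic, the combined matrix is row-stochastic as well, because the weights sum to one.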
Bayesian Estimation
Besides autocorrelation, the heterogeneous characteristics of properties may cause heteroscedasticity if a conventional estimation method is used. Sun et al. (2005) use the Bayesian estimation procedure with a heteroscedastic treatment (Geweke 1993; LeSage 1999) to deal with heteroscedasticity, and conclude that their procedure removes it effectively. Gibbs sampling (Geman and Geman 1984; Gelfand and Smith 1990; Arnold 1993), a special case of the Metropolis–Hastings algorithm, is a widely used approach for generating posterior estimates. To explain the process more clearly, the regression equation is written in a parsimonious form in Eqs. (17)–(23):
$$Y = \hat{X}\hat{\beta} + u, \tag{17}$$
$$u \sim N\left(0, \sigma^2 V\right), \tag{18}$$
$$r \sim \Gamma\left(a_1, a_2\right), \tag{19}$$
$$\frac{r}{v_i} \sim \mathrm{ID}\, \frac{\chi^2(r)}{r}, \qquad i = 1, 2, \ldots, n, \tag{20}$$
$$\hat{\beta} \sim N\left(c, G\right), \tag{21}$$
$$p(\sigma) \propto \frac{1}{\sigma}, \tag{22}$$
$$V = \mathrm{diag}\left(v_1, v_2, v_3, \ldots, v_n\right). \tag{23}$$
The error term $u$ is assumed to be independently normally distributed with mean 0 and heteroscedastic variance $\sigma^2 V$, where $v_i$ scales the error variance of the ith observation. Different values of $r$ are tried to select the best estimation result. The prior of $r/v_i$ is assumed to be an independent $\chi^2(r)/r$ distribution with mean 1 and variance $2/r$. A small $r$ therefore implies a large prior variance and a large $r$ a small one; hence, a smaller $r$ corresponds to a more serious heteroscedasticity problem. $\sigma$ is assigned a diffuse prior. The prior distribution of $\beta$ is assumed to be multivariate normal with mean $c$ and variance $G$. With Gibbs sampling, instead of computing the joint posterior density of the parameters directly, the joint posterior is approximated by sampling each parameter from its distribution conditional on all the other parameters. In this paper, three parameters need to be estimated. Following Geweke (1993), their conditional distributions are
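$$p\left(\tilde{\beta} \mid \sigma, V\right) \sim N\left[ H\left( \tilde{X}' V^{-1} Y + \sigma^2 G^{-1} c \right), \sigma^2 H \right], \tag{24}$$
$$p\left(\sigma \mid \tilde{\beta}, V\right): \quad \frac{1}{\sigma^2} \sum_{i=1}^{n} \frac{u_i^2}{v_i} \;\Big|\; \tilde{\beta}, V \sim \chi^2(n), \tag{25}$$
$$p\left(v_i \mid \tilde{\beta}, \sigma\right): \quad \frac{\sigma^{-2} u_i^2 + r}{v_i} \;\Big|\; \tilde{\beta}, \sigma \sim \chi^2(r + 1), \qquad i = 1, 2, \ldots, n, \tag{26}$$
$$H = \left( \tilde{X}' V^{-1} \tilde{X} + \sigma^2 G^{-1} \right)^{-1}. \tag{27}$$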
Let $K$ denote the parameters to be estimated and $M$ the number of draws, so that $K_M = (\hat{\beta}_M, \sigma_M, V_M)$. $K_0 = (\hat{\beta}_0, \sigma_0, V_0)$ is the initial parameter value, $K_1 = (\hat{\beta}_1, \sigma_1, V_1)$ is the value after the first Gibbs sampling pass, and $K_2 = (\hat{\beta}_2, \sigma_2, V_2)$ is the value after the second pass. Geweke (1993) proves that the distribution of $K_M$ converges to the joint posterior distribution of $K$ when the number of draws $M$ is sufficiently large. The next section provides descriptive statistics for the dataset. The empirical results are presented in Section “Empirical Results”, and details of selected regression results are given in the Appendix.
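The conditional draws in Eqs. (24)–(26) can be sketched as the following Gibbs sampler. It is a simplified illustration rather than the paper's estimation code: $r$ is held fixed instead of being drawn under Eq. (19), the data are assumed to be already filtered for spatiotemporal dependence (so $\rho$ and $W$ do not appear), the sampler starts from OLS estimates, and `gibbs_heteroscedastic` is a hypothetical name. It uses $H = (\tilde{X}' V^{-1} \tilde{X} + \sigma^2 G^{-1})^{-1}$, the form implied by the prior $\beta \sim N(c, G)$.

```python
import numpy as np


def gibbs_heteroscedastic(X, y, c, G, r=4.0, n_draws=2000, burn_in=500, seed=0):
    """Hypothetical sketch of a Geweke-style sampler for y = X b + u, u ~ N(0, sigma^2 V).

    Assumptions: r is fixed, and X, y are already filtered for spatiotemporal dependence.
    Returns the retained draws of the coefficient vector.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    G_inv = np.linalg.inv(G)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]         # K_0: OLS starting values
    sigma2, v = 1.0, np.ones(n)
    kept = []
    for it in range(n_draws):
        w = 1.0 / v                                      # diagonal of V^{-1}
        XtVX = X.T @ (X * w[:, None])
        XtVy = X.T @ (y * w)
        H = np.linalg.inv(XtVX + sigma2 * G_inv)         # Eq. (27)
        H = 0.5 * (H + H.T)                              # symmetrize against round-off
        mean = H @ (XtVy + sigma2 * (G_inv @ c))         # Eq. (24)
        beta = rng.multivariate_normal(mean, sigma2 * H)
        u = y - X @ beta
        sigma2 = np.sum(u**2 / v) / rng.chisquare(n)     # Eq. (25)
        v = (u**2 / sigma2 + r) / rng.chisquare(r + 1, size=n)  # Eq. (26)
        if it >= burn_in:
            kept.append(beta)
    return np.array(kept)


# Simulated data whose noise scale differs between the two halves of the sample.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
scale = np.where(np.arange(n) < n // 2, 0.5, 2.0)
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * scale
draws = gibbs_heteroscedastic(X, y, c=np.zeros(2), G=100.0 * np.eye(2))
print(draws.mean(axis=0))   # posterior means of the intercept and slope
```

Draws before `burn_in` are discarded, reflecting the convergence argument of Geweke (1993); in practice, convergence diagnostics would still be needed before relying on the retained draws.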
Data Description
This paper selects a dataset from the San Francisco Bay Area to test the validity of the proposed methodology. The selected data are available in county records, and housing attributes are based on information from Bay Area property transaction records. The definitions of all variables are given in Table 1. The size of dwellings varies dramatically across the dataset because it comprises different types of homes. Typically, small dwellings have fewer bedrooms and large dwellings have more; however, luxury houses typically have fewer bedrooms than less expensive houses of equal size. An important feature of the dataset is that it includes the address and date of each transaction, which allows the coordinates to be identified with GIS applications. Multicollinearity between the variables may exist, and the recorded attributes of each dwelling may not be comprehensive. For these reasons, a Box-Cox transformation is applied to alleviate the impact of these weaknesses. Initially, 24,930 chronologically ordered transaction records were collected from the San Francisco Bay Area residential real estate market from 2007 to 2012. Transactions of attached multi-unit dwellings account for 21% of the dataset. The first 22,930 observations in chronological order form the in-sample data pool, and the last 2000 observations form the out-of-sample data pool. The statistical characteristics of the variables are given in Table 2. Some dwellings in the dataset were transacted more than once. The methodology introduced above still applies to such repeat sales, since the spatial effect between two observations of the same dwelling is zero while the temporal effect between them is non-zero.
As a result, the proposed method remains valid.
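The text does not state how the Box-Cox parameter λ is chosen or which variables are transformed. As an illustration only, the sketch below applies the standard transform y(λ) = (y^λ − 1)/λ (ln y when λ = 0) and selects λ by maximizing the usual normal profile log-likelihood over a grid. The function names and the grid are hypothetical choices, not the paper's settings.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]  # limiting case lambda -> 0
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    n = len(values)
    z = box_cox(values, lam)
    mean = sum(z) / n
    var = sum((zi - mean) ** 2 for zi in z) / n  # MLE variance (divide by n)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var) + jacobian


def select_lambda(values, grid=None):
    """Pick lambda on a grid (default -2.00 ... 2.00 in steps of 0.01)."""
    if grid is None:
        grid = [i / 100.0 for i in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))
```

For data that are already roughly log-normal, the selected λ lands near zero, i.e. close to a plain log transform; in practice the chosen λ would then be applied to the price variable before estimation.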
Table 1 Variable definitions
Variable | Definition
Price | Price per square foot
Floor size | Floor area in square feet
Age | Number of years between the year the home was built and the year of transaction
Number of bedrooms | The number of rooms in a dwelling used or intended for sleeping
Number of bathrooms | The number of rooms in a dwelling used for personal hygiene
Number of rooms | The total number of rooms of all types
Number of floors | The number of floors of a dwelling
Elementary school quality | The quality of the elementary school, based on test scores and a variety of other factors
Junior high school quality | The quality of the junior high school, based on test scores and a variety of other factors
High school quality | The quality of the high school, based on test scores and a variety of other factors
Walk score | Walkability based on the distance to nearby amenities
Table 2 Statistical characteristics of variables
Variable | Min | Max | Mean | Standard deviation
Price | 94.04 | 1548.2 | 279.8768 | 598.1863
Age | 1 | 117 | 52.6936 | 147.6233
Floor size | 371 | 5280 | 1351.3441 | 577.9925
Number of bedrooms | 1 | 12 | 2.3871 | 1.2864
Elementary school quality | 1 | 10 | 8.0645 | 2.2363
Junior high school quality | 1 | 10 | 6.6613 | 2.0423
High school quality | 1 | 10 | 6.9946 | 2.6660
Number of bathrooms | 1 | 6 | 1.8118 | 0.7920
Number of rooms | 3 | 18 | 5.6183 | 2.1103
Walk score | 1 | 99 | 80.0161 | 22.3840
Number of floors | 1 | 3 | 1.1720 | 0.406
The figures reported are not log-transformed.
Information about school districts is provided by GreatSchools, a national nonprofit organization in the United States. GreatSchools rates schools in all 50 states on the basis of test scores and a variety of other factors, including student academic growth and college readiness. Three stages of pre-college education are covered in the analysis: elementary school, junior high school and high school. The GreatSchools Rating is on a 1–10 scale, where 10 indicates the highest school quality and 1 the lowest. Walk Score measures the walkability of an address based on the distance to nearby amenities. It is rated on a 0–100 scale, where 100 is the highest and 0 the lowest; a higher walk score indicates a more walkable address. A dwelling unit in a multi-unit building typically occupies only one floor. For the detached single-unit dwellings in the dataset, the number of floors ranges from one to three.
Empirical Results
For comparison, this section also reports the estimation of a conventional non-spatiotemporal model that does not include the proposed spatiotemporal process. The regression results of the models are discussed first, followed by a discussion of autocorrelation reduction and robustness to heteroscedasticity. The section closes with a test of the predictive accuracy of the models.
Non-Spatiotemporal Model
The results of the non-spatiotemporal model are reported in Table 3 and the Appendix. The non-spatiotemporal model uses conventional OLS as well as Bayesian estimation to regress price on the independent variables.
Table 3 Non-spatiotemporal ordinary least squares estimates
Statistic | Value
R-squared | 0.6030
Rbar-squared | 0.6019
Sigma square | 0.0321
Breusch-Pagan LM | 71.8720
LM stands for Lagrange Multiplier
The conventional non-spatiotemporal estimation exhibits serious statistical problems, indicated by a relatively low adjusted R² (Rbar-squared) of 0.6019 and a high Breusch-Pagan Lagrange Multiplier test statistic. Moreover, pairwise correlations among the residuals are detected, which violates the OLS assumption of no serial correlation, and the Breusch-Pagan Lagrange Multiplier test detects heteroscedasticity. These results imply that conventional estimation is not an appropriate option for this dataset.
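The diagnostics in Table 3 can be illustrated with a small pure-Python sketch: OLS via the normal equations, R², the adjusted R² (Rbar-squared), and the original Breusch-Pagan LM statistic, computed as half the explained sum of squares from regressing e_i²/σ̂² on the auxiliary regressors Z (with σ̂² = Σe²/n). This is a generic textbook version with hypothetical function names, not the paper's code; the paper does not state which regressors enter its Breusch-Pagan auxiliary regression.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular design matrix")
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def ols(X, y):
    """Return (beta, residuals, fitted); each row of X must already contain a constant 1."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, resid, fitted


def r_squared(y, resid):
    mean = sum(y) / len(y)
    tss = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - sum(e * e for e in resid) / tss


def adjusted_r_squared(r2, n, k):
    """Rbar-squared; k counts the regressors excluding the constant."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def breusch_pagan_lm(Z, resid):
    """Original Breusch-Pagan LM: half the explained sum of squares from
    regressing g_i = e_i^2 / sigma2_hat on Z (rows include a constant)."""
    n = len(resid)
    sigma2 = sum(e * e for e in resid) / n
    g = [e * e / sigma2 for e in resid]
    _, _, fitted = ols(Z, g)
    g_mean = sum(g) / n
    return 0.5 * sum((f - g_mean) ** 2 for f in fitted)
```

Under homoscedasticity the LM statistic is asymptotically χ² with as many degrees of freedom as non-constant columns in Z. If Z held, say, the ten hedonic regressors (an assumption), the 5% critical value of χ²(10) is about 18.31, well below the reported 71.87.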
Coefficients of Weight Matrices
Three effects are considered in this paper: the building effect, the regional effect, and the temporal effect. Table 4 reports the optimal bandwidths for selected combinations of the weight coefficients of the three effects.
Table 4 Optimal bandwidths
Columns 1–3: weight coefficients; columns 4–6: optimal bandwidths
B | R | T | θW1 | θW2 | θT
0.01 | 0.55 | 0.44 | 17.5 | 0.5 | 1
0.33 | 0.33 | 0.33 | 15 | 0.5 | 1.5
0.1 | 0.5 | 0.9 | 18 | 0.5 | 1
0.9 | 0.5 | 0.1 | 17.5 | 1 | 1.5
0.3 | 0.2 | 0.5 | 16 | 0.5 | 1
0.4 | 0.1 | 0.5 | 15.5 | 2 | 2.5
0.9 | 0.1 | 0 | 17 | 1 | N/A
0.1 | 0.5 | 0.4 | 17.5 | 1 | 2.5
0.6 | 0 | 0.4 | 15.5 | N/A | 2
0.2 | 0.5 | 0.3 | 18.5 | 2 | 3.5
0 | 0.3 | 0.7 | N/A | 0.5 | 1
B denotes the weight coefficient of the building effect matrix; R denotes the weight coefficient of the regional effect matrix; T denotes the weight coefficient of the temporal effect matrix. N/A: if the coefficient of an effect weight matrix is zero, there is no optimal bandwidth for that effect.
The score function is evaluated at different values of θW1, θW2 and θT, and the optimal bandwidths are those that minimize the score function in Eq. (9). Different combinations of weight coefficients generate different optimal bandwidths. The optimal bandwidth of the building effect is typically larger than those of the regional and temporal effects. A likely reason is that the building effect matters only for attached multi-unit dwellings, so the building-effect observations are sparsely distributed and the adaptive function selects a large bandwidth to accommodate them.
The information entropy-based weight coefficients are reported in Table 5. The information entropy-based method identifies the building effect matrix as providing the least useful information on market development: its coefficient is low because most observations in the dataset are not units in attached multi-unit buildings. The regional effect matrix provides somewhat more useful information on market development than the temporal effect matrix, which is consistent with the local real estate market structure changing little over the sample period. Nevertheless, in some cases a combination of weight coefficients with less economic intuition can perform even better, owing to the randomness of the dataset.
Coefficients of Independent Variables
Tables 10, 11 and 12 in the Appendix show the coefficients of the independent variables from selected regressions. Owing to problems such as endogeneity and omitted variable bias, explaining the marginal contribution of each attribute is not a primary focus of this paper.
However, a brief discussion of the estimated coefficients is still presented in this section, although the results are potentially controversial. The coefficient of size is almost zero in most cases, implying that the price per square foot is only weakly correlated with dwelling size. One possible reason is that the small size of urban dwellings is compensated by their convenient access to market centers: in the dataset, smaller dwellings are concentrated in regions near market centers. As a result, the price per square foot of smaller dwellings may not be lower than that of larger dwellings, and prices per square foot are therefore almost equal across different areas, although the validity of this result may be restricted to the selected dataset.
Surprisingly, the number of bedrooms is significantly negatively correlated with the price. A potential explanation is that, for dwellings of the same overall size, purchasers prefer dwellings with large living rooms and/or large bedrooms. For example, the dataset contains a three-bedroom dwelling with an overall size of 1920 square feet and a seven-bedroom dwelling with an overall size of 1937 square feet, and the price of the former is higher than that of the latter. This is consistent with expensive dwellings typically featuring larger living rooms and/or larger bedrooms. The number of bathrooms is positively correlated with the dwelling price. Bathroom space is more expensive to build than bedroom space because of the extra appliances, cabinetry, counters, electrical work, faucets, and plumbing that are not needed in the relatively empty space of a bedroom, and bathroom flooring is usually more expensive as well. Bathroom space therefore commands a higher price per square foot than other space, and its coefficient is higher than that of the number of bedrooms. The opposite contributions of the number of bedrooms and the number of bathrooms partially explain why the coefficient of the total number of rooms is significant but only slightly above zero.
Table 5 Information entropy-based weight coefficients
Effect matrix | Building | Regional | Temporal
Coefficient | 0.01 | 0.55 | 0.44
An insignificant coefficient of dwelling age is obtained from the regression, indicating that property purchasers are relatively indifferent to aging. Dwelling age influences the market value of a property in two ways. First, aging may cause the market price of a property to depreciate (Hulten and Wykoff 1981). Second, a property's market price may appreciate as its age increases. This second effect is known as the vintage effect, which reflects the fact that some unmeasured real estate characteristic is correlated with the year a property was built (Hall 1971; Goodman and Thibodeau 1995). Goodman and Thibodeau (1995) conclude that depreciation rates are nonlinear and non-monotonic, being higher for new properties and declining with dwelling age. Given these arguments, the aging effect and the vintage effect may offset each other in this case, so that dwelling age makes an insignificant contribution to dwelling values. The coefficient of elementary school quality is significantly positive, whereas the coefficient of junior high school quality is almost zero and the coefficient of high school quality is not significant, indicating that home purchasers may consider the quality of local elementary education more important than that of local junior high or high school education.
Intuitively, parents may prefer to send their children to a local elementary school while the children are not yet able to commute a long distance or live away from their parents, but may prefer a nonlocal private junior high or high school once the children can commute or live independently. The coefficient of the number of floors is insignificant, indicating that most home buyers are indifferent to the number of floors when purchasing a dwelling. The coefficient of walk score is significantly positive. An address with a high walk score has more accessible local amenities, and this convenience contributes to higher prices per square foot for dwellings with higher walk scores.
The Comparison of Pairwise Autocorrelation
The pairwise autocorrelation of the residuals is used to test the validity of the proposed information entropy-based structure. A number of weight coefficient combinations are selected to compare the results of the Bayesian estimations. The proposed information entropy-based spatiotemporal method is designed to reduce the pairwise autocorrelation; a lower pairwise autocorrelation indicates that more of the spatiotemporal correlation between the errors has been removed.
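The lag-k figures in Table 6 can be computed from chronologically ordered residuals. The sketch below assumes the standard sample autocorrelation r_k = Σ_t (e_t − ē)(e_{t+k} − ē) / Σ_t (e_t − ē)²; the paper does not spell out its exact estimator, and the function names are hypothetical.

```python
def lag_autocorrelation(resid, lag):
    """Sample autocorrelation of chronologically ordered residuals at one lag."""
    n = len(resid)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(resid)")
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    denom = sum(d * d for d in dev)  # total variation of the residuals
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def autocorrelation_profile(resid, max_lag=12):
    """Autocorrelations for lags 1..max_lag, matching the layout of Table 6."""
    return [lag_autocorrelation(resid, k) for k in range(1, max_lag + 1)]
```

Applied to the OLS residuals and to the spatiotemporal residuals ϵ, the two profiles can be compared lag by lag as in Table 6.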
As Table 6 shows, most spatiotemporal models that incorporate spatiotemporal information reduce the pairwise correlations. The reason is that the residuals of the spatiotemporal estimation are the error term vector ϵ in Eq. (2), rather than ε in Eq. (1), and the spatiotemporal information is incorporated into the proposed model to reduce the interactions between the residuals in ϵ. In most cases, the first-lag pairwise correlation of the spatiotemporal model is lower than the 0.3289 of the non-spatiotemporal model. However, if the weight coefficient of the regional effect is set to zero, the spatiotemporal model does not significantly outperform the non-spatiotemporal model (first-lag values of 0.2594 and 0.2966 in Table 6), which indicates that the regional effect is crucial to autocorrelation reduction in this dataset. The information entropy-based method yields weight coefficients of 0.01, 0.55 and 0.44 for the building, regional and temporal effects respectively and produces a relatively small autocorrelation: it outperforms 92.6% of the 68 linear combinations⁵ of building, regional, and temporal effects in terms of first-lag autocorrelation reduction.
A Comparison of Goodness-of-Fit Tests
The goodness-of-fit results for different weight coefficients are presented in Table 7, and Tables 10, 11 and 12 in the Appendix give the details of selected regression results. Palmer et al. (1996) show that Gibbs sampling carries a risk if improper prior knowledge is included. In most cases, Bayesian approaches can detect heteroscedastic observations and down-weight them accordingly. However, in some cases Bayesian models show slightly lower R² values than non-Bayesian models, and a Bayesian approach with fewer draws can even outperform one with more draws in terms of goodness of fit. The reason may be that Bayesian approaches down-weight outliers to produce estimates that are more consistent with the overall pattern of the sample, at a slight cost in goodness of fit. Therefore, after trying different values of r and of the number of draws, this paper sets r = 2 and the number of draws to 10,000 to minimize this side effect of the Bayesian approach. The proposed information entropy-based model effectively improves the goodness of fit: its R² of 0.8521 (B = 0.01, R = 0.55, T = 0.44) is close to the optimal result (R² = 0.8673, reached when B = 0.1, R = 0.8, T = 0.1), and it outperforms 94% of the 68 selected linear combinations of building, regional, and temporal effects in terms of goodness of fit.
⁵ This paper selects 68 different linear combinations of weight coefficients. The details are explained in the next subsection.
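The down-weighting of heteroscedastic observations and the role of the prior parameter r can be illustrated with the variance-scalar step of a standard heteroscedastic Gibbs sampler. The prior is not restated in this section, so the sketch assumes that each observation i has a variance scalar v_i with r/v_i ~ χ²(r); the conditional posterior is then v_i ~ (e_i²/σ² + r)/χ²(r + 1), and the β and σ² steps weight observation i by 1/v_i. A large residual therefore yields a large v_i and a small weight, while as r grows the draws concentrate near 1 (the homoscedastic case). Only this one step is shown, and the names are hypothetical.

```python
import random
import statistics


def draw_variance_scalars(resid, sigma2, r, rng):
    """One Gibbs step: v_i ~ (e_i^2 / sigma2 + r) / chi2(r + 1), assuming r / v_i ~ chi2(r) a priori."""
    draws = []
    for e in resid:
        chi2 = 2.0 * rng.gammavariate((r + 1) / 2.0, 1.0)  # chi-square(r + 1) draw
        draws.append((e * e / sigma2 + r) / chi2)
    return draws


def median_scalars(resid, sigma2, r, n_draws=2000, seed=0):
    """Median of repeated v_i draws per residual (illustration of down-weighting only)."""
    rng = random.Random(seed)
    samples = [[] for _ in resid]
    for _ in range(n_draws):
        for i, v in enumerate(draw_variance_scalars(resid, sigma2, r, rng)):
            samples[i].append(v)
    return [statistics.median(s) for s in samples]
```

For example, with r = 2 and σ² = 1, a residual of 5 receives a typical v_i more than ten times that of a residual of 0, so its weight 1/v_i in the next β draw is correspondingly smaller.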
Table 6 Pairwise autocorrelation for the first 12 lagged residuals from the information entropy-based model estimation
Column 2: non-spatiotemporal model; columns 3–9: spatiotemporal models (r = 2) with the weight coefficients B, R and T given in the first three rows.
B | N/A | 0.1 | 0.01 | 0.6 | 0.3 | 1 | 0.3 | 0.4
R | N/A | 0.3 | 0.55 | 0 | 0.3 | 0 | 0.5 | 0.5
T | N/A | 0.6 | 0.44 | 0.4 | 0.4 | 0 | 0.2 | 0.1
Lag | Non-spatiotemporal model | Spatiotemporal model (r = 2), seven columns
1 | 0.3289 | 0.1545 | 0.1520 | 0.2594 | 0.2218 | 0.2966 | 0.2203 | 0.2225
2 | 0.2033 | 0.0929 | 0.0967 | 0.1544 | 0.1411 | 0.1919 | 0.1382 | 0.1373
3 | 0.0506 | 0.0201 | 0.0226 | 0.0330 | 0.0379 | 0.0527 | 0.0298 | 0.0263
4 | 0.1261 | 0.0594 | 0.0593 | 0.0976 | 0.0854 | 0.1151 | 0.0823 | 0.0842
5 | 0.0881 | 0.0439 | 0.0421 | 0.0647 | 0.0643 | 0.0805 | 0.0550 | 0.0503
6 | 0.1259 | 0.0577 | 0.0561 | 0.0951 | 0.0816 | 0.1068 | 0.0812 | 0.0759
7 | 0.1674 | 0.0825 | 0.0789 | 0.1276 | 0.1141 | 0.1470 | 0.1124 | 0.1103
8 | 0.1669 | 0.0813 | 0.0787 | 0.1293 | 0.1147 | 0.1511 | 0.1148 | 0.1124
9 | 0.1043 | 0.0453 | 0.0463 | 0.0784 | 0.0685 | 0.0878 | 0.0687 | 0.0645
10 | 0.0463 | 0.0164 | 0.0196 | 0.0348 | 0.0295 | 0.0347 | 0.0261 | 0.0190
11 | 0.1265 | 0.0599 | 0.0648 | 0.0990 | 0.0883 | 0.1087 | 0.0860 | 0.0783
12 | 0.0205 | 0.0088 | 0.0046 | 0.0182 | 0.0073 | 0.0124 | 0.0083 | 0.0091
N/A: there is no weight coefficient for the non-spatiotemporal model. B denotes the weight coefficient of the building effect matrix; R denotes the weight coefficient of the regional effect matrix; T denotes the weight coefficient of the temporal effect matrix; r refers to the prior parameter in Bayesian estimation.
If the regional weight matrix is eliminated, R² declines significantly. For example, if B = 0.5, R = 0, T = 0.5, then R² = 0.6294. This indicates that the role of the regional effect in explaining the spatiotemporal dependence cannot be taken over by the other effects. Based on the comparison of regression results, the proposed information entropy-based model provides a relatively better goodness of fit and a lower σ² than its counterparts in most cases. Taken together, these results suggest that Bayesian estimation can effectively correct the inefficient estimates produced by conventional non-Bayesian approaches. The results in Table 7 suggest that, in some cases, the proposed model with Bayesian estimation not only significantly alleviates heteroscedasticity but also reduces autocorrelation to some extent if an appropriate number of draws is chosen. For example, the optimal combination (B = 0.1, R = 0.8, T = 0.1) generates both the highest R² (0.8673) and the lowest σ² (0.0181, shared with B = 0.3, R = 0.4, T = 0.3). A potential reason is that the proposed model, through the Gaussian function, incorporates information on both neighboring and remote observations, which offsets the effect of the homogeneity shared by units in the same building.
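The weight matrices themselves are defined in an earlier section not reproduced here. Purely as an illustration, and assuming a Gaussian kernel w_ij = exp(−½(d_ij/θ)²) with a zero diagonal and row standardization, the sketch below builds one matrix per effect from a distance matrix and a bandwidth θ, and then forms the linear combination B·W_building + R·W_regional + T·W_temporal that the weight coefficients in Tables 4 to 7 refer to. Pairs that should receive no weight (for example, units in different buildings in the building matrix) are given an infinite distance. The function names and the exact kernel form are assumptions.

```python
import math


def gaussian_weights(dist, bandwidth):
    """Row-standardised Gaussian kernel weights from a square distance matrix.

    w_ij = exp(-0.5 * (d_ij / bandwidth)^2) for i != j and w_ii = 0; each row is
    then divided by its sum. math.inf distances give zero weight, and rows
    without any neighbour stay all zero.
    """
    n = len(dist)
    w = [[0.0 if i == j else math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
          for j in range(n)] for i in range(n)]
    for row in w:
        s = sum(row)
        if s > 0:
            row[:] = [x / s for x in row]
    return w


def combined_weights(components, coeffs):
    """Linear combination sum_k c_k * W_k of equally sized weight matrices."""
    n = len(components[0])
    return [[sum(c * w[i][j] for c, w in zip(coeffs, components)) for j in range(n)]
            for i in range(n)]
```

A short bandwidth concentrates weight on the nearest observations, while a long one spreads it to remote observations; this is one way to read the statement that the Gaussian function draws on both neighboring and remote observations.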
Table 7 The Comparison of goodness-of-fit tests
B | R | T | r | Number of draws | R² | σ²
0.01 | 0.55 | 0.44 | 2 | 10000 | 0.8521 | 0.0188
0.33 | 0.33 | 0.33 | 2 | 10000 | 0.7388 | 0.0247
1 | 0 | 0 | 2 | 10000 | 0.6042 | 0.0318
0.9 | 0.1 | 0 | 2 | 10000 | 0.7742 | 0.0231
0.8 | 0.2 | 0 | 2 | 10000 | 0.7380 | 0.0246
0.7 | 0.3 | 0 | 2 | 10000 | 0.7799 | 0.0233
0.6 | 0.4 | 0 | 2 | 10000 | 0.8190 | 0.0205
0.5 | 0.5 | 0 | 2 | 10000 | 0.7609 | 0.0239
0.4 | 0.6 | 0 | 2 | 10000 | 0.8269 | 0.0201
0.3 | 0.7 | 0 | 2 | 10000 | 0.8184 | 0.0208
0.2 | 0.8 | 0 | 2 | 10000 | 0.6272 | 0.0307
0.1 | 0.9 | 0 | 2 | 10000 | 0.6937 | 0.0277
0 | 1 | 0 | 2 | 10000 | 0.7732 | 0.0231
0.9 | 0 | 0.1 | 2 | 10000 | 0.6701 | 0.0281
0.8 | 0.1 | 0.1 | 2 | 10000 | 0.8519 | 0.0185
0.7 | 0.2 | 0.1 | 2 | 10000 | 0.7497 | 0.0229
0.6 | 0.3 | 0.1 | 2 | 10000 | 0.8034 | 0.0217
0.5 | 0.4 | 0.1 | 2 | 10000 | 0.7924 | 0.0222
0.4 | 0.5 | 0.1 | 2 | 10000 | 0.7289 | 0.0247
0.3 | 0.6 | 0.1 | 2 | 1000 | 0.8073 | 0.0216
0.2 | 0.7 | 0.1 | 2 | 10000 | 0.7709 | 0.0231
0.1 | 0.8 | 0.1 | 2 | 10000 | 0.8673 | 0.0181
0 | 0.9 | 0.1 | 2 | 10000 | 0.7908 | 0.0219
0.8 | 0 | 0.2 | 2 | 10000 | 0.6492 | 0.0302
0.7 | 0.1 | 0.2 | 2 | 10000 | 0.8287 | 0.0202
0.6 | 0.2 | 0.2 | 2 | 10000 | 0.7608 | 0.0239
0.5 | 0.3 | 0.2 | 2 | 10000 | 0.7830 | 0.0226
0.4 | 0.4 | 0.2 | 2 | 10000 | 0.7905 | 0.0229
0.3 | 0.5 | 0.2 | 2 | 10000 | 0.7793 | 0.0232
0.2 | 0.6 | 0.2 | 2 | 10000 | 0.7978 | 0.0215
0.1 | 0.7 | 0.2 | 2 | 10000 | 0.7935 | 0.0216
0 | 0.8 | 0.2 | 2 | 10000 | 0.7532 | 0.0243
0.7 | 0 | 0.3 | 2 | 10000 | 0.5494 | 0.0304
0.6 | 0.1 | 0.3 | 2 | 10000 | 0.7701 | 0.0232
0.5 | 0.2 | 0.3 | 2 | 10000 | 0.8037 | 0.0217
0.4 | 0.3 | 0.3 | 2 | 10000 | 0.8236 | 0.0201
0.3 | 0.4 | 0.3 | 2 | 10000 | 0.8631 | 0.0181
0.2 | 0.5 | 0.3 | 2 | 10000 | 0.7769 | 0.0231
0.1 | 0.6 | 0.3 | 2 | 10000 | 0.6044 | 0.0316
0 | 0.7 | 0.3 | 2 | 10000 | 0.7790 | 0.0233
0.6 | 0 | 0.4 | 2 | 10000 | 0.6479 | 0.0303
0.5 | 0.1 | 0.4 | 2 | 10000 | 0.7805 | 0.0227
0.4 | 0.2 | 0.4 | 2 | 10000 | 0.8407 | 0.0194
0.3 | 0.3 | 0.4 | 2 | 10000 | 0.7583 | 0.0242
0.2 | 0.4 | 0.4 | 2 | 10000 | 0.7935 | 0.0217
0.1 | 0.5 | 0.4 | 2 | 10000 | 0.6975 | 0.0275
0 | 0.6 | 0.4 | 2 | 10000 | 0.6672 | 0.0284
0.5 | 0 | 0.5 | 2 | 10000 | 0.6294 | 0.0309
0.4 | 0.1 | 0.5 | 2 | 10000 | 0.7082 | 0.0269
0.3 | 0.2 | 0.5 | 2 | 10000 | 0.8235 | 0.0203
0.2 | 0.3 | 0.5 | 2 | 10000 | 0.7598 | 0.0240
0.1 | 0.4 | 0.5 | 2 | 10000 | 0.7584 | 0.0242
0 | 0.5 | 0.5 | 2 | 10000 | 0.8085 | 0.0216
0.4 | 0 | 0.6 | 2 | 10000 | 0.6784 | 0.0279
0.3 | 0.1 | 0.6 | 2 | 10000 | 0.6979 | 0.0274
0.2 | 0.2 | 0.6 | 2 | 10000 | 0.8534 | 0.0186
0.1 | 0.3 | 0.6 | 2 | 10000 | 0.8285 | 0.0199
0 | 0.4 | 0.6 | 2 | 10000 | 0.7923 | 0.0217
0.3 | 0 | 0.7 | 2 | 10000 | 0.6750 | 0.0282
0.2 | 0.1 | 0.7 | 2 | 10000 | 0.7305 | 0.0245
0.1 | 0.2 | 0.7 | 2 | 10000 | 0.7753 | 0.0232
0 | 0.3 | 0.7 | 2 | 10000 | 0.7793 | 0.0230
0.2 | 0 | 0.8 | 2 | 10000 | 0.6906 | 0.0276
0.1 | 0.1 | 0.8 | 2 | 10000 | 0.7906 | 0.0218
0 | 0.2 | 0.8 | 2 | 10000 | 0.7389 | 0.0246
0.1 | 0 | 0.9 | 2 | 10000 | 0.6583 | 0.0290
0 | 0.1 | 0.9 | 2 | 10000 | 0.7673 | 0.0233
0 | 0 | 1 | 2 | 10000 | 0.6357 | 0.0308
B denotes the weight coefficient of the building effect matrix; R denotes the weight coefficient of the regional effect matrix; T denotes the weight coefficient of the temporal effect matrix.
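The "94%" goodness-of-fit figure can be checked directly against Table 7. The sketch copies the R² column in row order and counts the combinations whose R² lies strictly below that of the information entropy-based row (0.8521). Sixty-four of the 68 rows do, and 64/68 ≈ 0.941. Dividing by all 68 rows, including the entropy-based row itself, is our reading of how the percentage was formed; the helper name is hypothetical.

```python
# R² values copied from Table 7 in row order; row 1 is the information
# entropy-based combination (B = 0.01, R = 0.55, T = 0.44).
TABLE7_R2 = [
    0.8521, 0.7388, 0.6042, 0.7742, 0.7380, 0.7799, 0.8190, 0.7609, 0.8269, 0.8184,
    0.6272, 0.6937, 0.7732, 0.6701, 0.8519, 0.7497, 0.8034, 0.7924, 0.7289, 0.8073,
    0.7709, 0.8673, 0.7908, 0.6492, 0.8287, 0.7608, 0.7830, 0.7905, 0.7793, 0.7978,
    0.7935, 0.7532, 0.5494, 0.7701, 0.8037, 0.8236, 0.8631, 0.7769, 0.6044, 0.7790,
    0.6479, 0.7805, 0.8407, 0.7583, 0.7935, 0.6975, 0.6672, 0.6294, 0.7082, 0.8235,
    0.7598, 0.7584, 0.8085, 0.6784, 0.6979, 0.8534, 0.8285, 0.7923, 0.6750, 0.7305,
    0.7753, 0.7793, 0.6906, 0.7906, 0.7389, 0.6583, 0.7673, 0.6357,
]


def share_outperformed(target, values):
    """Count and fraction of listed combinations whose fit is strictly below target."""
    beaten = sum(1 for v in values if v < target)
    return beaten, beaten / len(values)


beaten, share = share_outperformed(TABLE7_R2[0], TABLE7_R2)
print(beaten, round(share, 3))
```

The same helper, applied to first-lag autocorrelations (with the comparison reversed, since lower is better), would reproduce the 92.6% figure if all 68 first-lag values were available; Table 6 lists only seven of them.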
Spatiotemporal Forecast
Unlike a non-spatiotemporal forecast, the spatiotemporal forecast proposed in this paper requires information on the geographic and temporal locations of the ex post observations. All out-of-sample transaction prices are predicted by a one-step-ahead static forecast (Zhao 2015). After an out-of-sample observation is added to the in-sample dataset, the sample size becomes n + 1, and the out-of-sample transaction price is calculated with the same methodology as the in-sample estimation. The forecast regression equations are:

$$(Y_1,\dots,Y_n,Y_F)' = (X_1,\dots,X_n,X_F)'\,\beta + \big(\varepsilon_1,\dots,\varepsilon_n,\operatorname{mean}(\varepsilon)\big)', \tag{28}$$

$$\big(\varepsilon_1,\dots,\varepsilon_n,\operatorname{mean}(\varepsilon)\big)' = \rho W\big(\varepsilon_1,\dots,\varepsilon_n,\operatorname{mean}(\varepsilon)\big)' + \big(\epsilon_1,\dots,\epsilon_n,\operatorname{mean}(\epsilon)\big)'. \tag{29}$$

In Eqs. (28) and (29), $n$ denotes the number of observations in the in-sample dataset; $W$ denotes an $(n+1)\times(n+1)$ forecast weight matrix constructed by the same method as the in-sample weight matrix, with the out-of-sample observation, whose spatiotemporal location is known a priori, included; $Y_i$, $i = 1,\dots,n$, denotes the dependent variable of the in-sample observations and $Y_F$ the dependent variable of the out-of-sample observation; $X_i$, $i = 1,\dots,n$, denotes the vector of independent variables of the in-sample observations and $X_F$ that of the out-of-sample observation, which is known a priori. In addition, the bias of the error terms is taken into account to provide a more accurate forecast: the algebraic average of the error terms from the in-sample estimation is used as a proxy for the out-of-sample error. Accordingly, mean(ε) denotes the algebraic average of the residuals $\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n$ obtained from the in-sample estimation, and mean(ϵ) denotes the algebraic average of the residuals $\epsilon_1,\epsilon_2,\dots,\epsilon_n$ obtained from the in-sample estimation. The coefficient $\rho$ and the coefficient vector $\beta$ are obtained from the in-sample estimation. Among these variables, only $Y_F$ is unknown and needs to be estimated. This one-step-ahead static forecast is conducted for each out-of-sample observation. Out-of-sample forecast accuracy is assessed by the square root of the mean squared error (SMSE):
$$\mathrm{SMSE}(\hat{\theta}) = \sqrt{E\big[(\hat{\theta}-\theta)^{2}\big]}, \tag{30}$$

where $\hat{\theta}$ denotes the estimated value of the variable and $\theta$ denotes the recorded value of the variable in the out-of-sample dataset.
In this paper, θ refers to the regressand. The forecast methodology proposed by Zhao (2015) provides an ad hoc out-of-sample forecast for an ex post observation, and is therefore better suited to appraisers who are interested in the precise valuation of individual properties.
Bootstrap
A bootstrap simulation is conducted to test the generality of the proposed model. The bootstrap results of four models are compared: the non-spatiotemporal model, the equally weighted model (B = 0.33, R = 0.33, T = 0.33), the information entropy-based model, and the optimally weighted model (B = 0.1, R = 0.8, T = 0.1). The original sample is divided into two equal parts: the first 12,465 observations in chronological order are used to generate the in-sample spatiotemporal distribution, and the remaining 12,465 observations to generate the out-of-sample spatiotemporal distribution. Fifty in-sample and fifty out-of-sample repetitions are conducted; in each repetition, 10,000 observations are resampled from the in-sample distribution and 2000 from the out-of-sample distribution. The four models are evaluated for predictive accuracy, assessed by the square root of the mean squared error (SMSE):
$$\mathrm{SMSE} = \frac{1}{50}\sum_{j=1}^{50}\sqrt{\frac{1}{2000}\sum_{i=1}^{2000}\big(Y_{ij}-\hat{Y}_{ij}\big)^{2}}, \tag{31}$$

where $Y_{ij}$ denotes the ith resampled out-of-sample observation generated from the out-of-sample distribution in the jth repetition, and $\hat{Y}_{ij}$ denotes the forecast of $Y_{ij}$ based on the in-sample distribution.
The Out-of-Sample Forecast
In this paper, 2000 out-of-sample observations are used for prediction. The comparison includes four models: the non-spatiotemporal model, the equally weighted model, the information entropy-based model and the optimally weighted model. All of them use the forecast methodology introduced in the previous subsection to estimate the prices of future dwelling transactions. Table 8 shows the results of the out-of-sample forecasts.
Table 8 The out-of-sample forecast
Methodology | SMSE
Non-spatiotemporal model | 0.07362
Equally weighted model | 0.04632
Information entropy-based model | 0.01353
Optimally weighted model | 0.01137
Table 9 Bootstrap simulation forecast results
Model | SMSE
Non-spatiotemporal model | 0.08736
Equally weighted model | 0.05180
Information entropy-based model | 0.02843
Optimally weighted model | 0.02539
The results in Table 8 show that the optimally weighted approach provides the most accurate forecast, whereas the non-spatiotemporal model provides the least accurate one. The equally weighted model and the information entropy-based model both forecast better than the non-spatiotemporal model, and the optimally weighted model performs slightly better than the information entropy-based model. The bootstrap simulation results in Table 9 show that the performance of the information entropy-based model is close to that of the optimally weighted model, and that it forecasts more accurately than the equally weighted and non-spatiotemporal models. As expected, the non-spatiotemporal model produces the highest SMSE, indicating the least accurate ex post forecast. This comparison indicates that the proposed information entropy-based model produces both a better fit and, judging by the bootstrap simulation results, better overall forecast performance than the equally weighted and non-spatiotemporal models, and that it performs close to the optimally weighted model. It is worth noting that using a statistical iteration process to obtain the optimal weight coefficients is much more complicated and time-consuming, particularly when the number of factor matrices is large. More importantly, the optimal combination of weight coefficients obtained from such an iteration process may lack economic intuition, so it contributes little theoretically to the research. In conclusion, compared with the statistical iteration method, the information entropy-based approach provides a more reasonable and practical way to determine the weight coefficient of each weight matrix.
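Eqs. (30) and (31) can be made concrete with a short sketch. `smse` implements Eq. (30) on paired forecasts and recorded values, and `bootstrap_smse` averages it over repetitions in which the in-sample and out-of-sample pools are resampled with replacement, as in Eq. (31). Model fitting and forecasting are passed in as callables; the trivial mean model included here is a hypothetical placeholder for the spatiotemporal estimation and one-step-ahead forecast, not the paper's procedure, and simple resampling of (x, y) pairs is an assumption about how the "spatiotemporal distributions" are sampled.

```python
import math
import random


def smse(predicted, observed):
    """Eq. (30): square root of the mean squared forecast error."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(sq / len(observed))


def bootstrap_smse(in_pool, out_pool, fit, predict,
                   reps=50, n_in=10_000, n_out=2_000, seed=0):
    """Eq. (31): SMSE averaged over bootstrap repetitions.

    in_pool / out_pool are lists of (x, y) pairs; fit(sample) returns a model
    and predict(model, x) returns a forecast of y.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        train = rng.choices(in_pool, k=n_in)   # resample with replacement
        test = rng.choices(out_pool, k=n_out)
        model = fit(train)
        preds = [predict(model, x) for x, _ in test]
        total += smse(preds, [y for _, y in test])
    return total / reps


def mean_fit(sample):
    """Hypothetical benchmark model: the mean of the resampled in-sample prices."""
    return sum(y for _, y in sample) / len(sample)


def mean_predict(model, x):
    return model  # ignores x; stands in for the real forecast step
```

With the defaults (50 repetitions, 10,000 in-sample and 2000 out-of-sample draws), the loop mirrors the design described above; swapping in the four competing models' fit and forecast steps would give one SMSE per model, as in Table 9.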
Conclusion
The novelty of this paper rests on the use of an information entropy-based approach to estimate spatiotemporal dependence in the real estate market. The approach could also be used to construct real estate market indices. On the one hand, the proposed model provides information about price evolution over time, which could be used as a surface to construct a price index for a location; on the other hand, it provides information about how prices vary across locations, which could yield a location surface at any point in time. These surfaces are ideal sources for constructing a property market index. The building effect usually does not exist among detached single-family homes. In a multi-unit residential market, however, dwelling units in the same building share some common characteristics, so a building effect analysis is required to measure the spatial relations between units in the same building; this paper uses doorplate numbers for that purpose. The proposed information entropy-based model with Bayesian estimation not only significantly alleviates heteroscedasticity but also reduces autocorrelation to some extent if an appropriate number of draws is chosen. A potential reason is that the proposed approach, through the Gaussian function, incorporates information on both neighboring and remote observations, which offsets the effect of the homogeneity shared by units in the same building. The proposed approach can also incorporate dummy (binary) variables, which are frequently used in real estate studies to describe the characteristics of properties. Incorporating dummy variables would be a potential improvement on the current study, because they can help construct a price surface that further controls for spatiotemporal heterogeneity. The current study can be extended to other economic data with attribute, spatial or temporal characteristics; for example, Case, Rosen and Hines (1993) explain public expenditures by incorporating spatial and temporal information.
Moreover, the proposed approach is not restricted to the dimensions introduced here. It could be extended to further dimensions to incorporate information that conventional methodologies cannot, and so enhance the accuracy of estimation and prediction. The proposed methodology could easily be applied to commercial real estate markets, which typically consist of different types of buildings in a compact area. It can also be applied to repeat-sales datasets, since the spatial effect between different transactions of the same dwelling is zero while the temporal effect between them is non-zero; the proposed information entropy-based approach therefore remains valid. Selecting an appropriate measurement to determine the weight coefficients of the factors is a critical step in the information entropy-based approach. This paper uses the dependence between observations on a spatiotemporal scale to determine the weight coefficient of each factor; further studies may use other appropriate measurements within the information entropy-based method.
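To make the role of the "measurement" concrete, the sketch below shows the generic entropy weight method under stated assumptions: given a non-negative matrix of n observations by m factors, each column is normalized to proportions p_ij, its entropy is e_j = −(1/ln n) Σ_i p_ij ln p_ij, and the weights are w_j = (1 − e_j)/Σ_k (1 − e_k), so factors whose values differ more across observations receive more weight. The measurement used in this paper (spatiotemporal dependence between observations) is defined in an earlier section, so the input matrix is a hypothetical placeholder and the output does not reproduce Table 5.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns (factors) of a non-negative measurement matrix.

    matrix[i][j] is the measurement of factor j for observation i. Returns
    weights summing to one; lower-entropy (more informative) columns get more weight.
    """
    n, m = len(matrix), len(matrix[0])
    if n < 2:
        raise ValueError("need at least two observations")
    k = 1.0 / math.log(n)  # scales each entropy into [0, 1]
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        if total <= 0:
            raise ValueError("each column needs a positive sum")
        p = [x / total for x in col]
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)  # 0 * log 0 taken as 0
        divergences.append(1.0 - e)
    s = sum(divergences)
    if s < 1e-12:  # all columns uniform: no factor is more informative
        return [1.0 / m] * m
    return [d / s for d in divergences]
```

A column whose values are identical across observations has entropy 1 and receives zero weight, while a column concentrated on a single observation has entropy 0 and receives the largest weight.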
0.0277
0.0822
0.0041
0.0119
0.0358*
−0.0652
0.1493*
0.0125*
−0.0168
0.0101*
Total Number of Rooms
Elementary School Quality
Junior High School Quality
High School Quality
Walk Score
0.0321
sigma^2
0.0247
0.7388
0.0091
0.0016
0.0117
0.0022
0.0196
0.0998
0.0307
0.0603
0.0654
0.0002
0.0000
0.2486
0.0188
0.8521
0.8540
10,000
0.44
0.55
0.01
2
0.8697*
0.0098*
−0.0131
0.0030*
0.1506*
−0.0707
0.0349*
0.0991*
−0.1541*
0.0001
−0.0002
4.4750*
Coefficient
0.0055
0.0017
0.0107
0.0011
0.0194
0.0700
0.0302
0.0611
0.0658
0.0002
0.0001
0.2401
Std deviation
Information entropy-based method
0.0277
0.6937
0.6954
10,000
0
0.9
0.1
2
0.8284*
0.0099*
−0.0134
0.0035*
0.1475*
−0.0645
0.0346*
0.1049*
−0.1668*
0.0001
−0.0001*
4.5239*
Coefficient
B denotes the weight coefficient of the building effect matrix. R denotes the weight coefficient of the regional effect matrix. T denotes the weight coefficient of the temporal effect matrix
0.0028
0.0017
0.0107
0.0012
0.0200
0.0773
0.0281
0.0563
0.0581
0.0003
0.0000
0.2673
Std deviation
Information entropy-based method
N/A: there is no Rho estimate for the non-spatiotemporal method. NOD denotes the number of draws in the Bayesian estimation. * indicates significance at the 5% level.
0.6019
Rbar-squared
0.7401
10,000
–
0.6030
0.33
–
T
NOD
0.33
–
R
R-squared
0.33
–
B
0.8491* 2
N/A
–
0.0095*
−0.0162
0.0069*
0.1502*
−0.0808
0.0313*
0.1119*
−0.1723*
0.0001
Rho
N/A
4.4974* 0.0001*
r
0.0015
0.0169
0.0588
Number of Floors
0.0549
−0.1557*
0.0793*
0.0001
Age
Number of Bathrooms
0.0002
−0.0001*
Number of Bedrooms
0.2067
0.0000
4.9109*
Size
Std deviation
Coefficient
Coefficient
Std deviation
Information entropy-based method
Non-spatiotemporal method
Const
Variable
Table 10 In-sample estimation
Appendix
0.0000
0.0023
0.0561
0.0148
0.0981
0.0196
0.0026
0.0127
−0.0001*
0.0001
−0.1616*
0.0938*
0.0330*
−0.0815
0.1535*
0.0052*
Age
Number of bedrooms
Number of bathrooms
Total number of rooms
Number of floors
Elementary school quality
Junior high school quality
0.7497
0.0229
Rbar-squared
sigma^2
0.0079
0.0281
0.6701
0.6722
10,000
0.9
2
0.8113*
0.0075
0.0016
0.0126
0.0031
0.0182
0.0882
0.0130
0.0413
0.0593
0.0002
0.0000
0.2463
0.0269
0.7082
0.7103
10,000
0.5
0.1
0.4
2
0.8263*
0.0097*
−0.0166
0.0064*
0.1485*
−0.0811
0.0344*
0.1009*
−0.1684*
0.0001
−0.0001
4.5314*
Coefficient
0.0073
0.0018
0.0103
0.0031
0.0187
0.0906
0.0168
0.0462
0.0607
0.0002
0.0001
0.2230
Std deviation
Information entropy-based method
0.0002
0.0231
0.7742
0.7764
10,000
0
0.1
0.9
2
0.8124*
0.0099*
−0.0126
0.0069*
0.1481*
−0.0764
0.0236*
0.0838*
R denotes the weight coefficient of the regional effect matrix. T denotes the weight coefficient of the temporal effect matrix.
0.0092
0.0018
0.0110
0.0031
0.0206
0.0926
0.0103
0.0388
0.0614
−0.0001 −0.1448*
0.0000
0.2350
Std deviation
−0.0001*
4.4798*
Coefficient
Information entropy-based method
NOD denotes the number of draws in the Bayesian estimation. * indicates significance at the 5% level. B denotes the weight coefficient of the building effect matrix.
0.7515
R-squared
0.1
0.1
10,000
0.7
0.2
B
R
T
2
NOD
0
0.8025*
r
−0.0133 0.0095*
Rho
0.0016
−0.0156
0.0095*
High school quality
0.0064*
0.1211*
−0.0910
0.0284*
0.0909*
−0.1457*
0.0001
−0.0001*
4.4948*
Walk score
0.0395
0.2466
4.4972*
Size
Std deviation
Coefficient
Coefficient
Std deviation
Information entropy-based method
Information entropy-based method
Const
Variable
Table 11 In-sample estimation
0.0000
0.0003
0.0594
0.0165
0.0931
0.0185
0.0031
0.0122
−0.0001*
0.0001
−0.1729*
0.1030*
0.0382*
−0.0783
0.1493*
0.0066*
Age
Number of bedrooms
Number of bathrooms
Total number of rooms
Number of floors
Elementary school quality
Junior high school quality
0.0002
0.6975
0.0275
Rbar-squared
sigma^2
0.0058
0.0303
0.6479
0.6502
10,000
0.6
2
0.8275*
0.0067
0.0017
0.0147
0.0066
0.0210
0.0886
0.0035
0.0370
0.0231
0.7769
0.7785
10,000
0.3
0.5
0.2
2
0.8409*
0.0097*
−0.0146
0.0050*
0.1516*
−0.0863
0.0351*
0.0972*
−0.1551*
0.0001
−0.0001*
4.4844*
Coefficient
0.0085
0.0017
0.0123
0.0024
0.0194
0.0861
0.0159
0.0406
0.0610
0.0002
0.0000
0.2474
Std deviation
Information entropy-based method
0.0230
0.7793
0.7814
10,000
0.7
0.3
0
2
0.8259*
0.0098*
−0.0122
0.0045*
0.1479*
−0.0791
0.0295*
0.0897*
−0.1443*
−0.0001
−0.0001*
4.4882*
Coefficient
R denotes the weight coefficient of the regional effect matrix. T denotes the weight coefficient of the temporal effect matrix.
0.0097
0.0016
0.0128
0.0021
0.0213
0.1013
0.0183
0.0399
0.0639
0.0003
0.0000
0.2640
Std deviation
Information entropy-based method
NOD denotes the number of draws in the Bayesian estimation. * indicates significance at the 5% level. B denotes the weight coefficient of the building effect matrix.
0.6997
R-squared
0.4
0.4
10,000
0.1
0.5
B
R
T
2
NOD
0
0.8841*
r
−0.0003 0.0072*
Rho
0.0018
−0.0127
0.0095*
High school quality
0.0139*
0.1332*
−0.0439
0.0081*
0.0794*
0.0356
−0.0001 −0.0947*
0.0001
0.2518
−0.0002
4.8638*
Walk score
0.0554
0.2692
4.5059*
Size
Std deviation
Coefficient
Coefficient
Std deviation
Information entropy-based method
Information entropy-based method
Const
Variable
Table 12 In-sample estimation
References
Anselin, L. (1988). Spatial econometrics: methods and models. Boston: Kluwer Academic Publishers.
Arnold, S. F. (1993). Gibbs sampling. In C. R. Rao (Ed.), Handbook of statistics. Amsterdam: North Holland.
Balian, R. (2004). Entropy – a protean concept. In J. Dalibard et al. (Eds.), Poincaré seminar 2003: Bose–Einstein condensation – entropy. Basel: Birkhäuser.
Can, A., & Megbolugbe, I. (1997). Spatial dependence and house price index construction. Journal of Real Estate Finance and Economics, 14, 203–222.
Case, A. C., Rosen, H. S., & Hines, J. R. (1993). Budget spillovers and fiscal policy interdependence: evidence from the states. Journal of Public Economics, 52, 285–307.
Chan, L. K., Kao, H. P., Ng, A., & Wu, M. L. (1999). Rating the importance of customer needs in quality function deployment by fuzzy and entropy methods. International Journal of Production Research, 37(11), 2499–2518.
Cleveland, W. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.
Dyer, J. S., & Sarin, R. K. (1979). Measurable multi-attribute value functions. Operations Research, 27, 810–822.
Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002). Geographically weighted regression: the analysis of spatially varying relationships. Chichester; Hoboken: Wiley.
French, N. (2007). Valuation uncertainty – common professional standards and methods. Paper presented at the 13th Pacific-Rim Real Estate Society Conference, Fremantle, Australia.
Ge, X. J., Lam, K. Y., & Lam, K. C. (2005). Entropy method in the rank of attributes for property value. Paper presented at the proceedings of the 3rd international structural engineering and construction conference, Shunan, Japan.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721–741.
Geweke, J. (1993). Bayesian treatment of the independent Student-t linear model. Journal of Applied Econometrics, 8, 19–40.
Goodman, A., & Thibodeau, T. (1995). Dwelling age heteroscedasticity in hedonic house price equations. Journal of Housing Research, 6, 25–42.
Hall, R. E. (1971). The measurement of quality change from vintage price data. In Z. Griliches (Ed.), Price indexes and quality change. Cambridge: Harvard University Press.
Hulten, C. R., & Wykoff, F. C. (1981). The estimation of economic depreciation using vintage asset prices: an application of the Box-Cox power transformation. Journal of Econometrics, 15, 376–396.
Kapur, J. N. (1989). Maximum-entropy models in science and engineering. New York: Wiley.
LeSage, J. (1999). The theory and practice of spatial econometrics. Working paper, Department of Economics, University of Toledo, Toledo.
Ord, K. (1975). Estimation methods for models of spatial interaction. Journal of the American Statistical Association, 70, 120–126.
Pace, R. K., & Gilley, O. W. (1997). Using the spatial configuration of the data to improve estimation. The Journal of Real Estate Finance and Economics, 14(3), 333–340.
Pace, R. K., Barry, R., Clapp, J., & Rodriguez, M. (1998). Spatial–temporal estimation of neighborhood effects. Journal of Real Estate Finance and Economics, 17, 15–33.
Pace, R. K., Barry, R., Gilley, O. W., & Sirmans, C. F. (2000). A method for spatial–temporal forecasting with an application to real estate prices. International Journal of Forecasting, 16, 229–246.
Palmer, J. L., & Pettit, L. I. (1996). Risks of using improper priors with Gibbs sampling and autocorrelated errors. Journal of Computational and Graphical Statistics, 5(3), 245–249.
Pandurangan, G., & Upfal, E. (2007). Entropy. ACM Transactions on Algorithms, 3(1), Article No. 7.
Shannon, C. E., & Weaver, W. (1963). The mathematical theory of communication. Urbana: University of Illinois Press.
Sun, H., Tu, Y., & Yu, S. (2005). Spatio-temporal autoregressive model for multi-unit residential market analysis. The Journal of Real Estate Finance and Economics, 31(2), 155–187.
Tu, Y., Sun, H., & Yu, S. (2004). Transaction-based office price indexes: a spatiotemporal modeling approach. Real Estate Economics, 32(2), 297–328.
Zanakis, S. H., Solomon, A., Wishart, N., & Dublish, S. (1998). Multi-attribute decision making: a simulation comparison of select methods. European Journal of Operational Research, 107, 507–529.
Zhao, J. (2015). The anisotropic spatiotemporal estimation of housing prices. Journal of Real Estate Finance and Economics, 50(4), 484–516.