Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data
Introduction
The continuous rise in interest rates since 2005 has markedly slowed the housing market in the US. Lehman Brothers Holdings, Inc., a US investment bank, was forced into bankruptcy on September 15, 2008 because of excessive borrowing of financial instruments that were devalued by a serious reduction in housing prices. The insolvency of Lehman Brothers Holdings, Inc. and the sub-prime mortgage crisis intensified the slowdown of the real economy and the decline of asset values. These events depressed the global real estate market and housing prices and sparked a global financial crisis.
House price movements are tracked by the Standard & Poor’s Case-Shiller home price indices and the housing price index of the Office of Federal Housing Enterprise Oversight (OFHEO), which reflect the trends of the US housing market. According to the Case-Shiller home price indices, housing prices have declined by approximately 30–60% in major US cities since the sub-prime mortgage crisis. In Los Angeles, home prices peaked in September 2006 and continued to fall until the end of 2011. In June 2011, housing prices in Los Angeles fell to 61.46% compared to September 2006, when prices were highest. Similarly, in New York, housing prices peaked in June 2006 and continued to fall until the end of 2011. In June 2011, home prices in New York fell to 29.56% compared to June 2006, when prices were highest.
Since November 2012, the US housing market has been recovering rapidly because of the decreasing inventory of houses, the increasing demand for new houses following employment growth, and government policies supporting the real estate market. Sustaining this recovery requires timely real estate policies from the government. Therefore, housing price indices that capture the housing market trend must be researched and developed. Housing price indices can be important indicators for stakeholders in the real estate market, including real estate agents, appraisers, assessors, mortgage lenders, brokers, property developers, investors and fund managers, and policy makers, as well as actual and potential homeowners. Moreover, a housing price prediction model would greatly assist in predicting future housing prices and establishing real estate policies. This study uses machine learning algorithms as a research methodology to develop a housing price prediction model.
Machine learning has been used in disciplines such as business, computer engineering, industrial engineering, bioinformatics, medicine, pharmaceuticals, physics, and statistics to gather knowledge and predict future events. With the recent growth in the real estate market, machine learning can play an important role in predicting the price of a property. However, few researchers have applied machine learning algorithms to the selling price of real estate properties. In the real estate market, real estate agents, buyers, and sellers are all important players. Homeowners who wish to sell their townhouse can be represented by a real estate agent. The agent enters the information about the seller’s townhouse into a Multiple Listing Service (MLS). Other real estate agents can then access this information as an active listing. Sellers want to sell their townhouses at their asking price. Conversely, buyers attempt to pay less than the listing price to close the transaction. Therefore, the listing price that the seller originally expects can differ from the closing price that the buyer pays. From a seller’s point of view, if the closing price is greater than or equal to the listing price, the deal is profitable. If the closing price is less than the listing price, then it could be a loss for the seller. We use machine learning algorithms as a tool to predict whether the closing price will be greater or less than the listing price.
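The prediction target described above can be sketched as a binary label derived from the two prices. This is a minimal illustration, not the authors' code: the function name `label_sale`, the label names "high" and "low", and the example prices are all assumptions introduced here.

```python
from typing import Literal

# Hypothetical label names; the source only distinguishes the two outcomes.
Label = Literal["high", "low"]


def label_sale(listing_price: float, closing_price: float) -> Label:
    """Return "high" when the closing price meets or exceeds the listing price."""
    if listing_price <= 0:
        raise ValueError("listing_price must be positive")
    return "high" if closing_price >= listing_price else "low"


# Illustrative (listing, closing) pairs, not taken from the Fairfax County data.
sales = [(450_000, 455_000), (380_000, 380_000), (520_000, 498_500)]
labels = [label_sale(listing, closing) for listing, closing in sales]
print(labels)
```

A closing price equal to the listing price is grouped with the profitable case, following the seller's point of view stated in the text.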
Housing price valuation is one of the most important trading decisions affecting national real estate policy. In this study, we create models using machine learning algorithms such as C4.5, RIPPER (Repeated Incremental Pruning to Produce Error Reduction), Naïve Bayesian, and AdaBoost (Adaptive Boosting) to predict housing prices.
The remainder of this paper is organized as follows. Section 2 reviews some background research studies on housing price prediction. Section 3 explains experimental design and analysis procedure. Experimental results are presented and analyzed in Section 4, and finally, our concluding remarks are provided in Section 5.
Section snippets
Housing price prediction models
An analysis of the housing market and housing price valuation literature indicates two principal research trends: the use of the hedonic-based regression approach and artificial intelligence techniques for developing housing price prediction models. For several decades, various hedonic-based methods were utilized to identify the relationship between house prices and housing characteristics (Adair et al., 1996, Selim, 2009). Meese and Wallace (2003) developed hedonic-based regression approaches
Experimental design
This section describes the experimental setup used to test the performance of machine learning algorithms for classification. We started by merging real estate, public school rating, and mortgage rate data into an integrated dataset. Four machine learning classifiers were selected and tested in the WEKA data mining software. To determine the performance of each classifier, we used two performance tests: a three-way data split with 10 folds and 10-fold cross-validation.
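The two performance tests can be sketched with index bookkeeping alone. The sketch below is an assumption-laden illustration, not the WEKA procedure used in the study: the helper names `k_fold_indices` and `three_way_split`, the 20% validation fraction, the fixed seed, and the record count of 50 are all hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each record lands in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def three_way_split(
    train: List[int], validation_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split a fold's non-test records into (training, validation) index lists."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    # The validation fraction is an assumed value; the source does not state one.
    cut = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[cut:], shuffled[:cut]


# Example with 50 hypothetical records.
splits = list(k_fold_indices(50, k=10))
train, test = splits[0]
fit_part, validation_part = three_way_split(train)
print(len(train), len(test), len(fit_part), len(validation_part))
```

In plain 10-fold cross-validation a classifier is trained on `train` and scored on `test` for each fold. In the three-way variant the validation part is held back from training to guide model choice, and the test fold is touched only for the final error estimate.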
Experimental results
We applied the two performance tests to the machine learning classifiers. The three-way split was applied to the C4.5, RIPPER, and Naïve Bayesian methods. The 10-fold cross-validation was applied to C4.5, RIPPER, Naïve Bayesian, and AdaBoost. Tables 5, 6, and 7 display the results for the three-way split. For each fold, the training and validation pair that returned the minimum error is displayed. The last column is the test error based on the test set of each fold. Fig. 4 presents a
Conclusions
In this study, several machine learning algorithms are used to develop a prediction model for housing prices. We test the performance of these techniques by measuring how accurately each one can predict whether the closing price is greater than or less than the listing price. Four machine learning algorithms, C4.5, RIPPER, Naïve Bayesian, and AdaBoost, are selected and tested to determine which produces the highest accuracy. We find that the performance of
References (17)
- et al., A hybrid fuzzy regression-fuzzy cognitive map algorithm for forecasting and optimization of housing market fluctuations, Expert Systems with Applications (2012)
- A prediction comparison of housing sales prices by parametric versus semi-parametric regressions, Journal of Housing Economics (2004)
- et al., Housing price forecasting based on genetic algorithm and support vector machine, Expert Systems with Applications (2011)
- et al., The use of fuzzy logic in predicting house selling price, Expert Systems with Applications (2010)
- Determinants of house prices in Turkey: Hedonic regression versus artificial neural network, Expert Systems with Applications (2009)
- New empirical evidence on heteroscedasticity in hedonic housing models, Journal of Housing Economics (2004)
- et al., Real estate price forecasting based on SVM optimized by PSO, Optik-International Journal for Light and Electron Optics (2014)
- et al., Hedonic modeling, housing submarkets and residential valuation, Journal of Property Research (1996)