Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data

https://doi.org/10.1016/j.eswa.2014.11.040

Highlights

  • Housing price valuation is one of the most important trading decisions.

  • This study uses machine learning to develop housing price prediction models.

  • This study analyzes the housing data of 5359 townhouses in Fairfax County, VA.

  • 10-fold cross-validation was applied to C4.5, RIPPER, Naïve Bayesian, and AdaBoost.

  • RIPPER outperformed the other housing price prediction models in all tests.

Abstract

US house prices are tracked by the Standard & Poor’s Case-Shiller home price indices and the housing price index of the Office of Federal Housing Enterprise Oversight (OFHEO), which reflect trends in the US housing market. In addition to these housing price indices, a housing price prediction model can greatly assist in predicting future housing prices and establishing real estate policies. This study uses machine learning algorithms as a research methodology to develop a housing price prediction model. To improve the accuracy of housing price prediction, this paper analyzes the housing data of 5359 townhouses in Fairfax County, Virginia, gathered by the Multiple Listing Service (MLS) of the Metropolitan Regional Information Systems (MRIS). We develop housing price prediction models based on the machine learning algorithms C4.5, RIPPER, Naïve Bayesian, and AdaBoost and compare their classification accuracy. We then propose an improved housing price prediction model to assist house sellers and real estate agents in making better-informed decisions based on house price valuation. The experiments demonstrate that the RIPPER algorithm consistently outperforms the other models in housing price prediction accuracy.

Introduction

The continuous rise in interest rates since 2005 has markedly slowed the US housing market. Lehman Brothers Holdings, Inc., a US investment bank, was forced into bankruptcy on September 15, 2008, because of excessive borrowing tied to financial instruments that were devalued by a serious decline in housing prices. The insolvency of Lehman Brothers Holdings, Inc. and the sub-prime mortgage crisis intensified the slowdown of the real economy and the decline in asset values. Together, they depressed global real estate markets and housing prices and sparked a global financial crisis.

US house prices are tracked by the Standard & Poor’s Case-Shiller home price indices and the housing price index of the Office of Federal Housing Enterprise Oversight (OFHEO), which reflect trends in the US housing market. According to the Case-Shiller home price indices, housing prices in major US cities have declined by approximately 30–60% since the sub-prime mortgage crisis. In Los Angeles, home prices peaked in September 2006 and continued to fall until the end of 2011; in June 2011, they fell to 61.46% compared with their September 2006 peak. Similarly, in New York, housing prices peaked in June 2006 and continued to fall until the end of 2011; in June 2011, they fell to 29.56% compared with their June 2006 peak.

Since November 2012, the US housing market has been recovering rapidly because of the decreasing inventory of houses, the increasing demand for new houses following employment growth, and government policies supporting the real estate market. Sustaining this recovery requires timely real estate policies from the government. Therefore, housing price indices that capture housing market trends need to be researched and developed. Housing price indices can be important indicators for stakeholders in the real estate market, including real estate agents, appraisers, assessors, mortgage lenders, brokers, property developers, investors, fund managers, and policy makers, as well as for actual and potential homeowners. Moreover, a housing price prediction model would greatly assist in predicting future housing prices and establishing real estate policies. This study uses machine learning algorithms as a research methodology to develop such a model.

Machine learning has been used in disciplines such as business, computer engineering, industrial engineering, bioinformatics, medicine, pharmaceuticals, physics, and statistics to extract knowledge and predict future events. With the recent growth of the real estate market, machine learning can play an important role in predicting property prices. However, few researchers have used machine learning algorithms to predict the selling price of real estate properties. In the real estate market, real estate agents, buyers, and sellers are all important players. Homeowners who wish to sell their townhouse can be represented by a real estate agent, who enters information about the townhouse into a Multiple Listing Service (MLS). Other real estate agents can then access this information as an active listing. Sellers want to sell their townhouses at the asking price, whereas buyers try to pay less than the listing price to close the transaction. Therefore, the closing price that the buyer pays can differ from the listing price that the seller originally expects. From the seller’s point of view, the deal is profitable if the closing price is greater than or equal to the listing price, and it could be a loss if the closing price is less than the listing price. We use machine learning algorithms to predict whether the closing price will be at least the listing price or below it.
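The prediction target described above can be expressed as a binary class label. The sketch below assumes a label of 1 when the closing price is greater than or equal to the listing price and 0 otherwise; the exact class encoding used in the study is not given here, and the `Listing` record and its field names are hypothetical, not the MLS/MRIS schema.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical record; field names are illustrative, not the MLS schema.
    listing_price: float
    closing_price: float


def price_label(listing: Listing) -> int:
    """Return 1 if the sale closed at or above the listing price
    (profitable for the seller), otherwise 0."""
    return 1 if listing.closing_price >= listing.listing_price else 0


sales = [
    Listing(listing_price=400_000, closing_price=410_000),  # above list
    Listing(listing_price=400_000, closing_price=400_000),  # at list
    Listing(listing_price=400_000, closing_price=385_000),  # below list
]
labels = [price_label(s) for s in sales]
print(labels)  # [1, 1, 0]
```

Treating "at list" as the positive class follows the seller's-view rule stated above that a closing price equal to the listing price is still profitable.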

Housing price valuation is widely regarded as one of the most important trading decisions, and it affects national real estate policy. In this study, we create models using the machine learning algorithms C4.5, RIPPER (Repeated Incremental Pruning to Produce Error Reduction), Naïve Bayesian, and AdaBoost (Adaptive Boosting) to predict housing prices.
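Of the four algorithms named above, Naïve Bayesian is the simplest to show compactly. The sketch below is a generic Naïve Bayes classifier for categorical features with Laplace (add-one) smoothing; it is not WEKA's implementation, and the feature values (bedroom and school-rating buckets) are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[(feature_index, value, label)] -> frequency in training data
        self.counts = Counter()
        # distinct values seen for each feature, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(j, v, y)] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for y, cy in self.class_counts.items():
            # log P(y) + sum_j log P(x_j | y), with add-one smoothing
            score = math.log(cy / self.n)
            for j, v in enumerate(row):
                num = self.counts[(j, v, y)] + 1
                den = cy + len(self.values[j])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Hypothetical toy data: (bedroom bucket, school-rating bucket) -> label
rows = [("3br", "high"), ("3br", "high"), ("2br", "low"), ("2br", "low"), ("3br", "low")]
labels = [1, 1, 0, 0, 0]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("3br", "high")))  # 1
print(model.predict(("2br", "low")))   # 0
```

Working in log space avoids underflow when many small conditional probabilities are multiplied; smoothing keeps an unseen feature value from zeroing out a class.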

The remainder of this paper is organized as follows. Section 2 reviews background research on housing price prediction. Section 3 explains the experimental design and analysis procedure. Section 4 presents and analyzes the experimental results, and Section 5 provides concluding remarks.

Section snippets

Housing price prediction models

An analysis of the housing market and housing price valuation literature indicates two principal research trends: the use of the hedonic-based regression approach and artificial intelligence techniques for developing housing price prediction models. For several decades, various hedonic-based methods were utilized to identify the relationship between house prices and housing characteristics (Adair et al., 1996, Selim, 2009). Meese and Wallace (2003) developed hedonic-based regression approaches
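To illustrate the hedonic idea of relating price to housing characteristics, the sketch below fits a log-linear hedonic regression, ln(price) = b0 + b1·area + b2·bedrooms, by ordinary least squares using the normal equations. The two characteristics and the synthetic prices are assumptions for illustration only; this is not the specification used by Meese and Wallace (2003) or any other cited study.

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved by Gaussian elimination with partial pivoting.
    Each row of X must start with 1.0 for the intercept."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


# Hypothetical townhouses: (living area in 1,000 sq ft, bedrooms)
homes = [(1.2, 2), (1.5, 3), (1.8, 3), (2.1, 4), (2.4, 3), (1.6, 2)]
# Synthetic prices generated from ln(price) = 11 + 0.4*area + 0.05*bedrooms
prices = [math.exp(11 + 0.4 * a + 0.05 * b) for a, b in homes]
X = [[1.0, a, b] for a, b in homes]
y = [math.log(p) for p in prices]
b0, b_area, b_bed = ols(X, y)
print(round(b0, 4), round(b_area, 4), round(b_bed, 4))  # 11.0 0.4 0.05
```

In a log-linear hedonic model, each coefficient is read as an approximate percentage effect: here b_bed = 0.05 means roughly a 5% higher price per additional bedroom, holding living area fixed.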

Experimental design

This section describes how we set up the experiment to test the performance of machine learning algorithms for classification. We started by merging real estate, public school ratings, and mortgage rate data into an integrated dataset. Four machine learning classifiers were selected and tested in the WEKA data mining software. To evaluate each classifier, we used two performance tests: a three-way data split with 10 folds, and 10-fold cross-validation.
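The 10-fold cross-validation test named above can be sketched as follows: records are shuffled, dealt into 10 folds, and each fold serves once as the test set while the other nine train the model. This is a plain-Python illustration, not WEKA's procedure; it does not stratify folds by class, and `majority_class` is a hypothetical stand-in for C4.5, RIPPER, Naïve Bayesian, or AdaBoost.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=1):
    """Shuffle record indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(train_fn, rows, labels, k=10, seed=1):
    """Mean accuracy over k folds; each fold is the test set exactly once.
    train_fn(rows, labels) must return a predict(row) callable."""
    folds = k_fold_indices(len(rows), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        predict = train_fn([rows[j] for j in train_idx], [labels[j] for j in train_idx])
        correct = sum(predict(rows[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_class(rows, labels):
    """Hypothetical baseline learner: always predicts the most frequent training label."""
    top = Counter(labels).most_common(1)[0][0]
    return lambda row: top


# Toy data: 70 sales closing at/above list (label 1), 30 below (label 0).
rows = [(i,) for i in range(100)]
labels = [1] * 70 + [0] * 30
mean_acc = cross_validate(majority_class, rows, labels)
print(round(mean_acc, 3))  # 0.7
```

A majority-class baseline like this one is a useful floor: a real classifier's cross-validated accuracy is only informative relative to the class balance of the data.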

Experimental results

We applied the two performance tests to the machine learning classifiers. The three-way split was applied to the C4.5, RIPPER, and Naïve Bayesian methods, and 10-fold cross-validation was applied to C4.5, RIPPER, Naïve Bayesian, and AdaBoost. Tables 5, 6, and 7 display the results for the three-way split. For each fold, the training and validation pair that returned the minimum error is displayed, and the last column is the test error on that fold's test set. Fig. 4 presents a
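One plausible reading of the three-way split described above, sketched under assumptions: for each of 10 folds held out as the test set, every other fold is tried as the validation set with the remaining folds used for training; the training/validation pair with the minimum validation error is kept, and its model is scored on the test fold. The threshold learner `train_stump` and the toy data are hypothetical stand-ins, not the classifiers or data from the study.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle record indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_stump(xs, ys):
    """Hypothetical stand-in learner: choose the threshold t that minimizes
    training error for the rule 'predict 1 if x >= t'."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        err = sum((x >= t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x >= best_t)


def error(model, xs, ys):
    """Fraction of misclassified records."""
    return sum(model(x) != y for x, y in zip(xs, ys)) / len(ys)


def three_way_split(xs, ys, k=10, seed=1):
    """For each test fold, keep the (training, validation) pair with the
    minimum validation error and report that model's test error."""
    folds = k_fold_indices(len(xs), k, seed)
    report = []
    for t, test_idx in enumerate(folds):
        best = None  # (validation error, validation fold, model)
        for v, val_idx in enumerate(folds):
            if v == t:
                continue
            train_idx = [j for m, f in enumerate(folds) if m not in (t, v) for j in f]
            model = train_stump([xs[j] for j in train_idx], [ys[j] for j in train_idx])
            val_err = error(model, [xs[j] for j in val_idx], [ys[j] for j in val_idx])
            if best is None or val_err < best[0]:
                best = (val_err, v, model)
        val_err, v, model = best
        test_err = error(model, [xs[j] for j in test_idx], [ys[j] for j in test_idx])
        report.append((t, v, val_err, test_err))
    return report


xs = list(range(100))              # hypothetical numeric feature
ys = [int(x >= 60) for x in xs]    # hypothetical labels
report = three_way_split(xs, ys)
for test_fold, val_fold, val_err, test_err in report:
    print(test_fold, val_fold, round(val_err, 2), round(test_err, 2))
```

Each printed row mirrors the table layout described above: the test fold, the validation fold of the selected pair, its validation error, and the test error in the last column.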

Conclusions

In this study, several machine learning algorithms are used to develop a prediction model for housing prices. We test the performance of these techniques by measuring how accurately each can predict whether the closing price is greater than or less than the listing price. Four machine learning algorithms, C4.5, RIPPER, Naïve Bayesian, and AdaBoost, are selected and tested to determine which produces the highest accuracy. We find that the performance of

References (17)

There are more references available in the full text version of this article.
