Predicting Home Value in California, United States via Machine Learning Modeling

Yitong Huang, Department of Computer Science, Illinois Institute of Technology, USA
Keywords: home value, log error, linear regression, decision tree, boosting, random forest, SVM


The market value of real estate is difficult to predict with simple regression models because of the diversity and complexity of real-world data. In this paper, using the latest real estate data from three counties in the Los Angeles area of California, United States, both linear and non-linear machine learning methods are employed to predict the log error of the home value. The motivation is to improve the accuracy of home value prediction with more advanced methods. The main finding is that traditional linear models are not predictive for complex home value data sets, while tree-based non-linear models are the most accurate, achieving the lowest mean squared errors.
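The comparison described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It assumes a Zillow-style definition of log error (the log of the estimated value minus the log of the actual sale price), which the abstract itself does not state. It also uses a small synthetic, hypothetical data set rather than the paper's data. A single greedy regression tree stands in for the tree-based models (boosting, random forest) named in the keywords, and all function names are illustrative only.

```python
import math
from statistics import mean


def log_error(estimate, sale_price):
    # Assumed Zillow-style definition: log(estimate) - log(actual sale price).
    return math.log(estimate) - math.log(sale_price)


def mse(y_true, y_pred):
    # Mean squared error between paired observations and predictions.
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def fit_linear(xs, ys):
    # Ordinary least squares for y = a + b * x with a single feature.
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def fit_tree(xs, ys, max_depth=3, min_leaf=2):
    # Greedy regression tree on one feature: each split minimises the summed squared error.
    leaf_value = mean(ys)
    if max_depth == 0 or len(xs) < 2 * min_leaf:
        return lambda x: leaf_value
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, split index)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        m_left, m_right = mean(left), mean(right)
        sse = sum((y - m_left) ** 2 for y in left) + sum((y - m_right) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return lambda x: leaf_value
    _, threshold, i = best
    left_model = fit_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]], max_depth - 1, min_leaf)
    right_model = fit_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]], max_depth - 1, min_leaf)
    return lambda x: left_model(x) if x <= threshold else right_model(x)


# Hypothetical synthetic data: one feature and a log error that changes in steps.
xs = [i / 10 for i in range(40)]
ys = [0.1 if 1.0 <= x < 2.5 else -0.02 for x in xs]

linear_model = fit_linear(xs, ys)
tree_model = fit_tree(xs, ys)
linear_mse = mse(ys, [linear_model(x) for x in xs])
tree_mse = mse(ys, [tree_model(x) for x in xs])
print(f"linear MSE: {linear_mse:.6f}, tree MSE: {tree_mse:.6f}")
```

When the target depends on the feature in a step-like, non-linear way, the tree's piecewise-constant fit can reach a lower mean squared error than a straight line. This only illustrates the kind of gap the paper reports. It does not reproduce its experiments.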

How to Cite
Huang, Y. (2019). Predicting Home Value in California, United States via Machine Learning Modeling. Statistics, Optimization & Information Computing, 7(1), 66-74.