Benefits and limitations of machine learning methods in the inhomogeneous real estate market of mixed-use asset class

Soot, Matthias; Sabine Horvath; Hans-Berndt Neuner; Alexandra Weitkamp; Simon Thaler; Wolfgang A. Brunauer

Property rates, which are commonly used in the income approach, can be derived with a reverse income approach model for every transaction in which the net yield is known. The level of the property rate reflects the risk of the traded asset and therefore depends on influencing parameters that can explain this risk. The classical way to investigate these influences is a multiple linear regression model. In an inhomogeneous market, however, this classical approach yields poor results.
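The reverse income approach can be illustrated with a minimal sketch. It assumes the simplified income approach formula, in which the value is the capitalised net income plus the discounted land value, and solves it for the rate by bisection. Whether the authors use exactly this formulation is an assumption; the function names, argument names, and the transaction figures are hypothetical and serve only as an example.

```python
def income_value(rate, net_income, land_value, remaining_life):
    """Simplified income approach value for a given property rate.

    Assumed formula: net_income * annuity_factor + land_value / q**n,
    with q = 1 + rate and n = remaining_life in years.
    The rate is a decimal (0.05 means 5 %).
    """
    q = 1.0 + rate
    growth = q ** remaining_life
    annuity_factor = (growth - 1.0) / (growth * (q - 1.0))
    return net_income * annuity_factor + land_value / growth


def property_rate(price, net_income, land_value, remaining_life,
                  low=1e-6, high=0.5, tol=1e-10):
    """Find the rate at which the income value equals the purchase price."""
    value_low = income_value(low, net_income, land_value, remaining_life)
    value_high = income_value(high, net_income, land_value, remaining_life)
    # For positive income and land value, the value falls as the rate rises,
    # so the price must lie between the values at the two bracket ends.
    if not value_high <= price <= value_low:
        raise ValueError("price is outside the bracket spanned by low and high")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if income_value(mid, net_income, land_value, remaining_life) > price:
            low = mid   # value still too high, so the rate must be larger
        else:
            high = mid
    return 0.5 * (low + high)


# Made-up transaction: build a price from a known 5 % rate, then recover it.
price = income_value(0.05, 60_000.0, 200_000.0, 40)
rate = property_rate(price, 60_000.0, 200_000.0, 40)
print(f"recovered property rate: {rate:.4%}")
```

Applied to every transaction in a purchase price database, such a routine would yield one property rate per sale, which can then serve as the target variable of a regression model.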

In this work, we compare parametric and non-parametric methods for modelling the level of the rates. We apply Artificial Neural Networks (ANN) and Random Forest Regression (RFR) as non-parametric methods and compare their results with parametric approaches, namely the classic multiple linear regression (MLR) and a Geographically Weighted Regression (GWR). The dataset covers a submarket of mixed-use buildings (residential and commercial) in the federal state of Lower Saxony (Germany).
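To make the difference between a global MLR and a GWR concrete, the following sketch fits a single-predictor line once with equal weights and once with Gaussian distance weights around a target location. It is an illustration only, not the authors' model: the kernel choice, the bandwidth, the helper names, and the synthetic two-region data are all assumptions.

```python
import math


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gwr_fit_at(target, coords, x, y, bandwidth):
    """Local fit at `target`: nearby observations get larger Gaussian weights."""
    weights = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2)
               for c in coords]
    return weighted_line_fit(x, y, weights)


# Synthetic data: two distant regions with different rent effects on the rate.
west = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
east = [(100.0, 0.0), (101.0, 0.0), (100.0, 1.0), (101.0, 1.0)]
rents = [5.0, 6.0, 7.0, 8.0]

coords = west + east
x = rents + rents
y = [6.0 - 0.2 * r for r in rents] + [8.0 - 0.5 * r for r in rents]

# Global fit (all weights equal) versus local fits in each region.
_, slope_global = weighted_line_fit(x, y, [1.0] * len(x))
_, slope_west = gwr_fit_at((0.5, 0.5), coords, x, y, bandwidth=10.0)
_, slope_east = gwr_fit_at((100.5, 0.5), coords, x, y, bandwidth=10.0)
print(slope_global, slope_west, slope_east)
```

In this constructed case the global fit averages the two regional effects, while the local fits recover each region's own slope. This is the behaviour that motivates spatially varying coefficients in an inhomogeneous market.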

Mixed-use assets are traded only 200 times per year in this federal state of more than 8 million inhabitants. The investigated sample therefore covers 5 years of data and comes from the official purchase price database. Besides the building characteristics (number of floors, year of construction, and average rent per sqm), locational parameters are considered (standard land value, population forecast, and population structure).

Because the rural, urban, and socio-demographic environment is inhomogeneous, the models can become complex. The evaluation of the different approaches led to inhomogeneous results, and no perfect method can be determined for the dataset. Our goal is to understand and interpret the differing results in light of how the methods work. We therefore examine the results with respect to the influencing parameters used (model size), the sample sizes, and the influence or significance of the parameters on the result. The patterns found are discussed across methods and in the context of the data. We conclude our contribution by formulating the possibilities and limitations of the methods.
