Over the real estate lifecycle, numerous documents and data are generated. The majority of building-related data, such as maintenance protocols, contracts or energy consumption records, is collected in day-to-day operations. Previous successes in document classification already help to automatically recognize, categorize and name documents and to sort them into an individual structure in digital data rooms (Bodenbender/Kurzrock 2018). The actual added value is created in the next step: efficient data analysis that puts the data to specific use.

This paper describes an approach for automating Technical Due Diligence (TDD) by means of information extraction (IE). The aim is to extract relevant information from building-related documents and thereby gain quick, automated insights into the condition of real estate. Global assets under management (AuM) of US$1.2 trillion (PWC, AWM Report, 2017) and a global real estate transaction volume of around US$650 billion in 2016 (JLL Global Market Perspective, 2017) show that there is a regular need to analyze building data. Transactions are a very dynamic area in which current trends favor a more data-driven approach to reduce time and cost.

In addition, the paper focuses on the standardization of information extraction methods for TDD as well as the prioritization and evaluation of building-related data. The automated evaluation supports value-adding decisions in the real estate lifecycle by providing a detailed data basis. TDD audits are a key instrument for reducing information asymmetries, especially in large transactions.

Efficient technologies are now available for IE from digital building data. Through machine learning, documents can be read and evaluated automatically. Digital data rooms and operational applications such as ERP systems serve as sources for information extraction. Because the documents are heterogeneous, both rule-based and learning-based algorithms are used. The IE builds on various technical foundations, especially neural networks and deep learning methods. As the documents are often available only as scans, OCR methods must be integrated.
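
To illustrate the rule-based part of such a pipeline, the following minimal Python sketch extracts a few fields from the text of a maintenance protocol. It is not the method used in this paper. It assumes that OCR has already turned the scan into plain text, and the field labels ("Inspection date", "Component", "Energy consumption"), the `MaintenanceRecord` structure and the sample document are hypothetical and chosen only for illustration. The learning-based component is not shown.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaintenanceRecord:
    """Hypothetical target structure for extracted fields."""
    inspection_date: Optional[str]
    component: Optional[str]
    energy_kwh: Optional[float]


# Hypothetical rules: each regular expression targets one labelled field.
DATE_RE = re.compile(r"Inspection date:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)
COMPONENT_RE = re.compile(r"Component:\s*(.+)", re.IGNORECASE)
ENERGY_RE = re.compile(r"Energy consumption:\s*([\d.,]+)\s*kWh", re.IGNORECASE)


def parse_number(raw: str) -> float:
    """Parse a number, assuming commas are thousands separators."""
    return float(raw.replace(",", ""))


def extract_record(text: str) -> MaintenanceRecord:
    """Apply each rule to the text; fields without a match stay None."""
    date = DATE_RE.search(text)
    component = COMPONENT_RE.search(text)
    energy = ENERGY_RE.search(text)
    return MaintenanceRecord(
        inspection_date=date.group(1) if date else None,
        component=component.group(1).strip() if component else None,
        energy_kwh=parse_number(energy.group(1)) if energy else None,
    )


# Assumed OCR output of a scanned maintenance protocol.
sample = """Maintenance protocol
Inspection date: 2017-03-14
Component: Heating system
Energy consumption: 12,500 kWh
"""

record = extract_record(sample)
print(record)
```

Fixed rules of this kind only work for documents with a predictable layout. This is one reason why heterogeneous document collections would also call for learning-based methods.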

The contribution to the ERES-PhD session presents the current state of information extraction in the real estate industry, the research method used for the automation of TDD and its potential benefits for real estate management.