To minimize risks and increase transparency, every company needs reliable information. The quality and completeness of digital building documentation are increasingly becoming a “deal maker” or “deal breaker” in real estate transactions. However, there is a fundamental lack of instruments for leveraging internal data, and essential information risks being overlooked.

In real estate transactions, the parties generally have only a few weeks for due diligence (DD). A large variety of documents must be elaborately prepared and made available in data rooms. As a result, gaps in the documentation may remain hidden and can only be identified with great effort. Missing documents may lead to substantial purchase price discounts. Investors are therefore increasingly using a data-driven approach to gain essential knowledge in transaction processes. Digital technologies in due diligence processes are intended to reduce existing information asymmetries and support data-driven decisions.

This paper describes an approach to automating due diligence processes, with a focus on Technical Due Diligence (TDD), using Machine Learning (ML), especially Information Extraction. The overall aim is to extract relevant information from building-related documents in order to generate a semi-automated report on the structural (and environmental) condition of properties.

The contribution examines due diligence reports on more than twenty office and retail properties. The reports were produced by more than ten different companies between 2006 and 2016. The research provides a standardized TDD reporting structure that is relevant to both research and practice. To define the relevant information for the report, document classes are reviewed and the data they contain are prioritized. On this basis, various document classes are analyzed and relevant text passages are segmented. A framework is developed to extract data from the documents, store it, and provide it in a standardized form. Moreover, the paper presents the current use of Machine Learning in DD processes, the research method and framework used to automate TDD, and the potential benefits of this approach for transactions and risk management.