Article
Extracting text from unstructured data for intelligent automation
Jun 25, 2021 · Authored by Thomas M. Puch
Many businesses face a recurring challenge: how to make the best use of unstructured text data in workflow automation and analytics reporting. Within these documents, key words and phrases convey the essence of the text quickly, eliminating the need to read every document in full. Extracting them automatically can accelerate workflow automation and analytics reporting, and a vital part of this information retrieval task is leveraging machine learning and Natural Language Processing (NLP) capabilities on Amazon Web Services (AWS).
To better understand key word and phrase extraction, let’s explore an example of how it can be used to support accounting challenges related to the ASC 842 lease accounting standard, which required a manual review of leases. The business request was to identify key lease terms, such as lease commencement and termination dates and base rent amounts, from PDF documents to accelerate the review process. The following graphics show the input, a lease in PDF format, and the output, an Excel spreadsheet with the key terms identified in that lease.
Input: a lease agreement in PDF format.
Output: an Excel spreadsheet of the extracted key lease terms.
This output spreadsheet allows business leaders to streamline their review because they can focus their attention on just the documents and key terms listed in the output report. For a business leader attempting to sift through several thousand PDF documents, this NLP technology can reduce manual effort and allow employees to finish their reporting projects more quickly.
NLP and key term extraction technology can also help with other types of business challenges across industries:
Aerospace industry:
- Product performance and reliability reporting required identifying, by extracting key terms from unstructured text, which component was involved in a failure (by manufacturer serial number), when the failure occurred, and which airline, aircraft and flight phase were involved.
- Text data from the call center, field customer representative notes, shipping manifests and repair shop test results on items returned for service repair needed to be processed to connect the different data sources and tell the “story of the plane” with analytics.
Manufacturing compliance:
- Compliance in the manufacturing industry required identifying the fields mandated on safety data sheets (SDS), such as the product name, company name and address, hazard identification, first aid measures, handling and storage procedures, and personal protective equipment, to assist with a workflow automation project.
- The ability to perform key term extraction has helped reduce manual processing for improved compliance and safety within manufacturing facilities.
Fraud investigation:
- For a fraud investigation, the forensic accounting team needed to review four years of bank transactions received as PDFs, covering 14 different accounts at five different banks.
- Key word and key phrase extraction were required to identify the account number, transaction date, amount and description for each record and to process over 1.5 million individual bank transactions.
Understanding the methodology
James Taylor’s book Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics presents an iterative framework for labeling business decisions by decomposing operational decisions. This methodology keeps teams grounded in the business metric to be improved and in how to measure that improvement in relation to business value. By using this approach, businesses often discover several “hidden” micro-decisions. These micro-decisions typically require relatively less effort and carry higher business value than other decision types, which makes them prime candidates for Intelligent Automation, a machine learning approach. By identifying the strategic, tactical, operational and micro goals, needs and decisions of a business, leaders can be better prepared to streamline operations:
- Strategic decisions are high-value, low-volume decisions, made by senior or other management and made only once in that context.
- Tactical decisions are medium value and repeatable on some frequency such as weekly or monthly, made by managers or knowledge workers, and based on similar analysis each time. An example would be how much of a discount to apply to a customer order.
- Operational decisions are lower in value and relate to a single business transaction, but they remain important for operational effectiveness given the volume and frequency with which they are repeated.
- Micro-decisions are a specific subset of operational decisions. Micro-decisions use everything known about the event to make a unique decision for the transaction. They apply a layer of personalization to the decision, are by far the most common decision type, and are highly repeatable. An example of a micro-decision is identifying complete bank transactions from an unstructured, variable-length, streaming data source, based on unique bank signals that delineate the start of each transaction and each data element within the transaction (see the sketch after this list).
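To make that micro-decision concrete, here is a minimal, purely illustrative sketch in which a leading date is treated as the “bank signal” that starts a new transaction, so continuation lines stay attached to the right record. The sample statement lines and patterns are hypothetical; real bank formats vary widely.

```python
import re

# Hypothetical raw statement lines: a leading MM/DD/YYYY date signals a new transaction.
lines = [
    "01/03/2017 CHECK #1042            -1,250.00",
    "01/05/2017 WIRE TRANSFER IN       10,000.00",
    "ACME HOLDINGS LLC",                        # continuation of the previous record
    "01/09/2017 ATM WITHDRAWAL           -400.00",
]

starts_new_txn = re.compile(r"^\d{2}/\d{2}/\d{4}")

transactions = []
for line in lines:
    if starts_new_txn.match(line):
        transactions.append(line)               # new transaction begins here
    elif transactions:
        transactions[-1] += " " + line.strip()  # fold continuation lines into the last record

print(transactions)
```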
Our technical goal was to provide a solution to reduce the need for manual processing, particularly for large volumes of unstructured text. To guide clients through their digital journey, we developed five functional modules, shown below, with each module focused on a single logical task.
These five modules are further outlined below:
- Processing semi-structured or unstructured text from internal and external sources. Amazon Textract can be used to convert PDF documents to text using optical character recognition (OCR), and Amazon Transcribe can be used to convert audio recordings to text (a Textract sketch follows this list).
- Amazon SageMaker Data Wrangler and a Jupyter notebook can be used to perform data cleaning and feature engineering, such as detecting sentence boundaries and removing punctuation (an illustrative cleaning sketch follows this list).
- Amazon SageMaker Ground Truth can be used during the annotation process to create labels for the required key terms. During an initial prototype, a data quality assessment can be conducted and initial custom classification models can be trained in Amazon Comprehend to validate the approach and guide the business during feature engineering (a Comprehend training sketch follows this list).
- Once the approach is validated, Bidirectional Encoder Representations from Transformers (BERT) can be used to create word embeddings. Classification models can then be trained with Linear Learner and XGBoost in SageMaker, as well as Stochastic Gradient Descent and Support Vector Machines in scikit-learn (a BERT-plus-scikit-learn sketch follows this list).
- Finally, the chosen model can be deployed behind a REST Application Programming Interface (API) using a SageMaker inference pipeline, which supports real-time predictions and batch transformations. The inference API can be called from common web front-end technologies such as JavaScript, C# and Angular, as well as from robotic process automation (RPA) low-code software like UiPath, Blue Prism or Pega. The inference API can also be integrated with Amazon Redshift, Amazon RDS and Athena, using a Lambda function to perform a traditional “in database” inference (a Lambda handler sketch closes this list).
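For the first module, the following is a minimal sketch of calling Amazon Textract from Python with boto3 to OCR a multi-page PDF stored in S3. The bucket and key names are hypothetical placeholders, and result pagination (NextToken) is omitted for brevity.

```python
import time
import boto3

textract = boto3.client("textract")

# Start an asynchronous text-detection job on a PDF in S3 (names are placeholders).
job = textract.start_document_text_detection(
    DocumentLocation={"S3Object": {"Bucket": "example-lease-bucket", "Name": "leases/lease-001.pdf"}}
)

# Poll until the job finishes (pagination of results is omitted in this sketch).
while True:
    result = textract.get_document_text_detection(JobId=job["JobId"])
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

# Keep the detected LINE blocks as the raw text for downstream NLP.
lines = [block["Text"] for block in result.get("Blocks", []) if block["BlockType"] == "LINE"]
print("\n".join(lines[:20]))
```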
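For the cleaning and feature-engineering module, the sketch below shows the kind of transformation that step performs: naive sentence splitting and punctuation removal. It is illustrative only; in practice this work ran in Data Wrangler or a notebook and would typically use an NLP library such as spaCy or NLTK for sentence boundaries.

```python
import re
import string

def split_sentences(text: str) -> list[str]:
    # Naive split on ., ! or ? followed by whitespace; a real NLP library handles
    # abbreviations and other edge cases far better.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def strip_punctuation(sentence: str) -> str:
    # Remove punctuation and lowercase before feature engineering.
    return sentence.translate(str.maketrans("", "", string.punctuation)).lower()

raw = "Base rent is $4,500 per month. The lease commences on Jan 1, 2020!"
cleaned = [strip_punctuation(s) for s in split_sentences(raw)]
print(cleaned)
```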
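For the annotation and prototyping module, the sketch below assumes the Ground Truth labels have been consolidated into a two-column CSV of label and text for Amazon Comprehend custom classification. The IAM role ARN and S3 path are hypothetical placeholders.

```python
import boto3

comprehend = boto3.client("comprehend")

# Train a Comprehend custom classifier from labeled data (ARN and S3 URI are placeholders).
response = comprehend.create_document_classifier(
    DocumentClassifierName="lease-key-term-prototype",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/example-comprehend-role",
    InputDataConfig={"S3Uri": "s3://example-bucket/labels/training.csv"},
    LanguageCode="en",
)
print(response["DocumentClassifierArn"])
```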
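For the modeling module, here is a minimal sketch of pairing BERT embeddings with the scikit-learn classifiers mentioned above (Stochastic Gradient Descent and a Support Vector Machine). The tiny in-line training set is invented purely to make the example runnable.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    """Return one mean-pooled BERT embedding (768 dims) per sentence."""
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state   # (batch, tokens, 768)
    return hidden.mean(dim=1).numpy()            # (batch, 768)

# Hypothetical labeled sentences: 1 = contains a base-rent term, 0 = does not.
texts = ["Base rent shall be $4,500 per month.",
         "The premises are located at 1 Main Street.",
         "Monthly rent of $2,000 is due on the first.",
         "Tenant may use the parking lot."]
labels = np.array([1, 0, 1, 0])

X = embed(texts)
for clf in (SGDClassifier(), SVC(kernel="linear")):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(embed(["Rent is $3,100 per month."])))
```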
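For the deployment module, the sketch below shows an AWS Lambda handler that forwards a record to a SageMaker real-time endpoint and returns the prediction, the pattern that lets a front end, an RPA tool, or Redshift/RDS/Athena call the model through a simple JSON contract. The endpoint name and payload shape are hypothetical.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    # Forward the incoming text to the deployed model (endpoint name is a placeholder).
    response = runtime.invoke_endpoint(
        EndpointName="key-term-extraction-endpoint",
        ContentType="application/json",
        Body=json.dumps({"text": event["text"]}),
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(prediction)}
```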
Extracting future business insights
Baker Tilly’s solution for key word and phrase extraction from unstructured text data is a valuable approach for a business’s digital transformation, given that it supports both workflow automation and analytics reporting. By engaging in this approach, business leaders can reduce their operational risk and increase their knowledge and capabilities when it comes to automated technology, thus streamlining operations.