Extracting text from unstructured data for intelligent automation
June 25, 2021
Many businesses struggle to make good use of unstructured text data in workflow automation and analytics reporting. Within these documents, key words and phrases convey the main idea of the text quickly, without the need to read the entire document. Extracting them automatically can accelerate workflow automation and analytics reporting, and machine learning and Natural Language Processing (NLP) capabilities on Amazon Web Services (AWS) play a vital part in this kind of information retrieval.
To better understand key word and key phrase extraction, consider an example of how it can support accounting work related to the ASC 842 lease accounting standard, which required a manual review of leases. The business request was to identify key lease terms, such as lease commencement dates, termination dates and base rent amounts, from PDF documents to accelerate the review process. The following graphics show the input, a lease in PDF format, and the output, an Excel spreadsheet with the key terms identified in this lease.
Input:
Output:
This output Excel spreadsheet allows business leaders to streamline their review because they can focus their attention on just the PDF documents listed within the output report. For a business leader sifting through several thousand PDF documents, this NLP technology can reduce manual effort and allow employees to finish their reporting projects more quickly.
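To make the lease example concrete, the following is a minimal sketch of the extraction step, not the solution described in this article. It assumes the lease text has already been pulled out of the PDF by a separate OCR or text extraction step, it swaps the machine learning and NLP models for simple regular expressions, and it writes a CSV file (which Excel can open) instead of a native spreadsheet. The pattern wording, the function names and the sample lease sentence are all hypothetical.

```python
import csv
import io
import re

# Hypothetical patterns for illustration only. Real leases vary widely in
# wording, which is why a production solution would rely on NLP models
# rather than fixed regular expressions.
DATE = r"([A-Za-z]+ \d{1,2}, \d{4})"
PATTERNS = {
    "commencement_date": re.compile(r"commence\w*\s+on\s+" + DATE, re.IGNORECASE),
    "termination_date": re.compile(r"(?:terminate|expire)\w*\s+on\s+" + DATE, re.IGNORECASE),
    "base_rent": re.compile(r"base\s+rent\s+of\s+(\$[\d,]+(?:\.\d{2})?)", re.IGNORECASE),
}


def extract_lease_terms(text: str) -> dict:
    """Return the first match for each key term, or None when a term is absent."""
    terms = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        terms[name] = match.group(1) if match else None
    return terms


def write_report(rows, out) -> None:
    """Write one row per document to a CSV stream, with one column per key term."""
    writer = csv.DictWriter(out, fieldnames=["document", *PATTERNS])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical text, standing in for the output of a PDF text extraction step.
sample = (
    "This Lease shall commence on January 1, 2020 and shall terminate on "
    "December 31, 2029. Tenant shall pay base rent of $12,500.00 per month."
)

row = {"document": "lease_001.pdf", **extract_lease_terms(sample)}
buffer = io.StringIO()
write_report([row], buffer)
print(buffer.getvalue())
```

A term that is not found is left blank in the report, so a reviewer can see at a glance which documents still need a manual read.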
NLP and key term extraction technology can also help with other types of business challenges across industries:
Aerospace industry
- Product performance and reliability reporting in the aerospace industry presented challenges in identifying, by extracting key terms from unstructured text, which component was involved in a failure (by manufacturer serial number), when the failure occurred, and which airline, aircraft and flight phase were involved.