How Intelligent Process Automation Addresses the AI Data Problem

April 20, 2020 | Artificial Intelligence, Business, Intelligent Process Automation

Companies looking to make effective use of artificial intelligence (AI) face a big problem: data. Specifically, AI solutions often require too much of it to create effective models.

Let’s say you’re trying to create a model to automate the mortgage underwriting process. The process typically involves a human looking over lots of documents to assess an applicant’s creditworthiness. Those documents are likely to include tax returns, credit reports, W-2 wage forms, bank statements and more. To create an accurate classification or extraction model for this use case, you’d need 100,000 or more sample data points.

Unfortunately, this puts AI out of reach for most organizations save the Googles and Amazons of the world.

Rule-based AI: Insatiable demand for data

One way to try to automate a mortgage underwriting workflow while avoiding the need for huge data sets is to use a templated approach, which involves creating a series of rules to extract key bits of relevant data from each type of document. Such data may include adjusted gross income from tax returns for multiple years, salary data from W-2s, and details of credit card balances, auto loans, and other types of debt.

These rules would have to define exactly where on each type of document the relevant data can be found. That’s no mean feat given the variation among the documents in question. Consider just tax returns. One applicant may file a 1040, while another uses 1040A and a third 1040EZ. Bank statements, of course, will vary depending on the bank in question, as will credit reports from the three major reporting firms.
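To make the templated approach concrete, here is a minimal Python sketch of what such rules might look like. Everything in it is an assumption for illustration: the document type names, the field labels and the `extract_with_template` helper are hypothetical and do not reflect real form layouts or any particular product.

```python
import re

# Hypothetical templates: one hand-written rule per document type and field.
# The label wording below is invented for illustration, not taken from real forms.
TEMPLATES = {
    "tax_return_a": {
        "adjusted_gross_income": r"Adjusted gross income:\s*\$?([\d,]+)",
    },
    "w2": {
        "wages": r"Wages, tips, other compensation:\s*\$?([\d,]+)",
    },
}


def extract_with_template(doc_type, text):
    """Apply every rule registered for doc_type and return the fields that matched."""
    rules = TEMPLATES.get(doc_type)
    if rules is None:
        raise KeyError(f"no template for document type: {doc_type}")
    found = {}
    for field, pattern in rules.items():
        match = re.search(pattern, text)
        if match:
            # Strip thousands separators before converting to an integer.
            found[field] = int(match.group(1).replace(",", ""))
    return found


# The rule works when the document matches the template exactly...
known = extract_with_template("tax_return_a", "Adjusted gross income: $85,000")

# ...but a slightly different wording on another form defeats the same rule.
variant = extract_with_template("tax_return_a", "AGI (line 8b) 85,000")
```

In this sketch `known` contains the income figure while `variant` comes back empty, so every new form layout would need its own rule added to `TEMPLATES`.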

While a rules-based approach might seem like a viable solution to avoid the need for thousands of sample data points, you’d find yourself presented with another challenge. Beyond the variation in format, there are also plenty of judgment calls to be made about which data to extract. Writing rules to cover every possible permutation of what an underwriter may care about is an exercise in futility.

It’s worth noting that robotic process automation (RPA) isn’t a likely solution to this problem. RPA is great at automating predefined steps that never vary. For example, if you know that a salary figure shows up in the same place on the same document every time, you can use RPA to automate the process of highlighting that figure, copying it and pasting it into a downstream system the underwriter uses for credit evaluations. But given all the variation inherent in the mortgage underwriting process, that’s not a viable approach. 

Adding intelligence to process automation 

What’s required is an AI solution with more emphasis on “intelligence,” which is where the concept of intelligent process automation (IPA) comes in. 

IPA tools can understand document context, learn what a given value looks like, and find it wherever it appears on a document. For example, any mortgage underwriting process involves assessing the total outstanding debt an applicant is carrying. That means combing over those credit reports to find balances on credit cards, auto loans and the like.

By examining only about 200 examples of what a debt figure looks like, a good intelligent automation tool can then find debt figures on any relevant document, no matter if it’s from Equifax, Experian or TransUnion. That’s because the IPA tool learns and can understand the surrounding context of the document, so it can discern a debt figure from, say, income. It can also be trained to extract data showing whether an applicant is typically on time with loan payments or chronically late. 
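The idea of telling a debt figure from an income figure by its surrounding words can be illustrated with a toy Python sketch. This is not how any IPA product works internally; it swaps in a very simple technique (a multinomial naive Bayes word-count scorer with add-one smoothing and equal class priors) as a stand-in, and the six training lines are invented for illustration. The function names `context_words`, `train` and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def context_words(line):
    """Lowercased alphabetic tokens on the line; the digits themselves are ignored."""
    return re.findall(r"[a-z]+", line.lower())


def train(examples):
    """Count context words per label from (line, label) pairs."""
    counts = defaultdict(Counter)
    for line, label in examples:
        counts[label].update(context_words(line))
    return counts


def classify(counts, line):
    """Return the label whose training context best matches the line.

    Scores are summed log-probabilities with add-one smoothing; class priors
    are treated as equal, which is a simplification.
    """
    vocab = {word for counter in counts.values() for word in counter}
    best_label, best_score = None, -math.inf
    for label, counter in counts.items():
        total = sum(counter.values())
        score = sum(
            math.log((counter[word] + 1) / (total + len(vocab)))
            for word in context_words(line)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training lines; a real project would use a few hundred labeled examples.
examples = [
    ("Credit card balance 4,200", "debt"),
    ("Auto loan balance owed 12,500", "debt"),
    ("Revolving account balance 900", "debt"),
    ("Annual salary 85,000", "income"),
    ("Gross wages paid 61,300", "income"),
    ("Monthly income 5,100", "income"),
]
model = train(examples)

# Neither line below appears in the training data or follows a fixed layout.
debt_guess = classify(model, "Outstanding balance on installment loan: 7,850")
income_guess = classify(model, "Total wages and salary 72,000")
```

Because the scorer looks at the words around the number rather than its position on the page, the same model applies to differently formatted reports. Production tools rely on far richer language models, which is what makes a couple of hundred examples sufficient in practice.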

The key value proposition behind IPA is that it works with just those 200 or so examples of the value in question, a capability known as “low training data.”  Training data is the gating factor that stymies so many AI projects; most companies simply don’t have enough data to accurately train the AI tool. But IPA makes use of AI technologies including machine learning, transfer learning and natural language processing to overcome that issue and produce models that are extremely accurate. 

And IPA tools can be applied to plenty of use cases besides mortgage underwriting, including insurance claims processing, customer onboarding, title and deed processing, financial document analysis and more. 

