Smart OCR Initiative

The guide to process automation for unstructured documents

Without OCR-templating or rule-based approaches


Automate complex document-based workflows

Schedule your demo

Rule-based or templated approaches to document process automation struggle with unstructured content, which makes up the majority of content in an organization. Intelligent automation solutions interpret unstructured documents much as a human reader would, so they can effectively automate processes involving unstructured content for use cases in Financial Services, Insurance, and more.




of data in the Enterprise

Less than 2% is leveraged by AI


in resources using AI

Manually extracting relevant information is laborious and error-prone

Document variation makes rule-based workflow automation impractical

Intelligent automation understands document context

The unstructured data challenge

As companies seek to automate as many processes as possible, they hit a wall with unstructured data such as long-form financial documents, contracts, and emails. The reason is simple: unstructured content varies dramatically from one document to the next.

Companies have typically used rule-based approaches such as OCR templating and/or robotic process automation (RPA) to automate document processing. These efforts meet with little success because the templates are too brittle to handle the variation inherent in unstructured content.

What is OCR and do you need it?

Optical character recognition (OCR) is often the first tool you’ll encounter while researching solutions for automating processes involving unstructured content.

You’d use OCR to deal with scanned documents. Scanned documents are essentially images, even when they are stored in PDF format. Images are not readily machine-readable, so computers can’t immediately process what humans can see is text in the document. OCR addresses that issue by identifying text in such documents and converting it to a digitized format that computers can handle.

You may not need OCR to interpret a PDF document. If you save a Word document as a PDF, for example, the text is normally preserved as text rather than as an image, so a system can process it directly.
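One way to picture the distinction is a check that runs before any OCR step. The sketch below assumes you have already pulled the embedded text out of each PDF page with whatever extraction tool you use; the function name `pages_needing_ocr` and the `min_chars` threshold are hypothetical choices for illustration, not part of any particular product.

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indexes of pages whose embedded text layer looks empty.

    page_texts holds the text already extracted from each page (assumed input).
    min_chars is an arbitrary illustrative threshold, not a recommended value.
    """
    flagged = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters; a scanned page usually yields none.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(index)
    return flagged


# Hypothetical input: page 0 came from a Word export, page 1 is a scanned image.
sample_pages = ["Quarterly revenue increased 12% year over year.", "  \n "]
print(pages_needing_ocr(sample_pages))
```

Pages that come back flagged are the ones worth sending to OCR; the rest can be processed from their existing text.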

Beware of OCR Templates

You may encounter the term “OCR templating” in your search for document processing solutions. It’s often confused with vanilla OCR. OCR templating also uses OCR in its workflow (i.e., turning images into machine-readable text), but the critical difference lies in the word “templating”: these vendors produce rule-based templates built from a handful of your documents.
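To see why such templates can be brittle, consider a minimal sketch of a single template rule. The label, the regular expression, and the two sample strings are all made up for illustration; real templating products may use positional zones or other rules, but the failure mode is similar.

```python
import re
from typing import Optional

# Hypothetical template rule, written after looking at a handful of documents:
# the total always follows the literal label "Invoice Total:".
TOTAL_RULE = re.compile(r"Invoice Total:\s*\$([\d,]+\.\d{2})")


def extract_total(text: str) -> Optional[str]:
    """Return the amount captured by the template rule, or None if it does not match."""
    match = TOTAL_RULE.search(text)
    return match.group(1) if match else None


seen_layout = "Invoice Total: $1,250.00"   # layout the template was built from
new_layout = "Amount due (USD) 1,250.00"   # same information, different wording

print(extract_total(seen_layout))  # the rule matches this layout
print(extract_total(new_layout))   # the rule finds nothing here
```

Both strings carry the same amount, but the rule only recognizes the layout it was written for. Each new variation needs another rule.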

How to evaluate intelligent automation solution providers

Rule-based vs. Intelligent Automation

Both rule-based and intelligent automation approaches have their place. Your choice depends on your application requirements. Many vendors claim to have intelligent automation solutions that meet your requirements for unstructured data digitization, but not all are selling “real” solutions with AI technology behind them. You’ll need to conduct due diligence for your organization. Here are some questions to consider as you vet vendors.


Intelligent automation for
unstructured content processing

Instead of creating templates for each variation, intelligent process automation (IPA) uses natural language processing (NLP) to accurately understand text, tables, and images within the context of any document.

IPA uses OCR to “read” text as input to the IPA platform. Instead of requiring rules to identify each element in a document, IPA uses deep learning to understand, in context, which content needs extraction.

IPA takes what it learned from one document and applies it to others – a concept known as transfer learning.

Since IPA understands language and context, it picks up on revenue change information no matter what synonym you use and attributes it to the correct catalysts. Humans easily read misspelled words because the brain automatically recognizes the mistake and corrects for it. NLP models do the same thing, which lets them interpret text from scanned images more accurately than rule-based systems. With IPA, template-based solutions for highly variable content become obsolete.
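The tolerance for misspellings can be illustrated with a toy comparison. This sketch is not how a deep learning model works: it swaps in simple string similarity from Python's standard `difflib` module purely to contrast an exact-match rule with a lookup that tolerates OCR noise. The vocabulary and the misspelled token are hypothetical.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical vocabulary of field names a workflow cares about.
KNOWN_TERMS = ["revenue", "expenses", "profit"]


def exact_lookup(token: str) -> Optional[str]:
    """Rule-based style: the token must match a known term exactly."""
    lowered = token.lower()
    return lowered if lowered in KNOWN_TERMS else None


def tolerant_lookup(token: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known term if it is similar enough, else None.

    Uses character-level similarity as a stand-in for a model's robustness.
    """
    matches = get_close_matches(token.lower(), KNOWN_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


noisy_token = "Reveune"  # a plausible OCR or typing error
print(exact_lookup(noisy_token))     # the exact rule finds nothing
print(tolerant_lookup(noisy_token))  # the tolerant lookup recovers the term
```

String similarity handles only spelling noise. Recognizing synonyms and context, as described above, is what the learned language model contributes.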
