Unstructured data explained: Why rule-based tools make for brittle document process automation models

April 6, 2022 | Unstructured Data

When it comes to document process automation, you can write a rule to automate virtually any step in a process. But that doesn’t mean you should.

Writing rule after rule to address all the variables that may come up in a process involving numerous documents is a trap, says Slater Victoroff, CTO and founder of Indico Data.

“When you proceed in that way the ability to solve every problem is actually the biggest possible danger you could face because it sucks you into this belief that at some point your rules will be correct,” he says. The rules-focused approach assumes the problem is you haven’t written enough rules or haven’t written the right rules. “When in fact that is not the problem. The problem is that you are writing rules to begin with.”

In a recent installment of the “Unstructured Explained” video series, Victoroff discussed the issue with two Indico Data colleagues: ML Architect and Co-Founder Madison May and VP of Business Development Brandi Corbello.


Many rules make for brittle models

The problem with rules is they tend to make automation models brittle, May says. While any single rule may improve the quality of an automation application, taken together, they amount to numerous potential points of failure.

“It means when [the model] fails it fails much harder because you’re imposing stricter and stricter constraints on what your system can and cannot do,” May says. “And at a certain point it ceases to become useful to try and inject all of your preconceived knowledge into the problem and you should take a step back and let the model handle it for you.”
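To put a rough number on the "many points of failure" idea, here is a minimal sketch. It assumes, purely for illustration, that each rule handles a given document correctly with the same probability and that rules fail independently of one another. Neither assumption comes from the video, and the function name and the 99% figure are hypothetical.

```python
def chance_all_rules_hold(per_rule_reliability: float, rule_count: int) -> float:
    """Probability that every rule handles a document correctly.

    Assumes rules fail independently, which is a simplification
    made only for this illustration.
    """
    return per_rule_reliability ** rule_count


if __name__ == "__main__":
    # Hypothetical figure: each rule is right 99% of the time.
    for rule_count in (1, 10, 50, 100):
        chance = chance_all_rules_hold(0.99, rule_count)
        print(f"{rule_count:>3} rules -> {chance:.1%} chance that none of them misfires")
```

Under these assumptions, a rule that looks very reliable on its own still drags overall reliability down as the rule count grows, because every document has to pass every rule.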

Corbello agrees and says a rules-based approach harkens back to the days when shared service centers were convinced that optical character recognition (OCR) would solve all their problems. For an invoice processing application, for example, the solution was to keep a huge file of words that might appear on an invoice and to train the OCR application to look for any or all of those words.

“It’s basically like this big ‘Control F’ was happening rather than actually understanding what was on the document,” she says. “There’s a big difference between those two things.”
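The "Control F" approach Corbello describes can be sketched in a few lines. This is a hypothetical illustration, not any vendor's implementation: the keyword list, the `looks_like_invoice` function, the hit threshold, and the sample documents are all invented for the example.

```python
import re

# Hypothetical keyword list, standing in for the "huge file of words".
INVOICE_KEYWORDS = {"invoice", "total", "amount due", "remit to"}


def looks_like_invoice(text: str, min_hits: int = 2) -> bool:
    """Flag a document as an invoice if enough keywords appear in it.

    This is plain substring search: it never considers what the
    document is actually saying.
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = sum(1 for keyword in INVOICE_KEYWORDS if keyword in normalized)
    return hits >= min_hits


# Invented sample documents.
standard = "INVOICE #1001\nAmount Due: $450.00\nRemit to: Acme Corp"
reworded = "Statement of charges\nBalance payable: $450.00\nSend payment to: Acme Corp"
unrelated = "Please find the total attached; our invoice policy is unchanged."

if __name__ == "__main__":
    for name, doc in [("standard", standard), ("reworded", reworded), ("unrelated", unrelated)]:
        print(name, looks_like_invoice(doc))
```

The matcher accepts the standard invoice, misses the reworded one even though it asks for the same payment, and flags the unrelated note because two keywords happen to appear in it. Patching either miss means adding more keywords or more rules, which is the trap described above.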


Artificial intelligence delivers real understanding

Fully understanding what’s on a document requires a level of intelligence inherent in artificial intelligence technologies such as machine learning, natural language processing and transfer learning. Such technologies are the foundation upon which the Indico Unstructured Data Platform is built. They give the platform the ability to read and comprehend even unstructured documents just like your employees would – only far faster and with greater accuracy.

To learn more, check out the full video below.



