As we’ve discussed previously, automation models can be brittle. Often that’s because the models are rule-based and not all documents play by the rules, but another big issue is ambiguity. It isn’t always easy or clear how to parse the meaning of a document in a way that translates to an automation model, which is all the more reason we need intelligent document processing, with the emphasis on “intelligent.”
Some colleagues and I explored this issue in an episode of our “Unstructured Data Explained” video series. In it, Indico Data Founder and Machine Learning Architect Madison May laid out the issue nicely while describing the conversion of a PDF to plain text, a first step in templated approaches to automation and in robotic process automation.
Why a lossy process is the enemy of automation
“Most people tend to think of documents as plain text,” he said. “You have some PDF and it’s really just a series of words and if you feed the series of words into a machine learning model you expect you can understand everything there was to understand about the document you vetted.”
But as he correctly pointed out, you can lose a lot of salient information in that conversion process. Such information is “present in the layout of the page, in how the page is styled, the size of your words on the page, potentially graphics,” he said. “Sometimes text is a caption for an image and if you lose the image the text ceases to make sense.”
It all adds up to what Madison called a “lossy process” where you lose much of the context behind the page before you ever feed it to a machine learning model.
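To make the “lossy” point concrete, here is a minimal Python sketch. It assumes a hypothetical layout representation in which each text span carries a font size and a role label; `TextSpan` and its fields are illustrative, not the output of any real PDF library. Flattening to plain text keeps every word but discards the attributes that told you which line was a heading and which was a caption.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """A hypothetical piece of text as a layout-aware parser might see it."""
    text: str
    font_size: float  # in points
    role: str         # illustrative labels: "heading", "body", "caption"


def flatten(spans: list[TextSpan]) -> str:
    """Naive PDF-to-text step: keep the words, drop every layout attribute."""
    return " ".join(span.text for span in spans)


page = [
    TextSpan("Quarterly Results", 18.0, "heading"),
    TextSpan("Revenue grew in every region.", 10.0, "body"),
    TextSpan("Figure 1: Revenue by region", 8.0, "caption"),
]

plain = flatten(page)
print(plain)
# Quarterly Results Revenue grew in every region. Figure 1: Revenue by region
# The caption survives as words, but nothing marks it as describing a figure
# that is no longer there, and the heading now reads like body text.
```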
There’s also ambiguity inherent in the process itself, before you even try to apply automation.
Most, if not all, document-based processes have decision points at which an employee must resolve some inherent ambiguity, at times relying on experience to infer information that may not be in the document at all. As a result, two people who perform the same job may produce different outputs from the same document.
No artificial intelligence engine is going to solve that problem, so the real goal is to get your AI engine to the point where it can accurately automate the processing of documents that are unambiguous, or at least low-risk enough that you’re comfortable letting the model make decisions. For the rest, you let the model do the best it can and then loop in a human to apply judgment. The interface between human and AI is key here.
Difficulties in training automation models
Part of the reason AI will get you only so far is that it can be difficult to teach an automation model all the nuances inherent in a document like a PDF. This is where the conversation in our recent Unstructured Data Explained episode ventured into esoteric territory, with our CTO and co-founder Slater Victoroff posing questions like: What is a word? What is a sentence or a paragraph?
“If you’re a non-technical person and you hear that question, you’re like, ‘Are you an idiot? You don’t know what a word is?’” Slater said.
But it turns out such things are not always easy to define. A classic example is a paragraph that spans a page boundary. The page break might signal that a new idea is starting, or it might not. Sometimes you need to know how the text on the previous page reads; other times you don’t, because new paragraphs containing new thoughts are indented.
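Writing the naive rule down shows why this is hard. The sketch below is a hypothetical heuristic, not how any particular product decides: it treats the first line of a new page as a continuation if that line is not indented and the last line of the previous page lacks sentence-final punctuation. Both signals are assumptions, and both fail on real documents, since a sentence can end exactly at the page break and some styles never indent.

```python
def continues_across_page(last_line_prev_page: str, first_line_next_page: str) -> bool:
    """Hypothetical heuristic: does a paragraph carry over the page break?"""
    # Assumption: leading indentation marks the start of a new paragraph.
    if first_line_next_page.startswith((" ", "\t")):
        return False
    # Assumption: a line without sentence-final punctuation is mid-sentence.
    return not last_line_prev_page.rstrip().endswith((".", "!", "?", ":"))


print(continues_across_page("The tenant shall pay rent on the first", "day of each month."))
# True
print(continues_across_page("Rent is due monthly.", "    Late fees apply after five days."))
# False
print(continues_across_page("Rent is due monthly.", "Late fees apply after five days."))
# False, even if the author meant both lines to be one paragraph
```

The third call is the ambiguous case: the rule has to guess, and only context about what the text means can settle it.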
The more granular you want to get, the more precision you need, and that precision can be hard to come by. For example, is each bullet in a list its own sentence? Or should each item in the list include the task or heading that introduced it? Are section headings or subheadings sentences? To Slater’s point, words seem uncontroversial until you have to decide whether the dollar sign before a charge on an invoice is a separate word, as it would be if you read the amount out loud, or something else.
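The dollar-sign question appears the moment you write a tokenizer. The two regular expressions below are illustrative choices, not a standard; both are defensible, and they produce different “words” from the same invoice line.

```python
import re

line = "Service fee: $1,250.00"

# Option A: "$" is its own token, as when reading the amount aloud
# ("one thousand two hundred fifty dollars").
tokens_a = re.findall(r"\$|\d[\d,]*(?:\.\d+)?|\w+|[^\w\s]", line)

# Option B: the currency symbol stays attached to the amount as one token.
tokens_b = re.findall(r"\$?\d[\d,]*(?:\.\d+)?|\w+|[^\w\s]", line)

print(tokens_a)  # ['Service', 'fee', ':', '$', '1,250.00']
print(tokens_b)  # ['Service', 'fee', ':', '$1,250.00']
```

Neither choice is wrong in itself; the point is that someone has to make it, and a model trained under one convention sees the other as different text.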
Intelligent automation requires contextual understanding
All of these definitions can matter for document process automation use cases. In some, a big part of the job is breaking a document, such as a contract, into its component parts. Ideally, an optical character recognition (OCR) engine does that work for you, and the automation model then operates on the resulting output.
But when boundaries get fuzzy, as with paragraphs split across pages, it takes contextual understanding to make accurate decisions. That contextual understanding comes only from intelligent automation platforms built on artificial intelligence technologies such as natural language processing, machine learning and deep learning.
It’s also helpful if your automation platform offers multiple types of models. Text-based models are a given. Object-based models, which treat a page as an image, are also useful because they can identify and extract data from elements such as tables, photo captions or logos. With both available, you can have models working in parallel, each performing the task it does best.
The Indico Unstructured Data Platform, for example, offers numerous model types and gives you a confidence rating with each result. If the model is 95% sure a result is accurate, you can likely send that output along. If it’s only 75% sure, you may decide to route it to an employee for review. In effect, you can turn knobs to manage the imprecision and ambiguity in a process, defining how to handle the various issues that may crop up.
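Those “knobs” amount to thresholds on a per-result confidence score. The sketch below is a generic illustration, not the Indico API: the `Prediction` type, its field names and the 0.95 cutoff are assumptions chosen to mirror the 95% and 75% figures in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A hypothetical extraction result with the model's confidence (0.0 to 1.0)."""
    field: str
    value: str
    confidence: float


def route(pred: Prediction, auto_accept_at: float = 0.95) -> str:
    """Send high-confidence results straight through; queue the rest for a person.

    The 0.95 default is an illustrative assumption, not a recommended setting.
    Lowering it automates more work and accepts more risk.
    """
    return "auto" if pred.confidence >= auto_accept_at else "human_review"


results = [
    Prediction("invoice_total", "$1,250.00", 0.95),
    Prediction("policy_number", "A-1043", 0.75),
]

for pred in results:
    print(pred.field, route(pred))
# invoice_total auto
# policy_number human_review
```

Turning a knob here just means changing `auto_accept_at`, possibly per field or per process, depending on how much risk each decision carries.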
The platform is also smart enough to understand the context behind any kind of unstructured content or data, a testament to the roughly 500 million labeled data points on which it’s built. So yes, with Indico, you can build models that know what a word is and can handle paragraphs that span pages.
And where the artificial intelligence still can’t confidently make a decision for you, our platform’s human-in-the-loop capability lets you seamlessly combine artificial and human intelligence, maintaining a high level of accuracy with far greater efficiency than either could achieve alone.