
How machine learning is changing the unstructured data analytics game

By: Christopher M. Wells, Ph. D.
May 24, 2022 | Confessions from a data scientist leader, Unstructured Data


If you happen to know any college-aged kids who are wondering what to do with their lives, or perhaps a colleague who is looking for their next move, here’s a tip: pursue a career in unstructured data analytics.

I’ve been in the data science field for some time now, and I can tell you that the ability to analyze unstructured data is a largely untapped area of immense value to businesses. I’m convinced it will be a hot field in the coming years.

Plenty of data analytics and business intelligence (BI) tools exist that enable companies to extract intelligence from structured data, whether it’s in a spreadsheet or a database. You can create all sorts of charts, pivot tables and dashboards using BI apps or data visualization tools like Tableau and Microsoft Power BI. They help you immediately analyze data, spot trends, and otherwise get value from the data you already have. 

Or at least, some of the data you have. The problem is, at least 85% of all data in most companies is in an unstructured format, whether PDFs, images, Word documents, email or the like. Sure, you can paste an image into a spreadsheet, but that won’t help you analyze it.

 

Devising an unstructured data analytics solution

It’s an issue I faced with my previous employer a few years back. We had network drives full of legal documents the company wanted to analyze in hopes of finding trends, commonalities in certain clauses, and executed language that might be considered risky now. We had to roll our own tools using something similar to an ELK stack. ELK stands for three open source projects – Elasticsearch, Logstash and Kibana – that respectively address search and analytics, data processing and transformation, and visualization. 

We succeeded in building a data lake with query capabilities on top. Users could, for example, search for a given clause and find it wherever it appeared across all the documents in the data lake. They could then take the results and analyze them in tools like Excel or Word to identify trends or find specific values.

It was powerful enough to make about 25,000 of the company’s contracts searchable. In one instance, we used it to find every contract that mentioned LIBOR, a benchmark interest rate that was soon to be phased out and that was (and still is) a big deal for financial companies.
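The system described above was built on an ELK-style stack, so the real indexing and querying happened inside Elasticsearch. As a self-contained illustration of the underlying idea only, the sketch below builds a tiny inverted index in plain Python and runs a boolean AND query such as “LIBOR” across a handful of contracts. The contract IDs, contract text, and the `build_index` and `search` helpers are hypothetical, and nothing here reflects Elasticsearch’s actual API or relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Inverted index: map each token to the set of document IDs containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    # Boolean AND: return only documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical contract snippets keyed by hypothetical IDs.
contracts = {
    "loan-001": "Interest accrues at LIBOR plus 2.5% per annum.",
    "lease-002": "Rent is payable monthly in advance.",
    "swap-003": "The floating leg references three-month LIBOR.",
}
index = build_index(contracts)
print(sorted(search(index, "LIBOR")))  # ['loan-001', 'swap-003']
```

An inverted index is what keeps this kind of search fast at scale: a query looks up only the document sets for its terms instead of scanning every contract. Elasticsearch layers relevance scoring, phrase queries, and distribution on top of the same basic structure.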

That was about three years ago. More tools are available today, and machine learning is playing a big role. Let’s say a commercial real estate firm wants to find all contracts with clauses that say “to the best of the party’s knowledge” or that deal with contingencies. Such terms may appear anywhere in a contract, and the wording may not be consistent, so Ctrl + F won’t always work. Instead, a model has to learn the patterns of clauses that contain such terms so it can find them wherever they appear in a contract. That makes this a great use case for natural language modeling.
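To make the Ctrl + F problem concrete, here is a minimal sketch of a learned clause classifier. It uses multinomial naive Bayes with add-one smoothing, a deliberately simple technique swapped in for illustration; it is not how the Indico platform or any particular product models language. The training clauses, the `knowledge` and `other` labels, and the class name are all hypothetical, and a real model would need far more labeled examples.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only ("seller's" -> "seller", "s").
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClauseClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of training clauses per label
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every token seen in any training clause.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.labels:
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical labeled training clauses.
texts = [
    "to the best of the seller's knowledge",
    "to the knowledge of the landlord",
    "as far as the tenant is aware",
    "to the actual knowledge of the borrower",
    "rent is payable monthly in advance",
    "the tenant shall maintain insurance",
    "this lease terminates on the expiration date",
    "the borrower shall repay the loan in full",
]
labels = ["knowledge"] * 4 + ["other"] * 4
clf = NaiveBayesClauseClassifier().fit(texts, labels)

clause = "to the best knowledge of the buyer"
print("to the best of the party's knowledge" in clause)  # False: exact match misses it
print(clf.predict(clause))  # knowledge
```

The exact-match check fails because the wording differs, while the classifier still labels the clause correctly because words such as “best” and “knowledge” occur much more often in the knowledge-qualifier examples than in the others.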

What you’re after is some assurance that you’ve found all of the contracts that contain the target clause(s). It’s the same reason many people use Google instead of Bing or Yahoo: the result you’re after is almost always on the first page, and if it isn’t, you probably need to refine your search.

Similarly, you need a way to tell whether your machine learning model is missing contracts it should find, so you can refine it accordingly. The Indico Unstructured Data Platform, for example, gives you a score indicating how confident it is that a result is correct. If it’s only 75% confident, you can go back and refine the model until the score improves.
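One common way to check whether a model is missing contracts is to compare its hits against a small, hand-labeled sample and compute recall, then send low-confidence predictions to a person for review. The sketch below shows both steps under stated assumptions: the contract IDs, confidence values, labeled sample, 0.9 threshold, and the `recall` and `needs_review` helpers are hypothetical, and it does not describe how the Indico platform computes its confidence score.

```python
def recall(predicted_ids, relevant_ids):
    """Fraction of truly relevant contracts that the model actually found."""
    relevant = set(relevant_ids)
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(relevant & set(predicted_ids)) / len(relevant)


def needs_review(predictions, threshold=0.9):
    """Return IDs whose confidence falls below the (hypothetical) threshold."""
    return [doc_id for doc_id, confidence in predictions if confidence < threshold]


# Hypothetical model output: (contract ID, confidence that it contains the clause).
predictions = [("c1", 0.98), ("c2", 0.75), ("c3", 0.93)]
# Hypothetical hand-labeled sample of contracts that truly contain the clause.
relevant = ["c1", "c2", "c3", "c4"]

found = [doc_id for doc_id, _ in predictions]
print(recall(found, relevant))    # 0.75
print(needs_review(predictions))  # ['c2']
```

Recall answers “did we find everything?”, while the confidence threshold decides which individual results deserve a second look. Raising the threshold sends more items to review in exchange for fewer unchecked mistakes.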

 

Use cases for unstructured data analytics

As for who stands to benefit from analyzing unstructured data, almost any company probably has use cases.

Legal is a big one. Law firms have tons of contracts and stand to benefit if they can analyze them to find ways to favor their clients. E-discovery platforms that help find contracts related to a given subject or company are a start, but they have been met with only lukewarm adoption. Adding machine learning capabilities to give them customizable intelligence would be a big step forward.

Insurance is another vertical that’s ripe for unstructured document analytics. Older firms, especially, have decades’ worth of policies they can examine to find trends that may shift their actuarial tables a percentage point or two, which can literally mean millions of dollars in savings.

What’s required to make all of these use cases possible is a platform that can understand unstructured data and convert it into a format that traditional data analytics tools can work with. That’s exactly what the Indico Data platform does. It also enables you to build intelligent automation models that ease the process of finding relevant data in your sea of unstructured content.
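The hand-off to traditional analytics tools can be as simple as flattening per-document extraction results into rows and columns. A minimal sketch, assuming the extraction itself has already happened: the field names, values, and the `to_csv` helper below are hypothetical, and the output is plain CSV of the kind Excel, Tableau, or Power BI can load.

```python
import csv
import io

# Hypothetical per-contract extraction results (one dict per document).
extractions = [
    {"contract_id": "loan-001", "mentions_libor": True, "governing_law": "New York"},
    {"contract_id": "lease-002", "mentions_libor": False, "governing_law": "Delaware"},
]


def to_csv(records, fieldnames):
    # Write one header row, then one row per record, into an in-memory buffer.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


output = to_csv(extractions, ["contract_id", "mentions_libor", "governing_law"])
print(output)  # header row followed by one row per contract
```

Keeping one row per document and one column per extracted field is what lets pivot tables and dashboards treat the results like any other structured dataset.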

We’re just getting started with the idea of applying machine learning and data analytics capabilities to unstructured data and content sources. As noted up top, I believe it’s going to be a huge area in the years ahead, and tools like the Indico Data Unstructured Data Platform should be a welcome addition to the effort.

To learn more about how the platform can help you scale your mountain of unstructured documents, check out our interactive demo or schedule a live demo. Or just hit us up with any questions you may have. We’ll be happy to help you get started on the unstructured data analytics journey. 
