As more companies adopt business intelligence tools to drive better decision-making, they may uncover a significant shortcoming in their BI platforms: the tools fall short on unstructured data analysis. That’s a problem, given that unstructured data accounts for 80% or more of all data in a typical enterprise.
The reason is simple: most BI tools, including data analytics and visualization solutions, work only with highly structured data. If the data doesn’t come from a structured environment such as a database, or another compatible application such as an enterprise resource planning (ERP) system, the BI tool won’t be able to ingest and analyze it.
It’s an issue that can severely limit the value companies realize from their BI investments. BI tools are intended to enable groups including sales, marketing, finance, operations, and others to glean useful information from their massive pools of data, to aid in decision-making. That includes real-time decisions on what’s happening now, as well as predictive analytics to make projections on what’s likely to happen in the future – so you can act accordingly.
To enable such decision-making, BI software gathers data from sources such as sales, production and financial systems – so long as the data is in a compatible, structured format. For BI tools, unstructured data analytics remains elusive.
Unstructured data: hiding in plain sight
The problem is that unstructured data may be lurking even in seemingly structured data sources. Consider a customer relationship management (CRM) system. A CRM tool may seem to be highly structured, full of forms and fields where specific pieces of data go. Inside those fields, however, may be verbatim notes from call center staff or from a salesperson’s conversation with a customer. Such notes are inherently unstructured, and thus probably out of reach for your BI tool.
Or think of field service personnel who are responsible for repairs on anything from commercial HVAC systems to aircraft. They likely have an application to keep field service records, and chances are it has fields in which technicians can enter free-form, unstructured notes. Those notes contain data that’s valuable for historical trend analysis, but if they can’t be fed into a BI or analytics tool, their value is diminished.
Financial institutions face a similar problem. As they onboard new mortgage clients, they may want to gather various data about each client. Such data can then be used to find existing clients with similar demographic and financial profiles, enabling the firm to unearth marketing opportunities for the new client. That means pulling data from all the various documents involved in the mortgage process, many of them paper-based PDFs that are inherently unstructured.
Automating to get more from your BI tools
In the past, getting around these roadblocks required employing armies of data entry personnel to manually extract relevant data from unstructured documents and enter it into a downstream system, such as a database or spreadsheet, that holds it in structured form. Only at that point could the data be loaded into BI tools. Besides being time-consuming, such manual processes are error-prone and costly.
A better approach is to automate the process of turning unstructured data into a structured format, such as JSON or CSV, that can be fed to an analytics engine, data visualization tool or other BI platform.
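To make the idea concrete, here is a minimal Python sketch of turning free-form notes into JSON and CSV rows. It is an illustration only: the sample notes, the field names (ticket_id, unit, labor_hours) and the regular expressions are all assumptions invented for this example, and the simple pattern matching stands in for the trained models a real document processing platform would use.

```python
import csv
import io
import json
import re

# Hypothetical free-form technician notes, invented for illustration.
NOTES = [
    {"ticket_id": 101,
     "note": "Replaced compressor on unit HV-2231. Took 3.5 hours. Customer reported noise."},
    {"ticket_id": 102,
     "note": "Inspected unit HV-0917, no fault found. Took 1 hour."},
]

# Assumed patterns: a unit ID like "HV-2231" and a duration like "Took 3.5 hours".
UNIT_RE = re.compile(r"\bunit\s+([A-Z]{2}-\d{4})\b")
HOURS_RE = re.compile(r"\bTook\s+(\d+(?:\.\d+)?)\s+hours?\b")


def extract(record):
    """Pull structured fields out of one free-form note (None when a field is absent)."""
    note = record["note"]
    unit = UNIT_RE.search(note)
    hours = HOURS_RE.search(note)
    return {
        "ticket_id": record["ticket_id"],
        "unit": unit.group(1) if unit else None,
        "labor_hours": float(hours.group(1)) if hours else None,
    }


def to_json(rows):
    """Serialize the extracted rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ticket_id", "unit", "labor_hours"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [extract(r) for r in NOTES]
print(to_json(rows))
print(to_csv(rows))
```

Either output can then be loaded into a BI or visualization tool like any other structured source. Hand-written patterns like these break as soon as the wording of the notes varies, which is the gap that trained extraction models are meant to close.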
That’s the intelligent document processing approach Indico Data takes. The key is to combine artificial intelligence technologies such as deep learning and natural language processing with a platform that’s trained on enough data that it can understand virtually any kind of unstructured content, including audio and video.
The Indico Unstructured Data Platform is trained on some 500 million labeled data points. From there, customers use intuitive tools to label their own unstructured documents, indicating what sorts of data should be extracted.
It’s a simple matter, then, to label the verbatim notes field from a CRM or field service tool as ripe for extraction. Similarly, labeling around 200 documents associated with mortgage applications will be enough to give the Indico Data tool a sense of the type of data you want to extract. The list goes on and on.