The first line of a 2017 paper written by a group of 11 experts at Microsoft Research succinctly articulated a problem that too many companies face to this day: “The current processes for building machine learning systems require practitioners with deep knowledge of machine learning.”
But the paper also proposed a solution to the complexity of machine learning, which the authors called “machine teaching.”
Machine teaching: a better way to build automation models
“We believe that in order to meet this growing demand for machine learning systems we must significantly increase the number of individuals that can teach machines. We postulate that we can achieve this goal by making the process of teaching machines easy, fast and above all, universally accessible,” the researchers wrote.
As it turns out, they were right. Today, solutions do exist that make it fast and easy to build powerful machine learning models that automate processes involving documents and data, including unstructured content. And these intelligent automation solutions – such as the Indico Unstructured Data Platform – are most definitely accessible. In fact, they’re designed for employees who have no knowledge of machine learning whatsoever but deep knowledge of the process to be automated.
In a sense, that turns the traditional approach to building machine learning models on its head. Previously, when the business had a process it wanted to automate, it would take its requirements to a data science team. The data scientists would try to build a machine learning model that extracted key elements from documents or images and drove automation for that process. That could easily take months, because building such models from scratch is complicated, even for professional data scientists. And the model would almost certainly need fine-tuning before it went into production, because data scientists are experts in data generally, not in your business’s data specifically. That meant more back and forth, and more delays.
Like advanced coding tools for AI
I think of the progression of machine learning model-building as similar to what we witnessed with writing software. When I first learned to program, I wrote Java code in a notepad file, ran a compiler from the command line, and held my breath while waiting to see if the code did what I intended. If it didn’t, I had to review it line by line to troubleshoot it.
System.out.println("gET me oUt oF hERe!!!!");
Today we’ve got tools that tell you your code is flawed as soon as you write it. That is much more productive (albeit less challenging), and it makes complex programs easier to write.
Machine teaching offers the same sort of benefit for building intelligent document processing models. Your process experts – the people who perform the task day to day – are the ones who “teach” the model. They use simple tools to label documents, showing the model which parts of a document are important. And just as developer tools tell devs when they’ve missed a semicolon, machine teaching tools show users where the model is learning well and where it needs help.
In effect, this mirrors the work employees already do when processing, say, an invoice: copying values such as the name, amount and invoice number from the invoice and pasting them into a downstream ERP or other processing system, a process also known as “see and key.” With machine teaching, instead of cutting and pasting, they use labeling tools to teach a model to do the job for them.
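To make “labeling” concrete, here is a minimal Java sketch. It assumes that a label is simply a field name attached to a character span of the document text. The `Label` record, the `labelFirst` and `extract` helpers, and the sample invoice are all hypothetical illustrations; they are not part of the Indico platform or its API. The sketch only shows the kind of data a labeler produces: the same values a person would otherwise key into an ERP, now captured as examples a model could learn from.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InvoiceLabels {
    // Hypothetical label: a field name plus the character span [start, end) it covers.
    record Label(String field, int start, int end) {}

    // Labels the first occurrence of `value` in the text as `field`,
    // mimicking a person highlighting that value in a labeling tool.
    static Label labelFirst(String text, String field, String value) {
        int start = text.indexOf(value);
        if (start < 0) {
            throw new IllegalArgumentException("Value not found in document: " + value);
        }
        return new Label(field, start, start + value.length());
    }

    // Reads the labeled spans back out as field -> value pairs, i.e. the values
    // a person would otherwise "see and key" into a downstream system.
    static Map<String, String> extract(String text, List<Label> labels) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Label label : labels) {
            values.put(label.field(), text.substring(label.start(), label.end()));
        }
        return values;
    }

    public static void main(String[] args) {
        String invoice = "Invoice #INV-1042\nBill to: Acme Corp\nAmount due: $1,250.00";
        List<Label> labels = List.of(
                labelFirst(invoice, "invoice_number", "INV-1042"),
                labelFirst(invoice, "customer", "Acme Corp"),
                labelFirst(invoice, "amount", "$1,250.00"));
        // Each (document, labels) pair would serve as one training example for a model.
        System.out.println(extract(invoice, labels));
        // prints {invoice_number=INV-1042, customer=Acme Corp, amount=$1,250.00}
    }
}
```

A real platform would train a model on many such labeled documents so it can find these fields on invoices it has never seen. Here the labels are only read back, to show what the teaching step captures.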
Change made easy: Staggered Loop Training
Going a step further, Indico Data recently released a new capability that addresses another long-standing problem in machine learning: change. You can train a model on a process as it exists today, but change is inevitable. Vendors change their invoice formats, regulations change, new documents crop up, and so on. Going back to your data science team to update models for every one of these changes is time-consuming, expensive and impractical, especially because machine learning operations (MLOps) is still a nascent field, with best practices only beginning to emerge and tooling that remains fragmented.
To address this, Indico Data developed Staggered Loop Training. It is intended for “human in the loop” processes, in which an employee either performs one or more steps of the process or reviews an automated process to ensure accuracy.
In such processes, whenever an employee finds an exception, they simply correct it. The Indico Data platform then learns from that correction and updates the model accordingly – on its own, within guardrails the user puts in place. And it does so without changing how the model handles the content and data that haven’t changed. Think of it as continuing education for process automation models.
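The guardrail idea can be sketched in Java, with some loud assumptions. The “model” is any function from document text to an extracted value. “Retraining” is faked by layering exact-match overrides on top of the current model. The guardrails are a hypothetical minimum number of corrections plus a maximum allowed accuracy drop on a fixed regression set of documents that haven’t changed. None of these names (`Guardrails`, `Correction`, `maybeUpdate`) come from Indico, and this is not how Staggered Loop Training is implemented. The sketch only illustrates the general pattern of accepting an updated model when it leaves previously correct results intact.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CorrectionLoop {
    // Hypothetical user-configured guardrails (not actual Indico settings).
    record Guardrails(int minCorrections, double maxRegressionDrop) {}

    // A reviewer's fix: the document and the value they say is correct.
    record Correction(String document, String correctValue) {}

    // Fraction of documents in `examples` for which the model returns the expected value.
    static double accuracy(Function<String, String> model, Map<String, String> examples) {
        if (examples.isEmpty()) {
            return 1.0;
        }
        long correct = examples.entrySet().stream()
                .filter(e -> e.getValue().equals(model.apply(e.getKey())))
                .count();
        return (double) correct / examples.size();
    }

    // Toy stand-in for retraining: corrected documents get the reviewer's value,
    // everything else falls through to the current model unchanged.
    static Function<String, String> retrain(Function<String, String> current, List<Correction> corrections) {
        Map<String, String> overrides = new HashMap<>();
        for (Correction c : corrections) {
            overrides.put(c.document(), c.correctValue());
        }
        return doc -> overrides.containsKey(doc) ? overrides.get(doc) : current.apply(doc);
    }

    // Accepts the retrained candidate only if enough corrections have accumulated and
    // its accuracy on the regression set drops by no more than the allowed amount.
    static Function<String, String> maybeUpdate(Function<String, String> current,
                                                List<Correction> corrections,
                                                Map<String, String> regressionSet,
                                                Guardrails guardrails) {
        if (corrections.size() < guardrails.minCorrections()) {
            return current;
        }
        Function<String, String> candidate = retrain(current, corrections);
        double drop = accuracy(current, regressionSet) - accuracy(candidate, regressionSet);
        return drop <= guardrails.maxRegressionDrop() ? candidate : current;
    }

    public static void main(String[] args) {
        // Current model: recognizes the customer only on the old invoice layout.
        Function<String, String> current = doc -> doc.startsWith("Bill to: Acme") ? "Acme Corp" : "unknown";
        // Documents that haven't changed, with the values the model already gets right.
        Map<String, String> regressionSet = Map.of("Bill to: Acme", "Acme Corp");
        // A reviewer corrects an exception on a new vendor's layout.
        List<Correction> corrections = List.of(new Correction("Customer: Globex", "Globex"));

        Function<String, String> updated = maybeUpdate(current, corrections, regressionSet, new Guardrails(1, 0.0));
        System.out.println(updated.apply("Customer: Globex")); // prints Globex
        System.out.println(updated.apply("Bill to: Acme"));    // prints Acme Corp
    }
}
```

Checking each candidate against a regression set is one common way to honor “don’t break what already works.” A production system would retrain a real model and use a much larger, representative set of unchanged documents.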
Staggered Loop Training also addresses a common fear about machine learning and artificial intelligence in general: that intelligent machines will take over all of our jobs. Staggered Loop is a great example of a far more likely reality, in which humans and machines work together. Employees teach machines to be smarter, freeing themselves for more valuable, less mundane work.
To learn more about Staggered Loop Training and other new features, check out Indico 5 or schedule an in-depth demo. You can also register for a free trial to test the platform for yourself or get in touch with any questions.