
Understanding Indico’s Staggered Loop

September 2, 2022 | Announcements, Machine Learning


If you’re a seasoned Indico user, you may have encountered the term “Staggered Loop” before. Even if the name is familiar, you may not be sure exactly what this functionality does or how it benefits users. Let’s dive into the staggeringly powerful feature that helps the Indico platform stand out from its competitors.

What is Staggered Loop?

In short, staggered loop is a machine learning strategy that “reuses” your data to produce better model output. With staggered loop, your model is frequently retrained on your data from Review, which keeps the model as up-to-date as possible without requiring additional hands-on time from humans.

Staggered loop relies on a resource you already have: your production data. At Indico, production data is the data generated when users accept or correct a model’s predictions and classifications in Review. Retraining on this data improves your underlying models without requiring human labelers to label more documents. As model quality increases, the manual work needed to process each document decreases.

Why does retraining improve model output? Because it combats data drift, one of the barriers that prevent models from performing at their best over time.

Combatting Data Drift

Data drift occurs when the documents you process begin to look different from your original training data, which causes model performance to decline. The change can be in content, format, or both, and it can build up over weeks, months, or years. As your data changes, your model’s predictions become less accurate, because the model is making assumptions based on out-of-date examples. Think of how much your business can change over the course of a few years; your data likely looks far different today than it did in 2019. Staggered loop keeps outdated data from dictating how your unstructured data is processed.

It does this by adding your human-reviewed production data from Review back into your training set, so the model learns from the most recent labels and corrections. Because the model no longer relies solely on potentially outdated training data, it stays current without the human effort that labeling new training data would require.
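To make the idea of feeding Review data back into the training set more concrete, here is a minimal sketch in Python. It is not Indico’s implementation: the `Example` and `ReviewRecord` classes, the `(start, end, label)` span format, and the `merge_review_data` function are all hypothetical names chosen for illustration. The sketch assumes each reviewed document carries its final labels (accepted predictions plus corrections) and tags every merged document as partial.

```python
from dataclasses import dataclass

# Hypothetical span format (an assumption, not Indico's schema): (start_char, end_char, label)
Span = tuple[int, int, str]


@dataclass
class Example:
    doc_id: str
    text: str
    labels: list[Span]
    partial: bool = False  # True when labels may not be exhaustive


@dataclass
class ReviewRecord:
    doc_id: str
    text: str
    final_labels: list[Span]  # predictions the reviewer accepted, plus any corrections


def merge_review_data(training: list[Example], reviewed: list[ReviewRecord]) -> list[Example]:
    """Add reviewed production documents to the training set, tagged as partial (hypothetical)."""
    merged = {ex.doc_id: ex for ex in training}
    for rec in reviewed:
        existing = merged.get(rec.doc_id)
        if existing is not None and not existing.partial:
            # Keep exhaustively labeled training examples rather than overwrite them
            continue
        # Newer review output replaces any older partial copy of the same document
        merged[rec.doc_id] = Example(rec.doc_id, rec.text, list(rec.final_labels), partial=True)
    return list(merged.values())
```

Keeping an exhaustively labeled original instead of overwriting it with a Review copy is one possible design choice under these assumptions; a real system might handle conflicts differently.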

If you have an eye for detail, you may have noticed the “partial” tag on your examples list page. This tag reflects a key part of what makes staggered loop so powerful: it accounts for potential faults in the data. Users label production data differently than they label training data; for example, they don’t label exhaustively in Review. For that reason, production data added to your training set is tagged as “partial,” meaning the labels in that document are not exhaustive and many valid instances of a label may be unmarked. The model treats partial data with less confidence than fully labeled training data so that it doesn’t negatively impact the model.
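The post does not say exactly how the model uses partial data “with less confidence,” so the following is only an illustrative sketch of one common approach, not Indico’s method. It assumes token-level labels, a hypothetical background class `"O"`, and a hypothetical `partial_weight` parameter. In a fully labeled document, unlabeled tokens count as confident negatives. In a partial document, an unlabeled token might be a valid instance nobody marked, so it is left out of the loss, and labeled tokens count less.

```python
import math

BACKGROUND = "O"  # hypothetical "no label" class (assumption)


def token_weights(tags: list[str], partial: bool, partial_weight: float = 0.5) -> list[float]:
    """Per-token loss weights; an illustrative scheme, not Indico's actual method."""
    if not partial:
        # Exhaustive labels: every token, including background, is trusted
        return [1.0] * len(tags)
    # Partial labels: unlabeled tokens are ambiguous (weight 0),
    # labeled tokens are trusted but down-weighted
    return [0.0 if t == BACKGROUND else partial_weight for t in tags]


def weighted_nll(probs: list[dict[str, float]], tags: list[str], weights: list[float]) -> float:
    """Weighted negative log-likelihood, summed (not averaged) so that
    down-weighted partial tokens contribute less to a batch total."""
    return sum(-w * math.log(p[t]) for p, t, w in zip(probs, tags, weights) if w > 0)


# Usage: one partial document with three tokens, only the middle one labeled
tags = ["O", "number", "O"]
w = token_weights(tags, partial=True)  # [0.0, 0.5, 0.0]
probs = [{"O": 0.9, "number": 0.1}, {"O": 0.2, "number": 0.8}, {"O": 0.5, "number": 0.5}]
loss = weighted_nll(probs, tags, w)
```

Skipping zero-weight tokens also avoids taking the log of probabilities that never enter the loss.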

Why Staggered Loop?

The thoughtful design of this functionality does more than reduce hands-on time. The “staggered” part of staggered loop ensures that your production data is high quality and that retraining the model won’t disrupt your existing workflow. It does this by following a release cadence commonly used by software companies, one that anticipates imperfections.

With staggered loop, retrained models first enter a staging environment. Only once they are shown to work there are they promoted to your Indico instance. This lets you vet changes to your model before going all in!
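The staging step can be pictured as a simple promotion gate. The sketch below is a hypothetical illustration, not Indico’s staging logic: the `Model` class, the `accuracy` metric, the held-out set, and the `min_gain` threshold are all assumptions. A retrained candidate is scored against the current model on the same held-out documents and is promoted only if it does at least as well.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Model:
    name: str
    predict: Callable[[str], str]  # hypothetical: maps a document to a predicted class


def accuracy(model: Model, holdout: list[tuple[str, str]]) -> float:
    """Fraction of held-out (document, expected_class) pairs the model gets right."""
    if not holdout:
        raise ValueError("holdout set must not be empty")
    correct = sum(model.predict(doc) == expected for doc, expected in holdout)
    return correct / len(holdout)


def promote_if_better(current: Model, candidate: Model,
                      holdout: list[tuple[str, str]], min_gain: float = 0.0) -> Model:
    """Return the model that should serve production after staging.

    With min_gain=0.0, a tie promotes the candidate, on the assumption that
    a model trained on fresher data is preferable at equal accuracy.
    """
    if accuracy(candidate, holdout) >= accuracy(current, holdout) + min_gain:
        return candidate
    return current
```

Raising `min_gain` makes the gate more conservative, so a candidate must show a clear improvement before it replaces the model in production.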

By contrast, with a strategy like continuous learning, bad data could degrade the quality of your model. In continuous learning, catching low-quality data would be the users’ responsibility, and removing that data and restoring the model to its original quality would be time-consuming, if it’s possible at all.

Want to learn how to add this powerful functionality to your instance? You can access staggered loop through our solutions toolkit. Get in touch with your Indico contact to learn more.

Want to learn even more about staggered loop? View our platform documentation and video explanation on our knowledge base to get the full scoop.
