

When measuring the classification accuracy on the training set, can I take a sample of it or should I use it entirely?

July 26, 2017 | Ask Slater


Usually inference is the fast part, so you measure on the whole dataset once per epoch. The problem with measuring on a sample is that, for most useful things (like measuring test/train divergence), sampling injects so much noise into the signal that the measurement becomes much less useful. Whether sampling is acceptable depends more on the overall size of your dataset than on the sampling ratio. If you've got 100 million examples, then taking a class-balanced random sample of 10 million is pretty reasonable. If you've got 10k examples, then taking a sample of 1,000 is probably going to mess everything up.
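
To put rough numbers on that noise, here is a minimal sketch that treats each prediction as an independent correct/incorrect draw and computes the binomial standard error of the measured accuracy. The function name `accuracy_standard_error` and the 90% accuracy figure are assumptions made for illustration, not something from the original answer, and the sketch ignores the finite-population correction that applies when sampling without replacement.

```python
import math


def accuracy_standard_error(accuracy: float, sample_size: int) -> float:
    """Binomial standard error of an accuracy measured on `sample_size` examples.

    Assumes examples are drawn independently; this is an approximation,
    not an exact model of sampling from a fixed training set.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / sample_size)


if __name__ == "__main__":
    # Hypothetical true accuracy of 90%, evaluated on samples of different sizes.
    for n in (1_000, 10_000, 10_000_000):
        se = accuracy_standard_error(0.90, n)
        print(f"n={n:>10,}  standard error = {100 * se:.3f} percentage points")
```

Under these assumptions, a 1,000-example sample gives a standard error of just under one percentage point, which is often comparable to the test/train gap you are trying to track, while a 10-million-example sample brings it down to about a hundredth of a percentage point. Both cases sample 10% of the data, so the noise depends on the absolute count rather than the ratio.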

Above all, think about why you’re measuring accuracy on your training set. In most cases when I see someone doing this, they don’t have a great reason for doing so beyond wanting higher accuracy numbers.

View original question on Quora >

