When measuring the classification accuracy on the training set, can I take a sample of it or should I use it entirely? – Indico Data



Usually inference is the fast part, so you can afford to measure accuracy on the whole training set once per epoch. The problem with measuring on a sample is that, for most useful purposes (like tracking the divergence between train and test accuracy), sampling injects so much noise into the signal that the number becomes much less useful. Whether sampling is acceptable depends more on the overall size of your dataset than on the sampling ratio. If you've got 100M examples, then taking a class-balanced random sample of 10M is pretty reasonable. If you've got 10k examples, then taking a sample of 1,000 is probably going to mess everything up.
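To make the noise argument concrete, here is a minimal sketch (not the author's code) that computes the binomial standard error of an accuracy estimate, with an optional finite-population correction because the sample is drawn without replacement from a fixed training set, plus a helper for drawing a sample per class. The function names are hypothetical, and "class-balanced" is assumed here to mean a stratified sample that keeps each class's share of the data; a sample with equal counts per class would measure a different quantity.

```python
import math
import random
from collections import defaultdict


def accuracy_std_error(accuracy, n, population=None):
    """Standard error of an accuracy measured on n randomly sampled examples.

    Uses the binomial formula sqrt(p * (1 - p) / n). If `population` (the full
    training-set size) is given, applies the finite-population correction,
    since the sample is drawn without replacement from a fixed set.
    """
    se = math.sqrt(accuracy * (1.0 - accuracy) / n)
    if population is not None and population > 1:
        se *= math.sqrt((population - n) / (population - 1))
    return se


def stratified_sample(labels, fraction, seed=0):
    """Return sorted indices of a random sample that keeps each class's share.

    Hypothetical helper: "class-balanced" is read here as stratified sampling.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for idxs in by_class.values():
        # Keep at least one example per class so rare classes are not dropped.
        k = min(len(idxs), max(1, round(len(idxs) * fraction)))
        sample.extend(rng.sample(idxs, k))
    return sorted(sample)


if __name__ == "__main__":
    # The two scenarios from the answer, assuming the model is ~90% accurate.
    for population, n in [(10_000, 1_000), (100_000_000, 10_000_000)]:
        se = accuracy_std_error(0.90, n, population)
        print(f"{n:,} of {population:,}: 0.90 +/- {se:.4f}")

    # Stratified 10% sample of a 90/10 label split: class shares are preserved.
    labels = [0] * 900 + [1] * 100
    idx = stratified_sample(labels, fraction=0.1)
    print(len(idx), sum(labels[i] for i in idx))
```

At roughly 90% accuracy, a 1,000-example sample out of 10k has a standard error of about 0.009, close to a full percentage point, which can easily swamp a train/test gap of a point or two. A 10M-example sample out of 100M has a standard error of about 0.0001. Both use the same 10% ratio; the spread shrinks with the square root of the absolute sample size, which is why dataset size matters more than the ratio.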

Above all, think about why you’re measuring accuracy on your training set. In most cases when I see someone doing this, they don’t have a great reason for doing so beyond wanting higher accuracy numbers.

View original question on Quora

Follow Slater on Quora
