In the previous posts in this product feature series, we looked at how you can easily train custom models with Teach and delved into their behaviour with Review’s AI explainability dashboard. But what if you’re not yet sure what your labeling strategy should be? You’ll need to explore your data further with clustering techniques to find natural relationships that make your model easier to train. Indico’s Discover feature makes this process straightforward.
How it works
Once you’ve uploaded your dataset, our unsupervised text analysis algorithm automatically processes it to build contextual relationships. The results are visualized as an interactive t-SNE plot, where you can lasso a region and drill down into word clusters to explore your data and better inform your labeling strategy.
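To make the lasso step concrete, here is a minimal sketch of how a lasso selection over a 2D plot could work. It is not Discover’s actual implementation: the coordinates are made-up stand-ins for t-SNE output, and the names `point_in_polygon` and `lasso_select` are hypothetical. The idea is simply that a lasso is a polygon, and a document is selected if its point falls inside it.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a horizontal ray to the right of the point
    and flip `inside` each time the ray crosses a polygon edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lasso_select(coords: Dict[str, Point], lasso: List[Point]) -> List[str]:
    """Return the ids of all documents whose 2D position lies inside the lasso."""
    return sorted(doc for doc, xy in coords.items() if point_in_polygon(xy, lasso))


# Hypothetical 2D positions, standing in for what a t-SNE projection might give.
coords = {
    "invoice_01": (1.0, 1.2),
    "invoice_02": (1.4, 0.8),
    "resume_01": (6.0, 5.5),
    "resume_02": (5.6, 6.1),
}

# A lasso drawn around the lower-left cluster (assumed vertices).
lasso = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]

selected = lasso_select(coords, lasso)
print(selected)
```

In this toy example the lasso encloses only the two invoice documents, which is the kind of cluster you would then drill into or label as a group.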
From there, you can start adding labels directly to your clusters, or to specific documents that the model pulls up when you select clusters. After you’ve labeled enough examples, our model will start predicting labels for the remaining data points; you can accept, edit, or remove these predictions to help speed up the labeling process.
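As a rough illustration of this label-then-suggest loop, the sketch below uses a simple nearest-centroid rule: each unlabeled point gets the label whose labeled examples it sits closest to. This is a swapped-in technique for illustration only, not the model Discover actually uses, and the data, `suggest_labels`, and `apply_review` are all hypothetical.

```python
import math
from collections import defaultdict


def centroids(labeled):
    """Average the 2D positions of the labeled examples for each label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in labeled:
        entry = sums[label]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def suggest_labels(labeled, unlabeled):
    """Suggest, for each unlabeled document, the label with the nearest centroid."""
    cents = centroids(labeled)
    return {
        doc_id: min(cents, key=lambda label: math.dist(point, cents[label]))
        for doc_id, point in unlabeled.items()
    }


def apply_review(suggestions, decisions):
    """Apply reviewer decisions: 'accept' keeps the suggestion,
    'edit' replaces it with the given label, 'remove' drops it."""
    final = {}
    for doc_id, label in suggestions.items():
        action, new_label = decisions.get(doc_id, ("accept", None))
        if action == "accept":
            final[doc_id] = label
        elif action == "edit":
            final[doc_id] = new_label
        # 'remove' leaves the document unlabeled
    return final


# Hypothetical labeled examples: (2D position, label).
labeled = [
    ((1.0, 1.2), "invoice"),
    ((1.4, 0.8), "invoice"),
    ((6.0, 5.5), "resume"),
]
# Hypothetical documents that have not been labeled yet.
unlabeled = {"doc_a": (1.1, 0.9), "doc_b": (5.8, 6.0), "doc_c": (3.4, 3.3)}

suggestions = suggest_labels(labeled, unlabeled)

# The reviewer accepts one suggestion, renames another, and rejects the third.
decisions = {"doc_b": ("edit", "cv"), "doc_c": ("remove", None)}
final_labels = apply_review(suggestions, decisions)
print(final_labels)
```

The point of the review step is that suggestions stay suggestions: nothing becomes a label until you have accepted or edited it.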
You can also use fuzzy search to find related terms and documents. Note that this type of search works on context rather than keyword matching, so you won’t lose out on important information that could play a key role in your final model’s results.
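To see why context-based search can surface documents that keyword matching would miss, consider this toy sketch. It assumes each word has a vector capturing its meaning (the three-dimensional `WORD_VECTORS` below are invented for illustration; a real system would learn them from data) and ranks documents by cosine similarity to the query. None of this reflects Discover’s internals.

```python
import math

# Hypothetical word vectors: related words are given similar directions.
WORD_VECTORS = {
    "invoice": (0.90, 0.10, 0.00),
    "bill": (0.85, 0.15, 0.05),
    "payment": (0.80, 0.20, 0.10),
    "resume": (0.05, 0.90, 0.10),
    "hiring": (0.10, 0.85, 0.20),
}


def embed(text):
    """Average the vectors of the known words in the text (unknown words are skipped)."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, docs):
    """Rank documents by similarity of meaning to the query, best match first."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)


docs = ["hiring resume received", "bill payment overdue"]
results = search("invoice", docs)
print(results[0])
```

Here the top result for the query "invoice" is a document that never contains the word "invoice", which a pure keyword search would have skipped.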
Finally, save your labels and segmented data as an “overlay” that you can easily access for the rest of your model development journey, both in Teach (for a more robust labeling system and for training your model) and in Review (for digging deeper into how your trained model works and tweaking it).
And…that’s a wrap for our product feature series for now — stay tuned as we build out more product features to fulfill our mission of democratizing machine learning!
Want to see our Discover module for yourself? Click here to set up your account.