Welcome to the ultimate resource for comparing large language models. Here, we meticulously analyze and present the accuracy, speed, and cost-effectiveness of leading models across critical tasks such as information extraction, clause classification, and summarization.
Indico Data has been a guiding force in the AI industry since its inception, consistently emphasizing practical AI applications and real customer outcomes amidst a landscape often clouded by overhype. Indico was the first in the industry to deploy a large language model-based application inside the enterprise and the first to integrate explainability and auditability directly into its products, setting a standard for transparency and trust.
While the vast majority of LLM benchmarking is focused on chatbot-related tasks, Indico recognized the need to understand the performance of large language models for more deterministic tasks such as extraction and classification, and further to understand the performance and costs based on assumptions related to context length and task complexity.
Indico Data runs a monthly benchmarking exercise across providers (Llama, Azure OpenAI, Google, AWS Bedrock, and the Indico-trained discriminative standard language models RoBERTa and DeBERTa), datasets (e.g. CORD and CUAD), and capabilities (text classification, key information extraction, and generative summarization). The table below ranks the accuracy (F1 score) of these models for each capability, averaged over datasets and prompt styles. The "Accuracy" page contains the same information at a much more granular level of detail.
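As a rough illustration of how a per-capability ranking like this could be computed, the sketch below averages F1 scores over datasets and prompt styles and sorts models within each capability. The row layout, the model and dataset names, and the scores are all hypothetical placeholders, not benchmark results, and this is not necessarily how Indico computes its table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (model, capability, dataset, prompt_style, f1).
# All names and values are placeholders, not benchmark results.
rows = [
    ("model_a", "classification", "dataset_1", "zero_shot", 0.80),
    ("model_a", "classification", "dataset_2", "few_shot", 0.90),
    ("model_b", "classification", "dataset_1", "zero_shot", 0.70),
    ("model_b", "classification", "dataset_2", "few_shot", 0.75),
    ("model_a", "extraction", "dataset_3", "zero_shot", 0.60),
    ("model_b", "extraction", "dataset_3", "zero_shot", 0.65),
]


def rank_by_capability(rows):
    """Average F1 over datasets and prompt styles, then rank models per capability."""
    scores = defaultdict(list)
    for model, capability, _dataset, _prompt_style, f1 in rows:
        scores[(capability, model)].append(f1)

    ranking = defaultdict(list)
    for (capability, model), values in scores.items():
        ranking[capability].append((model, mean(values)))

    # Highest average F1 first within each capability.
    for capability in ranking:
        ranking[capability].sort(key=lambda pair: pair[1], reverse=True)
    return dict(ranking)


ranking = rank_by_capability(rows)
```

A simple mean weights every dataset and prompt style equally; a weighted average would be a reasonable alternative if some datasets matter more for a given use case.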
Below are the full details of this month's benchmarking run across models, capabilities, and prompt styles. This information is meant to help you choose the best model for a given task. For example, if missed information is expensive in your process, you should choose a model with high recall.
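To make the recall point concrete, here is a minimal sketch of the standard definitions of precision, recall, and F1 computed from counts of true positives, false positives, and false negatives. The function name and the example counts are illustrative assumptions, not part of the benchmark.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) from raw counts, using 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts: the model finds 8 of 16 true fields and adds 2 wrong ones.
precision, recall, f1 = precision_recall_f1(8, 2, 8)
```

In this made-up case precision is high while recall is only one half, so a process where missed fields are costly would favor a different model even if the two had similar F1 scores.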
Green means better than average, red means worse than average, and orange means average.
The size indicates how far above or below average the model is.
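The color and size rule described above could be sketched as follows. The `tolerance` band that counts as "average" is an assumption made for this example (the page does not say how close to the mean a score must be to appear orange), and the scores are placeholders.

```python
from statistics import mean


def color_and_size(scores, tolerance=0.01):
    """Map each model to (color, size) based on its distance from the mean score.

    `tolerance` is a hypothetical cutoff for treating a score as average.
    """
    average = mean(scores.values())
    result = {}
    for model, score in scores.items():
        delta = score - average
        if abs(delta) <= tolerance:
            color = "orange"
        elif delta > 0:
            color = "green"
        else:
            color = "red"
        # Size grows with the absolute distance from the average.
        result[model] = (color, abs(delta))
    return result


# Placeholder scores, not benchmark results.
styled = color_and_size({"model_a": 0.9, "model_b": 0.7, "model_c": 0.8})
```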
Gain insights into not just how well each model performs, but how fast and cost-efficiently they do it.
The plots below show the tradeoffs between accuracy (F1 score) and cost, and between accuracy and response time, by model for all capabilities, datasets, and prompt styles.
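One common way to read an accuracy-versus-cost plot is to look for the models that no other model beats on both axes at once (a Pareto front). The sketch below is an illustration of that idea, not a method described by Indico; the model names, F1 scores, and costs are hypothetical placeholders.

```python
def pareto_front(models):
    """Return names of models not dominated on (higher F1, lower cost).

    `models` maps a model name to an (f1, cost) tuple.
    """
    front = []
    for name, (f1, cost) in models.items():
        dominated = any(
            other_f1 >= f1
            and other_cost <= cost
            and (other_f1 > f1 or other_cost < cost)
            for other, (other_f1, other_cost) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Placeholder values, not benchmark results: name -> (f1, cost per document).
candidates = {
    "model_a": (0.9, 10.0),
    "model_b": (0.8, 2.0),
    "model_c": (0.7, 5.0),
}
front = pareto_front(candidates)
```

Here `model_c` drops out because `model_b` is both more accurate and cheaper, while `model_a` and `model_b` remain as genuine tradeoffs. The same function applies to response time by passing latency in place of cost.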
COnsolidated Receipt Dataset for post-OCR parsing.
Original source: https://github.com/clovaai/cord
From the authors:
…The dataset consists of thousands of Indonesian receipts, which contains images and box/text annotations for OCR, and multi-level semantic labels for parsing...
Non-disclosure agreements, published by Applica AI.
Original source: https://github.com/applicaai/kleister-nda
From the authors:
Extract the information from NDAs (Non-Disclosure Agreements) about the involved parties, jurisdiction, contract term, etc...
Charity financial reports, published by Applica AI.
Original source: https://github.com/applicaai/kleister-charity
From the authors:
The goal of this task is to retrieve charity address (but not other addresses), charity number, charity name and its annual income and spending in GBP in PDF files published by British charities...
Contract Understanding Atticus Dataset
Original source: https://www.atticusprojectai.org/cuad
From the authors:
...a corpus of 13,000+ labels in 510 commercial legal contracts that have been manually labeled under the supervision of experienced lawyers to identify 41 types of legal clauses that are considered important in contract review...
ResourceContracts is a repository of publicly available oil, gas, and mining contracts.
Original source: https://www.resourcecontracts.org/
Indico retrieved hundreds of contracts from this repository and labeled key information including names, organizations, section orders, and full clauses (used in this classification task).
Legal language classification into three classes: Entailment, Contradiction or NotMentioned
Original source: https://stanfordnlp.github.io/contract-nli/
From the authors:
ContractNLI is the first dataset to utilize NLI for contracts and is also the largest corpus of annotated contracts (as of September 2021)...
Large language models are the driving force behind the generative AI boom of 2023. However, they've been around for a while - and we know a thing or two about them.
Since our founding in 2014, Indico has been at the forefront of innovation in unstructured data and intelligent document processing, with a leadership team that brings years of experience and deep expertise in artificial intelligence and machine learning-powered solutions.