Unless you’re a cave dweller, you’ve heard about GPT-3 and ChatGPT and how they’re going to change the way we work, think (and write term papers!). If you believe the hype, these technologies will change literally everything. My opinion? Undoubtedly, but we need thoughtful people building the right technology ecosystem and platforms around and on top of them to realize even a fraction of their promise. In the words of the great, fictional chaos theorist Ian Malcolm: “Your scientists were so preoccupied with whether they could, they didn’t stop to think if they should.”
On that note, Indico has a long and storied history of innovation in this area, going all the way back to our co-founder Alec Radford publishing the seminal generative AI paper on DCGANs, which remains one of the most cited papers in the space. That foundation paved the way for much of what we’re seeing today in generative AI: DALL-E, Stable Diffusion, and – of course – GPT-3.
Indico was also the first to market with a GPT-based custom model solution for the enterprise and has maintained a strong leadership position in the adoption and deployment of subsequent innovations such as BERT, RoBERTa, GPT-2, and more. We have a strong point of view on what’s happening in the market as it pertains to GPT-3 and ChatGPT, formed in the crucible of taking countless AI-based applications to production in enterprise companies.
This blog post is meant to inform the market, our customers, and our partners on what GPT-3 actually is. We want to give you the tools to separate hype from reality and let you know how we plan on continuing our history of innovation by embedding GPT-3 technology into the Indico solution.
What is GPT-3? What is ChatGPT?
GPT-3 isn’t all that different from the first Generative Pretrained Transformer model, published in 2018. The architecture is mostly the same, the training procedure is the same, and the style of problems it solves is the same. That said, GPT-3 is roughly 1,000x larger and was trained on 45 terabytes of text (almost twice the text in the entire Library of Congress).
During pre-training, the model is asked to solve (over and over again) a relatively simple-to-state problem: predict the next word in this ________. In doing so, the model learns parts of speech, it learns to reason about mathematical expressions, and – crucially – it learns that while “sandwich” has almost the right number of characters and is a noun, “sentence” is the answer we were looking for. Its enormous size and the vast volume of data it is trained on make it a “jack of all trades and master of none.”
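To make the next-word objective concrete, here is a deliberately tiny sketch. GPT does this with a transformer and billions of parameters; this toy version just counts which word tends to follow which in a small invented corpus, but the task being solved is the same shape.

```python
from collections import Counter, defaultdict

# Toy illustration of the pre-training objective: predict the next word.
# The corpus below is an invented example, not real training data.
corpus = (
    "predict the next word in this sentence . "
    "the model learns to predict the next word . "
    "predict the next word again ."
).split()

# Count which word follows each word in the training text.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the most frequent continuation seen in training."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("next"))  # "word" follows "next" every time in this corpus
```

A real language model replaces the frequency table with a neural network that conditions on the entire preceding context, which is what lets it pick “sentence” over “sandwich” rather than merely echoing common bigrams.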
ChatGPT is a language model built on top of the general-purpose capabilities of GPT-3.5. Human screeners were shown tens of thousands of GPT’s answers to various prompts and asked to rank their quality and classify them (for example, in cases where the language is abusive). That human-in-the-loop feedback produced a specific version of GPT that is much more adept at conversational English and question answering than its general-purpose forebear.
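The ranking step above can be sketched in a few lines. This is a simplified illustration of how human rankings become training signal (the prompt, answers, and ranking are invented examples); in practice the resulting preference pairs are used to train a reward model, which in turn steers the language model.

```python
# Toy illustration of turning a human ranking into preference data,
# in the spirit of the human feedback used to produce ChatGPT.
prompt = "What is a transformer?"
answers = [
    "A transformer is a neural network architecture based on attention.",
    "Transformers are big robots.",
    "It transforms things.",
]

# A human screener ranks the answers from best to worst (by index).
human_ranking = [0, 2, 1]

# Every (better, worse) pair becomes one preference example that a
# reward model can be trained on: score(chosen) > score(rejected).
preference_pairs = [
    (answers[better], answers[worse])
    for i, better in enumerate(human_ranking)
    for worse in human_ranking[i + 1:]
]

for chosen, rejected in preference_pairs:
    print("prefer:", chosen[:35], "| over:", rejected[:35])
```

A ranking of n answers yields n·(n−1)/2 such pairs, which is part of why a relatively modest amount of human effort goes a long way.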
Indico’s History of Large Language Model Innovation
Rewind the clock to 2015. Computer Vision was having its moment in the sun as new deep learning technologies were rapidly improving upon the transformative benchmark set by AlexNet in 2012. It would take a few more years for that technology to catch on in natural language processing, but the founders of Indico Data believed that such a moment was coming and founded a company ready to ride the wave when it crested.
And crest it did. In 2018, BERT (the first commercially usable model in the family of models that brought us GPT-3.5) was made open source, and almost simultaneously Indico co-founder Alec Radford went off to OpenAI, the company behind ChatGPT, and published the first in the GPT family of generative models. Within a matter of months, both large language models were part of the Indico platform, marking the first time enterprise knowledge workers could harness the power of transfer learning and augment their data entry workflows with fit-to-purpose artificial intelligence.
However, it wasn’t just the core technology that made this possible. These models are powerful to be sure, but they need guardrails and infrastructure to make the productivity they unlock reliable and scalable. As Indico co-founder Madison May says, “with great flexibility comes great responsibility” and the Indico engineering and product organizations have built a very safe and comfortable car on top of these powerful engines.
Hype vs. reality
GPT-3, like the technologies that preceded it, requires guardrails. The G in GPT stands for generative, and unlike BERT and RoBERTa, which are discriminative language models, GPT-3 does not have to stick to the source text. It is incentivized during training to provide answers, and provide answers it will. In the best of cases, the generated responses are correct. In the worst of cases, they are model hallucinations: plausible fabrications that are convincing, especially to someone asking about a topic they are not very familiar with. Try asking GPT how heavy the fourth generation of quarks is. (Spoiler alert: there are only three generations of quarks.)
The model is also expensive to run. Its memory footprint is so large that it cannot fit on a single processor, so just having it available to answer questions is costly, both in terms of compute and the DevOps expertise required to keep it up and performant. For that reason, it is currently available only as SaaS, which may make it impossible to employ on tasks that require access to sensitive data.
And, as I remarked before, this model is a language generalist. Tuning it to your exact needs, in the way that one can with BERT or RoBERTa, is infeasible at the moment (at least without significant expense and knowledge).
Indico and generative AI in the enterprise
All the problems mentioned aside, this is a powerful technology and Indico has a long track record of building products that make it possible for the business user to take such technologies to production safely, reliably, and cost-effectively. We were the first to bring the power of GPT (1 and 2), BERT, and RoBERTa to the market in a user-friendly way. While the enterprise was building out expensive data and analytics organizations that only got their work to production 40% of the time, we were busy building a platform that allows the business to bring AI to production 97% of the time.
We are on the cusp of doing the exact same thing for GPT-3. In the months ahead, you will see our platform incorporate this latest large language model in ways that optimize its strengths and mitigate its weaknesses. Querying these models productively is a skill, and prompts can be very brittle. Our model design interface, Teach, will enable you to build reliable prompts and bootstrap production AI faster than ever. We will harness GPT-3 with our Explain dashboard so you can understand, concretely and numerically, how well the model understands your content and its context. Furthermore, we will use these models as guides to building fit-to-purpose, economically scalable production workflows that expose decision-ready data to your knowledge workers in Indico Review.
For those of you looking to hear more on GPT-3, check out our latest episode of Unstructured Unlocked with Madison and Tom, recorded live at Insurtech Insights London.
Listen to the full podcast here: Unstructured Unlocked episode 12 with Tom Wilde and Madison May
Check out the full Unstructured Unlocked podcast on your favorite platform, including: