Watch Tom Wilde, CEO at Indico Data, alongside Michelle Gouveia, VP at Sandbox Insurtech Ventures, in season 2, episode 18 of Unstructured Unlocked with Peter Mansfield, Partner at Reynolds Porter Chamberlain and host of the Insurance Covered podcast.
Peter Mansfield: So we’re here today to discuss data. And I’m gonna confront the elephant in the room right at the outset, Tom, which is that I know there are some people, mostly scientists, to be honest, who have very strong views, and I mean very, very strong views, on whether the word data is singular or plural, on the basis that it is technically the plural of the word datum.
Now Tom, I obviously don’t want to annoy you during the course of this conversation by getting it wrong every single time.
So which camp do you fall in?
Is data singular or plural for you?
Tom Wilde: I’ll sort of defer to where I think the market has landed on this: data is an all-encompassing term, covering both the datum, as you describe it, and data as a sort of collective.
Peter Mansfield: Brilliant, and I’m delighted you said that, because I have always regarded data as a collective noun that can be either singular or plural, depending on the context.
Anyway, before we start talking about what data is, or are, I think it might be sensible to remind ourselves that insurance is all about making decisions, and I’ve heard you talk about something that you describe as the decision economy.
So could you expand upon that, please, and explain why it’s particularly relevant to insurance?
Tom Wilde: I think more recently it became really clear to us at Indico that there is this decision economy, and this notion of a decision supply chain.
But when you think about insurers for a moment, they really are in the business of making decisions. What defines the difference between success and failure, between profitability and not, is their ability to make quality decisions and to do that sustainably.
Should I underwrite this risk?
How should I price this risk?
Is this claim covered by our policy?
These are all core to the activities an insurer must perform to have a successful business.
And this is why data becomes so important, because data underpins all of that.
And so as we enter this decision economy, fueled by the rise of AI, more recently generative AI and now even agentic AI, we can talk about how those things contribute to the overall decision economy and how insurers build competitive advantage that way.
Peter Mansfield: Brilliant.
I mean, on previous episodes of the podcast we’ve talked extensively about data, though not quite in the sense that we’re talking about it today. We did an episode on Cuthbert Heath, the great Lloyd’s underwriter of the turn of the last century, who obtained data on hurricanes and earthquakes and so on so he could understand risks and, as you say, make good decisions.
And we’ve talked about the life insurance industry, which looked into death rates in order to work out how long people were gonna last: on average, when they were gonna die, how many children they’d have at that time and so on.
So data is at the absolute core of insurance, and always has been.
Tom Wilde: I mean, aren’t actuaries the original data scientists, right?
You could say they were the original data scientists, going way, way back.
Peter Mansfield: No, exactly.
I always say that the whole of humanity has always been terrified of the people who look into the future: soothsayers, visionaries, prophets and things like that.
And yet when we actually create a profession that manages to do that, we think of it as a slightly dull thing to do.
We’re very weird human beings: we fear the future, and yet we don’t respect it either.
So anyway, that’s by the by. Our topic today is better data, better decisions.
So I presume the essence of what we’re gonna be discussing is that the secret to making better decisions is getting better data.
Before we talk about what you mean by better data, let’s just talk about data in the first place.
So in the context of insurance, what do we mean by data?
What is the stuff that insurers get hold of?
Tom Wilde: Yeah, it’s sort of a good news, bad news story, right?
The good news is insurers have become very proficient at making these kinds of decisions when they have the data inputs they assume they’ll get, to run the predictive analytics and other approaches they’ve taken to understanding risk, pricing and things of that nature.
The challenge is that the people supplying that data to them, namely the brokers and the insureds, don’t always provide it in a ready-to-use or complete format.
I think the insurance industry, almost more than any other industry, is powered by documents, right?
Documents typically capture all of the data that insurers need to understand risk, claims and so forth.
It could be very unstructured, unpredictable inputs like medical records, doctors’ notes, accident reports, police reports and loss runs, and these are truly heterogeneous inputs that are not ready to be used by an insurer to make these predictions.
In its raw state, that data isn’t ready for use; it needs to be transformed. So this has historically been one of the big challenges insurers face, almost more than any other industry.
Maybe you could put healthcare in the same category, because medical records are also not in a standardized format that’s ready to use for downstream systems.
Peter Mansfield: I’ve heard many an insurer over the years say that they’re brilliant at collecting data, but they can’t do anything with it because they haven’t gone to that next stage. And that next stage is what we’re talking about as better data.
So how does data become better data?
Tom Wilde: Insurers have generally done a good job of putting systems in place that can execute on these kinds of decisions, but again, it assumes you’re gonna provide nicely schematized, cleansed, validated data.
So better data is really that, right?
It’s going from raw data, which might be in the form of a scanned PDF application, loss run, police report or whatever it might be, and turning that into schematized data, in the way that the downstream systems expect it.
And if it is presented in that manner, you can then execute a workflow, a task or a process successfully.
So better data is all about going from data to actionable data.
Peter Mansfield: OK, so I’m just trying to think of an example, and the only one I can really think of off the top of my head is in the health context. I suffer from Crohn’s disease, so that’s what I’m gonna choose.
So you might have lots of people with Crohn’s disease.
All the information might be out there as to how many people with Crohn’s disease smoke, for example.
That’s the raw data, but what you’re saying is you then have to do the next stage so you can actually use that data and work out whether there are connections between Crohn’s disease and smoking, or between Crohn’s disease and other environmental factors.
Tom Wilde: It’s actually more basic than that, because to me that would be the predictive step, where you’re trying to make these associations.
The problem is much more fundamental. Let’s say I’m trying to write a life or health policy, and I get a medical record on you. If we get into the nuts and bolts of the problem, that’s typically a scanned PDF.
A scanned PDF means it’s an image. The information is completely opaque to a machine that is trying to understand that data.
So we first have to lift that data out of the document.
Then we put it into a schema that these predictive analytics will understand: the diagnosis is Crohn’s, right?
So that is a key and a value.
The key would be diagnosis and the value would be Crohn’s. We also need to know the patient’s gender, because that probably has some impact on these predictive models.
So the key would be gender and the value would be male. That’s what I mean by a schema: we need to create that structure.
I have to lift all of that unstructured data off the document and then organize it.
It’s a two-step process, and those are the better data steps.
That way it can feed predictive analytics, as in the example you gave: OK, now that we have all this nicely structured data, I can begin to interpret associations across it.
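The two better data steps Tom describes (lift the data off the document, then organize it into keys and values) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Indico’s platform: it assumes OCR has already turned the scanned PDF into plain text, and the field labels, synonym tables and `MedicalRecord` schema are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical target schema that a downstream predictive model expects.
@dataclass
class MedicalRecord:
    diagnosis: str
    gender: str


# Hypothetical OCR output from a scanned medical record.
RAW_TEXT = "Patient: J. Smith\nSex: M\nDx: Crohn's disease\n"

# Hypothetical lookup tables mapping labels and values found in documents
# onto the schema's keys and canonical values.
KEY_SYNONYMS = {"dx": "diagnosis", "diagnosis": "diagnosis", "sex": "gender", "gender": "gender"}
GENDER_VALUES = {"m": "male", "male": "male", "f": "female", "female": "female"}
DIAGNOSIS_VALUES = {"crohn's disease": "Crohn's", "crohn's": "Crohn's", "crohns": "Crohn's"}


def extract(raw: str) -> dict:
    """Step 1: lift 'Label: value' pairs off the document text."""
    fields = {}
    for line in raw.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            fields[label.strip().lower()] = value.strip()
    return fields


def to_schema(fields: dict) -> MedicalRecord:
    """Step 2: map raw labels and values onto the schema, failing loudly on gaps."""
    canonical = {}
    for label, value in fields.items():
        key = KEY_SYNONYMS.get(label)
        if key is not None:
            canonical[key] = value
    missing = {"diagnosis", "gender"} - set(canonical)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    gender = GENDER_VALUES.get(canonical["gender"].lower())
    if gender is None:
        raise ValueError(f"unrecognised gender value: {canonical['gender']!r}")
    # Unknown diagnoses pass through unchanged rather than being guessed at.
    diagnosis = DIAGNOSIS_VALUES.get(canonical["diagnosis"].lower(), canonical["diagnosis"])
    return MedicalRecord(diagnosis=diagnosis, gender=gender)


record = to_schema(extract(RAW_TEXT))
print(record)
```

Raising an error on a missing or unrecognised field, rather than guessing, is one simple way to keep the output validated in the sense Tom means; a real system would need far richer handling of synonyms and abbreviations than a lookup table.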
Peter Mansfield: So how do you do that? How do you turn unstructured data, which, as you say, just comes in on a PDF, into actionable data?
How do you turn it into the schema? Is that the word you used?
Tom Wilde: Schema is a good way to think about it, yes: the structure. So this is where AI makes things really exciting.
If we roll back 20 or 30 years, the way we historically had to do this was to be very explicit with the computer about what we were trying to learn from a document, right?
Think of that as almost a software programming approach to the problem: we used things like controlled vocabularies, regular expressions and rules.
What that meant was, if we very carefully defined what we were trying to pull out of the document, the machine would do a good job of it.
The challenge is that the syntax and context of what’s submitted to an insurer are never that clean and reliable.
Think about the way acronyms and abbreviations are used, and the different ways we describe concepts that, to us as humans, mean the same thing.
We’re really good at this, right?
And the reason for that, unlike the machine prior to the rise of AI, is that humans have a vast amount of context in our heads, right?
When you and I are speaking right now, there are the spoken words going between us, but there’s an awful lot of context as well.
We broadly know what the topic of this discussion is, and that informs how we interpret the conversation.
We’re not talking about social media, and we’re not talking about automobile manufacturing; those are different contexts for a conversation.
So humans are really good at this.
Historically, machines were really bad at this.
Machines didn’t have this vast context store with which to interpret language.
Then AI arrived. Let’s roll back to, say, 2010, and the rise of deep learning and the dawn of large language models.
Large language models are so profound because, for the first time, thanks to the rise of compute power, the algorithms and the training data, we were able to teach the machine a similar amount of context about language. Using a giant mathematical model of language, the machine could then begin to proxy that ability to interpret context in a much more robust way.
So even if a word or an abbreviation was something it had never seen before, it could understand the meaning of that word from the context in which it’s used, which is something the human brain has always been really good at.
That’s what’s been so profound about the rise of large language models, and now these super-large models that understand not only trillions of data points about language, but also trillions of data points about sound, speech, images and video.
So the machine can hold all of that in its understanding, and when you apply a question to it, it uses that contextual understanding to answer in a much more robust way than was ever possible before.
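The explicit, rule-based approach Tom describes for the era before large language models (regular expressions and rules) can be sketched with a single hand-written pattern. The rule and sample sentences below are hypothetical; the point is that the rule extracts reliably only when a document uses exactly the phrasing its author anticipated.

```python
import re
from typing import Optional

# A hand-written rule in the older, explicit style: it only recognises the
# exact label its author anticipated ("Diagnosis:"), in any letter case.
DIAGNOSIS_RULE = re.compile(r"diagnosis:\s*(?P<value>.+)", re.IGNORECASE)


def rule_based_diagnosis(text: str) -> Optional[str]:
    """Return the diagnosis if the text matches the rule, otherwise None."""
    match = DIAGNOSIS_RULE.search(text)
    return match.group("value").strip() if match else None


samples = [
    "Diagnosis: Crohn's disease",           # the phrasing the rule was written for
    "Dx: Crohn's disease",                  # common abbreviation: missed
    "Pt has hx of Crohn's, on infliximab",  # clinical shorthand, no label: missed
]

for text in samples:
    print(f"{text!r:40} -> {rule_based_diagnosis(text)!r}")
```

Every new abbreviation or phrasing needs another hand-written rule, which is the brittleness Tom contrasts with models that can infer meaning from context.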
Peter Mansfield: So at Indico Data, you use AI, generative AI, to do that process of transforming unstructured data into actionable data?
Tom Wilde: So let’s talk about AI for a second, because your question is totally reasonable, but I think people today are using generative AI as the catch-all for AI. AI has gone through a series of steps, and it’s right to think of them as different programming languages that we would apply to different problems.
If you go back to the beginning of AI, it was really predictive AI using good old-fashioned machine learning, right?
That arrived in the 60s and carried forward into the present day, and that’s what actuarial models use: machine learning and predictive AI to make the associations you referenced between smoking and a particular disease, right?
That’s something good old-fashioned machine learning can do.
Then we had the appearance of extractive AI, around, let’s call it, 2015 to 2018. Discriminative AI gave us the ability to interpret unstructured data, extract the information and structure it in a very robust way.
Then we have generative AI, which is good at just what it sounds like.
It is a generative technology that can take an input and generate an output, using statistical prediction of what words should come next.
Now, interestingly, that is good at things like summarization.
It’s almost good at being creative, right?
You’ve seen it write poems and things like that.
Well, the randomness inside generative AI that makes it creative also makes its outcomes a little challenging to predict.
And more recently, we have agentic AI, which is the dawn of AI that can take initiative, learn a task and improve at that task on its own.
Peter Mansfield: Sorry, Tom, what was that word? How’s that spelled?
Tom Wilde: Agentic, A-G-E-N-T-I-C. Agentic: acting as an agent, effectively.
Peter Mansfield: OK, I’m with you. Sorry, I interrupted.
Tom Wilde: Yeah, so those are really our four kinds of AI, if you want to think of them as programming languages we have in front of us now, and we can use them depending on the task we’re trying to accomplish.
Extractive AI is very deterministic: it’s very predictable, it’s very accurate, and we get a predictable outcome from it.
It’s very explainable.
Generative AI is less explainable. It’s stochastic, meaning the output we get will vary from time to time depending on when we ask it and how we ask it.
Very subtle differences in the way we phrase our prompts (think of those as the programming input) can yield very different outputs.
So that makes it both very powerful and somewhat challenging when you’re trying to create a deterministic output that you have to be able to rely on to make critical decisions around risk, pricing, medical diagnoses and things like that.
And then agentic AI is the dawn of AI’s ability to take initiative, understand how to solve a task and define its own approach to it, somewhat autonomously.
We’re at the very dawn of that, but agentic AI is gonna use all of those tools at its disposal to accomplish a task that we give it.
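The difference Tom draws between deterministic and stochastic output can be shown with a toy next-word choice. This is not how any particular model is built: the vocabulary and probabilities are invented for illustration, and real generative models sample over tokens with further settings such as temperature. Always taking the most probable word gives the same answer on every run; sampling in proportion to probability does not.

```python
import random
from collections import Counter

# Toy next-word probabilities after a prompt such as "This claim is ...".
# The words and numbers are invented for illustration only.
NEXT_WORD_PROBS = {"covered": 0.55, "excluded": 0.30, "pending": 0.15}


def greedy_next_word(probs: dict) -> str:
    """Deterministic choice: always take the single most probable word."""
    return max(probs, key=probs.get)


def sampled_next_word(probs: dict, rng: random.Random) -> str:
    """Stochastic choice: draw a word with probability proportional to its weight."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


rng = random.Random()  # unseeded, so the sampled counts differ from run to run
greedy_runs = Counter(greedy_next_word(NEXT_WORD_PROBS) for _ in range(1000))
sampled_runs = Counter(sampled_next_word(NEXT_WORD_PROBS, rng) for _ in range(1000))

print("greedy: ", dict(greedy_runs))   # the same answer all 1000 times
print("sampled:", dict(sampled_runs))  # a mix of answers, roughly in proportion to the weights
```

Seeding the generator (for example `random.Random(7)`) makes the sampled sequence repeatable, but that only pins down the randomness; it doesn’t make any individual answer more reliable.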
Peter Mansfield: And will that also be stochastic? Will it also lead to different results depending on slight differences in inputs?
Tom Wilde: It depends.
So agentic AI could use an extractive AI approach for certain tasks, which would not be stochastic, and generative AI for other tasks, which would be.
So the ability to learn from the success of its task and then refine its approach is fundamental to what agentic AI will bring to the table here.
Peter Mansfield: Wow. I’ve just learned more in the last five minutes than I think I’ve learned in the last 10 years.
That’s amazing, thank you very much.
And while we’re on AI, what are your views on it?
Is there a lot of hype around it, or is it genuinely transformational technology for insurers and the way they deal with data?
Tom Wilde: I think it’s a classic case of being overhyped in the short term and underhyped in the long term.
It’s absolutely transformational, without question. We have moved from an era where we programmed machines with instructions to one where we program machines with data, and you can’t overstate the transformational effect of that. The ability of machines to hold a near-complete contextual understanding of sound, image and text is really profound, right?
I mean, we’re at the point now where very large language models like GPT-4 are so large that they may contain all of the context required.
We may not have to give them much more context.
Now the task is how we instruct them effectively to get the results we hope to get, but that is absolutely profound, and when we look back on it, this will be as important as the mobile phone or the browser.
Peter Mansfield: Brilliant. I’m already beginning to see the answer to this question, but take the highly commoditized areas of insurance, like motor and healthcare, where there’s a vast amount of data arising from similar situations.
I can see how data and the analysis of data are absolutely crucial there.
I mean, going back to Crohn’s again, how can you decide what the prognosis for Crohn’s is unless you have the data for a million people with Crohn’s?
But for more bespoke areas of insurance, like specialty and all sorts of very specific lines, how does it work?
How does it work where the input data is far broader and less, well, you talk about stochastic outputs, so stochastic inputs, if I’m allowed to describe it like that: where the inputs are very, very different, the data sources are different, and everything about it is slightly random?
Tom Wilde: Yeah, I think you’re absolutely right.
I think commercial and specialty is a much more difficult task.
So this is where I think AI in general really shines.
I think personal lines, like auto and life, were really transformed by mobile.
If you think about yourself as a personal lines consumer, you interact with those providers almost exclusively through their mobile app now, right?
They’re able to basically dictate to you the data you need to provide and how you will present risk to them.
And there’s some really fascinating developments in that space.
You could imagine that with connected cars, there will be a time in the future when your insurance is billed by the minute, right?
When you’re behind the wheel, how you’re driving and where you’re driving, your risk profile will change minute to minute, and you may actually be paying for insurance on that basis.
In specialty and commercial, what we do with our customers is really sit down with them and start with the outcome they’re trying to drive, right?
What will determine success?
Then we work backwards to figure out what inputs you need to drive that outcome, and where the source material for those inputs comes from.
And as we work backwards, we’re able to use our platform to configure the workflow, and the AI agents that live in that workflow, to perform those tasks.
And, as you say, that helps underwriters make better decisions.
Do they underwrite this particular insured or do they not?
And if they do, at what rates, with what exclusions, and on what terms and conditions?
That’s where they’ll win, right?
So that’s their secret sauce.
So when we talk to the customer, we want to enable them to exploit that secret sauce: an understanding of risk and pricing that allows them to win.
So we provide them with, basically, a decision supply chain that goes from raw input to a finished good, which is their decision on risk and pricing.
That’s why we’ve started to talk about the enterprise needing a decisioning strategy, using a decision supply chain metaphor to understand the key decisions that will determine success, profitability and market share. I think we’ve arrived at a point in time where you can apply that metaphor to make better decisions.
Peter Mansfield: And is there such a thing as too much data? Is there a danger that you have so much data out there that there’s so much noise you can’t work out what the signal is?
Tom Wilde: Well, I think we’re coming out of an era driven in part by a lot of the large data vendors, or data infrastructure vendors I should say, who counseled customers to collect everything, as much as possible, all the time.
Not a bad recommendation.
The challenge, though, is that it came without advice on how to make that data better and more valuable: how to cleanse it, normalize it, validate it and extract it, so that it became actionable.
And so you started to have a bit of a hangover 10 years ago, where people were asking, we spent all this money on our data infrastructure, but what real advantage have we unlocked from it?
OK, so that was before AI.
AI arrives, and one of the things AI is really, really good at is examining vast amounts of data much faster and more robustly than we can as people, right?
I can’t stare at a database with a million rows of records and make connections and inferences from it, right?
Not possible, but a machine can do that really, really well.
What that then says is that the skill required now is to ask the right questions.
So when you think about prompting in generative AI, prompting is all about asking the right questions.
And if we get really good at asking the right questions, then we gain advantage from the data asset that we’ve collected over time.
Now, AI is also beginning to help us understand what questions to ask.
So this is where it becomes really interesting, and maybe a little bit confusing: in the future, are we asking questions of the data, or being prompted to ask them?
That’s where we start to get into agentic AI, which begins to say, I’m gonna come up with my own questions to ask based on this task that I’ve done over and over again.
So we’re at the very dawn of that kind of fascinating future, and I’m not sure where that ends.
Peter Mansfield: It is fascinating.
It is absolutely fascinating, and we’re gonna come on to the future in just a moment, but before we leave potential issues with AI, I’m interested to know to what extent AI is able to work out the difference between causation and correlation.
I mean, in preparation for this, I came across what can only be described as the best false science ever: a study in the German state of Lower Saxony which genuinely shows a direct correlation between the number of home births and the population of white storks.
So yeah, the whole storks-bring-the-babies thing.
It’s true in Lower Saxony, but clearly there’s no causation there; it’s just correlation, and it doesn’t mean anything.
And you also have octopuses that successfully pick the winning teams in World Cups and things like that.
How good is AI at differentiating between causation and correlation?
Tom Wilde: Not great yet.
I think that’s where the human intellect is still vastly superior, in understanding what is absurd and what is rational, because AI and large language models are only as good as the observations they’ve been fed, right?
So being purely creative, understanding what questions to ask, and understanding what is realistic and unrealistic is still not how it works, right?
It is purely a statistical model of what it’s been fed.
And this is the risk of AI collapse, right?
If AI starts to train itself on AI-produced content, which at times is absurd or not factual, it could collapse on itself, right?
Because it begins to think that those facts are true.
There are some parallels to this with humans and social media right now, right: false things get amplified, and people who consume them begin to believe they’re true and build a new sort of truth system around them.
So there are a lot of parallels with AI, and the ground truth becomes even more important in understanding the world, right?
Obviously we know that the storks are not causing home births in Germany, but it’s a correlation, so maybe it’s a causation, and that’s the distinction AI is not good at understanding.
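Peter’s stork example is the classic illustration of correlation without causation, and one common mechanism behind such results is a shared driver (a confounder) that pushes both quantities in the same direction. The sketch below uses synthetic numbers, not the Lower Saxony data, with a hypothetical shared driver labeled rurality. The raw Pearson correlation comes out high, but a first-order partial correlation that controls for the shared driver is close to zero. Note that partial correlation can only adjust for a confounder you already know to measure; it cannot by itself establish causation, which is the gap Tom is describing.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data for eight hypothetical districts (NOT the Lower Saxony study).
# The shared driver "rurality" pushes both series up; in this construction,
# storks and births never influence each other.
rurality = [1, 2, 3, 4, 5, 6, 7, 8]
storks = [3 * z + e for z, e in zip(rurality, [1, -1, 1, -1, 1, -1, 1, -1])]
home_births = [5 * z + e for z, e in zip(rurality, [1, 1, -1, -1, 1, 1, -1, -1])]

print(f"raw correlation:          {pearson(storks, home_births):.2f}")
print(f"controlling for rurality: {partial_correlation(storks, home_births, rurality):.2f}")
```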
Peter Mansfield: So in that respect, AI is the archetypal postmodern technology.
It doesn’t understand what truth is.
Tom Wilde: No.
It only understands what it’s been told, and it will faithfully leverage what it’s been told to give you output, and so that’s the risk here.
You could call it hallucination.
Hallucination is a little bit different from what I’m describing, but it’s related, right?
The stochastic nature of GenAI is a feature, not a bug; it’s why GenAI can do so many amazing things.
But it’s also the challenge, because in its desire to please you, to generate an answer for you, it doesn’t know that it’s not supposed to generate a false answer. If it sounds good, it presents it to you and says, this sounds like a good citation for a legal precedent, and you read it and go, yeah, that sounds like a strong legal precedent.
It didn’t know that it’s not supposed to make one up.
That’s not part of its rubric.
Now we’ve started to design guardrails where we attempt to tell it, don’t make stuff up, and you even have adversarial AIs that can say, hey, that looks made up to me, you should try again.
So that’s another approach we’re starting to take to figure out how to enforce truth on the outcomes it produces.
Peter Mansfield: And at the moment you have this kind of wonderful balance between experienced human underwriters and AI.
So AI is being used as a tool for the human underwriters.
Is there a risk that as the AI becomes better, the humans will delegate more and more of their decision-making to it?
Therefore, over time, you will no longer have the experienced human underwriters, and everything will just become AI.
Tom Wilde: I think that’s a risk.
I think the short-term risk here is that the AI sounds so convincing in its answers that you put too much trust in it.
And I think that human in the loop, or co-pilot, however you want to think about it (sometimes we describe it as a bionic arm), remains a vital metaphor here. So the short-term risk is that it sounds so convincing that we lean on it too much, and before we know it, we’ve missed some bad decisions that it’s recommended.
Second, I think there’s gonna be an increasing focus on the transparency and explainability of every step.
Let’s go back to our decision supply chain metaphor.
For every step in that supply chain, can we go back in time and explode all the actions it took in that particular part of the process, so that we can always trace back, correct errors and understand where it’s making good decisions or bad ones?
So I think that ability to check and balance this is critical, because it can make more decisions faster with more data, which also means it can create more bad decisions, and more chaos, faster.
So we have to be constantly aware and checking against that.
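Tom’s requirement that every step of the decision supply chain can be exploded and traced back can be sketched as a pipeline that records what went into and came out of each step. Everything here is hypothetical: the step names, the field labels and the referral limit are illustrative only, and a production system would likely need durable, tamper-evident storage for the trail rather than an in-memory list.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AuditedPipeline:
    """Runs named steps in order and records every step's input and output."""
    steps: list[tuple[str, Callable[[Any], Any]]]
    trail: list[dict] = field(default_factory=list)

    def run(self, value: Any) -> Any:
        self.trail = []
        for name, step in self.steps:
            output = step(value)
            # Deep copies so later steps cannot silently alter what was recorded.
            self.trail.append(
                {"step": name, "input": copy.deepcopy(value), "output": copy.deepcopy(output)}
            )
            value = output
        return value


# Hypothetical steps from raw submission text to an underwriting decision.
def extract(text: str) -> dict:
    return dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)


def normalize(fields: dict) -> dict:
    return {"insured": fields["Insured"], "sum_insured": int(fields["Sum insured"])}


def decide(record: dict, referral_limit: int = 1_000_000) -> str:
    # Hypothetical rule: large risks are referred to a human underwriter.
    return "refer to underwriter" if record["sum_insured"] > referral_limit else "auto-quote"


pipeline = AuditedPipeline(steps=[("extract", extract), ("normalize", normalize), ("decide", decide)])
decision = pipeline.run("Insured: Acme Ltd\nSum insured: 2500000")

print(decision)
for entry in pipeline.trail:
    print(f"{entry['step']:>9}: {entry['input']!r} -> {entry['output']!r}")
```

Because each entry holds both the input and the output of its step, a reviewer can see exactly which step produced a questionable result, which is the check-and-balance Tom argues becomes more important as AI makes more decisions faster.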
Peter Mansfield: Brilliant.
I mean, this has been absolutely wonderful, but we have to bring it to an end.
So finally, Tom, in terms of your career, you’re a fairly recent convert to insurance, but do you think it’s a good industry to be involved with?
Is it something that you’d recommend to a friend?
Tom Wilde: Look, I’ll preach to the choir a little bit.
I mean, insurance allows us to do a tremendous number of things that we couldn’t do without a safety net of risk protection, right?
So insurance is absolutely a vital engine of the global economy, which wouldn’t exist without it, right?
If you go back to the early days of Lloyd’s and people wanting to charter ships across the ocean, no one person could have afforded that risk.
My experience with insurers is that they are very eager to improve the way they do business.
I think they are unfairly characterized as being slow and having their heads in the sand.
That’s not been my experience.
I think there are challenges to transformation, but I don’t see them being resistant to it.
They’ve also had challenges and failures trying to do this, and that has scarred them to some degree.
So there’s an increased focus on making sure there’s gonna be a good outcome when they make a decision like this.
Peter Mansfield: Brilliant.
Thank you, Tom, that was wonderful.
Thank you so much for your time.
Tom Wilde: And with that, I’m co-host Tom Wilde.
Michelle Gouveia: And I’m co-host Michelle Gouveia.
Tom Wilde: Thanks for listening to Unstructured Unlocked.
Check out the full Unstructured Unlocked podcast on your favorite platform.