Christopher M. Wells, Ph.D., Indico VP of Research and Development, talks with Patricia Thaine, CEO of Private AI, in episode 5 of Unstructured Unlocked. Tune in to discover how enterprise data and automation leaders are solving their most complex unstructured data challenges.
Christopher Wells: Here we go. Hi, and welcome to another episode of Unstructured Unlocked. I’m your host, Chris Wells, VP of R and D at Indico Data. And I’m really happy to introduce you to my guest today. Patricia Thaine, CEO of Private AI. Patricia, welcome to the podcast.
Patricia Thaine: Thank you so much, Chris. It’s a pleasure to be here.
CW: Great. you’re a little bit off the beaten path in terms of the guests we’ve had so far on the show, which I’m excited about. There’s a lot of good stuff to discuss. Why don’t we start with you telling us about your journey through the tech world and what it is exactly that goes on at private AI?
PT: Journey through the tech world. How far back do you want me to go?
CW: Go as far back as you think is interesting.
PT: All right. I actually started out in English literature and then moved to linguistics because I got kind of annoyed at the process of teaching English literature. Oh. and then I thought, well, eventually I’m gonna wanna have a job and linguistics gives no job prospects. What am I gonna combine this with? And I found out that there’s this thing called computational linguistics. So started looking into that, did a programming class, absolutely loved computer science, ended up making that my major, and then started eight masters in computational linguistics at the University of Toronto. Right after undergrad, following that during my master’s, I knew I wanted to start a company, and the decision was, do I start a Ph.D. and work on a company during a Ph.D.? Or do I join a startup to learn the ropes?
And I chose the former. Yeah. So for Private AI, it’s really about making it super simple for developers to integrate privacy into their pipelines, cuz they’re not experts in privacy. A lot of people need to worry about either compliance or customer trust, and they’re collecting unstructured data that they often need to be able to see to debug their systems and train their models. So imagine you’re a Grammarly and you wanna integrate a privacy layer into Grammarly. If we didn’t exist, you’d either have to rely on a third party that actually has very low accuracy that you’re sending your data to, and therefore not respecting your own customers’ privacy, because some of them store your data and use it to train their models, or you’d have to build it in-house. And we found a lot of people were building in-house.
CW: Yeah. I can say I can validate both of those things. I have built privacy pipelines myself in-house and I’ve also used Private AI, and I far prefer Private AI.
PT: Much appreciated, less painful than building it yourself.
CW: Yeah. Much less painful. Download the API key, and get privacy. It’s pretty great. Thank you. So yeah, absolutely. I’m always happy to be a commercial for you all. I also want to say, I think you just set a record for the earliest mention of unstructured data on the podcast, so well done. It’s exciting. Thank you. Thank you. We have a lot to talk about there. My favorite, yeah, it’s the best kind of data. It’s the worst to work with, but the most exciting stuff is inside of it. I was perusing your LinkedIn. Could you give us a two-minute introduction to what homomorphic encryption is? I wanna know what those words mean together.
PT: Absolutely. So homomorphic encryption is a technology that I worked on for a lot of my Ph.D. By the way, I’m an in-denial dropout; I’m on leave but probably never going back. And the technology there, it’s all about being able to compute on encrypted data. So you encrypt a one, you encrypt a two, you add them together or multiply them together, and you get an encrypted three or an encrypted two. And it’s really good for, for example, matrix multiplication and polynomial operations, but not good for non-polynomial operations. So it can be used for password management, for example, and Microsoft is using it for password management. It can be used for searching encrypted databases. So you are a client sending an encrypted query to a cloud, and that cloud has no idea what you’re searching for, and then you get the result back and decrypt it, and no one’s any the wiser. And a lot of homomorphic encryption schemes are quantum-safe. Which means not that there’s proof that quantum computers can’t break it, but that they haven’t broken it yet.
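The "encrypt a one, encrypt a two, multiply them" idea can be seen in miniature with textbook RSA, which happens to be multiplicatively homomorphic. This is purely an intuition-building toy (tiny primes, no padding, not secure, and not a scheme anyone mentioned in the conversation):

```python
# Toy demonstration of a multiplicatively homomorphic scheme:
# textbook RSA with classroom-sized primes. For intuition only, NOT secure.
p, q = 61, 53
n = p * q            # public modulus (3233)
e = 17               # public exponent
d = 2753             # private exponent: (e * d) % ((p-1)*(q-1)) == 1

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

# Multiply the *ciphertexts*; decrypting yields the product of the plaintexts,
# without the party doing the multiplication ever seeing 2 or 3.
product = decrypt(encrypt(2) * encrypt(3) % n)
print(product)  # 6
```

Real homomorphic encryption schemes used in privacy-preserving ML (e.g. lattice-based ones) also support addition, which is what makes matrix multiplications and polynomial evaluation possible on encrypted data.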
PT: Yeah. So a lot of them are based on what’s called lattice-based cryptography, and I can nerd out as much as you want or as little as you want.
CW: That’s great. So like, this kind of encryption is immune to Shor’s algorithm. Is that the kind of thing we’re talking about?
PT: Correct. Yeah, that’s right. That’s one of the algorithms it’s immune to. That’s cool. And so it’s partially what financial institutions might rely on if quantum computers ever become ubiquitous and people could just break RSA at the flip of a dime. And a lot of privacy-preserving machine learning teams, not that there are many out there in the first place, but several of the ones that are out there have been looking into combining homomorphic encryption with machine learning. The trickiest part is generally the non-polynomial aspect of it. So ReLU functions, sigmoid functions, and I can dive into how they deal with it.
CW: Okay. That’s awesome. So privacy is like a genetic thing for you, like top to bottom?
PT: At this point, it’s been quite a while that I’ve been working on privacy on a variety of technologies.
CW: Oh, that’s cool. Yeah. And it’s always nice to meet another recovering academic. I hung up my red grading pen many years ago and haven’t looked back. It does have its perks. It does. So good. So that’s Private AI, that’s what you do. And again, big shout out to the tech; it works well. As the CEO of an early-stage company, you’re out there selling, I imagine. What sort of verticals are really embracing this technology?
PT: Hmm, great question. So we are seeing a lot of traction in conversational AI. Okay. A lot of traction from AI companies as well, customer service, insurance, telemedicine, insurtech, financial institutions. Basically very vertical-agnostic. Some are just not ready for it yet. Okay. Which ones would that be?
CW: Well let’s pause there. I wanna dig in and maybe a concrete example, like how does, you know, how does your engine show up in a conversational AI setting, say in the telemedicine space.
PT: Yeah. So suppose you want to keep HIPAA-compliant records of calls, and you wanna share these with third parties so that you can make your processing more accurate or provide extra feedback or analysis of a call. You want to limit the amount of PII that’s being sent over. You actually probably want to follow the HIPAA Safe Harbor guidelines, which list 18 different entity types that you need to remove. So in that case, you wanna keep the protected health information, like what disease they had and what symptoms they had, but you want to remove things like their healthcare number and their name and their location, unless it’s state-level. And where we come in is to automatically scrub it. And it’s very difficult to do because of how messy conversations are. So somebody might say, my healthcare number is a five, three, no, sorry, actually that was a five, two. And then you need to capture all of that. And traditionally, that’s done with regular expressions, which do not generalize well for unstructured data. Right?
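The point about regular expressions failing on conversational speech is easy to see with a minimal sketch (the pattern and the utterances below are invented for illustration):

```python
import re

# A typical rule-based scrubber: match a ten-digit healthcare/account number.
PATTERN = re.compile(r"\b\d{10}\b")

clean = "My healthcare number is 4512879063."
messy = "My healthcare number is a five, three... no, sorry, that was a five, two..."

print(bool(PATTERN.search(clean)))  # True  -- the tidy written form is caught
print(bool(PATTERN.search(messy)))  # False -- spoken digits and self-corrections slip through
```

The messy utterance is exactly the kind of thing a transcribed phone call produces, which is why this class of system falls back on trained NER models rather than rules alone.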
CW: Right. Absolutely. So you’re working at the level of like the chat bot on my phone. Are you also working with like phone transcripts and data like that?
PT: Yeah. Yeah, for sure.
CW: That’s exciting. And how many entities are you able to obscure slash spoof?
PT: Over 50 different entity types.
CW: Wow. Okay. Across?
PT: 42 languages.
CW: That’s awesome. Which one are you most proud of? Like, which one’s the hardest to figure out?
PT: Ooh, which one? The numerical PII is such a nightmare.
CW: Okay. All right.
PT: Especially because there’s, you know, you’re not gonna find data that’s out there with credit card numbers and people talking about their credit card numbers.
CW: Not unless you’re on 4chan, I suppose. Yeah. Okay.
PT: Right. And also the ability to capture these different types of PII across multiple languages as somebody code-switches within a conversation. So some conversations might mix French and English in Canada, or Spanglish, or Hinglish in India.
CW: Yeah. Generally pidgin, right? You know, mixtures of languages. Yeah.
PT: Exactly. So being able to do that is something I’m also very proud of the team for having developed.
CW: Fascinating. All right. I could dig into this a lot; maybe one more question on the tech. You mentioned datasets being hard to find, so how do you go about improving your models?
CW: And you’re not using client data, which is huge. I love that.
PT: Yeah. Unless they specifically give us access to de-identified data. Yep. And it has to be specifically that information. A lot of our data is synthetic. A lot of it we source ourselves online, as long as the license allows us to use it commercially, which, I mean, is not so obvious. I have seen certain companies list the data sets that they used to train their models, and if you look at the licenses of those data sets, you’re not supposed to do that. Yeah. Medical data sets that are specifically for research purposes only, and you can’t even buy a license for them.
CW: Yep. Data, data licensing is a real problem. We could do a whole podcast just on that,
PT: I think. Yeah. Seriously. And it’s not something that they teach you in school.
CW: No, no. They don’t, like many other useful things not taught in schools. Good. Okay. So that’s a little bit about the tech and some of the verticals. When you’re selling to these companies that are out there, and they’re saying, I need a privacy solution, what are the personas, the roles, that you’re talking to? Like, who’s out there shopping right now?
PT: Hmm. Yeah. I mean, our goal is to sell to developers. So ultimately that’s who we are selling to, though we might normally talk to their managers, who are more aware of the actual problem that they’re trying to solve, because they know where the data’s coming from and who the data has to go to, and who’s getting angry at the top for misusing the data.
CW: Yeah. Okay. So how, how does that work if the developer, you know, you’ve got an API out there in the cloud, are they sending the data to your cloud? Or how are they, how are they actually getting this technology into their pipelines?
PT: Hmm, rarely that way. Most of the time, because we practice what we preach and we wanna minimize the flow of personal information, we send a container to their environment that they deploy there, and then they call it via a REST API.
CW: Oh, interesting. So then they just have to pass some security scans with the container, and they’re good to go. Yeah. And this is like individual developers buying a license or a couple of licenses for, for the tech stack they’re working on.
PT: Yeah. And we always get some pleasantly surprised people when they run the security scan on the container and say, oh, normally we find 30 different vulnerabilities, but yours is clean.
CW: Yeah. That’s awesome. Yeah. Love to hear it. Okay. So you’re targeting the individual developer. Are they mostly working on, you know, you talked about chatbots, what types of projects are they working on that they’re trying to inject privacy into?
PT: Ah, okay. So chatbots, for example. Yeah. They have a really big problem: if you wanna use production data to train your models, those models are going to be memorizing information from that production data.
CW: That’s right.
PT: One solution proposed is differentially private training; however, it’s very costly, very noisy, and you need to make assumptions of independence that don’t always hold.
PT: Another option is to do what we allow people to do, which is replace personal information with fake personal information. So if it’s not there in the first place, it’s not gonna memorize it. And in addition to that, if you’ve got a replacement in place that’s very natural, you’re not gonna suffer downstream model accuracy loss, and you can still tell what a conversation was about. Interesting. You can still tell whether they were happy or not. You can use it for training. You can use it for information even about, you know, which location an accident happened in, without necessarily being able to tie that to a policy number or a person’s name.
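The replace-with-fake-PII idea can be sketched as span substitution. In this toy version the entity spans and surrogate values are hard-coded, where in a real system an NER model would detect the spans and a generator would produce consistent, realistic surrogates:

```python
# Pseudonymization sketch: swap detected PII spans for surrogates,
# keeping the surrounding context (symptoms, sentiment, locations) intact.
FAKE = {"NAME": "Jane Doe", "HEALTH_NUMBER": "000-000-000", "CITY": "Springfield"}

def redact(text: str, spans: list[tuple[int, int, str]]) -> str:
    # spans: (start, end, label) triples, assumed to come from an NER model
    out, last = [], 0
    for start, end, label in sorted(spans):
        out.append(text[last:start])   # keep the non-PII context verbatim
        out.append(FAKE[label])        # substitute a natural-looking surrogate
        last = end
    out.append(text[last:])
    return "".join(out)

text = "Maria Silva in Toronto reported chest pain, card 4512879063."
spans = [(0, 11, "NAME"), (15, 22, "CITY"), (49, 59, "HEALTH_NUMBER")]
print(redact(text, spans))
# Jane Doe in Springfield reported chest pain, card 000-000-000.
```

Because the output is still a fluent sentence, a downstream model can train on it without memorizing the original identifiers, which is exactly the property described above.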
CW: So interesting.
PT: There’s so much rich data around the PII, which is just toxic.
PT: And because we can just replace it naturally, you can go ahead and use the data as if it were the original one.
CW: Interesting. So you find that your clients, your users, are in a lot of cases using this as part of the ETL pipeline to scrub the data before it gets sent to a machine learning model. And, exactly, because of the way it’s working, the correlations among entities within the text are being preserved. Is that the right way to think of it?
PT: It can, if chosen to, to a certain extent but oftentimes it’s not really the entities that people are interested in. It’s more the…
CW: The local context. Yeah. Yeah.
PT: That’s right.
CW: Interesting, very cool stuff.
PT: Thank you.
CW: So I’ve spent too much of my time in my life probably selling, trying to sell AI to the enterprise, both in my previous career, as you know, picking vendors to build a data science stack, and now at Indico trying to sell AI to the enterprise. What do you find to be the big challenges in getting big companies to embrace AI?
PT: The big challenges to getting them to embrace it? So I don’t see it so much as big companies having to embrace AI. There’s a solution to a problem, and maybe AI solves it better, and you’re selling the solution to them, not AI itself.
CW: Yeah. Unpack that a little bit further. That’s an interesting thought.
PT: Let me give you an example. Yeah.
Data loss prevention companies can do a lot of regular expression scans that are very efficient, and so you can process a huge flow of data through them. And if it’s not perfect, it’s fine, cuz it can still pick up more or less whether there’s some sort of sensitive or personal information within it that shouldn’t be going out. However, sometimes you need something to be very precise, like what you’re sending to a machine learning team or to a third party, or if you need to create a report that’s very specific because there was a data leak, for example.
PT: That is something DLP providers are not able to do. But if you go to an enterprise and you say, I can solve this problem with a more precise solution than DLP solutions do, yeah, they don’t care if you’re using regular expressions that are better or AI that’s better, as long as it works better. It’s really not the AI that you’re selling.
CW: Yeah. Interesting. That’s a good distinction. So I wanted to dig a little bit into some of these regulations. Yeah. Hold on one second. Sorry. Hey, how are you doing, Pearl? One of the sources of frustration for me as I was looking at data privacy back in the day, I don’t do so much with it nowadays, was that the regulations are extremely stark. They say no PII, full stop. Whereas a data scientist like myself would say, okay, I can guarantee with 99% confidence that we’ve scrubbed this much of the PII. Right. So how are people thinking through that right now? Mm-hmm. Cuz I think that’s been a common pain point in the last
PT: Few years. That is a great question. So HIPAA, for example, is the only one that provides a re-identification risk threshold, and it’s a pretty high threshold.
CW: But 0.04 percent, so four out of 10,000 instances can get through. Yeah. Yeah. Okay.
PT: Yeah. So that is one guideline. And what I like about that guideline, even though it is certainly flawed, is that it admits that no one’s perfect. Yeah. And the thing is, a lot of people expect privacy technologies to be perfect. Otherwise, why would we even use them? All right, well, let’s just not encrypt things because people can get phishing-attacked. Let’s just forget about VPNs, who cares, cuz they can’t be perfect? That’s not how it works. Our goal is to minimize risk. And I can’t speak for all legislators, but it seems like as long as you’ve made a genuinely good attempt at finding the best solutions, at thinking through the vulnerabilities and addressing them, then if or when a data leak happens or a vulnerability gets exposed, yeah, the public and also the legislators will be a lot more lenient on you.
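The four-in-ten-thousand figure quoted above is just a rate, so the expected leakage scales with volume. A quick back-of-the-envelope (the corpus size here is hypothetical, not from the conversation):

```python
# Re-identification risk as back-of-the-envelope arithmetic:
# a four-in-ten-thousand threshold applied to a hypothetical corpus.
records = 1_000_000                      # hypothetical number of call records
expected_leaks = records * 4 // 10_000   # four out of every 10,000 slip through
print(expected_leaks)  # 400
```

This is why "minimize risk" rather than "guarantee zero PII" is the workable framing: even a compliant system is expected to let a small, quantifiable number of instances through at scale.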
CW: Yeah. Yeah. That’s a great point. Minimizing risk. One of the conversations I had back in the day was, well, you can’t put this in the cloud, because, you know, the cloud’s dangerous. And I pointed out that these documents had been copied dozens of times across network drives and people’s laptops. It’s like, you know, this is probably the least of your worries, the fact that this is within our own VPC in AWS. So that point about making a reasonable attempt, it sounds like people are thinking the right thoughts nowadays, which is exciting to me.
PT: Yeah. And I’m curious about what you have found to be the biggest problem of selling AI to enterprise.
CW: So it’s actually related to the question that I asked you about, you know, how do you prove that it’s good enough, or do you have a good measure of good enough? Our technology, for example, helps folks automate their unstructured data flows. So moving documents through a pipeline, whatever that human-driven pipeline is, starting with email and ending in a database, kind of a thing. And most of the time we get asked the question, what’s the accuracy gonna be? And of course, my standard answer is that accuracy doesn’t have units, so you can’t convert it to time saved, which converts to dollars saved. And so I liked your answer about, you know, it’s really about minimizing risk, and our technology is really about optimizing a process by taking unstructured data and adding structure to it. And so getting people to stop thinking the thoughts that maybe make a lot of sense for a robotic process that’s rules-driven has been a challenge. We’re starting to see some daylight there, but a lot of people are still stuck in that old way of thinking about things.
PT: That’s a really good point. And I’d like to add to that, please, when it comes to accuracy, we’re getting to the point where AI systems, in certain cases, if you train on the right data, will perform these tasks more accurately than humans.
CW: A hundred percent. Yep.
PT: And I think there’s this expectation that humans are perfect.
CW: There is not.
PT: If somebody’s not willing to accept that, I don’t think that they’re really gonna understand what the accuracy truly means.
CW: No, I can’t tell you how many conversations I’ve had where I’ve said, hey, just as a baseline, how accurate is your human process today? And the answer is, well, it’s a hundred percent. It’s like, no, it’s not. That means you haven’t measured it. And then the corollary is, you know, you get into a platform, you start trying to build out an AI solution, and you have four or five people supervising the machines. And it turns out they’re actually teaching the machines four or five different processes. Right. And yeah, there’s this latent variable, which is, is it Bob or Sue or, you know, Jane who didn’t have her second cup of coffee this morning doing the job, and the model doesn’t have access to that latent information. Right. So there are very few of those tasks where AI is starting to beat humans, you know, like I think about chess engines nowadays, way better than human players; Go; image identification. Those are really neat and tidy tasks, and the stuff that people do, that knowledge workers do in their jobs, is not neat and tidy. And so that really is the starting point. Like, what should we automate? What should we structure? And then let’s figure out how to do it with AI.
PT: Absolutely. And a lot of these tasks are just simply not scalable when it’s a human doing them.
CW: Indeed. Yeah. And so not scalable and, again, not uniform, so good. So that’s selling AI to the enterprise. I don’t know how easy any of this is to share, but like anonymized wins or client successes that you’re really proud of. You’d like to sort of share with the audience.
PT: Ooh. I mean there are quite a few
CW: Yeah, go for it.
PT: I can share them anonymously, as you <laugh> mentioned. Yeah. We do have a few public companies as customers. Fair enough. Which is always very exciting for a semi-early startup. I mean, we’re only two, three years old, three years, I guess, at this point. One of them is one of the biggest communications systems creators, I guess you could qualify them as. We have insurance companies that purchase our system. We have had financial institutions who use our systems on a one-off basis, but also some that do ARR. So being able to go across all of these verticals is really exciting to me, cuz our vision for the company is really for everybody to be able to just plug and play privacy. And yeah, the fact that the models that we’ve trained are so flexible across all these different industries is really exciting.
CW: Yeah. That is really exciting. Generalizability with AI is obviously a hot topic in the research community. Talk to me about a use case that really surprised you.
PT: Oh, we have some customers that use us purely for named entity recognition, because we have such an accurate system, cuz we have to be so accurate, that we’re the most accurate system they’ve found on the market.
CW: Wow. OK. So they’ve tried Azure, Google, Textract, all of that stuff. Yeah. The Amazon stuff. Yeah.
PT: Yeah. Which is hilarious. That’s hilarious. An unexpected side benefit. As for other cases, I think that was the most surprising.
CW: Okay. Just vanilla NER, but really, really good. Yeah. That’s cool. I love it. So let’s zoom out a little bit. You teased unstructured data at the top, and of course that is the topic of the day. How do you, Patricia Thaine, define unstructured data?
PT: Mm. Data that is very messy or, you know, that you need to structure in some way to get some sort of value out of. I guess you don’t necessarily need to. That is quite the question there, Chris, thank you.
CW: You’re welcome.
PT: Yeah. Data. That’s not in a table.
CW: Yeah. That’s my favorite definition is something you can’t fit in an Excel sheet or a database. Yeah. At least not in a meaningful way. Like you could put pictures of cats and dogs and cells in Excel, but you’re not gonna run any formulas on them.
PT: Yeah, exactly. And just along those lines, I mean, there’s so much to say about unstructured data. This is a random thought that came to mind. There’s this belief that companies get more value from structured data than unstructured data.
PT: And well, they put in the time to structure it. So that makes sense. But now we’re at the stage where machine learning is actually allowing us to do the same for unstructured data because of its unprecedented accuracy and scale. Yeah. so that is going to flip around very soon.
CW: Yeah. I agree. One of the earliest aha moments for me in working with unstructured data was when we were trying to do a structured data machine learning project. And we said, this data is just such a mess, and realized that it was all transaction data. And we could go back to the source contracts using NLP and actually rebuild the database as ground truth. Because the documents are actually the ground truth, right? Someone hand-keyed these things and did a terrible job, and it was painful.
PT: Machines work better than humans.
CW: Yeah. Because humans, if you don’t set up your database right, with the right validation, I can put a string in instead of 10%. Right. I can write it out. And that’s a dumb example, but I think we’re gonna see in the future that there’s gonna be a real opportunity for your data lake of unstructured data to actually be the ground truth, instead of some distilled version that’s in a database that you’re not, you know, taking as good care of as you should.
CW: You work with a lot of companies that are working with their unstructured data. Where do you think we are in terms of, where do you think the enterprise is, let me phrase it that way, where do you think the enterprise is on the maturity curve of working with unstructured data? You said it’s gonna flip soon, but like, how soon is soon?
PT: Oh yeah. Soon on an enterprise scale? Not <laugh> that soon. Yeah. I think it really depends. Some banks, for example, are light years away from other banks.
PT: Some insurance companies are light years away from other insurance companies. Agreed. And you can see that the ones who’ve started more recently, and who have taken what AI can do for them from the very beginning, are getting that competitive advantage that the behemoths that are slower to move might be concerned about.
CW: Yeah. So I think there you identified two of the markers that sort of distinguish maturity: having started recently, and size, or I might rephrase it as inertia.
PT: There are a few that are very large that are also innovative.
CW: Okay. Why are they innovative? What are the characteristics of them? That’s letting them win.
PT: I think a mandate from the top.
PT: Yeah. And I think good management,
CW: Good management. Yeah. On the podcast here, we talk with a lot of centers of excellence leaders for automation. And two of the characteristics that I often see in mature organizations are, one, just alignment top to bottom on the value and what we’re trying to accomplish. And, mm-hmm, <affirmative>, two, the people that are actually leading that COE, and this describes all of the guests that I’ve had on the show, actually know what they’re doing. They have good processes and good accountability and all of that. So governance is a huge one. Right, right.
CW: Yeah. So that lines up any other, any other markers that you would say you see as being like a sign that an organization like a big organization is, is ready to go with AI?
PT: Hmm. They’ve started thinking about data rather than just about hiring machine learning engineers.
CW: Yeah. Talk to me about that one. Yeah.
PT: Yeah. So the quality of the data is everything right. When you’re going to train the models. And in some cases some people will hire machine learning engineers and have them label the data and do all of the other work.
CW: Nope. Don’t do it.
PT: It’s just incredibly inefficient. Yeah. And I think that, in part, it’s about hiring proper project management with some experience in AI deployment. And I think that as more and more people get that experience in AI deployment and then move to these larger institutions, or move across institutions to share that knowledge, we’ll probably see more uptake of actual AI solutions that get deployed.
CW: Yeah. That, that’s a really good one. I wanna, I wanna turn that one into a soundbite don’t hire machine learning engineers until you’ve figured out your data and how you really wanna use it.
CW: Yeah. That’s great advice. And if you’re out there and you’re listening to this and you have machine learning engineers that are labeling data en masse, like, fire yourself, you’ve done a bad job.
CW: That’s not what they’re there for.
PT: Hopefully they’re listening to that, but yeah. Maybe, maybe not fire yourself. Learn.
PT: Move on.
CW: That’s the much more diplomatic way to put it. Yeah. But please do learn the lesson. The difference between structured and unstructured data is that it takes intelligence to decide what the right structure is. And two different intelligences looking at the same document, you know, even something as subtle as, I’m in accounts receivable versus accounts payable, I’m gonna be more focused on different parts of that invoice as I work with it. Yeah. Right. So you have to bring that intelligence to those documents and images and whatever else it might be. And your machine learning engineers have intelligence, but it’s not that, exactly.
PT: And this is actually something that I think healthcare AI has figured out, because, interestingly, the data is just so foreign to the machine learning engineers, and it’s so obviously foreign. So who do they get to label it? Doctors, nurses, people who have some sort of background in healthcare or biology.
CW: That’s interesting.
CW: Okay. I generally don’t think of healthcare as being the cutting edge of anything. Of course, I live in the United States. So, you know, it’s a whole other ball of wax.
PT: That’s interesting, but you know, a lot of Canadian startups that do work in healthcare actually sell to US hospitals before they sell to Canadian hospitals. Fascinating. Cuz you guys have more money. So you are at the cutting edge of innovation when it comes to that.
CW: This is good. This is good. I’m learning things here. I love it. So that’s where we are on the maturity curve. Obviously it depends on a few factors, which we talked about: size of the organization, how recently they’ve started, you know, whether they have, I would characterize it as, whether they have someone in a management role or a business project management role that has had a win deploying AI successfully. So the answer to how mature the enterprise is, is it depends on those things.
PT: It does. And it might change from team to team.
CW: Absolutely. Oh yeah, absolutely. Especially in some of these large orgs where you’re just totally siloed. Right. And you all have your own IT stack and business stack on top of one another. Yeah. Thinking about this, and I like the point you raised about Canadian healthcare tech startups: what can AI vendors do to help the enterprise get up that maturity curve faster? Is there anything that we could all be doing together?
PT: I think it really depends on the use case.
CW: Okay. Yeah. Focus on a use case. Talk to me about that. Yeah.
PT: Let’s see. So I think at one point it’s all about whether you’re addressing a current pain point, and in some cases a current pain point that they tried to address themselves for about a year or two and then failed. That’s really when you wanna come in. Yeah. And I’m actually a big believer that people don’t change very easily, and yes, it takes personal experience for a lot of people to recognize what the right path is.
PT: So I think it’s actually just about waiting it out.
CW: Interesting. So this is a bit contrarian. I like this. So you’re saying the enterprise is eventually gonna figure it out; they have to see enough wins and you’ll get sort of a snowball effect, but,
PT: And enough losses as well, not just wins. To be able to look at a vendor and say, oh, these guys, they’re not bullshitting me. They know what they’re doing. They’re gonna be able to deploy correctly.
CW: Yeah. Yeah. Interesting. I like that. And I’ll, you know, I can, I can attest that people don’t like to change. I’ve had the same haircut for eight years now and I <laugh>, I would, I would do something different. I just can’t think of anything.
PT: Funny. I figured out how to cut my own hair in five minutes during the pandemic.
CW: Yeah. Like most of us. Yeah. Yeah. Oh man, what a world. Okay. So for AI vendors, wait it out. Is there anything that AI vendors can do? Indico has had the same experience: we’ve had large, super big enterprise companies come to us and say, oh, we’ll try it, but we’ve tried two dozen other vendors, and then we win the contract. So I can anecdotally say there’s a lot of truth to what you’re saying. Is there anything that AI vendors can do to help the enterprise understand better some of those losses, and, when they do get a win by using your tech, whoever you are, Mr. or Ms. Vendor, why that was a win for them? Is there any part of the sales process that could be better there?
PT: Yeah, I mean, it is sometimes tough to determine ROI. But do your best at saying how many annotation hours or years went into it, and how many engineer hours went into it. Give an example of other organizations with AI teams who chose to use your solution because they didn’t wanna build it themselves. Those things tend to work, but usually those people have already found the problem and are trying to find a solution to it. I think with regards to the ones that aren’t there yet, just soundbites of education. Yeah. And eventually there will be some conversion.
CW: Yeah. So sort of a nurture type of sales motion, right? Yeah. Yeah. Okay. That’s great. You talked about ROI, and I wanna zoom in on that, but first I want to ask: early on, in my days building a data science team, I was more interested in buying platforms to solve problems that were already sort of solved. Rather than, yeah, we could download RoBERTa and, you know, build our own training harness for it and ingest labeled data, why don’t we buy a platform that already does that? They exist. Yeah. Do you find yourself, when people are evaluating your product, in a situation where the data scientists are like, no, we can get there, we’ll build our own privacy engine, we don’t need to buy an external product?
PT: On occasion, on occasion. And so far, I think every single time they’ve come back about a year later.
CW: Okay. And what had changed? They just failed.
PT: They either realized that they didn’t have time to prioritize it, yeah, or realized that it is a lot harder than they thought it was going to be. Yeah. And at some point, I think, once they start looking into it, it can get overwhelming, because they need to do it for multiple languages, they need to do it for multiple legislations, they don’t really have a grasp on privacy law, and they might not have access to a privacy lawyer to give them advice. And that’s just part of it, right? The actual model deployment, the actual making sure the model has no bugs in production, that’s a really painful part. Yeah. And do you have the team, once that model is built, to make sure that that model is going to be reliable?
CW: Yeah. There was a big trend about five years ago where everyone was talking about how you need to hire full stack data scientists. And while there are some, there are definitely parts of the stack they’d rather be working in than others, and often the model ops part of it is not the place where they wanna spend their time. Right.
PT: Right. And there are some people who love model ops.
CW: Absolutely. Yeah.
PT: Yeah. And they didn’t become data scientists.
CW: That’s right. Yeah, exactly. Yeah. They’re an even rarer form of DevOps personality. Right. And super critical.
CW: So let’s see, we’re doing great on time here. We talked a little bit about what the AI vendors can do. For the folks out there in the enterprise, you know, listening to this podcast because they wanna learn more about unstructured data, where should they be going to get better educated? Is it just, you know, call up Chris and Patricia on LinkedIn, or are there resources that you recommend to folks, or is it just, you have to try it out, you have to live it?
PT: So learning more about unstructured data, or about what AI can do for unstructured data?
CW: The latter.
PT: Okay. I think most of the courses that you can find out there, on Coursera, or I think Microsoft has some good ones, on Udemy, and so on. Just learn the basics of AI, and learn about what the data has to look like for at least a couple of projects, so that you can get some idea of what those patterns are that the machine is learning.
PT: And yeah. Try to think of a task, then get a few samples of data and try to label them yourself, and see what you learn.
CW: Yeah. Okay. That’s good advice. Get your hands dirty. Right.
PT: I figure that helps you think of the nuances, of where there might be corner cases that an AI model isn’t going to pick up right off the bat.
CW: Yeah. So this is actually one of my favorite topics when it comes to AI and unstructured data: how do you turn the black box gray? How do you help people understand why the AI made a mistake, or why it made the right choice when it made the right choice?
PT: Yeah. It depends on the task there too. But if you think about PII detection, and whether it made the right choice: normally, if somebody thinks that it didn’t make the right choice and it actually did, we send them the definition that we’re using for that particular entity.
PT: If it didn’t, it’s normally because we haven’t seen any similar context before. So what we do there is not try to explain to them why it didn’t work; we just fix it and send them the fix.
CW: Yeah. Yeah. Interesting. Okay. And on a related note, when you said you do some of these custom builds with a customer’s data, mm-hmm <affirmative>, how do you help them and guide them towards, you know, choosing the right data to build the right model?
PT: Ah, we only need a few samples of their data. It tends to be for a very specific use case, so we don’t really tell them too much about what data to give us; we just ask for some variety of the kind of data that they’ll be processing.
CW: Okay. Awesome. Really lightweight. I love it.
PT: Yeah. It’s super lightweight.
CW: All right. So with the last five or ten minutes, let’s put on our magician hat, or look into the crystal ball, or whatever you wanna call it. What are you most excited about in the next couple of years in terms of where AI is going?
PT: Ooh, interesting question. I’m excited that it’s less hype and much more real world deployments now. The tools are out there to make real world deployments possible in a way that wasn’t available in, say, 2016. The models are much better, the understanding and expertise is growing, and previously machine learning engineers were really hard to come by; now universities are producing a lot of them. We’re going to be able to see a lot more real world deployments as a result of that as well. So what I’m most excited about is less talk, more action.
CW: Yeah, same. I am also very excited about that. I have survived the hype cycle, and everyone carries some wounds, I think. What do you think, how should I phrase this, what do you think the scariest things are about AI becoming more ubiquitous, especially in the enterprise context, in the next few years?
PT: Mm. I would tie that into, I mean, of course, the misuses, the possible misuses. Yep. For example, the technology that we’re building, from what we build now to what we’re planning on building, it can on a dime turn into surveillance tech.
PT: And I’m building a privacy product, and we’re building a privacy company together. But what if one day I accidentally onboard an investor who tells us, oh, look at that hundred-million-dollar contract with the NSA?
PT: Or with another surveillance organization, for surveillance specifically. So here’s the thing. I don’t disagree that a certain amount of surveillance is probably necessary, right? Yeah. For national security. However, not everybody needs to be surveilled, and as we know, a lot of surveillance agencies do overextend. Yeah. What has been proposed, for example, by Ann Cavoukian when she was privacy commissioner of Ontario, is privacy preserving surveillance. And that is something that our tech could help with, by removing the personally identifiable information. Unless somebody says in a conversation, I’m going to attack X location. Yeah. Right. Then you might wanna figure out whose name that was and where they’re located.
CW: Yeah. So conditional surveillance,
PT: Conditional surveillance. Yeah.
CW: I like that.
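The conditional surveillance idea in this exchange can be sketched in code: redact all personal identifiers by default, and only flag a record for lawful human review when a trigger phrase is detected. This is a minimal illustration with hypothetical regex patterns standing in for real entity-detection models; it is not Private AI’s actual API or method.

```python
import re

# Toy stand-ins for PII detection; a production system would use a
# trained NER model rather than hard-coded regexes like these.
PII_PATTERNS = {
    "NAME": re.compile(r"\b(?:Alice Smith|Bob Jones)\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical trigger phrases; a real system would use a classifier.
THREAT_PATTERN = re.compile(r"\b(attack|bomb)\b", re.IGNORECASE)

def conditionally_redact(text: str) -> tuple[str, bool]:
    """Redact PII by default; also flag the record when a threat is found.

    Returns the redacted text plus a flag telling a human reviewer that
    the original, unredacted record should be escalated for lawful review.
    """
    flagged = bool(THREAT_PATTERN.search(text))
    redacted = text
    for label, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[{label}]", redacted)
    return redacted, flagged

msg = "Alice Smith said she will attack the depot; call 555-123-4567."
redacted, flagged = conditionally_redact(msg)
print(redacted)  # PII replaced with [NAME]/[PHONE] placeholders
print(flagged)   # True: escalate the original for review
```

The key design point is that redaction always happens; detection of a trigger only changes who is allowed to look at the original record, which keeps the default path privacy-preserving.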
PT: Yeah. But there’s also no way of knowing how they’d be using the technology. So all of those things keep me up at night.
CW: Okay. All right. As more and more of your containers get out into the wild. Right. Interesting. Yeah. I like the thought about conditional surveillance. I also really like that you’re worried about that. That’s good. And it sort of falls into a category we don’t get into too often, just because of the sort of profiles that we talk to: ethical AI. Right. You’re training an intelligence; it’s a silicon intelligence, but it’s still an intelligence. And if you’re not careful, your intelligence isn’t gonna be ethical, or ethically used.
PT: Absolutely. And I guess in part it’s also about, yeah, being careful who you partner with, being careful who you take money from, being careful who you onboard onto your senior leadership team.
CW: Yeah. Yeah. Absolutely. The people that steer the ship. That’s great advice for the startup founders and the AI experts out there, the people at the forefront. To wrap this up, my meta question of all meta questions: what did I not ask you that I should have?
PT: Ooh. Hmm. Give me a minute to think about that.
CW: Yeah. Take a minute.
PT: How about why I’m excited about our partnership with Indico?
CW: Oh, okay. Yeah. Why are you Patricia excited about this nascent partnership between our two companies?
PT: So I’m very excited about it because I love the way that Indico works with regards to processing unstructured data. Super convenient, bringing in a bunch of different services together. Yeah. And also making sure that those services work well, because curation is really hard for the organizations that are trying to determine what to build. So I’m really excited that we’re part of that pipeline now. And it’s really great to be part of a high quality ecosystem.
CW: Yeah. Well, let me say this first: I also am really excited. I was one of a couple of people that got to kick the tires back, you know, a few months ago, and was really in love with the way the tech worked. So yeah, this is gonna be great. And two, I think the point you made earlier: just in the last five or six years, the tech has started to become good enough, and is now good enough, that you can deploy AI on unstructured data in the enterprise and find real ROI. This is the point where you start to build an ecosystem of, I have this tool for this, and I have this tool for that, and here’s the platform that glues them all together. And you know, I don’t do this all the time, but commercial for Indico: Indico is that platform that glues them all together. So, you know, come talk to me.
PT: That’s amazing.
CW: Great. Well, this has been Unstructured Unlocked. My guest today has been Patricia Thaine, CEO of Private AI, a fantastic privacy company. Patricia, thank you so much. It’s been a great time.
PT: Thank you so much, Chris. I had a really fun time.
Check out the full Unstructured Unlocked podcast on your favorite platform.
To learn more, subscribe to our LinkedIn newsletter.