Building Better Search

In speech and writing, how often do we use one term, and only that term, to describe an idea? For example, if you were searching through a document for information relating to a business’s current assets, looking up only “current assets” would mean missing anything that discusses cash, short-term assets, receivables, inventory, and prepaid expenses. Yet in too many of our search interactions today, searching for information is limited to keyword lookups. Some newer techniques augment strict keyword-based approaches by automatically including synonyms from pre-built dictionaries. While this can pay dividends, the approach can be brittle and isn’t as comprehensive as concept-based search. Effective concept-based search, which accounts not just for synonyms but also for context in language, can lead to a very different search experience. Imagine a scenario where a search for “wealth inequality” also draws hits such as “the gap between the rich and the poor”, “unfair distribution of wealth”, “income inequality”, and so forth.
Pure keyword-to-keyword search is unintuitive to human speech and expression. It limits us — and with today’s deep learning capabilities, it’s a limitation that we can avoid.

Remember when, a few months ago, a judge ordered the then-nominee for the EPA, Scott Pruitt, to release thousands of emails so that the CMD watchdog organization could inspect them for ties to fossil fuel companies? Nearly 7,000 pages of emails were handed over, and the following day, CMD revealed that Pruitt was indeed friendly with various fossil fuel businesses. Now, we don’t know whether CMD used a keyword search to find the relevant documents, but if that was all they did, they would have had to brainstorm every possible keyword and its variations, and would still have fallen short of a complete set of results.

With a well-built concept search system, entering “fossil fuels” into the search bar should return not only all mentions of “fossil fuels”, but anything related to oil and gas too, from fracking plans to oil companies.

For legal aides who spend days poring through hundreds of documents, emails, and other content to discover useful evidence, such a system would save a significant amount of time and surface evidence that a keyword list would miss. It is applicable to other industries too, from finance to medicine: anything that requires combing through reams of text.

So, how does fuzzy search work?

indico’s Text Features API creates hundreds of thousands of rich feature vector representations for a given text input, learned using deep learning techniques. These feature vectors (numerical representations in multi-dimensional space) are a computer’s way of assigning meaning to language. We can use these representations to calculate the similarity of concepts between sentences and a search query. This is why you can search a broad concept like “free market” and get results about money, competition, and demand.
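To make the similarity step concrete, here is a minimal sketch that ranks sentences against a query by cosine similarity. It is not indico’s implementation: the three-dimensional vectors are made up for illustration (a real model would produce much longer vectors), and the function names `cosine_similarity` and `rank_by_similarity` are our own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, sentence_vectors, top_k=3):
    """Return the top_k (sentence, score) pairs, most similar first."""
    scored = [
        (sentence, cosine_similarity(query_vector, vector))
        for sentence, vector in sentence_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# ASSUMPTION: made-up toy vectors, standing in for what a learned model would return.
TOY_VECTORS = {
    "Prices fall when competitors enter the market.": [0.9, 0.1, 0.2],
    "Demand for natural gas rose last quarter.": [0.5, 0.5, 0.1],
    "Lunch is at noon in the cafeteria.": [0.0, 0.1, 0.9],
}
QUERY_VECTOR = [0.8, 0.2, 0.1]  # pretend vector for the query "free market"

results = rank_by_similarity(QUERY_VECTOR, TOY_VECTORS, top_k=2)
for sentence, score in results:
    print(f"{score:.3f}  {sentence}")
```

None of the sentences needs to contain the query words; the ranking depends only on how closely the vectors point in the same direction.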

An experiment

We decided to compare concept vs. keyword search on another public email dataset, the Enron emails, with a simple concept search model built with indico’s Text Features API. Specifically, we explored the 1000+ emails of some randomly chosen users and determined which concepts to search for based on a word cloud analysis of the entire dataset from Zichen Wang. We broke down the emails to the sentence level using the automatic sentence splitting function of indico’s Text Features API. The top results of a concept search for the phrase “effect of economic downturn” were:

Results for query "effect of economic downturn"

These are all excellent examples of concerns about, and the impact of, a recession. Note that our search query is not explicitly stated in any of these sentences. We also found interesting results for human resources:

These results are particularly intriguing as they don’t mention “HR” or a specific task that we would associate with HR, like “careers” or “hiring”, but the content chosen by the Text Features API is a clear example of these functions in action.

Speaking of HR, while poking around through the database, we noticed an email that clearly indicated some kind of (failed) tryst had taken place between two co-workers. So, we pulled a sentence directly from that email to see if we could pinpoint any other communication that revealed a similar pattern…

Search query: “Nothing has changed that nor do I think we need to act weird around each other going forward.”

Ooh lala.

Note how, of the top three fuzzy search results, only one email contained the term we were searching for. If we consider this on a grander scale, how much information are we missing out on by simply using keyword searches? How many legal cases, business deals, and other decisions may have been affected by incomplete information?
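For readers who want to try the comparison themselves, the shape of the experiment can be sketched roughly as follows: split each email into sentences, then run a keyword search and a concept search over the same sentences. Everything here is an assumption for illustration. The emails are invented (they are not from the Enron dataset), the regex splitter is a crude stand-in for a proper sentence splitter, and `toy_embed` is a hand-written lexicon standing in for a learned model such as the Text Features API.

```python
import math
import re

# ASSUMPTION: a tiny hand-written lexicon; a learned model would replace this entirely.
TOY_CONCEPTS = {
    "economy": {"economic", "downturn", "recession", "layoffs", "budget", "cuts", "revenue"},
    "social": {"lunch", "party", "weekend", "golf"},
}


def toy_embed(text):
    """Count how many words of the text fall under each toy concept."""
    words = re.findall(r"[a-z]+", text.lower())
    return [sum(word in vocab for word in words) for vocab in TOY_CONCEPTS.values()]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_search(query, sentences):
    """Case-insensitive exact-phrase match."""
    return [s for s in sentences if query.lower() in s.lower()]


def concept_search(query, sentences, embed, top_k=3):
    """Rank sentences by vector similarity to the query; drop zero-score matches."""
    query_vector = embed(query)
    scored = [(s, cosine(query_vector, embed(s))) for s in sentences]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented example emails, for illustration only.
emails = [
    "Budget cuts are coming next quarter. We may see layoffs if revenue keeps falling.",
    "Anyone up for golf this weekend? Lunch is on me.",
]
sentences = [s for email in emails for s in split_sentences(email)]

query = "effect of economic downturn"
keyword_hits = keyword_search(query, sentences)
concept_hits = concept_search(query, sentences, toy_embed)

print("keyword hits:", keyword_hits)
for sentence, score in concept_hits:
    print(f"concept hit ({score:.2f}): {sentence}")
```

In this toy setup the keyword search finds nothing, because no sentence contains the literal phrase, while the concept search surfaces the two sentences about budget cuts and layoffs. A real system would swap `toy_embed` for vectors from a trained model.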

If you’d like to learn more or are looking to implement a machine learning solution for your business, reach out to us at contact@indico.io.
