Recognizing Emotion in Text with Machine Learning (No Code Required)

June 15, 2016 | Business, Text Data Use Case, Tutorials

Hey guys! I’m Julie, Director of Ops, back with a quick tutorial on two awesome new offerings we’ve built for you. I’m on the non-technical / cat-loving side of things, so I’ll break things down a bit. I also run the Machine Learning Without a PhD group on LinkedIn, where jargon isn’t allowed. Feel free to join if you’d like 🙂 #machinelearning4everyone

So today you’re getting a 2-for-1 (or, as my dad likes to call it, a “TooFer”).

1) An overview of using our shiny new indico toolkit
2) A demonstration of my favorite new indico API: Emotion API

Today’s topic will be the 2016 Tony Awards that aired this past Sunday night.

With all the sadness in the world lately, we’re keeping it light this week.

 

Let’s begin by heading to your dashboard. Oh wait, you don’t have an account yet? Let’s fix that right now. And don’t worry, you get 10K API calls free and don’t need to enter your credit card.

 

After logging in, you’ll see the toolkit underneath your dashboard on the left-hand side of the screen, or you can just go to https://indicodata.ai/dashboard/analyze…and here we are.

indico toolkit

 

The toolkit is an awesome place to parse text for quick insight into your data using the following text APIs:

 

  • Sentiment Analysis
  • Text Tags
  • Language Detection
  • Political Analysis
  • Keywords
  • People
  • Places
  • Organizations
  • Emotions
  • Personality
  • Personas

Check out our Product page if you’d like further explanation of each API.

Anyway, as I mentioned earlier, we’ll be checking out our new Emotion API. We took the classic problem of Sentiment Analysis and kicked it up a notch. Instead of a single positive-or-negative reading, the Emotion API describes text in terms of five major emotions: anger, fear, joy, sadness, and surprise.
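Here is a minimal Python sketch of what “five emotions instead of one sentiment score” looks like in practice. It assumes, purely for illustration, that a classifier returns one probability per emotion as a dictionary and that the top-scoring emotion is the one reported. The `dominant_emotion` helper and the example scores are hypothetical and do not represent the Emotion API’s actual response format.

```python
# Hypothetical sketch: pick the strongest of five emotions from a score dict.
# Assumes (for illustration only) one probability per emotion.
EMOTIONS = ("anger", "fear", "joy", "sadness", "surprise")


def dominant_emotion(scores: dict[str, float]) -> tuple[str, float]:
    """Return the (emotion, score) pair with the highest score."""
    missing = [e for e in EMOTIONS if e not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    # Ties go to whichever emotion comes first in EMOTIONS.
    best = max(EMOTIONS, key=lambda e: scores[e])
    return best, scores[best]


# Made-up scores for a single tweet, not real model output.
example = {"anger": 0.02, "fear": 0.05, "joy": 0.81, "sadness": 0.04, "surprise": 0.08}
print(dominant_emotion(example))  # ('joy', 0.81)
```

Keeping all five scores, rather than just the winner, makes it easy to see when two emotions are nearly tied.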

So let’s talk about Hamilton. You know, that musical that was nominated for a record-setting 16 Tony Awards. (16!) I haven’t been lucky enough to see this show and you’re a unicorn if you’ve managed to get tickets to watch it. In case you didn’t know – you have to pay to even get into the lottery for a chance at purchasing a ticket. Yep.

I grabbed some data to prepare for the Twitter whirlwind around #Hamilton at the #TonyAwards: around 100 random tweets from Sunday afternoon (before the awards), which I then popped into the indico Toolkit. Paste it in, click on the API you want to use, and voilà! Machine learning.

The Toolkit will return a CSV file with four columns: authored text, predicted emotion, a linguistic assessment of confidence (e.g. “Very Confident”), and a numerical confidence score (e.g. “4”). So, what did we find? Joy. All the joy. Here’s a sample of the machine learning algorithm’s most confident assessments of joy:

Joy before the awards

 

I then took a quick pulse after the Tony Awards were over to see whether anything had changed. Not surprisingly, we still have a lot of happy people.

Joy after the awards

 

There were some results that the algorithm really wasn’t confident about, though, so it’s a good idea to do a bit of post-processing to remove any results marked “less confident” or completely “not confident”. Here’s an example of a tweet that the model thought was expressing sadness, but was “not confident” about:

AND TOTALLY WELL DESERVED!!! [four emoji] https://t.co/abCmw8QRVN

You can see why it would be confused by that. I’m confused by that.

We’re working on adding a feature to the Toolkit that lets you set a threshold so that the less/not confident results are automatically removed, but for now I just sorted everything in my spreadsheet based on the confidence score.
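If you’d rather script this clean-up than sort by hand, here is a minimal Python sketch. It assumes the downloaded CSV has a header row followed by the four columns listed earlier, in that order (text, predicted emotion, confidence label, numeric score). The helper names, the set of labels treated as weak, and the optional `min_score` cutoff are all assumptions for illustration, not the Toolkit’s documented export format.

```python
import csv
import io
from typing import NamedTuple, TextIO

# Labels treated as too weak to keep (assumed wording; the real export may differ).
WEAK_LABELS = {"less confident", "not confident"}


class Prediction(NamedTuple):
    text: str
    emotion: str
    confidence_label: str
    score: float


def load_predictions(csv_file: TextIO) -> list[Prediction]:
    """Parse rows assuming a header, then: text, emotion, confidence label, score."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the assumed header row
    return [
        Prediction(text, emotion, label, float(score))
        for text, emotion, label, score in reader
    ]


def confident_only(rows: list[Prediction], min_score: float = 0.0) -> list[Prediction]:
    """Drop weak labels and scores below min_score, most confident first."""
    kept = [
        row
        for row in rows
        if row.confidence_label.strip().lower() not in WEAK_LABELS
        and row.score >= min_score
    ]
    return sorted(kept, key=lambda row: row.score, reverse=True)


# Made-up rows in the assumed layout, not real Toolkit output.
sample = io.StringIO(
    "text,emotion,confidence,score\n"
    "So happy for the whole cast!,joy,Very Confident,4\n"
    "AND TOTALLY WELL DESERVED!!!,sadness,Not Confident,1\n"
    "Did that just happen?,surprise,Very Confident,5\n"
)
for row in confident_only(load_predictions(sample)):
    print(f"{row.score:g}  {row.emotion:<9} {row.text}")
```

Raising `min_score` plays roughly the role of the threshold feature mentioned above. The right cutoff depends on the Toolkit’s actual scoring scale, which this sketch does not try to guess.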

Let’s take an aggregate look at all the emotions expressed after Hamilton walked away with 11 wins. Yes, I said 11. You read that right. Anyway – I created this handy-dandy spreadsheet designed to autofill the results into a bar chart, and pasted everything from my downloaded CSV file right in.

Ta-da.
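The spreadsheet template does the counting behind a chart like this. For anyone who prefers a script, here is a minimal Python sketch of the same tally that prints a plain-text bar chart. It is not the template mentioned in this post. The helper names and the sample labels are hypothetical, and the sketch assumes the emotion labels have already been filtered for confidence.

```python
from collections import Counter

EMOTIONS = ("anger", "fear", "joy", "sadness", "surprise")


def emotion_counts(predicted: list[str]) -> dict[str, int]:
    """Count predictions per emotion; zero counts are kept, unknown labels ignored."""
    tally = Counter(label.lower() for label in predicted)
    return {emotion: tally.get(emotion, 0) for emotion in EMOTIONS}


def text_bar_chart(counts: dict[str, int], width: int = 30) -> str:
    """Render counts as '#' bars scaled so the largest bar is `width` characters."""
    largest = max(counts.values(), default=0)
    lines = []
    for emotion, n in counts.items():
        bar = "#" * round(n / largest * width) if largest else ""
        lines.append(f"{emotion:<9}{bar} {n}")
    return "\n".join(lines)


# Made-up labels standing in for the emotion column of a filtered CSV.
labels = ["joy", "joy", "surprise", "joy", "fear", "surprise", "sadness", "joy"]
print(text_bar_chart(emotion_counts(labels)))
```

Keeping zero counts (anger, in a case like this one) makes an absent emotion show up as an empty bar instead of silently disappearing from the chart.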

 

Here’s a closer look at the results, which I’ve post-processed to highlight the ones that the algorithm felt most confident about:

Aggregate results of emotion displayed on Twitter for Hamilton at the Tonys

 

Shocker – joy and surprise are the overwhelming leaders. I thought it was interesting that there were any instances of fear and sadness though, so I took a closer look. Here’s an example of what the algorithm thought felt fearful:

 

What a time to be alive @HamiltonMusical

Apparently that references a song from the Hamilton soundtrack. Reading this tweet without having heard the song, and without really knowing the context around it, I could see it leaning toward either joy or fear. Maybe even sadness. My guess is that without an exclamation mark (e.g., “What a time to be alive! Woohoo!”), the algorithm figured (quite reasonably) that fear was the best choice.

Let’s take a look at what it thought felt sad:

.@HamiltonMusical literally saved my life last year. Got through one of the toughest times of my life thanks to this show. #TonyAwards

Perhaps that’s more hopeful than sad, but there’s definitely a sad element to it. Given the constraint of just five emotions for the algorithm to choose from, “sad” is a pretty reasonable prediction. Perhaps we should add more categories. What do you think? Reach out to me at julie@indico.io with your thoughts!

The one emotion missing from the chart above is anger. One tweet actually was marked as angry, but I left it out because it didn’t meet the “very confident” threshold. I’m going to show it to you anyway, because I thought it was pretty cool that the model picked up on it (?):

a bunch of Tonys could be the thing that finally gets the mainstream press to pay attention to #HamiltonMusical #TonyAwards

So, why is this awesome?

Think about how this algorithm scales. I only looked at 100 tweets here, but if you have 1,000, or 10,000 – or even 100,000 tweets, you can have an immediate gut check on your brand or campaign in seconds, without having to read through every single one until you want to cry. Get the information you need quickly, and then get on to what really matters to you, whether it’s writing a report on the success of your recent ad campaign, or figuring out who your best target audience is. Looking at demographics? That’s old news. Now you can understand – really understand – the people you’re trying to talk to.

And there you have it. Unstructured text data in… emotions out. Email me at julie@indico.io if you want the template (the one that produced that bar chart) for your own use.

And in closing…

Effective January 1, 2020, Indico will be deprecating all public APIs and sunsetting our Pay as You Go Plan.

Why are we deprecating these APIs?

Over the past two years, our new product offering, Indico IPA, has gained a lot of traction. We’ve helped some of the world’s largest enterprises automate their unstructured workflows with our award-winning technology. As we continue to build and support Indico IPA, we’ve concluded that the platform requires our utmost attention if we are to provide the quality of service and product we strive for. As such, we will be focusing on the Indico IPA product offering.
