What Even is Sentiment Analysis?

Machine learning is an incredibly frothy space. It almost seems like every week there’s some new company using machine learning in some revolutionary new way. With all of the deep learning, data mining, and artificial intelligence buzzwords floating around, it can be extremely difficult to separate the signal from the noise.
This is not one of those articles. Machine learning is an incredibly tough subject, and if the goal was to inundate you with buzzwords and technical detail until you believed I was qualified to talk about machine learning it would be simple. What’s much harder is doing the opposite. I’m going to take a piece of machine learning, in this case sentiment analysis, and explain what it’s good for, what it’s not, and what to watch out for without using a single line of code.

What is sentiment analysis?

At the highest level it’s taking a piece of text and determining whether it is positive or negative. It’s something that humans do automatically, so it can be difficult to appreciate how difficult it is to teach a computer to do the same. Let’s grab an example from the classic Hutzler 571 banana slicer:

As shown in the picture, the slicer is curved from left to right. All of my bananas are bent the other way.

As a human, I can easily tell that this is a negative review. If the banana slicer is facing the wrong way, then it’s clearly broken. Amazon confirms this – it’s a two star review.
But here’s the question:

How did I know this was a negative review?

There are no clear negative words in the review. If they had said something like, “this is an awful banana slicer,” then I could have guessed that since someone used the word “awful”, it’s a very negative review. When it comes to more subtle phrases like the one above, it can be very difficult to say exactly why I know it’s a negative review. Even though I’m certain that it is negative, it is very hard for me to explain why it’s negative.

How do I teach a computer to do that?

The short is that it’s not easy. Realize that as humans, we approach the problem with all sorts of knowledge that a computer doesn’t have. To a computer, that review looks like this:

01000001 01110011 00100000 01110011 01101000 01101111 01110111 01101110 00100000 01101001 01101110 00100000 01110100 01101000 01100101 00100000 01110000 01101001 01100011 01110100 01110101 01110010 01100101 00101100 00100000 01110100 01101000 01100101 00100000 01110011 01101100 01101001 01100011 01100101 01110010 00100000 01101001 01110011 00100000 01100011 01110101 01110010 01110110 01100101 01100100 00100000 01100110 01110010 01101111 01101101 00100000 01101100 01100101 01100110 01110100 00100000 01110100 01101111 00100000 01110010 01101001 01100111 01101000 01110100 00101110 00100000 01000001 01101100 01101100 00100000 01101111 01100110 00100000 01101101 01111001 00100000 01100010 01100001 01101110 01100001 01101110 01100001 01110011 00100000 01100001 01110010 01100101 00100000 01100010 01100101 01101110 01110100 00100000 01110100 01101000 01100101 00100000 01101111 01110100 01101000 01100101 01110010 00100000 01110111 01100001 01111001 00101110

So, we have to go from that to “this is negative”. As a human we know that those 0s and 1s are letters. We know that when you combine those letters they form words, and when you put those words together they create sentences. Those sentences can express how we feel about a particular topic. The human mind goes through an extraordinarily complex process to figure this all out and make an extremely accurate prediction, and we don’t even realize it.

We’ve been trying to teach computers how to do this for over a decade now and we’re… making progress.

How do we know we’re making progress?

Excellent question! So, to figure out if we’re making progress we need to set a goal. When you hear that someone hit 94% accuracy, what’s that out of? What’s 100% accuracy?

For such a complicated problem, the answer is actually quite simple. Basically, this is what we do:

It takes 5 minutes just to scroll to the bottom.

We go through a very long spreadsheet. In this case, it’s filled with tweets. We manually decide whether each one is positive or negative, and we write the answer down. I can not emphasize enough how painful this is:

People earning six figures a year scroll through a spreadsheet reading tweets and deciding whether they are positive or negative. For weeks at a time. This is not a joke.

This is called “labeling data”, and it is probably the single largest limiting factor for machine learning today. We have people do it, we say that’s what 100% accuracy is, and then we have the computer do it. The more the computer agrees with the person, the more accurate we consider it to be. Seems straightforward, right?

Why it’s not quite that simple

It turns out that not all data is created equal. Predicting sentiment is a very different problem depending on whether the text you’re making predictions on consists of Amazon reviews, tweets, or medical journals. It also depends how much data you’ve got.

It’s really easy to predict the sentiment of a single news article. It’s much, much harder to predict the sentiment of 100,000 tweets.

To make things that much more complicated, we also have to deal with the learning process. When you teach a computer what sentiment is, you end up showing it a huge number of examples. Depending on the data you’ve got, the number of examples you might use range from a few hundred to hundreds of millions.

So what’s the problem? Well it’s not really fair to use those examples when you check your model’s accuracy. After all, you already know the answers.

Like this, but with computers.

To make things even more complicated, there’s the whole question of domain. The thing is that there are SO MANY different kinds of text. Each one has a different “sentiment” associated with it. “Continuous cleavage” means something very different in a geologic context than it does in everyday speech. No model is going to be the best at every kind of text. If you ever end up maintaining a sentiment analysis API, here’s a quick sampling of the kinds of things people are going to ask you to determine the “sentiment” of:

Scientific journal articles
UN proceedings
Facebook posts
Cat pictures
Pitchfork reviews
The number 7 (less positive than 3)
Obama
War and Peace

Again, this is not a joke. People ask us to classify things like this every day. When you’re handling hundreds of requests each second (this is the scale that indico deals with, other companies deal with very different amounts of data) you really see it all.

A model that is really good at analyzing Facebook is probably not that good at analyzing Nature articles

Personally I like to consider myself pretty up-to-date with the world at large. I’m an early adopter of a lot of technology, and am always looking for a new artistic TV show… but I just learned who the Kardashians were last week. In machine learning terms you might say I “overfit”. I “failed to generalize” to examples outside of my “training set”. In human terms you might say that I’ve been living under a rock. I may have gotten really good at recognizing everything under my rock, but there is a very big world out there.

You’re telling me this isn’t a phone?

This “overfitting” problem is very well known and has been studied in incredible detail. The question of “How do I get this computer to generalize?” is one that has existed for as long as machine learning itself, and to be perfectly honest it’s still one of the biggest problems in machine learning.

Let’s get real

What are the affects of this in the real world? Well it means that it’s almost impossible to put together an accurate measure of how “good” we are at sentiment analysis. Most commercial systems for sentiment analysis explicitly prohibit the use of their products to assess accuracy. The reason for this is that any measure of accuracy is subject to bias. Any company can put together a series of tests that makes their sentiment analysis look more accurate than everyone else’s. It just happens to be extremely dishonest, and in many cases illegal.

To show you how wide the spread can be among very similar problems, we’ve gone ahead and posted some of our accuracy numbers. We’ve removed the neutral class from these examples because our product only predicts positive vs. negative. This may make our numbers seem artificially inflated compared to others benchmarking on the data. This data should not be used to compare indico to any other services, as such a comparison would be apples to oranges.

If some of those metrics like “precision” and “recall” aren’t familiar to you, don’t worry. The point is to show you how a single model can have such a wide set of responses.

Conclusion

Be extremely wary of benchmarks, especially if a company has gone out and chosen their own dataset. At the end of the day, you want to be able to try it yourself. If you have a specific dataset, you need to make sure the benchmarks are run on the same data that you will be using them on. If you don’t have a dataset at the ready, you should rely on your own qualitative assessment of the model. Chances are, it’ll give you a better sense of the API than any benchmark will.

[addtoany]

Increase intake capacity. Drive top line revenue growth.

Schedule Demo

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Upcoming Webinar

From Automation to Agency: Indico Data Unveils the Future of Insurance with Agentic AI

Technology

Solutions

Why Indico

By Industry

By Use Case

By Role

Resources

Documentation

Customer Stories

Indico Named as Major Contender and Star Performer in Everest Group's PEAK Matrix® for Intelligent Document Processing (IDP)

BLOG

What Even is Sentiment Analysis?

What is sentiment analysis?

How did I know this was a negative review?

How do I teach a computer to do that?

How do we know we’re making progress?

<img decoding="async" class="alignnone size-full wp-image-1700" src="https://indicodata.ai/wp-content/uploads/2021/09/Screen-Shot-2015-08-14-at-6.20.37-PM-2.png" alt="This spreadsheet is longer than War and Peace">It takes 5 minutes just to scroll to the bottom.

Why it’s not quite that simple

Like this, but with computers.

You’re telling me this isn’t a phone?

Let’s get real

Conclusion

Increase intake capacity. Drive top line revenue growth.

Related Posts

Artificial Intelligence, Digital Transformation, Insurance, Insurance Claims, Insurance Underwriting, Intelligent Document Processing, Machine Learning

Maximizing underwriting profitability through AI-powered document processing

Artificial Intelligence, Data Analytics, Data Science, Digital Transformation, Insurance Claims, Insurance Underwriting, Intelligent Document Processing, Machine Learning

Better data, better decisions webinar recap: How AI and automation are reshaping risk and pricing

Artificial Intelligence, Data Analytics, Data Science, Digital Transformation, Insurance Claims, Insurance Underwriting, Machine Learning

Innovation vs. risk in insurance decisioning: Leveraging AI to transform strategy

See how Indico Data’s AI-driven solutions can revolutionize your decision-making processes.

Schedule1-1 Demo

Resources

Blog

Gain insights from experts in automation, data, machine learning, and digital transformation.

Unstructured Unlocked

Enterprise leaders discuss how to unlock value from unstructured data.

YouTube Channel

Check out our YouTube channel to see clips from our podcast and more.

Get our best content on intelligent automation sent to your inbox weekly!

It takes 5 minutes just to scroll to the bottom.

Schedule
1-1 Demo