Happy Chinese New Year!
With the Lunar New Year around the corner, we can’t help but wonder what the latest trends are during this special time of the year. These trends are not only interesting but also crucial for providers of goods and services who want to capitalize on the spirit of celebration. Here at indico, we want to demonstrate how quickly and easily these trends can be found, so that businesses can learn what people buy and what they care about directly from the potential customers themselves.
In this demonstration, we will look at Twitter activity related to Chinese New Year in the run-up to the holiday. We will be using Python (2.7+), TwitterSearch (an unofficial, third-party Twitter client library), and indico’s Python client library. Essentially, we will be getting tweets from Twitter, extracting the text and any images, and feeding them to indico’s Keywords API and Image Recognition API.
To skip the technical bits, go ahead and jump to the bottom of the post.
Getting Started
Authentication is always rough, especially for those new to using online APIs. For Twitter, you will need the following:
Consumer Key
Consumer Secret
Access Token
Access Token Secret
To obtain these keys, you will need to:
1. Sign into a Twitter Account
2. Navigate to https://apps.twitter.com/
3. Click Create New App
4. Fill in the details. Feel free to put any placeholder website under Website and leave Callback URL empty.
5. Read and agree to the Developer Agreement
6. You should be taken to your newly created Application page. Go to Keys and Access Tokens
7. Click Generate My Access Token and Token Secret
8. Now, on this page, you should have access to all 4 keys.
For indico, simply sign into https://indicodata.ai and find your API Key at the top of your dashboard. If you have trouble, chat with us straight from the chat window.
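One way to keep the four Twitter keys and the indico API key out of your source code is to read them from environment variables. The sketch below is only an illustration: the variable names and the load_credentials helper are our own assumptions, not something Twitter or indico defines.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "INDICO_API_KEY",
]


def load_credentials(environ=None):
    """Return a dict of credentials, or raise if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to the Twitter client as consumer_key, consumer_secret, access_token, and access_token_secret, and the indico key handed to the indico client. Failing early with a list of the missing names tends to be easier to debug than an authentication error halfway through a search.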
Installation
Assuming you have a development environment (access to a terminal or command prompt), go ahead and install the following:
1. Python 2.7+
2. pip (a Python package manager)
3. TwitterSearch (via pip)
4. indicoio (via pip)
The Code
Here are our steps: get tweets, give them to indico, and get results. Don’t worry, the code is simple and straightforward. We promise.
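from TwitterSearch import TwitterSearch, TwitterSearchOrder, TwitterSearchException

# Example search keywords; substitute your own
tags = ["ChineseNewYear"]

try:
    # Construct Search Query
    tso = TwitterSearchOrder()
    tso.set_keywords(tags)
    tso.set_include_entities(True)
    # Authorize the Client (fill in your four keys)
    CLIENT = TwitterSearch(
        consumer_key="",
        consumer_secret="",
        access_token="",
        access_token_secret=""
    )
    tweets = CLIENT.search_tweets_iterable(tso)
except TwitterSearchException as e:
    # Catch Potential Search Exceptions
    print(e)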
Extracting the Text and Images
Twitter will give us loads of information about the tweets. That is extremely nice of them, but we only need certain parts.
# Extract Data
extracted_data = {}
for tweet in tweets:
    # Extracting the entities / media for images and hashtags
    entities = tweet.get("entities", {})
    media = entities.get("media", [])
    # info will hold our tweet information
    info = {}
    info["id"] = tweet.get("id_str")
    info["text"] = tweet.get("text", "")
    info["hashtags"] = [tag["text"] for tag in entities.get("hashtags", [])]
    info["photos"] = []
    for medium in media:
        if str(medium["type"]) == "photo":
            info["photos"].append(medium["media_url"])
    # Saving via the ID as a key to update tweets and prevent duplicates
    extracted_data[info["id"]] = info
Additionally, indico’s APIs accept URLs as inputs! indico will handle downloading the images and downsizing them for you.
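To make the extraction step concrete, here is a small self-contained sketch that applies the same logic to a single tweet. The extract_info helper is our own name, and the sample tweet is made up for illustration; real payloads from Twitter carry many more fields.

```python
def extract_info(tweet):
    """Pull out the id, text, hashtags, and photo URLs from one tweet dict."""
    entities = tweet.get("entities", {})
    return {
        "id": tweet.get("id_str"),
        "text": tweet.get("text", ""),
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
        "photos": [
            medium["media_url"]
            for medium in entities.get("media", [])
            if medium.get("type") == "photo"
        ],
    }


# Made-up tweet for illustration only.
sample_tweet = {
    "id_str": "1",
    "text": "Happy #ChineseNewYear!",
    "entities": {
        "hashtags": [{"text": "ChineseNewYear"}],
        "media": [
            {"type": "photo", "media_url": "http://example.com/lantern.jpg"},
            {"type": "video", "media_url": "http://example.com/clip.mp4"},
        ],
    },
}

info = extract_info(sample_tweet)
print(info["hashtags"])  # ['ChineseNewYear']
print(info["photos"])    # ['http://example.com/lantern.jpg']
```

Only the entry whose type is "photo" survives, which is what we want before handing URLs to the Image Recognition API.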
Adding Indico Data
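import indicoio
indicoio.config.api_key = ""  # your indico API key

# Adding Indico
def add_indico_data(tweets):
    for tweet in tweets.values():
        try:
            tweet["keywords"] = indicoio.keywords(tweet["text"], top_n=3)
        except indicoio.IndicoError as e:
            # Keep the key present so later analysis does not fail
            tweet["keywords"] = {}
            print(e)
        photo_info = {}
        for photo in tweet["photos"]:
            try:
                photo_info[photo] = indicoio.image_recognition(photo, top_n=3)
            except indicoio.IndicoError as e:
                print(e)
        tweet["image_tags"] = photo_info
    return tweets

tweets = add_indico_data(extracted_data)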
Now, our tweets object contains all the information about the tweets we will need, including the additional analysis by indico! We can perform any kind of analysis on this data. A simple and impactful one is a frequency analysis of the tags we found. Additional tip: if you want to analyze data over time or play with the data in a sandbox-type environment, we recommend you cache results from Twitter using cPickle (or a similar library) to avoid being rate-limited by the API.
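The caching tip can be sketched roughly as follows. This uses the standard pickle module (cPickle is its faster Python 2 counterpart); the load_or_fetch helper and the cache file name in the usage note are our own assumptions.

```python
import os
import pickle


def load_or_fetch(cache_path, fetch):
    """Return cached data if cache_path exists; otherwise call fetch() and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    data = fetch()
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

A call such as load_or_fetch("cny_tweets.pkl", fetch_and_extract), where fetch_and_extract is a hypothetical function wrapping the search and extraction steps, would hit Twitter only on the first run. Delete the file whenever you want fresh data. Note that the search iterable itself should be turned into a plain dict or list before pickling.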
Analysis: Frequency
from collections import defaultdict

keywords = defaultdict(int)
imagetags = defaultdict(int)
for tweet in tweets.values():
    # Each result maps a term to a score; we only count the terms
    for keyword in tweet.get("keywords", {}):
        keywords[keyword] += 1
    for tags in tweet.get("image_tags", {}).values():
        for tag in tags:
            imagetags[tag] += 1
# Sort by descending count
sorted_keywords = sorted(keywords.items(), key=lambda x: -x[1])
sorted_imagetags = sorted(imagetags.items(), key=lambda x: -x[1])
keywords_top_30 = sorted_keywords[:30]
imagetags_top_30 = sorted_imagetags[:30]
Now we have the top thirty keywords and image tags from the Twitter statuses. Taking a quick look at some of them, we see several keywords such as “cny”, “chinese”, and “celebrate(ing)”. These are obvious given the occasion, so let’s ignore them, either by eye or with a blacklist. There are also nonsense keywords like “https” from links that we can ignore as well. Here’s a little bit of code to do that. Disclaimer: it is slightly inefficient but easy on the eyes.
IGNORED_LIST = ["https", "rt", "celebrates", "chinese", "celebrate", "celebrations", "celebrating"]
sorted_keywords = sorted(keywords.items(), key=lambda x: -x[1])
sorted_imagetags = sorted(imagetags.items(), key=lambda x: -x[1])
# Each entry is a (term, count) pair, so test the term against the list
filtered_keywords = [x for x in sorted_keywords if x[0] not in IGNORED_LIST]
filtered_imagetags = [x for x in sorted_imagetags if x[0] not in IGNORED_LIST]
keywords_top_30 = filtered_keywords[:30]
imagetags_top_30 = filtered_imagetags[:30]
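If the inefficiency mentioned in the disclaimer bothers you, one alternative is to count and filter in a single pass with collections.Counter and a set. This is a sketch under our own assumptions: top_terms is a hypothetical helper, and it assumes "keywords" maps terms to scores while "image_tags" maps each photo URL to such a mapping, as in the code above.

```python
from collections import Counter

# Same ignore list as above, stored as a set for constant-time membership tests.
IGNORED = {"https", "rt", "celebrates", "chinese", "celebrate",
           "celebrations", "celebrating"}


def top_terms(tweets, field, n=30, ignored=IGNORED):
    """Count the terms stored under `field` across all tweets, skipping ignored ones."""
    counts = Counter()
    for tweet in tweets.values():
        value = tweet.get(field, {})
        # "keywords" is one {term: score} dict; "image_tags" holds one per photo URL.
        groups = value.values() if field == "image_tags" else [value]
        for group in groups:
            counts.update(term for term in group if term.lower() not in ignored)
    return counts.most_common(n)
```

Calling top_terms(tweets, "keywords") and top_terms(tweets, "image_tags") would give the two top-thirty lists as (term, count) pairs, already sorted by descending count.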
Results
The following is some code to look up the source tweets for certain keywords and image tags.
def lookup_keyword(tweets, keyword):
    # Tweets whose indico keywords include the given keyword
    return [x for x in tweets.values() if keyword in x["keywords"]]

def lookup_imagetag(tweets, imagetag):
    # Tweets with at least one photo tagged with the given image tag
    return [x for x in tweets.values()
            if any(imagetag in tags for tags in x["image_tags"].values())]
Keywords
- “monkey”: Well, it is the year of the monkey so this makes plenty of sense.
- “warriors”: Let’s say we don’t know anything about sports. Here is an example of a tweet it came from:
  RT @warriors: #Warriors Chinese New Year gear is now available at the @warriors_store! Happy shopping » https://t.co/BuzWyDQ2VV https://t.c…
  Looks like the Warriors have Chinese New Year gear and are letting fans know. Chinese New Year themed merchandise is a great way to increase revenue for brands that find that their followers are excited about the year of the monkey.
- “manchester”:
  RT @CNY_MCR: Looking for the perfect city to celebrate #ChineseNewYear look no further! https://t.co/Qn0rB7vyyY #Manchester https://t.co/AE…
  It looks like the celebration in Manchester’s Chinatown is getting a lot of attention. This is a great opportunity to increase tourism revenue.
- “sesame”:
  New Post on my Blog: Chinese Laughing Sesame Balls (笑口棗) #ChineseNewYear Snack Recipe Link:… https://t.co/QUD3YtKljk
  People are super into Chinese sesame recipes! It might be time for eateries to roll out their special Chinese New Year edition foods.
- “hamper”: Huh, people must be all about those hampers, or this business found a large number of users to tweet about their giveaway. A little perplexed, but hey, whatever works.
  RT @vegetarianexpre: To celebrate #ChineseNewYear we’re giving away 5 @wingyipstore hampers & 1 deluxe hamper! For a chance to win, follow …
  To celebrate #ChineseNewYear we’re giving away 5 @wingyipstore hampers & 1 deluxe hamper! For a chance to win, follow & RT.
- There are several, several more, but we’ll leave that as an activity for the reader!
Image Tags
For the most part, the images came with tweets that were captured in the keywords analysis, so these tags should serve to bolster the findings and find popular items that people enjoy sharing. These tags describe the most popular kinds of images that the Image Recognition API picked up on.
book jacket, dust cover, dust jacket, dust wrapper
These tags are quite lengthy, but in general they capture a specific group of items, such as posters, books, and other flat objects. In our case, they captured both Chinese New Year posters and banners. Several campaigns on Twitter include pictures of advertisements as images, and a lot of them will also be captured in this category. There could potentially be a space for new, innovative Chinese New Year banners.
carousel, carrousel, merry-go-round, roundabout, whirligig
The Chinese New Year Festivities are being recognized in this category, which kind of makes sense considering there is a limited pre-determined set of tags for image recognition.
envelope
Red Envelopes! This is clearly quite a popular trend. Perhaps there is a space for long-distance or bulk red envelope giving (electronically or not). Red envelope designs are apparently also something to show off!
jersey, T-shirt, tee shirt
A great way to customize apparel for the Chinese New Year!
plate
Plating the Chinese New Year themed dishes, eat up! If you’re looking for some, The Food Network has some great recipes for you to try at home!
I hope that you’ve enjoyed this tutorial and know that you can use it not just for the Chinese New Year but for anything you want to dig a little further into. Knowledge is power and, in this case, an ultimate competitive advantage!
Feel free to drop me a line at chris@indico.io if you have any questions, need any help, or want to talk use cases for your particular project. We’re all here to help.