One of our core values as a company is giving back to and supporting our community. We regularly sponsor hackathons (more than 10 so far this year!) to encourage students to get started in the exciting world of machine learning. Recently, we sponsored HackPrinceton, where we found Waseem Khan's ContextCam project particularly interesting, so we reached out to learn more about it.
What is ContextCam, and what inspired you to build it?
My project is, in essence, a camera for the blind: it uses image recognition APIs to deliver an audio description that helps users understand what's in front of them.
My undergraduate research is in computer vision, so I wanted to take what I've learned in that niche and apply it to a project that could help people with disabilities and increase accessibility. I built the project entirely with materials from the hackathon, including a Raspberry Pi camera and the cardboard outer shell.
How did you build ContextCam?
I used indico’s Facial Emotion Recognition API, IBM’s image recognition API, the open-source Natural Language Toolkit (NLTK) for grammar analysis, PyDictionary for definitions, and pico2wave for text-to-speech. I wrote my own grammatical analysis to turn the raw API results into a grammatically correct sentence. That way, a user could take a picture of an apple and hear adjectives, a definition, and other possible categories, or take a picture of someone’s face and hear their top three most likely emotions to judge mood.
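The pipeline Waseem describes might look roughly like the sketch below. The API calls are stubbed out with canned results (a real version would upload the photo to indico's and IBM's services), and the function names, labels, and sentence template are illustrative assumptions, not his actual code:

```python
import subprocess  # used to invoke pico2wave, a command-line TTS tool


def recognize_image(image_path):
    """Stub for an image-recognition API call (e.g. IBM's service).
    A real implementation would upload the image and parse the JSON response."""
    return {"labels": ["apple", "fruit", "food"]}


def recognize_emotions(image_path):
    """Stub for a facial-emotion API call (e.g. indico's FER endpoint)."""
    return {"Happy": 0.71, "Neutral": 0.18, "Surprise": 0.06, "Sad": 0.05}


def build_sentence(labels):
    """Assemble a grammatically correct description from raw labels,
    e.g. choosing 'a' vs. 'an' from the top label's first letter."""
    top, *rest = labels
    article = "an" if top[0] in "aeiou" else "a"
    if rest:
        return f"This looks like {article} {top}, possibly related to {' and '.join(rest)}."
    return f"This looks like {article} {top}."


def top_emotions(scores, n=3):
    """Return the n most likely emotions as a spoken-style sentence."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    return "The face appears " + ", then ".join(e.lower() for e in ranked) + "."


def speak(text, wav_path="/tmp/contextcam.wav"):
    """Hand the sentence to pico2wave for text-to-speech, then play it."""
    subprocess.run(["pico2wave", "-w", wav_path, text], check=True)
    subprocess.run(["aplay", wav_path], check=True)


sentence = build_sentence(recognize_image("photo.jpg")["labels"])
mood = top_emotions(recognize_emotions("photo.jpg"))
```

On the device, `speak(sentence)` would then read the description aloud through the Pi's audio output.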
What do you plan to do with ContextCam in the future?
ContextCam was originally supposed to run entirely locally, and I had planned to train my own ConvNet, but I found the APIs to be more accurate, so I decided to make it an Internet of Things device instead. I built the outer frame from cardboard, a plastic cup, a lanyard, a free power bank, and hot glue. Since I had to return the parts to the hackathon, I no longer have the device, but I plan to do future work in computer vision, hopefully for good.