5 Q’s with Sentiment Analysis Expert Oleg Rogynskyy
The Center for Data Innovation spoke with Oleg Rogynskyy, the founder and CEO of Semantria, a sentiment and text analytics startup based in Montreal. Sentiment analysis, which applies machine learning, natural language processing and computational linguistics to extract opinions and attitudes from text, has been applied to political campaign targeting, automated comment moderation and advertising, among other areas.
Travis Korte: First, what is the concept behind Semantria? How did the business come about?
Oleg Rogynskyy: I’ve been in text mining and sentiment analysis for almost eight years now. I worked at nStein and at Lexalytics, which is one of the older text mining and sentiment analysis companies out there. They had the first commercially available sentiment analysis software, about ten years ago.
Semantria was founded in 2011, and the concept was pretty simple. I realized there were many more use cases for this technology, and the $100,000 price tag for enterprise-level service and software is a serious barrier for smaller companies. The idea behind Semantria was to give a nontechnical user access to full-fledged sentiment and text analysis and classification in under three minutes and for less than $1,000. We achieved this goal with a combination of tools and technologies that we built on top of sentiment and text mining engines licensed from Lexalytics.
TK: What goes into making this technology easy to use?
OR: The first step to getting it into the hands of the average user was to build a very easy-to-use web API. So we built the Semantria API in the Amazon cloud with as standard an interface as possible. We literally see people at hackathons integrate with Semantria in under 20 minutes. The API also enables low-cost usage of the technology. The pricing plans start at $999, so you don’t need a corporate budget to start using it. We have hundreds of startups and even individuals using Semantria today, just because the technology is so affordable.
The second step was to get this technology into the hands of a nontechnical user, maybe someone who doesn’t even know what API stands for. For that purpose we looked at all the software that a data user would have on their computer. We looked at SPSS, we looked at building a web interface, and we realized there is one piece of software everyone has: Microsoft Excel. We chose to integrate with that, the most common data management platform out there, and that’s how the Semantria Excel plugin came to life. The plugin lets you open an Excel file or a CSV file with tweets, survey responses, emails, basically any textual information, and analyze it in minutes.
TK: Who is using the software and what are they using it for? What are some interesting applications you are seeing?
OR: Semantria has a plethora of customers right now, ranging from Fortune 50 companies to individual users, analysts, marketing directors, etc. The use cases fall into three categories. One is social media monitoring: you get a lot of tweets and Facebook posts, and you want to know what’s being said about a person or a brand. The second one, which is the fastest growing for us, is the customer experience management space. There are a bunch of vendors out there who come into, let’s say, McDonald’s, take over all of the McDonald’s survey systems, survey all the customers going to all the restaurants, collect this massive amount of feedback, plug it into one common platform and then use Semantria to produce intelligence and reports for their customers. The third one is market research; this includes focus groups, a combination of social media and internal data, maybe some interviews, etc., that market research companies analyze.
However, all these use cases are pretty old; a lot of people were doing the same thing before Semantria. What makes Semantria interesting is that, thanks to the much lower price point and the fact that you don’t need to be technical, we democratized the technology and spurred many more use cases that weren’t possible before. A product manager at what is probably the largest scented candle manufacturer out there came to us and got our Excel plugin, and the use case is very interesting. They started analyzing all the mentions of scents on social media, like when you say on Twitter, “I just smelled cinnamon and it made me think of Christmas.” Those kinds of tweets, Facebook posts, blog posts, etc., are what this company is looking for. When they find them, they run trend analysis with Semantria and figure out which scents create which associations in people’s minds. They also put these on a timeline: a lot of people talk about the smell of forests in the springtime, so this company releases a forest-scented candle right before spring and targets a segment of the market they had no idea about. As it turns out, they found 30 or 40 new smells and scents to bring to market, and that strategy has been very successful for them.
Another very interesting one is a large car manufacturer, probably the third or fourth largest on the planet. They use social media and Semantria in a clever way. First they scan social media for all mentions of their brand and other brands, paying special attention to purchase intent. They use Semantria to detect something along the lines of “I’m looking to buy a new car, should I buy a Mercedes or a BMW?” Once they find these mentions, they use location data to find out where the tweet or Facebook post came from, feed this data into their CRM (customer relationship management) system, deliver these leads to local dealers, and follow up to see whether the dealers got in touch with the person, offered deals, etc. That way they reach people who are in the market for a car before those people even get to a competing dealership.
Another strategy they use is Foursquare-based, and this one is extremely interesting from an operations standpoint. They track every Foursquare check-in at every competing dealership in the United States. It makes sense: when someone checks in at a dealership, they either have a problem with their existing car or they’re looking to buy a new one, and either way they are an ideal candidate for other dealers to reach out to with discounts to lure them in.
The third use case I wanted to mention amazed us. It turns out there are a lot of companies, both in the fantasy sports world and in real sports, that are looking to discover new sports talent. It also turns out that athletes who go on to succeed are talked about in a specific way: there are specific keywords and ways of talking about athletes who usually succeed versus those who don’t. Two clients are using the Semantria API and plugin to track news mentions of up-and-coming athletes, and then they build some kind of ranking system to determine whether a given athlete is being talked about in a way that signals success. I don’t know exactly how they do it, but they make money using our technology to figure out which athletes are more likely to succeed.
TK: You use something called a “semantic thesaurus” behind the scenes. Can you explain that a little bit?
OR: Sure. One of the biggest challenges in text mining and sentiment analysis is understanding context. A huge problem for Microsoft, when they scan social media to find out what’s going on, is that they have to make sure they don’t track “dirty windows” when they’re looking for “Microsoft Windows.” The computer just doesn’t know which windows you’re talking about. The same goes for “office.” Is someone talking about their office in physical space, or about Microsoft Office the software product? People just say “office.” So the only way to tackle this problem is to understand the context of the message. That’s what we try to solve with our Wikipedia-based ontology. We call it a concept matrix; most of it came from our parent company, Lexalytics. Basically, it is a huge thesaurus of everything on Wikipedia. We can measure the semantic distance between two concepts on Wikipedia and determine whether they’re closely related. For example, we know out of the box that a cat is closely related to a lion because they’re both felines. A cat is loosely related to a hippo because they’re both mammals. And a cat is not related to an airplane at all. This kind of distance measurement allows us to tell the difference between “Microsoft Windows” and “dirty windows.” It lets us understand that while the word “suck” is usually negative, when you talk about vacuum cleaners “suck” is actually positive. The technology allows us to detect context and make decisions based on it.
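To make the idea of semantic distance concrete, here is a minimal Python sketch. It is not Semantria’s or Lexalytics’ method; their concept matrix is derived from Wikipedia and its internals are not described in this interview. Instead, it assumes a handful of made-up concept vectors (the `CONCEPTS` table and the `cosine` and `disambiguate` functions are hypothetical) and uses cosine similarity as a stand-in distance measure, picking whichever sense of an ambiguous word sits closest to the words around it.

```python
import math

# Hypothetical toy "concept vectors": each concept is described by weights
# over a few feature dimensions. A real concept matrix would be derived from
# Wikipedia; these numbers are invented purely for illustration.
CONCEPTS = {
    "Microsoft Windows": {"software": 1.0, "computer": 0.9, "company": 0.4},
    "window (building)": {"house": 1.0, "glass": 0.9, "cleaning": 0.5},
    "laptop": {"computer": 1.0, "software": 0.5},
    "update": {"software": 0.9, "computer": 0.6},
    "dirty": {"cleaning": 1.0, "house": 0.3},
    "glass": {"glass": 1.0, "house": 0.4},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(senses, context_words):
    """Return the sense whose vector is, on average, most similar to the
    known concepts among the context words (first sense wins on a tie)."""
    known = [CONCEPTS[w] for w in context_words if w in CONCEPTS]

    def score(sense):
        if not known:
            return 0.0
        return sum(cosine(CONCEPTS[sense], c) for c in known) / len(known)

    return max(senses, key=score)


senses = ["Microsoft Windows", "window (building)"]
print(disambiguate(senses, ["dirty", "glass"]))    # window (building)
print(disambiguate(senses, ["laptop", "update"]))  # Microsoft Windows
```

Averaging similarity over the context concepts is just one simple design choice; a production system would weigh many more signals and use a far larger concept inventory.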
TK: What’s your longer-term vision for sentiment analysis? What will it allow us to learn five or ten years down the line?
OR: Right now it’s not a precise technology. Contextuality is a huge problem that everybody is trying to solve, and sentiment itself is hard to measure; it only tells you whether something is positive or negative. The next step we see the industry moving into is emotion analysis. Instead of one axis running from negative to positive, you’re going to have multiple axes: how angry, how kind or how loving a message is. It will give you a much deeper measure of the feelings being expressed in the content. There are a lot of startups trying to get into emotion analysis, but the problem we see is that very few companies can take advantage of even basic sentiment yet. Going deeper than that right now is a waste of their time and yours, because people are not ready. It’s too far out, five or maybe even seven years from becoming mainstream. The general vision is to understand what people are emotional about and what kinds of emotions you’re advertising for your product.
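The difference between a single negative-to-positive axis and several emotion axes can be illustrated with a toy lexicon-based scorer. This is a minimal sketch under loudly stated assumptions: the word lists, weights, axis names and the `emotion_profile` and `polarity` functions are all hypothetical, and real emotion analysis would rely on far richer models than word lookup.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to emotion axes (not a real resource).
EMOTION_LEXICON = {
    "furious": {"anger": 1.0},
    "annoyed": {"anger": 0.5},
    "love": {"love": 1.0},
    "adore": {"love": 0.9},
    "thanks": {"kindness": 0.7},
    "helpful": {"kindness": 0.8},
}

# Hypothetical sign of each axis, used to collapse a profile to one score.
AXIS_POLARITY = {"anger": -1.0, "love": 1.0, "kindness": 1.0}


def emotion_profile(text):
    """Sum lexicon weights per emotion axis over whitespace-split tokens."""
    scores = Counter()
    for token in text.lower().split():
        token = token.strip(".,!?")
        for axis, weight in EMOTION_LEXICON.get(token, {}).items():
            scores[axis] += weight
    return dict(scores)


def polarity(profile):
    """Collapse a multi-axis profile onto a single negative/positive axis."""
    return sum(AXIS_POLARITY[axis] * w for axis, w in profile.items())


profile = emotion_profile("I love the new phone, but I'm furious about the battery!")
print(profile)            # {'love': 1.0, 'anger': 1.0}
print(polarity(profile))  # 0.0
```

Here the single polarity score of 0.0 makes the message look neutral, while the multi-axis profile shows it carries both strong love and strong anger, which is the kind of extra detail emotion analysis aims to surface.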
There are thousands of use cases, so we invite everyone to download our Excel plugin, which is free to try, and come up with great ideas. It’s a clean canvas for everyone to start working with, and we keep being amazed every day at how creative people are. The more people try it, the more democratized this technology will be.