The Center for Data Innovation spoke with Bharat Krish, CEO and co-founder of RefineAI, a machine learning startup based in New York that uses AI to help video creators test their products by measuring real-time audience engagement. Krish discussed the various ways in which businesses can use emotion recognition technology and how RefineAI addresses potential bias in its algorithms.
Eline Chivot: In 2017, you created a face recognition and facial expression recognition algorithm. Why and how did you build it?
Bharat Krish: We built our face and emotion recognition deep learning models because the market at that time did not have a good, cost-effective solution for the use case we were tackling. Our goal was to build a solution for content creators to identify how engaging their content is and what their audience felt about it. When we evaluated some of the solutions on the market, we realized they fell short of our expectations. They were also not built for the real world and were not cost-effective. For example, they had a hard time recognizing my face and expressions because I have a darker skin tone. That is when we realized we needed to build better algorithms and models to tackle this problem.
We used a convolutional neural network (CNN, or ConvNet) and a pre-trained model as the basis for our recognition model. We modified the last two layers of the neural network and trained it with a large dataset that includes images of faces across ethnicities, ages, and genders. We built the dataset from open-source and university resources along with our own labeling. We also trained our models to recognize real-world scenarios such as low light, beards, reflections on glasses, and face tilts. All these efforts helped increase the accuracy of our models in the real world.
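The fine-tuning approach Krish describes, keeping a pre-trained backbone frozen and retraining only the final layers on new data, can be sketched roughly as follows. This is a minimal illustration, not RefineAI's actual model: the tiny backbone here is a toy stand-in for a large pre-trained CNN, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained CNN backbone (the real one would be a large
# network such as a ResNet with pre-trained weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the pre-trained layers so their weights are not updated.
for p in backbone.parameters():
    p.requires_grad = False

# Replace the final layers with a new head, which is then trained on the
# diverse face dataset. Seven outputs, one per tracked emotion class.
NUM_EMOTIONS = 7
head = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, NUM_EMOTIONS),
)

model = nn.Sequential(backbone, head)

# Only the new head's parameters remain trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())

# A forward pass on a dummy image yields one score per emotion class.
out = model(torch.zeros(1, 3, 32, 32))
```

Freezing the backbone is what makes this approach cost-effective: only a small fraction of the parameters need gradient updates, so a modest labeled dataset suffices.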
The algorithmic analysis is only one piece of the solution. As we are tracking the emotions of faces in videos, an important component of the solution is also the workflow to process the videos. Videos can come in different formats and resolutions, and each must be analyzed frame by frame. The model's output for each frame is then matched with the corresponding frame in the video and aggregated for deeper analytics.
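The workflow Krish outlines can be sketched as below. This is a hypothetical illustration of the pipeline shape only: the decoder and per-frame scorer are stubs standing in for a real video decoder and the emotion model, and every result is keyed by frame index and timestamp so scores can later be matched back to the right moment in the video.

```python
def decode_frames(path, fps=25):
    """Stub for a real decoder (e.g. ffmpeg); yields (index, frame) pairs."""
    for i in range(fps * 2):   # pretend the clip is 2 seconds long
        yield i, object()      # a real frame would be a pixel array

def score_frame(frame):
    """Stub for the per-frame emotion model; returns class probabilities."""
    return {"happiness": 0.6, "surprise": 0.3, "neutral": 0.1}

def analyze(path, fps=25):
    """Score every frame and key each result for matching back to the video."""
    results = []
    for index, frame in decode_frames(path, fps):
        results.append({
            "frame": index,
            "timestamp_s": index / fps,  # aligns scores with the video timeline
            "scores": score_frame(frame),
        })
    return results

timeline = analyze("ad.mp4")
```

Keeping the frame index and timestamp alongside each score is what allows per-frame model output, produced at varying resolutions and formats, to be aggregated into a single timeline for deeper analytics.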
Chivot: How do you measure emotional appeal and engagement levels?
Krish: We track seven basic emotions: fear, anger, disgust, sadness, surprise, happiness, and neutral. These are basic emotions shared across all humans. Human emotions are in a constant state of flux, and at any given moment they are a blend of multiple emotion types. For example, you could be surprised and happy at the same time (on receiving surprising good news) or surprised and afraid at the same time (while watching a suspense thriller).
Each emotion indicator in the model is a probabilistic measure of that emotion appearing in a frame. The overall emotion profile of a video is the average of those probabilistic measures across all frames, for each emotion class.
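The aggregation step can be written as a short sketch (names are illustrative, not RefineAI's API): each frame carries a probability per emotion class, and the video-level profile is the per-class mean of those probabilities.

```python
EMOTIONS = ["fear", "anger", "disgust", "sadness",
            "surprise", "happiness", "neutral"]

def aggregate(frame_scores):
    """Average each emotion's probability over all analyzed frames."""
    n = len(frame_scores)
    return {e: sum(f.get(e, 0.0) for f in frame_scores) / n for e in EMOTIONS}

# Two frames: surprised-and-happy, then mostly happy.
frames = [
    {"surprise": 0.5, "happiness": 0.4, "neutral": 0.1},
    {"happiness": 0.8, "neutral": 0.2},
]
profile = aggregate(frames)  # e.g. happiness averages to roughly 0.6
```

Because each frame's scores are probabilities over the same seven classes, the averaged profile is itself a probability distribution, which makes videos of different lengths directly comparable.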
Engagement levels are important for content creators and educators to understand whether their content interests their audience. For example, brands would like to know how engaged their audience will be while watching an ad. This could be the difference between clicking the “Skip Ad” button or not. Companies working on this typically analyze eye gaze, which poses many problems, one of them being calibration to the video screen's size and position. We instead analyze movements of the head to determine the level of engagement.
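One plausible way to turn head movement into an engagement signal, our illustrative guess rather than RefineAI's actual formula, is to estimate head orientation per frame and count the fraction of frames in which the head faces roughly toward the screen:

```python
def engagement_score(head_poses, yaw_limit=20.0, pitch_limit=15.0):
    """Fraction of frames with the head oriented toward the screen.

    head_poses: list of (yaw_degrees, pitch_degrees) per frame, as produced
    by a head-pose estimator (a hypothetical upstream component). The angle
    thresholds are arbitrary illustrative values.
    """
    if not head_poses:
        return 0.0
    engaged = sum(
        1 for yaw, pitch in head_poses
        if abs(yaw) <= yaw_limit and abs(pitch) <= pitch_limit
    )
    return engaged / len(head_poses)

# Viewer faces the screen in 3 of 4 sampled frames.
poses = [(2.0, 1.0), (-5.0, 3.0), (45.0, 0.0), (10.0, -8.0)]
score = engagement_score(poses)  # 0.75
```

Unlike gaze tracking, a head-pose signal like this needs no calibration to the screen's size or position, which matches the advantage Krish describes.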
Since we are analyzing the engagement and emotions evoked in the audience or the tester, this is a universal measure even if tastes change. Having said that, we need to train our models and update the algorithms we use to keep up with demographic changes and the global community of users of our product. We also believe that including qualitative analysis alongside quantitative measurement is necessary to get 360-degree feedback from content testers.
Chivot: Why would this kind of service be valuable to a company?
Krish: We have worked on several use cases that show how businesses can leverage AI-powered emotion recognition.
In the field of market research, we work with brands like Nivea, for which understanding the audience's reaction to ad content is key to creating emotionally engaging ads. Unfortunately, traditional testing is expensive (upwards of $10,000 per test), which puts market testing out of reach for most brands; the bigger brands only test for special occasions such as the Super Bowl. By using machine learning-based emotion recognition, we can get audience feedback for as little as $100 per test, with audience panels around the world providing feedback from their own natural environments in real time. Brands can therefore test all their content with a global audience throughout the year.
We also work on sales pitches, as leveraging affective computing can offer self-awareness. We work with a large company to incorporate our emotion recognition into their apps so their salespeople can practice pitching and analyze the outcome with data. This can help them better prepare for actual sales pitches. Evoking the right emotion is important for selling, and salespeople can gauge emotions using machine learning.
Our AI can be used for recruitment videos as well. Video interviews are becoming a very common way of screening job candidates, and there are many software platforms and applicant tracking systems using video-based assessment. But reviewing video responses from many candidates can be overwhelming for recruiters. For example, if a job attracts 100 applicants and each video interview lasts 30 minutes, that is 3,000 minutes of video, which would take one person roughly 12 working days to review. With emotion recognition AI, recruiters are able to shorten this screening cycle dramatically.
Sports is another area where our service is valuable. Analyzing the sentiments of athletes and the audience can support automated highlights, sponsorship insights, and data-driven branding strategies.
Finally, in education, we have seen increased interest from universities to use emotion recognition and engagement detection in their research, for example to understand the effectiveness of the content on their students.
Chivot: How do you use this technology at RefineAI to promote diversity and inclusion? Why is this something you value so highly?
Krish: I have been promoting equality and inclusion throughout my professional career. I built diverse teams while in my corporate roles, I ran corporate diversity groups, I supported the Miami-Dade Public Schools STEM advisory board as their chairman, and I built a startup with a diverse team. People like me who are fighting against race, gender, and age inequality are not satisfied, because the progress is far too slow to reflect true diversity.
When organizations start using machine learning for decision making, the bias is further magnified if a biased dataset is used. For example, as I mentioned earlier, generic face recognition algorithms have a hard time recognizing my face as I am darker-skinned. This motivated me to build AI that is inclusive by training it with a diverse dataset. I also believe an inclusive AI is a better product and can make a business better. AI can also be used at a mass scale to detect bias and mitigate risks which humans might miss.
Chivot: Many applications of facial recognition are considered contentious, such as the use of facial recognition in surveillance or criminal justice, and have prompted calls to ban the technology entirely. How can you ensure that this technology is beneficial, rather than harmful?
Krish: I get asked this question a lot. My interest in building AI-based facial recognition was partly due to the failure of generic algorithms that misidentify people with darker skin tones, like mine. I attribute such issues to the lack of diversity among the people building the algorithms and in the datasets used to train them. It is much more difficult and cost-prohibitive to gather diverse and inclusive datasets. Just search for common terms like “Engineer” or “Nurse” to see the related images that come up online. When leveraged for training algorithms, these readily available datasets can further propagate stereotypes.
We built our AI models with a diverse dataset incorporating data from various ethnic backgrounds, geography, and demographics. This will always be a work in progress.
We consciously do not work with the surveillance and criminal justice industries, as I do not believe in the widespread use of face recognition there. Machine learning algorithms have to be trained specifically for their use cases for maximum accuracy (i.e., minimum false positives), and the dataset should include the representative demographics that the product will be analyzing. This takes time and effort to build and perfect. The use of generic algorithms and datasets can lead to wrong decisions that disproportionately affect minority population segments. Increased reliance on error-prone AI algorithms could also hinder human judgment.
The good news is there is a lot of discussion around this topic. There has been a lot of effort in making face recognition inclusive by recognizing bias and fixing it with diverse datasets. The research arena is well aware of the issues and has made a lot of progress. Amazon, Microsoft, and IBM have made a lot of improvements to their face recognition algorithms, and datasets have become more diverse.