Image prompt for ChatGPT: "An abstract image showing the concept of strength in diversity with natural elements"
Hey Everyone,
The AI startup I’ve been thinking about the most this week is Hume AI.
Meet Hume’s Empathic Voice Interface (EVI), billed as the first conversational AI with emotional intelligence.
With a single API call, developers can interpret emotional expressions and generate empathic responses.
Try it here: demo.hume.ai
Hume's EVI API can do more than just chat: it can also take actions!
EVI will be publicly available in April. If you’re a developer interested in earlier access to the API fill out this form: https://share.hsforms.com/15hCR14R4S-e-dlMwN42tkwcjsur
Series B
They have just raised $50 million in a Series B, and they are based in New York.
The round was led by EQT Ventures, with participation from Union Square Ventures, Nat Friedman & Daniel Gross, Metaplanet, Northwell Holdings, Comcast Ventures, and LG Technology Ventures.
Those are some slightly unusual backers.
I am very bearish on facial recognition and emotion prediction in general, but Hume AI positions itself differently: it is a startup and research lab building artificial intelligence optimized for human well-being.
Hume operates at the intersection of artificial intelligence, human behavior, and health and well-being. It has created an advanced API toolkit for measuring human emotional expression that is already used in industries spanning robotics, customer service, healthcare, health and wellness, user research, and more.
The measuring of emotions by AI is not without controversy, but Hume AI brands itself as “the feelings lab”.
In the PR they say that Hume AI was founded by Dr. Alan Cowen, a former Google researcher and scientist best known for pioneering semantic space theory – a computational approach to understanding emotional experience and expression which has revealed nuances of the voice, face, and gesture that are now understood to be central to human communication globally.
His tagline is typically “teaching AI to make people happy”.
A Voice AI with Emotional Intelligence?
Left to right: Michael Opara and Lauren Kim, founding engineers; Alan Cowen, founder, CEO, and chief scientist; and Janet Ho, COO, are building "emotionally intelligent" AI models.
Capabilities of EVI in March 2024
EVI understands the user’s tone of voice, which adds meaning to every word, and uses it to guide its own language and speech. Developers can use this API as a voice interface for any application.
✨EVI has a number of unique empathic capabilities ✨
1. Responds with human-like tones of voice based on your expressions.
2. Reacts to your expressions with language that addresses your needs and maximizes satisfaction.
3. EVI knows when to speak, because it uses your tone of voice for state-of-the-art end-of-turn detection.
4. Stops when interrupted, but can always pick up where it left off.
5. Learns to make you happy by applying your reactions to self-improve over time.
EVI also includes fast, reliable transcription and text-to-speech, and it can hook into any LLM.
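I haven’t seen the final API docs, but to make the “single API call” claim concrete, here is a rough sketch of what a streaming voice session could look like. Everything here (the endpoint, the auth scheme, the message fields) is my guess for illustration, not Hume’s published interface.

```python
# A minimal, hypothetical sketch of a streaming voice session with an
# EVI-style API. The endpoint, auth scheme, and message fields below are
# my assumptions, not Hume's published API.
import asyncio
import base64
import json

import websockets  # pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed endpoint


async def chat(api_key: str, audio_chunks: list[bytes]) -> None:
    # Assumed: the API key is passed as a query parameter.
    async with websockets.connect(f"{EVI_URL}?api_key={api_key}") as ws:
        # Stream microphone audio up as base64-encoded chunks.
        for chunk in audio_chunks:
            await ws.send(json.dumps({
                "type": "audio_input",
                "data": base64.b64encode(chunk).decode(),
            }))
        # Read back transcripts plus whatever emotion scores EVI inferred.
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "assistant_message":
                print(msg.get("text"), msg.get("emotions"))


# asyncio.run(chat("YOUR_KEY", chunks_from_microphone))
```

If the real product delivers on the marketing, something on this order of complexity is presumably all a developer would need to write.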
Over the years, the team at Hume AI has grown as well.
Voice AI’s Future?
The company believes AI voice products have the ability to revolutionize our interaction with technology; however, the stilted, mechanical nature of their responses is a barrier to truly immersive conversational experiences. The goal with Hume-EVI is to provide the basis for engaging voice-first experiences that emulate the natural speech patterns of human conversation.
That sounds somewhat like what ElevenLabs has been doing.
Their demo is voice driven, so you’d better have a microphone connected.
This is the sort of startup that someone like Amazon might want to acquire one day.
In their own words: “Hume is a research lab and technology company. Our mission is to ensure that artificial intelligence is built to serve human goals and emotional well-being.”
I just wonder whether the prediction of emotion actually serves emotional well-being. The very capability seems like an API begging to be abused.
Their empathic voice interface, EVI, is an emotionally intelligent conversational AI, the first to be trained on data from millions of human interactions to understand when users are finished speaking, predict their preferences, and generate vocal responses optimized for user satisfaction over time. These capabilities will be available to developers with just a few lines of code and can be built into any application.
That sounds like the foundation of a social commerce platform AI, valuable to e-commerce leaders like Amazon, Pinduoduo, Walmart, Alibaba, and others.
The company’s eLLM enables EVI to adjust the words it uses and its tone of voice based on the context and the user’s emotional expressions. Does that suggest it’s multimodal and requires video as well?
Hume AI seeks to understand the more nuanced and often multidimensional emotions of its human users. On its website, the startup lists 53 different emotions it is capable of detecting from a user (a hedged sketch of consuming such scores follows the list):
Admiration
Adoration
Aesthetic Appreciation
Amusement
Anger
Annoyance
Anxiety
Awe
Awkwardness
Boredom
Calmness
Concentration
Confusion
Contemplation
Contempt
Contentment
Craving
Desire
Determination
Disappointment
Disapproval
Disgust
Distress
Doubt
Ecstasy
Embarrassment
Empathic Pain
Enthusiasm
Entrancement
Envy
Excitement
Fear
Gratitude
Guilt
Horror
Interest
Joy
Love
Nostalgia
Pain
Pride
Realization
Relief
Romance
Sadness
Sarcasm
Satisfaction
Shame
Surprise (negative)
Surprise (positive)
Sympathy
Tiredness
Triumph
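To make that concrete, here is a small hypothetical example of what a per-utterance score payload might look like and how a client could rank it. The payload shape and the numbers are invented for illustration; only the emotion labels come from Hume’s list above.

```python
# Hypothetical shape of a per-utterance emotion-score payload, using a few
# of the 53 labels above, and how a client might surface the top scores.
scores = {
    "Amusement": 0.61,
    "Interest": 0.44,
    "Confusion": 0.12,
    "Boredom": 0.05,
}

top_three = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
for emotion, score in top_three:
    print(f"{emotion}: {score:.2f}")
# Amusement: 0.61
# Interest: 0.44
# Confusion: 0.12
```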
I’m not clear why the product is named after the philosopher David Hume, who had a sharp wit and was a skeptic, certainly not the empathetic sort.
"Hume’s empathic models are the crucial missing ingredient we’ve been looking for in the AI space," said Ted Persson, Partner at EQT Ventures who led the investment. "We believe that Hume is building the foundational technology needed to create AI that truly understands our wants and needs, and are particularly excited by Hume’s plan to deploy it as a universal interface."
"What sets Hume AI apart is the scientific rigor and unprecedented data quality underpinning their technologies," said Andy Weissman, managing partner at Union Square Ventures.
They do have a lot of publications but I’m still unclear on exactly what they do, even after reading their PR, marketing material and browsing their papers lightly.
What are the Applications of EVI?
This feels like something that could automate screening job applicants and HR duties. Who needs an empathetic LLM? Perhaps HR, sales and customer service for the most part.
The EVI was built using a kind of multimodal generative AI that combines standard large language model capabilities with expression measuring techniques. The company calls this novel architecture an “empathic large language model” or eLLM, and says this is what allows EVI to adjust the words it uses and the tone of its voice, based on the context and emotional responses of human speakers.
It’s clever, with the ability to accurately detect when the speaker is ending their conversational turn, so it can start responding almost immediately, with latency of less than 700 milliseconds. It’s sensitive too, and it will stop speaking if the user interrupts it. It all adds up to a much more fluid, humanlike conversational interaction.
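To illustrate what that turn-taking implies on the client side, here is a toy sketch: play the assistant’s audio, cut playback the moment the user barges in, and resume the reply afterward. The event names are assumptions for illustration, not Hume’s actual protocol.

```python
# Toy sketch of client-side barge-in handling. The event names
# ("assistant_audio", "user_interruption", "resume") are assumed.
import json


class Playback:
    """Stand-in for a real audio output device."""

    def __init__(self) -> None:
        self.playing = False
        self.paused_reply: str | None = None  # base64 audio held for later

    def handle_event(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg["type"] == "assistant_audio":
            self.playing = True                        # EVI starts speaking
            self.paused_reply = None
        elif msg["type"] == "user_interruption":
            self.playing = False                       # user barged in: go quiet
            self.paused_reply = msg.get("remaining")   # keep the rest of the reply
        elif msg["type"] == "resume" and self.paused_reply is not None:
            self.playing = True                        # pick up where it left off
```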
However, I am not clear on their case studies.
As a Series B startup, they are still relatively early in applying their technology. They have around 35 employees so far, but with the new funding they should be able to expand their team and find actual applications.
They envision a future where AI systems use scientific approaches to fulfill human needs. Emotional intelligence is the missing ingredient needed to build AI systems that proactively find ways to improve your quality of life. Ideas that come to mind for me:
Senior care
Early childhood education tutors
Some nursing interactions
Some business development scenarios
Some customer success circumstances
For robotics and robot to human interactions
For smart car intelligence, “you sound sad today, want me to play you your favorite song?”
Where is emotional intelligence needed in society and in day-to-day situations in modern life? There are a lot of potential interactions here.
Their Series A was only a bit under $13 million.
"Hume AI’s toolkit supports an exceptionally wide range of applications, from customer service to improving the accuracy of medical diagnoses and patient care, as Hume AI’s collaborations with Softbank, Lawyer.com, and researchers at Harvard and Mt. Sinai have demonstrated."
The influx of new funding values the startup at $219 million. Nat Friedman was involved, but wait, he’s involved in nearly everything!
The company also announced the launch of “Hume EVI,” a conversational voice API that developers can integrate into existing products or build upon to create apps that detect expressive nuances in audio and text and produce “emotionally attuned” outputs by adjusting the AI’s words and tone.
Hume AI TL;DR
Hume AI's EVI: A leap forward in AI chatbot technology
Hume AI's EVI is a unique voice interface currently undergoing beta testing.
Unlike other AI chatbots, EVI can interpret the emotional tone of human speakers to better comprehend their speech.
It adjusts its responses based on the user's emotional tone, using data from numerous human interactions to create a proper vocal response almost instantly after the user has finished speaking.
I’ve been writing about voice tech and Voice AI for a long time. It’s been a long wait since Her (2013). Will corporate AGI feel like a smart operating system? Possibly, but possibly not so emotionally intelligent.
Besides its empathic conversational capabilities, EVI supports fast and reliable transcription and text-to-speech functionality, meaning it can adapt to a wide range of scenarios. Developers will be able to enhance it even further by integrating it with other LLMs.
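As a rough illustration of that “hook into any LLM” idea, here is a minimal sketch of forwarding a transcript plus the top detected expression to an external model. The function names are placeholders I invented, not anything Hume ships.

```python
# Hedged "bring your own LLM" sketch: pass the transcript and the dominant
# detected expression to whatever model you prefer. `call_llm` is a
# placeholder for your model's API.
def build_prompt(transcript: str, emotions: dict[str, float]) -> str:
    dominant = max(emotions, key=emotions.get)  # e.g. "Amusement"
    return (
        f"The user said: {transcript!r}\n"
        f"Their dominant vocal expression was {dominant}.\n"
        "Reply in a tone appropriate to that expression."
    )


def respond(transcript: str, emotions: dict[str, float], call_llm) -> str:
    return call_llm(build_prompt(transcript, emotions))
```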
If I bought a robot for $12,000 in 2026, you’d think it would need some emotional intelligence capabilities.
The founder and CEO is just 33; I think his team has time to perfect this product. Maybe this is the Stripe of empathic LLMs.
If it feels a bit speculative and nebulous, we have to remember the startup was only founded in 2021.
EVI uses a new form of multimodal generative AI that integrates large language models (LLMs) with expression measures, which Hume refers to as an empathic large language model (eLLM). Time will tell if this API-based product can deliver in real-world settings.
Hume was originally known as the startup teaching AI to read emotions. Thus far, I think they’ve seen the most traction in pilots involving healthcare and patient-facing chatbots. Hume’s AI has been integrated into applications in industries like health and wellness, customer service, and robotics, Cowen said.
Healthcare
Robotics
Beauty & Wellness
Customer Service