Macgence provided digital assistant training data in 40+ languages for a major cloud-based voice service used with virtual assistants.
Challenge
Acquire 13,000+ hours of unbiased data (including children's data) across 40+ languages.
Execution
13,000+ hours of PI normalised data sourced within 8 weeks at 95%+ accuracy.
Impact
Well-trained digital assistant models able to understand multiple languages and speakers of different age groups.
Overview
- Chatbots and digital assistants have become central to today's digital landscape, which is fueled by conversational AI. However, the effectiveness and intelligence of these virtual assistants depend heavily on the technology and data used to train them. Data plays a pivotal role in breathing life into AI systems, enabling automation, streamlining activities, boosting enterprise productivity, and driving customer engagement. Let's explore how data fuels the capabilities of Conversational AI.
Challenges
The lack of quality training data for conversational AI has been a bottleneck in its progress and adoption.
- Acquire hours of conversational audio data in different languages and across age groups, covering a range of topics and a variety of media domains, at 8 kHz and 16 kHz sampling rates.
- Ensure diversity in the datasets (domains, speaker demographics, background, etc.) to train Conversational AI in an unbiased way.
- Acquiring hours of conversational audio data from children is a complicated process because of their age, the need for parental consent and supervision, and their limited availability.
Solution
- 8 kHz Data: Acquired 9,900+ hours of unbiased, unscripted quality audio data (Call Center / General Conversation) on a range of 17 general topics, including Finance, Insurance, Retail, Telecom, Hospitality, Legal, Family, Friends, and Culture.
- 16 kHz Data: Acquired 10,800+ hours of high-quality audio data from a wide variety of media domains (Arts and Culture, Beauty and Lifestyles, Biography, Cars and Motors, etc.) from a set of speakers diverse in accent, gender, age, and demographics.
- Total Data: Acquired 20,600+ hours of high-quality audio data across 40 different languages in multiple dialects from 3,000+ experienced and credentialed linguists across the world, so as to train the Conversational AI agent in an unbiased way.
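The split between 8 kHz and 16 kHz data suggests a simple acceptance check on each delivered clip. The following is only a minimal sketch under stated assumptions: the case study does not say which file format was used, so WAV is assumed here, and the names `wav_sample_rate`, `is_allowed_rate`, and `make_silent_wav` are hypothetical helpers, not part of any Macgence tooling.

```python
import io
import wave

# Sampling rates named in the case study (8 kHz and 16 kHz).
ALLOWED_RATES_HZ = {8000, 16000}


def wav_sample_rate(data: bytes) -> int:
    """Return the sampling rate stored in the header of a WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()


def is_allowed_rate(data: bytes) -> bool:
    """Check whether a WAV clip uses one of the expected sampling rates."""
    return wav_sample_rate(data) in ALLOWED_RATES_HZ


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a short mono, 16-bit silent WAV clip (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    for rate in (8000, 16000, 44100):
        clip = make_silent_wav(rate)
        print(rate, is_allowed_rate(clip))
```

A check like this reads only the file header, so it confirms the declared sampling rate but says nothing about audio quality or whether a clip was upsampled from a lower rate.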
Outcome
- The high-quality audio data empowered the client to train its Conversational AI on a wide variety of topics, from Telecom and Hospitality to Legal, in 40 different languages and dialects to mimic human conversation. The benefits that the client derived from the platform were:
  • It can seamlessly interact with humans in multiple languages.
Applications of Conversational AI
Customer Support and Service
Complete automation of chat support, call support, etc.
Healthcare
NLP applied to Conversational AI models to automate medical transcription and reports.
Financial
Conversational AI can assist customers with banking transactions, account inquiries, and financial advice.
Automotive
Improve the driving experience by using conversational AI to assist with navigation, control car systems, and give information in real time.
The Macgence Way
TAT
Compliant, high-quality data at your disposal that can be customized and delivered quickly
QUALITY
Our datasets go through rigorous two-level quality checks before delivery
COMPLIANCE
Adherence to both HIPAA and GDPR requirements
ACCURACY
Provides ~98% accuracy across different annotation types and model datasets
NO. OF USE CASES SOLVED
Experience across a diverse range of use cases