DingTalk and Tongyi Lab Launch Fun-ASR, an Industry-Specific Large Speech Recognition Model
On August 22, DingTalk and the speech team at Tongyi Lab jointly released Fun-ASR, a new-generation large speech recognition model. Fun-ASR accurately recognizes professional terminology across ten industries, including interior construction and animal husbandry, and supports customized training of enterprise-specific models. Built through the two teams' deep collaboration, it efficiently transcribes a wide range of audio, with strengths in multi-industry terminology understanding, multilingual and accented speech recognition, and contextual semantic reasoning.
Fun-ASR is already integrated into DingTalk features such as subtitles and interpretation in DingTalk Meetings, smart meeting summaries, and the voice assistant. It is designed to serve as a stable, efficient, and easily scalable speech recognition foundation, particularly suited to enterprise scenarios that demand high accuracy and contextual comprehension.
Core Technology Highlights: Three Key Capabilities Ensuring High-Precision Recognition
Fun-ASR was trained on more than one billion hours of audio and co-developed with real-world scenario data that DingTalk provided from multiple industries, including internet, technology, interior construction, animal husbandry, and automotive, which significantly strengthens its recognition of industry-specific terms.
Benchmark tests show an 18% improvement in recognition accuracy in the insurance industry and a 15%-20% increase in sectors like construction and animal husbandry. The model also supports enterprise-defined hotwords, allowing import of more than 1,000 custom vocabulary entries to improve recognition of rare or niche terms.
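The article does not describe how the hotword feature is configured inside DingTalk, so the following is only a minimal sketch of the general idea, using the open-source FunASR toolkit from the same Tongyi speech team, whose AutoModel interface accepts a hotword argument. The model name, audio file, and hotword terms below are placeholders, not details from the announcement.

```python
# Minimal sketch: biasing recognition toward custom terms with the open-source
# FunASR toolkit (the hosted Fun-ASR service in DingTalk exposes hotwords
# through its own enterprise configuration instead).
from funasr import AutoModel

# "paraformer-zh" is a hotword-capable ASR pipeline in the toolkit; the VAD and
# punctuation models segment long audio and restore punctuation.
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")

# Enterprise terms are passed as a space-separated string, following the
# toolkit's examples; these specific terms are placeholders.
hotwords = "Sonocore KUKA Pulse"

result = model.generate(input="meeting_recording.wav", hotword=hotwords)
print(result[0]["text"])
```

In this sketch the hotword list plays the role of the article's "enterprise-defined hotwords": terms that rarely occur in generic training data get an explicit boost at decoding time.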
With proper authorization, Fun-ASR can draw on internal enterprise information in DingTalk, such as contact lists, schedules, and knowledge bases, to guide inference, which effectively reduces large-model hallucinations and delivers more reliable transcription results.
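The interface through which Fun-ASR consumes DingTalk contacts, schedules, and knowledge bases is not documented here, so the sketch below is purely hypothetical: it illustrates the underlying idea by collecting names and glossary terms from stand-in local files and feeding them as bias terms through the same toolkit interface as above. The file names and data sources are assumptions for illustration only.

```python
# Hypothetical illustration of context-biased transcription: gather terms from
# enterprise sources (plain files standing in for a contact list and a
# knowledge-base glossary) and pass them as hotwords at inference time.
from pathlib import Path
from funasr import AutoModel

def load_terms(path: str) -> list[str]:
    """Read one term per line, skipping blank lines; the files are placeholders."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

# A real integration would pull this context from DingTalk with the
# enterprise's authorization, as the article notes.
bias_terms = load_terms("contacts.txt") + load_terms("kb_glossary.txt")

model = AutoModel(model="paraformer-zh")
result = model.generate(
    input="meeting_recording.wav",
    hotword=" ".join(bias_terms),  # space-separated bias terms, per toolkit usage
)
print(result[0]["text"])
```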
Built on an efficient end-to-end architecture, the model can be further fine-tuned on real voice data provided by enterprises, improving recognition accuracy for proprietary content such as brand names, project codes, product names, and personal names.
For example, after enterprise-specific training, the model precisely identifies complex phrases like "Belgian imported Pulse latex" and "proprietary Sonocore foaming process" for KUKA Home, laying a solid foundation for subsequent customer demand analysis.
Future Outlook: Continuously Deepening Industry Adaptation
Li Xiangang, head of the speech team at Tongyi Lab, said: "We look forward to collaborating with DingTalk to drive innovative applications of speech recognition technology in enterprise settings. We will continue expanding the data volume and model scale of Fun-ASR, enhancing the replicability of solutions, and delivering smarter, more efficient experiences for businesses."
Zhu Hong, CTO of DingTalk, noted: "Through just three months of close collaboration, we achieved model deployment and earned recognition from leading customers—an important breakthrough toward industry leadership, and a replicable blueprint for other enterprises seeking customized large models."
Fun-ASR's potential is still being explored. The two parties will focus on upgrading dialect recognition, noise robustness, multilingual support, and deeper enterprise customization, comprehensively improving the precision and practicality of speech transcription and helping more enterprises advance their intelligent transformation.
We are dedicated to serving clients with professional DingTalk solutions. If you'd like to learn more about DingTalk platform applications, feel free to contact our online customer service or email us at
