High-quality data is the foundation for training and applying large AI models, and serves as the "fuel" for enterprises transforming and upgrading with AI. However, many companies face challenges when developing AI applications because large models struggle to understand unstructured data.

Can more enterprise users gain access to effective data tools and achieve AI-Ready data freedom?

Recently, OpenDataLab and DingTalk jointly launched DLU (Document Language Understanding), a document parsing tool for enterprise users based on MinerU, aiming to help businesses overcome the challenges of preparing AI-Ready data, lower the barriers to AI application development, and accelerate the large-scale adoption of AI technologies across industries.

MinerU is an intelligent document parsing engine developed by OpenDataLab at Shanghai AI Laboratory (Shanghai AI Lab). Renowned for its accurate parsing capabilities and broad compatibility, it has gained widespread popularity among users, surpassing 40,000 stars on GitHub.

As a world-class scientific research institution in artificial intelligence, Shanghai AI Lab has deep technical expertise in large models and data intelligence. Its self-developed OpenDataLab platform is a leading data platform for AI models in China, aggregating over 7,700 open-source, high-quality annotated datasets and having provided more than 2 million data services to over 100,000 users. The latest release, MinerU 2.0, achieves significant improvements in both parsing speed and accuracy, matching the performance of mainstream 72B large models despite having only 0.98B parameters.

DingTalk, an enterprise-level smart mobile collaboration platform under Alibaba Group, offers a rich suite of enterprise document products and serves a vast user base. Products such as DingTalk Docs and AI-powered spreadsheets have deeply integrated MinerU’s capabilities, and DingTalk now offers document parsing functionality to ecosystem developers via its open platform. This lays a solid technical and practical foundation for the joint development of DLU.

Built upon MinerU, DLU will soon be open-sourced and features excellent file format compatibility, deep content understanding, and precise structured output. It supports not only mainstream formats like Office documents, PDFs, Markdown, and code files, but also DingTalk's proprietary document, spreadsheet, and AI spreadsheet formats. Moreover, it can extract plain text and accurately parse complex visual elements such as charts, formulas, illustrations, and even chemical molecular structures, efficiently converting them into high-quality corpora suitable for large model training.

DLU will deeply integrate with DingTalk’s collaborative office ecosystem to enable end-to-end AI application workflows

In the future, leveraging DingTalk’s strengths in enterprise service scenarios, DLU will become deeply embedded within the collaborative office ecosystem. Users will be able to complete the entire workflow on a single platform, from document creation and parsing through knowledge base management and data annotation to customized model training, significantly enhancing efficiency in both AI application development and daily operations.

He Conghui, a young scientist at Shanghai AI Laboratory and founder of the OpenDataLab/MinerU open-source project, said: "MinerU already has a broad user base. We aim to further expand its application in enterprise scenarios, fully leverage the value of the OpenDataLab platform, and collaborate with partners to build a 'PyTorch of data tools,' empowering more enterprises to achieve AI-Ready data freedom."

Zhu Hong, CTO of DingTalk, said: "By open-sourcing DLU, we can effectively address the challenge of data preparation for enterprises in the AI era and strengthen the foundation for intelligent transformation. DingTalk is actively building a new AI ecosystem and looks forward to collaborating with more technology partners and industry players to provide robust support for digital and intelligent upgrades across countless sectors."

We are dedicated to serving clients with professional DingTalk solutions. If you'd like to learn more about DingTalk platform applications, feel free to contact our online customer service or reach us by email. With a skilled development and operations team and extensive market experience, we’re ready to deliver expert DingTalk services and solutions tailored to your needs!