The Language Resources and Intelligent Technology Sub-laboratory
Release time: 2025/5/16 9:57:07
Guided by national development strategies, the Language Resources and Intelligent Technology Sub-Lab organizes its work around three core pillars: "Resource Construction, Technological Breakthroughs, and Application Services." It focuses on critical challenges in building resource systems and intelligent analysis tools for Standard Chinese (Putonghua) and low-resource languages, and it drives the integrated innovation of language resource development and Chinese information processing technologies. Its aim is to build a model platform for language intelligence applications that supports the modernization of China’s language and writing systems.
The research team is drawn primarily from the Applied Linguistics Research Department and the Lexicography Research Center, with core members affiliated with the "Dengfeng Program" (Peak Climbing Program) in Corpus Linguistics, a special discipline of the institute. The team consists of 3 senior researchers, 3 associate researchers, and 4 assistant researchers, covering linguistics, ethnolinguistics, computational linguistics, artificial intelligence, and lexicography.
The research directions include: (1) Chinese National Corpus Platform: This platform aims to build a large-scale, structurally balanced, richly annotated, dynamically updated, widely applicable, and openly shared Chinese language resource system. It integrates functions such as example retrieval, frequency statistics, collocation queries, usage comparisons, and lexical profiling, providing robust infrastructure for language teaching and research. (2) Low-Resource Chinese Language Processing Platform: This platform focuses on the informatization of under-resourced languages, with current priorities on constructing resource platforms and processing tools for ancient languages and ethnic minority languages. (3) Intelligent Lexicography Platform: Addressing the needs of dictionary compilation in the digital-intelligent era, this platform leverages large language models (LLMs) to enable extensive retrieval across massive corpora, authoritative dictionaries, and online resources, supporting the editorial work of flagship publications.