The BCC Corpus System is currently the world’s largest open-access online corpus system of Chinese. Its diachronic channel covers a continuous run of modern and contemporary newspaper text from 1872 to 2025, formed by joining the data from Shen Bao and People’s Daily end to end.
This lecture will introduce the design philosophy and operational mechanisms of the BCC Corpus System and demonstrate its functional modules. Drawing on these 150 years of longitudinal data, we conduct digital humanities research on the evolution of modern and contemporary Chinese, focusing on lexical morphology, usage distribution, and stylistic shifts. This study aims to provide empirical clues for understanding how contemporary Chinese took shape.
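The BCC Corpus is currently the world’s largest online open-access corpus system. Its diachronic corpus joins modern and contemporary newspapers (Shen Bao and People’s Daily) spanning more than 150 years, from 1872 to 2025. This lecture will introduce the philosophy and operational design behind the construction of the BCC Corpus and demonstrate its typical operations. Using the BCC diachronic corpus, we have carried out digital humanities research on the lexical morphology, usage distribution, and stylistic change of Chinese, with the aim of providing clues to the mechanisms by which modern Chinese was formed.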
Rao Gaoqi, PhD, is an Associate Professor in the Faculty of Linguistic Sciences at Beijing Language and Culture University (BLCU) and a core member of the R&D teams for the BCC Corpus System and the BLCU Smart Teaching platform (BST). He founded and edits Hanyutang, a leading platform for Chinese linguistics popularization and academic information, and serves on the editorial boards of several peer-reviewed journals, including Journal of Language Strategy Studies, Studies in Language Planning, and Digital Humanities. He has over 50 publications in journals and conference proceedings and has drafted a number of national and group standards. His research interests include natural language processing, language planning, educational technology, and digital humanities.
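Rao Gaoqi is an Associate Research Fellow in the Faculty of Linguistic Sciences at Beijing Language and Culture University, a master’s supervisor, a member of the Technical Committee on Language and Terminology (SAC TC62) of the Standardization Administration of China, and a core developer of the BCC Corpus and the BLCU International Chinese Smart Teaching System. He is an editorial board member or editor of Journal of Language Strategy Studies, Studies in Language Planning, and Digital Humanities, and editor-in-chief of Hanyutang, a linguistics WeChat official account and Xiaohongshu channel.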
dhi@ust.hk