漢語共時語料庫與追蹤語料庫:語料庫語言學的新方向

鄒嘉彥, 鄺藹兒, 路斌, 蔡永富

Research output: Contribution to journal › Articles

Abstract

隨著信息技術的不斷提升、互聯網的普及,漢語自然語言處理的難題不斷得到解決,漢語語料庫的發展和語料庫語言學的應用也面臨著新的契機。如何持續充分應用龐大的多種語料庫,並協同與配合語言學和人文、社會科學多個領域,來追蹤了解各種語言現象及其背後的社會文化深層含義,是語料庫語言學可以承擔的新任務。LIVAC漢語共時語料庫持續處理和分析泛華語七個地區十七年四億字的語料,可真正起到"時間錦囊"的作用,為緊密追蹤、科學觀察泛華地區語言現象及有關社會文化演變,提供了堅實的基礎和科學依據。該文介紹LIVAC如何由漢語"共時語料庫"演變為"追蹤語料庫"。
The advancement of information technology and the spread of the Internet have offered important solutions to many classical problems in Chinese natural language processing. They have also opened up new opportunities for corpus linguistics, particularly the cultivation and utilization of large corpora for monitoring and tracking language phenomena from a linguistic perspective, and for investigating such language development in relation to the underlying social and cultural implications traditionally studied by the humanities and social sciences. Over the past 17 years, the LIVAC corpus has grown into a very large synchronous corpus, containing results from the analysis of about 400 million Chinese characters drawn from news media in seven pan-Chinese communities. The long-term effort behind LIVAC has enabled it to function as a series of time capsules, which provide a solid foundation for scientifically tracking and monitoring language change together with the associated social and cultural developments within and across pan-Chinese regions. This paper introduces how the LIVAC synchronous corpus has evolved into a monitoring corpus of Chinese communities. Copyright © 2012 Institute of Software, Chinese Academy of Sciences (中國科學院軟件研究所).
Original language: Chinese (Traditional)
Pages (from-to): 38-45
Journal: 中文信息學報 (Journal of Chinese Information Processing)
Volume: 25
Issue number: 6
Publication status: Published - Nov 2011

Citation

鄒嘉彥、鄺藹兒、路斌和蔡永富(2011):漢語共時語料庫與追蹤語料庫:語料庫語言學的新方向,《中文信息學報》,25(6),頁38-45。

Keywords

  • 語料庫語言學
  • LIVAC漢語語料庫
  • 共時語料庫
  • 追蹤語料庫
  • Corpus linguistics
  • LIVAC corpus
  • Synchronous corpus
  • Monitoring corpus
  • Alt. title: Chinese synchronous corpus and monitoring corpus: A new direction of corpus linguistics