二字短語凝固度分級考察

Research output: Contribution to journal › Articles › peer-review

Abstract

Adding two-character phrases (TCPs) to the word list sharply increases segmentation ambiguity. To address this, we grade the added TCPs by their degree of agglomeration. We first examine and test the criteria and methods proposed in earlier work. The examination shows that three factors are closely related to the agglomeration degree of TCPs and also to the connection type (A/BC ~ AB/C): the structure type of the phrase, the replacement rate (RR) of the component characters (CC), and the degree of ambiguity in connection with the preceding or following character. The replacement rate of the back character, taken relative to the front character (RR1), is especially valuable for three structure types: attributive-head (adnominal-N), adverbial-head (adverbial-V/A), and verb-object (VO). Character groups with a high RR1 mostly show A/BC connection, whereas the other groups mostly show AB/C connection. On this basis, we propose a grading scheme for the TCP-enlarged word list and specific disambiguation strategies tied to the grades. Copyright © 2000 教育部語言文字應用研究所.
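The abstract does not give the formula for the back-character replacement rate, so the following Python sketch rests on one plausible reading: for each front character, count how many distinct back characters follow it in a list of two-character phrases. The function names, the threshold of 3, and the toy phrase list are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def back_replacement_counts(phrases):
    """Map each front character to the number of distinct back characters
    it combines with. This is an assumed stand-in for the paper's RR1."""
    backs = defaultdict(set)
    for phrase in phrases:
        if len(phrase) != 2:
            continue  # only two-character phrases are considered
        front, back = phrase
        backs[front].add(back)
    return {front: len(chars) for front, chars in backs.items()}


def guess_connection(phrase, counts, threshold=3):
    """Hypothetical rule of thumb: a front character with many replaceable
    back characters suggests A/BC connection, otherwise AB/C.
    The abstract reports this tendency only for attributive-head,
    adverbial-head and verb-object structures."""
    return "A/BC" if counts.get(phrase[0], 0) >= threshold else "AB/C"


# Toy attributive-head phrases, invented for illustration only.
phrases = ["大樹", "大山", "大河", "小路"]
counts = back_replacement_counts(phrases)
print(counts)
print(guess_connection("大樹", counts))
print(guess_connection("小路", counts))
```

In practice the counts would come from a corpus or lexicon, and the cut-off would have to be tuned for each structure type. The fixed threshold here only illustrates the reported tendency.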
Original language: Chinese (Simplified)
Pages (from-to): 21-33
Journal: 語言文字應用
Volume: 2000
Issue number: 2
Publication status: Published - 2000

Citation

梁源 (2000). 二字短語凝固度分級考察. 《語言文字應用》, 2000(2), pp. 21-33.

Keywords

  • Alt. title: On the agglomeration degree of two-character phrases