Abstract
口語語料庫的建設是口語研究的基礎工作,該文選擇具有代表性的交談式談話節目《鏘鏘三人行》和對談式談話節目《魯豫有約》作為語料,建立了一個小型的談話節目語料庫,並構建了包含五大類16小類的會話結構標注體系,對語料進行了會話結構的標注。統計得到打斷結構309例,插入結構141例,重復結構111例,問答結構653/589例,阻礙—修正結構51/21例,反映了會話結構在數量上的不均衡分佈,節目的形式、性質以及交際任務是會話結構分佈的主要影響因素。會話結構組合具有模式性,該文使用Trigram方法對其組合情況進行了分析,發現語料中的高頻組合是問答毗鄰對,此外有大量的非毗鄰性組合。會話結構組合模式不但反映出談話節目的風格特點,還有助於分析會話中的功能性模塊、會話策略的形成,進而更加深入地瞭解會話的運作機制。
Building spoken-language corpora is foundational to research on spoken language. This paper builds a small talk-show corpus from two representative programs, the chat-style show Qiangqiang Sanrenxing and the interview-style show Lu Yu Youyue, and develops an annotation scheme of 5 primary categories and 16 subtypes with which the conversational structures in the corpus are annotated. The annotation yields 309 interruption structures, 141 insertion structures, 111 repetition structures, 653/589 question-answer structures, and 51/21 obstruction-correction structures, showing that conversational structures are unevenly distributed. The form and nature of each show and its communicative tasks are the main factors influencing this distribution. Combinations of conversational structures are patterned, so trigram analysis is used to examine them. The most frequent combination in the corpus is the question-answer adjacency pair, alongside a large number of non-adjacent combinations. These combination patterns not only reflect the style of the talk shows but also help in analyzing the functional modules of conversation and the formation of conversational strategies, and thus in understanding more deeply how conversation operates. Copyright © 2016 中國科學院軟件研究所.
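The trigram analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes each conversation is reduced to an ordered list of structure labels; the label names, the sample sequence, and the function `count_trigrams` are hypothetical and are not taken from the paper's annotation scheme or data.

```python
from collections import Counter


def count_trigrams(labels):
    """Count every run of three consecutive structure labels in one sequence."""
    # zip over three offset views of the list yields each consecutive triple.
    return Counter(zip(labels, labels[1:], labels[2:]))


# Hypothetical label sequence for one annotated conversation (illustration only).
sequence = [
    "question", "answer", "question", "answer",
    "question", "answer", "interruption", "repetition",
]

trigram_counts = count_trigrams(sequence)

# List the combinations from most to least frequent.
for trigram, count in trigram_counts.most_common():
    print(" -> ".join(trigram), count)
```

Counting over label sequences rather than over words is the design choice that matters here: frequent triples then correspond to recurring combinations of conversational structures, such as repeated question-answer pairs.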
Original language | Chinese (Simplified) |
---|---|
Pages (from-to) | 140-146 |
Journal | 中文信息學報 |
Volume | 30 |
Issue number | 6 |
Publication status | Published - Nov 2016 |
Citation
王珊和劉銳(2016):談話節目語料庫的構建與會話結構分析,《中文信息學報》,30(6),頁140-146。
Keywords
- 談話節目
- 會話結構
- 組合模式
- Talk shows
- Conversational structures
- Combination patterns
- Alt. title: The construction and analysis of a Chinese talk shows corpus