Abstract
Many NLP applications, ranging from machine translation (MT) to cross-lingual information retrieval, require parallel corpora as critical resources. Given the phenomenal growth in patents and in the need to mediate between different languages, we explore a new but important area involving patents: we investigate how a Chinese-English-Japanese trilingual parallel corpus can be cultivated from comparable patents, and we introduce our mined trilingual corpus, which demonstrates the considerable potential of cultivating large-scale parallel corpora from comparable patents.
Original language | English |
---|---|
Publication status | Published - 2011 |