Learning to adapt cross language information extraction wrapper

Tak Lam WONG

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

We propose a framework for adapting a previously learned wrapper from a source Web site to unseen sites in different languages. To achieve this, we exploit the previously learned information extraction knowledge and the items previously extracted or collected from the source Web site. This knowledge and these data are automatically translated into the language of the unseen sites via online Web resources such as online Web dictionaries or maps. Site-independent features, which capture the characteristics of the content of the data, are then derived from the translated information. Several text mining methods are employed to automatically discover a set of machine-labeled training examples in the unseen site. Both the content-oriented features and the site-dependent features of the machine-labeled training examples are then used to learn the wrapper for the new unseen site with our language-independent wrapper induction component. We conducted experiments on real-world Web sites in different languages to demonstrate the effectiveness of our framework. Copyright © 2011 Springer Science+Business Media, LLC.
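
As a rough illustration of the translation and machine-labeling steps described in the abstract, the following Python sketch may help. It is not the paper's method: the toy dictionary, the token-overlap (Jaccard) similarity used as a site-independent content feature, the 0.5 threshold, and all function names are assumptions made for this example, and the wrapper induction step itself is not covered.

```python
# Hypothetical bilingual dictionary standing in for an online Web dictionary.
TOY_DICTIONARY = {"camera": "cámara", "digital": "digital", "lens": "lente"}


def translate(item, dictionary):
    """Translate word by word; words missing from the dictionary are kept."""
    return " ".join(dictionary.get(word, word) for word in item.lower().split())


def content_similarity(a, b):
    """Assumed site-independent content feature: Jaccard overlap of token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def machine_label(candidates, translated_items, threshold=0.5):
    """Keep candidates whose best similarity to a translated item meets the threshold."""
    labeled = []
    for text in candidates:
        best = max(content_similarity(text, item) for item in translated_items)
        if best >= threshold:
            labeled.append(text)
    return labeled


# Items previously extracted from the source (English) site.
source_items = ["digital camera", "camera lens"]
translated = [translate(item, TOY_DICTIONARY) for item in source_items]

# Text fragments found in the unseen (Spanish) site.
candidates = ["Cámara digital", "Envío gratis", "Lente cámara 50mm"]
examples = machine_label(candidates, translated)
print(examples)
```

In this sketch the fragments that survive the threshold play the role of machine-labeled training examples; a real system would combine such content features with site-dependent layout features before learning a wrapper.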
Original language: English
Pages (from-to): 918-931
Journal: Applied Intelligence
Volume: 36
Issue number: 4
Publication status: Published - Jun 2012

Fingerprint

Websites
Glossaries
Industry
Experiments

Citation

Wong, T.-L. (2012). Learning to adapt cross language information extraction wrapper. Applied Intelligence, 36(4), 918-931.

Keywords

  • Web applications
  • Information extraction
  • Web mining