Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning

Lidong BING, Wai LAM, Tak Lam WONG

Research output: Chapter in Book/Report/Conference proceeding > Chapters

Abstract

We develop a new framework for Wikipedia entity expansion and attribute extraction from the Web. The framework takes as seed input a few existing entities that are automatically collected from a particular Wikipedia category, and explores their attribute infoboxes to obtain clues for discovering more entities of this category and the attribute content of the newly discovered entities. One characteristic of our framework is that it conducts discovery and extraction on desirable semi-structured data record sets that are automatically collected from the Web. A semi-supervised learning model with Conditional Random Fields is developed to deal with the issues of extraction learning and the limited number of labeled examples derived from the seed entities. We use a proximate record graph, which captures alignment similarity among data records, to guide the semi-supervised learning process. The learning process can then leverage the unlabeled data in the record set by controlling the label regularization under the guidance of the proximate record graph. Extensive experiments on different domains demonstrate the framework's superiority in discovering new entities and extracting attribute content. Copyright © 2013 ACM.
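
The idea of letting a record-similarity graph guide labels for unlabeled records can be illustrated with a small sketch. This is not the authors' model: the paper uses Conditional Random Fields with graph-guided label regularization, whereas the code below swaps in plain label propagation over a k-nearest-neighbor graph. The function names (`alignment_similarity`, `build_graph`, `propagate`), the use of `difflib.SequenceMatcher` as a stand-in for alignment similarity, and the toy records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def alignment_similarity(a, b):
    """Stand-in similarity between two token sequences, in [0, 1] (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def build_graph(records, k=2):
    """Link each record to its k most similar records, with symmetric weights."""
    n = len(records)
    weights = {i: {} for i in range(n)}
    for i in range(n):
        scored = sorted(
            ((alignment_similarity(records[i], records[j]), j)
             for j in range(n) if j != i),
            reverse=True,
        )
        for sim, j in scored[:k]:
            if sim > 0:
                weights[i][j] = sim
                weights[j][i] = sim
    return weights


def propagate(weights, seeds, labels, iterations=50):
    """Smooth label distributions over the graph; seed nodes stay clamped."""
    n = len(weights)
    dist = [{l: 1.0 / len(labels) for l in labels} for _ in range(n)]
    for i, seed_label in seeds.items():
        dist[i] = {l: 1.0 if l == seed_label else 0.0 for l in labels}
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds or not weights[i]:
                updated.append(dist[i])
                continue
            total = sum(weights[i].values())
            # Each unlabeled node takes the weighted average of its neighbors.
            updated.append({
                l: sum(w * dist[j][l] for j, w in weights[i].items()) / total
                for l in labels
            })
        dist = updated
    return dist


# Toy data records, tokenized into tags (hypothetical examples).
records = [
    ["<li>", "<b>", "TEXT", "</b>", "<span>", "TEXT", "</span>", "</li>"],
    ["<li>", "<b>", "TEXT", "</b>", "<span>", "TEXT", "</span>", "</li>"],
    ["<div>", "<a>", "TEXT", "</a>", "</div>"],
    ["<div>", "<a>", "TEXT", "</a>", "<a>", "TEXT", "</a>", "</div>"],
]
seeds = {0: "entity", 2: "other"}  # labels derived from seed entities
labels = ["entity", "other"]

dist = propagate(build_graph(records), seeds, labels)
predicted = [max(d, key=d.get) for d in dist]
print(predicted)
```

In this sketch, records 1 and 3 start unlabeled and end up with the label of the seed record they align with most closely, which is the intuition behind using unlabeled records in the same record set.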
Original language: English
Title of host publication: Proceedings of the 6th ACM International Conference on Web Search and Data Mining
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 567-576
ISBN (Print): 9781450318693
Publication status: Published - 2013

Citation

Bing, L., Lam, W., & Wong, T.-L. (2013). Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (pp. 567-576). New York: Association for Computing Machinery.

Keywords

  • Information extraction
  • Entity expansion
  • Proximate record graph
  • Semi-supervised learning
