Abstract
Entity resolution identifies entities from different data sources that refer to the same real-world entity, and it is an important prerequisite for integrating data from multiple sources. Entity resolution relies mainly on similarity measures over data records. Unfortunately, the quality of data sources is often poor in practice. Web data sources in particular often provide only incomplete information, which makes it difficult to identify matching entities by applying similarity measures directly. To address this problem, the concept of confidence is introduced to measure the trustworthiness of a similarity calculation. An adaptive rule-based approach is used to calculate the similarity between records, and the confidence of that similarity is derived as well. The similarity and confidence are then propagated over the entity relational graph until a fixed point is reached. Finally, each pair of records is classified as matched or unmatched based on a threshold. We performed a series of experiments on real data sets, and the results show that our approach outperforms the compared approaches. Copyright © 2014 IEEE.
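The pipeline summarized above (a local similarity with a confidence score, propagation over a graph of related records until a fixed point, then a threshold) can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify the adaptive rules or the propagation formula. Everything in the sketch is a hypothetical assumption, including `local_similarity` (agreement over attributes present in both records, with confidence as attribute coverage, a fixed rule rather than an adaptive one), the confidence-weighted averaging in `propagate`, the `floor`, `eps`, and `threshold=0.5` values, and the representation of the graph as a map between candidate record pairs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: a record maps attribute names to values (None = missing).
Record = Dict[str, Optional[str]]
Pair = Tuple[str, str]


def local_similarity(a: Record, b: Record) -> Tuple[float, float]:
    """Assumed rule: similarity is the share of agreeing values among attributes
    present in both records; confidence is the share of all attributes that are
    present in both (more missing values -> less trust in the similarity)."""
    keys = set(a) | set(b)
    shared = [k for k in keys if a.get(k) is not None and b.get(k) is not None]
    if not shared:
        return 0.0, 0.0
    agree = sum(1 for k in shared if a[k] == b[k])
    return agree / len(shared), len(shared) / len(keys)


def propagate(sim0: Dict[Pair, float], conf: Dict[Pair, float],
              neighbors: Dict[Pair, List[Pair]], eps: float = 1e-6,
              max_iter: int = 100, floor: float = 1e-3) -> Dict[Pair, float]:
    """Assumed propagation: each pair's similarity becomes the confidence-weighted
    average of its own local similarity and its neighbor pairs' current
    similarities, iterated until no value changes by more than eps."""
    sim = dict(sim0)
    if not sim:
        return sim
    for _ in range(max_iter):
        new = {}
        for p in sim:
            # The floor keeps local evidence as an anchor and avoids division by zero.
            self_w = max(conf[p], floor)
            num, den = self_w * sim0[p], self_w
            for q in neighbors.get(p, []):
                num += conf[q] * sim[q]
                den += conf[q]
            new[p] = num / den
        delta = max(abs(new[p] - sim[p]) for p in sim)
        sim = new
        if delta < eps:  # approximate fixed point reached
            break
    return sim


def resolve(records: Dict[str, Record], candidates: List[Pair],
            neighbors: Dict[Pair, List[Pair]],
            threshold: float = 0.5) -> Dict[Pair, bool]:
    """Compute local similarity and confidence, propagate, then threshold."""
    sim0: Dict[Pair, float] = {}
    conf: Dict[Pair, float] = {}
    for p in candidates:
        sim0[p], conf[p] = local_similarity(records[p[0]], records[p[1]])
    sim = propagate(sim0, conf, neighbors)
    return {p: sim[p] >= threshold for p in candidates}


# Toy example: the two author records share no non-missing attribute, but they
# are linked to a pair of paper records that match with full confidence.
records = {
    "a1": {"name": "J. Smith", "email": None},
    "a2": {"name": None, "email": "js@example.org"},
    "p1": {"title": "Entity Resolution", "year": "2014"},
    "p2": {"title": "Entity Resolution", "year": "2014"},
    "p3": {"title": "Graph Mining", "year": "2013"},
}
candidates = [("a1", "a2"), ("p1", "p2"), ("p1", "p3")]
neighbors = {("a1", "a2"): [("p1", "p2")], ("p1", "p2"): [("a1", "a2")]}
print(resolve(records, candidates, neighbors))
```

In this toy run the author pair has a local similarity and confidence of 0, so direct comparison alone would leave it unmatched; propagation from the confidently matched paper pair raises its similarity to about 0.999, above the threshold. Because every pair keeps a positive weight on its own local evidence, the averaging step is a contraction and the iteration settles; `max_iter` is only a safety cap.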
Field | Value |
---|---|
Original language | English |
Title of host publication | Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics, DSAA |
Publisher | IEEE |
Pages | 97-103 |
ISBN (Electronic) | 9781479969913 |
Publication status | Published - 2014 |