A new samples selecting method based on K nearest neighbors

Kai YANG, Yi CAI, Zhiwei CAI, Xingwei TAN, Haoran XIE, Tak Lam WONG, Wai Hong CHAN

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Short text classification is a supervised learning task that needs a large amount of labeled training data, and producing those labels consumes considerable human effort. In traditional supervised learning problems, active learning can reduce the number of samples that must be labeled manually by selecting the most representative samples to stand in for the whole training set. Uncertainty sampling is the most popular active learning strategy, but it performs poorly when outliers are present. In this paper, we propose a new sampling method for training sets containing short text, denoted Top-K Representative (TKR). However, optimizing the TKR objective is an NP-hard problem. To address this, we propose a new algorithm, based on the greedy algorithm, that obtains approximate results. The experiments show that the proposed sampling method performs better than the state-of-the-art methods. Copyright © 2017 IEEE.
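
The abstract does not state the TKR objective, so the following is only an illustrative sketch of the general idea it describes: greedy selection of representative samples using K nearest neighbors. It assumes a hypothetical coverage objective in which a candidate "represents" itself plus every point that lists it among its k nearest neighbors, and it uses Euclidean distance on toy 2-D points in place of short-text features. The function names `represented_by` and `greedy_representatives` are made up for this example and are not taken from the paper.

```python
import math


def represented_by(points, k):
    """For each index i, return {i} plus every j that has i among its k nearest neighbors.

    Assumption: this reverse-kNN notion of "representing" is illustrative only.
    """
    n = len(points)
    knn = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p (ties broken by index).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (math.dist(p, points[j]), j))
        knn.append(set(others[:k]))
    return [{i} | {j for j in range(n) if i in knn[j]} for i in range(n)]


def greedy_representatives(points, k, budget):
    """Greedily pick up to `budget` indices, each adding the most not-yet-represented points."""
    reps = represented_by(points, k)
    covered, chosen = set(), []
    for _ in range(min(budget, len(points))):
        # Marginal gain of a candidate = points it represents that are still uncovered.
        best = max((i for i in range(len(points)) if i not in chosen),
                   key=lambda i: (len(reps[i] - covered), -i))
        if not reps[best] - covered:
            break  # every point is already represented
        chosen.append(best)
        covered |= reps[best]
    return chosen
```

Under this assumed objective, an isolated point appears in no other point's neighbor list, so it represents only itself and is picked late, if at all. Exact maximization of such a coverage objective is a set-cover-style problem, which is why a greedy approximation is the natural choice in a sketch like this one.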
Original language: English
Title of host publication: Proceedings of 2017 IEEE International Conference on Big Data and Smart Computing (BigComp)
Place of Publication: South Korea
Publisher: IEEE
Pages: 457-462
ISBN (Print): 9781509030156, 9781509030149
Publication status: Published - 2017

Fingerprint

Supervised learning
Sampling
Personnel
Experiments
Problem-Based Learning
Uncertainty

Citation

Yang, K., Cai, Y., Cai, Z., Tan, X., Xie, H., Wong, T. L., et al. (2017). A new samples selecting method based on K nearest neighbors. In Proceedings of 2017 IEEE International Conference on Big Data and Smart Computing (BigComp) (pp. 457-462). South Korea: IEEE.

Keywords

  • Training
  • Uncertainty
  • Entropy
  • Sampling methods
  • Optimization
  • Approximation algorithms
  • Labeling