Building Cantonese dictionaries using crowdsourcing strategies: The words.hk project

Research output: Chapter in Book/Report/Conference proceeding > Chapters

Abstract

The words.hk project is the first attempt to build a Cantonese-to-Cantonese dictionary using a lean start-up (see Ries, The lean startup: How today’s entrepreneurs use continuous innovation to create radically successful businesses. New York: Crown Business, 2011) model combined with crowdsourcing strategies. The goal is to produce a comprehensive dictionary written for Cantonese and in Cantonese. Existing resources are often (1) not available electronically, (2) out of date, or (3) too Anglo- or Sino-centric. Building large data sets from these existing resources requires a lot of editing and ‘data-janitorial’ work, which can be done far better with a large group of less-experienced people than just a handful of experts, and crowdsourcing strategies are particularly appropriate in these cases. We started with a small team of editors and software developers in 2014. In less than 3 years’ time, we grew into an organisation with over 400 volunteers, gathered over 42,000 entries, of which more than 36,000 entries have been edited with Written Cantonese descriptions, examples, and translations as of June 2017. Given the nature of the project and the member composition – a language with no authority to fall back on and most members with no formal linguistics or lexicographical training – we adhere to two simple principles, in order to keep the dictionary growing without introducing major issues in the core data: ‘usage over etymology’ and ‘decision problem avoidance’. I will discuss how these principles have shaped the architecture of the project, the editing workflow, and other technological difficulties that we face. Copyright © 2019 Springer Nature Singapore Pte Ltd.
Original language: English
Title of host publication: Digital humanities and new ways of teaching
Editors: Anna Wing-bo TSO
Place of Publication: Singapore
Publisher: Springer
Pages: 89-107
ISBN (Electronic): 9789811312779
ISBN (Print): 9789811312762
Publication status: Published - 2019

Citation

Lau, C.-M. (2019). Building Cantonese dictionaries using crowdsourcing strategies: The words.hk project. In A. W.-B. Tso (Ed.), Digital humanities and new ways of teaching (pp. 89-107). Singapore: Springer.

Keywords

  • Cantonese
  • Dictionary compilation
  • Crowdsourcing
  • Usage over etymology
  • Decision problem avoidance
  • Open data