The words.hk project is the first attempt to build a Cantonese-to-Cantonese dictionary using a lean startup model (see Ries, The lean startup: How today’s entrepreneurs use continuous innovation to create radically successful businesses. New York: Crown Business, 2011) combined with crowdsourcing strategies. The goal is to produce a comprehensive dictionary written for Cantonese and in Cantonese. Existing resources are often (1) not available electronically, (2) out of date, or (3) too Anglo- or Sino-centric. Building large data sets from these existing resources requires a lot of editing and ‘data-janitorial’ work, which a large group of less-experienced people can do far better than a handful of experts, so crowdsourcing strategies are particularly appropriate here. We started with a small team of editors and software developers in 2014. In less than three years, we grew into an organisation with over 400 volunteers and gathered over 42,000 entries, of which more than 36,000 had been edited with Written Cantonese descriptions, examples, and translations as of June 2017. Given the nature of the project and its member composition (a language with no authority to fall back on, and most members with no formal linguistic or lexicographical training), we adhere to two simple principles in order to keep the dictionary growing without introducing major issues in the core data: ‘usage over etymology’ and ‘decision problem avoidance’. I will discuss how these principles have shaped the architecture of the project and the editing workflow, as well as other technological difficulties that we face. Copyright © 2019 Springer Nature Singapore Pte Ltd.
|Field|Value|
|---|---|
|Title of host publication|Digital humanities and new ways of teaching|
|Editors|Anna Wing-bo TSO|
|Place of Publication|Singapore|
|Publication status|Published - 2019|
Citation: Lau, C.-M. (2019). Building Cantonese dictionaries using crowdsourcing strategies: The words.hk project. In A. W.-B. Tso (Ed.), Digital humanities and new ways of teaching (pp. 89-107). Singapore: Springer.
- Dictionary compilation
- Usage over etymology
- Decision problem avoidance
- Open data