Initiatives of digital humanities research on Cantonese


Research output: Contribution to conference (Paper)


According to Ethnologue, Cantonese ranks third among the Chinese dialects (after Mandarin and Wu) in speaker population. In Hong Kong, nearly 90% of residents speak Cantonese as their usual home language. Cantonese is also widely taught in many parts of the world.
Rigorous linguistic studies of Cantonese have flourished since the 1970s, and a number of important works have been published since then. Most of them focus on language-internal features such as phonology, morphology, grammar and semantics. After forty-odd years of work, our understanding of Cantonese has been much enhanced, and it is time to reorient the focus of Cantonese linguistics research.
One of the areas we are interested in is Digital Humanities (DH), which refers to the synergy between digital technologies and the humanities. DH has gained increasing interest and impact in the humanities over the past few decades. Technology-enriched digital archives have opened up socio-cultural spaces where researchers from different disciplinary backgrounds interact and share knowledge, beliefs and values. DH research involves processing large amounts of textual material that reflects local culture.
There has not yet been any serious attempt at DH for Cantonese, possibly because little language data in machine-readable form has been collected and processed. The paucity of corpus data is one of the major hurdles in developing DH in Cantonese. Yet a reasonable amount of Cantonese textual data, spanning a period of about two centuries, awaits further processing for DH research.
The Department of Linguistics and Modern Language Studies at the Education University of Hong Kong has identified DH in Cantonese as one of its research initiatives. Several projects have already been developed, such as an online Cantonese self-learning platform and a corpus of mid-20th century Hong Kong Cantonese. With this groundwork in place, we are eager to develop more DH initiatives. At the same time, we recognize a number of issues that must be addressed when doing DH work on Cantonese. These include:
Data processing, such as transcription of spoken data and orthographic representation of colloquial items;
Part-of-speech tagging, which can help distinguish homonyms with different syntactic behaviors;
Syntactic parsing, which can identify the functions of constituents in the corpus data.
We anticipate that this foundational work will be essential for developing DH research in Cantonese. One of our research topics is sentiment analysis. With Cantonese corpus data collected from different periods, we would be able to track changes in the sentiment values associated with the same concept, which could reflect how people in the community have perceived it over time.
Copyright © 2017 22nd International Conference on the Yue Dialects.
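As a rough illustration of what tracking a concept's sentiment across periods could involve, the Python sketch below averages lexicon polarity scores of the words near a target word, separately for each period. It is a sketch under stated assumptions, not the project's method: the abstract does not specify a technique, the lexicon-plus-context-window approach is just one simple choice, and the names `LEXICON` and `concept_sentiment`, the polarity scores and the example sentences are all invented for illustration. The input is assumed to be already word-segmented, which is itself one of the data-processing issues listed above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical polarity lexicon (word -> score in [-1, 1]); entries and scores
# are invented for illustration, not taken from any real Cantonese resource.
LEXICON = {"方便": 1.0, "鍾意": 0.8, "舒服": 0.6, "慢": -0.6, "逼": -0.5, "麻煩": -0.8}


def concept_sentiment(corpus, concept, window=3):
    """Hypothetical helper: mean lexicon score of the words within `window`
    tokens of `concept`, computed separately for each period.

    corpus: iterable of (period, tokens) pairs; tokens are pre-segmented words.
    Returns {period: mean score}, omitting periods with no scored neighbours.
    """
    scores = defaultdict(list)
    for period, tokens in corpus:
        for i, token in enumerate(tokens):
            if token != concept:
                continue
            # Context window around the target word, clipped to the sentence.
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i and tokens[j] in LEXICON:
                    scores[period].append(LEXICON[tokens[j]])
    return {period: mean(values) for period, values in sorted(scores.items())}


# Invented, pre-segmented example sentences, each tagged with a period label.
corpus = [
    ("1950s", ["電車", "好", "方便"]),
    ("1950s", ["我", "鍾意", "搭", "電車"]),
    ("2010s", ["電車", "好", "慢"]),
    ("2010s", ["搭", "電車", "好", "逼"]),
]

for period, score in concept_sentiment(corpus, "電車").items():
    print(period, round(score, 2))
```

With these invented sentences, 電車 (tram) scores positive for the "1950s" group and negative for the "2010s" group. A bag-of-words window like this ignores negation and cannot tell 好 "good" from 好 as the intensifier "very", which is one reason the part-of-speech tagging and parsing work mentioned above would matter for real sentiment tracking.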
Original language: English
Publication status: Published - Dec 2017
Event: The 22nd International Conference on Yue Dialects (第二十二屆國際粵方言研討會), The Education University of Hong Kong (香港教育大學), Hong Kong
Duration: 8 Dec 2017 to 9 Dec 2017


Conference: The 22nd International Conference on Yue Dialects (第二十二屆國際粵方言研討會)
Country/Territory: Hong Kong


Chin, A. (2017, December). Initiatives of digital humanities research on Cantonese. Paper presented at The 22nd International Conference on the Yue Dialects, The Education University of Hong Kong, Hong Kong, China.

