Projects & Resources
CRLLS develops innovative tools, databases, and resources that bridge academic research with practical applications in language education, preservation, and technology.
CRLLS Repository
Our flagship tools and platforms serving researchers, educators, and communities worldwide.
DOLD (Digital Platform for Collecting Online Language Data)
A web-based digital platform for remote collection of language production data from speakers worldwide, enabling efficient psycholinguistic experiments and fieldwork with minimal manual intervention.
English Loanwords in Hong Kong Cantonese
A database of over 700 English loanwords documented in Hong Kong Cantonese over a span of 180 years, as detailed in Bauer and Wong (2008), offering insights into language contact and lexical acquisition.
Brushtalk: A corpus of Miyazaki Touten’s family collection: Documents on Chinese October Revolution from Japan
A database of historical documents related to the Chinese revolution from the family collection of Miyazaki Touten.
Tripiţaka Koreana Treebank
A treebank of the entire Tripiţaka Koreana (the Chinese Buddhist canon stored in Korea), annotated with word boundaries, parts of speech, and dependency relations.
The Corpus of Mid-20th Century Hong Kong Cantonese (HKCC)
Transcribed dialogues from 81 black-and-white Cantonese films (1943–1970), bridging the gap between early and contemporary Cantonese. Contains 767k POS-tagged / romanized tokens.
Cantonese Self-Learning Dictionary
A comprehensive self-learning platform for Cantonese, featuring phonology lessons, tone practice with musical staves, everyday conversations, and a dictionary with Mandarin/English search.
TypeDuck
A SCOLAR-funded Cantonese keyboard for non-Chinese speakers with 20,000+ users worldwide. Revolutionizing Cantonese learning through innovative input technology.
Waitau and Hakka TTS
Comprehensive database preserving Hong Kong's indigenous languages and traditional folk songs through digital technology, featuring text-to-speech capabilities.
HKI Stories Collection
Digital platform collecting and preserving stories in Hong Kong indigenous languages, supporting cultural heritage preservation through community engagement.
Contemporary Spoken Cantonese Corpus (CSCC)
Created in the mid-2010s from university student interactions, featuring interview-style discussions about world scenic spots. Valuable for researching speaker-hearer negotiations and stance-taking in spontaneous Cantonese.
Hong Kong Mid-1990s Newspaper Column Corpus (HKMNCC)
~600,000 Chinese characters from Hong Kong newspaper columns featuring informal writing with Cantonese vernacular and English code-mixing. Sources include Hong Kong Economic Times, Hong Kong Economic Journal, and Ming Pao.
Classical Chinese Poems Sing Along
Educational app providing classical poems in Cantonese singing style, preserving lexical tones and enhancing understanding of rhythmic and prosodic features. Features listening, karaoke-style singing, and composition modules.
Funded Research Projects
Our portfolio of externally funded research initiatives spanning corpus linguistics, language acquisition, and digital humanities.
| Project Period | Funding | Project Title | Principal Investigator |
|---|---|---|---|
| Jul 2022 – Jun 2023 | FDF | A digital platform for collecting online language data (DOLD) | Prof CHEUNG Hin Tat, Dr CHIN Chi On Andy |
| Jan 2023 – Dec 2023 | Start-up Research Grant | Online Discourse of Autism in Chinese Newspapers and Social Media | Dr YIP Wai Chi Jesse |
| Jun 2023 – Jun 2024 | Faculty KT | Revolutionizing Language Education: Integrating AI technology into corpus-aided English speaking training | Dr CHEN Hsueh Chu Rebecca |
| Nov 2023 – Oct 2024 | FDF | The Construction of a Centralised Repository for Digital Humanities Projects | Dr LAU Chaak Ming |
| Dec 2024 – Nov 2025 | FDF | Exploration of Multilingualism in Central and Southeast Asia with a Corpus-based Approach | Dr YIP Wai Chi Jesse |
| Jun 2025 – May 2026 | FDF | Developing a Large-Scale Cantonese Lexical and Word Associations Database for Mental Health Research in Hong Kong | Dr LAU Chaak Ming |