Centre for Research on Linguistics and Language Studies (CRLLS) 語言學及語言研究中心

Projects & Resources

CRLLS develops innovative tools, databases, and resources that bridge academic research with practical applications in language education, preservation, and technology.

CRLLS Repository

Our flagship tools and platforms serving researchers, educators, and communities worldwide.

Tool

DOLD (Digital Platform for Collecting Online Language Data)

A web-based digital platform for remote collection of language production data from speakers worldwide, enabling efficient psycholinguistic experiments and fieldwork with minimal manual intervention.

Database External

English Loanwords in Hong Kong Cantonese

A database of over 700 English loanwords attested in Hong Kong Cantonese over a span of 180 years, as detailed in Bauer and Wong (2008), offering insights into language contact and lexical acquisition.

Corpus External

Brushtalk: A corpus of Miyazaki Touten’s family collection: Documents on Chinese October Revolution from Japan

A database of historical documents related to the Chinese revolution from the family collection of Miyazaki Touten.

Treebank External

Tripiṭaka Koreana Treebank

A treebank of the entire Tripiṭaka Koreana (the Chinese Buddhist canon preserved in Korea), annotated with word boundaries, parts of speech, and dependency relations.

Corpus

The Corpus of Mid-20th Century Hong Kong Cantonese (HKCC)

Transcribed dialogues from 81 black-and-white Cantonese films (1943–1970), bridging the gap between early and contemporary Cantonese. Contains 767k POS-tagged and romanized tokens.

Tool External

Cantonese Self-Learning Dictionary

A comprehensive self-learning platform for Cantonese, featuring phonology lessons, tone practice with musical staves, everyday conversations, and a dictionary with Mandarin/English search.

Tool

TypeDuck

A SCOLAR-funded Cantonese keyboard for non-Chinese speakers, with more than 20,000 users worldwide. It supports Cantonese learning through its input technology.

Database External

Waitau and Hakka TTS

Comprehensive database preserving Hong Kong's indigenous languages and traditional folk songs through digital technology, featuring text-to-speech capabilities.

Collection

HKI Stories Collection

Digital platform collecting and preserving stories in Hong Kong indigenous languages, supporting cultural heritage preservation through community engagement.

External

Contemporary Spoken Cantonese Corpus (CSCC)

Compiled in the mid-2010s from interactions among university students, featuring interview-style discussions about scenic spots around the world. Useful for research on speaker-hearer negotiation and stance-taking in spontaneous Cantonese.

External

Hong Kong Mid-1990s Newspaper Column Corpus (HKMNCC)

About 600,000 Chinese characters from Hong Kong newspaper columns, featuring informal writing with Cantonese vernacular and English code-mixing. Sources include the Hong Kong Economic Times, the Hong Kong Economic Journal, and Ming Pao.

External

Classical Chinese Poems Sing Along

Educational app providing classical poems in Cantonese singing style, preserving lexical tones and enhancing understanding of rhythmic and prosodic features. Features listening, karaoke-style singing, and composition modules.


Funded Research Projects

Our portfolio of externally funded research initiatives spanning corpus linguistics, language acquisition, and digital humanities.

Project Period | Funding | Project Title | Principal Investigator
Jul 2022 – Jun 2023 | FDF | A digital platform for collecting online language data (DOLD) | Prof CHEUNG Hin Tat, Dr CHIN Chi On Andy
Jan 2023 – Dec 2023 | Start-up Research Grant | Online Discourse of Autism in Chinese Newspapers and Social Media | Dr YIP Wai Chi Jesse
Jun 2023 – Jun 2024 | Faculty KT | Revolutionizing Language Education: Integrating AI technology into corpus-aided English speaking training | Dr CHEN Hsueh Chu Rebecca
Nov 2023 – Oct 2024 | FDF | The Construction of a Centralised Repository for Digital Humanities Projects | Dr LAU Chaak Ming
Dec 2024 – Nov 2025 | FDF | Exploration of Multilingualism in Central and Southeast Asia with a Corpus-based Approach | Dr YIP Wai Chi Jesse
Jun 2025 – May 2026 | FDF | Developing a Large-Scale Cantonese Lexical and Word Associations Database for Mental Health Research in Hong Kong | Dr LAU Chaak Ming