CUCall is a systematic collection of Cantonese/Putonghua speech data recorded over telephone networks. It is intended to assist the development of telephone-based speech recognition systems. The data were collected from telephone calls made by over 1,000 speakers over fixed-line and various types of mobile networks. The total amount of data is about 190 hours. Transcriptions are provided in the form of Chinese characters and phonemic symbols.
CUCall is divided into the following parts:
More information about CUCall is available from:
The following are some samples of CUCall for preview. Download them to see whether they meet your requirements. By downloading these files, you acknowledge that the copyright of the data belongs to CUHK and agree that the downloaded data will be used solely for preview purposes.
cxsearch is a tool for locating files in CUCorpora from the accompanying transcription files. It accepts a regular expression as the search pattern. For details, see the readme file.
You may download cxsearch here.
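To illustrate the general idea of locating utterances by matching a regular expression against transcriptions, here is a minimal Python sketch. It is not cxsearch and does not reflect its actual interface or the real CUCall transcription format; the function name search_transcriptions, the tab-separated "ID, then transcription" layout, and the sample utterance IDs and texts are all assumptions made up for this example. Consult the cxsearch readme file for the actual usage.

```python
import re
from typing import Iterable, List


def search_transcriptions(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the IDs of utterances whose transcription matches `pattern`.

    Assumed (hypothetical) layout of each line: "<utterance_id><TAB><transcription>".
    The real CUCall transcription files may be organized differently.
    """
    regex = re.compile(pattern)
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        utt_id, sep, text = line.partition("\t")
        # Lines without a tab separator do not fit the assumed layout; skip them.
        if sep and regex.search(text):
            matches.append(utt_id)
    return matches


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        "spk001_0001\tnei5 hou2",
        "spk001_0002\tzou2 san4",
        "spk002_0001\tnei5 hou2 maa3",
    ]
    print(search_transcriptions(sample, r"^nei5 hou2"))
```

In this sketch the pattern is applied only to the transcription field, so an utterance ID that happens to contain the search string does not produce a false match.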
The CUCall corpus is now available for licensing. Click here for prices. Industrial parties may license it for developing commercial products or for internal research purposes. A discounted special package is available for research and educational purposes at academic institutions.
The following materials are required for the processing of your request for licensing: