LABLITA Corpus of Spontaneous Spoken Italian
The LABLITA corpus represents Italian spontaneous speech events. Spontaneous speech events are those communication events in which the programming of speech is simultaneous with its execution by the speaker; i.e., the speech event is non-scripted or only partially scripted.
The corpus is a repository gathering sub-corpora collected in different recording initiatives starting from 1965, and it is updated every year.
The corpus gathers the recording sessions in wav files (Windows PCM, 22,050 Hz, 16 bit) and for each session delivers a set of label files comprising:
- Orthographic transcription in CHAT format
- Metadata in CHAT and IMDI format
- Text-to-speech synchronization in .xml files (under completion)
Transcripts are labeled for the occurrence of terminal and non-terminal prosodic breaks in speech.
Text-to-speech synchronization specifies an alignment unit for each sequence of words ending with a terminal break.
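As an illustration of the audio format stated above, the following minimal sketch checks whether a recording matches the declared parameters (uncompressed PCM, 22,050 Hz, 16 bit). It uses only Python's standard `wave` module. The function name `check_session_audio` and the file name in the usage note are hypothetical and are not part of the corpus distribution. The number of channels is not checked, since the description does not specify it.

```python
import wave

# Expected values taken from the corpus description: PCM, 22,050 Hz, 16 bit.
EXPECTED_RATE_HZ = 22050
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit = 2 bytes per sample


def check_session_audio(path):
    """Return a list of mismatches between a WAV file and the stated format.

    An empty list means the file matches the declared parameters.
    `path` is a string path to a WAV file (hypothetical helper, not a corpus tool).
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit")
        # The wave module reports "NONE" for uncompressed PCM data.
        if wav.getcomptype() != "NONE":
            problems.append(f"compression type is {wav.getcomptype()}")
    return problems
```

A call such as `check_session_audio("session.wav")` would return an empty list for a conforming recording and a list of human-readable mismatches otherwise.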
The corpus repository is located in the Language Lab of the Italian Department of the University of Florence (LABLITA), where it can be accessed within the framework of research projects.
The copyright of the LABLITA corpus belongs to Emanuela Cresti, who directed the setting up of the database.
The exploitation rights of the corpus are presently granted to the Italian Department of the University of Florence. A sampling of the LABLITA corpora has been published in the C-ORAL-ROM corpus and is therefore available to the public.
Samplings of the corpus can be distributed for R&D purposes under a license agreement issued by the Italian Department of the University of Florence.
Prof. Emanuela Cresti
Co-ordinator of the C-ORAL-ROM project
University of Florence
Piazza Savonarola, 1
phone: +39 055 5032486
fax: +39 055 503247