C-ORAL-ROM

INTEGRATED REFERENCE CORPORA FOR SPOKEN ROMANCE LANGUAGES

ANNUAL REPORT 2001

 
 
Index

Summary of 2001 Activities
Sampling criteria
Textual and acoustic format
WINPITCHCORPUS
User Group, Promotion and Awareness
Future Work
Further Information
                         

http://lablita.dit.unifi.it/coralrom

                             
The C-ORAL-ROM project is now collecting a comparable set of corpora of spontaneous spoken language for the main Romance languages (French, Italian, Portuguese and Spanish, roughly 300,000 words for each language). C-ORAL-ROM represents the variety of speech acts performed in everyday language use and enables the description of their prosodic and syntactic structure in the four Romance languages. The C-ORAL-ROM corpus will be delivered in standard textual format in a multimedia edition on DVDs, in which each utterance in the acoustic signal will be aligned to its textual counterpart. The corpus edition will be integrated with tools for text concordances and analysis of the acoustic signal, and accompanied by comparative linguistic studies, models and standard linguistic measures of spoken language. C-ORAL-ROM challenges human language technology to establish non-generic limits and enables the definition of new models for speech technology in a multilingual framework.