Main features of the C-ORAL-ROM project

By collecting multilingual corpora of spontaneous spoken language in natural contexts, C-ORAL-ROM addresses three key issues in human language technology:

  • the need to recognize spoken language in unrestricted contexts
  • the adequacy of speech synthesis with respect to natural prosody
  • the need for a rapid shift toward multilingual applications based on spoken language interfaces

The C-ORAL-ROM corpora represent the variety of speech acts performed in everyday language use and enable the description of their prosodic and syntactic structure in the four Romance languages, from both a quantitative and a qualitative point of view.

C-ORAL-ROM challenges human language technology to establish non-generic limits and enables the definition of new models for multilingual language technology.

  1. Corpora
  2. Multilinguality and comparability in spontaneous speech
  3. Representation of spontaneous speech variation and sampling criteria
  4. Prosodic tagging and utterance-based alignment
  5. Standard measures of spontaneous speech