| The C-ORAL-ROM
project is now collecting a comparable set of
corpora of spontaneous spoken language for the
main Romance languages (French, Italian,
Portuguese and Spanish, roughly 300,000 words for
each language). C-ORAL-ROM represents the variety
of speech acts performed in everyday language use
and enables the description of their prosodic and
syntactic structure in the four Romance
languages. The C-ORAL-ROM corpus will be
delivered in standard textual format in a
multimedia edition on DVDs, where each utterance
in the acoustic signal will be aligned to
its textual counterpart. The corpus edition will be
integrated with tools for text concordances and
analysis of the acoustic signal and accompanied
by comparative linguistic studies, models and
standard linguistic measures of spoken language.
C-ORAL-ROM challenges human language technology
to establish non-generic limits and enables the
definition of new models for speech technology in
a multilingual frame.
|