For a better approach to the linguistic information carried by spoken language, the speech software WinPitch Corpus integrates the multimedia resource, ensuring both text-sound alignment and simultaneous acoustic analysis:
- sound/text alignment functions: text tags are inserted on the basis of tags placed on the sound wave;
- slowing down of the acoustic signal, for easy and precise tag insertion;
- real-time analysis of the sound signal with respect to the main vocal parameters (F0, duration, intensity, spectrum), for signals of unlimited length.
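The source does not describe the algorithms WinPitch Corpus uses. As a rough, hypothetical illustration of what frame-level analysis of two of the listed parameters involves, the sketch below computes intensity (RMS level in dB) and estimates F0 by picking the autocorrelation peak within a plausible pitch range. The function names, the 75 to 500 Hz search range and the synthetic test tone are assumptions made for this example, not WinPitch features.

```python
import math

def intensity_db(frame, ref=1.0):
    """RMS intensity of one frame of samples, in dB relative to `ref`."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-12) / ref)  # floor avoids log10(0)

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 in Hz from the autocorrelation peak (assumed method).

    Only lags corresponding to f0_min..f0_max are searched. The
    autocorrelation is left unnormalised, which slightly favours shorter
    lags and so discourages octave-down errors. Returns None when no
    positive peak is found (e.g. silence).
    """
    n = len(frame)
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None

if __name__ == "__main__":
    sr = 16000
    # Synthetic 40 ms frame of a 200 Hz tone with amplitude 0.5.
    tone = [0.5 * math.sin(2 * math.pi * 200 * i / sr) for i in range(640)]
    print(estimate_f0(tone, sr), round(intensity_db(tone), 1))  # prints 200.0 -9.0
```

Duration, the third time-domain parameter, follows directly from the frame length divided by the sample rate; a real analyser would slide such a window along the whole signal.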
The design of the C-ORAL-ROM multimedia storage of spoken language resources is based on the selection of a natural alignment unit that is also identified as a basic tagging level in textual corpora, i.e. the utterance:
- word-based alignment is meaningless for prosodic reasons: words are co-articulated within prosodic units, and the acoustic effect of a word-based alignment is perceptually unnatural;
- syllable-based alignment is extremely expensive, and the aligned units are not meaningful linguistic entities (syllables do not carry meaning);
- utterance-based alignment is both meaningful from a linguistic point of view and natural from a perceptual point of view.
In C-ORAL-ROM all the textual information is tagged simultaneously with respect to prosodic parsing and utterance limits: each prosodic unit corresponding to an utterance ends up aligned to its textual counterpart. A careful study of prosody for the accomplishment of utterance-based alignment is one of the main features of the C-ORAL-ROM Project. This is highly significant for the exploitation of the resulting resource: C-ORAL-ROM can be seen as a database of natural utterances.
Such a database can be exploited to study the syntactic, prosodic, action-value and lexical properties of natural utterances, at both the acoustic and the textual level.
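To make the idea of a "database of natural utterances" concrete, here is a minimal sketch of how utterance-aligned records might be held and queried: each record pairs a transcript with its sound span, so a text search directly yields the corresponding stretch of audio, and a time position yields the utterance(s) being spoken. The record fields, the class names `AlignedUtterance` and `UtteranceIndex`, and the toy data are all hypothetical; this is not the C-ORAL-ROM storage format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlignedUtterance:
    """One alignment unit: an utterance transcript paired with its sound span."""
    speaker: str
    text: str
    start: float  # seconds from the beginning of the recording
    end: float

class UtteranceIndex:
    """Hypothetical in-memory index over utterance-aligned records."""

    def __init__(self, utterances):
        # Keep records in temporal order, as they occur in the recording.
        self.items = sorted(utterances, key=lambda u: u.start)

    def at(self, t):
        """All utterances whose sound span contains time t.

        Overlapping speech from different speakers can yield several hits.
        """
        return [u for u in self.items if u.start <= t < u.end]

    def search(self, word):
        """Utterances whose transcript contains `word` as a whole token."""
        return [u for u in self.items if word in u.text.split()]

if __name__ == "__main__":
    # Invented toy records, for illustration only.
    index = UtteranceIndex([
        AlignedUtterance("B", "to the station", 1.8, 3.1),
        AlignedUtterance("A", "where are you going", 0.0, 1.6),
    ])
    hit = index.search("station")[0]
    print(hit.speaker, hit.start, hit.end)  # prints B 1.8 3.1
```

Because every hit carries its own time span, textual queries (lexical, syntactic) and acoustic queries (prosodic measurements over the span) can be run on the same units.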
Utterance-based alignment defined on highly prominent prosodic cues is proposed as a standard for spoken multimedia archives.
The textual units corresponding to an utterance are selected on the basis of the highly identifiable prosodic properties that these linguistic entities have at the perceptual level.
The notion of utterance in spoken language is given a theoretical definition.
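The selection of utterance-sized textual units from prosodic tags can be sketched as follows, assuming a transcript in which perceived terminal prosodic breaks are written as a whitespace-separated `//` token and non-terminal breaks as `/` (both symbols are assumptions for this example). The text is split only at terminal breaks, so non-terminal breaks stay inside the unit, and each unit is then paired with a sound span. The function names and the sample transcript are hypothetical.

```python
# Assumed transcription convention for this sketch: "//" marks a terminal
# prosodic break (utterance limit); "/" marks a non-terminal break and is
# kept inside the utterance text.
TERMINAL_BREAK = "//"

def split_utterances(transcript):
    """Split a break-tagged transcript into utterance-sized text units."""
    units, current = [], []
    for token in transcript.split():
        if token == TERMINAL_BREAK:
            if current:  # skip empty units produced by doubled markers
                units.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:  # trailing material with no closing terminal break
        units.append(" ".join(current))
    return units

def pair_with_spans(units, spans):
    """Pair each utterance text with its (start, end) sound span in seconds."""
    if len(units) != len(spans):
        raise ValueError(f"{len(units)} utterances but {len(spans)} sound spans")
    return [(text, start, end) for text, (start, end) in zip(units, spans)]

if __name__ == "__main__":
    units = split_utterances("well / I think so // see you tomorrow //")
    print(pair_with_spans(units, [(0.0, 1.4), (1.7, 2.9)]))
    # prints [('well / I think so', 0.0, 1.4), ('see you tomorrow', 1.7, 2.9)]
```

Splitting on terminal breaks only is what keeps the alignment unit at the utterance level rather than at the level of its internal prosodic units.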