LABLITA

IPIC - Information Structure Database



Home

IPIC is a collection of texts chosen from the informal sections of the C-ORAL-ROM Italian and C-ORAL-BRASIL corpora and manually tagged with information units. The annotation strictly follows Language into Act Theory (Cresti, 2000) and Informational Patterning Theory (Cresti & Moneglia, 2010).

Beyond this main annotation, IPIC contains several types of data and metadata: each session includes audio, session metadata, a transcription, text-to-sound alignment, and a multi-level annotation that includes part-of-speech tagging. The annotation of information structure for each utterance goes hand in hand with the annotation of terminal and non-terminal prosodic breaks, which is the main requirement of Informational Patterning Theory. Each sequence ending with a terminal break is complete with respect to its information structure and corresponds to a reference unit of spontaneous speech.
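As a rough illustration of how prosodic breaks delimit reference units, the sketch below splits a transcribed turn into terminated sequences and then into the units separated by non-terminal breaks. It assumes the transcription convention of `//` for a terminal break and `/` for a non-terminal break and ignores any other break markers; the sample string and the function name `segment` are invented for illustration and are not taken from the corpus.

```python
def segment(turn: str) -> list[list[str]]:
    """Split a turn into terminated sequences, each a list of prosodic units."""
    sequences = []
    # Assumption: "//" marks a terminal break and closes a terminated sequence.
    for seq in turn.split("//"):
        # Assumption: "/" marks a non-terminal break between units.
        units = [u.strip() for u in seq.split("/") if u.strip()]
        if units:
            sequences.append(units)
    return sequences


# Invented example, not corpus data.
sample = "allora / ieri / sono andato al mercato // e poi / sono tornato a casa //"
for units in segment(sample):
    print(units)
```

Each inner list stands for one terminated sequence; in the annotated corpus every unit inside it would additionally carry an information-unit tag.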

This project was managed by Emanuela Cresti and Tommaso Raso. Tommaso Raso and the LEEL Team carried out the annotation of the Brazilian mini-corpus; Ida Tucci annotated the Italian texts and performed the cross-linguistic validation of annotation consistency across the two collections.

The XML model and the software infrastructure were designed and developed by Alessandro Panunzi and Lorenzo Gregori (Mello, Panunzi & Raso, 2012: pp. 133-150).

Each text in the archive has been converted into an XML file and stored in an XML database so that the whole collection can be queried. This database, DB-IPIC, is now available as an online linguistic resource for the study of linear relations among Informational Units in spoken language.
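To give an idea of what a query on linear relations among units could look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`session`, `utterance`, `unit`), the `tag` attribute, the labels `TOP`, `COM` and `INP`, and the sample content are all hypothetical placeholders; the actual DB-IPIC schema and query interface are not described here and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names, attribute names and tag labels
# are assumptions for illustration, not the actual DB-IPIC schema.
SAMPLE = """
<session id="demo">
  <utterance id="u1">
    <unit tag="TOP">ieri</unit>
    <unit tag="COM">sono andato al mercato</unit>
  </utterance>
  <utterance id="u2">
    <unit tag="COM">va bene</unit>
  </utterance>
  <utterance id="u3">
    <unit tag="INP">allora</unit>
    <unit tag="TOP">domani</unit>
    <unit tag="COM">torno</unit>
  </utterance>
</session>
"""


def find_pattern(root, pattern):
    """Return ids of utterances whose unit tags contain `pattern` as a contiguous run."""
    hits = []
    n = len(pattern)
    for utt in root.iter("utterance"):
        tags = [u.get("tag") for u in utt.findall("unit")]
        # Slide a window of length n over the tag sequence.
        if any(tags[i:i + n] == pattern for i in range(len(tags) - n + 1)):
            hits.append(utt.get("id"))
    return hits


root = ET.fromstring(SAMPLE)
print(find_pattern(root, ["TOP", "COM"]))
```

The sliding-window comparison keeps the order of units significant, which is the point of a linear-relation query: a sequence with the two tags in the opposite order would not match.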

The IPIC database contains the annotation of the full set of Italian C-ORAL-ROM informal texts (74 texts, totalling 124,735 words and 20,835 terminated sequences). In addition, two comparable mini-corpora drawn from the Italian and Brazilian Portuguese informal sections have been set up to allow cross-linguistic comparison of information structure in spoken language. The choice of informal dialogical sessions to build IPIC is consistent with the main goal of the project, namely the analysis of variation in spoken language structures in face-to-face interaction.

The Brazilian mini-corpus is a subset of the C-ORAL-BRASIL resource, consisting of 20 texts (29,909 words and 5,511 terminated sequences). The Italian mini-corpus contains 20 texts from the Italian resource (32,589 words and 5,663 terminated sequences), selected to be comparable with the Brazilian collection. Detailed data from the three annotated corpora in DB-IPIC can be found in the tables below.

The tables below give the main figures for the resource and show that the corpora are comparable with regard to the main parameters of diaphasic variation. More detailed tables showing the variation in spoken language contexts covered by this sampling are available.


Access to IPIC Database



IPIC Tables



General data

| | Italian | Brazilian | Italian mini-corpus |
|---|---|---|---|
| Number of sessions | 74 | 20 | 20 |
| Total words | 124,735 | 29,909 | 32,589 |
| Terminated Sequences (Utterances & Stanzas) | 20,835 | 5,511 | 5,663 |
| Stanzas | 1,991 | 466 | 546 |
| Utterances | 18,844 | 5,045 | 5,117 |
| Simple utterances | 14,862 | 4,245 | 4,034 |
| Compound utterances | 1,639 | 530 | 443 |
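The "Terminated Sequences (Utterances & Stanzas)" row reads as the sum of the Stanzas and Utterances rows. The short check below confirms that reading on the figures above and also derives the mean number of words per terminated sequence; that derived figure is not reported in the source and is computed here only as an illustration.

```python
# Figures copied from the "General data" table.
general = {
    "Italian": {"words": 124_735, "sequences": 20_835, "stanzas": 1_991, "utterances": 18_844},
    "Brazilian": {"words": 29_909, "sequences": 5_511, "stanzas": 466, "utterances": 5_045},
    "Italian mini-corpus": {"words": 32_589, "sequences": 5_663, "stanzas": 546, "utterances": 5_117},
}


def is_consistent(row):
    """True if stanzas plus utterances equals the terminated-sequence total."""
    return row["stanzas"] + row["utterances"] == row["sequences"]


for name, row in general.items():
    # Derived value, not part of the published tables.
    mean_len = row["words"] / row["sequences"]
    print(f"{name}: consistent={is_consistent(row)}, words per sequence={mean_len:.2f}")
```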



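Interaction types

| | Italian: Sessions | Italian: Utterances | Brazilian: Sessions | Brazilian: Utterances | Italian mini-corpus: Sessions | Italian mini-corpus: Utterances |
|---|---|---|---|---|---|---|
| Monologues | 27 | 5,241 | 7 | 999 | 8 | 1,351 |
| Dialogues | 23 | 7,525 | 7 | 2,461 | 7 | 2,326 |
| Conversations | 24 | 8,069 | 6 | 2,051 | 5 | 1,986 |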



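Communicative context

| | Italian: Sessions | Italian: Utterances | Brazilian: Sessions | Brazilian: Utterances | Italian mini-corpus: Sessions | Italian mini-corpus: Utterances |
|---|---|---|---|---|---|---|
| Family/Private | 60 | 17,620 | 15 | 4,139 | 14 | 4,140 |
| Public | 14 | 3,215 | 5 | 1,372 | 6 | 1,523 |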



Information Units

| | Italian | Brazilian | Italian mini-corpus |
|---|---|---|---|
| Textual Units | | | |
| Topic | 3,272 | 503 | 850 |
| Topic List | 135 | 10 | 34 |
| Appendix of Comment | 919 | 114 | 233 |
| Appendix of Topic | 150 | 21 | 43 |
| Parenthesis | 1,167 | 124 | 328 |
| Locutive introducer | 893 | 223 | 230 |
| Dialogic Units | | | |
| Phatic | 2,094 | 430 | 580 |
| Allocutive | 212 | 139 | 67 |
| Incipit | 1,463 | 103 | 398 |
| Conative | 281 | 67 | 108 |
| Expressive | 147 | 136 | 48 |
| Discourse connector | 571 | 167 | 128 |
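Because the three corpora differ in size, raw counts of information units are not directly comparable. One simple option, sketched below, is to express each count per 1,000 terminated sequences. The choice of terminated sequences as the normalisation base is an assumption made for this example (words would be another reasonable base), and only a subset of the rows is included to keep the sketch short.

```python
# Terminated-sequence totals from the "General data" table.
sequences = {"Italian": 20_835, "Brazilian": 5_511, "Italian mini-corpus": 5_663}

# A subset of rows copied from the "Information Units" table.
counts = {
    "Topic": {"Italian": 3_272, "Brazilian": 503, "Italian mini-corpus": 850},
    "Phatic": {"Italian": 2_094, "Brazilian": 430, "Italian mini-corpus": 580},
    "Allocutive": {"Italian": 212, "Brazilian": 139, "Italian mini-corpus": 67},
    "Incipit": {"Italian": 1_463, "Brazilian": 103, "Italian mini-corpus": 398},
}


def per_thousand(count, total):
    """Frequency of a unit type per 1,000 terminated sequences."""
    return 1000 * count / total


for unit, by_corpus in counts.items():
    rates = {c: round(per_thousand(n, sequences[c]), 1) for c, n in by_corpus.items()}
    print(unit, rates)
```

Normalised rates of this kind make it easier to compare the Brazilian and Italian mini-corpora row by row; they are derived values and do not appear in the published tables.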

Created by admin
Last modified 15 May 2012, 14:09
 
 

