IPIC - Information Structure Database
Home
IPIC is a collection of texts chosen from the informal sections of the C-ORAL-ROM Italian and C-ORAL-BRASIL corpora and manually tagged with information units. The annotation strictly follows the Language into Act Theory (Cresti, 2000) and Informational Patterning Theory (Cresti & Moneglia, 2010).
Beyond this main annotation, IPIC contains different types of data and metadata: each session contains audio, session metadata, transcription, text-to-sound alignment, and a multi-level annotation which includes part-of-speech. The annotation of information structure for each utterance goes hand in hand with the annotation of terminal and non-terminal prosodic breaks, which is the main requirement of Informational Patterning Theory. Each sequence ending with a terminal break is terminated with respect to its information structure and matches a reference unit of spontaneous speech.
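The relation between prosodic breaks and reference units can be illustrated with a minimal Python sketch. It assumes the C-ORAL-ROM transcription convention in which `//` marks a terminal break and `/` a non-terminal break; other terminal markers are ignored here for simplicity. The sample string and the function name `split_terminated_sequences` are hypothetical and are not taken from the IPIC data or software.

```python
def split_terminated_sequences(transcript: str) -> list[list[str]]:
    """Split a transcript into terminated sequences and their prosodic units.

    Assumption: "//" marks a terminal break and "/" a non-terminal break.
    Each terminated sequence is returned as the list of prosodic units
    delimited by its non-terminal breaks.
    """
    sequences = []
    for chunk in transcript.split("//"):
        # Within one terminated sequence, non-terminal breaks separate units.
        units = [unit.strip() for unit in chunk.split("/") if unit.strip()]
        if units:
            sequences.append(units)
    return sequences


# Hypothetical sample line, invented for illustration only.
sample = "allora / domani / andiamo al mare // sì // se non piove / però //"
for number, units in enumerate(split_terminated_sequences(sample), start=1):
    print(number, units)
```

Each inner list corresponds to one sequence closed by a terminal break, which is the unit that the information structure annotation takes as its domain.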
This project has been managed by Emanuela Cresti and Tommaso Raso. Tommaso Raso and the LEEL Team carried out the annotation of the Brazilian mini-corpus; Ida Tucci annotated the Italian texts and performed the cross-linguistic validation of the annotation consistency of the two collections.
The XML model and the software infrastructure were designed and developed by Alessandro Panunzi and Lorenzo Gregori (Mello, Panunzi & Raso, 2012: pp. 133-150).
Each text in the archive has been converted into an XML file and stored in an XML database in order to make the whole collection queryable; this database, DB-IPIC, has now been turned into an on-line linguistic resource which can be used for the study of linear relations among Informational Units in spoken language.
The IPIC database contains the annotation of the full set of Italian C-ORAL-ROM texts (74 texts, for a total of 124,735 words and 20,835 terminated sequences). In addition, two comparable mini-corpora drawn from the Italian and Brazilian Portuguese informal sections have been set up to allow cross-linguistic comparison of information structure in spoken language. The choice of informal dialogical sessions to build IPIC is consistent with the main goal of the project, namely the analysis of the variation of spoken language structures in face-to-face interactions.
The Brazilian mini-corpus is a subset of the C-ORAL-BRASIL resource, consisting of 20 texts (29,909 words and 5,511 terminated sequences). The Italian mini-corpus contains 20 texts from the Italian resource (32,589 words and 5,663 terminated sequences), selected to be comparable with the Brazilian collection.
The tables below give the main figures for the three annotated corpora in DB-IPIC and show their comparability with regard to the main parameters of diaphasic variation. More detailed tables, showing the variation in spoken language context covered by this sampling, are available.
IPIC Tables
General data
| | Italian | Brazilian | Italian mini-corpus |
| --- | --- | --- | --- |
| Number of sessions | 74 | 20 | 20 |
| Total words | 124,735 | 29,909 | 32,589 |
| Terminated Sequences | 20,835 | 5,511 | 5,663 |
| Stanzas | 1,991 | 466 | 546 |
| Utterances | 18,844 | 5,045 | 5,117 |
| Simple utterances | 14,862 | 4,245 | 4,034 |
| Compound utterances | 1,639 | 530 | 443 |
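The General data figures appear to treat every terminated sequence as either a stanza or an utterance: in all three corpora the two counts add up exactly to the Terminated Sequences row. The short Python sketch below checks this arithmetic. The values are copied from the table, while the dictionary layout and the function name `is_consistent` are illustrative assumptions.

```python
# Counts copied from the General data table.
GENERAL_DATA = {
    "Italian": {"terminated": 20835, "stanzas": 1991, "utterances": 18844},
    "Brazilian": {"terminated": 5511, "stanzas": 466, "utterances": 5045},
    "Italian mini-corpus": {"terminated": 5663, "stanzas": 546, "utterances": 5117},
}


def is_consistent(row: dict) -> bool:
    """Return True when stanzas + utterances equals the terminated sequences."""
    return row["stanzas"] + row["utterances"] == row["terminated"]


for corpus, row in GENERAL_DATA.items():
    total = row["stanzas"] + row["utterances"]
    print(f"{corpus}: {row['stanzas']} + {row['utterances']} = {total}, "
          f"consistent: {is_consistent(row)}")
```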
Interaction types
| | Italian sessions | Italian utterances | Brazilian sessions | Brazilian utterances | Italian mini-corpus sessions | Italian mini-corpus utterances |
| --- | --- | --- | --- | --- | --- | --- |
| Monologues | 27 | 5,241 | 7 | 999 | 8 | 1,351 |
| Dialogues | 23 | 7,525 | 7 | 2,461 | 7 | 2,326 |
| Conversations | 24 | 8,069 | 6 | 2,051 | 5 | 1,986 |
Communicative context
| | Italian sessions | Italian utterances | Brazilian sessions | Brazilian utterances | Italian mini-corpus sessions | Italian mini-corpus utterances |
| --- | --- | --- | --- | --- | --- | --- |
| Family/Private | 60 | 17,620 | 15 | 4,139 | 14 | 4,140 |
| Public | 14 | 3,215 | 5 | 1,372 | 6 | 1,523 |
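The Interaction types and Communicative context breakdowns can be cross-checked against the General data totals. The Python sketch below sums each breakdown per corpus. The numbers are copied from the tables, while the data layout and the helper name `column_sums` are illustrative assumptions. One observation from running the check: the "Utterances" columns of both breakdowns sum to the Terminated Sequences totals (20,835, 5,511 and 5,663) rather than to the Utterances row of the General data table, so they may in fact count all terminated sequences.

```python
# (sessions, terminated sequences) totals from the General data table.
TOTALS = {
    "Italian": (74, 20835),
    "Brazilian": (20, 5511),
    "Italian mini-corpus": (20, 5663),
}

# (sessions, "utterances") pairs: monologues, dialogues, conversations.
INTERACTION_TYPES = {
    "Italian": [(27, 5241), (23, 7525), (24, 8069)],
    "Brazilian": [(7, 999), (7, 2461), (6, 2051)],
    "Italian mini-corpus": [(8, 1351), (7, 2326), (5, 1986)],
}

# (sessions, "utterances") pairs: family/private, public.
COMMUNICATIVE_CONTEXT = {
    "Italian": [(60, 17620), (14, 3215)],
    "Brazilian": [(15, 4139), (5, 1372)],
    "Italian mini-corpus": [(14, 4140), (6, 1523)],
}


def column_sums(rows: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum the sessions column and the 'utterances' column of a breakdown."""
    return sum(s for s, _ in rows), sum(u for _, u in rows)


for corpus, expected in TOTALS.items():
    by_type = column_sums(INTERACTION_TYPES[corpus])
    by_context = column_sums(COMMUNICATIVE_CONTEXT[corpus])
    print(corpus, by_type == expected, by_context == expected)
```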
Information Units
| | Italian | Brazilian | Italian mini-corpus |
| --- | --- | --- | --- |
| Textual Units | | | |
| Topic | 3,272 | 503 | 850 |
| Topic List | 135 | 10 | 34 |
| Appendix of Comment | 919 | 114 | 233 |
| Appendix of Topic | 150 | 21 | 43 |
| Parenthesis | 1,167 | 124 | 328 |
| Locutive introducer | 893 | 223 | 230 |
| Dialogic Units | | | |
| Phatic | 2,094 | 430 | 580 |
| Allocutive | 212 | 139 | 67 |
| Incipit | 1,463 | 103 | 398 |
| Conative | 281 | 67 | 108 |
| Expressive | 147 | 136 | 48 |
| Discourse connector | 571 | 167 | 128 |
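Because the three corpora differ in size, raw counts of information units are easier to compare once they are normalized. The Python sketch below converts a few of the counts above into occurrences per 1,000 words, using the Total words figures from the General data table. The choice of units shown, the normalization base and the function name `per_thousand_words` are illustrative assumptions, not part of the IPIC documentation.

```python
# Total words per corpus, from the General data table.
TOTAL_WORDS = {
    "Italian": 124735,
    "Brazilian": 29909,
    "Italian mini-corpus": 32589,
}

# A selection of raw counts from the Information Units table.
UNIT_COUNTS = {
    "Topic": {"Italian": 3272, "Brazilian": 503, "Italian mini-corpus": 850},
    "Phatic": {"Italian": 2094, "Brazilian": 430, "Italian mini-corpus": 580},
    "Allocutive": {"Italian": 212, "Brazilian": 139, "Italian mini-corpus": 67},
}


def per_thousand_words(count: int, words: int) -> float:
    """Normalize a raw count to occurrences per 1,000 words."""
    return 1000 * count / words


for unit, counts in UNIT_COUNTS.items():
    rates = {
        corpus: round(per_thousand_words(count, TOTAL_WORDS[corpus]), 2)
        for corpus, count in counts.items()
    }
    print(unit, rates)
```

The same function can be applied to any other row of the table, or with the Terminated Sequences totals as the denominator if a per-sequence rate is preferred.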