Dynamic Italian Language Resources. Creation, deployment and maintenance of a web-based infrastructure aimed at consolidating the teaching of Italian worldwide

Direzione Generale per il Coordinamento e lo Sviluppo della ricerca

Year 2007 - Protocol: RBNE075J8Z Strategic Program in Linguistics

Research Principal Investigator
CRESTI Emanuela, Società Internazionale di Linguistica e Filologia Italiana

Scientific management
Ing. Samuele Paladini, Università di Firenze, LABLITA
Prof. Alessandro Panunzi, Università di Firenze, LABLITA
Prof. Massimo Moneglia, Università di Firenze, LABLITA

The internet is the largest existing repository of linguistic information, but it is also one of the main environments and means of language use, a space in which both functional and creative uses of language are practiced with growing frequency. The project aims to build, through crawling techniques, a repository of the Italian language (Risorsa Dinamica di Rete Italiana, RIDIRE) which exploits the Italian content on the web. The project also intends to complement this web-based language infrastructure with computational tools designed for the exploitation of vast corpora to enhance language competence and use.

The database will allow us to collect massive amounts of freely downloadable documents covering all possible domains of language use: law, religion, politics, literature, trade, etc. To reach this goal, each research unit will organize the crawling processes in its own domain of competence and provide expert evaluation in the selection of contents, thus ensuring an appropriate representation of Italian culture. This will avoid the main problem of present web corpora, namely their low representativeness of the language they are trying to describe. The resource will be annotated with metadata in order to structure the database and make it appropriate both for research and for corpus-based language education. Metadata will give representative value to otherwise shapeless masses of data.

RIDIRE is promoted, and will be disseminated and maintained, by SILFI (Società Internazionale di Linguistica e Filologia Italiana). It is designed for use by all parties involved in the teaching of Italian abroad: students, businessmen, and second- and third-generation emigrants, who will be able to profit from access to a huge database of representative texts that better characterize Italian culture and way of life. RIDIRE will gather and maintain a huge number of documents assembled through crawling techniques. It will be structured following current best practices in corpus linguistics and will allow users to access the most representative uses of the Italian language from both a practical and a cultural perspective.

The resulting infrastructure will offer web access to language technologies that give the user easy access to Italian language use in all relevant domains. RIDIRE will be accessible and searchable online, and therefore easily usable for language learning, thanks to a range of tools provided within the infrastructure that are designed to be user-friendly while ensuring valid computation of the linguistic information. The linguistic annotation of language contents will be accomplished through robust language technologies that are already available to the consortium. Contents will be indexed so as to allow real-time searching on the web.

RIDIRE has been conceived as a dynamic database that can be easily enhanced over time and is sensitive to the substantial qualitative and quantitative changes undergone by web content. The infrastructure is planned as a permanent resource for the Italian language, available well beyond the project's lifespan. The dynamic nature of RIDIRE will be made possible by automating the acquisition and indexing of documents and by the role SILFI assumes in the dissemination and maintenance of the resource. SILFI, in keeping with its statutory objectives, will test the resource and distribute it among universities all over the world through the international network of its members. SILFI will guarantee the maintenance of RIDIRE even after the project's conclusion. In this way, we believe, this precious resource designed for public use will be safeguarded against abandonment or limited use by a small elite of specialists.


1. web corpora
2. corpus linguistics
3. corpora for L2
4. language technology
5. language usage
6. distributed systems

List of the Research Units
No. | Associated Investigator of the Research Unit Program | Qualification | Institution | Affiliation
1 | CRESTI Emanuela | Professor | Società Internazionale di Linguistica e Filologia Italiana | Progetto RIDIRE
2 | MONEGLIA Massimo | Associate Professor | Università degli Studi di FIRENZE | Dip. ITALIANISTICA
3 | MARELLO Carla | Full Professor | Università degli Studi di TORINO | Dip. SCIENZE LETTERARIE E FILOLOGICHE
4 | TOGNINI BONELLI Elena | Full Professor | Università degli Studi di SIENA | Dip. STUDI AZIENDALI E SOCIALI
5 | D'ACHILLE Paolo | Full Professor | Università degli Studi ROMA TRE | Dip. ITALIANISTICA
6 | DE BLASI Nicola | Full Professor | Università degli Studi di NAPOLI "Federico II" | Dip. FILOLOGIA MODERNA
7 | NESI Paolo | Full Professor | Università degli Studi di FIRENZE | Dip. SISTEMI E INFORMATICA

