corpor@uclouvain / Catalog

31 corpora found

VESPA

The Varieties of English for Specific Purposes dAtabase (VESPA) learner corpus

English written 2022

ICLE V3

International Corpus of Learner English

The International Corpus of Learner English (Version 3) is a corpus of writing by higher intermediate to advanced learners of English. It contains 5.5 million words of EFL writing from learners representing 25 different mother tongue backgrounds.

English written 5.5 million words 2020

Smartphone data for Alzheimer's detection

Alzheimer electronic corpus

Corpus of electronic messages from patients diagnosed at the preclinical stage of Alzheimer's disease.

French written 2020

Opinio

Corpus of Twitter messages from the French-speaking community of Belgium

Open corpus of Twitter posts from the French-speaking community of Belgium.

French written 38 million tweets 2019

SPANS L2/L1

Spanish Narrative Spoken - L2/L1

Corpus of short spontaneous narratives produced by French-speaking students of Spanish as a foreign language (ELE).

Spanish video, learners 2018 [Collection in progress]

CARE

The Corpus of Academic writing in fRench and English

English, French written 2018

Vos pouces pour la science

Corpus of Belgian and French messages from 43 Facebook accounts

Corpus of Facebook messages drawn from posts and instant messaging. The messages come from France and Belgium.

French written 22 million words 2018

CEFR-based Short Answer Grading

A corpus of short answers written by learners of English and graded with CEFR levels

English written, learners 712 texts 2017

COPINE

Corpus de Peticiones en Interacciones Naturalizadas en Español (Corpus of Requests in Naturalized Interactions in Spanish)

Corpus of requests in Spanish produced by native-speaker students and by French-speaking learners of Spanish, in writing and in speech, in an academic context.

Spanish written, oral, video, learners, natives ca. 40 GB 2017

PROCEED

Process Corpus of English in Education

PROCEED (PROcess Corpus of English in EDucation) is a process learner corpus which relies on keystroke logging and screencasting to reproduce university students' writing process.

English written, video, learners 2017

NESSI

New Englishes Student Interviews

NESSI (New Englishes Student Interviews) is a multimodal corpus of informal interviews with university students who use English as an institutionalized second-language variety.

English oral, video, learners ca. 170,000 words 2016

MulTINCo

Multilingual Traditional Immersion and Native Corpus

MulTINCo includes spoken and (longitudinal) written data collected from French-speaking learners of Dutch and English as a second language (L2) in different educational settings (CLIL and traditional L2 classes). The database contains numerous background variables, as well as written productions in the learners’ first language (L1) (viz. French) and productions from native speakers of the learners’ L2 (viz. L1 Dutch and L1 English data).

English, Dutch written, learners 2015

Tunisian Lecture Corpus (TLC)

The Tunisian Lecture Corpus (TLC) is a non-native, specialized corpus of academic lectures collected in two institutions of higher education in Tunisia in the academic year 2014-2015. The corpus comprises around 106,000 words and is made up of 12 video recordings and 1 audio recording. Thirteen lecturers of undergraduate courses in three disciplines (cultural studies, linguistics, and literature) participated in this research. Course descriptions for 9 of the 10 courses recorded are included in the corpus, along with participant metadata such as gender, age, language background, and teaching experience.

English oral 106,200 words, 20 hours and 50 minutes 2015

ACADes

Corpus of scientific/academic articles in Spanish

Spanish written 10,355,730 words 2013

Use of Twitter by MEPs and candidates for a seat in the European Parliament

French, Dutch, English, Spanish, Catalan, Galician, Basque, German written 2013

MadSex

Corpus Madrileño Oral de la Sexualidad (Madrilenian Spoken Corpus of Sexuality)

Sociolinguistic corpus of ca. 1 million words that focuses on the topic of sexuality. It is composed of 54 sociolinguistic interviews collected in 2010-2012 in Madrid on a pre-stratified sample (district, gender, age and level of education). The interviews were based on a questionnaire (indirect elicitation) and were conducted face-to-face and recorded. Three interviewers participated.

Spanish oral 1 million words 2013

sms4science

International SMS corpus for scientific research

Corpus of SMS messages collected between 2004 and 2012. It contains messages in French, English, German, and Italian.

French, English, Italian, German written 150,000 SMS 2012

Label France

Unidirectional French-English translation corpus.

English, French written 2 million words 2009

Di Rupo 1er mai

May Day speeches delivered by Elio Di Rupo between 2002 and 2007.

French written 2008

LONGDALE

Longitudinal Database of Learner English

The LONGDALE (Longitudinal Database of Learner English) is a truly longitudinal database of learner English containing data from learners from a wide range of mother tongue backgrounds. The same students are followed over a period of at least three years and data collections are organized at least once a year.

English written, learners ca. 780,000 words 2008

Modern Times

Narratives by native speakers and learners of Dutch and French.

Narratives based on an excerpt from Modern Times (Ch. Chaplin, 1934 or 1936) by native speakers and learners of Dutch and French.

French, Dutch written 2008

Leerdercorpus Nederlands

Varied collection of writing tasks by learners of Dutch at different levels.

Dutch written 2005

LOCNESS

Louvain Corpus of Native English Essays

Corpus of essays written by native English students (British and American).

English written 324,304 words 2005

Corpus Nederlands door Natives (CNN)

Argumentative texts by students

Argumentative writing tasks written by second-year students.

Dutch written 2004

PLECI

Poitiers-Louvain Echange de Corpus Informatisés

English-French bidirectional translation corpus.

English, French written 4 million words 1998

LINDSEI

Louvain International Database of Spoken English Interlanguage

The Louvain International Database of Spoken English Interlanguage (LINDSEI) is a corpus of informal interviews with higher intermediate to advanced EFL learners from a range of mother tongue backgrounds.

English oral, learners Over 1 million words; 554 interviews 1995

LOCNEC

Louvain Corpus of Native English Conversation

The Louvain Corpus of Native English Conversation (LOCNEC) is a corpus of informal interviews with British university students, which is the native counterpart of the Louvain International Database of Spoken English Interlanguage (LINDSEI).

English oral About 170,000 words, of which some 120,000 were produced by the interviewees 1995

The French Academic wRiting (FAR) corpus

The French Academic wRiting (FAR) corpus is a corpus of native novice French academic writing that was compiled at Université catholique de Louvain (Belgium).

French written 344,298 words 01/04/2018

DELF/DALF Corpus

A corpus of written and oral learner productions (A1-C2) from the DELF/DALF exam (Diplôme d'études en langue française, Diplôme approfondi de langue française). (Currently being compiled)

French written, oral, learners

MUST

Multilingual Student Translation corpus

The Multilingual Student Translation (MUST) corpus is a corpus of translations produced by foreign language learners or trainee translators collected collaboratively by a large number of partner teams internationally. The corpus is only accessible to the members of the MUST network.

Multilingual written, learners

VALIBEL

French spoken in Belgium

Large corpus of spoken French (text-sound alignment) illustrating geographical, social, and stylistic variation.

French oral