39 corpora found

The Varieties of English for Specific Purposes dAtabase (VESPA) learner corpus
English | written | 2022

International Corpus of Learner English
The International Corpus of Learner English (Version 3) is a corpus of writing by higher intermediate to advanced learners of English. It contains 5.5 million words of EFL writing from learners representing 25 different mother tongue backgrounds.
English | written | 5.5 million words | 2020

Corpus électronique Alzheimer (Alzheimer Electronic Corpus)
Corpus of electronic messages written by patients diagnosed at the pre-clinical stage of Alzheimer's disease.
French | written | 2020

Corpus de messages Twitter issus de la communauté francophone de Belgique (Corpus of Twitter messages from the French-speaking community of Belgium)
Open corpus of Twitter posts from the French-speaking community of Belgium.
French | written | 38 million tweets | 2019

Spanish Narrative Spoken - L2/L1
Corpus of short spontaneous narratives produced by French-speaking students of ELE (Spanish as a foreign language).
Spanish | video, learners | 2018 [collection in progress]

The Corpus of Academic writing in fRench and English
English, French | written | 2018

Corpus de messages belges et français issus de 43 comptes Facebook (Corpus of Belgian and French messages from 43 Facebook accounts)
Corpus of Facebook messages drawn from posts and instant messaging. The messages come from France and Belgium.
French | written | 22 million words | 2018

A corpus of short answers written by learners of English and graded with CEFR levels
English | written, learners | 712 texts | 2017

Corpus de Peticiones en Interacciones Naturalizadas en Español (Corpus of Requests in Naturalized Interactions in Spanish)
Corpus of requests in Spanish produced by native-speaker students and by French-speaking students of Spanish, in writing and in speech, in an academic context.
Spanish | written, oral, video, learners, natives | ca. 40 GB | 2017

Process Corpus of English in Education
PROCEED (PROcess Corpus of English in EDucation) is a process learner corpus which relies on keystroke logging and screencasting to reproduce university students' writing process.
English | written, video, learners | 2017

New Englishes Student Interviews
NESSI (New Englishes Student Interviews) is a multimodal corpus of informal interviews with university students who use English as an institutionalized second-language variety.
English | oral, video, learners | ca. 170,000 words | 2016

Multilingual Traditional Immersion and Native Corpus
MulTINCo includes spoken and (longitudinal) written data collected from French-speaking learners of Dutch and English as a second language (L2) in different educational settings (CLIL and traditional L2 classes). The database contains numerous background variables, as well as written productions in the learners’ first language (L1) (viz. French) and productions from native speakers of the learners’ L2 (viz. L1 Dutch and L1 English data).
English, Dutch | written, learners | 2015

The Tunisian Lecture Corpus (TLC) is a non-native, specialized corpus of academic lectures collected in two institutions of higher education in Tunisia in the academic year 2014-2015. The corpus comprises around 106,000 words and is made up of 12 video recordings and 1 audio recording. Thirteen lecturers of undergraduate courses in three disciplines (cultural studies, linguistics, and literature) participated in this research. Course descriptions of 9 out of 10 courses recorded are included in the corpus, in addition to metadata about the participants such as gender, age, language background, and teaching experience.
English | oral | 106,200 words, 20 hours and 50 minutes | 2015

Corpus of scientific/academic articles in Spanish
Spanish | written | 10,355,730 words | 2013

Written press published online.
French | written | 2013

French, Dutch, English, Spanish, Catalan, Galician, Basque, German | written | 2013

Written press published online.
French | written | 2013

Corpus Madrileño Oral de la Sexualidad (Madrilenian Spoken Corpus of Sexuality)
Sociolinguistic corpus of ca. 1 million words that focuses on the topic of sexuality. It is composed of 54 sociolinguistic interviews collected in 2010-2012 in Madrid on a pre-stratified sample (district, gender, age and level of education). The interviews were based on a questionnaire (indirect elicitation), conducted face-to-face, and recorded. Three interviewers participated.
Spanish | oral | 1 million words | 2013

Written press published online.
French | written | 2013

Corpus international de SMS pour la recherche scientifique (International corpus of SMS for scientific research)
Corpus of SMS messages collected between 2004 and 2012. It contains messages in French, English, German and Italian.
French, English, Italian, German | written | 150,000 SMS | 2012

Unidirectional French-English translation corpus.
English, French | written | 2 million words | 2009

May Day speeches given by Elio Di Rupo between 2002 and 2007.
French | written | 2008

Longitudinal Database of Learner English
The LONGDALE (Longitudinal Database of Learner English) is a truly longitudinal database of learner English containing data from learners from a wide range of mother tongue backgrounds. The same students are followed over a period of at least three years and data collections are organized at least once a year.
English | written, learners | ca. 780,000 words | 2008

Narratives by native speakers and learners of Dutch and French.
Narratives based on an excerpt from Modern Times (Charlie Chaplin, 1936), produced by native speakers and learners of Dutch and French.
French, Dutch | written | 2008

Varied collection of writing tasks by learners of Dutch at different levels.
Dutch | written | 2005

Louvain Corpus of Native English Essays
Corpus of essays written by native English students (British and American).
English | written | 324,304 words | 2005

Argumentatieve teksten door studenten (Argumentative texts by students)
Argumentative writing tasks written by second-year students.
Dutch | written | 2004

Poitiers-Louvain Echange de Corpus Informatisés
English-French bidirectional translation corpus.
English, French | written | 4 million words | 1998

Louvain International Database of Spoken English Interlanguage
The Louvain International Database of Spoken English Interlanguage (LINDSEI) is a corpus of informal interviews with higher intermediate to advanced EFL learners from a range of mother tongue backgrounds.
English | oral, learners | Over 1 million words; 554 interviews | 1995

Louvain Corpus of Native English Conversation
The Louvain Corpus of Native English Conversation (LOCNEC) is a corpus of informal interviews with British university students, which is the native counterpart of the Louvain International Database of Spoken English Interlanguage (LINDSEI).
English | oral | About 170,000 words, of which some 120,000 were produced by the interviewees | 1995

A corpus of written and oral learner productions (A1-C2) from the DELF/DALF exams (Diplôme d'études en langue française, Diplôme approfondi de langue française). The corpus is currently being compiled.
French | written, oral, learners

Tagged Arabic Texts of the GREgORI Project
Arabic | written

Tagged Armenian Texts of the GREgORI Project
Armenian | written

Tagged Georgian Texts of the GREgORI Project
Georgian | written

Tagged Greek Texts of the GREgORI Project
Greek | written

Modern Translations Texts of the GREgORI Project
French, English, German, Italian | written

Tagged Syriac Texts of the GREgORI Project
Syriac | written

Multilingual Student Translation corpus
The Multilingual Student Translation (MUST) corpus is a corpus of translations produced by foreign language learners or trainee translators collected collaboratively by a large number of partner teams internationally. The corpus is only accessible to the members of the MUST network.
Multilingual | written, learners

Français parlé en Belgique (French spoken in Belgium)
Large corpus of spoken French (text-sound alignment) illustrating geographical, social and stylistic variation.
French | oral