No. | Corpus | Language | Timespan | Size | Annotation | Availability | Licence | Additional comments | VLO
---|---|---|---|---|---|---|---|---|---
2 | Hungarian Historical Corpus | Hungarian | 1772-2010 | 30 million words | | Concordancer | | Avail. through dedicated website | yes
3 | Medieval Charter Sections Corpus | Czech, Latin | 14th century | 57 chapters | manually tagged, named entities | Download | CC-BY-NC-SA 4.0 | LINDAT | yes
4 | Sheffield Corpus of Chinese | Chinese | | | | Download | CC-BY-NC-SA 3.0 | Oxford Text Archive | yes
5 | Reference corpus of historical Slovene goo300k 1.2 | Slovenian | 1584-1899 | 300,000 tokens | manually tokenised, lemmatised, PoS-tagged, modern synonyms for archaic words | Download, concordancer | CC-BY 4.0 | CLARIN.SI, KonText | yes
6 | Digital library and corpus of historical Slovene IMP 1.1 | Slovenian | 1584-1919 | 17.7 million tokens | tokenised, lemmatised, PoS-tagged | Download, concordancer | CC-BY-SA 4.0 | CLARIN.SI, KonText | yes
7 | IMP corpus n-grams 1.0 | Slovenian | 1584-1919 | 2.5 million n-grams | | Download | CC-BY-SA 4.0 | CLARIN.SI; ET: not sure if this really belongs here, as it is not a corpus | yes
8 | Corpus of Historical American English - Kielipankki Korp version 2017H1 | American English | 1810-2009 | 385 million tokens | tokenised | Concordancer | CLARIN ACA | Kielipankki, Korp | yes
9 | Historical Corpus of the Welsh Language 1500-1850 | Welsh | 1500-1850 | 420,000 words | | Download, concordancer | | Avail. through dedicated website | yes
10 | GerManC: A Historical Corpus of German Newspapers 1650-1800 | German | 1650-1800 | 800,000 words, sampled by genre | | Download | CC-BY-NC-SA 3.0 | Oxford Text Archive | yes
11 | The Old Bailey Corpus | Late Modern English | 1720-1913 | 134 million words | Detailed sociobiographical, pragmatic and textual annotation | Download, concordancer | CC-BY-NC-SA 4.0 | CLARIN-D, CLARIN Federated Content Search available | yes
12 | "PolDiLemma" Middle Polish Diachrone Lemmatised Corpus | Polish, German, Latin, Czech | 16th-18th century | | lemmatised | Download | Public Domain | CLARIN-D, CLARIN Federated Content Search available | yes
13 | Helsinki Corpus of Scottish Correspondence (1540-1750) | English | 1540-1750 | 0.5 million tokens | tokenised | Concordancer | CLARIN ACA | Kielipankki, Korp | yes
14 | Parsed Corpus of Early English Correspondence (PCEEC) | English | 1410-1681 | 2.2 million words | tokenised, PoS-tagged, syntactically parsed | Download (need to "apply for approval") | | Oxford Text Archive | yes
15 | B4 Tatian Corpus of Deviating Examples 2.1 | Latin, Old High German | 9th century | 11,300 tokens | tokenised, MSD-tagged | Download, concordancer | CC-BY | University of Hamburg | yes
16 | Syntactic Reference Corpus of Medieval French | Old French | 9th-13th century | 245,000 tokens | syntactically parsed | Download | CLARIN ACA | CLARIN-D (external site?) | yes
17 | Hamburg Corpus of Old Swedish with Syntactic Annotations (HaCOSSA) | English, German, Latin, Old Norse, Swedish | | 128,204 words | syntactic and morphological annotation | Download | CLARIN RES | University of Hamburg | yes
18 | Deutsches Textarchiv (DTA) | German | 1600-1900 | | | | CLARIN PUB | LINDAT | yes
19 | Reference Corpus Middle Low German/Low Rhenish (1200-1650) | Middle Low German | 1200-1650 | 200,700 tokens | tokenised, MSD-tagged | Download | CC-BY | University of Hamburg | yes
20 | B4 Ludolf | Middle Low German | 1350 | 6,690 tokens | tokenised, tagged for clause type and grammatical function | | CLARIN ACA | University of Hamburg | yes
21 | B4 Historisches Predigtenkorpus zum Nachfeld | Middle High German | | 9,2500 tokens | tokenised, syntactic, discursive annotation | | CLARIN ACA | University of Hamburg | yes
22 | Mannheimer Korpus Historischer Zeitungen und Zeitschriften | German | 18th and 19th centuries | 750 volumes, 3532 pages overall | | Download | | | yes
23 | Menota | Old Norse | | 1.6 million tokens | tokenised, MSD-tagged, lemmatised | Concordancer | CC-BY | CLARINO, Corpuscle | no
24 | Greek Medieval Texts | Ancient Greek | 4th-16th century | 3.4 million words | | Available - Unrestricted Use | CC-BY | clarin:el | no
25 | Austrian Baroque Corpus | Austrian | 1650-1750 | 200,000 | tokenised, PoS-tagged, lemmatised, named entities | Concordancer | | Clarin Austria | no
26 | Corpus Informatizado do Português Medieval | Portuguese | 9th to 16th centuries | 2 million | tokenised, PoS-tagged | Concordancer | | Avail. through dedicated website | no
27 | Parsed Corpus of Historical Portuguese | Portuguese | 1380-1881 | 3.3 million | tokenised, PoS-tagged (2 million), treebanked (1.2 million) | | | Avail. through dedicated website | no
28 | OROSSIMO Corpus - History | Greek | n/a | 553,131 tokens | Structural annotation (paragraph) | Download | CC-BY | clarin:el | no
29 | ARCHER Corpus | English | 1600-1999 | | none | Restricted online access (users must apply, signed user agreement required) | none | Curated by University of Manchester; interface is likely to be CQPweb | no
30 | Historical Corpora at Lancaster University | English | 1500- | Numerous resources; millions of tokens | Word-class tagging, in some cases also semantic tagging (USAS system) | Restricted online access (users can register online; access conditions vary by corpus, and some are restricted to UK users) | none | Numerous historical corpora available via | no
31 | Older Scottish texts : the Edinburgh DOST corpus / A.J. Aitken, Paul Bratley and Neil Hamilton-Smith | English | 1450-1600 | 877,000 tokens | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
32 | Anthology of Middle English texts / Santiago Gonzalez y Fernandez-Corugedo | English, Middle (1100-1500); English; Hebrew | 1100-1400 | 4,000 words | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
33 | Helsinki corpus of English texts | English; English, Old (ca. 450-1100); English, Middle (1100-1500) | 730-1710 | 240,000 words | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
34 | Corpus of biblical text in Scots / John Kirk | Scots | | not known | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
35 | Pamphlets of the American Revolution : [selections] / edited by Bernard Bailyn | English | 1750-1776 | | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
36 | Corpus of Late Modern English prose / David Denison | English | 1837-1926 | | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
37 | The Helsinki corpus of Older Scots : [1450-1700] | Scots | 1450-1700 | | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
38 | The Lampeter Corpus of Early Modern English Tracts | English | 1640-1740 | | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
39 | Paris speech in the past | French, Middle (ca. 1400-1600); French | 2000-07 | | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
40 | The York-Helsinki parsed corpus of Old English poetry (YCOEP) | English, Old (ca. 450-1100) | 730–1710 | | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
41 | Corpus of Early English Correspondence Sampler (CEECS) | English | 1418–1680 | | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
42 | The York-Toronto-Helsinki Parsed Corpus of Old English prose (YCOE) | English, Old (ca. 450-1100); Latin | 600-1150 | | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
43 | The English language of the north-west in the late Modern English period: a Corpus of late 18c Prose | English | 1761-90 | | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
44 | Polish language of the 1960s | Polish | 1963-1967 | | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
45 | Dictionary of Old English Corpus in Electronic Form (DOEC) | English, Old (ca. 450-1100); Latin | 600-1150 | | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
46 | Partonopeus de Blois: transcriptions of all manuscripts and fragments | French, Old (ca. 842-1400) | 1166-1199 | not known | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
47 | A Corpus of English Dialogues 1560-1760 (CED) | English | 1560-1760 | not known | none | Download | Oxford Text Archive licence | Oxford Text Archive | yes
48 | Parsed Corpus of Early English Correspondence (PCEEC) | English; English, Middle (1100-1500) | 1410-1695 | 2.2 million words | POS-tagging and parsing | Download | Oxford Text Archive licence | Oxford Text Archive | yes
49 | The Electronic Text Corpus of Sumerian Literature. Revised edition. | English; Sumerian | 2100 BCE-1700 BCE | not known | Each word form in the composite transliterations has been assigned to a lexeme which is specified by a citation form, word class information and basic English translation. | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
50 | The Lancaster Newsbooks Corpus | English | 1654-1655 | not known | none | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive | yes
51 | GeMi Corpus | German | 1500-1700 | 119,802 tokens | TEI Lite markup, no linguistic annotation | Download | http://creativecommons.org/licenses/by-nc-sa/3.0/ | Oxford Text Archive; full title: The Nottingham Corpus of Early Modern German Midwifery and Women's Medicine (ca. 1500-1700) | yes
52 | EEBO-TCP | English | 1450-1700 | 766 million tokens | TEI P5 markup, no linguistic annotation | Download | CC-0 | Oxford Text Archive; the 'corpus' is thousands of texts, available individually for download | yes
53 | ECCO-TCP | English | 1700-1800 | 74 million tokens | TEI P5 markup, no linguistic annotation | Download | CC-0 | Oxford Text Archive; the 'corpus' is thousands of texts, available individually for download | yes
54 | EVANS-TCP | English | 1640-1821 | 102 million tokens | TEI P5 markup, no linguistic annotation | Download | CC-0 | Oxford Text Archive; the 'corpus' is thousands of texts, available individually for download | yes
55 | Hansard Corpus | English | 1803-2005 | 1.6 billion words | POS-tags, lemmas, semantic tags | Concordancer | none | corpus.byu.edu (Brigham Young Corpora) | no
56 | Corpus testuale del Tesoro della Lingua Italiana delle Origini | Italian | | 23 million tokens | Lemmas | Web concordancer | unknown | Avail. through dedicated website | no
57 | DiaCORIS | Italian | 1861-1945 | | | Web concordancer | unknown | Avail. through dedicated website | no
58 | M.I.DIA. (Morfologia dell'Italiano in DIAcronia) | Italian | 13th-20th cent. | 7.5 million tokens | | Web concordancer | CC-BY-NC 4.0 | Avail. through dedicated website | no
59 | Archivio Datini | Italian | | | Lemmas | Web concordancer | unknown | Avail. through dedicated website | no
60 | Frantext | French | 10th-21st cent. | 297,586,781 words | Lemmas, POS-tags | Web concordancer | unknown | Available by paid subscription | no
61 | eFontes Mediae et Infimae Latinitatis Polonorum (Elektroniczny korpus polskiej łaciny średniowiecznej) | Polish, Latin | 1000–1550 | 5 million tokens | Lemmata | Web concordancer | unknown | | no
62 | Corpus of 16th-century Polish (Korpus polszczyzny XVI wieku) | Polish, Latin | 16th century | | TEI P5 markup, lemmata, transcription | Corpus search | unknown | | no
63 | The Electronic Corpus of 17th- and 18th-century Polish (Korpus tekstów polskich z XVII i XVIII w.) | Polish, Latin | 1601–1772 | 12 million tokens | POS tags (for 0.5M tokens), rich structural annotation | Corpus search | unknown | | no
64 | Corpus of Old Polish texts until 1500 (Korpus tekstów staropolskich do roku 1500) | Polish, Latin | ?–1500 | 620,000 tokens | TEI P5 markup, no linguistic annotation | Data download | unknown | | no
65 | Corpus of 19th-century Polish (Korpus polszczyzny XIX-wiecznej) | Polish | 1830–1918 | 625,000 tokens | Lemmata, POS tags, transliteration, transcription | Corpus search | unknown | | no
66 | 15th-century New Testament translations (Piętnastowieczne przekłady Nowego Testamentu – elektroniczna konkordancja staropolska) | Polish, Latin | 1380–1500 | 400,000 tokens | TEI P5 markup, no linguistic annotation | Data download, translation browser, Polish and Latin word lists | unknown | | no
67 | IMPACT GT corpus (Korpus GT projektu IMPACT) | Polish | 1570–1756 | 1.5 million tokens | transcription | Corpus search | unknown | | |
68 | Chronopress | Polish | 1945–1954 | 16 million tokens | | Web concordancer | CC BY SA | | no
69 | Bundesblatt/Feuille fédérale/Foglio federale | German/French/Italian | 1849-2014 | 203,585,806 tokens (German), 239,125,036 tokens (French), 85,223,085 tokens (Italian) | TreeTagger (all data), RFTagger (German data) | CQPweb | | Université de Genève; SNF project linked to this corpus, which contains documents published by the Swiss Federal Council: http://p3.snf.ch/project-143585 | no
70 | DIAKORP v6 | Czech | 14th-20th century | 4 million tokens | currently only basic structural markup | Web concordancer | CC BY NC SA | Also available for download upon request from the Czech National Corpus | no
71 | Old Hungarian Corpus | Hungarian | 12th-17th century | 3 million tokens | segmented into tokens and sentences; partly normalized (to modern Hungarian spelling), partly morphologically tagged; locus markers | Download, concordancer | freely available for everyone | Avail. through dedicated website | not yet
72 | Corpus of Old and Middle Hungarian court records and private correspondence | Hungarian | 16th-18th century | 850,000 words | tokenised, lemmatised, morphosyntactically tagged, sociolinguistic metadata added | Concordancer | freely available for everyone | Avail. through dedicated website | not yet
73 | Mikes dictionary | Hungarian | 1717-1761 | 1.5 million words | lemmatised | Concordancer | freely available for everyone | Avail. through dedicated website | not yet
74 | Deutsches Textarchiv (German Text Archive, DTA) | German | 1600–1900 | 211 million tokens (still growing) | TEI text structures; tokenized, lemmatized, POS, normalized orthography | Download, Corpus Search, Text-Image-Display | Creative Commons (CC BY-NC, CC BY-SA, CC BY) | Berlin-Brandenburg Academy of Sciences and Humanities (BBAW), CLARIN-D; manual transcription + TEI annotation, automatic linguistic annotation; wide range of text types; DTA subcorpora | yes
75 | Dinglers Polytechnisches Journal (Polytechnical Journal of Dingler) | German | 1820–1931 | 77.5 million tokens | TEI text structures; tokenized, lemmatized, POS, normalized orthography | Download, Corpus Search, Text-Image-Display | CC BY-NC-SA 3.0 DE | Berlin-Brandenburg Academy of Sciences and Humanities (BBAW), CLARIN-D; manual transcription + TEI annotation, automatic linguistic annotation | yes
76 | Referenzkorpus Mittelhochdeutsch (Middle High German Reference Corpus) | German | 1050–1350 | 2.5 million tokens | tokenized, lemmatized, POS, normalized orthography, morphosyntactic description | Download, Corpus Search | CC BY-SA 4.0 International | Berlin-Brandenburg Academy of Sciences and Humanities (BBAW), CLARIN-D; manual transcription + linguistic annotation | yes
77 | Die Grenzboten (journal) | German | 1842–1921 | 89 million tokens | basic TEI text structures; tokenized, lemmatized, POS, normalized orthography | Download, Corpus Search, Text-Image-Display | free | Berlin-Brandenburg Academy of Sciences and Humanities (BBAW), CLARIN-D; OCR, computer-aided TEI annotation, automatic linguistic annotation | yes
78 | TreeTagger: Middle High German parameter file | German; Middle High German | 1100-1500 | 10 million tokens | tokenized, lemmatized, POS | Download | free | Institute for Natural Language Processing, University of Stuttgart; CLARIN-D; Middle High German Conceptual Database; CRETA | yes
79 | OCR post-correction | German (Antiqua and Fraktur) | 18th-20th century | | | Web application | free | Institute for Natural Language Processing, University of Stuttgart; CLARIN-D; OCR, post-correction; CRETA | yes
80 | Part-of-speech tagging: mixed text | Latin, Middle English | | | | Web application | free | Institute for Natural Language Processing, University of Stuttgart; CLARIN-D; CRETA | yes
81 | DDR-Presseportal (GDR press portal) | German | 1945-1994 | 1.1 billion tokens | basic TEI text structures; tokenized, lemmatized, POS, normalized orthography | Corpus Search | CLARIN ACA | Berlin-Brandenburg Academy of Sciences and Humanities (BBAW), CLARIN-D; OCR, computer-aided TEI annotation, automatic linguistic annotation | no
82 | Brieven als buit (Letters as loot) | Dutch | 17th-18th century | 460,000 words (1,000 letters) | manually transcribed (diplomatically), automatically lemmatised and grammatically tagged | Concordancer | free | Dutch Language Institute (INT) | no
83 | Corpus Gysseling | Dutch | 13th century | 1.5 million words | manually lemmatised and POS-tagged | Concordancer, download | INT licence for researchers | Dutch Language Institute (INT) | no
84 | The Morpho-Syntactic Database of Mikael Agricola's Works | Finnish | 1544-1551 | 83,678 sentences; 428,314 tokens; 38,308 words | Turku Dependency Parser: keyword, part of speech, morphological components and syntactical function | Interface | CC BY ND | Kielipankki Korp | no
85 | The Finnish Gutenberg Corpus | Finnish | up to 1925 (IPR expired) | 2,457,531 sentences; 34,487,420 words | | Interface | CC BY | Kielipankki Korp | yes
86 | Aleksis Kivi Corpus (SKS) | Finnish, Swedish | 1834–1872 | 52,821 sentences; 413,735 words | | Interface | CC BY NC | Kielipankki Korp | yes
87 | Finnish Folk Poetry | Multilingual | 1564-1939 | 1,435,012 sentences; 7,141,783 words | | Interface | CC BY NC | Kielipankki Korp | yes
88 | Classics of Finnish Literature, Kielipankki Version | Finnish | 1880-1949 | 1,500,000 words | | Interface, Download | EUPL v.1.1 SA | Kielipankki Korp, Kielipankki Download | yes
89 | Corpus of Old Literary Finnish | Finnish | 1543-1810 | 167,400 sentences; 4,133,202 words | | Interface | EUPL v.1.1 SA | Kielipankki Korp | yes
90 | Corpus of Early Modern Finnish, Kielipankki Version | Finnish | 1809-1899 | 8,600,000 words | | Interface | EUPL v.1.1 SA | Kielipankki Korp | yes
91 | The Letters of Paul Sinebrychoff, Kielipankki Version | Finnish, Swedish | 1895-1909 | 100,000 words | | Interface | CC BY | Kielipankki Korp, Subcorpus Finnish, Subcorpus Swedish | Finnish subcorpus no, Swedish subcorpus yes
92 | The Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version | Finnish, Swedish | 1770-2011, approx. 10 corpora per decade | 612,061,367 sentences; 8,728,581,153 words | | Interface | CC BY | Kielipankki Korp; the Finnish and Swedish subcorpora are listed as separate entries; n-grams available for both | Main corpus yes, subcorpora no, n-grams for both subcorpora yes
93 | Classics Library of the National Library of Finland - Kielipankki version | Finnish, Swedish | 1549-1944 | 692 works in Finnish, 285 works in Swedish (forthcoming) | | Interface, Download (forthcoming) | CC BY | Kielipankki Korp, Kielipankki Download, Subcorpus Finnish, Subcorpus Swedish | no
94 | Virtual Old Literary Finnish (VVKS) - Kielipankki Korp version | Finnish | 1543-1791 | 48 texts | | Interface, Download | CC BY NC ND | Kielipankki Korp, Kielipankki Download | no
95 | The Newspaper and Periodical OCR Corpus of the National Library of Finland (1771-1874) | Finnish, Swedish | 1771-1874 | 15 GB | | Download | CC BY | Kielipankki Download | yes
96 | The Newspaper and Periodical OCR Corpus of the National Library of Finland (1875-1920) | Finnish, Swedish | 1875–1920 | 8,740,000,000 tokens; 371 GB | | Download | CLARIN ACA | Kielipankki Download | no
97 | Open Richly Annotated Cuneiform Corpus, Korp Version | cuneiform | ancient | 741,129 tokens | | Interface | CC BY SA | Kielipankki Korp | yes
98 | The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version | Finnish | 1840-2011 | 5,246,334,710 tokens | | Interface | CC BY SA | Kielipankki Korp | no
99 | The Swedish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version | Swedish | 1770-1950 | 3,481,646,321 tokens | | Interface | CC BY SA | Kielipankki Korp | no
100 | Corpus of Old Written Estonian | Estonian | 1224-1227, 1485-1889 | 134 texts; 2,155,435 tokens in total, of which 1,718,114 are in Estonian | Texts are in their original written form. 16th-18th-century texts are tagged with contemporary Estonian, morphological and language information; 19th-century texts are unannotated. | Interface | CC BY NC | CELR Meta-Share | yes