MWE Resource Form (Responses)


1	Timestamp	Name of the resource	URL	Type of resource	Language(s)	Size	Maximum length (number of words) of the annotated MWEs	Are the MWEs only contiguous or also non-contiguous?	Availability	Licence	Licence type	If you are a resource owner/developer and the resource is not available: are you interested in making it available (e.g. for research)?	Additional description of the resource	Other comments	Do you want to provide more detailed information?	Resource creator/owner	Contact email of the resource creator/owner	Relevant publications	Type of MWE description: Intensional or extensional	Size: the number of MWE base forms in the resource	Size: the number of MWE variants in the resource	Size: the number of variation patterns	Type(s) of MWEs	Special features	Grammatical framework	Lexical framework	Origin/source(s) of the MWEs in the resource	Sample entry

2	07/05/2014 06:50:01	Lexicon of Arabic Modal Multiword Expressions and Repository of their Variation Patterns	http://www.rania-alsabbagh.com/am-mwe.html	MWE dictionary or lexicon (MWEs only)	Modern Standard Arabic Egyptian Arabic	10 K	4	Also non-contiguous	Available, unrestricted use		Creative Commons (CC): http://creativecommons.org/examples				Yes (click continue to fill in more information)	Rania Al-Sabbagh	alsabba1@illinois.edu	Rania Al-Sabbagh, Roxana Girju and Jana Diesner. 2014. Unsupervised Construction of a Lexicon and a Pattern Repository of Arabic Modal Multiword Expressions. In Proceedings of the 10th Workshop of Multiword Expressions at EACL 2014, Gothenburg, Sweden, April 26-27, 2014.	Extensional								Dictionary, repository of variation patterns
3	07/05/2014 07:57:39	around 200 corpora for sixty languages	sketchengine.co.uk	Web service	60	computed at run time: millions	20	Only contiguous	Available, restricted use			yes	Terms and other MWEs will be available as a web service, as automatically identified (using grammar patterns and statistics over part-of-speech-tagged, lemmatised, very large corpora)	The survey doesn't fit our resources very well. Our resources are often the best there is for a language, so this is unfortunate	No (click continue to submit)
4	07/05/2014 08:35:15	National Corpus of Polish	http://clip.ipipan.waw.pl/NationalCorpusOfPolish	Treebank with MWE annotations	Polish	20,000 multi-word named entities	23	Also non-contiguous	Available, unrestricted use	GNU GPL v.3	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		The Named Entity level of the National Corpus of Polish is concerned. Its gold standard subcorpus, available under GPL, contains 87,300 NEs, annotated together with their nested NEs. As a result, annotation trees are provided. Over 22% of them are multi-word NEs. Coordinated NEs are annotated disjointly, which results in some discontinuities.		Yes (click continue to fill in more information)	Institute of Computer Science, Polish Academy of Sciences, with 3 partners	agata.savary@univ-tours.fr	WASZCZUK, J., GŁOWIŃSKA, K., SAVARY, A., PRZEPIÓRKOWSKI, A., LENART, M. (2013): Annotation tools for syntax and named entities in the National Corpus of Polish, in the International Journal of Data Mining, Modelling and Management, Vol. 5, No. 2, Inderscience Publishers, pp. 103-122, preprint. SAVARY, A., CHOJNACKA-KURAŚ, M., WESOŁEK, A., SKOWROŃSKA, D., , ŚLIWIŃSKI, P. (2012), "Anotacja jednostek nazewniczych", in PRZEPIÓRKOWSKI, A., BAŃKO, M., GÓRSKI, R., LEWANDOWSKA-TOMASZCZYK, B. (eds.). Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warszawa, pp. 129--167. SAVARY, A., PISKORSKI, J. (2011), Language Resources for Named Entity Annotation in the National Corpus of Polish, in Control and Cybernetics 40(2), Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland, pp. 361-391.	corpus occurrences, their lemmas and other attributes		about 20,000		Compound named entities: person names, organization names, geographical names, geopolitical names, dates, time expressions, as well as relative adjectives (e.g. Polish) and personal derivations (a varsovian) thereof.	
Outermost NEs are annotated with all their nested NEs, e.g.: [National Corpus of [Polish]] The corpus is balanced with respect to different genres.			Corpus	[ [Irlandzkej]relAdj(irlandzki;placeName(Irlandia)) Armii Republikańskiej ]orgName(Irlandzka Armia Republikańska) 'Irish Republican Army'
5	07/05/2014 10:14:04	ACL RD-TEC: a dataset for terminology extraction and classification	http://www.elra.info/Language-Resources-LRs.html	a terminological bank	English	75,0000 entries		Only contiguous	Available, unrestricted use	ELRA, free for research	ELRA, free for research	yes	This is a terminological resource, each entry is annotated as valid and invalid term, in which valid terms are further annotated as technology and non-technology terms		No (click continue to submit)
6	07/05/2014 15:30:18	Comprehensive Multiword Expressions (CMWE) Corpus	http://www.ark.cs.cmu.edu/LexSem/	Treebank with MWE annotations	English	3500 instances (2400 types)		Also non-contiguous	Available, unrestricted use	CC-BY-SA	Creative Commons (CC): http://creativecommons.org/examples		This dataset provides human annotations of multiword expressions (MWEs) for sentences in social web reviews from the English Web Treebank corpus. 55,579 words (3,812 sentences, 723 documents) were annotated. MWEs are formed by grouping together words into strong (highly idiosyncratic) or weak (loosely collocational) expressions according to our English annotation guidelines (https://github.com/nschneid/nanni/wiki/MWE-Annotation-Guidelines). For example, I will sum_ it _up~with , it was worth_every_penny ! is annotated as containing 2 strong MWEs (sum_up, worth_every_penny) and 1 weak MWE (sum_up~with). These are comprehensive annotations, i.e., for each sentence, the annotator marked all expressions deemed MWEs. Every annotation was reviewed by at least two annotators. See (Schneider et al., LREC 2014) for details. The full text of the corpus is distributed by LDC. If you do not have access to the English Web Treebank you will only be able to see the annotated MWEs, not the surrounding context. A statistical system for MWE identification that was trained on this corpus is available at the same URL.		Yes (click continue to fill in more information)			Description of the corpus and annotation process: Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, and Noah A. Smith (2014). Comprehensive annotation of multiword expressions in a social web corpus. LREC. Annotation guidelines: https://github.com/nschneid/nanni/wiki/MWE-Annotation-Guidelines Description of MWE identification tool trained on the corpus: Nathan Schneider, Emily Danchik, Chris Dyer, and Noah A. Smith (2014). 
Discriminative lexical semantic segmentation with gaps: running the MWE gamut. Transactions of the Association for Computational Linguistics 2(April):193−206. http://www.cs.cmu.edu/~nschneid/mwe.pdf					multiword named entities; a wide variety of MWEs that are idiomatic in form, function, or frequency—this includes compounds, light/support verb constructions, verb particle constructions, prepositional verbs, phrasal idioms, and collocations. Each MWE instance is simply a "strong" or "weak" grouping of tokens; there is no explicit taxonomy of MWE categories.	Distinction between strong and weak MWEs (weak MWEs can contain nested strong MWEs as constituents). Gappy (non-contiguous) MWEs are allowed and other MWEs may occur inside the gap.			annotated directly in context	I will sum_ it _up~with , it was worth_every_penny ! is annotated as containing 2 strong MWEs (sum_up, worth_every_penny) and 1 weak MWE (sum_up~with).
7	10/05/2014 07:52:01	DICI (Dictionary of Italian Collocations)	no website	MWE dictionary or lexicon (MWEs only)	Italian	11	3	Also non-contiguous				yes	It is still a work in progress		No (click continue to submit)
8	12/05/2014 06:10:46	Dictionary Development Process list of semantic domains.	http://semdom.org/	list of domains	English (The materials have been translated into a number of other languages. The translations can be downloaded from http://rapidwords.net/. However I do not know the quality or naturalness of the translations.)	None of the MWEs are annotated, except for being tagged for semantic domain.	6	Also non-contiguous	Available, unrestricted use	Creative Commons--Share Alike	Creative Commons (CC): http://creativecommons.org/examples		The list of domains includes example words and MWEs for each domain. The list is posted at http://semdom.org/. The list can be downloaded from http://rapidwords.net/.		No (click continue to submit)
9	15/05/2014 12:54:31	Wiktionary English phrasal verbs	http://en.wiktionary.org/wiki/Category:English_phrasal_verbs	Monolingual list of MWEs	English	2110							English verbs accompanied by particles, such as prepositions and adverbs.		No (click continue to submit)
10	15/05/2014 12:56:03	Wiktionary English idioms	http://en.wiktionary.org/wiki/Category:English_idioms	Monolingual list of MWEs	English	7894							English phrases understood by subjective, as opposed to literal, meanings.		No (click continue to submit)
11	20/05/2014 01:07:56	Proposition Bank	https://catalog.ldc.upenn.edu/LDC2004T14	Treebank with MWE annotations	English			Also non-contiguous	Available, restricted use	LDC User Agreement for Non-Members Subscription & Standard Members, and Non-Members	LDC User Agreement for Non-Members		PropBank annotation was developed to provide training data for supervised machine learning classifiers. It provides semantic information, including the basic “who is doing what to whom,” in the form of predicate-by-predicate semantic role assignments. The annotation involves selection of a roleset, a coarse-grained sense of the predicate, which has a listing of the roles expressed as argument numbers associated with that sense. E.g., the roleset for Take.01: Take.01: acquire, come to have, choose, bring Arg0: Taker Arg1: Thing taken Arg2: Taken-from, source of thing taken Arg3: Destination The roleset and example sentences from frame files serve as a guide to annotators on how to assign argument numbers to annotation instances. The goal is to assign these labels across the many possible syntactic realizations of the same semantic role. The recent expansion of PB to provide coverage for noun, adjective, and complex predicates such as MWEs has enriched the semantics that PB is able to capture, but it has created an overwhelming number of new rolesets. To alleviate this, PB has opted to begin unifying frame files through a process of ‘aliasing’(Bonial et al., 2014), in which related concepts are aliased to each other and unified so that there is a single roleset representing all instantiations. Extending aliasing to a variety of MWEs is explored, such that take it easy, as in “I’m just going to take it easy,” would be aliased to the existing lexical verb roleset for relax.	Type of resource: Proposition bank on top of a treebank For further information, see https://catalog.ldc.upenn.edu/. References: Claire Bonial, Julia Bonn, Kathryn Conger, Jena D. Hwang and Martha Palmer. 
In preparation. PropBank: Semantics of New Predicate Types. Proceedings of the Language Resources and Evaluation Conference - LREC-2014. Reykjavik, Iceland.	Yes (click continue to fill in more information)	Martha Palmer	Martha.Palmer@colorado.edu	http://verbs.colorado.edu/~mpalmer/projects/ace.html	Extensional				Phrasal verbs, light verb constructions, verbal expressions.	This is a proposition bank on top of a treebank.				Roleset id: take.26 , Project anger on someone, idiomatic, Source: , vncls: , framnet: take.26: Roleset added due to instances in CallHome corpus. Framed by Claire. No VN class. Roles: Arg0-PAG: angry person Arg1-PPT: usually "it", thing causing anger Arg2-GOL: person anger is projected on Example: Typical Usage person: ns, tense: ns, aspect: ns, voice: ns, form: ns Whether they take it out on Governor Schwartzeneggar in California could be another test of that as well. Arg0: they Rel: [take][out] Arg1: it Arg2: on Governor Schwartzeneggar Argm-loc: in California
12	20/05/2014 10:00:59	Lassy Small	http://www.let.rug.nl/~vannoord/Lassy/	Treebank with MWE annotations	Dutch	30.557	57	Also non-contiguous	Available, unrestricted use	academic free, fee for commercial use see http://tst-centrale.org/nl/producten/corpora/lassy-klein-corpus/6-66?cf_product_name=Lassy+Klein-corpus http://tst-centrale.org/nl/producten/corpora/lassy-klein-corpus-commercieel/6-83?cf_product_name=Lassy+Klein-corpus+commercieel			LASSY (Large Scale Syntactic Annotation of written Dutch) is a STEVIN project. STEVIN is a Flemish-Dutch Language and Speech Processing Technology Programme launched by de Nederlandse Taalunie. The STEVIN programme office is run jointly by NWO Humanities Division and SenterNovem. A large corpus of written Dutch texts (1,000,000 words) has been syntactically annotated (manually corrected), based on D-COI and its successor. In addition, a very large corpus (almost 700,000,000 words) has been syntactically annotated automatically. The project extends the available syntactically annotated corpora for Dutch both in size as well as with respect to the various text genres and topical domains. In addition, various browse and search tools for syntactically annotated corpora have been developed and made available. Their potential for applications in corpus linguistics and information extraction is illustrated and evaluated in a series of case studies. See also @incollection{van2013large, title={Large scale syntactic annotation of written Dutch: Lassy}, author={Van Noord, Gertjan and Bouma, Gosse and Van Eynde, Frank and De Kok, Daniel and Van der Linde, Jelmer and Schuurman, Ineke and Sang, Erik Tjong Kim and Vandeghinste, Vincent}, booktitle={Essential Speech and Language Technology for Dutch}, pages={147--164}, year={2013}, publisher={Springer} }		No (click continue to submit)
13	20/05/2014 10:12:47	Alpino Treebank	http://www.let.rug.nl/~vannoord/trees/	Treebank with MWE annotations	Dutch	2704	11	Only contiguous	Available, unrestricted use	no licence			The Alpino treebank contains syntactically annotated Dutch sentences. The treebank (more than 150,000 words) includes the full cdbl (newspaper) part of the Eindhoven corpus. The Alpino Treebank was released in 2002. In the mean-time, our treebanking efforts have led to various corrections of the actual annotations, improvements of the various tools we use, and differences in the actual XML-format that we use for the annotations.		Yes (click continue to fill in more information)	Gertjan van Noord	g.j.m.van.noord@rug.nl	Robert Malouf, Gertjan van Noord. Wide Coverage Parsing with Stochastic Attribute Value Grammars. In: IJCNLP-04 Workshop Beyond Shallow Analyses - Formalisms and statistical modeling for deep analyses. Leonoor van der Beek, Gosse Bouma, Robert Malouf, Gertjan van Noord. The Alpino Dependency Treebank. In: Computational Linguistics in the Netherlands CLIN 2001. Rodopi 2002. Leonoor van der Beek, Gosse Bouma, and Gertjan van Noord. Een brede computationele grammatica voor het Nederlands. Nederlandse Taalkunde, 2002. Gosse Bouma and Geert Kloosterman. Querying dependency treebanks in XML. In Proceedings of the Third international conference on Language Resources and Evaluation (LREC), Gran Canaria, 2002. Gosse Bouma, Gertjan van Noord, Robert Malouf. Alpino: Wide Coverage Computational Analysis of Dutch. In: Computational Linguistics in the Netherlands CLIN 2000. Rodopi 2001.	treebank				named entities idiomatic expressions foreign language		dependency treebank		Corpus
14	20/05/2014 10:27:11	DuELME	http://tst-centrale.org/nl/producten/lexica/duelme/7-35?cf_product_name=DuELME	MWE dictionary or lexicon (MWEs only)	Dutch	5000		Also non-contiguous	Available, restricted use	academic free, fee for commercial use	TST-Centrale		The paper describes a 5,000-entry corpus-based multi-word expression lexical database for Dutch developed using these methods. The database has been externally validated, and its usability has been evaluated in NLP systems for Dutch. The MWE database developed fills a gap in existing lexical resources for Dutch. The generic methods and tools for MWE identification and lexical representation focus on Dutch, but they are largely language-independent and can also be used for other languages, new domains, and beyond this project. The research results and data described in this paper have therefore significantly contributed to strengthening the digital infrastructure for Dutch, and will continue to do so in the context of the CLARIN research infrastructure.		Yes (click continue to fill in more information)			@incollection{odijk2013identification, title={Identification and lexical representation of multiword expressions}, author={Odijk, Jan}, booktitle={Essential Speech and Language Technology for Dutch}, pages={201--217}, year={2013}, publisher={Springer} }	not sure				mostly verbal expressions	LMF version exists	generic		Dictionary, Corpus
15	26/05/2014 17:25:45	Pattern Dictionary of English Prepositions (PDEP)	http://www.clres.com/db/TPPEditor.html	Dictionary or lexicon with MWEs (also includes MWEs)	English	Approximately 270 English phrasal prepositions	4	Only contiguous	Available, unrestricted use		GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		The Pattern Dictionary of English Prepositions (PDEP) provides a comprehensive inventory of English prepositions, including phrasal prepositions. PDEP provides a sense-annotated corpus for these prepositions and characterizes their behavior in prototypical syntagmatic patterns. Included in this description is the class to which each sense belongs, enabling an examination of properties across prepositions, such as spatial or temporal prepositions.		No (click continue to submit)
16	30/05/2014 11:00:12	Multilingual Collocation Dictionary	no website	Dictionary or lexicon with MWEs (also includes MWEs)	French, Romanian, German	250 multilingual entries	3	Also non-contiguous	Available, restricted use	CC-BY-NC academic use only, no derivatives	Creative Commons (CC): http://creativecommons.org/examples	yes	The multilingual dictionary contains trilingual entries (verbo-nominal collocations) for French, for Romanian and for German. We represent verbo-nominal collocations, with their morpho-syntactic properties (preference for specific number, case or gender, for voice, for some prepositions). Examples extracted from corpora and their frequency are also available		No (click continue to submit)
17	02/06/2014 21:59:15	Collection of Distributionally Idiosyncratic Items (CoDII)	http://www.english-linguistics.de/codii/	Multilingual list of MWEs	English, German	English: < 100 German: > 400	4	Also non-contiguous	Available, unrestricted use			yes	The Collection of Distributionally Idiosyncratic Items (CoDII) is a linguistic resource on lexical items which have highly idiosyncratic occurrence patterns, such as bound words. Rather than being a general MWE resource, it documents only bound words (and the expressions containing them). The bound words and the corresponding expressions can be downloaded as txt files from: http://multiword.sourceforge.net/PHITE.php?sitesig=FILES&page=FILES_20_Data_Sets Files: German_CE_Trawinski, English_CE_Trawinski		Yes (click continue to fill in more information)	Frank Richter, Beata Trawinski, Manfred Sailer	sailer@em.uni-frankfurt.de	http://www.english-linguistics.de/codii/index.html	Intensional				MWEs with bound words	Various linguistic classifications of the MWE are included			manually, based on the phraseological literature
18	06/06/2014 17:35:05	English DELA e-dictionary	http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html	Dictionary or lexicon with MWEs (also includes MWEs)	English	296,606 simple word forms for 150,145 different lemmas 132,990 multi-word forms for 69,912 different lemmas	8	Only contiguous	Available, unrestricted use	LGPL-LR	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		The file contains inflected forms and lemmas for both single and compound words. Example of a compound entry: waves of immigrants,wave of immigrants.N+NPN+z1:p Inflected form: waves of immigrants Lemma: wave of immigrants Category: N (noun) Syntactic structure: NPN (noun preposition noun) Popularity: z1 (frequently used) Morphological features: p (plural)		Yes (click continue to fill in more information)	CHROBOT, A., COURTOIS, B., HAMANI, M., GROSS, M., ZELLAGUI, K.	agata.savary@univ-tours.fr	SAVARY, A. (2000): Recensement et description des mots composés - méthodes et applications. Thèse de doctorat en Informatique Fondamentale (PhD Thesis), Université de Marne-la-Vallée. (in French)	Extensional	69912	132990		Contiguous general language MWEs, mainly compound nouns and adjectives.	Popularity: z1 (frequently used)	None	Corpus processors: Unitex, NooJ	Dictionary	waves of immigrants,wave of immigrants.N+NPN+z1:p Inflected form: waves of immigrants Lemma: wave of immigrants Category: N (noun) Syntactic structure: NPN (noun preposition noun) Popularity: z1 (frequently used) Morphological features: p (plural)
19	06/06/2014 17:37:09	French DELA	http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html	Dictionary or lexicon with MWEs (also includes MWEs)	French	683,824 forms of simple words for 102,073 different lemmas 108,436 compound forms for 83,604 different lemmas		Only contiguous	Available, unrestricted use	LGPL-LR	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html				No (click continue to submit)
20	06/06/2014 17:49:29	SAWA - a Grammatical Lexicon of Warsaw Urban Proper Names	http://zil.ipipan.waw.pl/SAWA	MWE dictionary or lexicon (MWEs only)	Polish	300000	6	Only contiguous	Available, unrestricted use	CC BY-SA	Creative Commons (CC): http://creativecommons.org/examples		Contains proper names of the places and institutions related to the Warsaw transportation system. Almost all of the names are multi-word.		Yes (click continue to fill in more information)	IPIPAN Warsaw	Malgorzata.Marciniak@ipipan.waw.pl	SAVARY, A., RABIEGA-WIŚNIEWSKA, J., WOLIŃSKI, M. (2009): Inflection of Polish Multi-Word Proper Names with Morfeusz and Multiflex, in MARCINIAK, M., MYKOWIECKA, A. (eds.) "Aspects of Natural Language Processing", Lecture Notes in Computer Science 5070, Springer Verlag, pp. 111–141. http://www.info.univ-tours.fr/~savary/Papers/savary-et-al-LNAI-2009.pdf MARCINIAK, M., RABIEGA-WIŚNIEWSKA, J., SAVARY, A., WOLIŃSKI, M., HELIASZ, C. (2009): Constructing an Electronic Dictionary of Polish Urban Proper Names, in Recent Advances in Intelligent Information Systems (Proceedings of the Balto-Slavonic Natural Language Processing Workshop, Kraków), Academic Publishing House EXIT, Warsaw, pp. 743–749. http://www.info.univ-tours.fr/~savary/Papers/marciniak-et-al-BSNLP-2009.pdf	both	9000	300000	450	Proper names of places and institutions related to the Warsaw transportation system (streets, squares, bus stops, bridges, people after whom streets are named, etc.)	Includes old variants of street and square names, notably those before 1989. Nested NEs are delimited and factorized. Morphosyntactic variants are represented.		Multiflex (http://www.springerlink.com/content/n265j22n73084433/), Morfeusz (http://sgjp.pl/morfeusz/)	Dictionary, Institutional lists of streets and bus stops	Intensional entry: ulica(ulica:subst:sg:nom:f) {Aleksandra Bardiniego}(Aleksander Bardini:subst:sg:gen:m1),subst(NC-O_N-ulica-OSOBY) 'Aleksander Bardini Street' Extensional entry: ul. A. Bardiniego,ulica Aleksandra Bardiniego:subst:sg:loc:f
21	06/06/2014 18:01:00	SEJFEK - Grammatical Lexicon of Polish Economic Phraseology	http://zil.ipipan.waw.pl/SEJFEK	MWE dictionary or lexicon (MWEs only)	Polish	146,861 inflected forms	10	Only contiguous	Available, unrestricted use	CC BY-SA	Creative Commons (CC): http://creativecommons.org/examples				Yes (click continue to fill in more information)	Filip Makowiecki, Agata Savary	agata.savary@univ-tours.fr	SAVARY, A., ZABOROWSKI, B., KRAWCZYK-WIECZOREK, A., MAKOWIECKI, F. (2012): SEJFEK — a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units, in Proceedings of Cognitive Aspects of the Lexicon (COGALEX-III), a Workshop at COLING 2012, Mumbai, India. http://aclweb.org/anthology//W/W12/W12-5116.pdf	both	11212	146861	305	Multi-word nominal terms from the domain of economy and finance	Nested MWEs are delimited and factorized.		Multiflex (http://www.springerlink.com/content/n265j22n73084433/), Morfeusz (http://sgjp.pl/morfeusz/), Toposław (http://zil.ipipan.waw.pl/Toposlaw), Unitex (http://igm.univ-mlv.fr/~unitex/)	Dictionary, Internet	Intensional entry: założenie(założenie:subst:sg:nom:n2) {lokaty bankowej}(lokata bankowa:subst:sg:gen:f),subst(NC-O_N-nb-inv-pl) Extensional entry: założenia lokat bankowych,założenie lokaty bankowej:subst:sg:gen:n2
22	06/06/2014 18:15:30	Prolexbase	http://zil.ipipan.waw.pl/Prolexbase	multilingual relational database of named entities with MWEs	Polish, English, French	320000	16	Only contiguous	Available, unrestricted use	CC BY-SA	Creative Commons (CC): http://creativecommons.org/examples				Yes (click continue to fill in more information)	Małgorzata Baron, Béatrice Bouchou Markhoff, Leszek Manicki, Denis Maurel, Agata Savary, Mickaël Tran, Duško Vitas	agata.savary@univ-tours.fr	Maurel, D. (2008): Prolexbase: a Multilingual Relational Lexical Database of Proper Names. In proceedings of LREC 2008, Marrakech, Morocco. http://www.lrec-conf.org/proceedings/lrec2008/summaries/91.html Savary, A., Manicki, L., Baron, M. (2013): Populating a Multilingual Ontology of Proper Names from Open Sources. In Journal of Language Modelling, Vol 2, No. 2, pp. 189-225. http://jlm.ipipan.waw.pl/ojs/index.php/JLM/article/view/63	both	173000	320000		Proper names, most of which are multi-word units.	Semantic network with interlingual links. Relations of synonymy, meronymy, etc. between the named objects. Relative adjectives and inhabitant names. All data are manually validated.			Dictionary, Wikipedia, Geonames
23	13/06/2014 16:44:26	Szeged TreebankFX	http://www.inf.u-szeged.hu/rgai/mwe	Treebank with MWE annotations	Hungarian	6734 light verb constructions	3	Also non-contiguous	Available, restricted use	academic use only	own licencing		The Szeged Treebank is a morphosyntactically tagged and syntactically annotated database, which is available in both constituency-based and dependency-based versions. All texts in the corpus are manually annotated for LVCs. The corpus contains 6734 occurrences of 1215 LVCs altogether in 82,099 sentences.		Yes (click continue to fill in more information)	University of Szeged, Department of Informatics	vinczev@inf.u-szeged.hu	Vincze, Veronika 2011: Semi-Compositional Noun + Verb Constructions: Theoretical Questions and Computational Linguistic Analyses. PhD thesis, University of Szeged, August 2011. Vincze, Veronika; Csirik, János 2010: Hungarian Corpus of Light Verb Constructions. In: Proceedings of COLING 2010, Beijing, China, pp. 1110-1118. Vincze, Veronika; Zsibrita, János; Nagy T., István 2013: Dependency Parsing for Identifying Hungarian Light Verb Constructions. In: Proceedings of IJCNLP 2013, pp. 207-215.	Extensional	1215			light verb constructions	verbal, participial and nominal occurrences are also annotated; non-adjacent LVCs are annotated	constituency and dependency grammar		manually annotated
24	15/06/2014 15:34:26	The Grammatical Lexicon of Polish Phraseology (SEJF = Słownik elektroniczny jednostek frazeologicznych)	http://zil.ipipan.waw.pl/SEJF	MWE dictionary or lexicon (MWEs only)	Polish	3200 multi-word lexemes, 68,000 corresponding inflected forms	6	Only contiguous	Available, unrestricted use	CC BY-SA license.	Creative Commons (CC): http://creativecommons.org/examples				Yes (click continue to fill in more information)	Monika Czerepowicka (lexicography) and Agata Savary (automatic inflection and validation)	czerepowicka@gmail.com	GRALIŃSKI, F., SAVARY, A., CZEREPOWICKA, M., MAKOWIECKI, F. (2010): Computational Lexicography of Multi-Word Units: How Efficient Can It Be?, in Proceedings of Multiword Expressions: from Theory to Applications (MWE 2010), Workshop at COLING 2010, Beijing, China, August 28. CZEREPOWICKA, M., KOSEK, I. (2011): Problemy opisu związków frazeologicznych w formalizmie „Multifleks” (na przykładzie rodzaju wyrażeń frazeologicznych), in "Różne formy, różne treści", pp. 117–126, Warszawa 2011. CZEREPOWICKA, M. (2011): „Toposław” jako narzędzie znakowania jednostek wieloczłonowych, in Matusiak-Kempa, I., Przybyszewski, S. (eds.) Nowe zjawiska w języku, tekście, komunikacji. Kontekst a komunikacja, Olsztyn, pp. 28–35. CZEREPOWICKA, M. (2014): Jednostki obce w słowniku języka polskiego na przykładzie "Słownika elektronicznego jednostek frazeologicznych" (SEJF), in LingVaria IX (2014) \| 1 (17), doi: 10.12797/LV.09.2014.17.04, pp. 59-68.	Extensional	3200	68000	160 graph-based inflection paradigms	The Dictionary contains mainly multi-word nouns (2121 lemmas) and adverbs (604), adjectives (446) and others of general (non terminological) Polish language.	SEJF can code nested MWEs.	
<CATEGORIES> Nb : sg , pl Case: nom, gen, dat, acc, inst, loc, voc Gen: m1, m2, m3, f, n1, n2, p1, p2, p3 Pers: pri, sec, ter Deg: pos, com, sup Asp: imperf, perf Neg: aff, neg Accent: akc, nakc Postprep : praep, npraep Accom: congr, rec Agglt: nagl, agl Vocal: wok, nwok <EXTRA_CATEGORIES> Usage: <E>,offic, neut, spok <GRAPHICAL_CATEGORIES> LetterCase: same, all_lower, all_upper, first_upper,first_upper_each_word,no_letter_case,other Init:<E>,dot,no_dot,dot2,no_dot2,dot3,no_dot3,dot4,no_dot4,dot5,no_dot5 Dot : pun , npun <CLASSES> subst: (Nb,<var>),(Case,<var>),(Gen,<fixed>),(Usage,<var>) depr: (Nb,<fixed>),(Case,<var>),(Gen,<fixed>) num: (Nb,<fixed>),(Case,<var>),(Gen,<var>),(Accom,<var>) numcol: (Nb,<fixed>),(Case,<var>),(Gen,<fixed>),(Accom,<var>) adj: (Nb,<var>),(Case,<var>),(Gen,<var>),(Deg,<var>) adja: adjc: adjp: adv: (Deg,<var>) ppron12: (Nb,<fixed>),(Case,<var>),(Gen,<var>),(Pers,<fixed>),(Accent,<var>) ppron3: (Nb,<var>),(Case,<var>),(Gen,<var>),(Pers,<fixed>),(Accent,<var>),(Postprep,<var>) siebie: (Case,<var>) fin: (Nb,<var>),(Pers,<var>),(Asp,<fixed>) bedzie: (Nb,<var>),(Pers,<var>),(Asp,<fixed>) aglt: (Nb,<var>),(Pers,<var>),(Asp,<fixed>),(Vocal,<var>) praet: (Nb,<var>),(Gen,<var>),(Asp,<fixed>),(Agglt,<var>) impt: (Nb,<var>),(Pers,<var>),(Asp,<fixed>) imps:(Asp,<fixed>) inf:(Asp,<fixed>) pcon:(Asp,<fixed>) pant:(Asp,<fixed>) ger: (Nb,<var>),(Case,<var>),(Gen,<fixed>),(Asp,<fixed>),(Neg,<var>) pact: (Nb,<var>),(Case,<var>),(Gen,<var>),(Asp,<fixed>),(Neg,<var>) ppas: (Nb,<var>),(Case,<var>),(Gen,<var>),(Asp,<fixed>),(Neg,<var>) winien: (Nb,<var>),(Gen,<var>),(Asp,<fixed>) pred: prep:(Case,<fixed>) conj: qub: xxs: (Nb,<var>),(Case,<var>),(Gen,<fixed>) xxx: ign: interp: sp: burk: brev:(Dot,<fixed>)		Dictionary, Corpus, collected manually
25	18/06/2014 08:26:35	List of Hungarian light verb constructions	http://www.inf.u-szeged.hu/rgai/mwe	Monolingual list of MWEs	Hungarian		3	Also non-contiguous	Available, unrestricted use				Light verb constructions were collected from the manually annotated corpora Szeged TreebankFX and SzegedParalellFX. The list contains their base forms.		Yes (click continue to fill in more information)	University of Szeged, Department of Informatics	vinczev@inf.u-szeged.hu	Vincze, Veronika 2012: Light Verb Constructions in the SzegedParalellFX English-Hungarian Parallel Corpus. In: Proceedings of the Eighth Conference on International Language Resources and Evaluation (LREC 2012). Istanbul, Turkey, pp. 2381-2388. Vincze, Veronika; Csirik, János 2010: Hungarian Corpus of Light Verb Constructions. In: Proceedings of COLING 2010, Beijing, China, pp. 1110-1118.					light verb constructions				Corpus
26	18/06/2014 08:28:51	List of English light verb constructions	http://www.inf.u-szeged.hu/rgai/mwe	Monolingual list of MWEs	English		3	Also non-contiguous	Available, unrestricted use				Light verb constructions were collected from the manually annotated corpora Wiki50 and SzegedParalellFX. Their base forms are included in the list.		Yes (click continue to fill in more information)	University of Szeged, Department of Informatics	vinczev@inf.u-szeged.hu	Vincze, Veronika; Nagy T., István; Berend, Gábor 2011: Multiword expressions and Named Entities in the Wiki50 corpus. In: Proceedings of RANLP 2011. Hissar, Bulgaria, pp. 289-295. Vincze, Veronika 2012: Light Verb Constructions in the SzegedParalellFX English-Hungarian Parallel Corpus. In: Proceedings of the Eighth Conference on International Language Resources and Evaluation (LREC 2012). Istanbul, Turkey, pp. 2381-2388.					light verb constructions				Corpus
27	18/06/2014 08:33:25	Bilingual list of English-Hungarian light verb constructions	http://www.inf.u-szeged.hu/rgai/mwe	Multilingual parallel list of MWEs	English, Hungarian		3	Also non-contiguous	Available, unrestricted use				Light verb constructions from the manually annotated SzegedParalellFX corpus were collected and the English and Hungarian equivalents were matched. Their verbal counterparts are also provided (if any).		Yes (click continue to fill in more information)	University of Szeged, Department of Informatics	vinczev@inf.u-szeged.hu	Vincze, Veronika 2012: Light Verb Constructions in the SzegedParalellFX English-Hungarian Parallel Corpus. In: Proceedings of the Eighth Conference on International Language Resources and Evaluation (LREC 2012). Istanbul, Turkey, pp. 2381-2388.					light verb constructions				Corpus	observe make an observation megfigyelést tesz megfigyel
28	19/06/2014 11:19:09	SALDO	http://spraakbanken.gu.se/resurs/saldo	Dictionary or lexicon with MWEs (also includes MWEs)	Swedish	about 7500	10	Also non-contiguous	Available, unrestricted use	CC-BY	Creative Commons (CC): http://creativecommons.org/examples		SALDO is a lexical-semantic resource, differently organized from a wordnet. It treats single-word items and MWEs in the same way, in the sense that MWEs have a part of speech and an inflectional paradigm but no internal structure. See the following references for more information: @article{Borin-Lars2013-9, title = "SALDO: a touch of yin to WordNet's yang", journal = "Language resources and evaluation", author = "Borin, Lars and Forsberg, Markus and Lönngren, Lennart", year = "2013", volume = "47", number = "4", url = "http://dx.doi.org/10.1007/s10579-013-9233-4", pages = "1191--1211", } @article{Borin-Lars2013-6, title = "Close encounters of the fifth kind: Some linguistic and computational aspects of the Swedish FrameNet++ project", journal = "Veredas", author = "Borin, Lars and Forsberg, Markus and Lyngfelt, Benjamin", year = "2013", volume = "17", number = "1", url = "http://www.ufjf.br/revistaveredas/files/2013/11/2-BORIN-FORSBERG-LINGFELT-FINAL.pdf", pages = "28--43", }
29	19/06/2014 11:24:43	Swedish FrameNet (SweFN)	http://spraakbanken.gu.se/eng/swefn	Dictionary or lexicon with MWEs (also includes MWEs)	Swedish	a few thousand		Also non-contiguous	Available, unrestricted use	CC-BY	Creative Commons (CC): http://creativecommons.org/examples		SweFN is a framenet for Swedish. It reuses Berkeley FrameNet frames as much as possible and also adds new frames. The word sense inventory used for identifying lexical units is that of SALDO, hence any MWE found in SALDO is a candidate for a lexical unit in SweFN.
30	19/06/2014 11:35:27	Swedish FrameNet++ (SweFN++)	http://spraakbanken.gu.se/eng/swefn	Dictionary or lexicon with MWEs (also includes MWEs)	Swedish English a number of South Asian languages Finnish	about 8000	10	Also non-contiguous	Available, unrestricted use	CC-BY	Creative Commons (CC): http://creativecommons.org/examples		Swedish FrameNet++ is a lexical macroresource created by interlinking a number of freely available digital lexical resources. As opposed to most such endeavors (e.g. BabelNet, UBY, Etymological WordNet, etc.), SweFN++ is not only based on automatic processing of the resources, but a considerable amount of manual post-correction and qualified linguistic and lexicographic work has gone into this effort. The resources are interlinked using the sense and form-unit PIDs of SALDO, the pivot resource of SweFN++. Part of the sense inventory is linked to other languages through WordNet synsets and IDS/LWT identifiers. See the following references: @article{Borin-Lars2013-6, title = "Close encounters of the fifth kind: Some linguistic and computational aspects of the Swedish FrameNet++ project", journal = "Veredas", author = "Borin, Lars and Forsberg, Markus and Lyngfelt, Benjamin", year = "2013", volume = "17", number = "1", url = "http://www.ufjf.br/revistaveredas/files/2013/11/2-BORIN-FORSBERG-LINGFELT-FINAL.pdf", pages = "28--43", } @incollection{Borin-Lars2013-15, title = "The Intercontinental Dictionary Series – a rich and principled database for language comparison", booktitle = "Approaches to Measuring Linguistic Differences / ed. by Lars Borin ; Anju Saxena ", author = "Borin, Lars and Comrie, Bernard and Saxena, Anju", year = "2013", publisher = "De Gruyter Mouton", address = "Berlin", isbn = "978-3-11-030525-8", pages = "285--302", }
31	19/06/2014 14:43:44	Reference data for Collocation Extraction	http://ufal.mff.cuni.cz/~pecina/resources.html	MWE dictionary or lexicon (MWEs only)	Czech	12000 thousands approx	2	Also non-contiguous	Available, unrestricted use	CC-BY-NC	Creative Commons (CC): http://creativecommons.org/examples	yes	Annotated list of dependency bigrams occurring in the PDT more than five times and having part-of-speech patterns that can possibly form a collocation. Each bigram is assigned to one of the six MWE categories described below by three annotators.		Yes (click continue to fill in more information)	Pavel Pecina	pecina@ufal.mff.cuni.cz	Pavel Pecina. Lexical association measures and collocation extraction. Language Resources and Evaluation, 44, pages 137-158, 2010. Pavel Pecina. Reference Data for Czech Collocation Extraction. In Proceedings of the LREC 2008 Workshop Towards a Shared Task for Multiword Expressions, pages 11-14, Marrakech, Morocco, 2008. Pavel Pecina and Pavel Schlesinger: Combining Association Measures for Collocation Extraction. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), Sydney, Australia, July 2006.	Intensional	12232			1. stock phrases 2. names of persons, organizations, geographical locations, and 3. support verb constructions 4. technical terms 5. idiomatic expressions		dependency		Corpus	geometrický AI1A Atr prostor NI-A Head
32	19/06/2014 18:24:24	SEJF - The Grammatical Lexicon of Polish Phraseology (SEJF = Słownik Elektroniczny Jednostek Frazeologicznych)	http://zil.ipipan.waw.pl/SEJF	MWE dictionary or lexicon (MWEs only)	Polish	The lexicon contains about 3200 multi-word lexemes, 68,000 corresponding inflected forms.	6	Only contiguous	Available, unrestricted use	The lexicon contains about 3200 multi-word lexemes, 68,000 corresponding inflected forms	Creative Commons (CC): http://creativecommons.org/examples				Yes (click continue to fill in more information)	Monika Czerepowicka (lexicography) and Agata Savary (automatic inflection and validation)	czerepowicka@gmail.com	GRALIŃSKI, F., SAVARY, A., CZEREPOWICKA, M., MAKOWIECKI, F. (2010): Computational Lexicography of Multi-Word Units: How Efficient Can It Be?, in Proceedings of Multiword Expressions: from Theory to Applications (MWE 2010), Workshop at COLING 2010, Beijing, China, August 28. CZEREPOWICKA, M., KOSEK, I. (2011): Problemy opisu związków frazeologicznych w formalizmie „Multifleks” (na przykładzie rodzaju wyrażeń frazeologicznych), in "Różne formy, różne treści", pp. 117–126, Warszawa 2011. CZEREPOWICKA, M. (2011): „Toposław” jako narzędzie znakowania jednostek wieloczłonowych, in Matusiak-Kempa, I., Przybyszewski, S. (eds.) Nowe zjawiska w języku, tekście, komunikacji. Kontekst a komunikacja, Olsztyn, pp. 28–35. CZEREPOWICKA, M. (2014), Jednostki obce w słowniku języka polskiego na przykładzie "Słownika elektronicznego jednostek frazeologicznych" (SEJF), LingVaria 2014 (IX), z. 1 (17), s. 59-68 [doi: 10.12797/LV.09.2014.17.04].	Extensional	about 3200	about 68000	160 graph-based inflection paradigms	SEJF contains mainly nominal (2121 units) and also adjectival (446) and adverbial (604) compounds of the general (non terminological) Polish language.		
The morphosyntactic tagset using following categories: Nb : sg , pl Case: nom, gen, dat, acc, inst, loc, voc Gen: m1, m2, m3, f, n1, n2, p1, p2, p3 Pers: pri, sec, ter Deg: pos, com, sup Asp: imperf, perf Neg: aff, neg Accent: akc, nakc Postprep : praep, npraep Accom: congr, rec Agglt: nagl, agl Vocal: wok, nwok Each unit is annotated as a one of the following classes: subst: (Nb,<var>),(Case,<var>),(Gen,<fixed>),(Usage,<var>) depr: (Nb,<fixed>),(Case,<var>),(Gen,<fixed>) num: (Nb,<fixed>),(Case,<var>),(Gen,<var>),(Accom,<var>) numcol: (Nb,<fixed>),(Case,<var>),(Gen,<fixed>),(Accom,<var>) adj: (Nb,<var>),(Case,<var>),(Gen,<var>),(Deg,<var>) adja: adjc: adjp: adv: (Deg,<var>) ppron12: (Nb,<fixed>),(Case,<var>),(Gen,<var>),(Pers,<fixed>),(Accent,<var>) ppron3: (Nb,<var>),(Case,<var>),(Gen,<var>),(Pers,<fixed>),(Accent,<var>),(Postprep,<var>) siebie: (Case,<var>) fin: (Nb,<var>),(Pers,<var>),(Asp,<fixed>) bedzie: (Nb,<var>),(Pers,<var>),(Asp,<fixed>) aglt: (Nb,<var>),(Pers,<var>),(Asp,<fixed>),(Vocal,<var>) praet: (Nb,<var>),(Gen,<var>),(Asp,<fixed>),(Agglt,<var>) impt: (Nb,<var>),(Pers,<var>),(Asp,<fixed>) imps:(Asp,<fixed>) inf:(Asp,<fixed>) pcon:(Asp,<fixed>) pant:(Asp,<fixed>) ger: (Nb,<var>),(Case,<var>),(Gen,<fixed>),(Asp,<fixed>),(Neg,<var>) pact: (Nb,<var>),(Case,<var>),(Gen,<var>),(Asp,<fixed>),(Neg,<var>) ppas: (Nb,<var>),(Case,<var>),(Gen,<var>),(Asp,<fixed>),(Neg,<var>) winien: (Nb,<var>),(Gen,<var>),(Asp,<fixed>) pred: prep:(Case,<fixed>) conj: qub: xxs: (Nb,<var>),(Case,<var>),(Gen,<fixed>) xxx: ign: interp: sp: burk: brev:(Dot,<fixed>)		Dictionary, Corpus, collected manually	aleja(aleja:subst:sg:nom:f) sztywnych(sztywny:subst:pl:gen:m1),subst(NC-O_N) entry: aleja sztywnych morphosyntactic tag of the entry: subst [noun] morphosyntactic disambiguation in the brackets after a word: (lexeme : morphosyntactic tag of a proper form : value of the Number category : value of the Case category : value of the Gender category) information in the brackets after a 
morphosyntactic tag of the entry, e.g. (NC-O_N) - type of the graph which is used to inflect the unit list of all inflected forms of the unit (MWE): aleja sztywnych,aleja sztywnych:subst:sg:nom:f aleje sztywnych,aleja sztywnych:subst:pl:nom:f alei sztywnych,aleja sztywnych:subst:sg:gen:f alej sztywnych,aleja sztywnych:subst:pl:gen:f alei sztywnych,aleja sztywnych:subst:pl:gen:f alei sztywnych,aleja sztywnych:subst:sg:dat:f alejom sztywnych,aleja sztywnych:subst:pl:dat:f aleję sztywnych,aleja sztywnych:subst:sg:acc:f aleje sztywnych,aleja sztywnych:subst:pl:acc:f aleją sztywnych,aleja sztywnych:subst:sg:inst:f alejami sztywnych,aleja sztywnych:subst:pl:inst:f alei sztywnych,aleja sztywnych:subst:sg:loc:f alejach sztywnych,aleja sztywnych:subst:pl:loc:f alejo sztywnych,aleja sztywnych:subst:sg:voc:f aleje sztywnych,aleja sztywnych:subst:pl:voc:f
33	21/06/2014 11:17:29	WICOL	http://www.vronk.net/wicol/index.php/Main_Page	MWE dictionary or lexicon (MWEs only)	Slovak, German	not specified	3	Only contiguous	Available, restricted use			yes	Collocation profiles of 250 Slovak nouns Collocation profiles of 700 Slovak adjectives Collocation profiles of 500 German nouns with Slovak equivalents Collocation profiles of 250 German adjectives with Slovak equivalents		No (click continue to submit)
34	22/06/2014 19:53:21	WordNet-Affect translated in Romanian and Russian.	http://lilu.fcim.utm.md/resourcesRoRuWNA.html	Dictionary or lexicon with MWEs (also includes MWEs)	English, Romanian, Russian	348	4	Also non-contiguous	Available, unrestricted use		no licence		WordNet-Affect is a lexical resource that contains information about the emotions that words convey. It has been developed from the lexical knowledge base WordNet, through a selection and labelling of the affective concepts represented by sets of synonyms. Affective labels (a-labels) were manually assigned to WordNet synsets of nouns, adjectives, verbs and adverbs which convey affective meaning. Words labelled with the Emotion tag were further reannotated into six emotional categories: joy, fear, anger, sadness, disgust, surprise. WordNet-Affect is freely available for research purposes at http://wndomains.itc.it. The collection of WordNet-Affect synsets used in our work was provided as a resource in the SemEval-2007 Affective Text task, which focused on text annotation with affective tags. WordNet-Affect is organised in six files: anger.txt, disgust.txt, fear.txt, joy.txt, sadness.txt, surprise.txt. We keep the same data organisation. Please cite the following reference in the publications or presentations containing research results obtained through the use of this resource: "Emotions in words: developing a multilingual WordNet-Affect". CICLING 2010, Iasi, Romania, 2010.		No (click continue to submit)
35	24/06/2014 15:01:41	Oxford Arabic Dictionary	http://www.oxforddictionaries.com/words/arabic	Dictionary or lexicon with MWEs (also includes MWEs)	Arabic, English			Also non-contiguous							Yes (click continue to fill in more information)	Oxford University Press	tressy.arts@gmail.com	http://ukcatalogue.oup.com/product/9780199580330.do	Intensional				compound nouns, compound adverbs, preposition + nouns, compound terminology, named entries, phrasal verbs, verbal expressions, collocations, etc.	All MWEs are written in both languages			Corpus, manually
36	24/06/2014 16:51:39	Unified Medical Language System (UMLS) SPECIALIST Lexicon	http://specialist.nlm.nih.gov/lexicon	Dictionary or lexicon with MWEs (also includes MWEs)	English	301,345 MWE base forms; 417,755 including all inflectional variants		Only contiguous	Available, unrestricted use	For terms and conditions of use, please see http://lexsrv3.nlm.nih.gov/LexSysGroup/docs/termsAndConditions.html	Terms and conditions; link given above.		The SPECIALIST Lexicon has been built since 1994 at the U.S. National Library of Medicine, National Institutes of Health. It is intended to be a general English lexicon that includes many biomedical terms. It provides comprehensive coverage of biomedical vocabulary as well as commonly occurring English words. The lexicon entry for each word or term records the syntactic categorization, variant forms (morphological information), and specification of acronyms.		No (click continue to submit)
37	29/07/2014 09:38:15	Multilingual Collocation Dictionary system Centre Tesniere (MultiCoDiCT)	http://tesniere.univ-fcomte.fr/multicodict_eng.html	MWE dictionary or lexicon (MWEs only)	Arabic < > French Chinese < > English < > French French < > Portuguese < > Spanish Korean < > English < > French				Unknown				Multilingual collocation dictionaries of specialised domains exploiting inherent mathematical properties by means of formal specification techniques A software engineering approach to multilingual terminology management. Applications in multilingual : Terminology Standards Safety critical domains : e.g.: clinical medicine.		No (click continue to submit)
38	29/07/2014 14:27:12	Stanford Multiword Expression Resources	http://mwe.stanford.edu/resources/	MWE dictionary or lexicon (MWEs only)	English, Russian								The following is a list of resources relevant to the LinGO Multiword Expression Project, along with a basic description of each resource, the date of release and a description of the author(s). In the instance that a reference is listed for the resource, we ask that any published results which make use of the given data set cite that reference appropriately. English and Russian Prepositional Phrases Verb particle constructions with compositionality judgements BNC verb particle construction frequency list Verb particle constructions with Levin verb classes and Google frequencies		No (click continue to submit)
39	29/07/2014 14:30:42	MWE resources listed in http://multiword.sourceforge.net	http://multiword.sourceforge.net/PHITE.php?sitesig=FILES&page=FILES_20_Data_Sets	list of resources	English, Chinese, Czech, German, Portuguese, Greek, French, Estonian										No (click continue to submit)
40	19/08/2014 11:57:00	LEX-MWE-PT: Word Combination in Portuguese Language	http://metashare.ilsp.gr:8080/repository/browse/lex-mwe-pt-word-combination-in-portuguese-language/8c13600ccd0711e1a404080027e73ea2f9cfd28f51d5437b8f5827c516c348fe/, http://www.clul.ul.pt/en/research-teams/187-combina-pt-word-combinations-in-portuguese-language	MWE dictionary or lexicon (MWEs only)	Portuguese (European)	12,753 MWE lemmas		Also non-contiguous	Available, restricted use	Restrictions: Academic - Non Commercial Use Distribution Access/Medium: Downloadable Licensors: Amália Mendes, amalia.mendes@clul.ul.pt (Copied from META-SHARE)	Under negotiation		This lexicon includes multiword expressions (MWE) of European Portuguese extracted from a balanced 50,8M word written corpus – a subcorpus of the Reference Corpus of Contemporary Portuguese (CRPC). This corpus covers different genres, being mainly constituted by journalistic texts (59%), but it also includes texts from literature (21%), magazines (15%), miscellaneous, supreme court verdicts, parliament sessions and leaflets (5%). The MWE lexicon covers 1.198 lemmas (composed of single words from different POS categories: nouns, adjectives, verbs and adverbs) and a total of 12.753 MWE lemmas (which include inflectional variants of the MWE lemmas) and 242.233 concordances of those MWE expressions manually verified. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Centro de Linguística da Universidade de Lisboa: Amália Mendes	amalia.mendes@clul.ul.pt	ANTUNES, Sandra, Maria Fernanda BACELAR DO NASCIMENTO, João Miguel CASTELEIRO, Amália MENDES, Luísa PEREIRA, Tiago SÁ (2006) "A Lexical Database of Portuguese Multiword Expressions" in VIEIRA, Renata et al. (2006) PROPOR 2006, LNAI 3960, Berlin, Springer-Verlag, pp. 238-243.	Extensional (in the form of corpus concordances)	12,753 MWE lemmas	242,233 concordances of those MWE expressions manually verified.		
frozen groups (e.g., patrão fora, dia santo na loja 'while the cat is away, the mice will play'); semi-frozen groups where the meaning of the expression can not be predicted by the meaning of the parts (e.g., esticar o pernil 'kick the bucket'), that are not subject to syntactical variability (e.g., internal modification esticar o grande pernil 'kick the big bucket' or passivization o pernil foi esticado 'the bucket was kicked') but allow inflectional variation (e.g., esticaram o pernil 'kicked the bucket'); semi-frozen groups that can be compositional and in some cases semantically idiosyncratic, and that allow for the substitution of one of the collocates by other words associated through a synonym or hyperonymy/hyponym relation (e.g., onda/maré/vaga de assaltos 'wave of robberies'; países/estados membros 'member states'); sets of favoured co-occurring forms, that constitute however syntactic dependencies.	A detailed description of the lexicon, its structure and content is given at the resource webpage: http://www.clul.ul.pt/en/research-teams/187-combina-pt-word-combinations-in-portuguese-language			Since the extraction of lexical collocations must rely on a large collection of data, a written and balanced corpus of 50 million words, the COMBINA corpus, was designed from the existing corpus CRPC: http://www.clul.ul.pt/en/research-teams/183-reference-corpus-of-contemporary-portuguese-crpc
41	19/08/2014 12:03:28	PANACEA Environment SCF MWE merged Italian Lexicon	http://metashare.ilsp.gr:8080/repository/browse/panacea-environment-scf-mwe-merged-italian-lexicon/c4e3084680c211e28763000c291ecfc8d62c1eae3f784dd99671954514e657ce/, http://hdl.handle.net/10230/20173	MWE dictionary or lexicon (MWEs only)	Italian				Available, restricted use	Creative Commons Attribution-NonCommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/)	Creative Commons (CC): http://creativecommons.org/examples		The Italian PANACEA_ENV_MWE_SCF_merged.lmf.xml lexicon is obtained by merging two automatically extracted lexicons: a domain lexicon (environment) for SCFs, PANACEA_SCF_IT_environment.lmf.xml and a MWE Italian lexicon env-mw.lmf.xml. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu). <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Monica Monachini	monica.monachini@ilc.cnr.itz
42	19/08/2014 12:06:41	PANACEA Labour SCF MWE merged Italian Lexicon	http://metashare.ilsp.gr:8080/repository/browse/panacea-labour-scf-mwe-merged-italian-lexicon/c903fc2880c211e28763000c291ecfc84a99d41ec468410985a8d1ebfc06de71/, http://hdl.handle.net/10230/20174	MWE dictionary or lexicon (MWEs only)	Italian				Available, restricted use	Creative Commons Attribution-NonCommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/)	Creative Commons (CC): http://creativecommons.org/examples		The Italian PANACEA_LAB_SCF_MWE_merged.lmf.xml lexicon is obtained by merging two automatically extracted lexicons: a domain lexicon (labour) for SCFs, PANACEA_SCF_IT_labour.lmf.xml and a MWE Italian lexicon lab-mw.lmf.xml. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu). <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Monica Monachini	monica.monachini@ilc.cnr.itz	MENDES, Amália, Sandra ANTUNES, Maria Fernanda BACELAR DO NASCIMENTO, João Miguel CASTELEIRO, Luísa PEREIRA, Tiago SÁ (2006) "COMBINA-PT: a Large Corpus-extracted and Hand-checked Lexical Database of Portuguese Multiword Expressions", Proceedings of the V International Conference on Language Resources and Evaluation - LREC2006, Genoa, May 22-28 2006, pp. 1900-1905.
43	20/08/2014 10:39:10	BioLexicon	http://metashare.ilsp.gr:8080/repository/browse/biolexicon/37c86584de6c11e2b1e400259011f6ead5fa82f93c0544b29b1b61526cd7c87f/, http://catalog.elra.info/product_info.php?products_id=1113	Dictionary or lexicon with MWEs (also includes MWEs)	English				Available, restricted use	Several licenses for different uses (academic/commercial) and users (ELRA members/non-members).	ELRA		BioLexicon is a large-scale English terminological resource which has been developed to address the needs emerging in text mining efforts in the biomedical domain. It contains information on: - terminological nouns, including nominalised verbs and proper names (e.g., gene names) - terminological adjectives - terminological adverbs - terminological verbs - general English words frequently used in the biology domain Existing information on terms was integrated, augmented, complemented and linked, through processing of massive amounts of biomedical text, to yield inter alia over 2.2M lexical entries (over 3.3M semantic relations), and information on over 1.8M variants and on over 2M synonymy relations. Moreover, extensive information is provided on how verbs and nominalised verbs in the domain behave at both syntactic and semantic levels, supporting thus applications aiming at discovery of relations and events involving biological entities in text. It contains domain specific verbs (658), includes both automatically-extracted syntactic subcategorization frames (1710), as well as semantic event frames (850) that are based on corpus annotation by domain experts. Once populated with terms from existing repositories, BioLexicon was augmented with term variants extracted from the scientific literature and complemented with manually selected lexical items, such as biologically relevant verbs and multiword token expressions. BioLexicon is available in a relational database format (MySQL dump format) and it adheres to the EAGLES/ISO standards for lexical resources. 
<Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org
44	20/08/2014 10:47:19	LABEL-LEX (MW)	http://metashare.ilsp.gr:8080/repository/browse/label-lex-mw/86090e98de7011e2b1e400259011f6ea56b6b33c48c549568791c59a76545065/, http://catalog.elra.info/product_info.php?products_id=700	MWE dictionary or lexicon (MWEs only)	Portuguese	88 619			Available, restricted use	Several licenses for different uses (academic/commercial) and users (ELRA members/non-members).	ELRA		LABEL-LEX (MW) is a Portuguese formalized lexicon, containing 88 619 inflected multiword lexical units (formally, sequences of simple words). The units are distributed as follows: - 85,881 nouns, with information about type, gender, number, inflected forms, irregular inflected forms and subcategorisation frames - 2,204 adverbs - 409 adjectives, with information about degree, gender, number, comparison, position, inflected forms, irregular inflected forms and subcategorisation frames - 125 pronouns, prepositions/postpositions and conjunctions <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org		Extensional	88619			nouns, adverbs, adjectives, pronouns, prepositions/postpositions and conjunctions
45	20/08/2014 10:51:49	OpenLogos Bilingual Dictionaries	http://metashare.ilsp.gr:8080/repository/browse/openlogos-bilingual-dictionaries/c27d98c2e0d511e3a462080027f903f2d1cca27a783a451bae80dfe14cc90043/	Dictionary or lexicon with MWEs (also includes MWEs)	English>French, English>German, English>Italian				Available, unrestricted use	Restrictions: Academic - Non Commercial Use	GPL		The OpenLogos bilingual dictionaries (English-French, English-German and English-Italian) contain the following linguistic information: part-of-speech (POS), gender (GEN), number (NUM), morphological paradigms (PAT) for source and target words, head word (HEAD) in multiwords, homographs (HOMO), auxiliary (AUX), alternate word (ALT), causative verb (CAUS), reflexive verb (REFL), and aspectual verb (ASP). In addition, they contain semantico-syntactic knowledge (SAL), a three-level interlingua-style hierarchical taxonomy with over 1,000 elements, embracing all POS. SAL represents the conceptual formalization of things, ideas, relationships, dispositions, conditions, processes, etc., as described in the SAL Tutorial of the Learn Logos application, available with the OpenLogos software. Each bilingual dictionary contains over 80,000 entries. Verbs, nouns and adjectives are the most represented classes. We believe that they are useful for machine translation and other natural language processing applications. <Description from META-SHARE>		Yes (click continue to fill in more information)	Anabela Barreiro	anabela.barreiro@inesc-id.pt
46	20/08/2014 10:56:01	The CINTIL Corpus – International Corpus of Portuguese	http://metashare.ilsp.gr:8080/repository/browse/the-cintil-corpus-international-corpus-of-portuguese/99a51c1ade6d11e2b1e400259011f6eabab9a2512cd6404c8f828ed94885c413/, http://catalog.elra.info/product_info.php?products_id=1102	Dictionary or lexicon with MWEs (also includes MWEs)	Portuguese				Available, restricted use	Several licenses for different uses (academic/commercial) and users (ELRA members/non-members).	ELRA		CINTIL-Corpus Internacional do Português is a linguistically interpreted written and spoken corpus of European Portuguese. It is composed of one million annotated tokens, each one of which verified by human expert annotators. The annotation comprises information on part-of-speech, open class lemma and inflection, multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition). Multiword Lexical Units (MWU) for Named Entity Recognition (NER): Delimitation and classification of multi-word expressions for Named Entities following the usual IOB tagging schema for NER, and the typical classes of Number, Date, Person, Location, etc.<Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org
47	25/08/2014 14:58:24	bgMWE – tool for MWE recognition	http://metashare.ilsp.gr:8080/repository/browse/bgmwe-tool-for-mwe-recognition/c51ec6406afd11e281b65cf3fcb88b70b4b3bc3889ed462581042dea4cb48a06/, http://dcl.bas.bg/en/bgMWE_en.html	tool	Bulgarian (language dependent)				Available, restricted use	CC-BY-NC	Creative Commons (CC): http://creativecommons.org/examples		bgMWE is a tool for corpus processing and MWE recognition and tagging. It is developed in Java and is thus platform independent. bgMWE comprises a set of modules which can be applied for particular NLP tasks. It is largely language independent and can work either in resource-light mode, or its performance can be boosted by employing lexical resources. The system includes the following modules: Web crawler for Wikipedia; Extraction of lexical data – lists of words and MWEs; Converter between formats – vertical format, XML, etc.; Preprocessing module – applying a chunker, a tagger, etc.; Collection of frequency data; MWE recognition and tagging. Further improvement of bgMWE is planned in the following directions: improving efficiency; implementing various methods for MWE recognition; developing a visualisation module or integrating existing open source visualisation methods; module for extensive evaluation. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Institute for Bulgarian Language. Contact person: Ivelina Stoyanova	dcltools@dcl.bas.bg
48	25/08/2014 15:01:45	Bulgarian MWE dictionary	http://metashare.ilsp.gr:8080/repository/browse/bulgarian-mwe-dictionary/50f06bb26afc11e281b65cf3fcb88b703aab1e9d754e40c88e16d771d08c1842/, http://dcl.bas.bg/en/mweDictionary_en.html	MWE dictionary or lexicon (MWEs only)	Bulgarian	27744			Available, restricted use	CC-BY-NC	Creative Commons (CC): http://creativecommons.org/examples		The Bulgarian dictionary of MWEs includes 27,744 MWEs altogether, which are divided into 13 categories based on their idiomaticity, which is evaluated with respect to the following features: whether the MWE is a named entity; whether the MWE contains a reference to a named entity; the degree to which the meaning of the MWE is compositional and transparent. The MWEs are extracted from several sources: Wikipedia, the Thesaurus of Bulgarian (1994) and other printed dictionaries and electronic corpora. The MWEs are manually verified and classified into categories. <Description from META-SHARE>. Further improvement of bgMWE is planned in the following directions: improving efficiency; implementing various methods for MWE recognition; developing a visualisation module or integrating existing open source visualisation methods; module for extensive evaluation. <From http://dcl.bas.bg/en/bgMWE_en.html>		Yes (click continue to fill in more information)	IPR holder: Institute for Bulgarian Language. Contact person: Ivelina Stoyanova	iva@dcl.bas.bg
49	25/08/2014 15:08:13	Chooser - annotation tool	http://metashare.ilsp.gr:8080/repository/browse/chooser-annotation-tool/5f603b6c6a5911e281b65cf3fcb88b7080220eacf93548d7a4b6d2c8f15960db/, http://dcl.bas.bg/en/Chooser.html	tool	Language independent			Also non-contiguous	Available, restricted use	GPL (share alike)	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		Chooser is an OS independent multi-functional system for linguistic annotation, adaptable to different annotation schemata. The basic annotation functionalities of the tool are: (i) fast and easy-to-perform selection; (ii) run-time access to information for the candidate senses such as definition, frequency, the associated wordnet synsets with all the pertaining info – synonyms, gloss, semantic relations, notes on usage, form, etc.; (iii) identification of MWEs with contiguous and non-contiguous constituents and supplying information for them at run-time. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Department of Computational Linguistics, Institute for Bulgarian Language. Contact person: Borislav Rizov	boby@dcl.bas.bg
50	25/08/2014 15:14:08	Lists of Bulgarian Multiword Expressions (BulMWEs)	http://metashare.ilsp.gr:8080/repository/browse/lists-of-bulgarian-multiword-expressions/bc41fa6266cd11e281b65cf3fcb88b70b5af49e6365f4bee903c9527bd6d1e4a/, http://dcl.bas.bg/en/dictionaries_en.html	MWE dictionary or lexicon (MWEs only)	Bulgarian				Available, restricted use	Restrictions: Academic - Non Commercial Use Fee: free of charge			The lists of Multiword expressions are the result of automatic and semi-automatic tagging and classification of the corpus Wiki1000+ (13.4 million tokens): Non-decomposable - 700, Idiosyncratically decomposable - 3,156, Simple decomposable (NEs without connection between elements - 36,932, NEs with a meaningful element(s) - 11,248, Non-NEs with a vague connection between components - 1,46, NEs with meaningful components but connection difficult to restore - 1,086, NEs with descriptor and additional element - 18,962, Non-NEs with a NE as one of the components - 27,373, Non-NEs with a standard, easy to restore connection between components- 140,394, NEs with a standard, easy to restore connection between components - 16,653, Non-NEs with explicit connection between components - 1,468), “Free collocations” - 49,651, Free phrases- 1,197,762. <Description from META-SHARE>		Yes (click continue to fill in more information)	Department of Computational Linguistics, Institute for Bulgarian Language. Contact person: Ivelina Stoyanova	iva@dcl.bas.bg
51	25/08/2014 15:18:17	Mutilingual dictionaries	http://metashare.ilsp.gr:8080/repository/browse/mutilingual-dictionaries/0dbc01f46afb11e281b65cf3fcb88b70f5819e245fc3484b8b7fafb21b1bd291/, http://dcl.bas.bg/en/multilingualDictionary_en.html	Dictionary or lexicon with MWEs (also includes MWEs)	Bulgarian, English, German, Romanian, Greek, Polish				Available, restricted use	CC-BY-NC	Creative Commons (CC): http://creativecommons.org/examples		The set of multilingual dictionaries covers all pairs of languages among the following: Bulgarian, English, German, Romanian, Greek, and Polish. The main source of the dictionaries is Wikipedia – translations of article titles and category labels. The dictionaries include single words, MWEs and phrases but are predominantly phrase-to-phrase. The following sets of dictionaries are included in the pack: General bilingual dictionaries for each pair of languages; Bilingual dictionaries of personal names for each pair of languages; Bilingual dictionaries of organisations for each pair of languages; Bilingual dictionaries of toponyms for each pair of languages. The dictionaries are stored in plain text format for easy and flexible storage and processing. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Institute for Bulgarian Language. Contact person: Svetla Koeva	svetla@dcl.bas.bg
52	25/08/2014 15:29:25	Wiki1000+ corpus with annotated MWEs	http://metashare.ilsp.gr:8080/repository/browse/wiki1000-corpus-with-annotated-mwes/a2038f0a6af411e281b65cf3fcb88b704919deb94d474ffc825290985f395f46/, http://dcl.bas.bg/en/wikiCorpus_en.html	Corpus with annotated MWEs	Bulgarian				Available, restricted use	Licence: CC-BY Restrictions: Academic - Non Commercial Use, Attribution Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		Wiki1000+ is a corpus of articles from Wikipedia, compiled for the purposes of the study of multiword expressions (MWEs) in Bulgarian. The Wiki1000+ corpus contains 6,311 text samples with at least 1,000 tokens each, amounting to 13.4 million tokens. The corpus is a part of the Bulgarian National Corpus. Wiki1000+ is annotated with the following linguistic information: sentence boundaries, tokenisation, lemmatisation, POS tagging, and MWE annotation. MWE annotation includes MWE id, labelling the components of the MWE and determining the type of the MWE according to a classification based on idiomaticity. <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Svetla Koeva	svetla@dcl.bas.bg
53	26/08/2014 09:03:11	Bulgarian Sense-Annotated Corpus	http://metashare.ilsp.gr:8080/repository/browse/bulgarian-sense-annotated-corpus/b7d5478666cd11e281b65cf3fcb88b705fc4c009156a4a9499794778d015eaa8/, http://dcl.bas.bg/semcor/en/	Sense-annotated corpus with MWEs	Bulgarian	5797			Available, restricted use	Restrictions: Academic - Non Commercial Use Distribution Access/Medium: Accessible Through Interface			The Bulgarian Sense-annotated Corpus (BulSemCor) contains sense-disambiguated lexical items defined in the context of occurrence. The Bulgarian Sense-annotated Corpus follows the methodology of the Princeton University SemCor. As BulSemCor it consists of excerpts from the Brown Corpus of Bulgarian. Each lexical item (simple word, compound word or multiword expression) is manually assigned the unique semantic or grammatical meaning from the Bulgarian wordnet (BulNet) in the particular context. Contrary to other sense-annotated corpora, the BulSemCor covers both open- and closed-class words and all occurrences of multiword expressions and named entities. The annotated lexical units inherit all the information from the synonym sets in the BulNet, incl. explanatory definition, PoS, usage examples, notes on grammatical, stylistic, and pragmatic properties, and all relations (semantic, morpho-syntactic and extra-linguistic) pertaining to the synset, as well as the semantic and derivational relations pertaining to the literal. The BulSemCor contains 101 062 tokens, 99 480 annotated lexical units - 86 842 single words and 5797 multiword expressions. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Institute for Bulgarian Language. Contact person: Ivelina Stoyanova	dcltools@dcl.bas.bg
54	26/08/2014 09:07:34	Collocation and Term Extractor (CollTerm)	http://metashare.ilsp.gr:8080/repository/browse/collocation-and-term-extractor/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75/, http://www.nljubesic.net/resources/tools/collterm/	tool	Language independent				Available, unrestricted use	Apache Licence 2.0 Restrictions: Inform Licensor Execution location: hidden Distribution Access/Medium: Downloadable	Apache		CollTerm is a language independent tool for collocation and term extraction. It is an application that collects collocation and term candidates based on five different co-occurrence measures for multiword units (i.e. collocations) or on distributional differences from a large representative corpus, by applying the TF-IDF measure to single-word units. The language dependent part consists of a stop-word list and a list of MWU MSD-patterns that can be coded with regular expressions as well. The application is described in the paper presented at TKE2012 by Pinnis, M., Ljubešić, N., Ştefănescu, D., Skadiņa, I, Tadić, Gornostay, T. Term Extraction, Tagging, and Mapping Tools for Under-Resourced Languages. The first version of this application is available as an integral part of ACCURAT Toolkit that is available under Apache 2.0 license (http://www.accurat-project.eu/index.php?p=accurat-toolkit). In this version of the tool a calibration of MWU MSD-patterns has been provided for Croatian, thus enhancing the usability of the tool. The plan is to provide calibration for other CESAR languages as well.		Yes (click continue to fill in more information)	IPR holder: University of Zagreb, Faculty of Humanities and Social Sciences, c/o Marko Tadić. Contact person: Nikola Ljubešić	nljubesi@ffzg.hr
55	26/08/2014 09:10:38	Dictionary of Neologisms in Bulgarian Language	http://metashare.ilsp.gr:8080/repository/browse/dictionary-of-neologisms-in-bulgarian-language/7ad446f268ad11e281b65cf3fcb88b70dd4a3a216cb34a998c25fda3d4e70b2a/, http://infolex.ibl.bas.bg/PhrasThes/searchNeologPage.seam?cid=17	Dictionary or lexicon with MWEs (also includes MWEs)	Bulgarian	160			Available, restricted use	CC - BY - NC - ND Restrictions: Academic - Non Commercial Use, No Redistribution Download location: hidden Distribution Access/Medium: Accessible Through Interface	Creative Commons (CC): http://creativecommons.org/examples		The Dictionary of Neologisms in Bulgarian Language contains over 2,200 new words and 160 new multiword units (compounds and terminological units) that have entered the Bulgarian language in the past 20 years. Each entry contains information about: part-of-speech (for lexemes); origin (for borrowed words); stylistic and grammatical notes; lexical meaning of the unit; synonyms and antonyms (if available). If necessary, short examples (phrases or sentences) are given to illustrate the use of the neologism in context. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Institute for Bulgarian Language. Contact person: Diana Blagoeva	d.blagoeva@ibl.bas.bg
56	26/08/2014 09:13:13	Java version of NooJ (JavaNooJ)	http://metashare.ilsp.gr:8080/repository/browse/java-version-of-nooj/2f8caa506aff11e2aedc000423bfd61c0a125e4434514b43ba542943a6108ec7/, http://www.nooj4nlp.net/pages/download.html	tool	Language independent				Available, restricted use	GPL Restrictions: Academic - Non Commercial Use Fee: no price Download location: hidden Distribution Access/Medium: Downloadable	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		NooJ is a linguistic development environment that allows linguists to formalize several levels of linguistic phenomena: typography and spelling; lexicons of simple words, multiword units and discontinuous expressions; inflectional, derivational and productive morphology; local and structural syntax, transformational and semantic analysis and generation. For each of these levels NooJ provides linguists with one formal framework specifically designed to facilitate the description of each phenomenon, as well as parsing/development/debugging tools designed to be as computationally efficient as possible, from Finite-State machines to Turing machines. This approach distinguishes NooJ from other computational linguistic frameworks which provide a unique formalism based on a compromise between power and efficiency. As a corpus processing tool, NooJ allows all researchers and professionals to extract information from general or technical corpora by applying sophisticated queries based on concepts rather than word forms and build indices, add semantic annotations, perform statistical analyses, etc. The Java version of NooJ is open source software and works on all operating systems. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Max Silberztein	elliadd@univ-fcomte.fr
57	26/08/2014 09:16:40	MONO version of NooJ (MONONooJ)	http://metashare.ilsp.gr:8080/repository/browse/mono-version-of-nooj/fc91787a6b7f11e29f6e000423bfd61cad17bb05bcbd470da8cec4ebdda3481e/, http://www.nooj4nlp.net/pages/download.html	tool	language independent				Available, restricted use	MS - NC - No ReD - ND Restrictions: Academic - Non Commercial Use, No Derivatives, No Redistribution Fee: no price Download location: hidden Distribution Access/Medium: Downloadable	META-SHARE: http://www.meta-net.eu/meta-share/licenses		NooJ is a linguistic development environment that allows linguists to formalize several levels of linguistic phenomena: typography and spelling; lexicons of simple words, multiword units and discontinuous expressions; inflectional, derivational and productive morphology; local and structural syntax, transformational and semantic analysis and generation. For each of these levels NooJ provides linguists with one formal framework specifically designed to facilitate the description of each phenomenon, as well as parsing/development/debugging tools designed to be as computationally efficient as possible, from Finite-State machines to Turing machines. This approach distinguishes NooJ from other computational linguistic frameworks which provide a unique formalism based on a compromise between power and efficiency. As a corpus processing tool, NooJ allows all researchers and professionals to extract information from general or technical corpora by applying sophisticated queries based on concepts rather than word forms and build indices, add semantic annotations, perform statistical analyses, etc. The MONO version of NooJ runs on all platforms that support MONO. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Max Silberztein	Max.Silberztein@univ-fcomte.fr
58	26/08/2014 09:20:13	PANACEA Environment Bilingual Glossary French-to-English (ENV Glossary FR-EN)	http://metashare.ilsp.gr:8080/repository/browse/panacea-environment-bilingual-glossary-french-to-english/f333e5a6bbb611e28763000c291ecfc880c9eb1f3a94470eb387e97674e2bcac/, http://hdl.handle.net/10230/19969	Dictionary or lexicon with MWEs (also includes MWEs)	French, English				Available, unrestricted use	CC - BY Download location: hidden Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		This glossary contains terminology in French-to-English, with a focus on environmental terms, resulting from PANACEA research. It contains about 3846 entries, both single words and multiwords, with part-of-speech information, manually validated. <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact persons: Gregor Thurmair, Vera Aleksić	info@linguatec.de
59	26/08/2014 09:23:47	PANACEA Environment Multi Word Italian Lexicon	http://metashare.ilsp.gr:8080/repository/browse/panacea-environment-multi-word-italian-lexicon/f8769888bbb611e28763000c291ecfc8297387836d4e4c379114c193ecd3cc85/, http://hdl.handle.net/10230/20182	MWE dictionary or lexicon (MWEs only)	Italian				Available, restricted use	CC - BY - NC Download location: hidden Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		The Environment MW Italian Lexicon is a lexicon of noun-noun multiword expressions automatically extracted from a 36-million-word web-crawled corpus in the environmental domain. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu). <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Monica Monachini	monica.monachini@ilc.cnr.it	Frontini F., Quochi V., Rubino F. (2012) “Automatic Creation of quality Multi-word Lexica from noisy text data” Proceedings of the Sixth Workshop on Analytics for Noisy Unstructured Text Data. COLING 2012. Mumbai, India. Quochi, Valeria, Frontini, Francesca and Rubino Francesco (2012). A MWE Acquisition and Lexicon Builder Web Service. Proceedings of the COLING 2012. Mumbai, India.
60	26/08/2014 09:27:47	PANACEA Labour Legislation Bilingual Glossary French-to-English (LAB Glossary FR-EN)	http://metashare.ilsp.gr:8080/repository/browse/panacea-labour-legislation-bilingual-glossary-french-to-english/f6aea810bbb611e28763000c291ecfc8dc31bd362d6b43dbb6c42a1b69cb1c0f/, http://hdl.handle.net/10230/19988	Dictionary or lexicon with MWEs (also includes MWEs)	French, English				Available, unrestricted use	CC - BY Download location: hidden Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		This glossary contains terminology in French-to-English, with a focus on labour legislation terms, resulting from PANACEA research. It contains about 2441 entries, both single words and multiwords, with part-of-speech information, manually validated. <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact persons: Gregor Thurmair, Vera Aleksić	info@linguatec.de	Thurmair; Gr., Aleksić, V., 2012: Creating Term and Lexicon Entries From Phrase Tables. Proc. EAMT Trento
61	26/08/2014 09:30:56	PANACEA Labour Multi Word Italian Lexicon	http://metashare.ilsp.gr:8080/repository/browse/panacea-labour-multi-word-italian-lexicon/f8d9d876bbb611e28763000c291ecfc853a5588bfab84b39a330e1b4220b2d83/, http://hdl.handle.net/10230/20177	MWE dictionary or lexicon (MWEs only)	Italian				Available, restricted use	CC - BY - NC Download location: hidden Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		The Labour MW Italian Lexicon is a lexicon of noun-noun multiword expressions automatically extracted from a 70-million-word web-crawled corpus in the labour law domain. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu). <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Monica Monachini	monica.monachini@ilc.cnr.it	Frontini F., Quochi V., Rubino F. (2012) “Automatic Creation of quality Multi-word Lexica from noisy text data” Proceedings of the Sixth Workshop on Analytics for Noisy Unstructured Text Data. COLING 2012. Mumbai, India. Quochi, Valeria, Frontini, Francesca and Rubino Francesco (2012). A MWE Acquisition and Lexicon Builder Web Service. Proceedings of the COLING 2012. Mumbai, India.
62	26/08/2014 09:35:52	Serbian NooJ module (SrpNooJ)	http://metashare.ilsp.gr:8080/repository/browse/serbian-nooj-module/0d68b2f28b3411e2ab9f001517144592e9978ff1de0d4abebd4d6c8935fcb9af/, http://www.nooj4nlp.net/pages/resources.html	Lexical conceptual resource with examples of MWEs	Serbian				Available, unrestricted use	CC - BY Restrictions: Attribution Fee: no price Download location: hidden Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		Serbian NooJ module (SrpNooJ) was produced in the scope of the EU-funded CESAR project. It consists of a set of resources in both alphabets that are in use for Serbian: Cyrillic and Latin. Each set consists of: the dictionary properties’ definition file (metadata); one text, the novel “Dva carstva” (Two empires) by the Serbian author Branimir Ćosić, comprising 106684 tokens; a sample dictionary in readable form with 35 lemmas that belong to 9 grammatical classes, with examples of multiword units and derivational morphology; a sample of morphological grammars used for lemmas from the sample dictionary (three for simple nouns, two for adjectives, two for verbs, and one for a multiunit noun); a readable sample dictionary of inflected forms automatically produced from the sample dictionary of lemmas and the sample morphological grammars; a syntactic grammar for recognition of one class of named entities (full personal names with their roles or functions); a full compiled dictionary (divided into three files: nouns, verbs, and other). It comprises 85868 entries: nouns (40886), adjectives (25558), verbs (15366), and other (4058). <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Miloš Utvić	misko@matf.bg.ac.rs	Samples location: http://www.nooj4nlp.net/pages/resources.html
63	26/08/2014 09:38:42	Shallow Grammar for the National Corpus of Polish (NKJPGrammar)	http://metashare.ilsp.gr:8080/repository/browse/shallow-grammar-for-the-national-corpus-of-polish/f95d762a6aff11e284b6000423bfd61c5891270890b246d88f606717f0ce6ea7/, http://clip.ipipan.waw.pl/LRT?action=AttachFile&do=view&target=gramatyka_Spejd_NKJP_1.0.zip	grammar	Polish				Available, restricted use	GPL Restrictions: Share Alike Fee: free of charge Download location: hidden Distribution Access/Medium: Downloadable	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		Shallow Grammar for the National Corpus of Polish is a set of rules which was used for the automatic pre-annotation of the National Corpus of Polish at the syntactic level. It was constructed manually and encoded in the shallow parsing system Spejd (http://nlp.ipipan.waw.pl/Spejd/). It consists of 1187 rules for multiword entities, abbreviations, syntactic words, and syntactic groups. <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Katarzyna Głowińska	k.glowinska@gmail.com	Głowińska K., Przepiórkowski A. The Design of Syntactic Annotation Levels in the National Corpus of Polish. In: LREC 2010 proceedings. Waszczuk, J., Głowińska, K., Savary, A., Przepiórkowski, A. Tools and Methodologies for Annotating Syntax and Named Entities in the National Corpus of Polish. In: Proceedings of Computational Linguistics – Applications (CLA 2010), Workshop at IMCSIT 2010, Wisła, Poland, October 18-20.
64	26/08/2014 09:44:02	CST Tokeniser (rtfreader)	http://metashare.ilsp.gr:8080/repository/browse/cst-tokeniser/fc95a26642cf11e28d2f0050569b00008e521dc7f3a24ee48b252a6001e61201/, https://github.com/kuhumcst/rtfreader	tool	Danish				Available, restricted use	GPL Download location: hidden Distribution Access/Medium: Downloadable	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		Sentence segmenter. Optional tokenisation, MWU-recognition and recognition of abbreviations. Input from RTF (rich text) or flat text. In the case of RTF, layout and style info is used to recognise and properly treat e.g. head lines and bulleted lists. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: University of Copenhagen. Contact person: Bart Jongejan	bartj@hum.ku.dk	https://github.com/kuhumcst/rtfreader/blob/master/README.md
65	26/08/2014 17:48:59	Oxford Collocations Dictionary	http://abloz.com/huzheng/stardict-dic/dict.org/stardict-OxfordCollocationsDictionary-2.4.2.tar.bz2	MWE dictionary or lexicon (MWEs only)	English	8378			Available, unrestricted use		GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html				No (click continue to submit)
66	28/08/2014 20:16:48	British English Source Lexicon (BESL) version 2.2	http://metashare.ilsp.gr:8080/repository/browse/british-english-source-lexicon-besl-version-22/dc410e62de6811e2b1e400259011f6eaff8112b159c346f8a910378af93ece2a/, http://catalog.elra.info/product_info.php?products_id=834	Dictionary or lexicon with MWEs (also includes MWEs)	English (British)	58,000 multi-word compound nouns		Only contiguous	Available, restricted use	ELRA END USER Restrictions: Academic - Non Commercial Use For Members of ELRA ELRA END USER Restrictions: Academic - Non Commercial Use For Non Members of ELRA	ELRA		BESL is a complete database of the English lexicon. It consists of over 230,000 lemmas, over 350,000 word forms, 60,000 proper nouns, 3,000 abbreviations, and 58,000 multi-word compound nouns. Each headword is provided with a full listing of all inflected forms and other morphological variation. Every word form is marked for part of speech (using Penn TreeBank notation). Most single-word forms include a representation of IPA pronunciation. BESL covers both British and American English, and other spelling variants, with cross-references between corresponding forms. Each lemma is graded on a scale between 1 and 9 to indicate frequency, based on corpus evidence. Lemmas are also classified by domain, where appropriate (e.g. Computing, Religion). Obscene or offensive lemmas are marked using a 2-grade system. Proper name lemmas in BESL include personal names, surnames, place names, and brand names. BESL is provided in XML. <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org						compound nouns
67	28/08/2014 20:23:57	CINTIL-Corpus Internacional do Português	http://metashare.ilsp.gr:8080/repository/browse/cintil-corpus-internacional-do-portugues/fe32ebf2485511e2a2aa782bcb074135aa0fdcd287ac45e7b67de9c36d8d2890/, http://catalog.elra.info/product_info.php?products_id=1102, http://cintil.ul.pt/	Corpus with annotated MWEs	Portuguese				Unknown	Not Available Through Meta Share ELRA END USER Restrictions: Academic - Non Commercial Use	ELRA		CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. At present it is composed of 1 million annotated tokens, verified by human expert annotators. The annotation comprises information on part-of-speech, open classes lemma and inflection, multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition). The corpus has been developed at the University of Lisbon by the NLX group at the Faculty of Sciences and the Anagrama group at the Centro de Linguística da Universidade de Lisboa. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holders: António Branco, Amália Mendes	antonio.branco@di.fc.ul.pt	Florbela Barreto, António Horta Branco, Aida Cardoso, Amália Mendes, Fernanda Bacelar Nascimento, Raïssa Gillier and João Silva, CINTIL Corpus Internacional do Português: Annotation Manual, v. 7.0, 2012					multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition).
68	01/09/2014 13:04:01	Italian Syntactic-Semantic Treebank (ISST)	http://metashare.ilsp.gr:8080/repository/browse/italian-syntactic-semantic-treebank-isst/ccc16e0ede7311e2b1e400259011f6eafc6f8055ac6343659ae911e80a008400/, http://catalog.elra.info/product_info.php?products_id=887	Treebank with MWE annotations	Italian				Available, restricted use	Several licenses for different uses (academic/commercial) and users (ELRA members/non-members).	ELRA		ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in XML. ISST has a five-level structure covering orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over two different levels: the constituent structure level and the functional relations level. The fifth level deals with lexico-semantic annotation, which is carried out in terms of sense tagging of lexical heads (nouns, verbs and adjectives) augmented with other types of semantic information: ItalWordNet (see ELRA-M0018) is the reference lexical resource used for the sense tagging task. Both syntactic and lexico-semantic annotations refer to the morpho-syntactically annotated text, which in turn is linked to the orthographic file with the text and mark-up of macrotextual organisation (e.g. titles, subtitles, summary, body of article, paragraphs). <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org	http://www.aclweb.org/anthology/W00-1903					Several. (The ones mentioned in the article added in the "Relevant publications" field are compounds, support verb constructions and idioms.)	The description field of the META-SHARE record contains detailed information about the features of the treebank.	The adopted morpho-syntactic annotation scheme conforms to the EAGLES international standard.The ISST functional annotation scheme is based on FAME (Lenci et al. 1999, 2000).	
	ISST corpus consists of about 300,000 word tokens reflecting contemporary language use. It includes two different sections: 1) a "balanced" corpus, testifying general language usage, for a total of about 210,000 tokens; 2) a specialised corpus, amounting to 90,000 tokens, with texts belonging to the financial domain. The balanced corpus contains a selection of articles from different types of Italian texts, namely newspapers (La Repubblica and Il Corriere della Sera) and a number of different periodicals which were selected to cover a high variety of topics (politics, economy, culture, science, health, sport, leisure, etc.). The financial corpus includes articles taken from Il Sole-24 Ore. All in all, they cover a 10 year time period (1985-1995). (Copied from http://www.aclweb.org/anthology/W00-1903)
69	01/09/2014 13:24:25	LX-Stopwords	http://metashare.ilsp.gr:8080/repository/browse/lx-stopwords/29892e16a35a11e1a404080027e73ea22e53349e39f348a7944b0b5bef6e9c41/, http://nlx.di.fc.ul.pt/	List of stopwords with MWEs	Portuguese	173			Available, unrestricted use	Restrictions: Academic - Non Commercial Use, Commercial Use User Nature: Academic, Commercial Distribution Access/Medium: Downloadable			The LX-Stopwords resource is a manually compiled list of Portuguese words, composed of 2631 words of 51 types. The words are grouped in three big classes, arranged according to their morpho-syntactic category and inflectional feature value (closed classes, open classes, and multi-word units). This list was created as a support resource to develop the CRIVO/EtiFac tool (see Branco & Silva, 2001), a tool for the semiautomatic annotation of corpora. With this in mind, the list seeks to be as exhaustive a repository as possible of all word forms that belong to closed classes, items typically with high frequency and fixity. Taking into account the ambiguity between words of different categories, which means that some words from closed classes (1866 words) can belong to other categories, two classes were added to the list: open classes (592 words) and multi-word units (173 words), including only the words already contained in closed classes. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: University of Lisbon, Faculty of Sciences. Licensor, distribution rights holder and contact person: António Branco	antonio.branco@di.fc.ul.pt	Catarina Carvalheiro, LX-Stopwords Narrative Description: http://194.117.45.196:2000/LX-Stopwords.pdf. In Proceedings Branco & Silva, EtiFac: A Facilitating Tool for Manual Tagging, pp. 
81-90, Proceedings of XVII Encontro Anual da APL, 2001: http://www.di.fc.ul.pt/~ahb/nexing/main.htm										Samples location: http://194.117.45.196:2000/stopwordssample.txt <entries> <sub-class>_QNT#ms</sub-class> <list> <stopword>algum</stopword> <stopword>certo</stopword> <stopword>imenso</stopword> <stopword>muito</stopword> <stopword>nenhum</stopword> <stopword>numeroso</stopword> <stopword>pouco</stopword> <stopword>tanto</stopword> <stopword>todo</stopword> </list> </entries> <entries> <sub-class>_REL</sub-class> <list> <stopword>como</stopword> <stopword>onde</stopword> <stopword>que</stopword> <stopword>quem</stopword> <stopword>quê</stopword> </list> </entries>
70	01/09/2014 13:32:23	New Oxford Dictionary of English, 2nd Edition (NODE)	http://metashare.ilsp.gr:8080/repository/browse/new-oxford-dictionary-of-english-2nd-edition/9460637ede6b11e2b1e400259011f6ea58609ecf25e1458f8e72077ed6ad7a70/, http://catalog.elra.info/product_info.php?products_id=679	Dictionary or lexicon with MWEs (also includes MWEs)	English	More than 10 000			Available, restricted use	ELRA END USER Restrictions: Academic - Non Commercial Use For Members of ELRA User Nature: Academic ELRA END USER Restrictions: Academic - Non Commercial Use For Non Members of ELRA User Nature: Academic	ELRA		This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications: It is available in XML or SGML. - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc. <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org						phrasal verbs and other multi-word phrases
71	01/09/2014 13:37:28	Oxford English phonetics files	http://metashare.ilsp.gr:8080/repository/browse/oxford-english-phonetics-files/e986bb8ede6911e2b1e400259011f6eacf808bda74be4dc4879f8d2cf624cc4a/, http://catalog.elra.info/product_info.php?products_id=845	Lists of word forms together with a representation of their IPA pronunciation (with MWEs)	English	"a large number"			Available, restricted use	ELRA END USER Restrictions: Academic - Non Commercial Use For Members of ELRA User Nature: Academic ELRA END USER Restrictions: Academic - Non Commercial Use For Non Members of ELRA User Nature: Academic	ELRA		Derived from a range of Oxford Dictionaries, these files list word forms together with a representation of their IPA pronunciation. It contains 250,000 words. Pronunciation is based on standard British English. Word forms include dictionary lemmas and inflections or other morphological variations, plus a wide range of proper name and encyclopedic material. The data also includes a large number of common multi-word phrases and compound nouns. The files are provided in XML. <Description from META-SHARE>		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org						common multi-word phrases and compound nouns				Dictionary, Derived from a range of Oxford Dictionaries
72	01/09/2014 13:50:11	SEJFEK4Spejd	http://metashare.ilsp.gr:8080/repository/browse/sejfek4spejd/07c31f266b0011e284b6000423bfd61c7e50e1c0b2b74065adaf52ade4365eeb/, http://zil.ipipan.waw.pl/SEJFEK4Spejd	Shallow grammar of multi-word economic terms	Polish	11,270 automatically generated rules			Available, restricted use	CC - BY - SA Restrictions: Attribution, Share Alike Fee: free of charge Download locations: hidden Distribution Access/Medium: Downloadable GPL Restrictions: Share Alike Fee: free of charge Download location: hidden Distribution Access/Medium: Downloadable	CC and GPL		SEJFEK4Spejd is the SEJFEK lexicon (Grammatical Lexicon of Polish Economical Phraseology) converted into a lexicalized Spejd shallow grammar. It contains 11,270 automatically generated rules which recognize inflected, case-insensitive multi-word economic terms from the lexicon. Recognized multi-word terms are combined into syntactic words. During the analysis disambiguation (unification and POS-based selection of interpretations) of terms is also performed. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR and distribution rights holders: Institute of Computer Science, Polish Academy of Sciences. Contact persons: Bartosz Zaborowski, Aleksandra Wieczorek	bartosz.zaborowski@ipipan.waw.pl	SAVARY, A., ZABOROWSKI, B., KRAWCZYK-WIECZOREK, A., MAKOWIECKI, F. (2012): SEJFEK — a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units, in Proceedings of Cognitive Aspects of the Lexicon (COGALEX-III), a Workshop at COLING 2012, Mumbai, India.
73	02/09/2014 08:16:19	Slovene Lexical Database (SLD)	http://eng.slovenscina.eu/spletni-slovar/leksikalna-baza	Dictionary or lexicon with MWEs (also includes MWEs)	Slovene	2,500 headwords with 10.946 lexical units, including: - 2.053 multi-word lexical units (non-idiomatic lexemes) - 1.446 phraseological units (idiomatic lexemes) - 44.626 collocations (2 content words) - 4.602 extended collocations (more than 2 content words) - 8.298 syntactic combinations	14	Also non-contiguous	Available, unrestricted use	CC BY-NC-SA 2.5	Creative Commons (CC): http://creativecommons.org/examples		SLD is a lexical database with a comprehensive syntactic and semantic description of some of the most common Slovene nouns, verbs, adjectives and adverbs. It was designed as a reference dictionary for the general public, the school population and linguists; however, the encoded syntactic structures and patterns for each registered sense of the word also present an important lexical resource for NLP applications. The database is conceptualized as a network of interrelated lexico-grammatical information (sense, syntax, collocations, examples), with the lemma (headword) representing the top hierarchical level and functioning as the umbrella for all lexical units placed under it. MWEs are included as either independent lexical units (multi-word units, phraseological units) or as an integral part of other lexical units (collocations, extended collocations, syntactic combinations), depending on their degree of semantic compositionality.		Yes (click continue to fill in more information)	Ministry of Education, Science and Sport	info@slovenscina.eu	GANTAR, Polona, KREK, Simon, 2011: Slovene lexical database. In: Majchraková, D., Garabík, R. (eds.). Natural language processing, multilinguality: sixth international conference, Modra, Slovakia, 20-21 October 2011. pp. 72-80. KREK, Simon, 2012: New Slovene sketch grammar for automatic extraction of lexical data. 
SKEW3, 3rd International Sketch Engine Workshop, 21-22 March 2012, Brno, Czech Republic.	both	2.053 multi-word units (non-idiomatic lexemes) 1.446 phraseological units (idiomatic lexemes) 44.626 collocations (2 content words) 4.602 extended collocations (more than 2 content words) 8.298 syntactic combinations	2.386 multi-word units 2.175 phraseological units 102.292 collocations 8.420 extended collocations 10.789 syntactic combinations		1. collocations (2 content words) 1.1. extended collocations (more than 2 content words) 2. syntactic combinations (semantically transparent, structurally fixed), i.e. combinations with numeric elements, combinations with proper nouns, combinations with prepositions, coordinate structures, similes and analogies 3. multi-word lexical units (at least partly semantically opaque, structurally fixed, not idiomatic) 4. phraseological units (completely semantically opaque, idiomatic meaning is marked)	Automatically extracted from reference 1-billion word Gigafida corpus of written Slovene (http://eng.slovenscina.eu/korpusi/gigafida) and manually validated.			Corpus	0. headword: davek (tax, n) 1. collocations: [prometni, vstopni, plačani] davek, [visok] davek; [odmera, uvedba, plačevanje] davka, [utaja] davkov; [plačevati, pobirati, znižati] davke etc. 2. syntactic combinations: biti oproščen davka; cena brez davka; davek po odbitku etc. 3. multi-word lexical units: davek na dodano vrednost 4. phraseological units: krvni davek
74	02/09/2014 08:22:16	ssj500k dependency treebank	http://eng.slovenscina.eu/tehnologije/ucni-korpus	Treebank with MWE annotations	Slovene	- 500.000 tokens with lemmas and POS tags - 235.000 tokens with dependency tree links, i.e. 11.000 dependency parsed sentences - 4.398 named entities (including multi-word NEs)	23	Only contiguous	Available, unrestricted use	CC BY-NC-SA 2.5	Creative Commons (CC): http://creativecommons.org/examples		The ssj500k treebank was built as a training corpus for machine-learning NLP applications and includes balanced sampled texts from the reference corpus of written Slovene. All texts have been manually segmented, tokenized and annotated in terms of lemmatization, morphosyntactic tagging, dependency parsing (approx. 1/2) and named entity identification (approx. 1/5). Currently, only multi-word named entities (personal names, place names, organisation names and proper names) are explicitly annotated.		Yes (click continue to fill in more information)	Ministry of Education, Science and Sport	info@slovenscina.eu			1483	1709		1. "geo": place name (215) 2. "org": name of organisation (392) 3. "person": name of person (814) 4. "other": proper names (288)				Corpus	<name type="other"> Grand ssj4.15.54.t16 grand Grand Npmsn Slmei National ssj4.15.54.t17 national National Npmsn Slmei </name>
75	02/09/2014 10:52:18	Serbian Wordnet (SrpWN)	http://metashare.ilsp.gr:8080/repository/browse/serbian-wordnet/e3c4ffae8bde11e288f7001517144592cf4cb1f92d7644319d6c1d339f4d0229/, http://korpus.matf.bg.ac.rs/SrpWN	Wordnet with MWEs	Serbian	10164			Available, restricted use	CC - BY - NC Restrictions: Academic - Non Commercial Use Fee: no price Download location: hidden Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		Serbian WordNet (SrpWN) represents a lexical semantic network, containing synsets with glosses and various semantic relations, such as antonymy, meronymy, causation, category domain, etc. The initial version of the Serbian Wordnet was produced in the scope of the EU-funded Balkanet project and it contains all synsets from basic concept sets 1 and 2, and two thirds of synsets from basic concept set 3. Through interlingual relations it is connected to English Wordnet (versions 2.0 and 3.0) and wordnets of many other languages. Currently the Serbian Wordnet contains 18,366 synsets (literals 31,274): 1380 adjectives (literals 1887), 2104 verbs (literals 3918), 14,765 nouns (literals 25,298), other 117. 706 synsets are not connected to the PWN, being either Balkan specific concepts (532) or Serbian specific concepts (174). 18,310 synsets have definitions in Serbian, and 1,274 have examples of usage. Semantic relations in SrpWN: hypernym - 16,590; holo_part - 1,298; holo_member - 3,831; holo_portion - 118; near_antonym - 736; be_in_state - 252; causes - 63. From 31,274 literals in SrpWN 10,164 are multi-word units. <Description from META-SHARE>		Yes (click continue to fill in more information)	IPR holder: Cvetana Krstev	cvetana@matf.bg.ac.rs	C. Krstev and B. Djordjević and S. Antonić and N. Ivković-Berček and Z. Zorica and V. Crnogorac and L. Macura, "Cooperative Work in Further Development of Serbian Wordnet," INFOtheca, vol. 9, pp. 59a-78a, May 2008. 
Cvetana Krstev, Ivan Obradović, Duško Vitas, “An Approach to the Development of Language Specific Concepts in Wordnets”, In Southern Journal of Linguistics, Special Theme: South Slavic and Balkan Languages, Mila Dimitrova-Vulchanova (ed.), Vol. 29, No. 1/2, pp. 106-118, Department of Modern Linguistics, University of Mississippi, 2008. (More references listed in META-SHARE)
76	02/09/2014 11:19:55	The database of Estonian multi-word expressions (ESTMWE)	http://metashare.ilsp.gr:8080/repository/browse/the-database-of-estonian-multi-word-expressions/4d8252e8463411e2a6e4005056b400243ed5ec91ec5044bbb0e85b2ce16f472b/, https://metashare.ut.ee/repository/browse/the-database-of-estonian-multi-word-expressions/4d8252e8463411e2a6e4005056b400243ed5ec91ec5044bbb0e85b2ce16f472b/, http://www.cl.ut.ee/ressursid/pysiyhendid/index.php?lang=en	MWE dictionary or lexicon (MWEs only)	Estonian	12500			Available, unrestricted use	Proprietary Restrictions: Academic - Non Commercial Use Distribution Access/Medium: Accessible Through Interface CC - BY Restrictions: Attribution Distribution Access/Medium: Downloadable	Creative Commons (CC): http://creativecommons.org/examples		This database contains a subtype of multi-word expressions, namely those consisting of a verb and a particle or a verb and its complements.		Yes (click continue to fill in more information)	IPR holder: Tartu Ülikool, University of Tartu. Contact person: Kadri Muischnek		Documentation of information recorded in the database, references, etc: http://www.cl.ut.ee/ressursid/pysiyhendid/index.php?lang=en					This database contains a subtype of multi-word expressions, namely those consisting of a verb and a particle or a verb and its complements. The expressions consisting of a verb and its subject are not included. The multi-word units consisting of a verb and an infinite form of a verb are included irregularly. Subtypes: yv – particle verb nv – expression consisting of a noun (phrase) and a verb; could be divided further into idiomatic expressions and collocations tv – support verb construction av – catenative verb construction
77	02/09/2014 12:12:00	THAMUS lexicons	http://metashare.ilsp.gr:8080/repository/search/?q=thamus	Monolingual and bilingual lexicons with MWEs	Italian, German >Italian, Italian>German, English>Italian, Italian>English				Available, restricted use	Several licenses for different uses (academic/commercial) and users (ELRA members/non-members).	ELRA		28 generic and technical (domain-specific) mono- and bilingual dictionaries. Multi-word terms contain morphological coding for the head word. A full list of the dictionaries, including dictionary name, (the name reflects both domain and linguality), ELRA ID, ELRA catalogue URL, a specification of domain and whether the entries are in canonical or inflected form, the number of entries and language(s), will be made available at the PARSEME WG1 wiki.		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org
78	04/09/2014 06:59:31	English NN compounds	http://www.csse.unimelb.edu.au/research/lt/resources/ncompound/ncompound.tgz	Monolingual list of MWEs	English	Total instances: 2169. Test instances: 1081 (file name: test). Training instances: 1088 (file name: train). Semantic relations: 20	2	Only contiguous	Available, unrestricted use	This dataset is made available under the terms of the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/), with attribution via citation of the following paper, which describes the dataset in full detail: Kim, Su Nam and Timothy Baldwin (2008) Standardised Evaluation of English Noun Compound Interpretation, In Proceedings of LREC 2008 Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp. 39-42. The paper can be found in the PDF at: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf	Creative Commons (CC): http://creativecommons.org/examples		This tarball contains the set of noun-noun compounds annotated for semantic relation originally presented in: Kim, Su Nam and Timothy Baldwin (2005) Automatic Interpretation of Noun Compounds using WordNet Similarity, In Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP-05), Jeju, South Korea, pp. 945-56. <From README.txt in tar folder>		Yes (click continue to fill in more information)	Tim Baldwin and Su Nam Kim	tb@ldwin.net	Kim, Su Nam and Timothy Baldwin (2008) Standardised Evaluation of English Noun Compound Interpretation, In Proceedings of LREC 2008 Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp. 39-42. 
The paper can be found in the PDF at: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf http://people.eng.unimelb.edu.au/tbaldwin/pubs/nlpke2008.pdf					Noun-noun compounds	Semantic relations between the components of the compound				FORMAT: Format for each instance in the test and training data: NOUN1 NOUN2 RELATION (e.g. "apple pie material" => NC = "apple pie", with semantic relation "material", indicating that the "pie" is made of "apple") SEMANTIC RELATIONS: The semantic relations are as defined in: Barker, Ken and Stan Szpakowicz (1998) Semi-automatic recognition of noun modifier relationships. In Proceedings of the 17th International Conference on Computational Linguistics (COLING 1998), Montreal, Canada, pp. 96-102.
79	04/09/2014 08:17:48	English compound nominalisation interpretation dataset	http://www.csse.unimelb.edu.au/research/lt/resources/nominalisation/nominalisation.tgz	Monolingual list of MWEs	English	464 compounds in total			Available, unrestricted use	This dataset is made available under the terms of the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/), with attribution via citation of the following paper, which describes the dataset in full detail: Nicholson, Jeremy and Timothy Baldwin (2008) Interpreting Compound Nominalisations, In Proceedings of LREC 2008 Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp. 43-45. The paper can be found in the PDF at: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf	Creative Commons (CC): http://creativecommons.org/examples		The dataset is based on a random sample of 1000 sentences from the BNC. 3 annotators independently identified all binary compound nouns in the dataset, and got together to resolve any disagreements. This led to a total of 464 compound nouns, of which 119 consisted of one or more proper nouns and were excluded (and are tagged as "PN"), leaving 345 compound nouns. Each of these was then again multiply annotated according to the 5-way classification of SUB, DOB, POB, NA or NV, as described below: SUB: the head noun is deverbal, and the modifier corresponds to the subject of the base verb (e.g. "student demonstration", interpreted as "_student(s)_ _demonstrate_") OBJ: the head noun is deverbal, and the modifier corresponds to the object of the base verb (e.g. "eye irritation", interpreted as "[SOMETHING] _irritates_ the _eye_") POBJ: the head noun is deverbal, and the modifier corresponds to a prepositional argument of the base verb (e.g. 
"bird cage", interpreted as "_cage_ for _bird_") NA: the head noun is deverbal, but the modifier is not an argument of the base verb in an acceptable paraphrase (e.g. "memory size", where "size" can be interpreted as being deverbal, but not meaningfully in this context) NV: the head noun is not deverbal (e.g. "scout hut") In the case that the head noun is (potentially) deverbal, the base verb is provided.		Yes (click continue to fill in more information)	Jeremy Nicholson, Tim Baldwin	tb@ldwin.net	Nicholson, Jeremy and Timothy Baldwin (2008) Interpreting Compound Nominalisations, In Proceedings of LREC 2008 Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp. 43-45. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf		The dataset is based on a random sample of 1000 sentences from the British National Corpus (BNC: Burnard (2000)). 32% of the sentences were found to contain at least one compound noun, with 464 compounds in total. About a quarter (119) of these were identified as containing one or more proper nouns.			Compound nominalisations					SAMPLE ANNOTATION: Annotated examples include: <doc> Demand for the new car is strongest in large urban areas like New <cn rel="PN" hvf="">York city</cn> , Los Angeles and Miami , where bomb ings , riots and car-jackings fill the <cn rel="NA" hvf="bulletin">news bulletins</cn> . </doc> where "news bulletin" has been identified as a compound noun, the head noun has been identified as deverbal (base verb = "bulletin"), and the noun compound type has been tagged as "NA"; <doc> Demand for the new car is strongest in large urban areas like New <cn rel="PN" hvf="">York city</cn> , Los Angeles and Miami , where bomb ings , riots and car-jackings fill the <cn rel="NA" hvf="bulletin">news bulletins</cn> . 
</doc> where "York city" has been identified as a compound noun but tagged as incorporating a proper noun ("PN"), and "news bulletin" has also been identified as a compound noun, with "bulletin" being derived from the base verb "bulletin" but the compound type again being "NA"; and <doc> During my first attack I experienced some very inaccurate <cn rel="POB" prep="in" com="ADJ" hvf="fire">return fire</cn> which ceased just before I broke away . </doc> where "return fire" is a compound noun, "fire" is the base verb of the head noun, and the modifier is a prepositional object ("POB") of the head noun, where the preposition is "in" (i.e. the interpretation is of the form "fire in return").
80	04/09/2014 08:25:46	Deep lexical acquisition of English verb-particle constructions	http://www.csse.unimelb.edu.au/research/lt/resources/vpc/vpc.tgz	Monolingual list of MWEs	English				Available, unrestricted use	This dataset is made available under the terms of the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/), with attribution via citation of the following paper, which describes the dataset in full detail: Baldwin, Timothy (2008) A Resource for Evaluating the Deep Lexical Acquisition of English Verb-Particle Constructions, In Proceedings of LREC 2008 Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp. 1-2. The paper can be found in the PDF at: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf	Creative Commons (CC): http://creativecommons.org/examples		This is a sample of VPC token instances identified by the (various) POS tagger-, chunker-, chunk grammar-, and parser-based extraction methods of: Baldwin, Timothy (2005) The Deep Lexical Acquisition of English Verb-particle Constructions, Computer Speech and Language, Special Issue on Multiword Expressions, Volume 19, Issue 4, pp. 398-414. as having high confidence of being evidence of either an intransitive VPC or (simple) transitive VPC for a given verb--preposition combination. The data is separated into individual sets of instances for each verb--preposition combination, with up to 50 (putative) token instances each of the two valences. In addition, there is a gold-standard set of intransitive and transitive VPCs generated by hand-checking the sets of evidence to check that there is at least one true positive VPC instance, and further filtering out simple adverbial VPCs (e.g. "walk in"). Full details of the different files can be found in readme.file, and full details of the different tasks can be found in readme.task. 
<From README.txt>		Yes (click continue to fill in more information)	Tim Baldwin	tb@ldwin.net	Baldwin, Timothy (2008) A Resource for Evaluating the Deep Lexical Acquisition of English Verb-Particle Constructions, In Proceedings of LREC 2008 Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp. 1-2. The paper can be found in the PDF at: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W20_Proceedings.pdf					particle verbs (VPC)
81	04/09/2014 08:31:29	List of MWE resources (part of a chapter on MWEs in the Handbook of Natural Language Processing, second edition)	http://handbookofnlp.cse.unsw.edu.au/?n=Chapter12.Chapter12	List of MWE resources, tools, workshops and bibliographic references	several								A list of MWE resources that Tim Baldwin and Su Nam Kim put together as part of a chapter on MWEs in the Handbook of Natural Language Processing, second edition: http://handbookofnlp.cse.unsw.edu.au/?n=Chapter12.Chapter12		Yes (click continue to fill in more information)	Contact person: Tim Baldwin	tb@ldwin.net	@incollection{baldwin-handbook10, author = {Timothy Baldwin and Su Nam Kim}, title = {Multiword Expressions}, booktitle = {Handbook of Natural Language Processing, Second Edition}, editor = {Nitin Indurkhya and Fred J. Damerau}, publisher = {CRC Press, Taylor and Francis Group}, address = {Boca Raton, FL}, year = {2010}, note = {ISBN 978-1420085921} }						Several
82	04/09/2014 08:39:26	Bilingual Spanish-English and English-Spanish lexicons (INCYTA)	http://metashare.ilsp.gr:8080/repository/search/?q=INCYTA,	Bilingual lexicon with MWEs	Spanish > English, English > Spanish				Available, restricted use	Several licenses for different uses (academic/commercial) and users (ELRA members/non-members).	ELRA		Collection of bilingual lexicons from several domains. The metadata is collected from the META-SHARE catalogue. All lexicons have the (MWE survey) category "Bilingual lexicon with MWEs", and it seems like all of them are bidirectional English>Spanish and Spanish>English (this must be checked and verified).		Yes (click continue to fill in more information)	Contact person: Mapelli Valérie	mapelli@elda.org
83	04/09/2014 12:02:12	SIGLEX-MWE Software & Resources for MWE	http://multiword.sourceforge.net/PHITE.php?sitesig=FILES, http://sourceforge.net/projects/multiword/	data sets with MWEs	several	25 data sets, 6 tools			Available, unrestricted use	GNU General Public License version 2.0 (GPLv2)	GPL		The central forum for the MWE community. Share your open-source data sets and MWE extraction tools, exchange ideas on evaluation strategies and further development of the tools, and discuss theoretical definitions and linguistic properties of MWEs. <From http://sourceforge.net/projects/multiword/>		No (click continue to submit)
84	12/09/2014 09:52:55	Algemeen Nederlands Woordenboek (ANW, Dictionary of Contemporary Dutch)	https://catalog.clarin.eu/vlo/record?4&docId=hdl_58_10032_47_056f1e3bdb30c3ac022916421452e7f0&q=multiword+expressions&index=0&count=13, http://anw.inl.nl/search	Dictionary or lexicon with MWEs (also includes MWEs)	Dutch				Available, restricted use	free for academic use; non applicable for commercial parties	unknown		The Algemeen Nederlands Woordenboek (ANW, Dictionary of Contemporary Dutch) is a corpus-based, scholarly dictionary of contemporary standard Dutch in the Netherlands and in Flanders, describing the Dutch vocabulary from 1970 onwards. The dictionary provides information on form, content and use of words belonging to the general vocabulary of Dutch and it focuses on written language. It provides semasiological and onomasiological access to the dictionary and is meant to be useful for a wide range of users. The ANW can be characterised as an online dictionary under construction.		Yes (click continue to fill in more information)	Creator: Institute of Dutch Lexicology (Instituut voor Nederlandse Lexicologie, INL), Description and Production (Descriptie en Productie)	servicedesk@inl.nl						"Lexical subtypes": proper names, terminology, multi-word expressions
85	14/09/2014 08:00:40	MkdComp	no website	MWE dictionary or lexicon (MWEs only)	Macedonian	784	6	Only contiguous	Unknown			yes			Yes (click continue to fill in more information)	Aleksandar Petrovski	a.petrovski.sise@gmail.com		intensional and extensional	784	6273	42	Compound nouns, compound adjectives, adverbs, conjunctions, prepositions, compound terminology, named entities				Dictionary, Corpus
86	23/09/2014 13:09:34	JRC-Names	https://ec.europa.eu/jrc/en/language-technologies/jrc-names	Multilingual parallel list of MWEs	dozens of languages written in over twenty different scripts	about 280,000 distinct names (first name and last name) plus about 320,000 spelling variants (status September 2014), growing daily		Only contiguous	Available, restricted use	European Commission End-user Licence Agreement (EULA), mostly free see http://optima.jrc.it/Resources/LICENCE-EULA_JRC-Names_2011.pdf			JRC-Names is a highly multilingual named entity resource for person and organisation names (called 'entities'). It consists of large lists of names and their many spelling variants (up to hundreds for a single person), including across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.). The named entity resource file with the list of spelling variants is accompanied by Java-implemented demonstrator software that (a) allows to produce - for any input name - a list of known spelling variants, and that (b) analyses UTF8-encoded text files to find known entity mentions, returning the name variant found, the preferred display name for that entity, the unique name identifier for that name, the position of the entity name in the text, and its length in characters. All entity variants were found in real-life text. Spelling mistakes are included on purpose as these occur in real life and they help retrieve intended name mentions.		Yes (click continue to fill in more information)	European Commission - Joint Research Centre (JRC)	Ralf.Steinberger@jrc.ec.europa.eu	Steinberger Ralf, Bruno Pouliquen, Mijail Kabadjov, Jenya Belyaeva & Erik van der Goot (2011). JRC-Names: A freely available, highly multilingual named entity resource. Proceedings of the 8th International Conference Recent Advances in Natural Language Processing (RANLP). Hissar, Bulgaria, 12-14 September 2011.		
280000	320000		named entities				Corpus	24 P Vladimir+Putin 24 P Владимр+Путин 24 P วลาดิมีร์+ปูติน 24 P Влади́мир+Влади́мирович+Пу́тин 24 P Vadimir+Poutine 24 P 普京 24 P 弗拉基米尔普京 24 P Vladimir+Putin+Владимир+Путин 24 P فلاديمير+بوتين 24 P ولادمير+پوتين 24 P ვლადიმირ+პუტინი 24 P فلادمير+بوتين 24 P 弗拉基米尔•普京 24 P Vladimir+Vladimirovitch+Putin 24 P Владимиры+фырт+Владимир+Путин 24 P Vladimir+Putín 24 P Vlagyimír+Putyin 24 P Vladìmir+Putin 24 P 弗拉基米尔•弗拉基米罗维奇•普京 24 P Vladimir+Puttin 24 P Vladimir+Vladimorovich+Putin 24 P ウラジーミルプーチン 24 P Vladimir+Poutin 24 P Вадимир+Путин 24 P Βλαντίμιρ+Πούτιν 24 P Władimir+Putin 24 P Vladimira+Putina 24 P Valdimir+Poetin 24 P Владмир+Путин 24 P Władimira+Putin 24 P וולאדימיר+פוטין 24 P Vladimr+Poutine 24 P Valdímir+Putin 24 P فلاديمير+جيريرو 24 P Владимир+Владимирович+Путин 24 P Vladimir+Poutine 24 P Vladmir+Putin 24 P Vladimir+Putin-Владимир+Путин 24 P Vladimirju+Putinu 24 P Владимиир+Путин 24 P ލަޑިމިއަރ+ޕޫޓިން 24 P Vladimir+Vladimirovic+Putin 24 P Vladimir+Vladimorovitsj+Poetin 24 P Vladimir+Vladimirovich+Putin 24 P Vladimirus+Putin 24 P Vladimir+Vladimirovic+Poutine 24 P Путін+Володимир 24 P Vladimir+Vladimirovič+Putin 24 P Vlidamir+Putin 24 P Vládimir+Putin 24 P 弗拉基米尔+普京 24 P Vladimír+Putin 24 P Wladimr+Putin 24 P Vladamir+Poutine 24 P Уладзімір+Пуцін 24 P Vladimir+Vladimirovitsj+Poetin 24 P Vladimir+Poetin 24 P Vladamir+Putin 24 P Vladimir+Ptin 24 P Վլադիմիր+Պուտին 24 P Vladímir+Vladímirovich+Putin 24 P Vladimiras+Putinas 24 P Vladímir+Putin 24 P Wladimir+Poetin 24 P ウラジーミル・プーチン 24 P Vladimirjem+Putinom 24 P Vladimirr+Putin 24 P Vladimier+Poetin 24 P Vladimir+Vladimirovitj+Putin 24 P 弗拉基米尔+弗拉基米罗维奇+普京 24 P Vladimirja+Putina 24 P Βλαντιμίρ+Πούτιν 24 P Vladímir+Ptin 24 P Vadimir+Putin 24 P Vladimir+Pekhtin 24 P Vlagyimir+Vlagyimirovics+Putyin 24 P Waldimir+Putin 24 P Putin+Vladimir 24 P Valadimir+Poutine 24 P Vladmir+Poutine 24 P Vladimir+Putyin 24 P 弗拉基米尔弗拉基米罗维奇普京 24 P Vlagyimir+Putyin 24 P 블라디미르+푸틴 24 P Wladimir+Wladimirowitsch+Putin 
24 P Vladimir+Ptuin 24 P Wladimir+Poutine 24 P Wlaidimir+Putin 24 P விளாடிமிர்+பூட்டின் 24 P Vladimir+PUTIN 24 P Vladimir+Putin+Vladimir+Yakovlev 24 P Vlaidimir+Putin 24 P Valdiimir+Putin 24 P Путін+Володимир+Володимирович 24 P ولادیمیر+پوتین 24 P Владимиръ+Пѹтинъ 24 P Владимир+Путин 24 P Владамир+Путин 24 P Vladimir+Pútin 24 P Vladimin+Putin 24 P Wiladimir+Putin 24 P Vladimir+Vladimirovici+Putin 24 P ולדימיר+פוטין 24 P Władymir+Putin etc. ...
87	23/09/2014 13:29:34	Parallel English-French split phrasal verbs	http://cameleon.imag.fr/xwiki/bin/view/Main/Phrasal_verbs_annotation	Multilingual parallel list of MWEs	English, French	750	2	Also non-contiguous	Available, unrestricted use				We evaluated the difficulty of translating English phrasal verbs (e.g. give up, take off) into French using a standard Moses SMT system. We focused on transitive, split occurrences (e.g. take my shirt off) and compared hierarchical and phrase-based models. The resource contains English sentences with split phrasal verbs marked, and corresponding automatic SMT translations in French with the translations of the source phrasal verbs also marked. Reference translations are not provided but can be easily retrieved from the WIT3 TED Corpus.		No (click continue to submit)
88	23/09/2014 22:13:51	ITU Web2.0 Treebank	no website yet	Treebank with MWE annotations	Turkish	2860 MWEs 5K sentences	3	Also non-contiguous	Available, restricted use	academic use only commercial use for a fee	not specified yet	yes	ITU Web2.0 Treebank is a recent effort to create a web treebank for Turkish. It has annotations in multiple layers: normalization, morphology, MWEs and syntax		No (click continue to submit)
89	23/09/2014 22:16:26	ITU-METU-Sabancı Turkish Dependency Treebank	no	Treebank with MWE annotations	Turkish	3531	3	Also non-contiguous	Available, restricted use	academic use only	not specified yet	yes	The reannotation of the METU-Sabancı Turkish Treebank with new dependency annotation schemes and MWEs		No (click continue to submit)
90	29/09/2014 08:59:46	WikiMwe	www.ukp.tu-darmstadt.de/data/wikimwe/	Monolingual list of MWEs	English	> 350,000	4	Only contiguous	Available, restricted use	CC-BY-SA	Creative Commons (CC): http://creativecommons.org/examples		WikiMwe is a large resource of English multiword expressions mined from Wikipedia. It contains over 350,000 multiword units of size 2-4, including technical terminology, non-compositional multiword expressions, and collocations. For each entry, POS and frequency information and pointwise mutual information (PMI) scores are included. Additionally, we provide definitional and category information for many entries, in order to facilitate the application of the resource in theoretical (semantic similarity, domain disambiguation) and applied (terminology extraction) natural language processing research. Details on WikiMwe can be found in the following publication: S. Hartmann, G. Szarvas, and I. Gurevych (2011). Mining Multiword Terms from Wikipedia, in M.T. Pazienza & A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, pp. 226-258, Hershey, PA, USA: IGI Global.		Yes (click continue to fill in more information)	Silvana Hartmann, UKP Lab, Technische Universität Darmstadt	hartmann@ukp.informatik.tu-darmstadt.de	http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&pub_id=TUD-CS-2011-0204&type=99&bibtex=yes	no inflection patterns								English Wikipedia corpus
91	01/10/2014 06:33:10	Ontology of Rhetorical Figures for Serbian (RetFig)	http://resursi.mmiljana.com/RetFigS.aspx	ontology	Serbian	98 figures			Unknown			yes	The RetFig page http://resursi.mmiljana.com/RetFigS.aspx contains a classification of rhetorical figures in Serbian. Clicking on the + sign shows an example for each rhetorical figure. If you wish to download the ontology in XML or OWL format, you first need to send an authentication request to the moderator (Kontakt form) and once you get your username and password, you can sign up (Prijava form) and you will see a link for download at the bottom of the page. If you wish to find out a bit more about the ontology, this paper gives more details: Ontology of Rhetorical Figures for Serbian Miljana Mladenović, Jelena Mitrović, Text, Speech, and Dialogue, Lecture Notes in Computer Science Volume 8082, 2013, pp 386-393 http://link.springer.com/chapter/10.1007%2F978-3-642-40585-3_49 From the paper introduction: "Natural language texts are not always ”flat” with unique, ordinary, untwisted literal meaning. On the contrary, texts written in a natural language almost always have more than one meaning, due to the usage of various linguistic operations over words, phrases, sentences, et cetera. Without taking these facts into consideration, we can get incomplete and imprecise results in some NLP tasks. This especially holds true in areas of opinion mining, sentiment analysis and discourse analysis. For example, if we say ”He is as fast as light”, this statement will be marked as a positive opinion statement. On the other hand, if we say ”He is as fast as a turtle”, opinion mining techniques will not show the correct result unless we include the process of detection of rhetorical figures. 
Our first task, in this direction, is to create the very first formal and comprehensive domain ontology of rhetorical figures in Serbian that will lead us, primarily, towards an ontology based semantic tool for annotation of rhetorical figures and implementations in other NLP tasks."		Yes (click continue to fill in more information)	Miljana Mladenovic, Jelena Mitrovic	jmitrovic@gmail.com	Mladenović, Miljana, and Jelena Mitrović. "Ontology of Rhetorical Figures for Serbian." Text, Speech, and Dialogue. Springer Berlin Heidelberg, 2013.
92	02/10/2014 08:27:42	Pattern Dictionary of English Verbs	http://pdev.org.uk/	Pattern Dictionary of English Verbs	English	5793 verbs									No (click continue to submit)
93	08/10/2014 13:55:44	BabelNet	http://babelnet.org	Dictionary or lexicon with MWEs (also includes MWEs)	50 languages.	49 million lemmas		Only contiguous	Available, restricted use	CC-BY-NC	Creative Commons (CC): http://creativecommons.org/examples		BabelNet is both a multilingual encyclopedic dictionary, with lexicographic and encyclopedic coverage of terms, and a semantic network which connects concepts and named entities in a very large network of semantic relations, made up of more than 9 million entries, called Babel synsets. Each Babel synset represents a given meaning and contains all the synonyms which express that meaning in a range of different languages.		No (click continue to submit)
94	13/10/2014 17:24:18	SemLex	http://ufal.mff.cuni.cz/lexemann	MWE dictionary or lexicon (MWEs only)	Czech	almost 9,000 MWEs	12	Only contiguous	Available, unrestricted use	CC-BY	Creative Commons (CC): http://creativecommons.org/examples		The SemLex lexicon was compiled during the annotation of MWEs in the Prague Dependency Treebank and it should contain all MWEs occurring in it. There are almost 9,000 MWEs in the lexicon and they are connected to the text data (800,000 words). Each entry includes its basic form, lemmas, frequency of the MWE in annotated corpora, syntactic structure (i.e. the topology of lemmas in the dependency tree) and deep syntactic structure (analogously, with the tectogrammatical tree and lemmas). The majority of entries have the part of speech of the whole phrase (i.e. it can be used in place of e.g. a noun in the sentence). Some of them have a gloss, example, or synonyms. There is no categorization according to, for instance, the PoS of the MWE components -- but this information can be obtained from the corpus.		Yes (click continue to fill in more information)	Charles University in Prague, ÚFAL (Pavel Straňák, Eduard Bejček)	bejcek@ufal.mff.cuni.cz	Bejček Eduard, Straňák Pavel: Annotation of Multiword Expressions in the Prague Dependency Treebank. In: Language Resources and Evaluation, Vol. 44, No. 1-2, Copyright © Springer Netherlands, ISSN 1574-020X, pp. 7-21, Apr 2010 Straňák Pavel: Annotation of Multiword Expressions in The Prague Dependency Treebank. Ph.D. thesis, Univerzita Karlova v Praze, Prague, Czech Republic, 79 pp., Sep 2010		8800				Part of speech of the whole MWE together. Each MWE is linked to (several) occurrences in the data.	Functional Generative Description		Corpus	BASIC_FORM: bezpečnostní pás (= safety belt) LEMMATIZED: bezpečnostní pás GLOSS: dlouhý pruh n. předmět v růz. 
zařízeních (brief description) POS: N PDT25_FREQ: 1 TREE_STRUCT: pás — [head] bezpečnostní → 1 BASIC_FORM: zákon o dani z nemovitostí (= real estate tax law) LEMMATIZED: zákon o daň z nemovitost POS: N PDT25_FREQ: 0 (due to an inter-annotator disagreement) TREE_STRUCT: zákon — [head] daň → 1 nemovitost → 2 BASIC_FORM: Jihovýchodní Asie (= Southeast Asia) LEMMATIZED: jihovýchodní Asie POS: N PDT25_FREQ: 4 TREE_STRUCT: Asie — [head] jihovýchodní → 1 BASIC_FORM: držet hubu (= shut up) LEMMATIZED: držet huba POS: V PDT25_FREQ: 1 TREE_STRUCT: držet — [head] huba → 1 BASIC_FORM: tím pádem (= consequently) LEMMATIZED: ten pád POS: D PDT25_FREQ: 7 TREE_STRUCT: pád — [head] tím → 1
95	13/10/2014 23:38:10	VALLEX	http://ufal.mff.cuni.cz/vallex	Valency lexicon	Czech	4250 verbs, 6460 entries		Also non-contiguous	Available, unrestricted use	CC 3.0 BY-NC-SA	Creative Commons (CC): http://creativecommons.org/examples		The Valency Lexicon of Czech Verbs, Version 2 (VALLEX 2.x), is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. VALLEX 2.x has been developed at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague. VALLEX 2.x is a successor of VALLEX 1.0, extended in both theoretical and quantitative aspects. VALLEX 2.x provides information on the valency structure (combinatorial potential) of verbs in their particular senses. VALLEX is closely related to the Prague Dependency Treebank project: both of them use Functional Generative Description (FGD), being developed by Petr Sgall and his collaborators since the 1960s, as the background theory. In VALLEX 2.x, there are roughly 2,730 lexeme entries containing together around 6,460 lexical units ("senses"). Note that VALLEX 2.x - according to FGD, but unlike traditional dictionaries and also unlike VALLEX 1.0 - treats a pair of perfective and imperfective aspectual counterparts as a single lexeme (if perfective and imperfective verbs would be counted separately, the size of VALLEX 2.x would virtually grow to 4,250 verb entries). To ensure high quality of the data, all VALLEX entries have been created manually, using several previously existing lexicons as well as corpus evidence from the Czech National Corpus.		Yes (click continue to fill in more information)	Charles University in Prague, ÚFAL (Markéta Lopatková, Zdeněk Žabokrtský, Václava Kettnerová, Eduard Bejček)	bejcek@ufal.mff.cuni.cz	Lopatková, M., Žabokrtský, Z., Kettnerová, V.: Valenční slovník českých sloves. 
Praha: Karolinum, 382 p., 2008 (ve spolupráci se Skwarskou, K., Bejčkem, E., Hrstkovou, K., Novou, M., Tichým, M.) Žabokrtský Zdeněk, Lopatková Markéta: Valency Information in VALLEX 2.0: Logical Structure of the Lexicon. The Prague Bulletin of Mathematical Linguistics, No. 87, pp. 41-60, 2007.	Intensional				verbal valency		Functional Generative Description		Dictionary, Corpus	angažovat {biasp} [ 1 ] ≈ zaměstnat / zaměstnávat -frame: ACT(1){obl} PAT(4){obl} EFF(jako+4){opt} DIR3(){typ} -example: angažoval otce jako vyjednávače; angažovali herce do nové revue -rfl: pass: do nové hry se nakonec angažovali jen osvědčení herci -class: appoint verb [ 2 ] ≈ učinit/činit účastným -frame: ACT(1){obl} PAT(4){obl} LOC(){typ} -example: angažovat občany v boji za lepší zítřky -rfl: pass: občané se angažovali v boji za lepší zítřky
96	13/10/2014 23:46:02	PDT-Vallex	http://ufal.mff.cuni.cz/PDT-Vallex/	Valency lexicon	Czech	over 11000 valency frames for more than 7000 verbs		Also non-contiguous	Available, unrestricted use	CC BY-NC-SA 3.0	Creative Commons (CC): http://creativecommons.org/examples		The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT). It contains over 11000 valency frames for more than 7000 verbs which occurred in the PDT or PCEDT. It is available in electronically processable format (XML) together with the aforementioned treebanks (to be viewed and edited by TrEd, the PDT/PCEDT main annotation tool) , and also in more human readable form (see the links above and below). The main feature of the lexicon is its linking to the annotated corpora - each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives.		Yes (click continue to fill in more information)		uresova@ufal.mff.cuni.cz	1. Urešová Zdeňka: PDT-Vallex - trochu jiný valenční slovník. In: Slovo – Tvorba – Dynamickosť. Na počesť Kláry Buzássyovej, Copyright © Veda, Bratislava, Slovakia, ISBN 978-80-224-1107-3, pp. 278-286, 2010 2. Urešová Zdeňka: Building the PDT-VALLEX valency lexicon. In: On-line Proceedings of the fifth Corpus Linguistics Conference, http://ucrel.lancs.ac.uk/publications/cl2009, University of Liverpool, UK. 2009 3. Hajič Jan, Panevová Jarmila, Urešová Zdeňka, Bémová Alevtina, Kolářová Veronika, Pajas Petr: PDT-VALLEX: Creating a Large-coverage Valency Lexicon for Treebank Annotation. In: Proceedings of The Second Workshop on Treebanks and Linguistic Theories, Copyright © Vaxjo University Press, Vaxjo, Sweden, ISBN 91-7636-394-5, ISSN 1651-0267, pp. 57-68, Nov. 
2003	Intensional				valency of verbs, nouns, adjectives and adverbs			Functional Generative Description	Dictionary, Corpus	angažovat angažovat-1 ACT(1) PAT(4) ?EFF(.4[{jako,jakožto}:/AuxY];za+4) (zaměstnat) angažoval neherce jako herce angažovat-2 (1x) ACT(1) PAT(4) (motivovat) akce angažovala lidi
97	07/01/2015 21:24:18	Serbian DELA e-dictionary	no website	Dictionary or lexicon with MWEs (also includes MWEs)	Serbian	4,581,657 simple word forms for 133,361 different lemmas 262,686 multi-word forms for 13,717 different lemmas	7	Only contiguous	Available, restricted use	Restricted use: attribution, academic use only, commercial use for a fee, no derivatives, no redistribution		yes	Dictionary contains inflected forms and lemmas for both single and compound words. Example of a compound entry: švedsku pelenu,švedska pelena.N:fs4q Inflected form: švedsku pelenu Lemma: švedska pelena 'Swedish diapers' Semantic marker: Concrete Category: N (noun) Morphological features: - Gender: f (feminine) - Number: s (singular) - Case: 4 (accusative) - Animacy: q (non-animate)		Yes (click continue to fill in more information)	Cvetana Krstev, Duško Vitas	cvetana@matf.bg.ac.rs	Cvetana Krstev, Processing of Serbian – Automata, Texts and Electronic dictionaries Faculty of Philology, University of Belgrade, Belgrade, 2008. Cvetana Krstev, Duško Vitas, Agata Savary, “Prerequisites for a Comprehensive Dictionary of Serbian”, in Proceedings of the 5th International Conference on NLP, FinTAL 2006, Turku, Finland, August, 2006, eds. Tapio Salakoski et al., LNAI, pp. 552-564, Springer, Berlin, Heidelberg, 2006 Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas, “An Approach to Efficient Processing of Multi-word Units”, in Computational Linguistics - Applications, eds. Adam Przepiórkowski et al, Studies in Computational Intelligence 458, Springer-Verlag, Berlin Heidelberg, DOI 10.1007/978-3-642-34399-5_6, pp. 109-229, 2013.	Extensional	13717	262686	108	Contiguous general language MWEs, mainly compound nouns and adjectives, prepositions, conjunctions, adverbs and interjections. Also contains terminology (mainly from Library and Information Science)	All MWEs have additional information in the form of markers: semantic (e.g. +Hum for human), pronunciation (e.g. 
+Ek for Ekavian), domain (e.g. +DoM=Culinary).	None	Corpus processor Unitex	MWEs are extracted from corpora, from traditional dictionaries and also added manually	žutog kao limun,žut kao limun.A+Col:adms4v Inflected form: žutog kao limun Lemma: žut kao limun 'yellow as a lemon' Category: A (adjective) Semantic: +Col (color) Morphological features: - Degree: a (positive) - Definiteness: d (yes) - Gender: m (masculine) - Number: s (singular) - Case: 4 (accusative) - Animatness: v (animate)
98	08/01/2015 13:51:45	Verne80days_MSD+MWU+NE_Serbian	no website	annotated text	Serbian	54,899 units		Only contiguous	Available, restricted use	Restricted use (attribution, academic use only, commercial use for a fee, no derivatives, no redistribution)		yes	Jules Verne's novel "Around the World in 80 Days" lemmatized and morphologically annotated (simple words, MWE, Named Entities). Multiword units include conjunctions, interjections, prepositions, adverbs, nouns and adjectives. Named Entities include persons, organizations, geo-political names, time expressions and amount expressions.		Yes (click continue to fill in more information)	Cvetana Krstev, Duško Vitas	cvetana@matf.bg.ac.rs	Cvetana Krstev, Processing of Serbian – Automata, Texts and Electronic dictionaries Faculty of Philology, University of Belgrade, Belgrade, 2008. Duško Vitas, Svetla Koeva, Cvetana Krstev, Ivan Obradović, “Tour du monde through the dictionaries”, Actes du 27eme Colloque International sur le Lexique et la Grammaire, L'Aquila, 10-13 septembre 2008, eds. M. Constant, T. Nakamura, M. De Gioia, S. Vecchiato, pp.249-256, Universite Paris-Est, Institut Gaspard-Monge, 2008.					Text contains 54,899 units. Out of this number, there are 954 MWUs and 3,036 NEs. Among MWUs there are: 391 nouns, 1 adjective, 6 numerals, 141 conjunctions, 279 adverbs, 122 prepositions, 2 interjections. Among NEs, 2049 are MWUs. There are 56 (37 MWUs) organization names, 644 (543 MWUs) temporal expressions, 1144 (165 MWUs) geo-political names, 555 (534 MWUs) amount expressions and 1123 (770 MWUs) personal names.		none	Unitex corpus processor		{Električni sat,električni sat.N+Comp+Conc:ms1q} {iznad,iznad.PREP+p2} {kamina,kamin.N+Sr:ms2q} {bio,biti.V+Imperf+Tr+Iref+Aux:Gsm} {je,jesam.V+Imperf+It+Iref+Aux:Pzsi} {spojen,spojiti.V+Perf+Tr+Iref+Ref:Tms} {sa,sa.PREP+p6} {satom,sat.N:ms6q} {u,u.PREP+p7} {spavaćoj sobi,spavaća soba.N+Comp:fs7q} {Fileasa Foga,.NE+persName+full:ms2v} ,
99	27/01/2015 17:57:56	MILA Lexicon	http://www.mila.cs.technion.ac.il/resources_lexicons_mila.html	Dictionary or lexicon with MWEs (also includes MWEs)	Hebrew	3000	4	Only contiguous	Available, restricted use	For non-commercial research purposes, this resource is licensed under the GNU General Public License (GPL)	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html				No (click continue to submit)
100	27/01/2015 18:01:46	Hebrew Verb Complements Lexicon	http://www.mila.cs.technion.ac.il/resources_lexicons_verbcomplements.html	MWE dictionary or lexicon (MWEs only)	Hebrew	5600	2	Also non-contiguous	Available, unrestricted use	For non-commercial research purposes, this resource is licensed under the GNU General Public License (GPL)	GNU General Public Licence (GPL): http://www.gnu.org/licenses/gpl.html		Statistics on the likelihood of seeing a verb co-occurring with any of the six most frequent prepositions in Hebrew.		No (click continue to submit)
101	27/01/2015 21:07:07	MWUEI	no website	Dictionary or lexicon with MWEs (also includes MWEs)	English-Italian	approx. 14,000	5	Also non-contiguous	Unknown			no		still under development	No (click continue to submit)
102	28/01/2015 16:17:45	WICOL	http://www.vronk.net/wicol/index.php/Main_Page	MWE dictionary or lexicon (MWEs only)	Slovak, German	collocational profiles of: for Slovak 255 nouns 730 adjectives 10 adverbs 8 verbs for German only 49 verbs for German-Slovak 500 nouns 285 verbs 262 adjectives 287 proverbs for German and Slovak	5	Also non-contiguous	Available, restricted use			yes			Yes (click continue to fill in more information)	Prof. Dr. Peter Ďurčo	durco@vronk.net	Ďurčo, Peter – Banášová, Monika – Hanzlíčková, Astrid: Feste Wortverbindungen im Kontrast. Trnava: UCM 2010, 128 s. ISBN 978-80-8105-197-5 Ďurčo, Peter: Zum Konzept eines zweisprachigen Kollokationswörterbuchs. Prinzipien der Erstellung am Beispiel Deutsch – Slowakisch. In: F. Hausmann (Hrsg.): Collocations in European lexicography and dictionary research. Lexicographica, Vol. 24. Tübingen: Niemeyer Verlag 2008, 69-89. ISSN 0175-6206 Ďurčo, P. – Garabík, R. – Majchráková, D. – Ďurčo, M.: Contrastive Dictionary of German and Slovak Collocations. In: Cognitive Studies/Études Cognitives, Vol. 9, 2009, Warsaw: Institute of Slavic Studies, Polish Academy of Sciences, 101-115.	Extensional								Dictionary, Corpus
103	14/02/2015 14:19:41	CombiNet	http://combinet.humnet.unipi.it/	MWE dictionary or lexicon (MWEs only)	Italian			Also non-contiguous					CombiNet is an ongoing project funded by the Italian Ministry of Education, University and Research (MIUR) that aims at developing a corpus-based online dictionary of Italian Word Combinations, i.e. MWEs of various kinds as well as distributional profiles of single words (argument structure patterns, subcategorization frames, and selectional preferences).
104	17/02/2015 13:12:49	Jakob-Lexikon	http://www.jakoblexikon.ch	Dictionary or lexicon with MWEs (also includes MWEs)	German	Around 1200 verbal MWEs	10	Also non-contiguous	Unknown				The lexicon was built for psycho-semantic analysis of texts. For further information contact Mark Luder via info@jakoblexikon.ch		No (click continue to submit)
105	03/09/2015 09:31:00	Greek MWEs DB	to be provided end of October 2015	DB with MWEs only	Greek	around 300 verbal MWEs	8	Also non-contiguous	Unknown			it will be made available at the end of October 2015 when the terms of availability will be specified	An ongoing research project. The DB is intended to serve both as an NLP resource and as a dictionary. To this end it provides an exhaustive morphological description of fixed parts using the PAROLE tagset, structural description (free XPs, possible word order permutations, binding and control phenomena), variant forms of the MWE, relations between variants if any, syntactic alternations (causative-inchoative, passivisation, dative-genitive alternation). Structural description is theory neutral. On the lexicographic front it provides a glossing of the MWE, an English translation, a fully glossed usage example retrieved from corpora or the Web, corpus examples, incorrect usages of the MWE testing structural properties, synonymous MWEs and MWEs with the opposite meaning and pairs of verb MWEs that have the relation of causative-inchoative structures but with a different verbal head. The XML output of the DB is formatted according to LMF -- this is ongoing work.			Stella Markantonatou, Panagiotis Minos, Erasmia Koletti, Elpiniki Margariti, Aimilia Stripelli, George Zakis	Contact person: Stella Markantonatou, email: marks@ilsp.athena-innovation.gr	(1) Stella Markantonatou, Erasmia Koletti, Elpiniki Margariti, Panagiotis Minos, Aimilia Stripeli, Georgios Zakis, Niki Samaridi. 2015. Lexical Resource for free subject verb MWEs. Parseme 4th general meeting (2) Stella Markantonatou, Erasmia Koletti, Elpiniki Margariti, Panagiotis Minos, Aimilia Stripeli, Georgios Zakis, Niki Samaridi. 2015. Lexical resource for free subject verb MWEs. Modern Greek MWE 2015, in the framework of the 12th International Conference on Greek Linguistics, 16th September 2015.	both				Free subject verb MWEs				HNC http://hnc.ilsp.gr/ and the web