Marco Turchi
Miscellaneous
- Co-organizer of:
  - Intelligent Analysis and Processing of Web News Content workshop at WI-IAT, Milan, 15 September 2009
  - Statistical Multilingual Analysis for Retrieval and Translation associated workshop at EAMT, Barcelona, 13 May 2009
  - European Project SMART Meeting in Bristol, May 2008
- Coordinator and head coach of basketball teams from September 1993
- Student co-advisor for Master and Degree theses on Text Analysis
NLP/Text Mining Libraries
- GATE: a General Architecture for Text Engineering
- Weka: data mining software in Java
- Apache Lucene: information retrieval library
- LingPipe: Java libraries for the linguistic analysis of human language
SMT tools
- Moses: statistical machine translation system
- SRILM: toolkit for building and applying statistical language models (LMs)
- IRSTLM: LM toolkit
- GIZA++: training of statistical translation models
- Multi-thread GIZA: multi-threaded extension of the GIZA++ word alignment tool
General purpose Libraries
- SVMlight: an implementation of Support Vector Machines (SVMs) in C
- Apache Cayenne: persistence framework providing object-relational mapping (ORM) and remoting services
- SciPy: software for mathematics, science, and engineering in Python
- mysql++: C++ wrapper for MySQL's C API
Corpora
- Europarl: parallel corpus for SMT in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish.
- JRC-Acquis: parallel corpus for SMT in 22 languages.
- SETimes: parallel corpus for SMT for Balkan languages: Turkish, Croatian, Albanian, Serbian, Macedonian, Bulgarian, Greek, Romanian, English.
- EMEA: parallel corpus from the European Medicines Agency in 22 languages.
- CzEng: Czech-English parallel corpus.
- EPPS: word alignment documents
- Spanish-Dutch NER human annotated data
My extended CV
- Download here