Dataset


Description

An introduction and description of the corpus is available here.


Download


Unified format

May 19, 2011 - New version of the training corpus!

In recent weeks, the committee has been notified of new errors related to some aspects of the corpus (e.g., drug labeling errors and charOffset errors). The corpus has therefore been thoroughly reviewed, and the reported errors have been fixed. We apologize for any inconvenience.

The new corpus also introduces one further change: the label "interaction" has been replaced by a new label, "pair", which identifies every possible DDI candidate pair appearing in a single sentence. Thanks to all the participants who sent their observations about inconsistencies or errors; we greatly appreciate this valuable information, which helps us improve the DrugDDI corpus.
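
Because "pair" covers every candidate combination of drugs within one sentence, a sentence that mentions n drugs yields n(n-1)/2 candidate pairs, assuming pairs are unordered. The following minimal Python sketch shows that enumeration; the entity IDs are hypothetical and only illustrate the idea, they are not taken from the corpus specification.

```python
from itertools import combinations


def candidate_pairs(entity_ids):
    """Return every unordered pair of drug entities found in one sentence.

    Assumes pairs are unordered, so (e0, e1) and (e1, e0) are counted once.
    """
    return list(combinations(entity_ids, 2))


# Hypothetical entity IDs for a sentence that mentions three drugs.
drugs = ["DrugDDI.d1.s0.e0", "DrugDDI.d1.s0.e1", "DrugDDI.d1.s0.e2"]
for e1, e2 in candidate_pairs(drugs):
    print(e1, e2)
```

With three drugs this prints three candidate pairs; with four drugs it would produce six.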


MetaMap (MMTx) format

May 19, 2011 - New version of the training corpus!

The errors and inconsistencies detected in this format have been corrected. We apologize for any inconvenience.


Structure

The corpus is provided in XML format, following the structure described in this document.
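
The sketch below shows one way to read sentence-level pairs from such an XML file with Python's standard `xml.etree.ElementTree`. The element and attribute names (`document`, `sentence`, `entity`, `pair`, `e1`, `e2`, `interaction`) and the sample content are assumptions made for illustration; the authoritative structure is the one given in the structure document referenced above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names are assumptions,
# not the official schema of the corpus.
SAMPLE = """<document id="DrugDDI.d1">
  <sentence id="DrugDDI.d1.s0" text="Aspirin may increase the effect of warfarin.">
    <entity id="DrugDDI.d1.s0.e0" text="Aspirin"/>
    <entity id="DrugDDI.d1.s0.e1" text="warfarin"/>
    <pair id="DrugDDI.d1.s0.p0" e1="DrugDDI.d1.s0.e0"
          e2="DrugDDI.d1.s0.e1" interaction="true"/>
  </sentence>
</document>"""


def load_pairs(xml_text):
    """Yield (sentence_id, pair_id, e1, e2, interacts) for every pair element."""
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        for pair in sentence.findall("pair"):
            yield (
                sentence.get("id"),
                pair.get("id"),
                pair.get("e1"),
                pair.get("e2"),
                # A missing attribute (e.g. in unlabeled data) reads as False.
                pair.get("interaction") == "true",
            )


for row in load_pairs(SAMPLE):
    print(row)
```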

Participants may submit up to 5 runs. Each run can draw on different sources of information and use different techniques. A submission file must be a plain-text (.txt) file that includes all drug pairs at the sentence level.
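
As an illustration only, the sketch below checks that a run covers every candidate pair and writes one line per pair. The tab-separated `<pair_id>\t<0|1>` layout and the pair IDs are hypothetical; the exact line format required for submissions is not specified in this section.

```python
import io


def check_complete(expected_pair_ids, predictions):
    """Raise ValueError if any candidate pair is missing from the run."""
    predicted = {pair_id for pair_id, _ in predictions}
    missing = set(expected_pair_ids) - predicted
    if missing:
        raise ValueError(f"run is missing {len(missing)} pair(s)")


def write_run(predictions, out):
    """Write one line per pair using a hypothetical '<pair_id>\\t<0|1>' layout."""
    for pair_id, interacts in predictions:
        out.write(f"{pair_id}\t{int(interacts)}\n")


expected = ["DrugDDI.d1.s0.p0", "DrugDDI.d1.s0.p1"]
predictions = [("DrugDDI.d1.s0.p0", True), ("DrugDDI.d1.s0.p1", False)]

check_complete(expected, predictions)
buf = io.StringIO()
write_run(predictions, buf)
print(buf.getvalue(), end="")
```

For an actual run, the same function can write to a file opened with `open("run1.txt", "w", encoding="utf-8")`; the file name here is hypothetical.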


Test file example

June 3, 2011 - New! The test dataset is available to registered participants here.

An example of the test dataset in Unified format can be found here.

An example of the test dataset in MMTx format can be found here.




Universidad Carlos III de Madrid - Computer Science Department - LABDA http://labda.inf.uc3m.es
