The GeneReg corpus consists of 314 Medline abstracts dealing with the regulation of gene expression in the model organism E. coli. Regulation of gene expression is any process that modulates the frequency, rate or extent of gene expression. During gene expression, the coding sequence of a gene is converted into one or more mature gene products, namely proteins or RNA (taken from the definition of the Gene Ontology class Regulation of Gene Expression, GO:0010468). GeneReg provides three types of semantic annotation: (1) named entities involved in gene regulatory processes, such as transcription factors and genes; (2) pairwise relations between regulators and regulated genes; and (3) event triggers (clue verbs) essential for describing events such as gene expression and gene regulation. For all three annotation levels, the annotation vocabulary was taken from the Gene Regulation Ontology (GRO).
Download the GeneReg corpus (.tar.gz, 290K)
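The three annotation levels described above (named entities, pairwise regulator/target relations, and event triggers) could be held in memory roughly as sketched below. This is only an illustrative data model: the file layout inside the archive is not described here, so no parsing is attempted. The class names, field names, the example sentence, and the labels "TranscriptionFactor", "Gene" and "PositiveRegulation" are all assumptions chosen for illustration and are not necessarily the identifiers used in the corpus or in GRO.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Span:
    """A labeled stretch of text, addressed by character offsets."""
    start: int      # offset of the first character
    end: int        # offset one past the last character
    text: str       # surface string covered by the span
    label: str      # vocabulary label (placeholder, not a verified GRO name)


@dataclass(frozen=True)
class Relation:
    """A pairwise relation between a regulator and a regulated gene."""
    regulator: Span
    target: Span
    label: str


@dataclass
class AnnotatedAbstract:
    """One abstract with its three annotation levels."""
    doc_id: str
    text: str
    entities: List[Span] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    triggers: List[Span] = field(default_factory=list)


def make_span(text: str, mention: str, label: str) -> Span:
    """Build a Span for the first occurrence of `mention` in `text`."""
    start = text.index(mention)
    return Span(start, start + len(mention), mention, label)


# Hypothetical sentence with made-up entity names, used only for illustration.
sentence = "ProteinX activates expression of geneY."
regulator = make_span(sentence, "ProteinX", "TranscriptionFactor")
target = make_span(sentence, "geneY", "Gene")
trigger = make_span(sentence, "activates", "PositiveRegulation")

doc = AnnotatedAbstract(
    doc_id="example-1",
    text=sentence,
    entities=[regulator, target],
    relations=[Relation(regulator, target, "PositiveRegulation")],
    triggers=[trigger],
)

# Every span should point back at exactly the text it claims to cover.
for span in doc.entities + doc.triggers:
    assert doc.text[span.start:span.end] == span.text
```

Keeping triggers separate from relations mirrors the description above, where clue verbs are annotated as their own level rather than as part of a relation.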