1. Introduction

1.1 What is IntEnz?

IntEnz is the name for the Integrated relational Enzyme database. IntEnz will contain enzyme data approved by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB).

1.2 Why IntEnz?

For many years, Amos Bairoch has maintained the Enzyme Nomenclature database, appropriately called ENZYME. Indeed, this was the only electronic version of The Enzyme Nomenclature. For a large part of the bioinformatics community, ENZYME is the Enzyme Nomenclature. An obvious example is its use for SWISS-PROT annotation. However, although it contains the data from The Enzyme Nomenclature, it was never considered an official publication of The Enzyme Nomenclature (as opposed to, say, Enzyme Nomenclature 1992, Academic Press, San Diego). ENZYME exists as a plain ASCII text file.

The Web Version of Enzyme Nomenclature has been prepared by Gerry Moss and includes the complete contents of Enzyme Nomenclature 1992 plus subsequent supplements and other changes. It is official and easy to navigate. However, it is not a true database but a set of manually edited HTML pages.

In the course of their evolution, ENZYME, Enzyme Nomenclature and other databases containing enzyme-related information have accumulated a number of discrepancies in data that are meant to be identical. These discrepancies range from typos and corrections to new data present in one source but absent from another. As new data accumulate, it is becoming increasingly difficult to keep these sources in sync.

The goal of the IntEnz project is to have a relational database that integrates all of the Enzyme Nomenclature.

1.3 Funding

Since the BioBabel grant ended, no funding has been available to actively develop IntEnz. None of the annotators funded by the new Felics grant will be based at the EBI.

1.4 Project partners

SIB (enzyme annotation).

2. IntEnz annotation

2.1 General remarks

Since most of the data in IntEnz will have "official" status, a mechanism for approval by the NC-IUBMB had to be implemented. Data enter IntEnz in one of two ways:

  • Legacy data were loaded from the two pre-existing sources. As a rule, these do not have to be formally approved, since the Enzyme Classification List is already official. However, if there are serious discrepancies between the sources and a curator thinks a change should be made to the master data, the suggested change may require approval.
  • New data are entered directly into IntEnz via the IntEnz Web Tool. The tool prevents incomplete data from being entered but does not check data validity. Such data (whether a new entry or a modification of an existing entry) are considered suggested. Suggested entries are visible to IntEnz curators but not to the general public. An NC-IUBMB member has the privileges to move an entry's status from suggested to proposed. Proposed entries or changes have to be publicly available for at least 2 months before any further action is taken. This allows the scientific community to send feedback to the NC-IUBMB. After this, the entry can be (i) given approved status, (ii) modified and left as proposed for some more time, or (iii) rejected completely.
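
The status workflow described above can be pictured as a small state machine. The following Python sketch is illustrative only and is not taken from the IntEnz code base: the names Status, Entry, propose and decide are hypothetical, "2 months" is approximated as 60 days, and restarting the review clock when a proposed entry is modified is an assumption that the text does not confirm.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional

# Assumption: "at least 2 months" is approximated as 60 days in this sketch.
MIN_PUBLIC_REVIEW = timedelta(days=60)


class Status(Enum):
    SUGGESTED = "suggested"
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Entry:
    ec_number: str
    status: Status = Status.SUGGESTED
    proposed_on: Optional[date] = None

    def visible_to_public(self) -> bool:
        # Suggested entries are visible to curators only. Hiding rejected
        # entries is an assumption; the text does not say what happens to them.
        return self.status in (Status.PROPOSED, Status.APPROVED)


def propose(entry: Entry, by_nc_iubmb_member: bool, today: date) -> None:
    """Move an entry from suggested to proposed (NC-IUBMB members only)."""
    if entry.status is not Status.SUGGESTED:
        raise ValueError("only suggested entries can be proposed")
    if not by_nc_iubmb_member:
        raise PermissionError("only an NC-IUBMB member may propose an entry")
    entry.status = Status.PROPOSED
    entry.proposed_on = today


def decide(entry: Entry, outcome: Status, today: date) -> None:
    """Approve, reject, or keep as proposed once the review period is over."""
    if entry.status is not Status.PROPOSED or entry.proposed_on is None:
        raise ValueError("only proposed entries can be decided on")
    if today - entry.proposed_on < MIN_PUBLIC_REVIEW:
        raise ValueError("public review period is not over yet")
    if outcome is Status.PROPOSED:
        # Modified and left as proposed for some more time.
        # Assumption: the review clock restarts from today.
        entry.proposed_on = today
    elif outcome in (Status.APPROVED, Status.REJECTED):
        entry.status = outcome
    else:
        raise ValueError("outcome must be proposed, approved or rejected")
```

In this sketch, calling decide before the review period has elapsed raises an error, which mirrors the rule that proposed entries must stay public for a minimum time before any further action.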

2.2 IntEnz to ChEBI mapping

Since IntEnz is one of the sources of terminology for the ChEBI database, it was logical to link all the compound terms in the reaction and cofactor fields of IntEnz to the corresponding ChEBI entries.

Initially, the reaction field in an IntEnz entry was free text. We aimed to improve the quality of reaction data in IntEnz and to provide other databases with a new means of annotation by assigning public, stable, unique identifiers to reactions, thereby

  • Disentangling enzyme numbers and reactions
  • Removing redundancy

Partners from SIB and the University of Cologne expressed interest in this development. As a result, the reaction database Rhea was created. For more information, refer to the initial reaction specification document and the Rhea project website.
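
To illustrate what disentangling enzyme numbers from reactions and removing redundancy can mean in practice, here is a minimal Python sketch. It is not the Rhea data model: the function names, the "RXN:n" identifier scheme and the crude text normalisation are all hypothetical, and the input uses placeholder EC labels and reactions rather than real data.

```python
from collections import defaultdict


def normalise(text):
    """Crude normalisation used only for this sketch: collapse whitespace,
    drop a trailing period and ignore case."""
    return " ".join(text.split()).rstrip(".").lower()


def build_reaction_index(ec_to_texts):
    """Give each distinct reaction one identifier and link EC numbers to it.

    ec_to_texts maps an EC number to its free-text reactions.
    Returns (reactions, ec_links): reactions maps identifier -> reaction text,
    ec_links maps EC number -> list of reaction identifiers.
    """
    ids_by_key = {}               # normalised text -> identifier
    reactions = {}                # identifier -> first text seen
    ec_links = defaultdict(list)  # EC number -> identifiers
    for ec, texts in ec_to_texts.items():
        for text in texts:
            key = normalise(text)
            if key not in ids_by_key:
                # Hypothetical identifier scheme, for illustration only.
                rid = f"RXN:{len(ids_by_key) + 1}"
                ids_by_key[key] = rid
                reactions[rid] = text
            ec_links[ec].append(ids_by_key[key])
    return reactions, dict(ec_links)


# Placeholder input: two EC entries sharing one reaction.
legacy = {
    "EC-A": ["A + B = C + D."],
    "EC-B": ["A + B  =  C + D", "E = F"],
}
reactions, ec_links = build_reaction_index(legacy)
```

Once reactions carry their own identifiers, the shared reaction is stored once and both EC entries simply point to it, so other databases can reference the reaction independently of any enzyme number.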

2.3 Sources of data

  1. Trinity College Dublin (TCD) maintains the Enzyme Classification List (HTML files) on behalf of the NC-IUBMB.
  2. Swiss Institute of Bioinformatics (SIB) produces ENZYME (ASCII file).

3. IntEnz products

3.1 Relational database

The main product of this project is a relational database, IntEnz, which can be queried via the web interface. Up-to-date figures can be found on the statistics page.
Currently, we do not provide a downloadable database dump, since it would be an Oracle dump usable only by Oracle installations.

3.2 HTML files

The contents of the relational database can be browsed in the form of HTML files on the IntEnz website, which show not only the IntEnz (integrated data) view, but also the NC-IUBMB and SIB views.

3.3 ENZYME flat file

The contents of the relational database are exported every night as an ASCII file (enzyme_intenz.dat). It is essentially identical to the enzyme.dat flat file prepared by SIB (found at ftp://ftp.expasy.org/databases/enzyme/), which is used, for example, for UniProt annotation. In fact, SIB curators use the enzyme_intenz.dat file as the basis for producing enzyme.dat. That way, enzyme data are curated just once, using the IntEnz curator tool.
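
As a rough illustration of how two such flat files could be compared, the Python sketch below parses records and reports which entries differ. It assumes the general ENZYME flat-file layout (a two-letter line code followed by the value, with records terminated by "//"); the function names are hypothetical and the sample records are placeholders, not real enzyme data.

```python
def parse_flat_file(lines):
    """Parse ENZYME-style lines into a list of records.

    Each record is a dict mapping a two-letter line code (e.g. "ID", "DE")
    to the list of values found on lines with that code.
    """
    records = []
    current = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of record; header blocks without an ID line are skipped.
            if "ID" in current:
                records.append(current)
            current = {}
            continue
        if len(line) < 3:
            continue
        code, value = line[:2], line[2:].strip()
        current.setdefault(code, []).append(value)
    return records


def differing_ids(records_a, records_b):
    """Return the sorted IDs whose records differ or exist in only one file."""
    by_id_a = {r["ID"][0]: r for r in records_a}
    by_id_b = {r["ID"][0]: r for r in records_b}
    all_ids = set(by_id_a) | set(by_id_b)
    return sorted(i for i in all_ids if by_id_a.get(i) != by_id_b.get(i))


# Placeholder records for illustration only.
SAMPLE = """\
ID   0.0.0.1
DE   Example enzyme A.
//
ID   0.0.0.2
DE   Example enzyme B.
//
"""

records = parse_flat_file(SAMPLE.splitlines())
```

A comparison of this kind, run over enzyme_intenz.dat and enzyme.dat, would list the entries in which the two files are not identical.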

3.4 IntEnzXML

IntEnz data is exported as XML, following the provided specification.

3.5 BioPAX export

For compatibility with a widely used standard, IntEnz is also exported as a BioPAX file.

3.6 Curator tool

Most IntEnz annotation is done from outside the EBI using the IntEnz Curator Tool.