IntEnz includes one or more reactions per enzyme entry, as free
text. Those reactions have no unique identifiers, so if two
enzymes catalyze the same reaction, that means two rows in the
database containing the same equation string
(redundancy).
Reactions are ordered for an enzyme entry with multiple
reactions, but there is currently no way to distinguish reaction
steps and alternative reactions (ambiguity).
The string of characters which defines the equation has no link
to other databases for reactants and products
(isolation).
Enzyme entries have a unique EC number, which may change over
time when an entry is transferred or split. In addition, the curator
tool architecture assigns them an internal unique identifier, which
changes whenever the entry changes its status
(instability).
Advantages | Disadvantages |
---|---|
Generic schema format which could in theory map to any domain model. | Non-standard approach to database design, which means only a few specialised people could support it. |
-- | Would require us to move to MySQL, which would cost a minimum of 5 months of full-time development to port all the PL/SQL procedures developed for IntEnz. IntEnz does not use any generic mapping tools like Toplink or OJB, so all the JDBC code would need to be converted, which would require a major revamp. |
-- | Long-term maintenance of this project would be at risk, as MySQL is not officially supported by any DBAs in our group. |
Advantages | Disadvantages |
---|---|
Existing tool which is successfully used in production by another group at the EBI. | The tool is dependent on the database schema as described above. According to the developer, deviating from the generic database schema setup would require a major revamp of the tool, and in his opinion it would not be worth it. |
Uses standard Java, which is supported in our group. | The validations which are currently done could cause a maintenance headache. Currently all validations are done via the Struts validation framework, which ties in all the data neatly. |
The CVS facility is a great feature. | Again, the long-term maintenance of this tool would be slightly worrying due to its non-standard approach to design. It is much easier to hire a developer who knows the Struts framework than one who knows a specific non-standard database tool. |
-- | Using the tool would require some major rework, which does not make sense given that IntEnz already has a specialised curator tool with all the enzyme functionality in it. The IntEnz curator tool is stable and has been in production for over a year; in addition, the curators are finally happy with it. |
The NC-IUBMB and ENZYME have strict requirements on how their data, and therefore their controlled vocabularies, are written. It is therefore not possible to use an external source as a controlled vocabulary, as we would be unable to guarantee the formatting required by ENZYME and NC-IUBMB. For this reason we need to use ChEBI as our controlled vocabulary because
An abstract reaction has no clear definition of the
biochemical process, the ambiguity coming from a broad range
of reactants and/or products. Thus, its participants cannot
be easily mapped to ChEBI. The only information for an
abstract reaction is a textual description.
This is the way in which every reaction is treated currently
in IntEnz.
A simple reaction has no steps. It can be part of an overall (complex) reaction, i.e. be a step reaction.
Deletion of reactions is actually a change of status (the data won't be removed from the database), much in the same way deletion of compounds is handled in ChEBI.
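The status-based deletion described above could look like the following minimal sketch. The class name `ReactionRecord`, the two status values and the identifier format are hypothetical and only illustrate the idea; the actual IntEnz model may differ.

```java
/** Hypothetical reaction record: "deleting" it only changes its status. */
class ReactionRecord {

    /** Illustrative status values; the real set is not defined in this document. */
    enum Status { ACTIVE, DELETED }

    private final String publicId;
    private Status status = Status.ACTIVE;

    ReactionRecord(String publicId) {
        this.publicId = publicId;
    }

    /** Flags the reaction as deleted; the record and its identifier are kept. */
    void delete() {
        status = Status.DELETED;
    }

    String getPublicId() {
        return publicId;
    }

    Status getStatus() {
        return status;
    }

    /** Only active reactions would be shown to end users. */
    boolean isVisible() {
        return status == Status.ACTIVE;
    }
}
```

Because the identifier is final and the row is never removed, external cross references to a deleted reaction would still resolve.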
Although they are not part of the reaction itself, cofactors will be reimplemented to take advantage of the new mapping to ChEBI, favouring the reuse of data in IntEnz.
The assignment of reactions - simple or complex ones - to an
enzyme entry will allow curators to select from existing
reactions in the database, or create new ones if necessary.
Several reactions assigned to the same enzyme entry are
considered as alternative reactions. Sorting of alternative
reactions will be implemented as well.
End users and curators will query IntEnz for reactions
(by EC number, reactant/product ChEBI id, reactant/product
name...) using a web interface.
External databases will update cross references to stable
public reaction IDs, querying the database or, in the
future, using web services.
The mapping of reactants, cofactors and products requires the use of the ChEBI database. It can be achieved in several ways:
IntEnz would need access to ChEBI, by adding the needed web services jar files and configuration files to the distribution.
An unbalanced ("abstract") reaction must have just a textual representation, like the current equation field.
A balanced ("chemical") reaction must meet the following requirements:
A reaction will have four possible directions:
Reactions will have a status flag to avoid real deletions in the database and hence to enforce stable identifiers.
Reactions will be unique within the database, that is: one reaction could be assigned to more than one enzyme (reusable).
Reactions could have comments and links to other databases (KEGG reactions, MACiE). Links (to KEGG "IUBMB" reactions, for example) will be automated whenever possible.
Reactions will have qualifiers:
"Polymerization", "Class of reactions" and
"Chemically balanced"
("Spontaneous" has been discarded).
These qualifiers are not mutually exclusive, and should
be assigned automatically whenever possible, while still letting
curators annotate the reactions.
Complex (overall) reactions will be made of two or
more single (step) reactions. Summing single
reactions will be possible, to yield a complex one.
Splitting a simple reaction into steps will convert it
into a complex one.
Validation will be applied to complex reactions as well as
their steps. Every step reaction will have a global
coefficient in order to balance the overall reaction
appropriately.
For now, just one level of complexity will be allowed; that
is, a complex reaction of this type cannot be a part (step)
of another complex reaction.
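The summing of step reactions with global coefficients can be illustrated with a minimal sketch. It assumes each step is stored as a map from compound identifier to a signed stoichiometric coefficient (negative for reactants, positive for products); the names `StepSum`, `Step` and `sum` are hypothetical and not part of IntEnz.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: sums step reactions into one overall reaction. */
class StepSum {

    /** A step reaction with its global coefficient. */
    static final class Step {
        final int globalCoefficient;
        /** Compound -> signed coefficient (negative = reactant, positive = product). */
        final Map<String, Integer> participants;

        Step(int globalCoefficient, Map<String, Integer> participants) {
            this.globalCoefficient = globalCoefficient;
            this.participants = participants;
        }
    }

    /**
     * Multiplies every step by its global coefficient and adds the results.
     * Compounds whose net coefficient is zero (intermediates) are dropped.
     */
    static Map<String, Integer> sum(List<Step> steps) {
        Map<String, Integer> overall = new LinkedHashMap<>();
        for (Step step : steps) {
            for (Map.Entry<String, Integer> e : step.participants.entrySet()) {
                overall.merge(e.getKey(), step.globalCoefficient * e.getValue(), Integer::sum);
            }
        }
        overall.values().removeIf(v -> v == 0);
        return overall;
    }
}
```

For example, a step A -> I taken twice plus a step 2 I -> B taken once sums to 2 A -> B, with the intermediate I cancelling out.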
Step reactions will have a qualifier attached to them
which defines each as a "Primary" reaction or "Secondary"
reaction. (This was discarded, as it was really
meant to be applied to elementary reactions, see below,
but even these cannot be qualified like this in an
absolute way.)
Another type of complex (coupled) reaction will have no
specified steps, but will be the sum of two or more
simultaneous unordered elementary reactions, which
will act as reusable building blocks (for example,
hydrolysis of ATP).
These complex reactions without steps could act as steps
in an overall transformation.
Reactions which currently include an OR operator in their equation text ("an aldehyde or ketone", for example) would be split into two alternative reactions for the enzyme entry.
Complex cofactors (text including the operators ";", "and", "or") will be split and mapped to ChEBI. The OR operator could have several meanings (to be decided if one is enough).
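As a rough illustration of the cofactor splitting step, the sketch below breaks a cofactor text into individual names at the three operators mentioned. The class name `CofactorTextSplitter` is hypothetical, the mapping of each name to ChEBI is left out, and the different possible meanings of OR are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a complex cofactor text into single names. */
class CofactorTextSplitter {

    // ";" anywhere, or the words "and"/"or" when surrounded by whitespace,
    // so that names merely containing those letters are left intact.
    private static final Pattern OPERATORS =
            Pattern.compile("\\s*;\\s*|\\s+(?:and|or)\\s+", Pattern.CASE_INSENSITIVE);

    static List<String> split(String cofactorText) {
        List<String> names = new ArrayList<>();
        for (String part : OPERATORS.split(cofactorText)) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

With this sketch, `CofactorTextSplitter.split("FAD; Mg(2+) or Mn(2+)")` yields the three names `FAD`, `Mg(2+)` and `Mn(2+)`; which operator separated them is discarded, so a real implementation would need to keep that information if the operators turn out to carry meaning.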
The database schema should be modified as follows:
There is an example available (outdated, as of 2006-10-31).
As part of the future data integrity checker:
The classes ReactionDTO and CofactorDTO should be modified according to Reaction and Cofactor.
A validation for stoichiometry, possibly based on the compounds' formulae, should be implemented in the biobabel package and used by the ReactionDTO class.
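A formula-based stoichiometry check could work along the lines of the sketch below. It is only an illustration under strong assumptions: formulae are flat (no brackets, charges or isotopes), each side of the reaction is given as a map from formula to coefficient (so two participants sharing a formula on one side would have to be merged first), and the class and method names are hypothetical rather than part of biobabel.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stoichiometry check based on atom counts. */
class StoichiometryCheck {

    // An element symbol (one uppercase letter, optional lowercase letter)
    // followed by an optional count.
    private static final Pattern ELEMENT = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    /** Counts the atoms in a flat formula such as "C6H12O6". */
    static Map<String, Integer> atoms(String formula) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = ELEMENT.matcher(formula);
        while (m.find()) {
            int n = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), n, Integer::sum);
        }
        return counts;
    }

    /** Adds up the atoms of one side; keys are formulae, values are coefficients. */
    static Map<String, Integer> sideTotals(Map<String, Integer> side) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> p : side.entrySet()) {
            for (Map.Entry<String, Integer> a : atoms(p.getKey()).entrySet()) {
                totals.merge(a.getKey(), p.getValue() * a.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    /** A reaction is balanced when both sides contain the same atoms. */
    static boolean isBalanced(Map<String, Integer> reactants, Map<String, Integer> products) {
        return sideTotals(reactants).equals(sideTotals(products));
    }
}
```

Charge balance and generic compounds (classes, polymers) are outside the scope of this sketch and would need separate handling.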
A new class ChebiHelper
should provide methods to use it as a ChEBI webservices
client, retrieving information on the reaction participants:
recommended name, formula, structure (image)...
A new class CompoundMapper would be
responsible for retrieving compound data from the IntEnz
database (data already imported from ChEBI). It should be
used by EnzymeReactionMapper and
EnzymeCofactorMapper.
The web interface should be extended to create and edit reactions (see a preliminary model). Instead of a plain text field, buttons will be implemented to add alternative reactions, add steps or edit an existing reaction; a reaction editor will show up, including:
A post-condition should be added to the editing of new reactions: a new reaction must not already be present in IntEnz; otherwise the existing reaction will be used. The check would need a fingerprint of the reaction generated from its participants.
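One possible shape for such a fingerprint is a canonical string built from the participants, as in the minimal sketch below. It assumes participants are identified by ChEBI IDs with integer coefficients, treats the two sides separately (so direction matters), and uses the hypothetical name `ReactionFingerprint`; the identifiers in the usage note are illustrative placeholders.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical fingerprint: a canonical text form of a reaction's participants. */
class ReactionFingerprint {

    /** Builds the same string regardless of the order in which participants were entered. */
    static String of(Map<String, Integer> reactants, Map<String, Integer> products) {
        return side(reactants) + " => " + side(products);
    }

    // Sorting by identifier (via TreeMap) makes the output independent of entry order.
    private static String side(Map<String, Integer> participants) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(participants).entrySet()) {
            if (sb.length() > 0) {
                sb.append(" + ");
            }
            sb.append(e.getValue()).append(' ').append(e.getKey());
        }
        return sb.toString();
    }
}
```

Two reactions entered as `CHEBI:15377 + CHEBI:30616` and `CHEBI:30616 + CHEBI:15377` on the reactant side would get the same fingerprint, so a lookup on a stored fingerprint column could detect the duplicate before a new row is created.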
We estimate the project will take approximately 9 months in total. We have spent about 2 months gathering requirements and writing this specification, so we expect the project to be complete in another 7 months. See below for the project deliverables:
Priority | Description | Developer | Estimate (days) |
---|---|---|---|
1. | Reaction database design and modelling. | RA and PdM | 10 |
2. | Reaction database implementation. | RA | 20 |
3. | Reaction mapping to ChEBI and ChEBI import. | RA | 45 |
4. | Reaction validator. | RA | 20 |
5. | Curator tool implementation of Reaction database. | RA | 40 |
6. | Public tool implementation of Reaction database. | RA | 20 |