Special characters

XChars (for "special characters") is an XML framework conceived to:
  • disambiguate chemical language
  • add information to text so that computers and non-expert humans can understand it
  • allow different representations of the same text depending on the available or preferred encodings

Meaning

The meaning of 'S' is different depending on the context:
S-adenosyl-L-methionine
(3S)-linalool
In the first case it refers to the element sulfur, but in the second it is a stereochemical descriptor. In XChars XML they become
<element>S</element>-adenosyl-<stereo>L</stereo>-methionine
(3<stereo>S</stereo>)-linalool
which a computer can tell apart.
"caldesmon" is a protein, but a computer won't know that unless it's given a long list of protein names. That's not portable: the list would have to travel with the data for a machine to know which names are proteins. XChars solves this by labelling the name:
<protein>caldesmon</protein>
Labelled names can be given a highlighted representation. More importantly, they can be searched for, because they now carry a meaning that was previously obvious only to humans who know the name.
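To make the point about machine-readable meaning concrete, here is a minimal sketch in Python. It is not the XSL or Java software that ships with XChars; the helper names labelled_spans and search, and the wrapper element <x>, are assumptions made for this illustration. It reads the labels from the fragments shown above and searches by label rather than by bare text.

```python
import xml.etree.ElementTree as ET

def labelled_spans(fragment):
    """Return (label, text) pairs for every labelled span in an XChars fragment."""
    # A fragment is mixed content without a single root element, so it is
    # wrapped in a hypothetical <x> element before parsing.
    root = ET.fromstring("<x>" + fragment + "</x>")
    return [(el.tag, el.text) for el in root.iter() if el is not root]

def search(fragments, label, text):
    """Return the fragments in which `text` occurs with the given label."""
    return [f for f in fragments if (label, text) in labelled_spans(f)]

# The three marked-up names used in this section.
fragments = [
    "<element>S</element>-adenosyl-<stereo>L</stereo>-methionine",
    "(3<stereo>S</stereo>)-linalool",
    "<protein>caldesmon</protein>",
]

print(labelled_spans(fragments[0]))        # [('element', 'S'), ('stereo', 'L')]
print(search(fragments, "stereo", "S"))    # only the linalool fragment
print(search(fragments, "element", "S"))   # only the S-adenosyl-L-methionine fragment
print(search(fragments, "protein", "caldesmon"))
```

A plain-text search for 'S' would match both chemical names; searching on the (label, text) pair separates them without any external list of names.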

Formatting

XChars has another use, more closely related to formatting. The XChars markup is easily converted to an ASCII version by the software provided (see the XSL and Java package modules). Other transformations are possible; HTML is just one of them:
<i>S</i>-adenosyl-<small>L</small>-methionine
(3<i>S</i>)-linalool
XChars is highly customizable. The configuration file can be tweaked to get this HTML translation instead:
<span class="elem">S</span>-adenosyl-<span class="ster">L</span>-methionine
(3<span class="ster">S</span>)-linalool
CSS can then style the two classes so that the two meanings of 'S' are distinguished on a web page at a glance.
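The class-based translation can be sketched as a lookup from XChars label to CSS class. This Python sketch is only an illustration: the CSS_CLASSES dictionary stands in for the XChars configuration file, whose real format is not shown here, and to_html is a hypothetical helper that handles a single level of labelled spans.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stand-in for the configuration: XChars label -> CSS class.
CSS_CLASSES = {"element": "elem", "stereo": "ster"}

def to_html(fragment, classes=CSS_CLASSES):
    """Convert an XChars fragment to HTML spans carrying one class per label."""
    # Wrap the mixed-content fragment in a hypothetical <x> root for parsing.
    root = ET.fromstring("<x>" + fragment + "</x>")
    parts = [escape(root.text or "")]
    for child in root:
        text = escape(child.text or "")
        css = classes.get(child.tag)
        # Labels without a configured class fall back to their plain text.
        parts.append(f'<span class="{css}">{text}</span>' if css else text)
        # The tail is the unlabelled text that follows the element.
        parts.append(escape(child.tail or ""))
    return "".join(parts)

print(to_html("<element>S</element>-adenosyl-<stereo>L</stereo>-methionine"))
print(to_html("(3<stereo>S</stereo>)-linalool"))
```

Under these assumptions the two calls print the two span-based lines shown above. Changing the dictionary is enough to change the class names, which is the kind of tweak the configuration file is described as allowing.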
The scissile bond in peptidase substrates can be written using its Unicode character, but if the data is used in a plain ASCII text file it would look like:
&#9532;
That's not very meaningful. The XChars representation is:
<scissile/>
This is one of the empty XML elements in XChars. For the ASCII encoding it becomes -|-, but on a web page it is translated to the Unicode character reference &#9532;. Moreover, the GIF encoding makes the symbol appear exactly the same in every browser, whatever font it uses:
<img src="img/scissile.gif"/>
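The three renderings of the empty element can be sketched as one table keyed by target encoding. This Python sketch is an assumption-laden illustration, not the XChars converter: the encoding names "ascii", "html" and "gif", the render helper, and the sample substrate are all hypothetical; only the three output strings come from the text above.

```python
import xml.etree.ElementTree as ET

# Renderings of <scissile/> per target encoding, as given in the text.
# The keys are hypothetical names chosen for this sketch.
SCISSILE = {
    "ascii": "-|-",
    "html": "&#9532;",
    "gif": '<img src="img/scissile.gif"/>',
}

def render(fragment, encoding):
    """Replace each <scissile/> in a fragment with its form for `encoding`."""
    # Wrap the mixed-content fragment in a hypothetical <x> root for parsing.
    root = ET.fromstring("<x>" + fragment + "</x>")
    parts = [root.text or ""]
    for child in root:
        if child.tag == "scissile":
            parts.append(SCISSILE[encoding])
        else:
            # Other labels are reduced to their text in this sketch.
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)

# Hypothetical substrate, used only to show the symbol in context.
substrate = "Ala<scissile/>Gly"

print(render(substrate, "ascii"))  # Ala-|-Gly
print(render(substrate, "html"))   # Ala&#9532;Gly
print(render(substrate, "gif"))    # Ala<img src="img/scissile.gif"/>Gly
```

The source text stays the same in all three cases; only the lookup table decides how the symbol is written for a given target.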