Scientific Databases and Visualization - BioReader
The main focus of this project is the development and application of natural language processing (NLP) methods for handling chemical compound names. A chemical compound can have many different names: several trivial names as well as several systematic names, even when naming recommendations such as those of the International Union of Pure and Applied Chemistry (IUPAC) are followed. Furthermore, underspecified names and class names occur frequently in publications, databases and patents.
This project follows two different approaches:
ChemHits identifies names of chemical compounds via string normalization. Input names are normalized and subsequently matched against one of several reference databases (ChEBI, KEGG, etc.). (Version 1.0 released Dec. 2009!)
CLP(name2structure) aims at a deep analysis resulting in a chemical structure and classification for a given name.
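The normalize-then-match idea behind ChemHits can be illustrated with a minimal sketch. The normalization rules below are assumptions chosen for illustration (case folding, spelling out Greek letters, unifying bracket types, dropping whitespace and hyphens). They are not ChemHits' actual rules, which this page does not describe. The reference table and its `REF:` identifiers are hypothetical stand-ins for a real database such as ChEBI or KEGG.

```python
import re
import unicodedata

# Hypothetical normalization rules for illustration only;
# ChemHits' actual rules are not described on this page.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}


def normalize(name: str) -> str:
    """Map a compound name to a canonical key for exact-match lookup."""
    # Unicode compatibility normalization (e.g. full-width characters).
    s = unicodedata.normalize("NFKC", name)
    # Case folding; this also maps capital Greek letters to lower case.
    s = s.casefold()
    # Spell out Greek letters so "α-D-glucose" matches "alpha-D-glucose".
    for letter, word in GREEK.items():
        s = s.replace(letter, word)
    # Treat square and curly brackets as round brackets.
    s = s.translate(str.maketrans("[]{}", "()()"))
    # Drop whitespace, ASCII hyphens and Unicode dashes (U+2010..U+2014).
    return re.sub(r"[\s\-\u2010-\u2014]+", "", s)


def build_index(reference: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index every synonym of every reference entry by its normalized form."""
    index: dict[str, set[str]] = {}
    for entry_id, names in reference.items():
        for n in names:
            index.setdefault(normalize(n), set()).add(entry_id)
    return index


def lookup(name: str, index: dict[str, set[str]]) -> set[str]:
    """Return all reference IDs whose normalized synonym equals the query's."""
    return index.get(normalize(name), set())


# Hypothetical toy reference table (IDs are placeholders, not real accessions).
REFERENCE = {
    "REF:0001": ["alpha-D-glucose"],
    "REF:0002": ["L-alanine", "(S)-2-aminopropanoic acid"],
}
index = build_index(REFERENCE)

print(lookup("α-D-Glucose", index))                # {'REF:0001'}
print(lookup("[S]-2-aminopropanoic acid", index))  # {'REF:0002'}
print(lookup("caffeine", index))                   # set()
```

Returning a set rather than a single ID leaves room for ambiguity: aggressive normalization raises recall but can map distinct names to the same key. For example, case folding conflates the D/L and d/l descriptors. How far to normalize is therefore a precision/recall trade-off.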
The methods and tools developed in this project are intended for use by curators of the SABIO-RK database to identify compounds.
Page last modified: 15.02.2010, 12:04
Project Manager
Group Leader, Priv.-Doz. Dr. Wolfgang Mueller Email:
Phone: +49 (0)6221 - 533 - 231 Fax: +49 (0)6221 - 533 - 298