





WWW-based Chemical Information System

Peter Ertl and Olivier Jacob

Novartis Crop Protection AG, Agro Research Computing
R-1045.2.43, CH-4002 Basel, Switzerland

peter.ertl@cp.novartis.com



Abstract

A WWW-based chemical information system developed within the Crop Protection Division of Novartis is presented. The system enables easy retrieval of molecules from the corporate database, generation and editing of structures using a novel molecular editor written in Java, on-line calculation of hydrophobic and electronic physicochemical properties, generation of molecular images of various types and simple QSAR analysis. The article illustrates numerous advantages of WWW-based chemical information tools installed on the company intranet.


1. Introduction

The capabilities of the World Wide Web in providing textual and graphic information are well known. Newly emerging techniques such as the Virtual Reality Modelling Language, chemical plug-ins and Java have added further functionality to the web, changing it from a static into a fully dynamic environment. Easy-to-use, platform-independent WWW-based tools have thus become well suited to chemical information processing [1, 2]. The Novartis WWW-based chemical information system uses these new technologies to provide bench chemists with a set of tools that assist in the design of potent and, at the same time, environmentally acceptable agrochemicals. The system currently enables retrieval of molecules from the corporate database, generation and editing of structures, on-line calculation of hydrophobic and electronic physicochemical properties, generation of molecular images and simple QSAR analysis.

The system uses a client-server web architecture. Users interact with it through their Netscape browsers, which are installed mainly on IBM PC compatible computers. All processing is done on a Silicon Graphics WebFORCE server running cgi-bin PERL scripts and the C executables they invoke. Java applets are used for the interactive manipulation of molecular structures.


2. Molecular Engine

The Molecular Engine is the main tool for calculation of physicochemical properties and display of molecules. The engine accepts the input of structures in several formats, namely:

In the next step the desired action must be chosen. The following options are currently available:

After the job is submitted, the request is sent to the server and processed there, and the results are displayed as HTML in the browser window.
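The request cycle described above can be sketched as follows. This is a minimal illustration only: the form field names `smiles` and `action`, the two handler functions and the HTML layout are assumptions made for the example, and the real system dispatches to PERL scripts and C executables rather than to Python functions.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical action handlers; placeholders for the server-side programs.
def calculate_properties(smiles):
    return f"Properties requested for {smiles}"

def generate_image(smiles):
    return f"Image requested for {smiles}"

# Hypothetical mapping from the form's action field to a handler.
ACTIONS = {"properties": calculate_properties, "image": generate_image}

def handle_request(query_string):
    """Parse a submitted form and return an HTML page with the result."""
    form = parse_qs(query_string)
    smiles = form.get("smiles", [""])[0]
    action = form.get("action", ["properties"])[0]
    handler = ACTIONS.get(action)
    if handler is None or not smiles:
        body = "Invalid request"
    else:
        body = handler(smiles)
    # Escape the text so that user input cannot inject markup.
    return f"<html><body><p>{escape(body)}</p></body></html>"
```

Keeping the dispatch table separate from the parsing step means a new action can be added without touching the request handling itself.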


2.1. Calculation of Molecular Properties

Using the Molecular Engine it is possible to calculate important hydrophobic and electronic properties for organic molecules.

Hydrophobic properties express the ability of a molecule to be transported in the environment and in an organism, to interact with biological membranes, and to be bound to a receptor by hydrophobic forces. The hydrophobic properties currently calculated by the Molecular Engine are:

Hydrophobic parameters are calculated by in-house programs based on published theory [4-7]. The parameters are calculated as sums of atomic or group contributions, so the topological information contained in the SMILES code is sufficient.
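The additive principle can be illustrated with a short sketch. The contribution values below are invented for the example and are not taken from the cited schemes, which distinguish many atom types according to their bonding environment. The sketch also assumes simple SMILES strings without bracket atoms and ignores hydrogens.

```python
import re

# Hypothetical per-element contributions, for illustration only.
CONTRIBUTIONS = {"C": 0.5, "N": -1.0, "O": -1.0, "Cl": 0.75}

# Matches two-letter halogens first, then one-letter aliphatic and
# aromatic atom symbols; ring digits and bond symbols are skipped.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnosp]")

def additive_property(smiles):
    """Sum element contributions over the heavy atoms named in a SMILES string."""
    total = 0.0
    for symbol in ATOM_PATTERN.findall(smiles):
        element = symbol.capitalize()  # aromatic 'c' is counted as 'C'
        total += CONTRIBUTIONS.get(element, 0.0)
    return total
```

Because only the atoms and their types enter the sum, no 3D structure is needed, which is why such properties can be computed directly from the SMILES code.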

Electronic properties characterize the electronic distribution within the molecule. They account for the ability of the molecule to react and for the electronic interaction between the drug and the receptor. Electronic properties are calculated by the AM1 [8] semiempirical quantum-chemical method using 3D structures generated by the CORINA program [9]. Available electronic properties are:



2.2. Generation of Molecular Images

With the Molecular Engine it is possible to generate molecular images in various modes. Images are generated in GIF format using the gd gif library [10]. The response is very fast: an average image is generated in 2-3 seconds. Examples are shown below.

The advantage of CPK (space-filling) images is that they show the overall space requirement of the molecule. In ball-and-stick or tube models, on the other hand, the atomic connectivity is visible. The dotted molecular surface with a visible molecular frame combines the advantages of both approaches. Generated images may be saved locally on disk and incorporated in reports or other documents.

The system also allows the calculation and display of the molecular lipophilicity potential (MLP) [11] on the molecular surface. The MLP is calculated from atomic lipophilicity contributions. Examining the surface lipophilicity potential can reveal the parts of the molecule that are involved in hydrophobic interactions and provide some hints concerning its mode of action.
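As a rough sketch of how a potential can be built from atomic contributions, the example below sums the contribution of each atom weighted by a function of its distance to the surface point. The weighting 1/(1 + d) is one form used in the MLP literature; whether the system described here uses this form is an assumption, as are the function name and the data layout.

```python
import math

def mlp_at_point(point, atoms):
    """Lipophilicity potential at `point` from (x, y, z, contribution) atoms.

    Uses the distance weighting 1 / (1 + d), an assumption for this sketch.
    """
    potential = 0.0
    for x, y, z, contribution in atoms:
        distance = math.dist(point, (x, y, z))
        potential += contribution / (1.0 + distance)
    return potential
```

Evaluating this function at each point of the dotted molecular surface gives the values that can then be mapped to colours for display.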

It is also possible to manipulate interactively the 3D molecular structures generated by CORINA, using an in-house molecular visualizer written in Java.


3. Editing of Structures

Since molecular construction and editing are indispensable for chemical information systems, and no such tool was available for the WWW, we decided to develop our own WWW-based molecular editor. Two versions of the editor are currently available: a clickable map version and a Java-powered version which uses the latest developments in network programming.

The clickable map version of the editor enables easy construction of organic molecules by adding or connecting atoms, bonds, rings and functional groups. The user chooses the desired action from the menu and then picks the appropriate place on the drawing area. All of the actual processing is done in the background on the server, to which the coordinates of the picked point are sent. The program running there performs the requested changes and returns a GIF image of the modified structure. The disadvantage of this approach, however, is that the program requires a new connection to the server for each structural change.
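One step the server has to perform in such a scheme is to translate the picked coordinates into an atom of the current structure. The following sketch shows one plausible way of doing this; the function name, the pixel tolerance and the representation of atoms as 2D screen coordinates are assumptions for the example and are not details given for the actual editor.

```python
import math

def pick_atom(click, atoms, tolerance=10.0):
    """Return the index of the atom nearest to the click, or None if too far.

    `click` is an (x, y) pair in image pixels and `atoms` is a list of
    (x, y) atom positions in the same coordinate system.
    """
    best_index, best_distance = None, tolerance
    for index, (x, y) in enumerate(atoms):
        distance = math.hypot(click[0] - x, click[1] - y)
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index
```

A click that falls outside the tolerance of every atom returns `None`, which a server could interpret as a request to place a new atom at that position.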


The new version of the editor is written in Java, a new programming environment developed recently by Sun Microsystems [12]. Programs written in Java, called "applets", can be downloaded through the network and integrated into the main body of web documents. With this technology the Molecular Editor can be used directly within the web page. Working with this editor is analogous to editing molecules with standard chemical drawing packages. The editor creates a SMILES code, which is passed to the server for further processing.

4. Calculation of Molecular Similarity

Finding and examining molecules whose physicochemical properties are similar to those of known leads can provide many useful ideas for the design of new structures and also some hints concerning their mode of action. Web-based molecular similarity searching enables chemists to find such similar molecules. The properties currently considered in searching are:

All of these properties (the default) or any combination of them may be used as the search criterion. The similarity of two molecules is calculated simply as the distance between two points in the n-dimensional space of (properly scaled) molecular properties. The search may be performed either in the database of commercially available third-party molecules (with more than 500,000 entries) or in the full corporate database. The actual searching is done by a C program launched by the PERL cgi-bin script.
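The distance-based similarity measure can be sketched as below. The text does not specify the scaling or the distance metric, so scaling each property to zero mean and unit standard deviation and using the Euclidean distance are both assumptions here, as are the function names.

```python
import math

def scale_columns(table):
    """Scale each property column to zero mean and unit standard deviation."""
    scaled_columns = []
    for column in zip(*table):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        # A constant column carries no information and is set to zero.
        scaled_columns.append(
            [(v - mean) / std if std > 0 else 0.0 for v in column]
        )
    return [list(row) for row in zip(*scaled_columns)]

def most_similar(query_index, table, count=1):
    """Return indices of the `count` rows closest to the query row."""
    scaled = scale_columns(table)
    query = scaled[query_index]
    distances = [
        (math.dist(query, row), index)
        for index, row in enumerate(scaled)
        if index != query_index
    ]
    return [index for _, index in sorted(distances)[:count]]
```

Without scaling, a property with a large numerical range would dominate the distance, which is why the properties must be brought to a common scale before the comparison.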


5. Tools for QSAR Analysis

One of the main motivations for setting up our WWW server was to provide chemists with a set of simple, interactive and easy-to-use tools for basic molecular modelling and QSAR analysis. Currently, it is possible to generate data tables with calculated standard hydrophobic and electronic physicochemical properties for a series of molecules. To create such a table it is sufficient to type or paste a list of corporate test numbers, optionally with biological activities, into a standard form interface; all subsequent calculations are done automatically. Calculations run on the Silicon Graphics server using in-house programs for the estimation of hydrophobic properties and the MOPAC package for the calculation of electronic properties. The resulting table can be pasted directly into Excel and manipulated there. Data tables may also be submitted for automatic QSAR processing, in which the best correlation equations between activity and properties are identified (best in terms of the cross-validated R², R²cv). The correlations may be viewed and examined graphically (quality of fit, identification of outliers) using an interactive Java applet.
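A simple way to rank correlations by a cross-validated R² is sketched below for equations with a single property. The text does not state which cross-validation scheme is used, so leave-one-out cross-validation is an assumption, and the function names and the dictionary layout of the data table are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_r2(xs, ys):
    """Leave-one-out cross-validated R2: 1 - PRESS / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without molecule i, then predict its activity.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

def best_descriptor(descriptors, activity):
    """Return the name of the property with the highest cross-validated R2."""
    return max(descriptors, key=lambda name: loo_r2(descriptors[name], activity))
```

Unlike the ordinary R², the cross-validated value is computed from predictions for molecules left out of the fit, so it penalizes equations that describe the training data well but predict poorly.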


6. Summary

The WWW-based system described here has been in use in the Crop Protection Division of Ciba-Geigy, and later of Novartis, for about one year. Owing to the user friendliness of web technology it was possible to introduce it without any special training. The system has been very well accepted and is currently accessed by more than one hundred users, mostly bench chemists. The system contributes to the goal of the Crop Protection Division of Novartis - to offer products which are beneficial to agriculture and whose safety to human health and environment has been investigated and substantiated in accordance with the state-of-the-art in scientific knowledge and methods.


References

  1. S.M. Bachrach, P. Murray-Rust, H.S. Rzepa, B.J. Whitaker, Network Science http://www.awod.com/netsci/Science/Special/feature07.html (1996).
  2. W.D. Ihlenfeldt, J. Gasteiger, Chemie in unserer Zeit, 29 (1995) 249.
  3. D. Weininger, J.Chem.Inf.Comput.Sci., 28 (1988) 31.
  4. A.K. Ghose, A. Pritchett, G.M. Crippen, J.Comput.Chem., 9 (1988) 80.
  5. V.N. Viswanadhan, A.K. Ghose, G.R. Revankar, R.K. Robins, J.Chem.Inf.Comput.Sci., 29 (1989) 163.
  6. K. Wakita, M. Yoshimoto, S. Miyamoto, H. Watanabe, Chem.Pharm.Bull., 34 (1986) 4663.
  7. W.J. Lyman, W.F. Reehl, D.H. Rosenblatt (eds.), Handbook of Chemical Property Estimation Methods, Am.Chem.Soc., Washington DC, 1990.
  8. M.J.S. Dewar, E.G. Zoebisch, A.F. Healy, J.J.P. Stewart, J.Am.Chem.Soc., 107 (1985) 3902.
  9. J. Sadowski, J. Gasteiger, Chemical Reviews, 93 (1993) 2567.
  10. Th. Boutell, gd gif C library, http://www.boutell.com/gd/, 1996.
  11. F. Croizet, M.H. Langlois, J.P. Dubost, P. Braquet, E. Audry, Ph. Dallet, J.C. Colleter, J.Mol.Graphics, 8 (1990) 153.
  12. Sun Microsystems, What is Java, http://java.sun.com/, 1995.