TrAC - Internet Column


To cite this article please refer to the printed edition of TrAC: Trends Anal. Chem., 16 (1997) 370

Web-Based Information Management System

Douglas Brown, Antony Williams and David McLaughlin

Eastman Kodak Company, Rochester, NY 14650-2132


ABSTRACT
INTRODUCTION
WIMS DESIGN GOALS
TOUR OF WIMS
HOW IT WORKS
SUMMARY


FIGURES

Fig. 1 - Login screen illustrating WIMS interface.
Fig. 2 - WIMS search screen.
Fig. 3 - WIMS Login result example.
Fig. 4 - An NMR spectrum displayed from the Web Plotter CGI.
Fig. 5 - A structure displayed from our Quantum database.
Fig. 6 - Sample output from the Recent command.
Fig. 7 - Sample output from the Stats command.
Fig. 8 - Sample output from the Active command.
Fig. 9 - The Relational Structure of WIMS.


ABSTRACT

A number of efforts have been made over the years to implement laboratory information management systems (LIMS) for analytical data at Eastman Kodak Company. These efforts have ranged from paper trails and spreadsheets to vendor-supplied software, but none has provided corporate-wide access to the data. We have recently used World Wide Web technology to implement a Web-based Information Management System, WIMS, a full-featured sample manager that gives desktop access to sample information, structures and spectra, as well as to a structure database holding historical reference data. This paper gives an overview of WIMS, from its design goals to its implementation, and discusses future directions.


INTRODUCTION

Intranets have redefined the expectations placed on corporate communication systems. Electronic mail and instant access to large amounts of corporate documentation, such as product information and corporate guidelines, are now available at low cost with a potentially large return on investment (ROI). Just as importantly, web front ends to databases on an intranet give worldwide access to dynamic corporate information from a simple web browser. For corporate scientific laboratories, an obvious database application to move to the intranet is the Laboratory Information Management System (LIMS).

LIMS are typically used to store sample histories and analytical results for chemicals or materials in searchable databases. More advanced LIMS requirements include associating samples with structures and with analytical data such as spectra, and automated reporting. To date, commercially available LIMS have failed to meet the needs of many corporate research laboratories, especially the specific requirements of spectroscopy laboratories. As a result, many laboratories have developed in-house systems or have extensively modified commercial packages. Moving to a web-based application while keeping existing functionality requires further development effort, but the payoff can be a far more versatile and accessible system that is also easy to maintain.

Many attempts have been made over the years within the Analytical Technology Division at Kodak to implement a LIMS. These have included, and indeed still include, filing systems for data, paper trails and log books, simple spreadsheets of sample identifiers and information, and a full-blown vendor-supplied LIMS. In no case has worldwide access to spectroscopic data been achieved. The Nuclear Magnetic Resonance (NMR) group has undertaken to provide a user-friendly system that links a unique sample identifier to sample information, one or more chemical structures, the associated NMR spectra, and a final report of analysis. To allow access from Kodak sites across the world, we chose World Wide Web technology to implement a platform-independent Web-based Information Management System (WIMS) on the corporate intranet.

WIMS was improved iteratively during development as our working knowledge of the capabilities of the web grew. We expect WIMS to continue to evolve as users and programmers become more familiar with its current capabilities and as web and browser technology expands. This paper gives an overview of the WIMS design goals, a tour of the WIMS functions, and a discussion of future directions.


WIMS DESIGN GOALS

WIMS began as a simple method of assigning a unique sample identifier to incoming work for the NMR group at Kodak. We then added tracking of submitters and analysts, and soon realized the enormous potential of a web-accessible LIMS. The project goals were refined over several months and are summarized below.

Easy Access to Samples, Structures and Spectra - The major goal of WIMS is to integrate access to sample information, including reports, structures and spectra, to satisfy the needs of its users, who are primarily the analysts, while still meeting the other design goals listed below. Since the overall function of a LIMS is to store sample-related information for later retrieval, the task was to optimize these functions for all levels of user: from those who simply want to log in a sample with minimal interaction, to those who will take maximum advantage of the information in the system. The WIMS interface consists of two frames, as seen in Fig. 1. A narrow frame at the top contains the primary WIMS commands, so that individual tasks can be selected from anywhere within WIMS, and all output appears in a large lower frame. Structures and spectra are stored in separate databases and are retrieved and displayed by C-based CGI (Common Gateway Interface) programs written for these tasks.

Fig. 1 - Login screen illustrating WIMS interface.

Speed - Every aspect of WIMS was designed to minimize execution and response times. To this end, WIMS presently runs on a Sun Ultra-1 using the NCSA HTTPD server package [1]. The only images in the core WIMS program (spectra and structures are held in separate databases) are in the top frame, which is loaded only once per session. No backgrounds, non-standard fonts or other bandwidth-heavy Web or HTML gimmicks that could delay the loading of a page are used.

The heart of WIMS is its set of CGI programs, which receive the data submitted from the browser, access the databases on the server, and return HTML to the client browser. The CGIs are written in ANSI C using Thomas Boutell's cgic library [2]. C-based CGIs are a fast way of connecting external databases to the web, provided the programs are kept small to minimize overhead. It can also be easier to write CGIs than to use alternatives such as browser plugins or Java applets, because no special development environment is needed.
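WIMS relies on cgic to decode form data. As a rough, library-free illustration of what such a library does for a GET request, the sketch below extracts one field from a URL-encoded query string. The helper name `form_field` and its signature are hypothetical and are not part of cgic's API; in cgic itself the equivalent call is `cgiFormString`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical helper: copy the URL-decoded value of field `name` from an
   application/x-www-form-urlencoded string such as "area=NMR&analyst=D+Brown"
   into out (at most outlen-1 chars, always NUL-terminated).
   Returns 1 if the field was found, 0 otherwise. */
int form_field(const char *query, const char *name, char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *p = query;

    if (outlen == 0) return 0;
    out[0] = '\0';
    while (p && *p) {
        if (strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            const char *v = p + nlen + 1;
            size_t n = 0;
            while (*v && *v != '&' && n + 1 < outlen) {
                if (*v == '+') {                    /* '+' encodes a space */
                    out[n++] = ' ';
                    v++;
                } else if (*v == '%' && isxdigit((unsigned char)v[1])
                           && isxdigit((unsigned char)v[2])) {
                    out[n++] = (char)(hex_value((unsigned char)v[1]) * 16
                                      + hex_value((unsigned char)v[2]));
                    v += 3;                         /* %XX encodes one byte */
                } else {
                    out[n++] = *v++;
                }
            }
            out[n] = '\0';
            return 1;
        }
        p = strchr(p, '&');                         /* next name=value pair */
        if (p) p++;
    }
    return 0;
}
```

In a real CGI the query string would come from `getenv("QUERY_STRING")` for GET requests, while POST data arrives on standard input with its length in `CONTENT_LENGTH`; cgic handles both cases.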

WIMS uses multiple binary flat-file databases, one for each analytical area (such as NMR, IR and MS), because we expect most users to search only the data from their own area. NMR analysts, for example, will generally search just the NMR data, and with that data in its own database such a search runs much faster than it would if all the data were combined. Authorized users can still search the other databases, and a separate CGI searches all the databases at once on a few key fields.
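The cross-database search on key fields might look something like the following sketch. The record layout, the choice of "Kodak Number" as the key, and the exact-match rule are all assumptions for illustration; the paper does not say which key fields the WIMS CGI searches or how.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout: only the key fields shared by every area. */
struct key_record {
    long lab_number;
    char kodak_number[16];
    char sample_name[48];
};

/* One open database file per analytical area. */
struct area_db {
    const char *area;   /* e.g. "NMR", "IR", "MS" */
    FILE *fp;
};

/* Scan each area database in turn and print "AREA lab_number" for every
   record whose Kodak number equals key. Returns the total number of hits. */
int search_all_areas(const struct area_db *dbs, size_t n, const char *key, FILE *out)
{
    struct key_record r;
    size_t i;
    int hits = 0;

    for (i = 0; i < n; i++) {
        rewind(dbs[i].fp);
        while (fread(&r, sizeof r, 1, dbs[i].fp) == 1) {
            /* guard against an unterminated field in the file */
            r.kodak_number[sizeof r.kodak_number - 1] = '\0';
            if (strcmp(r.kodak_number, key) == 0) {
                fprintf(out, "%s %ld\n", dbs[i].area, r.lab_number);
                hits++;
            }
        }
    }
    return hits;
}
```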

Binary flat files are a standard way of implementing small- to medium-sized databases. New entries are simply appended to the end of the database file. A small, separate binary file holds the first and last lab numbers in the database; lab numbers are assigned sequentially. An entry can therefore be located at an offset of (lab number - first number) * (record size) bytes from the beginning of the file. This removes the need for a hash table or separate index to find a record quickly, and overall the technique gives very fast access to simple databases. The method was tested with a database of 1,000,000 entries, and routine sample queries still took only a few seconds. Field sizes, and therefore record sizes, are fixed. This has obvious disadvantages, such as wasted space and a limit on the amount of information each field can hold, but for our situation the gains in performance and in simplicity of data access and coding make it a favorable compromise.
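The offset arithmetic is easy to express in ANSI C. In this sketch the record layout and the `read_record` helper are hypothetical, since the paper does not publish the WIMS record format; the point is that a single `fseek` to (lab number - first number) * record size replaces any index lookup.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-width record: every field has a constant size,
   so every record occupies exactly sizeof(struct wims_record) bytes. */
struct wims_record {
    long lab_number;
    char submitter[32];
    char analyst[32];
    char sample_name[64];
    long date_in;      /* time stamps as seconds since the epoch */
    long date_out;     /* 0 while the sample is still active */
};

/* Byte offset of a lab number, given the first number in the file. */
long record_offset(long lab_number, long first_number)
{
    return (lab_number - first_number) * (long)sizeof(struct wims_record);
}

/* Read the record for lab_number; first and last come from the small
   header file. Returns 1 on success, 0 if out of range or the read fails. */
int read_record(FILE *db, long first, long last, long lab_number,
                struct wims_record *out)
{
    if (lab_number < first || lab_number > last)
        return 0;
    if (fseek(db, record_offset(lab_number, first), SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof *out, 1, db) == 1;
}
```

Writing structs directly with `fwrite` ties the file format to one compiler's padding and the server's byte order. That is acceptable on a single server, but it would matter if the databases were ever moved to another platform.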

Finally, each major database function in WIMS has been coded as a separate CGI program. This approach means more code and some duplication compared with one large program, but each function executes more quickly because an individual CGI carries less overhead than a single large program would.

Searching - To many users this is the most important function of WIMS. All login fields are searchable (see Fig. 2), and every non-blank field submitted is used in the search. Searches are case-insensitive substring searches. Starting and/or ending date limits can also be specified, as well as a choice of output formats. A non-indexed search from the beginning to the end of the database is used so that newly entered data is immediately searchable, rather than only after the databases are re-indexed. Although this is probably slower than an ideal indexed search, Kodak users want all current data included in their searches rather than wait for it to be indexed. Searches currently take one to two seconds of real time, which is acceptable. Searching of report text is provided by the Excite engine [3], whose indexes are updated nightly.

Fig. 2 - WIMS search screen.
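A minimal sketch of the unindexed, case-insensitive substring search, assuming a simplified in-memory record with three text fields and a "Date In" stored as a YYYYMMDD integer (both assumptions). Blank query fields match everything, and a date limit of 0 means "no limit".

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive substring test; an empty needle matches everything. */
int ci_contains(const char *hay, const char *needle)
{
    size_t i;
    if (*needle == '\0') return 1;
    for (; *hay; hay++) {
        for (i = 0; needle[i]; i++) {
            if (hay[i] == '\0' ||
                tolower((unsigned char)hay[i]) != tolower((unsigned char)needle[i]))
                break;
        }
        if (needle[i] == '\0') return 1;   /* whole needle matched here */
    }
    return 0;
}

/* Hypothetical search form: "" = field left blank, 0 = no date limit. */
struct search_query {
    const char *submitter, *analyst, *sample_name;
    long from_date, to_date;               /* YYYYMMDD */
};

/* Hypothetical simplified record. */
struct entry {
    const char *submitter, *analyst, *sample_name;
    long date_in;                          /* YYYYMMDD */
};

int entry_matches(const struct entry *e, const struct search_query *q)
{
    if (!ci_contains(e->submitter, q->submitter)) return 0;
    if (!ci_contains(e->analyst, q->analyst)) return 0;
    if (!ci_contains(e->sample_name, q->sample_name)) return 0;
    if (q->from_date && e->date_in < q->from_date) return 0;
    if (q->to_date && e->date_in > q->to_date) return 0;
    return 1;
}

/* Visit every record from first to last: no index, so new entries
   are searchable as soon as they are written. */
size_t count_matches(const struct entry *rows, size_t n, const struct search_query *q)
{
    size_t i, hits = 0;
    for (i = 0; i < n; i++)
        if (entry_matches(&rows[i], q)) hits++;
    return hits;
}
```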

Data Integrity - Data integrity is maintained by not allowing any changes to the "Date In" or "Date Out" fields, which WIMS fills automatically with a date and time stamp at login and logout respectively. With CGIs this kind of simple protection is easy to build directly into the C code. There is no need to change file permissions, because only the CGIs have direct access to the database. For the audit trail, the full log files from the NCSA HTTPD web server are kept; these contain a record of every transaction performed by the server.
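One way to enforce the date rule in C is to make the time stamps unreachable from form input: the structure holding submitted edits simply has no date fields, and the dates are set only by the login and logout routines. The structures and function names below are hypothetical. The current time is passed in so the logic can be tested; a CGI would pass `time(NULL)`.

```c
#include <string.h>
#include <time.h>

/* Hypothetical stored sample. */
struct sample {
    char analyst[32];
    char comments[128];
    time_t date_in;    /* set once, at login */
    time_t date_out;   /* set once, at logout; 0 while active */
};

/* Hypothetical edit request: only user-editable fields exist here,
   so nothing submitted from a form can reach the date fields. */
struct sample_edit {
    const char *analyst;
    const char *comments;
};

static void copy_field(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

void stamp_login(struct sample *s, time_t now)
{
    s->date_in = now;
    s->date_out = 0;
}

/* Returns 0 if the sample was already logged out; date_out is never overwritten. */
int stamp_logout(struct sample *s, time_t now)
{
    if (s->date_out != 0) return 0;
    s->date_out = now;
    return 1;
}

void apply_edit(struct sample *s, const struct sample_edit *e)
{
    if (e->analyst) copy_field(s->analyst, sizeof s->analyst, e->analyst);
    if (e->comments) copy_field(s->comments, sizeof s->comments, e->comments);
    /* date_in and date_out are deliberately left untouched */
}
```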

Accounting - Users have not yet requested specific accounting functions, but we anticipate that accounting will become an important part of WIMS. An example is provided by the Stats function, which currently reports, for each month, the total number of samples entered into the system, the number of active samples, and a "turnaround time", defined as the average difference between the "Date Out" and "Date In" time stamps.
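The turnaround calculation can be sketched as follows, assuming time stamps are stored as `time_t` values and that a "Date Out" of 0 marks an active sample (both assumptions). The month boundaries `start` and `end` could be built with `mktime`.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-sample time stamps. */
struct job_times {
    time_t date_in;    /* set at login */
    time_t date_out;   /* set at logout; 0 while the sample is active */
};

/* Mean turnaround in days for samples logged in during [start, end).
   Active samples are skipped. Returns -1.0 if no completed samples
   fall in the period. */
double mean_turnaround_days(const struct job_times *jobs, size_t n,
                            time_t start, time_t end)
{
    double total_seconds = 0.0;
    size_t i, completed = 0;

    for (i = 0; i < n; i++) {
        if (jobs[i].date_out == 0)
            continue;                               /* still active */
        if (difftime(jobs[i].date_in, start) < 0.0 ||
            difftime(jobs[i].date_in, end) >= 0.0)
            continue;                               /* outside the period */
        total_seconds += difftime(jobs[i].date_out, jobs[i].date_in);
        completed++;
    }
    return completed ? total_seconds / (double)completed / 86400.0 : -1.0;
}
```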

Easy Maintenance - This goal has been met by coding almost all of the main WIMS program conventionally, in ANSI C. Code maintenance has been simplified further in several ways. Each major WIMS function is a separate core program, and an initial HTML page for each analytical area supplies the core programs with the area name (for example "NMR"), which is then passed throughout the code. New analytical areas can therefore be added without changing the core programs, and a global change to a database function requires changes only to the relevant core programs. Finally, WIMS uses the Unix make utility to keep compilation and the incorporation of new or modified code simple.
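A sketch of how a core program might turn the area name passed to it into database file names. The directory, the `.db`/`.hdr` suffixes and the eight-character limit are hypothetical. Checking that the name is short and alphanumeric keeps a value arriving from a web form from naming an arbitrary file.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical directory layout; the paper does not give WIMS's paths. */
#define WIMS_DATA_DIR "/usr/local/wims/data"

/* Accept only short alphanumeric area names such as "NMR", "IR" or "MS". */
int valid_area(const char *area)
{
    size_t len = strlen(area), i;
    if (len == 0 || len > 8) return 0;
    for (i = 0; i < len; i++)
        if (!isalnum((unsigned char)area[i])) return 0;
    return 1;
}

/* Build e.g. "/usr/local/wims/data/NMR.db" from the area name and a
   program-supplied suffix. Returns 1 on success, 0 for a bad area name
   or a buffer that is too small. */
int area_file(const char *area, const char *suffix, char *out, size_t outlen)
{
    size_t need;
    if (!valid_area(area)) return 0;
    need = strlen(WIMS_DATA_DIR) + 1 + strlen(area) + strlen(suffix) + 1;
    if (need > outlen) return 0;
    sprintf(out, "%s/%s%s", WIMS_DATA_DIR, area, suffix);
    return 1;
}
```

With this arrangement, adding an analytical area means creating its files and its initial HTML page; the core programs themselves are unchanged.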

Portability and Expandability - We wanted to create a system that would remain technologically viable for the next ten years. We do not expect the WIMS code, or even the database format, to stay the same over that period, but we do expect still to be using a web-based LIMS several years from now. WIMS consists entirely of HTML, Perl and ANSI C programs, together with the binary databases. All of these should port to other platforms, such as NT servers, with little or no modification if necessary. New functions can be added readily, and the database format can certainly handle our short-term needs. By using an open, web-based system we are also well placed to take advantage of enhancements such as Java and new APIs.

Security - WIMS contains proprietary data, and our goal is to meet or exceed company security guidelines. Security will not be built directly into WIMS; instead we plan to use a centralized authentication service provided by Kodak intranet services. Access to WIMS is currently granted by analytical area, with one user name and password per area, and this may remain so.


TOUR OF WIMS

Much of the WIMS development time has been spent creating a simple data flow and an interface that gives users easy access to the information they need. To demonstrate this interface and data flow, the brief tour presented here follows a sample from login through some of the functions WIMS users typically perform.

On entering WIMS, the user is presented with the login form shown in Fig. 1. This form is used to add sample information to the database. One customized feature visible in this form is the editable pull-down menus, such as the one for "Analyst". These menus are built from text files, which can be edited directly by clicking on the field name. An advantage of using HTML forms for a LIMS application is that nothing is entered into the database until the "LOGIN" (submit) button is clicked. The data can be reviewed easily, and the user can move on to another function without having to delete anything, then return to the login screen or simply abandon the incomplete form.
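The editable pull-down menus can be generated by reading the text file line by line and writing one `<OPTION>` per entry. This is a sketch only; the one-entry-per-line file format and the function names are assumptions, and the separate CGI that edits the file is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Write s to out with the HTML-special characters escaped. */
static void html_escape(FILE *out, const char *s)
{
    for (; *s; s++) {
        switch (*s) {
        case '&': fputs("&amp;", out); break;
        case '<': fputs("&lt;", out); break;
        case '>': fputs("&gt;", out); break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*s, out);
        }
    }
}

/* Emit an HTML <SELECT> whose options are the non-blank lines of `list`
   (e.g. one analyst name per line). Lines longer than the buffer are
   split across options. Returns the number of options written. */
int emit_select(FILE *list, FILE *out, const char *field)
{
    char line[128];
    int count = 0;

    fprintf(out, "<SELECT NAME=\"%s\">\n", field);
    while (fgets(line, sizeof line, list)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0') continue;        /* skip blank lines */
        fputs("<OPTION>", out);
        html_escape(out, line);
        fputs("\n", out);
        count++;
    }
    fputs("</SELECT>\n", out);
    return count;
}
```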

After a sample is logged in, WIMS returns a page of information containing several hypertext links for performing various operations, most of them related to that sample. An example is shown in Fig. 3. To query a sample, its WIMS number can be entered directly with the View function, which returns a very similar page. Adding and emailing reports, uploading structures, and viewing the sample's spectra and structures are all done from this page.

Fig. 3 - WIMS Login result example.

Text reports can be typed or pasted into a text box provided for the purpose, and edited later. HTML reports must currently be uploaded (through a browser or another local program) to an anonymous ftp server, from which they are copied automatically to the WIMS server each night. The NMR group stores NMR spectra on the various instruments, as they are processed, in a proprietary compressed ASCII format. Each file name must contain the unique WIMS sample identifier. These files are also copied automatically to the server each night. When a sample query is run with the View command, the report and spectra directories are checked for files containing the same WIMS sample number, and the appropriate links to this data are created on the fly. Spectra are currently viewed with a custom routine that generates GIF images on the fly [4]. An example is shown in Fig. 4.

Fig. 4 - An NMR spectrum displayed from the Web Plotter CGI.
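When View runs, the file names found in the report and spectra directories have to be matched against the sample number. The sketch below assumes the WIMS identifier is a run of digits and rejects matches that are part of a longer number. It takes the file names as an array to stay within ANSI C; on the Sun server they would typically come from the POSIX `opendir`/`readdir` calls. The URL layout is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* True if `id` occurs in `fname` and is not part of a longer run of
   digits, so "1234" matches "nmr_1234.spc" but not "nmr_12345.spc". */
int name_has_id(const char *fname, const char *id)
{
    size_t len = strlen(id);
    const char *p = fname;

    if (len == 0) return 0;
    while ((p = strstr(p, id)) != NULL) {
        int digit_before = p > fname && isdigit((unsigned char)p[-1]);
        int digit_after = isdigit((unsigned char)p[len]);
        if (!digit_before && !digit_after) return 1;
        p++;
    }
    return 0;
}

/* Write one HTML link per matching file name; returns the number written.
   File names are assumed to be server-generated and safe to echo as-is. */
int emit_links(FILE *out, const char *base_url, const char *const names[],
               size_t n, const char *id)
{
    size_t i;
    int count = 0;
    for (i = 0; i < n; i++) {
        if (name_has_id(names[i], id)) {
            fprintf(out, "<A HREF=\"%s/%s\">%s</A><BR>\n",
                    base_url, names[i], names[i]);
            count++;
        }
    }
    return count;
}
```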

Proposed and final structures can also be uploaded from this page, in MDL molfile format from the client, to our Quantum database. They are currently accessed through links from the WIMS sample number (and other identifiers such as "Kodak Number") to a custom CGI that displays GIF images of structures [5]. An example of output from this CGI is shown in Fig. 5. Note the link for downloading this structure in molfile format. This result also contains (not shown) ¹H and ¹³C NMR assignments, a mass spectrum and an IR spectrum, if they exist in our database.

Fig. 5 - A structure displayed from our Quantum database.

The Recent command returns a screen containing a listing of the most recent entries to the database, and links to directly view each entry as seen in Fig. 6.

Fig. 6 - Sample output from the Recent command.
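Because records have a fixed size and the header holds the first and last lab numbers, the Recent listing needs no scan: it can seek straight to the newest record and step backwards. The sketch uses a hypothetical, reduced record layout.

```c
#include <stdio.h>

/* Hypothetical fixed-size record (reduced to two fields). */
struct recent_record {
    long lab_number;
    char sample_name[64];
};

/* Read up to max of the newest records into out[], newest first.
   first and last are the lab numbers stored in the header file.
   Returns the number of records read. */
size_t read_recent(FILE *db, long first, long last,
                   struct recent_record out[], size_t max)
{
    long n;
    size_t got = 0;

    for (n = last; n >= first && got < max; n--) {
        long off = (n - first) * (long)sizeof(struct recent_record);
        if (fseek(db, off, SEEK_SET) != 0 ||
            fread(&out[got], sizeof out[got], 1, db) != 1)
            break;
        got++;
    }
    return got;
}
```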

The Stats command returns a listing of the number of total and active jobs per month, as seen in Fig. 7, as well as average turnaround times in various categories.

Fig. 7 - Sample output from the Stats command.

The Active command displays all active jobs in a fixed or user-selected format. For the user-selected formats the entries are sorted on the first field. This function is used primarily by analysts to manage and prioritize their day-to-day work. An example is shown in Fig. 8.

Fig. 8 - Sample output from the Active command.
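Selecting the active jobs and ordering them by the first displayed field maps directly onto the C library's `qsort`. The `struct job` layout, with display fields held as strings in the user's chosen order, is an assumption for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical job as displayed: fields in the user-selected order. */
struct job {
    const char *fields[3];
    int active;              /* 1 until the sample is logged out */
};

static int by_first_field(const void *a, const void *b)
{
    const struct job *ja = (const struct job *)a;
    const struct job *jb = (const struct job *)b;
    return strcmp(ja->fields[0], jb->fields[0]);
}

/* Copy the active jobs from all[] into out[] and sort them by their
   first display field. Returns the number of active jobs. */
size_t active_sorted(const struct job *all, size_t n, struct job *out)
{
    size_t i, k = 0;
    for (i = 0; i < n; i++)
        if (all[i].active) out[k++] = all[i];
    qsort(out, k, sizeof *out, by_first_field);
    return k;
}
```

`qsort` is not stable, so jobs with the same first field may appear in any order relative to one another; a second comparison key would fix that if it mattered.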

The Search command takes the user to a search input form. All non-blank fields are searched and there are some other options such as date delimiters as seen in Fig. 2.

The relational structure of WIMS is shown in Fig. 9 below. The user is presented with an interface through the web browser into which commands are entered. Most commands activate CGIs to perform the requested functions on the databases, and then return the results to the web browser for display.

Fig. 9 - The Relational Structure of WIMS.


HOW IT WORKS

The process WIMS uses to move data between the web browser and the databases on the server is described here, using the Login command as an example. Accessing the URL (Uniform Resource Locator) for the NMR WIMS loads an HTML page composed of two frames. The top row of buttons remains fixed, and the larger bottom frame holds the output of any command issued within this URL. The login screen is loaded into the bottom frame initially. This screen is created on the fly by the gen_login CGI program and differs for each analytical area; it is the same screen that is loaded when the Login button is clicked. From here the following occurs:

1) When the Login button is clicked, the analytical area name is passed to the CGI gen_login.

2) The gen_login program returns the HTML code to the browser frame to produce the analytical area Login screen as mentioned above. This screen is actually an HTML form.

3) After filling out the relevant fields, clicking the LOGIN button on this page will execute a CGI program to read in and process the data from the form. This login CGI opens the NMR database header and reads in the first and last lab numbers contained in the database. The last number is incremented by the number of samples requested (default is 1, maximum is 100), and this number of records is appended to the database.

4) Finally, an HTML screen is returned by the login program to the browser describing how many new entries were made and their newly assigned WIMS lab numbers.
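Step 3 can be sketched in ANSI C as follows. The header and record layouts are hypothetical; the limit of 1 to 100 samples per login follows the text. The function returns the first newly assigned lab number, which together with the count is all that step 4 needs to report the new numbers.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LOGIN 100   /* maximum samples per login, as stated in step 3 */

/* Header file contents: first and last lab numbers in the database.
   An empty database can be represented as {first, first - 1}. */
struct db_header {
    long first;
    long last;
};

/* Hypothetical fixed-size record (only two fields shown). */
struct login_record {
    long lab_number;
    char submitter[32];
};

/* Assign `count` new sequential lab numbers, append one record per number,
   and rewrite the header. Returns the first new lab number, or -1 on error.
   No file locking is done here. */
long login_samples(FILE *hdr, FILE *db, const char *submitter, int count)
{
    struct db_header h;
    struct login_record r;
    long first_new;
    int i;

    if (count < 1 || count > MAX_LOGIN)
        return -1;

    /* Read the first and last lab numbers from the header. */
    rewind(hdr);
    if (fread(&h, sizeof h, 1, hdr) != 1)
        return -1;
    first_new = h.last + 1;

    /* Position just past the last record, using the offset formula. */
    if (fseek(db, (first_new - h.first) * (long)sizeof r, SEEK_SET) != 0)
        return -1;
    for (i = 0; i < count; i++) {
        memset(&r, 0, sizeof r);
        r.lab_number = first_new + i;
        strncpy(r.submitter, submitter, sizeof r.submitter - 1);
        if (fwrite(&r, sizeof r, 1, db) != 1)
            return -1;
    }

    /* Store the new last number so the next login continues from it. */
    h.last += count;
    rewind(hdr);
    if (fwrite(&h, sizeof h, 1, hdr) != 1 || fflush(hdr) != 0 || fflush(db) != 0)
        return -1;
    return first_new;
}
```

This sketch does no locking, so two logins arriving at the same moment could both read the same header and be assigned the same numbers. The paper does not say how WIMS guards against this; on a Unix server, an advisory lock (for example with POSIX `fcntl`) held across the read, append and rewrite would be one option.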


SUMMARY

We are taking full advantage of what the Web has to offer, including low cost, a simple and ubiquitous interface and ease of development, and we are moving much of our analytical information to the Kodak intranet. As part of this project we have developed a web-based information management system, WIMS, which meets our needs for a sample management system and gives Kodak chemists worldwide access. Current work focuses on making it easier to work with multiple samples from a project, including multiple logout of samples and the ability to associate one report with several samples. We also plan to implement a structure-drawing Java applet for submitting structures to the Quantum database from WIMS, and a helper application so that users can retrieve spectral data through WIMS and manipulate, view and print it on their local computer. This development has, for the first time, given us a platform-independent application with sample management capability and access to molecular structures, spectra and reports of analysis. For the management of analytical data here at Kodak, we have shown that "The Web is the Way".


ACKNOWLEDGEMENTS

The authors are indebted to the following people for useful discussions and technical support during this project: Brian Adams, William McKenna, Ilia Levi, Peter Monacelli, Alan Payne, Paul Schulwitz, and John Spoonhower.


REFERENCES

[1] National Center for Supercomputing Applications, http://hoohoo.ncsa.uiuc.edu/docs/.

[2] T. Boutell, http://www.boutell.com/cgic.

[3] Excite Inc., http://www.excite.com.

[4] D. E. Brown, Eastman Kodak Company, WebPlotter CGI to display spectra as GIF files, unpublished, 1996.

[5] D. McLaughlin, Eastman Kodak Company, CGI to display structures and other information from Quantum database, unpublished, 1996.
