This is to certify that the dissertation entitled

AUTOMATED ANALYSIS OF TRIPLE QUADRUPOLE MS/MS DATA

presented by Hugh Ralph Gregg has been accepted towards fulfillment of the requirements for the Ph.D. degree in Chemistry.

AUTOMATED ANALYSIS OF TRIPLE QUADRUPOLE MS/MS DATA

By

Hugh Ralph Gregg

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Chemistry

1986

ABSTRACT

AUTOMATED ANALYSIS OF TRIPLE QUADRUPOLE MS/MS DATA

By

Hugh Ralph Gregg

A data explosion has accompanied the advent of microcomputer controlled instrumentation. Many of these automated instruments are capable of generating more data than the operator is willing or able to analyze. What the analyst wants is not the data, but the chemical information contained in the data. In this dissertation, several software tools developed for collecting, storing and retrieving data, as well as methods for extracting the information contained in mass spectrometry/mass spectrometry (MS/MS) data, are presented.

Triple quadrupole mass spectrometry (TQMS) is a multi-variate technique with unique data storage and retrieval requirements. Traditional MS or GC/MS data base systems are unable to handle the many dimensions of data that make MS/MS a powerful tool for structure determination and mixture analysis. A multi-dimensional data base was developed to quickly and efficiently store all data and instrumental parameters produced by TQMS. A program is presented which is able to extract any arbitrary two-dimensional plane of data for display and interpretation.

The final tools presented in this dissertation are aids in the interpretation of MS/MS data.
One of the techniques used in analyzing a mass spectrum of an unknown is to identify characteristic neutral losses from the ions present in the spectrum. MS/MS is unique in that if the collision conditions demonstrate first order fragmentation of the parent ion, all the daughter ions formed are direct, single event fragmentation products of the parent. A program was developed which compares neutral losses (the difference between parent and daughter masses) with a table of common losses and presents a list of possible fragments and/or the substructures giving rise to those fragments.

Mass spectra matching routines have traditionally been used to identify unknown spectra. The simplicity and wide variations in intensities in daughter spectra make normal EI matching techniques unsuitable. The matching algorithm presented in this dissertation was designed to group daughter spectra.

Each of the tools presented in this thesis addresses different aspects of the data/information handling problem and demonstrates the need for powerful software tools to complement today’s complex instrumentation.

ACKNOWLEDGEMENTS

I’d like to thank everyone that has helped make my stay in East Lansing both rewarding and pleasurable, with an extra thanks to Chris Enke for his guidance and friendship. Tom Atkinson also deserves special recognition, for as busy as he was, he always had time to talk about chemistry, computers or home repair. Chris Enke’s research group provided an ideal atmosphere to both learn and make friends - thanks to all past and present group members. I’d also like to acknowledge the following people and organizations: the Physiology Department, Jack Hoffart, Paul Sorrenson (and Art’s bar), the Milton Hilton (and all former tenants), the Winded Spartans, Brenda Spiewak, Mike McPherson, the Beak & Dome Bar & Grill and of course, Milton Webber.

TABLE OF CONTENTS

Chapter 1: Introduction
    Introduction
    Structure determination by MS/MS
    Thesis outline
    Conclusions

Chapter 2: Instrument Control
    Introduction
    Phase 1: Digital strip chart recorder
    Digital strip chart recorder: Hardware
    Digital strip chart recorder: Software
    Phase 2: Single micro control system
    Single micro control system: Hardware
    Single micro control system: Software
    Dr. Memory’s monitor
    Dr. Memory’s monitor internal structure
    Structured Library Oriented Programming System
    SLOPS internal structure
    Mass spectrometer control software
    Phase 3: Current mass spectrometer control system
    Conclusions

Chapter 3: Multi-dimensional Instrument Data Base
    Introduction
    Scientific versus business data bases
    Survey of data base structures
    Data base capabilities
    An example of multi-dimensional data
    A programmer’s view of the data base
    Data base file formats
    Dictionary file
    Header file
    Pointer file
    Data file
    Creation of a data set
    Conclusions

Chapter 4: Retrieval and Display of Multi-dimensional Data
    Introduction
    Data retrieval
    EXTRACT - the program
    How EXTRACT works
    EXTRACT internal structure
    User interface
    Subroutine EXEDIT - Presentation of EXTRACT results
    Subroutine XTRACT
    Examples of EXTRACT use
    Conclusions

Chapter 5: Extracting the Information Contained in MS/MS data
    Introduction
    Information contained in MS/MS data
    Neutral spectrum
    A simple expert system for neutral loss analysis
    ANEUT knowledge base and rules
    Examples of ANEUT use
    Data groupings
    MS versus MS/MS spectra matching
    Intensity matching for daughter spectra
    Ranking and sorting the data groups
    Examples of daughter spectra grouping
    Conclusions

Bibliography

Appendix 1: Dr. Memory, SLOPS and control software
Appendix 2: Multi-dimensional data base subroutines
Appendix 3: EXTRACT User’s Guide

LIST OF TABLES

    TQMS operational modes
    Summary of commands for the digital strip chart software
    Dr. Memory’s monitor
    Basic SLOPS subroutines and commands
    TQMS status display
    TQMS control system commands
    Capabilities of the multi-dimensional data base
    Method to collect 5 dimensions of data
    Comparison of sequential and direct access files
    Definitions in the TQMS dictionary
    Dump of the parameters in the Header file
    Summary of Header file format
    Summary of one record in the Pointer file
    Dump of one logical record in the Data file
    Outline of how EXTRACT functions
    User/machine interface types
    Numeric representation for EXTRACT

LIST OF FIGURES

    Software tools for structure determination
    First TQMS control system
    A typical single microprocessor system
    Relationship between conventional and SLOPS programming
    SLOPS library entry format
    Data base structures
    Writing to the multi-dimensional data set
    Instrument description file format
    Use of the dictionary file
    Header file format
    Pointer file format
    Data file format
    Menu format for EXTRACT
    Example plot of intensity vs. pressure extraction
    Example plot of intensity vs. energy extraction
    Example plot of three dimensional data extraction
    Comparison of daughter and neutral spectra
    Entries from the knowledge base for ANEUT
    Neutral spectrum and analysis of 1,4-benzenediamine
    Neutral spectrum and analysis of bis-2-ethyl-hexyl adipate
    Characteristics used to match daughter spectra
    Example of the daughter spectrum matching algorithm
    Example of the EI spectrum matching algorithm

Chapter 1

Introduction

Introduction

In the fall of 1978 a new instrument was just coming online and being tested in Dr. Enke’s research laboratory. The triple quadrupole mass spectrometer (TQMS), designed by Yost and Enke (1-3), was completed and proved capable of generating daughter spectra (mass spectra of selected parent ions). The technique of mass spectrometry/mass spectrometry (MS/MS) was not new (4-6), but this new instrument was capable of unit mass resolution in both mass analyzers, and boasted a highly efficient collision chamber. It was a new instrument - a promising technique and the basis for many years of fascinating research.

In tandem quadrupole mass spectrometry (7), ions created in the source are selected by the first mass filter (quadrupole 1) and undergo a fragmenting collision with neutral molecules in the collision cell (quad 2, RF only: not mass filtering). The ionic products of this collision are then mass analyzed (by quad 3) and detected. By this process, a fragmentation spectrum from each ion in the normal mass spectrum can be obtained.

The TQMS instrument can be operated in several ways by using the mass filtering quadrupoles in one of three modes: 1) selecting one mass for transmission, 2) scanning a series of masses, or 3) allowing all masses to pass through the quadrupole (RF only mode). Table 1.1 shows a summary of the operating modes and the resulting scans.

The information contained in MS/MS data can be used in a variety of ways.
Daughter spectra are useful in mixture analysis and screening techniques (2,5,8-11) as well as structure determination problems (2,5,12-14). In this dissertation I will present the tools and techniques necessary to extract the information present in this type of multi-dimensional data. To help the reader appreciate how these pieces fit into the overall goal, I will present the structure determination scheme developed in our laboratory. This method, and associated tools, have been developed and refined over the years by several people (15-16).

Table 1.1 TQMS operational modes

Quad 1     Quad 2             Quad 3                             Description
scan       RF only*, no gas   RF only*                           Normal mass spectrum
RF only*   RF only*, no gas   scan                               Normal mass spectrum
fixed      RF only*, gas on   scan                               Daughter scan: spectrum of all daughter ions from a selected parent
scan       RF only*, gas on   fixed                              Parent scan: spectrum of all parents that fragment to form a specific daughter ion
scan       RF only*, gas on   scan at a fixed offset from Quad 1 Neutral loss/gain scan: spectrum of those parents that lose/gain a given mass during a collision
fixed      RF only*, gas on   fixed                              Single or multiple reaction monitoring

*RF only quadrupoles pass all masses (not mass filtering)

Structure determination by MS/MS

The basic premise behind our structure elucidation scheme is that if many small, "simple" parts or substructures of a sample are known, its structure can be determined. Each ion in the source results from some substructural feature of the original molecule, and the fragmentation (daughter) spectra of these ions are often indicative of these substructures. Other MS/MS information (parent scans, neutral loss/gain scans, neutral spectra, etc.) can also be used to identify structural fragments. The general approach is to make correlations between a known substructure and some subset of the complete MS/MS fragmentation map.
With a library of these correlations, an unknown MS/MS fragmentation map can be quickly analyzed for the indicated structural characteristics.

In the first implementation of this scheme, we will compare an unknown daughter spectrum against a library of daughter spectra. Closely matching daughter spectra are assumed to reflect structural similarities, specifically common substructures. The structures of the reference compounds will be compared to determine the substructures they have in common. Lists of substructure/spectrum correlations will be made in this way.

This process would be impractical without a group of synergistic software tools to help automate each step. Figure 1.1 shows in block form each of the major tools needed for a project of this magnitude.

[Figure 1.1: Software tools for structure determination. Block diagram; legible block labels include the intelligent controller, triple quad mass spectrometer, reference spectrum library, experimenter’s data base, spectrum matching, substructure searching, (sub)structure data base, molecular formula, structure generator and identified structure.]

An unknown sample enters the scheme diagrammed in Figure 1.1 as experimental data which is stored in the multi-dimensional data base (see Chapter 3). These unknown daughter spectra are compared to spectra in the reference data base (17-18) by matching algorithms (19-20, also see Chapter 5).
The substructures of the best matched daughter spectra are obtained (currently these substructures are manually entered, but algorithms to extract these substructural features from structures of the parent molecule are being developed). All resulting substructures (and information from other sources) will be used as input to GENOA (21-23) to determine all chemically possible structures that contain the identified substructures. The final step involves analyzing GENOA’s output structures for the major features and determining what experiments will further reduce the number of candidate structures.

Thesis outline

This thesis describes my work toward automating the data collection and analysis of a complex, multi-variate instrument. That this work involves three vastly different areas of computer science (real-time processing for control systems, data base management and expert systems for data analysis) underscores the diversity of techniques needed to take full advantage of today’s automated instruments.

This dissertation consists of five chapters, including this introduction. Each chapter is independent of the others (although Chapter 4 refers back to Chapter 3 for details of the data base) and includes introductory and concluding remarks.

Control system hardware and software are discussed in Chapter 2. This chapter describes the first and second phases of the control system and its operation. The design considerations and tradeoffs that went into each phase are discussed. Although these phases of the control system have been subsequently superseded, they provided an excellent foundation on which more sophisticated systems were implemented.

Chapter 3 describes a data base for multi-dimensional data. The first version of this data base software was developed jointly with researchers at Lawrence Livermore National Laboratory (24-25), but was modified and extended locally.
This data base software is currently running on three TQMS instruments, and has proven to be extremely reliable and invaluable for the storage of MS/MS data.

A special data retrieval program is discussed in Chapter 4. This program is able to extract any two dimensional plane of data from the multi-dimensional data stored in the data base. This is a convenient and powerful tool for trend analysis and allows the user to look at a matrix of data from several orthogonal axes.

The final chapter describes some of the chemical information available in MS/MS data, and presents several ways of extracting some of this information from the data. The concept of a neutral spectrum is introduced, its utility is explored, and a simple expert system for the analysis of these spectra is described. Also presented is a grouping algorithm designed to cluster daughter and neutral spectra so that the common structural features of their parent molecules can be obtained. The substructural information gained by using these techniques can be used to help determine the overall structure of unknown compounds.

Conclusions

In this dissertation, the process of automating the analysis of TQMS data is described, from the analog to digital converter that samples the ion intensity, through an expert system which samples the information present in the MS/MS data collected. Throughout this work, I have tried to show that comprehensive software tools can aid the chemist in such complex tasks as structure determination. Someday these tools will be integrated with the addition of an intelligent controller as shown in Figure 1.1. This controller will be able to direct the experiments performed by the instrument. The work that others and I have done in Dr. Enke’s lab (solid lines and boxes in Figure 1.1) sets an excellent foundation for these higher level systems.

Chapter 2

Instrument Control

Introduction

In the fall of 1978, the triple quadrupole mass spectrometer designed by Drs.
Richard Yost and Christie Enke (2) produced its first daughter spectra. This instrument, while designed to be controlled by computer, was initially operated manually. The early TQMS instrument used a ramp generator to sweep the mass selected by one of the quadrupoles and had a strip chart recorder for data output. To collect daughter spectra, the first quadrupole was manually set (with a potentiometer) to the mass of a parent ion (identified by an oscilloscope), the sweep generator was switched to the third quadrupole, and a scan was taken. The strip chart recording was then measured and a mass scale added. As this process indicates, the collection and analysis of each spectrum was a time consuming task.

The design of a general purpose, single or multiple microprocessor system for instrument control had been initiated in the fall of 1978 (26). One of the goals of this research was complete control of all instrumental parameters of the TQMS. We realized that this would not be accomplished overnight, and decided to implement computer control in three phases. The first phase would be a simple digital strip chart recorder, the second phase would be the implementation of the new microprocessor system hardware, and the final phase would be complete computer control of the instrument with a multiple microcomputer system.

Phase 1: Digital strip chart recorder

The first phase of the TQMS automation was the implementation of a digital strip chart recorder. This allowed a computer to generate the sweep signal for mass selection and to collect the ion intensities. A computer can then assign the mass values and display the data in several different formats.

Digital strip chart recorder: Hardware

The INTEL 8085 microprocessor chip was chosen to be the heart of the general purpose control computer. An 8085 microcomputer evaluation board, the SDK-85, was convenient for implementing the first phase of the TQMS automation.
The SDK-85 is a microcomputer with a monitor (in read only memory, ROM), a limited amount of program memory (random access memory, RAM), a keypad, an eight character display and an area for custom hardware. In the extra space provided on the SDK-85, I wire-wrapped interfaces to the instrument, a floppy disk drive and a keyboard/display unit along with some extra RAM and ROM (see Figure 2.1).

The terminal I constructed for this system included two video memories: a graphics memory which provided a low resolution display (256 pixels horizontal by 240 pixels vertical) and a text display which provided 16 lines of 64 characters. Both of these memories, a 9 inch monitor, a parallel keyboard and a disk drive with an intelligent controller were mounted in a terminal enclosure. This terminal was connected to a parallel port on the SDK-85 with a ribbon cable.

The instrument was interfaced with a single digital-to-analog converter (DAC) to control the mass selected by one of the quadrupoles, and an analog-to-digital converter (ADC) to measure the ion intensity. The ion current was converted to a voltage and amplified by a Keithley model 18000 programmable current amplifier.

The disk drive (model 270) and intelligent controller (model 1070) were manufactured by PERSCI. The disk controller had an INTEL 8080 microprocessor which handled all the disk functions (seek, read, write, etc.) and maintained a file structure. The use of this controller saved much development time, since all the disk related functions were already done. I wrote a program for the lab PDP-11 minicomputer (PIPERSCI) to read and write PERSCI formatted floppy disks.
With PIPERSCI, programs are created on the PDP-11 (with good editors and cross-compilers) and carried to the microcomputer for use. Data stored on a floppy by the microcomputer are transferred to the PDP-11 for post-processing and display.

[Figure 2.1: First TQMS control system. Block diagram showing the SDK-85 (8085 CPU, RAM, ROM, A/D fed from the current amplifier, D/A driving the quadrupole controller) connected to the terminal (text video, graphics board, keyboard and disk controller).]

Digital strip chart recorder: Software

Software for this control system was written entirely in assembly language. The initial versions of the code were assembled and typed in by hand; later versions were cross-compiled on the PDP-11 and transferred to the microcomputer on floppy disk. A bootstrap for the microcomputer disk was burned into a ROM (after interfacing a PROM programmer to the lab PDP-11 and writing a program to burn the PROMs).

The final version of the control software for this phase consisted of only a few commands (summarized in Table 2.1). These commands consist of only the basics: load in a new program, set the number of data points to average, set the threshold level, scan a quadrupole, store collected data to disk, and display data. This system was not meant to be the final word in mass spectrometry control systems, but was designed to replace the strip chart recording method with a digital method and gain the experience and tools required to design and implement a more complete system. With this system, scans could be initiated and data could be collected and stored on disk.
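As a rough illustration of the scan operation described above, the following Python sketch steps a mass setting from a low to a high value, averages several intensity readings at each step, and keeps only points at or above a threshold. It is not the original 8085 assembly code. The names `scan` and `FakeInstrument`, the callback interface, and the choice of "greater than or equal" for the threshold test are all assumptions made for this example.

```python
from statistics import mean


def scan(set_mass, read_adc, low_mass, high_mass, increment, n_average, threshold):
    """Step from low_mass to high_mass, averaging n_average readings per step.

    set_mass stands in for writing the DAC that drives a quadrupole and
    read_adc for sampling the ion intensity (both hypothetical callbacks).
    Only points whose averaged intensity reaches the threshold are kept.
    """
    points = []
    steps = int(round((high_mass - low_mass) / increment))
    for i in range(steps + 1):
        mass = low_mass + i * increment
        set_mass(mass)
        intensity = mean(read_adc() for _ in range(n_average))
        if intensity >= threshold:  # assumed meaning of "threshold for saving data"
            points.append((mass, intensity))
    return points


class FakeInstrument:
    """Hypothetical stand-in for the mass spectrometer, used only for the demo."""

    def __init__(self, peaks):
        self.peaks = peaks  # nominal mass -> intensity
        self.mass = 0.0

    def set_mass(self, mass):
        self.mass = mass

    def read_adc(self):
        return self.peaks.get(round(self.mass), 0.0)


instrument = FakeInstrument({41: 100.0, 43: 60.0, 57: 5.0})
spectrum = scan(instrument.set_mass, instrument.read_adc,
                low_mass=40, high_mass=60, increment=1.0,
                n_average=4, threshold=10.0)
print(spectrum)
```

In this sketch the threshold discards the weak peak at mass 57, which mirrors the purpose of a threshold for saving data: fewer points need to be written to disk.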
Table 2.1 Summary of commands for the digital strip chart software

CHANGE   query the user for the following:
             low mass
             high mass
             mass increment
             number of points to average
             threshold for saving data
FSCAN    quickly scan from low mass to high mass to observe the peak on an oscilloscope
SCAN     scan from low to high, collecting data
DISP     display the data on the graphics display
WRITE    write the data to the disk

Post processing and display of processed data were accomplished on a PDP-11 computer using programs written by Phil Hoffman (18,27).

Phase 2: Single micro control system

The second phase of instrument control proceeded on two fronts, the development of the control hardware and the software systems. The design and development of microprocessor hardware modules was done primarily by Bruce Newcome (26,28). My role in the development of the hardware was as the software consultant. Together, we would determine whether a hardware design ‘feature’ would make programming the hardware easier, or conversely make the software much more difficult to code. By designing both the hardware and software together, we were able to trade off functions between hardware and software for the most efficient operation of the resulting system. The hardware will be briefly described here to enable the reader to gain appreciation for the control software.

Single micro control system: Hardware

As can be seen in Figure 2.2, the Newcome microprocessor has bus connections on two levels; each module is attached to a ‘mother-board’, which is in turn plugged into a backplane. Each ‘Bruce-bus’ microprocessor module performs one specific function. A set of these modules is joined to comprise a microcomputer system.

[Figure 2.2: A typical single microprocessor system.]

The ‘standard’ microcomputer board consists of an 8085 microprocessor board, a memory board capable of holding either RAM or ROM, two serial ports and an interrupt controller.
This hardware modularity allows a great deal of flexibility in the design of a control microcomputer system.

Specific modules for the triple quadrupole mass spectrometer control consisted of two digital-to-analog converters (DACs), one analog-to-digital converter (ADC), a controller for a Keithley programmable current amplifier, two video displays and a keyboard. Each of the DACs controlled one quadrupole power supply, allowing the software to select masses in both quadrupoles. With computer control of the amplification range of the Keithley amplifier, the microcomputer was able to sample the input signal and adjust the gain for a 10s dynamic range. The graphics and alphanumeric video memories were split to separate display units, allowing simultaneous display of data and of the collection parameters. The terminal constructed for the first phase was dismantled and the disk drive was built into the TQMS console.

Single micro control system: Software

The development of microcomputer systems from the chip level necessitated the development of software for testing both hardware and software. At that time, there were no affordable, commercially available software packages that could be adapted to our needs. The first versions of the home-built microcomputers were debugged with highly specific software; the first operating processor/memory combination did nothing but flash a light on the CPU board. As the hardware matured, so did the software.

A software monitor (called "Dr. Memory"), able to operate on any standard CPU board, was developed and refined, and used as the basis of more complex software systems. This monitor is the equivalent of a simple operating system, complete with device drivers for the commonly used peripherals (i.e. the terminal and disk). This frees the user to program the higher level functions without concern for the details of the hardware registers.
Although this monitor eased the hardware dependencies of the applications software, a more complete programming system and command interpreter was needed. A library of commands and a programming system (the structured library oriented programming system, SLOPS) were developed as a basis for specific control applications. Finally, control software for the triple quadrupole mass spectrometer was implemented using the tools provided by both Dr. Memory and SLOPS.

In practice, all three levels of programming (monitor, library and control software) were developed concurrently, but they will be described separately below. Today, one would choose from a variety of convenient and inexpensive hardware and software modules that are commercially available, but at this relatively early time in microprocessor applications, we were on our own. Figure 2.3 shows the relationship between currently available software modules and Dr. Memory, SLOPS and the applications programs.

[Figure 2.3: Relationship between conventional and SLOPS programming. Layers shown for conventional programming: applications programs, subroutine libraries, utilities, operating system, device drivers. Layers shown for SLOPS based programming: applications programs, SLOPS, Dr. Memory.]

Dr. Memory’s monitor

A monitor for the home-built microcomputer systems must provide the following capabilities: handle all terminal I/O, allow display/change of memory locations, execute code starting at any location, stop and restart execution of programs and load software. Dr. Memory was designed with all these features in mind, and the code fit in only 2 Kbytes of ROM, complete with help screens, communication software and general purpose I/O subroutines.

Dr. Memory is a general purpose monitor used for both hardware and software debugging. The majority of the available commands (listed briefly in Table 2.2 and more fully in Appendix 1) are for examining and changing memory locations, and starting programs. New pieces of hardware are easily checked out with this tool. The new interface hardware is connected into the system, the power applied, and the operator can use Dr. Memory to access the registers of the new device and manually exercise and test it.

Table 2.2 Dr. Memory’s Monitor (V2.6 9/3/80)

Commands   Description
Break      Breakpoint - stops program execution
           Cold restart - used to clean up user stack
O          Octal entry and display format
H          Hexadecimal entry and display
?          Prints a summary of this text
a/         Open location a for modification
a\         Opens two bytes for modification
n          Modify open location, close it
n          Modify, close, open next location
nA         Modify, close, open previous location
Sr         Open register
aG         Start user program at location a
P          Proceed (at saved PC)
S          Start the second EPROM (SLOPS)
T          Talk to the PDP-11
L          Download from the PDP 11/40

Note: a’s and n’s are optional, defaulting to the last value.

Restarts   Hex   Description
RST 0      C7    Cold restart - sets default stack, etc.
RST 1      CF    A
RST 2      D7    BC
RST 3      DF    DE
RST 4      E7    HL
RST 5      EF    PC
RST 6      F7    SP, flags printed
RST 7      FF    Breakpoint (warm restart)

RST 1 through RST 5 are diagnostics that print the contents of the indicated registers.

Useful subroutines
Crlf       Outputs a carriage return/line feed combination
Downld     Downloads data from the PDP 11
Efclr      Clears an event flag
Getnum     Gets a number input
Gettt      Gets a character from the USARTs
Lights     Sends a predefined light pattern out
Lite       Sends a specified light pattern out
Nulljb     Boredom routine
Print      Prints an ASCII string
Putnum     Outputs a number
Puttt      Writes a character to a USART
Rhlr       Rotates HL right

Software (8085 assembly language) entered into the lab PDP-11 computer is cross-compiled and downloaded to the microcomputer. Dr. Memory has two commands that facilitate this process: TALK and DOWNLOAD. TALK is a routine to connect the user’s terminal to the PDP-11 through a second serial port. The DOWNLOAD command instructs the microcomputer to accept a program from the PDP-11, and load it directly into memory. Dr. Memory is used to start execution of these new routines, and stop execution at any time to examine memory or registers.

This monitor includes a series of software debugging tools to aid the assembly language programmer. The 8085 restart commands (RST 0 to RST 7) are software interrupts, used in the monitor as debugging breakpoints and display routines. Restarts 1-6 print the contents of different registers and restart 0 initiates a cold restart (as if the power were just applied). The breakpoint restart (RST 7) is the most powerful restart for the programmer, returning control to Dr. Memory. The complete context of the user’s program is saved, and the programmer is able to examine and change registers and memory locations. When control is returned to the program under test, the program’s context is restored and program execution continues as if the breakpoint never occurred.

Dr. Memory’s monitor internal structure

Internally, Dr. Memory consists of data tables, interrupt handlers, subroutines and a command processor. All terminal I/O is interrupt driven, so commands can be entered and buffered while the microcomputer is performing another task. These commands are executed when the current command finishes. A break character from the console terminal acts as a special command, caught by the interrupt handler, which causes the microcomputer to stop executing the current program, save the program context and return control to Dr. Memory.

The command processor is a simple program that accepts input from the terminal and checks it against the list of available commands. Valid commands are executed while undefined input returns a "Say what?" response.

A special subroutine, NULLJB, is called whenever the microcomputer is not executing a command or when terminal input is pending.
This subroutine continually checks for an ‘event’ to occur (a character input from either serial port, a timer to go off, or any other interrupt process), at which time control is returned to the calling program if the program was waiting for that event. While NULLJB is waiting for an event to occur, it flashes the lights on the processor board in a characteristic pattern. By watching the microcomputer, one can instantly tell if a program is running or waiting for input.

Structured Library Oriented Programming System

The structured library oriented programming system (SLOPS) defines a structure for both commands and subroutines (which are identical in SLOPS’ view of the world). The base SLOPS system fits into 2 Kbytes of ROM, and consists of a command processor (more elaborate than Dr. Memory’s command processor) and a series of subroutines and/or commands. The basic premise for SLOPS is to create a library of entries (either subroutines or commands) which can be built upon to create new, more elaborate entries. These entries are chained together to form a linked list which can be quickly scanned.

SLOPS is conceptually similar to the FORTH programming language (29), in that both are threaded, interpreting compilers. When either programming system is started, it is in interactive mode. In this mode, commands entered are immediately executed. A new command definition, when entered, is "compiled", stored in memory, and ready for execution. This compilation phase searches the library (or dictionary, as it is called in FORTH) for each word in the definition and replaces it with the address of that subroutine, or assembles the instruction if necessary. This process of combining previously defined modules to form more complex functions is the basis of threaded programming.

The major difference between SLOPS, FORTH and CONVERS (a FORTH style language developed by Bonner Denton, et al. at the University of Arizona) (30-31) is the way arguments are handled.
Both FORTH and CONVERS are post-fix languages (similar to Hewlett-Packard calculators), while SLOPS uses pre-fix notation. In FORTH, arguments for the subroutine or function are assumed to be pushed onto an internal stack followed by the function (i.e. 2 2 +), while SLOPS functions assume arguments will follow the function name (i.e. + 2 2). Both of these notations have merits, but pre-fix notation seemed clearer, especially for single argument functions (i.e. MASS 41).

At the time we needed to implement the control software for the TQMS instrument, FORTH was only available for PDP-11 minicomputers. CONVERS was developed at the University of Arizona as a FORTH style language for the INTEL 8080 series microcomputers, and was implemented in our lab by Eric Carlson (32) for an early version of the multi-microcomputer system used to control a stopped-flow spectrophotometer. Instead of porting CONVERS to the new microcomputer hardware, we decided to use the experience and concepts gained from the development of CONVERS to implement a new, hopefully better, programming system.

SLOPS internal structure

Each library entry consists of a header flag, the name of the entry, a link to the next entry, and the entry itself. The header flag byte contains the length of the entry's name and a two bit flag describing the type of entry. Only three types of entries are currently used. Figure 2.4 shows each of the entry formats. Two of the formats have variable length entries; hence they have link fields pointing to the next entry in the library. The other two types of entries have fixed lengths and do not require link fields, saving a few bytes of RAM per entry.

The subroutine or command library entry is the most common type of entry. The code for the subroutine starts immediately after the link field and can be as long as the available RAM. The macro library entry is presently unused. The label and assembler library entries are used for the built in assembler/linker.
The data field of the label entry is two bytes long, and can be either a value or an address (absolute or relative). The data field for the assembler entry is one byte long, and is the microprocessor operational code for this mnemonic.

Figure 2.4 SLOPS library entry format

    Entry type    Entry format
    Subroutine    flag | label | link | subroutine
    Macro         flag | label | link | macro
    Label         flag | label | label's value
    Assembler     flag | label | op code

    Flag byte format: two type bits and the length of the label
    0 0   Subroutine entry
    0 1   Label entry
    1 0   Macro entry
    1 1   Assembler entry

The command processing routine of SLOPS prompts the user for one or more lines of input, and breaks the input stream down into "words". These "words" are defined as alphanumeric strings delimited by non-alpha characters (spaces, commas, quotes, etc.). The input words are kept in a forward-chaining stack, and the first word in a line is assumed to be a command. The command processor searches the library for a match, and if a subroutine/command is found, it is executed. If the executing procedure needs input parameters, it gets them from the word stack; if no words are available, SLOPS prompts the user for the required input. Any executing task is able to call any subroutine in the system, recursively if necessary. When any procedure is finished, it simply RETURNs to the program that called it. The top level command, when done, returns to SLOPS. If more words are in the word stack, they are then executed in turn until the stack is empty, and SLOPS prompts the user for more commands.

The base SLOPS system contains a number of useful subroutines as well as the command kernel. Table 2.3 contains a list of commands and brief descriptions of each subroutine or command, and Appendix 1 is an abbreviated form of the user's manuals for both Dr. Memory and SLOPS.
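The command processing loop described above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the 8085 assembly implementation: the library is a dictionary rather than a linked list, the names Interpreter, next_word and next_number are hypothetical, ADD and MASS are invented example commands, and the reply to an unknown word is assumed.

```python
import re
from collections import deque


def break_into_words(line):
    """Split a line into alphanumeric words; any other character is a delimiter."""
    return deque(re.findall(r"[A-Za-z0-9]+", line))


class Interpreter:
    def __init__(self, prompt=input):
        self.library = {}       # name -> routine (stand-in for the linked library)
        self.words = deque()    # word stack, consumed front to back
        self.prompt = prompt    # called when a routine needs input that is missing
        self.output = []

    def define(self, name, routine):
        self.library[name.upper()] = routine

    def next_word(self, what):
        # Arguments come from the word stack; prompt the user only if it is empty.
        while not self.words:
            self.words = break_into_words(self.prompt(what + "? "))
        return self.words.popleft()

    def next_number(self, what):
        return int(self.next_word(what))

    def run_line(self, line):
        self.words = break_into_words(line)
        while self.words:                       # execute words in turn until empty
            name = self.words.popleft().upper()
            routine = self.library.get(name)
            if routine is None:
                self.output.append(name + "?")  # assumed reply to an unknown word
                self.words.clear()
            else:
                routine(self)                   # the routine simply returns when done


def add(interp):
    # Pre-fix notation: the arguments follow the command name (ADD 2 3).
    interp.output.append(interp.next_number("first") + interp.next_number("second"))


def mass(interp):
    interp.output.append("mass set to %d" % interp.next_number("mass"))


slops = Interpreter(prompt=lambda text: "7")    # canned reply instead of a terminal
slops.define("ADD", add)
slops.define("MASS", mass)
slops.run_line("ADD 2 3, MASS 41")              # two commands on one line
slops.run_line("ADD 2")                         # second argument is prompted for
```

In this sketch the second call finds the word stack empty after reading the first argument, so the missing argument is requested through the prompt, which mirrors the behavior described for SLOPS.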
Table 2.3 Basic SLOPS subroutines and commands

    Subroutines
    Addr      Returns code address of library entry
    Ascii     Converts binary to and from ASCII
    Blank     Clears the screen
    Brkdwn    Breaks input line into words
    Check     Compares word with library entry
    Cvtext    Converts a number to ASCII string
    Cvtint    Converts ASCII string to binary number
    Dcmp      Double compare
    Delay     Software time delay
    Link      Links to next library entry
    Number    Get number from word stack
    Search    Search library for a match
    Ttyin     Gets a line of input
    Word      Get word from word stack
    (also see subroutines listed under Dr. Memory)

    Arithmetic subroutines
    Ddiv      Double divide
    Div       Divide
    Dmult     Double multiply
    Dsub      Double subtract
    Mult      Multiply

    Commands
    Convert   Converts a number to any base
    Downld    Loads from the PDP 11 (See Dr. Memory)
    Drmem     Puts Dr. Memory in control
    Talk      Talk to the PDP 11 (See Dr. Memory)

As can be seen from this list, about half of the subroutines are used by SLOPS to keep track of library entries, etc. The other half are general purpose subroutines (double multiply, convert binary to/from ASCII, etc.). An extended SLOPS system includes an 8085 assembler and many more general purpose subroutines.

The extended SLOPS system is a powerful tool for software development. The user is able to create and test small subroutines and build more complex systems using these building blocks. Execution speed of the resulting commands is excellent due to the extensive use of assembly language and the low overhead of the command processor. When a library of low level commands is built, this system behaves like a higher level programming language, allowing the programmer to use structured programming techniques.

Mass spectrometer control software

SLOPS provided a good foundation for control software for the triple quadrupole mass spectrometer. The new control hardware featured two video displays, one for graphics and another for status display and user interaction.
The graphics screen could be logically divided into two halves termed the upper and lower displays. These displays were equivalent, but could be programmed independently. The status display also had two 'halves' to control each half of the graphics display. The status display, shown in Table 2.4, displayed the current status of the instrument, the mass range for each quadrupole to scan and the threshold for gathering and storing intensity data.

Table 2.4 TQMS status display

                      UPPER       LOWER
    Mass:  Quad 1     xxx-xxx     xxx-xxx
           Quad 3     xxx-xxx     xxx-xxx
    # pts to avg      xxxxx       xxxxx
    threshold         xxx.x:-x    xxx.x:-x
    min/max range     -x/-x       -x/-x

    Quad 1      Quad 3      Intensity
    DC:xxx.x    RF:xxx.x    xxx.x:-x

The two halves of each screen effectively allow two separate experiments to be run simultaneously. This feature, as well as much improved graphics and the ability to change the gain on the amplifier during data collection, made this control software much more convenient and versatile than the 'digital strip chart' of phase 1.

The commands available to the operator of this system are listed in Table 2.5. The control software allows four sets of parameters to be stored (with the SAVE command) and retrieved (with the GET command). This allows the operator to set up several different experiments and quickly switch between them.

Table 2.5 TQMS control system commands

    INI      Initializes the system, clearing the displays, setting default
             mass values, etc.
    GET      Gets a set of parameters from one of four stored setting areas.
    SAVE     Saves the current parameter table in a bank of parameter tables,
             for future recall.
    PARAM    Sets general parameters. It acts in a 'single character input'
             mode for the parameter to change, i.e. 'Q' not 'QUAD' for
             changing a Quad's mass range. The parameters for either graph,
             the UPPER or the LOWER, must be changed individually, and the
             commands U and L identify which graph is being changed. The
             following commands are currently supported:
                 Q n low-high    Changes Quad n's mass range
                 T int range     Changes the threshold
                 R min-max       Changes the min and max range
                 A n             Changes the # of points to average
                 G low,n         Changes the low mass on the graph
                 F               Changes flags, each flag queried
                 M text          Puts message on the status display
                 H text          Puts a header (title) on the graph
                 ?               Prints a summary of the options
                 ^Z              Exits (control Z)
    MANUAL   Implements 'manual' control. Using the keypad of the terminal,
             the keys 1, 2 and 3 are for quad 1; 4, 5 and 6 are for quad 3;
             and keys 7, 8 and 9 are for the range. Keys 1, 4 and 7 decrease
             the current value by 1, keys 2, 5 and 8 set the current value
             into the parameter table, and keys 3, 6 and 9 increase the
             current value by 1.
    FSCAN    Increments the mass, checks for a typed character, and if none
             entered, loops about.
    SCAN     Scans the set mass ranges, collects data, updates the screens,
             records the data and loops until a character is typed.
    WDATA    Writes the data from RAM to a disk file.

Setting parameters is accomplished by using the PARAM command. The parameter changes are immediately displayed on the status and/or graphics displays. The MANUAL command allows the operator to step the quadrupole masses by single DAC units to accurately find the maximum of a peak. For example, when set up for a daughter scan, the MANUAL command allows the operator to tune the first quadrupole mass to the maximum intensity (which might not be an integral mass due to calibration inaccuracies).

Two commands, FSCAN and SCAN, sweep one or both (as defined on the status display) quadrupole masses from the preset minima to the maxima. The FSCAN command sweeps quickly, allowing the spectra to be displayed on an oscilloscope. The SCAN command is necessarily slower since data are being collected and averaged. Each of these commands sweeps the entire specified range of one or both displays, and restarts the sweep unless told to stop by any keystroke on the keyboard.
One last command, WDATA, writes the data collected by a SCAN command to the disk for post processing by the lab PDP-11 computer.

This control software is written as a series of modular subroutines allowing for independent testing of each module. Appendix I briefly describes the library of subroutines created for and used by this system. Enhancements to the control software, such as special purpose commands, are easily constructed using these routines as a foundation.

Phase 3: Current mass spectrometer control system

The single microcomputer system described above was a temporary solution to the control problem. When that system was implemented, we were aware that the system could not be operated at the high speeds we desired. The third and current phase of instrument control was accomplished with a multiple microprocessor system, designed and implemented to allow several tasks to run concurrently. A full description of the control hardware can be found elsewhere (26,33).

Commercial software tools and systems were being developed and becoming available during the implementation of the SLOPS based system. These tools were now sufficiently powerful, flexible and affordable that we decided to implement the third phase of software control using them. The control software for the multiple microprocessor system was written in the FORTH programming language, now available for microprocessors. FORTH is a commercially available high level programming language that is well suited to control systems. The experience gained from the SLOPS based control system was invaluable in the design and implementation of the current FORTH based software. Complete descriptions of the current control software can be found in Carl Myerholtz's thesis with modifications by Adam Shubert and Mike Kristo (11,34-39).

Conclusions

The triple quadrupole mass spectrometer is a complex and experimental instrument, and automation could not occur in a single step.
The three-phase process described in this chapter worked very well, allowing us to become familiar with the advantages and disadvantages of various techniques. The transition from a completely manually operated instrument to a completely computer controlled instrument was arduous, both in implementing the hardware and software and in convincing the operators that a computer can reliably operate a complex instrument.

Chapter 3

Multi-dimensional Instrument Data Base

Introduction

The evolution of GC/MS brought with it the development of computer data systems designed to handle the additional dimension of time as well as the mass and intensity axes. However, even three dimensions are inadequate to cope with the measurement capabilities of MS/MS instruments and many other modern computer-controlled systems. In the case of a totally computer-controlled MS/MS, a large number of variables can be scanned, either singly or jointly. Each such scan will produce one plane of information in a multi-dimensional data base. As an example, the axial energy can be scanned at fixed masses or the mass (in either quad 1 or quad 3) can be scanned with incremental changes in axial energy. Other variables (dimensions) include direct inlet probe temperature, collision gas pressure, ionization voltage, chemical ionization gas pressure, lens voltages (both in the source and between quadrupoles), collision gas type and others. Thus, a new, more versatile data base management system was required.

Fortunately, the storage of all these data has become affordable using computers with large capacity disks and magnetic tapes. Now the analyst can store all data as they are acquired on disk, and subsequently put them on magnetic tape for archival storage. Once these data are stored, the problem becomes one of rapidly and flexibly accessing the data.
An additional problem in multi-dimensional instrumentation is the need to extract any plane of information in the data base, even if that plane does not correspond to the types of scans that produced the data base. The concerns of data retrieval are explored in the next chapter.

The chemical literature contains many examples of systems for creating and searching libraries (40-49). There are also suggestions for using pattern recognition (50-55) and artificial intelligence or heuristic (21,56-59) means of interpreting data. Little is said, though, about rapid, versatile and efficient initial storage of raw data in real time. We have developed a system that provides these storage characteristics, provides for storage of multi-dimensional data, and incorporates mechanisms for rapidly accessing the data.

Scientific versus business data bases

In many ways, both business and scientific data bases are similar, but there are significant differences (60-74). Business data bases must be able to handle text ("warehouse X"), integers (52 widgets left in stock), and floating point numbers ($42,000) (60-72). In addition to these requirements, scientific data bases may also have to accommodate bit strings, vectors, arrays, graphs and multi-dimensional data (63-74).

Both types of systems must have query languages, or methods to retrieve the data stored in the data base. Business systems usually require a small set of specific answers (all employees with salaries greater than $30,000). Scientific data base retrieval languages or programs must be able to handle the uncertainty (error) present in many scientific measurements (68). Queries must be made in the form "retrieve all energies between -20 volts and -18 volts". Both data base query systems must be able to join several queries ("names of employees older than 62" AND "been with the company longer than 15 years"), and display the results as lists, tables or graphs.
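A range query of the kind quoted above, joined with a second condition, can be sketched as follows. This is a minimal Python illustration and not the retrieval program of this work; the helper names (between, all_of, retrieve), the field names and the scan values are all hypothetical.

```python
def between(field, low, high):
    """Predicate: the record's field lies in the closed interval [low, high]."""
    return lambda record: low <= record[field] <= high


def all_of(*predicates):
    """Join several queries with AND."""
    return lambda record: all(p(record) for p in predicates)


def retrieve(records, predicate):
    return [r for r in records if predicate(r)]


# Hypothetical scan records; the values are made up for illustration.
scans = [
    {"scan": 1, "axial_energy": -21.0, "quad1_mass": 91.0},
    {"scan": 2, "axial_energy": -19.1, "quad1_mass": 91.1},
    {"scan": 3, "axial_energy": -18.4, "quad1_mass": 105.0},
    {"scan": 4, "axial_energy": -17.5, "quad1_mass": 90.9},
]

# "retrieve all energies between -20 volts and -18 volts"
by_energy = retrieve(scans, between("axial_energy", -20.0, -18.0))

# The same query joined (AND) with a mass window around a nominal mass of 91
query = all_of(between("axial_energy", -20.0, -18.0),
               between("quad1_mass", 90.5, 91.5))
hits = retrieve(scans, query)
```

Expressing the mass condition as a window rather than an exact match is one simple way to tolerate the measurement error mentioned above.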
The data formats of business data bases must be defined before use by the data base administrator using a data definition language, and the data are often entered by keypunch operators. The structure of the data base must be carefully considered and analyzed, for the resulting data base may be in use for many years. This data base definition process can take weeks or months, and has spawned an industry that provides data base definition templates for the popular data base programs. Scientific data bases, especially those created during an experiment, must be created quickly (possibly automatically) and be able to accept data from the ongoing experiment in real time. A scientist does not have the time to spend weeks organizing a data base for one experiment, and then redefine the data base for the next! A scientific data base must be easily adapted to new and rapidly changing problems.

Survey of data base structures

There are essentially four major structures for data bases: flat file, hierarchical, network and relational (60-61,72). Of these, the flat file is the simplest, and is used in the majority of the "spread sheet" programs for micro- and mini-computers (75-76). These flat files are two-dimensional tables consisting of columns for each variable or attribute, and rows for each observation or instance. If the data to be recorded are amenable to this form with minimum duplication of values, this format is efficient and can represent a natural working order of data from right to left and top to bottom. Example 1 in Figure 3.1 illustrates a simple flat file data base.

Hierarchical data bases provide 1 to N links between records, such that one root record may have one or more "child" records, each of which may have one or more "grandchild" records, and so forth. An illustration of a hierarchical data base can be seen in example 2 in Figure 3.1, where Prof. Smith teaches 3 classes and Prof. Jones teaches two others.
By providing the links between the records, redundant information (Prof. Smith teaches math 101; Prof. Smith teaches math 103, etc.) is eliminated.

Network model data bases provide for N to M linkages between records, and have all the features and advantages of hierarchical data bases. In addition, they provide more flexible ways to interconnect records, allowing almost all redundant information to be eliminated. Example 3 in Figure 3.1 shows a simple network data base. As is expected, this increased flexibility comes at a price; network model data bases are more complex than hierarchical models, and often require special training for data base administrators.

Relational data bases make a radical departure from the linkage schemes used in both hierarchical and network models. In the relational model, all data are stored in flat file format, and "relations" are established between various tables. In this manner, implicit links are made between relations as opposed to the explicit links needed in hierarchical and network models. The data base system creates these links as needed by "joining" tables together on the fly. Example 4 in Figure 3.1 shows an example of a simple relational data base. This type of data base is easy to maintain (the flat files are conceptually simpler), but requires more computational power to join relations for all but the simplest queries.

Figure 3.1 Data base structures

    Example 1: flat file
        Prof. Smith    math 101    room 42
        Prof. Jones    math 413    room 123

    Example 2: hierarchical model
        Prof. Smith:   math 101, math 103, math 142
        Prof. Jones:   math 413, math 417

    Example 3: network model
        Professors (Smith, Adams), courses (math 101, math 103, physics 101)
        and students (Miller, Johnson, Webber) joined by explicit links

    Example 4: relational model
        student Miller     math 101         Prof. Smith    math 101
        student Miller     physics 101      Prof. Jones    math 413
        student Johnson    math 101         Prof. Adams    physics 101
        student Webber     math 103
        student Webber     physics 101
        (the course column is the relational link between the two tables)
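The cost of joining relations can be made concrete with a small sketch. The Python fragment below is an illustration only: the two tables are hypothetical (they follow the style of Figure 3.1 but are not copied from it), and students_of is an invented helper that joins the tables on the course column.

```python
from collections import defaultdict

# Hypothetical flat tables; each row is one tuple of a relation.
teaches = [
    ("Prof. Smith", "math 101"),
    ("Prof. Jones", "math 413"),
    ("Prof. Jones", "math 417"),
    ("Prof. Adams", "physics 101"),
]
enrolled = [
    ("Miller", "math 101"),
    ("Johnson", "math 413"),
    ("Webber", "physics 101"),
    ("Webber", "math 417"),
    ("Miller", "math 417"),
]


def students_of(professor, teaches, enrolled):
    """Join the two tables on the course column and list the students."""
    # Build an index on the join column so each course is looked up directly.
    by_course = defaultdict(list)
    for student, course in enrolled:
        by_course[course].append(student)

    students = set()
    for prof, course in teaches:
        if prof == professor:
            students.update(by_course[course])
    return sorted(students)


jones_students = students_of("Prof. Jones", teaches, enrolled)
```

No link between a professor and a student is stored anywhere; it exists only while the join is computed, which is why this model trades storage simplicity for computation at query time.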
For example, from Figure 3.1, a query "list all students of Prof. Jones" requires the course number from the two tables to be joined, and only students in one of Prof. Jones' classes to be listed.

Data base capabilities

We have included in our system several capabilities that have been either overlooked or not necessary in past data base management implementations for scientific instrumentation (see Table 3.1). These extra capabilities of the multi-dimensional data base system are essential for use with our triple quadrupole mass spectrometers. There is a large degree of interdependence among the TQMS parameters, and the effects of these dependencies are not yet completely understood. Storage of all instrumental parameters along with the data is required if we are to fully understand these parameters. To study these interdependencies effectively, the operator may vary several parameters incrementally to obtain a dataset with multiple dimensions of data. These data can later be searched in a variety of ways to study the effects of individual instrumental parameters.

Table 3.1 Capabilities of the multi-dimensional data base

    1) The ability to store all instrument parameters
    2) The ability to store a variety of changing parameters
    3) The ability to store any X/Y data pairs (such as mass/intensity
       and energy/intensity pairs)
    4) The ability to add comments before, during or after the experiment
    5) One compound or experiment for each dataset

For many research applications, complete mass spectra or daughter spectra are not required. For an energy dependence study, the operator may want to vary the axial energy for a certain parent/daughter fragmentation. This data base is able to handle these data (intensity vs. axial energy) as well as the more usual mass spectral data (intensity vs. mass).
The additional capability of allowing the operator to enter comments at any time during the experiment allows the dataset to be used as a notebook, storing information about that experiment with the experimental data.

An example of multi-dimensional data

The control systems for the triple quadrupole mass spectrometers at MSU have the ability to devise "methods" which allow the rapid collection of multiple dimensions of data. This is useful for characterizing both the instrument and a chemical sample. A method to collect data for a study of axial energy and collision gas pressure would follow the outline presented in Table 3.2.

Table 3.2 Method to collect 5 dimensions of data

    1) LOOP: collision gas pressure: from low to high.
    2)   LOOP: axial energy: from low to high.
    3)     LOOP: quad 1 mass: set to parent in the EI spectrum
    4)       scan quad 3.
    5)     continue to the next parent mass
    6)   continue to the next energy
    7) continue to the next pressure

This method creates a five dimensional dataset: collision pressure, axial energy, quad 1 mass, quad 3 mass and intensity. This dataset contains a wealth of information about the compound, and the data are collected automatically (except for setting the collision gas pressure). The operator is then able to search through this dataset and extract a plane of information, even if that plane wasn't specifically scanned. For example, with this dataset, we can extract data to produce a plot of the intensity vs. axial energy for a certain parent/daughter pair at a specific collision pressure. Details of data retrieval and extraction from a dataset are presented in the next chapter (24-25,77).

A programmer's view of the data base

The data base definition is complete when the variables to be studied are defined. Since many experimental parameters will not change, a hierarchical model was chosen for this application.
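The method of Table 3.2 and the later extraction of an unscanned plane can be sketched together. The Python fragment below is a hedged illustration, not the instrument's method language or the retrieval program: collect, extract_plane, FIELDS and fake_detector are hypothetical names, the detector response is invented, and exact matching on the fixed variables is a simplification that works here only because the example values are integers.

```python
FIELDS = ("pressure", "energy", "quad1", "quad3", "intensity")


def collect(pressures, energies, parents, daughters, measure):
    """Nested loops of Table 3.2: pressure, axial energy, parent mass, quad 3 scan."""
    dataset = []
    for pressure in pressures:
        for energy in energies:
            for quad1 in parents:
                for quad3 in daughters:          # the quad 3 scan
                    intensity = measure(pressure, energy, quad1, quad3)
                    dataset.append((pressure, energy, quad1, quad3, intensity))
    return dataset


def extract_plane(dataset, x, y, fixed):
    """Return (x, y) pairs from the rows whose other variables match `fixed`."""
    xi, yi = FIELDS.index(x), FIELDS.index(y)
    held = [(FIELDS.index(name), value) for name, value in fixed.items()]
    return [(row[xi], row[yi]) for row in dataset
            if all(row[i] == value for i, value in held)]


def fake_detector(pressure, energy, quad1, quad3):
    # Invented response, used only so the example produces numbers.
    return 100 * pressure + abs(energy)


data = collect([1, 2], [-20, -10], [91, 105], [65, 77], fake_detector)

# Intensity vs. axial energy for one parent/daughter pair at one pressure,
# a plane that no single scan in the method produced directly.
plane = extract_plane(data, "energy", "intensity",
                      {"pressure": 2, "quad1": 91, "quad3": 65})
```

With real measurements the fixed variables would be matched within a tolerance rather than by equality.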
At the top level or root of the hierarchy are the parameters that do not change during the course of the experiment. A second level was implemented in order to store a traditional mass spectrum as one record in the data base. This structure was chosen to reduce redundancy in the data base, and hence conserve disk space. The details of the data base structure are presented later in this chapter.

The portion of this system used to write and read data to and from the data base consists of a series of FORTRAN subroutines. These subroutines are easy to incorporate into any computer-controlled instrument. Only a few lines of code are needed to implement this data base system into the control software (see Figure 3.2). All the work of managing the data in a dataset is accomplished by a set of subroutines, which frees the programmer to concentrate on other aspects of the project. Appendix 2 provides more information regarding the subroutines used for data storage and retrieval.

          PROGRAM SAMPLE
          INCLUDE 'MDDB.CMN'            ! include the commons
          REAL*4 X(500),Y(500)          ! arrays for data
          LOONTI = 5                    ! define the terminal LUN
          CALL MSINIT                   ! initialize things
          IF (IERR .NE. 0) THEN ...     ! check for errors
          IFILE = 1                     ! define the dataset number
          CALL MSOPEN(IFILE,'New dataset name? ','NEW',2,3,4)
                                        ! open new dataset, using LUNs 2,3,4.
    C     Now define the variables
          NUMSTC(IFILE) = 3             ! three static variables
          ISTATC(1,IFILE) = 23          ! the first is code #23
          RSTATC(1,IFILE) = 2.0         ! and its value is 2.
          NUMVAR(IFILE) = 6             ! six variable parameters
          IVAR(1,IFILE) = 42            ! first variable is #42
          CALL PUTPRM(IFILE)            ! and write to the dataset
     10   CONTINUE
    C     (fill up the X, Y arrays with data)
          RVAR(n,IFILE) = value         ! record the variables
          NUMDAT(IFILE) = xx            ! number of data pairs
          CALL PUTDAT(IFILE,X,Y)        ! and store the data
          IF (more) GOTO 10
          CLOSE (UNIT=2)                ! close all the files
          END                           ! and all done

Figure 3.2 Writing to the multi-dimensional data set

Because the subroutines are short and data storage is efficient, this system could be added to many existing dedicated instrument control computers. It could also be put on a separate, time-shared computer system, with a high speed data link. Both of the triple quadrupole mass spectrometers at MSU are controlled by one or several microprocessors linked to a minicomputer which handles the data storage and retrieval. This allows for the efficient separation of tasks: the data acquisition, and the data storage and retrieval. By using separate computers, the instrument can be collecting data while other operators analyze their previously collected data.

Data base file formats

Computer files are generally organized in one of three ways: sequential, indexed or direct (60-62,71,78). A sequential file is the simplest, and as the name implies, is a sequentially ordered collection of information, placed in the file as received and packed in an unstructured way. If these data are written in such a way that they can be printed or displayed directly without any processing, the file is called a sequential, formatted ASCII file. This type of file is almost never used for large amounts of data, as it requires a great deal of space on the disk and is cumbersome to access. A variation, the unformatted sequential file, is again sequential, but the data are written in internal (unformatted or non-printing) format. This is a compact form of storing data. It has been used in some instrument data systems, but has several drawbacks, the most severe being access speed. As with any sequential file, the only way to find any particular piece of data is to start at the beginning of the file and read all entries until the particular datum is encountered. For large datasets, this is a very slow process.

A second type of file format is an indexed file. In this type of file, a "key field" is associated with some part of the data record.
The keys from all the records are collected in one or more structured indices, and access to the records is provided through these indices. This is a powerful file format, but unfortunately is not part of the FORTRAN-77 standard. It was not used in order to remain within the standard. Another drawback is the time required to write a record; after the data are written, the file system must update the indices, which could slow the data collection process.

A third type of file is a direct access file. The file is broken into fixed length records, and any one of these records may be accessed almost immediately. An elementary direct access file, consisting of one record per scan, could be used by an instrument that always puts out a fixed amount of data. For mass spectrometry, this format would waste a great deal of space, as the fixed record length would have to be long enough to hold the maximum number of mass/intensity pairs that might be recorded in any one scan. Table 3.3 shows the advantages and disadvantages of sequential, indexed and direct access files.

Table 3.3 Comparison of sequential and direct access files

                                   Sequential      Indexed         Direct
    Record length                  variable        fixed           fixed
    Time required to write record  fast            medium          fast
    Access speed to specific datum slow            fast            fast
    Memory/storage                 very compact    wastes space    wastes space

In our system we have combined elements of several file types to create a fast, efficient system. An unformatted, sequential file, the header file, is used to store a variety of information that describes the instrument, the experimental conditions, and the variables to be recorded. This file also contains the instrumental parameters that did not change during the given experiment, and these variables become the root of the data base hierarchy. A second file, the pointer file, is written with short, direct access records.
The major function of this file is to record the starting record number (pointer or index) of the variables and data pairs located in the third file. This file also contains some redundant data for fast data retrieval. The third file, the data file, is also direct access and contains the values for variable parameters and the X-Y data pairs acquired. Another file, the instrument description or dictionary file, is also direct access and each record contains a definition of a variable or parameter. Each file is described in more detail below.

Dictionary file

The instrument description file, or dictionary, contains complete descriptions of the instrumental parameters that may be varied or recorded. To reduce computer storage, all variables and parameters are assigned a code number. These code numbers are the record numbers in a direct access file which point to the descriptions of each code number. Two descriptions are stored for each code number, a short one (less than 20 characters) and a long one (up to 57 characters), and one flag integer (see Figure 3.3). The short descriptions are used for speed when the operator is entering them, or when display space is at a premium. The long descriptions are more suitable for tables and graphs. The general purpose nature of the data base is thus maintained by confining the instrument-specific parameter lists to this file, which is, in effect, a kind of conversion table or template. Each different instrument can have its own dictionary file.

The flag associated with each description in the dictionary tells how that description is used. If the flag is greater than zero for a particular code (say the code for "collision gas type"), the value associated with the variable also needs to be looked up in the dictionary. For example, if we were looking at code 81, "collision gas type", and had a value of 1, we would display "Argon", not the numeric value 1 (see Figure 3.4, example 1).
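The flag convention can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FORTRAN routines of this work: the dictionary is held in memory instead of a direct access file, describe is a hypothetical helper, and record numbers 201 to 203 for the gas names are invented (the actual position m of these records is not assumed). The zero and negative flag cases follow the convention summarized in Figure 3.3.

```python
# record number -> (short description, long description, flag)
DICTIONARY = {
    20: ("eV", "Electron Energy", -1),        # negative: the value is the answer
    81: ("CAD Gas", "Collision Gas", 200),    # positive: pointer to the value names
    201: ("AR", "Argon", 0),                  # zero: the definition itself, no value
    202: ("N2", "Nitrogen", 0),
    203: ("HE", "Helium", 0),
}


def describe(code, value=None, dictionary=DICTIONARY):
    """Turn a parameter code and its stored value into display text."""
    short_text, long_text, flag = dictionary[code]
    if flag > 0:
        # The flag points into the dictionary and the value is an offset from it.
        _, value_text, _ = dictionary[flag + int(value)]
        return "%s: %s" % (long_text, value_text)
    if flag == 0:
        # No value to show: the description is the whole answer.
        return long_text
    # Negative flag: the numeric value has physical meaning and is shown as is.
    return "%s: %s" % (long_text, value)


gas = describe(81, 3)
energy = describe(20, 70.0)
```

Adding a new gas or a new instrument parameter in this scheme means adding dictionary records only; the lookup code does not change.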
In this case, the flag is a pointer into the dictionary, and the value is an offset from this pointer. However, if the flag were zero, we would use the definition itself; in this case, there either isn't a value (as in the "Argon" example above), or the value has no physical meaning. In the final case, when the flag is negative (or more precisely, a "-1"), the value itself has physical significance, such as 70.0 eV electrons (see Figure 3.4, example 2).

Figure 3.3 Instrument description file format

    Record contents (80 byte direct access records), records 1 through END:
        A20       A58       I2
        QSDICT    QLDICT    IPTDIC

    QSDICT   Short description of instrument parameter
    QLDICT   Long description of instrument parameter
    IPTDIC   Flag: -1 ==> The value is the answer     (i.e. "70.0" eV)
                    0 ==> No values                   (i.e. "Argon")
                   >0 ==> Must look up the value      (i.e. gas #1 = "Argon")

Figure 3.4 Use of the dictionary file

    Example 1: Parameter #81 has a value of 3
        record    short      long             flag
        81        CAD Gas    Collision Gas    m
        m+1       AR         Argon            0
        m+2       N2         Nitrogen         0
        m+3       HE         Helium           0
        Result:   Collision Gas: Helium

    Example 2: Parameter #20 has a value of 70.0
        record    short      long
        20        eV         Electron Energy
        Result:   Electron Energy: 70.0

The instrument description file is a powerful feature of this system. By changing this dictionary, the data base management software can be used by almost any analytical instrument. Table 3.4 shows a portion of the dictionary for our triple quadrupole mass spectrometer. Modifications to an instrument often add new variables, and many data base systems would need to be fundamentally modified to account for the new parameter. This software only requires the addition of a new definition in the dictionary.

Header file

The header file serves primarily as a notebook, storing a variety of numerical and textual descriptions of the analysis. Included in this file are the values of all instrument parameters that will not change in the experiment.
Examples of these static parameters are the operator's name, positive ion mode, EI spectra, etc. (see Table 3.5, a sample dump of a header file). Other parameters that will change in the course of the experiment are called variables, and their codes are listed in this file. The contents and order of data in the header file are diagramed in Figure 3.5 and summarized in Table 3.6.

Table 3.4 Definitions in the TQMS dictionary

Code numbers: 5, 9, 10, 13, 14, 18, 19, 20, 21, 22, 23, 24, 43, 44, 45, 46, 47, 48, 51, 52, 73, 74, 75, 76, 80, 81, 104, 105, 106, 107, 108, 109, 112, 113, 114, 135, 136, 140, 141, 142, 143

Short description      Long description
Operator               Operator
  1) John                John Q. Public
  2) Milton              Milton Webber
Date                   Date of Experiment
Scan type              Scan type
  1) 1scan               Quad 1 scan
  2) 3scan               Quad 3 scan
  3) Dscan               Daughter Ion Scan
  4) Pscan               Parent Ion Scan
  5) Nscan               Neutral Loss (gain) Scan
  6) Sweep               Potential Sweep
  7) Stable ion          Stable ion Scan
Neutral Loss Mass      Neutral Loss (gain) Mass (amu)
Parent mass            Parent mass
Daughter mass          Daughter mass
Source                 Source Type
  1) CI                  Chemical Ionization
  2) EI                  Electron Impact
Ions                   Ion Type
  1) Pos                 Positive Ions
  2) Neg                 Negative Ions
SP                     Source Pressure (Torr)
CI Gas                 Chemical Ionization Gas
  1) CH4                 Methane
  2) H2                  Hydrogen
  3) CH4+N2O             Methane + Dinitrogen Oxide
FC                     Filament Current (Amperes)
EC                     Emission Current (Milliamperes)
eV                     Electron Energy (Volts)
Ion Volume             Ion Volume (Volts)
Repeller               Repeller Potential (Volts)
CI Drawout             CI Drawout Potential (Volts)
EIV                    EI Ion Volume (Volts)
Q1 Lens 1              Pre-Quad 1 Lens 1 Potential (V)
Q1 Lens 2              Pre-Quad 1 Lens 2 Potential (V)
Q1 Lens 3              Pre-Quad 1 Lens 3 Potential (V)
Q1 Offset              Quad 1 Offset Potential (Volts)
Q1 Mode                Quad 1 Mode
  1) RF                  RF Only (No mass filtering)
  2) DC                  DC (Mass Filtering)
  3) Scan                Scan
Q1 Mass                Quad 1 Mass (amu)
Q1 Delta M             Quad 1 Delta M
Q1 Res.                Quad 1 Resolution
Q2 Lens 1              Pre-Quad 2 Lens 1 Potential (V)
Q2 Lens 2              Pre-Quad 2 Lens 2 Potential (V)
Q2 Lens 3              Pre-Quad 2 Lens 3 Potential (V)
Q2 Offset              Quad 2 Offset Potential (Volts)
Q2 Pressure            Quad 2 Pressure (Torr)
CAD Gas                Collision Gas
  1) AR                  Argon
  2) N2                  Nitrogen
  3) HE                  Helium
  4) SF6                 Sulfur Hexafluoride
  5) CO2                 Carbon Dioxide
  6) CH4                 Methane
  7) H2                  Hydrogen
  8) CH4+N2O             Methane + Dinitrogen Oxide
Q3 Lens 1              Pre-Quad 3 Lens 1 Potential (V)
Q3 Lens 2              Pre-Quad 3 Lens 2 Potential (V)
Q3 Lens 3              Pre-Quad 3 Lens 3 Potential (V)
Q3 Offset              Quad 3 Offset Potential (Volts)
Q3 Mode                Quad 3 Mode
  1) RF                  RF Only (No mass filtering)
  2) DC                  DC (Mass Filtering)
  3) Scan                Scan
Q3 Mass                Quad 3 Mass (amu)
Q3 Delta M             Quad 3 Delta M
Q3 Res.                Quad 3 Resolution
EM Lens 1              Pre-Electron Multiplier Lens 1
Conversion Dynode      Conversion Dynode Potential (V)
EM Voltage             Electron Multiplier Potential
Peak Finding Thres     Peak Finding Threshold
Min Peak Width         Minimum Peak width
Max Peak Width         Maximum Peak width
Scan Rate              Scan Rate

Table 3.5 Dump of the parameters in the Header file

11 variable parameters:
   1   Modification History
   2   Registry Identification Number
   3   Quad 1 Mass (amu)
   4   Quad 3 Mass (amu)
   5   Scan type
   6   Quad 1 Mode
   7   Quad 3 Mode
   8   Neutral Loss (gain) Mass (amu)
   9   Source Type
  10   Ion Type
  11   Quad 2 Pressure (Torr)

24 static parameters:
   1   70.0        Electron Energy (Volts)
   2   14.9        Repeller Potential (Volts)
   3   13.5        CI Drawout Potential (Volts)
   4   10.1        EI Ion Volume (Volts)
   5   -19.1       External Potential (Volts)
   6   21.6        Pre-Quad 1 Lens 1 Potential (Volts)
   7   0.000       Pre-Quad 1 Lens 2 Potential (Volts)
   8   -53.8       Pre-Quad 1 Lens 3 Potential (Volts)
   9   4.20        Quad 1 Offset Potential (Volts)
  10   -11.6       Pre-Quad 2 Lens 1 Potential (Volts)
  11   -4.90       Quad 2 Offset Potential (Volts)
  12   -25.1       Pre-Quad 3 Lens 1 Potential (Volts)
  13   -5.60       Quad 3 Offset Potential (Volts)
  14   0.144E+04   Electron Multiplier Potential (Volts)
  15   -12.5       Quad 1 Delta M
  16   -14.0       Quad 3 Delta M
  17   0.000       Quad 1 Resolution
  18   0.000       Quad 3 Resolution
  19   0.000       Quad 2 Pressure (Torr)
  20   0.150E+04   Peak Finding Threshold
  21   4.00        Minimum Peak width
  22   25.0        Maximum Peak width
  23   4.00        Scan Rate
  24   0.129E+05   Date of Experiment

Comments:
  TRIAL RUN.

Record     Record contents (unformatted binary, variable length)
number
  1        I2        I2        A10      A8
           NUMSTC    NUMVAR    QDATE    QTIME

  2        I2          R4          I2          R4          ...   I2               R4
           ISTATC(1)   RSTATC(1)   ISTATC(2)   RSTATC(2)   ...   ISTATC(NUMSTC)   RSTATC(NUMSTC)

  3        I2        I2        ...   I2
           IVAR(1)   IVAR(2)   ...   IVAR(NUMVAR)

  4        A10      A8       A80
           QDATE    QTIME    QCOMNT

  ...      More comments
  END

  NUMSTC      Number of static (fixed) parameters
  NUMVAR      Number of variable parameters
  QDATE       Date the dataset was created (DD-MMM-YY)
  QTIME       Time the dataset was created (HH:MM:SS)
  ISTATC(i)   Static parameter code number
  RSTATC(i)   Static parameter value
  IVAR(i)     Code number for variable i
  QDATE       Date this comment was entered (DD-MMM-YY)
  QTIME       Time this comment was entered (HH:MM:SS)
  QCOMNT      ASCII comment

Figure 3.5 Header file format

Table 3.6 Summary of Header file format

1) The number of static or fixed parameters (all computer readable instrument settings that will not be changed in this experiment)
2) The number of variables (those parameters which are likely to be changed during the analysis)
3) Date and Time
4) The code numbers and values of static parameters
5) The code numbers of variable parameters
6) Comments as needed, with time of comment entry in front of each.

The header file is constructed so the comments are stored at the end.
This allows comments to be entered before, during and after an experiment, thus providing exceptional archival value. The computer clock time is prefixed to each comment so that the coincidence of comments and particular sections of the collected data can be established during post-collection analysis.

Pointer file

There exists one record in the pointer file for each scan performed on the instrument, and each record has as its first entry a pointer into the data file. This allows one to access any data from an experiment rapidly. The other elements in the record are either for housekeeping, or to enable faster display of the data. The last entries, the fast access variables, are especially important for multi-dimensional work. These are copies of variables stored in the data file. They are redundant, but this redundancy gains speed when searching the dataset for specific results (see the next chapter for more details). Selected variables, kept in the pointer file, save significant amounts of time by avoiding accesses to the data file. For example, if we want to examine the behavior of a daughter ion from a specific parent, and one of the fast access variables is quad 1 mass, we can quickly determine if we must retrieve the data for this scan. If quad 1 mass were equal to the parent of interest, we would get the data for this scan. However, if quad 1 mass didn't equal the parent of interest, we don't have to access the data file, and we've saved the operator some time. The time savings can become significant (from seconds to minutes) for medium to large datasets. The contents and order of data in the pointer file are shown in Figure 3.6. Each scan produces one record in the pointer file. Each record contains the elements listed in Table 3.7.

The pointer file was designed as a separate file for several reasons. First, since it is a small, direct access file, it can be quickly accessed.
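The quad 1 mass example above can be sketched as a filter over pointer records. This is an illustrative Python sketch, not the original software. It assumes the 32-byte field order suggested by Figure 3.6 (two 2-byte integers, two 4-byte reals, the dependent variable code, then three code/value pairs), little-endian IEEE encoding (the original machine format is not stated), and hypothetical code numbers; the function names are also hypothetical.

```python
import struct

# Assumed layout of one 32-byte pointer record:
# IPTDAT, NUMDAT, RTIME, RSUMY, IVARD, then three (IVARF, RVARF) pairs.
POINTER_RECORD = struct.Struct("<hhffhhfhfhf")


def unpack_pointer(raw):
    """Decode one pointer record into a dictionary of named fields."""
    (iptdat, numdat, rtime, rsumy, ivard,
     c1, v1, c2, v2, c3, v3) = POINTER_RECORD.unpack(raw)
    return {"IPTDAT": iptdat, "NUMDAT": numdat, "RTIME": rtime,
            "RSUMY": rsumy, "IVARD": ivard, "fast": {c1: v1, c2: v2, c3: v3}}


def scans_worth_reading(pointer_bytes, code, wanted, tol=0.5):
    """Return data-file pointers for scans that cannot be ruled out."""
    hits = []
    for offset in range(0, len(pointer_bytes), POINTER_RECORD.size):
        rec = unpack_pointer(pointer_bytes[offset:offset + POINTER_RECORD.size])
        value = rec["fast"].get(code)
        # If the variable is not a fast access variable, the data file
        # must be read anyway, so the scan is kept.
        if value is None or abs(value - wanted) <= tol:
            hits.append(rec["IPTDAT"])
    return hits


Q1_MASS = 51  # hypothetical code number for quad 1 mass
records = b"".join([
    POINTER_RECORD.pack(1, 10, 0.0, 6.0e4, 109, Q1_MASS, 98.0, 80, 0.0, 75, -20.0),
    POINTER_RECORD.pack(3, 25, 4.0, 2.5e4, 109, Q1_MASS, 142.0, 80, 0.0, 75, -20.0),
])
print(scans_worth_reading(records, Q1_MASS, 142.0))
```

Only the second scan survives the filter, so only its records would be fetched from the data file.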
Each read of this file immediately brings the important aspects of this scan to light, i.e. the dependent variable, the number of data pairs and, most importantly, three selected variables and their values.

The second reason the pointers were placed in a separate file was for extensibility. More scans can easily be added without wasting any space caused by preallocation of space within the data file.

Record     Record contents (32 byte direct access records)
number
  1        I2      I2      R4     R4     I2     I2        R4        I2        R4        I2        R4
           IPTDAT  NUMDAT  RTIME  RSUMY  IVARD  IVARF(1)  RVARF(1)  IVARF(2)  RVARF(2)  IVARF(3)  RVARF(3)
  ...      Identical to above
  END

  IPTDAT     Index (pointer) into the data file
  NUMDAT     Number of X,Y pairs in this scan
  RTIME      Time this scan was taken (seconds since start)
  RSUMY      Sum of Y values for this scan
  IVARD      Code for the dependent variable
  IVARF(i)   Code for a selected variable
  RVARF(i)   Value for a selected variable

Figure 3.6 Pointer file format

Table 3.7 Summary of one record in the Pointer file

1) The record number (pointer) where the data for each scan begins in the Data file
2) The number of data pairs (for example, mass/intensity or energy/intensity) in the scan.
3) Elapsed time (in seconds) from experiment start to scan start
4) The sum of the Y values (for MS, the total ion current) for the scan
5) A code number for the dependent variable in the scan
6) Copies of three selected variables

Data file

The data file contains one or more records per scan. The contents and order of data in this file are shown in Figure 3.7. Each record or set of records associated with a particular scan contains 1) the values of variable parameters, such as probe temperature, lens voltage, etc. and 2) the X-Y data pairs (intensity vs. dependent variable). The logical records in the data file are variable length; one scan may have 10 X,Y pairs and the next may have 1000 pairs. In this scheme, variable length logical records have been imposed onto one or more direct access physical records. The direct access records allow us to jump quickly to a specific scan. One disadvantage of imposing variable length logical records onto fixed length physical records is that some space following the end of the logical record is wasted. With small physical records, this wasted space is small, and is outweighed by the convenience of direct access to the data.

Record     Record contents (64 byte direct access records)
number
  1        Values for variables, scan 1
           X,Y data pairs, scan 1
  ...      Values for variables, scan 2
           X,Y data pairs, scan 2
  ...
  END

Figure 3.7 Data file format

Table 3.8 is a display of one logical record in the data file. The physical record number that is the beginning of each logical record, or scan, is stored in the pointer file. Each logical record consists of the values of the variables (in the order specified in the header file), and the X,Y data pairs. Each of these values is stored unabridged as real numbers. Since the data are stored in a separate file, adding new scans of data is simply a matter of extending the file.

Creation of a data set

The multi-dimensional data base described follows the hierarchical model. Those variables, instrument parameters and miscellaneous data that do not change during the experiment form the root segment of a 1:N tree and are stored in the header file.
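The cost of mapping variable length scans onto fixed length records in the data file can be estimated with a short calculation. The sketch below is illustrative Python, not the original software. It uses the 64-byte physical record size from Figure 3.7 and assumes every stored value is a 4-byte real (as the R4 fields of the pointer file suggest); the function names are hypothetical.

```python
import math

PHYSICAL_RECORD_BYTES = 64  # from Figure 3.7
BYTES_PER_REAL = 4          # assumption: values stored as 4-byte reals


def data_file_usage(num_variables, num_pairs):
    """Return (physical records used, bytes wasted) for one scan."""
    logical_bytes = BYTES_PER_REAL * (num_variables + 2 * num_pairs)
    records = max(1, math.ceil(logical_bytes / PHYSICAL_RECORD_BYTES))
    wasted = records * PHYSICAL_RECORD_BYTES - logical_bytes
    return records, wasted


def assign_pointers(scans, first_record=1):
    """Return the starting record number of each scan, as kept in the pointer file."""
    pointers = []
    next_record = first_record
    for num_variables, num_pairs in scans:
        pointers.append(next_record)
        next_record += data_file_usage(num_variables, num_pairs)[0]
    return pointers


# Eleven variables per scan, with 10 and then 1000 X,Y pairs.
print(data_file_usage(11, 10))
print(data_file_usage(11, 1000))
print(assign_pointers([(11, 10), (11, 1000), (11, 10)]))
```

Under these assumptions the waste is always less than one 64-byte record per scan, which is the trade made for being able to jump straight to any scan.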
As data are acquired by the instrument control computers, they are stored in scan records consisting of one record in the pointer or index file, and one or more records in the data file. Again, the hierarchical model is followed and those variables not scanned are stored together with a link to the scanned data (actually, the variables and data are stored together in the data file as described earlier, and the link is implicit).

Table 3.8 Dump of one logical record in the Data file

Scan number       1
Time (seconds)    0
Sum of the Y's    0.600E+05

Variable parameters:
   1   0.000    Modification History
   2   1.00     Registry Identification Number
   3   -1.00    Quad 1 Mass (amu)
   4   0.000    Quad 3 Mass (amu)
   5   1scan    Scan type
   6   Scan     Q1 Mode
   7   RF       Q3 Mode
   8   0.000    Neutral Loss (gain) Mass (amu)
   9   EI       Source
  10   Pos      Ions
  11   0.000    Quad 2 Pressure (Torr)

  Q1 Mass     intensity
  18.100       4433.000
  28.000      19263.000
  32.000       6607.000

Business style data bases are designed with "data definition languages" (60-62) which define the records and fields available. A multi-dimensional dataset is defined by creating the header file, specifying the static and variable parameters. This process is usually done by the control computers and the user is not required to learn a "data definition language" or become a data base administrator. This ability to create a dataset tailored to an individual experiment allows for very fast data storage and retrieval. When an instrument is modified and a new parameter added, the dictionary file is simply updated, and new datasets may be created containing the new variable.

Conclusions

The data base described here has several unique features: 1) It has the ability to store multi-dimensional data in real time; 2) Datasets can be automatically created and tailored to individual experiments; and 3) The data base can be extended with a simple addition to the dictionary file.
These features set this data base system apart from those used in business (no data definition language, no complicated structures or programming required). Since each dataset holds information about only one compound or experiment, the overhead and complexity of a complete laboratory information management package (66,67,70,73-74) are eliminated. The subroutines that comprise this data base are all capable of handling multiple datasets. This is necessary for matching spectra and doing various "massaging" functions on the data, such as averaging spectra. Various programs for matching and data manipulation have been written by Kevin Cross, of MSU. Examples of the use of this system, and a program for extracting orthogonal planes of data, are presented in the following chapter.

The instrument data base system described here is an efficient, extensible and modular set of routines to store multi-dimensional data (24-25) in a hierarchical data base. The dictionary file is an extremely powerful mechanism for adapting this system to ever changing instrumentation. This system has been in routine use since 1982 on three triple quadrupole mass spectrometers, each with a different set of ion optics and features. The three file formats for the data allow a dataset to expand almost without limit. The provisions for the rapid retrieval of data make the system easy to use. The ease of programming and modularity of the subroutines have been proven by the variety of applications for which it has been adapted.

Chapter 4

Retrieval and Display of Multi-dimensional Data

Introduction

The ability to store and retrieve data is essential to its utility. New instrumentation, capable of creating and collecting more data than can be practically analyzed, or far more than is ultimately needed, is constantly being used to collect unwieldy amounts of difficult-to-access data.
Any new computer-controlled instrument can be told to blindly collect and store data, and a disk quickly becomes full of data, much of which will never encounter human observation. However, with the right tools, one might be able to automate a search through this sea of data to extract trends, obtain minimum or maximum values for parameters, or otherwise gain some appreciation for the collected data. With the ability to automatically collect grand amounts of data comes the need to automatically sort and analyze them to extract the reduced set of trends or conclusions we seek (79-80).

In triple quadrupole mass spectrometry, large amounts of multi-dimensional data can be collected by the control computers. This instrument is capable of collecting much more data in one hour than the operator could possibly analyze in a month. This is not to say that large quantities of data should not be taken, just that tools to look at these data must be utilized. If such tools are not available, and cannot be developed, a more selective approach to data collection is required to simplify the analysis. The latter requires the ability to anticipate over which range of experimental variables the needed information will be found. Since the triple quadrupole MS/MS technique is relatively new, the knowledge needed to anticipate the role of the various parameters is not yet available. In fact, copious amounts of data must be collected in order to assess the effects of these parameters. Since we must take large quantities of data, and since suitable software tools for data analysis were not available, we developed our own tools, notably a program called EXTRACT.

The previous chapter described a multi-dimensional data base suitable for storage of data from MS/MS experiments. Fast storage of data into this data base is essential while the instrument is operational; fast retrieval of data from the data base is also a requirement.
In the first case, fast storage is required since the sample may have a limited life. In the second case, fast retrieval of data from the data base is required due to an operator's limited patience. The program we have developed, EXTRACT, provides for the quick presentation of results from the dataset, as well as several other "user-friendly" features. In this chapter, I will discuss the considerations that went into the design of EXTRACT: how it works, the user interface, examples of use, the internal configuration of the program and possible future directions for this and other data retrieval programs. Appendix 3 is a copy of the EXTRACT User's Guide, which gives details on the operation of the program.

Data retrieval

The data retrieval methods for business and scientific data bases have the same goal: to extract a user defined subset of the dataset for display or a report. The retrieval tools for business systems often involve data manipulation languages, query languages and report generators (60-62). While these full report generation systems are extremely powerful and flexible, programmers or data base administrators are often needed to create the templates required for even simple reports. Since the data base needs of the business community are relatively static (i.e. the same inventory form used every week), the weeks or months required to tailor a report generation template are justified.

Full laboratory information management systems (LIMS) often have full reporting capabilities (73-74), but our needs dictated a much simpler interface. LIMS packages are more akin to business systems and generate standard sample analysis reports. A scientific data base is needed for raw experimental data; tools are needed to view these data from a variety of viewpoints; and scientists often need to interactively search the data base for the desired information.
The retrieval program must include provisions for: 1) the uncertainty (error) present in our measurements; 2) ever changing instrumentation; and 3) simplicity of use. Simplicity of use is a key point; scientists generally don't want to learn a full query language or design report templates to see the results of their experiments.

EXTRACT - the program

EXTRACT is a general program for the retrieval of data from a multi-dimensional dataset. This program is completely generalized, and can be used with any instrument that uses the multi-dimensional data base described in Chapter 3 (25). The only element that links the data in a multi-dimensional dataset to a specific instrument is the correlation between a parameter code and a physical or logical parameter of an instrument, as defined in the instrument description (or dictionary) file. EXTRACT is instrument independent; it makes extensive use of the dictionary, and all displays are derived from this file. In this way, the report generation displays are automatically created from the contents of the dataset and definitions from the dictionary.

How EXTRACT works

Data, as stored in the multi-dimensional data base, are stored in scans (intensity vs. something) as recorded by the instrument. If the operator wishes to see these data in this format (i.e. one of the scans performed by the instrument), it is a simple matter to retrieve that scan. However, if the operator wishes data presented along an axis that wasn't scanned, EXTRACT must search through the dataset extracting the data that the user wishes to see. To do this, the operator sets the limits of the search and instructs the program to extract data matching these criteria. The base rule of EXTRACT is to exclude as much of the data in a dataset as possible, and then to present the user with all data that remain. In this way, we are assured that no datum will go unnoticed unless we specifically reject it from further consideration.
Specifically, EXTRACT follows the steps outlined in Table 4.1.

Table 4.1 Outline of how EXTRACT functions

1) EXTRACT loops through each scan in the dataset.
2) The variables stored with this scan are examined; if any variable is outside of the user set limits, this scan is rejected.
3) The data in this scan are examined; if the user requested values that don't exist, this scan is rejected.
4) The values the user wished displayed are extracted and saved.
5) The next scan is examined.

By following these steps, we are certain that all the data that fall within the user-specified limits are extracted and saved for later display. When a datum is retrieved from a scan (step 4 above), the minimum and maximum values of the variables throughout the extraction process are saved. These values are then presented to the user after the entire dataset has been searched, and serve to inform the user of the status of the extraction just performed. Say that the user extracted some data, but did not set any limits for the CAD gas pressure. If the minimum and maximum extracted values indicate a wide pressure range, the operator may wish to set limits on the pressure and extract the data again. The user sets the acceptable limits for variables, extracts the data, and the program displays the actual minimum and maximum values for all the variables. In this way, the user may interactively modify the extraction limits while keeping an eye on the resulting data.

EXTRACT internal structure

EXTRACT consists of three main subroutines: a user interface (EXEDIT), a graphics interface (MSPLOT) and the extraction subroutine (XTRACT).

User interface

Various types of user interfaces were considered for this program. There are four main types of user/machine interfaces (see Table 4.2).

Table 4.2 User/machine interface types

1) question/answer or prompting
2) command or switch oriented
3) menu driven
4) icon driven

The first type of interface, prompting, is the simplest to code, but the most tedious to use routinely. Prompting, or question and answer, is the easiest for a novice computer user to understand and use; the computer asks questions and the user answers them. However, this type of interface is the most difficult for an experienced user of the software, for he or she, too, must answer each question in turn, and this is a time consuming process. On the other hand, a command oriented interface doesn't prompt the user at all. With this style of interface, the user is assumed to be an expert who has all the commands or switches memorized and enters them as needed. For a novice, or any user that doesn't use the software routinely, this type of interface requires the user to have a copy of the user's guide beside the terminal. The fourth interface, icon driven, is a special graphical form of a menu-driven system, and requires a graphics terminal to operate the software.

The menu style interface, not requiring a graphics terminal, was selected as appropriate for EXTRACT. As described in the theory section above, the user must set the limits of variables, and EXTRACT displays the actual limits found. To be a useful, interactive program, all these limits must be presented to the user, and the user must be allowed to change the extraction limits and see the results of these changes. A menu format, on a video terminal, fulfills these requirements.

The format selected for the menu is shown in Figure 4.1. This display consists of four sections: two lines for the header or titles, up to 20 lines for the display of the variables, a line for the commands and a last line for expanded descriptions. Each of the variable displays consists of 40 character positions, including 10 characters for the variable description, and 4 numeric fields of 6 characters each. This crowded display is necessary to display all the variables, the user set limits and the resulting extraction limits.

[Figure 4.1 EXTRACT menu display]

These extraction displays are generated from parameters stored in the dataset. The parameter code numbers (from the header file) are looked up in the dictionary file to produce text for the display. All text displays (except the two header lines) are generated this way, making EXTRACT a fully generalized program able to retrieve data from any dataset as described in Chapter 3. This automatic creation of customized report generation screens makes EXTRACT easy to use.

Subroutine EXEDIT

EXEDIT is a screen oriented editor, allowing the user to enter and change the limits of the extraction. EXEDIT is a driver that keeps track of the values and their locations on the screen, and calls subroutines to update the screen and parse command input. The command input subroutine is special because it accepts single character input from the terminal, decides if the user is starting to type a command or a number, and prompts the user accordingly. By doing this, the advanced user may reduce the number of keystrokes required to enter extraction limits. This input subroutine also allows the user to enter both the minimum and maximum limits separately, or enter them at the same time. Upon entering the EXTRACT program, none of the variables have extraction limits set (as in Figure 4.1); the user must set these limits as desired.
To change a variable's limit, the user moves the highlighted or active area using the cursor keys on the terminal. Once the variable to change has been selected ("Scan" in Figure 4.1), the user simply types in a new value, and that value is inserted into the display. Since each variable position on the screen is only six characters long, some significant figures may not be displayed (see Table 4.3). To alleviate this problem, the last line on the screen is an enhanced description of the active area, including the full dictionary definition, and the full value, or the value's description, if there is one.

Table 4.3 Numeric representation for EXTRACT

                                   Full number    Truncated number
integers and floating              42             42
point numbers less than            123.4567       123.45
six characters long                -63.7          -63.7

floating point numbers             1.23E+07       .123+8
greater than six digits            -2.34E-17      -.2-16

The worst case truncation of the numbers is for small negative numbers, where the number is truncated to one digit plus the exponent. Note that EXTRACT displays the full number on the last line of the display.

EXTRACT is based on the principle of retrieving a plane of data from a multi-dimensional data base. The retrieved plane of data can be visualized as a plot of Y vs. X. These X and Y variables are chosen by positioning the active area to a variable, and typing either X or Y (or Z for three dimensional plots, see description below). The selected X and Y variables become the axes titles for the resultant plot.

Presentation of EXTRACT results

Once the user has entered all the limiting values for the extraction and specified the X and Y axes, the program is told to extract all data that are not outside of the given limits. This extraction process can take from microseconds to several minutes, depending on the size of the data base and the type of extraction desired. When the extraction is complete, the screen is updated, showing the user-requested limits and the actual extraction limits found.
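The extraction steps of Table 4.1, together with the reporting of actual limits, can be sketched as a single loop. This is an illustrative Python sketch, not the XTRACT subroutine itself. It assumes each scan is held in memory as a set of named variables plus a list of X,Y pairs, that variables are identified by hypothetical names rather than code numbers, and that when several data points fall within the requested window the most intense one is kept (a choice made here for the example).

```python
def extract(scans, limits, x_name, wanted_x, tol=0.5):
    """Return (points, actual_ranges) for scans that pass the user limits."""
    points = []
    actual = {}
    for scan in scans:                                    # step 1: loop over scans
        variables = scan["variables"]
        if any(not (lo <= variables[name] <= hi)
               for name, (lo, hi) in limits.items()):     # step 2: variable limits
            continue
        matches = [y for x, y in scan["pairs"] if abs(x - wanted_x) <= tol]
        if not matches:                                   # step 3: requested datum missing
            continue
        points.append((variables[x_name], max(matches)))  # step 4: save the datum
        for name, value in variables.items():             # track actual limits found
            lo, hi = actual.get(name, (value, value))
            actual[name] = (min(lo, value), max(hi, value))
    return points, actual


scans = [
    {"variables": {"q1_mass": 142.0, "q2_offset": -20.0, "q2_pressure": 1.0e-4},
     "pairs": [(83.0, 500.0), (97.0, 200.0)]},
    {"variables": {"q1_mass": 142.0, "q2_offset": -20.0, "q2_pressure": 2.0e-4},
     "pairs": [(83.0, 900.0)]},
    {"variables": {"q1_mass": 142.0, "q2_offset": -10.0, "q2_pressure": 2.0e-4},
     "pairs": [(83.0, 700.0)]},
    {"variables": {"q1_mass": 100.0, "q2_offset": -20.0, "q2_pressure": 2.0e-4},
     "pairs": [(83.0, 50.0)]},
    {"variables": {"q1_mass": 142.0, "q2_offset": -20.0, "q2_pressure": 3.0e-4},
     "pairs": [(55.0, 10.0)]},
]
limits = {"q1_mass": (141.5, 142.5), "q2_offset": (-20.5, -19.5)}
points, actual = extract(scans, limits, "q2_pressure", wanted_x=83.0)
print(points)
print(actual["q2_pressure"])
```

In this made-up dataset no limit was set on the pressure, so the reported pressure range spans every accepted scan; that range is the feedback the operator uses to decide whether to tighten the limits and extract again.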
The actual limits will always fall within the given limits, but may indicate a larger range than acceptable. For example, if no limits were set for a variable, say quad 2 pressure, an extraction performed, and the resulting display showed that the actual limits of quad 2 pressure ranged from zero to 1.0x10^[illegible] torr, this would indicate that several points were extracted at various pressures, and the resulting plot would be meaningless. In this case, the user should limit the pressure range, and extract the data again.

When an extraction has yielded limits within an acceptable range, the data may be filed and plotted (using the graphics program MULPLT). The link to the graphics program has been automated (subroutine MSPLOT) and can be called up in two keystrokes. This automated link sets up a variety of scaling factors, axes limits and other parameters for the user, in one of four formats: point plot, line plot, bar graph and spectrum plot. The first three of these are options of MULPLT, but the fourth, a spectrum plot, is a specialized bar graph that normalizes the data to the base peak. By calling MULPLT (81) directly from EXTRACT, either simple or publication quality graphics, on a variety of graphics devices, can be obtained. (MULPLT was originally written by Dr. Tom Atkinson, but I have extensively modified it in recent years, adding such features as color, new commands, new device support, shading of bar graphs, speed enhancements, a post processor for raster printers and multiple plots, etc.)

EXTRACT is capable of generating pseudo three dimensional plots by extracting a series of two dimensional planes of data, each offset by a specific "Z" variable increment. These planes of data are extracted and filed, and a post processing program (PLOT3D) is used to generate a series of MULPLT commands to produce pseudo three dimensional plots. These plots are not true three dimensional plots; they are just two dimensional plots offset with X and Y increments.
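The offsetting idea can be illustrated in a few lines. The sketch below is illustrative Python and does not reproduce PLOT3D or any MULPLT command; the function name and the increments are hypothetical. Each successive plane is simply shifted by one more X and Y increment before being drawn as an ordinary two dimensional trace.

```python
def stack_planes(planes, dx, dy):
    """Shift plane k by (k*dx, k*dy) so the planes appear staggered."""
    stacked = []
    for k, plane in enumerate(planes):
        stacked.append([(x + k * dx, y + k * dy) for x, y in plane])
    return stacked


# Two hypothetical planes of (X, Y) points, one per Z value.
planes = [
    [(0, 0), (1, 5)],
    [(0, 1), (1, 6)],
]
print(stack_planes(planes, dx=2, dy=10))
```

Because nothing is hidden or projected, every extracted point is still present in the output, only displaced.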
True 3D plots are projection plots, and usually have the "hidden" lines removed. They are "prettier", but the hidden data are inaccessible, a feature considered undesirable for most of our applications.

Subroutine XTRACT

The subroutine XTRACT was written with speed in mind. Since one complete pass through the entire dataset is required for a thorough extraction of data, hooks are built in to extract more than one plane of data in one pass through the dataset. While the ability to extract several planes of information in a single pass would dramatically speed up three dimensional extractions, it would have no effect on standard two dimensional extractions. Currently, three dimensional plots are semiautomatically made by EXTRACT calling the subroutine XTRACT many times. Each time XTRACT is called, one more two dimensional plane of data is added to a three dimensional file for future plotting. Although the multiple plane extraction feature has not yet been utilized, there is no reason to believe that it will not work. To implement this feature would require modifications to the EXTRACT main program and possibly to the EXEDIT subroutine.

Examples of EXTRACT use

EXTRACT has proven to be a very useful program for looking at data collected by a triple quadrupole mass spectrometer. By using this tool, we are able to collect large amounts of data from one sample, and present them in a variety of ways. This allows us to easily pick out trends in the data, and identify the effects of instrument parameters on the collected data. By using EXTRACT, we have been able to quickly gain new insights into the processes going on in the TQMS instrument.

A five dimensional matrix (intensity, quad 1 mass, quad 3 mass, collision energy and collision gas pressure) of data is often obtained for a compound on the TQMS. An example of such a dataset is used for Figures 4.2 and 4.3, of the compound cyclohexyl-acetic acid.
The upper portion of Figures 4.2 and 4.3 shows the extraction limits used to produce the plots shown in the lower portion of the figure. In the case of Figure 4.2, a pressure plot for a specific parent/daughter combination at a specific collision energy has been extracted. Figure 4.3 is from the same dataset, showing an energy plot for the same parent/daughter at a specific pressure.

[Figure 4.2 Example plot of intensity vs. pressure extraction]

[Figure 4.3 Example plot of intensity vs. energy extraction]

EXTRACT is capable of generating pseudo three dimensional plots, enabling the user to see trends in several dimensions at once.
Figure 4.4 is an extraction from the same dataset used in the above examples, but now three dimensions of data are displayed. In this example, if we were looking for the energy/pressure that resulted in the largest daughter intensity, we would see that quad 2 offset needs to be about 5 volts, and the collision pressure about 2 x 10^-4 torr.

[Figure 4.4. Example plot of a three dimensional data extraction.]

Conclusions

EXTRACT has proven to be a valuable tool in the analysis of multi-dimensional data. The ability to extract data in a plane other than that scanned gives the operator more flexibility in designing an experiment. For many samples, collecting a full three dimensional map (intensity, quad 1 mass, quad 3 mass) at a variety of collision pressures and energies allows us to determine the optimum collision parameters.

EXTRACT is an instrument-independent program; it is capable of working with any multi-dimensional data base (as described in the last chapter). This is due to the use of an instrument description (or dictionary) file. EXTRACT has no instrument specific parameters coded in; instead the dictionary file provides translation for the variables and other parameters associated with the multi-dimensional data base.

The generality of EXTRACT and the multi-dimensional data base was proven by their use with simulated electrochemical data. Simulated data were generated and stored in the data base. EXTRACT was used to retrieve and display these data in ways that would have been difficult or impossible to simulate directly.

The use of the dictionary file for EXTRACT display has the added advantage of easing report generation. With business data bases, the field names must be known and entered with each query ("list all employees with salary GT 35000").
The menu format for EXTRACT displays all the parameters in a dataset, and the cursor keys provide an easy way to select the variables. EXTRACT provides for range limits and feedback for each variable to minimize the impact of data uncertainty. Being able to select the extraction limits for a variable and see the actual extracted range for all variables makes EXTRACT a powerful tool for retrieving scientific data.

Chapter 5

Extracting the Information Contained in MS/MS Data

Introduction

The triple quadrupole mass spectrometer is capable of generating large amounts of data. The analysis of these data in a reasonable time would be impossible without the use of the software tools presented in the last several chapters. These and other tools allow the operator to examine the collected data and display them in a variety of formats. These displays present the operator with the raw data as they were collected (or possibly normalized), and allow the operator to interpret the results. The next logical step in computer-assisted problem solving is to automate the extraction of the information present in the data. There are two predominant ways that a computer can help interpret data from a mass spectrometer: expert systems and pattern recognition.

Artificial intelligence (AI) is a field of computer science dealing, generally speaking, with making computers "think". A subfield of AI is the study of expert systems. An expert system (82-84) is a computer program that mimics a human expert in a specific field of expertise. This is currently possible only for small, well defined problems for which there exists a well-defined "knowledge base" of information on which the program bases its decisions. Essentially, these programs apply a set of "rules" to the data, presumably the same rules that human experts use, to extract the information present in the data. Several researchers have applied expert systems (85-93) to mass spectrometry.
The DENDRAL project, from Stanford University (21,85-93), has shown that expert systems could be used to help interpret mass spectra. The original DENDRAL algorithm, developed by J. Lederberg, was able to identify all possible acyclic molecular structures given a set of constituent atoms. Heuristic DENDRAL achieved the same objective in less time by using mass spectrometric data and rules to infer constraints on the structure generation. Meta-DENDRAL was then developed to automatically generate the rules for Heuristic DENDRAL. CONGEN was later developed to replace the older DENDRAL algorithm, and was able to generate cyclic structures. DENDRAL has three functional units: Plan, Generate and Test. The planning phase uses rules to constrain the generate phase (using CONGEN). The test phase then uses another set of rules to "fragment" the generated structure and compare the resulting mass spectrum with the unknown. In this way, the generated structures could be ranked, and top ranked structures are often indicative of the unknown's structure.

One program from the DENDRAL project, GENOA (22-23), is a program to generate all chemically possible structures from the molecular formula, constrained by any additional chemical information presented to it. We are currently using this program in our laboratory as one part of our structure elucidation scheme (see Figure 1.1).

Another expert system applied to mass spectrometry is the system developed at Lawrence Livermore National Laboratory (95-98). Currently this system is capable of tuning a triple quadrupole mass spectrometer based on both signal intensity and peak shape.

Pattern recognition is a field of study based on the assumption that data can be clustered or grouped into distinct sets (50-54,85-87). Each of these sets is presumed to have one or more characteristics that distinguish it from other sets. A subset of this field is routinely used in mass spectrometry: spectrum matching (85-87).
Any MS data system capable of searching a library for reference spectra that "match" an unknown is performing a simple pattern recognition task. The criteria for separating normal EI mass spectra into groups, with top ranked matches, have been studied extensively (40-49).

In this chapter, I will present two tools that help a chemist extract a small portion of the information present in MS/MS data. MS/MS data are different from (i.e. more specific than) "normal" MS data, and new rules for information extraction are required. The first tool presented is a knowledge-based program to aid in the analysis of neutral losses from parent ions. The second tool is an example of a pattern recognition technique, and is used to match daughter (or neutral) spectra.

Information contained in MS/MS data

Mass spectrometers have traditionally been used to aid chemists in determining the structure of unknowns. The information contained in a typical mass spectrum comes from several basic sources, including 1) the absolute mass to charge ratio of an ion; 2) the m/z value of an ion relative to another ion in the spectrum; 3) the intensity of an ion peak relative to the base peak intensity, total intensity or another ion's intensity; and 4) the absence of ions at certain m/z values. The absolute mass of a peak shows the presence of a relatively stable charged species, giving an indication of its chemical makeup. The relative masses of two or more peaks give the mass(es) of non-charged species that may have been lost from the higher mass ion. The relative intensities of the ion peaks in a spectrum yield information about which fragmentation pathways are most predominant. The data present in a mass spectrum are interpreted by analyzing these absolute and relative masses and intensities according to well-defined rules (99).

A set of all daughter spectra from a sample contains all the information present in an EI spectrum from the same sample.
Each daughter spectrum has the advantage of being simpler than the EI spectrum of the molecule, but the concatenation, or overlapping, of all the daughter spectra produces a spectrum similar to the EI spectrum. The relative simplicity of a daughter spectrum is due to the selective nature of the first mass filter and the less severe nature of the second collision. Each daughter ion contributes evidence of the composition of its parent ion, and can be used to help interpret the structure of the parent. Due to these differences, new information extraction rules or algorithms are needed to analyze MS/MS data.

Neutral spectrum

A neutral spectrum is similar to and derived from a daughter spectrum. A neutral spectrum presents the relative amounts and masses of neutral fragments lost in the fragmentation of a parent. These spectra are easily derived from a normal daughter spectrum by simply subtracting the daughter masses from the parent mass. The resultant spectrum contains the same information present in a daughter spectrum (less the parent ion, now at mass 0), but is presented in a different form. Figure 5.1 shows both daughter and neutral spectra for parent m/z 147 from 2-methyl-4-phenyl-2-butanol.

[Figure 5.1. Comparison of daughter and neutral spectra for 2-methyl-4-phenyl-2-butanol. Upper panel: daughter spectrum of m/z 147; lower panel: neutral spectrum of m/z 147. Axes: intensity vs. m/z.]

One of the techniques for identifying an unknown from its EI mass spectrum is to characterize the neutral losses from ions in the spectrum. The characterization of these losses is complicated by the richness of an EI spectrum, in that it is difficult to tell exactly which ions are formed from neutral losses from the molecular or any other ions. Losses are most easily observed from the molecular ion, looking backwards down the spectrum.
If the next ion present is 28 daltons less, the parent lost either CO or C2H4. However, as we proceed down the spectrum, it becomes unclear which ion a specific loss came from. This limits the usefulness of this technique to confirming postulated structures.

In tandem mass spectrometry, we can determine exactly which daughter ions are formed from which parent ions. If we set up the collision conditions (CAD gas pressure, axial energy) so as to ensure only a single collision for the parent ion and little or no chance that the resultant daughters will further fragment, we know that all the daughter ions formed are direct, single-event fragmentation products of the parent. If the collision gas pressure were too high, some of the daughter ions formed would again collide with the target gas. These second and higher order fragmentation processes give the same types of uncertainty in a daughter spectrum as in an EI spectrum. Therefore, if we keep the collision conditions conservative (CAD gas pressure low, medium energy), we produce spectra of first-order collision products. These spectra are not as rich as spectra produced at higher pressures since only first-order products are generated. However, first-order daughter and neutral spectra give direct, definitive information about the composition of parent ions.

Daughter spectra and neutral spectra present essentially the same information, but this information can be used in a complementary fashion. Daughter spectra present those species that retain the charge during fragmentation, while neutral spectra present the uncharged fragments. Deriving neutral spectra gives spectra of a form amenable to the same matching techniques used for ion spectra. If the spectra were left in daughter ion form, different pattern recognition techniques (such as sliding correlations) would have to be used to group spectra by neutral loss information.
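The derivation of a neutral spectrum from a daughter spectrum described above is a simple subtraction, and can be sketched in a few lines of Python. This is an illustration only: the function name neutral_spectrum and the peak list are assumptions (the masses and intensities are invented, not read from Figure 5.1).

```python
def neutral_spectrum(parent_mz, daughter_peaks):
    """Convert a daughter spectrum into a neutral spectrum.

    daughter_peaks is a list of (m/z, intensity) pairs.  Each daughter at
    mass m becomes a neutral loss of (parent_mz - m) with the same
    intensity.  The unfragmented parent would land at mass 0 and is
    dropped."""
    neutrals = []
    for mz, intensity in daughter_peaks:
        loss = round(parent_mz - mz, 1)
        if loss > 0:
            neutrals.append((loss, intensity))
    return sorted(neutrals)


# Hypothetical daughter spectrum of a parent at m/z 147.
daughters = [(147.0, 100.0), (129.0, 40.0), (105.0, 20.0), (91.0, 75.0)]
print(neutral_spectrum(147.0, daughters))
```

Because the result is an ordinary list of (mass, intensity) pairs, it can be passed to the same matching routines used for ion spectra, which is the point made in the paragraph above.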
By analyzing both of these spectra, we are able to deduce some information about the parent ion's composition and structure.

A simple expert system for neutral loss analysis

The low mass neutral fragments are simple structures, generally corresponding to simple, common neutral losses. I have written a program (ANEUT) to aid in the determination of these losses. "Expert knowledge" may be encoded in a knowledge-based system, such as an expert system, in two primary ways: as rules and as a base of knowledge upon which rules can draw information (85-87,90-93). The knowledge base for this program was derived from a table of possible losses from appendix A.5 in McLafferty's Interpretation of Mass Spectra, and was enhanced as we gained experience with the program. The conclusions drawn from this deductive system are simply a list of possible neutral losses; it is the chemist's responsibility to utilize the resulting information and update the knowledge base as required.

ANEUT knowledge base and rules

The knowledge base is a text file which allows for the easy extension of the knowledge available to the program. This file consists of one entry for each possible loss, or series of losses. Figure 5.2 shows several entries from the current file, and the format of each entry. For series of losses, such as CnH2n+1, an alkyl loss, limits are placed on the values of n (for this case, n is greater than 0). These limits, the formula, and a comment comprise one record. The comment describes the conditions or conformation that lead to a specified loss.

[Figure 5.2. Entries from the knowledge base for ANEUT.]

There are currently only three "rules" for the program ANEUT.
One rule is used to match the actual losses (found in the spectrum) with the possible losses (generated from the knowledge base). Another rule is used to allow higher mass neutrals to be a sum of lower neutrals, and the final rule adds a "quality" to a possible loss.

The primary rule is simple: if the mass of an actual loss is within ±0.3 daltons of the mass calculated from the knowledge base, that loss/knowledge base entry correlation is retained. The mass deviation of ±0.3 allows for instrumental effects (the instrument may be slightly off calibration, etc). This rule searches the entire knowledge base, generating a list of correlations. It should be noted that this rule does not guarantee a complete and exhaustive check of all possible neutral losses; instead it relies on the knowledge base to contain a fairly complete list of the common neutral losses.

The second rule allows several of the neutral losses to occur, and effectively adds their masses together.

The third rule adds a "quality" factor to certain of the correlations made by the first rule. If the unknown daughter or neutral spectrum contains a series of losses (such as 15, 29, etc. from CnH2n+1), each of these entries is flagged as being the likely loss. If just one of the elements in a series is present, it is not flagged. This is based on the assumption that if a series of losses is possible, the fragmentation process will generate the entire series.

Example of ANEUT use

Examples of ANEUT's use are seen in Figures 5.3 and 5.4. Figure 5.3 shows the neutral spectrum from parent m/z 108 (the molecular ion) of 1,4-benzenediamine and the output from ANEUT. As can be seen from this list of possible neutral losses, the peaks at 1 and 27 daltons are well accounted for (H and HCN), while the peak at 28 daltons is the loss of both H and HCN. Figure 5.4 shows the neutral spectrum from parent m/z 147 (probably HOOC(CH2)4COOH2+) from bis-2-ethylhexyl adipate and the results from ANEUT.
After ignoring the Cl, N and S containing structures, we are left with fragments indicative of acids and alcohols.

The comments used for ANEUT's output are directly from McLafferty's table, which was derived from normal EI spectra. These comments represent the possible conformation of the original molecule and take into account the extensive rearrangements possible in the source. As a result, these comments may only be useful for analyzing neutral species lost during the fragmentation of the molecular ion.

[Figure 5.3. Neutral spectrum and analysis of 1,4-benzenediamine (parent m/z 108).]

[Figure 5.4. Neutral spectrum and analysis of bis-2-ethylhexyl adipate (parent m/z 147).]

Future extensions to this system include constraining the matching process to include or exclude certain atoms. For example, if we know that the compound doesn't include nitrogen, several of the correlations will not be displayed, shrinking the list of possible losses. Alternatively, we may know that the unknown contains only carbon, hydrogen and oxygen, effectively eliminating all other elements. This same program, used with a different knowledge base, could be used to aid in the identification of peaks in a daughter spectrum.

This program or similar expert systems can be (and will be) used in an iterative manner with others, including GENOA. The more information that an expert system is given about an unknown, the more certain its analysis of the unknown. For example, output from ANEUT can be used as constraint input into GENOA, limiting the number of structures generated.

Data groupings

The data content of a daughter spectrum is not as great as that in an EI spectrum; however, the information content may be greater if the conclusions are more certain. The sum of the information contained in all the daughter spectra for one compound could then greatly exceed the information available from the EI spectrum.

Many information extracting techniques have been applied to EI spectra to deconvolute them into simple patterns (52-54). Some of these pattern recognition techniques have not fared well, being overwhelmed by the overlapping data present. These same techniques applied to the daughter spectra from a compound don't reveal any new information, because the daughter spectra are already well grouped subsets of the EI spectrum. A daughter spectrum of the molecular ion bears a great similarity to the EI spectrum; however, the daughter spectra from ions other than the molecular ion are relatively simple.
The normal EI spectrum contains fragments from all substructures in the compound, while a daughter spectrum contains fragments from one (or a small number) of the substructures of the molecule. If we are able to group sets of these daughter spectra from different compounds, we expect, and find, that the closest matching spectra are derived from compounds with similar substructural features.

MS versus MS/MS spectra matching

The traditional spectrum matching programs designed for matching EI mass spectra take one of several general approaches to this problem of grouping spectra. Each of these methods relies on a library of spectra which may be abridged in a variety of ways. Biemann searching techniques abridge the spectra to the two most abundant peaks in each 14 dalton window (42). Other techniques reduce the intensity to a binary value, i.e. either there is a peak at a mass or not (40-41). Kevin Cross of MSU has implemented a matching system for EI spectra that is based on the unabridged spectra (20,47). This matching program, patterned after a variety of other matching techniques, assigns weights to the masses and the intensities of each peak, and produces good results for EI spectra.

The simplicity and wide variations in intensities in daughter spectra make normal EI matching techniques less suitable for daughter spectra. Better daughter spectra groupings (and hence substructural feature groups) can be achieved by doing minimal intensity screening for peaks at similar m/z values. I have studied a variety of intensity matching algorithms to determine the importance of the relative intensities of daughters at the same mass.

The problem becomes more complicated when a daughter spectrum from a weak parent ion must be matched to a spectrum from an intense parent ion. Simply normalizing the spectrum to the parent peak or to the total ion current may result in the noise peaks becoming too prominent.
The threshold level for recording a peak on the spectrometer may have been set too high, and low intensity daughter ions may not be seen. These and other problems require special attention when matching daughter spectra.

Intensity matching for daughter spectra

The approach I took to determining the importance of daughter ion intensity was to try several intensity compression algorithms on daughter spectra from the same parent structure taken under different conditions. Also included in the reference library were vastly different compounds with the same parent ion masses.

The first attempt ignored intensities altogether, i.e. a binary compression. If any peak existed at the same mass in both the unknown and reference spectra, it scored one point. On this basis, the top scoring matches proved to be an unselective sample of the data base. This was primarily due to scoring 'matches' of intense daughter peaks with very small peaks, and neglecting to account for missing peaks. This indicated the need for some intensity weighting, and several methods were tried.

Reducing the intensity range from the six orders of magnitude available from the instrument to a number representative of the magnitude (i.e. LOG10) gave a total dynamic range of zero to six. Even with this dramatic reduction of scale, and counting intensities that matched within ±1 unit, the resultant groupings of spectra were not as good as desired. The intensity differences between similar spectra were just too great for this 6 level approach.

The current grouping algorithm uses only three levels of intensity information: strong, medium and weak. First, any peaks that are "weak" (less than an arbitrary 800 counts) are marked as being weak. These small peaks must be classified as weak before normalizing the spectrum, to identify them as peaks not much greater than the instrument background noise.
Next, the daughter spectrum is normalized to the parent ion, and those peaks that are less than one percent are marked weak, between 1 and 10 percent are medium, and greater than 10 percent are strong. Two peaks are considered to have matched intensities if they are in the same or adjacent groups. This algorithm produces good results, allowing for the wide intensity variations present in daughter spectra.

Ranking and sorting the data groups

The last step in a grouping program is to have the computer rank the results (Figure 5.5). This involves assigning quality factors to matching peaks, and deducting points for mis-matched spectra.

Intensities reduced to:
    Weak    (< 800 counts or < 1%)
    Medium  (1% <= intensity <= 10%)
    Strong  (10% <= intensity <= 100%)

mass comparison                    intensity comparison                      weighting factor
ref = unknown (a mass in both)     intensity matches (within ± one group)    + 1.0 U
ref = unknown (a mass in both)     intensity doesn't match                   + 0.5 U
mass in unknown, not in reference  weak intensity                              0.0 (ignored)
mass in unknown, not in reference  medium, strong intensity                  - 0.33 U
mass in reference, not in unknown  weak intensity                            - 0.05 R
mass in reference, not in unknown  medium, strong intensity                  - 0.1 R

The weighting factor is based on the number of peaks in either the reference or unknown spectra. R and U each represent the percent of the total spectrum (based on the number of peaks, not their intensities) in the reference and unknown spectra.

Figure 5.5. Characteristics used to match daughter spectra

Peaks in an unknown that intensity match peaks in the reference are given full credit, while those peaks at the same mass but different intensities gain only one half credit. Each "credit" is worth 100% divided by the number of peaks in the unknown. For example, if an unknown has 5 peaks, and 4 peaks match in mass and intensity and one matches mass but not intensity, the rating would be 90% (4*100/5 + 1*50/5).
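The three-level intensity reduction and the forward credits described so far can be sketched as below. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, peaks are keyed by nominal (integer) mass, and a peak at exactly 10 percent is treated as medium because the text leaves that boundary ambiguous.

```python
WEAK, MEDIUM, STRONG = 0, 1, 2
NOISE_COUNTS = 800  # the arbitrary raw-count threshold given in the text


def intensity_level(counts, parent_counts):
    """Reduce a raw intensity to weak, medium or strong."""
    if counts < NOISE_COUNTS:          # checked before normalizing
        return WEAK
    percent = 100.0 * counts / parent_counts
    if percent < 1.0:
        return WEAK
    if percent <= 10.0:                # assumption: exactly 10% counts as medium
        return MEDIUM
    return STRONG


def intensities_match(level_a, level_b):
    """Same or adjacent groups match; only weak vs. strong does not."""
    return abs(level_a - level_b) <= 1


def forward_credit(unknown, reference):
    """Credit for unknown peaks also found in the reference.

    Both arguments map nominal mass -> intensity level.  One credit is
    100% divided by the number of peaks in the unknown."""
    credit = 100.0 / len(unknown)
    total = 0.0
    for mass, level in unknown.items():
        if mass in reference:
            total += credit if intensities_match(level, reference[mass]) else 0.5 * credit
    return total


# The worked example from the text: 5 peaks, 4 full matches, 1 half match.
unknown = {65: STRONG, 93: MEDIUM, 121: STRONG, 149: STRONG, 150: STRONG}
reference = {65: STRONG, 93: STRONG, 121: MEDIUM, 149: STRONG, 150: WEAK}
print(forward_credit(unknown, reference))
```

The masses in this example are invented; only the counts of matching and mismatching peaks follow the worked example, which is why the result reproduces the 90% rating.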
Medium and strong peaks in the unknown that are not in the reference spectrum are negative 1/3 credits. Weak peaks from the unknown, not in the reference, are given a zero credit (i.e. they are ignored).

The last of the ranking credits are based on reverse searching (comparing the reference to the unknown). These negative points are based on the number of peaks in the reference spectrum (here one credit is a percentage of the number of peaks in the reference spectrum). Unmatched strong and medium peaks in the reference are a -10% credit, while weak peaks not in the unknown spectrum deduct 5% of a credit.

Examples of daughter spectra grouping

An example of this grouping algorithm is shown for a group of 21 compounds, each containing a parent at m/z 149. In both Figures 5.6 and 5.7, the phthalate ion (C6H4(CO)2OH+) of di-ethyl-phthalate is matched against all the daughter spectra from these 21 compounds (which include other phthalates and a variety of other compounds). The grouping technique correctly clusters the spectra from the phthalates, and other samples are distributed lower on the scale (Figure 5.6). Figure 5.7 is a comparison of the match factors produced from the EI matching technique. As can be seen from these figures, the EI matching algorithm did not successfully group the phthalates together, while an algorithm designed for matching daughter spectra performed well.

Matching scan 6 (D149.0) from dataset R15444 (diethyl phthalate)

Group factor  Scan  Parent    Dataset
100             6   (D149.0)  R15444 (diethyl phthalate)
 97            10   (D149.0)  R27399 (dioctyl phthalate)
 87            10   (D149.1)  R20688 (dibutyl phthalate)
 66            13   (D120.8)  R27399 (dioctyl phthalate)
 64            10   (D149.0)  R22855 (dipentyl phthalate)
 63             7   (D121.0)  R8248 (p-t-butylbenzyl alcohol)
 58             4   (D149.0)  R8239 (2-t-butyl-6-methyl phenol)
 58             5   (D149.0)  R12776 (10-undecenoic acid, methyl ester)
 55             7   (D121.1)  R15444 (diethyl phthalate)
 54            13   (D121.0)  R20688 (dibutyl phthalate)
 53            13   (D120.6)  R22855 (dipentyl phthalate)
 50             2   (D164.0)  R8248 (t-butylbenzyl alcohol)
 49             5   (D177.1)  R15444 (diethyl phthalate)
 35             6   (D 93.0)  R8416 (1,3 benzenedicarboxylic acid)
 34             2   (D164.0)  R8239 (2-t-butyl-6-methyl phenol)

Figure 5.6. Example of the daughter spectrum matching algorithm

Matching scan 6 (D149.0) from dataset R15444 (diethyl phthalate)

 PT   PC  NC  NS  NR  IS  Scan  Parent    Dataset
100  100   4   0   0   0    6   (D149.0)  R15444 (diethyl phthalate)
 68   55   3   1   0  61    7   (D121.1)  R15444 (diethyl phthalate)
 67   51   2   2   0  25    3   (D149.0)  R8416 *1
 66   59   3   1   1  61   13   (D121.0)  R20688 (dibutyl phthalate)
 59   58   3   1   2  61   13   (D120.6)  R22855 (dipentyl phthalate)
 58   60   3   1   2  61   13   (D120.8)  R27399 (dioctyl phthalate)
 57   55   3   1   2  61    4   (D121.0)  R8416 *1
 56   61   3   1   3   8    7   (D149.0)  R13923 *2
 53   52   2   2   2  25    2   (D164.0)  R8239 *3
 50   55   4   0   1   0   10   (D149.1)  R20688 (dibutyl phthalate)
 49   40   2   2   1  70   10   (D121.0)  R13923 *2
 49   32   2   2   0  75    6   (D 93.0)  R8416 *1
 46   59   3   1   3   8    5   (D177.1)  R15444 (diethyl phthalate)
 45   33   2   2   1  75   16   (D 92.8)  R22855 (dipentyl phthalate)
 44   34   2   2   1  75    9   (D 93.2)  R15444 (diethyl phthalate)

[The IR column of the original figure is not legible and is omitted.]

*1  1,3 benzenedicarboxylic acid
*2  tetra ethyl silicic acid
*3  2-t-butyl-6-methyl phenol

PT  overall match factor
PC  pattern correspondence - intensity based fit
NC  number of peaks in common
NS  number of peaks in the sample (unknown) not matched to ref.
NR  number of peaks in the reference not in the unknown
IS  percent total ion current of unmatched sample (unknown)
IR  percent total ion current of unmatched reference

Figure 5.7. Example of the EI spectrum matching algorithm
Conclusions

These two techniques extract only a small fragment of the information present in MS/MS data. The neutral spectral studies have shown that the first-order losses from parent ions give direct, definitive information about the composition of the parent ion. The expert system presented in this chapter is a simple program that shows the promise of this technique.

The grouping method presented here demonstrates the differences between daughter/neutral spectra and EI spectra. In order to effectively group daughter or neutral spectra, we need to use algorithms with minimal intensity screening.

The technique of MS/MS is a powerful tool for structure determination. The tools presented in this thesis are an introduction to the kind of tools, rules and algorithms needed to elucidate the structure of unknowns. The combined research efforts of many members of the Enke group have led, and will continue to lead, toward the development and implementation of a variety of tools and techniques useful in determining the structure of unknown samples.

BIBLIOGRAPHY

1. Yost, R.A., Enke, C.G., J. Am. Chem. Soc., 100, 2274 (1978).
2. Yost, R.A., Enke, C.G., Anal. Chem., 51, 1251A (1979).
3. Yost, R.A., Ph.D. Dissertation, Michigan State University, E. Lansing, MI (1979).
4. McLafferty, F.W. in "Tandem Mass Spectrometry", F.W. McLafferty, Ed., John Wiley & Sons, New York, NY, 1983, Chapter 1.
5. Levsen, K., Beckey, H.D., Org. Mass Spectrom., 9, 570 (1974).
6. Bozorgzadeh, M.H., Morgan, R.P., Beynon, J.H., Analyst, 103, 613 (1978).
7. Yost, R.A., Enke, C.G., in "Tandem Mass Spectrometry", F.W. McLafferty, Ed., John Wiley & Sons, New York, NY, 1983, Chapter 8.
8. Yost, R.A., Fetterolf, D.D., Hass, J.H., Harvan, D.J., Weston, A.F., Skotnick, P.A., Simon, N.M., Anal. Chem., 56, 2223 (1984).
9. Maquestian, A., et al., in "Tandem Mass Spectrometry", F.W. McLafferty, Ed., John Wiley & Sons, New York, NY, 1983, Chapters 21-26.
10. Bauer, M.H., Masters Thesis, Michigan State University, E. Lansing, MI (1983).
11. Myerholtz, C.A., Ph.D. Dissertation, Michigan State University, E. Lansing, MI (1983).
12. Yost, R.A., Enke, C.G., Org. Mass Spectrom., 16, 171 (1981).
13. Yost, R.A., Enke, C.G., American Lab, June 1981.
14. Levsen, K. in "Tandem Mass Spectrometry", F.W. McLafferty, Ed., John Wiley & Sons, New York, NY, 1983, Chapter 3.
15. Giordani, A.B., Gregg, H.R., Hoffman, P.A., Cross, K.P., Beckner, C.F., Enke, C.G., presented at the 32nd Annual Conference on Mass Spectrometry and Allied Topics; San Antonio, TX, May 27-June 1, 1984.
16. Cross, K.P., Palmer, P.T., Giordani, A.B., Beckner, C.F., Hoffman, P.A., Gregg, H.R., Enke, C.G., ACS Symposium Series, in press.
17. Hoffman, P.A., Enke, C.G., presented at the 31st Annual Conference on Mass Spectrometry and Allied Topics; Boston, MA, May 8-13, 1983.
18. Hoffman, P.A., Ph.D. Dissertation, Michigan State University, E. Lansing, MI (in preparation).
19. Cross, K.P., Enke, C.G., presented at the 32nd Annual Conference on Mass Spectrometry and Allied Topics; San Antonio, TX, May 27-June 1, 1984.
20. Cross, K.P., Ph.D. Dissertation, Michigan State University, E. Lansing, MI (1985).
21. Lindsay, R.K., et al., "Applications of Artificial Intelligence to Organic Chemistry: The Dendral Project", McGraw Hill, New York, NY (1980).
22. Carhart, R.E., Smith, D.H., Gray, N.A.B., Nourse, J.G., Djerassi, C., J. Org. Chem., 46, 1708 (1981).
23. Carhart, R.E., Varkony, T.H., Smith, D.H., ACS Symposium Series 54, 126 (1977).
24. Crawford, R.W., Brand, H.R., Wong, C.W., Gregg, H.R., Hoffman, P.A., Enke, C.G., presented at the 30th Annual Conference on Mass Spectrometry and Allied Topics; Honolulu, HI, June 6-11, 1982.
25. Crawford, R.W., Brand, H.R., Wong, C.W., Gregg, H.R., Hoffman, P.A., Enke, C.G., Anal. Chem., 56, 1121 (1984).
26. Newcome, B.H., Ph.D. Dissertation, Michigan State University, E. Lansing, MI (1984).
27. Hoffman, P.A., Private Communication.
28. Newcome, B.H., Enke, C.G., Rev. Sci. Instr., 55, 2017 (1984).
29. Brodie, L., "Starting FORTH, An Introduction to the FORTH Language and Operating System for Beginners and Professionals", Prentice-Hall, Englewood Cliffs, NJ (1981).
30. Denton, M.B., Private Communication.
31. Tilden, S.B., Denton, M.B., Technical Report #18, for ONR contract #N00014-75-C-0513 (1979).
32. Carlson, E., Ph.D. Dissertation, Michigan State University, E. Lansing, MI (1978).
33. Gregg, H.R., Chai, J.W., Chakel, J.A., Hoffman, P.A., Latven, R.E., Matthews, R.S., Myerholtz, C.A., Newcome, B.H., Enke, C.G., presented at the 29th Annual Conference on Mass Spectrometry and Allied Topics; Minneapolis, MN, May 24-29, 1981.
34. Schubert, A.J., Ph.D. Dissertation, Michigan State University, E. Lansing, MI (in preparation).
35. Myerholtz, C.A., Newcome, B.H., Enke, C.G., presented at the 31st Annual Conference on Mass Spectrometry and Allied Topics; Boston, MA, May 8-13, 1983.
36. Kristo, M.J., Enke, C.G., presented at Rochester FORTH Conference, June 1985.
37. Kristo, M.J., Myerholtz, C.A., Schubert, A.J., Enke, C.G., presented at Rochester FORTH Conference, June 1985.
38. Myerholtz, C.A., Schubert, A.J., Kristo, M.J., Enke, C.G., accepted for publication in Instruments and Chemistry.
39. Myerholtz, C.A., Schubert, A.J., Kristo, M.J., Enke, C.G., accepted for publication in Instruments and Chemistry.
40. Grotch, S.L., Anal. Chem., 42, 1214 (1970).
41. Wangen, L.E., Woodward, W.S., Isenhour, T.L., Anal. Chem., 43, 1605 (1971).
42. Hertz, H.S., Hites, R.A., Biemann, K., Anal. Chem., 43, 681 (1971).
43. Naegli, P.R., Clerc, J.T., Anal. Chem., 46, 739A (1974).
44. Gronneberg, T.O., Gray, N.A.B., Eglinton, G., Anal. Chem., 47, 415 (1975).
45. Pesyna, G.M., Venkataraghavan, R., Dayringer, R.E., McLafferty, F.W., Anal. Chem., 48, 1362 (1976).
46. Blaisdel, B.E., Anal. Chem., 49, 180 (1977).
47. Damen, H., Henneberg, D., Wiemann, B., Anal. Chim. Acta, 103, 289 (1978).
48. Van Marlen, G., Van Den Hende, J.H., Anal. Chim. Acta, 112, 143 (1979).
49. Lebedev, H.S., Tormyshev, V.M., Derendyaev, B.G., Koptyug, V.A., Anal. Chim. Acta, 133, 517 (1981).
50. Jurs, P.C., Kowalski, B.R., Isenhour, T.L., Anal. Chem., 41, 21 (1969).
51. Jurs, P.C., Kowalski, B.R., Isenhour, T.L., Rielley, C.N., Anal. Chem., 41, 1949 (1969).
52. Bender, C.F., Shepherd, H.D., Kowalski, B.R., Anal. Chem., 45, 617 (1973).
53. Lam, T.F., Wilkins, C.L., Brunner, T.H., Saltberg, L.J., Kaberline, S.L., Anal. Chem., 48, 1768 (1976).
54. Hitter, C.L., Isenhour, T.L., Computers Chem., 1, 243 (1977).
55. Lowry, S.B., Isenhour, T.L., Justice, J.H., McLafferty, F.W., Dayringer, R.E., Venkataraghavan, R., Anal. Chem., 49, 1720 (1977).
56. Buchs, A., Duffield, A.M., Schroll, G., Djerassi, C., Delfino, A.B., Buchanan, B.G., Sutherland, C.L., Feigenbaum, E.A., Lederberg, J., J. Am. Chem. Soc., 92, 6831 (1970).
57. Delfino, A.B., Buchs, A.B., Helv. Chim. Acta, 55, 2017 (1972).
58. Buchanan, B.G., Smith, D.H., White, W.C., Gritter, R.J., Feigenbaum, E.A., Lederberg, J., Djerassi, C., J. Am. Chem. Soc., 98, 6168 (1976).
59. McLafferty, F.W., Anal. Chem., 49, 1441 (1977).
60. Cardenas, A.F., Data Base Management Systems, 2nd ed., Allyn and Bacon, Inc., Boston, MA (1985).
61. Gillenson, M.L., Database, Wiley-Interscience, New York, NY (1985).
62. Wiederhold, G., Database Design, McGraw-Hill, New York, NY (1977).
63. Borman, S.A., Anal. Chem., 57, 983A (1985).
64. Hampel, V.E., in Data for Science and Technology, Glaeser, P.S. (ed.), Pergamon Press, Oxford (1981).
65. Chen, C.C. and Hernon, P. (eds.), Numeric Databases, Ablex Publishing, Norwood, NJ (1984).
66. Glaeser, P.S. (ed.), Data for Science and Technology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984).
67. Glaeser, P.S. (ed.), Data for Science and Technology, North-Holland Publishing, Amsterdam, The Netherlands (1983).
68. Shoshani, A., Olken, F. and Wong, H., p. 349 in The Role of Data in Scientific Programs, Glaeser, P.S. (ed.)
    Amsterdam, The Netherlands (1985).
69. Sobel, Y., Dagane, I., Carabedian, M., and Dubois, J., p. 395 in The Role of Data in Scientific Programs, Glaeser, P.S. (ed.), Amsterdam, The Netherlands (1985).
70. Smith, F.J. and Hughes, J.G., p. 435 in The Role of Data in Scientific Programs, Glaeser, P.S. (ed.), Amsterdam, The Netherlands (1985).
71. Rumble, J.R. Jr. and Hampel, V.E. (eds.), Database Management in Science and Technology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984).
72. Rumble, J.R. Jr., in Database Management in Science and Technology, Rumble, J.R. Jr. and Hampel, V.E. (eds.), Elsevier Science Publishers, Amsterdam, The Netherlands (1984).
73. Kipiniak, W. and Finnerty, W., p. 17 in Computers in the Laboratory, Liscouski, J.G. (ed.), ACS Symp. Series 265 (1984).
74. Baumann, F., Lewis, R.A., and Brown, A.C. III, p. 23 in Computers in the Laboratory, Liscouski, J.G. (ed.), ACS Symp. Series 265 (1984).
75. RS-1, BBN Software Products.
76. Lotus 1-2-3, Lotus Development Corp.
77. Gregg, H.R., Enke, C.G., in preparation for submission to Computers and Chemistry.
78. VAX FORTRAN, Digital Equipment Corp.
79. Enke, C.G., Science, 215, 785 (1982).
80. Perone, S.P., ACS Symp. Series, 265, 99 (1984).
81. Atkinson, T.V., Gregg, H.R., "MULPLT: A Multiple Dataset, File Based Plotting Program", submitted to the DECUS software library, Marlboro, MA (1984).
82. Hayes-Roth, F., Waterman, D.A., Lenat, D.B., "Building Expert Systems", Addison-Wesley, Reading, MA (1983).
83. Hayes-Roth, F., Computer, 11, 263 (1984).
84. Barr, A., Feigenbaum, E.A., "Handbook of Artificial Intelligence", William Kaufman, Los Altos, CA, Vol. 1 (1981), Vol. 2 (1982).
85. Jurs, P.C., Computer Software Applications in Chemistry, Wiley-Interscience, New York, NY (1986).
86. Rames, L.S., et al., Anal. Chem., 58, 295R (1986).
87. Martinsen, D.P. and Song, B.H., Mass Spectrom. Rev., 4, 461 (1985).
88. Barr, A. and Feigenbaum, E.A., The Handbook of Artificial Intelligence, Vol.
    2, William Kaufmann, Los Altos, CA (1982).
89. Buchanan, B.G. and Feigenbaum, E.A., Artificial Intelligence, 11, 5 (1978).
90. Dessy, R., Anal. Chem., 56, 1200A (1984).
91. Dessy, R., Anal. Chem., 56, 1312A (1984).
92. Addis, T.H., Designing Knowledge-Based Systems, Kogan Page Ltd, London (1985).
93. Pierce, T.H. and Hohna, B.A. (eds.), Artificial Intelligence in Chemistry, ACS Symp. Series, 306 (1986).
94. McLafferty, F.W. and Stauffer, D.B., J. Chem. Inf. Comp. Sci., 25, 245 (1985).
95. Wong, G.M., Crawford, R.W., Kunz, J.C., Kehlar, T.P., IEEE Transactions on Nuclear Sciences, NS-31, 804 (1984).
96. Wong, G.M., Crawford, R.W., Lanning, S.M., Brand, H.R., presented at the 32nd Annual Conference on Mass Spectrometry and Allied Topics; San Antonio, TX, May 27-June 1, 1984.
97. Wong, G.M., Lanning, S.M., Crawford, R.W., Brand, H.R., presented at the 32nd Annual Conference on Mass Spectrometry and Allied Topics; San Antonio, TX, May 27-June 1, 1984.
98. Brand, H.R., Lanning, S.M., Wong, C.M., submitted to Int. Joint Conf. on Artificial Intelligence, August 18-24, 1985.
99. McLafferty, F.W., "Interpretation of Mass Spectra, 3rd Ed.", University Science Books, Mill Valley, CA (1980).

APPENDICES

Appendix 1

Dr. Memory, SLOPS and control software

This appendix contains brief descriptions of each of the commands and subroutines used in the TQMS control software. For brevity, the complete user's manuals for the three software packages (Dr. Memory, SLOPS, and the control software) are not reproduced. The documentation for each command or subroutine in each user's manual consisted of two or more pages. A sample of the format used is presented for BITCHK (the subroutine that queries the user, then sets or resets the specified bit), followed by only the description section of the documentation for the rest of the subroutines.

Appendix 1, Part 1: Bit check example
            Part 2: Dr. Memory
            Part 3: SLOPS
            Part 4: TQMS control software

Appendix 1, Part 1

BITCHK

Description   This routine, bit check, sends out a message, gets a
              response, and clears, sets or leaves alone the indicated
              bit in the indicated byte. A YES response sets the bit,
              NO clears it, and an empty response does not change it.

Version       hg 1.0

Required      B    bit position (mask)
              DE   pointer to the byte to change
              HL   pointer to query text

Returns       none

Modified      none

Pushes/Pops   0/0

Calls         PRINT
              TTYIN

Bugs          register A modified

BITCHK: CALL  PRINT       ; ask the question
        CALL  TTYIN       ; get the answer
        LDA   CHRBUF      ; get the first character
        CPI   'Y          ; yes
        JZ    BITCH2
        CPI   'y
        JZ    BITCH2
        CPI   'N          ; or no
        JZ    BITCH1
        CPI   'n
        RNZ               ; ignore ...
BITCH1: MOV   A,B         ; get and reverse the mask
        CMA
        MOV   B,A
        LDAX  D
        ANA   B           ; clear the bit
        STAX  D
        RET               ; and head home
BITCH2: LDAX  D
        ORA   B           ; set the given bit
        STAX  D
        RET               ; and head home

Appendix 1, Part 2

Dr. Memory summary

Subroutines

Globals     Definitions, memory locations, etc.
Lowcore     Restart and trap jumps
Interrupts  Interrupt handling routines
RST0        Cold start, sets defaults
RST1        A         \
RST2        BC         \  Diagnostics - prints
RST3        DE          \ the contents of the
RST4        HL          / indicated registers
RST5        PC         /
RST6        SP+flags  /
RST7        Software breakpoint
Kernal      Dispatcher for the monitor
Commands    Processes several commands
Go          Go and Procede commands
Quest       Prints a summary of the commands
Regstr      Open a register for modification
Talk        Connect terminal to PDP 11
Chrchk      Checks and converts ASCII to binary*
Close       Closes an open location*
Crlf        Outputs a carriage return/line feed combination
Downld      Downloads data from the PDP 11
Efclr       Clears an event flag
Getnum      Gets a number input
Gettt       Gets a character from the USARTs
Lights      Sends a predefined light pattern out
Lite        Sends a specified light pattern out
Modify      Modify a location in memory*
Nulljb      Boredom routine
Open        Opens a memory location*
Print       Prints an ASCII string
Putnum      Outputs a number
Puttt       Writes a character to a USART
Rhlr        Rotates HL right
Stkbit      Stacks 3/4 bits of HL on stack*

* These routines are useful only to Dr. Memory, and probably will not help the casual programmer.

GLOBALS     This is a list of mnemonics that assign values to ASCII characters, memory locations, etc.

LOWCORE     These are the hardware vector jumps. These routines just transfer control to the processing routines. The restart interrupts are transferred to RAM, where RST 7's await (unless user modified).

INTERRUPTS  These routines process interrupts from devices. The USART interrupt routines fetch and store one character. The break interrupt routine does a warm restart.

RST0        This cold start routine initializes several parameters needed for the monitor to operate. Breakpoint instructions are placed in RAM where unused interrupts transfer control, jump tables are set up, flags are set to default values, and a default user stack and a default program counter are set up.

RST1        Debugging aid: prints the contents of register A.

RST2        Debugging aid: prints the contents of register pair BC.

RST3        Debugging aid: prints the contents of register pair DE.

RST4        Debugging aid: prints the contents of register pair HL.

RST5        Debugging aid: prints the user's program counter.

RST6        Debugging aid: prints the value of the stack pointer and flags.

RST7        Warm start or breakpoint entry. This routine saves all of the user registers and status. Control is returned to either Dr. Memory or Slops, depending on which is in control.

KERNAL      This is the heart of Dr. Memory.
It accepts numbers (addresses and values) and commands and dispatches to the appropriate routine.

COMMANDS    These are routines that process Dr. Memory's commands. Those processed in this module are slash, backslash, carriage return, line feed, up arrow, Slops, Hex, and Octal.

GO          Two of Dr. Memory's commands live here: Go and Procede. Procede is useful: it restores all user status and registers, including the program counter. This gets us back to 'user mode'.

QUEST       This is the ? command of Dr. Memory. It prints a summary help message.

REGSTR      This is another of Dr. Memory's commands, the S. It is used to translate a register mnemonic to an address, and open one or two bytes for modification.

TALK        This is a routine that 'connects' the two USARTs together. Used to communicate with the PDP 11 through the micro. Note that there is no exit from this routine; a break is needed to exit. As such, it is both a command and a subroutine.

CHRCHK      Character check. Numbers (actually digits, in register A) are converted to binary (numbers either octal or hex) and the result is left in register A. Non-numbers cause the carry bit to be set.

CLOSE       Sets the open/closed flag to closed, and writes out a carriage return.

CRLF        This routine sends a carriage return/line feed combination to the terminal.

DOWNLD      This subroutine talks to the PDP 11, and expects a defined protocol for loading the micro's memory. Binary records are loaded directly into memory, while ASCII records are sent to the terminal.

EFCLR       This routine clears the event flag specified in register A.

GETNUM      This routine gets characters from the keyboard, echoes them, checks them, and if numeric, builds a number and loops for more. The number, if entered, is built in HL and the 'entered number' flag is set. This routine returns when a non-numeric character is entered.

GETTT       These are the get character routines. GETTT0 gets a character from USART 0, while GETTT1 gets a character from USART 1. Each routine waits until a character is entered.
The character is returned in register A.

LIGHTS      This routine displays a characteristic light pattern each time it is called. The pattern displayed is stored at location LITES.

LITE        This routine turns lights on/off as defined by a mask in register A. Each of the low 4 bits specifies a light as follows:
              0 ==> bottom PS light
              1 ==> middle PS light
              2 ==> top PS light
              3 ==> SOD light

MODIFY      This routine will modify a memory location if there is an open location and a number has been entered.

NULLJB      This routine waits for the occurrence of specified event flags. Register A contains a mask of the flags to wait for. While waiting, this routine plays with the lights.

OPEN        These routines set the open location flag and write the contents of the open byte(s) to the terminal. OPEN opens a one byte location, OPEN2 opens a two byte location.

PRINT       This subroutine prints a string of ASCII characters which is ended by a null (0). HL points to the start of the string.

PUTNUM      This routine writes out a number, one or two bytes long, octal or hex, to the terminal. HL contains the number to output, and B contains a flag byte: 1 for byte output, 0 for word output.

PUTTT       These routines output a byte to a USART. PUTTT0 sends a byte to USART 0, while PUTTT1 sends it to USART 1. The character to send is in register A. These routines wait until the USART is ready to send.

RHLR        This routine rotates register pair HL right one bit. The most significant bit is always set to 0, and the least significant bit is returned in the carry flag.

STKBIT      This routine takes the least significant 3 or 4 bits of HL, depending on the octal/hex flag, and pushes them on the stack under the return address.

Appendix 1, Part 3

SLOPS summary

Globals

Globals     Mnemonics, addresses, etc. for Dr. Memory
Sglobals    Globals for Slops
Init        Initialization routine
Kernal      Driver routine and dispatcher

Subroutines

Addr        Returns code address of library entry
Ascii       Converts binary to and from ASCII
Blank       Clears the screen
Brkdwn      Breaks input line into words
Check       Compares word with library entry
Cvtext      Converts a number to ASCII string
Cvtint      Converts ASCII string to binary number
Dcmp        Double compare
Delay       Software time delay
Link        Links to next library entry
Number      Get number from word stack
Search      Search library for a match
Ttyin       Gets a line of input
Word        Get word from word stack
(also see subroutines listed under Dr. Memory)

Arithmetic subroutines

Ddiv        Double divide
Div         Divide
Dmult       Double multiply
Dsub        Double subtract
Mult        Multiply

Commands

Convert     Converts a number to any base
Downld      Loads from the PDP 11 (see Dr. Memory)
Drmem       Puts Dr. Memory in control
Talk        Talk to the PDP 11 (see Dr. Memory)

GLOBALS     This is a list of mnemonics, address assignments, etc. for Dr. Memory. Needed for proper assembly.

SGLOBALS    Slops globals. Mnemonics for Slops use, definition of memory locations and subroutine addresses.

INIT        This routine initializes a few parameters, and sets Dr. Memory to look directly at Slops, not here.

KERNAL      This is the heart of Slops. It gets a command line, searches the library, and if the command is found, starts the named routine.

ADDR        This routine returns a pointer to the code of a library subroutine entry, given a pointer to the entry; i.e. the pointer is incremented past the flag byte, the name and the link field, if present.

ASCII       These are two routines, TOASCI and UNASCI. TOASCI adds either 48 or 55 to the number in register A to make it either numeric or alpha, as needed. UNASCI checks the character in register A to see if it is a legal digit, with the radix given in register C. If legal, the number is converted to binary. The carry flag is set otherwise.

BLANK       Blanks the screen and resets the cursor to the 'home' position.
BRKDWN      This utility routine breaks a line of input (assumed to be in the terminal buffer) down into 'words'. A word is defined as alphanumeric characters delimited by non-alpha characters. These non-alpha characters are also words, of length 1. Spaces are defined as non-alpha delimiters, but do not count as words. Text after a semicolon is treated as a comment, and is not broken down into words.

CHECK       Compares the current word (e.g. an entered 'command') with an entry in the library. The sign flag is set if the words don't match.

CVTEXT      Converts a number (DE) to an ASCII string (radix (C)). The resultant string is 'pushed' to an area pointed to by HL. The end of the string is flagged by a null (0). Note that the string is built backwards, so on entry HL must point to the TOP of a buffer. On return, carry set implies a bad radix was given.

CVTINT      This routine converts an ASCII string (radix (C)) to a number (DE). On return, carry set implies a conversion error (bad character in string).

DCMP        Double compare, patterned after the 8085 CMP instructions. Here, the flags (S, CY, Z) are set according to DE - HL.

DDIV        Double divide. The following takes place:
              BC = HLBC / DE, remainder in HL

DELAY       Software timing loop to generate approximate time delays. HL contains the count; each count is about 10 microseconds. Note that HL=1 is the shortest time, and HL=0 is the longest.

DIV         Divide routine. This acts as follows:
              DE = DE / A, remainder in A

DMULT       Double multiply. This routine acts as follows:
              DE = DE * A

DRMEM       This routine sets Dr. Memory as the kernel in charge, and returns to it.

DSUB        Double subtract. This routine acts as follows:
              HL = HL - DE
            This is the counterpart to the 8085 instruction DAD D.

LINK        This routine links from one library entry to the next; i.e. BC pointing to a library entry is redirected to point to the next entry. The sign flag is set if no more entries exist.

MULT        Multiplication.
This routine acts as follows:
              DE = D * E

NUMBER      This routine tries to find a number as the next element on the word stack. If found, conversion to binary is attempted. The conversion radix may be specified with the number or defaulted. Radix qualifiers follow the number and select octal, decimal, hexadecimal, or radix n (written :n, where n is any valid number). The carry flag tells of errors, which might include any of the following: no next word, next word not a number, or the radix qualifier (number after :) is not a number.

SEARCH      This routine searches through the library for a match with the current word. If a match is found, the entry is passed back along with its flag byte. The sign flag is used as an error indicator for no match.

TTYIN       Keyboard driver. Accepts a line of input, ended by a control character (ASCII value < 32), stores the line in the input buffer and echoes each character as typed. This routine also processes the following special characters:
              delete   deletes the last character
              escape   blanks the screen
              cntl/U   deletes the current line of input

WORD        Pops the pointer to and length of the next word off the word stack. If no more entries exist, the sign flag is set.

CONVERT     This command converts a number from one base to another. To invoke this routine, type:
              CONVERT number base
            where 'number' is the number to be converted to the base 'base'. For a description of valid numbers, see NUMBER.
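The digit and radix handling described for TOASCI, UNASCI, CVTEXT and CVTINT can be illustrated with a short modern sketch. The Python below is not the SLOPS source, which was 8085 assembly; the function names simply mirror the routine names. The upper-casing of letters, the 2 to 36 radix limit, and the use of None and ValueError in place of the carry flag are assumptions made for illustration only.

```python
def to_ascii(value):
    """TOASCI analogue: add 48 for a numeric digit, 55 for an alpha digit."""
    return chr(value + (48 if value < 10 else 55))


def un_ascii(char, radix):
    """UNASCI analogue: return the digit value, or None if the character
    is not a legal digit in this radix (None stands in for the carry flag)."""
    code = ord(char.upper())          # assumption: lower case is accepted
    if 48 <= code <= 57:              # '0'..'9'
        value = code - 48
    elif 65 <= code <= 90:            # 'A'..'Z'
        value = code - 55
    else:
        return None
    return value if value < radix else None


def cvtext(number, radix):
    """CVTEXT analogue: convert a non-negative number to a digit string."""
    if not 2 <= radix <= 36 or number < 0:   # assumed limits
        raise ValueError("bad radix or negative number")
    digits = []
    while True:
        number, remainder = divmod(number, radix)
        digits.append(to_ascii(remainder))   # least significant digit first
        if number == 0:
            break
    # The original filled its buffer from the top down; here we reverse.
    return "".join(reversed(digits))


def cvtint(text, radix):
    """CVTINT analogue: convert a digit string to a number."""
    number = 0
    for char in text:
        digit = un_ascii(char, radix)
        if digit is None:
            raise ValueError("bad character in string")
        number = number * radix + digit
    return number


if __name__ == "__main__":
    # Rough equivalent of typing: CONVERT 255 16
    print(cvtext(255, 16))
```

Because the digits come out least significant first, the sketch reverses them at the end; the described CVTEXT avoids that step by starting at the top of its buffer and working downward.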
Appendix 1, Part 4

Control system summary

Commands

INI         Initialize the system
GET         This routine gets a set of stored parameters
SAVE        This routine saves the current parameter table
PARAM       This is the general parameter setting routine
MANUAL      Routine for 'manual' control
FSCAN       Fast scan - see the spectra on the oscilloscope
SCAN        Scan, collect data and display it
WDATA       This routine writes the data from RAM to a disk file

Graphics subroutines

CHRLIB      Character library
CLEAR       This subroutine/command clears the graphics screen
DRAW        Etch-a-sketch command
GRFAXS      This is the axis drawing routine
GRFCHR      This subroutine draws a character on the screen
GRFLAB      This routine draws a string on the graphics screen
GRFVEC      Draws a vector on the graphics screen

Display update routines

UPDATE      This routine updates the status displays
UPSTAT      This routine updates the status display
UPSNUM      This routine updates numbers on the status display
UPINT       Format and display the intensity on status display
UPMASS      Format and display the mass on the status display
UPGRAF      This routine updates the graphic display
UPAXES      This routine draws the axes
DLAST       Draws the last collected intensity and graphic

Other subroutines

BITCHK      Sets/clears bits in a flag based on user response
FMASS       Formats the mass as follows: aa:xxx.x
FORMAT      Format a number for output
GETGRF      This routine returns a flag byte with graph info.
GETMAS      This routine gets a mass from the word stack
GETPM       Get X coordinate for current mass for display
GETPRM      Get a new set of parameters
INCMAS      Increment the quads to the next mass
INTENS      This routine collects the data
INT16       This routine returns the sum of 16 intensities
MSADC       Gets one ADC value
MSDATA      Store current intensity in memory
MSINIT      Initialize the control hardware
MSMASS      Send the quad controller a mass via DACs
PVRAM       Put text to the status display video RAM
SMASS       Set current mass
SMGL        Send message to user, get line of response
SRANGE      Set range of Keithley amplifier
STARTP      This is a file of the startup (default) parameters

Commands

INI         This is a routine to clean things (mainly the display) up. The data RAM is zeroed, graphics and status screens are cleared and updated, and the range and mass are reset.

GET         This routine gets a set of parameters from a stored bank. The number of the bank where the parameters are stored is requested, and the entire bank is transferred into the current parameter table.

SAVE        This routine saves the current parameter table in a bank of parameter tables, for future recall. Use is as follows:
              SAVE n
            where n is a number between 1 and 4 inclusive.

PARAM       This is the general parameter setting routine. It acts in a 'single character input' mode for the parameter to change, i.e. 'Q' not 'QUAD' for changing a quad's mass range. At this time, little checking is done of the parameters, and each parameter must be set individually, i.e. changing the mass range does not change the graph parameters. The parameters for either graph, the UPPER or the LOWER, must be changed individually, and the commands U and L identify which graph is being changed. The following commands are currently supported:
              Q n low-high   Change Quad n's mass scan
              T int range    Change the threshold
              R min-max      Change the min and max range
              A n            Change the # points to average
              G low,n        Change low mass on the graph
              F              Change flags, each queried
              M text         Puts message on status display
              H text         Puts a header on graph
              ?              Prints a summary of the options
              ^Z             to exit (control Z)

MANUAL      Routine for 'manual' control. Using the keypad of the terminal, the keys 1, 2 and 3 are for quad 1; 4, 5 and 6 are for quad 3; and keys 7, 8 and 9 are for the range. Keys 1, 4 and 7 decrease the current value by 1, keys 2, 5 and 8 set the current value into the parameter table, and keys 3, 6 and 9 increase the current value by 1.

FSCAN       This routine increments the mass, checks for a typed character, and if none was entered, loops.

SCAN        This routine scans the set mass ranges, collects data, updates the screens, records the data and loops until a character is typed.

WDATA       This routine writes the data from RAM to a disk file.

Graphics subroutines

CHRLIB      This is not a subroutine, but a library of vector moves that draws the ASCII character set on the MATROX graphics board. A list of pointers, in ASCII order, points to the entries for each character.

CLEAR       This subroutine/command clears the graphics screen. The actual clearing takes a maximum of 34 msec, but this routine does not wait.

DRAW        This command allows the user to draw, dot by dot, on the MATROX graphics board. The numeric keypad is used to determine the direction: 5 ==> don't move, 9 ==> northwest, 4 ==> east, etc. The 0 key tells the routine to clear dots in the drawn path; the period (.) tells the routine to draw points. To exit, type a slash (/).

GRFAXS      This is the axis drawing routine. It will draw an axis either vertical or horizontal. Large and small tick marks can also be drawn, and large tick marks can be labeled. This routine works from a table of data that defines the axis to be drawn. The table (an axis description block) is defined below.

GRFCHR      This subroutine draws a character on the graphics screen. The ASCII code for a character is passed in register A, the graphics board must be aware of the address to plot at (lower left corner of the character space), and this routine will draw the character from the library (CHRLIB).
GRFLAB      This routine draws a string of characters on the graphics board. The supported characters are all the printing ASCII characters. The string can be left, center or right justified. Registers D and E contain the coordinates of the lower left (center or right) for the string to be drawn. HL points to the string, terminated by a null.

GRFVEC      Draws a vector, of length (C), in the direction (B), starting at physical coordinates X and Y (D,E). Note that the direction mnemonics (north, south, east, west, ne, se, nw, sw) should be used.

Display update routines

UPDATE      This routine updates the status and graphic displays.

UPSTAT      This routine updates the status display. Individual parts of the display can be updated, or the entire screen can be blanked and redrawn.

UPSNUM      This routine updates the numbers on the status display for either the upper or the lower scan. Not updated are the descriptions, the current info, or the messages; just the numbers under the "UPPER" or "LOWER" headers.

UPINT       This routine scales the given intensity and range to the format xx.x -xx and sends it to the status display, at the location pointed to by HL.

UPMASS      This routine updates a mass range on the status display, in the following format: xxx-xxx

UPGRAF      This routine updates the graphic display. The entire graph may be updated, or only the current information can be plotted.

UPAXES      This routine draws the axes, taking into account single or double plot, linear or log scale, low mass for display, and # pts/AMU.

DLAST       This routine updates the last mass, and the current intensity. This is done to allow time for the quads to settle, ions to be selected, etc. before the next intensity is collected. While waiting, graphing the last displayed point seemed like the thing to do.

Other subroutines

BITCHK      This routine sends out a message, gets a response, and clears, sets or leaves alone the indicated bit in the indicated byte.
A YES response sets the bit, NO clears it, and an empty response does not change it.

FMASS       This routine formats the given mass as follows: aa:xxx.x, where aa is either RF or DC, and xxx.x is the mass.

FORMAT      This routine converts a number from registers DE into ASCII in a predefined format, into a buffer pointed to by HL. This routine converts numbers to decimal, adds the required number of trailing zeros, inserts a decimal point, and pads the front of the string with spaces. The user then calls PRINT to output the formatted string.

GETGRF      This routine returns a flag byte with graph information.

GETMAS      This routine attempts to get a valid mass from the word stack. If a number was entered, it is scaled for the DACs in use (MASS = MASS * 8.). If no number is given, the carry flag is set.

GETPM       This routine returns the X-coordinate on the graphics display corresponding to the current mass. Also returned are some of the graph parameters.

GETPRM      This routine changes the entire parameter table for Mass Spec control. A table, set up exactly like the parameter table, is pointed to by HL, and its contents are moved to the current or active parameter table.

INCMAS      This routine checks the parameter table, finds which quads want their mass incremented, and increments them. Checks are made to see if any of the masses increment past the end of their allowed scan, and if so, the masses are reset.

INTENS      This routine collects the data. A small sample of intensities (16) is collected, and this set is used to determine if autoranging is required. If so, the Keithley is ranged, and time is allowed for it to settle. This is repeated until the signal is in range, or no more auto-ranging is possible. The signal is then sampled and averaged the required number of times.

INT16       This routine returns the sum of 16 intensities. It is intended to give an idea of the current ion intensity.

MSADC       This is the lowest level data acquisition routine.
It requests and receives a datum from the analog to digital converter.

MSDATA      This routine stores the current intensity in the data RAM, at a location corresponding to the DAC value (mass).

MSINIT      This routine sets up the parallel ports used for the DACs and ADC. It also gets the default parameters, sets masses to reasonable values, and cleans up the displays.

MSMASS      This is the lowest level routine to send a mass to the DACs controlling the quad's mass.

PVRAM       This routine prints the text pointed to by BC to the VRAM pointed to by HL, offset by DE. The display is written into during vertical or horizontal flyback, minimizing flicker.

SMASS       This routine gets the "current masses" from the parameter table, and sends them to the DACs.

SMGL        This routine sends a message (pointed to by HL), and gets the response from the user. The response is broken down into words, and an attempt is made to get a number.

SRANGE      This routine sends a new range to the Keithley, and delays a bit to allow the amplifier to settle and become reasonably stable.

STARTP      This is a file of the startup (default) parameters. On startup, this file is copied into the parameter table.

Appendix 2

Multi-dimensional data base subroutines

This appendix contains descriptions of the subroutines used to interface with the multi-dimensional data base format.

      SUBROUTINE MSINIT
C
C     This subroutine (along with the PARAMETER statements in
C     MDDB.CMN) sets up the characteristics of this system. All
C     possible datasets are initialized, system-wide globals are
C     set up, and the dictionary file is opened.
C
C     calls   CONCAT  (STRING)
C             SCOPY   (STRING)
C             DSINIT  (MDDB)
C             EXIT    (RSX)
C
C     h gregg   jul-82

      SUBROUTINE MSOPEN(IFILE,QPRMPT,QTYPE,LOON1,LOON2,LOON3)
      BYTE QPRMPT(1),QTYPE(1)
      INTEGER*2 IFILE,LOON1,LOON2,LOON3
C
C     This subroutine opens (closes) the files associated with a
C     dataset. New or Old files may be opened, and filenames can
C     be prompted or defaulted.
C
C     where   IFILE   is the number of the dataset to open;
C                     negative means close that dataset.
C             QPRMPT  prompt for asking for a dataset name.
C                     If LENGTH(QPRMPT) = 0, use QDSFIL as the
C                     dataset name (in common).
C             QTYPE   is either 'NEW', 'OLD' or 'READONLY'
C             LOONn   are the logical units to use for the dataset
C
C     returns (if 'OLD', see GETPRM, GETCOM)
C
C     calls   CONCAT  (STRING)
C             LENGTH  (STRING)
C             GETCOM  (MDDB)
C             GETPRM  (MDDB)
C
C     h gregg   jul-82

Subroutines to write into a dataset

      SUBROUTINE PUTPRM(IFILE)
      INTEGER*2 IFILE
C
C     This subroutine writes parameters to the header file,
C     with the current time and date.
C
C     where   IFILE   is the dataset number
C
C     uses    NUMSTC  the number of static variables
C             ISTATC  code for the static parameters
C             RSTATC  value of corresponding parameter
C             NUMVAR  the number of variable parameters
C             IVAR    code for the variable parameters
C
C     calls   DATE    (RSX)
C             TIME    (RSX)
C
C     h gregg   jul-82

      SUBROUTINE PUTCOM(IFILE)
      INTEGER*2 IFILE
C
C     This subroutine writes a comment (title, words of wisdom,
C     etc.) to the header file, including the current date and
C     time, if needed.
C
C     where   IFILE   is the dataset number
C
C     uses    QDATE   is the date to write out.
C                     If LENGTH(QDATE) = 0, get current date.
C             QTIME   is the time to write out.
C                     If LENGTH(QTIME) = 0, get current time.
C             QCOMNT  an array holding the comment.
C                     The comment is assumed to end with a null
C                     and be 80 bytes or less long.
C
C     calls   DATE    (RSX)
C             TIME    (RSX)
C             LENGTH  (STRING)
C
C     h gregg   jul-82

      SUBROUTINE PUTDAT(IFILE,X,Y)
      INTEGER*2 IFILE
      REAL*4 X(1),Y(1)
C
C     This subroutine puts a scan of data into the pointer and
C     data files. All pointers are kept internally, and some
C     values are calculated.
C
C     where   IFILE   is the number of the dataset to use
C             X,Y     are the X,Y data pairs
C
C     uses    NUMDAT  number of data points
C             IVARD   code for X of X,Y pair
C             RTIME   time the scan was taken
C             RVAR    values of the variables
C             IVARF   codes for fast variables
C             RVARF*  values of fast variables, from RVAR.
        RSUMY*   sum of Y values, done here
        ISCAN*   + 1 is where this scan will go

        * ==> calculated or kept internally, do not change!

calls   PTDATM (MDDB)
        PUTPTR (MDDB)

h gregg jul-82


Subroutines to read from a dataset

      SUBROUTINE GETCOM(IFILE)
      INTEGER*2 IFILE

This subroutine gets the next comment in the header file, along with its time and date of entry.

where   IFILE    is the dataset number

returns QDATE    the date of the comment
        QTIME    the time of the comment
        QCOMNT   the comment (or title) itself

calls   none

h gregg jul-82


      SUBROUTINE GETDAT(IFILE,JSCAN,X,Y)
      INTEGER*2 IFILE,JSCAN
      REAL*4 X(1),Y(1)

This subroutine gets all information associated with the current scan. All error information is from the subroutines.

where   IFILE    is the dataset number
        JSCAN    is the scan number to retrieve
        X,Y      are arrays for the X,Y data pairs

returns (see GETPTR, GETVAR, GETXY)

calls   GETPTR (MDDB)
        GETVAR (MDDB)
        GETXY  (MDDB)

h gregg jul-82


      SUBROUTINE GETPTR(IFILE,JSCAN)
      INTEGER*2 IFILE,JSCAN

This subroutine gets JSCAN's pointer variables out of the pointer file.

where   IFILE    is the dataset number
        JSCAN    is the number of the pointer to get

returns IPTDAT   pointer into the data file
        NUMDAT   number of data points
        RTIME    time this scan was taken
        RSUMY    sum of the Y values
        IVARD    code for the dependent variable
        IVARF    codes for the fast variables
        RVARF    values of the fast variables

calls   none

h gregg jul-82


      SUBROUTINE GETVAR(IFILE)
      INTEGER*2 IFILE

This subroutine gets the variable parameters for the current scan (i.e. GETPTR must have been called).

where   IFILE    is the dataset number

returns RVAR     variable values for the current scan

calls   none

h gregg jul-82


      SUBROUTINE GETXY(IFILE,X,Y)
      INTEGER*2 IFILE
      REAL*4 X(1),Y(1)

This subroutine gets the X,Y pairs for the current scan (i.e. GETPTR and GETVAR must have been called).
where   IFILE    is the dataset number
        X,Y      are arrays for the X,Y pairs

calls   none

h gregg jul-82


      SUBROUTINE GETEND(IFILE)
      INTEGER*2 IFILE

This subroutine gets quickly to the end of the dataset, and sets up the commons for the next write.

where   IFILE    is the dataset number

returns all variables necessary for the next write

calls   GETPRM (MDDB)
        GETVAR (MDDB)

h gregg may-85


User interface subroutines

      SUBROUTINE EXEDIT(INSTR,TRACE,IXY)
      INTEGER*2 INSTR,IXY(6)
      REAL*4 TRACE(4,1)

This subroutine edits the 'trace' matrix used by the extract subroutine. This trace matrix is the mask for the extraction program, and this subroutine allows manipulation of the input parameters (1&2).

where   INSTR    is a return variable for the main program:
                 < 0 ==> 3D plot: -INSTR is the parameter
                         for the 3rd dimension
                 = 0 ==> exit
                 = 1 ==> start an extraction
                 = 2 ==> file the data
                 = 3 ==> call MULPLT
                 = 4 ==> extract and file the data
                 = 5 ==> file and plot the data
                 = 6 ==> extract, file and plot the data
        TRACE    is the matrix of min/max values
                 (1,n)&(2,n) are the min/max limits allowed
                             (edited here)
                 (3,n)&(4,n) are the min/max found in the
                             data base (from extract)
        IXY(1)   is the code for the independent variable
        IXY(2)   is the code for the dependent variable
        IXY(3)   is the number of pairs extracted
        IXY(4-6) aren't used here

The numbers on the screen have the following format (see NUMDIS):
        integer      if it fits
        real         if it fits
        exponential  .xx-ee, .xxx-e, .xx+ee or .xxx+e
        if negative, one decimal place is lost

The screen format is as follows:
        lines 1&2 are for titles
        lines 3-22 are for the numbers:
             3   variable 1  . . . .  variable 2
             4   variable 3  . . . .  variable 4
             etc
            22   variable 39 . . . .
variable 40
        line 23 is for input
        line 24 is for descriptive information

Each variable location consists of a 40 character area as follows:
        bytes 3-12 are a description of the variable
        bytes 13,20,27,34 are spaces
        bytes 14-19,21-26,28-33,35-40 are the numeric fields 1-4

calls   GETPOS (MDDB)
        VTxxxx (string)
        EDISPL (MDDB)
        NUMDIS (MDDB)
        INPUT  (MDDB)

h gregg 7/84,8/84


      SUBROUTINE INPUT(ICMD,VALUE)
      INTEGER*2 ICMD
      REAL*4 VALUE

This subroutine works for EXEDIT, getting the user input, correcting mistakes and translating the input into an index for the main program.

calls   VTxxxx (string)

returns:
        ICMD   VALUE   what the user typed
          1     --     Extract
          2     --     File
          3     --     Plot
          4     --     EF
          5     --     FP
          6     --     Go
          7     --     Quit, ^Z
          8     --     Z
          9     --     Help, (PF2)
         10     --
         13     --     <left-arrow key>
         14     --     (carriage-return)
         15    value   value
         16     --     X
         17     --     Y

h gregg may-85


Main extraction subroutine

      SUBROUTINE XTRACT(IFILE,NPLANE,IXY,TRACE,ARRAY,MAXARY)
      INTEGER*2 IFILE,NPLANE,IXY(6,1),MAXARY
      REAL*4 TRACE(1),ARRAY(1)

This subroutine extracts data from the multi-dimensional data base. Several 'planes' of data may be extracted at once, depending on the size of the data ARRAY. As many planes of data as can fit into the ARRAY will be extracted.

The extraction parameters are passed in two arrays, IXY and TRACE. IXY contains the codes for the X and Y values to be retrieved, while TRACE contains the constraints on the constants (min and max values forming a range). On return, TRACE will also contain the range of values selected for every variable; this helps determine whether the extraction parameters were specified in enough detail.
where   IFILE    is the dataset number
        NPLANE   is the number of X,Y planes to extract
        IXY      are codes and pointers for each plane
                 (1,m) pointer to code for X
                 (2,m) pointer to code for Y
                 (3,m) number of X,Y pairs extracted
                 (4,m) pointer to first X value in ARRAY
                 (5,m) pointer to first Y value in ARRAY
                 (6,m) max number of X,Y pairs, no shuffle
                 Note that an additional plane is used internally
                 and space for it must be allocated by the caller.
        TRACE    extraction parameters
                 [equivalent to TRACE(4,NUMVAR(IFILE)+4,NPLANE)]
                 (1,n,m) minimum allowed value
                 (2,n,m) maximum allowed value
                 (3,n,m) minimum found value
                 (4,n,m) maximum found value
                 (i,1,m) scan number (ISCAN)
                 (i,2,m) time scan was taken (RTIME)
                 (i,3,m) sum of Y values (RSUMY)
                 (i,4,m) the Y values
                 (i,k,m) the variables [IVAR(ifile,j)]
                         where k=j+4
                         m is the extraction plane
        ARRAY    is the data array
        MAXARY   is the maximum size of ARRAY

calls   ASHUFL (MDDB)
        SHUFFL (MDDB)
        GETPTR (MDDB)
        GETVAR (MDDB)
        GETXY  (MDDB)
        STORXY (MDDB)
        TRACER (MDDB)
        IT     (MDDB)
        ITRACE (MDDB)

h gregg aug-82


Common area - variable definitions

C
C     MDDB.CMN - common areas for the MDDB data base routines
C
      PARAMETER MXFIL=1        ! maximum number of open datasets
      PARAMETER MXSTC=40       ! maximum number of static params
      PARAMETER MXVAR=40       ! maximum number of variable params

Variables common to all datasets:

for the dictionary:
      BYTE      QSDICT (20)    ! short dictionary description
      BYTE      QLDICT (58)    ! long dictionary description
      INTEGER*2 LUNDIC         ! logical unit for the dictionary
      INTEGER*2 IPTDIC         ! dictionary pointer: no/val/point

misc. variables:
      BYTE      QSYSTM (20)    ! device and UIC for common files
      BYTE      QDATE  (10)    ! the date \ either current
      BYTE      QTIME  ( 8)    ! the time / or from the dataset
      BYTE      QCOMNT (82)    ! comment lines from the header
      INTEGER*2 LOONTI         ! the logical unit for TI: (errors)
      INTEGER*2 IERR           ! error return flag
      INTEGER*2 MAXFIL         ! == MXFIL
      INTEGER*2 MAXSTC         ! == MXSTC
      INTEGER*2 MAXVAR         ! == MXVAR

variables for each data set:
      BYTE      QDSFIL (   30,MXFIL) !
dataset file name
      INTEGER*2 LUNHDR (      MXFIL) ! LUN for the header file
      INTEGER*2 LUNPTR (      MXFIL) ! LUN for the pointer file
      INTEGER*2 LUNDAT (      MXFIL) ! LUN for the data file
      INTEGER*2 ISCAN  (      MXFIL) ! current scan number
      INTEGER*2 IPTDAT (      MXFIL) ! pointer to current scan
      INTEGER*2 JPTDAT (      MXFIL) ! pointer to memory record
      INTEGER*2 IDAT   (      MXFIL) ! pointer into RDAT
      REAL*4    RDAT   (   16,MXFIL) ! one data record buffer
      REAL*4    RTIME  (      MXFIL) ! time this scan was taken
      INTEGER*2 NUMSTC (      MXFIL) ! number of static params
      INTEGER*2 ISTATC (MXSTC,MXFIL) ! code for static params
      REAL*4    RSTATC (MXSTC,MXFIL) ! value for static param
      INTEGER*2 NUMVAR (      MXFIL) ! number of var. params
      INTEGER*2 IVAR   (MXVAR,MXFIL) ! code for variable params
      REAL*4    RVAR   (MXVAR,MXFIL) ! value for variable param
      INTEGER*2 IVARF  (    3,MXFIL) ! code for the fast vars
      REAL*4    RVARF  (    3,MXFIL) ! values of fast vars
      INTEGER*2 MAXDAT (      MXFIL) ! max number of data pairs
      INTEGER*2 NUMDAT (      MXFIL) ! number pairs this scan
      INTEGER*2 IVARD  (      MXFIL) ! code for dependent var
      INTEGER*2 IPTX   (      MXFIL) ! pointer for X, this scan
      INTEGER*2 IPTY   (      MXFIL) ! pointer for Y, this scan
      REAL*4    RSUMY  (      MXFIL) ! sum of Y, this scan

      COMMON /MSFILE/
        QSDICT, QLDICT, QSYSTM, QDATE,  QTIME,  QCOMNT,
        LUNDIC, IPTDIC, MAXFIL, MAXSTC, MAXVAR, IERR,   LOONTI,
        QDSFIL, LUNHDR, LUNPTR, LUNDAT, ISCAN,  IPTDAT, JPTDAT,
        NUMSTC, ISTATC, NUMVAR, IVAR,   IPTX,   IPTY,   IDAT,
        NUMDAT, IVARD,  MAXDAT,
        RSTATC, RVAR,   RTIME,  RSUMY,  RVARF


Appendix 3

EXTRACT User's Guide

Hugh Gregg
June 24, 1985

EXTRACT is a program for interrogating a multi-dimensional dataset and extracting any two dimensional plane of data from it. The resulting data can then be formatted and presented to MULPLT for immediate graphics presentation, or the user may choose different extraction parameters and extract another plane of data.
EXTRACT, the program

The program EXTRACT was created to extract any two dimensional plane of data from a multi-dimensional dataset. This is accomplished by specifying a set of extraction limits, and letting the program search through the dataset, extracting all the data that match the specified constraints. The actual limits of the extraction are then presented to the user. If these limits are acceptable, the user may plot the extracted data (using MULPLT).

Multi-Dimensional Dataset

Datasets created by multi-dimensional instruments, such as Triple Quadrupole Mass Spectrometers (TQMS), may contain more than two dimensions of data (i.e. more than mass vs. intensity pairs). A TQMS has many instrumental parameters, each of which, when varied, creates an orthogonal dimension of data. Examples of these dimensions include quad 1 mass, quad 3 mass, the axial energy, collision pressure, and ion intensity. Varying more than two variables results in a dataset with multiple dimensions of data. A multi-dimensional data base system (MDDB) was created to handle these data.

An MDDB dataset is a set of three files (*.HDR, *.PTR, *.DAT) that contain all the instrumental parameters as well as the data. The data are stored in this dataset in a series of "SCANS". Each scan is the result of one instrumental operation (e.g. an intensity vs. quad 1 mass scan, or an intensity vs. axial energy scan at specific parent/daughter masses). The values of all variables at the time each scan was recorded are stored with the scan. In this way, each scan is a two dimensional slice into the multi-dimensional dataset with all other variables held constant.

How extraction works

EXTRACT extracts a plane of data based on the constraints specified by the user. The user is presented with a list of all the variables in the dataset and must choose what data are to be presented. The abscissa and ordinate (X and Y) of the resulting plane are selected.
If the user wishes to limit the extraction process, the limits of the variables must be set. When EXTRACT is told to do its stuff, it extracts all data that satisfy the constraints. When this is accomplished, the user is presented with the actual limits found during the extraction. If these limits are acceptable, the data may be formatted for MULPLT and displayed. However, if the actual limits of the extracted data indicate a variance larger than acceptable, the limits of the offending variable may be narrowed and the extraction tried again.

Using EXTRACT

To use EXTRACT, you must first convince RSX to let you use it. This is done by typing "EXTRACT" or "EXTRACT datasetname". If you don't specify the dataset name on the command line, EXTRACT asks you for the name. If the dataset exists, EXTRACT displays all the variables and waits for the user to type something. One section of the screen is highlighted; this is the "active" area. The user may move to a new active area by using the cursor control (arrow) keys on the terminal. To change a specific value, move the active area to the position of the value to change, then just type in the new value (followed by a carriage return). Commands can be entered at any time, and take immediate action. For more information about the commands and how to enter values, see the sections on COMMANDS and VALUES.

Screen Display Format

The screen format of EXTRACT displays all the user variables stored in the dataset. Each variable has four values associated with it: the minimum and maximum extraction limits (set by the user) and the minimum and maximum limits found by EXTRACT in the dataset. Each value is displayed in only six character positions and may appear truncated. If the value conveniently fits within the six character window, it is displayed in full. A problem arises when displaying large or small numbers (e.g. 2.3456E7 counts). These numbers are displayed in exponent format with an implicit "times 10 raised to" (e.g. .234+8).
In the worst case, only one digit of the number plus the exponent is visible (e.g. -.3-16).

The bottom line of the screen displays an enhanced version of the active area. A number truncated in the main portion of the screen is displayed in full along with the full name of the variable. This status line is also used to display definitions of certain values. For example, a value of "2" for the variable "CAD gas type" produces the description "CAD gas type: Nitrogen".

Values

Values may be entered in a variety of ways, the simplest of which is to type in the new value. While this method will always work, there are a few short cuts available. When the active area is at either the minimum limit or the maximum limit, the following may be used:

Minimum limits

(CR)      A single carriage return, when the minimum limit is the active area, resets the minimum and maximum limits to "no limit". This allows EXTRACT to use any values for this variable.

N         Entering a number N results in that value being inserted into both the minimum and the maximum value slots. Note that the input of exponents must be done in computer notation (e.g. 1.23E7), and that a carriage return must be typed after the number.

N1,N2     This format enters the number N1 into the minimum value slot, and the number N2 into the maximum value slot.

          This format enters the number N into the minimum value slot, and the number N+i into the maximum value slot. This format allows easy entry of a range of limits; you specify the minimum value and the range or increment.

          This format enters the number N into the minimum value slot, and the number N+1.0 into the maximum value slot. This format is identical to the above format, but defaults to a range of 1.0 (useful for mass selection).

          This format enters the number N into the minimum value slot, and sets the maximum value to infinity (1.0E36).

Maximum limits

(CR)      A carriage return at the maximum limit sets the maximum equal to the minimum limit.
Entering a number N results in that value being inserted into the maximum value slot.

Command Summary

This is a list of the available commands and a brief description of their use. All commands may be abbreviated to their first letter, and are explained in greater detail below.

Single keystroke commands:

arrow keys     move the active area
(PF1)          get the "Command?" prompt
(PF3)          get the "Value:" prompt
(CR)           at first limit: sets "no limits"
               at second limit: sets second value to first
^Z             Exits EXTRACT
anything else  starts either a command or value

Other Commands (in response to the "Command?" prompt):

X        set the abscissa
Y        set the ordinate
Z        three-dimensional extraction and file
Extract  Extract a plane of data
File     File the extracted data
Plot     call MULPLT
EF       Extract and File a plane
FP       File and Plot extracted data
Go       Extract, File and Plot extracted data
Save     Save the current extract limits
Read     Read in saved extract limits
Help     Display the command summary
Quit     Exits EXEDIT

Description of Commands

X or Y   This command sets the active area to be either the abscissa or ordinate (X or Y) for the resulting extract plane. A highlighted letter (X or Y) is placed in front of the variable name to indicate the current X and Y variables.

Z        This command executes a three-dimensional extract. Position the active area on the variable that will be the third axis (note that this variable's limits must be set). Enter the Z command, and you will be asked for the incremental value for the Z variable. This procedure does a series of two dimensional extracts, starting with Z equal to the minimum limit, with subsequent extractions incrementing the Z value until the maximum limit is reached. Data from each two dimensional extract are filed. An auxiliary program (PLOT3D) is used to read this file and produce MULPLT files for pseudo three-dimensional plots.

Extract  This command extracts a plane of data from the dataset and displays the resulting extraction limits.
The data extracted are not automatically filed or plotted.

File     This command files previously extracted data. The filename is derived from the dataset name (*.BIN), and each File command in one extraction run uses the same file (the MULPLT tag for each File command starts at "AA" and is incremented for subsequent File commands: "AB", "AC", etc). A MULPLT command file is also created (*.PDL). Before the first File command is executed, the user is asked whether the plot will be a point plot, line plot, bar graph or spectra plot. Point plots and bar graphs are just that; the data for line plots will be sorted; spectra plots are normalized to the base peak, and the maximum intensity is put into a MULPLT special features file (*.SPF).

Plot     This command goes directly to MULPLT, passing it the name of the command file generated by the File command. The user is left in MULPLT, ready for a DO command, or whatever. To exit MULPLT, type Halt or ^Z, and you will be returned to EXTRACT.

EF       Extract and File command, for those times when you know the extraction will work, but you don't want the data plotted yet.

FP       File and Plot command. Used after an Extract command, when you wish to plot the data.

Go       Extract, File and Plot extracted data. This command "Goes for it", and does it all.

Save     This command saves the current extraction limits in a file (the default file name is derived from the dataset name, *.EXT). This is a normal file that may be edited, displayed, etc. For information on the file format, see the Read command below.

Read     This command reads in a file of saved extraction limits (the default file name is derived from the dataset name, *.EXT). The file contains lines with the following form: "variable = N1 to N2" for each of the variables with limits set. The variable name must exactly match the short dictionary definition, and the two keywords "=" and " to " are not optional.

Help     This command produces a short list of available commands.

Quit     This command exits EXTRACT.
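
The line format used by the Save and Read commands ("variable = N1 to N2") is simple enough to illustrate with a short sketch. The following Python fragment is not part of EXTRACT (which is written in FORTRAN); it only shows one way such a file could be parsed. The function names and the example variable names are hypothetical, and matching of the name against the short dictionary definition is left out.

```python
def parse_limit_line(line):
    """Parse one saved-limit line of the form 'variable = N1 to N2'.

    Returns (variable_name, minimum, maximum). Both keywords, "=" and
    " to ", are required, as stated for the Read command.
    """
    name, equals, rest = line.partition("=")
    if not equals or not name.strip():
        raise ValueError("missing '=' keyword or variable name: %r" % line)
    low_text, to, high_text = rest.partition(" to ")
    if not to:
        raise ValueError("missing ' to ' keyword: %r" % line)
    # float() accepts computer notation such as 1.0E36
    return name.strip(), float(low_text), float(high_text)


def read_limits(lines):
    """Collect the limits from an iterable of lines, skipping blank lines."""
    limits = {}
    for line in lines:
        if line.strip():
            name, low, high = parse_limit_line(line)
            limits[name] = (low, high)
    return limits
```

For example, read_limits(["quad 1 mass = 91 to 92"]) would yield a single entry mapping "quad 1 mass" to the pair (91.0, 92.0).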
The mechanics of EXTRACT

Internally, EXTRACT consists of three major phases: extraction parameter editing (EXEDIT), extraction (XTRACT) and formatting for plotting (MSPLOT). Both the editing and plotting phases are described in the COMMANDS section.

The extraction routine itself basically operates as two units. The first unit processes the simplest of the extractions, i.e. when the extraction plane is a scan. In this case, the requested plane of information is already stored in the format that the user wants, and all that is required is to retrieve it.

The retrieval of a plane of data that wasn't scanned is more complicated. A large loop is initiated where each cycle of the loop looks at one scan in the dataset. The limits specified by the user are used to try to reject a scan from further consideration. If the scan cannot be rejected, the X and Y variables that the user specified are studied. If there are multiple X values, all X values are retrieved along with their corresponding Y values and are stored in the extraction plane. If there are multiple Y values and only a single X, the Y values are averaged and the resulting X,Y pair is placed in the extract plane. Finally, if no X or Y values match their extraction limits, no data from this scan are saved. Once this scan is finished, EXTRACT loops back and gets the next scan until the entire dataset has been examined.

The dimensions of the internal arrays in EXTRACT are not fixed as in conventional programs. This allows quite a bit of flexibility in the size of the scans and planes extracted. Space for the data in each scan (from the dataset) and the extract plane (under construction) is dynamically allocated from a pool of 16000 bytes (room for 2000 X,Y pairs). Therefore, as the extract plane is built (more data pairs added to it), the room available to hold the next scan diminishes. In practice, this space is more than enough.
If we assume a fixed length scan of 1000 X,Y pairs, we still have room for an extraction plane of 1000 X,Y pairs.

GLOSSARY of terms used

Active area   describes the area currently highlighted on the screen.

Extract       the process of searching the dataset for a specific two dimensional plane of data.

Plane         a two dimensional grouping of data, usually extracted from a multi-dimensional dataset.

Scan          one instrument scan; i.e. one grouping of data consisting of X,Y data pairs and the corresponding values for the variables.

Value slot    one of the four minimum or maximum values for a variable. A value slot may become the active area.

(CR)          the carriage return key.

(PF1)         the PF1 key on VT100's, the "gold" key. Used to start a command.

(PF2)         the PF2 key on VT100's, next to the "gold" key. Used to obtain Help.

(PF3)         the PF3 key on VT100's. Used to start a value.

^Z            control Z (hold the control key down and press the Z key). Used to exit EXTRACT.
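
The scan-by-scan loop described under "The mechanics of EXTRACT" can be sketched in a few lines. The Python fragment below is only an illustration under simplifying assumptions, not the XTRACT routine itself: each scan is modelled as a dictionary (a hypothetical layout) holding the variables held constant, the name of the scanned variable, and the X,Y pairs; the Y of the requested plane is always taken to be the intensity; and all names are invented for the example.

```python
NO_LIMIT = (float("-inf"), float("inf"))


def in_range(value, limits, name):
    """True if value lies within the user's limits for the named variable."""
    low, high = limits.get(name, NO_LIMIT)
    return low <= value <= high


def extract_plane(scans, x_name, limits):
    """Build a list of (X, Y) pairs for the plane 'intensity vs. x_name'."""
    plane = []
    for scan in scans:
        # Try to reject the scan using the variables that were held constant.
        if not all(in_range(v, limits, n) for n, v in scan["vars"].items()):
            continue
        # Keep only the points whose scanned value and intensity are in range.
        kept = [(x, y) for x, y in zip(scan["x"], scan["y"])
                if in_range(x, limits, scan["scanned"])
                and in_range(y, limits, "intensity")]
        if not kept:
            continue  # nothing in this scan matches the limits
        if scan["scanned"] == x_name:
            # Multiple X values: store every X with its corresponding Y.
            plane.extend(kept)
        elif x_name in scan["vars"]:
            # A single X for this scan: average the Y values into one pair.
            mean_y = sum(y for _, y in kept) / len(kept)
            plane.append((scan["vars"][x_name], mean_y))
    return plane


# Two daughter scans of quad 3 taken at different (hypothetical) axial energies.
scans = [
    {"vars": {"quad1": 91.0, "energy": 10.0}, "scanned": "quad3",
     "x": [77.0, 78.0, 79.0], "y": [10.0, 20.0, 99.0]},
    {"vars": {"quad1": 91.0, "energy": 20.0}, "scanned": "quad3",
     "x": [77.0, 78.0], "y": [30.0, 50.0]},
]
```

With these scans, requesting intensity vs. energy with quad3 limited to 77 to 78 produces one averaged point per scan, while requesting intensity vs. quad3 with energy limited to 10 to 10 simply returns the first scan.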