Michigan State
University

 

 

 

 

 

 

 

 

This is to certify that the

dissertation entitled

GENERATION OF SUBSTRUCTURE IDENTIFICATION RULES
AND MOLECULAR STRUCTURES FROM MS/MS SPECTRA

presented by

Kevin Joseph Hart

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Chemistry


Major professor

 

 

 

Date

MSU is an Affirmative Action/Equal Opportunity Institution

 


GENERATION OF SUBSTRUCTURE IDENTIFICATION
RULES AND MOLECULAR STRUCTURES FROM
MS/MS SPECTRA

By

Kevin Joseph Hart

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Chemistry
1989


ABSTRACT

GENERATION OF SUBSTRUCTURE IDENTIFICATION RULES
AND MOLECULAR STRUCTURES FROM MS/MS SPECTRA

By
Kevin Joseph Hart

An Automated Chemical Structure Elucidation System (ACES) is
being developed to provide a method for obtaining molecular
structures from mass spectrometry / mass spectrometry (MS/MS)
spectra. The MAPS (Method for Analyzing Patterns in Spectra)
program generates rules to identify substructures in unknown
compounds. This program requires a reference database of MS/MS
spectra and a listing of the substructure content of each of the
reference compounds to provide these rules. Two complete
reference databases (using single and multiple collision conditions)
were compiled, each consisting of 6,106 MS/MS spectra for 105
compounds, many of which are regulated drugs. A new version of
the MAPS program has been developed to generate “feature-
combination” rules. Feature-combination rules possess greater
reliability and recall than those generated using previous versions of
MAPS. An Automated Structure Library Search (ASLS) function
was implemented to provide the substructure listing used by the
MAPS software. A commercial generator program was modified to
automatically provide candidate structures for an unknown based on
a list of candidate molecular formula(e) and substructures identified
as present in the unknown. Both the ASLS and Automated Structure
Generator (ASG) programs were modified versions of programs
originally developed at Stanford University (STRCHK and GENOA).
The rules obtained from the MAPS software were evaluated by 1)
the recall and reliability of the rules with respect to the reference
database, 2) a comparison of the content of several rules with known
fragmentation patterns and 3) application of the rules to compounds
not in the reference database (test compounds). Among the
substructures for which MAPS rules were generated are
phenothiazine, barbiturate, t-butyl, phenol and amphetamine. Many
of these rules were found to be reliable when applied to the test
compounds. In some cases, the substructures for rules which
produced false positives were cross-correlated with another
substructure. Other false positives could be classified as “near
misses” and were often due to inappropriate substructure definitions.
It was concluded that while the current reference database was large
enough to prove the viability of this method, it was too small to
produce sufficient rules for a generalized structure elucidation
system.

Copyright by
Kevin Joseph Hart
1989

 

“It’s a terrible thing to lose one’s mind,
or never to have had one.”
Newsweek

Dan Quayle, Vice President of the United States,
speaking to the NAACP.

“Stress can be avoided...If your car isn’t working
and that’s causing you stress, buy a new one.”
State News

This guy obviously never had to make car payments.

“There is yet time enough for you
to take a different path.”

A fortune in a Chinese fortune cookie acquired by
the author on the eve of his thesis defense.

“Toaster Ovens! Did you hear me?
PROGRAMMABLE TOASTER OVENS!”
Bloom County

A threat issued to a “Banana JR.” personal computer by
its manufacturer after it spit out a user’s “cheap software”.

ACKNOWLEDGEMENTS

I would like to thank my research advisor Chris Enke for providing
his support, intellectual and moral, over the last several years. I
decided to come to Michigan State University because of Chris’s
interesting and active research group. It has proven to be one of the
best decisions I have ever made. I have grown a great deal
(admittedly not without pain) in the environment that Chris has
striven to build here at State. I would also like to thank the
members of my committee, Dr. Charles Sweeley, Dr. William Reusch,
Dr. Thomas Atkinson and Dr. Victoria McGuffin, for their assistance.

I would like to thank George Yefchak, Eric Erickson, Jon Wahl, Mark
Cole and the other members of the Enke group for their friendship
and assistance. I also want to thank the “old farts”: Mark Bauer,
Hugh Gregg, Bruce Newcome, Kathy Fix and Norm Penix. I would
never be able to play cards as well as I do without their expert
training. Fortunately I learned a great deal of science from them as
well.

Pete Palmer and Adrian Wade deserve special mention. The ACES
system could not have been realized without their contributions. I
am grateful to them both for lending their expertise and their
friendship in this endeavor. A special thank you goes to Drake
Diedrich and Chris Weaver, two undergraduate programmers, for
their programming efforts. The development of the ACES system is
truly a joint effort of some very talented people.

I want to acknowledge the huge amount of love and support I
received from my family, especially my mother, Joan. She has
always been there when I needed her. In addition, I want to
acknowledge my good friend Chris Evans for providing a shoulder to
lean on during the bad times and for all of the entertaining
experiences we shared in graduate school. I also want to
acknowledge the love and support of my very special friend, Bobette
Nourse. You are, and always will be, my friend.

I would be remiss if I did not extend my regards to the MSU
Department of Public Safety for towing my car while I was
proctoring an exam for 300 freshman chemistry students. Their
dedication to the educational effort here at MSU is truly astounding.
The secretary in the Dean’s office also deserves special note. She
once responded, “Does he really need it now?” and “Can he wait until
next pay period?” (read that, “next month”) in response to an inquiry
about why my paycheck was three days late. Why do Fellowships
always get screwed up?

Finally, I would like to acknowledge the National Institutes of
Health for funding much of this work. Finnigan MAT has also been
generous in their support of our efforts. Thanks are due to Molecular
Design, LTD., especially Dennis Smith, for licensing the source code to
the GENOA and STRCHK programs.

 

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1 “Automated Structure Elucidation using MS/MS Spectra”
    Introduction
    Historical Background
    Triple Quadrupole Mass Spectrometry
    Structure Elucidation of Pharmaceuticals Using MS/MS Data
    Automated Structure Elucidation for MS and Other Methods
    MS/MS Spectral Matching and Reference Databases
    Pattern Recognition and Other Methods
    The Automated Chemical Structure Elucidation System
    System Components
    Multiple Rulebases
    Potential for Recursive Operation
    Conclusions
    References

Chapter 2 “Automated MS/MS Spectral Data Acquisition for MAPS Reference Databases”
    Introduction
    Key Instrumental Parameters
    Automated Acquisition of MS/MS Spectra
    Data Transfer Software and Computer Facilities
    Standard Compounds Selected for the Reference Database
    Purity of the Reference Database Standards
    Irregularities in the Reference Database
    References

Chapter 3 “Generation of MAPS Substructure Identification Rules”
    Introduction
    MAPS Software Development
    The MAPS Software - Version II
    Initial Work on Feature-Combination Rules
    Feature-Combination Rules Obtained Using MAPS Version II
    The MAPS Software - Version III
    The GENT Program
    The MAPS (v.III) Program
    The RULE Program
    References

Chapter 4 “Automated Substructure Search and Structure Generation”
    Introduction
    Computerized Representation of Chemical Structure
    The Structure Generation Program, GENOA
    The Structure Checking Program, STRCHK
    Use of Substructure Search in Rule Generation
    Automatic Structure Generation
    Potential for the Reduction of the Number of Candidate Structures through Ancillary Experiments
    References

Chapter 5 “Evaluation of MAPS Feature-Combination Rules”
    Introduction
    Reliability and Recall for the MAPS Rulebase
    Analysis of Several MAPS Rules
    PHENOTHIAZINE
    BARBITURATE
    PHENOL and T-BUTYL
    Alternate MAPS Rules from Multiple Collision MS/MS Data
    Comparative Recall for MAPS Rulebases Generated from Data Acquired at Different Collision Gas Pressures
    Effect of Collision Gas Pressure on Rule Content
    Conclusions
    References

Chapter 6 “Evaluation of MAPS Rules by Application to Test Compounds”
    Introduction
    Evaluation of Individual Rules using Test Compounds
    PHENOTHIAZINE
    BARBITURATE
    PHENOL and T-BUTYL
    AMPHETAMINE
    Evaluation of Rulebases Generated using Global Parameters
    Recommendations for Future Development of MAPS
    Conclusions

LIST OF TABLES

1.1 Comparison of the number of potential spectral features in the MS and MS/MS data spaces for selected masses

1.2 List of key parameters affecting relative daughter ion intensity in a triple quadrupole mass spectrometer (adapted from reference 47)

2.1 Standard instrumental conditions for creation of MAPS reference databases using a Finnigan TSQ-70 TQMS

2.2 List of compound names and CAS numbers for each of the reference database compounds

2.3 List of nominal masses, molecular formulae and number of daughter spectra obtained for each reference database compound

3.1 Summary of the inputs to the MAPS (v.II) software

3.2 Procedure for generating MAPS (v.II) rules

3.3 Reliability and recall estimates obtained for the MAPS (v.II) PHENOTHIAZINE substructure identification rule at several different match factors

4.1 Connectivity table for chloral adapted from reference 17

4.2 DENDRAL connectivity table for hydroquinone adapted from reference 15

4.3 Atom connectivity matrix for hydroquinone adapted from reference 15

4.4 GENOA connectivity table obtained for the barbiturate substructure

4.5 Summary of major GENOA commands

4.6 List of substructure constraints (ordered by reference number) and the number of cases generated by the structure generator. The number of atoms and a descriptive name for each substructure reference is provided in parentheses.

4.7 List of substructure constraints (ordered by number of atoms) and the number of cases generated by the structure generator. The number of atoms and a descriptive name for each substructure reference is provided in parentheses.

5.1 Cross-correlation values calculated for each of the spectral features contained in the MAPS rule for the PHENOTHIAZINE substructure (shown in Figure 3.22) with respect to the “SS110” substructure

6.1 Compound name, molecular weight, molecular formula, and CAS number for each of the test compounds used to evaluate the MAPS rules

6.2 List of MAPS rules and the results obtained when the rules were applied to the test compounds (X - correct identification, F - incorrect identification).

6.3 Reliability, number of predictions and number of rules obtained for three MAPS rulebases

6.4 Rule reliabilities, number of predictions and number of rules for several MAPS rulebases obtained using the indicated parameters

 

LIST OF FIGURES

1.1 Block diagram of the major components of a triple quadrupole mass spectrometer

1.2 MS/MS map for isopropanol. Adapted from Yost and Enke, American Laboratory, (June, 1981).

1.3 Primary mass spectra of (a) pentobarbital and (b) amobarbital

1.4 Daughter spectra of the molecular ions of (a) pentobarbital and (b) amobarbital

1.5 Daughter spectra of the 156+ fragment ions of (a) pentobarbital and (b) amobarbital

1.6 Daughter spectra of the 141+ fragment ions of (a) pentobarbital and (b) amobarbital

1.7 Schematic of the Automated Chemical Structure Elucidation System (ACES)

1.8 The initial MAPS (v.II) rule obtained for the PHENOTHIAZINE substructure

2.1 Daughter spectra of the molecular ion of promazine with a collision energy of (a) 5 eV, (b) 15 eV and (c) 30 eV

2.2 Daughter spectra of the molecular ion of promazine with a collision energy of (a) 45 eV, (b) 60 eV and (c) 75 eV

2.3 Relative abundance of several fragment ions of meperidine plotted as a function of collision energy

2.4 Daughter spectra of the molecular ion of promazine with a collision gas (argon) pressure of (a) 0.4 mtorr, (b) 1.2 mtorr and (c) 1.8 mtorr (CE=30 eV)

2.5 Daughter spectra of the m/z 199 fragment ion of promazine with a collision gas (argon) pressure of (a) 0.4 mtorr, (b) 1.2 mtorr and (c) 1.8 mtorr.

2.6 Relative abundance of the m/z 56 daughter ion of meperidine plotted as a function of collision energy and a collision gas pressure of 0.4 mtorr, 1.2 mtorr and 1.8 mtorr.

2.7 Daughter spectra of the m/z 141 fragment ion of a variety of barbiturates (Elab=30 eV, p=0.4 mtorr). See Table 2.2 for compound names corresponding to the reference names provided in this figure.

2.8 Daughter spectra of the m/z 91 fragment ion of a variety of phenols (Elab=30 eV, p=0.4 mtorr). See Table 2.2 for compound names corresponding to the reference names provided in this figure.

2.9 Plots of collision gas pressure measured by (a) manifold ion gauge and (b) Q2 convectron gauge versus peak area of the m/z 69 fragment ion of perfluoro-tert-butylamine (PFTBA)

2.10 Primary mass spectra of PFTBA using (a) Q1 and (b) Q3 as the scanning quadrupole

2.11 The ICL procedure used to repetitively acquire 100 primary mass spectra for characterization of probe temperature programs

2.12 Reconstructed ion chromatograms of (a) the m/z 58 fragment ion, (b) m/z 284 molecular ion and (c) total ion current for promazine

2.13 The ICL procedure used to acquire the primary mass spectra found in the reference database

2.14 The ICL procedure used to acquire the daughter spectra found in the reference database

2.15 Schematic of the computing facilities available for running the ACES programs

2.16 Equations used to calculate the uniqueness and correlation values for the spectral features found in the reference databases

2.17 Substructure “SS143” representing the base substructure found in the structure of barbiturates

2.18 A partial primary mass spectrum (55-75 amu) of hydroxyamphetamine showing a doubly charged ion at m/z 58.5 and 67.5

2.19 Daughter spectra of the (a) m/z 58.5 and (b) m/z 67.5 fragment ions of hydroxyamphetamine (Elab = 2 eV, p=1.8 mtorr)

2.20 Daughter spectrum of the m/z 67.5 fragment ion of hydroxyamphetamine with Elab = 30 eV

2.21 Daughter spectra of the (a) the doubly charged molecular ion and (b) the m/z 108.5 fragment ion of morphine (Elab = 2 eV, p=1.8 mtorr)

2.22 Daughter spectra of the (a) m/z 98.5 and (b) m/z 99.5 fragment ions of oxymorphone (Elab = 2 eV, p=1.8 mtorr)

2.23 Daughter spectra of the (a) m/z 112.5 and (b) m/z 113.5 fragment ions of oxymorphone (Elab = 2 eV, p=1.8 mtorr)

2.24 Daughter spectrum of the doubly charged molecular ion of oxymorphone (Elab = 2 eV, p=1.8 mtorr)

3.1 An excerpt from the “substructure-buckets”, SS-BUCKETS

3.2 An excerpt from the “feature-buckets”, FEATURE-BUCKETS, showing different features with the same nominal mass

3.3 Equations used to calculate the uniqueness and correlation of spectral features in MAPS for use by the U and C filters

3.4 The initial MAPS (v.II) rule obtained for the PHENOTHIAZINE substructure

3.5 Equations used to calculate the rule reliability and rule recall estimates in MAPS

3.6 The MAPS (v.I) feature-combination rule obtained for the BROMO substructure. Adapted from reference 11.

3.7 Plot of the number of initial spectral features for the BARBITURATE substructure for use in generating feature-combination rules versus minimum correlation for several minimum uniqueness values (Ui)

3.8 MAPS (v.II) feature-combination rule for the PHENOTHIAZINE substructure with a recall and reliability estimate of 100%

3.9 The GENT output file format

3.10 An excerpt from the GENT output file produced using 40 reference database compounds and two substructure definitions

3.11 An illustration of feature-combination bit string generation with backchecking

3.12 Sample output from the MAPS (v.III) program with user input highlighted in bold faced type

3.13 The format of the MAPS feature-combination rule save files

3.14 Sample output from the RULE program with user input highlighted in bold faced type

4.1 Structure corresponding to the WLN notation given in text. Adapted from reference 17.

4.2 A sample GENOA session

4.3 Substructure drawings for the SS145 and SS72 substructures

4.4 A demonstration of the SURVEY function of STRCHK

4.5 GENOA session illustrating the creation of a substructure definition

4.6 Format and an excerpt of the “SUBSTRLIS” file

4.7 Schematic of automated structure generation in ACES

4.8 Format and example of the “.MFG” results file

4.9 Format and example of the “.MPS” results file

4.10 Flowchart of the automatic structure generator

4.11 Sample output from the automated structure generator (ASG) program

4.12 Schematic showing the program components and interactions required to obtain ancillary experiment recommendations for the intelligent controller (IC)

4.13 Format and an example of the “.GEN” file

4.14 Sample output from the survey function using 54 candidate structures generated from the molecular formula shown

4.15 Flowchart showing the basic algorithm for obtaining ancillary experiment recommendations

5.1 Histogram of the substructure library showing the number of unique occurrences of each substructure in the reference database compounds

5.2 Histogram showing the number of initial features obtained for each substructure in the substructure library using the indicated minimum uniqueness and correlation values

5.3 Histogram showing the overall recall achieved for each substructure in the substructure library using the indicated Ui/Ci/Cc program parameters

5.4 The initial MAPS (v.II) rule obtained for the PHENOTHIAZINE substructure

5.5 Structures for several fragment ions of phenothiazine compounds. Adapted from references 4 and 6.

5.6 Comparison of documented fragmentation pathways for phenothiazine derivatives with the features contained in the MAPS PHENOTHIAZINE rule

5.7 MAPS (v.II) feature-combination rule for the PHENOTHIAZINE substructure with a recall and reliability estimate of 100%

5.8 The MAPS feature-combination rule for the PHENOTHIAZINE substructure (SS132) with the mass filter enabled and the indicated program parameters

5.9 MAPS (v.II) exclusion rule for the PHENOTHIAZINE substructure (SS132) generated using the indicated program parameters

5.10 MAPS (v.III) feature-combination rule for the PHENOTHIAZINE substructure (SS132) generated using a high initial uniqueness and low initial correlation value

5.11 Structure summary for several of the phenothiazine derivatives in the reference database

5.12 Substructure drawings for the BARBITURATE substructure and two specific derivatives

5.13 MAPS (v.III) feature-combination rule for the BARBITURATE substructure (SS141) generated using the indicated program parameters

5.14 Fragmentation pathways for the a) “SS145” and b) “SS147” substructures. Adapted from references

5.15 MAPS (v.III) feature-combination rule for the “SS145” substructure generated using the indicated program parameters

5.16 MAPS (v.III) feature-combination rule for the “SS147” substructure generated using the indicated program parameters

5.17 MAPS (v.III) feature-combination rule for the T-BUTYL substructure (SS21) generated using the indicated program parameters

5.18 MAPS (v.II) exclusion rule for the T-BUTYL substructure (SS21) generated using the indicated program parameters

5.19 MAPS (v.III) feature-combination rule for the PHENOL substructure generated using the indicated program parameters

5.20 Histogram showing the recall of the feature-combination rules obtained using Ui=30%, Ci=30% and Cc=30%

5.21 Daughter m/z values observed in a MAPS rule from a m/z 198 parent ion using single and multiple collision conditions

5.22 Additional structures for fragment ions observed from phenothiazine derivatives

5.23 MAPS (v.III) feature-combination rule for the PHENOTHIAZINE substructure generated using the indicated program parameters and the multiple collision data

6.1 Structure for each test compound listed in Table 6.1

6.2 Substructure definition for the “SS118” substructure (a) with benzylic hydrogens (initial definition) and (b) without benzylic hydrogens (new definition)

CHAPTER 1

Automated Structure Elucidation
Using MS/MS Spectra

Introduction

Structure elucidation of organic compounds is often essential to
resolving a variety of chemical problems in academic, industrial and
governmental settings. The mass spectrometry / mass spectrometry
technique (MS/MS) has played an important role in fundamental
studies of ion structure, reaction mechanisms and thermochemistry,
and has been used to study chemical problems involving the
environment, natural products, industrial products, foods, forensic
science, petroleum products, bioorganic compounds and
pharmaceuticals [1,2]. The MS/MS technique was initially popular
for its potential in analyzing mixtures but has increasingly been
applied to structure elucidation [3]. Interest in automating the
structure elucidation process has grown as the popularity of this
technique has increased. This interest parallels that in other areas of
analytical chemistry, where new developments in hyphenated techniques
[4] have led to the ever-increasing ability of new instrumentation to
generate large quantities of multidimensional data [5].

Despite the increasing involvement of computers in MS/MS
instrumentation, automation of the interpretation of MS/MS spectral
data continues to be one of the foremost challenges to workers in this
field. A similar demand occurred with the introduction of GC/MS
instruments, which are also capable of producing huge quantities of
data. Advances in the collection of multi-order mass spectra have
drastically increased the potential size of the MS^n data space from
which spectral feature / molecular structure correlations can be
derived. These advances include the ability to collect a complete
MS/MS map in a few seconds [6] and the ability to collect multi-order
mass spectra (MS^n, where n > 1) on instruments such as Fourier
transform mass spectrometers (FT-MS), which are capable of five
consecutive stages of MS (n=5) [7], and ion trap mass spectrometers
(ITMS), which are capable of MS^12 [8]. The
combination of GC with MS/MS has become more powerful with the
increase in MS/MS data acquisition speed [6]. This combination is a
compromise between the length of time that a chromatographic peak
is available for analysis as it emerges from the column and the
number of MS/MS spectra that can be acquired during this time.
New developments in MS/MS instrumentation based on tandem
time-of-flight mass analysis are especially interesting since the most
ambitious of these promise to deliver full MS/MS fragmentation
maps on the capillary GC time scale [9]. Thus, higher resolution
chromatography (i.e. capillary GC vs. packed column GC) can be used
to separate complex mixtures without the loss of the structural
information contained in the MS/MS spectra. If such a system is
realized, a data system that fully exploits the information contained
in the MS/MS data space will be required to speed the analysis of
mixtures.

The full MS/MS data space (the MS/MS map) consists of a
primary mass spectrum and daughter spectra for each of the m/z
values in the primary mass spectrum. The information contained in
all of the major MS/MS scans on instruments such as the triple
quadrupole mass spectrometer (TQMS) [10] resides in this data space.
The problem facing chemists is to extract the structurally relevant
information from this data space and use it to deduce the molecular
structure of unknowns. This problem can be quite challenging since
the MS/MS data space increases much more rapidly with mass than
the MS data space. A comparison of the number of potential features
in the MS and MS/MS data spaces is provided in Table 1.1. The number of
potential features in the MS/MS data space is given by the following
equation: $0.5\,(n^{2} + 5n)$, where n is the nominal mass of the largest
ion produced by a molecule (i.e. the molecular mass, plus isotope
contributions, if any).

Mass Range (amu)      Number of Features
                      MS        MS/MS
60                    60        1,950
500                   500       126,250
4000                  4000      8,010,000

Table 1.1: Comparison of the number of potential spectral features
in the MS and MS/MS data spaces for selected masses.

For example, the molecular ion of isopropanol
has a nominal mass of 60 amu and the number of potential features in the MS/MS data
space is 1,950 (60 primary m/z, 60 daughter m/z, 60 neutral loss
masses and 1770 specific parent-daughter combinations) versus 60
in the MS data space. The m/z limit for some of the early TQMS
instruments was 500 (potentially 126,250 features in the MS/MS
data space) but new instruments are capable of 4000 (potentially
8,010,000 features in the MS/MS data space)! While not all of the
data channels in the MS and MS/MS data spaces are utilized for a
given compound, it is significant that the amount of structural data
derived from MS/MS experiments greatly exceeds that for MS
experiments. It should be noted that this information can be
multiplied when the number of possible instrumental conditions (i.e.
ionization method, collision energy, collision gas pressure, etc.) is
considered.
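
The feature counts quoted above can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the parent-daughter term n(n - 1)/2 is an assumption inferred from the 1770 combinations quoted for n = 60, since the text gives only the closed form.

```python
def msms_feature_count(n: int) -> int:
    """Count potential MS/MS features for a largest ion of nominal mass n."""
    primary = n          # primary-scan m/z values
    daughter = n         # daughter m/z values
    neutral_loss = n     # neutral loss masses
    # Assumption: one combination per unordered pair of distinct nominal
    # masses, which reproduces the 1770 quoted in the text for n = 60.
    parent_daughter = n * (n - 1) // 2
    return primary + daughter + neutral_loss + parent_daughter


def closed_form(n: int) -> int:
    """The closed form quoted in the text, 0.5 * (n^2 + 5n)."""
    # n * (n + 5) is always even, so integer division is exact.
    return (n * n + 5 * n) // 2


for mass in (60, 500, 4000):
    # Columns: mass range, MS features, MS/MS features.
    print(mass, mass, msms_feature_count(mass))
```

The itemized count and the closed form agree for every n, because 3n + n(n - 1)/2 = 0.5 (n^2 + 5n).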

A further complication in analyzing MS/MS data is the variety
of instruments capable of producing MS/MS data with varying
degrees of spectral resolution (i.e. TQMS: unit resolution [10], FTMS:
high resolution [11], sector instruments: less than unit resolution
[12]), different energetics in the process used to obtain daughter ions
(i.e. sectors: keV range [13], TQMS: 5-150 eV [14]), and assorted
ionization techniques to provide the parent ions (i.e. 70 eV electron
impact (EI), chemical ionization (CI) with a wide selection of reagent
gases, fast atom bombardment and other ionization techniques [15]).
While much has been done to provide sophisticated data acquisition
systems [16-18], no data interpretation software has yet been
developed that fully exploits the information obtainable from MS/MS
instruments such as the TQMS. Any data system capable of
interpreting MS/MS spectra must take into account the size of the
data space and the versatility of the MS/MS technique to be of
general use to the scientific community.

The following section provides some historical background on
the TQMS instrument, the value of MS/MS data in the analysis of
drug compounds, methods used in automating the analysis of
conventional mass spectra and previous attempts at automated
interpretation of MS/MS data. The final section of this chapter
provides an overview of the Automated Chemical Structure
Elucidation System (ACES). This system provides software tools for
the automated interpretation of MS/MS data from unknown
compounds and generation of candidate structures for the unknown.
HISTORICAL BACKGROUND
Triple Quadrupole Mass Spectrometry

The major components of a TQMS instrument are an inlet
system, an ion source, two quadrupole mass filters, one quadrupole
collision chamber, a variety of lenses for focusing ions and a detector.
The orientation of these components is shown in Figure 1.1. The
purpose of the inlet system is to introduce the sample in the gas
phase into the ion source. Diverse inlet systems are available for
TQMS instruments including a liquid inlet for volatile liquids, a
heated direct insertion probe for solid samples and a gas
chromatograph transfer inlet for GC/MS or GC/MS/MS. Once the
sample is in the ion source, a number of ionization methods exist to
produce ions from the molecular sample. These methods include 70
eV electron impact (EI), chemical ionization (CI) and fast atom
bombardment (FAB) [15].

After the sample is ionized, there are a number of experiments,
or scan modes, that can be used for structure elucidation of the
sample. The most common scan mode is the daughter scan. This
scan is performed by selecting a specific m/z ratio with Q1, passing the
ions with that m/z ratio into Q2, collisionally dissociating the ions
into fragments and acquiring the mass spectrum of the fragment ions
produced using Q3. The inlet system and ionization method used to
create the database for this work were the direct insertion probe and
EI ionization. The database consists of primary and daughter spectra
for a number of compounds. The creation of this database is
discussed in detail in Chapter 2.

Figure 1.1: Block diagram of the major components of a triple
quadrupole mass spectrometer.

The MS/MS map obtained for isopropanol on a TQMS
instrument is shown in Figure 1.2. The m/z ratios found along the
front edge of the figure comprise the primary mass spectrum. The
m/z ratios along the diagonal edge beginning on the right and moving
up and to the left constitute the daughter spectrum for the specified
parent ion. The parent scan (the spectrum of parent ions yielding a
specified daughter) is found along a diagonal edge beginning on the
left and moving up and to the right. Beynon and coworkers
demonstrated in 1978 that the advantages of the MS/MS technique
for structure elucidation include: i) daughter spectra give a clear
indication of the parent ion structure, ii) the molecular formula can
be deduced without high resolution mass spectrometry, iii) the
presence of particular substructures can be established with
certainty, and iv) library searches of spectra related to substructures
are possible [19]. The determination of the molecular structure of
pharmaceuticals using MS/MS spectra is discussed in the next section
to illustrate the value of MS/MS spectra for this purpose.

Figure 1.2: MS/MS map for isopropanol. Adapted from Yost and Enke,
American Laboratory, (June, 1981).
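
The way the individual scans are read off an MS/MS map can be sketched in Python. The dictionary layout, the function names and the toy intensities are assumptions made for illustration only; they are not measured isopropanol data and do not describe the storage format used in this work.

```python
from typing import Dict, Tuple

# Hypothetical layout: parent m/z -> {daughter m/z: relative intensity}.
MsmsMap = Dict[int, Dict[int, float]]

# Toy values for illustration only (not measured data).
TOY_MAP: MsmsMap = {
    60: {45: 100.0, 43: 12.0},
    59: {41: 30.0, 31: 8.0},
    45: {43: 50.0, 27: 15.0},
}


def daughter_scan(msms_map: MsmsMap, parent: int) -> Dict[int, float]:
    """All daughter ions produced by one selected parent ion."""
    return dict(msms_map.get(parent, {}))


def parent_scan(msms_map: MsmsMap, daughter: int) -> Dict[int, float]:
    """All parent ions that yield one selected daughter ion."""
    return {p: ions[daughter] for p, ions in msms_map.items() if daughter in ions}


def neutral_loss_scan(msms_map: MsmsMap, loss: int) -> Dict[Tuple[int, int], float]:
    """All parent -> daughter transitions separated by a fixed neutral mass."""
    return {
        (p, p - loss): ions[p - loss]
        for p, ions in msms_map.items()
        if (p - loss) in ions
    }


print(daughter_scan(TOY_MAP, 60))
print(parent_scan(TOY_MAP, 43))
print(neutral_loss_scan(TOY_MAP, 18))
```

Each scan is a different slice through the same map, which is why the full map contains the information available from all of the major scan modes.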
Structure Elucidation of Pharmaceuticals Using MS/MS Data

Drug testing and toxicology require the determination of drugs
and their metabolites in biological samples. A number of recent
reviews of the application of the MS/MS technique to structure
determination of pharmaceuticals demonstrate the utility of the
technique in these important analyses [20-23]. Cooks and coworkers
have reported the utility of mass-analyzed-ion-kinetic-energy
(MIKES) spectra in the identification of closely related barbiturates
[24]. In this study, protonated molecular ions of a series of
barbiturates were obtained by CI using methane as a reagent gas.
Daughter spectra for these molecular ions were then obtained and
compared. Among the findings of this study were the marked
differences found in the daughter spectra of two isomeric
barbiturates (amobarbital and pentobarbital). The primary mass
spectra of these compounds, however, were closely similar. Thus,
MS/MS spectra could be used to differentiate between these two
compounds while primary mass spectra could not successfully make
this determination.

The primary mass spectra and selected daughter spectra for
these isomeric barbiturates are examined here to illustrate the
ability of MS/MS to differentiate closely related compounds. They
are among the spectra collected to create the reference databases
used by the ACES software. The primary mass spectra and molecular
structures for pentobarbital and amobarbital are shown in Figure 1.3.
As can be seen, the spectra are quite similar. In fact, when the
amobarbital primary mass spectrum was searched against the
library provided with the TQMS instrument, pentobarbital was the
closest matching compound (with a high degree of match). The
daughter spectra for the 197+ ion of both compounds are shown in
Figure 1.4. Note the presence of the 155+ daughter ion (neutral loss
of 42 amu) in the daughter spectrum of pentobarbital but not in the
daughter spectrum of amobarbital. Recall that the base peak in the
primary mass spectrum is 156+. There are also differences in the
smaller 129+, 112+, 58+ and 57+ daughter ions. Both spectra have
strong 141+ daughter ions corresponding to a neutral loss of 56 amu.
Possible structures for the 197+ parent ions are also shown in Figure
1.4; these ions probably arise from loss of an ethyl side chain
(neutral loss of 29 amu).

Figure 1.3: Primary mass spectra and molecular structures of
pentobarbital and amobarbital.

Figure 1.4: Daughter spectra of the 197+ ion of pentobarbital and
amobarbital, with possible structures for the 197+ parent ions.
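
The neutral-loss arithmetic and the comparison of the two 197+ daughter spectra can be written out as a short Python sketch. The function names are illustrative, and the ion sets contain only the daughter ions whose presence or absence is stated explicitly in the text; they are not complete spectra.

```python
from typing import Dict, Iterable, List, Set


def neutral_losses(parent_mz: int, daughter_mzs: Iterable[int]) -> Dict[int, int]:
    """Map each daughter m/z to the neutral mass lost from the parent."""
    return {d: parent_mz - d for d in daughter_mzs}


def ions_in_only_one(spectrum_a: Set[int], spectrum_b: Set[int]) -> List[int]:
    """Daughter ions present in exactly one of the two spectra."""
    return sorted(spectrum_a ^ spectrum_b)


PARENT = 197
# Only the ions named in the text; deliberately incomplete spectra.
PENTOBARBITAL_DAUGHTERS = {155, 141}
AMOBARBITAL_DAUGHTERS = {141}

print(neutral_losses(PARENT, PENTOBARBITAL_DAUGHTERS))
print(ions_in_only_one(PENTOBARBITAL_DAUGHTERS, AMOBARBITAL_DAUGHTERS))
```

With these inputs the only distinguishing ion is 155+ (neutral loss of 42 amu), while the 141+ ion (neutral loss of 56 amu) is common to both isomers.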

It is interesting to note that the daughter spectra of the 156+
and the 141+ ions, shown in Figure 1.5 and Figure 1.6, respectively,
are identical for the two barbiturates. This evidence leads to the
conclusion that the alkyl side chain that differentiates amobarbital
from pentobarbital has been lost and the 156+ and 141+ ions have an
identical structure for the two compounds. The preceding
observations show that identical substructures derived from
different compounds can be identified from their MS/MS spectra. In
fact, MS/MS is increasingly being used for metabolic monitoring since
drug metabolites often retain key substructural elements [25-27].
The focus of the remainder of this chapter is how to automate the
discovery of MS/MS spectral features which are indicative of
molecular substructures and their application to substructure
elucidation of unknown compounds. A recurrent theme in
subsequent chapters will be the application of the resulting system
to the identification of drug and other substructures.

Figure 1.5: Daughter spectra of the 156+ ion of pentobarbital and
amobarbital.

Figure 1.6: Daughter spectra of the 141+ ion of pentobarbital and
amobarbital.

Automated Structure Elucidation for MS and Other Methods

Diverse structure elucidation systems have been developed to
assist the chemist in analyzing spectral data to accurately determine
molecular structure. These systems have been reviewed by Small
and categorized as being direct or indirect database methods [28].
Library searching (or spectral matching) falls into the direct database
category while interpretive systems (i.e. expert systems) fall into the
indirect database methods. The spectral data used by these systems
were obtained using infrared spectroscopy (IR), nuclear magnetic
resonance (NMR) and mass spectrometry (MS). The ability to
integrate data from a number of spectral techniques is another
important characteristic in classifying structure elucidation systems.
An important prerequisite for integration of these methods is the
incorporation of a structure generator to provide candidate
structures [28]. The structure generator accepts structural
constraints expressed as substructures to limit the generation of
candidate structures to those that are consistent with the spectral
data. The integrated systems include DENDRAL (NMR and MS)
[29,30], CASE (IR and NMR) [31], CHEMICS (IR, NMR and MS) [32,33],
and DARC (NMR) [34]. Small concluded his review by stating the two
major problems that face automated spectral interpretation systems.
The first problem was an overall lack of available software and the
second was the limited size and availability of spectral databases.

The systems that have been applied to mass spectral data were
recently reviewed by Palmer [35] and by Martinsen [36]. By far the
most popular structure elucidation system for MS data is spectral
matching. The goal of this technique is to retrieve the mass
spectrum of a compound from a reference library that is identical to
the mass spectrum of an unknown. Since the structures that
correspond to each of the mass spectra in the library are known, the
system can identify the structures of unknown compounds. Almost
every commercial MS instrument provides the user with spectral
matching software and a mechanism for building a library of
reference spectra. Large databases of mass spectra have been
compiled and are readily available [37]. The largest of these contains
130,544 mass spectra for 113,587 compounds. Arguably the most
severe disadvantage of spectral matching, however, is the
necessarily limited size of the reference library. Chemical Abstracts
Service recognizes over 7,000,000 different organic compounds so
even the largest library represents a small portion (less than 2%) of
the possible structures for an unknown. Other disadvantages that
are cited for spectral matching include: 1) the search time increases
with the size of the reference library, 2) incorrect identifications and
ambiguous hit lists become more frequent with the size of the
reference library and 3) a close match does not necessarily mean the
structure of the reference spectrum is the structure of the unknown
due to decreasing spectral differences for closely related compounds
[35].
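
A minimal sketch of library searching, together with the coverage figure quoted above, is given below. The text does not say which matching function the commercial software uses, so cosine similarity is substituted here as an assumption; the library entries and the unknown spectrum are hypothetical toy values.

```python
import math
from typing import Dict, List, Tuple

Spectrum = Dict[int, float]  # m/z -> relative intensity


def cosine_match(unknown: Spectrum, reference: Spectrum) -> float:
    """Cosine similarity between two spectra (stand-in matching function)."""
    shared = set(unknown) & set(reference)
    dot = sum(unknown[mz] * reference[mz] for mz in shared)
    norm_u = math.sqrt(sum(v * v for v in unknown.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    return dot / (norm_u * norm_r) if norm_u and norm_r else 0.0


def best_hits(unknown: Spectrum, library: Dict[str, Spectrum],
              top: int = 3) -> List[Tuple[float, str]]:
    """Rank library entries by match score, best first."""
    scored = [(cosine_match(unknown, spec), name) for name, spec in library.items()]
    return sorted(scored, reverse=True)[:top]


# Hypothetical two-entry library and unknown, for illustration only.
TOY_LIBRARY = {
    "compound A": {43: 100.0, 58: 30.0},
    "compound B": {41: 100.0, 57: 40.0},
}
TOY_UNKNOWN = {43: 90.0, 58: 35.0}

# Library coverage from the figures quoted in the text.
coverage = 113_587 / 7_000_000

print(best_hits(TOY_UNKNOWN, TOY_LIBRARY))
print(f"library coverage: {coverage:.1%}")
```

The search can only ever return a compound that is already in the library, which is the limitation the coverage figure (about 1.6%) makes concrete.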

Among the indirect database methods (interpretive methods)
that have been applied to mass spectra are pattern recognition and
artificial intelligence [28]. These methods were also reviewed by
Palmer [35]. Pattern recognition systems attempt to identify
features in the mass spectrum for an unknown that correspond to
characteristic features for a compound class (substructure). If these
features are found in the spectrum of an unknown, the substructure
can be said to be present in the structure of the unknown compound.
An example of this approach is the self-training interpretive and
retrieval system (STIRS) which deduces substructural information
about an unknown through the application of 26 different classes of
mass spectral data [38,39]. However, this system does not utilize a
structure generator to provide candidate structures. Furthermore, it
is not a viable method for use with MS/MS data, as will be discussed
in more detail below, since it relies on spectral matching and a
reference library.

The DENDRAL project was an example of an artificial
intelligence approach to interpreting spectral data including mass
spectra [29,30]. This system used mass spectrometry fragmentation
rules derived automatically by Meta-DENDRAL to determine
molecular fragments (substructures) that were present or absent in
the structure of an unknown [40]. A structure generator, GENOA, was
used to exhaustively generate all possible candidate structures
consistent with the molecular formula of the unknown compound
and the substructures derived from application of the fragmentation
rules to the mass spectrum of the unknown [41]. The candidate
structures were then ranked according to the similarity of their
predicted mass spectrum and the spectrum of the unknown
compound. The DENDRAL approach is the most similar to the
approach we have adopted in our laboratory to provide an
automated structure elucidation system for MS/MS data. The
differences lie in the way we obtain substructure identification rules
and in the way we intend to reduce the number of candidate
structures.
MS/MS Spectral Matching and Reference Databases

A logical first step in automating the interpretation of MS/MS
data was to apply spectral matching techniques, the method of choice
for MS data. A system was developed that allowed the collection of
daughter spectra for reference compounds into a reference database
[42] and provided matching functions for comparison of daughter
spectra for unknowns with those reference spectra [43]. The
multidimensional database allowed for the storage of instrumental
parameters and X,Y data of any type and dimensionality. The
matching program allowed the user to match either the primary
spectrum or a selected daughter spectrum against reference spectra
contained in the database to determine the structure of the
unknown. Software to develop spectrum / substructure correlations
was then added. This software was based on the theory that
individual daughter spectra within the complete set of MS/MS
spectra are related to the molecular substructures from which they
arise. The software was subsequently interfaced with a structure
generation program to provide an automated structure elucidation
system [44].

This software was recently reviewed by Palmer and a number
of major limitations were discussed [35]. These limitations include:
1) the process of identifying the daughter spectrum / substructure
correlations was not sufficiently automated, 2) the user was required
to have some knowledge of the ionic structure(s) represented by the
parent ions of daughter spectra to confirm the reliability of the
daughter spectrum / substructure correlations, 3) there was a
limited number of these correlations, 4) the use of matching
algorithms with daughter spectra weighted heavily the variable
daughter ion intensities commonly found in collisionally assisted
dissociation (CAD) spectra and most importantly, 5) the method did
not take full advantage of the extra dimension of information that
MS/MS affords.

There have been several attempts at the development of an
instrument-independent CAD spectral database [45,46] but they have
met with little success thus far. This database was intended to
address the problem of widely varying daughter ion intensities for
spectra collected on MS/MS instruments by specifying a standard set
of "dynamically-correct" operating conditions [47,48]. For example,
relative intensities of daughter ions have been reported to vary by a
factor in the hundreds on different TQMS instruments even with the
same nominal operating conditions [45].

The key MS/MS parameters that were determined to have a
significant effect on the precision and accuracy of daughter ion
intensity measurements are listed in Table 1.2 [47]. A kinetics-based
measurement protocol was developed by Martinez to determine if an
instrument had a dynamically correct design and thus could be used
to collect standardized CAD spectra [48]. A round robin was
performed to collect data on several instruments to assess the
viability of the protocol [49,50]. The results showed that three out of
the five TQMS instruments tested gave precise and accurate (< 10%
relative deviation) measurements under single collision conditions.
The results for multiple collision conditions showed that an
instrument-independent response could not be obtained.

As of this writing, an instrument-independent CAD spectral
database is at least five years in the future [51] and there remains
considerable doubt whether one will ever be compiled. Among the
problems encountered was a disagreement about the standard
conditions to be used to collect the reference spectra. Several
members of the Instrument Independent CAD Spectral Database
Workshop [51] pointed out that multiple collision conditions were
required to obtain analytically useful data for a number of important
compound classes and that the conditions required could change with
a relatively simple functional group being added to the substructure
of interest. The end result was an agreement to study the feasibility
of multiple collision conditions with a specified number of collisions
as a standard for comparison of daughter spectra. Compilation of a
database will begin if the standard conditions issue can be settled.
Until that time, methods which rely on spectral matching and large
databases will not be viable for a general MS/MS structure
elucidation system.

Kinetic parameters

1)  The target gas used.
2)  The center-of-mass interaction energy Ecm, since the cross
    section and branching ratios depend on Ecm.
3)  The target thickness.

Instrument parameters

4)  The lab collision energy (Elab), which is the difference between
    the ion source potential and the Q2 rod offset.
5)  The difference between the ion source potential and the Q1 rod
    offset, which influences resolution within Q1 and determines
    the nominal energy distribution of the projectile ion beam
    entering Q2.
6)  The effective pathlength of the projectile ion through Q2 as
    influenced by the Q2 field radius, the Q2 RF frequency, the
    mass of the projectile and the axial energy (Elab).
7)  The value of q2 and the interquadrupole apertures of
    diameter < 1.4 r0, which can have mass-dependent focusing
    differences depending on the values used.
8)  The difference between the Q2 rod offset and the Q3 rod offset.
9)  The type of detector used, because of differences in mass-
    dependent conversion gain.

Internal energy of the projectile ion

10) The type of ionization used (i.e. electron impact versus chemical
    ionization).

Table 1.2: List of key parameters affecting relative daughter ion
intensity in a triple quadrupole mass spectrometer
(adapted from ref. 47).
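
The relation between the lab collision energy and the center-of-mass interaction energy listed in Table 1.2 can be illustrated with a short Python sketch. The formula is the standard two-body result for a stationary target gas and is not given in the text; the function name and the example masses and energy are assumptions chosen for illustration.

```python
def center_of_mass_energy(e_lab_ev: float, ion_mass: float, target_mass: float) -> float:
    """Ecm = Elab * m_target / (m_ion + m_target), assuming a stationary target."""
    if ion_mass <= 0 or target_mass <= 0:
        raise ValueError("masses must be positive")
    return e_lab_ev * target_mass / (ion_mass + target_mass)


# Illustrative values: a m/z 198 projectile ion at Elab = 20 eV.
for gas, mass in (("argon", 40.0), ("nitrogen", 28.0)):
    ecm = center_of_mass_energy(20.0, 198.0, mass)
    print(f"{gas}: Ecm = {ecm:.2f} eV")
```

Because Ecm depends on both the projectile mass and the target gas, the same nominal Elab setting corresponds to different interaction energies for different parent ions, which is one reason the target gas and Ecm appear as separate entries in the table.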

Pattern Recognition and Other Methods

Several applications of pattern recognition using MS/MS data
have been reported but have been limited in scope. Principal
component analysis has been used to evaluate high energy CAD
spectra obtained using B/E linked scans for the differentiation of
some alkyl benzenes [52]. Factor analysis was applied to the
problem of more than one parent ion being selected for high energy
CAD (isomeric/isobaric ions) [53]. This method was able to
distinguish alkyl substituents on the benzene ring but not the ring
isomers. An interesting system has been reported that uses exact
mass measurements and high energy CAD spectra [54,55]. The
software generates possible fragment ions (called superatoms)
consistent with the molecular formula and checks for their presence
in the CAD spectrum of the unknown. Candidate structures are
generated using the superatoms found to be present. This system is
limited, however, to acyclic systems. Johnson and Biemann have also
reported an automated system for peptide sequencing using CAD
spectra [56]. This system exhaustively generates possible peptide
sequences from known subsequences and examines the daughter
spectra for the presence of characteristic cleavages.
The Automated Chemical Structure Elucidation System

The Automated Chemical Structure Elucidation System (ACES)
has been under development in our laboratory to provide the user
with a variety of software tools for obtaining candidate structures for
unknown compounds [57-62]. The approach used is to determine the
molecular formula and the molecular substructures present in or
absent from the molecular structure of an unknown compound using
MS/MS spectra. This information is then used to constrain the
exhaustive generation of candidate structures.
System Components

The major program components and the two distinct
operational modes of the system are shown in Figure 1.7. The first
operational mode, shown in the figure with a dashed line, is the
learning mode. This is the mode of operation used to create the
substructure identification rules. The first step involves the
collection of the full MS/MS map for standard compounds of known
structure. These maps are then stored in a reference database. The
MAPS software (Method for Analyzing Patterns in Spectra) uses the
reference database along with the substructures known to be present
in the known compounds to develop substructure identification rules
[58,59]. The list of substructures that are present in the molecular
structure of each of the known compounds is provided by
substructure search software and is discussed in Chapter 4. The
substructure identification rules are then stored to be used in the
analysis of unknown compounds.

Figure 1.7: Components of the Automated Chemical Structure
Elucidation System (ACES).

The substructure identification rules have the format:

IF   "<spectral feature 1> [U1,C1]"
and  "<spectral feature 2> [U2,C2]"
and  "<spectral feature 3> [U3,C3]"
and  "<spectral feature n> [Un,Cn]"
THEN substructure "X" is present.

The MAPS rule for the phenothiazine substructure is shown in Figure
1.8. The types of MS/MS features used are primary scan ions (PS),
daughter scan ions (DS), neutral loss mass (NL), and parent-ion /
daughter-ion combinations (PD). The first two letters in the
triangular brackets of a rule clause represent the type of MS/MS
feature. The value(s) in parentheses that follow the letters are the
m/z value(s) (or the mass in amu for NL features). The values in square
brackets are the uniqueness (U) and correlation (C) of the feature
with respect to the substructure indicated in the "THEN" clause of the
rule. The U value is a measure of the "selectivity" of the feature for
the indicated substructure. The C value is a measure of the
"sensitivity" of the feature for the indicated substructure since it
indicates the probability that the feature will be found when the
indicated substructure is present in the structure being analyzed.
Chapter 3 is devoted to MAPS and the substructure identification
rules produced using the new reference database.
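
How U and C values might be computed from a reference database can be sketched in Python. The correlation follows the description above (how often the feature is found when the substructure is present). The uniqueness formula is an assumption, since the text describes U only as a measure of selectivity; here it is taken as the fraction of feature-bearing compounds that contain the substructure. The record layout, names and five toy compounds are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Feature = Tuple  # e.g. ("PD", 198.0, 171.0) or ("DS", 211.0)
Compound = Dict[str, Set]


def uniqueness_and_correlation(feature: Feature, substructure: str,
                               compounds: List[Compound]) -> Tuple[float, float]:
    """Return (U, C) in percent for one feature / substructure pair."""
    with_sub = [c for c in compounds if substructure in c["substructures"]]
    with_feat = [c for c in compounds if feature in c["features"]]
    both = [c for c in with_sub if feature in c["features"]]
    # Assumed definition: share of feature-bearing compounds with the substructure.
    u = 100.0 * len(both) / len(with_feat) if with_feat else 0.0
    # From the text: chance the feature is seen when the substructure is present.
    c = 100.0 * len(both) / len(with_sub) if with_sub else 0.0
    return u, c


F = ("PD", 198.0, 171.0)
# Hypothetical reference compounds, for illustration only.
TOY_DB: List[Compound] = [
    {"substructures": {"PHENOTHIAZINE"}, "features": {F}},
    {"substructures": {"PHENOTHIAZINE"}, "features": {F}},
    {"substructures": {"PHENOTHIAZINE"}, "features": set()},
    {"substructures": {"PHENOL"}, "features": {F}},
    {"substructures": {"PHENOL"}, "features": {F}},
]

u, c = uniqueness_and_correlation(F, "PHENOTHIAZINE", TOY_DB)
print(f"U = {u:.0f}%, C = {c:.0f}%")
```

In this toy database the feature is seen in two of the three phenothiazine compounds (C of about 67%) but also in two compounds without the substructure (U of 50%), so it would be sensitive but not very selective.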

IF   "<DS (211.0)>           [42,77]"   {F1}
and  "<DS (210.0)>           [41,85]"   {F2}
and  "<DS (209.0)>           [40,77]"   {F3}
and  "<DS (198.0)>           [44,92]"   {F4}
and  "<PD (198.0 -> 171.0)>  [77,77]"   {F5}
and  "<PD (198.0 -> 154.0)>  [92,92]"   {F6}
and  "<PD (197.0 -> 196.0)>  [69,85]"   {F7}
and  "<PD (197.0 -> 153.0)>  [91,77]"   {F8}
and  "<PD (196.0 -> 152.0)>  [83,77]"   {F9}
and  "<PD (70.0 -> 27.0)>    [42,85]"   {F10}
THEN substructure PHENOTHIAZINE is present.

(Umin = 40%, Cmin = 70%)

[Structure drawing: PHENOTHIAZINE SUBSTRUCTURE]

Figure 1.8: The initial MAPS (v.II) rule obtained for the
PHENOTHIAZINE substructure.

The other operational mode of ACES is the identification mode.
Once again, the process begins with the acquisition of the MS/MS
map for the unknown compound. The substructure identification
rules are then applied to the MS/MS data obtained. The result is a
list of substructures present in or absent from the structure of the
unknown compound. A match factor (MF) must be specified to set
the minimum number of rule clauses that must be true for a
substructure identification to be made. A match factor of 100%
means that all the clauses in the rule must be true for the system to
conclude that the indicated substructure is present in the structure
of the unknown. As will be shown in Chapter 3, there is a tradeoff
between the reliability and the recall of the rule using this method.
The reliability is an estimate of the accuracy of the rule. The recall
of the rule is the frequency that the rule correctly predicted the
presence of A the substructure when applied to the data for the
standard compounds in the reference database. A new method for
generating substructure identification rules that eliminates the
reliability / recall tradeoff was described and tested by Palmer [35].
The implementation of this new method and its application to
generation of substructure identification rules using the new

reference database is also described in Chapter 3.
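
The role of the match factor can be sketched as follows. This is a
minimal illustration, not the ACES implementation: the clause encoding
(tuples of feature type and m/z values), the function name and the
rounding-up of the required clause count are all assumptions. The four
clauses are the PD features F5 to F8 of Figure 1.8, and the set of
observed features is hypothetical.

```python
import math


def substructure_present(clauses, observed_features, match_factor=100.0):
    """Return True if at least match_factor percent of the rule clauses
    are found among the features observed for the unknown.

    Rounding the required count up is an assumption of this sketch.
    """
    matched = sum(1 for clause in clauses if clause in observed_features)
    required = math.ceil(len(clauses) * match_factor / 100.0)
    return matched >= required


# Clauses F5-F8 of Figure 1.8, encoded as (type, parent m/z, daughter m/z).
rule_clauses = [
    ("PD", 198.0, 171.0),
    ("PD", 198.0, 154.0),
    ("PD", 197.0, 196.0),
    ("PD", 197.0, 153.0),
]

# Hypothetical features found in the MS/MS map of an unknown.
observed = {
    ("PD", 198.0, 171.0),
    ("PD", 198.0, 154.0),
    ("PD", 197.0, 153.0),
    ("PD", 86.0, 58.0),
}

strict = substructure_present(rule_clauses, observed, match_factor=100.0)
relaxed = substructure_present(rule_clauses, observed, match_factor=75.0)
print(strict, relaxed)
```

Lowering the match factor lets the rule fire when some clauses are
missing, which is the source of the reliability / recall tradeoff
mentioned above.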

The molecular weight of the unknown compound is also
required. This piece of information may not be present in EI spectra,
so a CI experiment (a "soft ionization" technique that often yields
molecular ions) may be necessary. The molecular weight and the
elemental compositions of the substructures found to be present in
the structure of the unknown compound are input into the molecular
formula generator (MFG) [60]. The MFG, developed by Peter Palmer,
is an exhaustive molecular formula generator that uses the nominal
mass of a molecule and elemental constraints to provide a list of
candidate molecular formulae. Palmer has also developed the
CARBON program, which uses the ratios of daughter ions found in the
daughter spectrum of the M+1 ion to determine the number of
carbon atoms in the molecular ion [60]. Bozorgzadeh has recently
described further development and generalization of this method for
the determination of all the constituent elements in daughter ions
[63,64]. The MFG will also accept an exact mass if this information is
available.
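
The idea of an exhaustive formula generator constrained by nominal mass
and elemental bounds can be sketched in a few lines. This is not the MFG
itself: the function name, the choice of elements and the bounds are
assumptions for illustration, and no valence or ring/double-bond checks
are applied. The example uses the nominal mass of promazine (284) and
takes the minimum counts of C, N and S from a phenothiazine substructure.

```python
from itertools import product

# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}


def candidate_formulas(nominal_mass, bounds):
    """Enumerate every element-count combination within `bounds`
    whose nominal mass equals `nominal_mass`.

    bounds: dict mapping element symbol -> (min_count, max_count).
    """
    elements = sorted(bounds)
    ranges = [range(bounds[e][0], bounds[e][1] + 1) for e in elements]
    results = []
    for counts in product(*ranges):
        mass = sum(NOMINAL_MASS[e] * n for e, n in zip(elements, counts))
        if mass == nominal_mass:
            results.append(dict(zip(elements, counts)))
    return results


# Hypothetical constraints: at least C12, N1 and S1 from the
# phenothiazine substructure; upper limits chosen arbitrarily.
bounds = {"C": (12, 23), "H": (0, 40), "N": (1, 4), "O": (0, 3), "S": (1, 2)}
formulas = candidate_formulas(284, bounds)
print(len(formulas), "candidate formulae")
```

Tightening the bounds with substructure information shrinks the
candidate list, which is why the substructure compositions are supplied
together with the molecular weight.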

The molecular formulae and the substructures found to be
present in or absent from the structure of the unknown compound
are used as constraints in GENOA, an exhaustive structure generation
program [41]. GENOA provides a set of candidate structures that are
consistent with the input constraints. A series of articles on the
application of GENOA to structure elucidation problems, using a
variety of spectroscopic data to infer structural constraints, has been
published [65-71]. The use of GENOA and structure generation in
general was a major focus of a recent book on computer-assisted
structure elucidation [72]. The changes made to GENOA to
incorporate the software into ACES are discussed in Chapter 4. One
of the key issues in using this approach is the accuracy of the input
constraints. Each of the candidate structures that GENOA produces is
completely consistent with the input constraints. If one of the input
constraints is wrong, then each of the candidate structures is wrong.
This fact leads to the paramount importance of accurate substructure
and molecular formula determinations and accounts for the
continuing search for better ways to generate reliable substructure
identification rules. The MAPS rules are evaluated in Chapter 5 by
comparison of the features found in the rules with those contained in
documented fragmentation pathways, and in Chapter 6 by application
to 20 test compounds. The evaluation of MAPS rules and the
subsequent refinement of the rule generation software are among
the major accomplishments of this research.

Multiple Rulebases

MS/MS databases for reference compounds acquired under
different operating conditions are being compiled to assess the
viability of recommending ancillary experiments for reduction of the
number of candidate structures for an unknown. This reduction can
be achieved by using another set of operating conditions to identify
substructures that were previously unidentified. The variables
that may be explored include ionization conditions (e.g. electron
impact, chemical ionization, FAB), ion polarity (e.g. positive ions or
negative ions) and collision conditions (e.g. collision energy, collision
gas pressure, target gas, reactive collisions). It is not expected that
one set of instrumental conditions will be optimal for all
substructures, but rather that a number of instrumental conditions
will be found, each best suited for some fraction of the substructures
being investigated. As will be seen in the next section, these
rulebases will not only be useful in obtaining new and better MAPS
rules, but may also be useful in reducing the number of candidate
structures obtained using any one of the rulebases.

Potential for Recursive Operation

In general, systems which use structure generators provide a
method for ranking candidate structures to assist the user in
determining the most likely candidate structure for the structure of
an unknown compound. The approach to be used in a forthcoming
version of ACES does not rank candidate structures, but instead
seeks to assist the user in identifying ancillary experiments which
will potentially reduce the number of candidate structures.
Implementation of this approach will establish a "feedback loop"
which includes the instrument (the TQMS in this case) [73,74]. This
approach preserves the experimental versatility that MS/MS
instrumentation can provide to the structure elucidation chemist.

The dashed box in Figure 1.7 highlights the components
necessary to implement this approach. The first step is to perform a
substructure analysis of the candidate structures using a modified
version of STRCHK (discussed in Chapter 4). The original version of
STRCHK was developed as part of the DENDRAL project. The
modifications made to STRCHK provide the level of automation
necessary for integration of this program into ACES. This program
outputs a list of substructures that differentiate among the candidate
structures (discriminating substructures). MAPS rules that have
been collected under different operating conditions (e.g. a higher
collision gas pressure to provide more collisions) could then be
consulted by the EXPT program (currently under development) to
determine if a new experiment exists that can confirm the presence
of the discriminating substructures. Thus, multiple rulebases must
be generated using data obtained under a variety of conditions to
implement this feature of ACES. Once the EXPT program has selected
an experiment, the new instrumental conditions can be passed to the
user (or directly to the instrument control software in a totally
automated system) to be implemented. The list of candidate
structures can then be pruned depending on the results of the new
experiment. The goal of this pruning is to reduce the list of
candidate structures to one, the structure of the unknown compound.
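
The two steps described above, finding discriminating substructures and
pruning the candidate list after a new experiment, can be sketched with
plain sets. This is only an illustration under stated assumptions: the
function names, the candidate labels and their substructure lists are
hypothetical, and the sketch does not reproduce STRCHK or EXPT.

```python
def discriminating_substructures(candidates):
    """Substructures present in some, but not all, candidate structures.

    candidates: dict mapping candidate name -> set of substructures.
    """
    if not candidates:
        return set()
    all_subs = set().union(*candidates.values())
    common = set.intersection(*candidates.values())
    return all_subs - common


def prune(candidates, substructure, present):
    """Keep only candidates consistent with an experimental result that
    found `substructure` to be present (True) or absent (False)."""
    return {
        name: subs
        for name, subs in candidates.items()
        if (substructure in subs) == present
    }


# Hypothetical candidate structures and their substructures.
candidates = {
    "A": {"phenothiazine", "dimethylamino"},
    "B": {"phenothiazine", "piperidine"},
    "C": {"phenothiazine", "dimethylamino", "methoxy"},
}

to_test = discriminating_substructures(candidates)
remaining = prune(candidates, "dimethylamino", present=True)
remaining = prune(remaining, "methoxy", present=False)
print(sorted(to_test), sorted(remaining))
```

Each confirmed or refuted discriminating substructure removes the
inconsistent candidates, so repeated experiments move the list toward a
single structure.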

This approach is quite different from attempts to use only a
single, standardized set of instrumental conditions to provide
automated interpretation of MS/MS spectra. There is a historical
precedent for the standardization approach in conventional mass
spectrometry, where 70 eV electron impact ionization is used for the
creation of spectral libraries. The value of 70 eV was chosen because
the differences in fragmentation observed using this ionization method
"plateaued" around this value. In other words, the amount of
fragmentation (information) peaked around 70 eV for electron
impact ionization. An analogous situation does not exist in MS/MS,
since the information obtainable from MS/MS spectra does not
plateau around a given set of instrumental conditions. While the
requirement that standards be run under each of the desired
instrumental conditions may be considered a disadvantage, it has
the decided advantage of preserving the versatility of the MS/MS
technique.

Conclusions

In a recent book on MS/MS, Busch and co-authors provided an
outlook for this important analytical technique. A critical area of
research that was cited was the computerized interpretation of
MS/MS spectra. "As computer systems are more fully integrated
with the control of the MS/MS instrument, the details and quality of
the MS/MS data will come to depend more explicitly on the design
and operation of the computer / mass spectrometer interface. This is
especially so with computer-controlled triple-quadrupole mass
spectrometers, in which the operating characteristics of each of the
three quadrupoles are under direct computer control...To develop
new experiments, or for the most demanding applications, mass
spectrometrists cannot allow themselves to be drawn into situations
in which their instrumental options are limited [1]."

We believe the automated structure elucidation system
described here conforms to the spirit of the quotation above. It
allows the user to utilize reference databases containing MS/MS
spectra acquired under any set of MS/MS operating conditions. The
only constraint is that the user be consistent within a database.
Thus, multiple rulebases can be generated using complementary sets
of MS/MS instrumental conditions. These rulebases will provide the
basis for recommendation of ancillary experiments by a forthcoming
version of the ACES software. This structure elucidation expert
system (ACES) has the distinct advantage of allowing the use of
optimal MS/MS instrumental conditions in solving structure
elucidation problems.

REFERENCES

1.  Busch, K.L., Glish, G.L., McLuckey, S.A., "Mass Spectrometry /
    Mass Spectrometry: Techniques and Applications of Tandem
    Mass Spectrometry", VCH Publishers, Inc., New York, 1988.

2.  McLafferty, F.W., "Tandem Mass Spectrometry", John Wiley &
    Sons, New York, 1983.

3.  Burlingame, A.L., Maltby, D., Russell, D.H., Holland, P.T., Anal.
    Chem., 60, 300R (1988).

4.  Hirschfeld, T., Anal. Chem., 52, 297A (1980).

5.  Gurka, D.F., Betowski, L.D., Hinners, T.A., Heithmar, E.M., Titus,
    R., Henshaw, J.M., Anal. Chem., 60, 454A (1988).

6.  Eckenrode, B.E., Watson, J.T., Holland, J., Enke, C.G., [journal
    and volume illegible], 177 (1988).

7.  Laukien, F.H., Abstracts of Papers, 14th Annual FACSS Meeting,
    Detroit, MI (1987); American Chemical Society, Abstract 524,
    Washington, D.C., (1987).

8.  Louris, J.M., Brodbelt-Lustig, J.S., Cooks, R.G., Glish, G.L., Van
    Berkel, G.J., McLuckey, S.A., [journal illegible], submitted.

9.  C.G. Enke, personal communication, November, 1989.

10. Yost, R.A. and Enke, C.G., J. Am. Chem. Soc., 100, 2274 (1978).

11. Cody, R.B. and Freiser, B.S., [journal illegible], 55, 571 (1983).

12. Beynon, J.H., Caprioli, R.M., Ast, T., [journal illegible], 2, 229
    (1971).

13. Busch, K.L., et al., ibid., p. 75.

14. ibid., p. 78.

15. Watson, J.T., "Introduction to Mass Spectrometry", Raven Press,
    New York, 1985.

16. Johnson, R.H. and Steiner, U., Pittsburgh Conference and
    Exposition Abstracts, Atlantic City, NJ, 1986, abstract #541.

17. Lammert, S.A., Chapman, W.K., Steiner, U., and Schoen, A.E.,
    Pittsburgh Conference and Exposition Abstracts, Atlantic City,
    NJ, 1987, abstract #1105.

18. Parr, V.C., Waddicor, J., and Wood, D., Pittsburgh Conference and
    Exposition Abstracts, Atlanta, GA, 1989, abstract #1523.

19. Bozorgzadeh, M.H., Morgan, R.P., Beynon, J.H., Analyst, 103, 613
    (1978).

20. Busch, K.L., et al., ibid., p. 265.

21. McLafferty, F.W., ibid., p. 385.

22. Yost, R.A., Perchalski, R.J., Brotherton, H.O., Johnson, J.V., Budd,
    M.B., Talanta, 31, 929 (1984).

23. Straub, K.M., Rudewicz, P., Garvie, C., [journal illegible], 11, 413
    (1987).

24. Soltero-Rigau, E., Kruger, T.L., Cooks, R.G., Anal. Chem., 49, 435
    (1977).

25. Perchalski, R.J., Lee, M.S., Yost, R.A., [journal illegible], 26, 435
    (1986).

26. Perchalski, R.J., Yost, R.A., Wilder, B.J., Anal. Chem., 54, 1466
    (1982).

27. Covey, T.R., Lee, E.D., Henion, J.D., Anal. Chem., 58, 2453 (1986).

28. Small, G.W., Anal. Chem., 59, 535A (1987).

29. Barr, A., Feigenbaum, E.A., "The Handbook of Artificial
    Intelligence", Volume 2, HeurisTech Press, Stanford (1982).

30. Lindsay, R.K., Buchanan, B.G., Feigenbaum, E.A., Lederberg, J.,
    "Applications of Artificial Intelligence for Organic Chemistry -
    The Dendral Project", McGraw-Hill, New York, 1980.

31. Munk, M.B., Christie, E.D., [journal illegible], 216, 57 (1989).

32. Sasaki, S., Fujiwara, I., Abe, H., Yamasaki, T., Anal. Chim. Acta,
    121, 87 (1980).

33. Fujiwara, I., Okuyama, T., Yamasaki, T., Abe, H., Sasaki, S., Anal.
    Chim. Acta, [volume illegible], 527 (1981).

34. Dubois, J.E., Carabedian, M., Dagane, I., Anal. Chim. Acta, 158,
    217 (1984).

35. Palmer, P.T., Ph.D. Dissertation, Michigan State University, East
    Lansing, MI, 1988.

36. Martinsen, D.P., Sung, B., [journal illegible], 4, 461 (1985).

37. Stauffer, D.B., Loh, S.Y., Henry, K.D., Twiss-Brooks, A.B.,
    McLafferty, F.W., 35th ASMS Conference on Mass Spectrometry
    and Allied Topics, Denver, CO, p. 391 (1987).

38. McLafferty, F.W., Stauffer, D.B., [journal illegible], 25,
    245 (1985).

39. Dayringer, H.E., Pesyna, G.M., Venkataraghavan, R., McLafferty,
    F.W., Org. Mass Spectrom., 11, 529 (1976).

40. Buchanan, B.G., Smith, D.H., White, W.C., Gritter, R.J.,
    Feigenbaum, E.A., Lederberg, J., Djerassi, C., J. Am. Chem. Soc.,
    98, 6168 (1976).

41. Carhart, R.E., Smith, D.H., Gray, N.A.B., Nourse, J.G., Djerassi, C.,
    J. Org. Chem., 46, 1708 (1981).

42. Crawford, R.W., Brand, H.R., Wong, C.H., Gregg, H.R., Hoffman,
    P.A., and Enke, C.G., Anal. Chem., 56, 1121 (1984).

43. Cross, K.P., and Enke, C.G., [journal illegible], 10, 175 (1986).

44. Cross, K.P., Palmer, P.T., Beckner, C.F., Giordani, A.B., Gregg, H.R.,
    Hoffman, P.A., and Enke, C.G., in Pierce, T.H., Hohne, B.H. (Eds.),
    "Artificial Intelligence in Chemistry", ACS Symposium Series No.
    306, American Chemical Society, Washington, D.C., 1986, p. 321.

45. Dawson, P.H., Sun, W., [journal illegible], 55, 155 (1983/1984).

46. Martinez, R.I., Cooks, R.G., 35th ASMS Conference on Mass
    Spectrometry and Allied Topics, Denver, CO, p. 1175 (1987).

47. Martinez, R.I., Dheandhanoo, S., [journal and volume illegible],
    1 (1988).

48. Martinez, R.I., [journal and volume illegible], 8 (1988).

49. Martinez, R.I., [journal and volume illegible], 127 (1989).

50. Dheandhanoo, S., [journal and volume illegible], 266 (1988).

51. Martinez, R.I., 37th ASMS Conference on Mass Spectrometry
    and Allied Topics, Miami Beach, FL, in press (1989).

52. Weber, J.J., Thuijl, J.V., De Jong, J., [journal and volume
    illegible], 195 (1986).

53. Giblin, D.B., Peake, D.A., Lapp, R.L., 32nd ASMS Conference on
    Mass Spectrometry and Allied Topics, San Antonio, TX, p. 644
    (1984).

54. Zidarov, D., Bertrand, J., 37th ASMS Conference on Mass
    Spectrometry and Allied Topics, Miami Beach, FL, in press
    (1989).

55. Bertrand, J., Zidarov, D., 36th ASMS Conference on Mass
    Spectrometry and Allied Topics, San Francisco, CA, p. 1384
    (1988).

56. Johnson, R.S., Biemann, K., 36th ASMS Conference on Mass
    Spectrometry and Allied Topics, San Francisco, CA, p. 1398
    (1988).

57. Enke, C.G., Wade, A.P., Palmer, P.T., Hart, K.J., Anal. Chem., 59,
    1363A (1987).

58. Wade, A.P., Palmer, P.T., Hart, K.J., Enke, C.G., [journal
    illegible], 215, 169 (1988).

59. Palmer, P.T., Hart, K.J., Enke, C.G., Talanta, 36, 107 (1989).

60. Palmer, P.T., Enke, C.G., [journal and volume illegible],
    81 (1989).

61. Hart, K.J., Wade, A.P., Palmer, P.T., Nourse, B., Enke, C.G.,
    [journal illegible], accepted.

62. Hart, K.J., Enke, C.G., [journal illegible], in press.

63. Bozorgzadeh, M.H., [journal and volume illegible], 61
    (1988).

64. Bozorgzadeh, M.H., [journal illegible], 21, 712 (1988).

65. Gray, N.A.B., Buchs, A., Smith, D.H., Djerassi, C., [journal and
    volume illegible], 458 (1981).

66. Smith, D.H., Gray, N.A.B., Nourse, J.G., Crandell, C.W., Anal. Chim.
    Acta, [volume illegible], 471 (1981).

67. Crandell, C.W., Gray, N.A.B., Smith, D.H., [journal illegible],
    21, 48 (1982).

68. Lindley, M.R., Gray, N.A.B., Smith, D.H., Djerassi, C., [journal
    illegible], 41, 1027 (1982).

69. Egli, H., Smith, D.H., Djerassi, C., [journal illegible], 61, 1898
    (1982).

70. Djerassi, C., Smith, D.H., Crandell, C.W., Gray, N.A.B., Nourse, J.G.,
    Lindley, M.R., [journal and volume illegible], 2425 (1982).

71. Lindley, M.R., Shoolery, J.N., Smith, D.H., Djerassi, C., Org. Mag.
    Res., 21, 405 (1983).

72. Gray, N.A.B., "Computer-Assisted Structure Elucidation", John
    Wiley and Sons, New York, NY (1986).

73. Hart, K.J., Palmer, P.T., Enke, C.G., 36th ASMS Conference on
    Mass Spectrometry and Allied Topics, San Francisco, CA, p.
    1388 (1988).

74. Hart, K.J., Enke, C.G., 37th ASMS Conference on Mass
    Spectrometry and Allied Topics, Miami Beach, FL, in press
    (1989).

CHAPTER 2

Automated MS/MS Spectral Data Acquisition
for MAPS Reference Databases

Introduction

Reference databases containing infrared spectra and mass
spectra are routinely used for the identification of molecular
structure by matching the spectrum of an unknown with spectra of
reference compounds [1]. The MAPS (Method for Analyzing Patterns
in Spectra) software attempts to deduce spectral feature /
substructure correlations from reference spectra for use in
determining the presence of substructures in unknown compounds
[2]. An advantage of the MAPS approach is that the number of spectra
required to glean these correlations is much smaller than the number
in a typical database used for spectral matching. An additional
advantage of this approach is the ability to predict the presence (or
absence) of a substructure in an unknown compound regardless of
whether the spectrum for the unknown compound is contained in the
database. All that is required is a sufficient number of compounds
containing a given substructure to produce a reliable rule.

The instrumental conditions used to acquire the spectra for a
given reference database are important and must be maintained
within the database. For the analysis of unknowns, the instrumental
conditions used must be the same as those used to create the
database from which the MAPS rules were generated. The
instrumental conditions used to acquire MS/MS spectra for three
reference databases are discussed in the next section. The
procedures used for automated data acquisition and the data transfer
software are then described. The scope of the MAPS rules will
depend upon the diversity of the substructures contained in the
standard compounds used to create the database. Thus, a description
of the standard compounds selected for the reference databases is
provided in a subsequent section. A discussion of some of the
irregularities discovered in the databases completes this chapter on
reference databases.

Key Instrumental Parameters

The instrumental parameters that affect the relative intensity
of daughter ions and the set of daughter ions obtained in a daughter
mass spectrum have been studied in depth for the purpose of
defining a set of standard instrumental conditions [3,4]. An
instrument-independent CAD (collisionally activated dissociation)
database is the ultimate goal of the research into standardized
operating conditions [5]. Palmer has also studied these instrumental
parameters for two different TQMS instruments [6]. Collision energy
and collision gas pressure were found to be the most significant
instrumental parameters, since they affected not only the relative
intensities of the daughter ions but also which daughter ions were
obtained [6]. Another important parameter, the target gas, was held
constant in these comparisons.

The MAPS software relies on the appearance of characteristic
spectral features within the MS/MS data space, rather than on their
relative intensities, in the development and application of
substructure identification rules. Thus, only those instrumental
parameters that affect the appearance of spectral features need to be
strictly controlled. Therefore, collision energy and collision gas
pressure were carefully controlled for the acquisition of MS/MS
spectra to be included in the reference databases described in this
chapter. However, other parameters that affect instrument
performance should not be completely discounted. A carefully tuned
instrument is important for optimal ion transmission and mass
assignment. Also, some MS/MS methods, such as energy-resolved
mass spectrometry (ERMS), rely heavily on the relative intensity of
daughter ions. This method is often used for isomer distinction.
Since reproducible daughter ion intensities are difficult to obtain,
energy-resolved mass spectra are likely to remain instrument
dependent.

Ion activation in the "eV" region (i.e. 1-100 eV) occurs
predominantly via excitation of vibrational modes of the electronic
ground state. The amount of internal energy deposited in a
polyatomic ion by a collision with a neutral collision gas is directly
correlated with collision energy [7]. The collision energy is defined
by the difference in potential between the source and the Q2 offset
on a TQMS instrument. Thus, the collision energy determines the
kinetic energy that an ion has on entering the reaction region (Q2).
It has been shown for n-butylbenzene that the internal energy of
parent ions after collision increases with collision energy up to 40 eV
(laboratory energy), with about 50% of the collision energy being
converted to internal energy; the fraction of the collision energy
converted to internal energy decreases as the collision energy
increases over the entire range tested [7].
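
The relationship between the potentials and the laboratory collision
energy can be made concrete with a short sketch. The first function
follows the definition given above. The second applies the standard
two-body conversion from laboratory to center-of-mass energy for a
stationary target, which is textbook kinematics rather than something
taken from this chapter. The function names and the example potentials
are assumptions. The masses are those of the promazine molecular ion
(m/z 284) and argon (40 u).

```python
def lab_collision_energy(source_potential_v, q2_offset_v, charge=1):
    """Laboratory collision energy in eV for an ion of the given charge,
    from the potential difference between the source and the Q2 offset."""
    return charge * abs(source_potential_v - q2_offset_v)


def center_of_mass_energy(e_lab_ev, ion_mass, target_mass):
    """Maximum energy (eV) available for conversion to internal energy
    in a collision with a stationary target atom."""
    return e_lab_ev * target_mass / (ion_mass + target_mass)


# Hypothetical potentials giving the 30 eV used for the databases.
e_lab = lab_collision_energy(source_potential_v=10.0, q2_offset_v=-20.0)
e_cm = center_of_mass_energy(e_lab, ion_mass=284.0, target_mass=40.0)
print(f"E_lab = {e_lab:.1f} eV, E_cm = {e_cm:.2f} eV")
```

Because the center-of-mass energy scales with the target mass fraction,
heavy parent ions receive only a small part of the laboratory energy in
a single collision with argon.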

The effect of increasing collision energy on the daughter
spectrum of the molecular ion of promazine is shown in Figures 2.1
and 2.2. These spectra were obtained with a fixed collision gas
pressure (0.4 mtorr). Few fragment ions are obtained at 5 eV, since
the internal energy of the parent ion after collision is lower than the
critical dissociation energy for a number of the fragment ions. At a
higher energy, 30 eV, a number of new fragment ions appear in the
spectrum. The efficiency of fragmentation has also increased, since
the relative intensity of the parent ion is about 60% at 30 eV rather
than 100% as at 5 eV. At the higher collision energies shown in Figure
2.2, more of the parent ion dissociates into smaller fragments. At
60 eV, the base peak shifts from m/z 86 to m/z 58. Almost all of the
parent ion is dissociated at 75 eV. The relative abundances of several
fragment ions of meperidine are plotted versus collision energy in
Figure 2.3. Single collision conditions were used to acquire the data
for this plot. Two important features of this plot are the appearance
of the m/z 56 daughter ion at 30 eV collision energy and of the m/z 70
daughter ion at a collision energy of 50 eV.

 

Figure 2.1: Daughter spectra of the molecular ion of promazine with a
collision energy of (a) 5 eV, (b) 15 eV and (c) 30 eV.

 

Figure 2.2: Daughter spectra of the molecular ion of promazine with a
collision energy of (a) 45 eV, (b) 60 eV and (c) 75 eV.

[Figure 2.3: Relative abundance of several fragment ions of meperidine
plotted versus collision energy; the caption is not legible in the
source.]

Collision gas pressure is the other key instrumental parameter
which requires careful consideration. The average number of
collisions which an ion is likely to undergo is a function of the
collision gas pressure. At low collision gas pressures, fragment ions
are obtained from a single collision with the target gas. This
pressure regime is referred to as "single-collision" conditions. At
higher collision gas pressures, some parent ions can collide with two
or more target gas atoms, producing granddaughter ions,
great-granddaughter ions, and so on. This pressure regime is referred
to as "multiple-collision" conditions. The effect of increasing
collision gas pressure at constant collision energy (30 eV) is shown in
Figures 2.4 and 2.5. Figure 2.4 demonstrates the effect on the
molecular ion of promazine and Figure 2.5 shows the effect on the
m/z 199 promazine fragment ion. It is interesting to note the
difference in degree of fragmentation of the parent ions at 0.4 mtorr.
The promazine molecular ion shows much more fragmentation at this
pressure than does the m/z 199 fragment ion, thus demonstrating
the different reaction cross sections that different ions possess. It is
also interesting to note that there are several ions in the daughter
spectrum of the m/z 199 fragment ion (m/z 45, 71, 96 and 155) that
do not appear, or are very low in intensity, in the daughter spectrum
of the molecular ion. The relative abundance of the m/z 56 ion of
meperidine is plotted against collision energy for three different
collision gas pressures in Figure 2.6. At the higher collision gas
pressures, the m/z 56 daughter ion is observed at all collision
energies, while under single collision conditions this ion is only
observed at collision energies above 30 eV.
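
The dependence of the average number of collisions on pressure can be
sketched with simple kinetic theory. This is a rough illustration only:
it assumes an ideal gas, a fixed collision cross section and a Poisson
distribution for the number of collisions, none of which is stated in
this chapter. The cross section and path length below are hypothetical
placeholders, not measured values for the TSQ-70 or for promazine.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
PA_PER_TORR = 133.322


def mean_collisions(pressure_mtorr, cross_section_m2, path_length_m,
                    temperature_k=298.0):
    """Average number of collisions = gas number density x cross
    section x path length (ideal gas assumed)."""
    pressure_pa = pressure_mtorr * 1e-3 * PA_PER_TORR
    number_density = pressure_pa / (BOLTZMANN_J_PER_K * temperature_k)
    return number_density * cross_section_m2 * path_length_m


def collision_probability(k, mean):
    """Poisson probability of exactly k collisions (an assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Hypothetical ion cross section (20 square angstroms) and cell length.
CROSS_SECTION_M2 = 2.0e-19
PATH_LENGTH_M = 0.15

for pressure in (0.4, 1.2, 1.8):
    n = mean_collisions(pressure, CROSS_SECTION_M2, PATH_LENGTH_M)
    multiple = 1.0 - collision_probability(0, n) - collision_probability(1, n)
    print(f"{pressure} mtorr: mean = {n:.2f}, P(2 or more) = {multiple:.2f}")
```

Under these assumptions the mean number of collisions is proportional to
pressure, so the three database pressures differ mainly in how often an
ion undergoes a second or third collision.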

Figure 2.4: Daughter spectra of the molecular ion of promazine with a
collision gas (argon) pressure of (a) 0.4 mtorr, (b) 1.2 mtorr
and (c) 1.8 mtorr (CE = 30 eV).

Figure 2.5: Daughter spectra of the m/z 199 fragment ion of promazine
with a collision gas (argon) pressure of (a) 0.4 mtorr, (b) 1.2
mtorr and (c) 1.8 mtorr (CE = 30 eV).

[Figure 2.6: Relative abundance of the m/z 56 ion of meperidine plotted
against collision energy for three different collision gas pressures;
the caption is not legible in the source.]

The major goal in establishing a collision energy and collision
gas pressure standard for a reference database is to ensure that the
collision products obtained for identical parent ions are reproducible.
The standard will ensure that parent ions encounter the same
number of collisions (on average) at the same collision energy.
Figure 2.7 demonstrates the reproducibility of the daughter spectra
of a particular m/z 141 ion obtained from six different barbiturate
compounds. Figure 2.8 shows similar consistency in the daughter
spectra of the m/z 91 ion from six different phenols.

[Figure 2.7: Daughter spectra of the m/z 141 ion obtained from
different barbiturate compounds; the caption is not legible in the
source.]

[Figure 2.8: Daughter spectra of the m/z 91 ion obtained from
different phenols; the caption is not legible in the source.]

There are two major collision gas regimes that are routinely
used to obtain daughter spectra. The first is single-collision
conditions, which have the advantage of detecting bona fide neutral
losses (i.e. a parent ion dissociates to a daughter ion via one collision
and no daughter ions are further dissociated by additional
collisions with the target gas). The second pressure regime is
multiple-collision conditions. These conditions have the advantage of
producing more fragment ions than single-collision conditions,
especially for parent ions with small cross sections. Thus, two
complete MS/MS databases were acquired to use in generating MAPS
rules. A third database was created which contains MS/MS data for
some of the reference compounds at a third collision gas pressure.
This third pressure is about 4.5 times the single-collision gas
pressure (1.8 mtorr versus 0.4 mtorr; see Table 2.1).
The data from this database were used mainly for evaluation
purposes rather than for rule generation. Table 2.1 summarizes the
standard conditions used in creating the three reference databases.
The collision energy selected was 30 eV for all three databases. Three
different collision gas pressures were selected so that the possibility
of using different pressure regimes as ancillary experiments to
reduce the number of candidate structures could be assessed.

 

Collision energy, ELAB:     30 eV
Collision gas:              argon
Collision gas pressure:

    Reference DB    Pressure*
    #1              0.4 mtorr (2.0 x 10^-6 torr)
    #2              1.2 mtorr (6.0 x 10^-6 torr)
    #3              1.8 mtorr (1.0 x 10^-5 torr)

* The first reading is the average convectron gauge reading, while the
reading in parentheses is the indicated manifold reading.

Table 2.1: Standard instrumental conditions for creation of MAPS
reference databases using a Finnigan TSQ-70 TQMS.

Automated Acquisition of MS/MS Spectra

Data systems which allow for automated data collection on
commercial TQMS instruments have become increasingly available
over the last few years. This important development has spurred
the acquisition of MS/MS spectra for inclusion in the MAPS reference
databases. The instrument used for this work was a Finnigan TSQ-70
(triple stage quadrupole) equipped with a gas chromatographic inlet
and a direct inlet. An instrument control language (ICL) is provided
with this instrument, which allows the user to pre-program sample
introduction, ionization and spectral acquisition. The following
paragraphs describe the experimental method used to acquire
MS/MS spectra of the reference compounds for inclusion in the
reference databases.

MS/MS fragmentation maps using EI ionization for a variety of
standard compounds were acquired to build the reference databases.
Each compound possesses an assortment of substructures so it is
probable that one compound will contribute spectral features to
several different substructure identification rules. A fragmentation
map consists of the primary mass spectrum and daughter spectra for
each ion that had an intensity greater than 1% of the base peak in
the primary mass spectrum. Daughter spectra were not collected for
peaks less than 1% of the base peak to decrease the scan time
required to take a complete map and to avoid unsuitable daughter
spectra due to insufficient signal. As shown in Table 2.1, daughter
spectra for each reference compound were collected at two collision
gas pressures, 0.4 mtorr (2.0 x 10^-6 torr indicated pressure on the
manifold gauge) and a pressure three times higher, 1.2 mtorr
(6.0 x 10^-6 torr manifold). The first pressure provides single
collision conditions while the latter pressure is closer to commonly
reported CAD pressures on the TSQ-70. A third fragmentation map
was also collected at a third collision gas pressure (1.8 mtorr,
1.0 x 10^-5 torr manifold) for some of the standards.

The collision gas pressure used to acquire daughter spectra was
set using an indirect indication of the pressure in Q2. The stability of
the convectron gauge which reads the pressure in Q2 was insufficient
for setting a specific pressure on a routine basis. The relative
stability of the manifold ion gauge versus the Q2 convectron gauge is
shown in Figure 2.9. The peak area of the m/z 69 fragment ion of
the calibration compound perfluorotributylamine (PFTBA) is plotted
against collision gas pressure as read from the manifold ion gauge
and the Q2 convectron gauge. The manifold ion gauge reads the
pressure in the manifold (low vacuum) region around the quadrupole
assembly and can give an indirect measure of the collision gas
pressure in Q2 from the leakage of collision gas into the manifold
region. The smooth curve obtained using the manifold ion gauge
readings and the reproducible spectra shown in Figures 2.7 and 2.8
indicate that the manifold gauge can successfully be used to
reproducibly set the collision gas pressure for daughter scan

experiments.

[Figure 2.9, plots not reproduced. Panel (a): "Pressure Study using
Indicated Pressure Readings from Manifold Ion Gauge"; x-axis: collision
gas pressure (x 10^-6 torr), ion gauge (manifold) reading. Panel (b):
"Pressure Study using Pressure Readings from Q2 Convectron Gauge";
x-axis: collision gas pressure (mtorr), convectron gauge reading.]

Figure 2.9: Plots of collision gas pressure measured by (a) the
manifold ion gauge and (b) Q2 convectron gauge versus
peak area of the m/z 69 fragment ion of
perfluorotributylamine (PFTBA).

[Figure 2.10, spectra not reproduced. Panel (a): PFTBA, Q1 scan;
panel (b): PFTBA, Q3 scan; x-axis: m/z 100 to 600.]

Figure 2.10: Primary mass spectra of PFTBA using (a) Q1 and
(b) Q3 as the scanning quadrupole.

Before any experiments can be performed on any mass
spectrometer, the instrument must be properly tuned to ensure
proper mass calibration and ion transmission. This is accomplished
by monitoring the mass spectrum of a calibration compound while
adjusting the tune parameters. This process continues until the mass
spectrum matches the “accepted” spectrum for the calibration
compound. Typical mass spectra of PFTBA for Q1 and Q3 scans are
shown in Figure 2.10. While adjusting the tune parameters, it is
recommended that the user check to ensure that m/z 69 is the base
peak, that m/z 131 and m/z 219 are approximately 50% of the base
peak and that m/z 502 and m/z 614 are discernible (over 1% of the
base peak). The mass calibration of these ions should not deviate
from the actual values by more than 0.1 amu, since masses that fall
within +/- 0.1 amu of a “half mass” (e.g. 37.5 amu) are rounded to
the half-mass value. Masses outside this range are rounded to the
nearest unit mass for use in the MAPS software.
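
The rounding rule described above can be sketched as follows. This is a minimal illustration in Python, not the MAPS or data transfer code itself; the function name `round_mass` and the `tol` parameter are hypothetical, and only the 0.1 amu window is taken from the text.

```python
import math


def round_mass(mz: float, tol: float = 0.1) -> float:
    """Round a centroid mass the way the text describes (assumed reading).

    Masses within +/- tol of a half mass (x.5) are kept as that half mass;
    all other masses are rounded to the nearest unit mass.
    """
    half = math.floor(mz) + 0.5
    # A small epsilon guards against floating-point noise at the window edge.
    if abs(mz - half) <= tol + 1e-9:
        return half
    # floor(mz + 0.5) rounds to the nearest integer; exact .5 values never
    # reach this branch because they are caught by the half-mass test above.
    return float(math.floor(mz + 0.5))
```

Under this reading, a centroid reported at 67.45 amu is stored as 67.5, while one at 68.9 amu is stored as 69. This also shows why a calibration error larger than 0.1 amu matters: it can push a genuine half-mass ion out of the window, or a unit-mass ion into it.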

Four experiments were performed to collect the MS/MS data of
each standard. All of the standards used were solids and were
introduced into the ion source via a direct insertion probe (DIP). The
samples were volatilized by heating the probe tip according to a
predefined temperature program. The first experiment used an ICL
procedure called KHTIC, shown in Figure 2.11, and a probe procedure
called KHPROBE to determine the total ion current generated versus
probe temperature. The results of this experiment were used to
create a probe temperature program to volatilize the sample at a
relatively constant rate. The reconstructed total ion current obtained
for promazine is plotted against time in Figure 2.12. The
reconstructed ion currents obtained for m/z 284 (the molecular ion of
promazine) and for the m/z 199 fragment ion are also shown in this
figure. As indicated in the KHTIC procedure, each scan is a Q3 mass
spectrum (Q1 and Q2 are in "rf only" mode, that is, they pass all ions)
and requires 1 second. Thus, Figure 2.12 shows that after about 20
seconds, the spectra acquired as promazine is heated off the probe
crucible remain relatively constant.

KHTIC.ICL
I=100 {set counter}
ON;EMULT=1000 {turn multiplier on and set voltage to 1000 V}
Q3MS 40,550,1.0 {set scan mode to a Q3 mass spectrum,
scanning from 40 to 550 amu in 1 second}
CENT {collect centroid data}
ASTART {start acquiring data to the disk}
WHILE I > 0 {begin data acquisition loop}
GO;STOP {acquire one scan}
I-=1 {decrement counter}
END {end loop}
OFF {turn multiplier off}
ASTOP {stop acquiring data to the disk}

Figure 2.11: The ICL procedure used to repetitively acquire 100
primary mass spectra for characterization of probe
temperature programs.
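
The reconstructed ion current traces discussed above can be illustrated with a short sketch. It assumes each acquired scan is available as a mapping from nominal m/z to intensity; that representation, the function names and the sample numbers are hypothetical and are not the TSQ-70 data system's format.

```python
from typing import Dict, List

# One scan: nominal m/z -> intensity (assumed representation).
Scan = Dict[int, float]


def total_ion_current(scans: List[Scan]) -> List[float]:
    """Sum all ion intensities in each scan (one value per scan)."""
    return [sum(scan.values()) for scan in scans]


def ion_current(scans: List[Scan], mz: int) -> List[float]:
    """Intensity of a single m/z in each scan; 0.0 where the ion is absent."""
    return [scan.get(mz, 0.0) for scan in scans]


def time_axis(scans: List[Scan], seconds_per_scan: float = 1.0) -> List[float]:
    """Elapsed time of each scan, assuming a fixed scan duration."""
    return [i * seconds_per_scan for i in range(len(scans))]
```

With one-second scans, as in the KHTIC procedure, the scan index doubles as the time in seconds, so a plateau in `ion_current(scans, 284)` marks the period over which the sample is volatilizing at a roughly constant rate.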

[Figure 2.12: reconstructed total ion current and reconstructed ion
currents for m/z 284 and m/z 199 of promazine versus time; plot not
reproduced.]

A second experiment was performed to collect the primary
mass spectrum for the sample. The ICL procedure used was
KHMAP1, which is listed in Figure 2.13. This procedure sets the scan
mode, mass limits, scan rate and multiplier voltage, calculates which
ions are at least 1% of the base peak and appends the masses of
those ions to a user list. The procedure also estimates the total
scan time for collection of daughter spectra for all of the masses in
the user list.

KHMAP1.ICL
UCLR 1;UCLR 2;UCLR 3;UCLR 4 {clear user lists}
I=190;J=1;TSCANT=0 {initialize counters and variables}
CENT {acquire centroid data}
ON;EMULT=800 {turn multiplier on and set voltage to 800 V}
DOZE 30 {wait 30 seconds}
ASTART {start acquiring data to the disk}
Q3MS 40 190 2.0 {set scan mode to a Q3 mass spectrum from
40 to 190 amu in 2 seconds}
GO;STOP {acquire one spectrum}
MAXAREA=AREA(40,190,1) {return base peak intensity}
ASTOP {stop acquiring data to the disk}
OFF {turn multiplier off}
WHILE I>40 {begin loop}
PAREA=AREA(I) {return area of largest peak +/- .5 amu of I}
RATIO=PAREA/MAXAREA {calculate relative intensity}
IF RATIO > 0.010 {if > 1% of base peak...}
UAPP MASS(I),1 {...append the mass of the peak to user list 1}
SCANT=(I-15)/200 {calculate an estimated scan time}
TSCANT=TSCANT+SCANT {total scan times for each peak}
END {end if loop}
I-=1 {decrement mass counter}
END {end while loop}
UAPP TSCANT,3 {append estimated scan time to user list 3}

Figure 2.13: The ICL procedure used to acquire the primary mass
spectra found in the reference database.

The third experiment uses the KHMAP2 ICL procedure, shown
in Figure 2.14, to set the instrumental parameters for each daughter
spectrum. The ions selected in Q1 (parent ions) are read from user
list 1 created in the previous experiment. The KHMAP2 procedure
calculates a scan time for each mass which corresponds to a scan rate
of 200 amu/s. Thus, all daughter spectra in the reference database
have been collected at the same scan rate. The fourth experiment
also uses KHMAP2, but at a higher collision gas pressure (set
manually). Separate experiments were run to collect these spectra to
allow the collision gas pressure time to equilibrate. Sample residence
times in the source were insufficient in most cases to allow the ICL
procedure to "doze" while the collision gas was equilibrating. For
some compounds, data for a third pressure were also collected using
this procedure so that three data points would be available for the
determination of reaction orders of daughter ions.

KHMAP2.ICL
T1=MINUTE*60+SECOND {return current time}
ON;EMULT=1200 {turn multiplier on and set voltage to 1200 V}
CENT;J=1 {acquire centroid data and set variable}
DOZE 20 {wait 20 seconds}
ASTART {start acquiring data to the disk}
K = USIZE 1 {return the number of scans to be acquired}
WHILE J <= K {start data acquisition loop}
PMASS = ULIST 1,J {get mass J in user list 1}
SCANT=(PMASS-15)/200 {calculate a scan time corresponding
to a scan rate of 200 amu/sec}
%1=MANPR*1000000 {return manifold gauge reading, scale}
UAPP %1,2 {append scaled reading to user list 2}
%2=CPR {get Q2 convectron gauge pressure reading}
UAPP %2,4 {append pressure reading to user list 4}
DAU PMASS,10,PMASS+5,SCANT,-30 {set scan mode to a
daughter spectrum of pmass from 10 to pmass+5 amu in
scant seconds with collision energy of 30 eV}
GO;STOP {acquire daughter spectrum}
J+=1 {increment counter}
END {end while loop}
T2=MINUTE*60+SECOND {get end time}
ET=T2-T1 {calculate elapsed time}
UAPP ET,3 {append elapsed time to user list 3}
ASTOP;OFF {stop acquiring data to disk and turn multiplier off}

Figure 2.14: The ICL procedure used to acquire the daughter
spectra found in the reference database.

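
The selection and timing logic of the KHMAP1 and KHMAP2 procedures can be restated as a short Python sketch. It is an illustration only, not the ICL code: the function names and the dictionary representation of a primary spectrum are assumptions, and the peak areas in the usage note are invented. The 1% threshold and the (mass - 15)/200 scan-time expression are taken from the listings.

```python
from typing import Dict, List


def select_parent_ions(primary: Dict[int, float],
                       threshold: float = 0.01) -> List[int]:
    """Return masses whose peak area exceeds `threshold` of the base peak."""
    base_peak = max(primary.values())
    return sorted(m for m, area in primary.items()
                  if area / base_peak > threshold)


def estimated_scan_time(parent_mass: float, scan_rate: float = 200.0) -> float:
    """Scan time in seconds, mirroring SCANT=(PMASS-15)/200 in the listings."""
    return (parent_mass - 15) / scan_rate


def plan_map(primary: Dict[int, float]) -> Dict[int, float]:
    """Map each selected parent mass to its estimated daughter scan time."""
    return {m: estimated_scan_time(m) for m in select_parent_ions(primary)}
```

For a made-up primary spectrum `{284: 1000.0, 199: 400.0, 86: 5.0, 58: 11.0}`, the peak at mass 86 (0.5% of the base peak) is skipped and daughter scans are planned for masses 58, 199 and 284. Because the scan time scales with the parent mass, every daughter spectrum is acquired at the same scan rate.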
Data Transfer Software and Computer Facilities

The DUMP utility was used on the TSQ-70 data system to create
ASCII formatted files (with a file extension ".DAT") for the files
containing the primary mass spectrum, the daughter spectra at
pressure 1, the daughter spectra at pressure 2 and, if appropriate,
the daughter spectra at pressure 3. These files were then
transferred to a VAXstation 3200. The TSQXFER program was then
used to convert the ASCII data into a MAPS-compatible format (a LISP
list format with a file extension ".LSP"). A program called GENF has
recently been written which generates the feature-bucket list used
by MAPS in the rules generation process from the various ".LSP"
files. This program was written in C and optimized for speed. The
LISP function "GEN-FEATURE-BUCKETS" was very inefficient when
the large data set was used: the computation time for generating the
feature buckets was 6 hours, whereas the C version required 46 minutes
(real time) to process the entire data set on the VAXstation 3200.
The computing facilities available for the ACES software are
shown in Figure 2.15. The TSQ-70 data system runs on a PDP-11/73
computer and provides an interface to the instrument control
computer, an instrument control language (ICL) to allow automated
data collection, disk storage for data and library files and data
manipulation software for display of the MS/MS spectra. Two DEC
VAXstation computers are available to run the ACES software. The
VAXstation 3200 runs the ACES software while the AI VAXstation is
used as a general purpose group computer and for auxiliary storage
of data files. A Macintosh II using a Tektronix 4014 terminal
emulation package is used to allow remote data processing
capabilities, access to computer drafting packages and a Postscript
printer. All these computers are linked by DECnet to allow file
transfers and remote logins. Once the MS/MS maps are acquired,
they are stored in a reference database on the AI VAXstation.

[Schematic not reproduced.]

Figure 2.15: Schematic of the computing facilities available for
running the ACES programs.

Standard Compounds Selected for the Reference Database

An important area of application of the MS/MS technique is in
the analysis of pharmaceuticals. These analyses involve the
screening of formulations for active drug components, impurities and
synthetic markers, structural analyses of new drugs and quantitation
of drug metabolites in biological fluids [7]. Since pharmaceutically
active drugs often have similar structures, MS/MS can be used to
establish the structures of variants of more commonly encountered
drugs [7]. This last point is especially important in the analysis of so-
called “designer drugs” [8]. The appellation “designer drug” stems
from the process of substituting functional groups on known drugs to
avoid regulation of the possession and distribution of the original
drug. Unfortunately, these substitutions can radically change the
potency of a drug and thus lead to accidental overdoses [8]. The
generation of substructure identification rules will provide
characteristic fragmentation patterns in the MS/MS data space for a
variety of drug compound classes. In fact, the rules generated by the
MAPS software represent the first step in mixture analysis, which is
the characterization of important fragment ions by examination of
spectra of standard compounds. These fragment ions are often used
to identify and quantitate drugs in complex mixtures as well as
to elucidate the structures of pure compounds.

Many of the compounds selected for use in creating the
reference databases are regulated drug compounds and are grouped
according to the following classes: opioids, stimulants, antipsychotics,
morphine substitutes and sedative hypnotics. Of the 105 compounds
that were included in the reference databases, 84 were obtained
from the Theta Corporation in “Theta-Kits”. Each standard in these
kits was dissolved in an appropriate solvent (usually methanol), with
the concentration usually being 1 mg/ml. The sample crucible for
the direct insertion probe holds approximately 7 microlitres. Thus,
filling the crucible with standard solution and allowing the solvent to
evaporate delivers approximately 7 micrograms of the standard into
the crucible. The other 21 standards used were obtained from
General Motors Research. All of these standards were phenols. The
reference names, compound names and CAS numbers for the
standards used to create the reference databases are listed in Table
2.2. The nominal mass, molecular formula and the number of
daughter spectra collected for each of the 105 standards are shown
in Table 2.3. This table shows the diversity of elements and
molecular weights present in the databases. The masses shown in
bold face type in Table 2.3 have entries which are isomeric, that is,
there are other entries with the same elemental composition. A total
of 14,097 spectra were collected for the reference databases. Each
entry in a database consists of a primary mass spectrum and
daughter spectra for each of the ions which had an intensity greater
than 1% of the base peak in the primary mass spectrum. The
database for the third pressure (P3) does not include data for all of
the 105 standards. This database was intended to provide a third
data point for assessing the effect of increasing collision gas pressure
for selected reference database compounds.

[Table 2.2: Reference names, compound names and CAS numbers of the
standards used to create the reference databases. Table contents not
reproduced.]

[Table 2.3: Nominal mass, molecular formula and number of daughter
spectra collected for each of the 105 standards. Table contents not
reproduced.]

Purity of the Reference Database Standards

The primary mass spectra of the reference standards were
checked either by a matching program or by visual comparison to
hardbound spectral collections to ensure their purity. In some cases,
the purity of these compounds was confirmed by the supplier using
gas chromatography. If chromatographic introduction were practical
for MS/MS, the problem of the purity of the reference spectra would
be easier to resolve. An interesting advantage of using the MAPS
software to generate substructure identification rules, however, is
that the standards do not all have to be pure. The reason for this is
that the filters used by the MAPS software to select the spectral
features for inclusion in the rules discriminate against impurities.
These filters are discussed in detail in Chapter 3 but a brief
description is provided here to illustrate their use in excluding
spectral features due to impurities. The equations used to calculate
the two values used by the filters are provided in Figure 2.16. The
equations shown are for the calculation of the uniqueness (U) and
correlation (C) of each of the spectral features contained in the
database. The U value is a measure of the specificity of a particular
spectral feature for a given substructure while correlation is a
measure of the frequency with which a spectral feature appears in the
MS/MS data space with the substructure. Minimum U and C values
are specified by the user before the MAPS rules are generated. Only
those spectral features which exceed the specified minimums are
included in the rule for a given substructure.

 

Uniqueness Filter:

\[ U = \frac{\text{\# of cmpds w/ SS and F}}{\text{\# of cmpds w/ F}} \]

Correlation Filter:

\[ C = \frac{\text{\# of cmpds w/ SS and F}}{\text{\# of cmpds w/ SS}} \]

U: uniqueness value    cmpds: compounds    F: feature
C: correlation value   SS: substructure

Figure 2.16: Equations used to calculate uniqueness and correlation
values for the spectral features found in the reference
databases.
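
The two ratios in Figure 2.16 can be computed directly from a list of database entries. The sketch below is a hedged illustration, not the MAPS implementation: the entry layout (a dictionary holding a set of substructure names and a set of feature labels), the function name and the feature labels in the usage note are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One database entry (assumed layout): the substructures a compound
# contains and the spectral features observed in its MS/MS map.
Entry = Dict[str, Set[str]]


def uniqueness_and_correlation(entries: List[Entry],
                               substructure: str,
                               feature: str) -> Tuple[float, float]:
    """Return (U, C) for one substructure/feature pair.

    U = (# compounds with SS and F) / (# compounds with F)
    C = (# compounds with SS and F) / (# compounds with SS)
    """
    with_f = [e for e in entries if feature in e["features"]]
    with_ss = [e for e in entries if substructure in e["substructures"]]
    both = [e for e in with_ss if feature in e["features"]]
    u = len(both) / len(with_f) if with_f else 0.0
    c = len(both) / len(with_ss) if with_ss else 0.0
    return u, c
```

As a usage example, take five toy entries: four contain SS143, two of those show a feature labelled "D:141", and one compound without SS143 also shows it. The feature then has U = 2/3 and C = 2/4, so it would pass a 33% minimum on both filters.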

Impurity features can contaminate the daughter spectra in two
ways: the impurity ion must be at least 1% of the base peak in the
primary mass spectrum, or it must coincide with a spectral feature
due to the standard (which is also at least 1% of the base peak). If
the first requirement is met, there will be a daughter spectrum of the
impurity parent ion in the MS/MS map of the standard. If the second
requirement is met, the particular daughter spectrum of the standard
will contain some features due to the impurity. The following example
illustrates the ability of the U and C filters to discriminate against
impurities. There are 10 instances of the substructure “SS143”, the
base substructure for barbiturates. This substructure is shown in
Figure 2.17. Suppose that three of these compounds are contaminated
with the same impurity, which yields a number of spectral features in
the MS/MS maps of the standards. The maximum U and C value that can
be calculated for any spectral feature which is due to the impurity is
30%. If the minimum U or C value is set to 33% (that is, one-third of
the compounds containing SS143 must possess the specified feature
in their MS/MS maps), then the spectral features in the database due
to the impurity will not be included in the rule because they cannot
meet the U and C filter requirements.
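
As a worked check of the correlation part of this example, using the definition of C from Figure 2.16 and assuming the impurity feature F appears in all three contaminated standards and in no other compound containing the substructure:

```latex
\[
C_{\max} = \frac{\text{\# of cmpds w/ SS143 and F}}{\text{\# of cmpds w/ SS143}}
         = \frac{3}{10} = 0.30 < 0.33
\]
```

The impurity feature therefore falls below a 33% correlation minimum however intense it is in the three contaminated maps.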

 

[Structure drawing of substructure SS143 not reproduced;
X = any substituent.]

Figure 2.17: Substructure definition for “SS143” representing the
base substructure found in the structure of
barbiturates.

While the U and C filters help reduce the need for absolute
purity of the standards, they do not remove all the purity
requirements. The standard compound must still be the major
component of an impure standard so that daughter spectra are
acquired for the key ions produced by the standard compound. A
largely impure standard amounts to a blank entry in the database
and therefore has little value in generating the MAPS rules.
Reasonably pure standards will also allow lower U and C values to be
specified for use in generating “feature combination” rules. Thus,
some effort must be made to ensure the purity of the standards used
to create the reference databases.

Library searches were performed for the primary mass spectra
of all of the reference compounds. A standard was considered as
sufficiently pure if the library search returned the correct compound
as one of the three most probable compounds in the search hit list.
Seventy-three of the 105 reference compounds used to create the
reference database met this restriction. Of the remaining 32
compounds, 21 were not present in the NBS library [9] provided with
the TSQ-70 and could not be checked with a library search. The 21
compounds not in the library were checked against a hardbound
collection of mass spectra to ensure the major ions in the primary
mass spectrum matched those in the library mass spectrum. There
were eleven compounds that failed the initial library search test and
were potentially impure. The spectra for four of these compounds
were found to be visually indistinguishable from the hardbound
library spectra. Interestingly, the spectra for these four compounds
had very few major peaks. Thus, the matching algorithm had only a
few peaks to match and failed to recall the correct compound from
the database. This situation argues for the use of alternative
ionization methods since EI ionization can yield only a few major
fragment ions in certain compounds. Spectra for several of the
remaining seven compounds exhibited low ion current so the ion
statistics (and thus, the relative intensity of the peaks) were not
comparable to the library spectra. The data for these seven
compounds were acquired again and the old data deleted from the
database. Thus, the spectral features found in the database are

representative of the compounds used to acquire the spectra.
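
The purity criterion used above (the correct compound must appear among the three most probable library hits) can be written as a one-line test. This is a sketch under stated assumptions: the function name is hypothetical, the hit list is assumed to be ordered from most to least probable, and the names in the usage note are illustrative rather than taken from an actual search.

```python
from typing import List


def is_sufficiently_pure(hit_list: List[str], compound: str,
                         top_n: int = 3) -> bool:
    """True if `compound` is among the first `top_n` library search hits.

    `hit_list` is assumed to be ordered from most to least probable.
    The comparison ignores letter case.
    """
    top_hits = [hit.lower() for hit in hit_list[:top_n]]
    return compound.lower() in top_hits
```

For example, a search returning `["codeine", "morphine", "hydrocodone", "oxycodone"]` would accept a morphine standard (second hit) but reject an oxycodone standard (fourth hit), which would then be flagged for a visual comparison against the hardbound collection.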
Irregularities in the Reference Database

Examination of the feature-buckets obtained from the
reference database revealed a disconcerting number of “half mass”
entries in the daughter (D), neutral-loss (NL) and parent-daughter
(PD) buckets. This occurrence was unexpected in that “collisional
ionization” reactions (i.e. M+ + T -> M2+ + T + e-) do not usually occur
under low energy CAD [7]. Also, there were a number of misclassified
neutral loss entries (e.g. neutral loss of 9 amu). Examination of the
profile data (the database contains centroid data) indicates two
sources for these entries in the feature-buckets. The first is the
presence of doubly charged parent ions which dissociate with charge
retention into doubly charged daughter ions. These ions are
potentially useful for structure determination; however, they seem to
be formed only in low concentrations. The other source of the “half
mass” entries is erroneous peak finding.

A partial primary mass spectrum collected in profile mode for
hydroxyamphetamine is shown in Figure 2.18. The instrument was
tuned for unit resolution from 12 to 700 amu. The peaks at m/z 59
and m/z 68, however, are severely broadened. The peaks at m/z 57,
65 and 71 possess the peak shape typically obtained on this
instrument. Therefore, the peak broadening is likely due to the
presence of peaks at m/z 58.5 and 67.5 (doubly charged, z = 2, odd
mass ions). Daughter spectra of these ions were taken to establish
their structure and are shown in Figure 2.19. The scan range for Q3
was doubled to detect any singly charged daughters with m/z values
larger than that of the parent ion. Significant daughter ions were
obtained at m/z 116, m/z 89 and m/z 67 for the m/z 58.5 parent ion.
For the m/z 67.5 parent ion, daughters were obtained at m/z 134 (loss
of H+), m/z 116 (loss of H3O+) and m/z 107 (loss of C2H5+). The m/z 19
and m/z 29 daughter ions formed from these losses are also
observed in the spectrum.

[Figure 2.18: partial primary mass spectrum of hydroxyamphetamine
collected in profile mode; spectrum not reproduced.]

[Spectra not reproduced.]

Figure 2.19: Daughter spectra of the (a) m/z 58.5 and (b) m/z 67.5 fragment ions
of hydroxyamphetamine (Elab = 2 eV, p = 1.8 mtorr).

The daughter ion at m/z 58.5 (obtained from the m/z 67.5 parent
ion) is of particular interest in explaining apparently spurious
neutral losses (e.g. 9 amu). The software that converts raw data into
LISP format calculates neutral losses by subtracting the mass of the
daughter ion from the mass of the parent. Thus, for the m/z 67.5 ->
m/z 58.5 reaction, a neutral loss of 9 amu is calculated. In reality,
the mass of the m/z 67.5 parent ion is 135 amu and the mass of the
m/z 58.5 daughter ion is 117 amu. The actual neutral loss is,
therefore, 18 amu, which corresponds to a neutral loss of water (with
the retention of both charges). The loss of 19 amu (H3O+) to form the
m/z 116 daughter ion was not found in the reference database
because this ion was beyond the mass range selected for Q3. In
addition, the optimal instrumental conditions required to observe the
larger daughter ions of the doubly charged parents were different
from those used for collecting the reference database spectra. The
conditions used to obtain the spectra shown in Figure 2.19 were low
collision energy (Elab = 2 eV) and multiple collision conditions (1.2
mtorr Ar). Figure 2.20 shows the daughter spectrum for the m/z
67.5 parent ion obtained using the same conditions as those used for
collecting the reference databases (Elab = 30 eV, p = 0.4 and 1.2
mtorr). The only major ions obtained using these conditions were
m/z 58.5 and m/z 29.0.
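
The arithmetic behind the spurious 9 amu loss can be made explicit with a small sketch that takes the charge states into account. The function name and signature are hypothetical; this is not the data transfer software, which (as described above) subtracts m/z values directly.

```python
def mass_lost(parent_mz: float, daughter_mz: float,
              parent_z: int = 1, daughter_z: int = 1) -> float:
    """Mass difference between parent and daughter ions in amu.

    Each m/z value is multiplied by its charge to recover the ion mass
    before subtracting. With both charges left at 1 this reduces to the
    plain m/z subtraction.
    """
    return parent_mz * parent_z - daughter_mz * daughter_z
```

Treating both ions as singly charged gives `mass_lost(67.5, 58.5) == 9.0`, the misclassified entry. Supplying the actual charges gives `mass_lost(67.5, 58.5, 2, 2) == 18.0`, the loss of water, and `mass_lost(67.5, 116, 2, 1) == 19.0` for the charged H3O+ loss that leads to the singly charged m/z 116 daughter.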

Doubly charged daughter ions have been observed in the
spectra for a number of other reference compounds. Morphine, for
instance, produces a doubly charged ion at m/z 108.5 (5% RA in the
primary mass spectrum). The daughter spectrum obtained for this
parent ion is shown in Figure 2.21 (a). A daughter spectrum was also
taken for m/z 142.5 (half the molecular weight of morphine) to
determine if there was an appreciable amount of doubly charged
molecular ion produced. The daughter spectrum obtained is shown
in Figure 2.21 (b). A loss of H+ from the doubly charged molecular
ion can be seen in this spectrum (m/z 284) as well as a substantial
amount of m/z 108.5. Doubly charged ions are also present in the
primary mass spectrum of oxymorphone at m/z 112.5 (4% RA), m/z
113.5 (2% RA), m/z 99.5 (2% RA), and m/z 98.5 (4% RA). Daughter
spectra were taken for these parent ions and for m/z 150.5 (half the
molecular weight of oxymorphone). The daughter spectra obtained
for these ions are shown in Figures 2.22 - 2.24.

Figure 2.20: Daughter spectrum of the m/z 67.5 fragment ion of
hydroxyamphetamine with Elab = 30 eV.

Figure 2.21: Daughter spectra of the (a) doubly charged molecular ion and (b) the
m/z 108.5 fragment ion of morphine (Elab = 2 eV, p = 1.8 mtorr).

The other source of the “half-mass” entries is erroneous peak
finding. The peak at m/z 66 has a “shoulder” which can be
erroneously assigned as a separate peak and rounded to a
“half-mass” by the data transfer software. Since both the shoulder
and the major portion of the peak are present, no data are lost by
ignoring the “half-mass” peak. However, if the utility of the real
“half-mass” ions is to be explored, there is a possibility that some
of the spurious peaks may be included in a rule. Therefore,
“half-mass” entries in a MAPS rule must be manually verified.

The mass filter normally used by the MAPS software does not
allow “half-mass” features to be considered for inclusion in the
substructure identification rules. Also, the rules generated so far do
not include the misclassified neutral losses such as 9 amu. Judicious
choice of the minimum U and C values used by the MAPS software
effectively prevents the inclusion of these features. If one of these
features is passed by the U and C filters, then daughter spectra for
the associated compounds should be taken to determine the actual
value (i.e. 18 amu vs. 9 amu).
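
The 9 amu versus 18 amu discrepancy follows directly from treating
m/z values as masses. A minimal Python sketch of the correction is
given below. It assumes that the charge states of both ions are
known; the function name and its parameters are illustrative and are
not part of the data transfer software.

```python
def neutral_loss(parent_mz, daughter_mz, parent_charge=1, daughter_charge=1):
    """Mass difference between parent and daughter ions.

    The charge states are supplied by the caller (an assumption of this
    sketch); a triple quadrupole measures only m/z.
    """
    parent_mass = parent_mz * parent_charge
    daughter_mass = daughter_mz * daughter_charge
    return parent_mass - daughter_mass


# Treating both ions as singly charged reproduces the spurious entry.
spurious = neutral_loss(67.5, 58.5)
# Treating both ions as doubly charged gives the loss of water.
actual = neutral_loss(67.5, 58.5, parent_charge=2, daughter_charge=2)
print(spurious, actual)  # 9.0 18.0
```

The same function applies only when both charges are retained. A
charge-separation reaction such as m/z 67.5 -> m/z 116 removes a
charged fragment, so the 19 amu difference is not a neutral loss.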

Figure 2.22: Daughter spectra of the (a) m/z 98.5 and (b) m/z 99.5 fragment ions
of oxymorphone (Elab = 2 eV, p = 1.8 mtorr).

Figure 2.23: Daughter spectra of the (a) m/z 112.5 and (b) m/z 113.5 fragment
ions of oxymorphone (Elab = 2 eV, p = 1.8 mtorr).

Figure 2.24: Daughter spectrum of the doubly charged molecular ion of
oxymorphone (m/z 150.5).

References

1. Small, G.W., Anal. Chem. 59, 535A (1987).

2. Wade, A.P., Palmer, P.T., Hart, K.J., Enke, C.G., Anal. Chim. Acta
215, 169 (1988).

3. Dawson, P.H., Sun, W., Int. J. Mass Spectrom. Ion Processes 55,
155 (1983/1984).

4. Martinez, R.I., Dheandhanoo, S., [journal title illegible],
1 (1988).

5. Martinez, R.I., [journal title illegible], 127 (1989).

6. Palmer, P.T., Ph.D. Dissertation, Michigan State University, East
Lansing, MI, 1988.

7. Busch, K.L., Glish, G.L., McLuckey, S.A., "Mass Spectrometry /
Mass Spectrometry: Techniques and Applications of Tandem
Mass Spectrometry", VCH Publishers, Inc., New York, 1988.

8. Baum, R.M., [journal title illegible] 61, 7 (1985).

9. NBS/NIH/EPA/MSDC Mass Spectral Database, National Technical
Information Service (NTIS), 5285 Port Royal Road, Springfield,
VA 22161.

CHAPTER 3

Generation of MAPS
Substructure Identification Rules

Introduction

 

Reliable substructure identification rules are one of the crucial
elements in the Automated Chemical Structure Elucidation System
(ACES). The rules used in this system are generated by the MAPS
(Method for Analyzing Patterns in Spectra) software [1-4]. This
software utilizes the structural information contained in mass
spectrometry / mass spectrometry (MS/MS) data to formulate
spectral feature / substructure correlations. Other interpretive
systems have been devised based on spectral matching (e.g. STIRS)
and fragmentation rules (e.g. DENDRAL) [5-7].

As was discussed earlier, a spectral matching approach is
inappropriate for interpretation of MS/MS data due to the variability
of daughter spectra and the resulting lack of a library of daughter
spectra. The DENDRAL project took a knowledge engineering
approach to provide substructure identification rules. Knowledge
engineering is an artificial intelligence method in which the
knowledge of human experts regarding a specific problem domain is
captured in a format usable by computers. The objective was to
formulate rules, typically “IF-THEN” rules, to emulate the process by
which the human experts solve problems. This project was quite
successful in developing artificial intelligence (AI) technology but
has not been recognized as a success in mass spectrometry. The
limited results achieved in interpreting mass spectra are due
primarily to the incomplete understanding of the fragmentation
behavior of the huge variety of compounds that are analyzed by this
technique. Another significant disadvantage of this method is the
reliance on primary mass spectra, which have, in our experience, less
information for an interpretive approach than techniques such as
MS/MS. The MAPS software, on the other hand, uses an empirical
approach to derive the substructure identification rules. Few
assumptions, if any, are made by this software about the potential
fragmentation pathways that are open for any given substructure. All
pre-programmed information regarding the substructures is well
known and not subject to interpretation (e.g. when using a mass
filter, the elemental composition of the substructures and their
masses are used to limit the spectral features considered for each
substructure).

There are two types of MAPS rules. The first type of rule is
used to predict the presence of a substructure in the structure of an
unknown compound. These rules are referred to as inclusion rules
and have the general format:

“IF   <spectral feature f1> is present
 and  <spectral feature f2> is present
 and  <spectral feature fx> is present
 THEN substructure 'X' is present.”

The second type of rule is used to predict the absence of a
substructure and has the general form:

“IF   <spectral feature f1> is absent
 and  <spectral feature f2> is absent
 and  <spectral feature fx> is absent
 THEN substructure 'X' is absent.”

These rules are referred to as exclusion rules. This chapter is
devoted mainly to the generation and evaluation of inclusion rules.
The generation of more effective exclusion rules remains an open
area for further research.

The following section describes the evolution of the MAPS software
and is followed by a detailed description of the software. A new
method for generating the rules is introduced and the software
written to use this method is then described. This discussion
presents two different versions of the MAPS software, an interactive
LISP version and an optimized C version. An analysis of several of the
MAPS rules and a comparison of MAPS rules generated using MS/MS
data acquired under different collision conditions are provided in
Chapter 5.

MAPS Software Development

The original version of the MAPS software was written by Dr.
Adrian Wade in InterLISP-D for a Xerox 1108 Workstation [1]. This
software was subsequently modified by Dr. Peter Palmer to
incorporate a number of the features used for this work (MAPS
version I) [2]. Since the hardware resources (i.e. hard disk space,
physical memory and computation speed) of the Xerox computer were
limiting the number and types of experiments that could be run
using this system, the MAPS software was ported to Common LISP
running on a VAXstation computer [3-4]. This new code (MAPS
version II) was developed with the programming assistance of Chris
Weaver, with several functions being converted to C for speed by
another undergraduate, Drake Diedrich. Common LISP was chosen
because it runs on a number of computers with more computing
power than the Xerox 1108 (e.g. 80386-based PCs and DEC
VAXstation computers). A decision was made to run the software on
a DEC VAXstation 3200 rather than an 80386-based PC because
more physical memory was available to the software on the
VAXstation (i.e. 16 MB vs. 640 KB). Much was learned using the LISP
version of the MAPS software, but it was limited in some respects by
the inefficiency of LISP relative to other programming languages
such as C. Thus, a C version of MAPS (version III) was developed with
the programming assistance of Drake Diedrich. This transition is
often observed in AI-based projects: a LISP prototype is used to
solve problems with a previously unknown algorithm, and an optimized
C version is then used to increase the efficiency of the newly
discovered algorithm [8-10]. The following discussion focuses on
MAPS version II. The third version of MAPS is described in a
subsequent section.

The MAPS Software - Version II

The MAPS software requires several inputs in order to create
substructure identification rules. These inputs are summarized in
Table 3.1. The major data inputs are the “substructure buckets”
(SS-BUCKETS) and the “feature buckets” (FEATURE-BUCKETS). The
SS-BUCKETS provide the substructure content of each of the reference
compounds in an inverted format (each substructure is followed by
the compounds that contain it) to optimize the calculations by the
MAPS software. The LISP version of the substructure buckets has
the format:

(SS-reference-name-1 CMPD-name-a CMPD-name-b ...)
(SS-reference-name-2 CMPD-name-a CMPD-name-b ...)
(SS-reference-name-x CMPD-name-a CMPD-name-b ...)
(...)

A portion of the substructure buckets used in this work is provided
in Figure 3.1. The origin of the substructure buckets is discussed in
Chapter 4 since that chapter deals with the substructure search and
structure generation software. Similarly, the FEATURE-BUCKETS data
input is a list of the spectral features found in the MS/MS spectra of
the reference compounds, each followed by the compounds in which it
is observed. The spectral features used in creating the
FEATURE-BUCKETS are primary scan ions (PS), daughter scan ions
(DS), neutral loss masses (NL), and parent-ion / daughter-ion
combinations (PD). Selected portions of the FEATURE-BUCKETS used
in this work are shown in Figure 3.2. The SUBSTRUCTURES
information is contained in a file called “SUBSTRUCTURESDAT”. This
file contains atom definitions (i.e. chemical symbol, nominal mass,
and number of valences) and substructure definitions (i.e.
substructure reference name, nominal mass, and empirical formula).
This information, along with Umin and Cmin, is used by the MAPS
“filters” in selecting the spectral features to be included in a
substructure identification rule.

INPUT VARIABLE NAME   PURPOSE

SUBSTRUCTURES         a list of atom definitions (the number of
                      valences), the elemental compositions and
                      maximum masses allowed for each substructure
                      for use by the mass filter

SS-BUCKETS            a list containing each substructure and the
                      reference compound names which are associated
                      with the specified substructure

FEATURE-BUCKETS       a list containing each spectral feature and
                      the reference compound names which are
                      associated with the specified spectral feature

Umin                  the minimum uniqueness to be used by the
                      uniqueness filter

Cmin                  the minimum correlation to be used by the
                      correlation filter

Table 3.1: Summary of the inputs to the MAPS (v.II) software.

 

(...)

(SS131 MI4060)

(SS132 MI9492 MI7723 MI7688 MI64 MI5826 MI3696 MI1485
       MI1844 MI5755 MI5862 MI7044 MI7691 MI9202)

(...)

Figure 3.1: An excerpt from the “substructure buckets”, SS-BUCKETS.
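
The inverted “bucket” layout can be sketched in Python as follows.
The per-compound records and all names below are hypothetical; the
function illustrates the data layout only and is not the
GEN-SS-BUCKETS or GENF code.

```python
from collections import defaultdict


def invert(records):
    """Turn {compound: [items]} into {item: [compounds]} ("bucket" form)."""
    buckets = defaultdict(list)
    for compound, items in records.items():
        for item in items:
            buckets[item].append(compound)
    return dict(buckets)


# Hypothetical per-compound records (not taken from the reference database).
substructures = {
    "CMPD-a": ["SS1", "SS2"],
    "CMPD-b": ["SS2"],
}
features = {
    "CMPD-a": [("D", 198.0), ("PD", 198.0, 154.0)],
    "CMPD-b": [("D", 198.0)],
}

ss_buckets = invert(substructures)
feature_buckets = invert(features)
print(ss_buckets["SS2"])              # compounds containing SS2
print(feature_buckets[("D", 198.0)])  # compounds showing daughter ion 198
```

With both inputs in this form, the compounds that share a feature
and a substructure are found by intersecting two short lists rather
than by scanning every compound record.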

(...)

((P 198.0) GMR8 GMR10 GMR11 GMR17 GMR24 GMR25 MI64 MI592
           MI1125 MI1485 MI1844 MI2411 MI2423 MI2885
           MI3152 MI3696 MI3774 MI4687 MI4714 MI5297
           MI5755 MI5826 MI5862 MI6129 MI6208 MI6827
           MI6834 MI6837 MI6990 MI6998 MI7044 MI7688
           MI7691 MI7723 MI7978 MI8009 MI8268 MI8728
           MI9042 MI9141 MI9202 MI9492 MI9895)

(...)

((NL 198.0) GMR3 GMR12 GMR14 GMR17 GMR19 GMR24 GMR25
            MI64 MI680 MI1122 MI1125 MI1844 MI2166
            MI2411 MI2423 MI2885 MI3152 MI3691 MI3774
            MI4687 MI4714 MI5297 MI5826 MI5857 MI6086
            MI6129 MI6208 MI6827 MI6837 MI6881 MI7044
            MI7688 MI7691 MI7978 MI8009 MI8728 MI9141)

(...)

((D 198.0) GMR12 MI64 MI1125 MI1485 MI1844 MI2411 MI2423
           MI3152 MI3696 MI3774 MI4687 MI5297 MI5755
           MI5826 MI5862 MI6129 MI6208 MI6827 MI6834
           MI6837 MI7044 MI7688 MI7691 MI7723 MI8009
           MI9202 MI9895)

(...)

((PD 198.0 154.0) MI64 MI1485 MI1844 MI3696 MI5755 MI5826
                  MI5862 MI7044 MI7688 MI7691 MI7723 MI9042
                  MI9202)

(...)

Figure 3.2: An excerpt from the “feature buckets”, FEATURE-BUCKETS,
showing different features with the same nominal mass.

The first filter used by the MAPS software, the mass filter, reduces
the number of potential spectral features that can be correlated to a
given substructure to those that are plausible given the mass and
elemental composition of the substructure. In MAPS (v.II), this
reduction is performed by the RELEVANT function. This function
uses the mass and the elemental composition of the substructure to
calculate the masses of potential fragments of the substructure. The
RELEVANT function passes these masses to the uniqueness and
correlation filters, thus limiting the number of uniqueness and
correlation calculations. This function can be quite useful for
focusing the rule generation on those features which are directly
attributable to a substructure. The mass filter has the disadvantage,
however, of eliminating spectral features which may be due to larger
(and possibly not defined) substructures which encompass the
relevant substructure. The spectral features due to the larger
substructure can provide clues for defining new substructures. Since
there are tradeoffs to using the mass filter, it is possible to enable
and disable this filter using a compiler switch in the latest version of
MAPS.

The uniqueness and correlation filters are actually
implemented simultaneously. These filters calculate two values: the
uniqueness, which describes the specificity of a spectral feature for
a given substructure, and the correlation, which describes how
frequently the spectral feature appears when the specified
substructure is present. These values are then checked against the
minimum uniqueness and correlation values input by the user to
determine if the spectral feature should be placed in the rule for a
given substructure. The equations used to calculate these values are
given in Figure 3.3. The ability to describe the MS/MS spectral
features in this way is, in itself, quite useful for MS/MS
practitioners since these descriptors provide a means to rapidly
summarize the relevant information (i.e. with respect to
substructure, compound class, etc.) contained in a large body of
MS/MS reference data. For example, the “PD” spectral features in a
MAPS rule can be used to assist a user in selecting the specific CAD
reactions to monitor in solving a mixture analysis problem.
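
As a worked check of these two descriptors, the Python sketch below
computes the uniqueness and correlation of the PD (198 -> 154)
feature from the bucket excerpts in Figures 3.1 and 3.2. It assumes
that the SS132 bucket is the phenothiazine bucket (the figure does
not say so explicitly) and that the excerpts are complete for this
feature; the function name is illustrative and is not a MAPS
function.

```python
def uniqueness_and_correlation(feature_bucket, ss_bucket):
    """Return (U, C) in percent for one feature and one substructure."""
    both = len(feature_bucket & ss_bucket)
    u = 100.0 * both / len(feature_bucket)  # share of feature compounds with SS
    c = 100.0 * both / len(ss_bucket)       # share of SS compounds with feature
    return u, c


# Compound labels as read from Figures 3.1 and 3.2.
ss132 = set(
    "MI9492 MI7723 MI7688 MI64 MI5826 MI3696 MI1485 "
    "MI1844 MI5755 MI5862 MI7044 MI7691 MI9202".split()
)
pd_198_154 = set(
    "MI64 MI1485 MI1844 MI3696 MI5755 MI5826 "
    "MI5862 MI7044 MI7688 MI7691 MI7723 MI9042 MI9202".split()
)

u, c = uniqueness_and_correlation(pd_198_154, ss132)
print(round(u), round(c))  # 92 92
```

Twelve compounds appear in both buckets, out of 13 with the feature
and 13 with the substructure, which agrees with the [92,92] entry
listed for this feature in Figure 3.4.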

SPECTRAL FEATURE UNIQUENESS:

\[ U_x = \frac{\text{Number of compounds with } F_x \text{ and } SS_x}{\text{Number of compounds with } F_x} \times 100\% \]

SPECTRAL FEATURE CORRELATION:

\[ C_x = \frac{\text{Number of compounds with } F_x \text{ and } SS_x}{\text{Number of compounds with } SS_x} \times 100\% \]

SSx: substructure x
Fx: feature x

Figure 3.3: Equations used to calculate the uniqueness and
correlation of spectral features in MAPS for use by the
U and C filters.

The procedure for generating a MAPS (v.II) rule starting with
raw data is given in Table 3.2. This version of MAPS is, like its
predecessor, a collection of LISP functions that manipulate
substructure and spectral feature data. One important function,
GEN-FEATURE-BUCKETS, has been replaced by a more efficient C program,
GENF. This program requires: 1) a yes/no response to use intensity
classifiers, 2) the minimum number of compounds in which each
feature must be observed to be included in the FEATURE-BUCKETS, 3)
the output filename and 4) a list of data filenames. Another C
program (TSQXFER) is used to convert the ASCII MS/MS data from the
TSQ-70 triple quadrupole mass spectrometer to a LISP format.
This program requires the filenames of the primary mass spectrum
and daughter spectra datafiles as well as the substructure list
obtained from the ASLS program (see Chapter 4). The format of this
file is the same as that used with the version I software. A
disadvantage of this software is that a great deal of knowledge of
LISP and the MAPS functions is required to successfully generate
rules. This problem has been largely overcome with MAPS (v.III).

 

1.  Define molecular structures for all reference compounds using
    GENOA (see Chapter 4).

2.  Define all substructures of interest (see Chapter 4).

3.  Acquire MS/MS data of all reference compounds (see Chapter 2).

4.  Use the RSX DUMP program on the TSQ-70 data system to
    convert binary datafiles to ASCII.

5.  Transfer ASCII data to a VAX running the ACES software.

6.  Use the TSQXFER program to create LISP compatible datafiles
    which contain the compound name, reference name, mass,
    MS/MS data in LISP list format, a list of substructures
    contained in the structure of the compound and the molecular
    formula of the compound. This is best done using a command
    file.

7.  Use the GENF C program to create the FEATURE-BUCKETS for
    the entire reference database. This is best done using a
    command file. This program replaces the GEN-FEATURE-BUCKETS
    LISP command.

8.  Create (or modify) “SUBSTRUCTURESDAT” to reflect changes to
    the substructure library and / or substructure definitions.

9.  Invoke LISP using the VAX VMS command:
    LISP/RESUME=MAPS_BASE_SYSTEM. This command restores a
    “suspended system” which includes the MAPS LISP functions,
    among other things.

10. If changes have been made to the substructure library,
    compounds have been added to the database, or there are no
    existing substructure buckets, use the MAPS “GEN-SS-BUCKETS”
    function to generate the substructure buckets.

11. If compounds have been added to the database or there are no
    existing FEATURE-BUCKETS, load the file containing the newly
    generated FEATURE-BUCKETS using the LISP command:
    (LOAD "filename").

12. If changes have been made to the “SUBSTRUCTURESDAT” file,
    load the new version using the LISP command:
    (LOAD "filename").

13. If new versions of the MAPS functions have been written,
    these need to be loaded at this time.

14. Create a new suspended system if any new datafiles have been
    loaded using the LISP command: (SUSPEND "filename"). This
    will facilitate a future MAPS session which requires the same
    data.

15. To generate a set of rules for a specified U/C combination:

    i)  modify the MAKERULE function for the desired U/C
    ii) type “(SETQ SSRULES (MAKERULE))”.

    This function creates a MAPS rule for each substructure
    contained in the SUBSTRUCTURESDAT file using the
    minimum U/C value specified in the MAKERULE function.
    The resulting rules are stored in a list identified by a
    symbol (e.g. SSRULES). The main use of this function as of
    this writing has been to generate initial MAPS rules using
    a relatively low U/C combination (e.g. 10/10). U/C
    combinations with higher values may be obtained using
    the GENUCRULE function. Use of this function is much
    more efficient than regeneration of the rules using a new
    U/C combination.

16. Use the MAPS function GENUCRULE to obtain a MAPS rule for a
    particular substructure with a higher U/C combination (e.g.
    “(SETQ NEWRULE (GENUCRULE 40 70 LOWRULE))”). This
    function has mainly been used to obtain the starting features
    for feature-combination rules (e.g. “(SETQ SS1-CR
    (GEN-COMBINATIONS 'SS1 NEWRULE 100 SS-BUCKETS))”). The
    parameter “100” in the example is used to specify the
    minimum uniqueness for a combination.

Table 3.2: Procedure for generating MAPS (v.II) rules.

The advantages of the version II software for experienced users were
the ability to examine intermediate lists (a characteristic of the
interactive environment that LISP provides) and the relative ease of
programming recursive functions.
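
The U/C re-filtering described in step 16 of Table 3.2 can be
sketched in Python as below. This is not the GENUCRULE source: the
function name filter_rule, the tuple layout and the use of an
inclusive comparison are assumptions made for illustration. The
(feature, U, C) values are those listed in Figure 3.4.

```python
# (feature, uniqueness %, correlation %) as listed in Figure 3.4.
initial_rule = [
    ("D 211.0", 42, 77),
    ("D 210.0", 41, 85),
    ("D 209.0", 40, 77),
    ("D 198.0", 44, 92),
    ("PD 198.0->171.0", 77, 77),
    ("PD 198.0->154.0", 92, 92),
    ("PD 197.0->196.0", 69, 85),
    ("PD 197.0->153.0", 91, 77),
    ("PD 196.0->152.0", 83, 77),
    ("PD 70.0->27.0", 42, 85),
]


def filter_rule(rule, u_min, c_min):
    """Keep only the features meeting both minima (inclusive, by assumption)."""
    return [entry for entry in rule if entry[1] >= u_min and entry[2] >= c_min]


print(len(filter_rule(initial_rule, 40, 70)))             # all ten features pass
print([f for f, _, _ in filter_rule(initial_rule, 60, 80)])
```

Because the U and C values are already stored with each feature, a
stricter rule is obtained by one pass over the existing rule, with no
need to revisit the feature buckets.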

The MAPS inclusion rule obtained for the phenothiazine
substructure is shown in Figure 3.4. The numbers in square brackets
are the uniqueness and correlation values of the specified
spectral feature with respect to the phenothiazine substructure. The
original method for applying rules to unknowns used a match factor
(MF) to set the fraction of rule clauses (i.e. spectral features)
which had to be found in the MS/MS spectra of an unknown for a
substructure identification to be made. The substructure
identifications obtained when the phenothiazine rule was applied
against all of the reference database compounds for several different
match factors are summarized in Table 3.3. Two indices describe the
effectiveness of a substructure identification rule with respect to
the reference database: the reliability and recall estimates of a
rule. The equations used to calculate these values are shown in
Figure 3.5. The best possible rule is one with 100% reliability and
100% recall. The highest recall observed for the phenothiazine rule
at 100% reliability, for example, is 77%, at a match factor of 70%.
As can be seen from Table 3.3, recall can only be increased to 100%
by accepting a reliability below 100% (i.e. allowing false
positives). Since false positives need to be held to an absolute
minimum in our system, a new method of rule generation called
“feature combinations” was developed.

 

IF   “ D   (211.0)            [42,77] ”   {F1}
and  “ D   (210.0)            [41,85] ”   {F2}
and  “ D   (209.0)            [40,77] ”   {F3}
and  “ D   (198.0)            [44,92] ”   {F4}
and  “ PD  (198.0 -> 171.0)   [77,77] ”   {F5}
and  “ PD  (198.0 -> 154.0)   [92,92] ”   {F6}
and  “ PD  (197.0 -> 196.0)   [69,85] ”   {F7}
and  “ PD  (197.0 -> 153.0)   [91,77] ”   {F8}
and  “ PD  (196.0 -> 152.0)   [83,77] ”   {F9}
and  “ PD  (70.0 -> 27.0)     [42,85] ”   {F10}

THEN substructure PHENOTHIAZINE is present.

(Umin = 40%, Cmin = 70%)

Figure 3.4: The initial MAPS (v.II) rule obtained for the
PHENOTHIAZINE substructure.

 

 

MF (%)   RELIABILITY (%)   RECALL (%)

100      100               38
 90      100               54
 80      100               69
 70      100               77
 60       80               92
 50       75               92
 40       65               100

Table 3.3: Reliability and recall estimates obtained for the MAPS
(v.II) PHENOTHIAZINE substructure identification rule
at several different match factors.

RULE RELIABILITY ESTIMATE:

\[ \mathrm{REL} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \times 100\% \]

RULE RECALL ESTIMATE:

\[ \mathrm{REC} = \frac{\text{number of correct predictions}}{\text{total number of possible correct predictions}} \times 100\% \]

Figure 3.5: Equations used to calculate the rule reliability and rule
recall estimates in MAPS.
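
A short Python sketch of the match-factor test and of the two
estimates is given below. The function names and the example
compound labels are hypothetical, and the counts are chosen only to
be consistent with the MF = 60% row of Table 3.3; the text does not
list the underlying counts.

```python
def rule_fires(rule_features, spectrum_features, match_factor):
    """True if at least match_factor percent of the rule clauses are found."""
    found = sum(1 for f in rule_features if f in spectrum_features)
    return 100.0 * found / len(rule_features) >= match_factor


def reliability_and_recall(predicted, actual):
    """Return (REL, REC) in percent from sets of compound labels."""
    correct = len(predicted & actual)
    rel = 100.0 * correct / len(predicted) if predicted else 0.0
    rec = 100.0 * correct / len(actual) if actual else 0.0
    return rel, rec


# A ten-clause rule with seven clauses found passes at MF = 70% but not 80%.
rule = [f"F{i}" for i in range(1, 11)]
spectrum = set(rule[:7])
print(rule_fires(rule, spectrum, 70), rule_fires(rule, spectrum, 80))

# Hypothetical outcome: 13 compounds contain the substructure, and the rule
# flags 12 of them plus 3 compounds that do not contain it.
actual = {f"c{i}" for i in range(13)}
predicted = {f"c{i}" for i in range(12)} | {"x1", "x2", "x3"}
rel, rec = reliability_and_recall(predicted, actual)
print(round(rel), round(rec))  # 80 92
```

Lowering the match factor can only add predictions, so recall never
decreases, while reliability falls as soon as the added predictions
include compounds that lack the substructure.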

Initial Work on Feature-Combination Rules

A new method of generating MAPS rules to provide highly
reliable rules with increased recall was explored by Dr. Peter Palmer
using MAPS version I [11]. This method uses “feature combinations”
rather than individual spectral features to generate substructure
identification rules. A “feature combination” is simply a collection
of two or more spectral features, all of which must be observed
together. The importance of these combinations lies in the fact that
the uniqueness of a feature combination is generally equal to or
greater than the highest uniqueness of the individual features. In
fact, the feature-combination method is targeted at producing
combinations of 100% uniqueness with respect to a given
substructure (although a lower uniqueness could be specified, if
desired). A feature-combination rule is constructed by combining
spectral features until the uniqueness of the combination meets a
specified minimum value (usually 100%). The construction of a
feature combination is aborted if the correlation of the combination
falls to zero. The other objective of the feature-combination method
is to discover a sufficient number of feature combinations to achieve
100% recall. A potential bonus obtained by generating feature
combinations is the isolation of fragmentation pathways which may
be indicative of the fragmentation of the indicated substructure
within a particular structural environment.
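
The effect of combining features can be checked against the bucket
excerpts in Figures 3.1 and 3.2. The Python sketch below assumes, as
before, that SS132 is the phenothiazine bucket and that the excerpts
are complete for these features; the function name is illustrative.
A combination is treated as present in a compound only when every
feature in it is present.

```python
def combination_u_c(feature_buckets, ss_bucket):
    """Return (U, C) in percent for a combination of one or more features."""
    compounds = set.intersection(*feature_buckets)  # show every feature
    if not compounds:
        return 0.0, 0.0
    hits = len(compounds & ss_bucket)
    return 100.0 * hits / len(compounds), 100.0 * hits / len(ss_bucket)


# Compound labels as read from Figures 3.1 and 3.2.
ss132 = set(
    "MI9492 MI7723 MI7688 MI64 MI5826 MI3696 MI1485 "
    "MI1844 MI5755 MI5862 MI7044 MI7691 MI9202".split()
)
d_198 = set(
    "GMR12 MI64 MI1125 MI1485 MI1844 MI2411 MI2423 "
    "MI3152 MI3696 MI3774 MI4687 MI5297 MI5755 "
    "MI5826 MI5862 MI6129 MI6208 MI6827 MI6834 "
    "MI6837 MI7044 MI7688 MI7691 MI7723 MI8009 "
    "MI9202 MI9895".split()
)
pd_198_154 = set(
    "MI64 MI1485 MI1844 MI3696 MI5755 MI5826 "
    "MI5862 MI7044 MI7688 MI7691 MI7723 MI9042 MI9202".split()
)

single = combination_u_c([d_198], ss132)
pair = combination_u_c([d_198, pd_198_154], ss132)
print([round(v) for v in single])  # [44, 92]
print([round(v) for v in pair])    # [100, 92]
```

Requiring both features removes every compound that lacks the
substructure, so the uniqueness rises from 44% to 100% while the
correlation stays at 92%.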

Feature-combination rules have the general format:

“IF   <spectral features fa1, fa2, ...> are present
 or   <spectral features fb1, fb2, ...> are present
 or   <spectral features fx1, fx2, ...> are present
 THEN substructure 'X' is present.”

These rules also differ from individual-feature rules in the way they
are applied to MS/MS spectra from an unknown. If any feature
combination from a MAPS rule is found in the MS/MS spectra of an
unknown, then the substructure indicated in the rule is concluded to
be present in the structure of the unknown. An example of a MAPS
(v.I) feature-combination rule is provided in Figure 3.6.

 

IF   (neutral loss of 80s)

OR   (neutral loss of 81s)

OR   (parent giving neutral losses of 81s and 82s)

OR   (neutral loss of 82s AND
      parent ion at m/z 79w AND
      parent ion at m/z 80w)

OR   (neutral loss of 82s AND
      parent ion at m/z 80m AND
      parent ion at m/z 81m)

OR   (parent ion at m/z 79w AND
      parent ion at m/z 80w AND
      parent ion at m/z 81w)

THEN the BROMO substructure is present.

Figure 3.6: The MAPS (v.I) feature-combination rule obtained for
the BROMO substructure. Adapted from reference 11.

There are two differences between MAPS version I and MAPS
version II that are evident from this rule. First, intensity categories
were not implemented in the version II software. Intensity
categorization significantly increases the number of features in the
FEATURE-BUCKETS with only a modest increase in the
individual-feature uniqueness. Thus, the rule in Figure 3.6 contains
intensity classifiers while the rest of the rules in this and other
chapters do not. Second, the rule shown in Figure 3.6 has a fifth type
of feature called “multiple neutral loss” (e.g. “parent giving neutral
losses of 81s and 82s”). The discovery of multiple neutral losses has
yet to be implemented in versions II and III of MAPS.

The BROMO rule shown in Figure 3.6 has a reliability of 100%
with respect to the reference database (the Extrel database, not the
TSQ-70 database) as well as 83% recall. Thus, the feature-combination
method of generating MAPS rules can provide rules
with high reliability and high recall. This work was limited to small
substructures (e.g. chloro, ethyl, phenyl) to restrict the number of
initial features used by the feature-combinations function. The
reason for this caution was that the upper limit on the number of
possible feature combinations is 2^n - 1, where n is the number of
initial features. Thus, for larger substructures, an inordinate number
of combinations was obtained (i.e. a combinatorial explosion was
observed). The major conclusions reached in this study were: 1)
feature combinations provide rules with high reliability and high
recall, 2) the feature-combination method requires a great deal of
computation time and 3) refinements to the feature-combination
search algorithm and increased computer resources are required to
define the limits of this approach [11]. The following paragraphs
describe recently completed research to overcome the computational
barriers encountered in this study.

Feature-Combination Rules Obtained Using MAPS (v.II)

A simple feature-combination generation function
(GEN-COMBINATIONS) was incorporated into MAPS (v.II) so that
feature-combination rules generated using the new TSQ-70 reference
database could be evaluated. The algorithm for this function
searches all possible combinations of the spectral features produced
by the GENUCRULE function (a function which removes all features
from an initial rule which do not meet specified U/C criteria).
Redundant combinations were then pruned using the PRUNE-FC
function. While this algorithm was inefficient, it was relatively easy
to implement and produced feature-combination rules for use in ACES
five months before MAPS version III was completed. In fact, the
experience with MAPS version II led to some design changes in MAPS
version III (e.g. specification of a minimum desired correlation for
individual feature combinations).
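
An exhaustive search of this kind can be sketched in Python as
follows. This is not the GEN-COMBINATIONS or PRUNE-FC source: the
toy buckets are hypothetical, and treating a combination as
“redundant” when it contains a smaller combination that already
qualifies is an assumption of this sketch.

```python
from itertools import combinations


def search_combinations(feature_buckets, ss_bucket, u_min=100.0):
    """Test every non-empty subset of features; keep the minimal qualifying ones."""
    names = sorted(feature_buckets)
    kept, tested = [], 0
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            tested += 1
            compounds = set.intersection(*(feature_buckets[n] for n in combo))
            hits = compounds & ss_bucket
            if not hits:
                continue  # correlation is zero, so the combination is useless
            uniqueness = 100.0 * len(hits) / len(compounds)
            redundant = any(set(k) <= set(combo) for k in kept)
            if uniqueness >= u_min and not redundant:
                kept.append(combo)
    return kept, tested


# Hypothetical toy data: compounds a, b and c contain the substructure.
ss = {"a", "b", "c"}
buckets = {
    "f1": {"a", "b", "x"},
    "f2": {"a", "b", "c", "y"},
    "f3": {"c"},
}
kept, tested = search_combinations(buckets, ss)
print(kept)    # [('f3',), ('f1', 'f2')]
print(tested)  # 7, i.e. 2**3 - 1
```

The number of subsets tested doubles with each additional initial
feature, which is why the number of initial features has to be kept
small for this approach.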

In order to use the MAPS (v.II) software effectively, the
number of initial features must be kept to a minimum (usually 10
initial features, which corresponds to 1023 combinations). The
method used in this study for selection of initial features was to use
the U and C filters already present in the code (using the GENUCRULE
function) to restrict the number of features. A disadvantage of this
method is that no "global" minima could be specified for uniqueness
and correlation which would limit the number of initial spectral
features for all of the substructures of interest. Thus, the initial
uniqueness and correlation values varied depending upon which
substructure rule was being generated.

The number of spectral features obtained from the U and C
filters for the barbiturate substructure using a number of initial
uniqueness and correlation values is shown in Figure 3.7. The
complete reference database for 105 compounds contains 78,308
MS/MS features. After initial generation of MAPS rules using a
minimum uniqueness and correlation of 10%, the number of features
is reduced to 4,756. This represents a 94% data reduction
with no loss of significant information! This reduced database
can be considered as an initial rulebase for all defined substructures.
Initial MAPS rules are simply collections of spectral features which
meet specified Ui and Ci minima. Feature-combination rules can then
be generated using the features contained in the initial rules to
improve their reliability and recall.

Figure 3.7: Plot of the number of initial spectral features obtained
for the BARBITURATE substructure for use in
generating feature-combination rules versus minimum
correlation (Ci) for several minimum uniqueness values
(Ui). Curves are shown for Ui = 30, 40 and 50. Plot
annotations: 78,308 features in database; 4,756 features
with U = 10% and C = 10% from initial rule generation.

The Ui and Ci parameters used in the MAPS (v.III) program are
used to control the number and type of features used in the
generation of feature-combination rules. For example, there are
three different combinations of the Ui and Ci parameters that yield
30 features (see Figure 3.7). These combinations are approximately
Ui = 50%/Ci = 28%, Ui = 40%/Ci = 30% and Ui = 30%/Ci = 35%. The set
of initial features obtained using the first combination of Ui/Ci
parameter values contains higher uniqueness but lower correlation
features while the set of features obtained with the third Ui/Ci
combination contains lower uniqueness and higher correlation
features. The effect of using different Ui and Ci parameters varies
widely among the substructures.

The procedure used in generating feature-combination rules
with this version of the MAPS software was: 1) input a large initial
correlation value with various initial uniqueness values until the
desired number of initial features was obtained and 2) if no features
were obtained, lower the initial correlation value and try again.
Thus, the initial features obtained using this method had an optimal
degree of correlation with respect to the substructure of interest
(within the 10-13 initial feature limitation).

The spectral features contained in the initial rule obtained for
the phenothiazine substructure, shown in Figure 3.4, were used to

generate feature-combinations. The minimum uniqueness and

 


correlation values used to generate this rule were 40% and 70%,
respectively. A total of 10 spectral features were passed by the U
and C filters. The mass ﬁlter was disabled for this particular rule so
several features with masses larger than the nominal mass of the
phenothiazine substructure (i.e. 198 amu) were included in the rule.
The feature-combination rule obtained for the phenothiazine
substructure is shown in Figure 3.8. Note that each feature
combination has a U value of 100%. The overall recall of the rule is
100% with a reliability of 100% when applied to the reference
database. Thus, by using the feature combination method, the recall
for this substructure was raised from 77% to 100% while maintaining
100% reliability.

 

IF “ D (198) and PD (198 -> 154) [100,92] ”
OR “ D (211) and PD (198 -> 154) [100,85] ”

THEN substructure PHENOTHIAZINE is present.
REL = 100% REC = 100%

(Umin = 40%, Cmin = 70% for initial features;
rule clauses with correlation < 85% were deleted)

 

Figure 3.8: MAPS (v.II) feature-combination rule for the
PHENOTHIAZINE substructure with a recall and
reliability estimate of 100%.

Twenty-five substructure identification rules were generated
using this software. However, the recall observed for several of

these rules was lower than expected. For example, an overall recall


of 46% was obtained for “SS108”, a substructure consisting of a
6-membered ring with 1 nitrogen and 5 carbons. One of the reasons
that a high degree of recall was not observed was the relatively
small number of initial features used in generating the rule. The
latest version of the MAPS software, MAPS version III, incorporates
several strategies to improve the generation of feature-combination
rules from a relatively large pool of initial spectral features. For
example, the algorithm used in MAPS version II could only
manipulate about 13-15 initial features before exhausting virtual
memory. In contrast, a recall of 96% was achieved for the “SS108”
substructure using 123 initial features and MAPS version III.
The MAPS Software - Version III

Three programs comprise the MAPS (v.III) software: GENT
(GENerate Training set), a new version of GENF which generates the
reduced substructure and feature data used by MAPS; the MAPS
(v.III) program itself; and the RULE program, which applies
feature-combination rules to the MS/MS data of unknown compounds.
These programs are not only more efficient, but are also easier to
use. The following paragraphs describe the major enhancements found
in this version of the MAPS software.


The GENT Program

The GENT program effectively performs steps 6 and 12 from
Table 3.2. The program begins by prompting for the primary mass
spectrum and daughter spectra filenames. A reference filename for
each compound is also requested. After the datafiles for each
reference compound have been entered, the program prompts for
the substructure list filename (usually “SUBSTR.LIS”, the default
filename used by the ASLS program), the filename for the
substructure data (usually “SUBSTRUCTURES.DAT”), the minimum
initial feature uniqueness (Ui), the minimum initial feature
correlation (Ci), and the output filename. Thus, the new
“FEATURE-BUCKETS” produced by GENT are actually initial MAPS rules.
This program is best run using a command file because there are
usually many datafiles to be entered.

Low initial feature uniqueness and correlation are input to
GENT (usually 10% for each). The format of the GENT output ﬁle is
provided in Figure 3.9 and the contents of the ﬁle obtained for a
small number of compounds is furnished in Figure 3.10. The file
begins with a list of substructure reference names and mass entries
(from “SUBSTRUCTURES.DAT”). This list is followed by another list
detailing the compound reference names and substructures
contained in the structure of each compound (from “SUBSTR.LIS”). A
list of features is then included at the end of this file. Each feature in
this final list is followed by a bit string (i.e. a line of ones and zeros)
which relates compounds to spectral features. Thus, for a reference
database containing 105 compounds, a string containing 105 bits would


 

< substructure reference name 1 > < mass 1 > [< formula 1 >]
< substructure reference name 2 > < mass 2 > [< formula 2 >]

< substructure reference name n > < mass n > [< formula n >]

< compound reference name 1 > < (SSa, SSb, ..., SSz) >
< compound reference name 2 > < (SSa, SSb, ..., SSz) >

< compound reference name n > < (SSa, SSb, ..., SSz) >

< feature 1 > < 1|0 > < (SSa Ua Ca), (SSb Ub Cb), ..., (SSz Uz Cz) >
< feature 2 > < 1|0 > < (SSa Ua Ca), (SSb Ub Cb), ..., (SSz Uz Cz) >

< feature n > < 1|0 > < (SSa Ua Ca), (SSb Ub Cb), ..., (SSz Uz Cz) >
< eof >

 

 

Figure 3.9: The GENT output ﬁle format.


[Figure 3.10: contents of the GENT output file obtained for a small number of compounds. The figure body is not legible in the scanned source.]

 


follow each feature. The bit string is, in turn, followed by a list of
substructures with a corresponding uniqueness and correlation for
the feature / substructure pair. The GENT output file contains most
of the data required by the MAPS (v.III) program.
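The way a per-feature bit string supports the uniqueness and correlation values can be sketched in a few lines. The sketch below is illustrative only. It assumes that uniqueness is the percentage of feature-bearing compounds that contain the substructure and that correlation is the percentage of substructure-bearing compounds that show the feature; these definitions are restated here as an assumption, and the function name and toy bit strings are hypothetical.

```python
def uniqueness_and_correlation(feature_bits, substructure_bits):
    """Return (U, C) in percent from two '0'/'1' strings, one character per
    reference compound (assumed definitions, see the text above)."""
    if len(feature_bits) != len(substructure_bits):
        raise ValueError("bit strings must cover the same compounds")
    with_feature = int(feature_bits, 2)
    with_substructure = int(substructure_bits, 2)
    # Compounds that both show the feature and contain the substructure.
    both = bin(with_feature & with_substructure).count("1")
    n_feature = bin(with_feature).count("1")
    n_substructure = bin(with_substructure).count("1")
    uniqueness = 100.0 * both / n_feature if n_feature else 0.0
    correlation = 100.0 * both / n_substructure if n_substructure else 0.0
    return uniqueness, correlation


if __name__ == "__main__":
    # Eight toy compounds: the feature occurs in three, the substructure in four.
    print(uniqueness_and_correlation("11000100", "11110000"))
```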

The MAPS (v.III) Program

The MAPS (v.III) program was developed with the
programming assistance of Drake Diedrich and includes several
enhancements not found in previous versions of MAPS. This
program requires: 1) the GENT output ﬁlename (eg. “GENT.OUT”), 2) a
substructure reference name to identify the desired substructure
rule (eg. SS141 or ALL for all substructures), 3) the minimum initial
uniqueness (Ui), 4) the minimum initial correlation (Ci), 5) the output
filename for the feature-combination rule(s), and 6) the minimum
correlation for each feature-combination (Cc). MAPS (v.III) is best
run using a command ﬁle if the user is exploring assorted Ui/Ci/Cc

combinations to determine their effect on rule recall and content.

Several parameters which control the rule generation were
implemented as compiler switches. One parameter controls the sort
order of the features which meet the newly specified minimum
uniqueness and correlation (i.e. the features used to generate the
combination). This parameter is set by defining one of three symbols
to select a uniqueness sort (USORT), correlation sort (CSORT) or mass
sort (MSORT). For example, the initial features used to generate
feature combinations could be sorted so that the highest uniqueness

features are tried ﬁrst by defining USORT (eg. #deﬁne USORT) in the


MAPS source code. Other parameters set the minimum and
maximum allowed feature-combination length (MINF and MAXF,
respectively), the minimum number of compounds which must
produce a feature before it can be used in a rule (MINCMPDS) and a
maximum elapsed feature-combination generation time (SSTIME). A
typical value for SSTIME is one hour. An additional parameter, HITS,
was included to stop feature-combination generation after recalling

the same compounds a speciﬁed number of times.
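The effect of the three sort switches can be illustrated with a short sketch. In MAPS (v.III) the choice is made at compile time by defining USORT, CSORT or MSORT; the Python below selects the order at run time instead, purely for illustration. The dictionary layout, the function name and the direction of the mass sort (ascending) are assumptions.

```python
# Hypothetical illustration of the USORT / CSORT / MSORT orderings.
SORT_KEYS = {
    "USORT": lambda f: -f["U"],    # highest uniqueness first
    "CSORT": lambda f: -f["C"],    # highest correlation first
    "MSORT": lambda f: f["mass"],  # lowest mass first (assumed direction)
}


def sort_initial_features(features, mode="USORT"):
    """Return a new list of features in the order selected by `mode`."""
    return sorted(features, key=SORT_KEYS[mode])


if __name__ == "__main__":
    toy = [
        {"name": "PD 70->27", "mass": 70.0, "U": 44, "C": 84},
        {"name": "PD 198->154", "mass": 198.0, "U": 92, "C": 92},
        {"name": "PD 197->196", "mass": 197.0, "U": 68, "C": 84},
    ]
    for mode in ("USORT", "CSORT", "MSORT"):
        print(mode, [f["name"] for f in sort_initial_features(toy, mode)])
```

Since features are tried in list order, the sort determines which features get the first chance to enter a combination.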

The feature-combination generation algorithm used in MAPS
(v.III) takes a different approach to calculating uniqueness values
for partially constructed combinations. The idea behind this
algorithm is that eliminating false positives (an integer value) is
precisely the same as increasing uniqueness (a ﬂoating-point value).
Thus, the goal in this approach is to reduce the false positives for a
given combination by adding or subtracting initial features. The
integer false positives calculations can be performed more efficiently
than the corresponding uniqueness calculations using our current

hardware.
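The relationship between false positives and uniqueness can be made concrete with integer bit masks. The sketch below is an illustration under stated assumptions, not the MAPS implementation: each mask holds one bit per reference compound, a combination is the bitwise AND of its feature masks, and uniqueness is taken as the percentage of matched compounds that contain the substructure. For a fixed number of true positives, fewer false positives always means higher uniqueness, so the integer count can stand in for the floating-point value.

```python
def popcount(mask):
    """Number of set bits in a non-negative integer."""
    return bin(mask).count("1")


def false_positives(combination, substructure):
    """Compounds matched by the combination that lack the substructure."""
    return popcount(combination & ~substructure)


def uniqueness(combination, substructure):
    """Percentage of matched compounds that contain the substructure."""
    matched = popcount(combination)
    if matched == 0:
        return 0.0
    return 100.0 * popcount(combination & substructure) / matched


# Toy data for eight compounds (bit k = compound k); values are hypothetical.
SUBSTRUCTURE = 0b00001111
FEATURE_1 = 0b00111111
FEATURE_2 = 0b01101111

if __name__ == "__main__":
    both = FEATURE_1 & FEATURE_2
    print(false_positives(FEATURE_1, SUBSTRUCTURE), uniqueness(FEATURE_1, SUBSTRUCTURE))
    print(false_positives(both, SUBSTRUCTURE), uniqueness(both, SUBSTRUCTURE))
```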

The feature-combination generation algorithm begins with the
sorted initial features. A feature-combination is constructed by
adding features from the sorted list until no false positives are
observed or the recall for the combination falls below a specified
minimum value. Figure 3.11 illustrates this procedure. In part A of
this figure, feature 2 was added to feature 1 but this did not

decrease the false positives associated with feature 1. Thus, a zero


 

Feature Combination Bit String

     f1  f2  f3  f4  f5 ...  fn
A) | 1 | 0 | 1 | 0 | 0 | 0 | 0 |

     f1  f2  f3  f4  f5 ...  fn
B) | 0 | 0 | 1 | 0 | 0 | 0 | 0 |

     f1  f2  f3  f4  f5 ...  fn
C) | 0 | 0 | 1 | 0 | 1 | 0 | 0 |

Figure 3.11: An illustration of feature combination bit string
generation with backchecking.

 

 

was placed in position 2 of the feature-combination bit string.
Feature 3 was then tried in the feature-combination. In this case,
adding feature 3 decreased the false positives associated with the
combination so a 1 was placed in position 3 of the
feature-combination bit string. An additional optimization called
“back-checking” comes into play at this point. Only the non-redundant
features (i.e. features that have some effect on the uniqueness or
correlation of a combination) should be included in a
feature-combination. Thus, features which do not affect the
reliability or recall of a feature-combination need to be pruned
from the feature-combination.

 

 

 


The back-checking optimization occurs after a new feature is
added to a combination (part A, Figure 3.11). The feature-
combination generation function checks the features added before
the latest one to determine if removal of any of these features
increases false positives or decreases recall. If this check is negative
for a feature (i.e. removing the feature does not increase false
positives or decrease recall), then the bit for that feature is changed
to zero, as shown in part B of Figure 3.11. This process does not
preclude the feature from future consideration in feature-
combinations, but only from the current one. In other words, it is
certain that the features removed from one feature-combination are
tried in future combinations since they have already demonstrated
some ability to decrease false positives. The major advantage of
back-checking is that feature-combinations with non-redundant

features are produced by the generation function.
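The generation step with back-checking can be sketched as a greedy loop over bit masks. This is a simplified reading of the description above and not the MAPS (v.III) source: the function names are hypothetical, each mask holds one bit per reference compound, and a feature that would push recall below the minimum is simply skipped, which is one possible interpretation of the stopping condition. The toy masks are chosen so that the run passes through the three states drawn in Figure 3.11.

```python
def popcount(mask):
    return bin(mask).count("1")


def build_combination(feature_masks, substructure, n_compounds, min_recalled=1):
    """Greedy feature-combination construction with back-checking.

    Returns the indices (0-based, in sorted-feature order) of the features
    kept in the combination."""
    everything = (1 << n_compounds) - 1

    def matched(selection):
        mask = everything
        for i in selection:
            mask &= feature_masks[i]
        return mask

    def fp(selection):  # matched compounds lacking the substructure
        return popcount(matched(selection) & ~substructure)

    def tp(selection):  # matched compounds containing the substructure
        return popcount(matched(selection) & substructure)

    selected = []
    for i in range(len(feature_masks)):
        trial = selected + [i]
        if tp(trial) < min_recalled:
            continue  # would recall too few compounds
        if fp(trial) >= fp(selected):
            continue  # no reduction in false positives: bit stays 0
        selected = trial
        # Back-checking: drop earlier features whose removal neither
        # increases false positives nor decreases recall.
        for j in selected[:-1]:
            without = [k for k in selected if k != j]
            if fp(without) <= fp(selected) and tp(without) >= tp(selected):
                selected = without
        if fp(selected) == 0:
            break  # no false positives remain
    return selected


# Eight toy compounds; compounds 0-2 contain the substructure.
SUBSTRUCTURE = 0b00000111
FEATURE_MASKS = [0b01111111, 0b11111111, 0b00011111, 0b10011111, 0b01010111]

if __name__ == "__main__":
    kept = build_combination(FEATURE_MASKS, SUBSTRUCTURE, n_compounds=8)
    print([1 if i in kept else 0 for i in range(len(FEATURE_MASKS))])
```

In this run feature 1 is accepted first, feature 2 is rejected, feature 3 is accepted and makes feature 1 redundant, and feature 5 is then added, giving the bit pattern of part C of the figure.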

Sample output for the MAPS (v.III) program is provided in
Figure 3.12. The program will output the initial features and their U
and C values with respect to the input substructure. The number of
compounds with the input substructure, the number of rule clauses
(feature-combinations) generated, the overall recall of the rule and
the generation time are provided by the program. The format for
the rule output file has not been changed from the LISP format and

is shown in Figure 3.13.

A total of 42 substructure identiﬁcation rules were generated

out of 49 possible rules with an average recall of 82% using MAPS


 

Training set file: gent_p1_10_10.out
105 compounds 127 substructures
4756 features

Substructures: SS4 SS18 SS19 SS20 SS21 SS44 SS45 SS46 SS50 SS93
SS133 SS60 SS12 SS22 SS23 SS24 SS25 SS26 SS27 SS28 SS29 SS30
SS31 SS32 SS33 SS34 SS35 SS36 SS37 SS43 SS11 SS10 SS78 SS56
SS70 SS87 SS92 SS3 SS7 SS88 SS51 SS54 SS2 SS40 SS63 SS15 SS48
SS90 SS77 SS118 SS57 SS59 SS100 SS108 SS116 SS124 SS6 SS144
SS148 SS71 SS72 SS141 SS143 SS145 SS85 SS110 SS132 SS147
SS142 SS13 SS47 SS130 SS61 SS84 SS104 SS155 SS91 SS102 SS157
SS82 SS122 SS64 SS65 SS83 SS98 SS121 SS156 SS158 SS41 SS55 SS62
SS111 SS112 SS113 SS120 SS137 SS125 SS127 SS128 SS131 SS95
SS159 SS5 SS119 SS97 SS149 SS146 SS105 SS109 SS67 SS69 SS107
SS94 SS126 SS135 SS123 SS139 SS117 SS86 SS153 SS154 SS96
SS103 SS151 SS152 SS115 SS74

Generate rule for substructure (name,ALL,return): ss132
Feature uniqueness[0.3]: .4

Feature correlation[0.3]: .7

Output file: ss132_p1_40_70_100_50.out

Combination correlation[0.7]: .5
SS132 7 features 7 s

SS132 features

( (PD 198.0 154.0) U=92 C=92)
( (PD 197.0 153.0) U=90 C=76)
( (PD 196.0 152.0) U=83 C=76)
( (PD 198.0 171.0) U=76 C=76)
( (PD 197.0 196.0) U=68 C=84)
( (PD 70.0 27.0) U=44 C=84)

( (D 198.0) UﬂZ C=92)

SS132 7 features 13 compounds 10 clauses 100% recall 0 s
Generate rule for substructure (name,ALL,return):

 

Figure 3.12: Sample output from the MAPS (v.III) program with
user input highlighted in bold faced type.


 

(setq SS1_COMB_RULE '(

(((feature-a1) (feature-b1) (...) (feature-n1)) Uc Cc (compound-1
compound-2 compound-x))

(((feature-a2) (feature-b2) (...) (feature-n2)) Uc Cc (compound-1
compound-2 compound-x))

(((feature-ay) (feature-by) (...) (feature-ny)) Uc Cc (compound-1
compound-2 compound-x))

))

(setq SS2_COMB_RULE '(

(((feature-a1) (feature-b1) (...) (feature-n1)) Uc Cc (compound-1
compound-2 compound-x))

(((feature-a2) (feature-b2) (...) (feature-n2)) Uc Cc (compound-1
compound-2 compound-x))

(((feature-ay) (feature-by) (...) (feature-ny)) Uc Cc (compound-1
compound-2 compound-x))

))

(setq SSn_COMB_RULE '(

(((feature-a1) (feature-b1) (...) (feature-n1)) Uc Cc (compound-1
compound-2 compound-x))

(((feature-a2) (feature-b2) (...) (feature-n2)) Uc Cc (compound-1
compound-2 compound-x))

(((feature-ay) (feature-by) (...) (feature-ny)) Uc Cc (compound-1
compound-2 compound-x))

))
<eof>

where SS1, etc. is the substructure reference name, feature-a1, etc.
are any spectral features, Uc is the feature-combination uniqueness,
Cc is the feature-combination correlation, and compound-1, etc. are
the compound reference names that each feature combination recalls.

 

Figure 3.13: The format of the MAPS feature-combination rule save
files.


(v.III). The substructures corresponding to these rules vary widely
in complexity (eg. phenyl, phenothiazine, barbiturate and
dimethylamino). The initial conditions used were: Ui=30%, Ci=30%
and Cc=30%. However, these values are not necessarily optimal for
all substructures. Therefore, recall can be further improved using
different starting values for some of the substructures with less than
100% recall. An important and limiting consideration, however, is
the reliability of the rules when they are applied to compounds
outside of the reference database. This question is addressed in

Chapter 6.
The RULE Program

The RULE program is used to apply MAPS feature-combination
rules to MS/MS data of unknown compounds. The required inputs
are: 1) the filenames of the primary and daughter mass spectra of an
unknown, 2) the name of the file containing the rules and 3) the
output filename. The substructures identified as present (using
inclusion rules) are written to the output file in a format given in
Chapter 4. A sample of the output from the RULE program for a test
compound is shown in Figure 3.14.
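Applying a feature-combination rule amounts to testing whether every feature of at least one clause is present in the unknown's spectra. The sketch below illustrates that logic only; it is not the RULE program. The tuple encoding of features and the function name are assumptions, and the two clauses are those of the phenothiazine rule in Figure 3.8.

```python
# A rule is a list of clauses; a clause is a collection of features that
# must all be present. Features are encoded as tuples (hypothetical layout).
PHENOTHIAZINE_RULE = [
    [("D", 198.0), ("PD", 198.0, 154.0)],
    [("D", 211.0), ("PD", 198.0, 154.0)],
]


def matching_clauses(rule, unknown_features):
    """Return the clauses whose features all occur in the unknown."""
    present = set(unknown_features)
    return [clause for clause in rule if set(clause) <= present]


def substructure_present(rule, unknown_features):
    """The substructure is reported when at least one clause matches."""
    return bool(matching_clauses(rule, unknown_features))


if __name__ == "__main__":
    unknown = {("D", 198.0), ("PD", 198.0, 154.0), ("P", 63.0)}
    print(substructure_present(PHENOTHIAZINE_RULE, unknown))
```

Reporting the matching clauses rather than a bare yes or no mirrors the sample output in Figure 3.14, where each identification is listed together with the features that triggered it.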

The MAPS (v.III) software represents a significant advance in
the development of ACES. These programs have increased the
amount of data that can be used to generate substructure
identiﬁcation rules using the feature-combination method. Previous

versions of the MAPS software could not effectively utilize the data

 


 

Input Primary Spectrum[*.dat]: khunklmp
1
Input Daughter Spectra[*.dat]: khunklmd
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
47 48 49
Rule file: maps2_p1_all_rules.out
Output ﬁle: khunkl.res
Unknown has the following substructures:

SS18 (P 63.0) (D 59.0)
SS18 (D 59.0) (N 14.0)
SS18 (D 59.0) (N 50.0)
SS18 (D 59.0) (N 51.0)

(and so on for all remaining substructure identiﬁcations)

 

Figure 3.14: Sample output from the RULE program with user input
highlighted in bold faced type.

in the newly generated reference database. The feature-combination
method has proven to give highly reliable rules with respect to the
reference database. These rules are critical to the success of the
ACES system. All three of the programs that comprise the MAPS
(v.III) software can be used in batch mode through the use of
command ﬁles. Thus, a range of parameter values for each program
can be efficiently explored. The result file written by the MAPS
program is used by the automated structure generator to obtain
candidate structures for the unknown. The structure checking and
generation software used in the ACES system is described in the next

chapter.

 

 

 

 


 

References

1. Wade, A.P., Palmer, P.T., Hart, K.J., Enke, C.G., MW
215,169(1988).

2. Palmer, P.T., Hart, K.J., Enke, C.G., Manta, :15. 107 (1989).

3. Hart, K.J., Enke, C.G., 37th ASMS Conference on Mass
Spectrometry and Allied Topics, Miami Beach, FL, p. 348
(1989).

4. Hart, K.J., Enke, C.G., Proceedings of the Symposium on
Chemometrics and Intelligent Automation, W
W in press (1989)

5. McLafferty, F.W., Stauffer, D.B., 1W, 2;,
245 (1985).

6. Dayringer, H.E., Pesyna, G.M., Venkataraghavan, R., McLafferty,
F.W., W11. 529 (1976).

7. Buchanan, B.G., Smith, D.H., White, W.C., Gritter, R.J.,
Feigenbaum, E.A., Lederberg, J., Djerassi, C., W,
28, 6168 (1976).

8. Schildt, H., “Artificial Intelligence Using C”, Osborne McGraw-Hill,
New York, NY, 1987.

9. Barber, G.R., W212), 28 (1987).

10. Roland, J., Mm, 214),, 46 (1987).

11. Palmer, P.T., Ph.D. Dissertation, Michigan State University, East

Lansing, MI, 1988.

 

 

CHAPTER 4

Automated Substructure Search and Structure
Generation

Introduction

Two of the important functions required to implement the
Automated Chemical structure Elucidation System (ACES) are 1) the
ability to search structures of standard compounds to determine
their substructure content and 2) the ability to construct candidate
structures for an unknown compound. The latter function is needed
to provide the structures which are consistent with the results of the
ACES interpretive software. The first function is essential for both
the generation of the MAPS substructure identiﬁcation rules and for
the evaluation of the candidate structures of an unknown compound.
Programs for substructure search and isomer generation have
already been developed [1-4] but lack the interfaces and level of
automation required by the ACES system. Consequently, one of the
major tasks in making ACES a reality was modiﬁcation of one of
these programs to provide the substructure search and structure

generation functions needed by ACES.

The software selected for use in this project was GENOA [5], a

structure generation package originally developed as part of the


DENDRAL project [6]. There are a number of publications in the
literature regarding the use of GENOA. Programs have been
developed for the interpretation of mass spectra using rules of
fragmentation and for the prediction of mass spectra of candidate
structures [7]. These programs were applied to the analysis of
marine sterols. Another program was developed to interpret and
predict 13C-NMR spectra. Substructures derived from these
programs were used to constrain generation of candidate structures.
The structures obtained were then ranked according to the similarity
of their predicted mass and 13C-NMR spectra [8-10]. Similar
programs were developed for 1H-NMR [11] and two-dimensional
NMR [12]. Use of these programs in the analysis of natural products
was also described [13]. The advances in the interpretation of NMR
data were accompanied by updates to GENOA to include generation of
stereoisomers. Much of the GENOA software was subsequently
commercialized by Molecular Design, LTD [14]. Among the functions
not included in the commercial version of the software were mass

spectral prediction and generation of stereoisomers.

This chapter begins by introducing the techniques of
computerized representation of chemical structure. Connectivity
tables are discussed in greatest detail since they are the most
appropriate representation for computerized structure elucidation.
The programs included in the GENOA software are then described,
along with the modifications that were necessary to provide the

structure manipulation functions essential to ACES.

 


Computerized Representation of Chemical Structures

There are several representations of molecular structures that
are compatible with computer data storage and manipulation. These
have been reviewed in a book by Gray [15]. Structure representation
schemes have been categorized as 1) fragment codes, 2) linear
notations, 3) connection tables and 4) coordinate representations
[16]. These representations vary in the information they provide (eg.
atomic connectivity vs. molecular shape) and in the computation time
required to develop and interpret them. Consequently, different
applications may best be served by different representations. The
following paragraphs provide brief descriptions of the major
structure representations and the applications for which they are

used.

Fragment codes usually describe the key functionalities
contained in a structure and are most often used in the searching of
large databases. These codes are also referred to as screens since
they sometimes consist of a range of atoms (eg. 5-10 carbons). The
advantage of fragment codes is that a large database can be rapidly
searched using these codes. The principal disadvantages are 1) they
do not allow direct reconstruction of a structure and 2) they do not

provide for generalized substructure search [15].

Linear notations have also been used extensively in searching

large databases. The Wiswesser linear notation (WLN) is an example


of this type of structure representation. The WLN string obtained for
the structure shown in Figure 4.1, for example, was: T56 BN DN FN
HN BU FU HUTJ IZ DBT5OTJ CQ DQ E1Q [17]. However, the conversion
of structure diagrams to WLN and back again has been only partially
successful. Another limitation of this notation is the difficulty of
manipulating it to derive substructure information [15].

 

 

Figure 4.1: Structure corresponding to WLN notation given in text.
Adapted from reference 17.

Connectivity tables provide a list of the atoms contained in a
molecule or substructure and the atoms to which they are bonded.
Thus, they impart the topology of a molecule or substructure. An
early format for computerized connection tables was published by

Ray and Kirsch in 1957 [15,18]. The connection table published for


chloral is shown in Table 4.1 [19]. The structure of chloral and the
corresponding atom numbering are also shown. Among the notable

characteristics of this format are 1) the explicit numbering of

 

 

Component  Connections  Element Symbol
1          2            O
2          1,3          =
3          2,4,6        C
4          3            H
5          6            Cl
6          3,7,8,5      C
7          6            Cl
8          6            Cl

       5       1
       Cl      O
       | 6     || 2  4
  Cl - C ----- C - H
  8    |       3
       Cl
       7

 

Table 4.1: Connectivity table for chloral from reference 17.

hydrogens and 2) the use of element symbols for bond order. The
DENDRAL format for hydroquinone is shown in Table 4.2. This
format does not express the connectivity of hydrogen explicitly.
Also, double bonds are expressed as multiple occurrences of a
connection to an atom (eg. a double bond between carbon #4 and
carbon #6 in the table). A number of other atom properties such as
hybridization are provided in this format as well. Atomic connectivity
can also be represented in matrix form using the atom connectivity

matrix (ACM) [15]. The ACM for hydroquinone is shown in Table 4.3,

 

 

 

 


 

 

Atom #  Type  Neighbors  Artype  Hybrid
1       O     3                  SP3
2       O     8                  SP3
3       C     5 5 1 4    AROM    SP2
4       C     6 6 3      AROM    SP2
5       C     7 3 3      AROM    SP2
6       C     8 4 4      AROM    SP2
7       C     8 8 5      AROM    SP2
8       C     2 6 7 7    AROM    SP2

4 6
C=C

 

Table 4.2: DENDRAL connectivity table for hydroquinone from
reference 15.

and uses the same atom numbering as shown in Table 4.2. The
diagonal elements are used to denote atom properties (eg. atom
name) and the off-diagonal elements are used to deﬁne connectivity
(eg. bond order). Connectivity tables are most appropriate in

systems where structure manipulation and substructure search are

 

 

    1  2  3  4  5  6  7  8
1   O  0  1  0  0  0  0  0
2   0  O  0  0  0  0  0  1
3   1  0  C  1  2  0  0  0
4   0  0  1  C  0  2  0  0
5   0  0  2  0  C  0  1  0
6   0  0  0  2  0  C  0  1
7   0  0  0  0  1  0  C  2
8   0  1  0  0  0  1  2  C

 

Table 4.3: Atom connectivity matrix for hydroquinone from
reference 15.

 


required. Consequently, they are often encountered in structure
elucidation and synthesis programs. The major disadvantage of
connectivity tables for some applications is the computation time

required to manipulate them.
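The relationship between a DENDRAL-style connection table and the atom connectivity matrix can be shown with a short conversion routine. The sketch below is illustrative and its names are hypothetical. It encodes hydroquinone with the atom numbering of Table 4.2, with a double bond written as a repeated neighbor and with atom 3 listing atom 5 twice, as required for agreement with the matrix in Table 4.3.

```python
# atom number -> (element, neighbor list); a repeated neighbor is a double bond.
HYDROQUINONE = {
    1: ("O", [3]),
    2: ("O", [8]),
    3: ("C", [5, 5, 1, 4]),
    4: ("C", [6, 6, 3]),
    5: ("C", [7, 3, 3]),
    6: ("C", [8, 4, 4]),
    7: ("C", [8, 8, 5]),
    8: ("C", [2, 6, 7, 7]),
}


def connectivity_matrix(table):
    """Build an atom connectivity matrix: element symbols on the diagonal,
    bond orders (0 = not bonded) off the diagonal."""
    atoms = sorted(table)
    index = {atom: i for i, atom in enumerate(atoms)}
    matrix = [[0] * len(atoms) for _ in atoms]
    for atom, (element, neighbors) in table.items():
        row = index[atom]
        matrix[row][row] = element
        for neighbor in neighbors:
            matrix[row][index[neighbor]] += 1  # each listing adds one bond order
    return matrix


if __name__ == "__main__":
    for row in connectivity_matrix(HYDROQUINONE):
        print(" ".join(str(entry) for entry in row))
```

The nested loops make the cost of manipulating such tables visible: even this simple conversion touches every neighbor entry, and substructure matching on top of it requires many such passes.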

Coordinate representations provide the shape of molecules and
are of value in studying structure-activity relationships [15]. The
difference between this representation and connectivity tables is
that coordinate representations show the relative positions of atoms
in 3-dimensional space (eg. conformational stereochemistry). The
torsion angles contained in these representations are derived from
X-ray crystallography or quantum mechanical calculations.

The Structure Generation Program, GENOA

GENOA is an interactive isomer generation program which
produces an exhaustive and nonredundant set of structural isomers
based on substructure constraints. The program requires as input
the molecular formula of an unknown and substructure constraints.
These constraints can be inclusive (i.e. a substructure present in the
unknown), exclusive (i.e. a substructure not present in the unknown)
or alternative (i.e. a list of one or more substructures, one of which is
present in the unknown). This generation program also has the
distinct advantage of being able to handle overlapping substructures.
Thus, each substructure constraint entered into the program does not

have to be a distinct structural entity. In terms of the MAPS


software, the substructures found as present can all be entered into
the program without further analysis to determine whether a
substructure identification is a unique occurrence, or is, in fact, due
to a larger substructure.
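The three kinds of constraint can be illustrated by checking a candidate structure against them. The sketch below covers only the checking side and says nothing about how GENOA generates structures; the function name, the argument layout and the representation of a candidate as a dictionary of substructure occurrence counts are all assumptions.

```python
def satisfies(counts, inclusive=(), exclusive=(), alternatives=()):
    """Check one candidate against inclusive, exclusive and alternative
    constraints. `counts` maps substructure names to occurrence counts;
    `alternatives` is a sequence of groups, each needing at least one member."""
    if any(counts.get(name, 0) < 1 for name in inclusive):
        return False  # a required substructure is missing
    if any(counts.get(name, 0) > 0 for name in exclusive):
        return False  # a forbidden substructure is present
    return all(
        any(counts.get(name, 0) > 0 for name in group)
        for group in alternatives
    )


if __name__ == "__main__":
    candidate = {"SS145": 1, "SS72": 1, "carbonyl": 3}
    print(satisfies(candidate,
                    inclusive=["SS145", "SS72"],
                    exclusive=["phenyl"],
                    alternatives=[["carbonyl", "ethyl"]]))
```

Because the check works on occurrence counts, overlapping substructures need no special handling here: a smaller substructure that is part of a larger one is simply counted as present as well.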

Structures and substructures are represented in this software
using connectivity tables. The connectivity table obtained for the
barbiturate substructure is illustrated in Table 4.4. The HRANGE
descriptor is used to control the number of hydrogens which may be
attached to each atom. The sites where other atoms, either free

atoms or atoms in other substructures, may bond are called "free
valences". Free valences can be designated in a substructure
definition explicitly using the FV descriptor or implicitly using the
HRANGE descriptor. Free valences are actually converted to
HRANGES in the computer software [15].

Atom #  Type  Neighbors  Hrange
1       N     6 2 FV     0-1
2       C     7 7 3 1
3       N     4 2 FV     0-1
4       C     8 8 5 3
5       C     6 4 FV FV  0-2
6       C     9 9 1 5
7       O     2 2
8       O     4 4
9       O     6 6

[Structure drawing of the barbiturate substructure with the atom numbering used above]

Table 4.4: GENOA connectivity table obtained for the barbiturate
substructure.

The major functions of the GENOA program are summarized in
Table 4.5. Atom and substructure deﬁnitions can be saved to a ﬁle
for repetitive use. Atom deﬁnitions consist of a symbol and the
number of valences (eg. C, valences = 4). Substructures are defined
by creating the connectivity table for each substructure. The
molecular formula definition usually changes for each session. The
draw command allows substructures to be viewed. However, these
drawings are collections of characters and are occasionally confusing
to interpret. The constraint command allows structural constraints to
be entered with a range of occurrences (eg. none, at least 2, exactly
1). Once the molecular formula and substructure constraints have
been entered, the generate command can be issued to begin the

generation of candidate structures.

A typical GENOA session is shown in Figure 4.2. User supplied
input is highlighted in bold faced type. The program is command
driven with the “#” symbol as a prompt. The first command given to
the program in the example defines the molecular formula of a test
compound (i.e. butalbital). The next two commands retrieve
substructures from a user defined substructure library called
“SUB.LIB”. Drawings for the “SS145” and “SS72” substructures are
provided in Figure 4.3. The program is highly interactive and
prompts for required information if it is not found in the entered


 

COMMAND        DESCRIPTION

ALTERNATIVE    is used to define a substructure
               constraint on one of a number of
               substructures (eg. ALTERNATIVE
               carbonyl ethyl).

BLEACH         is used to remove color designations in
               cases. Atoms may be “colored” to
               control how substructures overlap.

CONSTRAINT     is used to define a substructure
               constraint with a number of occurrences
               (eg. CONSTRAINT ethyl at least 1).

DRAW           is used to draw structures, cases and
               substructures using ASCII characters
               in either a numbered or atom-labelled
               format.

DEFINE         is used to define atoms, substructures
               and the molecular formula.

(illegible)    is used to terminate the program.

(illegible)    is used to correct an already existing
               definition.

(illegible)    is used to remove definitions.

GENERATE       is used to initiate exhaustive structure
               generation.

PRESUPPOSE     is used to start structure generation
               using a file of alternative substructures
               (eg. in some natural products one might
               assume the presence of a terpenoid
               substructure).

 

Table 4.5: Summary of major GENOA commands.


 

 

COMMAND DESCRIPTION

RESTORE is used to restart a previously saved
session or a file of substructure
definitions.

SAVE is used to store the current status of a
session in a disk ﬁle.

SEARCH is used to search files for definitions.

SHOW is used to show current status, history
and connectivity tables.

 

Table 4.5: cont.


 

 

WELCOME TO GENOA, VERSION 83.0, MOLECULAR DESIGN LTD.
#define molform c 11 h 16 n 2 o 3
MOLECULAR FORMULA DEFINED
#search mylib sub.lib ss145
SS145 DEFINED
#search mylib sub.lib ss72
SS72 DEFINED
#constraint
SUBSTRUCTURE NAME:ss145
RANGE OF OCCURRENCES:at least 1
1 CASE WAS OBTAINED
#constraint ss72 at least 1
#.
1 CASE WAS OBTAINED
#generate
#..D..DDD.
5 STRUCTURES WERE GENERATED
TRANSFERRING CONTROL TO STRCHK...

 

 

Figure 4.2: A sample GENOA session.

 


command line. This feature is demonstrated in the example for the
ﬁrst “constraint” command. Another constraint is entered and then
the generate command is issued. In this example, 5 structures were
obtained. GENOA outputs a period to the screen for each unique
structure generated and a D for each duplicate structure. Program
control is then transferred to another program called STRCHK which

is described in the next section.

 

[Structure drawings of the SS145 and SS72 substructures]

Figure 4.3: Substructure drawings for the SS145 and SS72
substructures.

The Structure Checking Program, STRCHK

One of the major programs included with GENOA is a structure
checking program called STRCHK [8]. This program provides the
ability to interactively search complete structures (obtained from the
structure generation program, GENOA) for occurrences of predefined
substructures. The STRCHK program has many of the same
commands as GENOA (see Table 4.5) with the following additions:


PRUNE a command used to remove structures which do not
meet the constraints given in the command line (eg.
PRUNE SS3 at least 1 would remove all structures which
did not have at least 1 occurrence of SS3).

SURVEY provides a synopsis of the substructure content of the
list of candidate structures obtained from GENOA.

The SURVEY results using the structures generated in the
previous example are shown in Figure 4.4. Once again, user input is
highlighted in bold faced type. The key outputs from the program
are shown in bold faced, italic type. The most important output, in
terms of evaluating a set of candidate structures, is the list of
discriminating substructures. This output consists of a list of
substructure names and an associated number of occurrences in the
candidate structures. Discriminating substructures indicate
quantitatively the relative importance of substructures not yet
determined as being present or absent by the interpreter (or
interpretive software). Thus, the interpreter can focus attention on
identifying these substructures using existing data or possibly data
from new experiments. It is also possible to determine the number
of structures which possess combinations of discriminating features
using the SELECT command (a sub-command of SURVEY) as shown in
Figure 4.4.
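
The classification performed by SURVEY can be illustrated with a minimal sketch. It is not the STRCHK implementation: it assumes each candidate structure has already been reduced to the set of library substructures it contains, and the function names and the five hypothetical candidates below are chosen only so that the counts agree with Figure 4.4.

```python
from collections import Counter

def survey(candidates, library):
    """Classify library substructures by how many candidates contain them."""
    counts = Counter(name for found in candidates for name in set(found))
    total = len(candidates)
    in_all = sorted(n for n in library if counts[n] == total)
    in_none = sorted(n for n in library if counts[n] == 0)
    # A discriminating substructure occurs in some, but not all, candidates.
    discriminating = {n: counts[n] for n in library if 0 < counts[n] < total}
    return in_all, in_none, discriminating

def select(candidates, required):
    """Count the candidates that contain every substructure in `required`."""
    return sum(1 for found in candidates if set(required) <= set(found))

# Hypothetical substructure content of five candidate structures.
candidates = [
    {"SS72", "SS145", "SS6", "SS71"},
    {"SS72", "SS145", "SS6", "SS71"},
    {"SS72", "SS145", "SS6", "SS71"},
    {"SS72", "SS145", "SS6"},
    {"SS72", "SS145"},
]
library = ["SS3", "SS6", "SS71", "SS72", "SS145"]

in_all, in_none, discriminating = survey(candidates, library)
both = select(candidates, ["SS6", "SS71"])
```

Here `discriminating` plays the role of the list printed under "STRUCTURES WITH DISCRIMINATING FEATURES" and `select` corresponds to the SELECT sub-command.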

Use of Substructure Search in Rule Generation

The substructure buckets are one of the major pieces of
information required by the MAPS software. These data summarize

 

#survey y mylib sub.lib all n

READING ENTRIES FROM LIBRARY FILE.

 

 

SCANNING THROUGH STRUCTURES.

THE FOLLOWING LIBRARY FEATURES WERE NOT FOUND IN
ANY STRUCTURE.

(long list of substructures)

THE FOLLOWING LIBRARY FEATURES ARE IN ALL
STRUCTURES.

SS19
SS43
SS72
SS116
SS141
SS143
SS145

#STRUCTURES WITH DISCRIMINATING FEATURES:
4 SS6
3 SS71
Do you want to select structures with combinations of features?y
->select
Desired features>ss6 and ss71

3 structures.
->done
#

 

Figure 4.4: A demonstration of the SURVEY function of STRCHK.

the substructure content of each of the reference database
compounds. Manual construction of the substructure buckets is time
consuming and prone to errors, especially as the size of the reference
database increases. Consequently, a computerized method of
creating molecular structures, storing the computer representations
obtained, and searching each structure for occurrences of predefined
substructures is required.

This problem was largely solved with the acquisition of the
GENOA software. However, three major tasks remained to provide
automated structure searching. First, a library of predefined
substructures had to be constructed. The libraries provided with the
software were not directly applicable to the reference database
compounds. Second, the structures of each of the reference database
compounds had to be generated using the GENOA program. Third,
the STRCHK program had to be modified to allow a series of
structures with differing molecular formula(e) to be searched. The
substructures found to be present are written to a disk file.

The construction of the substructure library required the use of
the substructure definition commands in GENOA. The interactive
version of STRCHK was used to complete this operation. An example
of the commands used to create the substructure definitions is shown
in Figure 4.5. Two types of substructure definitions were included in
the library. The first type of definition includes those substructures
which define a particular compound class (e.g. barbiturate,
phenothiazine, phenol, etc.). The second type of definition includes

 

WELCOME TO GENOA, VERSION 83.0, MOLECULAR DESIGN LTD.
#define substructure ss161

(NEW SUBSTRUCTURE)

>?

CHAIN RING BRANCH LINK JOIN BORD HYBRIDIZATION

ATNAME HRANGE FREEV ARTYPE UNJOIN COLOR

ERASE SHOW DRAW GET DONE HALT

>ring 6                /*                            */
>bord                  /*                            */
FIRST ATOM:1           /* define a benzene ring      */
SECOND ATOM:2          /*                            */
BOND ORDER:2           /*                            */
>bord 3 4 2 5 6 2      /*                            */
>branch                /!                            !/
FROM ATOM:1            /! fuse 4 carbons to make a   !/
LENGTH OF BRANCH:4     /! 6 membered ring            !/
>join 2 10             /!                            !/
>branch 7 4            /+ fuse 4 more carbons to     +/
>join 14 8             /+ make another ring          +/
>branch 7 3            />                            >/
>at 17 N               /> fuse 2 C's and 1 N to      >/
>join 17 9             /> make a 5 membered ring     >/
>done

SS161 DEFINED

 

Figure 4.5: GENOA session illustrating the creation of a
substructure definition.


aromatic ring isomers with differing substitution patterns. These
substructures were included mainly to detect ring isomers among the
set of candidate structures, rather than to produce MAPS
substructure identification rules (a difficult, if not impossible task
without resorting to isomer-specific experiments such as
energy-resolved mass spectrometry). Once all of the substructures were
defined, they were saved to a file called “SUB.LIB”.

The substructure library currently contains 160 substructure
definitions. The reference names of the substructures were
generalized to “SSx” where “x” is the entry number in the list of
substructures. Previously, descriptive names such as “ethyl” or “1,2-
phenyl” were used. However, it became increasingly difficult to
formulate short descriptive names, especially for closely related
substructures. Usually, the substructure definitions were kept as
general as possible since it is difficult to make assumptions about
substitution patterns on the substructures. For example, “SS18” is a
generic benzene ring with no specific substitution pattern. For rule
application, this is the only substructure that is submitted to the
structure generator. There are 127 substructures out of this library
which are found in the reference database compounds.

It should be emphasized that our approach to substructure
identification is not to identify the specific structure of an ion
and then relate that structure to the overall molecular structure.
Rather, our approach is to discover those spectral features in
the MS/MS dataspace which are indicative of the presence
(or absence) of a predefined substructure. This point has been
illustrated using the ion structure of the 149+ ion and the phthalate-
ester substructure [20]. Thus, the substructure definitions are
descriptors of key molecular substructures (not ions) and a reliable
MAPS rule may or may not be found for a given substructure
definition.

The GENOA-formatted structure for each of the reference
database compounds was obtained in the following manner. A
substructure definition was created which included the connectivity
of all of the atoms in each of the compounds except hydrogen. The
resulting substructure definition was then used as a constraint in the
structure generator. Thus, only one structure was obtained when the
structure generation was initiated. Each structure was then saved to
a file with a standard file extension, “.MSMS”. This extension allows a
file to be easily created which contains the filenames of all of the
GENOA-format structures. There are currently 105 of these
structures corresponding to the reference database compounds listed
in Chapter 2.

The Molecular Design version of the STRCHK program was
modified as part of this research to automatically check the
structures of each of the “.MSMS” files. Several file protocols were
also developed to automate this program for use in ACES. The
modified program is referred to as ASLS (Automated Structure
Library Search) within the ACES system. This program assumes that
there is a file called “STR.LIS” which contains the filenames of all of
the reference database structures. This file can be easily generated
using the VMS (the VAX operating system) DIRECTORY/NOHEADER/
NOFOOTER command. For convenience, a command file which creates
the “STR.LIS” file and then invokes the ASLS program was written as
part of this research. The results of the program are written to
another file called “SUBSTR.LIS”. This is the file used by the MAPS
software to create the substructure buckets and, subsequently,
substructure identification rules. The format of the “SUBSTR.LIS” file
is shown in Figure 4.6.

Probably the only statistic of general interest regarding the
“SUBSTR.LIS” file obtained from the current reference database is the
average number of substructures found in the reference database
compounds. On average, 11 substructures were identified per
compound. This average means that the compounds contained in our
database are not simple (i.e. monofunctional) but are, in reality,
quite complex.
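
As an illustration of how a file in the “SUBSTR.LIS” layout could be turned into substructure buckets, a minimal sketch follows. It is not the MAPS code: it assumes only that a line beginning with “@” starts a new reference compound and that every following line names one substructure, and the reference names REF1 and REF2 are hypothetical.

```python
def parse_substr_lis(text):
    """Map each reference name to the substructures listed under it."""
    per_compound = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@"):
            # A new reference compound begins.
            current = line[1:]
            per_compound[current] = []
        elif current is not None:
            per_compound[current].append(line)
    return per_compound

def substructure_buckets(per_compound):
    """Invert the mapping: substructure -> compounds that contain it."""
    buckets = {}
    for compound, substructures in per_compound.items():
        for name in substructures:
            buckets.setdefault(name, []).append(compound)
    return buckets

sample = "@REF1\nSS4\nSS18\nSS19\n@REF2\nSS4\nSS20\n"
per_compound = parse_substr_lis(sample)
buckets = substructure_buckets(per_compound)
average = sum(len(s) for s in per_compound.values()) / len(per_compound)
```

The `average` value is the statistic quoted above (the mean number of substructures per reference compound), computed here for the two hypothetical entries only.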

A number of improvements could still be made to the ASLS
program. These improvements include: 1) creating the “STR.LIS” file
from within the program using VMS run-time library commands,
thus eliminating the need for a command file, 2) creating a “unified”
data format so all of the reference database structures can exist in
one file, and 3) modifying the code so the substructure library,
“SUB.LIB”, is only loaded once, rather than each time a search is
performed. It should be noted at this point that the STRCHK software
(STRCHK and associated modules) comprises approximately 10,500
lines of undocumented C code. While Molecular Design was kind

 

 

FORMAT                    EXCERPT
@<reference name #1>      @GMR1
<substructure #1>         SS4
<substructure #2>         SS18
<substructure #n>         SS19
@<reference name #2>      SS20
<substructure #1>         SS44
<substructure #2>         SS46
<substructure #n>         SS50
@<reference name #x>      SS70
<substructure #1>         SS87
<substructure #2>         SS133
<substructure #n>         @GMR10
                          SS4
                          SS18
                          SS19
                          SS20
                          SS21
                          SS44
                          SS45
                          SS46
                          SS50
                          SS93
                          SS133
                          (...)

 

Figure 4.6: Format and an excerpt of the “SUBSTR.LIS” file.

 

 

enough to provide us with the source code to this software, they
provided no software support. Thus, the modifications made to this
software (and to GENOA) required more “sleuthing” than is generally
desirable.

Automatic Structure Generation

Three ACES programs are involved in the generation of
candidate structures. These programs are: 1) the molecular formula
generator (MFG) [21], 2) the RULE program, and 3) GENOA. As was
discussed in Chapter 1, the MFG program produces a list of molecular
formula(e) that are consistent with a given molecular weight and
elemental constraints. If an exact mass is input to MFG, only one
formula will be generated. If only TQMS data are available, a
molecular mass with a typical tolerance of 0.1 amu is entered.
Additional constraints based on the elemental compositions of
substructures found to be present by the RULE software or from
ratio measurements of ions in the M+1 daughter spectrum can be
used to reduce the number of formulae generated by the MFG
program. The RULE program was recently written by Drake Diedrich
in C to apply MAPS rules to MS/MS data of unknowns. This program
prompts for the filenames of the MS/MS data, the file containing the
MAPS rules and the output filename. The interaction of these
programs and their I/O are shown in Figure 4.7.
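
The kind of enumeration performed by a molecular formula generator can be sketched as follows. This is not the MFG algorithm of reference [21]. The sketch assumes nominal (integer) atomic masses, element bounds supplied by the user, and a rings-plus-double-bonds check as the only chemical filter; the function name, the bounds and the filter are all illustrative assumptions.

```python
from itertools import product

# Nominal (integer) atomic masses; an assumption of this sketch.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "F": 19}

def generate_formulas(target_mass, bounds):
    """Return element-count dicts whose nominal mass equals target_mass.

    bounds maps each element to an inclusive (min, max) count and must
    include "H"; hydrogen is derived from the mass left by the heavy atoms.
    """
    heavy = [element for element in bounds if element != "H"]
    ranges = [range(bounds[e][0], bounds[e][1] + 1) for e in heavy]
    h_min, h_max = bounds["H"]
    results = []
    for counts in product(*ranges):
        formula = dict(zip(heavy, counts))
        hydrogens = target_mass - sum(NOMINAL_MASS[e] * n for e, n in formula.items())
        if not h_min <= hydrogens <= h_max:
            continue
        formula["H"] = hydrogens
        # Assumed sanity filter: twice the rings-plus-double-bonds count
        # must be a non-negative even number (F counted as monovalent).
        twice_rdb = (2 * formula.get("C", 0) + 2 + formula.get("N", 0)
                     - formula["H"] - formula.get("F", 0))
        if twice_rdb < 0 or twice_rdb % 2:
            continue
        results.append(formula)
    return results

# Hypothetical constraints: 20 carbons, at least 2 N and 1 S, mass 340.
bounds = {"C": (20, 20), "H": (0, 42), "N": (2, 4),
          "O": (0, 2), "S": (1, 2), "F": (0, 1)}
formulas = generate_formulas(340, bounds)
```

With these loose bounds the sketch returns more formulae than a constrained MFG run would; tighter element limits or additional chemical rules would be needed to narrow the list.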

[Figure 4.7: Diagram of the interaction of the MFG, RULE and GENOA
programs and their I/O; graphic not recoverable.]

GENOA has been modified to read and utilize the information
available from the MFG and RULE programs. The modified version of
this program is called AGEN (Automated GENoa). The GENOA
program requires, at a minimum, the molecular formula of an
unknown before candidate structures can be generated. Structural
constraints are also necessary to limit the number of structures
generated. Automation of the structure generation process requires
a link between GENOA, the MFG program and the RULE program. The
links used in this work are files written to include the results of each
program. The MFG program writes a file with a standard file
extension of “.MFG” which contains a list of molecular formula(e).
The format and an example of this file are shown in Figure 4.8. The
RULE program writes a file using a standard file extension of “.MPS”
which lists the substructures found to be present in an unknown.
This software will be modified to utilize exclusion rules, as well. The
format and an example of this file are shown in Figure 4.9.

 

 

FORMAT          EXAMPLE
molform1        C20H8N2S1O2
molform2        C20H8N2S2
                C20H12N4S1
molformn        C20H21N2S1F1
<eof>           C20H24N2S1O1

 

Figure 4.8: Format and example of the “.MFG” results file.

The data used to obtain the formulae shown in Figure 4.8 were
acquired for propionylpromazine, a compound not in the reference
database. A molecular weight of 340 amu was determined by
inspection of the EI and CI mass spectra of the compound.
Automated recognition of the molecular ion in unknowns remains to
be implemented in this system. The constraints entered into the
MFG program included the number of carbon atoms (i.e. 20 carbons
as determined by ratios of ion intensities in the M+1 daughter
spectrum). Other constraints (e.g. at least two nitrogens and one
sulfur atom) were inferred by the identification of the phenothiazine
and dimethylamino substructures. The five molecular formulae
shown in Figure 4.8 were output by the MFG program and written to
a “.MFG” file.

 

 

FORMAT EXAMPLE
substructure name SS10
substructure name SS18
... SS46
substructure name SS56
@I SS78
substructure name SS85
substructure name SS132
@I
substructure name @3
@3 @A
substructure name
substructure name
substructure name
@A

<eof>

 

Figure 4.9: Format and example of the “.MPS” results file.

The automated procedure used in the AGEN program is
summarized in Figure 4.10. The AGEN program can produce more
than one result file; one file for each entry in the “.MFG” file. The
format of these files is the standard “save” format used by GENOA
with a standard file extension of “.STR”. Note in Figure 4.10 that the
AGEN program can reject a molecular formula if the substructure
constraints are inconsistent with a given molecular formula. Sample
output from AGEN is provided in Figure 4.11. The output shown was
obtained for chlorpromazine, another test compound. For brevity,
only one molecular formula was used and the output for
substructure retrieval and case generation (i.e. calculating
overlapping substructures) was abbreviated. Ten structures were
obtained in this example and written to a file as specified in Figure
4.7. The AGEN program transfers control to another program,
DISCRIM, when structure generation has completed. The DISCRIM
program is a modified version of STRCHK and is discussed in the next
section.
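
The control flow summarized in Figure 4.10 can be sketched as a simple loop. The sketch is illustrative only: `generate` stands in for the GENOA structure generator and is passed in as a hypothetical callable, and the toy generator and the second formula below are invented for the example.

```python
def run_agen(formulas, substructures, generate):
    """Try each molecular formula with every substructure required at least once.

    generate(formula, constraints) is a stand-in for the structure
    generator; it returns a (possibly empty) list of structures.
    """
    saved, rejected = {}, []
    for formula in formulas:
        # Each inclusion substructure becomes an "at least 1" constraint.
        constraints = [(name, 1) for name in substructures]
        structures = generate(formula, constraints)
        if structures:
            saved[formula] = structures      # would be written to a ".STR" file
        else:
            rejected.append(formula)         # formula marked as bad
    return saved, rejected

def toy_generate(formula, constraints):
    """Hypothetical generator: only chlorine-containing formulas succeed."""
    if "Cl" in formula:
        return [f"{formula}/structure{i}" for i in range(1, 3)]
    return []

saved, rejected = run_agen(
    ["C17H19N2S1Cl1", "C18H22N2S1O1"], ["SS132", "SS13"], toy_generate
)
```

Keeping the generator behind a callable mirrors the file-based links described above, in which each program only consumes the results written by the previous one.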

The computation time required to generate structures can vary
widely (e.g. from several minutes to several hours) depending upon
the constraints given to the structure generator. The structure
generation time is a function of, among other things, the number of
undetermined free valences (bonding sites) and atoms. The order in
which the constraints are entered can also have a drastic effect on
structure generation time. The numbers of cases generated for
chlorpromazine with two different constraint orderings
are shown in Table 4.6 and Table 4.7. The constraints were ordered
according to their reference names in Table 4.6 and according to the
number of atoms contained in each of the substructure definitions in

 

AUTOMATIC STRUCTURE GENERATOR (ASG)

        GET SS'S (INCLUSION)
                 |
        GET MOL. FORMULA(E)
                 |
        DEFINE MOL. FORMULA
                 |
        GET SS DEFINITIONS
                 |
        CONST SS(X) AT LEAST 1
                 |
        GENERATE STRUCTURES
                 |
          YES         NO
          SAVE        MARK MF
          STRUCTURES  AS BAD

Figure 4.10: Flowchart of the automatic structure generator.

 

Automated Structure Generator
GENOA: MSU Version 1.0

REFERENCE NAME (EXIT to end program): UNK1
There are 1 molecular formula(e) associated with UNK1
There are 11 substructures associated with UNK1

SUBSTRUCTURE TO BE RETRIEVED: SS132 DEFINED
SUBSTRUCTURE TO BE RETRIEVED: (and so on...)

MOLECULAR FORMULA DEFINED

SUBSTRUCTURE NAME: 1 CASE WAS OBTAINED
SUBSTRUCTURE NAME: (and so on...)

10 STRUCTURES WERE GENERATED

SAVED ON UNK100.STR

Structure generation complete.

Transferring to DISCRIM for discriminating substructure analysis.
ENTERING DISCRIM PROGRAM...

UNK100.STR RESTORED

10 STRUCTURES

Starting SURVEY

READING ENTRIES FROM LIBRARY FILE.

SCANNING THROUGH STRUCTURES.

SAVE FILENAME (STRCHK)= UNK1.GEN
ASG exited.
ASG exited.

 

Figure 4.11: Sample output from the automated structure
generator (ASG) program.

Table 4.7. The number of cases generated was greatly reduced by
starting with the substructures which had the greatest number of
atoms. This optimization leads to greatly reduced structure
generation times (e.g. 36 minutes using the ordering found in Table
4.6 versus 6 minutes using the ordering found in Table 4.7).
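
The reordering of constraints by substructure size is a simple sort. The sketch below uses the reference numbers, atom counts and descriptive names listed in Table 4.6; breaking ties by descending reference number is an assumption inferred from the order shown in Table 4.7, not a documented feature of AGEN.

```python
# (reference number, number of atoms, descriptive name), from Table 4.6
CONSTRAINTS = [
    (10, 3, "2 carbons and a S"),
    (13, 1, "chloro"),
    (18, 6, "x-phenyl"),
    (19, 1, "methyl"),
    (47, 7, "x-chloro-phenyl"),
    (51, 7, "x-nitrogen-phenyl"),
    (56, 3, "dimethylamino"),
    (78, 7, "x-sulfur-phenyl"),
    (85, 6, "1,2-phenyl"),
    (90, 6, "1,2,3-phenyl"),
    (132, 14, "phenothiazine"),
]

def order_by_size(constraints):
    """Largest substructures first; ties broken by descending reference number."""
    return sorted(constraints, key=lambda c: (c[1], c[0]), reverse=True)

ordered = [f"SS{ref}" for ref, _atoms, _name in order_by_size(CONSTRAINTS)]
```

Submitting the largest substructure first fixes the most atoms early, which is consistent with the much smaller case counts reported for the second ordering.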

 

 

SUBSTRUCTURE REFERENCE              # OF CASES GENERATED
SS10  (3, 2 carbons and a S)        1
SS13  (1, chloro)                   1
SS18  (6, x-phenyl)                 6
SS19  (1, methyl)                   6
SS47  (7, x-chloro-phenyl)          149
SS51  (7, x-nitrogen-phenyl)        942
SS56  (3, dimethylamino)            3436
SS78  (7, x-sulfur-phenyl)          3278
SS85  (6, 1,2-phenyl)               562
SS90  (6, 1,2,3-phenyl)             210
SS132 (14, phenothiazine)           4

Table 4.6: List of substructure constraints (ordered by reference
number) and the number of cases generated by the
structure generator. The number of atoms and a
descriptive name for each substructure reference are
provided in parentheses.

Potential for the Reduction of the Number of Candidate
Structures through Ancillary Experiments

One of the long-term goals in the development of the ACES
software is the integration of the various software tools under an
“intelligent controller” program. Among the desired features of this
program is easy access to the ACES software tools, thus minimizing
the number of programs a user must learn to achieve results.

 

 

SUBSTRUCTURE REFERENCE              # OF CASES GENERATED
SS132 (14, phenothiazine)           1
SS78  (7, x-sulfur-phenyl)          1
SS51  (7, x-nitrogen-phenyl)        1
SS47  (7, x-chloro-phenyl)          4
SS90  (6, 1,2,3-phenyl)             10
SS85  (6, 1,2-phenyl)               2
SS18  (6, x-phenyl)                 2
SS56  (3, dimethylamino)            2
SS10  (3, 2 carbons and a S)        2
SS19  (1, methyl)                   2
SS13  (1, chloro)                   2

 

Table 4.7: List of substructure constraints (ordered by number of
atoms) and the number of cases generated by the
structure generator. The number of atoms and a
descriptive name for each substructure reference are
provided in parentheses.

Another important capability of the intelligent controller is the
ability to suggest ancillary experiments (MS/MS experiments, in this
case) which can further resolve a structure elucidation problem.
Implementation of these ancillary experiments either by a human
operator or by using the data acquisition / instrument control
software will close a “feedback loop” which includes the mass
spectrometer. This feedback loop will allow the structure elucidation
system to solve problems in much the same way human experts
solve problems. The following paragraphs describe how such a

system may be implemented.

The automated version of the structure generator is an
important step in achieving an “intelligent” MS/MS instrument. A
method for summarizing the unidentified substructures within a set
of candidate structures is also required to make this instrument a
reality. The approach used in the ACES system is to use another
modified version of the Molecular Design STRCHK program to provide
a substructure analysis of a set of candidate structures for an
unknown. This version of STRCHK (referred to as
DISCRIM within the ACES system) was modified to compile
substructure counts from the files created by the AGEN program (the
“.STR” files) and to write the results to a file with a standard
extension of “.GEN”. More than one structure file is obtained for an
unknown only if there is more than one viable molecular formula for
that unknown. This process is shown schematically in Figure 4.12 and
the format of the “.GEN” file is given in Figure 4.13. The modified
version of STRCHK is called DISCRIM in ACES because it provides a
list of “discriminating substructures”. The list of discriminating
substructures produced by the SURVEY function for a set of
candidate structures is shown in Figure 4.14. This list consists of
each of the substructure names and an associated number of
occurrences of each discriminating substructure within the set of
candidate structures. Thus, the automated version of the structure
generator, AGEN, and the automated version of the structure checker,
DISCRIM, provide the basis for incorporating the ability to suggest
ancillary experiments into the ACES system.

The flowchart which describes a proposed program, EXPT, that
will be able to recommend ancillary experiments is shown in Figure
4.15. The EXPT program will read the “.GEN” file for an unknown to

obtain the discriminating substructures. The MAPS rulebase will

[Figure 4.12: Schematic diagram of the process used to obtain the
discriminating substructures for a set of candidate structures;
graphic not recoverable.]

 

 

FORMAT

<reference name>
<# MOLFORMS>
MOLFORM # 1
<molform #1>
<# of structures>
DISCRIMINATING SUBSTRUCTURES
<DSS #1>
<DSS #2>
<DSS #n>
MOLFORM # 2
<molform #2>
<# of structures>
DISCRIMINATING SUBSTRUCTURES
<DSS #1>
<DSS #2>
<DSS #n>
MOLFORM # n
<molform #n>
<# of structures>
DISCRIMINATING SUBSTRUCTURES
<DSS #1>
<DSS #2>
<DSS #n>
$END

EXAMPLE

NEWTEST
1
MOLFORM # 1
C 24 H 38 O 4
125
DISCRIMINATING SUBSTRUCTURES
T-BUTYL
11
ETHOXY
31
NONYL
50
DECYL
21
T-AMYL
4
UNDECYL
DODECYL
I-PROPYL
31
$END

 

Figure 4.13: Format and an example of the “.GEN” ﬁle.


 

# STRUCTURES WITH DISCRIMINATING FEATURES:

27 HYDROXYL
30 METHYL
   ETHYL
   METHOXY
   PROPYL
   PHENOL
   PROPENYL
   METHOXYPHENYL
   TOLYL
   PHENOXY
   P-HYDROXY
   S-HYDROXY
   T-HYDROXY
   BMW
 2 CYCLOPROPYL
   CYCLOBUTYL

[Counts for the remaining entries are illegible in the source.]

Taken from a set of 54 candidate structures
with MOLFORM = C 11 H 12 O 2.

 

Figure 4.14: Sample output from survey function using 54
candidate structures generated from the molecular
formula shown.


 

 

Retrieve discriminating substructures from “.GEN” file.
                 |
Check MAPS rulebase for rules corresponding to
discriminating substructures.
                 |
Retrieve instrumental conditions.
                 |
Rank substructure / conditions.
                 |
Input to intelligent controller.

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.15: Flowchart showing basic algorithm for obtaining
ancillary experiment recommendations.


then be consulted to determine if there is a rule which can identify
or eliminate each of the discriminating substructures. The
instrumental conditions used to acquire the data upon which the
MAPS rule is based will also be required. The MAPS rulebase will
need to contain rules for a variety of instrumental conditions. These
conditions can include alternative ionization conditions (e.g. ammonia
CI) and collision conditions (e.g. assorted collision gas pressures,
collision energies, or reactive collisions vs. dissociative collisions).

The ancillary experiments should be ranked to maximize the
potential information which can be gained by running each
experiment. The method of ranking will depend upon the MAPS rule
type. For inclusion rules (i.e. rules which identify the presence of a
substructure), experiments to identify discriminating substructures
with a small number of occurrences within the set of candidate
structures are preferable. The reason for this ranking is that all
structures not containing the newly identified substructure will be
removed from the list of candidate structures. Thus, identification of
the presence of a discriminating substructure with the smallest
number of occurrences within the set of candidate structures will
leave, after pruning, the fewest candidate structures. The lack of an
identification using the ancillary experiment, however, does not
mean the substructure is not present, only that it is still unidentified.
For exclusion rules, which are used to identify the absence of a
substructure, experiments which identify discriminating substructures
with a large number of occurrences within the set of candidate
structures are preferred. The reason for this ranking is that a
positive result for a discriminating substructure with the largest
number of occurrences will, after pruning, leave the fewest candidate
structures.
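
The ranking argument above can be made concrete with a short sketch. It is not the proposed EXPT program: it assumes that the occurrence counts come from a survey of the candidate structures and that the type of rule available for each substructure is known, and the rule-type assignments in the example are hypothetical.

```python
def rank_experiments(discriminating, total, rule_types):
    """Rank substructures by the candidates left after a positive result.

    discriminating: substructure name -> number of candidates containing it
    total:          number of candidate structures
    rule_types:     substructure name -> "inclusion" or "exclusion"
    """
    ranked = []
    for name, count in discriminating.items():
        kind = rule_types.get(name)
        if kind == "inclusion":
            remaining = count            # keep only structures containing it
        elif kind == "exclusion":
            remaining = total - count    # remove every structure containing it
        else:
            continue                     # no rule available for this substructure
        ranked.append((remaining, name, kind))
    return sorted(ranked)

# Counts taken from Figure 4.4 (5 candidates); rule types are hypothetical.
ranking = rank_experiments(
    {"SS6": 4, "SS71": 3},
    total=5,
    rule_types={"SS6": "exclusion", "SS71": "inclusion"},
)
```

In this example an exclusion experiment for SS6 would leave one candidate structure, whereas an inclusion experiment for SS71 would leave three, so the former is ranked first.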

The intelligent controller, when it is completed, must be able to
1) determine what experiments have been performed, 2) call the
EXPT program to obtain an ancillary experiment and 3) either inform
the instrument operator of the new experiment or run the
experiment via the instrument control language. The combination of
the ACES interpretive programs with a program which can
recommend, and perhaps implement, ancillary experiments will

provide a powerful new tool for structure elucidation.

References

1.  Smith, D.H., Ed., "Computer-Assisted Structure Elucidation", ACS
    Symposium Series, 54, American Chemical Society, Washington,
    D.C., p. 126, 1977.

2.  ibid., p. 92.

3.  ibid., p. 108.

4.  Gribov, L.A., Elyashberg, M.B., Serov, V.V., [journal title
    illegible], 21, 75 (1977).

5.  Carhart, R.E., Smith, D.H., Gray, N.A.B., Nourse, J.G., Djerassi, C.,
    J. Org. Chem., 46, 1708 (1981).

6.  Lindsay, R.K., Buchanan, B.G., Feigenbaum, E.A., Lederberg, J.,
    "Applications of Artificial Intelligence for Organic Chemistry -
    The Dendral Project", McGraw-Hill, New York, 1980.

7.  Gray, N.A.B., Buchs, A., Smith, D.H., Djerassi, C., Helv. Chim.
    Acta, 64, 458 (1981).

8.  Smith, D.H., Gray, N.A.B., Nourse, J.G., Crandell, C.W., Anal. Chim.
    Acta, 133, 471 (1981).

9.  Crandell, C.W., Gray, N.A.B., Smith, D.H., J. Chem. Inf. Comput.
    Sci., 22, 48 (1982).

10. Lindley, M.R., Gray, N.A.B., Smith, D.H., Djerassi, C., J. Org.
    Chem., 47, 1027 (1982).

11. Egli, H., Smith, D.H., Djerassi, C., Helv. Chim. Acta, 65, 1898
    (1982).

12. Lindley, M.R., Shoolery, J.N., Smith, D.H., Djerassi, C., Org. Magn.
    Reson., 21, 405 (1983).

13. Djerassi, C., Smith, D.H., Crandell, C.W., Gray, N.A.B., Nourse, J.G.,
    Lindley, M.R., Pure Appl. Chem., 54, 2425 (1982).

14. Molecular Design, LTD., 2132 Farallon Drive, San Leandro, CA
    94577.

15. Gray, N.A.B., "Computer-Assisted Structure Elucidation", John
    Wiley and Sons, New York, NY (1986).

16. ibid., p. 208.

17. Koniver, D.A., Wiswesser, W.J., Usdin, E., Science, 176, 1437
    (1972).

18. Gray, ibid., p. 215.

19. Ray, L.C., Kirsch, R.A., Science, 126, 814 (1957).

20. Palmer, P.T., Ph.D. Dissertation, Michigan State University, East
    Lansing, MI, 1988.

21. Palmer, P.T., Enke, C.G., [journal title and volume illegible],
    81 (1989).

CHAPTER 5

Evaluation of MAPS Feature-Combination Rules

Introduction

The MAPS rules provided in Chapter 3 were evaluated by their
ability to reliably identify the substructures contained in the
reference database compounds. Since the rules were generated using
the MS/MS data from each of these reference compounds, an
estimate of recall of the rules can also be calculated. Thus, the
reliability and recall estimates describe the effectiveness of the rules
when they are applied to the reference database compounds. This
chapter begins with a discussion of some general statistics for a
rulebase generated using one set of MAPS program parameters.
However, as will be demonstrated in a subsequent section of this
chapter, some MAPS rules containing features that are not due to the
indicated substructure can be generated, depending upon the initial
conditions input to the MAPS program. These conditions vary the
number and type of spectral features which can be used by the
feature-combination function. Thus, different substructures can

have different optimal starting parameters for rule generation.

There are two methods for evaluating MAPS rules in addition
to the reliability and recall estimates. The first is to determine the
validity of the spectral features contained in the MAPS rules based
on proven ion chemistry. Thus, in this chapter, several MAPS rules
have been examined in detail to determine the effect of the initial
filter values (i.e. U, C and mass) on rule content. The results of this
examination are provided after the reliability / recall statistics are
addressed. Another method of evaluating MAPS rules is to
determine their effectiveness outside of the reference database, that
is, against unknowns. The evaluation of the MAPS rules by tests
with unknowns is presented in Chapter 6.

Reliability and Recall for the MAPS Rulebase

The frequency of unique occurrence within the reference
database for each substructure is shown in a histogram in Figure 5.1.
This figure graphically illustrates the variance that exists in the
number of “examples” available to the rule generation program in
generating a substructure identification rule. Thus, percent recall is
a somewhat misleading indicator of the effectiveness of a particular
rule. For example, a MAPS rule with 100% recall can be generated
for substructures “SS18” (phenyl) and “SS64” (dimethoxyphenyl).
There are, however, 83 reference compounds containing the “SS18”
substructure while only 2 contain the “SS64” substructure. Thus, the
recall estimate of 100% for the “SS18” rule means that 83
substructure identifications will be made when that rule is applied to
the reference database compounds while only 2 substructure
identifications will be made when the “SS64” rule is similarly
applied. The recall parameter for a rulebase, therefore, must be
interpreted with the variance in the number of occurrences of the
substructures within the reference database in mind.
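
The point that equal recall percentages can hide very different numbers of identifications is easily shown numerically. The sketch below assumes the conventional definitions (recall as the percentage of compounds containing the substructure that the rule identifies, reliability as the percentage of identifications that are correct); the compound labels are hypothetical placeholders.

```python
def recall_and_reliability(identified, containing):
    """Return (recall %, reliability %, number of correct identifications).

    identified: compounds flagged by the rule
    containing: compounds that actually contain the substructure
    """
    identified, containing = set(identified), set(containing)
    correct = identified & containing
    recall = 100.0 * len(correct) / len(containing) if containing else 0.0
    reliability = 100.0 * len(correct) / len(identified) if identified else 0.0
    return recall, reliability, len(correct)

# 83 reference compounds contain SS18; only 2 contain SS64.
ss18_compounds = {f"compound{i}" for i in range(83)}
ss64_compounds = {"compound0", "compound1"}

ss18_stats = recall_and_reliability(ss18_compounds, ss18_compounds)
ss64_stats = recall_and_reliability(ss64_compounds, ss64_compounds)
```

Both rules score 100% on recall and reliability, yet the third element of each result (83 versus 2) shows how differently they contribute when applied to the reference database.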

A second, and very important consideration within the scope of
the ACES system, is the ability of a particular substructure
identification to reduce the number of candidate structures obtained
from the structure generator. Generally, the number of candidate
structures decreases when the number of atoms in the substructure
constraints (i.e. larger identified substructures) applied increases.
For example, the identification of the PHENOTHIAZINE substructure
(14 atoms in the substructure definition and 13 occurrences in the
reference database compounds) is much more effective than the
identification of the CARBONYL substructure (2 atoms in the
substructure definition and 51 occurrences in the reference database
compounds) in reducing the number of candidate structures for a
test compound such as 3-acetyl-10-(3-dimethylaminopropyl)
phenothiazine.

Another important factor which affects the recall estimate for a
feature-combination rule is the number of initial features input to
the combination generator. It is possible, for example, for one set of
initial uniqueness and correlation values to yield no features (and,
therefore, zero recall) while another set of initial values may produce
many features (and possibly 100% recall). Unfortunately, there is no
direct relationship between the number of initial features and the

recall estimate other than the requirement that some features meet

165

 

 

625383 8323 3:232 05 E 232533
:80 ..e «33.—33° 2.3:: .3 236:: on. «532.» 55: 282533. on. Co 53°35

:53 .u ...-Ea: 00:03.0: 23253.5
0. .0 v. 0. a. a. 0. on 0. no 0. a. C. no a. '0 o. as as as. on as vs as us '5 on a. .0 so 0. o. v. no u. e. 0. on on so 0‘ 0. cm on u. .... 0v

l ...—52.58 a .o ESE—...:

.5». .u .383: 02.280: 23252.3

/Ochvovnvvvnvncvvovononhnoa-nvnnnnnwuonQuouﬁuouuuvununuuuonopo'huoaQ-v'n—up :Op 0 Q s O a v n a -

 

 

any»; 5.5: ogeegmaﬂm

 

0v

 

 

 

”Ian-a oouuqou u]
soon-um so mun"

«sq-tea comp“ u|
soouounooo go mum"

 

and «Laura

166

...—co "_.m

.5». .x .35.: 02.23.: 2325.36

00— on— .m— hm— our nmpvm— nm— «m— —m— amp 0: .o— 59— ov—mvu vvp nQp mop pc— 0v. on. .n— so— on— map 00' 00w «up '0— oo— ou— ou— hw—

 

 

95w:—

. o

.2 mm
mm

.2 me
”I

.0. 0M

engage—3 n .0 £55.53 N. m

r... Mm
I

ﬁg.

.59 .u .35.: 02.23.: 2325.35

can map 0N. nu, «Np ..N— ea, c: a: h: o: a: Q—_. a: a: p: a: oo— .o— hop on. now vo— no. «o. —o— 2: no no he

 

 

.00—

Eduuoama 5.5m:— engages—um

euqnea cousin-u u|
soon-unooo Io roquum

 

167

the minimum uniqueness and correlation values. For example, 100%
recall can be achieved for the PHENOTHIAZINE substructure using 7
initial features while 0% recall is obtained for the ethyl substructure
using 49 initial features. This result is simply due to the limitation of
mass spectrometry in identifying some substructures. A histogram
of the number of initial features obtained using a minimum
uniqueness and correlation of 30% for each substructure in the
substructure library is provided in Figure 5.2. This figure shows that
the number of features obtained for a given set of initial conditions
varies widely within the substructure library. Note that zero recall is
observed for substructures with zero occurrences in the reference
database (see Figure 5.1). The rule recall estimate obtained for each
of the substructures in the library is shown in Figure 5.3. These
recall estimates should be interpreted with the data contained in
Figures 5.1 and 5.2 in mind.

Overall, there are 49 substructures with at least 5 unique
occurrences within the reference database compounds. Forty-two
MAPS feature-combination rules were obtained for these
substructures (using Ui = 30, Ci = 30, Cc = 30, USORT, MINF = 1,
MINCMPDS = 5). The average overall recall for these rules was 82%.
Significantly, each of these rules has 100% reliability with respect
to the reference database. Thus, the MAPS (v. III) software provides
a method to obtain substructure identification rules with high
reliability and high recall estimates. The content of these rules is

discussed in the next section.
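
As a minimal sketch of how recall and reliability estimates of this kind could be computed against a reference database, the following Python assumes that recall is the percentage of substructure-containing reference compounds matched by a rule, and that reliability is the percentage of matched compounds that actually contain the substructure. The function names, the data layout and the three-compound toy database are illustrative assumptions, not the MAPS implementation.

```python
def rule_matches(clauses, features):
    """A rule fires when every feature of at least one clause is present."""
    return any(clause <= features for clause in clauses)


def recall_and_reliability(clauses, database, substructure):
    """Return (recall %, reliability %) of a rule over a reference database.

    database is a list of (feature set, substructure set) pairs,
    one pair per reference compound.
    """
    matched = [subs for feats, subs in database if rule_matches(clauses, feats)]
    n_with_ss = sum(1 for _, subs in database if substructure in subs)
    n_correct = sum(1 for subs in matched if substructure in subs)
    recall = 100.0 * n_correct / n_with_ss if n_with_ss else 0.0
    reliability = 100.0 * n_correct / len(matched) if matched else 0.0
    return recall, reliability


# Hypothetical toy database (not taken from the reference database).
toy_db = [
    ({"D 198", "PD 198->154"}, {"PHENOTHIAZINE"}),
    ({"D 198"}, {"PHENOTHIAZINE"}),
    ({"D 57"}, {"T-BUTYL"}),
]
# One clause requiring both features (an "and" combination).
toy_rule = [frozenset({"D 198", "PD 198->154"})]

print(recall_and_reliability(toy_rule, toy_db, "PHENOTHIAZINE"))
```

In this toy case the rule matches only one of the two PHENOTHIAZINE compounds and no others, which illustrates how a rule can be fully reliable while its recall remains below 100%.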

 


Analysis of Several MAPS Rules

Several MAPS rules were examined to determine the
correspondence of the spectral features found in the rules with those
from documented fragmentation pathways. These pathways have
been discovered through the use of high resolution mass
spectrometry to determine the elemental composition of the
fragment ions and isotope labelling to assist in determining the
mechanism of the fragmentation. Tandem mass spectrometry has
also proven to be an effective tool in probing the ion chemistry of a
variety of compounds [1]. The following sections provide a
comparison of the spectral features found in MAPS rules with those
identified in the literature. The effect of varying the MAPS program
parameters is explored and an additional evaluation parameter,
cross-correlation, is introduced.

“PHENOTHIAZINE”

There are 13 compounds in the reference database which
contain the PHENOTHIAZINE substructure shown in Figure 5.4. These
compounds are clinically useful as antipsychotic drugs [2]. Several
studies of the EI and CI mass spectra of these compounds have been
published. One of these studies examined the primary mass spectra
of newly synthesized 2-substituted 10-N-(aminoacyl)phenothiazines
which possess antigastric-antiulcer and antidepressant activity [3].
Another of these studies used metastable ions, exact mass

measurements and deuterated derivatives to investigate the

[Figure 5.2: Histogram of the number of initial features obtained for each substructure in the substructure library (Ui = 30, Ci = 30), plotted against substructure reference number.]

[Figure 5.3: Histogram of the MAPS feature-combination rule recall obtained for each substructure in the substructure library (Ui = 30, Ci = 30, Cc = 30), plotted against substructure reference number.]

fragmentation pathways of phenothiazines [4]. The sites of
protonation in phenothiazines have been determined using methane
and ammonia chemical ionization mass spectra with high resolution
mass measurements to confirm the empirical formulae of fragment

ions [5].

An early study of phenothiazine derivatives used high
resolution mass spectrometry to determine the fragmentation
pathways in preparation for the analysis of phenothiazine metabolites
[6]. The investigators in this study divided the fragmentations of
phenothiazine compounds into three groups: 1) fragments
representing the side chain, 2) fragments representing the intact
phenothiazine ring system with part of the side chain attached and 3)
fragments representing a partially fragmented ring system. MS/MS
spectral features due to all three types of fragmentations have been
observed in the MAPS rules for the PHENOTHIAZINE substructure.
The following discussion compares the spectral features contained in
the MAPS rules for PHENOTHIAZINE to those which were identified

in the aforementioned studies.

The initial features obtained for the PHENOTHIAZINE
substructure (Ui = 40% and Ci = 70%) are shown in Figure 5.4. The mass
filter was not utilized in generating this particular rule. Thus, there
are three spectral features (e.g. D 211.0) in this rule with masses
larger than the nominal mass of the PHENOTHIAZINE substructure
(i.e. 198 amu). Proposed structures of fragment ions of

phenothiazine are provided in Figure 5.5. Pertinent fragmentation

IF  “D (211.0) {42,77}”           {F1}
and “D (210.0) {41,35}”           {F2}
and “D (209.0) {40,77}”           {F3}
and “D (198.0) {44,92}”           {F4}
and “PD (198.0 -> 171.0) {77,77}” {F5}
and “PD (198.0 -> 154.0) {92,92}” {F6}
and “PD (197.0 -> 196.0) {69,85}” {F7}
and “PD (197.0 -> 153.0) {91,77}” {F8}
and “PD (196.0 -> 152.0) {83,77}” {F9}
and “PD (70.0 -> 27.0) {42,35}”   {F10}

THEN substructure PHENOTHIAZINE is present.

(Umin = 40%, Cmin = 70%)

[Structure drawing: PHENOTHIAZINE substructure]

Figure 5.4: The initial MAPS (v.II) rule obtained for the
PHENOTHIAZINE substructure.

pathways gleaned from the references discussed above are given in
Figure 5.6. Note that the "PD" features contained in the MAPS rule
for PHENOTHIAZINE (Figure 5.4) correspond directly to the key
fragmentation pathways outlined in Figure 5.6. The feature number
from the PHENOTHIAZINE rule is provided in brackets for each of the
corresponding fragmentations shown in Figure 5.6. These figures
demonstrate the ability of the MAPS rule generation software to
select diagnostic spectral features from within the MS/MS database
for inclusion in substructure identification rules. The feature
combination rule produced from the initial spectral features shown
in Figure 5.4 is provided in Figure 5.7. This rule has 100% reliability
and 100% recall for the PHENOTHIAZINE substructure with respect to

the reference database.
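
The correspondence between the "PD" features and the documented pathways can be checked with simple nominal-mass arithmetic: the parent mass minus the daughter mass should equal the mass of the proposed neutral loss. The sketch below is illustrative only. The parent and daughter masses and the neutral species are those listed for features F5 to F10, the 43 amu loss is read here as C2H5N (an assumption about a garbled entry), and the names `NOMINAL_MASS`, `PD_FEATURES` and `check_losses` are hypothetical.

```python
# Nominal (integer) atomic masses for the elements involved.
NOMINAL_MASS = {"H": 1, "C": 12, "N": 14, "S": 32}


def nominal_mass(composition):
    """Nominal mass of a neutral given as {element: count}."""
    return sum(NOMINAL_MASS[el] * n for el, n in composition.items())


# feature: (parent m/z, daughter m/z, proposed neutral loss)
PD_FEATURES = {
    "F5": (198, 171, {"H": 1, "C": 1, "N": 1}),   # HCN
    "F6": (198, 154, {"C": 1, "S": 1}),           # CS
    "F7": (197, 196, {"H": 1}),                   # H
    "F8": (197, 153, {"C": 1, "S": 1}),           # CS
    "F9": (196, 152, {"C": 1, "S": 1}),           # CS
    "F10": (70, 27, {"C": 2, "H": 5, "N": 1}),    # C2H5N (assumed reading)
}


def check_losses(features):
    """Map each feature to (observed loss, whether it matches the neutral)."""
    result = {}
    for name, (parent, daughter, neutral) in features.items():
        loss = parent - daughter
        result[name] = (loss, loss == nominal_mass(neutral))
    return result


for name, (loss, ok) in check_losses(PD_FEATURES).items():
    print(name, loss, "consistent" if ok else "inconsistent")
```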

Another initial rule for the PHENOTHIAZINE substructure was
generated using the mass filter but with the same Ui and Ci. The
three features shown in Figure 5.4 with masses higher than 198 amu
were eliminated by this filter but the remainder of the rule was
identical. A new feature combination rule, shown in Figure 5.8, was
then generated. This rule contains a different clause but retains
100% reliability and 100% recall with respect to the reference
database. The mass filter effectively restricts the initial rule features
to those that are directly related to the fragmentation of a
substructure. Since the MAPS program stops searching for feature-
combinations after a recall of 100% is reached, it is important that
the initial features be closely related to the indicated substructure

and not just due to the combination of the indicated substructure and

 

 

[Figure 5.5: Proposed structures for several fragment ions of phenothiazine compounds.]

another substructure. If the latter were true, then the MAPS rule for
the indicated substructure may only identify the substructure in the
presence of the additional substructure. On the other hand, the rule
generated without the mass filter provides some clues to other
substructures bonded to the indicated substructure (e.g. a carbon
bonded to the nitrogen in the PHENOTHIAZINE substructure). There
is, as yet, no automated method for determining the origin of the
features with higher masses. Thus, MAPS rules generated without
the mass filter are, at this point, most useful for manual inspection to

determine additional substructure / spectral feature correlations

 

 

Fragmentation Path               Corresponding    Neutral Loss
                                 Rule Feature
m/z 198 (III) -> m/z 171 (VI)    {F5}             loss of HCN (27 amu)
m/z 198 (III) -> m/z 154 (VII)   {F6}             loss of CS (44 amu)
m/z 197 (IV)  -> m/z 196 (V)     {F7}             loss of H (1 amu)
m/z 197 (IV)  -> m/z 153 (VIII)  {F8}             loss of CS (44 amu)
m/z 196 (V)   -> m/z 152 (IX)    {F9}             loss of CS (44 amu)
m/z 70 (X)    -> m/z 27 (XI)     {F10}            loss of C2H5N (43 amu)

Figure 5.6: Comparison of documented fragmentation pathways for
phenothiazine derivatives with the features contained
in the MAPS PHENOTHIAZINE rule.

while those generated with the mass filter can be directly applied to
an unknown. Other starting values for the uniqueness and
correlation filters were used to determine their effect on rule

content.
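
The effect of the mass filter on the initial PHENOTHIAZINE features can be sketched as follows. This is a hedged illustration, assuming the filter simply discards any feature whose largest m/z exceeds the nominal mass of the substructure (198 amu); the actual MAPS filter may apply a different criterion. The tuple layout and the name `mass_filter` are hypothetical; the feature masses are those of the initial rule for this substructure.

```python
# (feature number, feature type, masses) for the initial rule features.
INITIAL_FEATURES = [
    ("F1", "D", (211.0,)),
    ("F2", "D", (210.0,)),
    ("F3", "D", (209.0,)),
    ("F4", "D", (198.0,)),
    ("F5", "PD", (198.0, 171.0)),
    ("F6", "PD", (198.0, 154.0)),
    ("F7", "PD", (197.0, 196.0)),
    ("F8", "PD", (197.0, 153.0)),
    ("F9", "PD", (196.0, 152.0)),
    ("F10", "PD", (70.0, 27.0)),
]


def mass_filter(features, substructure_mass):
    """Keep features whose largest mass does not exceed the substructure mass."""
    return [f for f in features if max(f[2]) <= substructure_mass]


kept = mass_filter(INITIAL_FEATURES, 198.0)
print([name for name, _, _ in kept])
```

Under this assumption the three features above 198 amu (F1 to F3) are removed and the remaining seven features are unchanged, in line with the behaviour described for the filtered rule.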

 

IF “ D (198) and PD (198 -> 154) [100,92] ”
OR “ D (211) and PD (198 -> 154) [100,85] ”

THEN substructure PHENOTHIAZINE is present.
REL = 100% REC = 100%

(Umin = 40%, Cmin = 70% for initial features;
rule clauses with correlation < 85% were deleted)

Figure 5.7: MAPS (v.II) feature-combination rule for the
PHENOTHIAZINE substructure with a recall and
reliability estimate of 100%.

There are two sets of values which provide useful information
but not necessarily optimal reliability or recall for a specified
substructure. First, an initial MAPS rule can be generated using a
high minimum correlation value and a very small minimum
uniqueness value to identify the spectral features with the greatest
frequency of appearance with the presence of a given substructure.
This procedure, when Ci = 100% is used, provides “exclusion” rules
since the absence of these "universally" observed spectral features
can be used to predict the absence of substructures [7-9]. The
exclusion rule for phenothiazine is provided in Figure 5.9. Second, a
high minimum uniqueness value and a very low correlation value
can be used to select spectral features that are strongly dependent

upon the presence of other substructures.
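
One way an exclusion rule might be applied to an unknown is sketched below. It assumes that the absence of any one feature observed in 100% of the reference compounds containing the substructure is enough to predict that the substructure is absent; this decision criterion, the function names and the feature tuples are assumptions for illustration. The three features are a small subset of the Ci = 100% features in the PHENOTHIAZINE exclusion rule.

```python
# A few features with 100% correlation for PHENOTHIAZINE (subset only).
PHENOTHIAZINE_EXCLUSION = {
    ("PD", 70.0, 42.0),
    ("P", 198.0),
    ("D", 70.0),
}


def missing_features(exclusion_features, observed):
    """Universally observed features that the unknown does not show."""
    return sorted(exclusion_features - observed)


def substructure_excluded(exclusion_features, observed):
    """Predict absence of the substructure if any such feature is missing."""
    return bool(missing_features(exclusion_features, observed))


# Hypothetical unknown that lacks the parent scan feature P 198.0.
unknown = {("PD", 70.0, 42.0), ("D", 70.0), ("D", 57.0)}
print(substructure_excluded(PHENOTHIAZINE_EXCLUSION, unknown))
print(missing_features(PHENOTHIAZINE_EXCLUSION, unknown))
```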

The MAPS feature-combination inclusion rule obtained for the
PHENOTHIAZINE substructure using an initial uniqueness of 100%

and an initial correlation of 20% is shown in Figure 5.10. The mass

 

[Figure 5.8: MAPS feature-combination rule obtained for the PHENOTHIAZINE substructure with the mass filter enabled.]

 

(((SS132
  (((PD 70.0 42.0) 25 100) ((NL 1.0) 13 100)    ((NL 15.0) 13 100)
   ((NL 16.0) 13 100)      ((NL 27.0) 13 100)   ((NL 28.0) 12 100)
   ((NL 29.0) 13 100)      ((NL 32.0) 19 100)   ((NL 41.0) 15 100)
   ((NL 42.0) 14 100)      ((NL 43.0) 14 100)   ((NL 44.0) 14 100)
   ((NL 56.0) 13 100)      ((NL 71.0) 14 100)   ((NL 85.0) 15 100)
   ((NL 157.0) 20 100)     ((NL 187.0) 27 100)  ((D 27.0) 14 100)
   ((D 29.0) 15 100)       ((D 41.0) 14 100)    ((D 42.0) 15 100)
   ((D 43.0) 16 100)       ((D 55.0) 14 100)    ((D 69.0) 18 100)
   ((D 70.0) 18 100)       ((D 71.0) 19 100)    ((P 41.0) 14 100)
   ((P 42.0) 13 100)       ((P 43.0) 14 100)    ((P 57.0) 14 100)
   ((P 65.0) 13 100)       ((P 68.0) 16 100)    ((P 69.0) 14 100)
   ((P 70.0) 14 100)       ((P 71.0) 14 100)    ((P 77.0) 13 100)
   ((P 82.0) 15 100)       ((P 83.0) 15 100)    ((P 84.0) 16 100)
   ((P 95.0) 17 100)       ((P 125.0) 22 100)   ((P 126.0) 19 100)
   ((P 127.0) 18 100)      ((P 139.0) 20 100)   ((P 149.0) 14 100)
   ((P 152.0) 21 100)      ((P 153.0) 20 100)   ((P 167.0) 17 100)
   ((P 171.0) 24 100)      ((P 178.0) 28 100)   ((P 179.0) 27 100)
   ((P 184.0) 25 100)      ((P 185.0) 29 100)   ((P 196.0) 31 100)
   ((P 197.0) 25 100)      ((P 198.0) 30 100)   ((P 210.0) 31 100)
   ((P 212.0) 34 100)      ((P 223.0) 36 100)   ((P 224.0) 38 100))
))

Initial conditions: Ui = 10, Ci = 100

Figure 5.9: MAPS (v.II) exclusion rule for the PHENOTHIAZINE
substructure (SS132) generated using the indicated
program parameters.

 

 

[Figure 5.10: MAPS feature-combination inclusion rule for the PHENOTHIAZINE substructure (initial uniqueness 100%, initial correlation 20%, mass filter enabled), listing the reference compounds that produce each feature.]

filter was utilized in generating this rule. There are 8 spectral
features with 100% uniqueness (and therefore 100% reliability) with
respect to the reference database. The compound reference names
which produce the indicated feature are included in Figure 5.10. A
feature-combination rule was not generated using this initial rule
because each feature already has a 100% reliability estimate. An
interesting trend is found when the lists of reference compounds in
Figure 5.10 are examined. The features in this rule with correlation
values below 31% are all produced by the same compounds. The
structures for these compounds are summarized in Figure 5.11. This
analysis indicates that the features contained in this rule are highly
correlated with the substructure bonded at the 10 position of the
PHENOTHIAZINE substructure. This type of correlation, referred to
as “cross-correlation”, can be expressed quantitatively for each
feature-combination using the following equation:

$$XC(F)_j = \frac{\text{number of compounds with } F \text{ and } SS_j}{\text{number of compounds with } F}$$

where F is a spectral feature or combination and SSj is substructure j.
The cross-correlation values obtained for each of the features found
in Figure 5.10 are summarized in Table 5.1. The substructure used
for these calculations was “SS110” as shown. Thus, the features with
high XC values are indicative of the presence of “SS110” and

PHENOTHIAZINE.
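
A minimal sketch of this calculation follows, taking XC(F)j to be the percentage of reference compounds showing feature F that also contain substructure j (my reading of the definition). The function name, the data layout and the three-compound database are hypothetical and are not drawn from the reference database.

```python
def feature_cross_correlation(feature, substructure, database):
    """Percentage of compounds with `feature` that also contain `substructure`.

    database is a list of (feature set, substructure set) pairs.
    Returns None when no compound shows the feature.
    """
    with_feature = [subs for feats, subs in database if feature in feats]
    if not with_feature:
        return None
    n_both = sum(1 for subs in with_feature if substructure in subs)
    return 100.0 * n_both / len(with_feature)


# Hypothetical compounds: every one contains PHENOTHIAZINE, two also
# carry the SS110 side chain.
toy_db = [
    ({"F1", "F2"}, {"PHENOTHIAZINE", "SS110"}),
    ({"F1"}, {"PHENOTHIAZINE"}),
    ({"F2"}, {"PHENOTHIAZINE", "SS110"}),
]

print(feature_cross_correlation("F1", "SS110", toy_db))
print(feature_cross_correlation("F2", "SS110", toy_db))
```

In this toy case F2 appears only in compounds that also contain SS110, so it says as much about the side chain as about the ring system, which is the situation the cross-correlation value is meant to flag.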

 

 

 

[Figure 5.11: Structure summary for several of the phenothiazine compounds in the reference database (columns: compound reference name, R1, R2).]

Substructure “SS110”

Feature Number    XC(F)110 (%)
F1                50
F2                100
F3                100
F4                100
F5                100
F6                100
F7                100
F8                100

Table 5.1: Cross-correlation values calculated for each of the
spectral features contained in the MAPS rule for the
PHENOTHIAZINE substructure (shown in Figure 3.22)
with respect to the “SS110” substructure.

The cross-correlation factor has two potential uses in the MAPS
program. First, a cross-correlation value could be calculated for each
feature-combination that meets the specified U/C criteria. If the
cross-correlation value for the combination is too high (i.e. above a
set maximum) then that combination could be discarded. This ability
could be built into the feature-combination generation function of
MAPS. A second potential use of the cross-correlation factor is to
place feature-combinations with high cross-correlation into an
ancillary rulebase to be applied once a more general rule has
identified the major substructure (e.g. applying the rule shown in
Figure 5.10 only after the more general rule shown in Figure 5.7 has
identified the presence of the major substructure PHENOTHIAZINE).
This function could be built into the RULE program.
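
The second proposed use can be sketched as a simple partition of feature-combinations by their cross-correlation value. The values below are the XC values listed in Table 5.1; the threshold of 75% and the name `split_rulebase` are hypothetical choices for illustration, since no maximum is specified in the text.

```python
# XC values (%) for features F1 to F8 with respect to SS110 (Table 5.1).
TABLE_5_1_XC = {
    "F1": 50, "F2": 100, "F3": 100, "F4": 100,
    "F5": 100, "F6": 100, "F7": 100, "F8": 100,
}


def split_rulebase(xc_values, xc_max):
    """Split features into a general and an ancillary rulebase.

    Features with XC above xc_max go to the ancillary rulebase, which
    would be consulted only after the major substructure is identified.
    """
    general = [name for name, xc in xc_values.items() if xc <= xc_max]
    ancillary = [name for name, xc in xc_values.items() if xc > xc_max]
    return general, ancillary


general, ancillary = split_rulebase(TABLE_5_1_XC, xc_max=75)  # assumed threshold
print("general:", general)
print("ancillary:", ancillary)
```

The first proposed use (discarding highly cross-correlated combinations during generation) would correspond to keeping only the `general` list.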

“BARBITURATE”

Barbiturates are another class of compounds that find use in
pharmaceuticals and in the illicit drug trade [2,10]. There are ten
compounds which contain the BARBITURATE substructure in the
reference database with a variety of substituents at the 1, 3 and 5
positions. Three substructures relating to barbiturate derivatives
(including the generic BARBITURATE substructure) are provided in
Figure 5.12. The MAPS feature-combination rule for the
BARBITURATE substructure is shown in Figure 5.13 along with the
parameters input to the MAPS program. Unlike the rules for the
PHENOTHIAZINE substructure, the MAPS rules for the BARBITURATE

substructure do not contain spectral features indicative of the intact

 

substructure (e.g. m/z 129). Among the notable “PD” dissociations are
m/z 98 -> m/z 80, 70, 28, 27 and m/z 97 -> 69, 55. The features in
this rule represent the largest common fragments produced from a
variety of barbiturate standards. A potential problem can occur
when using these types of features to identify compounds in
biological matrices due to their relatively low intensity compared to

other higher mass spectral features [9].

 

[Structure drawings for the three substructures]

“SS141”          “SS145”    “SS147”
alias:           alias:     alias:
“BARBITURATE”    none       none

Figure 5.12: Substructure drawings for the BARBITURATE
substructure and two specific derivatives.

Several studies of the fragmentation pathways of barbiturates
have been published [10-13]. Two different pathways are illustrated
in Figure 5.14. These two pathways represent common fragments
for the 5-allyl and 5-ethyl barbiturates. Structures for the major
ions formed from these barbiturates (i.e. m/z 168, 167 and m/z 156)
are also shown in Figure 5.14. The MAPS rule shown in Figure 5.13
does not contain spectral features indicative of these ions. However,
the MAPS rules for the “SS145” (5-allyl barbiturate) and “SS147” (5-

 


 

(setq SS141_COMB_RULE '(
(((PD 97.0 55.0) (PD 97.0 69.0) (P 112.0) (P 40.0) (D 52.0))
100 60 (MI1478 MI1490 MI247 MI592 MI7109 MI777))
(((PD 98.0 28.0) (PD 98.0 27.0) (PD 98.0 70.0))
100 60 (MI1478 MI1490 MI247 MI592 MI777 MI960))
(((PD 98.0 28.0) (PD 98.0 27.0) (D 26.0))
100 60 (MI1478 MI1490 MI247 MI592 MI777 MI960))
(((PD 98.0 28.0) (PD 98.0 27.0) (PD 54.0 27.0))
100 50 (MI1478 MI247 MI592 MI777 MI960))
(((PD 98.0 28.0) (PD 98.0 27.0) (PD 56.0 29.0))
100 50 (MI1478 MI1490 MI247 MI592 MI960))
(((PD 98.0 28.0) (PD 98.0 27.0) (PD 56.0 41.0))
100 50 (MI1478 MI1490 MI247 MI592 MI960))
(((PD 98.0 80.0) (PD 98.0 70.0) (D 28.0) (NL 57.0) (D 27.0))
100 50 (MI1478 MI247 MI5821 MI592 MI777))
(((PD 97.0 55.0) (PD 97.0 69.0) (P 112.0) (P 40.0) (PD 167.0 149.0))
100 50 (MI1478 MI1490 MI247 MI7109 MI777))
(((PD 98.0 28.0) (PD 98.0 27.0) (PD 83.0 55.0))
100 50 (MI1490 MI247 MI592 MI777 MI960))
(((PD 98.0 28.0) (PD 98.0 27.0) (PD 167.0 149.0))
100 40 (MI1478 MI1490 MI247 MI777))
(((PD 98.0 28.0) (PD 98.0 27.0) (PD 54.0 26.0))
100 40 (MI1490 MI247 MI592 MI960))
(((PD 98.0 80.0) (PD 98.0 70.0) (D 28.0) (NL 112.0) (D 27.0))
100 40 (MI1478 MI247 MI5821 MI777))
(((PD 98.0 80.0) (PD 68.0 40.0) (PD 97.0 69.0))
100 40 (MI247 MI5821 MI777 MI960))
(((PD 98.0 80.0) (PD 68.0 40.0) (P 40.0) (D 27.0))
100 40 (MI247 MI5821 MI777 MI960))
(((PD 98.0 80.0) (PD 68.0 40.0) (D 66.0) (D 27.0))
100 40 (MI247 MI5821 MI777 MI960))
(((PD 169.0 126.0) (PD 69.0 39.0) (NL 87.0) (PD 69.0 68.0) (D 27.0))
100 40 (MI1478 MI5682 MI5821 MI777))
(((PD 169.0 126.0) (PD 69.0 39.0) (PD 69.0 68.0) (P 113.0) (D 27.0))
100 40 (MI1478 MI5682 MI5821 MI777))
(((PD 169.0 126.0) (PD 69.0 39.0) (PD 69.0 68.0) (P 98.0) (D 27.0))
100 40 (MI1478 MI5682 MI5821 MI777))
(((PD 169.0 126.0) (PD 69.0 39.0) (PD 69.0 68.0) (P 99.0) (D 27.0))
100 40 (MI1478 MI5682 MI5821 MI777))

Figure 5.13: MAPS (v.III) feature-combination rule for the
BARBITURATE substructure (SS141) generated using the
indicated program parameters.

 

(((PD 169.0 126.0) (PD 69.0 39.0) (PD 69.0 68.0) (P 54.0) (D 27.0))
100 40 (MI1478 MI5682 MI5821 MI777))
(((PD 68.0 40.0) (PD 55.0 29.0) (P 40.0))
100 40 (MI247 MI777 MI960 MI961))
(((PD 68.0 40.0) (PD 69.0 41.0) (NL 86.0) (P 40.0) (P 44.0))
100 40 (MI247 MI777 MI960 MI961))
(((PD 68.0 40.0) (PD 69.0 41.0) (NL 86.0) (P 40.0) (D 27.0))
100 40 (MI247 MI777 MI960 MI961))
(((PD 68.0 40.0) (PD 69.0 41.0) (P 112.0) (P 40.0) (P 44.0))
100 40 (MI247 MI777 MI960 MI961))
(((PD 68.0 40.0) (PD 69.0 41.0) (P 112.0) (P 40.0) (D 27.0))
100 40 (MI247 MI777 MI960 MI961))
(((PD 97.0 55.0) (PD 97.0 69.0) (PD 83.0 55.0) (P 40.0) (PD 149.0 93.0))
100 40 (MI247 MI592 MI7109 MI960))
(((PD 97.0 55.0) (PD 97.0 69.0) (NL 86.0) (P 40.0) (P 50.0))
100 40 (MI1478 MI247 MI7109 MI777))
(((PD 97.0 55.0) (PD 97.0 69.0) (NL 87.0) (P 40.0))
100 40 (MI1478 MI247 MI7109 MI777))
))
Initial Conditions: Ui = 10, Ci = 50, Cc = 40, mass filter enabled.

Figure 5.13: cont.

ethyl barbiturate) do show spectral features indicative of the intact
barbiturate substructure with part of the side chains attached (see
Figure 5.14). The MAPS feature combination rules for these
substructures are provided in Figures 5.15 and 5.16. The spectral
features indicative of the intact barbiturate ions (shown in Figure
5.14) are highlighted in bold faced type in these figures. The
fragmentation of the BARBITURATE substructure appears to be quite
dependent on the side chains. Thus, the MAPS feature-combination
rule for the BARBITURATE substructure, with uniqueness and
correlation minima set to achieve 100% recall, contains the smaller
common fragments produced by large numbers of the barbiturate
standards. This situation makes for a somewhat weaker rule for a
substructure, since there is a possibility that other substructures
with higher masses, contained in compounds not in our database, may
fragment to give ions of the same nominal mass as those
contained in these types of rules (thus leading to a false positive).
While there is always some risk of this situation arising in a “trained”
system such as ACES, the risk seems more acute if a substructure

rule does not contain spectral features due to the intact substructure.

“PHENOL and T-BUTYL”

The T-BUTYL substructure is an example of a highly cross-correlated
substructure. The feature combination rule shown in
Figure 5.17 has a number of features which are most likely due to
PHENOL (e.g. m/z 105), not T-BUTYL. This observation was also made in a
previous study which used many of the same phenol standards [9].

 

 

 

[Fragmentation scheme; labelled ions include m/z 167, m/z 156 and m/z 98]

Figure 5.14: Fragmentation pathways for the A) “SS145” and B)
“SS147” substructures. Adapted from references 10 and 11.

 

(setq SS145_COMB_RULE '(
(((PD 169.0 152.0) (PD 96.0 28.0))
100 100 (MI1478 MI247 MI777))
(((PD 169.0 152.0) (PD 168.0 96.0) (PD 124.0 43.0))
100 100 (MI1478 MI247 MI777))
(((PD 96.0 28.0) (PD 153.0 136.0))
100 100 (MI1478 MI247 MI777))
(((PD 96.0 28.0) (PD 167.0 43.0))
100 100 (MI1478 MI247 MI777))
(((PD 96.0 28.0) (PD 169.0 109.0))
100 100 (MI1478 MI247 MI777))
))
Initial Conditions: Ui = 40, Ci = 70, Cc = 50

Figure 5.15: MAPS (v.III) feature-combination rule for the “SS145”
substructure generated using the indicated program
parameters.

 

 

[Figure 5.16: MAPS feature-combination rule for the “SS147” (5-ethyl barbiturate) substructure.]

A cross-correlation factor similar to the one already presented for
feature-combinations can be calculated using the following equation:

$$XC(SS_j)_k = \frac{\text{number of compounds with } SS_j \text{ and } SS_k}{\text{number of compounds with } SS_j}$$

where SSj and SSk are substructures j and k respectively. The result

 

(setq SS21_COMB_RULE '(
(((PD 175.0 133.0) (PD 145.0 105.0)) 100 57 (...))
(((PD 175.0 133.0) (PD 147.0 107.0)) 100 57 (...))
(((PD 161.0 119.0)) 100 50 (...))
(((PD 175.0 142.0)) 100 50 (...))
(((PD 175.0 133.0) (PD 175.0 145.0)) 100 50 (...))
(((PD 105.0 65.0) (PD 119.0 103.0)) 100 50 (...))
(((PD 175.0 133.0) (PD 119.0 103.0)) 100 50 (...))
(((PD 175.0 133.0) (PD 161.0 121.0)) 100 50 (...))
(((PD 161.0 128.0) (PD 147.0 107.0)) 100 50 (...))
(((PD 175.0 133.0) (PD 135.0 91.0)) 100 50 (...))
))

Initial Conditions: Ui = 20, Ci = 50, Cc = 40, MINF = 1
(rule clauses with Cc < 50 not shown)

Figure 5.17: MAPS (v.III) feature-combination rule for the T-BUTYL
substructure (SS21) generated using the
indicated program parameters.

of this calculation for T-BUTYL, XC(SS21), is 93% (i.e. 13 out of 14 of
the T-BUTYL-containing compounds also contain PHENOL). Thus, the
only correlation value that has potential for isolating the spectral
features due to T-BUTYL from those due to PHENOL is 100%. The
initial MAPS rule obtained using a 100% minimum correlation with a

minimum uniqueness of 10% is shown in Figure 5.18. The features

 

 


found in this rule are those which might be expected from

fragmentation of a T-BUTYL substructure. Unfortunately, a feature-

 

(((SS21
  (((NL 1.0) 14 100)    ((NL 2.0) 15 100)    ((NL 14.0) 22 100)
   ((NL 15.0) 14 100)   ((NL 16.0) 14 100)   ((NL 26.0) 14 100)
   ((NL 28.0) 13 100)   ((NL 29.0) 14 100)   ((NL 30.0) 15 100)
   ((NL 33.0) 21 100)   ((NL 42.0) 15 100)   ((NL 43.0) 15 100)
   ((NL 44.0) 15 100)   ((NL 55.0) 18 100)   ((NL 56.0) 14 100)
   ((NL 57.0) 15 100)   ((D 55.0) 15 100)    ((P 53.0) 17 100)
   ((P 55.0) 14 100)    ((P 57.0) 15 100)    ((P 58.0) 16 100)
)))
Ui = 10, Ci = 100, mass filter enabled

Figure 5.18: MAPS (v.II) exclusion rule for the T-BUTYL
substructure (SS21) generated using the indicated
program parameters.

combination rule with 100% reliability could not be generated from
these features. The addition of compounds to the reference database
which contain T-BUTYL, but not PHENOL, is the only recourse for

obtaining a reliable substructure identification rule for the T-BUTYL

 

(setq SS44_COMB_RULE '(
(((PD 133.0 105.0) (NL 14.0) (PD 91.0 65.0) (D 53.0) (D 67.0))
100 46 (GMR10 GMR11 GMR12 GMR1 GMR23 GMR24 GMR2
GMR3 GMR5 GMR7 MI5297 MI6129 MI6208 MI6834
MI6990))
))
Ui = 10%, Ci = 50%, Cc = 45%, mass filter disabled

Figure 5.19: MAPS (v.III) feature-combination rule for the PHENOL
substructure generated using the indicated program
parameters.

substructure. The cross-correlation factor for PHENOL with respect
to the T-BUTYL substructure, on the other hand, is 41%. The MAPS
feature-combination rule obtained for PHENOL, using an initial
correlation of 50% and a feature-combination correlation of 45% is
shown in Figure 5.19. The reliability of this rule is 100% and the

recall is 46% with respect to the reference database.
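
The substructure cross-correlation quoted for T-BUTYL can be reproduced with a short sketch, taking the value to be the percentage of compounds containing substructure j that also contain substructure k (my reading of the definition). The compound identifiers below are placeholders invented for the example; only the counts (14 T-BUTYL compounds, 13 of which also contain PHENOL) come from the text, and the function name is hypothetical.

```python
def substructure_cross_correlation(compounds_with_j, compounds_with_k):
    """Percentage of compounds containing SSj that also contain SSk."""
    if not compounds_with_j:
        return None
    shared = compounds_with_j & compounds_with_k
    return 100.0 * len(shared) / len(compounds_with_j)


# Placeholder identifiers: 14 T-BUTYL compounds, 13 of them also PHENOL.
t_butyl = {f"C{i}" for i in range(1, 15)}
phenol = {f"C{i}" for i in range(1, 14)} | {"C20", "C21"}

xc = substructure_cross_correlation(t_butyl, phenol)
print(round(xc))
```

The measure is not symmetric: the value for PHENOL with respect to T-BUTYL depends on the total number of PHENOL compounds, which is why the two values reported in the text differ so widely.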

Alternate MAPS Rules From Multiple Collision Data

MS/MS spectra for use in generating MAPS rules were acquired
under two different sets of operating conditions (see Chapter 2). A
major question regarding the selection of pressure regimes for CAD
spectra concerns the relative value of the spectral features observed
in these spectra for structure elucidation. One school of thought
treats spectral features as having equal value in determining
molecular structure. Thus, an increase in the number of spectral
features should result in an increased ability to determine a
structure. Since increasing collision gas pressure increases the
number of fragment ions observed, multiple collision conditions are
preferred. MAPS rules have been generated using a variety of
program parameters to vary the number and types of initial features
used by the rule generator. While in some cases an increase in recall
was observed with increasing the number of initial features, there
were also many examples where zero recall was obtained no matter
how many features were used. Thus it is not just the number of

features which is important in identifying a substructure.


Another school of thought recognizes that some spectral
features are more important than others and the collision pressure
should be set to an optimal value to detect them (or to avoid the
production of features which interfere with features that would
otherwise be unique). In some cases this means using single collision
conditions to obtain bona fide neutral losses. In other cases multiple
collision conditions are required to obtain fragments of parent ions
which possess very small cross sections (a case where one collision
provides insufficient energy deposition for dissociation to occur).
Thus, selection of a pressure regime is compound dependent. This
dependence is one of the difficulties facing the adoption of standard

conditions for a CAD spectral database.

The MAPS rules discussed heretofore were generated using
MS/MS spectra acquired under single collision conditions. The base
peak in these spectra is almost always the parent ion and there are
often relatively few major fragment ions (daughter ions). Single
collision conditions are established by adjusting the target gas
pressure so the probability of only a single collision between a mass
selected parent ion and the target gas is very large compared to the
probability of more than one collision between the parent (and
daughter) ions and the target gas. In other words, the fragment ions
in a CAD (collisionally-activated dissociation) mass spectrum
acquired under single collision conditions are the result of
dissociation of the parent ions only and not from dissociation of
daughter ions (sometimes referred to as granddaughter ions). A

preliminary comparison of the rules obtained from the single-collision
data to those generated using the multiple-collision spectra is
provided below.

Comparative Recall for MAPS Rulebases Generated from

Data Acquired at Different Collision Gas Pressures

An additional MAPS rulebase was generated using the MS/MS
spectra acquired under multiple collision conditions. The spectra
used to obtain these rules are characterized by a great deal more
fragmentation and often by a base peak other than the parent ion.
The MAPS program parameters (e.g. initial feature uniqueness) were
the same as those used to generate the rulebase discussed in the
previous section. The recall obtained for substructures with at least
5 occurrences in the reference database is provided in Figure 5.20
for the two rulebases. Overall, 40 MAPS feature-combination rules
with non-zero recall were obtained (versus 42 rules using the
single-collision data) with an overall recall of 79% (versus 82% recall
using the single-collision data).

One significant detail in Figure 5.20 is that there are four
substructures where non-zero recall was obtained using one dataset
and zero recall using the other dataset (i.e. SS7 - monosubstituted
phenyl, SS19 - methyl, SS23 - butyl and SS56 - dimethylamino). Of
these four substructures, only the SS19 substructure (methyl) was
better identified using multiple collision conditions. The recall shown
for the remaining substructures in Figure 5.20 shows some

 

 

[Figure 5.20: Recall obtained for each substructure using the
single-collision and multiple-collision rulebases (graphic not
reproduced).]

variability between the two rulebases. Thus, the likelihood that a
MAPS rule will identify a substructure, as provided by the recall
estimate, varies between single and multiple collision conditions.

The significance of this observation for the ACES system is that
an ancillary experiment using an alternative pressure regime (e.g.
multiple collisions) may identify a substructure that could
not be identified using the original pressure regime (e.g. single
collision). Therefore multiple rulebases, one based on single collision
conditions and one based on multiple collision conditions, can be used
to reduce the number of candidate structures by securing additional
substructure identifications. This component of the ACES system
requires further data and program support to be fully realized. The
most critical program development in this regard is the intelligent
controller/program shell. The intelligent controller should lead the
user through the programs which comprise the ACES system (a shell
function) and act as an interface to the user and/or mass
spectrometer so that ancillary experiments can be performed (a control
function).

Effect of Collision Gas Pressure on Rule Content

The uniqueness and correlation values of a given spectral
feature in the reference database vary depending on the pressure
regime chosen for data acquisition. Since the MAPS software uses
these values to identify the starting features for feature-combination
generation, the content of a MAPS rule may also be different from
one generated using another pressure regime. Consider, for example,

 

the “PD” features in the rules for PHENOTHIAZINE (single and
multiple collision conditions) that have m/z 198 as the parent ion.
These features are shown in Figure 5.21. Three “PD” features with an
m/z 198 parent met the initial uniqueness and correlation minima of
30% using the single-collision data. Note that four additional ions are
observed under multiple collision conditions using the same
MAPS parameters. The structures for the ions labelled in Figure 5.21
are provided in Figures 5.5 and 5.22. The uniqueness and
correlation for each of the “PD” features are given in brackets in
Figure 5.21. Note that the very diagnostic feature, “PD 198 -> 154”, is
less characteristic of the phenothiazine substructure under multiple
collision conditions than under single collision conditions (U = 68 vs.
U = 92). However, the new features included in the initial rule using
multiple collision conditions are also quite diagnostic (e.g. “PD 198 ->
45”, U = 90 and “PD 198 -> 166”, U = 87).

The feature-combination rule obtained for Ui = 40%, Ci = 50%,
Cc = 50%, mass filter enabled, and the multiple collision data is shown
in Figure 5.23. Several features observed only under multiple
collision conditions are highlighted in this rule in bold-faced type (e.g.
“PD 198 -> 45”). Note that even though the rule content changed,
100% reliability and 100% recall were maintained for the
PHENOTHIAZINE rule generated using the multiple collision
data. There is one single-feature clause in this rule, “PD 197 -> 170”,
which has 100% reliability in predicting the presence of the
PHENOTHIAZINE substructure within the reference database. As will

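[Figure 5.21 graphic: daughter ions of the m/z 198 parent ion (III).
Bracketed values are [uniqueness, correlation] in percent; Roman
numerals refer to the ion structures.]

P1 (single collision conditions), parent m/z 198 (III)
    m/z 197   [42,46]   IV
    m/z 171   [76,76]   VI
    m/z 154   [92,92]   VII

P2 (multiple collision conditions), parent m/z 198 (III)
    m/z 197   [45,75]   IV
    m/z 171   [61,84]   VI
    m/z 166   [87,53]   XII
    m/z 154   [68,84]   VII
    m/z 128   [53,53]   XIII
    m/z 127   [81,69]   XIV
    m/z 45    [90,69]

Figure 5.21: Daughter m/z values observed in a MAPS rule from a
m/z 198 parent ion using single and multiple collision
conditions.
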

 

m/z 166 m/z 128 m/z 127

 

 

 

Figure 5.22: Additional structures for fragment ions observed from
phenothiazine derivatives.

be shown in the next chapter, single feature clauses are somewhat
unreliable outside of the reference database. This result is not
particularly surprising if one considers that a substructure
identification based on one feature is not a result that an “expert”
mass spectrometrist would find compelling. Generally a series of
features is the most reliable indicator of the presence of a
substructure. The rules shown here often have two or three features
in each rule clause and it is expected that this length will increase as
more compounds are added to the reference database. The length of
the rule clauses will increase because more features will be required
to achieve 100% uniqueness (reliability) within the reference
database. The rate of increase, however, will decrease rapidly with
increasing database size since all legitimate examples of the
fragmentation of a given substructure will eventually be represented
in the MAPS rules. Thus, the size of the reference database required
to achieve this stabilization of rule content is probably much smaller

than the size required for reliable spectral matching of unknowns.


 

(setq SS132_COMB_RULE '(
(((PD 197.0 170.0)) 100 76 (...))
(((PD 198.0 45.0) (PD 198.0 154.0)) 100 69 (...))
(((PD 199.0 167.0) (PD 198.0 154.0)) 100 69 (...))
(((PD 198.0 45.0) (PD 198.0 171.0)) 100 69 (...))
(((PD 199.0 167.0) (PD 197.0 196.0)) 100 61 (...))
(((PD 199.0 167.0) (PD 197.0 153.0)) 100 61 (...))
(((PD 196.0 45.0) (PD 197.0 196.0)) 100 61 (...))
(((PD 196.0 45.0) (PD 197.0 153.0)) 100 61 (...))
(((PD 196.0 45.0) (PD 196.0 69.0)) 100 61 (...))
(((PD 198.0 45.0) (PD 198.0 127.0)) 100 53 (...))
(((PD 198.0 45.0) (PD 199.0 155.0)) 100 53 (...))
(((PD 198.0 45.0) (PD 197.0 196.0)) 100 53 (...))
(((PD 198.0 45.0) (PD 197.0 153.0)) 100 53 (...))
(((PD 196.0 45.0) (PD 196.0 169.0)) 100 53 (...))
(((PD 198.0 166.0) (PD 198.0 127.0)) 100 53 (...))
(((PD 171.0 45.0) (PD 199.0 167.0)) 100 53 (...))
(((PD 171.0 45.0) (PD 199.0 155.0)) 100 53 (...))
(((PD 171.0 45.0) (PD 196.0 169.0)) 100 53 (...))
(((PD 171.0 45.0) (PD 197.0 196.0)) 100 53 (...))
(((PD 199.0 167.0) (PD 179.0 153.0)) 100 53 (...))
))

Ui = 40%, Ci = 50%, Cc = 50%, mass filter enabled

Figure 5.23: The MAPS (v.III) feature-combination rule for
PHENOTHIAZINE generated using the indicated
program parameters and the multiple collision data.

 

 


Conclusions

The evaluation of the rules generated using MAPS (v.III)
indicates that this new software provides high reliability and high
recall substructure identification rules. The rules obtained for
several substructures were determined to contain features which
correspond to ion structures established using a variety of
instrumental techniques. Some rules, however, clearly contain
features from other substructures. This contamination stems from a
high degree of cross-correlation with another substructure in the
reference compounds. In some instances, raising the correlation
above the cross-correlation value can eliminate the contaminating
features. In other cases, additional compounds containing the
pertinent substructure (and not the contaminating substructure)
need to be added to the reference database to alleviate the
cross-correlation problem. Further evaluation of rule reliability is
addressed in the next chapter using test compounds. Of particular
interest are the minimum advisable uniqueness and correlation values
to be used in rule generation to ensure reliable predictions outside of
the reference database (e.g. is a feature-combination rule based on a
combination of low uniqueness and correlation features reliable or
just a statistical anomaly?).


Chapter 6
Evaluation of MAPS Rules by Application
to Test Compounds

Introduction

There are two methods of generating MAPS feature-combination
rules. The first method is to generate rules individually
for each substructure using a number of different program
parameters (e.g. Ui). The second method is to generate rules for a
number of substructures using global program parameters (i.e. fixed
Ui/Ci/Cc values for all rules). The reliability of the MAPS (v.III)
rules obtained thus far was evaluated by applying the rules to test
compounds (i.e. compounds not in the reference database). These
compounds were drawn from several compound classes that are
represented in the database (e.g. phenothiazines, amphetamines,
opioids and barbiturates). A list containing the compound name,
molecular weight, molecular formula and CAS number of each of the
test compounds is provided in Table 6.1. The structure for each of
these compounds is given in Figure 6.1. The rules used in this
evaluation were those described in Chapter 5 and those contained in
several rulebases generated using global program parameters. A

[Table 6.1: Compound name, molecular weight, molecular formula and
CAS number for each of the test compounds (table body illegible in
the scanned source).]

 

 

 

 

 

 

 

Figure 6.1: Structure for each test compound listed in Table 6.1.


reliability for the individual rules and an overall reliability for each

rulebase with respect to the test compounds was then calculated.

Evaluation of Individual Rules using Test Compounds

Table 6.2 contains a list of MAPS (v.III) rules generated using a
variety of program parameters, the test compounds identified by
each rule, and the reliability and recall for each rule with respect
to the test compounds. These rules were generated for the
PHENOTHIAZINE (i.e. “SS132”), BARBITURATE (i.e. “SS143”), T-BUTYL
(i.e. “SS21”), PHENOL (i.e. “SS44”) and AMPHETAMINE (i.e. “SS118”)
substructures. The program parameters used to generate each of the
rules in Table 6.2 are provided in the rule name using the format:
“<SSname>_<pressure>_<Ui>_<Ci>_<Uc>_<Cc>”. Other program
parameters used were: USORT (enabled), MAXF = 10, MINF = 1,
MINCMPDS = 10, HITS = 5 (unless otherwise indicated), SSTIME =
3600 seconds.
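
The rule-name format above can be unpacked mechanically. The following Python fragment is a minimal sketch and is not part of the MAPS software: the function name `parse_rule_name`, the `RuleName` record and the example rule name are all hypothetical, and the sketch assumes the four parameter fields are integer percentages.

```python
from typing import NamedTuple


class RuleName(NamedTuple):
    substructure: str  # substructure identifier, e.g. "SS132"
    pressure: str      # pressure-regime label
    ui: int            # Ui field (initial uniqueness, percent)
    ci: int            # Ci field (initial correlation, percent)
    uc: int            # Uc field of the rule name
    cc: int            # Cc field of the rule name


def parse_rule_name(name: str) -> RuleName:
    """Split '<SSname>_<pressure>_<Ui>_<Ci>_<Uc>_<Cc>' into its six fields."""
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields: {name!r}")
    ss, pressure, *numbers = parts
    ui, ci, uc, cc = (int(n) for n in numbers)
    return RuleName(ss, pressure, ui, ci, uc, cc)


# Hypothetical rule name, constructed only to illustrate the format.
rule = parse_rule_name("SS132_P1_40_70_100_50")
print(rule.substructure, rule.pressure, rule.ui, rule.ci, rule.uc, rule.cc)
```

Keeping the parameters in the name means a rule can be traced back to its generation settings without a separate lookup table.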

“PHENOTHIAZINE”

The first three rules listed in Table 6.2 were generated for the
PHENOTHIAZINE substructure. The first and third rules were
generated using the single collision data while the second was
generated using the multiple collision data. The first two rules both
have 100% reliability with respect to the test compounds. The recall of the
second rule (generated using multiple collision conditions) is lower
than the recall observed for the first rule (single collision conditions).

 

 

 

[Table 6.2: MAPS (v.III) rules, the test compounds identified by each
rule, and the reliability and recall of each rule with respect to the
test compounds (table body illegible in the scanned source).]

Thus, the use of multiple collision conditions was not advantageous

for identifying this substructure in the compounds tested.
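
The reliability and recall figures quoted for the test compounds can be illustrated with a short Python sketch. It assumes that reliability is the percentage of a rule's identifications that are correct and that recall is the percentage of test compounds containing the substructure that the rule identifies; the function name and the compound numbers below are hypothetical and are not taken from Table 6.2.

```python
def reliability_and_recall(predicted, actual):
    """Return (reliability %, recall %) for one rule.

    predicted: set of test compounds the rule identified
    actual:    set of test compounds that really contain the substructure
    A value is None when its denominator is zero.
    """
    correct = predicted & actual
    reliability = 100.0 * len(correct) / len(predicted) if predicted else None
    recall = 100.0 * len(correct) / len(actual) if actual else None
    return reliability, recall


# Hypothetical example: six identifications, five of them correct.
predicted = {1, 2, 3, 4, 5, 10}
actual = {1, 2, 3, 4, 5, 6}
rel, rec = reliability_and_recall(predicted, actual)
print(round(rel), round(rec))
```

Under these assumed definitions a rule with only false positives has 0% reliability and 0% recall, and a rule that identifies nothing has an undefined reliability.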

The third PHENOTHIAZINE rule was generated using the single
collision data but lower Ui/Ci/Cc program parameter values. The
reliability of this rule, however, is lower than that of the previous two
rules (i.e. two false positives were obtained). Thus, the reliability of the
MAPS rules is dependent on the program parameters used in
generating the rules. The minimum advisable Ui/Ci/Cc program
parameters are difficult to assess since the number of compounds
containing a substructure and the number of initial features obtained
for a Ui/Ci combination vary widely among the substructures in the
substructure library (see Chapter 5). Generally, for substructures
that are represented in only a few reference compounds, larger Ui/Ci
parameter values are required to ensure a reliable rule outside of
the reference database. For substructures contained in many
reference compounds, smaller Ui/Ci parameter values can be used.
For example, the Ui/Ci parameters are percentages, so a 50% initial
correlation (Ci) means that for a substructure represented in 36
compounds, a feature must be exhibited in the MS/MS spectra of 18
compounds before it can be used in the generation of
feature-combinations. On the other hand, for a substructure represented in
only 4 compounds, a 50% initial correlation means that only two
compounds must exhibit a feature before it can be used in rule
generation. The features for substructures represented in many
compounds, therefore, have more statistical validity outside the
reference database than those exhibited by only a few compounds.
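
The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and rounding fractional counts up is an assumption, since the text does not say how MAPS treats them.

```python
def min_compounds_required(n_compounds: int, ci_percent: int) -> int:
    """Smallest number of compounds that must exhibit a feature to meet Ci.

    Uses integer ceiling division; rounding up is an assumption.
    """
    return (n_compounds * ci_percent + 99) // 100


print(min_compounds_required(36, 50))  # 18
print(min_compounds_required(4, 50))   # 2
```

The same percentage therefore corresponds to very different absolute amounts of supporting evidence, which is the point made above.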


An additional noteworthy observation on Rule 3 in Table 6.2 is
that an identification was made for Test Compound #10 based on a
single feature. As was mentioned in Chapter 5, identifications based
on only a single feature are generally unreliable. The reliability of
Rule 3 with no single-feature clauses is 83%. This error can be
avoided by setting MINF to a value greater than one.
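
The effect of excluding single-feature clauses can be sketched in Python as a filter over rule clauses. The sketch assumes that MINF is the minimum number of features allowed in a clause and applies it after the fact; MAPS itself applies MINF during rule generation. The clause layout (a tuple of "PD parent daughter" features, as in Figure 5.23) and the function name are illustrative assumptions.

```python
# A feature is ("PD", parent m/z, daughter m/z); a clause is a tuple of features.
Feature = tuple[str, float, float]
Clause = tuple[Feature, ...]


def drop_short_clauses(clauses: list[Clause], minf: int) -> list[Clause]:
    """Keep only the clauses that contain at least `minf` features."""
    return [clause for clause in clauses if len(clause) >= minf]


# Three clauses taken from the PHENOTHIAZINE rule in Figure 5.23.
clauses = [
    (("PD", 197.0, 170.0),),
    (("PD", 198.0, 45.0), ("PD", 198.0, 154.0)),
    (("PD", 198.0, 45.0), ("PD", 198.0, 171.0)),
]

kept = drop_short_clauses(clauses, minf=2)
print(len(kept))  # 2: the single-feature clause is removed
```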

“BARBITURATE”

There are seven MAPS rules listed in Table 6.2 that are
associated with barbiturates (i.e. SS143, SS145, SS147 as shown in
Chapter 5). The feature-combination rule shown in Figure 5.13 for
the barbiturate substructure SS143 gave disappointing results (two
false positives and no correct identifications). This rule was
generated with a low initial uniqueness parameter (10%) and a
relatively high initial correlation parameter (50%). The reliability
and recall estimates for this rule with respect to the reference
database were 100%. The reliability and recall for this rule with
respect to the test compounds, however, were 0%. Thus, the
initial uniqueness value of 10% (i.e. a minimum of 1 in every 10
reference compounds must have the feature and the substructure)
was too low to produce a substructure identification rule which was
reliable outside of the reference database.

Rules 5-7 shown in Table 6.2 were generated using an initial
uniqueness value of 30% and initial correlation values of 20% and
40%. Rule 6 in Table 6.2 was generated using a lower initial
correlation parameter than Rule 5. Thus, Rule 6 should have an
equal or higher recall than Rule 5 because there are more features to
use in generating a rule. However, as can be seen from Table 6.2,
Rule 6 has a lower recall than Rule 5. The reason for this difference
is the HITS optimization. This optimization exits a rule generation
loop if rule clauses have been generated which recall all of the
compounds a given number of times (as specified by the HITS
parameter). Thus, the MAPS (v.III) program will not continue to add
feature combinations past the point where the HITS parameter is
satisfied. All of the rules presented here were generated using a
HITS parameter value of 5 unless otherwise indicated. A non-zero
recall was obtained for Rule 5 and not for Rule 6 because a
feature-combination was generated that identified SS143 in Test Compound
#17 using Rule 5 while this same combination was overlooked during
the generation of Rule 6. This combination was not obtained for Rule
6 because the program exited the feature-combination generation
loop before the combination was tried (due to the HITS limitation
being met).
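
The early exit described above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the MAPS (v.III) code: candidate clauses arrive in some fixed order, each with the set of reference compounds it recalls, and generation stops as soon as every compound has been recalled at least HITS times. All names and data are hypothetical.

```python
def generate_clauses(candidates, compounds, hits):
    """Accept candidate clauses in order until every compound in
    `compounds` has been recalled at least `hits` times.

    candidates: iterable of (clause_id, set_of_recalled_compounds)
    Returns the list of accepted clause ids.
    """
    counts = {compound: 0 for compound in compounds}
    kept = []
    for clause_id, recalled in candidates:
        kept.append(clause_id)
        for compound in recalled:
            if compound in counts:
                counts[compound] += 1
        # HITS criterion satisfied: later candidates are never tried.
        if all(n >= hits for n in counts.values()):
            break
    return kept


candidates = [("c1", {"A"}), ("c2", {"B"}), ("c3", {"A", "B"})]
print(generate_clauses(candidates, {"A", "B"}, hits=1))  # ['c1', 'c2']
print(generate_clauses(candidates, {"A", "B"}, hits=2))  # ['c1', 'c2', 'c3']
```

In this toy case clause `c3` is only reached when HITS is raised, which mirrors how a useful combination can be skipped when the HITS criterion is met early.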

An interesting effect of another MAPS program parameter,
HITS, was explored using Rule 7. Rule 7 in Table 6.2 was
generated using the same Ui/Ci/Cc parameters as were used in
generating Rule 6 but with a HITS parameter of 20. The
substructure identification for Test Compound #17 was observed and
a non-zero recall was obtained for this rule. Thus, when using a
lower set of Ui and Ci program parameters, a higher HITS parameter
can be used to increase the number of feature combinations that are
tried. This will reduce the probability that the MAPS program will
abort before trying important combinations.

Rule 8 (see Figure 5.15) for the 5-allyl barbiturates did not hit
on the one test compound that had this substructure. There were,
however, only three examples of this substructure in the reference
database. Rule 9 in Table 6.2 for the 5-ethyl barbiturates was also
a poor rule outside of the reference database. This poor performance
is probably due to the low initial uniqueness parameter used in
generating the rule (10%). Rule 10 in Table 6.2 was generated for
the same substructure using a higher initial uniqueness parameter
(30%), a lower initial correlation parameter (20%) and an increased
HITS parameter (20 hits). The new rule did not increase the recall
obtained for this substructure but did decrease the number of false
positives by one. Overall, it appears that there is an insufficient
number of barbiturate standards in the reference database which
contain the SS143, SS145 and SS147 substructures to generate reliable
rules for these substructures.

“T-BUTYL” and “PHENOL”

Not surprisingly, Rules 11-14 in Table 6.2 for the T-BUTYL and
PHENOL substructures exhibit poor reliability outside the reference
database. As was noted in Chapter 5, these substructures are highly
cross-correlated. Significantly, the T-BUTYL substructure is also
highly cross-correlated with the “XPHENYL” substructure since, in most
of the reference compounds containing it, the T-BUTYL substructure is
attached to a benzene ring. For Rule 14, the compounds
corresponding to 6 out of the 7 false positives for PHENOL contain a
benzylic carbon. The remedy for this problem is a larger and more
diverse reference database. Ultimately, however, MS/MS spectra
must be able to discriminate among these substructures in order to
obtain reliable rules for them.

“AMPHETAMINE”

The AMPHETAMINE substructure was originally defined in an
attempt to generate a MAPS rule that would identify compounds in
this general compound class. The substructure definition used for
this purpose is shown in Figure 6.2(A). Examination of the
“substructure-buckets” for this substructure definition revealed that
many of the opioid reference compounds (e.g. Test Compound #6
shown in Figure 6.1) also had this substructure. A third compound
class which contains a very similar substructure was discovered
when Rule 15 listed in Table 6.2 was applied to the test compounds.
All four of the false positives for this rule are phenobarbitals. The
only difference between the substructure in the phenobarbitals and
the substructure shown in Figure 6.2(A) is the presence of two
benzylic hydrogens.

The AMPHETAMINE substructure was changed (henceforward
referred to as SS118) to the more generic definition shown in Figure
6.2(B). The MAPS rule generated using this new definition is listed
in Table 6.2 (Rule 16). The reliability of this rule is much improved
over the previous rule (i.e. 88% versus 50%). Thus, substructure
definitions which are too specific, that is, definitions which exceed

Figure 6.2: Substructure definition for the “SS118” substructure
(A) with benzylic hydrogens (initial definition) and (B)
without benzylic hydrogens (new definition).

the specificity which MS/MS spectra afford, can also affect rule
reliability outside of the reference database. Several other rules for
SS118 are listed in Table 6.2. Of these, Rule 20 is important in that it
demonstrates that rules based on the multiple collision data can
identify substructures in compounds where rules based on single
collision data failed (e.g. Test Compounds 13 and 14). This
information will be of use in implementing the EXPT program (see
Chapters 1 and 4). The EXPT program is intended for use in
suggesting ancillary experiments to reduce the number of candidate
structures for an unknown.

Evaluation of Rulebases Generated using Global Parameters

The reliabilities of two rulebases, one generated using Ui=30%,
Ci=30% and Cc=30% and another generated using Ui=30%, Ci=50% and
Cc=30%, were calculated to determine the relative merit of using
global parameters to generate MAPS rules. The overall reliability of
each rulebase is provided in Table 6.3. The total number of
predictions and the number of rules in each rulebase are also
provided.

                         P1_30_30_30    P1_30_50_30
REL (%)                  64             80
number of predictions    118            51
number of rules          31             11

Table 6.3: Reliability, number of predictions and number of rules
obtained for two MAPS rulebases.

The overall reliabilities shown in Table 6.3 indicate that the
global parameters used (30%/30%/30% and 30%/50%/30%) were not
optimal for reliable rule generation (i.e. they had overall reliabilities
less than 100%). Specifically, two false positives were observed for
the PHENOTHIAZINE substructure using the “P1_30_30_30” rulebase.
Yet, it has been shown that a 100% reliable rule (with respect to the
test compounds) can be generated using a Ui/Ci/Cc combination of
40%/70%/50%. The “P1_30_50_30” rulebase has a higher overall
reliability than the “P1_30_30_30” rulebase; however, fewer rules
are obtained using the higher correlation value (i.e. Ci=50%). One
problem with using global parameters is that it allows the number of
initial features used in rule generation to vary over a broad range for
the substructures contained in the substructure library (see Chapter
5). Thus, selecting initial features using global parameters (percent
uniqueness and correlation) does not appear to be the best method of
creating a rulebase. A possible new method for generating
feature-combination rules that may yield better results is to allow the MAPS
software to vary the Ui and Ci parameters to select a fixed number of
features (i.e. the “best” features available in the database). Thus,
each rule in a rulebase created using this new method will be
generated utilizing approximately the same number of initial
features (with the advantage that the features are the best that are
available within the database).
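
One way to picture the proposed selection of a fixed number of “best” features is the Python sketch below. The ranking criterion (the smaller of the uniqueness and correlation values first, then their sum) is purely an assumption made for illustration, since no definition of “best” is given here; the function name is hypothetical. The feature values are those listed for the m/z 198 parent under multiple collision conditions in Figure 5.21.

```python
def select_best_features(features, n):
    """Return the names of the n highest-ranked features.

    features: dict mapping feature name -> (uniqueness %, correlation %)
    Ranking (an assumed criterion): larger min(U, C) first, ties broken
    by larger U + C.
    """
    ranked = sorted(
        features.items(),
        key=lambda item: (min(item[1]), sum(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:n]]


features = {
    "PD 198 -> 197": (45, 75),
    "PD 198 -> 171": (61, 84),
    "PD 198 -> 166": (87, 53),
    "PD 198 -> 154": (68, 84),
    "PD 198 -> 128": (53, 53),
    "PD 198 -> 127": (81, 69),
    "PD 198 -> 45": (90, 69),
}

print(select_best_features(features, 3))
# ['PD 198 -> 45', 'PD 198 -> 127', 'PD 198 -> 154']
```

With a scheme of this kind every substructure starts from the same number of initial features, whatever the absolute Ui and Ci values turn out to be.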

Additional problems with the MAPS rules were discovered
during this investigation. Since each feature-combination in a MAPS
(v.III) rule possesses 100% reliability, the presence of any
feature-combination in the MS/MS spectra of an unknown was deemed to be
sufficient to make a substructure identification. However, many of
the false positives observed using the “P1_30_30_30” rulebase were
based on a small number of rule clauses “firing” (i.e. the
feature-combination representing a rule clause was found in the MS/MS
spectra of an unknown). The RULE program lists each rule clause
that fires for an unknown. The overall reliability for the rulebase
was recalculated using a new rule application method (i.e. at least 10
rule clauses firing before an identification is made). The results
obtained for this study are summarized in Table 6.4. The overall
reliability of the rulebase increased from 64% to 82% using this new
method of rule application.
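
The two rule application methods can be contrasted in a short Python sketch. It is an illustration only and not the RULE program: a clause is modelled as a tuple of features, a clause “fires” when all of its features are present in the unknown's MS/MS data, and the function names and the example spectrum are hypothetical.

```python
def count_firing_clauses(clauses, spectrum_features):
    """Count the clauses whose features are all present in the spectrum."""
    return sum(1 for clause in clauses if set(clause) <= spectrum_features)


def identify(clauses, spectrum_features, min_clauses=1):
    """Report the substructure only if at least `min_clauses` clauses fire."""
    return count_firing_clauses(clauses, spectrum_features) >= min_clauses


# Three clauses from Figure 5.23 and a hypothetical unknown.
clauses = [
    (("PD", 197.0, 170.0),),
    (("PD", 198.0, 45.0), ("PD", 198.0, 154.0)),
    (("PD", 198.0, 45.0), ("PD", 198.0, 171.0)),
]
unknown = {("PD", 198.0, 45.0), ("PD", 198.0, 154.0), ("PD", 198.0, 127.0)}

print(count_firing_clauses(clauses, unknown))     # 1
print(identify(clauses, unknown, min_clauses=1))  # True
print(identify(clauses, unknown, min_clauses=2))  # False
```

Raising the minimum trades some recall for fewer identifications that rest on a single coincidental clause.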

Ten of the test compounds were added to the database to
increase the number of reference database compounds used for rule
generation. The reliability of the rulebase obtained from the new
reference database was determined using the remaining 10 test
compounds. This process was repeated by switching the ten
compounds added to the reference database and the ten used as test
compounds. The results of this study are also provided in Table 6.4.
There was a slight improvement in the reliabilities observed using
the larger reference databases (using both methods of rule
application).

# of cmpds in DB     REL       # of pred.'s     # of rules
105                  64%       135              20
                     (82%)*    68               20
115                  65%       32               19
                     (88%)*    28               19
115                  69%       43               19
                     (86%)*    32               19

CE = 30 eV, p = 0.4 mtorr
Ui = 30%, Ci = 30%, Cc = 30%
HITS = 20

* minimum of 10 rule clauses for an identification to be made

Table 6.4: Rule reliabilities, number of predictions and number of
rules for several MAPS rulebases obtained using the
indicated parameters.

A very promising trend was observed in the false positives
obtained using the rulebases shown in Table 6.4. Many of these false
positives may be classified as “near-misses” and have to do with the
way substructures are interpreted by the structure checking
software. The substructure list used to determine correct and
incorrect predictions was obtained in the same way as the list used
to generate the “substructure buckets”. For example, one false
positive was obtained for test compound #5 (i.e. “SS121” - a
six-membered carbon ring with one CH2 substituent and the remaining
free valences undefined). The structure corresponding to test
compound #5 (see Figure 6.1) clearly has a six-membered ring with a
CH2 substituent.

The reason this substructure identification is considered a false
positive is that the ring corresponding to the “SS121” substructure in
test compound #5 is fused to a benzene ring. Thus, two of the carbon
atoms in the potential “SS121” ring are doubly bonded (in one
resonance form) and the structure checking software does not
include the “SS121” substructure in the substructure list for test
compound #5. One-half of the false positives associated with the
first rulebase listed in Table 6.4 can be classified as “near-misses”.
Another one-quarter of these false positives may be attributed to
cross-correlation, removed by use of a better rule (i.e. a rule
generated using more stringent program parameters) or neglected
because the MS/MS data used cannot reasonably be expected to
identify the substructure (e.g. aromatic ring isomers). Thus, there is
considerable evidence that further expansion of the reference
database and refinement of the MAPS software will yield
substructure identification rulebases with sufficient reliability for
use in a generalized structure elucidation system (ACES).

 


Another important point is that the reliability for the first database
increases to 88% if test compound #10 is neglected. The threshold
used to acquire daughter spectra for test compound #10 was set an
order of magnitude lower than that used for the rest of the test
compounds because the large majority of the ion current for this
compound is concentrated in only a few ions. Thus, no identifications were
obtained using the 1% threshold (i.e. the threshold used for the rest
of the test compounds).

In summary, the major causes of false positives for the MAPS
rules were: 1) cross-correlation among substructures, 2)
inappropriate substructure definitions and 3) inappropriate
application of the rules. The use of a lower threshold for acquiring
daughter spectra also increased false positives. This last result may
change, however, when a larger reference database is used to
generate the rules. A lower threshold is desirable because more
substructure identifications can be made if daughter spectra of weak

primary scan ions are available.

Recommendations for Future Development of MAPS

The following are the major recommendations for future
development of the MAPS software. These recommendations are
based on the evaluations made of the MAPS rules discussed in this
and other chapters of this thesis. First, the MAPS code should be
modified to select the values of the Ui/Ci variables so that a fixed
number of initial features (i.e. the “best” features) is selected for
use in generation of feature combinations. Second, the reference
database should be increased in size to provide more examples (i.e.
provide a better statistical basis) of the fragmentation of the
substructures contained in the substructure library. A larger
reference database will also decrease the number of false
correlations in the rules and, therefore, decrease false positives.
Third, the structure software should be modified to eliminate the
problem of overlapping substructures (i.e. to take into account the
“near-misses”). Fourth, the substructure library should be enlarged
to exploit a larger number of the functionalities present in the
reference database compounds. Fifth, a truly novel and extremely
valuable contribution to the optimal generation of MAPS rules would
be to provide a mechanism for the software to define its own
substructures, rather than looking for only those defined in the
substructure library. The molecular formula generator and the
“daughter buckets” may provide useful information for implementing this
new procedure since the daughter ions are indicative of important
functionalities present in the reference database compounds. Lastly,
the RULE program should be modified to allow a user-defined
minimum number of rule clauses which must “fire” before a
substructure identification is made by the program.

 


Conclusions

The MAPS (v.III) software has proven to be effective in
generating substructure identification rules based on feature-
combinations. The rules obtained for several substructures were
found to be highly reliable when applied to test compounds. Other
substructure identification rules which exhibited false positives
among the test compounds were often found to be “near-misses” or
due to cross-correlation with another substructure which was also
present in the test compounds. The addition of more compounds to
the reference database and a further modification of the MAPS code
should alleviate these problems with the MAPS rules. It is also
important to examine closely the substructure definitions used in
generating the MAPS rules and in assigning the list of correct
predictions for the test compounds. Inappropriate substructure
definitions often lead to false positives. These changes will increase
the overall reliability and applicability of the substructure

identification rulebase(s) used by the ACES system.