PERVASIVE ALTERNATIVE RNA EDITING IN TRYPANOSOMA BRUCEI 

By 

Laura Elizabeth Kirby 

 

 

 

 

 

 

 

A DISSERTATION 

Submitted to 

Michigan State University 

in partial fulfillment of the requirements 

for the degree of 

 

Microbiology and Molecular Genetics—Doctor of Philosophy 

 

 

2019 

ABSTRACT 


Trypanosoma brucei is a single-celled eukaryote that utilizes a complex RNA editing 

system to render many of its mitochondrial genes translatable.  Editing of these genes requires 

multiple small RNAs called guide RNAs (gRNAs) to direct the insertion and deletion of uridines.  These 

gRNAs act sequentially, each generating the anchor binding site for the next gRNA.  This 

sequential dependence should render the process quite fragile, and mutations in the gRNAs 

should not be tolerated.  In the examination of the gRNA transcriptome of T. brucei, many 

gRNAs were identified that are capable of generating alternative mRNA sequences, and 

potentially disrupting the editing process.  In this work, the effects of alternative editing are 

characterized.  This analysis revealed the role of gRNAs in developmental regulation of gene 

expression, showing a correlation between the abundance of the initiating gRNAs across two 

different points in the life cycle of T. brucei and the expression of the genes they edit.  This study also revealed the 

existence of mitochondrial dual-coding genes, which provide protection for genetic material 

that is not under selection at all points of the life cycle of T. brucei.  The examination of these 

dual-coding genes showed that RNA editing patterns can shift between cell lines and under 

different energetic conditions.  Examining the gRNAs involved in these editing pathways 

revealed that a high number of mismatched base pairs is tolerated without abolishing editing, 

and that gRNA abundance is not a reliable predictor of editing preference.  Finally, a 

reexamination of the gRNA transcriptome revealed that many gRNAs are still unidentified and 

most likely are generating new alternatively edited sequences. 

 

 

ACKNOWLEDGEMENTS 

 

I would first like to acknowledge my family, without whose support I could not have 

completed the work I have done.  I am forever grateful for their continuous encouragement 

and constant faith in me.   

I gratefully acknowledge Dr. Donna Koslowsky, who has been an excellent mentor.  She 

not only provided her extensive insights on the field of RNA editing, but also aided and advised 

me in my professional development.  She has shaped me as a researcher and as a person, and I 

was very fortunate in being able to work with her.   

I would also like to thank the small undergraduate army I have had the privilege to 

work with.  They taught me a great deal about mentoring and leadership, and their many hours 

greatly aided me in the completion of this work.   

For their assistance in my computational education, I would like to thank Dr. Yanni Sun 

and Dr. Arend Hintze.  Without their assistance, my research would not have been possible.   

For serving as my graduate committee, I would like to thank Dr. Shannon Manning, Dr. 

Charles Hoogstraten, Dr. Chris Adami and Dr. Yanni Sun.  Their many contributions have shaped 

and refined my research.   

I would also like to thank Dr. Cori Fata-Hartley, for serving as my teaching mentor and 

helping me design and conduct an education research project.   

I would like to thank the friends I have gained during my time at Michigan State who 

have supported me and been wonderful colleagues: Alexis Weber, Sandy Olenic, Ahrom Kim, 

and Shreya Saha.   

 


Finally, I would like to thank the Department of Microbiology and Molecular Genetics, 

the College of Natural Science, the Elenor L. Gilmore Endowment, the Frank Peabody 

Microbiology Student Research Fund, the Russell B. DuVall Endowment, the Berttina 

Wentworth Fellowship, and the Marvis A. Richardson Endowed Fellowship for their support of 

my research.   

 

 

 


TABLE OF CONTENTS 

 

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1: INTRODUCTION
Kinetoplastids
Kinetoplastid RNA Editing
Trypanosoma brucei
Trypanosoma vivax
Trypanosoma cruzi
Leishmania spp.
Phytomonas spp.
Procyclic gRNA Transcriptome
Evolution and retention of RNA editing in kinetoplastids
Dual-coding and dual-function genes
Project Summary

CHAPTER 2: ANALYSIS OF THE TRYPANOSOMA BRUCEI EATRO 164 BLOODSTREAM GUIDE RNA TRANSCRIPTOME
Abstract
Author Summary
Introduction
Materials and Methods
Results
Discussion
Accession Numbers
Acknowledgments

CHAPTER 3: MITOCHONDRIAL DUAL-CODING GENES IN TRYPANOSOMA BRUCEI
Abstract
Author Summary
Introduction
Materials and Methods
Results
Discussion
Acknowledgments

CHAPTER 4: ANALYSIS OF THREE PAN-EDITED MRNAS REVEALS DUAL-CODING GENES AND COMPLEX MULTIPATH EDITING
Abstract
Introduction
Materials and Methods
Results
Discussion
Acknowledgements

CHAPTER 5: CLUSTER CLASSIFICATION OF UNKNOWN GRNAS REVEALS THE ROBUSTNESS OF THE RNA EDITING SYSTEM
Abstract
Introduction
Materials and Methods
Results
Discussion
Acknowledgements

CHAPTER 6: SUMMARY AND DISCUSSION
Introduction
Summary of Chapter 2
Summary of Chapter 3
Summary of Chapter 4
Summary of Chapter 5
Genetic Integrity
Developmental Regulation
Protein Diversity
Editing Efficiency
Future Work
Conclusion

APPENDICES
APPENDIX A. Quantification of the number of identified bloodstream and procyclic gRNA transcripts that cover a respective nucleotide in the fully edited mRNA
APPENDIX B. Alignment of the mitochondrial fully edited mRNAs and the most abundant gRNAs required for full coverage identified in the bloodstream and procyclic life cycle stages
APPENDIX C. All gRNA major classes pulled for ATPase 6 in the EATRO 164 procyclic and bloodstream transcriptomes
APPENDIX D. Identified CR3 mRNA and gRNA transcripts
APPENDIX E. ND7 5'-most gRNA populations and the predicted mRNA sequences generated
APPENDIX F. RPS12 5'-most gRNA populations and the predicted mRNA sequences generated
APPENDIX G. Alignments of T. brucei and T. vivax edited mRNAs
APPENDIX H. Alignments of protein sequences of pan-edited dual-coding genes in L. tarentolae, L. amazonensis, P. serpens, and Perkinsela CCAP1560/4 with T. brucei and T. vivax sequences
APPENDIX I. RPS12 gRNA Alignments for TREU 667 SDM79 and EATRO 164 SDM79 cells, and all editing variants
APPENDIX J. gRNAs identified to edit the RPS12 mRNAs found in both TREU 667 and EATRO 164 gRNA transcriptomes
APPENDIX K. ND7 gRNA Alignments for TREU 667 SDM79 and EATRO 164 SDM79 cells, and all editing variants
APPENDIX L. gRNAs identified to edit the ND7 5' mRNAs found in both TREU 667 and EATRO 164 gRNA transcriptomes
APPENDIX M. Predicted ND7 protein sequences
APPENDIX N. CR3 gRNA Alignments for TREU 667 SDM79, and all editing variants
APPENDIX O. CR3 gRNA Alignments for EATRO 164, and all editing variants
APPENDIX P. gRNAs identified to edit the CR3 mRNAs found in both TREU 667 and EATRO 164 gRNA transcriptomes

REFERENCES

 

 

 

 


LIST OF TABLES 

 
 
 

Table 1. Differences in mitochondrial transcript abundance, polyadenylation and the extent of RNA editing in two life cycle stages of T. brucei.
Table 2. Number of gRNA transcripts in procyclic and bloodstream major classes and ratio of procyclic transcripts to bloodstream transcripts for each gene.
Table 3. Summary of the gRNA data coverage for each gene.
Table 4. Most common gRNA transcription start sites in procyclic and bloodstream data.
Table 5. Identified gaps or weak overlaps (less than 6 nucleotides) between populations of gRNAs observed in both data sets.
Table 6. Summary of populations found in both data sets that have more reads in the bloodstream data set than in the procyclic data set.
Table 7. Editing efficiencies of RPS12, ND7, and CR3.
Table 8. Editing efficiency for each RPS12 gRNA population.
Table 9. Editing efficiency for each ND7 5' domain gRNA population.
Table 10. Editing efficiencies by block level of CR3.
Table 11. Summary of ACORNS Results.
Table 12. Cluster summary.
Table 13. Cluster size summary.
Table 14. gRNA population analyses for RPS12.
Table 15. gRNA population analysis for ND7 5'.
Table 16. CR3 gRNA population analysis.
 

 

 


LIST OF FIGURES 

 
 
 

Figure 1. The abundance of the initiating gRNA of all edited mRNAs in each stage.
Figure 2. The frequency of nt variations versus nucleotide position in the gRNA.
Figure 3. Comparing the number of non-complementary nucleotides 5' of the anchoring region or 3' of the guiding region in procyclic and bloodstream gRNAs.
Figure 4. Length of gRNA complementarity to fully edited mRNAs for both bloodstream and procyclic gRNAs.
Figure 5. The percentage of different nucleotide overlaps found between adjacent gRNAs.
Figure 6. Alignment of conventional ATPase 6 protein sequence to hypothetical proteins generated by the 11U alternatively edited mRNA and the 4U alternatively edited mRNA.
Figure 7. Editing sites 420–489 of COIII aligned with the gRNAs identified for that region in the procyclic and bloodstream data sets.
Figure 8. Alternative editing of the 5' end of pan-edited genes results in access to different reading frames.
Figure 9. Positions of stop codons on all RFs of the edited genes in T. brucei.
Figure 10. Mutational frequencies in mitochondrially encoded genes categorized by effect on amino acid sequence.
Figure 11. Percent conservation of editing patterns between T. brucei and T. vivax.
Figure 12. Principal component analysis of frequency of amino acid mutation types and editing conservation between T. brucei and T. vivax pan-edited transcripts.
Figure 13. Amino acid sequences of ARFs of dual-coding genes.
Figure 14. Observed RPS12 editing pathways in the TREU 667 cell line and the EATRO 164 cell line grown in SDM79 and SDM80.
Figure 15. Alignment of RPS12 proteins from T. brucei, T. vivax, Leishmania tarentolae, Leishmania donovani, and Leishmania amazonensis.
Figure 16. Regions with poor gRNA coverage and functionally conserved residues in RPS12.
Figure 17. Observed ND7 5' editing pathways in the TREU 667 cell line and the EATRO 164 cell line grown in SDM79 and SDM80.
Figure 18. Regions with poor gRNA coverage and functionally conserved residues in ND7 5'.
Figure 19. Observed CR3 editing pathways in the TREU 667 cell line.
Figure 20. Four different 3' end sequences found in the TREU 667 transcriptome for the CR3 transcript and CR3 protein sequences.
Figure 21. Observed CR3 editing pathways in the EATRO 164 cell line grown in SDM79 and SDM80.
Figure 22. Alignment of CR3 predicted protein variants from the EATRO 164 cell line.
Figure 23. Predicted secondary structures of most abundant CR3 predicted proteins.
Figure 24. Frequencies of early total deletions of DNA encoded uridines in partially edited ND7 and RPS12 transcripts.
Figure 25. Example clusters of related gRNAs generated by ACORNS from the EATRO 164 PC gRNA transcriptome.
Figure 26. Observed RPS12 editing pathways in the TREU 667 cell line and the EATRO cell line.
Figure 27. Analysis of functionality and abundance of productive gRNA populations that edit RPS12 in TREU 667 cells.
Figure 28. Analysis of functionality and abundance of productive gRNAs that edit RPS12 in EATRO 164 cells.
Figure 29. Observed ND7 5' editing pathways in TREU 667 cell line and the EATRO 164 cell line.
Figure 30. Analysis of functionality and abundance of productive gRNAs that edit ND7 5' in TREU 667 cells.
Figure 31. Analysis of functionality and abundance of productive gRNAs that edit ND7 5' in EATRO 164 cells.
Figure 32. Observed CR3 editing pathways in TREU 667 and EATRO 164 cell lines.
Figure 33. Analysis of functionality and abundance of gRNA subpopulations that edit CR3 in TREU 667 cells.
Figure 34. Analysis of functionality and abundance of gRNA subpopulations that edit CR3 in EATRO 164 cells.
 
 

 

 


CHAPTER 1: INTRODUCTION 

Kinetoplastids 

Trypanosoma brucei is a member of the Kinetoplastea, a group of protozoans 

characterized by a large network of DNA in their mitochondria known as the kinetoplast that is 

physically attached to the flagellum [1].  While not all kinetoplastids are parasites, the group 

encompasses some of the most successful parasites in existence, inhabiting an incredibly wide 

range of hosts from plants to invertebrates to vertebrates [2,3]. The dixenous members cycle 

between two distinct hosts and can encounter different environments with distinct metabolic 

constraints. These environmental shifts require rapid and extensive changes in gene expression.  

This is particularly interesting considering the kinetoplastids’ bizarre and complicated use of 

RNA editing for their mitochondrial gene expression.   

Kinetoplastid RNA Editing 

RNA editing is one of several unique genetic features found in the mitochondria of these 

parasites. RNA editing creates open reading frames in “cryptogenes” by insertion and deletion 

of uridylate residues at specific sites within the mRNA. The U-insertions/deletions are directed 

by small guide RNAs (gRNAs) and can repair frameshifts, generate start and stop codons and 

more than double the size of the transcript [4]. 
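
As a rough illustration of how far editing can expand a transcript, the short Python sketch below estimates the pre-edited length of three pan-edited mRNAs from the uridine counts and edited sizes reported in Table 1. It assumes that the pre-edited length is simply the edited size minus the inserted uridines plus the deleted uridines, which may not match how transcript ends or tails were measured in the cited studies. The names TABLE1, pre_edited_length and expansion are illustrative only.

```python
# Values copied from Table 1: (edited size in nt, uridines added, uridines deleted).
TABLE1 = {
    "COIII": (969, 547, 41),
    "A6": (821, 448, 28),
    "ND7": (1238, 553, 89),
}


def pre_edited_length(edited_nt: int, u_added: int, u_deleted: int) -> int:
    """Estimate the pre-edited length, assuming editing only adds or removes uridines."""
    return edited_nt - u_added + u_deleted


def expansion(gene: str) -> float:
    """Ratio of edited size to estimated pre-edited size for a gene in TABLE1."""
    edited_nt, u_added, u_deleted = TABLE1[gene]
    return edited_nt / pre_edited_length(edited_nt, u_added, u_deleted)


if __name__ == "__main__":
    for gene in TABLE1:
        print(f"{gene}: {expansion(gene):.2f}-fold")
```

Under this assumption, COIII and A6 roughly double in length (about 2.1-fold and 2.0-fold), while ND7 grows about 1.6-fold.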

The kinetoplast DNA (kDNA) consists of two types of DNA molecules, maxicircles and 

minicircles. Maxicircles are large circular DNA molecules that contain the genes for two 

ribosomal RNAs, 12S and 9S, and the protein coding genes [5]. While some of the protein-

coding genes do not require RNA editing prior to translation, most require extensive editing 

 


before they can be translated [6,7]. The sequence changes are guided by small complementary 

RNA molecules (the gRNAs) that are encoded on the minicircles [8]. Minicircles make up the 

bulk of the kinetoplast network, with each minicircle encoding 1–5 gRNAs. 

This effectively means that the genetic information for the edited mitochondrial mRNAs 

is dispersed between the mRNA cryptogenes on the maxicircles and as many as 10,000 gRNA 

encoding minicircles. In T. brucei, the extensive editing of a single transcript can require more 

than 40 gRNAs and hundreds of editing events [9]. The gRNAs act as templates for the large 

multi-subunit protein complex known as the editosome [4,6]. The editosome cleaves the 

mRNA, inserts or deletes the correct number of uridines and then re-ligates the mRNA in an 

energy-intensive process. This is repeated until the mRNA is complementary to the small gRNA. 

The initiating gRNA interacts with the 3' end of the pre-edited transcript and generates the 

anchor binding region for the next gRNA.  In fact, all subsequent gRNAs anchor to the edited 

sequence created by the preceding gRNA. Editing proceeds from the 3' end to the 5' end of the 

mRNA transcript with the terminating gRNA either creating the start codon or bringing an 

existing start codon into frame. Because each gRNA directs editing that generates the anchor 

region for the next gRNA, the RNA editing process is sequentially dependent on correct editing 

by each gRNA. As a result, the process is incredibly fragile. 
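
The sequential dependence described above can be sketched with a toy model in Python. This is a deliberately simplified illustration, not a description of the editosome: each guide is written directly as the mRNA sequence it recognizes and produces (rather than as its complement), G:U wobble pairing is ignored, inserted uridines are shown in lowercase, and the sequences, the GuideRNA type and the function names are all invented for the example.

```python
from typing import List, NamedTuple, Optional, Tuple


class GuideRNA(NamedTuple):
    anchor: str      # mRNA sequence the guide must pair with (written as mRNA)
    pre_edited: str  # block immediately 5' of the anchor, before editing
    edited: str      # the same block after U insertion (lowercase u) and deletion


def apply_guide(mrna: str, guide: GuideRNA) -> Optional[str]:
    """Edit the block 5' of the anchor, or return None if the guide cannot anchor."""
    site = guide.pre_edited + guide.anchor
    pos = mrna.rfind(site)  # search from the 3' end of the transcript
    if pos == -1:
        return None
    return mrna[:pos] + guide.edited + mrna[pos + len(guide.pre_edited):]


def edit_cascade(mrna: str, guides: List[GuideRNA]) -> Tuple[str, int]:
    """Apply guides in 3'-to-5' order; stop at the first guide that cannot anchor."""
    used = 0
    for guide in guides:
        edited = apply_guide(mrna, guide)
        if edited is None:
            break
        mrna, used = edited, used + 1
    return mrna, used


# Hypothetical pre-edited transcript (5' to 3') and three guides acting in order.
PRE_EDITED = "AGGGUAGCGACCAUG"
GUIDES = [
    GuideRNA(anchor="ACCAUG", pre_edited="GCG", edited="GuuCuG"),
    GuideRNA(anchor="GuuCuG", pre_edited="GGUA", edited="uGGA"),
    GuideRNA(anchor="uGGA", pre_edited="AG", edited="AuuG"),
]
# A variant first guide that inserts one fewer uridine.
MUTANT = [GUIDES[0]._replace(edited="GuCuG")] + GUIDES[1:]

if __name__ == "__main__":
    print(edit_cascade(PRE_EDITED, GUIDES))
    print(edit_cascade(PRE_EDITED, MUTANT))
```

In this toy model, all three guides act on the original transcript, whereas the variant first guide produces a sequence that the second guide can no longer anchor to, so the cascade stalls after a single guide.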

Trypanosoma brucei 

Trypanosoma brucei is the causative agent of Human African trypanosomiasis (HAT) and 

one agent of Animal African trypanosomiasis (AAT).  Each year, 10,000 new cases of HAT are 

reported, and 3 million cattle are killed, severely impacting the lives and livelihood of those in 

infected areas [10,11].  The trypanosomes live in two distinct environments: the animal host 

 


and the insect vector, the tsetse fly. These environments are distinct in temperature and 

nutrient composition, providing a challenge to T. brucei as it cycles between hosts. While in the 

mammalian host, T. brucei lives entirely extracellularly in the bloodstream. It is frequently 

subject to attacks by the host’s adaptive immune system, and the population evades these 

attacks through antigenic variation [12]. This part of the life cycle can be quite long, with the 

longest known infection lasting 29 years [13]. In the bloodstream, the bulk of the trypanosome 

population exists in the actively dividing slender form. The slender form is optimized to utilize 

its glucose rich environment, using glycolysis to generate energy [14,15]. During this stage of 

the life cycle, the mitochondrion is down-regulated, lacking both Krebs cycle enzymes and a 

functional electron transport chain (ETC) [16]. While the activity of the mitochondrion is 

relatively low during the bloodstream stage (BS), expression of the mitochondrial genome is still 

essential [17,18]. Once the population reaches an optimum density, a small portion of the 

population transitions into stumpy form trypanosomes.  The stumpy form is nondividing and 

appears to be transitional, activating mitochondrial genes in preparation for uptake in a blood 

meal by its tsetse fly vector and subsequent transfer to a harsher environment [14]. Successful 

transition to the fly vector requires activation of the ETC, and ATP synthesis via oxidative 

phosphorylation. Once inside the tsetse fly, the parasite utilizes proline as its primary energy 

source while residing and actively dividing in the midgut [19–21]. This stage of the life cycle is 

followed by a dramatic bottleneck when the trypanosomes transition from the midgut to the 

salivary glands of the tsetse fly, with as few as 1-5 trypanosomes completing the transition 

[22,23]. From the salivary glands, trypanosomes are then refluxed into their next mammalian 

 


host during a bloodmeal. In order to adapt to these sudden changes in environment, T. brucei 

must vastly alter its gene expression, most notably in its mitochondrion.   

The 22 kb maxicircle of T. brucei encodes several genes involved in the mitochondrial 

ETC and oxidative phosphorylation: NADH dehydrogenase (ND) subunits 1-5 and 7-9, 

cytochrome oxidase (CO) subunits I-III, cytochrome b (CYb), ATP synthase subunit 6 (A6), as well 

as genes encoding the ribosomal protein small subunit 12 (RPS12), 12S and 9S rRNAs, and some 

genes with unknown functions: C-rich regions (CR) 3 and 4, and Maxicircle unidentified reading 

frames (Murf) 2 and 5 [5]. Twelve of these genes require some amount of RNA editing to be 

translatable, with some requiring only one or two gRNAs (COII, CYb, Murf 2), and others 

requiring editing across the span of the transcript (ND3, ND7, ND8, ND9, COIII, A6, RPS12, CR3, 

and CR4) [4,6,7].  

Distinct differences in mitochondrial transcript abundance, polyadenylation and the 

extent of RNA editing are observed during the complex life cycle (Table 1). The pattern of 

differential RNA editing observed is especially interesting. For example, the CYb and COII 

mRNAs are edited during the insect stage, but are primarily unedited in bloodstream forms 

[24,25]. In contrast, editing of the NADH dehydrogenase subunit transcripts (ND3, ND7, ND8 

and ND9) and RPS12 appears to occur preferentially in bloodstream forms [5,26–30]. Other 

transcripts, COIII and A6, are edited equally in both life cycle stages [31,32].  

Aside from the genes encoded on the maxicircle, there are the minicircle gRNAs.  Each 

network contains 5,000–10,000 minicircles of ~1 kb in size, with each minicircle 

encoding 2–5 gRNAs. In T. brucei, there are more than 200 different 

minicircle sequence classes (~1200 gRNAs) [8,33]. While the minicircles make up the bulk of the

 


Table 1. Differences in mitochondrial transcript abundance, polyadenylation and the extent of RNA editing in two life cycle stages 
of T. brucei. 

Gene    No. of uridines       Edited     Stage             Relative level of mature RNA           No. of PC         References
        Added        Deleted  size (nt)  edited            Long slender  Short stumpy  Procyclic  major classes (g)
12S     2-17 (tail)  0        1,149      N.D.              0.04          1.3           1.0        N.A. (j)          [24,34,35]
9S      7 (tail)     0        611        N.D.              0.07          1.4           1.0        N.A.              [24,34,35]
CYb     34           0        1,151      P (a)             ~0            0.5           1.0        11                [24,25,36,37]
A6      448          28       821        P/BS (b)          1.0           N.D. (f)      1.0        81                [31,36]
COI     0            0        1,647      NE (c)            0.07          0.4           1.0        N.A.              [24,37,38]
COII    4            0        663        P                 ~0            0.5           1.0        N.A. (k)          [24,37,39]
COIII   547          41       969        P/BS              1.0           N.D.          1.0        151               [32]
ND1     0            0        960        NE                ~1            N.D.          ~1         N.A.              [38,40,41]
ND2     0            0        1,343      NE                >1.0          N.D.          1.0        N.A.              [37,42,43]
ND3     210          13       452        P/BS              >1.0          N.D.          1.0        34                [26]
ND4     0            0        1,314      NE                ~1.0          N.D.          ~1.0       N.A.              [37,38,44,45]
ND5     0            0        1,779      NE                0.5           0.8           1.0        N.A.              [24,38]
ND7     553          89       1,238      5'P/BS, 3'BS (d)  ~10           N.D.          1.0        129               [5,27]
ND8     259          46       574        BS (e)            ~20           N.D.          1.0        70                [5,28,37]
ND9     345          20       649        BS                >1.0          N.D.          1.0        39                [29]
RPS12   132          28       325        BS                >1.0          N.D.          1.0        50                [30]
Murf 2  26           4        1,111      P/BS              ~1            ~1            ~1         1                 [40,46]
Murf 5  N.D.         N.D.     N.D.       N.D.              N.D.          N.D.          N.D.       N.A.              N.D.
CR3     148          13       299        BS                >1.0          N.D.          1.0        37                [47]
CR4     325          40       567        BS                1.0           N.D.          ~0         41                [48]

PolyA tail length (Bloodstream, Procyclic)

N.A.
N.A.

N.A.
N.A.

Short (UE(h) & E(i))

Short (E) & Long (E)

Short (E) & Long (E)
Short (E) & Long (E)

Short

Short

Short (E)

Short & Long
Short & Long

Short (E)

Short & Long

N.D.

Short(E)

Short (E) & Long (E)
Short (PE) & Long (E)

Short & Long

Short (UE) & Long

(E)

Short(E) & Long(E)

Short & Long
Short & Long

Short (E)

Short & Long

N.D.

Short(UE)

Short (E)
Short (PE)

Short(UE & E) &

Short(UE & E) &

Long (E)

Short & Long

Long (E)

Short & Long

N.D.
N.D.

N.D.
N.D.

Short (E) & Long (E)

Short (UE)

(a) P, transcript is edited only in the procyclic (insect) developmental stage.
(b) P/BS, transcript is edited in both bloodstream and procyclic stages.
(c) NE, never edited; editing of these transcripts has not been reported.
(d) The ND7 transcript is differentially edited in the procyclic and bloodstream stages.
(e) BS, these transcripts are only fully edited in the bloodstream developmental stage [49].
(f) N.D., values have not yet been determined.
(g) All data come from the previously published EATRO 164 procyclic gRNA transcriptome [9].
(h) UE, unedited; the transcripts that carried these tails were typically unedited.
(i) E, edited; the transcripts carrying these tails were typically edited.
jN.A., Not applicable. 
kCOII is a cis-edited transcript. Poly-A tails listed as short are between 10 and 50 nts 
long and tails listed as long are between 150 and 200 nts long.

 

5 

mitochondrial DNA, early studies using both Northern blot and primer extension analyses on a 

limited number of gRNAs indicate that gRNAs are present in both insect and bloodstream 

forms, suggesting that the regulation of RNA editing is not at the level of gRNA availability 

[28,50,51]. 

Trypanosoma vivax 

Like T. brucei, Trypanosoma vivax is a causative agent of AAT.  Its kinetoplast DNA is also 

very similar.  The maxicircles in T. vivax possess the same genes as the maxicircle of T. brucei, 

and the genes that are edited in T. brucei are also edited in T. vivax to the same extent [52].  

The minicircles of T. vivax vary significantly in size, from 300-1100 bp, encoding 1-3 gRNAs [52].  

The life cycle of T. vivax is highly similar to that of T. brucei.  In its mammalian host, it lives 

extracellularly in the bloodstream, primarily metabolizing glucose.  When it is taken up by the 

tsetse fly in a bloodmeal, it initially resides in the proventriculus and foregut.  From there, cells 

migrate to the proboscis and propagate, preparing to be deposited into the next mammalian 

host [53].   

Trypanosoma cruzi 

T. cruzi is known for causing Chagas disease in South and Central America and is carried 

between hosts by triatomine bugs.  Its kDNA is highly similar to that of T. brucei and T. vivax, with

the maxicircle containing the same genes in the same order.  The edited genes of T. brucei and 

T. vivax are also edited in T. cruzi to the same extent [54].  Like T. brucei, T. cruzi spends most of 

its insect stage in the nutrient-depleted midgut of its host, metabolizing amino acids for survival

[55,56].  Once cells have propagated, they migrate to the hindgut and are excreted by the insect.

 

Transmission to the mammalian host occurs by contact with a wound or mucous membrane.  

Once inside the mammalian host, T. cruzi invades many different types of nucleated cells by 

using the microtubule cytoskeleton of the host cell to recruit lysosomes to create vacuolar 

compartments where T. cruzi resides [57].  Once inside the vacuole, T. cruzi metabolizes glucose 

as its primary energy source [58,59].   

Leishmania spp. 

Leishmania spp. are found on almost every continent in the world, and infect 700,000 to 

1.2 million people annually [60].  Leishmania spp. are transmitted by the phlebotomine sand fly.  

Unlike the trypanosomes, while in the midgut of the sand fly, Leishmania spp. primarily 

metabolize glucose, because of the fly’s frequent sap meals.  Leishmania spp. are transmitted 

to their mammalian hosts by the bite of the sand fly.  Once inside the host, they are 

phagocytosed by host cells.  Inside macrophages, they replicate within lysosome-like

compartments, and it is believed that these compartments are not nutrient restrictive [61,62].

The maxicircle of the Leishmania spp. parasites has the same genes as those of the trypanosomes, but

their editing patterns vary significantly.  The pan-edited genes in Leishmania spp. are ND3, ND8,

ND9, RPS12, CR3, and CR4, and the partially edited genes are A6, COII, COIII, MurfII, and ND7 

[63].   

Phytomonas spp. 

Phytomonas spp. parasitize plants, utilizing their sucrose and polysaccharides as energy 

sources.  Their insect vectors maintain a sap-rich diet that allows the parasites to

continually metabolize carbohydrates, unlike the Trypanosoma spp.  Possibly as a result of this, 

 

several metabolic pathways are incomplete.  The pathways for beta-oxidation of fatty acids and

oxidation of amino acids are missing key enzymes, but the pathways for the synthesis of these

metabolites are more complete [64].  The mitochondrial ETC is also affected.  Genes for

all cytochromes are missing from both the nuclear DNA and the kDNA, and the cytochrome oxidase genes

normally present on the maxicircle are also missing [64,65].  The other maxicircle genes are 

present, and ND3, ND8, ND9, RPS12, CR3 and CR4 are pan-edited, while ND7, A6 and MurfII are 

only partially edited, as with Leishmania spp.   

Procyclic gRNA Transcriptome 

The gRNA transcriptome of insect stage (procyclic) T. brucei was previously sequenced 

[50].  This library was generated from the EATRO 164 cell line, grown in SDM79 medium, the 

most commonly used medium when culturing procyclic trypanosomes.  As no reference 

genome exists for the minicircles, gRNAs could only be identified based on their function.  Using 

a longest common substring algorithm, gRNAs were identified based on their complementarity 

to previously determined fully edited mRNA sequences.  The RNA editing system tolerates G:U 

base pairs, so this program allows these base pairs in alignments, but these base pairs do not 

contribute to the overall alignment score as much as canonical Watson-Crick base pairs (1 point

for G:U and 2 points for Watson-Crick pairs).  Guide RNAs with scores higher than 45 points were

identified as editing gRNAs.  Because trypanosomes post-transcriptionally add a

poly-uridine (poly-U) tail to gRNAs, the sequences generated in this transcriptome possess

the poly-U tail as well [66].  This program ignores the transcript’s poly-U tail in the alignment, 

and it does not contribute to the score.  Using this program, full complements of gRNAs were 

found for A6, COIII, CR4, CYb, and RPS12, and near full complements were identified for the 

 

other edited genes.  This study found that multiple different sequence classes of gRNAs (major 

classes) edited the same region of an mRNA (this group of gRNAs is called a population).  Major 

classes within a population had many transition mutations, primarily A-G mutations, which

appeared to be due to the editing system’s tolerance of G:U base pairs.  Interestingly,

populations of gRNAs varied widely in transcript abundance, ranging from

<10 to >350,000 reads.

The gRNAs identified in this study possessed common characteristics.  64% of transcripts 

had 38-48 nucleotides (nt) of complementarity to their target mRNA.  84% of transcripts had 6

or fewer non-complementary nts at the 5’ end, and most transcripts had no non-complementary

nts at the 3’ end prior to the poly-U tail.  Conservation was observed in the gRNA transcription start

site, with 74% of transcripts starting with 5’-ATATA-3’.  Interestingly, a large proportion of 

transcripts had 5’-AAAAA-3’ transcription start sites as well.   
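The scoring rules used to identify these gRNAs (two points per Watson-Crick pair, one point per G:U pair, poly-U tail ignored) can be illustrated with a minimal Python sketch. It is not the program used in the study; the function names, the tail-trimming rule (a 3' run of at least four Us) and the toy sequences are assumptions made purely for illustration. The gRNA is written 5' to 3' and is reversed so that it pairs antiparallel with the mRNA.

```python
# Illustrative sketch only; names, the tail rule and the sequences are assumptions.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def trim_poly_u_tail(grna: str, min_tail: int = 4) -> str:
    """Remove a 3' run of Us if it is at least min_tail residues long.

    Simplification: guiding Us that directly precede the tail are removed too.
    """
    body = grna.rstrip("U")
    return body if len(grna) - len(body) >= min_tail else grna


def duplex_score(grna: str, mrna_window: str) -> int:
    """Score an ungapped duplex: 2 per Watson-Crick pair, 1 per G:U pair.

    grna is written 5'->3' (tail already removed); mrna_window is 5'->3'.
    Scoring stops at the first position that cannot pair.
    """
    score = 0
    for g, m in zip(reversed(grna), mrna_window):
        if (g, m) in WATSON_CRICK:
            score += 2
        elif (g, m) in WOBBLE:
            score += 1
        else:
            break
    return score


mrna_window = "GUUUAC"                         # hypothetical edited mRNA segment
canonical = trim_poly_u_tail("GUAAACUUUUUU")   # all Watson-Crick pairs
transition = trim_poly_u_tail("GUAAGCUUUUUU")  # A-to-G transition, pairs as G:U
print(duplex_score(canonical, mrna_window), duplex_score(transition, mrna_window))
```

In this toy duplex the A-to-G transition lowers the score from 12 to 11 without breaking the pairing, which is the behavior that allows several major classes to guide the same edited sequence.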

Beyond the identification of gRNAs directing the conventional mRNA edits, this study 

identified a number of gRNAs that could generate alternative edits.  Most of these edits caused 

minor changes to the predicted mRNA and protein sequences, by either changing a single 

amino acid (ND8) or changing no amino acids at all (A6).  However, some gRNAs were identified

that dramatically altered the mRNA or protein sequence.  One edit in the essential A6 gene caused a

frameshift that would alter and shorten the C-terminus of the protein [17].  Another generated 

a dramatically different sequence at the 3’ end of CR3, and no gRNA was identified capable of 

editing that sequence.  Interestingly, another study identified an alternative edit in COIII that

linked an open reading frame in the unedited 5’ end of the transcript with the reading frame in 

 

the edited 3’ end of the transcript [67,68].  The gRNA required to generate this alternative edit 

was not identified in the procyclic gRNA transcriptome of T. brucei. 

Evolution and retention of RNA editing in kinetoplastids 

The kinetoplastid RNA editing system is energetically expensive and, due to the system’s 

sequential dependence, should be highly fragile.  Because errors compound at every step, even with

a high accuracy rate for each gRNA the overall fidelity of the process is astonishingly low. Even a single point

mutation could drastically change the editing pattern and stop the editing process, aborting

expression of the protein. A major question in the field has been why this fragile and 

metabolically expensive system of RNA editing would evolve and persist. 
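The compounding effect of sequential dependence can be made concrete with a short calculation. The sketch below assumes, as a simplification that is not taken from the data, that each gRNA-directed step succeeds independently with the same probability p, so that a transcript requiring n gRNAs is fully edited with probability p^n. The accuracy of 0.95 and the gRNA counts are hypothetical values chosen only to show the trend.

```python
def overall_fidelity(per_step_accuracy: float, n_steps: int) -> float:
    """Probability that n sequential, independent editing steps all succeed."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must lie between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return per_step_accuracy ** n_steps


# Hypothetical values: 95% accuracy per gRNA, transcripts needing 2, 10 or 40 gRNAs.
for n in (2, 10, 40):
    print(n, round(overall_fidelity(0.95, n), 3))
```

Under these assumed values, the probability of completing every step falls to roughly 0.6 after 10 gRNAs and below 0.15 after 40, which illustrates why extensively edited transcripts are expected to be especially sensitive to errors.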

Initially, it was proposed that U-insertion/deletion editing (kRNA editing) was one of 

many RNA editing processes that were in fact relics of the RNA world. However, the very 

different mechanisms of the RNA editing systems in existence and their very limited distribution 

within specific groups of organisms indicate that they are more likely derived traits that evolved 

later in evolution [69,70]. The sheer complexity of the kRNA editing process, with no obvious 

selective advantage, led to the proposal that insertion/deletion editing arose via a constructive 

neutral evolution (CNE) pathway [71]. RNA editing in trypanosomes is frequently cited in

support of CNE as an example of how seemingly non-advantageous, complex processes can 

arise [72,73]. More recently however, it has been hypothesized that RNA editing co-evolved 

with G-quadruplex structures found in the pre-edited mRNAs [74]. These structures can help 

regulate transcription in order to promote DNA replication and prevent kDNA loss, and thus 

provide an advantage to the organism. However, they must be removed by the RNA editing 

system prior to translation [74]. Another prominent hypothesis is that RNA editing is 

 

advantageous because it is a mechanism by which an organism can fragment and scatter 

essential genetic information throughout a genome [75,76]. Kinetoplast DNA is far less stable 

than chromosomal DNA, and loss of minicircles due to asymmetric division of the kDNA

network has been frequently observed, particularly in laboratory cultures of Leishmania

tarentolae [77,78]. Buhrman et al. [76] suggest that the scattering of essential gRNA genes 

throughout the DNA network would prevent fast growing deletion mutants from outcompeting 

more metabolically versatile parasites during growth in the mammalian host. Using a 

mathematical model of gene fragmentation in changing environments (absence of functional 

selection), they showed a distinct advantage for gene fragmentation. In their model, the 

number of tolerable generations under periods of relaxed selective pressure was increased by 

more than 40% before loss of the ability to move to the next life cycle stage. 

One mechanism for protecting small asexual populations is to increase the severity of

the mutations that can occur. If mutations severely impact fitness, deleterious mutations are 

selected out, preventing their fixation [79].  This phenomenon increases the ‘drift robustness’ 

of a population.  One study modeled the acquisition of drift robustness mathematically and 

computationally [80].  This study showed that in a simulated environment, small populations 

evolved a lower fitness than large populations, but when the most common genotypes from 

these populations were placed in a scenario with extremely high genetic drift, the genotypes 

evolved from small populations experienced a smaller decline in fitness than the genotypes 

evolved in a large population.  Furthermore, they examined the types of mutations in the 

fitness landscape nearest to the peaks that each simulated population had fixed on and found 

that in smaller populations, there was an excess of mutations possible that were neutral, 

 

beneficial or strongly deleterious, whereas in larger populations, there were more small-effect 

deleterious mutations possible.  As the RNA editing process may be operating as a proof-

reading system to weed out mutations by making them lethal, these findings strongly support 

the hypothesis that RNA editing is beneficial to the trypanosomes by providing a level of drift 

robustness to the population as a whole.   

While these hypotheses do address the evolution and retention of the RNA editing 

system itself, they do not address another key issue with the system as a whole: maintaining 

genetic material used in RNA editing while that material is not under selection.  During the life 

cycle of T. brucei, trypanosomes undergo a severe bottleneck as they transition through the 

tsetse fly and into the mammalian host, and then within the bloodstream, they undergo 

multiple bottlenecks at each antigenic switch, as they evade the host immune system [22]. Such 

bottlenecks create additional forces of genetic drift, where genes can be lost even if their 

deleterious fitness effect is considerable.  This life cycle should make T. brucei particularly 

sensitive to genetic drift, especially for those genes which are not under selection (Krebs cycle 

and ETC) and should make them extremely vulnerable to Muller’s ratchet (the gradual increase 

of mutational load that eventually leads to extinction) [81–84]. 

During a reexamination of the EATRO 164 procyclic gRNA transcriptome, a number of 

gRNAs were identified that are capable of shifting the open reading frame of their respective

transcripts.  These gRNAs acted at the 5’ end of edited mRNAs and either shifted the position of 

an existing start codon or generated a new start codon that would allow that transcript to be 

translated in an alternative reading frame.  Surprisingly, the alternative reading frames spanned 

the full or nearly full length of their transcripts, suggesting that these transcripts were capable 

 

of generating two distinctly different protein products.  Based on these observations, we 

hypothesize that trypanosomes use dual-coding genes to protect genetic information by 

essentially hiding a gene not under selection (i.e. ETC genes) within one that remains under 

selection. Thus, the ability to access overlapping reading frames may be one explanation for 

how genetic material that is unused in one life cycle stage may be preserved while it is not 

under selection.   
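The idea of a transcript that can be read in more than one frame can be sketched in a few lines of Python. The function below scans the three forward frames of an mRNA and reports the longest AUG-initiated open reading frame in each. It is an illustration only: the function name and the toy sequence are hypothetical, and the stop-codon set assumes that UGA is read as tryptophan rather than as a stop, as in the kinetoplastid mitochondrial genetic code.

```python
# Illustrative sketch; the stop-codon set and the toy sequence are assumptions.
STOP_CODONS = {"UAA", "UAG"}  # assumes UGA encodes Trp in the mitochondrion


def longest_orf_per_frame(mrna: str) -> dict[int, str]:
    """Return the longest AUG-initiated ORF (stop codon excluded) in frames 0-2.

    An ORF that reaches the end of the sequence without a stop is kept as well.
    """
    orfs = {}
    for frame in range(3):
        best, current = "", None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if current is None:
                if codon == "AUG":
                    current = codon          # open a new reading frame
            elif codon in STOP_CODONS:
                best = max(best, current, key=len)
                current = None               # close the frame at the stop
            else:
                current += codon
        if current is not None:              # ORF runs off the 3' end
            best = max(best, current, key=len)
        orfs[frame] = best
    return orfs


toy_mrna = "AUGAAAUGCCCUAAGUAG"  # hypothetical sequence with two usable frames
print(longest_orf_per_frame(toy_mrna))
```

For the toy sequence, frame 0 and frame 2 each contain an AUG-initiated reading frame while frame 1 contains none; an edit near the 5' end that moves the start codon by one or two nucleotides would therefore change which protein is encoded.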

Dual-coding and dual-function genes 

Dual-coding genes are defined as stretches of DNA containing overlapping open reading

frames (ORFs) [85,86]. Overlapping reading frames are common in viruses and are thought to 

persist due to strong genome size constraints [87,88]. More recently however, overlapping 

genes have been identified in mammalian and bacterial genomes [89–92]. In these organisms, 

size is not an issue and the potential advantage of overlapping genes is less clear. Maintaining 

dual-coding genes is costly, as it constrains the flexibility of the amino acid composition of both 

proteins, constraining the ability of each protein to become optimally adapted [93]. As this 

constraint can be alleviated by gene duplication, it is thought that dual-coding regions can 

survive long evolutionary spans only if the overlap provides a selective advantage. In mammals, 

many of the identified dual-coding genes produce two proteins that bind and regulate each 

other [94,95]. For these proteins, dual-coding may be advantageous for the tight co-expression 

needed. An alternative model suggests that under high mutation rates, the overlapping of 

critical nucleotide residues is advantageous because it may reduce the target size for lethal 

mutations [96]. 

 

The use of genetic information with more than one function is not a new idea in T. 

brucei.  The nuclear encoded α-ketoglutarate dehydrogenase E2 (α-KDE2) is known to be a 

dual-function protein, in that it plays important roles in both the Krebs cycle and in 

mitochondrial DNA inheritance [97]. RNAi knockdowns of this gene in bloodstream form (BF) 

trypanosomes also show a pronounced reduction in cell growth. Similarly, the Krebs cycle 

enzyme α-ketoglutarate decarboxylase (α-KDE1) is a dual-function protein with overlapping 

targeting signals that allow it to be localized to both the mitochondrion and glycosomes [98]. 

RNAi knockdown of α-KDE1 in BF trypanosomes is lethal, suggesting that, in addition to its

enzymatic role in the Krebs cycle, it plays an essential role in glycosomal function in T. brucei 

[98].  

Another example of this was identified in the RNA editing system.  Alternative editing of 

COIII is reported to generate a novel DNA-binding protein, Alternatively Edited Protein-1 (AEP-

1), that functions in mitochondrial DNA maintenance [67,68]. In this transcript, one alternative 

gRNA generates sequence changes at two sites that link an open reading frame (ORF) found in

the pre-edited 5' end, to the 3' transmembrane domains found in the COIII edited ORF. This was 

the first indication that one cryptogene could contain information for more than one protein.  It 

has been previously suggested that both alternative editing and dual-function proteins are 

important mechanisms for expanding the functional diversity of proteins found in 

trypanosomes [67,97–99]. We hypothesize that because trypanosomes live exclusively 

extracellularly in their mammalian host, they are more sensitive to genetic drift, and an equally 

important role for these dual-coding/function genes may be the protection of genetic 

information. 

 

Project Summary 

The goal of this work is to examine the impact of alternative editing on the protein 

diversity, editing efficiency, developmental regulation, and genetic integrity of Trypanosoma 

brucei.  This investigation began with the generation of the gRNA transcriptome of bloodstream 

form EATRO 164 T. brucei.  This analysis identified near full complements of gRNAs for the 

edited genes, as was discovered in the procyclic transcriptome.  A detailed comparison of the 

gRNAs identified in both datasets revealed conserved characteristics, such as anchor length,

length of complementarity, and transcription start site sequences, even though very few 

identical sequences existed between the two transcriptomes.  Additionally, an interesting 

correlation was found that suggests a relationship between the relative abundance of initiating 

gRNAs between stages and the developmental pattern of mRNA editing. 

During this comparison of the two transcriptomes, a number of alternative editing 

gRNAs were identified.  Notably, three of these gRNAs were capable of shifting translation of 

ND7, RPS12, and CR3 into alternative reading frames.  This discovery prompted the analysis of 

the mitochondrially encoded transcripts to determine which of the genes had the capacity to be 

dual-coding.  Using mutational bias analysis, we show that as many as six cryptogenes, in

addition to the previously discovered COIII/AEP-1, encode more than one protein, and that RNA 

editing allows access to both reading frames.  

In order to determine if mRNA transcripts with access to multiple open reading frames 

exist within the mitochondrial transcriptome, we deep sequenced the transcript populations of 

three putative dual-coding genes: RPS12, the 5’ editing domain of ND7 (ND7 5’), and CR3. Using

the previously generated gRNA transcriptomes, we constructed detailed editing pathways for 

 

each of these genes. We found evidence that CR3 and ND7 5’ are dual-coding genes, based on 

the identification of transcripts that would translate into different reading frames. This study 

indicates that RNA editing can be used to access multiple open reading frames using two 

different methods: in ND7 5’, different gRNAs bring alternate start codons into frame and in 

CR3, different gRNAs can shift the reading frame of the existing start codon.  In addition, CR3 

showed incredible editing diversity. In two different cell lines, highly divergent editing patterns 

were characterized, with the two cell lines using different sets of gRNAs to edit the CR3 

cryptogene. This suggests that the use of a gRNA-guided editing system can also dramatically 

increase protein diversity in spite of an incredibly rigid and mutationally fragile system.  

With a more complete understanding of the existing edits found in the mRNA 

transcriptomes of RPS12, ND7 5’ and CR3, we used this knowledge to analyze the RNA editing 

system’s ability to tolerate noise.  Reexamining the procyclic gRNA transcriptome revealed 

many previously unidentified gRNAs, potentially capable of generating alternative edits or 

disrupting the editing system.  Using a new program called ACORNS (Assemble Clusters Of 

Related Nucleotide Sequence), the gRNAs were grouped into clusters based on sequence 

homology.  This allowed us to determine which unidentified gRNAs were related to previously 

identified gRNAs.  This analysis showed that more than half of the unidentified gRNAs were not 

related to any gRNA of known function, suggesting that many more alternative edits are waiting 

to be discovered.  In order to analyze the impact of the gRNAs that were related to previously 

identified gRNAs, another new program, GUIDE (gRNA Uridine Insertion/Deletion Editor), was 

created.  This program is able to analyze the functionality of gRNA clusters generated by 

ACORNS by simulating the editing process.  Combining these data with the mRNA transcriptomes

 

previously generated, we conducted a detailed analysis of each population of gRNAs capable of 

editing these three genes.  This analysis revealed a surprisingly high tolerance for mismatches 

and gaps in mRNA/gRNA alignments in the editing system, most notably in the editing of the 

essential RPS12 [100].  

This project found that not only is alternative editing present in T. brucei, but that it is 

pervasive, and the system, as a whole, is surprisingly robust.  We propose the hypothesis that 

the RNA editing system does in fact promote the genetic robustness of T. brucei through the 

facilitation of dual-coding genes, as well as the introduction of alternative edits that increase 

protein diversity and allow the editing system to continue to evolve.   

 

 

 

CHAPTER 2: ANALYSIS OF THE TRYPANOSOMA BRUCEI EATRO 164 BLOODSTREAM GUIDE RNA TRANSCRIPTOME

Abstract 

The mitochondrial genome of Trypanosoma brucei contains many cryptogenes that 

must be extensively edited following transcription. The RNA editing process is directed by guide 

RNAs (gRNAs) that encode the information for the specific insertion and deletion of uridylates 

required to generate translatable mRNAs. We have deep sequenced the gRNA transcriptome 

from the bloodstream form of the EATRO 164 cell line. Using conventionally accepted fully 

edited mRNA sequences, ~1 million gRNAs were identified. In contrast, over 3 million reads 

were identified in our insect stage gRNA transcriptome. A comparison of the two life cycle 

transcriptomes shows an overall ratio of procyclic to bloodstream gRNA reads of 3.5:1. This ratio

varies significantly by gene and by gRNA populations within genes. The variation in the 

abundance of the initiating gRNAs for each gene, however, displays a trend that correlates with 

the developmental pattern of edited gene expression. A comparison of related major classes 

from each transcriptome revealed a median value of ten single nucleotide variations per gRNA. 

Nucleotide variations were much less likely to occur in the consecutive Watson-Crick anchor 

region, indicating a very strong bias against G:U base pairs in this region. This work indicates 

that gRNAs are expressed during both life cycle stages, and that differential editing patterns 

observed for the different mitochondrial mRNA transcripts are not due to the presence or 

absence of gRNAs. However, the abundance of certain gRNAs may be important in the 

developmental regulation of RNA editing. 

 

Author Summary 

Trypanosoma brucei is the causative agent of African sleeping sickness, a disease that 

threatens millions of people in sub-Saharan Africa. During its life cycle, Trypanosoma brucei 

lives in either its mammalian host or its insect vector. These environments are very different, 

and the transition between these environments is accompanied by changes in parasite energy 

metabolism, including distinct changes in mitochondrial gene expression. In trypanosomes, 

mitochondrial gene expression involves a unique RNA editing process, where U-residues are 

inserted or deleted to generate the mRNA’s protein code. The editing process is directed by a 

set of small RNAs called guide RNAs. Our lab has previously deep sequenced the gRNA 

transcriptome of the insect stage of T. brucei. In this paper, we present the gRNA transcriptome 

of the bloodstream stage. Our comparison of these two transcriptomes indicates that most 

gRNAs are present in both life cycle stages, even though utilization of the gRNAs differs greatly 

during the two life-cycle stages. These data provide unique insight into how RNA systems may 

allow for rapid adaptation to different environments and energy utilization requirements. 

Introduction 

The life cycle of Trypanosoma brucei involves two distinct environments, the animal 

host and the insect vector. These environments are distinct in temperature and nutrient 

composition, providing a unique challenge to T. brucei as it cycles between hosts. In the 

bloodstream, trypanosomes exist in two forms, the actively dividing slender form and the non-

dividing stumpy form. The slender form is optimized to utilize its glucose rich environment, 

using glycolysis to generate energy [14]. The stumpy form appears to be transitional, activating 

 

mitochondrial genes in preparation for uptake in a blood meal by its tsetse fly vector and 

subsequent transfer to a harsher environment [14]. Once inside the tsetse fly, the parasite 

utilizes proline to drive oxidative phosphorylation and ATP production in the mitochondrion 

[19]. While the activity of the mitochondrion is relatively low during the bloodstream stage (BS), 

expression of the mitochondrial genome is still essential [17,18]. In T. brucei, the mitochondrial 

genome consists of two types of DNA molecules, maxicircles and minicircles. Maxicircles are 

22 kb circular DNA molecules that contain the genes for two ribosomal RNAs, 12S and 9S, and eighteen

mRNA genes [5]. While some of the protein-coding genes do not require RNA editing prior to

translation, most require extensive editing before they can be translated [6,7]. This process

involves the insertion of hundreds of uridylates (Us) and, less frequently, the deletion of Us, often

doubling the size of the transcript. The sequence changes are guided by small complementary 

RNA molecules (the guide RNAs) that are encoded on the minicircles [8]. Minicircles make up 

the bulk of the kinetoplastid network (anywhere from 5,000–10,000 present in each network) 

with each minicircle encoding 3–5 gRNAs. In T. brucei, there are more than 200 different 

minicircle sequence classes (~1200 gRNAs) [8].  

Distinct differences in mitochondrial transcript abundance, polyadenylation and the 

extent of RNA editing are observed during the complex life cycle (Table 1). The pattern of 

differential RNA editing observed is especially interesting. For example, the cytochrome b (CYb) 

and cytochrome oxidase II (COII) mRNAs are edited during the insect stage, but are primarily 

unedited in bloodstream forms [24,25]. In contrast, editing of the NADH dehydrogenase 

subunit transcripts (ND3, ND7, ND8 and ND9) and editing of the ribosomal protein subunit 12 

transcript (RPS12) appears to occur preferentially in bloodstream forms [5,26–30]. Other 

 

transcripts, cytochrome oxidase III (COIII) and ATPase subunit 6 (A6), are edited in both life cycle

stages [31,32]. Early studies using both Northern blot and primer extension analyses on a 

limited number of gRNAs indicate that gRNAs are present in both insect and bloodstream 

forms, suggesting that the regulation of RNA editing is not at the level of gRNA availability 

[28,50,51]. Our lab has previously published deep sequencing results of the gRNA 

transcriptome of the T. brucei EATRO 164 procyclic form [9]. Here we present the deep 

sequencing data for the gRNA transcriptome of a bloodstream form of EATRO 164. A total of 

211 populations of gRNAs were identified. We define a population as a group of gRNAs that 

may vary in sequence, but direct the editing of the same or near same region of the mRNA. 

Because kinetoplastid RNA editing allows G:U base pairing, most populations contain multiple 

sequence classes that can guide the generation of the same mRNA sequence. While the 

number of populations identified was similar to the number identified in the procyclic gRNA 

transcriptome (214 populations), the total number of gRNAs identified was much reduced and 

the coverage was less complete; full complements of gRNAs were only identified for COIII and 

CYb. In spite of the reduced number of gRNAs, an interesting correlation was found that 

suggests a relationship between the relative abundance of initiating gRNAs between stages and 

the developmental pattern of mRNA editing. 

Materials and Methods 

Parasites, isolation of mitochondria and RNA extraction 

T. brucei brucei clone IsTar from stock EATRO 164 was grown in rats and isolated as

previously described [101]. Bloodstream forms were virtually all long-slender forms isolated 

after 4 days of infection. Parasites were used immediately for isolation of mitochondria using 

 

differential centrifugation as previously described or stored frozen at -80°C until RNA extraction 

[9]. Both total RNA from whole parasites and mitochondrial RNA (mtRNA) from purified 

mitochondria were isolated by the acid guanidinium-phenol-chloroform method [102]. 

Ethics statement 

Rats were raised according to the animal husbandry guidelines established by Michigan 

State University. All vertebrate animal use procedures were approved by MSU’s Institutional 

Animal Care and Use Committee (Application 03/11-051-00). MSU has filed with the Office of 

Laboratory Animal Welfare (OLAW) an assurance document that commits the university to 

compliance with NIH policy and the Guide for the Care and Use of Laboratory Animals. 

Library preparation and Illumina sequencing 

Samples of mtRNA and total RNA were both treated with DNase RQ1 and size fractionated 

on a polyacrylamide gel as previously described [20]. Guide RNAs were extracted from the gel 

and prepped for sequencing using the Illumina ‘Small RNA’ protocol as previously described [9]. 

Libraries from both mtRNA and total RNA samples were deep sequenced on Illumina GAIIx. 

Reads were then processed and trimmed as previously described [9]. Reads with two or more Ns, 

shorter than 20 nts after trimming, or with an overall mean Q-score < 25 were discarded. 

Redundant reads were then collapsed while retaining the count of each redundant read, and 

reads containing fewer than 4 consecutive Ts were removed. 
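
The filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: the function names, the input layout (a sequence plus a list of Phred scores per read), and the assumption that the run of four or more Ts may occur anywhere in the read are all hypothetical choices made for the example.

```python
import re
from collections import Counter

def keep_read(seq, quals, min_len=20, min_mean_q=25, max_n=1):
    """Return True if a trimmed read passes the three quality filters."""
    if seq.count("N") > max_n:                 # two or more Ns: discard
        return False
    if len(seq) < min_len:                     # shorter than 20 nts: discard
        return False
    if sum(quals) / len(quals) < min_mean_q:   # mean Q-score below 25: discard
        return False
    return True

def collapse_and_select(reads):
    """Collapse redundant reads and keep those with a run of >= 4 Ts.

    reads: iterable of (sequence, list_of_phred_scores) pairs, where each
    score list has the same length as its sequence (assumed input layout).
    Returns {unique_sequence: number_of_passing_copies}.
    """
    counts = Counter(seq for seq, quals in reads if keep_read(seq, quals))
    # Assumption: the T run (the U-tail in the cDNA read) may sit anywhere.
    return {seq: n for seq, n in counts.items() if re.search("T{4,}", seq)}
```

Collapsing before the T-run test means the pattern search runs once per unique sequence rather than once per read.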

Identification of gRNAs 

To identify gRNAs, each transcript read was aligned to the conventionally edited mRNAs 

based on known base pairing rules (canonical Watson-Crick base pairs and the G-U base pair). 

In the initial screen, no gaps were allowed in the alignment, allowing the formulation of the 

 


gRNA-mRNA alignment as an extended longest common substring (LCS) problem as previously 

described [25]. Matched gRNAs were then scored (two points for G:C and A:U base pairs and 

one point for G:U base pairs). gRNAs with scores >45 were identified as guiding a specific region 

based on the identified mRNA fully edited sequence. Additional searches with reduced 

stringency (scores >30) were performed on regions with low gRNA coverage. The matched 

gRNAs were sorted into populations based on their guiding positions, and the populations 

analyzed and sorted into major sequence classes. 
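
The gap-free matching and scoring scheme described above can be illustrated with a short Python sketch. It scans every alignment diagonal by brute force rather than using the extended longest common substring formulation cited in the text [25], so it should be read as an illustration of the scoring rule only. The function names and the return layout are assumptions made for this example.

```python
# Allowed gRNA:mRNA pairs and their scores (2 for Watson-Crick, 1 for G:U).
PAIR_SCORE = {
    ("G", "C"): 2, ("C", "G"): 2,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}

def best_ungapped_match(grna, mrna):
    """Find the highest-scoring contiguous, gap-free duplex.

    Both sequences are given 5'->3'. The gRNA is reversed so that index i
    of the reversed gRNA pairs with index offset + i of the mRNA.
    Returns (score, mrna_start, mrna_end) with mrna_end exclusive.
    """
    g = grna[::-1]
    best = (0, 0, 0)
    for offset in range(-len(g) + 1, len(mrna)):
        score, start = 0, None
        for i, base in enumerate(g):
            j = offset + i
            if j < 0 or j >= len(mrna):
                continue                      # gRNA overhangs the mRNA here
            s = PAIR_SCORE.get((base, mrna[j]))
            if s is None:
                score, start = 0, None        # a mismatch ends the current run
                continue
            if start is None:
                start = j
            score += s
            if score > best[0]:
                best = (score, start, j + 1)
    return best

def is_candidate_grna(score, threshold=45):
    """Apply the score cut off (>45 by default, >30 for the relaxed search)."""
    return score > threshold
```

Under this scheme a score above 45 requires at least 23 paired nucleotides, since each pair contributes at most two points.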

Results 

Much of the initial characterization of RNA editing in T. brucei was done using the 

EATRO 164 strain. These experiments suggested that RNA editing was developmentally 

regulated in that certain genes were shown to be more fully edited in some stages than others 

(Table 1) [24–32,46]. It was also reported that the developmental regulation was not controlled 

by gRNA availability, as gRNAs were found in both life cycle stages [28,50,51]. In these early 

studies, however, only a small number of gRNAs were investigated. In this study, we used deep 

sequencing to compare the gRNA transcriptomes of a bloodstream form to a procyclic form of 

T. brucei EATRO 164. The EATRO 164 strain was isolated in 1960 from Alcelaphus lichtensteini 

and maintained in the lab of Dr. K. Vickerman until being obtained by Dr. Stuart in 1966 [103]. 

Dr. Stuart derived the procyclic form from the bloodstream culture in 1979 [103]. Both cell lines 

have been maintained in separate culture since that time. 

Trypanosomes from the EATRO 164 strain were grown in Wistar rats to a parasitemia of 

1–2 x 10^9 trypanosomes per mL and isolated using DEAE cellulose columns. Mitochondria and 

gRNAs were purified as previously described [9]. Libraries were generated using gRNAs isolated 

 


from whole cell RNA and gRNAs isolated from mitochondrial RNA. Both bloodstream gRNA 

libraries were searched using conventionally accepted fully edited mRNA sequences, and a total 

of 1,024,604 gRNA reads were identified. Surprisingly, the library generated using gRNAs 

isolated from whole cell RNA had more than twice as many identified gRNA reads as the data 

generated using gRNAs isolated from mitochondrial RNA. To ensure sufficient abundance and 

gRNA coverage, the two data sets were combined for the analyses presented here. In contrast, 

over 3 million gRNA reads were identified in our procyclic gRNA transcriptome generated from 

gRNAs isolated from mitochondrial RNA. Of the 1,024,604 reads identified from the 

bloodstream transcriptomes, 982,450 reads were sorted into major sequence classes. 

The overall ratio of identified procyclic gRNA reads to BS gRNA reads was 3.5:1. This 

ratio varies significantly by gene (Table 2), and by populations within genes (APPENDIX A) and, 

except for the initiating gRNA, no apparent trend relating gRNA abundance and developmental 

editing pattern was observed. Interestingly, for the initiating gRNA, mRNAs that are fully edited 

in the procyclic stage only, or are fully edited in both life cycle stages had initiating gRNAs with 

more reads in the procyclic data set (Figure 1) [31,32,40,46]. In contrast, mRNAs that are only 

fully edited or are more abundant in the BS had more initiating gRNA reads in the BS data set 

(Figure 1) [26–30]. 
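
As a check on the arithmetic, the per-gene ratios can be recomputed from the major-class read counts reported in Table 2. The Python sketch below does this; the dictionary layout and variable names are illustrative only and are not part of the original analysis.

```python
# Major-class read counts copied from Table 2: gene -> (bloodstream, procyclic).
reads = {
    "A6": (41628, 266532),
    "COIII": (371139, 948845),
    "CR3": (13316, 236808),
    "CR4": (25753, 51979),
    "CYb": (11022, 31622),
    "MurfII": (157, 2605),
    "ND3": (13567, 75739),
    "ND7": (291927, 702061),
    "ND8": (112868, 584639),
    "ND9": (83924, 72027),
    "RPS12": (17191, 403131),
}

# Ratio of procyclic (PC) to bloodstream (BS) reads for each gene.
ratios = {gene: round(pc / bs, 2) for gene, (bs, pc) in reads.items()}

total_bs = sum(bs for bs, _ in reads.values())
total_pc = sum(pc for _, pc in reads.values())
overall = round(total_pc / total_bs, 2)

# Genes whose bloodstream reads outnumber their procyclic reads.
bs_enriched = [gene for gene, r in ratios.items() if r < 1]
```

With these counts, ND9 is the only gene whose ratio falls below 1.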

Because the identified gRNAs from the BS cells were less abundant, the rule used to 

identify major gRNA sequence classes was relaxed. Instead of using a strict cut off for the 

minimum number of reads required, the cut off was assessed on a case-by-case basis. For 

example, if the total population only had 100 reads, a sequence class with only 10 reads would 

still be identified as a major sequence class. Once all major classes were identified, 657 

sequence classes were identified that could be sorted into 211 populations (Table 3). 

Although the overall gRNA numbers were down in comparison to the procyclic data set, most of 

the populations found in that stage (214 gRNA populations) were also identified in the BS 

transcriptome. However, there were a number of populations that were unique to either the 

procyclic or BS stage. 

Table 2. Number of gRNA transcripts in procyclic and bloodstream major classes and ratio of 
procyclic transcripts to bloodstream transcripts for each gene. 

Gene      Bloodstream     Procyclic       Ratio of PC
          gRNA Reads      gRNA Reads      to BS Reads
A6             41,628        266,532             6.40
COIII         371,139        948,845             2.56
CR3            13,316        236,808            17.78
CR4            25,753         51,979             2.02
CYb            11,022         31,622             2.87
MurfII            157          2,605            16.59
ND3            13,567         75,739             5.58
ND7           291,927        702,061             2.40
ND8           112,868        584,639             5.18
ND9            83,924         72,027             0.86
RPS12          17,191        403,131            23.45
Total         982,492      3,375,988             3.44

Figure 1. The abundance of the initiating gRNA of all edited mRNAs in each stage. mRNAs to 
the left of the dashed line are constitutively edited or are edited only in the procyclic stage 
[31,32,40,46]. mRNAs to the right of the dashed line are only fully edited or more abundant 
fully edited in the bloodstream stage [26–30]. 

Table 3. Summary of the gRNA data coverage for each gene. 

Gene      Populations    Unique         Average* gRNA     Gaps         Weak
                         Populations    Overlap (nts)                  Overlaps
          BS     PC      PC     BS      PC      BS        BS    PC     PC    BS
A6        29     28      1      0       18      20        1     0      0     0
COIII     42     39      1      4       22      19        0     0      0     0
CR3       9      9       0      0       14      19        1     0      2     0
CR4       16     18      0      2       17      18        2     0      0     0
CYb       2      2       0      0       14      12        0     0      0     0
MurfII    1      1       0      0       N.A.    N.A.      1     1      0     0
ND3       12     12      0      0       15      15        1     1      0     0
ND7       45     48      5      2       21      17        7     2      1     4
ND8       20     21      3      2       21      17        2     1      0     1
ND9       24     23      0      1       16      16        1     0      2     0
RPS12     11     13      0      2       17      21        2     0      1     0
Total     211    213     13     10      19      17        18    5      5     6

*The average gRNA overlaps were determined excluding any regions where neighboring gRNAs 
shared no overlap. 
 

Surprisingly, when the bloodstream and procyclic data sets were compared, only 37 

identical major sequence classes were found in both. However, distinctly related sequence 

classes could be identified when comparing the BS and procyclic populations. Comparing the 

related major classes from each transcriptome (BS vs procyclic) revealed a median value of ten 

single nucleotide variations per gRNA. Interestingly, nt variations were much less likely to occur 

in the consecutive Watson-Crick anchor region of the gRNA than in the rest of the gRNA, 

indicating a very strong bias against G:U base pairs in this region (Figure 2). 

 


 
Figure 2. The frequency of nt variations versus nucleotide position in the gRNA. Gray bars 
indicate the number of gRNA sequence classes with an identified nt difference between related 
procyclic and bloodstream gRNAs. Nucleotide numbering for each gRNA was normalized by 
setting the start of the Watson-Crick anchor region to zero. Black bars indicate the number of 
gRNA sequence classes whose contiguous Watson-Crick anchors end at that position (start of 
Watson-Crick = zero, so this is an indication of the length of the contiguous Watson-Crick 
region). 

The Watson-Crick anchors (defined as the number of consecutive nts in the 5’ region 

with only G:C and A:U base pairs) had a median length of eleven nucleotides and anchor length 

did not vary between the two forms. The vast majority of major classes of gRNAs had 

consecutive Watson-Crick anchors greater than seven nts long (92.5%). In addition, most gRNAs 

with Watson-Crick anchors shorter than eight nts were not an abundant major class for their 

respective populations. Consistent with observations made from the procyclic data set, most 

gRNAs had zero non-base pairing nucleotides 5’ to the poly-uridine tail and 4 to 6 non-base 

pairing nucleotides 5’ to the anchor region (Figure 3). Also consistent with procyclic data, most 

 


of the gRNAs (59%) had 38 to 48 nts of complementarity (including anchor regions) with their 

respective mRNAs (Figure 4). Transcription start sites also did not vary, as preference for an 

RYAYA start site was observed (Table 4). 
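
The anchor definition used above (the number of consecutive nts in the 5’ region with only G:C and A:U base pairs) can be expressed as a small Python function. This is a sketch under stated assumptions: the input is taken to start at the first base-paired gRNA nucleotide, with any non-base-pairing 5’ nucleotides already trimmed, and the mRNA bases are supplied in pairing order. Both names and input layout are hypothetical.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

def anchor_length(grna_5to3, mrna_partner_bases):
    """Count consecutive Watson-Crick pairs from the 5' end of the duplex.

    grna_5to3: gRNA bases starting at the first paired nucleotide.
    mrna_partner_bases: the mRNA base opposite each gRNA base, same order.
    Counting stops at the first G:U pair or mismatch.
    """
    length = 0
    for g, m in zip(grna_5to3, mrna_partner_bases):
        if (g, m) not in WATSON_CRICK:
            break
        length += 1
    return length
```

For example, a duplex whose sixth position is a G:U pair has an anchor length of five even if Watson-Crick pairing resumes further along.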

Figure 3. Comparing the number of non-complementary nucleotides 5’ of the anchoring 
region (A) or 3’ of the guiding region (excluding the U-tail) (B) in procyclic and bloodstream 
gRNAs. 
 

 

Figure 4. Length of gRNA complementarity (including anchors) to fully edited mRNAs for both 
bloodstream and procyclic gRNAs. 
 
 
 
 

 

 


Table 4. Most common gRNA transcription start sites in procyclic and bloodstream data. 

Initiating    Stage          Percentage of        Percentage
Sequence                     Sequence Classes     of Transcripts
ATATAT        Bloodstream    32.20%               33.60%
ATATAT        Procyclic      35.20%               37.40%
ATATAA        Bloodstream    20.00%               17.90%
ATATAA        Procyclic      21.10%               24.70%
AAAAAA        Bloodstream    3.60%                1.60%
AAAAAA        Procyclic      4.60%                1.30%
ATATAC        Bloodstream    3.80%                1.50%
ATATAC        Procyclic      4.30%                3.80%
ATACAA        Bloodstream    2.40%                1.30%
ATACAA        Procyclic      2.80%                1.70%
ATATTA        Bloodstream    4.90%                13.40%
ATATTA        Procyclic      2.60%                0.90%
ATATAG        Bloodstream    2.20%                0.70%
ATATAG        Procyclic      2.60%                7.60%
ATAAAT        Bloodstream    2.50%                1.00%
ATAAAT        Procyclic      2.60%                0.70%
ATACAT        Bloodstream    2.70%                3.20%
ATACAT        Procyclic      2.20%                2.20%
ATAAAA        Bloodstream    1.70%                0.20%
ATAAAA        Procyclic      2.10%                1.10%

Coverage and gaps 

In order to determine if the BS gRNA transcriptome contained a full complement of 

guide RNAs, the gRNA populations were aligned to the fully edited mRNAs (APPENDIX B). We 

note that for an mRNA to be fully edited, not only must all editing sites on the mRNA be 

covered by a gRNA, but the downstream gRNA must also generate the anchor binding site for the 

subsequent gRNA. Therefore, adjacent gRNAs must overlap. Overall, there was an average of 17 

nts of overlap between adjacent gRNAs, with the average overlap varying slightly by gene 

(Table 3). As the median Watson-Crick anchor is 11 nts, in most cases, the overlap extends 

beyond the Watson-Crick anchor of the subsequent gRNA. However, we did observe a number 

 


of regions where the overlap is minimal. Currently, there is no data that stipulates the minimum 

anchor needed for efficient editing. However, we postulate that similar to microRNAs, for an 

anchoring sequence to be sufficiently specific, it should be at least six nucleotides [104]. Indeed, 

when examining the overlaps between most gRNAs, there are only ten (four procyclic and six 

BS) that are less than six nucleotides (Figure 5). 
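
The way junctions between adjacent gRNA populations can be sorted into gaps and weak overlaps is sketched below in Python. The six-nucleotide threshold is the one proposed in the text; the function name, the (start, end) coordinate layout, and the coordinates in the usage note are hypothetical and are not taken from the appendices.

```python
def classify_junctions(populations, min_overlap=6):
    """Report gaps and weak overlaps between adjacent gRNA populations.

    populations: non-empty list of (start, end) mRNA coordinates, inclusive,
    one pair per population on a single mRNA.
    Returns a list of (kind, size_in_nts, covered_to, next_start) tuples,
    where covered_to is the last mRNA position covered so far and
    next_start is where the next population begins.
    """
    ordered = sorted(populations)
    report = []
    covered_to = ordered[0][1]
    for start, end in ordered[1:]:
        overlap = covered_to - start + 1       # nts shared with earlier coverage
        if overlap < 0:
            report.append(("gap", -overlap, covered_to, start))
        elif overlap < min_overlap:
            report.append(("weak overlap", overlap, covered_to, start))
        # Track the running maximum so nested populations are handled.
        covered_to = max(covered_to, end)
    return report
```

With the made-up input `[(1, 40), (30, 70), (68, 100), (105, 140)]`, the first junction (11 nts shared) is not reported, the second is a 3 nt weak overlap, and the third is a 4 nt gap.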

 
Figure 5. The percentage of different nucleotide overlaps found between adjacent gRNAs. 
gRNAs were aligned to their fully edited mRNA sequence and the number of mRNA nts with 
complementarity to both adjacent gRNAs determined. 
 

We therefore used six nucleotides as a cut off to identify regions with potential missing 

guide RNAs for both life cycle stage transcriptomes. In contrast to the procyclic data, where full 

complements of gRNAs were identified for five of the mRNA transcripts (A6, COIII, CR4, CYb, 

and RPS12), in the BS transcriptome, a full complement of gRNAs was only identified for COIII 

and CYb. Overall, there are 12 edited regions where no gRNAs were identified, and five regions 

with weak gRNA overlaps in the BS data (Table 5). Of these 17 regions, seven belong to ND7 

alone. Interestingly, nine of the 17 missing populations are in very low abundance in the 

procyclic data, having 100 or fewer reads. Because the BS data contain ~3.5-fold fewer 

reads, this could account for some of these regions of poor coverage. There are six 

 


regions that lack gRNA coverage in both data sets. These are found in CR3, MurfII, ND3 and ND7 

(Table 5). Interestingly, three of these regions are close to the 3’ end of their respective genes. 

Regions of weak overlap (ND9(238–242), ND9(609–612)) and regions without gRNA coverage 

(CR3(278–292), ND8(541–553)) that are unique to the procyclic transcriptome were also 

observed. Interestingly, the regions of poor procyclic coverage are found in CR3, ND8 and ND9, 

all transcripts that are preferentially edited in the BS form [5,28,29,47]. 

Table 5. Identified gaps or weak overlaps (less than 6 nucleotides) between populations of 
gRNAs observed in both data sets. 

Gene      Stage Missing    Range             Gap or            Abundance of
          Coverage                           Overlap           Equivalent gRNA
A6        Bloodstream      669-670           2 nt Gap          39,063
CR3       BS/Pa            233/226-230       1 nt G/5 nt O     Missing in Both Stages
CR3       Procyclic        278-292           15 nt Gap         125
CR4       Bloodstream      143-165           23 nt Gap         7,175
CR4       Bloodstream      302-306           5 nt Gap          643
MurfII    BS/P             80-85             6 nt Gap          Missing in Both Stages
ND3       BS/P             389-401           13 nt Gap         Missing in Both Stages
ND7       BS/P             92-94             3 nt Gap          Missing in Both Stages
ND7       Bloodstream      95-120            26 nt Gap         1
ND7       Bloodstream      292-293           2 nt Overlap      3,259
ND7       Bloodstream      325-326           2 nt Gap          888
ND7       BS/P             485-486           0 nt Overlap      Missing in Both Stages
ND7       Bloodstream      1000-1000         1 nt Overlap      101
ND7       Bloodstream      1079-1085         7 nt Gap          44
ND7       BS/P             1086/1086-1088    1 nt G/3 nt G     Missing in Both Stages
ND7       Bloodstream      1225-1232         8 nt Gap          123
ND7       Bloodstream      1269-1270         2 nt Overlap      63
ND8       Bloodstream      54-56             3 nt Overlap      1
ND8       Bloodstream      153-159           7 nt Gap          4
ND8       Bloodstream      386-389           4 nt Gap          2
ND8       Procyclic        541-553           13 nt Gap         413
ND9       Procyclic        238-242           5 nt Overlap      652
ND9       Bloodstream      340-342           3 nt Gap          36
ND9       Procyclic        609-612           4 nt Overlap      7
RPS12     Bloodstream      122-132           11 nt Gap         3
RPS12     Bloodstream      156-158           3 nt Overlap      62
RPS12     Bloodstream      337-349           13 nt Gap         128

 

Table 6. Summary of populations found in both data sets that have more reads in the 
bloodstream data set than in the procyclic data set. 

Gene      PC and BS Shared    Populations more    Percentage of populations
          Populations         abundant in BS      more abundant in BS
A6        28                  6                   21%
COIII     38                  12                  32%
CR3       9                   5                   56%
CR4       16                  10                  63%
CYb       2                   0                   0%
MurfII    1                   0                   0%
ND3       12                  6                   50%
ND7       43                  20                  47%
ND8       18                  8                   44%
ND9       23                  16                  70%
RPS12     11                  4                   36%
Total     201                 87                  43%

 

Gene specific gRNA characteristics 

ATPase 6. In the BS gRNA transcriptome, a total of 29 gRNA populations containing 86 

different major sequence classes were identified that could guide the editing of A6 (Table 3; 

APPENDIX C Part A). One population was identified that was unique to the BS transcriptome 

(gA6(281-329)). The gRNAs bordering this population share extensive overlap, so its absence in 

the procyclic transcriptome would not impact the editing process (APPENDIX C Part A). We note 

that two of the gRNAs identified have single nucleotide mismatches. The bloodstream gA6(640–

668) has an identified mismatch (C:U) that disrupts the complementarity of the gA6(640–668) 

population (APPENDIX B Part A). The second mismatched gRNA (gA6(520–553)) would 

introduce a frameshift. Excluding these two mismatched regions, there is complete coverage of 

ATPase 6. In contrast to the procyclic data, where the conventional initiating gRNA and the 

gRNA immediately following it were extremely rare, both of these gRNAs, gA6(773–822) 

(previously identified as gA6-14) and gA6(745–789), were fairly abundant, each having hundreds 

 


of reads. The alternative initiating gRNA identified in the procyclic data set was not found. This 

finding is similar to that found in the T. brucei Lister strain 427 where authors identified 

alternative initiating gRNAs not found in the EATRO 164 procyclic gRNA transcriptome [105]. 

Another disparity between the two life cycle data sets was found when comparing the 

abundance of gRNAs implicated in a potential alternative edit. In the procyclic gRNA 

transcriptome, a gRNA was identified that would guide the insertion of 11 U-residues instead of 

the needed 12 between G555 and A568 [9]. This gRNA (pA6(557–593)) was 25-fold more 

abundant than the conventional gRNA (pA6(549–593)). In the BS data set, however, more than 

400 reads of the 12U gRNA were identified and only one read was found that would encode the 

alternative 11U edit. Surprisingly, while G555-A568 would be correctly edited (insertion of 

12Us), the next editing site (A549-G555) is edited by bsA6(520–553), the gRNA that introduces 

the 1 nt frameshift. This frameshift would generate a predicted protein with nearly the same 

amino acid sequence as the procyclic 11U frameshift edit (two amino acid changes) (Figure 6). 

Figure 6. Alignment of conventional ATPase 6 protein sequence to hypothetical proteins 
generated by the 11U alternative edited mRNA and the 4U alternatively edited mRNA. Double 
underlined residues show where the alternative sequences differ from the conventional 
sequence. The shaded residues in the 4U sequence show where it differs from the 11U 
sequence. 
 

 

Cytochrome oxidase subunit III. Forty-two gRNA populations guiding the editing of 

COIII were identified in the BS transcriptome; three more than in the procyclic data set 

(APPENDIX C Part B). This disparity is caused by the presence of several unique populations. 

While the procyclic data set contained one unique population, the BS data contained four gRNA 

 


populations not previously identified. Of these four unique populations, three of them are 

required for full overlapping coverage in the bloodstream. They are not, however, required for 

full coverage in the procyclic stage. These three unique gRNA populations all span relatively 

small regions of weak overlap (Figure 7, APPENDIX B Part B). 

 
Figure 7. Editing sites 420–489 of COIII aligned with the gRNAs identified for that region in the 
procyclic (grey) and bloodstream (black) data sets. The gRNA covering 443–474 was only found 
in the bloodstream data set. 
 

An alternative edit of COIII has been described, involving distinct edits at two adjacent 

sites that links the open reading frame of the edited 3’ end to an ORF found in the 5’ pre-edited 

sequence [67]. The previously identified alternative gRNA that can generate the needed editing 

events was not found in either the BS or procyclic transcriptomes. 

C-rich regions 3 and 4. In the BS data set, nine populations and 34 major sequence 

classes were identified that direct the editing of the CR3 transcript. The coverage of edited CR3 

is nearly complete in the bloodstream data set with only a one nt gap in coverage (editing site 

233) (APPENDIX B Part C). This is in contrast to the procyclic transcriptome, where gRNAs that 

matched the published sequence downstream of nt 196 were very rare (<10 copies) and no 

gRNAs were identified that could direct editing near the 3’ end (nucleotides 275–292). 

A full consensus sequence for edited CR4 has only been found in BS T. brucei [48]. Using 

this sequence, 16 gRNA populations, containing 62 major sequence classes were identified in 

 


the BS transcriptome (APPENDIX C Part D). In contrast to the procyclic data, where a full 

complement of gRNAs was identified, there are two gaps in the BS coverage (Table 5). 

Cytochrome b and maxicircle unidentified reading frame II. RNA editing in the 

Cytochrome b (CYb) transcript is limited to the 5’ end and two gRNA populations are sufficient 

to guide the small number of edits needed to render the CYb transcript functional. Both 

populations were observed in both data sets, with a total of 6 major classes. Interestingly, in 

both data sets, the initiating gRNA is significantly more abundant than the second gRNA, being 

approximately 30 fold more abundant in the procyclic data set and approximately 200 fold 

more abundant in the bloodstream data set (APPENDIX C Part E). This is in contrast to most of 

the other transcripts where the initiating gRNAs are not very abundant. In addition, almost all 

of the CYb gRNA major classes have an A-run transcription start site, deviating from the 

common RYAYA initiation site pattern. 

Editing in MurfII is also limited to the 5’ end and requires only two gRNAs. One of these 

gRNAs (gMurfII(30–79)) is encoded on the maxicircle [106]. While this gRNA was observed in 

both data sets, the gRNAs identified were not identical. A purine-purine transition near the 3’ 

end of the gRNA differentiates the procyclic and BS forms (APPENDICES B and C Parts F). An 

initiating gRNA is needed to generate the 3’ most edits that create the anchor sequence for 

gMurfII(30–79). This gRNA was not found in either data set, despite additional searches with 

reduced search stringency. 

NADH dehydrogenase subunits 3, 7, 8, and 9. In the initial characterization of RNA 

editing in T. brucei EATRO 164, fully edited ND subunit transcripts were only found in RNA 

isolated from the BS stage. We were therefore surprised to find that fewer ND gRNA 

 


populations were identified in the BS transcriptome and a full complement of gRNAs was not 

identified for any of the ND subunits. The most complete coverage was found for ND3 and ND9. 

For ND3, the BS data set contained twelve populations and 41 major classes of gRNAs. One gap 

in coverage was observed, from 389–401. This region overlaps a region that has no clear 

consensus sequence, 375–395 [26]. ND9 is the only gene in this study whose bloodstream gRNA 

reads outnumber the procyclic gRNA reads identified (Table 2). Twenty-four bloodstream gRNA 

populations were identified with all edited nucleotides covered if gRNAs with a single base pair 

mismatch are taken into account (APPENDIX B Part J). 

While 45 gRNA populations were identified for ND7 in the BS data set, the gRNA 

coverage was significantly worse when compared to the identified procyclic gRNAs (Table 5). 

Despite the poor coverage, two unique gRNA populations (bs gRNA (772–816) and (1128–

1182)) were identified (APPENDICES B and C Parts H). ND8 also had poor gRNA coverage (Table 

5). Interestingly, there are several populations in ND8 that contain highly abundant gRNA 

sequence classes with mismatches that shorten the complementarity of the gRNA. These 

usually have a single mismatch in the gRNA that would otherwise guide conventional editing 

(APPENDIX C Part I). 

Ribosomal protein S12. The BS data set contained 11 populations and 26 major 

sequence classes that direct editing of RPS12 (Table 3). While the procyclic transcriptome 

contained a full complement of gRNAs, the BS RPS12 data contains one gap in coverage and 

one region of poor overlap (Table 5). This was surprising, as RPS12 has been shown to be 

essential in both life cycle stages [100,107]. The region of the mRNA with poor coverage has a 

high percentage of C residues and gRNAs covering this region may utilize C:A base pairs. If this 

 


is the case, some classes of gRNAs may not have been detected, as the program used to search 

for gRNAs does not allow for C:A base pairs (APPENDIX B Part K). 

Discussion 

This is the first comprehensive characterization of the mitochondrial gRNA 

transcriptome from the bloodstream stage of Trypanosoma brucei brucei. As we have 

previously characterized the insect stage gRNA transcriptome, these data allow the comparison 

of gRNA characteristics across the two main life cycle stages [9]. In the EATRO 164 BS gRNA 

transcriptome, gRNAs for every edited gene were identified. Interestingly, while the number of 

populations identified in this data set was only slightly lower than that reported in the procyclic 

data set, the total number of gRNA transcript reads identified was considerably lower despite 

the fact that multiple transcriptome libraries were combined. While this may be a reflection of 

the down regulation of mitochondrial transcription in the bloodstream stage (see Table 1), it is 

impossible to rule out technical problems in the generation and sequencing of the libraries. It 

has been previously reported that gRNA presence did not correlate with developmental RNA 

editing patterns in T. brucei and our data does not challenge this [50,51]. The data did, however, 

show an interesting trend in the abundance of the initiating gRNAs as relates to their 

developmental editing patterns. It may be that the abundance of the initiating gRNAs is 

regulated in order to control editing of their target mRNAs. However, we cannot rule out the 

possibility that not all of the populations of initiating gRNAs were identified. For the pan-edited 

mRNAs, the initiating gRNAs direct sequence changes that are often downstream of the stop 

codon. Sequence changes in this region would be tolerated, as long as the anchor sequence for 

the next gRNA is maintained. This type of mutation was observed in the 3’ end of ATPase 6 [19]. 

 


In addition, characterization of the initiating gRNAs in the Lister 427 T. brucei cell line identified 

several gRNAs that would direct an alternative editing pattern, suggesting a high tolerance for 

sequence changes near the mRNA 3’ ends [105]. 

As expected, general gRNA characteristics are conserved across the two life-cycle 

stages. Populations retain the general location of their anchors, there is relatively little shift in 

the location of populations, and the lengths of complementarity are very similar. We did 

observe that considerable nucleotide variations were found in the guiding regions of the gRNAs 

from the different life cycle strains of the EATRO 164 cells. This particular cell line dates back to 

1960 when the BS form was originally acquired [103]. Procyclic cells were derived from the BS 

stock in 1979 and the two cell lines maintained separately since that date [103]. Mixed 

trypanosome genotypes are detected frequently in field isolates from both tsetse flies and 

mammals and it may be that separation into different culture conditions allowed different 

genotypes to predominate in each life cycle strain [22,108,109]. Because gRNAs utilize both 

canonical (Watson-Crick) as well as G:U base-pairing to direct the change in sequence, most 

transition mutations in the gRNA would not lead to changes in the mRNA sequence and would 

not be selected against [33]. We do note, however, that a very strong bias against A to G 

transitions is observed in the anchor regions of the gRNAs. This suggests that transition 

mutations in this region are not tolerated and that the editing machinery recognizes 

and selects for a conventional base-paired double helix in the initial gRNA/mRNA pairing. The 

ability to discriminate against G:U base-pairs in the initial interaction would greatly increase the 

accuracy of the gRNA targeting event. Considering the sequential nature of the overall editing 

process, this would be very advantageous. 
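
The argument about transition mutations can be made concrete by enumerating every allowed gRNA:mRNA pair and asking whether pairing survives a transition (A<->G or C<->U) in the gRNA base. The Python sketch below is illustrative only; the names are hypothetical, and it considers base pairing in isolation, without modelling duplex stability or the editing machinery.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
ALLOWED = WATSON_CRICK | {("G", "U"), ("U", "G")}   # editing also permits G:U
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}

def transition_outcomes():
    """Map each allowed (gRNA, mRNA) pair to the result of a gRNA transition.

    Returns {(g, m): (mutated_g, still_pairs, still_watson_crick)}.
    """
    outcomes = {}
    for g, m in sorted(ALLOWED):
        mutated = TRANSITION[g]
        outcomes[(g, m)] = (
            mutated,
            (mutated, m) in ALLOWED,        # tolerated in the guiding region
            (mutated, m) in WATSON_CRICK,   # tolerated in a strict WC anchor
        )
    return outcomes
```

Four of the six allowed pairs still pair after a transition in the gRNA (A:U to G:U, C:G to U:G, G:U to A:U, U:G to C:G). Of these, only the two that start as G:U pairs remain Watson-Crick, so an A to G change that is silent in the guiding region would interrupt a strictly Watson-Crick anchor.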

 


Coverage 

Surprisingly, complete gRNA coverage was observed only for the pan-edited COIII and 

for CYb, where editing is limited to the 5’ end. The identification of the CYb gRNAs was 

expected, as it has been previously reported that the gRNAs are present in both life cycle stages 

even though editing of CYb is limited to the procyclic stage [8,24]. The full coverage of COIII was 

also not surprising, as COIII was shown to be fully edited and equally abundant in both stages 

[32]. However, we expected to see complete coverage of ATPase 6 and RPS12 as both of these 

transcripts have been shown to be essential in both life cycle stages [17,100,107,110]. For 

ATPase 6, we did identify a total of 29 gRNA populations that do cover all of the editing sites. 

However, one of the gRNAs (bsA6(643–667)) has a single nucleotide mismatch (C:U) and one 

would introduce a frameshift (bsA6(520–553)). The C:U mismatch occurs near the middle of the 

gRNA, placing the C:U mismatch in a region that is unusually high in Gs and Cs (APPENDIX B Part 

A). It may be that the G:C base pairs immediately upstream of the mismatch stabilize the 

gRNA/mRNA interaction, allowing it to be tolerated. The frameshift gRNA is also interesting, as 

it occurs just upstream (1 editing site) of another site where we had previously observed a 

frameshift sequence anomaly. Both frameshifts (the BS 4U and the Procyclic 11U) generate a 

predicted protein with nearly the same amino acid sequence. As the frameshifts occur 

downstream of the highly conserved amino acid region involved in proton translocation [31], it 

may be that this different carboxyl terminus is tolerated. 

Near full coverage is also observed for RPS12. For this transcript, one BS identified gRNA 

(bsRPS12(96–121)) has an A-nt insertion that disrupts the gRNA complementarity. Surprisingly, 

the other mRNA transcript found with near complete coverage was ND9 (one gRNA has a single 

 


nt mismatch). All of the other mitochondrially encoded Complex I members did have 

substantial gaps in coverage. Currently, there is considerable debate on the necessity of 

Complex I subunits for either stage of the trypanosome life cycle. Studies using RNAi and 

knockout cell lines of nuclear-encoded members of Complex I have shown that the complex is 

unnecessary for survival in either life cycle stage [111,112]. However, the nuclear encoded 

Complex I member genes are maintained [42], and while we did not identify full coverage for 

the ND transcripts, a vast majority of the gRNAs were found in both life cycle stages. 

This study used high-throughput sequencing to characterize the gRNA transcriptome 

during the bloodstream stage of the trypanosome life cycle. This work suggests that gRNAs are 

expressed during both life cycle stages, and that differential editing patterns observed for the 

different mitochondrial mRNA transcripts are not due to the presence or absence of gRNAs. 

Accession Numbers 

SAMN04302078, SAMN04302079, SAMN04302080, and SAMN04302081 in NCBI’s Sequence Read 

Archive. 

Acknowledgments 

The authors dedicate this work in memory of David Judah, MS, DVM. He was a 

wonderful colleague. 

We also acknowledge the work of Joshua Foster, Mark Johnson, James Rauschendorfer, 

Heather Tyler, Callie Vivian, and Alexis Weber who were involved in sorting and identifying 

gRNAs, the MSU RTSF for their contribution in deep sequencing and Ken Stuart and Jason 

 


Carnes at the Center for Infectious Disease Research for supplying the T. brucei strains used in 

this study. 

 

 

 


CHAPTER 3: MITOCHONDRIAL DUAL-CODING GENES IN 

TRYPANOSOMA BRUCEI 

Abstract 

Trypanosoma brucei is transmitted between mammalian hosts by the tsetse fly. In the 

mammal, the parasites are exclusively extracellular, continuously replicating within the bloodstream. 

During this stage, the mitochondrion lacks a functional electron transport chain (ETC). 

Successful transition to the fly requires activation of the ETC and ATP synthesis via oxidative

phosphorylation. This life cycle leads to a major problem: in the bloodstream, the mitochondrial 

genes are not under selection and are subject to genetic drift that endangers their integrity. 

Exacerbating this, T. brucei undergoes repeated population bottlenecks as it evades the host

immune system, and these bottlenecks create additional forces of genetic drift. These parasites possess

several unique genetic features, including RNA editing of mitochondrial transcripts. RNA editing 

creates open reading frames by the guided insertion and deletion of U-residues within the 

mRNA. A major question in the field has been why this metabolically expensive system of RNA 

editing would evolve and persist. Here, we show that many of the edited mRNAs can alter the 

choice of start codon and the open reading frame by alternative editing of the 5' end. Analyses 

of mutational bias indicate that six of the mitochondrial genes may be dual-coding and that 

RNA editing allows access to both reading frames. We hypothesize that dual-coding genes can 

protect genetic information by essentially hiding a non-selected gene within one that remains 

under selection. Thus, the complex RNA editing system found in the mitochondria of 

 


trypanosomes provides a unique molecular strategy to combat genetic drift in non-selective 

conditions. 

Author Summary 

In African trypanosomes, many of the mitochondrial mRNAs require extensive RNA 

editing before they can be translated. During this process, each edited transcript can undergo 

hundreds of cleavage/ligation events as U-residues are inserted or deleted to generate a 

translatable open reading frame. A major paradox has been why this incredibly metabolically 

expensive process would evolve and persist. In this work, we show that many of the 

mitochondrial genes in trypanosomes are dual-coding, utilizing different reading frames to 

potentially produce two very different proteins. Access to both reading frames is made possible 

by alternative editing of the 5' end of the transcript. We hypothesize that dual-coding genes 

may work to protect the mitochondrial genes from mutations during growth in the mammalian 

host, when many of the mitochondrial genes are not being used. Thus, the complex RNA editing 

system may be maintained because it provides a unique molecular strategy to combat genetic 

drift. 

Introduction 

Trypanosomes are among the most successful parasites in existence, inhabiting an

incredibly wide range of hosts [2,3]. The dixenous members cycle between two distinct hosts 

and can encounter different environments with distinct metabolic constraints. These parasites 

are unique in that they all possess glycosomes (where glycolysis occurs) as well as mitochondria 

[16]. The salivarian trypanosomes (e.g. T. brucei, T. vivax) are especially interesting, because 

 


they are exclusively extracellular in their mammalian hosts, continuously replicating within the 

bloodstream over periods of months. During this stage of the life cycle, the mitochondrion is 

down-regulated, lacking both Krebs cycle enzymes and a functional electron transport chain 

(ETC) [21]. Successful transition to the fly vector requires activation of the ETC and ATP

synthesis via oxidative phosphorylation. This unique lifecycle leads to a major problem: when 

the mitochondrial genes are unused, they are not under selection, hence the integrity of these 

genes is threatened by genetic drift [75,113]. Exacerbating this, salivarian trypanosomes

undergo a severe bottleneck as they transition through the tsetse fly and into the mammalian 

host, and then within the bloodstream, they undergo multiple bottlenecks at each antigenic 

switch, as they evade the host immune system [22]. Such bottlenecks create additional forces 

of genetic drift, where genes can be lost even if the fitness cost of losing them is considerable.

These parasites possess several unique genetic features, including RNA editing of the 

mitochondrial transcripts. RNA editing creates open reading frames in “cryptogenes” by 

insertion and deletion of uridylate residues at specific sites within the mRNA. The U-

insertions/deletions are directed by small guide RNAs (gRNA) and can repair frameshifts, 

generate start and stop codons and more than double the size of the transcript (for review see 

[4]). While the mRNA cryptogenes are encoded on maxicircles (25–50 copies per DNA network),

the guide RNAs are encoded on thousands of 1 kb minicircles, encoding 3–5 gRNA genes each

[8]. This effectively means that the genetic information for the edited mitochondrial mRNAs is 

dispersed between the mRNA cryptogenes on the maxicircles and the thousands of gRNA 

coding minicircles. The extensive editing of a single transcript can require more than 40 gRNAs 

and hundreds of editing events [9]. While the initial gRNA can interact with the 3' end of the 

 


pre-edited transcript, all subsequent gRNAs anchor to edited sequence created by the 

preceding gRNA. Hence, editing proceeds from the 3' end to the 5' end of the mRNA transcript 

with the terminal gRNA (last one in the cascade) often creating the start codon needed for 

translation. This sequential dependence means that even with high accuracy rates for each

gRNA, the overall fidelity of the process is astonishingly low. A major question in the field has 

been why this fragile and metabolically expensive system of RNA editing would evolve and 

persist. 
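A rough calculation shows why sequential dependence erodes fidelity. The sketch below assumes, purely for illustration, that every gRNA in a cascade edits its block correctly with the same independent probability p, so the chance that all n steps succeed is p raised to the power n. The function name and the values of p and n are hypothetical and are not measurements from this work.

```python
def cascade_fidelity(per_grna_accuracy: float, n_grnas: int) -> float:
    """Probability that every gRNA in a sequential cascade edits correctly.

    Assumption (not from the text): each step is independent and all
    steps share the same accuracy.
    """
    if not 0.0 <= per_grna_accuracy <= 1.0:
        raise ValueError("per_grna_accuracy must be between 0 and 1")
    if n_grnas < 0:
        raise ValueError("n_grnas must be non-negative")
    return per_grna_accuracy ** n_grnas


if __name__ == "__main__":
    # Illustrative values only: a transcript requiring 40 gRNAs.
    for p in (0.99, 0.95, 0.90):
        print(f"p = {p:.2f}, n = 40 -> fidelity = {cascade_fidelity(p, 40):.3f}")
```

Under these assumptions, a per-gRNA accuracy of 95% would leave only about 13% of transcripts correctly edited across a 40-gRNA cascade.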

Another level of complexity in the kinetoplastid RNA editing process was the detection

of an alternative editing event that leads to the production of a functionally discrete protein 

isoform. Alternative editing of Cytochrome Oxidase III (COIII) is reported to generate a novel 

DNA-binding protein, AEP-1, that functions in mitochondrial DNA maintenance [67,68]. In this 

transcript, one alternative gRNA generates sequence changes at two sites that link an open

reading frame (ORF) found in the pre-edited 5' end, to the 3' transmembrane domains found in 

the COIII edited ORF. This was the first indication that one cryptogene could contain

information for more than one protein. Here, we show that as many as six additional 

cryptogenes also encode for more than one protein. Analyses of the terminal gRNA populations 

indicate that gRNA sequence variants exist that can alter the choice of the start codon and the 

open reading frame by alternative editing of the 5' end of the mRNA. Mutational bias analyses 

indicate that six of the mitochondrial genes may be dual-coding, with RNA editing allowing 

access to both reading frames. Dual-coding genes are defined as a stretch of DNA containing 

overlapping open reading frames (ORFs) [85,86]. Of particular interest are dual-coding genes 

that contain two ORFs read in the same direction: a canonical protein (normally annotated as 

 


protein coding in the literature) and an alternative ORF. Maintaining dual-coding genes is costly, 

as it constrains the flexibility of the amino acid composition of both proteins. Hence, it is 

thought that dual-coding genes can survive long evolutionary spans only if the overlap is 

advantageous to the organism [93]. We hypothesize that trypanosomes use dual-coding genes 

to protect genetic information by essentially hiding a non-selected (ETC) gene within one that 

remains under selection. Thus, the ability to access overlapping reading frames may be added 

to a growing list of gene protective strategies made possible by the complex RNA editing 

process [74,75,113]. 

Materials and Methods

Trypanosome growth

T. brucei procyclic clones from IsTAR (EATRO 164), TREU 667 and TREU 927 cell lines 

were grown in SDM79 at 27°C and harvested at a cell density of 1–3 × 10^7. The TREU 667 cell line

was originally isolated from a bovine host in 1966 in Uganda [114]. The TREU 927 cell line was 

originally isolated from Glossina pallidipes in 1970 in Kenya [115]. The EATRO 164 strain was 

isolated in 1960 from Alcelaphus lichtensteinii and maintained in the lab of Dr. K. Vickerman until

being obtained by Dr. Ken Stuart in 1966 [103]. Dr. Stuart derived the procyclic form from the 

bloodstream form culture in 1979. 

Guide RNA isolation, preparation, and sequencing 

Mitochondrial mRNAs and gRNAs were isolated as previously described [9]. All RNAs 

were treated with Promega RQ1 DNase. In order to isolate gRNAs from TREU 667 and TREU 927

cells, RNAs were size fractionated on a polyacrylamide gel as previously described [9]. Guide 

 


RNAs were then extracted and prepped for sequencing using the Illumina Small RNA protocol 

[9]. Libraries from TREU 667 and TREU 927 were deep sequenced on the Illumina GAIIx; reads 

were processed and trimmed as previously described [9]. 

Messenger RNA preparation and sequencing 

In order to isolate target mRNAs, isolated TREU 667 mitochondrial RNAs were reverse 

transcribed using the Applied Biosystems High Capacity cDNA Reverse Transcription Kit. CR3 

cDNAs were amplified via PCR using the following primers (underlined portions are gene 

specific and non-underlined portions are tag regions used in deep sequencing reaction): 

CR3DS5’NEV: ACACTGACGACATGGTTCTACAAGAAATATAAATATGTGTATG

CR3DS3’170: TACGGTAGCAGAGACTTGGTCTCAATAAACCCATATTAAATAAAAAACAAAAATCC

After amplification, the products were purified using the QIAquick PCR Purification Kit, 

and paired end Illumina deep sequencing was performed on the Illumina MiSeq (2 × 250 bp

paired end run). Low quality results were removed using FaQCs, adapters were removed using 

Trimmomatic and PEAR was used to merge paired end reads. Finally, Fastx was used to compile 

identical reads while maintaining the number of redundant reads. CR3 edited transcripts were 

identified by comparing sequence downstream of the 5' never edited region to the edited CR3 

sequence. Guide RNAs were identified by using the mRNA sequences as queries against our 

existing gRNA databases, as previously described [9]. 
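The read-collapsing step described above can be pictured with a short sketch. This is not the Fastx implementation; it only illustrates the idea of compiling identical reads while keeping the number of times each was seen. The function name and the toy reads in the test are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into (sequence, count) pairs.

    Reads are upper-cased and stripped of surrounding whitespace before
    counting; empty lines are ignored. The result is sorted with the most
    abundant sequence first (ties broken alphabetically).
    """
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Keeping the count alongside each unique sequence is what allows the most abundant edited forms of a transcript to be ranked after merging.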

Mutational frequency and editing conservation analysis 

Mitochondrial pan-edited genes were categorized as potentially dual-coding based on 

identification of extended alternative reading frames and/or presence of identified gRNAs that 

generate alternative 5' end sequences. These genes include CR3, CR4, ND3, the 5' editing 

 


domain of ND7, ND9 and RPS12. Nondual-coding pan-edited genes include ATPase 6, COIII, ND8 

and the 3' editing domain of ND7. Partially edited genes include CYb, Murf II and COII. Never

edited genes include COI, ND1, ND2, ND4 and ND5. For all analyses, ND7 was considered as two 

separate coding regions: the 5' editing domain (ND7N) and the 3' editing domain (ND7C) [27]. 

As we hypothesize that only the 5' editing domain of ND7 is dual-coding, mutation calculations 

for ND7N were pooled with the dual-coding genes and those for ND7C were pooled with nondual-coding

pan-edited genes. T. brucei and T. vivax mRNA sequences of mitochondrial encoded genes were 

aligned based on protein sequence using Clustal Omega [116]. Nucleotide sequence mutations 

were identified and their effects on the amino acid sequence were classified as silent, missense 

or nonsense mutations. Missense mutations were further divided into three groups based on 

the PAM 250 matrix where conversions with a value ≤0 were considered not conserved,

conversions with a value 0<x≤0.5 were considered modestly conserved, and conversions with a

value >0.5 were considered strongly conserved [117]. Mutation frequencies were normalized

for each gene using nucleotide sequence length. Frequencies were compared using unpaired t-

tests. 
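The classification of codon differences can be sketched as follows, using the cut-offs given in the Figure 10 legend. Several things here are assumptions made for illustration rather than details stated in the text: the codon table is NCBI translation table 4 (UGA read as tryptophan), a stop codon on either side of the comparison is counted as nonsense, and the PAM 250 value is supplied by a hypothetical caller-provided function `pam_score` on the same scale as the thresholds.

```python
from itertools import product

BASES = "UCAG"
# Assumption: NCBI translation table 4, in which UGA encodes Trp (W)
# instead of stop. The text above does not name the table used.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(codon_a, codon_b, pam_score):
    """Classify the difference between two aligned codons.

    pam_score(aa_a, aa_b) is a hypothetical caller-supplied lookup that
    returns the PAM 250 value for an amino acid pair.
    """
    codon_a = codon_a.upper().replace("T", "U")
    codon_b = codon_b.upper().replace("T", "U")
    if codon_a == codon_b:
        return "unchanged"
    aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
    if aa_a == aa_b:
        return "silent"
    if "*" in (aa_a, aa_b):
        # Assumption: a stop codon in either sequence counts as nonsense.
        return "nonsense"
    score = pam_score(aa_a, aa_b)
    if score <= 0:
        return "not conserved"
    if score <= 0.5:
        return "modestly conserved"
    return "strongly conserved"
```

Counts of each class would then be divided by the nucleotide length of the gene to give the normalized frequencies that are compared between gene groups.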

The extent of editing conservation between T. vivax and T. brucei was calculated by 

aligning the pan-edited genes based on ACG sequence. For each alignment, each location 

between an A, C or G nucleotide where a U-residue was inserted or deleted in either sequence 

was considered an editing site. Editing sites were classified as identical in both sequences, 

altered in insertion or deletion length, having switched from an insertion site to a deletion site, 

or only occurring in one of the sequences. Percent editing conservation was based on total 

number of editing sites within each mRNA. Percentages were compared using unpaired t-tests. 
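The editing-site comparison can be illustrated with a minimal sketch. It assumes the two edited mRNAs are written in the notation of Figure 8 (lowercase u for an inserted U, an asterisk for a deleted U, uppercase U for an encoded U left untouched) and, as a simplification, that both share an identical A/C/G skeleton so that sites line up one to one. Function names and class labels are hypothetical.

```python
from collections import Counter


def editing_events(edited):
    """Split an edited mRNA into its ACG skeleton and per-site U edits.

    events[i] is the net edit in the gap before skeleton position i (the
    final entry is the 3' tail): +n for n inserted Us, -n for n deleted
    Us, 0 where no editing occurred.
    """
    skeleton = []
    events = [0]
    for ch in edited:
        if ch in "ACG":
            skeleton.append(ch)
            events.append(0)
        elif ch == "u":
            events[-1] += 1
        elif ch == "*":
            events[-1] -= 1
        elif ch != "U":
            raise ValueError(f"unexpected character {ch!r}")
    return "".join(skeleton), events


def classify_editing_sites(edited_a, edited_b):
    """Count editing sites by how they compare between two sequences."""
    skel_a, ev_a = editing_events(edited_a)
    skel_b, ev_b = editing_events(edited_b)
    if skel_a != skel_b:
        raise ValueError("sequences must share the same ACG skeleton")
    classes = Counter()
    for a, b in zip(ev_a, ev_b):
        if a == 0 and b == 0:
            continue  # not an editing site in either sequence
        if a == b:
            classes["identical"] += 1
        elif a == 0 or b == 0:
            classes["one_sequence_only"] += 1
        elif (a > 0) != (b > 0):
            classes["type_switched"] += 1
        else:
            classes["altered_length"] += 1
    return classes


def percent_identical(classes):
    """Percentage of editing sites that are identical in both sequences."""
    total = sum(classes.values())
    return 100.0 * classes["identical"] / total if total else 0.0
```

Real alignments between species also contain A, C and G substitutions, which this sketch does not handle; it is meant only to make the four site categories concrete.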

 


A principal component analysis (PCA) was performed on all three reading frames of the 

pan-edited genes using the scikit-learn principal component analysis tool [118]. For this 

analysis, the predicted protein sequences for all three reading frames were aligned using Clustal 

Omega [116]. Missense, nonsense, and indel mutations were quantified. Missense mutations 

were further divided into three groups as described above. Each mutation type was quantified 

and the relative frequency of each mutation calculated based on protein amino acid length. The 

variables used in the PCA include the protein mutation frequencies and the percentage of 

identical editing sites in each mRNA. The first reading frame of each gene is defined as the ORF 

published in the literature. 

Data availability 

CR3 sequence accession number: SAMN06318039. TREU 927 gRNA sequence accession 

number: SAMN06318154. TREU 667 gRNA sequence accession number: SAMN06318153 (all in NCBI's

Sequence Read Archive).

Results 

In T. brucei, analyses of the gRNA transcriptome for the pan-edited transcripts indicate 

that full editing involves a large number of gRNA populations [9,119]. In addition, most of the 

gRNA populations (population defined as guiding the same or near same region of the mRNA) 

contain multiple sequence classes. The sequence classes most often differ by purine-to-purine (R to R) or pyrimidine-to-pyrimidine (Y to Y)

mutations, hence guide the generation of the same mRNA sequence (A:U and G:U base pairs 

allowed). During these analyses, we noted that the terminal gRNA population for Cytosine-rich 

Region 3 (CR3) (putative NADH dehydrogenase subunit 4L [120]), had 3' sequences that would 

extend editing beyond the previously identified translation start codon. In addition, this 

 


population had several sequence variants that would generate different edited sequences in 

this region. The most abundant terminal gRNA would introduce a stop codon in-frame with two 

alternative AUG start codons found near the 5' end (Figure 8A). Other sequence classes,

however, would either bring the upstream AUGs into frame, or shift the reading frame.

Intriguingly, the alternative +1 reading frame (ARF) did not contain any premature termination 

codons. In order to determine if these gRNAs were utilized, we used Illumina deep sequencing 

to identify the most abundant forms of fully edited CR3 transcripts. Surprisingly, we identified 

multiple forms of the mRNA (Figure 8A, 8B and 8C and APPENDIX D). The first was the fully 

edited sequence predicted by the most abundant gRNA identified (Figure 8A). The other 

transcripts, however, had unique editing patterns at the 5' end (Figure 8B and 8C and APPENDIX

D). Use of these 5' CR3 sequences allowed us to identify novel gRNAs. Predicted translation of 

these mRNA sequences indicates that they use the +1 reading frame, and that the protein

generated would be the same length as the ORF previously identified. This suggests that CR3 is 

dual-coding, and that selection of the terminal gRNA determines which reading frame will be 

used. A re-examination of the terminal gRNAs for the pan-edited genes indicated that at least 

two other transcripts, NADH dehydrogenase subunit 7 (ND7) and ribosomal protein subunit 12 

(RPS12), have identified gRNA sequence variants within the terminal gRNA population that 

allow access to alternative reading frames (Figure 8D and 8E and APPENDICES E and F). 

Interestingly, the alternative gRNA for ND7 generates a +2 frameshift with a 65 amino acid 

open reading frame. The ND7 transcript is differentially edited in two distinct domains 

separated by 59 nts that are not edited in the mature transcript (the HR3 region) [27]. Only the 

5' domain is edited in both life cycle stages; full editing of the 3' domain was only found in 

 


bloodstream form (BF) parasites. The stop codon for the +2 frameshift is found within the HR3 

region, therefore this alternative protein would be generated by full editing of only the 5' 

domain. While the most abundant gRNA in the EATRO BF transcriptome (~50,000 reads) would

generate a sequence utilizing the identified ND7 ORF (Figure 8D ORF), the most abundant gRNA

(>100,000 reads) in the EATRO 164 procyclic library is the +2 ARF gRNA (Figure 8D ARF and APPENDIX

E). In RPS12, the alternative gRNA deletes an additional U-residue downstream of the existing 

start codon, shifting the reading frame into the +1 ARF (Figure 8E). Interestingly, in Leishmania 

tarentolae, a gRNA, gRPS12VIIIa, has been identified that would also shift the frame of the 

existing start codon into the +1 ARF [121]. 
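The earlier observation that R to R or Y to Y changes in a gRNA can still guide the same mRNA sequence follows from the pairing rules used in Figure 8, where both Watson-Crick and G:U pairs are allowed. The sketch below makes that concrete. It assumes the gRNA is supplied already aligned beneath the mRNA (written 3' to 5', so that position i pairs with mRNA position i). The function names and toy sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_track(mrna, grna_aligned):
    """Return a Figure 8 style pairing line for an aligned mRNA/gRNA pair.

    '|' marks a Watson-Crick pair, ':' a G:U pair and ' ' a mismatch.
    """
    if len(mrna) != len(grna_aligned):
        raise ValueError("sequences must be the same length")
    track = []
    for m, g in zip(mrna.upper(), grna_aligned.upper()):
        if (m, g) in WATSON_CRICK:
            track.append("|")
        elif (m, g) in WOBBLE:
            track.append(":")
        else:
            track.append(" ")
    return "".join(track)


def guides_same_sequence(mrna, grna_aligned):
    """True if every position pairs (Watson-Crick or G:U)."""
    return " " not in pairing_track(mrna, grna_aligned)
```

For example, a gRNA position that changes from A to G still pairs with a U in the mRNA, and one that changes from C to U still pairs with a G, so both variants direct the same edited sequence.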

The identification of gRNAs that could alter the reading frame led us to re-analyze the 

ORFs of the edited transcripts. In addition to CR3, we found extended ORFs in two different 

frames for Cytosine-rich Region 4 (CR4) and NADH Dehydrogenase (ND) subunit 9, while several 

others had shorter, but still significant ORFs in alternative frames (Figure 9). We do note that 

the original sequence publications for both CR4 and RPS12 (CR6) had indicated that the fully 

edited sequence contained extended ORFs in two different frames [30,48]. Additionally, NADH 

Dehydrogenase subunit 3 (ND3) was also considered to be potentially dual coding, based on 

mutational analysis described below. 
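The re-analysis of reading frames amounts to locating stop codons in each of the three forward frames of an edited transcript, which is what Figure 9 summarizes. The sketch below shows one way this could be done. It assumes that only UAA and UAG act as stop codons (UGA being read as tryptophan, as in NCBI translation table 4), and the function name and toy sequence in the test are hypothetical.

```python
# Assumption: UGA encodes Trp in this system, so only UAA and UAG are stops.
STOP_CODONS = {"UAA", "UAG"}


def stop_positions_by_frame(mrna):
    """Map each forward reading frame (0, 1, 2) to its stop codon positions.

    Positions are 0-based indices of the first nucleotide of each stop
    codon. Lowercase inserted Us are treated as U, and '*' placeholders
    for deleted Us are removed before scanning.
    """
    seq = mrna.replace("*", "").upper().replace("T", "U")
    positions = {}
    for frame in range(3):
        positions[frame] = [
            i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS
        ]
    return positions
```

A frame with no stop codons over a long stretch, such as the +1 frame described for CR3, would appear here as an empty or nearly empty list.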

As we did find potential ARFs in the edited transcripts, we analyzed the predicted ORFs 

for biases in their mutational pattern. Dual-coding genes often display an atypical codon 

mutation bias due to constraints imposed by the need to maintain protein function in both 

genes. In single-coding genes, changes in the third nucleotide of a codon give rise to 

synonymous amino acids, so this position (N3) is much less constrained. In contrast, in dual-coding

genes, the N3 position in one frame is the N1 or N2 position in the alternative frame.

Figure 8. Alternative editing of the 5' end of pan-edited genes results in access to different
reading frames. CR3 (A, B, C): Sequenced mRNA variants are aligned with gRNAs and predicted
protein sequences. Inserted U-residues are lowercase while deleted U-residues are shown as
asterisks. Canonical Watson-Crick base pairs (|); G:U base pairs (:). Previously identified start
codons are double underlined. Potential upstream AUG start codons are indicated by wavy
underlines. Alternatively edited nucleotides are shown in red. Common anchor regions are
shown in blue. ND7 (D), and RPS12 (E): Predicted mRNA and protein sequences, based on
identified gRNAs.

Therefore, they have low rates of synonymous mutations [122]. This codon bias has been used 

to develop algorithms to detect novel overlapping genes [123,124]. These algorithms, however,

cannot be used in the analysis of our edited transcripts as the two-component genetic system 

(mRNAs created by gRNA editing) introduces another layer of mutational constraint [125]. In 

addition, the edited sequence of the transcripts is known for only a limited number of

kinetoplastids, and only the salivarian trypanosomes have the same general life cycle; other

kinetoplastids, like Leishmania and T. cruzi, have evolved different infective cycles and are

under very different selective pressures [113,126,127].

Figure 9. Positions of stop codons on all RFs of the edited genes in T. brucei. For each gene,
reading frame 1 (RF1) is designated as the protein ORF previously identified. Hypothetical
dual-coding reading frames are shown in red. A6 = ATPase 6; CO = Cytochrome Oxidase; CYb =
Cytochrome b; Murf = Maxicircle unidentified reading frame; ND3–ND9 = NADH
Dehydrogenase subunits.

Fully edited sequences are known for T.

vivax, the earliest branching salivarian trypanosome [52,128]. T. vivax differs from T. brucei in 

that it completes the insect phase of its life cycle entirely within the proboscis of the fly.

This parasite has been described as an intermediate stage in the evolutionary pathway from 

mechanical transmission (ancestral) to full adaptation to the midgut and salivary glands of the 

tsetse fly [129]. Using the T. vivax sequence, we analyzed mutation patterns in all of the 

mitochondrially-encoded mRNAs (Figure 10). mRNA sequences were aligned by codons based 

on their protein alignments (Clustal Omega [116]). Mutated codons were identified and 

classified as silent, missense and nonsense mutations. Missense mutations were further divided 

into three groups based on the PAM 250 matrix [117]. These data clearly show that the RNA 

editing process significantly constrains the types of mutations tolerated within the 

mitochondrial genome. In comparison to the genes that are not edited (ND1, ND2, ND4, ND5, 

COI) or have limited editing (CYb, Murf II and COII), a distinct suppression of silent mutations

and strongly conserved missense mutations was observed for all of the pan-edited genes,

consistent with previous observations (Figure 10) [125]. A suppression of mutations that lead to 

modestly conserved amino acid replacements was also observed, but this was not as

striking due to the low frequency of this type of mutation. No significant difference was 

observed in the frequency of not conserved missense mutations, though a trend towards a 

lower frequency of these mutations in the putative dual-coding genes (CR3, CR4, ND3, ND9, 

5’ND7 and RPS12) was noted. This was complemented by a significant increase in the frequency 

of strongly conserved missense mutations in the putative dual-coding genes in comparison to 

the other pan-edited genes (3’ND7, ND8, A6 and COIII). 

 


 

Figure 10. Mutational frequencies in mitochondrially encoded genes categorized by effect on 
amino acid sequence. T. brucei and T. vivax pan-edited dual-coding (PanEd DC), pan-edited 
nondual-coding (PanEd NDC), partially edited (PartEd), and never edited (NevEd) mRNA 
sequences were aligned based on their amino acid alignment (reading frame 1, defined as the 
reading frame encoding the gene product previously annotated in the literature) [116]. 
Mutations were categorized as silent (Si), strongly conserved (SC), modestly conserved (MC), 
not conserved (NC) or nonsense (Non). The amount of conservation was determined using the 
PAM 250 matrix, where conversions with a value 0<x≤0.5 were considered modestly conserved, 
and conversions with a value >0.5 were considered strongly conserved, and conversions with a 
value ≤0 were considered not conserved. Error bars depict standard error. * p<0.05, ** p<0.01 
(unpaired t-test). 
 

Surprisingly, while the overall mutational frequency of the fully edited pan-edited genes 

was similar, a comparison of the conservation of editing patterns did show a significant 

difference between the putative dual-coding and the other pan-edited genes (Figure 11). The 

dual-coding genes consistently had a lower conservation of their editing pattern. Upon further 

examination, we found that most changes in the editing pattern resulted from thymidine 

insertions and deletions within the maxicircle DNA sequence, which were then corrected by the

editing machinery. These types of mutations do not result in a change to the final mRNA 

 


sequence once edited. The T. brucei (Tb) dual-coding genes appeared to consistently insert 

more U-residues, while T. vivax (Tv) had more U-residues encoded within the DNA sequence. 

Indeed, comparisons of the length of the coding regions of Tb and Tv cryptogenes (unedited 

sequence) show that the putative dual-coding genes are almost 10% shorter in Tb. In contrast, 

the nondual-coding cryptogenes are not significantly shorter (~2.5%). Some of the other

changes in editing patterns did generate small internal frameshifts as previously described by 

Landweber and Gilbert [125]. However, the high prevalence of internal frameshifts reported for 

COIII by Landweber and Gilbert is reflected in our analysis only for COIII and A6. 

Figure 11. Percent conservation of editing patterns between T. brucei and T. vivax. Alignment 
of the fully edited mRNAs was based on ACG sequence (see APPENDIX G). Each editing site was 
defined as a site on at least one of the two aligned mRNAs where an editing event occurred. 
Sites were then classified as identical, only identified in one of the two sequences, type 
switched (one site is an insertion and the other is a deletion), or altered in length. Error bars 
depict standard error. * p<0.05, ** p< 0.01 (unpaired t-test). 

 

 


Since differences in the types of amino acid mutations were observed, we performed a 

principal component analysis on the frequency of mutation types for all three reading frames of 

the pan-edited genes (Figure 12). In addition, we included the percentage of editing site 

conservation as a variable. This analysis clearly clustered the putative +1 dual-coding transcripts 

(reading frame 2). The first component (z-axis) is strongly based on editing conservation, and

separates the dual-coding genes from the other pan-edited genes as expected. While 

component 2 (x-axis) separated ORF1 and ORF3 from ORF2 of each gene, component 3 clearly 

separated the dual-coding ORF2s from nondual-coding ORF2s. The ND7N ORF3 was the only 

exception, and the gRNA data suggest that it is a dual-coding gene using the +2 (ORF3) reading

frame. This suggests that an additional layer of mutational constraint beyond that imposed by 

the RNA editing process can be detected for six of the extensively edited transcripts. 

Because dual-coding genes are often conserved in multiple species, we analyzed the 

available sequences of other kinetoplastids (Leishmania tarentolae (Lt), Leishmania mexicana 

amazonensis (Lma), Phytomonas serpens (Ps), Perkinsela CCAP1560/4 (Pk)) to determine if they 

also contain multiple overlapping reading frames with homology to those found in T. brucei. 

Interestingly, many of the alternative reading frames did show some homology to the ARFs 

found in Tb. However, most of these ARFs are punctuated with stop codons (APPENDIX H). 

Extended alternative reading frames are found in CR3, 5'ND7 and RPS12 in Ps. However, the 

extended ARF in the Ps CR3 is in the +2 reading frame and the ND7 and RPS12 ARFs show very

little homology with the Tb/Tv ARF (APPENDIX H Parts A, D and F) [130]. Interestingly, while 

Perkinsela has lost many of the genes in the mitochondria, RPS12 was retained [131]. The Pk 

RPS12 ARF possesses a near full open reading frame with one stop codon three codons after an

in-frame start near the 5' end of the edited transcript.

Figure 12. Principal component analysis of frequency of amino acid mutation types and
editing conservation between T. brucei and T. vivax pan-edited transcripts. A. First factorial
plane (z-axis: first component, x-axis: second component, y-axis: third component). ND7N = ND7
5' editing domain, ND7C = ND7 3' editing domain. ORF2 and ORF3 are defined as the +1 and +2
reading frames, respectively. B. Histogram of eigenvalues for first six components. Eigenvalues
represent the amount of the variance accounted for by each component. C. Absolute
contribution of each analyzed mutation frequency to components 1, 2, and 3. Amount of
conservation was determined using the PAM 250 matrix as described in Figure 10. Mutation
type: SC = strongly conserved, MC = modestly conserved, NC = Not conserved, Non =
Nonsense, InDel = insertion or deletion. Editing conservation (EdCon) was determined using
alignments of edited mRNAs (APPENDIX G). Aligned editing sites were characterized as identical
or altered.

This pattern is reminiscent of the

conventionally edited sequences of CR3 and ND7, and could suggest that an alternative edit 

 


may remove the stop codon, allowing access to the ARF. The L. tarentolae CR4 orthologue also 

has two extended ORFs. Interestingly, the published sequence for Lt CR4 appears to switch 

between the two ORFs (switch appears to occur in a stretch of 13 inserted Us) [77]. This may 

explain why only the carboxyl half of the published Lt CR4 showed good homology with Tb and 

Lma [132]. Translation of the Lt ARF does generate a protein with the N-terminus showing high 

homology to the conventional Tb and Lma CR4, while translation of the published ORF shows 

some homology to the Tb CR4 ARF (APPENDIX H Part B). These data are intriguing enough that 

these sequences should be re-examined. While most of the other pan-edited transcripts had 

multiple stop codons in the +1 and +2 reading frames, many did show good homology to the Tb 

ARF sequences. Particularly intriguing are the ND3, ND8 and ND9 alignments. While internal 

stop codons are found in Tv, Lt and Lma ND9 ARFs, they show strong homology to the Tb ND9 

ARF throughout the protein (APPENDIX H Part E). In ND3, the amino ends of the ARFs show 

strong homology between all four of the Trypanosoma and Leishmania species (APPENDIX H 

Part C). This homology decreases after an internal stop codon found in the same position in 3 of 

the 4 species. As ND8 is the only other pan-edited gene in Lt, Lma and Ps, we also examined the

conservation of the ND8 ORF and ARF, even though our mutational analyses did not tag the 

ND8 gene as dual-coding. While the ND8 ARFs were punctuated by multiple stop codons, they 

surprisingly also showed areas of strong homology between all 5 species, especially downstream

of an internal methionine (APPENDIX H Part G). We do note that we cannot rule out the

possibility that alternative editing can remove stop codons observed in the ARFs. 

Analyses of the ARF predicted proteins suggest that they are all short transmembrane 

proteins with two or more predicted transmembrane alpha helices (Figure 13) [133,134]. While 

 


functional homologues are often difficult to detect in trypanosomes, searches using the 

predicted protein sequence of each ARF did identify small molecule transport proteins with 

limited confidence. Using Phyre2, the ND7 ARF was identified as a homolog of the bacterial 

sugar transporter SemiSWEET (61.5% confidence) [135,136]. SemiSWEET, which forms 

homodimeric structures, is also a distant homolog of the yeast mitochondrial pyruvate carrier 1 

(MPC1). This protein has two transmembrane alpha helices and forms a heterodimer with 

either of the other two pyruvate carrier proteins [137,138]. While still very speculative, it is 

intriguing that the small ARF proteins might oligomerize to form small mitochondrial membrane 

transporters. 

Figure 13. Amino acid sequences of ARFs of dual-coding genes. Predicted transmembrane 
regions are shaded in gray [133,134]. *No start codon was identified and the amino acid 
sequence shown begins at the 5' end or after the first stop codon at the 5' end. Exclamation 
point indicates premature termination codon. 
 

 

Discussion 

The work presented here suggests that as many as six of the extensively edited mRNAs

in T. brucei are dual-coding and that it is alternative editing using different terminal gRNAs that 

allows access to the two different reading frames. Deep sequencing of the 5' end of CR3 

indicates fully edited transcripts that have access to both reading frames are present in the 

mitochondrial transcriptome and gRNA analyses indicate that three different cell lines contain 

 


gRNAs that can alternatively edit the 5' ends of CR3, RPS12 and ND7. In addition, analyses of 

the mutational bias in pan-edited genes suggest that an additional layer of mutational 

constraint is observed in the putative dual-coding genes. While the overall mutational 

frequency observed for the fully edited mRNAs is similar for all pan-edited genes, the types of 

amino acid changes that appear to be tolerated are significantly different. This is consistent 

with these genes having to maintain functional proteins in two different reading frames. 

Analyses of other trypanosomes do show that some of the ARFs have intriguing homology to 

the ARFs identified in T. brucei and T. vivax. However, most of the ARFs are punctuated with 

stop codons. These data are difficult to interpret because we cannot rule out the possibility that 

the stop codons are removed by alternative editing events. In addition, the other trypanosome 

species have evolved very different infective life cycles and are under different selective 

pressures. For example, P. serpens is a pathogen that infects important crops and is transmitted 

by sap-feeding bugs. These parasites have glucose readily available in both life cycle stages and 

are unique in that they lack a fully functional respiratory electron transport chain [64,65]. For 

Leishmania, all life cycle stages possess an active Krebs cycle and ETC linked to the generation 

of ATP [61,62,139]. These unique adaptations to different hosts suggest that they may not be 

under the same evolutionary pressure to maintain dual-coding genes. 

Overlapping reading frames are common in viruses, and are thought to persist due to 

strong genome size constraints [87,88]. More recently, however, overlapping genes have been 

identified in mammalian and bacterial genomes [89–92]. In these organisms, size is not an issue 

and the potential advantage of overlapping genes is less clear. For dual-coding genes, the need 

to maintain both ORFs constrains the ability of each protein to become optimally adapted [93]. 

 


As this constraint can be alleviated by gene duplication, it is thought that dual-coding regions 

can survive long evolutionary spans only if the overlap provides a selective advantage. In 

mammals, many of the identified dual-coding genes, like Gnas1 and XBP1, produce two proteins 

that bind and regulate each other [94,95]. For these proteins, dual-coding may be 

advantageous for the tight co-expression needed. An alternative model suggests that under 

high mutation rates, the overlapping of critical nucleotide residues is advantageous because it 

may reduce the target size for lethal mutations [96]. This may be particularly important for 

organisms that have evolved to exist in dual-metabolic environments (two hosts). We 

hypothesize that the trypanosome mitochondrial ARFs encode small metabolite transporters 

that provide a distinct growth advantage to bloodstream form parasites. The complete overlap 

of these small transporter genes with electron transport chain (ETC) genes would protect the 

integrity of the ETC genes that are required only in the insect host. Thus, in trypanosomes, dual-

coding genes may be a mechanism to combat genetic drift during extended periods of growth 

in non-selective environments. In T. brucei, it is known that a number of bloodstream form 

essential proteins are functionally linked to Krebs cycle or ETC genes. While not a “classic” 

dual-coding gene in that production of the alternative protein does not involve overlapping 

reading frames, the pan-edited COIII gene does contain the information for two distinct 

proteins, COIII and AEP-1. AEP-1 is important for kinetoplastid DNA maintenance and 

overexpression of the DNA-binding domain results in a dominant negative phenotype including 

decreased cell growth and aberrant mitochondrial DNA structure [68]. The nuclear encoded α-

ketoglutarate dehydrogenase E2 (α-KDE2) is known to be a dual-function protein, in that it 

plays important roles in both the Krebs cycle and in mitochondrial DNA inheritance [97]. RNAi 

 


knockdowns of this gene in bloodstream form (BF) trypanosomes also show a pronounced 

reduction in cell growth. Similarly, the Krebs cycle enzyme α-ketoglutarate decarboxylase (α-

KDE1) is also a dual-function protein with overlapping targeting signals that allow it to be 

localized to both the mitochondrion and glycosomes [98]. RNAi knockdown of α-KDE1 in BF 

trypanosomes is lethal, suggesting that in addition to its enzymatic role in the Krebs cycle, it 

plays an essential role in glycosomal function in T. brucei [98]. It has been previously suggested 

that both alternative editing and dual-function proteins are important mechanisms for 

expanding the functional diversity of proteins found in trypanosomes [67,97–99]. We 

hypothesize that, in salivarian trypanosomes, an equally important role for these dual-coding/function 

genes may be the protection of genetic information. 

The ‘why’ of the unique RNA editing process in kinetoplastids has been a long-standing 

paradox. The complex machinery and the sheer number of gRNAs required to direct the 

thousands of U-insertion/deletions indicate that this process is metabolically very costly. 

Initially, it was proposed that U-insertion/deletion editing (kRNA editing) was one of many RNA 

editing processes that were in fact relics of the RNA world. However, the very different 

mechanism of the RNA editing systems in existence, and their very limited distribution within 

specific groups of organisms indicate that they are more likely derived traits that evolved later 

in evolution [69,70]. The sheer complexity of the kRNA editing process, with no obvious 

selective advantage, led to the proposal that insertion/deletion editing arose via a constructive 

neutral evolution (CNE) pathway [71]. Indeed, RNA editing in trypanosomes is always 

mentioned in support of CNE as an example of how seemingly non-advantageous, complex 

processes can arise [72,73]. More recently, however, it has been hypothesized that RNA editing 

 


co-evolved with G-quadruplex structures found in the pre-edited mRNAs [74]. These structures 

are thought to be advantageous in that they can help regulate transcription in order to 

promote DNA replication and prevent kinetoplast DNA loss. However, they must be removed by 

the RNA editing system prior to translation [74]. Another prominent hypothesis is that RNA 

editing is advantageous because it is a mechanism by which an organism can fragment and 

scatter essential genetic information throughout a genome [75,76]. Kinetoplast DNA is far less 

stable than chromosomal DNA, and loss of minicircles due to asymmetric division of the kDNA 

network has been frequently observed, especially in laboratory cultures of Leishmania [76,77]. 

Buhrman et al. [76] suggest that the scattering of essential guide RNA genes throughout the 

DNA network would prevent fast-growing deletion mutants from outcompeting more 

metabolically versatile parasites during growth in the mammalian host. Using a mathematical 

model of gene fragmentation in changing environments (absence of functional selection), they 

showed a distinct advantage for gene fragmentation. In their model, the number of tolerable 

generations under periods of relaxed selective pressure was increased by more than 40% 

before loss of the ability to move to the next life cycle stage. If the dual-coding ARFs give BF 

trypanosomes a selective growth advantage similar to that observed for the COIII alternative 

protein AEP-1, then the number of ‘essential’ gRNA genes would increase greatly. Currently, 

only AEP-1, A6 and RPS12 mitochondrial genes have been experimentally shown to be essential 

[68,100,110]. In addition, the presence of alternative editing and dual-coding genes would 

complement the protection provided by gene fragmentation by also shielding the genes from 

deleterious point mutations within critical ETC genes. This suggests that the complex RNA 

editing system found in the mitochondria may therefore provide multiple molecular strategies 

 


to increase genetic robustness. Protection of the mitochondrial genome during growth in the 

mammal would increase the capacity for successful transfer to an insect vector and maximize 

the parasite's long-term survival and spread. 

Acknowledgments 

We thank the Ken Stuart Lab for trypanosome cell lines and Chris Adami for helpful discussions. 

We would also like to acknowledge the Dr. Marvis A. Richardson Endowed Fellowship Fund for 

their recognition of LEK. 

 

 

 


CHAPTER 4: ANALYSIS OF THREE PAN-EDITED MRNAS REVEALS 

DUAL-CODING GENES AND COMPLEX MULTIPATH EDITING 

Abstract 

Trypanosoma brucei is a single celled eukaryote that possesses a highly complex RNA 

editing system. In this system, a large set of small RNAs, called guide RNAs, direct the insertion 

and deletion of uridines in mitochondrial mRNAs.  These changes extensively alter the target 

mRNAs, up to doubling them in length.  Recently, mutational analysis showed that several of 

the edited genes possess the capacity to encode two different protein products.  These 

overlapping reading frames could be accessed through alternative RNA editing that shifts the 

translated reading frame.  In this study, we analyzed the editing patterns of three putative dual-

coding genes, ribosomal protein S12 (RPS12), the 5’ editing domain of NADH dehydrogenase 

subunit 7 (ND7 5’), and C-rich region 3 (CR3).  We found evidence that fully edited ND7 5’ and 

CR3 can be translated in more than one reading frame.  Moreover, we found that CR3 has a 

complex set of editing pathways that vary substantially between cell lines, and that changing 

available energy sources also alters the editing preferences of CR3 and ND7 5’.  These findings 

suggest that editing patterns can be influenced by the current environment, and that 

alternative editing may be utilized by the trypanosomes to introduce variation within this 

fragile editing system.   

 

 


 

Introduction 

Trypanosoma brucei is a member of the Kinetoplastea, a group of protozoans 

characterized by a large network of DNA in their mitochondria known as the kinetoplast [1]. 

The kinetoplast is composed of two types of concatenated circular DNA molecules: maxicircles 

and minicircles. The maxicircles all encode mitochondrial ribosomal RNAs as well as 18 protein 

coding genes, most of which are components of the electron transport chain. The 

approximately 30-50 identical copies of the maxicircle make up a relatively small proportion of 

the kinetoplast [5]. Most of the DNA network is composed of between 5,000 and 10,000 1 kb minicircles, 

each of which encodes 2-5 small non-coding guide RNAs (gRNAs) [8,33]. These gRNAs are used 

in the process of RNA editing.  In T. brucei, RNA editing consists of specific uridine insertion and 

deletion events that render 12 of the 18 mitochondrially encoded mRNAs translatable [4]. The 

gRNAs act as templates for the large editosome complex which cleaves the mRNA, inserts or 

deletes the correct number of uridines and then re-ligates the mRNA in an energy intensive 

process. This is repeated until the mRNA is complementary to the small gRNA. Each gRNA 

directs edits that generate the anchor region for the next gRNA, thus the RNA editing process is 

sequentially dependent on correct editing by each gRNA. As editing of some of the extensively 

edited mRNAs can involve upwards of 40 gRNAs, this renders the process incredibly fragile 

[140]. We hypothesize that such an expensive and fragile process evolved in response to the 

unique life cycle of T. brucei.  

T. brucei is a dixenous parasite, invading the bloodstream of a mammalian host and 

being transmitted between hosts by the bite of a tsetse fly. Once taken up in a blood meal by the 

tsetse fly, T. brucei transitions into the replicating procyclic state in the midgut, and the energy 

 


T. brucei requires for this replication is gained through metabolism of amino acids [19,20]. This 

is accomplished through use of a portion of the Krebs cycle and the electron transport chain 

(ETC), thus most of the ATP required is produced by the mitochondria [19–21]. This stage of the 

life cycle is followed by a dramatic bottleneck when the trypanosomes transition from the 

midgut to the salivary glands of the tsetse fly [22,23]. From the salivary glands, trypanosomes 

are then refluxed into their next mammalian host during a bloodmeal. Once the parasite is 

deposited into its mammalian host, it quickly transitions to utilizing glycolysis for its energy 

generation, removing the requirement for ATP production in the mitochondria [15]. While in 

the mammalian host, T. brucei lives entirely extracellularly. It is frequently subject to attacks by 

the host’s adaptive immune system, and the population evades these attacks through antigenic 

variation [12]. This part of the life cycle can be quite long, with the longest known infection 

lasting 29 years [13]. This life cycle should make T. brucei particularly sensitive to genetic drift,  

especially for those genes which are not under selection (Krebs cycle and ETC) and should 

make them extremely vulnerable to Muller’s ratchet (the gradual increase of mutational load 

that eventually leads to extinction) [81–84]. One mechanism for protecting small asexual 

populations is by increasing the severity of the mutations that can occur. If mutations severely 

impact fitness, mutated individuals are selected out, preventing their fixation [79].  Recent 

computer modeling studies suggest that small asexual populations can evolve this type of 

mechanism (termed “drift robustness”) in order to maintain fitness [80].  The sequential 

dependence of the kRNA editing process implies that the system is inherently fragile to 

mutations. Even a single point mutation can drastically change the editing pattern, and stop the 

editing process, aborting expression of the protein. Hence, the RNA editing process may 

 


operate as a proof-reading system to weed out mutations by making them lethal. This is 

effective, however, only if the mitochondrial genes are under selection. Previously, we showed 

that many of the mitochondrially pan-edited genes have a distinct mutational bias that is 

suggestive of dual-coding genes (coding two proteins by overlapping reading frames) [141]. The 

overlapping of ETC genes not under selection in the bloodstream stage with genes that are 

under selection during this stage of the life cycle would prevent the accumulation of 

mutations. As the extensively overlapped genes share most gRNAs, this strategy would ensure 

that almost all of the genetic material is protected. 

Our analyses suggested that out of the twelve pan-edited genes in T. brucei, six are 

potentially dual coding, and that the RNA editing system is used to determine which reading 

frame is accessed. In order to determine if mRNA transcripts with access to multiple open 

reading frames (ORFs) exist within the mitochondrial transcriptome, we deep sequenced the 

mRNA transcript populations of three putative dual coding genes: ribosomal protein S12 

(RPS12), the 5’ editing domain of NADH dehydrogenase subunit 7 (ND7 5’), and C-rich region 3 

(CR3). Using the previously generated gRNA transcriptomes, we constructed detailed editing 

pathways for each of these genes. The editing pathway of RPS12 was primarily linear, reflecting 

the high degree of conservation required for a gene that is essential [100,107]. We found no 

evidence of utilization of the gRNA that provides access to the alternative reading frame [141]. 

In contrast, we did identify transcripts using different reading frames for both CR3 and ND7 5’. 

This study indicates that RNA editing can be used to access multiple open reading frames using 

two different methods: in ND7 5’, different gRNAs bring alternate start codons into frame and 

in CR3, different gRNAs can shift the reading frame of the existing start codon. In addition, CR3 

 


showed incredible editing diversity, in that two different cell lines showed very different editing 

patterns, using different sets of gRNAs to edit the CR3 cryptogene. This suggests that the use of 

a gRNA-guided editing system can also dramatically increase protein diversity in spite of an 

incredibly rigid and mutationally fragile system.  
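
The frame-shifting mechanism described for CR3 can be illustrated with a minimal Python sketch. The sequences below are hypothetical toy strings, not real CR3 transcripts, and the function name is our own. The sketch assumes that UAA and UAG are the only stop codons, since UGA encodes tryptophan in trypanosome mitochondria. It shows how a single inserted U after an unchanged AUG changes every downstream codon.

```python
# Stop codons under the assumed mitochondrial code (UGA is read as Trp).
STOPS = {"UAA", "UAG"}

def orfs(mrna):
    """Return (start index, codon list) for each AUG-initiated ORF ending at a stop."""
    mrna = mrna.upper().replace("T", "U")
    found = []
    for start in range(len(mrna) - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        codons = []
        for i in range(start, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if codon in STOPS:
                found.append((start, codons))
                break
            codons.append(codon)
    return found

# Hypothetical downstream sequence shared by both editing outcomes.
DOWNSTREAM = "GAUUUGUGUUAGAUUAA"
canonical = "AUG" + DOWNSTREAM          # start codon directly followed by the body
alternative = "AUG" + "U" + DOWNSTREAM  # one extra U inserted after the start codon

canonical_orfs = orfs(canonical)
alternative_orfs = orfs(alternative)
```

In this toy case both transcripts keep the same start codon, but the alternative transcript reads the shared downstream sequence in a different frame and terminates at a different stop codon.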

Materials and Methods 

T. brucei culture and RNA Isolation 

T. brucei clones from strains EATRO 164 and TREU 667 were grown in SDM79 and 

harvested as previously described [9].  EATRO 164 cells grown in SDM79 were then gradually 

transitioned to SDM80 using serial 1:3 dilutions when cells reached a density of at least 5 × 10^6 

cells/mL.  SDM80 was prepared as described by Lamour et al. with the exception of using 

undialyzed FBS, and reducing the amount of FBS added by half [142].  This results in the final 

concentration of glucose being 0.5 mM instead of 0.15 mM.  This concentration is still well 

below that of SDM79, which has a glucose concentration of 6 mM. Once cells had been 

acclimated to SDM80, cells were harvested as previously described [9].  Mitochondrial vesicles 

were isolated using differential spins and mitochondrial RNA was then isolated from vesicles as 

previously described [9].   

Preparation, Sequencing, and Analysis of mRNAs 

cDNAs were generated from isolated RNAs using the Applied Biosystems High Capacity 

cDNA Reverse Transcription Kit.  CR3, RPS12, and ND7 5' editing domain cDNAs were amplified 

via PCR using the following primers (each primer begins with a tag region used in the deep 

sequencing reaction, ACACTGACGACATGGTTCTACA on the 5' primers and 

TACGGTAGCAGAGACTTGGTCT on the 3' primers, followed by the gene-specific sequence): 

 


CR3 5': ACACTGACGACATGGTTCTACAAGAAATATAAATATGTG 

CR3 3' Short: TACGGTAGCAGAGACTTGGTCTACAAAAATTATTTGCATACTT 

CR3 3’ Extended: TACGGTAGCAGAGACTTGGTCTACAAAAATTATTTGCATACTTTTTT 

RPS12 5': ACACTGACGACATGGTTCTACACTAATACACTTTTG 

RPS12 3': TACGGTAGCAGAGACTTGGTCTAAAAACATATCTTAT 

ND7 5’: ACACTGACGACATGGTTCTACAGATACAAAAAAACATGAC 

ND7 3’: TACGGTAGCAGAGACTTGGTCTCTTTTATATTCACATAACTTTTCTGTAC 
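
The tag and gene-specific portions of each primer can be separated programmatically. The following Python sketch assumes that the tag is the longest prefix shared by all primers of the same orientation. That assumption holds for the sequences listed above, but it would overshoot if every gene-specific region happened to begin with the same base. The function name is our own.

```python
import os

# Primer sequences copied from the list above (5' to 3').
FORWARD = {
    "CR3 5'": "ACACTGACGACATGGTTCTACAAGAAATATAAATATGTG",
    "RPS12 5'": "ACACTGACGACATGGTTCTACACTAATACACTTTTG",
    "ND7 5'": "ACACTGACGACATGGTTCTACAGATACAAAAAAACATGAC",
}
REVERSE = {
    "CR3 3' Short": "TACGGTAGCAGAGACTTGGTCTACAAAAATTATTTGCATACTT",
    "CR3 3' Extended": "TACGGTAGCAGAGACTTGGTCTACAAAAATTATTTGCATACTTTTTT",
    "RPS12 3'": "TACGGTAGCAGAGACTTGGTCTAAAAACATATCTTAT",
    "ND7 3'": "TACGGTAGCAGAGACTTGGTCTCTTTTATATTCACATAACTTTTCTGTAC",
}

def split_primers(primers):
    """Return (shared tag, {name: gene-specific sequence}) for one orientation."""
    # os.path.commonprefix compares strings character by character.
    tag = os.path.commonprefix(list(primers.values()))
    return tag, {name: seq[len(tag):] for name, seq in primers.items()}

fwd_tag, fwd_specific = split_primers(FORWARD)
rev_tag, rev_specific = split_primers(REVERSE)
```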

Amplified cDNAs from EATRO 164 cells grown in SDM79, EATRO 164 cells grown in 

SDM80, and TREU 667 cells grown in SDM79 were individually barcoded and combined in equal 

molar amounts.  Samples were sequenced in a 2x250bp paired end format (PE250) using an 

Illumina MiSeq Standard flow cell and 500 cycle reagent cartridge, version 2. Sequence data 

was preprocessed as previously described [141].   

Sequence data was then separated by cell line, growth media and gene.  Sequence data 

was then analyzed using a new pipeline and program called SKETCH (Segmentation of 

Kinetoplast Edited Transcripts to Characterize editing Heterogeneity).  This program allowed us 

to classify mRNAs at the block editing level and determine which editing patterns were most 

prominent.   

For each set of sequences, SKETCH removed low-quality sequences containing more than 

5 mismatches to the unedited template, disregarding uridines.   

In order to classify the editing patterns observed in the mRNA transcripts, SKETCH requires a set 

of template sequences.  Initially, the templates supplied to SKETCH were the conventional fully 

edited and unedited sequences for each of the three genes examined.  These sequences were 

 


then segmented based on the editing blocks previously defined by the locations of gRNA 

populations [9].  Each transcript was then classified by editing block, with each block being 

classified as matching the unedited sequence, matching the fully edited sequence or being 

unknown.  After the initial characterization of the transcripts, the most abundant unknown 

sequences for each editing block were then added to the reference pool.  Sequences were then 

reclassified by SKETCH based on the newly added reference sequences.  This process was 

repeated until the most abundant forms of editing were identified.  SKETCH code is available 

upon request. To validate the newly identified editing patterns as true alternatives, the new 

sequences were screened against the gRNA transcriptome as previously described [9].  

Sequences with a gRNA match were then considered valid alternative edits.   
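
The filtering and block classification steps described above can be sketched in Python as follows. This is not the SKETCH code itself. All function names, the toy sequences, and the block boundary are hypothetical. The sketch also makes two assumptions: mismatches are counted position by position after removing all Us, and blocks are cut at fixed positions in the non-U backbone. The gRNA validation step is not included.

```python
from collections import Counter

def to_dna(seq):
    """Normalize to upper-case cDNA letters so that U and T are treated alike."""
    return seq.upper().replace("U", "T")

def skeleton(seq):
    """Drop every T/U, leaving only the bases that editing never changes."""
    return to_dna(seq).replace("T", "")

def passes_filter(read, unedited, max_mismatches=5):
    """Keep a read if its U-free skeleton has at most max_mismatches differences."""
    a, b = skeleton(read), skeleton(unedited)
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches <= max_mismatches

def split_blocks(seq, boundaries):
    """Cut seq just before the non-U base whose skeleton index is a boundary."""
    blocks, current, seen = [], [], 0
    cuts = sorted(boundaries)
    for base in to_dna(seq):
        if base != "T" and cuts and seen == cuts[0]:
            blocks.append("".join(current))
            current, cuts = [], cuts[1:]
        current.append(base)
        if base != "T":
            seen += 1
    blocks.append("".join(current))
    return blocks

def build_references(unedited, edited, boundaries):
    """One {block sequence: label} dictionary per editing block."""
    pairs = zip(split_blocks(unedited, boundaries), split_blocks(edited, boundaries))
    return [{u: "unedited", e: "edited"} for u, e in pairs]

def classify(read, references, boundaries):
    """Label each block as unedited, edited, a named alternative, or unknown."""
    blocks = split_blocks(read, boundaries)
    return [references[i].get(block, "unknown") for i, block in enumerate(blocks)]

def top_unknown(reads, references, boundaries):
    """Most abundant (block index, sequence) pair not yet in the reference pool."""
    counts = Counter(
        (i, block)
        for read in reads
        for i, block in enumerate(split_blocks(read, boundaries))
        if block not in references[i]
    )
    return counts.most_common(1)[0] if counts else None

# Toy data: hypothetical sequences, not real CR3, RPS12 or ND7 templates.
UNEDITED = "GCATTGAGCA"
EDITED = "GCAGATTGTCA"
BOUNDS = [4]  # two blocks, split before the fifth non-U base
reads = ["GCATTGATTGTCA", "GCAGATGTTCA", "GCAGATGTTCA", "GGGGGGGGGGGG"]

kept = [r for r in reads if passes_filter(r, UNEDITED)]
refs = build_references(UNEDITED, EDITED, BOUNDS)
first_pass = [classify(r, refs, BOUNDS) for r in kept]

# Promote the most abundant unknown block to the reference pool and reclassify.
(block_index, block_seq), count = top_unknown(kept, refs, BOUNDS)
refs[block_index][block_seq] = "alt1"
second_pass = [classify(r, refs, BOUNDS) for r in kept]
```

Repeating the last three lines until no abundant unknown block remains mirrors the iterative refinement described above.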

Uridine deletion analysis 

For RPS12 and ND7 5’, once full editing pathways were characterized, editing sites with 

DNA encoded Us were identified. Each encoded U site was then characterized based on the 

proximity of the preceding gRNAs’ 3’ poly-U tail as well as whether all of the uridines at the site in 

question were deleted in the final fully edited sequence.  For each site, a window was defined 

consisting of the 6 sites upstream and 6 sites downstream of the site in question.  Using these 

parameters, the mRNA transcripts of RPS12 and ND7 were analyzed at each deletion window.  

The window of each transcript for each encoded U was examined and classified as unedited, 

fully edited or partially edited.  Partially edited sequences were then classified based on the 

editing state of the encoded U site.  For sequences with total deletions, each editing sequence 

was then classified based on the states in the 3' end of the window as either matching the fully 

edited sequence or not.  Code available upon request. 
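
A minimal Python sketch of the window classification is given below. It is not the code used in this study. It assumes that each transcript is represented as the number of Us found 5' of each non-U base, and that an editing site is indexed by its position in that list. The function names and toy sequences are hypothetical, and the demonstration uses a flank of 2 sites rather than 6 to keep the sequences short.

```python
def u_counts(seq):
    """U count 5' of each non-U base, with trailing Us as the last entry."""
    counts, run = [], 0
    for base in seq.upper().replace("U", "T"):
        if base == "T":
            run += 1
        else:
            counts.append(run)
            run = 0
    counts.append(run)
    return counts

def classify_window(read, unedited, edited, site, flank=6):
    """Return (window state, state of the encoded-U site) for one read."""
    r, u, e = u_counts(read), u_counts(unedited), u_counts(edited)
    if not len(r) == len(u) == len(e):
        raise ValueError("sequences must share the same non-U skeleton length")
    lo, hi = max(0, site - flank), site + flank + 1
    if r[lo:hi] == u[lo:hi]:
        window = "unedited"
    elif r[lo:hi] == e[lo:hi]:
        window = "fully edited"
    else:
        window = "partially edited"
    if r[site] == e[site]:
        focal = "matches edited"
    elif r[site] == u[site]:
        focal = "matches unedited"
    else:
        focal = "other"
    return window, focal

def three_prime_edited(read, edited, site, flank=6):
    """True if the sites 3' of the encoded-U site match the fully edited sequence."""
    r, e = u_counts(read), u_counts(edited)
    return r[site + 1:site + flank + 1] == e[site + 1:site + flank + 1]

# Toy data: hypothetical sequences with two encoded Us at site 3.
UNEDITED = "GCATTGAGCA"
EDITED = "GCAGATTGTCA"
SITE = 3

retained = classify_window("GCATTGATTGTCA", UNEDITED, EDITED, SITE, flank=2)
partial_deletion = classify_window("GCATGATTGTCA", UNEDITED, EDITED, SITE, flank=2)
complete = classify_window(EDITED, UNEDITED, EDITED, SITE, flank=2)
deleted_only = three_prime_edited("GCAGAGCA", EDITED, SITE, flank=2)
```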

 


Results 

In order to confirm that transcripts with access to two reading frames exist in vivo, we 

analyzed the mRNA transcriptomes for three of the putative dual-coding genes, RPS12, ND7 5’ 

and CR3. This mRNA deep sequencing data was then used in combination with the sequenced 

gRNA transcriptomes, to generate precise editing pathway maps. In order to determine how 

robust the observed editing pathways were, we characterized editing in two different cell 

lines, TREU 667 and EATRO 164. In addition, we examined the effect of energy source on these 

editing pathways by using two different media, SDM79 and SDM80. SDM79 is the standard 

medium used to grow the procyclic stage parasite. However, it contains 6mM glucose, and 

experiments have shown that under these levels of glucose, the procyclic stage can grow in the 

absence of electron transport chain (ETC) activity [112,142–147]. The SDM80 medium was 

developed to more closely resemble insect gut conditions and has very low glucose 

concentrations [142]. Trypanosome growth in this medium requires ATP production using the 

ETC [142].  

RPS12 is an essential component of the mitochondrial ribosome [30,100,107]. RPS12 is 

extensively edited (pan-edited) with 132 Us inserted and 28 Us deleted. Full editing is directed 

by 12 populations of gRNAs (defined as a group of gRNAs that edit the same region of an 

mRNA) [9,30].  In this analysis, we identify 10 populations, with three of the previously 

identified populations being combined with other populations that shared a very high amount 

of overlap.  One new population (F) was identified through a search of the gRNA transcriptome 

under reduced stringency.  Analyses of the canonical editing pattern indicate that there are two 

long ORFs, and mutational bias analyses indicate that both ORFs may be selected for [141].  The 

 


longest ORF encodes the RPS12 protein and encompasses a second shorter ORF of unknown 

function [30].  Northern blots revealed that edited RPS12 mRNAs were found in both life cycle 

stages, however, edited mRNAs were more abundant in bloodstream form than procyclic form 

trypanosomes [30].   

Because RPS12 is essential, we expected it to have a very robust editing pattern in both 

cell lines, as well as under both energy conditions.  In contrast, neither ND7 nor CR3 appears to be 

essential in the insect stage of the parasite [112]. The canonical ND7 has two separate editing 

domains that are edited independently [27].  Interestingly, while the 3’ editing domain is fully 

edited only in the bloodstream life cycle stage, the 5’ editing domain is edited in both life cycle 

stages [5,27]. In addition, the mutational bias analyses indicate that only the 5’ editing domain 

has characteristics indicative of a dual coding gene. The canonically edited CR3 is also a putative 

Complex I member (ND4L) and is preferentially edited in the BS stage [47,120]. Complex I has 

been shown to be non-essential in both life cycle stages, and other mitochondrially encoded 

complex I subunits, ND3, ND8, and ND9, have been shown to be preferentially edited in the 

bloodstream stage [5,26,28,29,111,112]. 

RNA seq data was generated by reverse transcribing all mtRNAs using random primers.  

For both RPS12 and CR3, transcripts were then selectively amplified using sequence specific 

primers targeted to the terminal 5’ and 3’ never edited regions so as not to bias against any 

possible editing pattern. For ND7, the 5’ editing domain was selectively amplified using 

sequence specific primers targeted to the 5’ never edited region and the homology region 3 

(HR3) that separates the 5’ and 3’ editing domains [27]. The HR3 is a span of 59 nts that is also 

never edited, hence should not bias the analysis.  The targeted transcriptome libraries were 

 


generated from TREU 667 cells grown in SDM79 and EATRO 164 cells grown in SDM79 and 

SDM80.  Additionally, for CR3, we generated another library using TREU 667 cell line mRNA by 

selecting for transcripts of a larger size, instead of taking transcripts of all sizes (SDM79).  This 

allowed us to enrich the library for transcripts that had initiated the editing process. Amplified 

cDNAs were then gel purified, barcoded and combined in equal molar amounts for sequencing. 

While the number of total reads obtained did vary by cell line and media used, surprisingly few 

transcripts were fully edited (canonical AUG + ORF). For both RPS12 and CR3, the majority of 

reads (>80%) were completely unedited (Table 7). CR3, which has previously been shown to be 

preferentially edited in the BS stage, had the lowest percentage of fully edited transcripts, with 

only 0.1% – 0.2% translatable transcripts detected in both cell lines and under both growth 

conditions. In contrast, while RPS12 had similar levels of unedited transcripts, a larger 

percentage of translatable transcripts were found. For this essential transcript, the number of 

fully edited transcripts differed between the two different cell lines; 2.3% in TREU 667 and 0.9% 

in EATRO 164. Growth of the EATRO cells in low glucose media (SDM-80) did result in a 

substantial jump in both the number of transcripts that initiated the editing process, and 

the number of fully edited transcripts (4.16%). This suggests that energy source may influence 

editing efficiency. While the predominance of completely unedited transcripts found for both 

CR3 and RPS12 was surprising, these numbers are in line with those found in other studies 

[131,148,149].  

The ND7 5’ transcriptome analyses differed substantially from both RPS12 and CR3 in 

that the majority of these transcripts had initiated the editing process. The TREU cell line 

showed the highest editing efficiency with ~80% of transcripts having initiated editing and 9.7% 

 


Table 7.  Editing efficiencies of RPS12 (A), ND7 (B), and CR3 (C).

Transcript   Cell line and media   Total # reads   % unedited   % partially edited   % fully edited
RPS12        TREU 667, SDM79       787,584         89.8%        7.9%                 2.3%
RPS12        EATRO 164, SDM79      846,549         92.6%        6.5%                 0.9%
RPS12        EATRO 164, SDM80      1,381,092       81.3%        14.5%                4.2%
ND7 5’       TREU 667, SDM79       1,141,322       20.3%        70.0%                9.7%
ND7 5’       EATRO 164, SDM79      915,610         47.0%        52.8%                0.2%
ND7 5’       EATRO 164, SDM80      313,657         27.4%        72.1%                0.5%
CR3          TREU 667, SDM79       18,832          84.9%        15.0%                0.1%
CR3          TREU 667, enriched    50,589          18.1%        73.1%                8.8%
CR3          EATRO 164, SDM79      348,210         93.2%        6.6%                 0.2%
CR3          EATRO 164, SDM80      53,000          90.6%        9.3%                 0.1%

of the transcripts fully edited and translatable. In contrast, in EATRO cells, only 53% of the 

transcripts had initiated editing, and a scant 0.2% had completed the editing process. As with 

RPS12, we did see an increase of efficiency in the cells grown in SDM80, with over 70% initiating 

editing. However, even with the large increase in initiation of the editing process, only a scant 

0.5% of transcripts were fully edited (Table 7).  The sharp drop in the ability to complete the 

editing process appears to be due to loss of an optimal gRNA for one region of this transcript 

(described below).   
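
The initiation percentages quoted above follow directly from Table 7, where transcripts that initiated editing are the sum of the partially and fully edited classes. The short Python sketch below reproduces that arithmetic for ND7 5’. The completion rate it computes is a derived illustration of ours, not a value reported in the table.

```python
# ND7 5' rows of Table 7: (% unedited, % partially edited, % fully edited).
ND7_5P = {
    "TREU 667, SDM79": (20.3, 70.0, 9.7),
    "EATRO 164, SDM79": (47.0, 52.8, 0.2),
    "EATRO 164, SDM80": (27.4, 72.1, 0.5),
}

def initiated(row):
    """Percent of transcripts that initiated editing (partially + fully edited)."""
    _, partial, full = row
    return round(partial + full, 1)

def completion_rate(row):
    """Percent of initiated transcripts that reached the fully edited state."""
    _, partial, full = row
    return round(100 * full / (partial + full), 1)

summary = {
    library: (initiated(row), completion_rate(row))
    for library, row in ND7_5P.items()
}
```

For TREU 667 this gives 79.7% initiated, matching the ~80% cited in the text, while the two EATRO 164 libraries give 53.0% and 72.6%.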

Editing Cascade and Reading Frame Analyses 

In order to determine if the low editing efficiencies were due to any one step in the 

editing cascades, a full analysis of each editing step was done. For these analyses, we 

developed a pipeline that used our gRNA database to distinguish true alternative edits from 

both mis-edited and partially edited transcripts. This pipeline uses two programs, Segmentation 

of Kinetoplast Edited Transcripts to Characterize Editing Heterogeneity (SKETCH), and the gRNA 

database search program previously described [9]. The SKETCH program analyzes segments of 

transcripts that are defined by the relative range of coverage of each gRNA population used in 

 


conventional editing patterns. Block sequences are compared to both the unedited sequence 

and the fully edited conventional sequence and then classified into unedited, fully edited and 

“unknown” blocks. Once the most abundant sequences of all segments are identified, 

transcripts containing each of the most abundant “unknown” sequences are used as queries against 

the gRNA database. If a gRNA is identified that can generate the edit, the sequence is 

considered a true alternative edit. If no plausible gRNA is identified the edit is considered a 

misedit or a junction, depending on the sequence and the status of other segments on that 

transcript with this sequence. By examining segments of transcripts independently, we were 

able to identify both branching and converging editing pathways. 

RPS12 Analysis 

As expected, the essential RPS12 showed the most robust editing path. In all three 

analyses, the majority of transcripts used the same series of 10 gRNA populations (A – J) (For 

full gRNA sequences and alignments see APPENDICES I and J). Use of the final gRNA population 

(gJ) in the cascade led only to the RPS12 ORF, and we found no evidence of an alternative AUG 

or frameshift leading to utilization of the second ORF. We do note that there is a downstream 

start codon, that if translated, would be read in the alternative reading frame (APPENDIX J). 

While the editing cascades were relatively straightforward, we did see some minor deviations 

(Figure 14). Editing of block B could utilize a number of different gRNAs, including several that 

were used in one cell line only (dashed arrows). gRNAs B1 and B1* are variants of the same 

gRNA, with gB1* introducing a single amino acid (aa) change (V/Y) (Figure 15).  While editing 

using the TREU-specific gB3t and gB4t gRNAs led to a distinct editing “dead end” (dead end = 

disruption of the next canonical anchor sequence, and no detection of any further editing), the 

 


EATRO specific gB2e did not disrupt the editing cascade. Use of this variant, however, did 

introduce a frameshift seven amino acids (aa) from the C-terminus (Figure 15). Because gB2e 

did not disrupt editing, a significant percentage (5.3% in SDM79 and 7.6% in SDM80) of 

translatable RPS12 transcripts did contain the alternative C-terminus (J2 transcripts). This 

alternative C-terminus was previously reported in the 29-13 strain [148], however, it appears to 

be absent in the TREU cell line.  

A drop in editing efficiency was seen at the D to E block transition due to the incorrect 

utilization of the gFp guide RNA (Dx) that disrupted the editing cascade (Table 8). While 

mis-editing by gFp was limited in the TREU 667 cell line (7.1% of D-block edited transcripts), its use 

was much more prominent in the EATRO cell line (17.5%), leading to a significant drop in 

transcripts that could continue past D-block editing. Interestingly, growth in SDM80 led to a 

significant increase in mis-editing by gFp, with over 32.5% of transcripts using gFp incorrectly, 

resulting in a significant portion of dead-end transcripts. The EATRO cell line had additional 

minor dead-end pathways at the D to E transition.  Mis-editing by an ND7 gRNA (gEep) again 

disrupted any further editing, and mis-anchoring by the gE guide RNA (marked with box m) also 

led to the generation of an anchor sequence that could be used by an ND8 gRNA (gFep), disrupting 

any further editing. Interestingly, the editing efficiency did not drop as transcripts transitioned 

to the next block of editing (Table 8). In EATRO-SDM80 cells, the editing efficiency at level F is 

~5.6%, and at level G it actually increases to 5.9%.  Editing efficiency at the block level is  

 

Figure 14.  Observed RPS12 editing pathways in the TREU 667 cell line (A) and the EATRO 164 
cell line grown in SDM79 (B) and SDM80 (C).  U = unedited transcripts. Dot sizes are 
proportional to the percent of block level edited transcripts using the gRNA indicated. Colored 
arrows indicate the gRNA population used. Dashed arrows with closed heads represent gRNA 
populations used in only one cell line (superscript ‘e’ or ‘t’).  gRNA names with superscript ‘p’ 
represent promiscuous gRNAs.  Dots enclosed by a red box represent end point mRNAs with no 
AUG start codon.  gFp is a promiscuous gRNA that edits both in the D and F editing block of 
RPS12.  Arrows with a boxed ‘m’ represent a gRNA that has mis-anchored.   

 

 

calculated based on the number of transcripts that match any of the fully edited sequences in 

that block, regardless of the condition of earlier blocks.  Analysis of editing intermediates 

suggests that this increase occurs due to the ability of the downstream gRNA (the gFp 

population) to overwrite transcripts that have been previously edited through the G-level.  

Because of the overwriting gFp population, mRNAs exist that are fully edited at the G editing 

block but are in a transition state in block F.   
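The block-level calculation described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in this work: the function name `block_efficiency`, the dictionary layout, and the toy segment labels are all assumptions made for the example. It shows how a later block (G) can score higher than an earlier one (F) when each block is counted independently of the others.

```python
def block_efficiency(transcripts, fully_edited, blocks):
    """Percent of all transcripts whose segment in each block matches any
    fully edited form of that block, ignoring the state of other blocks."""
    total = len(transcripts)
    if total == 0:
        raise ValueError("no transcripts supplied")
    efficiency = {}
    for block in blocks:
        accepted = fully_edited[block]
        matches = sum(1 for t in transcripts if t.get(block) in accepted)
        efficiency[block] = 100.0 * matches / total
    return efficiency


# Toy data (hypothetical labels, not real reads): each transcript maps a
# block name to the state of its segment in that block.
toy_transcripts = [
    {"F": "F_edited", "G": "G_edited"},
    {"F": "F_edited", "G": "unedited"},
    {"F": "F_transition", "G": "G_edited"},  # G complete, F being overwritten
    {"F": "F_transition", "G": "G_edited"},
    {"F": "unedited", "G": "unedited"},
]
toy_fully_edited = {"F": {"F_edited"}, "G": {"G_edited"}}

toy_efficiency = block_efficiency(toy_transcripts, toy_fully_edited, ["F", "G"])
print(toy_efficiency)
```

In the toy data, two transcripts are complete in block G but in a transition state in block F, so block G scores higher than block F even though editing proceeds from F to G.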

 

Figure 15.  Alignment of RPS12 proteins from T. brucei, T. vivax, Leishmania tarentolae, 
Leishmania donovani, and Leishmania amazonensis.  Asterisks indicate identical residues, 
colons indicate conserved residues.  Highlighted amino acids are changes introduced by 
alternative editing (S>P, gGe; V>Y, gB1*).  Alignment was generated by Clustal Omega [116].  
RPS12 signature sequence is shown in bold [150]. 

T. brucei B1     --MWFLYGCCLRFVLFVLCYYMSPRLPSSGNRRVLYAVFYLYNFVWMLRCFFCC-FIGLVMSLFIIEGGGFVDLPGVKYYTRIVS---------
T. brucei B2FSe  --MWFLYGCCLRFVLFVLCYYMSPRLPSSGNRRVLYAVFYLYNFVWMLRCFFCC-FIGLVMSLFIIEGGGFVDLPGYKILFTYCKLDLDIRYVF
T. vivax         --MWFLYGCCLRFVLFVLCYYMSPRLPSSGNRRVLYAVFYLYNFVWLLRCFFCCVFFGLHLSLFIIEGGGFVDLPGIKYYTRMFIN--------
L. tarentolae    MRVLFLYGLCVRFLYFCLVLYLSPRLPSSGNRRCLYAICYMFNILWFFC-VFCCVCFL-NHLLFIVEGGGFIDLPGVKYFSRFFLNA-------
L. donovani      VRVLYLYGLCVRFLFFSLVLYLSPQLPSSGNRRCLYAISIMFNILWIFL-VFCCVFFV-VHLLFIVEGGGFIDLPGVKYFSRFFCKS-------
L. amazonensis   VRVLYLYGLCVRFLFLCLVLYLSPRLPSSGNRRCLYAISIMFNILWYFL-VFCCFVFV-IFQLFIVEGGGFIDLPGVKYFSRFCNVS-------
                   : :*** *:**: : *  *:**:*** **** ***:  ::*::* :  .***  :     ***:*****:**** *

Table 8.  Editing efficiency for each RPS12 gRNA population. Percentages were 
calculated based on the number of transcripts that had completed each editing level 
out of the total number of RPS12 transcripts.   

                    Percent complete editing of block
Block               TREU 667 (SDM 79)   EATRO 164 (SDM 79)   EATRO 164 (SDM 80)
Initiated Editing   10.2                7.4                  18.7
A                   8.9                 4.6                  15.9
B                   6.9                 4.4                  14.6
C                   6.0                 4.2                  13.5
D                   4.9                 3.1                  11.7
E                   4.5                 2.2                  6.9
F                   3.8                 1.4                  5.6
G                   3.7                 1.4                  5.9
H                   3.7                 1.3                  5.7
I                   3.4                 1.2                  5.4
J                   2.3                 0.9                  4.2

The only other minor variation was the use of the EATRO-specific gGe guide, which occurs 

in a highly cytosine-rich region (Figure 16). Previous examinations of the gRNA coverage in this 

region identified only rare gRNAs with multiple C:A base pairs, alignment mismatches, and 

gaps between adjacent gRNAs [9,119]. While this analysis did extend the identified gRNA 

population and eliminated the gap region, we did not identify either mRNA sequences or gRNAs 

that improved the alignment mismatches (Figure 16). The use of alternative base pairs is not 

unheard of.  A study of in vitro deletions found that alternative base pairs such as C:A, C:U, and 

C:C were tolerated to varying extents [151]. Interestingly, this portion of RPS12 encodes the 

signature sequence, which is nearly universal [150]. Use of the gGe variant gRNA results in a 

single point mutation, substituting a proline in place of a serine within this important sequence.  
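The pairing categories discussed here (Watson/Crick, G/U, and the tolerated C:A pairs) can be made concrete with a small scoring sketch. It assumes the mRNA and gRNA strings have already been aligned column by column (mRNA 5' to 3', gRNA 3' to 5') and have equal length; the function names are illustrative and are not part of the analysis pipeline. Anything that is neither a Watson/Crick nor a G/U pair, including gap characters, is marked `#`.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def _rna(base):
    """Upper-case a base and read T as U (gRNAs are often written as DNA)."""
    return base.upper().replace("T", "U")


def pair_symbol(mrna_base, grna_base):
    """Return '|' for Watson/Crick, ':' for G/U, '#' for anything else."""
    pair = (_rna(mrna_base), _rna(grna_base))
    if pair in WATSON_CRICK:
        return "|"
    if pair in WOBBLE:
        return ":"
    return "#"


def annotate_duplex(mrna, grna):
    """Pairing line for two pre-aligned, equal-length strings."""
    if len(mrna) != len(grna):
        raise ValueError("sequences must be pre-aligned to equal length")
    return "".join(pair_symbol(m, g) for m, g in zip(mrna, grna))


def count_ca_pairs(mrna, grna):
    """Count columns with C in the mRNA opposite A in the gRNA (assumed orientation)."""
    return sum(1 for m, g in zip(mrna, grna) if (_rna(m), _rna(g)) == ("C", "A"))


# Toy duplex (made-up bases): lowercase u marks an inserted uridine.
toy_line = annotate_duplex("GAuC", "UTAA")
toy_ca = count_ca_pairs("GAuC", "UTAA")
print(toy_line, toy_ca)
```

Counting C:A columns separately from other mismatches mirrors the distinction drawn in the alignments, where C:A pairs are highlighted differently from gaps and other mismatches.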

Figure 16. Regions with poor gRNA coverage and functionally conserved residues in RPS12.  
Functionally important aa residues are underlined [150].  Pipes (|) indicate Watson/Crick base 
pairs and colons (:) indicate G/U base pairs. Red highlighted hashtags (#) indicate gaps or 
mismatches, green highlighted # indicate C:A base pairs. The substitution mutation 
introduced by use of the gGe gRNA is highlighted in yellow (S>P). 

L  C  Y  Y  M  S  P    R  L  P  S  S  G  N  R  R  V  L  Y  A        V  F  Y  L  Y  N  F  V  W  M
uuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGuuCCGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAu
||||:||||                         |||||:|#|||||#||:||||||||||  |    :|
AATATAATATA gI    14TAAATTTAGTGACCGAAGGCTAGTGGCT-CATATAACATACG--G----TAATATA gFp
|||::|:|:|||||||:||#:  |:|||||#||||||                    ||:|  |    :|||||:||:||:|||||:|::||:|:||
AATGTAGTGATATACTTAGAT--GTTAACGTGTCAAGATATA gH     gE 05TATTATG--G----TATAAAGTAGATGTATTAGAGTAAGCTTA
                   |:  |:||::#|#||:||||:|||||:||||||||||||
            13TATTCAGT--GTTAGTAGATTGAGGCTATTGGTTGCACATAACATTCATA gG
                         |||:|#||:|:|!:|||||||||||:||:|:|||||  |
                        14TAATGTGTTAGGATCATTGGCTGCATATGATATACG--GAAATATA gGe

ND7 5’ analysis 

Analyses of the ND7 5’ targeted transcriptomes indicate that full editing of the 5’ 

domain requires five gRNA populations for both cell lines (Figure 17, APPENDIX K). Two variants 

of the terminal population (gE1 and gE2) were identified that resulted in different 5’ terminal 

editing patterns (APPENDIX L).  Translation of these editing patterns yields two different protein 

products in two different reading frames, one (RF1) encoding the canonical ND7 protein (E1) 

and the other (RF3) encoding a putative metabolite transporter (E2, see below) [141].  While 

transcripts for both open reading frames were found in both cell lines, there were notable 

differences in the populations. The TREU 667 cell line had the highest editing efficiency with 

nearly 80% of the transcripts initiating the RNA editing process and ~9.7% of the transcripts fully 

edited through Block E. Use of the gE1 or gE2 gRNAs appeared to be equally efficient, resulting 

in nearly equal amounts of RF1 and RF3 fully edited transcripts. A small percentage of 

transcripts (4.9% of transcripts that completed block E editing) were observed that appeared to 

be mis-edited by a TREU-specific gRNA (gE4t), leading to a dead-end product (no ORF).  In 

addition, gE4t also appeared to be able to overwrite editing directed by gE2 to generate a small 

number of transcripts that could be translated in RF2 (pink E3t).   
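How a 5' terminal editing pattern determines the reading frame can be illustrated by locating the first start codon and reporting its frame. The sketch below is illustrative only: the sequences are made up, `first_orf` is a hypothetical helper, frames are numbered relative to the first nucleotide of the supplied sequence (which need not correspond to the RF1/RF2/RF3 labels used in the text), and treating only UAA and UAG as stop codons is an assumption about the mitochondrial genetic code.

```python
STOP_CODONS = {"UAA", "UAG"}  # assumption: UGA is read as Trp, not as a stop


def first_orf(mrna, start_codons=("AUG",)):
    """Find the first start codon and read in frame until a stop codon.

    Returns the 0-based start index, the frame (1-3, relative to the first
    nucleotide of `mrna`) and the number of sense codons, or None if no
    start codon is present.
    """
    seq = mrna.upper().replace("T", "U")
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in start_codons:
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            return {"start": i, "frame": i % 3 + 1, "codons": (j - i) // 3}
    return None


# Toy sequences (made up): one inserted u upstream of the AUG moves it
# into a different frame relative to the rest of the transcript.
orf_a = first_orf("CCAUGGCUUAA")
orf_b = first_orf("CCuAUGGCUUAA")

# A transcript without AUG yields an ORF only if UUG is accepted as a start.
orf_none = first_orf("CCUUGGCUUAA")
orf_uug = first_orf("CCUUGGCUUAA", start_codons=("AUG", "UUG"))
print(orf_a, orf_b, orf_none, orf_uug)
```

The optional `start_codons` argument makes the alternative-start scenario explicit: the same transcript is untranslatable under an AUG-only rule but yields an ORF if UUG is allowed.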

Figure 17.  Observed ND7 5’ editing pathways in the TREU 667 cell line (A) and the EATRO 164 
cell line grown in SDM79 (B) and SDM80 (C).  U = unedited transcripts. For arrow and gRNA 
naming descriptors see Figure 14.  Dots enclosed by a red box represent end point mRNAs with 
no AUG start codon.  + indicates that more than one mRNA form was condensed into this circle 
to simplify the figure (See APPENDIX M).  Condensed forms encode largely the same amino acid 
sequence with only small variants.  Terminal dots are colored blue for reading frame 1, 
magenta for reading frame 2, or green for reading frame 3.  Boxed green dots have no 
functional start codon, but are translatable into reading frame 3 with the use of an alternative 
start codon (UUG).  

 

 

While the number of “dead-end” pathways was very limited in the TREU cell line, use 

of the gC guide RNA population appeared to be very inefficient, resulting in a large drop in the 

percent of Block C-edited transcripts (a drop of 25.8 percentage points, Table 9). A mutant gC gRNA (gCFSt) did result 

in a small percentage of transcripts with a frameshift C-terminus. Interestingly, while 9.1% of C 

block transcripts used the gCFSt gRNA, only 2.4% of the transcripts that have completed D-block 

editing come from this minor branch. This suggests that this alternative edit decreases the 

efficiency of use of the subsequent gRNAs. In contrast, full editing of the ND7 5’ domain in the 

EATRO 164 cell line was very inefficient. While transcripts were able to initiate the editing 

process relatively efficiently (~50 – 70%, dependent on growth medium used), less than 1% of 

ND7 transcripts were fully edited at level E (Table 7). This appears to be due to the use of 

several EATRO-specific gRNAs that disrupt further editing (Table 9). Again, the largest drop in 

editing efficiency occurred at the B to C-block transition. In addition, the EATRO-specific use of 

gBex, gC1ex and gC2ex all disrupted the editing cascade (Table 9). This compounded the editing 

efficiency problem, with a majority of C-block edited transcripts (47% in SDM79 and 73.4% for 

SDM80) no longer editing-competent. The 5’ end of ND7 has multiple AUG sequences not 

created by the editing process. Translation predictions of these editing-blocked transcripts 

(C1ex, C2ex) indicate that they do have ORFs that extend through the HR3 region. The protein 

product of Bex transcripts is in the ARF, but is ten amino acids shorter, while the C1ex and C2ex 

products, which translate in the canonical ND7 reading frame, produce proteins that are both 

three amino acids shorter. Further drops in efficiency occurred due to an anchor mismatch 

(A:A) found in the gD guide RNA population (Figure 18, APPENDIX L).  While the gD mutation is 

also observed in TREU, this cell line contains a sizable population of non-mutated gD guide 

RNAs. Editing by the gE4e guide results in a transcript with no in-frame AUG.  However, 

translation of this transcript (E4e) in RF3 has no stop codons and we cannot rule out the 

possibility of a non-canonical START codon.  
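The block-to-block drops referred to above follow directly from the values in Table 9. As a quick check, the sketch below recomputes the drop (in percentage points) between successive fully edited blocks for each library; the initiation row is skipped because it measures the start of editing rather than completion of a block. The helper name `block_drops` is illustrative.

```python
LEVELS = ["Initiated", "A", "B", "C", "D", "E"]

# Percent of ND7 5' transcripts that completed each level (values from Table 9).
TABLE_9 = {
    "TREU 667 (SDM79)": [79.7, 45.8, 44.7, 18.9, 11.7, 9.7],
    "EATRO 164 (SDM79)": [52.4, 47.0, 45.8, 13.4, 0.4, 0.2],
    "EATRO 164 (SDM80)": [72.6, 68.5, 66.7, 16.9, 1.1, 0.5],
}


def block_drops(values, levels=LEVELS):
    """Drop in percentage points between successive blocks (A->B ... D->E)."""
    drops = {}
    for k in range(1, len(levels) - 1):
        key = f"{levels[k]}->{levels[k + 1]}"
        drops[key] = round(values[k] - values[k + 1], 1)
    return drops


largest = {}
for library, values in TABLE_9.items():
    drops = block_drops(values)
    worst = max(drops, key=drops.get)
    largest[library] = (worst, drops[worst])
    print(library, drops, "largest:", worst)
```

For all three libraries the largest drop between blocks falls at the B to C transition, and the TREU 667 value reproduces the 25.8-point drop cited in the text.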

Table 9.  Editing efficiency for each ND7 5’domain gRNA population. Percentages were 
calculated based on the number of transcripts that had completed each editing level out of the 
total number of ND7 transcripts. 

                    Percent complete editing of block
Block               TREU 667 (SDM 79)   EATRO 164 (SDM 79)   EATRO 164 (SDM 80)
Initiated Editing   79.7                52.4                 72.6
A                   45.8                47.0                 68.5
B                   44.7                45.8                 66.7
C                   18.9                13.4                 16.9
D                   11.7                0.4                  1.1
E                   9.7                 0.2                  0.5

Figure 18. Regions with poor gRNA coverage and functionally conserved residues in ND7 5’.  
Functionally important residues are underlined [152].  Pipes indicate Watson/Crick base pairs 
and colons indicate G/U base pairs.  Red highlighted hashtags (#) indicate gaps or mismatches, 
green highlighted # indicate C/A base pairs.   

H  L  Y  R  F  T  F  G   P  Q  H  P  A     A  H  G    V  L  C  C  L  L  Y  F  C  G  E   F  I  V
ACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG***CAGCACAuG**GuGuuuuAuGuuGuuuAuuGuAuuuuuGuGGuGA*AuuuAuuG
                                                              :||:||:|:|:||:||:|::|||:| ||:|||:|
                                                          gB 15TAATAAGTGATATGAAGATGCCATT-TAGATAGC
                    ||: |||#|||:#|||#||   #||#|||||  |:::|||||||||||||||:|||
               gC 19TAAT-TAGATGTT-TAGAGC----TCATGTAC--CGTGAAATACAACAAATAATATA
||||:|:||||:|:|||||||:| ::||||#|||||
TGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA gD

Similar to RPS12, ND7 5’ has a cytosine-rich region with poor gRNA coverage (Figure 18).  

This cytosine-rich region contains two conserved residues involved in ND7 function and 

coincides with the C level of editing, where the largest drop in editing 

efficiency is observed (Table 9) [152].  The gC gRNA population is relatively rare (only 114 reads 

found in the TREU gRNA transcriptome, and 6185 reads in EATRO) and has five nucleotide mismatches 

with the conventional ND7 sequence (including C:A base pairs; Figure 18).   

CR3 Analysis 

Previous work indicated that C-Rich region 3 is a putative Complex I member and that it 

is preferentially edited in the Bloodstream stage [47,120]. However, CR3 gRNAs are present in 

both life cycle gRNA transcriptomes, and PCR-amplified 5’ edited transcripts were 

successfully cloned and sequenced from the TREU 667 procyclic cell line [9,119,141]. These studies 

indicated that multiple forms of the mRNA exist that use different reading frames, 

suggesting that CR3 is dual-coding and that it is selection of the terminal gRNA that determines 

which reading frame will be used [141]. In this study, we used primers flanking the editing 

domain in order to analyze the entire CR3 sequence. Interestingly, while 15% of the TREU CR3 

transcripts had initiated the editing process, only 2.2% had completed editing by the initiating 

gA guide RNAs (Table 10). This suggests that the large drop in editing efficiency occurs due to 

incomplete editing by the block A guides. These gRNAs are fairly abundant, and we see no 

alignment issues, so it is unclear why editing of Block A is so inefficient (APPENDIX N).   

Table 10.  Editing efficiencies by block level of CR3.  Transcripts whose gRNAs covered 
two blocks (DEs and FGs) were included in both blocks for this calculation.   

                    Percentage Complete
Level               TREU 667   TREU 667 (Enriched)   EATRO 164 (SDM 79)   EATRO 164 (SDM 80)
Initiated editing   15.1       81.9                  6.8                  9.4
A                   2.2        68.3                  1.9                  2.3
B                   1.8        56.6                  1.2                  1.8
C                   1.7        54.6                  0.9                  1.5
D                   1.6        52.0                  0.6                  1.1
E                   1.2        39.1                  0.6                  1.0
F                   0.8        22.3                  0.2                  0.4
G/FG                0.4        8.8                   0.2                  0.3

While the percentage of fully edited transcripts was very low (0.2 – 0.4%, 

Table 10), we were able to again identify the major 5’ alternative editing patterns that direct 

translation to either the ORF or to the +1 Alternative Reading Frame (RF2). To increase the 

robustness of the analyses, we also generated a biased CR3 transcriptome, by size selecting for 

longer transcripts during the amplification process. Analysis of the TREU transcriptome 

indicates that the full CR3 editing pathway has multiple branches, resulting in a total of 12 

major forms of fully edited CR3 (Figure 19). These 12 forms consist of three major 5’ 

editing patterns, paired with any of four different 3’ editing patterns. The two initiating gRNAs 

identified (gA1 and gA2) direct identical editing patterns, except that gA2 inserts an additional three 

U-residues (+1 phenylalanine). The gB guide RNAs all anchor in different areas (Figure 20A) and do 

introduce substantial amino acid changes near the 3’ end (Figure 20B). However, all gB guide RNAs 

generate the anchor binding site (ABS) that is recognized by gC, hence all 4 nodes merge to a 

common sequence guided by gC and gD (APPENDIX N). 

The 5’ end editing patterns begin to diverge after Block D editing. FGtx transcripts are 

generated by the use of two subsequent gRNA populations, gEt and gFGtxp.  gFGtxp is a 

promiscuous gRNA (previously identified as an ND7 gRNA) that spans both the F and G editing 

blocks. These transcripts were more abundant than both Gt and FGt; however, final editing using 

this gRNA does not generate an AUG start codon. It has been proposed that trypanosomes can 

use UUG as an alternative start codon, thus we cannot rule out the possibility that FGtx 

transcripts can be translated (Figure 20B). Analyses of intermediates suggest that the gE guide 

(red arrow) can in fact “overwrite” gEt, indicating that a proportion of these may still be re-

edited into other forms. Editing via the gE population required an additional gRNA to generate 

 

the anchor for either gFt or gFGtp. Generation of Gt transcripts (canonical CR3) requires 2 

additional gRNAs, while FGt (+1 ORF) transcripts are generated by a single gRNA population 

(gFGtp), another promiscuous gRNA (CR4).  

 
Figure 19.  Observed CR3 editing pathways in the TREU 667 cell line.  U = unedited transcripts. 
For arrow and gRNA naming descriptors see Figure 14. Dots enclosed by a red box represent 
end point mRNAs with no AUG start codon. Terminal dots are colored blue for reading frame 1 
or magenta for reading frame 2.  Boxed magenta dots have no functional start codon, but are 
translatable into reading frame 2 with the use of an alternative start codon (UUG).   
 

Figure 20. Four different 3’ end sequences found in the TREU 667 transcriptome for the CR3 
transcript (A) and CR3 protein sequences (B). U-residues inserted by editing are indicated by 
lowercase; different sequences created by the different gRNAs are highlighted in red.  Thick 
underlined sequence indicates the anchor binding site (ABS) for the initiating gRNAs (gA1 and 
gA2). Green = ABS for gB1B2; Blue = ABS for gB3; Purple = ABS for gB4. Bolded amino acids 
show sequence variants and shaded sequence shows position of predicted transmembrane 
domains [134]. 

A.
B1     AuUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuAuCA---uUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAA
B2     AuUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAA
B3t    AuUGuuGuGuuuAuAuuAuuuCAGAuuuuAuGGuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAA
B4B4’  AuUGuuGuGuuuAuuuuuuuuuuuuAUUUUAuCAuuuGAuAuGuuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAA

B.
ORF
G1t    MFDCLVLLFFYCLFVHFFCFLFVCDLFLCLLFSFCFLLDFCFLFNMGLLLCFILQIFSVII-IIVYKFSLLD
G2t    MFDCLVLLFFYCLFVHFFCFLFVCDLFLCLLFSFCFLLDFCFLFNMGLLLCFILQIFSVIIFIIVYKFSLLD
G3t    MFDCLVLLFFYCLFVHFFCFLFVCDLFLCLLFSFCFLLDFCFLFNMGLLLCLYYFRFYGIIFIIVYKFSLLD
G4t    MFDCLVLLFFYCLFVHFFCFLFVCDLFLCLLFSFCFLLDFCFLFNMGLLLCLFFFFILSFDMLLSFLLLYISFRY

ARF
FG1t   MCMIYKNNVYVVVLFWFWLYIFFVFYLFVICFYVCYLVFVFYWIFVFYLIWVYCCVLYYRFLVLS-LLLYISFRY
FG2t   MCMIYKNNVYVVVLFWFWLYIFFVFYLFVICFYVCYLVFVFYWIFVFYLIWVYCCVLYYRFLVLSFLLLYISFRY
FG3t   MCMIYKNNVYVVVLFWFWLYIFFVFYLFVICFYVCYLVFVFYWIFVFYLIWVYCCVYIISDFMVSFLLLYISFRY
FG4t   MCMIYKNNVYVVVLFWFWLYIFFVFYLFVICFYVCYLVFVFYWIFVFYLIWVYCCVYFFFLFYHLICCYHFYYCI
FG1tx                  LVVYCVYHCIFLWIFVYVCYLVFVFYWIFVFYLIWVYCCVLYYRFLVLS-LLLYISFRY
FG2tx                  LVVYCVYHCIFLWIFVYVCYLVFVFYWIFVFYLIWVYCCVLYYRFLVLSFLLLYISFRY
FG3tx                  LVVYCVYHCIFLWIFVYVCYLVFVFYWIFVFYLIWVYCCVYIISDFMVSFLLLYISFRY
FG4tx                  LVVYCVYHCIFLWIFVYVCYLVFVFYWIFVFYLIWVYCCVYFFFLFYHLICCYHFYYCI

Surprisingly, when we examined the editing pathways of CR3 in the EATRO 164 libraries, 

we discovered that while three of the four initial 3’ editing patterns were found in this library, 

editing beyond those patterns was completely divergent (Figure 21, APPENDIX O).  A 

completely different set of gRNAs was used to generate fully edited CR3 transcripts (APPENDIX 

P). The divergent pathway did show some superficial similarities to the editing patterns 

observed in TREU 667 cells. While both the B1 and B2 transcripts were directly edited by gCe, the B4 

transcripts required an additional gRNA to generate the ABS recognized by gCe.  In EATRO cells, 

B4 transcripts could be edited by 3 different gRNAs (gB5e, gB6e and gB7e). While gB7e disrupted 

editing, both gB5e and gB6e generated the anchor that could be used by either gC or gCe. 

Surprisingly, while the conventional CR3 gC guide RNA was clearly used by B4 transcripts, we 

saw no evidence of its use in the B1/B2 pathways. Transcripts using gC could be further 

extended by both gD and gE guide RNAs; however, no evidence of editing beyond the gE guides 

was observed.  In contrast, transcripts using the alternative gCe guide RNA population could be extended 

by a series of additional guide RNAs, generating transcripts with functional AUG start codons. 

However, many of the gRNAs used were promiscuous, in that they had been previously 

identified as gRNAs of other transcripts. As with the TREU editing pathway, we observe 

transcripts capable of being translated in two reading frames with the FGe mRNAs translating in 

RF1, and the FGe* mRNAs translating in RF2 (Figure 21A, Figure 22). In addition, the Ge mRNAs, 

while not having a functional “AUG” do translate into RF2 if the first “UUG” is used.  As with 

ND7, we observed a shift in editing pattern preference when the EATRO 164 cells were changed 

from SDM79 medium to SDM80.  Interestingly, a new fully edited form of CR3 appeared in the 

EATRO 164 SDM80 library only.  The gRNA gCe80 is used in the EATRO SDM79 pathway, but 

editing appears to cease here.  Cells grown in SDM80 continue this editing pathway with two 

additional gRNAs, gDEe80 and gFGe80 (Figure 21B).  This mRNA is translatable, but produces a 

distinctly different and shorter protein product (Figure 21A).  The protein products of the two 

different cell lines are highly dissimilar.  Using bioinformatics tools to predict the secondary 

structure of these proteins, we find that the difference is most noticeable in the RF1s of the two 

cell lines (Figure 23).  Interestingly, the RF2s have a very similar predicted secondary structure.  

This evidence suggests that the two different cell lines are able to use the CR3 transcript to 

create distinctly different protein products.  

 

 

 

Figure 21.  Observed CR3 editing pathways in the EATRO 164 cell line grown in SDM79 (A) and 
SDM80 (B).  U = unedited transcripts. For arrow and gRNA naming descriptors see Figure 14. 
Dots enclosed by a red box represent end point mRNAs with no AUG start codon. Terminal dots 
are colored blue for reading frame 1 or magenta for reading frame 2.  Boxed magenta dots have 
no functional start codon, but are translatable into reading frame 2 with the use of an 
alternative start codon (UUG).  + indicates that more than one mRNA form was condensed into 
this circle to simplify the figure (See Figure 22). Condensed forms encode largely the same 
amino acid sequence with only small variants.   

 

Figure 22.  Alignment of CR3 predicted protein variants from the EATRO 164 cell line. Bolded 
amino acids show sequence variants and shaded sequence shows position of predicted 
transmembrane domains [134]. 

ORF
FG1e      MCMIYKYYHICVRWDFGDHCLFGCYELYFMFCYGYCFLFNMGLLLCF--ILQIFSVII-IIVYKFSLLD
FG2e      MCMIYKYYHICVRWDFGDHCLFGCYELYFMFCYGYCFLFNMGLLLCF--ILQIFSVIIFIIVYKFSLLD
FG5e      MCMIYKYYHICVRWDFGDHCLFGCYELYFMFCYGYCFLFNMGLLLCLYYLCIFIVVIIFIIVYKFSLLD

ARF
FG1e*v1   MCMIYKLTIVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLS-LLLYISFRY
FG2e*v1   MCMIYKLTIVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLSFLLLYISFRY
FG5e*v1   MCMIYKLTIVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVYITYVFLLLLSFLLLYISFRY
FG6e*v1   MCMIYKLTIVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVYIILCIFIVVIIFIIVYKFSLLD
FG1e*v2   MCMIYKLTYVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLS-LLLYISFRY
FG2e*v2   MCMIYKLTYVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLSFLLLYISFRY
FG5e*v2   MCMIYKLTYVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVYITYVFLLLLSFLLLYISFRY
FG6e*v2   MCMIYKLTYVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVYIILCIFIVVIIFIIVYKFSLLD
FG1e*v3   MCMIYKNIFVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLS-LLLYISFRY
FG2e*v3   MCMIYKNIFVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLSFLLLYISFRY
FG5e*v3   MCMIYKNIFVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVYITYVFLLLLSFLLLYISFRY
FG6e*v3   MCMIYKNIFVLGGILVIIVYLVVMSCILCFVMVIVFYLIWVYCCVYIILCIFIVVIIFIIVYKFSLLD
G1ex         LLFGVLFLICFVYFIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLS-LLLYISFRY
G2ex         LLFGVLFLICFVYFIVYLVVMSCILCFVMVIVFYLIWVYCCVL-YYRFL-VLSFLLLYISFRY
G5ex         LLFGVLFLICFVYFIVYLVVMSCILCFVMVIVFYLIWVYCCVYITYVFLLLLSFLLLYISFRY
G6ex         LLFGVLFLICFVYFIVYLVVMSCILCFVMVIVFYLIWVYCCVYIILCIFIVVIIFIIVYKFSLLD

EATRO SDM 80 Only
FGe80     MCMIYKNNGSCGFVGWFRLGYCYCECCSFCMIIL

Figure 23.  Predicted secondary structures of the most abundant predicted CR3 proteins.  
Secondary structure predictions were generated by RaptorX [153–155].  Shaded regions 
indicate transmembrane alpha helices predicted by Phobius [134].   

Deletion of Encoded Uridines Directed by the Poly-U Tail 

During the analyses of the mRNA deep sequencing data, we observed many partially 

edited transcripts where the deletion of encoded uridines appeared to occur early (prior to 3’ 

insertion events).  We hypothesize that the poly-U tail of preceding gRNAs may direct the 

removal of these encoded uridines. To test this hypothesis, we examined partially edited 

RPS12 and ND7 5’ transcripts with early deletion events, and determined their relative 

proximity to the preceding gRNA (Figure 24).  (CR3 was excluded from this analysis due to the 

variability in gRNA location caused by the multiple branching editing pathways). These analyses 

revealed that for both RPS12 and ND7 5’, encoded U sites that are near the poly-U tail of the 

preceding gRNA (U-Tail accessible sites) have a higher frequency of total deletions in the 

partially edited transcripts. This suggests that the proximity of the preceding gRNA’s poly-U tail 

does impact deletion of encoded uridines and supports our hypothesis that poly-U tails can 

guide the deletion of encoded uridines.   

 

 

Figure 24.  Frequencies of early total deletions of DNA encoded uridines in partially edited 
ND7 and RPS12 transcripts.  Editing sites with DNA encoded Us were identified, and each 
encoded U site was then characterized based on the editing access of the preceding gRNAs’ 
poly-U tail (U-Tail Accessible or Not Accessible) as well as whether the site in question was 
totally deleted in the final fully edited sequence (Correct Total Deletion or Incorrect Total 
Deletion).  For each site, a window was defined consisting of the 6 sites upstream and 6 sites 
downstream of the site in question.  Each editing sequence was then classified based on the 
states in the 3' end of the window as either matching the fully edited sequence or not.  Error 
bars depict standard error.  (*=p<0.05 **=p<0.01 unpaired T-test) 
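The per-site classification described in this legend can be sketched as follows. This is one possible reading, not the code used for the analysis: each transcript is represented as a list of uridine counts per editing site ordered 5' to 3', the "3' end of the window" is taken to be the six sites immediately 3' of the site in question, and all function names and data are hypothetical.

```python
def is_total_deletion(encoded_u, observed_u, site):
    """True if the site had encoded Us and none remain in the observed read."""
    return encoded_u[site] > 0 and observed_u[site] == 0


def three_prime_matches(observed_u, edited_u, site, flank=6):
    """True if up to `flank` sites 3' of `site` all match the fully edited form."""
    stop = min(len(observed_u), site + flank + 1)
    return all(observed_u[i] == edited_u[i] for i in range(site + 1, stop))


def classify_site(encoded_u, observed_u, edited_u, site, flank=6):
    """Summarize one encoded-U site in one partially edited transcript."""
    return {
        "total_deletion": is_total_deletion(encoded_u, observed_u, site),
        "correct_in_final": encoded_u[site] > 0 and edited_u[site] == 0,
        "three_prime_matches_edited": three_prime_matches(
            observed_u, edited_u, site, flank
        ),
    }


# Toy example (made-up counts): site 1 carries two encoded Us that are
# removed in the fully edited sequence.
toy_encoded = [0, 2, 0, 1, 0, 0, 0, 0]
toy_edited = [3, 0, 2, 1, 1, 0, 2, 0]
toy_partial = [0, 0, 0, 1, 0, 0, 0, 0]  # deletion done, 3' insertions not yet made

early = classify_site(toy_encoded, toy_partial, toy_edited, site=1)
late = classify_site(toy_encoded, toy_edited, toy_edited, site=1)
print(early, late)
```

Under this reading, an "early" total deletion is a site that is already fully deleted while its 3' window does not yet match the fully edited sequence, as in the first toy transcript.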
 

Discussion 

In this work, we developed a new transcriptome analysis pipeline to fully characterize 

the editing pathways for three putative dual-coding genes, RPS12, ND7 5’ and CR3. The pipeline 

uses a new program, SKETCH, in combination with our database search program [9]. Combining 

these two programs allowed us to separate true alternative edits from partially edited 

transcripts and allowed the precise mapping of the full progression of the editing process. This 

 

characterization was done in two different cell lines (TREU 667 and EATRO 164) and under 

different energy conditions in order to determine the robustness of the editing process. 

Surprisingly, distinct differences in both editing progression and editing efficiency were 

observed in the two different cell lines. In addition, growth of parasites under different energy 

conditions also appeared to be able to influence the editing process. In both cell lines, the 

editing process appeared to be very inefficient, with most of the transcripts completely 

unedited. A comparison of the two cell lines grown in SDM79 did suggest that overall, the TREU 

667 cells were more efficient in editing these three pan-edited transcripts. However, when the 

EATRO 164 cells were transferred from a high glucose medium (SDM79) to a glucose-restricted 

medium (SDM80), the number of transcripts that initiated the editing process more than 

doubled. For RPS12, the increase in editing initiation resulted in a 4-fold increase in the number 

of fully edited and translatable mRNAs.  

Of the three transcripts characterized, the essential RPS12 showed the most robust 

editing progression. Editing of RPS12 is relatively linear, with only a few minor branching 

alternatives.  For this mRNA, the first start codon found on the fully edited transcripts 

consistently translated into the canonical RPS12 open reading frame and we found no evidence 

of transcripts that access the alternative reading frame. The most prominent alternatively 

edited branch was only observed in the EATRO 164 cell line and causes a frame-shift that 

extends the reading frame at the 3’ end (Figure 14 B2e).  Interestingly, this same alternative edit 

was previously described by Simpson et al. in the 29-13 strain, which shows that this edit is not 

an isolated occurrence in the EATRO 164 strain [148]. They observed the alternative at a low 

abundance, which agrees with our observations as well.  Their analysis identified a large 

 

amount of variance at the 5’ end, with 5.7% of transcripts being translatable in the canonical 

ORF.  We did observe sloppy editing at the 5’ end, but found two primary forms of 5’ end RPS12 

editing.  Interestingly, these data indicate that the 29-13 strain has a similar editing efficiency to 

the TREU 667 strain and EATRO 164 strain grown in SDM80.   

In contrast to RPS12, we found distinct evidence that ND7 5’ is dual-coding. In both cell 

lines, alternative editing by different terminal gRNA variants resulted in transcripts with either 

RF1 (the canonical ND7) or RF3 (a putative metabolite transporter) linked to the first AUG [141].  

Interestingly, ND7 5’ has also been sequenced in 29-13 cells [148].  While that study did not 

directly state evidence of dual-coding, they did indicate that a large proportion of the fully 

edited ND7 5’ transcripts had a single nucleotide difference in the 5’ UTR.  This difference could 

very well be the same difference we observe in E2 transcripts that links an upstream AUG to the 

ARF. This suggests that the ability to access the two different reading frames is maintained 

across a number of different trypanosome cell lines. 

While fully edited ND7 5’ transcripts were found in both cell lines, a major difference 

was observed in the efficiency of the editing process. In TREU 667 cells, over 79% of ND7 5’ 

transcripts had initiated the editing process and a full 9.7% are fully edited. In contrast, EATRO 

164 cells grown under the same conditions (SDM79) had only 52.4% transcripts that initiated 

editing and a scant 0.2% fully edited. Growth of EATRO 164 cells in SDM80 did substantially 

increase the number of transcripts that had initiated RNA editing (72.6%), however, no 

corresponding increase in fully edited transcripts was observed. The major differences in 

editing efficiency appear to be due both to the use of alternative gRNAs that could disrupt the 

editing cascade and to a gRNA mutation that affected the ability of the guide RNA to 

 

efficiently anchor. Surprisingly, the gRNAs that disrupt editing in the EATRO cell line are also 

present in the TREU gRNA transcriptome. It is unclear why we see evidence of their use in only 

the EATRO cells. It may be that in the TREU cells, these gRNAs are more efficiently used in a 

different editing pathway. A full understanding of gRNA selection and use will require the 

characterization of the entire edited transcriptome. In addition to the large decrease in the 

efficiency of ND7 5’ editing observed in the EATRO cells, we also saw a distinct shift in the 

number of fully edited transcripts that translate in RF3, the alternative open reading frame. This 

alternative protein has been previously predicted to be a metabolite transporter as it shares 

distant homology with a bacterial sugar transporter, SemiSWEET [141].  

The most pronounced differences between the two cell lines were observed for the CR3 

transcript. In both cell lines, CR3 utilizes a much more complicated editing pathway than either 

ND7 5’ or RPS12 and the overall efficiency of the editing process is very low.  Surprisingly, the 

number of CR3 transcripts that initiate RNA editing is comparable to the percentage observed 

for RPS12. However, editing by the initiating gRNA appears to be very inefficient. In TREU cells, 

while 15.1 % of the transcripts initiate editing, only 2.2% are fully edited through the first 

editing block. A similar drop is also observed in the EATRO cells. The identified gRNA population 

that initiates editing does not contain any mismatched base pairs and it is unclear why full 

editing by this gRNA is so inefficient. The canonical CR3 is a putative NADH Dehydrogenase 

complex I member (ND4L) [120]. Editing of Complex I members does appear to be 

developmentally regulated, with full editing only observed in the bloodstream stage [5,27–30].  

It may be that editing is stalled right after initiation by a transcript specific mechanism. 

However, a small percentage of transcripts are still edited. Transcripts edited to the canonical 

 


CR3 sequence were only observed in the TREU cell line. In this cell line, four different 3’ editing 

pathways converge to an internal consensus sequence which then diverges again near the 5’ 

end, generating a variety of different proteins in two different reading frames. In the EATRO cell 

line, editing initiates with the same 3’ gRNAs, but diverges at the internal consensus sequence 

when they employ a different set of gRNAs for full editing.  Because of how different their 

editing pathways are, the TREU and EATRO protein products cannot be directly compared. 

Searches were run on various different databases in order to determine the putative functions 

of the many CR3 proteins.  Unfortunately, these searches yielded few significant results, with 

most proteins only sharing homology with the transmembrane domains of many different 

proteins. The very small percentage of CR3 transcripts that undergo full editing suggests that 

the protein products may not be made or utilized in this stage of the parasite life cycle. 

However, we hypothesize that the ability to alternatively edit transcripts may be an important 

evolutionary mechanism to maintain genetic plasticity. 

The dual host life cycle of T. brucei leaves it vulnerable to genetic drift, especially with 

regard to the mitochondrial ETC genes that are not always under selection. Previously, we 

proposed a mechanism that would contribute to the drift robustness of these mitochondrial 

genes. By overlapping ETC genes not under selection in the bloodstream stage with genes that 

are under selection during this stage of the life cycle, the accumulation of mutations can be 

prevented [141]. These overlapped genes share most gRNAs, and this strategy ensures that 

almost all of the genetic material is protected. We also hypothesize that the sequential nature 

of gRNA use and the sensitivity of the RNA editing process to both mRNA and gRNA mutations 

can also protect against genetic drift by increasing the deleterious effects of the mutations 

 


(LaBar and Adami 2017). Increasing the lethality of mutations would ensure that deleterious 

mutations are purged from the population during long periods of growth in the mammalian 

host. While the process of RNA editing may help weed out mutations by making them lethal, it 

would also prevent the population from generating beneficial mutations as well.  This strategy 

leaves organisms no options for evolving.  We suggest that alternative edits, such as those seen 

in the CR3 editing pathways and others previously observed, generate protein diversity without 

compromising the genetic information found within the genome [68].   

Our analysis revealed a number of details about the larger mechanisms of gRNA 

selection in RNA editing.  We found a surprising number of gRNAs that had been identified 

to edit two different mRNA sequences.  While some of these promiscuous gRNAs appear to be 

unproductive, generating dead end branches on their editing pathways, many appear to be 

productively used, as with the editing pathways of CR3 in the EATRO 164 cell line.  The majority 

of these editing pathways are directed by promiscuous gRNAs and these pathways generate 

translatable transcripts.  Interestingly, most of these gRNAs were identified to edit members of 

complex I, particularly ND8 and the 3’ editing domain of ND7.  It may be possible that this is 

another mechanism of increasing the drift robustness of T. brucei by giving these gRNAs 

multiple functions.  Promiscuous gRNAs have been previously identified editing L. tarentolae 

RPS12 and ND3; however, they were not shown to be producing translatable transcripts 

[156,157].   

In addition to this, we determined that RNA editing is not strictly sequential.  While 

overall, editing proceeds from 3’ to 5’ across the editing domain, we have found 

evidence that shows that gRNAs can overwrite editing that has been previously generated.  

 


These observations show that the RNA editing system can tolerate some amount of abnormal 

editing, despite the fragility of the system as a whole. Interestingly, another study that 

examined the use of alternative RPS12 gRNAs previously identified found that the three gRNAs 

in question were being utilized, and that despite not generating the conventional editing 

patterns, a very small number of transcripts returned to the canonical editing pattern after the 

alternatives [148].  This last observation may be another instance of gRNA overwriting.  Non-

sequential editing is also suggested by our proposal that the poly-U tail can be used to direct 

editing.  We showed that deletions in partially edited transcripts were more common in regions 

close to the U tail of the preceding gRNA (Figure 24).  Interestingly, in some cases, an upstream 

deletion site would be completely deleted while a downstream site would not be deleted.  This 

phenomenon was also observed in another study, in an in vitro editing system [158].  

In this analysis we determined the editing pathways of RPS12, ND7 5’, and CR3 in EATRO 

164 and TREU 667 cells, and we found evidence of dual-coding in ND7 and CR3.  We also 

showed that editing patterns can vary quite significantly between cell lines and based on 

available energy sources.  Using the gRNA transcriptomes to validate alternative edits was vital 

to completing this project.  In light of the extreme variation we observed in the CR3 editing 

pathway, we believe that in order to fully understand the dynamics of the editing pathways as a 

whole, we need to sequence all of the edited mRNAs and gRNAs in multiple cell lines under 

multiple conditions.   

Acknowledgements 

We thank the Ken Stuart Lab for the trypanosome cell lines.  We would also like to thank 

Hanyou Pan for his contributions to the data analysis of ND7 5’.   

 

 


CHAPTER 5: CLUSTER CLASSIFICATION OF UNKNOWN GRNAS 

REVEALS THE ROBUSTNESS OF THE RNA EDITING SYSTEM 

Abstract 

The RNA editing system of Trypanosoma brucei uses small RNAs called guide RNAs to 

direct the insertion and deletion of uridines in mRNAs.  Thousands of gRNAs are used in this system to 

render twelve mitochondrial mRNAs translatable.  These gRNAs edit sequentially, with each 

gRNA generating the anchor binding sequence for the next gRNA.  This means that the system is 

inherently fragile.  gRNA transcriptomes have been generated and gRNAs were identified based 

on their complementarity to previously described edited mRNAs.  This method was effective in 

identifying many gRNAs, but we found that many were still unidentified.  To determine if these 

gRNAs were nonfunctional mutants or potentially had undescribed functions, we grouped all 

gRNAs into clusters, where each cluster had significant sequence conservation, using our new 

program ACORNS.  This showed that most unidentified gRNAs were not related to any 

functionally known gRNAs and could be generating unobserved alternative editing patterns.  

Recently, the editing pathways of three genes, RPS12, ND7 5’ and CR3 were described in detail.  

Each gene had branches in their editing pathways, but it was not always clear based on gRNA 

abundance data alone why one branch of the pathway was more expressed than another.  

Using the defined gRNA clusters, we screened all related members editing these three genes 

against their targets to determine what proportion of each family was able to productively edit, 

using another new program GUIDE.  We found that most of these unidentified gRNAs were 

predicted to disrupt RNA editing. However, using this information in combination with the 

 


mRNA data for these three genes, we found that these mutations are highly tolerated by the 

editing system.  We also determined that gRNA abundance does not correlate with the mRNA 

editing preference.  We also observed many gRNA populations of high abundance that were 

apparently not used to edit.  In most cases these populations had no issues in their mRNA 

alignments but were seen to be only used in one cell line and not the other.  Currently, the 

complete mechanism of gRNA selection is unknown.   

Introduction 

The RNA editing system of the trypanosomes is a unique and complex system.  It 

requires the use of two genetic components, the protein coding genes whose transcripts 

require editing, and the small RNA genes encoding the guide RNAs (gRNAs) that direct the 

specific edits [7].  The genes that require editing are all mitochondrially encoded and are either 

associated with the electron transport chain or the mitochondrial ribosome. In this RNA editing 

process, mRNA transcripts can be dramatically altered by the insertion and deletion of uridines 

[4,6].  These editing events are catalyzed by a multi-subunit editosome complex [140].  

Currently, over 47 proteins have been identified that are involved in the cleavage, uridylyl 

addition or deletion and subsequent re-ligation events that are required for every nucleotide 

change [140]. Formation of these specialized complexes as well as performing the hundreds of 

edits required is highly energetically expensive.  The RNA editing process is also incredibly 

fragile to mutations, as it is sequentially dependent.  RNA editing starts at the 3’ end of the 

mRNA, and each gRNA generates the anchor for the next gRNA.  A mutation in any gRNA could 

disrupt formation of the next anchor, halting the editing process and, with it, aborting 

expression of the protein. 

 


It has been hypothesized that this energy intensive process evolved in response to the 

parasite’s complex life cycle [75,76,141]. A dixenous kinetoplastid, Trypanosoma brucei, 

undergoes substantial shifts in its energy sources over its life cycle, requiring extensive 

regulation of metabolic pathways.  In the nutrient deprived environment of the insect, it relies 

primarily on metabolizing amino acids to drive the Krebs cycle and Oxidative Phosphorylation. 

Once transferred to the glucose-rich bloodstream of its mammalian host, however, it shifts away from 

relying on mitochondrial respiration and moves to using glycolysis alone. In addition, the 

exclusively bloodstream nature of T. brucei in the mammal host requires that they replicate 

continuously, escaping the host’s immune response by periodically switching their surface 

glycoproteins [12]. These conditions should make the genes involved in mitochondrial 

respiration particularly susceptible to genetic drift. Recently, we showed that as many as six of 

the mitochondrial transcripts may be dual coding and that alternative editing near the 5’-end of 

the transcript can be used to access both open reading frames (ORFs). We hypothesized that 

the alternative ORF may give the parasites a selective growth advantage in the mammalian 

host. This would significantly increase the deleterious effects of mutations and protect against 

genetic drift. The linking of an ETC gene with an essential gene has been previously shown for 

cytochrome oxidase III (COIII). For this transcript, alternative editing by one gRNA can link an 

ORF found in the unedited 5’ sequence with the trans-membrane domains found in the fully 

edited carboxyl-end of the protein [67]. This alternative protein, AEP-1, is involved in mt DNA 

maintenance and appears to be essential during bloodstream growth [68]. The detection of 

such a large number of dual-coding genes (seven including COIII) in a genome that encodes only 

17 proteins suggests that this is an important mechanism to protect the integrity of the ETC 

 


genes that are required only in the insect. Thus, the ability to overlay genetic information 

may be a protective strategy made possible by RNA editing [99,141]. The changes in metabolism 

are reflected in other changes in the pattern of RNA editing found in the different life cycle stages 

in T. brucei [5,24–30]. For instance, many of the transcripts encoding members of NADH 

dehydrogenase (Complex I) are only fully edited in the bloodstream stage. In contrast, 

cytochrome oxidase II (COII, Complex IV) and cytochrome b (Complex III) are both preferentially 

edited in the insect stage.  Recently, we even observed a shift in the editing pattern found in 

insect stage trypanosomes if the growth medium is switched from glucose-rich to 

glucose-depleted. This suggests that editing can respond to its environment (Chapter 4).  In addition, 

multiple studies have shown that different editing patterns can be observed for transcripts of 

the same mRNA, with edits being directed by multiple alternative gRNAs. The most dramatic 

example of this is seen in the putative NADH dehydrogenase subunit 4L, known as C-rich region 

3 (CR3) [120].  For transcripts of this gene, two different sets of gRNAs were used in two 

different cell lines of T. brucei.  Both editing pathways were complex and possessed multiple 

branches that were fully translatable.  Curiously, the gRNAs required to generate the editing 

pathways of the two different cell lines were present in both cell lines, but apparently not 

selected for use for unknown reasons.   

The gRNAs that direct the RNA editing process are stored on 1 kb circular DNA 

molecules called minicircles. These minicircles make up the bulk of the mitochondrial DNA of T. 

brucei, with up to 10,000 minicircles per cell, and there are an estimated ~1200 different 

sequence classes of minicircle at varying abundances [8,33].  gRNA transcriptomes have been 

generated for the insect and bloodstream stage trypanosomes [9,119]. Analyses of these 

 


transcriptomes identified over 600 gRNA populations (gRNAs that edit the same region of the 

same gene) involved in editing the mitochondrial mRNA transcripts.  These populations typically 

contained multiple different sequence classes of gRNAs (major classes).  The sequence 

differences observed are all R to R or Y to Y changes, and since both A:U and G:U base pairs 

occur in the editing process, the multiple sequence classes can all guide the generation of the 

same mRNA sequence. These data also show extreme quantitative differences between the 

different gRNA populations, with population sizes ranging from <10 to >300,000 reads. We had 

postulated that regions with very low gRNA coverage are areas with mRNA sequence variations 

and that the editing of these regions may be directed by gRNAs not identified due to internal 

mismatches with conventional sequence. In addition, we also identified some abundant, 

mutated gRNAs that could potentially introduce frameshifts or create sequence that disrupts 

the upstream anchor binding site. This was surprising, as we had hypothesized that these 

mutations would be selected against due to the fragile nature of the editing process. Because of 

the high stringency of our initial screen, only gRNAs with mismatch mutations biased towards 

the 3’ or 5’ ends of the gRNAs were initially identified and it was unclear how prevalent 

mutated gRNAs that could disrupt editing were.  Analyses of our gRNA libraries did indicate that 

they contain millions of “unidentified” reads that had key guide RNA characteristics 

(characteristic transcription start sites, U-tail and ~ length).  
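The observation above that purine-to-purine and pyrimidine-to-pyrimidine differences between sequence classes can leave the guided mRNA sequence unchanged follows from the pairing rules (Watson-Crick plus G:U). A minimal sketch of that reasoning is given below; the sequences are hypothetical and the mRNA is written 3’ to 5’ purely so that it lines up index by index with the gRNA.

```python
# Base pairs permitted during editing: Watson-Crick plus the G:U wobble pair.
LEGAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def can_guide(grna, mrna_3to5):
    """True if every gRNA base forms a legal pair with the aligned mRNA base."""
    return len(grna) == len(mrna_3to5) and all(
        (g, m) in LEGAL_PAIRS for g, m in zip(grna, mrna_3to5)
    )

mrna = "UUGG"          # hypothetical edited mRNA segment, written 3'->5'
class_1 = "AACC"       # hypothetical major sequence class
class_2 = "GGUU"       # same guide after A->G and C->U transitions
transversion = "CACC"  # A->C transversion at the first position

print(can_guide(class_1, mrna), can_guide(class_2, mrna), can_guide(transversion, mrna))
```

Both transition variants still pair at every position, whereas the transversion introduces a C:U mismatch.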

In this manuscript, we describe the development of two new pipelines that allow us to 

begin to characterize these “unidentified” transcripts in order to determine what impact they 

might have on RNA editing efficiency. ACORNS (Assemble Clusters Of Related Nucleotide 

Sequences), allows us to identify and classify gRNA sequences. A second program, GUIDE (gRNA 

 


Uridine Insertion/Deletion Editor), simulates the RNA editing process and allows us to predict 

the effect of the gRNA mutation on the editing process. Using the CR3, RPS12 and ND7 5’ 

mRNA transcriptome data, we identified the gRNA populations that can edit or disrupt editing 

of these genes.  We found a surprisingly high toleration of mismatches and gaps in gRNA/mRNA 

alignments, as well as many instances where the most abundant gRNAs present were not those 

preferentially used.  A deeper look at the gRNA transcriptomes also revealed the presence of 

surprisingly many unidentified and functionally unknown gRNAs, suggesting that more work is 

needed to discover their roles in the RNA editing system.   

Materials and Methods 

Cluster analysis of related gRNAs by ACORNS 

In order to determine the relationships of previously identified and unknown gRNAs, a 

new program was created called ACORNS (Assemble Clusters of Related Nucleotide Sequences).  

This program functions to identify putative gRNAs, determine relationships between gRNAs, 

and group related gRNAs into clusters.   

Identification of putative gRNAs. As described previously, identical sequences were 

collapsed and sequences without four consecutive Ts, which indicates the lack of a poly-U tail, 

were filtered out [9].  ACORNS then selected putative gRNAs based on two criteria: having 40 

nucleotides prior to the start of the poly-U tail, and 

having a transcription start site that either matches one of the top twenty most common six 

nucleotide gRNA transcription start sites, or is one mutation away from one of these sites 

[9,119].  Maxicircle edited and unedited sequences as well as ribosomal RNAs were also filtered 

 


from the sequence file.  Identical sequences were collapsed, but retained their overall total 

read abundance.  Additionally, if sequences were identical with the exception of the start 

position of the poly-U tail or the 5’ end location of the gRNA, they were still consolidated, keeping the 

sequence of the most abundant transcript.   
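The filtering criteria above can be sketched in a few lines of Python. This is an illustration rather than the ACORNS source: the function names are hypothetical, "40 nucleotides" is read as a minimum, "one mutation away" is read as a single substitution, and the start-site list is passed in by the caller. The consolidation of reads that differ only in tail position or 5’ end is not shown.

```python
from collections import Counter

POLY_U = "TTTT"   # four consecutive Ts mark the poly-U tail (reads in DNA alphabet)
MIN_BODY = 40     # nucleotides required before the poly-U tail (assumed minimum)
SITE_LEN = 6      # length of the transcription start site motif

def poly_u_start(seq):
    """Index where the last run of four or more Ts begins, or -1 if absent."""
    pos = seq.rfind(POLY_U)
    while pos > 0 and seq[pos - 1] == "T":
        pos -= 1
    return pos

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def is_putative_grna(seq, start_sites):
    """Apply the tail, length and start-site criteria to one read."""
    if poly_u_start(seq) < MIN_BODY:
        return False
    site = seq[:SITE_LEN]
    return any(mismatches(site, s) <= 1 for s in start_sites)

def select_putative_grnas(reads, start_sites):
    """Collapse identical reads, keeping read counts, then filter."""
    counts = Counter(reads)
    return {s: n for s, n in counts.items() if is_putative_grna(s, start_sites)}
```

For example, with a hypothetical start site list `["ATATAT"]`, a read of `"ATATAT" + "ACGA" * 9 + "TTTTTT"` passes, while the same read with only 26 nucleotides before the tail does not.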

Identification of gRNA families. A pair of related gRNAs is defined as two gRNAs that 

differ by only a single substitution, insertion or deletion mutation.  Once putative gRNAs were 

identified, ACORNS aligned each gRNA with every other gRNA to determine which gRNAs were 

related.  Alignments were scored using the python-levenshtein package [159]. In order to 

prevent errors due to 5’ exonuclease activity or difference in poly-U site, overhanging 

nucleotides on either end of the alignment were not counted as mismatches.   
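The relatedness test described above can be illustrated with a small edit-distance routine in which overhangs at either end are free. ACORNS itself used the python-levenshtein package; the pure-Python dynamic program below is a stand-in written for this illustration, and the `max_overhang` bound is an assumption added so that trivially short overlaps are not scored as matches.

```python
def overhang_distance(a, b, max_overhang=3):
    """Edit distance between a and b, with up to max_overhang unpaired
    nucleotides at each end of the alignment not counted as mismatches."""
    n, m = len(a), len(b)

    def excess(k):
        # Overhang beyond the free allowance is charged as ordinary gaps.
        return max(0, k - max_overhang)

    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = excess(i)
    for j in range(m + 1):
        d[0][j] = excess(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
                d[i - 1][j] + 1,                           # gap in b
                d[i][j - 1] + 1,                           # gap in a
            )
    end_a = min(d[i][m] + excess(n - i) for i in range(n + 1))
    end_b = min(d[n][j] + excess(m - j) for j in range(m + 1))
    return min(end_a, end_b)

def is_related(a, b, max_overhang=3):
    """Two gRNAs are related if they differ by at most one edit."""
    return overhang_distance(a, b, max_overhang) <= 1
```

Under this sketch, a sequence and a copy of it extended by two nucleotides at the 3’ end score a distance of zero, while a single internal substitution still counts as one edit.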

ACORNS then grouped gRNAs into families based on their relationships.  Each family was 

grouped together by starting with the most abundant transcript, and including any other gRNAs 

related to that transcript, as well as any gRNAs related to those gRNAs, and so on.  Once gRNAs 

were grouped into families, sequences were trimmed to the same length and any new identical 

sequences were collapsed.   
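The family-building step amounts to collecting connected components of the relatedness graph, seeded in order of abundance. A minimal sketch follows, assuming a dictionary of read counts and a caller-supplied `related` predicate (both hypothetical names); the trimming and re-collapsing of sequences within a family are not shown.

```python
from collections import deque

def build_families(abundance, related):
    """Group sequences into families by following relatedness links.

    abundance: dict mapping sequence -> read count
    related:   function (seq_a, seq_b) -> bool
    Families are seeded with the most abundant unassigned sequence.
    """
    unassigned = set(abundance)
    families = []
    for seed in sorted(abundance, key=abundance.get, reverse=True):
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        family, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            # Relatives of relatives are pulled in as the queue is processed.
            hits = [s for s in unassigned if related(current, s)]
            for s in hits:
                unassigned.remove(s)
                family.append(s)
                queue.append(s)
        families.append(family)
    return families
```

Because every sequence is compared against every unassigned sequence, this sketch scales quadratically and is meant only to show the grouping logic.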

Visualization of these related gRNA clusters was performed by a subprogram, and 

clusters were visualized in two different ways, with color coding based on gRNA identity or 

gRNA abundance.  Identity was defined as the gene known to be edited by previously identified 

gRNAs, with all other gRNAs labeled as “unknown”.  Color coding by abundance was 

based on a log scale, with a separate scale created for each cluster, the least abundant gRNA 

being colored purple and the most abundant gRNA being colored red, and all other gRNAs being 

scaled accordingly.   
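The per-cluster log scaling can be written as a simple normalization, sketched below. The function name is hypothetical, and the mapping of the resulting 0 to 1 value onto a purple-to-red color ramp is left to whatever plotting library is used.

```python
import math

def log_scaled_abundance(abundance):
    """Map each read count in one cluster to [0, 1] on a log10 scale.

    0.0 corresponds to the least abundant gRNA (purple) and 1.0 to the
    most abundant gRNA (red); all counts must be positive.
    """
    logs = {seq: math.log10(n) for seq, n in abundance.items()}
    low, high = min(logs.values()), max(logs.values())
    span = high - low
    return {seq: (v - low) / span if span else 0.0 for seq, v in logs.items()}
```

With counts of 1, 100 and 10,000 reads, the three gRNAs fall at 0.0, 0.5 and 1.0 on this scale.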

 


Prediction of RNA editing by GUIDE 

A reference library of all previously described edited forms of RPS12, ND7 5’, and CR3 

was collected and annotated with the editing states in each transcript (Chapter 4).  For this 

analysis we generated a new piece of software known as the gRNA uridine insertion/deletion 

editor (GUIDE).  GUIDE uses the outputs of ACORNS, and for each gene, GUIDE pulls the families 

of gRNAs that have members that had been previously identified to edit the given gene.  Editing 

of each member of each family was then simulated.  For each family, the template generated 

by the previously identified members of that family was used.  If gRNAs generating different 

forms of the same mRNA were in the same family, all appropriate templates were tested on 

each family member, and the predicted edit with the longest alignment with the gRNA was 

saved.   

In order to predict the edits for each gRNA, GUIDE determines the most likely anchor 

binding location on the template.  Anchor sequences were required to use Watson-Crick base 

pairs only and be located in the first twenty nucleotides (nt) of the gRNA.  The first twenty nt of 

each were scanned along the template, and the longest consecutive stretch of Watson-Crick 

base pairs was identified.  Additionally, a second anchor was identified, using a weighted score, 

where each G:C base pair was worth two points and A:U base pairs were worth one point.  

Editing from both anchors was tested and the anchor that generated the longest edited 

sequence was saved.   
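The two anchor searches described above can be sketched as a single scan. This is an illustration under stated assumptions, not the GUIDE source: the template is supplied 3’ to 5’ so that it aligns index by index with the 5’ to 3’ gRNA, the function names are hypothetical, and ties are resolved in favor of the first anchor found.

```python
GC = {("G", "C"), ("C", "G")}
AU = {("A", "U"), ("U", "A")}

def pair_weight(g, m):
    """2 for G:C, 1 for A:U, 0 otherwise (G:U is not allowed in anchors)."""
    if (g, m) in GC:
        return 2
    if (g, m) in AU:
        return 1
    return 0

def find_anchors(grna, template_3to5, window=20):
    """Return (longest, heaviest) anchors as (score, template_start, grna_start).

    longest  : maximal run of consecutive Watson-Crick pairs (score = length)
    heaviest : run with the highest weighted score (G:C = 2, A:U = 1)
    """
    probe = grna[:window]
    longest = (0, None, None)
    heaviest = (0, None, None)
    for offset in range(-(len(probe) - 1), len(template_3to5)):
        run_len = run_weight = 0
        run_start = 0
        for i, g in enumerate(probe):
            j = offset + i
            in_range = 0 <= j < len(template_3to5)
            w = pair_weight(g, template_3to5[j]) if in_range else 0
            if w == 0:
                run_len = run_weight = 0
                continue
            if run_len == 0:
                run_start = i
            run_len += 1
            run_weight += w
            if run_len > longest[0]:
                longest = (run_len, offset + run_start, run_start)
            if run_weight > heaviest[0]:
                heaviest = (run_weight, offset + run_start, run_start)
    return longest, heaviest
```

For the toy pair `find_anchors("AAAAAUGGG", "UUUUUGCCC")`, the longest anchor is the five A:U pairs at the start, while the weighted anchor is the three G:C pairs (score 6), which shows why the two criteria can select different binding sites.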

The rules of editing used by the program are the standard uridine insertion/deletion 

rules observed in the editing system.  If a nucleotide from the gRNA is aligned to a nucleotide 

from the mRNA that is “illegal” (anything other than Watson-Crick or G:U base pairs), a uridine 

 


will either be inserted or deleted to resolve the mismatch.  If this cannot resolve the mismatch, 

editing ends.  Once edits were predicted, they were classified based on their effect on the 

editing process as a whole and their effect on the predicted protein.   
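The editing rules above can be expressed as a short simulation loop. The sketch below is a simplified illustration rather than the GUIDE implementation: it starts at the first guiding position (anchor handling is omitted), takes the mRNA written 3’ to 5’ so that it aligns with the gRNA, and uses a hypothetical function name.

```python
# Legal pairs during editing: Watson-Crick plus G:U.
LEGAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def simulate_editing(grna, mrna_3to5):
    """Walk the gRNA along the mRNA, inserting or deleting Us at mismatches.

    Returns (edited mRNA written 3'->5', number of gRNA nucleotides used).
    Editing stops at the first mismatch that a U insertion or deletion
    cannot resolve; any unedited mRNA beyond that point is kept as is.
    """
    edited = []
    i = j = 0
    while i < len(grna):
        g = grna[i]
        m = mrna_3to5[j] if j < len(mrna_3to5) else None
        if m is not None and (g, m) in LEGAL:
            edited.append(m)      # legal pair: keep the mRNA base
            i += 1
            j += 1
        elif m == "U":
            j += 1                # unpaired U in the mRNA: delete it
        elif g in "AG":
            edited.append("U")    # gRNA purine with no partner: insert a U
            i += 1
        else:
            break                 # mismatch cannot be resolved: editing ends
    return "".join(edited) + mrna_3to5[j:], i
```

As a usage example, `simulate_editing("GAAC", "CG")` inserts two uridines and returns `("CUUG", 4)`, while `simulate_editing("GCC", "CAG")` stops after the first pair and returns `("CAG", 1)`.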

Results 

In our characterization of the EATRO 164 gRNA transcriptome, we identified over 3 

million reads and ~64,000 unique gRNA sequences capable of generating conventional editing 

patterns [9]. These gRNAs were identified by finding the longest common substring to 

conventionally edited mRNAs and retaining only those scoring 45 or more (Watson-Crick base 

pairs = 2; G:U base pairs = 1). In these studies, we found that lowering the stringency and/or 

allowing for mismatches led to the identification of thousands of gRNAs with characteristics 

suggesting that they were misaligned.  While this initial analysis did identify an almost full 

cohort of guide RNAs, there were still millions of reads in the transcriptome that could not be 

classified. To determine how many of these reads were unidentified gRNAs for possible 

alternative sequence or mutated conventional gRNAs, we developed a new pipeline. ACORNS 

(Assemble Clusters Of Related Nucleotide Sequences) has a number of features that allow it to 

identify and classify putative guide RNAs:  

1.  It filters out all contaminating nuclear and maxicircle sequences.  

2.  It identifies putative gRNAs based on three criteria: presence of a U-tail (defined as 4 

consecutive U-residues), length (40 nts prior to last stretch of U-residues) and 

transcription start site (based on sequences defined in the previous analyses) [9,119]. 

 


3. It identifies related gRNAs by scoring the best alignment of each gRNA against all 

other putative gRNAs in the library. gRNAs with a single mismatch or gap are then 

classified as related. 

4. It clusters related gRNAs by starting with the most abundant transcript, grouping all 

relatives of that transcript into the cluster, then adding all relatives of any relative to the 

cluster. 

5. Using a subprogram, the clusters of related guide RNAs can be visualized, revealing 

the relationships between the previous identified gRNAs and unknown gRNAs.  

ACORNS analyses were done for both the TREU 667 and EATRO 164 procyclic gRNA 

transcriptomes, leading to the identification of 1256 clusters in TREU and 1168 clusters in 

EATRO 164 (Tables 11 and 12). This allowed us to identify all of the gRNAs that were distinctly 

related to a conventional gRNA but had undergone a mutational event. In addition, we also 

identified a number of clusters that had no previously identified members. The clusters 

identified varied greatly in size, with the largest cluster containing over 3400 sequence 

members with a total of 1,377,190 transcript reads. These large clusters (1000+ members) were 

actually quite rare, with only 10 and 4 clusters of this size identified in the TREU and EATRO 

gRNA libraries, respectively (Table 13). Interestingly, the majority of clusters were quite small, 

containing fewer than 25 sequence members and an average transcript number of 

approximately 250 reads (Table 13). The majority of the small clusters were “unidentified”, in 

that they contained no previously identified gRNA member (Table 13). An increase in the size of 

the cluster also increased the probability that it contained a previously identified gRNA and 

that cluster members could then be identified. 

 


Table 11.  Summary of ACORNS Results.  gRNAs are classified as previously identified, related 
to a previously identified gRNA based on sequence similarity or unrelated to any previously 
identified gRNAs. 

                                               TREU 667 Procyclic   EATRO 164 Procyclic
Initial Reads                                          15,251,292            11,387,683
Final Reads after ACORNS Step 2                        11,199,364             9,049,005
Previously Identified Reads                             5,121,216             4,413,142
Previously Unidentified tagged as Related               2,636,714             2,072,099
Previously Unidentified tagged as Unrelated             3,441,434             2,563,764

 
Table 12.  Cluster summary. Clusters were defined as a group of 10 or more related gRNAs. 
Cluster characteristics describe the percent of the cluster members that had been previously 
identified.  

                               TREU 667 Procyclic            EATRO 164 Procyclic
Cluster Characteristics        # of Clusters       Reads     # of Clusters       Reads
≥95% Previously Identified               238   4,232,188               191   3,281,190
>0% Previously Identified                267   3,488,118               215   3,180,019
0% Previously Identified                 751   3,289,359               762   2,401,148
Unclustered                               NA     189,699                NA     186,648

 
Table 13. Cluster size summary. Cluster size is determined by the number of unique sequence 
classes in a cluster.  % Unidentified clusters is the percentage of all clusters in each bin that are 
completely unidentified.   

               # of clusters identified    Total # Reads              % Unidentified clusters
Cluster Size   TREU 667    EATRO 164       TREU 667     EATRO 164     TREU 667    EATRO 164
10 - 24             518          503        137,413       134,707          68%          77%
25 - 49             291          265        240,804       211,154          63%          64%
50 - 99             185          184        427,029       425,535          55%          61%
100 - 249           173          139      1,575,518     1,398,488          45%          54%
250 - 499            60           47      2,167,031     2,153,210          45%          26%
500 - 999            19           26      2,354,552     3,379,901          21%          27%
1000 +               10            4      4,107,318     1,159,362          30%          25%

 

Figure 25 shows an example of three different clusters assembled by the ACORNS 

program. The visualization program represents each individual gRNA sequence as a dot, with 

lines connecting dots representing relationships between the individual sequences. Each 

member is numbered in order of abundance of transcript reads (0 = most abundant). gRNAs can 

be further characterized within the cluster by using color to identify either transcript specific 

gRNAs (Figure 25 A, C and E), or to indicate Log transcript abundance (Figure 25 B, D and F). The 

 


first cluster shown illustrates a relatively small cluster with 104 different gRNA sequence 

members (Figure 25 A and B). In this visualization, it is clear that most of the members were 

previously identified (red dots), including the most abundant member (dot 0). Figure 25B shows 

the transcript abundance of each cluster member. This cluster is very characteristic of most of 

the clusters we identified, in that it has a central very abundant gRNA (red dot in center, >5,000 

reads) with most other cluster members identified having very few reads (purple = 1 read in the 

transcriptome). 98.1% of the transcripts in this cluster were previously identified. Figure 25 C 

and D illustrate a large cluster with 1801 sequence members. In contrast to the first cluster, 

most of the sequence members were previously unidentified (gray dots); a few of the 

members, however, had been previously tagged as RPS12 specific gRNAs (red dots). The 

previously identified RPS12 gRNAs were tagged as editing the Block H region of RPS12.  The 

originally identified transcripts were rare, with only 255 reads found in the EATRO 164 library. 

The ACORNS analysis allowed us to identify an additional 276,358 reads that cover this region. 

However, most of these newly identified gRNAs, including the most abundant sequence class, 

have mutations that affect their ability to correctly edit the mRNA transcript. 

 


 

 

Figure 25.  Example clusters of related gRNAs generated by ACORNS from the EATRO 164 PC 
gRNA transcriptome.  Each dot represents an individual gRNA sequence and lines connecting 
dots represent relationships between gRNAs.  Each cluster is shown with two different color 
schemes; editing gene identity (A, C and E) and gRNA read abundance (B, D and F).  The first 
two clusters (A,B and C,D) both contain RPS12 identified gRNAs (red dots) and previously 
unidentified gRNAs (gray dots). The third cluster (E and F) contains a majority of previously 
unidentified gRNAs (gray), but also contains CR3 identified gRNAs (purple) and a few gRNAs 
that had been tagged for a dead-end ND7 5’ alternative editing pattern (green).   

 


 

 

The final cluster shown also illustrates a large cluster with 1039 members.  (Figure 25 E 

and F).  For this cluster, the most abundant member (member 0) contained over 400,000 

reads and had been previously identified as an alternative CR3 gRNA (gCe80). The ACORNS cluster 

analysis allowed us to identify an additional 829 sequences and over 12,700 reads as being 

related to this alternative gRNA.  Interestingly, in the visualization of this cluster, we noted that 

4 of the members were previously tagged as involved in the generation of a disruptive 

(dead-end) alternative edit of ND7 5’. An examination of these transcripts indicates that they are in fact 

a specific mutant subclass of the gRNA cluster that are now capable of anchoring and creating a 

misedit in ND7. This cluster was the only example of a cluster containing related gRNAs with 

different targets (distinct from promiscuous gRNAs that all have multiple targets), and could be 

an example of how alternative editing originates. 

 


These data analyses suggest that both procyclic libraries have large numbers of 

unidentified gRNAs of unknown function. It may be that these gRNAs are directing alternative 

editing events that have not yet been characterized. The full mRNA transcriptome has not been 

sequenced and the limited amount of mRNA sequence available does suggest an abundance of 

alternative editing (Chapters 3 and 4) [9,67,68,99,141,148,160]. Surprisingly, both libraries also 

contained large numbers of mutated conventional gRNAs. This was unexpected, due to the 

fragile nature of the RNA editing process. We had hypothesized that the sequential nature of 

the RNA editing process would decrease the tolerance for mutations in both the gRNA and the 

mRNA genes (Chapter 4) [9]. In order to determine how these mutated gRNAs might influence 

the RNA editing process, we developed a second program (GUIDE, gRNA Uridine 

Insertion/Deletion Editor) that simulates the RNA editing process. This program takes the fully 

edited mRNA templates for the identified gRNAs in each cluster, finds the best anchor for each 

gRNA and then simulates editing based on the conventional base-pairing rules (A:U and G:U 

pairing both allowed). For each gRNA, the anchor length, length of complementarity and the 

number of sites showing non-conventional editing are determined. This allows the classification 

of the gRNA into different bins: 1) low quality anchor; 2) does not fully edit; 3) conventionally 

edits and 4) alternatively edits. Guide RNAs that generate alternatively edited sequence were 

then further classified as 1) disruptive edits (do not generate the anchor for the next gRNA); 

2) frameshift edits (do not disrupt editing but generate a frameshift); or 3) missense edits 

(do not disrupt editing but generate a missense mutation).  We note that in the GUIDE 

program, we initially classified C:A base pairs as disruptive (stopping the editing process). 

However, because C:A base pairs have been previously identified within known gRNA 

 


alignments, we sorted gRNAs containing a C:A base pair into their own bin [9,119]. These 

analyses were done for all gRNA clusters identified for RPS12, ND7 5’ and CR3. These three genes 

were chosen because the mRNA transcriptomes and full editing pathways had been previously 

characterized (Chapter 4). 
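
The binning step described above can be illustrated with a minimal Python sketch. This is not the GUIDE source code. The function names are hypothetical, and the sketch assumes that the anchor consists only of Watson-Crick pairs, that a C:A pair is recognized regardless of which strand carries the C, and that the gRNA is scored against an already fully edited template (no U insertion or deletion is simulated, and the 3' oligo(U) tail is ignored). The 5 nt anchor cutoff is taken from the low-quality anchor threshold mentioned later in this chapter.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(g, m):
    """Classify one gRNA base (g) against one mRNA base (m)."""
    if (g, m) in WATSON_CRICK:
        return "WC"
    if (g, m) in WOBBLE:
        return "GU"
    if {g, m} == {"C", "A"}:  # assumption: orientation of the C:A pair is ignored
        return "CA"
    return "mismatch"


def pair_types(grna, mrna, end):
    """Pair gRNA base i (5'->3') with mRNA base end - i (antiparallel duplex)."""
    return [pair_type(g, mrna[end - i]) for i, g in enumerate(grna) if end - i >= 0]


def anchor_length(types):
    """Number of consecutive Watson-Crick pairs starting at the gRNA 5' end."""
    n = 0
    for t in types:
        if t != "WC":
            break
        n += 1
    return n


def best_anchor(grna, mrna):
    """Return (anchor length, mRNA index paired with the gRNA 5' base)."""
    return max((anchor_length(pair_types(grna, mrna, e)), e) for e in range(len(mrna)))


def classify(grna, mrna, min_anchor=5):
    """Place a gRNA into one of four simplified bins."""
    length, end = best_anchor(grna, mrna)
    if length < min_anchor:
        return "low quality anchor"
    types = pair_types(grna, mrna, end)
    if "mismatch" in types or len(types) < len(grna):
        return "does not fully edit"
    if "CA" in types:
        return "requires C:A base pair"
    return "conventionally edits"
```

For a toy edited template such as `AUGGUUUACGCAUGC`, a gRNA with a single G:U pair in its guiding region falls into the conventional bin, while a single G:A mismatch moves it into the "does not fully edit" bin. Distinguishing alternative from conventional edits would additionally require comparing the simulated product with the canonical sequence, which this sketch does not attempt.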

Of the unidentified gRNAs that are related to previously identified gRNAs, 942,397 and 

958,140 reads were related to RPS12, ND7 5’ or CR3 editing gRNAs in TREU and EATRO cells, 

respectively.  Of these newly identified gRNAs, 92.6% in TREU and 95.5% in EATRO were 

predicted to be incapable of fully editing, to fail to create the anchor for the next gRNA or to cause 

a frameshift.  This suggests that a large number of gRNAs could be disruptive; however, 

because of the large differences in population sizes for the different guide RNAs, it is important 

to evaluate the percent of disruptive editing at the population level.  These data could indicate 

how tolerant the RNA editing process is to mutations in the gRNA population. 
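
The distinction between counting unique gRNA sequences and counting reads can be made concrete with a short Python sketch. The function name and the example read counts are hypothetical and are not taken from our libraries; the sketch only shows why a population in which most unique sequences are disruptive can still be dominated by functional transcripts.

```python
def disruptive_fractions(population):
    """Return (fraction of unique sequences, fraction of reads) that are disruptive.

    population -- list of (read_count, is_disruptive) tuples, one per unique gRNA
    """
    total_sequences = len(population)
    total_reads = sum(reads for reads, _ in population)
    disruptive_sequences = sum(1 for _, disruptive in population if disruptive)
    disruptive_reads = sum(reads for reads, disruptive in population if disruptive)
    return disruptive_sequences / total_sequences, disruptive_reads / total_reads


# Hypothetical population: one abundant functional gRNA and three rare mutants.
example = [(9000, False), (400, True), (350, True), (250, True)]
sequence_fraction, read_fraction = disruptive_fractions(example)
print(f"{sequence_fraction:.0%} of sequences, {read_fraction:.0%} of reads")
```

In this invented example three of the four unique sequences are disruptive, but they account for only one tenth of the reads.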

RPS12 

Editing of the essential RPS12 transcript was relatively straightforward with a limited 

number of alternative edits (Chapter 4, Figure 26). The main editing pathway involved 10 gRNA 

populations (gA – gJ).  Analyses of these populations still show a large variation in the 

abundance of the different populations even after the cluster analyses (Figure 26, Table 14). 

Surprisingly, we saw no correlation between the abundance of the gRNA populations and RNA 

editing efficiency. For example, in the TREU cell line, the B1 and B1* mRNA transcripts are 

equally abundant (Figure 26A). However, the gB1* gRNA is almost 50-fold more abundant than 

gB1 (Figure 26B and Table 14). In contrast, in the EATRO cell line, the B1 mRNA transcript is 

significantly more abundant (5x), while the gB1 and gB1* guide RNAs are approximately equal 

 


in abundance. Interestingly, the only gRNA identified that can generate the B1* edits requires a 

gap in the alignment with the mRNA to create the correct sequence. Our GUIDE program bins 

this gRNA into the “disrupts editing” bin (brown) and it is unclear why this gap is tolerated. We 

do note that there are three G:C base pairs surrounding the gap that would help stabilize this 

alignment. We hypothesize, however, that the gap in the alignment does in fact affect the ability 

of this gRNA to efficiently edit. In the TREU cells, the abundance of gB1* may contribute to its 

editing efficiency, compensating for the gap and increasing the amount of B1* mRNA observed. 

This pattern appears to be repeated in EATRO, where the B1 mRNA is almost five times as 

abundant as the B1* mRNA, but the gRNAs are nearly equally abundant. Interestingly, in both 

transcriptomes the gRNAs responsible for generating the B1 edits both require C:A base pairs 

for the gRNAs to function correctly.  This was the first of many instances observed where 

editing is impossible without the use of C:A base pairs.  
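
The abundance comparisons quoted in this paragraph follow directly from the read counts in Table 14. The Python sketch below recomputes them; the dictionary layout and the function name are illustrative assumptions, while the read counts are those listed in Table 14 for the gB1 and gB1* populations.

```python
# Read counts for the gB1 and gB1* populations, as listed in Table 14.
TABLE_14 = {
    "TREU 667": {
        "gRNA": {"gB1": 1901, "gB1*": 84459},
        "mRNA": {"B1": 27805, "B1*": 23318},
    },
    "EATRO 164": {
        "gRNA": {"gB1": 19878, "gB1*": 17975},
        "mRNA": {"B1": 28494, "B1*": 5880},
    },
}


def fold_change(numerator, denominator):
    """Ratio of two read counts."""
    return numerator / denominator


for cell_line, reads in TABLE_14.items():
    grna_ratio = fold_change(reads["gRNA"]["gB1*"], reads["gRNA"]["gB1"])
    mrna_ratio = fold_change(reads["mRNA"]["B1"], reads["mRNA"]["B1*"])
    print(f"{cell_line}: gB1*/gB1 gRNA = {grna_ratio:.1f}x, B1/B1* mRNA = {mrna_ratio:.2f}x")
```

The gRNA ratio in TREU comes out at roughly 44-fold, while the B1 to B1* mRNA ratio in EATRO is a little under 5-fold, even though the two gRNA populations there differ by only about 10%.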

A significant number of the gRNA populations required for the editing of RPS12 did have 

large numbers of mutated gRNAs that were predicted to stop (mismatch cannot be resolved by 

the insertion or deletion of a U-residue) or disrupt editing (does not generate the correct 

anchor sequence for the next gRNA). For example, while the gC populations had a small 

number of gRNAs with perfect alignment to the canonical sequence (1.6% in TREU and 5.9% of 

the gC populations in EATRO), most of these gRNAs contain an illegal base pair (G:A or G:G). 

Similarly, more than 70% of the gD population in the TREU cell line have a U:U mismatch in the 

middle of the alignment and both the gG and gH populations have a majority of gRNAs that 

contain a single mismatch in the best alignment with their editing block (Figures 27 and 28). 

Surprisingly, when the GUIDE program simulated the edits generated by the mutants of the gD  

 


Figure 26. Observed RPS12 editing pathways in the TREU 667 cell line (A and B) and the EATRO 164 cell line (C and D).  U = unedited 
transcripts. Dot sizes are proportional to the percent of block level edited transcripts using the gRNA indicated. Colored arrows 
indicate the gRNA population used (A and C). Dashed arrows represent gRNA populations used in only one cell line (superscript ‘e’ or 
‘t’).  gRNA names with superscript ‘p’ represent promiscuous gRNAs.  Dots enclosed by a red box represent end point mRNAs with no 
AUG start codon.  Lines connecting dots (B and D) indicate gRNA population size and functionality. Disruptive gRNAs were 
considered to be those that were predicted by GUIDE to be unable to complete editing (excluding those that only required a C:A 
base pair to finish editing), unable to generate the anchor for the next productive gRNA, or generated a frameshift mutation.   

 

 


Table 14.  gRNA population analyses for RPS12.

Editing  Population         TREU 667    TREU 667   EATRO 164   EATRO 164
Block                     gRNA Reads  mRNA Reads  gRNA Reads  mRNA Reads
A        gA                      416      69,923          95      38,946
B        gB1                   1,901      27,805      19,878      28,494
         gB1*                 84,459      23,318      17,975       5,880
         gB3t                  1,692       2,463           0           0
         gB4t                  2,784         946           0           0
         gB2FSe                    1           0          15       2,820
C        gC                    3,081      46,436         304      35,255
         gCt                       6         577           0           0
D        gD                  387,246      35,563     460,928      20,067
         gFp (Dx edit)         2,877       2,724       2,371       4,617
E        gE                   11,593      35,046       5,269      17,723
         gEep                  1,162           0         178         879
F        gFp                   2,940      29,838       2,371      10,818
         gFep                      0           0       1,843         955
G        gG                    4,505      28,973      19,028      11,163
         gGe                 105,217           0      99,153         468
H        gH                   41,824      29,331     287,393      10,843
I        gI                   14,251      26,417       6,136       9,747
J        gJ+                   5,573      17,870      18,094       7,665
 
population, it predicted that the mutants would generate an alternative sequence that lacks the 

anchor binding site for the gE population (Figures 27 and 28).  However, this alternative 

sequence was not identified in the RPS12 mRNAs of the TREU or EATRO cells (Chapter 4). This 

suggests that the alignment error is tolerated in the generation of the canonical sequence.  For 

most of the mismatches detected, the errors in the alignments are immediately flanked by 

multiple G:C base pairs. It may be that the presence of multiple stable G:C pairs allows the 

editing machinery to tolerate these mismatches. The EATRO gH gRNAs are the exception. This 

population contains a majority of transcripts with a single point mutation near the end of the 

gRNA that should prevent it from generating the full anchor sequence for gI. There are no 

 


stabilizing G:C pairs that flank this mismatch. The other notable difference between the TREU 

and EATRO editing patterns was found in an alternative edit (Ge) observed in the EATRO cell 

line only. We note that the gGe guide RNA is very abundant in both transcriptomes (>100,000 

reads in TREU cells and >90,000 reads in EATRO cells). However, we found evidence of its use in 

only the EATRO cell line. In summary, we identified multiple populations of gRNAs that were 

predicted to be nonfunctional or disruptive in the RPS12 editing pathway.  However, these 

predictions are contradicted by the observed RPS12 mRNA data, suggesting that many of these 

mismatches that we had previously considered to render gRNAs nonfunctional or disruptive are 

tolerated by the RNA editing system.   

 
Figure 27.  Analysis of functionality and abundance of productive gRNA populations that edit 
RPS12 in TREU 667 cells.  The functionality of each subpopulation is shown as a bar, with 
percentage shown on the left y-axis, and subpopulation abundance before and after 
identification of gRNA relatives is shown on the right y-axis.  gRNAs labeled as ‘Disruptive Edit’ 
failed to generate the anchor for the subsequent gRNA, and gRNAs labeled as ‘C:A base pair’ 
required a C:A base pair to be tolerated for the editing to be completed correctly. gRNAs with 
shaded names were found in the TREU 667 gRNA transcriptome but were not used in editing 
the TREU 667 RPS12 mRNAs, despite being used in EATRO 164 cells.   
 

 


Figure 28.  Analysis of functionality and abundance of productive gRNAs that edit RPS12 in 
EATRO 164 cells.  For description of axes and gRNA functionality labels, please see Figure 27.   
 
ND7 

 

As with RPS12, the ACORNS program identified gRNAs related to those previously 

identified to generate the ND7 5’ editing pathways (Figure 29).  The ND7 5’ editing domain 

contains five editing block levels, and the editing pathway is relatively straightforward until the 

final editing block, where alternative editing generates transcripts translatable in multiple 

reading frames (Chapter 4).  In full editing of this transcript, two of the five gRNA populations 

appear to be problematic: gC and gD (Figure 29). In both cell lines, the gC population requires 

both C:A base pairs as well as multiple mismatches or gaps in the best alignments with the 

canonical sequence (Figure 30). These multiple alignment errors do not appear to be well 

tolerated by the editing system, as a severe decrease in editing efficiency was observed in both 

cell lines from the B to C block level (25.8% in TREU cells and 32.4% in EATRO cells) (Chapter 4). 

For the ND7 5’ gD guide RNAs, both cell lines have a majority of gRNA transcripts with a base 

pair mismatch in their anchor.  In the EATRO cells, almost all gRNAs in the gD population 

require either a C:A or an A:A base pair in the anchor (Figure 31).  While these mismatches are 

surrounded by G:C base pairs, editing efficiency does appear to suffer with a 13.0% drop in 

editing efficiency and only 0.4% of transcripts completing the D block level (Table 15).  In 

contrast, in TREU cells, while most gRNAs have an A:A mismatch in the anchor, ~31% of the 

population has the ability to form a conventional Watson-Crick anchor (Figure 30). The drop in 

editing efficiency is less (7.2%) than observed in the EATRO cells, and a full 11.7% of transcripts 

complete D block level editing. In these two problematic gRNA populations of ND7 5’ editing, 

we begin to see what the limits of the RNA editing system are.  We find that multiple 

mismatches as well as mismatches in the anchor binding region of a gRNA appear to have 

severe impacts on editing efficiency.  Surprisingly, however, these aberrant gRNAs do not 

appear to halt editing altogether.   

 


 

Figure 29. Observed ND7 5’ editing pathways in TREU 667 cell line (A and B) and the EATRO 
164 cell line (C and D).  For descriptions of dots and arrows, please see Figure 26.  Dashed 
arrows with open heads represent a hypothetical rewrite.  + indicates that more than one 
mRNA form was condensed into this circle to simplify the figure.  Condensed forms encode 
largely the same amino acid sequence with only small variants.  Terminal dots are colored blue 
for reading frame 1, magenta for reading frame 2, or green for reading frame 3.  Boxed green 
dots have no functional start codon but are translatable into reading frame 3 with the use of an 
alternative start codon (UUG). Lines connecting dots (B and D) indicate gRNA population size 
and functionality. Disruptive gRNAs were considered to be those that were predicted by GUIDE 
to be unable to complete editing (excluding those that only required a C:A base pair to finish 
editing), unable to generate the anchor for the next productive gRNA, or generated a frameshift 
mutation.   

 


 

Figure 30.  Analysis of functionality and abundance of productive gRNAs that edit 
ND7 5’ in TREU 667 cells.  For description of axes and gRNA functionality labels, please see 
Figure 27.  gRNAs with shaded names were found in the TREU 667 gRNA transcriptome but 
were not found to be utilized in editing the TREU 667 ND7 5’ mRNAs, despite being utilized in 
EATRO 164 cells.  The population labeled gE2v1,gE1v2 contained members that generated both 
sequence patterns and could not be separated.   
 

Figure 31.  Analysis of functionality and abundance of productive gRNAs that edit ND7 5’ in 
EATRO 164 cells.  For description of axes and gRNA functionality labels, please see Figure 27.  
gRNAs with shaded names were found in the EATRO 164 gRNA transcriptome but were not 
found to be utilized in editing the EATRO 164 ND7 5’ mRNAs, despite being utilized in TREU 667 
cells.  The population labeled gE2v1,gE1v2 contained members that generated both sequence 
patterns and could not be separated.   

 


 

 

Table 15. gRNA population analysis for ND7 5’

Editing  Population         TREU 667    TREU 667   EATRO 164   EATRO 164
Block                     gRNA Reads  mRNA Reads  gRNA Reads  mRNA Reads
A        gA                    6,819     523,128         589     430,266
B        gB                   35,890     510,653       7,478     395,621
         gBex                    113           0      21,775      23,440
C        gC                       92     196,504      24,332      67,080
         gCFSt                    66      19,717           2           0
         gC1ex                   161           0          94      38,479
         gC2ex               538,052           0     434,196      21,471
D        gD                   49,883     133,558       7,781       3,552
E        gE1+/gE2+            14,804     106,379     163,481         942
         gE4t                 14,772       5,651          98           0
         gE4e                     53           0     246,486       1,301

CR3 

The editing pathways of CR3 are significantly different from those of RPS12 and ND7 5’ 

(Figure 32, Chapter 4).  The most obvious difference is the fact that the pathways are highly 

branched, generating multiple distinctly different mRNA products.  The other key difference is 

that most of the editing pathways defined in the TREU and EATRO cells are not shared.  The two 

cell lines appear to use almost completely different sets of gRNAs to edit CR3, with the 

exception of some edits at the very 3’ end of the transcript. What was most surprising about 

these data was the fact that most of the gRNAs necessary to generate both the TREU and 

EATRO CR3 editing pathways were found in both gRNA transcriptomes in similar relative 

abundances (Table 16, Chapter 4). In addition, similar to the RPS12 and ND7 5’ data, we saw 

very little correlation between the abundance of the gRNA population and the corresponding 

abundance of the mRNA transcript generated. For example, in the EATRO cell line, the gB1B2 

population (38,663 reads) was significantly more abundant than the gB4 guide RNA population 

(100 reads). Nevertheless, the number of mRNAs with the B4 editing pattern was 5-fold higher 

 


than the B1B2 mRNAs. The EATRO B4 mRNAs can be further edited by three different gRNAs, 

gB5e, gB6e and gB7e.  Again, while gB6e is the most abundant of the three, the predominant 

mRNA found is the B5e transcript (Table 16). In addition, while the gB5e guide RNA is also 

abundant in TREU cells, we found no corresponding B5e mRNAs in this cell line. The most 

significant divergence of the EATRO CR3 editing pattern occurs at the B to C editing block 

transition. In TREU, all of the different 3’ editing patterns converge and are further edited by 

the gC guide RNA population. In EATRO cells, the gC population is much less abundant (1,590 

reads instead of >17,000). Correspondingly, very few mRNA transcripts were observed that 

used a gC guide.  Instead, the bulk of the B-level transcripts were further edited by gCe (a rare 

gRNA with only 43 reads detected).  The C-block gRNAs also contained the most abundant 

gRNA detected for CR3 editing, gCe80. This gRNA has close to 500,000 reads in both cell line 

transcriptomes. It can anchor to all of the B block mRNAs in both the TREU and EATRO 

pathways but was only observed editing the B5e template in the EATRO cells. Despite the 

overwhelming abundance of the gCe80 guide RNA population, the corresponding mRNA had the 

smallest number of detected reads. 
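
The lack of correspondence between gRNA and mRNA abundance at the EATRO C block can be shown with a simple rank comparison. In the Python sketch below, the read counts are the EATRO 164 values listed in Table 16 for gC, gCe and gCe80; the function name is an illustrative assumption.

```python
# EATRO 164 read counts for the three C-block populations (Table 16).
grna_reads = {"gC": 1590, "gCe": 43, "gCe80": 434196}
mrna_reads = {"gC": 616, "gCe": 2079, "gCe80": 270}


def rank_order(counts):
    """Population names sorted from most to least abundant."""
    return sorted(counts, key=counts.get, reverse=True)


print("gRNA abundance order:", rank_order(grna_reads))
print("mRNA abundance order:", rank_order(mrna_reads))
```

For these three populations the two orderings are exact mirror images: the most abundant gRNA corresponds to the least abundant mRNA form, and the rarest gRNA corresponds to the most abundant one.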

Surprisingly, there were very few CR3 gRNA populations in either cell line that had 

sequence members that could disrupt editing (Figures 33 and 34).  In the TREU cell line, the FGtx  

5’ end pattern of mRNA editing is generated by a promiscuous gRNA, gFGtxp (originally identified 

as a CR4 gRNA). This gRNA population is predicted to cause a frameshift just upstream of the  

anchor region of gFGtxp.  However, this frameshift is not observed in the mRNA population, and 

it appears that this gap is tolerated like those observed in RPS12.   

 

 


 

 
Figure 32. Observed CR3 editing pathways in TREU 667 (A and B) and EATRO 164 (C and D) cell 
lines.  For descriptions of dots and arrows, please see Figure 26.  Dashed arrows with open 
heads represent a hypothetical rewrite.  + indicates that more than one mRNA form was 
condensed into this circle to simplify the figure. Condensed forms encode largely the same 
amino acid sequence with only small variants. Terminal dots are colored blue for reading frame 
1, or magenta for reading frame 2.  Boxed magenta dots have no functional start codon but are 
translatable into reading frame 2 with the use of an alternative start codon (UUG). Lines 
connecting dots (B and D) indicate gRNA population size and functionality. Usable gRNAs were 
considered to be those that correctly edit, utilize C:A base pairs to edit, or generate only small 
missense mutations.   
 
 
 
 
 
 
 
 
 

 


Table 16.  CR3 gRNA population analysis

Editing  Population         TREU 667    TREU 667   EATRO 164   EATRO 164
Block                     gRNA Reads  mRNA Reads  gRNA Reads  mRNA Reads
A        gA1                     100       5,907       2,368         534
         gA2                   1,729      28,654      19,947       5,934
B        gB1B2                13,278      10,776      38,663         757
         gB3t                  1,598       9,158           0           0
         gB4                     945       9,536         100       3,720
         gB4t'                    12       8,692           0           0
         gB5e                  3,800           0         803       2,025
         gB6e                    221           0       2,359         689
         gB7e                  1,376           0         369         744
C        gC                   17,045      27,646       1,590         616
         gCe                      37           0          43       2,079
         gCe80               538,230           0     434,196         270
D        gD                    1,649      26,287      30,125         280
         gDep                     24           0         118       1,981
DE       gDEe80                  301           0         120           0
E        gE                    9,135      15,016      15,242         204
         gEt'                    567      14,687         582           0
         gEt                      34       5,099           0           0
         gEep                 14,062           0      30,982       1,864
F        gFt                  34,666       8,073     189,828           0
         gFexp                     0           0       1,265         112
FG       gFGtp                    14       1,035          90           0
         gFGtxp               49,892       2,172       3,220           0
         gFGep                   219           0       1,912          79
         gFGe*p               69,138           0     104,747         691
         gFGe80                  255           0         589           0
G        gGt                     466       1,231      21,788           0
         gGep                 38,001           0      17,318          68

 

 


Figure 33.  Analysis of functionality and abundance of gRNA subpopulations that edit CR3 in TREU 667 cells.  For description of 
axes and gRNA functionality labels, please see Figure 27.  gRNAs with shaded names were found in the TREU 667 gRNA 
transcriptome but were not found to be utilized in editing the TREU 667 CR3 mRNAs, despite being utilized in EATRO 164 cells.   

 

Figure 34.  Analysis of functionality and abundance of gRNA subpopulations that edit CR3 in EATRO 164 cells.  For description of 
axes and gRNA functionality labels, please see Figure 27.  gRNAs with shaded names were found in the EATRO 164 gRNA 
transcriptome but were not found to be utilized in editing the EATRO 164 CR3 mRNAs, despite being utilized in TREU 667 cells.   

 

 


In the EATRO cell line, only two terminal gRNA populations have predicted editing 

problems.  About half of the gFGe*p subpopulation has a low-quality anchor (<5 nt); however, 

the subpopulation does have >53,000 reads that have higher quality anchors, so this mutation 

may be tolerated. The gGep population is also interesting, in that it is predicted to generate a 

frameshift when editing was simulated by our GUIDE program (Figure 34).  This frameshift is 

not observed in the mRNA transcriptome data, however, because editing appears to stop before 

reaching the last two sites (Chapter 4).  

Discussion 

ACORNS and GUIDE are two new tools that can aid in the understanding of the 

complex dynamics of the kinetoplastid RNA editing system.  The ability of ACORNS to cluster 

related gRNAs proved to be a powerful mechanism for the identification of gRNAs with 

mutations that disrupt their alignment to fully edited sequence. These analyses allowed us to 

identify and characterize nearly 2 million additional gRNA transcripts from our transcriptome 

libraries. In addition, these analyses identified a large cohort of gRNA clusters that are not 

involved in directing the sequence changes associated with the known canonical transcripts. 

More than 25% of the gRNA transcripts found in the EATRO and TREU gRNA transcriptomes 

have no known functional relatives. This strongly suggests that the coding capacity of the 

mitochondrial genome is much larger than previously thought.  

Full characterization of the known gRNA population was also informative. In our initial 

gRNA characterization study, we had noted the extreme population differences found between 

the different identified gRNAs [9]. We had hypothesized that the low copy number gRNAs were 

an artifact of the high stringency of our initial screen. The cluster analyses suggest that extreme 

 


population size differences do exist between the different gRNAs. In both the TREU and EATRO 

transcriptomes,  ~30 clusters were identified with over 500 sequence members each. These 

clusters accounted for the bulk of the gRNA transcript reads (6,461,870 reads in TREU and 

4,539,263 reads in EATRO).  In contrast, over 500 clusters were found that contained fewer 

than 25 sequence members. These tiny clusters account for only ~135,000 transcript reads. The 

very large numbers of low copy number gRNAs are suggestive of high plasticity in the gRNA 

encoding minicircles. Studies in Leishmania have suggested that minicircle sequence class 

frequencies are extremely variable [63,78]. How these huge differences in gRNA abundance 

influence gRNA selection and use is unclear. In only ~50% of the editing branch points 

characterized for RPS12, ND7 5’ and CR3 did the abundance of the gRNAs involved somewhat 

correlate with the preferred editing path. However, even when gRNA abundance did align with 

mRNA abundance, they were often not proportional. In addition, block editing by an abundant 

gRNA is often followed by editing using a rare gRNA, with no equivalent drop in editing 

efficiency. Furthermore, multiple instances were observed where highly abundant gRNAs did not 

appear to be used in one cell line. In one example, the gRNAs responsible for generating the 

B5e, B6e and B7e mRNA forms of CR3 are present in both cell lines, but apparently act only in 

the EATRO cell line, despite all being more than ten times more abundant than their only 

competitor. It may be that protein factors play a predominant role in gRNA selection. One study 

examined the role of the RNA editing mediator complex (REMC), which is heterogeneous and 

consists of one primary subunit, TbRGG2, that forms associations with either MRB8170 or 

MRB8180 [161].  The authors showed that depletion of MRB8180 caused global effects on RNA editing, 

but depletion of MRB8170 had transcript specific effects, substantially increasing the amount of 

 


pre-edited RPS12 transcripts, but not significantly affecting the amount of pre-edited ND7 5’ 

transcripts.  This specificity is intriguing, and it is possible that the REMC or other protein 

factors are involved in gRNA selection.  

The identification of all relatives of conventional gRNAs, including those that had 

undergone a mutational event, also allowed us to more carefully characterize the effect of 

mutational noise on the RNA editing system. These analyses indicate that the RNA editing 

system can tolerate more mutational noise than we had originally hypothesized. Surprisingly, a 

large number of the gRNA populations we characterized had base pair mismatches with their 

best aligned mRNA transcripts.  In this study, we saw that while most of the mismatches were 

specifically C:A base pairs, almost every other mismatch was also observed. Even gaps in the 

alignment of the gRNA to the mRNA appeared to be tolerated. In almost all cases however, the 

mismatch base pair and/or gap appeared to be stabilized by multiple flanking G:C base pairs. 

While we cannot rule out the possibility that rare, perfect match gRNAs do exist for these 

regions, it may be that the incompletely base paired interaction is the most stable structure 

possible, hence is generated by that gRNA [162].  While a minimum number of non-paired 

nucleotides does appear to be tolerated within the guiding region, mismatches or non-Watson-

Crick base pairs within the anchor region do not appear to be tolerated, greatly affecting the 

efficiency of the RNA editing process [119]. 
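
The observation that tolerated mismatches are usually flanked by G:C pairs can be checked systematically with a helper like the Python sketch below. The function name and the three-position window are assumptions made for illustration and are not parameters of GUIDE. Both strands are written so that position i of the gRNA pairs with position i of the mRNA (that is, the mRNA is written 3' to 5'), and gaps are not handled.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GC_PAIRS = {("G", "C"), ("C", "G")}


def flanking_gc(grna, mrna, window=3):
    """Map each non-pairing position to the number of G:C pairs within `window`.

    grna, mrna -- equal-length strings in which position i of one strand
                  is aligned with position i of the other
    """
    if len(grna) != len(mrna):
        raise ValueError("aligned strands must have the same length")
    counts = {}
    for i, pair in enumerate(zip(grna, mrna)):
        if pair in CANONICAL:
            continue
        lo = max(0, i - window)
        hi = min(len(grna), i + window + 1)
        counts[i] = sum((grna[k], mrna[k]) in GC_PAIRS for k in range(lo, hi) if k != i)
    return counts


# Toy duplex with a single A:A mismatch at position 3.
print(flanking_gc("GCGAGCC", "CGCAUGG"))
```

In the toy duplex, the A:A mismatch has five G:C pairs among its six neighbors; the remaining neighbor is a G:U pair and is not counted.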

In a large number of the gRNA populations containing alignment mismatches, our GUIDE 

program predicted that the mismatch would drive the generation of an alternative edit. 

However, these alternative edits were not observed in the mRNA transcriptome data. These 

data contradict one of the most prominent models of RNA editing progression, known as the 

 


“mismatch recognition” model [33].  In this model, when the gRNA/mRNA duplex initially 

forms, the editosome proceeds to edit beginning at the first mismatch site closest to the anchor 

binding region.  Once this mismatch is resolved either by the insertion or deletion of a uridine, 

the next mismatch site is edited, and no sites further will be edited until the sites nearest the 

anchor binding region are resolved.  When GUIDE predicts editing patterns, it follows this 

model, predicting each editing site, moving from the anchor binding region towards the poly-U 

tail.  If editing did follow this strict mismatch recognition model, the alternative sequences 

predicted by GUIDE should have been observed in the mRNA transcriptome data.  An 

alternative model suggests that RNA editing occurs via a more “dynamic interaction” [162].  

This model proposes that when a gRNA/mRNA duplex forms, the editosome targets regions of 

the duplex with low thermodynamic stability and edits those regions. As editing progresses, the 

duplex realigns, changing the targets of the editing system. These cycles of progressive 

realignment proceed until the gRNA/mRNA duplex reaches maximum stability.  In this way, RNA 

editing does not necessarily proceed in a strict 3’ to 5’ directional manner. This model suggests 

that mismatches and gaps in alignments can be tolerated because they do not significantly 

impact the stability of the final gRNA/mRNA duplex.  Supporting this are the frequent 

observations of neighboring G:C base pairs, which would substantially enhance the stability of 

these mismatches.  The “dynamic interaction” model is further supported by the 

existence of “junction regions” in partially edited mRNAs [161–165,156,166,148,160].  These 

regions adjoin the unedited and fully edited regions of a partially edited mRNA, but do not 

match either the unedited or fully edited sequence.  Junction regions vary significantly across 

partially edited transcripts, possessing no consensus sequence.  These regions can vary in size 

 


and depletion of different protein factors affects their occurrence and length, but the presence 

of junction regions remains ubiquitous across all partially edited mRNAs [161–

165,156,166,148,160].  The mismatch recognition model reconciles the presence of junction 

regions as areas of mis-editing (utilization of the wrong gRNA or a misaligned gRNA), hence all 

junction regions would have a gRNA capable of generating the sequence. In our 

characterization of the editing pathways of RPS12, ND7 5’ and CR3, highly abundant mRNA 

sequences were screened against the gRNA transcriptomes at low stringency levels in order to 

identify true alternative edits. While we were able to identify a number of gRNAs that could 

direct alternative edits, a large number of “junction sequences” were identified that do not 

match any gRNA in our databases. If the “dynamic interaction” model is correct, these multiple 

variable junction sequences could be generated by the same gRNA during the editing process.  

Alternative base pairs have also been shown to be tolerated to different extents in an in vitro 

gRNA directed deletion assay [151]. In this study, substitutions were made of the nucleotides 

immediately upstream of a deletion site.  This study found that when the base pair upstream of 

the deletion site was C:A, C:U or C:C, the site was still found deleted in the mRNAs.  Deletions 

were not observed when the base pair was a G:A or G:G.    
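
The strict, site-by-site progression assumed by the mismatch recognition model discussed above can be sketched in a few lines of Python. This is a simplified illustration rather than the GUIDE implementation: the function name is hypothetical, only the guiding region downstream of an already formed anchor is considered, every non-canonical pair that cannot be resolved by inserting or deleting a U (including C:A) stops editing, and both inputs are read in the direction of editing, moving away from the anchor.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def simulate_strict_editing(guide, pre_mrna_3to5):
    """Edit one site at a time, moving away from the anchor.

    guide         -- gRNA guiding region, 5'->3', starting just past the anchor
    pre_mrna_3to5 -- pre-edited mRNA read 3'->5', starting just past the anchor
    Returns (edited mRNA read 3'->5', completed); inserted U's are lowercase.
    """
    edited = []
    j = 0
    for g in guide:
        # Delete encoded U's that cannot pair with the current guide base.
        while (j < len(pre_mrna_3to5) and pre_mrna_3to5[j] == "U"
               and (g, "U") not in CANONICAL):
            j += 1
        if j < len(pre_mrna_3to5) and (g, pre_mrna_3to5[j]) in CANONICAL:
            edited.append(pre_mrna_3to5[j])  # position already pairs
            j += 1
        elif g in "AG":
            edited.append("u")  # insert a U opposite a guiding A or G
        else:
            return "".join(edited), False  # unresolvable mismatch: editing stops
    return "".join(edited), True


print(simulate_strict_editing("CAAGC", "GCUUG"))
print(simulate_strict_editing("CC", "GA"))
```

In the first call two U's are inserted and two are deleted before the guide is exhausted. In the second call editing halts at the C:A pair, which is how such pairs were initially treated before being given their own bin.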

Another facet of gRNA utilization is the existence of promiscuous gRNAs, gRNAs editing 

more than one target.  In this analysis we showed many populations that edit more than one 

gene, and one population editing the same gene in two different locations (gFp of RPS12).  

Interestingly, most of the promiscuous populations were found in the CR3 data set, and these 

promiscuous gRNAs were the only productive promiscuous gRNAs identified.  No promiscuous 

populations were found editing ND7 5’, and the only promiscuous gRNAs found to edit RPS12 

 


lead to editing pathway dead ends.  Predictions made by GUIDE indicate that many of the 

promiscuous gRNAs generating the CR3 EATRO specific pathways should also be able to 

productively edit in TREU. Because these are promiscuous gRNAs, it is possible that their 

availability is impacted by their use in editing other transcripts. A full mRNA transcriptome 

would allow a full analysis of the global impacts of editing on any particular gRNA cluster. 

This study revealed the surprising amount of noise and errors that are tolerated in the RNA 

editing system of Trypanosoma brucei.  Some questions still remain, such as the functions of 

the large proportion of unknown gRNAs, how gRNAs are selected, and the true extent 

of gRNA promiscuity.  To answer these questions, we believe that full deep sequencing 

of all edited mRNAs, paired with gRNA transcriptomes of multiple cell lines, would shed more 

light on this complicated situation.   

Acknowledgements 

We would like to thank the Ken Stuart Lab for the trypanosome cell lines and Chris 

Adami for his assistance in the conceptualization of this work.   

 

 

 


CHAPTER 6: SUMMARY AND DISCUSSION 

Introduction 

Trypanosoma brucei is one of the few organisms that utilizes the kinetoplastid RNA 

editing system.  This system seems unnecessarily complex, using two genetic components to 

generate one fully functional product.  Moreover, this system is prone to malfunction; each 

gRNA generates the anchoring region for the next gRNA, making the system sequentially 

dependent.  This makes the mutation or loss of any gRNA along the editing pathway extremely 

detrimental, especially considering that two of the twelve edited genes are essential [17,100].  

This problem is made worse by the fact that some gRNAs are incredibly rare, and during 

replication, the 5,000-10,000 minicircles encoding the gRNAs are divided asymmetrically, 

making minicircle loss not only possible, but routine [8,167].  This system should not work, but 

it does.  Kinetoplastids are some of the most successful parasites on Earth, infecting insects, 

plants, mammals, fish, birds, and reptiles [168].   
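
The sequential dependence described above can be illustrated with a minimal Python sketch. The gRNA names and the function edit_progress are hypothetical and serve only as an illustration; the sketch assumes a single linear pathway in which each gRNA can act only after the previous gRNA has created its anchor binding site. 

```python
def edit_progress(pathway, available):
    """Return the gRNAs that act before editing stalls.

    pathway   -- gRNA names in the order they act along the editing domain
    available -- set of gRNA names present in the cell
    """
    used = []
    for grna in pathway:
        if grna not in available:
            # The anchor for every later gRNA is never created,
            # so editing cannot continue past this point.
            break
        used.append(grna)
    return used


# Hypothetical four-gRNA pathway, for illustration only.
pathway = ["gRNA-1", "gRNA-2", "gRNA-3", "gRNA-4"]

complete = edit_progress(pathway, set(pathway))
stalled = edit_progress(pathway, {"gRNA-1", "gRNA-3", "gRNA-4"})

print(complete == pathway)  # True: every gRNA is present, editing completes
print(stalled)              # ['gRNA-1']: losing gRNA-2 halts all later editing
```

In this simplified picture, the loss of a single early gRNA leaves the remaining gRNAs unusable even though they are still present, which is the fragility the paragraph above describes. 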

The study of this editing system inevitably brings up questions of how this system 

evolved and how it continues to be maintained.  The concept of drift robustness begins to 

explain this surprising amount of fragile complexity [80].  Drift robustness is a form of genetic 

robustness that allows an organism to be protected from extreme events of genetic drift by 

making mutations either neutral or lethal.  T. brucei frequently undergoes population 

bottlenecks throughout its life cycle, and as its mitochondrion is completely asexual, it seems 

particularly prone to genetic drift [22].  The fragility of the RNA editing system seems to 

coincide remarkably well with the idea of drift robustness.  By rendering mutations or loss of 

 

minicircles lethal, the system would prevent accumulation of slightly deleterious mutants in the 

population.  But such a system should suffer from another disadvantage.  In a system where 

mutations are lethal or neutral, how can the organisms continue to evolve?   

In the first examination of the gRNA transcriptome, many gRNAs were identified that 

were capable of generating alternative edits [9].  These edits ranged from having no effect on 

the protein sequence to causing a frameshift and altering a large portion of the protein.  These 

findings sparked this project, which sought to understand the impact of alternative editing on 

genetic integrity, developmental regulation, protein diversity and editing efficiency.  These 

goals were accomplished through the generation of the bloodstream gRNA transcriptome and 

comparative analysis of it with the insect stage gRNA transcriptome, analysis of dual-coding 

genes that utilize alternative edits to access multiple reading frames, the generation of libraries 

of putative dual-coding mRNAs at different states of editing, and analysis of gRNA population 

diversity and the impact of that diversity on the editing system as a whole.   

Summary of Chapter 2 

This chapter characterized the gRNA transcriptome of EATRO 164 bloodstream stage 

Trypanosoma brucei, and compared it to the gRNA transcriptome of procyclic stage 

trypanosomes.  As with the procyclic gRNA transcriptome, conventionally accepted fully edited 

mRNA sequences were used to identify the gRNAs, and a comparison of the two life cycle 

transcriptomes shows a 3.5:1 ratio of procyclic to bloodstream gRNA reads. This ratio varies 

significantly by gene and by gRNA populations within genes. The variation in the abundance of 

the initiating gRNAs for each gene, however, displays a trend that correlates with the 

developmental pattern of edited gene expression. Surprisingly, there were very few gRNAs 

 

found in both transcriptomes, but there were many gRNAs that appeared to be related 

between transcriptomes.  Comparing these related major classes from each transcriptome 

revealed a median value of ten single nucleotide variations per major class. Nucleotide 

variations were much less likely to occur in the consecutive Watson-Crick anchor region, 

indicating a very strong bias against G:U base pairs in this region. In spite of the variation we 

saw between related gRNAs, we did find several conserved gRNA characteristics, such as 

transcription start site sequence, length of complementarity, and non-base pairing nucleotides 

at the 5’ and 3’ ends of gRNAs. Overall, gRNA coverage of edited mRNAs as well as overlap 

between adjacent gRNAs was lower in the bloodstream gRNA transcriptome than in the 

procyclic gRNA transcriptome.   

This work indicates that gRNAs are expressed during both life cycle stages, and that the 

differences in the extent of editing previously reported for different mRNA transcripts are not 

due to the presence or absence of gRNAs. However, the abundance of the initiating gRNAs may 

be important in the developmental regulation of RNA editing.  

Summary of Chapter 3 

In this work, we show that many of the mitochondrially edited mRNAs in T. brucei can 

alter the choice of open reading frame by alternative editing of the 5' end. Dual-coding genes 

have specific mutational biases, such as an increase of the ratio of nonsynonymous to 

synonymous mutations, and an increase in mutational frequency overall.  Analyses of 

mutational bias of all mitochondrial genes indicate that six of the pan-edited genes may be 

dual-coding.  These analyses include measuring the conservation of editing patterns between T. 

brucei and T. vivax transcripts and frequencies of different types of mutations. These data were 

 

used in a principal component analysis, which showed a distinct difference between alternative 

reading frames of dual-coding genes and single-coding genes.  Discovery of alternative gRNAs 

reveals that RNA editing can allow access to both reading frames. We predicted the functions of 

two of these alternative reading frames as small metabolite transmembrane transporters.  We 

hypothesize that dual-coding genes can protect genetic information by overlapping genes that 

are under selection at different portions of the life cycle.   
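
The way a single inserted uridine shifts the downstream reading frame can be sketched in a few lines of Python. The sequences below are invented for illustration and are not T. brucei transcripts; the lowercase u marks an inserted uridine, following the convention used in Appendix B, and the helper name codons is hypothetical. 

```python
def codons(rna, frame=0):
    """Split an RNA string into complete codons, starting at offset `frame`."""
    seq = rna.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Invented sequence, for illustration only.
canonical = "AUGGCAUUAGGC"
# The same sequence with one additional uridine inserted after the start codon.
alternative = "AUGuGCAUUAGGC"

print(codons(canonical))    # ['AUG', 'GCA', 'UUA', 'GGC']
print(codons(alternative))  # ['AUG', 'UGC', 'AUU', 'AGG']
```

Every codon downstream of the insertion changes, so two editing patterns that differ only slightly can direct translation of two different proteins from largely shared sequence. 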

Summary of Chapter 4 

In this study, we analyzed the editing patterns of three putative dual-coding genes, 

ribosomal protein S12, the 5’ editing domain of NADH dehydrogenase subunit 7, and C-rich 

region 3, and constructed detailed editing pathway maps using mRNA and gRNA transcriptome 

data.  While editing of RPS12 showed only transcripts that produce the canonical RPS12 

protein, we did observe a second downstream start codon capable of producing the alternative 

protein, if selected by the ribosome.  In ND7 5’ and CR3, we found evidence that both of these 

transcripts are edited to express protein products in more than one reading frame.  Moreover, 

we found that CR3 has a very complex set of highly branched editing pathways that vary 

significantly between cell lines, with a different set of gRNAs being used in each cell line, 

despite both sets of gRNAs being present in both cell lines.  We also found that changing the 

energy source available to cells alters the editing preferences of both CR3 and ND7 5’.  In 

addition to this, we found evidence that the poly-U tail that is added post-transcriptionally to 

gRNAs may also be used in editing.  These findings suggest that these reading frames can be 

alternatively selected based on the current environment, and that alternative editing may be a 

way for the trypanosomes to continue evolving this rigid editing system.   

 

Summary of Chapter 5 

In the analysis of the gRNA transcriptomes, gRNAs were identified based on 

complementarity to edited mRNAs.  While millions of gRNAs were found using this method, we 

discovered that many gRNAs were still left unidentified.  This high proportion of unidentified 

gRNAs was shocking, as we had predicted that the RNA editing system should be intolerant of 

mutations, and the presence of so many unidentified gRNAs meant that many editing pathways 

had not been characterized, or many mutated gRNAs were also present. To determine the 

identity and function of these gRNAs, two new programs were created, ACORNS and GUIDE.  

The first program, ACORNS, groups related gRNAs into clusters, where each cluster has 

significant sequence conservation.  This analysis showed that more than half of all unidentified 

gRNAs were not related to any functionally known gRNAs and could be generating 

uncharacterized alternative editing patterns.   
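
The general idea of grouping related sequences into clusters can be sketched as follows. This is not the ACORNS algorithm, whose details are not given in this chapter; it is a simple greedy grouping by pairwise similarity using the Python standard library, and the function names, the 0.8 threshold and the example sequences are all assumptions made for illustration. 

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity score between 0 and 1 for two sequences."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_clusters(sequences, threshold=0.8):
    """Assign each sequence to the first cluster whose founding member
    is at least `threshold` similar; otherwise start a new cluster."""
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if similarity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing cluster is similar enough.
            clusters.append([seq])
    return clusters


# Invented sequences, for illustration only.
reads = [
    "AUAUACAACGCAACC",
    "AUAUACAACGCAAUC",  # differs from the first read at one position
    "GGGCCCGGGCCCGGG",  # unrelated to the other two
]

for cluster in greedy_clusters(reads):
    print(cluster)
```

With these inputs the first two reads fall into one cluster and the third forms its own. A read that joins no cluster containing a functionally known gRNA would correspond to the unrelated, unidentified gRNAs described above. 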

However, there were still many gRNAs that were related to previously identified gRNAs.  

These gRNAs could be capable of disrupting the editing process or generating small alternative 

edits that are tolerated.  In order to investigate this, our second program, GUIDE, examined the 

gRNAs responsible for generating the editing pathways of RPS12, ND7 5’ and CR3, as well as 

their previously unidentified relatives.  Using the defined gRNA clusters, GUIDE screened all 

members editing these three genes against their targets to determine what proportion of each 

family was able to productively edit.  The initial analyses of these previously unidentified gRNAs 

revealed that nearly all were predicted to disrupt the editing system, but in our examination of 

the mRNA data, we found this not to be the case.   

 

By combining the analyses of the GUIDE program and the mRNA transcriptome data, we 

learned more about the robustness of the RNA editing system.  In examining the RPS12 gRNAs 

we found that single mismatches or gaps were highly tolerated in gRNA alignments, with only 

very small drops in editing efficiency.  In the examination of the ND7 5’ gRNAs, however, we 

identified the limit of this tolerance.  Significant drops in editing efficiency were observed when 

gRNAs either possessed multiple mismatches or gaps, or possessed mismatches that disrupted 

the anchor binding region.  Finally, in our analysis of the CR3 gRNAs, we observed many gRNA 

populations of high abundance that were apparently not used to edit.  We found that RNA 

editing preference does not correlate positively or negatively with gRNA abundance, and in 

most cases, the apparently unused gRNA populations had no issues in their mRNA alignments. 

We predict that these gRNAs may be in use elsewhere in the editing system, and to fully 

understand this system, we recommend that a full mRNA transcriptome be analyzed.   

Genetic Integrity 

The introduction and maintenance of kinetoplastid RNA editing 

As previously mentioned, the sheer size and complexity of the RNA editing system has 

left many in search of the answers to how it evolved and how it has been maintained.  Many 

hypotheses have been proposed, such as it being a relic of the old RNA world, being a product 

of constructive neutral evolution, or that the system co-evolved with G-quadruplex structures 

that served to protect the genetic information [69–74].  As for how it has been maintained, one 

prominent theory is that RNA editing is advantageous because it is a mechanism by which an 

organism can fragment and scatter essential genetic information throughout a genome [75,76]. 

Because kinetoplast DNA is less stable than chromosomal DNA, and minicircles are frequently 

 

lost due to asymmetric division, this hypothesis suggests that scattering essential guide RNA 

genes throughout the DNA network would prevent fast growing deletion mutants from 

outcompeting more metabolically versatile parasites during growth in the mammalian host 

[76,77].  

We propose that the RNA editing system and its inherent fragility operate as a system 

to weed out deleterious mutations by making them lethal, as a form of drift robustness.  Drift 

robustness is not adaptive, however, and prevents the population from generating beneficial 

mutations as well. This strategy leaves organisms no options for evolving.  We suggest that 

alternative editing pathways, such as those seen in CR3 and others previously observed, 

allow this evolution without compromising the rigid conservation of other genes such as the 

essential RPS12 [68].   

Supporting this is the fact that, of the kinetoplastids, Trypanosoma have some of the 

harshest life cycles in terms of maintaining genetic integrity, with the electron transport genes 

under very relaxed selection in the glucose rich bloodstream stage, and very strict selection 

imposed in the insect stage. In conjunction with this, we observe that Trypanosoma brucei, 

Trypanosoma cruzi, and Trypanosoma vivax all maintain more genes that are edited and more 

extensive editing than their other kinetoplastid counterparts with milder life cycles 

[4,16,19–21,52–56,58,59].  For example, Phytomonas serpens infects important crops and is transmitted 

by sap feeding bugs. These parasites have glucose readily available in both life cycle stages and 

are unique in that they lack a fully functional respiratory electron transport chain. This 

species is also missing two edited genes entirely; it pan-edits six genes and partially edits 

three genes, compared to the nine pan-edited and three partially edited genes in the 

 

Trypanosoma spp. [64,65]. For Leishmania spp., all life cycle stages possess an active Krebs cycle 

and ETC linked to the generation of ATP, but these cells are never restricted from access to 

glucose [61,62,139]. Leishmania do not pan-edit ATPase 6 or ND7, but only partially edit them 

[63].  These observations suggest that RNA editing provides a larger advantage to organisms 

with a more complex life cycle.   

Dual-coding and dual-function genes 

One oversight that has not been accounted for in the hypotheses that attempt to explain 

the advantage of kinetoplastid RNA editing is how genetic material that is not under selection is 

maintained.  There has been considerable debate on the necessity of Complex I subunits for 

either stage of the trypanosome life cycle. Studies using RNAi and knockout cell lines of nuclear-

encoded members of Complex I have shown that the complex is unnecessary for survival in 

either life cycle stage, and in this work we were unable to find complete gRNA coverage of the 

edited ND subunits in the bloodstream stage, despite the ND subunits generally being more 

fully edited or only fully edited in the bloodstream stage [5,26–29,111,112]. However, the 

nuclear encoded Complex I member genes are maintained [42], and the vast majority of the 

gRNAs required to edit mitochondrially encoded ND subunits were found in both life cycle 

stages.  This evidence together suggests that the Complex I proteins are vulnerable to genetic 

drift but have somehow been maintained.   

In this work, we propose that by overlapping Complex I genes not under strict selection 

with genes that are under selection, the accumulation of mutations can be prevented. Because 

these overlapped genes share most gRNAs, and alternative edits only occur in the terminal 

gRNAs, this strategy ensures that almost all of the genetic material is protected. The genes 

 

predicted to be dual-coding based on our mutational bias analysis include almost all of the 

edited ND transcripts, excluding only ND8.  In the examination of procyclic ND7 5’ and CR3 

(putative ND4L) we found that these two genes are alternatively edited to produce more than 

one protein product [120].  This evidence shows that not only are dual-coding genes being 

utilized in T. brucei, but also that RNA editing is facilitating the use of these dual-coding genes, 

providing a concrete advantage of utilizing this type of RNA editing.   

Based on limited sequence homology, we hypothesize that the trypanosome 

mitochondrial alternative reading frames (ARFs) encode small metabolite transporters that 

provide a distinct growth advantage to bloodstream form parasites. These proteins would 

function differently from all other edited proteins, which do not function as transporters.  

While it has been previously suggested that both alternative editing and dual-function proteins 

are important mechanisms for expanding the functional diversity of proteins found in 

trypanosomes, a duplication event could easily alleviate the evolutionary constraints imposed 

by dual-coding genes [67,97–99]. We maintain that in salivarian trypanosomes, these genes 

must provide the additional benefit of protecting genetic information in order to continue to be 

overlapped.  Protection of the mitochondrial genome during growth in the mammal would 

increase the capacity for successful transfer to an insect vector and maximize the parasite’s 

long-term survival and spread. 

Analyses of other trypanosomes do show that some of the ARFs have intriguing 

homology to the ARFs identified in T. brucei and T. vivax. However, most of the ARFs are 

punctuated with stop codons. It is possible that these genes have since lost their function and 

are no longer required to be dual-coding due to the reduced selective pressures endured during 

 

their life cycles, or it is possible that these stop codons are removed by alternative editing 

events.  

In addition to the dual-coding genes we have described, another dual-coding gene, 

COIII, is accessed through alternative editing, by connecting an unedited 5’ reading frame with 

an edited 3’ reading frame, and the alternative protein produced, AEP-1, has been shown to be 

essential [67,68].  In addition, there are known Krebs cycle proteins (α-ketoglutarate 

dehydrogenase E2 and α-ketoglutarate decarboxylase) that have two functions, rendering them 

protected while they are not under selection as well [97,98].  These data show that 

trypanosomes are utilizing dual-coding and dual-function genes to protect genetic integrity.   

Another set of dual-function genes appears in the gRNAs, as promiscuous gRNAs.  The 

editing pathways of CR3 are littered with gRNAs identified to edit other genes, such as ND8 and 

the 3’ editing domain of ND7, both of which were not predicted to be dual-coding, and are 

under less protection.  We believe that this is yet another mechanism of increasing the drift 

robustness of T. brucei through the use of alternative RNA editing.   

Developmental Regulation 

In the examination of the bloodstream gRNA transcriptome, we found the abundance of 

the initiating gRNAs in the procyclic and bloodstream gRNA transcriptomes is correlated with the 

developmental editing patterns of the genes they edit.  However, we cannot rule out the 

possibility that not all of the populations of initiating gRNAs were identified. We identified 

alternative initiating gRNAs for CR3, and without deep sequencing all pan-edited mRNAs, we 

cannot be certain that others do not exist.   

 

It has been previously reported that gRNA presence did not correlate with 

developmental RNA editing patterns in T. brucei [50,51]. Although this was reported based on the 

observation of a very limited number of gRNAs, we found that for the most part this held 

true: gRNA populations had similar relative abundances across both transcriptomes, 

with the exception of the initiating gRNAs.   

In our examination of the editing pathways of RPS12, ND7 5’, and CR3, we found many 

editing patterns that were only observed or were more prominent in only one cell line.  This 

was most prevalent in CR3, where the editing pathway of the TREU cells is almost completely 

different from that of the EATRO cells.  Curiously, the gRNAs required for both pathways are 

present in the transcriptomes of both cell lines in relatively equal abundances.  Additionally, 

another exclusive editing pathway was discovered when the EATRO cells were moved into 

glucose depleted medium.  This pathway produced a transcript that would make a unique 

protein, totally different from all other CR3 protein products.  These gRNAs appear functional in 

both cell lines but are perhaps being used to edit a different gene when they are not observed 

to be editing CR3.  Indeed, many of the EATRO specific CR3 gRNAs are promiscuous gRNAs 

known to edit other genes.   

To better understand the complexities of gRNA selection, we examined the editing 

branch points of the RPS12, ND7 5’ and CR3 pathways. Examination of editing branch points 

revealed that the abundance of the gRNAs involved did not correlate with the observed editing 

preference.  This work has raised many questions about how gRNAs are selected and editing 

pattern preferences are exerted, and more study is needed to understand this system.   

 

 

Protein Diversity 

In the examination of the RPS12 mRNAs, we found one major alternative editing event. 

This event was observed in the EATRO 164 cell line only, and causes a frame shift that extends 

the reading frame at the 3’ end by nine amino acids.  This same alternative edit was previously 

described in the 29-13 strain as well, which shows that this edit is not an isolated occurrence in 

the EATRO 164 strain [148].  Frameshifting gRNAs were also identified to edit ATPase 6 in both 

bloodstream and procyclic EATRO 164 cells. The predicted frameshifts also occur close to the 3’ 

end of the transcript and alter the C terminus of the protein.  As the frameshifts occur 

downstream of the highly conserved amino acid region involved in proton translocation, it may 

be that this is also tolerated [31]. 

Edits of the ND7 5’ mRNAs generate transcripts that can be translated in two reading 

frames in TREU 667 and EATRO 164 cells.  Interestingly, this gene was also sequenced in 29-13 

cells [148].  That study indicated that a large proportion of the fully edited ND7 5’ transcripts 

had a single nucleotide difference in the 5’ UTR.  As the upstream start codon that allows the 

ARF to be translated is in what is known as the 5’ UTR, this difference could be the same 

alternative edit that generates the ARF transcripts.  This suggests that this alternative editing 

may be widespread.   

As with ND7 5’, we also detected evidence of dual-coding in CR3.  The alternative edits in 

both cell lines produce multiple variations of the CR3 proteins.  While the editing efficiencies of 

CR3 are very low, possibly preventing the generation of these CR3 proteins, we propose that 

this mechanism of branched editing pathways is a safe way for the trypanosomes to introduce 

variation into the rigid system and continue to evolve.   

 

In addition to the high level of alternative editing observed in CR3, in our re-examination 

of the procyclic EATRO 164 and TREU 667 gRNA transcriptomes, we identified many gRNAs in 

the existing transcriptomes that still have no known function.  These gRNAs may be generating 

alternative editing events, thus further increasing the protein diversity of T. brucei.   

Editing Efficiency 

The editing efficiencies of all three genes we examined were surprisingly low.  The 

efficiencies of TREU cell line mRNAs were all less than ten percent, while those of the EATRO 164 cells 

grown in SDM79 were all less than one percent.  With CR3 and RPS12, more than 80% of 

mRNAs were completely unedited, while those numbers were much lower in ND7 5’.  There are 

many factors we observed that had the potential to affect the overall editing efficiency. 

Mutations and non-canonical base pairs 

Because gRNAs utilize both canonical (Watson-Crick) as well as G:U base-pairing to 

direct the change in sequence, most transition mutations in the gRNA would not lead to 

changes in the mRNA sequence and would not be selected against [33]. In our observations of 

the gRNA transcriptome, we found a very strong bias against A to G transitions in the anchor 

regions of the gRNAs, suggesting that G:U base-pairing is not well tolerated in this region. 

However, we also found many populations of gRNAs where non-canonical base pairs appeared 

to be tolerated, even in essential genes, ATPase 6 and RPS12 [17,100,107,110]. In the 

bloodstream database, a gRNA population with a C:U base pair must be tolerated to be able to 

complete the editing pathway of ATPase 6, and in RPS12, we observed a gRNA population that 

requires toleration of a gap in the alignment to be capable of fully editing.   
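
The base-pairing categories discussed above can be made concrete with a short Python sketch that annotates an aligned mRNA/gRNA pair using the symbols of Appendix B: | for Watson-Crick pairs, : for G:U pairs and # for mismatches. The function names and the sequences are invented for illustration, and both strings are assumed to be already aligned, gap-free and written in the same left-to-right orientation. 

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_symbol(mrna_base, grna_base):
    """Return '|' for Watson-Crick, ':' for G:U, '#' for any other pair."""
    pair = (mrna_base.upper(), grna_base.upper())
    if pair in WATSON_CRICK:
        return "|"
    if pair in WOBBLE:
        return ":"
    return "#"


def alignment_string(mrna, grna):
    """Annotate two equal-length, already aligned sequences base by base."""
    if len(mrna) != len(grna):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(pair_symbol(m, g) for m, g in zip(mrna, grna))


# Invented sequences, for illustration only.
mrna = "UUGUAUUU"
grna = "AACAUAAA"             # every position is a Watson-Crick pair
grna_transition = "AACAUGAA"  # A to G transition at one position
grna_mismatch = "AACAUCAA"    # A to C transversion at the same position

print(alignment_string(mrna, grna))             # ||||||||
print(alignment_string(mrna, grna_transition))  # |||||:||
print(alignment_string(mrna, grna_mismatch))    # |||||#||
```

The A to G transition converts an A:U pair into a G:U pair, so the gRNA still specifies the same mRNA sequence, whereas the transversion produces a mismatch that must be tolerated for editing to proceed. 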

 

In addition to this, RPS12 and ND7 5’ possess two highly cytosine rich regions that 

typically have poor gRNA coverage.  These regions both encode conserved amino acids vital to 

the functions of the proteins.  When we deep sequenced these regions of the mRNAs, we had 

hoped to find alternative sequences that allowed us to identify a more abundant population to 

carry out these edits, but we only found the previously described editing patterns.  Using these 

sequences and performing a search of the gRNA databases at a low stringency yielded 

populations with imperfect anchors, requiring noncanonical base pairs to be tolerated.  In 

RPS12, this did not appear to affect editing efficiency, but in ND7 5’, the effects were severe.  

However, the gRNA population identified to edit this region of ND7 5’ had multiple mismatches 

and gaps in both cell lines, whereas the RPS12 populations did not.   

These examples are far from isolated incidents.  Many other populations were identified 

where noncanonical base pairs were required to generate fully edited mRNA sequences.  In 

most cases, these alternative base pairs do not seem to affect the editing efficiency.  The most 

prominent mismatch was the C:A base pair, but almost every other mismatch was observed.  

Even gaps in the alignment of the gRNA to the mRNA seem to be tolerated.  Of note, we did 

observe that most mismatches were flanked by nearby G:C base pairs that may have assisted in 

stabilizing the alignments.  The use of alternative base pairs has been previously shown to be 

tolerated to different extents, with the use of C:A, C:C and C:U base pairs not completely 

disrupting editing [151].  This evidence suggests that although the RNA editing system will tolerate 

some noncanonical base pairs, there is a limit to what will be tolerated.   

The use of non-canonical base pairs in RNA editing supports the model of editing known 

as the “dynamic interaction” model [162].  In this model, editing of an mRNA in a gRNA duplex 

 

does not proceed in a site by site fashion, but instead, the editosome targets regions of low 

stability first, and continues to edit and re-edit the mRNA until a thermodynamically stable 

gRNA/mRNA duplex is achieved.  In this model, non-canonical base pairs may be tolerated if 

they do not significantly disrupt the stability of the duplex.   

Overwriting 

In our exploration of partially edited mRNAs, we found that RNA editing is not always 

strictly sequential.  While overall, editing proceeds from 3’ to 5’ across the editing 

domain, we have found evidence that shows that gRNAs can overwrite editing that has been 

previously generated.  Another study that examined the use of alternative RPS12 gRNAs 

previously identified found that the three gRNAs in question were being utilized, and that 

despite not generating the conventional editing patterns, a very small number of transcripts 

returned to the canonical editing pattern after the alternatives [148].  This last observation may 

be another instance of gRNA overwriting.  This is another source of noise, lowering overall 

editing efficiency.   

Future Work 

In this work, we proposed the hypothesis that T. brucei utilizes dual-coding genes to 

protect vulnerable genetic material from genetic drift and accesses the overlapped reading 

frames through alternative RNA editing.  In order to test this hypothesis, we should confirm the 

presence of the alternative protein products in vivo and knock down these products to 

determine if they provide a benefit to the cells.  Knocking down edited transcripts has proved 

difficult, but not impossible.  One study was able to genetically engineer an artificial site-

specific RNA endonuclease to target the ATPase 6 edited mRNA [110].  This engineering 

 

requires a specific eight-nucleotide target sequence.  Using this technique would therefore 

require finding a target sequence specific to the alternatively edited mRNAs only, 

leaving intact the mRNAs expressing the canonical ETC proteins, which could easily prove difficult 

as we have found that these sequences can be quite similar.   

Another approach to tackle this problem would be to sequence the dual-coding 

transcripts from cells grown under a variety of energetic conditions.  Varying the conditions 

and identifying those in which the alternative proteins were most prominently expressed 

could help in elucidating the functions of these proteins.  Once a baseline for each set of 

conditions was determined, the artificial site-specific RNA endonucleases could be engineered 

to target the unedited transcripts for each dual-coding gene and observe the effects on cell 

growth in the varied conditions.  This could give insight into the importance of the two different 

overlapped proteins at different points in the trypanosome life cycle.   

Another direction to pursue is understanding the mechanisms of gRNA selection.  We 

observed many promiscuous gRNAs, and it is possible that even more promiscuity exists in the 

system.  To better understand this, it would be necessary to deep sequence all of the edited 

mRNAs and have paired gRNA transcriptome data.  Ideally, this would be done in multiple cell 

lines, and under various energetic conditions, as we saw significant variation in our two cell 

lines and media conditions.  This would allow us to see the full picture of alternative mRNA 

editing and gRNA usage.  Then we could begin to determine how much gRNA promiscuity plays 

a role in gRNA selection, or if other factors are in play.   

 

 

 

Conclusion 

This work found that alternative editing is pervasive in Trypanosoma brucei, and 

brought to light the use of overlapped reading frames, providing another strong reason for the 

utility and maintenance of RNA editing.  This work also showed that the RNA editing system is 

surprisingly robust and tolerant of mutational noise.  

 

 

 

APPENDICES

 

APPENDIX A. Quantification of the number of identified bloodstream and procyclic gRNA transcripts that cover a respective 

nucleotide in the fully edited mRNA. 

Bloodstream gRNAs are shown in dark gray and procyclic gRNAs are shown in light gray. Nucleotides and deletion sites were 

both numbered as edited positions in the mRNA transcripts starting from the 5’ end (+1 =0). Boxes indicate the positions of 
identified populations of gRNAs (coverage ranges shown in parentheses). Boxes with dark gray or light gray diagonal stripes indicate 
populations identified only in the bloodstream or procyclic transcriptomes respectively. A. ATPase subunit 6; B. Cytochrome oxidase 
III; C. C-rich region 3; D. C-rich region 4; E. NADH dehydrogenase subunit 3; F. NADH dehydrogenase subunit 7; G. NADH 
dehydrogenase subunit 8; H. NADH dehydrogenase subunit 9; I. Ribosomal Protein S12. All individual data points were designated 
with solid circles. Close overlapping of individual data points generates the observed solid lines.  
 
[Figure panels: A. ATPase subunit 6; B. Cytochrome oxidase III; C. C-rich region 3; D. C-rich region 4; E. NADH dehydrogenase subunit 3; F. NADH dehydrogenase subunit 7; G. NADH dehydrogenase subunit 8; H. NADH dehydrogenase subunit 9; I. Ribosomal Protein S12] 

APPENDIX B. Alignment of the mitochondrial fully edited mRNAs and the most abundant gRNAs required for full coverage identified in the bloodstream (blue) and procyclic (gray) life cycle stages.

Conservative mutations between gRNAs are shown in green and mutations that disrupt alignment are shown in red. Lowercase u’s indicate uridylates added by editing; asterisks indicate encoded uridylates deleted during editing. Nucleotides and deletion sites in the fully edited mRNA were numbered starting from the 5’ end (+1 = 0). Watson-Crick (|) and G:U (:) base pairs are indicated. Mismatches are indicated by the number sign (#). A) ATPase 6; B) Cytochrome Oxidase III; C) C-Rich Region 3; D) C-Rich Region 4; E) Cytochrome b; F) Maxicircle Unidentified Reading Frame II (Murf II); G) NADH Dehydrogenase Subunit 3; H) NADH Dehydrogenase Subunit 7; I) NADH Dehydrogenase Subunit 8; J) NADH Dehydrogenase Subunit 9; K) Ribosomal Protein S12.
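
The pairing notation defined in the caption can be reproduced with a short sketch. It assumes the mRNA is written 5’ to 3’ and the gRNA beneath it 3’ to 5’ with T in place of U, as in the alignments that follow, and that deleted uridylates (*) sit opposite gap characters (-) in the gRNA. The function name and the handling of gaps as blank columns are assumptions made for illustration; this is not the alignment software used to generate the appendix.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_line(mrna, grna_3to5):
    """Return the symbol line drawn between an mRNA and an aligned gRNA.

    '|' marks a Watson-Crick pair, ':' a G:U pair, '#' a mismatch, and a
    space marks a column holding a deletion site (*) or a gap (-).
    Both strings must already be aligned column by column.
    """
    if len(mrna) != len(grna_3to5):
        raise ValueError("aligned sequences must have the same length")

    symbols = []
    for m, g in zip(mrna, grna_3to5):
        if m == "*" or g == "-":
            # Deleted uridylate or alignment gap: no base pair is drawn.
            symbols.append(" ")
            continue
        # Ignore case (lowercase u = inserted uridylate) and treat T as U.
        pair = (m.upper().replace("T", "U"), g.upper().replace("T", "U"))
        if pair in WATSON_CRICK:
            symbols.append("|")
        elif pair in WOBBLE:
            symbols.append(":")
        else:
            symbols.append("#")
    return "".join(symbols)


# First 19 columns of the C-rich region 3 alignment at positions 100-118.
print(pairing_line("AuuuGuuuGuG***A**UU", "TAAACAAACAC---T--AA"))
```

For the columns used in the example, the returned string matches the symbol line printed in panel C for the gRNA covering positions 78-118.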
 
A) ATPase subunit 6 
0         10        20        30        40        50        60        70        80        90         
AAAAAUAAGUAUUUUGAUAUUAUUAAAGUAAAuAuGuuuuuAuuuuuuuuuuGuGAuuuAUUUUGGuuGCGuuuGuuAuuAuGuAuGuAuuAuuGuGuAu 
                               |||||||::||:|:|||:::||::|||||||:|:|||||||||| 
                           11TTTTATACGGAAGTGAAAGGGAAGTACTAAATGAGACCAACGCAACATATA 5’ pA6(29-72) 
                           11TTTTGTATAGAAATGAAGAGAAGGTACTAGGTAAAACCAATGCAAATATA 5’ bsA6(29-75) 
                                                               ||::|::|||:|:|:|:|||:||||||:||:||||||| 
                                                pA6(62-102) 11TAATTAGTGCAGATAGTGATATATACATGATGACACATA 
                                               bsA6(62-100) 22TAATTAGTGTAGATAATGATACATATATAGTAACACATA 
                                                                                       :|||:||::|::|| 
                                                                        pA6(86-127) 10TTATAGTAGTATGTA 
                                                                           bsA6(90-129) 10TATAGTATATG  
 
100       110       120       130       140       150       160       170       180       190        
GAuCuAGGuuAuGuuuuAuuGuGuAuuuuAAuUGuuuAAuGuuGAuuuuuGAuuuuuuAuuAuuuuGuuuG*UUUGAuuuGuAuuuGuuuGuuGGuuuGu 
|||                                                              :::|:|: |:|||:||:|||:|:||:||||:||||| 
CTAACATA 5’  pA6(62-102)                  pA6(164-208) 12TATTATGTGGTAGAT-AGACTGAATATAGATAAGCAACTAAACA 
C-AAAATA 5’ bsA6(62-100)                     bsA6(164-208) 14TATTAGTAGAT-AGACTGAATATAGACAGATAACCAAACA 
:|:||||:|:||:||:||:|||||||||                                                                 |::||::: 
TTGGATCTAGTATAAGATGACACATAAATATA 5’  pA6(86-127)                             pA6(192-243) 04TATTAAGTG 
TTAGATTCAATACGAGATAGCACATAAAATATATA 5’ bsA6(90-129)                       bsA6(190-243) 19TTAATTAAGTG 
              ||:|||::::|||:||||||:|:|||:||||||||||||| 
      12TAATAGAAGATAGTGTATAGAATTGACAGATTGCAACTAAAAACTACATA 5’ pA6(113-152) 
   12TTTTAATATAGAATGGTGCATGAAATTGACGAGTTACAACTAAAA-CTATA 5’ bsA6(105-148) 
                                       ||:|::||:||::|::||:|||||:|:|||:|| |||||||||||| 
                                  11TGATATAGTTAGAAGTTGGAAGATAATGAGACAGAC-AAACTAAACATATA 5’(138-183) 
                             10TAATAAGTGGTAGTTAGAGACTGGAAAATAGTAAAACAAAC-AAAT-AAATA 5’ bsA6(139-175) 
 
 

 

 


200       210       220       230       240       250       260       270       280       290        
G***UUUGuuuuuAuuGuuGuGGuuuAuGuuGuuuAAuuuAuAuAGuuuAAUUUUGuAuuA*UUGuAuuACuUAUUUG***AAuuuG*UAuuUGuuGuuu 
|   |||:|                                                          ||||||:||:::   ||:||: ||::|||::|:| 
C---AAATATATA   pA6(164-208)                       pA6(266-313) 12TTAATGAGTAGGT---TTGAAT-ATGGACAGTAGA 
C---GAATATA 5’ bsA6(164-208)                       bsA6(281-313) 09TAGTGTATAGA----TTAGAT-ATGGATGATAAG 
:   |||::||:|||:::|||::|:|||||||||||||||||||                               bsA6(291-329) 16TAATAGTAGA 
T---AAATGAAGATAGTGACATTAGATACAACAAATTAAATATA 5’ pA6(192-243)                      pA6(301-345) 11TAAA 
T---AAATGGAAATAGTAGTATCAAGTACAACAAATTAAATATA 5’ bsA6(190-243)                    bsA6(301-345) 15TAAA 
                         |:||:|::|||||||:|:|||||:||||:|:|||||| |||||||| 
                      11TAGTATAGTAAATTAAGTGTATCAGATTAGAGCATAAT-AACATAATAATACA 5’  pA6(224-269) 
                   02TTTAAGTGTAACAGATTGAATATGTCAAGTTAAAACATAAT-A-TATATA 5’ bsA6(221-262) 
                                                 ||||:|::|||:| |||||:|||:||:|:|   |||||| |||||  
                                 pA6(248-292) 13TATTAGAGTATAGT-AACATGATGGATGAGC---TTAAAC-ATAAAATATA5’  
                                      bsA6(254-298) 17TATATAGT-AGTATGATGGATAGAC---TTAAGC-ATAAACAACAATATATA 
 
300       310       320       330       340       350       360       370       380       390        
uGuAuuGuuuuuuuAuuGuAuAuuGCAuuuuuAuuuuuGuuuuGuuuuuuAuGuGAuuuuuuuuuGuuuAAuAAuuuGuUAGuuGGuGAuA****Guuuu 
||||||||||||||                                               ||||::|||||:|||:|:|||::||:|||||    :|||| 
ACATAACAAAAAAAAAAAAA 5’ pA6(266-313)         pA6(360-407) 13TAAAAGTAAATTGTTAGATAATTGACTACTAT----TAAAA 
ACATAACAAAAAAAAAA 5’ bsA6(281-313)   bsA6(352-401) 12TACTAAGAGAAGATGAATTGTTAGGTAATCAATCACTAT----CAAAA 
ATATAGTAAAGAGATAGCATATAACGTAAATATATAAA 5’ bsA6(291-329)                                 :|||    :|:|: 
 ::||::|||||:|||::||||:||||:||:||:||||||||||||                          pA6(387-435) 10TTTAT----TAGAG 
CTGTAGTAAAAAGATAGTATATGACGTGAAGATGAAAACAAAACAATATA 5’ pA6(301-345)     bsA6(387-435) 11TTTAT----TAAAG 
CTGTAGTAAAAAGATAGTATGTGACGTGAAAATGAAAACAAAACAATATA 5’ bsA6(301-345) 
                                 |||:||::||:|:::|||||:||||:|||:|:||||||||||||| 
                 pA6(331-375) 12TATAGAAGTAAGATGGAAAATGCACTGAAAGAGAACAAATTATTAATATA 5’ 
                bsA6(331-371) 15TATAGAGATAAGACAGAAGATGCACTAAAGAAAAACAAATTAAATATA 5’ 
 
400       410       420       430       440       450       460       470       480       490        
AuGGAuGuuuuuuuuAUUC**GuuuuuuGuuGuGuuuuuuAGAGuGuuuuuCuuuGuuGuGuCGuuGuuuGuCGACGuuuuuGCGuuuGUUUUGuAAuuu 
||||||||                                                :||:|:||::::||::|||||:|:|||:|||:||||||||||| 
TACCTACATATATA 5’  pA6(360-407)         pA6(455-497) 14TTAATATGGTGGTAAGTAGCTGTAGAAATGCAGACAAAACATTATATA 5’ 
TAAT-ATA 5’ bsA6(352-401)                    04TAATTAGAGTAACATAGCAATAGATAGCTGCATTAA 5’ bsA6(452-477) 
||||||::||:|:|:||||  ||||:||||||||||                                                    |:|:|:::|||:| 
TATCTATGAAGAGAGTAAG--CAAAGAACAACACAATATATA 5’  pA6(387-435)             pA6(487-526) 16TATAGAGTGTTAGA 
TGTCTATAGAAAAGATAAG--CAGAGAACAACACAATATATA 5’ bsA6(387-435)            bsA6(487-526) 16TATAGAGTGTTAGA 
                        |||:::|||||::||||||||:|:||:|:|:||||||||:| 
                      12TAAAGTGACACAGGAAATCTCATAGAAGGGAGCAACACAGTATATA 5’  pA6(424-464) 
                      13TAAAGCGACATGGAAAATCTCATAGAAGGGAGCAACACAGTATATA 5’ bsA6(424-464) 
                                           bsA6(458-500) 15TATAGTAATAGATAGCTGCGAAGATGCAAACAGAACATTAAA 
 

 

 


500       510       520       530       540       550       560       570       580       590        
AuuAuCAuCCCAuUUUUUAuuGuuGAuGuuuuuuGAuuuuuuuUAuuuuAuuuuuGuuuuuuuuuuuuAuGGuGuuuuuuGuuAuuGAuuuAuuuuAuuu 
|||||||||||||:||:||||||||||                                          ||::|:|:||:|:|:|:|||:||||||||||: 
TAATAGTAGGGTAGAAGATAACAACTAAACATA 5’(487-526)        pA6(568-611) 12TTATTATAGAAGATAGTGACTGAATAAAATAAG 
TAATAGTAGGGTAGAAGATAACAACTAAACATA 5’ bsA6(487-526)    bsA6(576-616) 07TACATATAGAATAGTGACTGGATGAAATGAA 
                      :||||::||:|:|||:|||:||||||:||||||||||||||||||||                      |:|:||||:|| 
      pA6(521-567) 05TTAATTGTAAGAGACTGAAAGAAATAAGATAAAAACAAAAAAAAAAAAAA 5’ (589-629)13AATTTAGTGAAATGAA 
 bsA6(520-553)  04TATATAACTGTGAGAGACTAAGAAGAATGAAATAAAA-CAAAAAAAAAAA 5’    (589-629)10AATTTAGTGAAATGAA 
                                               ||||:|:|::#||:|:|:|||||::|||||:||||||||:|||||||| 
TTTAAA 5’ bsA6(458-500)      pA6(557-593) 09TATAAATGAGAGT-AAGAGAGAAAATGTCACAAGAAACAATAGCTAAATAA 5’ 
                              bsA6(546-592) 11TAAATGAGAGTGAAAGAGAGAGATACCGTAGAAGACAATAACTAAATATA 5’ 
 
600       610       620       630       640       650       660       670       680       690 
AuuuuuGuGuuuuGuuuuuGuuuAuuAuuuuAUGuGuuuuuAuAuUUGuuGGAuuuAUUuGCC***GCCAuAuuAC****AGuuAuuuAuuuuuuGuAAu 
||||||||||||                                                                     |:|:|:||||::|:|:|||| 
TAAAAACACAAATCATA 5’ pA6(568-611)                        pA6(680-714) 13TAAT-----TTAGTGAATAGGAGATATTG 
TAAAAACACAAAACAAATA 5’(576-616)                        bsA6(671-714) 14TTAATG----TTAGTAAGTGGAGAATATTA 
|||:|:||:||::||||:||||||||||||                                                                     |: 
TAAGAGCATAAGGCAAAGACAAATAATAAATA 5’  pA6(589-629)                      pA6(698-728) 11TAATAAGAGATAGTG 
TAAGAGCATAAGGCAAAGACAAATAATAAATA 5’ bsA6(589-629)                     bsA6(699-727) 11TAATAAGAAATATGA 
              :||||:::|||:|||||||::||:|:||||||:|||||||||                
           12TTAAAAGTGAATGATAAAATGTACGAGAATATAGACAACCTAATATATA5’ pA6(613-654) 
          11TTTAAAAGTGAATGATAAAATGTACGAGAATATAGACAACCTAATATA 5’ bsA6(613-654) 
                                           |||:|:|:::|:|:||:|:||   :|||||:|||    |||||||||| 
                        pA6(640-689) 12TTATATAGATAGTTTGAGTAGATGG---TGGTATGATG----TCAATAAATATATA 5’  
                             bsA6(643-667)15TAAGTAGTCTAGGTAGATGG---CGTTATAGTG----TCAATAAATATATACA 5’ 
 
700       710       720       730       740       750       760       770       780       790        
AuGAuuuuGCAGuuGAuAAuGG**AuuuuuuGuuGuuuuuGuuGuuuGuuuAGuuuuGuAuuuGAuuuuuGAuAGuuAuuAuAuuGuuGuuGAAAuuuG* 
|:|||:|||||||||                                                       :|||:||||:||||:::|||::||||:|| 
TGCTAGAACGTCAACATAAAA 5’ pA6(680-714)       pA6(770-822)TCTCTTCTTTCCCTTTATTAATAGTATAGTGACAGTTTTAGAC- 
TACTAGAACGTCAACATAGA 5’ bsA6(671-714)                    bsA6(773-822) 09TTTAATAGTATAGTGACAGTTTTAGAC- 
||:|:|:|:||:|:||||||||  |||||                                            
TATTGAGATGTTAGCTATTACC--TAAAATTA pA6(698-728) 
TGTTAGAATGTCAATTATTACC--TAAATATATA 5’ bsA6(699-727) 
                     ::  ||||:|:::|:|:|||:|:||:|:||:||||:|||||||||||| 
                  11TTT--TAAAGAGTGATAGAAATAGCAGATAAGTCAAGACATAAACTAAATA 5’ pA6(720-767) 
               14TTTATT--TAGAGAGTAGCAAAGACAGTAAGTAGATCAAAACATAAAT-ATATA 5’ bsA6(717-763) 
                                                :||||:|:|::||||::|::||:||:||||||||||||||||| 
                                pA6(747-789) 11TTAAATTAGAGTATAAGTTGGAAGCTGTCAATAATATAACAACATAAAA 5’ 
                               bsA6(747-789) 11TTAAATTAGAGTGTAAGTTGGAGACTATCGATAATATAACAACATATATA 5’ 

 


 
800       810       820       830       840       
*GuuUGuuA**UUGGAGUUAUAGAAUAAGAUCAAAUAAGUUAAUAAUA_ 
 :||:||||  |:|||||||||| 
-TAAGCAAT--AGCCTCAATATCAGG 5’ 
-TAAGCAAT--AGCCTCAATATCATATA 5’  
 
 
 
Alternate initiating gRNA (procyclic transcriptome only) 
 
750       760       770       780       790       800        810       820       830       
uAGuuuuGuAuuuGAuuuuuGAuAGuuAuuAuAuuGuuG*uGAAA*uuG**GuuuUGuuA**UUGGAGUUAUAGAAUAAGAUCAAAU 
                        :||||:|||::::|| |:||| |||  :|||:||:|  ||||||||||||    
         pA6(774-822) *14TAATAGTATGGTGAC-ATTTT-GAC--TAAAGCAGT--AACCTCAATATCATA 5’ 
 
 

 

 


B)  Cytochrome Oxidase III  
0         10        20        30        40        50        60        70        80        90         
GGUUAUUGAGGAUUGUUUAAAAUUGAAUAAuuAuuAuuuuuuuAuGuuuuuGuuuC*****GuuGuAuAuuuGuuGGuGuuA****GuGGuGuuuuuGuu 
                                    |||:||||||||:|:|:|:||     |||||||||| 
              pCO3(35-70)11TATG-TAGTTAAGAAAATGCAGAGATAGAG-----CAACATATAATTAATA 5’ 
          bsCO3(36-70)12TATATATGGTAGAAAAGAGATACAAGAATAGAG-----CAACATATAATATATA 5’ 
                                                       ||     :|::|||||::||:::|:|:|    |||:|||||||||| 
                                   pCO3(54-101) 11TAAATAG-----TAGTATATAGGCAGTTATAGT----CACTACAAAAACAA 
                                     bsCO3(51-99) 09TAAAG-----TAGTATATAAACAGTTACGAT----CATCACAAAAACAA 
                                                                                  |    :||||:|:||:::| 
                                                                  pCO3(81-116) 10TT----TACTATAGAAGTGA 
                                                                        bsCO3(88-115) 07TCTATAGAAGTAA 
100       110       120       130       140       150       160       170       180       190   
uuuuuAuCuuuACCuGCCAuuGuuAuuGuGuAuuGGuuAuuuuGuuuGuuG****GGAuuuAuuuGuuuAuuGUUUG****GuAGuuuuuuAuuuGuuGA 
||                                        |::|:|:||:    |:||:||:|:||:|||||||||    |||:| 
AATATA 5’           pCO3(141-185) 12TAATTTAGTAGATAAT----CTTAGATGAGCAGATAACAAAC----CATTATATA 5’  
TATA 5’            bsCO3(141-185) 14TAATTTAGTAGATAAT----CTTAGATGAGCAGATAACAAAC----CATTATATA 5’ 
||:|:|:||||||||||#|||||||||:|:|||                               |::::|||::|:|:    |:||||:|:|||||||||| 
AAGAGTGGAAATGGACGATAACAATAATATATA 5’(81-116) (163-203) 10TAGATATAGTGGATAGTAGAT----CGTCAAGAGATAAACAACT 
AGAGATAGAGATGGATAGGTAGCAATAA 5’(88-115)            bsCO3(168-195) 12TATAGTGAAT----TATCAGAGAATAGACTACT 
      09TAGATGGATGGTGACAATAACATATATA 5’bsCO3(108-132) 
                  ||:|:|:|:|:|||||:|||||:|:|||||||||    |:                                       :||:| 
(117-156) 13TATATAGTGATAGTGATACATAGCCAATGAGACAAACAAC----CTATATATA 5’             pCO3(195-247) 09TAATT 
  (117-155)16TGATAGTAGTAGTGACACGTGATCAATAAGACAAACAAC----C-ATATACA 5’            bsCO3(195-244) 13TAATT 
                                                                
200       210       220       230       240       250       260       270       280       290        
uuGuG****GuuuuAuuuuuuuuuuuGuuGGuuuuuGuAuuuGuuuGuuGuuGuuAuuGuuAGAuuuGuuuuGuGAuuuuuuACGuGGuuuAuuuGAuuu 
||:|                                                      |:|||:||:|:|:|:||:|:|||:|||:|:||||||||||||| 
AATAA----AA 5’(163-203)                  pCO3(258-299) 10TATAATTTAGATAGAGCATTGAAAGATGTATCAAATAAACTAAA 
AACAC----CAAAATATATA 5’(168-195)         bsCO3(259-300) 13TTAATTTAGATAAGATATTGAGAAGTGCACTAGATAAACTAAA 
|::::    :|:|||:|:|:|||:|||:||:|||||||||||:|||:|                                              
AGTGT----TAGAATGAGAGAAAGAACGACTAAAAACATAAATAAATA 5’(195-247)              
AGTGT----CAAGATGGAAAAAGAAGTAGCCGAGAACATAAACAATATA 5’(195-244) 
                              ::|:|:|:|||:|:|:||||::|:|:|:|||||||||||||||||:                   ||:||:| 
             pCO3(229-274) 10TTTAGAGATATAGATAGACAATGATAGTGACAATCTAAACAAAACATATA 5’(293-320) 10TAATTAGA 
                   bsCO3(236-279) 11TTATAAATGAATAATGACAATGATAGTCTAGACAAGACACTAAAATATA 5    11TAAATTGAA 
 
 

 

 


300       310       320       330       340       350       360       370       380       390        
uuGuGuuuuAuuACGuuGuAuCCAGuAuuGuuuuuuAuGGuuuuuAuGuAG*UGAGuuuGuuuuAuuuAuGGCGuuuuuuG**UUGuAuuAuuuGGuuuA 
 |:|:|                                        ||:||: :||:||:::||||:|||||:||:|::|||  ||||||||| 
TATATA 5’(258-299)            pCO3(345-391) 12TATATT-GCTTAAGTGAAATGAATACTGCGAGGAAC--AACATAATATATATA 5’  
ATATATA 5’(259-300)          bsCO3(345-389) 03TATATT-GCTTAAATGAGATAAATGCTGCAGAGAAC--AACATAAAATATATA 5’ 
:|:|:|:|||:|||:|::|||#|||||||                                  ||||:|||::|::|:||::  |||||:|||:||:|||| 
GATATAGAATGATGTAGTATATGTCATAA 5’(293-320)     pCO3(362-406) 09TAATAGATATTGTGAGAAGT--AACATGATAGACTAAAT 
AGTGTAAGATAATGTAGCATGGGTCATAACAAATATATA 5’bsCO3(291-332)    06TAATAGATACTGTGAGAAGT--GACATAATAGACTAAAT 
                                                                             |||::  :||:||:|||::||:|| 
                                                          pCO3(376-418) 10TTTAAAGT--GACGTAGTAAGTCAGAT 
                                                         bsCO3(378-422) 17TAATTAAT--AGTGTGATAGACTAGAT 
                        |:|||:::||||||::|||:|||::||| :|||||||||||||                                :|| 
      pCO3(323-365) 11TATTATAGTGAAAAATGTCAAGAATGTATC-GCTCAAACAAAATATATA 5’        pCO3(397-436) 10TGAT 
     bsCO3(323-365) 09TATTATAGTGAAAGATGTTAGAAATGTGTC-ACTCAAACAAAATATATATA 5’ 
 
400       410       420       430       440       450       460       470       480       490        
uGuuuAuuuuuGuGuuGuGAGuuuGCUUUCGuuuuuuGuuuACCuuAuAuGuuuuGuuGuuuAuuAuGuGAuuAuGGuuuuGuuuuuuAuuGG*UAuuuu 
|||||||                                                       |||:|::|||:||:::||:|||:||:|||||| |||| 
ACAAATATATATA 5’(362-406)                pCO3(461-497) 14TACTTATAGTGTACTGATGTTAAGACAGAAGATAACC-ATAATACATA 
ACAAATATATA bsCO3(362-406)              bsCO3(457-499) 14TATAAGTAGTATATTAGTGTCAGAGTAAAAGATAACC-ATAAAATTATA 5’ 
|||:||||||||||||:|:                                               bsCO3(483-522) 14TAAAAGTGACT-ATGGAA 
ACAGATAAAAACACAATATACA 5’(376-418)                                                          :: ||||:| 
ACGAATAAAGACACAACATTCAATATA 5’(378-422)                                    pCO3(491-539) 10TTT-ATAAGA 
|:|:|||:|:|||:::||||:|:|||||||||||||                                           bsCO3(504-535)15TTAGAA 
ATAGATAGAGACATGGCACTTAGACGAAAGCAAAAAA AAAA 5’ (397-436)                     
            ||:|::::|||:|:|::||||:|:|::||||||||||||                
        12ATCATAGTGTTCAGATGGGAGCAGAGAGTAAATGGAATATAATATATA 5’pCO3(411-449)        
          16TTCAGTGTTTAAGCGAAGGTAGAAGATAAATGGAATATACAATATATA bsCO3(413-452) 
                    |:|||||:|||::||:||:|:|||||||:|::||||||||||||||||: 
              04TATATTAAATGGAAGTGAAGAATAGATGGAATGTGTAAAACAACAAATAATATCATA 5’pCO3(418-467) 
                 10TTTAAATGTAGGTAAAAAGTGAATGGAATATGCAAAACAACAA-TATTATA 5’ bsCO3(427-460) 
                              10TAAAAGTAGATAGAATATGCAAGATAGTAAATAATACACTAATAA 5’bsCO3(443-474) 
 
 

 

 


500       510       520       530       540       550       560       570       580       590        
uuAGAuuuAuuuAAuuuGuuGAuAAAuACAuuuuAUUUGuuUGuuAGuGGuuuAuuuGuuAAuuuuuuuGuuuuGuGUUUUUGGuuuAGGuuuuuuuGuu 
AATCTAAGTAAATTAAACAACTAATAAA 5’(483-522)                                              |||::|:|:||::|| 
:||:|||:||:||||:::||||:|||||||||||||||:|                             pCO3(585-629) 12TAATTTAGAGAAGTAA 
GATTTAAGTAGATTAGGTAACTGTTTATGTAAAATAAATATA 5’(491-539)              bsCO3(585-628) 13TAATTTAGAGAAGCAG 
GATGTAGGTAAATTGAGCAGCTGTTTATGTAAAATATATA 5’(504-535)                                   
                             ||||:|||:|:|:||||||||:||:|:||||||||||| 
         pCO3(528-565) 12TATAGTAAGATAGATAGACAATCACTAAGTGAACAATTAAAATATATA 5’  
      bsCO3(525-563) 13TAATATGTAGAGTAGATAAACAGTCACTAGATGAACAATTAATATATA 5’ 
                                                 ::||||:|:::||||:|:|:||||:|:||||:|||:||||||||| 
                                pCO3(548-592) 11TTTAAATGAGTGATTAGAGAGACAAGATACAAGAACTAAATCCAAATATATA 5’  
                                  bsCO3(551-593) 13TAATAGATGATTGAAGAGACGAGATACAAGAGCCAAATCCAAAATAACA 5’ 
 
600       610       620       630       640       650       660       670       680       690               
G**UUGuuGuuuuGuAuuAuGAuuGAGuuuGuuGuuuG****GuuuuuuGuuuuuGuGAAACCAGuuAUGAGA**GUUUGCAuuGuuAuuuAuuACAuuA 
|  ||:|||:||::|||||||||||||||:                                        :|:|  :|||:||:||::||:||:||||||| 
C--AATAACGAAGTATAATACTAACTCAAGATATA 5’(585-629)      pCO3(669-717) 09TTTTT--TAAATGTGACGGTAGATGATGTAAT 
T--AATAGTAGAGCATAATACTAACTCAATATATA 5’(585-628)     bsCO3(669-715) 08TTTTT--TAAATGTAGTAGTAAATGATGTAGT 
     |:||:|:||:||:||::||::||||::|:|||||    |||||| ||||||||:                                           
  13TATAATAGAATATGATGTTAGTTCAAGTAGCAAAC----CAAAAA#CAAAAACATA 5’ pCO3(604-647)         
  13TATAATAGAATATGATATTAGTTTGAACAGCGAGC----CATAAAACAAAAACATA 5’ bsCO3(604-643) 
                                    ||:    :|:|:|::|:|:|:|:||||||||:|||#:|  ||||||||||| 
                   pCO3(635-669) 14TAAT----TAGAGAGTAGAGATATTTTGGTCAGTACATT--CAAACGTAACATATA 5’ 
                  bsCO3(635-669) 11TAAT----TAGAGAGTAGAGATATTTTGGTCAGTACATT--CAAACGTAACATATA 5’ 
                                                            |||||::|||||:|  :||:||||||||||||| 
                                pCO3(659-691) 12TAACATAGATACTTGGTTGATACTTT--TAAGCGTAACAATAAATTATA 5’ 
                          bsCO3(653-682) 15TAGATAGTAGTGATGTTTTGGTCAATATTCT--CAAGTGTATA 5’ 
 
 

 

 


700       710       720       730       740       750       760       770       780       790        
AGuuGuGG****UGuuuuuGGuuCuAuuuuAuuuuuAuuGGAuuuAuUACAuuuuA**UGCAuGuuuuuuuAGGuGuuuuGuuGuuGuuuAuuuGuuuuA 
||:|||||    ||||:|                                                       :||:|:||:|:::|:|:||:|||:||:| 
TCGACACC----ACAAGATAATA 5’(669-717)                    pCO3(772-815) 14TATCATAGAATAGTGATAGATGAACGAAGT 
TTAGTACC----ACAATA-CCAAGATATA 5’(669-715)              bsCO3(773-814) 11TATATAAAGTGACGGTAGATGAACAGAAT 
       ::    |:||:|:::|||||::|:|:|||:||::|||:||||||||||                                           |||| 
    10TTT----ATAAGAGTTAAGATGGAGTGAAAGTAGTCTAGATAATGTAAATATATA 5’pCO3(706-753)    pCO3(796-829) 06TAAAT 
 26TGATATAT----ATAGAAGTTAAGATAGAATGAAGATGACTTAAATAATGTAAATATA 5’bsCO3(707-753) (788-829) 13TAATAAGTGAAAT 
                        ||||:|||:|:|||:|:||||:|:|||||:|||  ||||||||             bsCO3(798-842) 14TATAT 
       pCO3(723-765) 14TGATAGAATGAGAATGATCTAAGTGATGTAGAAT--ACGTACAATATATA 5’ 
       bsCO3(724-765) 10TATAGAATAGAAGTGACTTAGATGATGTAGAAT--ACGTACAATATATA 5’ 
                    bsCO3(736-781) 17TAATTTAGATAGTGTAAGAT--GCGTGTAGAAGAATCCACAAAACATATA 5’ 
                                                       ||  |:|||::|:|:||||::|||:|||||||||||| 
                              pCO3(754-790) 15TAATGTAGTAT--ATGTATGAGAGAATCTGCAAGACAACAACAAATTATATA 5’ 
                             bsCO3(754-790) 12TAATGTAGTAT--ATGTATGAGAGAATCTACAAGACGACAACAAATTATATA 5’ 
 
800       810       820       830       840       850       860       870       880       890        
uGCGuuuGuuuAAuuuuuuGuGuAuGGAuACACGuuuuGuuuuuuuGuAuuGuGuuuGuuuAuAuuGACAuuuuGuuGAUUUAGuuuGAuuuuuuuuAuu 
|:|||:||||||||||                                 |||:|:|:|:|:||:||::||||:|:||::||||||||||||| 
ATGCAGACAAATTAAATATA 5’(772-815)pCO3(848-890)08TATAATATAGATAGATGTAGTTGTAGAGCAGTTAAATCAAACTAATATATA 5’ 
ATGCAAACAAATTAA-TATA 5’(773-814)(845-889)11TATATATAGTGTAAGTAGATGTAGCTGTGAAATAACTAAATCAAATTATA 5’ 
|:|:|:|:|:|||||:|:|||:|||||||| 
ATGTAGATAGATTAAGAGACATATACCTATAGTGCAAAACAATA 5’ (796-829) 
GTGCAGATAAATTAGAAGACACATACCTATATATA 5’(788-829) 
  |:|||::|||||:|:|::|:||||:||||||||||||||||                                      |||::||:|||:|:|:|||| 
TAGTAAATGAATTAGAGAGTATATACTTATGTGCAAAACAAAATATA 5’(802-842)   pCO3(880-918) 13TATAATTGAATTAAGAGAGATAA 
ATGTAGGTAAATTGAGGGATATGTGTCTATGTGCAAAACAAAATA 5’(798-842)      bsCO3(880-929) 14TAATTGAATTGAAAGAGATAA 
               |||:::|:|||:|||||||||||:|:||:||||||||||:|                           
  (814-854) 06TAAAGGTATATATCTATGTGCAAAGCGAAGAAACATAACATATATA 5’        
    bsCO3(828-855) 22TTATGTAGATGTGCAAAGTAGAGAGACATAACACAATATATA 5’ 
                                              
 

 

 


900       910       920       930       940       950       960       970       980       990        
GCGAuuuGuuuAuuuuGAuGuuuuAuGuGuuAuGuAuuuGuGuGuGuAAuuuuAuuGGuGuuuuUUUAGUUGuuGAuuA*GuuAAuuuGuAuuGGUAGUU 
|||||:||||||||:||||                                            |::|||:||:::|||:| |||||:||:||||||||||| 
CGCTAGACAAATAAGACTAAATATA 5’(880-918)    pCO3(963-1003) 08TATATTGGAATTAATGGCTAGT-CAATTGAATATAACCATCAA 
TGCTAAACAGATAAGACTACAAAATATATA 5’(880-929)   bsCO3(965-1003) 17TGTAATTAATAGTTAGT-CAATTGAATATAGCCATCAA 
        ::||||:||:|::||:||:||:||||||||:|||||||:|| 
     13TTGAATAGAATTGTAAGATGCATAATACATAGACACACATATAATATA 5’ pCO3(907-947) 
        13TATATAGTTAGAAGATACATGATACATAAGTACACACATTAAAAGATA 5’bsCO3(920-952) 
                                    ||||:::|||::|||:|:||||:|:|:||||:||||||||||| 
                                 12TTAAATGTACATGTTAGAGTAACTATAGAAAAGTCAACAACTAAATATA 5’  pCO3(935-977) 
                  bsCO3(940-977) 14TAATTAGTGTATATTGAGATAGTCACAGAAGAATCAACAACTAAATATA 5’ 
                                           ::::||||:|:||||::|||:|:||||:||||||||| |||| 
                                   14TAATTAGTGTATTAGAGTAACTGCAAGAGAATCGACAACTAAT-CAATATA 5’ pCO3(942-983) 
                      bsCO3(951-981) 10TAATGTACA-TATAGTGACTATAAGAAGATCAACAACTAAT-CATA 
 
1000 
UGUAGGAAG 
||||  
ACATATA 5’ 
ACATATATA 5’ 
 

 

 


C) C-rich region 3 
0         10        20        30        40        50        60        70        80        90                
AGAAAUAUAAAUAUGUGUAUGAUAUAUAAAAACAAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuACAuuuuuuuuGuuuuuu 
                                  |||:|:|||    |||:|:||||||||||| 
          pCR3(33-62) 09TAAAGTGAGATTATAGACT----AACGAGCCAAAACAACATGTATA 5’ 
           bsCR3(34-62) 09TAGTGTGAT-GTAAACT----AACAAGTCAAAACAACATATATA 5’ 
                                          |    |::|:|::|||:::||||:|:|:|||||::||||||||||||| 
                          pCR3(41-88) 14TAT----AGTAGATTAAAGTGACAAGAGAGTAACAGGCAAACATGTAAAA-TATATA 5’ 
                         bsCR3(41-89) 08TTT----AGTAGATTAAAGTGACAAGGGAATAACGAGCAAACATGTAAAGATATA 5’ 
                                                                               |::||||::|:|:|::||||:| 
                                                          pCR3(78-118) 16TATCATAGTATGTGGAGAGAGTAAAAGA  
                                                         bsCR3(78-118) 04TATCATAGTATGTGGAGAAAGTAAAAGA 
                                                                            pCR3(105-140) 13TATAGTTAT 
                                                                            bsCR3(105-140) 12TTAGTTAT 
 
100       110       120       130       140       150       160       170       180       190        
AuuuGuuuGuG***A**UUUGuuuuuAuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuGGAuuuuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuA 
|||||||||||   |  ||                                    |||::||:||:||:|:|:|:||||||:|:||||||||||||:| 
TAAACAAACAC---T--AATATATA 5’(78-118)  pCR3(154-196) 13TTAATTTAGAAGTAGAGAGTGAATTATGCTCAAATAACAACATATATA 5’ 
TAAACAAACAC---T--AATATATA 5’(78-118)         (162-200) 15TACTATAGATAGAAGATAGATTATGCTCAGATGACAACACAAATATATA 
     :|:|||   |  :|||:|:|||||||||||| |:||                                                  :||:::|:|| 
AGATGGAGCAC---T--GAACGAGAATACAAACAAT-AGATA 5’ (105-140)                   pCR3(190-230) 06TTAATGTAGAT 
AGATAGAGCAC---T--GAACGAGAATACAAACAAT-AGATA 5’ (105-140)                    bsCR3(192-232) 11TATATAGAT 
                       |:||||::|::||| |:|||:||:|||:|:|||||||||||:|:| 
                  13TATAGAATATGAGTAAT-AGATCGAAGACAGAGAATAACCTAAAGATA 5’ pCR3(122-166) 
                       09TTATAGAGTAAT-GAATCAAAGACAAGAGATAATCTAAAAACAATATA 5’ bsCR3(129-167) 
 
200       210       220       230       240       250       260       270       280       290        
uuuuuuuuuuuuAuuuuAuCAuuuGAuAuGuGuAuCA*AAuuGuuAuuuAuuAuuuAG*UUCGuUUA*UAuuGuuAuuuUUAuAAuuuAuuuAAGUAUGC 
:|||:||:|:||||:|||:||:|||||||||                                                 :|||#|:|#||:|#||||||| 
GAAAGAAGAGAATAGAATGGTGAACTATACAACATATA 5’(190-230)    (293-308) 12TAATTAGT-GAAATGATAGTGATTAGAGTCATACG 
AAGAGAGAGAAATGGAATAGTAGACTATACACAATAGATA 5’(192-232)  (268-312) 05TAAATAGTAATGGAGATATTGAGTGAATTCGTACG 
                           |||:|:||||| ||#|::|||#:|||||::|| |||||||| ||||||||||            
                        12AATATATATAGT-TTCATGATATGTAATAGGTC-AAGCAAAT-ATAACAATAATATA 5’ pCR3(226-277) 
                             09TAACAGT-TTGATGATAGGTAATGAGTC-AAGTAAAC-ATAACAATATA 5’ bsCR3(234-265) 
                                  14TT-TCTATGATAGGTAGTAGATC-AAGCAAAT-ATAACAATAAGATA 5’ bsCR3(241-279) 
 
300       310 
AAAUAAUUUUUGU POLYA 
||||||||| 
TTTATTAAATATATA 5’ pCR3(293-308) 
TTTATTAAAAATATA 5’ bsCR3(268-312) 

 

 


D) C-rich region 4. Resequencing of the mRNA indicated two errors in the original sequence (yellow highlights).
 
0         10        20        30        40        50        60        70        80        90         
UAAUUUAUUGUUAUCUUUGUGUAUUUAUUAuuAuuuuAuuuuAAuuuuGGuuGuGC***AuuuuuuuuuuuuuuAuuuG***GuG*UGuuuGuGuuuuA* 
                          |||:||:|:||||:||:||||:|:|||||||   ||||||                              
       pCR4(25-64) 12TATATATAGTAGTGAAATGAAGTTAAGATCAACACG---TAAAAATA 5’       
      bsCR4(25-64) 14TATATATAGTAGTAAAATGAAGTTAAGATCAACACG---TAAAAATAATA 5’ 
                                                 ::|::|:|   |::|:|:|||:|:|||:||:   ||: |||:|||||||||  
                                 pCR4(48-103) 12TTTAGTATG---TGGAGAGAAAGAGAATGAAT---CAT-ACAGACACAAAAT- 
                                bsCR4(48-103) 12TTTAGTATG---TAGAGAGAAAAAGAATGAAT---CAT-ACAGACACAAAAT- 
                                                                                        :|||:||:||:| 
                                                                        pCR4(87-134) 10TTAAATACGAAGT- 
                                                                           bsCR4(93-142) 05TATTAAAGT- 
 
100       110       120       130       140       150       160       170       180       190         
UGuA*C*A*GuuuAuGGuAuAuuuuAuuGuuGuuuuGuuuuuuGuuuuuGuuGUUUGuuUGuGuGGGuAuGuuuuAuuuGuuuuGuuAuAGuuGuuuGuu 
|:||                                                   ||:|:|:|:|:||||:|||:|||:||::||||||||||| 
ATATATA 5’(48-103)                    pCR4(154-192) 14TAATAGATATATCCATGCAAGATAGACGGAACAATATCAAATTATA 5’ 
ACATATAACGAAAGACAAAACGAGAGACAAGAAC 5’(48-103)      (166-196) 06TATGCGTGTAAGATAGATAAAACAATATCAACAAAATATA5’ 
|:|| | | ||:||||:||||:|:|||||||||||                                                    ||||::::|||::| 
ATAT-G-T-CAGATACTATATGAGATAACAACAAATATATA 5’(87-134)                 pCR4(186-232) 11TTATATTGGTAAATGA 
GTAT-G-T-TAGATGCTATATAAAGTGACAACAAAACAAAAAAA 5’(93-142)             bsCR4(186-232) 14TTATATTGGTAAATGA 
                            |:|::|:||::|||:|:|:|:|||::|:|:||||||||:||||:| 
                       14TATATAGTAGAATGAAAGATAGAGACAGTAGATAAACACACTCATATATATA 5’ pCR4(127-171) 
 
200       210       220       230       240       250       260       270       280       290         
uuuuuuuGuuGuUUUG*GGuuGuGAuuuuuuAuuG**GuGuuuuG***AuuGuAuAGuuuAuuuuuuuuGuGACGuuAuAAuuUUGuuuAuuuuuuuuuu 
                                                  :|:||:|:|||:||:|||:||||||:|:||||:||||||||||||||||| 
||:|:|:||::||||: ||:|||||||||||||     p(251-300) 11TATATTAGATAGAAGAAATACTGCAGTGTTAAGACAAATAAAAAAAAAAA 5’ 
AAGAGAGCAGTAAAAT-CCGACACTAAAAAATA 5’(186-232)    06TATATTAGATAGAAGAAATACTGCAGTGTTAAGACAAATAAAAAAAAAAAA 5’ 
AAGAGAGCAGTAAAAT-CCGACACTAAAAAATA 5’(186-232)                                   ||:||:|||||:|:||||:| 
              ||: ::|::::|::||:|||::  |||:||::   ||||||||||||||   pCR4(280-320)10TGTAGAATAAATAGAGAAAAGA 
p(213-261) 12TAAT-TTAGTGTTGGAAGATAGT--CACGAAGT---TAACATATCAAATATATA 5’ 
           12TAAT-TCAGTATTAGAGAGTAAC--TACAGAGC---TGACATATCAA-TATATA 5’bsCR4(213-258) 
 
 

 

 


300       310       320       330       340       350       360       370       380       390         
uuAuuuuGuuuuGuGuuuuuuGuAuuG*UUGuuuuuAuUUGGuuuGuuuGGuuuuuuuuuG***UAuuuuuuGUUGuGuuuuGuGuuAuuuuuuGAuuuA 
:|||:|:|::|||||||||||                                                      |:|:|:||:|||:|||:|:|||||:| 
GATAGAGCGGAACACAAAAAAAAAA 5’(280-320)                     pCR4(374-417) 11TATATAGAATACAGTAAGAGACTAAGT 
        :||||:|:||:|:|:|||:: :||::|||||:||:|:||||||||     bsCR4(374-415) 17TATATAGAATATAATGGAGAACTAAGT 
     11TTAAAATATAAGAGATATAGT-GACGGAAATAGACTAGACAAACCATAAAATA 5’(307-351) 
     09TTAAAATATGGAAAATATAGT-GACAGAGATGAGCTGAACAAACCAATAAATTAGTTGGTTTGTT 5’ bsCR4(307-352) 
                                            |::||:::|||:|:|:|:   ||:|:|:||||:|||||||||||||         
                           pCR4(343-388) 15TAGTAAGTTAAAGAGAGAT---ATGAGAGACAATACAAAACACAATATATA 5’ 
                       bsCR4(340-390) 05TTTAAATGAGTTAAGAGAGAAT---ATAAAGAGCAGCATAAAACACAATAAATATA 5’ 
                                                                           
400       410       420       430       440       450       460       470       480       490         
uuuuuuAuGuUGuuuuuUGuuuuGGG***UG*GuuuuuuuGuuuuuGuuuuuuuuuuuuGuuuAuGuuuGuuuuuAuuuGuGGuuGuuGuuAuuuuGuuA 
:|||:|||||||||||||                                                          ||||:|:::|:|::|||:|:||:|| 
GAAAGATACAACAAAAAAAAAA 5’(374-417)                         pCR4(475-519) 12TTAAATATTGATAGTAATGAGACGAT 
AGAGAATACAACAAAATATATA 5’(374-415)                           bsCR4(478-524) 14TATATTAGTGATAATGAGATAAT 
     ||||::::|:||:|:||:||:|   || :|||:|||||:||||||||||||||| 
  14TAATATGGTAGAAGATAAGACTC---AC-TAAAGAAACAGAAACAAAAAAAAAAA 5’ pCR4(404-457) 
  08TAATATAGTAGAAGATAAGACTC---AC-TAAAGAAACAGAAACAAAAAAAAAAAAAA 5’ bsCR4(404-458) 
                                          |||::|:|||:|:|:|::||||:|||::|||:|||:|||||||||||:| 
                         pCR4(442-489) 12TTAAAGTAGAAAGAGAGAGTAAATGCAAGTAAAGATAGACACCAACAATA 5’ 
                        bsCR4(442-487) 11TTAAAATAGAAAGAGAGAGTAAATACAGATAAAGATGAACACCAACAAAATA 5’ 
                                                      |||:|::|:||::|:|:||:|||:||:||:||||||||||||||| 
                                    bsCR4(453-497) 11TAAAGAGTAGATGTAGATAAGAATGAATACTAACAACAATAAAACATA 5’ 
                                    bsCR4(453-497) 15TAAGAGATAGATGTAAGTGAGAATAAACACTAGTAACAATAAAACATA 5’ 
 
500       510       520       530       540       550       560       570       580       590         
GuuuGGuuGuuGUUGuuAuuUGuGuAuA****GGUUUAuuUAuA*UGCGuuuuuuAuuuuAGAuAAuUAuG****G****UA**UUGGUUUUAUAAAAUG 
:|:||:||:||||||||||| 
TAGACTAATAACAACAATAA-TATATA 5’(475-519) 
CAGACTAGTAACAACAATAAACATA 5’ (478-524) 
     ::||:|::|::||||:|:||||||    ||:|:|:||||| |||||||||| 
  03TTTAATAGTAGTAATAGATACATAT----CCGAGTGAATAT-ACGCAAAAAAAA 5’pCR4(504-554) 
     12TATAGTAGTGATAGATACATGT----CTAGATGAGTAT-ACGCAAAAAAAAAAA 5’ bsCR4(507-554) 
                                           || |:|::||:|:||:|:|||||||||:|    |    #|  ||||||||||||| 
                         pCR4(542-575) 12TAAT-ATGTGAAGAGTAGAGTCTATTAATGC----C-----T--AACCAAAATATTTAA 5’ 
                        bsCR4(542-584) 19TAAT-GTGTGAAAGATGGAATCTATTAGTGT----C----AT---ACCAAAATATTTATA 5’ 
                                            
600       
UUUUUUCU polyA 
 

 

 


E) Cytochrome b 
 
0         10        20        30        40        50        60        70        80        90         
GUUAAGAAUAAUGGUUAUAAAUUUUAUAUAAAuAuGuuuCGuuGuAGAuuuuuAuuAuuuuuuuuAuuAuuuAGAAAuuuGuGuuGUCUUUUAAUGUCAG 
                                 :||:|:||:|::||:||:::|||||||||||||     
                    12TGATAGGTGTCGTATAGAGTAGTATTTAGGGATAATAAAAAAAAAA 5’ pCYb(32-64) 
                     06TATAGGTGTCGTATAAGGTAGTATTTGAGGATAATAAAAAAAAAA 5’ bsCYb(32-64) 
                                                      ||||||:::|||||||:|:||||||||::::|||||||| 
                                       pCYb(53-91) 11TTAATAAGGGAAATAATGAGTCTTTAAGTGTGACAGAAAAAAAAAA 5’ 
                                    bsCYb(54-91) 05TATCAATAGGAGGGGTAATGAGTCTTTAGATGTAACAGAAAAAAAA 5’ 
 
 
 
 
 
 
 
 
F) Maxicircle unidentified reading frame II
 
0         10        20        30        40        50        60        70        80        90         
UUUUAUAUAGAAAGGUAUAUAAUCUAUAAUGAuuuuAAuGuuuGGuuGuuuuA****AuuuAGuuuuAuuuUUGuGCUUUGAUUGuAGUCGUGUUUUUGA 
                               :|||||||::|::::||::|:||    |||||:||||||||||||||||| 
              pMURF2(30-79) 11TTTAAAATTGTAGGTTAATGAGAT----TAAATTAAAATAAAAACACGAAAGATA 5’ 
             bsMURF2(30-79) 08TTTAAAATTGTAAGTTAATGAGAT----TAAATTAAAATAAAAACACGAAAGATA 5’ 
 

 

 


G) NADH Dehydrogenase subunit 3  
 
0         10        20        30        40        50        60        70        80        90         
UCAAAAAAUCCUCGCCUUUUUACUUUAGUUUGUUAUCAuuAuuuuuAuAuuuGuuuuUG*A*UAuuGuGGuuuA**UUAuuuuAuuuAuAGGuuuuuuuu 
                                  |||:||:||||:||:|::::||||:| | |||::|||||||  |                      | 
                   pND3(33-76) 12TATAGTAGTAAAGATGTGGGTAAAAGC-T-ATAGTACCAAAT--ATTATATA 5’ (99-143) 11TA 
               bsND3(30-73) 12TATAATAGTGATAAAGATGTGAATAAAGAC-T-ATAACACCAAAT--TATA 5’    (98-141) 12TAA 
                                                                |||:|:::|||  |:|:|:||:|:|:||:|||||:|| 
                                                pND3(63-113) 11TTAATATTGAAT--AGTGAGATGAGTGTCTAAAAAGAA 
                                               bsND3(63-108) 09TTAATATTAGAT--AGTGAGATGAATATTCAAAGAAAA 
 
100       110       120       130       140       150       160       170       180       190        
uAuGuuuuuuAuGuuuuuuAuuGCAuuuuuuuGAuuGuuuuCGuuGuuGuuuGuGGuuuuCGuGuGGuUUGuAuGAuAuGAAuUCACGuuuG*GUGuuuu 
|||||||:|||||| 
                                         |||:::|::|:|:||:||:|::||:||||:|||: :|||||| 
ATACAAAGAATACA 5’(63-113)                 pND3(158-205) 15TAAGTGTATTAGATGTGCTGTGTTTGAGTGTAAAT-TACAAAA 
ATACAAAAA-TACATA 5’(63-108)              bsND3(158-205) 13TAAGTGTATTAGATATGCTGTGTTTGAGTGTAAAT-TACAAAA 
|||:::||||||::|||:|||::||:||:|||||||||||||:|                                               || :|:|:|| 
ATATGGAAAATATGAAAGATAGTGTGAAGAAACTAACAAAAGTATA 5’ pND3(99-143)            pND3(190-229) 13TAC-TATAGAA 
ATATAGAGAATATAGAAAATGGCGTAAAGAGACTAACAAAAGAATA 5’ bsND3(98-141)          bsND3(190-233) 13TAC-TATAGAA 
                               ||:|:|:::|||:|:||::|:||||:|:||||||||:|||| 
                          18TATAATTGATGGAAGTAGCAGTAGATACTAGAAGCACACTAAAC-TATATA 5’ pND3(130-170) 
                            11TAATTAGTAAGAGTGACAATAAACACTAGAGGCACATCAAACATATATA 5’ bsND3(130-174) 
 
200       210       220       230       240       250       260       270       280       290        
AuACAuuGGAuuuAUGuuuuGuuAGuUGuUUGuuuuuuGuAuuGuuAAAuuCCAuUAuuuGuGuUUUGuuGuuuGuuuuuGUGAuA*GuGuuGuuuuAuu 
||||||                                                |||||:|:|:||:|:|:::|:|||:|::|:||| ||||||||||||: 
TATGTATATA 5’ (158-205)           pND3(253-299) 14TCTTTAATAGATATAAGATAGTGAGCAAGAGTATTAT-CACAACAAAATAGTATA 5’ 
TATGTATATA 5’ (185-205)            bsND3(253-299) 06TTTAATAGATATAGAGTGACAGATAGAAACATTAT-CACAACAAAATAATATA5’ 
||||||::|||:|||:||:|||||||||||                                                       || :::||:||:|||: 
TATGTAGTCTAGATATAAGACAATCAACAATATATA 5’(190-229)                   pND3(284-329) 06TAAT-TGTAATAAGATAG 
TATGTAGCTTAGATACAGAGTAGTCAACAAACAATATA 5’(190-233)                   pND3(285-328) 10TT-TATAATAAAATAG 
                        |:|::||::|::|:|:|||:|||||||:|||||||||||:| 
       pND3(223-263) 11TTTAGTAAGTAGGAGATATAGCAATTTAGGGTAATAAACATATATT 5’   
     bsND3(222-263) 16TATTAATAGATAGAGAGTGTAGCAATTTAGGGTAATAGACATATAAA 5’  
 
 

 

 


300       310       320       330       340       350       360       370       380       390        
uuuGuuAUGGuuuuuuGuUUUUGuGGuuuuuGuuuuuuGuuGuAuGuAuAG****GAuuUGuGuGGuAuuuuuGGGAUCAC*GuAuAUUUGUGUGGUGUA 
|:|:||||:|:|||:|:|||||||||||||                                               pND3(402-438) 13TATATTGC 
AGATAATATCGAAAGATAAAAACACCAAAATATA 5’(284-329)                                   bsND3(402-435) 08TGC 
AGATAATACCGGAAGATAGAAACACCAAA-TA 5’(285-328) 
                       :|::||:|::|:||:::|:|||::||||:    ||||::||||||||| 
      pND3(322-369) 13TTATTAAGAGTAGAAGGTAGCATGTATATT----CTAAGTACACCATAA-TATATA 5’  
     bsND3(322-370) 12TTATTAAGAGTAGAAGGTAGCATGTATATT----CTAAGTACACCATAAATA 5’ 
                                                ||||    :|||::::|:||||:|:||::||||| ||||||| 
                           pND3(347-388) 18TATTTTATC----TTAAGTGTATCATAGAGACTTTAGTG-CATATAACAAATGTATA 5’  
                                bsND3(355-388) 11TTT---ATTAAGTGTACTGTGAAAGCTTTGGTG-TATATAACAAACAAATATA 5’ 
 
400       410       420       430       440       450       460                   
AUUUUAuuuuGuuuAuGA**UGuuuUUUGUUGUAUUAUACAUAUUAUAUUAAUAAAUAUAUAAAA 
  ||:||:|:::||||||  ::||||:|||||||||||| 
TTAAGTAGAGTGAATACT--GTAAAAGACAACATAATATTATA 5’ (402-438) 
TTAAATGAGATAAATGCT--GCAGAGAACAACATAAAAT-ATATA 5’ (402-435) 
 

 

 


H) NADH Dehydrogenase subunit 7  
 
0         10        20        30        40        50        60        70        80        90         
UGAUACAAAAAAACAUGACUACAUGAUAAGUAuCAuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCC 
                                     :|||||::|:|:|:::||||:|:|||||||||:|                         |||#|       
             pND7(36-69) 14TATTATAGT-GAATACGGTGAGAGTTATCAGAGAAATGTAAATAATATA5’(108-137)TTAGATTTTTAGAG
            bsND7(28-71) 12TATTATAGTAAGATGCAATGAAAGCCGTCAAGAGAATGTAAACATATAAA 5’ 
                                                            ||:||||:|:||||:|:|||||||:| ::||||#||||| 
                                           pND7(59-91) 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA5’ 
                                           bsND7(58-91) 12TAAAGTGTAAATATAGCGAAGTGTAAAT-CAGGTGACATAAATATATA5’ 
 
100       110       120       130       140       150       160       170       180       190        
G***CAGCACAuG**GuGuuuuAuGuuGuuuAuuGuAuuuuuGuGGuGA*AuuuAuuGuuuA**UAUUGAuUGuAuuAuA***G*GuuAUUUGCAUCGUG  
|   #||#|||||  |:::|||||||||||||||:||| 
C----TCATGTAC--CGTGAAATACAACAAATAATATA 5’ pND7(108-137) 
                         :||:||:|:|:||:||:|::|||:| ||:|||:|||||  ||||||| 
                      14TTAATAAGTGATATGAAGATGCCATT-TAGATAGCAAAT--ATAACTACATA 5’  pND7(124-170)   
                   16TATATAGTAAATGACATGGAAGTGCTACT-TAAATAACAAAT--ATATATA 5’ bsND7(121-166) 
                                                     ||||:::|:|  |||::|:||||::|||   | :||||| 
                                    pND7(152-190) 12TAATAGTGAGT--ATAGTTGACATGGTAT---C-TAATAATACGTAGCATTAAA5’  
                                 bsND7(151-199) 12TAAAGTGATAGAT--GTGATTGATATGATGT---C-CAATAA-ATGTAGCATAAA5’ 
 
200       210       220       230       240       250       260       270       280       290        
GUACAGAAAAGUUAUGUGAAUAUAAAAGUGUAGAACAAUGUCUUCCGuAUUUCGACAGGUUAGAuuAuGuuA*GuGuuuGuuGuAAuGAGCAuuuGuuGu 
                                               :||:|:|||||:||:|||||||||  
                 pND7(246-269) 14TAAATAAGGAAATCTATGAGGCTGTTCAGTCTAATACACAACTATA 5’  
                bsND7(246-269) 12TAAATAAGGAAATCTATGGGGCTGTTCAGTCTAATACACAACTATA 5’ 
                                                              |:|||||:|:| :||||:::||||||:||||:||:|||| 
                                             pND7(261-311) 08TTTTAATATAGT-TACAAGTGACATTATTCGTGAATAACA 
                                            bsND7(261-293) 10TTTTAATGTAGT-TATAAGTGACATTGCTCGTGA-CAACA 
                                                                                                  |:| 
                                                                               pND7(297-338) 13TACATA 
                                                                          bsND7(292-324) 13TGAAATAGTG 
 
 

 

 


300       310       320       330       340       350       360       370       380       390        
CuuuA***UGuuuuGAGuAuAuGuuGCGAuGuuGuuuGuCGuuACGuuGuGCAuuuAuGCGuuuAuuAAuuGuA****GAAuuuAC***CCGuAGuuuuA 
|||||   ||||                                         |||||::|||:||:||||||:|    ||||||||   |#|||||||| 
GAAAT---ACAATATA 5’(261-311)       pND7(352-398) 07TATAAATGTGCAGATGATTAACGT----CTTAAATG---GACATCAAAAG-ATATA5’ 
GAA-T---ATA-TATA 5’(261-293)       bsND7(353-402) 13TAAAATGTGCAGATAATTAATGT----CTTAGATG---GGTATCAAAATTATATATA5’ 
||:||   ::|||:||:||||:|||||||||||||||:|                                              ||   #|:||::|:|| 
GAGAT---GTAAAGCTTATATGCAACGCTACAACAAATATA 5’(297-338)             pND7(390-424)16TAATTG---AGTATTGAGAT 
GAAAT---ATAAGACTCATATACGA-GCTACAACAA-TATATA 5’(292-324)          bsND7(391-424) 14TAATG---GATATTAAGAT 
                            :||||::||::||||:||::||::|||||||:||||||||||||:|| 
          pND7(327-373) 13TATTATAGTAAGTAGCAGTGTGACGTGTAAATATGCAAATAATTAATATA 5’  
        bsND7(327-365) 12TAATTGTAGTAAGTAGCAATGTAACGCGTAGATATGCAAATATACATA 5’ 
 
400       410       420       430       440       450       460       470       480       490        
AuGGuuuGuuGuGuAuAuCAuGuAuGGuuuuGG*AuuuAGGuuGuuuGuCUCCGuuG*UUAuGAuCAuuuGAGGAA***CG*UGACAAAuuGAuGACAuu 
||::|:|:|:|:|||||||||||||                                                              |||||:|::||||| 
TATTAGATAGCGCATATAGTACATAAATTATA 5’(390-424)                          pND7(486-530) 11TTTTAATTGTTGTAA 
TATCAAGTGACATATATAGTACATAACATTAAA 5’(391-424)                        bsND7(486-530) 11TTTTAATTGTTGTAA 
             :||||||||:||::||:|:|: ||:|||:|::||||||||| 
          13TTATATAGTATATGTCAGAGCT-TAGATCTAGTAAACAGAGGAATATATA 5’ pND7(412-452) 
             13TATAGTGTATGTTGGAATC-TAAGTTCGATAGACAGAGGCAAT-A-TATA 5’ bsND7(414-458) 
                                                      :||: |:|::||||:|::|:|||   || |:||#||||||||||||| 
                                    pND7(453-485) 13TATAAT-AGTGTTAGTGAGTTTCTT---GC-ATTGATTAACTACTGTAA 
                                   bsND7(453-485) 13TATAAT-AGTGTTAGTGAGTTTCTT---GC-ATTGATTAACTACAATAA 
 
500       510       520       530       540       550       560       570       580       590        
uuuuGAuuuAuG**UUGuGGuuGuCGuAuGCAuuuGGCUUUCAuGGuuuuAuuA*GGuAUUCUUGAUGAuuuuGuuuuuGGuuuuGuuGAuuuuuuGuuG 
||                                                               :||:|||:|:|:|:|::|||:|:::||:||:|:||| 
AATA 5’(452-501)                                pND7(564-615) 12TTTATTAAGATAGAGATTAAAGCGGTTAGAAGATAAC 
ATA 5’(453-485)                                bsND7(564-615) 10TTTATTAAGATAGAGATTAAAGCAGTTAGAAGATAAC 
||:|||:|:||:  |:||::|||||||||||                                                      |:|::||||:|:::|| 
GAGACTGAGTAT--AGCATTAACAGCATACGAATATA 5’(486-530)                   pND7(584-630) 15TATAGTTAAAGAGTGAC 
GAGACTGAGTAT--AGCATTAACAGCATACGAATATA 5’(486-530) 
         |||:  |:||:::||:|:||::||:|::|||||||||:|||                                                ::|: 
14TAATTGTATAT--AGCATTGACGGTATGTGTGAGTCGAAAGTACTAAATATA 5’(508-548)          pND7(596-642) 18TAAATTTGAT 
     11TTATAT--AGTGCTAGTAGTATATGTAGGTTGAGAGTACCAAAATAATATA 5’bsND7(508-553)     bsND7(596-640) 11TTAAT 
                           |||:|||||::||:||||:||:|:|||| |:||||||||||||| 
         pND7(526-569) 12TAATATGTAAATTGAGAGTATCAGAGTAAT-CTATAAGAACTACTATATATA 5’ 
        bsND7(526-569) 13TAATATGTAAATTGAGAGTATCAGAGTAAT-CTATAAGAACTACTATATATA 5’ 
                                         ||||:::|:|||:| |:||:|||::|||||||||||| 
                  pND7(540-576) 05TACTGATAGTATTGAGATAGT-CTATGAGAGTTACTAAAACAAATAATAATA 5’              
                bsND7(540-571) 14TAATCGTTAGTGTCAAGATAGT-CTATAAGAACTACTAAATAATATA 5’ 

 

 


600       610       620       630       640       650       660       670       680       690       
uuGuuGA***UAAuAuCAuGuuuGuuuGuuAuGGAuuGuuAuGAuuuGuuAuuuGuGGGuAAUCGuuuAuuuUAuuuGCGuuuGC***GuGGuuuGuCAu  
|||||||   ||||||                                         :::|||||::|||:|||||:|||::||||   ||:||||||||| 
AACAACT---ATTATATT 5’ pND7(564-615)     pND7(656-699) 12TTTTATTAGTGAATGAAATAGACGTGAATG---CATCAAACAGTATATATA 5’ 
AACAACT---ATTATATA 5’ bsND7(564-615) bsND7(666-701) 12TACATAGTTAGTGA-TAGAATAAGTGCAGATG---TACCAAACAGTAGATATA 5’ 
|:::|||   |||||||||:|||||||||||                                                 :|||:|   :::|||::|||: 
AGTGACT---ATTATAGTATAAACAAACAATTAATA 5’(584-630)               pND7(679-727) 13TTAAATG---TGTCAAGTAGTG 
|::|::|   ||||||||:||:|:|:||:|||||||||||||:              bsND7(679-725) 16TGATGTTAAGTG---TGTTAGACGGTA 
AGTAGTT---ATTATAGTGCAGATAGACGATACCTAACAATATATA 5’(596-642) 
AGTGACT---GTTATAGTGCGAGTGAACAATACCTAACAATTAAA 5’(596-640) 
                              |||::||:::|||||:|::||||:|||:|||||||||||||: 
                           13TATATTTAGTGATACTGAGTAATAGACATCCATTAGCAAATAGTATA 5’ pND7(629-670) 
                              11TTTTAGTAGTATTAGATAGTAAGTGTCCATTAGCAAATAAAATATATA 5’ bsND7(632-674) 
 
700       710       720       730       740       750       760       770       780       790        
uuuuuGAuuuAuAuGAuuuA**GuuuuuA**A**UAGuuuAAGuGGuGuuuuGuCuCGuuCGuuAGGuAuGGuGuGAGAuuGUCGuuuAuuuAGuuGuuA 
::|||:||||||||||||||  ||||||                                                   ||::||:||:|:|||::||:|| 
GGAAATTAAATATACTAAAT--CAAAAAAAAAAA 5’(679-727)               pND7(778-830) 12TATAGTAGTAAGTGAATTGACGAT 
GAAGGTTGAATATACTAAAT--TAAA 5’(679-725)       bsND7(772-806) 12TAATTGATATGACACTCTAGTAGCAAATAGATCAACAAT 
                                             bsND7(782-816) 16TAATTGATGTTATATTCTAAGGGCAGATAGATCAACAAT 
            |||:|:|:|  :|:|:||  |  ||:|:||||||:|:|||||||||||                                  |:||:|:| 
        13TAATATTGAGT--TAGAGAT--T--ATTAGATTCACTATAAAACAGAGCATATATA 5’(711-758)   (792-845) 10TTTAATAGT 
       13TATATATTGAGT--TAGAGAT--T--ATCAGATTAACCATAGAACAGAGCAAAA 5’ (709-741)   (790-839) 11TAATTAGTGAT 
                          |:||  |  ||||:|||:||:|::|:||:|||||||||||              
    pND7(725-764) 14TAACTTAGAT--T--ATCAGATTTACTATGAGACGGAGCAAGCAATAATATA 5’ 
    bsND7(722-765) 14TATAAGAGT--T--ATTGAGTTCGTTACAAGATAGAGCAAGCAATTATATA 5’ 
                                                         |:|||:|:||:||::||:|:||||:|||||||||||||| 
                               pND7(756-794) 14TAATTATAGTGTAAGTAGTCTATGTCATATTCTAGCAGCAAATAAATCATA 5’ 
                                bsND7(756-794) 14TTTATAGTGTAAGTAGTCTATGTCATATTCTAGCAGCAAATAAATCATA 5’ 
 
 

 

 


800       810       820       830       840       850       860       870       880       890         
****UGA*****GuUGuAuuuuAuGuuuuGuuAuGAuuAuuGuuuuuGuuuuAuAGGuGAuGCAuuuGA*UCGuuuAuuuuuACGuuuGuuuGAUAuGCG 
    ::|     ||::|||||||||||||:|                                          :|||||||:|||:|||::|:|||||:|: 
----GTT-----CAGTATAAAATACAAAATATATA 5’(778-830)         pND7(872-916) 09TTAAATAAAGATGTAAATGAGCTATATGT 
----ACT----ATATA 5’(772-806)                           bsND7(872-919) 17TTAAGTGAAGATGTGGGTGAATTATATGC 
----ATT-----CAATA 5’(782-816) 
    ::|     ::||:|:|||||:|:|::|:|||||||||||||| 
----GTT-----TGACGTGAAATATAGAGTAGTACTAATAACAAAATATA 5’(792-845) 
----ATT-----TAGCATGAGATATAGAACAATACTAATATATA 5’(790-839) 
                                   :|||||::::|:|:::|||:|:|::||:||||||| :|||||||| 
                                13TTTAATAGTGGAGATGGAATGTTCGTTATGTAAACT-GGCAAATATA 5’ pND7(834-877) 
                                15TTTAGTAGTAAAAATAAGATATTCACTATGTAGATT-AGCAAATAAATATA 5’ bsND7(834-879) 
                                                                ||||:| ||::||||||:|||||:|:|:||||||||: 
                                               pND7(863-902) 10TTAAATT-AGTGAATAAAGATGCAGATAGACTATACGT 
                                                  pND7(866-902) 10TATT-AGTGAATGAAAATGTGAGTAGATTATACGC 
 
900       910       920       930       940       950       960       970       980       990        
uAuGAGuuuGuuGAuuuGuAAGCAAuGuuuuuuuGuuGGuuuuuuuGuuuuuG*****GuuuuGuuuGuuuGuuuG**AuuAuuuAuAuuGuGAuAuuAC 
|||                                                         |||::|:|:|:|:|::|  |:|||:||:|:||||||||||| 
ATAATATA 5’(863-902)                     pND7(959-1000) 14TAAAAGTAGATAGATAGGC--TGATAGATGTGACACTATAATG 
ATAAATAATATA 5’(866-902)                bsND7(959-1000) 11TAAAAGTAGATAAATAGGC--TGATAGATGTGACACTATAATG 
||:||:|||||||||||                                                                   |||||:|:::||||:|| 
ATGCTTAAACAACTAAAATATA 5’ gND7(872-916)                           pND7(983-1017) 12TAATATGATGTTATAGTG 
GTGCTTAAACAACTAAACATA 5’(872-919) 
  ||:|:|:|:|::||:|||||||||||::|||||:||:||                                                        ||||| 
13TATTTAGATAGTTAGACATTCGTTACGGAAAAATAATCATATATA 5’ pND7(901-939)        pND7(1001-1032) 08TAATAGTTAATG 
12TATTTAAGTGACTAGATATTCGTTATAGAAAAACAATCATATATA 5’ bsND7(901-939)           bsND7(1000-1043) 13TTTAATA
       |::||:|:||::|||||||:||:||:|||||||||:|||:|                                                
ATGTTCTAGTAATTGAATGTTCGTTATAAGAAGACAACCAAAGAAATA 5’ pND7(907-947)       
   17TCTAGTAATTGAGTGTTCGTTATAAGAAGACAACCAAAGAAATAGAAAACGGAACC 5’ bsND7(907-951) 
                                 ||:|:::|:||:|:||:|::|     |||:|:|:||||||||||  | 
                              13TAATAGTTAGAAGAGCAGAGGC-----CAAGATAGACAAACAAAC--TTATATA 5’ pND7(932-978) 
                                12TTAATTAAAGAGATAGAGAT-----CAAAGTAAGCAGACAAAC--TAATAATATATA 5’ bsND7(934-983) 
                                                                                                
 

 

 


1000      1010      1020      1030      1040      1050      1060      1070      1080      1090            
CAuuG****AGACCAuuAuuAuGuuAuuuuAuAGuuuGuGGuGuuGuuGuuuGCCGGGuAuA*UCAuuuGC*UUGUGuuGAACACCCCAAAGGuGA***G 
|                                                       :::|||| ||||:|:| :|:|:::|||||||##   ::|||   | 
GAAATA 5’(959-1000)                  pND7(1055-1085) 03TTTTATAT-AGTAGATG-GATATGGCTTGTGGAAAGATTACT   C 5’ 
GAAATATA 5’ (959-1000) 
|||::    |||||||||#||||||||||                                                   :||||||#|#||:||:|   |  
GTAGT----TCTGGTAATTATACAATAAATATATA 5’ pND7(983-1017)   (1089-1121) 14AAATGAGTGTTTTGTGGAG-TTTCATT---C 
#||:|    |||||||:||:|:|||||||||||              bsND7(1087-1113) 12TAAATG-AATATAGCTTATAGAGTTTCTGCT---T 
ATAGC----TCTGGTAGTAGTGCAATAAAATATATATA 5’ pND7(1001-1032)                                          : 
GTGAC----TTTGGTGATAGTGCAGTAAGATGTCAAACACCACATATA 5’ bsND7(1000-1043)            pND7(1099-1143) 14TAT 
||:|    |:|||#||||:||:|||:|||||||||||||:|||                                    bsND7(1099-1143) 12TAT 
13TAGC----TTTGGGAATAGTATAATGAAATATCAAACACTACATA 5’ pND7(1015-1043) 
 14TT----TTTAGTGATAGTGCAGTAAGATGTCAAACACCACATATA 5’ bsND7(1013-1043) 
                                 |:||::::||||:::|:|||:|||:|||:| |||||#|| ||||: 
              pND7(1032-1067) 11TTTAAGTGTCACAGTGATAAATGGCTCATGT-AGTAA-CG-AACATATATA 5’ 
             bsND7(1032-1078) 10TTTAAATGCTATAGTGACGAGTGGCTTATGT-AGTAAACG-AATATAAA 5’ 
 
1100      1110      1120      1130      1140      1150      1160      1170      1180      1190       
uAuuGuuuGuuAuuA****UGuuuuuGuGuuGGuuuAuGuuCUCGuuuACGuuuGCGuuGuGCGGAuuuuuuGCA*UAUUUGuuuAuuGGAuGuuuGuuu 
|||||||||||||:|    |:|                                                            :|||||::|||:|:|:|||  
ATAACAAACAATAGT----ATATA 5’(1089-1121)                         pND7(1181-1218) 12TTAAATAGTCTATAGATAAA 
ATAACAAACAATAA-----ATA 5’ (1087-1113)                        bsND7(1183-1224) 13TAATAGTGACTTATAAGTAGA 
|||:::||||||:||    ||::|:|:||:|:||||||||||||                       
ATAGTGAACAATGAT----ACGGAGATACGATCAAATACAAGAGAATATA 5’(1099-1143)   
ATAGTGAATAATGAT----ACGAAGGTACAGTTAAATACAAGAGAATATA 5’(1099-1143) 
        |:|||:||    |:|:|:|:::|:::|:|||:|||||||||||:|||:|:|         
     13TATAATGAT----ATAGAGATGTAGTTAGATATAAGAGCAAATGTAAATGTATA 5’pND7(1107-1157) 
     14TATAATAGT----ATAGAGATACAGTCAAGTGTAAGAGCAA-TGCAAAT-TATA 5’bsND7(1107-1146) 
                          09TCAATTGAGTATAAGAGCAGATGCAAGCGTAACATGCCTAATATATA 5’ bsND7(1128-1167) 
                                                   :|||:|:|::|:|::||:|:|::|| ||:|:||:||||||||||||:| 
                                pND7(1150-1197) 10TTAAATGTAGTATGTTTAGAGAGTGT-ATGAGCAGATAACCTACAAATATA 5’ 
                            bsND7(1150-1195) 10TATATAAGTGTAATATGCCTGGAAGATGT-ATAGACAAATAACCTATAAA 5’ 
 
 

 

 


1200      1210      1220      1230      1240      1250      1260      1270      1280      1290       
GCGuGGuuuuuuAuuGCAuGAuuuAGuuGC***C*GuuuuAGGuAAuAuuGAuGuuGuuuuuGGAuCCGUAGAUCGuuA*GuuuuAuAuGuG**A***** 
|||||:|:||:||||||||                                                   ||:|||:|:| ||:|||||:::|  | 
CGCACTAGAAGATAACGTAAATATATA 5’(1181-1218)         pND7(1269-1320) 12TAATTTAGTAGT-CAGAATATGTGC--T----- 
TGTATCAGAAAATGACGTACTAAATATA 5’(1183-1224)       bsND7(1269-1320) 17TAATTTAGTAGT-CAGAATATGTGC--T----- 
           |||||:||::||:|||||:|   | |||:||:|||||||||||||||| 
        14TAATAATGTGTTAGATCAATG---G-CAAGATTCATTATAACTACAACATATA 5’pND7(1210-1257) 
           09TAATGTGTTAGATCAATA---G-CAAGATTCATTATAACTACAACATATA 5’bsND7(1233-1257) 
                                                    |:::|||:|||::|:||:|||||||||| ||| 
                              pND7(1251-1282) 12TAAGTGTGACAGAAATTTGGGTATCTAGCAAT-CAA-TATATATA 5’ 
                     bsND7(1240-1270) 16TTTTATTGTAGTTATAGTGAAGACTTAGGCATA--GCAAT-CAAATATA 5’ 
 
1300      1310      1320      1330 
*GGUUAUUGuAGGAUUGUUUAAAAUUGAAUAAAAA 
 |:|:|||||||||||||||| 
-CTAGTAACATCCTAACAAATATA 5’(1269-1320) 
-TTAGTAACATCCTAACAAATATA 5’(1269-1320) 
 

 

 


I) NADH Dehydrogenase subunit 8  
0         10        20        30        40        50        60        70        80        90         
CAAUUUAAUAAUUUUAAGUUUUGGUUGAUUAuuAuuuuuuuAuuuuuuuAuuuuuGuAuGuuuuuuuuuGAuuuuuuGuuuuuuuUUUUUGuuuGuuuuu 
                              |||:|:|||:||||:|||:|||||:|||||||||||||||                   |||::|::||:|| 
          pND8(29-68) 08TATTATATAGTGAAAGAATAGAAAGATAAAGACATACAAAAAAAAAAA 5’  (87-136) 14TAAATGAGTAAGAA 
        bsND8(28-56) 04TAGGGAGATAGTAAAAGAGTAGGAGGATGAGGATAAAA 5’       bsND8(86-139) 12TAAAATGAGTAAGAG 
                                                        :|||:|:|:|:||::||||:||||||:||:|||||||||||||| 
                                         pND8(55-98) 11TTATATAGAGAGAAGTTAAAGAACAAAGAAGAAAAACAAACAAAATATA5’ 
                                       bsND8(54-97) 16TATATATAGAAAGAGACTGAGAAACAAAAGAAAAAGACAAACAAA-TATA 5’ 
 
100       110       120       130       140       150       160       170       180       190 
AuAuGuGUuuuGuuuGuuGuGuuA****CuAUUU*GuuuA***CCCAuuGAGuuAACCAuuGuuAGuuuAuuGGuuCGuGGUAACCAuuuuuuGCGUUUU 
|||:||::|:||:|::||||||||    |||||| :|                         :|||:|:|||::||||:|:||||||||#|||||||||| 
TATGCATGAGACGAGTAACACAAT----GATAAA-TA 5’(87-136) (161-187) 11TTAATTAGATAGTCAAGTATCATTGGTATAAAACGCAAAT 
TATATGCAAGATAGGCAGTACAAT----GATAAA-CAAATATA 5’(86-139)    15TGTAATTAAGTAGTTAGGTATCATTGGTAAAAGACGCAAAA 
            :||::|::::|||    |||:|| :|::|   |||||||||||#||||||                         ||||:|::|::||| 
(111-153)13TTAAGTAGTGTAAT----GATGAA-TAGGT---GGGTAACTCAAATGGTAAATATATA5’   p(186-230) 13TTAAAGAGTGTGAAA 
      12TATTCAGATAGTATAGT----GATAGA-CAGAT---GGGTGACTTA-TTGGTAACATATA5’ pND8(187-228) 11TAAAAGAGCGTAAAA 
                   :|:|||    |||:|: :||:|   |#||:|:|:|:||||||||||||||||| 
 pND8(117-170) 11TTTATAAT----GATGAG-TAAGT---GAGTGATTTAGTTGGTAACAATCAAATAAACA 5’ 
 
200       210       220       230       240       250       260       270       280       290 
uAUU***GGuGuGGuuuAGAGCGuuGuAuuGCuuGuCGuuuAuGuGAuuuAAuuuGCCCuA****GuuuAGCAuuGGAuG***UUCGuGuuGGGuGGAGu 
AATATA 5’ pND8(161-187)                                                      :|::   |:|:|:||:|||::|:| 
TGATATA 5’ bsND8(160-199)                                 pND8(276-318) 13TAATTGT---AGGTATAATCCATTTTA 
|||:   |:|:||:||:||||||||||||||                           bsND8(289-318) 10TTTTGT---AAGTAGAGCTCATTTCA 
ATAG---CTATACTAAGTCTCGCAACATAACATA 5’(186-230)             
GTGA---CCATATTAAATCTCGCAACATATATTATA 5’(187-228)           
              :||:|||:|:|::||||:|:||||||||||||                                        
06TAATAGCTATAGTAAGTCTTGTAGTATAATGGACAGCAAATACAA 5’ (213-244) 
           12TTAGATGTTGCAGTATAATGAGTAGCAAATACATATAAA 5’ bsND8(219-245) 
                                      :||||::|:|:|:|||:|:|||||    :||  :||||||||| 
                     pND8(237-267) 15TTAAATGTATTGAGTTAGATGGGAT----TAATATGTAACCTAC 5’  
                      bsND8(246-271) 11TAATGTAGTGAGTTAAATGGGAT----TAGATTGATACCTAC---AAGCATA 5’ 
                                         |||:|:|:|:|||:|:|||||    |||:||#||:|||||   |||:|: 
                        pND8(240-288) 10TATATATTGAGTTAGATGGGAT----CAAGTCATAGCCTAC---AAGTAT 5’ 
              bsND8(259-285) 05TATAATGTTAGTATATTGAGTTAGATGGTAT----TAGATCGTAGCCTAC---AAGAATATA 5’ 
 
 

 

 


300       310       320       330       340       350       360       370       380       390    
uuuGGuGGuCAU**C*GuuuuGCGGAuuGAuuuACAuuGAGuuAU*C**GU**CGuuGuAuuuAuuGuGGuuuuuGuAuGCAuGuuuGCCCGACAGAU** 
:|::|||:||||  | |||                                                      |||:||::|||:|:|:|||||#|||| 
GAGTCACTAGTA--G-CAACATATATA 5’(276-318)                 pND8(372-417) 14TAAATATGTGTATAGATGGGCTTTCTA-- 
GAGTCATCAGTA--G-CAATATATA 5’(276-318)                         (390-414) 15TAT-TGTGTGTGAGCGAGTTGTTTG-- 
 |||::::||||  | :|:|::|::||:|||:|||||||||||||                                                 ||||| 
TAACTGTTAGTA--G-TAGAGTGTTTAGCTAGATGTAACTCAATATATA 5’(301-344) (391-431) 14TAAAATTTAATTACGTGTTAGTCTA-- 
  11TAATCGTTAGTA--G-TAAGATGCCTAGCTAGATGTAACTCAATATA 5’(301-344)                  bsND8(407-441)15TATAT-- 
       :|#||  | :||:|:|:|||::||:||||:|||||||| |  :|  | 
    13TATAATA--G-TAAGATGTCTAGTTAGATGTGACTCAATA-G--TA--GATTATA 5’ pND8(310-353) 
        14TTA--T-CAGAGTGTCTAATTAAGTGTAACTCAATA-A--CA--TATA 5’ bsND8(316-344) 
                                |:||||:||:|||| |  ::  |||||||||||| 
 pND8(331-364) 14TAATGTCTATAGT--AGTGTAGCTTAATA-G--TG--GCAACATAAATATATA 5’  
       17TATAGTAATAGAGTGTGTGATTGGATGTAGTTCGATA-G--CA--GCA 5’bsND8(325-355) 
                                       :|:|||| |  :|  |::||||:|||:|:||:||:|||||||||| 
                      pND8(338-382) 09TTTTAATA-G--TA--GTGACATGAATGATACTAAGAACATACGTAAATATA 5’  
                  bsND8(338-385) 04TATTTTTAATA-G--TA--GTGACATAGATAATATCAGAGACATACGTACAACATGTA 5’ 
 
400       410       420       430       440       450       460       470       480       490 
**GCCAuuACGCAUUCAuuGuuuGuuAuGuGuuuuuGuuGuuuAGCC**AU**GuAuuuAuuG*GCGC***C***CAAGuuuuuAuuGuuuGGuuGuuGu 
  :||||||||||||#||                 bsND8(465-503) TTTTATAAGTGTC-AGTG---G---GTTCGAAGATAGTAAACTAACAACA 
--TGGTAATGCGTAAATATATA 5’(372-417)                          |||| :|:|   #   # |:|||:||||:||::|||:|:|| 
--TGGTAATGCGTAA-TATATATGTTAAA 5’(390-414)   pND8(477-512) 14TAAC-TGTG---A---A#TTAAAGATAATAAGTCAATAGCA 
  :||||||#:||::|||||||||||||:|:|                               bsND8(482-510) 12TATAGTAATGAGCCAGTAACG 
--TGGTAATATGTGGGTAACAAACAATATATATA 5’ (391-431)                                           ||::|::|::: 
            ||::||:|:|:||:||||||||:|||||||||                              pND8(489-531) 14TAATTAGTAGTG 
  12TATATGCAGTGGGTGATAGACGATACACAAGAACAACAAAAAAAA 5’ (411-442)                bsND8(494-539)18TTATAGTG 
--GGAATATGTGTAGGTAGTAAATAATATACAAAAACAACAATAAA 5’(407-441)                    
                           ||:|:||:|::||:|:||:||  ||  :|||:||||: |||#   |   |||||||| 
          pND8(426-466) 13TTATATAAGAGTAATAGATTGG--TA--TATAGATAAT-CGCA---G---GTTCAAAATA 5’ 
         bsND8(426-480) 02TTATATAGAAGTAGTGAATTGG--TA--TATGAGTAAC-TGCG---G---GTTCAATATATA 5’ 
 

 

 


 
500       510       520       530       540       550       560       570       580       590 
uuuAuGuuAuuuGAuuuuuAuuuGuGuuuuGuGuAGuuAuuuAuuuuGGGuGAuuuAuuGUGuuuAuGAuuuAA***AGAA**AuuCACGGUGAAAUUAA 
AAATTATA 5’ (465-503)                                  ||||::|:|:||::||:|||   ||||  |||||||||||||||| 
:|||||||||||:                         pND8(554-598) 13TAATAGTATAGATGTTAGATT---TCTT--TAAGTGCCACTTTAAT 5’ 
GAATACAATAAATATATA 5’(477-512)       bsND8(554-598) 05TAATAGTATAGATGTTAGATT---TCTT--TAGGTGCCACTTTAATATATA 5’ 
AGATACAATAATTATATA 5’ (482-510)       
|:|||:|:|:||||::||||||||||||||:| 
AGATATAGTGAACTGGAAATAAACACAAAATAGATA 5’ (489-531) 
||||::|:|:||:||||:||||::|||:|:||||||||||| 
AAATGTAGTGAATTAAAGATAAGTACAGAGCACATCAATAATATATA 5’ pND8(500-540)  
AAATATGATAGACTGAGAATAGATACAAGACACATCAATA-TATA 5’(494-539) 
       bsND8(523-567) 12TATAGAGTATATTGATAGATAGAGCTCACTAAATGACACAAATATAAATA 5’ 
 
600       610 
AUUUUGACUAAAU poly[A] 

 

 


J) NADH Dehydrogenase subunit 9  
 
0         10        20        30        40        50        60        70        80        90         
UUAAUAUCAACUUAAUUUUUUUUAUAAACAuuAuAuuAUGuGuAuAuUUUUAuGuuuAuuuCGuuuAuGuuuuuGuuuAAuuUUAuuuuA**UUGuuuGu 
                                  :||:||:::||||:||:||||:|:||||||||||||||| 
             pND9(33-71) 14TATTATAGTAGTATGTATATGAAGATACGAGTAAAGCAAATACAAATATA 5’  
          bsND9(25-72) 11TTTTGTAATATAGTGTATATGTGAGAATATAAATAAGGCAAATACAAAATATA 5’ 
                                                             ||::|:||:||:|:::||||:||||||:||  |||||||| 
                                         pND9(60-105) 11TATTTAGTGAGTATAAGAGTGAATTGAAATAAGAT--AACAAACA 
                                      bsND9(60-101) 11TAAATGTAGTGAATATGGAGATAGATTAAGATAAAAT--AACAAACA 
                                                                                        |||  |:::||:| 
                                                                       pND9(87-124) 11TTAAT--AGTGAATA 
                                                                     bsND9(87-124) 10TATAAT--AGTGAATA 
 
100       110       120       130       140       150       160       170       180       190        
GuuGuAGAuGGuGuuUUGuuuGuuuuGuuGAuuGuAGuuuuuuGuuuuuuuAuuGuuuuGuuAGuuuuuuuuuGuuuuAUUGuAuGuuuuuAuuuuuuAA 
|||:||                                                                       ||||:::|::|:|:|||:::|||| 
CAATATA 5’(60-105)                                          pND9(176-216) 14TAATAGTGTGTAGAGATAGGGAATT 
TA-TATA 5’(60-101)                                         bsND9(176-216) 14TAATAGTATATAGAGATAGAAAATT 
:|::||||::|||:|:|:||||||| |||||| 
TAGTATCTGTCACGAGATAAACAAA-CAACTATA 5’ (87-124) 
TGACATCTACTACAGAGCAGACAAA-CAACTATAAA 5’ (87-124) 
                  :|||:|:|::||:|:|||||::||:|:|||:|||||||||||:| 
 pND9(117-160) 13TTAAATAGAGTAATTGACATCGGAAGATAAAGAAATAACAAAATA-TATA 5’ 
bsND9(117-162) 12TTAAATAGAATAGTTAGTGTCAAAGAGTAAAGAAATAACAAAACAATATA 5’ 
                                                  :|||::|:||:|||||||::||:|:|||||||||||||||||||| 
                                pND9(149-193) 15TTGATAGTAGAATAATCAAAGGAAGATAAAATAACATACAAAAATAATATA 5’ 
                              bsND9(147-187) 15TAGAATAGTGAGATAATCAAAGAGAAGCAGAATAACATACAATATATA 5’ 
 
200       210       220       230       240       250       260       270       280       290          
uuuGuGAuuuuuGuuuuuAuAuuGUUGuGAuUUGuuAuuGAuuGAuuuuuGuGGuuuuuGuuuuuGuCGuuuuAuGuuGuuGUAuAuuuuAuuuuGuuuG 
|:|||||:|||||||||                                                        |||||:::|:||:|:||||:|:||:||| 
AGACACTGAAAACAAAATATAACATA 5’(176-216)                  pND9(272-312) 06TATACAGTGATATGTGAAATGAGACGAAC 
AGATGCTAAAAACAAAATATAACATA 5’(176-216)                 bsND9(273-314) 12TTTATAGTAGTATATAAGATGGAACGAAT 
 ||:|:|:||:|:|:|:|||||::|::||||:|||||||||||                                                          
TAATATTGAAGATAGAGATATAGTAGTACTAGACAATAACTAAAATATA 5’ pND9(201-242)               pND9(303-339) 13TAA- 
 11TAATTAAGAGTAGGGATATGGTGACACTAGATAGTAACTAACTAAAAAAAA 5’ bsND9(204-249)         bsND9(305-339) 13TAAT 
                                        :|||:||:|:|:||:::|||:|:||:|||||:|||||||||    
                       pND9(239-279) 12TTTAATTAGAGATACTGGAAATAGAAGCAGCAGAATACAACATATTAAA 5’  
                      bsND9(239-286) 11TTTAATTGAAAGTACTAGAGATAAGAGTAGCAAGATACAACAATATATA 5’ 
 

 

 


300       310       320       330       340       350       360       370       380       390         
uuuuuGuGuGuuCGuuuGuGuuuuGuuuuGuGuuGUUUGuuuGUAuuuuuuGGAuuGuGuuuuA*GuuuuA**GuuGuuuuuGuUAuGCGuuuuuGuuGu 
|:|:|||||||||                                        :||::|:|:|:| |||:||  :|::|||:|||||||||||| 
AGAGACACACAAGAATA 5’(272-312)     pND9(352-392) 13TAATTAGTATAGAGT-CAAGAT--TAGTAAAGACAATACGCAAATTATA 5’             
AAAGACACACAAGCATATATA 5’ (273-314)    (352-392) 10TAATTAGTATAGAGT-CAAGAT--TAACAGAGACAATACGCAAATTATA 5’ 
   :||||::|||:||||||:|:|:|:|||||||||||||                                           :||||:|:|:|:|:|::|  
AGTGACACGTAAGTAAACACGAGATAGAACACAACAAACATATA 5’(303-339)          pND9(382-421) 10TTAATATGTAGAGATAGTA 
AGAGTCACGTAAGTAAACACGAGATAGAACACAACAAACATATA 5’(305-339)       bsND9(380-418) 12TAATAGTATGTAGAGATAGTA 
                    :||||:|:||:|::|||:|:||||||:|:||:||||||||||||| :| 
   pND9(319-366) 11TTAAAATAGAATATGACAGATAAACATGAGAAGCCTAACACAAAAT-TATA 5’  
 bsND9 (343-368) 03TTAAAATAGAATATGACAGATAGAGATAAAGAATCTAACACAAAAT-TAAATA 5’ 
 
400       410       420       430       440       450       460       470       480       490         
uGGAACGC*GAAuGuuuUGAUUUGuuuGGuuuuUAuuuuGuuGGuAAuGAuAuuuuACAUCGuuuAuuuGuuGAuuG****GuuuuuuGuuGGuuuuuuu 
::||||:| ||||:||||||||                                               |:||:|:||    :|:|:|::|||||:|:|:| 
GTCTTGTG-CTTATAAAACTAA 5’(382-421)                  pND9(468-514) 13TATAATTGAC----TAGAGAGTAACCAGAGAGA      
ATCTTGCG-CTTACAAAATATATA 5’(380-418)                bsND9(469-514) 11TTAATTGAC----TAAAGAGTGACCGAAGAAG 
          :|||:|:||:|:||||:|:|:||:||||:||||||||||                                    
       08TTTTATAGAATTGAACAGATCGAAGATAAGACAACCATTAAAATATA 5’ pND9(409-447)  
     05TATATTGTAGAATTAAGTGAATCAGAAATAAAGCAACCATTAAAATATA 5’ bs(410-447) 
                                        :|||::|||:|||:|:||||||:||||:|:||||||||    |||| 
                       pND9(439-484) 13TTAACTGTTATTATGAGATGTAGTAAATGAGCAACTAAC----CAAATATA 5’   
                     bsND9(438-483) 10TATAATTGTTATTGTAGAATGTAGCGAGTGAATAACTAAC----CAA-TATATA 5’ 
 
500       510       520       530       540       550       560       570       580       590          
uuGuuGAAGuGuuAUCCAuuAuuuGGuuuGuuuGuAuuGuuAuuuuGuGuGuuG**GuGGAGGAGAUAGuAuGuACGuuuACAAuGuuAuuuuuGuuGuu 
::|||||||||||||                                                      ::||:|||:|:||||||:||||:|:||||||| 
GGCAACTTCACAATATATA 5’(468-514)              pND9(568-604) 11TGCTATAGTGTATATGTAGATGTTATAATAGAGACAACAA 
AGCAACTTCACAATATATA 5’(469-514)                bsND9(569-600) 04TGAGTAGTATATGCAAATGTTACAGTGAAAACAACAA 
   :|||||:::|:|||||||||:||:||:::|||||||||||||||:|:|                                 |||:|:|:||:|:||||: 
12TTAACTTTGTAGTAGGTAATAGACTAAGTGAACATAACAATAAAATATA 5’(502-549)    pND9(582-612) 09TTTATAGTGAAGATAACAG 
14TAACAGTTTGATGATAGGTAATGAGTCAAGTAAACATAACAATATATA 5’(509-542)      bsND9(582-611) 12TTTATAATGGAGATAGTAA 
                                                                 bsND9(597-625) 12TTTATAGTAGAGATATTAA 
                                   ||||:|:||:||:|:|::|:  |::||:|||||#:||||||||||| 
                 pND9(534-566) 14TAATAATAGTAGAATATATGAT--CGTCTTCTCTAGTATACATGCAAA 5’ 
                bsND9(533-566) 09TTATAATAGTAGAATATATGAT--CGTCTTCTCTAGTATACATGCAAATAAA 5’ 
                                    |||:|:||:||:|:|||::  :|:|||:|||:|||||||||||| 
                   pND9(535-578) 13TTAATAGTAGAATATACAGT--TATCTCTTCTGTCATACATGCAATTATA 5’  
                    bsND9(543-580) 09TAATAGTGAACATATAGT--CATCTCTTTTATTATACATGCAAATTATA 5’ 

 


600       610       620       630       640       650       660       670 
GCAuACC**AAuuUUUAuuuG*CAuuAuuuuAuuuA***AuA**UCACCGuUGUAAUUCUAAAUUUCUCACUUCC 
|||||                                 
CGTATATATA 5’(568-604) 
TATATA 5’(569-600) 
:||||||  ||||##|||:|| |||||| 
TGTATGG--TTAATTATAGAC-GTAATATA 5’(582-612) 
CGTGTGG--TTATTAATAGAC-GTAA 5’(582-611) 
CGTGTGG--TTGAAGATAAAC-GTAA 5’(597-625) 
          ||||:|:||||: |||||:||:|:|||   |||  #||||:|||||||||| 
       12TTTAAGAGTAAAT-GTAATGAAGTGAAT---TAT---GTGGTAACATTAAGAATATATA 5’ pND9(609-644)  
       12TTTAAAAGTGAAT-GTAATAAGATAGAT---TG----ATAG 5’ bsND9(609-640) 
                :||||: |||:|:|:||||:|   |||  |||||::||||||:|| 
             12TGTAAAT-GTAGTGAGATAAGT---TAT--AGTGGTGACATTAGGAAATATATA 5’ pND9(615-659) 
         12TTAAAGTTAGT-GTAATAAGATAAGT---TAT--AGTGGCAACATTAAGTATATA 5’ bsND9(618-658) 
 
 

 

 


K) Ribosomal Protein S12  
 
0         10        20        30        40        50        60        70        80        90         
CUAAUACACUUUUGAUAACAAACUAAAGUAAAuAuAuuuuGuuuuuuuuGCGuAuGuGA*UUUUUGUAUG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGu 
                                    ||:|::::||:|:|||:|||||:| :||:||||:| ||||||                    |::| 
                pRPS12(35-76) 12TATTTAGAGTGGAAGAGACGTATACATT-GAAGACATGC-CAACAAATA (96-121)12TATTATAGTA 
                     bsRPS12(38-78) 18TAGTGAAGAGAGTGTATATGCT-AAAGACATAC-CAACAATATATA(96-121)11TAATAGTA 
                                            |:||:|:||||::||| |:||::|||| ||||||||     (96-131)10TATAGTA 
                   pRPS12(43-78) 14TATATAGTTAGAAGATGCATGTACT-AGAAGTATAC-CAACAACATATA 5’  
                  bsRPS12(43-78) 12TATATAGTTAGAAGATGCATGTACT-AGAAGTATAC-CAACAACATATA 5’ 
                                                                |::||:: :||::|:|:||| ::|:|||:|||||||| 
                                            pRPS12(63-109) 12TATAGTATGT-TAATGATAGATG-TGAGACAGAATAAACA 
                                           bsRPS12(66-99) 07TAGTAGAGTGT-CAATAGTAAATG-CAAAACAAAATAAATATA5’ 
                                                                           :|::|:||| |||:|:||:||||||| 
                                               pRPS12(74-106) 12TAATATGTCATTAGTAGATG-CAAGATAAGATAAACA 
                                                  bsRPS12(73-115) 16TCTTTTATAGTAAATG-TAGAGCAAGATAGACA 
 
100       110       120       130       140       150       160       170       180       190        
uuuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGuuCCGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGu 
|||||:||||                                                            :|||:|:|:|||||:|||:||::|:||||:| 
AAATATAATATA 5’(63-109)                           pRPS12(169-208) 12TATATAGAGTGAATATGTTAGAATGAGCCTATA 
|||||:|                              bsRPS12(156-207) 12TTATATG--G----TATAAGATAGATGTGTTAGAATAGACTTACA 
AAATATATATA 5’ (74-106)                            
GAATACAATAATATATATA 5’(73-115)  
||||:||||:|||||:|:||||                                                                         ::||:| 
AAATGCAATGATATATTTAGGC--ACTAA 5’ (96-121)                                   pRPS12(194-235) 11TTTTATA 
GAATGTAGTGATATATTCAGGT--AGCTAACGTGTCAAATATA 5’(96-121)                     bsRPS12(194-235) 09TTTTATA 
||||:||||:|||||:|:||||  ||||||||#|:|||                                                       
AAATGCAATGATATATTTAGGC--GCTAACGGATTAAGATATA 5’(96-131)                          
                     |:  |:||::||#|::||||:|||||:|||||||||||| 
             10TATTCAGT--GTTAGTGGATTGAGGCTATTGGTTGCACATAACATTCA 5’ (119-158) 
                          :|||:|#||:|:|#:|||||||||||:||:|:|||||  |    || 
                        TTTTAATGTGTTAGGATCATTGGCTGCATATGATATACG--G----CAATATA 5’ pRPS12(139-170) 
                       11TTTAGTAGATTGAGGCTATTGGTTGCACATAACATTCATA 5’ bsRPS12(133-158) 
                                               
 

 

 


200       210       220       230       240       250       260       270       280       290        
uGCGuuGuuuuuuuuGuuGuuuuAuuGGuuuAGuuAuG**UCAuuAuuuAuuAuAGA***GGGUGGuGGuuuuGuuGAuuuACCC***G****GuG*UAA 
|||||||||                                                           ::|||:::||||:||#||   :    ||: ||| 
ACGCAACAATAAATA 5’ (169-208)                     pRPS12(267-322) 07TTTAAAGTGACTAGATAGG---T----CAT-ATT 
ATGCAACATATA 5’(156-207)                        bsRPS12(288-322) 12TTTAAAGTAACTAGATGGA---T----CAT-ATT 
                                           bsRPS12(269-308) 10TTAGTACAAGAGCAGTTAAATGGG---C----TAC-ATT 
                                   |||:  ||||:|:|||:||:|||   :::|||::|||||||||||:| 
                                14TATAT--AGTAGTGAATGATGTCT---TTTACCGTCAAAACAACTAGAATATA 5’pRPS12(234-280) 
                                15TATAT--AGTAGTGAGTGATATCT---TTTACCGTCAAAACAACTAGAATATA 5’bsRPS12(234-280) 
|:|:|::|:|:||:|::||:|||||::|||||||||#|  ||||||| 
ATGTAGTAGAGAAGATGACGAAATAGTCAAATCAAT-C--AGTAATATATA 5’(194-235) 
ATGTAGCAGAGAAGATGACGAAATAGTCAAATCAAT-C--AGTAATATATA 5’(198-235) 
    :||:|:|:|:|::|||::|:|||:|:|:|||||:|  ||||||| 
14TATAATAGAGAGAGTAACGGAGTAATCGAGTCAATGC--AGTAATATATATA 5’ pRPS12(203-246) 
07TATAATAGAGAGAGTAATGGGATAGTTAAATCAATAC--AGTAATTAAAATA 5’ bsRPS12(203-245)                                   
 
300       310       320       330       340       350                                           
AGuAuuAuACA*CG**UAuuGuAAGuuAGA*UUUAGAuAUAAGAUAUGUUUUU[AAUA]POLYA 
|:|||:||||| ||  ||||||| 
TTATAGTATGT-GC--ATAACATATA 5’ (267-322) 
TCATAATATATA 5’(269-308) 
TTATAATGTGT-GC--ATAACATATA 5’bsRPS12(288-322) 
          || |:  ||||::|||||||| ||:|||||||||||||||| 
     16TAAGT-GT--ATAATGTTCAATCT-AAGTCTATATTCTATACAATATAAA 5’ pRPS12(309-349) 
     24TAAGT-GT--GTAATGTTCAATCT-AAATCT-TAGTCTATACAAAATAAA 5’bsRPS12(309-336) 
 
 

 

 


APPENDIX C.  All gRNA major classes pulled for ATPase 6 in the EATRO 164 procyclic (shaded gray) and bloodstream (white) transcriptomes.

Populations of gRNAs are enclosed in bordered boxes.  A) ATPase 6; B) Cytochrome Oxidase III; C) C-Rich Region 3; D) C-Rich Region 4; E) Cytochrome b; F) Maxicircle Unidentified Reading Frame II (Murf II); G) NADH Dehydrogenase Subunit 3; H) NADH Dehydrogenase Subunit 7; I) NADH Dehydrogenase Subunit 8; J) NADH Dehydrogenase Subunit 9; K) Ribosomal Protein S12.
 
A) ATPase 6 

3'   5'   Reads    ATPase 6 gRNA Sequences

75   31   2,044     AT ATAAACGTAACTGAAATGAATCACGAGAGAAAGATAAAGATATAT AT12
72   29   1,630    ATATAC AACGCAACCAGAGTAAATCATGAAGGGAAAGTGAAGGCATATTT T11
75   31   143       AT ATAAACGTAACTGAAATGAATCGCGAGAGAAAGATAAAGATATAT AT21
75   31   489       AT ATAAACGTAACCAAAATGGATCATGGAAGAGAAGTAAAGATATGT AT09
75   29   395       AT ATAAACGTAACCAAAATGGATCATGGAAGAGAAGTAAAGATATGTTT T11
75   35   224       AT ATAAACGTAACCAAAATGGATCATGGAAGAGAAGTAAAGAT T13
62   29   203       ATATACAACGCAACC AGATAAATCATGAAGAGAAAGTGAAGGTATATTT T09
75   33   144       AT ATAAACGTAACCAAAATGGATCATGGAAGAGAAGTAAAGATAT T12*
62   31   95              AACGCAACC AGATAAATCATGAAGAGAAAGTGAAGGTATAT AT14
75   39   82        AT ATAAACGTAACCAAAATGGATCATGGAAGAGAAGTAA T16
75   37   40        AT ATAAACGTAACCAAAATGGATCATGGAAGAGAAGTAAAG TCAACTTAAT08*

102  62   1,435     ATACA ATCATACACAGTAGTACATATATAGTGATAGACGTGATTAA T11
100  62   154        ATAAAA CATACACAATGATATATACATAGTAATAGATGTGATTAA T23
102  62   92       ATATAA ATCATACACGATAATATATGCGTAGTAACAGATGTGATTAA T18
102  74   10       ATATAT ATCATACACGATAATGCATATGTAGTAAC T13
100  64   9          ATAAAA CATACACAATGATATATACATAGTAATAGATGTGATT T14*

127  86   2,158       ATAT AAATACACAGTAGAATATGATCTAGGTTATGTATGATGATAT T11
129  90   177      ATATA TAAAATACACGATAGAGCATAACTTAGATTGTATATGATA T20
118  84   85           ATAATAATACAC ATAGAACATGACCTAGATTGTACATAGTGATATAT T12
118  82   38         ATATAATAATACAC ATAGAACATGACCTAGATTGTACATAGTGATATATAT T13
118  86   28         ATATAATAATACAC ATAGAACATGACCTAGATTGTACATAGTGATAT T15
118  83   22           ATAATAATACAC ATAGAACATGACCTAGATTGTACATAGTGATATATA AT18
129  91   21       ATATA TAAAATACACGATAGAGCATAACTTAGATTGTATATGAT T18

152  113  743      ATAC ATCAAAAATCAACGTTAGACAGTTAAGATATGTGATAGAA GATAAT12
152  105  54         AC ATCAAAAATCGACATTAGATAATTGAGGTATGTGATAGAGTATAATTT T11
148  105  128         ATATC AAAATCAACATTGAGCAGTTAAAGTACGTGGTAAGATATAATTT T12
152  116  84         AC ATCAAAAATCAACATTGAGCAATTGAGGTACATGATA TGATATAAT09
148  104  16          ATATC AAAATCAACATTGAGCAGTTAAAGTACGTGGTAAGATATAATTTA T07*

183  138  430      AT ATACAAATCAAACAGACAGAGTAATAGAAGGTTGAAGATTGATAT AGT11
177  144  210         ATATC ATCAAACAAACAGAATAATAGAGAATCAGAGGT GAATGTTAAGT15
175  139  3,468         ATAAA TAAACAAACAAAATGATAAAAGGTCAGAGATTGATG GTGAATAAT08*
177  132  205          ATAT ATCAAACAAACAAAGTAATAGAAAGTCAGAGATTGATGTTAAATA T10

208  164  54        ATAT ATAAACACAAATCAACGAATAGATATAAGTCAGATAGATGG TGTATTAT12*
210  176  36        AT AAACAAACACAAATCAGTAGACGAGTACAAGT GAGATGGACGTATAGAT07
208  165  24       ATAAT ACAAACACAAACTGATAGACGAATACGAGTTAGATGGACG TAT06
208  158  14        ATAT ACAAACACAAACTGACGAATAGATACAGATTAAGTGAATGAAATAAT T11
208  164  218         AT ATAAGCACAAACCAATAGACAGATATAAGTCAGATAGATGA TTAT14

243  192  172               ATATAAATTAAACAACATAGATTACAGTGATAGAAGTAAATGTGAATTA T04
248  218  147      ATC AGACTATGTGAGTTAGATGACGTGAATTATA CTGTATAT12
243  190  52                ATATAAATTAAACAACATGAACTATGATGATAAAGGTAAATGTGAATTAAT T19
243  189  10                ATATAAATTAAACAACATGAACTATGATGATAAAGGTAAATGTGAATTAATG TACT*
243  207  7                 ATATAAATTAAACAACATGAACTATGATGATAAAGGT T10
246  193  5             *ATTGTATAAATTAAACAACATGAACTATGATGATAAAGGTAAATGTGAATT T03

269  224  864      ACATAA TAATACAATAATACGAGATTAGACTATGTGAATTAAATGATATGA T11
269  226  808      ACATAA TAATACAATAATACGAGATTAGACTATGTGAATTAAATGATAT T13
262  221  149             ATATAT ATAATACAAAATTGAACTGTATAAGTTAGACAATGTGAATT TT
262  219  105             ATATAT ATAATACAAAATTGAACTGTATAAGTTAGACAATGTGAATTAT T14
262  218  57              ATATAT ATAATACAAAATTGAACTGTATAAGTTAGACAATGTGAATTATA T21
267  221  32             AC ATACAATAATACAGAATTAAACTGTGTAAGTTAGATAGTGTAAATT T10*


5' 
248 
253 
255 
252 
262 
259 
254 

 

266 
281 
284 
283 
281 

3' 
292 
298 
292 
292 
304 
304 
298 

 

313 
313 
310 
310 
312 

 

 

291 
293 
301 
300 
301 
301 

 

331 
332 
331 
331 
332 
331 
332 
331 

329 
329 
345 
346 
335 
345 

 

375 
378 
371 
375 
378 
374 
378 
371 

 

 

360 
349 
352 
360 
354 
362 
349 
361 

407 
389 
401 
407 
401 
407 
389 
407 

 

 

387 
387 
387 
387 
390 
398 
397 

424 
427 
421 
424 
424 
427 
424 

5' 
455 
455 
452 
476 
458 

 

 

435 
435 
437 
435 
435 
435 
435 

 

464 
467 
460 
457 
464 
468 
457 

 

3' 
491 
497 
477 
500 
500 

Reads 
25,157 
5,405 
244 
134 
4,881 
395 
290 

 

586 
14 
13 
2 
2 

 

133 
12 
647 
263 
125 
33 

 

24,736 
8,776 
3,561 
712 
387 
302 
144 
1,547 

 

181 
41 
705 
429 
160 
143 
104 
74 

 

1,428 
1,049 
934 
664 
635 
28 
23 

ATPase 6 gRNA Sequences cont. 

           ATATA AAATACAAATTCGAGTAGGTAGTACAATGATATGAGATTA T13 

    ATATAT AACAACAAATATAGATTCAAGTAAGTGATGTAGTAATATGA T11 

           ATATA AAATACAAATTCGAGTAGGTAGTACAATGATAT TATTATTAAT15 

           ATATA AAATACAAATTCGAGTAGGTAGTACAATGATATAGA TTATTAAT07 

ATAT ATACAAAACAACAGATATAGATTCGGATAGGTAATATGA GATCT13 

  AT ATATAAAACAACAGATATGAATTCAAGTGAGTGATACAGTA TAT15* 

    ATATAT AACAACAAATACGAATTCAGATAGGTAGTATGATGATATA T15 

   

AAAAAA AAAAAAACAATACAAGATGACAGGTATAAGTTTGGATGAGTAAT T12 

   AAA AAAAAAACAATACAGAATAGTAGGTATAGATT AGATATGTGAT09* 

     ATAT AGAACAATACAAAATAACGAGTACAG T08 

      TAT AGAACAATACAAAATAACGAGTACAGG ATAAGTGATAT08 

     AT AGAAAACAATACAGAATAGTAGGTATAGATT AGATATGTGAT19 

   

AAATATAT AAATGCAATATACGATAGAGAAATGATATAAGATGATAA T16* 

AAATATAT AAATGCAATATACGATAGAGAAATGATATAAGATGAT T14 

 ATAT AACAAAACAAAAGTAGAAGTGCAGTATATGATAGAAAAATGATGT CAAAT11 

ATAT AAACAAAACAGAAATAGAAATGCAATATACGATAAGAAAATGGTATA T12 

 ATAT ACAAAACAT AAATAAAAGTGCAGTATATGATAAAGAGATAATAT T11 

 ATAT AACAAAACAAAAGTAAAAGTGCAGTGTATGATAGAAAAATGATGT CAAAT15 

   

 ATAT AATTATTAAACAAGAGAAAGTCACGTAAAAGGTAGAATGAAGATA T12 

AT ATAAATTATTAAACAGAAAGAGATCATGTAGAAAGTGAGATAGAAAT T14 
   ATATAA ATTAAACAAAAAGAAATCACGTAGAAGACAGAATAGAGATA T13 
 ATAT AATTATTAAACAAGAGAAAGTCACGTAAAAAGTAGAATGAAGATA TTAT05 

AT ATAAATTATTAAACAGAAAAGAGTCATATAGAAAATAAGATAGAAAT T12 

  ATAT ATTATTAAACAAAGAGAAATCATATAAGAGACAGAATGAGAATA T15 

AT ATAAATTATTAAACAGAAAGAGATCATGTAGAAAGTGAGATAAAAAT TTT 

   ATATAA ATTAAACAAAAAGAAATCACGTAGAAGACAGAATAGAGATA T15* 

   

ATATAT ACATCCATAAAATTATCATCAGTTAATAGATTGTTAAATGAAAA TTTT 

              ATATAA ATCACCAACTAATAAGTTATTGAATGAGAGAAAGTTATATA T12 

       ATATA ATAAAACTATCACTAACTAATGGATTGTTAAGTAGAAGAGAATCAT T11 

ATATAT ACATCCATAAAATTATCATCGGTTAATAGATTGTTAAATGAAAA T11 

       ATATA ATAAAACTATCACTAACTAATGGATTGTTAAGTAGAAGAGAATC CT07 

ATATAT ACATCCATAAAATTATCATCGGTTAATAGATTGTTAAATGAA T12 

              ATATAA ATCACCAACTAATAAGTTATTGAATGAGAGAAAGTTATATA T09 

ATATAT ACATCCATAAAATTATCATCGGTTAATAGATTGTTAAATGAAA TTTTTTTTTTTTTTT 

   

  ATATAT AACACAACAAGAAACGAATGAGAGAAGTATCTATGAGATTATT T14* 

  ATATAT AACACAACAAGAGACGAATAGAAAAGATATCTGTGAAATTATT T13 

ATATAT AAAACACAATAGAAAACGGATAAGAGAGATATTCATAGAGTTATT T13 

  ATATAT AACACAACAAGAGACGAATAGAAAAGATATCTGTGAAATTATT T12 

  ATATAT AACACAACAAGAGACGAATAGAAAAGATATCTGTGAAATT T11 

  ATATAT AACACAACAAGAGACGAATAGAAAAGATATCTGTGA T05 

  ATATAT AACACAACAAGAGACGAATAGAAAAGATATCTGTGAA T13 

 

   

25,624 
2,307 
1,864 
368 
6,879 
1,781 
232 

    ATAT ATGACACAACGAGGGAAGATACTCTAAAGGACACAGTGAAA T12 

 ATAT ATAACGACACAATAGAGAAAGATGCTCTGAGAGATGTAATA T13 

      ATAAAT TACAACAAAGAAAGATACTCTAGAAAGCACAGTGAGAAAT T16 

   AAATTAACGACA AACAAAGAGAAATACTCTGAGAAATATGATGAAA T12 

    ATAT ATGACACAACGAGGGAAGATACTCTAAAAGGTACAGCGAAA T13* 

ATAT AACAACGATACGACAGAGAAAGATATTCTAAGAGATATGACA T13* 

   AAATTAACGACA AACAAAGAGAAATACTCTGAGAAATATGATGAAA T12 

 

   

Reads 
368 
22 
54 
14 
1 

ATPase 6 gRNA Sequences cont. 

  ATATATAATTAC AAACAAACGCAGAGATGTCGGTAAATAATGATATAAT T11 

    ATAT ATTACAAAACAGACGTAAAGATGTCGATGAATGGTGGTATAAT T14 

                        AATT ACGTCGATAGATAACGATACAATGAG ATTAATTTT 

AAATT TAAATTACAAGACAAACGTAGAAGC T24 

AAATT TAAATTACAAGACAAACGTAGAAGCGTCGATAGATAATGATAT15 

 

 

 

   

487 
487 
487 

 

521 
520 
521 

526 
528 
526 

 

567 
553 
553 

 

 

8,723 

1 
635 

 

232 
85 
11 

 

ATACAA ATCAACAATAGAAGATGGGATGATAATAGATTGTGAGATA T17 

ATAC ACATCAACAATAGAAGATGGGATGATAATAGATTGTGAGATA T27 

ATACAA ATCAACAATAGAAGATGGGATGATAATAGATTGTGAGATA T16 

   

AA AAAAAAAAAAAACAAAAATAGAATAAAGAAAGTCAGAGAATGTTAAT T05 

    AAAAAAAAAAAC AAAATAAAGTAAGAAGAATCAGAGAGTGTCAATA TATTTT 

     AAAAAAAAAAC AAAATAAAGTAAGAAGAATCAGAGAGTGTCAAT T12 

   

 


5' 
557 
549 
549 
546 
546 
549 

3' 
593 
593 
592 
592 
593 
592 

Reads 
69,619 
2,587 
181 
313 
76 
30 

ATPase 6 gRNA Sequences cont. 

     AATAAATCGATAACAAAGAACACTGTAAAAGAGAGAA TGAGAGTAAATAT09 

  AC AATAAATCAATAACAGAGAATATCATAGAGAGGAAAGATAGAAAT T12* 

 ATAT ATAAATCAATGACAAGAAGCACTGTAGAAAAAGAGAGTGAAAAT T13 

   AT ATAAATCAATAACAGAAGATGCCATAGAGAGAGAAAGTGAGAGTAAA T11 

  AT AATAAATCAATAACAAAGAACATTGTAAAAGAGAAAAGTGAGAATAAA T13 

 ATAT ATAAATCAATGACAAGAAGCACTGTAGAAAAAGAGAGTGAAAAT T13 

 

 

 

   

568 
576 
589 
589 

611 
616 
629 
629 

 

 

613 
613 
613 
613 

654 
657 
654 
657 

 

 

640 
647 
640 
654 
643 
640 
640 
680 
680 
672 
680 
671 
685 

689 
689 
689 
689 
667 
668 
662 
714 
719 
716 
714 
714 
714 

 

 

698 
686 
699 
699 

728 
728 
727 
727 

 

 

720 
715 
728 
720 
717 
718 
720 
720 
720 

 

747 
747 
745 

767 
755 
765 
767 
763 
763 
763 
767 
767 

 

789 
789 
789 

 

 

790 
770 
773 
774 
777 

822 
822 
822 
822 
822 

670 
115 
854 
3,292 

 

618 
183 
459 
399 

 

39,063 

678 
131 
234 
4,401 
250 
165 
7,581 
1,291 
119 
105 
319 
309 

 

740 
12 
88 
24 

 

4,588 
2,272 
920 
165 
1,613 
488 
452 
269 
177 

 

13 
587 
91 

 

8,663 

1 
428 
223 
195 

  ATACT AAACACAAAAATGAATAAAATAAGTCAGTGATAGAAGATATTAT T12 

AT AAACAAAACACAAAAATAAGTAAAGTAGGTCAGTGATAAGA TATACAT07* 

AT AAATAATAAACAGAAACGGAATACGAGAATAAGTAAAGTGA TTTAAT13 

AT AAATAATAAACAGAAACGGAATACGAGAATAAGTAAAGTGA TTTAAT10 

   

 ATATAT AATCCAACAGATATAAGAGCATGTAAAATAGTAAGTGAAAAT T12 

  AT ATAAATCCAACAAGTATAAGAACATATAGAATAGTAGGTGAAAAT T12 

   ATAT AATCCAACAGATATAAGAGCATGTAAAATAGTAAGTGAAAAT T11 

ATAT ATAAATCCAACAAGTATGAAGACACGTAAAATAGTAAATGAAAAT T14* 

   

 ATAT ATAAATAACTGTAGTATGGTGGTAGATGAGTTTGATAGATATA T12 
 ATAT ATAAATAACTGTAGTATGGTGGTAGATGAGTTTGAT T11 
 ATAT ATAAATAACTGTAGTATGGCGGTAGATGAGTTTGATAGATATA T11 

 ATAT ATAAATAACTGTAGTATGGTGGTAGATGA TTTTGATAGATATAT12 
ACATATATAAATAACTGTGATATTGCGGTAGATGGATCTGATGAAT T14† 
 ATATAATAAATAACTATAATAAGGTGGTAAGTGAGTTCAGTGAATATA T14† 
          AAAACTGTAATATGGAGGTAAGTGAATTTGATAGATGTA TTAATTTT† 

     AAAATA CAACTGCAAGATCGTGTTATAGAGGATAAGTGATT TAAT13 

ATATAA ATTATCAACTGTGAGATTATATTACAAGGAATAAGTGATT T13 

    ACACA ATCAACTGCAGAATTATATTACAGAGAGTGAGTAATTGTAA AAT12 

      AAATA CAACTGCAAGATCGTGTTGTAGAGGATAAGTGATT TAAT11 

      AGATA CAACTGCAAGATCATATTATAAGAGGTGAATGATTGTAAT 14T* 

      ATATA TAACTGCAGAATCATATTATAAGGGATGAA CGATTGT13 

   

    ATT AAAATCCATTATCGATTGTAGAGTTATGT GATAGAGAATAAT11 

ATATATT AAAATCCATTATCGATTGTAGAGTTATGTTATAGAGAATAA TAT21 
         AAATCCATTATCAGTTGCGAGATTGTA GTATAAAGAATAAT14 
  ATATAT AAATCCATTATTAACTGTAAGATTGTA GTATAGT17 

   

 AT AAATCAAATACAGAACTGAATAGACGATAAAGATAGTGAGAAATTT T11 

       ATATATAT AAACTAAACAAATAGCAGAGACAGTGAGAGATTCGTTAT AAT13 

   AT ATCAAATACAAAACTGAGCAGATGACAGAGATAGTAAA TGATTTAT12 

 AT AAATCAAATACAGAACTAGATGAACAATAGAGATAGTGAGAAATTT T12 

  ATATA TAAATACAAAACTAGATGAATGACAGAAACGATGAGAGATTTATT T14* 

    ATA TAAATACAAAACTAGATGAATGACAGAAACGATGAGAGATTTAT AAT17 

  ATATA TAAATACAAAACTAGATGAATGACAGAAACGATGAGAGATTT T12 

 AT AAATCAAATACAGAACTAGATAGACGATAGAGATAGTGAGAAATTC TTTT* 

 AT AAATCAAATACAGAACTAGATAGACGATAGAGATAGTGAGAAATTT T17 

   

ATAAAT ACAACAATATAATAACTGTCGAAGGTTGAATATGAGATTAAAT T11 

ATATAT ACAACAATATAATAGCTATCAGAGGTTGAATGTGAGATTAAAT T11 

ATATAT ACAACAATATAATAGCTATCAGAGGTTGAATGTGAGATTAAATGA T11 

   

  ATA CTATAACTCCAATGACGAAATCAGTTTTA CAGTGATATGATAATT T12 

  GGA CTATAACTCCGATAACGAATCAGATTTTGACAGTGATATGATAATTATT* 

ATATA CTATAACTCCGATAACGAATCAGATTTTGACAGTGATATGATAATT T09 

ATATA CTATAACTCCGATAACGAATCAGATTTTGACAGTGATATGATAAT 

ATATA CTATAACTCCGATAACGAATCAGATTTTGACAGTGATATGAT T14 

 


 
 

 

B) Cytochrome Oxidase III 

3' 
70 
73 
70 
70 

 

101 
99 
99 
101 
92 
95 

 

112 
116 
131 
115 
115 
115 
112 

 

132 

 

156 
155 
150 

 

185 
185 
188 
185 
185 
185 
185 
185 
188 
185 
185 
185 
187 
185 
185 
188 
188 
185 

 

203 
211 
195 
216 
247 
247 
243 
247 
244 
243 
247 
247 
247 
244 
243 
244 
243 
243 
244 

 
 

Reads 
1,179 
826 
112 

14,200 

COIII gRNA Sequences 

ATAATT AATATACAACGAGATAGAGACGTAAAAGAAT TGATGTAT12 

 AT ATAAATATACAACGAGATGAAGGCATAGAGAAA AGATGGTATATAAT14 

ATATAC AATATACAACGGAATGAGAATATAAGAAAGTGATGATA TTAT11 

ATATAT AATATACAACGAGATAAGAACATAGAGAAA AGATGGTATATAT13 

 

   

1,386 
721 
364 
229 
122 
78 

 

834 
550 
1 
443 
117 
96 
64 

 

6 

 

104 
465 
36 

 

ATAT AAAACAAAAACATCACTGATATTGACGGATATATGATGA TAAAT12 

ATATAT AACAAAAACACTACTAGCGTTGACAGATATATGATGAAAT T12 

  ATAT AACAAAAACACTACTAGCATTGACAAATATATGATGAAAT T13* 

  AT AAAACAAAAACACTGCTAATATCGACGAATATATGATGGAA AAT14* 

    ATATAAAAT ACACCACTGATATCAACGAGTATATGATGAGATA T14 

    ATATAT AAAACACCACTGACATCGATAAGTATATAGTGAAGTGA TTAAT15 

   

               GTA GAGTGAAGATAGAGAAATAAAGATATCGTT T13 

ATATATAATAACAATA GCAGGTAAAGGTGAGAAAGTGAAGATATCATT T10 

  TACATAATAACAGTGGCGGGTAGAGATAGAAGAATAAAGATACTATT T08 

        AACGATGGA TAGGTAGAGATAGAGAAATGAAGATATT TTAT05* 

     AATAACAATGGA TAGGTAGAGATAAAGAAATGAAGATATC T06 

     AATAACGATGGA TAGGTAGAGATAGAGAAATGAAGATATC T07 

  ATACATAACAGTGGCAGA GTGAAGATAGAGAAATAAAGATATCATT T11* 

   

AT ATATACAATAACAGTGGTAGGTAGA T09 

   

ATATATA TCCAACAAACAGAGTAACCGATACATAGTGATAGTG ATAT13 

 ACATATA CCAACAAACAGAATAACTAGTGCACAGTGATGATG ATAGT16 

    ATATA CAACAAACAAAATAATCGATGCACAGTGATAGT AGTAGT13 

   

126,513 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAGATGA TTTAAT13 

761 
542 
350 
199 
183 
172 
116 
109 
106 
99 

   ATAT ATTACCAAACAATAGATGAGTAGATTCTAATAGATGA TTTAATTAAGTTTT* 

ATAT AAAACTACCAAACAGTAAATAGATAAGTTCTAATAAGTGAGATAATT T11 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATA TATGATTTAAT12 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAGATA TTTAATTAT05 

   ATAT ATTACCAAACAATAGACAAGTAGATTCTAATAGATGA TTTAAT13 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAG CTGATTTAAT13 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAGAT CATTTAATTAT05 

ATAT AAAACTACCAAACAGTAAATAGATAAGTTCTAATAAGTGAGATAATTAAT TGTTAT16 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAAT CGATGATTTAATTAAT17 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAAATGA TTTAATTAAT08 

62,901 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAGATGA TTTAAT14 

810 
685 
497 
332 
308 
140 

 

2,487 
861 
232 
138 
5,008 
688 
462 
457 

83,016 
9,810 
2,126 
1,676 
1,379 
363 
269 
149 
130 
126 
102 

 ATAT AAACTACCAAATAATGAACAGATAAATTTCAGTGAGTGA TTTAAT13* 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAGAT T13 

     AT ACTACCAAACGATAAGCAGATAAGTCTCAGTGAATGA TGTAACT15 

ATAT AAAACTACCAAACAGTAAATAGATAAGTTCTAATAAGTGAGATAATT T10 

ATAT AAAACTACCAAACGATAGACGAATAAGTTCTGATAAGTGA TATAT12 

   ATAT ATTACCAAACAATAGACGAGTAGATTCTAATAGATG T15 

   

            AAA ATAATCAACAAATAGAGAACTGCTAGATGATAGGTGA TATAGAT13 

      ATATT AACCACAATCAATAAGTAAGAGACTACTAGATGATAGATAA T14 
   ATATATAAAACCACAATCAT CAGATAAGAGACTATTAAGTGATA T12*† 
ATATAT AATAAAACCACAATTAGCAAGTAAGAAG GTATCAGATGATAATTAT06† 

ATATAT ACAAACAAATACAGAGATCGACGAGAAAGAAAGTGAGATT TAT12 

       ATAAATAAATACAAAAATCAGCAAGAAAGAGAGTAAGATTGTGATTAAT T08 

      ATAT ACAAATACAAAAATCGATAGAAAAGAAAGTGAGATCATGATT TAT12 

       ATAAATAAATACAAAAATCGACAGAGAGAAAAGTAGGATTGTGATTAAT T12 

     ATAT AACAAATACAAGAGCCGATGAAGAAAAAGGTAGAACTGTGATTAAT T12 

    ATATAT ACAAATACAGAAACTGACGAAAGAGAGAATGAAGTTATGAT CT17 

ATATAT ACAAACAAATATGAGAACTAACAAGAGAGAAAGTGAGATTAT T12* 

ATATAT ACAAACAAATATGAGAACTAACAAGAGAGAAAGTGAGATTATA T17 

ATATAT ACAAACAAATATGAGAACTAACAAGAGAGAAAGTGAGATT T13 

   ATATAT AACAAATACAGAAGCCAACGAGAGAAGGAATAAGATTGTAAT T10 

    ATATAT ACAAATACAGAAACTGACGAAAGAGAGAATGAAGTTAT T14 

     ATAT AACAAATACAAGAGCCGATGAAGAAAAAGGTAGAACT TTGATTAAT12 

      ATAT ATAAATACAAAAACTAACGAAAGAAAAGATGGAACTGTGGTTAAT T12 

      ATAT ACAAATACAGAAACTGACGAAAGAGAGAATGAAGTT T19 

     ATAT AACAAATACAAGAGCCGATGAAGAAAAAGGTAGAACTGT TATTAT14 

 
 

   

 


 

 

 

 

 

 

5' 
35 
36 
29 
36 

54 
51 
51 
52 
50 
49 

81 
81 
81 
88 
88 
88 
81 

108 

117 
117 
118 

141 
141 
134 
146 
142 
141 
145 
143 
131 
147 
141 
141 
141 
143 
141 
134 
141 
142 

163 
163 
168 
185 
204 
195 
199 
195 
195 
199 
202 
201 
204 
199 
202 
204 
195 
204 
202 

 
 

 

3' 
274 
279 
270 
268 
279 
264 
279 

 

299 
308 
310 
300 
306 
299 
300 
307 
300 
300 
289 
307 
300 

 

320 
321 
335 
332 
321 

 

365 
365 
365 
365 
360 

 

391 
391 
389 
389 
389 
389 
391 
389 
388 
390 

 

406 
406 

 

418 
422 
418 
418 
422 

 

426 
436 
449 
438 
449 
449 
452 
453 
452 
467 
469 
460 

 

474 
3' 

5' 
229 
236 
229 
238 
238 
234 
234 

 

 

 

 

 

 

 

 

258 
265 
265 
258 
267 
261 
258 
261 
262 
261 
258 
263 
257 

293 
284 
293 
291 
293 

323 
330 
323 
323 
323 

345 
345 
345 
349 
347 
343 
357 
353 
352 
354 

362 
362 

376 
378 
384 
376 
378 

397 
397 
411 
409 
413 
410 
413 
418 
413 
418 
437 
427 

443 
5' 

 

Reads 
289 
575 
559 
120 
37 
33 
29 

COIII gRNA Sequences cont. 
      ATA TACAAAACAAATCTAACAGTGATAGTAACAGATAGATATAGAGATT T10 

ATAT AAAATCACAGAACAGATCTGATAGTAACAGTAATAAGTAAATAT T11 

       ATATAT AAACAAATCTAATGATAACGATGACGGATAGATATAGAGATT TAT16 

 ATATAAACCACAAT ACAGATCTGACAGTAATGATGATAGGTAAAT T05 

ATAT AAAATCACAGAACAGATCTGATAGTAACAGTAATAAGTAAAT TTCT14 

          ACAAAACAT ATCTAGCAGTAACAGTGACGAATAGATACAA T07 

ATAT AAAATCACAGAACAGATCTGATAGTAACAGTAATAAGTAAATATAA TTTT 

 

   

18,740 
4,452 
4,418 
814 
435 
181 
347 
328 
244 
195 
182 
71 
39 

       ATATAT AAATCAAATAAACTATGTAGAAAGTTACGAGATAGATTTAATA T10 

 AAA AAAACACAAAAATCAAGTGAACTATGTAGAGGATTGTAAGATAA T11 

   ATAAAACACAAAAATCAAGTGAACTATGTAGAGGATTGTAAGATAA T13 

      ATATAT AAAATCAAATAAATTACGTAGAGAGTTACAGAATAAGTTTAAT T10 

ATATAT AACACAAAAATCAGATAGACTATGTAGAAGATTGTGAAAT T11 

       ATATAT AAATCAAATAAACTATGTAGAAAGTTACGAGATAGATTT T08 

      ATATAT AAAATCAAATAGATCACGTGAAGAGTTATAGAATAGATTTAAT T14 

 ATAC AAACACAGAAATCAGATAGATCACGTAGAGAGTTATAAGATAAATTT T08 

   ATATATATT AAAATCAGATAAGCCACGTAGAAGATTGTAAAGTGAATT AT12 

   ATATATATT AAAATCAGATAAGCCACGTAGAAGATTGTAAAGTGAATTT T09 

                        AATCACGTGAAAGATCGTAGAATGAGTTTAAT T13 

 ATAC AAACACAGAAATCAGATAGATCACGTAGAGAGTTATAAGATAAAT AT08 

      ATATAT AAAATCAAATAGATCACGTGAAGAGTTATAGAATAGATTTAATA T12 

 

   

1,024 

21 
1 

1,923 
151 

          AATACTGT ATATGATGTAGTAAGATATAGAGATTAA T10 

          ATATAAA GATACAACGTAATAAGGCATAGAAGTTAAGTGAATTAT TGT12 

    AAAAAACAATACTGGATATGATGTAGTAAGATATAGAGATTAA TAACT06 

ATATAT AAACAATACTGGGTACGATGTAATAGAATGTGAAAGTTAAAT T14 

         AATACTGA GATACGACGTGATAAGATATAGAAGTTAA T11 

 

   

33,179 

856 
301 
4,275 
139 

 

522 
203 
490 
461 
365 
262 
106 
79 
48 
41 

 

101 
139 

 

9,942 
1,634 
185 
169 
375 

 

15 
10 

20,772 
1,420 
224 
125 
545 
60 
20 
451 
238 
218 

 

37 

Reads 

  ATAT ATAAAACAAACTCGCTATGTAAGAACTGTAAAAAGTGATATT AT12 

  ATAT ATAAAACAAACTCGCTATGTAAGAACTGTAAAAAA GTGATATT T09 

  ATAT ATAAAACAAACTCGCTATGTAAGAACTGTAAAAAGCGATATT AT14 

ATATAT ATAAAACAAACTCACTGTGTAAAGATTGTAGAAAGTGATATT AT24 

     ATACAT ATAAACTCACTGCATAAGAATCATAGAGAGTGATATT AT11* 

   

 ATATAT ATAATACAACAAGGAGCGTCATAAGTAAAGTGAATTCGTTATAT T12 

  ATATT ATAATACAACAGAAAATGTCATAAGTGAGATGAATTCGTTATAT T08 

   ATATAA AATACAACAAGAGACGTCGTAAATAGAGTAAATTCGTTATAT TTT 

 ATATATAA AATACAACAAGAGACGTCGTAAATAGAGTAAATTCGTT T06 

 ATATATAA AATACAACAAGAGACGTCGTAAATAGAGTAAATTCGTTAT T11 

 ATATATAA AATACAACAAGAGACGTCGTAAATAGAGTAAATTCGTTATATAA T15 

  ATATT ATAATACAACAGAAAATGTCATAAGTGAGATGA TTCGTTAT06 

   ATATAA AATACAACAAGAGACGTCGTAAATAGAGTAAATTT TTT 

    ACAAAT ATACAACAAAAGATGCCGTAGATAAGATAGATTTG GTATATTTT 

  ATATAT TAATACAACAGAAGACGCTATAAGTGAGATAGATT GATTATAT27 

   

ATATAT ATAAACATAAATCAGATAGTACAATGAAGAGTGTTATAGATAA T09 

  ATAT ATAAACATAAATCAGATAATACAGTGAAGAGTGTCATAGATAA T06 

   

     ACA TATAACACAAAAATAGACATAGACTGAATGATGCAGTGAAA T13 

ATAT AACTTACAACACAGAGATAGACATAGATCAGATAATGTGATAA T13 

     ACA TATAACACAAAAATAGACATAGACTGAATGATGCA TTGAAAT11 

     ACA TATAACACAAAAATAGACATAGACTGAATGATGCAGTAAAA TTTTAACT06* 

ATAT AACTTACAACACAGAAATAAGCATAGATCAGATAGTGTGATAA TTAAT17 

   

      AAAACGAT AGCAAATTCATGACGTGAAAATAGATGTAA T11 

AAAA AAAAAACGAAAGCAGATTCACGGTACAGAGATAGATATAG T10 

            ATATATA ATATAAGGTAAATGAGAGACGAGGGTAGACTTGTGATAC TAT12 

                  ATAATAAGGTAT ACAGAGAACGGAAGCAGACTTATGATATAA T12 

            ATATATA ATATAAGGTAAATGAGAGACGAGGGTAGACTTGTGAT T10 

              ATATA ATATAAGGTAAATGAGAGACGAGGGTAGACTTGTGATATA T09 

          ATATAT AACATATAAGGTAAATAGAAGATGGAAGCGAATTTGTGAC T16 

ATATATATAAACAAC AGACATATAAGGTAAGTAAGAGATGAAGGTAAATTT T09 

          ATATAT AACATATAAGGTAAATAGAAGATGGAAGCGAATTTGTGAT TCT05 

ATAC TATAATAAACAACAAAATGTGTAAGGTAGATAAGAAGTGAAGGTAAATT ATATTTT 

 A TACATAATAAACAATGAGATATATAAGGTGAAT CGAAAGTGAAATAT12 

  ATATTAT AACAACAAAACGTATAAGGTAAGTGAAAAATGGA TGTAAAT12 

   

A ATAATCACATAATAAATGATAGAACGTATAAG ATAGATGAAAAT10 
COIII gRNA Sequences cont. 

497 
497 
499 
497 
497 
499 
499 
499 
499 
501 
499 

 

522 
539 
539 
539 
539 
539 
539 
539 
539 
535 
539 
539 
539 
535 

 

565 
564 
564 
563 
565 
563 
564 
563 

 

592 
594 
592 
592 
592 
592 
592 
592 
592 
594 
593 
592 
593 
593 
593 
593 
593 
596 
593 
603 
593 


66,677 
5,941 
936 
371 
139 
6,621 
3,602 
295 
185 
185 
107 

 

288 

145,031 
4,441 
1,630 
1,118 
1,052 
849 
847 
653 
599 
482 
448 
353 
1,604 

 

188 
181 
77 

3,592 
640 
336 
420 
144 

 ATACAT AATACCAATAGAAGACAGAATTGTAGTCATGTGATA  TTCAT14 

 ATACAT AATACCAATAGAAGACAGAATCGTAGTCATGTGATA TTCAT13 

 ATAT AAAATACCAATAAAGAACAGAATTATAGTTGTATGATAGATAA  AT12 

 ATACAT AATACCAATAGAAGACAGAATTGTAGTCATGTGAT T11 

 ATACAT AATACCAATAGAAGACAAAATTGTAGTCATGTGATA   TTCATAT11 

ATATT AAAATACCAATAGAAAATGAGACTGTGATTATATGATGAATA T14* 

ATATT AAAATACCAATAGAAAATGAGACTGTGATTATATGATGAAT T14 

 ATAT AAAATACCAATAAAGAACAGAATTATAGTTACATGATAGATAATA T10 

ATATT AAAATACCAATAGAAAATGAGACTGTGATTATATGAT T15 

AAA AAAAAATACCAGTAGAAGATAAGACCATAATCATGTGATAA TTTT 

ATATT AAAATACCAATAGAAAATGAGACTGTGATTATATGATG T14 

   

AAATA ATCAACAAATTAAATGAATCTAAAAGGTATCAGTGAAAA T14 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAGAATATT T10 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAGAATATTAAT TTCT08 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAGAATATTA T15 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAA TATTAATAG TTTT 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAGAAT T14* 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAGTTTAGAGAATATT T10 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAGAATATTAATA T14 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAGAATATTAATAG TTTT 

  ATAT ATAAAATGTATTTGTCGACGAGTTAAATGGAT GTAGAAGAT12 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATAAATTTAGAGAATATT T11 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTT TAGAATAT12 

ATA TAAATAAAATGTATTTGTCAATGGATTAGATGAATTTAGAAAATATT T09 

  ATAT ATAAAATGTATTTGTCGACGAGTTAAATGGAT GTAGAAGAT15 

   

ATATAT AAAATTAACAAGTGAATCACTAACAGATAGATAGAATG ATAT12* 

  ATATT AAATTAACAGATAAGCCACTGACAAATAGATAGAGTG ATAT12 

 ATATAT AAATTAACAAATAGACTACTAATAAGTGAGTAAGATGTATT AATTTATATATTTT* 

  ATATAT AATTAACAAGTAGATCACTGACAAATAGATGAGATGTAT AAT13* 

ATATAT AAAATTAACAGATAGATCATTAACGAGTAGATAAAGTG ATAT11 

  ATATAT AATTAACAAGTAGATCACTGACAAATAGATGAGATGTATTT T08 

  ATATT AAATTAACAAATAAACTATTAATGGATGAGTGAGATGTA ATTAT15 

  ATATAT AATTAACAAGTAGATCACTGACAAATAGATGAGATGT T16* 

 

   

99,540 
27,720 
1,442 
599 
504 
309 
238 
176 
150 
147 
128 
114 

84,835 
6,897 
4,207 
1,803 
1,137 
722 
486 
456 
206 


 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAGTGAGTAAATT T12 

 ACAT AAAAACCTAAACTGAGAATACGAGACAAAGAAATTAGTGA TTAAAT12 

 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAGTGAGTAAATTA AT11 

 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAGTGAGTAA T13 

 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAGTGAG AAAAT13 

 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAG GGAGTAAAT14 

 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAGCGAGTAAATT T09 

 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAGTGAGTAAA AT11 

 ATATAT AAACCTAAATCAAGAACATAGAACAGAGAGATTAGTGAGTAAATTATT T10 

 ACAT AAAAACCTAAACTGAGAATACGAGACAAAGAAATTAG GGATTAAAT11 

 ACATT AAAACCTAAACTGAGAATACAAGACAGAGAGATTAA GGATTAAAT13 

 ATATAT AAACCTAAATCAAGAACATAAAACAGAGAGATTAGTGAGTAAATT T10 

 ACAAT AAAACCTAAACCGAGAACATAGAGCAGAGAAGTTAGTAGATAA T13 

 ACATT AAAACCTAAACTGAGAATACAAGACAGAGAGATTAATGA TTAAAT14 

 ACAAT AAAACCTAAACCGAGAACATAGAGCAGAGAAGTTAGTAGATA T13 

ATATAT AAAACCTAAATCAGAGACGCAGAATAGAGAGATTGATA TAT10* 

 ACAAT AAAACCTAAACCGAGAACATAGAGCAGAGAAGTTAGTAGAT T12 

  A AAAAAAACCTAAACCGAGAACATAGAGCAGAGAAGTTAGTAGATAA T15 

 ACATT AAAACCTAAACTGAGAATACAAGACAGAGAGATTAAT T12 

  ACAAAAAAACCTAAACCGAGAACATAGAGCAGAGAAGTTAGTAGATAA T05 

  ATAT AAAACCTAAACCAAAGATATGAGACAGAGAGATTAGTGA TATGT13 


3' 
629 

Reads 
85,916 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAAGAGATTTAA T12 

COIII gRNA Sequences cont. 

 

 

 

461 
461 
456 
462 
461 
457 
458 
454 
462 
460 
461 

483 
491 
488 
490 
498 
495 
491 
487 
486 
504 
491 
502 
491 
504 

528 
528 
524 
525 
528 
523 
526 
527 

548 
555 
547 
551 
554 
558 
548 
550 
545 
558 
558 
548 
551 
555 
552 
556 
553 
551 
557 
551 
555 

 
 
 
 
 
 
 
 

 

5' 
585 

 

629 
629 
631 
629 
629 
629 
629 
629 
622 
629 
634 
628 
624 
630 
628 
630 
628 
631 
630 
647 
647 
643 
647 
643 
643 

 

669 
669 
676 
669 
679 

 

691 
682 

 

717 
722 
706 
715 
722 
726 
715 
722 
730 
715 
715 
717 


9,106 
1,991 
1,447 
276 
269 
263 
221 
150 
145 
131 

17,313 
8,519 
3,564 
2,850 
628 
582 
231 
135 
112 

10,315 
7,606 
250 
637 
173 
150 

 

482 
268 
60 
42 
25 

 

649 
39 

 

758 
374 
354 
160 
1,329 
1,181 
911 
738 
129 
107 
102 
90 


    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAAGAGATTT T11 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAAGAGATTTA T12 

   ATAT ATAAACTCAATCATAGTATAAGATGACGACAATGAGAAGATTTAA T13 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAAAAGATTTAA T12 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAAGA TTTAAT14 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATAAGAGA TTTAAT07 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAAGAGATT AAT13 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAA TAGATTTAA T06 

          ATATAT ATCATAATACAAGGCAATGACGACGAGAAGATTTAGATTAA T11 

    ATATA GAACTCAATCATAATATGAAGCAATAACAATGAAGAGA ATTAATTAT06 

  AT ATAACAAACTTAATCGTAATATGAAACAACGA GAATGAGAAAAT13 

    ATATAT AACTCAATCATAATACGAGATGATAATGACGAAGAGATTTAA T13 

         ATATA TAATCATAATACAGAACGATGGCAGTGAAGAGATTTAGATTAA T12 

ATATATAT CAAACTCAATTGTAGTACGAGACAATAATGATGAGAAGATTT T10 

    ATATAT AACTCAATCATAATACGAGATGATAATGACGAAGAGATTT T08 

ATATATAT CAAACTCAATTGTAGTACGAGACAATAATGATGAGAAGATTTAA T11 

    ATATAT AACTCAATCATAATACGAGATGATAATGACGAAGAGATTTA T11 

 ATATAT ACAAACTCAGTCATAGTATAAGACAGTGATAATGAGAGAATTT T12 

ATATATAT CAAACTCAATTGTAGTACGAGACAATAATGATGAGAAG T13 

 ATACAAAAAC AAAAACCAAACGATGAACTTGATTGTAGTATAAGATAATA T13 

 ATACAAAAAC AAAAACCAAACGATGAACTTGATTGTAGTATAAGATAAT T13 

ATACAAAAACAAAAT ACCGAGCGACAGATTTGATTGTAGTATAAGATAATA T11 

 ATACAAAAAC AAAAACCAAACGATGAACTTGATTGTAGTATAAGATA T21 

ATACAAAAACAAAAT ACCGAGCGACAAGTTTGATTATAGTATAAGATAATA T13* 

ATACAAAAACAAAAT ACCGAGCGACAAGTTTGATTATAGTATAAGATAAT T15* 

   

ATATACAATGCAAACTTA CATGACTGGTTTTATAGAGATGAGAGATTAA T14 

ATATACAATGCAAACTTA CATGACTGGTTTTATAGAGATGAGAGATTAA T15 

  ATATACAATGT ACTCTCATAATTGGTTTCATAGAGATAGAAGATTAA T15 

ATATACAATGCAAACTTA CATGACTGGTTTTATAGAGATGAGAGATT T08* 

     ATATA CAAACTCTTATAACTGGTTTTACGAGAATGAGAAATTAAAT T15 

   

ATAT TAAATAACAATGCGAATTTTCATAGTTGGTT CATAGATACAAT12 

           AT ATGTGAACTCTTATAACTGGTTTTGTAG TGATGATAGAT15 

   

            ATAAT AGAACACCACAGCTTAATGTAGTAGATGGCAGTGTAAATTTTT T10 

        ATAT AATCAAAAACACCGTAACTTGATGTAGTAGATAGTAGTGTAAATTTTT T07 

      ATATAGAACCAAAAACAG TGCAATTTAGTGTGATAGATGATAGTGTAAATTTTT T07* 

      ATATAGAACCAAT AACATCGCGACTTAGTGTGATAAGTAATAGTGTAAATTTTT T07 

      ATATAT AACCAAAAACACTGCGATTTGATGTAATAAGTGAC TGTGTAAAT11 

    ATAT ATAGAACCAAGAACACCATAGTTTGATGTGATAG TGATAGTGTAAAT14* 

       ATATAGAACCAT AACACCATGATTTGATGTAGTAAATGATGATGTAAATTTTT T09 

        ATAT AACCAAAAACACTGTAACTTGATGTAGTAGATAGTAGTGTAAATTTTT T08* 

ATAT TAAAATAGAACTAAAGACACTGTAACTTAGTG AGTAAATGATATTAAT10* 

       ATATAGAACCAT AACACCATGATTTGATGTAGTAAATGATGATGTAAAT AT08 

       ATATAGAACCAT AACACCATGATTTGATGTAGTAAATGATGATGTAA T11 

          ATAAATT AAAACACCATAATTTGATGTGATAAGTAATGATGTAAAT AT17 


3' 
753 
753 

Reads 
31,331 
8,037 

ATATAT AAATGTAATAGATCTGATGAAAGTGAGGTAGAATTGAGAATATT  T10 

ATATAT AAATGTAATAGATCTGATGAAAGTGAGGTAGAATTGAGAATAT AT14* 

COIII gRNA Sequences cont. 

587 
586 
585 
585 
592 
591 
588 
594 
580 
590 
603 
585 
580 
587 
587 
585 
586 
587 
591 
604 
605 
604 
607 
604 
605 

635 
635 
635 
637 
633 

659 
653 

669 
669 
669 
669 
684 
689 
669 
669 
695 
675 
677 
675 


5' 
706 
707 

 

753 
753 
748 
753 
753 
748 
752 
753 
753 
753 
753 
753 
750 
748 
746 
753 
765 
767 
765 
765 
767 

 

781 
781 
781 
779 
778 
790 
790 

 

815 
815 
815 
815 
815 
815 
815 
814 
815 
814 
814 

 

829 
829 
829 

 

842 
842 
842 

 

854 
855 
855 
854 

 

890 
891 
882 
889 
889 
891 

 
 
 
 
3' 
918 
927 
929 

6,744 
3,791 
2,726 
1,214 
1,124 
977 
848 
595 
206 
182 
154 
131 
1,457 
688 
304 
416 
2,739 
1,193 
2,604 
456 
284 

 

495 
159 
40 
19 
19 
153 
89 

 

ATATAT AAATGTAATAGATCCAATGAAGGTAAGATAGAACTGAGAATATT T09 

ATATAT AAATGTAATAGATCCAATGAAGGTAAGATAGAACTGAGAATAT AATTAT07 

        ATA TAATAAATCCAATGAAGATAAAGTAGAGTCAGAGATATTATGAT AT13 

ATATAT AAATGTAATAGATTCAATGAAGGTAAGATAGAACTGAGAATATT T09 

ATATAT AAATGTAATAGATTCAATGAAGGTAAGATAGAACTGAGAATAT AAT14 

      ATATA TAATAAATCCAATGAAGATAAAGTAGAGTCAGAGATATTATGATTT  T10 

ATATAT  AATGTAATAAATCTAATAGAGATAAGATAGAACTGAGGATAT AT12 

ATATAT AAATGTAATAGATCTGATGAAAGTGAGGTAGAATTGAGAAT T09* 

ATATAT AAATGTAATAGATCTGATAAAAGTGAGGTAGAATTGAGAATAT AT13 

ATATAT AAATGTAATAGATCTGATGAAAGTGAGGTAGAATTGAGA TTTAT08 

ATATAT AAATGTAATAGATCCAATGAAGGTAAGATAGAACTGAGAAT T14 

ATATAT AAATGTAATAGATCTGATGAAAGTGAGGTAGAATT TAGAATATAT08 

   ATATAT TGTAATAAATCCGATAGAAGTAAGATAGAACTGAGA TAT15 

      ATATA TAATAAATCCAATAAGAATAAGATGGAACTGAAGA GATTATGAT11 

           AT ATAAATCCAATAAAAGTGAAATAGAATC TGAGTTTT 

  ATAT AAATGTAATAAATTCAGTAGAAGTAAGATAGAATTGAAGATAT ATATAGT26 

  ATATAT AACATGCATAAGATGTAGTGAATCTAGTAAGAGTAAGATAG T14 

ATATAT AAAACATGCATAGAGTGTAGTAAGTTCAGTGAAAGTGA TATAGT09 

  ATATAT AACATGCATAAGATGTAGTAGATTCAGTGAAGATAAGATA T14 

  ATATAT AACATGCATAAGATGTAGTAGATTCAGTGAAGATAAGAT T10 

ATATAT AAAACATGCATAGAGTGTAGTAAGTTCAGTGAAAGTGA TATAGTTTT 

   

ATAT ACAAAACACCTAAGAAGATGTGCGTAGAATGTGATAGATTTAAT T16 

ATAT ACAAAACACCTAAGAAGATGTGCGTAGAATGTGATAGATTTAA AT15 

ATAT ACAAAACACCTAAGAAGATGTGCGTAGAATGTGATAGATTT T23 

       AAAACACCTAAGAAGATGTGCGTAGAATGTGATAGATTTAATA T06 

        AAACACCTAGAGAAACGTGCATGAGAT TGTGATAGATTAAT07 

ATATAT TAAACAACAACAGAACGTCTAAGAGAGTATGTATA TGATGTAAT15 

ATATAT TAAACAACAGCAGAACATCTAAGAGAGTATGTATA TGATGTAAT12 

   

111,116 
25,628 

 ATAT AAATTAAACAGACGTATGAAGCAAGTAGATAGTGATAAGATACT AT14 

 ATAT AAATTAAACAGACGTATGAAGCAAGTAGATAGTGATAAGATACTT T10 

323 
315 
281 
230 
188 
138 
136 
1,459 
527 

 

24 
5 
92 

 

116 
386 
156 

 

66 
5 
1 
1 

 

9,675 
138 
109 
2,508 
202 
127 

 
 
 
 

Reads 
1,822 
600 
222 

 ATAT AAATTAAACAGACGTATGAAGCAAGTAGATAGTGATAAGAT T14 

 ATAT AAATTAAACAGACGTATGAAGCAAGTAGATAGTGATAA T12 

 ATAT AAATTAAACAAACGTATGAAGCAAGTAGATAGTGATAAGATACT AT11 

 ATAT AAATTAAACAGACGTATGAAGCAAGTAGATAGCGATAAGATACT AT15 

 ATAT AAATTAAACAGACGTATGAAGCAAGTAGATAGTGATAAG TTACTAT05 

ACATAT AATTAAACAAACGTATAGAGCAAGTAAGTAGCAGTGAAATATTT T11 

 ATAT AAATTAAACAGACGTATGAAGCAAGTAGATAGTGATAAGATAC AAT12 

  ATAT AATTAAACAAACGTATAAGACAAGTAGATGGCAGTGAAATAT AT11 

  ATAT AATTAAACAAACGTATAAGACAAGTAGATGGCAGTGAAATATTT T13 

 

ATAACAAAACGTGA TATCCATATACAGAGAATTAGATAGATGTATAAA T06 

         ATATA TATCCATACACAGAAAATTAGATGAACGTGTAAAATGAGTAA TTAT13 

         ATATA TATCCATACACAGAAGATTAAATAGACGTGTAAAGTGAATAA T15 

   

ATAT AAAACAAAACGTGTATTCATATATGAGAGATTAAGTAAATG AT15 

  AT AAAACAAAACGTGTATCTGTGTATAGGGAGTTAAATGGATGTATA TAT14 

ATAT AAAACAAAACGTGTATCTGTGTATAGGGAGTTAAATGGATGTAT T20 

   

   ATAT ATACAATACAAAGAAGCGAAACGTGTATCTATATATGGAAA T06 

ATATAT AACACAATACAGAGAGATGAAACGTGTA GATGTAT23 

ATATAT AACACAATACAGAGAGATGAAACGTGT T22 

     AT ACATAATACAAAGAGACAGAACGTG ATATCT16 

   

ATATAT AATCAAACTAAATTGACGAGATGTTGATGTAGATAGATATAAT AT08 

 ATAT AAATCAAACTAAATTAGCAAGGTGTCAGTGTAAATGAGCATGATAT T13 

        ATACAA TAAATCAACAAGATGTCGATATAGATGGATATGATATGAGAGAAT T15 

     AT ATTAAACTAAATCAATAAAGTGTCGATGTAGATGAATGTGATATA TAT11 

     AT ATTAAACTAAATCAATAAAGTGTCGATGTAGATGAATGTGATAT T11 

 ATAT AAATCAAACTAAATTAGCAAGGTGTCAGTGTAAATGAGCATGATAT TAT13 

   

 

 

 

     ATATAA ATCAGAATAAACAGATCGCAATAGAGAGAATTAAGTTAA TAT14 

AA ATATAAAACATCAAGATAAATGGATTGTGATAGAGAAAGTTAAATT T11 

 ATATATAAAACATCAGAATAGACAAATCGTAATAGAGAAAGTTAAGTTAA T14 

COIII gRNA Sequences cont. 

706 
707 
701 
706 
707 
699 
707 
713 
707 
715 
713 
719 
715 
714 
719 
707 
723 
728 
724 
725 
728 

736 
737 
739 
735 
750 
754 
754 

772 
771 
775 
778 
772 
772 
777 
771 
773 
773 
771 

 

796 
788 
788 

802 
798 
799 

814 
828 
829 
830 

848 
846 
838 
845 
846 
846 

 

 

 

 

 

 
 
 
 

5' 
880 
882 
880 

 

880 
882 
881 

907 
909 
913 
905 
905 
920 

935 
940 
935 
939 

942 
951 
951 

963 
965 

 

 

 

 

 

921 
929 
929 

 

947 
950 
946 
944 
944 
952 

 

977 
977 
977 
977 

 

983 
981 
981 

 

1003 
1003 

209 
159 
28 

 

787 
126 
558 
3,472 
1,321 
162 

    ATAT AATATCAAAATAAACAGATCGTAGTAAAAGAAGTTAGATTAA T12 

 ATATATAAAACATCAGAATAGACAAATCGTAATAGAGAAAGTTAAGTT T13* 

 ATATATAAAACATCAGAATAGACAAATCGTAATAGAGAAAGTTAAGTTA TTTT 

   

   ATATAA TATACACACAGATACATAATACGTAGAATGTTAAGATAAGT  T16 

  ATATA ATTACACACACAGATACGTGATATATAGAATGTTAAGGTAA TATAAT10 

      ATAC ATACACACAAATATATAACATATAGAGCATTGAG TTAGATAAT15 

      ATATAA ACACACAAATATATGGCATATAGAGCATTGAAGTAGATAA T13* 

      ATATAA ACACACAAATATATGACATATAGAGCATTGAAGTAGATAA T15 

ATAG AAAATTACACACATGAATACATAGTACATAGAA GATTGATATAT13 

 

   

2,920 
522 
354 
28 

ATATA AATCAACAACTGAAAAGATATCAATGAGATTGTACATGTAAAT T15 

ATATA AATCAACAACTAAGAAGACACTGATAGAGTTATATGTG ATTAAT14* 

ATATA AATCAACAACTGAAAAGATATCAATGAGATTGTACATGTAAAT T08 

ATATA AATCAACAACTGAAAAGATATCAATGAGATTGTACATGT T09 

 

   

1,160 
127 
25 

 

111 
68 

ATAT AACTAATCAACAGCTAAGAGAACGTCAATGAGATTATGTG ATTAAT14 

   AT ACTAATCAACAACTAGAAGAATATCAGTGA TATACATGTAAT10 

   AT ACTAATCGACAACTAGAGGGACATCAGTGA TTTATACGTAT15 

   

  ATA TACAAACTACCAATATAAGTTAACTGATCGGTAATTAAGG  TTATAT15 

ATATA TACAAACTACCGATATAAGTTAACTGATTGATAATTAA TGTTCT14 

 

 

C) C-Rich Region 3 

3' 
62 
62 
64 
62 

 

88 
88 
87 
86 
88 
88 
88 
88 
88 
88 
88 
77 
88 
88 
88 
88 
88 
88 
88 
88 
88 
88 
89 
89 
77 
83 
88 
89 
88 
77 
86 
89 
83 

 

118 
123 
118 
123 
121 
124 

 

140 
140 
142 

 

166 
166 
166 
166 
162 
166 
167 
161 
168 
167 

 

196 
199 
200 
200 
200 

 

Reads 

18 
3 
1 
1 

 

140,541 
34,162 
3,434 
3,016 
2,553 
1,270 
902 
693 
648 
632 
468 
213 
212 
182 
181 
177 
170 
155 
151 
135 
133 
106 
598 
526 
437 
285 
255 
237 
152 
112 
70 
53 
31 

CR3 gRNA Sequences 

         ATATGT ACAACAAAACCGAGCAATCAGATAT AGAGTGAAAT09 

         ATATAT ACAACAAAACTGAACAATCAAATGT AGTGTGAT09 

ATCGCAAGGTCGT GGACAACAAAACTGAACAATCAAAT T11 

         ATATAT ACGACAAAACTGAACAATCAAATGT AGTGTGT08* 

   

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATGAT AT14 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATGATT T12 

ATATAATT AAATGTACAGACAAATGATAGAGAGACGATGAGATTAAGT TATAT12 

  AAAAATT AATGTACAAATAAACGATAGAGAGACAGTGAGATTA TGAT13 

   ATAT AAAATGTACAGACGAGCAGTGAAGAGACAGTGAGATTA TACAT11 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATG T11 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGAT T14 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTAAAATTAGATGAT AT12 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATT T09 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTA TATGATAATATTTT 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATGA AATAT14 

 ATATAT AAAATGTACATACGAACGATAAAAGGGCAGTGAAATTAGATAATT TT 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGCGAAATTAGATGAT ATCT07 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAG TTGTTAATAT11 

 ATATAT AAAATGTACAAACGGACAATGAGAAAACAGTGAAATTAGATGAT ATCT16 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGA AGATATAT11 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAA T19 

 ATATAT AAAATGTACAAACGAACAATGAGAGAACAGTGAAATTAGATGAT AT13 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTAAAATTAGATGATT T12 

 ATATAT AAAATGTACAAATGGACAATGAGAGAACAGTGAAATTAGATGAT AATAT07 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGA T12 

 ATATAT AAAATGTACAAACAGACAATGAGAGAACAGTGAAATTAGATGAT AATAT08 

  ATAT AGAAATGTACAAACGAGCAATAAGGGAACAGTGAAATTAGATGATT TCT07 

  ATAT AGAAATGTACAAACGAGCAATAAGGGAACAGTGAAATTAGATGAT AAT10* 

 ATATATAAAATGTACAT ACGAACGATAAAAGGGCAGTGAAATTAGATAATT T09* 

      ATATAA GTACAAACAAACAGTGAGAAGATAACGAGACTGAGTA TATATTTT* 

   ATAT AAAATGTACAAATAAGCAGTAGAAGAGCAGTGAAATTGA TGATAT09 

  ATAT AGAAATGTACAAACGAGCAATAAGGGAACAGTGAAATTAGATG TTTAAT09 

   ATAT AAAATGTACAGACAAGCAGTGAAGAGACAGTGAGATTA TACAGTTTT 

 ATATATAAAATGTACAT ACGAACGATAAAAGGGCAGTGAAATTAGATAATTA T12 

   AAAATC AATGTACAAATAAGCGATAAAGAGACAGTGAGATCA TGAT15 

  ATAT AGAAATGTACAAACGAGCAATAAGGGAACAGTGAAATTAGAT T12 

      ATATAA GTACAAACAGACAGTGAGAAGATAACGAGACTGAGTA TATTTAT17 

 

   

573 
1,321 
685 
251 
243 
120 

      ATATAT AATCACAAACAAATAGAAAATGAGAGAGGTGTATGA TACTAT15 

 ATATAT AAACAAATCACGAACGAGTAGAAAAT TGGAAGAATGTAT12 

      ATATAT AATCACAAACAAATAGAAAATGAAAGAGGTGTATGA TACTATTTT 

  ATATT AAACAAATCACAAATGAGTAGAGAACGAGA TTGATGTATAT16 

     ATAC ATAAATCACAAACGAATAAGGGGCAGAAGAG TTGTATGATAT05 

ATATAT AAAACAAATCATAAACGAATAAAGAGTGAGGAAGGTGTATAAAT TTT 

 

   

15,770 

692 
442 

 

27,885 

212 
139 
114 
309 
421 
370 
348 
110 
53 

 

1,492 
136 
2,176 
114 
98 

   A TAGATAACAAACATAAGAGCAAGTCACGAG GTAGATATTGATAT14 

   A TAGATAACAAACATAAGAGCAAGTCACGAG ATAGATATTGAT12 

AT ATTAAATAACAAACATAGAAATAGATCACAGATAGATGA TTGATGAAT06 

   

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTATAAGA TAT13 

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTATAA T12 

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTATA T17 

    ATAT ACAAAAATCCAATGAAAAATAAAGACTGAGTGATGGATG CAAT15 

      ATATAT AAATCCAATAAGAAATGAAAGCTAGATAGTGAGTATAAG T16 

  ATATAT ACAAAAATCCAATGAAAAATAAAGACTGAGTGATGGATGTAA TTATTTAT14 

   ATAT AACAAAAATCTAATAGAGAACAGAAACTAAGTAATGAG ATAT09 

      ATATTAT AATCCAATAGAAAGCAGAGACTAAATGATAGATGTAAA T09* 

ATATAT AAACAAAAATTCAATAGAAAACGAGAACTGAGTAAT TGATATGT05 

 ATATAT AACAAAAATCCGATAAAAGATGGAAACTAAGTGATAGATATA T13 

   

    ATAT ATACAACAATAAACTCGTATTAAGTGAGAGATGAAGATTTAAT T13 

 ATAT TAAACACAACGATAGATCTATATTAAGTAGAAGATAGAAATTTA T16 

ATAT ATAAACACAACAGTAGACTCGTATTAGATAGAAGATAGA TATCAT15 

ATAT ATAAACACGACAATAGATTCATATTAAATAGAAGATAGAGATTTA T12 

ATAT ATAAACACGACAATAGATTCATATTAAATAGAAGATAGAGATTT T07* 

 

   

5' 
34 
34 
36 
34 

41 
40 
48 
51 
51 
47 
48 
41 
52 
51 
46 
40 
41 
50 
41 
49 
55 
41 
40 
41 
56 
41 
40 
41 
40 
47 
50 
47 
51 
39 
51 
48 
47 

78 
93 
78 
89 
86 
76 

105 
105 
98 

122 
124 
125 
127 
123 
124 
129 
123 
132 
125 

154 
156 
162 
156 
157 

 

 

 

 

 

 

 

5' 
190 
190 
192 

 

 

226 
231 
237 
241 
234 

293 
268 

 
 

3' 
230 
230 
232 

 

277 
265 
265 
279 
265 

 

308 
312 

Reads 

2 

1,243 
279 

 

19 
2 
1 
936 
528 

 

2 
125 

CR3 gRNA Sequences cont. 

  ATATACA ACATATCAAGTGGTAAGATAAGAGAAGAAAGTAGATGTAAT T06 

  ATATACA ACATATCAAGTGATAAGATAAGAGAAGAAAGTAGATGTAAT T12 

ATAGATA ACACATATCAGATGATAAGGTAAAGAGAGAGAATAGATATA T13* 

   

 ATAT AATAACAATATAAACGAACTGGATGATGTATAGTACTTTGATATATA AT12 

    ATATAACAATAC AAATGAGCTAGATAATGGATGATAGTTTGATAT T12 

TATAAAATAACAATAC AAGTGAATTAGATAATGGATGATG TTTCAAT15 

 AT AGAATAACAATATAAACGAACTAGATGATGGATAGTA TCT17 

    ATATAACAATAC AAATGAACTGAGTAATGGATAGTAGTTTGA CAAT09 

   

ATATAT AAATTATTTGCATACT GAGATTAGTGATAGTAAAGTGATTAAT13 

AT ATAAAAATTATTTGCATGCTTAAGTGAGTTATAGAGGTAATGATA AAT07 

 

 

D) C-Rich Region 4 

3' 
64 
64 
64 
62 

 

103 
103 
98 
103 
103 
103 

 

134 
134 
134 
139 
134 
142 
138 
138 

 

171 
171 
192 
200 
196 
200 
196 
196 

 

232 
225 
228 
228 
232 
227 
232 
227 

 

261 
261 
261 
261 
258 
259 
259 
258 
258 
258 

 

300 
284 
295 
295 
295 
298 
295 
297 
301 
295 

 

320 
320 

 
3' 
351 

 

 

 

 

 

5' 
25 
25 
25 
25 

48 
48 
65 
51 
59 
52 

87 
89 
88 
93 
90 
93 
90 
91 

127 
128 
154 
174 
166 
171 
169 
171 

186 
186 
186 
186 
186 
186 
189 
189 

213 
213 
216 
222 
213 
213 
213 
213 
215 
223 

 

251 
242 
251 
250 
253 
253 
244 
250 
251 
255 

 

280 
279 
 

5' 
307 

 

Reads 
121 
596 
308 
83 

 

296 
175 
108 
18 
11 
10 

 

7,793 
426 
208 
200 
143 
2,052 
584 
139 

CR4 gRNA Sequences 

 ATAAT AAAAATGCACAACTAGAATTGAAGTAAAGTGATGATA TATAT14 

 ATAAT AAAAATGCACAACTAGAATTGAAGTAAAATGATGATA TATAT14* 

 ATATT AAAAATGCACAACTAGAATTGAAATAAAGTGATGGTA TATAT13 

ATATAATT AAATGCACAGCCAAAGTTAAGGTAGAATAGTGATA TAT14 

   

   ATA TATATAAAACACAGACATACTAAGTAAGAGAAAGAGAGGTGTATGATT T12 

  *ATA TACATAAAACACAGACATACTAAGTAAGAAAAAGAGAGATGTATGATT T12* 

    ATATAT TAAAACACAAATACATCAGATAGAAAGAGA TTGAGTGTATAAT17 

   ATA TACATAAAACACAGACATACTAAGTAAGAAAAAGAGAGATGTATG T21 

   ATA TACATAAAACACAGACATACTAAGTAAGAAAAAGAGAGAT T15 

ATTATA TACATAAAACACAGACATACTAAGTAAGAAAAAGAGAGATGTAT T15 

   

   ATATAT AAACAACAATAGAGTATATCATAGACTGTATATGAAGCATAAAT T10 

 ATATATAT AAACAACAATAGAGTATATCATAGACTGTATATGAAGCATAA CT08 

 T ATATAT AAACAACAATAGAGTATATCATAGACTGTATATGAAGCATAAA AT13 

     AAACAAAACAACAGTGAAATATACCGTAGATTGTATGTGAAAT TATATAT06 

 ATATATAT AAACAACAATAGAGTATATCATAGACTGTATATGAAGCATA T13 

A AAAAAACAAAACAACAGTGAAATATATCGTAGATTGTATGTGAAAT TAT05 

 ATAT AACAAAACAACGATAAGATGTATCATGAGCTGTATATGAGATATA T25 

 ATAT AACAAAACAACGATAAGATGTATCATGAGCTGTATATGAGATAT T13 

 

   

6,971 
204 
3,603 

6 
5 
3 
1 
1 

 

31 
22 
7 
4 
166 
152 
25 
12 

 

14,358 

264 
114 
112 
5,062 
1,370 
1,074 
490 
340 
183 

 

23 
9 
77 
64 
25 
22 
20 
17 
16 
11 

 

643 
93 

 

Reads 
725 

ATAT ATATACTCACACAAATAGATGACAGAGATAGAAAGTAAGATGATA TAT15 

ATAT ATATACTCACACAAATAGATGACAGAGATAGAAAGTAAGATGAT T13 

    ATATTA AACTATAACAAGGCAGATAGAACGTACCTATATAGATAA T14 

AT AAATAAACAACTATAATAAGACAAGTG CGATGTACT10 

 ATATA AAACAACTATAACAAAATAGATAGAATGTGC GTAT06 

   AAATAAACAACTATAATGAAGTGAGTGAAG GTACGT12 

 ATATA AAACAACTATAACAAAATAGATAGAATG GGCGTAT06 

     A AAACAACTATAATGAAGCAGATAGAA GGTACGTATAT17 

   

   ATAAAAAATCACAGCCTAAAATGACGAGAGAAAGTAAATGGTTATA TAGAT05 

   ATATAT ATCACAACCTAAGATAACGAAAAAGAGCAGATAATTGTA TATTTT 

ATATAT AAAATCACAACTTAGAATGACGAAAGAGAATAGATAGTTGTA TATTGT22 

ATATAT AAAATCACAACTTAGAATGACGAAAGAGAATAGATAGTT TTAAAAAAT16 

   ATAAAAAATCACAGCCTAAAATGACGAGAGAAAGTAAATGGTTATA T16 

   ATAT AAATCACAACCTGAAATAGCAGAGAAGAGTAAATGATTATA T13 

   ATAAAAAATCACAGCCTAAAATGACGAGAGAAAGTAAATGGTT TTT* 

 ATATAT AAATCACAACCTGAAATAGCAGAGAAGAGTAAATGATT T09* 

   

              ATAT ATAAACTATACAATTGAAGCACTGATAGAAGGTTGTGATTTAA T12 

              ATAT ATAAACTATACAGTCAAGACACTGATGAGAGATCGTGATTTAA T13 
              ATAT ATAAACTATACAATTGAAGCACTGATAGAAGGTTGTGATTT TTT 

              ATAT ATAAACTATACAATTGAAGCACTGATAGAAGGTTG ATTTAATAT08 

               ATATAT AACTATACAGTCGAGACATCAATGAGAGATTATGACTTAA T13 

CCCTCGGCCGCAGCGATATA AAACTATACAATTAGAGCATCAGTAGAAGATTGTGACTTAA T14 

               ATATA AAACTATACAATTAGAGCGTCAGTAGAAGATTGTGACTTAA T15* 

               ATATAT AACTATACAATTGAGATATCAGTGAAGAATTGTGATTTAA T11 

               ATATAT AACTATACAGTCGAGACATCAATGAGAGATTATGACTT T14 

               ATATAT AACTATACAGTCGAGACATCAATGAGAGATT T05 

 

 AAAAAAAAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTATAT T11 

         AAAAAAA AAAATTATAACGTCATAGAAGAGATAGACTATATGATTAA T09 

 ATAT AGAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTATAT TTTT 

   AT AGAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTATATA T06 

 ATAT AGAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTAT T16 

   AAAAAAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTAT TTT 

   AT AGAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTATATAGTT T12 

    AAAAAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTATATA T12 

AAAAAAAAAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATTATAT T05 

 ATAT AGAAAATAAACAGAATTGTGACGTCATAAAGAAGATAGATT TTTT 

   

 AAAA AAAAAACACAAGGCGAGATAGAGAAAAGAGATAAATAAGAT GT10 

AAAAA AAAAAACACAAGGCGAGATAGAGAAAAGAGATAAATAAGATT T05 

 

CR4 gRNA Sequences cont. 

            ATAAAAT ACCAAACAGATCAGATAAAGGCAGTGATATAGAGAATATAAAAT T11 

353 
351 
351 
354 
352 
353 
352 
352 
352 

 

388 
390 
388 

 

417 
417 
415 

 

457 
456 
457 
457 
457 
458 
456 
443 
457 
458 

 

489 
489 
487 
486 
487 
487 
497 
494 
497 
494 

 

519 
524 
524 
524 
524 

 

554 
547 
547 
554 
556 
554 

 

575 
584 
596 
568 

675 
259 
214 
140 
485 
374 
134 
42 
38 

 

863 
2,308 
255 

 

411 
447 
423 

 

4,211 
620 
392 
406 
185 
227 
186 
97 
34 
25 

             ATAT AAACCAAACAAGCTGAATAAGAACAGTGATATAGAAGATATA TAT14 

              AAAAT ACCAAACAGATCAGATAAAGGCAGTGATATAGAGAATATAAAATA T12 

            ATAAAAT ACCAAACAGATCAGATAAAGGCAGTGATATAGAGAATATAAA T12 

              AT AAAACCAGACAAACTGAATGAAGACAGTAACGTAGAAGATAT T10 

TTGTTTGGTTGATTAAAT AACCAAACAAGTCGAGTAGAGACAGTGATATAAAAGGTATAAAAT T09* 

             ATAT AAACCAAACAGACCAAGTGAAGATGGCAGTATAAGAGATATGA TATAT11 

              AAAT AACCAAACAAGTCGAGTAGAGACAGTGATATAAAAGGTATAAA T16* 

              AAAT AACCAAACAAGTCGAGTAGAGACAGTGATATAAAAGGTATAA T12 

              AAAT AACCAAACAAGTCGAGTAGAGACAGTGATATAAAAGGTAT T13 

   

  ATAT ATAACACAAAACATAACAGAGAGTATAGAGAGAAATTGAATGA T15 

ATAT AAATAACACAAAATACGACGAGAAATATAAGAGAGAATTGAGTAAATT T05* 

  ATAT ATAACACAAAACATAACAGAGAGTATAGAGAGAAATTGAATGA T15* 

   

  AAAA AAAAAACAACATAGAAAGTGAATCAGAGAATGACATAAGATATA T11 

     A AAAAAACAACATAGAAAATAAGTCAGAGAGTAATATGAGAT TGTTATAAT05 

  ATATAT AAAACAACATAAGAGATGAATCAAGAGGTAATATAAGATATA T17 

   

    AAAAAAAAAAACAAAGACAAAGAAATCACTCAGAATAGAAGATGGTATAA T14 

     AAAAAAAAAACAAAGACAAAGAAATCACTCAGAATAGAAGATGGTATA T13 

    AAAAAAAAAAACAAAGACAAAGAAATCACTCAGAATAGAAGATGGTAT T12 

    AAAAAAAAAAACAAAAACAGAAAAACTGTCTAAGATAGAGAATGATATA T15 

    AAAAAAAAAAACAAAGACAAAGAAATCACTCAGAATAGAAGATGG GATAAAT08 

AA AAAAAAAAAAAACAAAGACAAAGAAATCACTCAGAATAGAAGATGATATAA T08* 

     AAAAAAAAAACAAAAACAAGAAGACTATCTGAGGTAGAAAATGATATA T14 

       ATATAAACAT AAACAAAGAGACCATCCGAAATAGAGAAT T15 

    AAAAAAAAAAACAAAGACAAAGAAATCACTCAGAATAGAAGATGATATA T09 

 A AAAAAAAAAAAACAAAGACAAAGAAATCACTCAGAATAGAAGATGATAT T09* 

 

   

3,197 
238 
730 
246 
208 
149 
1,153 
282 
2,511 
306 

   ATAACAACCACAGATAGAAATGAACGTAAATGAGAGAGAAAGATGAAA T13 

   ATAACAACCACAGATAGAAATGAACGTAAATGAGAGAGAAAGATGAAAGT T11 

ATAA AACAACCACAAGTAGAAATAGACATAAATGAGAGAGAAAGATAAAA T12 

 ATAT ATAACCACAAATAGAGACAAATGTAAGTAAAGAGAGAAAGTAAAA T19 

ATAA AACAACCACAAGTAGAAATAGACATAAATGAGAGAGAAAGATAAA T06* 

ATAA AACAACCACAAGTAGAAATAGACATAAATGAGAGAGAAAGAT T11 

 AT ACAAAATAACAACAATCATAAGTAAGAATAGATGTAGATGAGAAA T11 

ATATAT AAATAACAACAATCACAGATGAAGGTAGATATAAGTGAGAAA T14 

 AT ACAAAATAACAATGATCACAAATAAGAGTGAATGTAGATAGAGAA T15* 

ATATAT AAATAACAACAATCACAGATGAAGGTAGATATAAGTGAGAAA T13* 

 

   

766 
390 
1,913 
105 
101 

 

146 
53 
50 
98 
84 
37 

 

511 
997 
352 
62 

ATATAT AATAACAACAATAATCAGATTAGCAGAGTAATGATAGTTATAAAT T13 

  ATACAAATAACAACAATAGCCAAGTTAATAGAGTGATGATGATTA AT14 

  ATACAAATAACAACAATGATCAGACTAATAGAGTAATAGTGATTATA T14 

  ATACAAATAACAACAATGATCAGACTAATAGAGTAATAGTGATT T12 

  ATACAAATAACAACAATGATCAGACTAATAGAGTAATAGTGATTAT T15 

   

               AA AAAAAACGCATATAAGTGAGCCTATACATAGATAATGATGATAATT T03 

             AAAAAAAAAAA GCATATAAATAGATCTATATATGAGTGATAGTGACAATTA T15 

              AAAAAAAAAA GCATATAAATAGATCTATATATGAGTGATAGTGACAATT T11 

            AAAAA AAAAAACGCATATGAGTAGATCTGTACATAGATAGTGATGATA T12 

                ATAGAAAACGCATATGAGTAGATCTGTACATAGATAGTGATGATA CT06 

CAACTCGACTGCGTGAA AAAAAACGCATATGAGTAGATCTGTACATAGATAGTGATGAT TTT 

   

 AAATTTATAAAACCAAT CCGTAATTATCTGAGATGAGAAGTGTATA AT12 

 ATATTTATAAAACC ATACTGTGATTATCTAAGGTAGAAAGTGTGTA AT21 

ATA TTTATAAAACCAATATCATAGTTATCTGAGGTAGAGAGTGTATA ATTTAATGTAT18 

               CATACC TAATTATCTGAAGTAGAAGATGTATGTAAAT T14 

 

311 
306 
309 
324 
307 
310 
309 
310 
312 

343 
340 
343 

374 
377 
374 

404 
405 
406 
405 
409 
404 
405 
411 
405 
406 

442 
440 
442 
442 
443 
446 
453 
453 
453 
453 

475 
480 
478 
481 
479 

504 
503 
504 
507 
507 
508 

542 
542 
542 
537 

E) Cytochrome b 

5' 
32 
32 
32 
32 
32 
32 
32 

 

53 
51 
52 
54 
54 
54 
53 
56 

3' 
59 
60 
61 
64 
62 
64 
64 

 

91 
91 
91 
91 
91 
91 
91 
91 

Reads 
327 
255 
190 
176 
65 
35 
17 

CYb gRNA Sequences 

        AAATAATAGGGATTTATGATGAGATATG CTGTGGATAT14 

       AAAATAATAGGGATTTATGATGAGATATG CTGTGGATAT12 

      AAAAATAATAGGGATTTATGATGAGATATG CTGTGGATAT13 

AA AAAAAAAATAATAGGGATTTATGATGAGATATG CTGTGGATAGT09 

  GT AGAAAATAATAGGGATTTATGATGAGATATG CTGTGGATAGT12 

AA AAAAAAAATAATAGGAGTTTATGATGGAATATG CTGTGGATATTTT* 

AA AAAAAAAATAATAGGAGTTTATGGTGAGATATG CTGAGAGTAT05 

 

   

18,713   AAAAAA AAAAGACAGTGTGAATTTCTGAGTAATAAAGGGAATAAT T11 
10,649   AAAAAA AAAAGACAATATAGATTTCTGGGTGATAAAAGGGATAATAA CT11 

908 
339 
8,406 
2,265 
204 
95 

   ATAA GAAAGACAATATAGGTTTCTGGGTAATGGAGAGAATAATA T16 

 AAAAAA AAAAGACAATGTAGATTTCTGAGTAATGGGGAGGATAA CTATTTATTTT* 

   AAAA AAAAGACAATGTAGATTTCTGAGTAATGGGGAGGATAA CTAT05* 

 AAAAAA AAAAGACAATGTAGATTTCTGAGTAATAGGGAGGATAA CTAT16 

AAAAAAA AAAAGACAATGTAGATTTCTGAGTAATGGGGAGGATAAT T07 

AAAAAAA AAAAGACAATGTAGATTTCTGAGTAATGGGGAGGAT T05* 

 
F) Maxicircle Unidentified Reading Frame II 

5' 
30 
30 
34 
33 

3' 
79 
79 
79 
79 

Reads 
2,605 
125 
17 
15 

Murf II gRNA Sequences 

ATAG AAAGCACAAAAATAAAATTAAATTAGAGTAATTGGATGTTAAAATT T11 

ATAG AAAGCACAAAAATAAAATTAAATTAGAGTAATTGAATGTTAAAATT T08 

ATAG AAAGCACAAAAATAAAATTAAATTAGAGTAATTGAATGTTAA CAT12 

ATAG AAAGCACAAAAATAAAATTAAATTAGAGTAATTGAATGTTAAA T09 

 
 

 

 

G) NADH Dehydrogenase 3 

3' 
76 
79 
73 
79 
79 
79 

 

113 
113 
113 
108 
108 
108 
108 

 

143 
141 
141 
141 
141 
141 
141 
141 
141 
141 

 

170 
170 
174 
174 

 

205 
189 
205 

 

229 
229 
233 
234 
233 

 

263 
263 
264 
264 
263 
263 
263 
263 

 

299 
299 
298 
299 
299 
299 
299 

 

329 
320 
329 
333 
329 
329 
328 

 
 

5' 
33 
31 
30 
31 
42 
33 

63 
65 
64 
63 
65 
66 
68 

99 
98 
101 
98 
100 
99 
100 
99 
98 
100 

130 
130 
130 
132 

158 
155 
158 

190 
190 
190 
198 
191 

223 
222 
222 
223 
223 
222 
223 
220 

253 
253 
253 
253 
252 
253 
252 

284 
288 
285 
300 
285 
282 
285 


Reads 
313 
70 
625 
511 
397 
26 

ND3 gRNA Sequences 

ATATATT ATAAACCATGATATCGAAAATGGGTGTAGAAATGATGATA T12 

ATAT ATAATAAACCACAGTATCAGAGACAGATATAGAAGTGATGATAGT T13 

    ATAT TAAACCACAATATCAGAAATAAGTGTAGAAATAGTGATAATA T12 

ATAT ATAATAAACCACAGTATCAGAGACAGATATAGAAGTGATGATAGT T09 

ATAT ATAATAAATCACAGTATCAGAGACAGATATAGAA TGATGATAGT12 

ATAT ATAATAAACCACAGTATCAGAGACAGATATAGAAGTGATGATA T16 

 

   

1,006 
411 
84 
59 
33 
27 
14 

 

826 
542 
501 
240 
199 
188 
168 
52 
24 
21 

 

507 
413 
2,397 
133 

   ACATAAGAAACATAAAGAAAAATCTGTGAGTAGAGTGATAAGTTATAAT T11 

   ACATAAGAAACATAAAGAAAAATCTGTGAGTAGAGTGATAAGTTATA T15 

AT ACATAAGAAACATAAAAAGAAATCTGTAAGTAGAGTAGTAAGTTATAA GT14 

 ATACAT AAAAACATAAAAAGAAACTTATAAGTAGAGTGATAGATTATAAT T09 

 ATACAT AAAAACATAAAAAGAAACTTATAAGTAGAGTGATAGATTATA TTTT 

 ATACAT AAAAACATAAAAAGAAACTTATAAGTAGAGTGATAGATTAT T12 

 ATACAT AAAAACATAAAAAGAAACTTATAAGTAGAGTGATAGATT T10 

   

  AT ATGAAAACAATCAAAGAAGTGTGATAGAAAGTATAAAAGGTATAA T11 

  ATAA GAAAACAATCAGAGAAATGCGGTAAAAGATATAAGAGATATAAA T12 

ATATAA GAAAACAATCAGAGAAATGCGGTAAAAGATATAAGAGATAT T09 

ATATAA GAAAACAATCAGAGAAATGCAGTAAAAGATATAAGAGATATAAA T13 

ATATAA GAAAACAATCAGAGAAATGCGGTAAAAGATATAAGAGATATA T11 

ATATAA GAAAACAATCAGAGAAATGCAGTAAAAGATATAAGAGATATAA TCT13 

ATATAA GAAAACAATCAGAGAAATGCAGTAAAAGATATAAGAGATATA T18 

ATATAA GAAAACAATCAGAGAAATGCGGTAAAAGATATAAGAGATATAA T10 

ATATAA GAAAACAATCAGAGAAATGCGGTAAAAGATATAAGAGATATAAG T09 

ATATAA GAAAACAATCAGAGAAATGCGGTAAAAGATATAAGAGATATG T20 

   

 ATATAT CAAATCACACGAAGATCATAGATGACGATGAAGGTAGTTAA TAT18 

 ATACAT CAAATCACACGGAAATCATAGATGGCAATGAAGATAGTTAA T11 

ATA TATACAAACTACACGGAGATCACAAATAACAGTGAGAATGATTAA T12 

ATA TATACAAACTACACGGAGATCACAAATAACAGTGAGAATGATT T15 

 

   

412 
236 
1,756 

   T ATGTATAAAACATTAAATGTGAGTTTGTGTCGTGTAGATTATGTGAA T15 

       AAATAAAACACC AACGTGAATTTATATTGTATAGATCGTATGAGAAT AT15 

ATAT ATGTATAAAACATTAAATGTGAGTTTGTGTCGTATAGATTATGTGAA T14 

 

   

4,864 
123 
227 
139 
40 

   ATATAT AACAACTAACAGAATATAGATCTGATGTATAAGATATCA T14 

   ATATAT AACAACTAACAGAATATAGATCTGATGTATAAGATATCG T13 

 ATAT AACAAACAACTGATGAGACATAGATTCGATGTATAAGATATCA T13 

ATAT AAACAAACAACTAATAAAATGTAAATCTGATGTGTGA TATACT08 

 ATAT AACAAACAACTGATGAGACATAGATTCGATGTATAAGATATC T11 

 

   

3,225 
3,165 
384 
164 
106 
232 
177 
32 

 

260 
106 
83 
133 
89 
57 
22 

 

786 
435 
282 
268 
168 
64 
3 

 
 

   TTAT ATACAAATAATGGGATTTAACGATATAGAGGATGAATGATT T11 

   ATAT ATACAAATAATGGGATTTAACGATATAGAGGATGAATGATTA T15 

AAAAAT AACACAGATAATGGAATTTAATGATATGAGAAATGGATGATTA T05 

 AAAAT AACACAGATAATGGAATTTAATGATATGAGAAATGGATGATT TCT17 

   ATAT ATACAAATAATGGGATTTAACGATATAGAGAGTGAATAATT TTTT 

   AAAT ATACAGATAATGGGATTTAACGATGTGAGAGATAGATAATTA T16 

 AAAAAT ATACAGATAATGGGATTTAACGATGTGAGAGATAGATAATT T05 

   AAAT ATACAGATAATGGGATTTAACGATGTGAGAGATAGATAATTAAT T13 

   

 ATAT GATAAAACAACACTATTATGAGAACGAGTGATAGAATATAGATAAT TTCT14* 

 ATAT AATAAAACAACACTATTACAAAGATAGACAGTGAGATATAGATAAT T13 

ATACAT ATAAAACAACACTATCATAGAAGCAGACAGTGAGATATGAGTAAT T15 

 ATAT AATAAAACAACACTATTACAAAGATAGACAGTGAGATATAGATAAT T07 

   AT AATAAAACAACACTATTACAAAGATAGACAGTGAGATATAGATAATG AT06 

 ATAT AATAAAACAACACTATTACGAAGATAGACAGTGAGATATAGATAAT T12 

   AT AATAAAACAACACTATTACGAAGATAGACAGTGAGATATAGATAATG AT16 

   

  ATAT AAAACCACAAAAATAGAAAGCTATAATAGAGATAGAATAATGTTA A07 

     ATTTTAAGTT AGAGTGAGAAATTGTAGTGGAAATAAGATGATA AAAT11 

  ATAT AAAACCACAAAGGTAGAAGATCGTAATAGAGATAGAATAATATT T09 

AT AACAAAAACCACAGAGATGAGAGATTGTAATAAG TATAGTGATAAT13 

  ATAT AAAACCACAAAAATAGAAAGCTATAATAGAGATAGAATAATGTT T12 

  ATAT AAAACCACAAAAATAGAAAGCTATAATAGAGATAGAATAATGTTATT CT09 

     AT AAACCACAAAGATAGAAGGCCATAATAGAGATAAAATAATATT T10 

   

 


5' 
322 
324 
321 
322 
322 
321 
320 
324 

 

 

347 
345 
355 
350 
349 
355 

402 
402 
402 
403 

3' 
369 
369 
369 
369 
370 
370 
368 
370 

 

388 
388 
388 
388 
388 
388 

 

438 
438 
435 
438 

Reads 
48,018 
5,034 
3,062 
142 
721 
291 
149 
43 

 

95 
82 

1,307 
633 
376 
104 

 

417 
128 
753 
126 

ND3 gRNA Sequences cont. 
ATAT AATACCACATGAATCTTATATGTACGATGGAAGATGAGAATTAT T13 

ATAT AATACCACATGAATCTTATATGTACGATGGAAGATGAGAATT T11 

ATAT AATACCACATGAATCTTATATGTACGATGGAAGATGAGAATTATG TTTCT06 

ATAT AATACCACATGAATCTTATATGTACGATGGAAGATGAAAATTAT TCT10 

 AT AAATACCACATGAATCTTATATGTACGATGGAAGATGAGAATTAT T12 

    AAATACCACATGAATCTTATATGTACGATGGAAGATGAGAATTATG CAGT08 

 ATAT ATATCACACAAATTCTATACATATAATAGAGAATGAGAGTTACAA T06 

 AT AAATACCACATGAATCTTATATGTACGATGGAAGATGAGAATT T05* 

   

  ATATGTAAAC AATATACGTGATTTCAGAGATACTATGTGAATTCTAT T22 

  ATATGTAAAC AATATACGTGATCTCAGAGATACTATGTGAATTCTATGT T07 

ATATAAACAAAC AATATATGTGGTTTCGAAAGTGTCATGTGAATT AT12 

      ATAGAT AATATACGTGATCTTAGAGGTACCATGTGAGTCT GAGTTAT12 

             AGTATGCGTGATTTTAGAGATATTGTATGAATTTT T08 

             AGTATGCGTGATTTTAGAGATATTGTATGAATT ATAT15* 

   

ATATA TATAATACAACAAGGAGCGTCATAAGTAAAGTGAA TTCGTTATAT13 

 ATAT TATAATACAACAGAAAATGTCATAAGTGAGATGAA TTCGTTATAT09 

ATATATAA AATACAACAAGAGACGTCGTAAATAGAGTAAA TTCGT08 

 ATAT TATAATACAACAGAAAATGTCATAAGTGAGATGA TTCGTTAT06 

 


 
 

 

H) NADH Dehydrogenase 7

5' 
36 
35 
36 
36 
38 
28 
27 
31 
34 
36 
24 
24 
28 

59 
58 

108 
95 

124 
124 
139 
121 
139 
121 
122 

152 
147 
150 
151 
147 
152 

246 
246 

261 
261 
261 
260 
261 

297 
295 
292 
292 
292 
293 

327 
327 
327 
327 
330 
329 

352 
353 
352 
354 
352 

390 
391 
390 
391 

5' 

 

 

 

 

 

 

 

 

 

 

 

 

3' 
69 
69 
71 
69 
71 
71 
71 
71 
71 
69 
71 
71 
71 

 

91 
91 

 

137 
132 

 

170 
168 
170 
166 
170 
166 
166 

 

190 
199 
199 
199 
179 
190 

 

269 
269 

 

311 
293 
310 
293 
293 

 

338 
338 
334 
324 
324 
324 

 

373 
365 
365 
373 
378 
378 

 

398 
402 
385 
402 
402 

 

424 
427 
424 
424 

 

3' 

Reads 
100,761 

765 
240 
121 
113 

35,079 
20,487 

546 
402 
354 
327 
298 
194 

 

4,376 
1,242 

 

13 
1 

 

3,944 
3,640 
1,507 
396 
384 
178 
26 

ND7 gRNA Sequences 

   ATATA ATAAATGTAAAGAGACTATTGAGAGTGGCATAAG TGATATTAT14 

   ATATA ATAAATGTAAAGAGACTATTGAGAGTGGCATAAGG GATATTAT11 

  AAAT ATACAAATGTAAAGAAGCTATCAGAGGTAATATAAG TGATATAAT13 

   ATATA ATAAATGTAAAGAGACTATTGAGAGTGACATAAG TGATATTAT13 

  ATAT ATACAAATGTAAAGAGACTATCGAGAGTGACATA TGTGATATTAT11 

  AAAT ATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATGATATT AT12 

  AAAT ATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATGATATTT T08 

ATAAAT ATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATGAT TTTT 

  AAAT ATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAAT T15 

   ATATA ATAAATGTAAAGAGACTATTGAGAGTGGCATAAG TGATATTAT12 

ATAAAT ATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATGATATTTGTT TTTT 

  AAAT ATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATGATATTTATT T11 

  AAAT ATACAAATGTAAGAGAACTGCCAAAAGTAACGTAGAATGATATT ATCT09 

   

ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAA TGT13 

ATATATAAATACA GTGGACTAAATGTGAAGCGATATAAATGTGAAA T12 

   

ATATAATAAACAACATAAAGTGCCATGTACT#CGAGAT TTTTAG 

  AT ATAAACAACATAAAATACTATGTGATGTAGGAT CTGTGAATTAAT09 

 

ATAC ATCAATATAAACGATAGATTTACCGTAGAAGTATAGTGAATAAT T14 

 ATATA CAATATAAACAGTAGATTCACTGCAGAAGTATGATAGATAAT T11 

  AT ATCAATATAAACAATAAGTTCGTCATAGA TTTACAGTAGATAAT12 

  ATACAT ATATAAACAATGAATTCACTGTGAAGATACGATAGATGATATA T14 

  AT ATCAATATAAACAATGAGTTCGTCATAGA TTTACAGTAGATAAT12 

    ATAT ATATAAACAATAAATTCATCGTGAAGGTACAGTAAATGATATA T16 

    ATAT ATATAAACAATAAATTCATCGTGAAGGTACAGTAAATGATAT T15 

 

   

988 
563 
200 
2,089 
295 
143 

  AAATTACGATGCAT AATAATCTATGGTACAGTTGATATGAGTGATAA T12 

ATATATA CACGATGCAGATAATCTATAGTATGATTGATATAAGTGATAAATTT T09 

ATATATA CACGATGCAGATAATCTATAGTATGATTGATATAAGTGATAAAT AT12 

    AAA TACGATGTAAATAACCTGTAGTATAGTTAGTGTAGATAGTGAA AT12* 

   ATATATAAATGCAAATAACG TGTAATACAGTCAATATAGATGATAAATTT T09 

  AAATTACGATGCAT AATAATCTATGGTACAGTTGATATGAGTGATAA T13 

 

   

2,470 
837 

 

3,259 
2,874 
146 
133 
1,105 

 

888 
765 
91 
394 
178 
90 

 

2,128 
1,225 
20,110 

997 
517 
111 

 

283 
3,043 
602 
556 
173 

ATATCAAC ACATAATCTGACTTGTCGGAGTAT CTAAAGGAATAAAT14 
ATATCAAC ACATAATCTGACTTGTCGGGGTAT CTAAAGGAATAAAT12† 

 

ATAT AACATAAAGACAATAAGTGCTTATTACAGTGAACATTGATATAATTT T08 

     ATATATAAGACAAC GATGCTCATTATGATAGATACTGATGTAATTT T10 

 ATAT ACGTAAAGACAATAGGTGTTCATTGCAGTAGATATTGATGTAATTT T10 

     ATATATAAGACAAC GATGCTCATTATGGTAGATACTGATGTAATTTA T07 

   ATATATATAAGACAAC AGTGCTCGTTACAGTGAATATTGATGTAATTT T11* 

   

   AT ATAAATAACATCGCAACGTATATTCGAAATGTAGAGATA CAT13 

ATATA ACAAACAACATCGTAATATGTGCTCGGAGTATAGAGATAAT TAAATAT13 

    ATACT ACAACATCGCGATATATACTTGGAATGTAAAGGTGATAAA GT11 

   ATATATAACAACATCG AGCATATACTCAGAATATAAAGGTGATAAA GT12 

   ATATATAACAACATCG AGTATATACTCAGAATATAAAGATGATAAA GT09 

              ATCGG AGCATATACTTGAGATATAAAGATGATAA T11 

   

   A TATAATTAATAAACGTATAAATGTGCAGTGTGACGATGAATGATATT AT13 

      ATACAT ATAAACGTATAGGTGCATAATGTAACGATGAATGATGTT AAT11 

      ATACAT ATAAACGTATAGATGCGCAATGTAACGATGAATGATGTT AAT12 

   A TATAATTAATAAACGTATAAATGTGCAGTGTGACGATGAATGATATT T13 

AAA TTACAATTAATAGACGTATAAGTGCATAGTGTAGTGATGGATAAT T17 

AAA TTACAATTAATAGACGTATAAGTGCATAGTGTAGTGATGGATAATG AT13 

   

   ATATA GAAAACTACAGGTAAATTCTGCAATTAGTAGACGTGTAAAT AT07 

ATATA TATTAAAACTATGGGTAGATTCTGTAATTAATAGACGTGTAAA AT13* 

   ATATATAAAACTACGA GTAGATTCTATGATTGATGAACGTGTAAAT T11* 

ATATA TATTAAAACTATGGGTAGATTCTGTAATTAATAGACGTGTAA T12 

ATATA TATTAAAACTATGGGTAGATTCTGTAATTAATAGACGTGTAAAT TTCTTTT 

 

   

2,283 
200 
162 
298 

 ATATTAA ATACATGATATACGCGATAGATTATTAGAGTTATG AGTTAAT16 

AAATT ACCATACATGATATATACAGTGAACTATTAGAATTAT AGGTAATGT06 

 ATATTAA ATACATGATATGCGCAGTAGACTATTAAAGTTATG AGTTAAT12 

AAATTACA ATACATGATATATACAGTGAACTATTAGAATTAT AGGTAAT14 

 

   

Reads 

ND7 gRNA Sequences cont. 


452 
451 
452 
464 
450 
458 
450 

 

485 
485 

 

530 
526 
530 
528 

 

548 
544 
553 

 

569 
569 
569 

 

576 
574 
571 
571 

 

615 
615 
603 
596 
611 
615 
611 
615 
611 
611 
603 

 

630 

 

642 
640 
648 
642 

 

670 
674 
674 
669 
671 
671 

 

699 
701 

 

727 
711 
725 
727 

 

758 
741 
741 

 

764 
759 
765 
765 
765 

 

3' 

412 
414 
416 
416 
407 
414 
410 

453 
453 

486 
499 
486 
486 

508 
508 
508 

526 
526 
527 

540 
531 
540 
540 

564 
567 
562 
564 
567 
564 
564 
567 
563 
564 
568 

584 

596 
596 
596 
596 

629 
632 
630 
630 
630 
629 

656 
666 

679 
679 
679 
679 

711 
710 
709 

725 
731 
722 
719 
735 

 

5' 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

15,183 

223 
199 
5 

3,126 
2,332 
132 

 

3 
207 

 

2,523 
2,002 
1,136 
344 

     ATATATAA GGAGACAAATGATCTAGATTCGAGACTGTATATGATATAT  T13 

  ATATATAACAAC GAGACAGATAATCTAGATTTGAAGTTATATGTGATAT  T12 

    ATATATAA  GGAGACAAATGATCTAGATTCGAGACTGTATATGAT  T13 

AT ATTATAATAACGGAGATGAGCAATTTAGATTCAGAGTTATATGTGAT T13 

          ATATA AGACAAACAATCTAAATCTGAGACTGTATATGATATGTATAAT T10 

    ATAT ATAACGGAGACAGATAGCTTGAATCTAAGGTTGTATGTGATAT T12 

          ATATA AGACAAACAATCTAAATCTGAGACTGTATATGATATGTAT T13 

   

ATAAAATGTCATCAATTA GTTACGTTCTTTGAGTGATTGTGATAAT AT13 

 ATAAATAACATCAATTA GTTACGTTCTTTGAGTGATTGTGATAAT AT13 

   

ATATAA GCATACGACAATTACGATATGAGTCAGAGAATGTTGTTAATTT T11 

ATATATAA ATACGACAATCATAACGTGAATCAGAGA CTGTGATTAAT11 

ATATAA GCATACGACAATTACGATATGAGTCAGAGAATGTTGTTAATTT T11 

   ATATA ATACGACAATCACGATATAGATTAAAAGATGTTGTTAATTT T10* 

 

   

1,129 
116 
1 

 

3,050 
4,850 
351 

 

544 
291 
8,794 
1,694 

    ATAT AAATCATGAAAGCTGAGTGTGTATGGCAGTTACGATATA TGTTAAT14 

    ATATAAAA CATGGAAGCTAAGTGTGTATGATGATTATGATATA TGATTAAT06 

ATA TAATAAAACCATGAGAGTTGGATGTATATGATGATCGTGATATA T17 

 

ATATAT ATCATCAAGAATATCTAATGAGACTATGAGAGTTAAATGTATA AT12 

ATATAT ATCATCAAGAATATCTAATGAGACTATGAGAGTTAAATGTATA AT13 

ATATAT ATCATCAAGAATATCTAATGAGACTATGAGAGTTAAATGTAT T06 

 

ATAATAAT AAACAAAATCATTGAGAGTATCTGATAGAGTTATGA TAGTCAT05 

      AAAT ACAAAATCATCAGGGATACTTGGTAAGATTGTGAAAGTTAAGT T16 

      ATATAAT AAATCATCAAGAATATCTGATAGAACTGTGA TTGCTAAT14* 

      ATACAAT AAATCATCAAGAGTATCTAATAGAACTGTGA TTGCTAAT14 

 

   

1,896 
360 
267 
173 
1,316 
1,043 
754 
703 
188 
146 
131 

 

831 

 

20 
512 
159 
32 

 

1,258 
463 
419 
299 
101 
65 

 

59 

2,991 

 

2 
1 

1,728 
373 

AT ATATTATCAACAACAATAGAAGATTGGCGAAATTAGAGATAGAATTATT T12  

AT ATATTATCAACAACAATAGAAGATTGGCGAAATTAGAGATAGAATT T09 

      ATATT ACAACAACAAGAAATCAATGAAGTCAGAGATAAAGTTATTAA T15 

  TAGATATCAACAACAT CAGAGAATCAATGAAACTAGAGATAGAGTTATT T10 

 ATACA TATCAACAACGACAAGAGATCAGTGAAATTAGAAGTAAAGTT T13* 

AT ATATTATCAACAACAATAGAAGATTGACGAAATTAGAGATAGAATTATT T10 

 ATACA TATCAACAACGACAAGAGATCAGTGAAATTAGAAGTAAAGTTATT T09* 

AT ATATTATCAACAACAATAGAAGATTGACGAAATTAGAGATAGAATT T15 

 ATACA TATCAACAACGACAAGAGATCAGTGAAATTAGAAGTAAAGTTATCA T06 

 ATACA TATCAACAACGACAAGAGATCAGTGAAATTAGAAGTAAAGTTATC T12 

      ATATT ACAACAACAAGAAATCAATGAAGTCAGAGATAAAGT AT08 

   

 ATAAT TAACAAACAAATATGATATTATCAGTGACAGTGAGAAATTGATA T15 

 

  ATA TATAACAATCCATAGCAGATAGACGTGATATTATTGATGATAGT TTAAAT18 

   AAAT TAACAATCCATAACAAGTGAGCGTGATATTGTCAGTGATAAT T11 

ATAAATCATAACAATCTATAATAGACGAGCGTGATATTGTCAATGATAAT T10 

  ATA TATAACAATCCATAGCAGATAGACGTGATATTATTGATGATAAT T12 

   

    ATAT GATAAACGATTACCTACAGATAATGAGTCATAGTGATTTATA T13 

ATAT ATAAAATAAACGATTACCTGTGAATGATAGATTATGATGATTT T11 

ATAT ATAAAATAAACGATTACCTGTGAATGATAGATTATGATGATTTAT T26* 

   ATACAT ATAAACGATTACTCATAGATAGCAAGTCATAGTGATTTAT T08* 

   ATTT AAATAAACGATTACTTACGAGTGACAGATTGTGATGATTTAT T14 

 ATATTT AAATAAACGATTACTTACGAGTGACAGATTGTGATGATTTATA T15 

   

ATATAT ATGACAAACTACGTAAGTGCAGATAAAGTAAGTGATTATTT T12 

ATAT AGATGACAAACCATGTAGACGTGAATAAGATAG TGATTGATACAT12* 

   

AAAAAA AAAAACTAAATCATATAAATTAAAGGGTGATGAACTGTGTAAAT T13 

                     ATAAGTCAGAAAGTGATAGATCGTGTAAAT T15 

         AAATTAAATCATATAAGTTGGAAGATGGCAGATTGTGTGAAT TGTAGT16 

 AAAAA AAAAACTAAGTCATATAAGTCAGAAAGTGATAGATCGTGTAAAT TTT 

 

   

3,229 

22 
2 

 

55,757 
2,867 
8,175 
552 
115 

ATATAT ACGAGACAAAATATCACTTAGATTATTAGAGATTGAGTTATA AT13 

                     AA TTAGACTATTAGAGATTGAGTTATAT TTT 

   AAAACGAGACAAGATACCAA TTAGACTATTAGAGATTGAGTTATATA T14 

   

ATATAA TAACGAACGAGGCAGAGTATCATTTAGACTATTAGA TTCAAT14 

  ATATATAAT GACGAGATAAGACATCACTTAGACTGT AGAGAT14 

ATATA TTAACGAACGAGATAGAACATTGCTTGAGTTATTGAGAAT AT14* 

ATATA TTAACGAACGAGATAGAACATTGCTTGAGTTATTGAGAATT T12 

  ATA TTAACGAACGAGATAGAACATTGCTTGAGTT TT 

 

 

Reads 

ND7 gRNA Sequences cont. 


794 
794 
794 
794 
794 

 

806 
816 

 

830 
822 
830 
833 
833 
833 
830 
830 
830 

 

845 
839 
839 
844 
839 
844 

 

877 
877 
865 
857 
865 
857 
879 

 

902 
902 

 

 

916 
919 
919 
919 
919 

 

939 
931 
939 

 

947 
951 

 

978 
988 
978 
986 
978 
988 
978 
988 
983 
983 
983 
983 

 

 
 
 
 
 
 
3' 

1000 

 

 

 

 

 

 

 

 

 

756 
757 
751 
755 
756 

772 
782 

778 
777 
775 
781 
780 
778 
778 
778 
778 

792 
790 
790 
792 
790 
790 

843 
834 
829 
831 
832 
832 
834 

863 
866 

 

872 
872 
867 
872 
871 

901 
899 
901 

907 
907 

932 
934 
931 
933 
940 
936 
941 
937 
934 
933 
936 
937 

 

 
 
 
 
 
 
5' 
959 

 

48,263 

318 
174 
167 
2,829 

 

367 
77 

 

25,793 
5,729 
2,644 
2,373 
2,079 
903 
450 
331 
134 

 

98 
52 
343 
190 
99 
89 

 

26,437 
1,279 
188 
121 
23 
20 
1 

AT ACTAAATAAACGACGATCTTATACTGTATCTGATGAATG TGATATTAAT14 

AT ACTAAATAAACGACGATCTTATACTGTATCTGATGAAT ATGATATTAATTTT 

AT ACTAAATAAACGACGATCTTATACTGTATCTGATGAATGGGATA TTAAT14 

AT ACTAAATAAACGACGATCTTATACTGTATCTGATGAATGA TATTAATTGT07 

AT ACTAAATAAACGACGATCTTATACTGTATCTGATGAATG TGATAT14* 

   

ATATA TCATAACAACTAGATAAACGATGATCTCACA GTATAGTTAAT12 

 ATAACTTATAACAACTAGATAGACGG GAATCTTATATTGTAGTTAAT16 

 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAAGTGAATGATGAT AT12 

          ATATAT TAAAATACAACTTATGATGACTAAGTGAATGATGATT CAAT10 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAAGTGAATGATGATTTT T08 

AAATA ATAACAAAACATGAGATATAACTTGTAGTGATTAGATGAATGAT T11 

AAATA ATAACAAAACATGAGATATAACTTGTAGTGATTAGATGAATGATA T13 

AAATA ATAACAAAACATGAGATATAACTTGTAGTGATTAGATGAATGATAGT AT13 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAGGTGAATGATGAT ATTAAAT13 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAAGTGAACGATGAT ATTAAAT14 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAAGTAAATGATGAT AT07 

   

ATAT AAAACAATAATCATGATGAGATATAAAGTGCAGTTTGTGATAATT T10 

      ATAT ATAATCATAACAAGATGTAGAGTACGATTTATAGTGATTAA T12 

      ATAT ATAATCATAACAAGATATAGAGTACGATTTATAGTGATTAA T11 

 ATAT AAACAATAATCATGATGGGATATAAAGTGCAGTTTGTGATAATT T10 

      ATAT ATAATCATAACAGAGCATAGAATACAGTTTATAGTGATTAA TTGT05* 

 ATAT AAACAATAATCATGATGGGATATAAAGTGCAGTTTGTGATAATTAA TAT05 

   

  ATAT ATAAACGGTCAAATGTATTACTTATAAGATAGAG TTGATAAT14 

    AT ATAAACGGTCAAATGTATTGCTTGTAAGGTAGAGGTGATAATT T13 

    ATATATAACGATC AATGCATTATCTATGAAGCAAGAACAGTGATTATAAT T15 

                 AATGTGTG ACCTATAAAGTGAGAATAATGATTATA T12 

    ATATATAACGATC AATGCATTATCTATGAAGCAAGAACAGTGATTAT TTTT 

                 AATGTGTG ACCTATAAAGTGAGAATAATGATTAT T11 

ATAT AAATAAACGATTAGATGTATCACTTATAGAATAAAAATGATGATT T15 

 

   

2,491 
126 

 

 
3 

3,227 
528 
295 
142 

    ATATA ATATGCATATCAGATAGACGTAGAAATAAGTGATTAAAT T10 

ATATAATAA ATACGCATATTAGATGAGTGTAAAAGTAAGTGATTA T10 

 

 

ATATAA AATCAACAAATTCGTATGTATATCGAGTAAATGTAGAAATAAAT T09 

 A TACAAATCAACAAATTCGTGCGTATATTAAGTGGGTGTAGAAGTGAAT T17* 

 A TACAAATCAACAAATTCGTGCGTATATTAAGTGGGTGTAGAAGTGAATGATT T15 

 A TACAAATCAACAAATTCGTGCGTATATTAAGTGAGTGTAGAAGTGAAT TAAAT19 

 A TACAAATCAACAAATTCGTGCGTATATTAAGTGGGTGTAGAAGTGAATG T15 

 

   

1,365 
292 
249 

 

24 
64 

 

44,852 
11,706 
3,513 
2,467 
182 
175 
137 
97 

18,961 
1,729 
472 
397 

 

 
 
 
 
 
 

Reads 
173,548 

 ATATAT ACTAATAAAAAGGCATTGCTTACAGATTGATAGATTTAT T12 

ATATATTACCAACAT AGAGACATTGCTTATGAGTTAACAGATTTATAT T12 

 ATATAT ACTAACAAAAAGATATTGCTTATAGATCAGTGAATTTAT T11 

 

             ATAAAGAAACCAACAGAAGAATATTGCTTGTAAGTTAATGA TCTTGTAT10 

CCAAGGCA AAAGATAAAGAAACCAACAGAAGAATATTGCTTGTGAGTTAATGA TCT17 

   

     ATATAT TCAAACAAACAGATAGAACCGGAGACGAGAAGATTGATAA T13 

  ATATAAATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATTAAT T12 

     ATATAT TCAAACAAACAGATAGAACCAGAGACGAGAAGATTGATGAA T12 

    ATAAATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATTAATA T12 

     ATATAT TCAAACAAACAGATAGAACCGGAGACGAGAAG CTTGATAAT05 

  ATATAAATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATTA TTAT15 

       ATAT TCAAACAAACAGATAGAACCGGAGACGAGAA TATTGATAATTTAAT14 

  ATATAAATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATT TAT13 

ATATAT AATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATTAAT T12* 

ATATAT AATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATTAATA T05 

ATATAT AATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATTA T14 

ATATAT AATAATCAAACAGACGAATGAAACTAGAGATAGAGAAATT T11 

   

 

 

 

 

 

 
ND7 gRNA Sequences cont. 
  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATAGATGAAA AT14 


1000 
1000 
1000 
1000 
1000 
1000 
1000 
1000 

 

1017 

 

1032 
1043 

 

1043 
1043 

 

1067 
1067 
1078 

 

1085 
1085 

 

1121 
1113 

 

1143 
1143 
1142 
1131 
1143 
1143 
1143 
1143 
1142 
1143 
1143 
1143 
1143 
1143 
1143 
1143 
1143 

 

1145 
1148 
1157 
1146 
1146 
1146 
1146 
1146 
1154 
1146 
1146 
1146 
1152 

 

1167 
1182 
1182 
1167 
1167 

 

1197 
1195 

 
 
3' 

1218 
1224 

952 
960 
966 
964 
963 
961 
962 
959 

983 

1001 
1000 

1015 
1013 

1032 
1030 
1032 

1057 
1055 

1089 
1087 

1099 
1094 
1099 
1099 
1101 
1105 
1106 
1108 
1094 
1099 
1107 
1099 
1099 
1099 
1099 
1099 
1095 

1108 
1094 
1107 
1107 
1108 
1111 
1114 
1110 
1108 
1107 
1120 
1113 
1108 

1128 
1136 
1138 
1128 
1131 

1150 
1150 

 

 

 

 

 

 

 

 

 

 

 
 

5' 

1181 
1183 

 

748 
472 
412 
372 
371 
265 
234 
3,928 

 

101 

 

3 
1 

 

12 
208 

 

6,797 
2,406 
751 

 

44 
40 

 

1 
412 

 

81,287 
6,615 
5,793 
705 
550 
520 
318 
303 
289 
259 
177 
163 
162 
128 
98 
356 
212 

 

569 
295 
82 

90,559 
10,003 
2,224 
2,220 
1,887 
978 
400 
266 
243 
232 

  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATAGATGAAATT T09 

  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATAGATGAA T13 

  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATA TATGAAAAT13 

  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATAGA AAAAAT14 

  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATAGAT T15 

  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATAGATGA T12 

  ATAAA GGTAATATCACAGTGTAGATAGTCGGATAGATAGATG TAAATTTAT09 

ATATAAA GGTAATATCACAGTGTAGATAGTCGGATAAATAGATGAAA AT11 

 

ATATATAAATAACATAT TAATGGTCTTGATGGTGATATTGTAGTATAA T12 

 

           ATATA TATAAAATAACGTGATGATGGTCTCGAT AGTAATTGATAAT08 

 ATAT ACACCACAAACTGTAGAATGACGTGATAGTGGTTTCAGTG ATAAT15 

 

  AT ACATCACAAACTATAAAGTAATATGATAA GGGTTTCGAT15 

ATAT ACACCACAAACTGTAGAATGACGTGATAGTG AT18 

   

ATATATACAAGC AATGATGTACTCGGTAAATAGTGACACTGTGAATT T12 

ATATATACAAGC AATGATGTACTCGGTAAATAGTGACACTGTGAATTAT T12 

 A AATATAAGCAAATGATGTATTCGGTGAGCAGTGATATCGTAAATT T10 

   

   ATTAGAAA GGTGTTCGGTATAGGTAGATGATATAT AT10 

CTCATTAGAAA GGTGTTCGGTATAGGTAGATGATATATTT TTT 

 

AT ATATGATAACAAACAATACTTACTTT GAGGTGTTTTGTGAGTAAAT14 
  ATA AATAACAAACAATATTCGTCTTTG AGATATTCGATATAAGTAAAT12† 

   

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAACAAGTGATAT AT14 

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAACAAGTGATATTT T10 

ATATAAA AGAACATAAACTGACACGAGGGTATAGTGATGAATGATAT AT13 

 ATATAAGAGAACATAAA CAGCATAGAGGCATAGTAACAAGTGATAT AT14 

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAACAAGTGAT T13 

 TATAA GAGAACATAAACTAGCATAGAGGCATAGTAACAAG GGATATAT13 

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAACAA T11 

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAAC T11 

ATATAAA AGAACATAAACTGACACGAGGGTATAGTGATGAATGATATTT T10 

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAACAAGCGATAT ACT12 

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAACA T12 

ATATAA GAGAACATAAACTAGCATAGAAGCATAGTAACAAGTGATAT AT05 

ATATAA GAGAACATAAACTAGCATAGAGGCATAGTAACAAATGATAT AT05 

ATATAA GAGAACATAAACTAACATAGAGGCATAGTAACAAGTGATAT AT14 

ATATAA GAGAACATAAACTAGCATAGAGACATAGTAACAAGTGATAT AT14 

ATATAA GAGAACATAAATTGACATGGAAGCATAGTAATAAGTGATAT AT12* 

ATATAA GAGAACATAAATCAGTGCAAAGGTATAGTAGTGAGTGATATT AAT09* 

 

  ATATAAACGTAT ACGAGAGCATAGATCAGTGTGAGAATGTAGTAAT T14 

   ATATAAAA TAAACGAGAATATAAACTGATGTAGAGATATAGTGATAAGTAATATTT T08 

AT ATGTAAATGTAAACGAGAATATAGATTGATGTAGAGATATAGTAATA T13 

  ATATTAAACGT AACGAGAATGTGAACTGACATAGAGATATGATAATA T14 

  ATATTAAACGT AACGAGAATGTGAACTGACATAGAGATATGATAAT T14 

  ATATTAAACGT AACGAGAATGTGAACTGACATAGAGATATGAT T10 

  ATATTAAACGT AACGAGAATGTGAACTGACATAGAGATAT T14 

  ATATTAAACGT AACGAGAATGTGAACTGACATAGAGATATGATA T16 

ATATA TAAACGTAAATGAGAATATGAATCAGTGTGAAAATGTAATAAT T07* 

    ATTAAACGT AACGAGAATGTGAACTGACATAGAGATATGATAATG T13 

  ATATTAAACGT AACGAGAATGTGAACTGACATAGAGAT T13 

  ATATTAAACGT AACGAGAATGTGAACTGACATAGAGATATG TTTT 

   ATAT AACGTAAACGAGAGCATAAATTGATGTGAAGATGTGATAAT TTCTTTT 

 

   

3,079 
1,834 
866 
337 
203 

            ATATAT AATCCGTACAATGCGAACGTAGACGAGAATATGAGTTAAC T14 

ATAT ATAAATATGCAAGAAATCTGTATGATGTAGATGTGAATGAGAATAT T10 

ATAT ATAAATATGCAAGAAATCTGTATGATGTAGATGTGAATGAGAAT T11 

            ATATAT AATCCGTACAATGCGAACGTAGACGAGAATATGAGTTAAT TTT 

            ATATAT AATCCGTACAATGCGAACGTAGACGAGAATATGAGTT T19 

 

   

213 
1,077 

 
 

Reads 
4,103 
1,399 

AT ATAAACATCCAATAGACGAGTATGTGAGAGATTTGTATGATGTAAAT T10 

     AAATATCCAATAAACAGATATGTAGAAGGTCCGTATAATGTGAAT ATAT13 

   

 

  ATATATAA ATGCAATAGAAGATCACGCAAATAGATATCTGATAAAT T13 

 ATA TAAATCATGCAGTAAAAGACTATGTAGATGGATATTCAGTGA TAAT14 

ND7 gRNA Sequences cont. 


1183 
1183 
1183 

1210 
1233 
1233 

1240 
1251 
1242 
1242 
1240 
1240 
1239 
1240 
1241 

1269 
1269 

 

 

 

1223 
1224 
1224 

 

1257 
1257 
1257 

 

1268 
1282 
1283 
1270 
1268 
1270 
1268 
1268 
1268 

 

1320 
1320 

1,017 
138 
2,561 

 

123 
167 
83 

 

95 
63 
6 
296 
216 
54 
25 
23 
12 

 

6 

AAATA AAATCATGCAGTAGAGAACCGTGTAAGTGAGTATCTGATAA TTAT11 

 ATA TAAATCATGCAGTGAAGAGCTACGTAAATGGATATTCAGTGA TAAT11 

 ATA TAAATCATGCAGTAAAAGACTATGTAGATGAATATTCAGTGA TAAT14* 

   

ATAT ACAACATCAATATTACTTAGAACGGTAACTAGATTGTGTAATAA T14 
ATAT ACAACATCAATATTACTTAGAACG ATAACTAGATTGTGTAAT10† 
ATAT ATAACATCAATATTACCTAGAGTG TCAGTTAGATTATGTGATAAAT14† 

 

       AAACTAACGATATT CGGATCTGAGAGTAACATTGATATTATTT T07 

ATATATAT AACTAACGATCTATGGGTTTAAAGACAGTGT GAAT12 

     AC AAACTAACGATTTACGGATTTAGAGACAGTGTTAATGTTAT AT13 

                  A TACGGATTCAGAAGTGATATTGATGTTAT AT15 

       AAACTAACGATATT CGGATCTGAGAGTAACATTGATATTATTT T 

     ATATAAACTAACGA TACGGATTCAGAAGTGATATTGATGTTATTT T16 

                  ATT CGGATCTGAGAGTAACATTGATATTATTTA T17 

                  ATT CGGATCTGAGAGTAACATTGATATTGTTT TT 

               GATATT CGGATCTGAGAGTAACATTGATATTATT AT15 

   

ATA TAAACAATCCTACAATGATCTCGTGTATAAGACTGATGATTTA AT12 

1,074 

ATA TAAACAATCCTACAATGATTTCGTGTATAAGACTGATGATTTA AT17 

 


 
 

 

I) NADH Dehydrogenase 8 

3' 
58 
68 
56 

 

98 
98 
97 
97 
98 
97 
97 
97 

 

136 
133 
133 
135 
131 
136 
136 
139 
139 
139 
139 
132 
131 
133 
132 
138 
138 

 

153 
153 
153 
153 
153 
153 
152 

 

170 

 

186 
187 
187 
198 
207 
199 
199 

 

230 
229 
230 
220 
230 
230 
230 
228 
228 
229 

 

244 
239 
245 
254 

 
 
 

5' 
34 
29 
28 

55 
55 
54 
55 
57 
57 
54 
59 

87 
84 
87 
87 
84 
87 
96 
86 
85 
86 
87 
88 
84 
92 
92 
86 
85 

111 
111 
111 
111 
115 
113 
111 

117 

158 
161 
161 
161 
161 
161 
160 

186 
186 
186 
186 
194 
195 
196 
186 
187 
189 

213 
209 
219 
219 

 

 

 

 

 

 

 

 
 
 

 

Reads 

2 
1 
9 

 

2,577 
274 
236 
100 
98 
65 
57 
29 

ND8 gRNA Sequences 

       GTGGG ATATGAAAGTAAGAGAATAAAAAAA ATTAAT13 

AA AAAAAAAAACATACAGAAATAGAAAGATAAGAAAGTGATA  TATTAT08 

           AAA ATAGGAGTAGGAGGATGAGAAAATGATAG AGGGATTTT* 

   

ATAT AAAACAAACAAAAAGAAGAAACAAGAAATTGAAGAGAGATATAT T13 

ATAT AAAACAAACAAAAAGAAGAAGCGAGAAATTGAAGAGAGATATAT T12 

 ATAT AAACAAACAGAAAAAGAAAACAAAGAGTCAGAGAAAGATATATA T17 

 ATAT AAACAAACAGAAAAAGAAAACAAAGAGTCAGAGAAAGATATAT T10 

ATAT AAAACAAACAAAAAGAAGAAGCGAGAAATTGAAGAGAGATAT T05 

 ATAT AAACAAACAGAAAAAGAAAACAAAGAGTCAGAGAAAGATAT T05 

 ATAT AAACAGACAGAAAAAGAAAACAAAGAGTCAGAGAAAGATATATA TGTAATTATTTT* 

 ATAT AAACAAACAGAAAAAGAAAACAAAGAGTCAGAGAAAGAT T09 

 

   

6,039 
633 
593 
394 
380 
292 
191 

11,689 
1,938 
1,171 
816 
612 
598 
267 
267 
221 
196 

       ATAAATAGTAACACAATGAGCAGAGTACGTATAAGAATGAGTAAA T14 

       A AAATAGTAATACAACAGACAGAGCATATATAGAAATAAGTGAGAAA T16 

         AAATAGTAACACAATGAGCAGAGTACGTATAAGAATGAGTAAA T15 

        TAAATAGTAACACAATGAGCAGAGTACGTATAAGAATGAGTAAA T12 

        AT ATAGTAATACAACAAACGAGATACGTATAGAAATAGATGAGAAA T13 

       GTAAATAGTAACACAATGAGCAGAGTACGTATAAGAATGAGTAAA T12 

       ATAAATAGTAACACAATAGACGAGATACGTGTAGAA TAAGTGATTTAAT11 

ATA TAAACAAATAGTAACATGACGGATAGAACGTATATGAGAATGAGTAAAA T12* 

ATA TAAACAAATAGTAACATGACGGATAGAACGTATATGAGAATGAGTAAAAG T13 

  A TAAACAAATAGTAATATGACGAATGAAGCGTATATGAGAATAAGTAAAA TTTT* 

ATA TAAACAAATAGTAACATGACGGATAGAACGTATATGAGAATGAGTAAA TTTAT14 

     ATAT AATAGTAACACAACGAATAGAACATGTATAGAGATGAATGA TAT17 

      ATAT ATAGTAACACAATAAATGAGACATATATGAAGATGAATGAGAAA TTTT* 

   ATATA GAATAGTAACACAGCAGATAAGATACATATAGAGATAA TGACAGT07 

   ATATAT AATAGTAACACAGCAGATAAGATACATATAGAGATAA TGACAGT14 

  AA AAACAAATAGTAATATGACGAATGAAGCGTATATGAGAATAAGTAAAA T06 

  AA AAACAAATAGTAATATGACGAATGAAGCGTATATGAGAATAAGTAAAAA T10 

 

   

70,659 

ATATATAAATGGTA AACTCAATGGGTGGATAAGTAGTAATGTGATGAAT T13 

479 
245 
147 
127 
121 
804 

 

4 

 

169,806 
124,070 

799 
66 

2,257 
966 
804 

ATATATAAATGGTA AACTCAATGGGTGGATAAATAGTAATGTGATGAAT TTT 

ATATATAAATGGTA AACTCAATGGGTGGATAAGTAGTAATGTGATAAAT T12 

ATATATAAATGGTA AACTCAATGGGTGGATAAGTAGTAATATGATGAAT TGTAATAT15 

ATATATAAATGGTA AACTCAATGGGTGGATAAGTAGTAATGTGAT T14 

ATATATAAATGGTA AACTCAATGGGTGGATAAGTAGTAATGTGATGA T15 

A  TATACAATGGTT ATTCAGTGGGTAGACAGATAGTGATATGATAGAC TTAT12 

   

ACAT ATAAACTAACAATGGTTGATTTAGTGAGTGAATGAGTAGTAATATT T11 

 

       ATAT GAACGCAAAGATGGATTACCACGAGTTAGTAAATTGATGAT AT14 

    ATATAAT AAACGCAAAATATGGTTACTATGAACTGATAGATTAAT T12 

            AAACGCAAAATATGGTTACTATGAACTGATAGATTAAT T11 

    ATATAAT AAACGCAAAAAATGGTTACTATGAACTGATAGATTAAT T14 

ATATA TAATAAAGACGCAAAAGATGGTTACTGTGAATTGATGAGTTAAT T12* 

 ATATATAGT AAAACGCAGAAAATGGTTACTATGGATTGATGAATTAAT T16 

   ATATAGT AAAACGCAGAAAATGGTTACTATGGATTGATGAATTAATG T15* 

 

   

24,763 

  ATA CAATACAACGCTCTGAATCATATCGATAAAAGTGTGAGAAAT T13 

182 
109 
204 
160 
149 
199 
161 
52 
31 

 

173 
19 
10 
1 

 
 
 

       AATACAACGCTCTGAATCATATCGATAAAAGTGTGAGAAAT TTAAT12 

  ATA CAATACAACGCTCTGAATCATATCGATAAAAGCGTGAGAAAT T11 

  ATA CAATACAACACTCTGAATCATATCGATAAAAGTGTGAGAAAT TTAAT10 

  ATA CAATACAACGCTCTGAATCATATCGATAAAAGTG GGAGAAAT11 

  ATA CAATACAACGCTCTGAATCATATCGATAAAAGT TGAGAAATTTAAT11 

  ATA CAATACAACGCTCTGAATCATATCGATAAAAG AGTGAGAAATTTAAT05 

      T ATACAACGCTCTAAATTATACCAGTGAAAATGCGAGAAAT T14 

ATATTAT ATACAACGCTCTAAATTATACCAGTGAAAATGCGAGAAA AT11* 

 ATATA AATACAACGCTTTAGATCATATCAGTGAGAGTGTGAAA T13 

   

        A ACATAAACGACAGGTAATATGATGTTCTGAAT GATATCGATAAT06 

          ACAT AACGACAAGTGATATAACGTTTTAAGTTACA GTGATAAT19 

   AAATA TACATAAACGATGAGTAATATGACGTT GTAGAT13 

AAATTAAATCACATAAATGACGAGCGATACAGTGCT GTAGATGATACT06 

   

 

 


3' 
288 
267 
271 
285 
291 

 

318 
319 
317 
318 
318 
319 
314 
311 
321 
319 
319 
318 
318 
318 

 

353 
344 
350 
344 
344 

 

364 
358 
355 

 

382 
386 
382 
373 
385 
385 
385 

 

417 
414 

 

431 
434 

 

442 
443 
441 

 

466 
466 
466 
466 
466 
482 
462 
462 
462 
462 
462 
464 
480 
462 

 

503 

 

512 
510 
512 

 
3' 

5' 
240 
237 
246 
259 
270 

 

276 
276 
277 
279 
275 
279 
276 
276 
284 
287 
275 
289 
276 
275 

310 
301 
310 
316 
301 

331 
325 
325 

338 
350 
339 
341 
338 
342 
343 

372 
390 

391 
405 

411 
411 
407 

426 
425 
423 
428 
423 
426 
423 
426 
425 
428 
430 
426 
426 
427 

465 

477 
482 
477 

 

 

 

 

 

 

 

 

 

 
5' 

 

Reads 

3 
2 
340 
325 
268 

 

51,820 
28,492 
1,829 
594 
384 
334 
215 
186 
140 
139 
109 
4,761 
1,264 
175 

ND8 gRNA Sequences cont. 

         TATGAACATCCGATACTGAACTAGGGTAGATTGAGTTATATA T10 

              CATCCAATGTAT AATTAGGGTAGATTGAGTTATGTAAAT T15 
       ATACGAACATCCATA GTTAGATTAGGGTAAATTGAGT GATGTAAT12† 
     ATATAA GAACATCCGATGCTAGATTA TGGTAGATTGAGTTATATGATTGTAATAT05† 
ATATA TAACACGAACATCTGATGT ATAAACTGAAGTAAATTGAT16† 

 

    ATATAC AACGATGATCACTGAGATTTTACCTAATATGGATGTT AAT13* 

   ATATAT AAACGATGACTACTAGAATTCTACTCAATGTGAATGTT AAT14 

     ATATAT ACGATGACTACCAAAATTCTATCTGATATGAATGT GATAATAT11 

    ATATAC AACGATGATCACTGAGATTTTACCTAATATGGAT T17 

    ATATAC AACGATGATCACTGAGATTTTACCTAATATGGATGTTT T09 

   ATATAT AAACGATGACTACTAGAATTCTACTCAATGTGAAT T15 

              GATGACTACTAGAATTCTACTCAATGTGAATGTT AATTTATGATAT14 

    ATATACAACT ATGATCACTGAGATTTTACCTAATATGGATGTT AAT11 

ATATATA CAAAACGATGATTACCGAGATTTCATTTAATATGA TTGTCTAATTTT 

   ATATAT AAACGATGACTACTAGAATTCTACTCAATG AATGTTAATTTT 

   ATATAT AAACGATGACTACTAGAATTCTACTCAATGTGAATGTTT T12 

   ATAATAT AACGATGACTATCAGAACTTTACTCGA GATGAATGT13 

    ATATAT AACGATGACTACTGAGACTCTATCTGACGTGAATGTT AAT12 

    ATATAT AACGATGACTACTGAGACTCTATCTGACGTGAATGTTT T07* 

 

   

13,915 

636 
312 
266 
4 

 

1,709 
137 
21 

ATATTA GATGATAACTCAGTGTAGATTGATCTGTAGAATGAT AATAT13 

      ATAT ATAACTCAATGTAGATCGATTTGTGAGATGATGATTGTCAA T11 

        ATGATAACTCAGTGTAGATTGATCTGTAGAATGAT AATATATAATGT05 

   ATATACA ATAACTCAATGTGAATTAATCTGTGAGAC TAT14 

        AT ATAACTCAATGTAGATCGATCCGTAGAATGATGATTGCTAA T11 

   

ATAT ATAAATACAACGGTGATAATTCGATGTGA TGATATCTGTAAT16 

      ATAA ATAACGACGATGATTCAGTGTAAATTGGT ATGTGAAATGATGT12 

              ACGACGATAGCTTGATGTAGGTTAGT GTGTGAGATAATGATAT17 

 

   

1,964 
1,269 
1,073 
13,719 
2,361 
159 
133 

 

2 
894 

 

23 
9 

 

1,950 
923 
215 

 

24,242 
5,218 
2,651 
1,071 
203 
13 

29,495 
11,472 
7,969 
4,922 
582 
526 
127 
111 

 

1 

 

176 
832 
237 

 

Reads 

   ATATAA ATGCATACAAGAATCATAGTAAGTACAGTGATGATAATTT T09 

ATATA AAACATGTATACAAAAATTACAGTAAATGCGACGA AAATAAT13 

   ATATAA ATGCATACAAGAATCATAGTAAGTACAGTGATGATAATT AT14 

  ATATATAAATGCATAC AAGGCCATGATAGATACAATGATGATAA AT11 

ATGTAC AACATGCATACAGAGACTATAATAGATACAGTGATGATAATTT TTATTTT 

ATGTAC AACATGCATACAGAGACTATAATAGATACAGTGATGATA T19 

ATGTAC AACATGCATACAGAGACTATAATAGATACAGTGATGAT T07 

   

      ATAT ATAAATGCGTAATGGTATCTTTCGGGTAGATATGTGTATAAA T14 
AAATTGTATATAT AATGCGTAATGGTGTTTGTTG AGCGAGTGTGTGTTAT15† 

 

   AT ATATATAACAAACAATGGGTGTATAATGGTATCTGAT TGTGCATTAATTTAAAAT14 

AT AAAACACATAATAAATGATGAGTGCGTGAT ATAGGTATAAT13 

 

AAAAA AAACAACAAGAACACATAGCAGATAGTGGGTG ACGTATAT12 

   A TAAACAACAAGAACACATAGCAGATAGTGGGTG ACGTATATGAT13 

  AAAT AACAACAAAAACATATAATAAATGATGGATGTGTA TAAGGTATAT15* 

 

  ATAAAACTTGGA CGCTAATAGATATATGGTTAGATAATGAGAATATAT T13 

  ATAAAACTTGGA CGCTAATAGATATATGGTTAGATAATGAGAATATATA T14 

  ATAAAACTTGGA CGCTAATAGATATATGGTTAGATAATGAGAATATATAGT TAAT11 

  ATAAAACTTGGA CGCTAATAGATATATGGTTAGATAATGAGAATAT T10 

   TAAAACTTGGA CGCTAATAGATATATGGTTAGATAATGAGAATATATGGT TAAT14 

  AT AAAACTTGGGCGCTAATAGATATATGGTTAGATAATGAGAATATAT T12 

    ATACTTGGGCACA CAATAGATATATGGCTGAATAATAGAGATATATAAT T13 

    ATACTTGGGCACA CAATAGATATATGGCTGAATAATAGAGATATAT T18 

    ATACTTGGGCACA CAATAGATATATGGCTGAATAATAGAGATATATA T12 

    ATACTTGGGCACA CAATAGATATATGGCTGAATAATAGAGATAT TCTTTT 

    ATACTTGGGCACA CAATAGATATATGGCTGAATAATAGAGAT T14 

ATATATAACTTGGGCA TCAATGAGTATATGGTTAAGTGATGAAGATATAT T10 

ATATAT AACTTGGGCGTCAATGAGTATATGGTTAAGTGATGAAGATATAT TTT 

    ATACTTGGGCACA CAATAGATATATGGCTGAATAATAGAGATATA ATCTTTT 

   

          ATAT TAAAACAACAATCAAATGATAGAAGCTTGGGTG ACTGTGAATATTTT 

 

ATATA TAAATAACATAAGACGATAACTGAATAATAGAAATT AAGTGTCAAT14 

ATATATT AATAACATAGAGCAATGACCGAGTAATGA TAT12 

ATATA TAAATAACATAAAACGATAACTGAATGATAGAGATT AAGTGTCAAT09 

   

ND8 gRNA Sequences cont. 


489 
489 

 

 

 

500 
494 
495 
495 
494 
495 
500 
495 

523 

554 
554 
554 
554 

 

531 
528 

 

540 
536 
539 
536 
539 
539 
541 
539 

 

567 

 

598 
598 
598 
598 

21,576 
5,835 

 

2,861 
1,242 
884 
151 
1,101 
405 
198 
127 

 

413 

 

20 
7 

5,311 
386 

 ATAG ATAAAACACAAATAAAGGTCAAGTGATATAGAGTGATGATTAA T14 

ATAAATAT AAACACAAATAAAGGTCAAGTAATGTAGAGTGATAATTAA T12 

 

ATATAT AATAACTACACGAGACATGAATAGAAATTAAGTGATGTAAA T12 

    ATATAT ACTACATAAAACACAGATAAGAATCAGATAGTGTGAGATAATA T15 

   ATAT ATAACTACACAAAATATAGGTAAAAGTTAGATGATGTGAAATGAT T11 

    ATATAT ACTACATAAAACACAGATAAGAATCAGATAGTGTGAGATAAT T15 

   ATAT ATAACTACACAGAACATAGATAAGAGTCAGATAGTATAAAGTGATA T19 

   ATAT ATAACTACACAGAACATAGATAAGAGTCAGATAGTATAAAGTGAT T15* 

  ATA AAATAACTACACAGAATACGAGTAAAGATTGAATGATGTAAA TAAT18 

   ATAT ATAACTACACAGAGCACAAATGAAAGTTAAGTAATGTGAAATAGT T23 

   

ATAAA TATAAACACAGTAAATCACTCGAGATAGATAGTTATATGAGATAT T11 

   

ATATA TAATTTCACCGTGAATTTCTTTAGATTGTAGATATGATAA T13 

ATATA TAATTTCACCGTGGATTTCTTTAGATTATAGATATGATAA T06 

ATATA TAATTTCACCGTGGATTTCTTTAGATTGTAGATATGATAA T05 

ATATA TAATTTCACCGTGGATTTCTTTAGATTGTAGATATGGTAA T15* 

 

 


J) NADH Dehydrogenase 9 

3' 
71 
72 
72 

 

105 
101 

 

124 
124 
124 

 

160 
167 
162 
162 
162 
162 

 

193 
187 
193 

 

216 
216 
216 
221 

 

242 
249 
249 
248 
248 
248 
249 

 

279 
268 
268 
279 
288 
286 
286 
286 
286 
279 
286 
289 
268 
286 
286 
282 

 

312 
314 
314 
314 
314 

 

339 
343 
343 
339 
339 
339 
338 

5' 
33 
29 
25 

60 
60 

87 
87 
87 

117 
130 
117 
119 
116 
121 

149 
147 
150 

176 
177 
176 
176 

201 
204 
205 
202 
203 
205 
203 

239 
239 
240 
238 
239 
239 
239 
238 
243 
239 
242 
247 
239 
246 
243 
236 

272 
273 
271 
275 
272 

303 
298 
298 
297 
304 
305 
301 

 

 
 
 

 

Reads 

18 
93 
32 

 

540 
472 

 

1 
536 
453 

 

1,145 
137 
906 
65 
55 
30 

ND9 gRNA Sequences 

 ATAT AAACATAAACGAAATGAGCATAGAAGTATATGTATGATG ATATTAT14 

ATAT AAAACATAAACGGAATAAATATAAGAGTGTATATGTGATATAAT T15 

ATAT AAAACATAAACGGAATAAATATAAGAGTGTATATGTGATATAATGTTT T11 

   

A TATAACACAAACAATAGAATAAAGTTAAGTGAGAATATGAGTGA TTTAT11 

 ATAT ATACAAACAATAAAATAGAATTAGATAGAGGTATAAGTGA TGTAAAT11 

   

  ATATCAAC AAACAAATAGAGCACTGTCTATGATATAAGTGATAA TTCT09 

AAATATCAAC AAACAGACGAGACATCATCTACAGTATAAGTGATAA TAT10 

AAATATCAAC AAACAAATAGAGCACTGTCTACGATATAAGTGATAA T05 

   

  ATAT  ATAAAACAATAAAGAAATAGAAGGCTACAGTTAATGAGATAAAT T13 

 AAAATTAACAAAACAATAAAAGAACGAGAAATTACAGT GAATGAAGTAAAT07 

  ATA TAACAAAACAATAAAGAAATGAGAAACTGTGATTGATAAGATAAAT T12* 

ATATA TAACAAAACAATAAGAAGACAGAGAGTTACAGTTAATAAGATAA TTAT19* 

  ATA TAACAAAACAATAAAGAAATGAGAAACTGTGATTGATAAGATAAATA T05 

  ATA TAACAAAACAATAAAGAAATGAGAAACTGTGATTGATAAGAT T11 

 

   

1,196 
5,984 
463 

ATAT  AATAAAAACATACAATAAAATAGAAGGAAACTAATAAGATGATAG T15 

     ATATAT AACATACAATAAGACGAAGAGAAACTAATAGAGTGATAAGA T15* 

 ATAT AATAAAAACATACAATAGAATGAAAGGAAACTAATAAGATGATA TAT11 

 

   

3,869 
357 
620 
63 

 

87 
652 
485 
101 
94 
77 
58 

 

569 
561 
346 
86 
23 

26,097 
7,153 
1,828 
1,812 
553 
499 
451 
300 
262 
236 
197 

 

43 

5,869 
250 
157 
93 

ATACAATAT AAAACAAAAGTCACAGATTAAGGGATAGAGATGTGTGATAA T14 

ATACAATAT AAAACAAAAGTCACAGATTAAGGGATAGAGATGTGTGATA T11 

ATACAATAT AAAACAAAAATCGTAGATTAAAAGATAGAGATATATGATAA T14 

ATAT ATATAAAAACAAAAGTCACGAATTAGAAAGTAAAGATATGTGATAA TTTT 

   

      ATATAA AATCAATAACAGATCATGATGATATAGAGATAGAAGTTATAA  T15 

  AAA AAAAATCAATCAATGATAGATCACAGTGGTATAGGGATGAGAATTA AT11 

 AAAA AAAAATCAATCAATGATAGATCACAGTGGTATAGGGATGAGAATT T11 

  ATAT AAAATCAATCAATAGCAAGTTACGATAATATAGAAGTGAGAATTATA T13 

  ATAT AAAATCAATCAATAGCAAGTTACGATAATATAGAAGTGAGAATTAT T13 

  ATAT AAAATCAATCAATAGCAAGTTACGATAATATAGAAGTGAGAATT T10* 

AAAAA AAAAATCAATCAATGATAGATCACAGTGGTATAGGGATGAGAATTAT T05 

   

    AAATTAT ACAACATAAGACGACGAAGATAAAGGTCATAGAGATTAATT T12 

             AACATAAAT CGGTGAAGATAGAGACTATGAGAATTAATT T07 

                ATAAAT CGGTGAAGATAGAGACTATGAGAATTAAT ATGT15 

   AAAATTAT ACAACATAAGACGACGAAGATAAAGGTCATAGAGATTAATTA T11 

AT AAATATACAACAATATAGAACGATGAAAGTGAAGATTATAAGAGTTAATT T12 

     ATATATAACAACATAGAACGATGAGAATAGAGATCATGAAAGTTAATT T11* 

ATAC ATATACAACAATATAGAACGATGAAAGTGAAGATTATAAGAGTTAATT T11 

     ATATATAACAACATAGAACGATGAGAATAGAGATCATGAAAGTTAATTA T11 

     ATATATAACAACATAGAACGATGAGAATAGAGATCATGAAAGTT T10 

    AAATTAT ACAATATAAGACGACGAAGATAAAGGTCATAGAGATTAATT T10* 

ATAC ATATACAACAATATAGAACGATGAAAGTGAAGATTATAAGAGTTA T11 

  AAAATATACAACAATATGAAGCGACGAAAATAGAGACTATAGA TATTAT06 

                ATAAAT CGATGAAGATAGAGACTATGAGAATTAATT T13* 

     ATATATAACAACATAGAACGATGAGAATAGAGATCATGAAA T11 

ATAC ATATACAACAATATAGAACGATGAAAGTGAAGATTATAAGAGTT T17 

         ACAACAATATAGAACGATGAAAGTGAAGATTATAAGAGTTAATTAAT T13 

   

    ATAA GAACACACAGAGACAAGCAGAGTAAAGTGTATAGTGACATA T06 

ATATAT ACGAACACACAGAAATAAGCAAGGTAGAATATATGATGATAT T13* 

  ATAT ATGAACACACAGAAACAGATGAGATAGAGTATACAGTGATATAA T17* 

ATATAT ACGAACACACAGAAATAAGCAAGGTAGAATATATGATGAT T12 

  ATAT ATGAACACACAGAAACAGATGAGATAGAGTATACAGTGATATA T14 

 

   

44,686 
2,681 
1,741 
270 
139 
6,792 
260 

 ATAT ACAAACAACACAAGATAGAGCACAAATGAATGCACAG TGATAAT13 

  ATAAACAAACAACACAGAGCAGAATACAAGTGAGTATATGAGAATA T11 

  GTAAACAAACAACACAGAGCAGAATACAAGTGAGTATATGAGAATA T13 

 ATAT ACAAACAACACAAGATAGAGCACAAATGAATGCACAGGGATAA TAT11 

 ATAT ACAAACAACACAAGATAGAGCACAAATGAATGCACA CTGATAAT13 

 ATAT ACAAACAACACAAGATAGAGCACAAATGAATGCAC TGAGATAAT15 

ATATAT TAAACAACACGAAACGAAGCATAGATGGATATATGAGA TTCTTTT 

 

3' 
366 
366 
368 
369 

 

392 
392 
392 
392 
392 
392 
392 
392 
392 

 

421 
418 

 

447 
447 
447 

 

484 
483 
472 
483 
483 
472 
483 

 

514 
514 
514 
514 

 

549 
542 
539 
547 
547 
547 
547 

 

566 
566 
566 
566 
566 

 

578 
580 

 

604 
600 

 

612 
611 
611 

 

625 

 

644 
640 

 

659 
658 

Reads 

36 
17 
157 
45 

 

10,040 

270 
232 
5,765 
2,338 
691 
575 
189 
167 

 

1 
317 

 

761 
667 
82 

 

72 
185 
130 
46 
34 
16 
25 

 

234 
2,106 
1,535 

59 

 

746 
1,407 
935 
576 
339 
55 
36 

 

15 
11 
89 
72 
24 

 

13 
39 

 

76 
78 

 

66 
66 
17 

 

7 

 

ND9 gRNA Sequences cont. 

  AT ATTAAAACACAATCCGAAGAGTACAAATAGACAGTATAAGATAAAAT T11 

  AT ATTAAAACACAATCCGAAGAGTATAAATAGACAGTATAAGATAAAAT TAAT19 
AT AAATTAAAACACAATCTAAGAAATA GAGATAGACAGTATAAGATAAAAT08† 
  AAAATTAAAACACAATCTAAGAAATA GAGATAGACAGTATAAGATAAAAT08† 

 

 ATATT AAACGCATAACAGAAATGATTAGAACTGAGATATGATT AAT14 

ATATAT AGACGCATAATAAAAGCAACTGAGGCTAGAATATGATTTAA T19 

   TAT AAACGTATAACAAAAGCAGTTAAAGCTAGAATGTGATTTAA T14 

 ATATT AAACGCATAACAGAGACAATTAGAACTGAGATATGATT AAT10* 

ATATAT AAACGCATAATAAGGGCAACTGAAACTAGAGTATGATTTAA TTTCT12 

 ATATT AAACGCATAACAGAGACAATTAGAACTGAGATATGATTT T19 

 ATATT AAACGCATAACAGAGACAATTAGAACTGAGATAT TTT 

  ATAT AAACGCATAATAAGGGCAACTGAAACTAGAGTATGATTT T16 

 ATATT AAACGCATAACAGAGACAATTAGAACTGAGAT T11 

   

   AATCAAAATATTCGTGTTCTGATGATAGAGATGTATAAT T10 

ATATA TAAAACATTCGCGTTCTAATGATAGAGATGTATGATAA T12 

   

ATATAAA ATTACCAACAGAATAGAAGCTAGACAAGTTAAGATATTT T08 

ATATAAA ATTACCAACAGAATAGAAGCTAGACAAGTTAAGATATTC T12 

ATATAAA ATTACCAACGAAATAAAGACTAAGTGAATTAAGATGTT ATAT05 

   

ATAT  AAACCAATCAACGAGTAAATGATGTAGAGTATTATTGTCAAT T13 

ATATAT AACCAATCAATAAGTGAGCGATGTAAGATGTTATTGTTAATA T10 

        ATATA TAACAAATAAACGATGTGAGATATCATTACCAGTGA TTTAAT08 

ATATAT AACCAATCAATAAGTGAGCGGTGTAAGATGTTATTGTTAATA T14 

ATATAT AACCAATCAATAAGTGAGCGATGTAAGATGTTATTGTTAAT T13 

        ATATA TAACAAATAAACGGTGTAGAGTGTCA GTAT16 

ATATAT AACCAATCAATAAGTGAGCGATGTAAGATGTTATTGTT T12 

   

ATAT ATAACACTTCAACGGAGAGAGACCAATGAGAGATCAGTTAATA T13 

ATAT ATAACACTTCAACGAGAAGAAGCCAGTGAGAAATCAGTTAAT T11 

ATAT ATAACACTTCAATAGAAAGAAACTGACAGAGAATTGATTAAT T13 

ATAT ATAACACTTCAACGAGAAGAAGCCAGTGAGAAATCAGTT TTTAATTGT14 

   

ATATAAAATAACAATACAAGTGAATCAGATAATGGATGATGTTTCAAT T13 

  ATAT ATAACAATACAAATGAACTGAGTAATGGATAGTA GTTTGACAAT14* 

     ATAT ATAATACAAACAAACTGAGTGATGGATAGTGCTTTAATGAGAA TAT14* 

  ATAGAATAACAATATAAACGAACTAGATGATGGATAGTATTTTGATA TAT16* 

  ATAGAATAACAATATAAACGAACTAGATGATGGATAGTATTTTGATA TATGCCACG 

  GTAGAATAACAATATAAACGAACTAGATGATGGATAGTATTTTGATA TATTTT 

  ATAGAATAACAATATAAACGAACTAGATGATGGATAGTAT CT17 

   

    AAACGTACATATG ATCTCTTCTGCTAGTATATAAGATGATAATA AT14 

     AACGTACATATG ATCTCTTCTGCTAGTATATAAGATGATAATAT T11 

            ATATG ATCTCTTCTGCTAGTATATAAGATGATAAT TTCT16 

AAATAAACGTACATATG ATCTCTTCTGCTAGTATATAAGATGATAATAT T08 

              ATG ATCTCTTCTGCTAGTATATAAGATGATAATA AT15 

   

 ATATT AACGTACATACTGTCTTCTCTATTGACATATAAGATGATAAT T13 

ATAT TAAACGTACATATTATTTTCTCTACTGATATACAAG TGATAAT09 

   

ATATA TATGCAACAACAGAGATAATATTGTAGATGTATATGT GATATCGT11 

    ATATA TAACAACAAAAGTGACATTGTAAACGTATATG ATGAGTTTT 

   

        TATT AATTGGTATGTGACAATAGAAGTGATATT T09 

AATGCAGATAATT ATTGGTGTGCAATGATAGAGGTAATATT T12* 

AATGCAGATAATT ATTGGTGTGCAATGATAGAGGTAATATTGT T12* 

   

A AATGCAAATAGAAGTTGGTGTGCAAT TATAGAGATGATAT09 

   

274 
1 

 

1 

1,673 

ATATATAAGAATTACAATGGT GTATTAAGTGAAGTAATGTAAATGAGAATT T12 

                   GATA GTTAGATAGAATAATGTAAGTGAAAATT T12 

   

ATATATAA AGGATTACAGTGGTGATATTGAATAGAGTGATGTAAATG T12 

   ATATAT GAATTACAACGGTGATATTGAATAGAATAATGTGA TTGAAAT14* 

 


5' 
319 
319 
343 
343 

 

352 
349 
349 
352 
349 
351 
356 
351 
358 

382 
380 

409 
409 
410 

439 
438 
437 
438 
439 
447 
442 

468 
469 
469 
472 

502 
509 
497 
501 
501 
501 
508 

534 
533 
535 
533 
534 

535 
543 

568 
569 

582 
582 
580 

597 

609 
609 

615 
618 

 

K) Ribosomal Protein Subunit 12 

3' 
78 
76 
78 
78 

 

109 
99 

 

106 
115 
115 
115 
115 
115 

 

121 
121 
121 
121 
121 
121 

 

131 

 

158 

 

170 
164 
158 

 

208 
195 
207 
201 
207 
207 

 

235 
235 
235 
235 
235 
229 
235 
235 


5' 
43 
35 
43 
38 

63 
66 

74 
73 
73 
74 
77 
79 

96 
92 
96 
96 
93 
94 

96 

119 

139 
132 
133 

169 
164 
158 
164 
156 
158 

198 
194 
196 
198 
194 
198 
200 
196 


Reads 
12,531 
2,218 
1,663 
423 

RPS12 gRNA Sequences 

ATAT ACAACAACCATATGAAGATCATGTACGTAGAAGA TTGATATAT14 

   ATA AACAACCGTACAGAAGTTACATATGCAGAGAAGGTGAGAT TTAT12 

ATAT ACAACAACCATATGAAGATCATGTACGTAGAAGA TTGATATAT12 

ATAT ATAACAACCATACAGAAATCGTATATGTGAGAGAAGTGA TTTCT15 

 

   

5,122 

32 

 

1,212 
120 
1,879 
1,091 
466 
311 

 

233 
90 

1,542 
1,066 
793 
219 

 

2 

 

3 

 

44 
18 
56 

 

4,724 
146 
67 
896 
192 
104 

AT ATAATATAAAACAAATAAGACAGAGTGTAGATAGTAATTGTATGA TAT12 

          AT ATAAATAAAACAAAACGTAAATGATAACTGTG AGATGAT07 

   

          ATAT ATATAAAACAAATAGAATAGAACGTAGATGAT TACTGTATAAT12 

    A TATATAATAACATAAAACAAATAGAACGAGATGTAAATGATA TCTAT13 

  ATA TATATAATAACATAAGACAGATAGAACGAGATGTAAATGATA T21 

  ATA TATATAATAACATAAGACAGATAGAACGAGATGTAAATGAT T13 

ATATA TATATAATAACATAAGACAGATAGAACGAGATGTAAAT T05* 

  ATA TATATAATAACATAAGACAGATAGAACGAGATGTAA T14 

 

              AATCA CGGATTTATATAGTAACGTAAAATGA TATTAT12 

      AACTGGGC-ATCT CGGATTTGTATAGTGATATAAAGTGAATAA TTTT 
ATATAAACTGTGCAATCGA TGGACTTATATAGTGATGTAAGATGA TAAT12† 
ATATAGAACTAGGCAGTCA CGGATTTGTATAGTAATGTAGAATGA TAT14† 
 ATATAGAACTGGCAATTT CGGATTTATATAGTGACATGAGATAGATA T15*† 
 ATATAGAACTGGCAATTT CGGATTTATATAGTGACATGAGATAGAT T10† 

   

ATATAGAATTA GGCAATCGCGGATTTATATAGTAACGTAAAATGA TAT10 

 

ACT TACAATACACGTTGGTTATCGGAGTTAGGTGATTGTG ACTTAT10 

 

 ATATA ACGGCATATAGTATACGTCGGTTACTAGGATTGTGTAATTTT 

CATATAAA GGCATATAGTATACGTCGGTTACTGGGATTGTGTAAT07 

       ATACT TACAATACACGTTGGTTATCGGAGTT AGATGAT13 

   

ATAAAT AACAACGCAATATCCGAGTAAGATTGTATAAGTGAGATAT AT12 

        ATAACGCAACA TCAGATGAGATTATATAAGTGAGATATG ATATAT11 

   ATAT ATAACGCAACATTCGAATGAGATTATGTAGATGAAATATGGTAT TAT05 

          ATA TAACATCCAAACAAGATTATATAGGTAGAGTATG ATGTATAATTTTAT22 

   ATAT ACAACGTAACATTCAGATAAGATTGTGTAGATAGAATATGGTATAT T12 

   ATAT ACAACGTAACATTCAGATAAGATTGTGTAGATAGAATATGGTAT T06 

 

   

4,950 
3,025 
222 
444 
338 
36 
25 
20 

ATATATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGATGATGTAAT T11 

ATATATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGATGATGTAATATTT T11 

     TAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGATGATGTAATAT AT14 

ATATATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGACGATGTAAT T12* 

ATATATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGACGATGTAATATTT T10 

           ATAACTT GACTAATAGAGTAGTGAGAGAGACAGTGTAAT T08 

ATATATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGACGATGTA T15 

    ATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGACGATGTAATAT AT12 

   

5' 
203 
203 
205 
209 
206 
200 
203 
203 
208 
204 
203 
207 
203 
210 
211 
203 
203 
203 
205 
212 
203 
203 
203 
203 
203 
203 
203 
203 

 

 

 

234 
234 
248 
234 
248 

288 
267 
269 
288 

309 
309 

3' 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
250 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
246 
245 

 

264 
280 
281 
280 
282 

 

322 
322 
308 
322 

 

349 
336 

Reads 
341,382 
1,324 
1,091 
964 
922 
609 
544 
530 
484 
430 
370 
370 
312 
282 
268 
267 
196 
193 
177 
176 
173 
157 
141 
130 
119 
117 
116 
56 

 

35 
24 
14 
434 
270 

 

16,349 

10 

3,731 
909 

 

128 
195 

RPS12 gRNA Sequences cont. 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGATAAT AT14 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAAATAAT AT15 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGATA T13 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGA TAATAT12 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGAT T14 

  ATAT ATAATGACATAATTAGACTGATAAGATAACGAGAAAAGTGATGTA T12 

ATATAT ATAATGACGTAACTGAGCTAATGAAGCAATGAGAGAGATAAT AT13 

ATATAT ATAATGACGTAACTGAGCTAATGAGACAATGAGAGAGATAAT AT12 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAG TTAATAT11 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGATAA AAT13 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGGGATAAT AT12 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGA AATAT13 

ATATAT ATAATGACGTAACTGAACTAATGAGGCAATGAGAGAGATAAT AT12 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAG CGATAATAT14 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGA T12 

AT ATAAATAATGACATAACTAGGTTAGTAAAGTGACGAAGAAGATAAT ATTATTTT 

ATATAT ATAATGACGTAACTGAGTTAATGAGGCAATGAGAGAGATAAT AT13 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAACGAGAGAGATAAT AT21 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAAATA T15 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAG CGAGATAATAT14 

ATATAT ATAATGACGTAACTGAGCTAATAAGGCAATGAGAGAGATAAT AT14 

ATATAT ATAATGACGTAACTGAGCTAATGGGGCAATGAGAGAGATAAT AT12 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCGATGAGAGAGATAAT AT19 

ATATAT ATAATGACGTAACTAAGCTAATGAGGCAATGAGAGAGATAAT AT13 

ATATAT ATAATGACGTAACTGAGCTAATGAGGTAATGAGAGAGATAAT AT12 

ATATAT ATAATGACGTAACTGAGCCAATGAGGCAATGAGAGAGATAAT AT16 

ATATAT ATAATGACGTAACTGGGCTAATGAGGCAATGAGAGAGATAAT AT13 

ATAAAAT TAATGACATAACTAAATTGATAGGGTAATGAGAGAGATAAT AT09 

   

                   TAG TGCCTTCTATAGTAGATGATGATATA TGAT14 

 ATATA AGATCAACAAAACTGCCATTTTCTGTAGTAAGTGATGATATA T14 

  ATA TAAATCAACAGAACTGCCATCTTTTGTAGTA TAGTGATATAAT13 

 ATATA AGATCAACAAAACTGCCATTTTCTATAGTGAGTGATGATATA T16 

ATAT GTAAATCAACAGAACCGTCATCTTTTGTAGTA TAGTGATATAAT12 

   

ATA TACAATACGTGTATGATATTTTATACT AGGTAGATCAGTGAAATT T12 

ATA TACAATACGTGTATGATATTTTATACTGGGTAGATCAGTGAAATT T07 

           ATA TATAATACTTTACATCGGGTAAATTGACGAGA ACATGAT11 

ATA TACAATACGTGTGTAATATTTTATACT AGGTAGATCAATGAAAT15 

   

AAATAT AACATATCTTATATCTGAATCTAACTTGTAATATGTG AAT16 
 AAATAAAACATATCTGAT TCTAAATCTAACTTGTAATGTGTG AAT25† 

 
*Indicates that the tail sequence was shortened where random nucleotides were present after the poly-U tail.
†Indicates that the gRNA was identified under conditions of reduced stringency. 
 

 

 


APPENDIX D. Identified CR3 mRNA and gRNA transcripts. 

A-C: Major CR3 mRNA and gRNA sequence classes. The CR3 mRNA transcriptome was
generated using the TREU 667 cell line. Identified sequences were then used to search gRNA
transcriptomes from four different cell lines: EATRO 164 bloodstream (BS), EATRO 164 procyclic
(PC), TREU 927 procyclic and TREU 667 procyclic. ORF = previously identified Open Reading
Frame (purple protein sequence). ARF = newly identified Alternative Reading Frame (green
protein sequence). Alternatively edited nucleotides are shown in red. Inserted U-residues are
lowercase, while deleted U-residues are shown as asterisks. Canonical Watson-Crick base pairs
are marked (|) and G:U base pairs (:). Previously identified start codons are double underlined.
Potential upstream AUG start codons are indicated by wavy underlines. gRNAs were sorted based on
guiding sequence class. Sequence variations observed in the 3'-U-tail were ignored in assigning
class. Transcript copy numbers (Reads) were determined by summing all gRNAs of the same
sequence class. Only major sequence classes are shown (defined as containing more than 100
transcript copies). In the case of rare transcripts, the identified gRNAs are shown regardless of
copy number. gRNA transcript numbers varied greatly between the different cell lines.
Interestingly, the most abundant mRNA (CR3 Form C, 7147 reads) had the fewest identified
gRNA reads.
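
The pairing symbols used in the alignments can be reproduced programmatically. The following is a minimal sketch, not the procedure used to build the figures: the function name annotate_pairs is hypothetical, and it assumes the mRNA (5' to 3') and the gRNA (3' to 5') are supplied as pre-aligned strings of equal length. Positions that are neither Watson-Crick nor G:U pairs (including gaps and deleted-U asterisks) are left blank.

```python
# Hypothetical helper (an illustration, not taken from the dissertation):
# mark each aligned position of a gRNA:mRNA duplex.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def annotate_pairs(mrna: str, grna_3to5: str) -> str:
    """Return '|' for Watson-Crick pairs, ':' for G:U pairs, ' ' otherwise."""
    if len(mrna) != len(grna_3to5):
        raise ValueError("aligned strings must have equal length")
    symbols = []
    # Case is ignored so that inserted (lowercase) u's pair like encoded U's.
    for pair in zip(mrna.upper(), grna_3to5.upper()):
        if pair in WATSON_CRICK:
            symbols.append("|")
        elif pair in WOBBLE:
            symbols.append(":")
        else:
            symbols.append(" ")
    return "".join(symbols)


# Fragment of the CR3 Form A duplex shown in panel A.
mrna = "AAuuAAuuAuuuuCA"
grna = "UUAAUUAGUGAAAGU"
print(mrna)
print(annotate_pairs(mrna, grna))  # |||||||:|:|||||
print(grna)
```

Counting the ':' symbols in the returned string gives a quick measure of how much of a duplex relies on G:U pairing.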
 

A. CR3 Form A (Reading Frame: ORF)

mRNA Sequence                                                                         Reads
AUGUGUAUGAUAUAUAAuuAAuuAuuuuCAuuuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuA   2114
AUGUGUAUGAUAUAUAA--AAuuAuuuuCAuuuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuA   1217
AUGUGUAUGAUAUAUAA--AA--AuuuuCAuuuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuA   480

Fully Edited Form

M  C  M  I  Y  N  ST              M  F  D      C  L  V  L  L  F  F  Y  C  L  F  V
AUGUGUAUGAUAUAUAAuuAAuuAuuuuCAuuuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuA
               |||||||:|:|||||:|:|||:|:|||    |||:|:|||||||||||
               NUUAAUUAGUGAAAGUGAGAUAUAGACU----AACGAGCCAAAACAACAUGUAUA

Cell Line      Reads   gRNA Sequence
EATRO 164 PC   21297   ATATGTACAACAAAACCGAGCAATCAGATATAGAGTGAAAGTGATTAATN
TREU 927 PC    3398     ATATAACAACAAAACTGAACAATCAAATGTAGAGTGAAAGTGATTAATN
TREU 667 PC    452      ATATAACAACAAAACTGAACAATCAAATGTAGAGTGAAAGTGATTAATN

B. CR3 Form B (Reading Frame: ARF +1)

mRNA Sequence                                                                           Reads
AUGUGUAUGAUAUAUAuuAuuAuuuAuuuAuuuuCAuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuG   413
AUGUGUAUGAUAUAUA--AuuAuuuAuuuAuuuuCAuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuG   460

Fully Edited Form

M  C  M  I  Y  I  I  I  Y  L  F  S  L  C  L  I      V  W  F  C  C  F  F  I  V  C  L
AUGUGUAUGAUAUAUAuuAuuAuuuAuuuAuuuuCAuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuG
             |||:|||:|||:|:|:|::|:||||||:|::||    ||||||||||
             NUAUGAUAGUAAGUGAGUGGAGGUAAUAUAGGCU----AACAAACCAAUAUAUA

Cell Line      Reads   gRNA Sequence
EATRO 164 PC   5       ATATATAACCAAACAATCGGATATAATGGAGGTGAGTGAATGATAGTATN
TREU 927 PC    14164    TATATAACCAAACAATCGGATATAATGGAGGTGAGTGAATGATAGTATATN
TREU 927 PC    642     ATATATAACCAAACAATCGGATATAATGGAGGTGAGTGAATGATAGTATATN
TREU 927 PC    114     ATATATAACCAAACAATCGGATATAATGGAGGTGAGTGAATGATAGTATN
TREU 667 PC    1936    ATATATAACCAAACAATCGGATATAATGGAGGTGAGTAAATGATAGTATN
TREU 667 PC    236     ATATATAACCAAACAATCGGATATAATGGAGGTGAGTGAATGATAGTATN

 


 

 
 

 

 

C. CR3 Form C                                                  mRNA Sequence 
AUGUGUAUGAUAUAUAAAAACA--AuGuGuA*****UGuuGuuGuuuuGuuuuG***AuuuuGGuuGuACAuuuuuuuuG 

AUGUGUAUGAUAUAUAAAAACA--A-GuGuA*****UGuuGuuGuuuuGuuuuG***AuuuuGGuuGuACAuuuuuuuuG 
AUGUGUAUGAUAUAUAAAAACAuuA-GuGuA*****UGuuGuuGuuuuGuuuuG***AuuuuGGuuGuACAuuuuuuuuG 
AUGUGUAUGAUAUAUAAAAACA-uAuGuGuA*****UGuuGuuGuuuuGuuuuG***AuuuuGGuuGuACAuuuuuuuuG 

Fully Edited Form 

 ATAATAAAAATGCACAACTAGAATTGAAGTAAAATGATGATATATATN 

 ATATTAAAAATGCACAACTAGAATTGAAATAAAGTGATGGTATATATN 

 ATATATAAAATGTACAACCAGAATTAAGATAAAGTGATGATGTATATATN 

gRNA Sequence 
 ATAATAAAAATGCACAACTAGAATTGAAGTAAAGTGATGATATATATN 
       AAAATGCACAACTAGAATTGAAGTAAAGTGATGATATATATATN 

M  C  M  I  Y  K  N  N  V  Y       V  V  V  L  F  W     F  W  L  Y  I  F  F  V 
AUGUGUAUGAUAUAUAAAAACAAuGuGuA*****UGuuGuuGuuuuGuuuuG***AuuuuGGuuGuACAuuuuuuuuG 
                     |||:|:||     |:|:::|:||:|:::||:   |:||:||:|||#||||| 
                     NUUAUAUAU-----AUAGUGAUAAGAUGGAAU---UGAAGCCGACACGUAAAUUAAUAUA 
Cell Line 
EATRO 164 PC 
EATRO 164 PC 
EATRO 164 PC 
EATRO 164 BS 
EATRO 164 BS 
EATRO 164 BS 
TREU 927 PC 
TREU 927 PC 
TREU 927 PC 
TREU 667 PC 
TREU 667 PC 
TREU 667 PC 
TREU 667 PC 
* no gRNAs identified. 
 

 ATAATAAAAATGCACAACTAGAATTGAAATAAAGTGATGGTATATATN 

ATATAATTAAATGCACAGCCGAAGTTAAGGTAGAATAGTGATATATATN 

ATATAATTAAATGCACAGCCGAAGTTAAGGTAGAATAGTGATATATATATN 

 ATATATAAAATGTACAACCAGAATTAAGATAAAGTGATGATGTATATATN 

 ATATATAAAATGTACAACCAGAATTAAGATAAAGTGATGATGTATATATN 

 TATAATTAAATGCACAGCCGAAGTTAAGGTAGAATAGTGATATATATATN 

 ATAATAAAAATGCACAACTAGAATTGAAGTAAAATGATGATATATATATN 

 ATAATAAAAATGCACAACTAGAATTGAAATAAAGTGATGGTATATATN 

Reads 
7147 
889 
565* 
505 
Reading 
Frame 

ARF +1 

Reads 

122 
3 
1 
453 
224 
2 
166 
117 
86 
53 
15 
14 
5 

 

 

 


APPENDIX E. ND7 5'-most gRNA populations and the predicted mRNA sequences generated. 

A-J: ND7 gRNA major classes and predicted editing patterns. ND7 terminal (5' most)
gRNA populations and the predicted mRNA sequence generated. Predicted sequences
presented are based on the most abundant gRNAs that generate each reading frame found in
the four gRNA transcriptome databases. The ND7 transcript was initially characterized
using the EATRO 164 cell line and is unusual in that it is edited in two distinct domains [20].
While the 5' domain was edited in both life cycle stages, complete editing of the 3' domain was
only detected in bloodstream stage parasites. Interestingly, the most abundant EATRO 164 PC
(procyclic or insect form) gRNA would generate a sequence that brings the 5' most AUG into a
+2 frame. The ARF is 65 AA long and involves the entire 5' editing domain. In contrast, the most
abundant gRNAs in the EATRO 164 bloodstream stage library (EATRO 164 BS) would generate
sequences that use the originally described ND7 ORF. While gRNA transcript numbers again
varied greatly between the different cell lines, all three cell lines had gRNA sequence variants
that allowed access to both reading frames.
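
The protein sequences printed above each predicted mRNA can be checked by translating the edited sequence from each candidate AUG. The sketch below is illustrative only: the function names are hypothetical, and it assumes the mitochondrial genetic code in which UGA encodes tryptophan (consistent with the translations shown in the panels, where UGA is read as W). Asterisks (deleted U's) and dashes are stripped before translation.

```python
# Illustrative sketch; names are hypothetical and not from the dissertation.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODON_TABLE["UGA"] = "W"  # assumption: UGA is read as Trp in the mitochondrion


def mature(notation: str) -> str:
    """Edited mRNA as a plain sequence: drop '*', '-' and spaces, uppercase."""
    return "".join(ch for ch in notation if ch.isalpha()).upper()


def translate(seq: str, start: int) -> str:
    """Translate from index `start` until a stop codon or the end of `seq`."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Predicted mRNA for ND7 Form A, copied from panel A.
nd7_form_a = mature(
    "AUGACUACAUGAUAAGUAuCAuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG"
)
for start in (0, 8, 25):  # positions of the three AUG codons in this fragment
    print(start, start % 3, translate(nd7_form_a, start))
```

The second value printed on each line is the frame of that AUG relative to the 5' most AUG, which is how the three translations in panel A fall into different reading frames.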

A. ND7 Form A (Reading Frame: ORF)

Predicted mRNA Sequence
        M  I  S  I  I  L  C  Y  F  W  ST
M  T  T  W  ST           M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q
AUGACUACAUGAUAAGUAuCAuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
               :|||||||:||:||||:|||:||:||||:|:||||||||||||
               NUAUAGUAAGAUGCAAUGAAAGCCGUCAAGAGAAUGUAAACAUAUAAA

Cell Line      Reads   gRNA Sequence
EATRO 164 BS   20487   AAATATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATGATATN
TREU 927 PC    10537   AAATATACAAATGTAAGAAAACTATCGAGAGTGATGTAGAATGATATN
TREU 667 PC    787     AAATATACAAATGTAAGAAAACTATCGAGAGTGATGTAGAATGATATN

B. ND7 Form B (Reading Frame: ORF)

Predicted mRNA Sequence
M  T  T  W  Y  S  I  I  L  C  Y  F  W  ST
        M  I  ST          M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q
AUGACUACAUGAUAuAGUAuCAuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
             |||:|||||||:||:||||:|||:||:||||:|:||||||||||||
             NUAUUAUAGUAAGAUGCAAUGAAAGCCGUCAAGAGAAUGUAAACAUAUAAA

Cell Line      Reads   gRNA Sequence
EATRO 164 BS   35079   AAATATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATGATATTATN
TREU 927 PC    38432   AAATATACAAATGTAAGAAAACTATCGAGAGTGATGTAGAATGATATTATN
TREU 667 PC    4365    AAATATACAAATGTAAGAAAACTATCGAGAGTGATGTAGAATGATATTATN

C. ND7 Form C (Reading Frame: ORF)

Predicted mRNA Sequence
M  T  T  W  ST
        M  I  S  T  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q
AUGACUACAUGAUAAGUACAuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
                   |||||::|:|:||:|:::||::|:||||||||||||||||
                   NUAAAUGUAGUGAAGAUUGUCGGAGAAAUGUAAACAUAGCAUAUACA

Cell Line      Reads   gRNA Sequence
TREU 927 PC    75654   ACATATACGATACAAATGTAAAGAGGCTGTTAGAAGTGATGTAAATN

D. ND7 Form D (Reading Frame: ORF)

Predicted mRNA Sequence
M  T  T  W  ST
        M  I  I  V  S  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q
AUGACUACAUGAUAAuuGUAuCAuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
              |||:|||||:||||:|||::|:||:|||:||:|||||||||||||
              NUAAUAUAGUGAAUAUAAUGGAGACUAUCGAAGAAAUGUAAACAUAUAAA

Cell Line      Reads   gRNA Sequence
EATRO 164 PC   240     AAATATACAAATGTAAAGAAGCTATCAGAGGTAATATAAGTGATATAATN
TREU 927 PC    477     AAATATACAAATGTAAAGAAGCTATCAGAGGTAATATAAGTGATATAATN
TREU 667 PC    1152    ATATACACAAATGTAAAGAGACTATCGAGAGTGACATAAGTGATATAATN

E. ND7 Form E (Reading Frame: ORF)

Predicted mRNA Sequence
M  T  T  W  ST
        M  I  M   T  F  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F   G  P  Q
AUGACUACAUGAUAAuG*ACAuuuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
              ||: |#||:::|||||::|:|:|:::||||:|:|||||||||:|
              NUAU-UAUAGGGAAUACGGUGAGAGUUAUCAGAGAAAUGUAAAUAAUAUA

Cell Line      Reads   gRNA Sequence
EATRO 164 PC   765     ATATAATAAATGTAAAGAGACTATTGAGAGTGGCATAAGGGATATTATN

F. ND7 Form F (Reading Frame: ORF)

Predicted mRNA Sequence
M  T  T  W  ST
        M  I  S  T  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q
AUGACUACAUGAUAAGUACAuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
                   |||||::|:|:||:|:::||::|:||||||||||||||||
                   NUAAAUGUAGUGAAGAUUGUCGGAGAAAUGUAAACAUAGCAUAUACA

Cell Line      Reads   gRNA Sequence
TREU 667 PC    6623    ACATATACGATACAAATGTAAAGAGGCTGTTAGAAGTGATGTAAATN

G. ND7 Form G (Reading Frame: ARF +2)

Predicted mRNA Sequence
M  T  T  W  Y  S  I  I  Y  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A
AUGACUACAUGAUAuAGUAuCAuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
             |||:|||||:|||||::|:|:|:::||||:|:|||||||||:|
             NUAUUAUAGUGAAUACGGUGAGAGUUAUCAGAGAAAUGUAAAUAAUAUA

Cell Line      Reads    gRNA Sequence
EATRO 164 PC   100761   ATATAATAAATGTAAAGAGACTATTGAGAGTGGCATAAGTGATATTATN
EATRO 164 BS   354      ATATAATAAATGTAAAGAGACTATTGAGAGTGGCATAAGTGATATTATN

H. ND7 Form H (Reading Frame: ARF +2)

Predicted mRNA Sequence
M  T  T  W  ST
        M  I  S  T  F  Y  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A
AUGACUACAUGAUAAGUACAuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
                   |||:||:||||:|||:||:||||:|:||||||||||||
                   NUAAGAUGCAAUGAAAGCCGUCAAGAGAAUGUAAACAUAUAAA

Cell Line      Reads   gRNA Sequence
EATRO 164 BS   402     AAATATACAAATGTAAGAGAACTGCCGAAAGTAACGTAGAATN

I. ND7 Form I (Reading Frame: ARF +2)

Predicted mRNA Sequence
M  T  T  W  ST
        M  I  S  T  M  L  F  L  V  V  F  T  F  V  S  F  Y  I  W   S  T  A
AUGACUACAUGAUAAGUACAAuGuuAuuuuuGGuAGuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
                    ||:|:|||:|::|||::||:||||:|::||||:||||||||||
                    NUAUAGUAAGAGUCAUUGAAGAUGUGAGUAUAGUAAAAUGUAAAAUAUA

Cell Line      Reads   gRNA Sequence
TREU 667 PC    12929   ATATAAAATGTAAAATGATATGAGTGTAGAAGTTACTGAGAATGATATN

J. ND7 Form J (Reading Frame: ARF +2)

Predicted mRNA Sequence
M  T  T  W  ST
        M  I  S  T  F  I  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A
AUGACUACAUGAUAAGUACAuuuAuuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAG
                   ||||||::|:|:||:|:::||::|:||||||||||||||||
                   NUAAAUAGUAGUGAAGAUUGUCGGAGAAAUGUAAACAUAGCAUAUACA

Cell Line      Reads   gRNA Sequence
TREU 927 PC    251     ACATATACGATACAAATGTAAAGAGGCTGTTAGAAGTGATGATAAATN

APPENDIX F. RPS12 5'-most gRNA populations and the predicted mRNA sequences generated. 
A-E: RPS12 gRNA major classes and predicted editing patterns. RPS12 terminal (5' most)
gRNA populations and the predicted mRNA sequence generated. RPS12 differs from both CR3
and ND7 in that the alternative edit that shifts the reading frame occurs just downstream of the
previously identified start codon (double-underlined). Note that the identified
alternative gRNAs are rare in all of the gRNA libraries except TREU 667.
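
The frame shift introduced by an alternative edit can be estimated by comparing how many nucleotides two edited forms contain over the same stretch of pre-edited sequence. The sketch below is a hedged illustration: the function names are hypothetical, and the sign convention (reference length minus alternative length, modulo 3) is an assumption chosen so that the result matches the ARF +1 label given for Form C.

```python
# Illustrative sketch; names and sign convention are assumptions.
def mature_length(notation: str) -> int:
    """Number of nucleotides in the edited product ('*' and '-' are skipped)."""
    return sum(1 for ch in notation if ch.isalpha())


def frame_offset(reference: str, alternative: str) -> int:
    """Shift (0, 1 or 2) of the downstream reading frame in `alternative`
    relative to `reference`, for the same pre-edited region."""
    return (mature_length(reference) - mature_length(alternative)) % 3


# Region just downstream of the start codon, copied from panels A and C.
form_a = "uGA*UUUUUGUAUG"   # one U deleted
form_c = "uGA**UUUUGUAUG"   # two U's deleted
print(frame_offset(form_a, form_c))  # 1
```

A result of 0 would mean the two editing patterns differ in sequence but keep the same downstream reading frame.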
 

A. RPS12 Form A (Reading Frame: ORF)

Predicted mRNA Sequence
                               M  W   F  L  Y  G   C  C  L  R   F  V  L  F  V
CAAACUAAAGUAAuAuAuuAGuuuuuuGCGuAuGuGA*UUUUUGUAUG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGu
            ||||||:|:|:||:|:||||::||| |:||::|||| ||||||||
            NUAUAUAGUUAGAAGAUGCAUGUACU-AGAAGUAUAC-CAACAACAUAUA

Cell Line      Reads   gRNA Sequence
EATRO 164 PC   12531   ATATACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATATN
EATRO 164 BS   1663    ATATACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATATN
TREU 927 PC    505     ATATACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATATN
TREU 667 PC    936     ATATACAACAACCATATGAAGATCGTGTACGTAGAAGATTGATATATN

B. RPS12 Form B (Reading Frame: ORF)

Predicted mRNA Sequence
                                  M  W   F  L  Y  G   C  C  L  R   F  V  L  F  V
CAAACUAAAGUAAuAAAuuuuGuuuuuuuuGCGuAuGuGA*UUUUUGUAUG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGu
            ||||||:|::::||:|:|||:|||||:| :||:||||:| ||||||
            NUAUUUAGAGUGGAAGAGACGUAUACAUU-GAAGACAUGC-CAACAAAUA

Cell Line      Reads   gRNA Sequence
EATRO 164 PC   2218    ATAAACAACCGTACAGAAGTTACATATGCAGAGAAGGTGAGATTTATN
TREU 927 PC    300     ATAAACAACCATACAGAAGTTACATATGCAGAGAAGGTGAGATTTATN
TREU 667 PC    3834    ATAAACAACCGTACAGAAGTTACATATGCAGAGAAGGTGAGATTTATN

C. RPS12 Form C (Reading Frame: ARF +1)

Predicted mRNA Sequence
                                 M  W    F  C  M   V  V  V  Y   V  L  F  Y  L  F
CAAACUAAAGUAAAAAGuuuuuuuuuuuuGCGuAuGuGA**UUUUGUAUG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGu
             |||:|:|:|:|:|||::|||||:|||  ||:|||||| |||:|
             NUUUUAGAGAGAGAAAGUGCAUAUACU--AAGACAUAC-CAAUAUAUA

Cell Line      Reads   gRNA Sequence
EATRO 164 BS   144     ATATATAACCATACAGAATCATATACGTGAAAGAGAGAGATN

D. RPS12 Form D (Reading Frame: ARF +1)

Predicted mRNA Sequence
                   M  L  F  F  F  R  M  W    F  C  M   V  V  V  Y   V  L  F  Y  L  F
CAAACUAAAGUAuuAuAuAAuGuuGuuuuuuuuuCGuAuGuGA**UUUUGUAUG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGu
           |:||||||||:|::|||:|:|:||:|||::||  ||||||||| ||||||
           NUGAUAUAUUAUAGUAAAGAGAGAGUAUAUGCU--AAAACAUAC-CAACAAGAUAUA

Cell Line      Reads   gRNA Sequence
TREU 927 PC    22      ATATAGAACAACCATACAGAAATCGTATATGCGAGAGAAATGATATTATATN

E. RPS12 Form E (Reading Frame: ARF +1)

Predicted mRNA Sequence
                                    M  W    F  C  M   V  V  V  Y   V  L  F  Y  L  F
CAAACUAAAGUAAuuuAAAuuuuGuuuuuuuuGCGuAuGuGA**UUUUGUAUG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGu
            ||||||||:|::|:|:|||::|||||||||  ||:|||||| |||:|
            NUAAAUUUAGAGUAGAGAAAGUGCAUAUACU--AAGACAUAC-CAAUAUAUA

Cell Line      Reads   gRNA Sequence
TREU 667 PC    2664    ATATATAACCATACAGAATCATATACGTGAAAGAGATGAGATTTAAATN


 

 

 

 

 

 

APPENDIX G. Alignments of T. brucei and T. vivax edited mRNAs: ATPase 6 (A), COIII (B), CR3
(C), CR4 (D), ND3 (E), ND7 (F), ND8 (G), ND9 (H), and RPS12 (I).

Uppercase letters indicate nucleotides originally encoded in the DNA, lowercase u's
indicate uridines inserted during editing, and asterisks indicate uridines removed during editing.
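
The notation above is enough to recover both the pre-edited and the edited sequence from a single annotated string. The following is a minimal sketch with hypothetical function names. It assumes that dashes in the alignments are gap characters that carry no nucleotide, which is not stated in the caption.

```python
# Illustrative sketch; function names are hypothetical. Dashes are assumed
# to be alignment gaps.
def edited_sequence(notation: str) -> str:
    """Mature mRNA: keep encoded bases and inserted u's, drop deleted U's."""
    return "".join(ch for ch in notation if ch.isalpha()).upper()


def pre_edited_sequence(notation: str) -> str:
    """Genomically encoded transcript: drop inserted u's, restore deleted U's."""
    bases = []
    for ch in notation:
        if ch == "*":
            bases.append("U")          # uridine removed during editing
        elif ch == "u" or not ch.isalpha():
            continue                   # inserted uridine, gap or whitespace
        else:
            bases.append(ch.upper())   # nucleotide encoded in the DNA
    return "".join(bases)


def count_edits(notation: str) -> tuple[int, int]:
    """Return (inserted uridines, deleted uridines)."""
    return notation.count("u"), notation.count("*")


fragment = "GA****UUGuuuGG"
print(edited_sequence(fragment))      # GAUUGUUUGG
print(pre_edited_sequence(fragment))  # GAUUUUUUGGG
print(count_edits(fragment))          # (3, 4)
```

Applying count_edits to each line of an alignment gives a simple per-species tally of insertions and deletions for comparison.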
 
A. ATPase 6 - Pan-edited non-dual coding 
AAAAAUAAGUAUUUUGAUAUUAUUAAAGUAAAuAuGuuuuuAuuuuuuuuuuGuGAuuuA T. brucei 

---------------------------------AuGuuuuuGuuuuuuuuuuGuGAuuuG T. vivax 
 
UUUUG-GuuGCGuuuGuuA----uuAuGuAuGuAuuAuuGuGuAuGAuCuAGGuuAuGuu T. brucei 

UUUUG*GuuGCGuuuGuuA****UUAuGUGuGuAuuAuuGuGuGuGAuCuAGGuuAuGuu T. vivax 
 
uuAuuGuGuAuuuuAA---uUGuuuAAuGuuGAuuuuuG-AuuuuuuAuuAuuuuGuuuG T. brucei 

uuGuuGuGuAuuuuAA***UUGuuuGAuGuuAAuuuuuG*AuuuuuuGuuGuuuuGuuuG T. vivax 
 
*UUUGAuuuGuAuuuGuuuGuuGGuuuGuG***UUUGuuuuuAuuGuuGuGGuuuAuGuu T. brucei 

-uuuGAuuuGuAuuuGuuuAuuGGuuuAuG---uuuAuuuuuGuuAuuGuGGuuuAuGuu T. vivax 
 
GuuuA----AuuuAuAuAGuuuAAUUUUGuAuuA*UUGuAuuAC----uUAUUUG***AA T. brucei 

GuuuA****AuuuGuAuAGuuuGAUUUuGuAuuA*UUGuAuuACC****UAuuuG--*AA T. vivax 
 
uuuG*UAuuUGuuGuuuuGuAuuGuuuuuuuAuuGuA-----uAuuG-CAuuuuuAuuuu T. brucei 

uuuG-uAUUUGuuGUUUuGuAuuGuuuuuuuAuuGuA*****UA-uGuCAuuuuuGuuuu T. vivax 
 
uGuuuuGuuuuuuA-uGuGAuuuuuuuuuGuuuAAuAAuuuGuUA-GuuGGuGAuA**** T. brucei 

uGuuuuGuuuuuuGuuG-GAuuuuuuuuuGUUUAAuAGuuuGuuGUG-uGGuGAuA---- T. vivax 
 
GuuuuAuGGAuG--uuuuuuuuAUUC**GuuuuuuGuuGuGuuuuuuAGAGuGuuuuuCu T. brucei 

GuuuuAuGGAuGuuuuuuuuuuG-*C--GuuuuuuGuuGuGuuuuuuAGAAuGuuuuuCu T. vivax 
 
uuGuuGuGuC----GuuGuuuGuCGACGuuuuuGCGuuuG-UUUUGuAAuuuAuuAuCAu T. brucei 

uuGuuAuGUC****GuuGuuuGuCAACAuuuuuACGuuuG*UUUUGuAAuuuAuuGuCAu T. vivax 
 
CCCAuUUUUUAuuGuuGAuGuuuuuuG-A-uuuuuuuUAuuuuA-uuuuuGuuuuuuuuu T. brucei 

CCCAUUUUuuAuuGuuAAuGuuuuuuGuAuuuuuuuuUAuuuuAuuuuuuG-uuuuuuuu T. vivax 
 
uuuA--------uG----GuGuuuuuuG--uuA-uuGAuuuAuuuuAuuuAuuuuuGuG- T. brucei 

uuuGuuuuuuuuuGuuuuGuG-----uAuuuuAuuuGGuuuAuuuuGuuuG--uuuGUGu T. vivax 
 
-uuuuGuuuuuGuuuAuuAuuuuAU--G-uGuuuuuAuAuUUGuuGGAuuuAUUuGCC-- T. brucei 

uuuuuGuuuuuGuuuGuuGuuuuAUuuGuuG---uuGuAuuuAuuGGAuuuAuuuGCC** T. vivax 
 
***GC-CA-uAuuAC****AGuuAuuuAuuuuuuGuAAuAuGAuuuuGCAGuuGAuAAuG T. brucei 

***GC**GuuAuuAC----AGuuGuuuAuuuuuuGuAAuAuGAuuuuGCAAuuGGuAAuG T. vivax 
 
--G**AuuuuuuGuuGuuuuuGuuG-uuuGuuuAGuuuuGuAuuuGAuuuuuGAuAGuuA T. brucei 

**G**AuuuuuuAuuGuuuuuGuuGuuuuG-uuAG------------------------- T. vivax 
 
uuAuAuuGuuGuuGAAAuuuG**GuuUGuuA**UUGGAGUUAUAGAAUAAGAUCAAAUAA T. brucei 

------------------------------------------------------------ T. vivax 
 
GUUAAUAAUA T. brucei 

---------- T. vivax 
 

 
 
 

 


B. COIII - Pan-edited non-dual coding 
GGUUAUUGAGGAUUGUUUAAAAUUGAAUAAuuAuuAuuuuuuuAuGuuuuuGuuuC**** T. brucei 

------------------------------------------------------------ T. vivax 
 
*GuuGuAuAuuuGuuGGuGuuA-****GuGGuGuuuuuGuuuuuuuAuCuuuACCuGCCA T. brucei 

------------------------------------------------------------ T. vivax 
 
uuGuuAuuGuGuAuuGGuuAuuuuGuuuGuuG****GGAuuuAuuuGuuuAuuGUUUG** T. brucei 

------------------------------------------------------------ T. vivax 
 
**GuAGuuuuuuAuuuGuuGAuuGuG****GuuuuAuuuuuuuuuuuGuuGGuuuuuGuA T. brucei 

------------------------------------------------------------ T. vivax 
 
uuuGuuuGuuGuuGuuAuuGuuAGAuuuGuuuuGuGAuuuuuuACGuGGuuuAuuuGAuu T. brucei 

------------------------------------------------------------ T. vivax 
 
uuuGuGuuuuAuuACGuuGuAuCCAGuAuuGuuuuuuAuGGuuuuuAuGuAG*UGAGuuu T. brucei 

------------------------------------------------------------ T. vivax 
 
GuuuuAuuuAuGGCGuuuuuuG**UUGuAuuAuuuGGuuuAuGuuuAuuuuuGuGuuGuG T. brucei 

------------------------------------------------------------ T. vivax 
 
AGuuuGCUUUCGuuuuuuGuuuACCuuAuAuGuuuuGuuGuuuAuuAuGuGAuuAuGGuu T. brucei 

------------------------------------------------------------ T. vivax 
 
uuGuuuuuuAuuGG*UAuuuuuuAGAuuuAuuuAAuuuGuuGAuAAAuACAuuuuAUUUG T. brucei 

------------------------------------------------------------ T. vivax 
 
uuUGuuAGuGGuuuAuuuGuuAAuuuuuuuGuuuuGuGUUUUUGGuuuAGGuuuuuuuGu T. brucei 

------------------------------------------------------------ T. vivax 
 
uG**UUGuuGuuuuGuAuuAuGAuuGAGuuuGuuGuuuG****G--uuuuuuGuuuuuG- T. brucei 

--------------------------------uuG-uuG--uuGuuuuuuuuGuuuuuGu T. vivax 
 
uGAAACCA--GuuA---UGAGA**GUUUGCAuuGuuAuuuAuuACAuuAAGuuGuGG*** T. brucei 

uGAA-uCA**GuuG***UGGGA**AuuuACGuuGuuAuuuAuuACGuuGAGuuGuGG--- T. vivax 
 
*UG-uuuuuGGuuCuAuuuuAuuuuuAuuG---GAuuuAuUACAuuuuA**UGCAuGuuu T. brucei 

-uG*UUUUUGGuuCuAuuuuAuuuuuAuuG***GAuuuGuuGCAuuuuA--uGCAuGuuu T. vivax 
 
uuuuAGGuGuuuuGuuGuuG-uuuAuuuG-uuuuAuG--CGuuuGuuuAAuuuuuuGuGu T. brucei 

uuuuAGGuGuuuuAuuGuuGuuuuA--uGuuuUUAuG**CGuuuGuuuAGuuuuuuAuGu T. vivax 
 
AuGGAuACACGuuuuGuuuuuuuGuAuuGuGuuuGuuuAuAuuGACAuuuuGuuGA-UUU T. brucei 

AuGGAuACACGuuuuGuuuuuuuGuAuGuuGuuuGuuuGuAuuGACAuuuuGuuGAUUU* T. vivax 
 
AGuuuGAuuuuuuuuAuuGCGAuuuGuuuAuuuuGAuGuuuuAuG----uGuuAuGuAuu T. brucei 

GGuuuGGuuuuuuuuGuuGCGAuuuGuuuAuuuuGAuGuuuuAuG****UGuuAuGuAuu T. vivax 
 
uGuGuGuGuAAuuuuAuuGGuGuuuuUUUAGUUGuuGAuuA*GuuAAuuuGuAuuGGUAG T. brucei 

uGuGuGuGuAG------------------------------------------------- T. vivax 
 
UUUGUAGGAAG-- T. brucei 

------------- T. vivax 
 

 
 
 
 

 


C. CR3 - Pan-edited dual coding 
AGAAAUAUAAAUAUGUGUAUGAUAUAUAAAAACAAuGuuuGA****UUGuuuGGuuuuG- T. brucei 

----------------------------------AuGuuuGA---*UUGuuuAGuuuuGU T. vivax 
 
--uuG-uuuuUUUAuuGuuuGuuuGuACAuuuuuuuuGuuuuuuAuuuGuuuGuG----- T. brucei 

U***GuuuuuuuuAuuG-uuGuuuGuACAuuuuuuuuGuUUUUUGuuuAuuuGuG***** T. vivax 
 
***A**UUUGuuuuuAuGuuuGuuA-*UUUA------GuuuuuGuuuuuuAuuGGAuuuu T. brucei 

***A--uuuGuuuuuAuGuuuGuuG--uuuGuuuuuuG--uuUA---uuuG-uGGAuuuu T. vivax 
 
uGuuuuuuAuuuAA----uA--uGGGuuuA---uUGuuGuGuuuAuuuuuuuuuuuuAuu T. brucei 

uGuuuuuuGuuuAA****UA**UGGGuuuG***UUGuuGUGuuuAuuuuuuuuuuuuGuu T. vivax 
 
uuAuCAuuuGAuAuGuuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUU T. brucei 

uuAuCAuuuGAuAuGuuAuuAuCGuUUUUGuuAuuAuAuAuAAGuUUUCGUUAUUAA--- T. vivax 
 
AAAAAAGUAUGCAAAUAAUUUUUGU T. brucei 

------------------------- T. vivax 
 

D. CR4 - Pan-edited dual coding 
UAAUUUAUUGUUAUCUUUGUGUAUUUAUUAuuAuuuuAuuuuAA---uuuuG---GuuGu T. brucei 

-------------------------------------------AUUuuuuuGuuuGuuGu T. vivax 
 
GC--***AuuuuuuuuuuuuuuAuuuG***GuG*UGuuuGuGuuuuA*UGuA*C*A---- T. brucei 

GC*****AuuuuuuuuuuuuuuAuuuG---GuG-uGuuuGuGuuuuA-uGuA--uA**** T. vivax 
 
*GuuuAuGGuAuAuuuuAuuGuuGuuuuGuuuuuuGuuuuuGuuGUUUG-uuUGuG--uG T. brucei 

*GuuuGuGGuAuAuuuuGUuGuuGuuuuuuuuuuuGuuuuuuuuGuuuG***UGuGuuuG T. vivax 
 
GGuA--uGuuuuAuuuGuuuuGuuAuA----GuuGuuuGuuuuuuuuuGuuGuUUUG*GG T. brucei 

GGuA**UGuuuuAuuuGuuuuGuuAuA****GuuGuuuGuuuuuuuuuuuuGuuuuG-GG T. vivax 
 
uuGuG----AuuuuuuAuuG**G--uGuuuuG***AuuGuAuA---GuuuAuuuuuuuuG T. brucei 

uuGuG----AUUUUuuGuuA--GuuuA--UuG---AuuGuAuA***GUuuGuuuuuuuuG T. vivax 
 
uGAC-GuuAuAAuuUUGuuuAuuuuuuuuuuuuAuuuuGuuuuGuGuuuuuuG---uAuu T. brucei 

uGAC*GuuAuAAUuuuGuuuAUUUUuuuuuuuuGUuuuGUUuuGuGuuuuuuGuuuuGuu T. vivax 
 
G*UUGuuuuuA-uUUGGuuuGuuuGGuuuuuuuuuG***UAuuuuuuGUUGuGuuuuGuG T. brucei 

GuuuuuuuuuG----GUUU*GuuuGGuuuuuuuuuG---uAuuuuuuGuuAuGUuuuGuG T. vivax 
 
uuAuuuuuuGAuuuAuuuuuuAuGuUGuuuuuUGuuuuG--GG***UG*G-uuuuuuuGu T. brucei 

uuAuuuuuuGAuuuGuuuuuuAuGuuGuuuuuuG---uAuuGG----GUGGuuuuuuuGu T. vivax 
 
uuuuGuuuuuuuuuuuuGuuuAuGuuuGuuuuuA---uuuGuGGuuGuuG--uuAuuuuG T. brucei 

uuuuGuuuuuuuuuuuuGuuuAUGuuuA---uuGuuuuuuGuAGuuAUUG**UUGuuuuA T. vivax 
 
uuAGuuuGGuuGuuGUUG-uuAuuUGuG---uA--uA****GGUUUAuuUAuA*UGCGuu T. brucei 

uuAGuuuGGuuGuuGuuGUU*AuuuGuGU***AUuuA--UUGGuuuAuuuAuA-uGCG-- T. vivax 
 
uuuuAuuuuA------GAuAAuUAuG****G****UA**UUGGUUUUAUAAAAUGUUUUU T. brucei 

uuuuG---uGUUUU**AA------------------------------------------ T. vivax 
 
UCU T. brucei 

--- T. vivax 
 

 
 
 

 


E. ND3 - Pan-edited dual coding 
UCAAAAAAUCCUCGCCUUUUUACUUUA-GUUUGUUAUCAuuA--uuuuuAuAuuuGuuuu T. brucei 

--------------------------AUG----UUAuCA--AUuuuuuuGuAuuuG---- T. vivax 
 
UG---*A*UA-uuGuGGuuuA**UUAuuuuAuuuA-uAGG--uuuuuuuuuAuGuuuuuu T. brucei 

-GUUUuG-uGuuuGuGGuuuA--UuAuuuuA-uuGuuAGGuuuuuuuCU**AuGuuuuuu T. vivax 
 
AuGuuuuuuAuuGCAuuuuuuuG----AuuGuuuuCGuuGuuGuuuGuGGuuuuCGuGuG T. brucei 

AuGuuuuuuGuuACAuuuuuuuG****AuuGuuuuCGuuGuuGuuuAuGAuuuuCAuGuG T. vivax 
 
---GuUUGuA---uGAuAuG--A-----AuUCACGuuuG*GUGuuuuA--uACAuuGGAu T. brucei 

***GuuuGuA***UGAuAuG**A*****AuuCACGuuuG-GuGuuuuA**UACAuuAGAu T. vivax 
 
uuAUGuuuuGuuA-------------GuUGuUUGuuuuuuGuAuuGuu---AAAuuCCAu T. brucei 

uuAuGuuuuGuuA*************GuuGuuuGUuuuuuGuGuuAUU***GAAUuCuGu T. vivax 
 
UAuuuGu----GuUUUGuuGuuuGuuuuuG--UG----------A-uA*GuGuuGuuuuA T. brucei 

uAuuuGU****GuuuuGuuAUUUG----uAuuuGU*********GuuG-GuAuuGuuuuA T. vivax 
 
uuuuuGuuAU----GGuuuuuuG-uUUUUGuGGuuuuuGuuuuuuGuuG-uA-uGuA--- T. brucei 

uuuuuGUUAU****GGuuuuuuA*UUUUUGuGGuuuuuGuuuuuuG-uGuuGuuA--**U T. vivax 
 
uAG****GAuuUGuGuGGuAuuuuuGGGA-UCAC*GuAuAuuuGuGuGGuGUAAuuuuAu T. brucei 

UAG----GGuuuGuGUGGuAuuuuuGAGA*UCA-uGuAUAUU------------------ T. vivax 
 
uuuGuuuAuGA**UGuuuUUUGUUGUAUUAUACAUAUUAUAUUAAUAAAUAUAUAAAA T. brucei 

---------------------------------------------------------- T. vivax 
 

F. ND7 - Pan-edited dual coding 
UGAUACAAAAAAACAUGACUACAUGAUAAGUAuCAuuuuAuG-uuAuuuuuG--GuAGuu T. brucei 

---------------------------------------AuGuuuAuuuuuGuuGuAGuu T. vivax 
 
uuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG***CAGCACAuG**G- T. brucei 

uuuuuGCAuuuGuAUCGUuuuACAUuuG-GCCCACAGCAuCCCG---CAGCACAuG-*G* T. vivax 
 
uGuuuuAuGuuGuuuAuuGuAuuuuuGuGGuGA*AuuuAuuGuuuA**UA---UUGAuUG T. brucei 

UGuuuuAuGUuGuuuAUUGuAuuuuuGUGGuGA-AuuuAuuGuuuA--uA***UUGAuuG T. vivax 
 
uAuuAuA***G*GuuA--UUUGCAUCGUGGUACAGAAAAGUUAUGUGAAUAUAAAAGUGU T. brucei 

uAuuAuA---G-GuuA**UUUGCAUCGAGGUACAGAAAAGUUAUGUGAGUAUAAGAGCGU T. vivax 
 
AGAACAAUGUCUUCCGuAUUUCGA---CAGGUUAGAuuAuGuuA---*GuGuuuGuuGuA T. brucei 

AGAGCAGUGUCUUCCGuAUUUuGAU***AGAuuAGAuuAuGuuA****GuGuuuGuuGuA T. vivax 
 
AuGAGCAuuuGuuGuCuuuA***UGuuuuGAGuA--uAuGuuGCGAuGuuGuuuGuCGuu T. brucei 

AuGAACAuuuAuuGuCuuuA---uGuuuuGAGuA**UAuGuuACGGuGuuGuuuAuCAuu T. vivax 
 
ACGuuGuGCAuuuAuGCGuuuAuuAAuuGuA****GAAuuuAC***CCGuAGuuuuAAuG T. brucei 

GCGuGuuGCAuuuAuGCGuuuAuuGAuuGuA----GAGuuuACU**C*GUAGuuuuAAuG T. vivax 
 
GuuuGuuGuGuAuAuCAuGuAuGGuuuuGG*AuuuAGGuuGuuuGuCUCCGuuG*UUAuG T. brucei 

GuuuAuuGuGuGuGuCGuGuAuGAuuuuAG-AuuuAGGuuGuuuAuCCCCGuuA-UuAuG T. vivax 
 
AuCAuuuGAGGAA-***CG*UGA-CAAAuuGAuGACAuuuuuuGAuuuAuG**UUGuGGu T. brucei 

GuCAuuuGAGGAG****CG-uGAU*AAGuuAAuGACGuuuuuuGAuuuGuG--uuGuGGu T. vivax 
 
uGuCGuAuGCAuuuGGCUUUCAuGGuuuuAuuA-*GGuAUUCUUGAUGAuuuuGuuuuuG T. brucei 

uGuCGuAuGCAuuuGGCUUUCAuGGuuuuAuuG**GGuAuuCUUGAuGAuuuuGuuuuuG T. vivax 
 
GuuuuGuuGAuuuuuuGuuGuuGuuGA***UAAuAuCAuGuuuGuuuGuuAuGGAuuGuu T. brucei 

 


GuuuuGuuGAUUUuuuGuuGuuAuuGA---uAAuAuCGuGuuuGuuuGuuAuGGAuuGuu T. vivax 
 
AuGAuuuGuuAuuuG--uGGGuAA---UCGuuuAuuuUAuuuGCGuuuGC***GuGGuuu T. brucei 

AuGAuuuAuuGuuuG**UGGGuAA***UCGuuuGuuuuAuuuGCGuuuGC---GuGGuuu T. vivax 
 
GuCAuuuuuuGAuuuA---uAuGAuuuA**GuuuuuA**A**UAGuuuAAGuGGuGuuuu T. brucei 

GuCAuuuuuuGAuuuG***UAuGAuuuG--GuuuuuA--A--uAGuuuGAGuGGuGuuuu T. vivax 
 
GuCuCGuuCGuuAGGuAuGGuGuGAGAuuGUCGuuuAuuuAGuuGuuA****UGA***** T. brucei 

GuCACGuuCAuuGGGuAuGGuAuGAGAuuGCCGuuuAuAuuGuuGuuA-***UGA----- T. vivax 
 
GuUG-uAuuuuA---uGuuuuGuuAuGAuuAuuGuuuuuGuuuuAuA-GGuGAuGCAuuu T. brucei 

GuuG*UAUUUUA***UGuuuuGuuAuGAuuAuuGuuuuuGuuuuAuA*GGuGAuGCAuuu T. vivax 
 
GA*UCGuuuAuuuuuACGuuuGuuuGAUAuGCGuAuGAGuuuGuuGAuuuGuAAGCAA-u T. brucei 

GAC*CGuuuGuuuuuGCGuuuGuuuGAuAuGCGuAuGAGuuuGuuGAuuuGuAAGCAA*U T. vivax 
 
GuuuuuuuGuuGGuuuuuuuGuuuuuG*****GuuuuGuuuGuuuGuuuG**AuuAuuuA T. brucei 

GuuuuuuuGuuGGuuuuuuuGuuuuuG-----GAuuuGuuuGuuuGuuuG--AuuAuuuG T. vivax 
 
uAuuGuGAuAuuACCAuuG****AGACCAuuAuuAuGuuAuuuuAuAGuuuG--uGGuGu T. brucei 

uAuuGuGAuGuuACCAuuG----AGACuAuuAuuAuGuuGuuuuAuAGuuuA**UGGuGu T. vivax 
 
uGuuGuuuGCCGGGuAuAU*-----CAuuuGC*UUGU-GuuGAACACCCCAAAG-----G T. brucei 

uGuuGuuuACCAGGuAuAU******CAUUUGC-UUGU*GuuGAGCAuCCCAAGG*****G T. vivax 
 
uGA***GuAuuGuuuGuuAuuAU****GuuuuuGuGuuGGuuuAuGuuCUCGuuuACGuu T. brucei 

uGA---GuAuuGuuuGuuAuuAU****GuuuuuGuGuuGGuuuGuGUUCCCGuuuGCGuu T. vivax 
 
uGCGuuGuGCGGAuuuuuuGCA--*UA--UUUGuuuAuuGGAuGuuuGuuuGCGuGGuuu T. brucei 

uGCGuuGuGCGGAuuuuuuACA***UA**UUUGuuuGuuGGAuGuuuGuuuACGuGGuuu T. vivax 
 
uuuAuuGCAuGAuuuAGuuGC--***C*GuuuuA--GGuAAuAuuGAuGuuGuuuuuGGA T. brucei 

uuuAuuGCAuGAuuuAGuuGC*****C*G--uuAuuGGuAAuAuuGAuGuuGuuuuuGGA T. vivax 
 
uCC--GUAGAUCGuuA*GuuuuAuAuGuG**A******GGUUAUUGuAGGAUUGUUUAAA T. brucei 

uCU**GuGGAUCGuuA*G------------------------------------------ T. vivax 
 
AUUGAAUAAAAA T. brucei 

------------ T. vivax 
 

G. ND8 - Pan-edited non-dual coding 
-------CAAUUUAAUAAUUUUAAGUUUUGGUUGAUUAuuAuuuuuuuAuuuuuuuAuuu T. brucei 

------------------------------------------------------------ T. vivax 
 
uuGuAuGuuuuuuuuuGAuuuuuuGuuuuuuuUUUUUGuuuGuuuuuAuAuGuGUuuuGu T. brucei 

----AuGuuuuuuuuuGAuuuuuuGuuuuuuuuuuuuGuuuGuuuuuAuAuGuGuuuuGu T. vivax 
 
uuGuuGuGuuA****CuA*UUUGuuuA-***CCCAuuGAGuuAACCA--uuGuuAGuuuA T. brucei 

uuGuuGuGuuA---CC-A-UUuGuuuA****CCCAuuGAAuuAAC-AuuuuG-uAGuuuG T. vivax 
 
uuGGuuC--GuGGUAA---C-C---AuuuuuuGCGUUUUuA***UUGGuGuGGuuuAGAG T. brucei 

uuGA-*CCCGuGGuAA***C*C***AuuuuuuGCGuuuuuA--*UUGAuGuGGuuuAGAA T. vivax 
 
CGuuGuAuuGCuuGuCGuuuAuGuGAuuuAAuuuG-C-----CCuA****GuuuAGCAuu T. brucei 

CGUuGuAuuGCuuGuCGuuuAuGuGAuuuGAuuuGuC*****CC-A----GuuuAGCAuu T. vivax 
 
GGAuG***UUCGuGuuGGGuGG---AGuuuuGGuGGuCA**UC*GuuuuGCG--GAuuG- T. brucei 

AGAuG---uuCGuGuuGGGuGG***AGuuuuGGuGGuCA--uC-GuuuuGCA**GAuuG* T. vivax 
 

 


-AuuuACAuuGAGuuA-*UC**GU-**C----GuuGuAuuuAuuGuGGuuuuuGuAuGCA T. brucei 

*AuuuACAuuGAGuuA***C-CG***AC****GuuGuAuuuAuuGuGGuuuuuGuAuGCA T. vivax 
 
uGuuuGCCCGACAGAU****G---CC----AuuA-----CGCA---UUCAuuGuuuGuuA T. brucei 

uGuuuGuCCAACAGAu----G***CC****AuuA*****CACA***UUCAuuGuuuGuuA T. vivax 
 
uGuGuuuuuGuuGuuuA-------GCC**A**UGuAuuuAuuG*GCGC***C***C---- T. brucei 

uGuGuuuuuGuuGuuuA*******GCC**A--UGuAuuuAuuG-GCGC---C---C**** T. vivax 
 
AAGuuuuuAuuGuuuGG---uuGuuGuuuuAuGuuAuuuGAuuuuuAuuuGuGuuuuGuG T. brucei 

AAGuuuuuGuuAuuuGG***UUGuuGuuuuAuGuuGuuuGAuuuuuAuuuGuGuuuuGuG T. vivax 
 
uAGuuAuuuAuuuuGGGuGAuuuAuuGUGuuuAuGAuuuAA***AGAA**AuuCACGGUG T. brucei 

uAG--------------------------------------------------------- T. vivax 
 
AAAUUAAAUUUUGACUAAAU T. brucei 

-------------------- T. vivax 
 

 

H. ND9 - Pan-edited dual coding 
UUAAUAUCAACUUAAUUUUUUUUAUAAACAuuAuAuuAUGuGuA--uAuUUUUAuGuuuA T. brucei 

-------------------------------------AuG-G-GuuuGuuGuuGuGuuuA T. vivax 
 
uuuCGuuuAuGuuuuuGuuuAAuuUUAuuuuA**UUGuuuGuGuuGuA----------GA T. brucei 

uuuCGUUuGuGUUUUUGuuuGAUuuuGuuuuA--UuGuuuGUGuuGuA**********GG T. vivax 
 
uGGuGuuUUGuuuGuuuuGuuGA-uuGuAGuuuuuuGuuuuuuuAuuGuuuuGuuA-Guu T. brucei 

UGGuGuuuuGuuuGuuuuGuuGA*UUGuAGuuuuuuGuuuuuuuAUUGUUUuGuuA*Guu T. vivax 
 
uuuuuuuGuuuuAUUGuAuGuuuuuAuuuuuuAAuuuGuG-AuuuuuGuuuuuAuAuuGU T. brucei 

uuuuuuuGuuuuAuuGuAuGuuuuuAUUuuuuAAuuuAuG*GuuuuuGuuuuuGuAuuGu T. vivax 
 
UGuG-AuUUGuuAuuGAuuGAuuuuuGuGGuuuuuGuuuuuGuCGuuuuAuGuuGuuGUA T. brucei 

uGuG*AuuuAuuGuuGAuuGAuuuuuGuGGUuuuuGuuuuuGuCGuuuuAuGuuAuuAuA T. vivax 
 
uAuuuuAuuuuGuuuGuuuuuGuGuGuuCGuuuGuGuuuuGuuuuGuGuuGUUUGuuuGU T. brucei 

uAuuuuGuuuuGuuuGUuuuuGuGuuuuCGuuuAuGuuuuGuuuuGuGUUGUUuGuuuuu T. vivax 
 
AuuuuuuGGAuuG----uGuuuuA-*GuuuuA**GuuGuuuuuGuUAuGC---GuuuuuG T. brucei 

GUUUUuuGGAuuG****UGuuuuA**GuuuuA--GuuGuuuuuGuuAuGC***GuuuuuA T. vivax 
 
uuGuuGGA----ACGC*GAAuGuuuUGAUUUGuuuGGuuuuUAuuuuG--uuGGuAAuGA T. brucei 

UUGuuAGA****ACG-uGAGuGuuuuGAuuuGuuuGGuuuuuAuuuuG**UUGGuAAuGA T. vivax 
 
uAuuuuACAUCGuuuAuuuGuuG--AuuG****GuuuuuuGuuG-GuuuuuuuuuGuuGA T. brucei 

uGuuuuACACCGuuuAUuuGuuG**AuuG----AUUuuuuGuuG*GuuuuuuuuuGuuGA T. vivax 
 
-AGuGuuAUCCA--uuAuuuGGuuuGuuuGuAuuGuuAuuuuGuG---uGuuG**GuG-G T. brucei 

*AGuGuuAuCCA**UUAuuuGGUuuGuuuGuAuuAuuGuuuuGuGuuuuA--GuuA-AuG T. vivax 
 
A--GGA-GAUA-GuAuGuACGuuuACAAuGuuA--uuuuuGuuGuuGCAuACC**AAuuU T. brucei 

A-**GAUGAuAuGuA----CGuuuACAAuG--GuuuuuuuGuuGuuGCAuACC**AAuuu T. vivax 
 
UUAuuuG*CA-----uuAuuuuAuuuA***AuA**UCACCGuUGUAAUUCUAAAUUUCUC T. brucei 

uuAuuuG-CAUUuuuuuG-uuuA--*G--------------------------------- T. vivax 
 
ACUUCC T. brucei 

------ T. vivax 
 

 


I. RPS12 - Pan-edited dual coding 
CUAAUACACUUUUGAUAACAAACUAAAGUAAAuAuAuuuuGuuuuuuuuGCGuAuGuGA* T. brucei 

-----------------------------------------------------AuGuGA* T. vivax 
 
UUUUUGUAUG*GuuGuuGuuuAC----*GuuuuGuuuuAuuuGuuuuAuGuuAuuAuAuG T. brucei 

UUUUUGuAuG-GuuGuuGUUUGC*****GuuuuGuuuuGuuuGuuuuAuGuuAuuAuAuG T. vivax 
 
AGuCC----G**CGAuuGCCCAGuuCCGGuAACCGACGuGuAuuGuAuGC**C****GuA T. brucei 

AGUCC****C--CGAuuGCCCAGuuCCGGuAAuCGACGuGuGuuGuAuGC--C--**GuG T. vivax 
 
uuuuAuuUAuAuAAuuuuGuuuG-GA-uGuuGCGuuGuuuuuuuuGuuGuuuuAuuG--- T. brucei 

uuuuAuuuGuAuAAuuuuG--uGuGGuuGuuGCGuuGuuuuuuuuGuuG---uG-uGUUU T. vivax 
 
---GuuuA---GuuA--uG**UCAuuAuuuAuuAuAGA--***G----GGUGGuGGuuuu T. brucei 

UuuG---GuuuG-CAUUUG--UCGuuAuuuAuuAuAGA*****G****GGuGGuGGuuuu T. vivax 
 
GuuGAuuuACCC--***G****GuG*UAAAGuAuuAuACA*CG**UAuuG--uA--AGuu T. brucei 

GuuGAuuuACCC*****G--**GuA-UAAAGuAuuAuACA-CG--uA-uGuuuAuuAAuu T. vivax 
 
AGA*UUUAGAuAUAAGAUAUGUUUUU T. brucei 

AA------------------------ T. vivax 
 

 

 

 

 


APPENDIX H.  Alignments of protein sequences of pan-edited dual-coding genes in L.
tarentolae, L. amazonensis, P. serpens, and Perkinsela CCAP1560/4 with T. brucei and T. vivax
sequences.

A: CR3, B: CR4, C: ND3, D: ND7 5' Editing Domain, E: ND9, F: RPS12, G: ND8 (nondual-coding).
Absent sequences were unavailable. All ORF alignments show published protein
sequences.  All ARF alignments show +1 or +2 (ND7 only) reading frame translations of the
full-length mRNA sequences.  In the ARF alignments of CR3, ND7 and RPS12, translations were
made using the alternative T. brucei mRNA sequences shown in Figure 8.  Two alignments of
CR3 ARFs are presented to display the P. serpens +2 reading frame, which has no stop codons.
The published L. tarentolae CR4 protein sequence shows limited homology in the C terminus to
all other CR4 protein sequences.  The edited mRNA has two editing sites where 13 U residues are
inserted. If the second of these insertion sites is shortened to 12 U residues, the translation of
this mRNA has much better homology to other CR4 proteins. Alignments with translations of
the two different sequences (13U and 12U) are both shown, with the location of the altered
site highlighted in red. Although ND8 does not appear to be dual-coding, its alignment is
included to compare the conservation of a nondual-coding gene with that of the
dual-coding genes. Note that ND8 is the only nondual-coding gene that is pan-edited
in L. tarentolae, L. amazonensis, and P. serpens; ND7, A6, and COIII are only partially
edited in these species. ! = termination codon.
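The ARF translations described above amount to reading the same edited mRNA from a shifted start position. A minimal Python sketch follows; it is illustrative only and is not the procedure used to generate these alignments. It assumes the protozoan mitochondrial genetic code (NCBI translation table 4, in which UGA encodes tryptophan, consistent with the W shown over UGA in the RPS12 panels), writes termination codons as "!" to match the alignments, and treats the frame argument as the number of leading nucleotides skipped, which may not correspond exactly to the +1/+2 naming used here. The function name translate is hypothetical.

```python
BASES = "UCAG"
# Standard code in UCAG order, with UGA reassigned to W (assumed table 4)
# and stop codons written as "!" as in the alignments.
AMINO_ACIDS = (
    "FFLLSSSSYY!!CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(rna: str, frame: int = 0) -> str:
    """Translate an RNA string, skipping `frame` leading nucleotides.

    Trailing nucleotides that do not fill a codon are ignored.
    """
    rna = rna.upper().replace("T", "U")
    return "".join(
        CODON_TABLE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3)
    )


if __name__ == "__main__":
    # Edited RPS12 Form C fragment, with editing annotation already removed.
    edited = "AUGUGAUUUUGUAUGGUUGUUGUUUACGUUUUGUUUUAUUUGU"
    print(translate(edited))           # MWFCMVVVYVLFYL
    print(translate(edited, frame=1))  # same mRNA read one nucleotide downstream
```

Because every codon downstream of a length change is re-read, shortening a run of inserted uridines by one residue (as in the 13U versus 12U CR4 comparison) changes the entire translation after that site, which is why both versions are aligned separately.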
 
A. CR3 
CR3 ORF Alignment 
CR3ORF T. brucei        --MFDCLVLL-FFYCLFVHFFCFLFVCDLFLCLLFSFCFLLDFCFLFNMGLLLCLFFFFI 
CR3ORF T. vivax         --MFDCLVLL-FFLLLFVHFFCFLFICDLFLCLLFVFCLFVDFCFLFNMGLLLCLFFFFV 
CR3ORF L. amazonensis   --MFDFVIIMFL-FMSFVHFFCFLFIVDLLFCLMFFVFFLYDFCFVCNLGFCCCLFFFFL 
CR3ORF P. serpens       IFLFDFVLFLVLFLLFFVHFFCFLFIIDLFCCFLLLFFLVFDFCFCCCFGFVSCLFLFFV 
                          :** :::: :    *********: **: *::: . :. ****   :*:  ***:**: 
 
CR3ORF T. brucei        LSFDMLLSFLLLYISFRY! 
CR3ORF T. vivax         LSFDMLLSFLLLYISFRY! 
CR3ORF L. amazonensis   LSIDMILSFILLYVSFRY! 
CR3ORF P. serpens       FHFDMVLSFILLFVSFRY! 
                        : :**:***:**::**** 
 
CR3 ARF Alignment with P. serpens +1 RF 
CR3ARF T. brucei         ---------RNINMCMIYKNNVYVVVLFWFWLYIFFVFYLFVICFYVCYLVFVFYWIFVF 
CR3ARF T. vivax          --------------CLI----V!FCCFFYCCLYIFFVFCLFVICFYVCCLFFVYLWIFVF 
CR3ARF P. serpens +1     !!VYNI!KHNILYFCLI---LFCF!CYFYCFLCIFFVFYLLLICFVVFYYCFF!CLIFVF 
CR3ARF L. amazonensis    --K!NNMY!V!IYICLI----SLL!CFCLWVLYIFFVFYLLLICYFVWCFLFFFYMIFVL 
                                       *:*      .       * ***** *::**: *    *.   ***: 
 
CR3ARF T. brucei         YLIWVYCCVYFFFLFYHLICCYHFYYCI!VFVIRLKKYANNFC- 
CR3ARF T. vivax          CLIWVCCCVYFFFLFYHLICYYRFCYYI!VFVI----------- 
CR3ARF P. serpens +1     VVVLVLWVVYFYFLFFILIWFYLSYYYL!VSVIKSI!KHKLIS- 
CR3ARF L. amazonensis    CVI!VFVVVCFFFFCYPLIWFCRLFYYMLVSDIKII!LLFL!!K 
                          :: *   * *:*: : **      * : *  *            
 
CR3 ARF Alignment with P. serpens +2 RF 
CR3ARF T. brucei         -------RNINMCMIYKNNVYVVVLFWFWLYIFFVFYLFVICFYVCYLVFVFYWIFVFYL 
CR3ARF T. vivax          ------------CLI----V!FCCFFYCCLYIFFVFCLFVICFYVCCLFFVYLWIFVFCL 
CR3ARF P. serpens +2     ----------SKCIIYKNIIFYI-FVWFC----FVFSVIFIVFCAFFLFFIYYWFVLLFF 
CR3ARF L. amazonensis    K!NNMY!V!IYICLI----SLL!CFCLWVLYIFFVFYLLLICYFVWCFLFFFYMIFVLCV 
                                     *:*         :        *** ::.* : .  :.*.:  :.:: . 
 
CR3ARF T. brucei         IWVYCCVYFFFLFYHLICCYHFYYCI!--------VFVIRLKKYA--NNFC------ 
CR3ARF T. vivax          IWVCCCVYFFFLFYHLICYYRFCYYI!--------VFVI------------------ 
CR3ARF P. serpens +2     IIVFFSVWFL--FLLLFWFCELFIFIFCFSFWYGFIFHIIICKFPLLKAFKNISL!V 
CR3ARF L. amazonensis    I!VFVVVCFFFFCYPLIWFCRLFYYML--------VSDIKII!LL--FL!!K----- 
                         * *   * *:     *:   .:   :         :  *                   

 


B. CR4 
CR4 ORF Alignment with L. tarentolae 13U translation (Published Sequence) 
CR4ORF T. brucei          ILILVVHFFFFYLVCLC-----FMYSLWYILLLFCFLFLLFVCVGMFYLFCYSCLFFFVV 
CR4ORF T. vivax           IFLFVVHFFFFYLVCLC-----FMYSLWYILLLFFFLFFLFVCLGMFYLFCYSCLFFFFV 
CR4ORF L. tarentolae 13U  -------------KCCCFWFFYVLFCVLYILFLFFFLFIWFVCYGLFYLYCICLFICFSL 
CR4ORF L. amazonensis     ---ISNILLFLYIFIYICWLIF-MYSCWYILILFFFLFLLFVVYGLFYLYCIVCLFILCL 
                                                 ::.  ***:** ***: **  *:***:*   :: : : 
 
CR4ORF T. brucei          LGCDFLLVFWLYSLFFLWRYNFVYFFFLFCFVFFVLLF-LFGLFGFFLYFLLCFVLFFDL 
CR4ORF T. vivax           LGCDFLLVYWLYSLFFLWRYNFVYFFFLFCFVFFVLLF-FFGLFGFFLYFLLCFVLFFDL 
CR4ORF L. tarentolae 13U  LCCDFVVVFWLYSVFFVYRYNYFFFFVYFLGVYFFVIILICIWFFIFFFLCLCFDFL--F 
CR4ORF L. amazonensis     LCCDFVVVFWLYSVFFIYRYNFVFFFFFLWFVFIFLIIFIFGFGFLFFFLVLCLVFYFEF 
                          * ***::*:****:**::***:.:**. :  *::.::: :     :*::: **: :   : 
 
CR4ORF T. brucei          FFMLFFVLGGFFVFVFF---------FCLCLFLFVVVVILLVWLL-----LLFVYRFIYM 
CR4ORF T. vivax           FFMLFFVLGGFFVFVFF---------FCLCLLFFVVIVVLLVWLL-----LLFVYRFIYM 
CR4ORF L. tarentolae 13U  WIFVYVVFCFLWIFVVCDVYFIFYIIFCFNCVGVLLVVIYICVSIFLYDVLYFNFNWIIL 
CR4ORF L. amazonensis     LFMLFFVFCGFLLFVMFILFFVSFF---------VLIVLLFCWMLF-----IFVFRFICM 
                           ::::.*:  : :**.                  :::*: :   :       * :.:* : 
 
CR4ORF T. brucei          RFLF! 
CR4ORF T. vivax           RFVF! 
CR4ORF L. tarentolae 13U  KF--- 
CR4ORF L. amazonensis     RFVF! 
                          :*    
 
CR4 ORF Alignment with L. tarentolae 12U translation (Hypothetical Sequence) 
CR4ORF T. brucei          ILILVVHFFFFYLVCLC-----FMYSLWYILLLFCFLFLLFVCVGMFYLFCYSCLFFFVV 
CR4ORF T. vivax           IFLFVVHFFFFYLVCLC-----FMYSLWYILLLFFFLFFLFVCLGMFYLFCYSCLFFFFV 
CR4ORF L. tarentolae 12U  -------------KCCCFWFFYVLFCVLYILFLFFFLFIWFVCYGLFYLYCICLFICFSL 
CR4ORF L. amazonensis     ---ISNILLFLYIFIYICWLIF-MYSCWYILILFFFLFLLFVVYGLFYLYCIVCLFILCL 
                                                 ::.  ***:** ***: **  *:***:*   :: : : 
 
CR4ORF T. brucei          LGCDFLLVFWLYSLFFLWRYNFVYFFFLFCFVFFVLLFL-FGLFGFFLYFLLCFVLFFDL 
CR4ORF T. vivax           LGCDFLLVYWLYSLFFLWRYNFVYFFFLFCFVFFVLLFF-FGLFGFFLYFLLCFVLFFDL 
CR4ORF L. tarentolae 12U  LCCDFVVVFWLYSVFFVYRYNYFFFFVYFLGVYFFVIILICIWFFIFFFYVCVLIFYFEF 
CR4ORF L. amazonensis     LCCDFVVVFWLYSVFFIYRYNFVFFFFFLWFVFIFLIIFIFGFGFLFFFLVLCLVFYFEF 
                          * ***::*:****:**::***:.:**. :  *::.::::      :*:: :  ::::*:: 
 
CR4ORF T. brucei          FFMLFFVLGGFFVFVFFFCLCLFLFVVVVILLVWLLLLFVYRFIYMRFLF!--------- 
CR4ORF T. vivax           FFMLFFVLGGFFVFVFFFCLCLLFFVVIVVLLVWLLLLFVYRFIYMRFVF!--------- 
CR4ORF L. tarentolae 12U  LFMLFFVFCGFLLFVMFILFFISFFVLIVLVFCWLLFIFVFRFFCMTFCILILIGLF!NL 
CR4ORF L. amazonensis     LFMLFFVFCGFLLFVMFILFFVSFFVLIVLLFCWMLFIFVFRFICMRFVF!--------- 
                          :******: **::**:*: : : :**::*::: *:*::**:**: * * :          
 
CR4 ARF Alignment with L. tarentolae 13U translation (Published Sequence) 
CR4ARF T. brucei          -----IYCYLCVFIIILF!FWLCIFFFFIWCVCVLCTVYGIFYCCFVFCFCCLFVWVCFI 
CR4ARF T. vivax           -----------------FFCLLCIFFFFIWCVCVLCIVCGIFCCCFFFCFFCLCVWVCFI 
CR4ARF L. tarentolae 13U  QIH!NTYMYNCKSVV-VFGFFMY---------YFVC---CIFYFCFFFCLFDLCVMVYFI 
CR4ARF L. amazonensis     ------DIKNIK!VI-FYYFYIFL---FTFVGWFLCIVVGIFWFYFFFYFCYL!FTVYFI 
                                           :   :           .:*    **   *.* :  * . * ** 
 
CR4ARF T. brucei          CFVIVVCFFLLFWVVIFYWCFDCIVYFFCDVIILFIFFFYFVLCFLYCCFYLVCLVFFC- 
CR4ARF T. vivax           CFVIVVCFFFLFWVVIFC!FIDCIVCFFCDVIILFIFFFCFVLCFLFCCFFLVCLVFFC- 
CR4ARF L. tarentolae 13U  YIAFVCLFVLVCYVVILLLCFDCIVFFLFTVIIIFFFLFIFWVFIFLLLFWFVFGFLFFF 
CR4ARF L. amazonensis     CIALFVCLFYVCYVVILLLCFDCIVFFLFIDIILFFFFFFYGLCLFF!LFLYLDLVFYFF 
                           :.:.  :. : :***:   :**** *:   **:*:*:* : : ::   *  :  .::   
 
CR4ARF T. brucei          IFCCVLCYFLIYFLCCFLFWVVFLFLFFFFVYVCFYLWLLLFC!FGCCCYLCIG------ 
CR4ARF T. vivax           IFCYVLCYFLICFLCCFLYWVVFLFLFFFFVYVYCFL!LLLFY!FGCCCYLCIG------ 
CR4ARF L. tarentolae 13U  FYVCVLIFYF------------------------EFLFMLFFVFCGFLLFVMFILFFISF 
CR4ARF L. amazonensis     F!FCVWFFILNFCLCYFLFFADFCCLWCLFYFLCHFLF!LFYYFVGCYLYLYFV------ 
                          :   *  : :                         :*  *::   *   :: :        
 
CR4ARF T. brucei          --LFICVFYFR!LWYWFYKMFF--------------- 
CR4ARF T. vivax           --LFICVLCF--------------------------- 
CR4ARF L. tarentolae 13U  FVLIVLVFCWLLFIFV-FRFFCMTFCILILIGLF!NL 
CR4ARF L. amazonensis     --LFVCVLCFK---VG-YEFIFI-------------- 
                            *:: *: :                            
 

 


CR4 ARF Alignment with L. tarentolae 12U translation (Hypothetical Sequence) 
CR4ARF T. brucei          -----IYCYLCVFIIILF!FWLCIFFFFIWCVCVLCTVYGIFYCCFVFCFCCLFVWVCFI 
CR4ARF T. vivax           -----------------FFCLLCIFFFFIWCVCVLCIVCGIFCCCFFFCFFCLCVWVCFI 
CR4ARF L. tarentolae 12U  QIH!NTYMYNCKSVV-VFGFFMY---------YFVC---CIFYFCFFFCLFDLCVMVYFI 
CR4ARF L. amazonensis     ------DIKNIK!VI-FYYFYIFL---FTFVGWFLCIVVGIFWFYFFFYFCYL!FTVYFI 
                                           :   :           .:*    **   *.* :  * . * ** 
 
CR4ARF T. brucei          CFVIVVCFFLLFWVVIFYWCFDCIVYFFCDVIILFIFFFYFVLCFLYCCFYLVCLVFFCI 
CR4ARF T. vivax           CFVIVVCFFFLFWVVIFC!FIDCIVCFFCDVIILFIFFFCFVLCFLFCCFFLVCLVFFCI 
CR4ARF L. tarentolae 12U  YIAFVCLFVLVCYVVILLLCFDCIVFFLFTVIIIFFFLFIFWVFIFLLLFWFVFGFLFFF 
CR4ARF L. amazonensis     CIALFVCLFYVCYVVILLLCFDCIVFFLFIDIILFFFFFFYGLCLFF!LFLYLDLVFYFF 
                           :.:.  :. : :***:   :**** *:   **:*:*:* : : ::   *  :  .:: : 
 
CR4ARF T. brucei          FCCVL-CYFLIYFLCCFLFWVVFLFLFFFFVYVCFYLWLLLFC!FGCCCYLCIGLFICVF 
CR4ARF T. vivax           FCYVL-CYFLICFLCCFLYWVVFLFLFFFFVYVYCFL!LLLFY!FGCCCYLCIGLFICVL 
CR4ARF L. tarentolae 12U  FMFVFWFFILNFCLCCFLFFVDFCCLWCLFYFLYHFLF!LCWCFVGCYLYLCFDFFVWRF 
CR4ARF L. amazonensis     F!FCVWFFILNFCLCYFLFFADFCCLWCLFYFLCHFLF!LFYYFVGCYLYLYFVLFVCVL 
                          *   .  ::*   ** **::. *  *: :* ::  :*  * :  .**  ** : :*:  : 
 
CR4ARF T. brucei          YFR!LWYWFYKMFF 
CR4ARF T. vivax           CF------------ 
CR4ARF L. tarentolae 12U  VF!F!LDYFKI--- 
CR4ARF L. amazonensis     CFKVGYEFIFI--- 
                           *             
 
C. ND3 
ND3 ORF Alignment 
ND3ORF T. brucei        --LLSLFLYLFLILWFIILFIGFFLCFLCFLLHFFDCFRCCLWFSCGLYDMNSRLVFYTL 
ND3ORF T. vivax         --MLSIFLYLVLCLWFIILLLGFFLCFLCFLLHFFDCFRCCLWFSCGLYDMNSRLVFYTL 
ND3ORF L. tarentolae    ------------------------FGRREKVLHFFDCFRCCLWFSCGLYDMNSRFVYVSI 
ND3ORF L. amazonensis   MFFCSFYFNFVLIFCMLILSIGVLFYIFMFLLHFFDCFRCCLWFSCGLYDMNSRLCYIFI 
                                                :     :***********************: :  : 
 
ND3ORF T. brucei        DLCFVSCLFFVLLNSIICVLLFVFVIVLFYFCYGFLFLWFLFFVVCIGFVWYFWDHVYLC 
ND3ORF T. vivax         DLCFVSCLFFVLLNSVICVLLFVFVLVLFYFCYGFLFLWFLFFVLLLGFVWYFWDHVY-- 
ND3ORF L. tarentolae    DLCFAVLLCFVMFYSIIGLILFLIVVVLYFMCKL-FFVWFCFVFLL------FWSIV--- 
ND3ORF L. amazonensis   DLCFAILLCFVLFYSNFGLIIFVLVVLLYFMCKL-FFVWFLLLFFM------LWLMYLI- 
                        ****.  * **:: * : :::*::*::*:::*   :*:** :...       :*       
 
ND3ORF T. brucei        GVILFCLWCFLLYYTYYINKYIK---------- 
ND3ORF T. vivax         --------------------------------- 
ND3ORF L. tarentolae    FNIWFCV---LFVFIFFLLILINSVFSYLILK! 
ND3ORF L. amazonensis   FDSVYCL---FLFFFY!---------------- 
 
ND3 ARF Alignment 
ND3ARF T. brucei        -SKNPRLFTLVCYHYFYICF-W-YCGLLFYL!VFFYVFYVFYCIFLIVFVVVCGFRVVCM 
ND3ARF T. vivax         -----------CYQFFCIWF-C-VCGLLFYC!VFFYVFYVFCYIFLIVFVVVYDFHVVCM 
ND3ARF L. tarentolae    HSKNSSI-Y-----------------NLYRSIGISLGGGKKCCIFLIVFVVVYDFRVVCM 
ND3ARF L. amazonensis   TQKNSRFKFKICFFVHFILILC!FFVCWF!VLVYYFIFLCFCCIFLIVFVVVYDFHVVCM 
                                                    :              ********* .*:**** 
 
ND3ARF T. brucei        IWIHVWCFIHWIYVLLVVCFLYC!IPLFVFCCLFLW!CCFIFVMVFCFCGFCFLLYV!DL 
ND3ARF T. vivax         IWIHVWCFIH!IYVLLVVCFLCYWILLFVFCYLYLCWYCFIFVMVFYFCGFCFLCCY-GL 
ND3ARF L. tarentolae    TWIHDLFMFL!IYVSQFYYVLLCFIPLLV!FYF!!LWCCILCVNC-FLCGFVLYFCYFGV 
ND3ARF L. amazonensis   TWIHVYVIFL!IYVSQFYCVLYCFIPILVWLFLY!WCCYILCVNY-FLCDFYCYFLCYGW 
                         ***   ::  ***  .  .*   * ::*   :      :: *    :*.*       .  
 
ND3ARF T. brucei        CGIFGITYI--CVV!FYFVYDVFCCIIHIILINI!--- 
ND3ARF T. vivax         CGIFEIMYI----------------------------- 
ND3ARF L. tarentolae    L--YLIFDSVYCLFLFFF-Y!F!!IVYSLI!Y!NNK!I 
ND3ARF L. amazonensis   CI!YLILCTV-CFC-FFF-INLINSVK--LSLD!NN-- 
                           : *                                 
 
 
 
 
 

 


D. ND7 5' Editing Domain 
ND7 5' Editing Domain ORF Alignment 
ND7ORF T. brucei          -----------MLFLVVFLHLYRFTFGPQHPAAHGVLCCLLYFCGEFIVYIDCIIGYLHR 
ND7ORF T. vivax           ----------MFIFVVVFLHLYRFTFGPQHPAAHGVLCCLLYFCGEFIVYIDCIIGYLHR 
ND7ORF L. tarentolae      ILFSRLHDNYILYLLIVFLHLYRFTFGPQHPAAHGVLCCLLYLSGEFITYIDVIIGYLHR 
ND7ORF P. serpens         -------IIFIFFIFVVFLHLYRFTFGPQHPAAHGVLCCLLYFSGEYITYIDVIIGYLHR 
                                     : :.:**************************:.**:*.*** ******* 
 
ND7ORF T. brucei          GTEKLCE 
ND7ORF T. vivax           GTEKLCE 
ND7ORF L. tarentolae      GTEKLCE 
ND7ORF P. serpens         GTEKLCE 
                          ******* 
 
ND7 5' Editing Domain ARF Alignment 
ND7ARF T. brucei          -IQKNMTTWYSIIVIFGSFFTFVSFYIWSTASRSTWCFMLFIVFLWWIYCLYWLYYRLFA 
ND7ARF T. vivax           ------------VYFCCSFFAFVSFYIWPTASRSTWCFMLFIVFLWWIYCLYWLYYRLFA 
ND7ARF L. tarentolae      LNFI!PTTR!LYFIFINCFFTLV!IYFRTPASSSPWRIMLFIISFWRIYNVYRCNYWVFT 
ND7ARF P. serpens         -------SNNFYFFYFCCFFAFVSFYVWSTASSRTWCVMLFVIFFRWVYNIYWRNYRLLT 
                                      .    .**::* :*.   **   * .***:: :  :* :*   * ::: 
 
ND7ARF T. brucei          SWYRKVMWI! 
ND7ARF T. vivax           SRYRKVMWV! 
ND7ARF L. tarentolae      SRYRKVMWI! 
ND7ARF P. serpens         PWNGKIVWI! 
                              *::*:* 
 

 

 


 
E. ND9 
ND9 ORF Alignment 
ND9ORF T. brucei        MCIFLCLFRLCFCLILFYCLCCRWCFVCFVDCSFLFFYCFVSFFLFYCMFLFFNLWFLFL 
ND9ORF T. vivax         MGLLLCLFRLCFCLILFYCLCCRWCFVCFVDCSFLFFYCFVSFFLFYCMFLFFNLWFLFL 
ND9ORF L. tarentolae    MFLFLIMFRCVFVLLLFFCLCCRWVFLCFVDCSFVFFYLFVCFFLFFVMFLFFNLWFFLL 
ND9ORF L. amazonensis   MFLFLIMFRCVFVLCLFFCLCCRWVFLCFVDCSFVFFYLFVCFFLFFVMFLFFNLWFFLL 
                        * ::* :**  * * **:****** *:*******:*** **.****: *********::* 
 
ND9ORF T. brucei        YCCDLLLIDFCGFCFCRFMLLYILFCLFLCVRLCFVLCCLFVFFGLCFSFSCFCYAFLLL 
ND9ORF T. vivax         YCCDLLLIDFCGFCFCRFMLLYILFCLFLCFRLCFVLCCLFLFFGLCFSFSCFCYAFLLL 
ND9ORF L. tarentolae    YCLDLFCIDFCGFCFVRFILIYVLFCLLLCFRVSFVLICFFLFFGLVFSLFFCSYALCIF 
ND9ORF L. amazonensis   YCLDLFCIDFCGFCFVRFVLLYVLFCLILCFRVSFVLICFFLFFGLVFSLFFCSYALCIF 
                        ** **: ******** **:*:*:****:**.*:.*** *:*:**** **:   .**: :: 
 
ND9ORF T. brucei        ERECFDLFGFYFVGNDILHRLFVDWFFVGFFLLKCYPLFGLFVLLFCVLVEEIVCTFTML 
ND9ORF T. vivax         ERECFDLFGFYFVGNDVLHRLFVDWFFVGFFLLKCYPLFGLFVLLFCVLVNEMICTFTMV 
ND9ORF L. tarentolae    EREVFDLFGFVFCGNDCLHRFYVDWFFVGFFLCKVYPLFGLFMLNFCMLCEDIVVIATSC 
ND9ORF L. amazonensis   ERECFDLFGFVFCGNDCLHRFYVDWFFVGFFLCKVYPLFGLFVLNFCMLVEEIIVYATCC 
                        *** ****** * *** ***::********** * *******:* **:* ::::   *   
 
ND9ORF T. brucei        FLLLHTNFYLHYFI!--------- 
ND9ORF T. vivax         FLLLHTNFYLHFFV!--------- 
ND9ORF L. tarentolae    FVLCFSNFAI!------------- 
ND9ORF L. amazonensis   FVLVFPILHLFNLIYDNADLNIN! 
                        *:* .  : :           
 
ND9 ARF Alignment 
ND9ARF T. brucei          NINLIFFINIILCVYFYVYFVYVFV!FYFIVCVVDGVLFVLLIVVFCFFIVLLVFFCFIV 
ND9ARF T. vivax           ------------WVCCCVYFVCVFVWFCFIVCVVGGVLFVLLIVVFCFFIVLLVFFCFIV 
ND9ARF L. tarentolae      -RYII!YLLSIICFYFWLCFVVCLCCCYFFVCVVDGFFYVLLIVVLFFFICLCVFFYFLW 
ND9ARF L. amazonensis     IINCLY!NLIKLCFYF!LCFVVYLYYVYFFVYVVGEFFYVLLIVVLFFFICLCVFFYFLW 
                                       .   : **  :    *:* **. .::******: *** * *** *:  
 
ND9ARF T. brucei          CFYFLICDFCFYIVVICYWLIFVVFVFVVLCCCIFYFVCFCVFVCVLFCVVCLYFLDCVL 
ND9ARF T. vivax           CFYFLIYGFCFCIVVIYCWLIFVVFVFVVLCYYIFCFVCFCVFVYVLFCVVCFCFLDCVL 
ND9ARF L. tarentolae      CFYFLIYDFFYCIV!IYFV!IFAVFVLFVLFWYMFCFVYYYVFE!VLYWFVFFCFLVWFL 
ND9ARF L. amazonensis     CFCFLIYDFFYCIVWICFV!IFAVFVLFDLFYYMFCFV!FCVFG!VLYWFVFFYFLVWFL 
                          ** *** .* : ** *    **.***:. *   :* ** : **  **: .* : **  .* 
 
ND9ARF T. brucei          VLVVFVMRFCCWNANVLICLVFILLVMIFYIVYLLIGFLLVFFCWSVIHYLVCLYCYFVC 
ND9ARF T. vivax           VLVVFVMRFYC!NVSVLICLVFILLVMMFYTVYLLIDFLLVFFCWSVIHYLVCLYYCFVF 
ND9ARF L. tarentolae      VYFFVVMRYVFLNVKFLICLVLFFVVMIVYIVFMLIDFLLVFFCVKFIHCLVCLCWIFVC 
ND9ARF L. amazonensis     VYFFVVMRYVFLSENVLICLVLFFVVMIVYIVFMLIDFLLVFFYVKFIHCLVYLCWIFVC 
                          * ...***:   . ..*****::::**:.* *::**.******  ..** ** *   **  
 
ND9ARF T. brucei          WWRR!YVRLQCYFCCCIPIFICIILFNITVVILNFSL---- 
ND9ARF T. vivax           !LMRWYVRLQWFFCCCIPIFICIFLF--------------- 
ND9ARF L. tarentolae      YVKILLWLPLVVLCCVFPILQYSFYFIFV---MNLSIKLIY 
ND9ARF L. amazonensis     WLKRLLFMLHVVLC!FFQFCIYLI!FMIM---LT!TLTKF- 
                                      :*  : :    : *                
 
 
 
 
 
 
 
 
 
 
 
 

 


F. RPS12 
RPS12 ORF Alignment 
RPS12ORF T. brucei        ----------MWFLYGCCLRFVLFVLCYYMSPRLPSSGNRRVLYAVFYLYNFVWMLRCFF 
RPS12ORF T. vivax         ----------MWFLYGCCLRFVLFVLCYYMSPRLPSSGNRRVLYAVFYLYNFVWLLRCFF 
RPS12ORF L. tarentolae    --------MRVLFLYGLCVRFLYFCLVLYLSPRLPSSGNRRCLYAICYMFNILWFFCVF- 
RPS12ORF L. amazonensis   TFLNLIYFVRVLYLYGLCVRFLFLCLVLYLSPRLPSSGNRRCLYAISIMFNILWYFLVF- 
RPS12ORF P. serpens       -----MFFVRSYCLYGFCVRFCFVFLCIYVSPRLPSSGNRRVYVVCFNLYSFVIYCFLFG 
RPS12ORF Perkinsela       ------------MLFGFLVRYGFIEFFFFVSPRLPSSGNRFCYELDMRFFFVCYDFVLLG 
                                       *:*  :*:  . :  ::**********        :: .      :  
 
RPS12ORF T. brucei        CC-FIGLVMSLFIIEGGGF---VDLP-GVKYYTRIVS!-- 
RPS12ORF T. vivax         CCVFFGLHLSLFIIEGGGF---VDLP-GIKYYTRMFIN!- 
RPS12ORF L. tarentolae    CCVCF-LNHLLFIVEGGGF---IDLP-GVKYFSRFFLNA! 
RPS12ORF L. amazonensis   CCFVF-VIFQLFIVEGGGF---IDLP-GVKYFSRFCNVS! 
RPS12ORF P. serpens       CCVICYSQSFYFLCEGGGF---VDLP-CIKLYVRVPIA!- 
RPS12ORF Perkinsela       FSV---LLSSLVFYEGFGFWLFMDVPFGLYYFSRG!---- 
                           .         .: ** **   :*:*  :  : *       
 
RPS12 ARF Alignment 
RPS12ARF T. brucei        NTLLITN!SK--------YILFFLRMWFCMVVVYVLFYLFYVIIWVRDCPVPVTDVYCMP 
RPS12ARF T. vivax         -------------------------CDFCMVVVCVLFCLFYVIIWVPDCPVPVIDVCCMP 
RPS12ARF L. tarentolae    ---NTYRPI!--------IIFILCVYYFCMVYVFVFYIFVWFYI!VHDYLVPVIDVVYMQ 
RPS12ARF L. amazonensis   HK-YLFRPF!--------I!FILFVFYICMVYVFVFYFYVWFYI!VHDYQAPVIDVVYMQ 
RPS12ARF P. serpens       -----LKPIF!LS!YFYLYLCFLFVVIVYMVFVYVFVLYFYVYMLVPVYPVQVIVVFMLF 
RPS12ARF Perkinsela       ---------------------------CCLVFWFVMVL!SFFFLLALVCPVLVIGFVMSW 
                                                       :*   *:    :. : .    . *  .     
 
RPS12ARF T. brucei        YF--IYIILFGCCVVFFVVL-LV!LCHYLL!RVVVLLIYPV!SIIHVL!VRFRYKICF-- 
RPS12ARF T. vivax         CF--ICIILCGCCVVFFVVCFLVCICRYLL!RV-VLLIYPV!SIIHVCLLI--------- 
RPS12ARF L. tarentolae    YV--ICLIFYDFFVFFVV-FVFWIIC-CL!LKVVVLLICQE!SIFHVFFWMRKQ!VIIKI 
RPS12ARF L. amazonensis   LV--LCLIFYDIFWFFAV-LFLWFFS-CL!LKVVVLLICQE!SIFRVFVMCRKFNYLYFY 
RPS12ARF P. serpens       VL--ICIVLLFIVFYLVVVLFVILRVFIFYVRVVVLLIYHV!SYMSVCQ!PK!IIAS--- 
RPS12ARF Perkinsela       IWGFFLFVMILCCWVFPFCCQVWFFMKVLV---------------FGCLWMYRLDCIIFP 
                              : :::      : .   .      :                                
 
RPS12ARF T. brucei        ---- 
RPS12ARF T. vivax         ---- 
RPS12ARF L. tarentolae    ILFR 
RPS12ARF L. amazonensis   KN-- 
RPS12ARF P. serpens       ---- 
RPS12ARF Perkinsela       VV-- 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 


G. ND8 (Nondual-coding) 
ND8 ORF Alignment 
ND8ORF T. brucei        MFFFDFLFFFFVCFYMCFVCCVTICLPIELTIVSLLVRGNHFLRFYWCGLERCIACRLCD 
ND8ORF T. vivax         MFFFDFLFFFFVCFYMCFVCCVTICLPIELTFCSLLTRGNHFLRFYWCGLERCIACRLCD 
ND8ORF L. tarentolae    MFVYDFCFSFFVCFYMCFLCCVTLVLPLELTIVSICVRGNHFLRFYWCGLERCIACRLCD 
ND8ORF L. amazonensis   MFCYDFVFSFFVCFYMCFLCCVTLILPCEITIVSICARGHHFLRFYWCGLERCIACRLCD 
ND8ORF P. serpens       MFFFDFFFCFFCFVYMCFCCCVTIVVPCEVSLCSFLVRGTHFLRFYWCGLERCIACRMCD 
                        ** :** * **  .**** ****: :* *::: *: .** *****************:** 
 
ND8ORF T. brucei        LICPSLALDVRVGWSFGGHRFADWFTLSYRRCIYCGFCMHVCPTDAITHSLFVMCFCCLA 
ND8ORF T. vivax         LICPSLALDVRVGWSFGGHRFADWFTLSYRRCIYCGFCMHVCPTDAITHSLFVMCFCCLA 
ND8ORF L. tarentolae    FICPSLALDVRCVRSLCGYRFSDVFNISYRRCIYCGFCMHVCPTDAITHSCFLLFCCCIA 
ND8ORF L. amazonensis   FICPSLAIDVRCIRSLCGYRYSDLFYISYRRCIYCGFCMHVCPTDAITHSCFLLFCCCIA 
ND8ORF P. serpens       YICPSVAIDVRCGVSLIGHRFAHLFFISYRRCIYCGFCMHVCPTDAITHSFVVLFSVLLS 
                         ****:*:***   *: *:*::. * :*********************** .::    :: 
 
ND8ORF T. brucei        MYLLAPKFLLFGCCFMLFDFYLCFV! 
ND8ORF T. vivax         MYLLAPKFLLFGCCFMLFDFYLCFV! 
ND8ORF L. tarentolae    MYLCAPKFVLFGCCFMLFDFYLCFV! 
ND8ORF L. amazonensis   MYLCAPKFVLFGCCFMLFDFYLCFV! 
ND8ORF P. serpens       SYLVAPKFILFGCCFMVFDLFLCFC! 
                         ** ****:*******:**::***  
 
ND8 ARF Alignment 
ND8RF2 T. brucei        NLIILSFGWLLFFYFFIFVCFFLIFCFFFLFVFICVLFVVLLFVYPLS!PLLVYWFVVTI 
ND8RF2 T. vivax         -------------------CFFLIFCFFFLFVFICVLFVVLPFVYPLN!HFVVCWPVVTI 
ND8RF2 L. tarentolae    ---------KHI!CIRLKECLFMIFVFLFLFVFICVFYVVLLWFYHWSWPLLVFVFVVTI 
ND8RF2 L. amazonensis   ---------NIIRSILIKICFVMILFFLFLFVFICVFYVVLLWFYHVRLPLLVFVLVVII 
ND8RF2 P. serpens       ----------!I!CVNVIKCFFLIFFFVFFVLFICVFVVVLPLLFHVKYHCVVFWFVVLI 
                                           *:.:*: *.*:.:****: ***  .:      :*   ** * 
 
ND8RF2 T. brucei        FCVFIGVV!SVVLLVVYVI!FALV!HWMFVLGGVLVVIVLRIDLHWVIVVVFIVVFVCMF 
ND8RF2 T. vivax         FCVFIDVV!NVVLLVVYVIWFVPV!H!MFVLGGVLVVIVLQIDLHWVTDVVFIVVFVCMF 
ND8RF2 L. tarentolae    FCVFIDVV!NVVLPAVYVILYAQV!L!MFVVLEVYVVIGFPMCLILVIVVVFIVVFVCMF 
ND8RF2 L. amazonensis   FCVFIDVV!NGVLPAVYVILYALVWPLMFVVLEVYVVIVIPIYFILVIVVVFIVVFVCMF 
ND8RF2 P. serpens       FYVFIGVV!NVVLLVVCVIIFVLVLP!MFVVVLVWLVIVLHICFLLVIDVVFIVVFVCMF 
                        * ***.***. ** .* ** :. *   ***:  * :** : : :  *  *********** 
 
ND8RF2 T. brucei        ARQMPLRIHCLLCVFVV!PCIYWRPSFYCLVVVLCYLIFICVLCSYLFWVI------YCV 
ND8RF2 T. vivax         VQQMPLHIHCLLCVFVV!PCIYWRPSFCYLVVVLCCLIFICVLC---------------- 
ND8RF2 L. tarentolae    VQQTPLRIHVFCYFVVVLPCIYAHLNLFYLVVVLCYLIFICVLFSLVYLEKY-IWLII!! 
ND8RF2 L. amazonensis   VPPMLLRIHVFCYFVVVLPCIYAHLNLFYLVVVLCCLIFICVLFNL!FEYFFYIFYVVCS 
ND8RF2 P. serpens       VQPMQLPIHLLFYLVCCYPVIWLHPNLFCLVVVLWFLIYFCVFVSCIV!FI!----LFVV 
                        .    * ** :  ..   * *: : .:  *****  **::**:                  
 
ND8RF2 T. brucei        YDLKKFTVKLNFD! 
ND8RF2 T. vivax         -------------- 
ND8RF2 L. tarentolae    INF----------- 
ND8RF2 L. amazonensis   KNLVNV-------- 
ND8RF2 P. serpens       !!!!K!RTK!Y!-- 

 

 

 


APPENDIX I. RPS12 gRNA Alignments for TREU 667 SDM79 (A) and EATRO 164 SDM79 cells (B), and all editing variants (C and D). 

Amino acid translations are shown above the mRNA sequences. The cDNA sequence of the most abundant gRNA in its
sequence class is shown aligned beneath the fully edited mRNA. Lowercase u’s indicate uridines added by editing; asterisks indicate
encoded uridines deleted during editing. Nucleotides and deletion sites in the fully edited mRNA were numbered starting from the 5’
end (+1=0). gRNAs are colored based on transcript abundance as follows: Blue<100; Green<1,000; Purple<10,000; Orange<100,000;
Red>100,000; Black=not quantified. Watson-Crick (|) and G:U base pairs (:) are indicated. Mismatches are indicated by an
octothorpe (#). Highlighted sequences represent regions where multiple CU configurations are possible.
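
For readers who want to check the alignments programmatically, the notation in this legend can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used to produce the alignments: every function name below is hypothetical, the gRNA is assumed to be written as cDNA (so T stands for a gRNA uridine), and a count of exactly 100,000 reads, which the legend leaves undefined, is assumed here to fall in the red bin.

```python
# Hypothetical helpers for reading the Appendix I notation (names are illustrative).

def pre_edited(edited: str) -> str:
    """Unedited transcript: drop inserted u's and restore deleted U's (asterisks)."""
    return "".join("U" if ch == "*" else ch for ch in edited if ch != "u")

def fully_edited(edited: str) -> str:
    """Mature mRNA: inserted u's are written as U and deletion sites are removed."""
    return edited.replace("*", "").replace("u", "U")

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def pair_symbol(mrna_base: str, grna_cdna_base: str) -> str:
    """Symbol for one alignment column; deletion sites and gaps give a blank."""
    if mrna_base == "*" or grna_cdna_base == "-":
        return " "
    m = mrna_base.upper()
    g = grna_cdna_base.upper().replace("T", "U")  # assumption: cDNA T = gRNA U
    if (m, g) in WATSON_CRICK:
        return "|"
    if (m, g) in WOBBLE:
        return ":"
    return "#"

def pairing_line(mrna: str, grna_cdna: str) -> str:
    """Pairing symbols for two already-aligned strings of equal length."""
    return "".join(pair_symbol(m, g) for m, g in zip(mrna, grna_cdna))

def abundance_color(reads):
    """Color bin from the legend; None means the gRNA was not quantified."""
    if reads is None:
        return "black"
    if reads < 100:
        return "blue"
    if reads < 1_000:
        return "green"
    if reads < 10_000:
        return "purple"
    if reads < 100_000:
        return "orange"
    return "red"  # assumption: exactly 100,000 is treated as red

# Start of the gJv2 alignment in panel A
mrna = "G*AAuAAAuuuuG"
grna = "T-TTATTTAGAGT"
print(pairing_line(mrna, grna))
print(pre_edited(mrna), fully_edited(mrna))
```

Applied to the first columns of the gJv2 alignment in panel A, the sketch gives the same pairing symbols as the figure, which is a quick way to spot transcription errors when copying alignments out of the appendix.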
 

A.  TREU SDM79 gRNAs - Jv2, I, H, G, F, E, D, C, B1, A 

                                                    M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                                  M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAAAuuuuGuuuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                           : |||||||:|::::||:|:|||:|||||:|:||:| |||:| ||||||                   |:::| 
                          11T-TTATTTAGAGTGGAAGAGACGTATACATTGAAGA-CATGC-CAACAAATA gJv2    gH 13TATAGTGA 
                                                             || :||:: ||:::|:|:||| ||||::||:||:||||| 
                                                        gI 14TAA-TATGT-CAGTGATAGATG-CAAAGTAAGATGAACAA 
 
L  C  Y  Y  M  S  P    R  L  P  S  S  G  N  R  R  V  L  Y  A        V  F  Y  L  Y  N  F  V  W  M  L  
 Y  V  I  I  W  V  R    D  C  P  V  P  V  T  D  V  Y  C  M  P        Y  F  I  Y  I  I  L  F  G  C  C 
uuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGuuCCGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuu 
||||:||||                                           |||:|||:|  |    :||||:||||:|:||||||:|||||||||||| 
AATATAATATA gI                             gF 18TCAATTAATATATG--G----TATAAGATAAGTGTATTAAGACAAACCTACAAAATATA 
|||::|:|:|||||||:||#:  |:|||||#||||||                               :|||:|:|:|||||:|||:||::|:||||:|| 
AATGTAGTGATATACTTAGAT--GTTAACGTGTCAAGATATA gH                 gE 12TATATAGAGTGAATATGTTAGAATGAGCCTATAA 
                   |:  |:||::#|#||:||||:|||||:|||||||||||| 
         gG 13TATTCAGT--GTTAGTAGATTGAGGCTATTGGTTGCACATAACATTCATA gG 
                                  |||||:|#|||:|#||:||||||||||  |    :| 
                  12TAAATTTAGTGACCGAAGGCTAGTGGTT-CATATAACATACG--G----TAATATA gFp 
 
 R  C  F  F  C  C  F  I  G  L  V  M    S  L  F  I  I  E     G  G  G  F  V  D  L  P     G      V   K  
  V  V  F  F  V  V  L  L  V  STOP 
GCGuuGuuuuuuuuGuuGuuuuAuuGGuuuAGuuAuG**UCAuuAuuuAuuAuAGA***GGGuGGuGGuuuuGuuGAuuuACCC***G****GuG*UAAA 
||||||||                          ||:  |||:|||:||:||:|||   ||:::#::|||||||||||||| 
CGCAACAATAAATA gE             14TATTAT--AGTGATAGATGATGTCT---CCTGTAGTCAAAACAACTAAATATA gC 
:#:||:|||:|:|::|:|||:|||:::||:|||||||  |||||||||||                ::|||:::||||:||||#   :    ||: |||| 
 11TATAATAAAGAGAGTAGCAAGATAGTTAAGTCAATAC--AGTAATAAATA gD        gB1 11TTAAAGTGACTAGATGGA---T----CAT-ATTT 

 
 Y  Y  T   R    I  V  S  STOP 
GuAuuAuACA*CG**UAuuGuAAGuuAGA*UUUAGAuAUAAGAUAUGUUUUU 
:|||:||||| ||  ||||||| 
TATAGTATGT-GC--ATAACATATA gB1 
        || |:  ||||::|||||||| ||:|||||||||||||||| 
    16TAAGT-GT--ATAATGTTCAATCT-AAGTCTATATTCTATACAATATAAA gA 

 


B.  EATRO SDM79 gRNAs - Jv2, I, H, G, F, E, D, C, B1, A 

                                                    M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                                  M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAAAuuuuGuuuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                          |: |||||||:|::::||:|:|||:|||||:|:||:| |||:| ||||||                   |:::| 
                         10TT-TTATTTAGAGTGGAAGAGACGTATACATTGAAGA-CATGC-CAACAAATA gJv2    gH 13TATAGTGA 
                                                             |: :||:: :||::|:|:||| |:|:|||:||||||||| 
                                                      gI 12TATAG-TATGT-TAATGATAGATG-TGAGACAGAATAAACAA 
 
 
L  C  Y  Y  M  S  P    R  L  P  S  S  G  N  R  R  V  L  Y  A        V  F  Y  L  Y  N  F  V  W  M  L  
 Y  V  I  I  W  V  R    D  C  P  V  P  V  T  D  V  Y  C  M  P        Y  F  I  Y  I  I  L  F  G  C  C 
uuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGuuCCGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuu 
                                                    |||:|||                                 |||||||| 
                                                   12TAATGTAGTGAGTTAAATGGGATTAGATTGAT--------ACCTACAAGCATATA gF
                                  |||||:|#|||||#||:||||||||||  |    :| 
                  14TAAATTTAGTGACCGAAGGCTAGTGGCT-CATATAACATACG--G----TAATATA gFp 
||||:|||| 
AATATAATATA gI 
|||::|:|:|||||||:||#:  |:|||||#||||||                    ||:|  |    :|||||:||:||:|||||:|::||:|:||||| 
AATGTAGTGATATACTTAGAT--GTTAACGTGTCAAGATATA gH     gE 05TATTATG--G----TATAAAGTAGATGTATTAGAGTAAGCTTACAA 
                   |:  |:||::#|#||:||||:|||||:||||||||||||    gE 12TATATAGAGTGAATATGTTAGAATGAGCCTATAA 
            13TATTCAGT--GTTAGTAGATTGAGGCTATTGGTTGCACATAACATTCATA gG 
                         |||:|#||:|:|!:|||||||||||:||:|:|||||  | 
                        14TAATGTGTTAGGATCATTGGCTGCATATGATATACG--GAAATATA gGe 
 
 
 R  C  F  F  C  C  F  I  G  L  V  M    S  L  F  I  I  E     G  G  G  F  V  D  L  P     G      V   K  
  V  V  F  F  V  V  L  L  V  STOP 
GCGuuGuuuuuuuuGuuGuuuuAuuGGuuuAGuuAuG**UCAuuAuuuAuuAuAGA***GGGuGGuGGuuuuGuuGAuuuACCC***G****GuG*UAAA 
|||||:|                          |||:  ||||:|:|||:||:|||   :::|||::|||||||||||:| 
CGCAATATATA gE                 14TATAT--AGTAGTGAATGATGTCT---TTTACCGTCAAAACAACTAGAATATA gC 
CGCAACAATAAATA gE                                                 ::|||:::||||:||||#   :    ||: |||| 
:#:||:|:|:|:|::|||::|:|||:|:|:|||||:|  |||||||               gB1 11TTAAAGTGACTAGATGGA---T----CAT-ATTT 
 14TATAATAGAGAGAGTAACGGAGTAATCGAGTCAATGC--AGTAATATATATA gD 
 
 
 
 Y  Y  T   R    I  V  S  STOP 
GuAuuAuACA*CG**UAuuGuAAGuuAGA*UUUAGAuAUAAGAUAUGUUUUU 
:|||:||||| ||  ||||||| 
TATAGTATGT-GC--ATAACATATA gB1 
        || |:  ||||::|||||||| ||:|||||||||||||||| 
    16TAAGT-GT--ATAATGTTCAATCT-AAGTCTATATTCTATACAATATAAA gA 
 

 

 


C.  TREU SDM79 Variants 

J Variants 
                                                    M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                                  M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAAAuuuuGuuuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                           : |||||||:|::::||:|:|||:|||||:|:||:| |||:| |||||| 
                          11T-TTATTTAGAGTGGAAGAGACGTATACATTGAAGA-CATGC-CAACAAATA gJv2 
 
                                                 M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                               M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAuAuuAGuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                           : |||||||:|:|:||:|:||||:::|||:||: :|||| |||||||| 
                          11T-TTATATAGTTAGAAGATGCATGTGCTAGAAG-TATAC-CAACAACATATA gJv3 
 
                                                      M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                                    M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAuAuAuuuuGuuuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                                ||||||:|::::||:|:|||:|||||:|:||:| |||:| |||||| 
                               12TATATAGAGTGGAAGAGACGTATACATTGAAGA-CATGC-CAACAAATA gJv1 
 
                                                     M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                                   M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAuAuuAuuGuuuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                          |: ||||:|:|||:::||:|:|::|:|||||:||||:| ||||| ||| 
                         12TT-TTATGTGATAGTGAAGAGAGTGTATACATTAAAGA-CATAC-CAAATATATA gJv4 
 
D Variants 
D 
GC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuuGCGuuGuuuuuuuuGuuGuuuuAuuGGuuuAGuuAuG**UCAuuAuuuAuu 
                                         :#:||:|||:|:|::|:|||:|||:::||:|||||||  ||||||||||| 
                                        11TATAATAAAGAGAGTAGCAAGATAGTTAAGTCAATAC--AGTAATAAATA gD 
         :|||:|:|:|||||:|||:||::|:||||:|||||||||| 
      12TATATAGAGTGAATATGTTAGAATGAGCCTATAACGCAACAATAAATA gE 
 
Dx 
GCUUCUUUUGAAUAAAAuuuGGGuuAuuGGuuuuCGGuuGuuGAGuGuAuuGuAuG**UCAuuAuuuAuu 
             |||||||:::|:|:|||:||:||:|:::::|||:|||||||||  :|||||| 
            09TTTTAAATTTAGTGACCGAAGGCTAGTGGTTCATATAACATAC--GGTAATATA gFp 
 
 

 

 


BC Variants 
B1, C1 
V  M    S  L  F  I  I  E     G  G  G  F  V  D  L  P     G      V   K  Y  Y  T   R    I  V  S  STOP 
GuuAuG**UCAuuAuuuAuuAuAGA***GGGuGGuGGuuuuGuuGAuuuACCC***G****GuG*UAAAGuAuuAuACA*CG**UAuuGuAAGuuAGA*UUUAGAu 
   ||:  |||:|||:||:||:|||   ||:::#::||||||||||||||                           || |:  ||||::|||||||| ||:|||| 
TATTAT--AGTGATAGATGATGTCT---CCTGTAGTCAAAACAACTAAATATA gC              gA 16TAAGT-GT--ATAATGTTCAATCT-AAGTCTA 
                                   ::|||:::||||:||||#   :    ||: ||||:|||:||||| ||  ||||||| 
                                  11TTAAAGTGACTAGATGGA---T----CAT-ATTTTATAGTATGT-GC--ATAACATATA gB1 
 
B1*, C1 
V  M    S  L  F  I  I  E     G  G  G  F  V  D  L  P     G       Y  K  Y  Y  T   R    I  V  S  STOP 
GuuAuG**UCAuuAuuuAuuAuAGA***GGGuGGuGGuuuuGuuGAuuuACCC***G****GG*UAuAAGuAuuAuACA*CG**UAuuGuAAGuuAGA*UUUAGAu 
   ||:  |||:|||:||:||:|||   ||:::#::||||||||||||||                           || |:  ||||::|||||||| ||:|||| 
TATTAT--AGTGATAGATGATGTCT---CCTGTAGTCAAAACAACTAAATATA gC              gA 16TAAGT-GT--ATAATGTTCAATCT-AAGTCTA 
                                   ::|||:::||||:||||#   :    #| |||||:|||:||||| ||  ||||||| 
                                  13TTAAAGTGACTAGATGGA---T-----C-ATATTTATAGTATGT-GC--ATAACATATA gB1* 
 
B3t, Ct 
AAGA***GuGuGuGuuGuuuuGuGuuuGGuuuuAuACCC***G**UUGuuuuG*UAuAuuuuuAuGAuuAuuuAuuCA*CG**UAuuGuAAGuuAGA***UAGAu 
       :|:|:|::::|||::::|||::|:||||||||   |  |||||||| ||||| 
      18TATATATGGTAAAGTGTAAATTAGAATATGGG---C--AACAAAAC-ATATATATA gCt 
                                          :  ||:|:|:: |||||:::|||:||||||:||||| ||  |||||| 
                                         12T--AATAGAGT-ATATAGGGATATTAATAAGTAAGT-GC--ATAACAATATA gB3t 
                                                                             || |:  ||||::|||||||| ||:|||| 
                                                                      gA 16TAAGT-GT--ATAATGTTCAATCT-AAGTCTA 
 
B4t, Ct 
AAGAUUUGuGuGuGuuGuuuuGuGuuuGGuuuuAuACCC***G**UUGuuuuG*UAuuuuuAuAuuuGuAuAuAuuuuCA*CG**UAuuGuAAGuuAGA***UAGAu 
       :|:|:|::::|||::::|||::|:||||||||   |  |||||||| ||||| 
      18TATATATGGTAAAGTGTAAATTAGAATATGGG---C--AACAAAAC-ATATATATA gCt 
                                          :  :|||:|:: |||:|:||:||:|:||||||||:||| ||  ||||||| 
                                         14T--GACAGAGT-ATAGAGATGTAGATATATATAAGAGT-GC--ATAACATATA gB4t 
                                                                              || |:  ||||::|||||||| ||:|||| 
                                                                       gA 16TAAGT-GT--ATAATGTTCAATCT-AAGTCTA 
 

 

 


 

D.  EATRO SDM79 Variants 

J Variants 
                                                    M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                                  M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAAAuuuuGuuuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                          |: |||||||:|::::||:|:|||:|||||:|:||:| |||:| ||||||         
                         10TT-TTATTTAGAGTGGAAGAGACGTATACATTGAAGA-CATGC-CAACAAATA gJv2 
 
                                                 M  W  F  L   Y  G   C  C  L  R   F  V  L  F  V   
                                                               M   V  V  V  Y   V  L  F  Y  L  F  
CUAAUACACUUUUGAUAACAAACUAAAG*AAuAuAuuAGuuuuuuGCGuAuGuGAUUUUU*GuAuG*GuuGuuGuuuAC*GuuuuGuuuuAuuuGuu 
                          |: |||||||:|:|:||:|:||||::||||:||: :|||| |||||||| 
                         11TT-TTATATAGTTAGAAGATGCATGTACTAGAAG-TATAC-CAACAACATATA gJv3 
 
G Variants 
Multiple G sequences from gG 
L  C  Y  Y  M  S  P    R  L  P  S  S  G  N  R  R  V  L  Y  A        V  F  Y  L  Y  N  F  V  W  M  L  
 Y  V  I  I  W  V  R    D  C  P  V  P  V  T  D  V  Y  C  M  P        Y  F  I  Y  I  I  L  F  G  C  C 
uuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGuuCCGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuu 
L  C  Y  Y  M  S  P    R  L  P  S  F  G  N  R  R  V  L  Y  A        V  F  Y  L  Y  N  F  V  W  M  L  
 Y  V  I  I  W  V  R    D  C  P  A  S  V  T  D  V  Y  C  M  P        Y  F  I  Y  I  I  L  F  G  C  C 
uuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGCuuCGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuu 
L  C  Y  Y  M  S  P    R  L  P  S  S  G  N  R  R  V  L  Y  A        V  F  Y  L  Y  N  F  V  W  M  L  
 Y  V  I  I  W  V  R    D  C  P  A  L  V  T  D  V  Y  C  M  P        Y  F  I  Y  I  I  L  F  G  C  C 
uuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGCuCuGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuu 
                   |:  |:||::#|#||:||||:|||||:|||||||||||| 
        gG1 13TATTCAGT--GTTAGTAGATTGAGGCTATTGGTTGCACATAACATTCATA gG 
|||::|:|:|||||||:||#:  |:|||||#|||          
AATGTAGTGATATACTTAGAT--GTTAACGTGTCAAGATATA gH 
 
Ge 
L  C  Y  Y  M  S  P    R  L  P  S  P  G  N  R  R  V  L  Y  A        V  F  Y  L  Y  N  F  V  W  M  L  
 Y  V  I  I  W  V  R    D  C  P  V  L  V  T  D  V  Y  C  M  P        Y  F  I  Y  I  I  L  F  G  C  C 
uuAuGuuAuuAuAuGAGuCCG**CGAuuGCCCAGuCCuGGuAACCGACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuu 
                         |||:|#||:||||:|||||||||||:||:|:|||||  | 
                        14TAATGTGTTAGGATCATTGGCTGCATATGATATACG--GAAATATA gGe 
|||::|:|:|||||||:||#:  |:|||||#||||||                    ||:|  |    :|||||:||:||:|||||:|::||:|:||||| 
AATGTAGTGATATACTTAGAT--GTTAACGTGTCAAGATATA gH     gE 05TATTATG--G----TATAAAGTAGATGTATTAGAGTAAGCTTACAA 

 


DEF Variants 
D, E, F 
GACGuGuAuuGuAuGC**C****GuAuuuuAuuUAuAuAAuuuuGuuuGGAuGuuGCGuuGuuuuuuuuGuuGuuuuAuuGGuuuAGuuAuG**UCAuuAuuuAuu 
|##||:||||||||||  |    :|                              :#:||:|:|:|:|::|||::|:|||:|:|:|||||:|  ||||||| 
C-TCATATAACATACG--G----TAATA gFp                    gD 14TATAATAGAGAGAGTAACGGAGTAATCGAGTCAATGC--AGTAATATATAT 
            ||:|  |    :|||||:||:||:|||||:|::||:|:||||||||||:| 
     gE 05TATTATG--G----TATAAAGTAGATGTATTAGAGTAAGCTTACAACGCAATATATA gE 
                    12TATATAGAGTGAATATGTTAGAATGAGCCTATAACGCAACAATAAATA gE 
 
Dx 
AGCCGGAACCGACGGAGAGCUUCUUUUGAAUAAAAuuuGGGuuAuuGGuuuuCGGuuGuuGAGuGuAuuGuAuG**UCAuuAuuuAuu 
                                  ||||:::|:|:|||:||:||:|::::||||:|||||||||  :|||||| 
                                 14TAAATTTAGTGACCGAAGGCTAGTGGCTCATATAACATAC--GGTAATATA gFp 
 
Ee 
CCGGAACCGACGGAGAuGuC**C****GuAuA*AuuuuAAAuuuGGGuuAuuGGuuuCGuuGuuuuuuuuGuuGuuuuAuuGGuuuAGuuAuG**UCAuuAuuuAuu 
                                                         #:||:|:|:|:|::|||::|:|||:|:|:|||||:|  ||||||| 
                                                    gD 14TATAATAGAGAGAGTAACGGAGTAATCGAGTCAATGC--AGTAATATATAT 
                      |    :|||| |:||:||||:||:|||||:#||:|||||||| 
               12TATAGTG----TATAT-TGAAGTTTAGATCTAATAGACAGAGCAACAATATATA gEep 
 
D, E Misanchored, Fe 
CGuuACGuuGAGuuAuuGC**C****GuuuuAuuuAUAuAAuuuuAuuuGGGuAGuGCGuuGuuuuuuuuGuuGuuuuAuuGGuuuAGuuAuG**UCAuuAuuuAuu 
                          ||:|:|:|||||:|||:|||:|:||:||#||||||||| 
                     12TATATAGAGTGAATATGTTAGAATGAGCCTATAACGCAACAATAAATA gE 
|:|:||:|:||:||||:||  |    |||#||||||||||              :|:|:|:::||||:|:|||:|:||||::||:||:|||||  ||||||| 
GTAGTGTAGCTTAATAGTG--G----CAACATAAATATATA gFep     gD 12TATGTAGTGAAAAGAGCAATAGAATAGTCAGATTAATAC--AGTAATATATA 

 

 

 

 


 
B Variants 
B1 
L  F  I  I  E     G  G  G  F  V  D  L  P     G      V   K  Y  Y  T   R    I  V  S  STOP 
uuAuuuAuuAuAGA***GGGuGGuGGuuuuGuuGAuuuACCC***G****GuG*UAAAGuAuuAuACA*CG**UAuuGuAAGuuAGA*UUUAGAuAUAAGAUAUGU 
|:|:|||:||:|||   :::|||::|||||||||||:|                            || |:  ||||::|||||||| ||:||||||||||||||| 
AGTGAATGATGTCT---TTTACCGTCAAAACAACTAGAATATA gC             gA 16TAAGT-GT--ATAATGTTCAATCT-AAGTCTATATTCTATACA 
                        ::|||:::||||:||||#   :    ||: ||||:|||:||||| ||  ||||||| 
                       11TTAAAGTGACTAGATGGA---T----CAT-ATTTTATAGTATGT-GC--ATAACATATA gB1 
 
B1’ 
L  F  I  I  E     G  G  G  F  V  D  L  P     G       Y  K  Y  Y  T   R    I  V  S  STOP 
uuAuuuAuuAuAGA***GGGuGGuGGuuuuGuuGAuuuACCC***G****GG*UAuAAGuAuuAuACA*CG**UAuuGuAAGuuAGA*UUUAGAuAUAAGAUAUGU 
|:|:|||:||:|||   :::|||::|||||||||||:|                            || |:  ||||::|||||||| ||:||||||||||||||| 
AGTGAATGATGTCT---TTTACCGTCAAAACAACTAGAATATA gC             gA 16TAAGT-GT--ATAATGTTCAATCT-AAGTCTATATTCTATACA 
                        ::|||:::||||:||||#   :    #| |||||:|||:||||| ||  ||||||| 
                       13TTAAAGTGACTAGATGGA---T-----C-ATATTTATAGTATGT-GC--ATAACATATA gB1* 
 
B2FSe 
L  F  I  I  E     G  G  G  F  V  D  L  P     G       Y  K  I  L  F  T     Y  C  K  L  D   L  D  I  R  Y  V  F
uuAuuuAuuAuAGA***GGGuGGuGGuuuuGuuGAuuuACCC***G****GG*UAuAAGAuAuuAuuCA*CG**UAuuGuAAGuuAGA*UUUAGAuAUAAGAUAUGU 
|:|:|||:||:|||   :::|||::|||||||||||:|                          ||||| |:  ||||::|||||||| ||:||||||||||||||| 
AGTGAATGATGTCT---TTTACCGTCAAAACAACTAGAATATA gC              gA 16TAAGT-GT--ATAATGTTCAATCT-AAGTCTATATTCTATACA 
                        ::|||:|:||||:||||#   :    #| |||||:||||:||||| ||  ||||||| 
                       08TTAAAGTGACTAGATGGA---T-----C-ATATTTTATAGTAAGT-GC--ATAACATATA gB2FSe 
 

 

 

 

 


APPENDIX J.  gRNAs identified to edit the RPS12 mRNAs found in both TREU 667 and EATRO 164 gRNA transcriptomes.

 

 

Population 
J variant 1 
J variant 2 
J variant 2 
J variant 3 
J variant 3 
J variant 3 
J variant 3 
J variant 3 
J variant 3 
J variant 4 
J variant 4 
J variant 4 
I 
I 
I 
I 
I 
I 
H 
H 
H 
H 
H 
H 
H 
H 
H 
H 
G 
Ge 
Ge 
Fp 
Fp 
Fp 
Fp 
Fp 
Fp 
Fp 
F 
Fep 

 

 

 

Editing Region

J 

I 

H 

G 

F 

 

 

 

 

 

 

Sequence 
   ATA AACAACCGTACAGAAGTTACATATGCAGAGAAGGTGAGATATATT TTTTTTATTT 

   ATA AACAACCGTACAGAAGTTACATATGCAGAGAAGGTGAGATTTATTTTTT TTTTTTT* 

   ATA AACAACCGTACAGAAGTTACATATGCAGAGAAGGTGAGATTTAT ATTTTTTTTTTT 

ATAT ACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATAT TTTTTTTTTTTTT 

ATAT ACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATATA TTTTTTTTTTTT 

ATAT ACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATATAATTTTT TTTTTTTT 

ATAT ACAACAACCATATGAAGATCATGTACGTAGAAGATTGATAT TTTTTTTTTT 

 TAT ACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATATAAT ATTTTTT 

  AT ACAACAACCATATGAAGATCATGTACGTAGAAGATTGATATATAATTT ATTTT 

  ATATATA AACCATACAGAAATTACATATGTGAGAGAAGTGATAGTGTATTT TTTTTAATTTTT 

  ATATATA AACCATACAGAAATTACATATGTGAGAGAAGTGATAGTGT TTTTTT 

    ATATA AACCATACAGAAATTACATATGTGAGAGAAGTGATAGTGTAT ATTTTTTTT* 
 
     AT ATAATATAAAACAAATAAGACAGAGTGTAGATAGTAATTGTATGA TATTTTTTTTTTTT 

      ATAT ATATAAAACAAATAGAATAGAACGTAGATGAT TACTGTATAATTTTTTTTTTAT 

A TATATAATAACATAAAACAAATAGAACGAGATGTAAATGATA TCTATTTTTTTATTTTT 

     AT ATAATATAAAACAAGTAGAATGAAACGTAGATAGTGACTGTATAA TTTTTTTTTTTTCT 

        ATAATATAAAACAAATGAAACGAAGCGTAGACAGTAATTATATGA TATTTTTTTTCTTTT 

     AT ATAATATAAAACAAGTAGAATGAAACGTAGATAG GGACTGTATAATTTTT 
 
ATATAGAACTGTGCAATTGTA GATTCATATAGTGATGTAAAGTGA TATTTTTTTTTTTTT* 

ATATAGAATTAGGCAATCA CGGATTTATATAGTAACGTAAAATGA TATTTTTTTTTTTTG 

 ATATATAACTGGGCATCT CGGATTTGTATAGTGATATAAAGTGAATAA TTTTTTTTTTTTT 

ATATATAACTGGACAATCGTA GGCTTGTATGATGATATGAGATGAGTAAA TTTTTTTTTTTT 

ATATATAACTGGACAATCGTA GGCTTGTATGATGACATGAGATGAGTAAA TTTTTTTTTTTCTTT 

ATATATAACTGGACAATCGA TGGGCTTGTATAATGATATGAGATGAGTAAA TTTTTTTTTTTTTG 

ATATATAACTGGACAATCGA TGGGCTTGTATAATGATATGAGATGAGTAA TTTTTTTTTTTT 

 ATATAAACTGTGCAATCGA TGGACTTATGTAGTGATATAAGATGA TAATTTTTTTTTTTT 

 TATATAACTGGACAATCGA TGGGCTTGTATAATGATATGAGATGAG GAAATTTTTTTTTTTT 

  ATATATAACTGGCAATAT CGGACTCATATAGTGATGTGAAGTAAATA TTTTTTTTTT 
 
      ATACT TACAATACACGTTGGTTATCGGAGTT AGATGATTGTGACTTATTTTTTTTTTTTTC 

ATATAAA GGCATATAGTATACGTCGGTTACT AGGATTGTGTAATTTTTTTTTTTTTT 

ATATAAAGC CATATAGTATACGTCGGTTACT AGGATTGTGTAATTTTTTTTTTTT 
 
AT ATAATGGCATACAATATACTCGGTGATCGGAAGCCAGTGATTTAAATTTT TTTTTTTTTT 

AT ATAATGGCATACAATATACTCGGTGATCGGAAGCCAGTGATTT TTTTTTTTTTT 

AT ATAATGGCATACAATATACTCGGTGATCGGAAGCCAGTGATTTAA TTTTTATTTTTTTTTT 

 T ATAATGGCATACAATATACTCGGTGATCGGAAGCCAGTGATTTAAATT AAATTTTTTT 

AT ATAATGGCATACAATATACTCGGTGATCGGAAGCCAGTGATTTA TTTTTTTTGTTTTTTTT* 

AT ATAATGGCATACAATATACTTGGTGATCGGAAGCCAGTGATTTAAATTT GTTTTTTTTTTTT 

AT ATAATGGCATACAATATACTTGGTGATCGGAAGCCAGTGATTTAAATTTTGTTT T 

ATATAA AACATCCAAACAGAATTATGTGAATAGAATATGGTATATAAT TAACTTTTTTTTTTTTTTTTTT 

ATATATAAATAC AACGGTGATAATTCGATGTGATG ATATCTGTAATTTTTTTTTTTTTTAT 
 


Reads TREU 667
1 
3,190 
848 
751 
628 

 
72 

 
28 
8 
6 
2 
 
5,762 

 
503 
6,937 
687 
72 
 
12,736 

 
 
 
 
17,207 
1,140 
828 
505 
169 
 
791 
94,870 
196 
 
2,288 
599 
181 
224 
87 
156 
60 
63 

 
 

Reads EATRO 164

 
1,731 
401 
8,006 
3,974 
2,969 
282 
184 
144 

 
 
 
 
6,201 
2,718 
133 

 
 
 
 
4,786 
2,326 
979 
162 
90 

 
 
 
 
 
 
15,732 
90,998 
142 
 
2,552 
346 
256 
172 
120 

 
 
 
1,823 
 

E 
E 
E 
Eep 
Eep 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 

E 

 

D 

 
 
 

 

ATAAAT AACAACGCAATATCCGAGTAAGATTGTATAAGTGAGATAT ATTTTTTTTTTTT 

   ATAT ATAACGCAACATTCGAATGAGATTATGTAGATGAAATATGGTAT TATTTTT 

    ACAAATAACGCAACA TCAGATGAGATTATATAAGTGAGATATG ATATATTTTTTTTTTT 

ATATATAACAACGAGACA GATAATCTAGATTTGAAGTTATATG TGATATTTTTTTTTTTT 

 

ATACATAACAACGAGACA GATAATCTAGATTTAGAGTTATATG TGATATTTTTT 
 
ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGATAAT ATTTTTTTTTTTTTT 

   ATATATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGATGATGTAAT TTTTTTTTTTT 

   ATATATAATGAC TAACTAAACTGATAAAGCAGTAGAAGAGATGATGTAATATTT TTTTTTTTTTT 

   ATATATATGACATAACTT GGCCAATGAGATAATGAAAGAGATGGTGTAAT TTTTTTTTTTTTCT 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGATA TTTTTTTTTTTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGAT TTTTTTTTTTTTTG 

ATATATATAATGACT TAACTGAGCTAATGAGGCAATGAGAGAGATAAT ATTTTTTTTTTTT 

ATATATATAATGACGTAACTGAT CTAATGAGGCAATGAGAGAGATAAT ATTTTTTTTTTTTT 

 ATAAATTAATGACATAACTT GACTAATAGGACAGTGAAAGAGGCAGTGTAAT TCTTTTTTTTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAAGCAATGAGAGAGATAAT ATTTTTTTTTTTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAGACAATGAGAGAGATAAT ATTTTTTTTTTTG 

 ATATATAT ATGACGTAACTGAGCTAATGAGGCAATGAGAGAGATAAT ATTTTTTTTTT 

ATAT ATAATGACATAATTAGACTGATAAGATAACGAGAAAAGTGATGTATT TTTTTTTTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGGGATAAT ATTTTTTTTTTTT 

ATATAT ATAATGACGTAACTGAACTAATGAGGCAATGAGAGAGATAAT ATTTTTTTTTTTT 

AT ATAAATAATGACATAACTAGGTTAGTAAAGTGACGAAGAAGATAAT ATTATTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAAATAAT ATTTTTTTTTATTTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGGGAGATAAT ATTTTTTTTTTT 

ATATAT ATAATGACGTAACTGAGTTAATGAGGCAATGAGAGAGATAAT ATTTTTTTTATTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAGGTAAT ATTTTTTGTTTTTTT 

ATATAT ATAATGACGTAACTGAGCTAATGAGGCAATGAGAGAAATA TTTTTTTTTGTTTTT 

      ATAAATAATGACATAACTGAATTGATAGAACGATGAGAGAAATAAT ATTTTTTTTTTTC 

    ATAAAT TAATGACATAACTGAATCGATAGAATAATGAGAGAGATAAT ATTTTTTTTTTTG 

    ATAAATA AATGACATAACTGGATTAGTAAAGTGGTGAAAAAGATAAT ATTTTTTTTTTT 

      A AAATAATGACATAACTGAATTGATAGAACGATGAGAGAAATAAT TTTTTTTTTTCC 

     ATATATAATA ACGTAATTGGATCAGTGAGATAACGA TAGAAATGATATATTTTTTTTTTTTTGTTT 

     ATAT ATAATGACATAATTAGACTGATAAGATGACGAGAAAAGTGATGTA TTTTTTTTTTTC 

    ATATATA AATGACATGACTAAACTAATAGGGCAGTGAAGAAGACAATG ATTTTTTTTTTTC 

  ATATATATC AATGACATAACTGAACTAGTAAGATAATGAGAGAAGT TTTTTTTTTTTAT 

 ATAT ATAAATAATGACGTAATTAAACTGGTAAGATGATAGAAAAAGT TTTTTTTTTTTTT 

    ATATATA AATGACATGACTAAACTGATAGGACAGTAAAGAGGACAATG ATTTTTTTTTTGTTT 

ATATATATATC AATGACATAACTGAACTAGTAAGATAATGAGAGAA TTTTTTTTTCTTTT 

     ATAT ATAATGACATAACTGAACT TGTAAGATAGCGAGATTTTTTTTTTTTTT 

  ATATATATC AATGACATAACTGAACTAGTAAGATAATGAGAGAAGTGAT TTTTTTTTTTTT 

      ATAAATAATGACATAACTGAATTGATAGAACGATGAGAGAAAT TTTTTTTTTTT 

     T TAAATAATGACATAACTGAATTGATAGAACGATGAGAGAAATAAT ATTAATTTTTTTTTTTTT 

      ATAAATAATGACATAACTAAATTGATAGAACGATGAGAGAAATAAT ATTTT 

 
 
 

 ATAT ATAAATAATGACGTAATTAAACTGGTAAGATGATAGAAAAAGTGAT TTTAAATTTTTTTTTT 
 
 
 


10,751 
125 

 
634 
113 
 
 
3,627 
1,168 
10,277 

 
 
 
 
 
 
 
 
2,340 

 
 
 
 
 
 
 
 
39,218 
23,327 
6,953 
5,004 
3,253 
2,627 
1,811 
1,594 
832 
815 
756 
636 
220 
183 
159 
148 
129 
 
 
 

4,860 
64 
706 
234 

 
 
344,244 
5,997 
3,292 
2,172 
1,088 
924 
809 
747 
558 
534 
521 
512 
382 
353 
308 
261 
258 
250 
193 
178 
160 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

C 
C 
C 
C 
Ct 
Ct 
B1 
B1 
B1* 
B2FSe 
B2FSe 
B3t 
B4t 
A 

 

 

 ATAC AAATCAACAAAACTACTACTCTCTATAGTGA TGATGATATGTATTTTTTGTTTTT 

ATATA AGATCAACAAAACTGCCATTTTCTGTAGTAAGTGATGATATAT TTTTTTTTATTTT 

ATATAAATCAACAAAACTGA TGTCCTCTGTAGTAGATAGTGATAT TATTTTTTTTTTCTTT 

 ATATAAATCAACAAAACTAG TGCCTTCTATAGTAGATGATGATATATGAT TTTTTTTTTTT 

ATAT ATATACAAAACAACGGGTATAAGATTAAATGTGAAATGGTATATAT TTTTTTTTTTTTTTTTT 

ATAT ATATACAAAACAACGGGTATAAGATTAAATGTG TTTTTTTTT 
 
ATA TACAATACGTGTATGATATTTTATACTAGGTAGATCAGTGAAATTTTTTTTTTTT 

ATA TACAATACGTGTATGATATTTTATACTGGGTAGATCAGTGAAATT TTTTTTT 

ATA TACAATACGTGTATGATATTTATACT AGGTAGATCAGTGAAATTTTTTTTTTTTTT 

ATA TACAATACGTGAATGATATTTTATACT AGGTAGATCAGTGAAATTTTTTTTT 

ATA TACAATACGTGGATGATATTTTATACT AGGTAGATCAGTGAAATTTTTTTTTTTTTTT 

ATATA ACAATACGTGAATGAATAATTATAGGGATATATGAGATAAT TTTTTTTTTTT 

ATA TACAATACGTGAGAATATATATAGATGTAGAGATATGAGACAGT TTTTTTTTTTTTT 
 
AAATAT AACATATCTTATATCTGAATCTAACTTGTAATATGTG AATTTTTTTTTTTTTTTT 

80 

 
1,587 
338 
8 
1 
 
1,699 

 
69,971 

 
1 
1,941 
3,453 
 
758 

41 
28 

 
333 

 
 
 
17,696 
10 
15,488 
4 
2 

 
 
 
158 

 


C 

B 

 

 
A 

 

 

 

 

APPENDIX K. ND7 gRNA Alignments for TREU 667 SDM79 (A) and EATRO 164 SDM79 cells (B), and all editing variants (C and D). 

Amino acid translations are shown above the mRNA sequences.  The cDNA sequence of the most abundant gRNA in its 
sequence class is shown aligned beneath the fully edited mRNA. Lowercase u’s indicate uridines added by editing; asterisks 
indicate encoded uridines deleted during editing. Nucleotides and deletion sites in the fully edited mRNA were numbered 
starting from the 5’ end (+1=0). gRNAs are colored based on transcript abundance as follows: Blue<100; Green<1,000; 
Purple<10,000; Orange<100,000; Red>100,000; Black=not quantified. Watson-Crick (|) and G:U base pairs (:) are indicated. 
Mismatches are indicated by an octothorpe (#). Highlighted sequence marks regions where multiple CU configurations 
are possible.   
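
The pairing symbols described in the caption can be illustrated with a short sketch. This is not the procedure used to build the alignments; it is a minimal Python example that assumes the mRNA and gRNA cDNA strings are already aligned to equal length, that T in the cDNA stands for U in the gRNA, and that deletion sites (* in the mRNA, - in the gRNA) are left blank. The function name annotate_pairs is hypothetical.

```python
# Illustrative sketch only: the function name and gap handling are assumptions,
# not part of the original analysis pipeline.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def annotate_pairs(mrna: str, grna_cdna: str) -> str:
    """Return the |, :, # line for two pre-aligned, equal-length strings."""
    if len(mrna) != len(grna_cdna):
        raise ValueError("aligned strings must have equal length")
    symbols = []
    for m, g in zip(mrna, grna_cdna):
        # Deleted uridines (*) and alignment gaps (-) carry no pairing symbol.
        if m == "*" or g == "-":
            symbols.append(" ")
            continue
        # Lowercase u (inserted uridine) pairs like U; cDNA T stands for gRNA U.
        pair = (m.upper().replace("T", "U"), g.upper().replace("T", "U"))
        if pair in WATSON_CRICK:
            symbols.append("|")
        elif pair in WOBBLE:
            symbols.append(":")
        else:
            symbols.append("#")
    return "".join(symbols)


# The 3' end of the edited ND7 mRNA against the TREU 667 gC cDNA (panel A).
print(annotate_pairs("GUCCACAGCAuCCCG", "TAAGTGTATTAGAGT"))
```

Applied to the gC segment of panel A, the sketch yields the same symbol line shown in the alignment, with two G:U pairs at the ends and four mismatches.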
 

A.  TREU 667 ND7 

E1v1, D, C, B, A 
  Y  K  K  T  W  L  H  D  K  Y  H  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q  H  P  A 
 I  Q  K  N  M  T  T  W  ST V  S  F  Y  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A  S  R  
GAUACAAAAAAACAUGACUACAUGAUAAGUAuCAuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG 
                                                                                     :|#||||##|||#|: 
                                                                                 gC 17TAAGTGTATTAGAGT 
                                                          ||:||||:|:||||:|:|||||||:| ::||||#||||| 
                                                   gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA 
                           |:|||||||:||::|:|:|:|:|:|||||||:|||||||||||| 
                        09TATTATAGTAAGATGTAGTGAGAGCTATCAAAAGAATGTAAACATATAAA gE1v1 
 
     A  H  G    V  L  C  C  L  L  Y  F  C  G  E   F  I  V  Y    I  D  C  I  I     G   Y  L  H  R  G  
    S  T  W    C  F  M  L  F  I  V  F  L  W  W   I  Y  C  L    Y  W  L  Y  Y  R      L  F  A  S  W   
***CAGCACAuG**GuGuuuuAuGuuGuuuAuuGuAuuuuuGuGGuGA*AuuuAuuGuuuA**UAUUGAuUGuAuuAuA***G*GuuAUUUGCAUCGUGG 
                                                 ||||||::|:||  :||::||:||||:|||   : ||||||||||||:|: 
                                             gA 13TAAATAGTAGAT--GTAGTTAGCATAGTAT---T-CAATAAACGTAGTATATA 
                            ||||::||:|:|::::|||| |||:||::||||  |||||||| 
                      16TAATTAATAGTATGAGAGTGTCACT-TAAGTAGTAAAT--ATAACTAAACATA gB 
   |##||||||  :|:|||:|::||||||||||||| 
---GAAGTGTAC--TATAAAGTGTAACAAATAACATATATA gC 
 
 T  E  K  L  C  E  Y  K   
Y  R  K  V  M  W  I  ST K 
UACAGAAAAGUUAUGUGAAUAUAAAAG 
 

 

 


B.  EATRO 164 ND7 

E1v1, D, C, B, A 
  Y  K  K  T  W  L  H  D  K  Y  H  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q  H  P  A 
 I  Q  K  N  M  T  T  W  ST V  S  F  Y  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A  S  R  
GAUACAAAAAAACAUGACUACAUGAUAAGUAuCAuuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG 
                                                                                 ||: |||#|||:#|||#|| 
                                                                            gC 19TAAT-TAGATGTT-TAGAGC 
                                                          ||:||||:|:||||:|:|||||||:| ::||||#||||| 
                                                   gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA  
                           |:|||||||:||::|:|:|:|:|:|||||||:|||||||||||| 
                        09TATTATAGTAAGATGTAGTGAGAGCTATCAAAAGAATGTAAACATATAAA gE1v1 
 
     A  H  G    V  L  C  C  L  L  Y  F  C  G  E   F  I  V  Y    I  D  C  I  I     G   Y  L  H  R  G  
    S  T  W    C  F  M  L  F  I  V  F  L  W  W   I  Y  C  L    Y  W  L  Y  Y  R      L  F  A  S  W   
***CAGCACAuG**GuGuuuuAuGuuGuuuAuuGuAuuuuuGuGGuGA*AuuuAuuGuuuA**UAUUGAuUGuAuuAuA***G*GuuAUUUGCAUCGUGG 
                                                 ||||||:::|||  |||::||::||:||||   | :||||:||||||||| 
                                             gA 12TAAATAGTGAAT--ATAGTTAGTATGATAT---C-TAATAGACGTAGCACATATATA  
                       :||:||:|:|:||:||:|::|||:| ||:|||:|||||  ||||||| 
                      15TAATAAGTGATATGAAGATGCCATT-TAGATAGCAAAT--ATAACTACATA gB 
   #||#|||||  |:::|||||||||||||||:||| 
----TCATGTAC--CGTGAAATACAACAAATAATATA gC 
 
 T  E  K  L  C  E  Y  K   
Y  R  K  V  M  W  I  ST K 
UACAGAAAAGUUAUGUGAAUAUAAAAG 
 

 

 


C.  TREU 667 ND7 Variants 

E Variants 
E1v2, D, C 
D  T  K  K  H  D  Y  M  I  S  T  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q  H  P  A   
  Y  K  K  T  W  L  H  D  K  Y  I  Y  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A  S  R    
GAUACAAAAAAACAUGACUACAUGAUAAGUACAuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG** 
                                                                                   :|#||||##|||#|:   
                                                                               gC 17TAAGTGTATTAGAGT-- 
                                                        ||:||||:|:||||:|:|||||||:| ::||||#|||||# 
                                                 gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA 
                                |||||::|:|:||:|:::||::|:|||||||||||||||| 
                               12TAAATGTAGTGAAGATTGTCGGAGAAATGTAAACATAGCATATACA gE1v2 
 
 
E2v1, D, C 
 I  Q  K  N  M  T  T  W  ST V  S  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q  H  P  A  
D  T  K  K  H  D  Y  M  I  S  I  I  Y  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A  S  R   
GAUACAAAAAAACAUGACUACAUGAUAAGUAuCAuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG* 
                                                                                    :|#||||##|||#|:  
                                                                                gC 17TAAGTGTATTAGAGT- 
                                                         ||:||||:|:||||:|:|||||||:| ::||||#|||||# 
                                                  gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA 
                            :|||||:||||:|||::|:||:|||:||:||||||||||||| 
                        15TAATATAGTGAATATAATGGAGACTATCGAAGAAATGTAAACATATAAA gE2v1 
 
 
E2v2, D, C 
 I  Q  K  N  M  T  T  W  ST V  Q  L  L  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q  H  P  A     A  
D  T  K  K  H  D  Y  M  I  S  T  I  V  I  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A  S  R     S   
GAUACAAAAAAACAUGACUACAUGAUAAGUACAAuuGuuAuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG***CAGC 
                                                                              :|#||||##|||#|:   :##| 
                                                                          gC 17TAAGTGTATTAGAGT---GAAG 
                                                   ||:||||:|:||||:|:|||||||:| ::||||#|||||# 
                                            gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA 
                                 |||:|:|||:|||:|:||:||||||:|:||:||||||||||||: 
                                14TAATAGTAATCATTAGAAGAATGTAGATATGGCAAAATGTAAATATA gE2v2 
 
 

 


E3t, D, C 
 I  Q  K  N  M  T  T  W  ST V  S  F  M  L  F  L  V  V  F  T  F  V  S  F  Y  I  W   S  T  A  S  R     
D  T  K  K  H  D  Y  M  I  S  I  I  Y  V  I  F  G  S  F  Y  I  C  I  V  L  H  L   V  H  S  I  P      
GAUACAAAAAAACAUGACUACAUGAUAAGUAuCAuuuAuGuuAuuuuuGGuAGuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG* 
                                     ||:|:|||:|::|||::||:||||:|::||||:|||||||||| 
                                    12TATAGTAAGAGTCATTGAAGATGTGAGTATAGTAAAATGTAAAATATA gE4t 
                            :|||||:||||:|||::|:||:|||:||:| 
                        15TAATATAGTGAATATAATGGAGACTATCGAAGAAATGTAAACATATAAA gE2v1 
                                                                                  :|#||||##|||#|: 
                                                                              gC 17TAAGTGTATTAGAGT- 
                                                       ||:||||:|:||||:|:|||||||:| ::||||#||||| 
                                                gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA 
 
E4t, D, C 
 I  Q  K  N  M  T  T  W  ST 
D  T  K  K  H  D  Y  M  I  S  T  S  Y  F  W  ST 
  Y  K  K  T  W  L  H  D  K  Y  K  L  F  L  V  V  F  T  F  V  S  F  Y  I  W   S  T  A  S  R     
GAUACAAAAAAACAUGACUACAUGAUAAGUACAAGuuAuuuuuGGuAGuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG*** 
                                  :|:|||:|::|||::||:||||:|::||||:|||||||||| 
                               12TATAGTAAGAGTCATTGAAGATGTGAGTATAGTAAAATGTAAAATATA gE4t 
                                                                             :|#||||##|||#|: 
                                                                         gC 17TAAGTGTATTAGAGT--- 
                                                  ||:||||:|:||||:|:|||||||:| ::||||#||||| 
                                           gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA 
 
 
C Variant 
CFSt, B 
D  D  I  W   S  T  A  S  R     Y  A  H  G    V  L  C  C  L  L  Y  F  C  G  E   F  I  V  Y    I  D  C 
  R  H  L   V  H  S  I  P     L  C  T  W    C  F  M  L  F  I  V  F  L  W  W   I  Y  C  L    Y  W  L  
 T  T  F  G   P  Q  H  P  A     M  H  M    V  F  Y  V  V  Y  C  I  F  V  V  N   L  L  F  I    L  I   
ACAGACGACAGUGUCCACAGCAuCCCG***CuAuGCACAuG**GuGuuuuAuGuuGuuuAuuGuAuuuuuGuGGuGA*AuuuAuuGuuuA**UAuuGAuU 
                  |:||||#|:   ||||:||||||  :|:||:|||||:||||||||||| 
gCFSt 57 Reads 14TAATTGTAGAGT---GATATGTGTAC--TATAAGATACAGCAAATAACATATATA 
                                                         ||||::||:|:|::::|||| |||:||::||||  |||||||| 
                                    gB 26624 Reads 16TAATTAATAGTATGAGAGTGTCACT-TAAGTAGTAAAT--ATAACTAAACATA 
 
 

 

 


D.  EATRO 164 ND7 Variants 

E Variants 
E2v1, D, C 
 I  Q  K  N  M  T  T  W  ST V  S  F  M  L  F  L  V  V  F  L  H  L  Y  R  F  T  F  G   P  Q  H  P  A  
D  T  K  K  H  D  Y  M  I  S  I  I  Y  V  I  F  G  S  F  F  T  F  V  S  F  Y  I  W   S  T  A  S  R   
GAUACAAAAAAACAUGACUACAUGAUAAGUAuCAuuuAuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG* 
                                                                                ||: |||#|||:#|||#||  
                                                                           gC 19TAAT-TAGATGTT-TAGAGC- 
                                                         ||:||||:|:||||:|:|||||||:| ::||||#||||| 
                                                  gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA 
                           |:|||||:|||||::|:|:|:::||||:|:|||||||||:| 
                        14TATTATAGTGAATACGGTGAGAGTTATCAGAGAAATGTAAATAATATA gE2v1 
 
 
E4e, D 
                     M  I  S  T  Y  C  Y  W  STOP 
             M  T  T  W  STOP 
  Y  K  K  T  W  L  H  D  K  Y  I  L  L  L  V  V  F  T  F  V  S  F  Y  I  W   S  T  A  S  R     S  T 
GAUACAAAAAAACAUGACUACAUGAUAAGUACAuAuuGuuAuuGGuAGuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG***CAGCA 
                                |||||:|:|||:||||:||:|||||:||:||||||||||||| 
                               13TATAATAGTAATCATCGAAGATGTAGACGTAGCAAAATGTAACATATA gE4e 
                                                  ||:||||:|:||||:|:|||||||:| ::||||#||||| 
                                           gD 13TGTAAGTGTAGATATAGTAGAATGTAAGC-TGGGTGACGTAGATATATA  
 
 

 

 


BC Variants 
C1ex, B, A 
D  T  K  K  H  D  Y  M  I  S  T  R  G  D  R  R  Q  C  P  Q  H  P   F  I  V  S  F  I  G    I  C  C  L 
  Y  K  K  T  W  L  H  D  K  Y  K  R  R  Q  T  T  V  S  T  A  P   V  H  C  F  I  H  W    D  L  L  F  
GAUACAAAAAAACAUGACUACAUGAUAAGUACAAGAGGAGACAGACGACAGUGUCCACAGCACCCG*UUCAuuGuuuCAuuCAuuG**GGAuuuGuuGuu 
                                                                                              :||:|| 
                                                                                          gB 15TAATAA 
                                                                 : ||#|:|:|:|||||||:||  ||||||::|||| 
                                                           gC1ex 11T-AATTGATAGAGTAAGTGAC--CCTAAATGACAA 
 
  L  Y  F  C  G  E   F  I  V  Y    I  D  C  I  I     G   Y  L  H  R  G  T  E  K  L  C  E  Y  K   
 I  V  F  L  W  W   I  Y  C  L    Y  W  L  Y  Y  R      L  F  A  S  W  Y  R  K  V  M  W  I  ST K   
uAuuGuAuuuuuGuGGuGA*AuuuAuuGuuuA**UAUUGAuUGuAuuAuA***G*GuuAUUUGCAUCGUGGUACAGAAAAGUUAUGUGAAUAUAAAAG 
                    ||||||:::|||  |||::||::||:||||   | :||||:||||||||| 
                   12TAAATAGTGAAT--ATAGTTAGTATGATAT---C-TAATAGACGTAGCACATATATA gA 
:|:|:||:||:|::|||:| ||:|||:|||||  ||||||| 
GTGATATGAAGATGCCATT-TAGATAGCAAAT--ATAACTACATA gB 
|||||| 
ATAACAAAATATA gC1ex 
 
C2ex, B, A 
D  T  K  K  H  D  Y  M  I  S  T  R  G  D  R  R  Q  C  P  Q  H  P    S  L  S  Y  S   V  F  Y  C  C  L 
  Y  K  K  T  W  L  H  D  K  Y  K  R  R  Q  T  T  V  S  T  A  P  V    I  V  L  Q   C  V  L  L  L  F  
GAUACAAAAAAACAUGACUACAUGAUAAGUACAAGAGGAGACAGACGACAGUGUCCACAGCACCCG**UCAuuGuCuuACAG*UGuGuuuuAuuGuuGuu 
                                                                                              :||:|| 
                                                                                          gB 15TAATAA 
                                                                 :  |#|:|:|||||||| ||||||:|||||#|||| 
                                               gC2ex 07TAATATAGAGTAT--ATTGATAGAATGTC-ACACAAGATAAC-ACAA 
 
  L  Y  F  C  G  E   F  I  V  Y    I  D  C  I  I     G   Y  L  H  R  G  T  E  K  L  C  E  Y  K   
 I  V  F  L  W  W   I  Y  C  L    Y  W  L  Y  Y  R      L  F  A  S  W  Y  R  K  V  M  W  I  ST K   
uAuuGuAuuuuuGuGGuGA*AuuuAuuGuuuA**UAUUGAuUGuAuuAuA***G*GuuAUUUGCAUCGUGGUACAGAAAAGUUAUGUGAAUAUAAAAG 
                    ||||||:::|||  |||::||::||:||||   | :||||:||||||||| 
                   12TAAATAGTGAAT--ATAGTTAGTATGATAT---C-TAATAGACGTAGCACATATATA gA 
:|:|:||:||:|::|||:| ||:|||:|||||  ||||||| 
GTGATATGAAGATGCCATT-TAGATAGCAAAT--ATAACTACATA gB 
||| 
ATATATA gC2ex 

 


 
 
Bex, A 
 I  Q  K  N  M  T  T  W  ST V  Q  E  E  T  D  D  S  V  H  S  T  R  F  S  T  V  G  Y  L  L  ST I  C   
D  T  K  K  H  D  Y  M  I  S  T  R  G  D  R  R  Q  C  P  Q  H  P  F  Q  H  S  W  L  F  V  V  D  L  W 
GAUACAAAAAAACAUGACUACAUGAUAAGUACAAGAGGAGACAGACGACAGUGUCCACAGCACCCGUUUCAGCACAGUUGGuuAuuuGuuGuAGAuuuGu 
                                                                                :||||:|::||||:||:|:| 
                                                                           gBex 13TAATAGATGACATTTAGATA 
 
G  E   F  I  V  Y    I  D  C  I  I     G   Y  L  H  R  G  T  E  K  L  C  E  Y  K   
  W   I  Y  C  L    Y  W  L  Y  Y  R      L  F  A  S  W  Y  R  K  V  M  W  I  ST K   
GGuGA*AuuuAuuGuuuA**UAUUGAuUGuAuuAuA***G*GuuAUUUGCAUCGUGGUACAGAAAAGUUAUGUGAAUAUAAAAG 
      ||||||:::|||  |||::||::||:||||   | :||||:||||||||| 
     12TAAATAGTGAAT--ATAGTTAGTATGATAT---C-TAATAGACGTAGCACATATATA gA 
|::|| |:||||||||||  ||||||| 
CTGCT-TGAATAACAAAT--ATAACTATATA gBex 
 
 
 

 

 

 


APPENDIX L.  gRNAs identified to edit the ND7 5’ mRNAs found in both TREU 667 and EATRO 164 gRNA transcriptomes. 

Editing Region 

E 

 

D 

 

Population 

E1 version 1 
E1 version 1 
E1 version 1 
E1 version 2 
E1 version 2 
E1 version 2 
E2 version 1 
E2 version 1 
E2 version 1 
E2 version 1 
E2 version 1 
E2 version 1 
E2 version 1 
E2 version 1 
E2 version 1 
E2 version 2 
E2 version 2 
E4t 
E4t 
E4t 
E4t 
E4t 
E4e 
E4e 
E4e 
E4e 
E4e 
E4e 
E4e 
E4e 
E4e 
 
D 
D 
D 
D 
D 
D 
D 
D 
D 

Sequences 

AAAT ATACAAATGTAAGAAAACTATCGAGAGTGATGTAGAATGATATT ATTTTTTTTT 

AAAT ATACAAGTGTAAGAAAACTATCGAGAGTGATGTAGAATGATATT ATTTTTTTTTTT 

       AAAT ATACAAATGTAAGAAAACTATCGAGAGTGATGTAGAATGATATTT TTTTTTTTTTAT 

 ATATA ATAAATGTAAAGAGACTATTGAGAGTGGCATAAGTGT TTTTATTTTTTTTTTTTAT 

  ACATAT ACGATACAAATGTAAAGAGGCTGTTAGAAGTGATGTAAAT TTTTTTTTTTTG 

ATACATAT ACGATACAGATGTGAAGAAACTATTAGAGATAATGTAAAT TTTTCTTTTTT 

 ATATA ATAAATGTAAAGAGACTATTGAGAGTGGCATAAGTGATATT ATTTTTTTTTTTTTT 

 ATATA ATAAATGTAAAGAGACTATTGAGAGTGGCATAAGTGATATTT TTTTTTTTTGT 

 ATATA ATAAATGTAAAGAGACTATTGAGAGTGGCATAAGTGAT TTTTTTTTTTCT 

 ATATA ATAAATGTAAAAAGACTATTGAGAGTGGCATAAGTGATATT ATTTTTTTTTGTTTTT 

   ATATAATAAA TAAAGAGACTATTGAGAGTGGCATAAGTGATATT ATAATGATTTTTTTTTTTTTT 

AAAT ATACAAATGTAAAGAAGCTATCAGAGGTAATATAAGTGATAT AATTTTTTTGTTTTTTT 

  ATATAAT AATGTAAAGAGACTATTGAGAGTGGCATAAGTGATATT ATAATGATATTTTTTTTT 

  AT ATATAAATGTAAAGAGACTATTGAGAGTGGCATAAGTGATATT ATTTTTT 

            ATATAC ACAAATGTAAAGAGACTATCGAGAGTGACATAAGTGATAT AATTTTTTTATTTT 

 ATATA ATAAATGTAAAGAGACTATTGATAGTGG CATAAGTGATATTATAATGATTTTTTTTTTTTTTTT 

ATA TAAATGTAAAACGGTATAGATGTAAGAAGATTACTAATGATAATT TTTTTTTTTTTT 

ATATA ATAAATGTAAAGACTATTGAGAGTGGCATAAGTGATATT ATAATTTTTTTTTTTTT 

ATATA ATAAATGTAGAGACTATTGAGAGTGGCATAAGTGATATT ATAATTTTTTTTTTTTT 

ATATA AAATGTAAAATGATATGAGTGTAGAAGTTACTGAGAATGATAT TTTTTTTTTTTC 

ATATA AAATGTAAAATGATATGAGTGTAGAAGTTACTGAGAATGATATA TTTTTTTTTTTC 

ATATA AAATGTAAAATGATATGAGTGTAGAAGTTACTGAGAATGAT TTTTTTTTTTTTC 

 ATATAC AATGTAAAACGATGCAGATGTAGAAGCTACTAATGATAATAT TTTTTTTTTTTTG 

 ATATAC AATGTAAAACGATGCAGATGTAGAAGCTACTAATGATAAT TTTTTTTTTTTC 

   ATATACAA TAAAACGATGCAGATGTAGAAGCTACTAATGATAATAT TTTTTTATTTTTT 

 ATATACAATGTAAAACGATT CAGATGTAGAAGCTACTAATGATAATAT TTTCTTTTTTTTT 

 ATATAC AATGTAAAACGATGCAGATGTAGAAGCTACTAATAATAATAT TTTTTTTAATTTTTTTTTTTTT 

  ATATACAATGT AAACGATGCAGATGTAGAAGCTACTAATGATAATAT TTTTTTCTTTTTTTT 

 ATATAC AATGTAAAACGATACAGATGTAGAAGCTACTAATGATAATAT TTTCTTTTTTT 

ATATA AAATGTAAAATGATATGAGTGTAGAAGTTACTGATAATGATAT ATTTTTTTTTTTTT 

ATATA AAATGTAAAATGATATGAGTGTAGAAGTTACTGATAATGAT TCTTTTTTTTTTTTT 
 
ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATGTTTTTTT TTTTTT 

ATATATAGATGCTGTA GATTAGATGTAGAGTGATATAAG CGTAAATTTTTTTTTTTTTG 

ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATTTTTTTT CTTTTTTT 

ATATATAA ATGCTGTGGATTAGATGTAGAATGATATGAGTGTGAAATTTTTTTT TTTTTTC 

               ATATA AAATGTAAAATGATATGAGTGTAGAAG TTACTGAGAATGATATTTTTTTTTTTTC 

ATATATAAATGCA GTGGATCAGATGTAAGATGGTATAAGTGTGAATATTTTTTT TTTTT 

ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATGTTT GTTTTTTTTTT 

ATATATAA ATGCTGTGGATTAGATGTAGAATGATATGAGTGTGAATTTTTTTT TTTTTG 

             GC GGGTCGAATGTAAGATGATATAGATGTGAA TGTATTTTTTTTTTT 


  

  

  
  
  
  
  

  
  

  

  
  

  
  
  
  
  
  
  

 

  

Reads TREU 667 
4,517 

803 

7,073 
213 

1,460 

192 

259 

10,349 
2,680 
2,360 

16 
5 
36,611 

3,590 
15,885 
12,929 
455 
332 
332 
229 

Reads EATRO 164 

6 
1 

193 

148,603 
6,174 
1,811 
659 
593 
504 
241 
203 

34 

49 
26 

218,365 
18,562 
793 
777 
712 
439 
203 

4,096 
2,136 
797 
24 

  

  
  

  

  

  
  
  

  
  
 

  
  
  
  
  

D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
D 
 
C 
C 
C1ex 
C2ex 
C2ex 
C2ex 
CFSt 
CFSt 
 
B 
B 
B 
B 
B 
B 
B 
B 
B 
B 
B 
Bex 
Bex 
Bex 
Bex 
Bex 
Bex 
Bex 
 
A 
A 

 

 

C 

B 

 

A 

 

      AGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATGTTCTTTT TTTTTTT 

ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATGTTTTTT GTTTTTTT 

TTATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATTTTTTTTGT TTT 

ATATATAGATGCA GTGGGTCAAATGTAAGATGATATAGATGTGAA TGTTTTTTATTTT 

ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATGTTTTTTTGT TTTTT 

 ATATATAGATGCAGT GGTCGAATGTAAGATGATATAGATGTGAA TGTTTTTTTTT 

ATATATAGATGCAT TGGGTCGAATGTAAGATGATATAGATGTGAA TGTTTGTTTTTTTTTTTT 

              GTGGGCCGAATGTAAGATGATATAGATGTGAA TGTTTTTTATTTTTTT 

ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATGTTTTT GTTTTTTT 

   TATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTGAATTCTTTTT TTTTTTT 

  ATATATAAATGC TGGATTAGATGTAGAATGATATGAGTGTGAAA TTTTTTTTTTT 

ATATATAGATGCA GTGGGTCGAATGTAAGATGATATAGATGTAAA TGTATTTTTTTTTTTTTT 
 
 ATATAATAAACAACATAAAGTGCCATGT ACTCGAGATTTGTAGATTAATTTTTTTTTTTTTTTTTTT 

ATAT ATACAATAAACAATGTGAAATATCATGTG AAGTGAGATTATGTGAATTTCTTTTTTTTTTTTT 

ATATAAA ACAATAAACAGTAAATCCCAGTGAATGAGATAGT TAATTTTTTTTTTT 

ATATATAAAC ACAATAGAACACACTGTAAGATAGT TATATGAGATATAATTTTTTT 

  ATATATAAAC ACAATAGAGCACACTGTAAGATAGT TATATGAGATATAATTTTTTTTTT 

ATATATATAAAC ACAATAGAACGCACTGTAAGATAGT TATATGAGATATTTTTTT 

ATATA ACAATAAACAACATAGAGCATTGTGTGTATAGTG AGATGTTACTTTTTCTTATTTTTTTGTATTTGTGTT 

ATAT ATACAATAAACGACATAGAATATCATGTGTATAGTG AGATGTTAATTTTTTTTTGTTTT 
 
ATAC ATCAATATAAACGATAGATTTACCGTAGAAGTATAGTGAATAAT TTTTTTTTTTATTT 

 ATATA CAATATAAACAGTAGATTCACTGCAGAAGTATGATAGATAAT TTTTTTTTTTG 

  ATACAT ATATAAACAATGAATTCACTGTGAAGATACGATAGATGATA TATTTTTTTTTGTTTT 

ATACA AATCAATATAAATGATGAATTCACTGTGAGAGTATGATAA TTAATTTTTTTTTTTTTTCT 

   ATATA CAATATAAACAGTAGATTCACTGCAGAGATATGATAGATAATA TTTTTTTTTTT 

   ATATA CAATATAAACAGTAGATTCACTGCAGAGATATGATAGATAATAA TTTTTTTTTTTCTA 

   ATATA CAATATAAACAGTAGATTCACTGCAGAGATATGATAGATAATAAATTTT TTTTTTT 

    ATACAT ATATAAACAATGAATTCACTGTGAAGATACAATAGATGATA TATTTTTTTTTTTTT 

   ATATA CAATATAAACAGTAGATTCACTGCAGAGATATGATAGATAAT TTTTTTTTTT 

ATACA AATCAATATAAATGATGAATTCACTGTGAGAGTATGAT TGTTTTTTTTTTTT 

         ATATAT AACAATAAATTCATCATAGAGATATAGTAAGTGATATGAGACATT TTTTTT 

ATAT ATCAATATAAACAATAAGTTCGTCATAGATTTACAGTAGATAATT TTTTTTTTTTT 

ATAT ATCAATATAAACAATAAGTTCGTCATAGATTTACAGTAGATAATTA TTTTTTTTTTT 

  AT ATCAATATAAACAATAAGTTCGTCATAGATTTACAGTAGATAATTAA GTTTTTTTTTTTT 

  AT ATCAATATAAACAATAAGTTCGTCATAGATTTACAGTAGATAATTAATT TTTTTTTTTT 

ATAT ATCAATATAAACGATGAATTTGTCATAGATTTACAGTAGATAATTA TTTTTTTTTC 

     ATCAATATAAACGATGAATTTGTCATAGATTTACAGTAGATAATT TTTT 

ATAT ATCAATATAAACAATGAGTTCATCATAGATTTACAGTAGATAATTA TTTTTTTTT 
 
ATATATA CACGATGCAGATAATCTATAGTATGATTGATATAAGTGATAAATTT TTTTTTTTT 

    ATA TATGATGCAAATAACTTATGATACGATTGATGTAGATGATAAATTT TTTTTTTTTTG 


211 
203 
195 
158 
154 
153 
128 
123 
112 
109 
106 
101 

114 
183 
22 
14 
13 
23 
57 
4,718 

26,624 
3,050 
1,675 
798 
579 
327 
110 
30 

57 
38 
25 
993 
4,939 

 
  

 

  
  

  
  
  
  

 

  
  
  
  
  
  
  
  
  
  
  
  
 

  

  
  

  
 

  
  
  
  
  
  
  
  

  
  
  
 

  

6,185 

131 
103 

4 

3,969 
3,557 
390 

20,696 
1,024 
714 
693 

535 

APPENDIX M.  Predicted ND7 protein sequences. 

First start codons without premature termination codons were translated. Blue sequences are found in TREU cells only, 
orange sequences are found in EATRO cells only, and black sequences are found in both cell lines.   
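
As an illustration of how an edited mRNA from Appendix K maps to the protein sequences listed here, the following Python sketch removes the deletion markers, treats lowercase u as U, and translates in a chosen reading frame. It assumes UGA is read as tryptophan, which is consistent with the W shown above UGA codons in the Appendix K alignments; the function names are hypothetical, and the sketch does not attempt to reproduce the start-codon selection described above. Stop codons are written as * rather than ST or !.

```python
# Illustrative sketch only: names are hypothetical, and UGA = Trp is an
# assumption inferred from the translations shown in the alignments.
BASES = "UCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODON_TABLE["UGA"] = "W"  # assumed mitochondrial reading of UGA


def clean_edited(seq: str) -> str:
    """Drop deletion markers (*) and spaces; inserted u becomes U."""
    return seq.replace("*", "").replace(" ", "").upper()


def translate(seq: str, frame: int = 0) -> str:
    """Translate every complete codon in the given frame; stops appear as *."""
    rna = clean_edited(seq)
    return "".join(
        CODON_TABLE[rna[pos:pos + 3]] for pos in range(frame, len(rna) - 2, 3)
    )


# Start of the E1v1 reading frame, taken from the fully edited ND7 mRNA.
print(translate("AuGuuAuuuuuGGuAGuuuuuuuACAuuuGuAuCGuuuuACAuuuG*GUCCACAGCAuCCCG"))
```

The example segment translates to the first twenty residues of the E1v1 sequence, MLFLVVFLHLYRFTFGPQHP.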

 

 

 

 

 


RF1 
E1v1         MLFLVVFLHLYRFTFGPQHPAAHGVLCCLLYFCGEFIVYIDCIIGYLHRGTEKLCEYK 
E1v2    MISTFMLFLVVFLHLYRFTFGPQHPAAHGVLCCLLYFCGEFIVYIDCIIGYLHRGTEKLCEYK 
E2v1FS MISIIYVIFGSFFTFVSFYIWSTASRYAHGVLCCLLYFCGEFIVYIDCIIGYLHRGTEKLCEYK 
E2v2FS   MISTIVIGSFFTFVSFYIWSTASRYAHGVLCCLLYFCGEFIVYIDCIIGYLHRGTEKLCEYK 

RF3 
E2v1  MISIIYVIFGSFFTFVSFYIWSTASRSTWCFMLFIVFLWWIYCLYWLYYRLFASWYRKVMWI! 
E2v2    MISTIVIGSFFTFVSFYIWSTASRSTWCFMLFIVFLWWIYCLYWLYYRLFASWYRKVMWI! 

Minor RF2 variants 
E3t     MISIIYVIFGSFYICIVLHLVHSIPQHMVFYVVYCIFVVNLLFILIVL! 
E1v1FS      MLFLVVFLHLYRFTFGPQHPAMHMVFYVVYCIFVVNLLFILIVL! 
E1v2FS MISTFMLFLVVFLHLYRFTFGPQHPAMHMVFYVVYCIFVVNLLFILIVL! 

No AUG 
E4tRF3   LVVFTFVSFYIWSTASRSTWCFMLFIVFLWWIYCLYWLYYRLFASWYRKVMWI! 
E4eRF3 LLLVVFTFVSFYIWSTASRSTWCFMLFIVFLWWIYCLYWLYYRLFASWYRKVMWI! 

APPENDIX N. CR3 gRNA Alignments for TREU 667 SDM79 (A), and all editing variants (B). 

Amino acid translations are shown above the mRNA sequences.  The cDNA sequence of the most abundant gRNA in its 
sequence class is shown aligned beneath the fully edited mRNA. Lowercase u’s indicate uridines added by editing; asterisks 
indicate encoded uridines deleted during editing. Nucleotides and deletion sites in the fully edited mRNA were numbered 
starting from the 5’ end (+1=0). gRNAs are colored based on transcript abundance as follows: Blue<100; Green<1,000; 
Purple<10,000; Orange<100,000; Red>100,000; Black=not quantified. Watson-Crick (|) and G:U base pairs (:) are indicated. 
Mismatches are indicated by an octothorpe (#). Highlighted sequence marks regions where multiple CU configurations 
are possible.   
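
The abundance color scheme in the caption amounts to a simple threshold lookup, sketched below in Python. The function name abundance_color is hypothetical, None is used here to stand for a gRNA that was not quantified, and assigning a count of exactly 100,000 to Red is an assumption, since the caption leaves that boundary unspecified.

```python
# Illustrative sketch only: the function name and the treatment of exactly
# 100,000 reads are assumptions, not stated in the caption.
from typing import Optional


def abundance_color(reads: Optional[int]) -> str:
    """Map a gRNA read count to the color class used in the alignments."""
    if reads is None:
        return "Black"   # not quantified
    if reads < 100:
        return "Blue"
    if reads < 1_000:
        return "Green"
    if reads < 10_000:
        return "Purple"
    if reads < 100_000:
        return "Orange"
    return "Red"         # assumption: 100,000 itself falls here


# Read counts reported for gCFSt and gB in the TREU 667 C variant panel.
for name, reads in [("gCFSt", 57), ("gB", 26_624)]:
    print(name, abundance_color(reads))
```

With the counts given for the CFSt panel in Appendix K, gCFSt (57 reads) falls in the Blue class and gB (26,624 reads) in the Orange class.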

 

A.  TREU 667 CR3 gRNA alignment 

gGt, gFt, gEt’, gE, gD, gC, gB1B2, gA1 
                                              M  F  D      C  L  V  L  L  F  F  Y  C  L  F  V  H  F  
AGAAAUAUAAAUAUGUGUAUGAUAUAUAAuuAAuuAuuuuCAuuuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuACAuuu 
                           |||||||:|:|||||:|:||::|||||    |||||::||||||||||| 
                       gGt 13TTAATTAGTGAAAGTGAGATGTAAACT----AACAAGTCAAAACAACAATATA 
                                                               |::|:|:::|||:|:||:|:||:|:||:||||||||| 
                                                      gFt 12TACATATTAGAGTGACAGAGAAGTGACGAGCAGACATGTAAA 
                                                                                          |::|||||:| 
                                                                                gEt’ 11TATATAGTATGTAGA 
 
 F  C  F  L  F  V  C  D       L  F  L  C  L  L   F  S  F  C  F  L  L  D  F  C  F  L  F  N  M  G  L   
uuuuuGuuuuuuAuuuGuuuGuG***A**UUUGuuuuuAuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuGGAuuuuuGuuuuuuAuuuAAuAuGGGuuuA 
|                                 |:||||::|::||| ::||||||:|||:|:|||||||||||:|:| 
ATATA gFt                        11TAGAATATGAGTAAT-GGATCAAAGACAGAGAATAACCTAAAGATATA gD 
:||:||||::|||||:|||||||   |  |||:|                                     |||:|:::|||:|||:|||||::||:||| 
GAAGACAAGGAATAAGCAAACAC---T--AAATACATA gEt’                         gC 12TAAGAGTGAAAGATAGATTATGTCCGAAT 
   :|||||:|||:|:::|:|||   |  :|||:|:|||||||||||| |:||                                          ::||| 
  13TATAGTTATAGATGGAGCAC---T--GAACGAGAATACAAACAAT-AGATA gE                               gB1B2 13TGAAT 
 
L  L  C  F  I  L  Q  I  F  S  V  I  I  I  I  V  Y  K  F  S  L  L  D  STOP 
uUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuAuCAuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAAAAAAGUAUGCAAAUAAUUUUUGU 
|::|||||||| 
AGTAACACAAATATAAA gC  
|::|::|:|:|:|:||||||||||:|:|||||||||||||| 
AGTAGTATAGAGTGTAATGTCTAAGAGTCACAATAGTAATATATA gB1B2 
                            :|:|:|||||||:|||::||||:||||:|:||:||||||||||| 
                       gA1 04TATAGTAGTAATGATAGTATATGTTCAGAGGCGATAATCTAATTATATA 

 

 


B.  TREU 667 Cell line variants 

A Variants 
A1 
 L  L  C  F  I  L  Q  I  F  S  V  I  I  I  I  V  Y  K  F  S  L  L  D  ST K  S  M  Q  I  I  F   
  C  C  V  L  Y  Y  R  F  L  V  L  S  L  L  L  Y  I  S  F  R  Y  ST I  K  K  V  C  K  ST F  L   
I  V  V  F  Y  I  T  D  F  ST C  Y  H  Y  Y  C  I  ST V  F  V  I  R  L  K  K  Y  A  N  N  F  C   
AuUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuAuCAuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAAAAAAGUAUGCAAAUAAUUUUUGU 
||::|::|:|:|:|:||||||||||:|:|||||||||||||| 
TAGTAGTATAGAGTGTAATGTCTAAGAGTCACAATAGTAATATATA gB1B2 
                             :|:|:|||||||:|||::||||:||||:|:||:||||||||||| 
                            04TATAGTAGTAATGATAGTATATGTTCAGAGGCGATAATCTAATTATATA gA1 
 
A2 
 L  L  C  L  Y  Y  F  R  F  Y  G  I  I  F  I  I  V  Y  K  F  S  L  L  D  ST K  S  M  Q  I  I  F   
  C  C  V  Y  I  I  S  D  F  M  V  S  F  L  L  L  Y  I  S  F  R  Y  ST I  K  K  V  C  K  ST F  L   
I  V  V  F  I  L  F  Q  I  L  W  Y  H  F  Y  Y  C  I  ST V  F  V  I  R  L  K  K  Y  A  N  N  F  C   
AuUGuuGuGuuuAuAuuAuuuCAGAuuuuAuGGuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAAAAAAGUAUGCAAAUAAUUUUUGU 
                                 ||:|||:||:|:||:::||||:|||:||||||:|||||||| 
                              14TAATGGTAGAAGTGATGGTATATGTTCGAAAGCAGTAATCTAAAATATA gA2 
||::|::|:|:||||:||:|:||:|||:||||||||||||||||| 
TAGTAGTATAGATATGATGAGGTTTAAGATACCATAGTAAAAATATATA gB3t 
 
 
 

 

 


B Variants 
B1B2 
 D  F  C  F  L  F  N  M  G  L  L  L  C  F  I  L  Q  I  F  S  V  I  I  I  I  V  Y  K  F  S   
  I  F  V  F  Y  L  I  W  V  Y  C  C  V  L  Y  Y  R  F  L  V  L  S  L  L  L  Y  I  S  F  R  
G  F  L  F  F  I  ST Y  G  F  I  V  V  F  Y  I  T  D  F  ST C  Y  H  Y  Y  C  I  ST V  F  V 
GGAuuuuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuAuCAuUAuuAuuGuAuAuAAGuUUUCG 
  |||:|:::|||:|||:|||||::||:||||::|||||||| 
 12TAAGAGTGAAAGATAGATTATGTCCGAATAGTAACACAAATATAAA gC 
                          ::||||::|::|:|:|:|:||||||||||:|:|||||||||||||| 
                         13TGAATAGTAGTATAGAGTGTAATGTCTAAGAGTCACAATAGTAATATATA gB1B2 
                                                           :|:|:|||||||:|||::||||:||||:|:|| 
                                                      gA1 04TATAGTAGTAATGATAGTATATGTTCAGAGGC 
 
B4t 
 D  F  C  F  L  F  N  M  G  L  L  L  C  L  F  F  F  F  I  L  S  F  D  M  L  L  S  F  L  L  L  Y  I  S  F  R  
  I  F  V  F  Y  L  I  W  V  Y  C  C  V  Y  F  F  F  L  F  Y  H  L  I  C  C  Y  H  F  Y  Y  C  I  ST V  F   
G  F  L  F  F  I  ST Y  G  F  I  V  V  F  I  F  F  F  Y  F  I  I  W  Y  V  V  I  I  F  I  I  V  Y  K  F  S   
GGAuuuuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuAuuuuuuuuuuuuAuuuuAuCAuuuGAuAuGuuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUC 
                                 :||:::||:|:|:|:||||:|:|:||||||||||||||||||:| 
                                12TAATGTAAGTGAGAGAAAAGAGTGAAATAGTAAACTATACAATATATA gB4t’ 
  |||:|:::|||:|||:|||||::||:||||::||||||||||                               :|||:|||:||:|:||:::||||:|||:|||| 
 12TAAGAGTGAAAGATAGATTATGTCCGAATAGTAACACAAATATAAA gC                   gA2 14TAATGGTAGAAGTGATGGTATATGTTCGAAAG 
                                                           |||||:|::||||:|:::||||||:|:|||||:||||||||| 
                                                  gB4 08TAATATAGTGAGTTATATAGTGATAGTAGAGATAATGACATATATTATATA 
 
B3t 
 D  F  C  F  L  F  N  M  G  L  L  L  C  L  Y  Y  F  R  F  Y  G  I  I  F  I  I  V  Y  K  F  S   
  I  F  V  F  Y  L  I  W  V  Y  C  C  V  Y  I  I  S  D  F  M  V  S  F  L  L  L  Y  I  S  F  R  
G  F  L  F  F  I  ST Y  G  F  I  V  V  F  I  L  F  Q  I  L  W  Y  H  F  Y  Y  C  I  ST V  F  V 
GGAuuuuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuAuAuuAuuuCAGAuuuuAuGGuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCG 
  |||:|:::|||:|||:|||||::||:||||::|||||||||||||                ||:|||:||:|:||:::||||:|||:||||| 
 12TAAGAGTGAAAGATAGATTATGTCCGAATAGTAACACAAATATAAA gC     gA2 14TAATGGTAGAAGTGATGGTATATGTTCGAAAGC 
                             |||::|::|:|:||||:||:|:||:|||:||||||||||||||||| 
                           12TATAGTAGTATAGATATGATGAGGTTTAAGATACCATAGTAAAAATATATA gB3t 
 
 
 
 

 

 


C Variants 
C 
 C  L  L   F  S  F  C  F  L  L  D  F  C  F  L  F  N  M  G  L  L  L  C  F  I  L  Q  I  F  S  V  I 
  V  C  Y   L  V  F  V  F  Y  W  I  F  V  F  Y  L  I  W  V  Y  C  C  V  L  Y  Y  R  F  L  V  L   
M  F  V  I   ST F  L  F  F  I  G  F  L  F  F  I  ST Y  G  F  I  V  V  F  Y  I  T  D  F  ST C  Y  
AuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuGGAuuuuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuA 
                                 |||:|:::|||:|||:|||||::||:||||::|||||||| 
                                12TAAGAGTGAAAGATAGATTATGTCCGAATAGTAACACAAATATAAA gC 
||::|::||| ::||||||:|||:|:|||||||||||:|:|                ::||||::|::|:|:|:|:||||||||||:|:||||||| 
TATGAGTAAT-GGATCAAAGACAGAGAATAACCTAAAGATATA gD    gB1B2 13TGAATAGTAGTATAGAGTGTAATGTCTAAGAGTCACAAT 
 
 

D Variants 
D 
V  S  L  Y  F  F  V  D     F  C  L  C  L  L   F  S  F  C  F  L  L  D  F  C  F  L  F  N  M  G  L   
 Y  H  C  I  F  L  W     I  F  V  Y  V  C  Y   L  V  F  V  F  Y  W  I  F  V  F  Y  L  I  W  V  Y  
  I  I  V  F  F  C  G     F  L  F  M  F  V  I   ST F  L  F  F  I  G  F  L  F  F  I  ST Y  G  F  I 
GuAuCAuuGuAuuuuuuuGuGG***AUUUUUGuuuAuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuGGAuuuuuGuuuuuuAuuuAAuAuGGGuuuA 
:|||||||:||:|:|:||:|:|   ||||:|||:||||||||||| |                     |||:|:::|||:|||:|||||::||:||| 
TATAGTAATATGAGAGAATATC---TAAAGACAGATACAAACAAT-ATATA gEt         gC 12TAAGAGTGAAAGATAGATTATGTCCGAAT 
                                :||||::|::||| ::||||||:|||:|:|||||||||||:|:| 
                             11TAGAATATGAGTAAT-GGATCAAAGACAGAGAATAACCTAAAGATATA gD 
 
 
 

 

 


E Variants 
E 
      C  L  V  L  L  F  F  Y  C  L  F  V  H  F  F  C  F  L  F  V  C  D       L  F  L  C  L  L   F  S  F  C  
I      V  W  F  C  C  F  F  I  V  C  L  Y  I  F  F  V  F  Y  L  F  V     I    C  F  Y  V  C  Y   L  V  F  V 
     L  F  G  F  V  V  F  L  L  F  V  C  T  F  F  L  F  F  I  C  L  W       F  V  F  M  F  V  I   ST F  L   
A****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuACAuuuuuuuuGuuuuuuAuuuGuuuGuG***A**UUUGuuuuuAuGuuuGuuA*UUUAGuuuuuG 
          |::|:|:::|||:|:||:|:||:|:||:||||||||||                                 |:||||::|::||| ::||||||:|| 
    12TACATATTAGAGTGACAGAGAAGTGACGAGCAGACATGTAAAATATA gFt                     gD 11TAGAATATGAGTAAT-GGATCAAAGAC 
                                     |::|||||:|:||:||||::|||||:|||||||   |  |||:| 
                               11TATATAGTATGTAGAGAAGACAAGGAATAAGCAAACAC---T--AAATACATA gEt’ 
                                                                :|:|||   |  :|||:|:|||||||||||| |:|| 
                                                 13TATAGTTATAGATGGAGCAC---T--GAACGAGAATACAAACAAT-AGATA gE 
 
Et 
S        F  G  G  L  L  C  V  S  L  Y  F  F  V  D     F  C  L  C  L  L   F  S  F  C  F  L  L   
       V  L  V  V  Y  C  V  Y  H  C  I  F  L  W     I  F  V  Y  V  C  Y   L  V  F  V  F  Y  W  
        F  W  W  F  I  V  C  I  I  V  F  F  C  G     F  L  F  M  F  V  I   ST F  L  F  F  I  G 
A******GuuuuGGuGGUUUAuuGuGuGuAuCAuuGuAuuuuuuuGuGG***AUUUUUGuuuAuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuG 
|      |:|:|::|::|:||::||||:||||||#|||||| 
T------CGAGATTATTAGATGGCACATATAGTA-CATAAATTATA gFGtxp 
                       :::::|||||||:||:|:|:||:|:|   ||||:|||:||||||||||| | 
                      13TGTGTATAGTAATATGAGAGAATATC---TAAAGACAGATACAAACAAT-ATATA gEt 
                                                           :||||::|::||| ::||||||:|||:|:|||||| 
                                                     gD 11TAGAATATGAGTAAT-GGATCAAAGACAGAGAATAAC 
 
Et’ 

      C  L  V  L  L  F  F  Y  C  L  F  V  H  F  F  C  F  L  F  V  C  D       L  F  L  C  L  L   F  S  F  C  
I      V  W  F  C  C  F  F  I  V  C  L  Y  I  F  F  V  F  Y  L  F  V     I    C  F  Y  V  C  Y   L  V  F  V 
     L  F  G  F  V  V  F  L  L  F  V  C  T  F  F  L  F  F  I  C  L  W       F  V  F  M  F  V  I   ST F  L   
A****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuACAuuuuuuuuGuuuuuuAuuuGuuuGuG***A**UUUGuuuuuAuGuuuGuuA*UUUAGuuuuuG 
          |::|:|:::|||:|:||:|:||:|:||:||||||||||                                 |:||||::|::||| ::||||||:|| 
    12TACATATTAGAGTGACAGAGAAGTGACGAGCAGACATGTAAAATATA gFt                     gD 11TAGAATATGAGTAAT-GGATCAAAGAC 
                                     |::|||||:|:||:||||::|||||:|||||||   |  |||:| 
                               11TATATAGTATGTAGAGAAGACAAGGAATAAGCAAACAC---T--AAATACATA gEt’ 
                                                                :|:|||   |  :|||:|:|||||||||||| |:|| 
                                                 13TATAGTTATAGATGGAGCAC---T--GAACGAGAATACAAACAAT-AGATA gE 
 

 


FG Variants 
FGtx 
 K  T  L  V  C  S        F  G  G  L  L  C  V  S  L  Y  F  F  V  D     F  C  L   
  K  H  ST F  V        V  L  V  V  Y  C  V  Y  H  C  I  F  L  W     I  F  V  Y  
K  N  I  S  L  ST       F  W  W  F  I  V  C  I  I  V  F  F  C  G     F  L  F  M 
AAAAACAuuAGuuuGuA******GuuuuGGuGGUUUAuuGuGuGuAuCAuuGuAuuuuuuuGuGG***AUUUUUGuuuA 
      ||||::|::||      |:|:|::|::|:||::||||:||||||#|||||| 
     14TAATTGAGTAT------CGAGATTATTAGATGGCACATATAGTA-CATAAATTATA gFGtxp 
                                       :::::|||||||:||:|:|:||:|:|   ||||:|||:|| 
                                   gEt 13TGTGTATAGTAATATGAGAGAATATC---TAAAGACAGAT 
 

Ft 
 I  N  Y  F  H  F  M  F  D      C  L  V  L  L  F  F  Y  C  L  F  V  H  F  F  C  F  L  F  V  C  D       L  F 
  L  I  I  F  I  L  C  L  I      V  W  F  C  C  F  F  I  V  C  L  Y  I  F  F  V  F  Y  L  F  V     I    C   
N  ST L  F  S  F  Y  V  W      L  F  G  F  V  V  F  L  L  F  V  C  T  F  F  L  F  F  I  C  L  W       F  V  
AAuuAAuuAuuuuCAuuuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuGuACAuuuuuuuuGuuuuuuAuuuGuuuGuG***A**UUUGu 
|||||||:|:|||||:|:||::|||||    |||||::|||||||||||                                         :|:|||   |  :|||: 
TTAATTAGTGAAAGTGAGATGTAAACT----AACAAGTCAAAACAACAATATA gGt                gE 13TATAGTTATAGATGGAGCAC---T--GAACG 
                                    |::|:|:::|||:|:||:|:||:|:||:|||||||||| 
                              12TACATATTAGAGTGACAGAGAAGTGACGAGCAGACATGTAAAATATA gFt 
                                                               |::|||||:|:||:||||::|||||:|||||||   |  |||:| 
                                                     gEt’ 11TATATAGTATGTAGAGAAGACAAGGAATAAGCAAACAC---T--AAATA 
 
FGt 
  K  Q  C  V       C  C  C  F  V  L     I  L  V  V  H  F  F  C  F  L  F  V  C  D       L  F  L   
K  N  N  V  Y       V  V  V  L  F  W     F  W  L  Y  I  F  F  V  F  Y  L  F  V     I    C  F  Y  
 K  T  M  C  M       L  L  F  C  F  D     F  G  C  T  F  F  L  F  F  I  C  L  W       F  V  F  M 
AAAAACAAuGuGuA*****UGuuGuuGuuuuGuuuuG***AuuuuGGuuGuACAuuuuuuuuGuuuuuuAuuuGuuuGuG***A**UUUGuuuuuA 
       ||:|:||     ::|:||:::|||:|:||:   |||:|||||||||||||| 
      04TATATAT-----GTAGTAGTGAAATAGAAT---TAAGACCAACATGTAAAATATATA gFGtp 
                                              :|::|||||:|:||:||||::|||||:|||||||   |  |||:| 
                                         11TATATAGTATGTAGAGAAGACAAGGAATAAGCAAACAC---T--AAATACATA gEt’ 
                                                                          :|:|||   |  :|||:|:||| 
                                                        gE 13TATAGTTATAGATGGAGCAC---T--GAACGAGAAT 
 
 
 

 


Gt 
 E  I  ST I  C  V  W  Y  I  I  N  Y  F  H  F  M  F  D      C  L  V  L  L  F  F  Y  C  L  F  V 
  K  Y  K  Y  V  Y  D  I  ST L  I  I  F  I  L  C  L  I      V  W  F  C  C  F  F  I  V  C  L   
R  N  I  N  M  C  M  I  Y  N  ST L  F  S  F  Y  V  W      L  F  G  F  V  V  F  L  L  F  V  C  
AGAAAUAUAAAUAUGUGUAUGAUAUAUAAuuAAuuAuuuuCAuuuuAuGuuuGA****UUGuuuGGuuuuGuuGuuuuUUUAuuGuuuGuuuG 
                           |||||||:|:|||||:|:||::|||||    |||||::||||||||||| 
                          13TTAATTAGTGAAAGTGAGATGTAAACT----AACAAGTCAAAACAACAATATA gGt 
                                                               |::|:|:::|||:|:||:|:||:|:||:|| 
                                                      gFt 12TACATATTAGAGTGACAGAGAAGTGACGAGCAGAC 
 
 

 

 


APPENDIX O. CR3 gRNA Alignments for EATRO 164 (A), and all editing variants (B). 

Amino acid translations are shown above the mRNA sequences. The cDNA sequence of the most abundant gRNA in its
sequence class is shown aligned beneath the fully edited mRNA. Lowercase u’s indicate uridines added by editing; asterisks
indicate encoded uridines deleted during editing. Nucleotides and deletion sites in the fully edited mRNA were numbered
starting from the 5′ end (+1=0). gRNAs are colored based on transcript abundance as follows: Blue<100; Green<1,000;
Purple<10,000; Orange<100,000; Red>100,000; Black=not quantified. Watson-Crick (|) and G:U base pairs (:) are indicated.
Mismatches are indicated by an octothorpe (#). Highlighted sequences mark regions where multiple CU configurations
are possible.
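
The notation described above can be read mechanically. The following is a minimal Python sketch, not the code used to generate these alignments. All function names are hypothetical. It assumes that the mRNA and the gRNA cDNA are already aligned column by column as printed, that a deleted site (*) is left unmarked in the pairing line, that a gRNA gap (-) opposite an mRNA nucleotide is marked as a mismatch, and that a read count of exactly 100,000 (which the legend does not cover) is binned as Red.

```python
# Column-wise base-pair classification; gRNA cDNA "T" is read as "U".
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pre_edited(mrna):
    """Undo editing: drop inserted u's and restore deleted U's (*)."""
    return "".join("U" if c == "*" else c for c in mrna if c != "u")


def fully_edited(mrna):
    """Edited sequence: drop deletion markers and keep inserted u's."""
    return mrna.replace("*", "").upper()


def pair_symbol(m, g):
    """Return '|', ':', '#' or ' ' for one alignment column."""
    if m in "* " or g == " ":
        return " "  # deleted site or unaligned column (assumption)
    m = m.upper()
    g = "U" if g.upper() == "T" else g.upper()
    if (m, g) in WATSON_CRICK:
        return "|"
    if (m, g) in WOBBLE:
        return ":"
    return "#"  # mismatch, including a gRNA gap opposite a nucleotide


def pairing_line(mrna, grna_cdna):
    """Build the pairing line printed between the mRNA and the gRNA."""
    return "".join(pair_symbol(m, g) for m, g in zip(mrna, grna_cdna))


def abundance_color(reads):
    """Map a read count to the legend color; None means not quantified."""
    if reads is None:
        return "Black"
    bins = ((100, "Blue"), (1_000, "Green"), (10_000, "Purple"), (100_000, "Orange"))
    for limit, color in bins:
        if reads < limit:
            return color
    return "Red"  # exactly 100,000 is treated as Red (assumption)


if __name__ == "__main__":
    # First 13 columns of the gB4 alignment in panel A
    print(pairing_line("uuAuuGuAuAuAA", "AATGACATATATT"))
    print(pre_edited("AuGuuuGuuA*UUUAG"), fully_edited("AuGuuuGuuA*UUUAG"))
```

Applied to the first 13 columns of the gB4 alignment in panel A, pairing_line reproduces the printed pairing line for that stretch.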

 

A.  EATRO 164 CR3 SDM79/SDM80 

gFGep, gEep, gDep, gCe, gB5e, gB4, gA2 
            M  C  M  I  Y  K  L  T  I  V  L        G  G     I  L  V  I  I  V  Y  L  V  V   M      S  
AGAAAUAUAAAUAUGUGUAUGAUAUAUAAAuuAACAAuuGuGuuA******GGuGGG***AuuuuGGuGAuCAuuGuuuAuuuGGuuG*UUA****UGAG 
                             ||||||||:::|:|||      |||:::   ||:|::||||||||:||| 
                            13TAATTGTTGGTATAAT------CCATTT---TAGAGTCACTAGTAGCAACATATA gFGep 
                                                                     |||||::||:|:||:::|| :||    ::|| 
                                                              gEep 12TATAGTAGTAAGTGAATTGAC-GAT----GTTC 
                                                                                    :||: |:|    ::|: 
                                                                               gDep 12TAAT-AGT----GTTT 
 
 
 C  I  L  C  F  V  M  V  I  V  F  Y  L  I  W  V  Y  C  C  V  Y  I  T  Y  V  F  L  L  L  L  S  F  L   
uuGuAUUUUAuGuuuuGuuAuGGuuAuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuAuAuuACuuAuGuAuuuuuAuuGuuGuuAuCAuuuuUA 
|::|||||||||||||||                                |||:|::::||||:||:||||||:|||:|:||:|||||||||||| 
AGTATAAAATACAAAATATATA gEep                       12TAATAGTGTAAATGTAGTGAATATATAGAGATGACAACAATAGTATATA gB5e 
:||:|:|||||:|:|::|:|||:||||||||||                                                  |:|:::||||||:|:|| 
GACGTGAAATATAGAGTAGTACTAATAACAAAATATA gDep                     gB4 11TAATATAGTGAGTTATATAGTGATAGTAGAGAT 
                      ::|||:::|||:|||:|||:||:|||||||::|||||||||||||                    :||||||::||:| 
                     13TGATAGTGAAAGATAGATTGTATCCAAATAGTAACACAAATATAAA gCe          gA2 13TAATAGTGGAAGT 
 
 
L  L  Y  I  S  F  R  Y  STOP 
uuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAAAAAAGUAUGCAAAUAAUUUUUGU 
|||:||||||||| 
AATGACATATATTATATA gB4 
|:||::||||:|||||:||||:|||||||| 
AGTAGTATATGTTCAAGAGCAGTAATCTAAAATATA gA2 

 

 


B.  EATRO 164 CR3 Variants 

A Variants 
 
A1 
 L  L  C  F  I  L  Q  I  F  S  V  I  I  I  I  V  Y  K  F  S  L  L  D  STOP 
  C  C  V  L  Y  Y  R  F  L  V  L  S  L  L  L  Y  I  S  F  R  Y  STOP 
I  V  V  F  Y  I  T  D  F  ST C  Y  H  Y  Y  C  I  STOP 
AuUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuAuCAuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAAAAAAGUAUGCAAAUAAUUUUUGU 
                             :|:|:|||||||:|||::||||:||||:|:|||||||||||||| 
                            11TATAGTAGTAATGATAGTATATGTTCAGAGGCAATAATCTAATTATA gA1 
||::|::|:|:|||||:||||||||:|:|||||||||||| 
TAGTAGTATAGAATATGATGTCTAAGAGTCACAATAGTAAATATATA gB1B2 
 
 
A2 
 L  L  C  L  Y  Y  L  C  I  F  I  V  V  I  I  F  I  I  V  Y  K  F  S  L  L  D  STOP 
  C  C  V  Y  I  T  Y  V  F  L  L  L  L  S  F  L  L  L  Y  I  S  F  R  Y  STOP 
I  V  V  F  I  L  L  M  Y  F  Y  C  C  Y  H  F  Y  Y  C  I  STOP 
AuUGuuGuGuuuAuAuuACuuAuGuAuuuuuAuuGuuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCGUUAUUAGAUUAAAAAAGUAUGCAAAUAAUUUUUGU 
                                     :||||||::||:||:||::||||:|||||:||||:|||||||| 
                                    13TAATAGTGGAAGTAGTAGTATATGTTCAAGAGCAGTAATCTAAAATATA gA2 
                                 |:|:::||||||:|:|||||:||||||||| 
                11TAATATAGTGAGTTATATAGTGATAGTAGAGATAATGACATATATTATATA gB4 
|||:|::::||||:||:||||||:|||:|:||:|||||||||||| 
TAATAGTGTAAATGTAGTGAATATATAGAGATGACAACAATAGTATATA gB5e 
 
 
 
 
 
 
 
 
 
 
 
 
 

 


B Variants 
B1B2 
G  Y  C  F  L  F  N  M  G  L  L  L  C  F  I  L  Q  I  F  S  V  I  I  I  I  V  Y  K  F  S   
 V  I  V  F  Y  L  I  W  V  Y  C  C  V  L  Y  Y  R  F  L  V  L  S  L  L  L  Y  I  S  F  R  
  L  L  F  F  I  ST Y  G  F  I  V  V  F  Y  I  T  D  F  ST C  Y  H  Y  Y  C  I  STOP 
GGuuAuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuuAuAuuACAGAuuuuuAGuGuuAuCAuUAuuAuuGuAuAuAAGuUUUCG 
                           ||||::|::|:|:|||||:||||||||:|:|||||||||||| 
                   gB1B2 14TAATAGTAGTATAGAATATGATGTCTAAGAGTCACAATAGTAAATATATA 
                                                          :|:|:|||||||:|||::||||:||||:|:|| 
                                                     gA1 11TATAGTAGTAATGATAGTATATGTTCAGAGGC 
 ::|||:::|||:|||:|||:||:|||||||::|||||||||||||         
13TGATAGTGAAAGATAGATTGTATCCAAATAGTAACACAAATATAAA gCe 
 
 
B5e 
G  Y  C  F  L  F  N  M  G  L  L  L  C  L  Y  Y  L  C  I  F  I  V  V  I  I  F  I  I  V  Y  K  F  S   
 V  I  V  F  Y  L  I  W  V  Y  C  C  V  Y  I  T  Y  V  F  L  L  L  L  S  F  L  L  L  Y  I  S  F  R  
  L  L  F  F  I  ST Y  G  F  I  V  V  F  I  L  L  M  Y  F  Y  C  C  Y  H  F  Y  Y  C  I  STOP 
GGuuAuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuAuAuuACuuAuGuAuuuuuAuuGuuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCG 
                             |||:|::::||||:||:||||||:|||:|:||:|||||||||||| 
                            12TAATAGTGTAAATGTAGTGAATATATAGAGATGACAACAATAGTATATA gB5e 
                                                              |:|:::||||||:|:|||||:||||||||| 
                                         gB4 11TAATATAGTGAGTTATATAGTGATAGTAGAGATAATGACATATATTATATA 
 ::|||:::|||:|||:|||:||:|||||||::|||||||||||||                    :||||||::||:||:||::||||:|||||:||| 
13TGATAGTGAAAGATAGATTGTATCCAAATAGTAACACAAATATAAA gCe          gA2 13TAATAGTGGAAGTAGTAGTATATGTTCAAGAGC 
 
 
B6e  
G  Y  C  F  L  F  N  M  G  L  L  L  C  L  Y  Y  L  M  Y  F  Y  C  C  Y  H  F  Y  Y  C  I  STOP 
 V  I  V  F  Y  L  I  W  V  Y  C  C  V  Y  I  I  L  C  I  F  I  V  V  I  I  F  I  I  V  Y  K  F  S   
  L  L  F  F  I  ST Y  G  F  I  V  V  F  I  L  S  Y  V  F  L  L  L  L  S  F  L  L  L  Y  I  S  F  R  
GGuuAuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuAuAuuAuCuuAuGuAuuuuuAuuGuuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCG 
                             ||::|::|:|:||||:|||||:|:|||||:||||||||||||| 
                            12TAGTAGTATAGATATGATAGAGTGCATAAGAATAACAACAATATATA gB6e 
                                                               |:|:::||||||:|:|||||:||||||||| 
                                          gB4 11TAATATAGTGAGTTATATAGTGATAGTAGAGATAATGACATATATTATATA 
 ::|||:::|||:|||:|||:||:|||||||::|||||||||||||                     :||||||::||:||:||::||||:|||||:||| 
13TGATAGTGAAAGATAGATTGTATCCAAATAGTAACACAAATATAAA gCe            gA2 13TAATAGTGGAAGTAGTAGTATATGTTCAAGAGC 

 


B7e  
  S  N  L  ST C  C  Y  L  F  G  Y  I  I  W  Y  V  V  I  I  F  I  I  V  Y  K  F  S   
G  V  I  Y  S  V  V  I  C  L  D  I  S  F  D  M  L  L  S  F  L  L  L  Y  I  S  F  R 
 E  ST F  I  V  L  L  F  V  W  I  Y  H  L  I  C  C  Y  H  F  Y  Y  C  I  STOP 
GGAGuAAuuuAuAGuGuuGuuAuuUGuuuGGAuAuAuCAuuuGAuAuGuuGuuAuCAuuuuUAuuAuuGuAuAuAAGuUUUCG 
   :|||||:|:|:|||::|:|:||::|::|||||||| 
  09TATTAAGTGTTACAGTAGTGAATGAGTCTATATAG gB7e 
      |||:|:|:|||::|:|:||::|::||||||||||||| 
     14TAAGTGTTACAGTAGTGAATGAGTCTATATAGTAAACGAATATAAA gB7e 
                                |||||||:|::||||:|:::||||||:|:|||||:||||||||| 
                         gB4 11TAATATAGTGAGTTATATAGTGATAGTAGAGATAATGACATATATTATATA 
                                                  :||||||::||:||:||::||||:|||||:||| 
                                             gA2 13TAATAGTGGAAGTAGTAGTATATGTTCAAGAGC 
 
 

 

 


C Variants 
C 
 C  L  L   F  S  F  C  F  L  L  D  F  C  F  L  F  N  M  G  L  L  L  C  L  Y  Y  L  C  I  F  I  V  V  I 
  V  C  Y   L  V  F  V  F  Y  W  I  F  V  F  Y  L  I  W  V  Y  C  C  V  Y  I  T  Y  V  F  L  L  L  L   
M  F  V  I   ST F  L  F  F  I  G  F  L  F  F  I  ST Y  G  F  I  V  V  F  I  L  L  M  Y  F  Y  C  C  Y  
AuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuGGAuuuuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuAuAuuACuuAuGuAuuuuuAuuGuuGuuA 
                            |||::||:||::|:|:|:|:||||||:|:||||||||||||:| 
                      gC 14TAATTTAGAAGTAGAGAGTGAATTATGCTCAAATAACAACATATATA 
||::|::||| |:|||:||:|||:|:|||||||||||:|:|                    |||:|::::||||:||:||||||:|||:|:||:|||||||| 
TATGAGTAAT-AGATCGAAGACAGAGAATAACCTAAAGATA gD            gB5e 12TAATAGTGTAAATGTAGTGAATATATAGAGATGACAACAAT 
 
 
Ce 
  L  Y  F  M  F  C  Y  G  Y  C  F  L  F  N  M  G  L  L  L  C  L  Y  Y  L  C  I  F  I  V  V  I 
S  C  I  L  C  F  V  M  V  I  V  F  Y  L  I  W  V  Y  C  C  V  Y  I  T  Y  V  F  L  L  L  L   
 V  V  F  Y  V  L  L  W  L  L  F  F  I  ST Y  G  F  I  V  V  F  I  L  L  M  Y  F  Y  C  C  Y  
AGuuGuAUUUUAuGuuuuGuuAuGGuuAuuGuuuuuuAuuuAAuAuGGGuuuAuUGuuGuGuuuAuAuuACuuAuGuAuuuuuAuuGuuGuuA 
                        ::|||:::|||:|||:|||:||:|||||||::||||||||||||| 
                       13TGATAGTGAAAGATAGATTGTATCCAAATAGTAACACAAATATAAA gCe 
                                                    |||:|::::||||:||:||||||:|||:|:||:|||||||| 
                                              gB5e 12TAATAGTGTAAATGTAGTGAATATATAGAGATGACAACAAT 
|::||:|:|||||:|:|::|:|||:|||||||||| 
TTGACGTGAAATATAGAGTAGTACTAATAACAAAATATA gDep 
 
 
 
Ce80 
S  C  I  L  C  F  V  M  F  C  M  I  I  L  ST C  D  L  L  C  L  Y  Y  L  C  I  F  I  V  V  I 
 V  V  F  Y  V  L  L  C  F  V  W  L  F  Y  S  V  I  C  C  V  Y  I  T  Y  V  F  L  L  L  L   
  L  Y  F  M  F  C  Y  V  L  Y  D  Y  F  I  V  W  F  V  V  F  I  L  L  M  Y  F  Y  C  C  Y  
AGuuGuAUUUUAuGuuuuGuuAuGuuuuGuAuGAuuAuuuuAuAGuGuGAuuUGuuGuGuuuAuAuuACuuAuGuAuuuuuAuuGuuGuuA 
                  :||||:|:|::|||:|:|||:|||:||||||||:|||||||||||||| 
                 14TAATATAGAGTATATTGATAGAATGTCACACTAGATAACACAAATATATATA gCe80 
                                                   ||:|::::||||:||:||||||:|||:|:||:|||||||| 
                                             gB5e 12TAATAGTGTAAATGTAGTGAATATATAGAGATGACAACAAT 
|::||:|:|||||:|:|::|:||| 
TTGACGTGAAATATAGAGTAGTACTAATAACAAAATATA gDep  
 

 

 


D Variants 
D 
 Y  Q  Y  L  F  C  D       L  F  L  C  L  L   F  S  F  C  F  L  L  D  F  C  F  L  F  N  M  G  L   
  I  S  I  C  F  V     I    C  F  Y  V  C  Y   L  V  F  V  F  Y  W  I  F  V  F  Y  L  I  W  V  Y  
V  S  V  F  V  L  W       F  V  F  M  F  V  I   ST F  L  F  F  I  G  F  L  F  F  I  ST Y  G  F  I 
GuAuCAGuAuuuGuuuuGuG***A**UUUGuuuuuAuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuGGAuuuuuGuuuuuuAuuuAAuAuGGGuuuA 
                               |:||||::|::||| |:|||:||:|||:|:|||||||||||:|:| 
                        gD 13TATAGAATATGAGTAAT-AGATCGAAGACAGAGAATAACCTAAAGATA 
:|||||:|||:|:::|:|||   |  :|||:|:|||||||||||| |:|| 
TATAGTTATAGATGGAGCAC---T--GAACGAGAATACAAACAAT-AGATA gE 
                                                               |||::||:||::|:|:|:|:||||||:|:||||| 
                                                           gC 14TAATTTAGAAGTAGAGAGTGAATTATGCTCAAAT 
 
 
De 
D  H  C  L  F  G  C   Y      E       L  Y  F  M  F  C  Y  G  Y  C  F  L  F  N  M  G  L   
 I  I  V  Y  L  V  V   M      S       C  I  L  C  F  V  M  V  I  V  F  Y  L  I  W  V  Y  
  S  L  F  I  W  L   L      W       V  V  F  Y  V  L  L  W  L  L  F  F  I  ST Y  G  F  I 
D  C  R  L  F  S  C   Y      E       L  Y  F  M  F  C  Y  D  Y  C  F  C  F  I  G  D  A  ND7 protein seq 
GAuuGUCGuuuAuuuAGuuG-uuA****UGA*****GuUGuAuuuuAuGuuuuGuuAuGAuuAuuGuuuuuGuuuuAuAGGuGAuGCAuuu ND7 777-866 
GAuCAuuGuuuAuuuGGuuG*UUA****UGA-----GuuGuAUUUUAuGuuuuGuuAuGGuuAuuGuuuuuuAuuuAAuAuGGGuuuA 
                :||: |:|    ::|-----::||:|:|||||:|:|::|:|||:|||||||||| 
           gDep 12TAAT-AGT----GTT-----TGACGTGAAATATAGAGTAGTACTAATAACAAAATATA 
 |||||::||:|:||:::|| :||    ::|-----||::||||||||||||||| 
ATAGTAGTAAGTGAATTGAC-GAT----GTT-----CAGTATAAAATACAAAATATATA gEep 
                                                           ::|||:::|||:|||:|||:||:|||||| 
                                                       gCe 13TGATAGTGAAAGATAGATTGTATCCAAAT 
 

 

 


E Variants 
Ee 
R        W  D     F  G  D  H  C  L  F  G  C   Y      E  L  Y  F  M  F  C  Y  G 
       G  G     I  L  V  I  I  V  Y  L  V  V   M      S  C  I  L  C  F  V  M   
        V  G     F  W  W  S  L  F  I  W  L   L      W  V  V  F  Y  V  L  L  W  
A******GGuGGG***AuuuuGGuGAuCAuuGuuuAuuuGGuuG*UUA****UGAGuuGuAUUUUAuGuuuuGuuAuG 
|      |||:::   ||:|::||||||||:||| 
T------CCATTT---TAGAGTCACTAGTAGCAACATATA gFGep  
                         |||||::||:|:||:::|| :||    ::|||::||||||||||||||| 
                  gEep 12TATAGTAGTAAGTGAATTGAC-GAT----GTTCAGTATAAAATACAAAATATATA 
                                        :||: |:|    ::|::||:|:|||||:|:|::|:||| 
                                   gDep 12TAAT-AGT----GTTTGACGTGAAATATAGAGTAGTAC 
 
 
E 
 F  F  G  G  L  G  Y  Q  Y  L  F  C  D       L  F  L  C  L  L   F  S  F  C  F  L  L   
  F  L  G  V  ST G  I  S  I  C  F  V     I    C  F  Y  V  C  Y   L  V  F  V  F  Y  W  
I  F  W  G  F  R  V  S  V  F  V  L  W       F  V  F  M  F  V  I   ST F  L  F  F  I  G 
AUUUUUUGGGGGUUUAGGGuAuCAGuAuuuGuuuuGuG***A**UUUGuuuuuAuGuuuGuuA*UUUAGuuuuuGuuuuuuAuuG 
                                                 |:||||::|::||| |:|||:||:|||:|:|||||| 
                                          gD 13TATAGAATATGAGTAAT-AGATCGAAGACAGAGAATAAC 
                  :|||||:|||:|:::|:|||   |  :|||:|:|||||||||||| |:|| 
                 12TATAGTTATAGATGGAGCAC---T--GAACGAGAATACAAACAAT-AGATA gE 
 
 
 

 

 


FG Variants 
Fe 
K  Y  Y  H  I  C  V  R        W  D     F  G  D  H  C  L  F  G  C   Y      E  
 N  I  I  I  F  V  L        G  G     I  L  V  I  I  V  Y  L  V  V   M      S 
  I  L  S  Y  L  C  ST       V  G     F  W  W  S  L  F  I  W  L   L      W 
AAAuAuuAuCAuAuuuGuGuuA******GGuGGG***AuuuuGGuGAuCAuuGuuuAuuuGGuuG*UUA****UGA 
  |||||||||:|||::|:|:|      |:|:|:   ||||||||:#||||:|| 
 11TATAATAGTGTAAGTATAGT------CTATCT---TAAAACCATCAGTAGCATATATA gFGep 
                                              |||||::||:|:||:::|| :||    ::| 
                                       gEep 12TATAGTAGTAAGTGAATTGAC-GAT----GTT 
 
 
Fe* 
  K  H  I  C  V  R        W  D     F  G  D  H  C  L  F  G  C   Y      E  
K  N  I  F  V  L        G  G     I  L  V  I  I  V  Y  L  V  V   M      S 
 K  T  Y  L  C  ST       V  G     F  W  W  S  L  F  I  W  L   L      W   
AAAAACAuAuuuGuGuuA******GGuGGG***AuuuuGGuGAuCAuuGuuuAuuuGGuuG*UUA****UGA 
      |:||:::|:|||      |||:::   ||:|::||||||||:||| 
 13TAATTGTAGGTATAAT------CCATTT---TAGAGTCACTAGTAGCAACATATA gFGe*p  
                                          |||||::||:|:||:::|| :||    ::| 
                                   gEep 12TATAGTAGTAAGTGAATTGAC-GAT----GTT 
 
 
Fex 
  I  I  I  K  F  V  I     W  C  F  V  F  D     L  F  C  V  F  H  C  L  F  G  C   Y      E  
I  L  ST S  S  L  L     F  G  V  L  F  L     I  C  F  V  Y  F  I  V  Y  L  V  V   M      S 
 Y  Y  N  Q  V  C  Y     L  V  F  C  F  W     F  V  L  C  I  S  L  F  I  W  L   L      W   
AuAuuAuAAuCAAGuuuGuuA***UUUGGuGuuuuGuuuuuG***AuuuGuuuuGuGuAuuuCAuuGuuuAuuuGGuuG*UUA****UGA 
|||:||:|||||||||::|:#   |||||| 
TATGATGTTAGTTCAAGTAGC---AAACCAAAAA gGep 
            |:|:|||:|   ||||||:||||||||||| 
TATAATAGTGACTTAGACAGT---AAACCATAAAACAAAAACATATA gFexp 
                            :|:|||:::|||:|   |||::::||:|:|||:|||||||||||| 
                      gFexp 12TATAAAGTGAAAGC---TAAGTGGAATATATAGAGTAACAAATAATACATA  
                                                             ||||::||:|:||:::|| :||    ::| 
                                                     gEep 12TATAGTAGTAAGTGAATTGAC-GAT----GTT 
 
 

 


Ge 
  K  Y  K  Y  V  Y  D  I  Y  I  I  I  K  F  V  I     W  C  F  V  F  D     L  F  C  V 
R  N  I  N  M  C  M  I  Y  I  L  ST S  S  L  L     F  G  V  L  F  L     I  C  F  V   
 E  I  ST I  C  V  W  Y  I  Y  Y  N  Q  V  C  Y     L  V  F  C  F  W     F  V  L  C  
AGAAAUAUAAAUAUGUGUAUGAUAUAUAuAuuAuAAuCAAGuuuGuuA***UUUGGuGuuuuGuuuuuG***AuuuGuuuuGuG 
                          ||||:||:|||||||||::|:#   |||||| 
                  12TAATAGAATATGATGTTAGTTCAAGTAGC---AAACCAAAAA gGep 
                                       |:|:|||:|   ||||||:||||||||||| 
                  04TAATAGAGTATAATAGTGACTTAGACAGT---AAACCATAAAACAAAAACATATA gFexp 
                                                       :|:|||:::|||:|   |||::::||:|: 
                                                 gFexp 12TATAAAGTGAAAGC---TAAGTGGAATAT 
 
 
 
SDM80 only 
R  N  I  N  M  C  M  I  Y  K  N  N  G  S        C  G  F  V    G  W  F  R  L  G   Y     C  Y  C  E    
AGAAAUAUAAAUAUGUGUAUGAUAUAUAAAAACAAuGGuA******GuuGuGGuuuuG**UAGGuuGAuuCAGAuuGGG*UUA***UUGuuAuuGuGA** 
                                  ||:|||      :|::|:||:|::  :||:||:|||||#||||:| |||   |||| 
                           gFGe80 14TATCAT------TAGTATCAGAGT--GTCTAATTAAGTGTAACTC-AAT---AACATATA 
                                                                         |||::: |:|   |::|:|:|||:|   
                                                                        11TAATTT-AGT---AGTAGTGACATT-- 
 
   C  C  S  F  C  M  I  I  L  ST C  D  L  L  C  L  Y  Y  L  C  I  F  I  V  V  I  I  F  I  I  V  Y  K 
**AuGuuGuAGuuuuuGuAuGAuuAuuuuAuAGuGuGAuuUGuuGuGuuuAuAuuACuuAuGuAuuuuuAuuGuuGuuAuCAuuuuUAuuAuuGuAuAuA 
  ||:|||:||||||||||||#|||| 
--TATAACGTCAAAAACATACGAATATATA gDEe80 
 
  F  S  L  L  D  ST 
AGuUUUCGUUAUUAGAUUAAAAAAGUAUGCAAAUAAUUUUUGU 
 
 
 
 
 

 

 


APPENDIX P.  gRNAs identified to edit the CR3 mRNAs found in both TREU 667 and EATRO 164 gRNA transcriptomes. 

Editing Region 

Population 

Sequence 

G 

 

FG 

 

F 

gGt 
gGt 
gGep 
gGep 
 
gFGtp 
gFGtxp 
gFGtxp 
gFGtxp 
gFGtxp 
gFGtxp 
gFGtxp 
gFGtxp 
gFGtxp 
gFGtxp 
gFGe*p 
gFGe*p 
gFGe*p 
gFGe*p 
gFGe*p 
gFGe*p 
gFGe*p 
gFGe*p 
gFGe*p 
gFGe*p 
gFGep 
gFGep 
gFGep 
gFGe80 
gFGe80 

 
gFt 
gFt 
gFt 
gFt 
gFt 
gFt 
gFt 
gFt 
gFt 
gFt 

 

ATATGT ACAACAAAACCGAGCAATCAGATATAGAGTGAAAGTGATTAATT TTTTTTTTTTTC 

ATAT AACAACAAAACTGAACAATCAAATGTAGAGTGAAAGTGATTAATT TTTTTTTTTTTT 

AAAAACCAAAC GATGAACTTGATTGTAGTATA AGATAATTTTTTTTTTTT 

 AATACCGAGC GACAGATTTGATTGTAGTATA AGATAATATTTTTTTTTTTT 
 
ATATAT AAAATGTACAACCAGAATTAAGATAAAGTGATGATGTATATATT TT 

ATATTAAATAC ATGATATACGCGATAGATTATTAGAGTTATGAGTTAAT TTTTTTTTTTTGTTT 

       ATAC ATGATATATACAGTGAACTATTAGAATTATAGGT AATGAGATTTATTTTTTTTTTTTTT 

ATATTAAATAC ATGATATGCGCAGTAGACTATTAAAGTTATGAGTTAAT TTTTTTTTTTT 

ATATTAAATAC ATGATATACACGGTAGATTATTAGAGCTATGAGTTAAT TTTTTTTTTTTTT 

ATATTAAATAC ATGATATACACGGTAGATTATTAGAGCTATGAGTTA TTTTTTTTTTTTCT 

ATATTAAATAC ATGATATACACGGTAGATTATTAGAGCTATGAGTT TTTTTTTTTTT 

 TATTAAATAC ATGATATACACGGTAGATTATTAGAGCTATGAGTTAA CTTTTTTTTTTTTT 

   ATATTAAATAC ATATACACGGTAGATTATTAGAGCTATGAGTTAAT TTTTT 

ATATTAAATAC ATGATATACACGGTAGATTATTAGAGCTATGA TTTAAAATTTTTTTTTTTTTT 

    ATATAC AACGATGATCACTGAGATTTTACCTAATATGG ATGTTAATTTTTTTTTTTTTGGGGAACTGAA 

    ATACA AAACGATGATTACCGAGATTTCA GTTAATATGATTGTCTAATTTTTTTTTTTTT 

    ATATACAACT ATGATCACTGAGATTTTACCTAATATGG ATGTTAATTTTTTTTTTT 

ATATATACA AAACGATGATTACCGAGATTTCATTTAATATGATTGT CTAATTTT 

    ATATAC AACGATGATCACTGAGATTTTACCTAATATG TATGTTAATTTTTTTCTTTTT 

    ATATAC AACGATGATCACTGAGATTTTACCTAATATGGTTGTTAATTT TAGATTTTTT 

    ATATAC AACAATGATCACTGAGATTTTACCTAATATGGTTGTTAATTT TTTTTTTTTTT 

ATATAGAAACGATGAC TGCTAGAATTCTGCTTGATATGGATGT TAATTTTTTTTTTTTTT 

 ATATAC AACGATGATCACTGAGATTTTACCTAATACGA TGTTTAATTTTTTTTTTTTTTAT 

 ATATAC AACGATGATCACTGAGATTTTACCTAATATGGAT TTTTTTTTTTTTTTAAAGTGCGGCCATAGTGGGTG 

ATATATACGATGAC TACCAAAATTCTATCTGATATGAATGTGATAATATTT TTTTTTTT 

 ATATATACGATGAC TGCCAAAATTCTATCTGATATGAATGTGATAATATTT TTTTTTTT 

ATATAC AACGATGATCACTGAGATTTTACCTAAT TTTTTTTTTTT 

ATATACAATAACTCAATG TGAATTAATCTGTGAGACTATGATTACT TTTTTTTGTTTTT 

ATATACAATAACTCAATG TGAATTAATCTGTGAGACTATGATTACTATT TTTTTTTATTTT 

 
 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATGAT ATTTTTTTTTTTTTC 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATGATT TTTTTTTTTTTT 

ATATAATT AAATGTACAGACAAATGATAGAGAGACGATGAGATTAAGT TATATTTTTTTTTTTT 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGAT TTTTTTTTGTTTTT 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATG TTTTTTTTTTT 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTAAAATTAGATGAT ATTTTTTTTTTTT 

 ATATATAAAATA TACAAACGGACAATGAGAGAACAGTGAAATTAGATGAT AATATATTTTT 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATGA AATATTTTTTTTTTTTTT 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATA TAATATTTTTTTTTTTA 

  ATATAT AAATGTACAAACGGACAATGAGAGAACAGTGAAATTAGATGAT ATTTTTATATTTTTTTTTT 


Reads TREU 667 

493 
8,171 

21 

44,119 
2,106 
936 
404 
133 
115 
53,389 

133 

42 

12,250 
1,418 
534 

173 
140 
146 
112 

16,892 

  

  
 

  
  
  

  

  
  

  

  

 
  
  

  
  
  
  
  
  
  

Reads EATRO 164 
21,297 

2,736 
94 

2,772 
354 
159 

53,459 
717 
166 
143 
97 
23 
1 

1,854 

  

 
  

  
  
  
  
  
  

  
  
  

  
  

349 
239 

  
138,710 
33,909 
3,738 
941 
932 
679 
603 
459 
263 
234 

gFt 
gFt 
gFt 
gFt 
gFt 
gFexp 
gFexp 
gFexp 
gFexp 
 
gE 
gE 
gE 
gE 
gEt' 
gEt' 
gEt' 
gEt' 
gEt 
gEep 
gEep 
gEep 
gEep 
gEep 
gEep 
gEep 
 
gDEe80 
gDEe80 
 
gD 
gD 
gD 
gD 
gD 
gD 
gD 
gD 
gD 
gD 
gD 
gD 
gD 
gDep 
gDep 
gDep 
gDep 
gDep 

 

E 

 

DE 

 

D 

 

 ATATAT AAAATGTACAAACGGACAATGAGAGAACAGCGAAATTAGATGAT ATCTTTTTTT 

   ATAT AAAATGTACAGACGAGCAGTGAAGAGACAGTGAGATTA TACATTTTTTTTTTTT 

      ATATAA GTACAAACAGACAATGAAAAGATGGTGAGACTGAGTA TACATTTTTTTTT 

    AAATT AATGTACAAATAAACGATAGAGAGACAGTGAGATTA TGATTGTAATTCTTTTTTTT 

   ATAT AAAATGTACAGACAAGCAATGAAGAGACAGTGAGATTAGATAGTT TTTTTTTTTC 

ATACAT AATAAACAATGAGATATATAAGGTGAATCGAAAGTGAAATATT TTTTTTTTTT 

                             ATATA CAAAAACAAAATACCAAATGACAGATT CAGTGATAATATGAGATAATTTT 

                              ATATAC AAAACAAAATACCAAATGACAGATT CAGTGATAATATGAGATAATTTTTTTTTTTT 

ATACAT AATAAACAATGAGATATATAAGGTGAATCGAAAGTGAAATAT ATTTTTTTTT 
 
A TAGATAACAAACATAAGAGCAAGTCACGAGGTAGATATTGATATTTT TTTTTTTTTC 

A TAGATAACAAACATAAGAGCAAGTCACGAGGTAGATATTGATATTTTAA TTTTTTTATTTTT 

A TAGATAACAAACATAAGAGCAAGTCACGAGGTAGATATTGATATTT GTTTTTTTTTTTTT 

 TTAGATAACAAACATAAGAGCAAGTCACGAGGTAGATATTGATATTTTAA TTTTTT 

ATATAT AATCACAAACAAATAGAAAATGAGAGAGGTGTATGA TACTATTTTTTTTTCTTTTT 

ATAC ATAAATCACAAACGAATAAGGAACAGAAGAGATGTATGA TATATTTTTTTTTTT 

 AT AATAAATCACAAACAGATAAAGAGCAAGAAAGGTGTATGA TTATATTTTTATTTTTT 

 ATAAAC AATCACAAACAGATAGAAGACAGAAGAGATGTATAGATA TTAAAATTTTTTTTT 

ATAT ATAACAAACATAGACAGAAATCTATAAGAGAGTATAATGATATGTGT TTTTTTTTTTTT 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAAGTGAATGATGA TATTTTTTTTTTTT 

AAATA ATAACAAAACATGAGATATAACTTGTAGTGATTAGATGAATGAT TTTTTTTTTTT 

          ATATAT TAAAATACAACTTATGATGACTAAGTGAATGATGATTGTCAA TTTTTTTTCTTTTT 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAAGTAAATGATGA TATTTTTTT 

          ATATAT TAAAATACAACTTATGATGGCTAAGTGAATGATGATTGTCAA TTTTTTTTTTTTC 

    ATAT ATAAAACATAAAATATGACTTGTAGCAGTTAAGTGAATGATGATT TTTTTTTATTTTT 

   ATATT ATAAAACATAAGATATAACTCATAGTGATTGAATAAGTGAT AATTTTTTTTTGTTTT 
 
ATATATAAG CATACAAAAACTGCAATATTTACAGTGATGATGATTTAATTT TTTTTTTT 

ATATATAAG CATACAAAAACTGCAGTATTTACAGTGATGATGATTTAATTT T 
 
         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTATAAG ATATTTTTTTTTTTTT 

      ATATAT AAATCCAATAAGAAATGAAAGCTAGATAGTGAGTATAAGT TTTTTTTTTTTGTTT 

         GTAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTATAAG ATATATTTTTTTTGTTTTTTTTTTTT 

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTATAA TTTTTTTTTTTT 

    ATAT ACAAAAATCCAATGAAAAATAAAGACTGAGTGATGGATG CAATTTTCTTTTTTTTTT 

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTATA TTTTTTCTTTTTTTTTT 

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATGAGTAT TAGATATATTTTTTTTTTTTT 

         ATAGAAATCCAATAAAAGACAGAAGCTAGATAATGAGTATAAG ATATATTTT 

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATCAGTATAAG ATATATTTTTTTAA 

         ATAGAAATCCAATAAGAGACAGAAGCTAGATAATTAGTATAAG ATATATATTTTTTTTTT 

ATATAT AAACAAAAATTCAATAGAAAACGAGAACTGAGTAATTGATATAA TTTT 

   AT ATAGAAATCCAATAAGAGACAGAAACTAGGTAATGAGTATAAG ATTTTTTTTTTT 

ATAT AACAAAAATCCAATGAGAAATAGAGACTGAGTAATTGATATA TATTTTTATTTTTT 

ATAT AAAACAATAATCATGATGAGATATAAAGTGCAGTTTGTGATAATT TTTTTGTTTT 

      ATAT ATAATCATAACAAGATGTAGAGTACGATTTATAGTGATTAA TTTTTTTTTTTT 

ATAT AAAACAATAATCATGATGAGATATAAAGTGCAGTTTGTGATAATTAA TTTTT 

ATAT AAAACAATAATCATAATAAGATGTAAGGTACGATTTATGATAATT TTTTTTGTTTT 

      ATAT ATAATCATAACAGAGCATAGAATACAGTTTATAGTGATTAA TTTTTT 


  

  
  
  
  
 

  
  

  
  

 

  
 
  

  
  
  
  
  
  
  
  
  

  

19,078 
592 
243 
210 

10,598 
867 
100 

388 
86 
80 
29 
2,493 
12,105 

2,491 
428 
157 
525 

535 

718 
217 
12 
2 

15 
8 

208 

360 
235 
197 
179 
15,900 
1,342 
199 
102 
584 

27,823 
5,923 
5,559 
131 

149 
10 
27,320 
331 
315 
206 
142 
137 
90 
74 
23 
19 
17 

68 
53 
31 

  
  
  
  

 

  
  
  
  

  
  
  
 

 

  
  

  
  

gDep 
gDep 
 
gC 
gC 
gC 
gC 
gC 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe80 
gCe 
gCe 
 
gB1B2 
gB1B2 
gB1B2 
gB1B2 
gB1B2 
gB1B2 
gB1B2 
gB1B2 
gB3t 
gB3t 
gB3t 
gB4 
gB4 
gB4 
gB4 
gB4 
gB4 
gB4t' 
gB5e 
gB6e 
gB6e 

C 

B 

 

 

 

ATAT AAAACAATAATCATAATAAGATGTAAGGTACGATTTATGAT TTTTTTTTTATGCAACGTTGGATACTGGAG 

      ATAT ATAATCATAACAAGATATAGAATACGATTCATAGTGATTAA TTTTTTTTTTTTTT 
 
   ATAT ATACAACAATAAACTCGTATTAAGTGAGAGATGAAGATTTAAT TTTTTTTTATTTT 

ATAT TAAACACAACGATAGATCTATATTAAGTAGAAGATAGAAATTTA TTTATTTTTTTTTTTT 

A AATATAAACACAATGATAAGCCTGTATTAGATAGAAAGTGAGAATTT TTTTTTTTTG 

    TA AAACACAACAATAGATTCGTATTAAATAGAGAATAGAGATTTA TTTTTTTTTTT 

 ATAT TAAACACAACAGTAGATCTATATTAAGTAGAAGATAGAAATTT TTTTTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATATA ATTTTTTTTTTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATAT TTTTTTTTTTTTN 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATATATTTT TTTTTTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGAT TTTTTTTTTTGTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATATATTTTGT TTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATATATTT GTTTTTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATA ATTTTTTTTTATTTTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATATAT AATTTTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAAATAGTTATATGAGATATA ATTTTTCTTTTTTTT 

ATAT ATATAAACACAATAGATCGCACTGTAAGATAGTTATATGAGATAT TTTTTTTTTTTTG 

ATAT ATATAAACACAATAGATCGCACTGTAAGATAGTTATATGAGATATA ATTTTTTTTTTTTTC 

ATAT ATATAAACACAATAGATCGCACTGTAAGATAGTTATATGAGATATATTTT TTTTTTTTT 

ATAT ATATAAACACAATAGATCGCACTGTAAGATAGTTATATGAGAT TTTTTTTTTTT 

ATAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGGGATATA ATTTTTTTCTTTTTT 

NTAT ATATAAACACAATAGATCACACTGTAAGATAGTTATATGAGATATACTTT TTTTT 

A AATATAAACACAATGATAAACCTATGTTAGATAGAAAGTGATAGTT TTTTTTTTTTT 

AAAA AATATAAACACAATGATAAGCCTGTATTAGATAGAAAGTGATAATT TTTT 
 
ATATATA AATGATAACACTGAGAATCTGTAGTATAAGATATGATGATAA TTTTTTTTTTTTTT 

ATATATA AATGATAACACTGAGAATCTGTAGTATAAGATATGATGATA TTTTTTTATTTT 

ATATATAAATA ATAACACTGAGAATCTGTAGTATAAGATATGATGATAA TTTGTTTTTTTTTTTTTT 

ATATATA AATGATAACACTGAGAATCTGTAGTATAAGATATGATAAT TTTTTTTTTTT 

 ATAT ATAATGATAACACTGAGAATCTGTAATGTGAGATATGATGATAAGTTT TTTTTTTTTT 

 ATAT ATAATGATAACACTGAGAATCTGTAATGTGAGATATGATGATAA TTTTTTTTTTTTT 

ATATATA AATGATAACACTGAAGATCTGTAGTATAAGATATGATGATAA TTTTTTTTCTTTTTT 

 ATAT ATAATGATAACACTGAGAATCTGTAATGTGAGATATGATGATA TTTTTTTTTATTTT 

ATAT ATAAAAATGATACCATAGAATTTGGAGTAGTATAGATATGATGATA TTTTTTTTTTTT 

ATAT ATAAAAATGATACCATAGAGTTTGAGGTGATGTAGATATGATGGTA TTTTT 

ATAT ATAAAAATGATACCATAGAATTTGGAGTAGTATAGATATGATGAT TTTTTTTTTTT 

ATATA TTATATACAGTAATAGAGATGATAGTGATATATTGAGTGATATA ATTTTTTTTTTT 

        ATATACAGTAATAGAGATGATAGTGATATATTGAGTGATATAATTAATATGT GATTTTTAATTTTTTTTTTCTTT 

          ATACAGTAATAGAGATGATAGTGATATATTGAGTGATATAATTAATATGTTTT T 

   ATA TATATACAATAATAAGAGTGATAGCAGTG TATTAGTGATGATTAATTTTTTTTTTTTTTT 

   ATA TATATACAATAATGAGAATGATGACAGTG TATTAGTGATGTAATATTTTTTTTTATTTT 

ATATA TTATATACAGTAATAGAGATGATAGTGATATATTGAGTGATATAATTAATATGT GATTTTTAATATAATA 

ATAT ATAACATATCAAATGATAAAGTGAGAAAAGAGAGTGAATGTAAT TTTTTTCTTTTCTATCTATTATTACAT 

ATAT ATGATAACAACAGTAGAGATATATAAGTGATGTAAATGTGATAAT TTTTTTTTTTT 

ATAT ATAACAACAATAAGAATACGTGAGATAGTATAGATATGATGAT TTTTTTTTTTTG 

ATATATAAT ACAATAAGAATACGTGAGATAGTATAGATATGATGAT TTTTTTTTTTA 


1 
1 

  
  
 

1,509 
88 

  
  
  
187,465 
170,393 
57,583 
2,157 
932 
871 
829 
250 
242 

  
  
  
  
  
  

  
 

  
  
  
  
  
  
  

  
  
  

32 

37,769 
330 
180 
98 

154 
8 
6 
5 

859 
2,300 

 
  
  

10,402 
209 
170 
214,918 
180,589 
67,000 
2,828 
553 
621 
1,014 
330 
358 
33,434 
14,634 
3,887 
828 
387 
374 

  

 
  
  
  
  

  
  
  

  

3 

10,346 
2,337 
139 
124 
1,183 
474 
383 
260 

367 
18 
13 
3,708 

222 

  

 

A 

gB7e 
gB7e 
gB7e 
gB7e 
gB7e 
gB7e 
gB7e 
 
gA1 
gA1 
gA2 
gA2 
gA2 
gA2 
gA2 
gA2 
gA2 
gA2 
gA2 
gA2 

               GATATATCTGAGTAAGTGATGACATTGTGAATTATTTTTTTT T 

AAATATAAG CAAATGATATATCTGAGTAAGTGATGACATTGTGAATT TTTTTTTTTTTT 

             ATGATATATCTGAGTAAGTGATGACATTGTGAATTAT GGTATATTTTTG 

              TGATATATCTGAGTAAGTGATGACATTGTGAATTACTTTTTT T 

            AATGATATATCTGAGTAAGTGATGACATTGTGAATTATTTTTTTT TTTTTT 

 AATATAAG CAAATGATATATCTGAGTAAGTGATGACATTGTGAATTATTTTTTTT TTTTTTTAAAAAAAAA 

AAATATAAG CAAATGATATATCTGAGTAAGTGATGACATTGTGAATTAT GGTATATAAAGTTAAATAATTTTATC 
 
ATA TTAATCTAATAACGGAGACTTGTATATGATAGTAATGATGATATT TTTTTTTTT 

ATATA TTAATCTAATAGCGGAGACTTGTATATGATAGTAATGATGATATT TT 

ATATAA AATCTAATGACGAGAACTTGTATATGATGATGAAGGTGATA ATTTTTTTTTTTTTG 

   ATA AATCTAATAACGAGAATTTATGTACGATAATGAAAGTGATAT ATTTTTTTTCTTTTT 

ATATAA AATCTAATGACGAAAGCTTGTATATGGTAATGAAGATGGTATT TTTTTTTTTT 

ATATAA AATCTAATGACGAAAGCTTGTATATGGTAATGAAGATGGTA ATTTTTTTTTCTTTTT 

ATATAA AATCTAATGACGAGAACTTGTATATGATGATGAAGGTGATATT TCTTTTTTTTT 

 ATATA AATCTAATAACGAGAATTTATGTACGATAATGAAAGTGATATT TTTTTCTTTTTTTTT 

 ATACAT ATCTAATAACGGAAGCTTATGTGTAGTAGTGAAGATGGTA TATTTTTTTTTTTTT 

ATATAA AATCTAATGACGAAAGCTTGTATATGGTAGTGAAGATGGTA ATTTTTTTTTTTTTT 

ATATAA AATCTAATGACGAAAGCTTGTATATGGTAGTGAAGATGGTATT TTTTTTTTTT 

 ATATA AATCTAATAACGGAAATTTGTATATGATGATAGAAGTGATAGT TTTTTTTTTTTT 

 

  

  
  
  

 
  

  
  
  
  
  
  
  

489 
250 
117 
49 
14 

2,697 

16,984 
540 
540 
419 
331 
106 
103 

3,042 

884 
522 

104 

  
  
 

  

1,234 
382 
40 

  
  
  


 
 

 

 

 

REFERENCES 

 


 

1.   Shapiro TA, Englund PT. The structure and replication of kinetoplast DNA. Annu Rev
Microbiol. 1995;49: 117+.

2.   Vickerman K. The evolutionary expansion of the trypanosomatid flagellates. Int J
Parasitol. 1994;24: 1317–1331. doi:10.1016/0020-7519(94)90198-8

3.   Simpson AGB, Stevens JR, Lukeš J. The evolution and diversity of kinetoplastid flagellates.
Trends Parasitol. 2006;22: 168–174. doi:10.1016/j.pt.2006.02.006

4.   Read LK, Lukeš J, Hashimi H. Trypanosome RNA editing: the complexity of getting U in
and taking U out. Wiley Interdiscip Rev RNA. 2016;7: 33–51. doi:10.1002/wrna.1313

5.   Priest JW, Hajduk SL. Developmental regulation of mitochondrial biogenesis
in Trypanosoma brucei. J Bioenerg Biomembr. 1994;26: 179–191.
doi:10.1007/BF00763067

6.   Aphasizhev R, Aphasizheva I. Uridine insertion/deletion editing in trypanosomes: a
playground for RNA-guided information transfer. Wiley Interdiscip Rev RNA. 2011;2:
669–685. doi:10.1002/wrna.82

7.   Aphasizhev R, Aphasizheva I. Mitochondrial RNA editing in trypanosomes: Small RNAs in
control. Biochimie. 2014;100: 125–131. doi:10.1016/j.biochi.2014.01.003

8.   Hong M, Simpson L. Genomic Organization of Trypanosoma brucei Kinetoplast DNA
Minicircles. Protist. 2003;154: 265–279. doi:10.1078/143446103322166554

9.   Koslowsky D, Sun Y, Hindenach J, Theisen T, Lucas J. The insect-phase gRNA
transcriptome in Trypanosoma brucei. Nucleic Acids Res. 2014;42: 1873–1886.
doi:10.1093/nar/gkt973

10.   CDC - African Trypanosomiasis [Internet]. 2 May 2017 [cited 25 Jan 2019]. Available:
https://www.cdc.gov/parasites/sleepingsickness/index.html

11.   Programme Against African Trypanosomosis (PAAT) | Food and Agriculture Organization
of the United Nations [Internet]. [cited 25 Jan 2019]. Available:
http://www.fao.org/paat/en/

12.   Horn D. Antigenic variation in African trypanosomes. Mol Biochem Parasitol. 2014;195: 

123–129. doi:10.1016/j.molbiopara.2014.05.001 

13.   Sudarshi D, Lawrence S, Pickrell WO, Eligar V, Walters R, Quaderi S, et al. Human African
Trypanosomiasis Presenting at Least 29 Years after Infection—What Can This Teach Us
about the Pathogenesis and Control of This Neglected Tropical Disease? PLOS Negl Trop
Dis. 2014;8: e3349. doi:10.1371/journal.pntd.0003349

14.   Matthews KR. Developments in the Differentiation of Trypanosoma brucei. Parasitol 

Today. 1999;15: 76–80. doi:10.1016/S0169-4758(98)01381-7 

15.   van Hellemond JJ, Bakker BM, Tielens AGM. Energy Metabolism and Its 

Compartmentation in Trypanosoma brucei. In: Poole RK, editor. Advances in Microbial 
Physiology. Academic Press; 2005. pp. 199–226. doi:10.1016/S0065-2911(05)50005-5 

16.   Hannaert V, Bringaud F, Opperdoes FR, Michels PA. Evolution of energy metabolism and 

its compartmentation in Kinetoplastida. Kinetoplastid Biol Dis. 2003;2: 11. 
doi:10.1186/1475-9292-2-11 

17.   Nolan DP, Voorheis HP. The mitochondrion in bloodstream forms of Trypanosoma brucei 
is energized by the electrogenic pumping of protons catalysed by the F1F0-ATPase. Eur J 
Biochem. 1992;209: 207–216. doi:10.1111/j.1432-1033.1992.tb17278.x 

18.   Vertommen D, Van Roy J, Szikora J-P, Rider MH, Michels PAM, Opperdoes FR. Differential 
expression of glycosomal and mitochondrial proteins in the two major life-cycle stages of 
Trypanosoma brucei. Mol Biochem Parasitol. 2008;158: 189–201. 
doi:10.1016/j.molbiopara.2007.12.008 

19.   Weelden SWH van, Fast B, Vogt A, Meer P van der, Saas J, Hellemond JJ van, et al. 

Procyclic Trypanosoma brucei Do Not Use Krebs Cycle Activity for Energy Generation. J 
Biol Chem. 2003;278: 12854–12863. doi:10.1074/jbc.M213190200 

20.   Weelden SWH van, Hellemond JJ van, Opperdoes FR, Tielens AGM. New Functions for 

Parts of the Krebs Cycle in Procyclic Trypanosoma brucei, a Cycle Not Operating as a 
Cycle. J Biol Chem. 2005;280: 12451–12460. doi:10.1074/jbc.M412447200 

21.   Bringaud F, Rivière L, Coustou V. Energy metabolism of trypanosomatids: Adaptation to 

available carbon sources. Mol Biochem Parasitol. 2006;149: 1–9. 
doi:10.1016/j.molbiopara.2006.03.017 

22.   Oberle M, Balmer O, Brun R, Roditi I. Bottlenecks and the Maintenance of Minor 

Genotypes during the Life Cycle of Trypanosoma brucei. PLOS Pathog. 2010;6: e1001023. 
doi:10.1371/journal.ppat.1001023 

23.   Abbeele JVD, Claes Y, Bockstaele DV, Ray DL, Coosemans M. Trypanosoma brucei spp. 

development in the tsetse fly: characterization of the post-mesocyclic stages in the 
foregut and proboscis. Parasitology. 1999;118: 469–478.  

24.   Michelotti EF, Hajduk SL. Developmental regulation of trypanosome mitochondrial gene 

expression. J Biol Chem. 1987;262: 927–932.  

 


25.   Feagin JE, Jasmer DP, Stuart K. Developmentally regulated addition of nucleotides within 

apocytochrome b transcripts in Trypanosoma brucei. Cell. 1987;49: 337–345. 
doi:10.1016/0092-8674(87)90286-8 

26.   Read LK, Wilson KD, Myler PJ, Stuart K. Editing of Trypanosoma brucei maxicircle CR5 

mRNA generates variable carboxy terminal predicted protein sequences. Nucleic Acids 
Res. 1994;22: 1489–1495. doi:10.1093/nar/22.8.1489 

27.   Koslowsky DJ, Bhat GJ, Perrollaz AL, Feagin JE, Stuart K. The MURF3 gene of T. brucei 

contains multiple domains of extensive editing and is homologous to a subunit of NADH 
dehydrogenase. Cell. 1990;62: 901–911. doi:10.1016/0092-8674(90)90265-G 

28.   Souza AE, Myler PJ, Stuart K. Maxicircle CR1 transcripts of Trypanosoma brucei are edited 
and developmentally regulated and encode a putative iron-sulfur protein homologous to 
an NADH dehydrogenase subunit. Mol Cell Biol. 1992;12: 2100–2107. 
doi:10.1128/MCB.12.5.2100 

29.   Souza AE, Shu HH, Read LK, Myler PJ, Stuart KD. Extensive editing of CR2 maxicircle 
transcripts of Trypanosoma brucei predicts a protein with homology to a subunit of 
NADH dehydrogenase. Mol Cell Biol. 1993;13: 6832–6840. doi:10.1128/MCB.13.11.6832 

30.   Read LK, Myler PJ, Stuart K. Extensive editing of both processed and preprocessed 

maxicircle CR6 transcripts in Trypanosoma brucei. J Biol Chem. 1992;267: 1123–1128.  

31.   Bhat GJ, Koslowsky D, Feagin J, Smiley B, Stuart K. An Extensively Edited Mitochondrial 

Transcript in Kinetoplastids Encodes a Protein Homologous to ATPase Subunit 6. Cell. 
1990;61: 885–894.  

32.   Feagin JE, Abraham JM, Stuart K. Extensive editing of the cytochrome c oxidase III
transcript in Trypanosoma brucei. Cell. 1988;53: 413–422.
doi:10.1016/0092-8674(88)90161-4

33.   Blum B, Bakalara N, Simpson L. A model for RNA editing in kinetoplastid mitochondria: 

“Guide” RNA molecules transcribed from maxicircle DNA provide the edited information. 
Cell. 1990;60: 189–198. doi:10.1016/0092-8674(90)90735-W 

34.   Eperon IC, Janssen JWG, Hoeijmakers JHJ, Borst P. The major transcripts of the 

kinetoplast DNA of Trypanosoma brucei are very small ribosomal RNAs. Nucleic Acids 
Res. 1983;11: 105–125. doi:10.1093/nar/11.1.105 

35.   Adler BK, Harris ME, Bertrand KI, Hajduk SL. Modification of Trypanosoma brucei 

mitochondrial rRNA by posttranscriptional 3’ polyuridine tail formation. Mol Cell Biol. 
1991;11: 5878–5884. doi:10.1128/MCB.11.12.5878 

 


36.   Aphasizheva I, Maslov D, Wang X, Huang L, Aphasizhev R. Pentatricopeptide Repeat 

Proteins Stimulate mRNA Adenylation/Uridylation to Activate Mitochondrial Translation 
in Trypanosomes. Mol Cell. 2011;42: 106–117. doi:10.1016/j.molcel.2011.02.021 

37.   Bhat GJ, Souza AE, Feagin JE, Stuart K. Transcript-specific developmental regulation of 

polyadenylation in Trypanosoma brucei mitochondria. Mol Biochem Parasitol. 1992;52: 
231–240. doi:10.1016/0166-6851(92)90055-O 

38.   Hensgens LA, Brakenhoff J, De Vries BF, Sloof P, Tromp MC, Van Boom JH, et al. The
sequence of the gene for cytochrome c oxidase subunit I, a frameshift containing gene
for cytochrome c oxidase subunit II and seven unassigned reading frames in
Trypanosoma brucei mitochrondrial maxi-circle DNA. Nucleic Acids Res. 1984;12:
7327–7344.

39.   Benne R, Van Den Burg J, Brakenhoff JPJ, Sloof P, Van Boom JH, Tromp MC. Major
transcript of the frameshifted coxII gene from trypanosome mitochondria contains four
nucleotides that are not encoded in the DNA. Cell. 1986;46: 819–826.
doi:10.1016/0092-8674(86)90063-2

40.   Feagin JE, Stuart K. Differential expression of mitochondrial genes between life cycle 

stages of Trypanosoma brucei. Proc Natl Acad Sci. 1985;82: 3380–3384.  

41.   Payne M, Rothwell V, Jasmer DP, Feagin JE, Stuart K. Identification of mitochondrial
genes in Trypanosoma brucei and homology to cytochrome c oxidase II in two different
reading frames. Mol Biochem Parasitol. 1985;15: 159–170.
doi:10.1016/0166-6851(85)90117-3

42.   Kannan S, Burger G. Unassigned MURF1 of kinetoplastids codes for NADH dehydrogenase 

subunit 2. BMC Genomics. 2008;9: 455. doi:10.1186/1471-2164-9-455 

43.   Feagin JE, Jasmer DP, Stuart K. Apocytochrome b and other mitochondrial DNA 

sequences are differentially expressed during the life cycle of Trypanosoma brucei. 
Nucleic Acids Res. 1985;13: 4577–4596. doi:10.1093/nar/13.12.4577 

44.   Stuart K, Feagin JE, Jasmer DP. Regulation of Mitochondrial Gene Expression in 

Trypanosoma brucei. Sequence Specificity in Transcription and Translation. Alan R. Liss; 
1985. pp. 621–631.  

45.   Jasmer DP, Feagin JE, Stuart K. Diverse patterns of expression of the cytochrome c
oxidase subunit I gene and unassigned reading frames 4 and 5 during the life cycle of
Trypanosoma brucei. Mol Cell Biol. 1985;5: 3041–3047. doi:10.1128/MCB.5.11.3041

46.   Feagin JE, Stuart K. Developmental aspects of uridine addition within mitochondrial 

transcripts of Trypanosoma brucei. Mol Cell Biol. 1988;8: 1259–1265. 
doi:10.1128/MCB.8.3.1259 

 


47.   Stuart K. The RNA editing process in Trypanosoma brucei. Semin Cell Biol. 1993;4: 251–260.
doi:10.1006/scel.1993.1030

48.   Corell RA, Myler P, Stuart K. Trypanosoma brucei mitochondrial CR4 gene encodes an 
extensively edited mRNA with completely edited sequence only in bloodstream forms. 
Mol Biochem Parasitol. 1994;64: 65–74. doi:10.1016/0166-6851(94)90135-X 

49.   Hajduk SL, Adler BK, Madison S, McManus M, Sabatini R. Insertional and deletional RNA 

editing in trypanosome mitochondria. Nucleic Acids Symp Ser. 1996; 15–18.  

50.   Koslowsky DJ, Riley GR, Feagin JE, Stuart K. Guide RNAs for transcripts with 

developmentally regulated RNA editing are present in both life cycle stages of 
Trypanosoma brucei. Mol Cell Biol. 1992;12: 2043–2049. doi:10.1128/MCB.12.5.2043 

51.   Riley GR, Corell RA, Stuart K. Multiple guide RNAs for identical editing of Trypanosoma 

brucei apocytochrome b mRNA have an unusual minicircle location and are 
developmentally regulated. J Biol Chem. 1994;269: 6101–6108.  

52.   Greif G, Rodriguez M, Reyna-Bello A, Robello C, Alvarez-Valin F. Kinetoplast adaptations 
in American strains from Trypanosoma vivax. Mutat Res Mol Mech Mutagen. 2015;773: 
69–82. doi:10.1016/j.mrfmmm.2015.01.008 

53.   Ooi C-P, Schuster S, Cren-Travaillé C, Bertiaux E, Cosson A, Goyard S, et al. The Cyclical 
Development of Trypanosoma vivax in the Tsetse Fly Involves an Asymmetric Division. 
Front Cell Infect Microbiol. 2016;6. doi:10.3389/fcimb.2016.00115 

54.   Ruvalcaba-Trejo LI, Sturm NR. The Trypanosoma cruzi Sylvio X10 strain maxicircle
sequence: the third musketeer. BMC Genomics. 2011;12: 58.
doi:10.1186/1471-2164-12-58

55.   Cazzulo JJ. Protein and amino acid catabolism in Trypanosoma cruzi. Comp Biochem 

Physiol Part B Comp Biochem. 1984;79: 309–320. doi:10.1016/0305-0491(84)90381-X 

56.   Cazzulo JJ. Intermediate metabolism in Trypanosoma cruzi. J Bioenerg Biomembr.
1994;26: 157–165. doi:10.1007/BF00763064

57.   Tyler KM, Engman DM. The life cycle of Trypanosoma cruzi revisited. Int J Parasitol. 

2001;31: 472–481. doi:10.1016/S0020-7519(01)00153-9 

58.   Cannata JJB, Cazzulo JJ. The aerobic fermentation of glucose by Trypanosoma cruzi. Comp
Biochem Physiol Part B Comp Biochem. 1984;79: 297–308.
doi:10.1016/0305-0491(84)90380-8

59.   Sanchez-Moreno M, Fernandez-Becerra MC, Castilla-Calvente JJ, Osuna A. Metabolic
studies by 1H NMR of different forms of Trypanosoma cruzi as obtained by “in vitro”
culture. FEMS Microbiol Lett. 1995;133: 119–125.
doi:10.1111/j.1574-6968.1995.tb07871.x

60.   Centers for Disease Control and Prevention. CDC - Leishmaniasis [Internet]. 16 Oct 2018
[cited 25 Jan 2019]. Available: https://www.cdc.gov/parasites/leishmaniasis/index.html

61.   McConville MJ, Naderer T. Metabolic Pathways Required for the Intracellular Survival of
Leishmania. Annu Rev Microbiol. 2011;65: 543–561.
doi:10.1146/annurev-micro-090110-102913

62.   Saunders EC, Souza DPD, Naderer T, Sernee MF, Ralton JE, Doyle MA, et al. Central 

carbon metabolism of Leishmania parasites. Parasitology. 2010;137: 1303–1313. 
doi:10.1017/S0031182010000077 

63.   Simpson L, Thiemann OH, Savill NJ, Alfonzo JD, Maslov DA. Evolution of RNA editing in 

trypanosome mitochondria. Proc Natl Acad Sci. 2000;97: 6986–6993. 
doi:10.1073/pnas.97.13.6986 

64.   Porcel BM, Denoeud F, Opperdoes F, Noel B, Madoui M-A, Hammarton TC, et al. The 

Streamlined Genome of Phytomonas spp. Relative to Human Pathogenic Kinetoplastids 
Reveals a Parasite Tailored for Plants. PLOS Genet. 2014;10: e1004007. 
doi:10.1371/journal.pgen.1004007 

65.   Nawathean P, Maslov DA. The absence of genes for cytochrome c oxidase and reductase 
subunits in maxicircle kinetoplast DNA of the respiration-deficient plant trypanosomatid 
Phytomonas serpens. Curr Genet. 2000;38: 95–103. doi:10.1007/s002940000135 

66.   Blum B, Simpson L. Guide RNAs in kinetoplastid mitochondria have a nonencoded 3′ 
oligo(U) tail involved in recognition of the preedited region. Cell. 1990;62: 391–397. 
doi:10.1016/0092-8674(90)90375-O 

67.   Ochsenreiter T, Hajduk SL. Alternative editing of cytochrome c oxidase III mRNA in 

trypanosome mitochondria generates protein diversity. EMBO Rep. 2006;7: 1128–1133. 
doi:10.1038/sj.embor.7400817 

68.   Ochsenreiter T, Anderson S, Wood ZA, Hajduk SL. Alternative RNA Editing Produces a 
Novel Protein Involved in Mitochondrial DNA Maintenance in Trypanosomes. Mol Cell 
Biol. 2008;28: 5595–5604. doi:10.1128/MCB.00637-08 

69.   Covello P, Gray M. On the evolution of RNA editing. Trends Genet. 1993;9: 265–268. 

doi:10.1016/0168-9525(93)90011-6 

70.   Gray MW. Evolutionary Origin of RNA Editing. Biochemistry (Mosc). 2012;51: 5235–5242. 

doi:10.1021/bi300419r 

 


71.   Gray MW, Lukeš J, Archibald JM, Keeling PJ, Doolittle WF. Irremediable Complexity? 

Science. 2010;330: 920–921. doi:10.1126/science.1198594 

72.   Stoltzfus A. On the Possibility of Constructive Neutral Evolution. J Mol Evol. 1999;49: 

169–181. doi:10.1007/PL00006540 

73.   Stoltzfus A. Constructive neutral evolution: exploring evolutionary theory’s curious 

disconnect. Biol Direct. 2012;7: 35. doi:10.1186/1745-6150-7-35 

74.   Leeder W-M, Hummel NFC, Göringer HU. Multiple G-quartet structures in pre-edited
mRNAs suggest evolutionary driving force for RNA editing in trypanosomes. Sci Rep.
2016;6. doi:10.1038/srep29810

75.   Speijer D. Evolutionary Aspects of RNA Editing. In: Göringer HU, editor. RNA Editing.
Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. pp. 199–227.
doi:10.1007/978-3-540-73787-2_10

76.   Buhrman H, van der Gulik P, Severini S, Speijer D. A mathematical model of kinetoplastid 

mitochondrial gene scrambling advantage. ArXiv13071163 Q-Bio. 2013; Available: 
http://arxiv.org/abs/1307.1163 

77.   Thiemann OH, Maslov DA, Simpson L. Disruption of RNA editing in Leishmania tarentolae 

by the loss of minicircle-encoded guide RNA genes. EMBO J. 1994;13: 5689–5700.  

78.   Savill NJ, Higgs PG. A theoretical study of random segregation of minicircles in
trypanosomatids. Proc R Soc Lond B Biol Sci. 1999;266: 611–620.
doi:10.1098/rspb.1999.0680

79.   Lynch M, Bürger R, Butcher D, Gabriel W. The Mutational Meltdown in Asexual
Populations. J Hered. 1993;84: 339–344. doi:10.1093/oxfordjournals.jhered.a111354

80.   LaBar T, Adami C. Evolution of drift robustness in small populations. Nat Commun.
2017;8: 1012. doi:10.1038/s41467-017-01003-7

81.   Muller HJ. The relation of recombination to mutational advance. Mutat Res Mol Mech 

Mutagen. 1964;1: 2–9. doi:10.1016/0027-5107(64)90047-8 

82.   Haigh J. The accumulation of deleterious genes in a population—Muller’s Ratchet. Theor 

Popul Biol. 1978;14: 251–267. doi:10.1016/0040-5809(78)90027-8 

83.   Poon A, Otto SP. Compensating for Our Load of Mutations: Freezing the Meltdown of
Small Populations. Evolution. 2000;54: 1467–1479.
doi:10.1111/j.0014-3820.2000.tb00693.x

 


84.   Whitlock MC. Fixation of New Alleles and the Extinction of Small Populations: Drift Load, 

Beneficial Alleles, and Sexual Selection. Evolution. 2000;54: 1855–1861. 
doi:10.1111/j.0014-3820.2000.tb01232.x 

85.   Normark S, Bergström S, Edlund T, Grundström T, Jaurin B, Lindberg FP, et al. 

Overlapping genes. Annu Rev Genet. 1983;17: 499–525.  

86.   Liang H. Decoding the dual-coding region: key factors influencing the translational
potential of a two-ORF-containing transcript. Cell Res. 2010;20: 508–509.
doi:10.1038/cr.2010.62

87.   Belshaw R, Pybus OG, Rambaut A. The evolution of genome compression and genomic 

novelty in RNA viruses. Genome Res. 2007;17: 000–000. doi:10.1101/gr.6305707 

88.   Brandes N, Linial M. Gene overlapping and size constraints in the viral world. Biol Direct. 

2016;11: 26. doi:10.1186/s13062-016-0128-3 

89.   Mouilleron H, Delcourt V, Roucou X. Death of a dogma: eukaryotic mRNAs can code for 

more than one protein. Nucleic Acids Res. 2016;44: 14–23. doi:10.1093/nar/gkv1218 

90.   Liang H, Landweber LF. A genome-wide study of dual coding regions in human
alternatively spliced genes. Genome Res. 2006;16: 190–196. doi:10.1101/gr.4246506

91.   Ribrioux S, Brüngger A, Baumgarten B, Seuwen K, John MR. Bioinformatics prediction of 

overlapping frameshifted translation products in mammalian transcripts. BMC Genomics. 
2008;9: 122. doi:10.1186/1471-2164-9-122 

92.   Pallejà A, Harrington ED, Bork P. Large gene overlaps in prokaryotic genomes: result of
functional constraints or mispredictions? BMC Genomics. 2008;9: 335.
doi:10.1186/1471-2164-9-335

93.   Chung W-Y, Wadhawan S, Szklarczyk R, Pond SK, Nekrutenko A. A First Look at ARFome: 

Dual-Coding Genes in Mammalian Genomes. PLOS Comput Biol. 2007;3: e91. 
doi:10.1371/journal.pcbi.0030091 

94.   Klemke M. Two overlapping reading frames in a single exon encode interacting proteins--
a novel way of gene usage. EMBO J. 2001;20: 3849–3860. doi:10.1093/emboj/20.14.3849 

95.   Yoshida H, Oku M, Suzuki M, Mori K. pXBP1(U) encoded in XBP1 pre-mRNA negatively 

regulates unfolded protein response activator pXBP1(S) in mammalian ER stress 
response. J Cell Biol. 2006;172: 565–575. doi:10.1083/jcb.200508145 

96.   Peleg O, Kirzhner V, Trifonov E, Bolshoy A. Overlapping Messages and Survivability. J Mol 

Evol. 2004;59: 520–527. doi:10.1007/s00239-004-2644-5 

 


97.   Sykes SE, Hajduk SL. Dual Functions of α-Ketoglutarate Dehydrogenase E2 in the Krebs 

Cycle and Mitochondrial DNA Inheritance in Trypanosoma brucei. Eukaryot Cell. 2013;12: 
78–90. doi:10.1128/EC.00269-12 

98.   Sykes S, Szempruch A, Hajduk S. The Krebs Cycle Enzyme α-Ketoglutarate Decarboxylase 
Is an Essential Glycosomal Protein in Bloodstream African Trypanosomes. Eukaryot Cell. 
2015;14: 206–215. doi:10.1128/EC.00214-14 

99.   Ochsenreiter T, Cipriano M, Hajduk SL. Alternative mRNA Editing in Trypanosomes Is 
Extensive and May Contribute to Mitochondrial Protein Diversity. PLoS ONE. 2008;3: 
e1566. doi:10.1371/journal.pone.0001566 

100.   Aphasizheva I, Maslov DA, Aphasizhev R. Kinetoplast DNA-encoded ribosomal protein 

S12. RNA Biol. 2013;10: 1679–1688. doi:10.4161/rna.26733 

101.   Stuart K, Gobright E, Jenni L, Milhausen M, Thomashow L, Agabian N. The Istar 1 
Serodeme of Trypanosoma brucei: Development of a New Serodeme. J Parasitol. 
1984;70: 747–754. doi:10.2307/3281757 

102.   Chomczynski P, Sacchi N. Single-step method of RNA isolation by acid guanidinium 

thiocyanate-phenol-chloroform extraction. Anal Biochem. 1987;162: 156–159. 
doi:10.1016/0003-2697(87)90021-2 

103.   Agabian N, Thomashow L, Milhausen M, Stuart K. Structural Analysis of Variant and 

Invariant Genes in Trypanosomes. Am J Trop Med Hyg. 1980;29: 1043–1049.  

104.   Friedman RC, Farh KK-H, Burge CB, Bartel DP. Most mammalian mRNAs are conserved 

targets of microRNAs. Genome Res. 2009;19: 92–105. doi:10.1101/gr.082701.108 

105.   Madina BR, Kumar V, Metz R, Mooers BHM, Bundschuh R, Cruz-Reyes J. Native 

mitochondrial RNA-binding complexes in kinetoplastid RNA editing differ in guide RNA 
composition. RNA. 2014; doi:10.1261/rna.044495.114 

106.   Clement SL, Mingler MK, Koslowsky DJ. An Intragenic Guide RNA Location Suggests a 

Complex Mechanism for Mitochondrial Gene Expression in Trypanosoma brucei. 
Eukaryot Cell. 2004;3: 862–869. doi:10.1128/EC.3.4.862-869.2004 

107.   Cristodero M, Seebeck T, Schneider A. Mitochondrial translation is essential in 
bloodstream forms of Trypanosoma brucei. Mol Microbiol. 2010;78: 757–769. 
doi:10.1111/j.1365-2958.2010.07368.x 

108.   MacLeod A, Turner CMR, Tait A. A high level of mixed Trypanosoma brucei infections in 

tsetse flies detected by three hypervariable minisatellites. Mol Biochem Parasitol. 
1999;102: 237–248. doi:10.1016/S0166-6851(99)00101-2 

 


109.   Balmer O, Caccone A. Multiple-strain infections of Trypanosoma brucei across Africa. 

Acta Trop. 2008;107: 275–279. doi:10.1016/j.actatropica.2008.06.006 

110.   Szempruch AJ, Choudhury R, Wang Z, Hajduk SL. In vivo analysis of trypanosome 

mitochondrial RNA function by artificial site-specific RNA endonuclease-mediated 
knockdown. RNA. 2015; doi:10.1261/rna.052084.115 

111.   Surve S, Heestand M, Panicucci B, Schnaufer A, Parsons M. Enigmatic Presence of 

Mitochondrial Complex I in Trypanosoma brucei Bloodstream Forms. Eukaryot Cell. 
2012;11: 183–193. doi:10.1128/EC.05282-11 

112.   Verner Z, Čermáková P, Škodová I, Kriegová E, Horváth A, Lukeš J. Complex I 

(NADH:ubiquinone oxidoreductase) is active in but non-essential for procyclic 
Trypanosoma brucei. Mol Biochem Parasitol. 2011;175: 196–200. 
doi:10.1016/j.molbiopara.2010.11.003 

113.   Speijer D. Is kinetoplastid pan-editing the result of an evolutionary balancing act? IUBMB 

Life. 2006;58: 91–96. doi:10.1080/15216540600551355 

114.   Hudson KM, Taylor AER, Elce BJ. Antigenic changes in Trypanosoma brucei on
transmission by tsetse fly. Parasite Immunol. 1980;2: 57–69.
doi:10.1111/j.1365-3024.1980.tb00043.x

115.   Gibson W. The origins of the trypanosome genome strains Trypanosoma brucei brucei 

TREU 927, T. b. gambiense DAL 972, T. vivax Y486 and T. congolense IL3000. Parasit 
Vectors. 2012;5: 71.  

116.   Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of 

high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 
2014;7: 539–539. doi:10.1038/msb.2011.75 

117.   Gonnet GH, Cohen MA, Benner SA. Exhaustive matching of the entire protein sequence 

database. Science. 1992;256: 1443–1445.  

118.   Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn:
Machine Learning in Python. J Mach Learn Res. 2011;12: 2825–2830.

119.   Kirby LE, Sun Y, Judah D, Nowak S, Koslowsky D. Analysis of the Trypanosoma brucei 

EATRO 164 Bloodstream Guide RNA Transcriptome. PLOS Negl Trop Dis. 2016;10: 
e0004793. doi:10.1371/journal.pntd.0004793 

120.   Duarte M, Tomás AM. The mitochondrial complex I of trypanosomatids - an overview of
current knowledge. J Bioenerg Biomembr. 2014;46: 299–311.
doi:10.1007/s10863-014-9556-x

 


121.   Simpson L, Neckelmann N, Cruz VF de la, Simpson AM, Feagin JE, Jasmer DP, et al.
Comparison of the maxicircle (mitochondrial) genomes of Leishmania tarentolae and
Trypanosoma brucei at the level of nucleotide sequence. J Biol Chem. 1987;262:
6182–6196.

122.   Hanada K, Shiu S-H, Li W-H. The Nonsynonymous/Synonymous Substitution Rate Ratio 

versus the Radical/Conservative Replacement Rate Ratio in the Evolution of Mammalian 
Genes. Mol Biol Evol. 2007;24: 2235–2241. doi:10.1093/molbev/msm152 

123.   Firth AE, Brown CM. Detecting overlapping coding sequences with pairwise alignments. 

Bioinformatics. 2005;21: 282–292. doi:10.1093/bioinformatics/bti007 

124.   Firth AE, Brown CM. Detecting overlapping coding sequences in virus genomes. BMC 

Bioinformatics. 2006;7: 75. doi:10.1186/1471-2105-7-75 

125.   Landweber LF, Gilbert W. RNA editing as a source of genetic variation. Nat Lond. 

1993;363: 179.  

126.   Tielens AGM, van Hellemond JJ. Surprising variety in energy metabolism within 

Trypanosomatidae. Trends Parasitol. 2009;25: 482–490. doi:10.1016/j.pt.2009.07.007 

127.   Verner Z, Čermáková P, Škodová I, Kováčová B, Lukeš J, Horváth A. Comparative analysis 

of respiratory chain and oxidative phosphorylation in Leishmania tarentolae, Crithidia 
fasciculata, Phytomonas serpens and procyclic stage of Trypanosoma brucei. Mol 
Biochem Parasitol. 2014;193: 55–65. doi:10.1016/j.molbiopara.2014.02.003 

128.   Jackson AP, Berry A, Aslett M, Allison HC, Burton P, Vavrova-Anderson J, et al. Antigenic 

diversity is generated by distinct evolutionary mechanisms in African trypanosome 
species. Proc Natl Acad Sci U S A. 2012;109: 3416–3421. doi:10.1073/pnas.1117313109 

129.   Morrison LJ, Vezza L, Rowan T, Hope JC. Animal African Trypanosomiasis: Time to 
Increase Focus on Clinically Relevant Parasite and Host Species. Trends Parasitol. 
2016;32: 599–607. doi:10.1016/j.pt.2016.04.012 

130.   Maslov DA, Hollar L, Haghighat P, Nawathean P. Demonstration of mRNA editing and 

localization of guide RNA genes in kinetoplast–mitochondria of the plant trypanosomatid 
Phytomonas serpens. Mol Biochem Parasitol. 1998;93: 225–236. 
doi:10.1016/S0166-6851(98)00028-0 

131.   David V, Flegontov P, Gerasimov E, Tanifuji G, Hashimi H, Logacheva MD, et al. Gene Loss 

and Error-Prone RNA Editing in the Mitochondrion of Perkinsela, an Endosymbiotic 
Kinetoplastid. mBio. 2015;6: e01498-15. doi:10.1128/mBio.01498-15 

 


132.   Maslov DA. Complete set of mitochondrial pan-edited mRNAs in Leishmania mexicana 

amazonensis LV78. Mol Biochem Parasitol. 2010;173: 107–114. 
doi:10.1016/j.molbiopara.2010.05.013 

133.   Käll L, Krogh A, Sonnhammer ELL. A Combined Transmembrane Topology and Signal 

Peptide Prediction Method. J Mol Biol. 2004;338: 1027–1036. 
doi:10.1016/j.jmb.2004.03.016 

134.   Käll L, Krogh A, Sonnhammer ELL. Advantages of combined transmembrane topology and 

signal peptide prediction—the Phobius web server. Nucleic Acids Res. 2007;35: W429–W432. 
doi:10.1093/nar/gkm256 

135.   Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal for 

protein modeling, prediction and analysis. Nat Protoc. 2015;10: 845–858. 
doi:10.1038/nprot.2015.053 

136.   Xu Y, Tao Y, Cheung LS, Fan C, Chen L-Q, Xu S, et al. Structures of bacterial homologues of 

SWEET transporters in two distinct conformations. Nature. 2014;515: 448–452. 
doi:10.1038/nature13670 

137.   Bender T, Pena G, Martinou J-C. Regulation of mitochondrial pyruvate uptake by 

alternative pyruvate carrier complexes. EMBO J. 2015;34: 911–924. 
doi:10.15252/embj.201490197 

138.   Štáfková J, Mach J, Biran M, Verner Z, Bringaud F, Tachezy J. Mitochondrial pyruvate 

carrier in Trypanosoma brucei. Mol Microbiol. 2016;100: 442–456. 
doi:10.1111/mmi.13325 

139.   Saunders EC, Ng WW, Kloehn J, Chambers JM, Ng M, McConville MJ. Induction of a 

Stringent Metabolic Response in Intracellular Stages of Leishmania mexicana Leads to 
Increased Dependence on Mitochondrial Metabolism. PLOS Pathog. 2014;10: e1003888. 
doi:10.1371/journal.ppat.1003888 

140.   Aphasizheva I, Aphasizhev R. U-Insertion/Deletion mRNA-Editing Holoenzyme: Definition 

in Sight. Trends Parasitol. 2016;32: 144–156. doi:10.1016/j.pt.2015.10.004 

141.   Kirby LE, Koslowsky D. Mitochondrial dual-coding genes in Trypanosoma brucei. PLoS 

Negl Trop Dis. 2017;11: e0005989. doi:10.1371/journal.pntd.0005989 

142.   Lamour N, Rivière L, Coustou V, Coombs GH, Barrett MP, Bringaud F. Proline Metabolism 

in Procyclic Trypanosoma brucei Is Down-regulated in the Presence of Glucose. J Biol 
Chem. 2005;280: 11902–11910. doi:10.1074/jbc.M414274200 

143.   Coustou V, Biran M, Breton M, Guegan F, Rivière L, Plazolles N, et al. Glucose-induced 

Remodeling of Intermediary and Energy Metabolism in Procyclic Trypanosoma brucei. J 
Biol Chem. 2008;283: 16342–16354. doi:10.1074/jbc.M709592200 

 


144.   Bochud-Allemann N, Schneider A. Mitochondrial Substrate Level Phosphorylation Is 

Essential for Growth of Procyclic Trypanosoma brucei. J Biol Chem. 2002;277: 32849–32854. 
doi:10.1074/jbc.M205776200 

145.   Horváth A, Horáková E, Dunajčíková P, Verner Z, Pravdová E, Šlapetová I, et al. 

Downregulation of the nuclear-encoded subunits of the complexes III and IV disrupts 
their respective complexes but not complex I in procyclic Trypanosoma brucei. Mol 
Microbiol. 2005;58: 116–130. doi:10.1111/j.1365-2958.2005.04813.x 

146.   Gnipová A, Panicucci B, Paris Z, Verner Z, Horváth A, Lukeš J, et al. Disparate phenotypic 

effects from the knockdown of various Trypanosoma brucei cytochrome c oxidase 
subunits. Mol Biochem Parasitol. 2012;184: 90–98. 
doi:10.1016/j.molbiopara.2012.04.013 

147.   ter Kuile BH. Adaptation of metabolic enzyme activities of Trypanosoma brucei 

promastigotes to growth rate and carbon regimen. J Bacteriol. 1997;179: 4699–4705. 
doi:10.1128/jb.179.15.4699-4705.1997 

148.   Simpson RM, Bruno AE, Bard JE, Buck MJ, Read LK. High-throughput sequencing of 

partially edited trypanosome mRNAs reveals barriers to editing progression and evidence 
for alternative editing. RNA. 2016;22: 677–695. doi:10.1261/rna.055160.115 

149.   Carnes J, McDermott S, Anupama A, Oliver BG, Sather DN, Stuart K. In vivo cleavage 

specificity of Trypanosoma brucei editosome endonucleases. Nucleic Acids Res. 2017;45: 
4667–4686. doi:10.1093/nar/gkx116 

150.   Otaka E, Hashimoto T, Mizuta K. The ribosomal proteins. I: An introduction to a 

compilation of the protein species equivalents from various organisms by a universal 
code system. Protein Seq Data Anal. 1993;5: 285–300.  

151.   Lawson SD, Igo RP, Salavati R, Stuart KD. The specificity of nucleotide removal during RNA 

editing in Trypanosoma brucei. RNA. 2001;7: 1793–1802.  

152.   Baradaran R, Berrisford JM, Minhas GS, Sazanov LA. Crystal structure of the entire 

respiratory complex I. Nature. 2013;494: 443–448. doi:10.1038/nature11871 

153.   Källberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, et al. Template-based protein 
structure modeling using the RaptorX web server. Nat Protoc. 2012;7: 1511–1522. 
doi:10.1038/nprot.2012.085 

154.   Peng J, Xu J. A multiple-template approach to protein threading. Proteins Struct Funct 

Bioinforma. 2011;79: 1930–1939. doi:10.1002/prot.23016 

155.   Peng J, Xu J. RaptorX: Exploiting structure information for protein alignment by statistical 
inference. Proteins Struct Funct Bioinforma. 2011;79: 161–171. doi:10.1002/prot.23175 

 


156.   Sturm NR, Maslov DA, Blum B, Simpson L. Generation of unexpected editing patterns in 
Leishmania tarentolae mitochondrial mRNAs: Misediting produced by misguiding. Cell. 
1992;70: 469–476. doi:10.1016/0092-8674(92)90171-8 

157.   Maslov DA, Thiemann O, Simpson L. Editing and misediting of transcripts of the 

kinetoplast maxicircle G5 (ND3) cryptogene in an old laboratory strain of Leishmania 
tarentolae. Mol Biochem Parasitol. 1994;68: 155–159. 
doi:10.1016/0166-6851(94)00160-X 

158.   Alatortsev VS, Cruz-Reyes J, Zhelonkina AG, Sollner-Webb B. Trypanosoma brucei RNA 
Editing: Coupled Cycles of U Deletion Reveal Processive Activity of the Editing Complex. 
Mol Cell Biol. 2008;28: 2437–2445. doi:10.1128/MCB.01886-07 

159.   Necas D, Ohtamaa M, Määttä E, Haapala A. python-Levenshtein 0.12.0.  

160.   Zimmer SL, Simpson RM, Read LK. High throughput sequencing revolution reveals 

conserved fundamentals of U-indel editing. Wiley Interdiscip Rev RNA. 2018;9: e1487. 
doi:10.1002/wrna.1487 

161.   Simpson RM, Bruno AE, Chen R, Lott K, Tylec BL, Bard JE, et al. Trypanosome RNA Editing 
Mediator Complex proteins have distinct functions in gRNA utilization. Nucleic Acids Res. 
2017;45: 7965–7983. doi:10.1093/nar/gkx458 

162.   Koslowsky DJ, Jayarama Bhat G, Read LK, Stuart K. Cycles of progressive realignment of 

gRNA with mRNA in RNA editing. Cell. 1991;67: 537–546. 
doi:10.1016/0092-8674(91)90528-7 

163.   Abraham JM, Feagin JE, Stuart K. Characterization of cytochrome c oxidase III transcripts 

that are edited only in the 3′ region. Cell. 1988;55: 267–272. 
doi:10.1016/0092-8674(88)90049-9 

164.   Sturm NR, Simpson L. Partially edited mRNAs for cytochrome b and subunit III of 

cytochrome oxidase from Leishmania tarentolae mitochondria: RNA editing 
intermediates. Cell. 1990;61: 871–878. doi:10.1016/0092-8674(90)90197-M 

165.   Decker CJ, Sollner-Webb B. RNA editing involves indiscriminate U changes throughout 

precisely defined editing domains. Cell. 1990;61: 1001–1011. 
doi:10.1016/0092-8674(90)90065-M 

166.   Ammerman ML, Presnyak V, Fisk JC, Foda BM, Read LK. TbRGG2 facilitates kinetoplastid 

RNA editing initiation and progression past intrinsic pause sites. RNA. 2010;16: 2239–2251. 
doi:10.1261/rna.2285510 

167.   Wang Z, Drew ME, Morris JC, Englund PT. Asymmetrical division of the kinetoplast DNA 
network of the trypanosome. EMBO J. 2002;21: 4998–5005. doi:10.1093/emboj/cdf482 

 


168.   Lukeš J, Skalický T, Týč J, Votýpka J, Yurchenko V. Evolution of parasitism in kinetoplastid 

flagellates. Mol Biochem Parasitol. 2014;195: 115–122. 
doi:10.1016/j.molbiopara.2014.05.007 

 

 

 
