DIGITIZATION OF ANALOG SEISMOGRAMS

By

Kaitlynn Mary Stibitz

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Geological Sciences – Master of Science

2023

ABSTRACT

The recovery and digitization of analog seismograms is critical for research into historical seismological events. Analog seismogram digitization is a difficult and complex problem and requires standards to successfully recover information from the analog media. This study investigates proposed standards for the digitization of analog seismograms. For this investigation, 'white noise' synthetic seismograms with known frequency content that emulate analog records were used. The synthetic signal was modified to test variables such as scan resolution, interpolation algorithm, amplitude, and line thickness. After digitization, the digital seismograms were compared back to the original synthetic seismogram. The effectiveness of scan density can be quantified by the ease of digitization and the resulting waveform accuracy. Low scan resolutions adversely affect waveform accuracy and ultimately the frequency recovery. For example, a 200 DPI image can recover signals up to 2.5 Hz, whereas a 600 DPI image can recover up to about 8 Hz, assuming an original recording speed of 60 mm/minute. Waveform thickness can vary due to the focus of the recording beam or pen. Wider signal traces reduce the probability of accurately recovering high frequency signals because signal can be hidden within overlapping traces. We also examined the recoverable signal from low amplitude analog traces. Signals that exceed five times the width of the analog trace can be recovered within 3 dB of the true reference amplitude.

ACKNOWLEDGEMENTS

I would first like to thank Dr. Kevin Mackey for taking a chance on me back in 2014, which allowed me to become an undergraduate researcher. I would not be where I am today if it were not for your constructive feedback, advice, and friendship. I would also like to thank my parents and my friends for always allowing me to bounce ideas off of them and for always supporting me in all of my endeavors. I would like to thank Dr. Kazuya Fujita for helping me become a better writer and scientist through your feedback, questions, and guidance. Thank you for helping me find and translate Russian literature and for being a valuable resource. I would also like to extend my gratitude to Dr. Jeffrey Freymueller and Dr. Min Chen for serving as vital members of my committee. Dan Burk, thank you for providing the framework for numerous programs that were used in this project, as well as taking the extra time to help me comprehend study material. This project would not have been possible without the digitization efforts of my numerous colleagues in Russia, Kazakhstan, and Kyrgyzstan. Thank you to Chris Witte at Michigan State for also assisting in numerous digitizations for this project. The financial assistance for my master's degree was jointly provided by the US Department of State and the Air Force Research Laboratory. I would like to thank both agencies for supporting my continued education and the field work in numerous countries. Lastly, I would like to thank my husband, John, who was my emotional and supportive rock during this whole process. Thank you for believing in me and reminding me to never give up.
TABLE OF CONTENTS

INTRODUCTION
PREVIOUS WORK
    Early Digitizations
    Modern Digitizations
    Summary
METHODOLOGY
    Interpolation Method
    Seismogram Sample Rate
    Synthetic Seismogram Development and Variables Studied
DISCUSSION OF RESULTS
    Effects of DPI
        Image Resolution
        Theoretical Sine Test
    Technician Variability
    Image Resolution Test
    Waveform Thickness Test
    Amplitude Test
    Co-located Stations
CONCLUSIONS
REFERENCES
APPENDIX

INTRODUCTION

The global archives of analog seismological data are of critical importance, as they contain the only recordings of many of Earth's historic large earthquakes; they also contain many seismograms from the era of nuclear testing (mid 1940s to 1990s), which had largely ended by the time most stations were converted to digital recording. Analog seismograms are an underexploited resource, although "[they] constitute an irreplaceable dataset for the quantitative investigation and understanding of the planet's long-term seismicity" (Okal, 2015). Most of the world's seismological data up until the late 20th century was recorded on analog media, such as paper, photographic film, or magnetic tape (Figure 1).
A pen or a stylus would inscribe the ground motion on various forms of media, which in turn produced different levels of signal definition due to the nature of the recording method. For example, an ink pen seismogram has sharp lines compared to photosensitive paper, on which a light beam can be out of focus and draw a 'fuzzy' signal. Almost all seismological stations were converted to digital recording of seismometers in the 1990s and early 2000s. The analog nature of the older data prevents modern digital processing techniques from being used for its analysis. The analog data archives are also aging, and as such the analog data is at high risk of loss due to degradation, disposal, and/or destruction. An option to modernize these analog records and to save them from further deterioration and loss is to scan them into high resolution image files. Scanned seismograms not only preserve the raw data but also allow the re-creation of the original analog waveform as a digital signal, which then allows the data to be digitally processed. Digitization is conducted by manually selecting points at the peak, trough, or a change of slope in the signal, or by a computer automatically identifying and tracing the signal and producing a digital waveform.

Figure 1. Examples of different types of analog seismograms. A) Photopaper, B) Microfilm, C) Printed seismogram from microfilm, D) Ink pen, E) Thermal paper (Photographs from Michigan State University archives).

Numerous projects around the world have started to preserve, scan, and digitize analog seismograms. However, the successful digital recovery and usage of these signals is dependent on the accurate and complete recovery of information from the analog seismogram. Unfortunately, there are no existing standards or recommendations on how to achieve the level of accuracy required, nor has 'the level of accuracy required' even been defined. For example, if a waveform is of high frequency and the known signal contains data up to 10 Hertz (Hz), there has been no published guidance on the reliable recovery of this data. The goal of digitizing seismograms is to recover as precise a waveform as possible, retaining the frequencies, amplitudes, and signal integrity, all while limiting any artifacts from arising in the digitization. That cannot be done unless the factors within the digitization process are fully understood. Kemerait et al. (1981) claim that "The [seismogram] database dependence is not only on the quantity and distribution of data but also on the quality of the data"; yet the seismological community, which openly shares data and data management practices, has surprisingly failed to set criteria for obtaining quality analog data digitizations that maintain both good signal quality and frequency response. This thesis examines variables within the digitization process to establish quality digitization practices for analog data and provides recommendations for digitization to the seismological community.

PREVIOUS WORK

Early Digitizations

Early digitizations of analog seismograms utilized the physical seismogram, where waveforms were digitized by hand either with a millimeter scale or on a digitization table. Seismograms were attached to a digitization table where a technician would manually select points with a stylus, or a puck, to recreate the seismic signal. An example of someone digitizing is displayed in Figure 2.
Digitization is completed by aligning a small crosshair, viewed through a magnifying glass in the puck, with the point of interest on the seismogram. Once aligned, the person digitizing defines the point, which is then translated onto a computer (Indian Institute of Science).

Figure 2. A) Photograph of a technician digitizing a map on an old-fashioned digitization table (photograph from Indian Institute of Science). B) Digitizer puck used for selecting points in the digitization (photograph from Eastmancuts, 2013).

Chiburis et al. (1980) digitized 35-mm and 70-mm film chip seismograms from the World-Wide Standard Seismograph Network (WWSSN) by hand on a digitization table. For digitization, the film chips were enlarged to produce a copy of the seismogram from which (x, y) coordinates were selected throughout the waveform. The authors initially considered an automated digitizer for this project but ultimately decided against it because the photos needed significant alteration and 'retouching' due to overlapping signals or variations in beam brightness. Coordinate points selected in the digitization process were interpolated using three different methods (four-point Lagrangian, ½ cosine, and ¼ cosine) to help recreate the sinusoidal shape of the waveform. Figure 3 illustrates a digitization with the interpolation techniques applied to the raw coordinate points from digitizing. The small dots illustrate the selected points during digitization. The letters denote the interpolation technique applied in the specific section of the waveform. Testing different methods ensured a suitable replica of the original waveform. By visual inspection, the original waveform was overlain by the interpolated result to compare the fit. If a particular section had noticeable discrepancies in amplitude or frequency recovery, it was sent back for revision.

Figure 3. Application of interpolation techniques for digitization points. Points (shown as small dots) were selected during digitization of a 70-mm film chip seismogram. Each letter signifies the interpolation function in that section of the digitization. A) continuous mode, and B) ½ cosine interpolation. (Figure from Chiburis et al., 1980).

Kemerait et al. (1981) recognized the usefulness of digitizing analog seismic data; however, they questioned how "good" the resulting digitizations were. They created a series of synthetic seismograms to help quantify the "goodness" of a digitized seismogram, without specifically isolating variables within the digitization process, such as sampling rate, DPI (the number of pixels per inch in an image), etc. The authors state that digitization is a complex process, with three identified sources of error. The first source of error is the inconsistency of a user's digitization experience. Second, in order to achieve a quality digitization, a program must have the ability to fit an appropriate curve to the data and apply correct interpolation methods. The final source of error in digitizing is the high potential for signal distortion. This can happen when a print from a film chip is made, or when a scan is magnified before digitization, both of which can create and/or amplify distortion. To examine a digitization's potential, Kemerait et al. (1981) used synthetic seismic data and hand-digitized analog records. The process for digitizing records was the same as that described in Chiburis et al. (1980).
Synthetic signals generated at 5 Hz were used to examine how well hand-digitizations of the same signals, completed by four individuals, reproduced the synthetics. The five signals were compared against one another for their frequency response. Figure 4 illustrates the signals, and the accompanying table describes the frequency response. The topmost waveform (a) is the synthetic waveform, followed by the hand-digitized signals (b-e). The frequency recovery between the four waveforms showed a strong correlation over the range of 0 – 3 Hz, with a value of 0.89. From this examination, the authors concluded that hand-digitizations do indeed produce adequate signals for data analysis.

Figure 4. Comparison of synthetic seismogram (a) with four hand-digitized analog records (b-e). The table below compares their frequency response between different frequency ranges (Figure from Kemerait et al., 1981).

James and Linde (1971) examined the range of WWSSN microfilm digitizations by digitizing one seismogram three times with the x-axis at different orientations. One digitization had the x-axis parallel to the trace, a second had the x-axis perpendicular to the seismogram drum axis, and a third had the x-axis oriented to the direction of the galvanometer swing. If inappropriately aligned in the digitization process, the resulting digitization would have significant skew (Figure 5). The first and second traces, which correlate to the first two digitization methods, display distortion in the waveform as compared to the third trace, where the x-axis was oriented to the direction of the galvanometer swing. In the third digitization method, the digitizing device was tilted at an angle similar to the swing of the galvanometer to limit the amount of distortion in the digitization. Singh (1983) compared digitized WWSSN microfilm chips to long period seismograms from the High Gain Long Period (HGLP) and Seismic Research Observatories (SRO) networks for studies on anisotropy. Paper records were created with a microfilm reader-printer, and seismic traces were digitized using a semi-automatic D-Mac digitizer, with coordinates selected along the trace. Some microfilm records were deemed unsuitable for digitization because the line was too thin or the traces overlapped, making them hard to follow in order to recreate the trace. Singh (1983) confirmed the same digitization alignment errors as James and Linde (1971): the x-axis of the digitizing device must be parallel to the swing of the galvanometer. Both Singh (1983) and James and Linde (1971) agreed that if the seismogram alignment is incorrect during digitization, the waveform will have significant errors and negatively impact geophysical studies.

Figure 5. Example of distortion in a digitized WWSSN microfilm chip. Each line shows a separate digitization method where the x-axis was oriented differently. Trace 1 was digitized with the x-axis parallel to the trace. Trace 2 was digitized with the x-axis perpendicular to the trace, and Trace 3 was digitized with the x-axis oriented in the direction of the galvanometer swing. (Figure from James and Linde, 1971).

Early digitizations would sometimes use printed copies of magnified film chip seismograms. With this method of digitization, there are multiple layers of distortion. First, film chips have a potential for optical lens focal point distortion from the camera that took the original photo of the seismogram.
Information away from the focal point may be rendered with a distorted perspective in the data. Second, the seismogram would then be magnified and printed for easier digitization, which is yet another potential source of distortion. In some instances, the magnified film chip copy would be scanned and sent to a user for digitization. A copy of a copy is several steps removed from the original source of the data and has a high potential for distortion and data misrepresentation in the digitization process. To limit these deformations, seismograms should be taken from storage, scanned at high resolution directly at the seismic network, and returned to storage for safekeeping. Scanning the original seismogram at full scale and without focal point distortion can greatly improve the chances of retaining the fidelity of the seismic information in the digitization.

Modern Digitizations

Modern seismogram digitization efforts utilize different processes, depending on the original recording media. For media such as paper or photographic film, the process uses a scanned image of the original waveform. The resulting digitization is done on a computer either manually, where an operator selects all points used to reconstruct the trace, or automatically, where the images are digitally processed by a computer algorithm to recognize and follow the trace. The automatic techniques typically require human oversight to correct any errors, such as adjusting the timing and correcting the trace. Magnetic tape is another analog medium containing recorded seismic data; however, techniques for recovering and digitizing data from this form of media are not discussed here. Much of the basic theory, such as the relationship between sample rate and frequency recovery, remains the same. Many digitization projects had to develop their own digitization software due to the lack of a 'one-size-fits-all' program capable of digitizing numerous types of analog media. Between 2005 and 2011, a large-scale digitization project between the Lamont-Doherty Earth Observatory of Columbia University (LDEO) and the Institute of Geophysical Research (IGR) in Kazakhstan digitized more than 6000 records of nuclear and chemical explosions in and around Central Asia (Sokolova, 2015). Technicians digitized photopaper seismograms using a software program developed by the California Institute of Technology, known as NXSCAN. NXSCAN is a semi-automatic digitization program that requires the use of a 1980s-era Sun Microsystems workstation (NXSCAN, 1992). Scanned seismograms were uploaded into the program, where a line-following algorithm digitized the waveform at a sampling rate of 40 samples per second. Digitized seismograms were utilized for multiple geophysical studies, such as regional travel-time curves and seismic attenuation of shear waves (e.g., Richards et al., 2015; Sokolova, 2015). Figure 7 illustrates a sample seismogram with three digitized components.

Figure 7. Digitized three-component seismogram showing a nuclear explosion. The analog seismogram was digitized using the semi-automatic program NXSCAN. Three components are shown for the station Ak-Kiya (AKK), in Kyrgyzstan (KG), with East-West as the top trace, North-South in the middle, and vertical as the bottom trace. (Figure from Richards et al., 2015).

The Berkeley Seismological Laboratory (BSL) started scanning their analog seismogram archive, consisting of over 1 million seismograms, in 2003.
Scientists and researchers within the lab understood the significance of digitizing analog records not only for the importance of the data, but also because scanning analog seismograms would further ensure the preservation and protection of the seismic data on the photographic and smoke paper records should they deteriorate. Researchers like Bromirski and Chuang (2003) attempted to use the older NXSCAN digitization software, but due to computer constraints the need for a new software system was realized. They developed a digitization software program called SeisDig. This new program utilized a MATLAB interface, which provided the flexibility of use on various types of computers. Using SeisDig, seismograms were digitized from 400 dots per inch (DPI) scans at a 4 sample per second sampling interval with a spline interpolation method (Bromirski and Chuang, 2003). In 2001, the Istituto Nazionale di Geofisica e Vulcanologia (INGV) in Italy initiated a Europe-wide project, Progetto SISMOS (SISMOgrammi Storici), to locate, scan, digitize, and archive historical analog seismograms. A collection of countries around the Mediterranean gathered records at their observatories and sent them to SISMOS for scanning and preservation. Historic seismograms were brought to a 'scanning laboratory' where high resolution scanners would scan the entire seismogram at 1016 DPI (Michelini et al., 2005). Some seismograms were scanned at lower image resolutions, such as 200 or 600 DPI, depending on the scanner used (Okal, 2015). A high resolution was recommended, however, because it prevented the loss of important seismic information. During SISMOS, a new program that vectorized seismograms, Teseo, later named Teseo2, was developed in 2005. Seismic signals were traced by connected vectors representing a piecewise cubic Bézier curve that recreated the signal. The program allowed for manual or automatic digitization of seismograms (Pintore et al., 2005). Ishii et al. (2015) at Harvard University began a large digitization project in which seismograms from the local Harvard station were scanned and digitized using a Harvard-developed semi-automatic digitization software program, DigitSeis. Upon digitization, the seismograms were uploaded to an online archive with seismic records dating back to 1933. Seismograms were scanned at image resolutions between 800 and 1200 DPI, depending on the scanner used. The authors explored higher DPI resolutions but were limited by the scanning time per image. For example, a 2400 DPI image took more than 30 minutes to scan, compared to only 5 minutes at 1200 DPI. Beyond scanning time, the authors gave no other justification for their DPI selection for DigitSeis. A comprehensive evaluation of DigitSeis completed by the author of this thesis is found in the Appendix, where synthetic seismograms were put through both a manual and an automatic digitization process and compared for their frequency recovery. Yu et al. (2017) described several Chinese analog seismogram digitization projects that digitized and cataloged hundreds of thousands of seismic records and maps. Varying image resolutions, from 300 to 600 DPI, were used in their digitization projects. The authors also investigated the effects of image resolution on file size. They concluded that if the resolution was too low, information would be lost, and if the resolution was too high, the file size would be too large.
They recommended an image resolution of 600 DPI to maintain a balance between frequency recovery and file size. However, this recommendation is dependent on a seismogram's recording speed, which was not described by the authors. Currently, Michigan State University (MSU) and the Geophysical Survey of Russia are collaborating on an ongoing effort to collect and digitize Peaceful Nuclear Explosion (PNE) analog seismograms from across the former Soviet Union. Because a majority of the detonation sites are in seismically stable regions of Russia, these analog records are of great interest to researchers. After locating and scanning the seismograms of interest, the seismograms were digitized in a manual digitization program developed by the Institute of Petroleum Geology and Geophysics in Novosibirsk, called Wavetrack. Mackey et al. (2009) observed differences in signal quality from scanned images. Several of the Russian-scanned seismograms were in black and white and showed low signal definition compared to grayscale seismograms scanned by Mackey, which showed better clarity of the seismic signal (Figure 8). If too much detail is lost, the seismogram becomes undigitizable because the signal is not detectable in the digitization process.

Figure 8. Comparison between scanned image color. The top image illustrates a black and white image, and the bottom is grayscale. Note the differences in the signal definition between the two images (Top photo: GSRAS, 2001; bottom photo: Mackey et al., 2009).

Other efforts at MSU showed the effect of skew on digitization. When a seismic event is close to the recording station, the recording media will contain both high signal frequencies and high amplitudes. This can create a problem in digitizing waveforms, as the signals can have noticeable skew if there is a slight angle between the horizontal and the waveform. If left uncorrected, the waveform will be tilted in the digitization. Factors that can introduce skewed waveforms into the digitization include misalignment of the recording pen to the recording media and the orientation of the paper relative to the scanner or photocopier. Figure 9 illustrates the effect of digitizing skewed waveforms. The top figure displays the signal leaning to the left, which is a direct result of a skewed signal. The misalignment of the seismogram as originally digitized was 0.17 degrees. To correct this situation, the scanned seismogram must be realigned in photo editing software so the waveforms digitized are orthogonal to the time axis. The bottom figure in Figure 9 shows the waveform corrected after image rotation (K. Mackey, personal communication).

Figure 9. Digitized section of a seismogram with visible skew. Top figure: digitized section of a waveform with left-tilting slopes. Bottom figure: corrected waveform post modification with skew removed (figure from K. Mackey, personal communication).

Summary

The process of digitizing analog seismograms has been an ongoing, evolving process for many decades. Due to the advancement of technology, digitization has progressed from using the physical seismogram or a copy on a digitization table to computer-based methods, like Wavetrack and DigitSeis, where the seismogram is digitized on a digital workstation from an optically scanned image. Upon review of previous digitization projects, one observation is clear: there are no standards for digitizing analog seismograms.
No prior discussions have occurred about the variables and parameters within the digitization process that affect the quality and accuracy of digitized analog data. Unexamined variables include sampling rate, interpolation method, the DPI needed to retain a certain frequency, recording rate, and waveform line thickness, among many others. The accurate use of digitized data is only as good as the digitization process. Standards need to be set, or at least discussed at length, for the seismological community to better utilize the unique datasets available from analog seismograms and to identify the limitations of previously produced digital datasets and studies based on them.

METHODOLOGY

This section describes the digitization process and post-processing used in this research and includes background as to why certain procedures were chosen. For this thesis, most of the research was conducted using the Wavetrack software. In this digitization process, a grid is overlaid on the seismic image, within which the trace of interest is digitized. The signal amplitude is on the x-axis and time is on the y-axis. Within the grid, there are horizontal lines that correlate to minute marks. These lines can be moved to accurately mark the beginning of each minute. In some seismograms that were recorded on a drum, the rotational speed was not always constant, causing the length of minutes and the corresponding time scale to be variable. The ability to individually adjust a variable timing grid largely corrects the variability of the time axis and thus retains accurate timing in the digitization. The rotational speed of seismic recorders varies due to both environmental factors and the mechanics of the recording system. Other digitization programs lack this feature of adjustable minute marks, which makes Wavetrack an optimal software choice. An example of the digitization grid in Wavetrack is shown in Figure 10, where the grid spans the entire length and width of the seismogram. This example shows a grid length of fourteen minutes and a width of 30 centimeters (cm) to accommodate seismograms with a 60 mm/minute recording speed. These dimensions are customizable to fit any sized seismogram and recording speed. The seismogram in this example has a width of 30 cm, and a template with this predetermined parameter ensures that the digitized amplitudes are faithfully preserved. Because some types of analog seismograms have the recording media wrapped around a rotating drum, the fifteenth minute is broken apart once the record is unrolled. Digitizing the broken minute is possible, but additional image processing and digitizing are necessary to accurately merge the data together. For example, a small section, or the entire seismogram, will need to be appended to the end of the original image to extend the trace of interest and ultimately the length of the digitization. Thorough image processing is required for this step to accurately align the original and appended images together. Once appended, a new Wavetrack grid can be put on the newly extended image and digitized as normal.

Figure 10. View within Wavetrack of a scanned seismogram and digitization grid, which is shown in red. Amplitude and time are retained in the grid to produce accurate digitizations. Amplitudes are measured on the x-axis and time is measured on the y-axis. The example seismogram is a Soviet seismogram where time runs right to left.
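Conceptually, the adjustable minute marks described above define a piecewise-linear mapping from pixel position along the time axis to record time. The following is a minimal sketch of that idea in Python, not Wavetrack's internal implementation, using hypothetical minute-mark pixel positions:

```python
import numpy as np

# Hypothetical minute-mark positions (pixels along the image's time axis),
# as placed by a technician; drum-speed variation makes the spacing uneven.
minute_marks_px = np.array([0.0, 1415.0, 2834.0, 4247.0, 5668.0])
minute_marks_s = np.arange(len(minute_marks_px)) * 60.0  # 0, 60, 120, ... s

def pixel_to_time(y_px):
    """Map a pixel position on the time axis to seconds by interpolating
    linearly between the adjacent minute marks."""
    return np.interp(y_px, minute_marks_px, minute_marks_s)

# A click point halfway between the first two marks maps to ~30 s,
# even though the drum ran slightly fast within that minute.
print(pixel_to_time(707.5))
```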
Within the digitization grid, the user recreates the signal by selecting points (or click points) along the trace. To accurately select click points, the user must be cognizant of the waveform thickness and exposure levels on the paper. Photographic seismograms have a light beam tracing the ground motion or signal velocity (or acceleration on a strong motion sensor) on photosensitive paper. For example, if the ground motion, or signal velocity, is fast, the light beam has less contact or exposure on the paper, resulting in a lighter and thinner trace on the seismogram. Likewise, slower velocity signals have a darker and thicker appearance on the seismogram due to the light beam having more exposure on the paper. These velocity changes can be observed, along with click points marking the peaks, troughs, and points where the slope changes within the signal, in Figure 11A. When choosing click points in the digitization process, the user must select the center points of the light beam trace and not the edges. Distortion will be created if edges are selected, due to beam focus or trace thickness. With slower trace velocities having a darker and potentially wider trace, it is best to select the center of these regions, as this will reflect the movement of the light beam more accurately. An example of accurately selected click points is shown in Figure 11B.

Figure 11. Signal velocity relates to the waveform's brightness. Areas where the trace velocity was slow, causing the light beam to have more exposure on the paper, appear as darker regions in the waveform. Lighter sections highlight areas where the trace velocity was fast (shown as orange arrows in 11A). The blue line in 11B shows click points selected at the center of the trace to mimic the true movement of the light beam.

The user must also be mindful to select points only at peaks and inflections along the trace. Wavetrack interprets these click points as a series of line segments and exports a linear fit to the digitization of the seismic waveform. As this creates an unnatural shape, post-digitization processing is needed. A curve fitting algorithm called the Piecewise Cubic Hermite Interpolating Polynomial, or PCHIP, is used to fit a realistic curve to the click points selected in the digitization process (Fritsch and Carlson, 1980). The waveform generated by the PCHIP algorithm passes through each click point and retains amplitudes; as such, this algorithm is the preferred way to reinterpret waveforms digitized by Wavetrack. An analysis of different curve fitting algorithms is presented later. The waveform is then exported at a sampling rate of 100 samples per second, resulting in a realistic digital waveform. Choosing additional click points is tempting for novice digitizers, but these extra click points do not recover the waveform as well and create artifacts with the curve fitting algorithm. A comparison between a section of a digitized seismogram with excess click points (A) and one with only points chosen at the peaks, troughs, and any changes of slope between the peaks (B) is displayed in Figure 12. Below the waveforms are the digitized signals with the PCHIP interpolation applied. Note how Waveform A, the signal with additional points, has a smoothed-rectangular shape compared to Waveform B, which has only the peaks chosen. The more sinusoidal shape of Waveform B is a more realistic seismic signal.
Yellow highlighted regions display sections of the waveform with noticeable differences in shape. Further discussion regarding the examination of different curve algorithms and the choice of digitization sample rates appears later in this section. Other digitization procedures that utilize automated routines or a different interpolation algorithm may need to approach point selection differently. The above description relates to the Wavetrack software and post-processing of waveforms in use here at MSU. However, having a thorough understanding of the steps of the digitization and post-processing of waveforms is the only way to accurately recover analog waveforms.

Figure 12. Comparison between a digitization with excess click points and one with only peaks chosen. In 12A, the blue digitized line illustrates click points following the line to recreate the signal, whereas 12B shows only the peaks selected. By choosing only peaks, a curve fitting algorithm can recreate a more accurate signal.

One major flaw of Wavetrack is that it does not retain the original click points of the digitization; as such, a multi-step post-digitization process is necessary to maintain an archive of original click points in the instance of data recovery. The post-digitization processor back-calculates the original user click points by finding the changes of slope in the digitization. A high digitization rate within Wavetrack allows for better point identification; the recovered points are then used by a curve fitting algorithm to recreate the waveform.

Interpolation Method

As previously discussed, the Wavetrack program exports linear interpolations of the signal between the chosen click points. The goal of digitizing analog seismograms is to recover the original waveform, and the only way to achieve this is to apply curve fitting algorithms to these discrete click points. Examining the effects of multiple interpolation methods in the frequency and time domains can help determine which curve fitting or smoothing algorithm best estimates the original signal (D. Burk, personal communication, 2020). We interpolated the original click points using three methods: 1) a cubic Hermite spline, 2) a cubic spline, and 3) the Piecewise Cubic Hermite Interpolating Polynomial (Fritsch and Carlson, 1980). There are other curve fitting interpolation methods; however, we chose to compare only these three, as they were readily available within Python's ObsPy. Interpolation methods determine how the signal is modelled and how well the waveform is retained in post-processing. Each interpolation method has its own unique way to 'draw' the signal. Figure 13 illustrates how each interpolation method is drawn over an even time interval on a continuous sine function (red line). The red points denote the discrete points within the waveform selected by a user. Each colored line is a different interpolation method. The spline function (pink) displays a symmetric curve closely matching the original sine function. The PCHIP interpolation (aqua), on the other hand, aligns closer to the linear interpolation (dark blue) and is asymmetric in shape. For this example, the spline function interpolates the signal better than the PCHIP. To examine the PCHIP and spline functions further, a basic step function was created and both interpolation methods were applied; a minimal sketch of this comparison is shown below.
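The behavior near a discontinuity can be reproduced with SciPy's standard interpolators; the following is a sketch of the comparison, not the exact test script used here:

```python
# Compare a cubic spline and PCHIP on a basic step function.
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# Guide points sampled from a step function with a jump near x = 1.
x = np.array([0.0, 0.5, 0.9, 1.1, 1.5, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

xs = np.linspace(0.0, 2.0, 201)
spline = CubicSpline(x, y)(xs)        # oscillates/overshoots near the jump
pchip = PchipInterpolator(x, y)(xs)   # shape-preserving: stays within [0, 1]

print("spline min/max:", spline.min(), spline.max())  # exceeds the 0..1 range
print("pchip  min/max:", pchip.min(), pchip.max())    # bounded by the data
```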
Figure 14 shows a step function (red line), points within the signal which guide the interpolation algorithm in recovering the original signal (red points), and the two interpolation methods (PCHIP in blue, spline in pink). Near the discontinuity at x = 1, the spline interpolation overshoots the amplitude of the step function, whereas the PCHIP function is constrained and follows closer to the original function. For this example, PCHIP interpolates the data better because it follows the original waveform closely.

Figure 13. Comparison of multiple curve fitting algorithms in the time domain. Red stars denote click points where there is a change of slope in the signal. The blue line represents a linear interpolation. The aqua line (PCHIP) and pink line (spline) highlight how each method estimates the curve (Octave Forge Community, 2017).

Figure 14. Step function with two curvilinear interpolation methods. A basic step function, outlined in red, with selected click points within the signal shown as red stars. The PCHIP interpolation method, shown in blue, preserves the shape of the original signal better than the spline interpolation, shown in pink (Octave Forge Community, 2017).

The click points from Wavetrack are not distributed evenly in time, which is not compatible with the desired digital processing. Figure 15 illustrates this effect. The blue dots are the click points chosen in Wavetrack, with different colored lines showing the interpolation methods. The click points are the true peaks in the waveform as well as areas of a change in slope. The original linear interpolation (red) from Wavetrack shows that it is not a good representation of the waveform. Both spline interpolations (blue and green) overshoot the peaks in the waveform, while the PCHIP waveform (gold) follows closer to the original linear output and does not exceed the true amplitudes of the waveform. An inset within Figure 15 shows a peak in the waveform illustrating the overshooting peaks from the spline interpolations. For Wavetrack's points chosen at uneven time intervals, the spline functions try to apply a symmetric curve fit to the waveform. In this example, the PCHIP interpolation method is preferred because it accommodates the uneven time points, which represent the true peaks and troughs of the waveform, without overshooting the amplitudes. In the examples shown, both spline and PCHIP have situations where one stands out over the other. The spline function works well for waveforms that have discrete points chosen at even time intervals, where the points are mere guidelines to recover the original signal. This is usually the case for automatic and semi-automatic digitization programs. The PCHIP method, on the other hand, excels in situations where digitization points represent the true peaks, which occur at uneven time intervals, as with our Wavetrack program. For seismograms digitized in this study, the PCHIP interpolation method was applied in the post-processing step.

Figure 15. Waveform with different interpolation methods applied. The blue circles are the click points chosen in the digitization process. Red is the linear output from the manual digitization program Wavetrack. The blue and green lines are spline interpolations, and the gold line is the PCHIP interpolation. The inset illustrates a zoomed-in peak of the waveform.
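Putting the pieces together, the post-processing described here fits PCHIP through the unevenly spaced click points and resamples the result onto an even time base. The following is a minimal sketch under those assumptions, with hypothetical click-point values; it is not the MSU post-processor itself:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical click points: times (s) at peaks and slope changes, amplitudes.
t_click = np.array([0.00, 0.21, 0.48, 0.80, 1.03, 1.37, 1.50])
a_click = np.array([0.0, 1.2, -0.9, 1.5, -1.1, 0.7, 0.0])

fs = 100.0                                    # target sampling rate (sps)
t_even = np.arange(t_click[0], t_click[-1], 1.0 / fs)
waveform = PchipInterpolator(t_click, a_click)(t_even)

# The fitted curve passes through every click point, so the digitized
# amplitudes are preserved while the output becomes evenly sampled.
```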
Seismogram Sample Rate

Sampling rate is defined as the number of samples per second in a digital waveform. For seismogram digitization, the continuous analog signal is converted into a series of discrete points, or samples, each representing a specific time and amplitude. If a high enough sampling rate is used, the complexity of the signal will be better recovered because more points are used to define the shape of the signal. Too low a sampling rate will yield a broader curve and potentially a loss of high-frequency components of the signal due to the limited points along the waveform. Figure 16 shows a wave sampled at various rates (samples shown as rectangles), with better signal recovery as the sampling rate increases. To ensure an accurate waveform recovery with the best signal retention, a high sampling rate is encouraged. The post-digitization processor that applies the PCHIP algorithm in our research also resamples the data at 100 samples per second. This rate yields a Nyquist frequency of 50 Hz, five times the 10 Hz upper limit of a typical short period seismogram response.

Figure 16. Sample rate, shown as rectangles, correlates to the recovery of the signal. A higher sampling rate will capture more complex frequencies and yield a better signal because there are more points along the line compared to lower sampling rates (Image modified from Brown, 2021).

Synthetic Seismogram Development and Variables Studied

The steps and information within the digitization process need to be accurate to allow reliable data processing. A series of tests was conducted that modified key variables in the digitization process; each change provided insight into how the frequency recovery of a digitized seismogram differed from the reference waveform. Analyzing the factors that influence the frequency recovery of a digitized seismogram allows researchers and scientists to better understand the digitization process and ultimately achieve accurate digitizations. Synthetic seismograms were generated to examine each digitization variable independently. The synthetic seismograms were generated using a Python script developed by D. Burk, in which a white noise signal is created with a known frequency range in both the displacement and velocity spectra. The program allows the amplitude, trace thickness, and trace velocity to be changed. Currently, the code does not account for variations of trace thickness as a function of trace velocity; this is a future modification needed to mimic the behavior of some analog recording media. Generated waveforms were saved as MiniSEED files with the network code, station name, location identifier, and channel embedded in the header. The embedded seismic information allowed the comparison of multiple waveforms. An example of a white noise displacement signal with known frequency content in the range of 1-12 Hz is illustrated in Figure 17. The MiniSEED displacement signal was used as the reference signal. To simulate a scanned analog seismogram, the displacement signal was drawn on a blank image (i.e., an empty seismogram scan with no waveforms) and then exported at 3000 DPI (see Figure 18). The reference signal did not go through the digitization process. IrfanView, an image processing program (Skiljan, 1996), was used to down-sample the original image to different image resolutions for each digitization test. Each image was then digitized in Wavetrack using the process described earlier.
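The exact generation script is not reproduced here; the following is a minimal sketch, assuming ObsPy, of how a band-limited white noise reference signal with embedded header codes might be produced and saved as MiniSEED (the network, station, and channel codes are placeholders):

```python
import numpy as np
from obspy import Trace, UTCDateTime

fs = 100.0                                   # samples per second
rng = np.random.default_rng(seed=0)
data = rng.normal(size=int(60 * fs))         # 60 s of white noise

tr = Trace(data=data.astype(np.float32))
tr.stats.sampling_rate = fs
tr.stats.starttime = UTCDateTime(2023, 1, 1)
tr.stats.network, tr.stats.station = "XX", "SYNTH"    # placeholder codes
tr.stats.location, tr.stats.channel = "00", "BHZ"

# Restrict the known frequency content to 1-12 Hz, as in the tests here.
tr.filter("bandpass", freqmin=1.0, freqmax=12.0, corners=4, zerophase=True)
tr.write("reference.mseed", format="MSEED")
```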
Figure 17. Example of a synthetic seismogram generated with a Python script. This white noise signal contains known frequencies of 1-12 Hz.

Figure 18. Synthetic white noise seismic signal embedded on a blank image. This image can now be digitized in Wavetrack. This is the same waveform as in Figure 17.

Four variables were examined in this thesis: 1) image resolution, 2) signal trace thickness, 3) waveform amplitude, and 4) technician variability. Three waveforms were created for each of the first three variables so that an average result could be computed for each test. Technician variability (or experience) was analyzed with a group of technicians digitizing a single waveform. For image resolution, the reference signal was converted to different image DPIs using IrfanView. Resolution extremes, both low and high, were chosen for this study to test the limits of seismogram digitization and the resulting frequency recovery. The impact of image resolution on the frequency recovery of a digitized analog seismogram was studied by digitizing each seismogram and then comparing it back to the original MiniSEED waveform, which did not go through the digitization process. Frequency recovery was assessed by visual inspection of Power Spectral Density (PSD) graphs, as sketched below. These graphs show the distribution of energy as a function of frequency and are helpful in understanding which frequencies are strong or weak in a waveform (Cygnus Research International, n.d.). The second variable tested was the waveform thickness, which was modified to various widths. The original waveform thickness for the synthetic seismograms was 10 pixels wide, which is represented as 10x. Raising or lowering this number changed the waveform thickness accordingly. For example, a 50x waveform uses a 50-pixel wide waveform thickness. To keep some of the digitization variables constant, the image resolution was set to 600 DPI and the trace velocity was kept at a constant exposure rate throughout the seismogram. Exposure rate is closely related to the trace velocity: it is the amount of time the recording element, for example a light beam, has in contact with the paper, tape, etc., which generates the waveform thickness on seismograms. More exposure yields a wider waveform thickness and relates to a slower trace velocity. A constant exposure rate was chosen to limit seismogram variables that influence seismogram digitization. Four waveforms were examined in the waveform thickness study. Figure 19 displays an example of a waveform for this test. Each waveform was generated four different times with various waveform thicknesses. Wider traces and high frequency signals have a high potential for overlap, causing a reduction in signal recovery.

Figure 19. Synthetic seismogram with varied waveform thicknesses. These seismograms were generated with a frequency range of 1-12 Hz at an image resolution of 600 DPI. The greater the trace thickness, the higher the probability of concealed high frequency signal.
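The PSD comparisons used in the tests that follow can be reproduced with standard tools; this is a minimal sketch using SciPy's Welch estimator with placeholder data, not the actual plotting code used in this thesis:

```python
import numpy as np
from scipy.signal import welch
import matplotlib.pyplot as plt

fs = 100.0  # both traces resampled to 100 samples per second

def plot_psd(trace, label):
    # Welch's method: averaged periodograms over overlapping segments.
    f, pxx = welch(trace, fs=fs, nperseg=1024)
    plt.semilogy(f, pxx, label=label)

# 'reference' and 'digitized' stand in for the synthetic MiniSEED signal
# and a digitized version of it, loaded as equal-rate NumPy arrays.
reference = np.random.default_rng(0).normal(size=6000)
digitized = reference + 0.1 * np.random.default_rng(1).normal(size=6000)

plot_psd(reference, "reference")
plot_psd(digitized, "digitized")
plt.xlabel("Frequency (Hz)"); plt.ylabel("PSD"); plt.legend(); plt.show()
```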
The third variable examined was how well a waveform's amplitude is recovered in the digitization process, depending on the amplitude of the signal. An original waveform with a pixel variation of +/- 825 pixels from a zero line was generated (waveform 1x in Figure 20). A multiplier was applied to this number, which either magnified or compressed the amplitudes of the waveform. If a number less than one was used for the multiplier, the amplitudes were compressed; however, multipliers of 0.5 or less produced severe amplitude compression and signal discretization, with the pixel spread only a few pixels wide. The 0.5x images were almost a straight line and were not used in this study. Examples of the amplitude-modified waveforms are illustrated in Figure 20.

Figure 20. Portion of a 600 DPI synthetic seismogram with varied amplitude heights, with white noise as the signal in a frequency range of 1-12 Hz.

The digitization method was the same in all tests; however, different people performed the digitization. Some seismograms were digitized by the author; this created some known bias while digitizing, as prior information was known about the individual tests and the reference waveforms. A 'blind' study was thus conducted utilizing over twenty technicians with various levels of digitization experience from independent organizations in Russia, Kazakhstan, and Kyrgyzstan. Each individual received one or two images for each test and was instructed to digitize the waveforms to the best of their ability. The individuals had no connection to each other, nor did they seek additional help with the digitization process. This 'blind' study group created a realistic situation where a research lab or institute digitizes analog seismograms. This group also established a way to quantify digitization experience (i.e., months digitizing and number of digitizations completed) against the quality (or frequency recovery) of a digitization. This is an important variable in digitizing seismograms, as it introduces human influence, and it is another variable tested in this study. In the discussion below, the digitizations are identified as either author digitized or blind digitized.

DISCUSSION OF RESULTS

Effects of DPI

Image Resolution

To achieve good results, the copy/scan of the analog data to be digitized must be of good quality. One element in generating high quality digitized data is a high-resolution image of the seismogram. An image with a higher DPI will have a higher pixel density and retain more detail of the original image, whereas a lower DPI image will have a lower pixel density and retain less detail. As the DPI decreases, the pixels become larger and coarser as they cover more area within the image, which ultimately decreases the confidence level of deciphering the contents of the image. To better visualize DPI uncertainty, consider a five-pointed star. As the individual pixels become coarser due to a decreasing DPI, the confidence in pinpointing each point of the star also decreases. Figure 21 illustrates how the accuracy of identifying the five points of the star decreases as the DPI decreases.

Figure 21. Relationship between image resolution and image detail. Uncertainty in identifying the points of the star increases, like the click points of seismograms, as the image resolution decreases. The pixels become larger and coarser as they cover more area of the image. (Image modified from Toskey, 2018).

Just as the star's five points are harder to identify as the DPI decreases, a similar result is noticeable in scanned seismograms. The image resolution significantly impacts the digitization quality, signal timing, and frequency response of a digitized waveform. In the digitization process, a user selects a peak or any point where there is a change of slope to recreate the signal.
These points become harder to identify as the DPI decreases. Figure 22 illustrates how a scanned seismogram appears at several image resolutions that are common in modern scanners. Areas that have a change in slope within the black signal become harder to distinguish as the pixels become larger due to a lower image resolution, and higher frequency signals become lost. The red box denotes a small area that is shown zoomed in in Figure 23. From afar, mid-range DPIs, like 300 and 400, may look reasonable to digitize. However, zooming in on the image reveals a 'fuzzier' picture, which makes identifying a slope change in the seismogram more difficult. This illustrates, from an image perspective, that lower and mid-range DPIs cannot accurately retain data for seismogram digitization.

Figure 22. Relationship of image resolution to details within a seismogram. Scanned seismogram at various image resolutions (DPIs). As the image resolution decreases, the pixels within the image become larger and fuzzier, which reduces the confidence in correctly identifying areas with a change of slope within the signal.

Figure 23. Zoomed-in section of a scanned seismogram. As the image resolution decreases, the pixels become larger, which reduces the confidence in correctly identifying areas with a change in slope within the signal; this is observed in the mid-range image resolutions like 300 and 400.

Each point within the digitization is a coordinate representing time on the x-axis and amplitude on the y-axis. The width of a pixel represents time, and as each pixel becomes larger due to lower image resolutions, the time per pixel also increases (shown in the table in Figure 24). A lower resolution image has the most time (seconds) per pixel, ∆t. For example, a 100 DPI image of a seismogram that was recorded at 60 mm/minute has a ∆t of 0.254 seconds/pixel, a large uncertainty in recovering the time in the digitization compared to a 3000 DPI image with a ∆t of 0.0085 seconds/pixel. Uncertainty will be higher with slower recording speeds (e.g., 30 mm/minute) and lower with faster recording speeds (e.g., 120 mm/minute). Low pixel densities force the click point selection of a signal peak to be either ahead of or behind the true peak, which creates a time shift. A waveform superimposed with a click point at a peak, shown as colored dots picked at numerous DPI values, is also shown in Figure 24. The dotted black line is the waveform of interest and is digitized, while the adjacent black lines are different waveforms in the seismogram. The colored dots are shown for the peak on the dotted black line. Each color represents a DPI value, and the accompanying table lists the ∆t values for each corresponding point. Color-coded error bars show the range of each DPI's click point; for low DPIs, the magnitude of the variability is the highest. The error bars were determined by the pixel width for a given DPI. Click points are an intersection of two pixels, and the error bar extent ranges over +/- one pixel intersection. By digitizing a seismogram at 100 DPI, a digitization's timing may be shifted by as much as +/- 0.25 seconds. In the end, this could affect seismological studies, such as seismic phase picks used in geophysical analysis. Errors in the time position of peaks will also result in asymmetric waveforms that do not accurately represent the original seismogram. Digital analysis of the asymmetric waveforms will have incorrect frequency content, not filter properly, and generally result in larger errors.
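The ∆t values in Figure 24 follow directly from the pixel size and the recording speed; a minimal sketch of the arithmetic:

```python
# Seconds of record time spanned by one pixel at a given scan resolution.
def time_per_pixel(dpi, speed_mm_per_min=60.0):
    pixel_mm = 25.4 / dpi                    # one pixel width in millimeters
    speed_mm_per_s = speed_mm_per_min / 60.0 # paper speed in mm/second
    return pixel_mm / speed_mm_per_s

for dpi in (100, 200, 300, 600, 1000, 1500, 3000):
    print(dpi, round(time_per_pixel(dpi), 4))  # reproduces the Figure 24 table
```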
Digital analysis of the asymmetric waveforms will have incorrect frequency content, not filter properly, and generally result in larger errors. Since seismology, as 53 well as other scientific fields, requires accurately timed data, digitizations should be conducted only using higher DPIs to retain correct timing within the seismogram. Ultimately, having better timing in data will produce better geophysical studies. 54 DPI Value Point Color ∆t (seconds/pixel) 100 Yellow 0.254 200 Green 0.127 300 Orange 0.0846 600 White 0.0423 1000 Pink 0.0254 1500 Blue 0.0169 3000 Red 0.0085 Figure 24. Changes in time per pixel (∆t) and Image Resolution (DPI) using an assumed recording velocity of 60 mm/minutes. Each DPI value is a different color for the peak location of the dotted black line. Its corresponding pixel ∆t value is shown in the table. Error bars show the range of the click point location for the peak at a given DPI. 55 Scan resolution also affects the frequency recovery of a signal. Low DPI values could force ‘off peak’ click point location selection within the digitization process. For example, if there is a combination of a thin trace thickness and a low scan resolution, the pixelization at the end of the line may join resulting in a distorted waveform. This causes a shift in the waveform peaks which transfers to distorted timing and overall shape of the waveform which then can result in erroneous frequencies. In Figure 25, a 2 Hz sine wave was generated at two different DPIs to illustrate how the waveform can become vertically distorted at high and low scan resolutions. This example demonstrates how the peaks become shifted if there is both a low DPI and a thin trace thickness (highlighted as red boxes in the figure). In the low DPI, the peak to peak spacing is variable compared to the higher DPI. There are sections that are wider and narrower due to where the pixels lie in the matrix in the low DPI image. If these peaks are selected during the digitization process, the resulting digitized waveform may have asymmetric signals and may recover incorrect frequencies from the original symmetric signal. 56 Low DPI High DPI Low DPI High DPI Figure 25. Spacing between the waveform peaks is shifted if there is a combination of low image resolution and a thin line thickness. Time is vertical. The red box highlights an area (shown below) of shifted pixels at the waveform peaks which create uneven spacing between the peaks. 57 Theoretical Sine Test This test established the theoretical frequency recovery of a signal at a given DPI. Synthetic sine waves were created at different DPI values and digitized in Wavetrack. A ‘perfect scenario’ was formed for digitizing as the synthetic waves limited some of the digitization variables such as a uniform amplitude and trace velocity. Each synthetic sine wave was generated from the product of the time interval (0.01 seconds), pi (π), and a multiplier. The base frequency of the sine wave was 0.5 Hz, the multiplier allowed the wave to easily transform into different frequencies. For example, using a multiplier of 2, the function would produce a 1 Hz sine wave. Each wave was exported at a specific DPI then digitized by the author in Wavetrack. Signal DPIs ranged from 72 DPI to 3000 DPI and a signal’s frequency ranged from 0.5 Hz to 12 Hz. The digitized waveforms were examined for shape retention and frequency recovery and grouped into three categories: recoverable, recoverable but distorted, and not recoverable. 
The digitized waveforms were examined for shape retention and frequency recovery and grouped into three categories: recoverable, recoverable but distorted, and not recoverable. Recoverable waveforms had no asymmetric signals and fully recovered the frequency. Recoverable but distorted waveforms showed some asymmetry but could still be digitized. Lastly, not recoverable waveforms were signals whose original frequency could not be recovered, either because the technician could not visually separate the signal from the background noise during digitization or because the digitized signal did not recover the original frequency. The categories are displayed in a matrix with respect to DPI in Table 1, which also lists common seismometer recording speeds. An example of a 2 Hz sine wave at numerous image resolutions is shown in Figure 26. An ideal 2 Hz signal would have a sharp peak at the 2 Hz line on the PSD graph (bolded in blue in Figure 26), where the x-axis is frequency and the y-axis is decibels. Scans at various DPI values were digitized and compared against the ideal waveform to assess their frequency recovery. If a signal shifts away from the ideal peak and has a broader crown in the PSD graph, the digitization has added noise in the system and returns a non-pure 2 Hz frequency. One possible source of added noise is asymmetric waves in the digitization. Distorted waveforms in the 200 and 300 DPI digitizations are highlighted as yellow boxes in Figure 26. The asymmetry can also be seen in the PSD graph, where the blue and green lines, corresponding to the 200 and 300 DPI digitizations, have broader peaks around the 2 Hz line. Relating back to the theoretical frequency recovery matrix, 200 and 300 DPI would be rated yellow, and anything above 300 DPI would be rated green.

Table 1. Theoretical recoverable frequencies at a specified DPI for different recording speeds. Green is defined as recoverable frequency, yellow describes recoverable but with distortion, and red denotes not recoverable.

Figure 26. Degradation of a 2 Hz sine wave at various image resolutions for a recording speed of 60 mm/minute. Digitizations are compared against a reference waveform for their appearance and their frequency response. Yellow regions highlight asymmetry in the digitizations due to low pixel densities. The Power Spectral Density (PSD) graph on the left shows the frequency recovery of the waveforms.

Technician Variability

Kemerait et al. (1981) claimed that inconsistencies in a user's digitization experience significantly impact the resulting digitization. This is an often unspoken variable that needs to be taken into consideration while digitizing analog seismograms. To quantify the variability in digitizations due to the operator, a separate mini study examined eleven technicians and their ability to duplicate a 1 – 12 Hz, 600 DPI waveform. This was a blind study in which each technician independently digitized the waveform. Each digitization was assigned a number and compared against the others and a reference waveform. From the PSD graph in Figure 27, there are three apparent groupings of technicians: needs revision, average, and excels. These groupings were based on the maximum recoverable frequencies determined in Figure 34 and explained in a later section. Technician 4 is in the 'needs revision' grouping, having recovered between 3 – 4 Hz, which is less than 50% of the expected frequency recovery for a 1 – 12 Hz, 600 DPI seismogram.
Supplemental training and revision of the digitization could improve technician 4's future digitizations. The 'average' grouping contains technicians 2, 3, 5, 6, and 10, all of whom recovered up to 6 – 8 Hz. These technicians recovered what is expected for a 600 DPI seismogram. Lastly, the 'excels' group contains technicians 7, 8, 9, 11, and 12, who recovered frequencies up to 8 – 9 Hz. This grouping surpassed what we expect a 600 DPI image to recover from a 1 – 12 Hz signal. Figure 28 highlights the digitizations in the time domain. Within the yellow regions, the differing amount of detail in the waveforms is apparent between technicians, which ultimately relates back to the recoverable frequency in the previous figure. For example, technicians 2 and 4 captured less detail than technicians 7 and 11.

Figure 27. PSD graph illustrating the eleven technicians' digitizations of a 600 DPI, 1 – 12 Hz waveform. Each technician's waveform was assigned a number and compared against the others and the reference. There are three groupings based on the frequency recovery of the digitizations: needs revision (technician 4, recovering 3 – 4 Hz), average (technicians 2, 3, 5, 6, and 10, recovering 6 – 8 Hz), and excels (technicians 7, 8, 9, 11, and 12, recovering 8 – 9 Hz).

Figure 28. Comparison of a small section of a 1 – 12 Hz, 600 DPI waveform digitized by eleven technicians. Each technician was assigned a number and compared against the reference. The yellow highlighted regions especially show variations in digitization detail. The waveform that the technicians digitized is shown below the digitizations.

The eleven technicians had various levels of experience (ranging from 0.5 years to 1.5 years) and had completed different quantities of digitizations (ranging from tens to hundreds). This group of technicians is experienced in seismogram digitization, and within the group there is no correlation between experience level and quality of digitization. One technician with among the highest levels of experience and completed digitizations fell into the 'average' category, while another technician with a low level of experience exceeded the expected frequency recovery of 600 DPI seismograms. This may reflect a technician's ability to understand seismograms. It could be that a technician simply does not see some of the high frequency signals superimposed on the lower frequency signals. Technicians need to grasp the nature of waveform mechanics, the influence of ground motion, and how it translates to a seismogram. This understanding, combined with adequate training, will result in better seismogram digitization. The time taken to digitize a waveform (or speed) was a factor that was not collected or examined in this study. A technician may have completed the task quickly and missed important information, while others took a slower, more methodical approach. This may influence the recoverability of a waveform's frequency content. Referring again to the statement of Kemerait et al. (1981), they mention that good digitizations stem from a user's digitization experience. While this statement is true and is observed in this study, a better definition of a user/technician's experience is a mixture of involvement (i.e., the number of digitizations and the overall time spent digitizing), the ability to understand waveforms, and the time taken to digitize a single waveform, all of which influence the resulting digitization.
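The groupings above come from comparing each digitization's PSD with the reference PSD. A hedged Python sketch of one way to automate that comparison with SciPy's Welch estimator follows; the 100 sps sample rate, segment length, and 3 dB departure threshold are illustrative assumptions, not the exact procedure used in this study.

```python
import numpy as np
from scipy.signal import welch

FS = 100.0  # assumed digitization sample rate (samples per second)

def psd_db(trace: np.ndarray):
    """Welch power spectral density in decibels."""
    freqs, power = welch(trace, fs=FS, nperseg=1024)
    return freqs, 10.0 * np.log10(power)

def max_recovered_hz(digitized: np.ndarray, reference: np.ndarray,
                     tol_db: float = 3.0) -> float:
    """Frequency at which the digitization first drops more than tol_db
    below the reference PSD; content below that frequency is treated as
    recovered."""
    freqs, ref_db = psd_db(reference)
    _, dig_db = psd_db(digitized)
    lost = np.nonzero(dig_db < ref_db - tol_db)[0]
    return float(freqs[lost[0]]) if lost.size else float(freqs[-1])
```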
The variability in digitization quality between different technicians is important in this study. To account for technician variation, each individual test in the following sections is evaluated using multiple independent, blind digitizations. The independent digitizations are then combined into an overall estimate of frequency recovery for the variable under each test.

Image Resolution Test

To better simulate analog seismograms, synthetic 'white noise' signals were generated with a known frequency range. The combination of a known frequency content and a reference signal made it easier to determine whether a waveform digitized at a certain DPI could recover the maximum frequency in the waveform. For this test, five image resolutions were used to evaluate the limitations of scan resolution on frequency recovery. The waveforms studied in this section assume a recording speed of 60 mm/minute. Faster recording speeds, such as 120 mm/minute, at the same DPI will have better recovery because there are more pixels per waveform cycle; slower recording speeds, such as 30 mm/minute, will have poorer recovery because there are fewer pixels per waveform cycle. Three white noise signals containing frequencies of 1 – 12 Hz were generated at image resolutions between 200 and 3000 DPI and were blind digitized at different institutions in Wavetrack; an example of one of the waveforms is shown in Figure 23. Each waveform was digitized independently by a different institution for statistical control. PSD graphs in Figures 29 through 33 illustrate the variability at each DPI and each technician's ability to recover the waveform. The red line signifies the reference waveform, the three yellow lines denote individual digitizations, and the black line is the average of the three digitizations, used as the overall estimate for the specific variable tested. For example, Figure 31 shows the frequency recovery for digitizations completed at 600 DPI: the individual trials indicate good recovery to 7 – 9 Hz, with the average around 8 Hz. The compiled waveforms for all DPIs are combined in Figure 34, with a different color for each DPI and the reference waveform shown in red. Color-coded labels indicate the maximum recoverable frequency for each image resolution. The maximum recoverable frequency was determined as the point where a line on the graph takes a sharp decline, which indicates that the digitization is no longer recovering that frequency content. As expected, low image resolutions limit the ability to accurately choose points in the digitization process and thus yield poorer frequency recovery. For example, the maximum frequency for a 200 DPI image is between 3 – 4 Hz, as shown in the PSD in Figure 34. This level of frequency recovery is insufficient for much of the quantitative work that could be done with digitized seismograms. Raising the image resolution to 300 DPI yields a frequency recovery between 5 – 6 Hz, and increasing it further to 600 DPI returns approximately 7 – 8 Hz. The only waveforms from which the full range of 12 Hz was recovered were the two highest image resolutions, 1500 and 3000 DPI. There is no meaningful difference in the results between the 1500 and 3000 DPI trials, indicating that scanning a seismogram at greater than 1500 DPI has no additional benefit.
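The compiled (black) curves above are simple averages of the independent digitizations, and the 'sharp decline' criterion can be expressed in the same framework. The fragment below is a rough sketch under assumed inputs (PSDs already in dB on a shared frequency axis); the 6 dB deficit threshold is an invented illustration of 'sharp decline', not the exact criterion used here.

```python
import numpy as np

def compile_and_rolloff(freqs, ref_db, dig_db_list, drop_db=6.0):
    """Average several digitizations' PSDs (dB) and flag where the
    compiled curve first falls more than drop_db below the reference."""
    compiled_db = np.mean(np.vstack(dig_db_list), axis=0)
    deficit = ref_db - compiled_db
    declined = np.nonzero(deficit > drop_db)[0]
    rolloff_hz = float(freqs[declined[0]]) if declined.size else float(freqs[-1])
    return compiled_db, rolloff_hz
```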
It is suggested that a minimum of 600 DPI for seismograms with a recording speed of 60 mm/minute will ensure sufficient frequency recovery for most geophysical analyses. For circumstances that require a higher image resolution (i.e., when the definition of the signal at 600 DPI still makes it difficult to accurately identify peaks), scanning the seismogram at 1500 DPI is recommended if data storage space permits.

Figure 29. PSD graph displaying the frequency recovery for a 200 DPI image of a signal with a frequency range of 1 – 12 Hz. The reference signal is in red, the yellow lines are the individual digitizations completed by independent technicians, and the black line is the combination of the three individual digitizations into an average for 200 DPI.

Figure 30. PSD graph displaying the frequency recovery for a 300 DPI image of a signal with a frequency range of 1 – 12 Hz. The reference signal is in red, the yellow lines are the individual digitizations completed by independent technicians, and the black line is the combination of the three individual digitizations into an average for 300 DPI.

Figure 31. Power Spectral Density (PSD) graph displaying the frequency recovery for a 600 DPI image of a signal with a frequency range of 1 – 12 Hz. The reference signal is in red, the yellow lines are the individual waveforms digitized by independent technicians, and the black line is the combination of the three individual digitizations into an average for 600 DPI.

Figure 32. Power Spectral Density (PSD) graph displaying the frequency recovery for a 1500 DPI image of a signal with a frequency range of 1 – 12 Hz. The reference signal is in red, the yellow lines are the individual waveforms digitized by independent technicians, and the black line is the combination of the three individual waveforms into an average for 1500 DPI.

Figure 33. Power Spectral Density (PSD) graph displaying the frequency recovery for a 3000 DPI image of a signal with a frequency range of 1 – 12 Hz. The reference signal is in red, the yellow lines are the individual waveforms digitized by independent technicians, and the black line is the combination of the three individual waveforms into an average for 3000 DPI.

Figure 34. Power Spectral Density (PSD) graph illustrating all of the compiled waveforms for each image resolution against a 1 – 12 Hz reference signal. Each colored line is the average for a given DPI.

Waveform Thickness Test

Waveform thickness can be defined as the width of the seismic trace on a seismogram. It may vary for each seismic station and component. Seismic signals can appear thicker for many reasons that differ among recording media. On photopaper, the light beam can be out of focus, causing the trace to appear fuzzy, or a very high frequency line noise signal (such as 50 or 60 Hz) can be overprinted onto the seismic signal; the very fast oscillation of the galvanometer essentially fattens the seismic trace at all points. This creates a 'buffer' surrounding the incoming ground motion data. The velocity of the recording pen (or light beam, stylus, etc., depending on the recording methodology) also influences the waveform thickness. The recording pen has less contact with the paper when it moves at a fast velocity, which results in a light and potentially thinner trace.
The recording pen has more contact with the paper when the incoming ground motion has a slower velocity, and also at peaks and troughs where there is a change of slope. Figure 35 presents an example of a photopaper seismogram in which each component has a different waveform thickness. With ink recording, the nib of the pen may be worn down, or a bit of fuzz may be stuck on the pen, both of which fatten the trace. A partially clogged pen may make a thinner than normal trace that is readable at low amplitudes but invisible at high amplitudes. A worn heat pen also produces a fatter than normal trace on thermal paper, as it contacts the paper over a larger area. The waveform thickness from heat pens is also affected by the temperature control of the pen: if the pen is too hot, the trace will be much thicker, while if it is too cool, the trace will be thin and faint. Additionally, if the thermal paper is not installed correctly and has areas that are not in contact with the underlying drum, the waveforms will be thicker. The underlying drum acts as a heat sink that pulls heat away from the paper; if the paper is not in contact with the drum, the heat stays in the paper, making a locally wider waveform. While digitizing, it is increasingly difficult with wider waveforms to notice and detect seismic signal within this 'buffer' because seismic signals may be hidden.

Figure 35. Analog photopaper seismogram displaying variations in signal waveform thickness. This seismogram exemplifies how variable the line thickness is due to the beam focus and changes in trace velocity. Each group of traces is a different component from the same seismogram.

Chiburis et al. (1980) documented these situations while digitizing their WWSSN seismograms and noted that waveform width is a major issue in the digitization process. They explain how an averaging effect takes place in a digitization when a program, or technician, follows the center of the waveform. By following the absolute center, the digitizer may miss points such as the true peak and trough of the signal hidden in a wider waveform. Figure 36 illustrates this concept: the circle on the far left denotes a galvanometer light beam, the dotted line depicts the true signal, the two outer solid lines represent the waveform extent, and the single solid line in the middle illustrates the center-averaged digitized signal. The extrema of the amplitude peaks are averaged away by a wide waveform width. This guarantees that the digitized line will not represent the true seismic information, which is problematic because it does not capture the true amplitude and nature of the waveform. With an unrealistic waveform produced by the averaging effect, it can be assumed that the frequency response of the waveform is also not faithful.

Figure 36. Trace width and the digitizing 'averaging' effect. The circle on the left denotes the galvanometer light width, the dotted line illustrates the true signal, the outer black lines show the extent of the amplitude, and the black center line is the digitized waveform computed by averaging the amplitude extrema. (Figure from Chiburis et al., 1980).
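To see how strongly center-following can mute peaks, the short Python demonstration below models a finite beam as a moving maximum/minimum envelope around a 3 Hz sine and traces the center of the resulting band; the beam half-width and signal frequency are invented for illustration only.

```python
import numpy as np

# Toy model of the Chiburis et al. (1980) 'averaging' effect: a beam of
# finite width blackens a band around the true signal, and picking the
# vertical center of that band clips the extrema.

dt = 0.001
t = np.arange(0.0, 2.0, dt)
true = np.sin(2 * np.pi * 3 * t)         # 3 Hz 'true' ground motion
half_width_s = 0.05                      # assumed beam half-width (seconds)
n = int(half_width_s / dt)

# Band envelope: local max/min of the signal within +/- half_width_s.
upper = np.array([true[max(0, i - n):i + n + 1].max() for i in range(t.size)])
lower = np.array([true[max(0, i - n):i + n + 1].min() for i in range(t.size)])
center = 0.5 * (upper + lower)           # what a center-following pick sees

print(f"true peak: {true.max():.3f}, center-averaged peak: {center.max():.3f}")
```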
To examine how the waveform thickness affects the digitization process and the overall frequency response, three synthetic seismograms were created with a frequency range of 1 – 12 Hz and a 600 DPI image resolution. The process of altering the waveforms was described previously in the Methodology section, with an example of one of the signals shown in Figure 19. These digitizations were completed as a blind digitization test, and a compiled digitization was created for each waveform thickness. As the waveform becomes fatter, the point where the change of slope occurs becomes harder to identify in the seismic signal, especially for high frequency signals. Most waveforms are dark in color, which increases the difficulty of identifying these points, as they tend to blend and become indistinguishable from adjacent waveforms. It is expected that as the waveform thickness becomes wider, the digitization quality will suffer and high frequency signals will not be recovered as well. The PSD graphs in Figure 37 illustrate the separate waveform thicknesses. Within each graph, the reference signal is red, the three individual digitizations are yellow, and the compiled digitization used as the estimate for that waveform thickness is black. The 1x and 10x waveforms show little variance between the technicians' digitizations, with an approximate frequency recovery of around 7 Hz; both thicknesses are narrow enough to easily identify peaks and areas with a change of slope. The 20x waveform reveals the most inconsistency between technicians. Two technicians recovered around 5 Hz, while the third recovered slightly more, around 8 Hz, possibly because that technician selected click points on the outer edge of the waveform. Doing so increases the digitized signal amplitudes, which may explain the observed bump in the PSD graph around 5 Hz. The bump is highlighted in yellow in the 20x PSD, and the averaged recovery for the 20x waveform is consequently around 7 Hz. Lastly, the 50x waveform has some variability, which is expected given the wider waveform thickness and the technician's ability to see the signal. Recoverable frequencies for this thickness ranged from 3 – 3.5 Hz. Figure 38 displays the effect of a technician choosing the outside edge of the trace during digitization, which correlates with the bump in the PSD graph in Figure 37. For example, when a light beam slows down at a peak or trough, the beam has more exposure in that area, causing a wider waveform thickness. To account for this, technicians should select the middle of this area to obtain the most accurate peak location. If a technician selects the outer edge as the peak location (i.e., the click point), the apparent amplitude of the signal increases. Technician 1 in Figure 38 illustrates this problem: areas of their digitization show peak locations that are too high and should be moved slightly inward toward the center of the beam, as Technicians 2 and 3 did. Some of the digitization points in Technician 1's waveform that demonstrate this problem are shown in yellow. Retaining accurate amplitude heights is vital for geophysical studies that utilize peak-to-peak signal measurements, such as earthquake-explosion discrimination studies.

Figure 37. Power Spectral Density (PSD) graphs illustrating the frequency recovery of each waveform thickness. The number in the upper left denotes the individual test. The red lines denote the reference signal, the yellow lines are the three individual digitizations, and the black line is the compiled final estimation digitization. The yellow box in the 20x PSD highlights a bump in one technician's recoverable frequency (see text for details).
Figure 38. Waveform comparison of three individuals who digitized a 1x amplitude waveform in Wavetrack. The blue line is the digitized trace. Technician 1 chose digitization point locations at the furthest edge of the waveform thickness (examples shown as yellow circles) more so than Technicians 2 and 3, who selected points in the middle.

Figure 39 shows the spectra of the final compiled waveforms from each waveform thickness test and the reference waveform on a PSD graph. Each color denotes a waveform thickness. From the PSDs, three waveform thicknesses (1x, 10x, and 20x) recover similar frequencies. The confidence that all three of these thicknesses return the same signal is low. The 20x waveform is slightly skewed toward a higher recoverable frequency because one technician selected click points on the outer edge of the waveform; this skewed the maximum recoverable frequency for the 20x waveform, making the average around 7 Hz. A more believable recovery for the 20x waveform, based on the maximum recovery of the other two digitizations in this trial, is around 5 Hz. The 50x waveform has low frequency recovery, since areas containing high frequency signal are simply lost within the wide waveform thickness. That said, the waveform thicknesses of the majority of the world's analog seismograms will typically fall between the 10x and 20x examples. Digitizing a seismogram within this range of waveform size results in data recovery to between 5 – 7 Hz, which is acceptable for most geophysical analyses. There are extreme cases of seismograms having thin and broad thicknesses (1x and 50x); in these situations, technicians should note the waveform thickness (especially for 50x waveforms) due to the potential for data recovery loss.

Figure 39. Summary of the frequency recovery of digitizations with varied waveform thicknesses on a Power Spectral Density (PSD) graph. Each colored line is the average for a given waveform thickness. All digitized seismograms were compared against a reference signal to characterize the recoverable frequencies for each digitization.

Amplitude Test

A waveform was generated with a frequency range of 1 – 12 Hz at various amplitudes, ranging from 1x to 50x of the base amplitude. The process of generating these waveforms was described in the Methodology section, and an example of each amplitude is shown in Figure 20. For this test, the seismograms were scanned at 600 DPI, and a final compiled waveform was used to show the 'average' of each individual variable's result. The variety of amplitudes allows an examination of how a waveform's amplitude influences frequency recovery. Figures 40 through 43 show PSD graphs for each amplitude test. The blue line is the reference waveform, the red lines are the three individual digitizations completed by independent technicians, and the green line is the compiled or 'averaged' digitization. The 1x waveform in Figure 40 shows the most inconsistency in signal recovery due to the technicians' differing observations of the compressed signal. Alternatively, the 50x waveform has the least variability because the enlarged signals better define the waveform and allow accurate points to be chosen at the peaks.

Figure 40. PSD of a 1 – 12 Hz, 600 DPI waveform with a 1x amplification multiplier. The blue is the reference, red is the individual trials, and green is the compiled 'averaged' waveform.

Figure 41. PSD of a 1 – 12 Hz, 600 DPI waveform with a 5x amplification multiplier.
The blue is the reference, red is the individual trials, and green is the compiled 'averaged' waveform.

Figure 42. PSD of a 1 – 12 Hz, 600 DPI waveform with a 20x amplification multiplier. The blue is the reference, red is the individual trials, and green is the compiled 'averaged' waveform.

Figure 43. PSD of a 1 – 12 Hz, 600 DPI waveform with a 50x amplification multiplier. The blue is the reference, red is the individual trials, and green is the compiled 'averaged' waveform.

Figure 44 illustrates the frequency recovery with respect to signal amplitude. Each colored line is the compiled 'averaged' PSD for an amplitude trial. The PSD graph demonstrates that amplitude does not significantly impact a waveform's frequency recovery. In this example, signals with amplitudes exceeding five times the thickness of the recording line (5x, 20x, and 50x) are within 3 dB of the reference waveform. A digitization being within 3 dB of the reference signal is a good indicator that it is of good quality and recovering accurate frequencies. Lower amplitude waveforms (like 1x) are still recoverable but need careful attention while digitizing. An increase in the 1x waveform is observed in the PSD. When amplitudes are compressed, correctly identifying and accurately selecting the true peaks and areas of a change in slope becomes increasingly difficult. A potential reason for the increase in the 1x waveform is that the technicians may have selected the outer edges of the waveform while digitizing (detailed in Figure 38). Selecting the outer edge of the waveform inflates the amplitude of the signal, which does not correlate with the original signal. From the author's personal experience with real-world seismograms, a signal amplitude between 1x and 5x of the trace thickness corresponds to general seismic background noise. Seismic events, depending on their magnitude, can show amplitudes well above the 20x example in this study. Higher amplitudes run the risk of the signal overlapping the adjacent trace or component, or going off the page entirely; many records were clipped for large earthquakes in the analog era. This potential problem increases the difficulty of accurately retaining amplitude locations. The chance of digitizing a higher amplitude event is high, and other factors, such as DPI and the focus of the signal beam, influence both the simplicity of digitization and the overall frequency recovery.

Figure 44. Power Spectral Density (PSD) graph demonstrating the frequency recovery of the various compiled 'averaged' signal amplitudes. Each colored line is a different amplitude level. The seismogram had an image resolution of 600 DPI and a frequency range of 1 – 12 Hz.

Co-located Stations

Two co-located seismic instruments at Ala-Archa (AAK), Kyrgyzstan, an analog SKM-3 short period sensor and a broadband STS-1 sensor, were compared against one another to examine the trustworthiness of digitized analog seismograms relative to digital data. The seismic data for this comparison came from a Chinese Lop Nor nuclear test event. A map of station AAK and the detonation site is shown in Figure 45. The analog data was digitized by the author, and the broadband data was downloaded from the Incorporated Research Institutions for Seismology (IRIS) digital seismogram database. Figure 46 displays a comparison between the co-located instruments in the frequency domain.
Within the PSD, the labeled low and high noise models are helpful in relating real-world data to the lower and upper bounds of seismic noise across the world's seismic stations. The PSDs of the co-located instruments match well between 0.5 – 5 Hz, as shown in the yellow highlighted region on the PSD. Within this region, both waveforms fall within 3 dB of each other; above 5 Hz, however, the broadband waveform takes a sharp decline. This roll-off is due to the application of a low-pass filter on the broadband digital station. Because of the low-pass filter, it is unknown whether the analog data continues to correlate with the digital data at higher frequencies. Another difference between the two waveforms is the sample rate: the digital waveform has a sample rate of 20 samples per second (sps), which is relatively low, although at the time of recording 20 sps was thought to be adequate, while the digitized waveform came from a 600 DPI scan with a 100 sps sample rate. Even with these differences, the frequency content of both waveforms matches well and shows that digitizations of analog seismograms agree with digitally acquired data. The data produced by digitizing is thus of high enough quality to be used in geophysical studies and processing.

Analog seismograms contain many variables that influence signal recovery. Variables such as signal amplitude and beam focus govern the ability to see the signal; these are limitations imposed by the station at the time of recording. Scanning the image at an acceptable image resolution improves the clarity of the signal so the technician can ensure an accurate signal recovery. In situations where the amplitude extends off the page or the ink pen runs out of ink, an increased image resolution will not improve the amount of data recovered, simply because these variables are out of the technician's control. If even one variable is impeded, data quality will suffer, and on an analog seismogram these variables can fluctuate between components. Digital data, by contrast, does not face the same challenges; modern digital stations must instead monitor their storage capacity and battery power to ensure quality data collection. Overall, digitization is a complex problem, and a balance between waveform thickness, signal amplitude, and image resolution is needed for signal retention.

Figure 45. Location of the station with co-located instruments relative to the seismic event at the Lop Nor Chinese Nuclear Test Site used for the comparison analysis. The blue triangle is the seismic station Ala-Archa (AAK), Kyrgyzstan, and the Lop Nor detonation site is the red triangle. Additional detonation sites for Soviet and United States nuclear tests are shown as red circles.

Figure 46. PSD comparison of a broadband STS-1 sensor (red) and an analog SKM-3 sensor (blue). The High and Low Noise Models illustrate the bounds of seismic noise relative to the world's seismic stations. The yellow region highlights the frequencies, between 0.5 – 5 Hz, where the two PSD curves fall within 3 dB of one another. The application of a low-pass filter on the broadband sensor causes the strong signal roll-off above 5 Hz.
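The comparison itself can be sketched in Python. The fragment below is a hedged illustration only: the two records have different sample rates (100 sps for the digitized analog trace, 20 sps for the broadband record), so each PSD is computed at its own rate and compared on a common frequency axis. The random traces are stand-ins for the real records, and instrument response removal is omitted.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
analog_100 = rng.standard_normal(60 * 100)   # placeholder trace, 100 sps
digital_20 = rng.standard_normal(60 * 20)    # placeholder trace, 20 sps

f_a, p_a = welch(analog_100, fs=100.0, nperseg=1024)
f_d, p_d = welch(digital_20, fs=20.0, nperseg=256)

# Interpolate the analog PSD onto the digital frequency axis and check
# the 0.5 - 5 Hz band for agreement within 3 dB.
band = (f_d >= 0.5) & (f_d <= 5.0)
delta_db = np.abs(10 * np.log10(np.interp(f_d, f_a, p_a))
                  - 10 * np.log10(p_d))
print("within 3 dB over 0.5 - 5 Hz:", bool(np.all(delta_db[band] <= 3.0)))
```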
CONCLUSIONS

The digitization of analog seismograms is a complex process, with many variables affecting the ability to recover the original analog waveform and represent it in digital form with minimal loss of information. Some variables in the process are within our control, such as scan resolution, while others are natural limitations of the original record, like waveform thickness and amplitude. Nevertheless, it is necessary to understand the overall effect of these variables in the digitization process, both to produce high quality digital waveforms and to understand the limitations in the process and the resulting data.

Scan DPI has a significant impact on the frequency recovery of digitizations: seismograms scanned at higher DPI yield higher recoverable frequencies in the digitization process. A resolution of at least 600 DPI is needed to achieve recoverable signal up to 8 Hz; if higher frequencies must be retained, a higher DPI will need to be used. Continual increases in scan resolution will not indefinitely improve frequency recovery, as evidenced by the lack of change in recovery between scan resolutions of 1500 and 3000 DPI in one of the tests.

Regarding waveform thickness, the thinner the trace, the easier it is to recover signal. Narrower waveform thicknesses allow easier selection of digitization points at the peaks and at areas where a change of slope occurs. If the signal is severely out of focus, frequency recovery is reduced, and maximum expected frequencies can range between 3 – 5 Hz. The majority of the world's seismograms will most likely fall between the 10x and 20x thicknesses described in this study; digitizing a seismogram within this waveform size will result in data recovery to between 5 – 7 Hz, which is acceptable for most geophysical analyses.

Signal amplitude does not have a significant influence on a waveform's frequency recovery for most seismograms. The true amplitudes of the signal are more easily chosen at higher amplitudes (like the 20x and 50x examples described) than at compressed amplitudes. Compressed amplitudes require additional attention because there is an increased chance that the technician may choose the outer edge of the signal beam and inflate the apparent amplitude of the signal.

A technician's ability to recover the frequency and amplitude is an important factor in digitization and signal recovery. Many facets influence accurate recovery, such as experience, understanding of waveform mechanics, speed or time taken to digitize, and care taken during digitization. In our group of technicians there was no correlation between digitization experience and digitization quality, because these technicians are all veterans of analog seismogram digitization; I think some separation would appear between a technician just starting out and one who has been digitizing for a long time. Speed is important because technicians may rush through a digitization and miss click points, although as a technician gains experience, they should digitize faster. Understanding of waveform mechanics and care while digitizing are additional considerations: some technicians do not fully understand what seismograms should look like or how ground motion translates to the record, and their digitizations suffer for it. Lastly, does the technician care what their digitization looks like? Technicians who do not put enough care or attention into their work produce poorer digitizations.

The process of digitizing analog seismograms is meaningful, and worldwide efforts from institutions are needed to save and preserve these historic seismic records. Analog seismograms contain vital information on large earthquakes and on nuclear testing. For our manual digitization program, a PCHIP interpolation with a high sampling rate is recommended because it maintains the original waveform's shape by utilizing the digitization points at each peak.
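As a small illustration of that recommendation, the Python sketch below resamples a handful of hypothetical click points with SciPy's PchipInterpolator; the pick coordinates are invented for the example, and the 100 sps output rate echoes the digitization sample rate mentioned earlier.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Invented click points: a real digitization supplies its own
# (time, amplitude) picks at each peak and change of slope.
t_clicks = np.array([0.00, 0.25, 0.50, 0.75, 1.00])   # seconds
amp_clicks = np.array([0.0, 1.0, 0.0, -1.0, 0.0])     # counts

pchip = PchipInterpolator(t_clicks, amp_clicks)

fs = 100.0                                  # output sample rate (sps)
t_out = np.arange(t_clicks[0], t_clicks[-1] + 1.0 / fs, 1.0 / fs)
trace = pchip(t_out)   # honors the picked extrema without overshoot
```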
The digitizations produced manually correlate well with digital data and are indeed usable for geophysical analysis.

The complexity of digitization is a factor not only of seismogram variables in the digitization process but also of human influences, or technician ability. If any one variable (DPI, signal amplitude, waveform thickness, or human influence) suffers, the data quality and the frequency recovery will be negatively affected. A balance between all of the influences is needed to achieve good digitizations, and all of the factors sit on a sliding scale of importance depending on the research and data requirements for post processing.

REFERENCES

Bromirski, P. D., & Chuang, S. (2003). SeisDig: Software to Digitize Analog Seismogram Images, User's Manual. Scripps Institution of Oceanography Technical Report. http://escholarship.org/uc/item/76b2m74m. 28 pp.

Chiburis, E. F., Ahner, R. O., & Reinhardt, E. C. (1980). Procedures for Digitizing Seismograms. Indian Harbour Beach, FL. 44 pp.

Cygnus Research International (n.d.). https://www.cygres.com/OcnPageE/Glosry/SpecE.html

Eastmancuts. (2013, February 6). Eastman-joining large pieces on digitizer. [Video]. YouTube. https://www.youtube.com/watch?v=jvUQ_whKje0

Fritsch, F. N., & Carlson, R. E. (1980). Monotone Piecewise Cubic Interpolation. SIAM Journal on Numerical Analysis. 17. p.238-246.

GSRAS. (2001). Calibration of the Seismic Stations of the Russian Academy of Sciences for CTBT Seismic Monitoring Purposes. Russian Academy of Sciences Geophysical Survey of Russia Technical Report. 30 pp.

Indian Institute of Science. (n.d.). Conducting a GIS Analysis. http://wgbis.ces.iisc.ernet.in/envis/Remote/section156.htm

Ishii, M., Ishii, H., Bernier, B., & Bulat, E. (2015). Efforts to Recover and Digitize Analog Seismograms from Harvard-Adam-Dziewoński Observatory. Seismological Research Letters. 86(1). p.255-261.

Kemerait, R. C., Kraft, G., Mott, J. S., & Dohner, E. (1981). A study of the hand-digitization process for digitizing short period seismic data. Indian Harbour Beach, FL. 38 pp.

Mackey, K. G., Hartse, H., & Fujita, K. (2009). Final Report: Analysis of Digitized Seismograms from Russian Geophysical Survey Stations of Soviet Peaceful Nuclear Explosions. (Report No. AFRL-RV-HA-TR-2009-0000). Michigan State University, East Lansing, MI. 111 pp.

Michelini, A., De Simoni, B., Amato, A., & Boschi, E. (2005). Collecting, Digitizing, and Distributing Historical Seismological Data. EOS. 86(28). p.261-266.

Octave Forge Community. (2017, January 2). Function reference: Interp1. https://octave.sourceforge.io/octave/function/interp1.html

Okal, E. (2015). Historical seismograms: Preserving an endangered species. GeoResJ. 6. p.53-64.

Pintore, S., Quintiliani, M., & Franceschi, D. (2005). Teseo: A vectoriser of historical seismograms. Computers & Geosciences. 31(10). p.1277-1285.

Skiljan, I. (1996). IrfanView. [Computer software]. https://www.irfanview.com/

Sokolova, I. (2015). Acoustic waves from atmospheric nuclear explosions recorded by infrasound and seismic stations of Kazakhstan. Poster T2.3-P3 presented at the CTBTO SnT 2015 Annual Meeting in Vienna, Austria.
https://www.ctbto.org/fileadmin/user_upload/SnT2015/SnT2015_Posters/T2.3-P3.pdf

Toskey, N. (2018, July 10). Image Resolution Explained [web log]. http://www.makingmediatoremember.com/learning/image-resolution-explained/

Yu, Z., Chaoyong, P., & Jiansi, Y. (2017). Historical Seismic Map Database and Sharing Platform. Seismic and Geomagnetic Observations and Research. 38(4). p.207-211.

Zhang, J., Song, X., Li, Y., Richards, P. G., Sun, X., & Waldhauser, F. (2005). Inner Core Differential Motion Confirmed by Earthquake Waveform Doublets. Science. 309(5739). p.1357-1360.

APPENDIX

One of the holy grails for the digitization of seismograms is the development of a fully automated routine that will generate high quality waveforms with minimal operator input. In this thesis, I investigated parameters in the process using manual digitization techniques. However, I also investigated the possibility of conducting this research using the Harvard University developed semi-automatic digitization program DigitSeis (Ishii et al., 2015). Although DigitSeis was not used in this research, I am providing an evaluation of the current state of the software (v1.5, 2020). Many of the variables discussed above are relevant to both manual and automated routines and affect the resulting digital waveforms.

DigitSeis is an image processing program in which a line is traced along the seismogram. The image is classified into three categories: noise, signal, and time marks. After careful identification by the user, the program digitizes the signal. If the original digitization is unsatisfactory, the user can manually go back, fix any data gaps or incorrectly traced signals, and re-digitize.

There are seven main steps to produce a digitized waveform in DigitSeis. The first is image processing and uploading of the image into the program. DigitSeis requires JPEG images with a white trace on a black background, whereas Wavetrack required BMP images. If an image does not have the required contrast, DigitSeis will adjust it automatically. Another part of preparing the image for digitization is cropping. Because DigitSeis computes the pixel matrix of the entire image, it is strongly recommended to crop the image before starting digitization, as this speeds up processing. The author found that cropping the image prior to importing it into DigitSeis was the best method, given the large image sizes and long processing times.

The next step is determining the minute marks. Retaining accurate timing in analog seismogram digitization is vital for geophysical studies. DigitSeis can account for numerous types of time marks, such as minute mark offsets, a few seconds of data that sit above or below the normal trace line, and no time marks at all. A physical measurement of the pixel length of the time mark is used for later classification.

Classification and digitization are the next two steps for digitizing analog seismograms in DigitSeis. DigitSeis has a three-class system in which it categorizes information into signal, noise, and time marks. Based on the prior pixel measurements of the time marks, DigitSeis automatically calculates and displays the classification of the entire seismogram. The classification can be edited and re-classified to obtain an optimal classification for digitization. Using the pre-determined classifications, DigitSeis digitizes the signal portion of the waveform.
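Returning to the first step, below is a hedged sketch of the image preparation described above using the Pillow imaging library: inversion to a white-on-black trace plus a pre-import crop. The file names, crop box, and JPEG quality are placeholders, and a real scan may also need contrast or threshold adjustment beyond simple inversion.

```python
from PIL import Image, ImageOps

scan = Image.open("seismogram_scan.bmp").convert("L")   # load as grayscale
scan = scan.crop((0, 0, 6000, 2000))    # crop to the traces of interest
inverted = ImageOps.invert(scan)        # dark trace on white -> white on black
inverted.save("seismogram_for_digitseis.jpg", quality=95)
```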
Once digitized, the user can manually edit the digitization by merging several traces into one continuous waveform, correcting any mis-digitized signals, and filling in any data gaps. If the original classification was unsatisfactory, the user can go back, edit the original classification, and re-digitize.

The last two steps are determining the time and exporting the data file as a SAC file. Each data line within the seismogram has two red bars that appear at either end of the digitization; the user must click those points and enter the date and time for that specific section of the seismogram. An obstacle at this step, and certainly a suggested area of improvement for DigitSeis, is that the user must physically click on the red line in order to enter the time. For the digitizations in this small study, the red lines sat at the edges of the viewing window, making it challenging to define the time for the digitization. A suggested improvement is to set a tolerance around the red lines to allow easier selection. After the timing is determined, the data is exported as a count of pixels in SAC format. The digitized data can then be viewed and processed on a computer.

This study assessed two synthetic white noise waveforms through the DigitSeis digitization process at 3000 DPI and compared the digitizations against reference signals. Each digitization was administered and edited by the author. One waveform was low frequency, with a range of 0.1 – 2 Hz, and the other was a higher frequency waveform, with a range of 1 – 12 Hz. For the 0.1 – 2 Hz waveform, DigitSeis classified the waveform into six different data segments. These separate traces are the dashed blue lines and the changes of signal color in Figure 47, while the yellow boxes denote data gaps within the digitization. Manual modifications were needed to fix the data gaps and to merge the traces back into one waveform. As the lower frequency waveform is less complex, DigitSeis succeeded in recovering the full signal up to 2 Hz. The green trace is the reference signal that did not go through the digitization process, and the red trace was produced through DigitSeis (Figure 48). The PSD graph illustrates that both traces recovered the same frequency content. There are subtle but detectable differences between the two, but overall the traces compare well.

Figure 47. Example of a digitized synthetic wave at 0.1 – 2 Hz in DigitSeis. The different colors of the waveform represent separate traces within the waveform. Yellow boxes denote data gaps in the digitization. Manual improvements are needed to fix the gaps and merge the traces.

Figure 48. Comparison between a low frequency signal (0.1 – 2 Hz) and its DigitSeis digitization. The green trace is the reference signal that did not go through the digitization process, while the red trace was digitized using DigitSeis. The digitized signal recovers the full 2 Hz signal, as seen in the PSD graph on the left.

Both Figures 47 and 48 provide credible evidence that DigitSeis can fully recover waveforms up to 2 Hz. However, a higher frequency waveform with a range of 1 – 12 Hz does not produce the same result. This waveform was difficult to digitize because of its complexity: DigitSeis classified the trace into 38 individual traces, unlike the lower frequency waveform, which had only six pieces (Figure 49).
Significant modifications were needed to make this waveform usable for scientific studies; left unedited, the waveform has significant data loss. The red box in Figure 49 illustrates how the digitized signal does not reach the full extent of the signal's amplitude, which is one source of error if left untouched. Another source of error is the 'averaging effect' of the algorithm within DigitSeis. For high frequency signals, the program cannot recognize and follow the trace, due to a combination of complex signals and a wider trace thickness, resulting in a muted waveform. It is suspected that DigitSeis considers the trace thickness when determining the trace during digitization; if a high frequency signal is present, the signal may be lost and not accurately recovered. This observation is similar to the remark from Chiburis et al. (1980), who noticed an 'averaging' effect in their WWSSN digitizations and found it to be a "poor representation of the original signal." For this high frequency waveform example, the entire waveform was manually modified to enhance the amplitudes and adjust any subdued signals.

Figure 49. Classification and initial digitization of a high frequency waveform in DigitSeis. A 1 – 12 Hz signal was digitized in DigitSeis, and the program broke the signal into 38 different traces, denoted by the blue dashed lines and the different colored signals. The yellow boxes denote data gaps within the waveform that need to be manually modified. The inset illustrates how the automatically digitized trace does not continue to the full extent of the waveform's amplitude.

Figure 50 shows how a high frequency waveform appears pre- and post-editing. The unedited signal is inconsistent with the reference signal and has severe loss of high frequency signal (shown in the yellow highlighted region). There are two ways to edit a digitization in DigitSeis. The first method is to re-classify the objects in the waveform and re-digitize it. The second method is more laborious: the user must manually comb through the digitization and correct any errors. Within the manual method, reference points are selected to guide DigitSeis in fitting a spline interpolation between the points. Since the automatic algorithm selects discrete time points in the digitization (unlike Wavetrack, where a point is a peak or change of slope), a spline is well suited to this digitization program. Adjusting one point affected the entire waveform in the editing window, so smaller working windows were suggested for editing; these smaller windows, in turn, increased the time spent in the digitization process. The time spent improving the waveform is necessary: the PSD graph in Figure 51 illustrates the frequency recovery of the waveform pre- and post-editing. The edited version of the waveform (gold) follows the reference signal (blue) closely compared to the unedited version (green). Also shown in the PSD, in red, is a manually digitized version of this waveform. The Wavetrack digitization fully recovers the 12 Hz signal, whereas the edited DigitSeis waveform recovers to just below that. I believe the difference between these waveforms stems from my fluency with each digitization program: I have over seven years' experience with Wavetrack and only a few months with DigitSeis.
To improve my results with DigitSeis, I could have spent more time updating and improving the waveform, but I am confident that DigitSeis returns a waveform comparable to Wavetrack's. The approximate digitization times for both programs, DigitSeis and Wavetrack, are shown in Table 2. Lower frequency signals take significantly less time to digitize in both programs than higher frequency seismograms. My final conclusion is that, overall, DigitSeis produces good quality digitizations; however, it struggles with complex waveforms. For a complex waveform, substantial time and effort are needed to produce a waveform sufficient for analysis. Wavetrack also produces good quality digitizations, and its digitization process is currently faster. For the complex waveform evaluated, it took approximately five times longer to achieve a quality waveform with DigitSeis than with Wavetrack, due to the tedious process of implementing manual corrections. It is important to note that the synthetic waveforms used to test DigitSeis were single traces; they did not have the adjacent waveform traces that are typical of a real-world seismogram. Adjacent traces crossing the waveform under digitization can interfere with the automatic digitization routine, requiring additional time for corrections. Both programs require significant training, but after an initial practice period a user will achieve quality digitizations, which is the goal for any digitization project (whether completed manually or automatically). DigitSeis is an open-source software program readily available for download, unlike Wavetrack, which at this time is used only within Michigan State University and its research collaborators in Russia, Kazakhstan, and Kyrgyzstan. In the end, program availability, the complexity of the waveform, and the time available for digitization should all be considered when digitizing analog seismograms.

Figure 50. Relationship between a reference waveform (green), an unedited digitization in DigitSeis (red), and a corrected digitization in DigitSeis (black). If left unedited, significant data loss can occur in the digitization; an example is shown in the yellow highlighted region. Significant manual modifications are needed to recover the lost data.

Figure 51. Power Spectral Density (PSD) graph illustrating the frequency recovery of a 1 – 12 Hz waveform that was digitized in two digitization programs: DigitSeis and Wavetrack.

Waveform                      Digitization Program    Total Digitization Time
Low frequency (0.1 – 2 Hz)    DigitSeis               ~ 1 hour
                              Wavetrack               ~ 1 hour
High frequency (1 – 12 Hz)    DigitSeis               ~ 8 – 10 hours
                              Wavetrack               ~ 1.5 – 2 hours

Table 2. Relative time taken to digitize a low and a high frequency signal in a manual digitization program, Wavetrack, and an automatic program, DigitSeis.