QUANTITATIVE METHODS FOR CALIBRATED SPATIAL  

MEASUREMENTS OF LARYNGEAL PHONATORY  

 

MECHANISMS  

By 

Hamzeh Ghasemzadeh 

A DISSERTATION 

Submitted to 

Michigan State University 

in partial fulfillment of the requirements 

for the degree of 

 

Communicative Sciences and Disorders - Doctor of Philosophy 

Computational Mathematics, Science and Engineering - Dual Major 

2020 

 

 

 

 

 

ABSTRACT 

QUANTITATIVE METHODS FOR CALIBRATED SPATIAL 

MEASUREMENTS OF LARYNGEAL PHONATORY 

MECHANISMS 

By 

Hamzeh Ghasemzadeh 

The ability to perform measurements is a cornerstone and a prerequisite of any quantitative
research. Measurements allow us to quantify the inputs and outputs of a system and then to express
their relationships using concise mathematical expressions and models. Those models would then
enable us to understand how a target system works and to predict its output for changes in the
system parameters. Conversely, models would enable us to determine the proper parameters of a
system for achieving a certain output. In the context of voice science research, variations in the
parameters of the phonatory system could be attributed to individual differences. Thus, accurate
models would enable us to account for individual differences during diagnosis and to make reliable
predictions about the likely outcome of different treatment options. Analysis of vocal fold
vibration using high-speed videoendoscopy (HSV) could be an ideal candidate for constructing
computational models. However, conventional images are not spatially calibrated and cannot be used
for absolute spatial measurements. This dissertation focuses on developing the methodologies
required for calibrated spatial measurements from in-vivo HSV recordings.

Specifically, two different approaches for calibrated horizontal measurements of HSV images are
presented. The first, called the indirect approach, is based on the registration of a specific
attribute of a common object (e.g., the size of a lesion) from a calibrated intraoperative still
image to its corresponding non-calibrated in-vivo HSV recording. This approach does not require
specialized instruments and can be implemented in many clinical settings. However, its validity
depends on a couple of assumptions, and violation of those assumptions could lead to significant
measurement errors. The second, called the direct approach, is based on a laser-projection flexible
fiberoptic endoscope. This approach would enable us to make accurate calibrated spatial
measurements. This dissertation evaluates the accuracy of the first approach indirectly, by
studying its underlying fundamental assumptions. The accuracy of the second approach, in contrast,
is evaluated directly, using benchtop experiments with different surfaces, different working
distances, and different imaging angles. The main contributions of this dissertation are the
following: (1) a formal treatment of indirect horizontal calibration is presented, and the
assumptions governing its validity and reliability are discussed. A battery of tests is presented
that can indirectly assess the validity of those assumptions in laryngeal imaging applications;
(2) pre- and post-surgery recordings from patients with vocal fold mass lesions are used as a
testbench for the developed indirect calibration approach. In that regard, a full solution is
developed for measuring the calibrated velocity of the vocal folds. The developed solution is then
used to investigate post-surgery changes in the closing velocity of the vocal folds of patients
with vocal fold mass lesions; (3) a method for calibrated vertical measurement from a
laser-projection fiberoptic flexible endoscope is developed. The developed method is evaluated at
different working distances, different imaging angles, and on a 3D surface; (4) a detailed analysis
of the non-linear image distortion of a fiberoptic flexible endoscope is presented. The effect of
the imaging angle and the spatial location of an object on the magnitude of that distortion is
studied and quantified; (5) a method for calibrated horizontal measurement from a laser-projection
fiberoptic flexible endoscope is developed. The developed method is evaluated at different working
distances, different imaging angles, and on a 3D surface.

 

 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

I would like to dedicate this dissertation to the sparking light of my heart, Marjan, who has 
always been a true source of strength, passion, and support for me. The peace and comfort that I 
have found in her, have always given me the power to overcome all barriers and difficulties that I 
have been facing. 

I would like to dedicate this dissertation to my lovely parents, whom I haven’t seen for a long 
time, and miss deeply. 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 

ACKNOWLEDGEMENTS 

I would like to express my gratitude to all who have contributed to my academic and personal
development. First and foremost, I am deeply grateful to my Ph.D. advisor, Dr. Dimitar Deliyski.
He helped me make a smooth and pleasant transition from engineering into science. I appreciate the
flexibility that he offered and his welcoming attitude toward new ideas. His approach gave me the
confidence to combine my engineering skills with the scientific knowledge that I learned. Without
his help, I would not feel confident starting my independent scholarship and academic career. I
would also like to express my great appreciation to my committee members, Dr. Eric Hunter, Dr. Adam
Alessio, Dr. Maryam Naghibolhosseini, and Dr. Dirk Colbry, for their persistent help and guidance.

This dissertation was partially supported by the Michigan State University Foundation, the Council
of Academic Programs in Communication Sciences and Disorders (CAPCSD) 2020 Ph.D. Scholarship, and
the National Institutes of Health (NIH) - National Institute on Deafness and Other Communication
Disorders (grants: R01 DC017923, R01 DC007640, P50 DC01546).

 

TABLE OF CONTENTS 

 
 
LIST OF TABLES ........................................................................................................................ x 

LIST OF FIGURES ................................................................................................................... xiii 

CHAPTER 1: INTRODUCTION ................................................................................................ 1 
1.1. Background .......................................................................................................................... 1 
1.2. Significance and rationale .................................................................................................... 9 
1.3. Structure of the dissertation and the research questions .................................................... 14 
1.4. Recordings setup and characteristics .................................................................................. 24 
1.4.1. Benchtop recording setup ............................................................................................ 24 
1.4.2. Recording protocol ...................................................................................................... 25 

CHAPTER 2: INDIRECT HORIZONTAL CALIBRATION OF IN-VIVO HSV 
RECORDINGS ........................................................................................................................... 28 
2.1. Introduction ........................................................................................................................ 29 
2.2. Aim and hypothesis ............................................................................................................ 32 
2.3. Material and method ........................................................................................................... 33 
2.3.1. Participants and data acquisition ................................................................................. 33 
2.3.2. Indirect calibration principles and assumptions .......................................................... 34 
2.3.2.1. Indirect calibration for between-subject size comparison .................................... 35 
2.3.2.2. Indirect calibration for within-subject size comparison ........................................ 37 
2.3.3. Evaluation of indirect calibration ................................................................................ 40 
2.3.3.1. Registration uncertainty test ................................................................................. 41 
2.4. Experiments and results ..................................................................................................... 42 
2.4.1. Experiment 1: Efficacy of registration uncertainty test ............................................... 43 
2.4.1.1. Database ................................................................................................................ 43 
2.4.1.2. Method .................................................................................................................. 43 
2.4.1.3. Results ................................................................................................................... 44 
2.4.2. Experiment 2: Effect of phonatory configuration on the calibrated length ................. 45 
2.4.2.1. Experiment 2.a: Vocal fold length attributes ........................................................ 46 
              2.4.2.1.1. Database .......................................................................................................... 46 
              2.4.2.1.2. Method ............................................................................................................ 47 
              2.4.2.1.3. Results ............................................................................................................. 48 
2.4.2.2. Experiment 2.b: Vocal fold width ......................................................................... 49 
              2.4.2.2.1. Database .......................................................................................................... 49 
              2.4.2.2.2. Method ............................................................................................................ 49 
              2.4.2.2.3. Results ............................................................................................................. 49 
2.4.2.3. Experiment 2.c: Blood vessel on a vocal fold....................................................... 52 
              2.4.2.3.1. Database .......................................................................................................... 52 
              2.4.2.3.2. Method ............................................................................................................ 53 
              2.4.2.3.3. Results ............................................................................................................. 53 
2.4.2.4. Experiment 2.d: Blood vessel on a nearby tissue ................................................. 54 

 

              2.4.2.4.1. Database .......................................................................................................... 54 
              2.4.2.4.2. Method ............................................................................................................ 54 
              2.4.2.4.3. Results ............................................................................................................. 55 
2.4.3. Experiment 3: Selecting the most suitable common attribute ..................................... 56 
2.4.3.1. Experiment 3a: Registration uncertainty of different common attributes ............ 56 
2.4.3.2. Experiment 3b: Size consistency of different common attributes ......................... 57 
              2.4.3.2.1. Method ............................................................................................................ 57 
              2.4.3.2.2. Results ............................................................................................................. 58 
2.5. Discussions ......................................................................................................................... 59 
2.6. Conclusions ........................................................................................................................ 63 

CHAPTER 3: APPLICATION OF INDIRECT HORIZONTAL CALIBRATION TO 
KINEMATIC MEASUREMENTS FROM IN-VIVO HSV RECORDINGS ....................... 65 
3.1. Introduction ........................................................................................................................ 66 
3.2. Aim and hypothesis ............................................................................................................ 67 
3.3. Material and Method .......................................................................................................... 71 
3.3.1. Participants and data acquisition ................................................................................. 71 
3.3.2. Approach and measurements ....................................................................................... 73 
3.3.2.1. Temporal segmentation ......................................................................................... 73 
3.3.2.2. Motion compensation............................................................................................ 75 
3.3.2.3. Rotation correction ................................................................................................ 77 
3.3.2.4. Spatial segmentation ............................................................................................. 80 
3.3.2.5. Horizontal calibration ........................................................................................... 84 
3.3.2.6. Velocity measurements ......................................................................................... 85 
3.4. Experiments and results ..................................................................................................... 87 
3.4.1. Experiment 1: Post-surgery changes in closing velocity ............................................. 88 
3.4.2. Experiment 2: Post-surgery similarity between the two vocal folds ........................... 93 
3.4.3. Experiment 3: Effect of lesion size on post-surgery changes ...................................... 95 
3.5. Discussions ......................................................................................................................... 97 
3.6. Conclusions ...................................................................................................................... 100 

CHAPTER 4: DIRECT VERTICAL CALIBRATION OF HSV RECORDINGS ............ 102 
4.1. Introduction ...................................................................................................................... 103 
4.2. Aim and hypothesis .......................................................................................................... 109 
4.3. Material and method ......................................................................................................... 110 
4.3.1. Laser-projection endoscope ....................................................................................... 110 
4.3.2. Calibration protocol and recordings .......................................................................... 111 
4.3.3. Measuring vertical distance ....................................................................................... 114 
4.3.3.1. Compensating for the lens-coupler parameters ................................................... 114 
              4.3.3.1.1. Recording model ........................................................................................... 115 
              4.3.3.1.2. Automatic estimation of the mapping ........................................................... 117 
4.3.3.2. Algorithm for distance estimation ...................................................................... 120 
              4.3.3.2.1. Automatic detection of laser points .............................................................. 120 
              4.3.3.2.2. Vertical distance decoding ............................................................................ 122 
4.4. Experiments and results ................................................................................................... 124 
4.4.1. Experiment 1: Evaluation of preprocessing components .......................................... 124 

 

4.4.1.1. Experiment 1a: Evaluation of FOV and the fiducial finder modules ................. 125 
4.4.1.2. Experiment 1b: Evaluation of the laser finder module ....................................... 125 
4.4.2. Experiment 2: Displacement analysis and vertical resolution of the system ............ 127 
4.4.3. Experiment 3: Evaluation of vertical distance measurements ................................... 129 
4.5. Discussions ....................................................................................................................... 132 
4.6. Conclusion ........................................................................................................................ 135 

CHAPTER 5: NON-LINEAR IMAGE DISTORTIONS IN FLEXIBLE FIBEROPTIC 
ENDOSCOPES ......................................................................................................................... 136 
5.1. Introduction ...................................................................................................................... 137 
5.2. Aim and hypothesis .......................................................................................................... 140 
5.3. Optical principles of image formation ............................................................................. 141 
5.4. Material and method ......................................................................................................... 145 
5.4.1. Recording instrumentation and setup ........................................................................ 145 
5.4.2. Datasets ...................................................................................................................... 146 
5.4.3. Automatic detection of grid lines .............................................................................. 148 
5.4.4. Pixel size .................................................................................................................... 149 
5.5. Experiments and results ................................................................................................... 150 
5.5.1. Experiment 1: Differences between grid sizes .......................................................... 150 
5.5.2. Experiment 2: Effect of spatial location .................................................................... 152 
5.5.3. Experiment 3: Effect of the tilting angle ................................................................... 157 
5.6. Discussions ....................................................................................................................... 164 
5.7. Conclusions ...................................................................................................................... 167 

CHAPTER 6: DIRECT HORIZONTAL CALIBRATION OF HSV RECORDINGS ...... 169 
6.1. Introduction ...................................................................................................................... 170 
6.2. Aim and hypothesis .......................................................................................................... 173 
6.3. Material and method ......................................................................................................... 174 
6.3.1. Datasets ...................................................................................................................... 176 
6.3.2. Segmentation and preprocessing ............................................................................... 178 
6.3.3. Horizontal calibration method ................................................................................... 180 
6.3.4. Horizontal measurement method ............................................................................... 182 
6.3.5. Estimation of the working distance ........................................................................... 183 
6.4. Experiments and results ................................................................................................... 186 
6.4.1. Experiment 1: Accuracy of vertical measurements ................................................... 186 
6.4.2. Experiment 2: Performance of radial horizontal measurements ............................... 187 
6.4.3. Experiment 3: Performance of central angle estimation ........................................... 190 
6.4.4. Experiment 4: Performance of general horizontal measurements ............................. 191 
6.5. Discussion ......................................................................................................................... 194 
6.6. Conclusion ........................................................................................................................ 196 

CHAPTER 7: VALIDITY AND ACCURACY OF HORIZONTAL AND VERTICAL 
MEASUREMENTS BASED ON DIRECT CALIBRATION ............................................... 198 
7.1. Introduction ...................................................................................................................... 199 
7.2. Aim and hypothesis .......................................................................................................... 201 
7.3. Material and method ......................................................................................................... 203 

 

7.3.1. Material and method for the effect of the imaging angle .......................................... 203 
7.3.1.1. Data acquisition .................................................................................................. 203 
7.3.1.2. Database .............................................................................................................. 204 
              7.3.1.2.1. Database for vertical measurements ............................................................. 206 
              7.3.1.2.2. Database for horizontal measurements ......................................................... 208 
7.3.1.3. Analysis and measurements from a tilted surface ............................................... 210 
              7.3.1.3.1. Vertical measurements from a tilted surface ................................................ 210 
              7.3.1.3.2. Horizontal measurements from a tilted surface ............................................ 213 
7.3.2. Material and method for the effect of the 3D surface ................................................ 213 
7.3.2.1. Data acquisition .................................................................................................. 213 
7.3.2.2. Analysis and measurements from a 3D surface .................................................. 215 
              7.3.2.2.1. Vertical measurements from a 3D surface .................................................... 215 
              7.3.2.2.2. Horizontal measurements from a 3D surface ................................................ 217 
7.4. Experiments and results ................................................................................................... 218 
7.4.1. Experiment 1: Effect of the imaging angle ................................................................ 218 
7.4.1.1. Experiment 1a: Effect of imaging angle on calibrated vertical measurements ... 219 
7.4.1.2. Experiment 1b: Effect of imaging angle on calibrated horizontal measurements .. 222 
7.4.2. Experiment 2: Effect of a 3D surface ......................................................................... 225 
7.4.2.1. Experiment 2a: Effect of a 3D surface on calibrated vertical measurements ...... 225 
7.4.2.2. Experiment 2b: Effect of a 3D surface on calibrated horizontal measurements .. 228 
7.5. Discussions ....................................................................................................................... 230 
7.6. Conclusions ...................................................................................................................... 235 

CHAPTER 8: SUMMARY OF THE FINDINGS .................................................................. 236 
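8.1. Specific contributions of each dissertation chapter .......................................................... 238 
8.2. Directions for further investigations ................................................................................. 242 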

REFERENCES .......................................................................................................................... 245 
 
 

 

LIST OF TABLES 

Table 1.1. Summary of different chapters of the dissertation. .......................................................23 

 
 

Table 2.1. Descriptive statistics of intra-sample registration variability. ......................................45 

Table 2.2. Descriptive statistics of the mm size of attributes of vocal fold length. .......................48 

Table 2.3. Descriptive statistics of the mm width of the vocal fold. ..............................................52 

Table 2.4. Descriptive statistics of the mm size of attributes of a blood vessel on the vocal fold.
 ......................................................................................................................................54 

Table 2.5. Descriptive statistics of the mm size of attributes of the blood vessel on a nearby tissue.
 ......................................................................................................................................55 

Table 2.6. Descriptive statistics of registration uncertainty for different selections of the common 
attribute. ........................................................................................................................56 

Table 2.7. Individual differences in registration uncertainty of each common attribute. ..............57 

Table 2.8. Descriptive statistics of γ for different selections of the common attribute. ................58 

Table 2.9. Individual trends regarding the size consistency of different common attributes. .......59 

Table 2.10. Comparing suitability of different common attributes for indirect calibration of vocal 
folds. .............................................................................................................................62 

Table 3.1. Demographic and diagnosis information of the included subjects. ..............................73 

Table 3.2. Descriptive statistics of closing velocity at different scanning lines (mean±std). ........91 

Table 3.3. Results of the paired-sample t-test for the closing velocity at different scanning lines.
 ......................................................................................................................................91 

Table 3.4. Descriptive statistics of closing velocity at different scanning lines (mean±std). ........92 

Table 3.5. Results of the paired-sample t-test for the closing velocity of the vocal fold with a lesion 
at different scanning lines. ............................................................................................92 

Table 3.6. Results of the paired-sample t-test for the closing velocity of the vocal fold without a 
lesion at different scanning lines...................................................................................93 

Table 3.7. Results of paired-sample t-test for pre- and post-surgery recordings. ..........................94 

 

Table 3.8. Correlation between post-surgery changes in the closing velocity and the area of the 
lesion. ............................................................................................................................97 

Table 4.1. Literature-based taxonomy of different imaging systems with laser projection. These 
abbreviations were used in the table: VSB (videostroboscopy), HSV (high-speed 
videoendoscopy), 3D (three-dimensional reconstruction), nm (nanometer), mW 
(milliwatt). .................................................................................................................108 

Table 4.2. Statistics of the measurement error. All measurements have the unit of mm and the 
number in parentheses signifies the number of functions that were used in the 
measurements. .............................................................................................................131 

Table 4.3. Results of correlation test for vertical measurement errors. The symbol ε means 
p<0.00001. ..................................................................................................................132 

Table 5.1. Actual values of working distance and tilting angle for each target group. The first 
number represents the actual working distance in mm, and the second number the actual 
tilting angle in degrees. ..............................................................................................147 

Table 5.2. Results of 2×2 robust ANOVA. ..................................................................................151 

Table 5.3. Descriptive statistics of pixel sizes. ............................................................................152 

Table 5.4. Results of 2×4 robust ANOVA. ..................................................................................153 

Table 5.5. Estimated values of pixel size. ....................................................................................156 

Table 5.6. Results of 7×4×3 ANOVA for trimmed means. .........................................................159 

Table 5.7. The percentage of difference at the back and front peripheries from different working 
distances and tilting angles. ........................................................................................163 

Table 5.8. Estimated uncalibrated length (i.e. pixel length) of a 2 mm object at different locations 
of the FOV and different tilting angles. ......................................................................164 

Table 6.1. Correlation coefficients of the uniform model for radial measurement error. The symbol 
ε denotes p<0.0001. ....................................................................................................188 

Table 6.2. Correlation coefficients of the non-uniform model for radial measurement error. The 
symbol ε denotes p<0.0001. .......................................................................................189 

Table 6.3. Accuracy of radial measurements from the uniform and the non-uniform models in 
different ranges of working distance. ..........................................................................190 

Table 7.1. The estimated working distance from the 3D surface. ...............................................215 

 

Table 7.2. Results of multiple linear regression for the JOV vertical measurement model. The 
symbols wd, a, and ε stands for the working distance, the imaging angle, and p<0.00001.
 ....................................................................................................................................221 

Table 7.3. Results of multiple linear regression for the PCA vertical measurement model. The 
symbols wd, a, and ε stands for the working distance, the imaging angle, and p<0.00001.
 ....................................................................................................................................221 

Table  7.4.  Results  of  multiple  linear  regression  for  the  uniform  model  for  horizontal 
measurements. The symbols wd, a, and ε stand for the working distance, the imaging 
angle, and p<0.00001. .................................................................................................224 

Table  7.5.  Results  of  multiple  linear  regression  for  the  non-uniform  model  for  horizontal 
measurements. The symbols wd, a, and ε stand for the working distance, the imaging 
angle, and p<0.00001. .................................................................................................224 

Table 7.6. Results of 2×5 ANOVA for vertical measurement errors  .........................................227 

Table 7.7. Mean percent error and mean percent magnitude of error for vertical measurement..
 ....................................................................................................................................228 

Table 7.8. Results of two-way ANOVA for horizontal measurement errors  .............................229 

Table 7.9. Mean percent error and mean percent magnitude of error for horizontal measurement.
 ....................................................................................................................................230 

 

 


LIST OF FIGURES 

Figure 1.1. Illustration of a horizontal plane and the vertical direction. ........................................15 

 
 

Figure 1.2. The employed setup for benchtop recordings. ............................................................25 

Figure 1.3. Examples of incorrect placements of the FOV in the image frame. ...........................25 

Figure 1.4. Some examples of the FOV with unclear edges. .........................................................26 

Figure 1.5. Some examples of the inadequate border between the FOV and the image frame. ....26 

Figure 1.6. An example image with non-visible fiducial marker. .................................................27 

Figure 2.1. Two examples of intraoperative calibrated images, taken from references 190 and 93. 
 ......................................................................................................................................31 

Figure 2.2. Results of registration uncertainty test: (A) values of interquartile range for different 
patients  and  (C)omfortable  and  (H)igh  pitch  phonations,  (B)  estimated  pdf  of 
interquartile range over all recordings. .........................................................................44 

Figure 2.3. Boxplot of mm size of vocal fold length attribute of each subject for (C)omfortable and 
(H)igh pitch phonations.  ..............................................................................................48 

Figure 2.4. Measurement of the vocal fold width: (A) the reference image with designated vocal 
fold and the target anchor point, (B) the measurement steps. .......................................51 

Figure 2.5. Boxplot of mm size of vocal fold width of each subject for (C)omfortable and (H)igh 
pitch phonations. ...........................................................................................................51 

Figure 2.6. Boxplot of mm size of an attribute of blood vessels on the vocal fold of each subject 
for (C)omfortable and (H)igh pitch phonations. ...........................................................53 

Figure 2.7. Boxplot of mm size of an attribute of blood vessels on a nearby tissue of each subject 
for (C)omfortable and (H)igh pitch phonations. ...........................................................55 

Figure 3.1. Result of registration uncertainty test for included subjects. ......................................72 

Figure 3.2. Intraoperative images from subjects with high uncertainty registration.  ...................72 

Figure 3.3. An example of temporal segmentation outcome. ........................................................75 

Figure 3.4. An example of motion compensation: (A) kymogram before motion compensation, (B) 
kymogram after motion compensation. ........................................................................76 

 


Figure  3.5.  Effect  of  endoscopic  rotation  on  the  kymogram:  (A)  kymogram  before  rotation 
compensation, (B) kymogram after rotation compensation. .........................................77 

Figure 3.6. Estimation of the GAW: (A) pdf of the red channel, and the computed black threshold, 
(B) GAW estimate after applying the black threshold..................................................79 

Figure 3.7. Rotation correction for a frame of data: (A) before correction, (B) segmented glottis 
with the fitted line on the first moment of inertia from each row, (C) after correction.
 ......................................................................................................................................80 

Figure 3.8. Temporal curve fitting results: (A) local black reference estimation, the red window 
shows the search window, (B) ROI segmentation, (C) detection of vocal fold edges. ........82 

Figure 3.9. Spatial curve fitting results: (A) outlier removal step, (B and C) segmented edges of 
the vocal fold for two different timepoints. ..................................................................83 

Figure 3.10. Selection of the data: (A) the least stable portion of a phonation, (B) the most stable 
portion of a phonation. ..................................................................................................86 

Figure 3.11. Boxplot of closing phase maximum velocity for different subjects pre- and 
post-surgery: (A) box plot of v, (B) box plot of v. ....................................................89 

Figure 3.12. Boxplot of closing phase maximum velocity for different subjects pre- and 
post-surgery: (A) box plot of v, (B) box plot of v. ....................................................89 

Figure 3.13. Boxplot of closing phase maximum velocity for different subjects pre- and 
post-surgery: (A) box plot of v, (B) box plot of v. ....................................................90 

Figure 3.14. Boxplot of closing phase maximum velocity for the vocal fold with the lesion and the 
(cont)ralateral side for different subjects: (A) pre-surgery condition, (B) post-surgery 
condition. ......................................................................................................................95 

Figure 3.15. The relationship between area of a lesion and its post-surgery improvement: (A) The 
blue region shows the lesion (B) Scatter plot of post-surgery changes in v vs. area 
of the lesion. The outliers are marked by a red circle. ..................................................96 

Figure 4.1. Schematics of different laser projection techniques with the principle of encoding the 
vertical and/or horizontal distances: (A) laser triangulation method, (B) structured light 
projection, (C) a combined technique. Green and red dots depict hypothetical positions 
of the laser pattern at two different vertical distances. ...............................................106 

Figure 4.2. The calibrated flexible endoscope with an insertion tube diameter of 4.9 mm and its 
main components. .......................................................................................................111 

Figure 4.3. A diagram of the recording conditions. .....................................................................112 

 


Figure 4.4. Calibration setup: (A) measuring the distance to the tip of the endoscope, (B) measuring 
the distance to the fixture. ...........................................................................................113 

Figure 4.5. Model for compensating the recording parameters of the system. ............................117 

Figure 4.6. The intensity of the laser points: (A) sum of the intensity of pixels on the rows, (B) 
original image, (C) sum of the intensity of pixels on the columns. ............................121 

Figure 4.7. Position of each laser point as a function of working distance where each color shows 
a different laser point: (A) x-y coordinates as a function of working distance, (B) x-
coordinate as a function of working distance, (C) y-coordinate as a function of working 
distance. ......................................................................................................................123 

Figure 4.8. Distribution of the variability in the output of FOV and the fiducial finder modules: 
(A) distribution of the centralized coordinates of the FOV center, (B) distribution of the 
centralized radius of FOV, (C) distribution of the centralized fiducial angle. ...........126 

Figure 4.9. Distribution of the variability in the output of the laser finder module. ....................126 

Figure 4.10. Displacement analysis of the laser points as the working distance is changing: (A) the 
magnitude of variation in the position of the laser points as the working distance is 
changing from 35 mm to a new distance, (B) the magnitude of variation in the position 
of the laser points for 1 mm decrement at different working distances. .....................128 

Figure 4.11. The behavior of different laser points: (A) indexing used in this chapter, (B) the 
average magnitude of displacement of each laser point. ............................................128 

Figure 4.12. The average magnitude of displacement of each laser point. ..................................129 

Figure 4.13. Boxplot of vertical measurement errors at different working distances: (A) results 
from all functions, (B) results when the functions from the top row are discarded. ..130 

Figure 5.1. Optical principles of image formation: (A) parameters of the Snell’s law, (B) image 
formation in the Gaussian optics model. ....................................................................142 

Figure 5.2. Effects of tilting the target surface on the geometry of the acquired images.  ..........144 

Figure 5.3. A schematic for measuring the tilting angle. .............................................................147 

Figure 5.4. Automatic detection of the grid lines: (A) recording from 1 mm grids at the working 
distance of 10 mm, (B) the binary image showing the locations of the minima, (C) fitted 
second-order polynomials on the locations of the minima. ........................................149 

Figure 5.5. Groupings for experiments 1 and 2: (A) the solid red blocks and the patterned blue 
blocks denote the center and the periphery groups, (B) the selected sides of an example 
image. The Center of the image-FOV is denoted by a green cross mark. ..................151 

 


Figure 5.6. Variation in pixel size for different working distances and groups. ..........................153 

Figure 5.7. Boxplots of the pixel size for different groups and working distances. ....................154 

Figure 5.8.  Estimation of the dependence of pixel size on its spatial location, (A) selected line 
segments are shown in green dashed line, and the center of the image-FOV is denoted 
with a red cross mark, (B) dependence of pixel size on its distance from the center of 
the image-FOV and the working distance. The negative distance means blocks that were 
below the center of the image-FOV. ...........................................................................155 

Figure 5.9. Groupings for experiment 3. Solid red lines denote the back group, dotted green lines 
denote the middle group, and dashed blue lines denote the front group,  (A) groupings 
at the working distance of 5 mm, (B) groupings at the working distance of 15 mm. ..158 

Figure 5.10.  Values of the mean and standard deviation of pixel size: (A) working distance of 5 
mm, (B) working distance of 10  mm, (C) working distance of 15  mm, (D) working 
distance of 20 mm. ......................................................................................................159 

Figure 5.11.  (A) The selected line segments are shown in green dashed lines, and the center of the 
image-FOV  is  denoted  with  a  red  cross  mark.  (B)  Dependence  of  pixel  size  on  its 
distance from the center of the image-FOV and the tilting angle at the working distance 
of 15 mm. ....................................................................................................................161 

Figure 5.12. Dependence of location with the highest spatial resolution on the tilting angle. ....161 

Figure  6.1.  Relationship  between  the  length  of  an  object  (ho)  and  its  image  (hi)  in  an  axially 
symmetrical optical system. ........................................................................................174 

Figure  6.2.  Effects  of  working  distance  and  spatial  location  on  horizontal  measurements:  (A) 
working distance of 2.87 mm, (B) working distance of 2.24 mm. ..............................176 

Figure 6.3. The data for evaluation of central angle measurement: (A) the custom-designed grid, 
(B) segmented radial lines. .........................................................................................178 

Figure 6.4. Segmentation of a circular grid: (A) horizontal and vertical strips with their respective 
summations, (B) final segmented circles after the fine-tuning stage. .........................179 

Figure 6.5. Models for horizontal measurements: (A) non-uniform model, (B) uniform model.
 ....................................................................................................................................181 

Figure 6.6. Expressing a general measurement in terms of radial measurements. ......................183 

Figure 6.7. Mean absolute error (MAE) of original and the proposed PCA method for different 
values of the standard angle. .......................................................................................184 

 


Figure 6.8.  Performance of estimating the working distance: (A) indexing of the laser points, (B) 
measurement accuracy of different laser points, (C) effect of working distance. ......187 

Figure 6.9.  Performance of uniform model for radial measurements: (A) effect of object length, 
(B) effect of working distance. ...................................................................................188 

Figure 6.10. Performance of non-uniform model for radial measurements: (A) effect of object 
length, (B) effect of working distance. .......................................................................189 

Figure 6.11.  Boxplot of angle estimation error computed from set3. .........................................191 

Figure 6.12.  Performance of uniform model for general measurements: (A) effect of working 
distance, (B) effect of object length. ...........................................................................192 

Figure 6.13. Performance of non-uniform model for general measurements: (A) effect of working 
distance, (B) effect of object length. ...........................................................................193 

Figure 7.1. Imaging from a tilted surface: (A) effect of tilting the target surface on different objects 
within the FOV, (B) effect of tilting the target surface on the geometry of the FOV... 
 ....................................................................................................................................202 

Figure 7.2. The effect of tilting the target surface vs. changing the imaging angle. ...................204 

Figure 7.3. Recordings from a circular grid at the working distance of 8.66 mm: (A) the tilting 
angle  of  15°,  (B)  the  tilting  angle  of  -15°,  (C)  tilting  angle  of  0°  after  making  the 
endoscopic tip perpendicular to the target surface. .....................................................205 

Figure 7.4. The setup that allowed precise adjustment of the distal tip of the endoscope. ..........206 

Figure 7.5. A diagram of the recording conditions. Different colors correspond to the FOV cone at 
different  working  distances.  To  simplify  the visualization,  the  target  surface  is  kept 
fixed and the camera is displaced. However, in the experiments it was the other way 
around. ........................................................................................................................207 

Figure 7.6. Placement of the 5-mm line segment inside the FOV for horizontal measurements. 
 ....................................................................................................................................209 

Figure 7.7. A schematic for estimation of the true vertical distance of the laser point B. ...........210 

Figure 7.8. An example of computing the mm distance between two laser points B and R. .......212 

Figure 7.9. The data used for investigating the effect of 3D shape: (A) the 3D model, (B) fiducial 
markers, (C) the printed composite model  .................................................................214 

 


Figure  7.10.  The  outcome  of  the  registration  process:  (A)  a  composite  image  before  the 
registration. Centers of the fiducial markers are marked with a red dot. Centers of the 
laser points are marked with a green cross mark, (B) the registration outcome for the 
composite image. ........................................................................................................216 

Figure 7.11. Boxplots of vertical measurement error using the JOV model at different working 
distances and imaging angles.  ....................................................................................219 

Figure 7.12. Boxplots of vertical measurement error using the PCA model at different working 
distances and imaging angles.  ....................................................................................220 

Figure  7.13.  Boxplots  of  horizontal  measurement  error  from  the  uniform  model  at  different 
working distances and imaging angles. ......................................................................222 

Figure 7.14. Boxplots of horizontal measurement error from the non-uniform model at different 
working distances and imaging angles. ......................................................................223 
Figure 7.15. Performance of the PCA model on a flat surface: (A) vertical measurement errors, 
(B): magnitude of vertical measurement errors.  ........................................................226 

Figure 7.16. Performance of the vertical measurement errors on a 3D surface: (A) boxplot of error, 
(B) boxplot of the magnitude of error. ........................................................................226 

Figure 7.17. Performance of the horizontal measurement errors on a 3D surface: (A) boxplot of 
error, (B) boxplot of the magnitude of error. ...............................................................................229 

Figure 8.1. Graphical representation of the relationships among the chapters of this dissertation.
......................................................................................................................................................238
 

 


CHAPTER 1: INTRODUCTION 

 

 

1.1. Background 

Voice and speech are the main communication channels for expressing our ideas, thoughts, 

and emotions. Furthermore, we use them for artistic creations (e.g. singing). Therefore, it is not 

very surprising that degradation in speech production and voice quality would lead to serious 

problems in communication. Several studies have shown that degraded voice quality is associated 

with significant negative bias and attitude of the society, and hence could have a negative effect 

on the social life of people with voice disorders.1–4 Previous studies have also confirmed that the 

psychological and emotional burden of degraded voice quality could be very high and it could lead 

to a serious deterioration in the perceived quality of life.5–7 Furthermore, voice and speech have a 

significant role in the career of professional voice users (e.g. news anchors, teachers, and singers), 

and  maintaining  the  voice  quality  becomes  even  more  important  for  a  large  population. 

Considering the high prevalence of voice and speech problems in the general population (5 million 

school-age children annually8 and between 3% and 9% of the whole population9,10) and the even higher incidence 

rate among professional voice users (e.g. a lifetime prevalence between 50% and 80% for teachers11), a 

very large population would benefit from quantitative research in this area. 

The voice production system can be modeled as a dynamic system that takes a well-regulated 

air stream as the input and modifies it in a certain way to produce a specific acoustic signal in the 

output. The glottis in the larynx is the first and primary site for the conversion of airflow into the 

acoustic signal. Therefore, determining and understanding the behavior of vocal folds and their 

vibratory characteristics is important for the advancement of voice science research and clinical 

 


applications.  Considering  the  position  of  the  larynx  in  the  airway,  direct  assessment  and 

observation of the larynx and the vocal folds have been challenging. Consequently, for a long time, 

researchers relied on the output of the phonatory system (i.e. the acoustic signal and the 

aerodynamic  measurements)  for  studying  the  behavior  and  functional  assessment  of  the  vocal 

folds. These methods are called output-based approaches in the rest of this dissertation. However, 

advancements in technology have made it possible to directly observe and study different parts of 

the  phonatory  system,  including  the  vibration  of  the  vocal  folds.  These  approaches  are  called 

internal-based  approaches  in  the  rest  of  this  dissertation,  as  they  provide  a  direct  means  for 

studying the internal states and function of the phonatory system. 

The  output  of  the  voice  production  system  is  an  aerodynamic-acoustic  phenomenon. 

Therefore, it is possible to analyze the aerodynamic and acoustic signals of the voice and derive 

certain information regarding the vibration of the vocal folds. Considering the interaction of the 

airflow with the cyclic abduction (i.e. opening) and adduction (i.e. closing) of the vocal folds, 

airflow could provide a very good means for studying the vibration of the vocal folds. In normal 

voice production, the glottal flow starts with the opening of the glottis and rises (positive slope) over 

a specific time to its maximum, at which point the vocal folds are fully abducted. Then, the 

vocal folds start to move toward the midline and hence the flow starts to decline (negative slope) 

until the vocal folds are fully adducted and hence the flow stops (or reaches its minimum). The 

vocal folds remain adducted for a specific time, and then the cycle repeats.12,13 Several 

important characteristics from this cyclic behavior have been defined for differentiation between 

different modes of phonation and diagnostic purposes.12,13 Open quotient is defined as the portion 

of a cycle during which the glottis is open. Additionally, the duration of the positive slope is longer than that of the 

negative slope in normal phonation14,15 which results in right-skewed flow measurements. This 

 


characteristic has been associated with the inertia of the air in the vocal tract14,15 and is an important 

feature of the glottal flow. Skewing quotient can capture and quantify this feature and is defined 

as the ratio of the duration of the increasing flow to the duration of the decreasing flow.12 These 

two measurements provide a rough picture of the shape of the glottal flow and provide 

information about the underlying mechanism of the voice production system and the produced 

voice quality. For example, high values of open quotient have been associated with a breathy 

quality16–19 and low values may indicate a pressed phonation.20–22 The maximum flow declination 

rate is another important output-based measurement, which is defined as the maximum magnitude of the 

negative derivative of the glottal airflow.23 This measurement has been used as an indirect approach for 

estimation of the closing velocity of the vocal folds.24 Maximum flow declination rate is closely 

related to collision forces of the vocal folds25–27 and the produced acoustic output.23,28,29 
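
The three flow-based measurements described above can be illustrated on a single synthetic glottal cycle. The following is only a minimal sketch under stated assumptions: the function name, the 1% opening threshold, and the sinusoidal pulse shape are hypothetical choices made for illustration, and in practice the glottal flow waveform would first have to be estimated from recorded data. The maximum flow declination rate is taken here as the steepest sample-to-sample decrease of the flow.

```python
import numpy as np

def glottal_flow_measures(flow, fs, open_threshold=0.01):
    """Return (open quotient, skewing quotient, MFDR) for one glottal cycle.

    flow: samples of one cycle of glottal flow (arbitrary flow units)
    fs: sampling rate in Hz
    open_threshold: fraction of peak flow above which the glottis is
        treated as open (an assumption made for this sketch)
    """
    flow = np.asarray(flow, dtype=float)
    is_open = flow > open_threshold * flow.max()
    open_idx = np.flatnonzero(is_open)
    peak = int(np.argmax(flow))

    rising = peak - open_idx[0]      # samples from opening to peak flow
    falling = open_idx[-1] - peak    # samples from peak flow to closure

    open_quotient = is_open.sum() / flow.size
    skewing_quotient = rising / falling
    # Steepest decline of the flow, converted to flow units per second
    mfdr = -np.min(np.diff(flow)) * fs
    return open_quotient, skewing_quotient, mfdr

# Hypothetical cycle: 40 rising samples, 20 falling samples, 40 closed samples
fs = 10_000
rise = np.sin(np.linspace(0, np.pi / 2, 40, endpoint=False))
fall = np.cos(np.linspace(0, np.pi / 2, 20, endpoint=False))
closed = np.zeros(40)
cycle = np.concatenate([rise, fall, closed])

oq, sq, mfdr = glottal_flow_measures(cycle, fs)
print(f"open quotient = {oq:.2f}, skewing quotient = {sq:.2f}, MFDR = {mfdr:.1f}")
```

Because the rising phase of this synthetic pulse is about twice as long as its falling phase, the skewing quotient comes out above one, which matches the right-skewed shape described for normal phonation.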

Acoustic assessment is another example of output-based approaches and it accounts for the 

majority of the studies in voice research.30 Two main approaches of auditory perceptual assessment 

and  acoustic  measurement  can  be  identified  for  the  evaluation  of  the  acoustic  data.  Auditory 

perceptual assessment is considered the gold standard31–34 and is the most commonly used 

technique in clinical settings.35 This approach does not require additional investment, 

additional equipment, or technical knowledge. In perceptual assessment, the quality of the voice 

has been evaluated using qualitative terms such as wet36,37, gurgly38,39, breathy40,41, hoarse41,42, 

harsh43,44, rough40,41, creaky45,46, strained40,41, and many more. These terms refer to qualitative 

features of a voice that are clear enough for the general population such that almost everyone 

understands  them.34  Consequently,  they  are  very  useful  for  communication  between  different 

people and hence would facilitate client-clinician communications. On the other hand, research 

has indicated the low reliability of auditory perceptual evaluations.31,47,48 More recent studies have 

 


tried to remove some of the subjectivities from the evaluation, and hence to reduce the inter-rater 

and intra-rater variabilities by providing standard anchors or using matching tasks.49–52 

Variabilities  in  the  evaluation  terms  and  protocols  were  another  big  issue  with  the  perceptual 

approaches. Efforts have been made to standardize the routine and the scales for evaluation of the 

voice quality. CAPE-V40 and GRBAS41 are two widely used instruments for this purpose.   

Acoustic measurements are objective approaches that have been designed based on signal 

processing techniques, and hence can alleviate some of the issues associated with the perceptual 

methods. Robustness to factors such as bias and variability is the primary advantage of acoustic 

measurements. Also, once these methods are developed, they provide fast and low-cost tools for 

assessing the voice in an automatic and repeatable fashion. Finally, since the procedures behind 

these  measurements  are  known,  their  resolutions  and  sensitivities  can  be  evaluated.  Current 

objective measurements of voice quality can be grouped into four main categories: perturbation 

measurements  (e.g.  jitter53,54,  pitch  perturbation55,  pitch  perturbation  quotient56,  shimmer57, 

amplitude perturbation58, and amplitude perturbation quotient56), noise measurements (e.g. signal 

to noise ratio (SNR)59, harmonic to noise ratio (HNR)60, frequency domain HNR61, normalized 

noise energy62, glottal-to-noise excitation ratio63, the energy of the noise from a filter bank64), 

spectral and cepstral measurements (e.g. spectral slope65,66, cepstrum peak prominence (CPP)67, 

Mel-frequency cepstral coefficients68, energy and entropy of wavelet sub-bands69, and temporal 

and  spectral  dynamics  of  the  speech70),  and  non-linear  measurements  (e.g.  largest  Lyapunov 

exponent71, correlation dimension72, and parameter of the phase-space73). 
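
As an illustration of the perturbation category, the sketch below computes one common "local" form of jitter and shimmer: the mean absolute difference between consecutive cycles, expressed as a percentage of the mean. This is an assumed definition chosen for simplicity; the cited works define several variants, and the period and amplitude values used here are hypothetical rather than measured data.

```python
import numpy as np

def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean."""
    values = np.asarray(values, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(values))) / np.mean(values)

# Hypothetical per-cycle measurements from a sustained vowel
periods_ms = [5.00, 5.10, 4.90, 5.05, 4.95]   # cycle durations
peak_amps = [1.00, 0.96, 1.02, 0.98, 1.04]    # cycle peak amplitudes

jitter_pct = local_perturbation(periods_ms)    # period perturbation
shimmer_pct = local_perturbation(peak_amps)    # amplitude perturbation
print(f"jitter = {jitter_pct:.2f}%, shimmer = {shimmer_pct:.2f}%")
```

Both measures presuppose that individual cycles can be identified reliably, so the same function applies to periods and to amplitudes once the cycle boundaries are known.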

Using the output-based approaches for studying the voice production system is similar to 

reverse engineering. Output-based data are easier to collect; however, it is not an easy task to relate 

them back to their underlying mechanisms. Often, this step requires the assumption of a model 

 


that describes the system very well. Additionally, there are several other factors that make things 

even more complicated. Based on control theory and mathematical analysis of dynamic systems, 

it is well-known that only under certain assumptions, internal states of a system can be inferred 

from  its  output.74,75  Additionally,  several  characteristics  of  the  speech  production  system  (e.g. 

many-to-one mapping) may lead to an ambiguous interpretation of the underlying mechanism 

from  the  output.  For  example,  researchers  have  shown  that  multiple  significantly  different 

articulatory configurations can lead to the same acoustic measurements and output.76–78 

Considering  the  complex  structure  and  interaction  between  intrinsic  and  extrinsic  laryngeal 

muscles, and also their agonist-antagonist roles, a similar characteristic can be expected for the 

phonatory mechanism, too. Quantal and saturation effects are non-linear properties of the speech 

production system that describe stable regions such that changes within that region do not lead to 

a  change  in  the  acoustic  output.79–84  Interestingly,  these  characteristics  are  not  unique  to  the 

articulatory system and also exist in the phonatory mechanism. For example, recent studies on the 

biomechanics of the larynx have indicated the existence of rich quantal regions in the larynx.85,86 

These characteristics facilitate motor planning and help with the production of stable sounds.87 

However, these features create ambiguity in determining the internal states of the system from its 

output. Based on these arguments, internal-based approaches are more favorable for studying the 

underlying mechanisms of voice production and voice disorders. 

Imaging techniques are probably the most important and popular internal-based approach for 

studying  the  voice.  Imaging  techniques  can  provide  a  wealth  of  information  regarding  the 

underlying mechanisms of voice production, their configuration, and their kinematics. Considering 

that vocal folds are vibrating at relatively high frequencies --with the typical range of 85-196 Hz 

for  males,  155-334 Hz  for  females,  and  208-440 Hz  for  children  during  normal  speaking88-- 

 


imaging techniques should be able to track such frequencies. In fact, research has recommended a 

minimum of 4000 frames per second (fps) for a reliable functional assessment of the voice.89 

Existing imaging systems can be classified based on different criteria. One important distinction can 

be made based on the imaging modality and how this frame-rate requirement is addressed. In that 

regard, videostroboscopy (VSB), videokymography (VKG), and high-speed videoendoscopy 

(HSV) could be identified as the most common modalities for visualization of the vocal folds. 

Using a different criterion, imaging systems can be classified based on the type of endoscope that 

is connected to the imaging system. Using this criterion, two types, rigid and flexible systems, 

could be identified. It is noteworthy that each of these factors would lead to different functionalities 

and applications for the acquired images. For example, the imaging modality (i.e. VSB, VKG, 

HSV) determines the type of phenomenon that can be captured and studied using the imaging 

system. However, the type of endoscopic instrument (i.e. flexible vs. rigid) determines the type of 

stimuli that can be elicited. Regardless of the employed imaging modality and the endoscopic 

instrument, acquired images can be evaluated with subjective visual assessment approaches89–96 or 

objective measurements.97–103 
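
The frame-rate requirement mentioned above can be made concrete by counting how many frames fall within one glottal cycle, which is the frame rate divided by the fundamental frequency. The short sketch below applies this to the speaking ranges quoted in the text at the recommended 4000 fps; the function name is a hypothetical helper introduced only for this illustration.

```python
def frames_per_cycle(frame_rate_fps, f0_hz):
    """Number of image frames captured during one glottal cycle."""
    return frame_rate_fps / f0_hz

# Speaking ranges of fundamental frequency quoted in the text (Hz)
f0_ranges_hz = {"males": (85, 196), "females": (155, 334), "children": (208, 440)}

for group, (f0_low, f0_high) in f0_ranges_hz.items():
    most = frames_per_cycle(4000, f0_low)     # lowest f0 gives the most frames
    fewest = frames_per_cycle(4000, f0_high)  # highest f0 gives the fewest frames
    print(f"{group}: {fewest:.1f} to {most:.1f} frames per cycle at 4000 fps")
```

At the upper end of the children's range (440 Hz), 4000 fps yields only about nine frames per cycle, which gives a sense of why lower frame rates cannot resolve individual cycles directly.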

To elaborate on the effect of the imaging modality on the type of phenomenon that can

be studied, a brief introduction to the principles of each imaging modality is presented. A VSB system

typically flashes a strobe light at specific phases of consecutive glottal cycles, and in this manner

creates an illusion of slow motion of vocal fold vibration.104,105 Clearly, this technique requires

a precise mechanism for estimation of the fundamental frequency and synchronization with it. 

Therefore, two important conditions for the correct functionality of VSB can be determined. First, 

the target phenomenon should be cyclic, and therefore it is not applicable to transient phenomena 

such as voice onset, voice offset, voice break, etc. Second, the target phenomenon should be nearly 

 

periodic. Considering that the fundamental frequency could be ambiguous in type 2 and 3 voices12

and in type 4 voices106 (which correspond to many cases of disordered voices), VSB does not

provide a correct slow-motion representation of highly dysphonic phonations. Additionally, it is well known that,

depending on the sampling frequency, the strobed slow-motion picture may appear frozen or

may even appear to play backward.107 On the positive side, VSB can provide audio-synchronized visualization

of the vocal folds which is very important for clinical evaluations.108 Additionally, the distal-chip 

VSB systems can provide very high-quality images.109 Therefore, despite its inherent flaws, it is 

still the gold-standard method for clinical evaluations.108,110,111 VKG uses a different approach for 

visualization of the fast vibration of the vocal folds. The idea is to capture high-speed images from 

a single line of the vocal folds along its posterior-anterior axis, and then to stack them up and create 

a composite image.112 VKG can capture up to 8,000 line images per second from the target section.112

VKG captures the true behavior of the vocal folds and then shows it in real time; therefore, it is very

appropriate  for  clinical  evaluations.113  VKG  can  demonstrate  the  existence  of  many  vibratory 

characteristics including subharmonics (i.e. type 2 phonation12), left-right asymmetries,

propagation of the mucosal waves, and open quotient.112,113 However, limited spatial resolution 

(i.e. single line scanning) is the biggest limitation of VKG. HSV can provide full images at the rate 

of 20,000 fps or even higher89, and therefore can provide recordings with high temporal and spatial 

resolutions  from  vocal  folds  vibration.  In  comparison  to  VSB,  HSV  has  a  better  temporal 

resolution and therefore can be used for studying aperiodic vibrations as well as transient

phenomena. In comparison to VKG, HSV has a higher spatial resolution. This feature is necessary 

for studying the spatial aspects of the vibration such as spatial variations in the kinematics of the 

vocal folds. In summary, HSV captures vibration of the vocal folds as it is happening, and hence 

it  could  be  the  gold  standard.  More  specifically,  it  can  be  used  for  validation  of  other 

 

measurements100,114  and  also  the  validation  of  computational  models115,  which  other  imaging 

techniques cannot do as accurately. Finally, both VKG and VSB can be simulated from HSV 

recordings.111 These characteristics make HSV the ideal tool for studying normal and disordered 

phonations. However, these significant benefits come at a price. Considering the large amount of data

generated by HSV systems, manual analysis is not a viable solution and automated methods

should be developed for the analysis of HSV recordings. Processing of HSV recordings typically 

consists  of  multiple  steps  including  segmentation,  motion  compensation,  and  measurement. 

Segmentation is the first step in the analysis of the HSV recordings, where the phenomenon of 

interest is extracted.111 Depending on the desired phenomenon, segmentation can be performed in

temporal111,116,  spatial98,117–122,  and  spatial-temporal123,124  domains.  Motion  compensation  is 

another important step that could remove artifacts introduced by movements of the camera or the 

endoscope.124,125 
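
The remark above that a strobed picture may appear frozen or may appear to play backward can be illustrated with a simple sampling argument. The sketch below is an idealization, not a description of any particular VSB device: it assumes a strictly periodic vibration and a strobe that flashes at a fixed rate, and the 25 flashes per second used in the example is a hypothetical value. The function name `apparent_frequency` is likewise introduced only for illustration.

```python
def apparent_frequency(f0_hz: float, flash_rate_hz: float) -> float:
    """Apparent (aliased) frequency of a periodic motion seen under strobe flashes.

    A positive result means slow forward motion, zero means a frozen image,
    and a negative result means the motion appears to run backward.
    """
    if f0_hz <= 0 or flash_rate_hz <= 0:
        raise ValueError("frequencies must be positive")
    # Whole number of vibration cycles that best fits between two flashes.
    cycles_between_flashes = round(f0_hz / flash_rate_hz)
    return f0_hz - cycles_between_flashes * flash_rate_hz


# Hypothetical fixed flash rate of 25 flashes per second.
for f0 in (201, 200, 199):
    print(f0, "Hz vibration appears as", apparent_frequency(f0, 25), "Hz")
# 201 Hz -> 1 Hz (slow forward), 200 Hz -> 0 Hz (frozen), 199 Hz -> -1 Hz (backward)
```

Under these assumptions, a change of only 1 Hz in the fundamental frequency is enough to switch the displayed motion between forward, frozen, and backward, which is why accurate estimation of the fundamental frequency and synchronization with it matter so much for VSB.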

The endoscopic instrument also has significant impacts on the application of the acquired 

images. Rigid endoscopes provide images with better spatial resolutions and visual qualities.126,127 

They have minimum image distortion127 and can provide significantly more diagnostic information

for a wide variety of voice disorders including vocal fold lesions127,128 and laryngopharyngeal 

reflux.127,129 Therefore, rigid endoscopes are considered the “gold standard” for awake imaging 

conditions.127,129 On the other hand, due to transoral insertion, the rigid endoscopes affect the voice 

and speech production systems. For example, to get a decent view of the larynx, the tongue

should be retracted unnaturally.130 This means that only limited types of stimuli can be elicited. 

Additionally, the altered voice production system could raise some concerns regarding the validity 

of  the  acquired  data.  For  example,  a  previous  study  has  shown  that  the  presence  of  a  rigid 

endoscope could significantly change the fundamental frequency and the quality of the produced 

 

voice.131 The changes in the fundamental frequency may indicate the altered functionality of the 

phonatory mechanism in the presence of the scope. Also, the changes in the voice quality may 

indicate issues regarding the validity of subsequent measurements. 

Flexible  endoscopy  does  not  interfere  with  articulators  and  speech  can  be  produced  with 

minimal interference; therefore, it could be more ecologically valid. Additionally, there are fewer 

restrictions on the type of stimuli that could be produced. Thus, flexible endoscopes could be used 

for analyzing and studying vibratory patterns of the vocal folds during connected speech.116 This

feature has made flexible scopes the instrument of choice for diagnosis and evaluation of most 

neurological  voice  disorders.126  Flexible  endoscopes  can  also  provide  the  possibility  of 

simultaneous  aerodynamic  measurements.132–134  This  characteristic  could  provide  significant 

information about the complex interactions between kinematics, aerodynamics, and the produced 

acoustics of the phonatory system. Additionally, flexible endoscopes allow the complete visual

examination and evaluation of the vocal tract.135 Last but not least, flexible endoscopes have been 

associated with higher success rates in adult127 and especially pediatric136,137 populations. On the 

other hand, flexible endoscopes are more invasive and have been associated with more pain and 

discomfort even among adult subjects.138 Additionally, flexible endoscopes have inferior image 

quality and spatial resolution.  

1.2. Significance and rationale

The ability to perform measurements is an important cornerstone and the prerequisite of any 

quantitative research. Measurements allow us to quantify inputs and outputs of a system, and then 

to express their relationships using concise mathematical expressions and models. These models

could then help us to understand how that system works. Additionally, measurements could enable

us  to  make  intelligent  and  accurate  predictions  about  the  output  of  a  system  if  certain 

 

characteristics  of  that  system  are  changed.  Conversely,  they  could  enable  us  to  determine  the 

proper parameters of the system for achieving a certain output. Quantitative research

on the phonatory system is no exception. Moreover, the goal of predicting the output of the

system for changes in the parameters of the system has significant and practical implications for 

people with dysphonia. Specifically, dissimilar intervention outcomes in different patients could 

be due to their individual differences. In this sense, models could improve our ability to account 

for individual differences during the diagnosis and to improve the likelihood of reliable predictions 

about the outcome of different treatment options. That is, the likely outcome of different

interventions could be predicted, and the best one could be selected. Computational models

that link the input, the parameters, and the output of the phonatory system

are therefore important components for developing precision-medicine and personalized

approaches to the diagnosis and treatment of voice disorders.

The voice production mechanism can be modeled as a dynamic system with specific input, 

system parameters, and output. Interestingly, the required methodology for measuring the input 

and the output of this system on calibrated scales has been around for a long time.13 Specifically, 

the air stream is the input of the phonatory system which can be measured on calibrated scales 

using  airflow  and  air  pressure  measurements.23,28,139  The  acoustic  signal  is  the  output  of  this 

system,  which  can  also  be  measured  on  a  calibrated  scale  using  sound  pressure  level.140–146 

However, this is not the case for the system parameters. That is, the required methodology for 

kinematic measurements of the vocal folds and spatial measurements from the larynx on calibrated 

scales is missing.122 Such measurements are necessary for a wide range of computational

approaches  to  study  and  understand  the  biomechanics  and  aerodynamics  of  the  phonatory 

mechanism.147–152  Additionally,  the  calibrated  spatial  measurement  could  be  very  valuable  for 

 

studying the developmental aspects of vocal fold vibration.153 The primary goal of this dissertation 

is to fill this significant gap and to present methods for calibrated spatial measurements from in-

vivo laryngeal HSV images. 

Another significance of this research is its application in the advancement of evidence-based 

practice in the field of laryngology and speech-language pathology. Specifically, the efficacy and 

outcome of voice therapy are usually evaluated using auditory perceptual changes between pre- 

and post-therapy conditions.154–157 A survey of experienced speech-language pathologists

showed  that  perceptual  assessments  were  the  most  likely  evaluation  tool  in  the  field.35 

Additionally, some researchers have used acoustic measurements as an objective alternative for 

quantification of the efficacy of the intervention.158–162 However, both auditory perceptual

methods and acoustic measurements are based on the output of the phonatory system and have all

of the mentioned limitations of the output-based measurements. Most importantly, it is not trivial

to infer physiological changes from changes in the acoustics. A more straightforward

alternative would be direct measurements of the physiological changes due to the intervention. For 

example, the efficacy of a therapy on a nodule could be evaluated and quantified in terms of 

changes in the lesion size, or changes in kinematics and vibratory patterns of the vocal folds. 

Obviously, this approach could provide significant information and provide the required evidence 

on the efficacy of different therapies. However, measuring the lesion size and the computation of 

kinematic measures (e.g. velocity measures) require horizontally calibrated images. That is, we 

need to compare the lesion size pre- and post-intervention, or correlate changes in the lesion size 

to changes in the kinematics of the vocal folds post-intervention. The proposed research presents 

methods for calibrated horizontal measurements and has the potential of addressing this need, and 

therefore is very significant. 

 

Velocity measures are important kinematic features that can capture the dynamics of vocal 

folds’ vibration. Velocity measures can relate different aspects of the phonatory system together, 

and therefore are significant for voice science and clinical applications. For example, the closing 

velocity of the vocal folds relates to their collision forces.25–27 It also relates to the maximum flow 

declination rate26,163 and the maximum area declination rate102,164,165, whose effects on the

average produced acoustic output29 and the vocal intensity23,28 have been established. Finally,

a higher closing velocity increases the energy of high-frequency components of the voice, which in

turn may improve speech intelligibility.166 However, velocity is the displacement of

an object per unit time, expressed in calibrated units. Consequently, the computation of velocity depends on calibrated

temporal and spatial measurements. Time is already calibrated in cameras. Therefore, calibration 

of  the  spatial  domain  would  pave  the  path  for  the  computation  of  velocity  measures.  This 

dissertation presents different approaches for calibrated horizontal measurement from HSV images 

and hence has the potential of addressing this need.   
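
The dependence of velocity on both temporal and spatial calibration can be sketched in a few lines. The example below is only an illustration under stated assumptions: it presumes that the position of a vocal fold edge has already been tracked in pixels along one image axis, that the frame rate is known, and that a single pixel-to-mm scale applies to the whole trajectory. The function name and all numerical values are hypothetical and are not taken from the methods developed in later chapters.

```python
def calibrated_velocity(positions_px, frame_rate_fps, mm_per_px):
    """Frame-to-frame velocity (mm/s) of a tracked edge.

    positions_px   : edge position in pixels, one value per frame
    frame_rate_fps : temporal calibration (frames per second)
    mm_per_px      : spatial calibration (pixel-to-mm conversion scale)
    """
    if frame_rate_fps <= 0 or mm_per_px <= 0:
        raise ValueError("frame rate and scale must be positive")
    # Displacement in mm between consecutive frames, divided by 1/frame_rate seconds.
    return [
        (later - earlier) * mm_per_px * frame_rate_fps
        for earlier, later in zip(positions_px, positions_px[1:])
    ]


# Hypothetical edge positions during the closing phase (pixels).
edge_px = [100.0, 98.0, 95.0, 93.0]
velocities = calibrated_velocity(edge_px, frame_rate_fps=4000, mm_per_px=0.05)
print([round(v, 1) for v in velocities])  # velocities in mm/s
```

Without the `mm_per_px` factor, the same computation would only yield pixels per second, which cannot be compared across recordings made at different working distances.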

Another significance of this dissertation is its possible application in the quantification of the 

vertical  movements  and  displacements  of  the  phonatory  mechanism.  Specifically,  imaging 

techniques provide a direct method for observation and assessment of the larynx and hence are 

important  parts  of  diagnosis  and  functional  assessment  of  the  voice  production  system.167–171 

However, images are two-dimensional representations of the real world. Considering that the real

world exists in three-dimensional (3D) space, images are not a true representation of

the actual phenomena that are being captured. In other words, the vertical dimension is lost during 

the data acquisition process, and we could not measure the distance of an object from the camera. 

This  lack  of  vertical  component  means  that  the  vertical  motion  of  the  larynx  and  the  vertical 

component  of  the  vibration  of  the  vocal  folds  could  not  be  measured  and  studied.  Multiple 

 

modeling studies have predicted the significance and the role of the vertical component of the 

vibration of the vocal folds on phonation.164,172–175 For example, the mucosal wave is a surface 

wave that propagates along the medial surface (i.e. from the lower to upper margin) of vocal folds 

and in the direction of the airflow.176,177 The mucosal wave can also be expressed as a phase difference

between the upper and lower margins of the vocal folds.174,177 Several important aspects of the

voice production system have been attributed to the mucosal wave. For example, mucosal wave

velocity has been associated with the phonation threshold pressure (PTP)152,178–180 in the sense that

a larger vertical phase difference may lead to a lower PTP, which may indicate easier

phonation.174 The mucosal wave has also been associated with voice quality.91,177,181 Last but not least,

subjective evaluations of the magnitude of the mucosal wave from in-vivo recordings have been 

used for diagnosis95,113,135,182–184 or measuring the efficacy of an intervention.185–189 However, the 

mucosal wave is a vertical aspect of the phonatory mechanism and therefore the capability of 

vertical measurements is the prerequisite for its objective quantification. This dissertation uses a 

laser-projection endoscope and presents the method for vertical measurements that can address 

these needs. 

In summary, the main significances and contributions of this dissertation are the following: 

(1) a formal treatment of indirect horizontal calibration is presented, and the principles governing 

its validity and reliability are discussed. A battery of tests is presented that can indirectly assess 

the validity of those assumptions in laryngeal imaging applications; (2) pre- and

post-surgery recordings from patients with vocal fold mass lesions are used as a testbench for the developed

indirect  calibration  approach.  In  that  regard,  a  full  solution  is  developed  for  measuring  the 

calibrated velocity of the vocal folds. The developed solution is then used to investigate post-

surgery  changes  in  the  closing  velocity  of  the  vocal  folds  from  patients  with  vocal  fold  mass 

 

lesions;  (3)  the  method  for  calibrated  vertical  measurement  from  a  laser-projection  fiberoptic 

flexible  endoscope  is  developed.  The  developed  method  is  evaluated  at  different  working 

distances, different imaging angles, and on a 3D surface; (4) a detailed analysis and investigation 

of non-linear image distortion of a fiberoptic flexible endoscope is presented. The effect of imaging 

angle and spatial location of an object on the magnitude of that distortion is studied and quantified; 

(5) the method for calibrated horizontal measurement from a laser-projection fiberoptic flexible 

endoscope  is  developed.  The  developed  method  is  evaluated  at  different  working  distances, 

different imaging angles, and on a 3D surface. 

1.3. Structure of the dissertation and the research questions 

This dissertation is focused on developing the required methodologies for calibrated spatial 

measurements  from  in-vivo  HSV  recordings.  To  that  end,  two  projects  are  conducted  in  this 

framework.  This  section  provides  a  brief  overview  of  each  project,  with  discussions  on  how 

different chapters are connected to each other. In that regard, this section connects different pieces 

of this dissertation together and describes how they fit into a single framework.

Considering that laryngeal HSV recordings are typically performed in an upright position, the 

following directions are defined for the rest of this dissertation. A horizontal plane is an imaginary 

plane that splits the body into the superior (i.e. above) and the inferior (i.e. below) sections. The 

vector normal (i.e. perpendicular) to that plane is called the vertical direction. Figure 1.1 presents 

an illustration of these terms.  

 

Figure 1.1. Illustration of a horizontal plane and the vertical direction. 

 

 

Based on our daily experiences, we know that the size of an object in an image depends on its 

distance from the camera. This means that we could measure the pixel size of an object in an 

image, but we could not relate it to its actual size (e.g. the mm size). In this regard, we could not 

perform  absolute  mm  measurements  on  typical  images  and  hence  we  say  that  images  are  not 

spatially calibrated. However, if we have some specific auxiliary information, we could make mm 

measurements  from  the  images.  The  pixel-to-mm  conversion  scale  is  the  required  auxiliary 

information. The procedure that allows us to compute the pixel-to-mm conversion scale is called 

calibration.  
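
The role of the pixel-to-mm conversion scale described above can be sketched as follows. This is a minimal illustration, assuming that a reference object of known mm length is visible at the same distance from the camera as the structure to be measured; the function names and the numerical values are hypothetical.

```python
def mm_per_pixel(reference_length_mm: float, reference_length_px: float) -> float:
    """Pixel-to-mm conversion scale obtained from an object of known size."""
    if reference_length_mm <= 0 or reference_length_px <= 0:
        raise ValueError("reference lengths must be positive")
    return reference_length_mm / reference_length_px


def to_mm(length_px: float, scale_mm_per_px: float) -> float:
    """Convert a pixel length to mm with a previously computed scale."""
    return length_px * scale_mm_per_px


# Hypothetical auxiliary information: a 2.0 mm object spans 40 pixels.
scale = mm_per_pixel(reference_length_mm=2.0, reference_length_px=40.0)

# A structure spanning 130 pixels at the same distance from the camera.
print(f"{to_mm(130.0, scale):.2f} mm")
```

Because the pixel size of an object depends on its distance from the camera, a scale computed this way is only meaningful for structures imaged at the same distance and under the same imaging conditions as the reference object.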

Based on Figure 1.1, two types of spatial measurements can be identified:

horizontal and vertical measurements. Additionally, the auxiliary information could come

from different sources, and depending on the source, two calibration methods, direct and indirect,

can be distinguished and are defined here. The indirect calibration approach is defined

as a method whose auxiliary information comes from a different image (possibly taken with a

different  imaging  modality).  Using  the  intraoperative  calibrated  measurement  of  the  lesion 

 

size93,190,191 for horizontal calibration of its corresponding HSV recording is an example of the 

indirect approach. Conversely, the direct calibration approach is defined as a method whose

auxiliary information comes from the same image from which we want to make measurements.

Laser-calibrated endoscopes are the most common example of the direct approach in voice science 

research.191–195 

The main goal of this dissertation is to devise the methods for performing calibrated spatial 

measurements from in-vivo HSV recordings. Therefore, the central hypothesis of this dissertation 

is: 

H: 

Absolute spatial measurements from in-vivo HSV recordings using indirect and

direct calibration approaches are feasible.

In order to test H, several research questions and sub-hypotheses were formed, which are presented

in the rest of this section. 

The indirect calibration approach does not require any specialized instruments and can be 

performed using the conventional and existing laryngeal imaging systems. Additionally, an image 

could be printed and a simple caliper would be enough for doing the horizontal measurement.93 

Consequently,  indirect  methods  are  very  simple  and  could  be  used  in  many  clinical  settings. 

Chapter 2 of this dissertation taps into this potential and proposes a method for horizontal calibration

of an HSV recording using its corresponding calibrated intraoperative image. The main idea is to 

find  a  proper  common  attribute  (e.g.  lesion  size  in  the  pre-surgery  recording)  between  the 

calibrated  image  and  the  HSV  recording,  and  then  to  register  that  attribute  (i.e.  aligning  the 

common attribute) on the HSV data. This project has external funding, and it is tightly related to a 

recently  funded  NIH  R01  grant  R01  DC017923  (PI:  Verdolini  Abbott)  with  a  subcontract  to 

Michigan State University (sub-award PI: Deliyski). The main research question of chapter 2 is: 

 

Q1: 

How could calibrated intraoperative images be used for spatial calibration of pre- 

and post-surgery HSV recordings? 

To answer this central research question, it is broken into three sub-questions. 

Q1a: 

What  are  the  main  assumptions  behind  the  validity  of  the  indirect  calibration 

approach? 

Q1b:

How can the validity of the registration step be evaluated?

Q1c:

Multiple common attributes can be identified for calibration of the post-surgery

HSV recordings. How could we select the most appropriate one?

Associated with these research questions the following hypotheses were formed: 

H1a: 

The  proposed  registration  uncertainty  test  can  detect  instances  of  common 

attributes that have high registration uncertainties. 

H1b: 

A proper object could be selected from the calibrated intraoperative image, such 

that its registration on the pre- and post-surgery HSV recordings would lead to the 

horizontal calibration of the HSV images. 

Chapter  3  uses  a  pre-existing  set  of  HSV  recordings  as  a  test  bench  to  demonstrate  the 

feasibility of the indirect calibration method. To that end, the required methods for computation of 

the calibrated velocity measures of the vocal folds are presented in chapter 3. The outcomes of 

chapter 3 also have significant scientific value for voice science and clinical applications.

Perceptual evaluation and acoustic measurement studies have shown that the presence of a lesion 

on a vocal fold often changes the produced acoustic signal.60,61,196–201 VSB, VKG, and HSV studies 

have made some connections between changes in the physiology and the vibratory characteristics 

of the vocal folds.93–95,108,183,184,202,203 However, we do not know exactly how the kinematics of 

vocal folds change in the presence of a lesion, and how removing the lesion improves them. The

 

closing velocity of the vocal folds is an important kinematic measure that relates to collision forces 

of the vocal folds25–27 and the produced acoustic output.23,28,29 Hence, this measure could link 

biomechanics of the phonation to the output of the system. Considering that time is already

calibrated in the HSV recordings, their horizontal calibration is the prerequisite for estimating

the closing velocity of the vocal folds. Chapter 3 is aimed at studying the post-surgery changes in 

the closing velocity of the vocal folds. To this end, the following research question is answered in 

chapter 3. 

Q2: 

How does the removal of a lesion from a vocal fold affect its kinematics?

Associated with this research question the following hypotheses were formed: 

H2a: 

The closing phase maximum velocity will significantly increase after phonomicrosurgery.

H2b: 

For unilateral mass lesions, the closing phase maximum velocity of the two vocal 

folds will become more similar after the surgery. 

H2c: 

Post-operative change in the closing phase maximum velocity will be positively 

correlated with the area of the lesion. 

The indirect calibration approach has its own limitations. For example, it only provides the 

horizontal information and could not be used for vertical measurements. Additionally, it is based 

on some important implicit assumptions, which cannot be validated or evaluated directly. For

example, a common attribute (e.g. lesion size) should be present in images from both modalities, 

and we should be able to register it accurately. Additionally, the actual size of the common attribute 

(i.e. its mm length) should be constant and should not change between different imaging sessions

and imaging modalities. Also, the length of the common attribute and the

rest of the image in the different imaging modalities should be related by a linear transformation

 

(this is discussed in more detail in chapter 2). To put these conditions into context, if the lesion 

does not have a clear boundary, the first condition may be violated. If the size of the lesion changes 

(e.g. due to pliability of the tissue, different gravitational forces between supine and upright

positions, etc.) the second condition may not hold. The last condition would be violated if the 

imaging angle is changed, or if the vertical distance between the common object and the rest of 

the image is changed between different imaging sessions. Consequently, the indirect approach 

could be prone to significant errors. The direct calibration approach could remedy this, at the 

expense  of  more  sophisticated  hardware  (imaging  instrument)  and  software  (measurement 

algorithm).  The  remaining  chapters  of  this  dissertation  are  devoted  to  the  development  and 

evaluation  of  methodologies  for  direct  calibration  of  in-vivo  HSV  recordings  using  a  laser-

projection transnasal fiberoptic endoscope.195 Due to the optical design of this laser-calibrated 

endoscope, the horizontal distance between each pair of laser points is a function of the distance 

between the tip of the endoscope and the target surface (i.e. the working distance). Consequently, 

the horizontal measurement from this new system relies on the estimation of the working distance 

which  is  a  vertical  measurement.  Therefore,  the  vertical  measurement  is  presented  before  the 

horizontal measurement in this dissertation. 

Chapter  4  presents  the  developed  methodology  for  direct  vertical  measurements.  Besides 

providing  the  required  information  for  the  horizontal  measurement,  multiple  modeling  studies 

have predicted a significant role for the vertical component of vibration of the vocal folds.164,172–

174 Therefore, the vertical measurements could be a significant source of information for improving 

our knowledge of the normal and disordered phonatory mechanisms. To achieve these goals, a

system with the capability of vertical measurements should be developed first. The main research 

question of chapter 4 is: 

 

Q3: 

How could we use a structured laser projection system for measuring the vertical 

distance between the distal tip of a flexible endoscope and the target surface? 

Associated with this research question the following hypotheses were formed: 

H3a: 

The position of each laser point will be a unique and deterministic function of the 

vertical  distance  between  the  distal  tip  of  the  flexible  endoscope  and  the  target 

surface, once the confounding factors are accounted for. 

H3b: 

Vertical measurement error will be positively correlated to working distance. 
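
If hypothesis H3a holds, that is, if the image position of a laser point is a unique function of the working distance, then that function can in principle be inverted to estimate the working distance from an observed laser position. The sketch below illustrates this idea with a simple lookup table and linear interpolation. It is not the method developed in chapter 4; the calibration values, the one-dimensional treatment of the laser position, and the function name are all assumptions made here for illustration.

```python
def estimate_working_distance(position_px, calibration):
    """Invert a monotonic calibration table by linear interpolation.

    calibration: list of (working_distance_mm, laser_position_px) pairs
                 recorded at known working distances.
    """
    if len(calibration) < 2:
        raise ValueError("at least two calibration points are required")
    table = sorted(calibration)  # sort by working distance
    positions = [p for _, p in table]
    steps = [b - a for a, b in zip(positions, positions[1:])]
    # A unique inverse exists only if position changes monotonically with distance.
    if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
        raise ValueError("laser position must vary monotonically with distance")
    for (d0, p0), (d1, p1) in zip(table, table[1:]):
        if min(p0, p1) <= position_px <= max(p0, p1):
            t = (position_px - p0) / (p1 - p0)
            return d0 + t * (d1 - d0)
    raise ValueError("laser position lies outside the calibrated range")


# Hypothetical calibration data: (working distance in mm, laser position in pixels).
CALIBRATION = [(5.0, 120.0), (10.0, 150.0), (15.0, 170.0), (20.0, 182.0)]

print(estimate_working_distance(160.0, CALIBRATION))  # 12.5
```

In this hypothetical table the laser position changes less per mm at larger working distances, so a fixed localization error in pixels translates into a larger distance error farther from the endoscope. This is one way in which a positive relationship such as the one stated in H3b could arise.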

Several parameters could be confounding factors for calibrated vertical measurements. The

effect of the focal distance of the lens coupler, the rotation of the endoscopic eyepiece inside the 

lens coupler, and the displacement of the eyepiece within the lens coupler are accounted for in the 

proposed  method.  Additionally,  the  effects  of  working  distance,  optical  differences  between 

different laser points, the imaging angle, and imaging from a non-flat surface on measurement 

errors are evaluated. Finally, there are other factors including the intensity of the light source, the 

frame rate of the camera, the exposure time of the camera, the sensitivity of the chip of the camera, 

the spatial resolution of the chip of the camera, the format of the images (e.g. raw data vs. avi), the 

intensity of the laser source, differences between different makes of the endoscope, the curvature 

of the target surface, the reflective properties of the target surface, the color of the target surface, 

and absorption properties of the target surface that are not investigated in this dissertation and 

should be investigated in future works. 

Calibrated horizontal measurement depends on devising a scheme that could convert the pixel 

length of an object (i.e. its length on the image) to its true length (i.e. its mm length). Achieving 

this  goal  requires  a  precise  knowledge  of  the  confounding  factors  affecting  the  relationship 

between  a  pixel  length  and  its  mm  length.  The  main  aim  of  chapter  5  is  to  study  two  main 

 

confounding factors of horizontal measurements. Additionally, the outcomes of this chapter could 

help us better understand possible confounding factors in subjective assessments and objective 

measurements from flexible endoscopy images. The main research questions of chapter 5 are: 

Q4a:

How much does the mm size of a pixel depend on its spatial location?

Q4b:

How much does the imaging angle affect the mm size of a pixel?

Associated with these research questions the following hypotheses were formed: 

H4b:

Pixel size is significantly smaller in the center group than the periphery group.

H4c:

Pixel size is significantly different between back, middle, and front groups when

the target surface gets tilted.
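
The expectation that the mm size of a pixel grows from the center toward the periphery can be illustrated with a generic one-parameter radial distortion model. This model is an assumption introduced here for illustration and is not the distortion analysis of chapter 5. It maps an undistorted normalized radius r to a distorted radius r(1 + k r^2), with a negative k describing barrel-type distortion. The local radial magnification is then 1 + 3 k r^2, so the mm size of a pixel is the center value divided by that factor. The values of `center_mm_per_px` and `k` below are hypothetical.

```python
def radial_mm_per_pixel(r_norm: float, center_mm_per_px: float, k: float) -> float:
    """Radial mm size of a pixel under the model r_d = r * (1 + k * r**2).

    r_norm           : undistorted normalized radius (0 at center, 1 at image edge)
    center_mm_per_px : mm size of a pixel at the image center
    k                : distortion coefficient (negative for barrel distortion)
    """
    # Derivative of r * (1 + k * r**2) with respect to r.
    magnification = 1.0 + 3.0 * k * r_norm ** 2
    if magnification <= 0:
        raise ValueError("model is not valid at this radius for the given k")
    return center_mm_per_px / magnification


# Hypothetical parameters for a barrel-distorted image.
for r in (0.0, 0.5, 1.0):
    size = radial_mm_per_pixel(r, center_mm_per_px=0.05, k=-0.1)
    print(f"r = {r:.1f}: {size:.4f} mm per pixel")
```

With a negative `k`, the computed pixel size is smallest at the center and increases toward the edge of the field of view, which matches the direction of the difference stated in the first hypothesis above.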

Chapter  6  builds  on  the  results  of  chapter  5  and  presents  the  required  methodology  for 

calibrated horizontal measurements using the laser-calibrated endoscope. Horizontal

measurements could provide a better and direct means for studying the developmental aspects of 

vocal folds153 and laryngeal tissues, quantifying the relationship between an intervention and its 

resulting physiological changes (e.g. post-intervention changes in the lesion size), staging and 

grading  of  relevant  laryngeal  diseases191,  and  providing  calibrated  spatial  measurements  for 

patient-specific  models.  To  achieve  these  goals,  a  system  with  the  capability  of  calibrated 

horizontal measurements should be developed first. The aim of this chapter is to develop a method 

that could address this need. The main research question of chapter 6 is: 

Q5: 

How could we use a structured laser projection system for measuring the horizontal 

distance between two points on a target surface? 

Associated with this research question the following hypotheses were formed:

H5a: 

Horizontal  measurement  error  from  the  laser-projection  system  significantly 

increases if the nonlinear distortion is not properly compensated for. 

 

H5b: 

Horizontal measurement error will be positively correlated to working distance. 

Different factors could affect the accuracy of calibrated horizontal measurements. The effects 

of  vertical  distance  and  the  spatial  location  of  the  object  inside  the  field  of  view  (FOV)  are 

accounted for in the proposed method. Additionally, the effect of working distance, the imaging 

angle, and imaging from a non-flat surface on the measurement errors are quantified and reported.  

Chapters 4 and 6 of this dissertation are aimed at developing methods for calibrated vertical

and horizontal measurements using a laser-projection endoscope. The methods are developed in a 

very controlled setting and using benchtop recordings from flat surfaces. In order to validate the 

methods in a more complex and realistic setting, chapter 7 is devoted to the validation of the 

developed methods. Specifically, based on our daily experiences we know that the angle of a 

camera relative to a scene affects the way that scene is recorded. Therefore, it is expected for the 

imaging angle to affect the accuracy of horizontal and vertical measurements. However, this topic 

has received very limited attention in the field of voice.204,205 One possible reason for this gap could be

a  lack  of  quantitative  values  regarding  the  effect  of  imaging  angle  on  the  accuracy  of 

measurements.  Additionally,  a  3D  surface  is  used  to  evaluate  the  accuracy  of  the  developed 

methods on non-flat surfaces. 

The main research questions of chapter 7 are: 

Q6a: How does the imaging angle affect the performance of the vertical and horizontal measurements?

Q6b: How does the topology of a 3D surface affect the vertical and horizontal measurements?

Associated with these research questions the following hypotheses were formed: 

H6a: The tilting angle of the target surface and the working distance will be good predictors of the vertical measurement error.

H6b: The tilting angle of the target surface and the working distance will be good predictors of the horizontal measurement error.

H6c: The vertical measurement errors from a non-flat surface will be higher than those from a flat surface positioned at the same estimated average vertical distance.

H6d: The horizontal measurement errors from a non-flat surface will be higher than those from a flat surface positioned at the same estimated average vertical distance.

Finally, chapter 8 presents a summary of the findings. 

Table 1.1 presents a summary of the chapters of the dissertation. Specifically, for each chapter 

the type of endoscope, the relevant dataset, and its primary goals are presented. 

Table 1.1. Summary of different chapters of the dissertation. 

Chapter    Endoscope  Dataset                                            Outcome

Chapter 2  Rigid      Pre- and post-surgery in-vivo HSV recordings.      Indirect calibrated horizontal measurements
                      Calibrated intraoperative still images.

Chapter 3  Rigid      Pre- and post-surgery in-vivo HSV recordings.      Indirect calibrated closing phase maximum velocity
                      Calibrated intraoperative still images.

Chapter 4  Flexible   Benchtop recordings from white papers.             Direct calibrated vertical measurements

Chapter 5  Flexible   Benchtop recordings from rectangular grid papers.  Quantification of non-linear distortion of a fiberoptic flexible endoscope.

Chapter 6  Flexible   Benchtop recordings from white papers.             Direct calibrated horizontal measurements
                      Benchtop recordings from circular grid papers.
                      Benchtop recordings from circle sectors.
                      Benchtop recordings from line segments.

Chapter 7  Flexible   Benchtop recordings from line segments.            Validation of developed direct horizontal and vertical methods.
                      Benchtop recordings from a 3D printed surface.

 

 


1.4. Recordings setup and characteristics 

The proposed research is based on different sets of video recordings. Specifically, two

types of recordings, in-vivo and benchtop, are used in this dissertation. Considering the

extensive use of benchtop recordings, this section presents the employed setup, as well as the

protocol that was followed for the benchtop data collections. 

1.4.1. Benchtop recording setup 

This dissertation uses benchtop recordings for the development of methods for vertical and 

horizontal measurements using a laser-calibrated fiberoptic flexible endoscope,195 as well as for

investigation of the effect of different factors on the accuracy of measurements. Therefore, a setup 

that allows precise variations in the working distance and the imaging angle was developed. The 

setup  consisted  of  a  vertical  pillar  that  was  connected  to  a  horizontal  surface.  A  high-speed 

monochrome camera Phantom v7.1 (Vision Research Inc., Wayne, NJ) was connected to the pillar 

such that it was perpendicular to the horizontal surface. A 45-mm lens coupler was used to connect 

the flexible endoscope to the camera. The distal tip of the endoscope was passed through two 

fixtures with small holes to keep the distal end of the endoscope fixed. The target surface was 

attached to an adjustable arm with two degrees of freedom. Specifically, the vertical adjustment of

the arm allowed us to accurately regulate the distance between the target surface and the distal end

of the endoscope. Additionally, the setup allowed us to accurately regulate the angle between the

target surface and the imaging axis of the endoscope. The first parameter is called the working

distance, and the second parameter is called the tilting angle in the rest of this dissertation. Figure

1.2 shows the employed setup for benchtop recordings.

 


Figure 1.2. The employed setup for benchtop recordings. 

 

1.4.2. Recording protocol 

Based on our preliminary studies and analyses a recording protocol was developed. Benchtop 

recordings were acquired using the following protocol.  

(1) We made sure that the FOV was completely inside the image frame and a border of at least 

five pixels was present on all four sides of the FOV. Figure 1.3 depicts this condition. 

Figure 1.3. Examples of incorrect placements of the FOV in the image frame. 

 

 


(2) We made sure that the FOV was quite visible and had a sharp contrast with the black 

background. Figure 1.4 depicts this condition. 

Figure 1.4. Some examples of the FOV with unclear edges. 

 

(3) We made sure that the fiducial marker was completely inside the image frame and it had a 

border of at least five pixels. Figure 1.5 depicts this condition. 

Figure 1.5. Some examples of the inadequate border between the FOV and the image frame. 

 

(4) We made sure that the fiducial marker was visible and had a sharp contrast with the black 

background. Figure 1.6 shows an unacceptable example. 

 


Figure 1.6. An example image with non-visible fiducial marker. 

 

 

(5) We used similar recording parameters for all recordings. The only parameters that were 

allowed to vary were the working distance, the tilting angle of the target surface (only in 

chapter 6), the illumination intensity of the light source, power of the laser source, and the 

exposure time of the camera. 

(6) The xenon light is essential for recording at high frame rates; however, it adds high-intensity

divergence to the image. This divergence could interfere with the accuracy of the

calibration protocols and add unnecessary complexities to the image processing steps.

Therefore, all benchtop recordings used low frame rates, and an external light source was

used instead of the xenon light. For this purpose, we placed a study lamp near the

target surface such that no shadow was projected on the target surface.
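
Conditions (1) and (3) of the protocol can be checked automatically once the FOV or the fiducial marker has been segmented. The following is a minimal sketch, assuming a binary mask is already available from some segmentation step that is not described in this section; the function name `has_min_border` and the mask input are hypothetical and are not part of the protocol itself.

```python
import numpy as np


def has_min_border(mask: np.ndarray, min_border: int = 5) -> bool:
    """Check that a segmented region keeps a margin from the image frame.

    `mask` is a 2-D boolean array (True inside the FOV or the fiducial
    marker). The function returns True when the region is at least
    `min_border` pixels away from all four sides of the frame.
    The mask itself is an assumed input (hypothetical segmentation step).
    """
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))  # rows that contain the region
    cols = np.flatnonzero(mask.any(axis=0))  # columns that contain the region
    if rows.size == 0:
        return False  # nothing was segmented, so the check cannot pass

    height, width = mask.shape
    top = rows[0]
    bottom = height - 1 - rows[-1]
    left = cols[0]
    right = width - 1 - cols[-1]
    return bool(min(top, bottom, left, right) >= min_border)
```

A recording whose mask fails this check would be re-acquired rather than corrected in post-processing, which matches the intent of the protocol.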

   

 

 

 


CHAPTER 2: INDIRECT HORIZONTAL CALIBRATION OF IN-VIVO HSV RECORDINGS

Based on: 

Ghasemzadeh H., Deliyski D. D., Hillman R. E., Mehta D. D. Indirect horizontal calibration of 
high-speed videoendoscopy recordings. How to do it, and what to look for? in Preparation. 
 

 

Summary: Calibrated horizontal measurements from high-speed videoendoscopy recordings could 

offer significant advantage to precision medicine, patient-specific modeling, and evidence-based 

practice in the field of speech-language pathology and laryngology. Recently, laser-projection

systems have been developed for achieving the calibrated measurement goals. However, such

systems are still in their infancy and are available to only a few research labs. This chapter

presents an alternative approach for achieving the horizontal calibration. The main idea of this

alternative approach is to find a proper common object and then normalize lengths of other spatial

measures to it. The underlying assumptions behind the validity of this approach are studied and it

is shown that three main conditions should hold. First, the common object should be registered

with negligible error. Second, the true length of the common object should be fixed.

Finally, the common object and the target object should be at the same vertical distance. Two tests

are proposed that could detect significant violation of the first and the second assumptions. In this 

study, a pre-existing dataset is used to demonstrate the feasibility of this approach. 

 

 


2.1. Introduction 

The uncalibrated size of a region of interest (ROI) (e.g. length, width, area) in any image can 

be measured by counting its corresponding number of pixels. Assuming a similar distance between 

all objects in the scene and the camera (i.e. the same working distance), one can compare the size 

of different objects within that image. Under this assumption, typical images can be utilized for 

within-image size comparisons. The main difficulty arises if we want to compare the size of an 

object from an image, with the size of a different object (or even the same object) from a different 

image (i.e. between-image size comparison). Considering the simplest scenario, we know that the 

size of an object in an image depends on its working distance. Therefore, differences between pixel 

lengths in different images could be attributed to either difference in their working distances or 

difference in their actual sizes. Obviously, this issue arises because we do not have a standard basis 

for comparison between the two images. This issue would be resolved if we could map both images 

to a fixed and standard basis. This task is called the horizontal calibration and it makes between-

image size comparison possible. Considering that the meter is the base unit of the length in the 

International System of Units (SI), it is very typical to use it as a standard basis. In that regard, 

horizontal calibration is the process by which one determines the size of a pixel in a metric unit (typically

millimeters (mm) in voice science). This number then serves as a scale for conversion from pixels

into mm; hence it is called either the pixel size or the pixel-to-mm conversion scale, and the two terms

are used interchangeably in the rest of this dissertation. Different approaches

are possible for the computation of this conversion scale. This chapter is devoted to what we have 

called the indirect computation of the conversion scale. 

Let us consider a set of images from different scenes, all containing one common object. 

Between-image size comparisons can be made within this set if the pixel length of that common 

 


object is used for calibration (i.e. the pixel length of the desired object is normalized using the 

pixel length of the common object). Now, if we know the metric size of that common object, 

images  from  that  set  can  be  mapped  into  a  fixed  and  standard  basis  and  between-image  size 

comparisons  can  be  made  across  different  image  sets.  This  approach  is  the  basis  of  indirect 

horizontal  calibration.  In  summary,  we  have  a  set  of  images  with  some  auxiliary  information 

regarding the metric size of a specific object or a specific spatial attribute from that set. In the 

context of laryngeal imaging, this auxiliary information could be for example the mm size of a 

lesion93, mm length of the vocal folds, or some mm features of a blood vessel.206 

Indirect calibration is based on auxiliary information that comes from a different image

than the one from which we would like to perform measurements. One typical example is the

intraoperative calibrated measurement of the lesion size.93,190,191 Figure 2.1 shows some examples 

of calibrated intraoperative images. In this approach, a surgical instrument with a miniaturized 

ruler or a known mm length is positioned next to a lesion and the whole scene is recorded on a 

picture.  The  image  then  can  be  printed  and  the  pixel  lengths  of  the  lesion  and  the  surgical 

instrument can be measured using a high-precision caliper.93 One can determine the pixel-to-mm 

conversion scale from this information. The computed scale could then be used for calibration of 

the HSV recording from the same patient. Another possible source of such auxiliary information 

could be a laser-calibrated VSB recording.133,207 Considering that laser-calibrated VSB systems 

have been around longer than their HSV counterparts122 and the simplicity of their optical design 

(typically  parallel  laser  projections122),  the  required  methodology  for  analysis  and  calibrated 

measurements of the laser-calibrated VSB images is already available. Additionally, the

significantly shorter integration time and significantly brighter illumination sources of HSV

systems in comparison to VSB add extra requirements to the optical design of the

 


laser projection system. Therefore, in some instances, it may be more convenient and practical to 

use a combination of laser-calibrated VSB and a non-calibrated HSV system. One example would 

be a recently funded NIH R01 grant (R01 DC017923, PI: Verdolini Abbott) that is very tightly 

related to this project and the next chapter. In that grant proposal, VSB images will be used for 

calibrated measurement purposes, whereas HSV images will be used for studying the

temporal and spatial vibratory characteristics of the vocal folds. Assuming a similar length for the 

vocal folds (or another spatial attribute) between VSB and HSV recordings, the mm length of the 

vocal folds (or another spatial attribute) from laser-calibrated VSB could be utilized for calibration 

of the HSV images. Previous studies have suggested that pitch of phonation depends (among other 

factors) on the length of the vocal folds and the subglottal air pressure.12 Additionally, subglottal 

air pressure is a good predictor of the loudness.12 Hence, if both recordings are done using the 

same pitch and loudness (e.g. habitual pitch, habitual loudness) similar attributes of the vocal folds 

across the recordings could be assumed to some extent. 

Figure 2.1. Two examples of intraoperative calibrated images, taken from references 190 and 93.  

 

 

This chapter presents a method for indirect horizontal calibration of HSV recordings from 

their corresponding intraoperative still calibrated images. The developed method could be used for 

different applications, including investigation of the kinematics of disordered phonation, studying

the developmental aspects of vocal folds’ vibration153 and laryngeal tissues, the post-intervention 

 


physiological changes (e.g. lesion size), staging and grading of relevant laryngeal disease191, and 

providing calibrated horizontal measurements for patient-specific models. 

2.2. Aim and hypothesis 

The project of this chapter has external funding, and it is tightly related to a recently approved 

NIH  R01  grant  R01  DC017923  (PI:  Verdolini  Abbott)  with  a  subcontract  to  Michigan  State 

University (sub-award PI: Deliyski). The second aim of that grant proposal requires calibrated 

horizontal measurements from in-vivo HSV recordings. The imaging will be conducted using a 

rigid endoscope, and the subjects will be pediatric patients diagnosed with nodules. Additionally, 

a parallel laser projection VSB system153 will be utilized for measuring the size of the nodule pre- 

and post-therapy.  

The idea of indirect horizontal calibration has already been used in multiple

studies.93,190,191,206,207 However, this notion has not yet received a formal treatment. Specifically,

the conditions and assumptions behind the validity of this approach are yet to be studied. The main 

aims of this chapter are to investigate the possible sources of error in the indirect calibration 

approach, and then to use intraoperative images for indirect calibration of pre- and post-surgery 

HSV recordings. The central research question of this chapter is, 

Q1: How could calibrated intraoperative images be used for spatial calibration of pre- and post-surgery HSV recordings?

To answer this central research question, it was broken into three sub-questions. 

Q1a: What are the main assumptions behind the validity of the indirect calibration approach?

Q1b: How can validity of the registration step be evaluated?

Q1c: Multiple common attributes can be identified for calibration of the post-surgery HSV recordings. How could we select the most appropriate one?

The following hypotheses were formed in this chapter: 

H1a: The proposed registration uncertainty test can detect instances of common attributes that have high registration uncertainties.

H1b: A proper object could be selected from the calibrated intraoperative image, such that its registration on the pre- and post-surgery HSV recordings would lead to the horizontal calibration of the HSV images.

2.3.  Material and method 

2.3.1. Participants and data acquisition 

The aim of this chapter is pursued using retrospective data. Calibrated intraoperative images

and HSV recordings were obtained from 26 adults with vocal fold mass lesions at Massachusetts 

General Hospital. Subjects were recorded using a custom-built HSV system over two different 

sessions. The first session was before the surgery, and the second recording was carried out on 

average 3.5 weeks after the surgery. The HSV system consisted of the following components: a

color Phantom v7.3 camera (Vision Research, Inc., Wayne, New Jersey), a 300-Watt xenon light

(Model 7152A, PENTAX Medical Company, Montvale, New Jersey), and a 70° 10-mm rigid

laryngoscope  (Model  49-4072,  JEDMED  Instrument  Co,  St.  Louis,  Missouri).  The  recordings 

were  done  at  a  sampling  rate  of  6,250  fps  with  the  maximum  integration  time  and  a  spatial 

resolution of 320×352 pixels. The surgery was performed using cold instruments and/or a 532-nm 

pulsed  potassium  titanyl  phosphate  laser  photoablation  under  general  anesthesia.  Before  the 

 


operation, a surgical instrument with a known mm length was placed next to the lesion and an 

intraoperative image was recorded. 

Considering the aims of this chapter, the full temporal resolution of HSV recordings is not 

required. Therefore, a Hamming window with a size of 5 was used for temporal smoothing of the

data. To that end, every five consecutive frames of HSV data were weighted by a Hamming

window,  and  then  they  were  averaged.  This  process  significantly  reduced  the  noise  of  the 

recordings.  
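
The temporal smoothing step can be sketched as follows. This is a minimal illustration under two assumptions that the text does not settle: the five-frame groups are taken as non-overlapping blocks, and the Hamming weights are normalized to sum to one so that the overall brightness of the recording is preserved. The function name `hamming_block_average` is hypothetical.

```python
import numpy as np


def hamming_block_average(frames: np.ndarray, window: int = 5) -> np.ndarray:
    """Weighted average of every `window` consecutive frames.

    `frames` has shape (n_frames, height, width). Frames are grouped into
    non-overlapping blocks (an assumption), each block is weighted by a
    Hamming window, and the weights are normalized to sum to one
    (also an assumption). Trailing frames that do not fill a block are dropped.
    """
    frames = np.asarray(frames, dtype=float)
    n_blocks = frames.shape[0] // window

    weights = np.hamming(window)
    weights = weights / weights.sum()

    # Reshape to (n_blocks, window, height, width), then contract the
    # window axis against the weights.
    blocks = frames[: n_blocks * window].reshape(
        n_blocks, window, *frames.shape[1:]
    )
    return np.tensordot(blocks, weights, axes=([1], [0]))
```

With a window of 5, this reduces the effective frame rate by a factor of five, which is acceptable here because the full temporal resolution is not needed for registration.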

2.3.2. Indirect calibration principles and assumptions 

Let $L^{mm}$ and $L^{px}$ denote the calibrated length and the uncalibrated length of an object in an image.
We can define the pixel size ($\delta$) as,

$$\delta = \frac{L^{mm}}{L^{px}} \tag{2-1}$$

The validity of any uncalibrated size comparison depends on certain implicit assumptions
regarding the parameter $\delta$. For example, the percentage change in the pixel length (i.e. the
uncalibrated length in the image) of a lesion pre- and post-intervention could be used as a direct

evaluation criterion for measuring and comparing the efficacy of different interventions. In this 

group-comparison scenario, an implicit assumption is that for each subject the measurement from 

the pre and post conditions are on the same scale, and hence can be compared with each other. 

Being on the same scale means that if the pixel length of the lesion in the post-intervention image 

is reduced by 20%, in reality, the true length in mm (i.e. the calibrated length) of that lesion has 

also been reduced by 20%. More precisely, the implicit assumption is that the mm size of a pixel 

(i.e. the pixel size) in the pre and post conditions are the same for each subject. We call this the 

within-subject  size  comparison  assumption.  It  is  noteworthy  that  most  image-based  group 

 


comparison studies in voice (both objective and subjective) are based on this assumption. On the 

other  hand,  if  the  purpose  of  the  research  is  to  compare  different  groups  or  to  relate  post-

intervention changes in the lesion size to some outcomes of the phonatory mechanisms (e.g. some 

acoustic measurements), a stricter assumption should hold. More precisely, not only the mm

size of pixels in the pre- and post-conditions for each subject should be the same, but also the mm 

size  of  pixels  in  different  subjects  should  be  the  same.  We  call  this  the  between-subject  size 

comparison  assumption.  This  implicit  assumption  is  present  in  most  (if  not  all)  image-based 

regression and modeling studies in voice. It is noteworthy that the between-subject size comparison

assumption implies the within-subject size comparison assumption; however, the converse

does not necessarily hold. Therefore, the conditions and assumptions behind the validity of each 

approach are studied in different sections. 

2.3.2.1. Indirect calibration for between-subject size comparison 

The between-subject size comparison requires the mapping of all measurements into a
standard and fixed basis. This often requires the knowledge of the mm length of the common
attribute. Let $L^{mm}_{c,1}$ denote the calibrated length of the common attribute (e.g. the lesion size) in
the first image (e.g. the intraoperative image). Additionally, let $L^{px}_{c,2}$ and $L^{px}_{t,2}$ denote the
uncalibrated lengths of the common attribute and the target object (e.g. length of the vocal folds) in
the second image (e.g. a frame of HSV recording). The aim of indirect calibration is to estimate
the calibrated length of the target object in the second image ($L^{mm}_{t,2}$) from the knowledge of $L^{mm}_{c,1}$.
Obviously, if we have the pixel size of the target object in the second image ($\delta_{t,2}$) we can compute
$L^{mm}_{t,2}$ by,

$$L^{mm}_{t,2} = \delta_{t,2} \cdot L^{px}_{t,2} \tag{2-2}$$

Given that the second image is not calibrated, $\delta_{t,2}$ is not known. Let $\delta_{c,2}$ denote the pixel size
of the common attribute in the second image. Now, assuming $\delta_{t,2} = \delta_{c,2}$ and $L^{mm}_{c,1} = L^{mm}_{c,2}$ we can
compute $L^{mm}_{t,2}$ as follows,

$$\delta_{c,2} = \frac{L^{mm}_{c,2}}{L^{px}_{c,2}} = \frac{L^{mm}_{c,1}}{L^{px}_{c,2}} \tag{2-3}$$

$$L^{mm}_{t,2} = \delta_{c,2} \cdot L^{px}_{t,2} \tag{2-4}$$

The value of $\delta$ could depend on different parameters of recording including the vertical distance,
the imaging angle, and the spatial location of the object inside the field of view.208,209 For simplicity
we assume a case where $\delta$ only depends on the vertical distance. In that case, the $\delta_{t,2} = \delta_{c,2}$
assumption translates into the common attribute and the target object being at the same vertical
distance from the endoscope. Based on the presented arguments, the main assumptions behind the
validity of indirect calibration for between-subject size comparison applications are as follows.
First, the common attribute can be registered accurately on the second image. Second, the common
attribute and the target object are at the same vertical distance from the endoscope. Third,
$L^{mm}_{c,1} = L^{mm}_{c,2}$, which means that the calibrated length of the common attribute should not change between

the  first  and  the  second  images.  These  three  conditions  will  be  referred  to  as  the  registration 

accuracy assumption, the similarity in the vertical distance assumption, and the consistency of the 

common  attribute  assumption  in  the  rest  of  this  dissertation.  As  a  final  note,  the  similarity  in 

vertical distance assumption was derived based on the assumption that pixel size only depends on 

the working distance. However, spatial location in the fiberoptic endoscopes and the imaging angle 

are also significant factors for the value of the pixel size.208 Therefore, the similarity in vertical 

distance assumption would become much more complicated during the fiberoptic endoscopy, or if 

the optical axis is not perpendicular during the imaging sessions. 
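
The computation in Equations 2-3 and 2-4 can be illustrated with a short sketch. It assumes that all three assumptions above hold (accurate registration, same vertical distance, and a consistent common attribute); the function and parameter names are hypothetical and chosen only for illustration.

```python
def pixel_size_from_common(common_mm: float, common_px: float) -> float:
    """Pixel size (mm per pixel) from the common attribute (Equation 2-3).

    `common_mm` is the calibrated length of the common attribute taken from
    the first image; `common_px` is its uncalibrated length in the second image.
    """
    if common_mm <= 0 or common_px <= 0:
        raise ValueError("lengths must be positive")
    return common_mm / common_px


def calibrated_target_length(
    common_mm: float, common_px_img2: float, target_px_img2: float
) -> float:
    """Calibrated length of the target object in the second image (Equation 2-4).

    Valid only if the target and the common attribute share the same pixel
    size in the second image, which is an assumption of the method.
    """
    return pixel_size_from_common(common_mm, common_px_img2) * target_px_img2
```

For example, a common attribute of 3 mm that spans 60 pixels in the HSV frame gives a pixel size of 0.05 mm, so a target object spanning 250 pixels in the same frame would be estimated at 12.5 mm.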

 


2.3.2.2. Indirect calibration for within-subject size comparison 

Performing the within-subject size comparison requires less information and in that regard is
more practical. However, it is very likely that the outcome of calibration could not be used for
absolute measurements, but rather the percent change in the size (e.g. percent change in the lesion
size post-therapy). Assuming the availability of calibrated lengths of the target object in the first
and the second image, the calibrated (i.e. the true) percent change ($\Delta$) of a target object is
equal to,

$$\Delta = \frac{L^{mm}_{t,2} - L^{mm}_{t,1}}{L^{mm}_{t,1}} \times 100\% = \left(\frac{L^{mm}_{t,2}}{L^{mm}_{t,1}} - 1\right) \times 100\% \tag{2-5}$$

Now, we could use the value of pixel size and re-write Equation 2-5 using uncalibrated lengths,

$$\Delta = \left(\frac{\delta_{t,2} \cdot L^{px}_{t,2}}{\delta_{t,1} \cdot L^{px}_{t,1}} - 1\right) \times 100\% \tag{2-6}$$

We could multiply the numerator and denominator of Equation 2-6 by the same number,

$$\Delta = \left(\frac{\delta_{t,2} \cdot L^{px}_{t,2}}{\delta_{t,1} \cdot L^{px}_{t,1}} \times \frac{L^{mm}_{c,1}}{L^{mm}_{c,1}} - 1\right) \times 100\% \tag{2-7}$$

Assuming $L^{mm}_{c,1} = L^{mm}_{c,2}$ we can rewrite Equation 2-7 using the uncalibrated lengths of the
common attribute,

$$\Delta = \left(\frac{\delta_{t,2} \cdot L^{px}_{t,2}}{\delta_{t,1} \cdot L^{px}_{t,1}} \times \frac{\delta_{c,1} \cdot L^{px}_{c,1}}{\delta_{c,2} \cdot L^{px}_{c,2}} - 1\right) \times 100\% \tag{2-8}$$

Doing some re-arrangements, we would have,

$$\Delta = \left(\frac{L^{px}_{t,2} / L^{px}_{c,2}}{L^{px}_{t,1} / L^{px}_{c,1}} \times \frac{\delta_{t,2} \cdot \delta_{c,1}}{\delta_{t,1} \cdot \delta_{c,2}} - 1\right) \times 100\% \tag{2-9}$$

Assuming $\frac{\delta_{t,2} \cdot \delta_{c,1}}{\delta_{t,1} \cdot \delta_{c,2}} = 1$ (this assumption would be discussed shortly), then calibrated percentage
change can be computed from uncalibrated lengths as follows. Pick a suitable common attribute
and measure its uncalibrated length in the first ($L^{px}_{c,1}$) and the second image ($L^{px}_{c,2}$). Also,
measure the uncalibrated lengths of the target object in the first ($L^{px}_{t,1}$) and the second image
($L^{px}_{t,2}$). The calibrated percentage change can be computed as:

$$\Delta = \left(\frac{L^{px}_{t,2} / L^{px}_{c,2}}{L^{px}_{t,1} / L^{px}_{c,1}} - 1\right) \times 100\% \tag{2-10}$$

Now, let us investigate the condition for $\frac{\delta_{t,2} \cdot \delta_{c,1}}{\delta_{t,1} \cdot \delta_{c,2}} = 1$. The value of $\delta$ could depend on different
parameters of recording including the vertical distance, the imaging angle, and the spatial location
of the object inside the field of view.208,209 For simplicity we assume a case where $\delta$ only depends
on the vertical distance. In that case, if we have the vertical distance we can compute the true value
of the $\frac{\delta_{t,2} \cdot \delta_{c,1}}{\delta_{t,1} \cdot \delta_{c,2}}$ term and plug it in Equation 2-9. However, the vertical information is often (if not always)
lost during the image acquisition. Therefore, we need to find conditions that govern $\frac{\delta_{t,2} \cdot \delta_{c,1}}{\delta_{t,1} \cdot \delta_{c,2}} = 1$.
There are two trivial solutions to this. Either $\delta_{t,1} = \delta_{t,2}$ and $\delta_{c,1} = \delta_{c,2}$, or $\delta_{t,1} = \delta_{c,1}$ and $\delta_{t,2} = \delta_{c,2}$. The

first solution means that the vertical distance between the endoscope and the target object, and the 

vertical distance between the endoscope and the common attribute should not change between the 

first and the second images. The second solution means that the target object and the common 

attribute  are  on  the  same  vertical  levels.  Considering  that  the  larynx  can  move  in  the  vertical 

direction, the first condition cannot be controlled. Therefore, the second condition would be the more

feasible and practical case for laryngeal imaging applications. In summary, the validity of indirect

calibration for within-subject size comparison depends on the same three main assumptions of: 

 


the registration accuracy assumption, the similarity in the vertical distance assumption, and the

consistency of the common attribute assumption. As a final note, the

similarity in the vertical distance assumption was derived based on the assumption that pixel size 

only depends on the working distance. However, spatial location in the fiberoptic endoscope and 

the  imaging  angle  are  also  significant  factors  of  the  pixel  size.208  Therefore,  the  similarity  in 

vertical distance assumption would become much more complicated in fiberoptic endoscopy or if 

the optical axis is not perpendicular in the two imaging sessions. 
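
The within-subject computation of Equation 2-10 needs only four pixel lengths. The sketch below is an illustration under the stated assumptions (the ratio of pixel sizes equals one and the common attribute keeps its true length); the function name `percent_change` and its parameter names are hypothetical.

```python
def percent_change(
    target_px_1: float,
    common_px_1: float,
    target_px_2: float,
    common_px_2: float,
) -> float:
    """Percent change of the target object between two images (Equation 2-10).

    Each target length is first normalized by the pixel length of the common
    attribute in the same image, so a pure change of scale between the two
    images cancels out. All inputs are uncalibrated pixel lengths.
    """
    if min(target_px_1, common_px_1, target_px_2, common_px_2) <= 0:
        raise ValueError("pixel lengths must be positive")
    normalized_1 = target_px_1 / common_px_1
    normalized_2 = target_px_2 / common_px_2
    return (normalized_2 / normalized_1 - 1.0) * 100.0
```

As a usage example, a lesion of 100 pixels next to a 50-pixel common attribute in the first image, and of 80 pixels next to the same 50-pixel attribute in the second image, gives a change of about -20%. If the second image were simply recorded twice as close (200 and 100 pixels), the result would be 0%, as expected.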

To provide more insights into these assumptions, some scenarios that violate each assumption 

are provided. For the first assumption, let us consider a common attribute that is blurry, or does 

not  have  a  clear  and  sharp  boundary.  In  that  case,  the  common  attribute  cannot  be  registered 

accurately. For the second assumption, let the target object be some spatial features of the vocal 

folds and the common attribute be some spatial feature in the supraglottic region. Considering that 

the common attribute is closer to the camera it will have a smaller pixel size in comparison to the 

target object, which is a violation of the similarity in the vertical distance assumption. For the third 

assumption, let us consider a soft and pliable common attribute attached to the vocal fold, which gets

deformed easily. Now, if the vocal folds stretch, the common attribute will also elongate.

Obviously, this is a violation of the consistency of the common attribute assumption.

It is noteworthy that the second and the third assumptions could often contradict each

other. Specifically, if the target object is on the vocal fold, using a common attribute that is not on 

the vocal fold would possibly satisfy the third assumption to the maximum extent. Conversely, the 

second assumption requires the selection of a common attribute that is as close to the target object 

as possible. In practice, we need to make a tradeoff between these two assumptions and other 

considerations and select a common attribute that is more suitable. A final word regarding the 

 


second assumption: if the target object and the common attribute are not on the same vertical

distance, the indirect calibration approach will introduce some errors into the measurement. The 

magnitude of this error will depend on the vertical distance between the common attribute and the 

target object and also the vertical distance between the imaging component (i.e. the endoscope) 

and the closer object. Additionally, a higher vertical difference between the common attribute and 

the target object will lead to higher error. Conversely, keeping the vertical difference between the 

common  attribute  and  the  target  object  fixed  and  increasing  the  vertical  distance  between  the 

imaging component (i.e. the endoscope) and the closer object would decrease the error. This is 

especially important during flexible endoscopy, where vertical distance can be varied in a large 

range. As a practical guide, we need to keep the vertical distance between the common attribute 

and the target object as small as possible, especially when the imaging is done at a close working 

distance (e.g. during flexible endoscopy).  
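
The qualitative behavior described above can be illustrated numerically. The sketch below is not the model used in this dissertation: it assumes an idealized pinhole camera in which the pixel size is directly proportional to the vertical distance and there is no lens distortion. The function name `relative_length_error` and its parameters are hypothetical.

```python
def relative_length_error(
    near_distance_mm: float,
    vertical_gap_mm: float,
    common_is_closer: bool = True,
) -> float:
    """Percent error of an indirectly calibrated length.

    Assumes pixel size proportional to vertical distance (idealized pinhole
    model, an assumption made only for this illustration). The estimated
    length uses the pixel size of the common attribute in place of that of
    the target, so estimated / true = distance_common / distance_target.
    """
    if near_distance_mm <= 0 or vertical_gap_mm < 0:
        raise ValueError("distance must be positive and gap non-negative")
    far_distance_mm = near_distance_mm + vertical_gap_mm
    if common_is_closer:
        ratio = near_distance_mm / far_distance_mm
    else:
        ratio = far_distance_mm / near_distance_mm
    return (ratio - 1.0) * 100.0
```

Under this simplified model, a 2 mm gap gives an error of about -16.7% when the closer object is at 10 mm, and about -9.1% when it is at 20 mm, whereas doubling the gap to 4 mm at 10 mm raises the magnitude to about 28.6%. This agrees with the practical guide above: keep the gap small, especially at close working distances.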

2.3.3. Evaluation of indirect calibration 

The indirect calibration lacks the existence of a universal and standard basis of comparison 

(e.g. metric scale) on the target data (i.e. HSV recording). However, it can offer some functionality 

of calibration by registering a proper common attribute on the HSV data. The validity and accuracy 

of any measurement following the indirect calibration would depend on correct and successful 

registration of the common attribute, as well as on the two other fundamental

characteristics of the common attribute discussed in the previous section. While due to the lack of 

a universal and standard basis these conditions could not be checked directly, a test is presented in 

the next section that can indirectly evaluate the registration accuracy assumption. 

 


2.3.3.1. Registration uncertainty test 

Accuracy of indirect calibration relies heavily on correct registration of the common attribute. 

Additionally, not all common attributes may have similar registration accuracies. For example, a 

common attribute could be blurry or may lack a clear and sharp boundary. A test is developed here 

that indirectly provides an estimation of the accuracy of the registration process. The computed 

value provides an upper bound for the registration accuracy and therefore puts a lower bound on

subsequent measurement uncertainty. That is, the subsequent measurements would at least have 

that amount of uncertainty. Finally, this test has a very high positive predictive value, meaning

that for a common attribute with a high score we could be quite confident that the amount of 

uncertainty is high, and therefore that common attribute would not be suitable.  

The test assumes that a total number of k different HSV recordings from multiple subjects are 

being calibrated at the same time. Additionally, it assumes that for each subject a single image 

with the common attribute (e.g. the intraoperative image) is available. We will call this the fixed

image in the rest of this chapter. The test consists of three steps of data selection, data registration, 

and data analysis.  

In the data selection step, a time point is selected randomly from one of the HSV recordings, 

and then frames within that glottal cycle are evaluated subjectively. The frame with the best visual

appearance of the common attribute is selected. This process is repeated n times for the current 

recording. The whole process is repeated for all k recordings. This will lead to a total number of 

k×n selected frames. These images will be referred to as the moving images in the rest of this 

chapter. Moving images are randomized, and then they are presented for the registration step.  

In the registration phase, a two-panel graphical user interface (GUI) is used where one panel 

shows the fixed image, and the other panel shows the moving image. The GUI is equipped with 

 


the zooming capability, to make the registration more accurate. The user clicks the boundary of 

the common attribute (e.g. the start and the endpoint of a lesion in the anterior-posterior direction) 

on the fixed image, and then does the same for the moving image. The software then computes

the uncalibrated pixel size of the common attribute in the fixed and the moving image and then 

records  them.  If  the  lesion  size  is  used  for  calibration,  the  size  is  computed  as  the  Euclidean 

distance between the two click points. This process is repeated until all images are processed.  

In  the  analysis  phase,  the  data  is  de-randomized  and  then  the  ratios  between  sizes  of  the 

common attribute in the fixed and the moving images are computed for each recording. This will 

lead to n such ratio values. The interquartile range of these n ratio values is computed. This process 

is repeated for all k recordings. Any recordings with an interquartile range larger than a threshold 

would be an instance of calibration with a high level of uncertainty and should be removed from 

later analysis. A method for selecting a proper value of the threshold is presented in section 2.4.1.3.  

It is noteworthy that the registration uncertainty test can be extended to select the most

suitable common attribute (e.g. anterior-posterior length of a lesion, medial-lateral length of a 

lesion, etc.) for each recording. For that purpose, different common attributes may be determined 

for each recording. The common attribute with the lowest interquartile range would be the most 

suitable common attribute for that recording.  
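The analysis phase described above can be sketched as follows. This is a minimal illustration and not the software used in this study: the function names, the quartile convention (the "inclusive" method of the Python standard library), and the example pixel sizes are assumptions.

```python
import statistics


def interquartile_range(values):
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def registration_uncertainty(fixed_px, moving_px):
    """IQR of the fixed-to-moving size ratios of one recording (n repeats)."""
    ratios = [f / m for f, m in zip(fixed_px, moving_px)]
    return interquartile_range(ratios)


def flag_recordings(scores, threshold):
    """Recordings whose score exceeds the threshold are removed from later analysis."""
    return [rec for rec, score in scores.items() if score > threshold]


# Hypothetical pixel sizes of the common attribute (n = 5 repeats per recording).
scores = {
    "rec_A": registration_uncertainty([100.0] * 5, [50.0] * 5),
    "rec_B": registration_uncertainty([100.0] * 5, [40.0, 50.0, 62.5, 80.0, 100.0]),
}
flagged = flag_recordings(scores, threshold=0.1)  # threshold value is illustrative
```

The same scoring function could be applied to several candidate common attributes of one recording, keeping the candidate with the lowest score.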

2.4.  Experiments and results 

Three  experiments  were  conducted  to  answer  the  research  questions  of  this  chapter. 

Experiment 1 demonstrates the efficacy of the proposed registration uncertainty test. Experiment 2 

investigates  the  consistency  of  different  common  objects  between  different  phonatory 

configurations.  Experiment 3  demonstrates  how  the  most  appropriate  common  object  may  be 

 

selected from several candidates. This section presents details of each experiment, followed

by results and the related discussions. 

2.4.1. Experiment 1: Efficacy of registration uncertainty test

This experiment was conducted to demonstrate the performance and efficacy of the registration 

uncertainty test. The following hypothesis was formed for this experiment.  

H1a: 

The  proposed  registration  uncertainty  test  can  detect  instances  of  common 

attributes that have high registration uncertainties. 

2.4.1.1. Database 

From the 26 subjects with calibrated intraoperative images, lesions were not visible in three 

recordings,  and  one  recording  was  blurry.  These  samples  were  excluded  from  the  rest  of  the 

analysis.  The  included  22  subjects  had  an  HSV  recording  from  their  comfortable  pitch. 

Additionally,  14  subjects  had  an  HSV  recording  from  their  higher  pitch.  Considering  that  the 

higher pitch requires a different glottal configuration and could also be an instance of interest for 

calibration, they were also included in the analysis. In summary, 36 different recordings were used 

for this experiment. 

2.4.1.2. Method 

Following  the  data  selection  step  from  the  registration  uncertainty  test,  10  frames  were 

randomly  selected  from  each  recording  and  were  saved  as  images.  Often,  the  frame  with  the 

maximum abduction resulted in the best visual exposure of the lesions. Additionally, from the 

saved 10 images, two images were randomly selected and were added again to the pool of saved 

 


images (20% redundancy). This resulted in 36×12=432 files to be registered. Registration and 

analysis followed the steps described in section 2.3.3.1. 
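The construction of the randomized registration pool (36 recordings, 10 frames each, plus two blind duplicates per recording) can be sketched as below. The function name, the seed, and the representation of an image as a (recording, frame) pair are hypothetical choices made only for illustration.

```python
import random


def build_registration_queue(n_recordings=36, n_frames=10, n_redundant=2, seed=0):
    rng = random.Random(seed)
    queue = []
    for rec in range(n_recordings):
        frames = [(rec, i) for i in range(n_frames)]
        # Add a random subset of the saved images a second time (redundancy).
        duplicates = rng.sample(frames, n_redundant)
        queue.extend(frames + duplicates)
    # Shuffle so that the presentation order does not reveal the recording.
    rng.shuffle(queue)
    return queue


queue = build_registration_queue()
print(len(queue))
```

Because every item carries its recording and frame indices, de-randomization in the analysis phase reduces to grouping the results by recording index.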

2.4.1.3. Results 

First,  the  redundant  samples  were  excluded  from  the  analysis.  Figure  2.2(A)  shows  the 

computed score (the interquartile range) for each recording.  


Figure 2.2. Results of registration uncertainty test: (A) values of interquartile range for different patients and 

(C)omfortable and (H)igh pitch phonations, (B) estimated pdf of interquartile range over all recordings. 

 
A statistical approach was used to determine the appropriate value of the threshold. To that 

end, the probability density function (pdf) of the interquartile range over all 36 recordings was 

estimated  using  a  Gaussian  kernel.  Figure  2.2(B)  shows  the  result.  Based  on  this  figure  the 

interquartile range can be attributed to three different classes. The first class has a very high value 

of the interquartile range, corresponding to calibration with a high level of uncertainty and hence 

high error. The second class has a moderate value of the interquartile range. And the last class 

corresponds to common attributes that can be registered with a low level of uncertainty. The value 

of the threshold was computed as the minimum value of the estimated pdf between the second and 

the third classes. 
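A minimal sketch of this threshold selection is given below. The kernel bandwidth, the evaluation grid, and the interquartile range values are assumptions made for illustration; the bandwidth used for Figure 2.2(B) is not specified here, and the scores are not the data of this experiment.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


def local_minima(pdf, lo, hi, steps=2000):
    """Grid locations where the estimated pdf has an interior local minimum."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [pdf(x) for x in xs]
    return [xs[i] for i in range(1, steps) if ys[i] < ys[i - 1] and ys[i] <= ys[i + 1]]


# Hypothetical scores forming a low, a moderate, and a high uncertainty class.
iqr_scores = [0.015, 0.017, 0.018, 0.020, 0.020, 0.022, 0.023, 0.025,
              0.075, 0.080, 0.085,
              0.190, 0.200, 0.210]

pdf = gaussian_kde(iqr_scores, bandwidth=0.01)   # bandwidth is an assumption
minima = local_minima(pdf, lo=0.0, hi=0.25)

# The valley with the smallest abscissa separates the low class from the moderate one.
threshold = minima[0]
```

The estimated threshold depends on the chosen bandwidth, so in practice the resulting pdf should be inspected visually, as was done with Figure 2.2(B).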

 


To test hypothesis H1a, the computed threshold was used to split the data into two groups, 

one  with  a  high  level  of  uncertainty  (6  samples)  and  one  with  a  low  level  of  uncertainty  (30 

samples). Then, intra-sample registration variability was computed using the redundant samples 

that were excluded from the previous analysis (two per recording). The absolute difference

between the computed ratios for each pair of redundant data was computed. This led to two

measurements per recording. Intra-sample registration variability was computed as the

average of those two values. Table 2.1 presents the descriptive statistics of each group. 
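The computation of the intra-sample registration variability for one recording can be written in a few lines. The function name and the ratio values below are hypothetical.

```python
def intra_sample_variability(pairs):
    """Mean absolute difference between the ratios of repeated registrations.

    pairs: [(ratio_first_registration, ratio_repeated_registration), ...]
    """
    diffs = [abs(first - repeat) for first, repeat in pairs]
    return sum(diffs) / len(diffs)


# Two redundant images per recording give two pairs of ratios (hypothetical values).
variability = intra_sample_variability([(1.90, 1.92), (2.05, 2.01)])
```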

Table 2.1. Descriptive statistics of intra-sample registration variability.

Group              Mean (pixel/pixel)   std (pixel/pixel)
High uncertainty   0.0453               0.0288
Low uncertainty    0.0096               0.0068

 

 

The dependent variable for H1a was the intra-sample registration variability. The independent

variable was groupings (high vs. low uncertainty). A two-sample t-test was used to check H1a. The

test rejected the null hypothesis (p<.00001, t=-6.28). Based on this result and the values in table 

2.1  we  can  conclude  that  the  high  uncertainty  group  had  significantly  higher  intra-sample 

registration variability. This confirms the efficacy of the proposed test for detecting instances with 

high levels of registration uncertainty. 
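As a consistency check, the t statistic can be recomputed from the summary values of Table 2.1 and the group sizes (6 and 30). The sketch below assumes the pooled-variance (Student) form of the two-sample t-test and sample standard deviations; the variant actually used is not stated, so this is an assumption.

```python
import math


def pooled_t(mean1, std1, n1, mean2, std2, n2):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, dof


# Low uncertainty group first, high uncertainty group second (values of Table 2.1).
t_stat, dof = pooled_t(0.0096, 0.0068, 30, 0.0453, 0.0288, 6)
print(round(t_stat, 2), dof)
```

Under these assumptions the statistic comes out close to the reported t=-6.28, with 34 degrees of freedom.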

2.4.2. Experiment 2: Effect of phonatory configuration on the calibrated length 

The lesion is not present in post-surgery HSV recordings. Therefore, lesion size cannot be used 

for calibration of the post-surgery recordings. However, it is possible to find a common attribute 

between pre- and post-surgery recording and then use it for indirect calibration of the post-surgery 

data. In that sense, the lesion size would be used for the indirect calibration of the pre-surgery HSV 

data. Then the calibrated pre-surgery data would be used for indirect calibration of the post-surgery 

 


HSV  data.  Considering  the  availability  of  the  mm  size  of  the  lesion,  the  outcomes  of  both 

calibrations could be used for between-subject size comparison applications.  

Going back to the consistency of the common attribute assumption, the mm size of the common 

attribute should be the same between the different imaging sessions, or imaging modalities. That 

is, the mm length of the lesion should be similar during the intraoperative imaging and the pre-

surgery HSV recording. Additionally, the mm length of the object selected for calibration of the 

post-surgery HSV data should be similar in pre- and post-surgery conditions. Unfortunately, these 

conditions  cannot  be  checked  directly.  However,  we  may  use  the  information  from  different 

phonatory configurations in the pre-surgery recordings and check the robustness of the selected 

common attribute for calibration of the post-surgery recording. Experiment 2 presents the results 

of this analysis for different common attributes.  

2.4.2.1. Experiment 2.a: Vocal fold length attributes 

The length of the vocal folds (or some part of it) may be used as a common attribute. This 

idea has been used in several studies.103,203,206 Considering the dependence of the fundamental 

frequency on the vocal fold length12, the following hypothesis was formed.  

H1c: 

Calibrated vocal fold length during high-pitch phonation is significantly larger than 

its length during comfortable pitch phonation. 

2.4.2.1.1. Database 

From  the  26  subjects  with  calibrated  intraoperative  images,  14  had  recordings  from  both 

comfortable and the high-pitch phonations. Based on the result of experiment 1, three subjects had

high registration uncertainties, and hence were excluded from this analysis. Therefore, 11 subjects 

were used in this experiment. 

 


2.4.2.1.2. Method 

11 glottal cycles were selected randomly from each recording. The frames within each glottal 

cycle were visually inspected and the frame with the highest glottal opening along the anterior-

posterior direction was saved as an image. From the selected 11 images, one image was designated 

as the fixed image, and the remaining 10 images were designated as the moving images. The

anterior  commissure  and  the  posterior  part  of  the  vocal  folds  were  not  visible  in  some  of  the 

images. Therefore, for each subject, the fixed images from both phonation tasks (i.e. comfortable 

and high pitch) were visually inspected and two suitable anchor points (one in the posterior and 

one in the anterior) were selected and marked on both fixed images. Some example anchors include 

the anterior commissure, a blood vessel on the vocal fold or a nearby tissue, or the midline of the 

lesion.  Following  the  methodology  of  the  registration  uncertainty  test,  moving  images  from 

different recordings were randomized. The rationale for this randomization will be discussed in 

section 2.4.3.1. 

A GUI with two panels (one showing the fixed image superimposed with the anchor points, 

and one showing the moving image) was developed for measuring the length of the vocal folds 

between the two anchor points. Due to occlusion and cropping of the recording, the measured 

value may only be part of the vocal fold length, hence it is named the vocal fold length attribute. 

The  GUI  had  zooming  capability  for  improved  visual  inspection  and  enabled  marking  of  the 

anchor points on the target image. The pixel size of the vocal fold length attribute was measured 

as the Euclidean distance between the selected two anchor points on the moving image.

Additionally, the pixel-to-mm conversion scale of each recording was computed from the known 

mm size of the lesion. Finally, the calibration was achieved by multiplying the pixel size of the 

attribute by the pixel-to-mm conversion scale.  
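The calibration step can be sketched as follows. The function names, the anchor coordinates, and the lesion size are hypothetical and serve only to show the arithmetic.

```python
import math


def pixel_to_mm_scale(lesion_mm, lesion_px):
    """Conversion scale (mm per pixel) from the known mm size of the lesion."""
    return lesion_mm / lesion_px


def calibrated_length_mm(anchor_a, anchor_b, scale_mm_per_px):
    """Euclidean pixel distance between two anchor points, converted to mm."""
    return math.dist(anchor_a, anchor_b) * scale_mm_per_px


# Hypothetical recording: a 3.0 mm lesion spans 60 pixels on the HSV frame.
scale = pixel_to_mm_scale(lesion_mm=3.0, lesion_px=60.0)
length_mm = calibrated_length_mm((100.0, 40.0), (100.0, 240.0), scale)
```

In this example the two anchor points are 200 pixels apart, which corresponds to about 10 mm at a scale of 0.05 mm per pixel.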

 


2.4.2.1.3. Results 

Figure 2.3 shows boxplot of the mm size of the vocal fold length attribute for each patient and 

each phonation task. 

[Figure 2.3 image: boxplots per subject; x-axis: Subject ID/Task (C, H); y-axis ticks from 4 to 16]

Figure 2.3. Boxplot of mm size of vocal fold length attribute of each subject for (C)omfortable and (H)igh pitch

phonations.

 

The hypothesis H1c is based on the comparison of the mm sizes of the vocal fold length 

between the two phonation tasks. The mm size of the vocal fold attribute for each recording was 

computed  as  the  median  of  the  measurements  from  the  10  images.  Table  2.2  presents  the 

descriptive statistics of each phonatory group. 

Table 2.2. Descriptive statistics of the mm size of attributes of vocal fold length.

Group               mean (mm)   std (mm)
Comfortable pitch   9.5         2.21
High pitch          10.53       2.84

 

To  test  the  hypothesis  H1c,  a  one-sided  paired-samples  t-test  was  used.  The  independent 

variable was the phonation task (comfortable vs. high pitch) and the dependent variable was the 

mm size of the vocal fold length attribute. The test detected a significant difference (p= 0.03, t= -

2.11) between the two conditions. Therefore, the vocal fold length attribute is not a robust common 

 


attribute for indirect calibration. This is especially important given that the intraoperative image

is taken under the resting state of the vocal fold with low tension, while the HSV recordings are 

captured during the phonation where the vocal folds have higher tension. Consequently, using 

vocal fold length attributes could lead to a violation of the consistency of the common attribute 

assumption, unless from the domain knowledge we know that the visible length of the vocal fold 

was not changing between the two imaging sessions, or imaging modalities.  

2.4.2.2. Experiment 2.b: Vocal fold width 

Vocal fold width is another spatial feature that can be used for calibration. For this experiment, 

the following hypothesis was formed.  

H1d: 

Calibrated vocal fold width during high-pitch phonation is significantly different 

from a comfortable pitch. 

2.4.2.2.1. Database 

The same data as in experiment 2.a were used.

2.4.2.2.2. Method 

11 glottal cycles were selected randomly from each recording. The frames within each glottal 

cycle were visually inspected for finding a frame where the glottis was very narrow, but not

fully closed. Figure 2.4 shows an example image. The selected frames were saved as images. 

Considering that the left and the right vocal folds could have different widths, and also that width 

may be calculated at different locations along the anterior-posterior axis, the registration step could 

become inconsistent and hence susceptible to error. To remedy this, one image from the selected 

11 frames was designated as the fixed image, and the rest were designated as the moving images.

 


On the fixed image, the target side of measurement (i.e. left or right vocal fold) was marked. 

Additionally, for each subject, the fixed images from both high and comfortable pitches were 

visually inspected for finding a proper anchor point (i.e. a point with a clear visual appearance in 

both  phonation  conditions)  along  the  anterior-posterior  direction.  Some  examples  of  the  used 

anchor points were branching of a blood vessel on the vocal fold or a nearby tissue or specific 

topological attributes of the lesion. Figure 2.4 shows a fixed image with an anchor point selected 

based on a blood vessel on a nearby tissue. Following the methodology of the registration uncertainty

test, images from different recordings were randomized. The rationale for this randomization will

be discussed in section 2.4.3.1.  

A GUI with two panels (one showing the fixed image superimposed with the anchor point, 

and one showing the moving image) was developed for measuring the width of the vocal fold, 

using the following procedure. A line was fitted to the target edge of the vocal fold on the moving 

image (solid red  line in figure 2.4). Then the  anchor point was marked on the  moving image 

(symbol × in figure 2.4). A line perpendicular to the line fitted to the edge of the vocal fold was 

passed through the selected anchor point (dashed blue line in figure 2.4). Then, the intersection of 

this new line with the periphery of the vocal fold was marked using the mouse (symbol O in figure 

2.4). To reduce the inaccuracy of this selection, the selected point was analytically projected on 

the dashed line (point B in figure 2.4). The uncalibrated pixel width of the vocal fold was computed 

as the Euclidean distance between points A and B, where point A was the intersection of the two

lines described above (figure 2.4). Finally, calibration was achieved by multiplying the pixel width

of the vocal fold by the pixel-to-mm conversion scale.
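The geometric part of this procedure can be sketched as below. It assumes that the line fitted to the vocal fold edge is represented by two points on it; the function names and coordinates are hypothetical, and the sketch is not the GUI code used in this study.

```python
import math


def project_onto_line(point, origin, direction):
    """Orthogonal projection of `point` onto the line origin + t * direction."""
    dx, dy = direction
    t = ((point[0] - origin[0]) * dx + (point[1] - origin[1]) * dy) / (dx * dx + dy * dy)
    return (origin[0] + t * dx, origin[1] + t * dy)


def vocal_fold_width_px(edge_p0, edge_p1, anchor, clicked):
    edge_dir = (edge_p1[0] - edge_p0[0], edge_p1[1] - edge_p0[1])
    normal_dir = (-edge_dir[1], edge_dir[0])  # perpendicular to the fitted edge line
    # Point A: intersection of the edge line with the perpendicular through the anchor.
    point_a = project_onto_line(anchor, edge_p0, edge_dir)
    # Point B: the clicked periphery point projected onto the perpendicular line.
    point_b = project_onto_line(clicked, anchor, normal_dir)
    return math.dist(point_a, point_b)


# Hypothetical frame: vertical edge at x = 100, anchor on the fold, imprecise click.
width_px = vocal_fold_width_px((100.0, 0.0), (100.0, 200.0), (130.0, 80.0), (162.0, 83.0))
```

Here the click is 3 pixels off the perpendicular line; the projection removes that offset, so the measured width is 62 pixels.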

 


Figure 2.4. Measurement of the vocal fold width: (A) the reference image with designated vocal fold and the target 

anchor point, (B) the measurement steps. 

 

 

2.4.2.2.3. Results 

Figure  2.5  shows  boxplot  of  the  mm  width  of  the  vocal  fold  for  each  patient  and  each 

phonation task. 

[Figure 2.5 image: boxplots per subject; x-axis: Subject ID/Task (C, H); y-axis ticks from 2 to 10]

Figure 2.5. Boxplot of mm size of vocal fold width of each subject for (C)omfortable and (H)igh pitch phonations.

The hypothesis H1d is based on a comparison of the mm size of the vocal fold width between 

 

 

two phonation tasks. The mm width of the vocal fold for each recording was computed as the 

 


median of the measurements from the 10 images. Table 2.3 presents the descriptive statistics of 

each phonatory group. 

Table 2.3. Descriptive statistics of the mm width of the vocal fold.

Group               mean (mm)   std (mm)
Comfortable pitch   5.27        1.56
High pitch          5.08        1.67

 

 

To test hypothesis H1d, a two-sided paired-samples t-test was used. The independent variable 

was the phonation task (comfortable vs. high pitch) and the dependent variable was the mm width 

of the vocal fold. The test did not detect a significant difference (p= 0.51, t= 0.68) between the two 

conditions. Therefore, vocal fold width could be a robust common attribute and may be used

for indirect calibration.

2.4.2.3. Experiment 2.c: Blood vessel on a vocal fold 

The  length  of  a  blood  vessel  is  another  spatial  feature  that  can  be  used  for  indirect 

calibration.206,207 This experiment explores the suitability of a blood vessel on the vocal fold. For 

this experiment, the following hypothesis was formed.  

H1e: 

Calibrated attribute of a blood vessel on the vocal fold during high-pitch phonation 

is significantly different from a comfortable pitch. 

2.4.2.3.1. Database 

From the data included in experiments 2.a and 2.b, seven subjects had a visible blood vessel 

on their vocal folds. Recordings from comfortable and high-pitch phonations of these subjects 

were used for this experiment. 

 


2.4.2.3.2. Method 

The method was similar to the one described in experiment 2.a, but instead frames with the 

best  visual  appearance  of  the  blood  vessels  were  selected.  Additionally,  anchor  points  were 

selected based on unique features of each blood vessel, including their branching, or looping. 

2.4.2.3.3. Results 

Figure 2.6 shows boxplot of the calibrated length of a blood vessel on the vocal fold of each 

patient for both phonation tasks.  

[Figure 2.6 image: boxplots for subjects P01, P02, P16, P29, P44, P50, and P52; y-axis: Blood vessel attribute (mm)]

Figure 2.6. Boxplot of mm size of an attribute of blood vessels on the vocal fold of each subject for (C)omfortable

and (H)igh pitch phonations.

The hypothesis H1e is based on a comparison of the mm size of a blood vessel on the vocal 

 

fold  between  two  phonation  tasks.  The  mm  size  of  the  blood  vessel  for  each  recording  was 

computed  as  the  median  of  the  measurements  from  the  10  images.  Table  2.4  presents  the 

descriptive statistics of each phonatory group. 

 


Table 2.4. Descriptive statistics of the mm size of attributes of a blood vessel on the vocal fold.

Group               mean (mm)   std (mm)
Comfortable pitch   4.06        1.33
High pitch          3.92        1.61

 

 

To test hypothesis H1e, a two-sided paired-samples t-test was used. The independent variable 

was the phonation task (comfortable vs. high pitch) and the dependent variable was the mm size 

of  some  attribute  of  the  blood  vessel  on  the  vocal  fold.  The  test  did  not  detect  a  significant 

difference (p= 0.53, t= 0.66) between the two conditions. Therefore, an attribute of a blood vessel

on the vocal fold could be a robust common attribute and could be used for

indirect calibration.

2.4.2.4. Experiment 2.d: Blood vessel on a nearby tissue 

The length of a blood vessel is a spatial feature that can be used for indirect calibration.206,207 

This experiment explores the suitability of a blood vessel on a nearby tissue. For this experiment, 

the following hypothesis was formed.  

H1f: 

Calibrated attribute of a blood vessel on a nearby tissue during high-pitch phonation 

is significantly different from a comfortable pitch. 

2.4.2.4.1. Database 

From the data included in experiments 2.a and 2.b, seven subjects had a visible blood vessel 

on a tissue near the vocal folds. Recordings from comfortable and high-pitch phonations of these 

subjects were used for this experiment. 

2.4.2.4.2. Method 

The method was similar to the one described in experiment 2.c. 

 


2.4.2.4.3. Results 

Figure 2.7 shows boxplot of the calibrated length of a blood vessel on a nearby tissue of each 

patient for both phonation tasks.  

[Figure 2.7 image: boxplots for subjects P02, P16, P39, P44, P47, P50, and P52; y-axis: Blood vessel attribute (mm)]

Figure 2.7. Boxplot of mm size of an attribute of blood vessels on a nearby tissue of each subject for

(C)omfortable and (H)igh pitch phonations.

 

The hypothesis H1f is based on a comparison of the mm size of a blood vessel on a nearby 

 

tissue between the two phonation tasks. The mm size of the blood vessel for each recording was 

computed  as  the  median  of  the  measurements  from  the  10  images.  Table  2.5  presents  the 

descriptive statistics of each phonatory group. 

Table 2.5. Descriptive statistics of the mm size of attributes of the blood vessel on a nearby tissue.

Group               mean (mm)   std (mm)
Comfortable pitch   5.33        3.09
High pitch          5.12        2.81

 

 

To test hypothesis H1f, a two-sided paired-samples t-test was used. The independent variable 

was the phonation task (comfortable vs. high pitch) and the dependent variable was the mm size 

of some attribute of the blood vessel on a tissue near to the vocal fold. The test did not detect a 

significant difference (p= 0.35, t= 1.02) between the two conditions. Therefore, an attribute of a

blood vessel on a nearby tissue could be a robust common attribute and could

be used for indirect calibration.

 

2.4.3. Experiment 3: Selecting the most suitable common attribute 

Often, we could select several common attributes for performing indirect calibration. Then, 

an important question is how to select the most suitable one. The aim of experiment 3 is to answer 

this question. 

2.4.3.1. Experiment 3a: Registration uncertainty of different common attributes 

Experiments 2.a-2.d were based on randomized measurements of different target objects from 

multiple recordings, and in that sense resemble the registration uncertainty test. Therefore, we 

could use a similar approach and estimate the registration uncertainty of each common attribute. 

To that end, the data was de-randomized and then the ratios between the 10 measurements of

each recording and their median were computed. The interquartile range of the computed ratios was

used as an estimate of the registration uncertainty. Table 2.6 presents the descriptive statistics of 

the registration uncertainty for different common attributes. 
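For a single recording, this estimate can be sketched as follows. The quartile convention and the ten pixel measurements are assumptions chosen for illustration.

```python
import statistics


def uncertainty_from_repeats(measurements):
    """IQR of the ratios between repeated measurements and their median."""
    median = statistics.median(measurements)
    ratios = [m / median for m in measurements]
    # Assumed quartile convention: the "inclusive" method of the standard library.
    q1, _, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return q3 - q1


# Ten hypothetical pixel measurements of one common attribute in one recording.
repeats = [98.0, 99.0, 100.0, 100.0, 101.0, 102.0, 97.0, 103.0, 100.0, 100.0]
uncertainty = uncertainty_from_repeats(repeats)
```

Dividing by the median makes the score dimensionless, so common attributes of different sizes can be compared directly.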

Table 2.6. Descriptive statistics of registration uncertainty for different selections of the common attribute.

Common attribute                  mean (mm)   std (mm)
Vocal fold length attribute       0.027       0.012
Vocal fold width                  0.036       0.019
Blood vessel on a vocal fold      0.061       0.037
Blood vessel on a nearby tissue   0.047       0.027

 

 

Based  on  table  2.6  the  vocal  fold  length  attribute  may  lead  to  the  lowest  registration 

uncertainty, followed by the vocal fold width, blood vessel on a nearby tissue, and finally blood 

vessel  on  a  vocal  fold.  Additionally,  for  each  subject,  we  could  compare  the  registration 

 


uncertainty for different common attributes and determine the best one. This was computed as the 

average of the registration uncertainty of the two phonation tasks per each subject. Table 2.7 shows 

this, where the lowest value for each subject is marked with an asterisk (×: attribute not available).

Table 2.7. Individual differences in registration uncertainty of each common attribute.

             Common attribute
Subject ID   VF length   VF width   VF blood   Nearby blood
P01          0.016*      0.022      0.058      ×
P02          0.015       0.048      0.058      0.011*
P16          0.023       0.017*     0.076      0.086
P29          0.04        0.036*     0.126      ×
P39          0.036*      0.073      ×          0.049
P44          0.025*      0.036      0.031      0.04
P47          0.033*      0.047      ×          0.051
P50          0.024*      0.027      0.032      0.059
P52          0.018*      0.026      0.049      0.033
P53          0.041       0.027*     ×          ×
P54          0.025*      0.034      ×          ×

2.4.3.2. Experiment 3b: Size consistency of different common attributes

 

2.4.3.2.1. Method 

Calibrated (i.e. the true) percent change ($PC$) of a target object was defined in section 2.3.2.2 as,

$$PC = \frac{T_{mm,2} - T_{mm,1}}{T_{mm,1}} \times 100\% \tag{2-11}$$

where $T_{mm,i}$ denotes the mm size of the target object in imaging session $i$. Using the common attribute consistency assumption (i.e. $C_{mm,1} = C_{mm,2}$, with $C_{mm,i}$ the mm size of the common attribute in session $i$), Equation 2-11 can be written as,

$$PC = \frac{\dfrac{T_{px,2}}{C_{px,2}} - \dfrac{T_{px,1}}{C_{px,1}}}{\dfrac{T_{px,1}}{C_{px,1}}} \times 100\% \tag{2-12}$$

where the subscript $px$ denotes the uncalibrated pixel size. If the common attribute consistency assumption is violated, we would have $C_{mm,2} = C_{mm,1} + \Delta$, which leads to,

$$\widehat{PC} = \frac{\dfrac{T_{px,2}}{C_{px,2}} - \dfrac{T_{px,1}}{C_{px,1}}}{\dfrac{T_{px,1}}{C_{px,1}}} \times 100\% \tag{2-13}$$

Obviously, if $\Delta = 0$, then $\widehat{PC} = PC$. However, if the size of the common attribute changes by $\Delta$ mm between the two imaging sessions or imaging modalities, $\widehat{PC} \neq PC$ and the indirect approach will introduce some error into measurements. Assuming similar pixel-to-mm conversion scales for the target and the common object, we could simplify 2-13 into,

$$\widehat{PC} = \left( \frac{T_{mm,2}}{T_{mm,1}} \times \frac{C_{mm,1}}{C_{mm,2}} - 1 \right) \times 100\% \tag{2-14}$$

If $\frac{C_{mm,1}}{C_{mm,2}} = 1$ there is no error ($\widehat{PC} = PC$), but as $\frac{C_{mm,1}}{C_{mm,2}}$ deviates from the value 1, the magnitude of error increases. Consequently, we could use $\gamma = \frac{C_{mm,1}}{C_{mm,2}}$ for comparing the consistency of different common attributes.

2.4.3.2.2. Results

The value of $\gamma$ was computed for the four common attributes presented in experiment 2. Table 2.8 presents descriptive statistics of $\gamma$ for different common attributes.

Table 2.8. Descriptive statistics of γ for different selections of the common attribute.

Common attribute                  mean (mm/mm)   std (mm/mm)
Vocal fold length attribute       1.11           0.165
Vocal fold width                  0.968          0.21
Blood vessel on a vocal fold      0.94           0.188
Blood vessel on a nearby tissue   0.974          0.136

 


 

 

Based on table 2.8, a blood vessel on nearby tissue may lead to the lowest measurement error, 

followed by the vocal fold width, blood vessel on a vocal fold, and finally the vocal fold length. 

Additionally, for each subject, we could compare values of γ for different common attributes and

determine  the  best  one.  Table  2.9  shows  individual  trends  regarding  the  size  consistency  of 

different common attributes. 
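The practical meaning of γ can be illustrated with a short sketch. It assumes that γ is the ratio of the mm size of the common attribute in the first condition to its size in the second condition, and that the target and the common attribute share the same pixel-to-mm conversion scale; the direction of the ratio and the function name are assumptions made for illustration. The γ values are the group means of Table 2.8.

```python
def estimated_percent_change(true_percent_change, gamma):
    """Percent change reported by indirect calibration for a given gamma.

    Assumption: gamma = (mm size of the common attribute in condition 1)
                      / (mm size of the common attribute in condition 2).
    """
    true_ratio = 1.0 + true_percent_change / 100.0
    # Normalizing by a common attribute whose size changed scales the apparent ratio.
    return (true_ratio * gamma - 1.0) * 100.0


# Apparent change of a target whose true size did not change (0 %).
mean_gamma = {
    "Vocal fold length attribute": 1.11,
    "Vocal fold width": 0.968,
    "Blood vessel on a vocal fold": 0.94,
    "Blood vessel on a nearby tissue": 0.974,
}
spurious = {name: estimated_percent_change(0.0, g) for name, g in mean_gamma.items()}
ranking = sorted(spurious, key=lambda name: abs(spurious[name]))
```

Sorting by the magnitude of the spurious change gives the same order as the one stated above: blood vessel on a nearby tissue, vocal fold width, blood vessel on a vocal fold, and finally the vocal fold length attribute.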

Table 2.9. Individual trends regarding the size consistency of different common attributes.

             Common attribute
Subject ID   VF length   VF width   VF blood   Nearby blood
P01          1.012       0.835      0.899      ×
P02          1.127       1.05       0.583      0.868
P16          1.299       0.844      0.931      0.986
P29          1.131       0.987      1.217      ×
P39          1.359       1.416      ×          1.141
P44          1.302       1.132      0.987      0.93
P47          1.04        0.632      ×          0.78
P50          0.916       1.058      1.003      0.962
P52          1.18        0.82       0.96       1.153
P53          0.863       0.962      ×          ×
P54          0.976       0.913      ×          ×

2.5. Discussions

This  chapter  presented  a  formal  treatment  of  indirect  spatial  calibration.  The  purpose  of 

indirect calibration is to use images taken from different imaging modalities or imaging sessions

to account for confounding factors of horizontal measurements. Depending on the type of available 

information,  the  outcome  of  indirect  calibration  could  be  used  for  within-subject  or  between-

subject  size  comparisons.  Specifically,  if  one  of  the  images  is  spatially  calibrated  (i.e.  mm 

measurement can be achieved) the outcome of indirect calibration could be utilized for between-

subject size comparison. However, if neither of the images is spatially calibrated, the outcome of

indirect calibration could be utilized only for within-subject size comparison (e.g. pre/post changes 

 


in the same person). The indirect calibration approach is based on identifying a proper common 

attribute between different recordings of each subject. The size of this common attribute is then 

used as a scale for spatial calibration. If the mm size of this common attribute is available, the 

calibration  will  map  all  images  into  a  standard  basis,  and  obviously,  between-subject  size 

comparison can be achieved. This chapter identified three conditions that govern the validity of 

the indirect calibration. These conditions were the registration accuracy assumption, the similarity 

in the vertical distance assumption, and the consistency of the common attribute assumption.

The registration accuracy stipulates that the common attribute can be identified and registered 

accurately for each subject across all images. For example, if images are taken from different 

angles,  the  common  attribute  would  be  recorded  differently  and  hence  some  error  will  be 

introduced during the registration step. Additionally, common attributes without a sharp contrast 

with  the  background  would  be  another  example  of  low  registration  consistency.  A  test  was 

proposed in this chapter that can estimate the magnitude of the registration uncertainty. In that 

regard, a high value of registration uncertainty may indicate a serious violation of the registration 

accuracy assumption, which means the presence of significant errors in the calibration outcome. 

As a rule of thumb, a common attribute that is larger and has a sharper contrast with the background 

would lead to a lower registration uncertainty value.  

Another  assumption  of  the  indirect  calibration  was  the  existence  of  a  certain  relationship 

between the pixel-to-mm conversion scales of the common attribute and the target object. Pixel-

to-mm conversion scale may depend on the vertical distance, the spatial location of the object, and 

the  imaging  angle.208,209  However,  we  assumed  that  the  pixel-to-mm  conversion  scale  only 

depended on the vertical distance, and derived the similarity in the vertical distance assumption. 

Unfortunately, the vertical distance is lost during the imaging and consequently, evaluation of the 

 


similarity in the vertical distance assumption is not an easy task (if possible at all). In sections 

2.3.2.1 and 2.3.2.2 we derived conditions satisfying the vertical distance assumption. One solution 

assumed that the vertical distances from the endoscope to the common attribute, and from the 

endoscope  to  the  target  object  are  similar  between  different  imaging  sessions  or  imaging 

modalities. Achieving this condition is very hard in practice. The second solution assumed that the 

common attribute and the target object have the same vertical distance from the endoscope. This 

condition may be achieved by selecting a common attribute that is part of the region of interest. 

The consistency of the common attribute assumption stipulates that the actual size of the 

common attribute (i.e. its mm length) is fixed and does not change between different imaging 

sessions, or imaging modalities. This assumption is quite fundamental in the indirect calibration 

method  and  is  the  basis  of  its  validity.  Unfortunately,  checking  this  assumption  requires  the 

existence of spatially calibrated images (which obviously is not available), and hence cannot be 

done directly. However, for laryngeal images, a method was developed that could evaluate this 

assumption indirectly. The method was based on the comparison of sizes of the common attribute 

during different vocal behaviors and phonation tasks.  

This chapter used laryngeal images as a test bench for indirect calibration. The images were 

acquired from two different imaging modalities, namely intraoperative and HSV recordings. The 

intraoperative recordings were still images and they provided the calibrated mm measurements of 

the  lesion.  Conversely,  HSV  data  were  not  calibrated  but  they  provided  the  motion  and  the 

vibration of the vocal folds. The indirect calibration was aimed to achieve a valid comparison of 

some spatial (or tempo-spatial) attributes of the vocal fold between different recordings (e.g. vocal 

fold length, vocal fold velocity, etc.). HSV data were recorded pre- and post-surgery. Assuming 

that the size of the lesion is consistent between the intraoperative and HSV sessions, the pre-surgery 

 


HSV recordings can be calibrated using the lesion. However, the lesion is not present in the post-

surgery recording and a different common attribute should be used. To that end, four common 

attributes of vocal fold length, vocal fold width, size of a blood vessel on the vocal fold, and size 

of a blood vessel on a nearby tissue were identified. The registration test was used to compare the 

registration accuracy of these four objects. Experiment 3.a showed that vocal fold length had the 

lowest registration uncertainty. This low registration uncertainty may stem from the fact that the 

vocal fold length was significantly longer than the other three common attributes. Additionally, 

the  dark  glottis  provides  a  very  sharp  contrast  for  accurate  detection  of  the  vocal  folds  and 

measurement  of  its  length.  Experiment  3.b  compared  the  consistency  of  the  four  common 

attributes.  Interestingly,  the  vocal  fold  length  had  the  lowest  consistency.  Additionally,  in 

experiments 2.a-2.d we saw that vocal fold length was the only attribute that was significantly 

different  between  different  phonation  tasks.  Therefore,  using  vocal  fold  length  for  calibration 

could add significant error into calibration and subsequent measurements. Table 2.10 presents a 

summary of the three assumptions for the indirect calibration using different common attributes. 

Based  on  table  2.10,  the  vocal  fold  width  may  provide  the  best  trade-off  between  the  three 

assumptions of indirect calibration. Another significant advantage of vocal fold width is the lack 

of ambiguity in its measurement. Specifically, some parts of the vocal fold may be occluded during 

the data collection which makes measurement of the vocal fold length ambiguous.  

Table 2.10. Comparing suitability of different common attributes for indirect calibration of vocal folds. 

Common attribute                  Registration consistency   Size consistency   Vertical distance consistency
Vocal fold length                 Highest                    Lowest             High
Vocal fold width                  High                       High               High
Blood vessel on vocal fold        Lowest                     Low                High
Blood vessel on a nearby tissue   Low                        Highest            Low

 


 

 

This work had several limitations that should be mentioned. The main assumption of this work 

was that the calibrated size of the lesion was not changing between the intraoperative and HSV 

recordings. The majority of the subjects included in this study were diagnosed with vocal fold 

polyps. Vocal fold polyps have been associated with increased stiffness,210 which provides some 

evidence regarding the validity of this assumption. The small sample size was another limitation 

of this study. Specifically, the results regarding the vocal fold length and the vocal fold width were 

based on measurements from 11 subjects, and the results for blood vessel attributes were based on 

measurements from 7 subjects. 

2.6. Conclusions 

Calibrated spatial measurements from laryngeal images could provide significant benefits for 

voice  science  research  and  clinical  practice.  However,  the  calibration  of  endoscopic  images 

requires  the  existence  of  some  auxiliary  information.  Recent  advancements  in  laser-calibrated  

scopes may provide the required auxiliary information.122 However, that technology is still in its 

infancy and requires significant effort and investment to become fully developed. Additionally, 

the functionality of the laser-calibrated system depends on specialized hardware and software, 

which will not be widely available in the near future. Meanwhile, an alternative calibration approach 

that is more accessible is needed. The indirect calibration approach could be an answer to this 

need. The indirect calibration depends on identifying a proper object that is common in different 

images for achieving the calibration. This chapter presented a formal treatment of this problem 

and identified three fundamental assumptions behind the validity of the indirect calibration. These 

conditions  were  the  registration  accuracy  assumption,  the  similarity  in  the  vertical  distance 

assumption, and the consistency of the common attribute assumption. The registration accuracy 

stipulates that the common attribute can be registered with a small error on different images. The 

 


similarity in the vertical distance stipulates that the common attribute and the region of interest are 

at the same vertical distance from the endoscope. The consistency of the common attribute 

stipulates  that  the  calibrated  size  of  the  common  attribute  does  not  change  between  different 

images. A test was developed for evaluating the registration accuracy assumption. The similarity 

in the vertical distance and consistency of the common attribute could come from the domain 

knowledge  and  the  anatomy  of  the  larynx.  Calibrated  intraoperative  images  were  used  for 

calibration of pre- and post-surgery HSV recordings. Considering the absence of the lesion on 

post-surgery  HSV  recordings,  four  common  attributes  of  the  vocal  fold  length,  the  vocal  fold 

width, the length of a blood vessel on a vocal fold, and the length of a blood vessel on a nearby 

tissue  were  identified.  The  three  assumptions  of  indirect  calibration  were tested  on  these  four 

attributes and it was concluded that the vocal fold width may provide the best trade-off. 

 


CHAPTER 3: APPLICATION OF INDIRECT HORIZONTAL CALIBRATION TO 

KINEMATIC MEASUREMENTS FROM IN-VIVO HSV RECORDINGS 

 

Based on: 

Ghasemzadeh H., Deliyski D. D., et al. Spatial segmentation of high-speed videoendoscopy with 
sub-pixel resolution using adaptive-thresholding and double curve fitting, in Preparation. 
 
Ghasemzadeh  H.,  Deliyski  D.  D.,  Hillman  R.  E.,  Mehta  D.  D.,  Verdolini  K.  A.  Post-surgery 
changes in vocal fold closing velocity in patients with mass lesions, in Preparation. 
 

 

Summary: Vocal fold kinematic measures are important features that can aid in modeling the input, 

output,  and  parameters  of  the  phonatory  system.  This  chapter  investigates  the  post-surgical 

changes  in  the  closing  velocity  of  the  vocal  folds  during  phonation  in  patients  with  VF  mass 

lesions.  Transoral  rigid  high-speed  videoendoscopy  from  habitual  pitch/loudness  of  sustained 

phonation from 16 subjects with benign vocal fold mass lesions were recorded pre- and post-

surgery,  along  with  spatially  calibrated  intraoperative  images.  HSV  recordings  underwent 

temporal  segmentation,  motion  compensation,  spatial  segmentation  and  spatial  calibration 

processes. The pre-surgical HSV images were spatially calibrated by registering the lesions from 

the  intraoperative  images.  The  vocal  fold  width  from  each  calibrated  pre-surgical  HSV  was 

selected,  then  registered  to  its  corresponding  post-surgical  HSV  to  provide  indirect  spatial 

calibration.  Three  different  experiments  were  conducted  to  investigate  the:  (1) post-surgical 

changes in closing velocity, (2) differences in pre- and post-surgical left-right closing velocity 

symmetry, and (3) association between post-surgical changes in closing velocity and lesion size. 

(1) Significant post-surgery increases were found in the closing velocity of the surgically-treated vocal 

 


fold  at  multiple  points  throughout  its  length.  The  contralateral  vocal  fold  showed  a  small 

insignificant improvement in the lesion contact area. (2) Closing velocity of the two vocal folds 

became more symmetric after surgery. (3) Post-surgical changes in closing velocity and lesion size 

were not significantly correlated. 

 

3.1. Introduction 

The closing velocity is an important kinematic feature of vocal folds’ vibration which relates 

to  their  collision  forces.25–27  The  closing  velocity  also  correlates  with  the  maximum  flow 

declination rate26,163 and the maximum area declination rate102,164,165, which have an established 

association with the average produced acoustic output29 and the vocal intensity.23,28 Additionally, 

based on the time-frequency duality of the Fourier transform211, a faster phenomenon results in 

high-frequency components. Consequently, it is expected for higher closing velocity to lead to an 

increase in the energy of high-frequency components of the voice, which in turn may improve the 

speech intelligibility.166 In summary, investigation of the closing velocity of the vocal folds could 

provide  significant  information  about  the  phonatory  mechanism  and  could  link  the  input  (i.e. 

airflow  measurements),  the  output  (i.e.  the  produced  acoustic  signal),  and  parameters  of  the 

phonatory system together. However, velocity is the calibrated displacement of an object with 

respect to time. Consequently, the computation of any velocity (including the closing velocity) 

depends on temporal and spatial measurements that are calibrated. In cameras, time is already 

calibrated. Therefore, only the spatial component should be calibrated, which can be done using 

the method developed in chapter 2. 
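To make the dependence on spatial calibration concrete, the following sketch converts a vocal fold edge trajectory given in pixels into a velocity in cm/s. The function name, the edge positions, and the 0.05 mm/pixel scale are hypothetical values chosen for illustration; the frame rate of 6,250 fps corresponds to the recordings described in section 3.3.1.

```python
import numpy as np


def edge_velocity_cm_per_s(edge_px, mm_per_px, fps):
    """Velocity of a vocal fold edge from its per-frame position in pixels."""
    edge_mm = np.asarray(edge_px, dtype=float) * mm_per_px   # spatial calibration
    velocity_mm_per_s = np.gradient(edge_mm) * fps           # mm per frame times frames per second
    return velocity_mm_per_s / 10.0                          # mm/s to cm/s


# Hypothetical edge positions (pixels) during one closing phase.
edge_px = [10.0, 9.0, 7.0, 4.0, 2.0, 1.0]
velocity = edge_velocity_cm_per_s(edge_px, mm_per_px=0.05, fps=6250)
peak_speed = np.max(np.abs(velocity))
```

Without the mm-per-pixel factor the same computation only yields pixels per second, which cannot be compared between recordings made at different distances or with different optics.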

Spatial calibration of in-vivo laryngeal images is a challenging task, and therefore the number 

of  studies  on  velocity  measures  of  the  vocal  folds  is  very  limited,  and  also  limited  to  normal 

 


subjects. Using the color Doppler imaging technique, a vocal fold velocity of 68±10 cm/s was 

reported  for  comfortable  pitch  and  loudness  of  a  sustained  phonation  from  10  healthy  male 

subjects.212 Another study employed the photoglottography recordings from 32 healthy subjects 

covering a wide range of sound pressure levels (65.46-86.89 dBA) and reported a vocal fold 

maximum closing velocity of 112±53 cm/s.213 In another study, using a parallel-laser projection endoscope, the 

average value of 100 cm/s was reported for the maximum velocity of the vocal folds from 9 healthy 

male subjects.25 A different study reported the values of 9 to 110 cm/s during talking and phonation 

from 20 normal subjects.214 Finally, a different study divided the uncalibrated velocity of the vocal 

folds by the vocal fold length, for achieving the spatial calibration.103 Considering the results and 

discussions of chapter 2 this approach could be prone to significant errors, especially for between-

subject comparison applications. 

3.2. Aim and hypothesis 

The project of this chapter has external funding, and it is tightly related to a recently approved 

NIH  R01  grant  R01  DC017923  (PI:  Verdolini  Abbott)  with  a  subcontract  to  Michigan  State 

University (sub-award PI: Deliyski). The second aim of that grant proposal is “to investigate the 

influence of children’s physical development on their biological response to voice therapy”, 

where the physical development will be quantified using velocity measures from HSV recordings. 

Also,  the  response  to  voice  therapy  will  be  measured  using  a  laser-calibrated  VSB  system. 

Therefore, this project could constitute an example of indirect horizontal calibration which was 

developed in chapter 2 of this dissertation. In that case, the auxiliary information would come from 

the calibrated VSB recordings. 

 


This chapter is aimed at developing a method for computation of the closing velocity of the 

vocal folds and studying the post-surgery changes in the closing velocity of patients with vocal 

fold mass lesions. To this end, the following research question is answered in this chapter. 

Q2: 

How does the removal of a lesion from a vocal fold affect its kinematics? 

To answer this research question three hypotheses were formed that are presented in this section. 

Let $m$ and $a(t)$ denote the mass and the instantaneous acceleration of a lumped model of a vocal fold. Based on Newton's second law of motion we have,

\[ F(t) = m \cdot a(t) \tag{3-1} \]

where $F(t)$ denotes the net exerted external force on the vocal fold. Let $t_{abd}$ and $t_{add}$ denote the timestamps of the maximum abduction and the maximum adduction of the vocal fold from the same glottal cycle. The time window between $t_{abd}$ and $t_{add}$ is defined as the closing phase of the vocal folds. Additionally, the vocal folds are at rest at $t_{abd}$ and $t_{add}$ and therefore, vocal fold velocity would be equal to zero at these timepoints. Let $t$ denote a time during the closing phase ($t_{abd} < t < t_{add}$); the closing velocity of the vocal fold at $t$ can be computed as,

\[ v(t) = \int_{t_{abd}}^{t} a(\tau)\, d\tau \tag{3-2} \]

Let $t_{max}$ be the time point during the closing phase ($t_{abd} < t_{max} < t_{add}$) at which the magnitude of Equation 3-2 becomes maximum; then $|v(t_{max})|$ is called the closing phase maximum velocity and is the main dependent variable of this chapter.

Assuming similar sub-glottal air pressures and aerodynamic characteristics between the pre- and post-surgery conditions, and also similar interactions between the airflow and vocal folds, similar forces would be exerted on the vocal folds in both conditions. Therefore, based on Equation 3-1 we would have,

\[ m_{pre} \cdot a_{pre}(t) = m_{post} \cdot a_{post}(t) \tag{3-3} \]

We can rearrange Equation 3-3 and derive,

\[ a_{post}(t) = \frac{m_{pre}}{m_{post}} \cdot a_{pre}(t) \tag{3-4} \]

In vocal fold mass lesions, some extra mass is accumulated on the vocal fold, which is removed during the surgery, thus $m_{pre}/m_{post} > 1$. With everything else being equal, it is logical to hypothesize that the magnitude of the acceleration of the vocal fold would increase in the post-surgery recording. Assuming similar pitches between the pre- and post-surgery conditions, we can expect similar timings between the two recordings. Finally, by plugging 3-4 into 3-2 and using $t_{max}$ for the upper bound of the integral we get,

\[ v(t_{max})_{post} = \frac{m_{pre}}{m_{post}} \cdot \int_{t_{abd}}^{t_{max}} a_{pre}(\tau)\, d\tau \tag{3-5} \]

Obviously $v(t_{max})_{pre} = \int_{t_{abd}}^{t_{max}} a_{pre}(\tau)\, d\tau$, therefore,

\[ v(t_{max})_{post} = \frac{m_{pre}}{m_{post}} \cdot v(t_{max})_{pre} \tag{3-6} \]

Using $m_{pre}/m_{post} > 1$, we get,

\[ |v(t_{max})_{post}| > |v(t_{max})_{pre}| \tag{3-7} \]

Consequently, it is expected for the closing phase maximum velocity of the vocal fold to increase 

after the surgery. Another reason behind the expected increase in the velocity of the vocal folds is 

the improved post-surgery glottal closure. Specifically, vocal fold lesions have been associated 

with an incomplete glottal closure.168,210,215–219 The increased glottal gap has been associated with 

increased PTP.220 This may indicate reduced energy transfer from the air stream into the vocal 

 


folds. In that regard, it is quite possible that $|F_{post}(t)| > |F_{pre}(t)|$, which would further increase 

the post-operative changes in the velocity. 
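The scaling argument of Equations 3-3 to 3-7 can be checked numerically. The sketch below integrates the acceleration of a lumped mass under one and the same force for two different masses. The force profile, the two masses, and the time step are hypothetical values chosen only for illustration; they are not measurements.

```python
import numpy as np


def max_closing_speed(force, mass, dt):
    """Largest velocity magnitude of a lumped mass that starts at rest."""
    acceleration = force / mass                   # Equation 3-1 solved for a(t)
    velocity = np.cumsum(acceleration) * dt       # rectangle-rule form of Equation 3-2
    return np.max(np.abs(velocity))


dt = 1e-5                                         # time step in seconds (assumed)
t = np.arange(0.0, 2e-3, dt)                      # a 2 ms closing phase (assumed)
# One full sine period, so that the velocity returns to zero at the end of the phase.
force = -1e-3 * np.sin(2.0 * np.pi * t / 2e-3)    # newtons (assumed)

m_pre, m_post = 1.2e-4, 1.0e-4                    # kg, lesion removed after surgery (assumed)
v_pre = max_closing_speed(force, m_pre, dt)
v_post = max_closing_speed(force, m_post, dt)
speed_ratio = v_post / v_pre                      # expected to match m_pre / m_post
```

Under the equal-force assumption the ratio of the two maximum speeds equals the mass ratio, regardless of the shape of the force profile.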

Based on the presented rationales, the following hypothesis is made, 

H2a: 

The closing phase maximum velocity will significantly increase after phonomicrosurgery. 

Phonomicrosurgery probably would leave a scar on the vocal fold with the lesion221, which in 

turn,  leads  to  changes  in  the  biomechanical  properties  of  the  scarred-vocal  fold,  including 

increased stiffness.222 However, after the phonomicrosurgery mass and morphology of the vocal 

fold with the lesion and the contralateral vocal fold would become more similar. Considering that 

the goal of the surgery is to improve the voice, it is expected for the positive changes to outweigh 

the negative side effects of the surgery. Therefore, it is hypothesized that, 

H2b: 

For unilateral mass lesions, the closing phase maximum velocity of the two vocal 

folds will become more similar after the surgery. 

A lesion with a larger area probably indicates a larger accumulation of the extra mass on the 

vocal fold. It is expected that removing a larger mass leads to larger post-surgery changes in the 

velocity of the vocal fold. Additionally, a previous subjective study based on visual evaluation of 

HSV recordings has suggested that the area of a lesion is a better predictor for qualitative changes 

in the vibratory characteristics of the vocal folds (e.g. left-right phase asymmetry) than its length.93 

Based on this rationale, the following hypothesis is made, 

H2c: 

Post-operative change in the closing phase maximum velocity will be positively 

correlated with the area of the lesion. 

 


3.3. Material and Method 

3.3.1. Participants and data acquisition 

The aims of this chapter were pursued using retrospective data. Calibrated intraoperative 

images  and  HSV  recordings  were  obtained  from  26  adults  with  vocal  fold  mass  lesions  at 

Massachusetts General Hospital. Subjects were recorded using a custom-built HSV system over 

two different sessions. The first session was before the surgery and the second recording was 

carried out on average 3.5 weeks after the surgery. The HSV system consisted of the following 

components: a color Phantom v7.3 camera (Vision Research, Inc., Wayne, New Jersey), a 300-Watt 

xenon light (Model 7152A, PENTAX Medical Company, Montvale, New Jersey), and a 70° 

10-mm rigid laryngoscope (Model 49-4072, JEDMED Instrument Co, St. Louis, Missouri). The 

recordings were done at a sampling rate of 6,250 fps with the maximum integration time and at a 

spatial resolution of 320×352 pixels. The surgery was performed using cold instruments and/or a 

532-nm pulsed potassium titanyl phosphate laser photoablation under general anesthesia. Before 

the operation, a surgical instrument with a known mm length was placed next to the lesion and an 

intraoperative image was recorded. 

Reviewing the HSV data showed that 6 subjects (p1, p24, p39, p43, p50, p51) did not have 

the post-surgery HSV data, recordings from the comfortable pitch of two subjects were missing 

(p40, p44), the pre-surgery recording from one subject was quite blurry (p34), and the glottis in the 

pre-surgery recording of one subject was not visible (p46). These subjects were excluded, and the 

rest of the analyses were carried out using recordings from the comfortable pitch and comfortable 

loudness sustained phonations of the remaining 16 subjects.  

The registration uncertainty test described in section 2.3.3.1 was applied to all of the data 

including the redundant samples described in section 2.4.1. Based on this analysis, the value of 

 


0.0357 was used as the threshold. Figure 3.1 shows a scatter plot of the registration uncertainty of 

the included subjects. The red dashed line represents the value of the threshold. 

[Figure 3.1 plot: registration uncertainty of each subject (data points) by subject ID, with the threshold shown as a dashed line.]

 

Figure 3.1. Result of registration uncertainty test for included subjects. 

 

Based on figure 3.1, subjects p3, p15, p25, and p37 had high registration uncertainty, and hence 

they were excluded from the rest of the analysis. Figure 3.2 shows the intraoperative images of 

these subjects. 

Figure 3.2. Intraoperative images from subjects with high registration uncertainty.  

 

Table 3.1 presents the demographic and diagnosis information of the included subjects. 

 


Table 3.1. Demographic and diagnosis information of the included subjects. 

Subject ID  Gender  Age  Diagnosis
P2          F       24   Right vocal fold mucoid polyp
P10         F       23   Left vocal fold hemorrhagic polyp
P11         F       50   Left vocal fold hemorrhagic polyp
P16         F       40   Left vocal fold polyp
P21         M       42   Hemorrhagic cyst on anterior aspect of left vocal fold
P29         M       52   Hemorrhagic polyp on left vocal fold
P35         M       63   Left vocal fold polyp
P47         F       17   Bilateral phonotraumatic vocal fold lesions
P48         F       50   Left vocal fold hemorrhagic polyp and a fibrovascular contact lesion on the right vocal fold
P52         M       27   Keratin cyst, sessile fibrovascular polyp, and residual sulcus on right vocal fold
P53         M       45   Right vocal fold hemorrhagic polyp
P54         M       40   Left vocal fold hemorrhagic polyp

 

3.3.2. Approach and measurements 

To  measure  the  velocity  of  the  vocal  folds,  a  series  of  pre-processing  steps  should  be 

performed. These steps include temporal segmentation, motion compensation, rotation correction, 

spatial segmentation, and horizontal calibration. These steps are described in the following.  

3.3.2.1. Temporal segmentation 

The act of phonation is a complex phenomenon and requires accurate timing between different 

body  organs  and  depends  on  specific  laryngeal  posture  and  glottal  configuration.223–225  The 

phonation  starts  with  the  pre-phonatory  adjustment  phase,  where  the  vocal  folds  take  the 

appropriate posturing.226 Additionally, there is a time lag between the first vibration of the vocal 

folds and the first glottal contact, also known as glottal attack time.116 Voice offset and voice break 

are other temporal characteristics of the phonation. Recording from a full phonation cycle includes 

most of these temporal features. Additionally, a single recording could include multiple repetitions 

 


of  the  phonatory  cycle.  Considering  the  aim  of  this  chapter,  we  need  to  find  timestamps 

corresponding to the onset and offset of phonation. The purpose of temporal segmentation is to 

address this need. 

Temporal segmentation can be automated based on different glottal features. We adopted the 

method  based  on  the  fundamental  frequency  (f0)  contour.111  The  fundamental  frequency  was 

estimated based on the glottal area waveform (GAW) estimate.111 The main steps of the method are 

described briefly here. A more detailed description of the algorithm can be found in 111,116,227. First, 

the temporal difference between consecutive frames of the video recording was computed. Large 

temporal differences, corresponding to the movements of the vocal fold edges, were detected using 

a thresholding technique, and then they were summed over time. Following this process, a mask 

was created that contained the region of interest (i.e. all possible spatial locations of the edges of 

the vocal folds). Let $f(x,y,t)$ denote the result of applying the mask on frame $t$ of the recording. 

The y-direction second central moment of inertia ($M_y(y,t)$) of $f$ was computed as,

\[ M_y(y,t) = \frac{\int f(x,y,t)\, x^2\, dx}{\int f(x,y,t)\, dx} - \mu_y(y,t)^2 \tag{3-8} \]

where $\mu_y(y,t)$ was equal to,

\[ \mu_y(y,t) = \frac{\int f(x,y,t)\, x\, dx}{\int f(x,y,t)\, dx} \tag{3-9} \]

The y-direction estimate of the GAW ($GAW_y(t)$) was computed as the integral of $M_y(y,t)$ over all rows of the image,

\[ GAW_y(t) = \int M_y(y,t)\, dy \tag{3-10} \]

The x-direction estimate of the GAW ($GAW_x(t)$) was computed similarly. The final estimate of the GAW was computed as the root mean square of the x- and y-direction GAW estimates,

\[ GAW(t) = \sqrt{\left( GAW_x(t)^2 + GAW_y(t)^2 \right)/2} \tag{3-11} \]

Finally,  the  fundamental  frequency  was  computed  based  on  windowing  and  autocorrelation 

analysis of the GAW estimate. Temporal segmentation was achieved based on the analysis of the 

fundamental frequency contour. Figure 3.3 shows an example of temporal segmentation outcome. 

[Figure 3.3 plot: fo contour versus time (ms), with the vocal onset, the vocal offset, and the phonatory segment marked.]

 

Figure 3.3. An example of temporal segmentation outcome. 
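A minimal sketch of the last step, estimating fo from a GAW segment by autocorrelation and repeating this over windows to obtain a contour, is given below. It is an illustration under stated assumptions and not the implementation of the cited method: the function names, the window and hop sizes, and the 70-400 Hz search range are choices made for this example.

```python
import numpy as np


def estimate_f0(gaw_segment, fps, f_min=70.0, f_max=400.0):
    """Estimate fo (Hz) of one GAW window from the highest autocorrelation peak."""
    x = np.asarray(gaw_segment, dtype=float)
    x = x - x.mean()                                      # remove the DC component
    acf = np.correlate(x, x, mode="full")[x.size - 1:]    # lags 0 .. N-1
    lag_min = int(fps / f_max)                            # shortest period considered
    lag_max = min(int(fps / f_min), x.size - 1)           # longest period considered
    if acf[0] <= 0 or lag_max <= lag_min:
        return 0.0                                        # flat or too short: no phonation
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fps / lag


def f0_contour(gaw, fps, window=250, hop=125):
    """fo estimates over sliding windows (window and hop are assumed values)."""
    x = np.asarray(gaw, dtype=float)
    starts = range(0, x.size - window + 1, hop)
    return np.array([estimate_f0(x[s:s + window], fps) for s in starts])


# Synthetic check: a 250 Hz GAW sampled at 6,250 fps.
fps = 6250
n = np.arange(1000)
synthetic_gaw = 1.0 + np.cos(2.0 * np.pi * 250.0 * n / fps)
contour = f0_contour(synthetic_gaw, fps)
```

Onset and offset timestamps could then be read from where the contour becomes and stops being stable, although the exact decision rule is left to the cited references.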

 

3.3.2.2. Motion compensation 

The position of the endoscope could change during the HSV data collection. Such movements 

will lead to changes in the spatial location of the vocal folds and could impact the performance 

and accuracy of subsequent measurements or analysis. Motion compensation can be employed to 

account for endoscopic movements. Motion compensation is an image registration process that 

maps vocal folds  from  different  frames of  the  recording  into  a  fixed  and  constant  coordinate. 

Depending on the type of endoscopic movement, different types of motion compensations may be 

needed.227 We assumed a motion that leads to anterior-posterior and left-right displacement of the 

vocal fold in the HSV frames. The method proposed in 125,227 was adopted to compensate for this 

factor. The main steps of the method are described briefly here, but a more detailed description of 

the algorithm can be found in 125,227. 

 


An intensity-based registration method was used for motion compensation.125 The key idea 

of the method is based on the fact that the motions of the vocal folds happen much faster (70-400 Hz) 

than the motion of the endoscope.125 Therefore, we could use a low-pass filter and separate the two 

components  from  each  other.  The  method  starts  with  computing  the  temporal  difference  of 

consecutive frames of the data. Then, the high-frequency components of the motions are filtered 

out. This step leads to the removal of the vibration of the vocal folds, and hence only the gross 

movements of the vocal folds will remain. Then, the region containing the vocal fold (ROI) is 

determined. This is achieved by applying a thresholding technique and only retaining pixels with 

high  intensities.  Next,  the  motion  vector  between  two  frames  of  the  data  is  determined.  The 

translation vector that minimized the absolute difference (L1 norm) between intensities of the 

ROI from the two frames was selected as the best estimate. Finally, the registration task was 

achieved by applying the motion vector to the data. Figure 3.4 shows kymograms of a recording before 

and after motion compensation. The scanning lines of both kymograms were matched based on 

the location of a blood vessel from the first frame of the selected portion of each video data. 

Figure 3.4. An example of motion compensation: (A) kymogram before motion compensation, (B) kymogram after 

motion compensation. 
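The translation search can be sketched as follows. This is a simplified illustration and not the cited implementation: it assumes that the frames have already been low-pass filtered, it compares the full interior of the frame instead of a thresholded ROI, and it searches integer shifts only. The function name and the max_shift parameter are hypothetical.

```python
import numpy as np


def estimate_translation(reference, moving, max_shift=5):
    """Integer (dy, dx) displacement of `moving` relative to `reference`.

    The displacement is chosen by exhaustive search to minimize the L1 norm
    of the intensity difference.
    """
    ref = np.asarray(reference, dtype=float)
    mov = np.asarray(moving, dtype=float)
    m = max_shift
    height, width = ref.shape
    core = ref[m:height - m, m:width - m]      # interior that stays in view for every shift
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            candidate = mov[m + dy:height - m + dy, m + dx:width - m + dx]
            cost = np.abs(candidate - core).sum()          # L1 norm
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift


# Synthetic check: shift a random frame by 2 rows and -3 columns, then undo the shift.
rng = np.random.default_rng(0)
frame = rng.random((40, 40))
shifted = np.roll(frame, shift=(2, -3), axis=(0, 1))
dy, dx = estimate_translation(frame, shifted)
registered = np.roll(shifted, shift=(-dy, -dx), axis=(0, 1))
```

The exhaustive search is affordable here because endoscope motion between nearby frames is small, so only a few candidate shifts need to be tested.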

 

 

 


3.3.2.3. Rotation correction 

The endoscope could have an angle relative to the vocal folds. This would result in vocal folds 

that are rotated in the image. More precisely, under such circumstances, the glottal midline would 

have an angle with the y-axis of the image. This rotation could change the kymogram and the 

subsequent velocity measurements. Figure 3.5(A) depicts a kymogram from a recording with a 

30° rotation. Our measurements showed a maximum excursion of 12 pixels for the vocal fold at 

each cycle. Figure 3.5(B) depicts the kymogram of the same recording after the correction. Our 

measurements  showed  a  maximum  excursion  of  10  pixels  for  the  vocal  fold  at  each  cycle. 

Comparing  these  two  conditions  shows  a  significant  error  in  uncorrected  data.  Specifically, 

uncorrected  data  shows  20%  higher  excursion,  which  translates  into  higher  velocity.  It  is 

noteworthy that, the scanning lines of both kymograms were matched based on the location of a 

blood  vessel  from  the  first  frame  of  the  corresponding  video  data.  Another  problem  with 

uncorrected data is that measurements from the left and right vocal folds would not be comparable, 

because the left and right edges of the vocal folds in the uncorrected kymogram do not belong to 

the same section along the anterior-posterior axis.  

Figure 3.5. Effect of endoscopic rotation on the kymogram: (A) kymogram before rotation compensation, (B) 

kymogram after rotation compensation. 

 

 

 


An automated method is presented here that can account for this factor. The method consists 

of four steps. 

Step 1: Estimation of the GAW 

GAW was estimated based on an adaptive thresholding method. The method assumes that the 

locations of the anterior commissure and the posterior end of the vocal folds are known. The user 

can provide these parameters by clicking the two ends of the vocal folds. A box with a width of 

100 pixels around the clicked points is selected from the recording. This box will enclose the vocal 

folds. The probability density function (pdf) of the red channel from the box is estimated using a 

Gaussian kernel. Considering the high number of data points, this step can be sped up by random 

sampling. Figure 3.6(A) depicts the estimated pdf of data for 100,000 randomly selected samples. 

The pdf can often be modeled as a mixture of three different distributions. The first distribution 

would  be  an  estimate of  pdf  of  pixels  inside the  glottis.  The  second  distribution  would  be  an 

estimate of pdf of pixels on the vocal folds or the nearby tissues. The third distribution would be 

an estimate of the pdf of reflection lights. The black reference was defined as the bin corresponding 

to the deep between the first two peaks. Figure 3.6(A) illustrates this. The black reference was 

used for the thresholding of the data. GAW for each frame was computed as the number of black 

pixels. 
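The thresholding in Step 1 can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function names, the kernel bandwidth, and the default sample count are assumptions, and the pdf is evaluated on the 256 integer intensity bins of an 8-bit red channel.

```python
import math
import random

def gaussian_kde_pdf(samples, bandwidth, grid):
    """Evaluate a Gaussian-kernel density estimate at every grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
            for g in grid]

def black_reference(red_values, n_samples=2000, bandwidth=5.0, seed=0):
    """Intensity bin at the dip between the first two peaks of the pdf.

    n_samples, bandwidth and seed are illustrative defaults (assumptions).
    """
    if len(red_values) > n_samples:
        # Random sub-sampling keeps the kernel estimate cheap.
        red_values = random.Random(seed).sample(red_values, n_samples)
    pdf = gaussian_kde_pdf(red_values, bandwidth, range(256))
    peaks = [i for i in range(1, 255) if pdf[i - 1] < pdf[i] >= pdf[i + 1]]
    if len(peaks) < 2:
        raise ValueError("the pdf has fewer than two peaks")
    first, second = peaks[0], peaks[1]
    return min(range(first, second + 1), key=lambda i: pdf[i])

def glottal_area(red_frame, threshold):
    """GAW of one frame: the number of pixels darker than the black reference."""
    return sum(1 for row in red_frame for value in row if value < threshold)
```

Calling `glottal_area` on every frame with the same threshold yields the GAW time series that the next step smooths.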

Step2: Finding frames with the maximum abduction from each glottal cycle. 

First, the ripples of GAW were removed by applying a Hanning window with a size of 5. 

Figure 3.6(B) presents the smoothed GAW of the data. Timepoints of all local maxima of the 

smoothed GAW were detected, and their corresponding frames were extracted from the data. 
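A possible sketch of Step 2 is given below. It assumes that "a Hanning window with a size of 5" means a 5-point window with non-zero end points, normalised to unit sum, and that the edges of the signal are handled by repeating the end samples. Both choices are assumptions made for illustration.

```python
import math

def hann_window(size):
    """Hann weights without zero end points, normalised to sum to 1."""
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 1) / (size + 1))
         for k in range(size)]
    total = sum(w)
    return [v / total for v in w]

def smooth(signal, size=5):
    """Convolve the signal with the window; end samples are repeated at the edges."""
    w = hann_window(size)
    half = size // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wk in enumerate(w):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the edges
            acc += wk * signal[j]
        out.append(acc)
    return out

def local_maxima(signal):
    """Indices of interior samples that exceed their left neighbour and are
    not smaller than their right neighbour."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]
```

The indices returned by `local_maxima(smooth(gaw))` identify the frames of maximum abduction, one per glottal cycle.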

 


[Figure 3.6 image: (A) pdf versus pixel intensity, with the data points and the threshold marked; (B) GAW versus time (s).]

Figure 3.6. Estimation of the GAW: (A) pdf of the red channel, and the computed black threshold, (B) GAW estimate after applying the black threshold.

Step3: Detection of the glottal midline.  

 

The following process was repeated for all extracted frames. The frame was thresholded using the black reference and converted into a binary image. The object with the largest area was selected and then closed morphologically using a circular structuring element with a radius of 2 pixels. The first moment of inertia (corresponding to the center of the glottis at each row) was computed for each row of the image. Let J(x,y) denote the binary image. Equation 3-12 shows the formula for the computation of the first moment of inertia for row y (I_y).

$$I_y = \frac{\int x \, J(x,y) \, dx}{\int J(x,y) \, dx} \tag{3-12}$$

A straight line was fitted to the computed centers of the glottis. The angle between this line and the x-axis was computed and stored for further analysis. Figure 3.7(B) shows the outcome of this step.
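The row-wise centers and the fitted line can be sketched as below. The discrete sum replaces the integral of Equation 3-12. Regressing x on y (rather than y on x) is an assumption made here because the glottal midline is close to vertical in the image, and the function names are hypothetical.

```python
import math

def row_centers(mask):
    """(center_x, y) for every non-empty row of a binary mask."""
    centers = []
    for y, row in enumerate(mask):
        total = sum(row)
        if total:
            centers.append((sum(x * v for x, v in enumerate(row)) / total, y))
    return centers

def midline_angle_deg(mask):
    """Angle in degrees between the least-squares midline and the x-axis."""
    pts = row_centers(mask)
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    syy = sum((p[1] - mean_y) ** 2 for p in pts)
    slope = sxy / syy  # dx/dy of the fitted line x = slope * y + offset
    # The line direction is (slope, 1), so a vertical midline gives 90 degrees.
    return math.degrees(math.atan2(1.0, slope))
```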

Step4: Rotation correction 

Assuming a constant rotation angle throughout the recording, the correction angle was estimated as the mean of the values computed from all frames, after removing the top and bottom 5% of the data (a trimmed mean at the 0.1 level). This approach makes the estimate of the angle robust to outliers. Finally, all frames were rotated by this value. Figure 3.7(C) shows the outcome for a frame of data. Note that the method can easily be adapted to conditions where the rotation angle changes throughout the recording.

Figure 3.7. Rotation correction for a frame of data: (A) before correction, (B) segmented glottis with the fitted line 

on the first moment of inertia from each row, (C) after correction. 
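The robust angle estimate of Step 4 amounts to a trimmed mean. A minimal sketch follows, assuming that the number of discarded samples per tail is rounded down; the function name is hypothetical.

```python
def trimmed_mean(values, proportion=0.1):
    """Mean after discarding proportion/2 of the samples from each tail."""
    ordered = sorted(values)
    # Small epsilon guards against floating-point round-off before truncation.
    k = int(len(ordered) * proportion / 2.0 + 1e-9)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)
```

With 20 per-frame angles, for example, the single largest and the single smallest value are discarded before averaging.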

 

3.3.2.4. Spatial segmentation 

Computation of the velocity of vocal folds depends on the accurate detection of the edges of 

the vocal folds. Spatial segmentation is the process that achieves this. Different methods have been 

proposed  in  the  literature  for  this  purpose,  including  intensity  thresholding98,  level  set 

segmentation120,  active  contours123,124,  and  region  growing118.  A  new  method  for  spatial 

segmentation is presented here that takes full advantage of the temporal and spatial redundancy of 

the vocal fold edges and can achieve a sub-pixel resolution. The method assumes that recordings are motion- and rotation-compensated. Additionally, the method assumes that the locations of the anterior commissure and the posterior end of the vocal folds are known. The user can provide them by clicking on the two ends of the vocal folds. While this information can be estimated automatically (e.g., by processing the temporal difference of frames), the user can provide it very accurately and with little effort. This information is used as an initial estimate of the glottal midline and the two ends of the glottis in a recording. The proposed algorithm consists of three steps.

Step1: Temporal curve fitting 

The spatial location of a certain point on a vocal fold edge cannot abruptly change from one 

frame to the next one. More precisely, the function determining the coordinate of a specific point 

on a vocal fold edge should be continuous in time. This step exploits this temporal redundancy of 

the data. To that end, kymograms of the recording between the two user-selected points were 

created. Then, the following processes were done on each kymogram.  

1.a: The local black threshold (i.e. the threshold for a specific scanning line along the anterior-

posterior axis) was computed. The 20th percentile of each row of the red channel of the kymogram 

was computed. A 5th order Hanning window was used to remove the ripples and to make the result 

smooth. A window with a size of 31, centered at the glottal midline was selected from the result 

(figure 3.8(A)), and its minimum was selected as the local black threshold. 

1.b: The ROI corresponding to the glottis was segmented. To that end, the red channel of the data was thresholded with the computed black reference (i.e. the local black threshold from 1.a). Clutter was then removed from the resulting binary image. This was achieved by computing the area of all objects and then constructing the pdf of the areas using a Gaussian kernel. For multimodal distributions, the maximum size of the clutter was set to the minimum between the first two peaks (refer to figure 3.6 for an example); otherwise, a value of 4 was used. The binary image then underwent a closing operation with a circular structuring element with a radius of 1 pixel. A window with a size of 21, centered at the glottal midline, was retained from the binary image, and the rest was set to zero.

1.c: Two curves (one per vocal fold edge) were fitted to the data. First, the ROI mask was summed over all columns. The result was smoothed with a 3rd order Hanning window. The location of the maximum was recorded as the current midline estimate. Let M(:, i) denote column i of the ROI. The row index of the first non-zero element of M(:, i) was stored in a variable called u(i). If all elements of M(:, i) were zero, the current midline estimate was stored in u(i). Similarly, the row index of the last non-zero element of M(:, i) was stored in a variable called l(i), and if all elements of M(:, i) were zero, the current midline estimate was stored in l(i). In this way, u(i) and l(i) stored the initial estimates of the y-coordinates of the two vocal fold edges in the kymogram. Separate curves were fitted to vectors u and l. Depending on the vibratory characteristics of the vocal folds, different types of curves may be employed at this step. If the kymogram has clear periodicity, Fourier curves offer more robustness to noise and outliers. Otherwise, spline curves may be used. Due to the presence of lesions, some of our kymograms were not fully periodic; hence, spline curves with a smoothing factor of 0.1 were used. Figure 3.8 depicts the different stages of Step 1.

Figure 3.8. Temporal curve fitting results: (A) local black reference estimation, the red window shows the search 

window, (B) ROI segmentation, (C) detection of vocal fold edges. 
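Parts 1.a and 1.c of the temporal step can be sketched as follows. The sketch makes several assumptions: the kymogram is stored as a list of rows (spatial position) by columns (time), the "5th order" Hanning window is taken to be the 5-point window with weights proportional to 1, 3, 4, 3, 1, the percentile uses linear interpolation, and the smoothing of the row sums in 1.c is omitted for brevity. The clutter removal of 1.b and the curve fitting are not shown, and all function names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def smooth5(profile):
    """5-point Hann smoothing (weights 1, 3, 4, 3, 1); end samples are repeated."""
    weights = (1, 3, 4, 3, 1)
    n = len(profile)
    return [sum(w * profile[min(max(i + k - 2, 0), n - 1)]
                for k, w in enumerate(weights)) / 12.0
            for i in range(n)]

def local_black_threshold(kymogram_red, midline_row, window=31):
    """1.a: minimum of the smoothed 20th-percentile profile near the midline."""
    profile = smooth5([percentile(row, 20) for row in kymogram_red])
    half = window // 2
    lo = max(midline_row - half, 0)
    hi = min(midline_row + half + 1, len(profile))
    return min(profile[lo:hi])

def initial_edges(mask):
    """1.c: midline estimate and the vectors u and l from a binary ROI mask."""
    n_rows, n_cols = len(mask), len(mask[0])
    row_sums = [sum(row) for row in mask]
    midline = max(range(n_rows), key=lambda r: row_sums[r])
    u, l = [], []
    for i in range(n_cols):
        rows = [r for r in range(n_rows) if mask[r][i]]
        # Empty columns (closed glottis) fall back to the midline estimate.
        u.append(rows[0] if rows else midline)
        l.append(rows[-1] if rows else midline)
    return midline, u, l
```

The vectors `u` and `l` are the inputs to the temporal curve fitting described in 1.c.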

 

Step2: Outlier removal 

Step 1 exploited only the temporal redundancy of the data. That is, each scanning line along the anterior-posterior direction was segmented independently. Therefore, the estimated edge positions of two adjacent points on a vocal fold can differ strongly and abruptly. This phenomenon was often observed at the lesion site or at the two ends of the vocal folds. This step handles such instances and prepares the data for the next stage.

Executing Step 1 results in two vectors per scanning line. Each vector stores the x-coordinates of one of the vocal fold edges at different time points. Therefore, the information from Step 1 may be concatenated into matrices L and R. Let L(:, i) denote column i of matrix L, which stores the x-coordinates of all points on the edge of the left vocal fold at time point i. A 9th order polynomial with the least absolute residuals (LAR) cost function was fitted to L(:, i). Rows whose absolute residual was greater than 2 were designated as outliers and excluded from further analysis. Matrix R was processed similarly. Figure 3.9(A) shows this step.

Figure 3.9. Spatial curve fitting results: (A) outlier removal step, (B and C) segmented edges of the vocal fold for 

two different timepoints. 

 

Step3: Spatial curve fitting 

The x-coordinates of two adjacent points on a vocal fold edge cannot differ abruptly within a frame of the data. More precisely, the function determining the edges of the vocal fold at each time point should be continuous in space. This step exploits this spatial redundancy of the data. To that end, a spline curve with a smoothing factor of 0.06 was fitted to every column of matrices L and R. These curves are the output of the spatial segmentation process and constitute the edges of the vocal folds at different time points. Figures 3.9(B-C) show the result.

3.3.2.5. Horizontal calibration 

Computation of the velocity of the vocal folds depends on tracking mm displacements of the 

edges  of  the  vocal  folds  which  are  horizontal  measurements.  This  task  can  be  achieved  by 

computing the pixel displacements of the edges of the vocal folds and then converting them into 

mm displacement using the indirect calibration method developed in chapter 2. To that end, a 

proper  common  attribute  should  be  determined.  Three  steps  were  followed  for  horizontal 

calibration of HSV recordings. 

Step1: Computing the mm length of the lesion from the intraoperative images 

The pixel lengths of the lesion and the surgical instrument were measured from the intraoperative image of each subject. Each measurement was repeated 10 times and the median was recorded. Given the known mm length of the surgical instrument, the pixel-to-mm conversion scale of the intraoperative image was computed. This scale was then multiplied by the median pixel length of the lesion to obtain the mm length of the lesion.

Step2: Calibration of pre-surgery HSV recording 

Ten timepoints were selected randomly from each HSV recording, and the frames within their corresponding glottal cycles were evaluated subjectively. From each cycle, the frame with the best visual appearance of the lesion was selected, and the pixel length of the lesion was measured from it. The median of these 10 measurements was used as the final estimate of the pixel length of the lesion. Given the known mm length of the lesion (Step 1), the pixel-to-mm conversion scale of the pre-surgery HSV data was computed.

Step3: Calibration of post-surgery HSV recording 

 

In chapter 2 we showed that the vocal fold width was a robust attribute for calibration of HSV recordings. Because lesions are not present in the post-surgery recordings, the vocal fold width was used for calibration of the post-surgery data. Following the method described in section 2.4.2.2.2, the pre- and post-surgery recordings of each subject were inspected for an appropriate anchor point. Ten frames from the pre- and post-surgery recordings of each subject were selected. Following the same method, the pixel width of the vocal fold was measured from all selected frames. The medians of the measurements from the pre- and post-surgery recordings were computed for each subject. Using the conversion scale from Step 2, the mm width of the vocal fold in the pre-surgery data was computed. This value, in combination with the median pixel width of the vocal fold from the post-surgery recording, was used to compute the pixel-to-mm conversion scale of the post-surgery HSV data.
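The three calibration steps form a chain in which each step hands a known mm length to the next one. The sketch below illustrates that chain. It interprets the "pixel-to-mm conversion scale" as mm per pixel, which is an assumption, and the function and argument names are hypothetical.

```python
from statistics import median

def calibrate(instrument_mm, instrument_px, lesion_px_intraop,
              lesion_px_pre, fold_width_px_pre, fold_width_px_post):
    """Chain the three calibration steps; every scale is in mm per pixel."""
    # Step 1: intraoperative scale from the instrument, then lesion length in mm.
    scale_intraop = instrument_mm / median(instrument_px)
    lesion_mm = scale_intraop * median(lesion_px_intraop)
    # Step 2: the lesion is the common attribute for the pre-surgery HSV data.
    scale_pre = lesion_mm / median(lesion_px_pre)
    # Step 3: the vocal fold width carries the calibration to the post-surgery data.
    fold_width_mm = scale_pre * median(fold_width_px_pre)
    scale_post = fold_width_mm / median(fold_width_px_post)
    return {"lesion_mm": lesion_mm, "scale_pre": scale_pre, "scale_post": scale_post}
```

Each list argument holds the repeated pixel measurements of one attribute, so the median is taken inside the function.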

3.3.2.6. Velocity measurements 

A review of the data showed that the recordings contained different numbers of glottal cycles. Additionally, some of the recordings did not include the onset or offset. To make the analysis uniform across all recordings, the most stable portion of each recording was detected and used for further analysis. The selection strategy was as follows. The uncalibrated GAW was computed from the detected edges and smoothed using a 5th order Hanning window. The indices of its local maxima (corresponding to the maximum abduction) were computed and used as timestamps of the glottal cycles. The vocal fold velocity depends on the magnitude of the lateral excursion of the vocal folds; therefore, the most stable region of phonation was determined based on the dynamics of the excursion of the vocal folds. Specifically, the average value of GAW in each glottal cycle was computed. The fifty consecutive cycles that showed the lowest interquartile range of the mean GAW were used for the rest of the analysis. This approach also ensures that any possible occlusion of the vocal folds remains relatively constant. Figure 3.10 presents a comparison between GAW from the least and the most stable portions of a recording.


Figure 3.10. Selection of the data: (A) the least stable portion of a phonation, (B) the most stable portion of a 

phonation. 
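The selection of the most stable portion can be sketched as a sliding window over the per-cycle mean GAW. The quartile definition used here (the default of `statistics.quantiles`) is an assumption, and the function names are hypothetical.

```python
from statistics import fmean, quantiles

def iqr(values):
    """Interquartile range using the default quartile definition of the library."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def most_stable_cycles(gaw, maxima, n_cycles=50):
    """Return (first, last + 1) indices of the n_cycles consecutive glottal
    cycles whose mean GAW has the lowest interquartile range.

    gaw: GAW value per frame; maxima: frame indices of maximum abduction,
    which delimit the glottal cycles.
    """
    cycle_means = [fmean(gaw[a:b]) for a, b in zip(maxima[:-1], maxima[1:])]
    starts = range(len(cycle_means) - n_cycles + 1)
    best = min(starts, key=lambda s: iqr(cycle_means[s:s + n_cycles]))
    return best, best + n_cycles
```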

 

The pixel displacements of the estimated edges between consecutive frames were measured and then converted into mm displacements using the appropriate pixel-to-mm conversion scale (section 3.3.2.5). Finally, the velocity of each vocal fold at scanning line y (corresponding to location y along the anterior-posterior axis) was computed according to Equation 3-13.

$$v_y(t) = \frac{d_y(t)}{\tau} \tag{3-13}$$

where $d_y(t)$ denotes the mm displacement of point y along the anterior-posterior axis on a vocal fold edge between frames t and t+1, and τ denotes the time difference between consecutive frames. τ can be computed from the known frame rate of the recording. Our investigation showed that the computed velocities near the two ends of the vocal folds and near the lesion site sometimes had very sharp discontinuities. To remedy this, each $v_y(t)$ was smoothed using a Hanning window.
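Equation 3-13 can be illustrated with a short sketch. The conversion scale is assumed to be in mm per pixel, the result is expressed in cm/s to match the units of the tables in section 3.4, and the frame rate in the usage note is a hypothetical value.

```python
def edge_velocity_cm_s(edge_px, mm_per_pixel, frame_rate_hz):
    """Velocity of one edge point between consecutive frames, in cm/s.

    edge_px: sub-pixel edge positions of one scanning line, one per frame.
    """
    tau = 1.0 / frame_rate_hz  # time difference between consecutive frames (s)
    velocities = []
    for current, following in zip(edge_px, edge_px[1:]):
        displacement_mm = (following - current) * mm_per_pixel
        velocities.append(displacement_mm / tau / 10.0)  # mm/s to cm/s
    return velocities
```

For instance, a displacement of 0.5 pixels at 0.1 mm per pixel and 4000 frames per second corresponds to 0.05 mm in 0.25 ms, that is, 20 cm/s.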

The investigation of the hypotheses of this chapter depends on inter- and intra-subject comparisons of vocal fold velocities. For a vocal fold with a length of l pixels, l different velocity time-sequences can be computed for each vocal fold. However, meaningful inter- and intra-subject comparisons depend on selecting comparable points on the vocal folds. This selection was subject to multiple complications. First, the true length (i.e. in mm) of the vocal fold differs between subjects. Second, inspection of the data showed that the full vocal fold length was not visible in some of the recordings. This was primarily due to arytenoid hooding, epiglottic obstruction, or accumulation of significant mucus on the anterior commissure. Third, the recordings from different subjects were made at different working distances. In summary, the number of measured velocity time-sequences depends on the true length of the vocal fold, the imaging working distance, and the magnitude of the vocal fold occlusion. To tackle this problem, three different strategies were used. Each strategy provides a scanning line y (corresponding to location y along the anterior-posterior axis) for computation of the velocity.

The first strategy was based on finding the scanning line y that led to the maximum velocity measure. To that end, the scanning line with the maximum velocity measure was determined for each glottal cycle. This process led to 50 values, and their mode was used as the scanning line y. The second strategy was based on the scanning line y that passed through the middle of the lesion. The value of y was determined from each pre-surgery recording. A proper anchor point was selected for determining the comparable point on the post-surgery recording. The third strategy was based on the scanning line y that passed through the middle of the visible vocal fold. Regardless of the strategy, for each selected scanning line y, the velocity time-sequences at lines [y-2, y-1, y, y+1, y+2] were computed and then averaged (over the y-direction) for the analysis. This step was taken to reduce measurement errors.
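The first strategy and the five-line averaging can be sketched as follows. The data layout (`peak_speed[c][y]` and `velocity[y][t]`) and the function names are assumptions made for illustration, and the sketch assumes that lines y-2 through y+2 all exist.

```python
from statistics import mode

def max_velocity_line(peak_speed):
    """Scanning line that most often yields the maximum velocity.

    peak_speed[c][y]: maximum speed in glottal cycle c at scanning line y.
    """
    winners = [max(range(len(cycle)), key=lambda y: cycle[y])
               for cycle in peak_speed]
    return mode(winners)

def averaged_velocity(velocity, y):
    """Average the velocity time-sequences at lines y-2 .. y+2.

    velocity[y][t]: velocity at scanning line y and frame t.
    """
    lines = velocity[y - 2:y + 3]
    return [sum(column) / len(lines) for column in zip(*lines)]
```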

3.4. Experiments and results 

Three experiments were conducted to answer the research questions of this chapter. Experiment 1 investigates changes in the closing velocity of the vocal folds following the surgery. Experiment 2 analyzes the similarity between the closing velocities of the left and right vocal folds in the pre- and post-surgery conditions. Experiment 3 studies the association between the area of the lesion and the post-surgery changes in the closing velocity of the vocal fold with a lesion. This section presents the details of each experiment, followed by the results and related discussion.

3.4.1. Experiment1: Post-surgery changes in closing velocity 

This experiment investigates the intra-subject changes in the closing velocity of the vocal fold 

following the surgery. The following hypothesis was formed for this experiment.  

H2a: The closing phase maximum velocity will significantly increase after phonomicrosurgery.

To investigate H2a, the timestamps of the closing phases of the vocal folds must be determined. GAW was computed from the detected edges and then smoothed using a 5th order Hanning window. The indices of the local maxima (corresponding to the maximum abduction) and the local minima (corresponding to the maximum adduction) were computed. The time window between a minimum and its preceding maximum was defined as a closing phase. All closing phases were determined for each token.
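The closing-phase definition can be sketched as below. The input is assumed to be the already smoothed GAW, and plateaus (runs of equal values) are not treated specially. The function name is hypothetical.

```python
def closing_phases(gaw):
    """(start, end) frame indices from each local maximum of the GAW to the
    next local minimum, i.e. from maximum abduction to maximum adduction."""
    interior = range(1, len(gaw) - 1)
    maxima = [i for i in interior if gaw[i - 1] < gaw[i] >= gaw[i + 1]]
    minima = [i for i in interior if gaw[i - 1] > gaw[i] <= gaw[i + 1]]
    phases = []
    for m in minima:
        preceding = [k for k in maxima if k < m]
        if preceding:  # a minimum before the first maximum has no closing phase
            phases.append((preceding[-1], m))
    return phases
```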

Following the discussion of the previous section, three different locations were used for measuring the closing velocities: the scanning line that led to the maximum value (the maximum-velocity line), the scanning line passing through the middle of the lesion (the lesion line), and the scanning line passing through the middle of the vocal fold length (the mid-fold line). Each measurement is reported separately for the left and right vocal folds.

Figure 3.11 shows the boxplots of the closing phase maximum velocity at the maximum-velocity line of the left and right vocal folds for different subjects pre- and post-surgery. The most immediate observation is that different subjects behave dissimilarly. For example, subjects p16, p35, p47, p53, and p54 showed a decrease in the velocity at the maximum-velocity line following the surgery. Additionally, the left and right vocal folds could show dissimilar trends following the surgery (e.g. p47 and p53).

[Figure 3.11 image: boxplots against Subject ID/Condition, panels (A) and (B).]

Figure 3.11. Boxplot of closing phase maximum velocity for different subjects pre- and post-surgery: (A) and (B) show the box plots for the two vocal folds at the maximum-velocity line.

A similar analysis was done for the left and right vocal folds at the lesion line. Figure 3.12 shows the result. Based on this figure we see that most subjects had an increase of closing velocity at the lesion site.

[Figure 3.12 image: boxplots, panels (A) and (B).]

Figure 3.12. Boxplot of closing phase maximum velocity for different subjects pre- and post-surgery: (A) and (B) show the box plots for the two vocal folds at the lesion line.

A similar analysis was done for the left and right vocal folds at the mid-fold line. Figure 3.13 shows the result. Based on this figure we see that most subjects had an increase of closing velocity in the middle of the vocal fold.

[Figure 3.13 image: boxplots against Subject ID/Condition, panels (A) and (B).]

Figure 3.13. Boxplot of closing phase maximum velocity for different subjects pre- and post-surgery: (A) and (B) show the box plots for the two vocal folds at the mid-fold line.
 

Finally, figures 3.11-3.13 reveal a peculiar trend for subject p35. Specifically, the closing velocity of this subject decreases consistently post-surgery, with the exception of one measure.

To quantify the qualitative trends observed in boxplots and to test H2a, a paired-sample t-test 

was adopted. The independent variable was the recording condition (pre/post) and the dependent 

variable was the maximum closing velocity at different scanning lines. The closing velocity for 

each subject was computed as the median of measurements from the 50 cycles. The Bonferroni 

correction was used to address the issue of the increased likelihood of type I error due to multiple 

testing. Table 3.2 shows the descriptive statistics of each measurement. Table 3.3 shows the result 

of t-tests. 
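The test statistic and the corrected significance level can be sketched as follows. The sketch computes only the t-statistic and its degrees of freedom, because converting t into a p-value requires the cumulative distribution of Student's t, which the Python standard library does not provide. The number of tests passed to the Bonferroni correction (six in the usage note, one per row of Table 3.3) is an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired-sample t-statistic (post minus pre) and its degrees of freedom."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests
```

With `bonferroni_alpha(0.05, 6)` the per-test level is about 0.0083, so a p-value of 0.004 would pass and a p-value of 0.016 would not.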

 

Table 3.2. Descriptive statistics of closing velocity at different scanning lines (mean±std).

Scanning location                     Pre (cm/s)     Post (cm/s)
Maximum-velocity line, left fold      38.72±13.53    44.13±12.43
Maximum-velocity line, right fold     42.7±8.63      50.37±18.01
Lesion line, left fold                20.39±14.39    38.5±13.81
Lesion line, right fold               28.72±13.6     44.26±16.83
Mid-fold line, left fold              28.36±15.27    41.87±11.77
Mid-fold line, right fold             37.1±9.77      47.06±14.07

Table 3.3. Results of the paired-sample t-test for the closing velocity at different scanning lines.

Scanning location                     t       p
Maximum-velocity line, left fold      1.28    0.23
Maximum-velocity line, right fold     1.19    0.26
Lesion line, left fold                3.58    0.004
Lesion line, right fold               2.84    0.016
Mid-fold line, left fold              2.63    0.02
Mid-fold line, right fold             1.8     0.09

 

Using the Bonferroni correction and a significance level of 0.05, only the closing velocity of the left vocal fold at the lesion site shows a significant change after the surgery. Referring to table 3.3, we see a positive t-value for this variable; therefore, we could conclude that the closing phase maximum velocity at the lesion site increased significantly after the surgery. It is noteworthy that the right vocal fold at the lesion line, the left vocal fold at the mid-fold line, and even the right vocal fold at the mid-fold line also had low p-values, but they did not reach the significance level. It is quite possible that these variables would become significant with a larger sample size. Finally, we could see a consistent and interesting trend in tables 3.2 and 3.3. Specifically, the right vocal fold on average shows a higher closing velocity than the left vocal fold for all variables in both conditions (pre/post). Table 3.1 shows that the majority of the subjects had a lesion on the left vocal fold. Given this, we may expect to see a bigger improvement in the closing velocity of the left vocal fold following the surgery. Referring to the t column in table 3.3, we see a bigger t-statistic for measurements from the left vocal fold, which supports this expectation. A different analysis was used to test this subjective observation.

The post-surgery changes in the closing phase maximum velocities of patients with unilateral lesions at different scanning lines were investigated. Table 3.4 shows the descriptive statistics for the vocal fold with the lesion at the scanning line producing the maximum closing velocity, the line passing through the middle of the lesion, and the line passing through the middle of the vocal fold. Table 3.5 shows the results of the t-test for these variables.

Table 3.4. Descriptive statistics of closing velocity at different scanning lines (mean±std).

Scanning location         Pre (cm/s)     Post (cm/s)    Improvement (%)
Maximum-velocity line     34.8±11.95     45.33±14.75    30
Lesion line               19.68±13.41    39.04±16.46    98
Mid-fold line             24.02±12.09    42.98±13.08    79

Table 3.5. Results of the paired-sample t-test for the closing velocity of the vocal fold with a lesion at different scanning lines.

Scanning location         t       p
Maximum-velocity line     2.09    0.07
Lesion line               3.65    0.005
Mid-fold line             3.43    0.008

 

 

 
Comparing tables 3.3 and 3.5 shows a consistent improvement (a lower p-value and a higher t-statistic) in the latter analysis for all measures. Finally, a similar analysis was done for the contralateral vocal fold. Table 3.6 shows the results of the t-tests. This table shows the opposite behavior to table 3.5: the t-statistics are lower (hence smaller effect sizes), and the p-values are higher.

 

Table 3.6. Results of the paired-sample t-test for the closing velocity of the contralateral vocal fold at different scanning lines.

Scanning location         t       p
Maximum-velocity line     0.52    0.62
Lesion line               1.75    0.11
Mid-fold line             0.99    0.35

 

 

In summary, we could draw the following conclusions regarding the closing phase maximum velocity of the vocal folds following the surgery. The closing phase maximum velocity of the vocal fold with a lesion improves at multiple points (at least) following the surgery. The closing phase maximum velocity of the contralateral side shows only a small improvement at the location of the lesion (this improvement did not reach the significance level due to the small sample size). Finally, the closing phase maximum velocity could be computed from different scanning lines along the anterior-posterior axis. The results of this experiment suggest that the selection of the scanning line could have a significant effect on the ability of the measure to explain the intervention outcome. For example, the scanning line producing the maximum velocity is the easiest to implement, as it does not need a registration step (i.e. finding a comparable scanning line in different recordings). However, it may produce a significantly inferior outcome (none of its p-values reached even 0.05). Conversely, the scanning line passing through the middle of the lesion seems to be the most promising location for computing the velocity measures. For both the vocal fold with the lesion and the contralateral side, this measure produced the smallest p-values and the largest t-statistics (hence the largest effect sizes).

3.4.2. Experiment2: Post-surgery similarity between the two vocal folds  

Phonomicrosurgery probably leaves a scar on the vocal fold with the lesion221, which in turn leads to changes in the biomechanical properties of the scarred vocal fold, including increased stiffness.222 However, after phonomicrosurgery the mass and morphology of the vocal fold with the lesion and of the contralateral side would become more similar. Considering that the goal of the surgery is to improve the voice, the positive changes are expected to outweigh the negative side effects of the surgery. Additionally, based on tables 3.5 and 3.6, the vocal fold with a lesion showed a larger improvement following the surgery than the vocal fold without a lesion. This larger improvement may compensate for the lower baseline velocity of the vocal fold with the lesion (table 3.4). We may therefore expect the kinematics of the two vocal folds to become more similar following the surgery. It is thus hypothesized that,

H2b: For unilateral mass lesions, the closing phase maximum velocity of the two vocal folds will become more similar after the surgery.

To test H2b, two separate paired-sample t-tests were used for each scanning line. First, the pre-surgery differences in the closing phase maximum velocities between the vocal fold with the lesion and the contralateral side were investigated. Then, the same process was repeated for the post-surgery recording. Table 3.7 shows the results for the maximum-velocity, lesion, and mid-fold scanning lines.

Table 3.7. Results of paired-sample t-test for pre- and post-surgery recordings.

                          Pre-surgery         Post-surgery
Scanning location         t       p           t       p
Maximum-velocity line     3.36    0.008       0.86    0.41
Lesion line               4.82    0.0009      0.97    0.36
Mid-fold line             4.92    0.0008      0.96    0.36

 

 

Based on the results of table 3.7, the vocal fold with the lesion has a significantly lower closing phase maximum velocity (positive value of t) than the contralateral side in the pre-surgery condition for all scanning locations. However, none of the tests were significant for the post-surgery condition. Therefore, we could conclude that the two vocal folds were significantly dissimilar in the pre-surgery data, but they became similar following the surgery. Figure 3.14 shows the individual behavior of the closing velocity at the lesion line for both conditions, which corroborates the findings of table 3.7.

[Figure 3.14 image: boxplots against Subject ID/Condition, panels (A) and (B).]

Figure 3.14. Boxplot of closing phase maximum velocity for the vocal fold with the lesion and the contralateral side for different subjects: (A) pre-surgery condition, (B) post-surgery condition.

 

3.4.3. Experiment3: Effect of lesion size on post-surgery changes 

A lesion with a larger area probably indicates a larger accumulation of extra mass on the vocal fold. The removal of a larger mass is expected to lead to a larger post-surgery increase in the velocity of the vocal fold. Additionally, subjective visual evaluation of HSV recordings has suggested that the area of a lesion is a better predictor of qualitative changes in the vibratory characteristics of the vocal folds (e.g. left-right phase asymmetry) than its length.93 Based on this rationale, the following hypothesis is made,

H2c: Post-operative change in the closing phase maximum velocity will be positively correlated with the area of the lesion.

The intraoperative images were imported into image editing software, and the area of the lesion was painted blue (figure 3.15(A)). The edited images were imported into Matlab, and the number of solid blue pixels (i.e. red = 0, green = 0, and blue = 255) was counted. This number corresponds to the uncalibrated (i.e. pixel) area of the lesion. Calibration was done by multiplying the uncalibrated area by the square of the pixel-to-mm conversion scale computed from the corresponding intraoperative image. Figure 3.15(B) shows a scatter plot of the post-surgery changes in the closing velocity at the lesion line, computed from the vocal fold with the lesion, against the area of the lesion.

Figure 3.15. The relationship between the area of a lesion and its post-surgery improvement: (A) the blue region shows the lesion, (B) scatter plot of post-surgery changes in the closing velocity at the lesion line vs. area of the lesion. The outliers are marked by a red circle.
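The area calibration can be sketched as below. The image is assumed to be a nested list of (red, green, blue) tuples and the conversion scale is assumed to be in mm per pixel. The function name is hypothetical.

```python
def lesion_area_mm2(image_rgb, mm_per_pixel):
    """Calibrated lesion area: count of solid blue pixels times the squared scale."""
    solid_blue = (0, 0, 255)
    n_pixels = sum(1 for row in image_rgb for pixel in row
                   if tuple(pixel) == solid_blue)
    # An area scales with the square of the linear conversion scale.
    return n_pixels * mm_per_pixel ** 2
```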

 

The scatter plot shows that two data points did not follow the trend. These data points belonged to subjects p11 and p29. Inspection of the recordings from these two subjects showed that they share two characteristics. First, the lesion was very large. Second, the lesion site was near the anterior commissure. Edges near the two ends of the vocal fold have low excursions and hence small maximum velocities. Therefore, removing a very large lesion from these locations could have a smaller impact on the velocity. To test hypothesis H2c, the correlation coefficient between the post-surgery changes in the closing velocity at the lesion line and the calibrated area of the lesion was computed. Considering the above-mentioned differences for subjects p11 and p29, two cases were tested. In the first analysis, these samples were included (N=10). In the second analysis, these outliers were excluded (N=8). Table 3.8 shows the results.
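The two analyses can be sketched with a plain Pearson correlation, once on all samples and once after dropping the excluded subjects. The source does not state which correlation coefficient was used, so the Pearson coefficient is an assumption, as are the function names.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_indices(values, excluded):
    """Copy of values without the positions listed in excluded."""
    return [v for i, v in enumerate(values) if i not in excluded]
```

The second analysis then applies `pearson_r` to `drop_indices(area, excluded)` and `drop_indices(change, excluded)`, where `excluded` holds the positions of the two outlying subjects.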

 

Table 3.8. Correlation between post-surgery changes in the closing velocity and the area of the lesion.

Sample size (N) | r | p
10 | 0.17 | 0.63
8 | 0.16 | 0.71

 
Based on the results of table 3.8 the null hypothesis cannot be rejected, which could be due to the 

 

small sample size. 
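To make the correlation analysis concrete, the following Python sketch computes a Pearson correlation coefficient and the associated t statistic (with N-2 degrees of freedom). It is a minimal sketch under the assumption that a Pearson correlation was used; the function names are hypothetical and no subject data are reproduced, only the r and N values of table 3.8.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


# t statistics implied by the r and N values reported in table 3.8.
t_all = t_statistic(0.17, 10)
t_without_outliers = t_statistic(0.16, 8)
```

Both t statistics are small (below 0.5), which is consistent with the large p-values in the table.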

3.5. Discussions 

The closing velocity is an important kinematic measure of the vocal folds' vibratory motion. 

For example, the closing velocity relates to collision forces between the two vocal folds [25–27], as

well as to the average produced acoustic output [29] and vocal intensity [23,28]. Additionally, previous

studies have suggested that closing velocity could be a predictor of tissue elasticity [102,103], and

hence a predictor of physical development of the vocal folds [103]. Therefore, accurate velocity

measures could significantly improve our understanding of the normal and disordered phonatory 

mechanisms. There is also general agreement on the association between phonotrauma and

the collision forces between the vocal folds [169,210,216,228]. Therefore, clinical diagnosis and treatment

could significantly benefit from further research into velocity measures.  

This  chapter  provided  the  required  methodology  for  accurate  measurements  of  calibrated 

horizontal (i.e. medial-lateral direction) velocity of the vocal fold edges. Two primary steps were 

performed  to  achieve  this.  First,  a  method  with  a  sub-pixel  resolution  was  developed  for  the 

segmentation  of  the  edges  of  the  vocal  folds.  This  was  done  using  an  adaptive  thresholding 

technique, followed by fitting proper curves on temporal and spatial domains. Second, calibrated 

velocity measures require the existence of calibrated time and space. Fortunately, HSV videos are 

temporally calibrated, that is, the time difference between consecutive frames is known. However, 

 

spatial information is not readily calibrated. The spatial calibration was done using the method 

developed in chapter 2. Specifically, the intraoperative images were used to determine the mm 

lengths of the lesions, which in turn were used for spatial calibration of the pre-surgery HSV 

recordings. The mm widths of vocal folds at specific locations along the anterior-posterior axis 

were measured from the pre-surgery HSV data, and then they were used for calibration of the post-

surgery  HSV  data.  The  employed  method  was  used  to  measure  the  closing  phase  maximum 

velocity of subjects with mass lesions pre- and post-surgery. Based on table 3.2 the closing phase 

maximum velocity of the pre-surgery condition was on average between 28.36 cm/s and 42.7 cm/s, 

depending on where the measurements were computed from. A similar measurement from the 

post-surgery condition was on average between 38.5 cm/s and 50.37 cm/s, depending on where the 

measurements were computed from. We may compare these values with the velocity reported in 

other studies. Using color Doppler imaging, a vocal fold velocity of 68±10 cm/s

was reported for comfortable pitch and loudness of a sustained phonation from 10 healthy male

subjects [212]. Using photoglottography, a vocal fold maximum closing velocity of 112±53 cm/s

was reported for 32 healthy subjects covering a wide range of sound pressure levels (65.46–86.89

dBA) [213]. Finally, using a parallel-laser projection endoscope, an average value of 100 cm/s was

reported for the maximum velocity of the vocal folds for 9 healthy male subjects [25]. Considering

that our subjects had voice disorders, the computed values seem to be in a sensible range. 
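The role of the two calibrations can be illustrated with a short Python sketch that converts per-frame edge positions (in pixels) into velocities in cm/s. It is a simplified finite-difference illustration, not the curve-fitting procedure of this chapter; the function name, the sample positions, the scale of 0.1 mm per pixel, and the frame rate of 4000 frames per second are all assumptions.

```python
def edge_velocity_cm_per_s(positions_px, mm_per_pixel, frame_rate_hz):
    """Finite-difference velocity of an edge from its per-frame positions."""
    velocities = []
    for previous, current in zip(positions_px, positions_px[1:]):
        displacement_mm = (current - previous) * mm_per_pixel  # spatial calibration
        velocity_mm_per_s = displacement_mm * frame_rate_hz    # temporal calibration
        velocities.append(velocity_mm_per_s / 10.0)            # mm/s to cm/s
    return velocities


# Hypothetical sub-pixel edge positions over three consecutive frames.
velocities = edge_velocity_cm_per_s([10.0, 10.5, 11.5],
                                    mm_per_pixel=0.1,
                                    frame_rate_hz=4000)
```

Without the mm-per-pixel scale the same computation would only give pixels per second, which cannot be compared across recordings or with the literature values above.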

Assuming  a  vocal  fold  with  a  length  of  l  pixels,  we  could  compute  2l  time-sequences

describing the velocity of every point on the edges of the two vocal folds at every time. Obviously, 

this high number of measurements has a lot of redundancy and should be reduced. Such reduction 

should  have  two  parts.  First,  each  velocity  time-sequence  should  be  represented  by  a  limited 

number of attributes. This step is a temporal reduction. Next, computed attributes from the 2l points 

 

on the edges of the two vocal folds should be represented by a limited number of features. This step

is a spatial reduction. In this chapter, the temporal reduction was achieved by selecting time points 

in the closing phase that led to the maximum velocity (i.e. the closing phase maximum velocity). 

The spatial reduction was achieved by just selecting certain points on the edges of the vocal folds. 

These were: the point leading to the maximum value, the midpoint of the vocal fold length, and 

the midpoint of the vocal fold lesion. Results from experiment 1 indicated a significant effect for

the spatial reduction operation. Specifically, the measurement from the point with the maximum 

value showed the least discriminative power (i.e. the lowest effect size between pre/post), followed 

by the measurement from the middle of the vocal folds, and then the measurement from the middle 

of the lesion. This outcome suggests that future studies should consider this factor during their 

experiment designs.  
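The two reduction steps can be sketched as follows. This Python sketch assumes that the closing-phase frames are already known and supplied as a boolean mask, and that the positions of the fold midpoint and lesion midpoint are given as indices; the function names and the small synthetic data set are hypothetical.

```python
def closing_phase_max_velocity(velocity, closing_mask):
    """Temporal reduction: largest speed among frames flagged as closing phase."""
    return max(abs(v) for v, closing in zip(velocity, closing_mask) if closing)


def reduce_edge(velocities_by_point, closing_mask, fold_mid, lesion_mid):
    """Spatial reduction: keep three points of interest along one vocal fold edge."""
    per_point = [closing_phase_max_velocity(v, closing_mask)
                 for v in velocities_by_point]
    return {
        "max_point": max(per_point),
        "fold_midpoint": per_point[fold_mid],
        "lesion_midpoint": per_point[lesion_mid],
    }


# Synthetic velocities (arbitrary units): three edge points, four frames each.
velocities_by_point = [
    [1, -2, -3, 1],
    [2, -5, -4, 2],
    [1, -7, -6, 1],
]
closing_mask = [False, True, True, False]
features = reduce_edge(velocities_by_point, closing_mask, fold_mid=1, lesion_mid=0)
```

Each edge with l points is thereby reduced from l time-sequences to three numbers, one per spatial selection rule.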

Referring to table 3.4 we see for patients with unilateral lesions the closing phase maximum 

velocity of the vocal fold with the lesion on average improves by 98%, 79%, and 30% at the 

midpoint of the lesion, midpoint of the vocal fold length, and the point with the maximum closing 

velocity, respectively. Referring to table 3.5 we can conclude that the closing velocity of the vocal 

fold with the lesion improves significantly, at the midpoint of the lesion and the midpoint of the 

vocal fold length. The point with the maximum closing velocity also showed

a promising improvement but, due to the small sample size, it did not reach the significance level

(p=0.07>0.05/3). However, this was not the case for the contralateral side. Specifically, table 3.6 

did not establish a significant improvement for the contralateral side. The line passing through the 

midpoint of the lesion (based on the other vocal fold) was the only location that showed some level 

of improvement. However, due to the small sample size, it did not reach the significance level 

( =0.11> .    ). In summary, the finding from experiment1 suggests that the closing velocity of 

 

the vocal fold with a lesion improves, at least, at multiple points along the length of the vocal fold 

following  the  surgery.  However,  the  improvement  of  the  contralateral  side  is  more  local  and 

probably more limited to the area in direct contact with the lesion.  
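The significance threshold of 0.05/3 quoted above corresponds to dividing the nominal level by the three measurement locations. The Python sketch below assumes this is a Bonferroni-style correction; the function names are hypothetical, and the p-values are the ones quoted in the text.

```python
def corrected_alpha(alpha, n_comparisons):
    """Per-test significance level after dividing alpha by the number of tests."""
    return alpha / n_comparisons


def is_significant(p_value, alpha=0.05, n_comparisons=3):
    """True if p_value falls below the corrected significance level."""
    return p_value < corrected_alpha(alpha, n_comparisons)


threshold = corrected_alpha(0.05, 3)     # about 0.0167
lesion_side_max_point = is_significant(0.07)
contralateral_lesion_mid = is_significant(0.11)
```

Both quoted p-values exceed the corrected threshold, so neither comparison is reported as significant.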

Experiment 2 provided some evidence regarding the similarity of the vibration of the two vocal

folds following the surgery. Specifically, table 3.7 showed that the closing velocities of the two 

vocal  folds  during  the  pre-surgery  phonation  were  significantly  different,  at  least,  at  multiple 

locations.  However,  the  closing  velocities  of  the  two  vocal  folds  after  the  surgery  were  not 

significantly different. This finding suggests that kinematics of the two vocal folds become more 

similar after the surgery. Finally, experiment 3 investigated the association between the area of the

lesion and the post-surgery improvement in the closing velocity of the vocal fold with the lesion. 

Table 3.8 indicated a very weak association between the two, and the correlation failed to reach 

the  significance  level.  Considering  the  small  sample  size,  a  firm  conclusion  cannot  be  made. 

However, this result suggests that the area of the lesion is not a good predictor for closing velocity 

improvement. 

3.6. Conclusions 

This chapter was motivated by the importance and the relevance of closing velocity of the 

vocal fold for clinical applications, and voice science research. The computation of the calibrated 

velocity measures depends on two primary steps: the accurate segmentation of the edges of the

vocal folds with sub-pixel resolution, and the spatial calibration of the recording. A new

segmentation  method  based  on  an  adaptive  thresholding  technique,  followed  by  fitting  proper 

curves  on  temporal  and  spatial  domains  was  presented.  An  indirect  approach  based  on 

intraoperative images was employed for calibration of the pre- and post-surgery HSV recordings. 

Investigation of post-surgery changes revealed a significant effect for the location that the velocity 

 

is computed from. The line passing through the middle of the lesion showed the highest improvement

(an average improvement of 98%). Additionally, the analysis suggested that the closing velocity 

of the vocal fold with a lesion improves, at least, at multiple points following the surgery. However, 

the improvement of the contralateral side was more local and probably more limited to the area in 

direct contact with the lesion. Furthermore, the result showed that the closing velocities of the two

vocal folds become more similar following the surgery. This study also investigated the association 

between the size of the lesion and the post-surgery closing velocity improvements. The result 

showed a very weak association between the two (r=0.17), which did not reach the significance

level (p=0.63). 

 

 

CHAPTER 4: DIRECT VERTICAL CALIBRATION OF HSV RECORDINGS 

 

Based on: 

Ghasemzadeh H., Deliyski D. D., Ford D. S., Kobler J., Hillman R. E., Mehta D. D. Method for 
Vertical  Calibration  of  Laser-Projection  Transnasal  Fiberoptic  High-Speed  Videoendoscopy. 
Journal of Voice. 2020 Nov;34(6):847-861. doi: 10.1016/j.jvoice.2019.04.015. PMID: 31151853; 
PMCID: PMC6883161. 
 

 

Summary:  The  ability  to  provide  absolute  calibrated  measurement  of  the  laryngeal  structures 

during phonation is of paramount importance to voice science and clinical practice. Calibrated 

three-dimensional measurement could provide essential information for modeling purposes, for 

studying  the  developmental  aspects  of  vocal  fold  vibration,  for  refining  functional  voice 

assessment  and  treatment  outcomes  evaluation,  and  for  more  accurate  staging  and  grading  of 

laryngeal disease. Recently, a laser-calibrated transnasal fiberoptic endoscope compatible with 

high-speed videoendoscopy (HSV) and capable of providing three-dimensional measurements was 

developed. The optical principle employed is to project a grid of 7×7 green-laser points across the 

field of view (FOV) at an angle relative to the imaging axis, such that (after calibration) the position 

of each laser point within the FOV encodes the vertical distance from the tip of the endoscope to 

the laryngeal tissues. The purpose of this chapter was to develop a precise method for vertical 

calibration of the endoscope. Investigating the position of the laser points showed that, besides the 

vertical  distance,  they  also  depend  on  the  parameters  of  the  lens  coupler,  including  the  FOV 

position within the image frame and the rotation angle of the endoscope. The presented automatic 

calibration method was developed to compensate for the effect of these parameters. Statistical 

image processing and pattern recognition were used to detect the FOV, the center of FOV, and the 

 

fiducial marker. This step normalizes the HSV frames to a standard coordinate system and removes 

the dependence of the laser-point positions on the parameters of the lens coupler. Then, using a 

statistical learning technique, a calibration protocol was developed to model the trajectories of all 

laser points as the working distance was varied. Finally, a set of experiments was conducted to 

measure  the  accuracy  and  reliability  of  every  step  of  the  procedure.  The  system  was  able  to 

measure absolute vertical distance with mean percent error in the range of 1.7% to 4.7%, depending 

on the working distance. 

 

4.1. Introduction 

Typical images are two-dimensional representations of the real world. Considering that the 

real world has a three-dimensional (3D) spatial structure, images are not a true representation of 

the actual phenomena that are being captured. Basically, for any pixel of an image, we could 

construct a hypothetical square pyramid such that its tip is on the sensor of the camera and its base 

is toward the front of the camera. Anything inside this pyramid would be represented by the same 

pixel, or equivalently, all the space inside that pyramid is squeezed into a single point on the image. 

This model predicts several important features for an image. If the pyramid contains several objects 

at different distances, the closest one gets recorded. Also, based on this model the height of the 

pyramid is lost during the imaging. Finally, the size of an object in the image depends on its 

distance from the camera. The main aim of this chapter is to devise a method that can estimate the 

height of this hypothetical pyramid for laryngeal endoscopy. Assuming an upright position for the 

patient during the laryngeal imaging, this height would correspond to the vertical distance between 

the tip of the endoscope and different points on the superior view of the larynx. Therefore, the term 

vertical distance is used for the rest of this chapter. 
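The pyramid model corresponds to the standard pinhole projection, which can be sketched in Python as follows. The function names, the unit focal length, and the sample coordinates are assumptions chosen for illustration.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def image_size(object_size, distance, focal_length=1.0):
    """Size of an object on the image as a function of its distance."""
    return focal_length * object_size / distance


# Two points on the same ray through the camera center land on the same pixel,
# so their distance from the camera cannot be recovered from the image alone.
near_point = project((1.0, 2.0, 10.0))
far_point = project((2.0, 4.0, 20.0))

# The same 10 mm object looks half as large when it is twice as far away.
size_at_20 = image_size(10.0, 20.0)
size_at_40 = image_size(10.0, 40.0)
```

This is why an additional cue, such as a laser pattern projected at an angle, is needed to recover the lost distance.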

 

The  larynx  has  a  3D  structure,  and  its  different  components  reside  at  different  vertical 

distances. Extrinsic laryngeal muscles could also elevate or depress the larynx [12,229], which would

lead to changes in the vertical distance of the larynx from the endoscope. Additionally, vocal folds

have a 3D morphology and, in fact, their vibration occurs in both horizontal and vertical planes.

Multiple studies have predicted a significant role for the vertical component of the vibration in

phonation [164,172–174]. Therefore, measuring the vertical movements of the larynx and the vertical

component of the vibration of vocal folds could provide a significant amount of information

regarding the mechanisms of normal and disordered phonation. At the same time, the ability to

obtain absolute horizontal measurements from laryngeal tissues and structures may depend on

estimating their distances from the endoscope. Accurate horizontal and vertical

measurements from in-vivo laryngeal images are expected to provide essential information for modeling

vocal fold behavior [151,230], studying the developmental aspects of vocal fold vibration [153] and

laryngeal tissues, better evaluating treatment outcomes of voice disorders, and more accurately

grading relevant laryngeal diseases [191]. To achieve these goals, laryngeal imaging systems should

provide calibrated measurement capabilities. 

Researchers have been working on augmenting the laryngeal imaging systems with absolute

measurements and/or 3D reconstruction capabilities for more than two decades [25,153,190–192,194,231–234].

Most often, these goals are achieved by projecting a laser pattern with certain topological

properties on the field of view (FOV) and then using the information from the position and

displacement of the laser pattern for achieving absolute measurement or 3D

reconstruction [191,192,194,231,235,236]. Three main components can be identified in (almost) all systems

that have been designed for this purpose: the laser projection component, the imaging component, 

 

and the endoscopic instrument. These three components determine the functionality,

characteristics, and capabilities of the final imaging system.

Considering the underlying principles for creating the laser pattern three main categories may 

be distinguished. Systems in the first category use the well-known laser triangulation principle for 

performing measurements [237]. The main idea behind systems in this category is to project a laser

point (or line) on the target surface and then record the scene from a different angle. The angle 

difference between the laser projection and the imaging axes captures the vertical displacement of 

the target surface. The single-point [231,232] and single-line [233] laser projection systems fall under this

category.  Systems  in  the  second  category  have  been  developed  based  on  the  projection  of 

structured laser lights. These systems project a set of (commonly two) parallel laser beams with 

known  horizontal  distance  on  the  target  surface.  Then,  the  distance  between  the  parallel  laser 

patterns on the image acts as a scale for converting pixels into mm. Two-point [25,190,192], two-parallel-line [234],

and multiple-parallel-line [153] projection systems are examples from this category. Finally,

systems in the third category have combined structured light projection with the laser triangulation 

technique  for  achieving  the  desired  measurement  goals.  The  multiple-point  laser  projection 

systems are examples of this category [191,194,235]. It is noteworthy that systems from each category

have  different  functionalities.  Systems  from  the  first  category  could  only  capture  the  vertical 

movements of the target surface, whereas systems from the second category are typically used for 

absolute measurements on the horizontal plane. The systems in the third category are by far the 

most flexible approach and, depending on the design, can provide detailed information regarding 

vertical  movements  and  absolute  measurements  on  the  horizontal  plane.  This  wealth  of 

information  comes  at  the  cost  of  more  complex  hardware  (optical)  and  software  (algorithm) 

design. 

 

Figure 4.1 presents a schematic of different laser projection systems. Fig 4.1(A) shows the 

projection of a single laser beam on a target surface (S1). When the surface S1 moves h mm in the 

vertical direction, the laser point moves Δ mm in the horizontal plane. This horizontal component 

is captured on the image as a δ-pixels displacement. Fig 4.1(B) shows a projection of two parallel 

laser points on a target surface (S1). The actual distance between laser points (d mm), is reflected 

by a δ-pixel distance on the image. Fig 4.1(C) shows a schematic image of the combined approach. 

Specifically, hypothetical positions of laser points on the image for two different vertical distances 

are shown in red and green colors. Change in the vertical distance leads to the displacement of the 

laser pattern by D pixels. Additionally, the distance between pairs of laser points (d1, d2) could be 

used for horizontal measurements.  

Figure 4.1. Schematics of different laser projection techniques with the principle of encoding the vertical and/or 
horizontal distances: (A) laser triangulation method, (B) structured light projection, (C) a combined technique. 

Green and red dots depict hypothetical positions of the laser pattern at two different vertical distances. 
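The structured-light principle of figure 4.1(B) reduces to a simple scale computation, sketched below in Python. The beam separation of 2 mm, the pixel coordinates, and the function names are assumptions chosen for illustration; the sketch also assumes that the measured structure lies at the same distance from the endoscope as the laser points.

```python
import math


def pixel_distance(p, q):
    """Euclidean distance between two image points given as (x, y) in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def mm_per_pixel(beam_separation_mm, beam_separation_px):
    """Scale factor from the known physical separation of two parallel beams."""
    return beam_separation_mm / beam_separation_px


# Two parallel beams, assumed 2 mm apart, detected 40 pixels apart on the image.
laser_a, laser_b = (100.0, 80.0), (140.0, 80.0)
scale = mm_per_pixel(2.0, pixel_distance(laser_a, laser_b))

# A structure spanning 150 pixels then measures 150 * scale mm.
length_mm = pixel_distance((60.0, 50.0), (60.0, 200.0)) * scale
```

Because the beams are parallel, their physical separation does not change with distance, so the scale adapts automatically as the working distance changes.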

 

 

Considering the optical imaging component, two main technologies of videostroboscopy (VSB) and HSV can be

differentiated. VSB has been the “gold standard” approach for clinical voice evaluations [108,110,111]

and it “provides real-time audiovisual feedback and continues to be the imaging modality of choice

by voice clinicians” [108]. This technique uses very short flashes of light and takes a sequence of

pictures from different glottal cycles and then assembles them into a motion picture. An external 

trigger based on the vibratory phase of the acoustic or electroglottographic signal determines the 

 

time of the flashes. In this fashion, the assembled images represent a slow motion of the true 

vibration of the vocal folds [104,105]. Consequently, VSB does not present the actual vibratory patterns

of the vocal fold, and its captured images would substantially deviate from the true pattern as the

vibration becomes irregular and aperiodic [93,108]. On the other hand, HSV systems capture the true

vibratory patterns of the vocal fold, and therefore they are more appropriate when studying the intra-cycle

characteristics of vocal fold vibration [105,111]. In summary, the imaging component would

determine the temporal resolution of the captured images and consequently, it has a significant 

role in the type of phenomena that can be captured and studied. Systems based on stroboscopy are 

applicable  to  stationary  phenomena,  whereas  HSV  systems  can  be  used  for  capturing  non-

stationary behaviors such as onset and offset of phonation and also aperiodic phonation. 
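The slow-motion effect of stroboscopy can be illustrated with a simplified sampling model. The Python sketch below assumes a perfectly periodic vibration and flashes fired at a constant rate slightly below the fundamental frequency, which is a simplification of the phase-based triggering described above; the function names and the frequencies are hypothetical.

```python
def apparent_frequency_hz(f0_hz, flash_rate_hz):
    """Slow-motion frequency seen when a periodic vibration is sampled by flashes."""
    cycles_per_flash = f0_hz / flash_rate_hz
    # Whole cycles between flashes are invisible; only the fractional
    # phase advance from one flash to the next shows up in the assembled video.
    phase_advance = cycles_per_flash - round(cycles_per_flash)
    return phase_advance * flash_rate_hz


def flashes_per_apparent_cycle(f0_hz, flash_rate_hz):
    """Number of flashes (hence real cycles sampled) per displayed slow cycle."""
    return flash_rate_hz / abs(apparent_frequency_hz(f0_hz, flash_rate_hz))


slow_motion_hz = apparent_frequency_hz(200.0, 198.0)
n_flashes = flashes_per_apparent_cycle(200.0, 198.0)
```

In this example one displayed cycle is assembled from about 99 different glottal cycles, which is why the result is only meaningful when those cycles are nearly identical.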

Considering  the  type  of  endoscopic  instrument,  two  categories  of  rigid  and  flexible 

endoscopes are available. The rigid endoscope provides images with better spatial resolution and 

visual quality, but at the same time it affects voice and speech production because transoral

insertion requires unnatural retraction of the tongue for adequate laryngeal exposure. Thus,

only limited types of stimuli can be elicited. On the other hand, flexible endoscopy does not

interfere with the articulators, and speech can be produced with minimal interference. Therefore, it

could be more ecologically valid. Additionally, there are fewer restrictions on the type of stimuli

that could be produced. Thus, it could be used for analyzing the vibratory pattern of the

vocal folds during connected speech [116]. Finally, flexible endoscopes provide the possibility of

simultaneous recording of aerodynamic measurements [132–134].

Table 4.1 summarizes the taxonomy of different systems with laser projection capabilities in 

the literature. Recently, we developed a new flexible, fiberoptic endoscope with laser-projection 

capabilities [195]. The new system uses a flexible endoscope for accessing the superior view of the

 

larynx,  which  allows  eliciting  a  wide  range  of  stimuli,  while  at  the  same  time  the  optical 

characteristics of the laser projection system were designed to be compatible with HSV systems 

and  provide  good  visual  contrast  between  laser  points  and  the  background.  The  system  was 

designed  so  that  absolute  measurements  in  both  horizontal  and  vertical  planes  are  possible. 

Combining these characteristics, the new system could provide 3D information regarding the vocal 

fold vibratory pattern and the laryngeal configuration during laryngeal maneuvers, phonation, and 

connected speech.  

Table 4.1. Literature-based taxonomy of different imaging systems with laser projection. These abbreviations were used in the table: VSB (videostroboscopy), HSV (high-speed videoendoscopy), 3D (three-dimensional reconstruction), nm (nanometer), mW (milliwatt).

Year | Ref. | Laser pattern | Projection technique | Imaging | Endoscope | Functionality | Other notes
1997 | [192] | 2-point | parallel beams | VSB | 90°, rigid | horizontal | red laser, 670 nm, 3 mW power per laser point
2001 | [193] | 1-point | triangulation | VSB | 70°, rigid | vertical | red laser, 643 nm
2002 | [25] | 2-point | parallel beams | HSV | 90°, rigid | horizontal | red laser, 633 nm
2004 | [233] | 1-point | triangulation | HSV | 70°, rigid | vertical | 1 mW power
2004 | [190] | 2-point | parallel beams | VSB | 70°, rigid | horizontal | green laser, 1 mW power at source
2006 | [191] | 23-point | structured light + triangulation | VSB | flexible | horizontal | green laser, 150 mW power at source
2008 | [234] | 1-line | triangulation | HSV | 90°, rigid | vertical + horizontal along a single line | red laser, 653 nm, irradiance of 1800 W/m²
2008 | [235] | 2-line | parallel lines | HSV | 90°, rigid | vertical + horizontal along two lines | red laser, 635 nm, 22 mW power
2010 | [194] | 196-point | structured light + triangulation | HSV | 70°, rigid | vertical + horizontal + 3D | -
2013 | [153] | 21-line | parallel lines | HSV | 70°, rigid | horizontal | green laser, 300 mW power at source, irradiance of 1100 W/m² at working distance of 30 mm
2016 | [236] | 324-point | structured light + triangulation | HSV | 70°, rigid | vertical + horizontal + 3D | 532 nm, 150 mW power at source, 80 mW at the tip of the endoscope, irradiance of 1000 W/m² at working distance of 60 mm
2019 | [195] | 49-point | structured light + triangulation | HSV | flexible | vertical + horizontal + 3D | green laser, 520 nm, 55 mW at source, 20 mW at the tip of the endoscope, irradiance of 372 W/m² at working distance of 20 mm

 

To achieve the above-mentioned measurement goals, the laser-projection endoscope should 

be calibrated first. This chapter presents the methodology for vertical calibration and subsequent 

measurements. 

4.2. Aim and hypothesis 

The  main  aim  of  this  chapter  is  to  develop  the  methodology  of  vertical  calibration  and 

subsequent vertical measurement for a laser-projection transnasal fiberoptic HSV system. The 

main research question of this chapter is: 

Q3: 

How could we use a structured laser projection system for measuring the vertical 

distance between the distal tip of a flexible endoscope and the target surface? 

To answer this research question and to pursue the aim of this chapter, two hypotheses were 

formed that are presented in this section.  

Referring to figure 4.1(A) horizontal and vertical displacements of each laser point could be 

related using trigonometric rules. Equation 4-1 shows this. 

$$\Delta = h \tan(\theta) \qquad \text{(4-1)}$$

At the same time, the horizontal displacement component (Δ) and the pixel displacement (δ) are

related through the magnification factor of the camera (m).

$$\delta = m \Delta \qquad \text{(4-2)}$$

Combining Equations 4-1 and 4-2 we would have,

$$h = \frac{\delta}{m \tan(\theta)} \qquad \text{(4-3)}$$

Based on Equation 4-3 the mm vertical displacement (h) is a function of the pixel displacement 

(δ),  magnification  of  the  camera  (m),  and  the  angle  difference  between  imaging  and  laser 

projection axes (θ). Additionally, magnification of the camera depends on the focal length of its 

 

lens (f) and the vertical distance between the object and the focal point of the lens (x) [238]. Equation

4-4 shows this,

$$m = \frac{f}{x} \qquad \text{(4-4)}$$

Based on the presented Equations, it is hypothesized that, 

H3a: 

The position of each laser point will be a unique and deterministic function of the 

vertical  distance  between  the  distal  tip  of  the  flexible  endoscope  and  the  target 

surface, once the confounding factors are accounted for. 

Considering the hypothetical pyramid of the introduction, the resolution of the system decreases

as the working distance increases. This means that, for every mm increase in Δ, the pixel

displacement on the image (δ) would be a decreasing function of the working distance. Therefore,

it is hypothesized that  

H3b: 

Vertical measurement error will be positively correlated to working distance. 
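The relations behind H3a and H3b can be checked numerically with a short Python sketch. It assumes the forms Δ = h·tan(θ), δ = m·Δ, h = δ/(m·tan(θ)), and m = f/x for Equations 4-1 to 4-4; the angle of 10 degrees, the magnification of 8, and the focal length of 50 are arbitrary illustrative values, and m is treated as a lumped factor that also absorbs the conversion to pixels.

```python
import math


def horizontal_shift_mm(h_mm, theta_rad):
    """Equation 4-1: lateral shift of the laser point for a vertical shift h."""
    return h_mm * math.tan(theta_rad)


def pixel_shift(delta_mm, m):
    """Equation 4-2: shift on the image for a lateral shift of delta_mm."""
    return m * delta_mm


def vertical_shift_mm(delta_px, m, theta_rad):
    """Equation 4-3: vertical shift recovered from the shift on the image."""
    return delta_px / (m * math.tan(theta_rad))


def magnification(f, x):
    """Equation 4-4: magnification for focal length f and object distance x."""
    return f / x


def image_shift_per_mm_of_height(f, x, theta_rad):
    """Image shift produced by 1 mm of vertical displacement at distance x."""
    return magnification(f, x) * math.tan(theta_rad)


theta = math.radians(10.0)  # assumed angle between laser and imaging axes
m = 8.0                     # assumed magnification factor

# Forward model followed by its inverse recovers the vertical displacement.
delta_px = pixel_shift(horizontal_shift_mm(2.0, theta), m)
recovered_h = vertical_shift_mm(delta_px, m, theta)

# Sensitivity drops as the object distance grows (the reasoning behind H3b).
near = image_shift_per_mm_of_height(50.0, 10.0, theta)
far = image_shift_per_mm_of_height(50.0, 30.0, theta)
```

Under these assumptions the same 1 mm of vertical displacement produces a three times smaller image shift at three times the distance, so a fixed localization error on the image translates into a larger vertical error.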

4.3. Material and method 

4.3.1. Laser-projection endoscope 

A  surgical  flexible  endoscope,  Fiber  Naso  Pharyngo  Laryngoscope  Model  FNL‑15RP3 

(PENTAX  Medical,  Montvale,  NJ),  with  three  channels  (surgical,  imaging,  and  light-delivery 

channels)  was  used  for  developing  the  laser-projection  endoscope  with  absolute  measurement 

capabilities [195]. The surgical channel is used for delivering green laser light with a wavelength of

520 nm to the distal tip of the endoscope, where a diffraction-based system splits it into a mesh-

pattern of 7×7 laser points. The size of the laser pattern is 16×16 mm at a working distance of 

20 mm.  The  imaging  channel  of  the  endoscope  allows  for  coupling  the  endoscope  with  a 

color/monochrome high-speed digital camera and recording of the superior view of the larynx with 

 

the projected laser pattern at distances ranging from 5 mm to 35 mm. The third channel utilizes a

fiberoptic light-delivery system that can be coupled with a xenon light source with power up to 

300 W. Figure 4.2 depicts the calibrated endoscope with its main components. 

Figure 4.2. The calibrated flexible endoscope with an insertion tube diameter of 4.9 mm and its main components. 

 

4.3.2. Calibration protocol and recordings 

To achieve the absolute measurements in the vertical plane the endoscope should be calibrated 

first. More specifically, the position of the laser points in the FOV is a non-linear function of the 

lens-coupler parameters and the working distance. Calibration is the process that accounts for these 

factors  and  finds  the  mathematical  function  for  decoding  the  desired  measurements  from  the 

positions of the laser points. To find that function, a data-driven approach based on statistical 

pattern recognition and statistical learning techniques was adopted.
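The idea of decoding a distance from a laser-point position can be illustrated with a simple lookup. The Python sketch below is not the statistical learning model of this chapter; it swaps in piecewise-linear interpolation over a one-dimensional calibration table, and the table values and function name are hypothetical.

```python
from bisect import bisect_left


def distance_from_position(position_px, calib_positions_px, calib_distances_mm):
    """Piecewise-linear lookup; calib_positions_px must be strictly increasing."""
    if not calib_positions_px[0] <= position_px <= calib_positions_px[-1]:
        raise ValueError("position outside the calibrated range")
    i = bisect_left(calib_positions_px, position_px)
    if calib_positions_px[i] == position_px:
        return calib_distances_mm[i]
    x0, x1 = calib_positions_px[i - 1], calib_positions_px[i]
    y0, y1 = calib_distances_mm[i - 1], calib_distances_mm[i]
    return y0 + (y1 - y0) * (position_px - x0) / (x1 - x0)


# Hypothetical calibration table for a single laser point: normalized position
# along its trajectory (pixels) recorded at known working distances (mm).
positions_px = [100.0, 120.0, 135.0]
distances_mm = [5.0, 6.0, 7.0]

estimate_mm = distance_from_position(110.0, positions_px, distances_mm)
```

The unequal spacing of the table positions mimics the non-linear relationship between position and working distance noted above.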

The setup presented in chapter 1 (figure 1.2) with one degree of freedom was used in this 

chapter. Specifically, the tilting angle was kept fixed and at zero angle (i.e. perpendicular imaging 

angle) and only the working distance was varied. The laser-projection endoscope was connected 

to a high-speed monochrome camera Phantom v7.1 (Vision Research Inc., Wayne, NJ) using a 45-

 

mm lens coupler and a 300-Watt xenon light source. Considering that calibrations are typically 

done under a controlled environment and the best possible settings, a monochrome camera was 

used for this phase. Monochrome cameras have higher sensitivity compared to their color versions

and they do not use the Bayer-decomposition filters [239]. These characteristics result in a sharper

image with better-defined edges. It is noteworthy that using the monochrome camera does not 

impose restrictions on the application of the system, and the calibrated endoscope can be used with 

a color camera after calibration. The camera and the endoscope were mounted on a vertical plane 

perpendicular to the target surface and FOV was recorded at the speed of 7000 frames per second 

with a spatial resolution of 288×280 pixels. The target surface was attached to an adjustable arm 

that allowed the working distance to the distal end of the endoscope to be regulated with high precision.

The working distance was varied from 5 mm to 35 mm using a 1-mm step and it was measured 

using a digital height gauge with an accuracy of 0.001’’ (0.03 mm). Figure 4.3 presents a diagram 

of the recording conditions. 

Figure 4.3. A diagram of the recording conditions. 

 

 

Accurate measurement of working distance depended on accurate leveling of the arm of the 

gauge with the distal end of the endoscope (figure 4.4(A)), which had to be determined visually

and was therefore time-consuming and subject to variability. Therefore, in the setup, a fixture was

 

placed  about  2 cm  above  the  distal  end  of  the  endoscope,  and  the  following  procedure  for 

measuring the distance between the tip of the endoscope and the top surface of the fixture was 

implemented. The measurement arm of the gauge was positioned on the top surface of the fixture 

and the height was recorded (figure 4.4(B)). Then, the measurement arm was positioned parallel 

to the tip of the endoscope and the height was recorded again. To check the leveling of the two 

surfaces, a 13-megapixel smartphone camera was positioned on the same vertical level as the tip 

of the endoscope and the digital magnification feature of the camera was used to fine-tune the 

position of the adjustable arm (figure 4.4(A)). These steps were repeated ten times and then the 

results were averaged. The average distance to the fixture was 45.01±0.03 mm and the average 

distance to the tip of the endoscope was 21.85±0.11 mm. The measurement of the distance to the 

fixture shows a lower value of standard deviation, supporting better accuracy of measurement 

when the fixture is used as the reference point. From these measurements, the distance between 

the tip of the endoscope and the top surface of the fixture was estimated to be 23.16 mm. 

Figure 4.4. Calibration setup: (A) measuring the distance to the tip of the endoscope, (B) measuring the distance to 

the fixture. 
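The offset quoted above follows directly from the two averaged gauge readings. As a worked check, with the symbol names chosen here only for illustration:

```latex
d_{\text{offset}} = \bar{d}_{\text{fixture}} - \bar{d}_{\text{tip}}
                  = 45.01\ \text{mm} - 21.85\ \text{mm}
                  = 23.16\ \text{mm}
```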

 

Two different recordings were made at each working distance. In the first recording, a white 

piece of paper was used, the xenon light was turned off, and the laser projection system was turned 

 


on with maximum power. In the second recording, a multi-resolution grid paper (1-mm, 2-mm, and 

10-mm boxes) was used, the laser projection system was turned off, and the xenon light was turned 

on. Throughout this chapter, these two recordings will be referred to as laser recordings and grid 

recordings, respectively. Each of these two sets of recordings serves a different purpose in the 

calibration procedure. The laser recordings are used for finding the accurate position of laser points 

in the FOV, whereas the grid recordings are used for estimating the parameters of recordings. The 

grid recordings are also necessary for the horizontal calibration of the system, which is the topic 

of the next two chapters. It is noteworthy that these two recording conditions are only used to 

remove confounding factors from different calibration processes and to maximize the accuracy, 

but they don’t impose any restrictions on the application of the system, and they don’t need to be 

replicated during clinical data collection. Finally, since the intensity of pixels increases at shorter 

working distances leading to possible saturation of the image, the exposure time of the camera and 

the power of the light source were adjusted at each step to prevent image saturation. 

4.3.3. Measuring vertical distance 

The position of the laser points in the captured image is a deterministic function of the vertical

distance between the distal tip of the endoscope and the target surface, and of the lens-coupler

parameters. This section presents the automatic approach for compensating for the effect of the

lens-coupler parameters and for decoding the vertical distances from the positions of the laser points.

4.3.3.1. Compensating for the lens-coupler parameters 

Some of the lens-coupler parameters change the position of the laser points in the FOV even 

if the working distance is kept constant. Those parameters include the focal distance of the lens 

coupler connecting the endoscope to the camera and the position and angle of the endoscopic 

 


eyepiece relative to the lens coupler. To decode the vertical displacements, first, these parameters 

should be estimated from the recordings and then compensated for. After that, the positions of the 

laser  points  become  only  a  function  of  the  vertical  distance  and  could  be  used  for  the 

measurements.  The  effects  of  different  lens-coupler  parameters  and  the  corresponding 

compensation approaches are presented as follows. 

4.3.3.1.1. Recording model 

The focal distance of the lens coupler determines the magnification of the camera. Using 

higher magnification results in an image where everything is larger. Therefore, the number of 

pixels between certain laser points (equivalently x-y coordinates of the laser points in the image) 

would depend on the magnification of the camera. The second variability comes from the rotation 

of the endoscopic eyepiece inside the lens coupler attached to the camera. Because the camera is 

fixed, the recording frame would remain constant, but the FOV with everything inside of it would 

undergo a rotation transformation. Therefore, when the endoscope gets rotated, the projected laser 

pattern would also get rotated. This means that the x-y coordinates of the laser points in the image 

would depend on the endoscope rotation. The last variability stems from the displacement of the 

eyepiece within the lens coupler. More specifically, the position of the eyepiece inside the lens 

coupler is not fixed and it can move in the horizontal plane. When the eyepiece is displaced, the 

whole FOV is displaced within the image frame. Consequently, the x-y coordinates of the laser 

points in the image would depend on the position of the eyepiece within the lens coupler.

To account for variations due to these lens-coupler parameters, first, we need to have a model 

that describes the effect of each parameter on the recorded images. The model that was used for 

this purpose consists of three main transformations of scaling (effect of magnification), rotation 

(effect of eyepiece rotation), and translation (eyepiece displacements). This model aims to map the 

 


recordings  with  variable  parameters  into  a  fixed  and  standard  coordinate  system  where  x-y 

coordinates of the laser points are independent of those lens-coupler parameters.  

Let I(x,y) and i(x,y) denote the original image and a pixel from it, and let J(x´,y´) and j(x´,y´)

denote the mapping of that image into the standard coordinate system and the corresponding pixel

in the new image. Also, let T, R, I2, and k denote a translation vector, a rotation matrix, an identity

matrix with the size of two, and a scaling factor, respectively. Equation 4-5 shows the model.

\begin{bmatrix} x' \\ y' \end{bmatrix} = k \cdot I_2 \cdot R \cdot \left( \begin{bmatrix} x \\ y \end{bmatrix} + T \right)    (4-5)

Now, if values of T, R, and k are determined, the mapping can be carried out. Considering the aim

of the mapping, a few considerations should be taken into account when determining these parameters.

First, the parameters should be determined so that the new image (J(x´,y´)) is invariant to the

lens-coupler parameters. Second, the estimation of those parameters should be computationally

efficient. Third, the estimated parameters should be relatively robust to different sources of noise.

The effect of the eyepiece displacements manifests itself as the position change of FOV in the 

image frame. Also, the effect of the focal length of the lens coupler is manifested through the FOV 

size.  Therefore,  both  magnification  and  eyepiece  displacement  can  be  compensated  by  the 

parametrization of the FOV. Both visual inspection and objective assessment confirmed that FOV 

can be approximated by a circle. Fortunately, very efficient algorithms have been developed for the

parametrization of circular objects.240,241 Additionally, circles have very well-defined and smooth 

topological shapes, which makes estimation of their parameters robust to noise. Therefore, the 

translation  transformation  (T)  was  defined  so  that  the  center  of  the  new  coordinate  system 

coincides  with  the  center  of  FOV.  Also,  the  radius  of  FOV  was  used  to  account  for  the 

magnification effect making the size of pixels constant. Flexible endoscopes have a fixed fiducial 

marker on their distal end that remains fixed relative to FOV and the target surface (figure 4.5). 

 


This fiducial marker helps with determining the orientation during flexible endoscopy. The laser 

projection optics are glued inside the surgical channel and their position relative to the fiducial 

marker is fixed; therefore, the position of the fiducial marker can be used as a reference for

compensating the effect of rotation of the endoscope within the lens coupler. Figure 4.5

summarizes the model. Based on this model, the recorded image (I(x,y)) undergoes a series of

transformations including a translation, a rotation, and a scaling, and gets converted into a new

image (J(x´,y´)) in a standard coordinate system. In the translation phase, the center of the coordinate

system is shifted to the center of the FOV. The rotation transformation brings the fiducial marker to a

predetermined position (e.g., 0 degrees in figure 4.5). Finally, the scaling transformation stretches

or shrinks the FOV so that its radius becomes equal to a predetermined value r.

Figure 4.5. Model for compensating the recording parameters of the system. 
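The three transformations described above can be illustrated for a single image point. The following is a minimal sketch, not the implementation used in this work: the function name `to_standard_coords` and the default values of `target_radius` and `target_angle` are hypothetical, the FOV center, FOV radius, and fiducial angle are assumed to have been estimated already, and coordinates are treated as ordinary mathematical x-y coordinates.

```python
import math

def to_standard_coords(x, y, fov_center, fov_radius, fiducial_angle,
                       target_radius=100.0, target_angle=0.0):
    """Map an image point (x, y) into the standard coordinate system.

    target_radius and target_angle are illustrative choices, not values
    taken from the source.
    """
    cx, cy = fov_center
    # Translation: move the origin to the center of the FOV.
    tx, ty = x - cx, y - cy
    # Rotation: bring the fiducial marker to the predetermined angle.
    phi = target_angle - fiducial_angle
    rx = tx * math.cos(phi) - ty * math.sin(phi)
    ry = tx * math.sin(phi) + ty * math.cos(phi)
    # Scaling: make the FOV radius equal to the predetermined radius.
    k = target_radius / fov_radius
    return k * rx, k * ry
```

Under these assumptions, the same physical point recorded with two different eyepiece positions, rotations, and magnifications should land on the same standard coordinates, which is the invariance the model aims for.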

 

4.3.3.1.2. Automatic estimation of the mapping 

Based on Equation 4-5, the mapping consists of three main transformations. As shown in 

Figure 4.5 the parameters of those transformations can be estimated based on two components of 

the  image.  That  is,  the  center  of  FOV  and  its  radius  are  used  for  estimating  parameters  of 

translation and scaling transformations, and the angle (θ) between the line connecting the center 

of FOV to the fiducial marker and the horizontal axis determines the parameter of the rotation 

 


transformation. This section presents the algorithms and image processing techniques that were 

used for finding these two important landmarks. 

First, the detection of FOV is investigated. The lighting channels of the endoscope provide 

illumination for the FOV, which is then trimmed by the field of view of the endoscope, leaving 

the  pixels  outside  of  the  endoscopic  circle  quite  dark.  Therefore,  it  is  possible  to  apply  a 

thresholding technique and find a rough estimation of FOV. However, any error in estimating the 

center and radius of FOV would change the position of the laser points in the standard coordinate 

system, introducing an error in estimating the vertical distances. To find a more robust approach, 

the FOV finder module requires an additional source of information. Assuming the noise and 

distortions  have  linear  effects,  the  geometrical  shape  of  FOV  would  remain  intact.  Therefore, 

combining the geometrical information with the illumination differences inside and outside of 

FOV could help devise a robust method. 

FOV has a circular shape and therefore the pixels on its boundary can be expressed using a

precise mathematical equation. Let $C = [x_c \; y_c]^T$ denote the center of a circle with radius r. Equation 4-6

shows the locus of points on the perimeter of that circle.

\{ (x, y) \in \mathbb{R}^2 : (x - x_c)^2 + (y - y_c)^2 = r^2 \}    (4-6)

The Hough transform is a very popular approach in the computer vision community, which initially 

was  developed  for  the  detection  of  lines  and  other  analytically  defined  shapes  (e.g.  circles, 

ellipses)242,  but  later  on,  it  was  extended  to  other  shapes.243  Considering  that  FOV  has  an 

analytically  well-defined  shape,  Hough  transform  can  be  used  for  capturing  the  geometrical 

information of FOV. In summary, the FOV finder module consists of two steps. In the first step, a 

thresholding  technique  is  applied  to  the  grayscale  image.  This  step  uses  information  from 

differences between the intensity of pixels inside and outside of FOV and converts the grayscale 

 


image into a binary image. In the second step, the binary image is fed into the Hough transform 

algorithm, which finds the center and radius of the circle that best fits the binary image.
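The two steps of the FOV finder can be sketched on a small image. This is only an illustration under stated assumptions: it is a brute-force, pure-Python circular Hough accumulator rather than an optimized library routine, the fixed threshold `level` and the radius search range are hypothetical inputs, and voting is restricted to boundary pixels of the binary mask, which is a simplification chosen here. Because of pixel discretization, the returned radius can differ from the true one by about a pixel.

```python
import math
from collections import Counter

def threshold(gray, level):
    """Binarize: pixels brighter than `level` are treated as inside the FOV."""
    return [[1 if v > level else 0 for v in row] for row in gray]

def boundary_pixels(binary):
    """Foreground pixels that touch the background (4-neighborhood)."""
    h, w = len(binary), len(binary[0])
    edge = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not binary[ny][nx]:
                    edge.append((x, y))
                    break
    return edge

def hough_circle(binary, r_min, r_max):
    """Return the (cx, cy, r) accumulator cell with the most votes."""
    h, w = len(binary), len(binary[0])
    votes = Counter()
    for x, y in boundary_pixels(binary):
        # Each boundary pixel votes for every center it could belong to,
        # at the radius given by its rounded distance to that center.
        for cy in range(h):
            for cx in range(w):
                r = round(math.hypot(x - cx, y - cy))
                if r_min <= r <= r_max:
                    votes[(cx, cy, r)] += 1
    return max(votes, key=votes.get)
```

The exhaustive loop over candidate centers is only practical for small images; it is written this way to keep the voting logic visible.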

The second landmark in the image is the fiducial marker (figure 4.5). The position of this 

landmark  relative  to  the  center  of  the  FOV  and  horizontal  line  determines  the  rotation 

transformation parameter. To make its detection as accurate and robust as possible, two different 

sources  of  information  were  identified  and  combined.  First,  the  fiducial  marker  is  fabricated 

through a physical notch in the FOV. Therefore, it is the most likely region outside the FOV to be 

bright, and hence the intensity of the pixels within the fiducial

marker would differ from that of other regions outside of FOV. Second, the fiducial marker is attached to the

exterior of FOV, and therefore there is no need to check all pixels outside of FOV. Using this 

spatial information would remove some incorrect candidates and improve the performance of the 

fiducial finder module. 

The fiducial finder module has two main steps. First, an annular (ring-shaped) mask centered at the center of

FOV with an inner radius of r+1 and an outer radius of r+8 was applied to the image. This step

incorporates  the  spatial  information  into  the  method.  Next,  a  threshold  was  applied  to  the 

remaining pixels. Due to the imperfect circular shape of FOV and the leakage of light to the outside 

of FOV, a very thin arc could be present at this step. To remove those artifacts, the binary image 

at  this  stage  underwent  a  morphological  opening  operation  with  a  disk-shaped  structuring 

element.244 The final step is to quantify the position of the detected fiducial marker. It is known 

that the centroid of an object is relatively insensitive to noise and therefore is a robust estimation 

of the location of that object inside the image. Therefore, the centroid of the biggest element was 

computed as the location of the fiducial marker. Let B(x, y) denote a binary image with a size of

m×n pixels. Equation 4-7 shows how its centroid can be computed.

\bar{x} = \frac{1}{|A|} \sum_{x=1}^{m} \sum_{y=1}^{n} x \cdot B(x, y), \qquad \bar{y} = \frac{1}{|A|} \sum_{x=1}^{m} \sum_{y=1}^{n} y \cdot B(x, y)    (4-7)

|A| = \sum_{x=1}^{m} \sum_{y=1}^{n} B(x, y)    (4-8)

Where |A| denotes the area of image B and it can be computed from Equation 4-8.

After finding the centroid of the fiducial marker, the rotation angle is computed using Equation 4-9,

where the centroid coordinates are taken relative to the center of FOV.

\theta = \tan^{-1}\left( \frac{\bar{y}}{\bar{x}} \right)    (4-9)
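A simplified sketch of the fiducial finder is given below. It applies the ring mask and a threshold, then computes the centroid of the surviving pixels and its angle about the FOV center. It is an illustration only: the function name `fiducial_angle` and the fixed threshold `level` are hypothetical, the morphological opening and the selection of the biggest element are omitted for brevity, and `atan2` is used in place of a plain arctangent so that the quadrant is preserved.

```python
import math

def fiducial_angle(gray, fov_center, fov_radius, level):
    """Angle (radians) of the bright fiducial marker about the FOV center."""
    cx, cy = fov_center
    sum_x = sum_y = area = 0
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            d = math.hypot(x - cx, y - cy)
            # Ring mask: keep only pixels just outside the FOV.
            in_ring = fov_radius + 1 <= d <= fov_radius + 8
            if in_ring and value > level:
                sum_x += x
                sum_y += y
                area += 1
    if area == 0:
        raise ValueError("no fiducial candidate found in the ring")
    # Centroid of the binary mask: first moments divided by the area.
    x_bar, y_bar = sum_x / area, sum_y / area
    # Angle of the line from the FOV center to the centroid.
    return math.atan2(y_bar - cy, x_bar - cx)
```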

4.3.3.2. Algorithm for distance estimation 

After mapping a frame into the standard coordinate system, the position of each laser point 

on the new image (J(x, y)) depends only on the vertical distance between the distal tip of the 

endoscope and the target surface. This section presents details of the algorithm for the automatic 

detection of laser points and the decoding of vertical distances from those positions. 

4.3.3.2.1. Automatic detection of laser points 

The  accuracy  of  the  laser  point  detection  module  would  have  a  significant  effect  on  the 

accuracy of vertical distance estimation. Any error in the detection of the laser points, or in the 

quantification of their positions, would translate into vertical distance inaccuracies. To devise a 

robust detection algorithm and an accurate calibration method, the characteristics of the projected 

laser points should be known. Figure 4.6 shows a frame from one of the laser recordings. As

shown, the energy of the laser source is not uniformly divided between the laser points, where the 

points in the middle are significantly brighter than the points in the periphery. Additionally, as 

 


shown in figure 4.6(A, C), sums of the image on rows and columns indicate that the intensity of 

each  laser  point  has  a  bell-shaped  spatial  distribution,  with  the  highest  intensity  at  the  center 

followed by a fast decay toward the distal pixels. 

Figure 4.6. The intensity of the laser points: (A) sum of the intensity of pixels on the rows, (B) original image, (C) 

sum of the intensity of pixels on the columns. 

 

Different characteristics and sources of information were taken into account during the design 

of the laser detection module. First, the difference between the intensity of the laser points and the 

background was exploited through an adaptive thresholding approach. Considering that intensity 

of pixels is a function of working distance, an adaptive approach was necessary. For that

purpose, the histogram of the intensity of pixels was constructed with 200 bins and the first bin 

was considered as the black reference and was discarded. The cumulative distribution function 

(CDF) of the logarithms of the remaining bins was estimated, and the value corresponding to 0.4 

was selected as the intensity threshold. Second, referring to Figures 4.6(A) and (C), a very large 

magnitude  of  gradient  around  the  laser points  is  expected;  therefore,  an  adaptive  thresholding 

approach  was  used  for  exploiting  this  information,  as  well.  To  that  end,  the  histogram  of  the 

magnitude of the gradient of the image was constructed with 200 bins and the value of the sixth 

 


bin was used as the gradient threshold. The two thresholding values were applied to the image 

followed by a morphological opening operation with a disk-shaped structuring element. At this 

point,  every  laser  point  would  be  represented  by  a  blob  and  its  centroid  can  be  computed. 

Because the intensities of the laser points have a bell-shaped spatial distribution, the

intensity-weighted centroid was used instead of the plain centroid.

Third, the laser points should have circular shapes, but most of the time the extracted blobs

don’t have that characteristic; therefore, the weighted centroid would be affected by those artifacts. 

To remedy that and also to incorporate the morphological information of laser points, a disk with 

a radius of 7 pixels was constructed around every centroid, and then the final position of laser 

points was computed as a weighted centroid of the pixels within those disks. 
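The final refinement step, an intensity-weighted centroid over a disk of radius 7 pixels around an initial centroid, can be sketched as follows. This is an illustration under assumptions: the function name `weighted_centroid` is hypothetical, the image is a list of rows of grayscale values, and the adaptive thresholding and morphological opening that produce the initial `seed` are not shown.

```python
import math

def weighted_centroid(gray, seed, radius=7):
    """Intensity-weighted centroid of the pixels within `radius` of `seed`."""
    sx, sy = seed
    h, w = len(gray), len(gray[0])
    total = wx = wy = 0.0
    # Scan only the bounding box of the disk, clipped to the image.
    for y in range(max(0, int(sy - radius)), min(h, int(sy + radius) + 1)):
        for x in range(max(0, int(sx - radius)), min(w, int(sx + radius) + 1)):
            if math.hypot(x - sx, y - sy) <= radius:
                v = gray[y][x]
                total += v
                wx += v * x
                wy += v * y
    if total == 0:
        raise ValueError("no intensity inside the disk")
    return wx / total, wy / total
```

Because brighter pixels receive larger weights, the estimate is pulled toward the peak of the bell-shaped spot even when the initial blob is irregular and the seed is slightly off-center.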

4.3.3.2.2. Vertical distance decoding 

Figure 4.7 shows how the x-y coordinates of each laser point vary depending on the working 

distance. Based on this figure, each laser point travels along a unique and well-defined trajectory, 

and hence its position within that trajectory can be used for decoding the vertical distance from 

that point to the tip of the endoscope. Additionally, it is evident that each laser point has some 

idiosyncratic  characteristics.  As  seen  in  Figure  4.7(B),  the  behaviors  of  the  laser  points  are 

different, where some of the laser points travel along a line (almost) perpendicular to the x-axis, 

indicating very small variations in the x-coordinate of these points; while other points travel along 

non-linear trajectories and show significant variations in the x-coordinate. Interestingly, some of 

these points have deflection to the right and some of them have deflection to the left. Considering 

that each laser point has a slightly different projection angle, these variations are to be expected. 

It is desirable to have trajectories with variation only along one axis, but these observations show 

that such characteristics cannot be achieved perfectly. 

 


Figure 4.7. Position of each laser point as a function of working distance where each color shows a different laser 
point: (A) x-y coordinates as a function of working distance, (B) x-coordinate as a function of working distance, (C) 

y-coordinate as a function of working distance. 

To make the decoding process efficient and fast, the trajectory of each laser point was modeled

using a function. Let $d_i$ be a specific working distance and $P = [x \; y]^T$ denote the position of a certain

laser point corresponding to that working distance. This point can be converted into the polar

coordinate system using Equation 4-10.

r = \sqrt{x^2 + y^2}, \qquad \theta = \tan^{-1}\left( \frac{y}{x} \right)    (4-10)

Now, the goal is to find a family of parametrized functions $\mathcal{F}_{\beta}$ and their proper parameters $\beta$ such

that on average the estimated distances ($\hat{d}_i$) and the true distances ($d_i$) are near each other based

on a properly defined distance function ($D$). Equations 4-11 and 4-12 show these,

\hat{d}_i = \mathcal{F}_{\beta}(r_i)    (4-11)

\beta^{*} = \arg\min_{\beta} \frac{1}{N} \sum_{i=1}^{N} D(\hat{d}_i, d_i)    (4-12)

The parameter $\beta$ could be determined using optimization Equation 4-12 and a set of data points

(training phase). After that, the trained function could be used for decoding the working distances

of new data points. As shown in figure 4.7(B-C), different laser points follow relatively similar

semi-exponential trajectories; thus, the same family of curves ($\mathcal{F}_{\beta}$) was used for decoding

purposes. But, to capture the idiosyncratic characteristics of each trajectory, the training phase was

done separately for each laser point. Therefore, a total number of 49 different curves were trained

during this phase. Equation 4-13 shows the family of curves that was used.

\mathcal{F}_{\beta}(r) = \beta_1 \cdot e^{\beta_2 \cdot r} + \beta_3 \cdot e^{\beta_4 \cdot r}    (4-13)

Finally, most often normalization of data points improves the performance of machine learning

algorithms.73 Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of the radius of all laser points

from trajectory i. Equation 4-14 shows the employed normalization process. The normalized

values were then used for the training purpose.

r' = (r - \mu_i) / \sigma_i    (4-14)
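The pieces of the decoding pipeline can be sketched for one trajectory. Several choices below are assumptions made for illustration and are not confirmed by the source: the curve family is taken to be a two-term exponential (a guess consistent with the "semi-exponential" description), the distance function is taken to be the squared error, the standard deviation is the population form, and all function names are hypothetical. The optimizer itself is not shown; any nonlinear least-squares routine could be used to minimize the cost.

```python
import math

def to_polar(x, y):
    """Polar coordinates (r, theta) of a point in the standard system."""
    return math.hypot(x, y), math.atan2(y, x)

def normalize(radii):
    """Z-score normalization of one trajectory.

    Returns the normalized values, the mean, and the (population) std.
    """
    n = len(radii)
    mu = sum(radii) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in radii) / n)
    return [(r - mu) / sigma for r in radii], mu, sigma

def two_term_exponential(r, b1, b2, b3, b4):
    """Assumed curve family: b1*exp(b2*r) + b3*exp(b4*r)."""
    return b1 * math.exp(b2 * r) + b3 * math.exp(b4 * r)

def mean_squared_cost(params, radii, distances):
    """Average squared difference between decoded and true distances."""
    errors = [(two_term_exponential(r, *params) - d) ** 2
              for r, d in zip(radii, distances)]
    return sum(errors) / len(errors)
```

In this sketch one parameter set would be fitted per laser point, using that point's own mean and standard deviation for normalization, and the same mean and standard deviation would be reused when decoding new measurements.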

4.4. Experiments and results 

Three  experiments  were  conducted  to  answer  the  research  questions  of  this  chapter. 

Experiment 1  tests  the  performance  of  different  preprocessing  components  of  the  proposed 

method.  Experiment 2  presents  displacement  analysis  and  vertical  resolution  of  the  system. 

Experiment 3 investigates the performance of the proposed method for vertical measurements. 

This section presents details of each experiment, followed by results and related discussions. 

4.4.1. Experiment1: Evaluation of preprocessing components 

The performance of the proposed method relies on accurate estimation of parameters of the 

mapping and also accurate detection of the laser points. Experiment 1 was conducted to assess the

performance of these components. 

 


4.4.1.1. Experiment1a: Evaluation of FOV and the fiducial finder modules 

The performance of the methods for compensating the effect of lens-coupler parameters was 

evaluated.  Doing  that  requires  a  ground  truth  as  a  reference  for  comparison.  Additionally,  to 

measure  the  performance  of  each  module  separately,  the  standard  deviation  of  the  estimated 

parameters within a recording was used as the evaluation criterion. To that end, the videos from 

grid recordings were used. During those recordings, the configuration between the camera and 

endoscope was kept constant; therefore, the position of FOV, the radius of FOV, and the position 

of the fiducial marker should be the same for all of them. This observation was used for objective 

evaluation  of  the  implemented  algorithms.  For  that  purpose,  each  recording  was  divided  into 

batches with 200 frames. Then, frames within each batch were averaged and the result was fed 

into the algorithm for estimating the center of FOV, the radius of FOV, and the angle of the fiducial 

marker. Figure 4.8 shows the centralized distribution (the means were subtracted to make the plots 

more  comparable)  of  estimated  parameters  over  all  batches  and  recordings.  As  seen  in  these 

figures, the centralized probability density functions are concentrated around zero with very sharp 

peaks. This supports that the proposed FOV and fiducial finder modules are quite robust and have 

very stable performances. 

4.4.1.2. Experiment1b: Evaluation of the laser finder module 

The performance of the laser detecting module was evaluated using the videos from the laser 

recordings. To that end, each recording at a specific working distance was divided without any 

overlaps into 11 batches of 200 frames. Then, the frames within each batch were averaged to 

remove the effect of additive noise. The positions of the laser points for each batch were estimated 

using the presented algorithm. Because all batches were recorded at the same working distance, 

 


Figure 4.8. Distribution of the variability in the output of FOV and the fiducial finder modules: (A) distribution of 
the centralized coordinates of the FOV center, (B) distribution of the centralized radius of FOV, (C) distribution of 

the centralized fiducial angle. 

the estimated position should have no variation in the ideal case. Therefore, the standard deviation 

of  the  (x,y)  coordinates  of  each  laser  point  over  all  batches  could  be  used  to  evaluate  the 

performance of the algorithm. The same approach was repeated for all working distances. The 

distribution of this evaluation criterion had a mean of 0.012 pixels and a standard deviation of 0.0224 pixels.

Figure 4.9 shows the distribution of this evaluation metric. The figure shows that the probability

density function is concentrated around a small number near zero with a very sharp peak. This

supports that the employed approach for detection of the laser points is quite robust and has very 

stable performance. 

[Figure 4.9 vertical axis label: Probability density]

Figure 4.9. Distribution of the variability in the output of the laser finder module. 
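The repeatability criterion used in Experiment 1 can be sketched in two small helpers: one averages non-overlapping batches of frames, and the other computes the spread of a laser point's estimated position across batches. The names are hypothetical, each frame is represented as a flat list of pixel values for simplicity, and the population standard deviation is an assumption, since the source does not say which form was used.

```python
import statistics

def batch_means(frames, batch_size=200):
    """Average non-overlapping batches of frames (each frame a flat list)."""
    means = []
    for start in range(0, len(frames) - batch_size + 1, batch_size):
        batch = frames[start:start + batch_size]
        # zip(*batch) groups the values of the same pixel across the batch.
        means.append([sum(px) / batch_size for px in zip(*batch)])
    return means

def position_spread(positions):
    """Standard deviation of the x and y estimates of one laser point."""
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    return statistics.pstdev(xs), statistics.pstdev(ys)
```

In the ideal case every batch yields the same position, so both returned values would be zero.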

 

 

 


4.4.2. Experiment2: Displacement analysis and vertical resolution of the system 

The displacement of the laser points as the working distance varied was then analyzed. To

that  end,  the  positions  of  all  laser  points  were  computed  for  all  working  distances.  Then,  the 

magnitude of the displacement was plotted as a function of the variation in working distance. 

Figure 4.10(A) shows the magnitude of displacement when the working distance is changed from 

35  mm  to  another  target  distance.  This  figure  clearly  shows  a  semi-exponential  relationship 

between  the  working  distance  and  the  magnitude  of  displacement,  where,  at  large  working 

distances the displacement is small, but at small working distances the magnitude of displacement 

is much larger. To present this phenomenon better, the magnitude of displacement between two 

consecutive working distances was computed. In this fashion, the amount of decrement in the 

working distance is kept constant (around 1 mm), but the effect of different working distances can 

be studied. Figure 4.10(B) shows the result. Clearly, at large working distances (>20 mm) reducing 

the working distance by 1 mm leads to a small variation in the position of the laser points. On the 

other hand, as the working distance is reduced, a much larger variation in the position of the laser 

points for the same reduction in working distance is seen. Because the variation in the position of 

the laser points captures the vertical displacement of the target surface, these analyses show that 

the vertical resolution of the endoscope is a function of the working distance, where the vertical 

movements can be measured with higher resolution at shorter working distances. 
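The per-step displacement analysis can be sketched for a single laser point. The function name and the input layout are assumptions made for illustration: `track` holds one (working distance, x, y) entry per recording, ordered by working distance.

```python
import math

def displacement_magnitudes(track):
    """Magnitude of motion between consecutive working distances.

    `track` is a list of (working_distance_mm, x, y) tuples ordered by
    working distance. Returns (starting_distance, displacement_px) pairs.
    """
    steps = []
    for (d0, x0, y0), (_, x1, y1) in zip(track, track[1:]):
        steps.append((d0, math.hypot(x1 - x0, y1 - y0)))
    return steps
```

With 1-mm steps, a larger displacement for the same step indicates a finer vertical resolution at that working distance.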

Figure 4.10 indicates that different laser points at the same working distance exhibit different 

behaviors.  More  specifically,  some  laser  points  show  a  higher  magnitude  of  displacement 

indicating a higher sensitivity to variation in working distance. To find whether those points have 

certain  relationships  with  each  other  or  not,  another  analysis  was  carried  out.  The  average 

magnitude of displacement for a 1 mm decrement in different working distances (figure 4.10(B)) 

 


Figure 4.10. Displacement analysis of the laser points as the working distance is changing: (A) the magnitude of 
variation in the position of the laser points as the working distance is changing from 35 mm to a new distance, (B) 
the magnitude of variation in the position of the laser points for 1 mm decrement at different working distances. 

was computed separately for each laser point and then the result was plotted. Figure 4.11 presents 

the  employed  indexing  and  the  result.  The  result  from  figure  4.11(B)  is  significant  in  several 

regards. First, the figure has a specific pattern and it is not random. Therefore, the variability seen 

in  figure 4.11  does  not  stem  from  the  detection  algorithm,  but  it  is  rather  inherent  to  the 

characteristics of the system. Second, assuming that the square grid of 7×7 points is parallel to the 

x-y axes (figure 4.11 (A)), the points with the highest sensitivity to vertical displacement were the 

three middle rows and the first and last rows had the lowest sensitivity to vertical displacement. 

Therefore, the best reconstruction of vertical movements is achieved if the target region is covered 

with laser points from the three middle rows.  

Figure 4.11. The behavior of different laser points: (A) indexing used in this chapter, (B) the average magnitude of 
 

displacement of each laser point. 

 


 

4.4.3. Experiment3: Evaluation of vertical distance measurements 

This experiment was conducted to quantify the accuracy of estimated vertical measurements. 

The following hypothesis was formed for this experiment. 

H3b: 

Vertical measurement error will be positively correlated to working distance. 

The proposed method was evaluated using two different criteria. First, the goodness of fit of 

the functions during the training phase was analyzed. For that purpose, the values of root mean 

square error (RMSE) and adjusted r-squared were computed. Figure 4.12 shows these values for 

each individual function. 

Figure 4.12. Goodness of fit (RMSE and adjusted r-squared) of each trained function.

 
Figure 4.12 shows that the training error has peaks for RMSE and dips for adjusted r-squared 

for laser point indices {7, 14, 21, 28, 35, 42, 49}. These trajectories correspond to the top row of 

the projection pattern. Therefore, for these points, high values of error in the testing phase are 

expected.  That  is,  the  points  on  the  top  row  would  have  a  higher  vertical  measurement  error. 

Referring to the trajectories with the best performance, most of them were from middle rows of 

the projection pattern and hence those rows would have lower vertical measurement error. This 

last observation concurs with the results shown in figure 4.11.  

Next, the performance of the system in the testing scenario was analyzed. To that end, the 

target surface was positioned at fifteen different new working distances, and positions of the laser 

 


points were recorded. After finding the x-y coordinates of the laser points from the above-described 

approach, they were mapped into the polar coordinate system using Equation 4-10. The radius in 

the polar coordinate system was then fed as the input to all 49 trained functions, and each function 

returned an estimated vertical distance. Figure 4.13 shows boxplot of the error at each working 

distance during the testing phase. Considering that the top row of the laser projection pattern had 

quite different performance in the training phase (figure 4.12), two different scenarios are reported:

first, results from all trajectories were used for finding the vertical distance (A); second, the

functions corresponding to the top row in the projection pattern were excluded from the analysis

(B).

 

Figure 4.13. Boxplot of vertical measurement errors at different working distances: (A) results from all functions, 

(B) results when the functions from the top row are discarded. 

 
Referring to figure 4.13, some observations can be made. First, the estimation error at short 

working distances (<20 mm) is much lower than at large working distances. It is noteworthy that 

this observation agrees with the result and discussions of Experiment 2 and figure 4.10(B).

Considering the working distance of flexible endoscopy and the fact that these endoscopes could 

 


get  near  to  the  target  tissue,  this  characteristic  could  be  utilized  very  efficiently  during  the 

examination. As a rule of thumb, the proximity of the endoscope to the target tissue can be ensured 

by  filling  the  image  with  the  tissue  of  interest.  Second,  when  the  points  on  the  top  row  are 

discarded, the estimation error is reduced considerably. Finally, table 4.2 reports the measurement 

error for each working distance, comparing the mean of the estimation error for the whole pattern 

(averaged over all 49 functions) to the case when the top row is discarded (averaged over 42 

functions). This value is significant because if a flat and horizontal target surface can be assumed, 

averaging multiple measurements would remove a significant amount of error from the

measurements. It also provides the lower (upper) bound on error (accuracy) of the measurements 

from the device. Additionally, the mean percent error (mPE) defined as the average absolute value 

of error divided by the working distance, and the maximum percent error (MPE) defined as the 

maximum of the absolute value of the error divided by the working distance, are also computed 

and reported. 
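The three statistics defined above can be sketched in a few lines of Python. This is only an illustration: the function name, the sign convention (estimate minus true distance), and the sample estimates are assumptions and are not taken from the recordings of this chapter.

```python
from statistics import mean

def error_statistics(estimates_mm, working_distance_mm):
    """Return (mean error in mm, mPE in percent, MPE in percent).

    estimates_mm holds one vertical-distance estimate per decoding
    function for a single working distance (hypothetical input).
    """
    # Signed error of each function; sign convention is an assumption.
    errors = [est - working_distance_mm for est in estimates_mm]
    mean_error = mean(errors)
    # Absolute error of each function relative to the working distance.
    abs_percent = [abs(e) / working_distance_mm * 100 for e in errors]
    mean_percent_error = mean(abs_percent)   # mPE
    max_percent_error = max(abs_percent)     # MPE
    return mean_error, mean_percent_error, max_percent_error

# Hypothetical estimates from four functions at a true distance of 10 mm.
mean_error, mPE, MPE = error_statistics([9.8, 10.1, 10.3, 9.9], 10.0)
print(mean_error, mPE, MPE)
```

Because positive and negative errors cancel in the mean error but not in mPE, the mean error can be close to zero while mPE remains noticeably larger.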

Table 4.2. Statistics of the measurement error. Distances and mean errors are in mm, mPE and MPE are in percent, and the number in parentheses signifies the number of functions that were used in the measurements.

Distance  Mean (49)  mPE (49)  MPE (49)  Mean (42)  mPE (42)  MPE (42)
5.77      -0.43      10.4%     141.1%    0.04       1.7%      5.7%
7.95      0.17       3.4%      21.3%     0.02       1.7%      5.6%
9.54      0.19       3.1%      32.5%     0.03       1.5%      5.8%
11.87     0.21       2.6%      25.2%     0.06       1.2%      6.4%
13.38     -0.03      2.1%      25.9%     -0.09      1.3%      6.8%
15.39     0.76       4.9%      20.6%     0.57       3.7%      10.2%
16.92     0.38       2.8%      14.5%     0.33       2.2%      5.3%
18.41     -0.42      3.1%      16.9%     -0.15      1.8%      6.6%
20.3      -0.84      5.5%      30.8%     -0.34      3.3%      13.6%
21.69     -0.78      4%        14.8%     -0.6       3.2%      14.8%
23.16     -1.3       5.6%      20.7%     -0.99      4.3%      13.8%
25.34     -0.4       2.7%      8.8%      -0.43      2.2%      7.2%
27.06     0.51       3.3%      9.3%      0.66       3.2%      9.3%
28.69     1.13       4.1%      13.8%     0.9        3.3%      10.8%
30.14     1.53       5.1%      12.7%     1.41       4.7%      12%

 


 

 

To test hypothesis H3b, two correlation tests were used. Based on the results of table 4.2, the

top-row laser points were omitted from this analysis. The first test establishes the dependence of 

measurement  error  on  the  working  distance.  Specifically,  a  correlation  test  with  the  working 

distance as the independent variable and the average measurement error as the dependent variable 

was conducted. The second test establishes the dependence of the magnitude of measurement error 

on  the  working  distance.  Specifically,  a  correlation  test  with  the  working  distance  as  the 

independent variable and the average of absolute measurement error as the dependent variable was 

conducted. Table 4.3 reflects the results. Based on this analysis, the measurement error has a weak and non-significant correlation with the working distance. However, the magnitude of the error has a very strong, significant, and positive correlation with the working distance.
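The two correlation tests can be reproduced approximately from the per-distance values of table 4.2. The sketch below is an illustration under stated assumptions: it uses the Mean (42) and mPE (42) columns, it approximates the average absolute error as mPE multiplied by the working distance, and the helper pearson_r is a hypothetical name. Only the correlation coefficient is computed; the p-values are not.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Working distances (mm) and top-row-excluded statistics from table 4.2.
distance = [5.77, 7.95, 9.54, 11.87, 13.38, 15.39, 16.92, 18.41,
            20.3, 21.69, 23.16, 25.34, 27.06, 28.69, 30.14]
mean_error_42 = [0.04, 0.02, 0.03, 0.06, -0.09, 0.57, 0.33, -0.15,
                 -0.34, -0.6, -0.99, -0.43, 0.66, 0.9, 1.41]
mpe_42 = [1.7, 1.7, 1.5, 1.2, 1.3, 3.7, 2.2, 1.8,
          3.3, 3.2, 4.3, 2.2, 3.2, 3.3, 4.7]

# Assumption: average absolute error (mm) ~ mPE (%) * distance / 100.
abs_error_42 = [d * p / 100 for d, p in zip(distance, mpe_42)]

r_error = pearson_r(distance, mean_error_42)
r_magnitude = pearson_r(distance, abs_error_42)
print(round(r_error, 2), round(r_magnitude, 2))
```

With these inputs, the two coefficients should come out close to the values reported in table 4.3: a weak correlation for the signed error and a strong positive correlation for its magnitude.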

Table 4.3. Results of correlation test for vertical measurement errors. The symbol ε means p<0.00001. 

                      r      p
Error                 0.28   0.32
Magnitude of error    0.9    ε

 

4.5. Discussions 

Speech  and  voice  are  the  outcomes  of  intricate  collaborative  functions  between  different 

systems of the body. The pulmonary system provides the driving force for the voice and speech 

production system and its effect can be measured on a calibrated scale using air-flow and air-

pressure measurements, which are used for modeling the underlying mechanisms. On the output, 

the intensity of the acoustic signal can also be measured on a calibrated scale using sound pressure 

level. The methodology and the required instrumentation for performing these measurements have 

been available  to  researchers  for  a  long  time.13  One  of  the  remaining  pieces  for  developing  a 

comprehensive model for voice and speech production is performing the kinematic measurements 

on the vocal folds and their vibratory pattern on a calibrated scale. Having access to a device with 

 


absolute measurement capabilities along the horizontal and vertical planes would address this gap. 

Additionally,  personalized  medicine245  and  patient-specific  modeling246  are  topics  of  high 

importance  to  medicine,  because  they  allow  taking  into  account  the  differences  between 

individuals during diagnosis and treatment. In patient specific-modeling, such differences could 

be fed into computational models for improving the diagnosis and treatment of patients by making 

better predictions about the outcome of different therapeutic options and surgeries.246 Considering 

that most current patient-specific modeling approaches rely on the geometry of the tissues derived 

from 3D imaging techniques, instrumentation with absolute measurement and 3D reconstruction 

capabilities would be beneficial for developing patient-specific models for populations with voice 

disorders. Finally, imaging techniques with absolute measurement capabilities can significantly 

enhance evidence-based practice, an important clinical topic in all fields, including laryngology 

and speech-language pathology.30 More specifically, the ability to perform absolute measurements 

on tissues and to reconstruct the 3D vibratory patterns of the vocal folds would provide researchers 

and clinicians with means for measuring the size of lesions and performing quantitative analysis 

on the kinematics of the vocal folds. This information can be obtained before and after therapy,

and the comparison between the two would allow evaluating the efficacy of the therapy. Other 

important  clinical  applications  of  an  imaging  system  with  calibrated  measurement  capabilities 

include studying the developmental aspects of the laryngeal tissues and the resulting changes in 

vocal fold vibration153, and the more accurate grading of relevant laryngeal diseases.191 

This chapter provided a detailed analysis of the calibration characteristics and procedures, 

which is the first step into developing an accurate instrument allowing absolute measurements of 

the vocal fold vibratory kinematics. Achieving the above-mentioned goals depends on a software 

solution that performs several additional tasks. Considering the end-user perspective, the laser 

 


points  should  be  first  detected  and  tracked  on  in-vivo  recordings.  This  module  should  handle 

efficiently the non-uniform intensity of the laser points, the high-intensity reflection points in the 

recorded images, and the non-uniform reflections of the tissues. Further, a second module would 

take the estimated position of the laser points as an input and perform the required measurements 

and the reconstruction of the 3D envelope of the vibratory pattern of the vocal folds. Establishing 

the relationship between the position of the laser points and the target measurements is the pre-

requisite for this second module. This process is known as calibration, where the calibration along 

the vertical dimension was the focus of this chapter. To that end, an automatic modular solution 

was proposed for performing vertical calibration. The modular solution allows the system to be 

broken  into  different  components  and  has  several  important  advantages.  It  makes  objective 

analysis of each module possible, in that different sources of error can be distinguished and each 

of them can be quantified separately. Also, it provides flexibility in the design where each module 

may be replaced independently with a better solution in the future. Another feature of the proposed 

calibration method was its data-driven approach. Considering that each laser point has idiosyncratic characteristics, and that the manufacturing of each endoscope and the different endoscope brands introduce further differences, this approach adds significant flexibility to the system.

In that regard, the calibration system was designed based on a set of parameters (a translation 

vector, a rotation matrix, a scaling factor, and parameters of the decoding functions ℱ) where the 
parameters of ℱ are determined separately for each endoscope and the remaining parameters are 

computed per recording. Another distinctive feature of the data-driven approach is its robustness 

to measurement error. More specifically, all measurements have some inherent errors, and using 

statistical learning approaches can remove the random error component and hence, improve the 

 


performance  of  the  system.  This  feature  may  improve  by  increasing  the  number  of  training 

samples. 

4.6. Conclusion 

The ability to provide absolute calibrated measurements and to estimate the vertical vibratory 

pattern of the vocal folds would further advance the kinematic and aerodynamic modeling of voice 

production,  enabling  new  clinically  significant  research  approaches,  such  as  patient-specific 

modeling and studying laryngeal development. With these goals in mind, this chapter presented 

an automatic and modular approach for calibration of a newly developed transnasal fiberoptic 

endoscope with absolute horizontal and vertical measurement capabilities. This was achieved by 

mapping the recorded image into a standard and fixed coordinate system, where the position of

the laser points was independent of the lens-coupler parameters such as the magnification of the 

camera, the rotation of the endoscope relative to the camera, and the displacement of the endoscope 

within the lens coupler. Consequently, the position of the laser points in this new coordinate system 

is only a function of working distance. The analysis showed that each laser point travels along a 

unique and deterministic trajectory, making the efficient decoding of the vertical distance possible. 

The decoder was implemented based on statistical learning techniques, where a different function 

was trained per each trajectory. The trained function produces the estimated vertical distance upon 

a given input. Each module of the system was tested separately, and the results were satisfactory. 

The system was able to measure absolute vertical distance with the mean percent error varying 

from 1.7% to 4.7%, depending on the working distance.  

 

 


CHAPTER 5: NON-LINEAR IMAGE DISTORTIONS IN FLEXIBLE FIBEROPTIC ENDOSCOPES

Based on: 

Ghasemzadeh H., Deliyski D. D. Non-Linear Image Distortions in Flexible Fiberoptic Endoscopes 
and  their  Effects  on  Calibrated  Horizontal  Measurements  Using  High-Speed  Videoendoscopy. 
Journal of Voice. 2020 Sep 18:S0892-1997(20)30331-3. doi: 10.1016/j.jvoice.2020.08.029. Epub 
ahead of print. PMID: 32958427. 
 

 

Summary: Laryngeal images obtained via high-speed videoendoscopy are an invaluable source of 

information for the advancement of voice science because they can capture the true cycle-to-cycle 

vibratory characteristics of the vocal folds in addition to the transient behaviors of the phonatory 

mechanism, such as onset, offset, and breaks. This information is obtained through relating the 

spatial and temporal features from acquired images using objective measurements or subjective 

assessments. While these images are calibrated temporally, a great challenge is the lack of spatial 

calibration. Recently, a laser-projection system allowing for spatial calibration was developed. 

However, various sources of optical distortion cause the images to deviate from reality. The

main  purpose  of  this  chapter  was  to  evaluate  the  effect  of  the  fiberoptic  flexible  endoscope 

distortions on the calibration of images acquired by the laser-projection system. Specifically, it is 

shown that two sources of non-linear distortions could deviate captured images from reality. The 

first  distortion  stems  from  the  wide-angle  lens  used  in  flexible  endoscopes.  It  is  shown  that 

endoscopic images have a significantly higher spatial resolution in the center of the field of view 

(FOV) than in its periphery. The difference between the two could lead to as high as 26.4% error 

in calibrated horizontal measurements. The second distortion stems from variation in the imaging 

 


angle.  It  is  shown  that  the  disparity  between  spatial  resolution  in  the  center  and  periphery  of 

endoscopic  images  increases  as  the  imaging  angle  deviates  from  the  perpendicular  position. 

Furthermore, it is shown that when the imaging angle varies, the symmetry of the distortion is also 

affected significantly. The combined distortions could lead to calibrated horizontal measurement 

errors as high as 65.7%. The implications of the findings on objective measurements and subjective 

visual assessments are discussed. These findings can contribute to the refinement of the methods 

for clinical assessment of voice disorders. Considering that the studied phenomena are due to 

optical principles, the findings of this study, especially those related to the effects of the imaging 

angle, can provide further insights regarding other endoscopic instruments (e.g. distal-chip and 

rigid endoscopes) and procedures (e.g. gastroendoscopy and colonoscopy). 

 

5.1. Introduction 

Imaging  techniques  provide  a  direct  method  for  observation,  assessment,  and  precision 

measurement  of  characteristics  of  the  laryngeal  mechanisms.  Therefore,  they  are  important  in 

voice  research30,247  and  functional  assessment  of  voice  production.110,167,169  Regardless  of  the 

imaging modality (e.g. VSB, HSV, or videokymography) the acquired images can be evaluated 

using two main approaches of visual-perceptual assessments30 or image measurements. Visual-

perceptual  assessments  and  image  measurements  respectively  lead  to  subjective  and  objective 

evaluations of some features of the phonatory mechanism. Using a different taxonomy, features 

from the acquired images may belong to spatial, temporal, or spatial-temporal domains. Some 

examples of spatial features would be the size of a lesion206, glottal closure pattern95, and glottic 

angle.248 Some examples of spatial-temporal features would be velocity measures27,103, mucosal 

wave184,  glottal  area  waveform102,202,  and  kymogram.113,249  Objective  measurements  and 

 


subjective assessments based on spatial and spatial-temporal features rely on some implicit but 

important  assumptions.  Those  implicit  assumptions  may  vary  depending  on  the  purpose  of 

measurements or assessments. The notions of within- and between-subject size comparisons were 

defined in section 2.3.2, but they are briefly repeated here. A comparison of the size of a feature (e.g. lesion size) in the same person between two different imaging conditions (e.g. pre- and post-intervention) is called the within-subject size comparison. The implicit assumption of this scenario is that, for each subject, the measurements from the two recording conditions are on the same scale and hence can be compared with each other. More precisely, the implicit assumption is that the mm size of a pixel (i.e. pixel size) is the same in the two conditions for each subject. In contrast, between-subject size comparison is the scenario in which we want to compare the size of a feature (possibly from two recording conditions) among different subjects. The implicit assumption of this scenario is stricter. More precisely, not only should the mm size of pixels be the same in the two recording conditions for each subject, but the mm size of pixels should also be the same across subjects. Obviously, satisfying the between-subject size comparison assumption also satisfies the within-subject size comparison assumption; however, the converse does not necessarily hold.

Different approaches are possible to satisfy the between-subject size comparison assumption. 

Regardless of the employed approach, all methods are based on the same principle. Basically, 

pixels are building blocks of images. Therefore, if we know the mm size of pixels, all objects in 

the image could be mapped in mm scale which is a universal and standard basis. Intraoperative 

calibrated  images93,190  and  laser-calibrated  imaging  systems153,192,195,236  are  some  possible 

approaches  for  determining  the  mm  size  of  pixels.  In  the  intraoperative  calibration  method,  a 

surgical instrument is placed next to a target tissue and an image is recorded.93,190 Considering the 

known mm length of the surgical instrument, the mm size of pixels in the image could be estimated. 

 


On  the  other  hand,  laser-calibrated  systems  are  based  on  well-designed  laser  patterns  that  are 

projected on the laryngeal tissues. The laser patterns often have specific topological characteristics 

that help with determining the mm size of pixels in the acquired images.  
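The principle shared by both approaches (a reference of known mm length fixes the mm size of a pixel, which then converts any pixel length to mm) can be sketched as follows. The function names and the numbers are hypothetical and serve only as an illustration.

```python
def pixel_size_mm(reference_length_mm, reference_length_px):
    """mm size of one pixel, from a reference of known physical length."""
    return reference_length_mm / reference_length_px

def to_mm(length_px, pixel_size):
    """Convert a pixel length to mm with a single, constant pixel size."""
    return length_px * pixel_size

# Hypothetical reference: two laser points known to be 5 mm apart
# appear 40 pixels apart in the image.
size = pixel_size_mm(5.0, 40.0)

# A hypothetical lesion spanning 24 pixels in the same image.
lesion_mm = to_mm(24, size)
print(size, lesion_mm)  # 0.125 3.0
```

Note that to_mm applies one pixel size to the whole image, which is only justified when the pixel size does not depend on the spatial location of the measured object.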

Deriving  the  mm  size  of  pixels  based  on  intraoperative  calibrated  images  or  parallel  laser 

projection relies on an important condition. In these approaches, the mm size of a pixel is computed from a specific part of the image (in the intraoperative approach, the target tissue next to which the surgical instrument is placed; in the laser projection approach, the part of the image that falls between the two laser points), and the same value is then assumed to be valid for the other parts of the image too. Specifically, we assume that all pixels in the image have the same

mm sizes and therefore the conversion from pixel to mm can be achieved using a constant number 

(i.e. independent from the spatial location of the pixel). This assumption is critical for both within-

subject  and  between-subject  size  comparison  applications,  and  its  violation  could  lead  to 

significant  error  in  the  measurements.  To  put  this  argument  into  perspective  let us  consider  a 

hypothetical imaging system with a specific non-linear distortion where pixels in the right half of 

the image correspond to 1 mm, and pixels in the left half of the image correspond to 0.5 mm. 

Obviously, using a constant pixel size would lead to significant errors in between-subject size comparison applications as well as in within-subject size comparison applications (e.g. if the lesion sites in the pre- and post-intervention recordings fall on different halves of the image). Consequently, studying the

presence of image distortion is the prerequisite of reliable and accurate spatial measurements. 
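The hypothetical imaging system described above can be turned into a short numerical illustration. The image width of 200 pixels, the segment positions, and the function name are assumptions made only for this sketch.

```python
def true_length_mm(start_px, end_px, width_px=200):
    """True mm length of a horizontal segment in the hypothetical system:
    pixels in the left half are 0.5 mm wide, pixels in the right half 1 mm.
    """
    half = width_px // 2
    # Number of segment pixels that fall in each half of the image.
    left_px = max(0, min(end_px, half) - min(start_px, half))
    right_px = max(0, max(end_px, half) - max(start_px, half))
    return 0.5 * left_px + 1.0 * right_px

# Constant pixel size calibrated on the right half of the image.
constant_pixel_size = 1.0

# The same 20-pixel lesion, imaged once in each half (within-subject case).
post_px = (20, 40)     # lesion in the left half
pre_px = (120, 140)    # lesion in the right half

for start, end in (pre_px, post_px):
    estimated = (end - start) * constant_pixel_size
    actual = true_length_mm(start, end)
    print(f"estimated {estimated} mm, actual {actual} mm")
```

In this sketch, the constant pixel size reports 20 mm for both recordings, although the lesion in the left half is actually 10 mm long, which would hide a halving of the lesion size.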

A review of the literature showed that the effect of non-linear distortions has received little

attention in the field of voice. Hibi and colleagues investigated the effects of non-linear distortions 

in  flexible  endoscopes.204  They  showed  that  the  magnitude  of  distortion  increases  with  the 

deviation of the imaging axis from the perpendicular angle.204 Distortion as high as 20% was 

 


reported  for  a  30°  deviation  in  the  imaging  angle.  Considering  that  calibrated  horizontal 

measurements  were  not  possible  at  that  time,  that  work  was  geared  more  toward  practical 

recommendations for keeping the effect of distortions to a minimum. A different study, aimed at establishing the normative values of the glottic angle using flexible endoscopy, acknowledged the

significant  effect  of  non-linear  barrel  distortion  on  the  measurements.248  However,  the  study 

neither provided details on how the distortion was compensated for nor reported the magnitude of 

errors in presence of the non-linear barrel distortion. Finally, a very recent work investigated the 

effects of parameters of HSV recordings on the estimation of the phonatory parameters of synthetic 

vocal folds.205 This work suggested that the imaging angle was the most influential factor, where 

a 10° change in the imaging angle led to a 10% error in the estimation of the subglottal air pressure 

from the glottal area waveform.205 However, none of these works addressed calibrated horizontal measurements or the effects of barrel distortion and changes in the imaging angle on them.

5.2. Aim and hypothesis 

 

The main aim of this chapter is to investigate if non-linear distortions are present in flexible 

HSV endoscopy and if so, to quantify their impacts on subsequent horizontal measurements. The 

main research questions of this chapter are: 

Q4a: How much does the mm size of a pixel depend on its spatial location?

Q4b: How much does the imaging angle affect the mm size of a pixel?

To answer these research questions and to pursue the aim of this chapter two main hypotheses 

were formed that are presented in this section.  

H4b: Pixel size is significantly smaller in the center group than the periphery group.

H4c: Pixel size is significantly different between back, middle, and front groups when the target surface gets tilted.

The outcomes of this chapter will help us develop a more accurate and reliable method for 

horizontal calibration and measurements from the laser-calibrated endoscope (to be presented in 

the  next  chapter).  It  is  expected  for  the  derived  horizontal  measurement  to  improve  our 

understanding of the effect of individual differences on the function of the phonatory mechanism207 

and consequently advancement of personalized medicine in the field of laryngology and speech-

language pathology. Additionally, the outcomes of this chapter could help us to better understand 

possible confounding factors in subjective assessments and objective measurements from flexible 

endoscopy images. It is noteworthy that the application of the outcomes is not limited to horizontal 

measurements from laser-calibrated endoscopes. For example, the outcomes could be utilized to 

increase the accuracy of horizontal measurements from intraoperative images, as well as, any other 

calibration approach. Additionally, the outcomes of this research would shed light on possible 

confounding  factors  affecting  the  accuracy  and  reliability  of  objective  measurements  and 

subjective  evaluations  on  images  recorded  using  distal-chip  flexible  endoscopes  or  rigid 

endoscopes. However, the exact effects in distal-chip flexible endoscopes and rigid endoscopes 

are not the purpose of this chapter and need to be investigated in a separate study.  

5.3. Optical principles of image formation 

The formation of an image in a camera follows principles of optics. Snell’s law is one of the 

main principles that govern image formation in the presence of a lens.238 Based on Snell’s law, the 

path of a ray of light changes when it passes through the boundary between two different media. Specifically, let $n_1$ and $\theta_1$ denote the refractive index and the angle of incidence in the first medium. Also, let $n_2$ and $\theta_2$ denote the refractive index and the angle of refraction in the second medium. Figure 5.1(A) shows these symbols. Equation 5-1 states Snell’s law.

Figure 5.1. Optical principles of image formation: (A) parameters of Snell’s law, (B) image formation in the Gaussian optics model.

$$n_1 \sin\theta_1 = n_2 \sin\theta_2 \qquad (5\text{-}1)$$
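As a brief illustration of Snell’s law, the sketch below computes the refracted angle for a ray crossing a boundary and compares it with the small-angle approximation on which Gaussian optics relies. The function name and the refractive indices (1.0 and 1.5, typical textbook values for air and glass) are assumptions and are not parameters of the endoscope.

```python
import math

def refraction_angle_deg(n1, theta1_deg, n2):
    """Refracted angle (degrees) from Snell's law: n1*sin(t1) = n2*sin(t2).

    Returns None when no refracted ray exists (total internal reflection).
    """
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))

# Air-to-glass example with assumed indices 1.0 and 1.5.
for theta1 in (5.0, 30.0, 60.0):
    exact = refraction_angle_deg(1.0, theta1, 1.5)
    small_angle = 1.0 * theta1 / 1.5   # sin(x) ~ x approximation
    print(theta1, round(exact, 2), round(small_angle, 2))
```

The gap between the exact and the approximated angles grows with the angle of incidence, which is why the approximation becomes questionable for rays far from the optical axis.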

Snell’s law could be utilized to trace rays of light as they enter and exit the lens, and hence

properties  of  the  resulting  image  could  be  estimated.  However,  Snell’s  law  is  based  on 

trigonometric  functions  and  hence  involves  complex  computations.  One  solution  is  to  use 

approximations  to  Snell’s  law.  Specifically,  using  the  thin  lens  assumption  and  small-angle 

approximation we can derive a simplified model known as the Gaussian optics238 which is very 

easy to use. The small-angle approximation stipulates that the height (length in laryngeal images) 

of the object relative to its distance from the lens is small. More precisely, in Gaussian optics the 

object should be near to the optical axis of the imaging system, otherwise, a significant error will 

be introduced into the computation. Based on Gaussian optics, the properties of the image can be 

expressed in terms of simple measurements. Let $d_o$ and $d_i$ be the distances from the lens to the object and its image, respectively (figure 5.1(B)). Also, let $f$ denote the focal distance of the lens, and let $h_o$ and $h_i$ be the actual size of the object (i.e. mm length) and its image size (i.e. pixel length), respectively. Equations 5-2 and 5-3 present the relationship between these variables under Gaussian optics.238 In Equation 5-3, $m$ denotes the magnification of the imaging system, and the negative sign is due to the inversion of the image.

$$\frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f} \qquad (5\text{-}2)$$

$$m = \frac{h_i}{h_o} = -\frac{d_i}{d_o} \qquad (5\text{-}3)$$

Referring to Equation 5-3, $h_o$ can be measured in a metric unit (e.g. mm) and $h_i$ can be measured in pixels. We can define the reciprocal of the magnification factor as the pixel size. The value of the pixel size could serve similarly to the scale printed on a map, and it could enable us to estimate the actual length (i.e. mm length) of an object from its uncalibrated image length (i.e. pixel length). Additionally, based on Equation 5-3, the magnification of the camera only depends on $d_o$ and $d_i$. Therefore, under Gaussian optics, all pixels of the image would have similar pixel sizes.
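The Gaussian optics relations can be illustrated with a minimal numerical sketch. The symbol names (d_o, d_i, f), the focal distance, the object distance, and the sensor pixel pitch used below are all assumptions chosen for readability, not properties of the studied endoscope.

```python
def image_distance(f, d_o):
    """Image distance d_i from the thin-lens relation 1/d_o + 1/d_i = 1/f."""
    if d_o <= f:
        raise ValueError("no real image is formed when d_o <= f")
    return 1.0 / (1.0 / f - 1.0 / d_o)

def magnification(d_o, d_i):
    """Magnification m = -d_i / d_o; the sign reflects image inversion."""
    return -d_i / d_o

# Hypothetical lens with f = 10 mm and an object 30 mm from the lens.
d_i = image_distance(10.0, 30.0)
m = magnification(30.0, d_i)

# With an assumed sensor pixel pitch of 0.01 mm, one pixel covers
# pitch / |m| mm on the object, i.e. the pixel size.
pixel_size = 0.01 / abs(m)
print(d_i, m, pixel_size)
```

Because m depends only on the two distances, every pixel in this idealized model receives the same pixel size, regardless of where the object lies in the field of view.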

However, Gaussian optics approximation is only valid under the small-angle assumption. The 

optical lens of the flexible endoscope gets very near to the target surface. In that case, a lens with 

a small FOV angle (and hence valid small-angle approximation) can only visualize a very small 

portion  of  the  target  surface.  To  remedy  this  and  to  increase  the  size  of  the  FOV,  flexible 

endoscopes are equipped with wide-angle lenses. Considering the significant deviation from the 

small-angle approximation in such lenses, we may expect significant errors in using the Gaussian 

optics approximation. In reality, the magnification of imaging systems equipped with wide-angle 

lenses could become a function of the spatial location of the object in the FOV. Such characteristics 

will  lead  to  a  non-linear  distortion.  Specifically,  if  the  magnification  of  an  imaging  system 

decreases with the distance from the optical axis, it is called barrel distortion.250 Conversely, if the 

magnification of an imaging system increases with the distance from the optical axis, it is called 

pincushion distortion.250 
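The two distortion types can be illustrated with a one-parameter polynomial radial model, which is a common textbook choice and only an assumption here; this chapter does not prescribe a particular distortion model. In the sketch, r is the undistorted distance from the optical axis in normalized units, and k1 is a hypothetical distortion coefficient.

```python
def distorted_radius(r, k1):
    """Radial distortion model r_d = r * (1 + k1 * r**2) (assumed model)."""
    return r * (1.0 + k1 * r * r)

def local_magnification(r, k1):
    """Derivative d(r_d)/dr = 1 + 3*k1*r**2, the local radial magnification."""
    return 1.0 + 3.0 * k1 * r * r

# k1 < 0: magnification decreases away from the axis (barrel distortion).
# k1 > 0: magnification increases away from the axis (pincushion).
for name, k1 in (("barrel", -0.2), ("pincushion", 0.2)):
    values = [round(local_magnification(r, k1), 3) for r in (0.0, 0.25, 0.5)]
    print(name, values)
```

Under this model, the pixel size (the reciprocal of the magnification) of a barrel-distorted image grows from the center toward the periphery.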

 


The second source of non-linear distortion could come from deviation in the imaging angle. 

This  effect  can  be  described  clearly  using  the  concept  of  field-of-view  cone.  A  cone  can  be 

constructed for an imaging system with its apex on the center of the lens and its base toward the 

target scene. Sides of this cone denote the last ray of light that can reach the sensor of the camera. 

Using this concept, an imaging system only records objects that are inside its FOV cone. Figure 

5.2 shows the intersections of the FOV cone with two different surfaces. Specifically, the line AC denotes the optical axis, $S_1$ denotes a surface that is perpendicular to the optical axis, and $S_2$ denotes a non-perpendicular surface. The intersection of $S_1$ with the FOV cone creates the circle centered at point B (it is drawn as an ellipse due to perspective principles). However, the intersection of $S_2$ with the FOV cone creates the ellipse centered at point D. Pictures are only a two-dimensional representation of the three-dimensional world, hence the height is lost during the imaging. Therefore, differences in heights of the left and right sides of the ellipse centered at D (i.e. the one located on $S_2$) are lost and it is also mapped into a circle in the final picture. To differentiate between the intersection of a surface with the FOV cone and its recorded image, the former one is called the FOV while the latter one is called the image-FOV in the rest of this chapter.

Figure 5.2. Effects of tilting the target surface on the geometry of the acquired images.  

 

Assuming the small-angle approximation (i.e. small α in figure 5.2), Equation 5-3 could be 

used for finding the magnification of the imaging system. Specifically, if two objects are on $S_1$, one to the left and one to the right side of point B, they would have similar distances from the lens ($d_o$) and hence similar magnification factors. On the other hand, if the two objects are on $S_2$, one to the left and one to the right side of point D, they would have unequal distances from the lens. That is, the object on the right will be closer to the camera and hence will have a larger magnification factor compared to the object on the left. This example indicates another case of the dependence of the magnification factor of an imaging system on the spatial location of the target object. Another interesting observation from figure 5.2 is that when the surface is perpendicular to the optical axis, the center of the image-FOV (i.e. point B) coincides with the intersection point of the optical axis and the surface $S_1$. However, when the surface is tilted, the center of the image-FOV (i.e. point D) moves away from the intersection point of the optical axis and the surface $S_2$.

Combining this observation with properties of the barrel distortion would lead to an interesting 

anticipation, which is tested in this chapter. We know in imaging systems with a barrel distortion 

the maximum magnification happens near the optical axis. Therefore, we could anticipate that if 

the surface is tilted, the point with the maximum magnification (i.e. the point with the smallest 

pixel size) would move from the center of the image toward the direction that gets closer to the 

imaging system.  
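The effect of tilting on the magnification can be illustrated with a simple pinhole sketch, in which the magnification is inversely proportional to the distance of a point from the lens along the optical axis. The pinhole simplification, the sign convention (positive offsets lie on the side of the surface that tilts toward the camera), and all numbers are assumptions made for this illustration.

```python
import math

def relative_magnification(offset_mm, distance_mm, tilt_deg):
    """Magnification at an in-plane offset, relative to the on-axis point.

    offset_mm:   position along the tilted surface, measured from the
                 point where the optical axis meets the surface
    distance_mm: distance from the lens to that on-axis point
    tilt_deg:    tilt of the surface away from the perpendicular position
    """
    depth = distance_mm - offset_mm * math.sin(math.radians(tilt_deg))
    if depth <= 0:
        raise ValueError("point lies at or behind the lens")
    return distance_mm / depth

# Two hypothetical objects 3 mm to either side of the optical axis,
# at a working distance of 10 mm.
for tilt in (0.0, 15.0):
    near_side = relative_magnification(3.0, 10.0, tilt)
    far_side = relative_magnification(-3.0, 10.0, tilt)
    print(tilt, round(near_side, 3), round(far_side, 3))
```

For the perpendicular surface both objects have the same magnification, whereas for the tilted surface the object on the nearer side is magnified more and therefore has a smaller pixel size.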

5.4. Material and method 

5.4.1. Recording instrumentation and setup 

To answer the research questions of this study, different sets of benchtop recordings should be 

collected. Therefore, the setup presented in chapter 1 (figure 1.2) with both degrees of freedom 

was used. Considering that images were taken from static surfaces, high frame rates were not 

required and the moderately low speed of 200 frames per second was used for data collection. The 

 


main benefit of reducing the frame rate is the increase in the integration time that we could get. 

Therefore,  the  target  surface  does  not  need  to  be  very  bright  and  instead  of  a  xenon  light,  a 

conventional incandescent study lamp could be used for illuminating the target surface. The main

problem with the xenon light was that it produced a spatially non-uniform illumination (i.e. the 

intensity of the light at different spatial locations was very different). This non-uniformity led to 

images with high-intensity divergence, which would unnecessarily complicate the required image 

processing  algorithms.  Therefore,  a  study  lamp  was  employed  as  the  light  source  for  data 

collection. 

5.4.2. Datasets 

This study used recordings from a target surface at multiple working distances and multiple 

tilting angles for answering the research questions. The working distance was varied from 5 mm 

to 20 mm in 5-mm increments. The working distance was measured using a digital height gauge 

with an accuracy of 0.001’’ (approximately 0.03 mm). The tilting angle was varied from -15° to 

15° in 5-degree increments. The following procedure was followed for measuring and adjusting 

the tilting angle. First, the target surface was leveled using a leveler. Then the distance between 

the front edge of the target surface (figure 5.3) and the desk was measured using the digital height 

gauge. The same measurement was carried out for the back edge of the target surface. Let $D$ and $l$ denote the difference between the back and front measurements and the length of the target surface, respectively. Additionally, for a desired tilting angle, let $h_f$ and $h_b$ denote the heights of the front and back edges of the target surface from the table. Figure 5.3 depicts the definitions of these quantities. Trigonometric functions can then be employed for computing the tilting angle of the target surface ($\gamma$). Equation 5-4 shows the formula. Based on Equation 5-4, a negative angle corresponds to the case where the front edge of the target surface is higher than the back edge.

$$\gamma = \sin^{-1}\left(\frac{h_b - h_f - D}{l}\right) \qquad (5\text{-}4)$$
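The tilt-angle computation can be sketched as follows. The sketch assumes that l is the length of the target surface itself (the hypotenuse), so that the inverse sine applies, and that D is the back-minus-front offset measured while the surface is level; the function names and the numbers are hypothetical.

```python
import math

def tilt_angle_deg(h_back, h_front, D, l):
    """Tilting angle in degrees from the edge heights (all lengths in mm).

    D is the back-minus-front height difference of the leveled surface,
    and l is the length of the target surface (assumed hypotenuse).
    """
    return math.degrees(math.asin((h_back - h_front - D) / l))

def back_height_for_angle(gamma_deg, h_front, D, l):
    """Back-edge height needed to reach a desired tilting angle."""
    return h_front + D + l * math.sin(math.radians(gamma_deg))

# Hypothetical surface of length 100 mm with a 0.2 mm leveling offset.
target_back = back_height_for_angle(10.0, 50.0, 0.2, 100.0)
gamma = tilt_angle_deg(target_back, 50.0, 0.2, 100.0)
print(round(target_back, 2), round(gamma, 2))
```

A front edge higher than the back edge makes the argument of the inverse sine negative, which yields the negative angles used for half of the recordings.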

 

 

Figure 5.3. A schematic for measuring the tilting angle. 

Finally, it is hard to adjust the setup for achieving the exact target working distances and tilting 

angles; therefore, the actual values deviated from the target values. Table 5.1 reflects the actual 

value of these parameters for each set of recordings. However, in the rest of this chapter groups 

will be referenced using their target values. 

Table 5.1. Actual values of working distance and tilting angle for each target group. The first number represents the 

actual working distance in mm, and the second number the actual tilting angle in degrees.

 

 

                        Working distance group (mm)
Tilting angle (°)   5              10             15             20
-15                 5.12, -15.6    9.93, -15.6    15.05, -15.6   20.06, -15.6
-10                 5.06, -10.1    10.04, -10.1   15.27, -10.1   20.02, -10.1
-5                  5.12, -5.1     10.07, -5.1    15.18, -5.1    20.05, -5.1
0                   4.95, 0        10, 0          15.08, 0       20.12, 0
5                   5.14, 5        10.05, 5       15.29, 5       20.05, 5
10                  5.08, 10.3     10.08, 10.3    15.30, 10.3    20.15, 10.3
15                  5.26, 15.6     10, 15.6       15.07, 15.6    20.07, 15.6
 

 

 


Considering the aim of this chapter, square grid papers were attached to the target surface and

recorded with a spatial resolution of 288×280 pixels, a frame rate of 200 frames per second, and an

exposure time of 4900 μs. Subjective inspection showed that 1 mm grids were quite blurry and

hard to detect at the working distance of 20 mm. Therefore, two different square grids with 1 mm 

and 2 mm spacings were used for data collection. Working distances of 5 mm, 10 mm, and 15 mm 

were  recorded  using  1 mm-spacing  grids  and  working  distances  of  15 mm  and  20 mm  were 

recorded using 2 mm-spacing grids. The overlap between the two cases was used to investigate 

any possible effect of different grid sizes on the measurements. This is discussed in more detail in 

section 5.5.1. 

5.4.3. Automatic detection of grid lines 

The main aim of this study was to investigate the effect of non-linear distortions in flexible 

endoscopy on horizontal measurements from the acquired images. Accurate detection of the grid 

lines from benchtop recordings was the prerequisite of that. Visual investigation of the recordings 

showed that grid lines in the images did not constitute straight lines but had some curvature, which

is a classic sign of barrel distortion. Figure 5.4(A) shows an example image taken

from the 1 mm grid at the working distance of 10 mm. An automatic algorithm based on

statistical image processing was therefore developed to account for possible curvature of the grid lines.

Frames of each video recording were averaged over time and then a spatial averaging filter with 

the size of 2 pixels was applied. The following algorithm was then used for the detection of lines 

parallel to the y-axis. The filtered image was segmented in strips parallel to the x-axis with the 

width of 10 pixels and maximum overlap (i.e. 9 pixels). Each strip was averaged along its columns to

obtain a one-dimensional intensity profile, and the locations of its local minima were detected. A

zero-vector mask was created, and the locations of the minima were set to 1. This procedure was

repeated for all strips parallel to the x-axis, and all masks were concatenated vertically to create a

binary image. The binary image at this

stage  underwent  two  morphological  operations  of  dilation  and  erosion244  using  rectangular 

structuring elements with the size of 8×2 and 3×1 pixels. Finally, second-order polynomials were

fitted to the regions with large areas. Figure 5.4 shows the outputs of the algorithm at different

stages. The procedure for the detection of the lines parallel to the x-axis followed similar steps. 

However, the filtered image was segmented in strips parallel to the y-axis instead. Also, the strips 

were averaged on the rows, zero-vector masks were concatenated horizontally, and rectangular 

structuring elements had the size of 2×8 and 1×3 pixels. 
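The strip-averaging and minima-marking steps for lines parallel to the y-axis can be sketched as follows. This is an illustrative pure-Python outline under stated assumptions, not the implementation used in the study: the function names are hypothetical, the input is assumed to be already time-averaged and smoothed, only strict local minima are marked, and the morphological operations and polynomial fitting are omitted.

```python
def local_minima(profile):
    """Indices whose value is strictly lower than both neighbours."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]


def vertical_line_mask(image, strip_height=10):
    """Binary mask of candidate points on grid lines parallel to the y-axis.

    image: 2-D list of intensities (rows x columns), already time-averaged
    and smoothed. One mask row is produced per strip position.
    """
    n_rows, n_cols = len(image), len(image[0])
    mask = []
    # Sliding the strip one row at a time gives the maximum overlap
    # (strip_height - 1 rows) between consecutive strips.
    for top in range(n_rows - strip_height + 1):
        strip = image[top:top + strip_height]
        # Average each column over the rows of the strip: a 1-D profile along x.
        profile = [sum(row[col] for row in strip) / strip_height
                   for col in range(n_cols)]
        mask_row = [0] * n_cols          # zero-vector mask
        for col in local_minima(profile):
            mask_row[col] = 1            # dark grid line crossing this strip
        mask.append(mask_row)            # vertical concatenation
    return mask


# Synthetic 12 x 9 image: bright background with dark vertical lines at x = 2 and x = 6.
image = [[0.0 if col in (2, 6) else 1.0 for col in range(9)] for _ in range(12)]
mask = vertical_line_mask(image)
print(len(mask), mask[0])  # 3 [0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Averaging within a strip suppresses pixel noise before the minima search, while the one-row step between strips keeps the sampling along each line dense enough to follow its curvature.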

Figure 5.4. Automatic detection of the grid lines: (A) recording from 1 mm grids at the working distance of 10 mm, 
(B) the binary image showing the locations of the minima, (C) fitted second-order polynomials on the locations of 

the minima. 

 

 

5.4.4. Pixel size 

This study relies on a variable called the pixel size. This quantity could play a similar role to a 

scale on a printed map. Basically, we can multiply the uncalibrated pixel length of an object by

this quantity and estimate its calibrated mm length. This number can be estimated as the ratio of 

the mm length of a target object to its pixel length during the horizontal calibration process. In this 

study, the target surfaces were calibrated square grids; hence, the mm lengths of sides of all blocks 

were known. Therefore, we could measure pixel lengths of sides of blocks from the image and 

 


then compute their corresponding values of pixel sizes. To that end, pixel lengths of sides of blocks 

were determined from the fitted curves (figure 5.4(C)). Specifically, coordinates of intersections 

of all curves were determined with the precision of 0.1 pixels. Then, the pixel length of a side was 

computed as the Euclidean distance between its corresponding intersection points.
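The pixel-size computation described above can be illustrated with a short Python sketch. The function names and the intersection coordinates are hypothetical; only the definition (known mm length of a side divided by its pixel length) comes from the text.

```python
import math


def pixel_size_mm(corner_a, corner_b, side_mm):
    """Pixel size (mm per pixel) of one block side.

    corner_a, corner_b: (x, y) intersection coordinates in pixels.
    side_mm: known physical length of the side (grid spacing) in mm.
    """
    pixel_length = math.dist(corner_a, corner_b)  # Euclidean distance in pixels
    if pixel_length == 0:
        raise ValueError("the two intersection points coincide")
    return side_mm / pixel_length


def to_mm(pixel_length, pixel_size):
    """Convert an uncalibrated pixel length to mm using a pixel size."""
    return pixel_length * pixel_size


# Hypothetical intersections of a 1 mm grid; this side spans exactly 5 pixels.
size = pixel_size_mm((100.0, 60.0), (103.0, 64.0), side_mm=1.0)
print(size)  # 0.2 (mm per pixel)
```

The helper `to_mm` plays the role of the map scale mentioned earlier: multiplying a pixel length by the local pixel size gives an estimate of the calibrated length.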

5.5. Experiments and results 

Three  experiments  were  conducted  to  answer  the  research  questions  of  this  chapter. 

Experiment 1 investigates the existence of differences in pixel sizes computed from different grids. 

Experiment 2 presents the results on dependence of the pixel size on the spatial location of the 

target region. Experiment 3 tests the effect of imaging angle on pixel size. This section presents 

details of each experiment, followed by results and related discussions. 

5.5.1. Experiment 1: Differences between grid sizes 

We saw in section 5.4.2 that two different grids with 1 mm and 2 mm spacing were used for 

collecting data from different working distances. Before proceeding with further analysis, we need 

to  make  sure  that  measurements  from  1  mm  and  2  mm  grids  are  comparable.  The  following 

hypothesis was formed to test this.  

H4a: 

Pixel sizes computed from 1 mm grids are significantly different from 2 mm grids. 

Rejection of H4a would indicate that measurements from 1 mm and 2 mm grids are comparable. 

The dataset for this experiment consisted of images from 1 mm and 2 mm grids recorded at the working

distance  of  15  mm.  Considering  the  possible  effect  of  spatial  location  on  the  pixel  size,  two 

different groups of blocks were distinguished. The center group included all sides of blocks that 

were nearest to the center of the image-FOV. The periphery group included the farthest side of the 

 


blocks that were farthest from the center of the image-FOV. Figure 5.5 depicts the two groupings 

with their corresponding selected sides.  

 

Figure 5.5. Groupings for experiments 1 and 2: (A) the solid red blocks and the patterned blue blocks denote the 
center and the periphery groups, (B) the selected sides of an example image. The center of the image-FOV is

denoted by a green cross mark. 

 

The dependent variable for this experiment was the computed pixel size. The independent 

variables  were  grid  sizes  (1  mm  vs.  2  mm)  and  groupings  (center  vs.  periphery).  A  two-way 

ANOVA was used to test H4a. Since it is known that ANOVA is generally not robust to the violation 

of  homogeneity  of  variance  if  groups  have  different  sample  sizes251,  Levene’s  test  was  first 

employed to check the homogeneity of variance. The test rejected the null hypothesis (p<.00001). 

Therefore, the analysis was carried out using M-estimators for the location with 1000 bootstrap samples,

which  provides  ANOVA  with  a  robust  performance  for  non-homogeneous  variance  between 

groups.252 Table 5.2 reflects the results of this analysis. 

Table 5.2. Results of 2×2 robust ANOVA.

Variable        p
Grouping (G)    <0.00001
Grid size (S)   0.12
G×S             0.13

 

 


Based on table 5.2, we see a non-significant effect of grid size on the pixel size. Therefore, we 

could conclude that measurements from 1 mm and 2 mm grids are comparable. Additionally, we 

see  a  significant  effect  for  the  grouping  variable.  It  means  that  pixel  sizes  were  significantly 

different between the center and the periphery groups. To better investigate this, experiment 2 was 

conducted.  

5.5.2. Experiment 2: Effect of spatial location 

The aims of this experiment were to establish the dependence of the pixel size on its spatial 

location and then to quantify that dependence. Specifically, the effects of different groups (center 

vs. periphery as depicted in figure 5.5) and different working distances on the pixel size were 

analyzed. Table 5.3 presents descriptive statistics of pixel size in different conditions. 

Table 5.3. Descriptive statistics of pixel sizes.

                         Center                   Periphery
Working distance (mm)    mean (mm)   std (mm)     mean (mm)   std (mm)
5                        0.028       0.001        0.037       0.004
10                       0.054       0.001        0.074       0.008
15                       0.08        0.001        0.107       0.012
20                       0.106       0.001        0.141       0.017

 

Figure 5.6 depicts how the pixel size changes between different groups and working distances.  

Based on figure 5.6 we can hypothesize that, 

H4b: 

Pixel size is significantly smaller in the center group than the periphery group. 

To test this hypothesis a new dataset was compiled. The dataset consisted of images from 1 

mm grids recorded at the working distances of 5 mm and 10 mm and from 2 mm grids recorded at 

the working distances of 15 mm and 20 mm. The dependent variable for this experiment was the 

pixel size. The independent variables were groups (center vs. periphery) and working distance.

[Figure 5.6 plot: pixel size (mm) versus working distance (5, 10, 15, and 20 mm) for the center and periphery groups.]

Figure 5.6. Variation in pixel size for different working distances and groups.

A two-way ANOVA could be used to test H4b. It is known that ANOVA is not robust to the violation

 

of homogeneity of variance if groups have different sample sizes251; therefore, Levene’s test was 

used to check the homogeneity of variance. Levene’s test rejected the null hypothesis (p<.00001)

indicating non-homogeneity of variance between different groups. Consequently, the robust two-

way ANOVA using M-estimators for the location with 1000 bootstrap samples was used instead.252 

Table 5.4 reflects the results of the analysis. 

Table 5.4. Results of 2×4 robust ANOVA.

Variable                 p
Groups (G)               <0.00001
Working distance (WD)    <0.00001
G×WD                     <0.00001

 

 

Based  on  table  5.4  we  see  a  significant  main  effect  of  groups  (center  vs.  periphery),  a 

significant main effect of the working distance, and a significant interaction effect. In order to 

pinpoint  differences,  robust  post  hoc  analysis  with  1000  bootstrap  samples  was  used.252  The 

analysis showed significant differences between all contrasts. Figure 5.7 presents the boxplots of 

the pixel size for different groups and working distances. 

 

[Figure 5.7 plot: boxplots of pixel size (mm) for the center and periphery groups at working distances of 5, 10, 15, and 20 mm.]

Figure 5.7. Boxplots of the pixel size for different groups and working distances.

 

Based on figure 5.7 we could conclude that, at a fixed working distance, pixels from the center 

group have smaller pixel sizes than pixels from the periphery group. Additionally, the pixel size 

increases with the working distance, which was to be expected. Finally, as the working distance 

increases, the disparity in the pixel size between the center and the periphery groups increases. This

observation, which concurs with the significant interaction effect presented in table 5.4, has

practical implications. Specifically, the measurement error incurred by using pixel length for

comparing the sizes of two objects, one in the center and one in the periphery, increases with the

working distance.

To quantify the effect of the spatial location of a pixel on its pixel size a different analysis was 

carried out. The pixel sizes of the line segments highlighted in figure 5.8(A) were computed. Also, the

Euclidean distances between the centers of all line segments and the center of the image-FOV were

computed. Finally, a negative sign was assigned to the distance of blocks that were below the 

center  of  the  image-FOV.  Figure  5.8(B)  presents  a  scatter  plot  of  the  pixel  size  for  different 

distances  from  the  center  of  the  image-FOV.  Second-order  polynomials  were  fitted  to 

measurements. 
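A second-order polynomial fit of pixel size against signed distance from the center can be sketched with ordinary least squares. The text does not state which fitting tool was used, so the normal-equation solver below is only one possible approach, and the function names and data points are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]                   # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]   # sums of y*x^0 .. y*x^2
    normal = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(normal)
    if d == 0:
        raise ValueError("need at least three distinct x values")
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column with the right-hand side
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


# Hypothetical samples: signed distance from the image center (pixels) and pixel size (mm).
xs = [-100, -50, 0, 50, 100]
ys = [0.08 + 3e-6 * x * x for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)
```

In this parameterization, c is the pixel size at the center of the image-FOV, and a controls how quickly the pixel size grows toward the periphery.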

 


Figure 5.8.  Estimation of the dependence of pixel size on its spatial location, (A) selected line segments are shown 
in green dashed line, and the center of the image-FOV is denoted with a red cross mark, (B) dependence of pixel 
size on its distance from the center of the image-FOV and the working distance. The negative distance means 

blocks that were below the center of the image-FOV. 

 

Based on figure 5.8 the following conclusions can be made. First, the relationship between the 

pixel size and the distance from the center of the image-FOV is non-linear. Second, curves are 

symmetrical around the center (i.e. zero distance). This characteristic has practical implications. 

Basically,  it  means  that  pixel  length  cannot  be  used  for  within  (and  between)  subject  size 

comparison, unless the target objects have similar distances from the center of the image-FOV, in 

addition to similar working distances and zero tilting angle. For example, pixel length could not 

be used for comparing spatial features of a point on the left vocal fold to a similar point on the 

right vocal fold, unless those points have similar distances from the center of the image-FOV. 

Third, pixels in the center of the image have the smallest pixel size, and as we move toward the 

periphery the value of pixel size increases. This characteristic has important practical implications. 

Moving the target tissue to the center of the image-FOV provides better spatial resolution and 

details in the captured images. Fourth, the curvature of plots increases with the working distance. 

That is, the difference between pixel size in the center and the periphery increases with the working

distance. This result concurs with the results and discussion of figure 5.7, and the significant

interaction effect of table 5.4. 

The fitted second-order polynomials could be used to quantify the magnitude of variations in 

the pixel size between the center and the periphery. Table 5.5 shows the estimated values of pixel 

size at the center and periphery of the image-FOV. Considering the dependence of the pixel size 

on its spatial location, a possible simplistic approach for computing the mm length of an object 

could be to compute the average values of all pixel sizes in the image-FOV and use it as the pixel 

size. The mean column in table 5.5 reflects this value. However, if this mean value is used for 

measuring the mm length of an object in the center and the periphery, some error will be introduced 

into  the  measurement.  The  percent  value  of  this  error  for  a  center  pixel  was  defined  as  the 

difference between the mean pixel size and the pixel size in the center divided by the mean value. 

A similar approach was followed for computing the percent difference of a periphery pixel. These 

values are presented in the last two columns of table 5.5. 

Table 5.5. Estimated values of pixel size.

Working distance (mm)   Center (mm)   Periphery (mm)   Mean (mm)   Center diff. (%)   Periphery diff. (%)
5                       0.028         0.035            0.03        8                  -16
10                      0.053         0.067            0.058       8.1                -16.4
15                      0.079         0.099            0.086       7.9                -14.6
20                      0.106         0.131            0.115       7.8                -14.6
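The percent differences defined above can be reproduced approximately from the rounded pixel sizes in table 5.5. The sketch below uses a hypothetical function name. Because the tabulated pixel sizes are rounded to three decimals, the values it prints only approximate the percentages reported in the table.

```python
def percent_diff_from_mean(mean_size, local_size):
    """Percent error from using the FOV-mean pixel size in place of the local one."""
    return (mean_size - local_size) / mean_size * 100.0


# Rounded table 5.5 entries for the 20 mm working distance.
center_diff = percent_diff_from_mean(0.115, 0.106)
periphery_diff = percent_diff_from_mean(0.115, 0.131)
print(round(center_diff, 1), round(periphery_diff, 1))
```

A positive value means the mean pixel size is larger than the local one, so mm lengths are overestimated there; a negative value means they are underestimated.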

 

Combining previous results with table 5.5, the following conclusions can be made. Although the

absolute value of the difference increases with the working distance (more curvature in

figure 5.8(B) at larger working distances), the percentage of error remains relatively constant.

This characteristic means that the non-linear distortion mostly depends on the optical

characteristics of the endoscope and is relatively independent of the working distance. This

independence translates into a simpler method for compensating for the effect of such non-linear

distortions in horizontal measurements. This topic is investigated fully in the next chapter.

To put the results of table 5.5 into perspective, an extreme case of a within-subject size

comparison scenario is presented. Suppose the actual size of a lesion is reduced from 2 mm

to 1.5 mm after an intervention. If the pre-intervention lesion is recorded at the working distance

of 10 mm and on the periphery of the image, it would be represented by approximately 30 pixels.

However, if the post-intervention lesion is recorded at the same working distance but at the center of the

image, it would be represented by approximately 28 pixels. That is, despite a 25% reduction in the

mm length of the lesion, we would get only a 6.7% reduction in the pixel length. This reduced

sensitivity requires a bigger sample size in scientific research in order to achieve a significant

effect.
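The arithmetic of this example can be checked with a few lines of Python, using the center and periphery pixel sizes listed in table 5.5 for the 10 mm working distance. The function name is hypothetical.

```python
def pixel_length(length_mm, pixel_size_mm):
    """Number of pixels spanned by an object of a given mm length."""
    return length_mm / pixel_size_mm


# Pixel sizes at the 10 mm working distance, taken from table 5.5.
pre = round(pixel_length(2.0, 0.067))   # pre-intervention lesion at the periphery
post = round(pixel_length(1.5, 0.053))  # post-intervention lesion at the center

mm_reduction = (2.0 - 1.5) / 2.0 * 100
pixel_reduction = (pre - post) / pre * 100
print(pre, post, mm_reduction, round(pixel_reduction, 1))  # 30 28 25.0 6.7
```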

5.5.3. Experiment 3: Effect of the tilting angle 

Experiments 1 and 2 were done at zero tilting angle (i.e. imaging axis was perpendicular to the 

target surface). However, changes in the tilting angle could also lead to non-linear distortions. The 

aim  of  this  experiment  was  to  study  and  quantify  the  effects  of  this  parameter  on  horizontal 

measurements.  Therefore,  values  of  pixel  size  in  three  different  groups  at  multiple  working 

distances and multiple tilting angles were studied. Figure 5.9 shows the groupings that were used

in this experiment. Recording at the working distance of 5 mm resulted in 14 line segments in the 

front and back groups and 18 line segments in the middle group. For all other working distances, 

all three groups had 22 line segments. 

We saw in experiment 2 that the pixel size increases with the working distance. Considering

that tilting the target surface decreases the working distance of one side of the image and increases 

the working distance of the other side, the following hypothesis was formed.  

 


 

Figure 5.9. Groupings for experiment 3. Solid red lines denote the back group, dotted green lines denote the middle 
group, and dashed blue lines denote the front group,  (A) groupings at the working distance of 5 mm, (B) groupings 

at the working distance of 15 mm. 

 

H4c: 

Pixel size is significantly different between back, middle, and front groups when 

the target surface gets tilted. 

To test this hypothesis a new dataset was compiled. The dataset consisted of images from 1 

mm grids recorded at working distances of 5 mm and 10 mm and from 2 mm grids recorded at 

working distances of 15 mm and 20 mm. The dependent variable for this experiment was the pixel 

size. The independent variables were groups (back, middle, and front), the working distance, and 

the tilting angle. Figure 5.10 presents the mean and standard deviation of the pixel size for the 

three groups at different working distances and tilting angles.  

A  three-way  ANOVA  was  used  to  test  H4c.  Levene’s  test  rejected  the  null  hypothesis 

(p<.00001). Therefore, the analysis was carried out using trimmed means (0.2 trimming level), 

which  provides  ANOVA  with  a  robust  performance  for  non-homogeneous  variance  between 

groups.252 Table 5.6 reflects the results of the analysis. Based on this table we see all main effects 

were  significant.  Additionally,  except  for  the  Angle×WD,  all  other  interaction  effects  were 

significant. 

 

[Figure 5.10 plots: mean and standard deviation of pixel size (mm) versus imaging angle (-15° to 15°) for the back, middle, and front groups; panels (A) to (D).]

Figure 5.10. Values of the mean and standard deviation of pixel size: (A) working distance of 5 mm, (B) working distance of 10 mm, (C) working distance of 15 mm, (D) working distance of 20 mm.

Table 5.6. Results of 7×4×3 ANOVA for trimmed means.

Variable                 p
Angle                    0.0001
Working distance (WD)    0.001
Groups (G)               0.0001
Angle×WD                 0.86
Angle×G                  0.001
G×WD                     0.001
Angle×G×WD               0.001

 

 

 

Based on figure 5.10 the following conclusions can be made. First, when the tilting angle is 

zero, the back and front groups have similar pixel sizes. However, as the magnitude of the tilting 

angle increases the difference in the pixel size of the back and front group increases. Specifically, 

at positive angles (i.e. when the backside is higher) pixels in the back group have smaller pixel 

sizes than the front group (hence higher spatial resolution in the backside). Conversely, at negative 

 


angles, pixels in the front group have smaller pixel sizes than the back group. Second, crudely 

speaking, the behavior of the front group at a negative angle is similar to the behavior of the back 

group at a similar but positive angle, and vice versa. This characteristic indicates the presence of 

a  specific  symmetry  in  the  distortion.  Third,  the  standard  deviations  of  different  groups  show 

dissimilar trends. The middle group exhibits the least variations and its behavior remains relatively 

constant for different tilting angles. However, as the tilting angle goes from -15° to 15°, the standard

deviation of the pixel size increases in the front group and decreases in the back group. This behavior may

indicate a non-linear dependence of the pixel size on the tilting angle and the spatial location of a 

target pixel. To quantify this behavior a further analysis was carried out. 

The pixel sizes for the line segments highlighted in figure 5.11(A) were computed. Then, the

Euclidean distance between the center of each line segment and the center of the image-FOV was

computed. Next, a second-order polynomial curve was fitted to the data points computed from each

tilting angle. Figure 5.11(B) presents the result of this analysis for the working distance of 15

mm. Note that the negative sign denotes blocks that were below the center of the image-FOV.

Based on figure 5.11(B) we see significant differences between different curves. Specifically, 

when the tilting angle is zero, the minimum of the curve is near point zero (i.e. the minimum pixel 

size is at the center of the image-FOV). However, when the tilting angle becomes positive the 

minimum of the curve (i.e. position with the minimum pixel size and hence the highest spatial 

resolution) deviates from the center of the image-FOV and goes toward the negative direction (i.e. 

toward the back of the target surface). Additionally, the magnitude of this deviation is positively 

correlated with the magnitude of the tilting angle. Conversely, when the tilting angle becomes 

negative the minimum of the curve deviates from the center of the image-FOV and goes toward

the positive direction (moving toward the front of the target surface). Additionally, the magnitude

of this deviation is positively correlated with the magnitude of the tilting angle. To quantify these

qualitative observations, further analysis was carried out. The minimum of each curve was

estimated using the analytical approach (i.e. equating the derivative to zero). Figure 5.12 shows

the distance of the minimum pixel size from the center of the image-FOV.

Figure 5.11. (A) The selected line segments are shown in green dashed lines, and the center of the image-FOV is denoted with a red cross mark. (B) Dependence of pixel size on its distance from the center of the image-FOV and the tilting angle at the working distance of 15 mm.
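For a fitted curve of the form p(x) = a·x² + b·x + c, equating the derivative 2a·x + b to zero gives the location of the minimum as x = −b/(2a). The sketch below illustrates this step; the function name and the coefficients are hypothetical and are not fitted values from the study.

```python
def quadratic_minimum(a, b, c):
    """Location and value of the minimum of a*x**2 + b*x + c (requires a > 0)."""
    if a <= 0:
        raise ValueError("the parabola has no minimum unless a > 0")
    x_min = -b / (2.0 * a)  # root of the derivative 2*a*x + b
    return x_min, a * x_min ** 2 + b * x_min + c


# Hypothetical coefficients for a tilted target surface.
x_min, p_min = quadratic_minimum(4e-6, -2e-4, 0.05)
print(x_min, p_min)
```

A non-zero linear coefficient b is what shifts the minimum away from the center of the image-FOV; with b = 0 the minimum sits at zero distance.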

[Figure 5.12 plot: distance of the minimum-pixel-size location from the center of the image-FOV versus deviation angle (degrees, -15 to 15) for working distances of 5, 10, 15, and 20 mm.]

Figure 5.12. Dependence of location with the highest spatial resolution on the tilting angle.

 


Another  significant  observation  from  figure  5.11  is  that  at  zero  tilting  angle  the  curve  is 

symmetrical around the minimum point (which coincides with the center of the image-FOV). This 

means the points with similar distances from the center of the image-FOV would have similar 

pixel sizes. However, as the tilting angle deviates from zero, the curves become increasingly

asymmetric. That is, the dissimilarity between the two portions of the curves (left of the minimum

and right of the minimum) increases with the magnitude of the tilting angle. To quantify these

qualitative observations, further analysis was carried out. Let p_min and p_f denote the minimum

pixel size and the pixel size at the front periphery of the image-FOV. Then, the percentage of

difference at the front periphery (D_f) was defined as follows.

$$D_f = \frac{p_f - p_{min}}{p_{min}} \times 100\% \tag{5-5}$$

The percentage of difference at the back periphery (D_b) was defined similarly. These values were
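The percentage of difference can be evaluated directly from a fitted second-order curve. The sketch below assumes that Equation 5-5 normalizes by the minimum pixel size, which matches the later statement that peripheral pixel sizes are a given percentage larger than the minimum. The function names, coefficients, and periphery positions are hypothetical.

```python
def percent_above_minimum(p_edge, p_min):
    """Percentage by which a peripheral pixel size exceeds the minimum pixel size."""
    return (p_edge - p_min) / p_min * 100.0


def periphery_differences(a, b, c, x_front, x_back):
    """Return (D_f, D_b) for the fitted curve p(x) = a*x**2 + b*x + c, with a > 0."""
    def p(x):
        return a * x ** 2 + b * x + c
    p_min = p(-b / (2.0 * a))  # minimum pixel size, at the vertex of the parabola
    return (percent_above_minimum(p(x_front), p_min),
            percent_above_minimum(p(x_back), p_min))


# Hypothetical symmetric curve (zero tilt): both peripheries differ by the same amount.
d_f, d_b = periphery_differences(4e-6, 0.0, 0.05, x_front=50.0, x_back=-50.0)
print(round(d_f, 1), round(d_b, 1))
```

With a non-zero linear term the two percentages diverge, which is the asymmetry that the tabulated values quantify.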

computed for each working distance and tilting angle. Table 5.7 shows the results, which support 

the  preceding  qualitative  discussions.  Specifically,  at  negative  tilting  angles,  pixel  sizes  are 

significantly larger at the back periphery (larger values of percentage of difference), and at positive 

tilting  angles,  pixel  sizes  are  significantly  larger  at  the  front  periphery  (hence  smaller  spatial 

resolution).  Additionally,  as  the  magnitude  of  the  tilting  angle  increases  the  percentage  of 

difference from one side (the side that is getting away from the camera) increases while the other 

side (the side that is getting near to the camera) decreases. For example, at the working distance 

of 10 mm and the tilting angle of 15°, the pixel sizes at the front and back peripheries are 61.2% 

and 12.4% larger than the minimum pixel size, respectively. In summary, pixel sizes on the side of the FOV that

gets closer to the camera become more similar, whereas those on the other side become more divergent.

 


Table 5.7. The percentage of difference at the back and front peripheries from different working distances and tilting angles.

                  5 mm            10 mm           15 mm           20 mm
Tilting angle     Db%     Df%     Db%     Df%     Db%     Df%     Db%     Df%
-15°              61.5    17.9    59      13.8    55.5    11.5    59.6    14.3
-10°              46.1    16.5    45.3    15.2    43.5    14.6    42.2    14.3
-5°               28.7    18.1    29.4    18.6    35      19.1    34.1    18.8
0°                25.7    25.8    25.7    26.3    26.5    24.2    25.8    24.1
5°                18.1    31.4    19.5    33.7    19.8    31.3    20.5    32.8
10°               16.8    50.9    14.5    43.9    15.2    42.1    15.1    43.6
15°               12      65.7    12.4    61.2    13      59.9    12.7    60.3

 

The main goal of this chapter was to investigate the effect of two non-linear distortions on 

horizontal measurements. To that end, we simulated a situation where an object with the actual 

length  of  2  mm  was  placed  at  different  locations  of  the  FOV  (front  periphery,  center,  back 

periphery, and the location with the highest spatial resolution). Then we used the estimated value 

of pixel size (figure 5.11(B)) for computing the pixel length of that object at the working distance 

of 10 mm and different tilting angles. Table 5.8 presents the results. The location with the highest 

spatial resolution (denoted as the maximum in table 5.8) was determined analytically (i.e. equating 

the derivative to zero) from curves in figure 5.11(B). 

Table  5.8  clearly  demonstrates  the  effect  of  spatial  location  and  tilting  angle  on  the 

uncalibrated size (i.e. pixel length) of an object in flexible endoscopy. For example, for a constant 

spatial location in the back periphery the uncalibrated size of the object could increase by 48% if 

the tilting angle changes from -15° to 15°. Also, for a constant tilting angle of 15° the uncalibrated 

size  of  the  object  could  increase  by  57.7%  if  the  object  moves  from  the  minimum  resolution 

location to the maximum resolution location in the FOV. Finally, we can see the interaction effect 

of  grouping  (front,  center,  back)  and  the  tilting  angle.  Specifically,  at  zero  tilting  angle  the 

uncalibrated size of an object on the front periphery increases by 26.7% if that object moves to the 

 


center of the image-FOV. However, at the tilting angle of 15°, the increase could be as high as 

53.8%.  

Table 5.8. Estimated uncalibrated length (i.e. pixel length) of a 2 mm object at different locations of the FOV and different tilting angles.

Tilting angle   Front   Center   Back   Maximum
-15°            36      39       25     40
-10°            34      39       27     40
-5°             32      38       29     38
0°              30      38       30     38
5°              29      38       32     38
10°             25      36       32     37
15°             26      40       37     41
 

 

5.6. Discussions 

Imaging techniques are widely employed in clinical practice. The fields of speech-language 

pathology  and  laryngology  are  not  an  exception.  However,  the  access  for  direct  functional 

observation of the laryngeal tissues is not trivial, and therefore, the visualization is channeled 

through an endoscopic instrument. Hence, the functionality and characteristics of the endoscope 

determine the characteristics of the acquired images. For example, rigid endoscopy is based on 

transoral  insertion,  which  limits  the  types  of  stimuli  that  can  be  elicited.  Also,  the  unnatural 

retraction of the tongue130 required for adequate laryngeal exposure may alter the voice production 

system and hence may not  reflect the natural function of  the phonatory system. For example, 

research  has  shown  that  the  presence  of  a  rigid  endoscope  could  significantly  change  the 

fundamental  frequency  and  quality  of  the  produced  voice131,  which  may  support  a  modified 

function  of  the  phonatory  mechanism  during  the  rigid  endoscopy.  Flexible  endoscopy  helps 

address some of these concerns. Also, flexible endoscopes provide the possibility of simultaneous 

 


aerodynamic measurements.132–134 This could provide significant information about the complex 

interactions between kinematics, aerodynamics, and the acoustic output of the phonatory

mechanism. Additionally, coupling a laser-calibrated  flexible endoscope195 to an HSV system and 

recording  synchronized  aerodynamic  measurements  could  help  us  tease  apart  the  effect  of 

individual differences on the phonatory mechanism. Last but not least, flexible scopes have been 

associated with higher success rates in adult127 and pediatric136,137 populations. However, flexible 

endoscopes are associated with non-linear distortions.  

The main aim of this chapter was to quantify the effects of two different sources of non-linear 

distortions in the images acquired from a fiberoptic flexible endoscope. The first source stems 

from the wide-angle lens that is used in the flexible endoscope in order to compensate for short 

working distances and hence maximize the FOV. The second source of non-linearity stems from

changes in the imaging angle. A significant error can be introduced into measurements if these 

distortions are not compensated for. Two different interpretations of the effects of these distortions 

are presented here. The first interpretation relates to the usage of uncalibrated measurements (i.e. 

pixel lengths) and quantifies the magnitude of error in comparing pixel length of objects from 

different locations of the image-FOV. The second interpretation, in contrast, relates to calibrated

measurements (i.e. estimating the mm lengths) in the absence of proper compensation methods. 

This interpretation quantifies the magnitude of error in estimating the mm length of objects from 

different locations of the image-FOV. Experiments 1 and 2 demonstrated the significant effect of 

spatial location of a pixel on its mm size. Based on the results of table 5.5, pixels in the periphery could

have about 26.4% lower spatial resolution than pixels in the center. This means that if pixel lengths

are used for comparing two similar objects, one in the center and one in the periphery, the length of the

object in the center will be overestimated by 26.4%. Considering the mm measurement, a simplistic

 


solution could be to compute the average pixel size and then use it for conversion from pixel into 

mm. Based on the results of table 5.5, this approach could lead to up to 8.1% overestimation of the object

in  the  center  and  up  to  14%  underestimation  of  the  object  in  the  periphery.  Experiment  3 

investigated  the  effect  of  tilting  angle  and  showed  its  significant  effect  on  measurements. 

Specifically, table 5.8 showed that pixel length of an object in the periphery of the image-FOV 

could change by 48% if the tilting angle goes from -15° to 15°. If the average pixel size (table

5.5, column Mean) is used and the effect of tilting angle is not compensated for, calibrated mm 

measurements could have significant error. Specifically, at the tilting angle of 15° the mm length 

on one side of the periphery could be underestimated by 34%. 

The focus of this study was on non-linear distortions from a laser-calibrated laryngeal fiberoptic

flexible endoscope and their effect on horizontal measurements. However, the results may

provide insights and motivation for further analysis of other types of endoscopes, as well as, other 

endoscopic  procedures  (e.g.  gastroendoscopy,  colonoscopy).  Specifically,  the  first  non-linear 

distortion was due to the wide-angle lens of the fiberoptic flexible endoscopes. Considering that 

distal-chip flexible endoscopes, gastroendoscopy, and colonoscopy also use wide-angle lenses, 

one may expect to see some residual distortions. However, the exact magnitude of distortion would 

be different from this study and should be investigated in a separate study. Rigid endoscopes have 

a narrower angle of view and hence the small-angle approximation may be valid. Therefore, the 

effect of the first source of non-linear distortion could be minimal in rigid endoscopes. On the

other hand, the effect of the imaging angle seems to be universal. Therefore, the accuracy of

measurements from rigid endoscopy, distal-chip flexible endoscopes, gastroendoscopy, and

colonoscopy is expected to depend on the imaging angle. However, the exact magnitude

of that distortion could be different from fiberoptic endoscopes and should be investigated in a 

 


separate study. To address this need, we are planning to use a similar approach and evaluate the 

distortions of distal-chip videoendoscopy systems, which would quantify the effect of tilting angle 

and spatial location on the validity and reliability of horizontal measurements. Considering the 

popularity and widespread usage of distal chip videoendoscopy systems in clinical settings, such 

a study is warranted to provide more immediate clinical value. 

Implications and findings from this study seem to extend beyond horizontal measurements. 

For example, in figure 5.4(C) we see that parallel lines exhibit a bowing effect in the captured 

images. This may indicate that subjective visual assessments of laryngeal images captured from 

fiberoptic flexible endoscopes for assessment of vocal fold bowing may be biased. Figure 5.11(A)

shows that when the imaging angle is not perpendicular, parallel lines may result in divergent lines 

in  the  image.  This  may  indicate  that  vocal  folds  that  are  in  fact  parallel  may  be  captured  as 

divergent ones in laryngeal images (regardless of the imaging modality) if the imaging angle is not 

perpendicular. Last but not least, the objective and subjective measurements of asymmetry have 

been  used  in  previous  literature.98,249  However,  the  investigated  non-linear  distortions  could 

significantly change the accuracy of those subjective assessments and objective measurements. 

5.7. Conclusions 

This study was motivated by performing calibrated (i.e. mm) horizontal measurement from a 

laser-calibrated HSV system. The system was designed based on a fiberoptic flexible endoscope. 

Two  different  sources  of  non-linear  distortions  in  the  fiberoptic  flexible  endoscope  were 

investigated: the wide-angle lens used in flexible endoscopes and the deviation in the imaging

angle. It was shown that the first source of distortion, the wide-angle lens, results in a pixel size 

(i.e. the conversion scale from pixel into mm) that depends on the spatial location of that pixel. 

More precisely, it was shown that if the imaging axis is perpendicular, all pixels with similar 

 


distances to the center of the image-FOV will have similar pixel sizes. Additionally, it was shown 

that as we move away from the center of the image-FOV the pixel size increases. A different 

interpretation of this observation would be that the spatial resolution of the image decreases as we 

move away from the center of the image-FOV toward its periphery. Therefore, keeping the region 

of  interest  in  the  center  of  the  image-FOV  would  improve  the  details  of  the  captured  image. 

Studying the second source of non-linear distortion, the effect of imaging angle, showed that it 

disturbs the radial symmetry of the images. That is, the spatial resolution of points with similar 

distance to the center of the image-FOV becomes dissimilar, and that dissimilarity increases

with an increase in the tilting angle. Additionally, this distortion leads to the dislocation of the 

points with the highest spatial resolutions from the center of the image-FOV. The analysis showed 

that the combined non-linear distortions could result in calibrated horizontal measurement errors 

up to 65.7%. 

 

 


CHAPTER 6: DIRECT HORIZONTAL CALIBRATION OF HSV RECORDINGS 

 

Based on: 

Ghasemzadeh H., Deliyski D. D., Hillman R. E., Mehta D. D. Method for Horizontal Calibration 
of Laser-Projection Transnasal Fiberoptic High-Speed Videoendoscopy, in Preparation. 
 

 

Summary: Calibrated horizontal measurements (e.g., mm) from the vibrating vocal folds and the 

surrounding laryngeal structures during phonation could improve our knowledge of the function 

of  normal  and  disordered  phonatory  mechanisms.  Additionally,  it  could  be  used  for  direct 

assessment of therapeutic outcomes, implementation of evidence-based practice, and advancement 

of personalized medicine in the fields of laryngology and speech-language pathology. However, 

the size of an object in laryngeal images is not routinely calibrated during endoscopic assessment 

and depends on a couple of factors, including the distance between the endoscope and the target 

surface. This chapter uses a recently developed in-vivo laser-projection fiberoptic endoscope and

proposes a method for calibrated spatial measurements. To that end, a set of circular grids were 

recorded at multiple working distances. A statistical model was trained that would map from pixel 

length of the object, the working distance, and the spatial location of the target object into its mm 

length. A detailed analysis of the performance of the proposed method is presented. The analyses 

have shown that the accuracy of the proposed method does not depend on the working distance 

and length of the target object. The estimated average magnitude of error was 0.27 mm, which is 

three times lower than that of the existing approaches.

 

 


6.1. Introduction 

The length of an object in an image depends on the magnification factor of the camera, which 

in turn depends on several factors including the distance of the object from the camera. Considering 

that most often, we do not know the distance of an object from the camera, measuring the calibrated 

(i.e. mm) lengths of an object from an image is not a trivial task unless some auxiliary information 

is provided. Providing a conversion scale probably is the most common approach, which is present 

in every printed map. Another less common approach would be to add an object with a known size 

(e.g.  a  penny)  to  the  scene  before  taking  the  picture.  Regardless  of  the  employed  approach, 

calibrated horizontal measurement on an image follows the same steps. Pixel lengths of the target 

object and the auxiliary information (i.e. the scale on a map or the penny) are measured. These 

measurements are then combined with the a priori knowledge of the mm length of the auxiliary

information,  and  the  pixel-to-mm  conversion  scale  is  computed.  Finally,  the  pixel-to-mm 

conversion scale can be used for measuring the mm length of any object in the image. 
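The steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a single conversion scale is valid for the whole image, and the function names and numbers (a 10 mm reference spanning 200 pixels) are hypothetical, not taken from this study.

```python
def pixel_to_mm_scale(ref_length_mm, ref_length_px):
    """Conversion scale (mm per pixel) from a reference object of known size."""
    if ref_length_px <= 0:
        raise ValueError("reference pixel length must be positive")
    return ref_length_mm / ref_length_px


def measure_mm(target_length_px, scale_mm_per_px):
    """Convert a pixel length to mm using one scale for the whole image."""
    return target_length_px * scale_mm_per_px


# Hypothetical numbers: a 10 mm reference spans 200 px, the target spans 120 px.
scale = pixel_to_mm_scale(10.0, 200.0)
target_mm = measure_mm(120.0, scale)
```

The single-scale assumption is exactly what the following sections relax, because working distance and spatial location both change the scale in endoscopic images.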

In chapter 1 we discussed that horizontal calibration approaches can be classified into direct 

and  indirect  methods,  depending  on  the  source  of  the  auxiliary  information.  The  auxiliary 

information  for  the  indirect  approach  comes  from  a  different  image  (or  source),  whereas  the 

auxiliary information for a direct approach comes from the same image that we want to make 

measurements from. This subtle difference has very important consequences regarding the validity 

of the measurements. Specifically, indirect approaches have the following implicit assumptions. 

(1) The auxiliary information is exactly the same in both images. (2) The auxiliary information 

can be registered accurately on the target image. (3) Both images were captured under similar 

conditions (e.g. imaging angle, a similar vertical distance between the auxiliary information and 

the target surface, etc.). These conditions were discussed in detail in chapter 2. 

 


Let us consider the length of a vocal fold for indirect horizontal calibration. Obviously, the 

phonation system is a moving mechanism and hence it is changing constantly. For example, the 

length of the vocal folds can change from one recording to the next one.  Also, the larynx could 

move in the vertical plane (i.e. be elevated or depressed) which would change the working distance 

of the camera. Therefore, elevation-depression of the larynx could change the pixel length of a 

vocal fold between two recording sessions, even if the mm length of the vocal fold was similar in

both  imaging  sessions.  Things  get  even  more  complicated  during  phonation.  The  relationship 

between different parameters of the phonatory system (e.g. activity of different intrinsic muscles 

and  subglottal  pressure)  and  the  acoustic  output  (e.g.  pitch,  intensity)  is  very  complex.12 

Consequently, using the measured (or the self-perceived) pitch and loudness does not necessarily

warrant the assumption of a similar length of the vocal fold between different recordings. Finally, 

if different recordings are done pre- and post-surgery, then the system was changed between the 

two  conditions.  Obviously,  using  pitch  and  loudness  could  be  even  more  problematic  in  such 

instances. On the other hand, direct calibration approaches do not have these important implicit 

assumptions. Thus, their measurements could be more accurate. It is noteworthy that the improved

accuracy  is  achieved  at  the  expense  of  higher  complexity.  Specifically,  indirect  calibration 

approaches do not require specialized instruments and can be performed using existing laryngeal 

imaging systems. Additionally, images could be printed and a simple caliper would be enough for 

measurements.93  Conversely,  direct  approaches  rely  on  specialized  and  more  sophisticated 

imaging  instruments,  and  often  any  measurement  requires  complex  calibration  and  processing 

steps.  

The definition of the direct calibration approach stipulates the existence of some auxiliary 

information on the recorded images. This means that we need to add some properly designed 

 


fiducial markers to the FOV. Two important problems should be addressed in order to realize this 

requirement. The first problem is to design and create fiducial markers with certain topological 

properties.  The  second  problem  is  to  deliver  the  created  pattern  to  the  laryngeal  mechanism. 

Reviewing the literature on laryngeal imaging shows that researchers have been working on these 

problems for more than two decades.122 A laser source emits spatially coherent light and therefore

can be used for creating fiducial patterns with specific topological properties. The created pattern

could then be delivered by clipping the laser projection component to the

endoscope.25,153,190,232,236,253 Obviously, this approach increases the insertion diameter of the

endoscope which would exacerbate the discomfort level of the patient and hence reduce the success 

rate of the endoscopy. A more elegant approach is using a surgical endoscope195 or employing 

some portions of the illumination fibers of a flexible endoscope.191  

Two main approaches, parallel laser markers and multiple laser points, have been used for

creating  the  laser-fiducial  markers  in  the  field  of  voice.122  The  projection  of  the  parallel  laser 

markers is the simplest approach. Two-point laser projection25,190,192, two-line laser projection234, 

and multiple line laser projection153 are some examples of this category. The multiple-laser-points 

projection  is  more  sophisticated  and  involves  the  projection  of  many  laser  points  on  the 

FOV.191,194,195,235 Each method has its own merits. The parallel laser projection category benefits 

from the simplicity of its optical design and subsequent measurement methodology. Detection of 

the laser markers on the image is the only required step for measurement in those systems. After 

that,  the  distance  between  the  laser  markers  may  be  used  similarly  to  a  scale  on  a  map  and 

calibrated horizontal measurements may be achieved using a simple caliper. The main assumption 

of this method is that all pixels in the image have the same mm lengths which could be violated if 

different objects of the image have different distances from the camera, or if different locations of 

 


the image have different pixel size representations.208 Violation of these assumptions will lead to 

measurement errors. Conversely, multiple-laser projection systems benefit from the presence of 

laser points in all parts of the image. Not only does this information help with vertical measurements122

and 3D reconstruction of the envelope of the FOV236, but it also means that with a high probability 

some  laser  points  would  be  near  to,  or  on  the  target  surface.  Therefore,  the  above-discussed 

problems would be resolved. However, systems from this category require more sophisticated 

optical hardware and processing software design. Horizontal measurements from these systems 

depend on a calibration step, where the confounding factors of the pixel-to-mm conversion scale 

are determined and accounted for. 

6.2. Aim and hypothesis 

The main aim of this chapter is to develop the methodology of horizontal calibration and 

subsequent horizontal measurement for a laser-projection transnasal fiberoptic HSV system. 

The main research question of this chapter is: 

Q5: 

How could we use a structured laser projection system for measuring the horizontal 

distance between two points on a target surface? 

In  chapter  5  we  saw  that  fiberoptic  endoscopes  have  significant  non-linear  distortions. 

Therefore, the following hypothesis was formed for this chapter.  

H5a: 

Horizontal  measurement  error  from  the  laser  projection  system  significantly 

increases, if the nonlinear distortion is not properly compensated for. 

Additionally, in chapter 4 we saw that vertical measurement error was positively correlated to 

working  distance.  Considering  that  horizontal  measurements  depend  on  the  estimation  of  the 

working distance, the following hypothesis was formed. 

H5b: 

Horizontal measurement error will be positively correlated to working distance. 

 


6.3. Material and method 

The  mm  length  of  an  object  is  its  length  perpendicular  to  the  line  of  sight  of  the  scope. 

Accurate estimation of the mm length of an object from the pixel length of the object’s image is 

the  primary  goal  of  calibrated  horizontal  measurement.  Assuming  an  optical  system  that  is 

symmetrical  around  its  optical  axis,  the  relationship  between  an  object  and  its  image  can  be 

determined. Let $h_o$ denote the mm length of an object and $h_i$ denote the pixel length of the object’s

image. Also, let O be the intersection point between a ray of light from the object and the aperture

of the camera (figure 6.1) expressed in the polar coordinates (ρ, φ). We would have238,

$$h_i = A_1 \rho\cos\varphi + A_2 h_o + B_1 \rho^3\cos\varphi + B_2 \rho^2 h_o\,(2+\cos 2\varphi) + (3B_3 + B_4)\,\rho\, h_o^2\cos\varphi + B_5 h_o^3 + \mathrm{HOT}$$   (6-1)

where Aj and Bj are constants, and HOT represents the higher-order terms.

 

Figure 6.1. Relationship between the length of an object (ho) and its image (hi) in an axially symmetrical optical 

system. 

 

Equation 6-1 shows a non-linear and complex relationship between pixel and mm lengths. 

Using  the  thin-lens  assumption  and  small-angle  approximation238  Equation 6-1  can  be 

approximated with a much simpler model known as the Gaussian optics.238 In this model, the ratio 

of pixel length to mm length is a constant number, which is called the magnification factor of the 

system (m). Equation 6-2 shows this: 

 

$$m = \frac{h_i}{h_o}$$   (6-2)

In  the  Gaussian  optics,  the  magnification  factor  and  the  working  distance  have  an  inverse 

relationship238, and therefore the working distance would be a confounding factor for calibrated

horizontal measurements. 
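The inverse relationship can be made explicit with the standard thin-lens equation. The symbols $f$ (focal length), $d_o$ (object distance, playing the role of the working distance), and $d_i$ (image distance) are introduced here only for illustration and are not part of the notation of this chapter. The result gives the optical magnification; the pixel-per-mm factor $m$ of Equation 6-2 would additionally include the fixed pixel pitch of the sensor.

```latex
\frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}
\quad\Longrightarrow\quad
d_i = \frac{f\, d_o}{d_o - f},
\qquad
|m_{\text{optical}}| = \frac{d_i}{d_o} = \frac{f}{d_o - f} \approx \frac{f}{d_o}
\quad \text{for } d_o \gg f .
```

Under these assumptions, doubling the working distance roughly halves the pixel length of the same object, which is why a single pixel-to-mm scale cannot be reused across working distances.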

Flexible  fiberoptic  endoscopes  employ  wide-angle  lenses  to  maximize  their  FOV  sizes. 

However, wide-angle lenses violate the small-angle approximation of the Gaussian optics. This 

leads to a more complex relationship between pixel and mm lengths. Specifically, this deviation 

introduces  significant  non-linear  distortion  into  recorded  images.  Distortion  of  a  flexible 

laryngoscope was studied in chapter 5. We showed that when the imaging axis is perpendicular to 

the target surface, the distortion is symmetrical around the optical axis, and points with similar 

distances  from  the  FOV  center  experience  similar  distortions.  Considering  this  symmetry, 

Equation 6-1 may govern the image formation in flexible endoscopy. Additionally, we showed 

that the pixel length of an object significantly depends on its spatial location within the FOV.208 

Therefore, the spatial location of the target object is another confounding factor for horizontal 

measurements. Circular grids can exploit this symmetry efficiently; thus, the proposed method 

uses circular grids to account for the effect of working distance and the spatial location of the target 

object. 

To  demonstrate  this,  a  circular  grid  with  a  spacing  of  0.5 mm  was  recorded  at  working 

distances  of  2.87 mm  and  2.24 mm.  Figure 6.2  shows  the  recorded  images.  The  circles  had  a 

constant distance of 0.5 mm from each other. However, in figure 6.2(A) we see as we go from the 

center toward the periphery the distance between consecutive circles decreases from 30 pixels to 

20 pixels. This clearly demonstrates the dependence of horizontal measurements on the spatial 

location. Comparing figures 6.2(A) and 6.2(B) we see the effect of working distance, where the 

 


distance between the two smallest circles increases from 30 pixels to 35.5 pixels when the working 

distance decreases from 2.87 mm (figure 6.2(A)) to 2.24 mm (figure 6.2(B)).

Figure 6.2. Effects of working distance and spatial location on horizontal measurements: (A) working distance of 

2.87 mm, (B) working distance of 2.24 mm. 
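The numbers quoted for figure 6.2 can be turned into local pixel sizes with simple arithmetic. The sketch below assumes only the 0.5 mm grid spacing and the pixel spacings stated in the text; the function and variable names are illustrative.

```python
GRID_SPACING_MM = 0.5  # spacing between consecutive circles of the grid


def local_pixel_size(spacing_px, spacing_mm=GRID_SPACING_MM):
    """Local pixel size (mm per pixel) from the spacing of adjacent circles."""
    return spacing_mm / spacing_px


# Working distance of 2.87 mm, figure 6.2(A)
center_287 = local_pixel_size(30.0)     # near the FOV center
periphery_287 = local_pixel_size(20.0)  # near the FOV periphery

# Working distance of 2.24 mm, figure 6.2(B)
center_224 = local_pixel_size(35.5)

# Relative growth of the pixel size from center to periphery at 2.87 mm
relative_increase = periphery_287 / center_287 - 1.0
```

With these values the pixel size grows from about 0.017 mm to 0.025 mm between the center and the periphery, and the central pixel size shrinks when the endoscope moves closer to the grid.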

 

 

6.3.1. Datasets 

The proposed calibration and subsequent horizontal measurement methods were developed 

and then evaluated based on different sets of benchtop recordings. The setup presented in section 

1.4.1, figure 1.2 with only one degree of freedom was used for data collection. That is, the tilting 

angle was fixed (perpendicular to the imaging axis) and only the working distance was changed. 

This study used four different sets of recordings. Set 1 contained 65 recordings from circular 

grids (figure 6.2) at different working distances. This set was used for training and testing of the 

model converting a pixel length to its mm length. The working distance was gradually increased 

from  2 mm  to  32 mm  and  at  each  working  distance,  a  recording  was  done.  This  process  was 

repeated  three  times  to  reduce  measurement  error.  For  each  recording,  the  grid  was  adjusted 

subjectively inside the FOV such that the largest visible circle had a uniform distance from the 

border of the FOV. Considering the limited spatial resolution, grids became significantly blurry 

after a certain working distance. Hence, three different circular grids with the spacing of 0.5 mm, 

 


1 mm, and 2 mm were used for working distances in the range of [2, 10], [10, 20], and [20, 32] mm, respectively.

The laser source was turned off during these recordings. 

The proposed method requires an accurate estimation of the distance between the tip of the 

endoscope  and  the  target  surface  (i.e.,  the  working  distance).  We  showed  in  chapter  4  that  a 

statistical model can be trained to decode the working distance from locations of the laser points.122 

Set 2 had 72 recordings and was used for the training of this model. For this set, the laser source 

was turned on, and the light source was turned off and recordings were done from a white paper. 

The working distance was gradually increased from 2 mm to 35 mm and at each working distance, 

a recording was done. The recording process was repeated four times to reduce measurement error. 

The proposed method relies on an accurate estimation of a central angle (i.e., an angle that 

has its apex on the center of a circle). However, flexible endoscopy images exhibit significant 

nonlinear  distortions.208  Set 3  was  recorded  to  investigate  possible  effects  of  the  introduced 

nonlinear distortion on central angle measurements. This set was based on a custom-designed grid. 

A circular grid was divided into 24 equal sectors, which created 24 central angles in 15° increments 

(figure 6.3(A)). The grid was recorded at four working distances of 6.16 mm, 13.20 mm, 19.54 mm, 

and 26.44 mm. At each recording distance, the grid was adjusted subjectively inside the FOV such 

that the largest visible circle had a uniform distance from the border of the FOV. This process

ensured that the center of the grid was at the center of the FOV, which guarantees that the

angles estimated from the image are central angles. The laser source was turned off during these

recordings. 

Set 4 was recorded for evaluating the accuracy of the proposed method. Line segments with 

known mm lengths were recorded at fifteen arbitrary locations in the FOV with arbitrary rotations. 

To provide a comprehensive evaluation, a wide range of lengths and working distances were used. 

 


 

Figure 6.3. The data for evaluation of central angle measurement: (A) the custom-designed grid, (B) segmented 

radial lines. 

Specifically, 5 mm, 10 mm, 15 mm, and 20 mm line segments were recorded at a working distance 

of 20.18 mm. These recordings were used to investigate the possible effect of object length on the 

accuracy of the method. Additionally, a 5 mm line segment was recorded at working distances of 

5.12 mm, 9.98 mm, 14.98 mm, and 20.18 mm, which covers the common range of administration 

of fiberoptic laryngeal endoscopy. These recordings were used to investigate the possible effect of 

working distance on the accuracy of the method. The laser source was turned on during these 

recordings. 

6.3.2. Segmentation and preprocessing 

Accurate detection of circular grids is a prerequisite of the proposed calibration method. An 

automatic two-stage method was developed for the segmentation of the circles from Set 1. To take 

advantage  of  the  full  72-dB  dynamic  range  of  the  camera,  recordings  were  imported  into 

MATLAB directly in the native 12-bit format from the proprietary Vision-Research .cine files 

without any conversion or compression. Frames of the recordings were averaged over time and 

then a Gaussian filter with a size of 2 pixels was applied. The center and radius of the FOV were

estimated using a previously described method.122 A strip parallel to the x-axis centered at the center of

the FOV with a width of 9 pixels was selected. The strip was averaged over the rows, and then 

 


locations of its local minima were detected. Detected locations were paired based on their distances 

from the center of the FOV. The average of each pair was used as the coarse estimation of the x-

coordinate of centers of circles. Half of the difference between each pair was used as the row-wise 

estimation of radii of circles. This process was repeated for a strip parallel to the y-axis averaged 

over the columns. The average of each pair was used as the coarse estimation of the y-coordinate 

of  centers  of  circles.  Half  of  the  difference  between  each  pair  was  used  as  the  column-wise 

estimation of radii of circles. The final coarse estimation of the radius of each circle was computed 

as the average of its row- and column-wise radii. A grid search over all combinations of the three 

estimated parameters ±1 pixel with the resolution of 0.25 pixels was used for fine-tuning of the 

estimated parameters. Specifically, for each case, the target parameters were used to create a ring 

mask with a width of 1 pixel. The mask was then applied to the gradient of the image, and the 

summation of the results was used as the cost function. The set of parameters that minimized the 

cost function was selected as the final estimation of the center and radius of each circle. Figure 6.4 

shows the process, with the results on an example image.  

Figure 6.4. Segmentation of a circular grid: (A) horizontal and vertical strips with their respective summations, (B) 

final segmented circles after the fine-tuning stage. 
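The coarse stage of the segmentation can be sketched as follows. This is a minimal illustration, not the implementation used in this study: it assumes that the averaged strip is already available as a 1-D intensity profile, that each circle appears as one strict local minimum on each side of the FOV center, and that minima are paired innermost first. All function names are hypothetical, and the fine-tuning grid search is not covered.

```python
def local_minima(profile):
    """Indices whose value is strictly lower than both neighbours."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]


def pair_minima(minima, center):
    """Pair minima on the left and right of `center`, innermost pair first."""
    left = sorted((m for m in minima if m < center), key=lambda m: center - m)
    right = sorted((m for m in minima if m > center), key=lambda m: m - center)
    return list(zip(left, right))


def coarse_circles(profile, center):
    """Return (center coordinate, radius) of each circle along one strip."""
    pairs = pair_minima(local_minima(profile), center)
    # Pair average -> center coordinate; half of the pair difference -> radius.
    return [((a + b) / 2.0, (b - a) / 2.0) for a, b in pairs]


# Synthetic profile: bright background, dark rings crossing at known positions.
profile = [10.0] * 41
for idx in (5, 12, 28, 35):  # two circles centered at 20 with radii 8 and 15
    profile[idx] = 2.0
circles = coarse_circles(profile, center=20)
```

Running the same function on the column-wise strip would give the y-coordinate and the column-wise radius, which can then be averaged with the row-wise radius as described above.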

 

The segmentation of the laser points for Set 2 was based on a previously described method.122 Target

 

 


objects in Set 3 were 24 radial lines, which were detected using the Hough transform242. Figure 

6.3(B) shows the grid after segmentation. The actual horizontal measurements on laryngeal images 

will rely on the manual segmentation of the target object. To better reflect this characteristic during 

the evaluation, a graphical user interface was developed for manual segmentation of line segments 

from Set 4. 

6.3.3. Horizontal calibration method 

Working distance and spatial location of the target object are the main confounding factors of 

horizontal measurements. Circular grids provide an effective way for the spatial sampling of the 

location  inside  the  FOV.  This  information  can  be  utilized  for  determining  the  dependence  of 

horizontal measurements on the spatial location. Additionally, the grids can be recorded at multiple 

working distances. This information may be utilized for determining the dependence of horizontal 

measurements on the working distance. To that end, all circles from Set 1 were segmented. This 

process led to 612 different data points. Let $w$, $r_{px}(w)$, and $r_{mm}(w)$ denote the working distance,

pixel radius, and mm radius of a circle, respectively, recorded at $w$ mm. Then, a statistical model

can be trained using $w$ and $r_{px}$ as the predictor variables and $r_{mm}$ as the outcome variable. Let $\mathcal{F}_{M,N}$

denote a polynomial model in two variables, $w$ and $r_{px}$, with maximum degrees of M and N,

respectively. Equations 6-3 and 6-4 show the model, where $a_{m,n}$ are some constants determined

during the training process:

$$r_{mm} = \mathcal{F}_{M,N}(w, r_{px})$$   (6-3)

$$\mathcal{F}_{M,N}(w, r_{px}) = \sum_{m=0}^{M}\sum_{n=0}^{N} a_{m,n}\, w^{m}\, r_{px}^{n}$$   (6-4)
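Evaluating such a two-variable polynomial is straightforward once the coefficients are known. The sketch below assumes the coefficients are stored as a nested list indexed by the power of the working distance and the power of the pixel radius; the coefficient values are made up for illustration and are not the trained values of this study.

```python
def eval_poly2d(coeffs, w, r_px):
    """Evaluate sum over m, n of coeffs[m][n] * w**m * r_px**n."""
    total = 0.0
    for m, row in enumerate(coeffs):
        for n, a_mn in enumerate(row):
            total += a_mn * (w ** m) * (r_px ** n)
    return total


# Hypothetical coefficients for M = 1 and N = 2.
coeffs = [
    [0.0, 0.002, 1e-6],  # m = 0: terms in r_px^0, r_px^1, r_px^2
    [0.0, 0.001, 0.0],   # m = 1: the same powers of r_px, multiplied by w
]
r_mm = eval_poly2d(coeffs, w=10.0, r_px=100.0)
```

Fitting the coefficients is a linear least-squares problem, because the model is linear in the coefficients even though it is non-linear in the two predictors.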

  

To select the best model, polynomial models with different degrees were evaluated using 10-fold 

 


cross-validation. The cost function was defined as the mean absolute error (MAE) over all testing 

samples from all folds. The ℱ ,  resulted in the MAE of 0.025 mm, which was the lowest value. 

This model will be referred to as the non-uniform model in the rest of this dissertation. Figure 

6.5(A) presents the trained non-uniform model. 
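The model-selection criterion can be sketched as a small cross-validation routine. The sketch assumes that the MAE is pooled over the held-out samples of all folds, as described above; the shuffling seed, the fold assignment, and the `fit` and `predict` callables are assumptions made for illustration, and the toy mean predictor merely stands in for a polynomial fit.

```python
import random


def k_fold_mae(samples, fit, predict, k=10, seed=0):
    """Mean absolute error pooled over the held-out samples of all k folds.

    samples: list of (x, y) pairs; fit(train) returns a model;
    predict(model, x) returns the predicted y.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    abs_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [samples[i] for i in indices if i not in held]
        model = fit(train)
        abs_errors.extend(abs(predict(model, samples[i][0]) - samples[i][1])
                          for i in held_out)
    return sum(abs_errors) / len(abs_errors)


# Toy usage: a predictor that always returns the training mean.
def fit_mean(train):
    return sum(y for _, y in train) / len(train)


def predict_mean(model, x):
    return model


data = [(float(i), 2.0) for i in range(20)]
mae = k_fold_mae(data, fit_mean, predict_mean, k=10)
```

Repeating this call for each candidate pair of polynomial degrees and keeping the pair with the lowest pooled MAE mirrors the selection procedure described in the text.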

Figure 6.5. Models for horizontal measurements: (A) non-uniform model, (B) uniform model. 

 

 

To highlight the effect of spatial location on horizontal measurements, and also to test H5a, a

second model was trained where all pixels in the FOV had similar pixel sizes. This scenario mimics

horizontal measurement from a parallel-laser projection system. Let $R_{px}(w)$ and $R_{mm}(w)$ denote

the pixel radius and mm radius of the largest circle visible in the FOV recorded at the working

distance of $w$ mm. The uniform pixel size $s(w)$ is defined as

$$s(w) = \frac{R_{mm}(w)}{R_{px}(w)}$$   (6-5)

The uniform pixel size was computed for all recordings in Set 1. Then, a statistical model was

trained using the working distance as the predictor variable and $s$ as the outcome variable.

Investigating the relationship between working distance and $s$ revealed a linear model. This model

is shown in figure 6.5(B) and it will be referred to as the uniform model in the rest of this

dissertation.

6.3.4. Horizontal measurement method 

The application of the uniform model is simple and quite similar to the estimation of a distance 

on a printed map. The pixel size ($s$) allows the conversion from pixel length into mm length.

Considering the dependence of $s$ on the working distance, the following steps were followed for

horizontal measurements using the uniform model. The working distance was estimated from the 

positions of the laser points122; then the appropriate value of the pixel size was computed from the 

uniform model. The pixel length of the target object was measured on the image; then the pixel 

length was multiplied by the pixel size to estimate its mm length.
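The uniform model and its application can be sketched together. The example assumes a straight-line relationship between working distance and pixel size, fitted by ordinary least squares, and it assumes the working distance has already been estimated from the laser points. The calibration triples and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def uniform_length_mm(length_px, w, slope, intercept):
    """Convert a pixel length to mm with one pixel size for the whole FOV."""
    pixel_size = slope * w + intercept  # mm per pixel at working distance w
    return length_px * pixel_size


# Made-up calibration data: (working distance in mm, mm radius, pixel radius)
# of the largest visible circle in each recording.
recordings = [(5.0, 2.0, 100.0), (10.0, 4.0, 100.0), (20.0, 8.0, 100.0)]
ws = [w for w, _, _ in recordings]
sizes = [r_mm / r_px for _, r_mm, r_px in recordings]  # Equation 6-5
slope, intercept = fit_line(ws, sizes)

length_mm = uniform_length_mm(150.0, w=10.0, slope=slope, intercept=intercept)
```

Because the same pixel size is applied everywhere in the FOV, this sketch ignores the dependence on spatial location, which is the limitation the non-uniform model is meant to address.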

The application of the non-uniform model is more involved, and it is described under two 

categories of radial and general measurements. A radial measurement is defined as the length of 

an object that has one of its ends on the FOV center. The non-uniform model was trained using 

circles centered at the FOV center. Therefore, the model can estimate the mm radius of a circle 

centered  at  the  FOV  center,  which  would  be  equivalent  to  a  radial  measurement.  Thus,  the 

following steps were followed for horizontal radial measurements using the non-uniform model. 

The working distance was estimated from the positions of the laser points.122 Then, the pixel length 

of the target radial object was measured on the image. The values of working distance and pixel 

length were fed into the trained non-uniform model (Equation 6-3), and the mm length of the object 

was estimated. 

A general measurement needs to be expressed in terms of radial measurements before the 

application of the non-uniform model. Figure 6.6 shows this process. The main goal is to determine 

the length of the line segment AB in mm. We can construct the triangle AOB on the image, where 

O is the FOV center. Referring to Figure 6.6, OA and OB each have one of their ends at the FOV 

 

center, and hence they constitute radial measurements, and their mm lengths can be computed using the non-uniform model. At the same time, we can measure the angle α from the image. Let $(x_A, y_A)$ and $(x_O, y_O)$ denote the image coordinates of A and O; then, we can determine the angle between OA and the positive x-axis ($\theta_A$) as follows,

\[ \theta_A = \operatorname{atan2}(y_A - y_O,\; x_A - x_O) \tag{6-6} \]

where atan2 denotes the four-quadrant inverse tangent function. The angle between OB and the positive x-axis ($\theta_B$) can also be measured, similarly. Finally, the angle α can be computed as,

\[ \alpha = |\theta_A - \theta_B| \tag{6-7} \]

Now, we can apply the law of cosines for determining the mm length of the line segment AB:

\[ \overline{AB}^{2} = \overline{OB}^{2} + \overline{OA}^{2} - 2\cdot\overline{OA}\cdot\overline{OB}\cdot\cos(\alpha) \tag{6-8} \]

Figure 6.6. Expressing a general measurement in terms of radial measurements. [Diagram: points A and B, the FOV center O, and the angle α; axes x (mm) and y (mm).]
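The construction of Figure 6.6 can be sketched in Python as follows. This is only an illustration: the pixel coordinates and the two radial lengths are hypothetical, and in practice the mm lengths of OA and OB would come from the trained non-uniform model rather than being typed in by hand.

```python
import math


def central_angle(a, b, o):
    """Angle AOB in radians from image coordinates of A, B and the FOV center O."""
    theta_a = math.atan2(a[1] - o[1], a[0] - o[0])
    theta_b = math.atan2(b[1] - o[1], b[0] - o[0])
    return abs(theta_a - theta_b)


def general_length_mm(oa_mm, ob_mm, alpha):
    """Length of AB in mm from the two radial lengths and the central angle."""
    squared = oa_mm ** 2 + ob_mm ** 2 - 2.0 * oa_mm * ob_mm * math.cos(alpha)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Hypothetical pixel coordinates of the FOV center and the two end points.
o = (128.0, 128.0)
a = (228.0, 128.0)
b = (128.0, 28.0)

alpha = central_angle(a, b, o)

# Hypothetical radial lengths (mm), standing in for the non-uniform model output.
ab_mm = general_length_mm(3.0, 4.0, alpha)
print(f"alpha = {math.degrees(alpha):.1f} deg, AB = {ab_mm:.2f} mm")
```

Because the cosine is even and periodic, the absolute difference of the two angles can be used directly even when it exceeds 180°.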

 

6.3.5. Estimation of the working distance 

Referring to Equations 6-3 and 6-5, we see that accurate estimation of the working distance 

is a prerequisite of both uniform and non-uniform methods. The method for estimating the working 

distance has been presented in chapter 4.122 The model assumed that the data were mapped into a 

 

standard  template  by  applying  a  chain  of  rotation,  translation,  and  scaling  operations  on  the 

recorded  images.  The  rotation  operation  was  parametrized  in  terms  of  the  angle  between  the 

positive  x-axis  and  the  line  connecting  the  fiducial  marker  to  the  FOV  center.  The  rotation 

operation brings this angle to a fixed and standard value across all recordings.122 First, we show 

that the performance of the original method depends on the value of this angle; then we propose 

an improved version to alleviate this problem. 

Ten-fold  cross-validation  over  Set 2  was  used  to  evaluate  the  effect  of  different  standard 

angles on the accuracy of estimation of the working distance. To that end, the standard angle of 

the original method122 was varied between 0° and 180° in 5° increments, and then the

segmented laser points from the training set were used to create the model. It has been shown that 

laser points from the top row degrade the accuracy of measurements122; therefore laser points from 

the top row were discarded for this analysis. The trained model was then applied to the testing set, 

and measurement errors from the remaining 42 laser points were computed. The mean absolute error (MAE) over all folds is

shown in figure 6.7. 

Figure 6.7. Mean absolute error (MAE) of the original and the proposed PCA methods for different values of the standard angle. [Vertical axis: MAE (mm).]

Investigation of figure 6.7 shows that the accuracy of the original method highly depends on 

the choice of the standard angle. Principal component analysis (PCA) is a mapping that is robust 

 

to linear transformations of the data points, including their rotation. Consequently, we propose a slight improvement over the original method. Let $(x_{ij}, y_{ij})$ be the cartesian coordinates of the laser point i (1≤i≤49) at the working distance $d_j$. We can store $(x_{ij}, y_{ij})$ for a specific value of i and all values of j (1≤j≤n) into a 2×n data matrix $P_i$, where n is the number of working distances in the dataset. Let $\bar{x}_i$ and $\bar{y}_i$ denote the average values of $P_i$ over the first and the second row, respectively. Now, we can center the data and construct the matrix $Q_i$. Equations 6-9 through 6-11 show these definitions. $\mathbf{1}_n$ is a column vector containing 1 in all of its n rows.

\[ \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij} \tag{6-9} \]

\[ \bar{y}_i = \frac{1}{n}\sum_{j=1}^{n} y_{ij} \tag{6-10} \]

\[ Q_i = P_i - \begin{bmatrix} \bar{x}_i \\ \bar{y}_i \end{bmatrix} \cdot \mathbf{1}_n^{T} \tag{6-11} \]

Now the direction capturing most of the variance of the data ($v_i$) can be computed as,

\[ v_i = \underset{\|v\|=1}{\arg\max}\; \left( v^{T} \cdot Q_i Q_i^{T} \cdot v \right) \tag{6-12} \]

The first principal component ($z_i$) would be the projection of the data points on the direction $v_i$ and is computed as,

\[ z_i = v_i^{T} \cdot Q_i \tag{6-13} \]

Now, the first principal component may be used to train the vertical calibration model. Let $z_{ij}$ denote the j component of the vector $z_i$ (i.e. projection of the point $(x_{ij}, y_{ij})$ in direction $v_i$).

  =          (6-14)

       =          +             (6-15)

Equations 6-14 and 6-15 are repeated for each laser point i.
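The centering and projection steps (Equations 6-9 through 6-13) can be sketched for a single laser point as follows. This is a minimal illustration under stated assumptions: the trajectory of the laser point is hypothetical, the function names are ours, and the closed-form orientation of the dominant eigenvector of a 2×2 symmetric matrix is used in place of a general eigen-decomposition. The sketch also checks that rotating the input points changes the projected values by at most a global sign.

```python
import math


def first_principal_component(points):
    """Return (mean, unit direction, scores) for a list of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 scatter matrix built from the centered points.
    sxx = sum(x * x for x, _ in centered)
    syy = sum(y * y for _, y in centered)
    sxy = sum(x * y for x, y in centered)
    # Orientation of the direction of largest variance (closed form for 2x2).
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))
    # Projection of every centered point onto that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return (mean_x, mean_y), direction, scores


def rotate(points, degrees):
    """Rotate 2-D points about the origin by the given angle."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical positions of one laser point at five working distances.
track = [(10.0, 4.0), (12.0, 5.1), (14.0, 5.9), (16.0, 7.0), (18.0, 8.1)]

mean, direction, scores = first_principal_component(track)
_, _, rotated_scores = first_principal_component(rotate(track, 35.0))

print("scores:        ", [round(s, 3) for s in scores])
print("rotated scores:", [round(s, 3) for s in rotated_scores])
```

The sign of the direction vector is arbitrary in PCA, so a practical implementation would need a fixed sign convention before the scores are passed to the calibration model.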

Ten-fold  cross-validation  over  Set 2  was  used  to  evaluate  the  effect  of  different  standard 

 

angles on the accuracy of the improved model. The standard angle was varied between 0° and 180° in 5° increments, and for each value the training set was used to estimate the mean coordinates of each laser point, the principal direction, and the parameters of the model (Equations 6-14 and 6-15). The trained model was then applied to the testing set

and measurement errors were computed. Figure 6.7 shows the computed MAE of the proposed 

method over all folds. This figure shows the robustness of the improved method to variations in 

standard  angle.  Experiment  1  in  the  next  section  presents  the  performance  of  the  proposed 

improved method in more detail. 

6.4. Experiments and results 

Four  experiments  were  conducted  to  answer  the  research  questions  of  this  chapter. 

Experiment 1 presents the performance of the vertical measurement. Experiment 2 quantifies the 

accuracy of horizontal radial measurements. Experiment 3 presents the performance of central 

angle  estimation  from  recorded  images.  Experiment 4  tests  the  performance  of  the  proposed 

method for general horizontal measurements. This section presents details of each experiment, 

followed by results and related discussions. 

6.4.1. Experiment 1: Accuracy of vertical measurements 

The accuracy of the improved vertical measurement model (Equations 6-14 and 6-15) was 

compared with the original method122 using 10-fold cross-validation. The original method used the 

value of 30° for the standard angle. At this angle, the grid becomes a square that is parallel to the 

x- and y-axes, which facilitates the labeling of the laser points. The same standard angle was also used

for the proposed improved version. Recordings from Set 2 were split into training and testing sets. 

Both models were trained using the training set. The trained models were then applied to the testing 

set and measurement errors were computed. First, the effect of different laser points on the error 

 

was investigated. MAE was computed for each laser point averaged over all working distances. 

Figure 6.8(B) shows the result. Based on this figure we see different laser points exhibit different 

performances  in  the  original  method,  where  the  top-row  laser  points  produce  inferior  results. 

Conversely, all laser points exhibit comparable performances in the improved PCA method. A 

second  analysis  was  conducted  to  test  the  effect  of  working  distance  on  the  accuracy  of  both 

methods. For this analysis, laser points from the top row were discarded from the original method 

and only the remaining 42 laser points were used. However, all 49 laser points were used for the 

analysis of the improved PCA method. Figure 6.8(C) shows the results. The lines represent the 

linear model fitted on the individual data points. We can use the slopes of the regression lines to

compare how the magnitude of error of each method changes with the working distance. Slopes of the original

and  PCA  methods  were  0.008 mm/mm  and  0.001 mm/mm,  respectively.  Therefore,  we  may 

conclude that the performance of the improved PCA method is less dependent on the working 

distance. 

Figure 6.8. Performance of estimating the working distance: (A) indexing of the laser points, (B) measurement accuracy of different laser points, (C) effect of working distance. [Panel (A) numbers the 49 laser points from 1 to 49 in groups of seven.]

 

 

6.4.2. Experiment 2: Performance of radial horizontal measurements 

The accuracy of the uniform model for radial measurement was evaluated using 10-fold cross-

validation. To that end, Set 1 recordings were split into training and testing sets. The uniform 

 

model was trained using the largest enclosed circles of the training set. The trained uniform model 

was then evaluated for estimating mm radii of all circles from the testing set, in addition to smaller 

circles (those that were not used during the training process) of the training set. Figure 6.9 presents 

scatter plots of absolute errors of all folds versus the radial length of the target circle and the 

working distance. 

Figure 6.9. Performance of uniform model for radial measurements: (A) effect of object length, (B) effect of working distance.

Investigating scatter plots of figure 6.9 reveals that the measurement error of the uniform model 

depends on the working distance and the length of the target object. However, the relationship 

seems to be non-linear. Additionally, our analysis showed that neither of the variables had a normal 

distribution. Therefore, both parametric and non-parametric tests were used to quantify the effect 

of working distance and length of the object on the magnitude of the error. Table 6.1 reports the 

values of Pearson's r, Kendall's τ, and Spearman's ρ.  

Table 6.1. Correlation coefficients of the uniform model for radial measurement error. The symbol ε denotes p<0.0001.

Parameter           Pearson's r   p    Kendall's τ   p    Spearman's ρ   p
Radial length       0.59          ε    0.56          ε    0.69           ε
Working distance    0.76          ε    0.57          ε    0.74           ε

Based on Table 6.1, we see a moderate positive correlation between the magnitude of error and 

 

 

 

length of the target object and a strong positive correlation between the magnitude of the error and 

the working distance. 
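The three coefficients reported in Table 6.1 can be illustrated with a small Python sketch. It is not the analysis code of this chapter: the data are hypothetical, the p-values are not computed, and Kendall's coefficient is implemented in its simplest form (tau-a, which assumes no ties), which may differ from the variant used by a statistical package.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(xs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            total += (product > 0) - (product < 0)
    return total / (n * (n - 1) / 2)


# Hypothetical data: the error magnitude grows non-linearly with distance.
distance = [5.0, 10.0, 15.0, 20.0, 25.0]
abs_error = [0.10, 0.15, 0.30, 0.60, 1.20]

print("Pearson r   :", round(pearson_r(distance, abs_error), 3))
print("Spearman rho:", round(spearman_rho(distance, abs_error), 3))
print("Kendall tau :", round(kendall_tau_a(distance, abs_error), 3))
```

For a monotonic but non-linear relationship such as this one, the rank-based coefficients reach their maximum while Pearson's r stays below it, which is why both kinds of coefficients are reported.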

The  non-uniform  model  was  trained  using  the  training  set,  and  then  its  performance  for 

estimating the mm radii of circles was evaluated using the testing set. Figure 6.10 presents scatter 

plots of absolute errors of all folds versus the radial length of the target circle and the working 

distance. 

Figure 6.10. Performance of non-uniform model for radial measurements: (A) effect of object length, (B) effect of working distance.

 

Table 6.2 quantifies the effect of the radial length of the object and working distance on the 

magnitude of error from the non-uniform model. Based on Table 6.2, we see the magnitude of 

error has very weak associations with the working distance and length of the target object.

Table 6.2. Correlation coefficients of the non-uniform model for radial measurement error. The symbol ε denotes p<0.0001.

Parameter           Pearson's r   p    Kendall's τ   p        Spearman's ρ   p
Radial length        0.16         ε     0.12         ε         0.17          ε
Working distance    -0.14         ε    -0.08         0.003    -0.13          0.001

 

Comparing the results of tables 6.1 and 6.2 highlights a primary advantage of the non-uniform 

method  over  its  uniform  counterpart.  Specifically,  the  non-uniform  method  has  a  stable  and 

 

relatively constant error for a wide range of working distances and target lengths. Additionally, comparing figures 6.9 and 6.10, we see the non-uniform method reduces measurement error significantly. To better quantify this, the range of working distance was divided into separate intervals. The average and standard deviation of error and magnitude of error for both uniform and non-uniform methods were calculated in each interval. Table 6.3 presents the results. Based on this table we see another advantage of the non-uniform approach. The average value of error in the non-uniform method is almost zero; therefore, measurement error using the non-uniform approach has a random nature. Thus, averaging multiple radial measurements can reduce the error significantly. Conversely, the average error in the uniform approach is not zero, indicating the systematic nature of the error. Finally, the average magnitude of error of the non-uniform method is roughly 7 to 46 times smaller than that of the uniform approach, depending on the working distance. This result confirms a recent finding suggesting the presence of significant errors in horizontal measurements if nonlinear distortion of fiberoptic endoscopy is not compensated.208

Table 6.3. Accuracy of radial measurements from the uniform and the non-uniform models in different ranges of working distance.

                    Non-uniform                               Uniform
Working distance    Error (mm)       Magnitude of error (mm)  Error (mm)       Magnitude of error (mm)
interval (mm)       mean     std     mean     std             mean     std     mean     std
(0, 5)               0.003   0.039   0.029    0.026           -0.192   0.077   0.192    0.075
[5, 10)             -0.012   0.049   0.04     0.031           -0.351   0.151   0.352    0.15
[10, 15)             0.02    0.028   0.025    0.024           -0.489   0.217   0.492    0.21
[15, 20)            -0.005   0.02    0.015    0.015           -0.692   0.303   0.693    0.299
[20, 25)             0.001   0.031   0.022    0.022           -0.955   0.352   0.956    0.347
[25, 30)            -0.001   0.039   0.029    0.026           -1.159   0.476   1.162    0.47
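The per-interval summary of Table 6.3 can be sketched in Python as follows. This is an illustration with hypothetical errors, not the data of this chapter; it assumes half-open intervals and the sample standard deviation, and the function name is ours. Two error sets are used to contrast a zero-mean (random) error with a biased (systematic) one.

```python
from statistics import mean, stdev


def interval_statistics(distances, errors, edges):
    """Mean and std of the signed error and of its magnitude per interval.

    Each interval is half-open: edges[k] <= distance < edges[k + 1].
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        selected = [e for d, e in zip(distances, errors) if low <= d < high]
        if len(selected) < 2:
            continue  # a sample standard deviation needs two or more values
        magnitudes = [abs(e) for e in selected]
        rows.append({
            "interval": (low, high),
            "error_mean": mean(selected),
            "error_std": stdev(selected),
            "magnitude_mean": mean(magnitudes),
            "magnitude_std": stdev(magnitudes),
        })
    return rows


# Hypothetical working distances (mm) and two kinds of measurement error (mm).
distances = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]
zero_mean_errors = [0.02, -0.02, 0.0, 0.03, -0.03, 0.0]
biased_errors = [-0.18, -0.20, -0.22, -0.33, -0.35, -0.37]

random_rows = interval_statistics(distances, zero_mean_errors, [0.0, 5.0, 10.0])
biased_rows = interval_statistics(distances, biased_errors, [0.0, 5.0, 10.0])

for row in random_rows + biased_rows:
    print(row["interval"], round(row["error_mean"], 3), round(row["magnitude_mean"], 3))
```

When the mean signed error is close to zero while the mean magnitude is not, the error behaves as random; when the two have nearly the same absolute value, the error is dominated by a systematic bias.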

6.4.3. Experiment 3: Performance of central angle estimation 

Equation 6-8 is at the core of general calibrated measurements using the non-uniform model 

and relies on the angle α. Experiment 3 was conducted to investigate the accuracy of the estimation 

of α from an image. This experiment is especially important, given the presence of non-linear 

 

distortion in flexible endoscopy.208 Angle differences between adjacent lines from Set 3 (figure 

6.3) were estimated, and then they were subtracted from their true value (i.e. 15°). Figure 6.11 

presents boxplots of this error for different working distances. Running a one-way analysis of 

variance  (ANOVA)  did  not  indicate  any  significant  effect  of  working  distance.  Therefore,  all 

measurement errors were combined into a single group. The overall angle estimation error had the 

value of −0.03° ± 0.6° (average±std). Consequently, central angles can accurately be estimated 

from acquired images. This result may seem to contradict a previous finding suggesting

significant  errors  in  the  estimation  of  angles  from  flexible  endoscopes248,  and  hence  requires 

further explanation. The proposed method relies on central angles; however, the work of 248 was 

based on a general angle. Considering the radial nature of the non-linear distortion, lines passing 

through the center do not experience bending and curving. Therefore, the central angles can be 

measured very accurately. 
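The argument above can be illustrated numerically. The sketch below assumes a purely radial distortion of the form r → r·(1 + k·r²) about the FOV center, which is a common simplified model and not necessarily the distortion of the endoscope used here; the coefficient and all coordinates are hypothetical. Under this assumption, a central angle is unchanged by the distortion, whereas an angle whose vertex is away from the center is not.

```python
import math


def radial_distort(point, center, k):
    """Move a point along its ray from the center: r -> r * (1 + k * r**2)."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    scale = 1.0 + k * (dx * dx + dy * dy)
    return (center[0] + scale * dx, center[1] + scale * dy)


def angle_at(vertex, p, q):
    """Angle p-vertex-q in degrees, in the range [0, 180]."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    diff = abs(a1 - a2)
    return math.degrees(min(diff, 2.0 * math.pi - diff))


center = (0.0, 0.0)
k = -0.002  # hypothetical barrel-type distortion coefficient

# Two points seen from the center under a true central angle of 15 degrees.
a = (10.0, 0.0)
b = (5.0 * math.cos(math.radians(15.0)), 5.0 * math.sin(math.radians(15.0)))
# A third point used as an off-center vertex for a general angle.
c = (2.0, 6.0)

a_d, b_d, c_d = (radial_distort(p, center, k) for p in (a, b, c))

central_before = angle_at(center, a, b)
central_after = angle_at(center, a_d, b_d)
general_before = angle_at(c, a, b)
general_after = angle_at(c_d, a_d, b_d)

print(f"central angle: {central_before:.2f} -> {central_after:.2f} deg")
print(f"general angle: {general_before:.2f} -> {general_after:.2f} deg")
```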

Figure 6.11. Boxplot of angle estimation error computed from Set 3. [Horizontal axis: working distance (mm) at 6.16, 13.2, 19.54, and 26.44; vertical axis: angle error (degree).]

 

6.4.4. Experiment 4: Performance of general horizontal measurements 

Set 4  was  used  to  compare  the  accuracy  of  uniform  and  non-uniform  models  for  general 

horizontal measurements. Both models were trained with all data points from Set 1. Additionally, 

Set 4 was recorded in the presence of laser points. Therefore, the required working distance was 

 

estimated using the improved PCA method (Equations 6-14 and 6-15). 

To  investigate  the  effect  of  working  distance  on  general  horizontal  measurement  in  the 

uniform model, measurement errors from a 10 mm line segment recorded at working distances of 

6.16 mm, 13.20 mm, 19.54 mm, and 26.44 mm were computed. One-way ANOVA with a trimming

level of 0.2 and 1000 bootstrap samples252 was non-significant (p=0.61). Figure 6.12(A) presents

boxplots of errors for different working distances. To investigate the effect of the length of the target

object on general horizontal measurement in the uniform model, measurement errors from 5 mm, 

10 mm, 15 mm, and 20 mm line segments recorded at the working distance of 26.44 mm were

computed. One-way ANOVA with a trimming level of 0.2 and 1000 bootstrap samples252 was

non-significant (p=0.22). Figure 6.12(B) presents boxplots of errors for target objects with different

lengths. Considering these non-significant results, all measurement errors were combined into a 

single group. The overall measurement error was −0.8±0.69 mm, and the magnitude of error was 

0.86±0.6 mm for the uniform method. 

Figure 6.12. Performance of uniform model for general measurements: (A) effect of working distance, (B) effect of object length.

A similar approach was followed for the non-uniform method. Figure 6.13 presents boxplots

of errors for this analysis. The effects of working distance (p=0.64) and length of the target object 

(p=0.43) were non-significant. Considering these non-significant results, all measurement errors 

 

were combined into a single group. The overall measurement error for the non-uniform method 

was −0.2±0.29 mm, and the magnitude of error was 0.27±0.24 mm.

Figure 6.13. Performance of non-uniform model for general measurements: (A) effect of working distance, (B) effect of object length.

 

 

Comparing boxplots and average errors of both methods indicates that the errors of the uniform approach are, on average, about three times larger than those of the non-uniform method. These results demonstrate the advantage of the proposed non-uniform approach. Investigation of boxplots of figure 6.12 may indicate a general trend for errors of the uniform method. Specifically, the measurement error seems to increase with the working distance and length of the target object. In experiment 2 we saw a strong positive correlation between the uniform method error and the working distance, and a moderate positive correlation with the length of the target object, which is consistent with this subjective observation. However, the objective analysis of ANOVA failed to detect a significant trend. A possible explanation is the difference in measurement resolution between the two experiments. Experiment 2 relied on the detection of circular shapes. This specific geometry enabled us to achieve sub-pixel resolution on measuring the length of target objects (i.e. radii of circles). However, experiment 4 was based on the detection of lines, which has the resolution of a pixel. Investigation of the performance of the non-uniform method also supports this. Specifically, the non-uniform method showed very weak correlations in experiment 2 (Table 6.2). Therefore, we may expect to see a negligible trend for experiment 4, which subjective observation of figure 6.13 confirms.

 

6.5. Discussion 

The phonatory mechanism of the larynx is the primary voice production system in humans. It 

can be modeled as a dynamic system that takes the air stream as its input and produces an acoustic

signal as its output. The parameters of this dynamic system (e.g. vocal fold length, glottal

configuration, etc.) determine the relationship between its input and output. If we could measure 

and determine the input, the output, and the parameters of the system on calibrated scales, we 

would  be  able  to  express  and  model  this  dynamic  system  using  mathematical  equations.  The 

method  for  measuring  the  input  and  output  of  this  system,  in  particular  for  clinical  voice 

assessment,  has  a  long  history.13  The  calibrated  measurement  of  parameters  of  the  phonatory 

system would help in achieving a more comprehensive physical model of voice production. This 

chapter presented a method that can measure spatial parameters of the phonatory mechanism on a 

calibrated scale (i.e. mm). It is expected that prospective horizontal measurements would improve 

our understanding of the function of normal and disordered phonatory mechanisms. Additionally, 

it could enable us to derive computational models tuned to each patient and hence make reliable 

predictions about the likely outcome of different treatment options. This computational approach 

would advance personalized medicine in the fields of laryngology and speech-language pathology. 

Last but not least, calibrated horizontal measurements could allow us to make a direct

evaluation of therapy efficacy (e.g. post-therapy reduction in the lesion size). The results of such 

prospective studies would advance evidence-based practice in the field of voice.  

This chapter provided the method for horizontal calibration and measurements from a laser-

projection transnasal fiberoptic HSV system, followed by a detailed analysis of its performance in 

different  conditions  and  scenarios.  Flexible  endoscopy  images  have  significant  non-linear 

distortions, which leads to the dependence of the pixel length of an object on its spatial location.208 

 

Chapter 5 established the radial symmetry of this distortion208; hence, the proposed calibration 

protocol was based on circular grids. The proposed non-uniform method is capable of

capturing and quantifying the effects of both working distance and spatial location simultaneously. 

To  demonstrate  the  efficacy  of  the  proposed  method,  its  performance  was  contrasted  with  a 

uniform approach, which assumed the independence of the pixel size of an image from its spatial 

location. Such a uniform model is the basis of most existing methods for horizontal measurements,

including all parallel laser projection systems.122  

The  conducted  experiments  revealed  several  significant  advantages  for  the  non-uniform 

approach over its alternative uniform counterpart. Specifically, the analysis of figures 6.9 and 6.10 

showed that the accuracy of radial measurements (experiment 2) using the non-uniform method 

was less dependent on the length of the target object and the working distance. For example, based 

on  table 6.3  we  see  the  average  magnitude  of  error  in  the  non-uniform  case  does  not  change 

significantly when the working distance increases from 5 mm to 30 mm. However, the average

magnitude of error in the uniform case increases about six-fold (from 0.19 mm to 1.16 mm). The average±std magnitude of

error in the uniform approach over the range of tested working distances was 0.68±0.45 mm. The

average±std  magnitude  of  error  in  the  non-uniform  approach  over  a  similar  range  of  working 

distance was 0.03±0.03 mm which further highlights the advantage of the proposed non-uniform 

method.  

Evaluation of both methods in the general measurement scenario (experiment 4) showed trends

similar to radial measurements. Specifically, figure 6.12 indicates that the accuracy of the uniform 

approach degrades with an increase in the length of the target object, whereas figure 6.13 does not 

show any trends for the non-uniform approach. When the length of the target object increases, it 

spans  a  wider  spatial  location  in  the  FOV.  Considering  that  non-linear  distortion  of  flexible 

 

endoscopy is spatially-dependent208, this may translate into a larger distortion of the final image. 

Therefore,  we  may  expect  to  see  a  length-dependent  error  for  the  uniform  approach.  It  is 

noteworthy that this dependence did not reach the significance level, which could be attributed to 

the small sample size and low spatial resolution of images. The average±std magnitude of error in the

general measurement scenario was 0.27±0.24 mm for the non-uniform method and 0.86±0.6 mm for the

uniform method; that is, the error of the uniform method was about 3.2 times that of the non-uniform method.

6.6. Conclusion 

This  chapter  was  motivated  by  the  importance  of  performing  calibrated  (i.e.  mm)  spatial 

measurements of the vocal folds and the surrounding laryngeal structures during phonation. Such 

measurements  would  improve  our  understanding  of  the  normal  and  disordered  phonatory 

mechanisms  and  enable  us  to  derive  more  accurate  computational  models.  It  is  expected  that 

evidence-based practice and personalized medicine would benefit significantly from this line of 

research. However, the size of a target object in laryngeal images may depend on confounding 

factors, which prevents calibrated spatial measurements. This chapter investigated the effects of 

two confounding factors, namely the working distance and the spatial location of the target object. 

To that end, a set of circular grids was recorded at multiple working distances. These grids

provided an efficient way of quantifying the effect of both factors. The information from these 

recordings was then used to train a statistical model that would take the spatial location and the 

working distance of the target object as the input, and estimate the calibrated length of the target 

object as the output. A laser projection fiberoptic endoscope was used to estimate the working 

distance  from  the  positions  of  the  laser points.  The  performance  of  the  proposed  method  was 

investigated in different scenarios. The method was also compared with a uniform model approach, 

where the effect of spatial location is not considered. The overall measurement error from the 

 

proposed method was −0.2±0.29 mm, and the magnitude of error was 0.27±0.24 mm. These errors 

were more than three times lower than the uniform model approach.  

 

 

CHAPTER 7: VALIDITY AND ACCURACY OF HORIZONTAL AND VERTICAL 

MEASUREMENTS BASED ON DIRECT CALIBRATION 

 

Based on: 

Ghasemzadeh  H.,  Deliyski  D.  D.,  et  al.  External  validity  of  calibrated  vertical  and  horizontal 
measurements from a laser-projection fiberoptic transnasal endoscope, in Preparation. 
 

 

Summary: Methods using laser-projection endoscopes allow for calibrated surface measurements. 

The  design  and  evaluation  of  these  methods  are  typically  done  in  controlled  settings,  using 

benchtop recordings. However, many factors could be contributing to measurement errors from 

in-vivo images. This chapter investigates the effect of two such factors: imaging angle and surface 

topology.  A  laser-projection  fiberoptic  flexible  endoscope  was  calibrated  using  benchtop 

recordings from flat surfaces (i.e. paper), perpendicular to the optical axis. Two experiments were 

conducted to evaluate its performance in situations modelling the in-vivo settings. (1) Images were 

acquired from tilted surfaces. (2) A target surface with known x-, y-, z-coordinates was 3D-printed, 

and its measurement accuracy was contrasted with that of the flat surface. The data analysis showed a

significant effect of imaging angle on vertical measurement error. However, the effect of imaging 

angle on the magnitude of horizontal measurement error was not significant. Analysis of the effect 

of surface topology showed the reverse effects. The effect of surface type on vertical measurement 

error was not significant. But the magnitude of horizontal measurement errors from the 3D surface

was significantly higher than that from the flat surface. The mean percent magnitude of horizontal

measurement error increased from 5% (flat) to 10.6% (3D) at the working distance of 15 mm, 

which still represents satisfactory accuracy. 

 

7.1. Introduction 

Imaging techniques are an important part of the functional assessment of voice and diagnosis 

of  voice  disorders.167–171  Previous  studies  have  suggested  that  vibratory  characteristics  and 

kinematic  measures  from  laryngeal  images  can  be  used  for  direct  evaluation  of  treatment 

outcomes.35,254 These applications would benefit significantly from the ability to perform

calibrated spatial measurements from the acquired images. Chapters 4 and 6 of this dissertation 

presented the methods for calibrated vertical and horizontal measurements from a laser-projection 

fiberoptic  HSV  system.  However,  the  methods  were  developed  based  on  benchtop  images, 

recorded  in  a  very  controlled  setting.  Specifically,  images  were  acquired  from  flat  surfaces 

perpendicular to the optical axis of the endoscope. Considering that the in-vivo environment is 

uncontrolled and involves many variable factors, the performance of the proposed system may degrade

significantly. The main aim of this chapter is to investigate how performances of the proposed 

vertical and horizontal measurements change as we move from the simple and controlled settings 

to  more  complex  cases.  To  achieve  this,  the  performance  of  the  system  is  evaluated  in  two 

scenarios. First, vertical and horizontal measurement errors are evaluated on flat surfaces that are 

not perpendicular to the optical axis. This analysis will quantify the effect of imaging angle on the 

accuracy of horizontal and vertical measurements. Second, vertical and horizontal measurement 

errors are evaluated on a 3D surface. This analysis will quantify accuracy of horizontal and vertical 

measurements on non-flat surfaces. 

In order to study the effect of variations in the imaging angle, first, we need to know the 

typical range of variations of this parameter. Reviewing the literature indicated that no rigorous 

study  on  the  normative  variations  of  the  imaging  angle  during  VSB  or  HSV  has  been  done. 

However, a crude estimation could be made based on two different studies.204,255 A single-subject 

 

study with a flexible endoscope indicated that variations up to 30° in the imaging angle could be 

expected.204  The  second  study  was  on  differences  in  motions  of  a  laryngoscope  during 

endotracheal intubations between an expert, an intermediate-skilled practitioner, and a novice.255 

This study found 10° variations in the angle of the laryngoscope during the time that practitioners 

were trying to hold the view constant for placing the tube. Considering the very low sample size 

of the first study (n=1)204, and significant differences between the two laryngeal procedures (i.e.

endotracheal intubation vs. laryngeal endoscopy) in the second study255, no clear conclusion can 

be made about the range of variability in the imaging angle. One possible reason for this gap could be

the lack of quantitative evaluation of the effect of imaging angle on objective measurements and

subjective visual assessment of laryngeal images, which we tried to address in chapter 5. It is

noteworthy that the topic of imaging angle has, in general, received little attention. Hibi and

colleagues were probably among the first to investigate the effect of imaging angle on

the  endoscopic  images.204  They  showed  that  distortions  of  endoscopic  images  significantly 

increase with an increase in the deviation of the imaging axis from the perpendicular angle.204 

Distortions as high as 20% were reported for a 30° deviation in the imaging angle. The other result 

is from a very recent work that used synthetic vocal folds to investigate the effect of different 

parameters of HSV recordings on the accuracy of the estimated subglottal air pressure and the 

cricothyroid activation from the glottal area waveform.205 This work suggested that the imaging 

angle was the most influential factor in the estimation of parameters of the model. Based on this 

study, a 10° change in the imaging angle could lead to a 10% error in the estimation of subglottal

air pressure from the glottal area waveform.205  

 

Considering the relevance and importance of horizontal and vertical measurements for the 

field of voice, it is expected for the quantitative results of this chapter to provide significant insights 

into the accuracy and reliability of the proposed measurement methods.  

7.2. Aim and hypothesis 

The main aims of this chapter are to investigate the effect of imaging angle on vertical and 

horizontal measurement errors from the laser-calibrated  endoscope and to investigate the effect 

of  non-flat  surfaces  on  vertical  and  horizontal  measurement  errors  from  the  laser-calibrated  

endoscope. The main research questions of this chapter are: 

Q6a: 

How does the imaging angle affect the performance of the vertical and horizontal 

measurements? 

Q6b: 

How does the topology of a 3D surface affect the vertical and horizontal measurements? 

To answer these research questions four hypotheses were formed that are presented here. 

The vertical distance of each individual laser point to the camera could be estimated using the 

method developed in chapter 4. In that regard, the method for vertical measurement may not have 

a direct dependency on the tilting angle of the target surface. However, changes in the imaging 

geometry  would  likely  lead  to  changes  in  the  shape  of  the  laser  points.  Our  initial  visual 

observations have suggested that shapes of laser points change from circles to ellipses when the 

target surface is tilted. Considering that vertical distances were measured based on the position of 

the circular estimation of the laser points, it is very likely for the vertical measurement accuracy 

to be affected by the imaging angle too. Based on this rationale, it is hypothesized that, 

H6a: 

The  tilting  angle  of  the  target  surface  and  the  working  distance  will  be  good 

predictors of the vertical measurement error. 

 


Chapter 5 investigated the effect of imaging angle on the distortion of a fiberoptic flexible 

endoscope. We showed that variations in the imaging angle could have a significant effect on non-

calibrated horizontal measurements. Figure 7.1 shows a schematic of the imaging system when the 

target surface is tilted. Specifically, the points on one side of the surface would get closer to the 

camera (e.g. object B), whereas the points on the other side of the surface would get further away 

from the camera (e.g. object A).  

Figure 7.1. Imaging from a tilted surface: (A) effect of tilting the target surface on different objects within the FOV, 
 

(B) effect of tilting the target surface on the geometry of the FOV.  

The  uniform  and  non-uniform  pixel-to-mm  conversion  scales  were  the  basis  of  calibrated 

horizontal  measurement  from  the  laser-calibrated  endoscope  (chapter  6).  Those  models  were 

developed based on a perpendicular imaging angle assumption and its corresponding geometry. 

However, that assumption does not reflect the geometry of the imaging system when the surface 

is tilted (figure 7.1(B)). Therefore, it is expected for this discrepancy to manifest as measurement 

errors. Combining the likely effect of imaging angle and the working distance on the measurement 

error, it is hypothesized that, 

H6b: 

The  tilting  angle  of  the  target  surface  and  the  working  distance  will  be  good 

predictors of the horizontal measurement error. 

 


The  topology  of  the  target  surface  could  also  be  a  major  contributor  to  measurement 

errors. Therefore, it is hypothesized that: 

H6c: 

The vertical measurement errors from a non-flat surface will be higher than those 

from a flat surface positioned at the same estimated average vertical distance. 

H6d: 

The horizontal measurement errors from a non-flat surface will be higher than those from a flat 

surface positioned at the same estimated average vertical distance. 

7.3. Material and method 

To investigate the effect of imaging angle and the 3D shape of the target surface on horizontal 

and vertical measurement errors, different sets of benchtop recordings were collected. Considering 

the significant differences between the imaging angle and the 3D shape, the protocols for their data 

collection, and the methods for their data analysis were different and hence are described in two 

different sections. 

7.3.1. Material and method for the effect of the imaging angle 

7.3.1.1. Data acquisition 

Different  sets  of  benchtop  recordings  were  used  to  pursue  the  aims  of  this  chapter.  The 

datasets were recorded using the same setup presented in section 1.4.1, figure 1.2. However, it had 

a major difference with the recordings of chapters 4 and 6. Recordings of those chapters were 

carried out using a setup with only one degree of freedom. That is, only the working distance was 

varied, but the tilting angle was fixed (perpendicular to the imaging axis). However, the setup for 

this chapter used two degrees of freedom.  

 


First, we show that tilting the target surface and changing the imaging angle have comparable 

effects on the geometry of the imaging system. Referring to figure 7.2, we see two conditions. In 

the first condition, the target surface is fixed (surface S1) and the camera is rotated. In this case, 

the camera is perpendicular to S1 (position B); however, when the camera rotates by a given angle (position 

A), it becomes perpendicular to a different surface (S2) that is, in fact, the rotation of S1 by the same 

angle. Considering that in our setup (section 1.4.1, figure 1.2) the tilting angle of the target surface 

can be adjusted more conveniently and also more accurately, the target surface was tilted in this 

chapter. 

Figure 7.2. The effect of tilting the target surface vs. changing the imaging angle. 

 

 

7.3.1.2. Database 

Flexible endoscopes are equipped with a control handle that can bend their distal end. This 

feature enables the operator to change the FOV during the endoscopy. However, this feature means 

that the imaging angle of the system (at least) relies on the position of this handle. In other words, 

when the handle is at rest the optical axis may not be perpendicular to the target surface. To make 

reliable and accurate predictions about the likely effect of the imaging angle, this factor should be 

accounted for. In chapter 5 we showed that when the optical axis is perpendicular to the target 

 


surface, the endoscope has radial symmetry. Furthermore, we saw that when the imaging angle 

deviates, the radial symmetry is disturbed significantly. Therefore, we may use the circular grid 

from chapter 6 to make the optical axis perpendicular. Figure 7.3(A and B) shows recordings from 

a circular grid at similar working distances, but opposite directions of tilting angle. Based on these 

images, we see when there is a tilting angle, the center of the circles moves away from the center 

of the image, and also the circles become ellipses. Additionally, as we predicted in chapter 5, the 

direction of the movement depends on the direction of the tilting angle. This observation could be 

utilized for achieving a perpendicular imaging angle.  

Figure 7.3. Recordings from a circular grid at the working distance of 8.66 mm: (A) the tilting angle of 15°, (B) the 

tilting angle of -15°, (C) tilting angle of 0° after making the endoscopic tip perpendicular to the target surface. 

 

The following procedure was followed to make the optical axis perpendicular. The target 

surface was leveled using a leveler. The coordinates of the center of the FOV were computed using 

the method described in section 4.3.3.1.2. A circular grid was attached to the metal sheet of the 

setup (figure 1.2), and then it was adjusted subjectively inside the FOV such that the largest visible 

circle had a uniform distance from the border of the FOV. The distal tip of the endoscope was 

passed  through  a  mechanism  that  allowed  its  displacement  in  the  left-right  and  front-back 

directions (figure 7.4). The distal tip was displaced until the center of the circular grid (the + mark 

 


in figure 7.3) coincided with the center of the FOV. At this point, the optical axis of the endoscope 

would be perpendicular to the target surface. This position was fixed by tightening the fixtures in 

the displacement mechanism (figure 7.4). Another way for checking this would be to measure the 

radius of a certain circle in the four directions and make them as close as possible (i.e. the recorded 

image is a circle). Figure 7.3(C) shows an example of this.  

Figure 7.4. The setup that allowed precise adjustment of the distal tip of the endoscope. 

 

 

7.3.1.2.1. Database for vertical measurements 

To  test  the  effect  of  tilting  angle  and  working  distance  on  vertical  measurement  error, 

locations of the laser points at different working distances and imaging angles were recorded. The 

working distance was changed from 5 mm to 35 mm in 5-mm increments. The working distance 

was measured using a digital height gauge with an accuracy of 0.001″ (approximately 0.03 mm). 

Additionally, five different tilting angles of 0° to 10° in 2.5°-increments were tried. The method 

for measurement of the tilting angle was described in section 5.4.2 and figure 5.3. In summary, 

7×5=35  different  recording  conditions  were  tested  for  this  experiment.  Figure  7.5  shows  a 

schematic of different recording conditions. It is noteworthy that it is hard to adjust the setup for 

 


achieving  the  exact  target  working  distances  and  tilting  angles;  therefore,  the  actual  values 

deviated  from  the  target  values.  However,  in  the  rest  of  this  chapter,  each  condition  will  be 

referenced using its attempted values. 

 

Figure 7.5. A diagram of the recording conditions. Different colors correspond to the FOV cone at different working 
distances. To simplify the visualization, the target surface is kept fixed and the camera is displaced. However, in the 
 

experiments it was the other way around. 

We will see in the next section that the estimation of vertical measurement error depends on 

accurate measurement of the mm distance between two arbitrary points inside the FOV. Therefore, 

the following protocol was followed for the data collection. The setup was adjusted to a desired 

working distance and imaging angle. A white piece of paper was attached to the metal sheet, the 

laser source was turned on, the light source was turned off, and a recording was done. Then, a grid 

paper with a known mm spacing was attached parallel to the edges of the metal sheet, the laser 

source was turned off, the light source was turned on, and a recording was done. The reason for 

performing two separate recordings was as follows. The grid lines were printed in black, and they 

absorbed the green light of any laser points falling on them. This would introduce errors in the 

detection of the center of the laser points. However, having separate laser and grid recordings 

 


would  allow  a  more  accurate  segmentation  outcome  (i.e.  centers  of  laser  points  from  a  laser 

recording, and equations of the grid lines from a grid recording). Then, we can combine segmented 

information and create a composite image for performing the analysis.  

Our preliminary analysis indicated a likely effect of the rotation of the endoscopic eyepiece 

inside the camera lens coupler. Such rotation can be quantified in terms of the fiducial angle (α in 

figure 7.6). Therefore, the whole recording protocol was repeated for three different fiducial angles 

of 32°, 124°, and 309°. In summary, the database for this experiment had 7×5×2×3=210 different 

recordings.  
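The size of this database follows directly from the design. The short sketch below simply enumerates the target conditions stated above (the achieved values deviated slightly from these targets); the variable names are illustrative and are not part of the original protocol.

```python
from itertools import product

# Target values stated in the text; variable names are illustrative only.
working_distances_mm = [5, 10, 15, 20, 25, 30, 35]   # 5 mm to 35 mm in 5-mm increments
tilting_angles_deg = [0.0, 2.5, 5.0, 7.5, 10.0]      # 0 to 10 degrees in 2.5-degree increments
recording_types = ["laser", "grid"]                  # two separate recordings per condition
fiducial_angles_deg = [32, 124, 309]                 # rotations of the eyepiece in the coupler

# One recording condition is a (working distance, tilting angle) pair.
conditions = list(product(working_distances_mm, tilting_angles_deg))

# Every condition is recorded twice (laser, grid) at each fiducial angle.
recordings = list(product(fiducial_angles_deg, conditions, recording_types))

print(len(conditions))   # 7 x 5
print(len(recordings))   # 7 x 5 x 2 x 3
```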

7.3.1.2.2. Database for horizontal measurements 

To test the effect of tilting angle and working distance on horizontal measurement error, a 5-

mm line segment was recorded at different working distances and imaging angles. The working 

distance  was  changed  from  5  mm  to  35  mm  in  5-mm  increments.  The  working  distance  was 

measured  using  a  digital  height  gauge  with  an  accuracy  of  0.001″  (approximately  0.03  mm). 

Additionally, five different tilting angles of 0° to 10° in 2.5°-increments were tried. The method 

for measurement of the tilting angle is described in section 5.4.2 and figure 5.3. In summary, 

7×5=35  different  recording  conditions  were  tested  for  this  experiment.  Figure  7.5  shows  a 

schematic of different recording conditions. It is noteworthy that it is hard to adjust the setup for 

achieving  the  exact  target  working  distances  and  tilting  angles;  therefore,  the  actual  values 

deviated  from  the  target  values.  However,  in  the  rest  of  this  chapter,  each  condition  will  be 

referenced using its attempted values. 

We  showed  in  chapter  5  that  fiberoptic  flexible  endoscopes  have  significant  non-linear 

distortions. Additionally, in chapter 6 we showed that the spatial location of the target object is a 

confounding factor for calibrated horizontal measurements. Therefore, the 5-mm line segment was 

 


positioned at eight different locations inside the FOV for each recording condition. Considering 

the systematic effect of spatial location on measurements (chapters 5 and 6), the locations of the line 

segment inside the FOV were controlled. Specifically, the radius of the FOV and its center were 

estimated using the method described in section 4.3.3.1.2. Then the diameter of the FOV parallel 

to the x-axis was drawn (line AB in figure 7.6). Then, AB was divided into five equally spaced 

partitions (dashed gray line in figure 7.6). This process led to four lines inside the FOV that were 

parallel to the y-axis. The 5-mm line segment was subjectively positioned on these four lines, such 

that the line AB was its perpendicular bisector. A similar process was repeated for the line AB 

parallel to the y-axis. 
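The placement rule can be sketched in a few lines. The function below is an illustration only: its name, the pixel coordinate convention, and the example FOV center and radius are assumptions, and in the actual experiment the segment was positioned subjectively on these lines.

```python
def segment_midpoints(cx, cy, radius, partitions=5):
    """Return the eight target placements of the line segment inside the FOV.

    (cx, cy) is the FOV center and radius its radius, both in pixels.
    Dividing a diameter into `partitions` equal parts gives `partitions - 1`
    interior division points; the segment is centered on each of them and
    oriented perpendicular to that diameter.
    """
    step = 2.0 * radius / partitions
    offsets = [-radius + k * step for k in range(1, partitions)]
    placements = []
    # Diameter AB parallel to the x-axis: segments lie on lines parallel to y.
    for off in offsets:
        placements.append({"midpoint": (cx + off, cy), "orientation": "parallel to y"})
    # Diameter AB parallel to the y-axis: segments lie on lines parallel to x.
    for off in offsets:
        placements.append({"midpoint": (cx, cy + off), "orientation": "parallel to x"})
    return placements

# Hypothetical FOV: center (256, 256) px, radius 200 px.
placements = segment_midpoints(256.0, 256.0, 200.0)
for p in placements:
    print(p)
```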


 

Figure 7.6. Placement of the 5-mm line segment inside the FOV for horizontal measurements.  

 

Similar to the previous section, the line segment was absorbing the green light of any laser 

points falling on it. This would introduce errors in the detection of the center of those laser points. 

Consequently,  two  different  types  of  recordings  were  collected.  A  white  piece  of  paper  was 

attached to the metal sheet (figure 1.2), the laser source was turned on, the light source was turned 

off, and a recording was done. Then, the laser source was turned off, the light source was turned 

on, and then the 5-mm line segment was placed at eight pre-determined spatial locations and it was 

 


recorded. Consequently, there were 9 different recordings from each condition. The laser recording 

and the line segment recordings were segmented separately, then they were combined to create a 

set of composite images for the analysis.  

Our preliminary analysis indicated a likely effect of the rotation of the endoscopic eyepiece 

inside the camera lens coupler. Such rotation can be quantified in terms of the fiducial angle (α in 

figure 7.6). Therefore, the whole recording protocol was repeated for three different fiducial angles 

of 31.6°, 124.1°, and 309.6°. In summary, the database for this experiment had 7×5×9×3=945 

different recordings.  

7.3.1.3. Analysis and measurements from a tilted surface 

7.3.1.3.1. Vertical measurements from a tilted surface 

The estimation of vertical measurement error depends on the knowledge of the true vertical 

distance of each laser point. When the target surface is flat and perpendicular to the optical axis of 

the endoscope all laser points would have similar vertical distances. However, when the target 

surface is tilted, laser points would have dissimilar vertical distances. Figure 7.7 shows a schematic 

of the problem. 

Figure 7.7. A schematic for estimation of the true vertical distance of the laser point B. 

 

 

 


Let point O denote the distal end of the endoscope and point B a laser point lying on the 

surface S1, where S1 has a tilting angle of γ degree. We can pass the hypothetical surface S2 from 

point B, perpendicular to the optical axis of the endoscope (OA). The intersection of S2 with the 

optical axis (OA) is marked with C′. The mm length of OC′ is defined as the vertical distance of 

the laser point B and is the aim of vertical measurement. The estimation of OC′ can be done as 

follows. The length OC is the working distance and is known in mm from the recording condition. 

Assuming the availability of BC in mm, we can write, 

$CC' = BC \cdot \sin(\gamma)$    (7-1) 

$OC' = OC + CC' = OC + BC \cdot \sin(\gamma)$    (7-2) 

It is noteworthy that in Equation 7-2 the length BC could be either positive (if point B has a 

larger vertical distance than point C), or negative (if point B has a smaller vertical distance than 

point C). Based on Equation 7-2, the mm length of BC is the only unknown factor that needs to be 

determined. We could use recordings from calibrated grids for measuring the mm distance between 

any  two  points  inside  the  FOV.  However,  in  chapter  5  we  saw  that  the  fiberoptic  flexible 

endoscopes have significant non-linear distortion. Additionally, we showed that when the optical 

axis is not perpendicular to the target surface, point C would move away from the center of the 

FOV in the image (refer to figure 7.3 for an example). Consequently, determining the location of 

point C, which is the prerequisite of computing the mm length of BC, is not trivial and could be 

subject to significant error. To remedy this, a modified approach was taken. Let R denote an 

arbitrary fixed laser point called the reference point. Now at each tilting angle γ, the true vertical 

difference ($\Delta z$) between points R and B can be computed as, 

$\Delta z = BR \cdot \sin(\gamma)$    (7-3) 

 


where BR is the mm distance between points B and R in the direction of the tilt. Now, we could 

use recordings from calibrated grids for measuring BR in mm. Assuming a tilt in the x-direction, 

the process was as follows. Let N denote the number of complete grids in the x-direction between 

points B and R (N=4 in figure 7.8). The analytic equation of each grid line was determined during 

the segmentation process using the method described in section 5.4.3. Then, the pixel distance 

between the two y-direction lines enclosing the point R (length $L_R$ in figure 7.8) was determined 

using the equation of the lines and in sub-pixel resolution. Additionally, the pixel distance between 

point R and the nearest y-direction line residing between points B and R (length $l_R$ in figure 7.8) 

was determined using the equation of the line and in sub-pixel resolution. Values of $L_B$ and $l_B$ 

were computed, similarly, for point B. If δ is the mm spacing of the grid lines, then BR can be 

computed in mm as, 

$BR = \left( N + \frac{l_R}{L_R} + \frac{l_B}{L_B} \right) \cdot \delta$    (7-4) 

 

Figure 7.8. An example of computing the mm distance between two laser points B and R. 
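A minimal sketch of this grid-based distance computation is given here. It assumes each grid line is available in the implicit form a·x + b·y + c = 0 from the segmentation step; the function names and the example pixel values are hypothetical and chosen only for illustration.

```python
import math

def point_line_distance(px, py, a, b, c):
    """Pixel distance from point (px, py) to the line a*x + b*y + c = 0."""
    return abs(a * px + b * py + c) / math.hypot(a, b)

def grid_distance_mm(n_complete, partial_r_px, cell_r_px,
                     partial_b_px, cell_b_px, spacing_mm):
    """Distance between two points in mm along the tilt direction.

    n_complete   : number of complete grid cells between the two points
    partial_*_px : pixel distance from each point to its nearest grid line
                   lying between the two points
    cell_*_px    : pixel width of the grid cell enclosing each point
    spacing_mm   : physical spacing of the grid lines
    """
    fraction_r = partial_r_px / cell_r_px   # partial cell on the R side
    fraction_b = partial_b_px / cell_b_px   # partial cell on the B side
    return (n_complete + fraction_r + fraction_b) * spacing_mm

# Hypothetical example: R sits 6 px from the vertical grid line x = 100 px
# inside a 24-px-wide cell; B sits 10 px inside a 20-px-wide cell; 4 complete
# cells lie in between; the grid spacing is 1 mm.
partial_r = point_line_distance(106.0, 50.0, 1.0, 0.0, -100.0)
br_mm = grid_distance_mm(4, partial_r, 24.0, 10.0, 20.0, 1.0)
print(br_mm)
```

Normalizing each partial length by the width of its own enclosing cell keeps the estimate local, which matters because the pixel size of a grid cell varies across the FOV of a distorted fiberoptic image.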

 
Now, the effects of working distance and tilting angle on vertical measurement error can be 

quantified using the true vertical difference $\Delta z$. Let $\tilde{z}_R$ and $\tilde{z}_B$ denote the estimated vertical distances of points R and B using 

a vertical model. The vertical measurement error (ℰ) can be computed using Equation 7-5. 

$\mathcal{E} = \Delta z - (\tilde{z}_B - \tilde{z}_R) = BR \cdot \sin(\gamma) - \tilde{z}_B + \tilde{z}_R$    (7-5) 
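The error computation can be illustrated with a short sketch. It assumes that BR is measured along the tilted surface, so the true vertical difference is BR multiplied by the sine of the tilting angle, and that BR is taken as positive when B is farther from the endoscope than R. The function names and numerical values are illustrative, not part of the original analysis code.

```python
import math

def true_vertical_difference(br_mm, tilt_deg):
    """True vertical difference between B and R on a surface tilted by tilt_deg.

    Assumes br_mm is measured along the tilted surface in the direction of tilt.
    """
    return br_mm * math.sin(math.radians(tilt_deg))

def vertical_error(br_mm, tilt_deg, est_b_mm, est_r_mm):
    """Vertical measurement error: true difference minus estimated difference."""
    return true_vertical_difference(br_mm, tilt_deg) - (est_b_mm - est_r_mm)

# Hypothetical example: B and R are 4.75 mm apart on a surface tilted by 10 degrees,
# and a vertical model places B at 20.9 mm and R at 20.0 mm from the endoscope.
error_mm = vertical_error(4.75, 10.0, 20.9, 20.0)
print(round(error_mm, 3))
```

Because only the difference between two laser points enters the error, the location of the optical axis on the tilted surface never has to be identified in the image.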

 


7.3.1.3.2. Horizontal measurements from a tilted surface 

Horizontal  measurements  from  the  5-mm  line  segment  recordings  closely  followed  the 

method presented in chapter 6. To that end, a GUI was developed that showed recordings one at a 

time. The GUI used the mouse for selection of the proper laser points, in addition to the marking 

of the two ends of the line segment. We saw in chapter 6 that the horizontal measurement relies 

on the estimation of the working distance. Considering that the surface could be tilted, some of the 

laser points would be on a different vertical distance than the target 5-mm line segment. Therefore, 

only the laser points close to the line segment were marked. This will ensure that a correct vertical 

distance is estimated for the target object. The vertical distance of each selected laser point was 

estimated using the PCA-based vertical model (chapter 6, Equations 6-14 and 6-15) and their 

average value was used as the vertical distance of the target object. The calibrated mm length of the 

line segment was estimated by feeding the estimated working distance and locations of the two 

endpoints of the line segment into the uniform (Equation 6-5) and the non-uniform (Equations 6-3 

and 6-4) horizontal models. Horizontal measurement error was computed as the difference 

between the true value (i.e. 5 mm) and the estimated value. 

7.3.2. Material and method for the effect of the 3D surface 

7.3.2.1. Data acquisition 

To investigate possible effects of 3D surfaces on vertical and horizontal measurement errors, 

a set of benchtop recordings from a 3D shape was collected. To that end, a general 3D model was 

created in Matlab. The model had three peaks and three valleys. The maximum height difference 

between its peaks and its valleys was 15 mm, and the size (length and width) of the model was 50 

mm×50 mm. Figure 7.9(A) presents the created model. Investigation of the hypotheses of this 

 


section requires an accurate registration of the acquired images to the model. Considering the 

significant  non-linear  distortion  of  fiberoptic  flexible  endoscopes,  a  matrix  of  20×20  fiducial 

markers  was  created  (figure  7.9(B)).  Each  fiducial  marker  was  a  cuboid  with  a  size  of  0.45 

mm×0.45 mm×0.4 mm (for length, width, and height). These fiducial markers were merged with 

the 3D model and a composite image was created. A Creality-Ender5 3D printer with a 0.4 mm 

nozzle size and Polylactic Acid filaments with 1.75 mm diameter was used to print the created 

composite model. The surface was printed layer by layer with a thickness of 0.12 mm in each layer. 

The temperature of the nozzle during the 3D printing was set to 205° Celsius, and the temperature 

of the printing bed was set to 55° Celsius. The precision of the print was ±0.12 mm. Finally, to 

make the detection of the fiducial markers more accurate, all fiducial markers were painted in 

black. Figure 7.9(C) shows the printed 3D composite model, after painting its fiducial markers in 

black. 

Figure 7.9. The data used for investigating the effect of 3D shape: (A) the 3D model, (B) fiducial markers, (C) the 

printed composite model. 

 
Following the method described in section 7.3.1.2, the optical axis of the endoscope was made 

perpendicular to the target surface. Then, the printed 3D model was placed on the setup presented 

in section 1.4.1, figure 1.2 with one degree of freedom. Specifically, the tilting angle was kept 

fixed  and  at  zero  angle  (i.e.  perpendicular  imaging  angle)  and  only  the  working  distance  was 

varied. Five different subjective distances covering the range of close and far away, were used for 

 


the recordings. Finally, the presence of the bright laser points was affecting the visual appearance 

of the fiducial markers. Therefore, at each working distance, two separate recordings were done. 

The external light source was turned on, and the laser source was turned off for the first recording. 

This  data  will  be  referred  to  as  the  model  recordings  for  the  rest  of  this  chapter.  The  model 

recordings  were  used  for  the  detection  of  the  location  of  the  fiducial  markers  in  the  recorded 

images. Then, the light source was turned off, and the laser source was turned on. This data will 

be referred to as the laser recordings for the rest of this chapter. The laser recordings were used for 

the detection of the location of the laser points in the recorded images. The laser recordings were 

analyzed with the PCA vertical measurement model (section 6.3.5) after data collection. The 

average working distance of each recording was estimated and is reported in table 7.1. 

Table 7.1. The estimated working distance from the 3D surface. 

Working distance index    Estimated working distance 
                          Mean (mm)    std (mm) 
1                         9.72         1.48 
2                         14.67        2.04 
3                         19.1         2.84 
4                         23.45        2.98 
5                         27.53        3.26 

 

 

7.3.2.2. Analysis and measurements from a 3D surface 

7.3.2.2.1. Vertical measurements from a 3D surface 

The locations of the laser points were detected using the method described in 122. The fiducial 

markers were painted in black, which led to the absorption of the light from the laser points falling 

on them. For those points, the best circle representing the laser point was determined subjectively, 

and its center was used instead. A similar approach was followed for the detection of the fiducial 

 


points, but instead of looking for the brightest points in the image (i.e. the laser points), we looked 

for the darkest points in the image (i.e. the fiducial markers). Fiducial markers missed by 

the segmentation process were marked manually. The segmented information from a laser 

recording (i.e. the center of the laser points) was fused with the segmented information from its 

corresponding model recording (i.e. the center of the fiducial markers) and a composite image was 

created. Figure 7.10(A) shows an example. 

 

Figure 7.10. The outcome of the registration process: (A) a composite image before the registration. Centers of the 
fiducial markers are marked with a red dot. Centers of the laser points are marked with a green cross mark, (B) the 

registration outcome for the composite image. 

 

Reliable estimation of vertical measurement errors depends on the knowledge of the ground 

truth (i.e. the real vertical distance). To achieve this, the four fiducial markers enclosing each laser 

point were determined. The indices of those fiducial markers in combination with the distances 

between them and the laser point were used to register the laser point on the 3D model (figure 

7.9(A)). This process was repeated for all laser points with four enclosing fiducial markers. The 

points with three or fewer enclosing fiducial markers were omitted from the rest of the analysis. Figure 

7.10(B) shows an example of the registration outcome. In figure 7.10(B) the height of the surface 

 


is depicted in red, where a brighter color means a larger elevation at that point. Blue dots represent 

the center of the fiducial marker, and the green dots represent the center of the laser points. 
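The exact interpolation rule used for the registration is not spelled out here, so the following is only one plausible sketch of it. It assumes inverse-distance weighting of the model positions of the four enclosing markers, with the marker indices converted to mm using the 2.5 mm spacing between adjacent markers. The function name, the weighting scheme, and the example pixel coordinates are assumptions.

```python
import math

MARKER_SPACING_MM = 2.5  # spacing between adjacent fiducial markers

def register_laser_point(laser_px, enclosing_markers):
    """Map a laser point from image pixels to model coordinates in mm.

    enclosing_markers: list of ((row, col), (x_px, y_px)) for the markers
    enclosing the laser point. Returns (x_mm, y_mm), or None when fewer
    than four enclosing markers are available (such points are omitted).
    Assumption: inverse-distance weighting in the image plane.
    """
    if len(enclosing_markers) < 4:
        return None
    weights, positions = [], []
    for (row, col), (mx, my) in enclosing_markers:
        model_xy = (col * MARKER_SPACING_MM, row * MARKER_SPACING_MM)
        dist = math.hypot(laser_px[0] - mx, laser_px[1] - my)
        if dist == 0.0:
            return model_xy          # laser point sits exactly on a marker
        weights.append(1.0 / dist)
        positions.append(model_xy)
    total = sum(weights)
    x_mm = sum(w * p[0] for w, p in zip(weights, positions)) / total
    y_mm = sum(w * p[1] for w, p in zip(weights, positions)) / total
    return (x_mm, y_mm)

# Hypothetical cell of four markers and a laser point at its center.
cell = [((0, 0), (100.0, 100.0)), ((0, 1), (140.0, 100.0)),
        ((1, 0), (100.0, 140.0)), ((1, 1), (140.0, 140.0))]
print(register_laser_point((120.0, 120.0), cell))
```

The registered (x, y) position would then be used to read the true height from the 3D model. Working cell by cell keeps the mapping local, which limits the influence of the non-linear distortion of the fiberoptic image.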

Doing the registration step would give us the estimated true value of the height of a laser point 

on the model from the base of the model. Therefore, using the methodology presented in section 

7.3.1.3.1  the  laser  point  25  (i.e.  the  laser  point  in  the  middle)  was  used  as  a  reference,  and 

differences  in  the  height  relative  to  laser  point  25  were  used  to  evaluate  measurement  error. 

Specifically, let $z_R$ denote the true height of the model at the center of laser point 25 after the 

registration. Also, let $z_T$ denote the true height of the model at the center of a target laser point 

after the registration. The true height difference between the two points ($\Delta z$) would be equal to, 

$\Delta z = z_T - z_R$    (7-6) 

Additionally, let $\tilde{z}_R$ and $\tilde{z}_T$ denote the estimated vertical distances of the reference and the target 

points, respectively. Now, we could compute the difference in the vertical distance ($\Delta\tilde{z}$) between 

the reference and the target laser points, using the PCA-based vertical measurement method. The 

vertical measurement error (ℰ) can be computed as, 

$\mathcal{E} = \Delta z + \Delta\tilde{z}$    (7-7) 

where the plus is due to the fact that $\Delta z$ is measured relative to the base of the model (i.e. a surface 

that is below the model), but $\Delta\tilde{z}$ is measured relative to the endoscope (i.e. a surface that is above 

the model). 
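The sign convention is easy to get wrong, so a small numerical sketch may help. It assumes both differences are taken as target minus reference; the function name and the numbers are illustrative only.

```python
def vertical_error_3d(z_ref_mm, z_target_mm, est_ref_mm, est_target_mm):
    """Vertical measurement error on the 3D model.

    z_*   : true heights above the base of the model (from the registration)
    est_* : estimated vertical distances from the endoscope
    Both differences are taken as target minus reference. A point that is
    higher on the model is closer to the endoscope, so the two differences
    have opposite signs and cancel when the estimate is perfect.
    """
    delta_true = z_target_mm - z_ref_mm
    delta_est = est_target_mm - est_ref_mm
    return delta_true + delta_est

# Target is 3 mm higher than the reference; a perfect estimate places it 3 mm closer.
print(vertical_error_3d(5.0, 8.0, 20.0, 17.0))
# An estimate that places it only 2.6 mm closer leaves an error of about 0.4 mm.
print(round(vertical_error_3d(5.0, 8.0, 20.0, 17.4), 3))
```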

7.3.2.2.2. Horizontal measurements from a 3D surface 

In section 6.4.4 we saw that the non-uniform method offers significantly better measurement 

accuracy  than  the  uniform  method;  therefore,  the  non-uniform  method  was  used  for  this 

experiment. Horizontal measurement using the non-uniform method requires the x-y coordinate of 

the  two  endpoints  of  the  target  object,  in  addition  to  its  estimated  working  distance.  This 

 


information is readily available from the created composite images (figure 7.10 (A)). Specifically, 

the horizontal distance between each two adjacent fiducial markers is fixed and equal to 2.5 mm. 

Therefore, we could find a string of 3 adjacent fiducial markers that belong to the same row or 

column, and use x-y coordinates of the first and the last fiducial markers. The true horizontal 

distance for this selection would be equal to 5 mm. The composite images also include the locations 

of the laser points. Therefore, the working distance of the selected string can easily be estimated 

using their nearby laser points and the PCA method. 

For this experiment, all strings of three adjacent fiducial markers that belong to the same row or 

column were detected. For each string, the number of laser points near the string was computed, 

and if it was less than three, that string was omitted from the rest of the analysis. The vertical 

distances of all nearby laser points were estimated using the PCA-based vertical model (chapter 6, 

Equations 6-14 and 6-15) and their average was used as the working distance of the target object. 

Finally, the x-y coordinates of the first and the last fiducial markers from the string were used for 

performing the horizontal measurements. 
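The selection of the measurement targets can be sketched as follows. The criterion for a laser point being "near" a string is not specified here, so the sketch assumes a pixel radius around the midpoint of the string; that radius, the function names, and the example coordinates are hypothetical.

```python
import math

def find_strings(markers):
    """Return all strings of three adjacent markers in the same row or column.

    markers: dict mapping (row, col) grid indices to (x_px, y_px) image positions.
    """
    strings = []
    for (r, c) in sorted(markers):
        row_string = ((r, c), (r, c + 1), (r, c + 2))
        col_string = ((r, c), (r + 1, c), (r + 2, c))
        for s in (row_string, col_string):
            if all(idx in markers for idx in s):
                strings.append(s)
    return strings

def nearby_laser_points(string, markers, laser_points, radius_px):
    """Laser points within radius_px of the midpoint of the string (assumed criterion)."""
    (x0, y0), (x1, y1) = markers[string[0]], markers[string[-1]]
    mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    return [p for p in laser_points if math.hypot(p[0] - mx, p[1] - my) <= radius_px]

def usable_strings(markers, laser_points, radius_px, min_points=3):
    """Keep only strings with at least min_points nearby laser points."""
    return [s for s in find_strings(markers)
            if len(nearby_laser_points(s, markers, laser_points, radius_px)) >= min_points]

# Hypothetical 3x3 patch of detected markers, 40 px apart, and three laser points.
markers = {(r, c): (c * 40.0, r * 40.0) for r in range(3) for c in range(3)}
laser_points = [(40.0, 0.0), (40.0, 10.0), (45.0, 5.0)]
print(len(find_strings(markers)))
print(usable_strings(markers, laser_points, radius_px=15.0))
```

Each retained string has a true length of 5 mm (two marker spacings of 2.5 mm), so the horizontal error is the difference between 5 mm and the length estimated from the endpoints of the string.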

7.4.  Experiments and results 

Two  experiments  were  conducted  to  answer  the  research  questions  of  this  chapter. 

Experiment 1 investigates the effect of the imaging angle on measurement accuracy. Experiment 2 

presents  measurement  accuracies  from  a  3D  surface.  This  section  presents  details  of  each 

experiment, followed by results and related discussions. 

7.4.1. Experiment 1: effect of the imaging angle 

The effects of imaging angle and working distance on measurement errors were investigated 

in this experiment. 

 


7.4.1.1. Experiment 1a: effect of imaging angle on calibrated vertical measurements 

This experiment was conducted to quantify the effects of imaging angle and working distance 

on vertical measurement errors from a flat surface. The following hypothesis was formed for this 

experiment.  

H6a: 

The  tilting  angle  of  the  target  surface  and  the  working  distance  will  be  good 

predictors of the vertical measurement error. 

To test hypothesis H6a the dataset described in section 7.3.1.2.1 was used. Two different vertical 

models were presented in this dissertation. The first model was presented in chapter 4 and was 

published in the Journal of Voice. This model will be called the JOV model for the rest of this 

chapter. The second model was presented in chapter 6 and was based on the PCA analysis. This 

model will be referred to as the PCA model for the rest of this chapter. Vertical measurement error 

from each recording condition was evaluated using Equation 7-5.  

The performance of the JOV model was evaluated after removing the top row laser points. 

Figure 7.11 shows boxplots of the error for different working distances and imaging angles from the 

JOV model. 

Figure 7.11. Boxplots of vertical measurement error using the JOV model at different working distances and 

imaging angles. 

 

 

The performance of the PCA model was evaluated. Figure 7.12 shows boxplots of the error for 

different working distances and imaging angles from the PCA model. 

 


Figure 7.12. Boxplots of vertical measurement error using the PCA model at different working distances and imaging 
 

angles. 

Inspecting figures 7.11 and 7.12 indicates a higher magnitude of error in the JOV model. Additionally, the boxplots of the PCA model show smaller variations in the error, which translates into better agreement between measurements from different laser points. To test H6a, a multiple linear regression analysis was used. Based on figures 7.11 and 7.12, measurement errors from both methods have outliers. Considering the sensitivity of regression analysis to the presence of outliers251, robust multiple linear regression with iteratively reweighted least squares was used. Two different regression analyses were performed. In the first analysis, the imaging angle (a) and the working distance (wd) were used as the predictor variables and the measurement error was used as the outcome variable. This analysis determines whether the method tends to underestimate or overestimate the measurements. The second regression analysis was based on the same predictor variables but used the magnitude of the measurement error as the outcome variable. This analysis determines the overall performance of the system.
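As an illustration of the robust regression described above, the following Python sketch fits a linear model by iteratively reweighted least squares. It is not the implementation used in this chapter: the Huber weight function, the tuning constant k = 1.345, the MAD-based scale estimate, the function name irls_huber, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def irls_huber(X, y, k=1.345, n_iter=50, tol=1e-8):
    """Robust linear regression by iteratively reweighted least squares.

    Assumption: Huber weights with a MAD-based scale estimate; the source
    does not state which weight function was used.
    """
    X = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta                           # current residuals
        # Robust scale from the median absolute deviation of the residuals
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / (k * s)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)       # Huber weights: 1 inside, k*s/|r| outside
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta                                    # [intercept, coefficient per predictor]

# Synthetic illustration only (hypothetical values, not the dissertation's data)
rng = np.random.default_rng(0)
wd = rng.uniform(5, 30, 200)                       # working distance
a = rng.uniform(0, 20, 200)                        # imaging angle
err_mag = 0.1 + 0.02 * wd + 0.04 * a + rng.normal(0, 0.05, 200)
err_mag[:10] += 5.0                                # inject a few gross outliers
beta = irls_huber(np.column_stack([wd, a]), err_mag)
```

Because the outlying points receive small weights, the fitted coefficients stay close to the values used to generate the synthetic data, which is the property that motivates a robust fit when boxplots show outliers.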

Table 7.2 shows the results of the regression analyses for the JOV model. Based on table 7.2, we can draw the following conclusions. There is a significant effect of working distance (p<0.00001) and imaging angle (p<0.00001) on the magnitude of the error. Also, the magnitude of measurement error was positively correlated with the working distance and the imaging angle. Additionally, the coefficient of the imaging angle is 1.5 times larger than that of the working distance (0.04 vs. 0.027). This indicates a higher sensitivity of the error to the imaging angle. The overall model was able to account for 18.9% of variations in the magnitude of the error.

Table 7.2. Results of multiple linear regression for the JOV vertical measurement model. The symbols wd, a, and ε stand for the working distance, the imaging angle, and p<0.00001, respectively.

Parameter      Error                    Magnitude of error
               Estimate    p            Estimate    p
Intercept       0.215      0.0002        0.174      ε
wd             -0.004      0.08          0.027      ε
a              -0.006      0.33          0.04       ε
R-squared       0.003                    0.189

 

The performance of the PCA model was evaluated using a similar approach. Table 7.3 shows the results. Based on table 7.3, we can draw the following conclusions. There is a significant effect of working distance (p<0.00001) and imaging angle (p<0.00001) on the magnitude of the error. Also, the magnitude of measurement error was positively correlated with the working distance and the imaging angle. Additionally, the coefficient of the imaging angle was 2 times larger than that of the working distance (0.038 vs. 0.019). This indicates a higher sensitivity of the error to the imaging angle. The overall model was able to account for 34% of variations in the magnitude of the error.

Table 7.3. Results of multiple linear regression for the PCA vertical measurement model. The symbols wd, a, and ε stand for the working distance, the imaging angle, and p<0.00001, respectively.

Parameter      Error                    Magnitude of error
               Estimate    p            Estimate    p
Intercept      -0.137      ε            -0.08       ε
wd              0.009      ε             0.019      ε
a               0.015      ε             0.038      ε
R-squared       0.03                     0.34

 

 

Comparing tables 7.2 and 7.3, we can quantify the advantages of the PCA model. The coefficients of the working distance and the imaging angle for the magnitude of error are smaller in the PCA model (0.019 vs. 0.027 and 0.038 vs. 0.04, respectively). This indicates that measurements from the PCA model are more robust to variations in the working distance and tilting angle.

7.4.1.2. Experiment1b: effect of imaging angle on calibrated horizontal measurements 

This experiment was conducted to quantify the effects of imaging angle and working distance 

on horizontal measurement errors from a flat surface. The following hypothesis was formed for 

this experiment.  

H6b: 

The  tilting  angle  of  the  target  surface  and  the  working  distance  will  be  good 

predictors of the horizontal measurement error. 

To test hypothesis H6b, the dataset described in section 7.3.1.2.2 was used. Horizontal measurement error from each recording condition was computed as the difference between the true value and the estimated value from the vertical models. Two different horizontal models, uniform and non-uniform, were presented in chapter 6. Both models were evaluated in this experiment.

The performance of the uniform model was evaluated. Figure 7.13 shows boxplots of error 

for different working distances and imaging angles from the uniform model. 

Figure 7.13. Boxplots of horizontal measurement error from the uniform model at different working distances and 

imaging angles. 

 

The performance of the non-uniform model was evaluated next. Figure 7.14 shows boxplots of the error for different working distances and imaging angles from the non-uniform model.

 


Figure 7.14. Boxplots of horizontal measurement error from the non-uniform model at different working distances 

and imaging angles. 

 

Inspecting figures 7.13 and 7.14 indicates a markedly higher magnitude of error in the uniform model. Additionally, comparing the two sets of boxplots reveals that the centers of the non-uniform boxes are much closer to zero than their uniform counterparts. This indicates a random nature for measurement errors in the non-uniform model, compared to a systematic nature for measurement errors in the uniform model. Put differently, a very small error may be achieved by averaging multiple measurements of the same object using the non-uniform model. To test H6b, a multiple linear regression analysis was used. To that end, two different regression analyses were performed. In the first analysis, the imaging angle (a) and the working distance (wd) were used as the predictor variables and the measurement error was used as the outcome variable. This analysis determines whether the method tends to underestimate or overestimate the measurements. The second regression analysis was based on the same predictor variables but used the magnitude of the measurement error as the outcome variable. This analysis determines the overall performance of the system.
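The distinction between random and systematic errors, and the reason for analyzing both the signed error and its magnitude, can be illustrated with a small simulation. This is only a sketch: the bias of -0.6 mm, the spread of 0.3 mm, the Gaussian error model, and the function name simulate are hypothetical and are not taken from the recorded data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

def simulate(bias, spread, n):
    """Return n simulated signed errors (mm) with a systematic bias and a random spread."""
    return [random.gauss(bias, spread) for _ in range(n)]

random_only = simulate(bias=0.0, spread=0.3, n=2000)   # zero-centered (random) errors
systematic = simulate(bias=-0.6, spread=0.3, n=2000)   # errors with a constant offset

# Averaging the signed errors cancels random errors but not a systematic offset
mean_random = statistics.fmean(random_only)
mean_systematic = statistics.fmean(systematic)

# The magnitude of error stays well above zero even when the signed mean is near zero
mag_random = statistics.fmean([abs(e) for e in random_only])
```

In this sketch the mean signed error of the zero-centered case is close to zero while its mean magnitude is not, which is why the two regression analyses answer different questions.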

Table 7.4 shows the results of the regression analyses for the uniform model. Based on table 7.4, we can draw the following conclusions for the uniform model. There is a significant effect of working distance (p<0.00001). Also, the magnitude of the error and the working distance were positively correlated. Finally, the imaging angle did not reach the significance level (p=0.06). The overall model was able to account for 34% of variations in the magnitude of the error.

Table 7.4. Results of multiple linear regression for the uniform model for horizontal measurements. The symbols wd, a, and ε stand for the working distance, the imaging angle, and p<0.00001, respectively.

Parameter      Error                    Magnitude of error
               Estimate    p            Estimate    p
Intercept      -0.663      ε             0.667      ε
wd             -0.022      ε             0.022      ε
a               0.006      0.06         -0.006      0.06
R-squared       0.34                     0.34

 

The performance of the non-uniform model was evaluated using a similar approach. Table 7.5 shows the results of the regression analyses for the non-uniform model. Based on table 7.5, we can draw the following conclusions for the non-uniform model. There is a significant effect of working distance (p<0.00001), where the magnitude of the error and the working distance were positively correlated. Finally, the imaging angle did not reach the significance level (p=0.24). The overall model was able to account for 8.9% of variations in the magnitude of the error.

Table 7.5. Results of multiple linear regression for the non-uniform model for horizontal measurements. The symbols wd, a, and ε stand for the working distance, the imaging angle, and p<0.00001, respectively.

Parameter      Error                    Magnitude of error
               Estimate    p            Estimate    p
Intercept      -0.135      ε             0.17       ε
wd             -0.007      ε             0.005      ε
a               0.005      0.008        -0.002      0.24
R-squared       0.11                     0.089

 

Comparing tables 7.4 and 7.5, we can quantify the advantages of the non-uniform model over its uniform counterpart. The coefficients for the working distance and the imaging angle were smaller in magnitude in the non-uniform model. This indicates that measurements using the non-uniform model are more robust to these variations. For example, in the uniform model, for every mm increase in the working distance the magnitude of error increases by 0.022 mm, which is 4.4 times higher than in the non-uniform model (0.005 mm). A similar argument can be made for the imaging angle; however, due to the nonsignificant p-values, the effect of imaging angle should be interpreted with more caution.
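The comparison above can be written out using the magnitude-of-error columns of tables 7.4 and 7.5. The notation |e| for the predicted magnitude of error (in mm) is introduced here only for illustration, and the imaging-angle terms are shown for completeness even though they did not reach significance.

```latex
\begin{align*}
|e|_{\text{uniform}}     &\approx 0.667 + 0.022\,wd - 0.006\,a \\
|e|_{\text{non-uniform}} &\approx 0.17 + 0.005\,wd - 0.002\,a \\
\frac{0.022}{0.005}      &= 4.4
\end{align*}
```

Under these fitted models, a 5 mm increase in the working distance at a fixed imaging angle raises the predicted magnitude of error by about 0.11 mm for the uniform model and by about 0.025 mm for the non-uniform model.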

7.4.2. Experiment2: effect of a 3D surface 

Experiment 2 presents the performances of vertical and horizontal measurement methods on 

a 3D surface.  

7.4.2.1. Experiment2a: effect of a 3D surface on calibrated vertical measurements 

This experiment was conducted to quantify the effects of a 3D surface on vertical measurement 

errors. The following hypothesis was formed for this experiment.  

H6c: 

The vertical measurement errors from a non-flat surface will be higher than those 

from a flat surface positioned at the same estimated average vertical distance. 

To test hypothesis H6c, the dataset described in section 7.3.2.1 was used. In the previous section, we saw that the PCA model had superior performance compared to the JOV model; therefore, only the performance of the PCA model was investigated here.

First, the performance of the PCA model on a flat surface with a zero tilting angle had to be computed. A total of 124 data points were available from a flat surface at zero tilting angle. The data points were randomly divided into training (70%) and testing (30%) sets. The PCA model was trained using the training data. The trained model was then applied to the testing samples, and the measurement error was computed. Figure 7.15 presents the fitted curves with their 95% confidence interval.
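The random division into training and testing sets can be sketched as follows. The function name split_indices, the seed, and the rounding of the training-set size are assumptions for illustration; the text above only specifies the 70%/30% proportions and the total of 124 data points.

```python
import random

def split_indices(n_points, train_fraction=0.7, seed=0):
    """Randomly split the indices 0..n_points-1 into training and testing sets."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)             # reproducible random permutation
    n_train = round(train_fraction * n_points)   # assumed rounding rule
    return idx[:n_train], idx[n_train:]

# 124 flat-surface data points at zero tilting angle
train_idx, test_idx = split_indices(124)
```

With this rounding rule, 124 data points yield 87 training and 37 testing samples; a different rounding convention would shift one sample between the sets.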

 

[Figure 7.15 image: panels (A) and (B), each showing the data points, the estimated average error, and the 95% prediction bounds against working distance (mm).]

Figure 7.15. Performance of the PCA model on a flat surface: (A) vertical measurement errors, (B) magnitude of vertical measurement errors.

Then, vertical measurement errors from the 3D model were computed. Figure 7.16 presents boxplots of this analysis. An interesting observation can be made from figure 7.16(A): the vertical measurement error has a random nature. That is, averaging multiple measurements relative to a fixed reference can reduce the error significantly. This is evident from the fact that the boxplots of vertical measurement error are roughly centered around zero. Considering that the endoscope would be utilized for studying the envelope of the vocal folds (and not the behavior of individual laser points), this characteristic is very beneficial.

[Figure 7.16 image: panels (A) and (B), boxplots against working distance (mm) at 9.72, 14.67, 19.1, 23.45, and 27.53 mm.]

Figure 7.16. Performance of the vertical measurement errors on a 3D surface: (A) boxplot of error, (B) boxplot of 

the magnitude of error. 

 

 


A two-way ANOVA was used to test H6c. The dependent variable for this test was the vertical measurement error, and the independent variables were the surface condition (flat vs. 3D) and the working distance group. Table 7.6 reflects the results of the analysis for the measurement error and the magnitude of the measurement error.
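For readers unfamiliar with the test, the sketch below computes the sums of squares and F statistics of a two-way ANOVA with interaction. It assumes a balanced design (the same number of observations in every cell), which may not hold for the recorded data, and it uses synthetic values; the function name two_way_anova_balanced and all numbers are hypothetical. Obtaining p-values would additionally require the F distribution, which is omitted here.

```python
import numpy as np

def two_way_anova_balanced(x):
    """Two-way ANOVA with interaction for a balanced design.

    x has shape (a, b, n): a levels of factor A (e.g. surface),
    b levels of factor B (e.g. working distance group), n replicates per cell.
    """
    a, b, n = x.shape
    grand = x.mean()
    mean_a = x.mean(axis=(1, 2))                 # marginal means of factor A
    mean_b = x.mean(axis=(0, 2))                 # marginal means of factor B
    mean_ab = x.mean(axis=2)                     # cell means
    ss = {
        "A": b * n * np.sum((mean_a - grand) ** 2),
        "B": a * n * np.sum((mean_b - grand) ** 2),
        "AxB": n * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2),
        "error": np.sum((x - mean_ab[:, :, None]) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f = {key: (ss[key] / df[key]) / ms_error for key in ("A", "B", "AxB")}
    return ss, df, f

# Synthetic 2x5 example: 2 surfaces, 5 working distance groups, 30 values per cell
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.3, size=(2, 5, 30))
x[1] += 0.5                                      # hypothetical offset for the second surface
ss, df, f = two_way_anova_balanced(x)
```

In a balanced design the four sums of squares add up to the total sum of squares, which gives a quick consistency check on the decomposition.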

Table 7.6. Results of 2×5 ANOVA for vertical measurement errors. 

Variable                 Error (mm)    Magnitude of error (mm)
                         p             p
Surface (S)              0.59          <0.00001
Working distance (WD)    <0.00001      <0.00001
S×WD                     <0.00001      <0.00001

 

Based on table 7.6, the surface condition (flat vs. 3D) did not have a significant effect on the vertical measurement errors (p=0.59). However, it had a significant effect on the magnitude of vertical measurement errors (p<0.00001). Post-hoc analysis showed that the magnitude of vertical measurement error from the 3D surface was significantly higher at all working distances, except for the 23.45 mm group. Considering that the endoscope would primarily be utilized for studying the envelope of the vocal folds (and not the behavior of individual laser points), the non-significant difference in the signed error seems to be of higher practical value. Finally, the mean percent (magnitude of) error was defined as the mean of the ratio of the (magnitude of) errors to the target value and is reported in table 7.7. As a final note, in all analyses we assumed that the 3D printing error and the registration error had a random nature.
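The two summary measures defined above can be computed as in the following sketch. The sign convention (error = target value minus estimated value) follows the definition given for horizontal errors in this chapter and is assumed here for illustration; the function name mean_percent_errors and the sample values are hypothetical.

```python
from statistics import fmean

def mean_percent_errors(targets, estimates):
    """Return (mean percent error, mean percent magnitude of error).

    Assumption: error = target - estimate, so positive values mean underestimation.
    """
    ratios = [(t - e) / t for t, e in zip(targets, estimates)]
    mean_percent_error = 100 * fmean(ratios)
    mean_percent_magnitude = 100 * fmean([abs(r) for r in ratios])
    return mean_percent_error, mean_percent_magnitude

# Hypothetical measurements of a 10 mm target
targets = [10.0, 10.0, 10.0, 10.0]
estimates = [9.5, 10.5, 9.8, 10.2]
mpe, mpme = mean_percent_errors(targets, estimates)
```

In this example the over- and underestimations cancel in the mean percent error, while the mean percent magnitude of error remains at 3.5%, mirroring the pattern seen when errors have a random nature.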

 


Table 7.7. Mean percent error and mean percent magnitude of error for vertical measurement. 

Estimated working     Mean percent error %     Mean percent magnitude of error %
distance (mm)         Flat       3D            Flat       3D
9.72                  -0.2        0.2          2.5        9.2
14.67                  0.8       -1.9          1.6        6.2
19.1                  -0.1       -4.3          1.3        6.4
23.45                 -3.7       -1.1          1          5
27.53                 -0.7        0.7          0.9        5.4

 

7.4.2.2. Experiment2b: effect of a 3D surface on calibrated horizontal measurements 

This  experiment  was  conducted  to  quantify  the  effects  of  a  3D  surface  on  horizontal 

measurement errors. The following hypothesis was formed for this experiment.  

H6d: 

The horizontal measurement errors from a non-flat surface will be higher than those from a flat surface positioned at the same estimated average vertical distance.

 
Horizontal measurement errors from the 3D model were computed. Figure 7.17 presents boxplots of this analysis. The boxplots of figure 7.17(A) suggest that at short working distances (less than 15 mm) the method underestimates the measurements, whereas at large working distances it overestimates them. Figure 7.17(B) also shows that the magnitude of measurement error is markedly higher at 9.72 mm. This could be because at shorter working distances the magnitudes of the registration error and/or the printing error become more comparable with the measurement errors, and therefore their contributions could become more significant.

 

[Figure 7.17 image: panels (A) and (B), boxplots against working distance (mm) at 9.72, 14.67, 19.1, 23.45, and 27.53 mm.]

Figure 7.17. Performance of the horizontal measurement errors on a 3D surface: (A) boxplot of error, (B) boxplot of the magnitude of error.

A two-way ANOVA was used to test H6d. The dependent variable for this test was the horizontal measurement error, and the independent variables were the surface condition (flat vs. 3D) and the working distance group. The data for the flat surface were the same as in Experiment 1b of this chapter, but only the data from the zero tilting angle were used. Table 7.8 reflects the results of the analysis for the measurement error and the magnitude of the measurement error. It is noteworthy that the values of working distance differed slightly between recordings, but we will report them using the same working distance group (for example, the measurement from 9.72 mm will be referred to as 10 mm, etc.).

Table 7.8. Results of two-way ANOVA for horizontal measurement errors. 

Variable                 Error (mm)    Magnitude of error (mm)
                         p             p
Surface (S)              <0.00001      <0.00001
Working distance (WD)    <0.00001      0.35
S×WD                     0.0067        0.0008

 

Based on table 7.8, there was a significant effect of the surface (flat vs. 3D) on both the horizontal measurement error and the magnitude of the horizontal measurement error. Post-hoc analysis was run on the ANOVA model for the error. Only the shortest working distance (~10 mm) led to a significant difference between the two surface conditions. Interestingly, at this working distance the measurements from the flat surface were overestimated, whereas those from the 3D surface were underestimated. Post-hoc analysis on the ANOVA model for the magnitude of errors showed a significant difference between the flat and the 3D surfaces at the working distance groups of 10 mm and 15 mm, where the magnitude of error was significantly higher from the 3D surface. However, the test failed to detect a significant difference between the two conditions at the working distance of 20 mm. Finally, the mean percent (magnitude of) error was defined as the average of the ratio of the (magnitude of) errors to the target value and is reported in table 7.9. The results of table 7.9 support the discussed findings. However, the more interesting finding is that the difference in the performance of horizontal measurements between the flat and the 3D surfaces decreases as the working distance increases. As a final note, in all analyses we assumed that the 3D printing error and the registration error had a random nature.

Table 7.9. Mean percent error and mean percent magnitude of error for horizontal measurement. 

Working distance     Mean percent error %     Mean percent magnitude of error %
group                Flat       3D            Flat       3D
10                   -2.2       11.4          2.4        16.6
15                   -5          2.5          5          10.6
20                   -7.2       -7            7.2        11.7

 

7.5. Discussions 

The laser-projection endoscope will be used for in-vivo data collection. However, the vertical and horizontal calibration methods presented in chapters 4 and 6 were developed based on benchtop recordings. There are significant differences between the two recording conditions. Specifically, the benchtop recording presents the most controlled data acquisition scenario. For example, the surfaces were white and flat, there was minimal light reflection, and the optical axis was perpendicular to the target surface. On the other hand, the in-vivo condition represents the least controllable data collection environment. For example, in-vivo images are very likely to be acquired at a non-perpendicular imaging angle from the region of interest. Furthermore, the region of interest would definitely have a 3D topology and be non-flat. The in-vivo data will also be collected in the presence of a 300-Watt xenon light, which could add significant light reflections to the acquired images. In summary, the true performance of the method on actual in-vivo data could be significantly different from the values estimated and reported in chapters 4 and 6. Using calibrated intraoperative images would be one possible solution to remedy this.191 We could use calibrated intraoperative images and determine if there are large discrepancies between the performance under the two conditions. The main advantage of this approach would be its potential to mimic the true data collection condition. However, there are some limitations to this approach. For example, it can only be done in an operating room, which puts a practical restriction on its feasibility. Additionally, the number of possible factors is so high that if high measurement errors are found, it would be very difficult (if possible at all) to determine the factors contributing most to the measurement errors. Determining those factors would be necessary for devising better measurement approaches and/or instruments. Last but not least, the ground truth is not known in intraoperative images. More precisely, using the intraoperative images as the ground truth assumes perfect validity and reliability for their subsequent measurements, and attributes all measurement error to the method that is being tested. However, in reality, the estimated error would be a mixture of the two errors.

 


A different solution would be to simulate the most likely contributing factors in a controlled fashion. This approach has the potential to address the above-mentioned concerns; however, it depends on the selection of the factors most likely to contribute to measurement errors, which requires enough knowledge to assist with the selection process. In chapter 5 we saw that the imaging angle was the largest contributing factor to uncalibrated measurement errors from a flat surface. Therefore, it is logical to select it as a contributing factor. Furthermore, figure 7.1 shows that variations in the imaging angle bring some parts of a flat surface closer to the endoscope while pushing the other parts further away. We could hypothesize that this non-uniform distance may account for some of the observed increases in the measurement errors. In that case, imaging a non-flat surface would be another instance of a non-uniform distance between the target surface and the endoscope. Therefore, the 3D structure of the target surface was selected as the second likely contributing factor.

The two selected contributing factors (i.e. the imaging angle and the 3D topology of the target surface) were changed in a systematic way to investigate their effect on horizontal and vertical measurement errors. Experiment 1 was conducted to quantify the effect of changes in the imaging angle. Our analysis showed that vertical measurement errors from the PCA-based method were 2 times more sensitive to variations in the imaging angle than to variations in the working distance. A similar analysis was conducted on horizontal measurement error. It showed that horizontal measurement error from the non-uniform method was less sensitive to variation in the imaging angle than to variation in the working distance. Interestingly, the effect of imaging angle on the magnitude of error was non-significant. Comparing this outcome with the high sensitivity of uncalibrated vertical measurement errors to imaging angle (chapter 5) highlights the efficacy of the proposed method for handling the effect of imaging angle.

 


Experiment 2 was conducted to see if there are significant differences in measurement accuracies between a flat and a non-flat surface. First, the effect of surface type (flat vs. 3D) was tested on vertical measurement errors. Interestingly, the surface type did not have a significant effect on the vertical error. However, the magnitude of measurement error from the 3D surface was significantly higher than that from a flat surface. The vertical measurement capability of the laser endoscope would primarily be used for vertical envelope estimation and not the behavior of individual laser points. Therefore, the obtained non-significant result would be of higher practical value for this device. Additionally, the fact that the effect of surface type was non-significant for the error and significant for the magnitude of error indicates a random nature for measurement errors from the 3D surface. More specifically, it indicates that the magnitudes of overestimation and underestimation from the 3D surface are higher than those from a flat surface positioned at a comparable average working distance (hence the significant effect on the magnitude of error); however, the magnitudes of overestimation and underestimation are on the same level, and hence cancel each other out when averaged (hence the non-significant effect on the error). The cause of this non-significant effect warrants some explanation. The calibrated endoscope projects a set of distinct laser points on the FOV, and each laser point occupies a very small area of the whole image. Also, the vertical measurement method was designed such that the measurement from each laser point was independent of the other laser points. The combination of these two characteristics means that each laser point only has access to information (including topology information) from a very limited area. If the target surface is smooth and without sudden changes in the vertical component, any small area can be approximated with a flat surface. Consequently, the area that each laser point has access to would be almost a flat surface, and we should not see a significant difference in errors between the two conditions.

A peculiar trend in table 7.7 is also worth discussing. The mean percent magnitude of error decreases with the working distance. When the working distance is shorter, a smaller portion of the 3D surface would be recorded, and in that regard, the vertical variation should be smaller. Additionally, in chapter 4 we saw that the accuracy of vertical measurement was better at shorter working distances. So, we would expect to see smaller values for shorter working distances, which is the opposite of what table 7.7 shows. Therefore, some other factors should be contributing to this observation. In all of the analyses we assumed that the 3D printing error and the registration error had a random nature. However, this could be an incorrect assumption. Specifically, at shorter working distances the magnification of the imaging system is higher, and therefore a smaller area of the target surface is recorded. This means that fewer fiducial markers would be present in images acquired from shorter working distances, and the number of fiducial markers would increase as the working distance increases. Considering that the non-linear distortion of fiberoptic flexible endoscopes is location-dependent, and that registration accuracy relies on the fiducial markers, we could argue that at smaller working distances the registration error would be higher. This would translate into a less accurate estimation of the ground truth, which may incorrectly lead to a higher estimation of measurement errors.

Lastly, the statistical analysis indicated a significant effect of surface type on horizontal measurement error. However, post-hoc analysis did not find a consistent trend across different working distances. More specifically, the results suggested that at short working distances measurement errors from the 3D surface are higher than those from the flat surface; however, there was no significant difference between the two surface conditions at larger working distances. Table 7.9 shows that, in fact, as the working distance increases the difference in mean percent error and mean percent magnitude of error between the two surface conditions decreases.

 


7.6. Conclusions 

This work was motivated by the significant difference between the conditions under which the vertical and horizontal calibration methods were developed and evaluated and the actual in-vivo imaging conditions. The in-vivo condition is uncontrolled, with many different variable factors. These factors were not explicitly considered during the development of the algorithms. Therefore, measurement accuracies from in-vivo images could be very different from the values estimated in chapters 4 and 6. To address this concern, the two parameters most likely to degrade the accuracy of the developed horizontal and vertical measurement methods were investigated in this chapter. Those parameters were the imaging angle and the 3D topology of the surface. The analysis showed that vertical measurement errors were two times more sensitive to variations in the imaging angle than to variations in the working distance. However, horizontal measurement errors were less sensitive to variation in the imaging angle than to variation in the working distance. This highlights the robustness of the developed horizontal measurement method to variations in the imaging angle. Investigating the effect of surface type (flat vs. 3D) did not lead to significant differences in vertical measurement errors. A similar analysis on horizontal measurements indicated a significant effect of the surface type on horizontal measurement error. Post-hoc analysis suggested that at short working distances measurement errors from the 3D surface were higher than those from the flat surface, but the two conditions became more similar as the working distance increased.

 


CHAPTER 8: SUMMARY OF THE FINDINGS 

 

Spatially calibrated measurements could offer significant advantages for voice science research and clinical applications. They could be used to derive criteria for more accurate and direct evaluation of intervention outcomes (e.g. post-intervention changes in the lesion size), and in that regard could advance evidence-based practice in the field of laryngology and speech-language pathology. Spatially calibrated measurements could also be used to create comprehensive models that link the input (i.e. airflow), the output (i.e. acoustic signal), and parameters of the phonatory system (e.g. calibrated glottal area waveform, vocal fold length, kinematic measures) together. Such computational models are expected to explain the individual differences that we see in the intervention outcomes of individual patients. This prospective line of research could advance precision and personalized medicine in the field of laryngology and speech-language pathology. Kinematic measures (e.g. the vocal fold velocity) are another possible outcome of calibrated measurements. Kinematic measures are closely related to the biomechanics of vocal fold vibration and provide a wealth of information for modeling and patient-specific modeling applications. Additionally, vocal fold collision forces and vocal fold stiffness256 are among the important parameters of the phonatory system that may be estimated indirectly using the velocity measures. More accurate grading of laryngeal diseases and the study of the developmental aspects of the vocal folds are other topics of interest that could benefit from calibrated spatial measurements. Considering the significance of such prospective research, this dissertation was devoted to an in-depth treatment of spatially calibrated measurements from in-vivo high-speed videoendoscopy images.

Generally speaking, achieving the spatial calibration goals depend on the existence of some 

auxiliary  information.  This  auxiliary  information  makes  the  conversion  from  the  uncalibrated 

 

236 

lengths (i.e. pixel) to calibrated lengths (i.e. mm) possible. Depending on the source of the auxiliary 

information, two different categories of direct and indirect calibration approaches were identified 

and presented in this dissertation. The auxiliary information of direct method comes from the same 

image that we want to perform measurements from. While, in the indirect calibration approach the 

auxiliary  information  comes  from  a  different  image  than  the  image  that  we  want  to  perform 

measurements from. The definition of direct method stipulates the existence of some properly 

designed fiducial markers on the acquired images. Therefore, several important challenges should 

be  addressed  for  direct  methods.  First,  proper  fiducial  markers  should  be  designed,  such  that 

calibrated  measurements  become  possible,  while  the  fiducial  markers  should  not  obstruct  the 

clinical applications of the acquired images. Second, the fiducial markers should be delivered and 

projected  on  the  field  of  view.  Third,  sophisticated  calibration  protocols  and  measurement 

techniques  should  be  developed  and  implemented  to  achieve  the  measurement  purposes.  In 

summary, direct calibration could offer very reliable and accurate calibrated measurements, but it

requires specialized hardware and software capabilities and, because of that, would only be

accessible to a very limited number of research labs. Such systems are unlikely to become commercially

available  in  the  near  future.  Indirect  calibration  was  the  solution  that  was  proposed  in  this 

dissertation to make calibrated measurements accessible to more research labs. Specifically, the 

indirect  calibration  uses  the  uncalibrated  length  (i.e.  pixel  length)  of  a  common  object  for 

normalization of other spatial features of the image. Depending on the information available from 

the common object, either absolute mm measurement or percentage change of a target object can 

be computed. The downside of indirect calibration is the lower reliability of the resulting measurements.

Specifically, three main assumptions behind the validity of indirect calibration were presented in 

this dissertation. Often (if not always) direct evaluation of these three assumptions is not trivial 

 


from in-vivo images, and hence the measurement errors from indirect calibration cannot be

estimated directly. However, two tests were proposed in this dissertation that could provide some

level of assurance regarding the validity of the measurements. Figure 8.1 presents a diagram of the

relationships among the different chapters of this dissertation. 

[Figure 8.1: block diagram. Spatial calibration approach: indirect calibration (chapter 2) and direct calibration. Application of indirect calibration: closing velocity of the vocal folds (chapter 3). Direct calibration: vertical calibration (chapter 4) and horizontal calibration (chapter 6). Confounding factors: working distance (chapter 4) and spatial location of the target object (chapter 5). Validation: external validity of horizontal and vertical measurements (chapter 7).]

Figure 8.1. Graphical representation of the relationships among the chapters of this dissertation. 
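The normalization at the core of indirect calibration, as summarized above, can be illustrated with a minimal sketch. The function names and the numeric values in the usage note are hypothetical and are not taken from the dissertation; the sketch only assumes that a common object is visible in both images and that the assumptions behind indirect calibration hold.

```python
def mm_per_pixel(common_length_mm: float, common_length_px: float) -> float:
    """Scale factor derived from a common object whose calibrated (mm) length is known."""
    if common_length_mm <= 0 or common_length_px <= 0:
        raise ValueError("lengths must be positive")
    return common_length_mm / common_length_px


def to_mm(target_length_px: float, scale_mm_per_px: float) -> float:
    """Absolute length of a target object, given the scale from the common object."""
    return target_length_px * scale_mm_per_px


def percent_change(target_px_a: float, common_px_a: float,
                   target_px_b: float, common_px_b: float) -> float:
    """Percent change of a target between images A and B.

    Used when the mm length of the common object is unknown but assumed
    constant: each target is normalized by the common object in its own image.
    """
    ratio_a = target_px_a / common_px_a
    ratio_b = target_px_b / common_px_b
    return 100.0 * (ratio_b - ratio_a) / ratio_a
```

For example, a common object of 2 mm that spans 40 pixels gives 0.05 mm per pixel, so a 100-pixel target would measure 5 mm. When only pixel lengths are available, percent_change(50, 100, 60, 100) reports a 20% increase of the target relative to the common object.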

8.1. Specific contributions of each dissertation chapter 

 

 

 

Chapter 2 was devoted to a formal treatment of the indirect calibration method. The assumptions

behind the validity of its measurements were derived from a mathematical analysis of the pixel

size. To make the problem tractable, it was assumed that the pixel size was only a function of the

working distance (e.g., no non-linear image distortion) and that the optical axis was perpendicular to

the target surface. Under these conditions, three main assumptions governing the validity of the

indirect calibration were derived. First, the common attribute should be registered accurately in 

the target image. Second, the common attribute and the target object should be at the same vertical 

distance  from  the  endoscope.  Third,  the  calibrated  length  of  the  common  attribute  should  not 

 


change between different imaging sessions. Finally, these assumptions were tested and discussed 

in the context of laryngeal imaging, using a pre-existing HSV dataset.
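To see why the second assumption (equal vertical distance) matters, consider a simplified case. The sketch below assumes an ideal pinhole camera, in which the pixel size (mm per pixel) grows in direct proportion to the working distance. This proportionality is an assumption made here for illustration and is not the derivation of chapter 2; the function name is hypothetical.

```python
def depth_mismatch_relative_error(d_common_mm: float, d_target_mm: float) -> float:
    """Relative error of an indirectly calibrated length when the common
    attribute and the target sit at different working distances.

    Assumes pixel size = k * working distance (ideal pinhole model), so the
    scale taken from the common attribute is off by the factor
    d_common / d_target when applied to the target.
    """
    if d_common_mm <= 0 or d_target_mm <= 0:
        raise ValueError("working distances must be positive")
    return (d_common_mm - d_target_mm) / d_target_mm
```

Under this simplified model, a common attribute at 10 mm and a target at 12 mm would yield an underestimate of about 16.7%, whereas equal distances yield no error from this source.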

Chapter 3 built on the results of chapter 2 and used the indirect calibration method to

investigate post-surgery changes in the closing velocity of the vocal folds in patients with vocal

fold (VF) mass lesions. HSV recordings at habitual pitch and habitual loudness from 16 subjects with VF

mass lesions were collected pre-surgery and post-surgery. Spatially calibrated intraoperative

images  were  acquired  from  each  subject  during  the  surgery.  HSV  data  underwent  temporal 

segmentation  (to  select  the  timestamps  corresponding  with  different  glottal  phases),  motion 

compensation  (to  remove  the  endoscopic  motion  artifacts),  spatial  segmentation  (to  detect  the 

edges of the VF at sub-pixel resolution), and horizontal calibration. The pre-surgery

HSV data were indirectly calibrated by registering the lesions from the intraoperative images to

their corresponding HSV recordings. The vocal fold width was then taken from each calibrated pre-surgery HSV

recording and registered to the corresponding post-surgery HSV recording. This step

led to indirect calibration of the post-surgery HSV data. Three different experiments were

conducted to investigate (1) the post-surgery changes in the closing velocity of the vocal folds,

(2) the difference between the pre-surgery and post-surgery similarity of the closing velocities of the two

vocal folds, and (3) the association between post-surgery changes in the closing velocity of the

vocal folds and the area of the lesion. Experiment 1 showed significant increases in the closing

velocity of the vocal folds with the lesion; however, the increase on the contralateral side was

limited mostly to the area in direct contact with the lesion. Experiment 2 showed that the closing

velocities of the two vocal folds became more similar after the surgery. Experiment 3 failed to detect

a significant correlation between the post-surgery changes in the closing velocity of the vocal folds

and the area of the lesion.
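As an illustration of how a calibrated closing velocity could be obtained at one scanning line, the following sketch differentiates the edge displacement over time and converts it to mm/s. It is a first-difference approximation under stated assumptions, not the processing pipeline of chapter 3: the function name, the sign convention (displacement measured laterally from the glottal midline, so closing means decreasing displacement), and the example frame rate and scale are all hypothetical.

```python
from typing import Sequence


def max_closing_velocity(edge_px: Sequence[float],
                         frame_rate_hz: float,
                         scale_mm_per_px: float) -> float:
    """Maximum closing speed (mm/s) of a vocal fold edge at one scanning line.

    edge_px: lateral displacement of the edge from the glottal midline,
             in pixels, one value per HSV frame.
    Closing is taken as decreasing displacement; the result is reported
    as a positive speed, or 0.0 if the edge never moves medially.
    """
    if len(edge_px) < 2:
        raise ValueError("need at least two frames")
    if frame_rate_hz <= 0 or scale_mm_per_px <= 0:
        raise ValueError("frame rate and scale must be positive")
    # First difference between consecutive frames, converted to mm/s.
    velocities = [(b - a) * scale_mm_per_px * frame_rate_hz
                  for a, b in zip(edge_px, edge_px[1:])]
    closing_speeds = [-v for v in velocities if v < 0]
    return max(closing_speeds) if closing_speeds else 0.0
```

With a hypothetical displacement trace of [0, 2, 4, 3, 1, 0] pixels, a frame rate of 4000 frames per second, and a scale of 0.05 mm per pixel, the steepest closing step is 2 pixels per frame, which corresponds to 400 mm/s.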

 


Chapter 4 presented the methodology for direct vertical calibration of HSV images using a 

laser-projection fiberoptic transnasal endoscope. Access to calibrated vertical measurements

could provide significant and clinically valuable information regarding the vertical movements of

the vocal folds in normal and disordered populations. Furthermore, vertical calibration is a

prerequisite for calibrated horizontal measurements from the laser-projection endoscope. The x- and

y-coordinates of the laser points are the primary factors that encode the vertical distance. However,

investigating the position of the laser points showed that, besides the vertical distance, they also 

depended on the parameters of the lens coupler, including the field of view (FOV) position within 

the image frame and the rotation angle of the endoscope. An automatic calibration method was 

developed  to  compensate  for  the  effect  of  these  parameters.  Statistical  image  processing  and 

pattern recognition were used to detect the FOV, the center of FOV, and the fiducial marker. This 

step normalized the HSV frames to a standard coordinate system and removed the dependence of 

the laser-point positions on the parameters of the lens coupler. Then, using a statistical learning 

technique, a calibration protocol was developed to model the trajectories of all laser points as the 

working distance was varied. Finally, a set of experiments was conducted to measure the accuracy 

and validity of every step of the procedure. The system was able to measure vertical distance with 

mean percent error in the range of 1.7% to 4.7%, depending on the working distance. 
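The two stages described above, normalizing frames to a standard coordinate system and then mapping laser-point position to working distance, can be sketched as follows. This is a simplified stand-in under stated assumptions: the FOV center, radius, and rotation angle are taken as already detected, a single normalized coordinate of one laser point is used, and piecewise-linear interpolation over a benchtop table replaces the statistical learning technique used in the dissertation. All names are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, Tuple


def normalize_point(x: float, y: float, fov_center: Tuple[float, float],
                    fov_radius: float, rotation_rad: float) -> Tuple[float, float]:
    """Map pixel coordinates to an FOV-centered, rotation-compensated frame.

    The point is translated to the FOV center, rotated by -rotation_rad to
    undo the endoscope rotation, and divided by the FOV radius.
    """
    dx, dy = x - fov_center[0], y - fov_center[1]
    c, s = math.cos(-rotation_rad), math.sin(-rotation_rad)
    return ((c * dx - s * dy) / fov_radius, (s * dx + c * dy) / fov_radius)


def estimate_distance(u: float, table: List[Tuple[float, float]]) -> float:
    """Working distance (mm) for a normalized laser coordinate u.

    table: (u, distance_mm) pairs from benchtop recordings, with at least two
    entries and strictly increasing u. Values between entries are linearly
    interpolated; values outside the calibrated range are rejected.
    """
    us = [row[0] for row in table]
    if not us[0] <= u <= us[-1]:
        raise ValueError("laser coordinate outside the calibrated range")
    i = bisect_left(us, u)
    if us[i] == u:
        return table[i][1]
    (u0, d0), (u1, d1) = table[i - 1], table[i]
    return d0 + (d1 - d0) * (u - u0) / (u1 - u0)
```

Normalizing first means that the same calibration table can be reused when the lens coupler shifts or rotates the FOV within the image frame, which is the dependence the calibration method was designed to remove.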

Accurate calibrated horizontal measurements require determining their confounding

factors and then accounting for them. Working distance is the most obvious confounding factor for

horizontal measurements, and the method for its estimation was presented in chapter 4. Chapter 5

investigated the possibility of a second confounding factor for calibrated horizontal measurements,

namely the spatial location of the target object. To that end, the effects of the distortions of the fiberoptic flexible

endoscope on calibrated horizontal measurements were studied and quantified. It was

 


shown that two sources of nonlinear distortion could cause captured images to deviate from reality.

The first distortion stemmed from the wide-angle lens used in flexible endoscopes. It was shown

that endoscopic images have a significantly higher spatial resolution in the center of the FOV than

in its periphery. The difference between the two could lead to errors as high as 26.4% in calibrated

horizontal measurements. The second distortion stemmed from variations in the imaging angle. It 

was shown that the disparity between spatial resolution in the center and periphery of endoscopic 

images increases as the imaging angle deviates from the perpendicular position. Furthermore, it 

was shown that when the imaging angle varies, the symmetry of the distortion was also affected 

significantly. Our analyses showed that the combined distortions could lead to calibrated horizontal

measurement errors as high as 65.7%.  

Chapter 6 built on the results and outcomes of chapters 4 and 5 and presented the methodology 

for accurate horizontal measurements from a laser-projection fiberoptic transnasal endoscope. To 

that end, a set of circular grids was recorded at multiple working distances. A statistical model

was trained to map the pixel length of the target object, the working distance, and the spatial

location of the target object to its length in mm. This non-uniform model was contrasted with a

second model that did not compensate for the effect of the spatial location of the target object. This

property led to a model with similar pixel sizes for all parts of the image, and hence it was named

the uniform model. The uniform model is the basis of existing methods for calibrated horizontal

measurements, and it is significant in that regard. A detailed analysis of the performance of both

models was presented. The analyses showed that the accuracy of the uniform model depended

significantly on the working distance and on the length of the target object. However, the

non-uniform model was quite robust to those variations. The estimated average magnitude of error

from the non-uniform model was 0.27 mm, about one third of that of the uniform model.
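The contrast between the two models can be made concrete with a toy parameterization. The functional forms below are assumptions chosen only for illustration: pixel size proportional to working distance, and a quadratic radial term for the lower resolution at the periphery. They are not the trained statistical models of chapter 6, and the coefficients a and b are hypothetical.

```python
def uniform_mm(length_px: float, distance_mm: float, a: float) -> float:
    """Uniform model: one pixel size for the whole image at a given distance.

    Assumed form: pixel size (mm per pixel) = a * working distance.
    """
    return length_px * a * distance_mm


def nonuniform_mm(length_px: float, distance_mm: float, r_norm: float,
                  a: float, b: float) -> float:
    """Non-uniform model: pixel size also depends on the location in the FOV.

    r_norm: radial position of the target, 0 at the FOV center and 1 at the edge.
    Assumed form: pixel size = a * working distance * (1 + b * r_norm**2),
    with b > 0 so that each pixel covers more mm toward the periphery.
    """
    return length_px * a * distance_mm * (1.0 + b * r_norm ** 2)
```

In this toy setting the two models agree at the FOV center (r_norm = 0) and diverge toward the periphery, where the uniform model underestimates the length by the factor 1 + b * r_norm**2.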

 


Chapters 4 and 6 presented the methods for calibrated vertical and horizontal measurements 

from a laser-projection fiberoptic transnasal endoscope. The design and evaluation of those

methods were carried out in controlled settings using benchtop recordings. However, many factors

could contribute to measurement errors in in-vivo images.

Chapter  7  investigated  the  effect  of  two  factors  that  were  more  likely  to  contribute 

significantly to increased measurement errors from in-vivo images. These factors were the imaging 

angle and the surface topology. To that end, the calibrated vertical and horizontal measurement 

models trained in chapters 4 and 6 were used. Two experiments were conducted to evaluate their 

performance in situations modeling in-vivo settings. The first experiment was based on

images acquired from tilted surfaces. The second experiment was based on a target surface with 

known  x-,  y-,  z-coordinates  that  was  3D-printed.  The  measurement  accuracies  from  the  tilted 

surface and the 3D-printed surface were contrasted with the accuracy from the flat surface. The 

data  analysis  showed  a  significant  effect  of  imaging  angle  on  vertical  measurement  error. 

However, the effect of imaging angle on the magnitude of horizontal measurement error was not 

significant. Analysis of the effect of surface topology showed the reverse pattern. The effect of

surface type on vertical measurement error was not significant, but the magnitude of horizontal

measurement errors from the 3D surface was significantly higher than that from the flat surface. The mean

percent magnitude of horizontal measurement error increased from 5% (flat) to 10.6% (3D) at a

working distance of 15 mm, which still represents satisfactory accuracy.
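For readers who wish to compute a comparable accuracy figure on their own benchtop data, the sketch below shows one plausible definition of the mean percent magnitude of error: the average of |estimate - truth| / truth, expressed in percent. The exact definition used in chapter 7 is assumed here rather than quoted, and the function name is hypothetical.

```python
from typing import Sequence


def mean_percent_magnitude_error(estimates_mm: Sequence[float],
                                 truths_mm: Sequence[float]) -> float:
    """Mean of |estimate - truth| / |truth|, in percent.

    estimates_mm: calibrated measurements produced by the model.
    truths_mm: known ground-truth lengths (nonzero), in the same order.
    """
    if len(estimates_mm) != len(truths_mm) or not truths_mm:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - t) / abs(t) for e, t in zip(estimates_mm, truths_mm))
    return 100.0 * total / len(truths_mm)
```

Because the magnitude of each error is taken before averaging, overestimates and underestimates do not cancel, so the metric reflects the typical size of the error rather than its bias.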

8.2. Directions for further investigations 

This dissertation can be expanded in several directions in future work. Chapter 2 presented

the concept of indirect calibration. It was shown that the vocal fold width was a robust feature for 

calibration. However, this conclusion was based on a small sample size and may not be very 

 


generalizable. A study with a larger sample size and more phonatory behaviors (e.g.,

resting state, combinations of different loudness levels and pitches) could identify the attribute that

best satisfies the assumption of a consistent common attribute. Additionally,

devising a test that can validate the vertical distance assumption was left as an open problem for 

future  research.  Chapter  3  may  be  expanded  in  several  directions.  Specifically,  the  dependent 

variable of chapter 3 was the magnitude of maximum closing velocity at different scanning lines. 

However,  the  developed  method  could  be  used  to  investigate  the  phase-differences  between 

different scanning lines of the vocal folds. It is quite possible that this variable explains some

phenomena that the magnitude of velocity cannot. Relating the post-surgery changes in the closing

velocity to the output of the system (e.g., acoustic changes) would be another line of future research.

Considering that calibrated images are often not available, devising a non-calibrated proxy for

closing velocity could remove a significant obstacle to the application of kinematic measures in other

studies. Chapter 5 presented the effects of non-linear distortion and imaging angle on horizontal 

measurements from a fiberoptic flexible endoscope. However, rigid endoscopes and distal-chip

flexible endoscopes are more widely used in clinical practice. Investigating and quantifying

the effects of non-linear distortion and imaging angle for rigid and distal-chip flexible

endoscopes could be of significant value for clinical practice. Additionally, chapter 5 showed that

variation in the imaging angle is a significant confounding factor for horizontal measurements.

However, a method for estimating and compensating for the imaging angle is still lacking. Our

initial experiments with the laser-projection endoscope showed promising results that could

lead to an innovative application of the laser-projection endoscope and that require further

investigation. Finally, chapters 4 and 6 presented the methods for vertical and horizontal calibrated

 


measurements; however, applications of these methods were not part of this dissertation. Such

applications would open a whole avenue for future research.

 

 


 

 

 

 

 

 

 

 

 

REFERENCES 

 

 

 

 

 

 

 

 

 


 


1.   Connor NP, Cohen SB, Theis SM, Thibeault SL, Heatley DG, Bless DM. Attitudes of
children with dysphonia. J Voice. 2008;22(2):197-209.

2.   Lass NJ, Ruscello DM, Bradshaw KH, Blankenship BL. Adolescents’ perceptions of
normal and voice-disordered children. J Commun Disord. 1991;24(4):267-274.

3.   Branski RC, Cukier-Blaj S, Pusic A, et al. Measuring quality of life in dysphonic patients:
a systematic review of content development in patient-reported outcomes measures. J Voice.
2010;24(2):193-198.

4.   Merati AL, Keppel K, Braun NM, Blumin JH, Kerschner JE. Pediatric voice-related quality 
of life: findings in healthy children and in common laryngeal disorders. Ann Otol Rhinol 
Laryngol. 2008;117(4):259-262. 

5.   Murry  T,  Rosen  CA.  Outcome  measurements  and  quality  of  life  in  voice  disorders. 

Otolaryngol Clin North Am. 2000;33(4):905-916. 

6.   Scott S, Robinson K, Wilson JA, Mackenzie K. Patient-reported problems associated with
dysphonia. Clin Otolaryngol Allied Sci. 1997;22(1):37-40.

7.   Hogikyan ND, Sethuraman G. Validation of an instrument to measure voice-related quality 

of life (V-RQOL). J Voice. 1999;13(4):557-569.

8.   Allen MS, Pettit JM, Sherblom JC. Management of vocal nodules: a regional survey of 
otolaryngologists  and  speech-language  pathologists.  J  Speech,  Lang  Hear  Res. 
1991;34(2):229-235. 

9.   Ramig LO, Verdolini K. Treatment efficacy: voice disorders. J Speech, Lang Hear Res.
1998;41(1):S101-S116.

10.   Roy  N,  Merrill  RM,  Gray  SD,  Smith  EM.  Voice  disorders  in  the  general  population: 
prevalence, risk factors, and occupational impact. Laryngoscope. 2005;115(11):1988-1995. 

11.   Cutiva LCC, Vogel I, Burdorf A. Voice disorders in teachers and their associations with 

work-related factors: a systematic review. J Commun Disord. 2013;46(2):143-155. 

12.   Titze IR. Principles of Voice Production. Prentice-Hall, Englewood Cliffs, NJ; 1994. 

13.   Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. Cengage Learning; 

2000. 

14.   Rothenberg M. Acoustic interaction between the glottal source and the vocal tract. Vocal 

fold Physiol. 1981;1:305-323. 

 


15.   Rothenberg M. Source-tract acoustic interaction in breathy voice. In: Proceedings of the 
International  Conference  on  Physiology  and  Biophysics  of  the  Voice,  Iowa  City,  IA.  ; 
1983:465-481. 

16.   Huffman MK. Measures of phonation type in Hmong. J Acoust Soc Am. 1987;81(2):495-504.

17.   Fischer-Jørgensen E. Phonetic Analysis of Breathy (Murmured) Vowels in Gujarati.; 1970. 

18.   Södersten M, Lindestad P-Å. Glottal closure and perceived breathiness during phonation in 

normally speaking subjects. J Speech, Lang Hear Res. 1990;33(3):601-611. 

19.   Klatt DH, Klatt LC. Analysis, synthesis, and perception of voice quality variations among 

female and male talkers. J Acoust Soc Am. 1990;87(2):820-857. 

20.   Alku  P,  Vilkman  E.  A  comparison  of  glottal  voice  source  quantification  parameters  in 
breathy, normal and pressed phonation of female and male speakers. Folia Phoniatr Logop. 
1996;48(5):240-254. 

21.   Shue Y-L, Chen G, Alwan A. On the interdependencies between voice quality, glottal gaps, 
and  voice-source  related  acoustic  measures.  In:  Eleventh  Annual  Conference  of  the 
International Speech Communication Association. ; 2010. 

22.   Bergan CC, Titze IR, Story B. The perception of two vocal qualities in a synthesized vocal 

utterance: ring and pressed voice. J Voice. 2004;18(3):305-317. 

23.   Holmberg  EB,  Hillman  RE,  Perkell  JS.  Glottal  airflow  and  transglottal  air  pressure 
measurements for male and female speakers in soft, normal, and loud voice. J Acoust Soc 
Am. 1988;84(2):511-529. doi:10.1121/1.396829 

24.   Titze  IR.  Theoretical  analysis  of  maximum  flow  declination  rate  versus  maximum  area 

declination rate in phonation. J Speech, Lang Hear Res. 2006;49:439-447. 

25.   Schuberth  S,  Hoppe  U,  Döllinger  M,  Lohscheller  J,  Eysholdt  U.  High-precision 
measurement  of  the  vocal  fold  length  and  vibratory  amplitudes.  Laryngoscope. 
2002;112(6):1043-1049. 

26.   Holmberg EB, Doyle P, Perkell JS, Hammarberg B, Hillman RE. Aerodynamic and acoustic 
voice  measurements  of  patients  with  vocal  nodules:  Variation  in  baseline  and  changes 
across voice therapy. J Voice. 2003;17(3):269-282. doi:10.1067/S0892-1997(03)00076-6 

27.   Iwahashi T, Ogawa M, Hosokawa K, Kato C, Inohara H. A detailed motion analysis of the
angular velocity between the vocal folds during throat clearing using high-speed digital
imaging. J Voice. 2016;30(6):770.e1-770.e8.

28.   Dromey  C,  Stathopoulos  ET,  Sapienza  CM.  Glottal  airflow  and  electroglottographic 

measures of vocal function at multiple intensities. J Voice. 1992;6(1):44-54. 

 


29.   Titze  IR,  Sundberg  J.  Vocal  intensity  in  speakers  and  singers.  J  Acoust  Soc  Am. 

1992;91(5):2936-2946. doi:10.1121/1.402929 

30.   Roy N, Barkmeier-Kraemer J, Eadie T, et al. Evidence-based clinical voice assessment: a 

systematic review. Am J Speech-Language Pathol. 2013;22(2):212-226. 

31.   Kreiman J, Gerratt BR, Kempster GB, Erman A, Berke GS. Perceptual evaluation of voice 
quality: Review, tutorial, and a framework for future research. J Speech, Lang Hear Res. 
1993;36(1):21-40. 

32.   De MSB, de Heyning Van PH, Wuyts FL, Lambrechts L. The perceptual evaluation of voice 

disorders. Acta Otorhinolaryngol Belg. 1996;50(4):283-291. 

33.   Kent  RD.  Hearing  and  believing:  Some  limits  to  the  auditory-perceptual  assessment  of 

speech and voice disorders. Am J Speech-Language Pathol. 1996;5(3):7-23. 

34.   Oates J. Auditory-perceptual evaluation of disordered voice quality. Folia Phoniatr Logop. 

2009;61(1):49-56. 

35.   Behrman A. Common practices of voice therapists in the evaluation of patients. J Voice. 

2005;19(3):454-469. 

36.   Murugappan S, Boyce S, Khosla S, Kelchner L, Gutmark E. Acoustic characteristics of 

phonation in “wet voice” conditions. J Acoust Soc Am. 2010;127(4):2578-2589. 

37.   Warms  T,  Richards  J.  “Wet  voice”  as  a  predictor  of  penetration  and  aspiration  in 

oropharyngeal dysphagia. Dysphagia. 2000;15(2):84-88. 

38.   Arvedson JC. Feeding children with cerebral palsy and swallowing difficulties. Eur J Clin 

Nutr. 2013;67(S2):S9. 

39.   Baker BM, Fraser AM, Baker CD. Long-term postoperative dysphagia in oral/pharyngeal 
surgery  patients:  subjects’  perceptions  vs.  videofluoroscopic  observations.  Dysphagia. 
1991;6(1):11-16. 

40.   Kempster  GB,  Gerratt  BR,  Abbott  KV,  Barkmeier-Kraemer  J,  Hillman  RE.  Consensus 
auditory-perceptual evaluation of voice: development of a standardized clinical protocol. 
Am J Speech-Language Pathol. 2009;18(2):124-132. 

41.   Takahashi H. rating using the grbas scale. Japan Soc Logop phoniatr. 1995. 

42.   Honjo  I,  Isshiki  N.  Laryngoscopic  and  voice  characteristics  of  aged  persons.  Arch 

Otolaryngol. 1980;106(3):149-150. 

43.   Gobl C, Chasaide AN. The role of voice quality in communicating emotion, mood and 

attitude. Speech Commun. 2003;40(1-2):189-212. 

44.   Laver JDM. Voice quality and indexical information. Br J Disord Commun. 1968;3(1):43-54.

45.   Yuasa IP. Creaky voice: A new feminine voice quality for young urban-oriented upwardly 

mobile American women? Am Speech. 2010;85(3):315-337. 

46.   Gobl  C,  Chasaide  AN.  Acoustic  characteristics  of  voice  quality.  Speech  Commun. 

1992;11(4-5):481-490. 

47.   Kreiman  J, Gerratt  BR.  Sources  of  listener  disagreement  in  voice  quality  assessment.  J 

Acoust Soc Am. 2000;108(4):1867-1876. 

48.   Kreiman J, Gerratt BR. Validity of rating scale measures of voice quality. J Acoust Soc Am. 

1998;104(3):1598-1608. 

49.   Gerratt BR, Kreiman J, Antonanzas-Barroso N, Berke GS. Comparing internal and external 
standards  in  voice  quality  judgments.  J  Speech,  Lang  Hear  Res.  1993;36(1):14-20. 
doi:10.1044/jshr.3601.14 

50.   Chan KMK, Yiu EML. The effect of anchors and training on the reliability of perceptual 

voice evaluation. J Speech, Lang Hear Res. 2002. 

51.   Eddins DA, Anand S, Camacho A, Shrivastav R. Modeling of breathy voice quality using 

pitch-strength estimates. J Voice. 2016;30(6):774--e1. 

52.   Kopf  LM,  Skowronski  MD,  Anand  S,  Eddins  DA,  Shrivastav  R.  The  Perception  of 

Breathiness in the Voices of Pediatric Speakers. J Voice. 2017. 

53.   Lieberman P. Perturbations in vocal pitch. J Acoust Soc Am. 1961;33(5):597-603. 

54.   Lieberman  P.  Some  acoustic  measures  of  the  fundamental  periodicity  of  normal  and 

pathologic larynges. J Acoust Soc Am. 1963;35(3):344-353. 

55.   Koike  Y.  Application  of  Some  Acoustic  Measures  for  the  Evaluation  of  Laryngeal 

Dysfunction. 1973. 

56.   Koike Y, Takahashi H, Calcaterra TC. Acoustic measures for detecting laryngeal pathology. 

Acta Otolaryngol. 1977;84(1-6):105-117. 

57.   Wendahl  RW.  Laryngeal  analog  synthesis  of  jitter  and shimmer  auditory  parameters  of 

harshness. Folia Phoniatr Logop. 1966;18(2):98-108. 

58.   Qi Y, Weinberg B, Bi N, Hess WJ. Minimizing the effect of period determination on the 
computation of amplitude perturbation in voice. J Acoust Soc Am. 1995;97(4):2525-2532. 
doi:10.1121/1.411972 

59.   Klingholz F. The measurement of the signal-to-noise ratio (SNR) in continuous speech. 

Speech Commun. 1987;6(1):15-26. 

60.   Yumoto E, Gould WJ, Baer T. Harmonics-to-noise ratio as an index of the degree of
hoarseness. J Acoust Soc Am. 1982;71(6):1544-1550.

61.   Qi Y, Hillman RE. Temporal and spectral estimations of harmonics-to-noise ratio in human 

voice signals. J Acoust Soc Am. 1997;102(1):537-543. doi:10.1121/1.419726 

62.   Ebihara S, Ogawa S. Normalized noise energy as an acoustic measure to evaluate pathologic 

voice. J Acoust Soc Am. 1986;80(5):1329-1334. doi:10.1121/1.394384 

63.   Michaelis D, Gramss T, Strube HW. Glottal-to-Noise Excitation Ratio - A New Measure 

for Describing Pathological Voices. Acustica. 1997;83(4):700-706. 

64.   Ghasemzadeh H, Arjmandi MK. Toward Optimum Quantification of Pathology-induced 
Noises: An Investigation of Information Missed by Human Auditory System. IEEE/ACM 
Trans Audio, Speech, Lang Process. 2020;28:519-528. 

65.   Klich RJ. Relationships of vowel characteristics to listener ratings of breathiness. J Speech, 

Lang Hear Res. 1982;25(4):574-580. 

66.   Stevens KN. Physics of laryngeal behavior and larynx modes. Phonetica. 1977;34(4):264-279.

67.   Hillenbrand J, Cleveland RA, Erickson RL. Acoustic correlates of breathy vocal quality. J 

Speech, Lang Hear Res. 1994;37(4):769-778. 

68.   Godino-Llorente JI, Gómez-Vilda P. Automatic Detection of Voice Impairments by Means 
of  Short-Term  Cepstral  Parameters  and  Neural  Network  Based  Detectors.  IEEE  Trans 
Biomed Eng. 2004;51(2):380-384. doi:10.1109/TBME.2003.820386 

69.   Arjmandi MK, Pooyan M. An optimum algorithm in pathological voice quality assessment 
using  wavelet-packet-based  features,  linear  discriminant  analysis  and  support  vector 
machine. Biomed Signal Process Control. 2012;7(1):3-19. doi:10.1016/j.bspc.2011.03.010 

70.   Ghasemzadeh H, Searl J. Modeling Dynamics of Connected Speech in Time and Frequency 
Domains with Application to ALS. 11th Int Conf Voice Physiol Biomech. 2018;(August). 

71.   Vaziri  G,  Almasganj  F,  Behroozmand  R.  Pathological  assessment  of  patients’  speech 

signals using nonlinear dynamical analysis. Comput Biol Med. 2010;40(1):54-63. 

72.   Jiang JJ, Zhang Y. Nonlinear dynamic analysis of speech from pathological subjects.
Electron Lett. 2002;38(6):294-295. doi:10.1049/e1

73.   Ghasemzadeh  H,  Tajik  Khass  M,  Khalil  Arjmandi  M,  Pooyan  M.  Detection  of  vocal 
disorders based on phase space parameters and Lyapunov spectrum. Biomed Signal Process 
Control. 2015;22:135-145. doi:10.1016/j.bspc.2015.07.002 

74.   Kalman RE. On the general theory of control systems. In: Proceedings First International 

Conference on Automatic Control, Moscow, USSR. ; 1960. 

 


75.   Kalman RE. Mathematical description of linear dynamical systems. J Soc Ind Appl Math 

Ser A Control. 1963;1(2):152-192. 

76.   Lindblom  B,  Sundberg  J.  Acoustical  Consequences  of  Lip,  Tongue,  Jaw,  and  Larynx 

Movement. J Acoust Soc Am. 2005;48(1A):120-120. doi:10.1121/1.1974958 

77.   Stevens KN, House AS. Development of a Quantitative Description of Vowel Articulation. 

J Acoust Soc Am. 2005;27(3):484-493. doi:10.1121/1.1907943 

78.   Yunusova Y, Rosenthal JS, Rudy K, Baljko M, Daskalogiannakis J. Positional targets for 
lingual  consonants  defined  using  electromagnetic  articulography.  J  Acoust  Soc  Am. 
2012;132(2):1027-1038. doi:10.1121/1.4733542 

79.   Stevens KN. On the quantal nature of speech. J Phonetics. 1989;17:3-45. 

80.   Stevens KN, Keyser SJ. Quantal theory, enhancement and overlap. J Phon. 2010;38(1):10-19.
doi:10.1016/j.wocn.2008.10.004

81.   Honda K, Takano S, Takemoto H. Effects of side cavities and tongue stabilization: Possible 

extensions of the quantal theory. J Phon. 2010;38(1):33-43. 

82.   Fujimura O. Remarks on quantitative description of the lingual articulation. Front speech 

Commun Res. 1978:17-24. 

83.   Perkell JS, Matthies ML, Tiede M, et al. The distinctness of speakers’ /s/-/ʃ/ contrast is related
to their auditory discrimination and use of an articulatory saturation effect. J Speech, Lang
Hear Res. 2004.

84.   Gick B, Stavness I, Chiu C, Fels S. Categorical variation in lip posture is determined by 

quantal biomechanical-articulatory relations. Can Acoust. 2011;39(3):178-179. 

85.   Moisik SR, Gick B. The Quantal Larynx: The Stable Regions of Laryngeal Biomechanics 
and Implications for Speech Production. J Speech, Lang Hear Res. 2017;60(3):540-560. 
doi:10.1044/2016_jslhr-s-16-0019 

86.   Moisik S, Gick B. The quantal larynx revisited. J Acoust Soc Am. 2013;133(5):3522-3522. 

doi:10.1121/1.4806322 

87.   Perkell JS. Movement goals and feedback and feedforward control mechanisms in speech 

production. J Neurolinguistics. 2012;25(5):382-407. 

88.   Williamson G. Human Communication: A Linguistic Introduction. Speechmark; 2001. 

89.   Deliyski DD, Powell MEG, Zacharias SRC, Gerlach TT, De Alarcon A. Experimental
investigation on minimum frame rate requirements of high-speed videoendoscopy for
clinical voice assessment. Biomed Signal Process Control. 2015;17:21-28.
doi:10.1016/j.bspc.2014.11.007

 


90.   Zacharias SRC, Deliyski DD, Gerlach TT. Utility of laryngeal high-speed videoendoscopy 

in clinical voice assessment. J Voice. 2018;32(2):216-220. 

91.   Bonilha  HS,  Deliyski  DD.  Mucosal  wave:  A  normophonic  study  across  visualization 

techniques. J Voice. 2008;22(1):23-33. 

92.   Olthoff  A,  Woywod  C,  Kruse  E.  Stroboscopy  versus  high-speed  glottography:  a 

comparative study. Laryngoscope. 2007;117(6):1123-1126. 

93.   Powell ME, Deliyski DD, Zeitels SM, et al. Efficacy of Videostroboscopy and High-Speed 
Videoendoscopy to Obtain Functional Outcomes From Perioperative Ratings in Patients 
With Vocal Fold Mass Lesions. J Voice. 2019 (in press). doi:10.1016/j.jvoice.2019.03.012

94.   Bonilha HS, Deliyski DD, Whiteside JP, Gerlach TT. Vocal Fold Phase Asymmetries in 
Patients With Voice Disorders: A Study Across Visualization Techniques. Am J Speech-
Language Pathol. 2012;21(1):3-15. doi:10.1044/1058-0360(2011/09-0086) 

95.   Rosen CA. Stroboscopy as a research instrument: development of a perceptual evaluation 

tool. Laryngoscope. 2005;115(3):423-428. 

96.   Bonilha HS, O’Shields M, Gerlach TT, Deliyski DD. Arytenoid adduction asymmetries in 
persons with and without voice disorders. Logop Phoniatr Vocology. 2009;34(3):128-134. 
doi:10.1080/14015430903150210 

97.   Braunschweig T, Flaschka J, Schelhorn-Neise P, Döllinger M. High-speed video analysis 
of the phonation onset, with an application to the diagnosis of functional dysphonias. Med 
Eng Phys. 2008;30(1):59-66. 

98.   Mehta DD, Deliyski DD, Quatieri TF, Hillman RE. Automated measurement of vocal fold 
vibratory asymmetry from high-speed videoendoscopy recordings. J Speech, Lang Hear 
Res. 2011;54(1):47-54. 

99.   Verikas A, Gelzinis A, Bacauskiene M, Uloza V. Integrating global and local analysis of 
color, texture and geometrical information for categorizing laryngeal images. Int J Pattern 
Recognit Artif Intell. 2006;20(08):1187-1205. 

100.   Orlikoff RF, Deliyski DD, Baken RJ, Watson BC. Validation of a Glottographic Measure 

of Vocal Attack. J Voice. 2009;23(2):164-168. doi:10.1016/j.jvoice.2007.08.004 

101.   Lohscheller J, Švec JG, Döllinger M. Vocal fold vibration amplitude, open quotient, speed 
quotient and their variability along glottal length: kymographic data from normal subjects. 
Logop Phoniatr Vocology. 2013;38(4):182-192. 

102.   Patel RR, Dubrovskiy D, Döllinger M. Measurement of glottal cycle characteristics between 

children and adults: physiological variations. J Voice. 2014;28(4):476-486. 

103.   Patel R, Donohue KD, Unnikrishnan H, Kryscio RJ. Kinematic measurements of the
vocal-fold displacement waveform in typical children and adult populations: quantification of
high-speed endoscopic videos. J Speech, Lang Hear Res. 2015;58(2):227-240.

104.   Hillman RE, Mehta DD. The science of stroboscopic imaging. In: Kendall KA, Leonard RJ, 
eds. Laryngeal Evaluation: Indirect Laryngoscopy to High-Speed Digital Imaging. Thieme 
New York, NY; 2010:101-109. 

105.   Deliyski  D.  Laryngeal  high-speed  videoendoscopy.  In:  Kendall  K,  Leonard  R,  eds. 
Laryngeal  Evaluation:  Indirect  Laryngoscopy  to  High-Speed  Digital  Imaging.  Thieme 
Medical, New York, NY; 2010:245-270. 

106.   Sprecher A, Olszewski A, Jiang JJ, Zhang Y. Updating signal typing in voice: addition of 

type 4 signals. J Acoust Soc Am. 2010;127(6):3710-3716. 

107.   Mehta DD, Deliyski DD, Hillman RE. Why Laryngeal Stroboscopy Really Works:
Clarifying Misconceptions Surrounding Talbot’s Law and the Persistence of Vision. J
Speech, Lang Hear Res. 2010;53(5):1263-1267. doi:10.1044/1092-4388(2010/09-0241)

108.   Mehta  DD,  Hillman  RE.  Current  role  of  stroboscopy  in  laryngeal  imaging.  Curr  Opin 

Otolaryngol Head Neck Surg. 2012;20(6):429. 

109.   Deliyski  DD,  Hillman  RE.  State  of  the  art  laryngeal  imaging:  research  and  clinical 

implications. Curr Opin Otolaryngol Head Neck Surg. 2010;18(3):147. 

110.   Patel RR, Eadie T, Paul D, et al. Recommended Protocols for Instrumental Assessment of 
Voice:  American  Speech-Language-Hearing  Association  Expert  Panel  to  Develop  a 
Protocol for Instrumental Assessment of Vocal Function. Am J Speech-Language Pathol. 
2018;27(3):887-905. doi:10.1044/2018_ajslp-17-0009 

111.   Deliyski  DD,  Petrushev  PP,  Bonilha  HS,  Gerlach  TT,  Martin-Harris  B,  Hillman  RE. 
Clinical  implementation  of  laryngeal  high-speed  videoendoscopy:  Challenges  and 
evolution. Folia Phoniatr Logop. 2008;60(1):33-44. 

112.   Švec JG, Schutte HK. Videokymography: high-speed line scanning of vocal fold vibration. 

J Voice. 1996;10(2):201-205. 

113.   Švec JG, Šram F, Schutte HK. Videokymography in Voice Disorders: What to Look For? 

Ann Otol Rhinol Laryngol. 2007;116(3):172-180. 

114.   Golla  ME,  Deliyski  DD,  Orlikoff  RF,  Moukalled  HJ.  Objective  comparison  of  the 
electroglottogram to synchronous high-speed images of vocal-fold contact during vibration. 
Model Anal Vocal Emiss Biomed Appl - 6th Int Work MAVEBA 2009. 2009;9:1-4. 

115.   Mehta DD, Zañartu M, Quatieri TF, Deliyski DD, Hillman RE. Investigating acoustic
correlates of human vocal fold vibratory phase asymmetry through modeling and laryngeal
high-speed videoendoscopy. J Acoust Soc Am. 2011;130(6):3999-4009.
doi:10.1121/1.3658441

116.   Naghibolhosseini M, Deliyski DD, Zacharias SRC, de Alarcon A, Orlikoff RF. Temporal 
Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech. J Voice. 
2018;32(2):256--e1. 

117.   Mehta  DD,  Deliyski  DD,  Zeitels  SM,  Quatieri  TF,  Hillman  RE.  Voice  production 
mechanisms following phonosurgical treatment of early glottic cancer. Ann Otol Rhinol 
Laryngol. 2010;119(1):1-9. 

118.   Schwarz  R,  Döllinger  M,  Wurzbacher  T,  Eysholdt  U,  Lohscheller  J.  Spatio-temporal 
quantification  of  vocal  fold  vibrations  using  high-speed  videoendoscopy  and  a 
biomechanical model. J Acoust Soc Am. 2008;123(5):2717-2732. 

119.   Yan Y, Damrose E, Bless D. Functional analysis of voice using simultaneous high-speed 

imaging and acoustic recordings. J Voice. 2007;21(5):604-616. 

120.   Skalski A, Zielinski T, Deliyski D. Analysis of vocal folds movement in high speed
videoendoscopy based on level set segmentation and image registration. In: 2008
International Conference on Signals and Electronic Systems; 2008:223-226.

121.   Yan Y, Chen X, Bless D. Automatic tracing of vocal-fold motion from high-speed digital 

images. IEEE Trans Biomed Eng. 2006;53(7):1394-1400. 

122.   Ghasemzadeh H, Deliyski DD, Ford DS, Kobler JB, Hillman RE, Mehta DD. Method for
Vertical Calibration of Laser-Projection Transnasal Fiberoptic High-Speed
Videoendoscopy. J Voice. 2020;34(6):847-861. PMID: 31151853; PMCID: PMC6883161.

123.   Moukalled HJ, Deliyski DD, Schwarz RR, Wang S. Segmentation of Laryngeal High-Speed 
Videoendoscopy in Temporal Domain Using Paired Active Contours. Sixth Int Work Model 
Anal Vocal Emiss Biomed Appl MAVEBA. 2009;9(d):137-140. 

124.   Karakozoglou S-Z, Henrich N, D’Alessandro C, Stylianou Y. Automatic glottal
segmentation  using  local-based  active  contours  and  application  to  glottovibrography. 
Speech Commun. 2012;54(5):641-654. 

125.   Deliyski DD. Endoscope motion compensation for laryngeal high-speed videoendoscopy. J 

Voice. 2005;19(3):485-496. doi:10.1016/j.jvoice.2004.07.006 

126.   Sulica L. Laryngoscopy, stroboscopy and other tools for the evaluation of voice disorders. 

Off Proced Laryngol An Issue Otolaryngol Clin. 2012;46(1):21. 

127.   Milstein  CF,  Charbel  S,  Hicks  DM,  Abelson  TI,  Richter  JE,  Vaezi  MF.  Prevalence  of 
laryngeal  irritation  signs  associated  with  reflux  in  asymptomatic  volunteers:  impact  of 
endoscopic technique (rigid vs. flexible laryngoscope). Laryngoscope. 2005;115(12):2256-
2261. 

128.   Yanagisawa  E,  Yanagisawa  K.  Stroboscopic  videolaryngoscopy:  A  comparison  of 
fiberscopic  and  telescopic  documentation.  Ann  Otol  Rhinol  Laryngol.  1993;102(4):255-
265. 

 


129.   Eller  R,  Ginsburg  M,  Lurie  D,  Heman-Ackah  Y,  Lyons  K,  Sataloff  R.  Flexible 
laryngoscopy:  a  comparison  of  fiber  optic  and  distal  chip  technologies  part  2: 
laryngopharyngeal reflux. J Voice. 2009;23(3):389-395. 

130.   Chandran  S,  Hanna  J,  Lurie  D,  Sataloff  RT.  Differences  between  flexible  and  rigid 

endoscopy in assessing the posterior glottic chink. J Voice. 2011;25(5):591-595. 

131.   Ng  ML,  Bailey  RL.  Acoustic  changes  related  to  laryngeal  examination  with  a  rigid 

telescope. Folia Phoniatr Logop. 2006;58(5):353-362. 

132.   Kobler  JB,  Zeitels  SM,  Hillman  RE,  Kuo  J.  Assessment  of  vocal  function  using 
simultaneous  aerodynamic  and  calibrated  videostroboscopic  measures.  Ann  Otol  Rhinol 
Laryngol. 1998;107(6):477-485. 

133.   Mehta DD, Deliyski DD, Zeitels SM, Zañartu M, Hillman RE. Integration of transnasal 
fiberoptic high-speed videoendoscopy with time-synchronized recordings of vocal function. 
ePhonoscope. 2015:105-114. 

134.   Zañartu M, Mehta DD, Ho JC, Wodicka GR, Hillman RE. Observation and analysis of in 
vivo  vocal  fold  tissue  instabilities  produced  by  nonlinear  source-filter  coupling:  a  case 
study. J Acoust Soc Am. 2011;129(1):326-339. 

135.   Rosen  CA,  Murry  T.  Diagnostic  laryngeal  endoscopy.  Otolaryngol  Clin  North  Am. 

2000;33(4):751-757. 

136.   Gray SD, Smith ME, Schneider H. Voice disorders in children. Pediatr Clin North Am. 

1996;43(6):1357-1384. 

137.   Chait  DH,  Lotz  WK.  Successful  pediatric  examinations  using  nasoendoscopy. 

Laryngoscope. 1991;101(9):1016-1018. 

138.   Clark BS, Gao WZ, Bertelsen C, et al. Flexible versus rigid laryngoscopy: A randomized 

crossover study comparing patient experience. Laryngoscope. 2020. 

139.   Rothenberg M. Source-tract acoustic interaction and voice quality. In: Transcripts of the
12th Symposium Care of Professional Voice, Part I. New York, NY: The Voice Foundation;
1983:25-31.

140.   Ben-David BM, Icht M. Voice Changes in Real Speaking Situations during a Day, with and 
Without  Vocal  Loading:  Assessing  Call  Center  Operators.  J  Voice.  2016;30(2):247e1-
247e11. doi:10.1016/j.jvoice.2015.04.002 

141.   Laukkanen AM, Ilomäki I, Leppänen K, Vilkman E. Acoustic Measures and Self-reports of
Vocal Fatigue by Female Teachers. J Voice. 2008;22(3):283-289.
doi:10.1016/j.jvoice.2006.10.001

142.   Laukkanen AM, Kankare E. Vocal loading-related changes in male teachers’ voices
investigated before and after a working day. Folia Phoniatr Logop. 2006;58(4):229-239.
doi:10.1159/000093180

143.   Laukkanen A-M, Jarvinen K, Artkoski M, et al. Changes in Voice and Subjective Sensations
during a 45-min Vocal Loading Test in Female Subjects with Vocal Training. Folia
Phoniatr Logop. 2004.

144.   Jonsdottir V, Laukkanen A-M, Siiki I. Changes in Teachers’ Speech during a Working Day
with and without Electric Sound Amplification. Folia Phoniatr Logop. 2003;601:282-287.
doi:10.1159/000066149

145.   Lehto L, Laaksonen L, Vilkman E, Alku P. Occupational voice complaints and objective 
acoustic measurements - Do they correlate? Logop Phoniatr Vocology. 2006;31(4):147-
152. doi:10.1080/14015430600654654 

146.   Wolfe VI, Long J, Youngblood HC, Williford H, Olson MS. Vocal parameters of aerobic
instructors with and without voice problems. J Voice. 2002;16(1):52-60.
doi:10.1016/S0892-1997(02)00072-3

147.   Yang  A,  Stingl  M,  Berry  DA,  et  al.  Computation  of  physiological  human  vocal  fold 
parameters  by  mathematical  optimization  of  a  biomechanical  model.  J  Acoust  Soc  Am. 
2011;130(2):948-964. 

148.   Yang A, Lohscheller J, Berry DA, et al. Biomechanical modeling of the three-dimensional 

aspects of human vocal fold dynamics. J Acoust Soc Am. 2010;127(2):1014-1031. 

149.   Šidlof P, Švec JG, Horáček J, Veselý J, Klepáček I, Havlík R. Geometry of human vocal
folds  and  glottal  channel  for  mathematical  and  biomechanical  modeling  of  voice 
production. J Biomech. 2008;41(5):985-995. 

150.   Titze IR, Alipour F. The Myoelastic Aerodynamic Theory of Phonation. National Center for 

Voice and Speech; 2006. 

151.   Thomson SL, Mongeau L, Frankel SH. Aerodynamic transfer of energy to the vocal folds. 

J Acoust Soc Am. 2005;118(3):1689-1700. 

152.   Fulcher  LP,  Scherer  RC.  Phonation  threshold  pressure:  Comparison  of  calculations  and 
measurements  taken  with  physical  models  of  the  vocal  fold  mucosa.  J  Acoust  Soc  Am. 
2011;130(3):1597-1605. 

153.   Patel RR, Donohue KD, Lau D, Unnikrishnan H. In vivo measurement of pediatric vocal 

fold motion using structured light laser projection. J Voice. 2013;27(4):463-472. 

154.   Verdolini Abbott K, Hersan R, Hammer D, Potter Reed J. Adventures in Voice: A
whole new way of doing things for kids. 2015.

155.   Selby JC, Gilbert HR, Lerman JW. Perceptual and acoustic evaluation of individuals with
laryngopharyngeal reflux pre- and post-treatment. J Voice. 2003;17(4):557-570.

 


156.   Schindler  A,  Mozzanica  F,  Maruzzi  P,  Atac  M,  De  Cristofaro  V,  Ottaviani  F. 
Multidimensional  assessment  of  vocal  changes  in  benign  vocal  fold  lesions  after  voice 
therapy. Auris Nasus Larynx. 2013;40(3):291-297. 

157.   Rydell R, Schalén L, Fex S, Elner Å. Voice evaluation before and after laser excision vs. 

radiotherapy of T1A glottic carcinoma. Acta Otolaryngol. 1995;115(4):560-565. 

158.   Chen SH, Hsiao T-Y, Hsiao L-C, Chung Y-M, Chiang S-C. Outcome of resonant voice 
therapy  for  female  teachers  with  voice  disorders:  Perceptual,  physiological,  acoustic, 
aerodynamic, and functional measurements. J Voice. 2007;21(4):415-425. 

159.   Fex B, Fex S, Shiromoto O, Hirano M. Acoustic analysis of functional dysphonia: Before 
and after voice therapy (accent method). J Voice. 1994;8(2):163-167. doi:10.1016/S0892-
1997(05)80308-X 

160.   Tezcaner CZ, Ozgursoy SK, Sati I, Dursun G. Changes after voice therapy in objective and 
subjective  voice  measurements  of  pediatric  patients  with  vocal  nodules.  Eur  Arch  Oto-
Rhino-Laryngology. 2009;266(12):1923-1927. 

161.   Roy N, Bless DM, Heisey D, Ford CN. Manual circumlaryngeal therapy for
functional dysphonia: An evaluation of short- and long-term treatment outcomes. J Voice.
1997;11(3):321-331.

162.   Gillespie AI, Dastolfo C, Magid N, Gartner-Schmidt J. Acoustic analysis of four common 
voice  diagnoses:  moving  toward  disorder-specific  assessment.  J  Voice.  2014;28(5):582-
588. 

163.   Holmberg  EB,  Hillman  RE,  Perkell  JS,  Guiod  PC,  Goldman  SL.  Comparisons  Among 
Aerodynamic, Electroglottographic, and Acoustic Spectral Measures of Female Voice. J 
Speech, Lang Hear Res. 1995;38(6):1212-1223. doi:10.1044/jshr.3806.1212 

164.   Döllinger M, Gómez P, Patel RR, Alexiou C, Bohr C, Schützenberger A. Biomechanical 
simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. 
PLoS One. 2017;12(11):e0187486. 

165.   Bohr C, Kraeck A, Eysholdt U, Ziethe A, Döllinger M. Quantitative analysis of organic
vocal fold pathologies in females by high-speed endoscopy. Laryngoscope.
2013;123(7):1686-1693.

166.   Stevens KN. Acoustic Phonetics. Vol 30. MIT Press; 2000.

167.   Dejonckere PH, Bradley P, Clemente P, et al. A basic protocol for functional assessment of 
voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and 
evaluating new assessment techniques. Eur Arch Oto-rhino-laryngology. 2001;258(2):77-
82. 

168.   Rosen CA, Gartner-Schmidt J, Hathaway B, et al. A nomenclature paradigm for benign
midmembranous vocal fold lesions. Laryngoscope. 2012;122(6):1335-1341.
doi:10.1002/lary.22421

169.   Naunheim  MR,  Carroll  TL.  Benign  vocal  fold  lesions:  Update  on  nomenclature,  cause, 
diagnosis, and treatment. Curr Opin Otolaryngol Head Neck Surg. 2017;25(6):453-458. 
doi:10.1097/MOO.0000000000000408 

170.   Spiegel  JR,  Sataloff  RT,  Hawkshaw  MJ.  Strobovideolaryngoscopy:  results  and  clinical 

value. Ann Otol Rhinol Laryngol. 1991;100(9):725-727. 

171.   Woo P, Colton R, Casper J, Brewer D. Diagnostic value of stroboscopic examination in
hoarse patients. J Voice. 1991;5(3):231-238.

172.   Titze IR. The physics of small-amplitude oscillation of the vocal folds. J Acoust Soc Am. 

1988;83(4):1536-1552. 

173.   Titze IR, Talkin DT. A theoretical study of the effects of various laryngeal configurations 

on the acoustics of phonation. J Acoust Soc Am. 1979;66(1):60-74. 

174.   Titze  IR,  Jiang  JJ,  Hsiao  T-Y.  Measurement  of  mucosal  wave  propagation  and  vertical 

phase difference in vocal fold vibration. Ann Otol Rhinol Laryngol. 1993;102(1):58-63. 

175.   Boutin H, Smith J, Wolfe J. Laryngeal flow due to longitudinal sweeping motion of the 
vocal folds and its contribution to auto-oscillation. J Acoust Soc Am. 2015;138(1):146-149. 

176.   Hirano M. Phonosurgery: basic and clinical investigations. Otol. 1975;21:239-242. 

177.   Krausert CR, Olszewski AE, Taylor LN, McMurray JS, Dailey SH, Jiang JJ. Mucosal wave 

measurement and visualization techniques. J Voice. 2011;25(4):395-405. 

178.   Titze IR. Phonation threshold pressure: A missing link in glottal aerodynamics. J Acoust 

Soc Am. 1992;91(5):2926-2935. 

179.   Verdolini-Marston K, Titze IR, Druker DG. Changes in phonation threshold pressure with
induced conditions of hydration. J Voice. 1990;4(2):142-151.

180.   Chan RW, Titze IR. Dependence of phonation threshold pressure on vocal tract acoustics 

and vocal fold tissue mechanics. J Acoust Soc Am. 2006;119(4):2351-2362. 

181.   Imaging  H.  28  Laryngeal  High-Speed  Videoendoscopy.  Laryngeal  Eval.  2014. 

doi:10.1055/b-0034-81468 

182.   Eller  R,  Ginsburg  M,  Lurie  D,  Heman-Ackah  Y,  Lyons  K,  Sataloff  R.  Flexible 
laryngoscopy: a comparison of fiber optic and distal chip technologies. Part 1: vocal fold 
masses. J Voice. 2008;22(6):746-750. 

183.   Yamauchi A, Yokonishi H, Imagawa H, et al. Quantification of vocal fold vibration in 
various laryngeal disorders using high-speed digital imaging. J Voice. 2016;30(2):205-214. 

184.   Powell ME, Deliyski DD, Hillman RE, Zeitels SM, Burns JA, Mehta DD. Comparison of
Videostroboscopy to Stroboscopy Derived From High-Speed Videoendoscopy for
Evaluating Patients With Vocal Fold Mass Lesions. 2016;25(Andrade 2009):2011-2013.
doi:10.1044/2016

185.   Gardner  GM,  Parnes  SM.  Status  of  the  mucosal  wave  post  vocal  cord  injection  versus 

thyroplasty. J Voice. 1991;5(1):64-73. 

186.   Rihkanen  H,  Reijonen  P,  Lehikoinen-Söderlund  S,  Lauri  E-R.  Videostroboscopic 
assessment of unilateral vocal fold paralysis after augmentation with autologous fascia. Eur 
Arch Oto-Rhino-Laryngology Head Neck. 2004;261(4):177-183. 

187.   Hsiung M-W, Kang B-H, Su W-F, Pai LU, Lin Y-H. Combination of fascia transplantation 
and fat injection into the vocal fold for sulcus vocalis: long-term results. Ann Otol Rhinol 
Laryngol. 2004;113(5):359-366. 

188.   González-Herranz R, García EH, Granda-Rosales M, Eisenberg-Plaza G, Woodeson JM,
Plaza G. Improved mucosal wave in unilateral autologous temporal fascia graft in sulcus 
vocalis type 2 and vocal scars. J Voice. 2019;33(6):915-922. 

189.   Tsuji DH, de Almeida ER, Sennes LU, Butugan O, Pinho SMR. Comparison between
thyroplasty type I and arytenoid rotation: a study of vocal fold vibration using excised
human larynges. J Voice. 2003;17(4):596-604.

190.   Schade G, Leuwer R, Kraas M, Rassow B, Hess MM. Laryngeal morphometry with a new 
laser “clip on” device. Lasers Surg Med Off J Am Soc Laser Med Surg. 2004;34(5):363-
367. 

191.   Kobler JB, Rosen DI, Burns JA, et al. Comparison of a flexible laryngoscope with calibrated
sizing function to intraoperative measurements. Ann Otol Rhinol Laryngol.
2006;115(10):733-740.

192.   Herzon GD, Zealear DL. New laser ruler instrument for making measurements through an 

endoscope. Otolaryngol Neck Surg. 1997;116(6):689-692. 

193.   Hertegård S. Measurement of human vocal fold vibrations with laser triangulation. Opt

Eng. 2002;40(9):2041. doi:10.1117/1.1396324 

194.   Luegmair G, Kniesburges S, Zimmermann M, Sutor A, Eysholdt U, Döllinger M. Optical
reconstruction  of  high-speed  surface  dynamics  in  an  uncontrollable  environment.  IEEE 
Trans Med Imaging. 2010;29(12):1979-1991. 

195.   Deliyski DD, Shishkov M, Mehta DD, Ghasemzadeh H, Bouma B, Zañartu M, de Alarcon 
A, Hillman RE. Laser-Calibrated System for Transnasal Fiberoptic Laryngeal High-Speed 
Videoendoscopy. J Voice. 2019 Aug 2:S0892-1997(19)30278-4. Epub ahead of print. doi: 
10.1016/j.jvoice.2019.07.013. PMID: 31383516; PMCID: PMC6995434. 

196.   Awan SN, Roy N, Jetté ME, Meltzner GS, Hillman RE. Quantifying dysphonia severity
using a spectral/cepstral-based acoustic index: comparisons with auditory-perceptual
judgements from the CAPE-V. Clin Linguist Phon. 2010;24(9):742-758.

197.   Hillenbrand J, Houde RA. Acoustic correlates of breathy vocal quality: Dysphonic voices 

and continuous speech. J Speech, Lang Hear Res. 1996;39(2):311-321. 

198.   Eadie TL, Doyle PC. Classification of dysphonic voice: acoustic and auditory-perceptual 

measures. J Voice. 2005;19(1):1-14. 

199.   Peterson EA, Roy N, Awan SN, Merrill RM, Banks R, Tanner K. Toward validation of the 
cepstral spectral index of dysphonia (CSID) as an objective treatment outcomes measure. J 
Voice. 2013;27(4):401-410. 

200.   Maryn Y, Corthals P, Van Cauwenberge P, Roy N, De Bodt M. Toward improved
ecological validity in the acoustic measurement of overall voice quality: combining
continuous speech and sustained vowels. J Voice. 2010;24(5):540-555.

201.   Godino-Llorente JI, Gomez-Vilda P, Blanco-Velasco M. Dimensionality reduction of a
pathological voice quality assessment system based on Gaussian mixture models and
short-term cepstral parameters. IEEE Trans Biomed Eng. 2006;53(10):1943-1953.
doi:10.1109/TBME.2006.871883

202.   Noordzij JP, Woo P. Glottal area waveform analysis of benign vocal fold lesions before
and after surgery. Ann Otol Rhinol Laryngol. 2000;109(5 I):441-446.
doi:10.1177/000348940010900501

203.   Patel RR, Unnikrishnan H, Donohue KD. Effects of vocal fold nodules on glottal cycle 
measurements  derived  from  high-speed  videoendoscopy  in  children.  PLoS  One. 
2016;11(4):e0154586. 

204.   Hibi  SR,  Bless  DM,  Hirano  M,  Yoshida  T.  Distortions  of  videofiberoscopy  imaging: 

reconsideration and correction. J Voice. 1988;2(2):168-175. 

205.   Deng JJ, Hadwin PJ, Peterson SD. The effect of high-speed videoendoscopy configuration 
on  reduced-order  model  parameter  estimates  by  Bayesian  inference.  J  Acoust  Soc  Am. 
2019;146(2):1492-1502. 

206.   Speyer R, Wieneke GH, Kersing W, Dejonckere PH. Accuracy of measurements on digital 
videostroboscopic images of the vocal folds. Ann Otol Rhinol Laryngol. 2005;114(6):443-
450. 

207.   Alzamendi  GA,  Manriquez  R,  Hadwin  PJ,  et  al.  Bayesian  estimation  of  vocal  function 
measures using laryngeal high-speed videoendoscopy and glottal airflow estimates: An in 
vivo case study. J Acoust Soc Am. 2020;147(5):EL434-EL439.

208.   Ghasemzadeh  H,  Deliyski  DD.  Non-Linear  Image  Distortions  in  Flexible  Fiberoptic 
Endoscopes and their Effects on Calibrated Horizontal Measurements Using High-Speed 
Videoendoscopy. J Voice. 2020 Sep 18:S0892-1997(20)30331-3. Epub ahead of print. doi: 
10.1016/j.jvoice.2020.08.029. PMID: 32958427. 

 


209.   Ghasemzadeh H, Deliyski D, Hillman RE, Mehta DD. Method for Horizontal Calibration 

of Laser-Projection Transnasal Fiberoptic High-Speed Videoendoscopy. In Preparation. 

210.   Johns MM. Update on the etiology, diagnosis, and treatment of vocal fold nodules, polyps,
and cysts. Curr Opin Otolaryngol Head Neck Surg. 2003;11(6):456-461.
doi:10.1097/00020840-200312000-00009

211.   Oppenheim AV, Willsky AS, Nawab SH. Signals and Systems. Pearson Press USA; 1996.

212.   Hsiao  T-Y,  Wang  C-L,  Chen  C-N,  Hsieh  F-J,  Shau  Y-W.  Noninvasive  assessment  of 
laryngeal  phonation  function  using  color  Doppler  ultrasound  imaging.  Ultrasound  Med 
Biol. 2001;27(8):1035-1040. 

213.   DeJonckere PH, Lebacq J. Vocal Fold Collision Speed in vivo: The Effect of Loudness. J 

Voice. 2020. doi:10.1016/j.jvoice.2020.08.025 

214.   Subbotina MV. Evaluation the velocity of vocal fold movements in adults by duplex

Doppler scanning. Vestn Otorinolaringol. 2019;84(5):38-43. 

215.   Colton RH, Woo P, Brewer DW, Griffin B, Casper J. Stroboscopic signs associated with 
benign  lesions  of  the  vocal  folds.  J  Voice.  1995;9(3):312-325.  doi:10.1016/S0892-
1997(05)80240-1 

216.   Wallis  L,  Jackson-Menaldi  C,  Holland  W,  Giraldo  A.  Vocal  fold  nodule  vs.  vocal  fold 
polyp:  Answer  from  surgical  pathologist  and  voice  pathologist  point  of  view.  J  Voice. 
2004;18(1):125-129. doi:10.1016/j.jvoice.2003.07.003 

217.   Benninger MS. Microdissection or Microspot CO2 Laser for Limited Vocal Fold Benign
Lesions:  A  Prospective  Randomized  Trial.  Laryngoscope.  2000;110(S92):1-1. 
doi:10.1097/00005537-200002001-00001 

218.   Altman  KW.  Vocal  Fold  Masses.  Otolaryngol  Clin  North  Am.  2007;40(5):1091-1108. 

doi:10.1016/j.otc.2007.05.011 

219.   Dejonckere PH, Kob M. Pathogenesis of vocal fold nodules: New insights from a modelling 

approach. Folia Phoniatr Logop. 2009;61(3):171-179. doi:10.1159/000219952 

220.   De Vries MP, Schutte HK, Veldman AEP, Verkerke GJ. Glottal flow through a two-mass 
model: comparison of Navier-Stokes solutions with simplified models. J Acoust Soc Am.
2002;111(4):1847-1853. 

221.   Benninger  MS,  Alessi  D,  Archer  S,  et  al.  Vocal  fold  scarring:  current  concepts  and 

management. Otolaryngol - Head Neck Surg. 1996;115(5):474-482. 

222.   Rousseau B, Hirano S, Scheidt TD, et al. Characterization of vocal fold scarring in a canine 

model. Laryngoscope. 2003;113(4):620-627. 

 


223.   Cavallo SA, Baken RJ. Prephonatory laryngeal and chest wall dynamics. J Speech, Lang 

Hear Res. 1985;28(1):79-87. 

224.   Shiba TL, Chhetri DK. Dynamics of phonatory posturing at phonation onset. Laryngoscope. 

2016;126(8):1837-1843. 

225.   Chhetri DK, Neubauer J, Berry DA. Neuromuscular control of fundamental frequency and 

glottal posture at phonation onset. J Acoust Soc Am. 2012;131(2):1401-1412. 

226.   Faaborg-Andersen K. Electromyography of laryngeal muscles in humans. Technics and

results. Aktuel Probl Phoniatr Logop. 1965;12:1. 

227.   Deliyski D, Petrushev P. Methods for objective assessment of high-speed videoendoscopy. 

Proc Adv Quant Laryngol. 2003:1-16. 

228.   Titze IR. Mechanical stress in phonation. J Voice. 1994;8(2):99-105. 

229.   Sapienza C, Ruddy BH. Voice Disorders. Plural Publishing; 2016. 

230.   Hunter EJ, Titze IR, Alipour F. A three-dimensional model of vocal fold
abduction/adduction. J Acoust Soc Am. 2004;115(4):1747-1759.

231.   Manneberg G, Hertegård S, Liljencrantz J. Measurement of human vocal fold vibrations with

laser triangulation. Opt Eng. 2001;40(9):2041-2045. 

232.   Larsson H, Hertegård S. Calibration of high-speed imaging by laser triangulation. Logop 

Phoniatr Vocology. 2004;29(4):154-161. 

233.   George  NA,  de  Mul  FFM,  Qiu  Q,  Rakhorst  G,  Schutte  HK.  New  laryngoscope  for 
quantitative  high-speed  imaging  of  human  vocal  folds  vibration  in  the  horizontal  and 
vertical direction. J Biomed Opt. 2008;13(6):64024. doi:10.1117/1.3041164 

234.   Wurzbacher T, Voigt I, Schwarz R, et al. Calibration of laryngeal endoscopic high-speed 
image sequences by an automated detection of parallel laser line projections. Med Image 
Anal. 2008;12(3):300-317. 

235.   Semmler M, Kniesburges S, Birk V, Ziethe A, Patel R, Döllinger M. 3D reconstruction of 
human laryngeal dynamics based on endoscopic high-speed recordings. IEEE Trans Med 
Imaging. 2016;35(7):1615-1624. 

236.   Luegmair  G,  Mehta  DD,  Kobler  JB,  Döllinger  M.  Three-Dimensional  Optical 
Reconstruction of Vocal Fold Kinematics Using High-Speed Video With a Laser Projection 
System. IEEE Trans Med Imaging. 2015;34(12):2572-2582. 

237.   Ji Z, Leu M-C. Design of optical triangulation devices. Opt Laser Technol. 1989;21(5):339-

341. 

238.   Smith WJ. Modern Optical Engineering. 3rd ed. McGraw-Hill, New York; 2000.

239.   Bayer BE. Color imaging array. 1976. 

240.   Atherton  TJ,  Kerbyson  DJ.  Size  invariant  circle  detection.  Image  Vis  Comput. 

1999;17(11):795-803. 

241.   Yuen  HK,  Princen  J,  Illingworth  J,  Kittler  J.  Comparative  study  of  Hough  transform 

methods for circle finding. Image Vis Comput. 1990;8(1):71-77. 

242.   Duda RO, Hart PE. Use of the Hough Transformation to Detect Lines and Curves in
Pictures. 1971.

243.   Ballard DH. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognit. 

1981;13(2):111-122. 

244.   Dougherty  ER,  Lotufo  RA.  Hands-on  Morphological  Image  Processing.  Vol  59.  SPIE 

Press; 2003.

245.   Hamburg  MA,  Collins  FS.  The  path  to  personalized  medicine.  N  Engl  J  Med. 

2010;363(4):301-304. 

246.   Neal ML, Kerckhoffs R. Current progress in patient-specific modeling. Brief Bioinform. 

2009;11(1):111-126. 

247.   Kendall KA, Leonard RJ, eds. Laryngeal Evaluation: Indirect Laryngoscopy to High-Speed 

Digital Imaging. Thieme; 2011. 

248.   Dailey SH, Kobler JB, Hillman RE, et al. Endoscopic measurement of vocal fold movement 

during adduction and abduction. Laryngoscope. 2005;115(1):178-183. 

249.   Bonilha HS, Deliyski DD, Gerlach TT. Phase asymmetries in normophonic speakers: visual 

judgments and objective findings. Am J Speech-Language Pathol. 2008;17(4):367-376. 

250.   Fannin TE, Grosvenor T. Clinical Optics. Butterworth-Heinemann; 2013. 

251.   Field A, Miles J, Field Z. Discovering Statistics Using R. Sage Publications; 2012.

252.   Wilcox RR. Introduction to Robust Estimation and Hypothesis Testing. Academic Press;
2011.

253.   Patel  RR,  Donohue  KD,  Johnson  WC,  Archer  SM.  Laser  projection  imaging  for 

measurement of pediatric voice. Laryngoscope. 2011;121(11):2411-2417. 

254.   Bonilha HS, Focht KL, Martin-Harris B. Rater methodology for stroboscopy: a systematic 

review. J Voice. 2015;29(1):101-108. 

255.   Carlson JN, Das S, la Torre F, Callaway CW, Phrampus PE, Hodgins J. Motion capture
measures variability in laryngoscopic movement during endotracheal intubation: a
preliminary report. Simul Healthc J Soc Simul Healthc. 2012;7(4):255.

256.   Stepp CE, Hillman RE, Heaton JT. A virtual trajectory model predicts differences in vocal
fold kinematics in individuals with vocal hyperfunction. J Acoust Soc Am.
2010;127(5):3166-3176.