LIBRARY
Michigan State University

This is to certify that the thesis entitled VISION-BASED TRACKING OF FIDUCIALS FOR AUGMENTED REALITY presented by PAUL W. MIDDLIN has been accepted towards fulfillment of the requirements for the M.S. degree in Computer Science.

Major professor
Date 12/13/02

VISION-BASED TRACKING OF FIDUCIALS FOR AUGMENTED REALITY

By

Paul W. Middlin

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Computer Science

2002

ABSTRACT

VISION-BASED TRACKING OF FIDUCIALS FOR AUGMENTED REALITY

By Paul W. Middlin

Visible fiducial images are a common method for supporting vision-based tracking in augmented reality systems. This thesis describes algorithmic improvements in fiducial-based tracking, including an improved fiducial design, better fiducial location, and improved pose computation. A set of criteria that are desirable in an optically tracked fiducial is presented, and a new fiducial image set is designed that meets these criteria. The images in this set utilize a square black-border pattern with a 15% border width and an interior image that supports orientation determination and unique identification. The interior image is constructed from orthogonal Discrete Cosine Transform basis images chosen to minimize the probability of misidentification and to be robust to noise and occlusion.
This image could be integrated into an Augmented Reality software system such as the well-known and widely used ARToolKit to improve accuracy in identification of fiducials. Fiducial tracking involves more than simply creating a good fiducial image. The tracking includes methods to accurately locate the fiducial in the image, then use this information to calculate the location and orientation of the fiducial in relation to the camera. This location and orientation is known as pose. The ability of this system to track and calculate the pose of fiducials has been evaluated and compared to the ARToolKit as well. The system has proved to be generally better than the ARToolKit in terms of locating, identifying, and calculating the pose of a fiducial.

Dedicated to Stuart Griffin, whose ambition inspires us all.

ACKNOWLEDGMENTS

I would like to thank Dr. Charles Owen for his extensive contributions to the implementation of this project, as well as for his advice and direction. I would also like to thank the students of CSE891, Augmented Reality, whose ideas and discussion led to the adoption of the fiducial criteria used, and the eventual DCT method itself. Thanks go to Tony Lambert for helping set up the testing environment, and for the use of his laptop and truck. Michael Malinak was helpful when doing the background research necessary for this thesis.

TABLE OF CONTENTS

Table of Figures ........ vii
Table of Tables ........ ix
1 Introduction ........ 1
1.1 Background ........ 1
1.2 Contributions ........ 4
1.3 Outline of Chapters ........ 4
2 Related Work ........ 6
2.1 ARToolKit ........ 6
2.2 CyberCode ........ 9
2.3 HOM System ........ 11
2.4 IGD System ........ 11
2.5 SCR System ........ 12
2.6 TRIP System ........ 13
2.7 Multi-resolution Colored Rings ........ 14
2.8 Other Systems ........ 15
2.9 Pose Calculation Methods ........ 15
3 Criteria for a Good Fiducial ........ 17
3.1 Fiducial Shape ........ 17
3.2 Fiducial Color ........ 20
3.3 Locating the Fiducial ........ 21
3.4 Fiducial Identification ........ 23
3.5 Fiducial Identification Range ........ 27
3.6 A Large Fiducial Identification Space ........ 28
3.7 Human Identification ........ 28
3.8 Summary of Desirable Characteristics ........ 29
4 A "Good" Fiducial Interior Image ........ 31
4.1 Deriving the Image ........ 31
4.2 Detection ........ 35
5 A Functioning System ........ 38
5.1 Finding the Border ........ 39
5.2 Tracing the Border ........ 40
5.3 Accounting for Camera Distortion ........ 42
5.4 Locating the Quadrilateral Corners ........ 43
5.5 Quadrilateral Test ........ 45
5.6 Line Fitting ........ 45
5.7 Warping ........ 46
5.8 Identifying with the DCT ........ 50
5.9 Calculating Pose ........ 50
6 Evaluation ........ 55
6.1 Distance and Rotation Test Setup ........ 55
6.2 Distance Results ........ 58
6.3 Rotation Results ........ 59
6.4 Identification Results ........ 65
6.5 Speed ........ 73
6.6 Discussion of Results ........ 74
7 Future Work ........ 76
7.1 Basis Set ........ 76
7.2 Pose Estimation ........ 77
7.3 Finding Potential Fiducials ........ 78
8 Conclusions ........ 79
9 Appendix A - Camera Frequency Response ........ 81
9.1 Background ........ 81
9.2 Test Setup ........ 82
9.3 Results ........ 84
9.4 Discussion ........ 92
10 Appendix B - Testing Data ........ 94
11 Appendix C - Camera Calibration ........ 96
References ........ 100

TABLE OF FIGURES

Figure 2-1 - Example ARToolKit Fiducial ........ 7
Figure 2-2 - CyberCode recognition steps ........ 10
Figure 2-3 - Example HOM Fiducials ........ 11
Figure 2-4 - Example IGD Fiducials ........ 12
Figure 2-5 - Example SCR Fiducials ........ 12
Figure 2-6 - TRIP Target representing 1160407 ........ 13
Figure 2-7 - Multi-size color fiducials ........ 14
Figure 3-1 - Equivalence of interior images for orientation determination ........ 23
Figure 3-2 - Example images for correlation tests ........ 26
Figure 4-1 - Example DCT fiducial images ........ 35
Figure 5-1 - Region with extraneous pixels ........ 42
Figure 5-2 - (a) Estimated first, actual third (b) Finding 2 and 4 ........ 44
Figure 5-3 - Special Case ........ 44
Figure 5-4 - Solution for special case ........ 44
Figure 5-5 - Line fitting and intersection ........ 46
Figure 5-6 - Pseudo-code for Finding Warped Image ........ 50
Figure 5-7 - Finding the X axis ........ 51
Figure 5-8 - Pose Finding Method Comparison ........ 54
Figure 6-1 - Test Setup ........ 56
Figure 6-2 - Finding the Center of Projection ........ 57
Figure 6-3 - DCT Test Fiducial ........ 58
Figure 6-4 - ARToolKit Test Fiducial ........ 58
Figure 6-5 - Distance Error Comparison ........ 59
Figure 6-6 - Angular Error, All Images ........ 61
Figure 6-7 - Angular Error, 0 Degrees ........ 61
Figure 6-8 - Angular Error, 15 Degrees ........ 62
Figure 6-9 - Angular Error, 30 Degrees ........ 62
Figure 6-10 - Angular Error, 45 Degrees ........ 63
Figure 6-11 - Angular Error, 60 Degrees ........ 63
Figure 6-12 - Angular Error, 75 Degrees ........ 64
Figure 6-13 - ARToolKit Misidentification, 3 foot, #1 ........ 68
Figure 6-14 - ARToolKit Misidentification, 3 foot, #2 ........ 69
Figure 6-15 - ARToolKit Misidentification, 6 foot ........ 70
Figure 6-16 - ARToolKit Correct Identification Using DCT, 3 feet ........ 71
Figure 6-17 - ARToolKit Correct Identification Using DCT, 6 feet ........ 72
Figure 6-18 - Example Test Image ........ 73
Figure 9-1 - Effect of point spread function ........ 82
Figure 9-2 - Testing pattern ........ 83
Figure 9-3 - Ideal image, bands 165 through 179 (close to half the Nyquist frequency) ........ 90
Figure 9-4 - Logitech Image ........ 91
Figure 11-1 - Jig used for calibration ........ 97
Figure 11-2 - Radial Distortion ........ 97
Figure 11-3 - Corner locations after calibration ........ 98

TABLE OF TABLES

Table 5-1 - Starting Directions for Tracing ........ 41
Table 6-1 - ARToolKit Fiducial Shape Associations ........ 65
Table 6-2 - Identification test results ........ 67
Table 6-3 - Speed Test Results ........ 74

1 Introduction

1.1 Background

Augmented reality (AR) is the blending of computer-generated virtual elements with reality [1]. A common example AR application is rendering computer graphics onto existing imagery such that the graphics appear to be seamless additions to or augmentations of the real image, registered in space and matching in scale. One of the most difficult challenges in this application is aligning the real and virtual worlds so as to achieve this seamless registration. The parameters of the rendering environment must exactly match those of the camera system that captured the image. Vision-based tracking uses images of the world to support this computation, either through tracking of natural image features [2, 3] or through the use of markers or fiducials placed in the scene.

This thesis proposes a set of criteria to use when designing a fiducial and a vision-based tracking system for the fiducial design, making arguments for a specific type of fiducial that was created with optimization of these criteria in mind. Further, a system has been designed that uses these fiducials for tracking, and has been optimized for performance in a way that is consistent with the fiducial criteria.

Existing fiducial tracking systems use ad hoc fiducial images based on either comparison to a library of template images or simple bar-code-based mechanisms. The designs are typically based on human, not machine, identification and on ease of identification at high resolutions. The images tend to be highly correlated and often are misidentified.
This thesis recognizes the need for a set of fiducial images that can be systematically produced, has a small chance of being misidentified, and can be easily and accurately tracked. These images are two-dimensional forms of the Discrete Cosine Transform (DCT) basis set. The shape, border width, color, and method of locating the fiducial have been chosen after analysis of the criteria set forth in this thesis. The choices were made in an attempt to satisfy the general majority of fiducial tracking needs, though they will not be ideal for all situations. The fiducials utilize a square shape with a black border that is 15% of the width of the fiducial. The interior images are monochrome and are based on the DCT basis set, with an orientation component built in.

The design choices made for the tracking system are shown to be theoretically superior. However, theoretical superiority is not a guarantee of performance in a real implementation. To verify the performance of this system, it has been compared to the ARToolKit [4] with tests in pose calculation accuracy and fiducial identification. The ARToolKit was used as a benchmark since it is one of the most popular and widely used fiducial tracking systems for Augmented Reality; it will in fact be mentioned numerous times throughout this thesis as a basis for performance comparison. The testing has shown that the system created for this thesis was more capable both in terms of fiducial identification and pose estimation. Additionally, this system executes much more quickly than the ARToolKit, allowing more time to do the three-dimensional rendering required in most AR applications.

The need for such an improved system stems from the wide variety of AR applications that use this technology. For instance, Fjeld and Voegtli [5] have created a system that uses fiducials to allow a user to view chemical models in a more interactive way.
A series of fiducials are used to identify different chemical compounds, and a graphical overlay of a model for these compounds is placed over the fiducial. This addition of graphics is done on a viewscreen. The user can interact with the models by moving the fiducials, or by using another cube that has fiducials on it. This cube can be rotated with a person's hand, and will cause the chemical model to rotate synchronously.

Using a viewscreen is not the only option for displaying Augmented Reality. Some systems use a Head-Mounted Display (HMD), which is like having two small monitors in front of the user (one for each eye). HMDs come in two major forms: video see-through and optical see-through. A video see-through HMD uses a camera to record video, then passes this video to the eye with small LCD displays. In this case, the user is seeing the video as the camera(s) see it. For fiducial tracking, this means that the fiducials could be replaced with a virtual element before the video is seen by the user, thereby augmenting the reality that the user sees. An optical see-through display is similar, but the user can see the world directly through the HMD. Here, graphics are overlaid using a half-silvered mirror and LCDs to combine the computer display's light with light coming from the actual objects. Again, fiducials could be used to calculate where the user's head is in relation to the objects he or she is viewing so that the virtual elements can be registered with the real objects in the user's line of sight. This is also an example of a situation in which the fiducials being tracked do not need to be in the same space as the virtual elements. That is, a separate camera can be used strictly to track the user's HMD, so the fiducial on the HMD would never be seen by the user; the fiducial is never shown on the HMD display.
Fiducial tracking can be extended to many other media, such as video monitors, handheld devices, or systems that do not use visual representations at all. The purpose behind using the fiducials is tracking, which implies that any application in which the location and orientation of an object needs to be known can benefit from vision-based tracking.

1.2 Contributions

Contributions of this thesis are as follows:

- A set of criteria that define the qualities of a good fiducial tracking system
- A set of fiducial design choices that optimize those criteria
- A specific set of fiducial images based on the DCT that perform well with respect to the given criteria
- A system implemented using these fiducials, optimized for performance
- Testing of the system and comparison to a well-established fiducial tracking system (ARToolKit)
- Evaluation results that demonstrate improved accuracy, stability, and reliability

1.3 Outline of Chapters

Chapter 2 outlines a representative set of existing vision-based fiducial tracking systems. Chapter 3 describes a set of criteria created based on the needs of such systems as those described in Chapter 2. Chapter 4 utilizes these criteria to derive a new type of fiducial that performs well relative to those criteria. Chapter 5 describes this system in detail and Chapter 6 presents an evaluation of the system performance. The fiducial system created is still not necessarily ideal, and ideas for the improvement of this system are described in Chapter 7.

2 Related Work

Fiducial-based tracking is a key enabling technology for a wide variety of applications. The motion-picture industry uses fiducials to track camera movement in support of augmented imagery. Manufacturing applications track fiducial images on circuit boards and other components so as to support accurate assembly alignment. Because of this general utility, there are many fiducial systems that have been proposed in both commercial and research areas.
Some systems exist only to support location of a single point in an image. Others support only two axes of alignment for parts placement. Only a limited number of systems support full pose computation as described in this thesis. This chapter describes a set of the major systems described in the literature.

2.1 ARToolKit

One of the most well-known and widely used fiducial tracking systems is the ARToolKit. It was created by H. Kato and M. Billinghurst at the University of Washington in the Human Interface Technology Lab [4, 6] and supports full pose calculation in addition to identification of a set of fiducials. The ARToolKit is widely distributed as open source for a variety of target platforms. Between its free distribution, documentation, and ease of use, it has become the center of a wide variety of AR applications that depend on vision-based tracking. It is used both for research and commercial purposes, and for development of other systems in the form of its compiled tracking libraries.

ARToolKit markers are square fiducial images with a fixed, black band exterior surrounding a unique image interior. Figure 2-1 is an example ARToolKit fiducial. The outer black band contrasts against a light background and is used to locate a candidate fiducial in a captured image. The interior image enables the identification of the candidate from a set of expected images and determination of the four possible orientations. The four corners of the located fiducial are used to unambiguously determine the position and orientation of the fiducial relative to a calibrated camera.

Figure 2-1 - Example ARToolKit Fiducial

Design of the distinguishing interior image is completely up to the user. This content is ad hoc, in that there is no systematic process to generate it or to choose good alternatives. Frequently, single letters or numbers are used. The ARToolKit requires several steps to find and match a fiducial image.
The image is thresholded against a constant value and all connected components are labeled. The edges of the connected regions are located using contour following. These contours are then fitted to lines to form a quadrilateral. If a quadrilateral is found, the pixels in this quadrilateral are resampled into a 16x16 upright square image that is compared with the fiducial patterns registered with the system. The comparison is done by calculating the correlation coefficient between the captured candidate image and a stored template pattern.

In the following equations, $I(x,y)$ is the candidate image and $P(x,y)$ is the pattern, each of size $X \times Y$. First, the means and standard deviations for the image and pattern are computed (clearly the pattern data can be pre-computed). $\mu_I$ and $\mu_P$ from equations (2-1) and (2-2) are substituted into the standard deviation equations in (2-3):

$$\mu_I = \frac{1}{XY}\sum_x\sum_y I(x,y) \qquad (2\text{-}1)$$

$$\mu_P = \frac{1}{XY}\sum_x\sum_y P(x,y) \qquad (2\text{-}2)$$

$$\sigma_I = \sqrt{\sum_x\sum_y \left(I(x,y)-\mu_I\right)^2} \qquad \sigma_P = \sqrt{\sum_x\sum_y \left(P(x,y)-\mu_P\right)^2} \qquad (2\text{-}3)$$

Then, the correlation coefficient $\rho$ is computed as:

$$\rho = \frac{\sum_x\sum_y \left(I(x,y)-\mu_I\right)\left(P(x,y)-\mu_P\right)}{\sigma_I \, \sigma_P} \qquad (2\text{-}4)$$

The correlation coefficient ranges from -1 to 1; larger values indicate similarity of the image based on an L2 norm. If the coefficient for one image is maximal for the image set and exceeds a fixed threshold (0.5), then the image is accepted.

Obviously, this process is a complex calculation. More importantly, using this process means that to find a best match the system must calculate a coefficient between the candidate image and each of the expected patterns, an O(N) operation. The more patterns in the system, the longer it will take to perform this calculation. The ARToolKit actually has a hard limit on the number of fiducials that can be registered with the system.
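The matching test in equations (2-1) through (2-4) can be sketched in a few lines of code. This is an illustrative reimplementation derived from the equations above, not the ARToolKit source; it assumes the candidate has already been resampled to the same size as each stored pattern, and represents images as flat lists of pixel values:

```python
def correlate(candidate, pattern):
    """Correlation coefficient between a resampled candidate image and a
    stored template pattern, following equations (2-1) through (2-4)."""
    n = len(candidate)  # both images flattened to the same number of pixels
    mu_i = sum(candidate) / n                      # (2-1)
    mu_p = sum(pattern) / n                        # (2-2)
    dev_i = [v - mu_i for v in candidate]
    dev_p = [v - mu_p for v in pattern]
    sigma_i = sum(d * d for d in dev_i) ** 0.5     # (2-3)
    sigma_p = sum(d * d for d in dev_p) ** 0.5
    # (2-4): normalized cross-correlation
    return sum(a * b for a, b in zip(dev_i, dev_p)) / (sigma_i * sigma_p)

def best_match(candidate, patterns, threshold=0.5):
    """O(N) scan over all registered patterns; accept the maximal
    coefficient only if it exceeds the fixed threshold."""
    scores = [(correlate(candidate, p), idx) for idx, p in enumerate(patterns)]
    best_score, best_idx = max(scores)
    return best_idx if best_score > threshold else None
```

The `best_match` loop makes the O(N) cost discussed above explicit: every registered pattern is correlated against every candidate region in every frame.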
This limit helps to prevent matching from taking too long, but limits the flexibility of the system because of the small number of fiducials that can be used.

2.2 CyberCode

The CyberCode system was created at Sony Computer Science Laboratories [7]. CyberCode is based on a two-dimensional bar code fiducial. Here, the interest was more in producing a large number of unique fiducials. A CyberCode fiducial consists of a square area for the patterned code, with a black bar alongside the square region to help determine orientation. There is no surrounding border as with the ARToolKit fiducials. Figure 2-2(a) shows an example of a CyberCode fiducial. The guide bar is pointed out in (b). The four corners of the square area are always black (c), so the code pattern is the cross-shaped area inside of this (d).

Figure 2-2 - CyberCode recognition steps

The tags are found by adaptively thresholding the image, then applying a connected-components algorithm. The connected regions are then searched for a specific second-order moment, indicating the guide bar. From there, the algorithm locates the four corners, and uses these locations to account for distortion from tilt and viewing angle. The last step is, of course, to decode the bitmap inside the four corners. Sony claims to be able to use 24 bits to encode the identification, meaning that there are over 16 million possible CyberCode markers. This is a very wide space. Sony has published little about the performance of this system in terms of adaptability to different lighting conditions, low-resolution images, or 3D location accuracy.

2.3 HOM System

Similar to CyberCode, the HOM system created by Siemens uses a 2D code with a side bar [8]. In this case, however, the sidebar also contains 6 bits of additional coding information and the square part of the fiducial has a solid border. See Figure 2-3 for an example.
Figure 2-3 - Example HOM Fiducials

2.4 IGD System

The IGD system is another coded fiducial system using a black border and a bitmap in the middle [9]. The IGD marker system was implemented at the Institute for Computer Graphics (Institut Graphische Datenverarbeitung) in Darmstadt, which is an ARVIKA partner. ARVIKA is a German government-supported research project to develop AR-related applications in industry. Many ARVIKA-related applications are developed using the IGD marker system. An IGD marker is a square divided into 6x6 square tiles of equal size. The inner 4x4 tiles are used to determine the orientation and the code of the marker. Figure 2-4 shows an example of this fiducial. The precompiled libraries of the IGD marker system are available to ARVIKA participants [10].

Figure 2-4 - Example IGD Fiducials

2.5 SCR System

The SCR marker system was developed by Siemens Corporate Research for AR applications [11]. It also uses a coded matrix to identify the fiducial, as seen in Figure 2-5. Additionally, it locates 8 feature points instead of the usual 4 found in most square fiducial systems. The additional points might help to increase the accuracy of the location of the fiducial, which in turn can help make 3D translations more accurate.

Figure 2-5 - Example SCR Fiducials

2.6 TRIP System

The TRIP (Target Recognition using Image Processing) system is a circle-based system. It was developed at Cambridge University in the Laboratory for Communications Engineering [12]. It uses a sector-based circular system of bar coding. The innermost part of the target is a "bull's-eye", which is used to locate the fiducial. The TRIP algorithm thresholds the image, does edge detection, and then edge following. The connected edges are examined and only those that are circular (or ovular) are kept. Finally, the bull's-eye is identified when two concentric circles are found. After finding the fiducial, the two concentric rings around the bull's-eye are examined.
They are broken into 16 sectors, as shown in Figure 2-6. One of these is used as a synchronization sector; two others are used for even parity. The remaining 13 sectors are used as a ternary code. There are therefore 3^13 = 1,594,323 ≈ 2^20 possible codes.

Figure 2-6 - TRIP Target representing 1160407 (the original figure labels ring codes 1 and 2, the even-parity sectors, and the synchronization sector)

Despite providing only one real location point, the TRIP system does indeed calculate the 3D position of the target in relation to the camera. It does this using the POSE_FROM_CIRCLE algorithm described by Forsyth et al [13]. The synchronization sector is used to find the orientation of the circle.

2.7 Multi-resolution Colored Rings

Cho, Lee, and Neumann at the University of Southern California have created a system that uses nested colored rings [14]. The purpose is to make fiducials that can be found over a wide viewing range. Each fiducial consists of a center circle, with three rings of increasing width surrounding the center (Figure 2-7).

Figure 2-7 - Multi-size color fiducials (first, second, and third levels)

Their algorithm searches for the smaller rings first. If the center circle with a single ring is found (first level), then there is no need to look for the surrounding rings. If the center cannot be found, then the fiducial must be too far away to distinguish such a small feature, so the algorithm locates the second level instead. Likewise, the third level will be found for a smaller fiducial. The effective range for each level overlaps, but the smaller should be found first in the case of an overlap because this requires less processing time. The range of sizes for identifiable fiducials is about 24 to 56 pixels in diameter.

It should also be noted that each fiducial returns only a single point, so any calculation of 3D location would require 3 or more fiducials in the scene.
In fact, using strictly the three points, there are often up to four solutions [15]. This implies that such a system must employ extra processing between frames to rule out the other solutions, or that it is sometimes inaccurate because it lacks a fourth point for correspondence.

2.8 Other Systems

Many simple approaches using fixed color squares, circles, or cross patterns have been demonstrated. Most projects approach the problem either from the standpoint of selecting a set of images (as in ARToolKit) or choosing a way to encode data into images (as in CyberCode). There are many other systems that do fiducial-based tracking; see the following references: [14, 16-19].

2.9 Pose Calculation Methods

Pose is the location and orientation of an object. The location implies three degrees of freedom: positions along the X, Y, and Z axes. This alone does not reveal the way the object is situated at that location, so the orientation component is needed as well. Orientation contributes three degrees of freedom as well: rotations about the X, Y, and Z axes. Therefore, pose involves six degrees of freedom.

There are many methods for calculating the 3D location of points in relation to a calibrated camera given the screen coordinates of these points and a model of the object. It is assumed that a single fiducial (or a set of fiducials in some systems) represents a coordinate system. It is typical that a single corner of a fiducial image is declared to be the origin of the system and all points are considered to lie in the (x, y) plane. Any fiducial tracking system used for AR must use some form of pose calculation to estimate the 3D location of the fiducial. Three particular methods are examined in this thesis (see Section 5.9), but there are many methods in existence. The methods described here relate mainly to those presented by Shapiro and Stockman [20] and that used by the ARToolKit [4].
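To make the six degrees of freedom concrete, a pose can be packed into a single 4x4 homogeneous transform built from three translations and three rotation angles. This is an illustrative sketch only; the Z·Y·X rotation order is one common convention chosen here as an assumption, not a convention stated in this thesis:

```python
import numpy as np

def pose_matrix(tx, ty, tz, rx, ry, rz):
    """Build a 4x4 rigid transform from the six pose parameters:
    a translation (tx, ty, tz) and rotations (in radians) about the
    X, Y, and Z axes, applied in X-then-Y-then-Z order."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # composed rotation (3 DOF)
    T[:3, 3] = [tx, ty, tz]    # translation (3 DOF)
    return T
```

The upper-left 3x3 block is always orthonormal with determinant 1, which is what distinguishes a rigid pose from a general projective transform.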
It seems valid to mention the work in this area by Ji et al. [21], which describes methods for doing pose calculation from a variety of geometric shapes. Also important is the work of Quan and Lan [22], who have developed a linear method for pose calculation (instead of an iterative approach, as described in Section 5.9). This method solves the systems of equations using the classical Sylvester resultant [23] and quaternions. This solution is not an exact least-squares solution; it is an estimate. See also [24-29] for examples of other methods and applications in the subject of pose calculation.

3 Criteria for a Good Fiducial

Clearly there are tradeoffs among the criteria for a good fiducial image. Existing designs for AR fiducials have been ad hoc and have not started with specific design criteria other than support for some level of tracking (planar, pose, etc.). This thesis approaches the problem by asking questions and proposing answers consistent with many applications in augmented reality and commonly available hardware. The questions addressed in this section are:

- What is a good fiducial shape?
- What colors should be utilized in a fiducial image?
- How should a specific fiducial be located in an image?
- How should a specific fiducial be identified?
- Over what range of sizes should the fiducial be identified?
- Should a human be able to decode/identify a fiducial?

The answers to these questions can vary depending on the application or domain in which the fiducials will be used. Some applications may require fiducials with anthropomorphic characteristics; others may be optimized for computer tracking only. Care will be taken, however, to try to make the answers to these questions as generally applicable as possible. Additionally, points that may influence one's decision on the best choice to meet a given criterion will be presented to help make this decision.

3.1 Fiducial Shape

The purpose of a fiducial image is to provide automatic correspondences between points in a camera frame and points in a captured image. Clearly, any visual feature can be used as a fiducial if its location is known (or can be computed) and it can be automatically identified. Indeed, tracking systems designed for use in unprepared environments have been proposed that use regions, lines, and other natural environmental features [30, 31]. Most applications for fiducial images, however, assume a prepared space with specific images placed in the environment, with the assumption that the relative transformation between a camera frame and frames indicated by the fiducials needs to be determined. In tracking terminology, the position and orientation (six degrees of freedom) of the frame marked by fiducials needs to be identified relative to the camera. This problem is also commonly referred to as pose estimation.

Determination of the position and orientation of a physical object relative to a camera frame requires the correspondence of at least four non-collinear points. As an example, estimating the pose of a camera relative to a physical environment will require the identification of four 2D points in the camera image and knowledge of their 3D coordinates in the world coordinate system. It is possible to compute pose from only three points. However, the result is ambiguous, generally having two, and often three or four, solutions [15]. Hence, any ideal fiducial solution supporting 6DOF pose estimation should always emit a minimum of four located points, no three of which are collinear. Additional points can be used to compute least-squares solutions that can average out errors and increase the estimate's accuracy. Many fiducial methods utilize a single, typically very simple, fiducial image such as a ring or disk, with the requirement that multiple fiducials must be simultaneously tracked [14].
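Four coplanar, non-collinear correspondences determine the plane-to-image mapping (a homography) exactly, which is the usual algebraic starting point for planar pose estimation. A hedged direct linear transform (DLT) sketch using numpy; this is a standard textbook formulation, not the implementation used in this thesis:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H * src from four (or
    more) 2D point correspondences via the standard DLT null-space
    formulation."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary overall scale
```

With exactly four correspondences the constraint matrix has an exact null vector; with more, the SVD yields a least-squares estimate, matching the point above that additional points average out error.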
Since the location of fiducials in camera images will always be perturbed by noise and quantization error, there is a clear advantage to tracking additional points, so fiducials that emit multiple tracking points seem advantageous. Also, many applications require tracking of styli, independent marked locations, or multiple users, where placement of a large number of fiducial images is prohibitive. An assertion of this thesis is that an ideal fiducial image should emit at least four points.

Beyond that, it is clear that the points should approximate a square. The size of the fiducial equates to resolution in the captured image. Four points not in the form of a square will result in some elements of the image presenting a lesser resolution to the camera than others, thereby decreasing tracking accuracy in the corresponding orientations. This requirement does not necessarily imply that the fiducial image itself must be square. Any image that can emit four points would suffice. However, there are clear computational advantages to simplicity, and a square fiducial image is the simplest possible fiducial emitting four points. The straight edges of a square can be used to compute best-fit lines, allowing corners to be computed with greater, potentially sub-pixel, accuracy. Indeed, the ARToolKit standard fiducial image is a square image.

It should be noted that a circular marker can be used to determine pose if a point on the circle can be determined. The POSE_FROM_CIRCLE algorithm provides a robust solution given circle edge points [13]. However, an interior image for identification is more difficult to implement and cannot be represented in a rectangular array. Most implementations based on pose estimation from circles are based on barcodes (or, more precisely, ringcodes) [12].

3.2 Fiducial Color

The question of fiducial color is much more difficult to address. Clearly, choosing a color fiducial as opposed to monochrome increases the possible set of fiducial images.
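The fit-lines-then-intersect idea can be sketched with a total least-squares fit: every edge pixel contributes to the line estimate, so per-pixel noise averages out and the intersected corner can be more accurate than any single detected edge pixel. A hedged numpy sketch with illustrative function names, not this thesis's implementation:

```python
import numpy as np

def fit_line(points):
    """Total least-squares line fit: returns a point on the line (the
    centroid) and a unit direction vector."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]  # dominant direction of the point cloud

def corner_from_edges(edge1, edge2):
    """Intersect the best-fit lines through two edges to localize the
    shared corner with sub-pixel accuracy."""
    p1, d1 = fit_line(edge1)
    p2, d2 = fit_line(edge2)
    # Solve p1 + t*d1 = p2 + s*d2 for the intersection parameter t.
    t, _ = np.linalg.solve(np.column_stack([d1, -d2]), p2 - p1)
    return p1 + t * d1
```

The SVD-based fit handles near-vertical edges gracefully, which a slope-intercept regression would not.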
Indeed, both color and monochrome images have been utilized in existing systems. However, there are several technical reasons to favor a monochrome fiducial:

- Varying chroma resolution in camera systems
- A smaller image representation
- Higher-performance localization algorithms

The spatial frequency sensitivity of the human visual system for luminance components is much greater than for chrominance components [32]. Unfortunately, many imaging systems designed for computers mimic this characteristic, transmitting chrominance information in lower-bandwidth channels or representing chrominance information with lower resolution. This necessarily decreases the detection resolution for color fiducials. Use of inexpensive web-cams has become very popular for fiducial-based tracking. These cameras clearly exhibit decreased color resolution. Hence, for the most accurate results using a wide range of cameras, a monochrome fiducial image is the best choice. When high-quality cameras are available, color fiducial images can increase the information available in the fiducial image. Even if an RGB color presentation is captured at full resolution, the resulting color image will increase the memory usage and, consequently, the analysis time, by a factor of three (or four). This is a consequence of the increased memory bandwidth requirements.

An additional element in the choice of color or monochrome is the choice of localization algorithms. High-performance algorithms have been developed for color fiducials, but they assume very simple shapes that can be identified by cross-sectional lines [14]. One advantage of color fiducials is the use of color to identify the specific fiducial, as in the multi-ring approach. However, the number of colors that can be uniquely identified varies greatly depending on lighting conditions, and is likely to be small. Specular reflection will not only affect the luminance of an image, but can also modify the hue of imaged colors.
Additionally, the colors must contrast with colors naturally occurring in the scene. One option for color is to utilize retro-reflective fiducials and infrared illumination [33] or direct imaging of infrared emitters [34]. This option is a very different technological approach from visible-image tracking, requiring special camera, illumination, and reflective technologies. In addition, IR fiducials based on retro-reflective materials do not lend themselves well to patterned individual fiducials other than simple binary patterns. As the focus of this thesis is visible-image fiducials, IR approaches are beyond the scope of this discussion.

3.3 Locating the Fiducial

The shape and color of a fiducial are directly related to the algorithm utilized to locate it in the camera image. As mentioned previously, the ARToolKit contains a fiducial tracking system using a square image with a black border, as illustrated in Figure 2-1. An interior image contained within the border provides identification for the particular fiducial image. It is assumed that the marker will contrast with a surrounding region when converted to a binary image. Typically, this contrast can be achieved by simply ensuring that the fiducial is mounted on a white surface or is printed on a larger white sheet of paper. More details of the ARToolKit approach will be included in later sections. The approach of Kato and Billinghurst [4], which assumes a monochrome fiducial image, allows the fiducial corners to be rapidly and accurately located in a camera image.

Is this the best fiducial design for localization, the location of the fiducial in an image? There are several distinct advantages to this design. The square shape yields four corner points for tracking purposes, and the edges are straight between the corner points. This allows the corners to be determined by line fitting to the edges, yielding measurements that are less sensitive to noise in the vicinity of the corner and to quantization errors.
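The locating step described above (binarize the frame, then collect dark connected regions that might be borders) can be sketched as follows. This is a minimal flood-fill sketch; the threshold and minimum-area values are illustrative parameters, not values from this thesis:

```python
import numpy as np
from collections import deque

def dark_regions(gray, threshold=100, min_area=25):
    """Binarize a grayscale image and return bounding boxes
    (x0, y0, x1, y1) of 4-connected dark regions large enough to be
    candidate fiducial borders."""
    mask = gray < threshold          # 1 = candidate dark (border) pixel
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # Breadth-first flood fill of one connected region.
                queue = deque([(sy, sx)])
                seen[sy, sx] = True
                ys, xs = [], []
                while queue:
                    y, x = queue.popleft()
                    ys.append(y)
                    xs.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                if len(ys) >= min_area:  # discard small noise specks
                    boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

A real detector would follow this with quadrilateral fitting on each region's contour; the region pass simply narrows the search.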
The black border also yields a maximum contrast relative to the background, particularly a white background. Once the corners have been located, the interior can be warped to a common frame of reference (16 by 16 in the ARToolKit approach) for comparison to a database of marker images.

This fiducial approach does not emit an orientation other than through analysis of the interior image; hence the offset of the interior text in the marker image in Figure 2-1. Would it be better to design the outline to emit orientation independent of the interior text? This could be accomplished in a variety of ways, including offsetting the interior image, adding an orientation image in addition to the interior image, or using varying colors on the edge. Varying colors is not considered a good choice, for the reasons mentioned in the previous section and because it would eliminate the homogeneity of the design: detection performance would be determined by the least common denominator of detection of the two types of borders. Offsetting the image or adding an image component for orientation is equivalent to using a larger interior image and determining orientation from the interior image alone. Figure 3-1 illustrates this equivalence. When either the interior image is offset or a special orientation pattern is added, the fiducial can be considered equivalent to a simple border with a larger interior image, as indicated by the dotted lines.

[Figure: three equivalent designs labeled "Equivalent interior region", "Offset interior image", and "Orientation pattern"]

Figure 3-1 - Equivalence of interior images for orientation determination

Given these criteria, the square ARToolKit fiducial outline seems to be a "good" approach. The border width and the interior image will be adjusted in this research, though.

3.4 Fiducial Identification

Once an individual fiducial image is located, it must be identified. The identification of the interior image is simplified if a border has been located.
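The warp of the interior to a common 16 by 16 frame of reference, mentioned above, can be sketched as follows, given a homography H that maps canonical interior coordinates to image pixels. This is a hedged nearest-neighbor sketch (ARToolKit's own resampling may differ), with `warp_interior` and its parameters as illustrative names:

```python
import numpy as np

def warp_interior(gray, H, size=16):
    """Resample the fiducial interior into a size x size canonical image.
    H maps canonical homogeneous coordinates (u, v, 1), with u and v in
    [0, size), to image pixel coordinates; nearest-neighbor sampling
    keeps the sketch short."""
    out = np.zeros((size, size), dtype=gray.dtype)
    for v in range(size):
        for u in range(size):
            # Sample at the center of each canonical cell.
            x, y, w = H @ np.array([u + 0.5, v + 0.5, 1.0])
            xi, yi = int(round(x / w)), int(round(y / w))
            if 0 <= yi < gray.shape[0] and 0 <= xi < gray.shape[1]:
                out[v, u] = gray[yi, xi]
    return out
```

Bilinear interpolation would reduce aliasing at the cost of a slightly longer inner loop; nearest-neighbor suffices to show the structure.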
The interior image can then be warped to a square image with a fixed scale. Clearly, marking a space with identical fiducials would require the analysis of relative placement for identification, so it is advantageous if fiducials are unique. Uniqueness can be accomplished in a variety of ways, including color combinations, bar codes, or patterns. The pattern must be unique and accurately identifiable at a variety of resolutions. Several desirable characteristics for fiducial identification have been collected:

- Orientation identification
- Minimal inter-fiducial correlation
- Resistance to noise or partial obscuring
- A large identification range
- A large fiducial identification space

As discussed, using a fixed monochrome square image, as in the ARToolKit fiducials, is a preferred method. The identification image is then set inside this box. It is also preferred that the orientation, and thereby the correspondence of detected image corners with physical coordinates, is determined by an interior image. Consequently, the image must support determination of a unique orientation. In ARToolKit, fiducials are commonly designed with offset text or blocks that make the orientation unique. A candidate image is then compared to the known images in each of the four possible orientations. This method of comparison necessarily limits what can be selected as a fiducial, particularly if users desire fiducial images with visually perceptible meaning. A key characteristic of fiducial images is that there is minimal inter-fiducial correlation in all orientations.

A variety of methods are possible for comparing images. Mean squared error (MSE) is a common measure of image similarity, particularly when measuring image degradation:

c(I, P) = [ Σ_i Σ_j ( I(i, j) − P(i, j) )² ]^(1/2)        (3-4)
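The rotate-and-compare identification step described above can be sketched as follows, using an RMS error as the similarity score. This is a hedged numpy sketch; ARToolKit's actual matcher differs in detail, and `best_match` is an illustrative name:

```python
import numpy as np

def best_match(candidate, templates):
    """Compare a warped square interior image against each known template
    in all four 90-degree orientations; return (template index, rotation
    count, RMS error) for the best match."""
    best = None
    for idx, template in enumerate(templates):
        for k in range(4):  # the four possible square orientations
            rotated = np.rot90(candidate, k).astype(float)
            err = np.sqrt(np.mean((rotated - template) ** 2))
            if best is None or err < best[2]:
                best = (idx, k, err)
    return best
```

Note that this search only disambiguates orientation if each template is asymmetric under 90-degree rotation, which is exactly why ARToolKit fiducials use offset text or blocks.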