VISUAL TRACKING IN MANUFACTURING PROCESS

By

Rui Zhang

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Electrical Engineering

2010

ABSTRACT

VISUAL TRACKING IN MANUFACTURING PROCESS

By Rui Zhang

Visual tracking is a very active research area. Its applications are no longer limited to research but have extended to manufacturing process automation. A complete visual tracking system can provide useful information based on the analysis of the input image, and the core of such a system is its image processing and analysis procedure. In this research, a new hardware setup is presented, as well as a new approach to edge detection that achieves more accurate tracking. The new setup uses server machines to separate the computation process from the display and operation processes, which are the main functions of the client PC. This setup greatly improves system reliability and enables the system to handle multiple-camera, multiple-object tracking. The new edge detection approach overcomes the inconsistent lighting conditions of the manufacturing process and generates target object properties for the manufacturer. Human-machine interaction is also introduced to the tracking system to improve reliability and simplify the problem.

TABLE OF CONTENTS

LIST OF FIGURES
INTRODUCTION
LITERATURE REVIEW
PROBLEM STATEMENT
SYSTEM OVERVIEW
    Hardware architecture
    Software architecture
VISUAL TRACKING
    Noise removal
    Segmentation
    Unwanted features within ROI
    Edge fitting
    Calibration
RESULT AND ANALYSIS
CONCLUSION AND FUTURE WORK
REFERENCES
LIST OF FIGURES

Figure 1 Sample tin bath images
Figure 2 Hardware architecture
Figure 3 Software architecture
Figure 4 Glass manufacturing process
Figure 5 Sample ROI image and tin bath images
Figure 6 Adaptive threshold with and without noise removal
Figure 7 Global threshold result
Figure 8 Opening and closing operation effects
Figure 9 Registry of non-interest edges
Figure 10 Result of XOR between registry and new processed frame
Figure 11 ROI and line scan sample
Figure 12 Result of fitted edge
Figure 13 Calibration result

1. Introduction

Visual tracking is a very active research topic. Its primary function is to detect and track the motion of certain objects in image sequences. It can be applied in many fields such as surveillance, robot control, human gesture recognition, medical imaging, and manufacturing process automation. Real-time visual tracking is becoming more and more popular in manufacturing process automation, since it provides the feedback measurements needed to greatly improve the accuracy of process automation [1][2].

Visual tracking has many unique advantages over other tracking methods in both functionality and practicality. It can capture rich information such as color, shape, and size, which is very useful for recognizing objects. It is also usually cheap and easy to implement, which is important for industrial applications. However, visual tracking systems still have problems that prevent them from playing a bigger part in manufacturing processes. The performance of a visual tracking system depends heavily on the efficiency and accuracy of the image processing techniques being used. Generally, image processing demands substantial computation power. With complicated image processing techniques, the system suffers from fewer errors; however, the time needed to complete the analysis of an image also increases and may fail to meet the on-line requirement of the manufacturing process. Therefore, in order to achieve real-time tracking, either the complexity of the image processing must be reduced so that the overall processing time is shortened, or very expensive equipment must be purchased to provide enough computation power.

In addition, one major problem the tracking system faces is the inconsistent lighting conditions of the manufacturing environment. Inconsistent lighting can cause the camera either to fail to grab a clear image to be analyzed or to capture large variations of illumination, which can cause the image processing to malfunction and eventually fail the tracking system. People have tried to solve this problem by adding additional lighting sources so that the camera can capture a clearer image and illumination variance is reduced.
Unfortunately, it is sometimes impossible to add another lighting source to the manufacturing process, and the images caught by the camera become inconsistent in their overall brightness and contrast. For example, Figure 1 shows two camera images retrieved from a glass production tin bath in which the average temperature is 1475 degrees Fahrenheit. No light bulb can sustain that heat, so the pictures have low contrast and contain a great deal of noise.

Figure 1 Sample tin bath images

In order to provide a more generic solution for tracking systems, a new hardware setup and a new image segmentation method are presented in this research. The new hardware setup greatly improves the capability of the tracking system, which can now handle multiple-camera, multiple-object tracking. It also improves the hardware reliability and provides the possibility of remote access and control. The new image processing procedure simplifies the complexity of the algorithm and reduces the computation power and time needed, ensuring that the system can handle real-time tasks.

2. Literature review

There exist many visual tracking systems in the literature [2][4][5][6][7][8]; however, none of them focuses on visual tracking in low-contrast images. In the above glass production example, the main purpose is to track the edge of liquid glass flowing on liquid tin and calculate the distance from the edge to the machine tip. The contrast between the two is not significant, and the edge is surrounded by many other features that are very hard to distinguish. CAD-model-based visual tracking [2] can easily find the machine tip because a determined shape is available and a rigid body does not change much just because the camera points from a different angle. However, this method fails for target objects without a defined CAD model. Others use template matching [1][6], which in general works fine as long as the target object does not change its shape over time. The drawback of template matching is that it requires a lot of computation resources and usually takes longer to locate the object, since it performs a search over the entire image. Another problem of template matching is that it cannot provide any further properties of the target object, since only the template is located, not the object within it. A further analysis method is required in order to get more detailed properties of the target object.

Lack of attention to hardware architecture design has also imposed restrictions on such systems. A typical approach places all the tracking system functions on a single machine, which has very limited computation power. Sawasaki [12] and others have noticed this issue and started separating the image processing procedures from the other functions of the tracking system. This idea of distributing computation is the right path to take and inspired the hardware architecture presented in this research.

In this research, a new visual tracking system setup is presented that is more efficient and able to analyze low-contrast, noisy images. This system can handle more generic manufacturing situations.

3. Problem statement

From sections 1 and 2, we come to the following conclusions about the problems that current visual tracking systems face. A current visual tracking system can handle well-defined, simple rigid-object tracking under stable lighting conditions with evenly distributed illumination.
However, current tracking systems cannot handle object tracking under unevenly distributed illumination, and they also fail under unstable lighting conditions. The main cause is that the core image processing algorithms used in visual tracking are not targeted at real-world applications: because lighting conditions are usually much better in a research environment than in a manufacturing plant, these factors are not taken into consideration. In its current state, the tracking system also does not meet the hardware reliability requirements of a manufacturing process, which usually requires the system to run 24/7. In addition, the tracking system cannot use more sophisticated image processing algorithms due to the limitations of the hardware's computation power. These are the main problems we want to solve in this research.

The difficulties in overcoming these problems involve hardware architecture modification and software algorithm improvement. On the hardware side, the challenge is to increase the computation ability and hardware reliability while keeping the cost of the system at an acceptable level. It is also in the manufacturer's interest to give this system remote control ability so that the flexibility of the system setup is maximized. Another challenge in the hardware setup is to minimize electromagnetic interference. In a manufacturing plant, many power cables run around the plant to provide power to production equipment. These cables carry high currents and generate electromagnetic fields that interfere with the video signal running in the cable from camera to frame grabber. This effect degrades the image quality at the frame grabber end and is worse when the video signal travels through a longer cable, which has a higher risk of exposure to electromagnetic interference.

The solution to these problems is proposed in the following sections. In order to meet the manufacturing hardware reliability requirements, a server machine is used for image acquisition, image processing, and information generation. To reduce electromagnetic interference with the video signal and to include remote control ability, network communication is introduced, and a client PC is included for display and human-machine interaction. The network communication provides the option of putting the server closer to the camera, reducing the video cable length in order to minimize the effect of electromagnetic interference. The image processing algorithm used on the server machine includes an additional filter to reduce high-frequency noise. In addition, an adaptive threshold method is utilized to handle unevenly distributed illumination and unstable lighting conditions. This new approach is very efficient and has a low computation cost, which makes it possible to add more cameras to the tracking system and best utilize the powerful server machine.

4. System overview

4.1 Hardware architecture

The general system setup is shown in Figure 2. The main idea is to distribute the computation load to reduce the time needed for image processing and information retrieval, and to improve the overall system efficiency by maximizing the computation ability. In this setup, the cameras capture images continuously and send them to a splitter. Increasing the number of cameras used can greatly increase the adaptability of the system. The cameras can be set up in two ways.
In the first setup, each camera focuses on a different view area and acts like an end-point sensor; together, the cameras form a surveillance network with maximum coverage. The other way is to pair the cameras up, with each pair configured for stereo vision, which is very effective for generating 3D images and avoids camera calibration.

From the splitter, the video signal is separated into three feeds entering the frame grabber, the encoder, and the monitor, respectively. The splitter must be able to amplify the video signals; otherwise the separated signals will be too weak for the devices at the next stage to pick up. The frame grabber grabs an image for the software application containing the image processing routines, which perform the necessary steps to analyze the current frame and produce information for the next step. The information produced by the image processing routines is then either stored in a database or sent to the client machine through an Ethernet connection using the TCP/IP protocol. TCP/IP provides reliable communication between the server and client and guarantees that no information is lost during data transmission.

The second output from the splitter goes into the encoder, which compresses the video signal and sends it directly to the client through the network connection. The client has a decoder installed, and the image is decoded for display or other simple tasks on the client machine. A user at the client machine can also use a graphical user interface to send commands to the server machine. A direct connection from splitter to client is available, but a longer cable is needed to transmit the video signal. There are usually many high-energy cables in a manufacturing plant, most of them carrying strong currents that generate strong electromagnetic fields, which can interfere with the video cable and degrade its quality. With the use of the encoder and decoder, the image can be transmitted over the network with only a slight loss of information due to quantization. Since the quantization effect only appears on the client machine, where no image processing is performed, its impact is minimized. The third feed from the splitter goes directly to an analog monitor, which serves as a backup display in case the server machine malfunctions and stops transmitting video through the network.

Figure 2 Hardware architecture

Another advantage of using a server to perform the most computation-demanding processes is that parallel processing can be best utilized. Servers are usually designed to handle multiple applications at the same time and have passed much stricter reliability tests, so they can operate 24/7, the same as any manufacturing production line. In general, servers are much more reliable than regular PCs. With multi-threaded programming techniques and powerful server machines, the tracking system can be pushed to its limit to handle multiple cameras and multiple objects simultaneously.
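As a concrete illustration of the server-to-client data path described in this section, the following minimal Python sketch shows the server side sending one frame's tracking result over TCP. The message format (a length-prefixed JSON dictionary) and the port number are illustrative assumptions, not details taken from the actual system.

    import json
    import socket

    def send_result(client_host, result, port=5000):
        """Send one frame's tracking result to the client over TCP.
        TCP's in-order, lossless delivery is why it is preferred here
        for transmitting measurement data."""
        payload = json.dumps(result).encode("utf-8")
        with socket.create_connection((client_host, port)) as conn:
            # Length-prefix the message so the client knows where it ends.
            conn.sendall(len(payload).to_bytes(4, "big") + payload)

    # Hypothetical usage: slope/offset of the fitted edge and the distance
    # from the edge to the machine.
    # send_result("client-pc", {"slope": 0.12, "offset": 240.5, "dist": 35.2})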
A graphical user interface (GUI) is created on the client machine so that important information from the tracking system can be visualized. The GUI also provides the possibility of human-machine cooperation: the user can input key parameters of the object to be tracked, and these parameters are sent to the server to help improve the accuracy and efficiency of the tracking system.

4.2 Software architecture

The majority of the computation and image processing is completed on the server, and Figure 3 shows the software architecture within the server machines. A region of interest (ROI) is utilized in this tracking algorithm because it greatly reduces the computation load. An ROI containing the most useful information about the object being tracked is created, and the initial position of this ROI is provided by the user in this system's setup. The user at the client machine can interact with the tracking system through the graphical user interface and provide the information needed. An automated tracking module would require sufficient prior knowledge of the tracked object; moreover, the initial detection of the tracked object's location carries the most uncertainty and is the most likely to go wrong. Therefore, instead of hard-coding the prior knowledge and risking the software's reliability, this initial detection function is left open for the user, who defines the tracked object's characteristics during initial processing. User-machine cooperation saves a great deal of effort and reduces the chance of error, improving software reliability.

Figure 3 Software architecture (input image -> image segmentation within ROI -> feature extraction -> output data / update ROI location)

This tracking system also uses a recursive framework for the core image processing algorithm to accomplish the tracking objective. At the end of each image analysis, the essential extracted information is stored and passed to the next frame's analysis as input. The advantage of the recursive framework is that it greatly reduces the complexity of the program while increasing its reliability. The drawback is that the program must either search the entire image for the tracked object's location before it can extract any features, or use prior information from earlier analyses to estimate the object's location. Either method affects overall system performance. The first method obviously requires more computation, which increases the total processing time for each frame; this is not preferred in an on-line tracking system. The second method carries the potential risk of falsely estimating the location of the tracked object. However, with added conditions and constraints, such as a history of the object's speed and direction and the laws of physics, the false estimation rate can be minimized.

After analyzing the input image through image segmentation and other processes, the server extracts useful information about the tracked object and sends it to the client through an Ethernet connection. A TCP connection between server and client is used to ensure no loss of information during network data transmission.

5. Visual Tracking

5.1 Noise removal

Noise is removed by 2-D adaptive noise-removal (Wiener) filtering:

\mu = \frac{1}{NM} \sum_{(n_1, n_2) \in \eta} a(n_1, n_2)

\sigma^2 = \frac{1}{NM} \sum_{(n_1, n_2) \in \eta} a^2(n_1, n_2) - \mu^2

where \eta is the N x M local neighborhood of each pixel of the image a. The above equations produce the mean and variance of the local neighborhood. Using these statistics, a Wiener filter can be created, which is very efficient for noise removal; the pixel-wise Wiener estimate is

b(n_1, n_2) = \mu + \frac{\sigma^2 - \nu^2}{\sigma^2} \big( a(n_1, n_2) - \mu \big)

where \nu^2 is the noise variance, estimated as the average of the local variances when it is unknown.
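A minimal NumPy/SciPy sketch of this local-statistics Wiener filter is given below. The 5x5 neighborhood and the noise-variance estimate (the mean of the local variances) are common defaults assumed here for illustration, not values prescribed by this thesis.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def wiener_denoise(img, size=5):
        """2-D adaptive Wiener filter built from local mean and variance."""
        img = img.astype(np.float64)
        mu = uniform_filter(img, size)                  # local mean
        var = uniform_filter(img ** 2, size) - mu ** 2  # local variance
        noise = var.mean()                              # noise variance estimate
        # Smooth strongly where local variance ~ noise; preserve edges
        # where local variance is large.
        gain = np.maximum(var - noise, 0.0) / np.maximum(var, 1e-12)
        return mu + gain * (img - mu)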
Noise removal at this initial step is part of the image enhancement process. Normally, in a very good and stable environment, the camera can capture a very clean image. By clean, we mean the captured image has no high-frequency components in the frequency domain; in other words, no "salt and pepper" noise in the spatial domain. Other spatial-domain noise removal filters, such as Gaussian blur, perform an averaging operation over adjacent pixels. These methods are very fast, since they are pixel-wise operations. However, while removing the salt-and-pepper noise, they also blur the edges in the image, because the averaging operation weakens the gray-level differences between adjacent pixels. Frequency-domain methods, such as frequency cut-off, do not blur edges; however, they require the image to be transformed from the spatial domain to the frequency domain and back again after the operation is performed. This obviously requires more processing time and is not favored in on-line applications. The noise-removed image, in this case with the salt-and-pepper noise removed, has clearer edge information and provides a better platform for image segmentation, which is the next step.

5.2 Segmentation

The next step is to obtain the edge image. Edge detection is very useful in image processing, since the resulting edge image can be used for image segmentation and many other applications; it is a fundamental step in image processing. Edges are a sign of a lack of continuity, usually a jump of intensity in the image. Discontinuities in image intensity can be either step edges, where the intensity abruptly changes from one value on one side of the discontinuity to a different value on the opposite side, or line edges, where the intensity abruptly changes value but then returns to the starting value within some short distance. However, step and line edges are rare in real images. Because of low-frequency components or the smoothing introduced by most sensing devices, sharp discontinuities rarely exist in real signals: step edges become ramp edges and line edges become roof edges, where the intensity changes are not instantaneous but occur over a finite distance.

Common edge detection methods use a variety of masks and perform convolution across the entire image. Because the edge images produced by masks are always directional, owing to the nature of the masks, these edge images are then added together to form one final edge image that includes edges in all directions. The obvious drawback of these methods is the directional nature of the intermediate results. When the edge's direction is known, such methods can save a lot of effort on edge detection; but in a more generic situation, using masks to detect edges is not a good choice and can be very time consuming. Therefore, an adaptive threshold method is introduced to obtain the possible edge points in this project. The gray levels z in a region are modeled as a mixture of two Gaussian populations:

p(z) = \frac{P_1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(z-\mu_1)^2}{2\sigma_1^2}} + \frac{P_2}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(z-\mu_2)^2}{2\sigma_2^2}}

and the optimal threshold T satisfies

A T^2 + B T + C = 0

where

A = \sigma_1^2 - \sigma_2^2

B = 2(\mu_1 \sigma_2^2 - \mu_2 \sigma_1^2)

C = \sigma_1^2 \mu_2^2 - \sigma_2^2 \mu_1^2 + 2 \sigma_1^2 \sigma_2^2 \ln\!\left(\frac{\sigma_2 P_1}{\sigma_1 P_2}\right)

Using these equations, the software can quickly threshold the image adaptively without worrying about varying illumination across the image. Unlike a traditional threshold, the result of the adaptive threshold is an edge image.
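The quadratic above can be solved directly once the two class statistics of a region are known. The sketch below is a minimal illustration; estimating the two populations by splitting the region at its mean is an assumption made here for simplicity, not the thesis's estimation procedure.

    import numpy as np

    def optimal_threshold(region):
        """Solve A*T^2 + B*T + C = 0 for a region modeled as a mixture of
        two Gaussians; returns None when the region has one gray level."""
        z = region.ravel().astype(np.float64)
        lo, hi = z[z <= z.mean()], z[z > z.mean()]  # crude class split (assumed)
        if lo.size < 2 or hi.size < 2:
            return None
        p1, p2 = lo.size / z.size, hi.size / z.size
        m1, m2 = lo.mean(), hi.mean()
        v1, v2 = lo.var() + 1e-12, hi.var() + 1e-12
        A = v1 - v2
        B = 2.0 * (m1 * v2 - m2 * v1)
        C = v1 * m2**2 - v2 * m1**2 \
            + 2.0 * v1 * v2 * np.log(np.sqrt(v2 / v1) * p1 / p2)
        if abs(A) > 1e-12:
            roots = np.roots([A, B, C])
        elif abs(B) > 1e-12:
            roots = np.array([-C / B])              # equal variances: one root
        else:
            return None
        roots = roots[np.isreal(roots)].real
        inside = roots[(roots > m1) & (roots < m2)] # root between the means
        return inside[0] if inside.size else None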
The adaptive threshold method looks at a small region of the image and produces a proper threshold value for that particular region. Generally, the region either contains pixels from a variety of gray levels or contains pixels of a single gray level. In the former case, the adaptive threshold applies a proper threshold value to the region to distinguish the differences in gray level. In the latter case, the adaptive threshold simply sets all pixels to high, since they all belong to one gray level. Notice that in the latter case, no matter whether the pixels in the region have a higher or lower gray level, they all get assigned to high in the output. This is very different from a traditional threshold operation, and it is the main reason why the adaptive threshold operation produces an edge image and can be considered an edge detection method.

The resulting edge image may still contain some unwanted features. Additional morphological operators, such as the opening and closing operators, can eliminate small noisy features. Connected-component labeling is another helpful algorithm that can be used to further analyze the image and remove noisy elements.

As mentioned in section 4.2, an ROI is used to roughly locate the desired object, and its initial location is provided by the user. The user draws a line or shape roughly near the object being tracked, and an ROI is created based on that drawing's starting and ending points. The initial location of the tracked object and of the ROI is usually hard for software to find automatically, since the search requires extensive computation and has a higher error rate. User-machine cooperation reduces the computation cost and the possibility of error. After the initial step, the ROI location is updated by the software using the detected object's properties.

5.3 Unwanted features within ROI

No matter how carefully the image segmentation is performed, the resulting edge image in the ROI usually still contains many features other than the desired target object. To prevent them from being confused with the desired target object in future analyses, a registry of these features is generated. The registry is constructed by erasing the pixel values around the initially detected object, and it is updated after every frame analysis, as mentioned in previous sections. When the next frame is grabbed by the frame grabber and the image segmentation is performed, an XOR of the registry generated by the last frame's analysis with the new edge image successfully removes the confusing features, leaving only the desired target object. This method works whether the tracked object is stationary or moving within the estimated ROI. In the stationary case, since the registry does not contain the desired target object, the XOR brings the object back and removes all the background. In the moving case, the target object has traveled to a new location and changed the image in that area; the XOR therefore detects the difference and isolates the moved target object. After the new location of the target object is found, the new frame is used to create an updated registry for the next frame's analysis. Therefore, as long as the camera does not malfunction during image capture, the registry is always up to date, and the moving or stationary target object can always be isolated from the background and noisy features.
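The following is a minimal sketch of the registry mechanism just described, assuming binary edge images stored as boolean arrays. The dilation radius used to erase the area around the detected object is an illustrative parameter.

    import numpy as np
    from scipy.ndimage import binary_dilation

    def make_registry(edge_img, object_mask, radius=3):
        """Registry of non-interest edges: the edge image with the area
        around the detected target object erased."""
        keep_out = binary_dilation(object_mask, iterations=radius)
        return edge_img & ~keep_out

    def isolate_target(new_edge_img, registry):
        """XOR the new frame's edge image against the registry: background
        features cancel out, leaving (mostly) the target object."""
        return new_edge_img ^ registry

After each frame, make_registry is called again with the newly found object so that the registry follows slow background changes.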
5.4 Edge fitting

Merely detecting the moving target object will not help the manufacturer much if no further information is generated. Therefore, this system includes a function to calculate certain properties of the target object and provide this information, which the manufacturer can apply to any future task. Meanwhile, this information is also fed back to the tracking system as input for estimation purposes: an estimate of the next frame's ROI location is produced from it. In this system, a function to calculate line properties is included, owing to the nature of the test example in the following sections. Since this computation must be done quickly and accurately, a simple linear regression model is adopted to fit the resulting edge points to a straight line. This method avoids minimizing a function, which carries the possibility of being trapped in a local minimum and either producing wrong results or ending up in an infinite loop. The following equations are used to calculate the line parameters, where \bar{x}, \bar{y} denote the averages of the edge points' coordinates:

y_i = \alpha + \beta x_i + \varepsilon_i

\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}

To reduce computation time, only the area within the ROI is used. The area is divided into 10-15 stripes. In each stripe, horizontal line scans are performed to obtain multiple line scan profiles. Using these profiles, a gradient method is applied to find the edge points, and the edge points of each stripe are averaged. Each stripe therefore generates one point, and 10 to 15 points in total are produced for the linear regression model to calculate the straight line's slope and offset. This information is then sent to the client through the Ethernet connection between the server and client.
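A minimal sketch of this least-squares line fit follows; the closed-form estimates are exactly those given above, so no iterative minimization is involved.

    import numpy as np

    def fit_edge_line(points):
        """Fit edge points (x_i, y_i) to y = alpha + beta * x in closed form."""
        x, y = np.asarray(points, dtype=float).T
        beta = np.sum((x - x.mean()) * (y - y.mean())) \
               / np.sum((x - x.mean()) ** 2)
        alpha = y.mean() - beta * x.mean()
        return alpha, beta   # offset and slope of the fitted edge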
5.5 Calibration

Once a camera is set up and its position is fixed, it can be calibrated to provide measurements in engineering units instead of pixel values. This is very useful in a manufacturing process, since the real-world results can be applied as inputs to control systems, forming a closed-loop control system that can, for example, guide a robot arm to finish a task. As mentioned in previous sections, stereo vision can avoid the CAD-model calibration process, which usually includes the task of accurately locating certain special points in the image. In this research, due to the nature of the test example, stereo vision is not an option, so calibration using a CAD model is utilized. From another perspective, stereo vision cannot be applied to every system, while CAD models always exist for the machines used in a manufacturing process; therefore, calibration using a CAD model is the more generic method and can be applied to any tracking system working in a manufacturing process.

Calibration with a CAD model usually involves substantial image processing effort in order to accurately identify certain points and provide their image coordinates. To reduce the effort of identifying useful points, a better approach is to choose unique points, such as points at a particular corner or with some other unique property that makes them stand out and easy to identify. This extra effort, invested before programming the software, is very important because it saves a lot of computation power and makes the calibration more robust and reliable. An alternative way to avoid the effort of identifying these unique points is to take advantage of the system's setup and use human-machine interaction: a list of particular points on the CAD model is stored, in order, in the software, and a user locates these points in the same order. In this case, the points should have similar properties so that the software can use the same method to refine the user's selections and provide more accurate coordinates for these points.

To sum up, these points' image coordinates in pixel values are paired with their CAD-model coordinates in engineering units. A set of equations then uses these pairs to produce a transformation that describes the camera model and maps pixels on the image plane to real-world coordinates. One more important factor affects the performance of the calibration process: all these pre-defined points, in both methods, must be present in the image. If the image does not contain these points, the result will be either false locations or too few points to perform a proper calibration; either way, the resulting transformation matrix of the camera model will have a large error when converting image coordinates to engineering units.

There are many existing, proven calibration methods; in this research we use an affine transformation [13] to obtain the camera model. This method is straightforward to understand and its performance is very reliable:

{}^{C}P = T(t_x, t_y, t_z)\, R(\alpha, \beta, \gamma)\, {}^{W}P

{}^{C}P = {}^{C}_{W}TR(\alpha, \beta, \gamma, t_x, t_y, t_z)\, {}^{W}P

{}^{I}P = \big( \Pi(f)\; {}^{C}_{W}TR(\alpha, \beta, \gamma, t_x, t_y, t_z) \big)\, {}^{W}P

s \begin{bmatrix} {}^{I}P_x \\ {}^{I}P_y \\ 1 \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & 1 \end{bmatrix} \begin{bmatrix} {}^{W}P_x \\ {}^{W}P_y \\ {}^{W}P_z \\ 1 \end{bmatrix}

The superscripts "W", "C", and "I" denote the world (CAD-model), camera, and image frames, respectively; T and R are the translation and rotation matrices, \Pi(f) is the projection, and s is a scaling factor. There are a total of 12 parameters in the 3 x 4 camera matrix; with the last entry normalized to 1, each point pair contributes two equations, which means at least 6 unique points with known CAD-model coordinates must be located within the image. However, the number of points used by this method is not limited to 6. The points are identified by the software or by the user, and either way there may be an offset embedded in the points' coordinates. To refine the camera model and improve the accuracy of the transformation, more points with known coordinates can be introduced; all together, these points can be applied to the above equations with a least-squares fit to minimize the error of the camera model. As noted at the beginning of this section, camera calibration is a one-time procedure: as long as the camera does not move, the camera model will not change. Because of this, the calibration procedure does not have the strict time limit of the tracking procedure, and it is acceptable for calibration to take a reasonably longer time in order to provide an accurate camera model.
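A sketch of the least-squares solution for the camera matrix is given below. With the entry c34 fixed to 1, as in the matrix above, each world/image point pair contributes two linear equations in the remaining unknowns, so six or more pairs suffice. This is a generic direct-linear-transform style formulation consistent with the equations above, not necessarily the exact implementation used in the system.

    import numpy as np

    def calibrate_camera(world_pts, image_pts):
        """Least-squares solve for the 3x4 camera matrix C (c34 = 1) from
        paired CAD-model points (X, Y, Z) and image points (u, v)."""
        rows, rhs = [], []
        for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
            rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
            rhs.append(u)
            rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
            rhs.append(v)
        sol, *_ = np.linalg.lstsq(np.asarray(rows, float),
                                  np.asarray(rhs, float), rcond=None)
        return np.append(sol, 1.0).reshape(3, 4)  # re-insert fixed c34 = 1

    def to_image(C, world_pt):
        """Project a CAD-model point through the camera model to pixels."""
        p = C @ np.append(np.asarray(world_pt, float), 1.0)
        return p[:2] / p[2]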
6. Result and analysis

In this research, glass manufacturing video is used as the example to test system performance. A standard glass manufacturing line can be broken down into five main stages, shown in Figure 4.

Figure 4 Glass manufacturing process (raw material feed, melting, cooling, tin bath, final cooling)

Firstly, the raw materials of glass, usually a mixture of silica sand, soda ash, and so on, are blended and fed into the production machine. Secondly, the mixture of raw materials is melted into liquid form at extreme temperatures, usually 1600°C or higher. Thirdly, the liquid glass is cooled down to a temperature between 1100°C and 1300°C. It is during this stage that all bubbles formed in the viscous liquid glass flow are removed, so that the liquid glass is ready for the next step. Fourthly, while maintaining its temperature, the liquid glass floats smoothly and spreads uniformly on molten tin to form the desired shape and thickness. The equipment completing this task is called the tin bath. It is at this stage that the tracking system comes in to play the important role of tracking the liquid glass edge and calculating the distance between the edge and the machine grabbing the liquid glass flow. The machine is the plate-shaped object in Figure 1. The machine grabs the glass flow, and by controlling the machine's position and angle of operation, a certain thickness and width of glass can be produced. The glass between the actual liquid glass flow edge and the machine is eventually cut off and put back into stage one as raw material. Therefore, minimizing the distance between the liquid glass flow edge and the machine can save the glass manufacturer a lot of money. However, a certain threshold on the distance is desired to ensure that the liquid glass does not detach from the machine and shrink, which would halt the production process and waste a lot of money. The tracking system can provide the necessary data for the production line to operate safely and continuously while optimizing its cost.

Figure 1 shows that the image is dark and contains a lot of noise, both from the environment and from the various objects inside the camera's view; this can also be seen in Figure 5 in an enlarged region of interest (ROI) view. This situation matches the purpose of this research and was chosen as the experiment to test the proposed image processing algorithm and hardware/software system structure. Hardware engineers have figured out a way to retrieve video signals from the very hot tin bath; however, only one camera can be pointed at a particular view inside the tin bath, which in this case is the machine and its adjacent liquid glass flow edge. This hardware setup rules out the stereo vision technique, but it is still possible to calibrate the camera using various unique points of the machine and its corresponding CAD model. The last stage of glass manufacturing simply cools the glass down to return it to solid form. Results and analysis are presented in the following section.

Figure 5 Sample ROI image and tin bath images

Figure 5 shows a sample ROI image in which a line indicates the glass edge. It is obvious that without further knowledge, even a human cannot separate the glass edge from the other edges in the image. The multiple short white stripes are reflections of heating elements. There is a slight grayscale difference between the liquid glass and the molten tin in this ROI view. This grayscale difference is hard to distinguish in most of the camera view but much easier to identify at the bottom, where the machine casts a shadow. Unfortunately, the shadow is not always present in the camera view, and the edge inside the shadow is too short for edge property extraction; the lack of edge data points would result in a large error in the slope and offset of the straight line fitted from these points. Therefore, the image processing algorithm must be able to distinguish the grayscale difference between the liquid glass flow and the molten tin, and use as many edge points as possible to provide enough data for the line fitting function to estimate the edge's slope and offset and calculate the distance between the liquid glass edge and the machine.
Figure 6 Adaptive threshold with and without noise removal

Figure 6 shows the results of the stand-alone adaptive threshold and of the adaptive threshold with the 2-D noise removal filter. Figure 6a is the threshold image without noise removal and Figure 6b is the threshold image with noise removal. The adaptive threshold uses a region size of 15x15. This parameter was obtained from many experiments on different input images. Different threshold region sizes have different effects on the output image. If the region size is too large, there are two defects: firstly, it increases the processing time needed to calculate the threshold value for each block; secondly, since the region is larger, many fine details fall within the region, and after the threshold is applied many of them are eliminated. This is unwanted, because the main purpose of applying a threshold to the image is to reduce the image data size while maintaining as much detail as possible. On the other hand, if the region size is too small, there is not enough data within the region to produce a correct threshold value. Therefore, when applying the adaptive threshold method, the region size is crucial and must be tuned for the best performance. Since this is an empirical result, the region size is made tunable for different situations.

Comparing Figure 6a and Figure 6b, one can easily notice that without noise filtering, the resulting image contains a great deal of salt-and-pepper noise, caused by high-frequency components in the original image. If the salt-and-pepper noise were left untouched, it would greatly reduce the accuracy of the following steps and eventually degrade the overall system performance. This shows that filtering is a necessary step in real-world image processing applications.

Figure 7 Global threshold result

Figure 7 shows the result of a simple global threshold. All fine features, including the edge information, are lost by this method. This is because the liquid glass flow edge differs only slightly from the molten tin, and these gray levels all fall into one region of the gray-level histogram of the entire image; after the global threshold, they all merge into one level. In real applications, especially when the lighting is inconsistent and causes uneven illumination, the adaptive threshold is much better than the global threshold.

Figure 8 Opening and closing operation effects

Figure 8(b) shows the result of an opening followed by a closing operation applied to the adaptive threshold image of Figure 8(a). This operation further removes small objects from the image. From the above figures one can notice that in some cases, even with noise filtering, the output image still contains noise in the form of small objects. These objects are small compared with the actual features in the image, but they are large enough to pass through the high-frequency removal filter. There are several ways to remove these small objects. A simple one is to apply a connected-component algorithm to the entire image and then remove the components with a small area. Connected-component labeling is very easy to implement, but it requires a relatively large amount of memory and a longer processing time, depending on the image's size; both are bad for an on-line tracking system. We therefore use an even simpler pixel-wise method to remove the small objects: an opening operation followed by a closing operation, which proves very efficient at removing most small noisy objects. These operations also have a very low memory demand, since they are pixel-wise, and they are time efficient. In addition, the opening and closing operations reconnect some of the broken edges, regenerating their correct form and providing a good platform for the data analysis in the later steps.
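A minimal sketch of this pixel-wise cleanup using the standard morphological operators follows; the 3x3 structuring element is a typical default assumed here.

    import numpy as np
    from scipy.ndimage import binary_closing, binary_opening

    def clean_edge_image(edges):
        """Opening removes small isolated objects; closing then reconnects
        broken edge fragments. Both are cheap, memory-light local operations."""
        s = np.ones((3, 3), dtype=bool)   # 3x3 structuring element (assumed)
        return binary_closing(binary_opening(edges, structure=s), structure=s)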
Figure 9 Registry of non-interest edges

Figure 10 Result of XOR between registry and new processed frame

In the iterative framework, it is important to distinguish the object to be tracked from the background. In the glass manufacturing case, the background image is very complicated and hard to distinguish, since the heating element reflections have a shape similar to the liquid glass edge. This situation may also occur in many other applications. Therefore, creating some kind of registry to store the background information is important. The registry can then be used as a reference to easily distinguish the tracked object from the background image. Since this is an iterative framework, the registry must also be updated, because the background may change over time; the change can be caused either by a movement of the camera or by different lighting. If it fits within the processing time budget, it is recommended to update the registry for each frame, but this is not required; different applications may take different approaches to this matter. Figures 9 and 10 show the edge registry and the result of the XOR operation with the processed new frame. With the correct background registry, the moving object is easily identified and separated. The result of Figure 10 can be used for the line fitting of the next step.

Figure 11 ROI and line scan sample

Figure 11 shows the ROI image and a sample line scan profile. The horizontal lines divide the ROI into multiple stripes, and within each stripe the line scan generates multiple line profiles. The spikes of a line profile indicate edges along the line scan. This ROI image contains the liquid glass edge and some background components as well; therefore, multiple spikes are observed in the line scan profile. This confirms that correctly separating the tracked object from the background is very important and can save a lot of analysis effort in the following steps. With the correct separation, the ROI contains only the moving object, and the line scan should have only one spike for data retrieval and analysis. However, if the ROI contains some noise, as shown in Figure 11, there is still a way to identify the correct spike in the line scan for data analysis. By looking at the results of previous frames, the program can estimate the moving object's position; based on that estimate, the program looks for the nearest spike in the line scan profile and treats it as the object. The spike consists of a rising edge and a falling edge, and the midpoint between them is considered the actual object location and is produced as the line scan profile's output. There are multiple line scan profiles in each stripe, and many object location points are produced by analyzing them; averaging the multiple line scans within each stripe generates one object location point for further analysis, as sketched below.
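The sketch below reduces one stripe's line-scan profiles to a single edge point: gradient spikes mark candidate edges, the spike nearest the position estimated from previous frames is kept, the midpoint between its rising and falling edges is taken as the edge location, and the per-profile locations are averaged. The gradient threshold and the spike-grouping window are illustrative assumptions.

    import numpy as np

    def stripe_edge_point(stripe, prev_x, grad_thresh=20.0, width=5):
        """Average edge x-location of one ROI stripe.
        stripe: 2-D gray-level array, one row per horizontal line scan.
        prev_x: edge position estimated from previous frames."""
        locations = []
        for profile in stripe:
            g = np.abs(np.diff(profile.astype(float)))  # line-scan gradient
            spikes = np.flatnonzero(g > grad_thresh)    # candidate edge columns
            if spikes.size == 0:
                continue
            nearest = spikes[np.argmin(np.abs(spikes - prev_x))]
            cluster = spikes[np.abs(spikes - nearest) < width]
            # Midpoint between the rising and falling edges of the spike.
            locations.append(0.5 * (cluster.min() + cluster.max()))
        return float(np.mean(locations)) if locations else None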
In this glass manufacturing case, the liquid glass edge's properties are the primary goal, and the edge can be fitted to a straight line. Using the object location points produced by each stripe of the ROI and applying the line fitting method of section 5.4, the result is the edge's slope and offset. These values are used to calculate the distance between the liquid glass edge and the machine using the standard point-to-line distance formula. The result can be given either as a pixel distance or in engineering units, since the camera has been calibrated using the method provided in section 5.5. The results in engineering units have an error rate of less than 5%, as shown in Figure 13.

Figure 12 Result of fitted edge

Figure 12 shows the results of two consecutive frames, with the fitted edge drawn as an overlay on top of the original image. The total processing time for each frame is less than 0.5 second.

Figure 13 Calibration result

7. Conclusion and future work

The distributed system setup enables the tracking system to handle multiple-camera, multiple-object tracking simultaneously. The use of a server machine ensures that the hardware reliability meets manufacturing standards and can run 24/7. The iterative software structure simplifies the computational complexity and provides more robustness with proper constraint settings. The camera calibration process provides the ability to transform pixel values and image coordinates into more meaningful engineering units, which can be fed back to engineers to determine how to optimize the manufacturing line. The image processing techniques included in this system provide accurate and robust image analysis results. Moreover, human-machine interaction is introduced to this tracking system so that operators can provide the necessary support to the tracking algorithm, further improving the reliability of the automation process. Because the image processing is offloaded to a server with higher computation power, the image acquisition and analysis time is much shorter than the object's movement. This system can be used for edge tracking or other rigid-object tracking. The hardware setup also provides the ability for remote access and control in future work, since Ethernet communication has been introduced. Above all, this system was developed with a focus on industrial applications; therefore, reliability is the most important issue, while accuracy has a higher tolerance than in research applications. From the experimental results, this visual tracking system takes less than 0.5 second to finish the analysis of a single frame and produce information. For single-camera tracking, the processor usage on the server machine is less than 20%. Note that multiple tracking modules can work at the same time to accomplish multiple-camera tracking without affecting the total processing time.

Future work on this project could include a backup method for edge detection, such as template matching, to guarantee edge tracking at all times. This effort must stay within the available hardware resources, since template matching is also a computationally demanding algorithm. An image understanding process that can automatically determine the image's characteristics would also help complete the automation of this system. With an image understanding process and a high-level decision rule, which could determine under which circumstances each tracking method is preferred or has the higher probability of the best performance, this system could be pushed to the limit of its best possible performance.
As mentioned in previous sections, the camera setup can also be altered to form a stereo vision system, avoiding the CAD model for camera calibration and reducing the calibration error. This system only records a small amount of historical data to support the estimation procedure. A database system could also be introduced to interface with this tracking system and record all historical data, which could be used either to provide a data log for the manufacturer or for statistical model analysis to improve the manufacturing process. Adding a database to the system also includes the option of interfacing with other manufacturing processes. This tracking system is a base model that can still be improved in many ways to achieve the goal of industrial manufacturing process automation and optimization.

REFERENCES

[1] Heping Chen, William Eakins (2009). Robotic Wheel Loading Process in Automotive Manufacturing Automation. IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] Kemal Berk Yesin, Bradley J. Nelson (2004). Robust CAD Model Based Visual Tracking for 3D Microassembly Using Image Space Potentials. IEEE International Conference on Robotics and Automation.
[3] Zhenbang Gong, Wei Ding and Hairong Zou (2006). Data-logging and Monitoring of Production Auto-lines Based on Visual-tracking Tech. IEEE.
[4] Hong Liu, Ying Shi (2009). Robust Visual Tracking Based On Selective Attention Shift. IEEE Multi-conference on Systems and Control.
[5] Masaaki Shibata and Nobuaki Kobayashi (2006). Image-based Visual Tracking for Moving Targets with Active Stereo Vision Robot. SICE-ICASE International Joint Conference.
[6] A. Aouf, H. Rajabi, N. Rajabi, H. Alanbari and C. Perron (2004). Visual Object Tracking by a Camera Mounted on a 6DOF Industrial Robot. IEEE Conference on Robotics, Automation and Mechatronics.
[7] Tae Hyoung Park and Beom Hee Lee (1997). Dynamic Tracking Line: Feasible Tracking Region of a Robot in Conveyor Systems. IEEE Transactions on Systems, Man, and Cybernetics, vol. 27, pp. 1022-1030.
[8] N. J. Ferrier (1998). Performance of Visual Tracking Systems: Implications for Visual Controlled Motion. Conference on Decision & Control.
[9] Larry S. Davis (1975). A Survey of Edge Detection Techniques. Computer Graphics and Image Processing, 4, 248-270.
[10] Rafael Gonzalez, Richard Woods (2002). Digital Image Processing. Prentice Hall.
[11] Alexander Borst (2007). Correlation versus Gradient Type Motion Detectors: the Pros and Cons. Phil. Trans. R. Soc. B 362, 369-374.
[12] Naoyuki Sawasaki, Toshihiko Morita, Takashi Uchiyama (1996). Design and Implementation of High-speed Visual Tracking System for Real-Time Motion Analysis. International Conference on Pattern Recognition.
[13] Linda Shapiro, George Stockman (2001). Computer Vision. Prentice Hall.
[14] Kazuhiko Kawamoto (2008). Adaptive Sampling for Bayesian Visual Tracking.
[15] P. F. McLauchlan, J. Malik (1997). Vision for Longitudinal Vehicle Control. ITSC 97, IEEE Conference, Nov. 1997, pp. 918-923.
[16] M. P. Groover, M. Weiss, R. Nagel, and N. G. Odrey (1986). Industrial Robotics: Technology, Programming and Applications. McGraw-Hill.
[17] T. H. Park and B. H. Lee. An Approach to Robot Motion Analysis and Planning for Conveyor Tracking. IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 378-384.
[18] Jung Uk Cho, Seung Hun Jin, Xuan Dai Pham (2007). FPGA-Based Real-Time Visual Tracking System Using Adaptive Color Histograms. IEEE International Conference on Robotics and Biomimetics.
[19] G. R. Bradski (1998). Real Time Face and Object Tracking as a Component of a Perceptual User Interface. IEEE Workshop on Applications of Computer Vision, Princeton, pp. 214-219.
[20] K. Nummiaro, E. Koller-Meier, L. Van Gool (2002). A Color-Based Particle Filter. Generative-Model-Based Vision, pp. 53-60.
[21] S. Fleck, W. Straßer (2005). Adaptive Probabilistic Tracking Embedded in a Smart Camera. IEEE Computer Vision and Pattern Recognition, Vol. 3, pp. 134-142.
[22] Andrew W. B. Smith, Brian C. Lovell (2005). Measurement Function Design for Visual Tracking Applications. International Conference on Pattern Recognition.
[23] T. F. Cootes, C. J. Taylor (1992). Active Shape Models - 'Smart Snakes'. British Machine Vision Conference.
[24] A. Blake, R. Curwen, A. Zisserman (1993). A Framework for Spatio-temporal Control in the Tracking of Visual Contours. International Journal of Computer Vision.
[25] P. Beardsley, A. Zisserman (1995). Affine Structure from Motion. International Conference on Computer Vision.
[26] O. Faugeras, F. Lustman, G. Toscani (1987). Motion and Structure from Motion from Point and Line Matches. Conference on Computer Vision.
[27] D. Koller, D. Danilidis, H.-H. Nagel (1993). Model-based Object Tracking in Monocular Image Sequences of Road Traffic Scenes. International Journal of Computer Vision, 10(3):257-281.
[28] Bijoy K. Ghosh, Di Xiao (1997). Sensor-Guided Manipulation in a Manufacturing Workcell. International Conference on Intelligent Robots and Systems, pp. 1403-1407.
[29] Hesheng Wang, Yun-Hui Liu (2006). Uncalibrated Visual Tracking Control without Visual Velocity. IEEE International Conference on Robotics and Automation.
[30] A. Astolfi, L. Hsu, M. Netto, R. Ortega (2002). Two Solutions to the Adaptive Visual Servoing Problem. IEEE Transactions on Robotics and Automation, vol. 18, no. 3, pp. 387-392.