APPLIED PROBLEMS
3D Reconstruction for a Scanning Electron Microscope

A. A. Zolotukhin, I. V. Safonov, and K. A. Kryzhanovskii

National Research Nuclear University MEPhI, Kashirskoe shosse 31, Moscow, 115409 Russia

email: [email protected], [email protected], [email protected]

Abstract—An algorithm for the reconstruction of the 3D shape of the surface of a microobject from a stereo pair of images obtained on a raster electron microscope (REM) is considered. A model of image formation in REM is presented. The SIFT algorithm is used to determine the correspondence points, which are then used to determine the mutual position of the object in the stereo pair by the RANSAC method. A set of points is created in 3D space, which is later interpolated to reconstruct the 3D surface.

Keywords: stereo reconstruction, stereo pair, microobject, shape reconstruction, REM.

DOI: 10.1134/S105466181301015X
INTRODUCTION

The reconstruction of a 3D model of a microobject makes it possible to analyze its properties; for example, properties such as the roughness or the corrosion extent of the surface can be estimated from the 3D model. A 3D model can be obtained on a special device (e.g., an atomic-force microscope, AFM); however, such solutions have some physical limitations. In this work, we propose a method of reconstruction of the approximate 3D shape of an object from a stereo pair of images obtained on a raster electron microscope (REM). An example of such a stereo pair is shown in Fig. 1. Algorithms of reconstruction on the basis of disparity, such as [3] and [6], require a linear shift without rotation of the objects or the camera; a focal length is also necessary for the transition from the shift of separate objects to the depth. This makes the use of such methods for REM images impossible without additional processing. Other reconstruction methods are based on calculation of the essential matrix and the mutual location of the camera in the stereo pair; they produce a set of points in 3D space, which must then be processed to compose a 3D object. The proposed algorithm receives the two images of the stereo pair at the input, finds the correspondence points, calculates the mutual location of the images in the stereo pair, and builds the 3D surface. The algorithm is applicable for the reconstruction of surfaces completely visible in the images; invisible (occluded by other objects) surface fragments will not be reconstructed. To obtain better results, the images in the stereo pair should be taken from different camera angles; in REM, this is achieved by changing the inclination angle of the object.

A MODEL OF THE REM IMAGES

Most of the known reconstruction algorithms are aimed at processing images obtained with a common photo camera. Such a photo camera (Fig. 2) consists of a lens and a matrix (CCD or CMOS); all rays connecting the points of a real object and their projections on the image pass through the focus. A raster electron microscope [4] (Fig. 3) has a large magnification coefficient (up to 500 000×). The electron beam is emitted and deflected before the collision with the object surface. The collision leads to the reflection of the electron beam and also to the secondary emission of electrons. The reflected electrons are registered by one or several detectors.

In REM, the beams do not meet at the measurement point as occurs in an ordinary digital photo camera; however, the beams do meet at one point (the deflection system). This makes it possible to adapt the algorithms intended for conventional photo cameras to REM images. A wide range of possible magnification coefficients, the possibility to vary the resolution, and the necessity to focus the beam on the surface are features of REM.
Received March 14, 2011
Fig. 1. Example of a stereo pair obtained by means of REM.
ISSN 10546618, Pattern Recognition and Image Analysis, 2013, Vol. 23, No. 1, pp. 168–174. © Pleiades Publishing, Ltd., 2013.
Fig. 2. Scheme of a digital photo camera.
Fig. 3. Scheme of a raster electron microscope.
Figure 4 shows the scheme of obtaining an REM image. The working distance WD is tuned. The optical OZ axis coincides with the direction of the nondeflected electron beam. In contrast to the digital photo camera, the position of the projection plane in REM is not determined; it is necessary to use a virtual plane located perpendicular to the OZ axis. Let us consider a pinhole camera model: the image plane is placed on the opposite side of the object with respect to the origin (in the negative direction of the OZ axis). The motion of the plane along the axis affects only the scale; to simplify the calculations, it is reasonable to take the focal length as

f = WD. (1)

Then the scale of the image is 1:1. At large magnification, the working distance WD exceeds the maximum difference of the depth of the object points by several (2–3) orders of magnitude due to the small depth of focus of the REM beam. Therefore, in the pinhole camera formula

(x, y)^T = (f/Z)·(X, Y)^T, (2)

where x, y are the coordinates of a point in the image, X, Y, Z are the coordinates of the point in space, and f is the focal length, the quantity f/Z can be considered a constant, and the camera is normalized (calibrated). This assumption makes it possible to eliminate errors due to floating-point operations on numbers that differ by several orders of magnitude, as well as in cases when the accuracy of the determination of the coordinates of points is limited by the resolution of the digital image in the stereo pair. The scales of the X–Y and Z coordinates in the reconstruction will then not coincide, and for the coordinate Z it is necessary to determine a correction coefficient by comparing depth data in the reconstruction with an estimate of the depth by another method (e.g., with the help of the REM focus depth).

Images in the stereo pair should be obtained from different camera angles. Such a mutual location of the images in the stereo pair leads to the displacement of the object points depending on their distance to the optical center (depth). This displacement of object fragments is the only source of information for the 3D reconstruction. The REM sample stage makes it possible to move or incline the object; in contrast to common photo cameras, inclination of the object is necessary for the 3D reconstruction of REM images, since motion without inclination moves all fragments in the same direction by the same distance, i.e., it produces no parallax.

Fig. 4. Optical REM model.

CORRESPONDENCE POINTS

The correspondence points are determined at the first stage of the algorithm. Characteristic points are searched for in each image with the help of the SIFT (Scale-Invariant Feature Transform) algorithm [1, 2]. SIFT creates a Gaussian pyramid, i.e., a sequence of images smoothed with successively increasing σ values. Neighboring images in the sequence are subtracted to create the Difference of Gaussians. Extrema of the Differences of Gaussians are used as the basis for SIFT points. These characteristic points are filtered, leaving only unique points with highly expressed properties: low-contrast points and those on object boundaries are deleted.
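The Difference-of-Gaussians construction described above can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation (real SIFT operates on 2-D images over several octaves); the function names and the truncation radius are illustrative assumptions:

```python
import math

def gaussian_kernel(sigma):
    """Discrete 1-D Gaussian kernel, truncated at radius 3*sigma."""
    r = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - r, 0), n - 1)  # clamp at the signal borders
            acc += w * signal[idx]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigmas):
    """Smooth with successively increasing sigma and subtract neighboring levels."""
    levels = [smooth(signal, s) for s in sigmas]
    return [
        [b - a for a, b in zip(levels[i], levels[i + 1])]
        for i in range(len(levels) - 1)
    ]
```

For an impulse-like feature, the strongest DoG response appears at the feature position, which is what makes the DoG extrema usable as candidate key points.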
Fig. 5. Block diagram of the algorithm for determining the SIFT key points.

Each SIFT feature point has a descriptor, a 128-dimensional vector calculated from the gradients in the vicinity of the point. The SIFT points are invariant to rotation, since they are oriented according to the main gradients at their position. A pair of feature points is considered a correspondence pair if the Euclidean distance between their descriptors is less than the threshold t = 0.5 and the points in the pair belong to different images. A list of the correspondence pairs is compiled after obtaining the SIFT feature points in both images. The threshold value t was chosen as a result of a large number of experiments on images with 1000–2000 SIFT feature points: reducing the threshold lowers the detail of the reconstruction, while increasing it raises the number of "false" correspondence pairs to filter, which in turn requires more iterations of the RANSAC method. This leads to an increase in the execution time of the reconstruction algorithm by several orders of magnitude at only a slight increase in the fidelity of the 3D object shape.

A k–d tree is often used along with SIFT to search for the correspondence pairs. This method is much faster than exhaustive enumeration of all possible pairs. The k–d tree makes it possible to find, for a characteristic point, a corresponding pair with a Euclidean distance no larger than a given one. When a point has several correspondence pairs, the first one found is chosen. Such a method works well for problems of determining the mutual location of images, object recognition, etc., where it is sufficient to have several correspondence pairs with the maximum similarity of the characteristic points. However, it was discovered while testing the program that the use of the k–d tree lowers the reconstruction fidelity: only one correspondence pair is checked when a characteristic point in one image ambiguously corresponds to several points in the other image.
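The exhaustive matching strategy described above (brute force rather than a k–d tree) can be sketched as follows. The function name is a hypothetical helper, and the toy descriptors in the usage are 4-dimensional for readability; real SIFT descriptors are 128-dimensional and normalized, which is what makes an absolute threshold such as t = 0.5 meaningful:

```python
import math

def match_descriptors(desc1, desc2, t=0.5):
    """Brute-force matching: keep every pair of descriptors from the two
    images whose Euclidean distance is below the threshold t."""
    pairs = []
    for i, d1 in enumerate(desc1):
        for j, d2 in enumerate(desc2):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))
            if dist < t:
                pairs.append((i, j, dist))
    return pairs
```

Unlike a k–d tree query, this enumeration keeps every pair under the threshold, so an ambiguous point contributes all of its candidate correspondences and the later RANSAC filtration can decide among them.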
FUNDAMENTAL MATRIX

To build the 3D model of the object, it is necessary to determine the mutual location of the images. This can be performed manually by entering the rotation and parallel-transfer parameters; however, the rotation axes must then be known exactly, which can turn out to be nontrivial, since the REM, as a rule, does not report this information and only the rotation angle is known. Another problem arises when some correspondence pairs are "false": e.g., if a large number of similar objects are present in the image, some of them can be erroneously determined as corresponding ones. Such images are typical for micro- and nanoelectronics.

The algorithm makes it possible to calculate the mutual position of the images simultaneously with the filtration of the correspondence pairs whose existence is impossible in the given optical configuration. The RANSAC (RANdom SAmple Consensus) method [5] is used to calculate the fundamental matrix using the 8-point algorithm [10] for eight randomly chosen correspondence pairs.

To increase the accuracy of the 8-point algorithm and decrease the error, all feature points, before the calculation of the fundamental matrix, are transformed so that their common center of mass lies at the origin of coordinates (linear shift) and the average distance to this center of mass is √2:

(x', y', 1)^T = T·(x, y, 1)^T, (3)

    | s  0  −s·x_c |
T = | 0  s  −s·y_c |
    | 0  0    1    |,
where x, y are the coordinates of a feature point in the image, x', y' are the modified coordinates, x_c, y_c are the coordinates of the center of mass, and T is the matrix of the coordinate transformation. The scaling coefficient s is calculated according to the formula

s = √2/R, (4)

where R is the average distance of the points to the center of mass. After the fundamental matrix F' is calculated, it is transformed back to correspond to the initial coordinates of the feature points:

F = T_2^T F' T_1, (5)

where F' is the fundamental matrix calculated by the RANSAC method, F is the final fundamental matrix, and T_1 and T_2 are the matrices of the coordinate transformation for the first and second images, respectively.
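The normalization of eqs. (3)–(4) can be sketched as follows; `normalize_points` is a hypothetical helper name, and the scale is chosen so that the average distance of the points to their centroid becomes √2:

```python
import math

def normalize_points(points):
    """Translate 2-D points so their centroid is at the origin and scale
    them so the average distance to the centroid is sqrt(2); returns the
    normalized points and the 3x3 transformation matrix T of eq. (3)."""
    n = len(points)
    xc = sum(p[0] for p in points) / n
    yc = sum(p[1] for p in points) / n
    # R: average distance of the points to the center of mass, eq. (4)
    R = sum(math.hypot(p[0] - xc, p[1] - yc) for p in points) / n
    s = math.sqrt(2) / R
    T = [[s, 0.0, -s * xc],
         [0.0, s, -s * yc],
         [0.0, 0.0, 1.0]]
    normalized = [(s * (p[0] - xc), s * (p[1] - yc)) for p in points]
    return normalized, T
```

This conditioning keeps the entries of the linear system in the 8-point algorithm at a comparable magnitude, which is exactly the floating-point concern the text raises.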
The formula below is used to calculate the error D_f showing whether or not a correspondence pair can exist for the calculated fundamental matrix:

D_f = (x_2^T F x_1)^2 / ((F x_1)_x^2 + (F x_1)_y^2) + (x_2^T F x_1)^2 / ((F^T x_2)_x^2 + (F^T x_2)_y^2), (6)

where F is the fundamental matrix, x_1 is the homogeneous coordinate of a point in the first image, and x_2 is the coordinate of the corresponding point in the second image. This formula expresses the distance from the points to the epipolar lines determined by the fundamental matrix, and during testing it provided the most stable and reliable results. A correspondence pair with an error D_f > 10^−18 is discarded. The use of 10^−18 as the threshold value made it possible during testing to filter out the majority of the "false" pairs and at the same time to keep the majority of the correct correspondence pairs.

RANSAC continues to choose eight random points, calculate the fundamental matrix, and filter points for k = 10000 iterations. As a result, it chooses the fundamental matrix with the largest number of points that passed the filtration. The value of k was chosen as a compromise between speed and reliability. In contrast to the traditional RANSAC algorithm, the fundamental matrix is not recalculated over all points kept at a given iteration: one "false" correspondence pair is sufficient to considerably distort the fundamental matrix, and such pairs cannot always be determined absolutely accurately for deletion. The comparison criterion for the set of points chosen during an iteration is their number, not the total error.

Fig. 6. Modification of the RANSAC method.
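The error of eq. (6) can be sketched as follows, using plain 3-element tuples for homogeneous points. The function name is a hypothetical helper, and the sketch assumes the epipolar lines are well defined (nonzero denominators):

```python
def epipolar_error(F, x1, x2):
    """Symmetric epipolar distance D_f of eq. (6) for homogeneous image
    points x1 = (x, y, 1) and x2 = (x', y', 1); small values mean the
    pair is consistent with the fundamental matrix F."""
    # F @ x1 is the epipolar line of x1 in the second image
    Fx1 = [sum(F[i][k] * x1[k] for k in range(3)) for i in range(3)]
    # F^T @ x2 is the epipolar line of x2 in the first image
    Ftx2 = [sum(F[k][i] * x2[k] for k in range(3)) for i in range(3)]
    num = sum(x2[i] * Fx1[i] for i in range(3)) ** 2  # (x2^T F x1)^2
    return num / (Fx1[0] ** 2 + Fx1[1] ** 2) + num / (Ftx2[0] ** 2 + Ftx2[1] ** 2)
```

As a sanity check under an assumed idealized geometry (a rectified pair shifted purely horizontally, F = [[0, 0, 0], [0, 0, −1], [0, 1, 0]]), a correct correspondence gives D_f ≈ 0, while any vertical mismatch gives a strictly positive error.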
The calculated fundamental matrix makes it possible to transform a point in one image into an epipolar line in the other image. The conclusion made in the section "A Model of the REM Images" about the normalization of the camera makes it possible to set the camera matrix K equal to the 3 × 3 unit matrix in the formula of the transition from the fundamental matrix F to the essential matrix E:

E = K^T F K, (7)
obtaining the expression

E = F. (8)

By passing from the fundamental matrix F to the essential matrix E, it is possible to calculate the rotation matrix R and the translation matrix T, which determine the mutual location of the images of the stereo pair.

When the projection is close to parallel (the working distance is several orders of magnitude larger than the linear sizes of the object fragments visible in the image), the upper-left 2 × 2 submatrix of F is zero and the fundamental matrix takes the form

    | 0  0  a |
F = | 0  0  b |, (9)
    | c  d  e |

where a, b, c, d, and e are nonzero numbers [8]. Since the fundamental matrix is normalized, due to calculation error (the impossibility of obtaining an exact zero) it is sufficient that the absolute values of the elements of the 2 × 2 submatrix of F be less than 10^−15 (the limit of floating-point numbers of type double). In a projection close to parallel, the epipolar lines are also parallel. In this case, the rotation matrix R and the translation matrix T, which determine the mutual location of the images in the stereo pair, may be calculated from the fundamental matrix without the transition to the essential matrix.

Fig. 7. Block diagram of the reconstruction process.

TRIANGULATION OF POINTS

After calculation of the rotation and translation matrices and the set of filtered points, the points are triangulated by the linear method [9]. The following projection matrices are used for the first and second images, respectively:

P = [I | 0],  P' = [R | T]. (10)

The reconstruction is performed by solving the equation AX = 0 via the singular value decomposition of the matrix

    | x·P_3 − P_1    |
A = | y·P_3 − P_2    |, (11)
    | x'·P'_3 − P'_1 |
    | y'·P'_3 − P'_2 |
where X is the vector of homogeneous 3D coordinates of the point, (x, y) and (x', y') are the coordinates of the point in the first and second images, respectively, and P_i is the ith row of the projection matrix P.

Before triangulation, the points are temporarily shifted away from the origin of coordinates, since values close to zero introduce distortions. The shift is (10·x_max, 10·y_max), where x_max and y_max are the maximum x and y coordinates in the set of points, respectively.

Triangulation is performed for each point belonging to the first image inside a correspondence pair. The coordinates (x, y) are already known; the coordinate z is triangulated, giving a set of points in 3D space at the output. This set of points contains data that may be interpolated or used as reference material in an additional 3D reconstruction algorithm.

RECONSTRUCTION OF THE 3D SURFACE

The set of points in 3D space is interpolated to obtain the 3D surface. The set is preliminarily filtered to delete peaks produced by false correspondence pairs that were not removed by the filters at the earlier stages of the algorithm. The algorithm of [7] was chosen for the interpolation. It is intended for the compilation of 3D maps of a surface and transforms a cloud of points in 3D space into a set of triangular polygons whose vertices correspond to points of the initial data set.

A part of the "false" coincidences may pass through the RANSAC filtration and appear in the resulting surface in the form of vertices with a sharp surge of height, i.e., "peaks." These "peaks" may be determined from the following condition: the distances along the coordinate Z from a certain point A to all
other points of the polygons (B1, C1, …, Bn, Cn) to which it belongs are l = 10 times larger than the distances, projected on the XY plane, between the points B1–C1, …, Bn–Cn. In this case, the median of the Z coordinates of all points of the polygons that contain A is used as the Z coordinate of A. Each point is tested in this manner, and the test is repeated iteratively 10 times or until no "peaks" are detected after the next pass of the peak filter. This method deletes separate "peaks" well, while it does not affect hollows whose existence is confirmed by several points at the bottom of such a hollow.

After the formation of the 3D surface, it is displayed on the screen with the help of the viewing software. The obtained 3D shape may be saved as a screenshot, a text file with a set of points in space, a text file with a set of polygons in space, or a 3D model in the COLLADA format for subsequent import into a 3D editor.

The surface has relative, not absolute, coordinates. Data are correct only for the polygon vertices; the other points are interpolated. The scale of the coordinate z relative to the x–y plane can be set in the program interface; otherwise (to fit the object on the screen as a whole), the coordinates are normalized so that the coordinates in the x–y plane lie within [–10; 10] and the coordinate z within [–3; 0]; the scale of the coordinates x and y is always the same.

The block diagram of the total reconstruction process is shown in Fig. 7. The final result of the reconstruction of the stereo pair in Fig. 1 is shown in Fig. 8.

Fig. 8. Surface reconstruction, view from different camera angles.
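The peak filter described in the previous section can be sketched as follows. This is one possible reading of the condition above (a vertex is a "peak" when its height differs from every polygon neighbor by more than l times the XY distance to that neighbor), and the function and argument names are hypothetical:

```python
import math
import statistics

def filter_peaks(points, neighbors, l=10.0, max_iter=10):
    """Median peak filter sketch.
    points: dict id -> (x, y, z); neighbors: dict id -> set of ids of
    vertices sharing a polygon with it. A vertex is a peak if its height
    differs from every neighbor by more than l times the XY distance to
    that neighbor; peaks get the median height of their neighbors."""
    pts = dict(points)
    for _ in range(max_iter):
        peaks = []
        for a, (xa, ya, za) in pts.items():
            nbrs = neighbors[a]
            if nbrs and all(
                abs(za - pts[b][2]) > l * math.hypot(xa - pts[b][0], ya - pts[b][1])
                for b in nbrs
            ):
                peaks.append(a)
        if not peaks:
            break  # no peaks left: stop before reaching max_iter
        for a in peaks:
            xa, ya, _ = pts[a]
            pts[a] = (xa, ya, statistics.median(pts[b][2] for b in neighbors[a]))
    return pts
```

Because the condition must hold against every neighbor, a hollow supported by several low points never triggers it, matching the behavior described in the text.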
CONCLUSIONS

An algorithm has been elaborated that makes it possible to build a 3D model of a microobject on the basis of a stereo pair of its images obtained on a raster electron microscope (REM). The images should be obtained from two camera angles provided by the inclination of the sample stage; the inclination angle is obtained from the REM automatically. Parallax in the stereo pair is needed for correct functioning of the algorithm.

Software has been implemented that makes it possible to load the images of the stereo pair, to calculate the 3D shape of the object in the automated mode, and to view it and save it in a file. The 3D shape of the object is reconstructed from the key points found with the help of the SIFT algorithm. Then the fundamental matrix is calculated by the RANSAC method and the points are filtered. The algorithm is adapted for work with the optical system of the REM at large magnification coefficients (more than 500×), and it can use both parallel and perspective projection modes. The false coincidences of the SIFT points that were not deleted by the RANSAC method are additionally filtered. Only the part of the object seen in both images is reconstructed. It is necessary to set the scale on the x–y axes (image plane) and z in order to obtain the physical dimensions of the object.

The run time of the algorithm for an image with a size of 1024 × 884 on a test system (Intel Core i5 M460 2.53 GHz, 2 GB RAM, Windows 7) was from 19 s to 1 min 19 s and depends on the number of correspondence pairs of the key SIFT points. At present we are working on increasing the fidelity of the algorithm and on reducing the processing time.

REFERENCES

1. D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int. J. Comput. Vision 60 (2), 91–110 (2004).
2. D. G. Lowe, "Object Recognition from Local Scale-Invariant Features," in Proc. Int. Conf. on Computer Vision (Kerkyra, 1999).
3. J. Corso, D. Burschka, and G. Hager, "Direct Plane Tracking in Stereo Images for Mobile Navigation," in Proc. Int. Conf. on Robotics and Automation ICRA'2003 (Taipei, 2003).
4. J. I. Goldstein, Scanning Electron Microscopy and X-Ray Microanalysis (Plenum Press, 1981).
5. M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. ACM 24, 381–395 (1981).
6. Mi-Hyun Kim and Kwang-Hoon Sohn, "Edge-Preserving Directional Regularization Technique for Disparity Estimation of Stereoscopic Images," IEEE Trans. Consumer Electron. (1999).
7. P. Bourke, "Efficient Triangulation Algorithm Suitable for Terrain Modelling," in Proc. Pan Pacific Computer Conf. (Beijing, 1989).
8. O. Faugeras and Quang-Tuan Luong, The Geometry of Multiple Images (MIT Press, 2001).
9. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision (Cambridge Univ. Press, 2003).
10. Yi Ma, S. Soatto, J. Košecká, and S. Sastry, An Invitation to 3-D Vision (Springer, 2005).
Dmitry A. Zolotukhin. Received his MS (Specialist) degree in cybernetics from Moscow Engineering Physics Institute (MEPhI), Russia, in 2009. Since 2009 he has been working on his PhD thesis at National Research Nuclear University MEPhI. Research areas: pattern recognition, feature matching, reconstruction from stereoscopic images.

Ilia V. Safonov. Received his MS degree in automatic and electronic engineering from Moscow Engineering Physics Institute (MEPhI), Russia, in 1994 and his PhD degree in computer science from MEPhI in 1997. Since 1998 he has been an associate professor in the Faculty of Cybernetics of MEPhI, conducting research in image segmentation, feature extraction, and pattern recognition problems. In 2004, he joined Samsung Research Center, Moscow, Russia, where he is engaged in photo, video, and document image enhancement projects.

Konstantin A. Kryzhanovskii. Received his MS degree in cybernetics from Moscow Engineering Physics Institute (MEPhI), Russia, in 2000. Since 2009 he has worked as an instructor in the Faculty of Cybernetics of MEPhI. Research areas: image enhancement and segmentation, feature extraction, and pattern recognition problems.