Quo vadis face recognition?

Ralph Gross, Jianbo Shi
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
{rgross,jshi}@cs.cmu.edu

Jeffrey F. Cohn
Department of Psychology
University of Pittsburgh
Pittsburgh, PA 15260

Abstract

Within the past decade, major advances have occurred in face recognition. With few exceptions, however, most research has been limited to training and testing on frontal views. Little is known about the extent to which face pose, illumination, expression, occlusion, and individual differences, such as those associated with gender, influence recognition accuracy. We systematically varied these factors to test the performance of two leading algorithms, one template based and the other feature based. Image data consisted of over 21000 images from 3 publicly available databases: CMU PIE, Cohn-Kanade, and AR databases. In general, both algorithms were robust to variation in illumination and expression. Recognition accuracy was highly sensitive to variation in pose. For frontal training images, performance was attenuated beginning at about 15 degrees. Beyond about 30 degrees, performance became unacceptable. For non-frontal training images, fall off was more severe. Small but consistent differences were found for individual differences in subjects. These findings suggest direction for future research, including design of experiments and data collection.

1. Introduction

Is face recognition a solved problem? Over the last 30 years face recognition has become one of the best studied pattern recognition problems with a nearly intractable number of publications. Many of the algorithms have demonstrated excellent recognition results, often with error rates of less than 10 percent. These successes have led to the development of a number of commercial face recognition systems. Most of the current face recognition algorithms can be categorized into two classes, image template based or geometry feature-based.
The template based methods [1] compute the correlation between a face and one or more model templates to estimate the face identity. Statistical tools such as Support Vector Machines (SVM) [30, 21], Linear Discriminant Analysis (LDA) [2], Principal Component Analysis (PCA) [27, 29, 11], Kernel Methods [25, 17], and Neural Networks [24, 7, 12, 16] have been used to construct a suitable set of face templates. While these templates can be viewed as features, they mostly capture global features of the face images. Facial occlusion is often difficult to handle in these approaches.

The geometry feature-based methods analyze explicit local facial features and their geometric relationships. Cootes et al. presented an active shape model in [15], extending the approach by Yuille [34]. Wiskott et al. developed an elastic bunch graph matching algorithm for face recognition in [33]. Penev et al. [22] developed PCA into Local Feature Analysis (LFA). This technique is the basis for one of the most successful commercial face recognition systems, FaceIt.

Most face recognition algorithms focus on frontal facial views. However, pose changes can often lead to large non-linear variation in facial appearance due to self-occlusion and self-shading. To address this issue, Moghaddam and Pentland [20] presented a Bayesian approach using PCA as a probability density estimation tool. Li et al. [17] have developed a view-based piece-wise SVM model for face recognition. In the feature based approach, Cootes et al. [5] proposed a 3D active appearance model to explicitly compute the face pose variation. Vetter et al. [32, 31] learn a 3D geometry-appearance model for face registration and matching. However, today the exact trade-offs and limitations of these algorithms are relatively unknown.

To evaluate the performance of these algorithms, Phillips et al. have conducted the FERET face algorithm tests [23], based on the FERET database which now contains 14,126 images from 1,199 individuals.
More recently the Facial Recognition Vendor Test [3] evaluated commercial systems using the FERET and HumanID databases. The test results have revealed that important progress has been made in face recognition, and many aspects of the face recognition problem are now well understood. However, there still remains a gap between these testing results and practical user experiences of commercial systems. While this gap can, and will, be narrowed through improvements in practical details such as sensor resolution and view selection, we would like to understand clearly the fundamental capabilities and limitations of current face recognition systems.

In this paper, we conduct a series of tests using two state-of-the-art face recognition systems on three newly constructed face databases to evaluate the effect of face pose, illumination, facial expression, occlusion and subject gender on face recognition performance.

The paper is organized as follows. We describe the three databases used in our evaluation in Section 2. In Section 3 we introduce the two algorithms we used for our evaluations. The experimental procedures and results are presented in Section 4, and we conclude in Section 5.

2. Description of Databases

2.1. Overview

Table 1 gives an overview of the databases used in our evaluation.

                 CMU PIE   Cohn-Kanade   AR DB
Subjects            68         105        116
Poses               13           1          1
Illuminations       43           3          3
Expressions          3           6          3
Occlusion            0           0          2
Sessions             1           1          2

Table 1. Overview of the databases.

2.2. CMU Pose Illumination Expression (PIE) database

The CMU PIE database contains a total of 41,368 images taken from 68 individuals [26]. The subjects were imaged in the CMU 3D Room [14] using a set of 13 synchronized high-quality color cameras and 21 flashes. The resulting images are 640x480 in size, with 24-bit color resolution. The cameras and flashes are distributed in a hemisphere in front of the subject as shown in Figure 1.

A series of images of a subject across the different poses is shown in Figure 2.
Each subject was recorded under 4 conditions:

1. expression: the subjects were asked to display a neutral face, to smile, and to close their eyes in order to simulate a blink. The images of all 13 cameras are available in the database.
2. illumination 1: 21 flashes were individually turned on in rapid sequence. In this first setting the images were captured with the room lights on. Each camera recorded 24 images: 2 with no flashes, 21 with one flash firing, and then a final image with no flashes. Only the output of three cameras (frontal, three-quarter and profile view) was kept.
3. illumination 2: the procedure for illumination 1 was repeated with the room lights off. The output of all 13 cameras was retained in the database. Combining the two illumination settings, a total of 43 different illumination conditions were recorded.
4. talking: subjects counted starting at 1. Two seconds (60 frames) of them talking were recorded using 3 cameras as above (again frontal, three-quarter and profile view).

Figure 3 shows examples for illumination conditions 1 and 2.

2.3. Cohn-Kanade AU-Coded Facial Expression Database

This is a publicly available database from Carnegie Mellon University [13]. It contains image sequences of facial expression from men and women of varying ethnic backgrounds. The camera orientation is frontal. Small head motion is present. Image size is 640 by 480 pixels with 8-bit gray scale resolution. There are three variations in lighting: ambient lighting, a single high-intensity lamp, and dual high-intensity lamps with reflective umbrellas. Facial expressions are coded using the Facial Action Coding System [8] and also assigned emotion-specified labels. For the current study, we selected 714 image sequences from 105 subjects. Emotion expressions included happy, surprise, anger, disgust, fear, and sadness. Examples for the different expressions are shown in Figure 4.

2.4. AR Face Database

The publicly available AR database was collected at the Computer Vision Center in Barcelona [19].
It contains images of 116 individuals (63 males and 53 females). The images are 768x576 pixels in size with 24-bit color resolution. The subjects were recorded twice at a 2-week interval. During each session 13 conditions with varying facial expressions, illumination and occlusion were captured. Figure 5 shows an example for each condition.

Figure 1. PIE database camera positions. (a) 13 synchronized video cameras capture face images from multiple angles; 21 controlled flash units are evenly distributed around the cameras. (b) A plot of the azimuth ($\alpha$) and altitude ($\beta$) angles of the cameras, along with the camera ID number. 9 of the 13 cameras sample a half circle at roughly head height ranging from a full left to a full right profile view (+/-60 degrees); 2 cameras were placed above and below the central camera; and 2 cameras were positioned in the corners of the room.

Figure 2. Pose variation in the PIE database. 8 of 13 camera views are shown here. The remaining 5 camera poses are symmetrical to the right side of camera c27.

Figure 3. Illumination variation in the PIE database. The images in the first row show faces recorded with room lights on, the images in the second row show faces captured with only flash illumination.

3. Face Recognition Algorithms

3.1. MIT, Bayesian Eigenface

Moghaddam et al. generalize the Principal Component Analysis (PCA) approach of Sirovich and Kirby [28] and Turk and Pentland [29] by examining the probability distribution of intra-personal variations in appearance of the same individual and extra-personal variations in appearance due to difference in identity. This algorithm performed consistently near the top in the 1996 FERET test [23].

Given two face images $I_1$ and $I_2$, let $\Delta = I_1 - I_2$ be the image intensity difference between them. We would like to estimate the posterior probability $P(\Omega_i \mid \Delta)$, where $\Omega_i$ is the intra-personal variation of subject $i$. According to Bayes' rule, we can rewrite it as:

$$P(\Omega_i \mid \Delta) = \frac{P(\Delta \mid \Omega_i)\,P(\Omega_i)}{P(\Delta \mid \Omega_i)\,P(\Omega_i) + P(\Delta \mid \Omega_E)\,P(\Omega_E)} \qquad (1)$$

where $\Omega_E$ is the extra-personal variation of all the subjects. To estimate the probability density distributions $P(\Delta \mid \Omega_i)$ and $P(\Delta \mid \Omega_E)$, PCA is used to derive a low ($M$) dimensional approximation of the measured feature space $\Delta \in R^N$ ($N = O(10^4)$):

$$P(\Delta \mid \Omega) = \left[ \frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{M} \frac{y_i^2}{\lambda_i}\right)}{(2\pi)^{M/2} \prod_{i=1}^{M} \lambda_i^{1/2}} \right] \left[ \frac{\exp\left(-\frac{\epsilon^2(\Delta)}{2\rho}\right)}{(2\pi\rho)^{(N-M)/2}} \right] \qquad (2)$$

where $y_i$ and $\lambda_i$ are the principal components (the projections of $\Delta$ onto the eigenvectors) and the eigenvalues in the $M$-dimensional principal component space, and $\epsilon^2(\Delta)$ is the residual error.

The algorithm finds the subject class $i$ which maximizes the posterior $P(\Omega_i \mid \Delta)$. Unlike FaceIt's algorithm, this is a mostly template based classification algorithm, although some local features are implicitly encoded through the "Eigen" intra-personal and extra-personal images.

Figure 4. Cohn-Kanade AU-Coded Facial Expression database. Examples of emotion-specified expressions from image sequences.

3.2. Visionics, FaceIt

FaceIt's recognition module is based on Local Feature Analysis (LFA) [22]. This technique addresses two major problems of Principal Component Analysis. The application of PCA to a set of images yields a global representation of the image features that is not robust to variability due to localized changes in the input [10]. Furthermore the PCA representation is non-topographic, so nearby values in the feature representation do not necessarily correspond to nearby values in the input. LFA overcomes these problems by using localized image features in the form of multi-scale filters. The feature images are then encoded using PCA to obtain a compact description. According to Visionics, FaceIt is robust against variations in lighting, skin tone, eye glasses, facial expression and hair style. They furthermore claim to be able to handle pose variations of up to 35 degrees in all directions. We systematically evaluated these claims.

4. Evaluation

Following Phillips et al. [23] we distinguish between gallery and probe images. The gallery contains the images used during training of the algorithm. The algorithms are tested with the images in the probe sets. All results reported here are based on non-overlapping gallery and probe sets (with the exception of the PIE pose test). We use the closed universe model for evaluating the performance, meaning that every individual in the probe set is also present in the gallery. The algorithms were not given any further information, so we only evaluate the face recognition, not the face verification performance.

Figure 5. AR database. The conditions are: (1) neutral, (2) smile, (3) anger, (4) scream, (5) left light on, (6) right light on, (7) both lights on, (8) sun glasses, (9) sun glasses/left light, (10) sun glasses/right light, (11) scarf, (12) scarf/left light, (13) scarf/right light.

4.1. Face localization and registration

Face recognition is a two step process consisting of face detection and recognition. First, the face has to be located in the image and registered against an internal model. The result of this stage is a normalized representation of the face, to which the recognition algorithm can be applied. In order to ensure the validity of our findings in terms of face recognition accuracy, we provided both algorithms with correct locations of the left and right eyes. This was done by applying FaceIt's face finding module with a subsequent manual verification of the results. If the initial face position was incorrect, the locations of the left and right eyes were marked manually and the face finding module was rerun on the image. The face detection module became more likely to fail as departure from the frontal view increased.

4.2. Pose

Using the CMU PIE database we are in the unique position to evaluate the performance of face recognition algorithms with respect to pose variations in great detail. We exhaustively sampled the pose space by using each view in turn as gallery with the remaining views as probes. As there is only a single image per subject and camera view in the database, the gallery images are included in the probe set. Table 2 shows the complete pose confusion matrix for FaceIt. Of particular interest is the question of how far the algorithm can generalize from given gallery views.

Two things are worth noting. First, FaceIt has a reasonable generalizability for frontal gallery images: the recognition rate drops to the 70%-80% range for 45 degrees of head rotation (corresponding to camera positions 11 and 37 in Figure 1). Figure 6 shows the recognition accuracies of the different camera views for a mugshot gallery view.

Figure 6 (panel title: Gallery pose: 27). Recognition accuracies of all cameras for the mugshot gallery image. The recognition rates are plotted on the pose positions shown in Figure 1(b). The darker color in the lower portion of the graph indicates higher recognition rate. The square box marks the gallery view.

Second, for most non-frontal views (outside of the 40 degree range), face generalizability goes down drastically, even for very close by views. This can be seen in Figure 7. Here the recognition rates are shown for the two profile views as gallery images. The full set of performance graphs for all 13 gallery views is shown in Appendix A.

We then asked whether we can gain more by including multiple face poses in the gallery set. Intuitively, given multiple face poses, with correspondence between the facial features, one can have a better chance of predicting

Figure 7 (panel titles: Gallery pose: 34, Gallery pose: 22). Recognition accuracies of all cameras for the two profile poses as gallery images (cameras 34 and 22 in Figure 1(b)).
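
The pose experiment described in Section 4.2 (each camera view used in turn as gallery, closed universe, rank-1 identification) can be illustrated with a small sketch. This is not the FaceIt or Bayesian eigenface matcher: it assumes a plain nearest-neighbor match on feature vectors, and the functions `rank1_rate`, `pose_confusion_matrix` and `synthetic_features`, as well as the toy data they generate, are hypothetical names introduced here for illustration only.

```python
import random


def rank1_rate(gallery, probes):
    """Closed-universe rank-1 rate: index s is subject s in both lists."""
    hits = 0
    for subject, probe in enumerate(probes):
        # Squared Euclidean distance from this probe to every gallery vector.
        dists = [sum((a - b) ** 2 for a, b in zip(probe, g)) for g in gallery]
        if dists.index(min(dists)) == subject:
            hits += 1
    return hits / len(probes)


def pose_confusion_matrix(features):
    """features[pose][subject] is a feature vector.

    Returns rates[gallery_pose][probe_pose]. The gallery pose is also used
    as a probe, so the diagonal is trivially 1.0.
    """
    n = len(features)
    return [[rank1_rate(features[g], features[p]) for p in range(n)]
            for g in range(n)]


def synthetic_features(num_poses=5, num_subjects=20, dim=16, drift=0.6, seed=0):
    """Toy stand-in for real face features (an assumption, not PIE data)."""
    rng = random.Random(seed)
    # Pose 0: one random identity vector per subject.
    poses = [[[rng.gauss(0, 1) for _ in range(dim)]
              for _ in range(num_subjects)]]
    # Each further pose perturbs the previous one, so the appearance
    # drifts further from pose 0 as the pose index grows.
    for _ in range(1, num_poses):
        prev = poses[-1]
        poses.append([[x + drift * rng.gauss(0, 1) for x in vec]
                      for vec in prev])
    return poses


if __name__ == "__main__":
    matrix = pose_confusion_matrix(synthetic_features())
    for row in matrix:
        print(" ".join(f"{rate:.2f}" for rate in row))
```

Each row of the printed matrix corresponds to one gallery pose and each column to one probe pose, which mirrors the layout of a pose confusion matrix. With real data, `features` would hold whatever representation the recognition system under test produces for each camera view.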