謝志華 江鵬 余新河 張帥
CLC number: TP183; TP391.4
Document code: A
Abstract: To improve the effectiveness of facial features represented by hyperspectral face data, a hyperspectral face recognition method based on VGGNet and multi-band recurrent training was proposed. Firstly, in the preprocessing phase, a Multi-Task Convolutional Neural Network (MTCNN) was used to locate hyperspectral face images accurately, and the hyperspectral face data was augmented by channel mixing. Then, a VGG12 deep network based on the Convolutional Neural Network (CNN) structure was built for hyperspectral face recognition. Finally, based on the characteristics of hyperspectral face data, multi-band recurrent training was introduced to train the VGG12 network and complete the training and recognition. Experimental results on the public UWA-HSFD and PolyU-HSFD hyperspectral face databases show that the proposed method achieves better recognition performance than other deep networks such as DeepID, DeepFace and VGGNet.
Key words: hyperspectral face recognition; Convolutional Neural Network (CNN); VGGNet; multi-band recurrent training; Deep Neural Network (DNN)
0 Introduction
With the development of remote sensing technology, spectral imaging has been widely applied in agriculture, biomedicine and other fields. Researchers have shown [1] that different parts of the human face exhibit large spectral variability. Hyperspectral imaging can capture this variability, so hyperspectral images carry more discriminative information; hyperspectral imaging technology can therefore be applied to face recognition.
Compared with conventional RGB images, hyperspectral imaging has notable advantages: 1) for each pixel in the image, a hyperspectral camera records a large number of contiguous intensity measurements, i.e., every pixel contains a continuous spectrum, which describes the details of objects in the scene very precisely; 2) hyperspectral images can sense external attributes of many target objects, such as size, shape and defects, greatly improving imaging accuracy and reliability; 3) hyperspectral data can detect materials with diagnostic spectral-absorption features and thus accurately distinguish the different parts of a face, which greatly strengthens face detection and recognition. In addition, for face images, hyperspectral images contain more band information than RGB images [1].
Reference [2] first applied hyperspectral images to face recognition and demonstrated how facial tissue responds differently across spectra. In [3], Chang et al. showed through tests that under different illumination sources, such as halogen lamps, fluorescent lamps and daylight, hyperspectral faces are more robust than faces captured under visible light. Di et al. [4] did similar work; their study showed that in spectral measurements the spectral bands are closely related to hemoglobin, with hyperspectral bands being more discriminative than other bands. They selected 33 hyperspectral bands for study and showed that multiple hyperspectral bands outperform both a single band and the visible RGB bands in face recognition. Reference [5] fused multi-dimensional hyperspectral face images into two-dimensional grayscale images via a sliding window and then performed recognition with partial least squares regression, verifying the effectiveness of hyperspectral face recognition. Wei et al. [6] fused band selection with Gabor features and, by analyzing the optical characteristics of different bands, reduced the impact of redundant spectra on recognition. Ghasemzadeh et al. [7] applied the three-dimensional wavelet transform to the spatial and spectral domains simultaneously to extract inter-band and spatial features of hyperspectral faces.
At present, research on hyperspectral face recognition focuses mainly on handcrafted feature extraction and its refinement [5], with little attention to deep-learning features. Given the low signal-to-noise ratio and data complexity of hyperspectral imaging, it is necessary to study deep features suited to hyperspectral face recognition. Drawing on research into deep Convolutional Neural Networks (CNN) for conventional visible-light face recognition, and based on the characteristics of hyperspectral imaging data, this paper proposes a hyperspectral face recognition method based on VGGNet and multi-band recurrent training, and builds and trains a VGG12 deep network suited to hyperspectral face recognition.
1 Hyperspectral face recognition method
1.1 Hyperspectral face preprocessing
In deep-learning face recognition research, accurate face localization is indispensable. This paper adopts a cascaded network that integrates Multi-Task Convolutional Neural Network (MTCNN) face detection [8]. The method consists of three stages [8]: in the first stage, a shallow fully convolutional network quickly generates candidate windows; in the second stage, a stronger CNN selects more accurate windows from the candidates and discards a large number of overlapping windows; in the third stage, an output CNN decides which candidate windows to keep and simultaneously outputs the locations of five facial landmarks. The concrete implementation is as follows:
1) A four-layer fully convolutional network, the Propose-Net (P-Net), is used to obtain candidate windows and bounding-box regression vectors; the first three convolution kernels are 3×3 and the last is 1×1. The candidate windows are calibrated with the bounding boxes, and non-maximum suppression is then used to remove overlapping windows.
2) The images containing the candidate windows determined by P-Net are fed into the Refine-Net (R-Net), whose final layers are fully connected for training. The bounding-box vectors are used to fine-tune the candidate windows, and non-maximum suppression is applied again to remove overlapping windows.
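Both stages above prune overlapping windows with non-maximum suppression. A minimal sketch of greedy non-maximum suppression in Python with NumPy (illustrative only: the box format, threshold value and function name are assumptions, not details taken from [8]):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over candidate windows.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the boxes that survive suppression.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the top box with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # discard windows that overlap the kept one too heavily
        order = order[1:][iou <= iou_threshold]
    return keep
```

Of three candidate windows where the first two overlap heavily, only the higher-scoring one of the pair and the disjoint window survive.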
1.2 Network model
For hyperspectral face recognition, the 12-layer network model shown in Fig. 1 was built; it contains 11 convolutional layers and 1 fully connected layer. Since, as in VGGNet [11], every convolution kernel is 3×3, the model is named VGG12. In VGG12, the first two layers (conv1, conv2) are 64-channel convolutional layers; all convolutional layers use a stride of 1 and padding of 1, so conv1 and conv2 do not reduce the spatial dimensions and can extract more features. conv2 is followed by a max-pooling layer. Since pooling shrinks the feature map to 1/4 of its size, the number of kernels is doubled at conv3 to retain as many features as possible; conv3 thus has twice as many feature maps as conv2, while the feature-map size becomes one quarter of the input. The same pattern is used from conv1 to conv8: every two consecutive convolutional layers are followed by a max-pooling layer. conv9, conv10 and conv11 are three consecutive convolutional layers with the same number and size of kernels. conv11 is followed by a max-pooling layer to reduce the number of features. The final, 12th layer is a fully connected layer for feature extraction, and the network uses a softmax loss.
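The spatial and channel dimensions implied by this layout can be checked with a short trace. This is a sketch under stated assumptions: the text fixes 3×3 kernels with stride 1 and padding 1, 2×2 max pooling after every second layer up to conv8 and after conv11, 64 channels for conv1 and conv2, and a doubling at conv3; the widths used from conv5 onward (256, 512) follow the usual VGG doubling convention, and the 64×64 input size is likewise an assumption, neither being stated in the text.

```python
def trace_vgg12(h=64, w=64):
    """Trace feature-map shapes through the VGG12 layout described above.

    3x3 convolutions with stride 1 and padding 1 preserve spatial size;
    each 2x2 max-pool with stride 2 halves it. Channel widths beyond
    conv4 are assumed VGG-style (256, 512), not taken from the paper.
    """
    layers = [
        ("conv1", 64), ("conv2", 64), ("pool", None),
        ("conv3", 128), ("conv4", 128), ("pool", None),
        ("conv5", 256), ("conv6", 256), ("pool", None),
        ("conv7", 512), ("conv8", 512), ("pool", None),
        ("conv9", 512), ("conv10", 512), ("conv11", 512), ("pool", None),
    ]
    c = 3  # a three-band input group
    shapes = []
    for name, out_c in layers:
        if name == "pool":
            h, w = h // 2, w // 2   # pooling quarters the feature-map area
        else:
            c = out_c               # convolution only changes the channel count
        shapes.append((name, c, h, w))
    return shapes
```

Starting from an assumed 64×64 three-band input, the five pooling stages leave a 512-channel 2×2 map before the fully connected layer.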
1.3 Multi-band recurrent training
The images processed by the existing VGGNet deep network are three-channel [12], so existing deep-learning models are most effective on three-channel data. The model proposed in this paper, like VGGNet, uses 3×3 convolution kernels and thus shares this commonality. To exploit this commonality with existing networks, a multi-band recurrent training method is proposed, as shown in Fig. 2.
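One plausible reading of this scheme is to cycle through the 33 spectral bands in consecutive three-band groups, so that each group forms a pseudo-three-channel image compatible with a VGG-style input layer, and training proceeds round-robin over the groups. The sketch below is hypothetical; the paper's exact grouping is defined by Fig. 2, which is not reproduced here:

```python
def band_groups(n_bands=33, group_size=3):
    """Partition band indices into consecutive fixed-size groups; each
    group is stacked as one pseudo-three-channel training image."""
    if n_bands % group_size != 0:
        raise ValueError("bands must divide evenly into groups")
    return [list(range(i, i + group_size))
            for i in range(0, n_bands, group_size)]

def training_schedule(n_epochs, groups):
    """Cycle the band groups across epochs (a recurrent training order)."""
    return [groups[e % len(groups)] for e in range(n_epochs)]
```

With 33 bands this yields 11 three-band groups, and a 12-epoch schedule wraps around to the first group again.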
2 Experiments and analysis
The experiments use two common hyperspectral face databases, UWA-HSFD and PolyU-HSFD [5]. The PolyU-HSFD database covers the 400–720 nm range, and each sample consists of two-dimensional images in 33 bands; the experimental dataset uses 161 images of 24 subjects as the original training samples and 141 images as test samples. The UWA-HSFD database covers the same range as PolyU-HSFD, with 33 bands per sample; 158 images of 48 subjects are selected as training samples and 96 images as test samples. To verify the effectiveness of the proposed method, the following experiments were conducted on the two datasets:
3 Conclusion
This paper proposed a hyperspectral face recognition method based on deep-learning features. Considering the characteristics of hyperspectral face imaging and the limited scale of the data, an effective VGG12 deep network was built by combining preprocessing with multi-band recurrent training. Experimental results show that the proposed deep-learning hyperspectral face recognition method outperforms existing deep networks for face recognition.
References:
[1] OSIA N, BOURLAI T. A spectral independent approach for physiological and geometric based face recognition in the visible, middle-wave and long-wave infrared bands[J]. Image and Vision Computing, 2014, 32(11): 847-859.
[2] PAN Z, HEALEY G E, PRASAD M, et al. Face recognition in hyperspectral images [C]// Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2003: 334-339.
[3] CHANG H, KOSCHAN A, ABIDI B, et al. Fusing continuous spectral images for face recognition under indoor and outdoor illuminants[J]. Machine Vision and Applications, 2010, 21(2): 201-215.
[4] DI W, ZHANG L, ZHANG D. Studies on hyperspectral face recognition in visible spectrum with feature band selection[J]. IEEE Transactions on Systems, Man and Cybernetics — Part A: Systems and Humans, 2010, 40(6): 1354-1361.
[5] UZAIR M, MAHMOOD A, MIAN A. Hyperspectral face recognition with spatiospectral information fusion and PLS regression[J]. IEEE Transactions on Image Processing, 2015, 24(3): 1127-1137.
[6] WEI D M, ZHANG L R, HU N N, et al. Hyperspectral face recognition with spatial-spectral fusion information and Gabor feature[J]. Transactions of Beijing Institute of Technology, 2017, 37(10): 1077-1083. (in Chinese)
[7] GHASEMZADEH A, DEMIREL H. 3D discrete wavelet transform-based feature extraction for hyperspectral face recognition [J]. IET Biometrics, 2018, 7(1): 49-55.
[8] ZHANG K, ZHANG Z, LI Z, et al. Joint face detection and alignment using multi-task cascaded convolutional networks [J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
[9] YANG N, NAN L, ZHANG D Y, et al. Research on image interpretation based on deep learning[J]. Infrared and Laser Engineering, 2018, 47(2): 203002. (in Chinese)
[10] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 1-9.
[11] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2018-03-22]. https://arxiv.org/pdf/1409.1556.pdf.
[12] ZHANG G S, ZHANG P C, WANG X B. Visual place recognition based on multi-level feature difference map[J]. Infrared and Laser Engineering, 2018, 47(2): 203004. (in Chinese)
[13] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012, 1: 1097-1105.
[14] MOLLAHOSSEINI A, CHAN D, MAHOOR M H. Going deeper in facial expression recognition using deep neural networks [C]// Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision. Piscataway, NJ: IEEE, 2016: 1-10.
[15] SUN Y, WANG X, TANG X O. Deep learning face representation from predicting 10000 classes [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 1891-1898.
[16] TAIGMAN Y, YANG M, RANZATO M, et al. DeepFace: closing the gap to human-level performance in face verification [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 1701-1708.
[17] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [C]// Proceedings of the 32nd International Conference on Machine Learning. [S.l.]: ICML, 2015: 448-456.