The analysis of adaptive sparse group lasso based on the Lp regularizer

Zhang Tuhui, Zhang Hai

(Department of Mathematics, Northwest University, Xi'an, Shaanxi 710069, China)
Based on the idea of the sparse group lasso and the merits of the adaptive lasso, we propose a more general adaptive sparse group lasso with L_p regularization and study its high-dimensional statistical properties. By analyzing the properties of the regularizer and the loss function and the choice of the regularization parameter, we obtain a nonasymptotic error bound for the L_p-regularized adaptive sparse group lasso.
Keywords: sparse group lasso; restricted strong convexity; decomposability; adaptive lasso
1 Introduction

Consider the general linear regression problem

y = Xβ + ε,

where y is an n×1 response vector, X = (X_1, X_2, ···, X_n)^T is an n×d matrix with X_i = (x_{i1}, ···, x_{id}), i = 1, ···, n, β = (β_1, ···, β_d) is the d×1 vector of unknown parameters, and ε is a noise vector following the Gaussian distribution ε ~ N(0, σ²I). Suppose the true model coefficients β* = (β*_1, ···, β*_q, 0, ···, 0)^T are sparse, with only q < d nonzero entries.

In general, a linear model with group structure can be expressed as

y = Σ_{l=1}^{L} X_l β_l + ε,

where y is the n×1 response, ε ~ N(0, σ²I), X_l is an n×m_l matrix representing the l-th factor, and β_l is the coefficient vector of the l-th factor, of size m_l, l = 1, ···, L. Writing X = (X_1, X_2, ···, X_L) and β = (β_1′, β_2′, ···, β_L′)′, the regression problem again takes the form y = Xβ + ε. For problems in which the variables carry such a group structure, [4] proposed the group lasso, which performs variable selection group by group:

min_β (1/2) ||y − Σ_{l=1}^{L} X_l β_l||²_2 + λ Σ_{l=1}^{L} √(m_l) ||β_l||_2.

This method improves on the lasso, provides a new tool for the analysis of high-dimensional structured data, and can select the important factors from it. Various modified group lasso methods have since been studied [5-6]. However, the group lasso is sparse only between groups, not within them: within a group, the factors are selected or discarded together, whereas in many practical problems the variables within a group differ in their influence, and this shortcoming limits its applicability. [7] proposed the sparse group lasso with p = 2, which is sparse both within and between groups:

min_β (1/2) ||y − Xβ||²_2 + λ_1 Σ_{l=1}^{L} ||β_l||_2 + λ_2 ||β||_1.

When λ_1 = 0 this is the familiar lasso; when λ_2 = 0 it is the group lasso.

Clearly, the sparse group lasso enjoys the good properties of the lasso and can select the relevant important factors; however, the lasso is not selection consistent (see [8-11]), and the sparse group lasso inherits this shortcoming. The adaptive lasso of [12] removes it once suitable weights are chosen. Motivated by this, we study the more general adaptive sparse group lasso with L_p (p ≥ 2) regularization:

min_β (1/2) ||y − Xβ||²_2 + λ_1 Σ_{l=1}^{L} ω_l ||β_l||_p + λ_2 ||β||_1,

where ω_l > 0; for p ≥ 2 the regularized model has favorable properties [13-14]. We focus on the nonasymptotic error bound of the L_p (p ≥ 2) adaptive sparse group lasso when the number of parameters d exceeds the sample size n.

Remark 1.1 All three terms of the L_p-regularized adaptive sparse group lasso are convex functions of β, so fitting the model amounts to solving a convex optimization problem.

2 Theoretical analysis

In general, a regularization problem can be written in the form

min_β L(β; X, y) + λ_n r(β),

where L(β; X, y) is the loss function, r(β) is the regularizer, and λ_n ≥ 0 is the regularization parameter. In the high-dimensional setting, the quality of a method is measured by the error between the estimate β_{λ_n} and the true value β* [15-17].

We first introduce the relevant definitions. The regularizer r is decomposable with respect to a pair of subspaces A ⊆ B of R^d if

r(u + v) = r(u) + r(v) for all u ∈ A, v ∈ B^⊥.

The loss function satisfies restricted strong convexity if, on a given set C,

L(β* + Δ; X, y) − L(β*; X, y) − ⟨∇L(β*; X, y), Δ⟩ ≥ k_L ||Δ||²_2 for all Δ ∈ C,

where the parameter k_L > 0.

Lemma 2.1 [17] For the squared loss, restricted strong convexity amounts to a restricted eigenvalue condition on the design. If each row of the design matrix follows a normal distribution N(0, Σ), where Σ is the covariance matrix, then restricted strong convexity of the loss holds with high probability.

Lemma 2.2 [17] If the loss function L is convex and differentiable and satisfies restricted strong convexity, the regularizer is decomposable, and the regularization parameter satisfies λ_n ≥ 2 r*(∇L(β*; X, y)), where r* denotes the dual norm of r, then

||β_{λ_n} − β*||²_2 ≤ 9 (λ_n² / k_L²) Ψ²(B),   (2.2)

where Ψ(B) = sup_{u ∈ B, u ≠ 0} r(u)/||u||_2 is the subspace compatibility constant.

We now use these two lemmas to study the theoretical properties of the L_p-regularized adaptive sparse group lasso. Assume the groups of the linear model are partitioned so that for i = 1 each group has size 1.
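As a concrete illustration of the estimator introduced in Section 1, the following is a minimal numerical sketch of the p = 2 case solved by proximal gradient descent; for p = 2 the proximal operator of the combined penalty is an elementwise soft-threshold followed by a blockwise group soft-threshold. The function names, step-size rule, and iteration count are our own choices for illustration, not part of the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: prox of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_sparse_group(z, groups, lam_group, lam_l1, weights):
    """Prox of lam_group * sum_l w_l ||b_l||_2 + lam_l1 * ||b||_1 (exact for p = 2):
    soft-threshold elementwise, then shrink each group's l2 norm."""
    u = soft_threshold(z, lam_l1)
    out = np.zeros_like(u)
    for l, idx in enumerate(groups):
        v = u[idx]
        nrm = np.linalg.norm(v)
        t = lam_group * weights[l]
        if nrm > t:                      # otherwise the whole group is zeroed
            out[idx] = (1.0 - t / nrm) * v
    return out

def adaptive_sparse_group_lasso(X, y, groups, lam1, lam2, weights, n_iter=500):
    """Proximal gradient descent on
    (1/2)||y - X b||_2^2 + lam1 * sum_l w_l ||b_l||_2 + lam2 * ||b||_1."""
    d = X.shape[1]
    step = np.linalg.norm(X, 2) ** 2     # Lipschitz constant of the smooth part
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = prox_sparse_group(beta - grad / step, groups,
                                 lam1 / step, lam2 / step, weights)
    return beta
```

With X = I the fixed point is simply the proximal map of y, which makes both sparsity patterns visible: components below the l1 threshold vanish individually (within-group sparsity), and an entire group vanishes when its shrunken norm falls below λ_1 ω_l (between-group sparsity).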
Under the squared loss, the L_p-regularized adaptive sparse group lasso can then be written as

min_β (1/2) ||y − Xβ||²_2 + λ_n r(β), with r(β) = Σ_{l=1}^{L} ω_l ||β_l||_p + ||β||_1.   (2.1)

Theorem 2.1 For the L_p-regularized adaptive sparse group lasso model (2.1), when the regularization parameter satisfies the dual-norm condition λ_n ≥ 2 r*(∇L(β*; X, y)) of Lemma 2.2, the nonasymptotic error bound (2.2) holds.

Proof First we show that model (2.1) satisfies the conditions of Lemma 2.2. For arbitrary u ∈ A and v ∈ B^⊥, both the weighted group penalty and the l_1 penalty split term by term across the group partition, so r(u + v) = r(u) + r(v); hence the regularizer r(β) of the L_p-regularized adaptive sparse group lasso is decomposable. Since the loss of this model is the squared loss, restricted strong convexity holds by Lemma 2.1.

Besides restricted strong convexity, for a given group G of size m, let X_G: R^m → R^n denote the corresponding submatrix and bound its operator norm, for all l = 1, 2, ···, n_i; when i = 1, that is, when every group has size 1, this reduces to the usual column-normalization condition.

It remains to choose λ_n so that the condition of the theorem holds, that is, so that λ_n ≥ 2 r*(∇L(β*; X, y)). By the definition of the dual norm, r* is controlled in terms of b = min{ω_l}, l = 1, ···, d. Again by the definition of the dual norm, the dual of ||u||_1 is the l_∞ norm, and by column normalization and the Gaussian assumption the corresponding term concentrates. For the associated Gaussian process, the Sudakov-Fernique inequality [18] bounds its expected supremum, which yields an admissible choice of λ_n and completes the proof.

From the error bound (2.2) we see that, when the loss function satisfies restricted strong convexity, the regularizer is decomposable, and the regularization parameter is chosen appropriately, the error of the L_p-regularized adaptive sparse group lasso is described precisely. Note that the bound depends not only on the regularization parameter λ_n and the restricted strong convexity constant k_L, but also on the choice of the penalty norm.

3 Conclusion

Variable selection is a fundamental problem in statistics. This paper studies variable selection with group structure. To address the shortcomings of the classical group lasso, we studied the more general L_p-regularized adaptive sparse group lasso and, under restricted strong convexity of the loss and decomposability of the penalty, with suitably chosen parameters, derived a nonasymptotic bound on the error between the estimate and the true value.

This paper treats convex penalty functions. For the currently popular nonconvex penalties [2,10,19-21], whether the corresponding high-dimensional statistical properties still hold has not yet been studied. Studying variable selection with group structure under a nonconvex loss function would likewise be meaningful work.

References

[1] Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression[J]. The Annals of Statistics, 2004, 32(2): 407-499.
[2] Xu Z B, Zhang H, Wang Y. L_{1/2} regularization[J]. Science in China (Information Sciences), 2010, 53: 1159-1169.
[3] Tibshirani R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society, Series B, 1996, 58(1): 267-288.
[4] Yuan M, Lin Y. Model selection and estimation in regression with grouped variables[J]. Journal of the Royal Statistical Society, Series B, 2006, 68(1): 49-67.
[5] Vogt J E, Roth V. A complete analysis of the l_{1,p} group lasso[C]. Proceedings of the 29th International Conference on Machine Learning, Edinburgh, 2012.
[6] Meier L, van de Geer S, Bühlmann P. The group lasso for logistic regression[J]. Journal of the Royal Statistical Society, Series B, 2008, 70(1): 53-71.
[7] Friedman J, Hastie T, Tibshirani R. A note on the group lasso and a sparse group lasso[J]. arXiv:1001.0736, 2010.
[8] Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso[J]. The Annals of Statistics, 2006, 34: 1436-1462.
[9] Zhao P, Yu B. On model selection consistency of lasso[J]. Journal of Machine Learning Research, 2006, 7: 2541-2567.
[10] Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties[J]. Journal of the American Statistical Association, 2001, 96: 1348-1360.
[11] Fan J, Peng H. Nonconcave penalized likelihood with a diverging number of parameters[J]. The Annals of Statistics, 2004, 32: 928-961.
[12] Zou H. The adaptive lasso and its oracle properties[J]. Journal of the American Statistical Association, 2006, 101: 1418-1429.
[13] Vogt J E, Roth V. The group-lasso: l(1,∞) regularization versus l(1,2) regularization[C]. Pattern Recognition: 32nd DAGM Symposium, 2010: 252-261.
[14] Zhao P, Rocha G, Yu B. The composite absolute penalties family for grouped and hierarchical variable selection[J]. The Annals of Statistics, 2009, 37(6A): 3468-3497.
[15] Cao Huaihuo, Zhang Yong, Wang Yong. Uniform boundedness of solutions for a mutualism system with diffusion[J]. Pure and Applied Mathematics, 2011, 27(3): 38-41.
[16] Raskutti G, Wainwright M, Yu B. Minimax rates of estimation for high-dimensional linear regression over l_q-balls[J]. IEEE Transactions on Information Theory, 2011, 57(10): 6976-6994.
[17] Negahban S, Ravikumar P, Wainwright M, Yu B. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers[J]. Statistical Science, 2012, 27(4): 538-557.
[18] Ledoux M, Talagrand M. Probability in Banach Spaces: Isoperimetry and Processes[M]. New York: Springer-Verlag, 1991.
[19] Breheny P, Huang J. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection[J]. Annals of Applied Statistics, 2011, 5(1): 232-253.
[20] Zhang C H. Nearly unbiased variable selection under minimax concave penalty[J]. The Annals of Statistics, 2010, 38(2): 894-942.
[21] Zhang C H, Zhang T. A general theory of concave regularization for high-dimensional sparse estimation problems[J]. Statistical Science, 2012, 27(4): 576-593.

The analysis of adaptive sparse group lasso based on the Lp regularizer

Zhang Tuhui, Zhang Hai

(Department of Mathematics, Northwest University, Xi'an 710069, China)

Abstract: In this paper we propose an adaptive sparse group lasso based on the L_p regularizer and study its high-dimensional statistical properties by analyzing the properties of the loss function and the regularizer and choosing an appropriate regularization parameter. Finally we obtain a nonasymptotic error bound.

Keywords: sparse group lasso; restricted strong convexity; decomposability; adaptive lasso

CLC classification: O236, O213    Document code: A    Article ID: 1008-5513(2014)02-0178-08
DOI: 10.3969/j.issn.1008-5513.2014.02.009
Received: 2013-11-10.
Foundation item: National Natural Science Foundation of China (60975036, 11171272).
Biography: Zhang Tuhui (1988-), M.Sc. student; research interest: machine learning.
2010 MSC: 62B10