鄧 釗,晁綿濤,簡(jiǎn)金寶
(1.廣西大學(xué)數(shù)學(xué)與信息科學(xué)學(xué)院,廣西南寧 530004;2.玉林師范學(xué)院數(shù)學(xué)與統(tǒng)計(jì)學(xué)院,廣西玉林 537000)
?
非凸兩分塊問(wèn)題乘子交替方向法的收斂性分析*
鄧釗1,晁綿濤1,簡(jiǎn)金寶2**
(1.廣西大學(xué)數(shù)學(xué)與信息科學(xué)學(xué)院,廣西南寧530004;2.玉林師范學(xué)院數(shù)學(xué)與統(tǒng)計(jì)學(xué)院,廣西玉林537000)
(College of Mathematics and Information Science,Guangxi University,Nanning,Guangxi 530004,China;School of Mathematics and Statistics,Yulin Normal University,Yulin,Guangxi,537000,China)
摘要:乘子交替方向法(ADMM)求解大規(guī)模問(wèn)題十分有效.ADMM在凸情形下的收斂性已被清晰認(rèn)識(shí),但非凸問(wèn)題ADMM的收斂性結(jié)果還很少.本文針對(duì)非凸兩分塊優(yōu)化問(wèn)題,在增廣拉格朗日函數(shù)滿足Kurdyka-Lojasiewicz不等式性質(zhì)且罰參數(shù)大于某個(gè)常數(shù)的條件下,證明了ADMM的收斂性.
關(guān)鍵詞:乘子交替方向法Kurdyka-Lojasiewicz不等式非凸優(yōu)化收斂性
【研究意義】乘子交替方向法(ADMM)求解大規(guī)模分布式計(jì)算問(wèn)題十分有效.ADMM既能分散地收集和存儲(chǔ)這些數(shù)據(jù)集,又能在并行和分布式的環(huán)境下求解這些問(wèn)題.ADMM適合求解分布式凸優(yōu)化問(wèn)題,尤其適用于出現(xiàn)在統(tǒng)計(jì)學(xué)、機(jī)器學(xué)習(xí)和相關(guān)領(lǐng)域中的大規(guī)模問(wèn)題,其重要性日益凸顯.【前人研究進(jìn)展】ADMM的思想最早起源于20世紀(jì)50年代,算法在20世紀(jì)70年代中期由Glowinski和Marrocco[1],以及Gabay和Mercier[2]首次提出.20世紀(jì)80年代,乘子交替方向法的研究和應(yīng)用非常廣泛,直到20世紀(jì)90年代中期,乘子交替方向法求解凸優(yōu)化問(wèn)題的很多理論結(jié)果,已經(jīng)得到很好的證明.傳統(tǒng)ADMM是求解凸兩分塊問(wèn)題十分有效的方法[1-2],其直接應(yīng)用到問(wèn)題(0.5)的迭代框架如下
(0.1)
其中Lβ(x,y,λ)為問(wèn)題(0.5)的增廣拉格朗日函數(shù),定義如下
(0.2)
在凸情形下,ADMM的收斂性已被充分認(rèn)識(shí)[3].非凸問(wèn)題ADMM的收斂性分析僅有初步的結(jié)果,其研究是當(dāng)前的熱點(diǎn)問(wèn)題[4-6].文獻(xiàn)[7]考慮如下非凸問(wèn)題
minf(x)+g(y)
s.t.Ax=y,
(0.3)
并提出如下改進(jìn)的ADMM
(0.4)
其中Δφ是關(guān)于強(qiáng)凸函數(shù)φ的Bregman距離.當(dāng)φ=0時(shí),算法(0.4)為傳統(tǒng)ADMM.在罰參數(shù)充分大且目標(biāo)函數(shù)二階連續(xù)可微的條件下,文獻(xiàn)[7]證明算法產(chǎn)生點(diǎn)列的聚點(diǎn)是問(wèn)題(0.3)的穩(wěn)定點(diǎn).
文獻(xiàn)[8]分析如下Bregman ADMM算法的收斂性
【本研究切入點(diǎn)】最近,文獻(xiàn)[9]在矩陣A列滿秩,增廣拉格朗日函數(shù)滿足KL性質(zhì)(參見(jiàn)定義1.1)且罰參數(shù)大于某個(gè)常數(shù)的條件下,分析了傳統(tǒng)ADMM算法(0.1)求解非凸問(wèn)題(0.3)的全局收斂性.
我們考慮如下兩分塊極小化問(wèn)題
minf(x) + g(y)
s.t.Ax+By=0,
(0.5)
其中函數(shù)f:Rn→Rn∪{+∞}是一個(gè)非凸正常下半連續(xù)函數(shù),函數(shù)g:Rm→R L-Lipschitz可微,即存在L>0使得‖g(x)-g(y)‖≤L‖x-y‖,?x,y∈Rn),矩陣A∈Rm×n,B∈Rm×n.這類(lèi)問(wèn)題廣泛出現(xiàn)在實(shí)際應(yīng)用中,如矩陣分解、圖像處理、信號(hào)處理等[4-5].
【擬解決的關(guān)鍵問(wèn)題】本文在不要求矩陣A列滿秩,B不一定是單位陣,在Lβ(w)滿足KL性質(zhì)且罰參數(shù)β大于某個(gè)常數(shù)的條件下分析了ADMM算法(0.1)求解問(wèn)題(0.5)的收斂性.
下面,給出本文理論分析所需的一些概念與性質(zhì).
λ++(BTB)表示矩陣BTB最小正特征值.?f(x)表示函數(shù)f在點(diǎn)x處的極限次微分[5],對(duì)于任意x∈Rn是函數(shù)f的極小值點(diǎn)的必要條件是:0∈?f(x),滿足這個(gè)條件的點(diǎn)稱(chēng)為關(guān)鍵點(diǎn)或穩(wěn)定點(diǎn),函數(shù)f關(guān)鍵點(diǎn)集記為crit f.
定義1.1[10](Kurdyka-Lojasiewicz性質(zhì))設(shè)函數(shù)f:Rn→Rn∪{+∞}是正常下半連續(xù)函數(shù),對(duì)于任意實(shí)數(shù)η1,η2(η1<η2),令[η1 (i)φ(0)=0; (ii)φ在0處連續(xù),在區(qū)間(0,η)上一階連續(xù)可微; (iii)φ′(s)>0,?s∈(0,η); (iv)對(duì)于任意的x∈U∩[f(x*) f(x*)+η],有如下Kurdyka-Lojasiewicz不等式成立 φ ′(f(x)-f(x*))d(0,? f(x))≥1. 則稱(chēng)函數(shù)f在點(diǎn)x*處滿足Kurdyka-Lojasiewicz性質(zhì)(簡(jiǎn)稱(chēng)KL性質(zhì)). 滿足上述性質(zhì)(i)、(ii)、(iii)的函數(shù)全體記為Φη. 引理1.1[11](uniformized KL property)設(shè)Ω是一個(gè)緊集,函數(shù)f:Rn→Rn∪{+∞}是正常下半連續(xù)函數(shù).設(shè)函數(shù)f在集合Ω上取常數(shù),并在Ω中任一點(diǎn)處滿足KL性質(zhì),則存在>0,η>0,φ ∈Φη,對(duì)于任意的和x∈{x∈Rn:d(x,ω)<},有 引理1.2[12]若h:Rn→R為L(zhǎng)-Lipschitz可微函 數(shù),則對(duì)任意的x,y∈Rn,有 由算法(0.1)中每個(gè)子問(wèn)題的最優(yōu)性條件,有 (2.1) 進(jìn)一步,可得 (2.2) 本文的收斂性建立在如下假設(shè)條件下. 假設(shè)2.1假設(shè)以下條件成立 (i)Im(A)?Im(B); 首先,證明{Lβ(wk)}k∈N是遞減序列. 證明由增廣拉格朗日函數(shù)的定義,可得 (2.3) 利用yk+1的最優(yōu)性可得 (2.4) 由引理1.2和式(2.2)中第二式可得 把上式代入式(2.4)可得 (2.5) 由λk+1=λk-β(Axk+1+Byk+1)可得 故 把上式代入式(2.5)右端可得 (2.6) 由λk+1=λk-β(Axk+1+Byk+1)可得 λk+1-λk=-β(Axk+1+Byk+1)∈Im(B). ‖λk+1-λk‖≤μ‖B(λk+1-λk)‖=μ‖g(yk+1)-g(yk)‖≤μL‖yk+1-yk‖. (2.7) 由H的性質(zhì)可得yk=H(Byk),從而 ‖yk+1-yk‖=‖H(Byk+1)-H(Byk)‖≤M‖B(yk+1-yk)‖. (2.8) 結(jié)合式(2.6)和引理2.1可得 Lβ(xk+1,yk+1,λk+1)≤Lβ(xk+1,yk,λk)+ 結(jié)合式(2.7)、式(2.8)有 利用xk+1的最優(yōu)性可得 故序列{Lβ(wkj)}有下界,又序列{Lβ(wk)}k∈N單調(diào)遞減,所以Lβ(wkj)收斂并且Lβ(wk)≥Lβ(w*).由引理2.1可得 由δ>0及n的任意性可得 上式結(jié)合式(2.7)可得 兩式相減,可得 λk+1-λk=(λk-λk-1)+β(Axk-Axk+1)+β(Byk-Byk+1). 利用不等式(a+b+c)2≤3(a2+b2+c2)可得 ‖β(Axk-Axk+1)‖2≤3(‖λk+1-λk‖2+‖λk-λk-1‖2+β2‖B(yk+1-yk)‖2). (2.9) 由F的性質(zhì)可得xk=F(Axk),進(jìn)而 (2.10) 由式(2.9)和式(2.10)可得 引理2.3存在ζ>0,對(duì)于任意k有 d(0,?Lβ(wk+1))≤ζ‖yk+1-yk‖. 證明由增廣拉格朗日函數(shù)定義,可得 (2.11) 進(jìn)一步結(jié)合式(2.2)可得 (2.12) 令ζ:=ζ1+μLζ2,結(jié)合式(2.7)可得 d(0,?Lβ(wk+1))≤ζ‖yk+1-yk‖. 引理2.4設(shè)序列{wk}的全體極限點(diǎn)記為S(w0),則 (i)S(w0)是一個(gè)非空緊集,并且 d(wk,S(w0))→0,k→+∞; (ii)S(w0)?critLβ; (iii)Lβ(·)在S(w0)上取有限值且為常數(shù),且 證明(i)式由S(w0)的定義直接可得. (ii)設(shè)(x*,y*,λ*)∈S(w0),則存在子列{(xkj,ykj,λkj)}使得(xkj,ykj,λkj)→(x*,y*,λ*),(kj→+∞).由xk+1的最優(yōu)性有 Lβ(xk+1,yk,λk)≤Lβ(x*,yk,λk). 由引理2.2可知‖wk+1-wk‖→0,從而(xkj+1,ykj+1,λkj+1)→(x*,y*,λ*).又Lβ(·)關(guān)于y,λ連續(xù), 則有 由函數(shù)Lβ(·)的下半連續(xù)性,有 即w*?critLβ. (iii)對(duì)于任意點(diǎn)(x*,y*,λ*)∈S(w0),存在子列(xkj,ykj,λkj)→(x*,y*,λ*).結(jié)合Lβ(xkj+1)收斂,以及{Lβ(wk)}k∈N單調(diào)遞減,可得 最后,給出非凸問(wèn)題(0.5)的ADMM算法(0.1)的收斂性分析. 定理2.1若Lβ(w)滿足KL性質(zhì),則 (ii)序列{wk}收斂到函數(shù)Lβ(·)的一個(gè)關(guān)鍵點(diǎn). Lβ(w*)(?w*∈S(w0))成立.考慮如下兩種情形: (i)存在整數(shù)k0使得Lβ(wk0)=Lβ(w*)成立,由引理2.1可知,對(duì)于任意的k>k0,有 ‖yk+1-yk‖2≤Lβ(wk)-Lβ(wk+1)≤ Lβ(wk0)-Lβ(w*)=0, 因此,對(duì)于任意的k>k0,有yk+1=yk.結(jié)合式(2.7)、式(2.9)、式(2.10)可知,對(duì)于任意的k>k0+1,有wk+1=wk成立,結(jié)論成立. d(wk,S(w0))<ε, Lβ(w*) 由于S(w0)是非空緊集,函數(shù)Lβ(·)在集合 S(w0)上是常數(shù),由引理1.1得 φ′(Lβ(wk)-Lβ(w*))d(0,?Lβ(wk))≥ 由函數(shù)φ的凹性,可得 φ(Lβ(wk)-Lβ(w*))-φ(Lβ(wk+1)- Lβ(w*))≥φ′(Lβ(wk)-Lβ(w*))(Lβ(wk)-Lβ(wk+1)). 由引理2.3及φ′(Lβ(wk)-Lβ(w*))>0,可得 Lβ(wk)-Lβ(wk+1)≤ ζ‖yk-yk-1‖[φ(Lβ(wk)-Lβ(w*))- φ(Lβ(wk+1)-Lβ(w*))]. δ‖yk+1-yk‖2≤ζ‖yk-yk-1‖Δk,k+1, 即 由上式可得 注意到φ(Lβ(wm+1)-Lβ(w*))>0,移項(xiàng)并且令m→+∞,可得 所以 由式(2.7)可得 另一方面,由式(2.9)、式(2.10)可得 ‖yk+1-yk‖+‖λk+1-λk‖, 所以 進(jìn)一步可知序列wk是Cauchy序列(參見(jiàn)文獻(xiàn)[11]),所以序列{wk}收斂,定理得證. 本文針對(duì)非凸兩分塊優(yōu)化問(wèn)題,在不要求矩陣A列滿秩,B不一定是單位陣,在Lβ(w)滿足Kurdyka-Lojasiewicz性質(zhì)且罰參數(shù)大于某個(gè)常數(shù)的條件下,證明了非凸問(wèn)題ADMM的收斂性. 參考文獻(xiàn): [1]GLOWINSKI R,MARROCO A.Sur l’approximation,par elements finis d’ordre un,et la resolution,par penalisation-dualité,d’une classe de problèms de Dirichlet nonlineares[J].Revue Francaise d’Automatique,Informatique et Recherche Opérationelle,Series R,1975,31(5/6):41-76. [2]GABAY D,MERCIER B.A dual algorithm for the solution of nonlinear variational problems via finite element approximation[J].Computers &Mathematics with Applications,1976,2(1):17-40. [3]BOYD S,PARIKH N,CHU E,et al.Distributed optimization and statistical learning via the alternating direction method of multipliers[J].Foundations and Trends in Machine Learning,2011,3(1):1-122. [4]YANG L,PONG T K,CHEN X J.Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction[J].Mathematics,2016. [5]HONG M Y,LUO Z Q,RAZAVIYAYN M.Convergen- ce analysis of alternating direction method of multipliers for a family of nonconvex problems[J].SIAM Journal on Optimization,2014,26(1):337-364. [6]WANG Y,YIN W T,ZENG J S.Global Convergence of ADMM in Nonconvex Nonsmooth Optimization[R].UCLA CAM Report 15-62,2015. [7]LI G Y,PONG T K.Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems[J].Mathematical Programming,2016,159(1/2):374-401. [8]WANG F H,XU Z B,XU H K.Convergence of multi-block Bregman ADMM for nonconvex composite problems[J].Mathematics,2015,arXiv:1505.03063vl:1-25. [9]GUO K,HAND R,WUT T.Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints[J].International Journal of Computer Mathematics,2016:1-17. [10]BOLTE J,DANIILIDIS A,LEY O,et al.Characterizations of Lojasiewicz inequalities and applications[J].arXiv:0802.0826vl,2008:1-48. [11]ATTOUCH H,BOLTE J,REDONT P,et al.A Proximal alternating minimization and projection methods for nonconvex problems:An approach based on the Kurdyka-Lojasiewicz inequality[J].Mathematics of Operations Research,2010,35(2):438-457. [12]BOLTE J,SABACH S,TEBOULLE M.Proximal alternating linearized minimization for nonconvex and nonsmooth problem[J].Mathematical Programming,2013,146(1/2):459-494. (責(zé)任編輯:米慧芝) Convergence Analysis of Alternating Direction Method of Multipliers for Two Block Nonconvex Problems DENG Zhao1,CHAO Miantao1,JIAN Jinbao2 Key words:Alternating Direction Method of Multipliers,Kurdyka-Lojasiewicz inequality,nonconvex optimization,convergence Abstract:The Alternating Direction Method of Multipliers(ADMM) is an effective method for large scale optimization problems.While the convergence of ADMM has been clearly recognized in the case of convex,the convergence result of ADMM in the case of nonconvex is still an open problem.In this paper,under the assumption that the augmented Lagrangian function satisfies the Kurdyka-Lojasiewicz inequality and the penalty parameter is greater than a constant,we analyze the convergence of ADMM for a class of nonconvex optimization problems whose objective function is the sum of two block nonconvex functions. 收稿日期:2016-07-01 作者簡(jiǎn)介:鄧釗(1991-),男,碩士研究生,主要從事最優(yōu)化理論與方法研究。 中圖分類(lèi)號(hào):O221.2 文獻(xiàn)標(biāo)識(shí)碼:A 文章編號(hào):1005-9164(2016)05-0422-06 修回日期:2016-10-27 *國(guó)家自然科學(xué)基金項(xiàng)目(11601095),廣西自然科學(xué)基金項(xiàng)目(2014GXNSFFA118001,2016GXNSFDA380019)和廣西高??蒲许?xiàng)目(ZD201407)資助。 **通信作者:簡(jiǎn)金寶(1964-),男,教授,博士,博士生導(dǎo)師,主要從事最優(yōu)化理論與算法及應(yīng)用研究,E-mail:jianjb@gxu.edu.cn。 廣西科學(xué)Guangxi Sciences 2016,23(5):422~427 網(wǎng)絡(luò)優(yōu)先數(shù)字出版時(shí)間:2016-11-21【DOI】10.13656/j.cnki.gxkx.20161121.005 網(wǎng)絡(luò)優(yōu)先數(shù)字出版地址:http://www.cnki.net/kcms/detail/45.1206.G3.20161121.1520.010.html2 收斂性分析
3 結(jié)論