李 琪 鐘 將 李 雪
1(重慶大學計算機學院 重慶 400044) 2(昆士蘭大學信息技術與電子工程學院 澳大利亞布里斯班 4072)
基于啟發(fā)策略的動態(tài)平衡圖劃分算法
李 琪1鐘 將1李 雪2
1(重慶大學計算機學院 重慶 400044)2(昆士蘭大學信息技術與電子工程學院 澳大利亞布里斯班 4072)
(liqi0713@foxmail.com)
隨著計算技術的發(fā)展以及大數(shù)據(jù)時代的來臨,分布式計算已成為研究的熱點,其中大圖迭代計算作為其研究的重點,降低劃分后子圖之間的通信邊規(guī)模是改善計算性能的關鍵.傳統(tǒng)算法很難在切割率最小化與負載均衡上同時滿足.由于圖劃分屬于NP組合優(yōu)化問題,提出了一種動態(tài)平衡算法來解決圖的平衡劃分,確保在子圖邊界點劃分最優(yōu)的基礎上引入擾動策略使其跳出局部最優(yōu)擴大搜索空間,最后在真實世界圖上驗證算法的可行性,分別從平衡系數(shù)、切割邊規(guī)模與傳統(tǒng)算法進行了比較.在指定的擾動次數(shù)下,此算法比常見的算法hash,Chunk,Metis在割邊率上分別降低了近40%,30%,5%.與Metis相比,平衡系數(shù)也更加地優(yōu)化,實驗結果證明了該算法的有效性.
平衡圖劃分;啟發(fā)策略;負載均衡;分布式計算;局部優(yōu)化
給定一個無向圖G=(V,E),V和E分別表示頂點和邊的集合,平衡K劃分是把點集V按照映射關系π→(S1,S2,…,SK)映射到K(K≥2)個不相交的子域中,每個子域的規(guī)模幾乎相等,要求使割邊數(shù)最少(邊的2個端點不在同一個域).當K=2時是2劃分,針對2劃分經(jīng)典的是KL(Kernighan-Lin)[1]算法,其基本思想是將圖隨機劃分為2等份,將2分結果作為輸入,通過交換2個子域中的點來改進2分結果.此算法已經(jīng)成為大多數(shù)圖劃分算法迭代改進的基礎,但是由于其較高的時間復雜度O(|V|3),不適合大圖的直接處理.Fiduccia和Mattheyses[2]對其進行了改進,用單點移動來代替KL的雙點交換以及加入了更有效的數(shù)據(jù)結構.圖劃分本身是NP完全問題[3],可以通過元啟發(fā)式算法[4]解決此類問題,主要有模擬退火算法[5]、禁忌搜索算法[6]、遺傳算法[7]等.另外,Kumar等人提出了多層次的圖劃分模型Metis[8]和它的并行版本ParMetis[9].Metis算法設計主要基于多層次圖劃分范式.此類方法還包括Chaco[10]和Scotch[11].圖劃分有廣泛的應用,例如并行計算[12]、VLSI設計[13]、圖像分割[14]等.
圖可以表達復雜的結構和豐富的語意,其迭代分析算法在社交網(wǎng)絡、Web和科學計算等諸多領域獲得了廣泛的應用[15],然而,隨著數(shù)據(jù)規(guī)模的不斷增長對計算要求提出了嚴峻的挑戰(zhàn).在2012年,Google月活躍用戶數(shù)為10億,Twitter月活躍用戶數(shù)為2億,平均每天發(fā)送的消息量達到了1.75億,與之相對應的是數(shù)十億的邊與頂點,但是我們?nèi)砸ㄟ^這些龐大的圖數(shù)據(jù)來進行一些相關的計算,例如PageRank、尋找連通分量、計算三角形等.將如此海量的圖數(shù)據(jù)存儲在單機環(huán)境中計算效率會非常的低,進而人們開發(fā)了分布式迭代處理系統(tǒng),如Pregel[16],GraphLab[17],Spark[18],Giraph[19].圖劃分是Spark等系統(tǒng)進行分布式計算的提前,每次迭代處理均會引入巨大的通信開銷,這將成為制約分布式處理性能的關鍵因素.一個良好的劃分算法應保證劃分后的子圖在負載均衡的前提下,最小化割邊數(shù)規(guī)模.因此,設計劃分效果優(yōu)越的圖分割算法已經(jīng)成為現(xiàn)有大圖處理系統(tǒng)急需解決的問題,已有的圖劃分算法[20-21]在割邊數(shù)規(guī)模與子域負載平衡上難以同時滿足.針對此問題,本文提出了動態(tài)平衡圖劃分算法——DyBGP,利用多種策略確保各子域負載均衡的基礎上最小化割邊率.
本文的貢獻主要有2個方面:
1) 設計了基于啟發(fā)策略的動態(tài)平衡圖劃分算法,貪心頂點轉移操作能夠有效地減少割邊數(shù)達到局部最優(yōu),分區(qū)容量限制策略用來平衡各子域的負載,又定義了擾動策略,是跳出局部最優(yōu)的關鍵,并利用全局記憶結構存儲最優(yōu)的結果,同時也對該算法的復雜性進行了理論分析;
2) 在真實的圖數(shù)據(jù)上進行實驗分析,分別在切割邊數(shù)量與平衡度2方面分別與Hash,Chunk,Metis進行比較,實驗結果證明了本文所提出算法在平衡圖劃分問題的有效性.
1) 圖劃分.給定一個無向圖G=(V,E),V和E分別表示圖的點集和邊集,K路平衡劃分是將頂點V按照某種策略分配到K個子域中S1,S2,…,Sk,要求在各子域負載平衡的基礎上最小化割邊率,Vi代表第i子區(qū)中的頂點集,V1∪V2∪…∪Vk=V,Vi∩Vj=?,i≠j,ρ為平衡系數(shù)(ρ≥1),理想值為1.0.圖劃分問題可以定義為
(1)
(2)
(3)
式(2)中的ECutij為子域Si到Sj(或者Sj到Si)所有邊的集合(Si≠Sj).
2)g(v,n).點v從所在子域Slocal移向另一個子域Sj(Slocal≠Sj),割邊減少的數(shù)量我們稱之為收益值,|EVi|(i∈[1,K])表示子域Si中點與點v相連的邊數(shù),圖1中有4個子域(S1,S2,S3,S4),點v在子域S3(Slocal)中,|EV1|=3,|EV2|=1,|EV3|=2,|EV4|=2.
n代表點v所移動的目標子域,取獲得收益最大的子域,g(v,n)不僅有正值也有負值(圖1中g(v,1)=1),用數(shù)學形式表示g(v,n)為
(4)
Fig. 1 An example of 4-partitioning圖1 4個子域的圖劃分
當初始劃分完成后,首先選取邊界點作為候選點,然后定義多種策略確保圖的候選點劃分(割邊數(shù)規(guī)模、子域負載)達到最優(yōu),為了擴大搜索范圍我們加入了擾動策略,在此基礎上引入了懲罰措施,懲罰負載過大和過小的子域,算法1用偽代碼詳細描述了此過程,詳細的子過程將分別在2.1~2.3節(jié)、2.5節(jié)介紹.
算法1. 動態(tài)平衡劃分算法(DyBGP).
輸入:初始劃分Pk={V1,V2,…,VK}(見2.1節(jié));
輸出:劃分結果.
步驟1. 初始化參數(shù),擾動次數(shù)(pertur_times),禁忌列表(tabu list),全局記憶結構(global memory structure);
步驟2. For每一個候選點
計算g(v,n);
將點v插入增益結構(見2.2節(jié));
End For
步驟3. Whilepertur_times
計算此時劃分圖狀態(tài);
① If 收斂
執(zhí)行擾動策略跳出局部最優(yōu);
pertur_times=pertur_times-1;
pertur(v,n)(見2.5節(jié));
更新增益結構和全局記憶結構;
② Else If 沒有收斂
Repeat
iter_number=iter_number+1;
greedy_move(v,Sdst)(見2.3節(jié));
更新增益結構和禁忌列表;
Balance_move(v,Sdst)(見2.3節(jié));
更新增益結構和禁忌列表;
Until候選點的收益值都小于等于零
③ 執(zhí)行懲罰策略(見2.5節(jié));
End If
End While
首先將圖分為K個小圖,為了證明本算法是否與初始劃分有關,本文列出了3種初始的圖劃分.
1) Hash.Pregel,GraphLab采用此方法,根據(jù)index=Hash(ID) modK將頂點映射到第index個分區(qū),K為分區(qū)數(shù).此方法時間復雜度很低O(V).
3) Metis.Metis屬于多級劃分,分為3個階段——粗化、劃分、細化.粗化階段是壓縮圖的規(guī)模,時間復雜度大于O(|E|);粗化后的圖用KL等算法進行劃分,時間復雜度為O(N3),N為粗化后的頂點數(shù),細化是將圖恢復成原圖并且在恢復過程中不斷調(diào)整優(yōu)化,時間復雜度大于O(|E|);Metis劃分整個過程時間復雜度大于O(2×|E|+N3).
桶結構首次被Fiduccia和Mattheyses提出[2],是為了改進2劃分的KL算法,把所有相同收益值的點放在木桶結構中的相同位置,根據(jù)收益的大小進行移動操作,時間復雜度明顯降低.Benlic等人[22]提出了針對K-劃分的木桶結構.但是其隨著子域數(shù)量的增加,所消耗的內(nèi)存也是急速地增加,本文也提出了針對本算法的結構.
首先計算候選點的收益值,將點插入到對應的收益值列表中,對應相應的目標子域,每次將最大收益值對應的點移向目標子域.另外,還增加了鄰居列表和鄰居所在列表位置的列表,當點v發(fā)生移動時,我們只需要根據(jù)索引更新點v和點v周圍鄰居點的值,每次更新所需要的時間復雜度與點v的鄰居數(shù)有直接的關系,同樣也大大減少了計算量.圖2舉例說明了將例圖劃分為3個子圖的增益結構.
Fig. 2 An example of gain struct for 3-partitioing圖2 例圖劃分為3個子域的增益結構
為了在候選點上執(zhí)行局部優(yōu)化操作,采用的操作策略:
如果移動之前|Vsrc|<|Vdst|,那么移動之后在子域Sdst選擇某一點v,滿足g(v,Ssrc)≥0,移向目標子域Ssrc.但是如果對于Sdst中任意的點收益值g(v,Ssrc)<0,則不移動.
2) 子域負載限制操作{balance_move(v,Sdst)}.對于同一個子域來說,每次迭代可能會有很多點從不同子域轉移過來,造成子域負載不平衡,因此設計了一種平衡操作,這種操作規(guī)定任意選擇2個子域Si和Sj,如果|Vi|>|Vj|,在子域Si中,選擇某一點v且g(v,Sj)≥0,將點v從Si移向Sj(如果|Vi|<|Vj|,執(zhí)行相反的操作),此操作也可以進一步降低割邊率.
1) 禁忌列表(tabu list).本文所采用的轉移決策具有獨立性,局部的對稱性會導致無效的轉移,如1對互為鄰居的頂點,在迭代中2頂點可能相互轉移到對方所在的子域中不斷地互相多次轉移,影響局部的收斂,為了防止此類無效的轉移,規(guī)定:當某個頂點從Si轉移到另一個子域Sj,在某個常數(shù)時間內(nèi)禁止返回原子域.該算法增加了禁忌表tabu list,禁忌長度定義為t(v,Si)=border(|Vi|)×α,border(|Vi|)表示子域Si的邊界點個數(shù),α是一個因子,在本文中設α=0.05,每次擾動之前,tabu list將清空重新計算.
2) 全局記憶結構(global memory structure).由于擾動具有隨機性,因此,在設定的擾動次數(shù)下,用全局記憶結構存儲劃分效果最好的一次擾動,但是也會相應的增加內(nèi)存消耗.
為了跳出局部最優(yōu),本文設計了一種擾動策略,選擇一個子域Si,在Si中任意選擇其中的γ個內(nèi)點(邊界點之外的點),每個點任意地移向其他子域Sj(Si≠Sj),γ=0.03×inside(|Vi|).點在不斷的移動過程中,有些子域負載規(guī)模可能過大或過小,因此,引出2種懲罰措施.
以上2種策略,都是在收益值大于或等于零的情況下進行移動,因此不會增加圖的割邊率.
本節(jié)對所提出算法的復雜度進行分析,本算法的復雜度主要體現(xiàn)在初始劃分、擾動以及擾動之后的迭代時間,由于初始劃分的隨機性,因此設初始劃分的復雜度為O(t).擾動次數(shù)為pertur_times,每輪擾動之后的迭代次數(shù)為iter_number,擾動之后總的迭代時間為pertur_times×iter_number.本文中每次擾動的頂點數(shù)為0.03×inside(|Vi|),因此擾動需要的時間復雜度為pertur_times×0.03×inside(|Vi|).整個算法時間復雜度O(t+pertur_times×iter_number+pertur_times×0.03×inside(|Vi|).
本節(jié)我們在真實圖上來測試本算法的可行性,介紹實驗的具體步驟及平臺環(huán)境,展示實驗的結果并對這些結果進行分析.
實驗中使用的真實圖數(shù)據(jù)來源于斯坦福大學網(wǎng)絡分析項目,詳細圖信息在表1中.算法用python語言編寫,在AMD phenom Ⅱ X4 955 4 GB上編譯測試.
Table 1 Experimental Data Sets表1 實驗數(shù)據(jù)集
如圖3所示,我們用Hash,Chunk,Metis方法分別對圖loc-Gowalla進行了初始的K-劃分(K=2,6,8,16,32,64),由圖3可以看出Hash的劃分結果最差,當子域數(shù)量為64時割邊率幾乎達到了94%;Metis的劃分效果明顯優(yōu)于Hash和Chunk,隨著子域的增多,割邊比也會增加,但增幅明顯小于Hash與Chunk.
Fig. 3 Results of initial partitioning on loc-Gowalla圖3 基于Hash,Chunk,Metis的K-劃分
擾動策略是跳出局部最優(yōu)的關鍵,因此也對擾動策略進行了實驗分析,在圖4中,在沒有擾動策略的情況下(即算法1中沒有步驟①)割邊率與迭代次數(shù)(iter_number)的關系,橫坐標為迭代次數(shù),縱坐標為割邊率.圖5展示了加入擾動策略之后擾動輪數(shù)與割邊率的關系,橫坐標為擾動次數(shù)(dister_number),縱坐標為割邊率.
Fig. 4 Results of 16-partitioning on p2p-Gnutella8 without perturbation strategy圖4 p2p-Gnutella8上沒有擾動策略的16-劃分結果
Fig. 5 Results of 16-partitioning on p2p-Gnutella8 with perturbation strategy圖5 p2p-Gnutella8上加入擾動策略的16-劃分結果
如圖4所示,由于Hash的初始劃分的割邊率明顯高于Chunk和Metis,在沒有擾動的情況下,Hash迭代收斂的次數(shù)最高,Metis收斂的迭代次數(shù)最少.加入擾動之后,如圖5所示,割邊率都會有進一步降低,隨著擾動次數(shù)的增加,全局記性結構里都會存儲最好的劃分結果,由結果可以看出,劃分結果質(zhì)量的優(yōu)劣與初始劃分沒有關系.
最后在表2中,分別取Hash,Chunk,Metis為本算法的初始劃分,結果取其平均值作為提出算法的劃分結果,括號中的數(shù)值為平衡因子.從表2中,可以看出DyBGP算法在割邊率上明顯提高,而且在平衡度上與Metis相比也有所提升,證明了所提出算法的有效性.
Table 2 Comparion of Our Approach (DyBGP) with Hash, Chunk and Metis表2 本文提出的方法(DyBGP)與Hash,Chunk,Metis結果比較
本文利用初始劃分的局部信息(邊界點)通過啟發(fā)式策略調(diào)整點位置達到局部最優(yōu),為了擴大搜索范圍,我們又定義了擾動策略,用多種策略來確保圖的平衡劃分且最小化割邊率,實驗數(shù)據(jù)也證明了此算法的有效性.平衡圖劃分有著廣泛的應用,隨著大數(shù)據(jù)發(fā)展與應用,在圖并行框架中起著重要的作用,未來,我們將圖劃分運用到具體的大圖迭代系統(tǒng)中,與具體的計算相結合,對于后續(xù)大圖算法的研究有很重要的意義.
[1]Dutt S. New faster kernighan-lin-type graph-partitioning algorithms[C] //Pro of ICCAD-93. Piscataway, NJ: IEEE, 1993: 370-377
[2] Fiduccia C M, Mattheyses R M. A linear-time heuristic for improving network partitions[C] //Proc of the 19th IEEE Conf on Electronic Design Automation. New York: ACM, 1988: 241-247
[3] Garey M R, Johnson D S, Stockmeyer L. Some simplified NP-complete graph problems[J]. Theoretical Computer Science, 1976, 1(3): 237-267
[4] Xu Jinfeng, Dong Yihong, Wang Shiyi. Summary of large-scale graph partitioning algorithms[J]. Telecommunications Science, 2014, 30(7): 100-106 (in Chinese)(許金鳳, 董一鴻, 王詩懿. 大規(guī)模圖數(shù)據(jù)劃分算法綜述[J]. 電信科學, 2014, 30(7): 100-106)
[5] Johnson D S, Aragon C R, McGeoch L A. Optimization by simulated annealing: An experimental evaluation; part I, graph partitioning[J]. Operations Research, 1989, 37(6): 865-892
[6] Rolland E, Pirkul H, Glover F. Tabu search for graph partitioning[J]. Annals of Operations Research, 1996, 63(2): 209-232
[7] Rahimian F, Payberah A H, Girdzijauskas S, et al. JA-BE-JA: A distributed algorithm for balanced graph partitioning[C] //Proc of the 7th IEEE Int Conf on Self-Adaptive and Self-Organizing Systems. Piscataway, NJ: IEEE, 2013: 51-60
[8] Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs[J]. SIAM Journal on Scientific Computing, 1998, 20(1): 359-392
[9] Karypis G, Schloegel K, Kumar V. Parmetis: Parallel graph partitioning and sparse matrix ordering library[OL]. [2016-08-16]. https://www.research-gate.net/publication/238705993_Parmetis_Parallel_graph_partitioning_and_sparse_matrix_ordering_library
[10] Hendrickson B, Leland R. A multi-level algorithm for partitioning graphs[C] //Proc of ACM/IEEE Conf on Supercomputing. New York: ACM, 1995: 28-28
[11] Pellegrini F, Roman J. Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs[C] //Proc of HPCN-Europe 1996. Berlin: Springer, 1996: 493-498
[12] Simon H D. Partitioning of unstructured problems for parallel processing[J]. Computing Systems in Engineering, 1991, 2(2/3): 135-148
[13] Karypis G, Kumar V. Multilevelk-way partitioning scheme for irregular graphs[J]. Journal of Parallel and Distributed Computing, 1998, 48(1): 96-129
[14] Grady L, Schwartz E L. Isoperimetric graph partitioning for image segmentation[J]. IEEE Trans on Pattern Analysis & Machine Intelligence, 2006, 28(3): 469-475
[15] Chen Ling, Li Xue, et al. Mining health examination records—A graph-based approach[J]. IEEE Trans on Knowledge and Data Engineering, 2016, 28(9): 2423-2437
[16] Malewicz G, Austern M H, Bik A J C, et al. Pregel: A system for large-scale graph processing[C] //Proc of the 2010 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2010: 135-146
[17] Low Y, Bickson D, Gonzalez J, et al. Distributed graphLab: A framework for machine learning and data mining in the cloud[J]. Proceedings of the VLDB Endowment, 2012, 5(8): 716-727
[18] Zaharia M, Chowdhury N M, Franklin M J, et al. Spark: Cluster computing with working sets[C] //Proc of the 2nd USENIX Conf on Hot Topics in Cloud Computing. Berkeley, CA: USENIX Association, 2010: 10
[19] Avery C. Giraph: Large-scale graph processing infrastructure on hadoop[OL]. [2016-08-16]. http://giraph.apache.org/
[20] Stanton I, Kliot G. Streaming graph partitioning for large distributed graphs[C] //Proc of the 18th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2012: 1222-1230
[21] Mehrdoost Z, Bahrainian S S. A multilevel tabu search algorithm for balanced partitioning of unstructured grids[J]. International Journal for Numerical Methods in Engineering, 2016, 105(9): 678-692
[22] Benlic U, Hao J K. An effective multilevel tabu search approach for balanced graph partitioning[J]. Computers & Operations Research, 2011, 38(7): 1066-1075
[23] Leskovec J, Kleinberg J, Faloutsos C. Graph evolution: Densification and shrinking diameters[J]. ACM Trans on Knowledge Discovery from Data, 2007, 1(1): 2
[24] Cho E, Myers S A, Leskovec J. Friendship and mobility: User movement in location-based social networks[C] //Proc of the 17th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2011: 1082-1090
[25] Leskovec J, Adamic L A, Huberman B A. The dynamics of viral marketing[J]. ACM Trans on the Web, 2007, 1(1): 228-237
DyBGP:ADynamic-BalancedAlgorithmforGraphPartitioningBasedonHeuristicStrategies
Li Qi1, Zhong Jiang1, and Li Xue2
1(CollegeofComputerScience,ChongqingUniversity,Chongqing400044)2(SchoolofInformationTechnologyandElectricalEngineering,UniversityofQueensland,Brisbane,Australia4072)
With the development of computing technology and the advent of the era of big data, the distributed computing has became a research hotspot. Iterative computation of big graph becomes the focus of the research. Reducing the communication data quantity between subgraph after effective partitioning, it is the key to improve the computational performance, because the existing algorithms are difficult to meet the requirements on both minimizing fraction of egdes cut and load balancing at the same time. In this paper, a dynamic-balanced algorithm for graph partitioning named DyBGP is proposed, and it is used to solve the problem of balanced partition. Based on ensuring the partitioning of subgraph boundary vertices optimal, the perturbation strategy to jump out of local optimum to expand the search space is used. Finally, our algorithm is verified the feasibility in the real-world graph, respectively from the balance coefficient and the scale of edges cut compared with the traditional algorithms, such as Hash, Chunk and Metis. In the number of edges cut, it is decreased about 40%, 30%, 5% with our algorithm under specifying perturbation times. In the balance coefficient, our algorithm is more optimized than Metis. The experimental results show that the algorithm is effective.
balanced graph partitioning; heuristic strategies; load balancing; distributed computing; local optimization
his PhD from Queensland University of Technology in 1997. His main research interests include opinion analysis from social media, big data analytics, knowledge discovery from sequences, mining distributed, high-speed, time-variant data streams, etc.
2016-09-09;
2017-02-21
國家“八六三”高技術研究發(fā)展計劃基金項目(2015AA015308);重慶市社會事業(yè)與民生保障科技創(chuàng)新專項(cstc2017shmsA0641)
This work was supported by the National High Technology Research and Development Program of China (863 Program) (2015AA015308) and the Social Undertakings and Livelihood Security Science and Technology Innovation Funds of CQ CSTC (cstc2017shmsA0641).
鐘將(zhongjiang@cqu.edu.cn)
TP301.6
LiQi, born in 1987. PhD candidate at the College of Computer Science, Chongqing University. His main research interests include data mining and graph computing, etc.
ZhongJiang, born in 1974. Recevied his PhD degree in computer science from Chongqing University in 2005. Professor and PhD supervisor. His main research interests include data mining, management information system, trusted computer system, service computing, etc.