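Hu Yi, Zhu Zijiang
Abstract: Quantum evolution methods for big data clustering in traditional cloud environments have relatively low clustering accuracy. To reduce storage overhead and improve data management and scheduling capability, a big data clustering algorithm for the cloud environment based on optimized particle swarm optimization is proposed. The principle of big data clustering in the cloud environment is analyzed. Taking traditional fuzzy C-means clustering as the basis, the big data clustering algorithm is improved with the particle swarm clustering algorithm to achieve spatial segmentation and obtain a fuzzy clustering of the massive data in the cloud storage system. The particle swarm clustering method is used to allocate the discrete cost of the clustering data and obtain the information concentration of data clustering; combined with the clustering constraints of particle swarm optimization, this yields the optimal solution for the big data clustering centers in the cloud environment. Simulation results show that the algorithm achieves relatively high clustering accuracy and good convergence performance.
Keywords: big data clustering; cloud environment; particle swarm optimization; spatial segmentation; fuzzy clustering; simulation testing
CLC number: TN919-34    Document code: A    Article ID: 1004-373X(2020)14-0072-04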
PSO-based big data clustering algorithm in cloud environment
HU Yi, ZHU Zijiang
(South China Business College Guangdong University of Foreign Studies, Guangzhou 410545, China)
Abstract: As the clustering accuracy of the quantum evolution method of the big data clustering in the traditional cloud environment is relatively low, a PSO-based big data clustering algorithm in the cloud environment is proposed to reduce the storage cost and improve the abilities of data management and scheduling. The principle of big data clustering in the cloud environment is analyzed. By taking the traditional fuzzy C-means clustering as the basis, the big data clustering algorithm is improved by means of the particle swarm clustering algorithm, so as to achieve the spatial segmentation and get the fuzzy clustering of mass data in the cloud storage system. The discrete cost of clustering data is distributed by means of the particle swarm clustering method to get the information concentration of data clustering, and is combined with the clustering constraint condition of particle swarm optimization to get the optimal solution of big data clustering center in the cloud environment. The simulation results show that the algorithm has high accuracy of data clustering and good convergence performance.
Keywords: big data clustering; cloud environment; particle swarm optimization; space division; fuzzy clustering; simulation testing
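0  Introduction
The concept of cloud computing was proposed by IBM in 2007. Cloud computing is the latest computing paradigm to develop after parallel processing, distributed computing, and grid computing. It integrates interconnected computing, data, storage, and application resources to achieve multi-level virtualization and abstraction, so that users only need a network connection to draw on its computing and storage capacity. Against the background of cloud computing, big data information processing can realize data clustering, and the characteristic parameters of big data can be used to analyze the data. Data clustering supports the construction of big data collections, and service analysis is carried out through pattern recognition and diagnosis.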
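1  Design of big data storage in the cloud environment
Cloud computing refers to dynamically extending structural models and storage space over the modern Internet. To perform classification mining and big data storage against a cloud computing background, the architecture of the big data storage mechanism must first be established. In the cloud environment, big data storage deploys cloud computing on computer clusters through virtualized storage. It is composed of a USB disk layer, a structure layer, computers, and other components. Enterprises access it through terminals, and computation is carried out by distributed computers.
The big data storage structure in the cloud environment is shown in Fig. 1.
Using the structure shown in Fig. 1, physical allocation is applied to the cloud computing virtual machines. The optimized clustering algorithm is realized through Eqs. (1) and (2), and the optimal solution is used to perform the physical allocation of big data feature clusters in the cloud computing setting. The formulas are: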
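\[ x = \frac{1}{2\mu}\left(1 + \mu + \sqrt{(\mu+1)(\mu-3)}\right) \tag{1} \]
\[ x = \frac{1}{2\mu}\left(1 + \mu + \sqrt{(\mu+1)(\mu-3)}\right) \tag{2} \]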
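As a quick numerical check, the expression can be evaluated directly. The sketch below assumes the flattened formula reads x = (1 + μ + sqrt((μ+1)(μ−3))) / (2μ), with the product under a square root; that reading is a reconstruction, not something the text states. The function name `eval_x` is hypothetical.

```python
import math


def eval_x(mu: float) -> float:
    """Evaluate x = (1 + mu + sqrt((mu + 1) * (mu - 3))) / (2 * mu).

    Assumption: the product (mu + 1)(mu - 3) sits under a square root,
    so the value is real only when that product is non-negative.
    """
    if mu == 0.0:
        raise ValueError("mu must be non-zero")
    radicand = (mu + 1.0) * (mu - 3.0)
    if radicand < 0.0:
        raise ValueError("(mu + 1)(mu - 3) must be non-negative")
    return (1.0 + mu + math.sqrt(radicand)) / (2.0 * mu)


if __name__ == "__main__":
    for mu in (3.0, 3.5, 4.0):
        print(mu, eval_x(mu))
```

Under this reading the expression is real only for μ ≥ 3 or μ ≤ −1, which is why the sketch rejects other values instead of returning a complex number.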
To keep particles from becoming trapped in local optima, the big data feature vector X_i is archived. The calculation formula is:
\[ l_i(k) = (1-\rho)\, l_i(k-1) + \gamma f\big(x_i(k)\big) \]
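The update above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes ρ acts as a decay coefficient in (0, 1), γ as a gain on the fitness term, and f as a user-supplied fitness function; the text does not define these symbols here. The function name `update_levels`, the default parameter values, and the example fitness are hypothetical.

```python
from typing import Callable, List, Sequence

Position = Sequence[float]


def update_levels(
    prev_levels: Sequence[float],
    positions: Sequence[Position],
    fitness: Callable[[Position], float],
    rho: float = 0.4,    # assumed decay coefficient
    gamma: float = 0.6,  # assumed gain on the fitness term
) -> List[float]:
    """Apply l_i(k) = (1 - rho) * l_i(k-1) + gamma * f(x_i(k)) to each particle."""
    if len(prev_levels) != len(positions):
        raise ValueError("prev_levels and positions must have the same length")
    return [
        (1.0 - rho) * level + gamma * fitness(x)
        for level, x in zip(prev_levels, positions)
    ]


def neg_sq_norm(x: Position) -> float:
    """Example fitness (an assumption): larger when x is closer to the origin."""
    return -sum(v * v for v in x)


if __name__ == "__main__":
    levels = [5.0, 5.0]
    swarm = [(0.0, 0.0), (1.0, 2.0)]
    print(update_levels(levels, swarm, neg_sq_norm))
```

Because the previous value is scaled by (1 − ρ) at every step, older fitness contributions fade geometrically, so a particle's stored value tracks its recent positions rather than its whole history.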
The clustering threshold is set to N_th; when N_eff