LIU Xiaodu ZHAO Huiqi
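Abstract Traditional techniques for the secure extraction of hidden big data features ignore the public-key and key encapsulation of big data ciphertext, and the categories of hidden features are disordered, so these techniques suffer from low extraction accuracy and high redundancy. To address this, this paper proposes a secure extraction method for hidden big data features based on a hybrid cryptosystem. Big data ciphertext is generated through the public-key encapsulation and key encapsulation mechanisms of the hybrid cryptosystem. Symmetric and asymmetric encryption methods are then designed according to the ciphertext content and used to classify the hidden features. The classified features are used to construct a phase space of hidden big data features, and the correlation dimension between the data is calculated, achieving secure extraction of the hidden features. Experimental results show that, compared with traditional methods, the proposed method has low redundancy, an average classification accuracy for hidden big data features of up to 95%, and a low extraction error, verifying that it offers better application performance.
Key words hybrid cryptosystem; big data; hidden features; secure extraction; hybrid algorithm; correlation dimension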
CLC number TP393 Document code A
0 Introduction
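To keep big data transmission secure, big data should be encrypted [1-2]. As data volumes surge, the categories of hidden big data features become disordered, and existing encryption techniques can no longer meet big data encryption requirements [3]. Network intrusions of all kinds are also intensifying, for example attackers illegally collecting, copying, and publishing information. To guard big data against these risks, secure extraction of hidden big data features is needed to protect data exchange [4]. Traditional secure extraction techniques, however, can no longer keep up with the growth of big data.
Traditional methods for the secure extraction of hidden big data features have limitations. Wang [5] discussed problems in the language of network user agreements, such as piled-up jargon, vague expression, irregular sentences, and traps hidden in sub-language information, and proposed a secure feature extraction method that strengthens the supervision of network user agreements. Applied to hidden big data features, that method mainly addresses data supervision and extraction in the semantics of user agreements. Its computation is very complex and it ignores the public-key and key encapsulation of big data ciphertext, so its extraction performance is poor. Cai et al. [6] proposed a big data mining method based on sparse representation and feature weighting. It classifies features by solving for the sparse solution of a linear system, using vector norms to recast that solution as the optimization of an objective function. After classification, features are extracted to reduce the data dimension, and the data are finally weighted according to their distribution to carry out the mining. Applied to hidden big data features, this method relies on dimension reduction through feature extraction but does not account for redundant feature data, which makes extraction inefficient. Moreover, both methods ignore the public-key and key encapsulation of big data ciphertext, so their extraction accuracy is low and their redundancy is high.
A hybrid cryptosystem combines symmetric and asymmetric encryption. This paper therefore proposes a secure extraction technique for hidden big data features based on a hybrid cryptosystem: a hybrid algorithm speeds up big data encryption, the correlation dimension is extracted in real time, and the different categories of hidden features are used to construct a phase space of hidden big data features. Fusing symmetric-key and public-key encapsulation mechanisms improves the concealment of the data, and encrypting the data digest with an elliptic curve encryption algorithm improves the security of key transmission. Experimental results show that the proposed approach improves the efficiency of secure extraction of hidden big data features and meets the demands of the big data era.
1 Secure extraction of hidden big data features based on a hybrid cryptosystem
The technique establishes a hybrid cryptosystem, designs symmetric and asymmetric encryption methods, selects hidden big data features, constructs the phase space of those features, and introduces correlation-based secure extraction of the hidden features.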
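1.1 Hybrid cryptosystem
The hybrid cryptosystem fuses symmetric-key and public-key encapsulation mechanisms to improve the concealment of big data. Its construction principle is shown in Fig. 1.
As Fig. 1 shows, to generate a highly secure hybrid ciphertext, the hybrid cryptosystem performs the following operations after receiving the big data to be processed [7]: part of the ciphertext is generated by preliminary processing through the public-key encapsulation mechanism, and the remainder is generated by in-depth processing through the key encapsulation mechanism.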
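1.2 Hybrid algorithm
To speed up big data encryption, the hybrid algorithm of the hybrid cryptosystem is used. Its flow is shown in Fig. 2.
As Fig. 2 shows, the key of the Advanced Encryption Standard (AES) algorithm is a data digest generated from the big data plaintext by a hash algorithm. To strengthen the security of key transmission, the digest is encrypted with an elliptic curve encryption algorithm. After the valid data field has been located, AES is used to generate the big data ciphertext. The encrypted key and the ciphertext [8-10] are then transmitted to the designated recipient. Based on the content of this ciphertext, symmetric and asymmetric encryption methods are designed. The two methods are constructed by category, as shown in Fig. 3.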
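The flow of the hybrid algorithm (hash the plaintext into a digest, use the digest as the symmetric key, protect the digest with an asymmetric step, then encrypt the data) can be illustrated with a minimal Python sketch. It is not the authors' implementation. To stay within the standard library, a SHA-256 counter-mode keystream stands in for AES and an ElGamal-style wrap over a toy prime field stands in for the elliptic curve step. The function names and the parameters `P` and `G` are assumptions made for illustration, and the stand-ins are not secure for real use.

```python
import hashlib
import secrets

# Toy group parameters (assumption): a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in for AES: XOR data with a SHA-256 counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def _mask(shared: int) -> bytes:
    """Hash a shared group element into a 32-byte mask."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def generate_keypair():
    """Return (private, public) for the toy asymmetric step."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def wrap_key(public: int, key: bytes):
    """Stand-in for the elliptic curve step: ElGamal-style key wrapping."""
    ephemeral = secrets.randbelow(P - 2) + 1
    mask = _mask(pow(public, ephemeral, P))
    wrapped = bytes(k ^ m for k, m in zip(key, mask))
    return pow(G, ephemeral, P), wrapped


def unwrap_key(private: int, hint: int, wrapped: bytes) -> bytes:
    """Recover the symmetric key from the wrapped form."""
    mask = _mask(pow(hint, private, P))
    return bytes(w ^ m for w, m in zip(wrapped, mask))


def hybrid_encrypt(public: int, plaintext: bytes):
    """Digest -> symmetric key, wrap the key, encrypt the data."""
    digest = hashlib.sha256(plaintext).digest()  # data digest used as the key
    hint, wrapped = wrap_key(public, digest)
    ciphertext = keystream_xor(digest, plaintext)
    return hint, wrapped, ciphertext


def hybrid_decrypt(private: int, hint: int, wrapped: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the key, decrypt, and check the digest against the key."""
    key = unwrap_key(private, hint, wrapped)
    plaintext = keystream_xor(key, ciphertext)
    if hashlib.sha256(plaintext).digest() != key:
        raise ValueError("digest mismatch")
    return plaintext
```

Because the symmetric key is derived from the plaintext digest, as the text describes, the receiver can recompute the digest after decryption and compare it with the unwrapped key, which is what `hybrid_decrypt` does. One consequence of this design is that identical plaintexts always yield the same symmetric key.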
2 Experimental analysis
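A large volume of financial data from a company was used as the experimental dataset, with Matlab as the experimental platform. The computer had a 3.20 GHz CPU and 4.00 GB of memory and ran Windows 7 SP1, with Visual Studio 2010 as the runtime environment. The experimental environment was built on the Matlab platform, and the data parameters are listed in Table 1. The comparison methods were the secure feature extraction method based on strengthened supervision of network user agreements [5] and the big data mining method based on sparse representation and feature weighting [6].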
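2.1 Redundancy test
The proposed method and the methods of [5] and [6] were each used to extract hidden big data features from the experimental dataset, and the redundancy of the extracted features was compared. The results are shown in Fig. 4.
As Fig. 4 shows, the hidden features extracted by the proposed method have an average redundancy of only 1.5%, lower than that of the other two methods. This indicates that the proposed method effectively removes redundant feature quantities from the hidden big data features and extracts more useful features. The reason is that the proposed method extracts the correlation dimension in real time and uses the different categories of hidden features to construct the phase space of hidden big data features.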
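The phase-space and correlation-dimension step referred to here is not specified in detail in this section. As a hedged illustration, the sketch below assumes the standard Grassberger-Procaccia estimator: a series is delay-embedded into phase-space vectors, the correlation sum C(r) is the fraction of vector pairs closer than r, and the correlation dimension is estimated as the slope of log C(r) against log r. The function names, the embedding dimension `dim`, the delay `tau`, and the choice of radii are all illustrative assumptions, not values from the paper.

```python
import math


def delay_embed(series, dim, tau):
    """Build phase-space vectors (s[i], s[i+tau], ..., s[i+(dim-1)*tau])."""
    n = len(series) - (dim - 1) * tau
    if n < 2:
        raise ValueError("series too short for this embedding")
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n)]


def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer than radius (maximum norm)."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = max(abs(a - b) for a, b in zip(points[i], points[j]))
            if dist < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) against log r over the given radii."""
    xs, ys = [], []
    for r in radii:
        c = correlation_sum(points, r)
        if c > 0:  # log is undefined when no pair falls within r
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two radii with a nonzero correlation sum")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a sanity check, points that lie along a line, such as the two-dimensional delay embedding of a uniformly increasing series, give an estimate close to 1. The pairwise loop is quadratic in the number of points, so a real big data workload would need sampling or a spatial index.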
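2.2 Robustness test
To further verify extraction performance, the robustness of the three methods in the secure extraction of hidden big data features was tested. Higher robustness indicates a more stable extraction process. The comparison is shown in Fig. 5.
As Fig. 5 shows, the proposed method extracts hidden big data features well, with a robustness close to 100%, whereas the other two methods reach only 85%, indicating that the extraction process of the proposed method is more stable. This is because the proposed method uses the hybrid algorithm of the hybrid cryptosystem and encrypts the data digest with an elliptic curve encryption algorithm, which strengthens the security of key transmission.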
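2.3 Accuracy test
To verify the effectiveness of the proposed method, the extraction errors of the three methods were compared at different numbers of iterations (Table 2). The classification accuracy results are shown in Fig. 6. A smaller extraction error and a higher classification accuracy indicate more accurate secure extraction. As Table 2 shows, the mean error of the proposed method is 2.69%, which is 11.09 and 5.51 percentage points lower than the corresponding values for the other two methods, respectively, indicating that the proposed method offers better application performance. This is because fusing symmetric-key and public-key encapsulation mechanisms in the hybrid cryptosystem improves the concealment of the data and reduces the extraction error.
As Fig. 6 shows, the average classification accuracies of the other two methods are 73% and 80%, whereas that of the proposed method is 95%, indicating that the proposed method is highly accurate and converges quickly. This is because the proposed method locates the valid data field and then uses AES to generate the big data ciphertext, after which the encrypted key and ciphertext are transmitted to the designated recipient.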
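2.4 Running time test
The three methods were applied to the experimental data to extract hidden big data features, and their running times were measured. A shorter running time indicates more efficient extraction. The comparison is shown in Fig. 7.
As Fig. 7 shows, the running time of the proposed method is below 15 ms, while those of the other two methods exceed 40 ms, indicating that the proposed method is more efficient. This is because, after classifying the hidden security feature information with the symmetric and asymmetric encryption methods, the proposed method selects hidden big data features by evaluating the feature set.
3 Conclusion
To address the disordered feature categories, low extraction accuracy, and high redundancy of traditional methods, this paper proposes a secure extraction method for hidden big data features based on a hybrid cryptosystem. Experimental results show that the method has a low extraction error, good extraction performance, low redundancy, and high accuracy, and that its running time is below 15 ms, giving high extraction efficiency. Applying more advanced techniques to refine the processing of big data after extraction, and examining the security of the algorithm, are the main directions for future work.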
References
[1] YANG Guoqiang, DING Hangchao, ZOU Jing, et al. A big data security scheme based on high-performance cryptography implementation[J]. Journal of Computer Research and Development, 2019, 56(10): 2207-2215
[2] XU Chao, CHEN Yong, GE Hongmei, et al. Audit technology research based on big data[J]. Acta Electronica Sinica, 2020, 48(5): 1003-1017
[3] WANG Yongkun, LUO Xuan, JIN Yaohui. A hybrid big data platform based on private cloud VMs and bare metals[J]. Computer Engineering & Science, 2018, 40(2): 191-199
[4] YANG Lili. Mixed cryptography system encrypted by big data on marine internet of things[J]. Ship Science and Technology, 2020, 42(4): 196-198
[5] WANG Anqi. Network user agreement language under big data strategy: problems and suggestions[J]. Journal of Eastern Liaoning University (Social Sciences), 2019, 21(3): 69-73
[6] CAI Liuping, XIE Hui, ZHANG Fuquan, et al. Study on big data mining method based on sparse representation and feature weighting[J]. Computer Science, 2018, 45(11): 256-260
[7] Zhang C, Liu X J. Feature extraction of ancient Chinese characters based on deep convolution neural network and big data analysis[J]. Computational Intelligence and Neuroscience, 2021, 2021: 2491116
[8] ZHANG Qixing, FU Jingqi. Physical layer security key generation method based on channel feature extraction[J]. Journal of Electronic Measurement and Instrumentation, 2019, 33(1): 16-22
[9] WANG Yan, LI Jun, ZENG Hui, et al. Real-time feature extraction algorithm based on mutual information[J]. Journal of Chinese Computer Systems, 2019, 40(6): 1242-1247
[10] DUAN Dagao, ZHAO Zhendong, LIANG Shaohu, et al. Password cracking algorithm using conditional variational auto-encoders[J]. Application Research of Computers, 2020, 37(3): 821-823, 837
[11] WU Ying, LI Xiaoling, TANG Jinglei. Internet of things big data feature selection method based on particle filter and improved ABC algorithm on Hadoop platform[J]. Application Research of Computers, 2019, 36(11): 3297-3301
[12] Cole J M. A design-to-device pipeline for data-driven materials discovery[J]. Accounts of Chemical Research, 2020, 53(3): 599-610
[13] LIU Botao, PENG Changgen, WU Ruixue, et al. Based on MILP method for security analysis of LED[J]. Application Research of Computers, 2020, 37(2): 505-509, 517
[14] Zhang F, Yang Y H. Feature vector extraction algorithm based on big data in engineering quality[J]. E3S Web of Conferences, 2021, 257: 02029
[15] Elaggoune Z, Maamri R, Boussebough I. A fuzzy agent approach for smart data extraction in big data environments[J]. Journal of King Saud University: Computer and Information Sciences, 2020, 32(4): 465-478
Secure extraction of hidden big data features based on hybrid cryptosystem
LIU Xiaodu ZHAO Huiqi
1 Information Center of China Association for Science and Technology, Beijing 100863
2 College of Intelligent Equipment, Shandong University of Science and Technology, Taian 271019
Abstract The disordered categories of hidden big data features, combined with neglect of the public-key and key encapsulation of big data ciphertext, result in low extraction accuracy and high redundancy in traditional methods for extracting hidden big data features. Here, a secure extraction approach for hidden big data features is proposed based on a hybrid cryptosystem. First, the big data ciphertext is generated through the public-key encapsulation and key encapsulation mechanisms of the hybrid cryptosystem. Second, the hidden big data features are categorized using symmetric and asymmetric encryption methods designed according to the content of the ciphertext. The categorized features are then used to construct the phase space of hidden big data features and to calculate the correlation dimension between the data, thus realizing the secure extraction of hidden big data features. The experimental results show that, compared with traditional methods, the proposed approach has low redundancy, an average classification accuracy for hidden big data features of up to 95%, and a low feature extraction error, verifying the feasibility and application prospects of the proposed approach.
Key words hybrid cryptosystem; big data; hidden features; secure extraction; hybrid algorithm; correlation dimension