Research on acquisition method of cloud storage mass data
DENG Wenwen1,2, SUN Chengming3, QIN Peiliang2
(1. School of Accounting & Information Systems, Virginia Polytechnic Institute and State University, Virginia 24061, U.S.A.;
2. Smart Agriculture School of Suzhou Polytechnic Institute of Agriculture, Suzhou 215008, China;
3. Agricultural College of Yangzhou University, Yangzhou 225127, China)
Abstract: Traditional data acquisition methods collect data mainly through data features and ignore the influence of the acquisition process on those features, which makes acquisition slow and error-prone. To address this problem, a cloud storage mass data acquisition method that combines REID technology with the F statistic is proposed. After the data acquisition principle is analyzed, nonlinear compensation is applied to the raw cloud storage data, parameters are set to preprocess the data, a hardware mechanism that accesses memory directly is established, and part of the transmission program is given. A clustering algorithm is then used to cluster the cloud storage data, and the F statistic is used to test the validity of the resulting discriminant function, which completes the acquisition of cloud storage mass data. The experimental results show that the improved method achieves higher acquisition precision and completeness than the traditional method.
Keywords: cloud storage; mass data acquisition; REID technology; F statistic; nonlinear compensation; clustering algorithm
CLC number: TN911-34; TP391  Document code: A  Article ID: 1004-373X(2018)14-0010-04
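0 Introduction
The volume of data on the network has grown explosively with the rapid development of the Internet. This growth has brought high storage costs, low storage reliability, and difficulty in managing large amounts of data, problems that have long troubled enterprises [1]. As a result, many enterprises have begun to consider separating data storage from the enterprise itself and handing it over to specialized cloud storage service providers. Cloud storage technology combines distributed file systems, network technology, and cluster applications, and uses application software to make different types of storage devices in the network work together. It offers high reliability, high generality, high scalability, and large storage capacity, and it therefore places higher demands on data acquisition [2]. Traditional methods mainly use Ethernet and the TCP/IP network communication protocols in each acquisition step, and reduce acquisition delay by improving and simplifying the standard network protocols. However, they ignore the influence of data features on the acquisition result, which leads to long acquisition times and large errors. This paper therefore proposes a cloud storage mass data acquisition method that combines REID technology with the F statistic.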
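1 Data acquisition principle and characteristics
Current cloud storage data acquisition techniques are mostly based on barcode technology, which is mature and inexpensive. Because cloud storage is fast and the storage environment can be harsh, barcode information is easily misread or missed under interference [3], so REID technology is adopted in most cases. The data acquisition principle is shown in Fig. 1. After a passive electronic tag that stores data enters the magnetic field, it receives the signal sent by the reader, and the data sensing module obtains the form in which the cloud storage data is held on the chip. The reader receives and decodes the stored data and passes it to the acquisition system, which finally achieves automatic acquisition of cloud storage mass data.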
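2 Data preprocessing
The data written to the FIFO by the mass data acquisition program consists of a frame header, a channel number, and the data itself, and the raw data must then be extracted and processed. First, nonlinear compensation is applied to the raw data to obtain ideal cloud storage data [4]. Part of the data is then selected for calculation. A counter is added to the calculation loop: each time one value is read, the counter is incremented by 1, and the loop stops once enough cloud storage data has been obtained.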
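The counter-driven read loop described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' LabVIEW FPGA program: the frame layout `(header, channel, value)`, the header marker `0xAA`, the quadratic `compensate` function with its coefficients, and the names `read_samples` and `needed` are all hypothetical.

```python
from collections import deque


def compensate(raw, coeffs=(0.0, 1.0, 0.01)):
    """Hypothetical nonlinear compensation: c0 + c1*x + c2*x^2."""
    c0, c1, c2 = coeffs
    return c0 + c1 * raw + c2 * raw * raw


def read_samples(fifo, needed):
    """Pop frames from the FIFO until `needed` valid values have been read."""
    count = 0          # the counter from the text: +1 for every value read
    by_channel = {}    # compensated values, grouped by channel number
    while fifo and count < needed:
        header, channel, raw = fifo.popleft()
        if header != 0xAA:  # assumed frame-header marker; skip malformed frames
            continue
        by_channel.setdefault(channel, []).append(compensate(raw))
        count += 1
    return by_channel, count


# Ten frames: values 0..4 on two channels, interleaved channel by channel.
fifo = deque((0xAA, ch, float(v)) for v in range(5) for ch in (0, 1))
samples, count = read_samples(fifo, needed=6)
print(count, samples)
```

Grouping the values by channel number as they are read keeps each value tied to its channel, which is the property the host computer relies on when it attributes demodulated results to channels.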
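The LabVIEW FPGA software provides a control for computing phase together with the corresponding calculation method, and the parameters of the algorithm can be set inside the control [5]. The algorithm in the control handles large data volumes and computes quickly, so it is only necessary to feed the data required by the single-cycle Timed Loop (SCTL) into the control to compute the selected feature results, which are then written to the corresponding memory [6-7]. The data in memory is then read back and entered at the positions of the corresponding feature points in the calculation control, and the center frequency point is selected on this basis. The neighboring frequency points are written to the DMA FIFO, which completes the mass data preprocessing. The overall framework is shown in Fig. 2. Note that the amount of data must correspond one-to-one with the number of channels; otherwise the host computer cannot determine which channel a demodulated result belongs to.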
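3 Data transmission program
For data transfer between memory and data at the storage layer, the processed cloud storage data is first fed to the transport layer. DMA, which is widely used in data collection, provides a hardware mechanism that accesses memory directly: data is moved straight to the storage layer over the link between main memory and the peripheral device [8], with no further handling by the processor. With this mechanism, the throughput to and from the device increases considerably. Because mass data transmission is highly accurate, passing data through a FIFO between different channels within a single transport layer is relatively easy, whereas moving massive cloud storage data between different transport layers is more complex [9]. During acquisition, the features of the cloud storage data directly affect the acquisition rate, and the data must be read out completely to prevent any cloud storage data from being lost [10]. The DMA FIFO approach is therefore required. Part of the cloud storage data transmission program is as follows: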
} // end of data acquisition
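The lossless hand-off that a DMA FIFO is meant to provide can be illustrated in software. The sketch below is a minimal analogy in Python, not the authors' transmission program and not real DMA: a bounded `queue.Queue` stands in for the FIFO, the producer blocks while the FIFO is full instead of dropping frames, and the names `producer`, `consumer`, `frames`, and the depth of 8 are assumptions made for the example.

```python
import queue
import threading


def producer(fifo, frames):
    """Acquisition side: push every frame into the bounded FIFO."""
    for frame in frames:
        fifo.put(frame)  # blocks while the FIFO is full, so nothing is dropped
    fifo.put(None)       # sentinel: acquisition has finished


def consumer(fifo, sink):
    """Storage side: drain the FIFO in arrival order until the sentinel."""
    while True:
        frame = fifo.get()
        if frame is None:
            break
        sink.append(frame)


fifo = queue.Queue(maxsize=8)  # assumed FIFO depth, chosen only for illustration
frames = list(range(1000))     # stand-in for preprocessed data frames
received = []

t_prod = threading.Thread(target=producer, args=(fifo, frames))
t_cons = threading.Thread(target=consumer, args=(fifo, received))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()

print(len(received), received == frames)
```

The design point is back-pressure: because the writer waits when the buffer is full, the reader always sees every frame in order, which is the completeness requirement stated in the text.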
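4 Optimization of the cloud storage mass data acquisition method
With the cloud storage mass data preprocessed and transmitted, the acquisition method is optimized. The detailed steps are as follows, and the flowchart is shown in Fig. 3.
1) Training data set. Mass data is collected from cloud storage. Apart from the portion of required data that is set aside, the remaining data forms the training data set used in the acquisition calculations [11].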
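2) Clustering algorithm. Depending on actual needs, the k-medoids clustering algorithm is used to group the training data into $k$ classes. Because the storage process interferes with the cloud storage data, under ideal conditions the two cloud storage data signals are $f_1=A+B\cos\varphi(t)$ and $f_2=A+B\sin\varphi(t)$, where $A$ is the interference parameter, $B$ is the interference frequency, $\varphi(t)$ is the data information after interference, $\varphi(t)=2kL(t)$, $k=2\pi/\lambda_1$ (here $k$ is this coefficient, not the number of classes), and $L(t)$ is the duration of the interference. To perform the clustering, $L(t)$ follows once $\varphi(t)$ has been obtained from the data information. The interference data are extracted and normalized to give $g_1=\cos\varphi(t)$ and $g_2=\sin\varphi(t)$, which are then clustered, and the cloud storage data information $\varphi(t)$ is: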
$$\varphi(t)=\int_{0}^{1}\left(g_1 g_2-g_1\right)\mathrm{d}t \tag{1}$$
3) Based on the training data set and its clustering result, a Fisher discriminant function is established, and the discriminant function is computed using the theory of variance.
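4) Discriminant criterion. A newly measured sample is substituted into the discriminant function to decide whether the new sample $x$ needs to be acquired. That is, the sample $x$ with $p$ indicators is substituted into the discriminant function so that $\lambda(\alpha)=\dfrac{\alpha' A\alpha}{\alpha' E\alpha}$ reaches its maximum. If the corresponding value satisfies $y_i=\max\limits_{1\le h\le k} y_h$, then $x\in G_i$. Suppose the data is subject to interference for a duration $L_0$ and the input quantities are $\lambda_1$ and $\lambda_2$, with corresponding data information $\varphi_1$ and $\varphi_2$. Since $\varphi=2kL$ with $k=2\pi/\lambda$, each input gives $\varphi_i=4\pi L_0/\lambda_i$, so acquiring cloud storage mass data requires:
$$\varphi_1-\varphi_2=\frac{4\pi(\lambda_2-\lambda_1)}{\lambda_1\lambda_2}L_0=n\pi+\frac{\pi}{2} \tag{2}$$
where $n=0,1,2,\ldots$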
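5) Testing the validity of the acquisition discriminant function. The F statistic is used to test whether the established discriminant function is valid. If it is valid, the cloud storage mass data can be acquired; otherwise another method must be sought.
6) End of acquisition. The samples $x$ that satisfy $y_i=\max\limits_{1\le h\le k} y_h$ are acquired.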
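Steps 2) to 5) can be sketched end to end on synthetic data. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the two Gaussian point clouds, the farthest-first initialization and alternating update used for k-medoids, the two-class form of the Fisher direction $w=S_w^{-1}(m_a-m_b)$, and the one-way ANOVA F statistic computed on the projected scores are all choices made here, because the source does not specify them.

```python
import math
import random


def k_medoids(points, k, iters=20):
    """Alternating k-medoids with farthest-first initialization."""
    medoids = [0]
    while len(medoids) < k:  # next medoid: the point farthest from the chosen ones
        medoids.append(max(range(len(points)), key=lambda i: min(
            math.dist(points[i], points[m]) for m in medoids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster emptied
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            updated.append(min(members, key=lambda i: sum(
                math.dist(points[i], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return labels


def fisher_direction(a, b):
    """Two-class Fisher direction w = Sw^{-1} (mean_a - mean_b) for 2-D samples."""
    ma = [sum(v[d] for v in a) / len(a) for d in range(2)]
    mb = [sum(v[d] for v in b) / len(b) for d in range(2)]
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter matrix Sw
    for group, m in ((a, ma), (b, mb)):
        for v in group:
            d = (v[0] - m[0], v[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


rng = random.Random(1)  # synthetic training data: two well-separated clouds
points = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
          + [(rng.gauss(20, 1), rng.gauss(20, 1)) for _ in range(20)])
labels = k_medoids(points, k=2)
group_a = [p for p, lab in zip(points, labels) if lab == 0]
group_b = [p for p, lab in zip(points, labels) if lab == 1]
w = fisher_direction(group_a, group_b)
scores_a = [w[0] * x + w[1] * y for x, y in group_a]
scores_b = [w[0] * x + w[1] * y for x, y in group_b]
f_value = f_statistic([scores_a, scores_b])
print(f"cluster sizes: {len(group_a)}, {len(group_b)}; F = {f_value:.1f}")
```

A large F value indicates that the projected scores of the two clusters differ far more between groups than within them, which is the sense in which the discriminant function would be judged valid before any data is acquired. Because the direction is fitted on the same data that is tested, the value is optimistic and serves only as an illustration.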
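5 Analysis of experimental results
To verify the effectiveness and feasibility of the improved method for cloud storage data acquisition, the improved method is compared with the traditional method within a 0.5 cm × 0.5 cm region, using the amount of data acquired and its completeness as indicators. The results are shown in Fig. 4 and Fig. 5.
Fig. 4 and Fig. 5 show that, within the 0.5 cm × 0.5 cm region, the traditional method repeatedly acquires incomplete cloud storage data as the distance from the dividing line increases, and the amount acquired is too small, so the acquisition error grows and the time consumed increases. With the improved method, the data decreases gradually along the dividing line, but no incomplete data appears, the two sides of the dividing line correspond to each other, and the amount acquired is larger, which gives the improved method a clear advantage.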
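6 Conclusion
This paper proposes a cloud storage mass data acquisition method that combines REID technology with the F statistic, with the aim of reducing the energy consumed by data acquisition and improving acquisition efficiency. Compared with the traditional acquisition method in the same region, the proposed method lowers the acquisition error, raises the accuracy, and acquires data more completely. The improved method mainly targets the acquisition of cloud storage data; the influence of data feature processing and of the acquisition environment on the acquisition result remains to be studied further.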
References
[1] DONG Yibing, LIU Li, YANG Rui, et al. Design and implementation of the middleware to access realtime stream of digitizers [J]. China earthquake engineering journal, 2017, 39(5): 969-975.
[2] ZHAO Fangyun, ZHANG Mingfu. Based on monitoring data of vast ocean cloud storage platform design [J]. Ship science and technology, 2016, 38(13): 143-148.
[3] XU Liyan. Design of network data acquisition and test system based on ARM and LabVIEW [J]. Modern electronics technique, 2016, 39(5): 24-27.
[4] HAN Li, LIU Zhengjie, LI Hui, et al. A method based on context-awareness for remote user experience data capturing [J]. Chinese journal of computers, 2015(11): 2234-2246.
[5] ZHAO Yan, SU Yuzhao. A cloud storage method of batch data processing [J]. Bulletin of science and technology, 2017, 33(7): 81-85.
[6] ZHOU Chaohui, CAI Yanxia, LU Guorui. Applications of XINPAI-driven Web data scraping model [J]. Journal of computer applications, 2016, 36(S1): 252-256.
[7] GAO Mengchao, HU Qingbao, CHENG Yaodong, et al. Design and implementation of crowdsourcing-based social network data collection model [J]. Computer engineering, 2015, 41(4): 36-40.
[8] HAN Yingdang, LI Zhe. Data acquisition and pre-processing based on MEMS accelerometer [J]. Instrument technique and sensor, 2015(2): 16-19.
[9] NI Xiaoyin, FENG Zhisheng, CHEN Ying. Extraction and analysis of anomalies of the second data from GM4 fluxgate magnetometer at Tianshui station before the 2013 Minxian MS6.6 earthquake [J]. China earthquake engineering journal, 2016, 38(S2): 203-207.
[10] QIU Xuesong, LIN Yanfei, SHAO Sujie, et al. Sensor aggregation distribution construction algorithm for smart grid data collection system [J]. Journal of electronics & information technology, 2015, 37(10): 2411-2417.
[11] HE Maohui. Design of multi-terminal mobile data acquisition system utilizing 4G network for architectural engineering field [J]. Modern electronics technique, 2016, 39(15): 25-27.