YUAN Quan, CHANG Weipeng
CLC number: TN911.1-34; Document code: A; Article ID: 1004-373X(2019)01-0180-03
Abstract: An Apriori optimization algorithm based on the Hadoop platform is proposed to improve the accuracy of book recommendation services. On the basis of the distributed Hadoop framework, a directed acyclic graph (DAG) is used to analyze the implementation steps of parallel MapReduce on the Hadoop platform. MapReduce is then used to optimize the traditional association-rule Apriori algorithm so as to reduce the number of database connections and minimize the generation of useless candidate itemsets, thereby shortening task processing time. The experimental results show that, compared with the traditional LDA recommendation algorithm, the proposed algorithm achieves higher accuracy and can recommend more suitable books to borrowers.
Keywords: Hadoop; cloud computing; book recommendation; DAG; Apriori algorithm; recommendation algorithm
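The abstract names the technique but not its code, so the following is only a minimal Python sketch of the general idea: each Apriori level is one map/reduce pass over data splits, and the join-and-prune step keeps only candidates whose every (k-1)-subset is frequent, which is the standard way Apriori avoids generating useless candidates. It simulates the map and reduce phases in a single process and does not use the Hadoop API; the function names, the list-of-sets input format, and the absolute `min_support` threshold are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def map_phase(split, candidates):
    """Mapper (simulated): emit (itemset, 1) for each candidate contained in a basket."""
    for basket in split:
        for cand in candidates:
            if cand <= basket:
                yield cand, 1


def reduce_phase(pairs):
    """Reducer (simulated): sum the partial counts emitted for each itemset."""
    totals = Counter()
    for itemset, count in pairs:
        totals[itemset] += count
    return totals


def generate_candidates(frequent, k):
    """Apriori join + prune: union pairs of frequent (k-1)-itemsets into
    k-itemsets, then drop any candidate with an infrequent (k-1)-subset."""
    candidates = set()
    for a, b in combinations(list(frequent), 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in frequent for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def apriori_mapreduce(splits, min_support):
    """Level-wise Apriori; each level is one simulated map/reduce pass."""
    candidates = {frozenset([item]) for split in splits for basket in split for item in basket}
    k, result = 1, {}
    while candidates:
        # "Map" every split independently, then "reduce" all emitted pairs.
        pairs = (pair for split in splits for pair in map_phase(split, candidates))
        counts = reduce_phase(pairs)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


# Two hypothetical data splits of borrowing baskets (book ids A, B, C).
splits = [
    [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}],
    [{"B", "C"}, {"A", "B", "C"}, {"B"}],
]
result = apriori_mapreduce(splits, min_support=3)
print(sorted(("".join(sorted(s)), n) for s, n in result.items()))
# [('A', 4), ('AB', 3), ('AC', 3), ('B', 5), ('BC', 3), ('C', 4)]
```

Here {A, B, C} is generated as a candidate because all three of its 2-subsets are frequent, but it is counted only twice and is dropped; in a real Hadoop job each level would be a separate MapReduce job reading the splits from HDFS.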
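With continuing advances in technology, the traditional development model of libraries can no longer meet the public's diverse demands for library services. Libraries therefore need to be digitized and informatized, and suitable personalized recommendation techniques are needed to provide users with interesting and meaningful information, for example personalized book recommendation in library management [1-2]. A user who wants to find a particular book in a massive collection [3-4] must spend considerable time and effort querying and searching; an informatized library management system with book recommendation can meet this need.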
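Hadoop cloud platforms have shown excellent performance in solving such big-data mining problems. However, as data become increasingly complex and databases grow ever larger, centralized processing easily causes network congestion [5]. As a result, traditional cloud computing systems can no longer handle big-data processing tasks effectively. Parallel MapReduce job-flow processing on the distributed Hadoop platform has therefore become the mainstream research direction [5]. To implement book recommendation effectively on a distributed Hadoop platform and further improve recommendation accuracy, this paper proposes an Apriori optimization algorithm based on the Hadoop platform. Experimental results show that, compared with traditional algorithms, the proposed algorithm achieves higher accuracy and can effectively perform book data mining.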
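A parallel MapReduce job flow can be viewed as a directed acyclic graph of jobs. As a hedged illustration of that view only, the sketch below levels a job DAG with Kahn's topological sort: jobs in the same stage have no dependency on one another and could be submitted in parallel, and a cycle is reported as an error. The job names and dependency edges are hypothetical, and the function is not part of any Hadoop API.

```python
from collections import defaultdict


def schedule_stages(jobs, deps):
    """Group jobs of a DAG into stages (Kahn's algorithm, level by level).

    Jobs in one stage do not depend on each other, so they could run as
    parallel MapReduce jobs; each stage waits for all earlier stages.
    jobs: iterable of job names; deps: (upstream, downstream) pairs.
    """
    indegree = {job: 0 for job in jobs}
    children = defaultdict(list)
    for up, down in deps:
        children[up].append(down)
        indegree[down] += 1

    ready = sorted(job for job, d in indegree.items() if d == 0)
    stages, scheduled = [], 0
    while ready:
        stages.append(ready)
        scheduled += len(ready)
        next_ready = []
        for job in ready:
            for child in children[job]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if scheduled != len(indegree):
        raise ValueError("dependency graph contains a cycle, so it is not a DAG")
    return stages


# Hypothetical job flow for a book recommendation pipeline.
JOBS = ["parse_log", "load_catalog", "build_baskets", "count_L1", "count_L2", "make_rules"]
DEPS = [
    ("parse_log", "build_baskets"),
    ("load_catalog", "build_baskets"),
    ("build_baskets", "count_L1"),
    ("count_L1", "count_L2"),
    ("count_L2", "make_rules"),
    ("load_catalog", "make_rules"),
]
print(schedule_stages(JOBS, DEPS))
# [['load_catalog', 'parse_log'], ['build_baskets'], ['count_L1'], ['count_L2'], ['make_rules']]
```

Only the two input jobs can overlap here; the Apriori counting levels form a chain because each level needs the previous level's frequent itemsets.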
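An informatized library management system with book recommendation can automatically recommend books that match a borrower's interests [5], offering potentially interesting titles in a reasonable and timely manner. Hadoop, one of the three major distributed computing systems, can readily integrate data of different structural types and provides a distributed storage and computing environment across a cluster of computers [5], which gives it distinctive advantages in data analysis. Information resources in a big-data environment are open, and big data are uploaded and downloaded frequently, which makes them particularly well suited to management on the Hadoop platform. Moreover, given the throughput demands of big data, smooth resource interaction is especially important when mining user-behavior data.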
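To make automatic recommendation of books matching a borrower's interests concrete, here is a minimal sketch of one common way association rules are turned into recommendations, assuming frequent itemsets and their support counts are already available (for example from an Apriori pass). A rule X → y is kept when its confidence, support(X ∪ {y}) / support(X), reaches a threshold, and a borrower is offered unread books ranked by the best matching rule. The book identifiers, counts, and `min_conf` value are made-up illustration data, not the paper's configuration.

```python
def association_rules(support, min_conf):
    """Derive single-consequent rules X -> y from support counts.

    support maps frozenset itemsets to absolute counts; every subset of a
    listed itemset must also be listed (Apriori guarantees this for
    frequent itemsets).
    """
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue
        for book in itemset:
            antecedent = itemset - {book}
            confidence = count / support[antecedent]
            if confidence >= min_conf:
                rules.append((antecedent, book, confidence))
    return rules


def recommend(history, rules, top_n=3):
    """Rank unread books by the best confidence among rules whose
    antecedent is contained in the borrower's history."""
    scores = {}
    for antecedent, book, confidence in rules:
        if antecedent <= history and book not in history:
            scores[book] = max(confidence, scores.get(book, 0.0))
    ranked = sorted(scores, key=lambda b: (-scores[b], b))
    return ranked[:top_n]


# Hypothetical support counts for three books.
support = {
    frozenset({"hadoop"}): 4,
    frozenset({"spark"}): 5,
    frozenset({"python"}): 4,
    frozenset({"hadoop", "spark"}): 3,
    frozenset({"python", "spark"}): 2,
}
rules = association_rules(support, min_conf=0.5)
print(recommend({"hadoop"}, rules))  # ['spark']
print(recommend({"spark"}, rules))   # ['hadoop']
```

Taking the maximum confidence per book is one simple scoring choice; summing or weighting by support are equally plausible alternatives.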
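As Fig. 3 shows, the accuracy of all three algorithms rises as the total number of books recommended by the library management system increases. The LDA algorithm improves most slowly, the traditional association-rule algorithm improves somewhat faster, and the proposed method improves fastest. This verifies the effectiveness and feasibility of the proposed algorithm: it can effectively recommend books to users, and under the same conditions it achieves higher accuracy than the other two algorithms.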
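The text does not define how accuracy is measured in Fig. 3. Purely as an assumed example, hit rate at N (the share of borrowers with at least one later-borrowed book in their top-N list) is one metric that, like the curves described above, cannot decrease as N grows, because the top-(N+1) list always contains the top-N list. The sketch below computes it on hypothetical data; it is not the paper's evaluation code.

```python
def hit_rate_at_n(recommendations, held_out, n):
    """Share of borrowers with at least one held-out book in their top-n list.

    recommendations: borrower -> ranked list of book ids
    held_out: borrower -> set of books actually borrowed later
    Borrowers with an empty held-out set are skipped.
    """
    borrowers = [u for u, books in held_out.items() if books]
    if not borrowers:
        return 0.0
    hits = sum(
        1 for u in borrowers if set(recommendations.get(u, [])[:n]) & held_out[u]
    )
    return hits / len(borrowers)


# Hypothetical ranked lists and later borrowings.
recs = {"u1": ["b3", "b1", "b7"], "u2": ["b2", "b5", "b9"], "u3": ["b4", "b8", "b6"]}
truth = {"u1": {"b1"}, "u2": {"b9"}, "u3": {"b0"}}
print([round(hit_rate_at_n(recs, truth, n), 3) for n in (1, 2, 3)])
# [0.0, 0.333, 0.667]
```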
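This paper proposes an Apriori optimization algorithm based on the Hadoop platform that implements book recommendation effectively on a distributed Hadoop platform and further improves recommendation accuracy. Experimental results show that, compared with traditional algorithms, the proposed algorithm can effectively complete book data mining tasks and meet the requirements of book recommendation; compared with the association-rule algorithm and the LDA algorithm, it achieves higher recommendation accuracy.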
References
[1] CHEN C M. An intelligent mobile location-aware book recommendation system that enhances problem-based learning in libraries [J]. Interactive learning environments, 2013, 21(5): 469-495.
[2] LI K C, LIANG Z Y. Personalized book recommendation algorithm based on multi-feature [J]. Computer engineering, 2012, 38(11): 34-37.
[3] YANG S T, HUNG M C. A model for book inquiry history analysis and book-acquisition recommendation of libraries [J]. Library collections acquisitions & technical services, 2012, 36(3/4): 127-142.
[4] SOHAIL S S, SIDDIQUI J, ALI R. A novel approach for book recommendation using fuzzy based aggregation [J]. Indian journal of science & technology, 2017, 10(19): 1-30.
[5] YANG S T. An active recommendation approach to improve book-acquisition process [J]. International journal of electronic business management, 2012, 10(2): 108-115.
[6] XU Fei. Real-time processing of big data streams [D]. Wuxi: Jiangnan University, 2015.
[7] KHAN M, JIN Y, LI M, et al. Hadoop performance modeling for job estimation and resource provisioning [J]. IEEE transactions on parallel & distributed systems, 2016, 27(2): 441-454.
[8] LIU Lijuan. Research and application of improved Apriori algorithm [J]. Computer engineering and design, 2017(12): 3324-3328.
[9] RAJAGOPAL S, KWAN A. Book recommendation system using data mining for the University of Hong Kong Libraries [J]. ITEC journal, 2012, 58(4): 393-401.