• 
    

    
    

      99热精品在线国产_美女午夜性视频免费_国产精品国产高清国产av_av欧美777_自拍偷自拍亚洲精品老妇_亚洲熟女精品中文字幕_www日本黄色视频网_国产精品野战在线观看

      ?

      基于迭代主成分分析的哈希算法研究與實(shí)現(xiàn)

      2018-09-29 02:38李志韓王洪亞

      李志韓 王洪亞

      摘 要:為了提高高維空間近鄰搜索算法的查詢(xún)性能,本文結(jié)合DSH算法和迭代PCA方法的優(yōu)點(diǎn)提出迭代PCA哈希算法。該算法查詢(xún)效果良好,充分利用數(shù)據(jù)集的分布信息、有嚴(yán)格的理論保證。該算法在達(dá)到相同精度的條件下較LSH算法和DSH算法查詢(xún)花費(fèi)時(shí)間少。該算法提供了一種解決近鄰搜索問(wèn)題有效方法。

      關(guān)鍵詞:近鄰查找; 迭代PCA; 查詢(xún)性能; 哈希算法

      Abstract: In order to improve the query performance of the neighbor search algorithm, this paper combines the advantages of DSH algorithm and iterative PCA method to propose an iterative PCA hash algorithm. The algorithm has a strict theoretical guarantee and makes full use of the distribution information of the data set, therefore has a good query effect. Compared with the LSH algorithm and the DSH algorithm, this algorithm requires less query time to achieve the same accuracy. The algorithm provides an effective method to solve the neighbor search problem.

      Key words: nearest neighbor search; iterative PCA; query performance; hash algorithm

      引言

      近鄰搜索被廣泛應(yīng)用在信息檢索、數(shù)據(jù)庫(kù)、圖像識(shí)別、數(shù)據(jù)挖掘以及機(jī)器學(xué)習(xí)等方面。特別是在當(dāng)下的大數(shù)據(jù)時(shí)代,億萬(wàn)級(jí)別的數(shù)據(jù)量等待著計(jì)算機(jī)來(lái)處理,因此能否提出一種近鄰搜索的高效方法,在有限的時(shí)間和空間之內(nèi)滿(mǎn)足用戶(hù)的近鄰查詢(xún)需求就顯得尤為重要。

      針對(duì)以上背景,本文提出迭代PCA哈希算法。介紹了迭代PCA哈希算法的基本思想;通過(guò)實(shí)驗(yàn)驗(yàn)證了迭代PCA哈希算法的實(shí)驗(yàn)效果優(yōu)于DSH算法和LSH算法。

      1 迭代PCA哈希算法

      在近鄰搜索研究領(lǐng)域,LSH算法[1]已經(jīng)具有相當(dāng)不錯(cuò)的查詢(xún)效果。但LSH算法在維度升高時(shí),查詢(xún)效果下降明顯。為了能更好地利用數(shù)據(jù)分布信息,學(xué)者們提出了不同的哈希學(xué)習(xí)近鄰搜索算法,比如DSH算法[5]和迭代PCA方法[4]。本文結(jié)合DSH算法和迭代PCA方法的優(yōu)點(diǎn),提出了迭代PCA哈希算法,并運(yùn)用python編程實(shí)現(xiàn)LSH算法、DSH算法、迭代PCA哈希算法,基于實(shí)驗(yàn)結(jié)果進(jìn)而分析3種算法的查詢(xún)效果。

      1.1 基本思想

      文獻(xiàn)[5]提出的DSH算法實(shí)現(xiàn)簡(jiǎn)單,查詢(xún)效果優(yōu)于LSH算法。但DSH算法沒(méi)有理論證明,對(duì)數(shù)據(jù)集只進(jìn)行一次PCA,沒(méi)能充分利用數(shù)據(jù)集的分布信息。而文獻(xiàn)[4]提出的迭代PCA方法有嚴(yán)格的理論證明,通過(guò)反復(fù)對(duì)數(shù)據(jù)集進(jìn)行PCA,充分學(xué)習(xí)數(shù)據(jù)集的分布信息。然而迭代PCA方法存在2個(gè)缺陷,首先參數(shù)難以設(shè)定,其次點(diǎn)到空間的距離定義不合理。結(jié)合DSH算法和迭代PCA方法的優(yōu)點(diǎn),本文提出迭代PCA哈希算法。迭代PCA哈希算法反復(fù)對(duì)數(shù)據(jù)集進(jìn)行PCA,充分學(xué)習(xí)數(shù)據(jù)集的分布信息;迭代PCA方法為迭代PCA哈希算法提供嚴(yán)格的理論證明;查詢(xún)實(shí)現(xiàn)上借鑒位置敏感哈希的實(shí)現(xiàn)方法,實(shí)現(xiàn)簡(jiǎn)單。

      3 結(jié)束語(yǔ)

      本文給出了PCA空間良好“捕獲”數(shù)據(jù)集點(diǎn)的距離定義。這個(gè)距離定義是在PCA數(shù)學(xué)理論基礎(chǔ)上提出的,迭代PCA方法沒(méi)有明確的給出良好“捕獲”的定義,參數(shù)難以確定,在實(shí)現(xiàn)上很難做到。提出的距離定義依據(jù)投影向量的垂直距離來(lái)定義,幾何含義非常的明確清晰。

      文中提出了一種近鄰搜索的方法,迭代PCA哈希算法。迭代PCA哈希算法經(jīng)過(guò)對(duì)數(shù)據(jù)集的學(xué)習(xí),將數(shù)據(jù)集進(jìn)行分區(qū)。不過(guò)迭代PCA哈希算法所占的索引大小和內(nèi)存大小與數(shù)據(jù)導(dǎo)向型哈希算法及位置敏感哈希算法一樣。迭代PCA哈希算法在達(dá)到相同精度的前提下,大大降低了數(shù)據(jù)導(dǎo)向型哈希算法和位置敏感哈希算法所遍歷的點(diǎn)的個(gè)數(shù),降低了算法的查詢(xún)時(shí)間。

      實(shí)驗(yàn)驗(yàn)證迭代PCA哈希算法的查詢(xún)效果良好。運(yùn)用python實(shí)驗(yàn)工具實(shí)現(xiàn)LSH算法、DSH算法、迭代PCA哈希算法。實(shí)驗(yàn)結(jié)果表明在達(dá)到相同查詢(xún)精度,迭代PCA哈希算法查詢(xún)時(shí)間最少;迭代PCA哈希算法不存在奇異點(diǎn)(0,0),表明數(shù)據(jù)集經(jīng)迭代PCA哈希算法投影分布稠密、均勻;迭代PCA哈希算法快速收斂到較高精度位置,減少尋找合適閾值w的時(shí)間。

      參考文獻(xiàn)

      [1] INDYK P, MOTWANI R. Approximate nearest neighbors: Towards removing the curse of dimensionality[C]//Proceedings of the 30th Symposium on Theory of Computing(STOC'98). DALLAS, TX, USA:ACM,1998: 604-613.

      [2] SUN Yifang, WANG Wei, QIN Jianbin,et al. SRS: Solving c-approximate nearest neighbor queries in high dimensional Euclidean space with a tiny index[J].PVLDB, 2014,8(1): 1-12.

      [3] HUANG Qiang, FENG Jianlin, ZHANG Yikai, et al. Query-aware locality-sensitive hashing for approximate nearest neighbor search[C]// PVLDB ,2015,9(1): 1-12.

      [4] ABDULLAH A,ANDONI A,KANNAN R, et al. Spectral approaches to nearest neighbor search [C]//2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS).Philadelphia, PA, USA:IEEE, 2014:581-590.

      [5] ZHANG Wei, GAO Ke, ZHANG Yongdong, et al. Data-oriented locality sensitive hashing [C]//Proceedings of the 18th ACM international conference on Multimedia. Firenze, Italy :ACM,2010:1131-1134 .

      [6] HOTELLING H. Analysis of a complex of statistical variables into principal components[J]. Journal of Educational Psychology, 1933,24:417-441.

      [7] ROBINSONJ T. The K-D-B-Tree : A search structure for large multidimensio- nal dynamic indexes [C]//Proc. ACM-SIGMOD International Conference on Management of Data .Ann Arbor,MI,USA:ACM,1981:10-18.

      [8] HENRICH A, SIX H W, WIDMAYER P. The LSD tree: Spatial access to multidimensional point and non point objects[C]//Proceedings of the Fifteenth International Conference on VLDB. Amsterdam, The Netherlands:Morgan Kaufman,1989:43-53.

      [9] KIBRIYA A M, FRANK E. An empirical comparison of exact nearest neighbour algorithms[M]//KOK J N, KORONACKI J, DE MANTARAS R L, et al. Knowledge Discovery in Databases: PKDD 2007. Lecture Notes in Computer Science. Berlin/ Heidelberg:Springer, 2007:140-151.

      [10]MOORE A. Efficient Memory-based Learning for Robot Control[D]. USA:University of Cambridge, 1991.

      [11]GUTTMAN A. R-trees: a dynamic index structure for spratial searching[M]//Readings in database systems. San Francisco, CA, USA:Morgan Kaufmann Publishers Inc.,1988:599-609.

      [12]BECKMANNN, KRIEGEL H P, SCHNEIDER R, et al. The R*-tree: An efficient and robust access method for points and rectangles+ [C]// Proceedings of the 1990 ACM SIGMOD international conference on Management of daTA. Atlantic City:ACM,1990:322-331.

      [13]SELLIS T K, ROUSSOPOULOS N, FALOUTSOS C. The R+-tree: A dynamic index for multidimensional objects [C]// Proceedings of the 13th VLDB. Brighton, England:[s.n.], 1987: 507- 518.

      [14]常會(huì)麗.基于移動(dòng)搜索的關(guān)鍵詞優(yōu)化技術(shù)探索與研究[J]. 信息與電腦,2018(6):6-7,10.

      [15]BAWA M,CONDIE T, GANESAN P. LSH Forest: Self tuning indexes for similarity search[C]// International Conference on World Wide Web, WWW 2005. Chiba, Japan:IW3C2,2005:651-660.

      南京市| 驻马店市| 南川市| 英德市| 会宁县| 长泰县| 本溪市| 讷河市| 栾城县| 陆川县| 拜泉县| 涿鹿县| 丰顺县| 章丘市| 阿拉善左旗| 永仁县| 拉萨市| 桐乡市| 札达县| 南汇区| 华池县| 乐陵市| 措美县| 望奎县| 衡东县| 建水县| 延庆县| 承德市| 浏阳市| 宜兰市| 台安县| 凤台县| 万州区| 墨竹工卡县| 舒兰市| 龙井市| 凌源市| 东光县| 澳门| 阿巴嘎旗| 石狮市|