Li Yuebiao, Li Li, Zhang Yi
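Abstract: Missing data is one of the major problems in the traffic field. To address it, researchers in China and abroad have proposed a large number of data imputation algorithms in recent years. All of these algorithms improve the accuracy of traffic data to some extent, but they differ in both accuracy and computation speed. Choosing an algorithm that is both accurate and fast is therefore important for improving the performance of traffic systems. This study surveys the current mainstream algorithms, analyzes the strengths and weaknesses of each class, and applies typical prediction-based, interpolation-based, and statistical imputation algorithms to PeMS loop detector data. Comparing their accuracy and computation speed shows that probabilistic principal component analysis (PPCA) gives the best imputation results. A further comparison of PPCA with its improved variants, KPPCA and MPPCA, for single-detector imputation shows that the improved algorithms are slightly more accurate than PPCA but take markedly longer to compute. On this basis, the study analyzes PPCA and KPPCA for multi-detector imputation. The results show that exploiting the spatial correlation among multiple detectors markedly improves the imputation accuracy of both PPCA and KPPCA. When both the temporal and the spatial correlation of multi-detector data are considered, KPPCA is more accurate than PPCA, but its computational efficiency is markedly lower. Therefore, for single-detector imputation, or when the temporal correlation among multiple detectors is weak, PPCA delivers both high imputation accuracy and high computation speed. When computation time is not a concern, KPPCA achieves higher imputation accuracy.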
Key Words: Data imputation; Principal component analysis; Kernel-based principal component analysis; Spatio-temporal characteristics
Comparison of Traffic Imputation Methods Based on Spatial and Temporal Characteristics
Li Yuebiao, Li Li, Zhang Yi
(Tsinghua University)
Abstract: Missing data is one of the major problems in the traffic field. To address it, numerous data imputation algorithms have been proposed in recent years. All of them can improve the accuracy of the collected data to some extent, but their accuracy and computation speed vary considerably, so selecting an algorithm that is both accurate and fast is important for improving the performance of traffic systems. This study analyzes typical prediction, interpolation, and statistical learning algorithms and compares their advantages and disadvantages. Applying these algorithms to impute PeMS loop detector data shows that the PPCA algorithm gives the best imputation performance. A further comparison of PPCA with its improved variants, KPPCA and MPPCA, on single-detector data shows that the improved algorithms achieve higher accuracy but require longer computation time. On this basis, the study analyzes the performance of PPCA and KPPCA for multi-detector data imputation. Taking the spatial characteristics of the data into account reduces the imputation error of both PPCA and KPPCA, whereas also taking the time lag of the data into account improves the imputation accuracy of KPPCA but reduces that of PPCA. Therefore, for single-detector data, or for multi-detector data whose temporal correlation is weak, PPCA is the best imputation choice, offering both high accuracy and high computational efficiency. When computation time is not a concern, KPPCA achieves higher accuracy.
Key Words: Data imputation; PPCA; KPPCA; Temporal and spatial characteristics
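The abstract reports that PPCA imputes missing traffic data well but does not show how PPCA fills the gaps. As an illustration only, the following is a minimal NumPy sketch of the standard EM algorithm for probabilistic PCA with missing entries. It is not the authors' implementation. The function names `ppca_impute` and `_e_step`, the latent dimension `q`, the iteration limit, and the convergence test are hypothetical choices. The sketch assumes the data are arranged as a matrix with one row per sample (for example, one day) and one column per feature (for example, one time interval of a detector), with missing readings stored as NaN.

```python
import numpy as np

# Hypothetical helper names and defaults: a sketch of EM for probabilistic PCA
# with missing values, not the paper's code.


def _e_step(X, obs, W, mu, s2):
    """Posterior moments of latent z_n given only the observed entries of row n."""
    n, q = X.shape[0], W.shape[1]
    Ez = np.zeros((n, q))
    Ezz = np.zeros((n, q, q))
    for i in range(n):
        o = obs[i]
        Wo = W[o]  # rows of W for the features observed in sample i
        Minv = np.linalg.inv(Wo.T @ Wo + s2 * np.eye(q))
        Ez[i] = Minv @ Wo.T @ (X[i, o] - mu[o])
        Ezz[i] = s2 * Minv + np.outer(Ez[i], Ez[i])
    return Ez, Ezz


def ppca_impute(X, q=2, n_iter=300, tol=1e-6, seed=0):
    """Fill NaNs in X (n_samples x n_features) using EM for probabilistic PCA.

    Model: x = W z + mu + eps, z ~ N(0, I_q), eps ~ N(0, s2 * I).
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    mu = np.nanmean(X, axis=0)                 # initial mean from observed entries
    W = rng.normal(scale=0.1, size=(d, q))     # small random initial loadings
    s2 = float(np.nanvar(X))                   # initial noise variance
    prev_s2 = np.inf

    for _ in range(n_iter):
        Ez, Ezz = _e_step(X, obs, W, mu, s2)
        sse = 0.0
        # M-step: update each feature j using only the samples where j is observed.
        for j in range(d):
            rows = obs[:, j]
            xj, Ezj, Ezzj = X[rows, j], Ez[rows], Ezz[rows]
            mu[j] = np.mean(xj - Ezj @ W[j])   # mean update with the current W_j
            r = xj - mu[j]
            W[j] = np.linalg.solve(Ezzj.sum(axis=0), Ezj.T @ r)  # loading update
            # Expected squared residual E[(x_nj - mu_j - W_j z_n)^2], summed over n.
            sse += r @ r - 2.0 * r @ (Ezj @ W[j]) + np.einsum("k,nkl,l->", W[j], Ezzj, W[j])
        s2 = max(sse / obs.sum(), 1e-12)
        if abs(prev_s2 - s2) <= tol * s2:      # stop when the noise variance settles
            break
        prev_s2 = s2

    Ez, _ = _e_step(X, obs, W, mu, s2)
    X_hat = Ez @ W.T + mu                      # posterior-mean reconstruction
    return np.where(obs, X, X_hat)             # observed values are kept unchanged
```

To run an accuracy comparison of the kind the abstract describes, one can hide a fraction of known entries, call `ppca_impute`, and compute the RMSE on the hidden entries, timing the call to gauge computational cost. One possible way to let the low-rank model use spatial correlation is to place the series of several neighbouring detectors side by side as extra columns, but that layout is an assumption here. The kernel-based KPPCA and the MPPCA variant would require additional machinery beyond this sketch.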