A Robust Form of the Linear Correlation Coefficient*
GAN Sheng-jin, LIN Juan
(Department of Mathematics and Computer Science, Fuqing Branch of Fujian Normal University, Fuqing 350300, Fujian, China)
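Abstract: The correlation coefficient measures how closely two random variables are linearly related, but it is easily distorted by outliers. This paper proposes a robust form. Under the normal distribution its properties are similar to those of the correlation coefficient, yet it resists outliers far better, which makes it valuable in applications.
Key words: correlation coefficient; median; median absolute deviation; robustness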
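The linear correlation coefficient of a two-dimensional random variable $(X,Y)$ is
$$\rho_{XY}=\frac{E[(X-EX)(Y-EY)]}{\sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}}.$$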
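Because the expectation and the variance resist outliers poorly, robust statistics commonly replaces the expectation with the median, med, and the variance with the median absolute deviation, MAD. Both tolerate up to 50% outliers [1], so their resistance to contamination is very strong. A robust counterpart of $\rho_{XY}$ is therefore
$$\delta_{XY}=\frac{\operatorname{med}[(X-\operatorname{med}(X))(Y-\operatorname{med}(Y))]}{\operatorname{MAD}(X)\,\operatorname{MAD}(Y)},\qquad \operatorname{MAD}(X)=\operatorname{med}(|X-\operatorname{med}(X)|).$$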
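The contrast between the two coefficients can be illustrated with a small sample version. The sketch below is illustrative only and assumes the robust form is the correlation median of [2]: the median of the centred products divided by the product of the two MADs. The function names (`mad`, `comedian`, `delta_xy`, `pearson`, `contamination_demo`), the sample size, the contamination pattern and the seed are choices made for this example, not part of the paper.

```python
import random
from statistics import median


def mad(xs):
    """Median absolute deviation: med(|X - med(X)|)."""
    m = median(xs)
    return median([abs(x - m) for x in xs])


def comedian(xs, ys):
    """med[(X - med(X)) (Y - med(Y))], the median analogue of the covariance."""
    mx, my = median(xs), median(ys)
    return median([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def delta_xy(xs, ys):
    """Robust correlation: comedian divided by the product of the two MADs."""
    return comedian(xs, ys) / (mad(xs) * mad(ys))


def pearson(xs, ys):
    """Ordinary sample correlation coefficient, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def contamination_demo(n=201, n_out=10, seed=0):
    """Return (pearson, delta) before and after planting gross outliers."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # y = 0.8 x + 0.6 e gives a population correlation of 0.8.
    ys = [0.8 * x + 0.6 * rng.gauss(0, 1) for x in xs]
    clean = (pearson(xs, ys), delta_xy(xs, ys))
    # Overwrite the first n_out pairs with the same outlier at (10, -10).
    xs_bad = [10.0] * n_out + xs[n_out:]
    ys_bad = [-10.0] * n_out + ys[n_out:]
    dirty = (pearson(xs_bad, ys_bad), delta_xy(xs_bad, ys_bad))
    return clean, dirty


if __name__ == "__main__":
    clean, dirty = contamination_demo()
    print("clean  (pearson, delta):", clean)
    print("dirty  (pearson, delta):", dirty)
```

In this setup roughly 5% of the points are moved to (10, -10). The Pearson value is expected to change sign, while the median-based value should move only slightly, which is the behaviour the 50% breakdown property suggests.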
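In general $\delta_{XY}$ does not lie in $[-1,1]$ as $\rho_{XY}$ does [2]. For the two-dimensional normal distribution, however, $\delta_{XY}$ and $\rho_{XY}$ have similar properties, and this paper mainly examines the relationship between the two in that setting.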
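(i) $\operatorname{med}(aX+b)=a\operatorname{med}(X)+b$;
(ii) if $X$ and $Y$ are independent, then $\operatorname{med}(XY)=\operatorname{med}(X)\operatorname{med}(Y)$;
(iii) $\operatorname{med}(|X|)=[\operatorname{med}(X^{2})]^{1/2}$.
Here $X$ and $Y$ are one-dimensional random variables and $a$, $b$ are arbitrary real numbers.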
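Properties (i) and (iii) can be checked numerically on a sample. The sketch below is a sanity check rather than a proof. It assumes an odd sample size, so that the sample median is a single order statistic and both identities hold for the sample up to rounding. The helper names `check_affine` and `check_abs_square`, the shifted exponential sample and the seed are hypothetical choices for illustration.

```python
import math
import random
from statistics import median

rng = random.Random(1)
# Odd sample size: the sample median is then one of the observations.
xs = [rng.expovariate(1.0) - 0.3 for _ in range(1001)]


def check_affine(xs, a, b):
    """Property (i): med(aX + b) equals a * med(X) + b."""
    lhs = median([a * x + b for x in xs])
    rhs = a * median(xs) + b
    return math.isclose(lhs, rhs, rel_tol=1e-12, abs_tol=1e-12)


def check_abs_square(xs):
    """Property (iii): med(|X|) equals the square root of med(X^2)."""
    lhs = median([abs(x) for x in xs])
    rhs = math.sqrt(median([x * x for x in xs]))
    return math.isclose(lhs, rhs, rel_tol=1e-12, abs_tol=1e-12)


if __name__ == "__main__":
    print(check_affine(xs, 2.5, -1.0))   # positive scale factor
    print(check_affine(xs, -3.0, 4.0))   # negative scale factor reverses the order
    print(check_abs_square(xs))
```

Both identities rest on the same fact: a monotone transformation maps the middle order statistic to the middle order statistic. With an even sample size the conventional midpoint median would break the exact equality in (iii).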
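Proof (i) For $a>0$, $aX+b\ge a\operatorname{med}(X)+b \Leftrightarrow X\ge\operatorname{med}(X)$; for $a<0$ the same event is equivalent to $X\le\operatorname{med}(X)$. Since $P(X\ge\operatorname{med}(X))=P(X\le\operatorname{med}(X))=\frac{1}{2}$, the claim follows.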
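(ii) When $X$ and $Y$ are independent,
$$P\{(X-\operatorname{med}(X))(Y-\operatorname{med}(Y))\ge 0\}=P\{X\ge\operatorname{med}(X)\}P\{Y\ge\operatorname{med}(Y)\}+P\{X\le\operatorname{med}(X)\}P\{Y\le\operatorname{med}(Y)\}=\tfrac{1}{4}+\tfrac{1}{4}=\tfrac{1}{2},$$
so $\operatorname{med}[(X-\operatorname{med}(X))(Y-\operatorname{med}(Y))]=0$. By property (i),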
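Property (i) shows that the median behaves linearly under affine maps, a property also known as affine equivariance. Property (ii) states that the median of the product of two independent random variables equals the product of their medians, just as for the expectation. In general $\operatorname{med}(|X|)$ is hard to obtain directly, and property (iii) reveals the simple square relationship between $\operatorname{med}(|X|)$ and $\operatorname{med}(X^{2})$.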
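The step in the proof of property (ii) that the centred product has median zero under independence can be examined by simulation. The sketch below checks only that centred statement, assuming continuous distributions. The two distributions (a normal and an exponential), the sample size, the seed and the helper name `centred_product_median` are assumptions made for illustration.

```python
import random
from statistics import median


def centred_product_median(xs, ys):
    """Sample version of med[(X - med(X)) (Y - med(Y))]."""
    mx, my = median(xs), median(ys)
    return median([(x - mx) * (y - my) for x, y in zip(xs, ys)])


rng = random.Random(2)
n = 20001
# Two independent samples: one symmetric, one skewed.
xs = [rng.gauss(3.0, 2.0) for _ in range(n)]
ys = [rng.expovariate(1.0) for _ in range(n)]

com_independent = centred_product_median(xs, ys)

if __name__ == "__main__":
    print("centred product median:", com_independent)
```

Under independence the centred product is non-negative with probability 1/2, so the sample value should sit close to zero even though neither marginal median is zero.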
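From the definition of $\delta_{XY}$ and the median properties above, $\delta_{XY}$ can also be written as
$$\delta_{XY}=\frac{\operatorname{med}\left[\dfrac{X-\operatorname{med}(X)}{\sigma_X}\cdot\dfrac{Y-\operatorname{med}(Y)}{\sigma_Y}\right]}{\sqrt{\operatorname{med}\left[\left(\dfrac{X-\operatorname{med}(X)}{\sigma_X}\right)^{2}\right]\operatorname{med}\left[\left(\dfrac{Y-\operatorname{med}(Y)}{\sigma_Y}\right)^{2}\right]}},$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$. For two-dimensional normal random variables $X$ and $Y$, we have $\operatorname{med}(X)=E(X)$ and $\operatorname{med}(Y)=E(Y)$, so with $U=(X-EX)/\sigma_X$ and $V=(Y-EY)/\sigma_Y$,
$$\delta_{XY}=\frac{\operatorname{med}(UV)}{\operatorname{med}(U^{2})},$$
where $(U,V)$ is standard two-dimensional normal with correlation coefficient $\rho$.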
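Theorem 1 Under the two-dimensional normal distribution, $\delta_{XY}$ and $\rho$ (here $\rho=\rho_{XY}$) are in one-to-one correspondence, that is, $\delta_{XY}=\delta(\rho)$. Moreover, $\delta(\rho)$ is an increasing function of $\rho$, with $\delta(-1)=-1$, $\delta(0)=0$ and $\delta(1)=1$.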
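The shape of $\delta(\rho)$ described in Theorem 1 can be explored by Monte Carlo. The sketch below is a rough illustration under stated assumptions: a standard bivariate normal pair is generated as $V=\rho U+\sqrt{1-\rho^{2}}\,Z$ with independent standard normal $U$ and $Z$, the same draws are reused for every $\rho$ to reduce noise, and the grid of $\rho$ values, the sample size and the names `delta_xy` and `delta_curve` are chosen for this example only.

```python
import math
import random
from statistics import median


def delta_xy(xs, ys):
    """Median of centred products divided by the product of the two MADs."""
    mx, my = median(xs), median(ys)
    com = median([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    mad_x = median([abs(x - mx) for x in xs])
    mad_y = median([abs(y - my) for y in ys])
    return com / (mad_x * mad_y)


def delta_curve(rhos, n=20001, seed=3):
    """Estimate delta(rho) on a grid, reusing one set of normal draws."""
    rng = random.Random(seed)
    us = [rng.gauss(0, 1) for _ in range(n)]
    zs = [rng.gauss(0, 1) for _ in range(n)]
    curve = []
    for rho in rhos:
        s = math.sqrt(1.0 - rho * rho)
        # V has unit variance and correlation rho with U.
        vs = [rho * u + s * z for u, z in zip(us, zs)]
        curve.append(delta_xy(us, vs))
    return curve


rhos = [-1.0, -0.5, 0.0, 0.5, 1.0]
deltas = delta_curve(rhos)

if __name__ == "__main__":
    for rho, d in zip(rhos, deltas):
        print(f"rho = {rho:+.1f}   estimated delta = {d:+.3f}")
```

The estimates at $\rho=\pm 1$ equal $\pm 1$ up to rounding, because there $V=\pm U$ and the comedian reduces to $\pm\operatorname{MAD}(U)^{2}$. The interior values are simulation estimates and only approximate $\delta(\rho)$.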
[1] PETER J ROUSSEEUW, CHRISTOPHE CROUX. Alternatives to the Median Absolute Deviation[J]. Journal of the American Statistical Association, 1993, 88: 1273-1283.
[2] MICHAEL FALK. On MAD and Comedians[J]. Annals of the Institute of Statistical Mathematics, 1997, 49: 615-644.
[3] SUN Shan-ze. Lecture Notes on Nonparametric Statistics[M]. Beijing: Peking University Press, 2000. (in Chinese)
[4] MICHAEL FALK. A Note on the Comedian for Elliptical Distributions[J]. Journal of Multivariate Analysis, 1998, 67: 306-317.
[5] STAMATIS CAMBANIS, STEEL HUANG, GORDON SIMONS. On the Theory of Elliptically Contoured Distributions[J]. Journal of Multivariate Analysis, 1981, 11: 368-385.
[6] MICHAEL FALK. The Sample Covariance is Not Efficient for Elliptical Distributions[J]. Journal of Multivariate Analysis, 2002, 80: 358-377.
(Responsible editor: XIANG Yang-jie)
A Robust Form of the Linear Correlation Coefficient
GAN Sheng-jin, LIN Juan
(Department of Mathematics and Computer Science, Fuqing Branch of Fujian Normal University, Fuqing 350300, Fujian, China)
Abstract: The correlation coefficient reflects the closeness of the linear relationship between two random variables and is susceptible to outliers. This paper presents a robust form that behaves like the correlation coefficient under the normal distribution but has a much higher breakdown point, so it has good application value.
Key words: correlation coefficient; median; median absolute deviation; robustness
CLC number: O213
Document code: A
DOI: 10.3969/j.issn.1007-2985.2013.04.006
Article ID: 1007-2985(2013)04-0023-03
Received: 2013-01-18
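Funding: Scientific Research Project of Fuqing Branch of Fujian Normal University (KY2012025); Class A Science and Technology Project of the Education Department of Fujian Province (JA12353)
About the author: GAN Sheng-jin (1982-), male, from Huanggang, Hubei; teaching assistant in the Department of Mathematics and Computer Science, Fuqing Branch of Fujian Normal University; master's degree; research interest: applied statistics.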