Department of Biostatistics, School of Public Health, Southern Medical University (510515)  Wang Yan, Qian Ruoyuan, Chen Pingyan
1.3 Comparison of means across multiple samples
1.3.1 Tests of difference
1.3.1.4 Analysis of covariance (ANCOVA)
(1-43)
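To compute the sample size, start from an initial value and increase it iteratively until the resulting power reaches the prespecified level; that sample size is the one the study requires [2].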
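For reference, the quantities computed inside the SAS module MGT4_1 below can be written out as follows. We read them off the code, so treat this as a summary of what the program computes rather than a restatement of (1-43). Here n_i = N·r_i/Σr_j are the group sizes implied by the allocation ratios r_i, G is the number of groups, c the number of covariates, σ the common standard deviation and ρ² the squared multiple correlation between the response and the covariates.

$$\bar\mu = \frac{1}{N}\sum_{i=1}^{G} n_i\mu_i,\qquad V = \frac{\sum_{i=1}^{G} r_i\left(\mu_i-\bar\mu\right)^2}{\sum_{i=1}^{G} r_i},\qquad \sigma_e^2 = \left(1-\rho^2\right)\sigma^2$$
$$\lambda = \frac{NV}{\sigma_e^2},\qquad df_1 = G-1,\qquad df_2 = N-G-c$$
$$1-\beta = 1-\mathrm{ProbF}\left(F_{1-\alpha}(df_1, df_2),\ df_1,\ df_2,\ \lambda\right)$$

Each pass of the DO UNTIL loop in MGT4_1 raises N by one and re-evaluates these quantities until 1 - β reaches the target.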
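[Example 1-27] A study aims to compare the effects of three teaching interventions (think-aloud instruction, directed reading-thinking activity and directed reading activity) on the reading comprehension of fourth-grade primary school pupils. A balanced design is used, and each participant's error detection task (EDT) score is assessed before and after the intervention. An ANCOVA is performed with the post-intervention EDT score as the dependent variable and two covariates: the pre-intervention EDT score and the post-intervention comprehension monitoring questionnaire score. Based on previous data, the three group means are 8.2220, 9.8148 and 6.1904, the standard deviation is 2.3788, and ρ² (the squared multiple correlation between the response and the covariates) is 0.4434. With a significance level of 0.05, estimate the sample size for power of 0.8 and 0.9, respectively.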
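nQuery Advanced implementation: set α = 0.05 and power to 80% and 90%, respectively. From the data above, the number of groups G = 3, the number of covariates c = 2, ρ² = 0.4434, σ = 2.3788, the allocation ratios r1 = r2 = r3 = 1, and μ = {μ1, μ2, μ3} = {8.2220, 9.8148, 6.1904}, which gives the variance of the means V = 2.20. In the nQuery Advanced main menu, select: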
Design:⊙Fixed Term
Goal:⊙Means
No.of Groups:⊙>Two
Analysis Method:⊙Test
In the method box, select Analysis of Covariance (ANCOVA).
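Enter the parameters in the sample size calculation window that opens, as shown in Figure 1-65. For 80% power the result is N = 18, i.e., 18 subjects in total, 6 per group. For 90% power the result is N = 22, i.e., 22 subjects in total, with 8, 7 and 7 in the three groups (because the total is not a multiple of the number of groups, the groups cannot be exactly equal in size).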
Figure 1-65 Parameter settings and results of nQuery Advanced for sample size estimation in Example 1-27
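As an arithmetic check on the inputs (our own calculation, rounded; it uses the balanced design r1 = r2 = r3 = 1 and the value ρ² = 0.4434 passed to the SAS module), the variance of the means and the noncentrality parameter at the two reported sample sizes are:

$$\bar\mu = \frac{8.2220+9.8148+6.1904}{3} = 8.0757$$
$$V = \frac{0.1463^2+1.7391^2+(-1.8853)^2}{3} = \frac{6.6002}{3} \approx 2.2001$$
$$\sigma_e^2 = (1-0.4434)\times 2.3788^2 = 0.5566\times 5.6587 \approx 3.1496$$
$$N=18:\ \lambda = \frac{18\times 2.2001}{3.1496}\approx 12.57,\ df=(2,\,13);\qquad N=22:\ \lambda = \frac{22\times 2.2001}{3.1496}\approx 15.37,\ df=(2,\,17)$$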
SAS 9.4 implementation:
%let u={8.2220 9.8148 6.1904}; /*group means*/
%let r={1 1 1}; /*ratio of each group size to the size of the first (reference) group*/
proc IML;
start MGT4_1(a,G,sd,c,R2,power);
error=0;
if ( a>=1 | a<=0 ) then do; error=1; print "Error" "Test significance Level must be in 0-1"; end;
if ( sd<=0 ) then do; error=1; print "Error" "Common standard deviation must be >0"; end;
if (G<=0 | ceil(G)^=G) then do; error=1; print "Error" "The number of groups must be a positive integer"; end;
if (c<=0 | ceil(c)^=c) then do; error=1; print "Error" "The number of covariates must be a positive integer"; end;
if(power>=100 | power<1) then do; error=1; print "Error" "Power(%) must be >=1 and <100"; end;
if(R2>1 | R2<0) then do; error=1; print "Error" "R_squared value must be in 0-1";end;
if(error=1) then stop;
if(error=0) then do;
n1=ceil(c/G+1); n=n1#&r.;sum_N=n[1,+]-0.01;
do until(pw>=power/100);
sum_n=ceil(sum_n+0.01); n1=sum_N/&r.[1,+]; n=n1#&r.;
u_mean=(n#&u.)[,+]/sum_N;
v=(&r.#(&u.-u_mean)##2)[,+]/&r.[,+];
n_mean=sum_N/G;
sigma_e2=(1-R2)*sd##2;
lamda=n_mean*G*v/sigma_e2;
df1=G-1;df2=sum_N-G-c;
f=finv(1-a,df1,df2);
pw=1-probf(f,df1,df2,lamda); pw1=100*pw;
end;
end;
print a[label="Test Significance level"]
G[label="Number of groups"]
V[label="Variance of means"]
sd[label="Common standard deviation"]
c[label="Number of covariates"]
R2[label="R_squared value between the response and the covariates"]
Pw1[label="Power(%)"]
sum_N[label="Total sample size"];
finish MGT4_1;
run MGT4_1(0.05,3,2.3788,2,0.4434,80);
run MGT4_1(0.05,3,2.3788,2,0.4434,90);
quit;
SAS 9.4 output:
Figure 1-66a Parameter settings and results of SAS 9.4 for sample size estimation in Example 1-27 (power = 80%)
Figure 1-66b Parameter settings and results of SAS 9.4 for sample size estimation in Example 1-27 (power = 90%)
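For readers without SAS, the same search can be sketched in Python using only the standard library. This is our own translation of MGT4_1, not code from the article: the regularized incomplete beta function (a continued-fraction evaluation in the style of Numerical Recipes), the noncentral F CDF as a Poisson mixture of beta CDFs, and the bisection quantile are hand-written stand-ins for SAS PROBF and FINV, and all function names are hypothetical. Unlike MGT4_1, the search starts at N = G + c + 1, the smallest total with df2 ≥ 1 (for Example 1-27 both start at 6). We have not checked it against nQuery; it should agree with MGT4_1 only to the extent that these numerical routines are accurate.

```python
import math


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 1e-15
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        m2 = 2 * m
        # even and odd terms of the continued fraction
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ncf_cdf(f, df1, df2, lam=0.0):
    """F distribution CDF; lam > 0 gives the noncentral case as a
    Poisson(lam / 2) mixture of beta CDFs (the usual noncentrality convention)."""
    if f <= 0.0:
        return 0.0
    x = df1 * f / (df1 * f + df2)
    if lam == 0.0:
        return betainc(df1 / 2.0, df2 / 2.0, x)
    half, total, j = lam / 2.0, 0.0, 0
    while True:
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += w * betainc(df1 / 2.0 + j, df2 / 2.0, x)
        j += 1
        if j > half and w < 1e-16:  # past the Poisson mode, weights negligible
            return total


def f_ppf(p, df1, df2):
    """Central F quantile by bisection (stand-in for SAS FINV)."""
    lo, hi = 0.0, 1.0
    while ncf_cdf(hi, df1, df2) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ncf_cdf(mid, df1, df2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ancova_power(n_total, means, ratios, sd, r2, n_cov, alpha=0.05):
    """Power of the ANCOVA F test, using the quantities computed in MGT4_1."""
    g = len(means)
    w = [r / sum(ratios) for r in ratios]                        # n_i / N
    grand = sum(wi * mu for wi, mu in zip(w, means))
    v = sum(wi * (mu - grand) ** 2 for wi, mu in zip(w, means))  # variance of means V
    lam = n_total * v / ((1.0 - r2) * sd ** 2)                  # N * V / sigma_e^2
    df1, df2 = g - 1, n_total - g - n_cov
    return 1.0 - ncf_cdf(f_ppf(1.0 - alpha, df1, df2), df1, df2, lam)


def ancova_sample_size(target, means, ratios, sd, r2, n_cov, alpha=0.05):
    """Smallest total N, stepping by 1, whose power reaches target (a fraction)."""
    n = len(means) + n_cov + 1                                   # smallest N with df2 >= 1
    while ancova_power(n, means, ratios, sd, r2, n_cov, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    mu, ratios = [8.2220, 9.8148, 6.1904], [1, 1, 1]
    for target in (0.80, 0.90):
        n = ancova_sample_size(target, mu, ratios, sd=2.3788, r2=0.4434, n_cov=2)
        print(target, n, round(ancova_power(n, mu, ratios, 2.3788, 0.4434, 2), 4))
```

As in MGT4_1, group sizes may be fractional during the search; the reported total is then split into integer group sizes (for N = 22, 8, 7 and 7).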
1.3.1.5 Multivariate analysis of variance (MANOVA)
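Method: MANOVA is used when an analysis involves several response variables. Several test statistics are in common use, and nQuery offers three of them: Wilks' likelihood ratio statistic (Wilks' Lambda), the Pillai-Bartlett trace and the Hotelling-Lawley trace. Muller and Barton (1989) [3] and Muller, LaVange, Ramey et al. (1992) [4] described how to estimate sample size and power for MANOVA: the power for each main effect and each interaction is obtained from a noncentral F distribution with that effect's own degrees of freedom and noncentrality parameter. The power is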
$$1-\beta = 1-\mathrm{ProbF}\left(F_{1-\alpha}(df_1, df_2),\ df_1,\ df_2,\ \lambda\right) \tag{1-44}$$
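where df1 and df2 are the degrees of freedom of the F distribution, λ is the noncentrality parameter, $F_{1-\alpha}(df_1, df_2)$ is the 1-α quantile of the central F distribution, and ProbF(x, df1, df2, λ) is the cumulative distribution function of the noncentral F distribution (SAS function PROBF). For each statistic these parameters are computed as follows.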
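First, let p denote the number of dependent variables; q the number of cells, i.e., the number of levels of the grouping variable under study (with several grouping variables, the product of their numbers of levels); X the design matrix and r its rank; M the q × p mean matrix; Σ the p × p covariance matrix; C the contrast matrix, where a different C is constructed for each effect tested, and a its number of rows (the degrees of freedom of that effect); and N the total sample size. Wilks' Lambda, the Pillai-Bartlett trace and the Hotelling-Lawley trace can all be built from the following matrices: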
$$H = (CM)'\left[C(X'X)^{-1}C'\right]^{-1}(CM) \tag{1-45}$$
$$E = \Sigma\,(N-r) \tag{1-46}$$
$$T = H+E \tag{1-47}$$
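In the SAS module MGT3 below, X uses cell-means coding and C is built from normalized Helmert-type contrasts. Read off that code (as an illustration, not part of the original derivation), one factor with q = 3 levels gives

$$X'X = \mathrm{diag}(n_1,\dots,n_q),\qquad r = q,$$
$$C = \begin{pmatrix} -2/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6}\\ 0 & -1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix},\qquad CC' = I_2,\qquad C\mathbf{1} = \mathbf{0},$$

so a = 2. With the p = 4 responses of Example 1-28, CM is 2 × 4 and H, E and T are all 4 × 4. Any other full-rank set of contrasts for the same effect gives the same H, because replacing C by AC with A nonsingular cancels in (1-45).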
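(1) Wilks' Lambda
From matrices (1-46) and (1-47), Wilks' likelihood ratio statistic is
$$W = \left|ET^{-1}\right| \tag{1-48}$$
This statistic is converted to the approximate F statistic
$$F = \frac{\left(1-W^{1/g}\right)/df_1}{W^{1/g}/df_2} \tag{1-49}$$
where
$$df_1 = ap,\qquad df_2 = g\left[(N-r)-\frac{p-a+1}{2}\right]-\frac{ap-2}{2},\qquad g = \sqrt{\frac{a^2p^2-4}{a^2+p^2-5}}$$
and g = 1 when a² + p² - 5 ≤ 0.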
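(2) Pillai-Bartlett Trace
From matrices (1-45) and (1-47), the Pillai-Bartlett trace is
$$PBT = \mathrm{tr}\left(HT^{-1}\right) \tag{1-50}$$
This statistic is converted to the approximate F statistic
$$F = \frac{(PBT/s)/df_1}{(1-PBT/s)/df_2} \tag{1-51}$$
where
$$df_1 = ap,\qquad df_2 = s\left[(N-r)-p+s\right],\qquad s = \min(a,p)$$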
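(3) Hotelling-Lawley Trace
From matrices (1-45) and (1-46), the Hotelling-Lawley trace is
$$HLT = \mathrm{tr}\left(HE^{-1}\right) \tag{1-52}$$
This statistic is converted to the approximate F statistic
$$F = \frac{\eta/df_1}{(1-\eta)/df_2},\qquad \eta = \frac{HLT/s}{1+HLT/s} \tag{1-53}$$
where
$$df_1 = ap,\qquad df_2 = s\left[(N-r)-p-1\right]+2,\qquad s = \min(a,p)$$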
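For each of the statistics above, the noncentrality parameter λ is
$$\lambda = F\cdot df_1 \tag{1-54}$$
where F is the approximate F statistic evaluated at the assumed mean matrix M and covariance matrix Σ.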
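To compute the sample size, start from an initial value, choose the test statistic, and increase the sample size iteratively until the resulting power reaches the prespecified level; that sample size is the one the study requires [2].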
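[Example 1-28] A study aims to compare how well different teaching methods foster good learning behaviour in secondary school students. There are three groups (traditional teaching, audiovisual teaching and control), and the dependent variables are four indicators of learning behaviour: study environment, study habits, note-taking ability and summarizing ability. Based on previous studies, the mean matrix is M and the covariance matrix is Σ (their values are shown in Figures 1-67b and 1-67c and in the SAS code below). With a significance level of 0.05 and power of 80%, estimate the sample size using Wilks' Lambda.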
nQuery Advanced implementation: set α = 0.05 and power = 80%. From the data above, p = 4, the study has one factor with 3 levels, and the mean matrix M and covariance matrix Σ are as given. In the nQuery Advanced main menu, select:
Design:⊙Fixed Term
Goal:⊙Means
No.of Groups:⊙>Two
Analysis Method:⊙Test
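In the method box, select Multivariate Analysis of Variance (MANOVA).
Enter the parameters in the sample size calculation window that opens, as shown in Figure 1-67a; the mean matrix and covariance matrix are entered as shown in Figures 1-67b and 1-67c. The result is n = 37 and N = 111, i.e., 37 subjects per group and 111 in total.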
Figure 1-67a Parameter settings and results of nQuery Advanced for sample size estimation in Example 1-28
Figure 1-67b Parameter settings and results of nQuery Advanced for sample size estimation in Example 1-28 (mean matrix)
Figure 1-67c Parameter settings and results of nQuery Advanced for sample size estimation in Example 1-28 (covariance matrix)
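As an arithmetic illustration of the Wilks' Lambda degrees of freedom for this design (our own calculation, with g computed as in the SAS module MGT3: g = √[(a²p² - 4)/(a² + p² - 5)] when a² + p² - 5 > 0, otherwise 1), take one 3-level factor (a = 2, r = q = 3), p = 4 responses and n = 37 per group (N = 111):

$$N-r = 111-3 = 108,\qquad df_1 = ap = 2\times 4 = 8$$
$$g = \sqrt{\frac{2^2\cdot 4^2-4}{2^2+4^2-5}} = \sqrt{\frac{60}{15}} = 2$$
$$df_2 = 2\left[108-\frac{4-2+1}{2}\right]-\frac{2\times 4-2}{2} = 213-3 = 210$$

The power at this N is then 1 - ProbF(F_{0.95}(8, 210), 8, 210, 8F), with F the Wilks' Lambda approximation evaluated at the assumed M and Σ.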
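SAS 9.4 implementation:
%let f_level={3}; /*number of levels of each grouping variable*/
%let m= { 8.75 7.79 8.14,8.10 7.71 7.19,17.83 17.32 16.67,18.90 19.18 17.69};
/*means of each response variable (rows) at every combination of grouping-variable levels (columns)*/
proc iml;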
start MGT3(alpha,power,p,f,rho,sd,sigma_def,method);
f_level=&f_level.;
m=t(&m.);
/*parameter check*/
error=0;
if ( alpha>=1 | alpha<=0 ) then do; error=1; print "Error" "Test significance Level must be in 0-1"; end;
/*parameter f is the number of grouping variables (at most 3)*/
if (f^=1 & f^=2 & f^=3) then do; error=1; print "Error" "The number of factors must be in 1,2,3";end;
/*parameter power is the vector of target powers (in percent) for the effects: with two grouping variables A and B, give the powers for A, B and their interaction AB in that order; with three grouping variables A, B and C, give the powers for A, B, C, AB, AC, BC and ABC in that order*/
if f=1 & nrow(power)^=1 & ncol(power)^=1 then do;error=1;print "Error" "Please input power of factor A";end;
if f=2 & nrow(power)^=3 & ncol(power)^=3 then do;error=1;print "Error" "Please input power of factor A, factor B, factor AB (input '.' if missing)";end;
if f=3 & nrow(power)^=7 & ncol(power)^=7 then do; error=1; print "Error" "Please input power of factor A, factor B, factor C, factor AB, factor AC, factor BC, factor ABC (input '.' if missing)"; end;
if (p<=0 | ceil(p)^=p) then do; error=1; print "Error" "The number of response variables must be a positive integer "; end;
if error=0 then do; q=1;do i=1 to f; q= q * &f_level.[i]; end;end;else stop;
if nrow(f_level)^=f then do;error=1; print "Error" "The row number of factor levels should be equal to the number of factors";end;
if ncol(m)^=p then do; error=1; print "Error" "The row number of mean matrix should be equal to the number of response variables"; end;
if nrow(m)^=q then do; error=1; print "Error" "The column number of mean matrix should be equal to the product of factor levels"; end;
/*the covariance matrix is either generated from rho and sd (common variance sd**2 and common correlation rho) or supplied directly through sigma_def*/
if (rho^=. & (rho>=1 | rho<=0)) then do; error=1;print "Error" "Correlation must be in 0-1";end;
if (sd^=. & ( sd<=0 )) then do;error=1; print "Error" "Standard deviation at each level must be >0"; end;
if (sigma_def^=. & (ncol(sigma_def)^=p | nrow(sigma_def)^=p)) then do;
error=1; print "Error" "The row and column numbers of covariance matrix should be equal to the number of response variables"; end;
/*parameter method selects the test statistic: 1 = Wilks' Lambda, 2 = Pillai-Bartlett trace, 3 = Hotelling-Lawley trace*/
if (method^=1 & method^=2 & method^=3) then do; error=1; print "Error" "method must be in 1,2,3";end;
if (rho^=.& sd^=.) then do;
covariance=rho*sd**2;sigma=j(p,p,covariance);
do i=1 to p; sigma[i,i]=sd**2; end;
end;
else sigma=sigma_def;
if(error=1) then stop;
if(error=0)then do;
/*C matrix*/
max_level=&f_level.[<>,];
origin_c=j(max_level-1,max_level,0);
origin_j=j(max_level-1,max_level,.);
do i=2 to max_level;
e1=-(i-1)/sqrt(i*(i-1)); e2=1/sqrt(i*(i-1)); e3=j(1,i-1,e2); e4=1/sqrt(i);
e5=j(1,i,e4);
index=max_level+1-i;
origin_c[index,index]=e1;
origin_c[index,index+1:max_level]=e3;
origin_j[index,index:max_level]=e5;
end;
f1=&f_level.[1]; index1=max_level-f1+1;
C1=origin_c[index1:max_level-1,index1:max_level];
J1=origin_j[index1,index1:max_level];C_A=C1; power_a=power[1];
if f>1 then do;
f2=&f_level.[2]; index2=max_level-f2+1; C2=origin_c[index2:max_level-1,index2:max_level];
J2=origin_j[index2,index2:max_level];
C_A=C1@J2;C_B=J1@C2;C_AB=C1@C2;
power_b=power[2];power_ab=power[3];
end;
if f>2 then do;
f3=&f_level.[3];
index3=max_level-f3+1;
C3=origin_c[index3:max_level-1,index3:max_level];
J3=origin_j[index3,index3:max_level];
C_A=C1@J2@J3;C_B=J1@C2@J3;C_C=J1@J2@C3;
C_AB=C1@C2@J3;C_AC=C1@J2@C3;C_BC=J1@C2@C3;
C_ABC=C1@C2@C3;
power_c=power[3];
power_ab=power[4];power_ac=power[5];power_bc=power[6];
power_abc=power[7];
end;
/*power*/
%macro pw(c,pw_exp);
n_orig=j(1,q,1);n=j(1,q,1);
if &pw_exp. ^= . then do;
if method=1 then do;
test="Wilks' Lambda";
do until (pw>=&pw_exp.);
n=n+n_orig;xx=t(n)#I(q);E=sigma * (n[<>,+]-q);
theta=&C.* m; H=t(theta)*inv((&C.*inv(XX)*t(&C.)))*theta;
T=H+E;df1=a*p;W=det(E*inv(T));fmm=a**2+p**2-5;
if fmm>0 then g=sqrt((a**2*p**2-4)/(a**2+p**2-5)); else g=1;
eta=1-W**inv(g);
df2=g*((n[<>,+]-q)-(p-a+1)/2)-(a*p-2)/2;
if df2>0 then do;
F_statistics=(eta/df1)/((1-eta)/df2);
f_c=finv(1-alpha,df1,df2);lambda=df1*F_statistics;
pw=(1-probf(f_c,df1,df2,lambda))*100; sum_n=n[<>,+];
end;
else pw=0;
end;
end;
if method=2 then do;
test="Pillai-Bartlett Trace";
do until (pw>=&pw_exp.);
n=n+n_orig;xx=t(n)#I(q);E=sigma * (n[<>,+]-q);
theta=&C.* m; H=t(theta)*inv((&C.*inv(XX)*t(&C.)))*theta;
T=H+E;df1=a*p;
ht=h*inv(t);PBT=trace(ht);s=min(a,p);
eta=pbt/s;df2=s*((n[<>,+]-q)-p+s);
if df2>0 then do;
F_statistics=(eta/df1)/((1-eta)/df2);
f_c=finv(1-alpha,df1,df2);lambda=df1*F_statistics;
pw=(1-probf(f_c,df1,df2,lambda))*100; sum_n=n[<>,+];
end;
else pw=0;
end;
end;
if method=3 then do;
test="Hotelling-Lawley Trace";
do until (pw>=&pw_exp.);
n=n+n_orig;xx=t(n)#I(q);E=sigma * (n[<>,+]-q);
theta=&C.* m; H=t(theta)*inv((&C.*inv(XX)*t(&C.)))*theta;
T=H+E;df1=a*p;
he=h*inv(E);hlt=trace(he);
s=min(a,p);eta=(hlt/s)/(1+hlt/s);
df2=s*((n[<>,+]-q)-p-1)+2;
if df2>0 then do;
F_statistics=(eta/df1)/((1-eta)/df2);
f_c=finv(1-alpha,df1,df2);lambda=df1*F_statistics;
pw=(1-probf(f_c,df1,df2,lambda))*100; sum_n=n[<>,+];
end;
else pw=0;
end;
end;
end;
else do; pw=.; sum_n=0; end;
%mend;
/*compute total sample size*/
a=&f_level.[1]-1; %pw(C_A,power_a);sum_n_a=sum_n;Smp_size=sum_n_a;
if f>1 then do;
a=&f_level.[2]-1;%pw(C_B,power_b);sum_n_b=sum_n;
a=(&f_level.[1]-1)*(&f_level.[2]-1);%pw(C_AB,power_ab);
sum_n_ab=sum_n;
Smp_size=j(3,1,.);
Smp_size[1]=sum_n_a;
Smp_size[2]=sum_n_b;
Smp_size[3]=sum_n_ab;
end;
if f>2 then do;
a=&f_level.[3]-1;%pw(C_C,power_c);pw_c=pw;sum_n_c=sum_n;
a=(&f_level.[1]-1)*(&f_level.[3]-1);
%pw(C_AC,power_ac);sum_n_ac=sum_n;
a=(&f_level.[2]-1)*(&f_level.[3]-1);
%pw(C_BC,power_bc);sum_n_bc=sum_n;
a=(&f_level.[1]-1)*(&f_level.[2]-1)*(&f_level.[3]-1);
%pw(C_ABC,power_abc);sum_n_abc=sum_n;
Smp_size=j(7,1,.);
Smp_size[1]=sum_n_a;
Smp_size[2]=sum_n_b;
Smp_size[3]=sum_n_c;
Smp_size[4]=sum_n_ab;
Smp_size[5]=sum_n_ac;
Smp_size[6]=sum_n_bc;
Smp_size[7]=sum_n_abc;
end;
Total_n=Smp_size[<>,];grp_size=Total_n/q;
/*compute power for total sample size*/
%macro pw2(c,pw_exp);
if &pw_exp. ^= . then do;
n=j(1,q,grp_size);xx=t(n)#I(q);E=sigma * (n[<>,+]-q);
theta=&C.* m; H=t(theta)*inv((&C.*inv(XX)*t(&C.)))*theta;
T=H+E;df1=a*p;
if method=1 then do;
W=det(E*inv(T));fmm=a**2+p**2-5;
if fmm>0 then g=sqrt((a**2*p**2-4)/(a**2+p**2-5)); else g=1;
eta=1-W**inv(g);
df2=g*((n[<>,+]-q)-(p-a+1)/2)-(a*p-2)/2;
F_statistics=(eta/df1)/((1-eta)/df2);
end;
if method=2 then do;
ht=h*inv(t);PBT=trace(ht);s=min(a,p);
eta=pbt/s;df2=s*((n[<>,+]-q)-p+s);
F_statistics=(eta/df1)/((1-eta)/df2);
end;
if method=3 then do;
he=h*inv(E);hlt=trace(he);s=min(a,p);
eta=(hlt/s)/(1+hlt/s);df2=s*((n[<>,+]-q)-p-1)+2;
F_statistics=(eta/df1)/((1-eta)/df2);
end;
f_c=finv(1-alpha,df1,df2);lambda=df1*F_statistics;
pw=(1-probf(f_c,df1,df2,lambda))*100;
end;
else pw=.;
%mend;
Level= &f_Level.;
factor="Factor A";
a=&f_level.[1]-1; %pw2(C_A,power_a); pw_a=pw;power=pw_a;Alpha1=alpha;
if f>1 then do;
factor={"Factor A","Factor B","Factor AB"};
a=&f_level.[2]-1; %pw2(C_B,power_b); pw_b=pw;
a=(&f_level.[1]-1)*(&f_level.[2]-1); %pw2(C_AB,power_ab); pw_ab=pw;
power=j(3,1,.);power[1]=pw_a; power[2]=pw_b;power[3]=pw_ab;
Alpha1=j(3,1,alpha);
end;
if f>2 then do;
factor={"Factor A","Factor B","Factor C","Factor AB","Factor AC","Factor BC","Factor ABC"};
a=&f_level.[3]-1; %pw2(C_C,power_c); pw_c=pw;
a=(&f_level.[1]-1)*(&f_level.[3]-1); %pw2(C_AC,power_ac); pw_ac=pw;
a=(&f_level.[2]-1)*(&f_level.[3]-1); %pw2(C_BC,power_bc); pw_bc=pw;
a=(&f_level.[1]-1)*(&f_level.[2]-1)*(&f_level.[3]-1);
%pw2(C_ABC,power_abc);pw_abc=pw;
power=j(7,1,.);power[1]=pw_a; power[2]=pw_b;power[3]=pw_c;
power[4]=pw_ab;power[5]=pw_ac;power[6]=pw_bc;power[7]=pw_abc;
Alpha1=j(7,1,alpha);
end;
Mean_Matrix=&m.;
end;
print test[Label="Test"]
p[Label="Number of Response Variables"]
sd[Label="Common Standard Deviation"]
rho[Label="Correlation"]
grp_size[Label="Group Size"]
Total_n[Label="Total Sample Size"];
Print Factor [label="Factor"]
Level [Label="Level"]
Alpha1 [Label="Alpha"]
Power [Label="Power(%)"];
Print Mean_Matrix;
Print sigma[label="Covariance Matrix"];
finish MGT3;
%let sigma1={3.641 1.274 2.641 4.555,1.274 2.623 1.947 2.722,2.641 1.947 9.548 7.001,4.555 2.722 7.001 15.914};
%let power={80};
run MGT3(0.05,&power.,4,1,.,.,&sigma1.,1);
quit;
SAS 9.4 output:
Figure 1-68 Parameter settings and results of SAS 9.4 for sample size estimation in Example 1-28
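The contrast matrices in MGT3 are built from normalized Helmert-type rows (origin_c), averaging rows (origin_j, giving J1, J2, J3), and, for several factors, Kronecker products via the IML operator @ (for example C_A = C1@J2, C_B = J1@C2, C_AB = C1@C2). A minimal Python sketch of that construction follows; the helper names helmert, mean_row and kron are hypothetical, and the row ordering is our reading of the SAS loop. One consequence, if we read the code correctly, is that the cells of the mean matrix must be listed with the last factor's levels varying fastest.

```python
import math


def helmert(k):
    """(k - 1) x k normalized Helmert-type contrasts, mirroring origin_c in MGT3.

    The row for i levels (i = k, ..., 2) has k - i leading zeros, then
    -(i - 1)/sqrt(i(i - 1)) followed by i - 1 copies of 1/sqrt(i(i - 1)).
    """
    rows = []
    for i in range(k, 1, -1):
        scale = math.sqrt(i * (i - 1))
        rows.append([0.0] * (k - i) + [-(i - 1) / scale] + [1.0 / scale] * (i - 1))
    return rows


def mean_row(k):
    """1 x k averaging row (1/sqrt(k), ..., 1/sqrt(k)), mirroring J1, J2, J3."""
    return [[1.0 / math.sqrt(k)] * k]


def kron(A, B):
    """Kronecker product, like the IML operator @ (the index of B varies fastest)."""
    return [[x * y for x in ra for y in rb] for ra in A for rb in B]


if __name__ == "__main__":
    C1, C2 = helmert(3), helmert(2)        # factor A: 3 levels, factor B: 2 levels
    J1, J2 = mean_row(3), mean_row(2)
    effects = {"A": kron(C1, J2), "B": kron(J1, C2), "AB": kron(C1, C2)}
    for name, c in effects.items():
        print(name, len(c), "x", len(c[0]))  # rows = effect df, columns = 6 cells
```

For a 3 × 2 design this prints matrices of size 2 × 6, 1 × 6 and 2 × 6, matching the degrees of freedom a used for A, B and AB in MGT3.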
Chinese Journal of Health Statistics, 2019, Issue 2