GUO Di, WU Haitao
(College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China)
Abstract: Building on a developer-driven code smell prioritization method, a model based on an optimized decision tree algorithm was built to rank the refactoring priority of code smells from the developers' perspective, and the model was evaluated in an empirical study. Model interpretability methods were used to assess feature importance and to identify the features with the greatest influence. The results showed that the F1 value of the model was 89%, which was 25% and 5% higher than the baseline and the latest published result, respectively.
Key words: code smell; decision tree; feature selection; software maintainability
CLC number: TP 311    Document code: A    Article ID: 1000-5137(2022)02-0210-07
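0  Introduction
As software systems evolve, developers must continually modify code to meet new requirements and changing business scenarios. This process introduces technical debt, that is, a decline in software quality caused by problems in the code, and code smells are a typical, quantifiable form of technical debt. Code smells are associated with reduced program comprehensibility, maintainability, and testability, and with increased maintenance effort and extra cost. Analyzing them by hand is also time-consuming and laborious. Researchers have therefore proposed automated mechanisms that help developers identify and remove code smells and so improve code quality.
FONTANA et al. proposed a code smell severity measure derived from software metrics and used its value to determine the priority of code smells. The method relies too heavily on subjective experience, which leads to low agreement between tools and a mismatch with developers' preferences, so its practical contribution to code smell detection remains limited. MARINESCU proposed a developer-driven code smell prioritization method that extracts features from the factors developers report as important and uses them to rank code smells. Compared with Ref. [7], ranking performance improved, but the method used only the most generic feature selection method and machine learning model, gave no specific reasons for these choices, and left room for better model performance. Using the dataset from the above literature, the present authors propose an improved model. Suitable features are selected with a threshold, which avoids the local optimum problem in feature selection. A decision tree model is introduced to improve the accuracy of priority prediction, and the information gain method is used to compute feature importance and to analyze the contribution of each feature to the priority prediction for each code smell.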
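1  Method and model construction
Decision tree principle
LightGBM is a distributed, efficient gradient boosting framework that uses tree-based learning algorithms. It avoids having to scan every sample point for every feature to choose the best split point. Its main characteristics are: 1) it reduces the cost of computing split gain, and histogram subtraction further reduces memory use and the communication cost of parallel learning; 2) it defines a decision tree growth strategy and the optimal splitting of categorical feature values; 3) it optimizes parallel learning. The LightGBM algorithm has two main steps:
① Gradient-based one-side sampling (GOSS), which keeps the instances with large gradients and randomly samples the instances with small gradients when estimating the split gain; its pseudocode is shown in Fig. 1.
② Exclusive feature bundling (EFB), which bundles mutually exclusive features to reduce the feature dimensionality and the cost of finding the best split point; its main pseudocode is shown in Fig. 2.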
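The sampling step of GOSS can be sketched as follows. This is a minimal illustration in plain Python rather than the pseudocode of Fig. 1 or LightGBM's internal implementation; the function name `goss_sample` and the default rates are assumptions chosen for the example.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Pick instances by gradient-based one-side sampling.

    Returns (indices, weights). The default rates are illustrative
    assumptions, not values taken from the paper.
    """
    n = len(gradients)
    top_n = int(top_rate * n)
    rand_n = int(other_rate * n)

    # Order instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top = order[:top_n]    # large-gradient instances are always kept
    rest = order[top_n:]   # small-gradient instances are subsampled

    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_n, len(rest)))

    # Up-weight the sampled small-gradient instances by (1 - a) / b so the
    # estimated gain stays close to the gain on the full data.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights
```

With 10 instances and the default rates, the two largest-gradient instances are kept with weight 1 and one further instance is drawn at random with weight (1 - 0.2) / 0.1.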
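Model construction
The model workflow is shown in Fig. 3. Data processing comprises: 1) data preprocessing, in which the selected software metrics are manually screened, blank metric records are removed, redundant data are deleted, and the data are normalized; 2) feature selection, in which Spearman's rank correlation coefficient is used to analyze the correlation among software metrics. Spearman's method is nonparametric and makes no assumption about the distribution of the original variables, so the selected features better suit the dataset itself.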
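A correlation-based filter of this kind might look like the sketch below. It is one plausible reading of the feature selection step, not the authors' procedure: the greedy keep-or-drop rule, the function names, and the threshold of 0.8 are all assumptions, and the metric values in the usage note are made up.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, threshold=0.8):
    """Keep a metric unless it correlates strongly with one already kept.

    columns maps a metric name to its list of values; the threshold is a
    hypothetical choice for illustration.
    """
    kept = []
    for name in columns:
        if all(abs(spearman(columns[name], columns[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept
```

For example, if a hypothetical WMC column grows monotonically with LOC, its Spearman coefficient with LOC is 1 and it is dropped, whereas a weakly related NCH column is kept.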
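After comparing various classifiers, the optimized LightGBM was selected as the best classifier. The model was trained with a 10-fold cross-validation strategy: the dataset was randomly divided into 10 parts of equal size using stratified sampling, so that every fold contains the same proportion of criticality levels. Two parts were used as the test set and the rest were used to train the model. The process was repeated 10 times, and the dataset was re-divided into training and test sets each time.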
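The stratified split described above can be sketched in plain Python. How the two held-out parts are chosen in each round is not specified in the text, so the sketch assumes consecutive folds; the function names and the `n_test` parameter are likewise illustrative.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin across folds.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(folds, n_test=2):
    """Yield (train, test) index lists, one pair per round.

    In round r the test set is made of n_test consecutive folds starting
    at fold r (wrapping around). This choice is an assumption.
    """
    k = len(folds)
    for r in range(k):
        test_ids = {(r + offset) % k for offset in range(n_test)}
        test = [i for f in sorted(test_ids) for i in folds[f]]
        train = [i for f in range(k) if f not in test_ids for i in folds[f]]
        yield train, test
```

Because each class is dealt out separately, every fold receives the same share of each criticality level whenever the class sizes are multiples of k, and differs by at most one instance per class otherwise.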
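Model performance is evaluated by precision, recall, and F-measure. The information gain algorithm and the Scott-Knott effect size difference (SK-ESD) test are used to compute and assess the entropy change contributed by each software metric. The Scott-Knott test is a statistical procedure for comparing and distinguishing model performance; it groups the evaluated metrics by hierarchical clustering, which makes it possible to analyze how accurately each metric reflects the severity of code smells as perceived by developers.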
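For reference, the three performance measures can be computed per class as in the sketch below. The function name and the one-vs-rest treatment of a multi-level severity label are assumptions; the text does not state how the per-class scores were aggregated.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

F1 is high only when precision and recall are both high, which is why it is the headline figure when comparing models.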
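2  Simulation experiments
Code smells
Four code smells were selected for study:
1) God Class. This smell affects classes that do not follow the single responsibility principle. Such classes have poor cohesion and are hard to maintain, and they often lead to software defects and higher maintenance cost.
2) Complex Class. Highly complex classes make it harder for developers to understand and optimize the code, so developers usually need to identify this smell and assess its importance.
3) Spaghetti Code. This smell typically shows up as tangled control structures and a programming style that does not apply object-oriented principles correctly, which hinders developers' understanding of the source code.
4) Shotgun Surgery. When developers modify one class, they must make corresponding changes in many other classes at the same time, which greatly increases the probability of defects in the related classes.
PECORELLI et al. showed that these four smells are widespread in real systems. GRANO et al. confirmed that they have a negative effect on the maintainability, comprehensibility, and testability of software systems. PALOMBA et al. reported that developers can not only analyze these four code smells accurately but also perceive their severity.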
Software metrics
Following Ref. [19], a set of metrics covering class structure and maintainability characteristics from different perspectives was selected, as listed in Table 1.
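Dataset
Code smells are a set of characteristics that change over time, so the projects must satisfy requirements on dataset completeness and continuity. Seven open-source projects from the Apache community were selected for the experiments (Table 2). Only projects with more than 500 classes, a change history of at least 5 years, at least 1 000 commits, and more than 20 contributors were considered. This yielded 341 God Class, 349 Complex Class, 313 Spaghetti Code, and 329 Shotgun Surgery instances, for a total of 1 332 records.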
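The four project selection criteria can be expressed as a simple predicate. The record type, its field names, and the sample projects in the usage note are hypothetical and serve only to make the thresholds concrete.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical summary record for a candidate project."""
    name: str
    classes: int
    history_years: float
    commits: int
    contributors: int


def is_eligible(p: Project) -> bool:
    """Apply the four selection criteria stated in the text."""
    return (
        p.classes > 500            # more than 500 classes
        and p.history_years >= 5   # change history of at least 5 years
        and p.commits >= 1000      # at least 1 000 commits
        and p.contributors > 20    # more than 20 contributors
    )
```

A made-up project with 800 classes, 6 years of history, 1 500 commits, and 30 contributors passes the filter, while one with exactly 500 classes does not, because the class criterion is strict.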
Analysis of experimental results
After training and testing on the dataset, the experimental results are compared in Table 3. The F1 value of the proposed model ranges from 77% to 90%, which is higher than that of the model in Ref. [19]. For God Class, Complex Class, and Shotgun Surgery, the F1 value is higher by 4%, 6%, and 5%, respectively.
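The importance of each feature was computed with a feature interpretability tool, and the results are shown in Table 4. The information gain algorithm was used to compute the gain each feature contributes to the model output, and the Scott-Knott effect size difference test was used to score the features, which were then ranked by score. Only results with an information gain greater than 0.5 are reported.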
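The entropy-based gain described above can be illustrated with a minimal sketch. It computes the information gain of a single binary split on one metric; the function names and the use of a fixed threshold are assumptions, and the sketch is not the tool that produced Table 4.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting samples at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions are skipped.
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder
```

A metric whose split separates the severity levels perfectly recovers the full entropy of the labels, while a split that leaves both sides as mixed as before has a gain of zero.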
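The factors that influence God Class are a mixture of several metrics and cannot be described by any single metric. The model relies mainly on EXP, LOC, OWN, and ADS, while LCOM5, C3, and NFI also have an effect.
For Complex Class, the model relies most on EXP, OWN, and NCH. Other metrics also affect the classification model to some extent, such as RA, which indicates that developers regard readability as important when prioritizing Complex Class. Further factors include WMC and LCOM5, confirming that the detection of this code smell involves aspects of code structure.
For Spaghetti Code, no code structure factor was found to have a major effect on the classification. RA is the primary factor when developers prioritize Spaghetti Code, so when developers encounter an instance affected by this smell they first consider whether it is semantically coherent. OWN and LOC are secondary factors, and the effects of NCH and LCOM5 are negligible.
For Shotgun Surgery, NCH and ACS proved to be important metrics. PERS was also shown to affect the classification, confirming that developers assess the severity of Shotgun Surgery by the number of classes changed together in a single commit, that is, by the intensity of the smell. C3 and LCOM5 also have some effect, but it is small compared with the contribution of the other types of metrics.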
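3  Conclusions
Building on a developer-oriented code smell prioritization method, independent variables were chosen by a feature selection method and the LightGBM machine learning algorithm was applied to classify the key features that describe code smells, in order to predict the severity that developers assigned to 1 332 code smell instances. The experiments show that, for the classification of the four code smells, the F1 value of the proposed model ranges from 77% to 90%, which is 25% higher on average than that of the baseline model. Future work includes: 1) enlarging the dataset and improving the data processing method; 2) substituting other models and discussing their comparison and optimization; 3) applying the approach to other programming languages to validate the conclusions of this study.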
References:
[1]  BROWN N, CAI Y F, GUO Y P, et al. Managing technical debt in software-reliant systems [C]// Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research. Santa Fe: ACM,2010:47-52.
[2]  SHULL F, FALESSI D, SEAMAN C, et al. Technical Debt: Showing the Way for Better Transfer of Empirical Results [M]// Perspectives on the Future of Software Engineering. Berlin: Springer,2013.
[3]  ABBES M, KHOMH F, GUEHENEUC Y G, et al. An empirical study of the impact of two antipatterns, blob and spaghetti code, on program comprehension [C]// 15th European Conference on Software Maintenance and Reengineering. Oldenburg: IEEE,2011:181-190.
[4]  PALOMBA F, BAVOTA G, PENTA M D, et al. On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation [C]// 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). Gothenburg: IEEE,2017:1-34.
[5]  GRANO G, PALOMBA F, GALL H C. Lightweight assessment of test-case effectiveness using source-code-quality indicators [J]. IEEE Transactions on Software Engineering,2019,47(4):758-774.
[6]  SJØBERG D I K, YAMASHITA A, ANDA B C D, et al. Quantifying the effect of code smells on maintenance effort [J]. IEEE Transactions on Software Engineering,2012,39(8):1144-1156.
[7]  FONTANA F A, ZANONI M. Code smell severity classification using machine learning techniques [J]. Knowledge-Based Systems,2017,128:43-58.
[8]  MARINESCU R. Assessing technical debt by identifying design flaws in software systems [J]. IBM Journal of Research and Development,2012,56(5):1-13.
[9]  AL-ANI A, DERICHE M. Feature selection using a mutual information based measure [C]// Proceeding of the 16th IEEE International Conference on Pattern Recognition. Quebec: IEEE,2002:82-85.
[10] KE G L, MENG Q, FINLEY T, et al. LightGBM: a highly efficient gradient boosting decision tree [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM,2017:3149-3157.
[11] BAEZA-YATES R, RIBEIRO-NETO B, MILLS D, et al. Modern Information Retrieval [M]. New York: ACM,1999.
[12] SCOTT A J, KNOTT M. A cluster analysis method for grouping means in the analysis of variance [J]. Biometrics,1974,30(3):507-512.
[13] BUDD T A. An Introduction to Object-Oriented Programming [M]. Boston: Addison-Wesley Publishing,2001.
[14] FOWLER M, BECK K. Refactoring: Improving the Design of Existing Code [M]. Boston: Addison-Wesley Publishing,1999.
[15] KHOMH F, PENTA M D, GUÉHÉNEUC Y, et al. An exploratory study of the impact of antipatterns on class change- and fault-proneness [J]. Empirical Software Engineering,2012,17(3):243-275.
[16] SOH Z, YAMASHITA A, KHOMH F, et al. Do code smells impact the effort of different maintenance programming activities? [C]// IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering. Osaka: IEEE,2016:393-402.
[17] BROWN W H, MALVEAU R C, MCCORMICK H W, et al. Antipatterns: Refactoring Software, Architectures, and Projects in Crisis [M]. New York: John Wiley & Sons,1998.
[18] PALOMBA F, BAVOTA G, PENTA M D, et al. Do they really smell bad? A study on developers’ perception of bad code smells [C]// Software Maintenance and Evolution. Victoria: IEEE,2014:101-110.
[19] PECORELLI F, KHOMH F, LUCIA A D. Developer-driven code smell prioritization [C]// Proceedings of the 17th International Conference on Mining Software Repositories. New York: ACM,2020.
[20] HUANG Z J, CHEN J H, GAO J H. Quantifying anemia and bloodshot of layers in Web applications from the perspective of code smell [J]. Acta Electronica Sinica,2020,48(4):772-780.
[21] TUFANO M, PALOMBA F, BAVOTA G, et al. When and why your code starts to smell bad (and whether the smells go away) [J]. IEEE Transactions on Software Engineering,2017,43(11):1063-1088.
[22] TAIBI D, JANES A, LENARDUZZI V. How developers perceive smells in source code: a replicated study [J]. Information and Software Technology,2017,92:223-235.
(Editors in charge: BAO Zhenyu, FENG Zhenzhen)