唐晨 李勇華 饒夢妮 胡鋼俊
Abstract: Compared with Information Retrieval (IR) methods, ontology-based dynamic requirements traceability methods can improve the precision of trace links, but constructing a reasonable and effective ontology, especially a domain ontology, is a complicated and tedious process. To reduce the time and labor costs of building a domain ontology, a Modifier Ontology-based Keyword Semantic Judgment Method (MOKSJM), which combines modifiers with a general ontology, was proposed. Firstly, the collocation relationships between keywords and modifiers were analyzed. Then, the semantics of keywords were determined by combining a modifier ontology with rules, so as to avoid the bias that keyword polysemy introduces into dynamic requirements traceability results. Finally, based on the results of this analysis, the keyword semantics were adjusted and reflected through similarity scores. Because modifiers are relatively few in requirements documents, design documents and similar artifacts, the time and labor costs of building a modifier ontology are relatively small. Experimental results show that, at comparable recall, the gap in precision between MOKSJM and the domain ontology-based dynamic traceability method is small, and that, compared with the Vector Space Model (VSM) method, MOKSJM effectively improves the precision of the requirements traceability results.
Key words: dynamic requirements traceability; ontology; modifier; requirements engineering; software engineering
CLC number: TP311.5
Document code: A
0 Introduction
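Semantics is currently a key problem in dynamic requirements traceability [1]. As ontology research has deepened and ontology technology has come into wide use, ontologies have attracted growing attention, and more and more researchers use them to address semantic problems in dynamic requirements traceability. Chen et al. [2] proposed a method for evaluating semantic mining: nine semantic relations from WordNet were combined with the synonym relation from the Unified Medical Language System (UMLS) to obtain a standard dataset, which was then used to evaluate word embeddings. The method applies to most semantic relations, but because it measures similarity with the cosine, its results are not sufficiently accurate. Kolhe et al. [3], aiming to ease retrieval and management of large text databases, clustered documents and created labels with Latent Semantic Indexing (LSI), and then computed similarity by combining WordNet query expansion with cosine similarity. Through WordNet's semantic algorithms the method handles problems such as polysemy, but its memory requirements grow as the number of matrix transformations increases. Besbes et al. [4], to help users understand or express medical terms, automatically extracted concepts from user queries and built a medical ontology, fuzzified the ontology by taking taxonomic relations and user profile information into account, and finally incorporated the ontology into query reformulation; however, the experimental results depend closely on the application domain. Matei et al. [5] computed the semantic distance between words with WordNet and then computed the similarity between texts on the basis of dynamic time series. Compared with the traditional vector space model, the proposed time series model takes the influence of word order on semantics into account and improves the accuracy of the results. Kulathunga et al. [6] combined ontologies with clustering to identify ambiguous word meanings in financial texts. Although the method removes semantic ambiguity from the text and improves the performance of the clustering algorithm, its effectiveness was not validated on a financial dataset. Mai et al. [7] proposed a semantic kernel function based on statistics and ontology and embedded it in a support vector machine for Chinese text classification. The method makes full use of the semantic relations in the text to improve classification performance, but constructing the feature matrix associated with the semantic kernel is very time-consuming. Gong et al. [8] took microblog short texts as material, built an ontology knowledge base for the security domain, expanded the initial query terms with ontology knowledge, filtered the candidate expansion terms with local query feedback, and finally obtained the result through a second query and iteration. Because microblogs consist mainly of short texts in which keywords and information are sparse, the accuracy of this method decreases as the number of query results grows.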
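Several of the approaches surveyed above, as well as the vector space model, score a pair of texts by the cosine similarity of their term vectors. The following is a minimal sketch of that measure, assuming raw term-frequency weights and whitespace tokenization; the function names and the example sentences are illustrative and are not taken from the cited works.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Illustrative texts: one requirement and two candidate design fragments.
requirement = term_vector("the user opens a bank account")
design_a = term_vector("account service opens a bank account record")
design_b = term_vector("the river bank erosion monitor")

score_a = cosine_similarity(requirement, design_a)
score_b = cosine_similarity(requirement, design_b)
```

In this sketch `design_b` receives a nonzero score only because it shares the surface tokens "the" and "bank" with the requirement, although "bank" is used in a different sense. Purely lexical matching cannot tell the two senses apart.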
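Studies show that 78% of the words in requirements documents are related to nouns [9], so verbs and nouns have become the main objects of study in dynamic requirements traceability. Nouns, however, are polysemous, and taking verbs and nouns as the objects of study easily introduces traceability errors caused by semantic ambiguity. Information Retrieval (IR) methods cannot resolve polysemy (one word with several meanings) or synonymy (one meaning expressed by several words) [10]. Dynamic traceability methods based on a domain ontology can resolve these problems effectively, but they require a domain ontology to be built, which is a complicated and tedious process. Because modifiers are relatively few in requirements documents, building a modifier ontology costs less than building a domain ontology. This paper therefore proposes a Modifier Ontology-based Keyword Semantic Judgment Method (MOKSJM). Building on the general ontology WordNet and combining it with a modifier ontology, the method determines the meaning that a noun carries in the artifacts, which reduces the semantic confusion caused by polysemy and synonymy and lowers the time and labor costs of building a domain ontology.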
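The idea of letting modifiers decide the sense of a polysemous noun can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not the rules of MOKSJM: the modifier categories, the sense inventory, the voting rule and the `penalty` factor are all hypothetical stand-ins for the modifier ontology, the WordNet senses and the similarity adjustment.

```python
# Hypothetical modifier ontology: modifier -> category.
MODIFIER_CATEGORY = {
    "savings": "finance",
    "joint": "finance",
    "river": "geography",
    "muddy": "geography",
}

# Hypothetical sense inventory: keyword -> {sense label: category}.
KEYWORD_SENSES = {
    "bank": {
        "bank.financial_institution": "finance",
        "bank.river_side": "geography",
    },
}


def select_sense(keyword, modifiers):
    """Return the sense supported by the most modifiers, or None without evidence."""
    votes = {}
    for modifier in modifiers:
        category = MODIFIER_CATEGORY.get(modifier)
        if category is None:
            continue  # modifier not covered by the toy ontology
        for sense, sense_category in KEYWORD_SENSES.get(keyword, {}).items():
            if sense_category == category:
                votes[sense] = votes.get(sense, 0) + 1
    if not votes:
        return None
    return max(votes, key=votes.get)


def adjusted_similarity(base_score, sense_a, sense_b, penalty=0.5):
    """Scale down a lexical match when both senses are known and differ."""
    if sense_a is not None and sense_b is not None and sense_a != sense_b:
        return base_score * penalty
    return base_score


# The same keyword resolves to different senses under different modifiers.
sense_req = select_sense("bank", ["savings"])
sense_doc = select_sense("bank", ["river", "muddy"])
score = adjusted_similarity(1.0, sense_req, sense_doc)
```

Only the modifiers need to be catalogued by hand in such a scheme, which is the source of the cost saving argued for above. When no modifier gives evidence, the sketch leaves the lexical score unchanged rather than guessing a sense.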
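3 Conclusion
Domain ontologies have become an important means of research in dynamic requirements traceability, but current methods for constructing them cannot be automated, and the quality and scale of the resulting ontologies are limited to some extent.
This paper proposed a Modifier Ontology-based Keyword Semantic Judgment Method (MOKSJM). Building on the general ontology WordNet, the method uses modifier categories and the semantic distances between modifiers, and adjusts keyword similarity so that the score reflects the selected sense, thereby eliminating semantic ambiguity. Experiments demonstrated the effectiveness of the method.
Future work will focus on combining sentence structure with modifiers and on using shallow semantic analysis to capture, at the sentence level, the central meaning of each sentence, so as to improve the accuracy of the recommended trace links.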
References
[1] CLELAND-HUANG J, SETTIMI R, DUAN C, et al. Utilizing supporting evidence to improve dynamic requirements traceability[C]// Proceedings of the 13th IEEE International Conference on Requirements Engineering. Piscataway, NJ: IEEE, 2005: 135-144.
[2] CHEN Z, HE Z, LIU X, et al. An exploration of semantic relations in neural word embeddings using extrinsic knowledge[C]// Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine. Washington, DC: IEEE Computer Society, 2017: 1246-1251.
[3] KOLHE S R, SAWARKAR S D. A concept driven document clustering using WordNet[C]// Proceedings of the 2017 International Conference on Nascent Technologies in Engineering. Piscataway, NJ: IEEE, 2017: 1-5.
[4] BESBES G, BAAZAOUI-ZGHAL H. Fuzzy ontology-based medical information retrieval[C]// Proceedings of the 2016 IEEE International Conference on Fuzzy Systems. Piscataway, NJ: IEEE, 2016: 178-185.
[5] MATEI L S, MATU S T. Document semantic distance based on the time series model[C]// Proceedings of the 2016 15th RoEduNet Conference: Networking in Education and Research. Piscataway, NJ: IEEE, 2016: 1-4.
[6] KULATHUNGA C, KARUNARATNE D D. An ontology-based and domain specific clustering methodology for financial documents[C]// Proceedings of the 17th International Conference on Advances in ICT for Emerging Regions. Piscataway, NJ: IEEE, 2018: 1-8.
[7] MAI F J, HUANG L, TAN J, et al. The research of semantic kernel in SVM for Chinese text classification[C]// Proceedings of the 2nd International Conference on Intelligent Information Processing. New York: ACM, 2017: Article No. 8.
[8] GONG H, DU J P, LAI J C, et al. Microblog query expansion algorithm based on ontology and local query feedback[J]. Journal of Nanjing University (Natural Sciences), 2017, 53(6): 1004-1011. (in Chinese)
[9] CUNNINGHAM H, MAYNARD D, BONTCHEVA K, et al. Developing language processing components with GATE version 7 (a user guide)[EB/OL]. [2018-03-20]. http://gate.ac.uk/sale/tao/tao.pdf.
[10] LI Y, LI J, LI M S. Research on dynamic requirement traceability method and traces precision[J]. Journal of Software, 2009, 20(2): 177-192. (in Chinese)
[11] Stanford University. The Stanford parser: a statistical parser[CP/OL]. [2018-07-21]. https://nlp.stanford.edu/software/lexparser.shtml.
[12] XU J, ZHANG Z X. An term similarity algorithm based on word soft matching and weight difference of modifying words[J]. Journal of the China Society for Scientific and Technical Information, 2011, 30(11): 1145-1151. (in Chinese)
[13] LI Y, CLELAND-HUANG J. Ontology-based trace retrieval[C]// Proceedings of the 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering. Washington, DC: IEEE Computer Society, 2013: 30-36.
[14] SALTON G, WONG A, YANG C S. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
[15] MANNING C D, RAGHAVAN P, SCHÜTZE H. Introduction to Information Retrieval[M]. Cambridge: Cambridge University Press, 2008: 142-145.