
Research on Abstractive Text Summarization Based on Multi-Task Learning

2020-12-28 11:53
李伯涵 李紅蓮 (Li Bohan, Li Honglian)
電腦知識與技術(shù) (Computer Knowledge and Technology), 2020, Issue 31


CLC classification number: TP391.1    Document code: A

Article ID: 1009-3044(2020)31-0020-06

Abstract: Attention-based encoder-decoder neural networks have a good ability to generate text summaries. However, these models are difficult to control during generation, which leads to missing key information in the generated text. Some key information, such as time, place, and people, is essential for humans to understand the main content. This paper proposes a key information guide network for abstractive text summarization based on a multi-task learning framework. The main idea is to automatically extract the key information that people need most in an end-to-end manner and use it to guide the generation process, so as to obtain summaries that better match human needs. In the proposed model, the document is encoded into two parts: the output of an ordinary document encoder and the encoding of key information, where the key information consists of key sentences and keywords. A multi-task learning framework is introduced to obtain a more advanced end-to-end model. To fuse the key information, a multi-perspective attention guide network is proposed to obtain vectors for the source text and the key information, and these vectors are incorporated into the generation module to guide summary generation. The model is evaluated on the CNN and Daily Mail datasets, and the experimental results show significant improvements.

Key words: key information; multi-task learning; text summarization; key information guide network

1 Introduction

Text summarization is the task of automatically generating a short summary from a given text. There are two main approaches: extractive and abstractive. Extractive models [1-2] typically obtain a summary by selecting sentences from the original text, while abstractive models [3-4] produce a summary by generating new sentences. Recently, the encoder-decoder neural network framework [5] has advanced research on abstractive text summarization.

Both the original text and the summary are written in human language. To produce higher-quality results, a model must be able to "understand" the original text and represent it in a human-like way. Entities such as time, place, and people are key to how humans grasp the main content, so this key information must be exploited when generating a summary. Although current abstractive models have been shown to capture the regularities of text summarization, they are difficult to control during generation. In other words, without external guidance it is hard to ensure that an abstractive model identifies the key information and includes it in the summary [6].

Several studies have tried to address these problems. Zhou et al. [7] proposed a selective gate network to retain more key information in the summary. However, the selective gate network, controlled by the input text, filters the information flowing from the encoder to the decoder only once; if some key information fails to pass through the gate, it is unlikely to appear in the summary. See et al. [8] proposed a pointer-generator model that uses a pointer mechanism [9] to copy words from the input text in order to handle out-of-vocabulary (OOV) words, but without external guidance the pointer struggles to identify keywords. In earlier work, we combined an extractive model with an abstractive model, using the former to extract keywords that guide the latter [10]. However, that model is still imperfect, and it is a pipeline system in which the keywords are extracted with the TextRank algorithm.

3.3 Pointer Mechanism

To handle the OOV (out-of-vocabulary) problem, the pointer network [9] is combined with the key-information-guided generation method, so that the model can both copy words from the source text and generate new ones. As in the pointer-generator model, a switch $p_{sw}$ is computed to choose between generating a word and copying a word from the source text:
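The switch equation itself was lost in extraction (it appeared as an image in the journal layout). A hedged reconstruction in the spirit of the pointer-generator [8] and the key information guide network [10], where $s_t$ is the decoder state, $c_t$ the attention context vector, $x_t$ the decoder input, and $k$ the key information vector (the exact set of inputs is our assumption), is

$p_{sw} = \sigma(W_s s_t + W_c c_t + W_x x_t + W_k k + b_{sw})$

and the resulting word distribution mixes generation and copying,

$P(w) = p_{sw} \, P_{vocab}(w) + (1 - p_{sw}) \sum_{i:\, w_i = w} a^t_i$

where $a^t$ is the attention distribution over source positions, so an OOV word can still be produced by copying it directly from the input.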

4 Multi-Task Learning for Abstractive Text Summarization with KIGN

The KIGN introduced in Section 3 allows the summary generator to pay more attention to key information, but that model is still imperfect; for example, the key information is obtained with the TextRank algorithm rather than with a learning-based method. This section presents a multi-task learning model for abstractive text summarization built on the KIGN framework (Figure 2), which comprises a document encoder, a key information extractor, joint training, and a prediction guide mechanism.

As shown in Figure 2, a document encoder consisting of a word-level encoder and a sentence-level encoder is proposed first, so that global features can be obtained separately for every word and every sentence. A key information extraction layer then selects the keywords and key sentences, and the extracted key information guides the generator during decoding. Finally, the generator and the key information extractor are trained end-to-end by jointly minimizing three loss functions, as sketched below.
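A minimal sketch of this joint objective in Python, assuming each task contributes a scalar loss (e.g., cross-entropy; the individual loss definitions and the function name are ours, not the paper's), with the weights taken from Section 5.1:

def joint_loss(gen_loss, keyword_loss, key_sentence_loss,
               lambda1=1.0, lambda2=0.5, lambda3=0.5):
    # Weighted sum over the three tasks: summary generation, keyword
    # extraction, and key sentence extraction. The weights match the
    # values reported in Section 5.1.
    return (lambda1 * gen_loss
            + lambda2 * keyword_loss
            + lambda3 * key_sentence_loss)

Because the two extraction losses share the document encoder with the generator, minimizing this weighted sum pushes the encoder toward representations that expose key information to the generation module.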

4.1 Document Encoder

This paper proposes a novel document encoder that encodes words and sentences separately, instead of encoding both with a single hierarchical encoder [4]. This design benefits the subsequent keyword and key sentence extraction.

4.1.1 Global Word Encoder

Keyword extraction requires information about the whole document as well as about individual words, so a bidirectional LSTM is used as the global word encoder. The words of the input text $[w_1, w_2, \ldots, w_n]$ are fed into the global word encoder, which maps the text to a sequence of hidden states $[\overrightarrow{h^w_1}, \overrightarrow{h^w_2}, \ldots, \overrightarrow{h^w_n}]$:
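The recurrence equation was lost in extraction; in standard bidirectional LSTM form it would read $\overrightarrow{h^w_i} = \mathrm{LSTM}(e(w_i), \overrightarrow{h^w_{i-1}})$ and $\overleftarrow{h^w_i} = \mathrm{LSTM}(e(w_i), \overleftarrow{h^w_{i+1}})$, with the full state $h^w_i = [\overrightarrow{h^w_i}; \overleftarrow{h^w_i}]$. A minimal PyTorch sketch under these assumptions (the 300-dimensional hidden size follows Section 5.1; the embedding size is our choice):

import torch
import torch.nn as nn

class GlobalWordEncoder(nn.Module):
    # Bidirectional LSTM mapping word indices w_1..w_n to hidden states
    # that carry document-level context for keyword extraction.
    def __init__(self, vocab_size=50000, emb_dim=128, hidden_dim=300):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_ids):            # word_ids: (batch, n)
        emb = self.embedding(word_ids)      # (batch, n, emb_dim)
        states, _ = self.bilstm(emb)        # (batch, n, 2 * hidden_dim)
        return states                       # forward/backward states concatenated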

Keyword label generation. To obtain the keywords, stopwords are removed from the reference summary, and the remaining tokens $r^w = \{r^w_i\}$ are used as the keyword ground-truth labels. For the key sentence ground-truth labels $r^s = \{r^s_i\}$, the informativeness of every sentence in the text is measured and the key sentences are selected [11]. Specifically, the informativeness of each sentence is first measured by computing the ROUGE-L score between the sentence and the reference summary. The sentences are then ranked by informativeness and selected from highest to lowest. Finally, the resulting ground-truth labels are used to train the extractor by minimizing a loss function. The procedure is similar to [11], which extracts the final summary of an article and therefore selects sentences by ROUGE F-1 score; in the model proposed here, the ROUGE score is used so that the selected sentences capture as much of the reference summary's information as possible. A minimal sketch of this labeling procedure follows.
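In this sketch, the stopword list and the rouge_l scoring function are assumed to be provided externally (neither is specified in the paper):

def keyword_labels(reference_summary, stopwords):
    # r^w: reference-summary tokens that survive stopword removal.
    return [w for w in reference_summary.split()
            if w.lower() not in stopwords]

def key_sentence_labels(article_sentences, reference_summary, k, rouge_l):
    # r^s: the k article sentences with the highest ROUGE-L score
    # against the reference summary, i.e. the most informative ones.
    ranked = sorted(article_sentences,
                    key=lambda s: rouge_l(s, reference_summary),
                    reverse=True)
    return ranked[:k]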

4.5 Prediction Guide Mechanism at Test Time

During testing, when predicting the next word, the model considers not only the probability above (Equation (9)) but also the long-term value predicted by a prediction guide mechanism [13].

The prediction guide mechanism here is a single-layer feed-forward network with a sigmoid activation function that predicts how much key information the final summary will cover. At each decoding step $t$, the decoder hidden states are averaged as $\bar{s}_t = \frac{1}{t}\sum_{l=1}^{t} s_l$ and the encoder hidden states as $\bar{h} = \frac{1}{n}\sum_{i=1}^{n} h_i$; these averages, together with the key information vector $k$, are fed into the network to obtain the long-term value.
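A minimal sketch of this network, plus one plausible way to fold its output into beam search by interpolating the decoder log-probability with the predicted value using the weight $\alpha$ of Section 5.1, in the spirit of value-network decoding [13] (the exact interpolation used in the paper is not spelled out, so treat it as an assumption):

import torch
import torch.nn as nn

class PredictionGuide(nn.Module):
    # Single-layer feed-forward network with sigmoid activation that
    # predicts, from averaged states plus key information, how much key
    # information the finished summary will eventually cover.
    def __init__(self, in_dim, hidden_dim=800):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, dec_avg, enc_avg, key_info):
        # dec_avg: mean of decoder states s_1..s_t
        # enc_avg: mean of encoder states h_1..h_n
        # key_info: key information vector k
        x = torch.cat([dec_avg, enc_avg, key_info], dim=-1)
        return torch.sigmoid(self.out(torch.sigmoid(self.hidden(x))))

def guided_beam_score(log_prob, value, alpha=0.9):
    # Interpolate the decoder's log-probability with the long-term
    # value; alpha = 0.9 gave the best F-score in Figure 3.
    return alpha * log_prob + (1.0 - alpha) * torch.log(value)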

To train this network, for each input $x$ two partial summaries $y_{p1}$ and $y_{p2}$ are sampled by stopping the decoder at random steps, yielding the averaged state $\bar{s}_t$. Each partial summary $y_p$ is then completed to obtain $M$ averaged decoder hidden states $\bar{s}$, from which the average score is computed:

5 Experiments

5.1 Experimental Setup

The CNN/Daily Mail dataset [4,25] is used, and the data is processed in the same way as [8]. For the global word encoder and the sentence encoder, three 300-dimensional LSTMs are used, together with a 50k-word vocabulary. During training and testing, the word encoder input is limited to 400 tokens and the summary length to 100 tokens. The model is trained with the Adagrad algorithm [15] with a learning rate of 0.15 and an initial accumulator value of 0.1. The batch size is set to 16, and the numbers of keywords and key sentences are 40 and 10, respectively. The three tasks are trained jointly with $\lambda_1 = 1$ and $\lambda_2 = \lambda_3 = 0.5$. The main evaluation metric is the ROUGE F-score.
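For convenience, the settings above collected into a single configuration sketch (the key names are ours; the values are the ones reported in this subsection):

CONFIG = {
    "dataset": "CNN/Daily Mail",
    "encoder_hidden_dim": 300,        # three 300-dimensional LSTMs
    "vocab_size": 50_000,
    "max_input_tokens": 400,
    "max_summary_tokens": 100,
    "optimizer": "Adagrad",
    "learning_rate": 0.15,
    "initial_accumulator": 0.1,
    "batch_size": 16,
    "num_keywords": 40,
    "num_key_sentences": 10,
    "loss_weights": (1.0, 0.5, 0.5),  # lambda_1, lambda_2, lambda_3
    "metric": "ROUGE F-score",
}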

In addition, the prediction guide mechanism uses a single-layer feed-forward network with 800 nodes. For the hyperparameter $\alpha$, different values were tested during decoding to evaluate the KIGN prediction guide model. As Figure 3 shows, the model's performance is stable for $\alpha$ between 0.8 and 0.95, and the highest F-score is obtained with $\alpha = 0.9$. Here $M$ is set to 8, and the network is trained with mini-batches and the AdaDelta algorithm.

Similar to [8], the input is truncated to 400 tokens and the output summary length is set to 100 tokens during both training and testing. The keyword network model is trained for fewer than 200,000 iterations. A single-layer feed-forward network is then trained on top of the KIGN model, and finally, at test time, the KIGN model is combined with the prediction guide mechanism to generate summaries.

5.2 Results and Discussion

The experimental results are shown in Table 1. The first five rows are widely used sequence-to-sequence methods: a Seq2Seq model with attention and a 150k-word vocabulary, a Seq2Seq model with attention and a 50k-word vocabulary, a Seq2Seq model with graph attention, a hierarchical attention network [4], and a Seq2Seq model with a pointer mechanism.

Table 1 shows that the baseline version of our model, the key information guide network (Figure 1), achieves better scores than the Seq2Seq model with a pointer mechanism: +1.3 ROUGE-1, +0.9 ROUGE-2, and +1.0 ROUGE-L. With the prediction guide mechanism (KIGN + Prediction-guide, Figure 1), the key information guide network improves further, by +2.5 ROUGE-1, +1.5 ROUGE-2, and +2.2 ROUGE-L. The key information guide network with the multi-task learning framework (Figure 2) achieves the best scores, and the results improve again when the keywords and key sentences are given, which also confirms that guiding summary generation with key information is reasonable.

5.3 Case Study

To demonstrate the method's ability to capture key information, Figure 4 shows the processing results for a specific text. The original text is listed in the upper half of Figure 4 with the key information in bold, followed by the reference summary and the outputs of the two models. The original text concerns Google handwriting input on Android phones and introduces some of its features; the key information includes "google claims", "read anyones handwriting", "android handsets can understand 82 languages in 20 distinct scripts", and "works with both printed and cursive writing input with or without a stylus". The output of the baseline model with the pointer mechanism only covers "google have cracked the problem of reading handwriting", whereas the proposed model summarizes almost all of the key information.

6 Conclusion

This paper proposes a multi-task learning model with a key information guide network, combining extractive and abstractive methods in a novel way. The model builds on the key information guide network: an extractive method obtains keywords from the text, the keywords are encoded into a key information vector, and the vector is fed into the abstractive model to guide generation, mainly through two channels, the attention mechanism and the pointer mechanism. On top of the key information guide network, a multi-task learning model is proposed to jointly train the extractive and abstractive models. Specifically, a document encoder encodes the words and sentences of the input text separately; based on this encoder, key information consisting of keywords and key sentences is extracted, so that the key information comes from the sequence-to-sequence model itself rather than from the TextRank method. Finally, the three tasks of keyword extraction, key sentence extraction, and summary generation are trained jointly. At test time, a prediction guide mechanism provides a long-term value for subsequent decoding and further guides summary generation. Experiments show that the proposed model significantly improves summary quality.

References:

[1] R. Mihalcea, P. Tarau, in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. TextRank: bringing order into text (Association for Computational Linguistics, Barcelona, Spain, 2004), pp. 404–411. https://www.aclweb.org/anthology/W04-3252

[2] M. Yasunaga, R. Zhang, K. Meelu, et al., Graph-based neural multi-document summarization. arXiv:1706.06681 [cs.CL] (2017). https://arxiv.org/abs/1706.06681

      [3] A. M. Rush, S. Chopra, J. Weston, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. A neural attention model for abstractive sentence summarization (Association for Computational Linguistics, Lisbon, Portugal, 2015), pp. 379–389. https://www.aclweb.org/anthology/D15-1044. https://doi.org/10.18653/v1/D15-1044

[4] R. Nallapati, B. Xiang, B. Zhou, Sequence-to-sequence RNNs for text summarization. CoRR abs/1602.06023 (2016)

      [5] I. Sutskever, O. Vinyals, Q. V. Le, in Advances in Neural Information Processing Systems 27. Sequence to sequence learning with neural networks (Curran Associates, Inc., 2014), pp. 3104–3112. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

[6] J. Tan, X. Wan, J. Xiao, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Abstractive document summarization with a graph-based attentional neural model (Association for Computational Linguistics, Vancouver, Canada, 2017), pp. 1171–1181. https://www.aclweb.org/anthology/P17-1108. https://doi.org/10.18653/v1/P17-1108

      [7] Q. Zhou, N. Yang, F. Wei, M. Zhou, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Selective encoding for abstractive sentence summarization (Association for Computational Linguistics, Vancouver, Canada, 2017), pp. 1095–1104. https://www.aclweb.org/anthology/P17-1101. https://doi.org/10.18653/v1/P17-1101

[8] A. See, P. J. Liu, C. D. Manning, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Get to the point: summarization with pointer-generator networks (Association for Computational Linguistics, Vancouver, Canada, 2017), pp. 1073–1083. https://www.aclweb.org/anthology/P17-1099. https://doi.org/10.18653/v1/P17-1099

[9] O. Vinyals, M. Fortunato, N. Jaitly, in Advances in Neural Information Processing Systems 28, ed. by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett. Pointer networks (Curran Associates, Inc., 2015), pp. 2692–2700. http://papers.nips.cc/paper/5866-pointer-networks.pdf

      [10] C. Li, W. Xu, S. Li, S. Gao, in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Guiding generation for abstractive text summarization based on key information guide network (Association for Computational Linguistics, New Orleans, Louisiana, 2018), pp. 55–60. https://www.aclweb.org/anthology/N18-2009. https://doi.org/10.18653/v1/N18-2009

[11] W.-T. Hsu, C.-K. Lin, M.-Y. Lee, K. Min, J. Tang, M. Sun, in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). A unified model for extractive and abstractive summarization using inconsistency loss (Association for Computational Linguistics, Melbourne, Australia, 2018), pp. 132–141. https://www.aclweb.org/anthology/P18-1013. https://doi.org/10.18653/v1/P18-1013

[12] S. Chopra, M. Auli, A. M. Rush, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Abstractive sentence summarization with attentive recurrent neural networks (Association for Computational Linguistics, San Diego, California, 2016), pp. 93–98. https://www.aclweb.org/anthology/N16-1012. https://doi.org/10.18653/v1/N16-1012

[13] D. He, H. Lu, Y. Xia, T. Qin, L. Wang, T.-Y. Liu, in Advances in Neural Information Processing Systems 30, ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Decoding with value networks for neural machine translation (Curran Associates, Inc., 2017), pp. 178–187. http://papers.nips.cc/paper/6622-decoding-with-value-networks-for-neural-machine-translation.pdf

[14] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate. Comput. Sci. (2014). arXiv:1409.0473v6

[15] J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(Jul), 2121–2159 (2011)

[16] C. Y. Lin, E. Hovy, in Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. Automatic evaluation of summaries using n-gram co-occurrence statistics, (2003), pp. 150–157. https://www.aclweb.org/anthology/N03-1020

[17] M. D. Zeiler, ADADELTA: an adaptive learning rate method. CoRR abs/1212.5701 (2012)

[18] Z. Ma, Y. Lai, W. B. Kleijn, L. Wang, J. Guo, Variational Bayesian learning for Dirichlet process mixture of inverted Dirichlet distributions in non-Gaussian image feature modeling. IEEE Trans. Neural Netw. Learn. Syst. 30(2), 449–463 (2019). https://doi.org/10.1109/TNNLS.2018.2844399

[19] Z. Ma, H. Yu, W. Chen, J. Guo, Short utterance based speech language identification in intelligent vehicles with time-scale modifications and deep bottleneck features. IEEE Trans. Veh. Technol. 68(1), 121–128 (2019). https://doi.org/10.1109/TVT.2018.2879361

[20] K. Zhang, N. Liu, X. Yuan, X. Guo, C. Gao, Z. Zhao, Fine-grained age estimation in the wild with attention LSTM networks. IEEE Trans. Circ. Syst. Video Technol. (TCSVT) (2019). Accepted. https://doi.org/10.1109/tcsvt.2019.2936410

[21] X. Li, L. Yu, D. Chang, Z. Ma, J. Cao, Dual cross-entropy loss for small-sample fine-grained vehicle classification. IEEE Trans. Veh. Technol. (TVT). 68(5), 4204–4212 (2019)

[22] Z. Ma, J.-H. Xue, A. Leijon, Z.-H. Tan, Z. Yang, J. Guo, Decorrelation of neutral vector variables: theory and applications. IEEE Trans. Neural Netw. Learn. Syst. 29(1), 129–143 (2018). https://doi.org/10.1109/TNNLS.2016.2616445

[23] Z. Ma, D. Chang, J. Xie, Y. Ding, S. Wen, X. Li, Z. Si, J. Guo, Fine-grained vehicle classification with channel max pooling modified CNNs. IEEE Trans. Veh. Technol. 68(4), 3224–3233 (2019)

[24] Z. Ma, J. Xie, Y. Lai, J. Taghia, J.-H. Xue, J. Guo, Insights into multiple/single lower bound approximation for extended variational inference in non-Gaussian structured data modeling. IEEE Trans. Neural Netw. Learn. Syst., 1–15 (2019). https://doi.org/10.1109/TNNLS.2019.2899613

[25] K. M. Hermann, T. Kocisky, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, P. Blunsom, in Advances in Neural Information Processing Systems 28, ed. by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett. Teaching machines to read and comprehend (Curran Associates, Inc., 2015), pp. 1693–1701. http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend.pdf

[Corresponding editor: Liang Shu]
