cs.CL: 22 papers today
Transformer (1 paper)
【1】 Classifying Long Clinical Documents with Pre-trained Transformers
Authors: Xin Su, Timothy Miller, Xiyu Ding, Majid Afshar, Dmitriy Dligach
Affiliations: University of Arizona; Boston Children’s Hospital and Harvard Medical School; University of Wisconsin–Madison; Loyola University Chicago
Link: https://arxiv.org/abs/2105.06752
Abstract: Automatic phenotyping is the task of identifying cohorts of patients that match a predefined set of criteria. Phenotyping typically involves classifying long clinical documents that contain thousands of tokens. At the same time, recent state-of-the-art transformer-based pre-trained language models limit the input to a few hundred tokens (e.g., 512 tokens for BERT). We evaluate several strategies for incorporating pre-trained sentence encoders into document-level representations of clinical text, and find that hierarchical transformers without pre-training are competitive with task pre-trained models.
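To make the length workaround concrete, below is a minimal sketch of one chunk-and-aggregate strategy of the kind the abstract evaluates: split the note into 512-token windows, encode each window with a pre-trained encoder, and let a small transformer aggregate the chunk vectors into a document representation. Class names and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HierarchicalClassifier(nn.Module):
    """Hypothetical chunk-then-aggregate classifier for long documents."""
    def __init__(self, encoder_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        agg = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.aggregator = nn.TransformerEncoder(agg, num_layers=2)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # input_ids: (num_chunks, seq_len), all 512-token windows of ONE document
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        chunk_vecs = out.last_hidden_state[:, 0]        # [CLS] vector per chunk
        doc = self.aggregator(chunk_vecs.unsqueeze(0))  # contextualize the chunks
        return self.classifier(doc.mean(dim=1))         # document-level logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
note = "very long clinical note ..."  # thousands of tokens in practice
enc = tokenizer(note, truncation=True, max_length=512, padding=True,
                return_overflowing_tokens=True, return_tensors="pt")
model = HierarchicalClassifier()
logits = model(enc["input_ids"], enc["attention_mask"])
```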
BERT (2 papers)
【1】 BERT Busters: Outlier LayerNorm Dimensions that Disrupt BERT
Authors: Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky
Affiliations: Department of Computer Science, University of Massachusetts Lowell; Center for Social Data Science, University of Copenhagen
Note: Accepted as a long paper at Findings of ACL 2021
Link: https://arxiv.org/abs/2105.06990
Abstract: Multiple studies have shown that BERT is remarkably robust to pruning, yet few if any of its components retain high importance across downstream tasks. Contrary to this received wisdom, we demonstrate that pre-trained Transformer encoders are surprisingly fragile to the removal of a very small number of scaling factors and biases in the output layer normalization (<0.0001% of model weights). These are high-magnitude normalization parameters that emerge early in pre-training and show up consistently in the same dimensional position throughout the model. They are present in all six models of the BERT family that we examined, and removing them significantly degrades both MLM perplexity and downstream task performance. Our results suggest that layer normalization plays a much more important role than usually assumed.
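The reported fragility is straightforward to probe. A sketch of the ablation, assuming a Hugging Face BertForMaskedLM (this is illustrative, not the authors' scripts):

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

with torch.no_grad():
    for layer in model.bert.encoder.layer:
        ln = layer.output.LayerNorm          # output LayerNorm of the block
        dim = int(ln.weight.abs().argmax())  # highest-magnitude scaling factor
        print(f"outlier dim {dim}: weight={float(ln.weight[dim]):.3f}")
        ln.weight[dim] = 0.0                 # remove the scaling factor ...
        ln.bias[dim] = 0.0                   # ... and the matching bias

# Re-running MLM or downstream evaluation after this edit should reveal the
# sharp degradation the paper reports, despite touching <0.0001% of weights.
```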
【2】 Distilling BERT for low complexity network training
Authors: Bansidhar Mangalwedhekar
Link: https://arxiv.org/abs/2105.06514
Abstract: This paper studies the efficiency of transferring BERT learnings to low-complexity models such as BiLSTMs, BiLSTMs with attention, and shallow CNNs, using sentiment analysis on the SST-2 dataset. It also compares the inference complexity of the BERT model with these lower-complexity models, and underlines the importance of these techniques for enabling high-performance NLP models on edge devices such as mobiles, tablets, and MCU development boards like the Raspberry Pi, opening up exciting new applications.
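A hedged sketch of the standard distillation objective such transfer typically uses: temperature-scaled soft targets from the BERT teacher combined with the usual hard-label loss. The student architecture and hyperparameters below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMStudent(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden=256, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_labels)

    def forward(self, ids):
        h, _ = self.lstm(self.embed(ids))
        return self.fc(h.mean(dim=1))

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft targets from the teacher, scaled by temperature T
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = BiLSTMStudent(vocab_size=30522)
ids = torch.randint(0, 30522, (4, 32))      # a toy batch of token ids
labels = torch.randint(0, 2, (4,))
teacher_logits = torch.randn(4, 2)          # in practice: from fine-tuned BERT
loss = distillation_loss(student(ids), teacher_logits, labels)
loss.backward()
```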
QA | VQA | Question Answering | Dialogue (1 paper)
【1】 QAConv: Question Answering on Informative Conversations
Authors: Chien-Sheng Wu, Andrea Madotto, Wenhao Liu, Pascale Fung, Caiming Xiong
Affiliations: Salesforce AI Research; The Hong Kong University of Science and Technology
Note: Data and code are available at this https URL
Link: https://arxiv.org/abs/2105.06912
Abstract: This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source. We focus on informative conversations, including business emails, panel discussions, and work channels. Unlike open-domain and task-oriented dialogues, these conversations are usually long, complex, asynchronous, and involve strong domain knowledge. In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions, from 10,259 selected conversations with both human-written and machine-generated questions. We segment long conversations into chunks, and use a question generator and a dialogue summarizer as auxiliary tools to collect multi-hop questions. The dataset has two testing scenarios, chunk mode and full mode, depending on whether the grounded chunk is provided or must be retrieved from a large conversational pool. Experimental results show that state-of-the-art QA systems trained on existing QA datasets have limited zero-shot ability and tend to predict our questions as unanswerable. Fine-tuning such systems on our corpus achieves significant improvements of up to 23.6% in chunk mode and 13.6% in full mode.
Machine Translation (2 papers)
【1】 Do Context-Aware Translation Models Pay the Right Attention?
Authors: Kayo Yin, Patrick Fernandes, Danish Pruthi, Aditi Chaudhary, André F. T. Martins, Graham Neubig
Affiliations: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA; Instituto de Telecomunica??es, Lisbon, Portugal; Unbabel, Lisbon, Portugal
Note: Accepted to ACL 2021
Link: https://arxiv.org/abs/2105.06977
Abstract: Context-aware machine translation models are designed to leverage contextual information, but often fail to do so. As a result, they inaccurately disambiguate pronouns and polysemous words that require context for resolution. In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words? Are models paying large amounts of attention to the same context? What if we explicitly train them to do so? To answer these questions, we introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation. Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words. Furthermore, we measure the degree of alignment between the model's attention scores and the supporting context from SCAT, and apply a guided attention strategy to encourage agreement between the two.
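The "guided attention strategy" suggests a simple regularizer: push the model's attention over context tokens toward the translator-annotated supporting words. A minimal sketch using KL divergence, which is one common choice; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def guided_attention_loss(attn, support_mask):
    # attn: (batch, ctx_len) model attention over context tokens
    # support_mask: (batch, ctx_len), 1 where translators marked support
    target = support_mask / support_mask.sum(dim=-1, keepdim=True).clamp(min=1)
    return F.kl_div(attn.clamp(min=1e-9).log(), target, reduction="batchmean")

attn = torch.softmax(torch.randn(2, 10), dim=-1)  # toy attention distribution
mask = torch.zeros(2, 10)
mask[:, 3] = 1                                    # annotated supporting words
mask[:, 7] = 1
loss = guided_attention_loss(attn, mask)          # add to the MT training loss
```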
【2】 Dynamic Multi-Branch Layers for On-Device Neural Machine Translation
Authors: Zhixing Tan, Maosong Sun, Yang Liu
Affiliations: Department of Computer Science and Technology, Tsinghua University; Institute for AI Industry Research, Tsinghua University; Institute for Artificial Intelligence, Tsinghua University; Beijing National Research Center for Information Science and Technology
Link: https://arxiv.org/abs/2105.06679
Abstract: With the rapid development of artificial intelligence (AI), there is a trend toward moving AI applications such as neural machine translation (NMT) from the cloud to mobile devices such as smartphones. Constrained by limited hardware resources and battery, the performance of on-device NMT systems is far from satisfactory. Inspired by conditional computation, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference. As not all branches are activated during training, we propose shared-private reparameterization to ensure sufficient training for each branch. At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English translation task over the Transformer model. Compared with a strong baseline that also uses multiple branches, the proposed method is up to 1.6 times faster with the same number of parameters.
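A sketch of the two key ideas as they might look in a feed-forward sublayer: only one branch runs per input, and each branch's weight is expressed as shared plus private parts so every branch receives enough gradient signal. Gating and sizes are illustrative; a hard argmax gate would need a Gumbel-style trick to train.

```python
import torch
import torch.nn as nn

class DynamicMultiBranchFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_branches=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_branches)
        # shared-private reparameterization: W_i = W_shared + W_private_i
        self.shared = nn.Parameter(torch.randn(d_ff, d_model) * 0.02)
        self.private = nn.Parameter(torch.randn(num_branches, d_ff, d_model) * 0.02)
        self.out = nn.Linear(d_ff, d_model)

    def forward(self, x):                        # x: (batch, d_model)
        branch = self.gate(x).argmax(dim=-1)     # pick ONE branch per example
        w = self.shared + self.private[branch]   # (batch, d_ff, d_model)
        h = torch.relu(torch.einsum("bd,bfd->bf", x, w))
        return self.out(h)

layer = DynamicMultiBranchFFN()
y = layer(torch.randn(8, 512))                   # only one branch runs per input
```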
Summarization | Information Extraction (2 papers)
【1】 EASE: Extractive-Abstractive Summarization with Explanations
Authors: Haoran Li, Arash Einolghozati, Srinivasan Iyer, Bhargavi Paranjape, Yashar Mehdad, Sonal Gupta, Marjan Ghazvininejad
Affiliations: Facebook
Link: https://arxiv.org/abs/2105.06982
Abstract: Current abstractive summarization systems outperform their extractive counterparts, but their widespread adoption is inhibited by an inherent lack of interpretability. To achieve the best of both worlds, we propose EASE, an extractive-abstractive framework for evidence-based text generation, and apply it to document summarization. We present an explainable summarization system based on the Information Bottleneck principle that is jointly trained for extraction and abstraction in an end-to-end fashion. Inspired by previous research showing that humans use a two-stage framework to summarize long documents (Jing and McKeown, 2000), our framework first extracts a pre-defined amount of evidence spans as explanations and then generates a summary using only the evidence. Using automatic and human evaluations, we show that explanations from our framework are more relevant than simple baselines, without substantially sacrificing the quality of the generated summary.
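A toy rendition of the extract-then-abstract pipeline is below. The length-based scorer stands in for the paper's jointly trained Information Bottleneck extractor, and the BART checkpoint is an assumption, not the system EASE uses.

```python
from transformers import pipeline

def extract_evidence(sentences, budget=3):
    # Stand-in scorer: prefer longer sentences; the real extractor is learned.
    ranked = sorted(sentences, key=len, reverse=True)[:budget]
    return [s for s in sentences if s in ranked]  # keep document order

document = ["Sentence one sets the scene.",
            "A much longer, information-dense sentence with the key facts.",
            "Short.",
            "Another fairly detailed sentence elaborating on the topic."]
evidence = extract_evidence(document)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(" ".join(evidence), max_length=40, min_length=10)
print(evidence)                       # the explanation shown to the user
print(summary[0]["summary_text"])     # the abstractive summary over evidence
```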
【2】 DialSumm: A Real-Life Scenario Dialogue Summarization Dataset
Authors: Yulong Chen, Yang Liu, Liang Chen, Yue Zhang
Affiliations: Zhejiang University; School of Engineering, Westlake University; Microsoft Cognitive Services Research; College of Software, Jilin University; Institute of Advanced Technology, Westlake Institute for Advanced Study
Note: Findings of ACL
Link: https://arxiv.org/abs/2105.06762
Abstract: The proposal of large-scale datasets has facilitated research on deep neural models for news summarization. Deep learning can also be potentially useful for spoken dialogue summarization, which can benefit a range of real-life scenarios including customer service management and medication tracking. To this end, we propose DialSumm, a large-scale labeled dialogue summarization dataset. We conduct empirical analysis on DialSumm using state-of-the-art neural summarizers. Experimental results show unique challenges in dialogue summarization, such as spoken terms, special discourse structures, coreference and ellipsis, pragmatics, and social commonsense, which require specific representation learning techniques to address.
Reasoning | Analysis | Understanding | Interpretation (2 papers)
【1】 Towards Navigation by Reasoning over Spatial Configurations
Authors: Yue Zhang, Quan Guo, Parisa Kordjamshidi
Affiliations: Michigan State University
Link: https://arxiv.org/abs/2105.06839
Abstract: We address the navigation problem in which an agent follows natural language instructions while observing the environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions in visual perception. We propose a neural agent that uses the elements of spatial configurations, and investigate their influence on the navigation agent's reasoning ability. Moreover, we model the sequential execution order and align visual objects with spatial configurations in the instruction. Our neural agent improves over strong baselines on seen environments and shows competitive performance on unseen environments. Additionally, the experimental results demonstrate that explicit modeling of the spatial semantic elements in the instructions can improve the grounding and spatial reasoning of the model.
【2】 A cost-benefit analysis of cross-lingual transfer methods
Authors: Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
Affiliations: University of Campinas (UNICAMP); NeuralMind Inteligência Artificial; David R. Cheriton School of Computer Science, University of Waterloo
Link: https://arxiv.org/abs/2105.06813
Abstract: An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner. Translating examples at training or inference time is also a viable alternative. However, there are costs associated with these methods that are rarely addressed in the literature. In this work, we analyze cross-lingual methods in terms of their effectiveness (e.g., accuracy), development and deployment costs, and their latencies at inference time. Our experiments on three tasks indicate that the best cross-lingual method is highly task-dependent. Finally, by combining zero-shot and translation methods, we achieve the state of the art on two of the three datasets used in this work. Based on these results, we question the need for manually labeled training data in a target language. Code, models, and translated datasets are available at https://github.com/unicamp-dl/cross-lingual-analysis
GAN | Adversarial | Attacks | Generation (3 papers)
【1】 Generating Empathetic Responses with a Large Scale Dialog Dataset
Authors: Yubo Xie, Pearl Pu
Affiliations: School of Computer and Communication Sciences, école Polytechnique Fédérale de Lausanne, Switzerland
Link: https://arxiv.org/abs/2105.06829
Abstract: The task of empathetic response generation aims at generating syntactically correct and, more importantly, emotionally appropriate responses following previous dialog turns. Existing models either directly incorporate pre-defined emotion information to guide response generation, or use deterministic rules to decide the response emotion, ignoring the subtle emotion interactions captured in human conversations. With the advent of advanced language models, it is possible to learn the nuanced emotional exchanges captured in natural language dialogs. To fully explore the range of emotions and dialog intents, it is important to curate a dataset large enough to shed light on a general understanding of human emotional interactions in our conversations. In this paper, we describe in detail the curation process of a large-scale dialog dataset in which each utterance is labeled with one of 32 emotions and 9 intent categories. We then show how to build a multi-turn empathetic dialog model that performs well against its baselines on over 6,000 human-evaluated instances.
【2】 Adversarial Learning for Zero-Shot Stance Detection on Social Media
Authors: Emily Allaway, Malavika Srikanth, Kathleen McKeown
Affiliations: Department of Computer Science, Columbia University, New York, NY
Note: To appear in NAACL 2021
Link: https://arxiv.org/abs/2105.06603
Abstract: Stance detection on social media can help to identify and understand slanted news or commentary in everyday life. In this work, we propose a new model for zero-shot stance detection on Twitter that uses adversarial learning to generalize across topics. Our model achieves state-of-the-art performance on a number of unseen test topics with minimal computational costs. In addition, we extend zero-shot stance detection to new topics, highlighting future directions for zero-shot transfer.
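Adversarial topic generalization is commonly realized with a gradient-reversal layer: a topic discriminator trains normally while the reversed gradient pushes the shared features toward topic invariance. A sketch of that ingredient; the paper's architecture may differ in detail.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)          # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None  # reversed gradient to the encoder

class AdversarialStanceModel(nn.Module):
    def __init__(self, hidden=768, num_stances=3, num_topics=10):
        super().__init__()
        self.stance_head = nn.Linear(hidden, num_stances)
        self.topic_head = nn.Linear(hidden, num_topics)

    def forward(self, features, lamb=1.0):
        stance = self.stance_head(features)
        topic = self.topic_head(GradReverse.apply(features, lamb))
        return stance, topic          # train both heads with cross-entropy

model = AdversarialStanceModel()
stance_logits, topic_logits = model(torch.randn(4, 768))  # encoder features
```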
【3】 Joint Retrieval and Generation Training for Grounded Text Generation
Authors: Yizhe Zhang, Siqi Sun, Xiang Gao, Yuwei Fang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan
Affiliations: Microsoft Corporation, Redmond, WA, USA
Link: https://arxiv.org/abs/2105.06597
Abstract: Recent advances in large-scale pre-training, such as GPT-3, allow seemingly high-quality text to be generated from a given prompt. However, such generation systems often suffer from hallucinated facts and are not inherently designed to incorporate useful external information. Grounded generation models appear to offer remedies, but their training typically relies on rarely available parallel data where corresponding documents are provided for context. We propose a framework that alleviates this data constraint by jointly training a grounded generator and a document retriever on the language model signal. The model learns to retrieve the documents with the highest utility in generation and attentively combines them in the output. We demonstrate that by taking advantage of external references, our approach can produce more informative and interesting text in both prose and dialogue generation.
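Training a retriever on the language-model signal alone can be done by marginalizing the generator's likelihood over retrieved documents, in the spirit of RAG-style joint training. A toy version of such a joint loss, not the paper's exact system:

```python
import torch
import torch.nn.functional as F

def joint_loss(query_vec, doc_vecs, gen_logprobs):
    # query_vec: (d,); doc_vecs: (k, d); gen_logprobs: (k,) log p(y | x, doc_i)
    retr_logprobs = F.log_softmax(doc_vecs @ query_vec, dim=-1)
    # negative log of sum_i p(doc_i | x) * p(y | x, doc_i)
    return -torch.logsumexp(retr_logprobs + gen_logprobs, dim=-1)

q = torch.randn(64, requires_grad=True)       # query embedding
docs = torch.randn(5, 64, requires_grad=True) # candidate document embeddings
gen = torch.randn(5)                          # generator log-likelihoods per doc
loss = joint_loss(q, docs, gen)
loss.backward()                               # gradient reaches the retriever
```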
Semi-/Weakly-/Un-supervised | Uncertainty (1 paper)
【1】 Shades of confusion: Lexical uncertainty modulates ad hoc coordination in an interactive communication task
Authors: Sonia K. Murthy, Robert D. Hawkins, Thomas L. Griffiths
Affiliations: Department of Psychology, Princeton University, Princeton, NJ; Allen Institute for Artificial Intelligence, Seattle, WA; Department of Computer Science, Princeton University, Princeton, NJ
Note: under review
Link: https://arxiv.org/abs/2105.06546
Abstract: There is substantial variability in the expectations that communication partners bring into interactions, creating the potential for misunderstandings. To directly probe these gaps and our ability to overcome them, we propose a communication task based on color-concept associations. In Experiment 1, we establish several key properties of the mental representations of these expectations, or lexical priors, based on recent probabilistic theories. Associations are more variable for abstract concepts, variability is represented as uncertainty within each individual, and uncertainty enables accurate predictions about whether others are likely to share the same association. In Experiment 2, we then examine the downstream consequences of these representations for communication. Accuracy is initially low when communicating about concepts with more variable associations, but rapidly increases as participants form ad hoc conventions. Together, our findings suggest that people cope with variability by maintaining well-calibrated uncertainty about their partner and appropriately adaptable representations of their own.
Recognition / Classification (2 papers)
【1】 Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
Authors: Yongliang Shen, Xinyin Ma, Zeqi Tan, Shuai Zhang, Wen Wang, Weiming Lu
Affiliations: College of Computer Science and Technology, Zhejiang University; University of Science and Technology of China
Note: Accepted to ACL 2021, submission version
Link: https://arxiv.org/abs/2105.06804
Abstract: Named entity recognition (NER) is a well-studied task in natural language processing. Traditional NER research only deals with flat entities and ignores nested entities. Span-based methods treat entity recognition as a span classification task. Although these methods have the innate ability to handle nested NER, they suffer from high computational cost, ignorance of boundary information, under-utilization of spans that partially match entities, and difficulties in long entity recognition. To tackle these issues, we propose a two-stage entity identifier. First, we generate span proposals by filtering and boundary regression on seed spans to locate entities; we then label the boundary-adjusted span proposals with their corresponding categories. Our method effectively utilizes the boundary information of entities and partially matched spans during training. Through boundary regression, entities of any length can in theory be covered, which improves the ability to recognize long entities. In addition, many low-quality seed spans are filtered out in the first stage, which reduces the time complexity of inference. Experiments on nested NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.
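A compact sketch of the two stages over enumerated seed spans: filter the promising ones, regress boundary offsets, then classify the adjusted spans. The heads here are toy stand-ins for the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoStageIdentifier(nn.Module):
    def __init__(self, hidden=768, num_types=4):
        super().__init__()
        self.filter_head = nn.Linear(2 * hidden, 1)    # keep this span or not
        self.boundary_head = nn.Linear(2 * hidden, 2)  # (left, right) offsets
        self.type_head = nn.Linear(2 * hidden, num_types)

    def forward(self, token_reprs, seed_spans):
        # represent each span by its start/end token representations
        feats = torch.stack([torch.cat([token_reprs[s], token_reprs[e]])
                             for s, e in seed_spans])
        keep = torch.sigmoid(self.filter_head(feats)).squeeze(-1) > 0.5
        offsets = self.boundary_head(feats)            # stage 1: adjust bounds
        types = self.type_head(feats)                  # stage 2: label spans
        return keep, offsets, types

model = TwoStageIdentifier()
reprs = torch.randn(20, 768)                           # one 20-token sentence
seeds = [(2, 4), (5, 9), (10, 10)]                     # enumerated seed spans
keep, offsets, types = model(reprs, seeds)
```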
【2】 Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification
Authors: Seonghyeon Lee, Dongha Lee, Hwanjo Yu
Affiliations: Dept. of Computer Science and Engineering, POSTECH, Republic of Korea; Institute of Artificial Intelligence, POSTECH, Republic of Korea
Note: ACL 2021 main conference
Link: https://arxiv.org/abs/2105.06750
Abstract: Recent studies on neural networks with pre-trained weights (i.e., BERT) have mainly focused on a low-dimensional subspace, where the embedding vectors computed from input words (or their contexts) are located. In this work, we propose a new approach to finding and regularizing the remainder of the space, referred to as out-of-manifold, which cannot be accessed through the words. Specifically, we synthesize out-of-manifold embeddings based on two embeddings obtained from actually observed words, and utilize them for fine-tuning the network. A discriminator is trained to detect whether an input embedding is located inside the manifold, and simultaneously, a generator is optimized to produce new embeddings that can be easily identified as out-of-manifold by the discriminator. These two modules successfully collaborate in a unified and end-to-end manner to regularize the out-of-manifold space. Our extensive evaluation on various text classification benchmarks demonstrates the effectiveness of our approach, as well as its good compatibility with existing data augmentation techniques that aim to enhance the manifold.
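The core move is easy to sketch: synthesize an embedding from two observed ones, train a discriminator to flag it as out-of-manifold, and optimize the generator so its outputs are confidently flagged, as the abstract describes. Shapes and losses are illustrative.

```python
import torch
import torch.nn as nn

hidden = 768
generator = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh())
discriminator = nn.Linear(hidden, 1)   # logit: in-manifold (1) vs. out (0)

e1, e2 = torch.randn(8, hidden), torch.randn(8, hidden)  # observed embeddings
synthetic = generator(torch.cat([e1, e2], dim=-1))       # out-of-manifold candidate

bce = nn.BCEWithLogitsLoss()
d_loss = bce(discriminator(e1), torch.ones(8, 1)) + \
         bce(discriminator(synthetic.detach()), torch.zeros(8, 1))
# per the abstract, the generator's outputs should be EASILY identified as
# out-of-manifold, so its loss also targets the out-of-manifold label
g_loss = bce(discriminator(synthetic), torch.zeros(8, 1))
(d_loss + g_loss).backward()           # in practice, two optimizers alternate
```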
Representation (1 paper)
【1】 Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
Authors: Shauli Ravfogel, Grusha Prasad, Tal Linzen, Yoav Goldberg
Affiliations: Computer Science Department, Bar Ilan University; Allen Institute for Artificial Intelligence; Cognitive Science Department, Johns Hopkins University; Department of Linguistics and Center for Data Science, New York University
Note: Equal contribution by SR and GP
Link: https://arxiv.org/abs/2105.06965
Abstract: When language models process syntactically complex sentences, do they use the abstract syntactic information present in these sentences in a manner that is consistent with the grammar of English, or do they rely solely on a set of heuristics? We propose a method to tackle this question, AlterRep. For any linguistic feature in a sentence, AlterRep allows us to generate counterfactual representations by altering how this feature is encoded, while leaving all other aspects of the original representation intact. Then, by measuring the change in a model's word predictions with these counterfactual representations in different sentences, we can draw causal conclusions about the contexts in which the model uses the linguistic feature (if any). Applying this method to study how BERT uses relative clause (RC) span information, we found that BERT uses information about RC spans during agreement prediction using the linguistically correct strategy. We also found that counterfactual representations generated for a specific RC subtype influenced the number prediction in sentences with other RC subtypes, suggesting that information about RC boundaries was encoded abstractly in BERT's representation.
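A linear version of such an intervention can be sketched as: project out the direction a trained probe uses for the feature, then push the representation toward the opposite class. This mirrors the flavor of AlterRep; the paper's exact procedure may differ.

```python
import torch

def alter_rep(h, probe_w, alpha=2.0):
    # h: (batch, hidden) BERT hidden states; probe_w: weights of a trained
    # linear probe for the feature (e.g., "token is inside an RC span")
    w = probe_w / probe_w.norm()
    h_null = h - (h @ w).unsqueeze(-1) * w  # project out the probe direction
    return h_null - alpha * w               # push toward the negative class

h = torch.randn(4, 768)        # hidden states from some BERT layer
probe_w = torch.randn(768)     # hypothetical probe weights, trained separately
h_counterfactual = alter_rep(h, probe_w)
```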
Other Neural Networks | Deep Learning | Models | Modeling (1 paper)
【1】 Thank you BART! Rewarding Pre-Trained Models Improves Formality Style Transfer
Authors: Huiyuan Lai, Antonio Toral, Malvina Nissim
Affiliations: CLCG, University of Groningen, The Netherlands
Link: https://arxiv.org/abs/2105.06947
Abstract: Scarcity of parallel data causes formality style transfer models to have scarce success in preserving content. We show that fine-tuning pre-trained language (GPT-2) and sequence-to-sequence (BART) models boosts content preservation, and that this is possible even with limited amounts of parallel data. Augmenting these models with rewards that target style and content, the two core aspects of the task, we achieve a new state of the art.
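The reward signal can be attached with a REINFORCE-style update: scale the sampled output's log-likelihood by style and content rewards. A minimal sketch with stand-in reward values; in practice the rewards would come from a style classifier and a content metric such as BLEU against the source.

```python
import torch

def reinforce_loss(sample_logprob, style_reward, content_reward, w=0.5):
    reward = w * style_reward + (1 - w) * content_reward
    return -reward * sample_logprob    # higher reward pushes probability up

# log-probability of a sampled formal rewrite under the fine-tuned model
logprob = torch.tensor(-12.3, requires_grad=True)
loss = reinforce_loss(logprob, style_reward=0.9, content_reward=0.7)
loss.backward()                        # gradient scales with the reward
```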
Other (4 papers)
【1】 Plot and Rework: Modeling Storylines for Visual Storytelling
Authors: Chi-Yang Hsu, Yun-Wei Chu, Ting-Hao Huang, Lun-Wei Ku
Affiliations: Pennsylvania State University; Purdue University; Institute of Information Science, Academia Sinica
Note: Accepted by ACL'21 Findings; this is not the camera-ready version
Link: https://arxiv.org/abs/2105.06950
Abstract: Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, according to both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.
【2】 Neural-Symbolic Commonsense Reasoner with Relation Predictors
Authors: Farhad Moghimifar, Lizhen Qu, Yue Zhuo, Gholamreza Haffari, Mahsa Baktashmotlagh
Affiliations: The School of ITEE, The University of Queensland, Australia; Monash University, Australia; School of CSE, The University of New South Wales, Australia
Note: ACL 2021
Link: https://arxiv.org/abs/2105.06717
Abstract: Commonsense reasoning aims to incorporate sets of commonsense facts, retrieved from Commonsense Knowledge Graphs (CKGs), to draw conclusions about ordinary situations. The dynamic nature of commonsense knowledge postulates models capable of performing multi-hop reasoning over new situations. This feature also results in large-scale sparse knowledge graphs, where such a reasoning process is needed to predict relations between new events. However, existing approaches in this area are limited by treating CKGs as a limited set of facts, rendering them unfit for reasoning over new unseen situations and events. In this paper, we present a neural-symbolic reasoner that is capable of reasoning over large-scale dynamic CKGs. The logic rules for reasoning over CKGs are learned by our model during training. In addition to providing interpretable explanations, the learned logic rules help to generalise prediction to newly introduced events. Experimental results on the task of link prediction on CKGs prove the effectiveness of our model, which outperforms the state-of-the-art models.
【3】 DaLAJ - a dataset for linguistic acceptability judgments for Swedish: Format, baseline, sharing
Authors: Elena Volodina, Yousuf Ali Mohammed, Julia Klezl
Affiliations: University of Gothenburg, Sweden
Note: This is an extended version of an article accepted to the 10th NLP4CALL workshop (2021), Link?ping Electronic Conference Proceedings 177, ISSN: 1650-3740 (online). In the extended version (available at arXiv) we have added a description of an experiment and baseline results to the dataset description accepted for the NLP4CALL publication
Link: https://arxiv.org/abs/2105.06681
Abstract: We present DaLAJ 1.0, a dataset for linguistic acceptability judgments for Swedish comprising 9,596 sentences in its first version, together with an initial experiment using it for the binary classification task. DaLAJ is based on the SweLL second-language learner data, consisting of essays at different levels of proficiency. To make sure the dataset can be freely available despite GDPR regulations, we have sentence-scrambled the learner essays and removed part of the metadata about the learners, keeping for each sentence only information about the mother tongue and the level of the course in which the essay was written. We use the normalized version of the learner language as the basis for the DaLAJ sentences, and keep only one error per sentence. We repeat the same sentence for each individual correction tag used in the sentence. For DaLAJ 1.0 we have used four error categories (out of 35 available in SweLL), all connected to lexical or word-formation choices. Our baseline results for binary classification show an accuracy of 58% on DaLAJ 1.0 using BERT embeddings. The dataset is included in the SwedishGlue (Swe. SuperLim) benchmark. Below, we describe the format of the dataset, the first experiments, our insights, and the motivation for the chosen approach to data sharing.
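A sketch of the kind of baseline reported: sentence embeddings from a BERT model fed to a linear classifier for the binary acceptability judgment. The Swedish checkpoint and the toy sentences are assumptions, not the authors' setup.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
bert = AutoModel.from_pretrained("KB/bert-base-swedish-cased")

def embed(sentences):
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    return out.last_hidden_state[:, 0].numpy()   # [CLS] embeddings

# toy training pair: an acceptable sentence and a learner error
X_train = embed(["Han gick till skolan.", "Han gick till skola."])
y_train = [1, 0]
clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict(embed(["Hon l?ser en bok."])))
```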
【4】 NLP is Not enough -- Contextualization of User Input in Chatbots
Authors: Nathan Dolbir, Triyasha Dastidar, Kaushik Roy
Affiliations: Artificial Intelligence Institute, University of South Carolina; BITS-Pilani Hyderabad
Link: https://arxiv.org/abs/2105.06511
Abstract: AI chatbots have made vast strides in technology improvement in recent years and are already operational in many industries. Advanced natural language processing techniques, based on deep networks, efficiently process user requests to carry out their functions. As chatbots gain traction, their applicability in healthcare is an attractive proposition due to the reduced economic and staffing costs of an overburdened system. However, healthcare bots require safe and medically accurate information capture, which deep networks are not yet capable of due to variations in user text and speech. Knowledge in symbolic structures is better suited for precise reasoning but cannot handle natural language processing directly. Thus, in this paper, we study the effects of combining knowledge and neural representations on chatbot safety, accuracy, and understanding.