
自然語言處理學術速遞 https://www.linglab.cn/news/2746 2021年06月03日

cs.CL 方向,今日共計22篇

 

Transformer(1篇)

【1】 Classifying Long Clinical Documents with Pre-trained Transformers
標題:使用預訓練Transformer對長篇臨床文檔進行分類
 

作者:Xin Su,Timothy Miller,Xiyu Ding,Majid Afshar,Dmitriy Dligach
機構:University of Arizona, Boston Children’s Hospital and Harvard Medical School, University of Wisconsin–Madison, Loyola University Chicago
鏈接:https://arxiv.org/abs/2105.06752
 

摘要:大規(guī)模數據集的提出促進了新聞摘要深層神經模型的研究。深度學習還可能對口語對話摘要有用,這有助于一系列實際場景,包括客戶服務管理和藥物跟蹤。為此,我們提出了DialSumm,一個大規(guī)模的有標簽的對話摘要數據集。我們使用最先進的神經摘要器對DialSumm進行了實證分析。實驗結果表明,對話摘要在口語術語、特殊的語篇結構、共指和省略、語用學和社會常識等方面面臨著獨特的挑戰(zhàn),這些都需要特定的表征學習技術來更好地應對。
摘要:Automatic phenotyping is a task of identifying cohorts of patients that match a predefined set of criteria. Phenotyping typically involves classifying long clinical documents that contain thousands of tokens. At the same time, recent state-of-art transformer-based pre-trained language models limit the input to a few hundred tokens (e.g. 512 tokens for BERT). We evaluate several strategies for incorporating pre-trained sentence encoders into document-level representations of clinical text, and find that hierarchical transformers without pre-training are competitive with task pre-trained models.
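這篇論文的出發點是:預訓練模型的輸入上限(如BERT的512個token)遠小于臨床文檔的長度,因此需要先切塊、再做文檔級聚合。下面用一段純Python做最簡示意(非論文官方實現,window、stride的取值與取平均的聚合方式均為假設):

```python
# 示意:把長文檔切成不超過 512 個 token 的重疊窗口,再聚合成文檔級表示。

def chunk_tokens(tokens, window=512, stride=384):
    """把 token 序列切成長度不超過 window、步長為 stride 的重疊片段。"""
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return chunks

def mean_pool(vectors):
    """對各片段向量逐維取平均,作為最簡單的層次化聚合。"""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

tokens = [f"tok{i}" for i in range(1300)]
chunks = chunk_tokens(tokens)
print(len(chunks))   # 4(共 4 個重疊窗口)
```

實際系統中,每個片段會先經過句子/片段編碼器得到向量,再由層次化Transformer聚合;這里用取平均代替后一步僅作演示。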

 

BERT(2篇)

【1】 BERT Busters: Outlier LayerNorm Dimensions that Disrupt BERT
標題:BERT Busters:破壞BERT的離群LayerNorm維度
 

作者:Olga Kovaleva,Saurabh Kulshreshtha,Anna Rogers,Anna Rumshisky
機構:Department of Computer Science, University of Massachusetts Lowell, Center for Social Data Science, University of Copenhagen
備注:Accepted as long paper at Findings of ACL 2021
鏈接:https://arxiv.org/abs/2105.06990
 

摘要:多項研究表明BERT對剪枝具有出色的魯棒性,而且幾乎沒有哪個組件能在各類下游任務中始終保持高重要性。與這一普遍認知相反,我們證明預訓練的Transformer編碼器對移除極少量輸出層歸一化(LayerNorm)中的縮放因子和偏置(不足模型權重的0.0001%)出奇地脆弱。這些是幅值很高的歸一化參數,在預訓練早期出現,并在整個模型中穩定地出現在相同的維度位置上。它們存在于我們考察的全部六個BERT家族模型中,移除它們會顯著惡化MLM困惑度和下游任務性能。我們的結果表明,層歸一化的作用比通常認為的重要得多。
摘要:Multiple studies have shown that BERT is remarkably robust to pruning, yet few if any of its components retain high importance across downstream tasks. Contrary to this received wisdom, we demonstrate that pre-trained Transformer encoders are surprisingly fragile to the removal of a very small number of scaling factors and biases in the output layer normalization (<0.0001% of model weights). These are high-magnitude normalization parameters that emerge early in pre-training and show up consistently in the same dimensional position throughout the model. They are present in all six models of BERT family that we examined and removing them significantly degrades both the MLM perplexity and the downstream task performance. Our results suggest that layer normalization plays a much more important role than usually assumed.
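論文中所謂移除縮放因子和偏置的操作,相當于把LayerNorm某一維的gamma和beta置零。下面的純Python示意(非論文官方實現,數值為玩具設定)展示這一操作的直接后果:該維輸出恒為0,信息被整體抹去:

```python
# 示意:純 Python 實現 LayerNorm,并演示置零某一維的 gamma/beta。
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

dim = 4
gamma = [1.0] * dim
beta = [0.1] * dim
x = [0.5, -1.2, 3.3, 0.0]

out = layer_norm(x, gamma, beta)

# “移除”第 2 維(索引 2)的縮放因子與偏置
gamma[2] = 0.0
beta[2] = 0.0
pruned = layer_norm(x, gamma, beta)
print(pruned[2])   # 0.0
```

論文的觀察是:僅置零極少數這樣的離群維度,就足以顯著破壞整個模型的表現。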

 

【2】 Distilling BERT for low complexity network training
標題:為低復雜度網絡訓練蒸餾BERT
 

作者:Bansidhar Mangalwedhekar
鏈接:https://arxiv.org/abs/2105.06514
 

摘要:本文以SST-2數據集上的情感分析為例,研究了將BERT所學知識遷移到BiLSTM、帶注意力的BiLSTM和淺層CNN等低復雜度模型的效率。本文還比較了BERT模型與這些低復雜度模型的推理復雜度,并強調了這些技術對于在手機、平板電腦以及Raspberry Pi等MCU開發板之類的邊緣設備上實現高性能NLP模型、進而支持令人興奮的新應用的重要性。
摘要:This paper studies the efficiency of transferring BERT learnings to low complexity models like BiLSTM, BiLSTM with attention and shallow CNNs using sentiment analysis on SST-2 dataset. It also compares the complexity of inference of the BERT model with these lower complexity models and underlines the importance of these techniques in enabling high performance NLP models on edge devices like mobiles, tablets and MCU development boards like Raspberry Pi etc. and enabling exciting new applications.
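論文研究的是把BERT的知識蒸餾到小模型。摘要未給出具體損失形式,下面按知識蒸餾的常見形式(硬標簽交叉熵 + 帶溫度的教師軟標簽KL散度)給出一個純Python示意;alpha與溫度T均為假設的超參數:

```python
# 示意:知識蒸餾的常見損失 = alpha * CE(學生, 硬標簽)
#       + (1 - alpha) * T^2 * KL(教師軟分布 || 學生軟分布)
import math

def softmax(logits, T=1.0):
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, hard_label, alpha=0.5, T=2.0):
    p_s = softmax(student_logits)            # 學生的普通分布(硬標簽項)
    ce = -math.log(p_s[hard_label])
    q_t = softmax(teacher_logits, T)         # 教師的軟化分布
    q_s = softmax(student_logits, T)
    kl = sum(t * math.log(t / s) for t, s in zip(q_t, q_s))
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

loss = distill_loss([2.0, 0.5], [1.8, 0.6], hard_label=0)
print(loss > 0)   # True
```

教師與學生輸出完全一致時KL項為0,損失退化為純硬標簽交叉熵,可作為實現的自檢。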

 

QA|VQA|問答|對話(1篇)

【1】 QAConv: Question Answering on Informative Conversations
標題:QAConv:信息性對話的問答
 

作者:Chien-Sheng Wu,Andrea Madotto,Wenhao Liu,Pascale Fung,Caiming Xiong
機構:Salesforce AI Research, The Hong Kong University of Science and Technology
備注:Data and code are available at this https URL
鏈接:https://arxiv.org/abs/2105.06912
 

摘要:本文介紹了QAConv,一個以對話作為知識來源的新問答數據集。我們專注于信息性對話,包括商務郵件、小組討論和工作頻道。與開放域對話和任務型對話不同,這些對話通常較長、復雜、異步,并涉及很強的領域知識。我們總共從10259個精選對話中收集了34204個問答對,包括基于跨度的、自由形式的和不可回答的問題,問題既有人工撰寫的也有機器生成的。我們將長對話切分為片段,并使用問題生成器和對話摘要器作為輔助工具來收集多跳問題。數據集有兩種測試場景:片段(chunk)模式和完整(full)模式,取決于所依據的片段是直接給定還是需從大型對話池中檢索。實驗結果表明,在現有QA數據集上訓練的最先進QA系統零樣本能力有限,并傾向于把我們的問題預測為不可回答。在我們的語料上微調這類系統,可在片段模式和完整模式下分別取得最高23.6%和13.6%的顯著提升。
摘要:This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source. We focus on informative conversations including business emails, panel discussions, and work channels. Unlike open-domain and task-oriented dialogues, these conversations are usually long, complex, asynchronous, and involve strong domain knowledge. In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions, from 10,259 selected conversations with both human-written and machine-generated questions. We segment long conversations into chunks, and use a question generator and dialogue summarizer as auxiliary tools to collect multi-hop questions. The dataset has two testing scenarios, chunk mode and full mode, depending on whether the grounded chunk is provided or retrieved from a large conversational pool. Experimental results show that state-of-the-art QA systems trained on existing QA datasets have limited zero-shot ability and tend to predict our questions as unanswerable. Fine-tuning such systems on our corpus can achieve significant improvement up to 23.6% and 13.6% in both chunk mode and full mode, respectively.
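QAConv的full模式要求先從大型對話池中檢索相關片段,再在片段上作答。下面用最簡單的詞重疊打分示意檢索片段這一步(非論文官方實現,真實系統會使用更強的檢索器):

```python
# 示意:用問題與片段的詞重疊數作為檢索打分,選出最相關的對話片段。

def score(question, chunk):
    q = set(question.lower().split())
    c = set(chunk.lower().split())
    return len(q & c)

def retrieve(question, chunks):
    return max(range(len(chunks)), key=lambda i: score(question, chunks[i]))

chunks = [
    "alice will send the budget report on friday",
    "the deployment is blocked by a failing test",
    "lunch options near the office",
]
best = retrieve("when will the budget report be sent", chunks)
print(best)   # 0
```

檢索到片段后,再交給抽取式問答模型定位答案跨度,即chunk模式所評測的那一步。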

 

機器翻譯(2篇)

【1】 Do Context-Aware Translation Models Pay the Right Attention?
標題:上下文感知翻譯模型是否關注了正確的內容?
 

作者:Kayo Yin,Patrick Fernandes,Danish Pruthi,Aditi Chaudhary,André F. T. Martins,Graham Neubig
機構:Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, Instituto de Telecomunicações, Lisbon, Portugal, Unbabel, Lisbon, Portugal
備注:Accepted to ACL2021
鏈接:https://arxiv.org/abs/2105.06977
 

摘要:上下文感知機器翻譯模型旨在利用上下文信息,但往往做不到這一點。結果,它們會錯誤地消歧那些需要依靠上下文才能確定含義的代詞和多義詞。在本文中,我們提出幾個問題:人類譯者使用什么樣的上下文來消解歧義詞?模型是否把大量注意力放在了同樣的上下文上?如果我們顯式地訓練模型這樣做會怎樣?為了回答這些問題,我們引入了SCAT(Supporting Context for Ambiguous Translations),一個新的英法數據集,包含14K條翻譯的支持上下文詞,這些詞是專業譯者認為對代詞消歧有用的?;赟CAT,我們對用于消歧的上下文進行了深入分析,考察了支持詞的位置和詞匯特征。此外,我們還度量了模型注意力分數與SCAT支持上下文之間的一致程度,并應用引導注意力策略來鼓勵二者保持一致。
摘要:Context-aware machine translation models are designed to leverage contextual information, but often fail to do so. As a result, they inaccurately disambiguate pronouns and polysemous words that require context for resolution. In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words? Are models paying large amounts of attention to the same context? What if we explicitly train them to do so? To answer these questions, we introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation. Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words. Furthermore, we measure the degree of alignment between the model's attention scores and the supporting context from SCAT, and apply a guided attention strategy to encourage agreement between the two.
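衡量模型注意力與SCAT支持上下文是否一致,一個直觀的做法是計算注意力質量落在人工標注支持詞上的比例。下面是該度量的純Python示意(為說明而簡化的假設性指標,論文中的具體度量以原文為準):

```python
# 示意:注意力質量中落在“支持詞”上的占比,作為注意力-標注一致性的粗指標。

def attention_mass_on_support(attn, support_mask):
    total = sum(attn)
    on_support = sum(a for a, m in zip(attn, support_mask) if m)
    return on_support / total

attn = [0.05, 0.60, 0.10, 0.25]   # 模型對 4 個上下文詞的注意力
support = [0, 1, 0, 1]            # 譯者標注的支持詞
print(attention_mass_on_support(attn, support))   # ≈ 0.85
```

引導注意力訓練的目標,可以理解為讓這一占比在消歧詞上盡量升高。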

 

【2】 Dynamic Multi-Branch Layers for On-Device Neural Machine Translation
標題:面向設備端神經機器翻譯的動態多分支層
 

作者:Zhixing Tan,Maosong Sun,Yang Liu
機構:Department of Computer Science and Technology, Tsinghua University, Institute for AI Industry Research, Tsinghua University, Institute for Artificial Intelligence, Tsinghua University, Beijing National Research Center for Information Science and Technology
鏈接:https://arxiv.org/abs/2105.06679
 

摘要:隨著人工智能(AI)的快速發展,神經機器翻譯(NMT)等AI應用正從云端遷移到智能手機等移動設備上。受限于有限的硬件資源和電池,設備端NMT系統的性能遠不能令人滿意。受條件計算的啟發,我們提出用動態多分支層來提升設備端NMT系統的性能。具體而言,我們設計了一種逐層的動態多分支網絡,訓練和推理時每層只激活一個分支。由于訓練期間并非所有分支都被激活,我們提出共享-私有重參數化,以保證每個分支得到充分訓練。在幾乎相同的計算開銷下,我們的方法相比Transformer模型在WMT14英德翻譯任務上最多提升1.7個BLEU點,在WMT20中英翻譯任務上最多提升1.8個BLEU點。與同樣使用多分支的強基線相比,在參數量相同的情況下,所提方法最快可達其1.6倍的速度。
摘要:With the rapid development of artificial intelligence (AI), there is a trend in moving AI applications such as neural machine translation (NMT) from cloud to mobile devices such as smartphones. Constrained by limited hardware resources and battery, the performance of on-device NMT systems is far from satisfactory. Inspired by conditional computation, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference. As not all branches are activated during training, we propose shared-private reparameterization to ensure sufficient training for each branch. At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English translation task over the Transformer model, respectively. Compared with a strong baseline that also uses multiple branches, the proposed method is up to 1.6 times faster with the same number of parameters.
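共享-私有重參數化的核心是:每個分支的有效權重等于共享權重加上該分支的私有權重,而每層在訓練和推理時只激活一個分支。下面用玩具規模的純Python示意這一結構(非論文官方實現,數值與分支選擇方式均為假設):

```python
# 示意:動態多分支層的共享-私有重參數化,簡化為一個線性層。
import random

def branch_weight(shared, private, k):
    """第 k 個分支的有效權重向量 = 共享權重 + 私有權重。"""
    return [s + p for s, p in zip(shared, private[k])]

def forward(x, shared, private, k):
    w = branch_weight(shared, private, k)
    return sum(xi * wi for xi, wi in zip(x, w))

shared = [0.5, -0.2]
private = [[0.1, 0.0], [-0.3, 0.4]]   # 兩個分支各自的私有權重
x = [1.0, 2.0]

k = random.randrange(len(private))    # 每次只激活一個分支
y = forward(x, shared, private, k)
```

由于共享部分在所有分支間復用,即使某個分支被激活的次數少,它也能通過共享權重得到訓練,這正是該重參數化要解決的問題。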

 

摘要|信息提取(2篇)

【1】 EASE: Extractive-Abstractive Summarization with Explanations
標題:EASE:帶解釋的抽取-生成式摘要
 

作者:Haoran Li,Arash Einolghozati,Srinivasan Iyer,Bhargavi Paranjape,Yashar Mehdad,Sonal Gupta,Marjan Ghazvininejad
機構:Facebook
鏈接:https://arxiv.org/abs/2105.06982
 

摘要:當前的生成式摘要系統在性能上優于抽取式摘要系統,但其固有的可解釋性不足限制了廣泛應用。為了兼得兩者之長,我們提出了EASE,一個用于基于證據的文本生成的抽取-生成式框架,并將其應用于文檔摘要。我們提出了一個基于信息瓶頸原理的可解釋摘要系統,以端到端的方式聯合訓練抽取和生成。受人類以兩階段流程總結長文檔這一已有研究(Jing和McKeown,2000)的啟發,我們的框架先抽取預定數量的證據片段作為解釋,再僅利用這些證據生成摘要。自動和人工評估表明,我們框架給出的解釋比簡單基線更相關,同時基本不犧牲生成摘要的質量。
摘要:Current abstractive summarization systems outperform their extractive counterparts, but their widespread adoption is inhibited by the inherent lack of interpretability. To achieve the best of both worlds, we propose EASE, an extractive-abstractive framework for evidence-based text generation and apply it to document summarization. We present an explainable summarization system based on the Information Bottleneck principle that is jointly trained for extraction and abstraction in an end-to-end fashion. Inspired by previous research that humans use a two-stage framework to summarize long documents (Jing and McKeown, 2000), our framework first extracts a pre-defined amount of evidence spans as explanations and then generates a summary using only the evidence. Using automatic and human evaluations, we show that explanations from our framework are more relevant than simple baselines, without substantially sacrificing the quality of the generated summary.

 

【2】 DialSumm: A Real-Life Scenario Dialogue Summarization Dataset
標題:DialSumm:一個真實場景對話摘要數據集
 

作者:Yulong Chen,Yang Liu,Liang Chen,Yue Zhang
機構:Zhejiang University, School of Engineering, Westlake University, Microsoft Cognitive Services Research, College of Software, Jilin University, Institute of Advanced Technology, Westlake Institute for Advanced Study
備注:ACL findings
鏈接:https://arxiv.org/abs/2105.06762
 

摘要:大規(guī)模數據集的提出促進了新聞摘要深層神經模型的研究。深度學習還可能對口語對話摘要有用,這有助于一系列實際場景,包括客戶服務管理和藥物跟蹤。為此,我們提出了DialSumm,一個大規(guī)模的有標簽的對話摘要數據集。我們使用最先進的神經摘要器對DialSumm進行了實證分析。實驗結果表明,對話摘要在口語術語、特殊的語篇結構、共指和省略、語用學和社會常識等方面面臨著獨特的挑戰(zhàn),這些都需要特定的表征學習技術來更好地應對。
摘要:Proposal of large-scale datasets has facilitated research on deep neural models for news summarization. Deep learning can also be potentially useful for spoken dialogue summarization, which can benefit a range of real-life scenarios including customer service management and medication tracking. To this end, we propose DialSumm, a large-scale labeled dialogue summarization dataset. We conduct empirical analysis on DialSumm using state-of-the-art neural summarizers. Experimental results show unique challenges in dialogue summarization, such as spoken terms, special discourse structures, coreferences and ellipsis, pragmatics and social commonsense, which require specific representation learning technologies to better deal with.

 

推理|分析|理解|解釋(2篇)

【1】 Towards Navigation by Reasoning over Spatial Configurations
標題:通過空間構型推理實現導航
 

作者:Yue Zhang,Quan Guo,Parisa Kordjamshidi
機構:Michigan State University
鏈接:https://arxiv.org/abs/2105.06839
 

摘要:我們研究這樣一個導航問題:智能體在觀察環境的同時遵循自然語言指令。以語言理解為重點,我們展示了空間語義對于把導航指令落實到視覺感知中的重要性。我們提出了一種利用空間配置要素的神經智能體,并研究這些要素對導航智能體推理能力的影響。此外,我們還對指令的順序執行次序建模,并將視覺對象與指令中的空間配置對齊。我們的神經智能體在已見環境中超過了強基線,并在未見環境中表現出有競爭力的性能。此外,實驗結果表明,對指令中的空間語義要素進行顯式建模,可以提升模型的指令落地與空間推理能力。
摘要:We deal with the navigation problem where the agent follows natural language instructions while observing the environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions into visual perceptions. We propose a neural agent that uses the elements of spatial configurations and investigate their influence on the navigation agent's reasoning ability. Moreover, we model the sequential execution order and align visual objects with spatial configurations in the instruction. Our neural agent improves strong baselines on the seen environments and shows competitive performance on the unseen environments. Additionally, the experimental results demonstrate that explicit modeling of spatial semantic elements in the instructions can improve the grounding and spatial reasoning of the model.

 

【2】 A cost-benefit analysis of cross-lingual transfer methods
標題:跨語言遷移方法的成本效益分析
 

作者:Guilherme Moraes Rosa,Luiz Henrique Bonifacio,Leandro Rodrigues de Souza,Roberto Lotufo,Rodrigo Nogueira
機構:University of Campinas (UNICAMP),  NeuralMind Inteligência Artificial,  David R. Cheriton School of Computer Science, University of Waterloo
鏈接:https://arxiv.org/abs/2105.06813
 

摘要:一種有效的跨語言遷移方法是:在一種語言的有監督數據集上微調雙語或多語模型,然后以零樣本方式在另一種語言上評估。在訓練時或推理時翻譯樣例也是可行的替代方案。然而,這些方法的相關成本在文獻中很少被討論。在這項工作中,我們從有效性(如準確率)、開發與部署成本以及推理時延三個角度分析了各種跨語言方法。我們在三個任務上的實驗表明,最優的跨語言方法高度依賴于具體任務。最后,通過結合零樣本方法和翻譯方法,我們在本工作使用的三個數據集中的兩個上取得了最先進結果。基于這些結果,我們質疑在目標語言中人工標注訓練數據的必要性。代碼、模型和翻譯后的數據集可在 https://github.com/unicamp-dl/cross-lingual-analysis 獲取。
摘要:An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluating it on another language in a zero-shot manner. Translating examples at training time or inference time are also viable alternatives. However, there are costs associated with these methods that are rarely addressed in the literature. In this work, we analyze cross-lingual methods in terms of their effectiveness (e.g., accuracy), development and deployment costs, as well as their latencies at inference time. Our experiments on three tasks indicate that the best cross-lingual method is highly task-dependent. Finally, by combining zero-shot and translation methods, we achieve the state-of-the-art in two of the three datasets used in this work. Based on these results, we question the need for manually labeled training data in a target language. Code, models and translated datasets are available at https://github.com/unicamp-dl/cross-lingual-analysis

 

GAN|對抗|攻擊|生成相關(3篇)

【1】 Generating Empathetic Responses with a Large Scale Dialog Dataset
標題:使用大規(guī)模對話數據集生成感同身受的響應
 

作者:Yubo Xie,Pearl Pu
機構:School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
鏈接:https://arxiv.org/abs/2105.06829
 

摘要:共情回復生成任務旨在在已有對話輪次之后,生成語法正確、更重要的是情感上恰當的回復。現有模型要么直接引入預定義的情感信息來指導回復生成,要么使用確定性規則來決定回復的情感,忽略了人類對話中微妙的情感交互。隨著先進語言模型的出現,學習自然語言對話中細膩的情感交流成為可能。為了充分探索情感與對話意圖的范圍,重要的是構建一個足夠大的數據集,以揭示對話中人類情感互動的一般規律。在本文中,我們詳細描述了一個大規模對話數據集的構建過程,其中每個話語都被標注為32種情感之一和9種意圖類別之一。隨后,我們展示了如何構建一個多輪共情對話模型,在6000多個人工評估實例上,該模型相比基線表現良好。
摘要:The task of empathetic response generation aims at generating syntactically correct and, more importantly, emotionally appropriate responses following previous dialog turns. Existing models either directly incorporate pre-defined emotion information to guide the response generation, or use deterministic rules to decide the response emotion, ignoring the subtle emotion interactions captured in human conversations. With the advent of advanced language models, it is possible to learn the nuanced emotional exchanges captured in natural language dialogs. To fully explore the range of emotions and dialog intents, it is important to curate a dataset large enough to shed light on the general understanding of human emotional interactions in our conversations. In this paper, we describe in detail the curation process of a large-scale dialog dataset where each utterance is labeled with one of 32 emotions and 9 intent categories. We then show how to build a multi-turn empathetic dialog model that performs well compared to its baselines over 6,000 human evaluated instances.

 

【2】 Adversarial Learning for Zero-Shot Stance Detection on Social Media
標題:面向社交媒體零樣本立場檢測的對抗學習
 

作者:Emily Allaway,Malavika Srikanth,Kathleen McKeown
機構:Department of Computer Science, Columbia University, New York, NY
備注:To appear in NAACL 2021
鏈接:https://arxiv.org/abs/2105.06603
 

摘要:社交媒體上的立場檢測有助于識別和理解日常生活中帶有傾向性的新聞或評論。在這項工作中,我們提出了一個新的Twitter零樣本立場檢測模型,它利用對抗學習實現跨主題泛化。我們的模型以極小的計算開銷,在多個未見測試主題上取得了最先進的性能。此外,我們還將零樣本立場檢測擴展到新主題,為零樣本遷移指明了未來方向。
摘要:Stance detection on social media can help to identify and understand slanted news or commentary in everyday life. In this work, we propose a new model for zero-shot stance detection on Twitter that uses adversarial learning to generalize across topics. Our model achieves state-of-the-art performance on a number of unseen test topics with minimal computational costs. In addition, we extend zero-shot stance detection to new topics, highlighting future directions for zero-shot transfer.

 

【3】 Joint Retrieval and Generation Training for Grounded Text Generation
標題:面向基于外部文檔的文本生成的檢索與生成聯合訓練
 

作者:Yizhe Zhang,Siqi Sun,Xiang Gao,Yuwei Fang,Chris Brockett,Michel Galley,Jianfeng Gao,Bill Dolan
機構:Microsoft Corporation, Redmond, WA, USA
鏈接:https://arxiv.org/abs/2105.06597
 

摘要:近年來,GPT-3等大規模預訓練技術的發展使得從給定提示生成看似高質量的文本成為可能。然而,這類生成系統經常出現幻覺事實的問題,而且在設計上并未納入有用的外部信息?;谝罁纳赡P停╣rounded generation models)似乎提供了補救辦法,但其訓練通常依賴很少能獲得的平行數據,即為上下文提供了對應文檔的數據。我們提出了一個框架,通過在語言模型信號上聯合訓練基于依據的生成器和文檔檢索器來緩解這一數據約束。該模型學習檢索對生成效用最高的文檔,并通過注意力機制將它們組合進輸出。我們證明,借助外部參照,我們的方法在散文和對話生成中都能產生信息更豐富、更有趣的文本。
摘要:Recent advances in large-scale pre-training such as GPT-3 allow seemingly high quality text to be generated from a given prompt. However, such generation systems often suffer from problems of hallucinated facts, and are not inherently designed to incorporate useful external information. Grounded generation models appear to offer remedies, but their training typically relies on rarely-available parallel data where corresponding documents are provided for context. We propose a framework that alleviates this data constraint by jointly training a grounded generator and document retriever on the language model signal. The model learns to retrieve the documents with the highest utility in generation and attentively combines them in the output. We demonstrate that by taking advantage of external references our approach can produce more informative and interesting text in both prose and dialogue generation.

 

半/弱/無監督|不確定性(1篇)

【1】 Shades of confusion: Lexical uncertainty modulates ad hoc coordination in an interactive communication task
標題:困惑的陰影:詞匯不確定性調節互動交流任務中的即席協調
 

作者:Sonia K. Murthy,Robert D. Hawkins,Thomas L. Griffiths
機構:Department of Psychology, Princeton University, Princeton, NJ, Allen Institute for Artificial Intelligence, Seattle, WA, Department of Computer Science, Princeton University, Princeton, NJ
備注:under review
鏈接:https://arxiv.org/abs/2105.06546
 

摘要:溝通伙伴帶入互動的期望存在很大差異,從而產生誤解的可能。為了直接探查這些差距以及我們克服它們的能力,我們提出了一個基于顏色-概念關聯的交流任務。在實驗1中,我們基于最新的概率理論,確立了這些期望(即詞匯先驗)的心理表征的幾個關鍵屬性:抽象概念的關聯更多變;這種多變性表現為每個個體內部的不確定性;而這種不確定性能夠準確預測他人是否可能共享相同的關聯。在實驗2中,我們考察了這些表征對交流的下游影響:在交流關聯更多變的概念時,準確率起初較低,但隨著參與者形成即席約定而迅速提高??傊?,我們的研究結果表明,人們通過對交流對象保持校準良好的不確定性、并保持自身表征的適當可調整性,來應對這種差異。
摘要:There is substantial variability in the expectations that communication partners bring into interactions, creating the potential for misunderstandings. To directly probe these gaps and our ability to overcome them, we propose a communication task based on color-concept associations. In Experiment 1, we establish several key properties of the mental representations of these expectations, or \emph{lexical priors}, based on recent probabilistic theories. Associations are more variable for abstract concepts, variability is represented as uncertainty within each individual, and uncertainty enables accurate predictions about whether others are likely to share the same association. In Experiment 2, we then examine the downstream consequences of these representations for communication. Accuracy is initially low when communicating about concepts with more variable associations, but rapidly increases as participants form ad hoc conventions. Together, our findings suggest that people cope with variability by maintaining well-calibrated uncertainty about their partner and appropriately adaptable representations of their own.

 

識別/分類(2篇)

【1】 Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
標題:定位和標注:嵌套命名實體識別的兩階段標識符
 

作者:Yongliang Shen,Xinyin Ma,Zeqi Tan,Shuai Zhang,Wen Wang,Weiming Lu
機構:College of Computer Science and Technology, Zhejiang University, University of Science and Technology of China
備注:Accepted to ACL 2021, submission version
鏈接:https://arxiv.org/abs/2105.06804
 

摘要:命名實體識別(Named Entity Recognition,NER)是自然語言處理中一項研究充分的任務。傳統的NER研究只處理扁平實體,忽略了嵌套實體。基于跨度(span)的方法將實體識別視為跨度分類任務。這類方法雖然天然具備處理嵌套NER的能力,但存在計算量大、忽略邊界信息、對與實體部分匹配的跨度利用不足以及長實體識別困難等問題。為了解決這些問題,我們提出了一種兩階段實體標識符:先通過對種子跨度進行過濾和邊界回歸來生成跨度建議以定位實體,再用相應的類別標注邊界調整后的跨度建議。該方法在訓練過程中有效利用了實體和部分匹配跨度的邊界信息。通過邊界回歸,理論上可以覆蓋任意長度的實體,提高了對長實體的識別能力。此外,第一階段過濾掉了大量低質量的種子跨度,降低了推理的時間復雜度。在嵌套NER數據集上的實驗表明,所提方法優于此前最先進的模型。
摘要:Named entity recognition (NER) is a well-studied task in natural language processing. Traditional NER research only deals with flat entities and ignores nested entities. The span-based methods treat entity recognition as a span classification task. Although these methods have the innate ability to handle nested NER, they suffer from high computational cost, ignorance of boundary information, under-utilization of the spans that partially match with entities, and difficulties in long entity recognition. To tackle these issues, we propose a two-stage entity identifier. First we generate span proposals by filtering and boundary regression on the seed spans to locate the entities, and then label the boundary-adjusted span proposals with the corresponding categories. Our method effectively utilizes the boundary information of entities and partially matched spans during training. Through boundary regression, entities of any length can be covered theoretically, which improves the ability to recognize long entities. In addition, many low-quality seed spans are filtered out in the first stage, which reduces the time complexity of inference. Experiments on nested NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.
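兩階段流程可以概括為:枚舉種子跨度 → 打分過濾并做邊界回歸 → 對調整后的跨度分類。下面的純Python骨架示意前兩步(非論文官方實現;其中的打分函數與回歸偏移用假設的固定函數代替真實模型):

```python
# 示意:兩階段嵌套 NER 的第一階段——過濾種子跨度并調整其邊界。

def seed_spans(n, max_len=3):
    """枚舉長度不超過 max_len 的候選跨度 (start, end),端點均含。"""
    return [(i, j) for i in range(n) for j in range(i, min(i + max_len, n))]

def first_stage(spans, score_fn, offset_fn, threshold=0.5):
    """過濾低分種子跨度,并用回歸偏移調整保留跨度的邊界。"""
    kept = []
    for s, e in spans:
        if score_fn(s, e) >= threshold:
            ds, de = offset_fn(s, e)
            kept.append((s + ds, e + de))
    return kept

tokens = ["the", "new", "york", "times", "reporter"]
spans = seed_spans(len(tokens))
# 假設的打分/回歸:只有 (1,2) 得到高分,且右邊界外擴一格以覆蓋 "times"
proposals = first_stage(
    spans,
    score_fn=lambda s, e: 0.9 if (s, e) == (1, 2) else 0.1,
    offset_fn=lambda s, e: (0, 1),
)
print(proposals)   # [(1, 3)],即 "new york times"
```

真實系統中打分與偏移都由神經網絡預測,第二階段再對 (1, 3) 這樣的建議做實體類別分類。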

 

【2】 Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification
標題:用于文本分類的上下文嵌入空間流形外正則化
 

作者:Seonghyeon Lee,Dongha Lee,Hwanjo Yu
機構:Dept. of Computer Science and Engineering, POSTECH, Republic of Korea, Institute of Artificial Intelligence, POSTECH, Republic of Korea
備注:ACL2021 main conference
鏈接:https://arxiv.org/abs/2105.06750
 

摘要:最近關于帶預訓練權重的神經網絡(如BERT)的研究主要關注一個低維子空間,即由輸入單詞(或其上下文)計算出的嵌入向量所在的子空間。在這項工作中,我們提出了一種新方法來尋找并正則化該空間的其余部分,稱為流形外(out-of-manifold)空間,它無法通過單詞訪問。具體來說,我們基于從實際觀測單詞得到的兩個嵌入合成流形外嵌入,并用它們來微調網絡。我們訓練一個判別器來檢測輸入嵌入是否位于流形內部;同時優化一個生成器,使其產生能被判別器輕易識別為流形外的新嵌入。這兩個模塊以統一的端到端方式協作,成功實現了對流形外空間的正則化。在多個文本分類基準上的廣泛評估表明了我們方法的有效性,以及它與旨在增強流形的現有數據增強技術的良好兼容性。
摘要:Recent studies on neural networks with pre-trained weights (i.e., BERT) have mainly focused on a low-dimensional subspace, where the embedding vectors computed from input words (or their contexts) are located. In this work, we propose a new approach to finding and regularizing the remainder of the space, referred to as out-of-manifold, which cannot be accessed through the words. Specifically, we synthesize the out-of-manifold embeddings based on two embeddings obtained from actually-observed words, to utilize them for fine-tuning the network. A discriminator is trained to detect whether an input embedding is located inside the manifold or not, and simultaneously, a generator is optimized to produce new embeddings that can be easily identified as out-of-manifold by the discriminator. These two modules successfully collaborate in a unified and end-to-end manner for regularizing the out-of-manifold. Our extensive evaluation on various text classification benchmarks demonstrates the effectiveness of our approach, as well as its good compatibility with existing data augmentation techniques which aim to enhance the manifold.
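由兩個真實詞嵌入合成新嵌入,最簡單的形式是線性插值/外推(類似mixup)。論文中該映射由生成器學習得到,這里僅用純Python示意由兩個已觀測嵌入得到一個新嵌入的形式(數值為玩具設定):

```python
# 示意:由兩個已觀測詞向量線性組合出一個新嵌入。

def synthesize(e1, e2, lam=0.5):
    """lam 在 [0,1] 內為插值;超出該范圍即外推(更可能落在流形外)。"""
    return [lam * a + (1 - lam) * b for a, b in zip(e1, e2)]

e_cat = [0.2, 0.8, -0.1]
e_dog = [0.4, 0.6, 0.3]
mixed = synthesize(e_cat, e_dog, lam=0.5)
print(mixed)   # ≈ [0.3, 0.7, 0.1]
```

論文的要點在于:生成器學到的組合并不停留在這種簡單插值上,而是被判別器推向單詞無法到達的流形外區域,從而起到正則化作用。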

 

表征(1篇)

【1】 Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
標題:反事實干預揭示關系從句表征對一致性預測的因果效應
 

作者:Shauli Ravfogel,Grusha Prasad,Tal Linzen,Yoav Goldberg
機構:Computer Science Department, Bar Ilan University, Allen Institute for Artificial Intelligence, Cognitive Science Department, Johns Hopkins University, Department of Linguistics and Center for Data Science, New York University
備注:Equal contribution by SR and GP
鏈接:https://arxiv.org/abs/2105.06965
 

摘要:當語言模型處理句法復雜的句子時,它們是以符合英語語法的方式使用句子中的抽象句法信息,還是僅僅依賴一組啟發式方法?我們提出了一種解決這個問題的方法:AlterRep。對于句子中的任何語言特征,AlterRep都允許我們通過改變該特征的編碼方式來生成反事實表征,同時保持原始表征的其他方面不變。然后,通過測量這些反事實表征在不同句子中對模型單詞預測的影響,我們可以就模型在哪些語境中使用該語言特征(如果有的話)得出因果結論。將該方法用于研究BERT如何使用關系從句(RC)跨度信息,我們發現BERT在一致性預測中按照符合語言學的正確策略使用了RC跨度信息。我們還發現,為特定RC子類型生成的反事實表征會影響含有其他RC子類型的句子中的單復數預測,這表明關于RC邊界的信息在BERT的表征中是以抽象方式編碼的。
摘要:When language models process syntactically complex sentences, do they use abstract syntactic information present in these sentences in a manner that is consistent with the grammar of English, or do they rely solely on a set of heuristics? We propose a method to tackle this question, AlterRep. For any linguistic feature in the sentence, AlterRep allows us to generate counterfactual representations by altering how this feature is encoded, while leaving all other aspects of the original representation intact. Then, by measuring the change in a model's word prediction with these counterfactual representations in different sentences, we can draw causal conclusions about the contexts in which the model uses the linguistic feature (if any). Applying this method to study how BERT uses relative clause (RC) span information, we found that BERT uses information about RC spans during agreement prediction using the linguistically correct strategy. We also found that counterfactual representations generated for a specific RC subtype influenced the number prediction in sentences with other RC subtypes, suggesting that information about RC boundaries was encoded abstractly in BERT's representation.

 

其他神經網絡|深度學習|模型|建模(1篇)

【1】 Thank you BART! Rewarding Pre-Trained Models Improves Formality Style Transfer
標題:謝謝你,BART!獎勵預訓練模型可改善正式度風格遷移
 

作者:Huiyuan Lai,Antonio Toral,Malvina Nissim
機構:CLCG, University of Groningen, The Netherlands
鏈接:https://arxiv.org/abs/2105.06947
 

摘要:平行數據的稀缺導致正式度風格遷移模型難以很好地保留內容。我們表明,微調預訓練語言模型(GPT-2)和序列到序列模型(BART)可以提升內容保留能力,而且即使平行數據量有限也能做到。再以針對風格和內容(該任務的兩個核心方面)的獎勵來增強這些模型,我們取得了新的最先進結果。
摘要:Scarcity of parallel data causes formality style transfer models to have scarce success in preserving content. We show that fine-tuning pre-trained language (GPT-2) and sequence-to-sequence (BART) models boosts content preservation, and that this is possible even with limited amounts of parallel data. Augmenting these models with rewards that target style and content --the two core aspects of the task-- we achieve a new state-of-the-art.
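把風格與內容兩個目標合成一個標量獎勵,常見做法之一是取兩個分數的調和平均(任一分數過低都會拉低整體獎勵)。下面是一個純Python示意;采用調和平均只是本文的假設,論文的具體獎勵設計以原文為準:

```python
# 示意:把風格分與內容保留分合成一個標量獎勵,供強化學習微調使用。

def harmonic(a, b, eps=1e-8):
    return 2 * a * b / (a + b + eps)

def total_reward(style_score, content_score):
    """style_score:判別器給出的正式度;content_score:與源句的相似度。"""
    return harmonic(style_score, content_score)

print(round(total_reward(0.9, 0.6), 3))   # 0.72
```

調和平均的特點是懲罰偏科:風格分0.9而內容分0.6時,整體獎勵被壓到0.72,促使模型同時兼顧兩個目標。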

 

其他(4篇)

【1】 Plot and Rework: Modeling Storylines for Visual Storytelling
標題:情節構思與反復修改:為視覺敘事建模故事線
 

作者:Chi-Yang Hsu,Yun-Wei Chu,Ting-Hao Huang,Lun-Wei Ku
機構:Pennsylvania State University, Purdue University, Institute of Information Science, Academia Sinica
備注:Accepted by ACL'21 Findings; this is not the camera-ready version
鏈接:https://arxiv.org/abs/2105.06950
 

摘要:寫一個連貫而引人入勝的故事并不容易。富有創造力的作家利用他們的知識和世界觀,把彼此不相連的元素組合成連貫的故事線,并反復打磨修改,力求完美。然而,自動視覺敘事(VIST)模型在創作故事時,很少利用外部知識和迭代生成。本文介紹了PR-VIST,該框架將輸入圖像序列表示為一個故事圖,并在圖中尋找構成故事線的最佳路徑。隨后,PR-VIST沿這條路徑,通過迭代訓練過程學習生成最終故事。自動評估和人工評估都表明,該框架生成的故事在多樣性、連貫性和擬人性方面更勝一籌。消融實驗表明,情節構思和反復修改都對模型的優勢有所貢獻。
摘要:Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.

 

【2】 Neural-Symbolic Commonsense Reasoner with Relation Predictors
標題:帶關系預測的神經-符號常識推理機
 

作者:Farhad Moghimifar,Lizhen Qu,Yue Zhuo,Gholamreza Haffari,Mahsa Baktashmotlagh
機構:The School of ITEE, The University of Queensland, Australia, Monash University, Australia, School of CSE, The University of New South Wales, Australia
備注:ACL2021
鏈接:https://arxiv.org/abs/2105.06717
 

摘要:常識推理旨在結合從常識知識圖譜(CKG)中檢索到的常識事實集合,對日常情境得出結論。常識知識的動態特性要求模型能夠對新情境進行多跳推理。這一特性也導致知識圖譜大規模且稀疏,需要上述推理過程來預測新事件之間的關系。然而,該領域的現有方法把CKG視為一組有限的事實,因而不適合對新的未見情境和事件進行推理。在本文中,我們提出了一種能夠在大規模動態CKG上進行推理的神經-符號推理機,其用于CKG推理的邏輯規則在訓練過程中學得。除了提供可解釋的說明之外,學到的邏輯規則還有助于把預測推廣到新引入的事件上。在CKG鏈接預測任務上的實驗結果表明,該模型優于最先進的模型,證明了其有效性。
摘要:Commonsense reasoning aims to incorporate sets of commonsense facts, retrieved from Commonsense Knowledge Graphs (CKG), to draw conclusion about ordinary situations. The dynamic nature of commonsense knowledge postulates models capable of performing multi-hop reasoning over new situations. This feature also results in having large-scale sparse Knowledge Graphs, where such reasoning process is needed to predict relations between new events. However, existing approaches in this area are limited by considering CKGs as a limited set of facts, thus rendering them unfit for reasoning over new unseen situations and events. In this paper, we present a neural-symbolic reasoner, which is capable of reasoning over large-scale dynamic CKGs. The logic rules for reasoning over CKGs are learned during training by our model. In addition to providing interpretable explanation, the learned logic rules help to generalise prediction to newly introduced events. Experimental results on the task of link prediction on CKGs prove the effectiveness of our model by outperforming the state-of-the-art models.

 

【3】 DaLAJ - a dataset for linguistic acceptability judgments for Swedish:  Format, baseline, sharing
標題:DALAJ-瑞典語語言可接受性判斷的數據集:格式、基線、共享
 

作者:Elena Volodina,Yousuf Ali Mohammed,Julia Klezl
機構:University of Gothenburg, Sweden
備注:This is an extended version of an article accepted to the 10th NLP4CALL workshop (2021), Linköping Electronic Conference Proceedings 177, ISSN: 1650-3740 (online). In the extended version (available at arXiv) we have added a description of an experiment and baseline results to the dataset description accepted for NLP4CALL publication
鏈接:https://arxiv.org/abs/2105.06681
 

摘要:我們介紹DaLAJ 1.0,一個用于瑞典語語言可接受性判斷的數據集,第一版包含9596個句子;并用它進行了二分類任務的初步實驗。DaLAJ基于SweLL第二語言學習者數據,由不同熟練程度的作文組成。為了在GDPR規定下仍能自由發布數據集,我們對學習者作文做了句子級打亂,并刪除了部分學習者元數據,每個句子只保留母語信息和作文所在課程級別。我們以學習者語言的規范化版本作為DaLAJ句子的基礎,每個句子只保留一個錯誤;句子中每個單獨的更正標簽都會對應重復生成一個句子。DaLAJ 1.0使用了四種錯誤類別(SweLL中共有35種),均與詞匯或構詞選擇有關。二分類的基線結果顯示,使用BERT嵌入在DaLAJ 1.0上的準確率為58%。該數據集已被納入SwedishGlue(瑞典語SuperLim)基準。下文我們將介紹數據集的格式、首批實驗、我們的見解以及所選數據共享方式的動機。
摘要:We present DaLAJ 1.0, a Dataset for Linguistic Acceptability Judgments for Swedish, comprising 9 596 sentences in its first version; and the initial experiment using it for the binary classification task. DaLAJ is based on the SweLL second language learner data, consisting of essays at different levels of proficiency. To make sure the dataset can be freely available despite the GDPR regulations, we have sentence-scrambled learner essays and removed part of the metadata about learners, keeping for each sentence only information about the mother tongue and the level of the course where the essay has been written. We use the normalized version of learner language as the basis for the DaLAJ sentences, and keep only one error per sentence. We repeat the same sentence for each individual correction tag used in the sentence. For DaLAJ 1.0 we have used four error categories (out of 35 available in SweLL), all connected to lexical or word-building choices. Our baseline results for the binary classification show an accuracy of 58% for DaLAJ 1.0 using BERT embeddings. The dataset is included in the SwedishGlue (Swe. SuperLim) benchmark. Below, we describe the format of the dataset, first experiments, our insights and the motivation for the chosen approach to data sharing.
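按摘要的描述,DaLAJ把每個句子限定為只含一個錯誤,原句與更正句可分別作為不可接受/可接受樣本用于二分類。下面用純Python示意這種樣本組織方式(字段名與示例句均為假設,并非官方格式):

```python
# 示意:由“原句 + 更正句”構造可接受性二分類樣本對。

def make_examples(original, corrected, l1, level):
    return [
        {"sentence": original,  "label": 0,   # 0 = 不可接受
         "mother_tongue": l1, "course_level": level},
        {"sentence": corrected, "label": 1,   # 1 = 可接受
         "mother_tongue": l1, "course_level": level},
    ]

pair = make_examples("Han har två hundar och en katter.",   # 假設的學習者錯誤句
                     "Han har två hundar och en katt.",     # 對應的更正句
                     l1="English", level="B1")
print(len(pair))   # 2
```

這也對應摘要中每個句子只保留母語與課程級別兩項元數據的做法。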

 

【4】 NLP is Not enough -- Contextualization of User Input in Chatbots
標題:僅有NLP是不夠的--聊天機器人中用戶輸入的語境化
 

作者:Nathan Dolbir,Triyasha Dastidar,Kaushik Roy
機構:Artificial Intelligence Institute, University of South Carolina, BITS-Pilani Hyderabad
鏈接:https://arxiv.org/abs/2105.06511
 

摘要:近年來,AI聊天機器人在技術上取得了長足進步,已在許多行業投入使用?;谏疃染W絡的先進自然語言處理技術能夠高效地處理用戶請求以完成其功能。隨著聊天機器人日益普及,它們在醫療保健領域的應用是一個頗有吸引力的方向,因為這可以降低超負荷運轉的醫療系統所付出的經濟和人力成本。然而,醫療機器人需要安全且醫學上準確的信息獲取,而受用戶文本和語音多樣性的影響,深度網絡尚無法做到這一點。以符號結構表示的知識更適合精確推理,但無法直接處理自然語言。因此,在本文中,我們研究了把知識表示與神經表示相結合對聊天機器人安全性、準確性和理解能力的影響。
摘要:AI chatbots have made vast strides in technology improvement in recent years and are already operational in many industries. Advanced Natural Language Processing techniques, based on deep networks, efficiently process user requests to carry out their functions. As chatbots gain traction, their applicability in healthcare is an attractive proposition due to the reduced economic and people costs of an overburdened system. However, healthcare bots require safe and medically accurate information capture, which deep networks aren't yet capable of due to user text and speech variations. Knowledge in symbolic structures is more suited for accurate reasoning but cannot handle natural language processing directly. Thus, in this paper, we study the effects of combining knowledge and neural representations on chatbot safety, accuracy, and understanding.

 

