### A Complete Roundup of 2018 Top-Conference Papers

CVPR 2018

《Taskonomy: Disentangling Task Transfer Learning》

Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

【Abstract】Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, in order to, for instance, seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity.

We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled data points needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure including a solver that users can employ to devise efficient supervision policies for their use cases.

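【Summary】Are visual tasks related to one another, or are they independent? For example, can surface normals simplify estimating the depth of an image? Intuition answers yes, which implies that there is structure among visual tasks. Knowing this structure is valuable: it is the concept underlying transfer learning, and it gives a principled way to identify redundancy across tasks, for example to reuse supervision among related tasks or to solve many tasks in one system without adding complexity. We propose a fully computational approach to modeling the structure of the space of visual tasks. It finds first-order and higher-order transfer learning dependencies, in a latent space, across a dictionary of 26 2D, 2.5D, 3D, and semantic tasks. The result is a computational taxonomic map for task transfer learning. We study the consequences of this structure, such as nontrivial relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled data points needed to solve a set of 10 tasks can be reduced by roughly 2/3 (compared with independent training) while keeping performance nearly the same.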

《Deep Learning of Graph Matching》

Andrei Zanfir, Cristian Sminchisescu

【Abstract】The problem of graph matching under node and pairwise constraints is fundamental in areas as diverse as combinatorial optimization, machine learning or computer vision, where representing both the relations between nodes and their neighborhood structure is essential. We present an end-to-end model that makes it possible to learn all parameters of the graph matching process, including the unary and pairwise node neighborhoods, represented as deep feature extraction hierarchies. The challenge is in the formulation of the different matrix computation layers of the model in a way that enables the consistent, efficient propagation of gradients in the complete pipeline from the loss function, through the combinatorial optimization layer solving the matching problem, and the feature extraction hierarchy. Our computer vision experiments and ablation studies on challenging datasets like PASCAL VOC keypoints, Sintel and CUB show that matching models refined end-to-end are superior to counterparts based on feature hierarchies trained for other problems.

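【Summary】Graph matching under node and pairwise constraints is a fundamental problem in many areas, including combinatorial optimization, machine learning, and computer vision, where representing both the relations between nodes and their neighborhood structure is essential. This paper presents an end-to-end model that can learn all parameters of the graph matching process, including the unary and pairwise node neighborhoods, represented as deep feature extraction hierarchies. The challenge is to formulate the model so that gradients propagate consistently through the whole pipeline, from the loss function, through the combinatorial optimization layer that solves the matching problem, to the feature extraction hierarchy. The authors' computer vision experiments and ablation studies on challenging datasets such as PASCAL VOC keypoints, Sintel, and CUB show that matching models refined end-to-end outperform models built on feature hierarchies trained for other problems.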

《SPLATNet: Sparse Lattice Networks for Point Cloud Processing》

Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz

【Abstract】We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naïvely applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks. These layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and allow flexible specifications of the lattice structure enabling hierarchical and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both point-based and image-based representations can be easily incorporated in a network with such layers and the resulting model can be trained in an end-to-end manner. We present results on 3D segmentation tasks where our approach outperforms existing state-of-the-art techniques.

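【Summary】This paper presents a network architecture for processing point clouds that operates directly on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Applying convolutions naively on this lattice scales poorly in both memory and computational cost as the lattice grows. Instead, the network uses sparse bilateral convolutional layers as building blocks. These layers stay efficient by using indexing structures to apply convolutions only on the occupied parts of the lattice, and they allow flexible specification of the lattice structure, which enables hierarchical and spatially aware feature learning as well as joint 2D-3D reasoning. Point-based and image-based representations can both be incorporated easily into a network with such layers, and the resulting model can be trained end-to-end. Results on 3D segmentation tasks show that the approach outperforms existing state-of-the-art techniques.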

《CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM》

Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J. Davison

【Abstract】The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation only.

We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: While each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to only represent aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.

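【Summary】The representation of geometry in real-time 3D perception systems remains a key research topic. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally expensive to store and process and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but they capture only partial scene information and are mainly useful for localisation. This paper presents a new compact but dense representation of scene geometry that is conditioned on the intensity data from a single image and generated from a code with a small number of parameters. The approach is inspired by work on learning depth from images and by work on auto-encoders. It is suitable for a keyframe-based monocular dense SLAM system: each keyframe with a code can produce a depth map, and the code can be optimised efficiently, jointly with pose variables and with the codes of overlapping keyframes, to reach global consistency. Conditioning the depth map on the image lets the code represent only those aspects of the local geometry that cannot be predicted directly from the image. The paper also explains how to learn the code representation and demonstrates its advantages in monocular SLAM.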

《Efficient Optimization for Rank-based Loss Functions》

Pritish Mohapatra, Michal Rolínek, C.V. Jawahar, Vladimir Kolmogorov, M. Pawan Kumar

【Abstract】The accuracy of information retrieval systems is often measured using complex loss functions such as the average precision (AP) or the normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these loss functions. However, the non-differentiability and non-decomposability of these loss functions does not allow for simple gradient based optimization algorithms. This issue is generally circumvented by either optimizing a structured hinge-loss upper bound to the loss function or by using asymptotic methods like the direct-loss minimization framework. Yet, the high computational complexity of loss-augmented inference, which is necessary for both the frameworks, prohibits its use in large training data sets. To alleviate this deficiency, we present a novel quicksort flavored algorithm for a large class of non-decomposable loss functions. We provide a complete characterization of the loss functions that are amenable to our algorithm, and show that it includes both AP and NDCG based loss functions. Furthermore, we prove that no comparison based algorithm can improve upon the computational complexity of our approach asymptotically. We demonstrate the effectiveness of our approach in the context of optimizing the structured hinge loss upper bound of AP and NDCG loss for learning models for a variety of vision tasks. We show that our approach provides significantly better results than simpler decomposable loss functions, while requiring a comparable training time.

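【Summary】The accuracy of information retrieval systems is often measured with complex loss functions such as average precision (AP) or normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these loss functions. However, because these losses are non-differentiable and non-decomposable, simple gradient-based optimization algorithms cannot be used. The issue is usually sidestepped either by optimizing a structured hinge-loss upper bound on the loss function or by using asymptotic methods such as the direct-loss minimization framework. Yet the high computational complexity of loss-augmented inference limits its use on large training sets. To overcome this, we present a quicksort-flavored algorithm for a large class of non-decomposable loss functions. We characterize the loss functions that are amenable to the algorithm, a class that includes both AP-based and NDCG-based losses. We also prove that no comparison-based algorithm can improve on the asymptotic computational complexity of our approach. We demonstrate its effectiveness by optimizing the structured hinge-loss upper bound of the AP and NDCG losses when learning models for a variety of vision tasks, and show that it gives better results than simpler decomposable loss functions while requiring comparable training time.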
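To make the AP loss mentioned above concrete, here is a minimal Python sketch that computes average precision for one ranked list. It is only an illustration of the metric, not the paper's quicksort-flavored optimization algorithm; the function name `average_precision` and the 0/1 label encoding are assumptions made for this example.

```python
def average_precision(scores, labels):
    """Average precision of the ranking induced by `scores`.

    `labels` holds 1 for a positive sample and 0 for a negative one
    (an encoding assumed for this sketch). AP is the mean of
    precision@k taken over the ranks k at which a positive appears.
    """
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # With no positives the metric is undefined; return 0.0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def ap_loss(scores, labels):
    """AP loss as 1 - AP; it depends on the whole ranking, not on single samples."""
    return 1.0 - average_precision(scores, labels)
```

Because the value depends on the relative order of every positive and negative, the loss cannot be written as a sum of per-sample terms, which is the non-decomposability the abstract refers to.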

ECCV 2018

《Implicit 3D Orientation Learning for 6D Object Detection from RGB Images》

Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel

【Abstract】We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization.

This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that our method outperforms similar model-based approaches and competes with state-of-the-art approaches that require real pose-annotated images.

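【Summary】This paper proposes a real-time, RGB-based method for object detection and 6D pose estimation. Its new 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This method, called the Augmented Autoencoder (AAE), has several advantages over existing methods: it does not require real, pose-annotated training data, it generalizes to various test sensors, and it inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Tests on the T-LESS and LineMOD datasets show that the method outperforms similar model-based approaches and is competitive with state-of-the-art approaches that require real pose-annotated images.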

Best Paper Award, Honorable Mention (two papers)

《Group Normalization》

【Abstract】Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems: BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.

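【Summary】Batch Normalization (BN) is a milestone technique in the development of deep learning that makes it possible to train a wide range of networks. However, normalizing along the batch dimension causes problems: when the batch becomes small, batch statistics are estimated inaccurately and BN's error rises quickly. This limits the use of BN for training large networks and for transferring features to computer vision tasks such as detection, segmentation, and video, where memory consumption forces small batches. In this paper the authors present Group Normalization (GN) as an alternative to BN. GN divides the channels into groups and computes the mean and variance within each group for normalization. GN's computation is independent of the batch size, and its accuracy is stable across different batch sizes. For ResNet-50 trained on ImageNet with a batch size of 2, GN's error is 10.6% lower than BN's. With typical batch sizes, GN is comparable to BN and better than other normalization variants. GN also transfers naturally from pre-training to fine-tuning. On COCO object detection and segmentation and on Kinetics video classification, GN outperforms or matches its BN-based counterparts, which shows that GN can effectively replace BN in a range of tasks. In modern deep learning libraries, GN can be implemented in a few lines of code.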
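The grouping step described above can be sketched in a few lines of plain Python. This is a minimal illustration for a single sample, assuming the input is a list of channels with each channel flattened to a list of spatial values; the function name `group_norm` and the `eps` default are choices made for this sketch, and the learnable per-channel scale and shift used in practice are omitted.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize one sample channel-group by channel-group.

    x: list of C channels, each a flat list of spatial values (assumed layout).
    Statistics are computed per group of channels, so no other sample in the
    batch is involved.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    channels_per_group = num_channels // num_groups

    out = []
    for g in range(num_groups):
        group = x[g * channels_per_group:(g + 1) * channels_per_group]
        # Pool every value of every channel in this group.
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for channel in group:
            out.append([(v - mean) * inv_std for v in channel])
    return out
```

Since the mean and variance never touch the batch dimension, the output for a sample is the same whether the batch holds 2 samples or 32, which is the batch-size independence the abstract highlights.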

《GANimation: Anatomically-aware Facial Animation from a Single Image》

【Abstract】Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for task of facial expression synthesis. The most successful architecture is StarGAN [4], that conditions GANs’ generation process with images of a specific domain, namely a set of images of persons sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset. To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on Action Units (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combine several of them. Additionally, we propose a fully unsupervised strategy to train the model, that only requires images annotated with their activated AUs, and exploit attention mechanisms that make our network robust to changing backgrounds and lighting conditions. Extensive evaluation show that our approach goes beyond competing conditional generators both in the capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, as in the capacity of dealing with images in the wild.

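【Summary】Generative Adversarial Networks (GANs) have recently shown impressive results on facial expression synthesis. The most successful architecture is StarGAN, which conditions the GAN's generation process on images of a specific domain, namely a set of images of different people sharing the same expression. Although effective, this approach can generate only a discrete set of expressions, determined by the content of the training data. To address this limitation, the paper introduces a new GAN conditioning scheme based on Action Unit (AU) annotations, which describe, in a continuous manifold, the anatomical facial movements that define a human expression. The approach makes it possible to control the activation magnitude of each AU and to combine several of them. The paper also proposes a fully unsupervised strategy for training the model that requires only images annotated with their activated AUs, and it uses attention mechanisms to make the network robust to changes in background and lighting. Extensive evaluation shows that the approach clearly outperforms competing conditional generators, both in synthesizing a wide range of expressions governed by anatomically feasible muscle movements and in handling images in the wild.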

IJCAI-ECAI-2018

《SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks》

Ke Wang, Xiaojun Wan

【Abstract】Generating texts of different sentiment labels is getting more and more attention in the area of natural language generation. Recently, Generative Adversarial Net (GAN) has shown promising results in text generation. However, the texts generated by GAN usually suffer from the problems of poor quality, lack of diversity and mode collapse. In this paper, we propose a novel framework SentiGAN, which has multiple generators and one multi-class discriminator, to address the above problems. In our framework, multiple generators are trained simultaneously, aiming at generating texts of different sentiment labels without supervision. We propose a penalty based objective in the generators to force each of them to generate diversified examples of a specific sentiment label. Moreover, the use of multiple generators and one multi-class discriminator can make each generator focus on generating its own examples of a specific sentiment label accurately. Experimental results on four datasets demonstrate that our model consistently outperforms several state-of-the-art text generation methods in the sentiment accuracy and quality of generated texts.

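【Summary】Generating texts with different sentiment labels is attracting growing attention in natural language generation. Recently, Generative Adversarial Nets (GANs) have been applied successfully to text generation. However, text generated by a GAN usually suffers from poor quality, lack of diversity, and mode collapse. In this paper we propose a new framework, SentiGAN, which has multiple generators and one multi-class discriminator, to address these problems. In the framework, multiple generators are trained simultaneously to generate texts with different sentiment labels without supervision. We propose a penalty-based objective that pushes each generator to produce diverse examples for a specific sentiment label. In addition, using multiple generators with one multi-class discriminator lets each generator focus on accurately generating examples for its own sentiment label. Experiments on four datasets show that the model consistently outperforms several state-of-the-art text generation methods in sentiment accuracy and in the quality of the generated text.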

《Reasoning about Consensus when Opinions Diffuse through Majority Dynamics》

Vincenzo Auletta, Diodato Ferraioli, Gianluigi Greco

【Abstract】Opinion diffusion is studied on social graphs where agents hold binary opinions and where social pressure leads them to conform to the opinion manifested by the majority of their neighbors. Within this setting, questions related to whether a minority/majority can spread the opinion it supports to all the other agents are considered. It is shown that, no matter of the underlying graph, there is always a group formed by a half of the agents that can annihilate the opposite opinion. Instead, the influence power of minorities depends on certain features of the given graph, which are NP-hard to be identified. Deciding whether the two opinions can coexist in some stable configuration is NP-hard, too.

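【Summary】Opinion diffusion is studied on social graphs in which agents hold binary opinions and social pressure leads them to conform to the opinion expressed by the majority of their neighbors. In this setting, the paper considers whether a minority or a majority can spread the opinion it supports to all other agents. The results show that, whatever the underlying graph, there is always a group made up of half of the agents that can eliminate the opposite opinion. By contrast, the influence of minorities depends on certain features of the given graph, and identifying these features is NP-hard. Deciding whether the two opinions can coexist in some stable configuration is NP-hard as well.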
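The dynamics described above can be simulated with a short Python sketch. The update rule here is an assumption made for illustration: all agents update synchronously, an agent adopts an opinion only when a strict majority of its neighbors holds it, and ties leave the agent unchanged. The paper's exact update schedule may differ, and the function names are hypothetical.

```python
def majority_step(adj, opinions):
    """One synchronous round: each agent conforms to a strict neighbor majority."""
    updated = {}
    for node, neighbors in adj.items():
        ones = sum(opinions[n] for n in neighbors)
        zeros = len(neighbors) - ones
        if ones > zeros:
            updated[node] = 1
        elif zeros > ones:
            updated[node] = 0
        else:
            updated[node] = opinions[node]  # tie: keep the current opinion
    return updated


def run_until_stable(adj, opinions, max_rounds=100):
    """Iterate until no agent changes, or give up after max_rounds.

    Synchronous updates can oscillate on some graphs, hence the round limit.
    """
    for _ in range(max_rounds):
        nxt = majority_step(adj, opinions)
        if nxt == opinions:
            return opinions
        opinions = nxt
    return opinions


# Two triangles joined by the edge 2-3: each triangle shields its own opinion,
# so both opinions coexist in a stable configuration.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
split = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
stable = run_until_stable(two_triangles, split)
```

On a path graph `0-1-2-3` with initial opinions `1, 1, 1, 0`, the same rule drives the last agent to opinion 1, so the majority opinion takes over the whole graph.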

《R-SVM+: Robust Learning with Privileged Information》

Xue Li, Bo Du, Chang Xu, Yipeng Zhang, Lefei Zhang, Dacheng Tao

【Abstract】In practice, the circumstance that training and test data are clean is not always satisfied. The performance of existing methods in the learning using privileged information (LUPI) paradigm may be seriously challenged, due to the lack of clear strategies to address potential noises in the data. This paper proposes a novel Robust SVM+ (R-SVM+) algorithm based on a rigorous theoretical analysis. Under the SVM+ framework in the LUPI paradigm, we study the lower bound of perturbations of both example feature data and privileged feature data, which will mislead the model to make wrong decisions. By maximizing the lower bound, tolerance of the learned model over perturbations will be increased. Accordingly, a novel regularization function is introduced to upgrade a variant form of SVM+. The objective function of R-SVM+ is transformed into a quadratic programming problem, which can be efficiently optimized using off-the-shelf solvers. Experiments on real-world datasets demonstrate the necessity of studying robust SVM+ and the effectiveness of the proposed algorithm.

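【Summary】In real applications, training and test data are not always clean. Because existing methods lack effective strategies for handling potential noise in the data, their performance in the learning using privileged information (LUPI) paradigm can be seriously challenged. Based on a rigorous theoretical analysis, this paper proposes a new Robust SVM+ (R-SVM+) algorithm. Under the SVM+ framework in LUPI, we study the lower bound on perturbations of the example feature data and the privileged feature data that would mislead the model into wrong decisions. Maximizing this lower bound increases the learned model's tolerance to perturbations. Accordingly, a new regularization function is introduced to upgrade a variant of SVM+. The objective function of R-SVM+ is transformed into a quadratic programming problem, which can be optimized efficiently with off-the-shelf solvers. Experiments demonstrate the need for a robust SVM+ and the effectiveness of the proposed algorithm.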

《From Conjunctive Queries to Instance Queries in Ontology-Mediated Querying》

Cristina Feier, Carsten Lutz, Frank Wolter

【Abstract】We consider ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and (unions of) conjunctive queries, studying the rewritability into OMQs based on instance queries (IQs). Our results include exact characterizations of when such a rewriting is possible and tight complexity bounds for deciding rewritability. We also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence is equivalent to a CSP.

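【Summary】We consider ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and (unions of) conjunctive queries, and study their rewritability into OMQs based on instance queries (IQs). The results include exact characterizations of when such a rewriting is possible, together with tight complexity bounds for deciding rewritability. We also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence is equivalent to a CSP.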

《What Game are We Playing? End-to-end Learning in Normal and Extensive Form Games》

Chun Kai Ling, Fei Fang, J. Zico Kolter

【Abstract】Although recent work in AI has made great progress in solving large, zero-sum, extensive-form games, the underlying assumption in most past work is that the parameters of the game itself are known to the agents. This paper deals with the relatively under-explored but equally important “inverse” setting, where the parameters of the underlying game are not known to all agents, but must be learned through observations. We propose a differentiable, end-to-end learning framework for addressing this task. In particular, we consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop 1) a primal-dual Newton method for finding such equilibrium points in both normal and extensive form games; and 2) a backpropagation method that lets us analytically compute gradients of all relevant game parameters through the solution itself. This ultimately lets us learn the game by training in an end-to-end fashion, effectively by integrating a “differentiable game solver” into the loop of larger deep network architectures. We demonstrate the effectiveness of the learning method in several settings including poker and security game tasks.

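【Summary】Although recent AI research has made great progress in solving large, zero-sum, extensive-form games, most past work assumes that the parameters of the game itself are known to the agents. This paper addresses the relatively under-explored but equally important "inverse" setting, in which the parameters of the underlying game are not known to all agents and must be learned from observations. We propose a differentiable, end-to-end learning framework for this task. In particular, we consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop: 1) a primal-dual Newton method for finding such equilibrium points in both normal-form and extensive-form games; and 2) a backpropagation method that lets us compute the gradients of all relevant game parameters through the solution itself. This ultimately lets us learn the game by end-to-end training, in effect integrating a "differentiable game solver" into the loop of larger deep network architectures. We demonstrate the effectiveness of the learning method in several settings, including poker and security game tasks.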

《Commonsense Knowledge Aware Conversation Generation with Graph Attention》

Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, Xiaoyan Zhu

【Abstract】Commonsense knowledge is vital to many natural language processing tasks. In this paper, we present a novel open-domain conversation generation model to demonstrate how large-scale commonsense knowledge can facilitate language understanding and generation. Given a user post, the model retrieves relevant knowledge graphs from a knowledge base and then encodes the graphs with a static graph attention mechanism, which augments the semantic information of the post and thus supports better understanding of the post. Then, during word generation, the model attentively reads the retrieved knowledge graphs and the knowledge triples within each graph to facilitate better generation through a dynamic graph attention mechanism. This is the first attempt that uses large-scale commonsense knowledge in conversation generation. Furthermore, unlike existing models that use knowledge triples (entities) separately and independently, our model treats each knowledge graph as a whole, which encodes more structured, connected semantic information in the graphs. Experiments show that the proposed model can generate more appropriate and informative responses than state-of-the-art baselines.

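【Summary】Commonsense knowledge is vital to many natural language processing tasks. This paper presents a new open-domain conversation generation model to show how large-scale commonsense knowledge can support language understanding and generation. Given a user post, the model retrieves relevant knowledge graphs from a knowledge base and encodes them with a static graph attention mechanism, which enriches the semantic information of the post and so supports a better understanding of it. Then, during word generation, the model uses a dynamic graph attention mechanism to read the retrieved knowledge graphs and the knowledge triples within each graph attentively, which leads to better generation. This is the first attempt to use large-scale commonsense knowledge in conversation generation. Moreover, unlike existing models that use knowledge triples (entities) separately and independently, the model treats each knowledge graph as a whole and thereby encodes more structured, connected semantic information. Experiments show that the model generates more appropriate and more informative responses than state-of-the-art baselines.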

《A Degeneracy Framework for Graph Similarity》

【Abstract】The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Most existing methods for graph similarity focus either on local or on global properties of graphs. However, even if graphs seem very similar from a local or a global perspective, they may exhibit different structure at different scales. In this paper, we present a general framework for graph similarity which takes into account structure at multiple different scales. The proposed framework capitalizes on the well-known k-core decomposition of graphs in order to build a hierarchy of nested subgraphs. We apply the framework to derive variants of four graph kernels, namely graphlet kernel, shortest-path kernel, Weisfeiler-Lehman subtree kernel, and pyramid match graph kernel. The framework is not limited to graph kernels, but can be applied to any graph comparison algorithm. The proposed framework is evaluated on several benchmark datasets for graph classification. In most cases, the core-based kernels achieve significant improvements in terms of classification accuracy over the base kernels, while their time complexity remains very attractive.

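【Summary】Accurately measuring the similarity between graphs is a core problem in applications across many disciplines. Most existing methods for graph similarity focus either on local or on global properties of graphs. However, even graphs that look very similar from a local or a global perspective may show different structure at different scales. This paper presents a general framework for graph similarity that takes structure at multiple scales into account. The framework uses the k-core decomposition of graphs to build a hierarchy of nested subgraphs. It is applied to derive variants of four graph kernels: the graphlet kernel, the shortest-path kernel, the Weisfeiler-Lehman subtree kernel, and the pyramid match graph kernel. The framework is not limited to graph kernels and can be applied to any graph comparison algorithm. It is evaluated on several benchmark datasets for graph classification. In most cases the core-based kernels improve classification accuracy significantly over the base kernels, while their time complexity remains very attractive.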
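The k-core decomposition that the framework builds on can be sketched in Python as follows. This is a simple quadratic-time peeling version written for readability, not the paper's implementation; `core_numbers` and `k_core` are names chosen for this example, and the graph is assumed to be an undirected adjacency dictionary without self-loops.

```python
def core_numbers(adj):
    """Core number of every vertex, by repeatedly peeling a minimum-degree vertex.

    adj: dict mapping each vertex to a list of its neighbors (undirected graph).
    """
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex of smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def k_core(adj, k):
    """Subgraph induced by the vertices whose core number is at least k."""
    core = core_numbers(adj)
    keep = {v for v, c in core.items() if c >= k}
    return {v: [u for u in adj[v] if u in keep] for v in keep}


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
hierarchy = [k_core(graph, k) for k in range(1, 3)]  # 1-core, then 2-core
```

Each k-core contains the (k+1)-core, so the list `hierarchy` is the kind of nested sequence of subgraphs on which a base graph kernel could be evaluated level by level.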

ICML 2018

《Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples》

Anish Athalye, Nicholas Carlini, David Wagner

【Abstract】We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.

《Delayed Impact of Fair Machine Learning》

Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt

【Abstract】Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect.

We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not. We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.

Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.

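【Summary】Fairness in machine learning has mainly been studied in static classification settings, without regard to how decisions change the underlying population over time. Conventional wisdom holds that fairness criteria promote the long-term well-being of the groups they are meant to protect.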

《Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices》

Zengfeng Huang

【Abstract】Given a large matrix $A \in \mathbb{R}^{n \times d}$, we consider the problem of computing a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ which is significantly smaller than but still well approximates $A$. We are interested in minimizing the covariance error $\|A^T A - B^T B\|_2$. We consider the problems in the streaming model, where the algorithm can only make one pass over the input with limited working space. The popular Frequent Directions algorithm of (Liberty, 2013) and its variants achieve optimal space-error tradeoff. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the time complexity of this problem. In particular, we provide new space-optimal algorithms with faster running times. Moreover, we also show that the running times of our algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.

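【Summary】Given a large $n \times d$ matrix $A$, we consider computing an $\ell \times d$ sketch matrix $B$ that is significantly smaller than $A$ but still approximates it well. We want to minimize the covariance error $\|A^T A - B^T B\|_2$. We consider the problem in the streaming model, where the algorithm can make only one pass over the input with limited working space. The popular Frequent Directions algorithm (Liberty, 2013) and its variants achieve the optimal tradeoff between space and error, but whether the running time can be reduced has remained an open question. In this paper we almost settle the time complexity of the problem. In particular, we give new space-optimal algorithms with faster running times. We also show that the running times of our algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.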

《The Mechanics of n-Player Differentiable Games》

David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel

【Abstract】The cornerstone underpinning deep learning is the guarantee that gradient descent on an objective converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, where there are multiple interacting losses. The behavior of gradient-based methods in games is not well understood – and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new techniques to understand and control the dynamics in general games. The key result is to decompose the second-order dynamics into two components. The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in general games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs – whilst at the same time being applicable to – and having guarantees in – much more general games.
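A small worked example may help show why plain gradient descent fails with interacting losses and what the adjustment does. The Python sketch below applies an SGA-style update to the two-player bilinear game with losses l1(x, y) = x*y and l2(x, y) = -x*y. The game, the step size, the positive adjustment weight `lam`, and the function name are all assumptions chosen for illustration; the adjusted direction is taken as xi + lam * A^T xi, where A is the antisymmetric part of the Jacobian of the simultaneous gradient xi.

```python
def sga_step(x, y, lr=0.1, lam=1.0):
    """One adjusted gradient step on the bilinear game l1 = x*y, l2 = -x*y.

    With lam=0.0 this reduces to plain simultaneous gradient descent.
    """
    # Simultaneous gradient xi = (d l1 / dx, d l2 / dy).
    xi_x, xi_y = y, -x
    # Jacobian of xi is J = [[0, 1], [-1, 0]], which is already antisymmetric,
    # so A = J and A^T xi = [[0, -1], [1, 0]] applied to (y, -x) = (x, y).
    adj_x, adj_y = x, y
    # Adjusted direction: xi + lam * A^T xi.
    step_x = xi_x + lam * adj_x
    step_y = xi_y + lam * adj_y
    return x - lr * step_x, y - lr * step_y


def run(steps, lam):
    """Iterate from (1, 1) and return the final point."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = sga_step(x, y, lam=lam)
    return x, y


plain = run(200, lam=0.0)     # spirals away from the fixed point (0, 0)
adjusted = run(200, lam=1.0)  # contracts toward the fixed point (0, 0)
```

In this game each plain step multiplies the distance to the origin by sqrt(1 + lr^2), so the iterates drift outward, while the adjusted step multiplies it by sqrt((1 - lr*lam)^2 + lr^2), which is below 1 for the values used here.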

《Fairness Without Demographics in Repeated Loss Minimization》

Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, Percy Liang

【Abstract】Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity: minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss. Worse, as model accuracy affects user retention, a minority group can shrink over time. In this paper, we first show that the status quo of empirical risk minimization (ERM) amplifies representation disparity over time, which can even make initially fair models unfair. To mitigate this, we develop an approach based on distributionally robust optimization (DRO), which minimizes the worst case risk over all distributions close to the empirical distribution. We prove that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice, while remaining oblivious to the identity of the groups. We demonstrate that DRO prevents disparity amplification on examples where ERM fails, and show improvements in minority group user satisfaction in a real-world text autocomplete task.

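【Summary】Machine learning models (such as speech recognizers) are usually trained to minimize average loss, which leads to representation disparity: minority groups (such as non-native speakers) contribute less to the training objective and therefore tend to suffer higher loss. Worse, because model accuracy affects user retention, a minority group can shrink over time. This paper first shows that the status quo of empirical risk minimization (ERM) amplifies representation disparity over time, which can make even an initially fair model unfair. To mitigate this, we develop an approach based on distributionally robust optimization (DRO), which minimizes the worst-case risk over all distributions close to the empirical distribution. We prove that the approach controls the risk of the minority group at every time step, in the spirit of Rawlsian distributive justice, without needing to know the identity of the groups. We demonstrate that DRO prevents disparity amplification on examples where ERM fails, and we show improved satisfaction among minority-group users on a real-world text autocomplete task.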

NIPS 2018

《Neural Ordinary Differential Equations》

Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud

【Abstract】We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.

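【Summary】The paper introduces a new family of deep neural network models. Rather than specifying a discrete sequence of hidden layers, it parameterizes the derivative of the hidden state with a neural network and computes the network's output with a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. The authors demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. They also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. For training, they show how to backpropagate scalably through any ODE solver without access to its internal operations, which allows ODEs to be trained end to end inside larger models.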
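The idea of defining the hidden state through its derivative and handing it to a solver can be sketched as follows. This is a minimal illustration under stated assumptions: a scalar state, a fixed-step RK4 integrator written by hand (the paper uses black-box adaptive solvers), and a hypothetical `dynamics` function standing in for the neural network.

```python
import math


def odeint_rk4(f, h0, t0, t1, steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step RK4."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = f(h + dt * k3, t + dt)
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return h


def dynamics(h, t, w=-1.0):
    """Hypothetical stand-in for the network that parameterizes dh/dt."""
    return w * h


exact = math.exp(-1.0)  # closed-form solution of dh/dt = -h at t = 1, h(0) = 1
coarse = odeint_rk4(dynamics, 1.0, 0.0, 1.0, steps=2)
fine = odeint_rk4(dynamics, 1.0, 0.0, 1.0, steps=50)
print(abs(coarse - exact), abs(fine - exact))
```

Fewer solver steps cost less but give a larger error, which is a toy version of the precision-for-speed trade-off mentioned in the abstract.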

《Non-delusional Q-learning and Value-iteration》

Tyler Lu, Dale Schuurmans, Craig Boutilier

【Abstract】We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. Delusional bias arises when the approximation architecture limits the class of expressible greedy policies. Since standard Q-updates make globally uncoordinated action choices with respect to the expressible policy class, inconsistent or even conflicting Q-value estimates can result, leading to pathological behaviour such as over/under-estimation, instability and even divergence. To solve this problem, we introduce a new notion of policy consistency and define a local backup process that ensures global consistency through the use of information sets, that is, sets that record constraints on policies consistent with backed-up Q-values. We prove that both the model-based and model-free algorithms using this backup remove delusional bias, yielding the first known algorithms that guarantee optimal results under general conditions. These algorithms furthermore only require polynomially many information sets (from a potentially exponential support). Finally, we suggest other practical heuristics for value-iteration and Q-learning that attempt to reduce delusional bias.

《Optimal Algorithms for Non-Smooth Distributed Optimization in Networks》

Kevin Scaman, Francis Bach, Sébastien Bubeck, Laurent Massoulié, Yin Tat Lee

【Abstract】In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $$O(1/\sqrt{t})$$, the structure of the communication network only impacts a second-order term in $$O(1/t)$$, where t is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $$d^{1/4}$$ multiplicative factor of the optimal convergence rate, where d is the underlying dimension.

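【Summary】The paper studies distributed optimization of non-smooth convex functions over a network of computing units, under two regularity assumptions: (1) Lipschitz continuity of the global objective function, and (2) Lipschitz continuity of the local individual functions. Under the local assumption, the authors give the first optimal first-order decentralized algorithm, multi-step primal-dual (MSPD), together with its optimal convergence rate. Notably, for non-smooth functions the dominant error term is in $$O(1/\sqrt{t})$$, while the structure of the communication network affects only a second-order term in $$O(1/t)$$, where t is time. In other words, the error caused by limited communication resources decreases quickly even for objectives that are not strongly convex. Under the global assumption, they give a simple yet efficient algorithm, distributed randomized smoothing (DRS), based on local smoothing of the objective, and show that DRS is within a $$d^{1/4}$$ multiplicative factor of the optimal convergence rate, where d is the underlying dimension.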

《Nearly Tight Sample Complexity Bounds for Learning Mixtures of Gaussians via Sample Compression Schemes》

Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan

【Abstract】We prove that $$\widetilde{\Theta}(kd^{2}/\varepsilon^{2})$$ samples are necessary and sufficient for learning a mixture of k Gaussians in $$\mathbb{R}^{d}$$, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $$\widetilde{O}(kd/\varepsilon^{2})$$ samples suffice, matching a known lower bound.

The upper bound is based on a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a sample compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $$\mathbb{R}^{d}$$ has an efficient sample compression.

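【Summary】The paper proves that $$\widetilde{\Theta}(kd^{2}/\varepsilon^{2})$$ samples are necessary and sufficient to learn a mixture of k Gaussians in $$\mathbb{R}^{d}$$ up to error ε in total variation distance, improving both the known upper and lower bounds. For mixtures of axis-aligned Gaussians, $$\widetilde{O}(kd/\varepsilon^{2})$$ samples suffice, matching a known lower bound. The upper bound rests on a new technique for distribution learning based on sample compression: any class of distributions that admits such a compression scheme can be learned from few samples. The core of the main result is showing that the class of Gaussians in $$\mathbb{R}^{d}$$ admits an efficient sample compression scheme.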

AAAI 2018

《Memory-Augmented Monte Carlo Tree Search》

Chenjun Xiao, Jincheng Mei and Martin Muller

【Abstract】This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploit generalization in online real-time search. The key idea of M-MCTS is to incorporate MCTS with a memory structure, where each entry contains information of a particular state. This memory is used to generate an approximate value estimation by combining the estimations of similar states. We show that the memory based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS outperforms the original MCTS with the same number of simulations.

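【Summary】The paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), a new way to exploit generalization in online real-time search. The key idea is to combine MCTS with a memory structure in which each entry holds information about a particular state. The memory produces an approximate value estimate by combining the estimates of similar states. The authors show that, under mild conditions, this memory-based value approximation is better than the vanilla Monte Carlo estimate with high probability. Evaluated on the game of Go, M-MCTS outperforms the original MCTS with the same number of simulations.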
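Combining the estimates of similar states can be sketched as a similarity-weighted average over memory entries. The weighting below (a softmax over negative squared feature distances) and every name in it are assumptions made for illustration; the abstract does not specify how the paper measures similarity.

```python
import math


def memory_value(query, memory, temperature=1.0):
    """Estimate a state's value as a similarity-weighted average over memory.

    memory: list of (feature_vector, value_estimate) pairs.
    Weights are a softmax over negative squared distances, which is an
    illustrative choice rather than the paper's definition.
    """
    scores = [
        -sum((q - f) ** 2 for q, f in zip(query, feats)) / temperature
        for feats, _ in memory
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, memory)) / total


# Hypothetical memory: two stored states with value estimates 0.0 and 1.0.
memory = [((0.0,), 0.0), ((2.0,), 1.0)]
print(memory_value((1.0,), memory))  # equally close to both entries
print(memory_value((0.0,), memory))  # dominated by the first entry
```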

《Counterfactual Multi-Agent Policy Gradients》

Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

【Abstract】Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents’ policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent’s action, while keeping the other agents’ actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.

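【Summary】Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems, and there is a great need for reinforcement learning methods that can efficiently learn decentralised policies for them. To this end, the paper proposes a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. To address multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. The authors evaluate COMA on StarCraft unit micromanagement, using a decentralised variant with significant partial observability. In this setting COMA significantly improves average performance over other multi-agent actor-critic methods, and the best-performing agents are competitive with state-of-the-art centralised controllers that have access to the full state.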
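The counterfactual baseline described above can be sketched for a single agent. The sketch assumes the critic has already produced one Q-value per action of that agent, with the other agents' actions held fixed; the function name and the numbers are hypothetical.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values[i]: critic estimate of the joint action's value when this agent
                 plays action i and all other agents' actions stay fixed.
    policy[i]:   this agent's probability of choosing action i.
    """
    if abs(sum(policy) - 1.0) > 1e-9:
        raise ValueError("policy must sum to 1")
    # Marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline


q_values = [1.0, 2.0, 4.0]   # hypothetical critic outputs
policy = [0.5, 0.25, 0.25]   # hypothetical actor probabilities
print(counterfactual_advantage(q_values, policy, taken_action=2))
```

Because the baseline depends only on the other agents' actions and this agent's policy, the advantage isolates how much this agent's own choice contributed.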

ACL 2018

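The Annual Meeting of the Association for Computational Linguistics (ACL) is the association's yearly conference and the most important academic meeting in the field. The association was founded in 1962 as the Association for Machine Translation and Computational Linguistics (AMTCL) and was renamed ACL in 1968. Every summer, researchers from around the world gather to share theoretical developments and technical advances in natural language processing. In recent years the field has made substantial progress in machine translation, language analysis, information extraction, question answering, text summarization, and many other directions.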

《Finding syntax in human encephalography with beam search》

John Hale, Chris Dyer, Adhiguna Kuncoro and Jonathan Brennan.

【Abstract】Recurrent neural network grammars (RNNGs) are generative models of (tree, string) pairs that rely on neural networks to evaluate derivational choices. Parsing with them using beam search yields a variety of incremental complexity metrics such as word surprisal and parser action count. When used as regressors against human electrophysiological responses to naturalistic text, they derive two amplitude effects: an early peak and a P600-like later peak. By contrast, a non-syntactic neural language model yields no reliable effects. Model comparisons attribute the early peak to syntactic composition within the RNNG. This pattern of results recommends the RNNG+beam search combination as a mechanistic model of the syntactic processing that occurs during normal human language comprehension.

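【Summary】Recurrent neural network grammars (RNNGs) are generative models of (tree, string) pairs that rely on neural networks to evaluate derivational choices. Parsing with them using beam search yields a variety of incremental complexity metrics, such as word surprisal and parser action count. Used as regressors against human electrophysiological responses to naturalistic text, these metrics derive two amplitude effects: an early peak and a later, P600-like peak. By contrast, a non-syntactic neural language model yields no reliable effects. Model comparisons attribute the early peak to syntactic composition within the RNNG. Together, the results recommend the combination of RNNG and beam search as a mechanistic model of the syntactic processing that occurs during normal human language comprehension.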

《Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information》

Sudha Rao and Hal Daumé III.

【Abstract】Inquiry is fundamental to communication, and machines cannot effectively collaborate with humans unless they can ask questions. In this work, we build a neural network model for the task of ranking clarification questions. Our model is inspired by the idea of expected value of perfect information: a good question is one whose expected answer will be useful. We study this problem using data from StackExchange, a plentiful online resource in which people routinely ask clarifying questions to posts so that they can better offer assistance to the original poster. We create a dataset of clarification questions consisting of ~77K posts paired with a clarification question (and answer) from three domains of StackExchange: askubuntu, unix and superuser. We evaluate our model on 500 samples of this dataset against expert human judgments and demonstrate significant improvements over controlled baselines.

《Let’s do it “again”: A First Computational Approach to Detecting Adverbial Presupposition Triggers》

Andre Cianflone, Yulan Feng, Jad Kabbara and Jackie Chi Kit Cheung.

【Abstract】We introduce the task of predicting adverbial presupposition triggers such as also and again. Solving such a task requires detecting recurring or similar events in the discourse context, and has applications in natural language generation tasks such as summarization and dialogue systems. We create two new datasets for the task, derived from the Penn Treebank and the Annotated English Gigaword corpora, as well as a novel attention mechanism tailored to this task. Our attention mechanism augments a baseline recurrent neural network without the need for additional trainable parameters, minimizing the added computational cost of our mechanism. We demonstrate that our model statistically outperforms a number of baselines, including an LSTM-based language model.

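【Summary】The paper introduces the task of predicting adverbial presupposition triggers such as "also" and "again". Solving it requires detecting recurring or similar events in the discourse context, and the results are useful for natural language generation tasks such as summarization and dialogue systems. The authors create two new datasets for the task, derived from the Penn Treebank and the Annotated English Gigaword corpora, and design a new attention mechanism tailored to it. The mechanism augments a baseline recurrent neural network without any additional trainable parameters, which keeps its added computational cost to a minimum. The model outperforms a number of baselines by a statistically significant margin, including an LSTM-based language model.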

《Know What You Don’t Know: Unanswerable Questions for SQuAD》

Pranav Rajpurkar, Robin Jia, Percy Liang

《‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions》

Olivia Winn, Smaranda Muresan

【Abstract】We propose a novel paradigm of grounding comparative adjectives within the realm of color descriptions. Given a reference RGB color and a comparative term (e.g., ‘lighter’, ‘darker’), our model learns to ground the comparative as a direction in the RGB space such that the colors along the vector, rooted at the reference color, satisfy the comparison. Our model generates grounded representations of comparative adjectives with an average accuracy of 0.65 cosine similarity to the desired direction of change. These vectors approach colors with Delta-E scores of under 7 compared to the target colors, indicating the differences are very small with respect to human perception. Our approach makes use of a newly created dataset for this task derived from existing labeled color data.

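【Summary】The paper proposes a new paradigm for grounding comparative adjectives in the domain of color descriptions. Given a reference RGB color and a comparative term (e.g., ‘lighter’, ‘darker’), the model learns to ground the comparative as a direction in RGB space, such that the colors along the vector rooted at the reference color satisfy the comparison.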
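Treating a comparative as a direction in RGB space, and scoring it by cosine similarity to the desired direction of change, can be sketched as below. The colors, the "predicted" direction, and the helper names are made up for illustration and are not taken from the paper's data or model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def move_along(reference, direction, step):
    """Return reference + step * direction, clipped to the [0, 255] RGB range."""
    return tuple(
        min(255.0, max(0.0, r + step * d)) for r, d in zip(reference, direction)
    )


reference = (120.0, 60.0, 200.0)
target = (180.0, 120.0, 230.0)  # hypothetical "lighter" version of the reference
desired = tuple(t - r for t, r in zip(target, reference))
predicted = (1.0, 1.0, 1.0)     # hypothetical model output: brighten every channel

print(cosine_similarity(predicted, desired))
print(move_along(reference, predicted, 40.0))
```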

SIGIR 2018

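SIGIR (International ACM SIGIR Conference on Research and Development in Information Retrieval) is the top international conference for presenting new techniques and results in information retrieval. It was first held in 1978 and is sponsored by ACM.

In 2018, SIGIR received 409 submissions and accepted 86, an acceptance rate of about 21%.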

《Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems》

Rocío Cañamares, Pablo Castells

【Abstract】The use of IR methodology in the evaluation of recommender systems has become common practice in recent years. IR metrics have been found however to be strongly biased towards rewarding algorithms that recommend popular items, the same bias that state of the art recommendation algorithms display. Recent research has confirmed and measured such biases, and proposed methods to avoid them. The fundamental question remains open though whether popularity is really a bias we should avoid or not; whether it could be a useful and reliable signal in recommendation, or it may be unfairly rewarded by the experimental biases. We address this question at a formal level by identifying and modeling the conditions that can determine the answer, in terms of dependencies between key random variables, involving item rating, discovery and relevance. We find conditions that guarantee popularity to be effective or quite the opposite, and for the measured metric values to reflect a true effectiveness, or qualitatively deviate from it. We exemplify and confirm the theoretical findings with empirical results. We build a crowdsourced dataset devoid of the usual biases displayed by common publicly available data, in which we illustrate contradictions between the accuracy that would be measured in a common biased offline experimental setting, and the actual accuracy that can be measured with unbiased observations.

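【Summary】Using IR methodology to evaluate recommender systems has become common practice in recent years. IR metrics, however, have been found to be strongly biased toward rewarding algorithms that recommend popular items, the same bias that state-of-the-art recommendation algorithms display. Recent research has confirmed and measured these biases and proposed ways to avoid them. A fundamental question remains open: is popularity really a bias to avoid, could it be a useful and reliable signal for recommendation, or is it unfairly rewarded by experimental biases? The authors address the question formally by identifying and modeling the conditions that determine the answer, expressed as dependencies between key random variables involving item rating, discovery and relevance. They find conditions that guarantee popularity to be effective, or quite the opposite, and conditions under which measured metric values reflect true effectiveness or deviate from it qualitatively. Empirical results confirm the theoretical findings. The authors also build a crowdsourced dataset free of the usual biases in common public data, and use it to illustrate the contradictions between the accuracy measured in a typical biased offline experiment and the actual accuracy measured with unbiased observations.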

SIGKDD 2018

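The ACM SIGKDD international conference is the top annual meeting in data mining research, organized by ACM's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Most KDD topics are interdisciplinary and widely applicable, and the conference attracts experts and scholars from statistics, machine learning, databases, the World Wide Web, bioinformatics, multimedia, natural language processing, human-computer interaction, social network computing, high-performance computing, big data mining, and many other fields.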

Research Track Best Paper

《Adversarial attacks on classification models for Graphs》

Daniel Zügner, Amir Akbarnejad, Stephan Günnemann

【Abstract】Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, currently there is no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model. We generate adversarial perturbations targeting the node’s features and the graph structure, thus taking the dependencies between instances into account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain we propose an efficient algorithm Nettack exploiting incremental computations. Our experimental study shows that accuracy of node classification significantly drops even when performing only few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful even when only limited knowledge about the graph is given.

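【Summary】Deep learning models for graphs perform strongly on node classification. Despite their wide use, their robustness to adversarial attacks has not been studied, even though adversaries are common in the domains where these models are likely to be deployed, such as the web. Can deep learning models for graphs be fooled easily? The paper presents the first study of adversarial attacks on attributed graphs, focusing on models that exploit graph convolutions. Besides test-time attacks, it tackles the more challenging class of poisoning (causative) attacks, which target the training phase of a machine learning model. The authors generate adversarial perturbations aimed at node features and graph structure, thereby taking the dependencies between instances into account, and they keep the perturbations unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain, they propose Nettack, an efficient algorithm that exploits incremental computations. Experiments show that node classification accuracy drops significantly even after only a few perturbations. The attacks are also transferable: they generalize to other state-of-the-art node classification models and to unsupervised approaches, and they succeed even when only limited knowledge about the graph is available.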

Research Track Best Student Paper

《XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music》

Hongyuan Zhu, Qi Liu, Nicholas Jing Yuan, Chuan Qin, Jiawei Li, Kun Zhang, Guang Zhou, Furu Wei, Yuanchun Xu, Enhong Chen

【Abstract】With the development of knowledge of music composition and the recent increase in demand, an increasing number of companies and research institutes have begun to study the automatic generation of music. However, previous models have limitations when applying to song generation, which requires both the melody and arrangement. Besides, many critical factors related to the quality of a song such as chord progression and rhythm patterns are not well addressed. In particular, the problem of how to ensure the harmony of multi-track music is still underexplored. To this end, we present a focused study on pop music generation, in which we take both chord and rhythm influence of melody generation and the harmony of music arrangement into consideration. We propose an end-to-end melody and arrangement generation framework, called XiaoIce Band, which generates a melody track with several accompanying tracks played by several types of instruments. Specifically, we devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melody with chord progressions. Then, we propose a Multi-Instrument Co-Arrangement Model (MICA) using multi-task learning for multi-track music arrangement. Finally, we conduct extensive experiments on a real-world dataset, where the results demonstrate the effectiveness of XiaoIce Band.

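【Summary】As knowledge of music composition has developed and demand has grown, more and more companies and research institutes have begun to study automatic music generation. Earlier models, however, are limited when applied to song generation, which requires both melody and arrangement. Many factors critical to song quality, such as chord progression and rhythm patterns, are also not well addressed, and how to ensure harmony in multi-track music remains underexplored. The paper therefore presents a focused study of pop music generation that considers both the influence of chords and rhythm on melody generation and the harmony of the arrangement. The authors propose XiaoIce Band, an end-to-end melody and arrangement generation framework that produces a melody track together with several accompanying tracks played by different types of instruments. Specifically, they devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melody with chord progressions, and a Multi-Instrument Co-Arrangement Model (MICA) that uses multi-task learning for multi-track arrangement. Extensive experiments on a real-world dataset demonstrate the effectiveness of XiaoIce Band.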

《Real-time Personalization using Embeddings for Search Ranking at Airbnb》

Mihajlo Grbovic, Haibin Cheng

【Abstract】Search Ranking and Recommendations are fundamental problems of crucial interest to major Internet companies, including web search engines, content publishing websites and marketplaces. However, despite sharing some common characteristics a one-size-fits-all solution does not exist in this space. Given a large difference in content that needs to be ranked, personalized and recommended, each marketplace has a somewhat unique challenge. Correspondingly, at Airbnb, a short-term rental marketplace, search and recommendation problems are quite unique, being a two-sided marketplace in which one needs to optimize for host and guest preferences, in a world where a user rarely consumes the same item twice and one listing can accept only one guest for a certain set of dates. In this paper we describe Listing and User Embedding techniques we developed and deployed for purposes of Real-time Personalization in Search Ranking and Similar Listing Recommendations, two channels that drive 99% of conversions. The embedding models were specifically tailored for Airbnb marketplace, and are able to capture guest’s short-term and long-term interests, delivering effective home listing recommendations. We conducted rigorous offline testing of the embedding models, followed by successful online tests before fully deploying them into production.

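【Summary】Search ranking and recommendation are fundamental problems of crucial interest to major Internet companies, including web search engines, content publishing websites and marketplaces. Although these settings share some characteristics, no one-size-fits-all solution exists. Because the content to be ranked, personalized and recommended differs so much, each marketplace faces a somewhat unique challenge. At Airbnb, a short-term rental marketplace, the search and recommendation problems are quite unique: it is a two-sided marketplace that must optimize for both host and guest preferences, in a world where a user rarely consumes the same item twice and a listing can accept only one guest for a given set of dates. The paper describes the listing and user embedding techniques the authors developed and deployed for real-time personalization in Search Ranking and Similar Listing Recommendations, two channels that drive 99% of conversions. The embedding models were tailored specifically to the Airbnb marketplace and capture guests' short-term and long-term interests, delivering effective home listing recommendations. The models went through rigorous offline testing and then successful online tests before being fully deployed to production.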

《ActiveRemediation: The Search for Lead Pipes in Flint, Michigan》

Jacob Abernethy, Alex Chojnacki, Arya Farahi, Eric Schwartz, Jared Webb

【Abstract】We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents’ drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over \$125 million to replace water service lines, the pipes connecting each home to the water system. In the absence of accurate records, and with the high cost of determining buried pipe materials, we put forth a number of predictive and procedural tools to aid in the search and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased spending on infrastructure development by the federal government, we explore how our approach generalizes beyond Flint to other municipalities nationwide.

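【Summary】The paper details the authors' ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated lead levels were detected in residents' drinking water, followed by a rise in blood lead levels among children in the area, the state and federal governments directed over $125 million to replace water service lines, the pipes that connect each home to the water system. With no accurate records and a high cost of determining the material of buried pipes, the authors put forward a number of predictive and procedural tools to aid the search for and removal of lead infrastructure. Alongside these statistical and machine learning approaches, they describe their interactions with government officials in recommending homes for inspection and replacement, with a focus on a statistical model that adapts to incoming information. Finally, in light of discussions about increased federal spending on infrastructure, they explore how the approach generalizes beyond Flint to other municipalities nationwide.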

ICLR 2018

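ICLR, the International Conference on Learning Representations, held its first edition only in 2013. Although 2018 was just its sixth edition, the annual conference is already widely recognized by researchers as a top venue for deep learning and is sometimes called the "uncrowned king" of deep learning conferences.

ICLR was initiated by Yann LeCun, Yoshua Bengio and other leading researchers. The conference pioneered an open review process, but in 2018 it dropped open review in favor of double-blind review.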

《On the convergence of Adam and Beyond》

Sashank J. Reddi, Satyen Kale & Sanjiv Kumar

《Spherical CNNs》

Taco S. Cohen, Mario Geiger, Jonas Köhler, Max Welling

【Abstract】Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective.

In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

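【Summary】Convolutional neural networks (CNNs) have become the method of choice for learning problems involving 2D planar images. A number of problems of recent interest, however, call for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots and autonomous cars, molecular regression problems, and global weather and climate modelling. Naively applying a convolutional network to a planar projection of the spherical signal is bound to fail, because the space-varying distortions introduced by the projection make translational weight sharing ineffective. The paper introduces the building blocks for constructing spherical CNNs. It proposes a definition of the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows it to be computed efficiently with a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. The authors demonstrate the computational efficiency, numerical accuracy and effectiveness of spherical CNNs on 3D model recognition and atomization energy regression.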

《Continuous adaptation via meta-learning in nonstationary and competitive environments》

Maruan Al-Shedivat, Trapit Bansal, Yura Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel

【Abstract】The ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multiagent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.

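【Summary】The ability to learn continuously and adapt from limited experience in nonstationary environments is an important milestone on the path toward general intelligence. The paper casts continuous adaptation into the learning-to-learn framework and develops a simple gradient-based meta-learning algorithm suited to dynamically changing and adversarial scenarios. The authors also design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation. They show that meta-learning adapts significantly more efficiently than reactive baselines in the few-shot regime. Experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.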

COLT 2018

《Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations》

Yuanzhi Li, Tengyu Ma and Hongyang Zhang.

【Abstract】We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.

Concretely, we show that given $$\widetilde{O}(dr^{2})$$ random linear measurements of a rank-r positive semidefinite matrix $$X^{\star}$$, we can recover $$X^{\star}$$ by parameterizing it as $$UU^{\top}$$ with $$U \in \mathbb{R}^{d \times d}$$ and minimizing the squared loss, even if $$r \ll d$$. We prove that starting from a small initialization, gradient descent recovers $$X^{\star}$$ in approximately $$\widetilde{O}(\sqrt{r})$$ iterations. The results solve the conjecture of Gunasekar et al. [16] under the restricted isometry property.

The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.

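【Summary】The paper shows that gradient descent provides an implicit regularization effect when training over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.

The recovery guarantee holds even when $$r \ll d$$. Starting from a small initialization, gradient descent recovers $$X^{\star}$$ in approximately $$\widetilde{O}(\sqrt{r})$$ iterations. This result resolves the conjecture of Gunasekar et al. under the restricted isometry property.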

《Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure》

Matthew Brennan, Guy Bresler and Wasim Huleihel.

【Abstract】Recently, research in unsupervised learning has gravitated towards exploring statistical-computational gaps induced by sparsity. A line of work initiated in [BR13a] has aimed to explain these gaps through reductions from conjecturally hard problems in complexity theory. However, the delicate nature of average-case reductions has limited the development of techniques and often led to weaker hardness results that only apply to algorithms robust to different noise distributions or that do not need to know the parameters of the problem. We introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. Our new lower bounds include:

• Planted Independent Set: We show tight lower bounds for detecting a planted independent set of size k in a sparse Erdős-Rényi graph of size n with edge density $$\widetilde{\Theta}(n^{-\alpha})$$.

• Planted Dense Subgraph: If p > q are the edge densities inside and outside of the community, we show the first lower bounds for the general regime $$q = \widetilde{\Theta}(n^{-\alpha})$$ and $$p - q = \widetilde{\Theta}(n^{-\gamma})$$ where $$\gamma \ge \alpha$$, matching the lower bounds predicted in [CX16]. Our lower bounds apply to a deterministic community size k, resolving a question raised in [HWX15].

• Biclustering: We show lower bounds for the canonical simple hypothesis testing formulation of Gaussian biclustering, slightly strengthening the result in [MW15b].

• Sparse Rank-1 Submatrix: We show that detection in the sparse spiked Wigner model is often harder than biclustering, and are able to obtain tight lower bounds for these two problems with different reductions from planted clique.

• Sparse PCA: We give a reduction between sparse rank-1 submatrix and sparse PCA to obtain tight lower bounds in the less sparse regime $$k \gg \sqrt{n}$$, when the spectral algorithm is optimal over the natural SDP. We give an alternate reduction recovering the lower bounds of [BR13a, GMZ17] in the simple hypothesis testing variant of sparse PCA. We also observe a subtlety in the complexity of sparse PCA that arises when the planted vector is biased.

• Subgraph Stochastic Block Model: We introduce a model where two small communities are planted in an Erdős-Rényi graph of the same average edge density and give tight lower bounds yielding different hard regimes than planted dense subgraph.

Our results demonstrate that, despite the delicate nature of average-case reductions, using natural problems as intermediates can often be beneficial, as is the case in worst-case complexity. Our main technical contribution is to introduce a set of techniques for average-case reductions that: (1) maintain the level of signal in an instance of a problem; (2) alter its planted structure; and (3) map two initial high-dimensional distributions simultaneously to two target distributions approximately under total variation. We also give algorithms matching our lower bounds and identify the information-theoretic limits of the models we consider.

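【Summary】Recently, research in unsupervised learning has gravitated toward exploring the statistical-computational gaps induced by sparsity. The line of work initiated in [BR13a] aims to explain these gaps through reductions from conjecturally hard problems in complexity theory.

Planted independent set: tight lower bounds for detecting a planted independent set of size k in a sparse Erdős-Rényi graph of size n with edge density $$\widetilde{\Theta}(n^{-\alpha})$$.

Planted dense subgraph: if p > q are the edge densities inside and outside the community, the paper gives the first lower bounds for the general regime $$q = \widetilde{\Theta}(n^{-\alpha})$$ and $$p - q = \widetilde{\Theta}(n^{-\gamma})$$ with $$\gamma \ge \alpha$$, matching the lower bounds predicted in [CX16]. The bounds apply to a deterministic community size k, resolving a question raised in [HWX15].

Biclustering: lower bounds for the canonical simple hypothesis testing formulation of Gaussian biclustering, slightly strengthening the result in [MW15b].

Sparse Rank-1 Submatrix: detection in the sparse spiked Wigner model is often harder than biclustering, and tight lower bounds are obtained for both problems.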

《Logistic Regression: The Importance of Being Improper》

Dylan Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri and Karthik Sridharan.

【Abstract】Learning linear predictors with the logistic loss—both in stochastic and online settings—is a fundamental task in machine learning and statistics, with direct connections to classification and boosting. Existing “fast rates” for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. Starting with the simple observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012) when improper learning is allowed. This improvement is obtained both in the online setting and, with some extra work, in the batch statistical setting with high probability. We also show that the improved dependence on predictor norm is near-optimal.

Leveraging this improved dependency on the predictor norm yields the following applications: (a) we give algorithms for online bandit multiclass learning with the logistic loss with an $$\widetilde{O}(\sqrt{n})$$ relative mistake bound across essentially all parameter ranges, thus providing a solution to the COLT 2009 open problem of Abernethy and Rakhlin (2009), and (b) we give an adaptive algorithm for online multiclass boosting with optimal sample complexity, thus partially resolving an open problem of Beygelzimer et al. (2015) and Jung et al. (2017). Finally, we give information-theoretic bounds on the optimal rates for improper logistic regression with general function classes, thereby characterizing the extent to which our improvement for linear classes extends to other parametric and even nonparametric settings.

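【Paper Summary】Whether in the stochastic or the online setting, learning linear predictors with the logistic loss is a fundamental task in machine learning and statistics, directly connected to classification and boosting. Existing "fast rates" for these settings depend exponentially on the predictor norm, and Hazan et al. (2014) showed that this cannot be improved. Starting from the observation that the logistic loss is 1-mixable, the paper designs a new efficient improper learning algorithm for online logistic regression that circumvents this lower bound, achieving a regret bound with a doubly-exponential improvement in its dependence on the predictor norm. This gives a positive answer to a variant of the COLT 2012 open problem. The improvement holds in the online setting, and the improved dependence on the predictor norm is also shown to be near-optimal.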
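The 1-mixability mentioned in the abstract can be checked numerically. The sketch below is only an illustration of that property, not the paper's algorithm: the expert scores and weights are made-up values, and all function names are hypothetical. It shows that averaging the experts' predicted probabilities gives a single prediction whose logistic loss equals the mix loss (with η = 1) for both possible labels. The averaged prediction generally does not correspond to any single linear predictor across inputs, which is why such a learner is called improper.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability for the label +1."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(z, y):
    """Logistic loss log(1 + exp(-y * z)) for a score z and a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * z))


def mix_loss(scores, weights, y, eta=1.0):
    """Mix loss: -(1 / eta) * log(sum_i w_i * exp(-eta * loss_i))."""
    total = sum(
        w * math.exp(-eta * logistic_loss(z, y)) for z, w in zip(scores, weights)
    )
    return -math.log(total) / eta


def averaged_score(scores, weights):
    """Score whose sigmoid equals the weighted average of the experts' probabilities."""
    p = sum(w * sigmoid(z) for z, w in zip(scores, weights))
    return math.log(p / (1.0 - p))


# Made-up expert scores and a made-up weight distribution (assumptions for the demo).
scores = [-2.0, 0.5, 3.0]
weights = [0.2, 0.5, 0.3]

z_mix = averaged_score(scores, weights)
# Difference between the loss of the averaged prediction and the mix loss, per label.
gaps = {y: logistic_loss(z_mix, y) - mix_loss(scores, weights, y) for y in (-1, 1)}
print(gaps)  # both gaps are zero up to floating-point error
```

The equality holds because exp(-loss) is exactly the probability the expert assigns to the observed label, so the weighted sum inside the mix loss is the probability assigned by the averaged prediction.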

COLING 2018

Best error analysis

《SGM: Sequence Generation Model for Multi-label Classification》

Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang

【Abstract】Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification in that the labels tend to be correlated. Existing methods tend to ignore the correlations between labels. Besides, different parts of the text can contribute differently for predicting different labels, which is not considered by existing models. In this paper, we propose to view the multi-label classification task as a sequence generation problem, and apply a sequence generation model with a novel decoder structure to solve it. Extensive experimental results show that our proposed methods outperform previous work by a substantial margin. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.

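【Paper Summary】Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification because labels tend to be correlated. Existing methods tend to ignore these correlations. In addition, different parts of a text contribute differently to predicting different labels, which existing models do not take into account. The paper proposes treating multi-label classification as a sequence generation problem and solves it with a sequence generation model that has a novel decoder structure. Extensive experiments show that the proposed methods outperform previous models by a large margin. Further analysis shows that the approach not only captures correlations between labels but also automatically selects the most informative words when predicting different labels.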
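To make the "classification as sequence generation" framing concrete, here is a minimal data-preparation sketch. It is not the authors' code: the special tokens, the function names, and the choice to order labels by descending training frequency are assumptions made for illustration, and the toy label sets are invented. The idea is only that each unordered label set becomes a target sequence a decoder could emit one label at a time, and that a generated sequence maps back to a label set.

```python
from collections import Counter

BOS, EOS = "<bos>", "<eos>"  # hypothetical start and end markers


def build_label_order(label_sets):
    """Rank labels by descending frequency (assumed ordering), ties broken alphabetically."""
    counts = Counter(label for labels in label_sets for label in labels)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {label: rank for rank, (label, _) in enumerate(ranked)}


def to_target_sequence(labels, order):
    """Turn an unordered label set into a decoder target sequence."""
    return [BOS] + sorted(labels, key=order.__getitem__) + [EOS]


def to_label_set(sequence):
    """Recover the predicted label set from a generated sequence, stopping at EOS."""
    labels = set()
    for token in sequence:
        if token == EOS:
            break
        if token != BOS:
            labels.add(token)
    return labels


# Invented toy training labels, one set per document.
label_sets = [
    {"sports", "politics"},
    {"sports"},
    {"politics", "economy"},
    {"sports", "economy"},
]
order = build_label_order(label_sets)
target = to_target_sequence({"politics", "sports"}, order)
print(target)
print(to_label_set(target))
```

Because each label is generated conditioned on the labels already emitted, a decoder trained on such sequences has a direct way to model label correlations, which is the property the abstract emphasizes.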

Best linguistic analysis

《Distinguishing affixoid formations from compounds》

Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert

【Abstract】We study German affixoids, a type of morpheme in between affixes and free stems. Several properties have been associated with them – increased productivity; a bleached semantics, which is often evaluative and/or intensifying and thus of relevance to sentiment analysis; and the existence of a free morpheme counterpart – but not been validated empirically. In experiments on a new data set that we make available, we put these key assumptions from the morphological literature to the test and show that despite the fact that affixoids generate many low-frequency formations, we can classify these as affixoid or non-affixoid instances with a best F1-score of 74%.

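【Paper Summary】The paper studies German affixoids, a type of morpheme in between affixes and free stems. Several properties have been associated with them: increased productivity; bleached semantics, which is often evaluative and/or intensifying and therefore relevant to sentiment analysis; and the existence of a free-morpheme counterpart. None of these had been validated empirically. In experiments on a new dataset, the authors test these key assumptions from the morphological literature and show that, although affixoids generate many low-frequency formations, these can be classified as affixoid or non-affixoid instances with a best F1-score of 74%.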

Best NLP engineering experiment

《Authorless Topic Models: Biasing Models Away from Known Structure》

Laure Thompson and David Mimno

【Abstract】Most previous work in unsupervised semantic modeling in the presence of metadata has assumed that our goal is to make latent dimensions more correlated with metadata, but in practice the exact opposite is often true. Some users want topic models that highlight differences between, for example, authors, but others seek more subtle connections across authors. We introduce three metrics for identifying topics that are highly correlated with metadata, and demonstrate that this problem affects between 30 and 50% of the topics in models trained on two real-world collections, regardless of the size of the model. We find that we can predict which words cause this phenomenon and that by selectively subsampling these words we dramatically reduce topic-metadata correlation, improve topic stability, and maintain or even improve model quality.

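【Paper Summary】Most previous work on unsupervised semantic modeling in the presence of metadata has assumed that the goal is to make latent dimensions more correlated with metadata, but in practice the opposite is often true: some users want topic models that highlight differences between, for example, authors, while others look for subtler connections across authors. The paper introduces three metrics for identifying topics that are highly correlated with metadata and shows that, in models trained on two real-world collections, this problem affects 30% to 50% of topics regardless of model size. The authors find that they can predict which words cause the phenomenon, and that selectively subsampling those words substantially reduces topic-metadata correlation, improves topic stability, and maintains or even improves model quality.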

Best position paper

《Arguments and Adjuncts in Universal Dependencies》

【Abstract】The aim of this paper is to argue for a coherent Universal Dependencies approach to the core vs. non-core distinction. We demonstrate inconsistencies in the current version 2 of UD in this respect – mostly resulting from the preservation of the argument–adjunct dichotomy despite the declared avoidance of this distinction – and propose a relatively conservative modification of UD that is free from these problems.

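【Paper Summary】The paper argues for a coherent Universal Dependencies (UD) approach to the core vs. non-core distinction. It demonstrates inconsistencies in the current version 2 of UD in this respect, mostly resulting from the preservation of the argument-adjunct dichotomy despite the declared avoidance of this distinction, and proposes a relatively conservative modification of UD that is free from these problems.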

Best reproduction paper

《Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering》

Wuwei Lan and Wei Xu

【Abstract】In this paper, we analyze several neural network designs (and their variations) for sentence pair modeling and compare their performance extensively across eight datasets, including paraphrase identification, semantic textual similarity, natural language inference, and question answering tasks. Although most of these models have claimed state-of-the-art performance, the original papers often reported on only one or two selected datasets. We provide a systematic study and show that (i) encoding contextual information by LSTM and inter-sentence interactions are critical, (ii) Tree-LSTM does not help as much as previously claimed but surprisingly improves performance on Twitter datasets, (iii) the Enhanced Sequential Inference Model (Chen et al., 2017) is the best so far for larger datasets, while the Pairwise Word Interaction Model (He and Lin, 2016) achieves the best performance when less data is available. We release our implementations as an open-source toolkit.

Best resource paper

《AnlamVer: Semantic Model Evaluation Dataset for Turkish – Word Similarity and Relatedness》

Gökhan Ercan and Olcay Taner Yıldız

【Abstract】In this paper, we present AnlamVer, which is a semantic model evaluation dataset for Turkish designed to evaluate word similarity and word relatedness tasks while discriminating those two relations from each other. Our dataset consists of 500 word-pairs annotated by 12 human subjects, and each pair has two distinct scores for similarity and relatedness. Word-pairs are selected to enable the evaluation of distributional semantic models by multiple attributes of words and word-pair relations such as frequency, morphology, concreteness and relation types (e.g., synonymy, antonymy). Our aim is to provide insights to semantic model researchers by evaluating models in multiple attributes. We balance dataset word-pairs by their frequencies to evaluate the robustness of semantic models concerning out-of-vocabulary and rare words problems, which are caused by the rich derivational and inflectional morphology of the Turkish language.

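【Paper Summary】The paper presents AnlamVer, a semantic model evaluation dataset for Turkish that can be used to evaluate word similarity and word relatedness tasks. The dataset consists of 500 word pairs annotated by 12 human subjects, and each pair has two distinct scores, one for similarity and one for relatedness. Word pairs are selected so that distributional semantic models can be evaluated by multiple attributes of words and word-pair relations, such as frequency, morphology, concreteness, and relation type (e.g., synonymy, antonymy). The aim is to give semantic model researchers insight by evaluating models across multiple attributes. Word pairs are balanced by frequency in order to evaluate the robustness of semantic models to the out-of-vocabulary and rare-word problems caused by the rich derivational and inflectional morphology of Turkish.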
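A dataset with separate similarity and relatedness scores per word pair is typically used by correlating a model's scores with each set of human scores. The sketch below assumes Spearman rank correlation and cosine similarity over word vectors, a common practice that the abstract itself does not specify. The word pairs, human scores, and vectors are invented placeholders, not AnlamVer data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented placeholder data: (word1, word2, human similarity, human relatedness).
pairs = [
    ("car", "automobile", 9.5, 9.8),
    ("car", "road", 3.0, 8.5),
    ("cup", "coffee", 2.5, 8.0),
    ("cup", "cloud", 0.5, 0.8),
]
# Invented 2-d vectors standing in for a distributional semantic model.
vectors = {
    "car": [1.0, 0.1], "automobile": [0.9, 0.2], "road": [0.6, 0.6],
    "cup": [0.1, 1.0], "coffee": [0.4, 0.9], "cloud": [-0.8, 0.3],
}

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _, _ in pairs]
print("similarity  rho:", spearman(model_scores, [p[2] for p in pairs]))
print("relatedness rho:", spearman(model_scores, [p[3] for p in pairs]))
```

Reporting the two correlations separately is what lets a dataset of this kind show whether a model captures similarity, relatedness, or both.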

Best survey paper

《A Survey on Open Information Extraction》

Christina Niklaus, Matthias Cetto, André Freitas and Siegfried Handschuh

【Abstract】We provide a detailed overview of the various approaches that were proposed to date to solve the task of Open Information Extraction. We present the major challenges that such systems face, show the evolution of the suggested approaches over time and depict the specific issues they address. In addition, we provide a critique of the commonly applied evaluation procedures for assessing the performance of Open IE systems and highlight some directions for future work.

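【Paper Summary】The paper gives a detailed overview of the approaches proposed to date for the task of Open Information Extraction. It presents the major challenges such systems face, shows how the proposed approaches have evolved over time, and describes the specific issues they address. It also critiques the evaluation procedures commonly used to assess Open IE systems and highlights some directions for future work.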

Most reproducible

《Design Challenges and Misconceptions in Neural Sequence Labeling》

Jie Yang, Shuailong Liang and Yue Zhang

【Abstract】We investigate the design challenges of constructing effective and efficient neural sequence labeling systems, by reproducing twelve neural sequence labeling models, which include most of the state-of-the-art structures, and conduct a systematic model comparison on three benchmarks (i.e. NER, Chunking, and POS tagging). Misconceptions and inconsistent conclusions in existing literature are examined and clarified under statistical experiments. In the comparison and analysis process, we reach several practical conclusions which can be useful to practitioners.

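【Paper Summary】The paper investigates the design challenges of building effective and efficient neural sequence labeling systems by reproducing twelve neural sequence labeling models, which cover most state-of-the-art structures, and comparing them systematically on three benchmarks (NER, chunking, and POS tagging). Misconceptions and inconsistent conclusions in the existing literature are examined and clarified through statistical experiments. The comparison and analysis yield several practical conclusions that practitioners may find useful.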
