Stochastic Optimization for Regularized Wasserstein Estimators

Optimal transport is a foundational problem in optimization that allows one to
compare probability distributions while taking into account geometric aspects.
Its optimal objective value, the Wasserstein distance, provides an important
loss between...

The exact geostrophic streamfunction for neutral surfaces

McDougall (1989) proved that neutral surfaces possess an exact geostrophic
streamfunction, but its form has remained unknown. The exact geostrophic
streamfunction for neutral surfaces is derived here. It involves a path
integral of the specific...

Neutral surface topology

Neutral surfaces, along which most of the mixing in the ocean occurs, are
notoriously difficult objects: they do not exist as well-defined surfaces, and
as such can only be approximated. In a hypothetical ocean where neutral
surfaces are...

Reward-rational (implicit) choice: A unifying formalism for reward learning

It is often difficult to hand-specify what the correct reward function is for
a task, so researchers have instead aimed to learn reward functions from human
behavior or feedback. The types of behavior interpreted as evidence of the
reward function...

Neural Network Compression Framework for fast model inference

In this work we present a new framework for neural network compression with
fine-tuning, which we call the Neural Network Compression Framework (NNCF). It
leverages recent advances in various network compression methods and implements
some of them,...

A non-inferiority test for R-squared with random regressors

Determining the lack of association between an outcome variable and a number
of different explanatory variables is frequently necessary in order to
disregard a proposed model. This paper proposes a non-inferiority test for the
coefficient of...

The Geometry of Sign Gradient Descent

Sign-based optimization methods have become popular in machine learning due
to their favorable communication cost in distributed optimization and their
surprisingly good performance in neural network training. Furthermore, they are
closely connected...

Automatic Shortcut Removal for Self-Supervised Representation Learning

In self-supervised visual representation learning, a feature extractor is
trained on a "pretext task" for which labels can be generated cheaply. A
central challenge in this approach is that the feature extractor quickly learns
to exploit low-level...

On the Likelihood of Observing Extragalactic Civilizations: Predictions from the Self-Indication Assumption

Ambitious civilizations that expand for resources at an intergalactic scale
could be observable from a cosmological distance, but how likely is one to be
visible to us? The question comes down to estimating the appearance rate of
such things in the...

Using the Output Embedding to Improve Language Models

We study the topmost weight matrix of neural network language models. We show
that this matrix constitutes a valid word embedding. When training language
models, we recommend tying the input embedding and this output embedding. We
analyze the...

Neuromorphologicaly-preserving Volumetric data encoding using VQ-VAE

The increasing efficiency and compactness of deep learning architectures,
together with hardware improvements, have enabled the complex and
high-dimensional modelling of medical volumetric data at higher resolutions.
Recently, Vector-Quantised...

Learning with Differentiable Perturbed Optimizers

Machine learning pipelines often rely on optimization procedures to make
discrete decisions (e.g. sorting, picking closest neighbors, finding shortest
paths or optimal matchings). Although these discrete decisions are easily
computed in a forward...

Fast Differentiable Sorting and Ranking

The sorting operation is one of the most basic and commonly used building
blocks in computer programming. In machine learning, it is commonly used for
robust statistics. However, seen as a function, it is piecewise linear and as a
result includes...

No-Regret and Incentive-Compatible Online Learning

We study online learning settings in which experts act strategically to
maximize their influence on the learning algorithm's predictions by potentially
misreporting their beliefs about a sequence of binary events. Our goal is
twofold. First, we want...