Articles from 2019-12-31

56 new data science research articles were published on 2019-12-31. 20 discussed machine learning.

Bryan Whiting https://www.bryanwhiting.com
2020-01-02

Table of Contents


Breakdown of arXiv Publication Counts

Yesterday’s counts of submitted papers on www.arxiv.org grouped by primary subject. Click the links in the table to be re-directed to the abstracts below. The links under Subject will redirect you to abstracts with the primary subject (there can only be one primary subject on arXiv). The links under Category will redirect you to all publications yesterday with a given tag (primary or secondary).

Table 1: Number of articles by subject and primary category. Colored titles represent hyperlinks that take you below to abstracts. Key - Subject: Computer Science (5) means there were 5 articles with primary tag CS. Category: Machine Learning (cs.LG) N = 8 (16) means there were 8 primary articles with the (cs.LG) tag but 16 articles had it as a secondary tag, so there should be 24 in total. Click this link to be taken to all 24. Only select categories are highlighted because they are of particular interest to applied data scientists.
Subject Category N
Computer Science (26) Machine Learning (cs.LG) 8 (11)
Computer Vision and Pattern Recognition (cs.CV) 6 (6)
Artificial Intelligence (cs.AI) 6 (2)
Software Engineering (cs.SE) 2
Robotics (cs.RO) 1 (2)
Computation and Language (cs.CL) 1 (1)
Digital Libraries (cs.DL) 1
Emerging Technologies (cs.ET) 1
Statistics (11) Machine Learning (stat.ML) 4 (12)
Methodology (stat.ME) 3 (1)
Applications (stat.AP) 2 (2)
Computation (stat.CO) 2 (1)
Mathematics (6) Statistics Theory (math.ST) 4 (1)
Optimization and Control (math.OC) 1 (3)
Numerical Analysis (math.NA) 1 (1)
Elec. Eng. and Systems Science (4) Image and Video Processing (eess.IV) 4 (1)
Condensed Matter (3) Materials Science (cond-mat.mtrl-sci) 3 (1)
Physics (2) Computational Physics (physics.comp-ph) 1 (5)
Physics and Society (physics.soc-ph) 1 (1)
Quantitative Biology (2) Biomolecules (q-bio.BM) 1
Populations and Evolution (q-bio.PE) 1
Other (1) Earth and Planetary Astrophysics (astro-ph.EP) 1
Quantum Physics (1) Quantum Physics (quant-ph) 1

Articles for Statitstics, Machine Learning Econonmetrics, and Finance

This section contains all articles with any tag of stat.AP, stat.co, stat.ML, cs.LG, q-fin.ST, q-fin.EC, or econ-EM. Only the first two sentences are shown - click the links for more detail.

Applications (stat.AP): 4 new

Applications (stat.AP)
Statistical Models in Forensic Voice Comparison
Audio and Speech Processing, Sound, Applications. 5 authors. pdf
This chapter describes a number of signal-processing and statistical-modeling techniques that are commonly used to calculate likelihood ratios in human-supervised automatic approaches to forensic voice comparison. Techniques described include mel-frequency cepstral coefficients (MFCCs) feature extraction, Gaussian mixture model - universal background model (GMM-UBM) systems, i-vector - probabilistic linear discriminant analysis (i-vector PLDA) systems, deep neural network (DNN) based systems (including senone posterior i-vectors, bottleneck features, and embeddings / x-vectors), mismatch compensation, and score-to-likelihood-ratio conversion (aka calibration). …
Empirical validation of forensic-voice-comparison systems is also covered. The aim of the chapter is to bridge the gap between general introductions to forensic voice comparison and the highly technical automatic-speaker-recognition literature from which the signal-processing and statistical-modeling techniques are mostly drawn. Knowledge of the likelihood-ratio framework for the evaluation of forensic evidence is assumed. It is hoped that the material presented here will be of value to students of forensic voice comparison and to researchers interested in learning about statistical modeling techniques that could potentially also be applied to data from other branches of forensic science.
On Testing for Biases in Peer Review
Methodology, Statistics Theory, Applications, Statistics Theory. 3 authors. pdf
We consider the issue of biases in scholarly research, specifically, in peer review. There is a long standing debate on whether exposing author identities to reviewers induces biases against certain groups, and our focus is on designing tests to detect the presence of such biases. …
Our starting point is a remarkable recent work by Tomkins, Zhang and Heavlin which conducted a controlled, large-scale experiment to investigate existence of biases in the peer reviewing of the WSDM conference. We present two sets of results in this paper. The first set of results is negative, and pertains to the statistical tests and the experimental setup used in the work of Tomkins et al. We show that the test employed therein does not guarantee control over false alarm probability and under correlations between relevant variables coupled with any of the following conditions, with high probability, can declare a presence of bias when it is in fact absent: (a) measurement error, (b) model mismatch, (c) reviewer calibration. Moreover, we show that the setup of their experiment may itself inflate false alarm probability if (d) bidding is performed in non-blind manner or (e) popular reviewer assignment procedure is employed. Our second set of results is positive and is built around a novel approach to testing for biases that we propose. We present a general framework for testing for biases in (single vs. double blind) peer review. We then design hypothesis tests that under minimal assumptions guarantee control over false alarm probability and non-trivial power even under conditions (a)–(c) as well as propose an alternative experimental setup which mitigates issues (d) and (e). Finally, we show that no statistical test can improve over the non-parametric tests we consider in terms of the assumptions required to control for the false alarm probability.
Driver fatigue EEG signals detection by using robust univariate analysis
Methodology, Applications, Signal Processing. 3 authors. pdf
Driver fatigue is a major cause of traffic accidents and the electroencephalogram (EEG) is considered one of the most reliable predictors of fatigue. This paper proposes a novel, simple and fast method for driver fatigue detection that can be implemented in real-time systems by using a single-channel on the scalp. …
The method based on the robust univariate analysis of EEG signals is composed of two stages. First, the most significant channel from EEG raw is selected according to the maximum variance. In the second stage, this single channel will be used to detect the fatigue EEG signal by extracting four feature parameters. Two parameters estimated from the robust univariate analysis, namely mean and covariance, and two classical statistics parameters such as variance and covariance that help to tune the robust analysis. Next, an ensemble bagged decision trees classifier is used in order to discriminate fatigue signals from alert signals. The proposed algorithm is demonstrated on 24 EEG signals from the Jiangxi University of Technology database using only the most significant channel found, which is located in the left tempo-parietal region where spatial awareness and visual-spatial navigation are shared, in terms of 92.7% accuracy with 1.8 seconds of time delay.
Numerical Linear Algebra in Data Assimilation
Computation, Numerical Analysis, Applications, Numerical Analysis. 1 authors. pdf
Data assimilation is a method that combines observations (e.g. …
real world data) of a state of a system with model output for that system in order to improve the estimate of the state of the system and thereby the model output. The model is usually represented by a discretised partial differential equation. The data assimilation problem can be formulated as a large scale Bayesian inverse problem. Based on this interpretation we will derive the most important variational and sequential data assimilation approaches, in particular three-dimensional and four-dimensional variational data assimilation (3D-Var and 4D-Var) and the Kalman filter. We will then consider more advanced methods which are extensions of the Kalman filter and variational data assimilation and will pay particular attention to their advantages and disadvantages. The data assimilation problem usually results in a very large optimisation problem and/or a very large linear system to solve (due to inclusion of time and space dimensions). Therefore, the second part of this article aims to review advances and challenges, in particular from the numerical linear algebra perspective, within the various data assimilation approaches.

Machine Learning (stat.ML): 16 new

Machine Learning (stat.ML)
Leveraging Semi-Supervised Learning for Fairness using Neural Networks
Machine Learning, Artificial Intelligence, Machine Learning. 5 authors. pdf
There has been a growing concern about the fairness of decision-making systems based on machine learning. The shortage of labeled data has been always a challenging problem facing machine learning based systems. …
In such scenarios, semi-supervised learning has shown to be an effective way of exploiting unlabeled data to improve upon the performance of model. Notably, unlabeled data do not contain label information which itself can be a significant source of bias in training machine learning systems. This inspired us to tackle the challenge of fairness by formulating the problem in a semi-supervised framework. In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data to not just improve the performance but also improve the fairness of the decision-making process. The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
Volumetric Lung Nodule Segmentation using Adaptive ROI with Multi-View Residual Learning
Machine Learning, Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning. 5 authors. pdf
Accurate quantification of pulmonary nodules can greatly assist the early diagnosis of lung cancer, which can enhance patient survival possibilities. A number of nodule segmentation techniques have been proposed, however, all of the existing techniques rely on radiologist 3-D volume of interest (VOI) input or use the constant region of interest (ROI) and only investigate the presence of nodule voxels within the given VOI. …
Such approaches restrain the solutions to investigate the nodule presence outside the given VOI and also include the redundant structures into VOI, which may lead to inaccurate nodule segmentation. In this work, a novel semi-automated approach for 3-D segmentation of nodule in volumetric computerized tomography (CT) lung scans has been proposed. The proposed technique can be segregated into two stages, at the first stage, it takes a 2-D ROI containing the nodule as input and it performs patch-wise investigation along the axial axis with a novel adaptive ROI strategy. The adaptive ROI algorithm enables the solution to dynamically select the ROI for the surrounding slices to investigate the presence of nodule using deep residual U-Net architecture. The first stage provides the initial estimation of nodule which is further utilized to extract the VOI. At the second stage, the extracted VOI is further investigated along the coronal and sagittal axis with two different networks and finally, all the estimated masks are fed into the consensus module to produce the final volumetric segmentation of nodule. The proposed approach has been rigorously evaluated on the LIDC dataset, which is the largest publicly available dataset. The result suggests that the approach is significantly robust and accurate as compared to the previous state of the art techniques.
On the Role of Weight Sharing During Deep Option Learning
Machine Learning, Machine Learning. 5 authors. pdf
The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. …
However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting. We thus reconsider this assumption and consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update. It turns out that not assuming parameter independence challenges a belief in prior work that training the policy over options can be disentangled from the dynamics of the underlying options. In fact, learning can be sped up by focusing the policy over options on states where options are actually likely to terminate. We put our new algorithms to the test in application to sample efficient learning of Atari games, and demonstrate significantly improved stability and faster convergence when learning long options.
Schrödinger Bridge Samplers
Computation, Machine Learning. 4 authors. pdf
Consider a reference Markov process with initial distribution \(\pi_{0}\) and transition kernels \(\{M_{t}\}_{t\in[1:T]}\), for some \(T\in\mathbb{N}\). Assume that you are given distribution \(\pi_{T}\), which is not equal to the marginal distribution of the reference process at time \(T\). …
In this scenario, Schr"odinger addressed the problem of identifying the Markov process with initial distribution \(\pi_{0}\) and terminal distribution equal to \(\pi_{T}\) which is the closest to the reference process in terms of Kullback–Leibler divergence. This special case of the so-called Schr"odinger bridge problem can be solved using iterative proportional fitting, also known as the Sinkhorn algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schr"odinger bridge samplers, to approximate a target distribution \(\pi\) on \(\mathbb{R}^{d}\) and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time \(T\) becomes closer to \(\pi_T = \pi\), via regression-based approximations of the corresponding iterative proportional fitting recursion. We report preliminary experiments and make connections with other problems arising in the optimal transport, optimal control and physics literatures.
Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
Machine Learning, Optimization and Control, Machine Learning. 3 authors. pdf
Stochastic compositional optimization arises in many important machine learning tasks such as value function evaluation in reinforcement learning and portfolio management. The objective function is the composition of two expectations of stochastic functions, and is more challenging to optimize than vanilla stochastic optimization problems. …
In this paper, we investigate the stochastic compositional optimization in the general smooth non-convex setting. We employ a recently developed idea of to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: \(\mathcal{O}((n+m)^{1/2} \varepsilon^{-2})\) in the finite-sum case and \(\mathcal{O}(\varepsilon^{-3})\) in the online case. Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization, and is believed to be optimal. Our experiments validate the theoretical performance of our algorithm.
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Machine Learning, Machine Learning. 3 authors. pdf
Traditional landscape analysis of deep neural networks aims to show that no sub-optimal local minima exist in some appropriate sense. From this, one may be tempted to conclude that descent algorithms which escape saddle points will reach a good local minimum. …
However, basic optimization theory tell us that it is also possible for a descent algorithm to diverge to infinity if there are paths leading to infinity, along which the loss function decreases. It is not clear whether for non-linear neural networks there exists one setting that no bad local-min and no decreasing paths to infinity can be simultaneously achieved. In this paper, we give the first positive answer to this question. More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity. The key mathematical trick is to show that the set of regularizers which may be undesirable can be viewed as the image of a Lipschitz continuous mapping from a lower-dimensional Euclidean space to a higher-dimensional Euclidean space, and thus has zero measure.
Reward-Conditioned Policies
Machine Learning, Machine Learning. 3 authors. pdf
Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle, difficult to use and tune, and sensitive to seemingly innocuous implementation decisions. …
In contrast, imitation learning utilizes standard and well-understood supervised learning methods, but requires near-optimal expert data. Can we learn effective policies via supervised learning without demonstrations? The main idea that we explore in this work is that non-expert trajectories collected from sub-optimal policies can be viewed as optimal supervision, not for maximizing the reward, but for matching the reward of the given trajectory. By then conditioning the policy on the numerical value of the reward, we can obtain a policy that generalizes to larger returns. We show how such an approach can be derived as a principled method for policy search, discuss several variants, and compare the method experimentally to a variety of current reinforcement learning methods on standard benchmarks.
Robust Aggregation for Federated Learning
Machine Learning, Cryptography and Security, Machine Learning. 3 authors. pdf
We present a robust aggregation approach to make federated learning robust to settings when a fraction of the devices may be sending corrupted updates to the server. The proposed approach relies on a robust secure aggregation oracle based on the geometric median, which returns a robust aggregate using a constant number of calls to a regular non-robust secure average oracle. …
The robust aggregation oracle is privacy-preserving, similar to the secure average oracle it builds upon. We provide experimental results of the proposed approach with linear models and deep networks for two tasks in computer vision and natural language processing. The robust aggregation approach is agnostic to the level of corruption; it outperforms the classical aggregation approach in terms of robustness when the level of corruption is high, while being competitive in the regime of low corruption.
Representation Internal-Manipulation (RIM): A Neuro-Inspired Computational Theory of Consciousness
Neural and Evolutionary Computing, Machine Learning, Machine Learning, Neurons and Cognition, Artificial Intelligence. 2 authors. pdf
Many theories, based on neuroscientific and psychological empirical evidence and on computational concepts, have been elaborated to explain the emergence of consciousness in the central nervous system. These theories propose key fundamental mechanisms to explain consciousness, but they only partially connect such mechanisms to the possible functional and adaptive role of consciousness. …
Recently, some cognitive and neuroscientific models try to solve this gap by linking consciousness to various aspects of goal-directed behaviour, the pivotal cognitive process that allows mammals to flexibly act in challenging environments. Here we propose the Representation Internal-Manipulation (RIM) theory of consciousness, a theory that links the main elements of consciousness theories to components and functions of goal-directed behaviour, ascribing a central role for consciousness to the goal-directed manipulation of internal representations. This manipulation relies on four specific computational operations to perform the flexible internal adaptation of all key elements of goal-directed computation, from the representations of objects to those of goals, actions, and plans. Finally, we propose the concept of `manipulation agency’ relating the sense of agency to the internal manipulation of representations. This allows us to propose that the subjective experience of consciousness is associated to the human capacity to generate and control a simulated internal reality that is vividly perceived and felt through the same perceptual and emotional mechanisms used to tackle the external world.
On the Difference Between the Information Bottleneck and the Deep Information Bottleneck
Information Theory, Information Theory, Machine Learning, Machine Learning. 2 authors. pdf
Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. …
The two assumed properties of the data \(X\), \(Y\) and their latent representation \(T\) take the form of two Markov chains \(T-X-Y\) and \(X-T-Y\). Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions \(P(X,Y,T)\). We therefore show how to circumvent this limitation by optimising a lower bound for \(I(T;Y)\) for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice and of two terms measuring how much the former requirement \(T-X-Y\) is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model.
Model Inversion Networks for Model-Based Optimization
Machine Learning, Machine Learning. 2 authors. pdf
In this work, we aim to solve data-driven optimization problems, where the goal is to find an input that maximizes an unknown score function given access to a dataset of inputs with corresponding scores. When the inputs are high-dimensional and valid inputs constitute a small subset of this space (e. …
g., valid protein sequences or valid natural images), such model-based optimization problems become exceptionally difficult, since the optimizer must avoid out-of-distribution and invalid inputs. We propose to address such problem with model inversion networks (MINs), which learn an inverse mapping from scores to inputs. MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems. MINs can also handle both purely offline data sources and active data collection. We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.
Approximate Inference for Fully Bayesian Gaussian Process Regression
Machine Learning, Machine Learning. 2 authors. pdf
Learning in Gaussian Process models occurs through the adaptation of hyperparameters of the mean and the covariance function. The classical approach entails maximizing the marginal likelihood yielding fixed point estimates (an approach called or ML-II). …
An alternative learning procedure is to infer the posterior over hyperparameters in a hierarchical specification of GPs we call (GPR). This work considers two approximation schemes for the intractable hyperparameter posterior: 1) Hamiltonian Monte Carlo (HMC) yielding a sampling-based approximation and 2) Variational Inference (VI) where the posterior over hyperparameters is approximated by a factorized Gaussian (mean-field) or a full-rank Gaussian accounting for correlations between hyperparameters. We analyze the predictive performance for fully Bayesian GPR on a range of benchmark data sets.
Asymptotic Risk of Least Squares Minimum Norm Estimator under the Spike Covariance Model
Machine Learning, Machine Learning. 2 authors. pdf
One of the recent approaches to explain good performance of neural networks has focused on their ability to fit training data perfectly (interpolate) without overfitting. It has been shown that this is not unique to neural nets, and that it happens with simpler models such as kernel regression, too Belkin et al. …
(2018b); Tengyuan Liang (2018). Consequently, there has been quite a few works that give conditions for low risk or optimality of interpolating models, see for example Belkin et al. (2018a, 2019b). One of the simpler models where interpolation has been studied recently is least squares solution for linear regression. In this case, interpolation is only guaranteed to happen in high dimensional setting where the number of parameters exceeds number of samples; therefore, least squares solution is not necessarily unique. However, minimum norm solution is unique, can be written in closed form, and gradient descent starting at the origin converges to it (Hastie et al., 2019). This has, at least partially, motivated several works that study risk of minimum norm least squares estimator for linear regression. Continuing in a similar vein, we study the asymptotic risk of minimum norm least squares estimator when number of parameters \(d\) depends on \(n\), and \(\frac{d}{n} \rightarrow \infty\). In this high dimensional setting, to make inference feasible, it is usually assumed that true parameters or data have some underlying low dimensional structure such as sparsity, or vanishing eigenvalues of population covariance matrix. Here, we restrict ourselves to spike covariance matrices, where a fixed finite number of eigenvalues grow with \(n\) and are much larger than the rest of the eigenvalues, which are (asymptotically) in the same order. We show that in this setting the risk can vanish.
A Dynamic Sampling Adaptive-SGD Method for Machine Learning
Machine Learning, Optimization and Control, Machine Learning. 2 authors. pdf
We propose a stochastic optimization method for minimizing loss functions, which can be expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such directions, eliminating the need for the user to tune the learning rate. The proposed method exploits local curvature information and ensures that search directions are descent directions with high probability using an acute-angle test. …
The method is proved to have, under reasonable assumptions, a global linear rate of convergence on self-concordant functions with high probability. Numerical experiments show that this method is able to choose the best learning rates and compares favorably to fine-tuned SGD for training logistic regression and Deep Neural Networks (DNNs). We also propose an adaptive version of ADAM that eliminates the need to tune the base learning rate and compares favorably to fine-tuned ADAM for training DNNs.
Intrinsic motivations and open-ended learning
Machine Learning, Artificial Intelligence, Machine Learning. 1 authors. pdf
There is a growing interest and literature on intrinsic motivations and open-ended learning in both cognitive robotics and machine learning on one side, and in psychology and neuroscience on the other. This paper aims to review some relevant contributions from the two literature threads and to draw links between them. …
To this purpose, the paper starts by defining intrinsic motivations and by presenting a computationally-driven theoretical taxonomy of their different types. Then it presents relevant contributions from the psychological and neuroscientific literature related to intrinsic motivations, interpreting them based on the grid, and elucidates the mechanisms and functions they play in animals and humans. Endowed with such concepts and their biological underpinnings, the paper next presents a selection of models from cognitive robotics and machine learning that computationally operationalise the concepts of intrinsic motivations and links them to biology concepts. The contribution finally presents some of the open challenges of the field from both the psychological/neuroscientific and computational perspectives.
Some compact notations for concentration inequalities and user-friendly results
Statistics Theory, Machine Learning, Information Theory, Machine Learning, Statistics Theory, Information Theory, Probability. 1 authors. pdf
This paper presents compact notations for concentration inequalities and convenient results to streamline probabilistic analysis. The new expressions describe the typical sizes and tails of random variables, allowing for simple operations without heavy use of inessential constants. …
They bridge classical asymptotic notations and modern non-asymptotic tail bounds together. Examples of different kinds demonstrate their efficacy.

Machine Learning (cs.LG): 19 new

Machine Learning (cs.LG)
Leveraging Semi-Supervised Learning for Fairness using Neural Networks
Machine Learning, Artificial Intelligence, Machine Learning. 5 authors. pdf
There has been a growing concern about the fairness of decision-making systems based on machine learning. The shortage of labeled data has been always a challenging problem facing machine learning based systems. …
In such scenarios, semi-supervised learning has shown to be an effective way of exploiting unlabeled data to improve upon the performance of model. Notably, unlabeled data do not contain label information which itself can be a significant source of bias in training machine learning systems. This inspired us to tackle the challenge of fairness by formulating the problem in a semi-supervised framework. In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data to not just improve the performance but also improve the fairness of the decision-making process. The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
Side-Tuning: Network Adaptation via Additive Side Networks
Robotics, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing, Machine Learning. 5 authors. pdf
When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than start with a randomly initialized one – due to lacking enough training data, performing lifelong learning where the system has to learn a new task while being previously trained for other tasks, or wishing to encode priors in the network via preset weights. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. …
In this paper, we propose a straightforward alternative: Side-Tuning. Side-tuning adapts a pre-trained network by training a lightweight “side” network that is fused with the (unchanged) pre-trained network using summation. This simple method works as well as or better than existing solutions while it resolves some of the basic issues with fine-tuning, fixed features, and several other common baselines. In particular, side-tuning is less prone to overfitting when little training data is available, yields better results than using a fixed feature extractor, and does not suffer from catastrophic forgetting in lifelong learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including lifelong learning (iCIFAR, Taskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.
Volumetric Lung Nodule Segmentation using Adaptive ROI with Multi-View Residual Learning
Machine Learning, Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning. 5 authors. pdf
Accurate quantification of pulmonary nodules can greatly assist the early diagnosis of lung cancer, which can enhance patient survival possibilities. A number of nodule segmentation techniques have been proposed, however, all of the existing techniques rely on radiologist 3-D volume of interest (VOI) input or use the constant region of interest (ROI) and only investigate the presence of nodule voxels within the given VOI. …
Such approaches restrain the solutions to investigate the nodule presence outside the given VOI and also include the redundant structures into VOI, which may lead to inaccurate nodule segmentation. In this work, a novel semi-automated approach for 3-D segmentation of nodule in volumetric computerized tomography (CT) lung scans has been proposed. The proposed technique can be segregated into two stages, at the first stage, it takes a 2-D ROI containing the nodule as input and it performs patch-wise investigation along the axial axis with a novel adaptive ROI strategy. The adaptive ROI algorithm enables the solution to dynamically select the ROI for the surrounding slices to investigate the presence of nodule using deep residual U-Net architecture. The first stage provides the initial estimation of nodule which is further utilized to extract the VOI. At the second stage, the extracted VOI is further investigated along the coronal and sagittal axis with two different networks and finally, all the estimated masks are fed into the consensus module to produce the final volumetric segmentation of nodule. The proposed approach has been rigorously evaluated on the LIDC dataset, which is the largest publicly available dataset. The result suggests that the approach is significantly robust and accurate as compared to the previous state of the art techniques.
On the Role of Weight Sharing During Deep Option Learning
Machine Learning, Machine Learning. 5 authors. pdf
The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. …
However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting. We thus reconsider this assumption and consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update. It turns out that not assuming parameter independence challenges a belief in prior work that training the policy over options can be disentangled from the dynamics of the underlying options. In fact, learning can be sped up by focusing the policy over options on states where options are actually likely to terminate. We put our new algorithms to the test in application to sample efficient learning of Atari games, and demonstrate significantly improved stability and faster convergence when learning long options.
oLMpics – On what Language Model Pre-training Captures
Machine Learning, Artificial Intelligence, Computation and Language. 4 authors. pdf
Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. …
In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data. To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning), as well as comparing the learning curve of a fine-tuned LM to the learning curve of multiple controls, which paints a rich picture of the LM capabilities. Our main findings are that: (a) different LMs exhibit qualitatively different reasoning abilities, e.g., RoBERTa succeeds in reasoning tasks where BERT fails completely; (b) LMs do not reason in an abstract manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely. Our findings and infrastructure can help future work on designing new datasets, models and objective functions for pre-training.
Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
Machine Learning, Optimization and Control, Machine Learning. 3 authors. pdf
Stochastic compositional optimization arises in many important machine learning tasks such as value function evaluation in reinforcement learning and portfolio management. The objective function is the composition of two expectations of stochastic functions, and is more challenging to optimize than vanilla stochastic optimization problems. …
In this paper, we investigate the stochastic compositional optimization in the general smooth non-convex setting. We employ a recently developed idea of to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: \(\mathcal{O}((n+m)^{1/2} \varepsilon^{-2})\) in the finite-sum case and \(\mathcal{O}(\varepsilon^{-3})\) in the online case. Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization, and is believed to be optimal. Our experiments validate the theoretical performance of our algorithm.
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Machine Learning, Machine Learning. 3 authors. pdf
Traditional landscape analysis of deep neural networks aims to show that no sub-optimal local minima exist in some appropriate sense. From this, one may be tempted to conclude that descent algorithms which escape saddle points will reach a good local minimum. …
However, basic optimization theory tell us that it is also possible for a descent algorithm to diverge to infinity if there are paths leading to infinity, along which the loss function decreases. It is not clear whether for non-linear neural networks there exists one setting that no bad local-min and no decreasing paths to infinity can be simultaneously achieved. In this paper, we give the first positive answer to this question. More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity. The key mathematical trick is to show that the set of regularizers which may be undesirable can be viewed as the image of a Lipschitz continuous mapping from a lower-dimensional Euclidean space to a higher-dimensional Euclidean space, and thus has zero measure.
Reward-Conditioned Policies
Machine Learning, Machine Learning. 3 authors. pdf
Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle, difficult to use and tune, and sensitive to seemingly innocuous implementation decisions. …
In contrast, imitation learning utilizes standard and well-understood supervised learning methods, but requires near-optimal expert data. Can we learn effective policies via supervised learning without demonstrations? The main idea that we explore in this work is that non-expert trajectories collected from sub-optimal policies can be viewed as optimal supervision, not for maximizing the reward, but for matching the reward of the given trajectory. By then conditioning the policy on the numerical value of the reward, we can obtain a policy that generalizes to larger returns. We show how such an approach can be derived as a principled method for policy search, discuss several variants, and compare the method experimentally to a variety of current reinforcement learning methods on standard benchmarks.
Robust Aggregation for Federated Learning
Machine Learning, Cryptography and Security, Machine Learning. 3 authors. pdf
We present a robust aggregation approach to make federated learning robust to settings when a fraction of the devices may be sending corrupted updates to the server. The proposed approach relies on a robust secure aggregation oracle based on the geometric median, which returns a robust aggregate using a constant number of calls to a regular non-robust secure average oracle. …
The robust aggregation oracle is privacy-preserving, similar to the secure average oracle it builds upon. We provide experimental results of the proposed approach with linear models and deep networks for two tasks in computer vision and natural language processing. The robust aggregation approach is agnostic to the level of corruption; it outperforms the classical aggregation approach in terms of robustness when the level of corruption is high, while being competitive in the regime of low corruption.
Representation Internal-Manipulation (RIM): A Neuro-Inspired Computational Theory of Consciousness
Neural and Evolutionary Computing, Machine Learning, Machine Learning, Neurons and Cognition, Artificial Intelligence. 2 authors. pdf
Many theories, based on neuroscientific and psychological empirical evidence and on computational concepts, have been elaborated to explain the emergence of consciousness in the central nervous system. These theories propose key fundamental mechanisms to explain consciousness, but they only partially connect such mechanisms to the possible functional and adaptive role of consciousness. …
Recently, some cognitive and neuroscientific models try to solve this gap by linking consciousness to various aspects of goal-directed behaviour, the pivotal cognitive process that allows mammals to flexibly act in challenging environments. Here we propose the Representation Internal-Manipulation (RIM) theory of consciousness, a theory that links the main elements of consciousness theories to components and functions of goal-directed behaviour, ascribing a central role for consciousness to the goal-directed manipulation of internal representations. This manipulation relies on four specific computational operations to perform the flexible internal adaptation of all key elements of goal-directed computation, from the representations of objects to those of goals, actions, and plans. Finally, we propose the concept of `manipulation agency’ relating the sense of agency to the internal manipulation of representations. This allows us to propose that the subjective experience of consciousness is associated to the human capacity to generate and control a simulated internal reality that is vividly perceived and felt through the same perceptual and emotional mechanisms used to tackle the external world.
On the Difference Between the Information Bottleneck and the Deep Information Bottleneck
Information Theory, Information Theory, Machine Learning, Machine Learning. 2 authors. pdf
Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. …
The two assumed properties of the data \(X\), \(Y\) and their latent representation \(T\) take the form of two Markov chains \(T-X-Y\) and \(X-T-Y\). Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions \(P(X,Y,T)\). We therefore show how to circumvent this limitation by optimising a lower bound for \(I(T;Y)\) for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice and of two terms measuring how much the former requirement \(T-X-Y\) is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model.
Model Inversion Networks for Model-Based Optimization
Machine Learning, Machine Learning. 2 authors. pdf
In this work, we aim to solve data-driven optimization problems, where the goal is to find an input that maximizes an unknown score function given access to a dataset of inputs with corresponding scores. When the inputs are high-dimensional and valid inputs constitute a small subset of this space (e. …
g., valid protein sequences or valid natural images), such model-based optimization problems become exceptionally difficult, since the optimizer must avoid out-of-distribution and invalid inputs. We propose to address such problem with model inversion networks (MINs), which learn an inverse mapping from scores to inputs. MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems. MINs can also handle both purely offline data sources and active data collection. We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.
Approximate Inference for Fully Bayesian Gaussian Process Regression
Machine Learning, Machine Learning. 2 authors. pdf
Learning in Gaussian Process models occurs through the adaptation of hyperparameters of the mean and the covariance function. The classical approach entails maximizing the marginal likelihood yielding fixed point estimates (an approach called or ML-II). …
An alternative learning procedure is to infer the posterior over hyperparameters in a hierarchical specification of GPs we call (GPR). This work considers two approximation schemes for the intractable hyperparameter posterior: 1) Hamiltonian Monte Carlo (HMC) yielding a sampling-based approximation and 2) Variational Inference (VI) where the posterior over hyperparameters is approximated by a factorized Gaussian (mean-field) or a full-rank Gaussian accounting for correlations between hyperparameters. We analyze the predictive performance for fully Bayesian GPR on a range of benchmark data sets.
Asymptotic Risk of Least Squares Minimum Norm Estimator under the Spike Covariance Model
Machine Learning, Machine Learning. 2 authors. pdf
One of the recent approaches to explain good performance of neural networks has focused on their ability to fit training data perfectly (interpolate) without overfitting. It has been shown that this is not unique to neural nets, and that it happens with simpler models such as kernel regression, too Belkin et al. …
(2018b); Tengyuan Liang (2018). Consequently, there has been quite a few works that give conditions for low risk or optimality of interpolating models, see for example Belkin et al. (2018a, 2019b). One of the simpler models where interpolation has been studied recently is least squares solution for linear regression. In this case, interpolation is only guaranteed to happen in high dimensional setting where the number of parameters exceeds number of samples; therefore, least squares solution is not necessarily unique. However, minimum norm solution is unique, can be written in closed form, and gradient descent starting at the origin converges to it (Hastie et al., 2019). This has, at least partially, motivated several works that study risk of minimum norm least squares estimator for linear regression. Continuing in a similar vein, we study the asymptotic risk of minimum norm least squares estimator when number of parameters \(d\) depends on \(n\), and \(\frac{d}{n} \rightarrow \infty\). In this high dimensional setting, to make inference feasible, it is usually assumed that true parameters or data have some underlying low dimensional structure such as sparsity, or vanishing eigenvalues of population covariance matrix. Here, we restrict ourselves to spike covariance matrices, where a fixed finite number of eigenvalues grow with \(n\) and are much larger than the rest of the eigenvalues, which are (asymptotically) in the same order. We show that in this setting the risk can vanish.
A Dynamic Sampling Adaptive-SGD Method for Machine Learning
Machine Learning, Optimization and Control, Machine Learning. 2 authors. pdf
We propose a stochastic optimization method for minimizing loss functions, which can be expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such directions, eliminating the need for the user to tune the learning rate. The proposed method exploits local curvature information and ensures that search directions are descent directions with high probability using an acute-angle test. …
The method is proved to have, under reasonable assumptions, a global linear rate of convergence on self-concordant functions with high probability. Numerical experiments show that this method is able to choose the best learning rates and compares favorably to fine-tuned SGD for training logistic regression and Deep Neural Networks (DNNs). We also propose an adaptive version of ADAM that eliminates the need to tune the base learning rate and compares favorably to fine-tuned ADAM for training DNNs.
Intrinsic motivations and open-ended learning
Machine Learning, Artificial Intelligence, Machine Learning. 1 authors. pdf
There is a growing interest and literature on intrinsic motivations and open-ended learning in both cognitive robotics and machine learning on one side, and in psychology and neuroscience on the other. This paper aims to review some relevant contributions from the two literature threads and to draw links between them. …
To this purpose, the paper starts by defining intrinsic motivations and by presenting a computationally-driven theoretical taxonomy of their different types. Then it presents relevant contributions from the psychological and neuroscientific literature related to intrinsic motivations, interpreting them based on the grid, and elucidates the mechanisms and functions they play in animals and humans. Endowed with such concepts and their biological underpinnings, the paper next presents a selection of models from cognitive robotics and machine learning that computationally operationalise the concepts of intrinsic motivations and links them to biology concepts. The contribution finally presents some of the open challenges of the field from both the psychological/neuroscientific and computational perspectives.
Towards Regulated Deep Learning
Multiagent Systems, Programming Languages, Logic in Computer Science, Machine Learning, Artificial Intelligence. 1 authors. pdf
Regulation of Multi-Agent Systems (MAS) was a research topic of the past decade and one of these proposals was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANN) as Deep Learning (DL), Security, Privacy, Ethical and Legal issues regarding the use of DL has raised concerns in the Artificial Intelligence (AI) Community. …
Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of ANN as Agent-based Training of a special type of regulated ANN that we call Institutional Neural Network. This paper introduces the former concept and provides \(\mathcal{I}\), a language previously used to model and extend Electronic Institutions, as a means to implement and regulate DL.
A frequency-domain analysis of inexact gradient descent
Machine Learning, Numerical Analysis, Numerical Analysis, Optimization and Control, Systems and Control. 1 authors. pdf
We study robustness properties of inexact gradient descent for strongly convex functions, as well as for the larger class of functions with sector-bounded gradients, under a relative error model. Proofs of the corresponding convergence rates are based on frequency-domain criteria for the stability of nonlinear systems perturbed by additive noise. …
Some compact notations for concentration inequalities and user-friendly results
Statistics Theory, Machine Learning, Information Theory, Machine Learning, Statistics Theory, Information Theory, Probability. 1 authors. pdf
This paper presents compact notations for concentration inequalities and convenient results to streamline probabilistic analysis. The new expressions describe the typical sizes and tails of random variables, allowing for simple operations without heavy use of inessential constants. …
They bridge classical asymptotic notations and modern non-asymptotic tail bounds together. Examples of different kinds demonstrate their efficacy.

Data Science arXiv by Primary Tag

The tables below show abstracts organized by category with hyperlinks back to the arXiv site.

Computer Science

Machine Learning (cs.LG)
Face X-ray for More General Face Forgery Detection
Computer Vision and Pattern Recognition. 7 authors. pdf
In this paper we propose a novel image representation called face X-ray for detecting forgery in face images. The face X-ray of an input face image is a greyscale image that reveals whether the input image can be decomposed into the blending of two images from different sources. …
It does so by showing the blending boundary for a forged image and the absence of blending for a real image. We observe that most existing face manipulation methods share a common step: blending the altered face into an existing background image. For this reason, face X-ray provides an effective way for detecting forgery generated by most existing face manipulation algorithms. Face X-ray is general in the sense that it only assumes the existence of a blending step and does not rely on any knowledge of the artifacts associated with a specific face manipulation technique. Indeed, the algorithm for computing face X-ray can be trained without fake images generated by any of the state-of-the-art face manipulation methods. Extensive experiments show that face X-ray remains effective when applied to forgery generated by unseen face manipulation techniques, while most existing face forgery detection algorithms experience a significant performance drop.
Leveraging Semi-Supervised Learning for Fairness using Neural Networks
Machine Learning, Artificial Intelligence, Machine Learning. 5 authors. pdf
There has been a growing concern about the fairness of decision-making systems based on machine learning. The shortage of labeled data has been always a challenging problem facing machine learning based systems. …
In such scenarios, semi-supervised learning has shown to be an effective way of exploiting unlabeled data to improve upon the performance of model. Notably, unlabeled data do not contain label information which itself can be a significant source of bias in training machine learning systems. This inspired us to tackle the challenge of fairness by formulating the problem in a semi-supervised framework. In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data to not just improve the performance but also improve the fairness of the decision-making process. The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
C. H. Robinson Uses Heuristics to Solve Rich Vehicle Routing Problems
Optimization and Control, Artificial Intelligence. 5 authors. pdf
We consider a wide family of vehicle routing problem variants with many complex and practical constraints, known as rich vehicle routing problems, which are faced on a daily basis by C.H. …
Robinson (CHR). Since CHR has many customers, each with distinct requirements, various routing problems with different objectives and constraints should be solved. We propose a set partitioning framework with a number of route generation algorithms, which have shown to be effective in solving a variety of different problems. The proposed algorithms have outperformed the existing technologies at CHR on 10 benchmark instances and since, have been embedded into the company’s transportation planning and execution technology platform.
Side-Tuning: Network Adaptation via Additive Side Networks
Robotics, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing, Machine Learning. 5 authors. pdf
When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than start with a randomly initialized one – due to lacking enough training data, performing lifelong learning where the system has to learn a new task while being previously trained for other tasks, or wishing to encode priors in the network via preset weights. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. …
In this paper, we propose a straightforward alternative: Side-Tuning. Side-tuning adapts a pre-trained network by training a lightweight “side” network that is fused with the (unchanged) pre-trained network using summation. This simple method works as well as or better than existing solutions while it resolves some of the basic issues with fine-tuning, fixed features, and several other common baselines. In particular, side-tuning is less prone to overfitting when little training data is available, yields better results than using a fixed feature extractor, and does not suffer from catastrophic forgetting in lifelong learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including lifelong learning (iCIFAR, Taskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.
FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping
Computer Vision and Pattern Recognition. 5 authors. pdf
In this work, we propose a novel two-stage framework, called FaceShifter, for high fidelity and occlusion aware face swapping. Unlike many existing face swapping works that leverage only limited information from the target image when synthesizing the swapped face, our framework, in its first stage, generates the swapped face in high-fidelity by exploiting and integrating the target attributes thoroughly and adaptively. …
We propose a novel attributes encoder for extracting multi-level target face attributes, and a new generator with carefully designed Adaptive Attentional Denormalization (AAD) layers to adaptively integrate the identity and the attributes for face synthesis. To address the challenging facial occlusions, we append a second stage consisting of a novel Heuristic Error Acknowledging Refinement Network (HEAR-Net). It is trained to recover anomaly regions in a self-supervised way without any manual annotations. Extensive experiments on wild faces demonstrate that our face swapping results are not only considerably more perceptually appealing, but also better identity preserving in comparison to other state-of-the-art methods.
Morphology-Agnostic Visual Robotic Control
Robotics, Computer Vision and Pattern Recognition. 5 authors. pdf
Existing approaches for visuomotor robotic control typically require characterizing the robot in advance by calibrating the camera or performing system identification. We propose MAVRIC, an approach that works with minimal prior knowledge of the robot’s morphology, and requires only a camera view containing the robot and its environment and an unknown control interface. …
MAVRIC revolves around a mutual information-based method for self-recognition, which discovers visual “control points” on the robot body within a few seconds of exploratory interaction, and these control points in turn are then used for visual servoing. MAVRIC can control robots with imprecise actuation, no proprioceptive feedback, unknown morphologies including novel tools, unknown camera poses, and even unsteady handheld cameras. We demonstrate our method on visually-guided 3D point reaching, trajectory following, and robot-to-robot imitation.
Learning 3D Human Shape and Pose from Dense Body Parts
Computer Vision and Pattern Recognition. 5 authors. pdf
Reconstructing 3D human shape and pose from a monocular image is challenging despite the promising results achieved by the most recent learning-based methods. The commonly occurred misalignment comes from the facts that the mapping from images to the model space is highly non-linear and the rotation-based pose representation of the body model is prone to result in the drift of joint positions. …
In this work, we investigate learning 3D human shape and pose from dense correspondences of body parts and propose a Decompose-and-aggregate Network (DaNet) to address these issues. DaNet adopts the dense correspondence maps, which densely build a bridge between 2D pixels and 3D vertexes, as intermediate representations to facilitate the learning of 2D-to-3D mapping. The prediction modules of DaNet are decomposed into one global stream and multiple local streams to enable global and fine-grained perceptions for the shape and pose predictions, respectively. Messages from local streams are further aggregated to enhance the robust prediction of the rotation-based poses, where a position-aided rotation feature refinement strategy is proposed to exploit spatial relationships between body joints. Moreover, a Part-based Dropout (PartDrop) strategy is introduced to drop out dense information from intermediate representations during training, encouraging the network to focus on more complementary body parts as well as adjacent position features. The effectiveness of our method is validated on both in-door and real-world datasets including the Human3.6M, UP3D, and DensePose-COCO datasets. Experimental results show that the proposed method significantly improves the reconstruction performance in comparison with previous state-of-the-art methods. Our code will be made publicly available at https://hongwenzhang.github.io/dense2mesh .
On the Role of Weight Sharing During Deep Option Learning
Machine Learning, Machine Learning. 5 authors. pdf
The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. …
However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting. We thus reconsider this assumption and consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update. It turns out that not assuming parameter independence challenges a belief in prior work that training the policy over options can be disentangled from the dynamics of the underlying options. In fact, learning can be sped up by focusing the policy over options on states where options are actually likely to terminate. We put our new algorithms to the test in application to sample efficient learning of Atari games, and demonstrate significantly improved stability and faster convergence when learning long options.
Artificial Intelligence (cs.AI)
oLMpics – On what Language Model Pre-training Captures
Machine Learning, Artificial Intelligence, Computation and Language. 4 authors. pdf
Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. …
In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data. To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning), as well as comparing the learning curve of a fine-tuned LM to the learning curve of multiple controls, which paints a rich picture of the LM capabilities. Our main findings are that: (a) different LMs exhibit qualitatively different reasoning abilities, e.g., RoBERTa succeeds in reasoning tasks where BERT fails completely; (b) LMs do not reason in an abstract manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely. Our findings and infrastructure can help future work on designing new datasets, models and objective functions for pre-training.
GraspNet: A Large-Scale Clustered and Densely Annotated Datase for Object Grasping
Robotics, Computer Vision and Pattern Recognition. 4 authors. pdf
Object grasping is critical for many applications, which is also a challenging computer vision problem. However, for the clustered scene, current researches suffer from the problems of insufficient training data and the lacking of evaluation benchmarks. …
In this work, we contribute a large-scale grasp pose detection dataset with a unified evaluation system. Our dataset contains 87,040 RGBD images with over 370 million grasp poses. Meanwhile, our evaluation system directly reports whether a grasping is successful or not by analytic computation, which is able to evaluate any kind of grasp poses without exhausted labeling pose ground-truth. We conduct extensive experiments to show that our dataset and evaluation system can align well with real-world experiments. Our dataset, source code and models will be made publicly available.
Fungal architecture
Emerging Technologies. 4 authors. pdf
As one of the primary consumers of environmental resource, the building industry faces unprecedented challenges in needing to reduce the environmental impact of current consumption practices. This applies to both the construction of the built environment and resource consumption during its occupation and use. …
Where incremental improvements to current practices can be realised, the net benefits are often far outstripped by the burgeoning demands of rapidly increasing population growth and urbanisation. Against the backdrop of this grand societal challenge, it is necessary to explore approaches that envision a paradigm shift in how material is sourced, processed and assembled to address the magnitude of these challenges in a truly sustainable way, and which can even provide added value. We propose to develop a structural substrate by using live fungal mycelium, functionalise the substrate with nanoparticles and polymers to make a mycelium-based electronics, implement sensorial fusion and decision making in the fungal electronics and to growing monolithic buildings from the functionalised fungal substrate. Fungal buildings will self-grow, build, and repair themselves subject to substrate supplied, use natural adaptation to the environment, sense all what human can sense.
Domain-topic models with chained dimensions: modeling the evolution of a major oncology conference (1995-2017)
Physics and Society, Data Analysis, Statistics and Probability, Digital Libraries. 4 authors. pdf
In this paper we introduce a novel approach for the computational analysis of research activities and their dynamics. Named SASHIMI (Symmetrical And Sequential analysis from Hierarchical Inference of Multidimensional Information), our approach provides a multi-level description of the structure of scientific activities that offers numerous advantages over traditional methods such as topic models or network analyses. …
Our method generates a dual description of corpora in terms of research domains (collections of documents) and topics (collections of words). It also extends this description to clusters of associated dimensions, such as time. SASHIMI only requires access to the textual content of individual documents, rather than specific metadata such as citations, authors, or keywords as is the case with other science-mapping approaches. We illustrate the analytical power of our method by applying it to the empirical analysis of an original dataset, namely the 1995-2017 collection of abstracts presented at ASCO, the largest annual oncology research conference. We show that SASHIMI is able to detect the presence of significant temporal patterns and to identify the major thematic transformations of oncology that underlie these patterns.
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Machine Learning, Machine Learning. 3 authors. pdf
Traditional landscape analysis of deep neural networks aims to show that no sub-optimal local minima exist in some appropriate sense. From this, one may be tempted to conclude that descent algorithms which escape saddle points will reach a good local minimum. …
However, basic optimization theory tell us that it is also possible for a descent algorithm to diverge to infinity if there are paths leading to infinity, along which the loss function decreases. It is not clear whether for non-linear neural networks there exists one setting that no bad local-min and no decreasing paths to infinity can be simultaneously achieved. In this paper, we give the first positive answer to this question. More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity. The key mathematical trick is to show that the set of regularizers which may be undesirable can be viewed as the image of a Lipschitz continuous mapping from a lower-dimensional Euclidean space to a higher-dimensional Euclidean space, and thus has zero measure.
Reward-Conditioned Policies
Machine Learning, Machine Learning. 3 authors. pdf
Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle, difficult to use and tune, and sensitive to seemingly innocuous implementation decisions. …
In contrast, imitation learning utilizes standard and well-understood supervised learning methods, but requires near-optimal expert data. Can we learn effective policies via supervised learning without demonstrations? The main idea that we explore in this work is that non-expert trajectories collected from sub-optimal policies can be viewed as optimal supervision, not for maximizing the reward, but for matching the reward of the given trajectory. By then conditioning the policy on the numerical value of the reward, we can obtain a policy that generalizes to larger returns. We show how such an approach can be derived as a principled method for policy search, discuss several variants, and compare the method experimentally to a variety of current reinforcement learning methods on standard benchmarks.
Computer Vision and Pattern Recognition (cs.CV)
Building Confidence in Scientific Computing Software Via Assurance Cases
Software Engineering. 3 authors. pdf
Assurance cases provide an organized and explicit argument for correctness. They can dramatically improve the certification of Scientific Computing Software (SCS). …
Assurance cases have already been effectively used for safety cases for real time systems. Their advantages for SCS include engaging domain experts, producing only necessary documentation, and providing evidence that can be verified/replicated. This paper illustrates assurance cases for SCS through the correctness case for 3dfim+, an existing Medical Imaging Application (MIA) for analyzing activity in the brain. This example was partly chosen because of recent concerns about the validity of fMRI (Functional Magnetic Resonance Imaging) studies. The example justifies the value of assurance cases for SCS, since the existing documentation is shown to have ambiguities and omissions, such as an incompletely defined ranking function and missing details on the coordinate system. A serious concern for 3dfim+ is identified: running the software does not produce any warning about the necessity of using data that matches the parametric statistical model employed for the correlation calculations. Raising the bar for SCS in general, and MIA in particular, is both feasible and necessary - when software impacts safety, an assurance case methodology (or an equivalently rigorous confidence building methodology) should be employed.
Representation Internal-Manipulation (RIM): A Neuro-Inspired Computational Theory of Consciousness
Neural and Evolutionary Computing, Machine Learning, Machine Learning, Neurons and Cognition, Artificial Intelligence. 2 authors. pdf
Many theories, based on neuroscientific and psychological empirical evidence and on computational concepts, have been elaborated to explain the emergence of consciousness in the central nervous system. These theories propose key fundamental mechanisms to explain consciousness, but they only partially connect such mechanisms to the possible functional and adaptive role of consciousness. …
Recently, some cognitive and neuroscientific models try to solve this gap by linking consciousness to various aspects of goal-directed behaviour, the pivotal cognitive process that allows mammals to flexibly act in challenging environments. Here we propose the Representation Internal-Manipulation (RIM) theory of consciousness, a theory that links the main elements of consciousness theories to components and functions of goal-directed behaviour, ascribing a central role for consciousness to the goal-directed manipulation of internal representations. This manipulation relies on four specific computational operations to perform the flexible internal adaptation of all key elements of goal-directed computation, from the representations of objects to those of goals, actions, and plans. Finally, we propose the concept of `manipulation agency’ relating the sense of agency to the internal manipulation of representations. This allows us to propose that the subjective experience of consciousness is associated to the human capacity to generate and control a simulated internal reality that is vividly perceived and felt through the same perceptual and emotional mechanisms used to tackle the external world.
Towards Neural-Guided Program Synthesis for Linear Temporal Logic Specifications
Computer Science and Game Theory, Logic in Computer Science, Neural and Evolutionary Computing, Formal Languages and Automata Theory, Artificial Intelligence. 2 authors. pdf
Synthesizing a program that realizes a logical specification is a classical problem in computer science. We examine a particular type of program synthesis, where the objective is to synthesize a strategy that reacts to a potentially adversarial environment while ensuring that all executions satisfy a Linear Temporal Logic (LTL) specification. …
Unfortunately, exact methods to solve so-called LTL synthesis via logical inference do not scale. In this work, we cast LTL synthesis as an optimization problem. We employ a neural network to learn a Q-function that is then used to guide search, and to construct programs that are subsequently verified for correctness. Our method is unique in combining search with deep learning to realize LTL synthesis. In our experiments the learned Q-function provides effective guidance for synthesis problems with relatively small specifications.
OneGAN: Simultaneous Unsupervised Learning of Conditional Image Generation, Foreground Segmentation, and Fine-Grained Clustering
Computer Vision and Pattern Recognition. 2 authors. pdf
We present a method for simultaneously learning, in an unsupervised manner, (i) a conditional image generator, (ii) foreground extraction and segmentation, (iii) clustering into a two-level class hierarchy, and (iv) object removal and background completion, all done without any use of annotation. The method combines a generative adversarial network and a variational autoencoder, with multiple encoders, generators and discriminators, and benefits from solving all tasks at once. …
The input to the training scheme is a varied collection of unlabeled images from the same domain, as well as a set of background images without a foreground object. In addition, the image generator can mix the background from one image, with a foreground that is conditioned either on that of a second image or on the index of a desired cluster. The method obtains state of the art results in comparison to the literature methods, when compared to the current state of the art in each of the tasks.
On the Difference Between the Information Bottleneck and the Deep Information Bottleneck
Information Theory, Information Theory, Machine Learning, Machine Learning. 2 authors. pdf
Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. …
The two assumed properties of the data \(X\), \(Y\) and their latent representation \(T\) take the form of two Markov chains \(T-X-Y\) and \(X-T-Y\). Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions \(P(X,Y,T)\). We therefore show how to circumvent this limitation by optimising a lower bound for \(I(T;Y)\) for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice and of two terms measuring how much the former requirement \(T-X-Y\) is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model.
Model Inversion Networks for Model-Based Optimization
Machine Learning, Machine Learning. 2 authors. pdf
In this work, we aim to solve data-driven optimization problems, where the goal is to find an input that maximizes an unknown score function given access to a dataset of inputs with corresponding scores. When the inputs are high-dimensional and valid inputs constitute a small subset of this space (e. …
g., valid protein sequences or valid natural images), such model-based optimization problems become exceptionally difficult, since the optimizer must avoid out-of-distribution and invalid inputs. We propose to address such problem with model inversion networks (MINs), which learn an inverse mapping from scores to inputs. MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems. MINs can also handle both purely offline data sources and active data collection. We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.
Software Engineering (cs.SE)
A Dynamic Sampling Adaptive-SGD Method for Machine Learning
Machine Learning, Optimization and Control, Machine Learning. 2 authors. pdf
We propose a stochastic optimization method for minimizing loss functions, which can be expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such directions, eliminating the need for the user to tune the learning rate. The proposed method exploits local curvature information and ensures that search directions are descent directions with high probability using an acute-angle test. …
The method is proved to have, under reasonable assumptions, a global linear rate of convergence on self-concordant functions with high probability. Numerical experiments show that this method is able to choose the best learning rates and compares favorably to fine-tuned SGD for training logistic regression and Deep Neural Networks (DNNs). We also propose an adaptive version of ADAM that eliminates the need to tune the base learning rate and compares favorably to fine-tuned ADAM for training DNNs.
Essential Sentences for Navigating Stack Overflow Answers
Information Retrieval, Software Engineering, Computation and Language. 2 authors. pdf
Stack Overflow (SO) has become an essential resource for software development. Despite its success and prevalence, navigating SO remains a challenge. …
Ideally, SO users could benefit from highlighted navigational cues that help them decide if an answer is relevant to their task and context. Such navigational cues could be in the form of essential sentences that help the searcher decide whether they want to read the answer or skip over it. In this paper, we compare four potential approaches for identifying essential sentences. We adopt two existing approaches and develop two new approaches based on the idea that contextual information in a sentence (e.g., “if using windows”) could help identify essential sentences. We compare the four techniques using a survey of 43 participants. Our participants indicate that it is not always easy to figure out what the best solution for their specific problem is, given the options, and that they would indeed like to easily spot contextual information that may narrow down the search. Our quantitative comparison of the techniques shows that there is no single technique sufficient for identifying essential sentences that can serve as navigational cues, while our qualitative analysis shows that participants valued explanations and specific conditions, and did not value filler sentences or speculations. Our work sheds light on the importance of navigational cues, and our findings can be used to guide future research to find the best combination of techniques to identify such cues.
Computation and Language (cs.CL)
Intrinsic motivations and open-ended learning
Machine Learning, Artificial Intelligence, Machine Learning. 1 authors. pdf
There is a growing interest and literature on intrinsic motivations and open-ended learning in both cognitive robotics and machine learning on one side, and in psychology and neuroscience on the other. This paper aims to review some relevant contributions from the two literature threads and to draw links between them. …
To this purpose, the paper starts by defining intrinsic motivations and by presenting a computationally-driven theoretical taxonomy of their different types. Then it presents relevant contributions from the psychological and neuroscientific literature related to intrinsic motivations, interpreting them based on the grid, and elucidates the mechanisms and functions they play in animals and humans. Endowed with such concepts and their biological underpinnings, the paper next presents a selection of models from cognitive robotics and machine learning that computationally operationalise the concepts of intrinsic motivations and links them to biology concepts. The contribution finally presents some of the open challenges of the field from both the psychological/neuroscientific and computational perspectives.
Digital Libraries (cs.DL)
Definitions and Semantic Simulations Based on Object-Oriented Analysis and Modeling
Artificial Intelligence. 1 authors. pdf
We have proposed going beyond traditional ontologies to use rich semantics implemented in programming languages for modeling. In this paper, we discuss the application of executable semantic models to two examples, first a structured definition of a waterfall and second the cardiopulmonary system. …
We examine the components of these models and the way those components interact. Ultimately, such models should provide the basis for direct representation.
Emerging Technologies (cs.ET)
Towards Regulated Deep Learning
Multiagent Systems, Programming Languages, Logic in Computer Science, Machine Learning, Artificial Intelligence. 1 authors. pdf
Regulation of Multi-Agent Systems (MAS) was a research topic of the past decade and one of these proposals was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANN) as Deep Learning (DL), Security, Privacy, Ethical and Legal issues regarding the use of DL has raised concerns in the Artificial Intelligence (AI) Community. …
Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of ANN as Agent-based Training of a special type of regulated ANN that we call Institutional Neural Network. This paper introduces the former concept and provides \(\mathcal{I}\), a language previously used to model and extend Electronic Institutions, as a means to implement and regulate DL.
Robotics (cs.RO)
BIRL: Benchmark on Image Registration methods with Landmark validation
Performance, Image and Video Processing, Computer Vision and Pattern Recognition. 1 authors. pdf
This report presents a generic image registration benchmark with automatic evaluation using landmark annotations. The BIRL framework has a few key features, such as: easily extendable, performance evaluation, parallel experimenting, simple visualisations, experiment’s time-out limit, pause/resume experiments. …
The main use-cases are (a) compare your (newly developed) method with some State-of-the-Art (SOTA) methods on a common dataset and (b) experiment SOTA methods on your custom dataset (which should contain landmark annotation). In this paper, we present mixed-methods aiming at bio-medical imaging and experimental result on CIMA dataset. However, any other methods for other domain can be added or costume dataset to be used. https://borda.github.io/BIRL

Statistics

Machine Learning (stat.ML)
Statistical Models in Forensic Voice Comparison
Audio and Speech Processing, Sound, Applications. 5 authors. pdf
This chapter describes a number of signal-processing and statistical-modeling techniques that are commonly used to calculate likelihood ratios in human-supervised automatic approaches to forensic voice comparison. Techniques described include mel-frequency cepstral coefficients (MFCCs) feature extraction, Gaussian mixture model - universal background model (GMM-UBM) systems, i-vector - probabilistic linear discriminant analysis (i-vector PLDA) systems, deep neural network (DNN) based systems (including senone posterior i-vectors, bottleneck features, and embeddings / x-vectors), mismatch compensation, and score-to-likelihood-ratio conversion (aka calibration). …
Empirical validation of forensic-voice-comparison systems is also covered. The aim of the chapter is to bridge the gap between general introductions to forensic voice comparison and the highly technical automatic-speaker-recognition literature from which the signal-processing and statistical-modeling techniques are mostly drawn. Knowledge of the likelihood-ratio framework for the evaluation of forensic evidence is assumed. It is hoped that the material presented here will be of value to students of forensic voice comparison and to researchers interested in learning about statistical modeling techniques that could potentially also be applied to data from other branches of forensic science.
Schrödinger Bridge Samplers
Computation, Machine Learning. 4 authors. pdf
Consider a reference Markov process with initial distribution \(\pi_{0}\) and transition kernels \(\{M_{t}\}_{t\in[1:T]}\), for some \(T\in\mathbb{N}\). Assume that you are given distribution \(\pi_{T}\), which is not equal to the marginal distribution of the reference process at time \(T\). …
In this scenario, Schr"odinger addressed the problem of identifying the Markov process with initial distribution \(\pi_{0}\) and terminal distribution equal to \(\pi_{T}\) which is the closest to the reference process in terms of Kullback–Leibler divergence. This special case of the so-called Schr"odinger bridge problem can be solved using iterative proportional fitting, also known as the Sinkhorn algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schr"odinger bridge samplers, to approximate a target distribution \(\pi\) on \(\mathbb{R}^{d}\) and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time \(T\) becomes closer to \(\pi_T = \pi\), via regression-based approximations of the corresponding iterative proportional fitting recursion. We report preliminary experiments and make connections with other problems arising in the optimal transport, optimal control and physics literatures.
Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
Machine Learning, Optimization and Control, Machine Learning. 3 authors. pdf
Stochastic compositional optimization arises in many important machine learning tasks such as value function evaluation in reinforcement learning and portfolio management. The objective function is the composition of two expectations of stochastic functions, and is more challenging to optimize than vanilla stochastic optimization problems. …
In this paper, we investigate the stochastic compositional optimization in the general smooth non-convex setting. We employ a recently developed idea of to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: \(\mathcal{O}((n+m)^{1/2} \varepsilon^{-2})\) in the finite-sum case and \(\mathcal{O}(\varepsilon^{-3})\) in the online case. Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization, and is believed to be optimal. Our experiments validate the theoretical performance of our algorithm.
Robust Aggregation for Federated Learning
Machine Learning, Cryptography and Security, Machine Learning. 3 authors. pdf
We present a robust aggregation approach to make federated learning robust to settings when a fraction of the devices may be sending corrupted updates to the server. The proposed approach relies on a robust secure aggregation oracle based on the geometric median, which returns a robust aggregate using a constant number of calls to a regular non-robust secure average oracle. …
The robust aggregation oracle is privacy-preserving, similar to the secure average oracle it builds upon. We provide experimental results of the proposed approach with linear models and deep networks for two tasks in computer vision and natural language processing. The robust aggregation approach is agnostic to the level of corruption; it outperforms the classical aggregation approach in terms of robustness when the level of corruption is high, while being competitive in the regime of low corruption.
Methodology (stat.ME)
On Testing for Biases in Peer Review
Methodology, Statistics Theory, Applications, Statistics Theory. 3 authors. pdf
We consider the issue of biases in scholarly research, specifically, in peer review. There is a long standing debate on whether exposing author identities to reviewers induces biases against certain groups, and our focus is on designing tests to detect the presence of such biases. …
Our starting point is a remarkable recent work by Tomkins, Zhang and Heavlin which conducted a controlled, large-scale experiment to investigate existence of biases in the peer reviewing of the WSDM conference. We present two sets of results in this paper. The first set of results is negative, and pertains to the statistical tests and the experimental setup used in the work of Tomkins et al. We show that the test employed therein does not guarantee control over false alarm probability and under correlations between relevant variables coupled with any of the following conditions, with high probability, can declare a presence of bias when it is in fact absent: (a) measurement error, (b) model mismatch, (c) reviewer calibration. Moreover, we show that the setup of their experiment may itself inflate false alarm probability if (d) bidding is performed in non-blind manner or (e) popular reviewer assignment procedure is employed. Our second set of results is positive and is built around a novel approach to testing for biases that we propose. We present a general framework for testing for biases in (single vs. double blind) peer review. We then design hypothesis tests that under minimal assumptions guarantee control over false alarm probability and non-trivial power even under conditions (a)–(c) as well as propose an alternative experimental setup which mitigates issues (d) and (e). Finally, we show that no statistical test can improve over the non-parametric tests we consider in terms of the assumptions required to control for the false alarm probability.
Driver fatigue EEG signals detection by using robust univariate analysis
Methodology, Applications, Signal Processing. 3 authors. pdf
Driver fatigue is a major cause of traffic accidents and the electroencephalogram (EEG) is considered one of the most reliable predictors of fatigue. This paper proposes a novel, simple and fast method for driver fatigue detection that can be implemented in real-time systems by using a single-channel on the scalp. …
The method based on the robust univariate analysis of EEG signals is composed of two stages. First, the most significant channel from EEG raw is selected according to the maximum variance. In the second stage, this single channel will be used to detect the fatigue EEG signal by extracting four feature parameters. Two parameters estimated from the robust univariate analysis, namely mean and covariance, and two classical statistics parameters such as variance and covariance that help to tune the robust analysis. Next, an ensemble bagged decision trees classifier is used in order to discriminate fatigue signals from alert signals. The proposed algorithm is demonstrated on 24 EEG signals from the Jiangxi University of Technology database using only the most significant channel found, which is located in the left tempo-parietal region where spatial awareness and visual-spatial navigation are shared, in terms of 92.7% accuracy with 1.8 seconds of time delay.
Prediction in the Presence of Missing Covariates
Methodology. 3 authors. pdf
In many applied fields incomplete covariate vectors are commonly encountered. It is well known that this can be problematic when making inference on model parameters, but its impact on prediction performance is less understood. …
We develop a method based on covariate dependent partition models that seamlessly handles missing covariates while avoiding completely any type of imputation. The method we develop is not only able to make in-sample predictions, but out-of-sample as well even if the missing pattern in the new subjects’ incomplete covariate vector was not seen in the training data. Further, both categorical and continuous covariates are permitted. Our proposal fares well when compared to other alternatives based on imputations. We illustrate the method using simulation studies and an ozone dataset.
Applications (stat.AP)
Approximate Inference for Fully Bayesian Gaussian Process Regression
Machine Learning, Machine Learning. 2 authors. pdf
Learning in Gaussian Process models occurs through the adaptation of hyperparameters of the mean and the covariance function. The classical approach entails maximizing the marginal likelihood yielding fixed point estimates (an approach called or ML-II). …
An alternative learning procedure is to infer the posterior over hyperparameters in a hierarchical specification of GPs we call (GPR). This work considers two approximation schemes for the intractable hyperparameter posterior: 1) Hamiltonian Monte Carlo (HMC) yielding a sampling-based approximation and 2) Variational Inference (VI) where the posterior over hyperparameters is approximated by a factorized Gaussian (mean-field) or a full-rank Gaussian accounting for correlations between hyperparameters. We analyze the predictive performance for fully Bayesian GPR on a range of benchmark data sets.
Asymptotic Risk of Least Squares Minimum Norm Estimator under the Spike Covariance Model
Machine Learning, Machine Learning. 2 authors. pdf
One of the recent approaches to explain good performance of neural networks has focused on their ability to fit training data perfectly (interpolate) without overfitting. It has been shown that this is not unique to neural nets, and that it happens with simpler models such as kernel regression, too Belkin et al. …
(2018b); Tengyuan Liang (2018). Consequently, there has been quite a few works that give conditions for low risk or optimality of interpolating models, see for example Belkin et al. (2018a, 2019b). One of the simpler models where interpolation has been studied recently is least squares solution for linear regression. In this case, interpolation is only guaranteed to happen in high dimensional setting where the number of parameters exceeds number of samples; therefore, least squares solution is not necessarily unique. However, minimum norm solution is unique, can be written in closed form, and gradient descent starting at the origin converges to it (Hastie et al., 2019). This has, at least partially, motivated several works that study risk of minimum norm least squares estimator for linear regression. Continuing in a similar vein, we study the asymptotic risk of minimum norm least squares estimator when number of parameters \(d\) depends on \(n\), and \(\frac{d}{n} \rightarrow \infty\). In this high dimensional setting, to make inference feasible, it is usually assumed that true parameters or data have some underlying low dimensional structure such as sparsity, or vanishing eigenvalues of population covariance matrix. Here, we restrict ourselves to spike covariance matrices, where a fixed finite number of eigenvalues grow with \(n\) and are much larger than the rest of the eigenvalues, which are (asymptotically) in the same order. We show that in this setting the risk can vanish.
Computation (stat.CO)
Parallel cross-validation: a scalable fitting method for Gaussian process models
Computation. 2 authors. pdf
Gaussian process (GP) models are widely used to analyze spatially referenced data and to predict values at locations without observations. In contrast to many algorithmic procedures, GP models are based on a statistical framework, which enables uncertainty quantification of the model structure and predictions. …
Both the evaluation of the likelihood and the prediction involve solving linear systems. Hence, the computational costs are large and limit the amount of data that can be handled. While there are many approximation strategies that lower the computational cost of GP models, they often provide only sub-optimal support for the parallel computing capabilities of current (high-performance) computing environments. We aim at bridging this gap with a parameter estimation and prediction method that is designed to be parallelizable. More precisely, we divide the spatial domain into overlapping subsets and use cross-validation (CV) to estimate the covariance parameters in parallel. We present simulation studies, which assess the accuracy of the parameter estimates and predictions. Moreover, we show that our implementation has good weak and strong parallel scaling properties. For illustration, we fit an exponential covariance model to a scientifically relevant canopy height dataset with 5 million observations. Using 512 processor cores in parallel brings the evaluation time of one covariance parameter configuration to less than 1.5 minutes. The parallel CV method can be easily extended to include approximate likelihood methods, multivariate and spatio-temporal data, as well as non-stationary covariance models.
A Dynamic Process Reference Model for Sparse Networks with Reciprocity
Social and Information Networks, Methodology. 1 authors. pdf
Many social and other networks exhibit stable size scaling relationships, such that features such as mean degree or reciprocation rates change slowly or are approximately constant as the number of vertices increases. Statistical network models built on top of simple Bernoulli baseline (or reference) measures often behave unrealistically in this respect, leading to the development of sparse reference models that preserve features such as mean degree scaling. …
In this paper, we generalize recent work on the micro-foundations of such reference models to the case of sparse directed graphs with non-vanishing reciprocity, providing a dynamic process interpretation of the emergence of stable macroscopic behavior.

Mathematics

Statistics Theory (math.ST)
True and false discoveries with e-values
Statistics Theory, Statistics Theory. 2 authors. pdf
We discuss controlling the number of false discoveries using e-values instead of p-values. Using e-values simplifies the known algorithms radically. …
Bayesian Generalization Error of Poisson Mixture and Simplex Vandermonde Matrix Type Singularity
Statistics Theory, Statistics Theory. 2 authors. pdf
A Poisson mixture is one of the practically important models in computer science, biology, and sociology. However, the theoretical property has not been studied because the posterior distribution can not be approximated by any normal distribution. …
Such a model is called singular and it is known that Real Log Canonical Threshold (RLCT) is equal to the coefficient of the asymptotically main term of the Bayesian generalization error. In this paper, we derive RLCT of a simplex Vandermonde matrix type singularity which is equal to that of a Poisson mixture in general cases.
Model-free Bootstrap for a General Class of Stationary Time Series
Statistics Theory, Statistics Theory. 2 authors. pdf
A model-free bootstrap procedure for a general class of stationary time series is introduced. The theoretical framework is established, showing asymptotic validity of bootstrap confidence intervals for many statistics of interest. …
In addition, asymptotic validity of one-step ahead bootstrap prediction intervals is also demonstrated. Finite-sample experiments are conducted to empirically confirm the performance of the new method, and to compare with popular methods such as the block bootstrap and the autoregressive (AR)-sieve bootstrap.
A frequency-domain analysis of inexact gradient descent
Machine Learning, Numerical Analysis, Numerical Analysis, Optimization and Control, Systems and Control. 1 authors. pdf
We study robustness properties of inexact gradient descent for strongly convex functions, as well as for the larger class of functions with sector-bounded gradients, under a relative error model. Proofs of the corresponding convergence rates are based on frequency-domain criteria for the stability of nonlinear systems perturbed by additive noise. …
Numerical Analysis (math.NA)
Some compact notations for concentration inequalities and user-friendly results
Statistics Theory, Machine Learning, Information Theory, Machine Learning, Statistics Theory, Information Theory, Probability. 1 authors. pdf
This paper presents compact notations for concentration inequalities and convenient results to streamline probabilistic analysis. The new expressions describe the typical sizes and tails of random variables, allowing for simple operations without heavy use of inessential constants. …
They bridge classical asymptotic notations and modern non-asymptotic tail bounds together. Examples of different kinds demonstrate their efficacy.
Optimization and Control (math.OC)
Numerical Linear Algebra in Data Assimilation
Computation, Numerical Analysis, Applications, Numerical Analysis. 1 authors. pdf
Data assimilation is a method that combines observations (e.g. …
real world data) of a state of a system with model output for that system in order to improve the estimate of the state of the system and thereby the model output. The model is usually represented by a discretised partial differential equation. The data assimilation problem can be formulated as a large scale Bayesian inverse problem. Based on this interpretation we will derive the most important variational and sequential data assimilation approaches, in particular three-dimensional and four-dimensional variational data assimilation (3D-Var and 4D-Var) and the Kalman filter. We will then consider more advanced methods which are extensions of the Kalman filter and variational data assimilation and will pay particular attention to their advantages and disadvantages. The data assimilation problem usually results in a very large optimisation problem and/or a very large linear system to solve (due to inclusion of time and space dimensions). Therefore, the second part of this article aims to review advances and challenges, in particular from the numerical linear algebra perspective, within the various data assimilation approaches.

Elec. Eng. and Systems Science

Image and Video Processing (eess.IV)
Automatic segmentation and determining radiodensity of the liver in a large-scale CT database
Image and Video Processing, Computer Vision and Pattern Recognition, Medical Physics. 9 authors. pdf
This study proposes an automatic technique for liver segmentation in computed tomography (CT) images. Localization of the liver volume is based on the correlation with an optimized set of liver templates developed by the authors that allows clear geometric interpretation. …
Radiodensity values are calculated based on the boundaries of the segmented liver, which allows identifying liver abnormalities. The performance of the technique was evaluated on 700 CT images from dataset of the Unified Radiological Information System (URIS) of Moscow. Despite the decrease in accuracy, the technique is applicable to CT volumes with a partially visible region of the liver. The technique can be used to process CT images obtained in various patient positions in a wide range of exposition parameters. It is capable in dealing with low dose CT scans in real large-scale medical database with over 1 million of studies.
Volumetric Lung Nodule Segmentation using Adaptive ROI with Multi-View Residual Learning
Machine Learning, Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning. 5 authors. pdf
Accurate quantification of pulmonary nodules can greatly assist the early diagnosis of lung cancer, which can enhance patient survival possibilities. A number of nodule segmentation techniques have been proposed, however, all of the existing techniques rely on radiologist 3-D volume of interest (VOI) input or use the constant region of interest (ROI) and only investigate the presence of nodule voxels within the given VOI. …
Such approaches restrain the solutions to investigate the nodule presence outside the given VOI and also include the redundant structures into VOI, which may lead to inaccurate nodule segmentation. In this work, a novel semi-automated approach for 3-D segmentation of nodule in volumetric computerized tomography (CT) lung scans has been proposed. The proposed technique can be segregated into two stages, at the first stage, it takes a 2-D ROI containing the nodule as input and it performs patch-wise investigation along the axial axis with a novel adaptive ROI strategy. The adaptive ROI algorithm enables the solution to dynamically select the ROI for the surrounding slices to investigate the presence of nodule using deep residual U-Net architecture. The first stage provides the initial estimation of nodule which is further utilized to extract the VOI. At the second stage, the extracted VOI is further investigated along the coronal and sagittal axis with two different networks and finally, all the estimated masks are fed into the consensus module to produce the final volumetric segmentation of nodule. The proposed approach has been rigorously evaluated on the LIDC dataset, which is the largest publicly available dataset. The result suggests that the approach is significantly robust and accurate as compared to the previous state of the art techniques.
Learning Wavefront Coding for Extended Depth of Field Imaging
Image and Video Processing, Computer Vision and Pattern Recognition, Optics. 3 authors. pdf
The depth of field constitutes an important quality factor of imaging systems that highly affects the content of the acquired spatial information in the captured images. Extended depth of field (EDoF) imaging is a challenging problem due to its highly ill-posed nature, hence it has been extensively addressed in the literature. …
We propose a computational imaging approach for EDoF, where we employ wavefront coding via a diffractive optical element (DOE) and we achieve deblurring through a convolutional neural network. Thanks to the end-to-end differentiable modeling of optical image formation and computational post-processing, we jointly optimize the optical design, i.e., DOE, and the deblurring through standard gradient descent methods. Based on the properties of the underlying refractive lens and the desired EDoF range, we provide an analytical expression for the search space of the DOE, which helps in the convergence of the end-to-end network. We achieve superior EDoF imaging performance compared to state of the art, where we demonstrate results with minimal artifacts in various scenarios, including deep 3D scenes and broadband imaging.
Microlens array grid estimation, light field decoding, and calibration
Image and Video Processing, Computer Vision and Pattern Recognition. 2 authors. pdf
We quantitatively investigate multiple algorithms for microlens array grid estimation for microlens array-based light field cameras. Explicitly taking into account natural and mechanical vignetting effects, we propose a new method for microlens array grid estimation that outperforms the ones previously discussed in the literature. …
To quantify the performance of the algorithms, we propose an evaluation pipeline utilizing application-specific ray-traced white images with known microlens positions. Using a large dataset of synthesized white images, we thoroughly compare the performance of the different estimation algorithms. As an example, we apply our results to the decoding and calibration of light fields taken with a Lytro Illum camera. We observe that decoding as well as calibration benefit from a more accurate, vignetting-aware grid estimation, especially in peripheral subapertures of the light field.

Condensed Matter

Materials Science (cond-mat.mtrl-sci)
Koopmans Meets Bethe-Salpeter: Excitonic Optical Spectra without GW
Computational Physics, Materials Science. 5 authors. pdf
The Bethe-Salpeter Equation (BSE) can be applied to compute from first-principles optical spectra that include the effects of screened electron-hole interactions. As input, BSE calculations require single-particle states, quasiparticle energy levels and the screened Coulomb interaction, which are typically obtained with many-body perturbation theory, whose cost limits the scope of possible applications. …
This work tries to address this practical limitation, instead deriving spectral energies from Koopmans-compliant functionals and introducing a new methodology for handling the screened Coulomb interaction. The explicit calculation of the \(W\) matrix is bypassed via a direct minimization scheme applied on top of a maximally localised Wannier function basis. We validate and benchmark this approach by computing the low-lying excited states of the molecules in Thiel’s set, and the optical absorption spectrum of a \(\text{C}_{60}\) fullerene. The results show the same trends as quantum chemical methods and are in excellent agreement with previous simulations carried out at the TD-DFT or \(G_{0}W_{0}\)- level. Conveniently, the new framework reduces the parameter space controlling the accuracy of the calculation, thereby simplifying the simulation of charge-neutral excitations, offering the potential to expand the applicability of first-principles spectroscopies to larger systems of applied interest.
Machine Learning the Effective Hamiltonian in High Entropy Alloys
Computational Physics, Materials Science. 4 authors. pdf
The development of machine learning sheds new light on the problem of statistical thermodynamics in multicomponent alloys. However, a data-driven approach to construct the effective Hamiltonian requires sufficiently large data sets, which is expensive to calculate with conventional density functional theory (DFT). …
To solve this problem, we propose to use the atomic local energy as the target variable, and harness the power of the linear-scaling DFT to accelerate the data generating process. Using the large amounts of DFT data sets, various complex models are devised and applied to learn the effective Hamiltonians of a range of refractory high entropy alloys (HEAs). The testing \(R^2\) scores of the effective pair interaction model are higher than 0.99, demonstrating that the pair interactions within the 6-th coordination shell provide an excellent description of the atomic local energies for all the four HEAs. This model is further improved by including nonlinear and multi-site interactions. In particular, the deep neural networks (DNNs) built directly in the local configuration space (therefore no hand-crafted features) are employed to model the effective Hamiltonian. The results demonstrate that neural networks are promising for the modeling of effective Hamiltonian due to its excellent representation power.
Machine-Learning X-ray Absorption Spectra to Quantitative Accuracy
Computational Physics, Materials Science. 4 authors. pdf
The advent of massive data repositories has propelled machine learning techniques to the front lines of many scientific fields, and exploring new frontiers by leveraging the predictive power of machine learning will greatly accelerate big data-assisted discovery. In this work, we show that graph-based neural networks can be used to predict the near edge x-ray absorption structure spectra of molecules with exceptional accuracy. …
The predicted spectra reproduce nearly all the prominent peaks, with 90% of the predicted peak locations within 1 eV of the ground truth. Our study demonstrates that machine learning models can achieve practically the same accuracy as first-principles calculations in predicting complex physical quantities, such as spectral functions, but at a fraction of the cost.

Physics

Computational Physics (physics.comp-ph)
ELSI – An Open Infrastructure for Electronic Structure Solvers
Computational Physics, Materials Science. 19 authors. pdf
Routine applications of electronic structure theory to molecules and periodic systems need to compute the electron density from given Hamiltonian and, in case of non-orthogonal basis sets, overlap matrices. System sizes can range from few to thousands or, in some examples, millions of atoms. …
Different discretization schemes (basis sets) and different system geometries (finite non-periodic vs. infinite periodic boundary conditions) yield matrices with different structures. The ELectronic Structure Infrastructure (ELSI) project provides an open-source software interface to facilitate the implementation and optimal use of high-performance solver libraries covering cubic scaling eigensolvers, linear scaling density-matrix-based algorithms, and other reduced scaling methods in between. In this paper, we present recent improvements and developments inside ELSI, mainly covering (1) new solvers connected to the interface, (2) matrix layout and communication adapted for parallel calculations of periodic and/or spin-polarized systems, (3) routines for density matrix extrapolation in geometry optimization and molecular dynamics calculations, and (4) general utilities such as parallel matrix I/O and JSON output. The ELSI interface has been integrated into four electronic structure code projects (DFTB+, DGDFT, FHI-aims, SIESTA), allowing us to rigorously benchmark the performance of the solvers on an equal footing. Based on results of a systematic set of large-scale benchmarks performed with Kohn-Sham density-functional theory and density-functional tight-binding theory, we identify factors that strongly affect the efficiency of the solvers, and propose a decision layer that assists with the solver selection process. Finally, we describe a reverse communication interface encoding matrix-free iterative solver strategies that are amenable, e.g., for use with planewave basis sets.
Physics and Society (physics.soc-ph)
The brevity law as a scaling law, and a possible origin of Zipf’s law for word frequencies
Adaptation and Self-Organizing Systems, Physics and Society, Data Analysis, Statistics and Probability. 2 authors. pdf
An important body of quantitative linguistics is constituted by a series of statistical laws about language usage. Despite the importance of these linguistic laws, some of them are poorly formulated, and, more importantly, there is no unified framework that encompasses all them. …
This paper presents a new perspective to establish a connection between different statistical linguistic laws. Characterizing each word type by two random variables, length (in number of characters) and absolute frequency, we show that the corresponding bivariate joint probability distribution shows a rich and precise phenomenology, with the type-length and the type-frequency distributions as its two marginals, and the conditional distribution of frequency at fixed length providing a clear formulation for the brevity-frequency phenomenon. The type-length distribution turns out to be well fitted by a gamma distribution (much better than with the previously proposed lognormal), and the conditional frequency distributions at fixed length display power-law-decay behavior with a fixed exponent \(\alpha\simeq 1.4\) and a characteristic-frequency crossover that scales as an inverse power \(\delta\simeq 2.8\) of length, which implies the fulfilment of a scaling law analogous to those found in the thermodynamics of critical phenomena. As a by-product, we find a possible model-free explanation for the origin of Zipf’s law, which should arise as a mixture of conditional frequency distributions governed by the crossover length-dependent frequency.

Quantitative Biology

Biomolecules (q-bio.BM)
LinearPartition: Linear-Time Approximation of RNA Folding Partition Function and Base Pairing Probabilities
Data Structures and Algorithms, Biological Physics, Biomolecules, Quantitative Methods. 4 authors. pdf
RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy (MFE) methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. …
However, the classic partition function algorithm scales cubically with sequence length, and is therefore a slow calculation for long sequences. This slowness is even more severe than cubic-time MFE-based methods due to a larger constant factor in runtime. Inspired by the success of LinearFold algorithm that computes the MFE structure in linear time, we address this issue by proposing a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities. LinearPartition is 256x faster than Vienna RNAfold for a sequence with length 15,780, and 2,771x faster than CONTRAfold for a sequence with length 32,753. Interestingly, although LinearPartition is approximate, it runs in linear time without sacrificing accuracy when base pair probabilities are used to assemble structures, and even leads to a small accuracy improvement on longer families (16S and 23S rRNA).
Populations and Evolution (q-bio.PE)
Optimal evolutionary control for artificial selection on molecular phenotypes
Populations and Evolution, Biological Physics, Quantitative Methods. 2 authors. pdf
Controlling an evolving population is an important task in modern molecular genetics, including in directed evolution to improve the activity of molecules and enzymes, breeding experiments in animals and in plants, and in devising public health strategies to suppress evolving pathogens. An optimal intervention to direct evolution should be designed by considering its impact over an entire stochastic evolutionary trajectory that follows. …
As a result, a seemingly suboptimal intervention at a given time can be globally optimal as it can open opportunities for desirable actions in the future. Here, we propose a feedback control formalism to devise globally optimal artificial selection protocol to direct the evolution of molecular phenotypes. We show that artificial selection should be designed to counter evolutionary tradeoffs among multi-variate phenotypes to avoid undesirable outcomes in one phenotype by imposing selection on another. Control by artificial selection is challenged by our ability to predict molecular evolution. We develop an information theoretical framework and show that molecular time-scales for evolution under natural selection can inform how to monitor a population to acquire sufficient predictive information for an effective intervention with artificial selection. Our formalism opens a new avenue for devising optimal artificial selection for directed evolution of molecular functions.

Other

Earth and Planetary Astrophysics (astro-ph.EP)
Production of nitric oxide by a fragmenting bolide: An exploratory numerical study
Computational Physics, Earth and Planetary Astrophysics. 3 authors. pdf
A meteoroid’s hypersonic passage through the Earth’s atmosphere results in ablational and fragmentational mass loss. Potential shock waves associated with a parent object as well as its fragments can modify the surrounding atmosphere and produce a range of physico-chemical effects. …
Some of the thermally driven chemical and physical processes induced by meteoroid-fragment generated shock waves, such as nitric oxide (NO) production, are less understood. Any estimates of meteoric NO production depend not only on a quantifiable meteoroid population and a rate of fragmentation, with a size capable of producing high temperature flows, but also on understanding the physical properties of the meteor flows along with their thermal history. We performed an exploratory pilot numerical study using ANSYS Fluent, the CFD code, to investigate the production of NO in the upper atmosphere by small meteoroids (or fragments of meteoroids after they undergo a disruption episode) in the size range from 10-2 m to 1 m. Our model uses the simulation of a spherical body in the continuum flow at 70 and 80 km altitude to approximate the behaviour of a small meteoroid capable of producing NO. The results presented in this exploratory study are in good agreement with previous studies.

Quantum Physics

Quantum Physics (quant-ph)
Quantum annealing approach to Ionic Diffusion in Solid
Computational Physics, Quantum Physics. 5 authors. pdf
We developed a framework to evaluate a key quantity to describe the ionic diffusion in solids, the correlation factor, for which we can apply the quantum annealing computation to get rid of the current limitation for the evaluation being possible only for the simple models far from the practical diffusion paths. Though the current \({\it ab initio}\) technique can provide the quantitative information about the diffusion paths network, the difficulty to evaluate the coefficient has hampered to connect such microscopic \({\it ab initio}\) works with macroscopic quantities like diffusion coefficients. …
By applying the framework, we can evaluate how the diffusion coefficients are controlled by temperatures, pressures, atomic substitutions \({\it etc.}\) when coupled with \({\it ab initio}\) technique. We formulated the problem in terms of the mapping into a quantum spin systems described by the Ising Hamiltonian. Calibrating verifications to get a value of the coefficient already known for a simple model is performed on the comparisons amongst various possible methods including simulated quantum annealing on the spin models, the classical random walk, and the matrix description for the problem. We have confirmed that all the evaluations give consistent results with each other though some of the conventional approaches require infeasible computational costs supporting the advantage of the quantum annealing application.