38 new data science research articles were published on 2020-01-02. 17 discussed machine learning.
The table shows yesterday’s counts of papers submitted to www.arxiv.org, grouped by primary subject. Click the links in the table to be redirected to the abstracts below. The links under Subject will redirect you to abstracts with that primary subject (there can only be one primary subject on arXiv). The links under Category will redirect you to all of yesterday’s publications with a given tag (primary or secondary).
| Subject | Category | N |
|---|---|---|
| Computer Science (18) | Machine Learning (cs.LG) | 10 (6) |
| | Computer Vision and Pattern Recognition (cs.CV) | 5 (6) |
| | Artificial Intelligence (cs.AI) | 1 (2) |
| | Databases (cs.DB) | 1 |
| | Software Engineering (cs.SE) | 1 |
| Statistics (6) | Machine Learning (stat.ML) | 2 (12) |
| | Applications (stat.AP) | 2 (1) |
| | Methodology (stat.ME) | 2 (1) |
| Elec. Eng. and Systems Science (4) | Image and Video Processing (eess.IV) | 4 |
| Mathematics (3) | Statistics Theory (math.ST) | 2 |
| | Numerical Analysis (math.NA) | 1 |
| Physics (3) | Computational Physics (physics.comp-ph) | 1 (1) |
| | Chemical Physics (physics.chem-ph) | 1 |
| | Data Analysis, Statistics and Probability (physics.data-an) | 1 |
| Other (1) | Instrumentation and Methods for Astrophysics (astro-ph.IM) | 1 |
| Quantitative Biology (1) | Populations and Evolution (q-bio.PE) | 1 |
| Quantitative Finance (1) | Risk Management (q-fin.RM) | 1 |
| Quantum Physics (1) | Quantum Physics (quant-ph) | 1 |
This section contains all articles with any tag of stat.AP, stat.CO, stat.ML, cs.LG, q-fin.ST, q-fin.EC, or econ.EM. Only the first two sentences are shown; click the links for more detail.
Applications (stat.AP) |
CircSpaceTime: an R package for spatial and spatio-temporal modeling of Circular data Applications. 3 authors. pdf CircSpaceTime is the only R package currently available that implements Bayesian models for spatial and spatio-temporal interpolation of circular data. Such data are often found in applications where, among the many, wind directions, animal movement directions, and wave directions are involved. …To analyze such data we need models for observations at locations s and times t, as the so-called geostatistical models, providing structured dependence assumed to decay in distance and time. The approach we take begins with Gaussian processes defined for linear variables over space and time. Then, we use either wrapping or projection to obtain processes for circular data. The models are cast as hierarchical, with fitting and inference within a Bayesian framework. Altogether, this package implements work developed by a series of papers; the most relevant being Jona Lasinio, Gelfand, and Jona Lasinio (2012); Wang and Gelfand (2014); Mastrantonio, Jona Lasinio, and Gelfand (2016). All procedures are written using Rcpp. Estimates are obtained by MCMC allowing parallelized multiple chains run. The implementation of the proposed models is considerably improved on the simple routines adopted in the research papers. As original running examples, for the spatial and spatio-temporal settings, we use wind directions datasets over central Italy. |
Concentration of Benefit index: A threshold-free summary metric for quantifying the capacity of covariates to yield efficient treatment rules Applications, Methodology. 3 authors. pdf When data on treatment assignment, outcomes, and covariates from a randomized trial are available, a question of interest is to what extent covariates can be used to optimize treatment decisions. Statistical hypothesis testing of covariate-by-treatment interaction is ill-suited for this purpose. …The application of decision theory results in treatment rules that compare the expected benefit of treatment given the patient’s covariates against a treatment threshold. However, determining the treatment threshold is often context-specific, and any given threshold might seem arbitrary when the overall capacity towards predicting treatment benefit is of concern. We propose the Concentration of Benefit index (Cb), a threshold-free metric that quantifies the combined performance of covariates towards finding individuals who will benefit the most from treatment. The construct of the proposed index is comparing expected treatment outcomes with and without knowledge of covariates when one of two randomly selected patients is to be treated. We show that the resulting index can also be expressed in terms of the integrated efficiency of individualized treatment decision over the entire range of treatment thresholds. We propose parametric and semi-parametric estimators, the latter being suitable for out-of-sample validation and correction for optimism. We used data from a clinical trial to demonstrate the calculations in a step-by-step fashion, and have provided the R code for implementation (https://github.com/msadatsafavi/txBenefit). The proposed index has intuitive and theoretically sound interpretation and can be estimated with relative ease for a wide class of regression models. Beyond the conceptual developments, various aspects of estimation and inference for such a metric need to be pursued in future research. |
The Impact of the Choice of Risk and Dispersion Measure on Procyclicality Risk Management, Applications. 2 authors. pdf Procyclicality of historical risk measure estimation means that one tends to over-estimate future risk when present realized volatility is high and vice versa under-estimate future risk when the realized volatility is low. Out of it different questions arise, relevant for applications and theory: What are the factors which affect the degree of procyclicality? More specifically, how does the choice of risk measure affect this? How does this behaviour vary with the choice of realized volatility estimator? How do different underlying model assumptions influence the pro-cyclical effect? In this paper we consider three different well-known risk measures (Value-at-Risk, Expected Shortfall, Expectile), the r-th absolute centred sample moment, for any integer \(r>0\), as realized volatility estimator (this includes the sample variance and the sample mean absolute deviation around the sample mean) and two models (either an iid model or an augmented GARCH(\(p\),\(q\)) model). …We show that the strength of procyclicality depends on these three factors, the choice of risk measure, the realized volatility estimator and the model considered. But, no matter the choices, the procyclicality will always be present. |
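The quantities named in the procyclicality abstract above can be illustrated with a minimal sketch. It assumes plain historical (empirical) estimators on a list of losses; the function names and the order-statistic convention are illustrative choices, not the authors' code, and the expectile is omitted for brevity. Note that the r-th absolute centred sample moment uses a 1/n normalisation, so r = 2 gives the (biased) sample variance and r = 1 the mean absolute deviation around the sample mean.

```python
import math


def value_at_risk(losses, alpha):
    """Historical VaR: an empirical alpha-quantile of the loss sample.

    Assumption: we take the smallest order statistic such that at least
    a fraction alpha of the sample lies at or below it.
    """
    ordered = sorted(losses)
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def expected_shortfall(losses, alpha):
    """Historical ES: average of the losses at or above the VaR level."""
    var = value_at_risk(losses, alpha)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)


def abs_centred_moment(sample, r):
    """r-th absolute centred sample moment: mean of |x_i - mean|^r."""
    mean = sum(sample) / len(sample)
    return sum(abs(x - mean) ** r for x in sample) / len(sample)


if __name__ == "__main__":
    losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up illustrative data
    print(value_at_risk(losses, 0.5))
    print(expected_shortfall(losses, 0.5))
    print(abs_centred_moment(losses, 1), abs_centred_moment(losses, 2))
```

Other quantile conventions (for example, interpolating between order statistics) are equally common and would give slightly different values on small samples.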
Machine Learning (stat.ML) |
Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics Machine Learning, Machine Learning, Robotics. 10 authors. pdf Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or fully discrete action spaces. …These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used to remove discrete actions from an otherwise continuous space. In contrast, we propose to treat hybrid problems in their ‘native’ form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously. In our experiments, we first demonstrate that the proposed approach efficiently solves such natively hybrid reinforcement learning problems. We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics. Lastly, hybrid reinforcement learning encourages us to rethink problem definitions. We propose reformulating control problems, e.g. by adding meta actions, to improve exploration or reduce mechanical wear and tear. |
Using Data Imputation for Signal Separation in High Contrast Imaging Machine Learning, Earth and Planetary Astrophysics, Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics. 8 authors. pdf To characterize circumstellar systems in high contrast imaging, the fundamental step is to construct a best point spread function (PSF) template for the non-circumstellar signals (i.e. …, star light and speckles) and separate it from the observation. With existing PSF construction methods, the circumstellar signals (e.g., planets, circumstellar disks) are unavoidably altered by over-fitting and/or self-subtraction, making forward modeling a necessity to recover these signals. We present a forward modeling–free solution to these problems with data imputation using sequential non-negative matrix factorization (DI-sNMF). DI-sNMF first converts this signal separation problem to a “missing data” problem in statistics by flagging the regions which host circumstellar signals as missing data, then attributes PSF signals to these regions. We mathematically prove it to have negligible alteration to circumstellar signals when the imputation region is relatively small, which thus enables precise measurement for these circumstellar objects. We apply it to simulated point source and circumstellar disk observations to demonstrate its proper recovery of them. We apply it to Gemini Planet Imager (GPI) K1-band observations of the debris disk surrounding HR 4796A, finding a tentative trend that the dust is more forward scattering as the wavelength increases. We expect DI-sNMF to be applicable to other general scenarios where the separation of signals is needed. |
Reasoning on Knowledge Graphs with Debate Dynamics Machine Learning, Machine Learning. 6 authors. pdf We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments – paths in the knowledge graph – with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. …Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users. |
Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation Machine Learning, Machine Learning, Robotics, Artificial Intelligence. 4 authors. pdf Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. …On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity - there are an infinite number of possible reward functions that could explain an expert’s demonstration and 2) heterogeneity - human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans’ strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy’s objective (handling heterogeneity). We demonstrate our algorithm can better recover task reward and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task. |
Restricting the Flow: Information Bottlenecks for Attribution Machine Learning, Machine Learning, Computer Vision and Pattern Recognition. 4 authors. pdf Attribution methods provide insights into the decision-making of machine learning models like artificial neural networks. For a given input sample, they assign a relevance score to each individual input variable, such as the pixels of an image. …In this work we adapt the information bottleneck concept for attribution. By adding noise to intermediate feature maps we restrict the flow of information and can quantify (in bits) how much information image regions provide. We compare our method against ten baselines using three different metrics on VGG-16 and ResNet-50, and find that our methods outperform all baselines in five out of six settings. The method’s information-theoretic foundation provides an absolute frame of reference for attribution values (bits) and a guarantee that regions scored close to zero are not necessary for the network’s decision. |
Kernelized Support Tensor Train Machines Machine Learning, Machine Learning, Computer Vision and Pattern Recognition. 4 authors. pdf Tensor, a multi-dimensional data structure, has been exploited recently in the machine learning community. Traditional machine learning approaches are vector- or matrix-based, and cannot handle tensorial data directly. …In this paper, we propose a tensor train (TT)-based kernel technique for the first time, and apply it to the conventional support vector machine (SVM) for image classification. Specifically, we propose a kernelized support tensor train machine that accepts tensorial input and preserves the intrinsic kernel property. The main contributions are threefold. First, we propose a TT-based feature mapping procedure that maintains the TT structure in the feature space. Second, we demonstrate two ways to construct the TT-based kernel function while considering consistency with the TT inner product and preservation of information. Third, we show that it is possible to apply different kernel functions on different data modes. In principle, our method tensorizes the standard SVM on its input structure and kernel mapping scheme. Extensive experiments are performed on real-world tensor data, which demonstrates the superiority of the proposed scheme under few-sample high-dimensional inputs. |
Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means Machine Learning, Machine Learning. 4 authors. pdf Marine buoys aid in the battle against Illegal, Unreported and Unregulated (IUU) fishing by detecting fishing vessels in their vicinity. Marine buoys, however, may be disrupted by natural causes and buoy vandalism. …To minimize the effects of buoy disruption on a buoy network, we propose a more robust buoy placement using dropout k-means and dropout k-median. We apply dropout k-means and dropout k-median to determine locations for deploying marine buoys in the Gabonese waters near West Africa. We simulated the passage of ships using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and dropout k-median to classic k-median, taking into account that the current sensor detection radius is 10km. With 5 buoys, the buoy arrangement computed by classic k-means, dropout k-means, classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 48% and 52%. |
Non-Parametric Learning of Gaifman Models Machine Learning, Machine Learning. 4 authors. pdf We consider the problem of structure learning for Gaifman models and learn relational features that can be used to derive feature representations from a knowledge base. These relational features are first-order rules that are then partially grounded and counted over local neighborhoods of a Gaifman model to obtain the feature representations. …We propose a method for learning these relational features for a Gaifman model by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over classical rule-learning. |
A Deep Structural Model for Analyzing Correlated Multivariate Time Series Machine Learning, Machine Learning. 3 authors. pdf Multivariate time series are routinely encountered in real-world applications, and in many cases, these time series are strongly correlated. In this paper, we present a deep learning structural time series model which can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. …The trend is learned via a 1D and 2D temporal CNN and LSTM hierarchical neural net. The CNN-LSTM architecture can (i) seamlessly leverage the dependency among multiple correlated time series in a natural way, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize the long-term sequential pattern. The seasonality component is approximated via a non-linear function of a set of Fourier terms, and the event components are learned by a simple linear function of regressors encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices for corporate stocks in the same category. |
Adversarial Policies in Learning Systems with Malicious Experts Multiagent Systems, Machine Learning, Machine Learning, Cryptography and Security. 3 authors. pdf We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts’ advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. …The loss of the system is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true outcomes. We consider this problem under both offline and online settings. In the offline setting where the malicious expert must choose its entire sequence of decisions a priori, we show somewhat surprisingly that a simple greedy policy of always reporting false prediction is asymptotically optimal with an approximation ratio of \(1+O(\sqrt{\frac{\ln N}{N}})\), where \(N\) is the total number of prediction stages. In particular, we describe a policy that closely resembles the structure of the optimal offline policy. For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in \(O(N^2)\). Our results provide a new direction for vulnerability assessment of commonly used learning algorithms to adversarial attacks where the threat is an integral part of the system. |
Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition Machine Learning, Machine Learning. 3 authors. pdf Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then retraining on a new one is called transfer learning. …In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks. We perform three sets of experiments with varying levels of similarity between source and target tasks to investigate the behaviour of different types of knowledge transfer. We transfer both parameters and features and analyse their behaviour. Our results demonstrate that no significant advantage is gained by using a transfer learning approach over a traditional machine learning approach for our character recognition tasks. This suggests that using transfer learning does not necessarily presuppose a better performing model in all cases. |
On Consequentialism and Fairness Computers and Society, Machine Learning, Artificial Intelligence, Machine Learning. 2 authors. pdf Recent work on fairness in machine learning has primarily emphasized how to define, quantify, and encourage “fair” outcomes. Less attention has been paid, however, to the ethical foundations which underlie such efforts. …Among the ethical perspectives that should be taken into consideration is consequentialism, the position that, roughly speaking, outcomes are all that matter. Although consequentialism is not free from difficulties, and although it does not necessarily provide a tractable way of choosing actions (because of the combined problems of uncertainty, subjectivity, and aggregation), it nevertheless provides a powerful foundation from which to critique the existing literature on machine learning fairness. Moreover, it brings to the fore some of the tradeoffs involved, including the problem of who counts, the pros and cons of using a policy, and the relative value of the distant future. In this paper we provide a consequentialist critique of common definitions of fairness within machine learning, as well as a machine learning perspective on consequentialism. We conclude with a broader discussion of the issues of learning and randomization, which have important implications for the ethics of automated decision making systems. |
Thresholds of descending algorithms in inference problems Machine Learning, Machine Learning, Disordered Systems and Neural Networks. 2 authors. pdf We review recent works on analyzing the dynamics of gradient-based algorithms in a prototypical statistical inference problem. Using methods and insights from the physics of glassy systems, these works showed how to understand quantitatively and qualitatively the performance of gradient-based algorithms. …Here we review the key results and their interpretation in non-technical terms accessible to a wide audience of physicists in the context of related works. |
Reject Illegal Inputs with Generative Classifier Derived from Any Discriminative Classifier Machine Learning, Machine Learning. 1 author. pdf Generative classifiers have been shown to be promising for detecting illegal inputs, including adversarial examples and out-of-distribution samples. Supervised Deep Infomax (SDIM) is a scalable end-to-end framework to learn generative classifiers. …In this paper, we propose a modification of SDIM termed SDIM-. Instead of training a generative classifier from scratch, SDIM- first takes as input the logits produced by any given discriminative classifier and generates logit representations; then a generative classifier is derived by imposing statistical constraints on the logit representations. SDIM- could inherit the performance of the discriminative classifier without loss. SDIM- incurs a negligible number of additional parameters, and can be efficiently trained with base classifiers fixed. We perform , where test samples whose class conditionals are smaller than pre-chosen thresholds will be rejected without predictions. Experiments on illegal inputs, including adversarial examples, samples with common corruptions, and out-of-distribution (OOD) samples, show that, when allowed to reject a portion of test samples, SDIM- significantly improves the performance on the remaining test sets. |
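The rejection rule described in the last abstract above (reject a test sample when its class conditional falls below a pre-chosen threshold) can be sketched in a few lines. This is a hedged illustration only: the function name, the use of log-likelihood-style scores, and the choice to compare the best-scoring class against its own threshold are assumptions, not details taken from the paper.

```python
def classify_with_rejection(class_scores, thresholds):
    """Return the predicted class index, or None if the sample is rejected.

    class_scores: hypothetical per-class conditional scores for one sample
                  (for example log-likelihoods; higher means more plausible).
    thresholds:   hypothetical pre-chosen per-class thresholds.
    """
    # Pick the class with the highest class-conditional score.
    best = max(range(len(class_scores)), key=lambda c: class_scores[c])
    # Reject without a prediction if even the best class is implausible.
    if class_scores[best] < thresholds[best]:
        return None
    return best


if __name__ == "__main__":
    thresholds = [-2.0, -2.0, -2.0]  # made-up values for illustration
    print(classify_with_rejection([-5.0, -1.0, -3.0], thresholds))
    print(classify_with_rejection([-5.0, -4.0, -3.0], thresholds))
```

In practice the thresholds would be calibrated on held-out data, which trades the fraction of rejected samples against accuracy on the samples that are kept.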
Machine Learning (cs.LG) |
Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics Machine Learning, Machine Learning, Robotics. 10 authors. pdf Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or fully discrete action spaces. …These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used to remove discrete actions from an otherwise continuous space. In contrast, we propose to treat hybrid problems in their ‘native’ form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously. In our experiments, we first demonstrate that the proposed approach efficiently solves such natively hybrid reinforcement learning problems. We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics. Lastly, hybrid reinforcement learning encourages us to rethink problem definitions. We propose reformulating control problems, e.g. by adding meta actions, to improve exploration or reduce mechanical wear and tear. |
Reasoning on Knowledge Graphs with Debate Dynamics Machine Learning, Machine Learning. 6 authors. pdf We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments – paths in the knowledge graph – with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. …Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users. |
Cost-Function-Dependent Barren Plateaus in Shallow Quantum Neural Networks Machine Learning, Quantum Physics. 5 authors. pdf Variational quantum algorithms (VQAs) optimize the parameters \(\boldsymbol{\theta}\) of a quantum neural network \(V(\boldsymbol{\theta})\) to minimize a cost function \(C\). While VQAs may enable practical applications of noisy quantum computers, they are nevertheless heuristic methods with unproven scaling. …Here, we rigorously prove two results. Our first result states that defining \(C\) in terms of global observables leads to an exponentially vanishing gradient (i.e., a barren plateau) even when \(V(\boldsymbol{\theta})\) is shallow. This implies that several VQAs in the literature must revise their proposed cost functions. On the other hand, our second result states that defining \(C\) with local observables leads to a polynomially vanishing gradient, so long as the depth of \(V(\boldsymbol{\theta})\) is \(\mathcal{O}(\log n)\). Taken together, our results establish a precise connection between locality and trainability. Finally, we illustrate these ideas with large-scale simulations, up to 100 qubits, of a particular VQA known as quantum autoencoders. |
Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation Machine Learning, Machine Learning, Robotics, Artificial Intelligence. 4 authors. pdf Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. …On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity - there are an infinite number of possible reward functions that could explain an expert’s demonstration and 2) heterogeneity - human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans’ strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy’s objective (handling heterogeneity). We demonstrate our algorithm can better recover task reward and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task. |
Restricting the Flow: Information Bottlenecks for Attribution Machine Learning, Machine Learning, Computer Vision and Pattern Recognition. 4 authors. pdf Attribution methods provide insights into the decision-making of machine learning models like artificial neural networks. For a given input sample, they assign a relevance score to each individual input variable, such as the pixels of an image. …In this work we adapt the information bottleneck concept for attribution. By adding noise to intermediate feature maps we restrict the flow of information and can quantify (in bits) how much information image regions provide. We compare our method against ten baselines using three different metrics on VGG-16 and ResNet-50, and find that our methods outperform all baselines in five out of six settings. The method’s information-theoretic foundation provides an absolute frame of reference for attribution values (bits) and a guarantee that regions scored close to zero are not necessary for the network’s decision. |
Kernelized Support Tensor Train Machines Machine Learning, Machine Learning, Computer Vision and Pattern Recognition. 4 authors. pdf Tensor, a multi-dimensional data structure, has been exploited recently in the machine learning community. Traditional machine learning approaches are vector- or matrix-based, and cannot handle tensorial data directly. …In this paper, we propose a tensor train (TT)-based kernel technique for the first time, and apply it to the conventional support vector machine (SVM) for image classification. Specifically, we propose a kernelized support tensor train machine that accepts tensorial input and preserves the intrinsic kernel property. The main contributions are threefold. First, we propose a TT-based feature mapping procedure that maintains the TT structure in the feature space. Second, we demonstrate two ways to construct the TT-based kernel function while considering consistency with the TT inner product and preservation of information. Third, we show that it is possible to apply different kernel functions on different data modes. In principle, our method tensorizes the standard SVM on its input structure and kernel mapping scheme. Extensive experiments are performed on real-world tensor data, which demonstrates the superiority of the proposed scheme under few-sample high-dimensional inputs. |
Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means Machine Learning, Machine Learning. 4 authors. pdf Marine buoys aid in the battle against Illegal, Unreported and Unregulated (IUU) fishing by detecting fishing vessels in their vicinity. Marine buoys, however, may be disrupted by natural causes and buoy vandalism. …To minimize the effects of buoy disruption on a buoy network, we propose a more robust buoy placement using dropout k-means and dropout k-median. We apply dropout k-means and dropout k-median to determine locations for deploying marine buoys in the Gabonese waters near West Africa. We simulated the passage of ships using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and dropout k-median to classic k-median, taking into account that the current sensor detection radius is 10km. With 5 buoys, the buoy arrangement computed by classic k-means, dropout k-means, classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 48% and 52%. |
Non-Parametric Learning of Gaifman Models Machine Learning, Machine Learning. 4 authors. pdf We consider the problem of structure learning for Gaifman models and learn relational features that can be used to derive feature representations from a knowledge base. These relational features are first-order rules that are then partially grounded and counted over local neighborhoods of a Gaifman model to obtain the feature representations. …We propose a method for learning these relational features for a Gaifman model by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over classical rule-learning. |
PrivacyNet: Semi-Adversarial Networks for Multi-attribute Face Privacy Machine Learning, Computer Vision and Pattern Recognition. 3 authors. pdf In recent years, the utilization of biometric information has become more and more common for various forms of identity verification and user authentication. However, as a consequence of the widespread use and storage of biometric information, concerns regarding sensitive information leakage and the protection of users’ privacy have been raised. …Recent research efforts targeted these concerns by proposing the Semi-Adversarial Networks (SAN) framework for imparting gender privacy to face images. The objective of SAN is to perturb face image data such that it cannot be reliably used by a gender classifier but can still be used by a face matcher for matching purposes. In this work, we propose a novel Generative Adversarial Networks-based SAN model, PrivacyNet, that is capable of imparting selective soft biometric privacy to multiple soft-biometric attributes such as gender, age, and race. While PrivacyNet is capable of perturbing different sources of soft biometric information reliably and simultaneously, it also allows users to choose to obfuscate specific attributes, while preserving others. The results from extensive experiments on five independent face image databases demonstrate the efficacy of our proposed model in imparting selective multi-attribute privacy to face images. |
A Deep Structural Model for Analyzing Correlated Multivariate Time Series Machine Learning, Machine Learning. 3 authors. pdf Multivariate time series are routinely encountered in real-world applications, and in many cases, these time series are strongly correlated. In this paper, we present a deep learning structural time series model which can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. …The trend is learned via a 1D and 2D temporal CNN and LSTM hierarchical neural net. The CNN-LSTM architecture can (i) seamlessly leverage the dependency among multiple correlated time series in a natural way, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize the long-term sequential pattern. The seasonality component is approximated via a non-linear function of a set of Fourier terms, and the event components are learned by a simple linear function of regressors encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices for corporate stocks in the same category. |
Adversarial Policies in Learning Systems with Malicious Experts Multiagent Systems, Machine Learning, Machine Learning, Cryptography and Security. 3 authors. pdf We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts’ advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. …The loss of the system is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true outcomes. We consider this problem under both offline and online settings. In the offline setting where the malicious expert must choose its entire sequence of decisions a priori, we show somewhat surprisingly that a simple greedy policy of always reporting false prediction is asymptotically optimal with an approximation ratio of \(1+O(\sqrt{\frac{\ln N}{N}})\), where \(N\) is the total number of prediction stages. In particular, we describe a policy that closely resembles the structure of the optimal offline policy. For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in \(O(N^2)\). Our results provide a new direction for vulnerability assessment of commonly used learning algorithms to adversarial attacks where the threat is an integral part of the system. |
Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition Machine Learning, Machine Learning. 3 authors. pdf Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then retraining on a new one is called transfer learning. …In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks. We perform three sets of experiments with varying levels of similarity between source and target tasks to investigate the behaviour of different types of knowledge transfer. We transfer both parameters and features and analyse their behaviour. Our results demonstrate that no significant advantage is gained by using a transfer learning approach over a traditional machine learning approach for our character recognition tasks. This suggests that using transfer learning does not necessarily presuppose a better performing model in all cases. |
On Consequentialism and Fairness Computers and Society, Machine Learning, Artificial Intelligence, Machine Learning. 2 authors. pdf Recent work on fairness in machine learning has primarily emphasized how to define, quantify, and encourage “fair” outcomes. Less attention has been paid, however, to the ethical foundations which underlie such efforts. …Among the ethical perspectives that should be taken into consideration is consequentialism, the position that, roughly speaking, outcomes are all that matter. Although consequentialism is not free from difficulties, and although it does not necessarily provide a tractable way of choosing actions (because of the combined problems of uncertainty, subjectivity, and aggregation), it nevertheless provides a powerful foundation from which to critique the existing literature on machine learning fairness. Moreover, it brings to the fore some of the tradeoffs involved, including the problem of who counts, the pros and cons of using a policy, and the relative value of the distant future. In this paper we provide a consequentialist critique of common definitions of fairness within machine learning, as well as a machine learning perspective on consequentialism. We conclude with a broader discussion of the issues of learning and randomization, which have important implications for the ethics of automated decision making systems. |
Lightweight Residual Densely Connected Convolutional Neural Network Machine Learning, Computer Vision and Pattern Recognition. 2 authors. pdf Extremely efficient convolutional neural network architectures are one of the most important requirements for limited computing power devices (such as embedded and mobile devices). Recently, some architectures have been proposed to overcome this limitation by considering specific hardware-software equipment. …In this paper, the residual densely connected blocks are proposed to guarantee the deep supervision, efficient gradient flow, and feature reuse abilities of convolutional neural networks. The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment by just reducing the number of parameters and computational operations while achieving a feasible accuracy. Extensive experimental results demonstrate that the proposed architecture is more efficient than the AlexNet and VGGNet in terms of model size, required parameters, and even accuracy. The proposed model is evaluated on the ImageNet, MNIST, Fashion MNIST, SVHN, CIFAR-10, and CIFAR-100. It achieves state-of-the-art results on the Fashion MNIST dataset and reasonable results on the others. The obtained results show that the proposed model is superior to efficient models such as the SqueezeNet and is also comparable with the state-of-the-art efficient models such as CondenseNet and ShuffleNet. |
Thresholds of descending algorithms in inference problems Machine Learning, Machine Learning, Disordered Systems and Neural Networks. 2 authors. pdf We review recent works on analyzing the dynamics of gradient-based algorithms in a prototypical statistical inference problem. Using methods and insights from the physics of glassy systems, these works showed how to understand quantitatively and qualitatively the performance of gradient-based algorithms. …Here we review the key results and their interpretation in non-technical terms accessible to a wide audience of physicists in the context of related works. |
Reject Illegal Inputs with Generative Classifier Derived from Any Discriminative Classifier Machine Learning, Machine Learning. 1 author. pdf Generative classifiers have shown promise in detecting illegal inputs, including adversarial examples and out-of-distribution samples. Supervised Deep Infomax (SDIM) is a scalable end-to-end framework to learn generative classifiers. …In this paper, we propose a modification of SDIM termed SDIM-. Instead of training a generative classifier from scratch, SDIM- first takes as input the logits produced by any given discriminative classifier and generates logit representations; then a generative classifier is derived by imposing statistical constraints on the logit representations. SDIM- could inherit the performance of the discriminative classifier without loss. SDIM- incurs a negligible number of additional parameters, and can be efficiently trained with base classifiers fixed. Test samples whose class conditionals are smaller than pre-chosen thresholds are rejected without predictions. Experiments on illegal inputs, including adversarial examples, samples with common corruptions, and out-of-distribution (OOD) samples, show that, when allowed to reject a portion of test samples, SDIM- significantly improves the performance on the remaining test samples. |
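The rejection rule described in the last abstract above (discard a test sample when its class conditional falls below a pre-chosen threshold) can be illustrated with a minimal sketch. This is not the authors' code: the function name `classify_with_rejection`, the use of one threshold per class, and the choice to compare only the best-scoring class against its threshold are all assumptions made for illustration.

```python
def classify_with_rejection(class_scores, thresholds):
    """Return the index of the best-scoring class, or None if rejected.

    class_scores: one score per class for a single sample, standing in for
        the class conditionals (for example log-likelihoods). Hypothetical input.
    thresholds: one pre-chosen rejection threshold per class. Hypothetical input.
    """
    # Pick the class with the largest score.
    best = max(range(len(class_scores)), key=lambda k: class_scores[k])
    # Reject the sample if even the best class is below its threshold.
    if class_scores[best] < thresholds[best]:
        return None
    return best
```

A caller would typically report accuracy only on the samples that were not rejected, which is how rejecting some inputs can raise the measured performance on the rest.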
The tables below show abstracts organized by category with hyperlinks back to the arXiv site.
Machine Learning (cs.LG) |
Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics Machine Learning, Machine Learning, Robotics. 10 authors. pdf Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or fully discrete action spaces. …These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used to remove discrete actions from an otherwise continuous space. In contrast, we propose to treat hybrid problems in their ‘native’ form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously. In our experiments, we first demonstrate that the proposed approach efficiently solves such natively hybrid reinforcement learning problems. We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics. Lastly, hybrid reinforcement learning encourages us to rethink problem definitions. We propose reformulating control problems, e.g. by adding meta actions, to improve exploration or reduce mechanical wear and tear. |
Reasoning on Knowledge Graphs with Debate Dynamics Machine Learning, Machine Learning. 6 authors. pdf We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments – paths in the knowledge graph – with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. …Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users. |
Dataset of Video Game Development Problems Software Engineering. 5 authors. pdf Different from traditional software development, there is little information about the software-engineering process and techniques in video-game development. One popular way to share knowledge among the video-game developers’ community is the publishing of postmortems, which are documents summarizing what happened during the video-game development project. …However, these documents are written without formal structure and often provide disparate information. Through this paper, we provide developers and researchers with a grounded dataset describing software-engineering problems in video-game development extracted from postmortems. We created the dataset using an iterative method through which we manually coded more than 200 postmortems spanning 20 years (1998 to 2018) and extracted 1,035 problems related to software engineering while maintaining traceability links to the postmortems. We grouped the problems into 20 different types. This dataset is useful to understand the problems faced by developers during video-game development, providing researchers and practitioners a starting point to study video-game development in the context of software engineering. |
Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation Machine Learning, Machine Learning, Robotics, Artificial Intelligence. 4 authors. pdf Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. …On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity - there are an infinite number of possible reward functions that could explain an expert’s demonstration and 2) heterogeneity - human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans’ strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy’s objective (handling heterogeneity). We demonstrate our algorithm can better recover task reward and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task. |
Butterfly detection and classification based on integrated YOLO algorithm Computer Vision and Pattern Recognition. 4 authors. pdf Insects are an abundant group of species on earth, and the task of identifying insects is complex and arduous. How to apply artificial intelligence technology and digital image processing methods to the automatic identification of insect species is a hot issue in current research. …In this paper, the problem of automatic detection and classification of butterfly photographs is studied, and a method of bio-labeling suitable for butterfly classification is proposed. By synthesizing the results of YOLO models with different training mechanisms, an automatic butterfly detection and classification algorithm based on the YOLO algorithm is proposed. It greatly improves the generalization ability of the YOLO algorithm and makes it better able to solve small-sample problems. The experimental results show that the proposed annotation method and integrated YOLO algorithm have high accuracy and recognition rate in automatic butterfly detection and recognition. |
Kernelized Support Tensor Train Machines Machine Learning, Machine Learning, Computer Vision and Pattern Recognition. 4 authors. pdf Tensor, a multi-dimensional data structure, has been exploited recently in the machine learning community. Traditional machine learning approaches are vector- or matrix-based, and cannot handle tensorial data directly. …In this paper, we propose a tensor train (TT)-based kernel technique for the first time, and apply it to the conventional support vector machine (SVM) for image classification. Specifically, we propose a kernelized support tensor train machine that accepts tensorial input and preserves the intrinsic kernel property. The main contributions are threefold. First, we propose a TT-based feature mapping procedure that maintains the TT structure in the feature space. Second, we demonstrate two ways to construct the TT-based kernel function while considering consistency with the TT inner product and preservation of information. Third, we show that it is possible to apply different kernel functions on different data modes. In principle, our method tensorizes the standard SVM on its input structure and kernel mapping scheme. Extensive experiments are performed on real-world tensor data, which demonstrates the superiority of the proposed scheme under few-sample high-dimensional inputs. |
Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means Machine Learning, Machine Learning. 4 authors. pdf Marine buoys aid in the battle against Illegal, Unreported and Unregulated (IUU) fishing by detecting fishing vessels in their vicinity. Marine buoys, however, may be disrupted by natural causes and buoy vandalism. …To minimize the effects of buoy disruption on a buoy network, we propose a more robust buoy placement using dropout k-means and dropout k-median. We apply dropout k-means and dropout k-median to determine locations for deploying marine buoys in the Gabonese waters near West Africa. We simulated the passage of ships using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and dropout k-median to classic k-median, taking into account that the current sensor detection radius is 10 km. With 5 buoys, the buoy arrangements computed by classic k-means, dropout k-means, classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 48% and 52%, respectively. |
Non-Parametric Learning of Gaifman Models Machine Learning, Machine Learning. 4 authors. pdf We consider the problem of structure learning for Gaifman models and learn relational features that can be used to derive feature representations from a knowledge base. These relational features are first-order rules that are then partially grounded and counted over local neighborhoods of a Gaifman model to obtain the feature representations. …We propose a method for learning these relational features for a Gaifman model by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over classical rule-learning. |
PrivacyNet: Semi-Adversarial Networks for Multi-attribute Face Privacy Machine Learning, Computer Vision and Pattern Recognition. 3 authors. pdf In recent years, the utilization of biometric information has become more and more common for various forms of identity verification and user authentication. However, as a consequence of the widespread use and storage of biometric information, concerns regarding sensitive information leakage and the protection of users’ privacy have been raised. …Recent research efforts targeted these concerns by proposing the Semi-Adversarial Networks (SAN) framework for imparting gender privacy to face images. The objective of SAN is to perturb face image data such that it cannot be reliably used by a gender classifier but can still be used by a face matcher for matching purposes. In this work, we propose a novel Generative Adversarial Networks-based SAN model, PrivacyNet, that is capable of imparting selective soft biometric privacy to multiple soft-biometric attributes such as gender, age, and race. While PrivacyNet is capable of perturbing different sources of soft biometric information reliably and simultaneously, it also allows users to choose to obfuscate specific attributes, while preserving others. The results from extensive experiments on five independent face image databases demonstrate the efficacy of our proposed model in imparting selective multi-attribute privacy to face images. |
First image then video: A two-stage network for spatiotemporal video denoising Computer Vision and Pattern Recognition. 3 authors. pdf Video denoising is to remove noise from noise-corrupted data, thus recovering true signals via spatiotemporal processing. Existing approaches for spatiotemporal video denoising tend to suffer from motion blur artifacts, that is, the boundary of a moving object tends to appear blurry especially when the object undergoes a fast motion, causing optical flow calculation to break down. …In this paper, we address this challenge by designing a first-image-then-video two-stage denoising neural network, consisting of an image denoising module for spatially reducing intra-frame noise followed by a regular spatiotemporal video denoising module. The intuition is simple yet powerful and effective: the first stage of image denoising effectively reduces the noise level and, therefore, allows the second stage of spatiotemporal denoising for better modeling and learning everywhere, including along the moving object boundaries. This two-stage network, when trained in an end-to-end fashion, yields the state-of-the-art performances on the video denoising benchmark Vimeo90K dataset in terms of both denoising quality and computation. It also enables an unsupervised approach that achieves comparable performance to existing supervised approaches. |
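The buoy placement abstract in this table compares arrangements by ship detection probability under a 10 km sensor radius. A minimal sketch of how such a probability might be computed from simulated ship tracks follows. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `detection_probability`, planar coordinates in kilometres, and the rule that a ship counts as detected if any point of its track lies within the radius of any buoy are all hypothetical.

```python
import math


def detection_probability(tracks, buoys, radius_km=10.0):
    """Fraction of ship tracks that pass within radius_km of at least one buoy.

    tracks: list of tracks, each a list of (x, y) positions in km (assumed planar).
    buoys: list of (x, y) buoy positions in km (assumed planar).
    """
    if not tracks:
        raise ValueError("at least one ship track is required")

    def is_detected(track):
        # A ship is detected if any of its positions is close enough to any buoy.
        return any(
            math.hypot(px - bx, py - by) <= radius_km
            for (px, py) in track
            for (bx, by) in buoys
        )

    detected = sum(1 for track in tracks if is_detected(track))
    return detected / len(tracks)
```

With real AIS data the positions would be latitude and longitude, so a great-circle distance would replace `math.hypot`; the planar version keeps the sketch short.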
Computer Vision and Pattern Recognition (cs.CV) |
Adversarial Policies in Learning Systems with Malicious Experts Multiagent Systems, Machine Learning, Machine Learning, Cryptography and Security. 3 authors. pdf We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts’ advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. …The loss of the system is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true outcomes. We consider this problem under both offline and online settings. In the offline setting where the malicious expert must choose its entire sequence of decisions a priori, we show somewhat surprisingly that a simple greedy policy of always reporting false prediction is asymptotically optimal with an approximation ratio of \(1+O(\sqrt{\frac{\ln N}{N}})\), where \(N\) is the total number of prediction stages. In particular, we describe a policy that closely resembles the structure of the optimal offline policy. For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in \(O(N^2)\). Our results provide a new direction for vulnerability assessment of commonly used learning algorithms to adversarial attacks where the threat is an integral part of the system. |
Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition Machine Learning, Machine Learning. 3 authors. pdf Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then retraining on a new one is called transfer learning. …In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks. We perform three sets of experiments with varying levels of similarity between source and target tasks to investigate the behaviour of different types of knowledge transfer. We transfer both parameters and features and analyse their behaviour. Our results demonstrate that no significant advantage is gained by using a transfer learning approach over a traditional machine learning approach for our character recognition tasks. This suggests that using transfer learning does not necessarily presuppose a better performing model in all cases. |
Informal Data Transformation Considered Harmful Artificial Intelligence, Databases. 2 authors. pdf NA …NA |
On Consequentialism and Fairness Computers and Society, Machine Learning, Artificial Intelligence, Machine Learning. 2 authors. pdf Recent work on fairness in machine learning has primarily emphasized how to define, quantify, and encourage “fair” outcomes. Less attention has been paid, however, to the ethical foundations which underlie such efforts. …Among the ethical perspectives that should be taken into consideration is consequentialism, the position that, roughly speaking, outcomes are all that matter. Although consequentialism is not free from difficulties, and although it does not necessarily provide a tractable way of choosing actions (because of the combined problems of uncertainty, subjectivity, and aggregation), it nevertheless provides a powerful foundation from which to critique the existing literature on machine learning fairness. Moreover, it brings to the fore some of the tradeoffs involved, including the problem of who counts, the pros and cons of using a policy, and the relative value of the distant future. In this paper we provide a consequentialist critique of common definitions of fairness within machine learning, as well as a machine learning perspective on consequentialism. We conclude with a broader discussion of the issues of learning and randomization, which have important implications for the ethics of automated decision making systems. |
Lightweight Residual Densely Connected Convolutional Neural Network Machine Learning, Computer Vision and Pattern Recognition. 2 authors. pdf Extremely efficient convolutional neural network architectures are one of the most important requirements for limited computing power devices (such as embedded and mobile devices). Recently, some architectures have been proposed to overcome this limitation by considering specific hardware-software equipment. …In this paper, the residual densely connected blocks are proposed to guarantee the deep supervision, efficient gradient flow, and feature reuse abilities of convolutional neural networks. The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment by just reducing the number of parameters and computational operations while achieving a feasible accuracy. Extensive experimental results demonstrate that the proposed architecture is more efficient than the AlexNet and VGGNet in terms of model size, required parameters, and even accuracy. The proposed model is evaluated on the ImageNet, MNIST, Fashion MNIST, SVHN, CIFAR-10, and CIFAR-100. It achieves state-of-the-art results on the Fashion MNIST dataset and reasonable results on the others. The obtained results show that the proposed model is superior to efficient models such as the SqueezeNet and is also comparable with the state-of-the-art efficient models such as CondenseNet and ShuffleNet. |
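The "Adversarial Policies in Learning Systems with Malicious Experts" abstract in this table builds on the multiplicative weight (MW) rule and measures system loss as the aggregate absolute difference between predictions and true outcomes. The sketch below shows one common textbook form of that rule. The exact variant used in the paper is not given in the abstract, so the update `w * (1 - eta * loss)`, the learning rate `eta`, the function name, and the assumption that advice and outcomes lie in [0, 1] are illustrative choices.

```python
def multiplicative_weights(advice, outcomes, eta=0.5):
    """Run a simple multiplicative weight rule over a sequence of rounds.

    advice: list of rounds, each a list with one prediction per expert in [0, 1].
    outcomes: list of true outcomes in [0, 1], one per round.
    eta: learning rate in (0, 1]; an assumed parameter.
    Returns (total absolute loss of the system, final expert weights).
    """
    weights = [1.0] * len(advice[0])
    total_loss = 0.0
    for expert_preds, y in zip(advice, outcomes):
        # The system predicts the weight-averaged expert advice.
        total_w = sum(weights)
        prediction = sum(w * p for w, p in zip(weights, expert_preds)) / total_w
        total_loss += abs(prediction - y)
        # Each expert is penalised in proportion to its own absolute error.
        weights = [w * (1.0 - eta * abs(p - y)) for w, p in zip(weights, expert_preds)]
    return total_loss, weights
```

In this form an expert that always reports the wrong outcome loses half its weight each round when `eta=0.5`, which gives a feel for how quickly a naive always-lying expert loses influence.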
Artificial Intelligence (cs.AI) |
Using CNNs For Users Segmentation In Video See-Through Augmented Virtuality Computer Vision and Pattern Recognition. 2 authors. pdf In this paper, we present preliminary results on the use of deep learning techniques to integrate the user’s own body and other participants into a head-mounted video see-through augmented virtuality scenario. It has been previously shown that seeing users’ bodies in such simulations may improve the feeling of both self and social presence in the virtual environment, as well as user performance. …We propose to use a convolutional neural network for real-time semantic segmentation of users’ bodies in the stereoscopic RGB video streams acquired from the perspective of the user. We describe design issues as well as implementation details of the system and demonstrate the feasibility of using such neural networks for merging users’ bodies in an augmented virtuality simulation. |
Databases (cs.DB) |
Thresholds of descending algorithms in inference problems Machine Learning, Machine Learning, Disordered Systems and Neural Networks. 2 authors. pdf We review recent works on analyzing the dynamics of gradient-based algorithms in a prototypical statistical inference problem. Using methods and insights from the physics of glassy systems, these works showed how to understand quantitatively and qualitatively the performance of gradient-based algorithms. …Here we review the key results and their interpretation in non-technical terms accessible to a wide audience of physicists in the context of related works. |
Software Engineering (cs.SE) |
Reject Illegal Inputs with Generative Classifier Derived from Any Discriminative Classifier Machine Learning, Machine Learning. 1 author. pdf Generative classifiers have shown promise in detecting illegal inputs, including adversarial examples and out-of-distribution samples. Supervised Deep Infomax (SDIM) is a scalable end-to-end framework to learn generative classifiers. …In this paper, we propose a modification of SDIM termed SDIM-. Instead of training a generative classifier from scratch, SDIM- first takes as input the logits produced by any given discriminative classifier and generates logit representations; then a generative classifier is derived by imposing statistical constraints on the logit representations. SDIM- could inherit the performance of the discriminative classifier without loss. SDIM- incurs a negligible number of additional parameters, and can be efficiently trained with base classifiers fixed. Test samples whose class conditionals are smaller than pre-chosen thresholds are rejected without predictions. Experiments on illegal inputs, including adversarial examples, samples with common corruptions, and out-of-distribution (OOD) samples, show that, when allowed to reject a portion of test samples, SDIM- significantly improves the performance on the remaining test samples. |
Applications (stat.AP) |
Circular Regression Trees and Forests with an Application to Probabilistic Wind Direction Forecasting Methodology. 6 authors. pdf While circular data occur in a wide range of scientific fields, the methodology for distributional modeling and probabilistic forecasting of circular response variables is rather limited. Most of the existing methods are built on the framework of generalized linear and additive models, which are often challenging to optimize and to interpret. …Therefore, we suggest circular regression trees and random forests as an intuitive alternative approach that is relatively easy to fit. Building on previous ideas for trees modeling circular means, we suggest a distributional approach for both trees and forests yielding probabilistic forecasts based on the von Mises distribution. The resulting tree-based models simplify the estimation process by using the available covariates for partitioning the data into sufficiently homogeneous subgroups so that a simple von Mises distribution without further covariates can be fitted to the circular response in each subgroup. These circular regression trees are straightforward to interpret, can capture nonlinear effects and interactions, and automatically select the relevant covariates that are associated with either location and/or scale changes in the von Mises distribution. Combining an ensemble of circular regression trees to a circular regression forest yields a local adaptive likelihood estimator for the von Mises distribution that can regularize and smooth the covariate effects. The new methods are evaluated in a case study on probabilistic wind direction forecasting at two Austrian airports, considering other common approaches as a benchmark. |
Restricting the Flow: Information Bottlenecks for Attribution Machine Learning, Machine Learning, Computer Vision and Pattern Recognition. 4 authors. pdf Attribution methods provide insights into the decision-making of machine learning models like artificial neural networks. For a given input sample, they assign a relevance score to each individual input variable, such as the pixels of an image. …In this work we adapt the information bottleneck concept for attribution. By adding noise to intermediate feature maps we restrict the flow of information and can quantify (in bits) how much information image regions provide. We compare our method against ten baselines using three different metrics on VGG-16 and ResNet-50, and find that our methods outperform all baselines in five out of six settings. The method’s information-theoretic foundation provides an absolute frame of reference for attribution values (bits) and a guarantee that regions scored close to zero are not necessary for the network’s decision. |
Machine Learning (stat.ML) |
A Deep Structural Model for Analyzing Correlated Multivariate Time Series Machine Learning, Machine Learning. 3 authors. pdf Multivariate time series are routinely encountered in real-world applications, and in many cases, these time series are strongly correlated. In this paper, we present a deep learning structural time series model which can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. …The trend is learned via a 1D and 2D temporal CNN and LSTM hierarchical neural net. The CNN-LSTM architecture can (i) seamlessly leverage the dependency among multiple correlated time series in a natural way, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize the long-term sequential pattern. The seasonality component is approximated via a non-linear function of a set of Fourier terms, and the event components are learned by a simple linear function of regressors encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices for corporate stocks in the same category. |
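The "set of Fourier terms" that feeds the seasonality component can be sketched as sine and cosine features of the time index. This only builds the inputs; the function applied to them in the paper is not reproduced here, and the function name, period, and order are illustrative assumptions.

```python
import math


def fourier_terms(t, period, order):
    """Return [sin, cos] pairs for harmonics 1..order at time index t.

    period: assumed season length in time steps (e.g. 7 for weekly data).
    order:  assumed number of harmonics to include.
    """
    feats = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


# Features for day 3 and day 10 of a weekly pattern are (numerically) equal.
day3 = fourier_terms(3, period=7, order=2)
day10 = fourier_terms(10, period=7, order=2)
```

Because the features repeat every `period` steps, any function of them is seasonal by construction.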
CircSpaceTime: an R package for spatial and spatio-temporal modeling of Circular data Applications. 3 authors. pdf CircSpaceTime is the only R package currently available that implements Bayesian models for spatial and spatio-temporal interpolation of circular data. Such data are often found in applications where, among the many, wind directions, animal movement directions, and wave directions are involved. …To analyze such data we need models for observations at locations s and times t, as the so-called geostatistical models, providing structured dependence assumed to decay in distance and time. The approach we take begins with Gaussian processes defined for linear variables over space and time. Then, we use either wrapping or projection to obtain processes for circular data. The models are cast as hierarchical, with fitting and inference within a Bayesian framework. Altogether, this package implements work developed by a series of papers; the most relevant being Jona Lasinio, Gelfand, and Jona Lasinio (2012); Wang and Gelfand (2014); Mastrantonio, Jona Lasinio, and Gelfand (2016). All procedures are written using Rcpp. Estimates are obtained by MCMC allowing parallelized multiple chains run. The implementation of the proposed models is considerably improved on the simple routines adopted in the research papers. As original running examples, for the spatial and spatio-temporal settings, we use wind directions datasets over central Italy. |
Methodology (stat.ME) |
Concentration of Benefit index: A threshold-free summary metric for quantifying the capacity of covariates to yield efficient treatment rules Applications, Methodology. 3 authors. pdf When data on treatment assignment, outcomes, and covariates from a randomized trial are available, a question of interest is to what extent covariates can be used to optimize treatment decisions. Statistical hypothesis testing of covariate-by-treatment interaction is ill-suited for this purpose. …The application of decision theory results in treatment rules that compare the expected benefit of treatment given the patient’s covariates against a treatment threshold. However, determining the treatment threshold is often context-specific, and any given threshold might seem arbitrary when the overall capacity towards predicting treatment benefit is of concern. We propose the Concentration of Benefit index (Cb), a threshold-free metric that quantifies the combined performance of covariates towards finding individuals who will benefit the most from treatment. The proposed index is constructed by comparing expected treatment outcomes with and without knowledge of covariates when one of two randomly selected patients is to be treated. We show that the resulting index can also be expressed in terms of the integrated efficiency of individualized treatment decision over the entire range of treatment thresholds. We propose parametric and semi-parametric estimators, the latter being suitable for out-of-sample validation and correction for optimism. We used data from a clinical trial to demonstrate the calculations in a step-by-step fashion, and have provided the R code for implementation (https://github.com/msadatsafavi/txBenefit). The proposed index has intuitive and theoretically sound interpretation and can be estimated with relative ease for a wide class of regression models. 
Beyond the conceptual developments, various aspects of estimation and inference for such a metric need to be pursued in future research. |
Prediction in locally stationary time series Methodology, Econometrics. 2 authors. pdf We develop an estimator for the high-dimensional covariance matrix of a locally stationary process with a smoothly varying trend and use this statistic to derive consistent predictors in non-stationary time series. In contrast to the currently available methods for this problem the predictor developed here does not rely on fitting an autoregressive model and does not require a vanishing trend. …The finite sample properties of the new methodology are illustrated by means of a simulation study and a financial indices study. |
Image and Video Processing (eess.IV) |
Kalman Filtering and Expectation Maximization for Multitemporal Spectral Unmixing Image and Video Processing, Computer Vision and Pattern Recognition. 5 authors. pdf The recent evolution of hyperspectral imaging technology and the proliferation of new emerging applications presses for the processing of multiple temporal hyperspectral images. In this work, we propose a novel spectral unmixing (SU) strategy using physically motivated parametric endmember representations to account for temporal spectral variability. …By representing the multitemporal mixing process using a state-space formulation, we are able to exploit the Bayesian filtering machinery to estimate the endmember variability coefficients. Moreover, by assuming that the temporal variability of the abundances is small over short intervals, an efficient implementation of the expectation maximization (EM) algorithm is employed to estimate the abundances and the other model parameters. Simulation results indicate that the proposed strategy outperforms state-of-the-art multitemporal SU algorithms. |
DuDoNet++: Encoding mask projection to reduce CT metal artifacts Image and Video Processing, Computer Vision and Pattern Recognition. 4 authors. pdf CT metal artifact reduction (MAR) is a notoriously challenging task because the artifacts are structured and non-local in the image domain. However, they are inherently local in the sinogram domain. …DuDoNet is the state-of-the-art MAR algorithm which exploits the latter characteristic by learning to reduce artifacts in the sinogram and image domain jointly. By design, DuDoNet treats the metal-affected regions in sinogram as missing and replaces them with the surrogate data generated by a neural network. Since fine-grained details within the metal-affected regions are completely ignored, the artifact-reduced CT images by DuDoNet tend to be over-smoothed and distorted. In this work, we investigate the issue by theoretical derivation. We propose to address the problem by (1) retaining the metal-affected regions in sinogram and (2) replacing the binarized metal trace with the metal mask projection such that the geometry information of metal implants is encoded. Extensive experiments on simulated datasets and expert evaluations on clinical images demonstrate that our network called DuDoNet++ yields anatomically more precise artifact-reduced images than DuDoNet, especially when the metallic objects are large. |
Joint Unsupervised Learning for the Vertebra Segmentation, Artifact Reduction and Modality Translation of CBCT Images Image and Video Processing, Computer Vision and Pattern Recognition. 4 authors. pdf We investigate the unsupervised learning of the vertebra segmentation, artifact reduction and modality translation of CBCT images. To this end, we formulate this problem under a unified framework that jointly addresses these three tasks and intensively leverages the knowledge sharing. …The unsupervised learning of this framework is enabled by 1) a novel shape-aware artifact disentanglement network that supports different forms of image synthesis and vertebra segmentation and 2) a deliberate fusion of knowledge from an independent CT dataset. Specifically, the proposed framework takes a random pair of CBCT and CT images as the input, and manipulates the synthesis and segmentation via different combinations of the decodings of the disentangled latent codes. Then, by discovering various forms of consistencies between the synthesized images and segmented vertebrae, the learning is achieved via self-learning from the given CBCT and CT images, obviating the need for paired (i.e., anatomically identical) ground-truth data. Extensive experiments on clinical CBCT and CT datasets show that the proposed approach performs significantly better than other state-of-the-art unsupervised methods trained independently for each task and, remarkably, the proposed approach achieves a dice coefficient of 0.879 for unsupervised CBCT vertebra segmentation. |
Physically Plausible Spectral Reconstruction from RGB Images Image and Video Processing, Computer Vision and Pattern Recognition. 2 authors. pdf Recently Convolutional Neural Networks (CNN) have been used to reconstruct hyperspectral information from RGB images. Moreover, this spectral reconstruction problem (SR) can often be solved with good (low) error. …However, these methods are not physically plausible: that is when the recovered spectra are reintegrated with the underlying camera sensitivities, the resulting predicted RGB is not the same as the actual RGB, and sometimes this discrepancy can be large. The problem is further compounded by exposure change. Indeed, most learning-based SR models train for a fixed exposure setting and we show that this can result in poor performance when exposure varies. In this paper we show how CNN learning can be extended so that physical plausibility is enforced and the problem resulting from changing exposures is mitigated. Our SR solution improves the state-of-the-art spectral recovery performance under varying exposure conditions while simultaneously ensuring physical plausibility (the recovered spectra reintegrate to the input RGBs exactly). |
Statistics Theory (math.ST) |
On the modelling, linear stability, and numerical simulation for advection-diffusion-reaction in poroelastic media Numerical Analysis, Quantitative Methods, Numerical Analysis. 5 authors. pdf We perform the linear growth analysis for a new PDE-based model for poromechanical processes (formulated in mixed form using the solid deformation, fluid pressure, and total pressure) interacting with diffusing and reacting solutes in the medium. We find parameter regions that lead to interesting behaviour of the coupled system. …These mutual dependences between deformation and diffusive patterns are of substantial relevance in the study of morphoelastic changes in biomaterials. We provide a set of computational examples in 2D and 3D that can be used to form a better understanding of how, and to what extent, the deformations of the porous structure dictate the generation and suppression of spatial patterning dynamics, also related to the onset of mechanochemical waves. |
Modified Pillai’s trace statistics for two high-dimensional sample covariance matrices Statistics Theory, Statistics Theory. 3 authors. pdf The goal of this study was to test the equality of two covariance matrices by using modified Pillai’s trace statistics under a high-dimensional framework, i.e. …, the dimension and sample sizes go to infinity proportionally. In this paper, we introduce two modified Pillai’s trace statistics and obtain their asymptotic distributions under the null hypothesis. The benefits of the proposed statistics include the following: (1) the sample size can be smaller than the dimensions; (2) the limiting distributions of the proposed statistics are universal; and (3) we do not restrict the structure of the population covariance matrices. The theoretical results are established under mild and practical assumptions, and their properties are demonstrated numerically by simulations and a real data analysis. |
Numerical Analysis (math.NA) |
On the Distribution of an Arbitrary Subset of the Eigenvalues for some Finite Dimensional Random Matrices Statistics Theory, Statistics Theory, Information Theory, Information Theory. 2 authors. pdf We present some new results on the joint distribution of an arbitrary subset of the ordered eigenvalues of complex Wishart, double Wishart, and Gaussian hermitian random matrices of finite dimensions, using a tensor pseudo-determinant operator. Specifically, we derive compact expressions for the joint probability distribution function of the eigenvalues and the expectation of functions of the eigenvalues, including joint moments, for the case of both ordered and unordered eigenvalues. … |
Chemical Physics (physics.chem-ph) |
The Quest For Highly Accurate Excitation Energies: A Computational Perspective Chemical Physics, Other Condensed Matter, Strongly Correlated Electrons, Computational Physics. 3 authors. pdf We provide an overview of the successive steps that made it possible to obtain increasingly accurate excitation energies and properties with computational chemistry tools, eventually leading to chemically accurate vertical transition energies for small- and medium-size molecules. First, we describe the evolution of state-of-the-art methods employed to define benchmark values, with originally Roos’ CASPT2 method, and then third-order coupled cluster (CC3) methods as in the renowned Thiel set of vertical excitation energies described in a remarkable series of papers in the 2000s. …More recently, this quest for highly accurate excitation energies was reinitiated thanks to the resurgence of selected configuration interaction (SCI) methods and their efficient parallel implementation. These methods have been able to routinely deliver highly accurate excitation energies for small molecules, as well as medium-size molecules with compact basis sets, for both single and double excitations. Second, we describe how these high-level methods and the creation of large, diverse, and accurate benchmark sets of excitation energies have made it possible to assess fairly and accurately the performance of computationally lighter theoretical models (e.g., TD-DFT, BSE@GW, ADC, EOM-CC, etc.) for different types of excited states (\(\pi \rightarrow \pi^*\), \(n \rightarrow \pi^*\), valence, Rydberg, singlet, triplet, double excitation, etc.). We conclude by discussing the current potential of these methods from both expert and non-expert points of view, and what we believe could be the future theoretical and technological developments in the field. |
Computational Physics (physics.comp-ph) |
Estimation of roughness measurement bias originating from background subtraction Data Analysis, Statistics and Probability. 3 authors. pdf When measuring the roughness of rough surfaces, the limited sizes of scanned areas lead to its systematic underestimation. Levelling by polynomials and other filtering used in real-world processing of atomic force microscopy data increases this bias considerably. …Here a framework is developed providing explicit expressions for the bias of squared mean square roughness in the case of levelling by fitting a model background function using linear least squares. The framework is then applied to polynomial levelling, for both one-dimensional and two-dimensional data processing, and basic models of surface autocorrelation function, Gaussian and exponential. Several other common scenarios are covered as well, including median levelling, intermediate Gaussian–exponential autocorrelation model and frequency space filtering. Application of the results to other quantities, such as Rq, Sq, Ra and Sa is discussed. The results are summarized in overview plots covering a range of autocorrelation functions and polynomial degrees, which allow graphical estimation of the bias. |
Data Analysis, Statistics and Probability (physics.data-an) |
The free and freer XY models Disordered Systems and Neural Networks, Computational Physics. 2 authors. pdf We study two versions of the XY model where not only the spins but also the interaction topology is allowed to change. In the free XY model, the number of links is fixed, but their positions in the network are not. …We also study a more relaxed version where even the number of links is allowed to vary; we call it the freer XY model. When the interaction networks are dense enough, both models have phase transitions visible both in spin configurations and the network structure. The low-temperature phase in the free XY model is characterized by tightly connected clusters of spins pointing in the same direction, and isolated spins disconnected from the rest. For the freer XY model the low-temperature phase is almost completely connected. In both models, exponents describing the magnetic ordering are mostly consistent with values of the mean-field theory of the standard XY model. |
Instrumentation and Methods for Astrophysics (astro-ph.IM) |
Using Data Imputation for Signal Separation in High Contrast Imaging Machine Learning, Earth and Planetary Astrophysics, Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics. 8 authors. pdf To characterize circumstellar systems in high contrast imaging, the fundamental step is to construct a best point spread function (PSF) template for the non-circumstellar signals (i.e. …, star light and speckles) and separate it from the observation. With existing PSF construction methods, the circumstellar signals (e.g., planets, circumstellar disks) are unavoidably altered by over-fitting and/or self-subtraction, making forward modeling a necessity to recover these signals. We present a forward modeling–free solution to these problems with data imputation using sequential non-negative matrix factorization (DI-sNMF). DI-sNMF first converts this signal separation problem to a “missing data” problem in statistics by flagging the regions which host circumstellar signals as missing data, then attributes PSF signals to these regions. We mathematically prove it to have negligible alteration to circumstellar signals when the imputation region is relatively small, which thus enables precise measurement for these circumstellar objects. We apply it to simulated point source and circumstellar disk observations to demonstrate its proper recovery of them. We apply it to Gemini Planet Imager (GPI) K1-band observations of the debris disk surrounding HR 4796A, finding a tentative trend that the dust is more forward scattering as the wavelength increases. We expect DI-sNMF to be applicable to other general scenarios where the separation of signals is needed. |
Populations and Evolution (q-bio.PE) |
The Effect of Treatment-Related Deaths and “Sticky” Diagnoses on Recorded Prostate Cancer Mortality Populations and Evolution, Quantitative Methods. 5 authors. pdf Background: Although recorded cancer mortality should include both deaths from cancer and deaths from cancer treatment, there is evidence suggesting that the measure may be incomplete. To investigate the completeness of recorded prostate cancer mortality, we compared other-cause (non-prostate cancer) mortality in men found and not found to have prostate cancer following a needle biopsy. …Methods: We linked Medicare claims data to SEER data to analyze survival in the population of men aged 65+ enrolled in Medicare who resided in a SEER area and received a needle biopsy in 1993-2001. We compared other-cause mortality in men found to have prostate cancer (n=53,462) to that in men not found to have prostate cancer (n=103,659). Results: The age-race adjusted other-cause mortality rate was 471 per 10,000 person-years in men found to have prostate cancer vs. 468 per 10,000 in men not found to have prostate cancer (RR = 1.01; 95% CI: 0.98-1.03). The effect was modified, however, by age. The RR declined in a stepwise fashion from 1.08 (95% CI: 1.03-1.14) in men age 65-69 to 0.89 (95% CI: 0.83-0.95) in men age 85 and older. If the excess (or deficit) in other-cause mortality were added to the recorded prostate cancer mortality, prostate cancer mortality would rise 23% in the youngest age group (from 90 to 111 per 10,000) and would fall 30% in the oldest age group (from 551 to 388 per 10,000). Conclusion: Although recorded prostate cancer mortality appears to be an accurate measure overall, it systematically underestimates the mortality associated with prostate cancer diagnosis and treatment in younger men and overestimates it in the very old. 
We surmise that in younger men treatment-related deaths are incompletely captured in recorded prostate cancer mortality, while in older men the diagnosis “sticks”: once diagnosed, they are more likely to be said to have died from the disease. |
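The percentages quoted in this abstract can be reproduced from the rates it reports. The short check below uses only numbers stated in the abstract; the helper name is hypothetical, and the overall ratio computed here is the crude ratio of the two reported rates, which happens to round to the reported RR.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Recorded prostate cancer mortality per 10,000, before and after adding
# the excess (or deficit) in other-cause mortality.
youngest = percent_change(90, 111)   # men aged 65-69
oldest = percent_change(551, 388)    # men aged 85 and older

# Ratio of the two other-cause mortality rates reported in the abstract.
overall_rr = 471 / 468
```

Rounded to whole percentages, these give a rise of 23% and a fall of 30%, and the ratio rounds to 1.01, matching the figures in the abstract.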
Risk Management (q-fin.RM) |
The Impact of the Choice of Risk and Dispersion Measure on Procyclicality Risk Management, Applications. 2 authors. pdf Procyclicality of historical risk measure estimation means that one tends to over-estimate future risk when present realized volatility is high and, conversely, to under-estimate future risk when the realized volatility is low. From this, different questions arise, relevant for applications and theory: What are the factors which affect the degree of procyclicality? More specifically, how does the choice of risk measure affect this? How does this behaviour vary with the choice of realized volatility estimator? How do different underlying model assumptions influence the pro-cyclical effect? In this paper we consider three different well-known risk measures (Value-at-Risk, Expected Shortfall, Expectile), the r-th absolute centred sample moment, for any integer \(r>0\), as realized volatility estimator (this includes the sample variance and the sample mean absolute deviation around the sample mean) and two models (either an iid model or an augmented GARCH(\(p\),\(q\)) model). …We show that the strength of procyclicality depends on these three factors: the choice of risk measure, the realized volatility estimator, and the model considered. But whatever the choices, procyclicality is always present. |
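To make the quantities named in this abstract concrete, here is a minimal sketch of historical (empirical) Value-at-Risk, Expected Shortfall, and the r-th absolute centred sample moment. The quantile convention, the 1/n normalisation, and all function names are assumptions of this sketch; the abstract does not specify the paper's exact estimators, and the expectile is omitted.

```python
import math


def value_at_risk(losses, alpha):
    """Empirical alpha-quantile of the losses (one common convention)."""
    ordered = sorted(losses)
    index = math.ceil(alpha * len(ordered)) - 1
    return ordered[max(index, 0)]


def expected_shortfall(losses, alpha):
    """Average of the losses at or above the empirical VaR."""
    var = value_at_risk(losses, alpha)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)


def abs_centred_moment(sample, r):
    """(1/n) * sum |x_i - mean|^r; r=1 is the mean absolute deviation,
    r=2 the sample variance with 1/n normalisation."""
    mean = sum(sample) / len(sample)
    return sum(abs(x - mean) ** r for x in sample) / len(sample)


losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
var_85 = value_at_risk(losses, 0.85)
es_85 = expected_shortfall(losses, 0.85)
mad = abs_centred_moment([1, 2, 3, 4, 5], 1)
variance = abs_centred_moment([1, 2, 3, 4, 5], 2)
```

Studying procyclicality then amounts to relating a risk estimate computed on one window to the dispersion measure computed on the same window and to the risk realized on the next.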
Quantum Physics (quant-ph) |
Cost-Function-Dependent Barren Plateaus in Shallow Quantum Neural Networks Machine Learning, Quantum Physics. 5 authors. pdf Variational quantum algorithms (VQAs) optimize the parameters \(\boldsymbol{\theta}\) of a quantum neural network \(V(\boldsymbol{\theta})\) to minimize a cost function \(C\). While VQAs may enable practical applications of noisy quantum computers, they are nevertheless heuristic methods with unproven scaling. …Here, we rigorously prove two results. Our first result states that defining \(C\) in terms of global observables leads to an exponentially vanishing gradient (i.e., a barren plateau) even when \(V(\boldsymbol{\theta})\) is shallow. This implies that several VQAs in the literature must revise their proposed cost functions. On the other hand, our second result states that defining \(C\) with local observables leads to a polynomially vanishing gradient, so long as the depth of \(V(\boldsymbol{\theta})\) is \(\mathcal{O}(\log n)\). Taken together, our results establish a precise connection between locality and trainability. Finally, we illustrate these ideas with large-scale simulations, up to 100 qubits, of a particular VQA known as quantum autoencoders. |