46 new data science research articles were published on 2020-01-09. 18 discussed machine learning.
The table below shows yesterday’s counts of papers submitted to www.arxiv.org, grouped by primary subject. Click the links in the table to be redirected to the abstracts below. The links under Subject will redirect you to abstracts with that primary subject (there can only be one primary subject on arXiv). The links under Category will redirect you to all of yesterday’s publications with a given tag (primary or secondary).
| Subject | Category | N |
|---|---|---|
| Computer Science (27) | Machine Learning (cs.LG) | 10 (8) |
| | Computer Vision and Pattern Recognition (cs.CV) | 8 (2) |
| | Artificial Intelligence (cs.AI) | 2 (6) |
| | Human-Computer Interaction (cs.HC) | 1 (2) |
| | Robotics (cs.RO) | 1 (1) |
| | Computation and Language (cs.CL) | 1 |
| | Computers and Society (cs.CY) | 1 |
| | Information Theory (cs.IT) | 1 |
| | Logic in Computer Science (cs.LO) | 1 |
| | Software Engineering (cs.SE) | 1 |
| Physics (5) | Computational Physics (physics.comp-ph) | 2 (5) |
| | Fluid Dynamics (physics.flu-dyn) | 1 |
| | Physics and Society (physics.soc-ph) | 1 |
| | Plasma Physics (physics.plasm-ph) | 1 |
| Condensed Matter (4) | Materials Science (cond-mat.mtrl-sci) | 2 |
| | Mesoscale and Nanoscale Physics (cond-mat.mes-hall) | 1 (2) |
| | Statistical Mechanics (cond-mat.stat-mech) | 1 |
| Statistics (4) | Computation (stat.CO) | 2 |
| | Machine Learning (stat.ML) | 1 (7) |
| | Methodology (stat.ME) | 1 |
| Mathematics (2) | Optimization and Control (math.OC) | 1 (1) |
| | Statistics Theory (math.ST) | 1 |
| Elec. Eng. and Systems Science (1) | Image and Video Processing (eess.IV) | 1 (2) |
| Other (1) | General Relativity and Quantum Cosmology (gr-qc) | 1 |
| Quantitative Biology (1) | Quantitative Methods (q-bio.QM) | 1 |
| Quantum Physics (1) | Quantum Physics (quant-ph) | 1 (1) |
This section contains all articles with any tag of stat.AP, stat.CO, stat.ML, cs.LG, q-fin.ST, q-fin.EC, or econ.EM. Only the first two sentences are shown; click the links for more detail.
Machine Learning (stat.ML) |
The Counterfactual \(\chi\)-GAN Machine Learning, Machine Learning. 4 authors. pdf Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome, known as strong ignorability. Approaches to enforcing strong ignorability in causal analyses of observational data include weighting and matching methods. …Effect estimates, such as the average treatment effect (ATE), are then estimated as expectations under the reweighted or matched distribution, P. The choice of P is important and can impact the interpretation of the effect estimate and the variance of effect estimates. In this work, instead of specifying P, we learn a distribution that simultaneously maximizes coverage and minimizes variance of ATE estimates. In order to learn this distribution, this research proposes a generative adversarial network (GAN)-based model called the Counterfactual \(\chi\)-GAN (cGAN), which also learns feature-balancing weights and supports unbiased causal estimation in the absence of unobserved confounding. Our model minimizes the Pearson \(\chi^2\) divergence, which we show simultaneously maximizes coverage and minimizes the variance of importance sampling estimates. To our knowledge, this is the first such application of the Pearson \(\chi^2\) divergence. We demonstrate the effectiveness of cGAN in achieving feature balance relative to established weighting methods in simulation and with real-world medical data. |
Supervised Hyperalignment for multi-subject fMRI data alignment Neurons and Cognition, Machine Learning, Machine Learning. 4 authors. pdf Hyperalignment has been widely employed in Multivariate Pattern (MVP) analysis to discover the cognitive states in the human brains based on multi-subject functional Magnetic Resonance Imaging (fMRI) datasets. Most of the existing HA methods utilized unsupervised approaches, where they only maximized the correlation between the voxels with the same position in the time series. …However, these unsupervised solutions may not be optimum for handling the functional alignment in the supervised MVP problems. This paper proposes a Supervised Hyperalignment (SHA) method to ensure better functional alignment for MVP analysis, where the proposed method provides a supervised shared space that can maximize the correlation among the stimuli belonging to the same category and minimize the correlation between distinct categories of stimuli. Further, SHA employs a generalized optimization solution, which generates the shared space and calculates the mapped features in a single iteration, hence with optimum time and space complexities for large datasets. Experiments on multi-subject datasets demonstrate that SHA method achieves up to 19% better performance for multi-class problems over the state-of-the-art HA algorithms. |
Supervised Discriminative Sparse PCA with Adaptive Neighbors for Dimensionality Reduction Machine Learning, Machine Learning. 4 authors. pdf Dimensionality reduction is an important operation in information visualization, feature extraction, clustering, regression, and classification, especially for processing noisy high dimensional data. However, most existing approaches preserve either the global or the local structure of the data, but not both. …Approaches that preserve only the global data structure, such as principal component analysis (PCA), are usually sensitive to outliers. Approaches that preserve only the local data structure, such as locality preserving projections, are usually unsupervised (and hence cannot use label information) and uses a fixed similarity graph. We propose a novel linear dimensionality reduction approach, supervised discriminative sparse PCA with adaptive neighbors (SDSPCAAN), to integrate neighborhood-free supervised discriminative sparse PCA and projected clustering with adaptive neighbors. As a result, both global and local data structures, as well as the label information, are used for better dimensionality reduction. Classification experiments on nine high-dimensional datasets validated the effectiveness and robustness of our proposed SDSPCAAN. |
Sampling Prediction-Matching Examples in Neural Networks: A Probabilistic Programming Approach Machine Learning, Machine Learning. 4 authors. pdf Though neural network models demonstrate impressive performance, we do not understand exactly how these black-box models make individual predictions. This drawback has led to substantial research devoted to understand these models in areas such as robustness, interpretability, and generalization ability. …In this paper, we consider the problem of exploring the prediction level sets of a classifier using probabilistic programming. We define a prediction level set to be the set of examples for which the predictor has the same specified prediction confidence with respect to some arbitrary data distribution. Notably, our sampling-based method does not require the classifier to be differentiable, making it compatible with arbitrary classifiers. As a specific instantiation, if we take the classifier to be a neural network and the data distribution to be that of the training data, we can obtain examples that will result in specified predictions by the neural network. We demonstrate this technique with experiments on a synthetic dataset and MNIST. Such level sets in classification may facilitate human understanding of classification behaviors. |
Deep Network Approximation for Smooth Functions Numerical Analysis, Machine Learning, Numerical Analysis, Machine Learning. 4 authors. pdf This paper establishes optimal approximation error characterization of deep ReLU networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width \(\mathcal{O}(N)\) and depth \(\mathcal{O}(L)\) with an approximation error \(\mathcal{O}(N^{-L})\). …Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width \(\mathcal{O}(N\ln N)\) and depth \(\mathcal{O}(L\ln L)\) can approximate \(f\in C^s([0,1]^d)\) with a nearly optimal approximation rate \(\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})\). Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by \(N\in\mathbb{N}^+\) and \(L\in\mathbb{N}^+\), respectively. |
Population-Guided Parallel Policy Search for Reinforcement Learning Machine Learning, Artificial Intelligence, Machine Learning. 3 authors. pdf In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search a good policy in collaboration with the guidance of the best policy information. …The key point is that the information of the best policy is fused in a soft manner by constructing an augmented loss function for policy update to enlarge the overall search region by the multiple learners. The guidance by the previous best policy and the enlarged range enable faster and better policy search. Monotone improvement of the expected cumulative return by the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm. Numerical results show that the constructed algorithm outperforms most of the current state-of-the-art RL algorithms, and the gain is significant in the case of sparse reward environment. |
Guidelines for enhancing data locality in selected machine learning algorithms Machine Learning, Machine Learning. 3 authors. pdf To deal with the complexity of the new bigger and more complex generation of data, machine learning (ML) techniques are probably the first and foremost used. For ML algorithms to produce results in a reasonable amount of time, they need to be implemented efficiently. …In this paper, we analyze one of the means to increase the performances of machine learning algorithms which is exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems due to the use of certain hardware techniques to improve performance. Altering the access patterns to increase locality can dramatically increase performance of a given algorithm. Besides, repeated data access can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We start by motivating why and how a more efficient implementation can be achieved by exploiting reuse in the memory hierarchy of modern instruction set processors. Next we document the possibilities of such reuse in some selected machine learning algorithms. |
Shallow Encoder Deep Decoder (SEDD) Networks for Image Encryption and Decryption Machine Learning, Machine Learning, Cryptography and Security. 1 author. pdf This paper explores a new framework for lossy image encryption and decryption using a simple shallow encoder neural network E for encryption, and a complex deep decoder neural network D for decryption. E is kept simple so that encoding can be done on low power and portable devices and can in principle be any nonlinear function which outputs an encoded vector. …D is trained to decode the encodings using the dataset of image - encoded vector pairs obtained from E and happens independently of E. As the encodings come from E which while being a simple neural network, still has thousands of random parameters and therefore the encodings would be practically impossible to crack without D. This approach differs from autoencoders as D is trained completely independently of E, although the structure may seem similar. Therefore, this paper also explores empirically if a deep neural network can learn to reconstruct the original data in any useful form given the output of a neural network or any other nonlinear function, which can have very useful applications in Cryptanalysis. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the decoded images from D along with some limitations. |
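The Counterfactual \(\chi\)-GAN abstract above ties the Pearson \(\chi^2\) divergence to the variance of importance sampling estimates. As a rough illustration of that link (not the paper's model or code), the sketch below uses the standard identity that, for discrete distributions, the variance under q of the importance weights w = p/q equals \(\chi^2(p \,\|\, q) = \sum_i (p_i - q_i)^2 / q_i\). The function names, the toy distributions, and the direction of the divergence (target p relative to sampling distribution q) are assumptions made for this example.

```python
def pearson_chi2(p, q):
    """Pearson chi-square divergence chi^2(p || q) for discrete distributions."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def importance_weight_variance(p, q):
    """Variance, taken under q, of the importance weights w_i = p_i / q_i."""
    weights = [pi / qi for pi, qi in zip(p, q)]
    # The mean weight is 1 whenever p and q are both normalized.
    mean_w = sum(qi * wi for qi, wi in zip(q, weights))
    return sum(qi * (wi - mean_w) ** 2 for qi, wi in zip(q, weights))


# Toy target distribution and two hypothetical sampling distributions.
p = [0.5, 0.3, 0.2]
q_close = [0.4, 0.35, 0.25]  # similar to p: small divergence, stable weights
q_far = [0.1, 0.1, 0.8]      # dissimilar to p: large divergence, erratic weights

for name, q in [("q_close", q_close), ("q_far", q_far)]:
    print(name, pearson_chi2(p, q), importance_weight_variance(p, q))
```

For each sampling distribution the two printed numbers agree up to floating-point error, and the distribution closer to p gives the smaller value, which is the intuition behind minimizing this divergence when the goal is low-variance reweighted estimates.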
Machine Learning (cs.LG) |
Don’t Judge an Object by Its Context: Learning to Overcome Contextual Bias Computer Vision and Pattern Recognition, Machine Learning. 6 authors. pdf Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, strongly relying on context risks a model’s generalizability, especially when typical co-occurrence patterns are absent. …This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. Our goal is to accurately recognize a category in the absence of its context, without compromising on performance when it co-occurs with context. Our key idea is to decorrelate feature representations of a category from its co-occurring context. We achieve this by learning a feature subspace that explicitly represents categories occurring in the absence of context alongside a joint feature subspace that represents both categories and context. Our very simple yet effective method is extensible to two multi-label tasks – object and attribute classification. On 4 challenging datasets, we demonstrate the effectiveness of our method in reducing contextual bias. |
Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function Machine Learning, Systems and Control, Artificial Intelligence. 5 authors. pdf In current reinforcement learning (RL) methods, function approximation errors are known to lead to the overestimated or underestimated state-action values Q, which further lead to suboptimal policies. We show that the learning of a state-action return distribution function can be used to improve the estimation accuracy of the Q-value. …We combine the distributional return function within the maximum entropy RL framework in order to develop what we call the Distributional Soft Actor-Critic algorithm, DSAC, which is an off-policy method for continuous control setting. Unlike traditional distributional Q algorithms which typically only learn a discrete return distribution, DSAC can directly learn a continuous return distribution by truncating the difference between the target and current return distribution to prevent gradient explosion. Additionally, we propose a new Parallel Asynchronous Buffer-Actor-Learner architecture (PABAL) to improve the learning efficiency. We evaluate our method on the suite of MuJoCo continuous control tasks, achieving the state of the art performance. |
Virtual to Real adaptation of Pedestrian Detectors for Smart Cities Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 5 authors. pdf Pedestrian detection through computer vision is a building block for a multitude of applications in the context of smart cities, such as surveillance of sensitive areas, personal safety, monitoring, and control of pedestrian flow, to mention only a few. Recently, there was an increasing interest in deep learning architectures for performing such a task. …One of the critical objectives of these algorithms is to generalize the knowledge gained during the training phase to new scenarios having various characteristics, and a suitably labeled dataset is fundamental to achieve this goal. The main problem is that manually annotating a dataset usually requires a lot of human effort, and it is a time-consuming operation. For this reason, in this work, we introduced ViPeD - Virtual Pedestrian Dataset, a new synthetically generated set of images collected from a realistic 3D video game where the labels can be automatically generated exploiting 2D pedestrian positions extracted from the graphics engine. We used this new synthetic dataset training a state-of-the-art computationally-efficient Convolutional Neural Network (CNN) that is ready to be installed in smart low-power devices, like smart cameras. We addressed the problem of the domain-adaptation from the virtual world to the real one by fine-tuning the CNN using the synthetic data and also exploiting a mixed-batch supervised training approach. Extensive experimentation carried out on different real-world datasets shows very competitive results compared to other methods presented in the literature in which the algorithms are trained using real-world data. |
DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection Computer Vision and Pattern Recognition, Machine Learning. 5 authors. pdf In this paper, we present our on-going effort of constructing a large-scale benchmark, DeeperForensics-1.0, for face forgery detection. …Our benchmark represents the largest face forgery detection dataset by far, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. Extensive real-world perturbations are applied to obtain a more challenging benchmark of larger scale and higher diversity. All source videos in DeeperForensics-1.0 are carefully collected, and fake videos are generated by a newly proposed end-to-end face swapping framework. The quality of generated videos outperforms those in existing datasets, validated by user studies. The benchmark features a hidden test set, which contains manipulated videos achieving high deceptive scores in human evaluations. We further contribute a comprehensive study that evaluates five representative detection baselines and make a thorough analysis of different settings. We believe this dataset will contribute to real-world face forgery detection research. |
The Counterfactual \(\chi\)-GAN Machine Learning, Machine Learning. 4 authors. pdf Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome, known as strong ignorability. Approaches to enforcing strong ignorability in causal analyses of observational data include weighting and matching methods. …Effect estimates, such as the average treatment effect (ATE), are then estimated as expectations under the reweighted or matched distribution, P. The choice of P is important and can impact the interpretation of the effect estimate and the variance of effect estimates. In this work, instead of specifying P, we learn a distribution that simultaneously maximizes coverage and minimizes variance of ATE estimates. In order to learn this distribution, this research proposes a generative adversarial network (GAN)-based model called the Counterfactual \(\chi\)-GAN (cGAN), which also learns feature-balancing weights and supports unbiased causal estimation in the absence of unobserved confounding. Our model minimizes the Pearson \(\chi^2\) divergence, which we show simultaneously maximizes coverage and minimizes the variance of importance sampling estimates. To our knowledge, this is the first such application of the Pearson \(\chi^2\) divergence. We demonstrate the effectiveness of cGAN in achieving feature balance relative to established weighting methods in simulation and with real-world medical data. |
Performance-Oriented Neural Architecture Search Machine Learning, Neural and Evolutionary Computing. 4 authors. pdf Hardware-Software Co-Design is a highly successful strategy for improving performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to the typical criteria around accuracy. …Using the task of keyword spotting in audio on edge computing devices, we demonstrate that our approach results in neural architecture that is not only highly accurate, but also efficiently mapped to the computing platform which will perform the inference. Using our modified neural architecture search, we demonstrate \(0.88\%\) increase in TOP-1 accuracy with \(1.85\times\) reduction in latency for keyword spotting in audio on an embedded SoC, and \(1.59\times\) on a high-end GPU. |
Supervised Hyperalignment for multi-subject fMRI data alignment Neurons and Cognition, Machine Learning, Machine Learning. 4 authors. pdf Hyperalignment has been widely employed in Multivariate Pattern (MVP) analysis to discover the cognitive states in the human brains based on multi-subject functional Magnetic Resonance Imaging (fMRI) datasets. Most of the existing HA methods utilized unsupervised approaches, where they only maximized the correlation between the voxels with the same position in the time series. …However, these unsupervised solutions may not be optimum for handling the functional alignment in the supervised MVP problems. This paper proposes a Supervised Hyperalignment (SHA) method to ensure better functional alignment for MVP analysis, where the proposed method provides a supervised shared space that can maximize the correlation among the stimuli belonging to the same category and minimize the correlation between distinct categories of stimuli. Further, SHA employs a generalized optimization solution, which generates the shared space and calculates the mapped features in a single iteration, hence with optimum time and space complexities for large datasets. Experiments on multi-subject datasets demonstrate that SHA method achieves up to 19% better performance for multi-class problems over the state-of-the-art HA algorithms. |
Supervised Discriminative Sparse PCA with Adaptive Neighbors for Dimensionality Reduction Machine Learning, Machine Learning. 4 authors. pdf Dimensionality reduction is an important operation in information visualization, feature extraction, clustering, regression, and classification, especially for processing noisy high dimensional data. However, most existing approaches preserve either the global or the local structure of the data, but not both. …Approaches that preserve only the global data structure, such as principal component analysis (PCA), are usually sensitive to outliers. Approaches that preserve only the local data structure, such as locality preserving projections, are usually unsupervised (and hence cannot use label information) and uses a fixed similarity graph. We propose a novel linear dimensionality reduction approach, supervised discriminative sparse PCA with adaptive neighbors (SDSPCAAN), to integrate neighborhood-free supervised discriminative sparse PCA and projected clustering with adaptive neighbors. As a result, both global and local data structures, as well as the label information, are used for better dimensionality reduction. Classification experiments on nine high-dimensional datasets validated the effectiveness and robustness of our proposed SDSPCAAN. |
Trajectron++: Multi-Agent Generative Trajectory Forecasting With Heterogeneous Data for Control Machine Learning, Human-Computer Interaction, Robotics. 4 authors. pdf Reasoning about human motion through an environment is an important prerequisite to safe and socially-aware robotic navigation. As a result, multi-agent behavior prediction has become a core component of modern human-robot interactive systems, such as self-driving cars. …While there exist a multitude of methods for trajectory forecasting, many of them have only been evaluated with one semantic class of agents and only use prior trajectory information, ignoring a plethora of information available online to autonomous systems from common sensors. Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of agents with distinct semantic classes while incorporating heterogeneous data (e.g. semantic maps and camera images). Our model is designed to be tightly integrated with robotic planning and control frameworks; it is capable of producing predictions that are conditioned on ego-agent motion plans. We demonstrate the performance of our model on several challenging real-world trajectory forecasting datasets, outperforming a wide array of state-of-the-art deterministic and generative methods. |
Sampling Prediction-Matching Examples in Neural Networks: A Probabilistic Programming Approach Machine Learning, Machine Learning. 4 authors. pdf Though neural network models demonstrate impressive performance, we do not understand exactly how these black-box models make individual predictions. This drawback has led to substantial research devoted to understand these models in areas such as robustness, interpretability, and generalization ability. …In this paper, we consider the problem of exploring the prediction level sets of a classifier using probabilistic programming. We define a prediction level set to be the set of examples for which the predictor has the same specified prediction confidence with respect to some arbitrary data distribution. Notably, our sampling-based method does not require the classifier to be differentiable, making it compatible with arbitrary classifiers. As a specific instantiation, if we take the classifier to be a neural network and the data distribution to be that of the training data, we can obtain examples that will result in specified predictions by the neural network. We demonstrate this technique with experiments on a synthetic dataset and MNIST. Such level sets in classification may facilitate human understanding of classification behaviors. |
Deep Network Approximation for Smooth Functions Numerical Analysis, Machine Learning, Numerical Analysis, Machine Learning. 4 authors. pdf This paper establishes optimal approximation error characterization of deep ReLU networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width \(\mathcal{O}(N)\) and depth \(\mathcal{O}(L)\) with an approximation error \(\mathcal{O}(N^{-L})\). …Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width \(\mathcal{O}(N\ln N)\) and depth \(\mathcal{O}(L\ln L)\) can approximate \(f\in C^s([0,1]^d)\) with a nearly optimal approximation rate \(\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})\). Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by \(N\in\mathbb{N}^+\) and \(L\in\mathbb{N}^+\), respectively. |
Population-Guided Parallel Policy Search for Reinforcement Learning Machine Learning, Artificial Intelligence, Machine Learning. 3 authors. pdf In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search a good policy in collaboration with the guidance of the best policy information. …The key point is that the information of the best policy is fused in a soft manner by constructing an augmented loss function for policy update to enlarge the overall search region by the multiple learners. The guidance by the previous best policy and the enlarged range enable faster and better policy search. Monotone improvement of the expected cumulative return by the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm. Numerical results show that the constructed algorithm outperforms most of the current state-of-the-art RL algorithms, and the gain is significant in the case of sparse reward environment. |
Guidelines for enhancing data locality in selected machine learning algorithms Machine Learning, Machine Learning. 3 authors. pdf To deal with the complexity of the new bigger and more complex generation of data, machine learning (ML) techniques are probably the first and foremost used. For ML algorithms to produce results in a reasonable amount of time, they need to be implemented efficiently. …In this paper, we analyze one of the means to increase the performances of machine learning algorithms which is exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems due to the use of certain hardware techniques to improve performance. Altering the access patterns to increase locality can dramatically increase performance of a given algorithm. Besides, repeated data access can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We start by motivating why and how a more efficient implementation can be achieved by exploiting reuse in the memory hierarchy of modern instruction set processors. Next we document the possibilities of such reuse in some selected machine learning algorithms. |
Closed-loop deep learning: generating forward models with back-propagation Machine Learning, Artificial Intelligence, Robotics. 2 authors. pdf A reflex is a simple closed loop control approach which tries to minimise an error but fails to do so because it will always react too late. An adaptive algorithm can use this error to learn a forward model with the help of predictive cues. …For example a driver learns to improve their steering by looking ahead to avoid steering in the last minute. In order to process complex cues such as the road ahead deep learning is a natural choice. However, this is usually only achieved indirectly by employing deep reinforcement learning having a discrete state space. Here, we show how this can be directly achieved by embedding deep learning into a closed loop system and preserving its continuous processing. We show specifically how error back-propagation can be achieved in z-space and in general how gradient based approaches can be analysed in such closed loop scenarios. The performance of this learning paradigm is demonstrated using a line-follower both in simulation and on a real robot that show very fast and continuous learning. |
Spherical Image Generation from a Single Normal Field of View Image by Considering Scene Symmetry Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 2 authors. pdf Spherical images taken in all directions (360 degrees) allow representing the surroundings of the subject and the space itself, providing an immersive experience to the viewers. Generating a spherical image from a single normal-field-of-view (NFOV) image is convenient and considerably expands the usage scenarios because there is no need to use a specific panoramic camera or take images from multiple directions; however, it is still a challenging and unsolved problem. …The primary challenge is controlling the high degree of freedom involved in generating a wide area that includes the all directions of the desired plausible spherical image. On the other hand, scene symmetry is a basic property of the global structure of the spherical images, such as rotation symmetry, plane symmetry and asymmetry. We propose a method to generate spherical image from a single NFOV image, and control the degree of freedom of the generated regions using scene symmetry. We incorporate scene-symmetry parameters as latent variables into conditional variational autoencoders, following which we learn the conditional probability of spherical images for NFOV images and scene symmetry. Furthermore, the probability density functions are represented using neural networks, and scene symmetry is implemented using both circular shift and flip of the hidden variables. Our experiments show that the proposed method can generate various plausible spherical images, controlled from symmetric to asymmetric. |
Regularity and stability of feedback relaxed controls Optimization and Control, Machine Learning. 2 authors. pdf This paper proposes a relaxed control regularization with general exploration rewards to design robust feedback controls for multi-dimensional continuous-time stochastic exit time problems. We establish that the regularized control problem admits a Hölder continuous feedback control, and demonstrate that both the value function and the feedback control of the regularized control problem are Lipschitz stable with respect to parameter perturbations. …Moreover, we show that a pre-computed feedback relaxed control has a robust performance in a perturbed system, and derive a first-order sensitivity equation for both the value function and optimal feedback relaxed control. We finally prove first-order monotone convergence of the value functions for relaxed control problems with vanishing exploration parameters, which subsequently enables us to construct the pure exploitation strategy of the original control problem based on the feedback relaxed controls. |
A Connection between Feedback Capacity and Kalman Filter for Colored Gaussian Noises Machine Learning, Optimization and Control, Information Theory, Systems and Control, Information Theory, Signal Processing. 2 authors. pdf In this paper, we establish a connection between the feedback capacity of additive colored Gaussian noise channels and the Kalman filters with additive colored Gaussian noises. In light of this, we are able to provide lower bounds on feedback capacity of such channels with finite-order auto-regressive moving average colored noises, and the bounds are seen to be consistent with various existing results in the literature; particularly, the bound is tight in the case of first-order auto-regressive moving average colored noises. …On the other hand, the Kalman filtering systems, after certain equivalence transformations, can be employed as recursive coding schemes/algorithms to achieve the lower bounds. In general, our results provide an alternative perspective while pointing to potentially tighter bounds for the feedback capacity problem. |
Shallow Encoder Deep Decoder (SEDD) Networks for Image Encryption and Decryption Machine Learning, Machine Learning, Cryptography and Security. 1 author. pdf This paper explores a new framework for lossy image encryption and decryption using a simple shallow encoder neural network E for encryption, and a complex deep decoder neural network D for decryption. E is kept simple so that encoding can be done on low power and portable devices and can in principle be any nonlinear function which outputs an encoded vector. …D is trained to decode the encodings using the dataset of image - encoded vector pairs obtained from E, and this training happens independently of E. Because the encodings come from E, which, while being a simple neural network, still has thousands of random parameters, they would be practically impossible to crack without D. This approach differs from autoencoders as D is trained completely independently of E, although the structure may seem similar. Therefore, this paper also explores empirically whether a deep neural network can learn to reconstruct the original data in any useful form given the output of a neural network or any other nonlinear function, which can have very useful applications in cryptanalysis. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the decoded images from D along with some limitations. |
The tables below show abstracts organized by category with hyperlinks back to the arXiv site.
Machine Learning (cs.LG) |
Don’t Judge an Object by Its Context: Learning to Overcome Contextual Bias Computer Vision and Pattern Recognition, Machine Learning. 6 authors. pdf Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, strongly relying on context risks a model’s generalizability, especially when typical co-occurrence patterns are absent. …This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. Our goal is to accurately recognize a category in the absence of its context, without compromising on performance when it co-occurs with context. Our key idea is to decorrelate feature representations of a category from its co-occurring context. We achieve this by learning a feature subspace that explicitly represents categories occurring in the absence of context alongside a joint feature subspace that represents both categories and context. Our very simple yet effective method is extensible to two multi-label tasks – object and attribute classification. On 4 challenging datasets, we demonstrate the effectiveness of our method in reducing contextual bias. |
Objects detection for remote sensing images based on polar coordinates Computer Vision and Pattern Recognition. 6 authors. pdf Oriented and horizontal bounding boxes are two typical output forms in the field of remote sensing object detection. In this field, most present state-of-the-art detectors are anchor-based methods and perform regression tasks in Cartesian coordinates, which makes the design of oriented detectors much more complicated than that of horizontal ones, because the former usually need more complex rotated anchors, rotated Intersection-over-Union (IOU) and rotated Non-Maximum Suppression (NMS). …In this paper, we propose a novel anchor-free detector modeled in polar coordinates to detect objects in remote sensing images, which makes obtaining the oriented output form as simple as obtaining the horizontal one. Our model, named Polar Remote Sensing Object Detector (P-RSDet), takes the center point of each object as the pole point and the horizontal positive direction as the polar axis to establish the polar coordinate system. The detection of one object can be regarded as the prediction of one polar radius and two polar angles for both horizontal and oriented bounding boxes by our model. P-RSDet realizes the combination of two output forms with minimum cost. Experiments show that our P-RSDet achieves competitive performance on the DOTA, UCAS-AOD and NWPU VHR-10 datasets in both horizontal and oriented detection fields. |
Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function Machine Learning, Systems and Control, Artificial Intelligence. 5 authors. pdf In current reinforcement learning (RL) methods, function approximation errors are known to lead to overestimated or underestimated state-action values Q, which in turn lead to suboptimal policies. We show that the learning of a state-action return distribution function can be used to improve the estimation accuracy of the Q-value. …We combine the distributional return function with the maximum entropy RL framework in order to develop what we call the Distributional Soft Actor-Critic algorithm, DSAC, which is an off-policy method for continuous control settings. Unlike traditional distributional Q algorithms which typically only learn a discrete return distribution, DSAC can directly learn a continuous return distribution by truncating the difference between the target and current return distribution to prevent gradient explosion. Additionally, we propose a new Parallel Asynchronous Buffer-Actor-Learner architecture (PABAL) to improve the learning efficiency. We evaluate our method on the suite of MuJoCo continuous control tasks, achieving state-of-the-art performance. |
Virtual to Real adaptation of Pedestrian Detectors for Smart Cities Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 5 authors. pdf Pedestrian detection through computer vision is a building block for a multitude of applications in the context of smart cities, such as surveillance of sensitive areas, personal safety, monitoring, and control of pedestrian flow, to mention only a few. Recently, there has been an increasing interest in deep learning architectures for performing such a task. …One of the critical objectives of these algorithms is to generalize the knowledge gained during the training phase to new scenarios having various characteristics, and a suitably labeled dataset is fundamental to achieve this goal. The main problem is that manually annotating a dataset usually requires a lot of human effort, and it is a time-consuming operation. For this reason, in this work, we introduced ViPeD - Virtual Pedestrian Dataset, a new synthetically generated set of images collected from a realistic 3D video game where the labels can be automatically generated by exploiting 2D pedestrian positions extracted from the graphics engine. We used this new synthetic dataset to train a state-of-the-art computationally-efficient Convolutional Neural Network (CNN) that is ready to be installed in smart low-power devices, like smart cameras. We addressed the problem of domain adaptation from the virtual world to the real one by fine-tuning the CNN using the synthetic data and also exploiting a mixed-batch supervised training approach. Extensive experimentation carried out on different real-world datasets shows very competitive results compared to other methods presented in the literature in which the algorithms are trained using real-world data. |
DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection Computer Vision and Pattern Recognition, Machine Learning. 5 authors. pdf In this paper, we present our on-going effort of constructing a large-scale benchmark, DeeperForensics-1.0, for face forgery detection. …Our benchmark represents the largest face forgery detection dataset by far, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. Extensive real-world perturbations are applied to obtain a more challenging benchmark of larger scale and higher diversity. All source videos in DeeperForensics-1.0 are carefully collected, and fake videos are generated by a newly proposed end-to-end face swapping framework. The quality of generated videos outperforms those in existing datasets, validated by user studies. The benchmark features a hidden test set, which contains manipulated videos achieving high deceptive scores in human evaluations. We further contribute a comprehensive study that evaluates five representative detection baselines and make a thorough analysis of different settings. We believe this dataset will contribute to real-world face forgery detection research. |
The Counterfactual \(\chi\)-GAN Machine Learning, Machine Learning. 4 authors. pdf Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome, known as strong ignorability. Approaches to enforcing strong ignorability in causal analyses of observational data include weighting and matching methods. …Effect estimates, such as the average treatment effect (ATE), are then estimated as expectations under the reweighted or matched distribution, P. The choice of P is important and can impact the interpretation of the effect estimate and the variance of effect estimates. In this work, instead of specifying P, we learn a distribution that simultaneously maximizes coverage and minimizes variance of ATE estimates. In order to learn this distribution, this research proposes a generative adversarial network (GAN)-based model called the Counterfactual \(\chi\)-GAN (cGAN), which also learns feature-balancing weights and supports unbiased causal estimation in the absence of unobserved confounding. Our model minimizes the Pearson \(\chi^2\) divergence, which we show simultaneously maximizes coverage and minimizes the variance of importance sampling estimates. To our knowledge, this is the first such application of the Pearson \(\chi^2\) divergence. We demonstrate the effectiveness of cGAN in achieving feature balance relative to established weighting methods in simulation and with real-world medical data. |
Open Challenge for Correcting Errors of Speech Recognition Systems Computation and Language, Artificial Intelligence. 4 authors. pdf The paper announces the new long-term challenge for improving the performance of automatic speech recognition systems. The goal of the challenge is to investigate methods of correcting the recognition results on the basis of previously made errors by the speech processing system. …The dataset prepared for the task is described and evaluation criteria are presented. |
Performance-Oriented Neural Architecture Search Machine Learning, Neural and Evolutionary Computing. 4 authors. pdf Hardware-Software Co-Design is a highly successful strategy for improving performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to the typical criteria around accuracy. …Using the task of keyword spotting in audio on edge computing devices, we demonstrate that our approach results in neural architecture that is not only highly accurate, but also efficiently mapped to the computing platform which will perform the inference. Using our modified neural architecture search, we demonstrate \(0.88\%\) increase in TOP-1 accuracy with \(1.85\times\) reduction in latency for keyword spotting in audio on an embedded SoC, and \(1.59\times\) on a high-end GPU. |
GRIDS: Interactive Layout Design with Integer Programming Human-Computer Interaction, Artificial Intelligence. 4 authors. pdf Grid layouts are used by designers to spatially organise user interfaces when sketching and wireframing. However, their design is largely time-consuming manual work. …This is challenging due to combinatorial explosion and complex objectives, such as alignment, balance, and expectations regarding positions. This paper proposes a novel optimisation approach for the generation of diverse grid-based layouts. Our mixed integer linear programming (MILP) model offers a rigorous yet efficient method for grid generation that ensures packing, alignment, grouping, and preferential positioning of elements. Further, we present techniques for interactive diversification, enhancement, and completion of grid layouts (Figure 1). These capabilities are demonstrated using GRIDS, a wireframing tool that provides designers with real-time layout suggestions. We report findings from a ratings study (N = 13) and a design study (N = 16), lending evidence for the benefit of computational grid generation during early stages of design. |
Generative Pseudo-label Refinement for Unsupervised Domain Adaptation Computer Vision and Pattern Recognition. 4 authors. pdf We investigate and characterize the inherent resilience of conditional Generative Adversarial Networks (cGANs) against noise in their conditioning labels, and exploit this fact in the context of Unsupervised Domain Adaptation (UDA). In UDA, a classifier trained on the labelled source set can be used to infer pseudo-labels on the unlabelled target set. …However, this will result in a significant amount of misclassified examples (due to the well-known domain shift issue), which can be interpreted as noise injection in the ground-truth labels for the target set. We show that cGANs are, to some extent, robust against such “shift noise”. Indeed, cGANs trained with noisy pseudo-labels are able to filter such noise and generate cleaner target samples. We exploit this finding in an iterative procedure where a generative model and a classifier are jointly trained: in turn, the generator makes it possible to sample cleaner data from the target distribution, and the classifier makes it possible to associate better labels with target samples, progressively refining target pseudo-labels. Results on common benchmarks show that our method performs better than or comparably with the unsupervised domain adaptation state of the art. |
Computer Vision and Pattern Recognition (cs.CV) |
Supervised Discriminative Sparse PCA with Adaptive Neighbors for Dimensionality Reduction Machine Learning, Machine Learning. 4 authors. pdf Dimensionality reduction is an important operation in information visualization, feature extraction, clustering, regression, and classification, especially for processing noisy high dimensional data. However, most existing approaches preserve either the global or the local structure of the data, but not both. …Approaches that preserve only the global data structure, such as principal component analysis (PCA), are usually sensitive to outliers. Approaches that preserve only the local data structure, such as locality preserving projections, are usually unsupervised (and hence cannot use label information) and use a fixed similarity graph. We propose a novel linear dimensionality reduction approach, supervised discriminative sparse PCA with adaptive neighbors (SDSPCAAN), to integrate neighborhood-free supervised discriminative sparse PCA and projected clustering with adaptive neighbors. As a result, both global and local data structures, as well as the label information, are used for better dimensionality reduction. Classification experiments on nine high-dimensional datasets validated the effectiveness and robustness of our proposed SDSPCAAN. |
Trajectron++: Multi-Agent Generative Trajectory Forecasting With Heterogeneous Data for Control Machine Learning, Human-Computer Interaction, Robotics. 4 authors. pdf Reasoning about human motion through an environment is an important prerequisite to safe and socially-aware robotic navigation. As a result, multi-agent behavior prediction has become a core component of modern human-robot interactive systems, such as self-driving cars. …While there exist a multitude of methods for trajectory forecasting, many of them have only been evaluated with one semantic class of agents and only use prior trajectory information, ignoring a plethora of information available online to autonomous systems from common sensors. Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of agents with distinct semantic classes while incorporating heterogeneous data (e.g. semantic maps and camera images). Our model is designed to be tightly integrated with robotic planning and control frameworks; it is capable of producing predictions that are conditioned on ego-agent motion plans. We demonstrate the performance of our model on several challenging real-world trajectory forecasting datasets, outperforming a wide array of state-of-the-art deterministic and generative methods. |
Sampling Prediction-Matching Examples in Neural Networks: A Probabilistic Programming Approach Machine Learning, Machine Learning. 4 authors. pdf Though neural network models demonstrate impressive performance, we do not understand exactly how these black-box models make individual predictions. This drawback has led to substantial research devoted to understanding these models in areas such as robustness, interpretability, and generalization ability. …In this paper, we consider the problem of exploring the prediction level sets of a classifier using probabilistic programming. We define a prediction level set to be the set of examples for which the predictor has the same specified prediction confidence with respect to some arbitrary data distribution. Notably, our sampling-based method does not require the classifier to be differentiable, making it compatible with arbitrary classifiers. As a specific instantiation, if we take the classifier to be a neural network and the data distribution to be that of the training data, we can obtain examples that will result in specified predictions by the neural network. We demonstrate this technique with experiments on a synthetic dataset and MNIST. Such level sets in classification may facilitate human understanding of classification behaviors. |
Deep Network Approximation for Smooth Functions Numerical Analysis, Machine Learning, Numerical Analysis, Machine Learning. 4 authors. pdf This paper establishes optimal approximation error characterization of deep ReLU networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width \(\mathcal{O}(N)\) and depth \(\mathcal{O}(L)\) with an approximation error \(\mathcal{O}(N^{-L})\). …Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width \(\mathcal{O}(N\ln N)\) and depth \(\mathcal{O}(L\ln L)\) can approximate \(f\in C^s([0,1]^d)\) with a nearly optimal approximation rate \(\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})\). Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by \(N\in\mathbb{N}^+\) and \(L\in\mathbb{N}^+\), respectively. |
Killing Stubborn Mutants with Symbolic Execution Software Engineering. 4 authors. pdf We introduce SeMu, a Dynamic Symbolic Execution technique that generates test inputs capable of killing stubborn mutants (killable mutants that remain undetected after a reasonable amount of testing). SeMu aims at mutant propagation (triggering erroneous states to the program output) by incrementally searching for divergent program behaviours between the original and the mutant versions. …We model the mutant killing problem as a symbolic execution search within a specific area in the programs’ symbolic tree. In this framework, the search area is defined and controlled by parameters that allow scalable and cost-effective mutant killing. We integrate SeMu in KLEE and experimented with Coreutils (a benchmark frequently used in symbolic execution studies). Our results show that our modelling plays an important role in mutant killing. Perhaps more importantly, our results also show that, within a two-hour time limit, SeMu kills 37% of the stubborn mutants, where KLEE kills none and where the mutant infection strategy (strategy suggested by previous research) kills 17%. |
Population-Guided Parallel Policy Search for Reinforcement Learning Machine Learning, Artificial Intelligence, Machine Learning. 3 authors. pdf In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search for a good policy in collaboration with the guidance of the best policy information. …The key point is that the information of the best policy is fused in a soft manner by constructing an augmented loss function for policy update to enlarge the overall search region by the multiple learners. The guidance by the previous best policy and the enlarged range enable faster and better policy search. Monotone improvement of the expected cumulative return by the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm. Numerical results show that the constructed algorithm outperforms most of the current state-of-the-art RL algorithms, and the gain is significant in the case of sparse reward environments. |
STAViS: Spatio-Temporal AudioVisual Saliency Network Computer Vision and Pattern Recognition. 3 authors. pdf We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information in order to efficiently address the problem of saliency estimation in videos. Our approach employs a single network that combines visual saliency and auditory features and learns to appropriately localize sound sources and to fuse the two saliencies in order to obtain a final saliency map. …The network has been designed, trained end-to-end, and evaluated on six different databases that contain audiovisual eye-tracking data of a large variety of videos. We compare our method against 8 different state-of-the-art visual saliency models. Evaluation results across databases indicate that our STAViS model outperforms our visual only variant as well as the other state-of-the-art models in the majority of cases. Also, the consistently good performance it achieves for all databases indicates that it is appropriate for estimating saliency “in-the-wild”. |
Guidelines for enhancing data locality in selected machine learning algorithms Machine Learning, Machine Learning. 3 authors. pdf Machine learning (ML) techniques are probably the first and foremost tools used to deal with the complexity of the new, bigger and more complex generation of data. For ML algorithms to produce results in a reasonable amount of time, they need to be implemented efficiently. …In this paper, we analyze one of the means to increase the performance of machine learning algorithms, which is exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems due to the use of certain hardware techniques to improve performance. Altering the access patterns to increase locality can dramatically increase the performance of a given algorithm. Besides, repeated data access can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We start by motivating why and how a more efficient implementation can be achieved by exploiting reuse in the memory hierarchy of modern instruction set processors. Next, we document the possibilities of such reuse in some selected machine learning algorithms. |
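The point in the abstract above about access patterns can be made concrete with a small sketch. It is not taken from the paper; it only assumes a matrix stored in row-major order as one flat buffer, and the helper names (`row_major_index`, `access_strides`, `unit_stride_fraction`) are hypothetical. It reports how far apart consecutive memory accesses are for a row-wise versus a column-wise traversal.

```python
def row_major_index(i, j, n_cols):
    """Flat offset of element (i, j) in a row-major (C-order) matrix."""
    return i * n_cols + j


def access_strides(order, n_rows, n_cols):
    """Distances between consecutive flat offsets for a full traversal.

    order="row": outer loop over rows, inner loop over columns.
    order="col": outer loop over columns, inner loop over rows.
    """
    if order == "row":
        visits = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    elif order == "col":
        visits = [(i, j) for j in range(n_cols) for i in range(n_rows)]
    else:
        raise ValueError("order must be 'row' or 'col'")
    offsets = [row_major_index(i, j, n_cols) for i, j in visits]
    return [b - a for a, b in zip(offsets, offsets[1:])]


def unit_stride_fraction(strides):
    """Share of steps that move to the directly adjacent element."""
    return sum(1 for s in strides if s == 1) / len(strides)


if __name__ == "__main__":
    for order in ("row", "col"):
        strides = access_strides(order, 3, 4)
        print(order, strides, unit_stride_fraction(strides))
```

On a row-major buffer, the row-wise traversal touches adjacent elements at every step, while the column-wise one jumps by a full row length each time. The second pattern is the kind that tends to use caches poorly, although the actual slowdown depends on the hardware and is not measured here.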
Artificial Intelligence (cs.AI) |
Closed-loop deep learning: generating forward models with back-propagation Machine Learning, Artificial Intelligence, Robotics. 2 authors. pdf A reflex is a simple closed loop control approach which tries to minimise an error but fails to do so because it will always react too late. An adaptive algorithm can use this error to learn a forward model with the help of predictive cues. …For example a driver learns to improve their steering by looking ahead to avoid steering in the last minute. In order to process complex cues such as the road ahead deep learning is a natural choice. However, this is usually only achieved indirectly by employing deep reinforcement learning having a discrete state space. Here, we show how this can be directly achieved by embedding deep learning into a closed loop system and preserving its continuous processing. We show specifically how error back-propagation can be achieved in z-space and in general how gradient based approaches can be analysed in such closed loop scenarios. The performance of this learning paradigm is demonstrated using a line-follower both in simulation and on a real robot that show very fast and continuous learning. |
Conversational Search for Learning Technologies Information Retrieval, Human-Computer Interaction, Artificial Intelligence. 2 authors. pdf Conversational search is based on user-system cooperation with the objective of solving an information-seeking task. In this report, we discuss the implications of such cooperation from a learning perspective, on both the user and the system side. …We also focus on the stimulation of learning through a key component of conversational search, namely the multimodality of communication, and discuss the implications in terms of information retrieval. We end with a research road map describing promising research directions and perspectives. |
Computation and Language (cs.CL) |
Probabilistic Reasoning across the Causal Hierarchy Logic in Computer Science, Artificial Intelligence. 2 authors. pdf We propose a formalization of the three-tier causal hierarchy of association, intervention, and counterfactuals as a series of probabilistic logical languages. Our languages are of strictly increasing expressivity, the first capable of expressing quantitative probabilistic reasoning—including conditional independence and Bayesian inference—the second encoding do-calculus reasoning for causal effects, and the third capturing a fully expressive do-calculus for arbitrary counterfactual queries. …We give a corresponding series of finitary axiomatizations complete over both structural causal models and probabilistic programs, and show that satisfiability and validity for each language are decidable in polynomial space. |
Computers and Society (cs.CY) |
The Neighbours’ Similar Fitness Property for Local Search Artificial Intelligence, Discrete Mathematics. 2 authors. pdf For most practical optimisation problems local search outperforms random sampling - despite the “No Free Lunch Theorem”. This paper introduces a property of search landscapes termed Neighbours’ Similar Fitness (NSF) that underlies the good performance of neighbourhood search in terms of local improvement. …Though necessary, NSF is not sufficient to ensure that searching for improvement among the neighbours of a good solution is better than random search. The paper introduces an additional (natural) property which supports a general proof that, for NSF landscapes, neighbourhood search beats random search. |
Human-Computer Interaction (cs.HC) |
Compression of convolutional neural networks for high performance image matching tasks on mobile devices Computer Vision and Pattern Recognition. 2 authors. pdf Deep neural networks have demonstrated state-of-the-art performance for feature-based image matching through the advent of new large and diverse datasets. However, there has been little work on evaluating the computational cost, model size, and matching accuracy tradeoffs for these models. …This paper explicitly addresses these practical constraints by considering the state-of-the-art L2Net architecture. We observe a significant redundancy in the L2Net architecture, which we exploit through the use of depthwise separable layers and an efficient Tucker decomposition. We demonstrate that a combination of these methods is more effective, but still sacrifices the top-end accuracy. We therefore propose the Convolution-Depthwise-Pointwise (CDP) layer, which provides a means of interpolating between the standard and depthwise separable convolutions. With this proposed layer, we are able to achieve up to an 8 times reduction in the number of parameters on the L2Net architecture and a 13 times reduction in the computational complexity, while sacrificing less than 1% on the overall accuracy across the HPatches benchmarks. To further demonstrate the generalisation of this approach, we apply it to the SuperPoint model. We show that CDP layers improve upon the accuracy while using significantly fewer parameters and floating-point operations for inference. |
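For readers unfamiliar with why depthwise separable layers reduce model size, the arithmetic can be sketched as follows. This is only the textbook parameter count for a single layer, with biases ignored. It is not the paper's CDP layer or its Tucker decomposition, and the layer sizes used below (3×3 kernel, 128 input and 128 output channels) are illustrative assumptions rather than values taken from L2Net.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1
    pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    k, c_in, c_out = 3, 128, 128
    full = standard_conv_params(k, c_in, c_out)
    separable = depthwise_separable_params(k, c_in, c_out)
    print(full, separable, full / separable)
```

For a single layer the saving approaches a factor of k² as the number of output channels grows. The reductions reported in the abstract are for the whole network with the proposed layer, so they should not be read off this per-layer count.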
Information Theory (cs.IT) |
Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification Computers and Society, Computer Vision and Pattern Recognition. 2 authors. pdf Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals’ faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual’s face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. …Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dual-use technologies that can enable mass surveillance. However, the impact of an individual’s inclusion in training data on a derived system’s ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model’s training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers all major open source face recognition training datasets do not obtain informed consent from individuals during their collection. |
Logic in Computer Science (cs.LO) |
Spherical Image Generation from a Single Normal Field of View Image by Considering Scene Symmetry Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 2 authors. pdf Spherical images taken in all directions (360 degrees) allow representing the surroundings of the subject and the space itself, providing an immersive experience to the viewers. Generating a spherical image from a single normal-field-of-view (NFOV) image is convenient and considerably expands the usage scenarios because there is no need to use a specific panoramic camera or take images from multiple directions; however, it is still a challenging and unsolved problem. …The primary challenge is controlling the high degree of freedom involved in generating a wide area that includes all directions of the desired plausible spherical image. On the other hand, scene symmetry is a basic property of the global structure of the spherical images, such as rotation symmetry, plane symmetry and asymmetry. We propose a method to generate a spherical image from a single NFOV image, and control the degree of freedom of the generated regions using scene symmetry. We incorporate scene-symmetry parameters as latent variables into conditional variational autoencoders, following which we learn the conditional probability of spherical images for NFOV images and scene symmetry. Furthermore, the probability density functions are represented using neural networks, and scene symmetry is implemented using both circular shift and flip of the hidden variables. Our experiments show that the proposed method can generate various plausible spherical images, controlled from symmetric to asymmetric. |
Robotics (cs.RO) |
A Connection between Feedback Capacity and Kalman Filter for Colored Gaussian Noises Machine Learning, Optimization and Control, Information Theory, Systems and Control, Information Theory, Signal Processing. 2 authors. pdf In this paper, we establish a connection between the feedback capacity of additive colored Gaussian noise channels and the Kalman filters with additive colored Gaussian noises. In light of this, we are able to provide lower bounds on feedback capacity of such channels with finite-order auto-regressive moving average colored noises, and the bounds are seen to be consistent with various existing results in the literature; particularly, the bound is tight in the case of first-order auto-regressive moving average colored noises. …On the other hand, the Kalman filtering systems, after certain equivalence transformations, can be employed as recursive coding schemes/algorithms to achieve the lower bounds. In general, our results provide an alternative perspective while pointing to potentially tighter bounds for the feedback capacity problem. |
Software Engineering (cs.SE) |
Shallow Encoder Deep Decoder (SEDD) Networks for Image Encryption and Decryption Machine Learning, Machine Learning, Cryptography and Security. 1 author. pdf This paper explores a new framework for lossy image encryption and decryption using a simple shallow encoder neural network E for encryption, and a complex deep decoder neural network D for decryption. E is kept simple so that encoding can be done on low power and portable devices and can in principle be any nonlinear function which outputs an encoded vector. …D is trained to decode the encodings using the dataset of image/encoded-vector pairs obtained from E, and this training happens independently of E. Because the encodings come from E, which, while being a simple neural network, still has thousands of random parameters, the encodings would be practically impossible to crack without D. This approach differs from autoencoders as D is trained completely independently of E, although the structure may seem similar. Therefore, this paper also explores empirically if a deep neural network can learn to reconstruct the original data in any useful form given the output of a neural network or any other nonlinear function, which can have very useful applications in Cryptanalysis. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the decoded images from D along with some limitations. |
Computational Physics (physics.comp-ph) |
Cluster-based network model of an incompressible mixing layer Chaotic Dynamics, Fluid Dynamics, Data Analysis, Statistics and Probability. 5 authors. pdf We propose an automatable data-driven methodology for robust nonlinear reduced-order modeling from time-resolved snapshot data. In the kinematical coarse-graining, the snapshots are clustered into few centroids representable for the whole ensemble. …The dynamics is conceptualized as a directed network where the centroids represent nodes and the directed edges denote possible finite-time transitions. The transition probabilities and times are inferred from the snapshot data. The resulting cluster-based network model constitutes a deterministic-stochastic grey-box model resolving the coherent-structure evolution. This model is motivated by limit-cycle dynamics, illustrated for the chaotic Lorenz attractor and successfully demonstrated for the laminar two-dimensional mixing layer featuring Kelvin-Helmholtz vortices and vortex pairing. Cluster-based network modeling opens a promising new avenue with unique advantages over other model-order reductions based on clustering or proper orthogonal decomposition. |
Transformer ratio enhancement at wakefield excitation in blowout regime in plasma by electron bunch with semi-gaussian charge distribution Computational Physics, Plasma Physics, Accelerator Physics. 4 authors. pdf Using the 2d3v code LCODE, a numerical simulation has been performed of nonlinear wakefield excitation in plasma by a shaped relativistic electron bunch whose charge distribution increases according to a Gaussian charge distribution up to the maximum value and then decreases sharply to zero. The transformer ratio, defined as the ratio of the maximum accelerating field to the maximum decelerating field inside the bunch, and the accelerating wakefield have been investigated, taking into account the nonlinearity of the wakefield. …The dependence of the transformer ratio and the maximum accelerating field on the length of the bunch was investigated with a constant charge of the bunch. It was taken into account that the length of the nonlinear wakefield increases with increasing length of the bunch. It is shown that the transformer ratio reaches its maximum value for a certain length of the bunch. The maximum value of the transformer ratio reaches six, owing both to the profiling of the bunch and to the nonlinearity of the wakefield. |
Fluid Dynamics (physics.flu-dyn) |
A Hybrid Volume-Surface-Wire Integral Equation for the Anisotropic Forward Problem in Electroencephalography Computational Physics. 4 authors. pdf Solving the electroencephalography (EEG) forward problem is a fundamental step in a wide range of applications including biomedical imaging techniques based on inverse source localization. State-of-the-art electromagnetic solvers resort to a computationally expensive volumetric discretization of the full head to account for its complex and heterogeneous electric profile. …The more efficient, popular in biomedical imaging circles, but unfortunately oversimplifying Boundary Element Method (BEM) relies instead on a piecewise-uniform approximation that severely curbs its application in high resolution EEGs. This contribution lifts the standard BEM constraints by treating the local anisotropies with adequate wire and thin volume integral equations that are tailored to specific structures of the fibrous white matter and the inhomogeneous skull. The proposed hybrid integral equation formulation thereby avoids the full volumetric discretization of the head medium and allows for a realistic and efficient BEM-like solution of the anisotropic EEG forward problem. The accuracy and flexibility of the proposed formulation are demonstrated through numerical experiments involving both canonical and realistic MRI-based head models. |
Physics and Society (physics.soc-ph) |
Deconfined quantum criticality in spin-1/2 chains with long-range interactions Computational Physics. 3 authors. pdf We study spin-\(1/2\) chains with long-range power-law decaying unfrustrated (bipartite) Heisenberg exchange \(J_r \propto r^{-\alpha}\) and multi-spin interactions \(Q\) favoring a valence-bond solid (VBS) ground state. Employing quantum Monte Carlo techniques and Lanczos diagonalization, we analyze order parameters and excited-state level crossings to characterize quantum states and phase transitions in the \((\alpha,Q)\) plane. …For weak \(Q\) and sufficiently slowly decaying Heisenberg interactions (small \(\alpha\)), the system has a long-range-ordered antiferromagnetic (AFM) ground state, and upon increasing \(\alpha\) there is a continuous transition into a quasi long-range ordered (QLRO) critical state of the type in the standard Heisenberg chain. For rapidly decaying long-range interactions, there is a transition between QLRO and VBS ground states of the same kind as in the frustrated \(J_1\)-\(J_2\) Heisenberg chain. Our most important finding is a direct continuous quantum phase transition between the AFM and VBS states - a close analogy to the 2D deconfined quantum-critical point. In previous 1D analogies the ordered phases both have gapped fractional excitations, and the critical point is a conventional Luttinger Liquid. In our model the excitations fractionalize upon transitioning from the AFM state, changing from spin waves to deconfined spinons. We extract critical exponents at the AFM-VBS transition and use order-parameter distributions to study emergent symmetries. We find emergent O(\(4\)) symmetry of the O(\(3\)) AFM and scalar VBS order parameters. Thus, the order parameter fluctuations exhibit the covariance of a uniaxially deformed O(\(4\)) sphere (an “elliptical” symmetry). This unusual quantum phase transition does not yet have any known field theory description, and our detailed results can serve to guide its construction. We discuss possible experimental realizations. |
Plasma Physics (physics.plasm-ph) |
Impact of environmental changes on the dynamics of temporal networks Data Analysis, Statistics and Probability, Physics and Society. 3 authors. pdf Dynamics of complex social systems has often been described in the framework of temporal networks, where links are considered to exist only at the moment of interaction between nodes. Such interaction patterns are not only driven by internal interaction mechanisms, but also affected by environmental changes. …To investigate the impact of the environmental changes on the dynamics of temporal networks, we analyze several face-to-face interaction datasets using the multiscale entropy (MSE) method to find that the observed temporal correlations can be categorized according to the environmental similarity of datasets such as classes and break times in schools. By devising and studying a temporal network model considering a periodically changing environment as well as a preferential activation mechanism, we numerically show that our model could successfully reproduce various empirical results by the MSE method in terms of multiscale temporal correlations. Our results demonstrate that the environmental changes can play an important role in shaping the dynamics of temporal networks when the interactions between nodes are influenced by the environment of the systems. |
Materials Science (cond-mat.mtrl-sci) |
Role of longitudinal fluctuations in L\(1_0\) FePt Computational Physics, Materials Science. 3 authors. pdf L\(1_0\) FePt is a technologically important material for a range of novel data storage applications. In the ordered FePt structure the normally non-magnetic Pt ion acquires a magnetic moment, which depends on the local field originating from the neighboring Fe atoms. …In this work a model of FePt is constructed, where the induced Pt moment is simulated by using combined longitudinal and rotational spin dynamics. The model is parameterized to include a linear variation of the moment with the exchange field, so that at the Pt site the magnetic moment depends on the Fe ordering. The Curie temperature of FePt is calculated and agrees well with similar models that incorporate the Pt dynamics through an effective Fe-only Hamiltonian. By computing the dynamic correlation function the anisotropy field and the Gilbert damping are extracted over a range of temperatures. The anisotropy exhibits a power-law dependence with temperature with exponent \(n\approx2.1\). This agrees well with what is observed experimentally and it is obtained without including a two-ion anisotropy term as in other approaches. Our work shows that incorporating longitudinal fluctuations into spin dynamics calculations is crucial for understanding the properties of materials with induced moments. |
Probing the concept of line tension down to the nanoscale Chemical Physics, Computational Physics, Mesoscale and Nanoscale Physics. 3 authors. pdf A novel mechanical approach is developed to explore by means of atom-scale simulation the concept of line tension at a solid-liquid-vapor contact line as well as its dependence on temperature, confinement, and solid/fluid interactions. More precisely, by estimating the stresses exerted along and normal to a straight contact line formed within a partially wet pore, the line tension can be estimated while avoiding the pitfalls inherent to the geometrical scaling methodology based on hemispherical drops. …The line tension for Lennard-Jones fluids is found to follow a generic behavior with temperature and chemical potential effects that are all included in a simple contact angle parameterization. Former discrepancies between theoretical modeling and molecular simulation are resolved, and the line tension concept is shown to be robust down to molecular confinements. The same qualitative behavior is observed for water but the line tension at the wetting transition diverges or converges towards a finite value depending on the range of the solid/fluid interactions at play. |
Mesoscale and Nanoscale Physics (cond-mat.mes-hall) |
High-precision estimate of the hydrodynamic radius for self-avoiding walks Statistical Mechanics, Computational Physics. 2 authors. pdf The universal asymptotic amplitude ratio between the gyration radius and the hydrodynamic radius of self-avoiding walks is estimated by high-resolution Monte Carlo simulations. By studying chains of length of up to \(N = 2^{25} \approx 34 \times 10^6\) monomers, we find that the ratio takes the value \(R_{\mathrm{G}}/R_{\mathrm{H}} = 1.5803940(45)\), which is several orders of magnitude more accurate than the previous state of the art. …This is facilitated by a sampling scheme which is quite general, and which allows for the efficient estimation of averages of a large class of observables. The competing corrections to scaling for the hydrodynamic radius are clearly discernible. We also find improved estimates for other universal properties that measure the chain dimension. In particular, a method of analysis which eliminates the leading correction to scaling results in a highly accurate estimate for the Flory exponent of \(\nu = 0.58759700(40)\). |
Statistical Mechanics (cond-mat.stat-mech) |
Electrocrystallization of Supercooled Water Confined between Graphene Layers Materials Science, Mesoscale and Nanoscale Physics, Chemical Physics, Computational Physics, Soft Condensed Matter. 2 authors. pdf A key feature of the crystallization of supercooled water confined in an applied static electric field is that the structural order here is determined not only by usual thermodynamic and kinematic factors (degree of supercooling, difference between chemical potentials for a liquid and a crystal, and viscosity) but also by the strength and direction of the applied electric field, size of a system (size effects), and the geometry of bounding surfaces. In this work, the electrocrystallization of supercooled water confined between ideally flat parallel graphene sheets at a temperature of \(T=268\) K has been considered. …It has been established that structural order is determined by two characteristic modes. The initial mode correlates with the orientation of dipolar water molecules by the applied electric field. The subsequent mode is characterized by the relaxation of a metastable system to a crystalline phase. The uniform electric field applied perpendicularly to the graphene sheets suppresses structural ordering, whereas the field applied in the lateral direction promotes cubic ice \(I_c\). |
Computation (stat.CO) |
Supervised Hyperalignment for multi-subject fMRI data alignment Neurons and Cognition, Machine Learning, Machine Learning. 4 authors. pdf Hyperalignment has been widely employed in Multivariate Pattern (MVP) analysis to discover the cognitive states in the human brains based on multi-subject functional Magnetic Resonance Imaging (fMRI) datasets. Most of the existing HA methods utilized unsupervised approaches, where they only maximized the correlation between the voxels with the same position in the time series. …However, these unsupervised solutions may not be optimum for handling the functional alignment in the supervised MVP problems. This paper proposes a Supervised Hyperalignment (SHA) method to ensure better functional alignment for MVP analysis, where the proposed method provides a supervised shared space that can maximize the correlation among the stimuli belonging to the same category and minimize the correlation between distinct categories of stimuli. Further, SHA employs a generalized optimization solution, which generates the shared space and calculates the mapped features in a single iteration, hence with optimum time and space complexities for large datasets. Experiments on multi-subject datasets demonstrate that SHA method achieves up to 19% better performance for multi-class problems over the state-of-the-art HA algorithms. |
Semi-automated simultaneous predictor selection for Regression-SARIMA models Methodology. 4 authors. pdf Deciding which predictors to use plays an integral role in deriving statistical models in a wide range of applications. Motivated by the challenges of predicting events across a telecommunications network, we propose a semi-automated, joint model-fitting and predictor selection procedure for linear regression models. …Our approach can model and account for serial correlation in the regression residuals, produces sparse and interpretable models and can be used to jointly select models for a group of related responses. This is achieved through fitting linear models under constraints on the number of non-zero coefficients using a generalisation of a recently developed Mixed Integer Quadratic Optimisation approach. The resultant models from our approach achieve better predictive performance on the motivating telecommunications data than methods currently used by industry. |
Machine Learning (stat.ML) |
Rapid Numerical Approximation Method for Integrated Covariance Functions Over Irregular Data Regions Computation. 3 authors. pdf In many practical applications, spatial data are often collected at areal levels (i.e., block data) and the inferences and predictions about the variable at points or blocks different from those at which it has been observed typically depend on integrals of the underlying continuous spatial process. In this paper we describe a method based on Fourier transform by which multiple integrals of covariance functions over irregular data regions may be numerically approximated with the same level of accuracy to traditional methods, but at a greatly reduced computational expense. |
Methodology (stat.ME) |
Importance Gaussian Quadrature Computation. 3 authors. pdf Importance sampling (IS) and numerical integration methods are usually employed for approximating moments of complicated targeted distributions. In its basic procedure, the IS methodology randomly draws samples from a proposal distribution and weights them accordingly, accounting for the mismatch between the target and proposal. …In this work, we present a general framework of numerical integration techniques inspired by the IS methodology. The framework can also be seen as an incorporation of deterministic rules into IS methods, reducing the error of the estimators by several orders of magnitude in several problems of interest. The proposed approach extends the range of applicability of the Gaussian quadrature rules. For instance, the IS perspective allows us to use Gauss-Hermite rules in problems where the integrand is not involving a Gaussian distribution, and even more, when the integrand can only be evaluated up to a normalizing constant, as it is usually the case in Bayesian inference. The novel perspective makes use of recent advances on the multiple IS (MIS) and adaptive (AIS) literatures, and incorporates it to a wider numerical integration framework that combines several numerical integration rules that can be iteratively adapted. We analyze the convergence of the algorithms and provide some representative examples showing the superiority of the proposed approach in terms of performance. |
Optimization and Control (math.OC) |
Minimax Optimal Conditional Independence Testing Statistics Theory, Statistics Theory. 3 authors. pdf We consider the problem of conditional independence testing of \(X\) and \(Y\) given \(Z\) where \(X,Y\) and \(Z\) are three real random variables and \(Z\) is continuous. We focus on two main cases – when \(X\) and \(Y\) are both discrete, and when \(X\) and \(Y\) are both continuous. …In view of recent results on conditional independence testing (Shah and Peters 2018), one cannot hope to design non-trivial tests, which control the type I error for all absolutely continuous conditionally independent distributions, while still ensuring power against interesting alternatives. Consequently, we identify various, natural smoothness assumptions on the conditional distributions of \(X,Y|Z=z\) as \(z\) varies in the support of \(Z\), and study the hardness of conditional independence testing under these smoothness assumptions. We derive matching lower and upper bounds on the critical radius of separation between the null and alternative hypotheses in the total variation metric. The tests we consider are easily implementable and rely on binning the support of the continuous variable \(Z\). To complement these results, we provide a new proof of the hardness result of Shah and Peters and show that in the absence of smoothness assumptions conditional independence testing remains difficult even when \(X,Y\) are discrete variables of finite (and not scaling with the sample-size) support. |
Statistics Theory (math.ST) |
Regularity and stability of feedback relaxed controls Optimization and Control, Machine Learning. 2 authors. pdf This paper proposes a relaxed control regularization with general exploration rewards to design robust feedback controls for multi-dimensional continuous-time stochastic exit time problems. We establish that the regularized control problem admits a Hölder continuous feedback control, and demonstrate that both the value function and the feedback control of the regularized control problem are Lipschitz stable with respect to parameter perturbations. …Moreover, we show that a pre-computed feedback relaxed control has a robust performance in a perturbed system, and derive a first-order sensitivity equation for both the value function and optimal feedback relaxed control. We finally prove first-order monotone convergence of the value functions for relaxed control problems with vanishing exploration parameters, which subsequently enables us to construct the pure exploitation strategy of the original control problem based on the feedback relaxed controls. |
Image and Video Processing (eess.IV) |
An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal Image and Video Processing, Computer Vision and Pattern Recognition. 5 authors. pdf In this paper, we study a new problem arising from the emerging MPEG standardization effort Video Coding for Machine (VCM), which aims to bridge the gap between visual feature compression and classical video coding. VCM is committed to address the requirement of compact signal representation for both machine and human vision in a more or less scalable way. …To this end, we make endeavors in leveraging the strength of predictive and generative models to support advanced compression techniques for both machine and human vision tasks simultaneously, in which visual features serve as a bridge to connect signal-level and task-level compact representations in a scalable manner. Specifically, we employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern. By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames via a generative model, relying on the appearance of the coded key frames. Meanwhile, the sparse motion pattern is compact and highly effective for high-level vision tasks, e.g. action recognition. Experimental results demonstrate that our method yields much better reconstruction quality compared with the traditional video codecs (0.0063 gain in SSIM), as well as state-of-the-art action recognition performance over highly compressed videos (9.4% gain in recognition accuracy), which showcases a promising paradigm of coding signal for both human and machine vision. |
General Relativity and Quantum Cosmology (gr-qc) |
Deep learning for clustering of continuous gravitational wave candidates Instrumentation and Methods for Astrophysics, Data Analysis, Statistics and Probability, General Relativity and Quantum Cosmology. 2 authors. pdf In searching for continuous gravitational waves over very many (\(\approx 10^{17}\)) templates, clustering is a powerful tool which increases the search sensitivity by identifying and bundling together candidates that are due to the same root cause. We implement a deep learning network that identifies clusters of signal candidates in the output of continuous gravitational wave searches and assess its performance. … |
Quantitative Methods (q-bio.QM) |
Real-time nanodiamond thermometry probing thermogenic responses Mesoscale and Nanoscale Physics, Quantitative Methods, Biological Physics, Quantum Physics. 15 authors. pdf Real-time temperature monitoring inside living organisms provides a direct measure of their biological activities, such as homeostatic thermoregulation and energy metabolism. However, it is challenging to reduce the size of bio-compatible thermometers down to submicrometers despite their potential applications for the thermal imaging of subtissue structures with single-cell resolution. …Light-emitting nanothermometers that remotely sense temperature via optical signals exhibit considerable potential in such high-spatial-resolution thermometry. Here, using quantum nanothermometers based on optically accessible electron spins in nanodiamonds (NDs), we demonstrate real-time temperature monitoring inside () worms. We developed a thermometry system that can measure the temperatures of movable NDs inside live adult worms with a precision of \(\pm 0.22^{\circ}{\rm C}\). Using this system, we determined the increase in temperature based on the thermogenic responses of the worms during the chemical stimuli of mitochondrial uncouplers. Our technique demonstrates sub-micrometer localization of real-time temperature information in living animals and direct identification of their pharmacological thermogenesis. The results obtained facilitate the development of a method to probe subcellular temperature variation inside living organisms and may allow for quantification of their biological activities based on their energy expenditures. |
Quantum Physics (quant-ph) |
Software Mitigation of Crosstalk on Noisy Intermediate-Scale Quantum Computers Emerging Technologies, Quantum Physics. 4 authors. pdf Crosstalk is a major source of noise in Noisy Intermediate-Scale Quantum (NISQ) systems and is a fundamental challenge for hardware design. When multiple instructions are executed in parallel, crosstalk between the instructions can corrupt the quantum state and lead to incorrect program execution. …Our goal is to mitigate the application impact of crosstalk noise through software techniques. This requires (i) accurate characterization of hardware crosstalk, and (ii) intelligent instruction scheduling to serialize the affected operations. Since crosstalk characterization is computationally expensive, we develop optimizations which reduce the characterization overhead. On three 20-qubit IBMQ systems, we demonstrate two orders of magnitude reduction in characterization time (compute time on the QC device) compared to all-pairs crosstalk measurements. Informed by this characterization, we develop a scheduler that judiciously serializes high crosstalk instructions, balancing the need to mitigate crosstalk and exponential decoherence errors from serialization. On real-system runs on three IBMQ systems, our scheduler improves the error rate of application circuits by up to 5.6x, compared to the IBM instruction scheduler and offers near-optimal crosstalk mitigation in practice. In a broader picture, the difficulty of mitigating crosstalk has recently driven QC vendors to move towards sparser qubit connectivity or disabling nearby operations entirely in hardware, which can be detrimental to performance. Our work makes the case for software mitigation of crosstalk errors. |
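
The idea in the last abstract of serializing high-crosstalk instructions can be illustrated with a toy example. The sketch below is not the paper's scheduler: it is a hypothetical greedy layering, with the `schedule` function, the gate encoding and the `crosstalk` set all invented here for illustration. It assumes the set of conflicting gate pairs is already known from characterization, and it ignores the decoherence cost of serialization that the abstract says a real scheduler has to balance.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Gate = Tuple[str, Tuple[int, ...]]  # (gate name, qubits it acts on); hypothetical encoding


def schedule(gates: List[Gate],
             crosstalk: Set[FrozenSet[Tuple[int, ...]]]) -> List[List[Gate]]:
    """Greedy as-soon-as-possible layering that never places two gates
    from a known high-crosstalk pair in the same layer."""
    layers: List[List[Gate]] = []
    ready_at: Dict[int, int] = {}  # qubit -> first layer in which it is free
    for gate in gates:
        _, qubits = gate
        # Respect program order on shared qubits.
        layer = max((ready_at.get(q, 0) for q in qubits), default=0)
        # Delay the gate while the candidate layer holds a conflicting gate.
        while layer < len(layers) and any(
            frozenset((qubits, other[1])) in crosstalk
            for other in layers[layer]
        ):
            layer += 1
        while len(layers) <= layer:
            layers.append([])
        layers[layer].append(gate)
        for q in qubits:
            ready_at[q] = layer + 1
    return layers


# Assumed characterization result: CX on (0, 1) disturbs CX on (2, 3).
gates = [("cx", (0, 1)), ("cx", (2, 3)), ("cx", (4, 5))]
crosstalk = {frozenset({(0, 1), (2, 3)})}
print(schedule(gates, crosstalk))
# [[('cx', (0, 1)), ('cx', (4, 5))], [('cx', (2, 3))]]
```

With an empty `crosstalk` set all three gates share one layer. Declaring the conflict costs one extra layer, which is the parallelism-versus-crosstalk trade-off the abstract describes.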