52 new data science research articles were published on 2020-01-06. 20 discussed machine learning.
Yesterday’s counts of submitted papers on www.arxiv.org grouped by primary subject. Click the links in the table to be re-directed to the abstracts below. The links under Subject
will redirect you to abstracts with the primary subject (there can only be one primary subject on arXiv). The links under Category
will redirect you to all publications yesterday with a given tag (primary or secondary).
Subject | Category | N |
---|---|---|
Computer Science (29) | Computer Vision and Pattern Recognition (cs.CV) | 9 (2) |
Machine Learning (cs.LG) | 6 (12) | |
Software Engineering (cs.SE) | 5 | |
Artificial Intelligence (cs.AI) | 2 (2) | |
Neural and Evolutionary Computing (cs.NE) | 2 | |
Computers and Society (cs.CY) | 1 (1) | |
Robotics (cs.RO) | 1 (1) | |
Cryptography and Security (cs.CR) | 1 | |
Data Structures and Algorithms (cs.DS) | 1 | |
Information Retrieval (cs.IR) | 1 | |
Statistics (9) | Methodology (stat.ME) | 4 |
Computation (stat.CO) | 3 | |
Machine Learning (stat.ML) | 2 (12) | |
Elec. Eng. and Systems Science (4) | Image and Video Processing (eess.IV) | 2 (4) |
Audio and Speech Processing (eess.AS) | 2 | |
Physics (3) | Computational Physics (physics.comp-ph) | 2 (2) |
Data Analysis, Statistics and Probability (physics.data-an) | 1 | |
Condensed Matter (2) | Mesoscale and Nanoscale Physics (cond-mat.mes-hall) | 2 |
Mathematics (2) | Statistics Theory (math.ST) | 2 |
Quantitative Finance (2) | Portfolio Management (q-fin.PM) | 1 |
Statistical Finance (q-fin.ST) | 1 | |
Other (1) | Instrumentation and Methods for Astrophysics (astro-ph.IM) | 1 |
This section contains all articles with any tag of stat.AP
, stat.co
, stat.ML
, cs.LG
, q-fin.ST
, q-fin.EC
, or econ-EM
. Only the first two sentences are shown - click the links for more detail.
NA |
A Spectral Hidden Markov Model for Nonstationary Oscillatory Processes Methodology, Applications. 4 authors. pdf We propose to model time-varying periodic and oscillatory processes by means of a hidden Markov model where the states are defined through the spectral properties of a periodic regime. The number of states is unknown along with the relevant periodicities, the role and number of which may vary across states. …We address this inference problem by a Bayesian nonparametric hidden Markov model assuming a sticky hierarchical Dirichlet process for the switching dynamics between different states while the periodicities characterizing each state are explored by means of a trans-dimensional Markov chain Monte Carlo sampling step. We develop the full Bayesian inference algorithm and illustrate the use of our proposed methodology for different simulation studies as well as an application related to respiratory research which focuses on the detection of apnea instances in human breathing traces. |
Discrete event simulation of point processes: A computational complexity analysis on sparse graphs Applications, Computation. 3 authors. pdf We derive new discrete event simulation algorithms for marked time point processes. The main idea is to couple a special structure, namely the associated local independence graph, as defined by Didelez arXiv:0710. …5874, with the activity tracking algorithm [muzy, 2019] for achieving high performance asynchronous simulations. With respect to classical algorithm, this allows reducing drastically the computational complexity, especially when the graph is sparse. [muzy, 2019] A. Muzy. 2019. Exploiting activity for the modeling and simulation of dynamics and learning processes in hierarchical (neurocognitive) systems. (Submitted to) Magazine of Computing in Science & Engineering (2019) |
Applying Information Theory to Design Optimal Filters for Photometric Redshifts Applications, Information Theory, Instrumentation and Methods for Astrophysics, Information Theory. 3 authors. pdf In this paper we apply ideas from information theory to create a method for the design of optimal filters for photometric redshift estimation. We show the method applied to a series of simple example filters in order to motivate an intuition for how photometric redshift estimators respond to the properties of photometric passbands. …We then design a realistic set of six filters covering optical wavelengths that optimize photometric redshifts for \(z <= 2.3\) and \(i < 25.3\). We create a simulated catalog for these optimal filters and use our filters with a photometric redshift estimation code to show that we can improve the standard deviation of the photometric redshift error by 7.1% overall and improve outliers 9.9% over the standard filters proposed for the Large Synoptic Survey Telescope (LSST). We compare features of our optimal filters to LSST and find that the LSST filters incorporate key features for optimal photometric redshift estimation. Finally, we describe how information theory can be applied to a range of optimization problems in astronomy. |
Machine Learning (stat.ML) |
Meta-modal Information Flow: A Method for Capturing Multimodal Modular Disconnectivity in Schizophrenia Machine Learning, Machine Learning, Image and Video Processing. 16 authors. pdf Objective: Multimodal measurements of the same phenomena provide complementary information and highlight different perspectives, albeit each with their own limitations. A focus on a single modality may lead to incorrect inferences, which is especially important when a studied phenomenon is a disease. …In this paper, we introduce a method that takes advantage of multimodal data in addressing the hypotheses of disconnectivity and dysfunction within schizophrenia (SZ). Methods: We start with estimating and visualizing links within and among extracted multimodal data features using a Gaussian graphical model (GGM). We then propose a modularity-based method that can be applied to the GGM to identify links that are associated with mental illness across a multimodal data set. Through simulation and real data, we show our approach reveals important information about disease-related network disruptions that are missed with a focus on a single modality. We use functional MRI (fMRI), diffusion MRI (dMRI), and structural MRI (sMRI) to compute the fractional amplitude of low frequency fluctuations (fALFF), fractional anisotropy (FA), and gray matter (GM) concentration maps. These three modalities are analyzed using our modularity method. Results: Our results show missing links that are only captured by the cross-modal information that may play an important role in disconnectivity between the components. Conclusion: We identified multimodal (fALFF, FA and GM) disconnectivity in the default mode network area in patients with SZ, which would not have been detectable in a single modality. Significance: The proposed approach provides an important new tool for capturing information that is distributed among multiple imaging modalities. |
Development, Demonstration, and Validation of Data-driven Compact Diode Models for Circuit Simulation and Analysis Machine Learning, Machine Learning, Computational Engineering, Finance, and Science. 8 authors. pdf Compact semiconductor device models are essential for efficiently designing and analyzing large circuits. However, traditional compact model development requires a large amount of manual effort and can span many years. …Moreover, inclusion of new physics (eg, radiation effects) into an existing compact model is not trivial and may require redevelopment from scratch. Machine Learning (ML) techniques have the potential to automate and significantly speed up the development of compact models. In addition, ML provides a range of modeling options that can be used to develop hierarchies of compact models tailored to specific circuit design stages. In this paper, we explore three such options: (1) table-based interpolation, (2)Generalized Moving Least-Squares, and (3) feed-forward Deep Neural Networks, to develop compact models for a p-n junction diode. We evaluate the performance of these “data-driven” compact models by (1) comparing their voltage-current characteristics against laboratory data, and (2) building a bridge rectifier circuit using these devices, predicting the circuit’s behavior using SPICE-like circuit simulations, and then comparing these predictions against laboratory measurements of the same circuit. |
Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization Machine Learning, Machine Learning. 6 authors. pdf Interpreting the behaviors of Deep Neural Networks (usually considered as a black box) is critical especially when they are now being widely adopted over diverse aspects of human life. Taking the advancements from Explainable Artificial Intelligent, this paper proposes a novel technique called Auto DeepVis to dissect catastrophic forgetting in continual learning. …A new method to deal with catastrophic forgetting named critical freezing is also introduced upon investigating the dilemma by Auto DeepVis. Experiments on a captioning model meticulously present how catastrophic forgetting happens, particularly showing which components are forgetting or changing. The effectiveness of our technique is then assessed; and more precisely, critical freezing claims the best performance on both previous and coming tasks over baselines, proving the capability of the investigation. Our techniques could not only be supplementary to existing solutions for completely eradicating catastrophic forgetting for life-long learning but also explainable. |
Deep Learning-Based Solvability of Underdetermined Inverse Problems in Medical Imaging Machine Learning, Machine Learning, Image and Video Processing. 5 authors. pdf Recently, with the significant developments in deep learning techniques, solving underdetermined inverse problems has become one of the major concerns in the medical imaging domain. Typical examples include undersampled magnetic resonance imaging, interior tomography, and sparse-view computed tomography, where deep learning techniques have achieved excellent performances. …Although deep learning methods appear to overcome the limitations of existing mathematical methods when handling various underdetermined problems, there is a lack of rigorous mathematical foundations that would allow us to elucidate the reasons for the remarkable performance of deep learning methods. This study focuses on learning the causal relationship regarding the structure of the training data suitable for deep learning, to solve highly underdetermined inverse problems. We observe that a majority of the problems of solving underdetermined linear systems in medical imaging are highly non-linear. Furthermore, we analyze if a desired reconstruction map can be learnable from the training data and underdetermined system. |
A Block-based Generative Model for Attributed Networks Embedding Machine Learning, Machine Learning. 5 authors. pdf Attributed network embedding has attracted plenty of interests in recent years. It aims to learn task-independent, low-dimension, and continuous vectors for nodes preserving both topology and attribute information. …Most existing methods, such as GCN and its variations, mainly focus on the local information, i.e., the attributes of the neighbors. Thus, they have been well studied for assortative networks but ignored disassortative networks, which are common in real scenes. To address this issue, we propose a block-based generative model for attributed network embedding on a probability perspective inspired by the stochastic block model (SBM). Specifically, the nodes are assigned to several blocks wherein the nodes in the same block share the similar link patterns. These patterns can define assortative networks containing communities or disassortative networks with the multipartite, hub, or any hybrid structures. Concerning the attribute information, we assume that each node has a hidden embedding related to its assigned block, and then we use a neural network to characterize the nonlinearity between the node embedding and its attribute. We perform extensive experiments on real-world and synthetic attributed networks, and the experimental results show that our proposed method remarkably outperforms state-of-the-art embedding methods for both clustering and classification tasks, especially on disassortative networks. |
MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data Machine Learning, Genomics, Machine Learning. 5 authors. pdf Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. …The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black box matching procedures that are too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when we can expect it to work well and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single cell molecular data. |
Think Locally, Act Globally: Federated Learning with Local and Global Representations Machine Learning, Machine Learning, Distributed, Parallel, and Cluster Computing. 5 authors. pdf Federated learning is an emerging research paradigm to train models on private data distributed over multiple devices. A key challenge involves keeping private all the data on each device and training a global model only by communicating parameters and updates. …Overcoming this problem relies on the global model being sufficiently compact so that the parameters can be efficiently sent over communication channels such as wireless internet. Given the recent trend towards building deeper and larger neural networks, deploying such models in federated settings on real-world tasks is becoming increasingly difficult. To this end, we propose to augment federated learning with local representation learning on each device to learn useful and compact features from raw data. As a result, the global model can be smaller since it only operates on higher-level local representations. We show that our proposed method achieves superior or competitive results when compared to traditional federated approaches on a suite of publicly available real-world datasets spanning image recognition (MNIST, CIFAR) and multimodal learning (VQA). Our choice of local representation learning also reduces the number of parameters and updates that need to be communicated to and from the global model, thereby reducing the bottleneck in terms of communication cost. Finally, we show that our local models provide flexibility in dealing with online heterogeneous data and can be easily modified to learn fair representations that obfuscate protected attributes such as race, age, and gender, a feature crucial to preserving the privacy of on-device data. |
A Note on Portfolio Optimization with Quadratic Transaction Costs Computational Finance, Machine Learning, Portfolio Management. 4 authors. pdf In this short note, we consider mean-variance optimized portfolios with transaction costs. We show that introducing quadratic transaction costs makes the optimization problem more difficult than using linear transaction costs. …The reason lies in the specification of the budget constraint, which is no longer linear. We provide numerical algorithms for solving this issue and illustrate how transaction costs may considerably impact the expected returns of optimized portfolios. |
Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model Machine Learning, Machine Learning, Atmospheric and Oceanic Physics. 4 authors. pdf A novel method, based on the combination of data assimilation and machine learning is introduced. The new hybrid approach is designed for a two-fold scope: (i) emulating hidden, possibly chaotic, dynamics and (ii) predicting their future states. …The method consists in applying iteratively a data assimilation step, here an ensemble Kalman filter, and a neural network. Data assimilation is used to optimally combine a surrogate model with sparse noisy data. The output analysis is spatially complete and is used as a training set by the neural network to update the surrogate model. The two steps are then repeated iteratively. Numerical experiments have been carried out using the chaotic 40-variables Lorenz 96 model, proving both convergence and statistical skills of the proposed hybrid approach. The surrogate model shows short-term forecast skills up to two Lyapunov times, the retrieval of positive Lyapunov exponents as well as the more energetic frequencies of the power density spectrum. The sensitivity of the method to critical setup parameters is also presented: forecast skills decrease smoothly with increased observational noise but drops abruptly if less than half of the model domain is observed. The successful synergy between data assimilation and machine learning, proven here with a low-dimensional system, encourages further investigation of such hybrids with more sophisticated dynamics. |
Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints Machine Learning, Machine Learning, Artificial Intelligence. 3 authors. pdf Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and to foster exploration. An important open problem is how can an agent autonomously learn useful options when solving particular distributions of related tasks. …We investigate some of the conditions that influence optimality of options, in settings where agents have a limited time budget for learning each task and the task distribution might involve problems with different levels of similarity. We directly search for optimal option sets and show that the discovered options significantly differ depending on factors such as the available learning time budget and that the found options outperform popular option-generation heuristics. |
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification Computer Vision and Pattern Recognition, Machine Learning, Machine Learning. 2 authors. pdf In real-world scenarios, data tends to exhibit a long-tailed, imbalanced distribution. Developing algorithms to deal with such long-tailed distribution thus becomes indispensable in practical applications. …In this paper, we propose a novel self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME). Our method is inspired by the observation that deep Convolutional Neural Networks (CNNs) trained on less imbalanced subsets of the entire long-tailed distribution often yield better performances than their jointly-trained counterparts. We refer to these models as Expert Models', and the proposed LFME framework aggregates the knowledge from multiple Experts’ to learn a unified student model. Specifically, the proposed framework involves two levels of self-paced learning schedules: Self-paced Expert Selection and Self-paced Instance Selection, so that the knowledge is adaptively transferred from multiple Experts' to the Student’. In order to verify the effectiveness of our proposed framework, we conduct extensive experiments on two long-tailed benchmark classification datasets. The experimental results demonstrate that our method is able to achieve superior performances compared to the state-of-the-art methods. We also show that our method can be easily plugged into state-of-the-art long-tailed classification algorithms for further improvements.
|
Topic Extraction of Crawled Documents Collection using Correlated Topic Model in MapReduce Framework Machine Learning, Computation and Language, Information Retrieval, Machine Learning. 2 authors. pdf The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a documents collection. However, how to extract the hidden topics of the documents collection has become a crucial task for many topic model applications. …Moreover, conventional topic modeling approaches suffer from the scalability problem when the size of documents collection increases. In this paper, the Correlated Topic Model with variational Expectation-Maximization algorithm is implemented in MapReduce framework to solve the scalability problem. The proposed approach utilizes the dataset crawled from the public digital library. In addition, the full-texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. The experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has a comparable performance in terms of topic coherences with LDA implemented in MapReduce framework. |
ARA : Aggregated RAPPOR and Analysis for Centralized Differential Privacy Machine Learning, Cryptography and Security, Machine Learning. 2 authors. pdf Differential privacy(DP) has now become a standard in case of sensitive statistical data analysis. The two main approaches in DP is local and central. …Both the approaches have a clear gap in terms of data storing,amount of data to be analyzed, analysis, speed etc. Local wins on the speed. We have tested the state of the art standard RAPPOR which is a local approach and supported this gap. Our work completely focuses on that part too. Here, we propose a model which initially collects RAPPOR reports from multiple clients which are then pushed to a Tf-Idf estimation model. The Tf-Idf estimation model then estimates the reports on the basis of the occurrence of “on bit” in a particular position and its contribution to that position. Thus it generates a centralized differential privacy analysis from multiple clients. Our model successfully and efficiently analyzed the major truth value every time. |
Estimation of the spatial weighting matrix for regular lattice data – An adaptive lasso approach with cross-sectional resampling Machine Learning, Computation. 2 authors. pdf Spatial econometric research typically relies on the assumption that the spatial dependence structure is known in advance and is represented by a deterministic spatial weights matrix. Contrary to classical approaches, we investigate the estimation of sparse spatial dependence structures for regular lattice data. …In particular, an adaptive least absolute shrinkage and selection operator (lasso) is used to select and estimate the individual connections of the spatial weights matrix. To recover the spatial dependence structure, we propose cross-sectional resampling, assuming that the random process is exchangeable. The estimation procedure is based on a two-step approach to circumvent simultaneity issues that typically arise from endogenous spatial autoregressive dependencies. The two-step adaptive lasso approach with cross-sectional resampling is verified using Monte Carlo simulations. Eventually, we apply the procedure to model nitrogen dioxide (\(\mathrm{NO_2}\)) concentrations and show that estimating the spatial dependence structure contrary to using prespecified weights matrices improves the prediction accuracy considerably. |
Machine Learning (cs.LG) |
Meta-modal Information Flow: A Method for Capturing Multimodal Modular Disconnectivity in Schizophrenia Machine Learning, Machine Learning, Image and Video Processing. 16 authors. pdf Objective: Multimodal measurements of the same phenomena provide complementary information and highlight different perspectives, albeit each with their own limitations. A focus on a single modality may lead to incorrect inferences, which is especially important when a studied phenomenon is a disease. …In this paper, we introduce a method that takes advantage of multimodal data in addressing the hypotheses of disconnectivity and dysfunction within schizophrenia (SZ). Methods: We start with estimating and visualizing links within and among extracted multimodal data features using a Gaussian graphical model (GGM). We then propose a modularity-based method that can be applied to the GGM to identify links that are associated with mental illness across a multimodal data set. Through simulation and real data, we show our approach reveals important information about disease-related network disruptions that are missed with a focus on a single modality. We use functional MRI (fMRI), diffusion MRI (dMRI), and structural MRI (sMRI) to compute the fractional amplitude of low frequency fluctuations (fALFF), fractional anisotropy (FA), and gray matter (GM) concentration maps. These three modalities are analyzed using our modularity method. Results: Our results show missing links that are only captured by the cross-modal information that may play an important role in disconnectivity between the components. Conclusion: We identified multimodal (fALFF, FA and GM) disconnectivity in the default mode network area in patients with SZ, which would not have been detectable in a single modality. Significance: The proposed approach provides an important new tool for capturing information that is distributed among multiple imaging modalities. |
Multi-scale domain-adversarial multiple-instance CNN for cancer subtype classification with non-annotated histopathological images Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 10 authors. pdf We propose a new method for cancer subtype classification from histopathological images, which can automatically detect tumor-specific features in a given whole slide image (WSI). The cancer subtype should be classified by referring to a WSI, i. …e., a large size image (typically 40,000x40,000 pixels) of an entire pathological tissue slide, which consists of cancer and non-cancer portions. One difficulty for constructing cancer subtype classifiers comes from the high cost needed for annotating WSIs; without annotation, we have to construct the tumor region detector without knowing true labels. Furthermore, both global and local image features must be extracted from the WSI by changing the magnifications of the image. In addition, the image features should be stably detected against the variety/difference of staining among the hospitals/specimen. In this paper, we develop a new CNN-based cancer subtype classification method by effectively combining multiple-instance, domain adversarial, and multi-scale learning frameworks that can overcome these practical difficulties. When the proposed method was applied to malignant lymphoma subtype classifications of 196 cases collected from multiple hospitals, the classification performance was significantly better than the standard CNN or other conventional methods, and the accuracy was favorably compared to that of standard pathologists. In addition, we confirmed by immunostaining and expert pathologist’s visual inspections that the tumor regions were correctly detected. |
Development, Demonstration, and Validation of Data-driven Compact Diode Models for Circuit Simulation and Analysis Machine Learning, Machine Learning, Computational Engineering, Finance, and Science. 8 authors. pdf Compact semiconductor device models are essential for efficiently designing and analyzing large circuits. However, traditional compact model development requires a large amount of manual effort and can span many years. …Moreover, inclusion of new physics (eg, radiation effects) into an existing compact model is not trivial and may require redevelopment from scratch. Machine Learning (ML) techniques have the potential to automate and significantly speed up the development of compact models. In addition, ML provides a range of modeling options that can be used to develop hierarchies of compact models tailored to specific circuit design stages. In this paper, we explore three such options: (1) table-based interpolation, (2)Generalized Moving Least-Squares, and (3) feed-forward Deep Neural Networks, to develop compact models for a p-n junction diode. We evaluate the performance of these “data-driven” compact models by (1) comparing their voltage-current characteristics against laboratory data, and (2) building a bridge rectifier circuit using these devices, predicting the circuit’s behavior using SPICE-like circuit simulations, and then comparing these predictions against laboratory measurements of the same circuit. |
Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization Machine Learning, Machine Learning. 6 authors. pdf Interpreting the behaviors of Deep Neural Networks (usually considered as a black box) is critical especially when they are now being widely adopted over diverse aspects of human life. Taking the advancements from Explainable Artificial Intelligent, this paper proposes a novel technique called Auto DeepVis to dissect catastrophic forgetting in continual learning. …A new method to deal with catastrophic forgetting named critical freezing is also introduced upon investigating the dilemma by Auto DeepVis. Experiments on a captioning model meticulously present how catastrophic forgetting happens, particularly showing which components are forgetting or changing. The effectiveness of our technique is then assessed; and more precisely, critical freezing claims the best performance on both previous and coming tasks over baselines, proving the capability of the investigation. Our techniques could not only be supplementary to existing solutions for completely eradicating catastrophic forgetting for life-long learning but also explainable. |
TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 5 authors. pdf With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables and extracting data from tabular sub-images presents a unique set of challenges. …This includes accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine grained table structure(rows & columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach was evaluated on the publicly available ICDAR 2013 and Marmot Table datasets obtaining state of the art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection. |
Deep Learning-Based Solvability of Underdetermined Inverse Problems in Medical Imaging Machine Learning, Machine Learning, Image and Video Processing. 5 authors. pdf Recently, with the significant developments in deep learning techniques, solving underdetermined inverse problems has become one of the major concerns in the medical imaging domain. Typical examples include undersampled magnetic resonance imaging, interior tomography, and sparse-view computed tomography, where deep learning techniques have achieved excellent performances. …Although deep learning methods appear to overcome the limitations of existing mathematical methods when handling various underdetermined problems, there is a lack of rigorous mathematical foundations that would allow us to elucidate the reasons for the remarkable performance of deep learning methods. This study focuses on learning the causal relationship regarding the structure of the training data suitable for deep learning, to solve highly underdetermined inverse problems. We observe that a majority of the problems of solving underdetermined linear systems in medical imaging are highly non-linear. Furthermore, we analyze if a desired reconstruction map can be learnable from the training data and underdetermined system. |
A Block-based Generative Model for Attributed Networks Embedding Machine Learning, Machine Learning. 5 authors. pdf Attributed network embedding has attracted plenty of interests in recent years. It aims to learn task-independent, low-dimension, and continuous vectors for nodes preserving both topology and attribute information. …Most existing methods, such as GCN and its variations, mainly focus on the local information, i.e., the attributes of the neighbors. Thus, they have been well studied for assortative networks but ignored disassortative networks, which are common in real scenes. To address this issue, we propose a block-based generative model for attributed network embedding on a probability perspective inspired by the stochastic block model (SBM). Specifically, the nodes are assigned to several blocks wherein the nodes in the same block share the similar link patterns. These patterns can define assortative networks containing communities or disassortative networks with the multipartite, hub, or any hybrid structures. Concerning the attribute information, we assume that each node has a hidden embedding related to its assigned block, and then we use a neural network to characterize the nonlinearity between the node embedding and its attribute. We perform extensive experiments on real-world and synthetic attributed networks, and the experimental results show that our proposed method remarkably outperforms state-of-the-art embedding methods for both clustering and classification tasks, especially on disassortative networks. |
High-speed Autonomous Drifting with Deep Reinforcement Learning Systems and Control, Machine Learning, Robotics. 5 authors. pdf Drifting is a complicated task for autonomous vehicle control. Most traditional methods in this area are based on motion equations derived by the understanding of vehicle dynamics, which is difficult to be modeled precisely. …We propose a robust drift controller without explicit motion equations, which is based on the latest model-free deep reinforcement learning algorithm soft actor-critic. The drift control problem is formulated as a trajectory following task, where the errorbased state and reward are designed. After being trained on tracks with different levels of difficulty, our controller is capable of making the vehicle drift through various sharp corners quickly and stably in the unseen map. The proposed controller is further shown to have excellent generalization ability, which can directly handle unseen vehicle types with different physical properties, such as mass, tire friction, etc. |
MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data Machine Learning, Genomics, Machine Learning. 5 authors. pdf Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. …The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black box matching procedures that are too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when we can expect it to work well and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single cell molecular data. |
Think Locally, Act Globally: Federated Learning with Local and Global Representations Machine Learning, Machine Learning, Distributed, Parallel, and Cluster Computing. 5 authors. pdf Federated learning is an emerging research paradigm to train models on private data distributed over multiple devices. A key challenge involves keeping private all the data on each device and training a global model only by communicating parameters and updates. …Overcoming this problem relies on the global model being sufficiently compact so that the parameters can be efficiently sent over communication channels such as wireless internet. Given the recent trend towards building deeper and larger neural networks, deploying such models in federated settings on real-world tasks is becoming increasingly difficult. To this end, we propose to augment federated learning with local representation learning on each device to learn useful and compact features from raw data. As a result, the global model can be smaller since it only operates on higher-level local representations. We show that our proposed method achieves superior or competitive results when compared to traditional federated approaches on a suite of publicly available real-world datasets spanning image recognition (MNIST, CIFAR) and multimodal learning (VQA). Our choice of local representation learning also reduces the number of parameters and updates that need to be communicated to and from the global model, thereby reducing the bottleneck in terms of communication cost. Finally, we show that our local models provide flexibility in dealing with online heterogeneous data and can be easily modified to learn fair representations that obfuscate protected attributes such as race, age, and gender, a feature crucial to preserving the privacy of on-device data. |
Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations Computer Vision and Pattern Recognition, Machine Learning. 4 authors. pdf The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. …Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems. |
Social Media Attributions in the Context of Water Crisis Machine Learning, Computers and Society. 4 authors. pdf Attribution of natural disasters/collective misfortune is a widely-studied political science problem. However, such studies are typically survey-centric or rely on a handful of experts to weigh in on the matter. …In this paper, we explore how can we use social media data and an AI-driven approach to complement traditional surveys and automatically extract attribution factors. We focus on the most-recent Chennai water crisis which started off as a regional issue but rapidly escalated into a discussion topic with global importance following alarming water-crisis statistics. Specifically, we present a novel prediction task of attribution tie detection which identifies the factors held responsible for the crisis (e.g., poor city planning, exploding population etc.). On a challenging data set constructed from YouTube comments (72,098 comments posted by 43,859 users on 623 relevant videos to the crisis), we present a neural classifier to extract attribution ties that achieved a reasonable performance (Accuracy: 81.34% on attribution detection and 71.19% on attribution resolution). |
Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model Machine Learning, Machine Learning, Atmospheric and Oceanic Physics. 4 authors. pdf A novel method, based on the combination of data assimilation and machine learning is introduced. The new hybrid approach is designed for a two-fold scope: (i) emulating hidden, possibly chaotic, dynamics and (ii) predicting their future states. …The method consists in applying iteratively a data assimilation step, here an ensemble Kalman filter, and a neural network. Data assimilation is used to optimally combine a surrogate model with sparse noisy data. The output analysis is spatially complete and is used as a training set by the neural network to update the surrogate model. The two steps are then repeated iteratively. Numerical experiments have been carried out using the chaotic 40-variables Lorenz 96 model, proving both convergence and statistical skills of the proposed hybrid approach. The surrogate model shows short-term forecast skills up to two Lyapunov times, the retrieval of positive Lyapunov exponents as well as the more energetic frequencies of the power density spectrum. The sensitivity of the method to critical setup parameters is also presented: forecast skills decrease smoothly with increased observational noise but drops abruptly if less than half of the model domain is observed. The successful synergy between data assimilation and machine learning, proven here with a low-dimensional system, encourages further investigation of such hybrids with more sophisticated dynamics. |
Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints Machine Learning, Machine Learning, Artificial Intelligence. 3 authors. pdf Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and to foster exploration. An important open problem is how can an agent autonomously learn useful options when solving particular distributions of related tasks. …We investigate some of the conditions that influence optimality of options, in settings where agents have a limited time budget for learning each task and the task distribution might involve problems with different levels of similarity. We directly search for optimal option sets and show that the discovered options significantly differ depending on factors such as the available learning time budget and that the found options outperform popular option-generation heuristics. |
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification Computer Vision and Pattern Recognition, Machine Learning, Machine Learning. 2 authors. pdf In real-world scenarios, data tends to exhibit a long-tailed, imbalanced distribution. Developing algorithms to deal with such long-tailed distribution thus becomes indispensable in practical applications. …In this paper, we propose a novel self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME). Our method is inspired by the observation that deep Convolutional Neural Networks (CNNs) trained on less imbalanced subsets of the entire long-tailed distribution often yield better performances than their jointly-trained counterparts. We refer to these models as Expert Models', and the proposed LFME framework aggregates the knowledge from multiple Experts’ to learn a unified student model. Specifically, the proposed framework involves two levels of self-paced learning schedules: Self-paced Expert Selection and Self-paced Instance Selection, so that the knowledge is adaptively transferred from multiple Experts' to the Student’. In order to verify the effectiveness of our proposed framework, we conduct extensive experiments on two long-tailed benchmark classification datasets. The experimental results demonstrate that our method is able to achieve superior performances compared to the state-of-the-art methods. We also show that our method can be easily plugged into state-of-the-art long-tailed classification algorithms for further improvements.
|
How neural networks find generalizable solutions: Self-tuned annealing in deep learning Machine Learning, Data Analysis, Statistics and Probability, Statistical Mechanics, Adaptation and Self-Organizing Systems. 2 authors. pdf Despite the tremendous success of Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. …To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting. |
Topic Extraction of Crawled Documents Collection using Correlated Topic Model in MapReduce Framework Machine Learning, Computation and Language, Information Retrieval, Machine Learning. 2 authors. pdf The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a documents collection. However, how to extract the hidden topics of the documents collection has become a crucial task for many topic model applications. …Moreover, conventional topic modeling approaches suffer from the scalability problem when the size of documents collection increases. In this paper, the Correlated Topic Model with variational Expectation-Maximization algorithm is implemented in MapReduce framework to solve the scalability problem. The proposed approach utilizes the dataset crawled from the public digital library. In addition, the full-texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. The experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has a comparable performance in terms of topic coherences with LDA implemented in MapReduce framework. |
ARA : Aggregated RAPPOR and Analysis for Centralized Differential Privacy Machine Learning, Cryptography and Security, Machine Learning. 2 authors. pdf Differential privacy(DP) has now become a standard in case of sensitive statistical data analysis. The two main approaches in DP is local and central. …Both the approaches have a clear gap in terms of data storing,amount of data to be analyzed, analysis, speed etc. Local wins on the speed. We have tested the state of the art standard RAPPOR which is a local approach and supported this gap. Our work completely focuses on that part too. Here, we propose a model which initially collects RAPPOR reports from multiple clients which are then pushed to a Tf-Idf estimation model. The Tf-Idf estimation model then estimates the reports on the basis of the occurrence of “on bit” in a particular position and its contribution to that position. Thus it generates a centralized differential privacy analysis from multiple clients. Our model successfully and efficiently analyzed the major truth value every time. |
Statistical Finance (q-fin.ST) |
Disentangling shock diffusion on complex networks: Identification through graph planarity Statistical Finance. 3 authors. pdf Large scale networks delineating collective dynamics often exhibit cascading failures across nodes leading to a system-wide collapse. Prominent examples of such phenomena would include collapse on financial and economic networks. …Intertwined nature of the dynamics of nodes in such network makes it difficult to disentangle the source and destination of a shock that percolates through the network, a property known as reflexivity. In this article, a novel methodology is proposed which combines vector autoregression model with an unique identification restrictions obtained from the topological structure of the network to uniquely characterize cascades. In particular, we show that planarity of the network allows us to statistically estimate a dynamical process consistent with the observed network and thereby uniquely identify a path for shock propagation from any chosen epicenter to all other nodes in the network. We analyze the distress propagation mechanism in closed loops giving rise to a detailed picture of the effect of feedback loops in transmitting shocks. We show usefulness and applications of the algorithm in two networks with dynamics at different time-scales: worldwide GDP growth network and stock network. In both cases, we observe that the model predicts the impact of the shocks emanating from the US would be concentrated within the cluster of developed countries and the developing countries show very muted response, which is consistent with empirical observations over the past decade. |
The tables below show abstracts organized by category with hyperlinks back to the arXiv site.
Computer Vision and Pattern Recognition (cs.CV) |
Meta-modal Information Flow: A Method for Capturing Multimodal Modular Disconnectivity in Schizophrenia Machine Learning, Machine Learning, Image and Video Processing. 16 authors. pdf Objective: Multimodal measurements of the same phenomena provide complementary information and highlight different perspectives, albeit each with their own limitations. A focus on a single modality may lead to incorrect inferences, which is especially important when a studied phenomenon is a disease. …In this paper, we introduce a method that takes advantage of multimodal data in addressing the hypotheses of disconnectivity and dysfunction within schizophrenia (SZ). Methods: We start with estimating and visualizing links within and among extracted multimodal data features using a Gaussian graphical model (GGM). We then propose a modularity-based method that can be applied to the GGM to identify links that are associated with mental illness across a multimodal data set. Through simulation and real data, we show our approach reveals important information about disease-related network disruptions that are missed with a focus on a single modality. We use functional MRI (fMRI), diffusion MRI (dMRI), and structural MRI (sMRI) to compute the fractional amplitude of low frequency fluctuations (fALFF), fractional anisotropy (FA), and gray matter (GM) concentration maps. These three modalities are analyzed using our modularity method. Results: Our results show missing links that are only captured by the cross-modal information that may play an important role in disconnectivity between the components. Conclusion: We identified multimodal (fALFF, FA and GM) disconnectivity in the default mode network area in patients with SZ, which would not have been detectable in a single modality. Significance: The proposed approach provides an important new tool for capturing information that is distributed among multiple imaging modalities. |
Multi-scale domain-adversarial multiple-instance CNN for cancer subtype classification with non-annotated histopathological images Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 10 authors. pdf We propose a new method for cancer subtype classification from histopathological images, which can automatically detect tumor-specific features in a given whole slide image (WSI). The cancer subtype should be classified by referring to a WSI, i. …e., a large size image (typically 40,000x40,000 pixels) of an entire pathological tissue slide, which consists of cancer and non-cancer portions. One difficulty for constructing cancer subtype classifiers comes from the high cost needed for annotating WSIs; without annotation, we have to construct the tumor region detector without knowing true labels. Furthermore, both global and local image features must be extracted from the WSI by changing the magnifications of the image. In addition, the image features should be stably detected against the variety/difference of staining among the hospitals/specimen. In this paper, we develop a new CNN-based cancer subtype classification method by effectively combining multiple-instance, domain adversarial, and multi-scale learning frameworks that can overcome these practical difficulties. When the proposed method was applied to malignant lymphoma subtype classifications of 196 cases collected from multiple hospitals, the classification performance was significantly better than the standard CNN or other conventional methods, and the accuracy was favorably compared to that of standard pathologists. In addition, we confirmed by immunostaining and expert pathologist’s visual inspections that the tumor regions were correctly detected. |
Development, Demonstration, and Validation of Data-driven Compact Diode Models for Circuit Simulation and Analysis Machine Learning, Machine Learning, Computational Engineering, Finance, and Science. 8 authors. pdf Compact semiconductor device models are essential for efficiently designing and analyzing large circuits. However, traditional compact model development requires a large amount of manual effort and can span many years. …Moreover, inclusion of new physics (eg, radiation effects) into an existing compact model is not trivial and may require redevelopment from scratch. Machine Learning (ML) techniques have the potential to automate and significantly speed up the development of compact models. In addition, ML provides a range of modeling options that can be used to develop hierarchies of compact models tailored to specific circuit design stages. In this paper, we explore three such options: (1) table-based interpolation, (2)Generalized Moving Least-Squares, and (3) feed-forward Deep Neural Networks, to develop compact models for a p-n junction diode. We evaluate the performance of these “data-driven” compact models by (1) comparing their voltage-current characteristics against laboratory data, and (2) building a bridge rectifier circuit using these devices, predicting the circuit’s behavior using SPICE-like circuit simulations, and then comparing these predictions against laboratory measurements of the same circuit. |
Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization Machine Learning, Machine Learning. 6 authors. pdf Interpreting the behaviors of Deep Neural Networks (usually considered as a black box) is critical especially when they are now being widely adopted over diverse aspects of human life. Taking the advancements from Explainable Artificial Intelligent, this paper proposes a novel technique called Auto DeepVis to dissect catastrophic forgetting in continual learning. …A new method to deal with catastrophic forgetting named critical freezing is also introduced upon investigating the dilemma by Auto DeepVis. Experiments on a captioning model meticulously present how catastrophic forgetting happens, particularly showing which components are forgetting or changing. The effectiveness of our technique is then assessed; and more precisely, critical freezing claims the best performance on both previous and coming tasks over baselines, proving the capability of the investigation. Our techniques could not only be supplementary to existing solutions for completely eradicating catastrophic forgetting for life-long learning but also explainable. |
TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing. 5 authors. pdf With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables and extracting data from tabular sub-images presents a unique set of challenges. …This includes accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine grained table structure(rows & columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach was evaluated on the publicly available ICDAR 2013 and Marmot Table datasets obtaining state of the art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection. |
A Block-based Generative Model for Attributed Networks Embedding Machine Learning, Machine Learning. 5 authors. pdf Attributed network embedding has attracted plenty of interests in recent years. It aims to learn task-independent, low-dimension, and continuous vectors for nodes preserving both topology and attribute information. …Most existing methods, such as GCN and its variations, mainly focus on the local information, i.e., the attributes of the neighbors. Thus, they have been well studied for assortative networks but ignored disassortative networks, which are common in real scenes. To address this issue, we propose a block-based generative model for attributed network embedding on a probability perspective inspired by the stochastic block model (SBM). Specifically, the nodes are assigned to several blocks wherein the nodes in the same block share the similar link patterns. These patterns can define assortative networks containing communities or disassortative networks with the multipartite, hub, or any hybrid structures. Concerning the attribute information, we assume that each node has a hidden embedding related to its assigned block, and then we use a neural network to characterize the nonlinearity between the node embedding and its attribute. We perform extensive experiments on real-world and synthetic attributed networks, and the experimental results show that our proposed method remarkably outperforms state-of-the-art embedding methods for both clustering and classification tasks, especially on disassortative networks. |
High-speed Autonomous Drifting with Deep Reinforcement Learning Systems and Control, Machine Learning, Robotics. 5 authors. pdf Drifting is a complicated task for autonomous vehicle control. Most traditional methods in this area are based on motion equations derived by the understanding of vehicle dynamics, which is difficult to be modeled precisely. …We propose a robust drift controller without explicit motion equations, which is based on the latest model-free deep reinforcement learning algorithm soft actor-critic. The drift control problem is formulated as a trajectory following task, where the errorbased state and reward are designed. After being trained on tracks with different levels of difficulty, our controller is capable of making the vehicle drift through various sharp corners quickly and stably in the unseen map. The proposed controller is further shown to have excellent generalization ability, which can directly handle unseen vehicle types with different physical properties, such as mass, tire friction, etc. |
Deep Snake for Real-Time Instance Segmentation Computer Vision and Pattern Recognition. 5 authors. pdf This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation. Unlike some recent methods that directly regress the coordinates of the object boundary points from an image, deep snake uses a neural network to iteratively deform an initial contour to the object boundary, which implements the classic idea of snake algorithms with a learning-based approach. …For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution. Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in initial object localization. Experiments show that the proposed approach achieves state-of-the-art performances on the Cityscapes, Kins and Sbd datasets while being efficient for real-time instance segmentation, 32.3 fps for 512$$512 images on a 1080Ti GPU. The code will be available at https://github.com/zju3dv/snake/. |
Thermal coupling and effect of subharmonic synchronization in a system of two VO2 based oscillators Neural and Evolutionary Computing, Emerging Technologies. 5 authors. pdf We explore a prototype of an oscillatory neural network (ONN) based on vanadium dioxide switching devices. The model system under study represents two oscillators based on thermally coupled VO2 switches. …Numerical simulation shows that the effective action radius RTC of coupling depends both on the total energy released during switching and on the average power. It is experimentally and numerically proved that the temperature change dT commences almost synchronously with the released power peak and T-coupling reveals itself up to a frequency of about 10 kHz. For the studied switching structure configuration, the RTC value varies over a wide range from 4 to 45 mkm, depending on the external circuit capacitance C and resistance Ri, but the variation of Ri is more promising from the practical viewpoint. In the case of a “weak” coupling, synchronization is accompanied by attraction effect and decrease of the main spectra harmonics width. In the case of a “strong” coupling, the number of effects increases, synchronization can occur on subharmonics resulting in multilevel stable synchronization of two oscillators. An advanced algorithm for synchronization efficiency and subharmonic ratio calculation is proposed. It is shown that of the two oscillators the leading one is that with a higher main frequency, and, in addition, the frequency stabilization effect is observed. Also, in the case of a strong thermal coupling, the limit of the supply current parameters, for which the oscillations exist, expands by ~ 10 %. The obtained results have a universal character and open up a new kind of coupling in ONNs, namely, T-coupling, which allows for easy transition from 2D to 3D integration. The effect of subharmonic synchronization hold promise for application in classification and pattern recognition. |
Machine Learning (cs.LG) |
The SmartSHARK Ecosystem for Software Repository Mining Software Engineering. 5 authors. pdf Software repository mining is the foundation for many empirical software engineering studies. The collection and analysis of detailed data can be challenging, especially if data shall be shared to enable replicable research and open science practices. …SmartSHARK is an ecosystem that supports replicable and reproducible research based on software repository mining. |
Think Locally, Act Globally: Federated Learning with Local and Global Representations Machine Learning, Machine Learning, Distributed, Parallel, and Cluster Computing. 5 authors. pdf Federated learning is an emerging research paradigm to train models on private data distributed over multiple devices. A key challenge involves keeping private all the data on each device and training a global model only by communicating parameters and updates. …Overcoming this problem relies on the global model being sufficiently compact so that the parameters can be efficiently sent over communication channels such as wireless internet. Given the recent trend towards building deeper and larger neural networks, deploying such models in federated settings on real-world tasks is becoming increasingly difficult. To this end, we propose to augment federated learning with local representation learning on each device to learn useful and compact features from raw data. As a result, the global model can be smaller since it only operates on higher-level local representations. We show that our proposed method achieves superior or competitive results when compared to traditional federated approaches on a suite of publicly available real-world datasets spanning image recognition (MNIST, CIFAR) and multimodal learning (VQA). Our choice of local representation learning also reduces the number of parameters and updates that need to be communicated to and from the global model, thereby reducing the bottleneck in terms of communication cost. Finally, we show that our local models provide flexibility in dealing with online heterogeneous data and can be easily modified to learn fair representations that obfuscate protected attributes such as race, age, and gender, a feature crucial to preserving the privacy of on-device data. |
Frequency Fitness Assignment: Making Optimization Algorithms Invariant under Bijective Transformations of the Objective Function Combinatorics, Artificial Intelligence, Neural and Evolutionary Computing. 4 authors. pdf Under Frequency Fitness Assignment (FFA), the fitness corresponding to an objective value is its encounter frequency in fitness assignment steps and is subject to minimization. FFA renders optimization processes invariant under bijective transformations of the objective function. …This is the strongest invariance property of any optimization procedure to our knowledge. On TwoMax, Jump, and Trap functions of scale s, a (1+1)-EA with standard mutation at rate 1/s can have expected running times exponential in s. In our experiments, a (1+1)-FEA, the same algorithm but using FFA, exhibits mean running times quadratic in s. Since Jump and Trap are bijective transformations of OneMax, it behaves identical on all three. On the LeadingOnes and Plateau problems, it seems to be slower than the (1+1)-EA by a factor linear in s. The (1+1)-FEA performs much better than the (1+1)-EA on W-Model and MaxSat instances. Due to the bijection invariance, the behavior of an optimization algorithm using FFA does not change when the objective values are encrypted. We verify this by applying the Md5 checksum computation as transformation to some of the above problems and yield the same behaviors. Finally, FFA can improve the performance of a Memetic Algorithm for Job Shop Scheduling. |
A Rule-Based Model for Victim Prediction Computers and Society, Artificial Intelligence. 4 authors. pdf In this paper, we proposed a novel automated model, called Vulnerability Index for Population at Risk (VIPAR) scores, to identify rare populations for their future shooting victimizations. Likewise, the focused deterrence approach identifies vulnerable individuals and offers certain types of treatments (e. …g., outreach services) to prevent violence in communities. The proposed rule-based engine model is the first AI-based model for victim prediction. This paper aims to compare the list of focused deterrence strategy with the VIPAR score list regarding their predictive power for the future shooting victimizations. Drawing on the criminological studies, the model uses age, past criminal history, and peer influence as the main predictors of future violence. Social network analysis is employed to measure the influence of peers on the outcome variable. The model also uses logistic regression analysis to verify the variable selections. Our empirical results show that VIPAR scores predict 25.8% of future shooting victims and 32.2% of future shooting suspects, whereas focused deterrence list predicts 13% of future shooting victims and 9.4% of future shooting suspects. The model outperforms the intelligence list of focused deterrence policies in predicting the future fatal and non-fatal shootings. Furthermore, we discuss the concerns about the presumption of innocence right. |
Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations Computer Vision and Pattern Recognition, Machine Learning. 4 authors. pdf The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. …Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems. |
Social Media Attributions in the Context of Water Crisis Machine Learning, Computers and Society. 4 authors. pdf Attribution of natural disasters/collective misfortune is a widely-studied political science problem. However, such studies are typically survey-centric or rely on a handful of experts to weigh in on the matter. …In this paper, we explore how can we use social media data and an AI-driven approach to complement traditional surveys and automatically extract attribution factors. We focus on the most-recent Chennai water crisis which started off as a regional issue but rapidly escalated into a discussion topic with global importance following alarming water-crisis statistics. Specifically, we present a novel prediction task of attribution tie detection which identifies the factors held responsible for the crisis (e.g., poor city planning, exploding population etc.). On a challenging data set constructed from YouTube comments (72,098 comments posted by 43,859 users on 623 relevant videos to the crisis), we present a neural classifier to extract attribution ties that achieved a reasonable performance (Accuracy: 81.34% on attribution detection and 71.19% on attribution resolution). |
Software Engineering (cs.SE) |
Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints Machine Learning, Machine Learning, Artificial Intelligence. 3 authors. pdf Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and to foster exploration. An important open problem is how can an agent autonomously learn useful options when solving particular distributions of related tasks. …We investigate some of the conditions that influence optimality of options, in settings where agents have a limited time budget for learning each task and the task distribution might involve problems with different levels of similarity. We directly search for optimal option sets and show that the discovered options significantly differ depending on factors such as the available learning time budget and that the found options outperform popular option-generation heuristics. |
Learning Reusable Options for Multi-Task Reinforcement Learning Artificial Intelligence. 3 authors. pdf Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available. …For many practical applications, it might be unfeasible for an agent to learn how to solve a task from scratch, given that it is generally a computationally expensive process; however, prior experience could be leveraged to make these problems tractable in practice. In this paper, we propose a framework for exploiting existing experience by learning reusable options. We show that after an agent learns policies for solving a small number of problems, we are able to use the trajectories generated from those policies to learn reusable options that allow an agent to quickly learn how to solve novel and related problems. |
Few-shot Learning with Multi-scale Self-supervision Computer Vision and Pattern Recognition. 3 authors. pdf Learning concepts from the limited number of datapoints is a challenging task usually addressed by the so-called one- or few-shot learning. Recently, an application of second-order pooling in few-shot learning demonstrated its superior performance due to the aggregation step handling varying image resolutions without the need of modifying CNNs to fit to specific image sizes, yet capturing highly descriptive co-occurrences. …However, using a single resolution per image (even if the resolution varies across a dataset) is suboptimal as the importance of image contents varies across the coarse-to-fine levels depending on the object and its class label e. g., generic objects and scenes rely on their global appearance while fine-grained objects rely more on their localized texture patterns. Multi-scale representations are popular in image deblurring, super-resolution and image recognition but they have not been investigated in few-shot learning due to its relational nature complicating the use of standard techniques. In this paper, we propose a novel multi-scale relation network based on the properties of second-order pooling to estimate image relations in few-shot setting. To optimize the model, we leverage a scale selector to re-weight scale-wise representations based on their second-order features. Furthermore, we propose to a apply self-supervised scale prediction. Specifically, we leverage an extra discriminator to predict the scale labels and the scale discrepancy between pairs of images. Our model achieves state-of-the-art results on standard few-shot learning datasets. |
Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification Computer Vision and Pattern Recognition. 3 authors. pdf Person re-identification (re-ID) aims at identifying the same persons’ images across different cameras. However, domain diversities between different datasets pose an evident challenge for adapting the re-ID model trained on one dataset to another one. …State-of-the-art unsupervised domain adaptation methods for person re-ID transferred the learned knowledge from the source domain by optimizing with pseudo labels created by clustering algorithms on the target domain. Although they achieved state-of-the-art performances, the inevitable label noise caused by the clustering procedure was ignored. Such noisy pseudo labels substantially hinders the model’s capability on further improving feature representations on the target domain. In order to mitigate the effects of noisy pseudo labels, we propose to softly refine the pseudo labels in the target domain by proposing an unsupervised framework, Mutual Mean-Teaching (MMT), to learn better features from the target domain via off-line refined hard pseudo labels and on-line refined soft pseudo labels in an alternative training manner. In addition, the common practice is to adopt both the classification loss and the triplet loss jointly for achieving optimal performances in person re-ID models. However, conventional triplet loss cannot work with softly refined labels. To solve this problem, a novel soft softmax-triplet loss is proposed to support learning with soft pseudo triplet labels for achieving the optimal domain adaptation performance. The proposed MMT framework achieves considerable improvements of 14.4%, 18.2%, 13.1% and 16.4% mAP on Market-to-Duke, Duke-to-Market, Market-to-MSMT and Duke-to-MSMT unsupervised domain adaptation tasks. Code is available at https://github.com/yxgeee/MMT. |
Deceiving Image-to-Image Translation Networks for Autonomous Driving with Adversarial Perturbations Computer Vision and Pattern Recognition, Image and Video Processing, Robotics. 3 authors. pdf Deep neural networks (DNNs) have achieved impressive performance on handling computer vision problems, however, it has been found that DNNs are vulnerable to adversarial examples. For such reason, adversarial perturbations have been recently studied in several respects. …However, most previous works have focused on image classification tasks, and it has never been studied regarding adversarial perturbations on Image-to-image (Im2Im) translation tasks, showing great success in handling paired and/or unpaired mapping problems in the field of autonomous driving and robotics. This paper examines different types of adversarial perturbations that can fool Im2Im frameworks for autonomous driving purpose. We propose both quasi-physical and digital adversarial perturbations that can make Im2Im models yield unexpected results. We then empirically analyze these perturbations and show that they generalize well under both paired for image synthesis and unpaired settings for style transfer. We also validate that there exist some perturbation thresholds over which the Im2Im mapping is disrupted or impossible. The existence of these perturbations reveals that there exist crucial weaknesses in Im2Im models. Lastly, we show that our methods illustrate how perturbations affect the quality of outputs, pioneering the improvement of the robustness of current SOTA networks for autonomous driving. |
Artificial Intelligence (cs.AI) |
Why and How to Balance Alignment and Diversity of Requirements Engineering Practices in Automotive Software Engineering. 3 authors. pdf In large-scale automotive companies, various requirements engineering (RE) practices are used across teams. RE practices manifest in Requirements Information Models (RIM) that define what concepts and information should be captured for requirements. …Collaboration of practitioners from different parts of an organization is required to define a suitable RIM that balances support for diverse practices in individual teams with the alignment needed for a shared view and team support on system level. There exists no guidance for this challenging task. This paper presents a mixed methods study to examine the role of RIMs in balancing alignment and diversity of RE practices in four automotive companies. Our analysis is based on data from systems engineering tools, 11 semi-structured interviews, and a survey to validate findings and suggestions. We found that balancing alignment and diversity of RE practices is important to consider when defining RIMs. We further investigated enablers for this balance and actions that practitioners take to achieve it. From these factors, we derived and evaluated recommendations for managing RIMs in practice that take into account the lifecycle of requirements and allow for diverse practices across sub-disciplines in early development, while enforcing alignment of requirements that are close to release. |
Using CEF Digital Service Infrastructures in the Smart4Health Project for the Exchange of Electronic Health Records Software Engineering. 3 authors. pdf The Smart4Health (S4H) software application will empower EU citizens to manage, analyze, and exchange their aggregated electronic health data. In order to provide such a service, basic features are needed to ensure usability, reliability, and trust. …The CEF building blocks implement such functionalities while complying with EU regulations. This report examines the current status and applicability of the CEF building blocks for the envisioned S4H software application. The major findings of the report are that (1) most CEF building blocks are currently not ready to be applied without further implementation efforts and (2) the S4H-specific use cases and user needs, to which the single CEF building blocks correspond, must be clarified. Open questions raised in this report need to be answered before a clear analysis can be made. Moreover, the functionalities of CEF building blocks aim at the matured product, while for the first version of the S4H software application basic implementations suffice. Still, concepts need to be elaborated of how the sample implementation of CEF building blocks or suitable alternative applications can be included into the system at a later point. |
Neural and Evolutionary Computing (cs.NE) |
Cross-Dataset Design Discussion Mining Software Engineering. 3 authors. pdf Being able to identify software discussions that are primarily about design, which we call design mining, can improve documentation and maintenance of software systems. Existing design mining approaches have good classification performance using natural language processing (NLP) techniques, but the conclusion stability of these approaches is generally poor. …A classifier trained on a given dataset of software projects has so far not worked well on different artifacts or different datasets. In this study, we replicate and synthesize these earlier results in a meta-analysis. We then apply recent work in transfer learning for NLP to the problem of design mining. However, for our datasets, these deep transfer learning classifiers perform no better than less complex classifiers. We conclude by discussing some reasons behind the transfer learning approach to design mining. |
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification Computer Vision and Pattern Recognition, Machine Learning, Machine Learning. 2 authors. pdf In real-world scenarios, data tends to exhibit a long-tailed, imbalanced distribution. Developing algorithms to deal with such long-tailed distribution thus becomes indispensable in practical applications. …In this paper, we propose a novel self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME). Our method is inspired by the observation that deep Convolutional Neural Networks (CNNs) trained on less imbalanced subsets of the entire long-tailed distribution often yield better performances than their jointly-trained counterparts. We refer to these models as Expert Models', and the proposed LFME framework aggregates the knowledge from multiple Experts’ to learn a unified student model. Specifically, the proposed framework involves two levels of self-paced learning schedules: Self-paced Expert Selection and Self-paced Instance Selection, so that the knowledge is adaptively transferred from multiple Experts' to the Student’. In order to verify the effectiveness of our proposed framework, we conduct extensive experiments on two long-tailed benchmark classification datasets. The experimental results demonstrate that our method is able to achieve superior performances compared to the state-of-the-art methods. We also show that our method can be easily plugged into state-of-the-art long-tailed classification algorithms for further improvements.
|
Computers and Society (cs.CY) |
A Hybrid Approach to Temporal Pattern Matching Computer Vision and Pattern Recognition, Data Structures and Algorithms. 2 authors. pdf The primary objective of graph pattern matching is to find all appearances of an input graph pattern query in a large data graph. Such appearances are called matches. …In this paper, we are interested in finding matches of interaction patterns in temporal graphs. To this end, we propose a hybrid approach that achieves effective filtering of potential matches based both on structure and time. Our approach exploits a graph representation where edges are ordered by time. We present experiments with real datasets that illustrate the efficiency of our approach. |
Cryptography and Security (cs.CR) |
Topic Extraction of Crawled Documents Collection using Correlated Topic Model in MapReduce Framework Machine Learning, Computation and Language, Information Retrieval, Machine Learning. 2 authors. pdf The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a documents collection. However, how to extract the hidden topics of the documents collection has become a crucial task for many topic model applications. …Moreover, conventional topic modeling approaches suffer from the scalability problem when the size of documents collection increases. In this paper, the Correlated Topic Model with variational Expectation-Maximization algorithm is implemented in MapReduce framework to solve the scalability problem. The proposed approach utilizes the dataset crawled from the public digital library. In addition, the full-texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. The experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has a comparable performance in terms of topic coherences with LDA implemented in MapReduce framework. |
Data Structures and Algorithms (cs.DS) |
ARA : Aggregated RAPPOR and Analysis for Centralized Differential Privacy Machine Learning, Cryptography and Security, Machine Learning. 2 authors. pdf Differential privacy(DP) has now become a standard in case of sensitive statistical data analysis. The two main approaches in DP is local and central. …Both the approaches have a clear gap in terms of data storing,amount of data to be analyzed, analysis, speed etc. Local wins on the speed. We have tested the state of the art standard RAPPOR which is a local approach and supported this gap. Our work completely focuses on that part too. Here, we propose a model which initially collects RAPPOR reports from multiple clients which are then pushed to a Tf-Idf estimation model. The Tf-Idf estimation model then estimates the reports on the basis of the occurrence of “on bit” in a particular position and its contribution to that position. Thus it generates a centralized differential privacy analysis from multiple clients. Our model successfully and efficiently analyzed the major truth value every time. |
Information Retrieval (cs.IR) |
Runtime Verification of Linux Kernel Security Module Operating Systems, Software Engineering. 2 authors. pdf The Linux kernel is one of the most important Free/Libre Open Source Software (FLOSS) projects. It is installed on billions of devices all over the world, which process various sensitive, confidential or simply private data. …It is crucial to establish and prove its security properties. This work-in-progress paper presents a method to verify the Linux kernel for conformance with an abstract security policy model written in the Event-B specification language. The method is based on system call tracing and aims at checking that the results of system call execution do not lead to accesses that violate security policy requirements. As a basis for it, we use an additional Event-B specification of the Linux system call interface that is formally proved to satisfy all the requirements of the security policy model. In order to perform the conformance checks we use it to reproduce intercepted system calls and verify accesses. |
Robotics (cs.RO) |
Facial Emotions Recognition using Convolutional Neural Net Computer Vision and Pattern Recognition. 1 authors. pdf Human beings displays their emotions using facial expressions. For human it is very easy to recognize those emotions but for computer it is very challenging. …Facial expressions vary from person to person. Brightness, contrast and resolution of every random image is different. This is why recognizing facial expression is very difficult. The facial expression recognition is an active research area. In this project, we worked on recognition of seven basic human emotions. These emotions are angry, disgust, fear, happy, sad, surprise and neutral. Every image was first passed through face detection algorithm to include it in train dataset. As CNN requires large amount of data so we duplicated our data using various filter on each image. The system is trained using CNN architecture. Preprocessed images of size 80*100 is passed as input to the first layer of CNN. Three convolutional layers were used, each of which was followed by a pooling layer and then three dense layers. The dropout rate for dense layer was 20%. The model was trained by combination of two publicly available datasets JAFFED and KDEF. 90% of the data was used for training while 10% was used for testing. We achieved maximum accuracy of 78% using combined dataset. |
Methodology (stat.ME) |
MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data Machine Learning, Genomics, Machine Learning. 5 authors. pdf Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. …The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black box matching procedures that are too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when we can expect it to work well and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single cell molecular data. |
A Spectral Hidden Markov Model for Nonstationary Oscillatory Processes Methodology, Applications. 4 authors. pdf We propose to model time-varying periodic and oscillatory processes by means of a hidden Markov model where the states are defined through the spectral properties of a periodic regime. The number of states is unknown along with the relevant periodicities, the role and number of which may vary across states. …We address this inference problem by a Bayesian nonparametric hidden Markov model assuming a sticky hierarchical Dirichlet process for the switching dynamics between different states while the periodicities characterizing each state are explored by means of a trans-dimensional Markov chain Monte Carlo sampling step. We develop the full Bayesian inference algorithm and illustrate the use of our proposed methodology for different simulation studies as well as an application related to respiratory research which focuses on the detection of apnea instances in human breathing traces. |
Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model Machine Learning, Machine Learning, Atmospheric and Oceanic Physics. 4 authors. pdf A novel method, based on the combination of data assimilation and machine learning is introduced. The new hybrid approach is designed for a two-fold scope: (i) emulating hidden, possibly chaotic, dynamics and (ii) predicting their future states. …The method consists in applying iteratively a data assimilation step, here an ensemble Kalman filter, and a neural network. Data assimilation is used to optimally combine a surrogate model with sparse noisy data. The output analysis is spatially complete and is used as a training set by the neural network to update the surrogate model. The two steps are then repeated iteratively. Numerical experiments have been carried out using the chaotic 40-variables Lorenz 96 model, proving both convergence and statistical skills of the proposed hybrid approach. The surrogate model shows short-term forecast skills up to two Lyapunov times, the retrieval of positive Lyapunov exponents as well as the more energetic frequencies of the power density spectrum. The sensitivity of the method to critical setup parameters is also presented: forecast skills decrease smoothly with increased observational noise but drops abruptly if less than half of the model domain is observed. The successful synergy between data assimilation and machine learning, proven here with a low-dimensional system, encourages further investigation of such hybrids with more sophisticated dynamics. |
Discrete event simulation of point processes: A computational complexity analysis on sparse graphs Applications, Computation. 3 authors. pdf We derive new discrete event simulation algorithms for marked time point processes. The main idea is to couple a special structure, namely the associated local independence graph, as defined by Didelez arXiv:0710. …5874, with the activity tracking algorithm [muzy, 2019] for achieving high performance asynchronous simulations. With respect to classical algorithm, this allows reducing drastically the computational complexity, especially when the graph is sparse. [muzy, 2019] A. Muzy. 2019. Exploiting activity for the modeling and simulation of dynamics and learning processes in hierarchical (neurocognitive) systems. (Submitted to) Magazine of Computing in Science & Engineering (2019) |
Computation (stat.CO) |
Bayesian inference of Stochastic reaction networks using Multifidelity Sequential Tempered Markov Chain Monte Carlo Computation. 3 authors. pdf Stochastic reaction network models are often used to explain and predict the dynamics of gene regulation in single cells. These models usually involve several parameters, such as the kinetic rates of chemical reactions, that are not directly measurable and must be inferred from experimental data. …Bayesian inference provides a rigorous probabilistic framework for identifying these parameters by finding a posterior parameter distribution that captures their uncertainty. Traditional computational methods for solving inference problems such as Markov Chain Monte Carlo methods based on classical Metropolis-Hastings algorithm involve numerous serial evaluations of the likelihood function, which in turn requires expensive forward solutions of the chemical master equation (CME). We propose an alternative approach based on a multifidelity extension of the Sequential Tempered Markov Chain Monte Carlo (ST-MCMC) sampler. This algorithm is built upon Sequential Monte Carlo and solves the Bayesian inference problem by decomposing it into a sequence of efficiently solved subproblems that gradually increase model fidelity and the influence of the observed data. We reformulate the finite state projection (FSP) algorithm, a well-known method for solving the CME, to produce a hierarchy of surrogate master equations to be used in this multifidelity scheme. To determine the appropriate fidelity, we introduce a novel information-theoretic criteria that seeks to extract the most information about the ultimate Bayesian posterior from each model in the hierarchy without inducing significant bias. This novel sampling scheme is tested with high performance computing resources using biologically relevant problems. |
Closed testing with Globaltest with applications on metabolomics data Methodology. 3 authors. pdf We derive a shortcut for closed testing with Globaltest, which is powerful for pathway analysis, especially in the presence of many weak features. The shortcut strongly controls the family-wise error rate over all possible feature sets. …We present our shortcut in two ways: the single-step shortcut and the iterative shortcut by embedding the single-step shortcut in branch and bound algorithm. The iterative shortcut is asymptotically equivalent to the full closed testing procedure but can be stopped at any point without sacrificing family-wise error rate control. The shortcut improves the scale of the full closed testing from 20 around features before to hundreds. It is post hoc, i.e. allowing feature sets to be chosen after seeing the data, without compromising error rate control. The procedure is illustrated on metabolomics data. |
Permutation testing in high-dimensional linear models: an empirical investigation Methodology. 3 authors. pdf Permutation testing in linear models, where the number of nuisance coefficients is smaller than the sample size, is a well-studied topic. The common approach of such tests is to permute residuals after regressing on the nuisance covariates. …Permutation-based tests are valuable in particular because they can be highly robust to violations of the standard linear model, such as non-normality and heteroscedasticity. Moreover, in some cases they can be combined with existing, powerful permutation-based multiple testing methods. Here, we propose permutation tests for models where the number of nuisance coefficients exceeds the sample size. The performance of the novel tests is investigated with simulations. In a wide range of simulation scenarios our proposed permutation methods provided appropriate type I error rate control, unlike some competing tests, while having good power. |
Machine Learning (stat.ML) |
Scalable Estimation and Inference with Large-scale or Online Survival Data Methodology. 3 authors. pdf With the rapid development of data collection and aggregation technologies in many scientific disciplines, it is becoming increasingly ubiquitous to conduct large-scale or online regression to analyze real-world data and unveil real-world evidence. In such applications, it is often numerically challenging or sometimes infeasible to store the entire dataset in memory. …Consequently, classical batch-based estimation methods that involve the entire dataset are less attractive or no longer applicable. Instead, recursive estimation methods such as stochastic gradient descent that process data points sequentially are more appealing, exhibiting both numerical convenience and memory efficiency. In this paper, for scalable estimation of large or online survival data, we propose a stochastic gradient descent method which recursively updates the estimates in an online manner as data points arrive sequentially in streams. Theoretical results such as asymptotic normality and estimation efficiency are established to justify its validity. Furthermore, to quantify the uncertainty associated with the proposed stochastic gradient descent estimator and facilitate statistical inference, we develop a scalable resampling strategy that specifically caters to the large-scale or online setting. Simulation studies and a real data application are also provided to assess its performance and illustrate its practical utility. |
Estimation of the spatial weighting matrix for regular lattice data – An adaptive lasso approach with cross-sectional resampling Machine Learning, Computation. 2 authors. pdf Spatial econometric research typically relies on the assumption that the spatial dependence structure is known in advance and is represented by a deterministic spatial weights matrix. Contrary to classical approaches, we investigate the estimation of sparse spatial dependence structures for regular lattice data. …In particular, an adaptive least absolute shrinkage and selection operator (lasso) is used to select and estimate the individual connections of the spatial weights matrix. To recover the spatial dependence structure, we propose cross-sectional resampling, assuming that the random process is exchangeable. The estimation procedure is based on a two-step approach to circumvent simultaneity issues that typically arise from endogenous spatial autoregressive dependencies. The two-step adaptive lasso approach with cross-sectional resampling is verified using Monte Carlo simulations. Eventually, we apply the procedure to model nitrogen dioxide (\(\mathrm{NO_2}\)) concentrations and show that estimating the spatial dependence structure contrary to using prespecified weights matrices improves the prediction accuracy considerably. |
Audio and Speech Processing (eess.AS) |
Audio-visual Recognition of Overlapped speech for the LRS2 dataset Audio and Speech Processing, Sound. 10 authors. pdf Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. …Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs i.e. end-to-end and hybrid of AVSR systems are investigated. Second, purposefully designed modality fusion gates are used to robustly integrate the audio and visual features. Third, in contrast to a traditional pipelined architecture containing explicit speech separation and recognition components, a streamlined and integrated AVSR system optimized consistently using the lattice-free MMI (LF-MMI) discriminative criterion is also proposed. The proposed LF-MMI time-delay neural network (TDNN) system establishes the state-of-the-art for the LRS2 dataset. Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system. Consistent performance improvements of 4.89% absolute in WER reduction over the baseline AVSR system using feature fusion are also obtained. |
Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders Audio and Speech Processing, Sound. 7 authors. pdf Deep learning-based models have greatly advanced the performance of speech enhancement (SE) systems. However, two problems remain unsolved, which are closely related to model generalizability to noisy conditions: (1) mismatched noisy condition during testing, i. …e., the performance is generally sub-optimal when models are tested with unseen noise types that are not involved in the training data; (2) local focus on specific noisy conditions, i.e., models trained using multiple types of noises cannot optimally remove a specific noise type even though the noise type has been involved in the training data. These problems are common in real applications. In this paper, we propose a novel denoising autoencoder with a multi-branched encoder (termed DAEME) model to deal with these two problems. In the DAEME model, two stages are involved: offline and online. In the offline stage, we build multiple component models to form a multi-branched encoder based on a dynamically-sized decision tree(DSDT). The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT. Finally, a decoder is trained on top of the multi-branched encoder. In the online stage, noisy speech is first processed by the tree and fed to each component model. The multiple outputs from these models are then integrated into the decoder to determine the final enhanced speech. Experimental results show that DAEME is superior to several baseline models in terms of objective evaluation metrics and the quality of subjective human listening tests. |
Image and Video Processing (eess.IV) |
Deep Learning-Based Solvability of Underdetermined Inverse Problems in Medical Imaging Machine Learning, Machine Learning, Image and Video Processing. 5 authors. pdf Recently, with the significant developments in deep learning techniques, solving underdetermined inverse problems has become one of the major concerns in the medical imaging domain. Typical examples include undersampled magnetic resonance imaging, interior tomography, and sparse-view computed tomography, where deep learning techniques have achieved excellent performances. …Although deep learning methods appear to overcome the limitations of existing mathematical methods when handling various underdetermined problems, there is a lack of rigorous mathematical foundations that would allow us to elucidate the reasons for the remarkable performance of deep learning methods. This study focuses on learning the causal relationship regarding the structure of the training data suitable for deep learning, to solve highly underdetermined inverse problems. We observe that a majority of the problems of solving underdetermined linear systems in medical imaging are highly non-linear. Furthermore, we analyze if a desired reconstruction map can be learnable from the training data and underdetermined system. |
Hyperspectral Super-Resolution via Coupled Tensor Ring Factorization Computer Vision and Pattern Recognition, Image and Video Processing. 5 authors. pdf Hyperspectral super-resolution (HSR) fuses a low-resolution hyperspectral image (HSI) and a high-resolution multispectral image (MSI) to obtain a high-resolution HSI (HR-HSI). In this paper, we propose a new model, named coupled tensor ring factorization (CTRF), for HSR. …The proposed CTRF approach simultaneously learns high spectral resolution core tensor from the HSI and high spatial resolution core tensors from the MSI, and reconstructs the HR-HSI via tensor ring (TR) representation (Figure~). The CTRF model can separately exploit the low-rank property of each class (Section ), which has been never explored in the previous coupled tensor model. Meanwhile, it inherits the simple representation of coupled matrix/CP factorization and flexible low-rank exploration of coupled Tucker factorization. Guided by Theorem~, we further propose a spectral nuclear norm regularization to explore the global spectral low-rank property. The experiments have demonstrated the advantage of the proposed nuclear norm regularized CTRF (NCTRF) as compared to previous matrix/tensor and deep learning methods. |
Computational Physics (physics.comp-ph) |
Ab initio study on Quasi-Binary Acetonitriletriide Sr\(_3\)[C\(_2\)N]\(_2\) Computational Physics. 6 authors. pdf We report using density functional theory (DFT), the ground-state properties of the recently synthesized and characterized Sr\(_3\)[C\(_2\)N]\(_2\) crystal. The nearly colorless, centrosymmetric Sr\(_3\)[C\(_2\)N]\(_2\) crystallizes in a monoclinic unit cell with a \(P2_1/c\) space group (No. …14) and many of its properties remain unknown basing on the fact that it’s a latecomer in the field. The goal of this study is to fill this information gap through a theoretical prediction. The calculated structural properties were comparable to those obtained by an experimental group led by Clark and co-workers thus giving us extra confidence in the accuracy of our DFT computations on Sr\(_3\)[C\(_2\)N]\(_2\). We employed the same approach in calculating mechanical and dynamical stabilities together with the electronic density of states of Sr\(_3\)[C\(_2\)N]\(_2\). No imaginary phonon modes were observed and thus implying dynamical stability. The thirteen elastic constants calculated passed the stability criteria of a monoclinic system. From the computed Poisson’s ratio (\(\eta\)=0.27) and G/B=0.54, our calculations predict Sr\(_3\)[C\(_2\)N]\(_2\) being brittle and not able to withstand high-pressure applications. To analyze the chemical bonding mechanism, the corresponding total density of states (TDOS) and partial DOS were plotted. The top of the valence band (VB) mainly consists of C 2p states N 2p, N 2s and a slight admixture of Sr 5s states. The bottom of the conduction band (CB) shows a strong hybridization between C 2p, N 2p, N 2s, and Sr 5s states, yielding a bandwidth of 7.18 eV in the entire conduction band. We were able to obtain a tunable electronic gap of 2.65 eV in Sr\(_3\)[C\(_2\)N]\(_2\). The authors herein note that Sr\(_3\)[C\(_2\)N]\(_2\) falls in an unknown family of pseudonitrides that may possess novel physical and chemical properties. |
Performance of parallel-in-time integration for Rayleigh Bénard Convection Computational Physics. 5 authors. pdf Rayleigh-B'enard convection (RBC) is a fundamental problem of fluid dynamics, with many applications to geophysical, astrophysical, and industrial flows. Understanding RBC at parameter regimes of interest requires complex physical or numerical experiments. …Numerical simulations require large amounts of computational resources; in order to more efficiently use the large numbers of processors now available in large high performance computing clusters, novel parallelisation strategies are required. To this end, we investigate the performance of the parallel-in-time algorithm Parareal when used in numerical simulations of RBC. We present the first parallel-in-time speedups for RBC simulations at finite Prandtl number. We also investigate the problem of convergence of Parareal with respect to to statistical numerical quantities, such as the Nusselt number, and discuss the importance of reliable online stopping criteria in these cases. |
Data Analysis, Statistics and Probability (physics.data-an) |
How neural networks find generalizable solutions: Self-tuned annealing in deep learning Machine Learning, Data Analysis, Statistics and Probability, Statistical Mechanics, Adaptation and Self-Organizing Systems. 2 authors. pdf Despite the tremendous success of Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. …To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting. |
Mesoscale and Nanoscale Physics (cond-mat.mes-hall) |
Model Predictive Control for Finite Input Systems using the D-Wave Quantum Annealer Computational Physics, Emerging Technologies, Systems and Control, Quantum Physics, Mesoscale and Nanoscale Physics. 2 authors. pdf The D-Wave quantum annealer has emerged as a novel computational architecture that is attracting significant interest, but there have been only a few practical algorithms exploiting the power of quantum annealers. Here we present a model predictive control (MPC) algorithm using a quantum annealer for a system allowing a finite number of input values. …Such an MPC problem is classified as a non-deterministic polynomial-time-hard combinatorial problem, and thus real-time sequential optimization is difficult to obtain with conventional computational systems. We circumvent this difficulty by converting the original MPC problem into a quadratic unconstrained binary optimization problem, which is then solved by the D-Wave quantum annealer. Two practical applications, namely stabilization of a spring-mass-damper system and dynamic audio quantization, are demonstrated. For both, the D-Wave method exhibits better performance than the classical simulated annealing method. Our results suggest new applications of quantum annealers in the direction of dynamic control problems. |
Unsupervised machine learning and band topology Mesoscale and Nanoscale Physics, Computational Physics, Strongly Correlated Electrons. 2 authors. pdf The study of topological bandstructures is an active area of research in condensed matter physics and beyond. Here, we combine recent progress in this field with developments in machine-learning, another rising topic of interest. …Specifically, we introduce an unsupervised machine-learning approach that searches for and retrieves paths of adiabatic deformations between Hamiltonians, thereby clustering them according to their topological properties. The algorithm is general as it does not rely on a specific parameterization of the Hamiltonian and is readily applicable to any symmetry class. We demonstrate the approach using several different models in both one and two spatial dimensions and for different symmetry classes with and without crystalline symmetries. Accordingly, it is also shown how trivial and topological phases can be diagnosed upon comparing with a generally designated set of trivial atomic insulators. |
Statistics Theory (math.ST) |
Gradient descent algorithms for Bures-Wasserstein barycenters Statistics Theory, Statistics Theory. 4 authors. pdf We study first order methods to compute the barycenter of a probability distribution over the Bures-Wasserstein manifold. We derive global rates of convergence for both gradient descent and stochastic gradient descent despite the fact that the barycenter functional is not geodesically convex. …Our analysis overcomes this technical hurdle by developing a Polyak-Lojasiewicz (PL) inequality, which is built using tools from optimal transport and metric geometry. |
Maximum Likelihood Estimation of Stochastic Differential Equations with Random Effects Driven by Fractional Brownian Motion Statistics Theory, Dynamical Systems, Statistics Theory. 4 authors. pdf Stochastic differential equations and stochastic dynamics are good models to describe stochastic phenomena in real world. In this paper, we study N independent stochastic processes Xi(t) with real entries and the processes are determined by the stochastic differential equations with drift term relying on some random effects. …We obtain the Girsanov-type formula of the stochastic differential equation driven by Fractional Brownian Motion through kernel transformation. Under some assumptions of the random effect, we estimate the parameter estimators by the maximum likelihood estimation and give some numerical simulations for the discrete observations. Results show that for the different H, the parameter estimator is closer to the true value as the amount of data increases. |
Portfolio Management (q-fin.PM) |
A Note on Portfolio Optimization with Quadratic Transaction Costs Computational Finance, Machine Learning, Portfolio Management. 4 authors. pdf In this short note, we consider mean-variance optimized portfolios with transaction costs. We show that introducing quadratic transaction costs makes the optimization problem more difficult than using linear transaction costs. …The reason lies in the specification of the budget constraint, which is no longer linear. We provide numerical algorithms for solving this issue and illustrate how transaction costs may considerably impact the expected returns of optimized portfolios. |
Statistical Finance (q-fin.ST) |
Disentangling shock diffusion on complex networks: Identification through graph planarity Statistical Finance. 3 authors. pdf Large scale networks delineating collective dynamics often exhibit cascading failures across nodes leading to a system-wide collapse. Prominent examples of such phenomena would include collapse on financial and economic networks. …Intertwined nature of the dynamics of nodes in such network makes it difficult to disentangle the source and destination of a shock that percolates through the network, a property known as reflexivity. In this article, a novel methodology is proposed which combines vector autoregression model with an unique identification restrictions obtained from the topological structure of the network to uniquely characterize cascades. In particular, we show that planarity of the network allows us to statistically estimate a dynamical process consistent with the observed network and thereby uniquely identify a path for shock propagation from any chosen epicenter to all other nodes in the network. We analyze the distress propagation mechanism in closed loops giving rise to a detailed picture of the effect of feedback loops in transmitting shocks. We show usefulness and applications of the algorithm in two networks with dynamics at different time-scales: worldwide GDP growth network and stock network. In both cases, we observe that the model predicts the impact of the shocks emanating from the US would be concentrated within the cluster of developed countries and the developing countries show very muted response, which is consistent with empirical observations over the past decade. |
Instrumentation and Methods for Astrophysics (astro-ph.IM) |
Applying Information Theory to Design Optimal Filters for Photometric Redshifts Applications, Information Theory, Instrumentation and Methods for Astrophysics, Information Theory. 3 authors. pdf In this paper we apply ideas from information theory to create a method for the design of optimal filters for photometric redshift estimation. We show the method applied to a series of simple example filters in order to motivate an intuition for how photometric redshift estimators respond to the properties of photometric passbands. …We then design a realistic set of six filters covering optical wavelengths that optimize photometric redshifts for \(z <= 2.3\) and \(i < 25.3\). We create a simulated catalog for these optimal filters and use our filters with a photometric redshift estimation code to show that we can improve the standard deviation of the photometric redshift error by 7.1% overall and improve outliers 9.9% over the standard filters proposed for the Large Synoptic Survey Telescope (LSST). We compare features of our optimal filters to LSST and find that the LSST filters incorporate key features for optimal photometric redshift estimation. Finally, we describe how information theory can be applied to a range of optimization problems in astronomy. |