Poster Sessions

In this section you will find the title, abstract and presenter of each poster in the two poster sessions that take place during the MLSS.

Some photos of the MLSS poster sessions can be seen here: Photos.

First Poster Session: 30/08/2018, 18:30h - 20:30h
1. Presenter: Homayun Afrabandpey
Title: Active Expert Knowledge Elicitation of Feature Similarities and Covariances for Prediction
Abstract: Prediction in "small n, large p" problems, where the sample size is substantially smaller than the number of features, is challenging. Bayesian models can alleviate the challenge by using informative prior distributions over the parameters. There is a rich literature on constructing such prior distributions through expert knowledge elicitation; most of it focuses either on directly eliciting distributions or on querying knowledge about features one at a time. Furthermore, all these techniques share the assumption that the features are independent in the prior distribution. Focusing on linear regression, we propose a human-in-the-loop machine learning method for constructing the full covariance matrix of the prior distribution of the parameters by querying the expert about pairs of features. Since the number of pairs can be large, we increase the interaction efficiency through a subsampling approach with guarantees, and we implement the models using probabilistic programming, which allows us to naturally use sequential decision-making methods to optimize query selection. Our results demonstrate improved predictive performance on simulated and real data.

2. Presenter: Reda Alami
Title: Memory Bandits: A Bayesian Approach for the Switching Bandit Problem
Abstract: Thompson Sampling exhibits excellent results in practice and has been shown to be asymptotically optimal. The extension of the Thompson Sampling algorithm to the Switching Multi-Armed Bandit problem, proposed in [mellor2013thompson], is Thompson Sampling equipped with a Bayesian online change point detector [adams2007bayesian]. In this paper, we propose another extension of this approach based on a Bayesian aggregation framework.
Experiments provide some evidence that, in practice, the proposed algorithm compares favorably with the previous version of Thompson Sampling for the Switching Multi-Armed Bandit problem, while clearly outperforming other state-of-the-art algorithms.

3. Presenter: Álvaro Barbero Jiménez
Title: proxTV: fast and modular proximal optimization for multidimensional total-variation regularization
Abstract: We study TV regularization, a widely used technique for eliciting structured sparsity. In particular, we propose efficient algorithms for computing prox-operators for lp-norm TV. The most important among these is l1-norm TV, for whose prox-operator we present a new geometric analysis that unveils a hitherto unknown connection to taut-string methods. This connection turns out to be remarkably useful, as it shows how our geometry-guided implementation results in efficient weighted and unweighted 1D-TV solvers, surpassing state-of-the-art methods. Our 1D-TV solvers provide the backbone for building more complex (two- or higher-dimensional) TV solvers within a modular proximal optimization approach. We review the literature for an array of methods exploiting this strategy, and illustrate the benefits of our modular design through an extensive suite of experiments on (i) image denoising, (ii) image deconvolution, (iii) four variants of fused lasso, and (iv) video denoising. To underscore our claims and permit easy reproducibility, we provide all the reviewed TV solvers and our new ones in an easy-to-use multi-threaded C++, Matlab and Python library: proxTV.

4. Presenter: Johanna Bayer
Title: Profiling Major Depressive Disorder using methods of machine learning
Abstract: The concept of Major Depressive Disorder (MDD) is very broadly defined. Symptom profiles vary between individuals diagnosed with this disorder and overlap with those of other mental disorders from the affective spectrum.
Due to this heterogeneity and the lack of clinical decision support tools, assigning the correct treatment for MDD is tedious and based on trial and error. We use Gaussian process regression to map the normative relationship between age and structural FreeSurfer brain measures in 8,000 healthy controls, and estimate the extent to which brain structures in individual MDD patients (N=2,500) deviate from these normative patterns. This approach allows us to look at single individuals, e.g., those with extreme values under the normative model, and to identify subgroups, e.g., using clustering. These subtypes of individuals with MDD can be related to disorder- and treatment-specific variables, such as disease severity, treatment response and outcome, and thus facilitate the assignment of treatment. The current results of the project are discussed against the background of the emerging trend to share and combine data from multiple sites, and the possible risks of applying machine learning methods to such data.

5. Presenter: Mikhail Beck
Title: Making sport competition models with the help of machine learning
Abstract: The sports betting industry extensively employs probabilistic modelling to develop better odds compilation tools. These tools typically take odds on some key markets as input and calculate odds on tens or even hundreds of derived markets. Machine learning techniques are now used to build more realistic models of sport competitions. In this presentation I show three cases where ML provided a solution that would be difficult to obtain by other means.

6. Presenter: Lyvia Biagi
Title: Prediction of nocturnal hypoglycemic events in subjects with type 1 diabetes
Abstract:
Introduction: Subjects with type 1 diabetes (T1D) need exogenous insulin to regulate blood glucose levels because of the autoimmune destruction of their pancreatic beta cells. Insulin must be infused properly to maintain normal glucose levels; otherwise, patients can experience hyper- or hypoglycemia.
Hypoglycemia is a serious complication of T1D and a major concern for patient safety. Nocturnal hypoglycemia can lead to various adverse situations in T1D patients, including loss of consciousness, seizures, or even death. Using patients' retrospective data allows the prediction and prevention of future hypoglycemic events, contributing to improved patient safety and quality of care.
Methods: Data from 12 patients with T1D were considered in this work. The main objective is to develop personalized prediction systems based on individual historical data collected from patients. Information related to the patients' insulin therapy, meals and physical activity was used to derive features for predicting the occurrence of nocturnal hypoglycemic events with machine learning algorithms.
Preliminary Results: Preliminary results were obtained using artificial neural networks to classify each day into one of two classes: i) nights with hypoglycemic events and ii) nights without hypoglycemic events. The methodology applied to the dataset obtained satisfactory results for most of the patients. Averaged over all patients, sensitivity and specificity were 54.3% and 85.1%, respectively. This information may help patients take appropriate actions to avoid nocturnal hypoglycemic events while they get ready to sleep.

7. Presenter: Lubos Buzna
Title: Use cases and introductory analysis of the dataset collected within the large network of public charging stations
Abstract: The recent rise of electric vehicles (EVs) brings social and technological changes to the transportation and energy sectors, including the massive deployment of charging stations. To provide effective decision support for charging station operators, we explore possibilities for exploiting the available dataset and present the results of preliminary data analyses.
Our dataset contains over 32 million meter readings from the charging of plug-in electric vehicles (PEVs) at more than 1,700 charging stations located in the Netherlands. Based on discussions with experts and the available literature, three main application areas were identified: forecasting of demanded energy, identification of customer segments, and characterization of suitable locations for charging stations. As an example use case, we forecast the consumption of electric energy at charging stations in the COROP region of Utrecht. Two kinds of SARIMAX model, together with three kinds of training-forecasting procedure, are used with various exogenous predictors to identify which combination provides the best long-term forecasts.

8. Presenter: Taha Ceritli
Title: Modeling Bounded Data with Sum Conditioned Poisson Factorization
Abstract: In Poisson Factorization, a state-of-the-art matrix factorization method, non-negative bounded data, such as binary and ordinal matrices, are modeled as Poisson random variables with unbounded ranges. In this work, we extend Poisson Factorization to model such bounded data with bounded distributions such as the Bernoulli, Binomial, Categorical and Multinomial, where multiple Poisson Factorizations are conditioned on their sum. The resulting model, named Sum Conditioned Poisson Factorization, is evaluated on simulated and real datasets.

9. Presenter: Ho Ching Chiu
Title: Single Image Super-Resolution GAN using Inception-ResNet
Abstract: As the title says. I have just switched from psychology to machine learning, so my research does not yet have anything to do with machine learning, and I am doing this as a side project purely for fun. The work is still in progress and not yet very successful. Sorry that there is no poster, since I did not have time to prepare one, but I am keen to learn from all of you how to produce awesome super-resolution images and, of course, about all other machine learning topics as well. Happy to discuss; suggestions are welcome.
Email: hoching.chiu@oist.jp

10. Presenter: Irene Córdoba
Title: Uniform sampling of decomposable Gaussian graphical models
Abstract: We propose a novel Metropolis-Hastings algorithm to sample uniformly from the space of decomposable Gaussian graphical models. The method builds on previous work on uniform sampling of correlation matrices. Our approach is intuitive and simple, based on the interpretation of the Cholesky factorization of the inverse covariance matrix and on Markov chain Monte Carlo theory. We analyze the convergence of the resulting Markov chain both theoretically and empirically. We show in numerical experiments how traditional sampling methods for Gaussian graphical models are biased towards certain regions of the space, whereas our approach explores all of it uniformly.

11. Presenter: Carlo D'Eramo
Title: Exploiting Action-Value Uncertainty to Improve Learning and Exploration in Reinforcement Learning
Abstract: We address the problem of estimating the maximum expected value of a set of independent random variables. The performance of several learning algorithms (e.g., Q-learning) is affected by the accuracy of this estimate. Unfortunately, no unbiased estimator exists. The usual approach of taking the maximum of the sample means leads to large overestimates that may significantly harm the performance of the learning algorithm. Recent works have shown that the cross-validation estimator, which is negatively biased, outperforms the maximum estimator in many sequential decision-making scenarios. On the other hand, the relative performance of the two estimators is highly problem-dependent. In this work, we propose a new estimator for the maximum expected value, based on a weighted average of the sample means, where the weights are computed using Gaussian approximations of the distributions of the sample means. We also extend this analysis to an infinite set of random variables. We apply our method to a wide range of reinforcement learning problems, from discrete to continuous ones.
Moreover, we explain how to improve exploration by exploiting the uncertainty computed by our algorithm. We compare the proposed estimator and exploration strategies with other state-of-the-art methods both theoretically, by deriving upper bounds on the bias and variance of the estimator, and empirically, by testing their performance on different sequential learning problems.

12. Presenter: Lucas Deecke
Title: Mode normalization
Abstract: Normalization methods are a central building block in the deep learning toolbox. By alleviating internal covariate shift, they accelerate training in deep networks and decrease the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on the fly, jointly normalizing samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments covering various architectures and datasets.

13. Presenter: Markus Eyting
Title: Predicting Health from Questionnaire Data
Abstract: Using random forest classification, we predict different health outcomes from questionnaire data. Patients' health behaviour, psychological conditions and work attitudes, as well as basic demographic characteristics, are used to predict health outcomes across a variety of dimensions. Predictions are compared to actual health status as well as to expert predictions from physicians.

14. Presenter: Elizabeth Fons Etcheverry
Title: A regime switching model for smart beta investing using Hidden Markov Models
Abstract: The financial crisis generated interest in more transparent, rules-based strategies, with smart beta emerging as a trend among institutional investors.
Smart beta is a hybrid strategy that combines investment strategies from active management with a systematic approach often associated with passive investment, making such strategies more cost-effective. Such strategies show strong performance over the long run, but often suffer severe short-term drawdowns, with performance fluctuating across cycles. To address cyclicality and underperformance, we build a regime-switching framework using Hidden Markov Models (HMMs). We build portfolios whose allocation signal is provided by an HMM trained on the same assets. Results show that using HMMs improves risk-adjusted returns, especially for more return-oriented portfolios. In addition, we implement a novel approach for regime-switching models that uses an embedded feature selection algorithm to improve regime identification. We evaluate smart feature selection on real-life assets using MSCI style indices, and show improved model performance with respect to portfolios built using full-feature HMMs.

15. Presenter: Víctor Gallego
Title: Bayesian structural time series models for advertising expenditures
Abstract: We propose a robust implementation of the Nerlove-Arrow model using a Bayesian structural time series model to explain the relationship between the advertising expenditures of a country-wide fast-food franchise network and its weekly sales. Thanks to the flexibility and modularity of the model, it is well suited to generalization to other markets or situations. Its Bayesian nature facilitates incorporating a priori information (the manager's views), which can be updated with relevant data. This aspect of the model will be used to present a strategy for scheduling the budget across time and channels.

16. Presenter: Kunal Ghosh
Title: Deep learning spectroscopy: neural networks for molecular excitation spectra
Abstract: Applications of novel materials have a significant positive impact on our lives.
To search for such novel materials, materials scientists traverse massive datasets of prospective materials to identify ones with favourable properties. Prospective materials are screened by studying suitable spectra of these materials. Contemporary methods such as high-throughput screening are very time-consuming for moderately sized datasets. We train three different neural network architectures, a multilayer perceptron (MLP), a convolutional neural network (CNN) and a deep tensor neural network (DTNN), to predict the orbital energies and excitation spectra of 132K organic molecules. The inputs to the neural networks are the coordinates and charges of the constituent atoms of each molecule. The MLP is already able to learn spectra, but its test root mean square error (RMSE) is still as high as 0.3 eV. The learning quality improves significantly for the CNN (RMSE = 0.23 eV) and reaches its best performance with the DTNN (RMSE = 0.19 eV). Both the CNN and the DTNN capture even small nuances in the spectral shape.

17. Presenter: Hanane Grissette
Title: A dynamic Sentiment Analysis Model based on associative learning for Defining the Minute Insights from passive and active Patients state self-reported on Social Media
Abstract: Nowadays, Sentiment Analysis (SA) is the pioneering approach used to analyze people's opinions about a product or an event in order to identify breakpoints in public opinion [1].
The traditional form of clinical notes, such as CRFs (Case Report Forms), used to summarize physical examinations and details of patients' medical histories and experiences with specific drugs or events, is not credible and is less efficient at capturing the changing emotional state of patients over the course of medication. Moreover, the major issue is the inability of general-purpose SA tools to accurately detect, over time, the meaning of the sentiments expressed towards treatments, scientific studies and pharmaceutical companies at large. In this work, we aim to define the underlying set of sentiments expressed towards an entity by using associative learning based on a Bayesian approach to quantify exactly what changes.

18. Presenter: Prakhar Gupta
Title: Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
Abstract: The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.

19. Presenter: Ray Han
Title: Towards Effective Extraction and Linking of Software Mentions from User-Generated Support Tickets
Abstract: Software support tickets contain short and noisy text from users. Software products are represented by various surface forms and informal abbreviations created by users. Automatically identifying software mentions in tickets and determining their official names (and versions) is helpful for many downstream applications, e.g., routing support tickets to the right team of experts for the software. In this work, we study the problem of software product name extraction and linking from support tickets.
We first analyze a collection of annotated tickets to understand the language patterns. Second, we design features using multiple in-domain and web knowledge sources for extraction and linking with linear models. Experiments on four datasets show better and more consistent results for our methods compared to neural network baselines.

20. Presenter: Florian Huber
Title: Predicting antibacterial drug mode of action using machine learning
Abstract: Elucidating the mode of action (MoA) of small molecules targeting microbial growth is key to drug discovery. MoA determination is still a major bottleneck in drug discovery because it depends on laborious, low-throughput methods and thus cannot be applied to large compound collections. Previous efforts to systematically infer drug MoA using drug-gene interactions or phenotypic profiling of single cells lack accuracy and cannot easily be generalised. In this project, we aim to use chemical screening data, microscopy and compound structures as input for machine learning analyses in order to identify a set of features that can predict drug MoA across a wide range of bacterial species. We provide an overview of the challenges of MoA prediction and of current computational/statistical approaches to predicting drug MoA. We present our approaches to selecting the most informative features for MoA prediction from the wealth of available experimental data. These data are combined with our knowledge of bacterial cell physiology to train classification algorithms for MoA prediction. This will yield insights into the pathways affected by drugs and facilitate the elucidation of the mechanisms of drugs with unknown MoA.

21. Presenter: Abdullah-Al-Zubaer Imran
Title: PDV-Net: Reliable, Fast, and Automatic Segmentation of Pulmonary Lobes
Abstract: Reliable and automatic segmentation of lung lobes is important for the diagnosis, assessment and quantification of pulmonary diseases.
Existing techniques are prohibitively slow, undesirably rely on prior (airway/vessel) segmentation, and/or require user interaction for optimal results. This work presents a reliable, fast and fully automated lung lobe segmentation based on a progressive dense V-network (PDV-Net). The proposed method can segment lung lobes in one forward pass of the network, with an average runtime of 2 seconds on a single Nvidia Titan XP GPU, eliminating the need for any prior atlases, lung segmentation or subsequent user intervention. We evaluated our model using 84 chest CT scans from the LIDC dataset and 154 pathological cases from the LTRC dataset. Our model achieved a Dice score of 0.939 ± 0.02 on the LIDC test set and 0.950 ± 0.01 on the LTRC test set, significantly outperforming a 2D U-Net model and a 3D dense V-Net. We further evaluated our model on 55 cases from the LOLA11 challenge, obtaining an average Dice score of 0.935, a performance level competitive with the best-performing model, which has an average score of 0.938. Our extensive robustness analyses also demonstrate that our model can reliably segment both healthy and pathological lung lobes in CT scans from different vendors, and that it is robust to different CT scan reconstruction configurations.

22. Presenter: Tim Janke
Title: A Quantile Regression Deep Neural Network for Probabilistic Electricity Price Forecasting
Abstract: Over the last decade, the paradigm in forecasting has been shifting from point forecasts to probabilistic forecasts, acknowledging the need to assess forecast uncertainty. In electricity price forecasting (EPF), probabilistic forecasting is an emerging but still underdeveloped field. Due to the rising share of generation from renewable and volatile energy sources such as wind and solar, combined with the partly inflexible generation of thermal power plants, electricity prices have become increasingly volatile and hard to forecast.
Hourly electricity prices are typically determined by a day-ahead uniform-price auction and are influenced by fundamental factors such as expected renewable in-feed, expected demand and fuel prices, but they also exhibit strong seasonal and autoregressive patterns. Historically, classic parametric time series models have been predominant. Deep neural networks are theoretically well suited to modelling the complex, non-linear relationships that govern the formation of electricity prices. However, neural network models have so far shown limited performance in EPF. We challenge this notion by showing that a quantile regression deep neural network beats established benchmark models in both point forecasting and probabilistic forecasting accuracy. Combining the concept of quantile regression with a deep neural network, we propose a Quantile Regression Deep Neural Network for the simultaneous estimation of 99 quantiles of all 24 day-ahead electricity prices, i.e., our model has 24 × 99 = 2,376 output units and is trained to minimize the average pinball loss over all prices. As inputs we use the load forecast, expected solar power in-feed and expected wind power in-feed, as well as 168 lagged prices, from a three-year dataset from the German-Austrian bidding zone in hourly resolution. We initially fit our model on the first two years of the dataset and test its performance by forecasting the whole of 2017. We refit the model after each day using an expanding window.

23. Presenter: Juan Emmanuel Johnson
Title: Input Uncertainty in Gaussian Process Regression for Earth Surface Temperature Predictions
Abstract: Gaussian Processes (GPs) are a class of kernel methods that have been shown to be very useful in geoscience applications. They are widely used because they are simple and flexible and provide very accurate estimates for nonlinear problems, especially in parameter retrieval.
In addition to a predictive mean function, GPs come equipped with a useful property: the predictive variance function, which provides confidence intervals for the predictions. The GP formulation usually assumes that there is no noise in the inputs of the training and testing points, only in the observations. However, this is often not the case in Earth observation problems, where an accurate assessment of the instrument error is usually available. In this poster, I show how the derivative of a GP model can be used to provide an analytical error propagation formulation, and we analyze the predictive variance and the propagated error terms in a temperature prediction problem from infrared sounding data.

24. Presenter: Marija Kekic
Title: Applications of neural networks in data analysis in NEXT
Abstract: Convolutional neural networks have achieved impressive results in the field of computer vision. In this work we examine the application of CNNs in the Neutrino Experiment with a Xenon TPC (NEXT), where CNNs are used to distinguish the topological signatures of background and signal events after training on many thousands of simulated events. The network trained in this study performed better than previous methods, and we hope to further improve our findings by using more appropriate architectures.

25. Presenter: Benjamin Knopp
Title: Temporal Movement Primitive Perception under Naturalistic Conditions
Abstract: Movement Primitives (MPs) are hypothetical elements out of which complex movements can be composed. This concept is popular in motor control, but we are interested in MPs as perceptual categories: inspired by common-coding theory, we investigated whether MPs are also a useful approach for describing human movement perception in naturalistic settings. Besides advancing the understanding of movement perception, finding perceptual MP categories could also be useful for computer vision/graphics and modelling. We recorded an actor performing natural tasks in a fairly unconstrained manner.
The actions consist of walking through an indoor environment, climbing stairs, and making and drinking coffee. The data were used for a psychophysical movement segment perception experiment and for learning MP models. We showed 70 video clips containing a selection of the recordings to 12 participants, who were instructed to segment these clips into non-overlapping time intervals according to perceived boundaries.
Results: We then used each participant's segmentations to extract MPs. Using the Bayesian Information Criterion, we estimated that 6-15 MPs are optimal for a given participant. Furthermore, we performed a cluster analysis to compare global and local representations. The results indicate that task-independent MPs provide a better representation of human movement than task-dependent ones.

26. Presenter: Radha Manisha Kopparti
Title: Abstract Rule Learning with Neural Networks
Abstract: Over the past few years, deep neural networks have been widely used for various applications and have produced state-of-the-art results in domains such as image recognition, speech recognition and machine translation. Nevertheless, open challenges remain. One of them is the need for vast amounts of training data, which has been related to the difficulty neural networks have in learning certain abstractions, specifically grammatical patterns. For example, humans can easily learn linguistic abstractions, both through explicit definition and through more implicit means. In an experiment, Marcus showed that even 7-month-old infants learned abstract grammar-like rules from a small number of unlabeled examples in just two minutes, while neural networks failed to do so. A series of recent papers has re-emphasised that humans are far more efficient than deep learning systems at learning complex rules. However, previous work on training neural networks to understand abstract grammar patterns has not produced positive results.
As recurrent neural networks have been shown to be Turing-complete, they can represent abstract relationships, but current algorithms do not seem to learn these representations. We therefore take a constructive approach and create network architectures that detect abstract relationships and condition outputs on them. We take this as the basis for further exploring the learning behaviour of networks and identifying ways to encourage abstraction in neural networks. We started by creating a neural network that can learn identity relationships by design. This network can learn the grammars proposed by Marcus et al. from the data when trained with stochastic gradient descent. We perform several experiments by training the neural network on sequential data and propose a framework by which neural networks can learn abstract relationships. This approach may provide new generalization capabilities to neural networks and can be applied to various modalities such as speech and language, music, and time series data.

27. Presenter: Michal Kozlowski
Title: Energy Efficiency in Reinforcement Learning for Wireless Sensor Networks
Abstract: As sensor networks for health monitoring become more prevalent, so will the need to control their usage and energy consumption. This poster presents a method that balances the algorithm's performance against its energy consumption. By utilising Reinforcement Learning (RL) techniques, we provide an adaptive framework that continuously performs weak training in an energy-aware system. We motivate this with a realistic example of residential localisation based on Received Signal Strength (RSS). The method is cheap in terms of work-hours, calibration and energy usage. It achieves this by utilising other sensors available in the environment. These other sensors provide weak labels, which are then used to employ the State-Action-Reward-State-Action (SARSA) algorithm and train the model over time.
Our approach is evaluated on a simulated localisation environment and validated on a widely available pervasive health dataset that facilitates realistic residential localisation using RSS. We show that our method is cheaper to implement and requires less effort, while at the same time providing performance improvements and energy savings over time.

28. Presenter: Rita Kuznetsova
Title: Variational Bi-domain Triplet Autoencoder
Abstract: We investigate deep generative models that allow us to use training data from one domain to build a model for another domain. We consider domains with similar structure (texts, images). We propose the Variational Bi-domain Triplet Autoencoder (VBTA), which learns a joint distribution of objects from different domains. There are many cases in which obtaining any supervision (e.g., paired data) is difficult or ambiguous. For such cases we seek a method that is able to extract information about data relations and structure from the latent space. We extend the VBTA objective function with relative constraints, or triplets, sampled from the shared latent space across domains. In other words, we combine the deep generative model with metric learning ideas in order to improve the final objective with the triplet information. We demonstrate the performance of the VBTA model on different tasks: bi-directional image generation and image-to-image translation, even on unpaired data. We also provide a qualitative analysis. We show that the VBTA model is comparable to, and outperforms, some of the existing generative models.

29. Presenter: Krista Longi
Title: Semi-supervised Convolutional Neural Networks for Identifying Wi-Fi Interference Sources
Abstract: We present a convolutional neural network for identifying radio frequency devices from signal data, in order to detect possible interference sources for wireless local area networks.
Collecting training data for this problem is particularly challenging due to the high number of possible interfering devices, the difficulty of obtaining precise timings, and the need to measure the devices under varying conditions. To overcome this challenge we focus on semi-supervised learning, aiming to minimize the need for reliably labelled training samples while using larger amounts of unlabelled data to improve accuracy. In particular, we propose a novel structured extension of the pseudo-label technique that takes advantage of temporal continuity in the data, and we show that just a few seconds of training data per device are sufficient for highly accurate recognition. 30 Gurunath Reddy Madhumani Classification and Segmentation of Vocal Folds in Videolaryngostroboscopy Images of Laryngeal Disorders Videolaryngostroboscopy is an invasive technique for capturing the vibration of the vocal folds during phonation using a high-speed video camera under intermittent illumination. The vocal folds are two symmetrical membranous folds in the larynx. Healthy vocal folds produce quasi-periodic puffs of air, driven from the lungs, which are convolved with the vocal tract, resulting in speaking, singing and other paralinguistic vocalizations. Unhealthy or disordered vocal folds, in contrast, result in irregular vibration, air leakage, hoarseness and breathiness; in some disorders they even cause pain in the larynx, due to tissue growth on the vocal folds or to inflammation. In many cases, early detection of vocal disorders is necessary to prevent possible chronic diseases such as carcinoma. The literature contains very few attempts at automated segmentation and classification of the vocal fold region from noisy, distorted images captured under very low illumination.
Most methods assume that all frames in the videolaryngostroboscopy sequence contain vocal folds, which is not always true, and segment the desired region using multi-stage digital image processing pipelines. Often, the shape of the vocal folds is assumed to be fixed, and hand-crafted features such as histograms of oriented gradients, together with region-growing methods, are used to obtain the vocal fold regions. In our work, we propose an end-to-end method for classification and segmentation of the vocal folds that does not involve any hand-crafted features. The network learns features on its own by minimizing the error between its predictions and the ground truth. The first stage classifies each frame for the presence of vocal folds. The second stage segments the frames containing vocal folds at the pixel level. The final stage classifies the segmented vocal folds into one of several disorder categories. Initial results show that the proposed method outperforms state-of-the-art methods. For this work, as a first step, we hand-annotated approximately 7000 images from 5 normal subjects and 20 patients with vocal fold disorders, for frame-level classification and pixel-level segmentation. 31 Atalanti Mastakouri Personalised brain stimulation for motor rehabilitation Non-invasive brain stimulation is one of the newest techniques for motor rehabilitation after stroke. Although there are some very promising results, many studies trying to replicate them report inconsistent results and large percentages of non-responders. Recently, we found evidence of large across-subject heterogeneity of brain activity during the same motor task and proposed that this could be one reason why the same stimulation parameters (frequency, amplitude, location) do not lead to the same outcomes across subjects. We now focus on identifying the subject-specific brain features that explain the response to motor cortex stimulation.
32 Joe Meagher Phylogenetic Gaussian Processes and Bat Echolocation The reconstruction of ancestral echolocation calls is an important part of understanding the evolutionary history of bats. General techniques for the ancestral reconstruction of function-valued traits have recently been proposed. A full implementation of phylogenetic Gaussian processes for the ancestral reconstruction of function-valued traits representing bat echolocation calls is presented here. 33 Luca Messina Modeling the behavior of fusion power-plant component with multifidelity-based simulations In future nuclear fusion power plants based on the magnetic-confinement concept, the divertor (the pipe for exhaust gases) will be a critical component exposed to the harshest conditions in terms of high temperatures, mechanical stresses, and neutron irradiation. In particular, high-energy neutrons emitted in the fusion reaction can severely damage the divertor and endanger its structural integrity, forcing very frequent and expensive replacements. In order to maximize the lifetime of this component, it is crucial to predict its behavior during operation and to devise the optimal chemical composition that minimizes the effects of irradiation. At the atomic scale, the damage is caused by continuous collisions between the high-energy neutrons and the atoms of the divertor's metallic structure. These collisions create defects that initiate atomic-diffusion phenomena, leading to changes in the chemical composition of the alloy that strongly affect macroscopic properties such as hardness and brittleness. This series of complex phenomena is modeled here in a multiscale framework. First, the thermodynamic and kinetic properties of the metal are computed with accurate electronic-structure (Density Functional Theory, DFT) calculations, building up a subatomic physical description that can be used to parameterize Monte Carlo simulations of atomic transport and diffusion.
In this way, it is possible to simulate the chemical evolution of the alloy caused by irradiation-induced crystal defects and to make predictions about its structural properties. Parameterizing such Monte Carlo simulations is tricky, because subatomic properties are highly non-linear functions of the local chemical composition around a given crystal defect. For instance, considering environments of approximately 100 atoms (corresponding to a sphere of about 0.6 nm) would, in a simple binary alloy, give rise to 2^100 combinations of chemical composition, clearly out of reach for accurate but computationally expensive DFT calculations. In this work, we start from much smaller DFT datasets to build a reliable model able to predict the subatomic properties as functions of the local chemical composition. This is achieved by developing simplified models and assessing their accuracy against accurate DFT calculations, with the aid of a multifidelity approach that predicts each model's average error. In future work this will be applied to the specific case of a W-Re alloy, which is among the candidate materials for the divertor thanks to its excellent structural properties, but whose behavior under irradiation is still largely unknown. 34 Prerana Mukherjee SalProp: Salient object proposals via aggregated edge cues In this paper, we propose a novel object proposal generation scheme by formulating a graph-based salient edge classification framework that uses the edge context. In the proposed method, we construct a Bayesian probabilistic edge map to assign a saliency value to the edgelets by exploiting low-level edge features. A Conditional Random Field is then learned to effectively combine these features for edge classification with object/non-object labels. We propose an objectness score for the generated windows by analyzing the salient edge density inside the bounding box.
Extensive experiments on the PASCAL VOC 2007 dataset demonstrate that the proposed method performs competitively against 10 popular generic object detection techniques while using fewer proposals. 35 Mojmir Mutny Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features We develop an efficient and provably no-regret Bayesian optimization (BO) algorithm for optimizing black-box functions in high dimensions. We assume a generalized additive model with possibly overlapping variable groups. When the groups do not overlap, we provide the first provably no-regret polynomial-time (in the number of evaluations of the acquisition function) algorithm for high-dimensional BO. To make the optimization efficient and feasible, we introduce a novel deterministic Fourier Features approximation based on numerical integration, with a detailed analysis for the squared exponential kernel. The error of this approximation decreases exponentially with the number of features, allowing a precise approximation of both the posterior mean and variance. In addition, the complexity of kernel matrix inversion, measured in basic arithmetic operations, improves from cubic to essentially linear in the number of data points. 36 Maryleen Ndubuaku Hybrid Intelligence for Real-time Anomaly Detection in Smart Visual Network This research focuses on processing video streams at the network edge in order to detect and isolate special events. This allows the cloud to receive only as much data as is required for predictive analytics, pattern recognition and data mining. It also reduces the amount of data stored in the cloud by ensuring that only special events are passed on from the edge devices to the cloud. Using an online deep learning algorithm, anomalous events are captured in real time and transmitted to the next tier of the network.
While there has been a lot of research in the individual fields of edge learning, video analytics, cloud data fusion and anomaly detection, research is still lacking on combining these technologies so that anomalous activities in visual networks can be detected through hybrid learning between the edge and the cloud. Such hybrid real-time analytics could be valuable in various applications such as surveillance systems and environmental monitoring. 37 Fernando O. Gallego A dataset to mining conditions A condition is a constraint that determines when something holds. Mining conditions is essential to properly understanding many sentences. Supervised condition miners need a labelled dataset with conditions, but none is publicly available. We present the first publicly available dataset with conditions. It consists of more than 45,000 labelled sentences from a set of more than 4,500,000 sentences in English, Spanish, French, and Italian that were gathered from the Web between April 2017 and May 2017. The sentences were labelled using a custom tool that we devised for the task. 38 Julia Olkhovskaya Online influence maximization with local observations We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node. The node transmits the information to other nodes in the same connected component of a random graph. The goal of the decision maker is to reach as many nodes as possible, with the added complication that feedback is only available about the degree of the selected node. Our main result shows that such local observations can be sufficient for maximizing global influence in two broadly studied families of random graph models: stochastic block models and Chung–Lu models.
With this insight, we propose a bandit algorithm that aims at maximizing local (and thus global) influence, and we analyze it theoretically in both the subcritical and supercritical regimes of both models. Notably, our performance guarantees show no explicit dependence on the total number of nodes in the network, making our approach well-suited for large-scale applications. 39 Alessandro Ortis On the Prediction of Social Image Popularity Dynamics This work introduces the new challenge of forecasting the engagement score reached by social images over time. We call this task "Popularity Dynamic Prediction". The task is to estimate, in advance, the dynamics of the engagement score over a period of time (e.g., 30 days) by exploiting visual and social features. To this end, we propose a benchmark dataset of ~20K Flickr images labelled with their engagement scores (i.e., views, comments and favorites) over a period of 30 days from their upload to the platform. For each image, the dataset also includes user and photo social features that have been shown to influence image popularity on Flickr. The proposed dataset is publicly available for research purposes. We also present a method to address this problem. Our approach forecasts the daily number of views reached by a photo posted on Flickr over a period of 30 days, by exploiting features extracted from the post. 40 Despoina Paschalidou RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials We consider the problem of reconstructing a dense 3D model from images captured from different views. Recent methods based on convolutional neural networks (CNN) allow the entire task to be learned from data. However, they do not incorporate the physics of image formation, such as perspective geometry and occlusion.
In contrast, classical approaches based on Markov Random Fields (MRF) with ray potentials explicitly model these physical processes, but they cannot cope with large variations in surface appearance across viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models and other learning-based approaches. 41 Miquel Perello-Nieto Training classifiers with weak labels It is well known that labelling classification datasets is expensive. One possible solution is the use of semi-supervised techniques that exploit unlabelled samples to improve performance. However, these methods rely on strong prior assumptions about the unlabelled set. We propose new methods to train with weak labels, that is, labels that may be wrong, may form a superset of the true label, or may be outdated. Our method transforms proper losses that require true labels into proper losses that can be used in weak-label scenarios. 42 Bartosz Piotrowski Can Neural Networks Learn Logical Equivalence? Applying deep learning to logical reasoning tasks is an interesting and largely unexplored topic. We present an example experiment of this kind. Preparing a good training set is usually non-trivial: we want to make sure that the network does not "cheat" by exploiting unintended dependencies, but instead learns the intended concept. We show which kinds of architectures may be appropriate and propose some ideas for further exploration.
43 Giorgia Ramponi Generative Adversarial Network for Time Series noisy with irregular sampling Time series are sequences of measurements with a non-random order. Time series analysis is usually based on the assumption that successive values in the data represent consecutive measurements taken at equally spaced time intervals. Most commonly, a time series is a time-ordered sequence of data with a given time interval. Generating time series data is useful in various fields such as astronomy, econometrics, quantitative finance and signal processing. Time series analysis has two main goals: identifying the nature of the phenomenon represented by the sequence of observations (classification) and forecasting future values (prediction). The challenge in time series analysis lies in the data: time series are often noisy, have missing observations (in which case the time interval is irregular), or have too few observations. With such data it is hard to succeed at classification and prediction. In this poster we propose a Conditional Generative Adversarial Network to generate time series with irregular time intervals, with the purpose of augmenting a dataset of noisy time series. By conditioning both the generator and the discriminator on the time intervals, we generate new data. We show that a classifier trained on data generated by the GAN and tested on real data achieves the same performance as a classifier trained on real data. In this way, given a dataset of time series with imbalanced classes, we could improve the performance of the classifier by augmenting the training set with generated time series. 44 Ahmed Sabir Enhance Text Spotting with Semantic Information This poster addresses the problem of detecting and recognizing text in images acquired "in the wild".
This is a severely under-constrained problem that involves a number of challenges, including large occlusions, changing lighting conditions, cluttered backgrounds and different font types and sizes. To address this problem we leverage recent successful developments at the intersection of machine learning and natural language understanding. In particular, we initially rely on off-the-shelf deep networks that have already been trained on large amounts of data and provide a series of text hypotheses per input image. The outputs of these networks are then combined with different priors obtained both from the semantic interpretation of the image and from a scene-based language model. As a result of this combination, the performance of the original network is consistently boosted. 45 Kamil Safin Optimal model selection for paraphrase detection task We propose an algorithm for optimal model selection. As the quality criterion we use the model evidence, which is an integral over the parameter space. To estimate it, we use variational inference, with normal distributions as approximations of the posterior and prior distributions. We tested the proposed algorithm on the paraphrase classification task, using different types of deep neural networks as candidate models. We also analyzed how pretraining helps to estimate the model parameters. In our experiments we use the SemEval 2015 dataset. 46 Mehdi S. M. Sajjadi Assessing Generative Models via Precision and Recall Recent advances in generative modeling have led to increased interest in the study of statistical divergences as a means of model comparison. Commonly used evaluation methods, such as the Fréchet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they yield one-dimensional scores.
We propose a novel definition of precision and recall for distributions that disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and the Variational Autoencoder. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution. 47 Anirban Sarkar Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision-based problems. However, these deep models are perceived as "black box" methods, given the lack of understanding of their internal functioning. There has been significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose Grad-CAM++, which provides better visual explanations of CNN model predictions than Grad-CAM, both in terms of object localization and in explaining multiple occurrences of objects of the same class in a single image. We provide a mathematical derivation of Grad-CAM++, which uses a weighted combination of the positive partial derivatives of the last convolutional layer's feature maps with respect to a specific class score as weights to generate a visual explanation for the class label under consideration.
Our extensive experiments and evaluations, both subjective and objective, on standard datasets show that Grad-CAM++ indeed provides better visual explanations for a given CNN architecture than Grad-CAM. 48 Lukas Schott Robust Perception through Analysis by Synthesis The intriguing susceptibility of deep neural networks to minimal input perturbations suggests that the gap between human and machine perception is still large. Here we argue that, despite much effort, even on MNIST the most successful defenses are still far from the robustness of human perception. We reconsider MNIST and establish a novel defense inspired by the abundant feedback connections present in the human visual cortex. We suggest that this feedback plays a role in estimating the likelihood of a sensory stimulus with respect to the hidden causes inferred by the cortex, and allows the brain to mute distracting patterns. We implement this analysis-by-synthesis idea in the form of a discriminatively fine-tuned Bayesian classifier using a set of class-conditional variational autoencoders (VAEs). To evaluate model robustness we go to great lengths to find maximally effective adversarial attacks, including decision-based, score-based and gradient-based attacks. The results suggest that this ansatz yields state-of-the-art robustness on MNIST against L0, L2 and L-infinity perturbations, and we demonstrate that most adversarial examples are strongly perturbed towards the perceptual boundary between the original and the adversarial class. 49 Akash Srivastava Ratio Matching MMD Nets: Low dimensional projections for effective deep generative models Deep generative models can learn to generate realistic-looking images on several natural image datasets, but many of the most effective methods are adversarial methods, which require careful balancing of training between a generator network and a discriminator network.
Maximum mean discrepancy networks (MMD-nets) avoid this issue by using the kernel trick, but unfortunately they have not on their own been able to match the performance of adversarial training. We present a new method of training MMD-nets, based on learning a mapping of samples from the data and from the model into a lower-dimensional space in which MMD training can be more effective. We call these networks ratio matching MMD networks (RM-MMDnets). We train the mapping to preserve the density ratio between the data and model densities when moving from the original space to the low-dimensional space. This ensures that matching the model distribution to the data in the low-dimensional space also matches the original distributions. We show that RM-MMDnets have better performance and better stability than recent adversarial methods for training MMD-nets. 50 David Stutz Learning 3D Shape Completion from Laser Scan Data with Weak Supervision 3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, they require full supervision, which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion that requires neither slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks, resulting in efficient shape completion without sacrificing accuracy.
Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach generalizes to other object categories as well. 51 Daniel Heestermans Svendsen Active Emulation of Radiative Transfer Model with Gaussian Processes We introduce a methodology for constructing emulators of multi-output Radiative Transfer Models (RTMs). RTMs simulate the transfer of radiation through a planetary atmosphere and are often costly to run. The proposed methodology sequentially and adaptively selects where to evaluate the RTM, based on the notion of acquisition functions in Bayesian optimization. The Automatic Emulation methodology combines the predictive capabilities of Gaussian Processes with a suitably designed acquisition function that favors sampling in low-density regions and in regions where the interpolating function has high derivatives. We illustrate the promising capabilities of the method by constructing an emulator of a standard leaf-canopy RTM used to simulate Landsat8 spectra. 52 Carlos Villacampa-Calvo Alpha Divergence Minimization in Multi-Class Gaussian Process Classification This paper analyzes the minimization of α-divergences for approximate inference in the context of multi-class Gaussian process classification. For this task, several methods are explored, including memory- and computationally efficient variants of the Power Expectation Propagation algorithm, which allow for efficient training using stochastic gradients and mini-batches. When these methods are used for training, very large datasets with up to several million instances can be considered.
The proposed methods are also very general and can easily interpolate between two popular approaches to approximate inference, Expectation Propagation (EP) (α = 1) and Variational Bayes (VB) (α → 0), simply by varying the α parameter. An exhaustive empirical evaluation analyzes the generalization properties of each of the proposed methods for different values of the α parameter. The results show that one can do better than EP and VB by considering intermediate values of α. 53 David Widmann Evaluation of model calibration in classification Probabilistic classifiers output a probability distribution over target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications it is pivotal for a model to possess an adequate sense of uncertainty, which for probabilistic classifiers translates into outputting probability distributions that accurately match the empirical frequencies of realized outcomes. In this work we explain subtleties in evaluating model calibration and propose different ways to quantify and visualize calibration in probabilistic classification. 54 Joel Zeder Scaling of annual maximum precipitation with changing temperature in Central Europe Previous studies found that changes in extreme daily precipitation show, on average, a positive relationship with temperature close to Clausius-Clapeyron scaling, with strong spatial variability. Due to the inherent scarcity of extreme events and high internal variability, a large number of long station series is required, which makes it challenging to detect a signal at the regional to local scale. Research is often limited by data availability and is thus mostly based on publicly available pre-calculated extreme indices, which do not allow the dependence on event duration to be assessed.
Therefore, new non-linear machine learning analysis methods are also implemented to exploit the available data efficiently. Here we use a new dense network of raw precipitation data series for Central Europe, extending over several countries including the Netherlands, Germany, Switzerland, and Austria, to analyse intensity and frequency changes. Access to raw data allows us to extract extreme precipitation indices for the yearly and seasonal maximum precipitation sum over 1-, 3-, 5-, 7-, and 31-day periods, and the number of days per year exceeding the 95% and 99% precipitation quantiles of 1961-1990, from series covering at least eighty years of the twentieth century. Based on this we assess the spatial and temporal patterns of heavy rainfall intensification over Europe and their dependence on event duration and season. Non-parametric time series regression for the intensity indices and logistic regression for the frequency indices show that a majority of series exhibit an increase since 1900, with the proportion of significantly positive trends well exceeding that of resampled time series. This also holds for almost all seasons, for both the winter and summer half-years, and for almost all sub-regions, with the exception of Austria, where many stations show negative trends. Non-stationary generalised extreme value distributions with temperature-dependent location and scale parameters provide an estimate of the temperature dependence of yearly maximum precipitation and put the results in the context of Clausius-Clapeyron scaling. Overall, we find that the fraction of significantly positive trends is larger than expected from internal variability alone. Spatially, the trends show a clear pattern that extends across country borders.
55 Yi Zheng Phase contrast computed tomography and deep learning The aim of this project is to combine deep learning methods with the X-ray phase contrast imaging technique to develop a high-sensitivity, low-radiation-dose 3D imaging method for breast cancer detection. X-ray phase contrast computed tomography (PCCT) yields higher contrast for soft tissues than conventional computed tomography (CT), and is therefore more sensitive in detecting cancers. For example, a tumor in a piece of breast tissue was invisible in the traditional X-ray image but was identified using the X-ray phase contrast imaging method (Scherer 2015, PLOS ONE). However, the clinical application of this technique is limited by (1) an experimental setup unsuitable for a clinical environment and (2) a high radiation dose. To address the first limitation, we are replacing the synchrotron radiation with a laboratory X-ray source and the fine gold grating with a piece of sandpaper, making it possible to install the system in a clinic. Before developing a full-body scanner, our first step is to modify a bench-top micro-CT scanner and experiment on small excised specimens. Our method requires neither a high-brilliance laboratory liquid-metal-jet source, as in Zanette 2014 (Phys. Rev. Lett.) and Zhou 2015 (Opt. Lett.), nor high-precision stepping motors, as in Wang 2016 (Sci. Rep.). This further relaxation of equipment requirements brings the method one step closer to clinical use. The key to the success of this method is its sensitivity in detecting subpixel displacements. We have improved the current speckle tracking method and achieved a 60% reduction in root-mean-square error. For the second limitation, our solution is to find an algorithm that can faithfully reconstruct the 3D structure of the sample in a low-radiation-dose scenario (which results in a low signal-to-noise ratio).
Recently, a 3D reconstruction framework, automated transform by manifold approximation (AUTOMAP), was developed for undersampled data (Zhu 2018, Nature). It consists of three fully connected layers and two convolutional layers. Although the framework was developed for Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) data, I demonstrated that it can also be applied in a CT setup.
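Entry 55 attributes much of its progress to subpixel displacement detection in speckle tracking, but does not describe the improved tracker. As a generic illustration only, a common baseline estimates the displacement between a reference and a sample speckle window from the peak of their cross-correlation and refines that integer peak with a three-point parabolic fit. The 1D, periodic setting, the FFT-based correlation, and the function name `subpixel_shift` are assumptions of this sketch, not the authors' method.

```python
import numpy as np


def subpixel_shift(reference: np.ndarray, moved: np.ndarray) -> float:
    """Estimate d such that moved[n] ~= reference[n - d] (1D, periodic).

    Integer peak of the circular cross-correlation, refined by fitting a
    parabola through the peak and its two neighbours.
    """
    ref = reference - reference.mean()
    mov = moved - moved.mean()
    n = ref.size
    # c[k] = sum_n ref[n] * mov[n + k] (indices modulo n), computed via FFT
    corr = np.fft.ifft(np.conj(np.fft.fft(ref)) * np.fft.fft(mov)).real
    k = int(np.argmax(corr))
    c_minus, c_zero, c_plus = corr[(k - 1) % n], corr[k], corr[(k + 1) % n]
    curvature = c_minus - 2.0 * c_zero + c_plus
    # Vertex of the parabola through (-1, c_minus), (0, c_zero), (1, c_plus)
    frac = 0.0 if curvature == 0.0 else 0.5 * (c_minus - c_plus) / curvature
    shift = k + frac
    # Peak indices above n/2 correspond to negative displacements
    return shift - n if shift > n / 2 else shift
```

In 2D speckle tracking the same idea is typically applied per analysis window and per axis; window size, pre-filtering, and the interpolation model are choices this sketch leaves open, and they largely determine the achievable subpixel accuracy.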
Second Poster Session: 4/09/2018, 18:30h - 20:30h
 # Presenter Title Abstract 1 Faez Ahmed Adaptively Learning Diversity from Human Feedback A diverse group is often understood as one in which many different kinds of items are represented. Although metrics such as the Shannon index are often used to measure the diversity of items, no single established method exists for measuring diversity as humans perceive it. In this work, we model the diversity of items using pairwise Gaussian processes and learn it from feedback obtained from humans. We use active learning to reduce the number of queries. As humans have an innate understanding of how common items relate, this model captures how the average human reasons about the diversity of groups of items. For learning tasks with set functions, eliciting the desired information from human subjects is tricky. With the aim of measuring their perception of diversity, we compare two methods of eliciting human responses: triplets and pairwise set comparisons. We show that the proposed model successfully captures the diversity of design items and enables applications such as diverse ranking. 2 José Carlos Aradillas Jaramillo Boosting Handwriting Text Recognition in Small Databases with Transfer Learning In this presentation we deal with the offline handwriting text recognition (HTR) problem with reduced training data sets. Recent HTR solutions based on artificial neural networks achieve remarkable results on reference databases. These deep neural networks combine convolutional layers (CNN) and long short-term memory (LSTM) recurrent units. In addition, connectionist temporal classification (CTC) is key to avoiding segmentation at the character level, which greatly facilitates the labeling task. One of the main drawbacks of these CNN-LSTM-CTC (CRNN) solutions is that they need a considerable amount of transcribed text for every type of calligraphy, typically on the order of a few thousand lines.
Moreover, in some scenarios the text to be transcribed is short, e.g., in the Washington database, and the CRNN typically overfits with so few training samples. Our proposal is based on transfer learning (TL) from the parameters learned on a larger database. We first investigate, for a small, fixed number of training samples (350 lines), how learning from a large database (IAM) can be transferred to the CRNN for a small database (Washington). We focus on which layers of the network need not be re-trained. We conclude that the best solution is to re-train all CRNN parameters, initialized to the values obtained by training the CRNN on the larger database. We also investigate results when the training set size is further reduced. For the sake of comparison, we report the character error rate (CER) without any dictionary or language modeling technique. The differences in CER are most pronounced when training with just 350 lines: a CER of 3.3% is achieved with TL, versus 18.2% when training from scratch. As a byproduct, training times are considerably reduced. Similarly good results are obtained on the Parzival database with the same reduced number of lines and the new approach. 3 Oleg Bakhteev Variational deep learning model selection This research is devoted to the deep learning model selection problem. The author proposes applying the variational inference framework to this problem and combining it with neural architecture search methods. The author treats a model as a probabilistic model with priors on the model parameters and the model architecture. The author formulates the problem as a bi-level optimization, in which the parameters and the structure are optimized with different loss functions based on the evidence lower bound (ELBO).
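Entry 2 above reports character error rates (CER) without a dictionary or language model but does not spell out the formula. The common definition, assumed here, is the character-level Levenshtein distance (substitutions, deletions, and insertions) between the recognized text and the reference transcription, divided by the number of reference characters. A minimal sketch; the function names are illustrative.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    that turn `reference` into `hypothesis` (row-by-row dynamic programming)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                            # deletion
                current[j - 1] + 1,                         # insertion
                previous[j - 1] + (ref_char != hyp_char),   # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, the reported 3.3% CER corresponds to about 33 character edits per 1,000 reference characters, against about 182 for the 18.2% obtained when training from scratch.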
4 Michiel Bakker Towards Real-World Domain Adaptation for Text through Prediction Propagation Natural language processing classifiers often have difficulty classifying texts that stem from a different domain than the labeled training data. Many domain adaptation methods have been proposed to train classifiers using only labeled texts from a single domain and unlabeled texts from other domains. Nevertheless, we find that the state-of-the-art methods all lack one or more properties desirable for real-world modeling. In particular, we find that the many methods that use a domain-adversarial loss are unable to model domains with different label distributions. Motivated by these limitations, we propose a new method, Prediction Propagation, that can classify texts from different domains without using an adversarial loss. Our method uses the label prediction to reconstruct the input text and backpropagates through the prediction as a way to learn label-related information for the new domain. A new neighborhood encoding architecture and multi-phase training force the network to use the prediction for reconstruction, thereby propagating useful gradients back to update the word embeddings. Our method has these desirable properties for real-world modeling and obtains state-of-the-art performance. 5 Vuk Batanović Reducing Sample Selection Bias in the Construction of Balanced Sentiment Classification Datasets Random subsampling is often used to generate balanced sentiment classification datasets from a larger, unbalanced collection. However, if the starting collection suffers from any sort of sample selection bias, which is usually the case, random sampling allows the bias to be retained in the balanced dataset. This bias makes classifiers trained on such a dataset less robust, since they learn to rely on incidental patterns that are unrelated to the sentiment of a text.
The evaluation of such classifiers is also affected: cross-validation performance on the biased data tends to be inflated, while performance on independent test data, which does not suffer from the same bias, tends to be diminished. In this poster, we present a dataset balancing algorithm that minimizes the sample selection bias, and we demonstrate its effectiveness on several sentiment analysis datasets. 6 Gabriel Bernardino Statistical shape analysis of cardiac morphology Several cardiac pathologies produce changes in the heart’s morphology. Assessing and understanding those changes is important not only for diagnosis, but also for a better understanding of the underlying pathophysiology and for therapy planning. In clinical research, morphology is assessed using a predefined set of scalar measurements (lengths, volumes, areas, etc.). This approach has one main drawback: the shape is reduced to a set of scalars, and its functional nature is ignored. We propose a simple methodology that works in the space of shapes, extracting shape features with dimensionality reduction and using a GLM to find the most discriminative shape. The method corrects for the effect of confounders such as age, gender, and weight. It has two advantages: 1) it is not biased by the predefined clinical measurements, and 2) the decision boundary can be easily visualized. The framework is applied to a real dataset to find differences in cardiac morphology between a control population and a diseased one. 7 Pierre Berthet Sensitivity of cortical neurons to electrical stimulation using ECoG electrode array High-resolution, non-penetrating devices for direct electric stimulation of the sensory cortex have the potential to become neuroprosthetic devices that can compensate for deficits in sight or hearing.
The effect of extracellular electric stimulation with such devices, however, has so far not been investigated thoroughly, in particular in the context of the variety of neuron types and of the stimulation patterns possible with high-density electrode arrays. In the context of visual neuroprosthetic devices, the ability to selectively stimulate different groups of neurons, potentially creating many different phosphenes (visual impressions), is important. Here, we combine neuronal modeling and electrostatic volume-conductor theory to investigate the effect of electrical stimulation on the generation of neuronal action potentials. In general, successful stimulation depends on properties of the neuron, such as its position, morphology, and membrane properties, as well as on the electrical stimulation pattern, i.e., the geometrical arrangement of the stimulating contacts, the electric pulse amplitudes and temporal forms, etc. To quantify the excitability of the neurons, we first consider the sensitivity, that is, the minimum stimulation current amplitude (threshold current) needed to generate an action potential for a particular neuron and stimulation pattern. We also investigate the selectivity, that is, the dependence of the threshold current on the position of the neuron. Biophysically detailed multicompartment models of cortical neurons, simulated with the NEURON simulation environment [1] and LFPy [2], are used in the simulations. The neurons are assumed to be embedded in an infinite homogeneous, isotropic, and ohmic medium. We compute the electric potentials generated by intracranial electroencephalography (ECoG) electrode arrays and impose these as boundary conditions for the electric potential immediately outside each neuronal compartment. These imposed potentials in turn affect the neuronal dynamics and the generation of action potentials.
We first study stylized morphologies and demonstrate the critical role of their orientation and position relative to the applied electric field, and of the polarity of the stimulation current [3,4]. We further investigate the sensitivity and selectivity of morphologically detailed biophysical models, including models from the Allen Brain Institute and the Blue Brain Project, for various configurations of the electrode arrays. Acknowledgements: Neural Engineering System Design (NESD) program of the Defense Advanced Research Projects Agency (DARPA). References: [1] N. T. Carnevale and M. L. Hines, The NEURON Book. Cambridge: Cambridge University Press, 2006. [2] H. Lindén, E. Hagen, S. Łęski, E. S. Norheim, K. H. Pettersen, and G. T. Einevoll, “LFPy: a tool for biophysical simulation of extracellular potentials generated by detailed model neurons,” Front. Neuroinform., vol. 7, pp. 1–15, 2014. [3] F. Rattay, “Modeling the excitation of fibers under surface electrodes,” IEEE Trans. Biomed. Eng., vol. 35, no. 3, pp. 199–202, 1988. [4] F. Rattay, “The basic mechanism for the electrical stimulation of the nervous system,” Neuroscience, vol. 89, no. 2, pp. 335–346, 1999. 8 Rob Bowman Robust Machine Learning Engineering for Reliable Customer Deployments At Prowler.IO, we are developing an AI platform for decision-making. This poster discusses the implementation of an early-stage product for a last-mile delivery company, in which we predict demand for deliveries across the city from customer data. The product is divided into two components: a data pipeline that transforms data into model-ready features, and a solution component that learns probabilistic models based on these features. This poster examines some of the challenges of building machine learning models in a robust and repeatable way from unclean data for real-world deployment.
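Entry 7 assumes neurons embedded in an infinite homogeneous, isotropic, ohmic medium. Under that assumption, a point current source I produces the textbook potential phi(r) = I / (4 * pi * sigma * r), and several contacts combine by superposition. Along a straight fibre, Rattay's activating function [3,4], proportional to the second spatial difference of that potential, is a quick proxy for where stimulation depolarizes the membrane. This is a simplified sketch, not the NEURON/LFPy pipeline used on the poster: contacts are treated as point sources (real ECoG contacts are extended disks), SI units are assumed, and the default sigma = 0.3 S/m is an assumed, commonly used cortical conductivity.

```python
import numpy as np


def point_source_potential(points, sources, currents, sigma=0.3):
    """Extracellular potential (V) at `points` (m) from point current sources
    at `sources` (m) carrying `currents` (A), in an infinite homogeneous,
    isotropic, ohmic medium of conductivity `sigma` (S/m):
        phi(r) = sum_k I_k / (4 * pi * sigma * |r - r_k|)
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    sources = np.atleast_2d(np.asarray(sources, dtype=float))
    currents = np.asarray(currents, dtype=float)
    # Pairwise distances, shape (n_points, n_sources)
    dist = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return (currents[None, :] / (4.0 * np.pi * sigma * dist)).sum(axis=1)


def activating_function(v_ext):
    """Second spatial difference of the extracellular potential along a
    straight, uniformly discretised fibre (Rattay's activating function up
    to a positive constant). Positive values indicate depolarisation."""
    v_ext = np.asarray(v_ext, dtype=float)
    return v_ext[:-2] - 2.0 * v_ext[1:-1] + v_ext[2:]
```

For a fibre running directly beneath a single cathodic (negative-current) contact, the activating function peaks at the point closest to the contact, which is the kind of orientation and polarity dependence the poster studies in stylized morphologies.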
9 Alejandro Catalina Accelerated Block Coordinate Descent for Sparse Group Lasso In this work we transfer several recent acceleration techniques for the FISTA algorithm for composite convex optimization to the context of problems amenable to Block Coordinate Descent and, in particular, we show how these improvements can be used to speed up the training of the Sparse Group Lasso model. Experiments on several real-life datasets illustrate how the proposed method can outperform the state-of-the-art approach in terms of both the number of inner FISTA and outer block iterations required and, more importantly, the number of function and gradient evaluations needed to achieve a given precision. 10 YU CHEN A Topic Model for Discovery of Activities of Daily Living in a Smart Home We present an unsupervised approach for the discovery of activities of daily living (ADL) in a smart home. Activity discovery is an important enabling technology, for example to meet the healthcare requirements of elderly people in their homes. The technique applied most often is supervised learning, which relies on expensive labelled data and lacks the flexibility to discover unseen activities. Building on ideas from text mining, we present a powerful topic model and a segmentation algorithm that can learn from unlabelled sensor data. The model has been evaluated extensively on datasets collected from real smart homes. The results demonstrate that this approach can successfully discover the activities of residents, and that it can be used effectively in a range of applications such as the detection of abnormal activities and the monitoring of sleep quality, among many others. 11 Hryhorii Chereda Graph-based Convolutional Neural Networks for analyzing pathways in cancer In recent years deep learning has been applied to a wide range of problems in various areas.
Deep learning tools such as convolutional neural networks (CNNs) have been shown to work well in natural language processing and computer vision, especially in image classification tasks. Furthermore, CNNs have been applied to bioinformatics challenges such as patient stratification. Deep learning (including CNNs) is now being extended to non-Euclidean domains such as graph-structured data and manifolds. We plan to map gene-expression data to the vertices of biological pathways and feed this graph-structured data into CNNs in order to classify patients. The usual CNN architecture consists of three types of layers: convolutional layers, pooling layers, and fully connected layers. The first two types exploit the structure of the data to prepare informative features for the fully connected layers. In our work, we consider three popular but different approaches developed for applying CNNs to graph-structured data. Our research aims to compare these approaches in order to address the question of whether graph-based CNNs can provide valuable classification improvements by utilizing prior pathway knowledge. Preliminary results show that using the WNT signaling pathways as prior knowledge does not seem to improve the performance of the classifier in the case of breast cancer patients. Hence, future work will concern other ways of integrating the prior knowledge and the application of different graph CNN approaches. 12 Lovish Chum Using Domain Adaptation for Few Shot Generative Modeling Although few-shot learning is an extensively studied sub-topic in the computer vision literature, it has not been explored in the context of generative modelling. In this ongoing work, we are trying to get a popular generative model (a GAN) to produce plausible new instances even when we have insufficient data (a few instances) for training.
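Entry 11 compares three graph CNN approaches without naming them. For orientation only, the sketch below shows one widely used graph-convolution layer, the first-order propagation rule of Kipf and Welling (2017), H = ReLU(D^(-1/2) (A + I) D^(-1/2) X W); whether it is among the approaches compared on the poster is not stated. Treating pathway genes as nodes and their expression values as node features follows the abstract's description, while the function name, shapes, and dense-matrix implementation are illustrative assumptions.

```python
import numpy as np


def gcn_layer(adjacency, features, weights):
    """One graph-convolution layer in the style of Kipf & Welling (2017):
        H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)
    adjacency: (n, n) symmetric 0/1 matrix, e.g. a pathway graph over genes
    features:  (n, f_in) node features, e.g. expression values per gene
    weights:   (f_in, f_out) learnable parameters
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)                          # >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalisation via broadcasting instead of forming D explicitly
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)
```

Each layer mixes a gene's features with those of its pathway neighbours; a full patient classifier would stack such layers and add graph pooling and fully connected layers, matching the three layer types the abstract lists.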
13 Irene Córdoba Uniform sampling of decomposable Gaussian graphical models We propose a novel Metropolis-Hastings algorithm to sample uniformly from the space of decomposable Gaussian graphical models. The method is based on previous work on uniform sampling of correlation matrices. Our approach is simple and intuitive, based on the interpretation of the Cholesky factorization of the inverse covariance matrix and on Markov chain Monte Carlo theory. We analyze the convergence of the resulting Markov chain both theoretically and empirically. We show in numerical experiments how traditional sampling methods in Gaussian graphical models are biased towards certain regions of the space, whereas our approach explores all of it uniformly. 14 Ruifei Cui Learning the Causal Structure of Copula Models with Latent Variables In psychometrics, sociology, and econometrics, we often come across latent variables that cannot be measured directly, such as attitude, intelligence, and depression. The way to get a grip on such variables is to construct a measurement model by which a latent variable is linked to multiple indicators, e.g., responses to questionnaire items. We aim to learn causal relations among latent variables from observed data on their indicators. 15 César de Pablo Sánchez Uncertainty Modelling in Deep Networks for Short and Noisy Time Series Forecasting Deep Learning is a consolidated, state-of-the-art machine learning tool for fitting a function $y = f(x)$ when provided with large datasets of examples $\{(x_i, y_i)\}$. However, in regression tasks, the straightforward application of Deep Learning models provides only a point estimate of the target, and the model does not take into account the uncertainty of the prediction. This is a serious limitation for tasks where communicating an erroneous prediction carries a risk.
In this paper we tackle the real-world problem of forecasting customers' impending financial expenses and income, while displaying predictable monetary amounts in a mobile app. We augment Deep Learning models with a heteroscedastic model of the variance of the network’s output. Experimentally, we achieve higher accuracy than non-trivial baselines. More importantly, we introduce a mechanism to discard low-confidence predictions so that they are not shown to users. This should help enhance the user experience of our product. 16 David Díaz-Vico Deep Fisher Discriminant Analysis Fisher Discriminant Analysis’ linear nature and the usual eigen-analysis approach to its solution have limited the application of its elegant underlying idea. In this work we take advantage of some recent, partially equivalent formulations based on standard least squares regression to develop a simple Deep Neural Network (DNN) extension of Fisher’s analysis that greatly improves its ability to cluster sample projections around their class means while keeping these apart. This is shown by the much better scores that class-mean classifiers obtain on the features provided by simple DNN architectures than on those provided by Fisher’s linear projections. 17 Julian Fierrez Mobile Touchscreen Biometrics: Password and Continuous Authentication We present a summary of our recent work on mobile touchscreen biometrics for user authentication. We study two different architectures: 1) handwritten passwords for one-time authentication, and 2) swipe gestures conducted over time for continuous authentication. We study both traditional hand-crafted and novel end-to-end deep learning approaches for exploiting the dynamics of touch gestures in such scenarios.
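Entry 15 augments a network with a heteroscedastic model of its output variance and discards low-confidence predictions, without giving the exact loss or discard rule. A common way to do this, assumed here, is to let the network output a mean and a log-variance per target, train it with the Gaussian negative log-likelihood, and hide predictions whose predicted standard deviation exceeds a threshold. The sketch below shows only the loss and the filter in NumPy; the threshold `max_std` and the function names are hypothetical.

```python
import numpy as np


def heteroscedastic_gaussian_nll(y, mean, log_var):
    """Average negative log-likelihood of targets `y` under N(mean, exp(log_var)).
    A network with two outputs per target (mean and log-variance) can be
    trained by minimising this loss; predicting the log-variance keeps the
    variance positive."""
    y, mean, log_var = (np.asarray(a, dtype=float) for a in (y, mean, log_var))
    return 0.5 * np.mean(np.log(2.0 * np.pi) + log_var + (y - mean) ** 2 / np.exp(log_var))


def confident_predictions(mean, log_var, max_std):
    """Keep only predictions whose predicted standard deviation is at most
    `max_std` (hypothetical threshold); returns the kept means and the mask."""
    mean = np.asarray(mean, dtype=float)
    std = np.exp(0.5 * np.asarray(log_var, dtype=float))
    mask = std <= max_std
    return mean[mask], mask
```

For a fixed squared residual r², this loss is minimised at log_var = log(r²), so the variance head learns to report large uncertainty exactly where the mean head tends to be wrong, which is what makes the discard rule meaningful.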
18 Eduardo César Garrido Merchán Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints This work presents PESMOC, Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints, an information-based strategy for the simultaneous optimization of multiple expensive-to-evaluate black-box functions in the presence of several constraints. At each iteration, PESMOC chooses an input location at which to evaluate the objective functions and the constraints so as to maximally reduce the entropy of the Pareto set of the corresponding optimization problem. The constraints considered in PESMOC are assumed to have properties similar to those of the objectives in typical Bayesian optimization problems. That is, they do not have a known expression (which prevents any gradient computation), their evaluation is considered to be very expensive, and the resulting observations may be corrupted by noise. We present strong empirical evidence, in the form of synthetic, benchmark, and real-world experiments, that illustrates the effectiveness of PESMOC. 19 Azin Ghazimatin 20 Estibaliz Gómez de Mariscal Fully automatic exosomes segmentation in Transmission Electron Microscopy images Exosomes are nano-scale, cell-derived extracellular vesicles involved in intercellular communication. Exosome quantification is currently done manually by biologists, and automating it will help them progress considerably in their research. We present the Fully Residual Unet for the segmentation of exosomes in Transmission Electron Microscope images and use properties of the Radon transform to separate clusters. An accuracy above 80% and a processing time of 2 s per 2048×2048-pixel image are achieved. 21 Wenbo Gong Meta Learning for Stochastic Gradient MCMC Existing SGMCMC methods are not tailored to any specific probabilistic model, and their design requires strong physical and mathematical intuition.
Here, we present the first meta-learning algorithm that allows automated design of SGMCMC proposals. Experiments validate the learned sampler, which achieves faster convergence than baseline models. 22 Vivek Gupta Unsupervised Document Vector Representation using Partition Word-Vectors Averaging Many NLP tasks require efficient representations of text documents. Recent work showed that simple weighted averaging of word vectors for sentence representation can outperform complicated seq2seq neural models such as RNNs and LSTMs on textual similarity tasks and supervised text classification. However, prior works ignore the polysemy of words, which becomes problematic when embedding longer pieces of text with multiple sentences. To overcome this, we propose a new embedding technique, P-SIF, a document embedding based on partitioned weighted averaging of word embeddings. P-SIF extends the idea of sentence embeddings based on weighted averaging to partitioned weighted averaging, which is suited for representing longer, multi-sentence documents. We show that P-SIF outperforms weighted averaging across all document lengths, from single sentences to multiple sentences. We also show that, compared with earlier methods, P-SIF works relatively better on long documents than on short ones. Through extensive experiments on numerous similarity tasks and on multi-class and multi-label classification tasks, we show the effectiveness of the representation. 23 Ulrich Hamann Towards thunderstorm nowcasting by applying machine learning to a multi-sensor observation and NWP model database Weather is a highly complex phenomenon that influences our daily life and can even be dangerous and life-threatening.
For example, thunderstorms produce hail, lightning, gale-force wind gusts, tornadoes, heavy rain, flash floods, and landslides, all of which can cause severe damage to property and infrastructure and affect aviation, traffic, and personal health and safety. On this poster, we present our plans to predict storm paths and thunderstorm hazards by means of machine learning (ML) algorithms. Due to the chaotic nature of the atmosphere and the non-trivial process of assimilating high-resolution observations, current Numerical Weather Prediction (NWP) models have difficulty forecasting the exact position and strength of thunderstorms. Hence, short-term predictions based on observations are heavily used by forecasters to issue severe weather warnings. In general, this process relies on the experience and intuition of the forecaster. ML provides an excellent framework, still largely underutilized in this field, to automate and objectify this process by deriving the highly non-linear relation between predictors, such as the most recent radar, satellite, and lightning observations, and target parameters, such as the risk of hail damage or the probability of strong wind gusts. Within this project, several methods, such as decision trees, random forests, and artificial neural networks with different configurations, are scheduled for trial. As storm dynamics are determined by processes at different scales, in particular topographic lifting in mountainous areas, convolutional techniques will be applied. This high-dimensional, multivariate problem, with an abundance of possible predictors such as macro- and micro-physical properties of the atmosphere, requires predictor importance ranking and/or dimensionality reduction. As data archives covering several years are usually available for training, good scalability of the applied ML techniques is essential to benefit from the abundant input data.
The algorithm to be developed should also be suitable for supercomputing facilities and capable of using GPUs. 24 Israel Herraiz Reinforcement Learning for Fair Dynamic Pricing Unfair pricing policies have been shown to cause some of the most negative perceptions customers can have concerning pricing, and may result in long-term losses for a company. Although dynamic pricing models help companies maximize revenue, fairness and equality should be taken into account in order to avoid unfair price differences between groups of customers. This paper shows how to solve dynamic pricing using Reinforcement Learning (RL) techniques so that prices are set to maximize revenue while keeping a balance between revenue and fairness. We demonstrate that RL provides two main features to support fairness in dynamic pricing: on the one hand, RL is able to learn from recent experience, adapting the pricing policy to complex market environments; on the other hand, it provides a trade-off between short- and long-term objectives, hence integrating fairness into the model's core. Considering these two features, we propose the application of RL for revenue optimization, with the additional integration of fairness as part of the learning procedure by using Jain's index as a metric. Results in a simulated environment show a significant improvement in fairness while maintaining revenue optimization. 25 Riikka Huusari Multi-View Metric Learning in Vector-Valued Kernel Spaces We consider the problem of metric learning for multi-view data and present a novel method for learning within-view as well as between-view metrics in vector-valued kernel spaces, as a way to capture the multi-modal structure of the data. We formulate two convex optimization problems to jointly learn the metric and the classifier or regressor in kernel feature spaces. An iterative three-step multi-view metric learning algorithm is derived from the optimization problems.
In order to scale the computation to large training sets, a block-wise Nyström approximation of the multi-view kernel matrix is introduced. We justify our approach theoretically and experimentally, and show its performance on real-world datasets against relevant state-of-the-art methods. 26 Abdullah-Al-Zubaer Imran PDV-Net: Reliable, Fast, and Automatic Segmentation of Pulmonary Lobes Reliable and automatic segmentation of lung lobes is important for the diagnosis, assessment, and quantification of pulmonary diseases. Existing techniques are prohibitively slow, undesirably rely on prior (airway/vessel) segmentation, and/or require user interaction for optimal results. This work presents a reliable, fast, and fully automated lung lobe segmentation method based on a progressive dense V-network (PDV-Net). The proposed method can segment lung lobes in one forward pass of the network, with an average runtime of 2 seconds on a single Nvidia Titan XP GPU, eliminating the need for any prior atlases, lung segmentation, or subsequent user intervention. We evaluated our model using 84 chest CT scans from the LIDC and 154 pathological cases from the LTRC datasets. Our model achieved a Dice score of 0.939 ± 0.02 on the LIDC test set and 0.950 ± 0.01 on the LTRC test set, significantly outperforming a 2D U-net model and a 3D dense V-net. We further evaluated our model on 55 cases from the LOLA11 challenge, obtaining an average Dice score of 0.935, a performance level competitive with the best-performing model, which has an average score of 0.938. Our extensive robustness analyses also demonstrate that our model can reliably segment both healthy and pathological lung lobes in CT scans from different vendors, and that it is robust to different CT scan reconstruction configurations. 27 Rani Izsak Synergy: Mathematical modelling and ML applications Synergy is inherent to various algorithmic problems.
For example, in allocation problems for goods, a phone and one of its accessories can have a joint value that is much higher than the sum of their individual values. Unfortunately, instances that admit synergy tend to be computationally hard to tackle. We cope with this hardness by parametrizing our inputs with a complexity measure that quantifies the amount of synergy: the supermodular degree. We then design ML-related algorithms whose solution quality depends on this measure. 28 Matthew Jagielski Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We extensively evaluate our attacks and defenses on three realistic datasets from the health care, loan assessment, and real estate domains. 29 David Jiménez-Cabello Learning to Learn Face Spoofing Attacks In the current context of digital transformation, the increasing use of mobile devices for accessing online services highlights the importance of providing a secure digital ecosystem.
It is precisely at this crossroads that mobile biometric technologies, and especially face recognition, emerge as a secure and convenient approach. However, this scenario also brings specific threats that need to be addressed, where presentation attack detection (PAD) is one of the most challenging problems. Although much effort has been devoted to research on anti-spoofing techniques over the past few years, many challenges remain when implementing these systems in real use cases. In this work we present two Deep Learning-based approaches that aim to detect face spoofing attempts in two of the most realistic scenarios: i) weakly collaborative and ii) non-collaborative face-PAD systems. We also review some prospective research directions and current limitations based on our experience bringing this technology to market. 30 Sai Praneeth Reddy Karimireddy Adaptive balancing of gradient and update computation times using global geometry and approximate subproblems First-order optimization methods comprise two important primitives: i) the computation of gradient information and ii) the computation of the update that leads to the next iterate. In practice there is often a wide mismatch between the times required for the two steps, leading to underutilization of resources. In this work, we propose a new framework, Approx Composite Minimization (ACM), that uses approximate update steps to ensure balance between the two operations. The accuracy is chosen adaptively in an online fashion to take advantage of changing conditions. Our unified analysis for approximate composite minimization generalizes and extends previous work to new settings. Numerical experiments on Lasso regression and SVMs demonstrate the effectiveness of the novel scheme.
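Entry 24 above uses Jain's index as its fairness metric. The index is J(x) = (sum of x)^2 / (n * sum of x^2) for non-negative values x_1..x_n: it equals 1 when all values are equal and 1/n when a single entity receives everything. The abstract does not say which quantity the index is applied to or how it enters the RL reward; the sketch below assumes, purely for illustration, that it is computed over the average price offered to each customer group and added to revenue with a tunable weight. `fairness_weighted_reward` is a hypothetical helper, not the authors' formulation.

```python
def jains_index(values):
    """Jain's fairness index J = (sum x)^2 / (n * sum x^2) for non-negative x.
    J = 1 when all values are equal; J = 1/n when one entity gets everything."""
    xs = [float(v) for v in values]
    if not xs or any(v < 0 for v in xs):
        raise ValueError("need a non-empty sequence of non-negative values")
    sum_sq = sum(v * v for v in xs)
    if sum_sq == 0.0:
        return 1.0  # all zeros: treated as perfectly equal (a convention)
    return sum(xs) ** 2 / (len(xs) * sum_sq)


def fairness_weighted_reward(revenue, group_prices, weight=1.0):
    """Hypothetical reward shaping: revenue plus a weighted Jain's index of
    the average price offered to each customer group."""
    return revenue + weight * jains_index(group_prices)
```

Because revenue and the index (which lies in [1/n, 1]) live on very different scales, the weight would need tuning to express the desired revenue-fairness trade-off.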
31 Timo Klock Inference and estimation for nonlinear single index models Single Index Models are simple but widely used models in machine learning, where the response variable is modeled as a monotonic function of a linear combination of features, thereby allowing for dimension reduction. In the nonlinear version we allow the linear combination of features to vary smoothly between different regimes of the response variable, which adds model flexibility. Equivalently, the model assumes the unknown function of interest can be written as the composition of a function defined on a smooth curve and the orthogonal projection onto that curve. In this poster we introduce the nonlinear single index model and present a fast, intuitive algorithm for efficiently estimating the important directions (or, equivalently, the tangent field of the unknown curve) and the function of interest. By leveraging the structural assumption, our proposed estimator achieves learning rates that are independent of the ambient dimension and thus overcomes the curse of dimensionality. We present theoretical results and simulation studies to support our claims. 32 Nikola Konstantinov Kernel Dependence Measures for Unsupervised Learning In this poster presentation I will discuss my master's thesis, which I did under the supervision of Professor Dino Sejdinovic at Oxford University. The aim of this project is to explore kernel methods in machine learning and their applications to clustering and learning taxonomies. We focus on the Hilbert-Schmidt Independence Criterion (HSIC), a popular kernel dependence measure, and discuss how it has previously been applied to unsupervised learning. We consider the CLUHSIC and numerical taxonomy clustering algorithms. These existing methods come with a high computational cost, which makes them impractical for large data sets.
Therefore, the project discusses some large-scale approximations of the HSIC that have recently been proposed in the literature and how they can be used to speed up the clustering algorithms. We also propose using a Gini impurity index to prevent CLUHSIC from converging to partitions with fewer clusters than the pre-defined number. We apply CLUHSIC together with its proposed modifications to synthetic data and compare their performance. The methodology lends itself to constructing a cluster structure that conforms to a taxonomy specified by the user. We investigate this on a dataset comprising a corpus of NIPS papers. 33 Evgeniya Korneva MERCS: Multi-Directional Ensembles of Classification and Regression treeS In practice, data analysis often happens in two steps. First, a target-specific model is learned. Then, this model is used to perform inference. However, in many cases a user is interested in performing various inference tasks on the same data. Moreover, not all of these tasks may be known in advance. Therefore, it appears beneficial to build a single multi-directional model that can then be used for any prediction task. We develop such a model based on an ensemble of decision trees and present the main research questions associated with its induction and inference. 34 Vitaly Kurin Challenges of Learning from Demonstration in the Real World Learning from Demonstration is a powerful framework that has already shown great potential in clean lab conditions. However, its applications in the real world are still scarce. In our poster, we will identify the most important challenges a practitioner might face and present the existing body of work that can be used to tackle these problems. 35 JIA Linlin Graph Kernels based on Linear Patterns: Theoretical and Experimental Comparisons Graph kernels are a powerful tool to bridge the gap between machine learning and data encoded as graphs.
Most graph kernels are based on a decomposition of graphs into a set of patterns. The similarity between graphs is then deduced from the similarity of the corresponding patterns. Among the different possible sets of patterns, kernels based on linear patterns often offer a good trade-off between computation time and accuracy. In this work, we present a thorough study and comparison of the existing graph kernels based on different linear patterns, namely walks and paths, which gives a clear picture of the pros and cons of the different kernels. First, all graph kernels are studied in detail, including their mathematical foundations, pattern structures and time complexity. Relationships among these kernels are studied with respect to their development history and mathematical representations. Then, experiments are performed on various datasets exhibiting different kinds of graphs, including labeled and unlabeled graphs, graphs with different numbers of nodes, graphs with different average degrees, cyclic and acyclic graphs, and planar and non-planar graphs. Finally, the performance and time complexity of the kernels are compared and analyzed on these graphs, and guidelines are proposed for choosing kernels according to the type of graph data. An open-source Python library containing an implementation of all discussed kernels is publicly available on GitHub to promote and facilitate the use of graph kernels in machine learning problems. 36 Manuel Lopez-Martin Application of Machine Learning to prediction problems in data networking Application of deep learning techniques to prediction problems in network traffic classification, intrusion detection, and estimation of users' quality of experience for video transmission.
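As an illustration of the walk-based kernels compared in poster 35, the sketch below counts the walks of a fixed length that two unlabeled graphs have in common, using the adjacency matrix of their direct product graph. This is a simplified, hypothetical example: practical walk kernels also match vertex and edge labels and usually sum over walk lengths with decaying weights, and none of this code is taken from the authors' library.

```python
def product_graph(A, B):
    """Adjacency matrix of the direct product graph: vertex (a, b) is indexed
    as a * len(B) + b, and (a, b) ~ (c, d) iff a ~ c in A and b ~ d in B."""
    n, m = len(A), len(B)
    return [[A[i // m][k // m] * B[i % m][k % m] for k in range(n * m)]
            for i in range(n * m)]

def common_walk_kernel(A, B, length):
    """Number of common walks with exactly `length` edges in two unlabeled
    graphs, computed as 1^T W^length 1 for the product-graph adjacency W."""
    W = product_graph(A, B)
    v = [1] * len(W)
    for _ in range(length):
        # One matrix-vector product per walk step: v <- W v
        v = [sum(W[i][k] * v[k] for k in range(len(W))) for i in range(len(W))]
    return sum(v)

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
edge = [[0, 1], [1, 0]]
print(common_walk_kernel(triangle, triangle, 2))  # -> 144
print(common_walk_kernel(edge, triangle, 1))      # -> 12
```

For unlabeled graphs this count factorizes into the product of the two graphs' own walk counts (the triangle has 12 walks of length 2, and 12 * 12 = 144), which is a convenient sanity check.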
37 Mathurin Massias Celer: a Fast Solver for the Lasso with Dual Extrapolation Convex sparsity-inducing regularizations are ubiquitous in high-dimensional machine learning, but solving the resulting optimization problems can be slow. To accelerate solvers, state-of-the-art approaches consist of reducing the size of the optimization problem at hand. In the context of regression, this can be achieved either by discarding irrelevant features (screening techniques) or by prioritizing features likely to be included in the support of the solution (working set techniques). Duality comes into play at several steps in these techniques. Here, we propose an extrapolation technique starting from a sequence of iterates in the dual that leads to the construction of improved dual points. This enables tighter control of optimality, as used in stopping criteria, as well as better screening performance of Gap Safe rules. Finally, we propose a working set strategy based on an aggressive use of Gap Safe screening rules. Thanks to our new dual point construction, we show significant computational speedups on multiple real-world problems. 38 Gonzalo Mateo-García Transfer learning with Convolutional Neural Networks for Cloud Detection in Satellite Imagery Accurate and automatic detection of clouds in optical satellite images is a key issue for a wide range of Earth observation applications. Without accurate cloud masking, undetected clouds are one of the most significant sources of error in sea and land biophysical parameter retrieval and in climate studies. Cloud masking is a semantic segmentation problem in which a cloud flag must be provided for each pixel. Convolutional neural networks (CNNs) have shown excellent performance on this problem, provided enough labeled data are available. However, simultaneous collocated information about the presence of clouds within an image is usually not available or requires a great amount of manual labor.
In this work, we propose to learn from the available Landsat-8 satellite cloud mask datasets and transfer this learning to solve cloud detection problems in new satellites such as the Proba-V vegetation monitoring satellite. The developed models outperform the current operational cloud detection algorithm without being trained on any real Proba-V image. Moreover, cloud detection accuracy can be further increased if the CNNs are fine-tuned using a limited amount of supervised data. 39 Swapneel Mehta DeepJet: A Portable Machine Learning Environment for Physics The DeepJet Framework is used to adapt cutting-edge deep learning practices to supervised learning problems in particle physics. Originally envisaged as a set of scripts to support jet-flavor tagging and classification, it has grown to encompass a range of use cases, turning into a multi-purpose tool for physics analysis at the CERN CMS Experiment. The framework offers a range of features: simple out-of-memory training with a multi-threaded approach that maximally exploits hardware acceleration, and simple, streamlined I/O to help with bookkeeping of developments. The DeepJet environment is compatible with the CMS Software Repository. It is installable as a Python package to simplify deployment across multiple systems. 40 Manuel Molano-Mazon Synthesizing realistic neural population activity patterns using Generative Adversarial Networks The ability to model and synthesize realistic patterns of neural activity is crucial for studying neural information processing. Here we used the Generative Adversarial Network (GAN) framework to simulate the concerted activity of a population of neurons. We adapted the Wasserstein-GAN variant to facilitate the generation of unconstrained neural population activity patterns while still benefiting from parameter sharing in the temporal domain.
We demonstrate that our proposed GAN, which we termed Spike-GAN, generates spike trains that accurately match the first- and second-order statistics of datasets of tens of neurons and also approximate their higher-order statistics well. We apply Spike-GAN to a real dataset recorded from salamander retina and show that it performs as well as state-of-the-art approaches based on the maximum entropy and the dichotomized Gaussian frameworks. Importantly, Spike-GAN does not require specifying a priori the statistics to be matched by the model, so it constitutes a more flexible method than these alternative approaches. Finally, we show how to exploit a trained Spike-GAN to construct importance maps that detect the most relevant statistical structures present in a spike train. Spike-GAN provides a powerful, easy-to-use technique for generating realistic spiking neural activity and for describing the most relevant features of the large-scale neural population recordings studied in modern systems neuroscience. 41 Pablo Moreno-Muñoz Heterogeneous Multi-task Gaussian Process Learning Multi-output Gaussian Processes (MOGPs) generalize the powerful Gaussian Process (GP) predictive model to the vector-valued random field setup. It has been experimentally shown that by exploiting correlations between multiple outputs across the input space, it is possible to improve predictions. This framework has typically been applied to datasets where outputs belong to the same statistical type, e.g., all tasks are regression tasks or all are classification tasks. We present a novel extension of MOGPs for handling heterogeneous outputs. Assuming that each output has its own likelihood function whose parameters are modeled by correlated latent functions, we are able to introduce a MOGP prior with a linear model of coregionalization (LMC) covariance function. We also demonstrate that it is possible to obtain tractable variational bounds for arbitrary combinations of statistical data types in the output domain.
Additionally, we make our model scalable by developing a stochastic variational inference scheme. We demonstrate the utility of our approach and its performance on a large real-world data set for demographic applications. 42 Jeppe Nørregaard Learning a Distance Measure for Discrete Sequences We are applying classical dynamic programming techniques to create a probabilistically motivated distance measure between discrete sequences. This can be used to model errors in sequences (e.g., DNA mutations or misspellings in NLP) and as an alternative type of neuron for sequences. 43 Francisca Oladipo Unique Named Entity Recognition with Specific Application to the Hausa Language This work presents a custom Hausa-based entity recognition model built on family trees and family names using supervised learning on a labelled corpus. The objective of the research is to deploy Named Entity Recognition (NER) techniques to identify names in the Hausa language, with specific application to persons who have been internally displaced due to the Boko Haram insurgency in Northern Nigeria, and with the ultimate aim of reuniting families. Standard NER models were examined, but none could be accurately trained to recognize Hausa names, due to their unique conventions and the fact that those models were originally trained on English names. The Hausa version of the Facebook API could not provide the required dataset for automatic gazetting, as the target entities do not have much presence on Facebook. The data therefore come from publicly available datasets on refugees and victims of the Boko Haram insurgency, as well as social media data mined from posts by Hausa users. 44 Arghya Pal Neural networks attributions - a causal perspective In this work, we propose a novel attribution method that provides local explanations of a neural decision-making process along with a global picture.
We suggest improvements to efficiently calculate interventional expectations, instead of performing brute-force interventions on the system every time. For time-series models, we obtain a closed-form solution for the optimal time-lag dependence of the output of a recurrent network on its past. We argue that all current attribution methods are biased and highly local. In contrast, through marginalization, our method is robust to spurious biases induced by other input values and allows a more global understanding of the deep network, which prior attribution-based methods lack. 45 Amandalynne Paullada Multi-label element extraction for evidence-based medicine We present an LSTM-based multi-label sentence classifier as a tool to assist in the automation of systematic reviews for evidence-based medicine. Our model incorporates domain-specific word embeddings relevant to the biomedical information extraction task. We compare binary relevance and multi-label classifier settings on an expert-annotated corpus of abstracts, finding that learning labels jointly improves performance. 46 Florian Pfisterer Learning Multiple Defaults for Machine Learning Algorithms A frequently encountered problem in machine learning is the selection of an appropriate parameter configuration for an algorithm on a given dataset. One simple way of selecting a configuration is to use default settings, often proposed along with the publication of a new algorithm. These default values are usually chosen based on theoretical considerations or because they work well enough across a wide variety of data situations. Alternative methods, such as random search and Bayesian optimisation, have been proposed. These usually improve performance, but add complexity and computational cost. We propose learning not just one but multiple default configurations for an algorithm from a large data set of benchmark experiment results for different hyperparameter configurations on different data sets.
The best configuration can then be selected using a simple search over n (for example, 10) predetermined default configurations. This allows for a more robust selection of configurations while keeping the computational cost low and without introducing additional dependencies. 47 Cristina Pinneri Systematic self-exploration of behaviors for robots in a dynamical systems framework One of the challenges of this century is to understand the neural mechanisms behind cognitive control and learning. Recent investigations propose biologically plausible synaptic mechanisms for self-organizing controllers, in the spirit of Hebbian learning. In particular, differential extrinsic plasticity (DEP) has been shown to enable embodied agents to self-organize their individual sensorimotor development and to generate highly coordinated behaviors during their interaction with the environment. These behaviors are attractors of a dynamical system. In this paper, we use the DEP rule to generate attractors and combine it with a “repelling potential” that allows the system to actively explore all its attractor behaviors in a systematic way. With a view to a self-determined exploration of goal-free behaviors, our framework enables switching between different motion patterns in an autonomous and sequential fashion. Our algorithm is able to recover all the attractor behaviors in a toy system, and it is also effective in two simulated environments: a spherical robot discovers all its major rolling modes, and a hexapod robot learns to locomote in 50 different ways in 30 min. 48 Carlos Ramos Recursive Maxima Hunting Recursive Maxima Hunting (RMH) is a dimensionality reduction method for functional data, arising in the context of binary classification. As functional data have an infinite number of variables, it is highly desirable to reduce the dimensionality of the data to a small number of variables. This also makes it possible to apply multivariate classification techniques.
As RMH is a feature selection method, its application also improves the interpretability of the data. The method assumes that the trajectories sampled from the two classes are homoscedastic, that is, that they have a common covariance function. It first selects the point that maximizes a dependence measure with the class, such as distance covariance. It then uses the covariance function of the trajectories (assumed to be the covariance function of a Gaussian process) to remove the information carried by the selected point by conditioning on it. This approach uncovers new points that are relevant once the first ones have been chosen. We have proved that, under some conditions, this procedure finds the points that appear in the Bayes rule of the original classification problem. We have also tested the method on real data, not necessarily conforming to the original hypotheses, and observed that the performance of a classifier after applying this method is comparable to, and often better than, that obtained with other dimensionality reduction methods used with functional data. 49 Sridhar Rao Next-Generation Network Testing: Opportunities and Challenges This poster showcases the use of machine learning for next-generation network testing. It highlights both the opportunities and the challenges of using machine learning in network testing, specifically with protocol emulators and packet generators. 50 Laura Rieger Increasing the robustness of explanation techniques for deep learning To obtain a measure of robustness for explanation methods for neural networks, we fine-tune the dense layers of a network on different subsets of the training data. Computing the mean and standard deviation of the resulting explanations, we find empirically that the results agree more often with the ground truth for the relevant area and show appropriately high uncertainty for areas of dubious importance.
Given the promising results of this naïve approach, further work on uncertainty for explanation techniques appears worthwhile. 51 Hajer SALEM A Self-Adaptive and Semi-Supervised Learning Approach for Online Power Disaggregation New smart meters are being installed in households around the world. Their main aim is to show in real time which appliances are consuming energy and how much they consume. Utility companies are investing in these sensors, in part due to their continued requirements for managing their energy resources. Also, providing consumers with detailed per-appliance consumption seems to affect their consumption behavior and reduce the total energy consumed. Non-Intrusive Load Monitoring (NILM) approaches are a core topic of this technology. NILM approaches combine the information provided by smart meters with machine learning algorithms and techniques in order to identify the consumption of single appliances from the total load. In this work, we address the problem of identifying appliances' states and consumption from the household's total power load. We propose an advanced semi-supervised approach for NILM that performs online training and model creation. The approach is semi-supervised since only publicly available prior information related to appliances' consumption is used. Furthermore, the proposed approach succeeds in detecting, online, training windows in which multi-state appliances in a household are operating. The approach takes advantage of additional contextual features. An online-training Expectation Maximization algorithm is proposed for NILM to adjust appliance model parameters. Moreover, a modified Viterbi algorithm that filters out unlikely observations based on appliance consumption and contextual features is proposed. We evaluate our work on a publicly available framework and the ECO data set and show the performance of our online training approach compared to offline and supervised approaches.
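Poster 51 builds on the Viterbi algorithm for decoding appliance states. As background, here is a minimal sketch of the standard Viterbi recursion for a hidden Markov model with Gaussian emissions, applied to a single hypothetical on/off appliance. It is not the authors' modified version (which additionally filters out unlikely observations using consumption and contextual features), and the power levels, standard deviations and transition probabilities are made-up values for illustration.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)

def viterbi(obs, log_start, log_trans, means, stds):
    """Most likely hidden-state sequence of an HMM with Gaussian emissions.

    obs: power readings; log_start[s], log_trans[p][s]: log-probabilities;
    means[s], stds[s]: emission parameters for state s.
    """
    n_states = len(means)
    # delta[s]: log-probability of the best state path that ends in state s
    delta = [log_start[s] + gaussian_logpdf(obs[0], means[s], stds[s])
             for s in range(n_states)]
    backptr = []  # backptr[t][s]: best predecessor of state s at step t + 1
    for x in obs[1:]:
        new_delta, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + log_trans[best_prev][s]
                             + gaussian_logpdf(x, means[s], stds[s]))
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the most probable final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical appliance: state 0 = off (~5 W), state 1 = on (~100 W).
readings = [5.0, 6.0, 98.0, 102.0, 101.0, 4.0]
stay, switch = math.log(0.9), math.log(0.1)
states = viterbi(readings, [math.log(0.5)] * 2, [[stay, switch], [switch, stay]],
                 means=[5.0, 100.0], stds=[5.0, 10.0])
print(states)  # -> [0, 0, 1, 1, 1, 0]
```

Working in log-probabilities avoids numerical underflow on long meter traces; the sticky transition matrix is what discourages spurious on/off switching between consecutive readings.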
52 Philip Schmidt Pushing the boundaries of wearable-based affect recognition Affect recognition aims to detect a person’s affective state based on observations, with the goal of, e.g., providing reasoning for decision making or supporting mental wellbeing. Besides approaches based on audio/visual data or text, solutions relying on wearable sensors, which mainly record motion and physiological parameters, have recently received increasing attention. Due to their rich functionality and form factor, wearables offer an ideal platform for long-term affect recognition applications. During a laboratory study, we recorded wearable sensor data from 15 persons, each exposed to three different affective stimuli (positive, neutral, and negative). This poster presents a first classical evaluation distinguishing these affective states, using well-known features and standard machine learning methods (e.g., Random Forest, AdaBoost, LDA). Considering recent advances in deep learning methods for time series classification (e.g., speech recognition, stock forecasting), we aim to apply these methods to wearable-based affect recognition. Given large inter-person differences and expensive label acquisition, personalisation and semi-supervised techniques are particularly interesting research directions. The aim of the poster is to facilitate a discussion about different machine learning approaches pushing the boundaries of wearable-based affective computing. 53 Jonathan Schwarz Neural Processes A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible; however, they are also computationally intensive and thus limited in their applicability.
We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature. 54 Alireza Shafaei How Can We Know That We Know? A Principled Benchmark for Image Recognition Tasks We point out that traditional learning through empirical risk minimization imposes a necessary condition for knowledge within deep neural networks: does the given sample belong to the training distribution? It is desirable to answer this question to prevent unpredictable behaviour in deployed systems. A recent surge of interest in this problem has led to the development of sophisticated techniques in the deep learning literature. However, due to the absence of a standardized problem formulation or an exhaustive evaluation, it is not evident whether we can rely on these methods in practice. What makes this problem different from a typical supervised learning setting is that we do not have access to out-of-distribution samples in practice. Therefore, the classical approaches can yield misleading results. We propose a three-dataset evaluation scheme as a practical and more reliable strategy to assess progress on this problem. We present an exhaustive comparison of the existing methods from related areas on image classification problems. We demonstrate that simple data mining techniques can outperform recently developed methods for detecting out-of-distribution samples.
Furthermore, we show that for realistic high-dimensional image applications, the existing methods are only slightly better than random predictions. Our analysis reveals areas of strength and weakness of each method and outlines a roadmap for future work. 55 Siddharth Srivastava Features for 3D Point Clouds Finding strong descriptors for 3D point clouds is a challenging and interesting problem. As sensors for capturing 3D data become more accessible, there is a pressing need to develop descriptors that are not only robust but also efficient and compact. Moreover, deep learning networks have begun providing good results on various types of 3D modalities. However, there is still a gap in the ability of descriptors to capture fine details of the models and simultaneously represent them in a compact manner. Therefore, we present a detailed comparative study and discuss novel 3D descriptors. We also discuss their application to practical problems. 56 Stefan Stark Embedding the ICU with Multivariate Disease Trajectory Maps A major challenge for clinicians working in the Intensive Care Unit (ICU) is to monitor the status of all current patients. A clinician might, for example, spend the first hour of a shift reviewing vital signs for typically 50 patients, searching for onsets of common ailments such as sepsis, organ failure, and respiratory ailments. In this work we aim to embed the multivariate time series of each patient’s vital signs into a shared latent space to help the clinician navigate the information overload. We extend a probabilistic model of univariate time series data [1] to the multivariate case. Here we augment a GPLVM [2], which learns low-dimensional representations of data by placing a GP prior on a mapping from the latent space, with a multivariate GP prior on the space of functions. By using the GP framework, we are able to quantify uncertainty as well as generate new trajectories, helping clinicians interpret the latent space. 1. Schulam, P., & Arora, R. (2016). Disease trajectory maps. In Advances in Neural Information Processing Systems (pp. 4709-4717). 2. Lawrence, N. D. (2004). Gaussian process latent variable models for visualisation of high dimensional data. In Advances in Neural Information Processing Systems (pp. 329-336). 57 Lizhe Sun Online Regression with Feature Selection in Stochastic Data Streams Online learning algorithms have a wide variety of applications in large-scale machine learning problems because they let users assess the performance of current models trained on existing data and rapidly update the models when the data change. However, standard online learning methods still suffer from issues such as slow convergence and a limited capability to select features or to recover the true features. In this paper, we present a novel framework for online learning based on running averages and introduce a series of online versions of popular existing offline algorithms such as Elastic Net, Minimax Concave Penalty and Feature Selection with Annealing. We prove the equivalence between our online methods and their offline counterparts and give theoretical feature selection and convergence guarantees for some of them. In contrast to existing online methods, the proposed methods can extract models with any desired sparsity level at any time. Numerical experiments indicate that our new methods enjoy high feature selection accuracy and a fast convergence rate compared with standard stochastic algorithms and offline learning algorithms. We also present some applications to large datasets, where again the proposed framework shows competitive results compared to popular online and offline algorithms. 58 Konstantinos Tsirlis On Scoring Maximal Ancestral Graphs with the Max-Min Hill Climbing Algorithm We consider the problem of causal structure learning in the presence of latent confounders.
We propose a hybrid method, MAG Max-Min Hill-Climbing (M^3HC), that takes as input a data set of continuous variables, assumed to follow a multivariate Gaussian distribution, and outputs the best-fitting maximal ancestral graph. M^3HC builds upon a previously proposed method, namely GSMAG, by introducing a constraint-based first phase that greatly reduces the space of structures to investigate. In large-scale experiments we show that the proposed algorithm greatly improves on GSMAG in all comparisons, and over a set of known networks from the literature it compares positively against FCI and cFCI as well as competitively against GFCI, three well-known constraint-based approaches for causal-network reconstruction in the presence of latent confounders. 59 Thijs Vogels Denoising Monte Carlo Renderings with Kernel-Predicting Convolutional Networks Physically based light transport simulations, used, for example, in the animated film industry, are based on recursive Monte Carlo integration. The brightness of a pixel is determined by averaging contributions from many randomly sampled light paths. This estimator converges slowly with the number of samples, making the procedure very expensive. Kernel-Predicting Convolutional Networks (KPCNs) aim to reduce the variance of the per-pixel estimates by leveraging spatial correlations present in natural-looking images. We propose a machine learning solution that learns to 'denoise' an unconverged, noisy rendering from a large dataset of (noisy image, converged image) pairs. The KPCN makes use of easily obtained by-products of the rendering pipeline, such as surface normals and texture information, to better distinguish signal from noise. The kernel-predicting reconstruction can be trained an order of magnitude faster than a naive CNN and improves on the state of the art in Monte Carlo denoising.
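The kernel-predicting step of poster 59 can be sketched as follows: for each pixel, the network outputs one score per neighbor in a (2r+1) x (2r+1) window, the scores are normalized with a softmax, and the denoised value is the resulting weighted average of the noisy neighborhood. In this minimal sketch the scores are passed in directly instead of being predicted by a CNN, the image is single-channel, and image borders are handled by clamping coordinates; these choices and all function names are assumptions for illustration, not the authors' implementation.

```python
import math

def apply_predicted_kernels(noisy, logits, radius):
    """Reconstruct each pixel as a softmax-weighted average of its neighborhood.

    noisy:  H x W list of floats (one channel of a noisy rendering)
    logits: H x W list of per-pixel score lists of length (2 * radius + 1) ** 2,
            ordered row by row over the window (in a KPCN these come from the network)
    """
    h, w = len(noisy), len(noisy[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            scores = logits[y][x]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
            total = sum(exps)
            acc, k = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)   # clamp at image borders
                    xx = min(max(x + dx, 0), w - 1)
                    acc += exps[k] / total * noisy[yy][xx]
                    k += 1
            out[y][x] = acc
    return out

# Uniform scores reduce to a 3 x 3 box filter.
img = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
uniform = [[[0.0] * 9 for _ in range(3)] for _ in range(3)]
print(round(apply_predicted_kernels(img, uniform, 1)[1][1], 6))  # -> 1.0
```

Because the softmax weights are non-negative and sum to one, every output pixel is a convex combination of its noisy neighbors, so the reconstruction can never leave the range of values present in the local window.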
60 Alami Mejjati Youssef Multi-Task Learning by Maximizing Statistical Dependence We present a new multi-task learning (MTL) approach that can be applied to multiple heterogeneous task estimators. Our motivation is that the best task estimator could change depending on the task itself. For example, we may have a deep neural network for the first task and a Gaussian process for the second task. Classical MTL approaches cannot handle this case, as they require the same model or even the same parameter types for all tasks. We tackle this by considering task-specific estimators as random variables. Then, the task relationships are discovered by measuring the statistical dependence between each pair of random variables. By doing so, our model is independent of the parametric nature of each task and is even agnostic to the existence of such a parametric formulation. We compare our algorithm with existing MTL approaches on challenging real-world ranking and regression datasets and show that our approach achieves comparable or better performance without knowing the parametric form. 61 Valer Zetocha Pricing illiquid financial instruments using RNNs Many financial instruments are quoted daily in the markets with sufficient liquidity. However, many more are not quoted and are traded only rarely. Pricing and hedging such instruments in the derivatives books therefore requires extracting information from correlated liquid instruments. One such case is long-term equity vanilla options. In this work, we investigate the possibility of estimating the price of long-term equity vanilla options based on a time series of short-term option prices on the same stock as well as past observations of long-term prices. We use LSTM neural networks to capture the time dependence and work with the series of implied volatilities as proxies for the prices.
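Poster 61 works with implied volatilities rather than raw prices. As a reminder of how a quoted option price maps to an implied volatility, here is a minimal sketch that inverts the Black-Scholes formula for a European call on a non-dividend-paying stock by bisection. The pricing model, the bracketing interval and the function names are assumptions for illustration; the abstract does not say how the authors obtain their implied volatilities.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call_price(spot, strike, maturity, rate, vol):
    """Black-Scholes price of a European call (no dividends assumed)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def implied_vol(price, spot, strike, maturity, rate, lo=1e-4, hi=5.0, tol=1e-10):
    """Find vol in [lo, hi] with bs_call_price(..., vol) == price by bisection.

    The call price is increasing in vol, so halving the bracket converges
    whenever the quoted price lies between the prices at lo and hi.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, maturity, rate, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price a 2-year at-the-money call at 25% vol, then recover the vol.
quote = bs_call_price(100.0, 100.0, 2.0, 0.01, 0.25)
print(round(implied_vol(quote, 100.0, 100.0, 2.0, 0.01), 6))  # -> 0.25
```

Bisection is slower than Newton's method but needs no derivative (vega) and cannot diverge, which makes it a robust default when converting a long series of quotes.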