Poster Sessions

In this section you will find the title, abstract, and presenter of each poster in each of the two poster sessions that will take place during the MLSS.

Some photos of the MLSS poster sessions can be seen here: Photos.

First Poster Session: 30/08/2018, 18:30h - 20:30h
1. Homayun Afrabandpey: "Active Expert Knowledge Elicitation of Feature Similarities and Covariances for Prediction"
Prediction in "small n, large p" problems, with sample sizes substantially smaller than the number of features, is challenging. Bayesian models can alleviate the challenge using informative prior distributions over parameters. There exists a rich literature on constructing such prior distributions through expert knowledge elicitation; most of it focuses either on directly eliciting distributions or on querying knowledge about features one at a time. Furthermore, common to all these techniques is the independence assumption of the features in the prior distribution. Focusing on linear regression, we propose a human-in-the-loop machine learning method for constructing a full covariance matrix for the prior distribution of the parameters by querying the expert about pairs of features. Since the number of pairs can be large, we increase the interaction efficiency by a subsampling approach with guarantees, and implement the models using probabilistic programming, which allows us to naturally use sequential decision making methods to optimize query selection. Our results demonstrate improvement in predictive performance with simulated and real data.

2. Reda Alami: "Memory Bandits: A Bayesian Approach for the Switching Bandit Problem"
Thompson Sampling exhibits excellent results in practice and has been shown to be asymptotically optimal. The extension of the Thompson Sampling algorithm to the Switching Multi-Armed Bandit problem, proposed by Mellor and Shapiro (2013), is a Thompson Sampling equipped with a Bayesian online change point detector (Adams and MacKay, 2007). In this paper, we propose another extension of this approach based on a Bayesian aggregation framework. Experiments provide evidence that, in practice, the proposed algorithm compares favorably with the previous version of Thompson Sampling for the Switching Multi-Armed Bandit Problem, while clearly outperforming other state-of-the-art algorithms.

3. Álvaro Barbero Jiménez: "proxTV: fast and modular proximal optimization for multidimensional total-variation regularization"
We study TV regularization, a widely used technique for eliciting structured sparsity. In particular, we propose efficient algorithms for computing prox-operators for lp-norm TV. The most important among these is l1-norm TV, for whose prox-operator we present a new geometric analysis which unveils a hitherto unknown connection to taut-string methods. This connection turns out to be remarkably useful, as it shows how our geometry-guided implementation results in efficient weighted and unweighted 1D-TV solvers, surpassing state-of-the-art methods. Our 1D-TV solvers provide the backbone for building more complex (two- or higher-dimensional) TV solvers within a modular proximal optimization approach. We review the literature for an array of methods exploiting this strategy, and illustrate the benefits of our modular design through an extensive suite of experiments on (i) image denoising, (ii) image deconvolution, (iii) four variants of fused-lasso, and (iv) video denoising. To underscore our claims and permit easy reproducibility, we provide all the reviewed and our new TV solvers in an easy-to-use multi-threaded C++, Matlab and Python library: proxTV.
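As a concrete illustration of the prox-operator interface described in poster 3, a minimal 1D denoising sketch using the library's Python bindings might look as follows; the package name prox_tv and the tv1_1d/tv1w_1d signatures are assumptions based on the library's public interface, so check them against the installed version:

```python
# Illustrative use of proxTV's Python bindings (assumed API, not verified here).
import numpy as np
import prox_tv as ptv

rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(50), np.ones(50), 0.3 * np.ones(50)])  # piecewise-constant signal
y = x + 0.1 * rng.standard_normal(x.size)                           # noisy observation

# prox operator of l1-norm TV: argmin_u 0.5*||u - y||^2 + w * sum_i |u[i+1] - u[i]|
u = ptv.tv1_1d(y, 0.5)                                   # unweighted 1D-TV (taut-string based)
u_weighted = ptv.tv1w_1d(y, 0.5 * np.ones(y.size - 1))   # weighted variant
```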
4. Johanna Bayer: "Profiling Major Depressive Disorder using methods of machine learning"
The concept of Major Depressive Disorder (MDD) is very broadly defined. Symptom profiles vary between individuals diagnosed with this disorder and overlap with those of other mental disorders from the affective spectrum. Due to this heterogeneity and the lack of clinical decision support tools, the assignment of the correct treatment for MDD is tedious and based on trial and error. We use Gaussian process regression to map the normative relationship between age and structural FreeSurfer brain measures in 8,000 healthy controls and estimate the extent to which brain structures in individual MDD patients (N=2,500) deviate from these normative patterns. This approach allows us to look at single individuals, e.g. with extreme values under the normative model, and to identify subgroups, e.g. using clustering. These subtypes of individuals with MDD can be related to disorder- and treatment-specific variables, like disease severity, treatment response and outcome, and thus facilitate the assignment of treatment. The current results of the project are discussed against the background of the emerging trend to share and combine data from multiple sites, and the possible risks of applying machine learning methods to such data.

5. Mikhail Beck: "Making sport competition models with the help of machine learning"
The sports betting industry extensively employs probabilistic modelling to develop better odds compilation tools. These tools typically take odds on some key markets as input and calculate odds on tens or even hundreds of derived markets. Machine learning techniques are now used to build more realistic models of sport competitions. In this presentation I show three cases where ML provided a solution that would be difficult to obtain by other means.

6. Lyvia Biagi: "Prediction of nocturnal hypoglycemic events in subjects with type 1 diabetes"
Introduction: Subjects with type 1 diabetes (T1D) need exogenous insulin to regulate blood glucose levels due to an autoimmune destruction of pancreatic beta cells. Insulin must be infused properly to maintain normal levels of glucose; otherwise, patients can experience hyper- or hypoglycemic levels. Hypoglycemia is a serious complication of T1D and a major concern for patient safety. Nocturnal hypoglycemia can lead to various adverse situations in T1D patients, including loss of consciousness, seizures, or even death. Making use of patients' retrospective data allows the prediction and prevention of future hypoglycemic events, contributing to enhanced patient safety and quality of care.
Methods: Data from 12 patients with T1D were considered in this work. The main objective of this work is to develop personalized prediction systems based on individual historical data collected from patients. Information related to patients' insulin therapy, meals and physical activity was considered to provide different features for predicting the occurrence of nocturnal hypoglycemic events using machine learning algorithms.
Preliminary Results: Preliminary results were obtained using artificial neural networks to classify days into two classes: i) nights with hypoglycemic events and ii) nights without hypoglycemic events. The methodology applied to the dataset obtained satisfactory results for most of the patients. Averaged sensitivity and specificity over all patients were 54.3% and 85.1%, respectively. This information may help patients take actions to avoid the occurrence of nocturnal hypoglycemic events as they get ready to sleep.
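A minimal sketch of the evaluation protocol in poster 6 (per-night binary classification scored by sensitivity and specificity); the classifier choice, feature layout, and data below are random stand-ins, not the study's:

```python
# Stand-in sketch: per-day features predict whether the following night
# contains a hypoglycemic event; scored by sensitivity and specificity.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))   # hypothetical features: insulin, meals, activity
y = rng.integers(0, 2, 200)         # 1 = night with a hypoglycemic event
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X[:150], y[:150])
sens, spec = sensitivity_specificity(y[150:], clf.predict(X[150:]))
```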
7. Lubos Buzna: "Use cases and introductory analysis of the dataset collected within the large network of public charging stations"
The recent rise of electric vehicles (EV) brings social and technological changes to the transportation and energy sectors, including the massive deployment of charging stations. To provide effective decision support for the operators of charging stations, we are exploring possibilities for exploiting an available dataset and present results of preliminary data analyses. Our dataset contains over 32 million meter readings from charging of plug-in electric vehicles (PEV) on more than 1700 charging stations located in the Netherlands. Based on discussions with experts and the available literature, three main application areas were identified: forecasting of demanded energy, identification of customer segments, and characterization of suitable locations for charging stations. As an example of a use case, we forecast the consumption of electric energy on charging stations in the COROP region of Utrecht. Two kinds of SARIMAX model together with three kinds of training-forecasting procedure are used with various exogenous predictors to identify which combination provides the best long-term forecasts.

8. Taha Ceritli: "Modeling Bounded Data with Sum Conditioned Poisson Factorization"
Non-negative bounded data, such as binary and ordinal matrices, are modeled as Poisson random variables with unbounded ranges in Poisson Factorization, a state-of-the-art matrix factorization method. In this work, we extend Poisson Factorization to model such bounded data with bounded distributions such as Bernoulli, Binomial, Categorical and Multinomial, where multiple Poisson Factorizations are conditioned on their sum. The resulting model, named Sum Conditioned Poisson Factorization, is evaluated on simulated and real data sets.

9. Ho Ching Chiu: "Single Image Super-Resolution GAN using Inception-ResNet"
As the title. I just switched from psychology to machine learning, so my research doesn't yet have anything to do with machine learning; I do this as a side project purely for fun. The work is still in progress and not yet very successful. Sorry that there is no poster, since I didn't have time to prepare one, but I am keen on learning from all of you how to produce awesome super-resolution images and, of course, about all other machine learning topics as well. Happy to discuss, and suggestions are welcome. Email: hoching.chiu@oist.jp

10. Irene Córdoba: "Uniform sampling of decomposable Gaussian graphical models"
We propose a novel Metropolis-Hastings algorithm to sample uniformly from the space of decomposable Gaussian graphical models. The method is based on previous work on uniform sampling of correlation matrices. Our approach is intuitive and simple, based on the interpretation of the Cholesky factorization of the inverse covariance matrix and Markov chain Monte Carlo theory. We analyze the convergence of the resulting Markov chain both theoretically and empirically. We show in numerical experiments how traditional sampling methods in Gaussian graphical models are biased towards certain regions of the whole space, whereas our approach explores all of it uniformly.

11. Carlo D'Eramo: "Exploiting Action-Value Uncertainty to Improve Learning and Exploration in Reinforcement Learning"
We address the problem of estimating the maximum expected value of a set of independent random variables. The performance of several learning algorithms (e.g., Q-learning) is affected by the accuracy of such estimation. Unfortunately, no unbiased estimator exists. The usual approach of taking the maximum of the sample means leads to large overestimates that may significantly harm the performance of the learning algorithm. Recent works have shown that the cross-validation estimator, which is negatively biased, outperforms the maximum estimator in many sequential decision-making scenarios; on the other hand, the relative performance of the two estimators is highly problem-dependent. In this work, we propose a new estimator for the maximum expected value, based on a weighted average of the sample means, where the weights are computed using Gaussian approximations for the distributions of the sample means. We extend this analysis also to an infinite set of random variables. We apply our method to a wide range of Reinforcement Learning problems, from discrete to continuous ones. Moreover, we explain how to improve exploration by exploiting the uncertainty computed by our algorithm. We compare the proposed estimator and exploration strategies with other state-of-the-art methods both theoretically, by deriving upper bounds on the bias and the variance of the estimator, and empirically, by testing the performance on different sequential learning problems.
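The weighted estimator of poster 11 admits a compact sketch: each sample mean is weighted by the probability, under a Gaussian approximation, that its variable is the true maximizer. The Monte Carlo weighting below is an illustrative implementation, not the authors' code:

```python
# Weight each sample mean by the Gaussian-approximated probability that its
# variable attains the maximum, then average; compare to the biased means.max().
import numpy as np

def weighted_max_estimate(means, std_errs, n_draws=100_000, seed=0):
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_draws, means.size)) * std_errs + means
    w = np.bincount(draws.argmax(axis=1), minlength=means.size) / n_draws
    return float(w @ means)

means = np.array([0.10, 0.15, 0.12])      # sample means of each variable
std_errs = np.array([0.05, 0.08, 0.02])   # standard errors of the means
print(weighted_max_estimate(means, std_errs), means.max())
```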
12. Lucas Deecke: "Mode normalization"
Normalization methods are a central building block in the deep learning toolbox. By alleviating internal covariate shift, they accelerate training in deep networks and decrease the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including various architectures and datasets.

13. Markus Eyting: "Predicting Health from Questionnaire Data"
By means of random forest classifications we predict different health outcomes from questionnaire data. Patients' health behaviour, psychological conditions, work attitudes as well as basic demographic characteristics are used to predict health outcomes in a variety of dimensions. Predictions are compared to actual health statuses as well as to expert predictions from physicians.

14. Elizabeth Fons Etcheverry: "A regime switching model for smart beta investing using Hidden Markov Models"
The financial crisis generated interest in more transparent, rules-based strategies, with smart beta emerging as a trend among institutional investors. Smart beta is a hybrid strategy combining investment strategies from active management with a systematic approach often associated with passive investment, making them more cost-effective. Such strategies show strong performance over the long run, but often suffer from severe short-term drawdown with fluctuating performance across cycles. To address cyclicality and underperformance, we build a regime-switching framework using Hidden Markov Models (HMMs). We build portfolios whose allocation signal is provided by an HMM trained with the same assets. Results show that using HMMs improves risk-adjusted returns, especially on more return-oriented portfolios. In addition, we implement a novel approach for regime switching models using an embedded feature selection algorithm to improve regime identification. We evaluate smart feature selection with real-life assets using MSCI style indices, and show model performance improvement with respect to portfolios built using full-feature HMMs.
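A regime-switching signal of the kind described in poster 14 can be prototyped with an off-the-shelf HMM library; the sketch below assumes the hmmlearn package and synthetic two-regime returns standing in for the MSCI-style index data:

```python
# Prototype regime detection with hmmlearn (assumed installed via `pip install hmmlearn`).
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
returns = np.concatenate([rng.normal(5e-4, 0.005, 500),    # calm regime (low volatility)
                          rng.normal(-1e-3, 0.02, 250)])   # stressed regime (high volatility)
returns = returns.reshape(-1, 1)

hmm = GaussianHMM(n_components=2, covariance_type="full",
                  n_iter=200, random_state=0).fit(returns)
regimes = hmm.predict(returns)       # hard 0/1 allocation signal
probs = hmm.predict_proba(returns)   # soft signal, e.g. for position sizing
```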
15. Víctor Gallego: "Bayesian structural time series models for advertising expenditures"
We propose a robust implementation of the Nerlove-Arrow model using a Bayesian structural time series model to explain the relationship between the advertising expenditures of a country-wide fast-food franchise network and its weekly sales. Thanks to the flexibility and modularity of the model, it is well suited to generalization to other markets or situations. Its Bayesian nature facilitates incorporating a priori information (the manager's views), which can be updated with relevant data. This aspect of the model will be used to present a strategy of budget scheduling across time and channels.

16. Kunal Ghosh: "Deep learning spectroscopy: neural networks for molecular excitation spectra"
Applications of novel materials have a significant positive impact on our lives. To search for such novel materials, material scientists traverse massive datasets of prospective materials, identifying ones with favourable properties. Prospective materials are screened by studying suitable spectra of these materials. Contemporary methods like high-throughput screening are very time consuming for moderately sized datasets. We train three different neural network architectures: a multilayer perceptron (MLP), a convolutional neural network (CNN) and a deep tensor neural network (DTNN) to predict orbital energies and excitation spectra of 132K organic molecules. The inputs to the neural networks are the coordinates and charges of the constituent atoms of each molecule. Already the MLP is able to learn spectra, but the test root mean square error (RMSE) is still as high as 0.3 eV. The learning quality improves significantly for the CNN (RMSE=0.23 eV) and reaches its best performance for the DTNN (RMSE=0.19 eV). Both CNN and DTNN capture even small nuances in the spectral shape.
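The learning setup of poster 16 is, at its simplest, multi-output regression from per-molecule inputs to a discretized spectrum. The toy sketch below mirrors only the input/output shapes described in the abstract; the sizes, padding scheme, and random stand-in data are hypothetical, not the authors' architectures:

```python
# Toy multi-output regression: flattened (x, y, z, charge) per atom -> spectrum bins.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_mol, max_atoms, n_bins = 500, 29, 300          # hypothetical sizes
X = rng.standard_normal((n_mol, max_atoms * 4))  # zero-padded atomic inputs (stand-in)
Y = rng.random((n_mol, n_bins))                  # discretized spectra (stand-in)

mlp = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=50).fit(X, Y)
rmse = np.sqrt(np.mean((mlp.predict(X) - Y) ** 2))  # the abstract reports RMSE in eV
```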
17. Hanane Grissette: "A dynamic Sentiment Analysis Model based on associative learning for Defining the Minute Insights from passive and active Patients state self-reported on Social Media"
Nowadays, Sentiment Analysis (SA) is the pioneering approach used to analyze people's opinions about a product or an event to identify breakpoints in public opinion [1]. The traditional form of clinical notes, such as CRFs (Case Report Forms), used to summarize physical examinations and details of the medical history of patients' experiences with specific drugs or events, is unreliable and inefficient at capturing the changing emotional state of patients through the process of medication. Moreover, the major issue is the inability of such general-purpose SA tools to accurately detect the meaning of the sentiments expressed towards treatments, scientific studies and pharma companies at large over time. In this work, we aim at defining the underlying set of sentiments expressed towards an entity by using associative learning based on a Bayesian approach to quantify exactly what change is.

18. Prakhar Gupta: "Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features"
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question whether similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.

19. Ray Han: "Towards Effective Extraction and Linking of Software Mentions from User-Generated Support Tickets"
Software support tickets contain short and noisy text from users. Software products are represented by various surface forms and informal abbreviations created by users. Automatically identifying software mentions in tickets and determining the official names (and versions) is helpful for many downstream applications, e.g., routing the support tickets to the right team of experts supporting the software. In this work, we study the problem of software product name extraction and linking from support tickets. We first analyze a collection of annotated tickets to understand the language patterns. Second, we design features using multiple in-domain and web knowledge sources for the extraction and linking with linear models. Experiments on four datasets show better and more consistent results of our methods compared to neural network baselines.

20. Florian Huber: "Predicting antibacterial drug mode of action using machine learning"
Elucidating the mode of action (MoA) of small molecules targeting microbial growth is key for drug discovery. MoA determination is still a major bottleneck in drug discovery because it depends on laborious low-throughput methods and thus cannot be applied to large compound collections. Previous efforts to systematically infer drug MoA using drug-gene interactions or phenotypic profiling of single cells lack accuracy and cannot be easily generalised. In this project, we aim to make use of chemical screening data, microscopy, and compound structures as input for machine learning analyses in order to identify a set of features that can predict drug MoA across a wide range of bacterial species. We provide an overview of the challenges of MoA prediction and current computational/statistical approaches to predict drug MoA. We present our approaches to select the most informative features for MoA prediction from the wealth of experimental data that is available. These data are combined with our knowledge of bacterial cell physiology to train classification algorithms for MoA prediction. This will yield insights into the pathways affected by drugs and facilitate the elucidation of mechanisms of drugs with unknown MoA.
21. Abdullah-Al-Zubaer Imran: "PDV-Net: Reliable, Fast, and Automatic Segmentation of Pulmonary Lobes"
Reliable and automatic segmentation of lung lobes is important for diagnosis, assessment, and quantification of pulmonary diseases. The existing techniques are prohibitively slow, undesirably rely on prior (airway/vessel) segmentation, and/or require user interactions for optimal results. This work presents a reliable, fast, and fully automated lung lobe segmentation based on a progressive dense V-network (PDV-Net). The proposed method can segment lung lobes in one forward pass of the network, with an average runtime of 2 seconds using one Nvidia Titan XP GPU, eliminating the need for any prior atlases, lung segmentation or any subsequent user intervention. We evaluated our model using 84 chest CT scans from the LIDC and 154 pathological cases from the LTRC datasets. Our model achieved a Dice score of 0.939 ± 0.02 for the LIDC test set and 0.950 ± 0.01 for the LTRC test set, significantly outperforming a 2D U-net model and a 3D dense V-net. We further evaluated our model against 55 cases from the LOLA11 challenge, obtaining an average Dice score of 0.935, a performance level competitive with the best performing model, which has an average score of 0.938. Our extensive robustness analyses also demonstrate that our model can reliably segment both healthy and pathological lung lobes in CT scans from different vendors, and that it is robust against different CT scan reconstruction configurations.

22. Tim Janke: "A Quantile Regression Deep Neural Network for Probabilistic Electricity Price Forecasting"
In the past decade the paradigm in forecasting has been shifting from point forecasts to probabilistic forecasts, acknowledging the need for an assessment of forecast uncertainty. In electricity price forecasting (EPF), probabilistic forecasting is an emerging but still underdeveloped field. Due to the rising share of generation from renewable and volatile energy sources like wind and solar, in combination with the partly inflexible generation from thermal power plants, electricity prices have become increasingly volatile and hard to forecast. Hourly electricity prices are typically determined by a day-ahead uniform price auction and are influenced by fundamental factors like expected renewable in-feed, expected demand, or fuel prices, but also exhibit strong seasonal and auto-regressive patterns. Historically, classic parametric time series models are predominant. Deep neural networks are theoretically well suited to model the complex and non-linear relationships that govern the formation of electricity prices; however, neural network models have so far shown limited performance in the domain of EPF. We challenge this notion by showing that a quantile regression deep neural network beats established benchmark models in terms of point forecasting as well as probabilistic forecasting accuracy. Combining the concept of quantile regression with a deep neural network, we propose a Quantile Regression Deep Neural Network for the simultaneous estimation of 99 quantiles of all 24 day-ahead electricity prices, i.e. our model has 24*99 output units and is trained to minimize the average pinball loss over all prices. We use the load forecast, expected solar power infeed, and expected wind power infeed as well as 168 lagged prices as input from a three-year data set from the German-Austrian bidding zone in hourly resolution. We initially fit our model using the first two years in the data set and test our model's performance by forecasting the whole year of 2017. We refit our model after each day using an expanding window.
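The pinball loss that poster 22's network minimizes over its 24x99 outputs can be written in a few lines; this NumPy sketch mirrors the description in the abstract, not the author's training code:

```python
# Average pinball (quantile) loss over 24 hourly prices x 99 quantile levels.
import numpy as np

def pinball_loss(y_true, y_pred, taus):
    """y_true: (batch, 24); y_pred: (batch, 24, 99); taus: (99,) quantile levels."""
    diff = y_true[:, :, None] - y_pred
    return np.mean(np.maximum(taus * diff, (taus - 1.0) * diff))

taus = np.arange(0.01, 1.0, 0.01)   # 99 quantile levels
rng = np.random.default_rng(0)
y_true = rng.normal(40.0, 10.0, (32, 24))                 # stand-in prices
y_pred = y_true[:, :, None] + rng.normal(0.0, 1.0, (32, 24, 99))
loss = pinball_loss(y_true, y_pred, taus)
```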
23. Juan Emmanuel Johnson: "Input Uncertainty in Gaussian Process Regression for Earth Surface Temperature Predictions"
Gaussian Processes (GPs) are a class of kernel methods that have been shown to be very useful in geoscience applications. They are widely used because they are simple, flexible and provide very accurate estimates for nonlinear problems, especially in parameter retrieval. In addition to a predictive mean function, GPs come equipped with a useful property: a predictive variance function, which provides confidence intervals for the predictions. The GP formulation usually assumes that there is no input noise in the training and testing points, only in the observations. However, this is often not the case in Earth observation problems, where an accurate assessment of the instrument error is usually available. In this poster, I showcase how the derivative of a GP model can be used to provide an analytical error propagation formulation, and analyze the predictive variance and the propagated error terms in a temperature prediction problem from infrared sounding data.
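The derivative-based error propagation in poster 23 is consistent with the standard first-order Taylor treatment of input noise, sketched below; the poster's exact formulation may differ:

```latex
% First-order propagation of input noise through the GP posterior mean mu,
% with Sigma_x the covariance of the (known) instrument input noise:
\sigma^{2}_{\mathrm{tot}}(\mathbf{x}_{*}) \approx
  \sigma^{2}_{\mathrm{GP}}(\mathbf{x}_{*})
  + \nabla_{\mathbf{x}}\mu(\mathbf{x}_{*})^{\top}\,
    \Sigma_{\mathbf{x}}\,
    \nabla_{\mathbf{x}}\mu(\mathbf{x}_{*})
```

Here the first term is the usual GP predictive variance and the second is the propagated input-noise term, which is exactly where the derivative of the GP mean enters.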
24. Marija Kekic: "Applications of neural networks in data analysis in NEXT"
Convolutional Neural Networks have achieved impressive results in the field of computer vision. In this work we examine the application of CNNs in the Neutrino Experiment with a Xenon TPC (NEXT), where CNNs are used to distinguish the topological signatures of background and signal events by training on many thousands of simulated events. The network trained in this study performed better than previous methods, and we hope to further improve our findings by using more appropriate architectures.

25. Benjamin Knopp: "Temporal Movement Primitive Perception under Naturalistic Conditions"
Movement Primitives (MPs) are hypothetical elements out of which complex movements can be composed. This concept is popular in motor control, but we are interested in MPs as perceptual categories: inspired by the common-coding theory, we investigated whether MPs are also a useful approach for describing human movement perception in naturalistic settings. Besides understanding movement perception, finding perceptual MP categories could also be useful for computer vision/graphics and modelling. We recorded an actor performing natural tasks in a fairly unconstrained manner. The actions consist of walking through an indoor environment, stair climbing, and making/drinking coffee. The data was used for a psychophysical movement segment perception experiment and for learning MP models. We showed 70 video clips containing a selection of recordings to 12 participants. They were instructed to segment these clips into non-overlapping time intervals according to perceived boundaries. Results: We then used the segmentations of each participant for the extraction of MPs. Using the Bayesian Information Criterion, we estimated that 6-15 MPs are optimal for a given participant. Furthermore, we did a cluster analysis to compare global and local representations. The results indicate that task-independent MPs provide a better representation of human movement than task-dependent ones.

26. Radha Manisha Kopparti: "Abstract Rule Learning with Neural Networks"
Over the past few years, deep neural networks have been widely used for various applications and have produced state-of-the-art results in domains like image recognition, speech recognition, machine translation, etc. Nevertheless, there are still open challenges. One of them is the need for vast amounts of training data, which has been related to the difficulty of neural networks in learning certain abstractions, specifically grammatical patterns. For example, humans can easily learn linguistic abstractions, both through explicit definition and more implicit means. In an experiment, Marcus showed that even 7-month-old infants learned abstract grammar-like rules from a small number of unlabeled examples, in just two minutes, while neural networks failed to do so. In a series of recent papers, there has been re-emphasis on the fact that humans are far more efficient in learning complex rules than deep learning systems. However, previous works on training neural networks to understand abstract grammar patterns haven't produced positive results. As recurrent neural networks have been shown to be Turing-complete, they can represent abstract relationships, but the current algorithms do not seem to learn these representations. Therefore we take a constructive approach and create network architectures which detect abstract relationships and condition outputs on them. We take this as the basis for further exploring the learning behaviour of networks and identifying ways to encourage abstraction in neural networks. We started by creating a neural network that can learn identity relationships by design. This network can learn the grammars proposed by Marcus et al. from the data when trained with stochastic gradient descent. We perform several experiments by training the neural network on sequential data and propose a framework by which neural networks can learn abstract relationships. This approach may provide new generalization capabilities to neural networks and can be applied to various modalities like speech and language, music, and time series data.

27. Michal Kozlowski: "Energy Efficiency in Reinforcement Learning for Wireless Sensor Networks"
As sensor networks for health monitoring become more prevalent, so will the need to control their usage and consumption of energy. This poster presents a method which trades off the algorithm's performance against its energy consumption. By utilising Reinforcement Learning (RL) techniques, we provide an adaptive framework, which continuously performs weak training in an energy-aware system. We motivate this using a realistic example of residential localisation based on Received Signal Strength (RSS). The method is cheap in terms of work-hours, calibration and energy usage. It achieves this by utilising other sensors available in the environment. These other sensors provide weak labels, which are then used to employ the State-Action-Reward-State-Action (SARSA) algorithm and train the model over time. Our approach is evaluated on a simulated localisation environment and validated on a widely available pervasive health dataset which facilitates realistic residential localisation using RSS. We show that our method is cheaper to implement and requires less effort, whilst at the same time providing a performance enhancement and energy savings over time.
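For reference, the standard tabular SARSA update that poster 27 builds on is the following; the poster's energy-aware, weak-label machinery is not reproduced here:

```python
# One on-policy TD(0) update of the action-value table Q.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Q: 2D array or dict-of-dicts; (s, a, r, s_next, a_next) from the environment."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
    return Q
```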
28. Rita Kuznetsova: "Variational Bi-domain Triplet Autoencoder"
We investigate deep generative models, which allow us to use training data from one domain to build a model for another domain. We consider domains to have similar structure (texts, images). We propose the Variational Bi-domain Triplet Autoencoder (VBTA) that learns a joint distribution of objects from different domains. There are many cases when obtaining any supervision (e.g. paired data) is difficult or ambiguous. For such cases we can seek a method that is able to extract information about data relations and structure from the latent space. We extend the VBTA's objective function with relative constraints, or triplets, sampled from the shared latent space across domains. In other words, we combine the deep generative model with metric learning ideas in order to improve the final objective with the triplet information. We demonstrate the performance of the VBTA model on different tasks: bi-directional image generation and image-to-image translation, even on unpaired data. We also provide a qualitative analysis. We show that the VBTA model is comparable to, and outperforms some of, the existing generative models.

29. Krista Longi: "Semi-supervised Convolutional Neural Networks for Identifying Wi-Fi Interference Sources"
We present a convolutional neural network for identifying radio frequency devices from signal data, in order to detect possible interference sources for wireless local area networks. Collecting training data for this problem is particularly challenging due to a high number of possible interfering devices, difficulty in obtaining precise timings, and the need to measure the devices in varying conditions. To overcome this challenge we focus on semi-supervised learning, aiming to minimize the need for reliable training samples while utilizing larger amounts of unlabeled data to improve the accuracy. In particular, we propose a novel structured extension of the pseudo-label technique to take advantage of temporal continuity in the data and show that already a few seconds of training data for each device is sufficient for highly accurate recognition.
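Poster 29 extends the pseudo-label technique; the plain, unstructured baseline it starts from can be sketched as follows (the temporal-continuity extension is deliberately omitted, and the linear classifier is a stand-in for the poster's CNN):

```python
# Plain pseudo-labeling: repeatedly adopt confident predictions on unlabeled
# data as extra training labels and retrain.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(rounds):
        proba = clf.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= threshold          # confident predictions only
        if not keep.any():
            break
        X_aug = np.vstack([X_lab, X_unlab[keep]])
        y_aug = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    return clf
```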
30. Gurunath Reddy Madhumani: "Classification and Segmentation of Vocal Folds in Videolaryngostroboscopy Images of Laryngeal Disorders"
Videolaryngostroboscopy is an invasive technique to capture the vibration/activity of the vocal folds during phonation with the help of a high-speed video camera under intermittent illumination. The vocal folds are a membrane structure of two symmetrical folds in the larynx. Healthy vocal folds produce quasi-periodic puffs of air driven from the lungs that are convolved with the vocal tract, resulting in actions such as speaking, singing and other paralinguistic vocalizations, whereas unhealthy/disordered vocal folds result in irregular vibration, leakage of air flow, hoarseness and breathiness; in some disorders, the vocal folds even generate pain in the larynx due to external tissue growth on the folds or due to inflammation. In many cases, early vocal disorder detection is necessary to prevent probable chronic diseases such as carcinoma. In the literature, we can find very few attempts at automated segmentation and classification of the vocal fold region from the noisy, distorted and very low illumination images. Most of the methods assume that all frames in the videolaryngostroboscopy contain vocal folds, which is not always true, and segment the desired region with the help of multiple stages of digital image processing pipelines. Often, the shape of the vocal folds is assumed to be fixed, and hand-crafted features such as histograms of oriented gradients and region-growing methods are applied to obtain the vocal fold regions. In our work, we propose an end-to-end vocal fold classification and segmentation method which does not involve any hand-crafted features. The machine learns features on its own by minimizing the error between the prediction and the actual ground truth. The first stage classifies the given frame for the presence of vocal folds. The second stage segments the frames containing the vocal folds at pixel level. The final stage classifies the segmented vocal folds into one of several categories of disorders. Initial results showed that the proposed method is indeed better than the state-of-the-art methods. For this work, as a first step, we have hand-annotated approximately 7000 images for frame-level classification and pixel-level segmentation from 5 normal and 20 disordered vocal fold patients.

31. Atalanti Mastakouri: "Personalised brain stimulation for motor rehabilitation"
Non-invasive brain stimulation is one of the most novel techniques for motor rehabilitation after stroke. Although there are some very promising results, many studies trying to replicate them report inconsistent results and large percentages of non-responders. Recently, we demonstrated evidence of large across-subjects heterogeneity of brain activity during the same motor task and proposed that this could be one reason why the same stimulation parameters (frequency, amplitude, location) do not lead to the same conclusions across subjects. We now focus on identifying the subject-specific brain features which explain response to motor cortex brain stimulation.

32. Joe Meagher: "Phylogenetic Gaussian Processes and Bat Echolocation"
The reconstruction of ancestral echolocation calls is an important part of understanding the evolutionary history of bats. General techniques for the ancestral reconstruction of function-valued traits have recently been proposed. A full implementation of phylogenetic Gaussian processes for the ancestral reconstruction of function-valued traits representing bat echolocation calls is presented here.
33. Luca Messina: "Modeling the behavior of a fusion power-plant component with multifidelity-based simulations"
In future nuclear fusion power plants based on the magnetic-confinement concept, the divertor (the pipe for exhaust gases) will be a critical component that will be exposed to the harshest conditions in terms of high temperatures, mechanical stresses, and neutron irradiation. In particular, high-energy neutrons emitted in the fusion reaction can severely damage and endanger the divertor's structural integrity, forcing very frequent and expensive replacements. In order to maximize the lifetime of this component, it is crucial to predict its behavior during operation and devise the optimal chemical composition that would minimize the irradiation effects. At the atomic scale, the damage is caused by the continuous collision between the high-energy neutrons and the atoms of the divertor's metallic structure. These collisions create defects that initiate atomic-diffusion phenomena, leading to changes in the chemical composition of the alloy which strongly affect the macroscopic properties, such as hardness and brittleness. This series of complex phenomena is here modeled in a multiscale framework. First, the metal's thermodynamic and kinetic properties are computed with accurate electronic-structure (Density Functional Theory, DFT) calculations, building up a subatomic physical description that can be used to parameterize Monte Carlo simulations of atomic transport and diffusion. In this way, it is possible to simulate the chemical evolution of the alloy caused by the irradiation-induced crystal defects, and make predictions about the structural properties. The parameterization of such Monte Carlo simulations is tricky, because subatomic properties are highly non-linear functions of the local chemical composition around a given crystal defect. For instance, considering environments of approximately 100 atoms (corresponding to a sphere of about 0.6 nm) would give rise, in a simple binary alloy, to 2^100 combinations of chemical composition, clearly unattainable for accurate but computationally expensive DFT calculations. In this work, we start from much smaller DFT datasets to build a reliable model able to predict the subatomic properties as functions of the local chemical composition. This is achieved by developing simplified models and assessing their accuracy against accurate DFT calculations, with the aid of a multifidelity approach allowing for the prediction of each model's average error. This will be applied in future works to the specific case of a W-Re alloy, which is among the candidate materials for the divertor thanks to its excellent structural properties, but whose behavior under irradiation is still largely unknown.

34. Prerana Mukherjee: "SalProp: Salient object proposals via aggregated edge cues"
In this paper, we propose a novel object proposal generation scheme by formulating a graph-based salient edge classification framework that utilizes the edge context. In the proposed method, we construct a Bayesian probabilistic edge map to assign a saliency value to the edgelets by exploiting low-level edge features. A Conditional Random Field is then learned to effectively combine these features for edge classification with object/non-object labels. We propose an objectness score for the generated windows by analyzing the salient edge density inside the bounding box. Extensive experiments on the PASCAL VOC 2007 dataset demonstrate that the proposed method gives competitive performance against 10 popular generic object detection techniques while using fewer proposals.

35. Mojmir Mutny: "Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features"
We develop an efficient and provably no-regret Bayesian optimization (BO) algorithm for optimization of black-box functions in high dimensions. We assume a generalized additive model with possibly overlapping variable groups. When the groups do not overlap, we are able to provide the first provably no-regret polynomial-time (in the number of evaluations of the acquisition function) algorithm for solving high dimensional BO. To make the optimization efficient and feasible, we introduce a novel deterministic Fourier Features approximation based on numerical integration, with detailed analysis for the squared exponential kernel. The error of this approximation decreases exponentially with the number of features, and allows for a precise approximation of both posterior mean and variance. In addition, the kernel matrix inversion improves in its complexity from cubic to essentially linear in the number of data points measured in basic arithmetic operations.
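Poster 35 replaces the randomness in Fourier feature constructions with deterministic quadrature nodes. For reference, the standard random Fourier feature approximation of the squared exponential kernel that it improves upon looks like this:

```python
# Random Fourier features: K(x, x') ~ z(x) @ z(x'), with z built from random
# frequencies; the quadrature variant replaces the random draws with
# numerical-integration nodes for exponentially decaying error.
import numpy as np

def random_fourier_features(X, n_features=500, lengthscale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).standard_normal((200, 5))
Z = random_fourier_features(X)
K_approx = Z @ Z.T  # approximates exp(-||x - x'||^2 / (2 * lengthscale^2))
```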
36. Maryleen Ndubuaku: "Hybrid Intelligence for Real-time Anomaly Detection in Smart Visual Network"
This research proceeds at the edge, where the focus is to process video streams so as to detect and isolate special events. This allows the cloud to receive just as much data as required for predictive analytics, pattern recognition and data mining. It also reduces the amount of data stored in the cloud by ensuring that only special events are filtered, starting from the edge devices up to the cloud. Using an online deep learning algorithm, the anomalous events are captured in real time and transmitted to the next tier of the network. While there has been a lot of research in the individual fields of edge learning, video analytics, cloud data fusion and anomaly detection, research is still lacking on the aggregation of these technologies, where anomalous activities in visual networks can be detected through a hybrid learning method between the edge and the cloud. The hybrid learning real-time analytics could be valuable in various applications like surveillance systems and environmental monitoring.

37. Fernando O. Gallego: "A dataset for mining conditions"
A condition is a constraint that determines when something holds. Mining them is paramount to understanding many sentences properly. Supervised condition miners need a labelled dataset with conditions, but there is not one publicly available. We present the first publicly available dataset with conditions. It consists of more than 45,000 labelled sentences from a set of more than 4,500,000 sentences in English, Spanish, French, and Italian that were gathered from the Web between April 2017 and May 2017. The sentences were labelled by means of a custom tool that we devised to perform the task.

38. Julia Olkhovskaya: "Online influence maximization with local observations"
We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node. The node transmits the information to some others that are in the same connected component in a random graph. The goal of the decision maker is to reach as many nodes as possible, with the added complication that feedback is only available about the degree of the selected node. Our main result shows that such local observations can be sufficient for maximizing global influence in two broadly studied families of random graph models: stochastic block models and Chung–Lu models. With this insight, we propose a bandit algorithm that aims at maximizing local (and thus global) influence, and provide its theoretical analysis in both the subcritical and supercritical regimes of both considered models. Notably, our performance guarantees show no explicit dependence on the total number of nodes in the network, making our approach well-suited for large-scale applications.

39. Alessandro Ortis: "On the Prediction of Social Image Popularity Dynamics"
This work introduces the new challenge of forecasting the engagement score reached by social images over time. We call this task "Popularity Dynamic Prediction". The task is the estimation, in advance, of the engagement score dynamic over a period of time (e.g., 30 days) by exploiting visual and social features. To this aim, we propose a benchmark dataset that consists of ~20K Flickr images labelled with their engagement scores (i.e., views, comments and favorites) over a period of 30 days from upload to the social platform. For each image, the dataset also includes the user's and photo's social features that have been proven to have an influence on image popularity on Flickr. The proposed dataset is publicly available for research purposes. We also present a method to address the aforementioned problem. Our approach is able to forecast the daily number of views reached by a photo posted on Flickr for a period of 30 days, by exploiting features extracted from the post.
40. Despoina Paschalidou: "RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials"
We consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNNs) allow learning the entire task from data. However, they do not incorporate the physics of image formation, such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRFs) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.

41. Miquel Perello-Nieto: "Training classifiers with weak labels"
It is well known that the labelling process for classification datasets is expensive. One of the possible solutions is the use of semi-supervised techniques that use non-labelled samples to improve the performance on the labelled samples. However, these methods rely on strong prior assumptions about the unlabelled set. We propose new methods to train with weak labels; these are cases where the labels may be wrong, form a super-set, or are outdated. Our method transforms proper losses that require true labels into proper losses that can be used in weak label scenarios.
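One concrete instance of the loss-transformation idea in poster 41 is "backward" correction with a known label-noise transition matrix, a standard construction from the label-noise literature; it is shown only to illustrate the idea, and the poster's actual transformation may differ:

```python
# Backward loss correction: T[i, j] = P(observed label j | true label i).
# Applying T^{-1} to the vector of per-class losses yields an unbiased
# estimate of the clean loss from the weak/noisy label (T must be invertible).
import numpy as np

def backward_corrected_loss(loss_per_class, T, weak_label):
    """loss_per_class[k] = loss of the current prediction if k were the true label."""
    corrected = np.linalg.inv(T) @ loss_per_class
    return corrected[weak_label]

T = np.array([[0.8, 0.2],
              [0.3, 0.7]])             # hypothetical noise model
loss_per_class = np.array([0.1, 2.3])  # e.g. per-class log losses of a prediction
estimate = backward_corrected_loss(loss_per_class, T, weak_label=1)
```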
42. Bartosz Piotrowski: "Can Neural Networks Learn Logical Equivalence?"
Applying deep learning to logical reasoning tasks is an interesting and quite unexplored topic. We show an exemplary experiment of this kind. Preparing a good training set is usually non-trivial -- we want to make sure that the network does not "cheat" by exploiting unintended dependencies and learns the proper thing. We show what kinds of architectures can be appropriate and propose some ideas to explore.

43. Giorgia Ramponi: "Generative Adversarial Network for noisy Time Series with irregular sampling"
Time series are sequences of measurements that follow non-random orders. The analysis of time series is based on the assumption that successive values in the data represent consecutive measurements taken at equally spaced time intervals. Most commonly, a time series is a sequence of data with a time order, where the time interval is given. Generating time series data is useful in various fields such as astronomy, econometrics, quantitative finance or signal processing. Time series analysis has two main goals: identifying the nature of the phenomenon represented by the sequence of observations (classification), and forecasting the next values (prediction). The challenge in time series analysis is in the data: time series data are often noisy, with missing observations (in which case the time interval is irregular) or too few observations. With this kind of data it is hard to succeed in the tasks of classification and prediction. In this poster we propose a Conditional Generative Adversarial Network to generate time series with non-fixed time intervals. We propose a model to generate time series data with the purpose of augmenting a dataset of various noisy time series. By conditioning the generator and the discriminator on the time intervals, we generate new data. We show that a classifier trained with data generated by the GAN and tested on real data achieves the same performance as a classifier trained on real data. In this way, given a dataset composed of time series with imbalanced classes, we could improve the performance of the classifier by augmenting the training set with generated time series data.

44. Ahmed Sabir: "Enhance Text Spotting with Semantic Information"
This poster addresses the problem of detecting and recognizing text in images acquired "in the wild". This is a severely under-constrained problem which needs to tackle a number of challenges including large occlusions, changing lighting conditions, cluttered backgrounds and different font types and sizes. In order to address this problem we leverage recent and successful developments in the cross-fields of machine learning and natural language understanding. In particular, we initially rely on off-the-shelf deep networks already trained with large amounts of data that provide a series of text hypotheses per input image. The outputs of this network are then combined with different priors obtained from both the semantic interpretation of the image and from a scene-based language model. As a result of this combination, the performance of the original network is consistently boosted.

45. Kamil Safin: "Optimal model selection for paraphrase detection task"
We propose an algorithm for optimal model selection. As a criterion of quality we use model evidence. Model evidence is expressed as an integral over the parameter space; in order to estimate it we use variational inference. As an approximation of the posterior and prior distributions we use normal distributions. We tested the proposed algorithm for optimal model selection on the paraphrase classification task. As models we use different types of deep neural networks. We also analyzed how pretraining helps to estimate the parameters of the model. In our experiments we use the SemEval 2015 dataset.

46. Mehdi S. M. Sajjadi: "Assessing Generative Models via Precision and Recall"
Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as the Fréchet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as the Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and the Variational Autoencoder. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.
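The precision/recall notion of poster 46 comes with a simple clustering-based estimator; the sketch below follows that recipe (bin real and generated samples into shared k-means clusters, then sweep a grid of trade-off slopes over the two histograms), with the cluster count and grid size being arbitrary choices here:

```python
# Clustering-based estimate of the precision/recall curve for distributions.
import numpy as np
from sklearn.cluster import KMeans

def prd_curve(real, fake, n_clusters=20, n_angles=1001):
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(np.vstack([real, fake]))
    P = np.bincount(labels[:len(real)], minlength=n_clusters) / len(real)   # data histogram
    Q = np.bincount(labels[len(real):], minlength=n_clusters) / len(fake)   # model histogram
    slopes = np.tan(np.linspace(1e-6, np.pi / 2 - 1e-6, n_angles))
    precision = np.array([np.minimum(s * P, Q).sum() for s in slopes])
    recall = precision / slopes
    return precision, recall

rng = np.random.default_rng(0)
prec, rec = prd_curve(rng.normal(0.0, 1.0, (500, 2)), rng.normal(0.5, 1.0, (500, 2)))
```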
47. Anirban Sarkar: "Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks"
Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision-based problems. However, these deep models are perceived as "black box" methods, considering the lack of understanding of their internal functioning. There has been significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose Grad-CAM++ to provide better visual explanations of CNN model predictions (when compared to Grad-CAM), in terms of better localization of objects as well as explaining occurrences of multiple objects of a class in a single image. We provide a mathematical explanation for the proposed method, Grad-CAM++, which uses a weighted combination of the positive partial derivatives of the last convolutional layer feature maps with respect to a specific class score as weights to generate a visual explanation for the class label under consideration. Our extensive experiments and evaluations, both subjective and objective, on standard datasets showed that Grad-CAM++ indeed provides better visual explanations for a given CNN architecture when compared to Grad-CAM.
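The weighting described in poster 47 can be sketched directly from feature-map activations and gradients, using the commonly used closed-form alpha coefficients; this is an illustrative NumPy function, not the authors' code:

```python
# Grad-CAM++-style saliency map: feature-map weights built from positive
# partial derivatives, with alpha coefficients from the closed-form expression
# in terms of gradient powers.
import numpy as np

def grad_cam_pp(activations, grads):
    """activations: (K, H, W) last-conv feature maps; grads: d(class score)/d(activations)."""
    g2, g3 = grads ** 2, grads ** 3
    denom = 2.0 * g2 + activations.sum(axis=(1, 2), keepdims=True) * g3
    alpha = np.divide(g2, denom, out=np.zeros_like(g2), where=denom != 0)
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(1, 2))            # w_k
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    return cam / (cam.max() + 1e-8)

rng = np.random.default_rng(0)
cam = grad_cam_pp(rng.random((64, 7, 7)), rng.standard_normal((64, 7, 7)))
```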
48. Lukas Schott: "Robust Perception through Analysis by Synthesis"
The intriguing susceptibility of deep neural networks to minimal input perturbations suggests that the gap between human and machine perception is still large. We argue that despite much effort, even on MNIST the most successful defenses are still far away from the robustness of human perception. We reconsider MNIST and establish a novel defense that is inspired by the abundant feedback connections present in the human visual cortex. We suggest that this feedback plays a role in estimating the likelihood of a sensory stimulus with respect to the hidden causes inferred by the cortex, and allows the brain to mute distracting patterns. We implement this analysis-by-synthesis idea in the form of a discriminatively fine-tuned Bayesian classifier using a set of class-conditional variational autoencoders (VAEs). To evaluate model robustness we go to great lengths to find maximally effective adversarial attacks, including decision-based, score-based and gradient-based attacks. The results suggest that this ansatz yields state-of-the-art robustness on MNIST against L0, L2 and L-infinity perturbations, and we demonstrate that most adversarial examples are strongly perturbed towards the perceptual boundary between the original and the adversarial class.

49. Akash Srivastava: "Ratio Matching MMD Nets: Low dimensional projections for effective deep generative models"
Deep generative models can learn to generate realistic-looking images on several natural image datasets, but many of the most effective methods are adversarial methods, which require careful balancing of training between a generator network and a discriminator network. Maximum mean discrepancy networks (MMD-nets) avoid this issue using the kernel trick, but unfortunately they have not on their own been able to match the performance of adversarial training. We present a new method of training MMD-nets, based on learning a mapping of samples from the data and from the model into a lower dimensional space, in which MMD training can be more effective. We call these networks ratio matching MMD networks (RM-MMDnets). We train the mapping to preserve density ratios between the densities over the low-dimensional space and the original space. This ensures that matching the model distribution to the data in the low-dimensional space will also match the original distributions. We show that RM-MMDnets have better performance and better stability than recent adversarial methods for training MMD-nets.

50. David Stutz: "Learning 3D Shape Completion from Laser Scan Data with Weak Supervision"
3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, full supervision is required, which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks, resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach is able to generalize to other object categories as well.

51. Daniel Heestermans Svendsen: "Active Emulation of Radiative Transfer Models with Gaussian Processes"
We introduce a methodology for constructing emulators of multi-output Radiative Transfer Models (RTMs). RTMs compute the radiative transfer of radiation through a planetary atmosphere and are often costly to run. The proposed methodology selects where to evaluate the RTM in a sequential and adaptive way, and is based on the notion of acquisition functions in Bayesian optimization. The Automatic Emulation methodology combines the predictive capabilities of Gaussian Processes with a suitably designed acquisition function which favors sampling in low density regions and high derivatives of the interpolating function. We illustrate the promising capabilities of the method through the construction of an emulator of a standard leaf-canopy RTM, used to simulate Landsat8 spectra.

52. Carlos Villacampa-Calvo: "Alpha Divergence Minimization in Multi-Class Gaussian Process Classification"
This paper analyzes the minimization of α-divergences for approximate inference in the context of multi-class Gaussian process classification. For this task, several methods are explored, including memory- and computationally-efficient variants of the Power Expectation Propagation algorithm, which allow for efficient training using stochastic gradients and mini-batches. When these methods are used for training, very large datasets with up to several millions of instances can be considered. The proposed methods are also very general and they can easily interpolate between other popular approaches for approximate inference based on Expectation Propagation (EP) (α = 1) and Variational Bayes (VB) (α → 0) simply by varying the α parameter. An exhaustive empirical evaluation analyzes the generalization properties of each of the proposed methods for different values of the α parameter. The results obtained show that one can do better than EP and VB by considering intermediate values of α.
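For reference, the α-divergence family that poster 52 interpolates over is, in one common parameterization (the paper may use an equivalent form):

```latex
D_{\alpha}(p \,\|\, q) =
  \frac{1}{\alpha(1-\alpha)}
  \left( 1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, d\theta \right),
\qquad
\lim_{\alpha \to 1} D_{\alpha}(p \,\|\, q) = \mathrm{KL}(p \,\|\, q),
\qquad
\lim_{\alpha \to 0} D_{\alpha}(p \,\|\, q) = \mathrm{KL}(q \,\|\, p)
```

The two limits recover the divergences minimized by EP (α = 1) and VB (α → 0), which is why varying α interpolates between the two approaches.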
53 David Widmann Evaluation of model calibration in classification Probabilistic classifiers output a probability distribution over target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications it is pivotal for a model to possess an adequate sense of uncertainty, which for probabilistic classifiers translates into outputting probability distributions that accurately match the empirical frequencies observed from realized outcomes. In this work we explain subtleties present in the evaluation of model calibration and propose different ways to quantify and visualize calibration in probabilistic classification.

54 Joel Zeder Scaling of annual maximum precipitation with changing temperature in Central Europe Previous studies found that changes in extreme daily precipitation show, on average, a positive relationship with temperature close to Clausius-Clapeyron scaling, with strong spatial variability. Due to the inherent scarcity of extreme events and high internal variability, a large number of long station series is required, which makes it challenging to detect a signal at the regional to local scale. Research is often limited by data availability and is thus mostly based on publicly available pre-calculated extreme indices that do not allow assessing the dependence on event duration. In addition, new non-linear machine learning analysis methods are implemented to exploit the available data efficiently. Here we use a new dense network of raw precipitation data series for Central Europe, extending over several countries including the Netherlands, Germany, Switzerland, and Austria, and analyse intensity and frequency changes. Access to the raw data allows us to extract extreme precipitation indices for the yearly and seasonal maximum precipitation sum over 1-, 3-, 5-, 7-, and 31-day periods, and the number of days per year exceeding the 95% and 99% precipitation quantiles of 1961-1990, from series covering at least eighty years within the twentieth century. Based on this, we assess the spatial and temporal patterns of heavy-rainfall intensification over Europe and their dependence on event duration and season. Non-parametric time series regression for the intensity indices and logistic regression for the frequency indices show that a majority of series exhibit an increase since 1900, with a proportion of significantly positive trends well exceeding that of resampled time series. This holds for almost all seasons, for both winter and summer half-years, and for almost all sub-regions, except for Austria, where a considerable number of stations show negative trends. Non-stationary generalised extreme value distributions with temperature-dependent location and scale parameters provide an estimate of the temperature dependency of yearly maximum precipitation and put the results in the context of Clausius-Clapeyron scaling. Overall, we detect that the fraction of significantly positive trends is larger than expected from internal variability, and trends show a clear spatial pattern across country borders.
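As a rough sketch of the non-stationary GEV fit described above, assuming a linear dependence of the location and log-scale on temperature (the covariate form, variable names, and starting values are ours, not the poster's):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def neg_log_lik(params, annual_max, temp):
    # GEV negative log-likelihood with temperature-dependent parameters
    mu0, mu1, s0, s1, xi = params
    loc = mu0 + mu1 * temp              # location varies linearly with temperature
    scale = np.exp(s0 + s1 * temp)      # log link keeps the scale positive
    # scipy parameterizes the shape as c = -xi relative to the climate convention
    return -genextreme.logpdf(annual_max, c=-xi, loc=loc, scale=scale).sum()

# annual_max: yearly maximum precipitation sums; temp: a co-located temperature series
# fit = minimize(neg_log_lik,
#                x0=[annual_max.mean(), 0.0, np.log(annual_max.std()), 0.0, 0.1],
#                args=(annual_max, temp), method="Nelder-Mead")
# The fitted mu1 and s1 quantify the scaling with temperature, to be compared with
# the Clausius-Clapeyron rate of roughly 7% per degree Celsius.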
55 Yi Zheng Phase contrast computed tomography and deep learning The aim of this project is to combine deep learning methods with the X-ray phase contrast imaging technique to develop a high-sensitivity, low-radiation-dose 3D imaging method for breast cancer detection. X-ray phase contrast computed tomography (PCCT) yields higher contrast for soft tissues than conventional computed tomography (CT) and is therefore more sensitive in detecting cancers. For example, a tumor in a piece of breast tissue that was invisible in the traditional X-ray image was identified using the X-ray phase contrast imaging method (Scherer 2015, PLOS ONE). However, the clinical application of this technique is limited by (1) an experimental setup unsuitable for a clinical environment and (2) a high radiation dose. To address the first limitation, we are replacing the synchrotron radiation with a laboratory X-ray source and the fine gold grating with a piece of sandpaper, making the setup feasible to install in a clinic. Before developing a full-body scanner, our first step is to modify a bench-top micro-CT scanner and experiment on small excision specimens. Our method requires neither a high-brilliance laboratory liquid-metal-jet source, as in Zanette 2014 Phys. Rev. Lett. and Zhou 2015 Opt. Lett., nor high-precision stepping motors, as in Wang 2016 Sci. Rep. This further relaxation of the equipment requirements brings the method one step closer to clinical use. The key to success in this method is sensitivity in detecting subpixel displacements. We have improved the current speckle-tracking method and achieved a 60% reduction in root-mean-square error. For the second limitation, our solution is to find an algorithm that can faithfully reconstruct the 3D structure of the sample in a low-radiation-dose (and hence low signal-to-noise ratio) scenario. Recently, a 3D reconstruction framework, automated transform by manifold approximation (AUTOMAP), was developed for undersampled data (Zhu 2018, Nature). It consists of three fully connected layers and two convolutional layers. Although the framework was developed for Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) data, I demonstrated that it can also be applied in a CT setup.
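For orientation, here is a schematic AUTOMAP-like network with three fully connected layers followed by two convolutional layers, matching the count given above; layer widths, kernel sizes and activations are illustrative guesses rather than the exact architecture of Zhu 2018.

import torch
import torch.nn as nn

class AutomapLike(nn.Module):
    # n_in: size of the sensor-domain measurement (e.g. a flattened sinogram)
    # n_img: side length of the reconstructed image
    def __init__(self, n_in, n_img):
        super().__init__()
        self.n_img = n_img
        n_out = n_img * n_img
        self.fc = nn.Sequential(          # fully connected layers learn the domain transform
            nn.Linear(n_in, n_out), nn.Tanh(),
            nn.Linear(n_out, n_out), nn.Tanh(),
            nn.Linear(n_out, n_out), nn.Tanh(),
        )
        self.conv = nn.Sequential(        # convolutional refinement in image space
            nn.Conv2d(1, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=5, padding=2),
        )

    def forward(self, measurement):       # measurement: (batch, n_in)
        x = self.fc(measurement)
        x = x.view(-1, 1, self.n_img, self.n_img)
        return self.conv(x)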
Second Poster Session: 04/09/2018, 18:30h - 20:30h