Data Science and Mining team (DaSciM)
LIX Laboratory, École Polytechnique.

Research

My main research interests are in machine learning and data science, artificial intelligence and reinforcement learning, IoT and sensor analytics, and mining sequential data such as time series and data streams.

Multi-label Classification

In multi-label classification, multiple target variables are modelled and predicted together for each instance, unlike the traditional learning problem, where a single target variable (class) is predicted. The main challenge is detecting and modelling dependencies among labels while maintaining scalability to large problems. This task is relevant to many domains where multiple labels can be assigned to each example, for example text categorisation, scene classification, video and other media, and medical and biological applications.


Multi-label classification is a particular type of structured-output prediction and, as such, is closely connected to other topics such as probabilistic graphical models, neural networks, time series forecasting, and models for learning from sequential data.
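To make the idea of modelling label dependencies concrete, here is a minimal sketch of a classifier chain (CC), one well-known multi-label method: the j-th binary classifier receives the input features plus the values of labels 0 to j-1, which are the true labels during training and the chain's own predictions at test time. The perceptron base learner, the class names `Perceptron` and `ClassifierChain`, and the toy data are illustrative assumptions; any binary classifier could be plugged in.

```python
from typing import List, Sequence


class Perceptron:
    """Minimal binary perceptron (targets in {0, 1}), used as a stand-in base learner."""

    def __init__(self, n_features: int, epochs: int = 20, lr: float = 1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.epochs = epochs
        self.lr = lr

    def predict(self, x: Sequence[float]) -> int:
        score = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score > 0 else 0

    def fit(self, X, y) -> "Perceptron":
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                err = target - self.predict(x)  # -1, 0 or +1
                if err:
                    self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                    self.b += self.lr * err
        return self


class ClassifierChain:
    """Model j is trained on the input features plus labels 0..j-1."""

    def __init__(self, n_labels: int, epochs: int = 20):
        self.n_labels = n_labels
        self.epochs = epochs
        self.models: List[Perceptron] = []

    def fit(self, X, Y) -> "ClassifierChain":
        self.models = []
        for j in range(self.n_labels):
            # Training: append the *true* values of the earlier labels to each input.
            Xj = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            yj = [y[j] for y in Y]
            self.models.append(Perceptron(len(Xj[0]), self.epochs).fit(Xj, yj))
        return self

    def predict(self, x) -> List[int]:
        preds: List[int] = []
        for model in self.models:
            # Prediction: earlier labels are the chain's own predictions.
            preds.append(model.predict(list(x) + preds))
        return preds


# Hypothetical toy data with a label dependency: y0 = x0 and y1 = y0 AND x1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 0], [1, 0], [1, 1]]
cc = ClassifierChain(n_labels=2).fit(X, Y)
print([cc.predict(x) for x in X])  # → [[0, 0], [0, 0], [1, 0], [1, 1]]
```

The order of labels in the chain is a design choice; in this toy example the second classifier can exploit the first label directly instead of having to rediscover it from the raw features.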

Data Stream Classification

Many real-world applications arise in the context of data streams, where data instances arrive continuously in a theoretically infinite stream, for example in sensor networks, online social media, news feeds, and large e-mail deployments.

In this context, methods must be able to process large volumes of data quickly, learn and make predictions in real time, and detect and adapt to concept drift, that is, changes in the underlying data distribution over time.
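The requirement to learn in real time and react to concept drift can be illustrated with a prequential (test-then-train) loop combined with the Drift Detection Method (DDM) of Gama et al. (2004), which monitors the running error rate p and its standard deviation s and signals drift when p + s rises well above its historical minimum. This is a minimal sketch under stated assumptions: the `MajorityClass` learner, the `toy_stream` generator, and the thresholds (30 instances, 2 and 3 standard deviations) are illustrative choices, and the learner ignores input features for brevity.

```python
import math
from collections import Counter


class MajorityClass:
    """Trivial incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, y):
        self.counts[y] += 1


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a stream of 0/1 errors."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.n = 0
        self.p = 0.0           # running error rate
        self.p_min = math.inf  # error rate where p + s was lowest
        self.s_min = math.inf  # standard deviation at that point

    def update(self, error):
        self.n += 1
        self.p += (float(error) - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def toy_stream(n=1000, change_at=500):
    """Hypothetical stream: the label switches from 0 to 1 at change_at;
    every 10th label is flipped as noise."""
    for i in range(n):
        y = 0 if i < change_at else 1
        yield 1 - y if i % 10 == 9 else y


def prequential(stream):
    model, detector = MajorityClass(), DDM()
    correct, drifts = [], []
    for i, y in enumerate(stream):
        y_hat = model.predict()  # test first ...
        correct.append(y_hat == y)
        if detector.update(y_hat != y) == "drift":
            drifts.append(i)  # drift: discard the outdated model and statistics
            model, detector = MajorityClass(), DDM()
        model.learn(y)  # ... then train on the same instance
    return correct, drifts


correct, drifts = prequential(toy_stream())
print("drift detected at", drifts)
print("accuracy on last 300 instances:", sum(correct[-300:]) / 300)
```

Without the detector, the majority-class learner would keep predicting the old label long after the change; resetting on a drift signal is the simplest adaptation strategy, and more refined methods retrain on the instances seen since the warning level was first reached.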

Applications

Some applications involving sensor data that I have worked on, with real-world sensor deployments:

Learning to predict a traveller’s route and destination

At Aalto University, I was involved in the Traffic Sense - Energy Efficient Traffic with Crowdsensing project, working on route recognition and prediction. Given only about a week of location data from a mobile phone, it was possible to make reasonably accurate predictions of a traveller’s route and future destination. See the
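A very simple way to turn a short history of trips into next-place and destination predictions is a first-order Markov model over discrete places. The sketch below is an illustrative assumption, not the method used in the project: places are hypothetical symbolic labels (in practice they would come from clustering raw location data), `NextPlaceModel` is a made-up name, and the destination is obtained by greedily following the most frequent transitions.

```python
from collections import Counter, defaultdict


class NextPlaceModel:
    """First-order Markov model over discrete places (a hypothetical simplification)."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # place -> Counter of next places

    def fit(self, trips):
        for trip in trips:
            for here, there in zip(trip, trip[1:]):
                self.transitions[here][there] += 1
        return self

    def predict_next(self, place):
        counts = self.transitions.get(place)  # .get avoids creating empty entries
        return counts.most_common(1)[0][0] if counts else None

    def predict_route(self, start, max_steps=20):
        """Greedily follow the most frequent transitions; the last place is the
        predicted destination. Stops at a place with no history or a revisit."""
        route, seen = [start], {start}
        for _ in range(max_steps):
            nxt = self.predict_next(route[-1])
            if nxt is None or nxt in seen:
                break
            route.append(nxt)
            seen.add(nxt)
        return route


# Hypothetical week of trips, already mapped to symbolic places.
trips = (5 * [["home", "station", "office"]]
         + 3 * [["office", "station", "home"]]
         + 2 * [["home", "gym"]])
model = NextPlaceModel().fit(trips)
print(model.predict_next("home"))   # → station
print(model.predict_route("home"))  # → ['home', 'station', 'office']
```

A first-order model only looks at the current place; a realistic predictor would also need to account for context such as time of day and for places that never appeared in the training week.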

Tracking on very low-power sensor motes

In the Comonsens project in Spain I worked on formulating and implementing a distributed particle filter on very low-power motes for target tracking. For more information, see the
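For readers unfamiliar with particle filters, the sketch below shows a minimal centralised bootstrap particle filter tracking a target in one dimension: particles are propagated through a motion model, weighted by how well they explain each sensor reading, and resampled. It is not the distributed formulation developed for the motes; the constant-velocity motion model, the Gaussian sensor model, the function name `particle_filter`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def particle_filter(observations, n_particles=500, velocity=1.0,
                    process_std=0.5, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D target (assumed constant-velocity model)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]  # initial belief
    estimates = []
    for z in observations:
        # 1) Predict: propagate each particle through the assumed motion model.
        particles = [p + velocity + rng.gauss(0.0, process_std) for p in particles]
        # 2) Update: weight particles by the Gaussian likelihood of the reading z.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # every particle implausible: fall back to uniform weights
            weights, total = [1.0] * n_particles, float(n_particles)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # 3) Resample (multinomial) so particles concentrate where the weight is.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


sim = random.Random(42)
truth = [float(t) for t in range(1, 31)]        # target moves +1 per step
obs = [x + sim.gauss(0.0, 1.0) for x in truth]  # noisy position readings
est = particle_filter(obs)
print(f"final truth {truth[-1]:.1f}, estimate {est[-1]:.2f}")
```

Multinomial resampling via `random.Random.choices` keeps the sketch short; systematic resampling is a common lower-variance alternative, and on low-power hardware the number of particles is the main knob for trading accuracy against computation.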

Modelling tree growth in Scots pine

In the MultiTree project (Multi-scale modelling of tree growth, forest ecosystems, and their environmental control), I worked with forestry scientists to model the intra-annual growth of pine trees in Finland and France using machine learning methods.