Data Science and Mining team (DaSciM), LIX Laboratory, École Polytechnique.


Multi-label Classification

In multi-label classification, multiple target variables are modelled and predicted together for each instance, as opposed to the traditional learning problem where a single target variable (class) is predicted. The main challenge is detecting and modelling dependencies among labels while remaining computationally tractable on large problems.
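To illustrate why label dependencies matter, here is a minimal sketch on made-up data. It compares predicting each label independently with predicting the most frequent label combination as a whole. The toy observations and the function names (`marginal_mode`, `joint_mode`) are assumptions for illustration only, not a method from any particular paper or library.

```python
from collections import Counter

# Hypothetical toy data: label vectors (y1, y2) observed for one and the
# same input x. The counts are invented purely for illustration.
observations = [(0, 0)] * 4 + [(1, 0)] * 3 + [(1, 1)] * 3


def marginal_mode(label_vectors):
    """Predict each label on its own, ignoring the other labels."""
    n_labels = len(label_vectors[0])
    prediction = []
    for j in range(n_labels):
        counts = Counter(y[j] for y in label_vectors)
        prediction.append(counts.most_common(1)[0][0])
    return tuple(prediction)


def joint_mode(label_vectors):
    """Predict the most frequent label combination as a whole."""
    return Counter(label_vectors).most_common(1)[0][0]


independent = marginal_mode(observations)
joint = joint_mode(observations)

print("independent prediction:", independent)
print("joint prediction:      ", joint)
```

On this toy data the two strategies disagree: the independent prediction combines the most likely value of each label, which is not the most frequently observed combination. Counting full combinations does not scale, since the number of combinations grows exponentially with the number of labels, which is where the tractability concern comes in.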

Multi-label learning is relevant to many domains, for example text categorisation (a document belongs to multiple categories), scene classification (each image is associated with multiple concepts or objects) as well as video and other media, medical classification, and applications in microbiology.


The general case of multi-output prediction includes the regression case; it is a particular type of structured output prediction. It has close connections with other topics (which are also research interests), including

  • probabilistic graphical models

  • neural networks

  • time series forecasting

  • models for sequence learning

  • structured-output prediction.

Data Stream Classification

Many real-world applications are found in the context of data streams, where data instances arrive continuously in a theoretically-infinite stream, for example in sensor networks, online social media, news feeds, and large deployments of e-mail.

In this context, methods must be able to process large volumes of data quickly and learn and make predictions in real time, as well as detect and adapt to concept drift.
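As a rough sketch of these requirements, the following toy example evaluates a learner in a test-then-train (prequential) fashion on a stream whose majority class flips halfway through. The stream, the `WindowedMajority` learner, and all parameter values are hypothetical and chosen only for illustration; a sliding window is just one simple way to adapt to drift.

```python
from collections import Counter, deque


def make_stream(n=200, drift_at=100):
    """Yield class labels; the label flips from 0 to 1 at `drift_at` (abrupt drift)."""
    for t in range(n):
        yield 0 if t < drift_at else 1


class WindowedMajority:
    """Toy incremental learner: predicts the majority class of what it remembers.

    window_size=None keeps every example seen so far; a finite window
    forgets old examples and can therefore follow a drifting concept.
    """

    def __init__(self, window_size=None):
        self.window = deque(maxlen=window_size)

    def predict(self):
        if not self.window:
            return 0  # arbitrary default before any data has arrived
        return Counter(self.window).most_common(1)[0][0]

    def learn(self, y):
        self.window.append(y)


def prequential_accuracy(model, stream):
    """Test-then-train: predict each example first, then learn from it."""
    correct = total = 0
    for y in stream:
        correct += int(model.predict() == y)
        model.learn(y)
        total += 1
    return correct / total


full_memory = prequential_accuracy(WindowedMajority(None), make_stream())
windowed = prequential_accuracy(WindowedMajority(10), make_stream())

print("accuracy, full memory:", full_memory)
print("accuracy, window of 10:", windowed)
```

The learner that never forgets keeps predicting the old majority class long after the change, whereas the windowed learner recovers after a few examples. The window size trades stability on a stationary stream against speed of adaptation.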


Some applications involving sensor data from real-world deployments that I have worked on:

Learning to predict a traveller’s route and destination

At Aalto University I was involved in the Traffic Sense - Energy Efficient Traffic with Crowdsensing project, working on route recognition and prediction. Given only a week or so of location data from a mobile phone, it was possible to make reasonably accurate predictions about the traveller’s route and future destination. See the

Tracking on very low-power sensor motes

In the Comonsens project in Spain I worked on formulating and implementing a distributed particle filter on very low-power motes for target tracking. For more information, see the

Modelling tree growth in Scots pine

In the project MultiTree - Multi-scale modelling of tree growth, forest ecosystems, and their environmental control, I worked with forestry scientists to model intra-annual growth of pine trees in Finland and France using machine learning methods.