Luc De Raedt (KU Leuven)
Can we automate data science?
Inspired by recent successes in automating highly complex jobs such as automatic programming and scientific experimentation, I want to automate the task of the data scientist when developing intelligent systems. In this talk, I shall introduce some of the challenges involved and some possible approaches and tools for automating data science.
More specifically, I shall discuss how automated data wrangling approaches can be used for pre-processing, and how predictive and descriptive models can in principle be combined to automatically complete spreadsheets and relational databases. Special attention will be given to the induction of constraints in spreadsheets and in an operations research context.
Johannes Fürnkranz (Technische Universität Darmstadt)
The Need for Interpretability Biases
In his seminal paper on bias in machine learning, Mitchell defined bias as “any basis for choosing one generalization over another, other than strict consistency with the observed training instances”, such as the choice of the hypothesis language or any form of preference relation between its elements. The most commonly used form is a simplicity bias, which prefers simpler hypotheses over more complex ones, even when the latter provide a better fit to the data. Such a bias not only helps to avoid overfitting, but is also commonly considered to foster interpretability. In this talk, we will question this assumption, in particular with respect to commonly used rule learning heuristics that aim at learning rules that are as simple as possible. We will, on the contrary, argue that in many cases short rules are not desirable from the point of view of interpretability, and present evidence from crowdsourcing experiments that supports this hypothesis. To understand interpretability, we must relate machine learning biases to cognitive biases, which lead humans to prefer certain explanations over others, even when such a preference cannot be rationally justified. Only then can we develop suitable interpretability biases for machine learning.
Tuuli Toivonen (University of Helsinki)
What can intelligent data analysis offer for nature conservation and sustainable spatial planning?
Spatial planning benefits from information on human activities and the motivations behind them. Understanding how people use space, where and when that happens, how they move, why they do so, and what motivations and preferences lie behind their activities allows more sustainable planning decisions in both urban and natural environments. Such decisions are urgent as we face some of the biggest challenges in the history of humanity, including climate change, biodiversity loss and rapid urbanization.
There is a growing interest in using big data, which may provide novel and rich information on human-environment interactions, as an information source for nature conservation. Compared to traditional surveys and interviews, new technologies allow cost-efficient and non-invasive approaches to collecting data. Geolocated social media posts, for example, contain near real-time information about the activities, observations and opinions of people that extends beyond the spatial and temporal reach of traditional data sources. Technological advancements are revolutionizing the collection and analysis of these data. Large volumes of data are available online for further analysis, but using them efficiently requires the adoption of automated data analysis methods, and careful consideration of data quality, representativeness and the ethics of their use.
In my talk, I examine the potential of, and concerns about, using machine learning and intelligent data analysis approaches together with big data sources to provide spatially explicit information on human activities in support of sustainable spatial planning and conservation. The talk is based on experiences gathered during the past years of multidisciplinary research at the Digital Geography Lab of the University of Helsinki.