COHORT.1

2019 – 2023

Agatha Mattos

Student

Project Title

Mapping Slums with Machine Learning

Project Description

It is estimated that over one billion people live in slums without access to public services such as water and sanitation. Knowing the location and extent of these settlements is critical for monitoring the United Nations’ Sustainable Development Goal of ensuring adequate housing and basic services for all by 2030. However, detecting slum areas at a global scale remains an open problem. Currently, the main source of information about the percentage of the world’s urban population living in slums is housing census surveys, which are labour-intensive, time-consuming and require substantial financial resources. An alternative is to use passively collected data, such as satellite imagery, together with image processing techniques to map these settlements. In this context, recent reviews show that machine learning approaches to this task are still scarce but show promising results.

The proposed research seeks to advance the development of a global slum inventory. The first objective is to develop a new algorithm for mapping slums. To achieve this, a georeferenced dataset of slum communities made available by the Brazilian government will be used. The characteristics of this dataset make it well-suited for this research. Firstly, the slum areas are already georeferenced and labelled, a step that is often very time-consuming in machine learning projects. Secondly, it covers many cities, which allows models to be trained in different urban contexts; this addresses a demonstrated limitation of current studies and would consequently advance the state of the art on the topic. The second objective is to utilise machine learning techniques to quantify the population living in these settlements. To assess the suitability of the algorithms developed in this research, the results will be compared with official census data. The models will also be evaluated in terms of the cost of acquiring the images, computational complexity and the generalisability of the approach to other regions.
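As an illustration of the first objective, the following is a minimal sketch of how a tile-level slum classifier might be trained on labelled satellite imagery. The tile size, model architecture and synthetic data are assumptions for illustration only, not details of the project:

```python
# Minimal sketch: tile-level binary classification (slum / non-slum) on
# satellite image tiles. Synthetic tensors stand in for real georeferenced
# tiles; in practice these would be cut from imagery aligned with the
# labelled slum polygons.
import torch
import torch.nn as nn

class TileClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, 1)  # assumes 64x64 input tiles

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# Placeholder batch: 32 RGB tiles of 64x64 pixels with binary labels.
tiles = torch.randn(32, 3, 64, 64)
labels = torch.randint(0, 2, (32, 1)).float()

model = TileClassifier()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(5):
    optimiser.zero_grad()
    loss = loss_fn(model(tiles), labels)
    loss.backward()
    optimiser.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

In practice a segmentation model could replace the tile classifier to delineate settlement boundaries rather than score whole tiles.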

Bahavathy Kathirgamanathan

Student

Project Title

Time series techniques for return to play predictions using wearable sensor data

Project Description

Injuries are extremely common in any sport. When an athlete is injured, a decision must be made on when they are ready to return to play, which is crucial in managing the risk of re-injury. Wearable sensors such as Inertial Measurement Units (IMUs) allow motion characteristics to be captured during motor tasks such as lunging, squatting and walking. This is giving rise to an emerging field of ‘digital biomarkers’, in which digitally captured biomechanical features are used to train models that characterise the various stages of recovery and help identify when the athlete is fit to return to normal activity.

In this research we will develop models to identify the potential role that such digital biomarkers could play in the sports injury field. In particular, motion data from functional motor tasks will be used to interrogate their role in recovery following anterior cruciate ligament (ACL) injury and subsequent corrective surgery.

One of the challenges in implementing this study will be limited data availability and, more particularly, the difficulty of obtaining labelled data. When working with wearable sensor data from athletes, labelling is time-consuming and requires domain-specific knowledge. To overcome this limitation, methods such as transfer learning and semi-supervised learning will be investigated. Transfer learning offers the potential to leverage knowledge from previously trained models, either from the same or from a different domain. It has recently been applied successfully to images and text, but there has been less work on sensor (time series) data, which leaves scope for novel methods to be developed.
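To make the transfer learning idea concrete, here is a minimal sketch in which a 1D CNN feature extractor, assumed to have been pretrained on a larger source task, is frozen while only a small classification head is fitted to scarce labelled target data. The channel counts, recovery-stage labels and synthetic data are illustrative assumptions:

```python
# Minimal transfer-learning sketch for time series sensor data: a 1D CNN
# feature extractor (pretend it was pretrained on a large source task) is
# frozen and a new head is fine-tuned on scarce labelled target data.
import torch
import torch.nn as nn

def make_encoder():
    return nn.Sequential(
        nn.Conv1d(6, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    )

encoder = make_encoder()          # stand-in for a pretrained encoder
for p in encoder.parameters():    # freeze the transferred layers
    p.requires_grad = False

head = nn.Linear(64, 3)           # e.g. three recovery stages (assumed)

# Placeholder target data: 16 labelled trials, 6 IMU channels, 200 samples.
x = torch.randn(16, 6, 200)
y = torch.randint(0, 3, (16,))

optimiser = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(10):
    optimiser.zero_grad()
    loss = loss_fn(head(encoder(x)), y)
    loss.backward()
    optimiser.step()
```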

This study aims to investigate new data-driven approaches that can aid clinicians in making informed decisions on when an athlete is fit to return to play. The plan is to use pre-season digital data (IMU sensor data) and analogue measures (physical measures such as reach distance) from athletes who have sustained ACL injuries.

Ciara Feely

Student

Project Title

Applying Recommender System Techniques to Support Physical Exercise, Particularly for Endurance Sports

Project Description

The human body has not yet adapted properly from the hunter-gatherer lifestyle to a more sedentary one, leading to a rise in dangerous, avoidable diseases including obesity, type 2 diabetes, and osteoporosis (Smyth, 2019). People are realising the importance of living a healthier, more active lifestyle, and advances in wearable sensors and mobile fitness applications reflect this. These technologies allow users to track their activities and set goals, but they do not take an active role in prescribing specific training and recovery activities for users (Smyth, 2019).

Endurance sports are an attractive context from a machine learning perspective for several reasons (Smyth, 2019). Firstly, a large number of people participate in endurance events each year, many of them inexperienced individuals who need assistance. Secondly, the aforementioned rise in fitness technologies means that a wealth of data is available from mobile fitness apps such as Strava. Lastly, there are a number of interesting problems to be solved using machine learning techniques, such as fitness level estimation, training session classification, recovery and injury prediction, personalised training program development, goal time prediction and pacing planning (Smyth, 2019).

Prior work in this area applied recommender system techniques to predict a personal-best marathon finish time and pacing plan for a user (Smyth, 2017). Initially this was done using runners who had completed two or more marathons. The finish-time prediction was accurate for fast runners, but less so for slower runners, who would likely be those who benefit most from such a prediction. A later paper (Smyth, 2018) largely improved on this by using a richer training history for each runner. However, these methodologies do not allow first-time marathon runners to obtain a predicted finish time.

Therefore, this project will extend this previous work to inexperienced marathon runners by using data from runs over other distances in place of missing marathons, as well as working on some of the other tasks mentioned. The explainability of the recommendations made to users will also be a focus of this project, since it will allow users to understand why specific training activities are being suggested to them.
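For intuition, here is a minimal sketch of the neighbour-based style of finish-time prediction underlying this line of work. The features, values and choice of k are illustrative assumptions, not the published method or dataset:

```python
# Minimal sketch of neighbour-based marathon finish-time prediction:
# average the finish times of the k most similar runners. Features and
# data are illustrative placeholders.
import numpy as np

# Each runner: [weekly km, mean training pace (min/km), longest run (km)]
train_X = np.array([[60, 5.0, 30], [40, 5.8, 25], [80, 4.4, 35], [30, 6.3, 20]])
train_y = np.array([195.0, 230.0, 175.0, 260.0])  # finish times in minutes

def predict_finish_time(query, k=2):
    """Average the finish times of the k nearest runners in feature space."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean()

print(predict_finish_time(np.array([50, 5.5, 28])))
```

In practice the features would be normalised, neighbours could be weighted by similarity, and richer training histories would replace these simple summary features.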

Smyth, B. and Cunningham, P. (2017). “A novel recommender system for helping marathoners to achieve a new personal-best.”

Courtney Ford

Student

Project Title

Explaining With Cases: Computational & Psychological Explorations in Explainable AI (XAI)

Project Description

Artificial Intelligence (AI) systems are playing an increasing role in decision-making tasks across a variety of industry sectors and government bodies. As such, interactions between human users and AI systems are becoming much more commonplace, and there is a pressing need to understand how people can make sense of these systems and come to trust their abilities on diverse and critical tasks. However, these developments raise two fundamental problems: (i) how can we explain the black-box decision processes of these AI systems, and (ii) what type of explanation strategy will work best for people interacting with these systems?

Recently, the field of eXplainable AI (XAI) has emerged as a major research effort, underpinned by its own DARPA program (Gunning, 2017 DARPA Report), to find answers to these questions. For example, Kenny and Keane (2019, IJCAI-19) have proposed a Twin Systems approach to explain the decisions, classifications and predictions of deep learning systems by mapping the feature weights of a black-box AI into a much more interpretable case-based reasoning (CBR) system to find explanatory cases. This type of post-hoc explanation-by-example has a long history in the CBR literature but is marked by a paucity of user studies; that is, it is not at all clear whether people find these case-based explanations useful.

The proposed research will explore both computationally and psychologically the most effective ways in which cases can be used to explain black-box AI systems. Computationally, new algorithmic methods for finding different types of cases will be developed (e.g., to find counterfactual, semi-factual and factual cases) and explored in the context of the Twin Systems approach involving the three main data types used in deep learning systems (i.e., images, text and tabular data). Psychologically, user studies will be performed to evaluate the explanatory validity of case-based explanations and to identify the optimal forms these might take to aid human users.
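As a concrete illustration of case-based explanation, here is a minimal sketch of retrieving a factual and a counterfactual explanatory case in a feature space assumed to have been extracted from the black-box model. The features and labels are random placeholders, not the Twin Systems implementation:

```python
# Minimal sketch of explanatory case retrieval: given a query and the
# black-box's predicted label, return the nearest same-class case (factual)
# and the nearest case from another class (counterfactual). The feature
# space stands in for one distilled from the black-box model.
import numpy as np

rng = np.random.default_rng(0)
case_features = rng.normal(size=(100, 8))   # placeholder case base
case_labels = rng.integers(0, 2, size=100)

def explanatory_cases(query, predicted_label):
    dists = np.linalg.norm(case_features - query, axis=1)
    same = np.where(case_labels == predicted_label)[0]
    other = np.where(case_labels != predicted_label)[0]
    factual = same[np.argmin(dists[same])]
    counterfactual = other[np.argmin(dists[other])]
    return factual, counterfactual

f, cf = explanatory_cases(rng.normal(size=8), predicted_label=1)
print(f"factual case: {f}, counterfactual case: {cf}")
```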

Outcomes from the work will be (i) a generic computational framework that can be applied to any decision-making AI system, and (ii) definitive knowledge about how and which cases may be deployed to accurately explain the decision processes of such AI systems. Together these will provide a generic framework and solution to the XAI problem in the context of post-hoc explanation-by-example, helping users to better understand AI systems and to be more satisfied with, and trusting of, their decision-making processes.

David Kilroy

Student

Project Title

Ethical Consumer Sensing and Product Discovery

Project Description

In order to deliver new and innovative products in a timely manner, and to respond to growing consumer demands, companies need to mine and analyse a large number of noisy data sources (both web and in-house). The goal is to obtain a holistic view of emerging consumer fads and trends as early as possible in their hype cycle. However, current legislation (e.g. GDPR) prevents the blind analysis of any (potentially) personal or socially sensitive dataset beyond its intended collection purpose without the express informed consent of the affected individuals. This project aims to investigate novel automated solutions for ethical social listening: identification of trends, consumer sentiment, emerging fads and ideas, and opportunity analysis. This includes leveraging existing methods, as well as developing novel ones, to distil product reviews/ratings, search volumes and search ranks using in-house as well as external sources of (web) data. Key challenges in this project will be “intelligently” curating appropriate data sources (automated cleaning, transformation and modelling), handling mixed data, dealing with structured as well as unstructured data, and operating within the domain of Fairness, Accountability, and Transparency in Machine Learning (FATML).

James Fitzpatrick

Student

Project Title

Optimization solutions for SmartGrid applications leveraging machine learning techniques
https://youtu.be/JNsQXdWirIU

Project Description

EU directives set out targets for renewable electricity, heat and transport. Low carbon technologies (LCTs) such as heat pumps (HPs) and electric vehicles (EVs) are critical components of the Government of Ireland’s Climate Action Plan to respond to the need to decarbonise heating and transport. The rate of adoption of these technologies is uncertain, but they pose considerable new challenges if they are to become fully integrated components of the national infrastructure. In particular, the adoption of EVs on a large scale makes it necessary to solve large-scale, modified routing problems with many constraints more urgently than before.

This project responds to the need to identify how energy systems can be transformed to be secure (reliable), clean (green and sustainable), and fair (ensuring the citizen is at the centre of, and benefits from the transformed system). The project aims, in particular, to explore the design demands of electric vehicle routing problems and how they might be resolved efficiently and effectively.

Specifically, for problems such as the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP), the traditional approaches of integer linear programming and constraint programming solvers do not scale to larger problem instances. The project will build on existing supervised, unsupervised and reinforcement learning techniques to develop scalable solutions for these optimisation problems. These machine learning techniques will be used to improve algorithmic efficiency in terms of solution time, and to solve more difficult routing problems with more constraints, while enabling developers to automatically design their algorithms for specific problem variants. Building on recent advances in these fundamental machine learning areas, the project will aim to support the energy system transformation by providing fast, scalable, ML-assisted approaches to vehicle routing problems (VRPs) and, in particular, electric vehicle routing problems (E-VRPs). The project will also have opportunities to explore the generalisation of these techniques to other difficult, important combinatorial optimisation problems by designing problem-independent techniques.
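For a sense of the classical baseline such methods aim to beat, here is a minimal nearest-neighbour construction heuristic for the TSP on random placeholder coordinates; learned construction policies typically replace the greedy next-node choice with a neural model:

```python
# Minimal sketch: nearest-neighbour construction heuristic for the TSP,
# a simple baseline that ML-assisted construction policies aim to improve.
import numpy as np

rng = np.random.default_rng(42)
coords = rng.random((20, 2))  # 20 customer locations in the unit square

def nearest_neighbour_tour(coords, start=0):
    """Repeatedly visit the closest unvisited node, starting from `start`."""
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        last = coords[tour[-1]]
        nxt = min(unvisited, key=lambda j: np.linalg.norm(coords[j] - last))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

tour = nearest_neighbour_tour(coords)
length = sum(np.linalg.norm(coords[tour[i]] - coords[tour[(i + 1) % len(tour)]])
             for i in range(len(tour)))
print(f"tour length: {length:.3f}")
```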

Qin Ruan

Student

Project Title

Unbiased News Recommender Systems

Project Description

Users are likely to look for news items that are consistent with their own cognitive and political views and to ignore news items that contradict them. Current mainstream personalised recommendation algorithms have exacerbated this so-called ‘filter bubble’ phenomenon. Research has shown that filter bubbles in the news domain can have serious effects, such as diminishing public discourse and fostering highly polarised views amongst users. This proposed project aims to build a state-of-the-art deep content-based news recommender system that can mitigate these adverse effects. Towards this goal, we identify a number of potential research tasks to discover accurate information behind news items and improve the performance of current news recommender systems. For example, we will explore how to generate unbiased news summaries, since using them as training data may reduce the tendency of recommender systems to provide biased guidance; we will study multi-source news summarisation to present users with comprehensive summaries of events and increase their awareness of the filter bubble effect; and we will build an end-to-end news recommendation model that takes advantage of cutting-edge deep learning and NLP technologies.
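As background for the recommendation component, here is a minimal sketch of content-based news scoring, ranking candidate articles by cosine similarity to a user profile averaged from reading history. The embeddings are random placeholders standing in for a learned text encoder; the debiasing strategies described above would sit on top of such a ranker:

```python
# Minimal sketch of content-based news recommendation: score candidates by
# cosine similarity between article embeddings and a user profile built
# from reading history. Embeddings are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
article_embs = rng.normal(size=(50, 16))   # 50 candidate articles
history_embs = rng.normal(size=(5, 16))    # user's 5 previously read articles

user_profile = history_embs.mean(axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(e, user_profile) for e in article_embs])
top5 = np.argsort(scores)[::-1][:5]
print("recommended article ids:", top5)
```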

Thu Trang Nguyen

Student

Project Title

Time Series Classification with Explanation
https://youtu.be/0N89eSP_pJ4

Project Description

If we have two different explanations from the same machine learning algorithm, or from two different machine learning algorithms, which explanation is better?

Can we quantitatively compare the usefulness of explanations by linking them to specific tasks, so that one explanation is judged better than another because it helps to solve the task better? For example, can a given explanation help a human or a machine improve the accuracy or speed of labelling a given set of examples? How do we objectively compare explanations for given tasks, and what are good ways to compute and compare the usefulness of explanations?

This project focuses on building supervised machine learning models in the context of sequence and/or time series classification, and on developing methods for providing explanations and assessing their usefulness in different applications, for example in the sports science or smart agriculture domains. We will start from deep learning methods and post-hoc explanation methods that aim to explain black-box models, such as CAM (Class Activation Maps), and compare these to state-of-the-art linear models (which are intrinsically easier to explain) and their associated explanations. When used in the time series classification task, such techniques aim to highlight the parts of the time series signal that are useful to the classifier in reaching a classification decision, the so-called discriminative parts of the signal.

Can we use these highlights/explanations to achieve higher classification accuracy with a second-stage classifier or a human, can we improve the robustness of labelling, and can we do all of this faster, thereby allowing us to quantify and compare the usefulness of different explanations?
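To illustrate how such highlights are computed, here is a minimal sketch of a Class Activation Map for a 1D CNN time series classifier, combining the target class’s classification weights with the convolutional feature maps. The untrained model and random series are purely illustrative:

```python
# Minimal CAM sketch for a 1D CNN time series classifier: apply the target
# class's classification weights to the convolutional feature maps to see
# which time steps contribute most to the decision.
import torch
import torch.nn as nn

conv = nn.Sequential(
    nn.Conv1d(1, 8, 7, padding=3), nn.ReLU(),
    nn.Conv1d(8, 16, 7, padding=3), nn.ReLU(),
)
gap = nn.AdaptiveAvgPool1d(1)
fc = nn.Linear(16, 2)  # two-class problem (assumed)

x = torch.randn(1, 1, 128)           # one univariate series of length 128
feats = conv(x)                      # (1, 16, 128) feature maps
logits = fc(gap(feats).flatten(1))   # standard forward pass
cls = logits.argmax(1).item()

# CAM: weight the feature maps by the class weights, sum over channels.
cam = (fc.weight[cls].view(1, 16, 1) * feats).sum(dim=1).squeeze()
print("most discriminative time step:", cam.argmax().item())
```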