COHORT.3

2021 – 2025

Alec Parise

Student

Project Title

Human-In-The-Loop Rule Based Feedback Towards Interactive Deep Learning


Ana Paula Moritz

Student

Project Title

“Robots or something?” : An ethnography of Artificial Intelligence and its effects on society

Project Description

As technology evolves, it has become increasingly common to encounter faulty systems and data that have a deep impact on people’s lives. The most prominent cases are in facial recognition and state surveillance, where innocent people are being arrested; gender bias in recruitment and credit, where women are excluded from hiring processes and denied credit because of “sexist” data sets; and hospital triage, where patients have died because the wrong data sets were used for classification models; and the list goes on. First and foremost: do people know what Artificial Intelligence (AI) is? Do they know when they are engaging with it? The answer is far from certain, so how can they be fully aware of whether they are being harmed by AI? What is currently seen in practice are algorithmic audits, which are key to tackling the source of the problem, but there is still a long way to go before these procedures are fully established and widely implemented. It still depends on companies and governments seeking out qualified people to conduct the audits, but what happens when there is no interest? This project proposes to listen to individual experiences, one by one, building a network and searching for patterns. We will take an ethnographic approach to data sets, as we do with people’s narratives, and that will be our starting point. The project initially consists of establishing the impact AI/ML systems have had, or are having, on people’s lives, and understanding precisely how people are, or were, affected. For that, interviews will be conducted and, where possible, recorded, so that we can produce not only oral but also visual records of people telling their stories. In the next phase of the project, we will apply topic modelling with Latent Dirichlet Allocation (LDA), which appears to be a powerful tool for recognizing patterns in discourse and an innovative method in ethnographic research.
When we draw from people’s experiences, we are able to build a more relatable narrative that can help us design systems that take those experiences into consideration. Bridges must be built between computer scientists, data scientists and social scientists in order to create AI systems, machine learning models and data sets that can be used in a non-harmful way.

Badrinath Singhal

Student

Project Title

Performance and Scalability of Recommendation Algorithms

Project Description

Recent development of new algorithms for recommender systems has followed two quite distinct tracks: (1) highly scalable linear algorithms, such as EASE and SLIM, which have been shown to work well on very large and sparse datasets; and (2) deep models, which learn user and item embeddings through a neural network architecture and are computationally intensive to train. A much less well explored approach in the state of the art is the Bayesian one, in which a generative model of the recommendation process is posited and an inference algorithm such as Markov Chain Monte Carlo or Variational Inference is applied to learn full distributions over the parameters of the model. As with deep models, a major challenge for Bayesian approaches is the computational intensity of the training process. This project will explore novel algorithms for recommendation, focusing on performance and scalability. We will consider whether a Bayesian model with tractable inference is feasible in the recommendation setting and examine the integration of a Bayesian approach with deep models or with simple scalable models. As well as tackling performance, other qualities of the recommendations will be considered, such as diversity and novelty.
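The scalable linear track can be made concrete with a minimal sketch of EASE, whose item-item weight matrix has a closed form (this is an illustrative NumPy implementation; the toy interaction matrix is invented):

```python
import numpy as np

def ease(X, lam=1.0):
    """Closed-form EASE (Steck, 2019): learn an item-item weight matrix B
    with a zero diagonal; recommendation scores are then X @ B."""
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularised Gram matrix
    P = np.linalg.inv(G)
    B = -P / np.diag(P)        # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)   # enforce the zero-diagonal constraint
    return B

# Toy binary interaction matrix: 3 users x 3 items
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 0.]])
scores = X @ ease(X)  # user 2 should score item 1 above the unseen item 2
```

At realistic catalogue sizes the matrix inverse dominates the cost, which is exactly the performance/scalability trade-off the project targets.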

Di Meng

Student

Project Title

The flavour of disorder: predicting intrinsically disordered regions in proteins by Deep Learning

Project Description

Proteins are the basis of life, and over the last few decades we have learned a great deal about them through genome sequencing projects and other massive-scale experiments. However, important aspects of proteins, such as their structure and function, remain elusive, and the experimental techniques devised to reveal them have not scaled up as quickly as the techniques that elucidate sequence or expression. We now know the sequences of well over one hundred million proteins, while structure or function is known for less than 0.1% of these. For decades, the paradigm that proteins form rigid, stable structures was essentially unquestioned; it is now clear that many proteins only partly fold to a native regular structure, are normally completely unfolded, or alternate between folded and unfolded states (semi-unfolded). By some estimates, up to 20% of amino acids in known proteins are in a disordered state. We currently have datasets comprising over 180,000 proteins for which disorder information is known in some form. The aim of this project is the prediction of disordered regions in proteins. The problem will be tackled with an array of Deep Learning techniques, which can learn the likely locations of disorder or semi-disorder from examples of proteins in which these locations are known experimentally. We will also examine these regions further to investigate disordered binding and variation in semi-disorder. Upon success, the results of the project may feed into the online Distill servers and improve the quality of their results. The Distill servers are a widely used tool, with millions of queries served from over 100 national and transnational internet domains around the world, and even a marginal improvement in their performance would benefit a large pool of scientists worldwide and help them further their research in biology, biotechnology, and drug design.
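As one hedged illustration of how per-residue disorder prediction is typically set up, the sketch below one-hot encodes a sliding window of amino acids around each residue, a common input representation for such predictors (the window size and encoding are illustrative choices, not this project's actual pipeline):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_windows(sequence, window=7):
    """One-hot encode a symmetric window of residues around each position.
    Positions beyond the ends of the chain stay as all-zero padding."""
    half = window // 2
    n = len(sequence)
    X = np.zeros((n, window * len(AMINO_ACIDS)))
    for i in range(n):
        for offset in range(-half, half + 1):
            j = i + offset
            if 0 <= j < n and sequence[j] in AA_INDEX:
                X[i, (offset + half) * 20 + AA_INDEX[sequence[j]]] = 1.0
    return X
```

A deep model would consume these per-residue feature rows (usually enriched with evolutionary profiles) and emit a disorder probability for each position.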

Eliane Birba

Student

Project Title

Machine Learning for Better Outcomes in Hematopathology Using Platelet-based Biomarkers and Mass Spectrometry Data

Project Description

The application of Artificial Intelligence (AI) in healthcare has increased significantly in recent years, driven mainly by an abundance of data and powerful, accessible tools. Millions of blood cells are evaluated for hematological diagnostics by clinicians every day, and this area offers significant opportunities for AI and machine learning: AI promises tools that can speed up and automate blood analysis for clinicians working in hematology. The proposed research seeks to employ AI in the analysis of hematopathology. The first objective is to develop a new algorithm for the detection of Preeclamptic toxemia (PET) in pregnant women. Annually, PET claims the lives of 50,000 mothers and 500,000 babies, and accurate, easily deployed detection tools do not exist. There is therefore an urgent, unmet need for accurate risk stratification tools for PET. This work will develop a new solution for risk stratification in PET. To achieve this, we will use machine learning to build a reliable and accurate test for preeclampsia, with a simple and easily interpretable score that can be deployed and implemented in widespread clinical use. This project will collaborate with the larger SFI-funded AI_PREMie project, taking advantage of the data collected and the collaborator expertise within that project. The second objective is to apply machine learning to mass spectrometry (MS)-based proteomics analysis. This is an emerging area for the application of machine learning techniques, with many open research opportunities. The study of proteomics is crucial for drug development, early diagnosis, and the monitoring of diseases. One main challenge is that the proteome (the set of proteins in a cell, tissue or organism) varies over time. AI and machine learning methods can therefore help deliver fast and accurate protein pattern recognition and classification.
This strand of the work will develop new techniques for applying ML to proteomics data, with a particular focus on making them accessible to clinicians.
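As a hedged illustration of the kind of simple, interpretable score described above, the sketch below fits a logistic regression to synthetic stand-in biomarker data and rounds its coefficients into an integer point score; all data, dimensions and names here are invented, not AI_PREMie data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for three platelet biomarkers (illustrative only)
rng = np.random.default_rng(0)
n = 200
healthy = rng.normal(loc=0.0, scale=1.0, size=(n, 3))
at_risk = rng.normal(loc=2.0, scale=1.0, size=(n, 3))
X = np.vstack([healthy, at_risk])
y = np.array([0] * n + [1] * n)

model = LogisticRegression().fit(X, y)

# Round coefficients into integer points, the kind of simple score a
# clinician could apply by hand to standardised biomarker values
points = np.round(model.coef_[0] / np.abs(model.coef_[0]).min()).astype(int)
train_accuracy = model.score(X, y)
```

The appeal of such point systems is that the weights, unlike a deep model's parameters, can be inspected and sanity-checked clinically before deployment.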

Faithful Onwuegbuche

Student

Project Title

A Deception Technique for Adaptive Intrusion Detection

Project Description

The rapid rise of the digital economy and the internet is driving business growth, but it is also introducing new cyber security risks. The Accenture 2020 state of cybersecurity report reveals that the three areas of cybersecurity protection with the largest increases in cost are network security, threat detection and security monitoring. To help mitigate network-based attacks from both insiders and outsiders, researchers have developed honeypots: deception defenses used to divert attackers’ attention from the real systems or networks and to analyze attack methods and patterns of activity. They are also used to educate security professionals and support network forensics. However, traditional static honeypots can be detected easily using anti-honeypot toolkits, such as Honeypot Hunter, since they use a fixed configuration and response. When a honeypot is detected, an attacker can tamper with the evidence it has collected and attempt an attack on the real system. To help overcome these weaknesses, researchers have proposed dynamic honeypots, which can change their configuration, making it more difficult for an attacker to determine where valuable assets are located. Dynamic honeypots are usually deployed on a centralized host to support automation; however, if this host is compromised, it can lead to the breakdown of the real system. Blockchain can be used to address this problem, since it is distributed and decentralized. Because of this decentralization, every network node shares the computational load, giving the system better robustness. The advantage of blockchain rests on the fact that data cannot be tampered with, since any change would be revealed by the nodes connected to the network; moreover, if one host is compromised, the same information is still held by the other hosts in the network.
Therefore, honeypots and honeynets deployed with blockchain integration can better support network forensics, as they can prevent fraud and data theft while offering more auditable features. The objective of this project is to develop a novel deception technique for adaptive intrusion detection. The proposed system will implement honeytokens that redirect attackers from the real server or network to the blockchain-based honeynet, in order to trap the attackers and then track their activities, patterns and methods. This information will be used to update the intrusion detection system’s knowledge through online machine learning. Additionally, the honeynet will store data that could serve as digital evidence during forensic investigations or provide information about a security incident.
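The tamper-evidence property attributed to blockchain above can be illustrated with a minimal hash-chained log, a simplified stdlib-only sketch of how honeypot records could be made auditable (this is not the project's actual design, and a real deployment would replicate the chain across nodes):

```python
import hashlib
import json

class TamperEvidentLog:
    """Append-only log where each record hashes over the previous record's
    hash, so any later modification breaks the chain."""

    def __init__(self):
        self.entries = []

    def _digest(self, event, prev):
        payload = json.dumps({"event": event, "prev": prev}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(self, event):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        self.entries.append(
            {"event": event, "prev": prev, "hash": self._digest(event, prev)}
        )

    def verify(self):
        prev = "0" * 64
        for record in self.entries:
            if record["prev"] != prev or record["hash"] != self._digest(record["event"], prev):
                return False
            prev = record["hash"]
        return True
```

Distributing copies of such a chain across honeynet hosts is what gives the forensic record its resilience: an attacker would have to rewrite every replica consistently.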

Jean Francois Itangayenda

Student

Project Title

Bias in Financial Lending Models

Project Description

This research will delve into machine learning models and their perceived and received bias in financial lending transactions. By perceived bias, we mean bias as it is seen and handled by model developers: for example, model builders, intending to increase fairness, may through their own judgment exclude certain variables, features and categories in order to obtain a desired output. We want to understand how this act of removing certain items from models can “bias” or “influence” their outputs. By received bias, we mean bias that a model might introduce into its own process (training or testing) through data corruption, noise, and similar effects. We want to understand, from a socio-technical perspective, how this received bias, whether it comes from model training, datasets or the models themselves, affects model outcomes. The project will be two-sided, examining both the social impacts of bias in machine learning algorithms that predict and recommend loan refusal or acceptance for clients of financial institutions, and the technical side: the definition(s) of fairness in financial systems with regard to lending; the socio-technical forces that may unknowingly impact data collection (for example, causing its corruption) and training; the demands of developing and running these models (their socio-technical cost-benefit, how they are deployed, how the data is collected, and who or what controls the machines that run them, be it financial institutions, governments or others); and how all of these factors ultimately impact the models’ outputs.
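One common formalisation of the fairness questions raised above is demographic parity, the gap in approval rates across groups; a minimal sketch follows (the metric choice and the toy decisions are illustrative, not a definition the project has committed to):

```python
import numpy as np

def demographic_parity_diff(approved, group):
    """Absolute difference in approval rates between two groups.
    `approved` holds 0/1 lending decisions; `group` holds 0/1 membership."""
    approved = np.asarray(approved, dtype=float)
    group = np.asarray(group)
    return abs(approved[group == 0].mean() - approved[group == 1].mean())
```

Demographic parity is only one of several competing fairness definitions (equalised odds and calibration are others), and part of the project's socio-technical question is precisely which definition is appropriate for lending.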

John O'Meara

Student

Project Title

Development of a Novel Intrusion Detection System and Architecture-specific Datasets in Software-Defined Networking

Project Description

Software Defined Network (SDN) technology allows for more efficient scaling of networks through central programming of network behaviour using software applications with open APIs. The separation of the control plane from the forwarding plane of the physical hardware allows for a consistent network management strategy regardless of network size or complexity. Unfortunately, as with any new technology, these benefits are accompanied by a host of new threats, with the revised infrastructure providing new attack vectors for malicious actors intent on penetrating and/or disrupting network activity. The relatively nascent status of SDN technologies makes the development of effective Intrusion Detection Systems (IDS) difficult. There is a lack of available SDN-specific datasets, resulting in the deployment of IDS software that has been developed using unsuitable data collected from traditional networks, ignoring the architectural differences of SDN networks. The aim of this research is to focus specifically on the novel architecture of SDN technologies and to develop an IDS framework tailored to the unique architectures of SDN, effectively identifying and blocking attacks that target SDN-specific characteristics, in addition to the range of attacks to which standard networks are prone. The research will focus on generating new SDN-specific datasets by deploying different SDN architectures, both in virtual form and using physical devices, allowing for the collection of more intrinsic data. Effective IDS can then be developed by training Machine Learning (ML) models on these new datasets. Standard supervised ML models as well as unsupervised Deep Neural Network (DNN) and Reinforcement Learning (RL) models will be developed and evaluated. A series of challenges is expected within the proposed body of work.
The specific points of difference in architecture between traditional networks and SDN networks will have to be identified, and attack vectors designed and/or replicated. Appropriate test-bed architectures must be chosen and implemented. The choice of ML model will depend on the dataset structure and will need to be tailored to this specific use case. Finally, appropriate validation systems will need to be crafted to comprehensively test the effectiveness of candidate IDS frameworks once developed.
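As one hedged illustration of anomaly-based intrusion detection on flow statistics, the sketch below trains scikit-learn's IsolationForest on synthetic "normal" flow features and flags an outlying flow; the features and values are invented, not SDN data from this project:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for per-flow statistics (e.g. packet count and mean
# inter-arrival time) that an SDN controller could export
rng = np.random.default_rng(0)
normal_flows = rng.normal(loc=0.0, scale=0.1, size=(200, 2))

detector = IsolationForest(random_state=0).fit(normal_flows)

suspect_flow = np.array([[5.0, 5.0]])       # far outside normal traffic
label = detector.predict(suspect_flow)      # -1 = anomaly, 1 = normal
```

An SDN controller's global view makes collecting such per-flow features straightforward, which is one reason SDN-specific datasets are worth building rather than reusing traditional-network captures.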

Joyce Mahon

Student

Project Title

Integrating Machine Learning and Artificial Intelligence into Pre-University Education

Project Description

Artificial Intelligence (AI) and Machine Learning (ML) technologies are rapidly generating new possibilities for all industries. For instance, it is now possible to automatically generate subtitles for videos at little cost, enabling people with hearing loss to use video chat tools that were previously unavailable to them. However, these technologies can also be used for more nefarious purposes, for example automating the generation of malicious online social media content used to undermine democratic processes. The contrast between these two use cases of the same technologies is a perfect illustration of how new ML and AI technologies are having a huge impact on our lives, often in ways that raise technological, legal, ethical and societal issues. Our current and future pre-University students will grow up in a world where these technologies are commonplace, and they will develop the next generation of these technologies. It is important that they not only understand how these technologies work, but are also equipped to make informed judgements about which technologies they want in their lives and society, and how they would like to use them. This work will expose a wider and more diverse audience to ML and AI tools such as analytics, speech recognition, and natural language processing by developing materials and strategies for their use across a range of disciplines, building opportunities for students and teachers to see these tools in action and become involved in their design and application. This work also has a secondary aim of improving diversity in computer science education. The research questions that this project will address include:
• What are the current opinions of Irish second-level students on what ML and AI should be used for?
• What are the best ways to engage with Irish second-level students on the topic of ML and AI?

Laura Dunne

Student

Project Title

Bus Network Optimisation with Machine Learning

Project Description

Buses are a vital component of an urban environment, and shifting away from private cars towards public transport is essential to minimising our environmental impact and creating sustainable cities. UN Sustainable Development Goal 11 seeks to “Make cities and human settlements inclusive, safe, resilient and sustainable”; specifically, target 11.2 states that cities should expand public transport. Unfortunately, there is a trend away from public transportation and towards private cars, driven by passenger dissatisfaction with public transport networks. With increased urbanisation, however, continued widespread use of private cars is unsustainable, and we must make bus transport an attractive option for passengers. Many factors influence a passenger’s transport choices, but convenient routing options and reliable service are frequently reported unmet needs. Unfortunately, there are physical and financial limits on the service provided, so it is crucial to optimise the available resources to provide the best possible service. The proposed research seeks to deliver better scheduling and better route design by applying machine learning (ML) to several under-exploited areas in the bus transit domain. Researchers have demonstrated that ML can improve the efficiency of public transport, and the focus to date has been on the application of various ML algorithms. However, the results are often conflicting, and the experiments are usually conducted on a single bus route in a single city. We propose to examine a whole network of buses and to validate the transferability of our experiments on unseen routes, ideally from an unseen bus network. We also plan to address the conceptual model of the bus network, how the network is structured before ML modelling, and how this conceptual model interacts with various ML algorithms. Chokepoints are a significant factor making bus transport less reliable.
Chokepoints cause bus bunching, which has been shown to severely impact passenger service. Analysis has demonstrated that chokepoints in bus networks are caused by physical constraints, such as signalised intersections or bridges, and dynamic factors, such as weather or school collection times. We propose to work with OpenStreetMap data to analyse features that impact bus reliability and to train ML algorithms that can predict optimal bus routing. By applying ML to the bus transport domain, we hope to add knowledge that will help optimise bus networks.
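The bus bunching described above is often quantified by the irregularity of headways between consecutive arrivals at a stop; a minimal sketch follows (the coefficient-of-variation metric is a common choice in the transit literature, not necessarily the one this project will adopt):

```python
import numpy as np

def headway_cv(arrival_times):
    """Coefficient of variation of headways at a stop; values near zero
    mean a regular service, large values indicate bunching."""
    t = np.sort(np.asarray(arrival_times, dtype=float))
    headways = np.diff(t)
    return float(headways.std() / headways.mean())
```

Such a metric, computed per stop and per hour, could serve as a regression target when training models on OpenStreetMap-derived features like nearby signalised intersections or bridges.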

Ramin Ranjbarzadeh Kondrood

Student

Project Title

An attention-based mechanism for brain tumor segmentation using four modalities

Project Description

Brain tumor localization and extraction from magnetic resonance imaging (MRI) is a vital task in a wide variety of medical applications. Current strategies demonstrate good performance on non-contrast-enhanced T1-weighted MRI, but this does not hold for other modalities. Each modality captures different and vital information about the tissue under examination. We therefore propose an algorithm based on four modalities, T1, T1c, T2, and FLAIR, for segmenting the tumor region with high accuracy. To increase the efficiency of the model and decrease evaluation time, a powerful pre-processing approach that removes insignificant areas of the brain is used. Also, to improve discrimination between the internal areas of the tumor in the segmentation result, an attention-based mechanism is employed. We will use the BraTS 2018 dataset, which comprises multi-modal MRI images. Each patient’s sample in the dataset has dimensions of 240×240×155 and was annotated by specialists.
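The pre-processing step of removing insignificant areas can be illustrated by cropping a volume to the bounding box of its nonzero voxels, a common first step for MRI segmentation pipelines (this is a generic sketch, not the project's actual pre-processing, which may be more sophisticated):

```python
import numpy as np

def crop_to_brain(volume, threshold=0.0):
    """Crop an MRI volume to the bounding box of its above-threshold
    (brain) voxels, discarding the empty background."""
    coords = np.argwhere(volume > threshold)
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
```

Because the skull-stripped BraTS volumes are mostly zero-valued background, such cropping shrinks the input the network must process and can noticeably cut evaluation time.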

Robert Foskin

Student

Project Title

A Reinforcement Learning Approach to Continuous Measurement-Based Quantum Feedback Control

Project Description

Reinforcement learning has proven successful in the domain of classical control, and there has been a recent surge of work investigating its application to quantum control problems; however, little of this work has focused on utilizing continuous measurement when training the agent. This is because the act of measurement on a quantum system is fundamentally different from its classical counterpart and introduces a number of unique challenges. The goal of this project is to investigate the application of reinforcement learning to the control of quantum dynamics using continuous measurement-based feedback techniques. A key objective will be the development of novel techniques for quantum state representation in reinforcement learning algorithms; further research in this area has the potential to impact the development of near-term quantum technology, including a fault-tolerant quantum processor. In the area of feedback-based control, current methods have not shown sufficient progress, because such applications to quantum systems quickly become intractable for standard optimal control techniques: quantum feedback leads to an exponential increase in the search space. Analytical approaches are also difficult to realise consistently for quantum systems in experimental settings. Due to the presence of noise and decoherence, the optimal dynamics of an experimental system diverge from those of the model used when optimizing the control strategy. This is especially true in quantum feedback control, where continuously observing the system introduces non-linearity into the dynamics and generates measurement-induced noisy dynamics. Established optimal control techniques have worked well for linear, unitary and deterministic systems, but no known generalized method exists for non-linear and stochastic systems.
Reinforcement learning can be applied in these settings because it is agnostic to the underlying physical description generating the observed dynamics. Control schemes can be derived heuristically using agent-based learning in a quantum environment. Such an approach would be adaptable and robust to changes in the environment and could be implemented more readily in experimental settings than model-based optimal control techniques. Beyond this, reinforcement learning has the potential to become a powerful simulation tool for quantum systems that cannot be analysed effectively by established methods, such as systems with large Hilbert spaces, non-integrable systems, and systems undergoing far-from-equilibrium dynamics. Learning-based approaches show promise as a way to probe the control landscapes of these quantum systems.

Ryan O’Connor

Student

Project Title

Leveraging machine learning for the design and analysis of optimisation algorithms

Project Description

Combinatorial optimization problems arise in many areas of computer science and other disciplines, such as business analytics, artificial intelligence and operations research. These problems typically involve finding groupings, orderings or assignments of discrete, finite sets of elements that satisfy certain conditions or constraints. Designing good optimization algorithms requires human ingenuity and a spark of genius. Automating the process of designing and analyzing algorithms has been a long-standing quest for AI researchers, and such techniques are expected to have a very high impact in a range of applications. This PhD thesis will explore whether machine learning techniques (particularly reinforcement learning and graph neural networks) can be leveraged to augment the human ability to design good heuristics for given input distributions. The thesis will also explore whether reinforcement learning techniques can assist in finding counterexamples that reveal the limitations of existing heuristics, in terms of the best achievable approximation ratio or of running time. The success of this project will greatly augment the human ability to design algorithms for combinatorial optimization problems in industry.
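The counterexample-search idea can be illustrated on a toy problem: random search for instances on which a degree-greedy vertex cover heuristic is furthest from optimal. The problem, heuristic and random search are illustrative stand-ins for whatever problems and learned search the thesis will actually study:

```python
import itertools
import random

def greedy_vertex_cover(edges):
    """Degree-greedy heuristic: repeatedly add the vertex covering the
    most still-uncovered edges."""
    uncovered = set(edges)
    cover = set()
    while uncovered:
        counts = {}
        for u, v in uncovered:
            counts[u] = counts.get(u, 0) + 1
            counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)
        cover.add(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover

def optimal_vertex_cover(edges, vertices):
    """Exact minimum vertex cover by brute force (small instances only)."""
    for k in range(len(vertices) + 1):
        for subset in itertools.combinations(vertices, k):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return set(vertices)

def worst_observed_ratio(trials=100, n=7, p=0.3, seed=0):
    """Random search for instances on which the heuristic is furthest
    from optimal, a crude stand-in for learned counterexample search."""
    rng = random.Random(seed)
    worst = 1.0
    for _ in range(trials):
        edges = [(u, v) for u in range(n) for v in range(u + 1, n)
                 if rng.random() < p]
        if not edges:
            continue
        ratio = len(greedy_vertex_cover(edges)) / len(optimal_vertex_cover(edges, range(n)))
        worst = max(worst, ratio)
    return worst
```

Replacing the random instance generator with a reinforcement learning agent rewarded by the observed ratio is one way to frame the counterexample-finding question posed above.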

Sonal Baberwal

Student

Project Title

Machine Learning Algorithms for exoskeleton control using Brain Computer Interfaces

Project Description

A Brain-Computer Interface (BCI) system allows the brain to communicate with an external device. A BCI consists of signal acquisition, feature extraction, feature translation, and device output. BCI systems have been of great help in the medical field, assisting patients suffering from neuromotor disorders, spinal cord injuries and trauma to the nervous system, with applications including wheelchairs and exoskeletons. Brain activity may be recorded through invasive, semi-invasive or non-invasive systems, which detect electrical or optical signals related to brain activity. Once the raw signals are collected, it is important to analyze and translate them to control an interfacing device in real time; hence, a robust framework is a requirement for any BCI system. Machine learning and deep learning algorithms are now used to process these signals and translate them into action, and the algorithms for BCI control need to be trained and tested. BCI systems have advanced from moving a pointer cursor to operating a wheelchair using brain commands. Various neuronal potentials may be used to issue such commands, for instance SSVEP, P300 and motor imagery signals. This study aims to evaluate EEG signals from a non-invasive cap and translate them to control a lower-limb exoskeleton, allowing dynamic balance without the use of crutches while walking. One of the major challenges in this field is the time investment required for training. The research will apply qualitative and quantitative analysis to evaluate the present state of the art in BCI systems for operating exoskeletons. The goal is to design a BCI framework requiring minimal training, evaluating various neuronal potentials to decrease the training time for exoskeleton control, and addressing the following research questions: Q1. Which neuronal potentials give the highest accuracy and efficiency while requiring the least training? Q2.
Do hybrid systems perform better than traditional systems for applications like exoskeletons? Q3. Which algorithms achieve higher accuracy and can be operated in real time to minimize training? Following on from these questions, a framework will be generated for the advancement of locomotion with exoskeletons.
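A standard feature-extraction step for motor-imagery EEG is spectral band power, for instance in the mu band around 8-12 Hz; the sketch below is a minimal illustration (the band choice and FFT-based estimator are common defaults, not this project's pipeline):

```python
import numpy as np

def bandpower(signal, fs, band):
    """Mean spectral power of `signal` (sampled at `fs` Hz) within the
    inclusive frequency band (low, high)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(psd[mask].mean())
```

Per-channel band powers like this form the feature vectors that a classifier would translate into exoskeleton commands, and their reliability per neuronal potential bears directly on research question Q1.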

Yingjie Niu

Student

Project Title

AI-first Finance: Discovering and Forecasting with Alternative Data

Project Description

In finance, decision-making has traditionally relied on quantitative indicators collected manually from financial statements. In the last decade, the explosion in the sheer magnitude of data, such as financial news, earnings conference call recordings, and SEC 10-K reports, has brought huge opportunities and has been playing an increasing role in asset management and decision-making tasks. Each type of data has its own advantages and disadvantages. High-frequency textual data such as social media posts are relatively short and can reflect real-time events, but often contain a lot of noise. Medium-frequency text data, including financial news, usually have clean content because they come from official providers, which makes them easy to process and analyze, but the information they carry has a certain lag. Professional financial documents, such as 10-K reports, are more reliable and contain a great deal of valuable information, but are released only quarterly or annually. The aim of the project is to leverage the advantages of different types of data, together with modern natural language processing (NLP) and artificial intelligence (AI) technologies, to make precise financial market predictions and assist the investor’s decision-making process. Objective 1: Develop an approach incorporating multi-source text data, i.e. low-frequency, medium-frequency, and high-frequency financial text sources, into financial prediction. Objective 2: By building a Graph Convolutional Network (GCN) over economic entities, extract the relationships between different entities and exploit indirectly correlated text data.
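The GCN in Objective 2 propagates information along entity relationships; one layer in the standard Kipf-and-Welling form computes ReLU(D^-1/2 (A+I) D^-1/2 H W). A minimal NumPy sketch follows (the graph, features and weights are arbitrary illustrations, not the project's model):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A+I) D^-1/2 H W),
    where A is the adjacency matrix and H the node feature matrix."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, norm @ H @ W)
```

Stacking such layers lets text-derived features of one company influence the representation of related companies, which is how indirectly correlated text data could inform a prediction.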