Explainable multi-criteria optimisation algorithms for land-use change in Ireland
Combining data from agricultural, economic, meteorological, geological and demographic sources with satellite imagery to understand land-use changes over time and to identify Pareto-optimal conditions for future land use satisfying multiple criteria, including economic output, finance, resource utilisation, supply and demand fluctuation due to population and demographic changes, and climate change targets. A variety of methods will be explored in order to examine trade-offs between algorithm performance and interpretability. Reinforcement learning, evolutionary, classification tree (combined with Monte Carlo search methods) and Markov Decision Process based algorithms all present themselves as candidates, with varying degrees of success in other applications. The goal is to provide a set of Pareto-optimal trade-off solutions that would allow decision makers to compare different balances of conflicting objectives.
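The notion of Pareto optimality the project rests on can be illustrated with a minimal sketch; the candidate solutions and the two objective names below are invented purely for illustration:

```python
def dominates(a, b):
    """True if solution a scores at least as well as b on every objective
    and strictly better on at least one (maximisation assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated subset of the candidate solutions."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Hypothetical land-use candidates scored on (economic output, climate score):
candidates = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
front = pareto_front(candidates)  # (1.0, 1.0) is dominated and drops out
```

The surviving set is exactly the trade-off menu a decision maker would compare: no solution on it can be improved on one objective without losing on another.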
Multi-modal user modelling to enhance productivity, memory and health
This project will explore how multi-modal context-aware user models can be used to enhance productivity by allowing computer systems to adapt intelligently, in real time, to a user's current state and mental context. Imagine sitting at your computer being bombarded with requests over instant messaging while trying to complete other tasks. This is something you might be able to manage on any other day without a problem, but today you are struggling because you are tired after a poor night's sleep, and the three coffees you had earlier in the day are not having the desired effect. The systems and software you use daily cannot adapt appropriately, because they are not aware that you woke too early and haven't eaten, and that in that moment you were feeling overwhelmed by the task load. Multi-modal context-aware user modelling would allow such systems to be aware of your current mental state by leveraging signals (neural, physiological, lifelog, etc.) captured up to that point in time. Furthermore, such information would enable powerful personal information retrieval and summarisation systems that would let you understand which areas of your work or day have been most impacted in terms of productivity, enabling you to identify and change behaviours (or how you schedule work tasks) in order to achieve optimal throughput.
By using signals produced from the body, such as EEG (electroencephalography), EOG, movement, heart rate, GSR and breathing, in concert with signals captured from our environment, such as those from a lifelog camera or computer interaction, this research aims to explore the types of multi-modal context-aware user models that can be built with machine learning and used to enhance productivity by allowing computer systems to adapt intelligently, in real time, to a user's current state and mental context.
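As a minimal illustration of how such heterogeneous signal streams might be fused into features for a user model (the stream names, timestamps and values below are entirely hypothetical):

```python
from statistics import mean, pstdev

def window_features(streams, start, end):
    """Summarise each signal stream (name -> list of (t, value) samples)
    over the window [start, end) as a mean and a spread -- a minimal
    sketch of feature-level fusion for a multi-modal user model."""
    feats = {}
    for name, samples in streams.items():
        vals = [v for t, v in samples if start <= t < end]
        feats[name + "_mean"] = mean(vals) if vals else 0.0
        feats[name + "_std"] = pstdev(vals) if vals else 0.0
    return feats

# Hypothetical streams sampled at timestamps 0, 1, 2:
streams = {
    "heart_rate": [(0, 62), (1, 64), (2, 66)],
    "gsr": [(0, 0.41), (1, 0.44), (2, 0.47)],
}
f = window_features(streams, 0, 3)
```

A real system would feed feature vectors like this, computed over rolling windows, into a learned model of the user's state rather than reading the statistics directly.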
Quantifying Uncertainty and Decision-Making under Uncertainty in ML
Real-life applications of deep learning models require these algorithms to achieve tasks that they have not encountered before. Moreover, the data coming from observations might drift continuously, or shift frequently away from the data used to train the models. These challenges cause uncertainty for models. Uncertainty can be grouped under two broad headings: epistemic uncertainty (model uncertainty) and aleatoric uncertainty (noisy data and/or dataset shift). Together, these two types determine our confidence in a prediction, namely predictive uncertainty. The dynamic nature of behaviour and of the physical world requires intelligent beings to assess uncertainty and adapt to changing environments. Assessing uncertainty, rather than trying to eliminate it, is the better approach for creating an intelligent agent. A major challenge for current deep learning models is to generalise reliably while also adapting quickly, safely and efficiently to changes in the objective tasks. Current deep learning models can often output a confidence level with their predictions; however, the benefit of having such a confidence level, and how it should be used to tune, change or adapt an ML model, is not well investigated. The two major challenges of this project will be to develop algorithms to detect and quantify uncertainty for the task at hand, and to develop an adaptive framework that allows deep learning models to react to changing levels of uncertainty. Tackling epistemic and aleatoric uncertainty will require changing or updating current models. The desired result of this project is a learning algorithm that adapts to a changing environment in an uncertainty-aware manner. The method would assess uncertainty when given a task and, if the new situation is not familiar, output a high uncertainty so that the model can adapt to the uncertain conditions.
Adaptation could mean changing model parameters, adapting to the presence of new or changing features or to the absence of some features, adapting to new performance criteria for the model, or switching to a different machine learning architecture in order to maintain the required level of performance. Our research question is: can the performance of a deep learning model be increased by making it adaptive to the level of uncertainty of the task at hand? A data-driven methodology such as KDD will be used to develop and evaluate our research and to test our models with different datasets in changing and uncertain domains.
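One simple, widely used proxy for epistemic uncertainty is disagreement across an ensemble of models. The toy sketch below (the ensemble and inputs are invented) shows how predictive uncertainty can grow as inputs move away from the training regime, which is exactly the signal an adaptive framework could react to:

```python
from statistics import mean, pvariance

def predictive_uncertainty(ensemble, x):
    """Return the ensemble's mean prediction and its disagreement
    (variance), a common proxy for epistemic uncertainty."""
    preds = [model(x) for model in ensemble]
    return mean(preds), pvariance(preds)

# Toy ensemble: three linear models with slightly different weights,
# standing in for networks trained on different subsets of the data.
ensemble = [lambda x, w=w: w * x for w in (0.9, 1.0, 1.1)]
pred_near, var_near = predictive_uncertainty(ensemble, 1.0)   # near the data
pred_far, var_far = predictive_uncertainty(ensemble, 10.0)    # far from it
# Disagreement grows with distance from the training regime.
```

In the envisaged framework, a threshold on such a disagreement score would trigger adaptation (retraining, feature changes, or an architecture switch) rather than a blind prediction.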
Non-Euclidean data refers to a set of points that cannot be represented in a two-dimensional space because they violate at least one of the axioms of Euclidean geometry. Graphs are a non-Euclidean data structure composed of nodes (objects) and edges (relations), since they violate the triangle inequality principle. By mapping real-world data onto a graph, it is possible to model complex systems such as physical systems and social networks, from which rich relational information can be extracted. Graph analysis is the process of extracting information from graphs; its objectives include node classification, link prediction and clustering. Examples of techniques employed in graph analysis include those for finding the minimum spanning tree or producing adjacency matrices. These techniques have been used in aviation applications such as aircraft scheduling, as well as in multiprocessor systems for task allocation. Recently, Deep Learning (DL) based methods such as Convolutional (CNNs) or Recurrent Neural Networks (RNNs) have been employed for graph analysis. However, here the structure of a graph must be explicit to fully exploit the relations between the objects. These methods operate on Euclidean data in one- or two-dimensional spaces and are thus not suitable for processing graph inputs efficiently. To overcome this issue, a recent research field, Geometric Deep Learning (GDL), is devoted to building models that can be trained efficiently on non-Euclidean data. One of the techniques for this type of learning is the Graph Neural Network (GNN); one of its peculiarities is its invariance to changes in the order in which non-Euclidean inputs are presented to the learning mechanism. Here, the edges among nodes are treated as dependencies rather than features, in contrast to traditional Euclidean-based learning approaches.
This project will be devoted to better understanding the functioning and application of Graph Neural Networks as well as formally comparing them against traditional approaches for deep learning.
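The core GNN operation, message passing over node neighbourhoods, can be sketched in a few lines; the fixed mixing weights below are illustrative stand-ins for what a trained GNN would learn:

```python
def message_passing_step(adj, h):
    """One round of mean-aggregation message passing: each node's new
    state mixes its own state with the average of its neighbours'.
    adj: node -> list of neighbours; h: node -> scalar feature."""
    new_h = {}
    for node, neighbours in adj.items():
        agg = sum(h[n] for n in neighbours) / len(neighbours) if neighbours else 0.0
        new_h[node] = 0.5 * h[node] + 0.5 * agg  # fixed weights for illustration
    return new_h

# Triangle graph: relabelling the nodes permutes the output identically,
# reflecting the order-invariance noted above.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
h = {"a": 1.0, "b": 0.0, "c": 0.0}
h1 = message_passing_step(adj, h)
```

Stacking several such rounds, with learned instead of fixed weights, is what lets a GNN propagate relational information across the graph for tasks like node classification and link prediction.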
Understanding residential electricity consumers' demand needs and renewable energy supply capabilities using explainable machine learning models
EU directives set out targets for renewable electricity, heat and transport. Low carbon technologies (LCTs) such as heat pumps (HPs) and electric vehicles (EVs) are components of the Government of Ireland's Climate Action Plan to decarbonise heating and transport. The rate of adoption of LCTs and of other technologies such as domestic photovoltaic (PV) generation is uncertain. The impact of geographic clustering, uncertainty in weather, and the range of technology options for EVs, HPs and PVs further complicate the evaluation of the impacts on the low voltage (LV) electricity distribution network. This poses considerable challenges to the management and operation of future distribution networks and smart grids. Similarly, renewable energy and LCTs have high potential to contribute to developing countries such as Uganda. These opportunities come with significant challenges. Currently only 1.1% of Uganda's energy needs are served by electricity, with 90% of total primary energy consumption in the form of firewood, charcoal or crop residues. Solar offers potential in rural electrification schemes, while urban electricity networks are currently unreliable, with regular load shedding. The Government of Uganda's national plans aim to use renewable energy to support socio-economic development in an environmentally sustainable manner. The challenges and opportunities in Uganda differ from those in Ireland, but are linked by the potential for Machine Learning to identify solutions and recommendations. This project responds to the need to identify how energy systems can be transformed to be secure (reliable), clean (green and sustainable), and fair (ensuring the citizen is at the centre of, and benefits from, the transformed system). The project aims to use Machine Learning to support the evolving design demands of LV networks in urban areas, and to explore the potential for Machine Learning to support the development of renewable energy communities.
It will focus on the impacts of LCTs, and the potential opportunities for local energy communities. The methodology and specific research questions will be defined in detail. Sample research themes include: ML to support (rural & solar) energy communities; ML to support LV network design; ML to support Smart Grid – fair load management in urban grids. The main deliverable will build on ML Fundamentals (particularly sequential data) to inform future LV electricity distribution network design decisions, and will address ML in Society to support fair operations of renewable energy communities.
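As a flavour of the sequential-data fundamentals involved, the sketch below computes a simple average daily demand profile, the kind of baseline any ML model of LV-network demand would be benchmarked against (the load series is invented, and a toy 4-step "day" is used for brevity):

```python
def average_daily_profile(hourly_load, period=24):
    """Average demand per hour-of-day from an hourly load series --
    a simple profile of the kind used to study the impact of LCTs
    such as EV charging on LV network demand."""
    buckets = [[] for _ in range(period)]
    for i, v in enumerate(hourly_load):
        buckets[i % period].append(v)
    return [sum(b) / len(b) for b in buckets]

# Invented load series spanning two toy "days" of 4 steps each:
profile = average_daily_profile([1, 2, 3, 4, 3, 4, 5, 6], period=4)
```

Comparing such profiles before and after LCT adoption (e.g. an evening EV-charging peak) is one simple way to quantify the network impacts the project will study.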
Machine Learning Solutions for Quality-Energy Balancing for Rich Media Content Delivery in Heterogeneous Network Environments
Machine Learning (ML) enables scientists to design self-learning solutions to important problems that are too complex to solve with classic methods. As demand for mobile traffic increases day by day, 5G networking is set to govern the infrastructure of the telecommunications industry. This project will design a set of ML solutions to address Quality-Energy balancing when delivering rich media content over heterogeneous 5G networks. The proposed solutions will balance the requirements of rich media content, including multi-sensorial video and VR with their strict timing and bitrate demands, against the energy efficiency goals set for devices and networks. Machine Learning is predicted to help make the 5G solution feasible. The project will involve network simulations and prototyping.
Bringing Machine Learning solutions and algorithms into 5G infrastructure for various applications involves a number of challenges that need to be addressed before beginning any project or research:
1. Interpretability of results.
2. The computational power required by ML algorithms.
3. The long training times of some ML algorithms.
4. Maximising the utilisation of the unlicensed spectrum.
5. Opportunistic exploitation of white spaces.
6. Adaptive leasing between carriers.
7. To run new applications such as VR and multi-sensorial video above 30 GHz on a mobile phone, upcoming phones and devices will need smaller, adaptive antennas to receive the higher-frequency waves.
8. The most important challenge for the delivery of rich media content is the availability of real data. Any ML algorithm needs high-quality data to work, and the type of data decides which type of learning to use. Generating datasets from network simulators (e.g. ns-3) is not always good practice, as the ML algorithm will end up learning the rules and environment with which the simulator was programmed. The main point of using ML is to learn from real data, which will not happen when we generate datasets from simulators. The scarcity of real 5G datasets is one of the biggest challenges.
9. Another important challenge is to apply the correct kind of distribution to our specific 5G application, and to determine which algorithm works well on the specific data.
The end goal of the ML algorithms is to optimise and improve the delivery of rich media content over heterogeneous 5G networks.
Machine Learning for Combinatorial Optimization problems in Network Design
Combinatorial optimization problems arise in many areas of computer science and other disciplines, such as business analytics, artificial intelligence and operations research. Prominent examples are tasks such as finding shortest or cheapest round trips in graphs, graph pattern matching, scheduling, time-tabling and resource allocation. These problems typically involve finding groupings, orderings or assignments of a discrete, finite set of elements that satisfy certain conditions or constraints. A large number of optimization problems are NP-hard. As a result, applications typically rely on approximation algorithms, carefully designed heuristics for specific instance classes, or metaheuristic frameworks to solve them efficiently. While the first two require significant human effort, the last set of techniques can be very far from optimal. Finding good optimization solutions efficiently has been a long-standing quest, and such solutions are likely to have very high impact in a range of applications. In recent years, researchers have been exploring whether machine learning techniques can be used to learn solutions directly from the input. Early results based on deep learning techniques, which modelled these problems as sequence-to-sequence learning tasks, have shown that it is indeed viable to do so, as there are universal motifs present in graphs (across datasets and scales) that can be leveraged to learn effective models for optimization problems. However, these deep-learning-based techniques do not seem to generalize well, require a lot of training data (which means solving a large number of NP-hard problem instances) and are not interpretable. This PhD project will explore machine learning techniques to design novel heuristics that can achieve optimal (or close-to-optimal) solutions for combinatorial optimization problems.
The goal will be to use simple interpretable models relying only on local features of elements, such that these heuristics can potentially be mathematically analyzed for optimality guarantees on specific input distributions. In terms of learning technique, we will use a learning-to-prune framework, reinforcement learning and graph neural networks for this purpose. The success of this project will greatly augment the human ability to design algorithms for combinatorial optimization problems in industry.
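A minimal sketch of the learning-to-prune idea on a toy 0/1 knapsack instance; the hand-coded value-per-weight score below stands in for a learned model over local element features, and the instance is invented:

```python
from itertools import combinations

def prune(items, score, keep):
    """Learning-to-prune sketch: a scoring function (here hand-coded,
    in practice learned from local features) predicts which elements
    may belong to an optimal solution; only the top-'keep' survive."""
    return sorted(items, key=score, reverse=True)[:keep]

def best_value(items, capacity):
    """Exact (brute-force) 0/1 knapsack over the pruned item set."""
    best = 0
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if sum(w for w, v in combo) <= capacity:
                best = max(best, sum(v for w, v in combo))
    return best

items = [(2, 3), (3, 4), (4, 5), (5, 1)]                  # (weight, value)
kept = prune(items, score=lambda i: i[1] / i[0], keep=3)  # drops (5, 1)
val = best_value(kept, capacity=5)
```

The appeal of this framework is that the expensive exact search runs only on the small surviving set, while the simple local scoring rule remains amenable to the kind of mathematical analysis the project targets.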
A credit rating evaluates the creditworthiness of an entity that is seeking to borrow money, with regard to a particular financial obligation (Investopedia, 2020). In the case of corporates or governments, a credit rating is normally supplied by a credit rating agency, such as Fitch, Standard and Poor's, or Moody's. A credit rating also determines the cost of borrowing for the entity issuing a financial instrument. However, rating agencies have a significant conflict of interest in evaluating the creditworthiness of an entity, as it is the entity that pays for the rating. The consequences of this conflict of interest were seen in the financial crisis of 2007-2008, when the highest ratings were assigned to financial products of significantly poorer quality (Stirier, 2008). Additionally, credit ratings are usually expensive because of the amount of labour required to produce them. An inaccurate or out-of-date credit score can leave a company, especially a small to medium-sized enterprise (SME), with a higher cost of borrowing. Thus, in my PhD, I would like to design a method, based on explainable machine learning (explainable ML), that would accurately evaluate an entity's credit rating. A well-performing model would tackle the rating agencies' conflict of interest by providing an unbiased evaluation of the creditworthiness of an entity. The academic literature has explored both statistical modelling and machine learning methods, finding deep learning models to yield better overall performance (Dastile et al., 2020). However, deep learning lacks the transparency required of credit rating models under the Basel II accord (BIS, 2006). As a result, researchers have been hesitant to implement machine learning models that do not satisfy the legal transparency requirements in credit research. Dastile et al., in a literature survey conducted in 2020, found that only 8% of the investigated studies used transparency techniques.
In my research, I will focus on creating an explainable machine learning approach which would yield a performance equivalent or similar to that of the deep learning models. In line with the Basel II accord, it would provide a sufficient level of detail into how the model arrives at a decision. Some success has been achieved using explainable ML in credit scoring to date (Fahner, 2018; Bussman et al., 2020). However, significant progress is yet to be made in the field in terms of model performance and explainability trade-off in corporate credit scoring.
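A minimal sketch of the kind of interpretable scorecard this trade-off points towards: a linear model that reports each feature's contribution next to the predicted default probability. The weights and company features below are illustrative inventions, not fitted values:

```python
import math

def credit_score(features, weights, bias=0.0):
    """Minimal interpretable scorecard: a linear model whose per-feature
    contributions are reported alongside the default probability, in the
    spirit of the transparency Basel II requires of credit models."""
    contributions = {k: weights[k] * v for k, v in features.items()}
    z = bias + sum(contributions.values())
    prob_default = 1 / (1 + math.exp(-z))   # logistic link
    return prob_default, contributions

# Illustrative (not fitted) weights and a toy company profile:
weights = {"debt_to_equity": 1.2, "interest_cover": -0.8}
p, contrib = credit_score({"debt_to_equity": 2.0, "interest_cover": 3.0}, weights)
```

Each entry of `contrib` answers "how much did this feature push the rating, and in which direction", which is the level of detail a regulator-facing explanation needs; the research question is how close such transparent models can get to deep-learning performance.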
Using Learner Digital Footprints in Recommending Content in Learning Environments
The area of educational data mining has used some aspects of a learner's digital footprint to predict outcomes (the grade in the final examination) or to recommend content of importance to help the learner. This has included timestamped clicks of pages viewed, prior performance, and external data such as previous exam performance, timetables, earlier results, outputs from automatic assessments, etc. While these have each led to improved experiences for learners, who are able to gauge their own progress and can be given personal recommendations for content they should view, the models used to create these predictions and recommendations are limited: they use only a small portion of the learner's digital footprint, and that portion is static and cannot account for much insight into the learner's state of mind at the time of prior use of the system. To capture digital footprints, we introduce two independent monitoring mechanisms: keyboard dynamics and a webcam-based attention application. Keyboard dynamics captures patterns in a learner's typing, especially the timing associated with bigrams, i.e. pairs of adjacently-typed alphanumeric characters. The webcam-based attention application runs on the learner's laptop, monitoring their facial attention while they attend an online Zoom session, read material on screen or watch an educational video, and recording an attention log. These cost-neutral sources of interaction logs give deeper insight into a learner's state of mind, stress and fatigue while interacting with digital content. We propose to enhance the modelling capabilities of a learning recommender system by capturing more of the learner's state of mind during interaction with the system.
Is the learner interested in or bored with the content? Is she engaged or distracted, and is that because she is not motivated, or because she is tired, stressed or cognitively distracted by some other task or outside influence? The first research challenge concerns packaging keyboard-dynamics data according to the mental state of learners. Another is to investigate the best models for computing webcam attention graphs efficiently. Furthermore, comprehensive research is required into estimating aggregated attention graphs that can be used by recommender systems. All of this has to be done in a GDPR-compliant way, so that users feel comfortable about such data being recorded about them.
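The first step of the keyboard-dynamics pipeline, extracting inter-key intervals per bigram from a raw keystroke log, can be sketched as follows (the timestamps and text are invented):

```python
def bigram_timings(keystrokes):
    """Extract inter-key intervals per adjacent character pair (bigram)
    from a keystroke log of (timestamp_ms, char) events -- the raw
    features that keyboard-dynamics user modelling builds on."""
    timings = {}
    for (t1, c1), (t2, c2) in zip(keystrokes, keystrokes[1:]):
        timings.setdefault(c1 + c2, []).append(t2 - t1)
    return timings

# Invented log of a learner typing "the th...":
log = [(0, "t"), (120, "h"), (230, "e"), (400, "t"), (510, "h")]
t = bigram_timings(log)
```

Drifts in these per-bigram timing distributions over a session are the kind of cost-neutral signal from which fatigue or stress could then be inferred.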
Deep Learning approaches for Group Anomaly Detection in Cyber security
Today, the frequency and scale of cyber-attacks and cyber fraud are increasing every year. Incidents related to cybercrime are becoming more sophisticated, complex and multi-faceted. For example, cybercriminals rely on tools and procedures already installed on the system for their attack campaigns, because these tools and procedures are normally used by administrators, directors and security analysts for legitimate, routine tasks. On the defensive side, detecting patterns that differ from typical behaviour is critically important for identifying new threats or fraud patterns. This requirement has been addressed by popular machine learning algorithms that are capable of detecting point anomalies. However, many such approaches cannot detect the variety of deviations that are evident in group datasets. For example, the activity of a domain administrator on a machine can be similar to a cybercriminal's activity, confusing any point anomaly detector. Identifying attacker activities in this case requires more specialised techniques for robustly differentiating such behaviour. Group Anomaly Detection, by contrast, aims to identify groups that deviate from the regular group pattern. Generally, a group consists of a collection of two or more points, and group behaviour can be described by a greater number of observations. Group Anomaly Detection has been studied in various domains to find group anomalies where point-wise methods fail. Recently, it has been applied in cyber security with simple Deep Learning (DL) models, such as the Adversarial Autoencoder, to detect targeted cybercriminals who hide their activity. However, such simple DL models remain limited in detecting sophisticated activities in cyber systems.
Hence, in this research we are looking to develop a new Deep Learning model for Group Anomaly Detection in cyber systems in the presence of sophisticated activities, such as multiple attack groups and new cyber-fraud patterns. This new Deep Learning model should be capable of multi-class classification. The new model will be evaluated on both open-source datasets and real-world datasets from the cyber security industry.
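The distinction between point and group anomalies can be illustrated minimally: each point in the suspect group below is unremarkable on its own, yet the group's aggregate behaviour deviates. The data and the simple aggregate-distance scoring rule are purely illustrative of the idea, not the proposed DL model:

```python
from statistics import mean

def group_anomaly_score(group, reference_groups):
    """Score a group by how far its aggregate behaviour (here just the
    feature mean) sits from the aggregates of known-normal groups --
    the intuition that separates group from point anomaly detection."""
    ref_means = [mean(g) for g in reference_groups]
    centre = mean(ref_means)
    spread = max(abs(r - centre) for r in ref_means) or 1.0
    return abs(mean(group) - centre) / spread

normal_groups = [[1, 2, 3], [2, 3, 4], [1, 3, 2]]
# Every point in the suspect group is an ordinary value on its own,
# but together they skew high relative to normal group behaviour:
suspect = [4, 4, 4]
score = group_anomaly_score(suspect, normal_groups)
```

A point detector sees nothing odd about any single value of 4 here; only comparing the group's distribution against normal group behaviour exposes the deviation, which is what deep models such as adversarial autoencoders learn at scale.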
In this project we aim to investigate which NLP techniques are most effective at detecting fake news online. We will survey the current state of the art in NLP and claim verification techniques, and identify areas for further investigation. Previous work on this topic has included knowledge-graph-based techniques facilitated by string matching against large knowledge bases such as Wikipedia. Further research is required at every stage of the process. Fake news detection itself is not the full story: some researchers try to predict whether an article contains false information, while others look at verifying individual claims. We will look into the whole fact-checking and fake news process, including interviews with journalists and professional fact-checkers. Several researchers have looked at how to construct large datasets for fake news identification with the right kind and amount of metadata. Further work has then investigated which machine learning models should be used to process this data, from Naive Bayes classifiers with no stop-word or stemming stages, to complex convolutional neural networks with stacks of modules. We will look at the interaction of different kinds of data with different kinds of models. Processing input data for relevant information is another step which requires further research, and which is tackled differently by different open challenges and datasets. The FEVER dataset release paper recommends splitting claim verification tasks into document retrieval, sentence selection and claim verification stages, so that a model assembles relevant information before trying to verify a claim. Conversely, the Fake News Challenge dataset is generated with the aim of checking whether or not a provided headline is related to the article body text. We aim to investigate whether these are the definitive steps needed for fact-checking, or whether some other conceptual step is needed along the way.
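The three FEVER-style stages can be sketched with trivial word-overlap stand-ins for the trained components; the documents and claim below are invented, and a real system would use learned retrievers and classifiers at each stage:

```python
def retrieve(claim, documents, k=1):
    """Stage 1 (document retrieval): rank documents by word overlap."""
    words = set(claim.lower().split())
    return sorted(documents, reverse=True,
                  key=lambda d: len(words & set(d.lower().split())))[:k]

def select_sentences(claim, doc, k=1):
    """Stage 2 (sentence selection): pick sentences closest to the claim."""
    words = set(claim.lower().split())
    sents = [s.strip() for s in doc.split(".") if s.strip()]
    return sorted(sents, reverse=True,
                  key=lambda s: len(words & set(s.lower().split())))[:k]

def verify(claim, evidence):
    """Stage 3 (claim verification): toy overlap-based verdict."""
    words = set(claim.lower().split())
    overlap = max(len(words & set(s.lower().split())) / len(words)
                  for s in evidence)
    return "SUPPORTED" if overlap > 0.5 else "NOT ENOUGH INFO"

docs = ["Dublin is the capital of Ireland. It lies on the east coast.",
        "Paris is the capital of France."]
claim = "Dublin is the capital of Ireland"
evidence = select_sentences(claim, retrieve(claim, docs)[0])
verdict = verify(claim, evidence)
```

The question the project raises is precisely whether this retrieval - selection - verification decomposition is the right conceptual pipeline, or whether additional stages are needed.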
Provenance Chain Fact Validation in Neural Knowledge Graphs
Recent years have brought a proliferation of information to the public. Social networks serve up billions of bite-size chunks of "information" which we as humans process in the context of our world view and experience. But even with our wealth of "knowledge" about the world, it can be very difficult to infer the veracity or intent of the information presented. The potential for harm cannot be overstated: the effects of mis- and disinformation on society, whether in politics, public health or climate change, are already evident. The application of modern Machine Learning, and in particular Deep Learning techniques, is constantly evolving and improving. However, the classification of information based solely on its linguistic content can only get us so far. We would like to explore the use of Knowledge Graphs (KGs) as additional context for identifying false information. In particular, we would like to explore provenance (to which graph structures ideally lend themselves) as indicative of the probability that an item is, or is not, "true" (a term requiring a much more in-depth definition, beyond the scope of this introduction). In addition, we are interested in the extent to which sources are biased, as a possible proxy for intent. We also believe that it is not enough to provide a model with high precision: the model must be explainable. We think it is important to provide a provenance chain with credibility and bias indicators at each step. There is currently a lot of manual effort in this arena: FactCheck.org, PolitiFact, Snopes.com and Hoax Slayer, to name a few. We would like our model to be at least as insightful as these efforts. To build our model we will use existing datasets, which will need to be converted to a KG using NLP. This KG would be augmented by existing KGs such as DBpedia (leveraging the Semantic Web), or a proprietary solution such as DiffBot. To build an ontology for the fact validation model, we can use a framework like PROV-O.
We can then combine the ontology and knowledge graph to train a neural network to build and check the provenance chain. To validate our solution, we will compare against baselines such as http://aksw.org/Projects/DeFacto.html, checking whether our approach improves results or provides streaming or real-time validation of facts, and against Microsoft's early-detection model, which claims to beat the existing SOTA.
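One simple, explainable way to aggregate a provenance chain into a single credibility figure, with an indicator at each step, is a weakest-link rule; the source names and trust scores below are purely illustrative, and the learned model would replace this hand-coded aggregation:

```python
def chain_credibility(chain, source_scores):
    """Aggregate a provenance chain into one credibility figure with a
    weakest-link rule -- one simple, explainable choice among many --
    and return per-step indicators alongside it."""
    scores = [source_scores.get(src, 0.0) for src in chain]
    return min(scores), list(zip(chain, scores))

# Purely illustrative sources and trust scores:
trust = {"journal_article": 0.9, "news_site": 0.7, "social_post": 0.3}
cred, steps = chain_credibility(
    ["journal_article", "news_site", "social_post"], trust)
```

The per-step list is what makes the verdict explainable: a user can see exactly which link in the chain drags the credibility down.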
To say that AI is all-pervasive is now trite. However, despite its inescapable presence, society has yet to identify an effective way to oversee and control this technology. AI solutions are needed which have an overall social benefit but which are also safe, trustworthy and legal. "Control" in this context can take many forms, from specific company guidelines through to legislative interventions, from soft regulation to criminal law sanctions. The lack of compliance with, or absence of, standards, regulation and laws in AI undermines trustworthiness. This is a weakness in the adoption and usage of AI. There has been criticism of the use of AI models in the justice and healthcare systems, among others. The use of the COMPAS decision support tool in sentencing to assess recidivism, for example, is controversial. Dressel (2018) found that the tool's accuracy was not dissimilar to predictions made by people without any criminological experience, and that race bias was a significant failing. Carter (2020) reviewed the ethical, legal and social implications of using AI in breast cancer care and emphasised the need for detailed discussion of when, and what kind of, AI should be deployed. On a global scale, the challenge is how to control the development and deployment of AI solutions in a way that facilitates both progress and protection. This research investigates how a regulatory framework could be applied to AI and how we could best design a system to maximise adoption, oversight and compliance. Adopting an interdisciplinary approach, this research will test existing regulatory frameworks against the field of computer science. The research will require bridging the technical feasibility of measuring AI trustworthiness with socio-legal and regulatory practices and frameworks, using and combining methods from a variety of disciplines.
It will evaluate the hypothesis that a bottom-up regulation approach will provide trustworthy AI solutions with measurable compliance in a simpler and more legally sound way than a top-down product certification approach. This is an emerging and urgent challenge facing policy makers, and this project will provide an integrated perspective on the techno-socio-legal challenge.
Spoken Open Domain Dialogue Systems for Non-native Speakers
Open-domain dialog systems have been receiving a great deal of attention from both academia and industry, resulting in many applications. Chatbots like Alexa are designed for information retrieval and chit-chat purposes, while others, like Replika, are built for companionship and emotional support. There is also growing interest in using open-domain chatbots as language educators, since practising conversation is an effective way for non-native speakers to acquire a new language. However, building a chatbot that can interact with non-native speakers is challenging because: (1) errors propagated from automatic speech recognition (ASR) systems lead to unexpected responses; (2) non-native speech may contain many disfluencies and grammatical errors, which degrade the chatbot's performance; (3) most non-native speakers are not good conversationalists because of their limited linguistic ability, so chatbots must be highly interactive and engaging, and able to lead and create a meaningful conversation. This PhD will improve upon state-of-the-art open-domain chatbots, making it possible for chatbots to create a smooth and engaging conversation with non-native speakers. In particular, the PhD will focus on: (1) making open-domain dialog systems robust to noisy inputs (e.g. ASR errors and disfluencies) by using multiple ASR decoding outputs or enabling the system to ask clarifying questions; (2) making open-domain dialog systems more engaging by leveraging a user's personalized data, such as interests, goals, beliefs and values. This information is necessary for the chatbot to choose the right topic for the conversation and to avoid discussing things outside the user's interests. The chatbot will learn to do this by looking at dialogue examples in which two people who already know each other have an engaging, meaningful conversation.
For direction (1), we will modify the transformer-based seq2seq model, allowing it to encode not only the dialog history but also additional ASR information at the current turn. The model will learn either to generate an appropriate response or to ask a clarifying question by looking at spoken dialogue examples. For direction (2), we will create a dataset called DeepConversations, consisting of many engaging and meaningful dialogues. We also propose a transformer-based memory network that encodes each item of the user's personalized data as an individual memory representation, and then generates the engaging response word by word. The outcome of this PhD will benefit not only non-native speakers but also native speakers who use chatbots for entertainment purposes.
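The clarifying-question decision in direction (1) can be sketched minimally as a rule over an ASR n-best list; the hypotheses, confidences and threshold below are invented, and in the proposed system a learned model, not this rule, would make the choice:

```python
def clarify_or_respond(nbest, agreement_threshold=0.5):
    """Decide between answering and asking a clarifying question from an
    ASR n-best list of (hypothesis, confidence) pairs: if the top
    hypothesis holds too small a share of the confidence mass, the
    safest dialogue act is a clarification."""
    top_hyp, top_conf = max(nbest, key=lambda h: h[1])
    total = sum(c for _, c in nbest)
    if top_conf / total < agreement_threshold:
        return "clarify", "Sorry, did you say: '%s'?" % top_hyp
    return "respond", top_hyp

# Invented n-best list with heavy disagreement between hypotheses:
act, text = clarify_or_respond([("book a table", 0.40),
                                ("look at the cable", 0.35),
                                ("cook a fable", 0.25)])
```

The proposed seq2seq model would instead learn this behaviour end-to-end from spoken dialogue examples, but the same signal, disagreement among decoding outputs, is what it would be exploiting.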
Knowledge Transfer from Text Annotations towards more Effective Learning for Computer Vision
Computer Vision models have achieved human-level accuracy in certain tasks like classification and localization by leveraging large annotated datasets, leading to widespread adoption in several domains. However, in fields like medical diagnostics, adoption is still hampered by the scarcity and/or cost of annotated data. Recently, several works in few-shot learning and self-supervised learning have tried to learn from a limited amount of annotated data, but with limited success. A recent analysis (W. Chen et al., 2019) of few-shot algorithms shows that a simple baseline that finetunes a deep model is as good as current state-of-the-art few-shot learning algorithms and fares better in the realistic scenario of a non-negligible domain shift between the train and test sets. Another such analysis (Y. Asano et al., 2020) of self-supervised learning methods suggests that unlabelled images aid only in learning low-level features of the initial layers and are not sufficient to learn discriminative mid-level or high-level features. Both analyses suggest that visual information alone is not enough to perform well on computer vision tasks in the annotation-scarce scenario. In contrast to deep learning-based models, humans can learn to recognize new objects or point them out in images from just a handful of labeled examples. One possible reason humans can understand objects/concepts from a few examples is the existence of an external representation of information about the world built from prior experiences. Inspired by this, this research project aims to explore how prior knowledge can be modeled and how it can be used to improve the performance of vision models in a limited-annotation scenario. The objectives of this research project are: 1. Develop a knowledge model of the world from a text corpus and already-annotated images. Natural language text is a rich source of knowledge. 
Semantic relationships between objects can be modeled from language to produce a knowledge representation (G. Miller, 1995). Here we intend to explore how annotated images can be jointly modeled with natural language to produce a knowledge prior. 2. Explore how information can flow from this knowledge model to the vision model to improve performance in few-shot learning. 3. Explore how information from this knowledge model can aid in learning more discriminative feature representations in self-supervised learning.
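As a toy illustration of objective 1, a crude relatedness prior between object labels can already be read off raw text via co-occurrence counts, a stand-in for richer resources such as WordNet; the corpus and labels below are invented for illustration:

```python
# Toy sketch of extracting a relatedness prior between object labels from raw
# text via sentence-level co-occurrence counts.
from collections import Counter
from itertools import combinations

def cooccurrence_prior(sentences, labels):
    pair_counts = Counter()
    for s in sentences:
        toks = set(s.lower().split())
        present = [l for l in labels if l in toks]
        for a, b in combinations(sorted(present), 2):
            pair_counts[(a, b)] += 1
    total = sum(pair_counts.values()) or 1
    # Normalised co-occurrence acts as a crude prior over label relatedness.
    return {pair: c / total for pair, c in pair_counts.items()}

corpus = ["a dog chases a cat", "the cat sat on the mat", "a dog on a mat"]
prior = cooccurrence_prior(corpus, ["dog", "cat", "mat"])
```

A real knowledge model would of course use far richer linguistic structure (and the annotated images themselves), but even a prior of this shape can be injected into a few-shot classifier as a regulariser over class similarity.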
Federated Learning on IoT edge devices using serverless ML
The Internet of Things (IoT) has become extensively involved in many aspects of modern life. Nowadays, we see sensors deployed all around us, becoming an integral part of our day-to-day lives. With improved software architectures, rapid increases in computing power, and embedded decision-making abilities in machines, users now interact with more intelligent systems, and many intelligent IoT services and applications are emerging. The typical processing pipeline for IoT applications is that all sensor data is collected and stored on the cloud, where it is used to train various machine learning algorithms; once trained, these algorithms are deployed locally on the edge devices. However, the heterogeneity of IoT devices and sensor networks is a major challenge in building these intelligent IoT applications: the ML algorithms designed for IoT devices and edge analytics have to be re-designed, re-trained, and then re-deployed for each type of IoT device joining the IoT infrastructure. The aim of this work is to design a better architecture using serverless programming. Serverless ML could prove a major step forward in enabling seamless integration of edge analytics for a variety of IoT devices without the need to build a customized ML algorithm for each type of device, freeing data scientists to focus on the domain problem rather than the configuration and deployment of ML algorithms over IoT devices. Moreover, serverless architecture is inherently scalable and could open the door to many intelligent applications. At the later stages of the project, the serverless architecture for edge analytics will be used to deploy distributed and federated learning algorithms on top of large-scale IoT infrastructure. 
The ultimate goal will be to automatically train and deploy distributed and federated learning on IoT devices, which can support building distributed intelligent IoT applications without worrying about the heterogeneity of underlying IoT infrastructure.
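The federated learning stage could build on federated averaging (FedAvg), in which each device trains locally and the server aggregates parameter vectors weighted by local dataset size. A framework-agnostic sketch, with the model reduced to a plain list of floats for illustration:

```python
# Minimal FedAvg aggregation sketch: the server never sees raw sensor data,
# only parameter updates from each edge device.

def fed_avg(client_updates):
    """client_updates: list of (params, n_samples) pairs from edge devices."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    avg = [0.0] * dim
    for params, n in client_updates:
        w = n / total  # weight each client by its local dataset size
        for i, p in enumerate(params):
            avg[i] += w * p
    return avg
```

In the serverless setting, each local training step would run as a stateless function on the device, with `fed_avg` invoked as a cloud function over the collected updates.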
Challenges
Even though many Machine Learning (ML) and Deep Learning (DL) object identification and classification methods have equalled or surpassed human-level performance, they still face adoption challenges, especially in the health sector. Adoption of AI in medical image analysis depends heavily on the trust users have in an automated system, and making AI transparent is one way to increase its adoption. The typical accuracy of a radiologist interpreting ligament injury from an MRI is 94%. ML- and DL-based classifiers can achieve similar levels of accuracy, but their black-box nature means that we do not know whether they are diagnosing using relevant features. Studies by Longoni et al. have shown that consumers/patients in healthcare are concerned that ML- and DL-based solutions would not account for their unique injury features as much as a human would. This can be true for DL models used in medical image classification, which are trained on large datasets and may not be able to look for unique features in a patient’s imaging output. This suggests that interpretability, human-in-the-loop (HITL) feature labelling, and unlearning of inappropriate features could be combined to move towards more personalised models.
Objectives
The main objective of this project is to develop transparent medical image analysis solutions in which domain experts can participate in the model building process, by combining model interpretability and HITL ML techniques. These solutions will be driven by and demonstrated in the important application of Anterior Cruciate Ligament (ACL) injury classification. Models with prototype layers will be used to develop interpretable classifiers. The proposed project will involve designing user interfaces that help users navigate a trained model’s decision process and enable them to interact with the model by reporting back inaccurate image features the model may have picked up in its training phase. This will be followed by unlearning incorrect features that the trained model may be using to reach classification decisions, which in turn improves classification performance. The methods developed will be released to radiologists for evaluation. The resulting interpretability and unlearning solutions should be transferable across different knee joint injury classification problems, as well as to other body parts and imaging modalities. This project will have three main contributions: (1) prototype-layer-based medical image classification and feature visualization; (2) a user interface for HITL feature-labelling feedback; (3) improved system performance through unlearning incorrect features extracted by a trained model.
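To make concrete what a prototype layer buys in interpretability: class scores can be computed as negative squared distances from an image embedding to learned class prototypes, so every decision is explained by pointing at its nearest prototype. A hedged sketch with invented labels and values:

```python
# Sketch of a prototype layer's forward pass. Each class is represented by a
# learned prototype vector; a decision is explained as "closest to prototype k".

def prototype_scores(embedding, prototypes):
    """embedding: list of floats; prototypes: dict label -> prototype vector."""
    scores = {}
    for label, proto in prototypes.items():
        d2 = sum((e - p) ** 2 for e, p in zip(embedding, proto))
        scores[label] = -d2  # higher score = closer prototype
    return scores

def classify(embedding, prototypes):
    scores = prototype_scores(embedding, prototypes)
    return max(scores, key=scores.get)
```

Unlearning an incorrect feature then has a natural handle: the offending prototype (or the patch it was derived from) can be removed or re-fitted, rather than retraining the whole network blindly.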
MRI Classification, Automatic Report Generation and Modelling of Sensor Data for Musculoskeletal Injury Management
The main objective of this project is to develop machine learning techniques for musculoskeletal injury management. The themes of the project are computer vision, language in machine learning, and machine learning for sequential data. The project will consist of two phases. The first phase will focus on medical image analysis of Magnetic Resonance Imaging (MRI) in musculoskeletal regions such as the calf muscle. The aim of this retrospective analysis is to detect and classify injuries. The project may also extend to classifying injuries based on Diffusion Tensor Imaging (DTI). Applications of machine learning in medical image analysis extend beyond classification problems. Automatic generation of the radiologist report is a useful application of machine learning, as analysing a medical image and writing a standardised report can be both time-consuming and tedious. Many notable approaches have been developed for automating the generation of radiology reports for X-rays of the chest; however, there is a scarcity of literature applying such techniques to 3D medical images such as MRIs. Phase one of the project will apply or further develop approaches to automate radiologist reports for MRIs of musculoskeletal areas. The phase-one model will be trained with multi- and cross-modal inputs: multiple images from an MRI taken from different angles will be input into the model, along with the radiologist report, in order to train the model to generate the report. Structured data such as injury statistics and demographic data will also be used as input to aid the generation of the radiologist report. Methods of combining multi- and cross-modal input will be explored. The objective of phase two is to develop machine learning techniques for musculoskeletal injury rehabilitation. This will involve analysing sensor data to ensure physiotherapy or exercise is performed correctly. 
Electromyography (EMG) and/or accelerometry are some of the possible types of sensor data that will be used in phase two.
Bayesian approaches to identifying and ameliorating systematic bias in machine learning algorithms and data
Researchers have identified a number of ways in which various standard machine learning approaches can produce systematic bias against underrepresented or minority groups (or, more generally, against categories only present in small subsets of data). This project will look at ways in which such systematic bias can arise in Bayesian inference (a fundamental normative model behind many machine learning approaches). The project will also propose techniques for mitigating this bias, and will implement, test and validate these techniques. The aim is to produce objective measures of the degree of systematic bias produced by standard Bayesian approaches for data sets with particular characteristics, and to produce measures of the degree to which extensions of the Bayesian approach influence or mitigate such systematic bias. The aim is not just to address the origins of bias in the approximate approaches implemented in various machine-learning techniques, but also to investigate bias in general, by looking at normatively correct models of reasoning such as full Bayesian inference (models which underlie these approximate approaches).
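As a minimal illustration of one such mechanism (a textbook effect, not a result of this project): under a shared Beta-Binomial model, the posterior-mean estimate for a small group is pulled toward the prior far more strongly than for a large group with the same observed rate, so the minority group is systematically mis-estimated. The numbers below are synthetic:

```python
# Conjugate Beta-Binomial update: posterior is Beta(alpha + s, beta + n - s),
# with posterior mean (alpha + s) / (alpha + beta + n).

def posterior_mean(successes, trials, alpha=2.0, beta=2.0):
    return (alpha + successes) / (alpha + beta + trials)

majority = posterior_mean(800, 1000)  # observed rate 0.8, large sample
minority = posterior_mean(8, 10)      # same observed rate 0.8, small sample
```

Both groups have an observed rate of 0.8, yet the small group's estimate sits noticeably closer to the prior mean of 0.5; measuring and correcting exactly this kind of prior-driven shrinkage for small categories is the sort of objective measure the project targets.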
Quantum computing is a potential game-changer, as the field promises an exponential increase in computing power that will enable breakthrough applications in areas as diverse as vaccine and drug discovery, climate modelling, protein folding, financial services and artificial intelligence, among others. Equal1 Laboratories Ireland Limited (Equal1) is an innovative start-up creating a paradigm shift in quantum computing by developing disruptive, scalable and cost-effective quantum computing technology. Equal1 currently has a number of quantum computers operating at 3 kelvin, with one on site at UCD that will be available for conducting experiments as part of this PhD project. A key goal is the use of variational hybrid quantum-classical algorithms. This class of algorithms enhances classical machine learning algorithms with quantum machine learning algorithms, for example quantum Boltzmann machines, which are used to learn binary probability distributions. This type of algorithm is very promising for gaining an advantage over a classical computer in the Noisy Intermediate-Scale Quantum (NISQ) era, and can be applied to large-scale generative and optimization tasks in high-impact domains (e.g. medical image processing). The student will participate in a collaborative project with Equal1 to explore new data-driven and machine learning-based algorithms in the field of Quantum Artificial Intelligence. This will require the student to tackle open problems such as the input and output problems, the comparison of gate-based and adiabatic quantum computers, and the analysis and development of new approaches that make best use of the underlying hardware capabilities (e.g. native gates and their qubit connectivity, error characteristics).
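To illustrate the variational hybrid loop in miniature (a purely classical toy, not a model of Equal1's hardware): a single qubit prepared as Ry(θ)|0⟩ has ⟨Z⟩ = cos θ, and a classical optimiser tunes θ to minimise this "energy" using the analytic gradient:

```python
# Toy variational hybrid loop: the "quantum" part is the expectation value
# <Z> = cos(theta) for the state Ry(theta)|0>; the classical part is plain
# gradient descent on theta.
import math

def expectation_z(theta):
    return math.cos(theta)  # <0| Ry(theta)^dagger Z Ry(theta) |0>

def optimise(theta=0.3, lr=0.4, steps=100):
    for _ in range(steps):
        grad = -math.sin(theta)  # d/dtheta of cos(theta)
        theta -= lr * grad
    return theta
```

On real NISQ hardware the expectation value would be estimated from repeated shot measurements (with noise), which is what makes the classical outer loop, and its robustness to noisy gradients, the interesting part.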
Event driven AI techniques and hardware implementation for IoT wearable devices
Wireless biomedical sensors could dramatically reduce the costs and risks associated with personal health care, and are increasingly exploited by telemedicine and e-health systems. However, because of the large power consumption of continuous wireless transmission, the battery life of the sensors is reduced for long-term use. Sub-Nyquist continuous-time discrete-amplitude (CTDA) sampling approaches using level-crossing analog-to-digital converters (ADCs) have been developed to reduce the sampling rate and energy consumption of the sensors. However, traditional machine learning techniques and architectures are not compatible with the non-uniformly sampled data obtained from level-crossing ADCs. This project aims to develop analog algorithms, circuits, and systems for implementing machine learning techniques on CTDA-sampled data in wireless biomedical sensors. This “near-sensor computing” approach will help reduce the wireless transmission rate and therefore the power consumption of the sensor. The output rate of the CTDA is directly proportional to the activity of the analog signal at the input of the sensor; therefore, artificial intelligence hardware that processes CTDA data should consume significantly less energy. The project involves algorithm development, circuit/chip implementation of the event-driven AI, testing and verification, etc.
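The key property of level-crossing sampling can be sketched in software: a sample (event) is emitted only when the input crosses one of a set of uniformly spaced amplitude levels, so the output event rate tracks signal activity and a flat signal produces no events at all. The level spacing below is an illustrative parameter:

```python
# Sketch of level-crossing (CTDA) sampling on an already-digitised waveform.
# Events are (index, quantised level) pairs; a constant signal yields none.

def level_crossing_sample(signal, delta=0.5):
    """signal: uniformly sampled amplitudes; delta: spacing between levels."""
    events = []
    last_level = round(signal[0] / delta)
    for i, x in enumerate(signal[1:], start=1):
        level = round(x / delta)
        if level != last_level:
            events.append((i, level * delta))
            last_level = level
    return events
```

This non-uniform event stream is exactly what standard, clock-driven ML architectures cannot consume directly, which motivates the event-driven AI hardware the project targets.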
Reasoning with Cases and Knowledge Graphs to Uncover Relationships in Financial Markets
The stochastic nature of financial markets reflects a complex network of interactions, making them a challenging target for analysis and prediction. Within this application domain, identifying meaningful relationships between financial assets is a difficult but important problem for various financial applications, including portfolio optimization, benchmarking company performance, identifying peers and competitors, and quantifying market share. However, with recent research, particularly that using machine learning (ML) and deep learning (DL) techniques, focused mostly on returns forecasting, the literature on modelling asset correlations has lagged somewhat. To address this, the focus of this work is on developing novel ML and DL frameworks to uncover relationships between financial assets. These frameworks will leverage multiple data modalities, and the efficacy of the learned relationships will be demonstrated on several downstream tasks in the financial domain, including portfolio optimization, returns forecasting and sector classification.
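The natural baseline against which learned relationships would be compared is the Pearson correlation of return series, the simplest notion of "relatedness" between two assets. A self-contained sketch with synthetic returns:

```python
# Pearson correlation of daily return series: the classical baseline for
# asset-relationship modelling.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

asset_a = [0.01, -0.02, 0.03, 0.00]
asset_b = [0.02, -0.04, 0.06, 0.00]  # asset_a scaled by 2: perfectly correlated
```

Linear correlation of this kind misses lagged, non-linear and cross-modal relationships (news, filings, supply chains), which is precisely the gap the proposed ML/DL frameworks aim to fill.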
An Automated Diabetic Retinopathy Screening and Classification System
Diabetic retinopathy is a chronic eye disease that is the principal cause of permanent vision loss. An automated diabetic retinopathy screening and classification system can not only aid ophthalmologists with efficient, accurate and timely diagnosis of diabetic retinopathy but can also classify diabetic retinopathy according to severity level. Depending on the severity, the appropriate treatment of the patient can then be initiated without delay. The research questions that will primarily be investigated are diabetic retinopathy screening, grading of diabetic retinopathy into a specific level, and identification of different retinal pathological structures. The main challenge will be to tackle the intensity similarities between pathological structures (like exudates) and retinal features (like the optic disc). Furthermore, optic disc detection is highly reliant on photographic illumination: poor illumination conditions result in a very dark optic disc region. Accuracy is severely degraded when pathological structures are wrongly identified as retinal structures and removed, and vice versa.
As IoT devices become more widespread, creating more and more data, it is no longer credible that cloud computing can absorb and process all of the data, analysis and decision-making involved. Whether in terms of bandwidth, computing power, or algorithmic adaptability, new architectures and new machine learning (ML) techniques need to emerge to meet these new IoT needs. Edge computing is one of the recent techniques that allow some of the processing to be performed locally before results are sent to the cloud for analysis. Using an edge processor, it is possible to move part of the intelligence and adaptability from the cloud directly to the local IoT mesh network. However, the computing power and power budget of such edge devices remain a limitation for ML frameworks, meaning that ML techniques must be optimized or custom designed.
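One standard optimisation of the kind this implies is post-training 8-bit quantisation, mapping model weights to integers in [-127, 127] with a single scale factor so inference fits in integer arithmetic on a constrained device. A simplified sketch, assuming symmetric per-tensor scaling:

```python
# Post-training symmetric 8-bit quantisation sketch: weights become small
# integers plus one shared float scale, shrinking storage roughly 4x versus
# 32-bit floats and enabling integer-only edge inference.

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]
```

The reconstruction error per weight is bounded by about half the scale, which is why small models often lose little accuracy while gaining a large reduction in memory and compute.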
Developing a Methylation Risk Score for Telomere Shortening and investigating its association with age and stress-related disorders
At the end of each chromosome in a human cell is a cap-like structure called a telomere. Just like the plastic tip on the end of a shoelace, the telomere keeps the DNA from fraying. As cells divide, telomeres get shorter; over time the DNA unravels like a shoelace and the cell dies. As telomeres shorten, our tissues show signs of ageing, so telomere length (TL) is a marker of ageing. Previously, scientists identified seven genetic determinants of TL, providing novel biological insights into TL and its relationship with disease. However, identifying genetic determinants of TL was only the first step in understanding the role of TL in disease. Recently, scientists have identified a second layer of information (the epigenome) that sits on top of our DNA, acting like a molecular switch that fine-tunes how genes are regulated. The primary aim of this study is to use machine learning methods to train a predictor of telomere shortening using epigenomic profiling data. It will provide a framework for identifying biological predictors of ageing, uncovering insights into telomere biology, and may lead to the identification of potential epigenomic biomarkers and/or therapeutic targets of ageing and stress-related phenotypes like depression. The primary objectives of this study are to: 1. Use machine learning methods (e.g. LASSO penalised regression models) to train a predictor of TL based on DNA methylation (a type of epigenetic modification) in a large epidemiological sample (n = 819). 2. Develop a methylation risk score (MRS) for telomere shortening based on the CpG sites identified in the training set; this MRS will be validated in two independent replication blood cohorts (n = 192 and n = 178, respectively), collated in-house, that have both DNA methylation and TL measured. 3. Test whether the identified MRS for TL shortening is associated with results from previously published DNA methylation-wide association studies of age-related (e.g. Alzheimer's Disease) and stress-related (e.g. depression) diseases. 4. Identify the causal relationship between DNA methylation changes and TL in humans using mediation analysis. By the end of the project we will have a robust methodology utilising machine learning algorithms which could be applied to other biological markers, such as pro-inflammatory cytokines, to examine their relationship with DNA methylation.
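Once the penalised regression has selected CpG sites, applying the MRS to a new sample reduces to a weighted sum of methylation beta-values at those sites. A minimal sketch; the CpG identifiers and coefficients below are invented for illustration, not results of the study:

```python
# Applying a methylation risk score (MRS): sum of methylation beta-values at
# selected CpG sites, weighted by the fitted (e.g. LASSO) coefficients.

def methylation_risk_score(betas, weights):
    """betas: dict cpg_id -> methylation beta value in [0, 1];
    weights: dict cpg_id -> model coefficient (absent CpGs contribute 0)."""
    return sum(w * betas.get(cpg, 0.0) for cpg, w in weights.items())

weights = {"cg001": -1.2, "cg002": 0.8}  # hypothetical coefficients
sample = {"cg001": 0.5, "cg002": 0.25}
```

Validation in the replication cohorts would then amount to checking how well this scalar score tracks measured TL, and whether it associates with the age- and stress-related outcomes of interest.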
Analysis of Aspects of ML Algorithms that Lead to Bias
Issues of algorithmic fairness/bias have received a lot of attention in AI and ML research in recent years. There are two main sources of bias in ML. Negative legacy: the bias is there in the training data, due to poor sampling, incorrect labeling, or discriminatory practices in the past. Underestimation: the classifier underfits the data, thereby focusing on strong signals in the data and missing more subtle phenomena. In most cases the data (negative legacy) rather than the algorithm itself is the source of bias. Fairness research focuses on fair outcomes regardless of the source of the problem, so the underestimation side of algorithmic bias has not received much attention. However, the algorithmic side of algorithmic bias is important because it is inextricably tied to regularisation, i.e. the extent to which the model fits (or overfits) the data. Overfitting occurs when the model fits to noise in the training data, reducing generalisation, and ML practitioners expend a lot of effort avoiding it. This PhD research will focus on the algorithmic aspect of algorithmic bias and the relationship between model fitting and underestimation. An initial paper on this research is available on arXiv: Pádraig Cunningham and Sarah Jane Delany, “Algorithmic Bias and Regularisation in Machine Learning”, https://arxiv.org/abs/2005.09052. For a wider perspective on research relating to fairness in ML, see the papers published at the ACM FAccT conferences: https://facctconference.org.
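One simple way to quantify underestimation for a group is the ratio of the rate at which the model predicts the desirable outcome for that group to the rate at which it actually occurs there; values below 1 indicate underestimation. This is a hedged sketch in the spirit of the cited paper, not necessarily its exact metric:

```python
# Per-group underestimation ratio: predicted positives / actual positives
# within a group. Below 1.0 the model under-predicts the positive outcome
# for that group.

def underestimation_ratio(y_true, y_pred, group, group_id, positive=1):
    idx = [i for i, g in enumerate(group) if g == group_id]
    actual = sum(1 for i in idx if y_true[i] == positive)
    predicted = sum(1 for i in idx if y_pred[i] == positive)
    return predicted / actual if actual else float("nan")
```

The research question then becomes how this ratio moves as regularisation strength changes, i.e. whether heavier regularisation systematically drives it below 1 for minority groups.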
Orchestration of Microservices on the Edge: A Machine Learning-Based Approach
International Data Corporation predicts that the collective sum of the world’s data will grow to 175 ZB by 2025, of which 90 ZB will be created on IoT and edge nodes. Offering processing and storage services at the cloud level is costly. Furthermore, in some applications, requirements such as low latency, privacy, and scalability are not satisfied if all data is uploaded to the cloud [Mechalikh 2019]. To address these issues, edge or fog computing is a new paradigm that allows computation and storage to happen at the edge of the network [Bonomi 2012]. Edge nodes are heterogeneous and operate in volatile and dynamic environments, with high degrees of mobility and geo-distribution. Additionally, the limited storage and computational power of edge nodes require applications’ software to be developed as a set of lightweight, independent, executable modules called microservices. Therefore, in such an environment, edge nodes require an orchestrator, or a set of orchestrators, to support the orchestration of microservices on demand based on available resources. Some of the challenges in these environments are as follows [LM Vaquero 2019][K Velasquez 2018]:
• Mobility and dynamism: due to the uncertainty of mobile and dynamic environments, a network of adaptive orchestrators must be designed so that they can adapt to changes in the environment, respond to end-users’ demands on the fly, and satisfy quality-of-service requirements.
• Heterogeneity: the orchestrator must employ compatible techniques, such as container-based techniques, to work with a wide range of heterogeneous software and hardware infrastructure at the same time.
• Functionality chaining: arranging and scheduling heterogeneous microservices with different Service Level Agreements and resource requirements, which do not have a standard specification, is still an open research challenge.
• Data streaming and data scattering: identifying data sources that are frequently accessed by microservices and offering a data scattering mechanism that improves bandwidth utilization and reduces data access time.
The proposed research aims to design a distributed network of orchestrators that utilizes Machine Learning techniques so it can: (a) operate in heterogeneous, mobile, and dynamic environments by adapting services based on the availability of resources in the environment; (b) provide a dynamic arrangement of functionality chaining to ensure the interoperability of the heterogeneous microservices and the required resources; and (c) identify and predict frequently used data to facilitate data accessibility and improve the efficiency of the system.
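To give a flavour of objective (a), a placement decision can be framed as scoring candidate edge nodes by current resource headroom combined with a predicted availability (e.g. from a learned mobility model). The weighting, the headroom normalisation, and the node attributes below are illustrative assumptions:

```python
# Sketch of ML-informed microservice placement: score each node by resource
# headroom and predicted availability, then place on the highest-scoring node.

def placement_score(node, cpu_need, mem_need, w_avail=0.5):
    if node["cpu_free"] < cpu_need or node["mem_free"] < mem_need:
        return 0.0  # node cannot host the microservice at all
    headroom = min(node["cpu_free"] / cpu_need, node["mem_free"] / mem_need)
    # Cap the headroom contribution so availability still matters for big nodes.
    return (1 - w_avail) * min(headroom / 4.0, 1.0) + w_avail * node["predicted_availability"]

def choose_node(nodes, cpu_need, mem_need):
    return max(nodes, key=lambda n: placement_score(n, cpu_need, mem_need))
```

In the proposed system the `predicted_availability` term is where the learning would live: a model trained on node mobility and churn traces, replacing this hand-set value.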