It is estimated that over one billion people live in slums without access to public services such as water and sanitation. Knowing the location and extent of these settlements is critical for monitoring the United Nations’ Sustainable Development Goal of ensuring adequate housing and basic services for all by 2030. However, detecting the location of slum areas at a global scale is an open problem. Currently, the main source of information about the percentage of the world’s urban population living in slums comes from housing census surveys, which are labour-intensive, time-consuming and require substantial financial resources. An alternative is to use passively collected data, such as satellite imagery, together with image processing techniques for the task of mapping these settlements. In this context, recent reviews show that the use of machine learning image processing techniques is still very scarce, but that it shows promising results.
The proposed research seeks to advance the development of a global slum inventory. The first objective is to develop a new algorithm for mapping slums. To achieve this, a georeferenced dataset of slum communities, made available by the Brazilian government, will be used. The characteristics of this dataset make it well-suited for this research. Firstly, the slum areas are already georeferenced and labelled, a process that is often very time-consuming in machine learning projects. Secondly, it covers many cities, which allows models to be trained in different urban contexts; the lack of such variety has been identified as a limitation of current studies, so addressing it would contribute to the advancement of the state of the art on the topic. The second objective is to utilise machine learning techniques to quantify the population living in these settlements. To assess the suitability of the algorithms that will be developed in this research, the results will be compared with official census data. The models will also be evaluated in terms of the cost of acquiring the images, computational complexity and the generalizability of the approach to other regions.
Both humans and machines have difficulty detecting fake news and bias, and identifying emotions. These problems take on a new dimension with the advent of ever more sophisticated machine-generated text, such as GPT-2 and Grover, which were both released this year. We now face the additional difficulty of detecting machine-generated synthetic text. Such is its sophistication that there is ongoing work on release strategies for text generators in order to avoid misuse. GPT-2 underwent a staged release and was fully released publicly only last week, whereas Grover’s authors plan to release it because they found “the best defense against Grover turns out to be Grover itself”. There is ongoing work to detect synthetic text and bias. These approaches can be categorised into human detection, automated ML-based detection, and human–machine teaming. Metadata-based prevention (e.g. time taken to write text, social graph of participants, etc.) provides another tool for detecting synthetic text. My initial scan of the work to date shows no inclusion of emotion classification in these approaches. In one example using controlled generation, Grover was prompted with the headline “Timing of May’s ‘festival of Britain’ risks Irish anger” (note the emotion “anger” in the prompt) and tasked to write an article. The human-authored article includes emotional words such as “fear”, “attacks”, “hostility” and “mocked”, whereas Grover’s generated article is relatively light on emotion. Other useful applications for emotion classification include assisting people who have difficulty detecting emotions, e.g. those with Asperger’s syndrome, and allowing people to filter content based on emotions. Can emotion help detect synthetic text?
There are several schemes for classifying emotions. IBM Watson Tone Analyzer detects seven tones in written text: “anger, fear, joy, sadness, confident, analytical, and tentative”. Other approaches use the six emotions of Ekman’s model: “happiness, sadness, surprise, disgust, anger, fear”. Several other models exist for classifying emotions.
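As an illustration of how such an emotion scheme might be operationalised, the sketch below scores text against Ekman’s six emotions with a tiny hand-built lexicon. The lexicon is purely illustrative (a real system would use a resource such as the NRC Emotion Lexicon or a trained classifier):

```python
# Minimal lexicon-based emotion scorer over Ekman's six basic emotions.
# The lexicon entries below are illustrative, not a real resource.
from collections import Counter

LEXICON = {
    "anger": {"anger", "hostility", "attacks", "mocked", "furious"},
    "fear": {"fear", "afraid", "threat", "risks"},
    "happiness": {"joy", "celebrate", "festival"},
    "sadness": {"grief", "loss", "mourn"},
    "surprise": {"shock", "unexpected"},
    "disgust": {"disgust", "revolting"},
}

def emotion_profile(text):
    """Return per-emotion counts of matched lexicon words in `text`."""
    tokens = [t.strip(".,;:'\"!?").lower() for t in text.split()]
    counts = Counter()
    for emotion, words in LEXICON.items():
        counts[emotion] = sum(1 for t in tokens if t in words)
    return dict(counts)
```

Applied to the Grover prompt above, such a scorer would flag both “anger” and “risks”, giving a crude emotion fingerprint that could be compared between human and synthetic articles.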
This proposal is to investigate the role emotion classification can play in synthetic text detection. A possible roadmap is to establish the state of the art; partner with OpenAI (who are reaching out for partners); choose the most appropriate emotion set(s) and benchmark(s); and then compare human and synthetic text using, for example, a classifier trained on both human and synthetic text. The result of this proposal should prove useful and could stand alone, or form part of the wider analysis of neural network interpretability and explainable AI.
The goal of this project is to allow higher-level control of a sound synthesis model. We will have a corpus of sounds. At a low level each sound is represented as a time series of the sampled waveform. Each sound can be analysed to give mid-level parameters, e.g. a log-mel-spectrogram representation or another time-frequency description, and we can also collect labels such as roughness or happiness on a scale of 1-10 (call these “high-level parameters”).
We have a synthesis model that is controlled by “mid-level parameters”. Currently many machine-learning-based synthesis techniques use log mel-spectrograms to represent audio. This is a perceptually informed time-frequency representation. In synthesis it poses the challenge of reconstructing the phase. This can be done using existing signal processing methods (although WaveNet uses a ‘neural vocoder’ to convert the spectrogram into a time-domain waveform). We will explore other representations of sound which may be more suitable for flexible single-source sound synthesis. Our synthesis model will generate phase in a deterministic manner.
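For concreteness, a minimal log-mel-spectrogram computation written with NumPy only might look as follows; the window, hop and filterbank sizes are illustrative defaults, not choices made by this project:

```python
# Sketch of a log-mel-spectrogram: windowed FFT power, projected
# through a triangular mel filterbank, then log-compressed.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters mapping FFT power bins to mel bands."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (T, n_fft//2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T  # (T, n_mels)
    return np.log(mel + 1e-10)
```

Note that the phase is discarded at the `np.abs` step; this is exactly the information a synthesis model must recover or, as proposed here, generate deterministically.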
We hypothesise that the synthesis model will give better-sounding results on transients, etc., than inverting the spectrogram.
Our goal is to control the synthesis model via high-level parameters. The user will directly control high-level parameters, the UI will output mid-level parameters, and the synthesis model will then output audio.
The UI can be driven by a neural network. The input layer will be high-level parameters and the output will be mid-level parameters. Initially it will be an “instantaneous” model i.e. for each window it will take in current values of high-level parameters and output mid-level parameters for the same window. Later steps in the research could make it non-instantaneous but still causal. This could be done using a convolutional network taking in multiple recent time-steps, or a recurrent neural network e.g. LSTMs.
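The instantaneous variant can be sketched as a one-hidden-layer network in NumPy; the dimensions (two high-level parameters in, forty mid-level parameters out) and the untrained random weights are illustrative assumptions only:

```python
# Per-window mapping from high-level parameters (e.g. roughness,
# happiness) to mid-level parameters (e.g. mel-band magnitudes).
# Weights here are random; in practice they would be trained on
# (high-level label, mid-level analysis) pairs from the corpus.
import numpy as np

rng = np.random.default_rng(0)

class InstantaneousMapper:
    def __init__(self, n_high=2, n_hidden=32, n_mid=40):
        self.W1 = rng.normal(0.0, 0.1, (n_high, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_mid))
        self.b2 = np.zeros(n_mid)

    def forward(self, high):
        """high: (T, n_high) windows -> mid-level params (T, n_mid)."""
        h = np.tanh(high @ self.W1 + self.b1)
        return h @ self.W2 + self.b2
```

The non-instantaneous extensions mentioned above would replace the per-window matrix multiply with a causal convolution over recent windows or a recurrent state.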
Variational autoencoders (VAEs), generative adversarial networks (GANs) and hybrids of the two are probably the state of the art for unsupervised learning. Google’s Magenta project has used these models to create new audio synthesiser spaces. Google’s WaveNet is an autoregressive convolutional model for audio, and Google’s NSynth is a WaveNet-style autoencoder which allows user control of a control space. In NSynth the dimensions are defined by interpolation between multiple real instruments. The proposed research is different because it chooses fixed high-level features as the control variables.
Using Knowledge Graphs to Improve Data Quality in Machine Learning
Knowledge graphs have a wide variety of applications such as question-answering, digital assistants, structured reasoning and exploratory reasoning. One aspect of knowledge graphs is that they can be validated by a rule-based system, such as a knowledge base.
Existing datasets may be semantically incoherent. Creating large-scale knowledge graphs out of existing datasets would add knowledge representation to the semantically incoherent data, which in turn would help improve accuracy in prediction tasks for machine learning models. Knowledge graphs are by definition relationship-rich because they allow any-to-any relationships. The semantic graph integration approach is an effective large-scale, web-scale data integration method, and one that is symbiotic with machine learning. It allows the creation of better and more fully disambiguated training sets, improving the quality of the dataset and thus the results.
The proposed model transforms a dataset into a knowledge graph and uses a knowledge base to evaluate data quality as an additional layer in prediction with machine learning models. Such models would have the potential to simplify the task, use reasoning to enhance the dataset and surface quality issues in the underlying data. Especially in deep domains with very complex rules and complex interactions between rules, there is no real substitute for this approach. Such scenarios are evident when there is a requirement to integrate disparate domains.
To summarize, the project combines the research fields of the Semantic Web and Machine Learning. The goal is to design an ontology that validates datasets which have previously been translated into knowledge graphs. This would involve work on description logic and propositional logic and building a knowledge base in Prolog. The workflow could go along the lines of: 1. translating a dataset into a knowledge graph which represents the dataset’s characteristics and tells us about its quality; 2. feeding the dataset into the knowledge base (the main part of the project) to check against DL/PL rules; 3. summarizing the results and making suggestions/automatic corrections to the dataset. A part of the work would be to investigate a SHACL-style restriction language to define the rules.
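To make step 2 concrete, the toy sketch below checks a dataset, represented as subject-predicate-object triples, against simple validity rules. In the project itself the rules would be DL/PL rules held in a Prolog knowledge base, so this Python version is only an analogy; all data and rules are fabricated:

```python
# Toy triple-store validation: each rule constrains the object values
# allowed for a given predicate, standing in for DL/PL rules.
triples = [
    ("alice", "age", 34),
    ("bob", "age", -5),          # violates the range rule below
    ("alice", "worksFor", "acme"),
]

rules = [
    # (predicate, check, message)
    ("age", lambda v: isinstance(v, int) and 0 <= v <= 130,
     "age must be an integer in [0, 130]"),
]

def validate(triples, rules):
    """Return (triple, message) pairs for every rule violation found."""
    violations = []
    for s, p, o in triples:
        for pred, check, msg in rules:
            if p == pred and not check(o):
                violations.append(((s, p, o), msg))
    return violations
```

Step 3 of the workflow would then consume these violation reports to suggest or apply corrections.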
PhD Projects Short Description

Injuries are extremely common in any sport. When an athlete is injured, a decision must be made on when they are ready to return to play, which is crucial in managing the risk of re-injury. Wearable sensors such as Inertial Measurement Units (IMUs) allow motion characteristics to be captured during motor tasks such as lunging, squatting, walking, etc. This is giving rise to an emerging field of ‘digital biomarkers’, where digitally captured biomechanical features are used to train models to characterize the various stages of recovery and help identify when the athlete is fit to return to normal activity.
In this research we will attempt to develop models that will enable us to identify the potential role that such digital biomarkers could have in the sports injury field. In particular, motion data from functional motor tasks will be used to interrogate the potential role of digital biomarkers in recovery following anterior cruciate ligament (ACL) injury and subsequent corrective surgery.
One of the challenges in implementing this study will be the limited data availability and, more particularly, the difficulty in obtaining labelled data. When working with wearable sensor data from athletes, obtaining labelled data can be difficult because of the time-consuming nature of labelling and the domain-specific knowledge it requires. To overcome this limitation, methods such as transfer learning and semi-supervised learning can be investigated. Transfer learning offers the potential to leverage knowledge from previously trained models, either from the same or a different domain. It has recently been successfully applied to images and text, but there has been less work in this area for sensor data (time series data), which gives scope for novel methods to be developed.
This study aims to investigate new data-driven approaches that can be used to aid clinicians in making informed decisions on when an athlete is fit to return to play. It is planned to use pre-season digital data (IMU sensor data) and analog measures (physical measures such as reach distance) from athletes who have sustained an ACL injury.
Machine learning and AI to optimise the cost of ownership for small-scale reverse osmosis processes
The demand for agricultural, industrial, and potable water for domestic use has increased continuously over the last thirty years, reportedly increasing by 1% year on year since the 1980s (UN Water report, 2019). By 2050 consumption is expected to exceed current usage by 20 to 30%, leaving many countries experiencing severe water stress. It is evident that effective and efficient management of this vital resource is critical. Desalination technologies are becoming increasingly necessary to meet water demand, with reverse osmosis being the most prevalent technology, accounting for greater than 60% of installed global capacity (Desal data, 2016). Reverse osmosis, in conjunction with its necessary pre-treatment processes, is resource intensive, particularly in terms of energy, chemicals, and membranes. Economies of scale mitigate operating costs somewhat for large seawater desalination plants. However, smaller-scale systems are becoming more common to treat low volume saline water for industrial and agro-industrial applications, and these smaller systems pose specific challenges in terms of process and operational cost optimisation. ML techniques such as support vector machines and artificial neural networks have been applied to model various desalination processes that pose multivariate and time series challenges. However, it is unclear whether these approaches are optimal for smaller-scale industrial seawater treatment. The aim of this project is to develop models using AI and ML techniques to optimise the cost of ownership in small-scale desalination and water treatment processes. An instrumented and automated reverse osmosis rig will be used to collect data under different operating conditions. 
Using a combination of existing reverse osmosis operational data and results from experimental work, AI/ML techniques will be applied based on current methodologies and engineering techniques to establish the benefits and limitations of computational intelligence and propose methods for optimisation of small-scale desalination processes.
Cardiac Magnetic Resonance Imaging is one of the most widely used scanning methods for acquiring data from patients for a variety of medical conditions. Similarly, electrocardiograms provide much information about different parts of the heart and its cycle. Whilst the adoption of machine learning techniques in medical image processing has been slower than in other domains, this is a growing area given the potential of machine learning to assist in diagnosis and reduce costs. There are already examples of machine learning algorithms using each of these sources individually for identifying and locating issues. Using both signals at the same time is an interesting research direction, as it could lead to performance improvements while enhancing the explainability of the diagnosis that this kind of algorithm usually lacks. Data is a key challenge when trying to use off-the-shelf algorithms in this area, specifically the amount of annotated data and its quality. Many researchers report in the literature how they struggle to achieve good results with existing annotated data, especially when working with open datasets. Furthermore, in some cases there is plenty of annotated data, but with noisy labels that lead to very poor accuracy outside the training datasets. For this reason, in my PhD project I would like to address this challenge using semi-supervised learning, not just to overcome the lack of labelled data but also to improve performance on test sets and to enhance the explainability of the algorithms. To achieve this goal I will use data provided by collaborators in Tampere and perhaps some of the available open datasets. With it, my goal will be to bring to this field some of the current state-of-the-art techniques in computer vision for similar problems and to extend them based on the lessons learned.
I strongly believe that succeeding in this objective will have an impact in the medical and health sciences, improving diagnosis while reducing its cost, and improving the explainability of state-of-the-art machine learning algorithms.
Injecting Structured Knowledge into Pretrained Language Models
Pre-trained language models such as BERT (Devlin et al., 2018) and XLNet (Yang et al., 2019) have greatly improved the performance of many NLP tasks. These models can capture rich patterns from large-scale corpora and learn good representations for text. However, such models have shortcomings: they underperform on complicated noisy text (Xiong et al., 2019) or on texts that need inference and external knowledge to be understood. Niven et al. (2019) found that the reason BERT performs well on a reasoning task is that it exploits spurious statistical cues in the dataset, highlighting BERT’s limited capability to truly understand natural language. Liu et al. (2019) proposed incorporating knowledge graphs into BERT to aid understanding. We build on this work and conjecture that incorporating structured knowledge, such as entity relations or linguistic information, can improve such models’ performance on some NLP tasks.
Specifically, we aim to explore how to inject structured knowledge into large-scale pre-trained models, and we wish to focus on two tasks: Question Answering and Sentiment Analysis. Our main focus will be on the English language, but some other languages such as Chinese will be explored – resources permitting – as well as cross-lingual representations. Two side questions that we also expect to address are 1) how to reduce model size through incorporating structured knowledge, and 2) understanding the role of particular pretraining objectives.
The human body has not yet adapted properly from the hunter-gatherer lifestyle to a more sedentary one, leading to a rise in dangerous, avoidable diseases including obesity, type 2 diabetes, and osteoporosis (Smyth, 2019). People are realising the importance of living a healthier, more active lifestyle, as the advances in wearable sensors and mobile fitness applications reflect. These technologies allow users to track their activities and set goals, but they do not take an active role in prescribing specific training and recovery activities for users (Smyth, 2019).
The context of endurance sports is well suited to machine learning for several reasons (Smyth, 2019). Firstly, the number of people participating in endurance events each year is large, and among them are many inexperienced individuals needing assistance. Secondly, the aforementioned rise in fitness technologies means that a wealth of data is available from mobile fitness apps such as Strava. Lastly, there are a number of interesting problems to be solved using machine learning techniques, such as fitness level estimation, training session classification, recovery and injury prediction, the development of personalised training programs, goal time prediction and pacing planning (Smyth, 2019).
Prior work in this area includes the development of a novel application of recommender systems to predict a personal-best marathon finish time and pacing plan for a user (Smyth, 2017). Initially this was done using runners who had run two or more marathons. The finish time prediction was accurate for fast runners, but less so for slower runners, who are likely those who would benefit most from the prediction. A later paper (Smyth, 2018) largely improved on this using a richer training history for runners. However, these methodologies do not allow first-time marathon runners to obtain a predicted finish time.
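The case-based intuition behind such recommenders can be sketched as follows; the runner data, features and distance metric here are fabricated for illustration and do not reflect the actual method of (Smyth, 2017):

```python
# Toy case-based prediction: estimate a marathon finish time from the
# most similar past runners, described by (10k pace, half-marathon
# pace) in min/km, with marathon times in minutes. Data is invented.
import math

cases = [
    ((3.5, 3.7), 165.0),
    ((4.0, 4.2), 185.0),
    ((4.5, 4.8), 210.0),
    ((5.0, 5.4), 240.0),
]

def predict_marathon(query, cases, k=2):
    """Average the marathon times of the k most similar runners."""
    ranked = sorted(cases, key=lambda c: math.dist(query, c[0]))
    return sum(time for _, time in ranked[:k]) / k
```

Extending such a scheme to first-time marathoners amounts to choosing features, such as paces over shorter race distances, that do not require a previous marathon.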
Therefore, this project will involve extending this previous work to inexperienced marathon runners by including data about different distance runs in place of missing marathons, as well as working on solving some of the other tasks mentioned. The explainability of recommendations made to users will also be a focus for this project since this will allow users to understand why specific training activities are being suggested to them.
Smyth, B., Cunningham, P. 2017. “A novel recommender system for helping marathoners to achieve a new personal-best”.
Artificial Intelligence (AI) systems are playing an increasing role in decision-making tasks for a variety of industry sectors and government bodies. As such, interactions between human users and AI systems are becoming much more commonplace, and there is a pressing need to understand how people can make sense of these systems and come to trust their abilities on diverse and critical tasks. However, these developments raise two fundamental problems: (i) how can we explain the black-box decision processes of these AI systems, and (ii) what type of explanation strategy will work best for people interacting with these systems?
Recently, the field of eXplainable AI (XAI) has emerged as a major research effort, underpinned by its own DARPA program (Gunning, 2017 DARPA Report), to find answers to these questions. For example, Kenny and Keane (2019, IJCAI-19) have proposed a Twin System approach to explain the decisions, classifications and predictions of deep-learning systems by mapping the feature weights of a black-box AI into a much more interpretable case-based reasoning (CBR) system to find explanatory cases. This type of post-hoc explanation-by-example has a long history in the CBR literature but is marked by a paucity of user studies; that is, it is not at all clear whether people find these case-based explanations useful.
The proposed research will explore both computationally and psychologically the most effective ways in which cases can be used to explain black-box AI systems. Computationally, new algorithmic methods for finding different types of cases will be developed (e.g., to find counterfactual, semi-factual and factual cases) and explored in the context of the Twin Systems approach involving the three main data types used in deep learning systems (i.e., images, text and tabular data). Psychologically, user studies will be performed to evaluate the explanatory validity of case-based explanations and to identify the optimal forms these might take to aid human users.
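A minimal sketch of factual versus counterfactual case retrieval, in the spirit of (though much simpler than) the Twin System approach: the factual explanation is the nearest case with the same label as the prediction, and the counterfactual is the nearest case with a different label. The feature vectors, labels and Euclidean distance are illustrative assumptions:

```python
# Retrieve a factual and a counterfactual explanatory case for a
# query point, given a labelled case base. Data is invented.
import math

cases = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((3.0, 3.1), "B"),
    ((2.6, 2.4), "B"),
]

def explain(query, predicted_label, cases):
    """Return (factual, counterfactual) cases for the prediction."""
    same = [c for c in cases if c[1] == predicted_label]
    diff = [c for c in cases if c[1] != predicted_label]
    factual = min(same, key=lambda c: math.dist(query, c[0]))
    counterfactual = min(diff, key=lambda c: math.dist(query, c[0]))
    return factual, counterfactual
```

In a Twin System, the distance would be computed in a feature space weighted by the black-box network rather than raw input space, and semi-factual cases would add a third retrieval criterion.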
Outcomes from the work will be (i) a generic computational framework that can be applied to any decision-making AI system, and (ii) definitive knowledge about how and which cases may be deployed to accurately explain the decision processes of such AI systems. Together these amount to a generic framework and solution to the XAI problem in the context of post-hoc explanation by example, to help users garner a better understanding of AI systems and to help them be more satisfied with, and trusting of, their decision-making processes.
In order to deliver new and innovative products in a timely manner, and to respond to growing consumer demands, companies need to mine and analyse a large number of noisy data sources (both web and in-house). The goal? To obtain a holistic view of emerging consumer fads and trends as early as possible in their hype cycle. However, current legislation (e.g. GDPR) prevents the blind analysis of any (potentially) personal or socially sensitive dataset beyond its intended collection purpose without the expressed informed consent of any affected individuals. This project aims to investigate novel automated solutions for ethical social listening: identification of trends, consumer sentiment, emerging fads and ideas, and opportunity analysis. This includes leveraging existing methods, as well as developing novel ones, to distill product reviews/ratings, search volumes and search ranks using in-house as well as external sources of (web) data. Key challenges in this project will be “intelligently” curating appropriate data sources (automated cleaning, transformation and modelling), handling mixed data, dealing with structured as well as unstructured data, and operating within the domain of Fairness, Accountability, and Transparency in Machine Learning (FATML).
EU directives set out targets for renewable electricity, heat and transport. Low carbon technologies (LCTs) such as heat pumps (HPs) and electric vehicles (EVs) are critical components of the Government of Ireland’s Climate Action Plan to respond to the need to decarbonise heating and transport. The rate of adoption of these technologies is uncertain, but they pose considerable new challenges if they are to become fully integrated components of the national infrastructure. In particular, the adoption of EVs on a large scale makes it necessary to solve large-scale, modified routing problems with many constraints more urgently than before.
This project responds to the need to identify how energy systems can be transformed to be secure (reliable), clean (green and sustainable), and fair (ensuring the citizen is at the centre of, and benefits from the transformed system). The project aims, in particular, to explore the design demands of electric vehicle routing problems and how they might be resolved efficiently and effectively.
Specifically, for problems such as the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP), the traditional approaches of integer linear programming solvers and constraint programming solvers do not scale to larger problem instances. The project will build on existing supervised, unsupervised and reinforcement learning techniques to develop scalable solutions for these optimisation problems. These machine learning techniques will be used to improve algorithmic efficiency, in terms of solution-time and to solve more difficult routing problems with more constraints while enabling developers to automatically design their algorithms to solve specific problem variants. Building on the recent advances in these fundamental machine learning areas, the project will aim to achieve the energy system transformation by providing fast, scalable, ML-assisted approaches to VRPs and, in particular, E-VRPs. Also, the project has opportunities to explore the generalisation of these techniques to other difficult, important combinatorial optimisation problems by designing problem-independent techniques.
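As a point of reference for the scaling argument, learned approaches are commonly benchmarked against simple construction heuristics, such as nearest-neighbour for the TSP, sketched below; the point data in the test is illustrative:

```python
# Nearest-neighbour construction heuristic for the Euclidean TSP:
# a fast, scalable baseline that learned policies aim to beat.
import math

def nearest_neighbour_tour(points, start=0):
    """Greedy tour: repeatedly visit the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(points, tour):
    """Total length of the closed tour through `points`."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```

CVRP and E-VRP variants add capacity and charging constraints on top of this core routing structure, which is where learned, constraint-aware policies are expected to pay off.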
Knowledge-enhanced text mining for short documents
With the vast number of short text documents available online (tweets, forum messages, social network posts, etc.), short text document mining has become an important part of natural language processing. However, unlike longer texts, short documents often lack contextual information, are often grammatically incorrect and may contain abbreviations. As a result, the traditional approaches for text mining tasks do not apply well to short documents.
The aim of this project is to investigate augmenting the traditional text mining tasks with semantic information by utilising linguistic resources such as WordNet, ConceptNet, knowledge graphs, distributed representations, Wikipedia (for identifying related concepts and therefore additional features), or other suitable ontologies/repositories.
One approach to investigate is the semantic analysis of the different parts of speech (nouns, adjectives, verbs) in local context (semantic similarity between pairs of words) and in global context (lexical chains) with WordNet, and the effect of augmenting some versus all of them on performance across different linguistic resources. Similarity measures can be evaluated based on path length and the contents of synonym or hyponym sets. The use of graph-based approaches such as the PageRank algorithm for context mapping between short texts and linguistic resources such as Wikipedia is another area we will look at.
Moreover, we will look at techniques for representing textual features, such as co-occurrences, keywords, collocations, predicate-argument relations (verb-object, subject-verb), and heads of noun and word phrases, for augmenting the text mining tasks. Another suggested approach would be building a distributional semantic model with lexical resources. The identification and usage of words and contexts, weighting schemes, and dimensionality reduction techniques (LSI, LSA and PCA) will be explored under this approach.
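As a small illustration of the distributional approach, the sketch below builds a word-word co-occurrence matrix from a toy corpus and reduces it with truncated SVD, the core operation of LSA/LSI; the corpus and window size are fabricated for the example:

```python
# Build a symmetric word-word co-occurrence matrix within a fixed
# window, then embed each word via a rank-k truncated SVD (LSA/LSI).
import numpy as np

corpus = [
    "cheap flight deal",
    "cheap hotel deal",
    "flight delayed again",
]

def cooccurrence(corpus, window=2):
    vocab = sorted({w for doc in corpus for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for doc in corpus:
        words = doc.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    M[index[w], index[words[j]]] += 1
    return M, vocab

def lsa_embed(M, k=2):
    """Rank-k embedding of each word via truncated SVD."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * S[:k]
```

For short documents, such low-rank embeddings give every word a dense context vector even when an individual tweet or message supplies almost no context of its own.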
There is a variety of possible application areas for the outcome of this research, for example fake news detection, business intelligence and the enhancement of recommender systems (content-based filtering). Another, more specific, application would be social media analysis: our approach could be applied to social media posts to analyse large volumes of unstructured data.
Monitoring Human Engagement in a Situated Dialogue
Many AI dialogue assistants, such as Siri, Xiaodu smart speakers and Alexa, have become very popular. However, despite their initial popularity, many users rely on them only to set timers or play particular songs. Achieving true conversational interaction will require considerably more research.
One metric that will be very important to maximise in order to achieve true conversational interaction is engagement. In the context of dialogue systems, engagement can be thought of as an estimate of just how much the user is interacting with a dialogue system and, importantly, a measure of whether they are enjoying the interaction and are likely to continue it in the future. Without strong engagement, the system and its users cannot maintain a long-term connection. In order to enhance engagement, we must monitor the physical and audio signs of engagement, and also fine-tune the language produced by the system in order to maximise user engagement and adjust strategy as appropriate.
In this PhD project, we will focus on the issue of engagement in dialogue and model how it can be monitored, and how dialogue policies can be adjusted to account for situated engagement. For modelling we will look at a combination of visual and audio/content monitoring to estimate engagement levels. The visual aspects might, for example, include facial thermal images (applying noir camera filters), facial expression, or body temperature and blood pressure. Audio and content elements will focus on the number of pauses in speech, among other features, as well as the sentiment of the content. In the second part of this work we will look at how the dialogue production policy can be adjusted in task-oriented dialogues to maximise engagement and fine-tune it to individual users. One potential model that we will investigate for this purpose is hierarchical deep reinforcement learning. Deep reinforcement learning has in general been found to be very useful in planning dialogue strategy. A hierarchical variant has the potential to separate the different levels of language production so that the policies can cover more potential variations without becoming computationally too costly.
Empirical studies and computational modelling will be balanced throughout the PhD project. Data collection and model validation will be carried out in the empirical studies. The computational modelling will centre on hierarchical reinforcement learning, informed by a study of the current state-of-the-art methods in end-to-end neural dialogue processing.
Using Generative Forms of Media to Summarise Video
What is the area of the project? Deep Learning has made enormous progress in the performance and capability of understanding multimedia content in the past few years. Sitting between the fields of Computer Vision and Natural Language Processing, video as a form of multimedia poses many interesting research challenges and opportunities.
Why is it important? An enormous amount of video content is being generated daily, and processing and making sense of those information streams can provide tremendous commercial value. From a theoretical point of view, being able to process and understand video content, as compared with image and text alone, is a step toward more advanced and general AI.
What are the vectors of attack? The audio and visual information in video makes it a great testbed for multi-modal Deep Learning. Highly structured and sequential in nature, it represents fertile ground for self-supervised and unsupervised learning methods. In addition, truly generative video content is under-explored compared with generative content in the form of text and images.
What is the expected outcome? The major challenges in video understanding can be roughly divided into a few sub-tasks: video structuring, video description, video shortening, video rating/ranking and video generation.
This project will focus on advancing the state-of-the-art in different video-related tasks and explore how generative forms of multimedia content can be created as summaries of video.
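As a point of reference for the video-shortening sub-task, a naive classical baseline is greedy keyframe selection by frame difference; the threshold and the toy "video" below are purely illustrative, and the generative summaries this project targets would go far beyond such heuristics:

```python
import numpy as np

def select_keyframes(frames, threshold=0.5):
    """Greedy keyframe selection as a crude video-shortening baseline:
    keep a frame when it differs enough from the last kept frame.

    frames: (N, H, W) array of grayscale frames with values in [0, 1].
    """
    keep = [0]                              # always keep the first frame
    for i in range(1, len(frames)):
        diff = np.abs(frames[i] - frames[keep[-1]]).mean()
        if diff > threshold:
            keep.append(i)
    return keep

# Toy 'video': 4 frames of 2x2 pixels with one abrupt scene change.
frames = np.stack([
    np.zeros((2, 2)), np.zeros((2, 2)) + 0.05,
    np.ones((2, 2)), np.ones((2, 2)) - 0.05,
])
keyframes = select_keyframes(frames, threshold=0.5)
```

Here only the first frame of each "scene" is kept; a learned summariser would instead reason about semantic content rather than raw pixel differences.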
Users are likely to seek out news items that are consistent with their own cognitive and political views and to ignore news items that contradict them. Current mainstream personalized recommendation algorithms have exacerbated this so-called 'filter bubble' phenomenon. Research has shown that filter bubbles in the news domain can have serious effects, such as diminishing public discourse and fostering highly polarised views amongst users. This proposed project aims to build a state-of-the-art deep content-based news recommender system that can mitigate these adverse effects. Based on this goal, we identify a number of potential research tasks to uncover accurate information behind news items and improve the performance of current news recommender systems. For example, we will explore how to generate unbiased news summarization, since training recommender systems on unbiased summaries may reduce their tendency to provide biased guidance; we will study multi-source news summarization to present users with comprehensive summaries of events and increase their awareness of the filter bubble effect; and we will build an end-to-end news recommendation model that takes advantage of cutting-edge deep learning and NLP technologies.
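As context for the recommendation component, a minimal content-based baseline ranks items by the similarity between item embeddings and a user profile embedding. The toy 3-dimensional "topic" vectors below are invented for illustration; the proposed system would instead learn such representations end-to-end from news text and user histories:

```python
import numpy as np

def recommend(user_vec, item_vecs, k=2):
    """Rank news items by cosine similarity to a user profile vector.

    A minimal content-based baseline; note how it inherently favours
    items close to the user's existing profile, which is exactly the
    filter-bubble tendency the project aims to mitigate.
    """
    user = user_vec / np.linalg.norm(user_vec)
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = items @ user                  # cosine similarity per item
    return np.argsort(-scores)[:k]         # indices of top-k items

# Toy 3-d 'topic' embeddings: [politics, sport, technology].
user = np.array([0.9, 0.1, 0.4])
news = np.array([
    [1.0, 0.0, 0.1],   # item 0: mostly politics
    [0.0, 1.0, 0.0],   # item 1: sport
    [0.2, 0.0, 1.0],   # item 2: technology
])
top = recommend(user, news, k=2)
```

The baseline never surfaces the sport item for this user; a debiased recommender would need an explicit mechanism for diversity or awareness.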
Post-hoc methods of explainability and interpretability of convolutional and recurrent neural networks
In the recent past, the exponential growth in computational power has led to the development and deployment of many Machine Learning models for tasks in Computer Vision and Natural Language Processing, among others. While AI techniques are revolutionizing a myriad of sectors of human life in a positive way, there is a fundamental problem which, if unaddressed, can be quite detrimental, with far-reaching consequences. This problem concerns the "black-box" nature of some of the sophisticated, state-of-the-art (SOTA) AI models, which in turn leads to skepticism, distrust, lack of confidence in, and reluctance over the acceptance of the predictions they generate. It is human psychology to reason about and validate decisions rather than accept something generated opaquely by a black box. Predictions coupled with explanations are easier to understand and accept, supporting decision-making. Thus, to improve the acceptance of such complex models by winning over users' trust, it is highly recommended to make models more transparent and interpretable to the end-user. This research focuses on addressing the sine qua non of AI, i.e., explainability and interpretability, to make it trustworthy and reliable across practical domains such as medicine, and to act as a catalyst for further progress and development in the field of AI. The research intends to provide explanations of the functioning of models learned with Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and hybrid CNN/RNN architectures. Various methods have been explored in the past to provide explainability in AI, such as visual representations, symbolic reasoning, causal inference, rule-based systems and fuzzy-inference systems, to name a few. This research focuses on post-hoc methods of explainability, wherein the model architecture is left unperturbed while the predictions of the model are explained using propositional rules.
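One simple instance of the post-hoc idea, sketched under strong simplifying assumptions, is to fit an interpretable surrogate to a black box's predictions. The toy example below extracts a single propositional threshold rule from a hypothetical one-dimensional black box; real rule-extraction methods for CNNs/RNNs are far richer, and the imperfect fidelity achieved here illustrates why surrogate quality must itself be measured:

```python
def one_rule_surrogate(X, blackbox, thresholds):
    """Extract a single propositional rule 'x >= t -> positive' that
    best mimics a black-box classifier on data X.

    A deliberately minimal stand-in for post-hoc rule extraction;
    the model architecture is never touched, only its predictions.
    """
    y = [blackbox(x) for x in X]            # query the black box
    best_t, best_acc = None, -1.0
    for t in thresholds:
        preds = [1 if x >= t else 0 for x in X]
        acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical black box: an opaque nonlinear decision function,
# positive only on the interval (1, 2).
bb = lambda x: 1 if (x * x - 3 * x + 2) < 0 else 0
X = [0.0, 0.5, 1.2, 1.5, 1.8, 2.5, 3.0]
t, fidelity = one_rule_surrogate(X, bb, thresholds=[0.5, 1.0, 1.5, 2.0])
```

The best single-threshold rule only reproduces 5 of 7 black-box decisions, showing concretely that an explanation's faithfulness to the model is a quantity to be evaluated, not assumed.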
Assessing the Condition of Irish Pavements (Road Surfaces) using Computer Vision & Machine Learning
The condition assessment of road surfaces (pavements) is a crucial task for ensuring their usability and providing maximum safety for the public. It also allows the government to allocate limited maintenance resources and consider long-term investment schemes. Pavement defects vary depending on the pavement surface. They include: cracking, caused by failure of the surface layer; surface deformation such as rutting, which results from weakness in one or more layers of the pavement; disintegration such as potholes, caused by the progressive breaking up of the pavement into small loose pieces; and surface defects such as ravelling, caused by construction errors such as insufficient adhesion between the asphalt and aggregate particles. Currently, road inspection is performed by manual visual inspection, where structural engineers or certified inspectors manually assess the road condition. However, manual visual inspection is time-consuming and cost-intensive. Over the last decade, numerous technologies such as machine learning and computer vision have been applied to the assessment of road conditions such as cracks, potholes, etc. An automated road crack/defect detection and classification system could become a valuable tool for improving the performance and accuracy of the inspection and assessment process. Such a system could be used to evaluate recorded images/videos to extract road condition data. It could also be integrated into existing road inspection tools to support the inspection process by providing real-time feedback and alerting the operator by highlighting road defects, thus avoiding possible misinterpretation or missed defects due to operator fatigue. The aim of this research is to develop a machine learning approach to support automated detection, classification and segmentation of pavement defects using road images/videos obtained from various image acquisition devices.
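For a sense of the task, a crude classical baseline for crack segmentation flags pixels that are much darker than the surrounding surface; the synthetic 5x5 "road patch" and the threshold rule below are illustrative only, and the learned models this research targets would replace such heuristics:

```python
import numpy as np

def crack_mask(gray, k=1.5):
    """Flag pixels much darker than the overall pavement surface as
    candidate crack pixels: a classical thresholding baseline.

    gray: 2-D array of grayscale intensities in [0, 255].
    k: how many standard deviations below the mean counts as 'dark'.
    """
    mean, std = gray.mean(), gray.std()
    return gray < (mean - k * std)

# Toy 5x5 'road patch': bright asphalt with a dark diagonal crack.
patch = np.full((5, 5), 200.0)
for i in range(5):
    patch[i, i] = 40.0          # dark crack pixels along the diagonal
mask = crack_mask(patch, k=1.0)
```

Such intensity heuristics fail under shadows, wet patches or surface texture changes, which is precisely why learned detection and segmentation models are needed.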
If we have two different explanations from the same machine learning algorithm, or from two different machine learning algorithms, which explanation is better?
Can we quantitatively compare the usefulness of explanations by linking them to specific tasks, so that we can assess that one explanation is better because it helps solve the task better? For example, can a given explanation help a human or a machine improve the accuracy or speed of labelling a given set of examples? How do we objectively compare explanations for given tasks, and what are good ways to compute and compare the usefulness of explanations?
This project focuses on building supervised machine learning models in the context of sequence and/or time series classification, and on developing methods for providing explanations and assessing their usefulness for different applications, for example in the sports science or smart agriculture domains. We will start from deep learning methods and post-hoc explanation methods that aim to explain black-box models, such as CAM (Class Activation Maps), and compare these to state-of-the-art linear models (which are intrinsically easier to explain) and their associated explanations. When used for time series classification, such techniques aim to highlight the parts of the time series signal that are useful to the classifier in reaching a classification decision, the so-called discriminative parts of the signal.
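For a time series classifier whose last convolutional layer feeds a global-average-pooling classifier, the Class Activation Map is simply a class-weighted sum of the feature maps over time. A minimal numpy sketch, with invented feature maps and weights:

```python
import numpy as np

def cam(feature_maps, class_weights):
    """Class Activation Map for a time series.

    feature_maps: (K, T) activations of the last conv layer.
    class_weights: (K,) weights linking each feature map to the
    predicted class via global average pooling.
    Returns a (T,) saliency curve over the time axis.
    """
    return class_weights @ feature_maps   # weighted sum over channels

# Toy example: 3 feature maps over 6 time steps; the (hypothetical)
# classifier weights emphasise the first two maps.
F = np.array([
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
])
w = np.array([1.0, 2.0, 0.1])
saliency = cam(F, w)
```

The peak of the saliency curve marks the discriminative part of the signal; whether that highlight actually helps a human or second-stage classifier is exactly what this project proposes to measure.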
Can we make use of these highlights/explanations to achieve higher classification accuracy with a second-stage classifier or a human? Can we improve the robustness of labelling, and do all of this faster, thus allowing us to quantify and compare the usefulness of different explanations?
Deep neural networks (DNNs) underpin state-of-the-art applications of artificial intelligence (AI) in almost all fields, such as image, speech and natural language processing. However, DNN architectures are often data-, compute-, space-, power- and energy-hungry, typically requiring powerful GPUs or large-scale clusters to train and deploy, and have consequently been viewed as a "non-green" technology. Furthermore, the best performing models are often ensembles of hundreds or thousands of base-level models. The space required to store these cumbersome models, and the time required to execute them at run-time, significantly prohibit their use in applications with limited memory, storage space or computational power, such as mobile devices or sensor networks, and in applications where real-time predictions are needed.
Knowledge distillation (a cutting-edge model compression method for deep neural networks) can transfer the knowledge from a teacher network (a cumbersome model) to a student network (a small model), making it a promising technique to disrupt the current situation in NLP, where almost all systems tend to use cumbersome DNN architectures. Knowledge distillation techniques have been successfully applied to the state-of-the-art speech synthesis model WaveNet, which generates realistic-sounding voices for the Google Assistant; the distilled production model is more than 1000 times faster than the original, with higher quality. However, for NLP tasks using cumbersome DNNs (e.g. neural machine translation), distilling knowledge is more challenging and differs from the speech task.
Therefore, our goal in this proposal is to develop a more efficient and effective knowledge distillation framework to build fast and compact DNN models for NLP tasks that can be deployed in resource-constrained environments without quality loss and with low latency. To achieve this goal, we have three specific questions to address:
(1) the architecture of the student model: it needs to be simple and small, suitable for parallel computing in both training and inference, and suitable for deployment in resource-constrained environments;
(2) the kind of knowledge that needs to be transferred or distilled: the original model memorises the whole dataset and learns different kinds of knowledge, so how can we design the objective function to transfer the required knowledge to the student model?
(3) the balance between model size and performance: we need to carefully design the architecture, the knowledge to be distilled and the objective function to strike a better balance between model size and system performance, based on the deployment and run-time requirements.
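As a concrete reference point for the objective-function question, the standard Hinton-style distillation loss (a well-known formulation, not the new framework this proposal will develop) mixes hard-label cross-entropy with a KL term between temperature-softened teacher and student distributions:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer targets."""
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, y, T=2.0, alpha=0.5):
    """Standard distillation objective: alpha weights the softened
    KL term (scaled by T^2) against hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    hard = -np.log(softmax(student_logits)[y])
    return alpha * (T ** 2) * kl + (1 - alpha) * hard

# Toy 3-class logits for one example (values are illustrative).
student = np.array([1.0, 0.5, -0.5])
teacher = np.array([2.0, 1.0, -1.0])
loss = distillation_loss(student, teacher, y=0)
```

The KL term vanishes when the student exactly matches the teacher, leaving only the hard-label term; designing *which* knowledge this objective should carry for NLP models is the open question above.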
Many scientific fields study data with an underlying structure that is a non-Euclidean space. Examples include social networks in computational social science, sensor networks in communications, functional networks in brain imaging and regulatory networks in genetics. For this reason, Graph Neural Networks (GNNs) have recently emerged as an interesting methodology for analysing graphs whilst leveraging the power of deep learning. Given their ability to fit many real-world datasets with an inherent graph structure, GNNs have found applications in many different domains, including:
Computer Vision Applications of GNNs in computer vision include scene graph generation, point cloud classification and segmentation, action recognition, etc.
Recognising semantic relationships between objects facilitates the understanding of the meaning of a visual scene. Scene graph generation models aim to parse an image into a semantic graph which consists of objects and their semantic relationships. Another application reverses the process by generating realistic images given scene graphs. This hints at the intriguing possibility of synthesising images given textual descriptions.
Traffic Accurately forecasting the traffic speed, volume or density of roads in traffic networks is fundamentally important in a smart transportation system. Recent work addresses the traffic prediction problem using spatio-temporal GNNs.
Recommender Systems Graph-based recommender systems consider items and users as nodes.
Chemistry In the field of chemistry, researchers apply GNNs to study the graph structure of molecules/compounds.
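Across these domains, the core operation is usually some form of graph convolution. A minimal numpy sketch of one GCN-style propagation step (in the style of Kipf and Welling's graph convolutional network), with a toy 3-node path graph and identity weights chosen for clarity:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).

    A: (N, N) adjacency matrix, H: (N, F) node features,
    W: (F, F') learnable weights.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalisation
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy graph: a 3-node path 0 - 1 - 2, with 2-d node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)                                  # identity weights for clarity
H_next = gcn_layer(A, H, W)
```

Each node's new representation mixes its own features with its neighbours'; stacking many such layers is exactly where the model-depth issue discussed below arises.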
Though GNNs have proven their power in learning graph data, challenges still exist due to the complexity of graphs. This PhD project aims to target some of the urgent challenges and issues facing the generalisation of GNNs:
Model Depth: the performance of a ConvGNN drops dramatically with an increase in the number of graph convolutional layers. This raises the question of whether going deep is still a good strategy for learning graph data.
Scalability Trade-off The scalability of GNNs is achieved at the price of corrupting graph completeness. To perform the pooling operation to coarsen graphs, some works use sampling while others use clustering; in both approaches, the model loses part of the graph information. By sampling, a node may miss its influential neighbours. By clustering, a graph may be deprived of a distinct structural pattern. How to trade off algorithm scalability against graph integrity could be a future research direction.
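The sampling strategy mentioned above can be sketched as GraphSAGE-style uniform neighbour sampling; the adjacency list and budget k below are illustrative, and the example makes concrete how an influential neighbour may simply be dropped:

```python
import random

def sample_neighbors(adj, node, k, seed=0):
    """Uniformly sample at most k neighbours of a node, trading graph
    completeness for scalability: any neighbour not drawn is invisible
    to this layer of message passing.

    adj: dict mapping node id -> list of neighbour ids.
    """
    rng = random.Random(seed)   # fixed seed for reproducibility
    nbrs = adj[node]
    if len(nbrs) <= k:
        return list(nbrs)
    return rng.sample(nbrs, k)

# Hypothetical hub node with 6 neighbours; keep only 3 per layer.
adj = {0: [1, 2, 3, 4, 5, 6]}
sampled = sample_neighbors(adj, 0, k=3)
```

Half of the hub's neighbours are discarded in this layer, which is precisely the information loss that the scalability/integrity trade-off question concerns.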
The aim of this PhD is to develop new algorithms that tackle these challenges and extend GNNs to new use cases. This will be demonstrated by showing performance improvements in distinct application domains.