The goal of this project is to allow higher-level control of a sound synthesis model. We will have a corpus of sounds. At a low level, each sound is represented as a time series of the sampled waveform. Each sound can be analysed to give mid-level parameters, e.g. a log-mel-spectrogram or another time-frequency description, and we can also collect labels such as roughness or happiness on a scale of 1-10 (call these “high-level parameters”).
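To make the “mid-level” representation concrete, the following is a minimal NumPy sketch of a log-mel-spectrogram computed from a sampled waveform. It assumes the HTK mel formula, a Hann window and illustrative settings (n_fft=1024, hop=256, 64 mel bands); none of these values come from the project, and libraries such as librosa differ in details such as filter normalisation.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; others exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, sr, n_fft=1024, hop=256, n_mels=64, eps=1e-10):
    """Frame the waveform, take power spectra, apply the mel filters, then take the log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + eps)

sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # one second of a 440 Hz test tone
S = log_mel_spectrogram(x, sr)
print(S.shape)                            # (59, 64)
```

Note that only the magnitude survives this analysis; the phase is discarded, which is exactly why resynthesis from such a representation requires phase reconstruction.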
We have a synthesis model that is controlled by “mid-level parameters”. Currently, many machine-learning-based synthesis techniques use log-mel-spectrograms to represent audio. This is a perceptually informed time-frequency representation, but for synthesis it has the drawback that the phase must be reconstructed. This can be done using existing signal processing methods (alternatively, WaveNet can be used as a ‘neural vocoder’ to convert the spectrogram into a time-domain waveform). We will explore other representations of sound that may be more suitable for flexible single-source sound synthesis. Our synthesis model will generate phase deterministically.
We hypothesise that the synthesis model will give better-sounding results than spectrogram inversion, particularly on transients.
Our goal is to control the synthesis model via high-level parameters: the user will directly control the high-level parameters, the UI will map them to mid-level parameters, and the synthesis model will then output audio.
The UI can be driven by a neural network whose input layer takes the high-level parameters and whose output layer produces the mid-level parameters. Initially it will be an “instantaneous” model, i.e. for each window it will take in the current values of the high-level parameters and output the mid-level parameters for the same window. Later steps in the research could make it non-instantaneous but still causal, for example using a convolutional network that takes in multiple recent time steps, or a recurrent neural network such as an LSTM.
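The two model shapes described above can be sketched in NumPy as follows. This is a forward pass only, with random untrained weights; the layer sizes, the number of high-level parameters (N_HIGH), the number of mid-level parameters (N_MID), the context length K and the 1-10 rescaling are all hypothetical choices for illustration, not part of the proposal.

```python
import numpy as np

rng = np.random.default_rng(0)

N_HIGH = 4    # hypothetical number of high-level parameters (e.g. roughness, happiness, ...)
N_MID = 64    # hypothetical number of mid-level parameters per window (e.g. 64 mel bands)
HIDDEN = 32
K = 3         # hypothetical number of recent windows seen by the causal variant

def init_layer(n_in, n_out):
    """Random (untrained) weights; in practice these would be learned from the corpus."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

def scale(high):
    """Map ratings on the 1-10 scale to roughly [-1, 1]."""
    return (np.asarray(high, dtype=float) - 5.5) / 4.5

W1, b1 = init_layer(N_HIGH, HIDDEN)
W2, b2 = init_layer(HIDDEN, N_MID)

def instantaneous(high):
    """One window in, one window out: shape (N_HIGH,) -> (N_MID,)."""
    h = np.tanh(scale(high) @ W1 + b1)
    return h @ W2 + b2

C1, c1 = init_layer(K * N_HIGH, HIDDEN)
C2, c2 = init_layer(HIDDEN, N_MID)

def causal(high_seq):
    """Causal variant: the output for window t depends only on windows t-K+1 .. t.
    high_seq has shape (T, N_HIGH); windows before the start are zero-padded
    (zero corresponds to a mid-scale rating of 5.5 after scaling)."""
    x = scale(high_seq)
    T = x.shape[0]
    padded = np.vstack([np.zeros((K - 1, N_HIGH)), x])
    out = np.empty((T, N_MID))
    for t in range(T):
        context = padded[t : t + K].reshape(-1)   # flatten the K most recent windows
        out[t] = np.tanh(context @ C1 + c1) @ C2 + c2
    return out

frame = instantaneous([7, 2, 5, 9])          # one window of hypothetical ratings
track = causal(np.full((100, N_HIGH), 5.5))  # 100 windows of a constant setting
print(frame.shape, track.shape)              # (64,) (100, 64)
```

The causal variant is equivalent to a single temporal convolution of width K with no look-ahead, so changing a future window can never alter an earlier output; an LSTM would replace the fixed window with a learned recurrent state.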
Variational autoencoders (VAEs), generative adversarial networks (GANs) and their hybrids are probably the state of the art for unsupervised learning. Google’s Magenta project has used these models to create new audio synthesiser spaces. Google’s WaveNet is an autoregressive convolutional generative model for raw audio, and Google’s NSynth uses a WaveNet-style autoencoder whose latent space the user can control; in NSynth, the control dimensions are defined by interpolating between multiple real instruments. The proposed research is different because it chooses fixed high-level features as the control variables.
Using Knowledge Graphs to Improve Data Quality in Machine Learning
Knowledge graphs have a wide variety of applications such as question-answering, digital assistants, structured reasoning and exploratory reasoning. One aspect of knowledge graphs is that they can be validated by a rule-based system, such as a knowledge base.
Existing datasets may be semantically incoherent. Creating large-scale knowledge graphs from existing datasets would add a knowledge representation to the semantically incoherent data, which in turn would help improve the accuracy of machine learning models in prediction tasks. Knowledge graphs are by definition relationship-rich because they allow any-to-any relationships. A semantic graph integration approach is well suited to large-scale and web-scale data integration and is complementary to machine learning. It allows the creation of better, more fully disambiguated training sets, which improves the quality of the data used for learning.
The proposed model transforms a dataset into a knowledge graph and uses a knowledge base to evaluate data quality, as an additional layer in prediction with machine learning models. Such models have the potential to simplify the task, use reasoning to enhance the dataset and identify quality issues in the underlying dataset. Especially in deep domains with very complex rules and complex interactions between rules, there is no substitute for this kind of explicit rule-based reasoning. Such scenarios typically arise when disparate domains need to be integrated.
To summarize, the project combines the research fields of the Semantic Web and Machine Learning. The goal is to design an ontology that validates datasets which have first been translated into knowledge graphs. This would involve work on description logic (DL) and propositional logic (PL), and building a knowledge base in Prolog. The workflow could go along the lines of: 1. translating a dataset into a knowledge graph that represents the dataset's characteristics and tells us about its quality; 2. feeding the dataset into the knowledge base (the main part of the project) to check it against DL/PL rules; 3. summarizing the results and making suggestions or automatic corrections to the dataset. Part of the work would be to investigate a SHACL-style restriction language to define the rules.
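The shape of this three-step workflow can be sketched in plain Python. This is only an illustration under stated assumptions: the project proposes a Prolog knowledge base with DL/PL rules, whereas here the dataset, the `ex:` prefix and the `PropertyShape` class are hypothetical, and the constraints are a heavily simplified, SHACL-inspired subset rather than real SHACL.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy dataset; in the project this would be an existing, possibly messy dataset.
rows = [
    {"id": "p1", "name": "Alice", "age": 34, "country": "IE"},
    {"id": "p2", "name": "Bob", "age": -5, "country": "IE"},
    {"id": "p3", "age": 51, "country": "Ireland"},
]

def to_triples(rows, key="id"):
    """Step 1: translate each row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"ex:{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, f"ex:{column}", value))
    return triples

@dataclass
class PropertyShape:
    """A simplified, SHACL-inspired constraint on one predicate (not real SHACL)."""
    path: str
    min_count: int = 0
    datatype: Optional[type] = None
    min_inclusive: Optional[float] = None
    allowed: Optional[set] = None

def validate(triples, subjects, shapes):
    """Step 2: check every subject against every shape and collect violations."""
    violations = []
    for s in subjects:
        for shape in shapes:
            values = [o for (subj, p, o) in triples if subj == s and p == shape.path]
            if len(values) < shape.min_count:
                violations.append(f"{s}: missing {shape.path}")
            for v in values:
                if shape.datatype is not None and not isinstance(v, shape.datatype):
                    violations.append(f"{s}: {shape.path}={v!r} has the wrong datatype")
                elif shape.min_inclusive is not None and v < shape.min_inclusive:
                    violations.append(f"{s}: {shape.path}={v!r} is below {shape.min_inclusive}")
                if shape.allowed is not None and v not in shape.allowed:
                    violations.append(f"{s}: {shape.path}={v!r} is not an allowed value")
    return violations

shapes = [
    PropertyShape("ex:name", min_count=1, datatype=str),
    PropertyShape("ex:age", min_count=1, datatype=int, min_inclusive=0),
    PropertyShape("ex:country", allowed={"IE", "GB", "FR"}),
]
triples = to_triples(rows)
subjects = [f"ex:{r['id']}" for r in rows]
violations = validate(triples, subjects, shapes)

# Step 3: summarise the results (automatic corrections are left out of this sketch).
print(f"{len(violations)} violation(s) in {len(subjects)} records")
for message in violations:
    print(" -", message)
```

Keeping the rules as data (the list of shapes) rather than code mirrors the idea of a separate restriction language: the same validator can be reused while the rules are edited independently.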
Knowledge-enhanced text mining for short documents
With the vast number of short text documents available online (tweets, forum messages, social network posts in general, etc.), short text mining has become an important part of natural language processing. However, unlike longer texts, short documents often lack contextual information, are often grammatically incorrect and may contain abbreviations. As a result, traditional text mining approaches do not work well on short documents.
The aim of this project is to investigate augmenting traditional text mining tasks with semantic information by utilising linguistic resources such as WordNet, ConceptNet, knowledge graphs, distributed representations, Wikipedia (for identifying related concepts and therefore additional features), or other suitable ontologies/repositories.
One approach would be to investigate the semantic analysis of different parts of speech (nouns, adjectives, verbs) in local context (semantic similarity between pairs of words) and global context (lexical chains) with WordNet, and the effect on performance of augmenting some versus all parts of speech, using different linguistic resources. Similarity measures can be based on path length in the taxonomy and on the contents of synonym or hyponym sets. Another area we will look at is the use of graph-based approaches, such as the PageRank algorithm, for mapping context between short texts and linguistic resources such as Wikipedia.
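A path-based similarity measure of the kind mentioned above can be sketched on a tiny taxonomy. The hypernym links below are hypothetical toy data, not taken from WordNet; the score 1 / (1 + shortest path length) follows the definition of WordNet's path similarity (NLTK exposes it as `Synset.path_similarity`).

```python
from collections import deque

# Hypothetical toy hypernym taxonomy (child -> parent); WordNet itself is far larger
# and a word can have several senses, each with its own hypernym paths.
HYPERNYM = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore", "carnivore": "mammal",
    "whale": "mammal", "mammal": "animal",
}

def build_adjacency(hypernyms):
    """Treat hypernym/hyponym links as undirected edges."""
    adj = {}
    for child, parent in hypernyms.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj

ADJ = build_adjacency(HYPERNYM)

def shortest_path_length(a, b):
    """Breadth-first search; returns None if the two concepts are not connected."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in ADJ.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_similarity(a, b):
    """Path-based similarity 1 / (1 + shortest path length)."""
    d = shortest_path_length(a, b)
    return None if d is None else 1.0 / (d + 1)

for pair in [("dog", "wolf"), ("dog", "cat"), ("dog", "whale")]:
    print(pair, path_similarity(*pair))
```

In this toy taxonomy "dog" and "wolf" share a direct parent and score 1/3, while "dog" and "cat" are four links apart and score 0.2. For short texts, such pairwise scores could be used to link an unseen word to related concepts that do appear in the training data.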
Moreover, we will look at techniques for representing text features such as co-occurrences or keywords, collocations, predicate-argument relations (verb-object, subject-verb), heads of noun phrases and word phrases for augmenting the text mining tasks. Another suggested approach would be to build a distributional semantic model with lexical resources. Under this approach we will explore how words and contexts are identified and weighted, and dimensionality-reduction techniques (LSI, LSA, and PCA).
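The weighting and space-reduction steps can be illustrated with a small LSA sketch in NumPy: build a term-document matrix, weight it (log term frequency times inverse document frequency is assumed here as one common choice), and keep the top k singular directions. The four "documents" and k=2 are hypothetical; real short-text corpora would need tokenisation, stop-word handling and a much larger vocabulary.

```python
import numpy as np

# Hypothetical mini-corpus of short texts.
docs = [
    "cheap flights to dublin",
    "dublin flights deal",
    "new phone battery life",
    "phone battery drains fast",
]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix X: rows = terms, columns = documents.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1

# Weighting: log-scaled term frequency times inverse document frequency.
df = (X > 0).sum(axis=1)
idf = np.log(len(docs) / df)
W = np.log1p(X) * idf[:, None]

# Space reduction: truncated SVD keeps the k largest singular values (LSA/LSI).
k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # each row is a document in the k-dim latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine(doc_vecs[0], doc_vecs[1]), 3))   # two travel texts
print(round(cosine(doc_vecs[0], doc_vecs[2]), 3))   # travel vs phone text
```

PCA is closely related: it applies the same SVD to a mean-centred matrix, so the choice between them mainly affects whether global term-frequency offsets are retained.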
There are a variety of possible application areas for the outcome of this research, for example fake news detection, business intelligence and the enhancement of recommender systems (content-based filtering). A more specific application would be social media data analysis: our research approach could be applied to social media posts to analyze large volumes of unstructured data.
Monitoring Human Engagement in a Situated Dialogue
Many AI dialogue assistants, such as Siri, Alexa or Xiaodu smart speakers, have become very popular. However, despite their initial popularity, many users rely on them only for simple tasks such as setting timers or playing particular songs. Achieving true conversational interaction will require considerably more research.
One metric that will be very important to maximise in order to achieve true conversational interaction is engagement. In the context of dialogue systems, engagement can be thought of as an estimate of how much the user is interacting with a dialogue system and, importantly, a measure of whether they are enjoying the interaction and are likely to continue it in the future. Without strong engagement, the system and its users cannot maintain a long-term relationship. To enhance engagement, we must monitor the physical and audio signs of engagement, fine-tune the language produced by the system to maximise user engagement, and adjust the dialogue strategy as appropriate.
In this PhD project, we will focus on the issue of engagement in dialogue and model how it can be monitored and how dialogue policies can be adjusted to account for situated engagement. For modelling, we will look at a combination of visual and audio/content monitoring to estimate engagement levels. The visual aspects might include, for example, facial thermal images (using NoIR camera filters), facial expression, body temperature and blood pressure. The audio and content elements will focus on features such as the amount of pausing in speech and the sentiment of the content. In the second part of this work, we will look at how the dialogue production policy can be adjusted in task-oriented dialogues to maximise engagement and to fine-tune it to individual users. One potential model that we will investigate for this purpose is Hierarchical Deep Reinforcement Learning. Deep Reinforcement Learning has generally been found to be very useful for planning dialogue strategy, and a hierarchical variant has the potential to separate the different levels of language production, so that the policies can cover more potential variations without becoming too computationally costly.
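As one example of a low-level audio feature that an engagement model could consume, the amount of pausing can be approximated with a simple energy threshold. This is a hedged sketch, not an engagement estimator: the frame length, hop and the -40 dB threshold relative to the loudest frame are hypothetical settings, and real recordings would need a proper voice-activity detector that is robust to background noise.

```python
import numpy as np

def pause_ratio(x, sr, frame_ms=25.0, hop_ms=10.0, threshold_db=-40.0):
    """Fraction of frames whose RMS energy is more than |threshold_db| dB
    below the loudest frame (a crude silence/pause detector)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(x) - frame) // hop
    rms = np.array([np.sqrt(np.mean(x[i * hop : i * hop + frame] ** 2))
                    for i in range(n_frames)])
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)   # level relative to loudest frame
    return float(np.mean(db < threshold_db))

sr = 16000
t = np.arange(sr) / sr
voiced = 0.5 * np.sin(2 * np.pi * 200.0 * t)          # 1 s stand-in for speech
signal = np.concatenate([voiced, np.zeros(sr)])       # followed by 1 s of silence
print(round(pause_ratio(signal, sr), 3))
```

Because the threshold is relative to the loudest frame, the measure is insensitive to overall recording gain; a per-window pause ratio like this could sit alongside sentiment scores and visual features as one input to the engagement estimate.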
Empirical studies and computational modelling will be balanced throughout the PhD project. Data collection and model validation will be carried out through empirical studies. The computational modelling will develop Hierarchical Reinforcement Learning models, building on a study of current state-of-the-art methods for end-to-end neural dialogue processing.
Post-hoc methods of explainability and interpretability of convolutional and recurrent neural networks
In the recent past, exponential growth in computational power has led to the development and deployment of many Machine Learning models for tasks in Computer Vision and Natural Language Processing, among others. While AI techniques are revolutionizing a myriad of sectors of human life in a positive way, there is a fundamental problem which, if unaddressed, can be quite detrimental, with far-reaching consequences. This problem concerns the “black-box” nature of some sophisticated State-Of-The-Art (SOTA) AI models, which in turn leads to skepticism, distrust, lack of confidence and reluctance to accept the predictions they generate. People naturally want to reason about and validate decisions rather than accept an output produced by a black box. Predictions coupled with explanations are easier to understand and more readily accepted, supporting decision-making. Thus, to improve the acceptance of such complex models by winning users' trust, it is highly recommended to make models more transparent and interpretable to the end user. This research focuses on addressing the sine qua non of AI, i.e. explainability and interpretability, to make it trustworthy and reliable across practical domains such as medicine, and to act as a catalyst for further progress and development in the field of AI. This research intends to provide explanations of the functioning of models learned with CNNs (Convolutional Neural Networks), RNNs (Recurrent Neural Networks) and hybrid CNN/RNN networks. Various methods have been explored in the past to provide explainability in AI, such as visual representation, symbolic reasoning, causal inference, rule-based systems and fuzzy-inference systems, to name a few. This research focuses on post-hoc explainability methods, in which the model architecture is left unperturbed while the model's predictions are explained using propositional rules.
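To illustrate what a post-hoc propositional explanation can look like, the sketch below treats the model purely as a function to be queried and searches for the conjunction of threshold tests whose predictions agree most often with it (its fidelity). Everything here is an assumption for illustration: `black_box` is a hypothetical stand-in for a trained CNN/RNN, the inputs are low-dimensional tabular probes rather than images or sequences, and exhaustive search over quantile thresholds is just one simple rule-extraction technique, not the method this research commits to.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    """Hypothetical stand-in for a trained CNN/RNN: the extractor only queries it."""
    return ((X[:, 0] > 0.6) & (X[:, 2] < 0.3)).astype(int)

def candidate_conditions(X, n_thresholds=19):
    """Threshold tests 'x_j > t' and 'x_j <= t' at evenly spaced quantiles of each feature."""
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_thresholds)):
            yield (j, ">", float(t))
            yield (j, "<=", float(t))

def holds(X, cond):
    j, op, t = cond
    return X[:, j] > t if op == ">" else X[:, j] <= t

def extract_rule(X, y, max_conditions=2):
    """Search all conjunctions of up to max_conditions tests and return the rule
    'IF c1 AND c2 ... THEN class 1 ELSE class 0' with the highest fidelity."""
    conds = list(candidate_conditions(X))
    masks = [holds(X, c) for c in conds]
    target = y.astype(bool)
    best_rule, best_fid = (), -1.0
    for size in range(1, max_conditions + 1):
        for idx in combinations(range(len(conds)), size):
            pred = np.logical_and.reduce([masks[i] for i in idx])
            fid = float(np.mean(pred == target))
            if fid > best_fid:
                best_rule, best_fid = tuple(conds[i] for i in idx), fid
    return best_rule, best_fid

def format_rule(rule):
    body = " AND ".join(f"x{j} {op} {t:.2f}" for j, op, t in rule)
    return f"IF {body} THEN class 1 ELSE class 0"

X = rng.uniform(size=(2000, 4))   # probe inputs
y = black_box(X)                  # the model is queried, never modified
rule, fidelity = extract_rule(X, y)
print(format_rule(rule), f"(fidelity {fidelity:.3f})")
```

Exhaustive search is only feasible for a handful of features and short rules; for real CNN/RNN inputs the tests would more likely be defined over learned or human-interpretable features, and greedy or tree-based surrogates would replace brute force.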
Assessing the Condition of Irish Pavements (Road Surfaces) using Computer Vision & Machine Learning
Assessing the condition of road surfaces (pavements) is crucial to ensure their usability and provide maximum safety for the public. It also allows the government to allocate limited maintenance resources and plan long-term investment schemes. Pavement defects vary depending on the pavement surface. They include cracking caused by failure of the surface layer; surface deformation, such as rutting, that results from weakness in one or more layers of the pavement; disintegration, such as potholes, caused by the progressive breaking up of the pavement into small loose pieces; and surface defects, such as ravelling, caused by errors during construction, e.g. insufficient adhesion between the asphalt and the aggregate particles. Currently, road inspection is performed by manual visual inspection, in which structural engineers or certified inspectors assess the road condition. However, manual visual inspection is time-consuming and costly. Over the last decade, numerous technologies such as machine learning and computer vision have been applied to assess road defects such as cracks and potholes. An automated crack/defect detection and classification system could become a valuable tool for improving the performance and accuracy of the inspection and assessment process. Such a system could be used to evaluate recorded images/videos to extract road condition data. It could also be integrated into existing road inspection tools to support the inspection process by providing real-time feedback and alerting the operator by highlighting road defects, thus avoiding possible misinterpretation or missed defects due to operator fatigue. The aim of this research is to develop a machine learning approach to support automated detection, classification and segmentation of pavement defects using road images/videos obtained from various image acquisition devices.