Using Generative Forms of Media to Summarise Video
What is the area of the project?
Deep Learning has made enormous progress in its performance and capability for understanding multimedia content in the past few years. Sitting at the intersection of Computer Vision and Natural Language Processing, video as a form of multimedia poses many interesting research challenges and opportunities.
Why is it important?
An enormous amount of video content is generated daily, and processing and making sense of these information streams can provide tremendous commercial value. From a theoretical point of view, being able to process and understand video content, as opposed to images or text alone, is a step toward more advanced and general AI.
What are the vectors of attack?
The combination of audio and visual information makes video a great testbed for multi-modal Deep Learning. Highly structured and sequential in nature, it also represents fertile ground for self-supervised and unsupervised learning methods. In addition, truly generative video content remains under-explored compared to generative content in the form of text and images.
What is the expected outcome?
Major challenges in video understanding can be roughly divided into a few sub-tasks: video structuring, video description, video shortening, video rating/ranking, and video generation.
This project will focus on advancing the state of the art in these video-related tasks and on exploring how generative forms of multimedia content can be created as summaries of video.