Deep Learning from Crawled Spatio-Temporal Representations of Video (DECSTER)


1 July 2018

Applying deep learning to video without using pixel representations, by considering spatio-temporal activity information that is directly extractable from compressed video bitstreams or neuromorphic vision sensing (NVS) hardware.

Funder EPSRC
Amount £840,401

Project website gow.epsrc.ukri.org

Research topics Deep Learning | Video Delivery | Activity Recognition | Scene Recognition | Object Recognition

Description

Video has been one of the most pervasive forms of online media for some time, and several forecasts indicate that video traffic will dominate IP networks within the next five years. Yet video remains one of the least manageable elements of the big data ecosystem. This project argues that this difficulty stems primarily from the fact that all advanced computer vision and machine learning algorithms treat video as a stream of frames of picture elements (pixels). This is despite the fact that pixel-domain representations are notoriously difficult to manage in machine learning systems, mainly due to their high volume, the high redundancy between successive frames, and artifacts stemming from camera calibration under varying illumination. We propose to abandon pixel representations and instead consider spatio-temporal activity information that is directly extractable from compressed video bitstreams or neuromorphic vision sensing (NVS) hardware.
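To make "spatio-temporal activity information" from NVS hardware more concrete: such sensors typically emit asynchronous events (pixel position, timestamp, polarity) instead of frames. The sketch below bins an event stream into a per-polarity count tensor, one common way of presenting event data to a DNN. It is a minimal illustration under assumptions, not the project's method: the `Event` layout, the microsecond time unit and the `events_to_tensor` helper are all hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """One NVS event (hypothetical layout): pixel position, time, polarity."""
    x: int
    y: int
    t: int  # timestamp, assumed to be an integer number of microseconds
    p: int  # polarity: +1 for a brightness increase, -1 for a decrease


def events_to_tensor(events: Iterable[Event], width: int, height: int,
                     t_start: int, t_end: int,
                     num_bins: int) -> List[List[List[List[int]]]]:
    """Accumulate events into a [num_bins][2][height][width] count tensor.

    Channel 0 counts positive-polarity events, channel 1 negative ones.
    Events outside [t_start, t_end) or outside the sensor are ignored.
    """
    if t_end <= t_start or num_bins < 1:
        raise ValueError("need t_end > t_start and num_bins >= 1")
    tensor = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(num_bins)]
    duration = t_end - t_start
    for ev in events:
        if not (t_start <= ev.t < t_end):
            continue  # outside the time window
        if not (0 <= ev.x < width and 0 <= ev.y < height):
            continue  # outside the sensor area
        # Map the timestamp to a temporal bin in [0, num_bins - 1].
        b = (ev.t - t_start) * num_bins // duration
        channel = 0 if ev.p > 0 else 1
        tensor[b][channel][ev.y][ev.x] += 1
    return tensor
```

Only pixels that actually registered a change contribute counts, so static background produces no input at all, which is one reason event data avoids much of the frame-to-frame redundancy of pixel video.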

The first key outcome of the project will be the design of deep neural networks (DNNs) that ingest such activity information in order to achieve state-of-the-art classification, action recognition and retrieval results on large video datasets. These results will be achieved at record-breaking speed and with accuracy comparable to that of the best DNN designs that use pixel-domain video representations and/or optical flow calculations.
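On the compressed-video side, one readily available form of activity information is the block motion vectors that codecs such as H.264/AVC already carry in the bitstream. The sketch below assumes that a bitstream parser (not shown) has produced one (block column, block row, dx, dy) tuple per inter-coded block; it rasterises each frame's vectors into a two-channel grid and stacks several frames into a single input a DNN could ingest in place of decoded pixels. The tuple layout and the helper names are hypothetical, and the project's actual input representation may differ.

```python
from typing import Iterable, List, Tuple

# One motion vector per block: (block_col, block_row, dx, dy), dx/dy in pixels.
# This layout is an assumption; a real bitstream parser would supply it.
MotionVector = Tuple[int, int, float, float]


def mv_frame_to_grid(mvs: Iterable[MotionVector], cols: int,
                     rows: int) -> List[List[List[float]]]:
    """Scatter one frame's block motion vectors into a [2][rows][cols] grid.

    Channel 0 holds horizontal displacement, channel 1 vertical displacement.
    Blocks without a vector (e.g. intra-coded blocks) stay at 0.0, and
    vectors whose block index falls outside the grid are ignored.
    """
    grid = [[[0.0] * cols for _ in range(rows)] for _ in range(2)]
    for col, row, dx, dy in mvs:
        if 0 <= col < cols and 0 <= row < rows:
            grid[0][row][col] = float(dx)
            grid[1][row][col] = float(dy)
    return grid


def stack_frames(frames: Iterable[Iterable[MotionVector]], cols: int,
                 rows: int) -> List[List[List[float]]]:
    """Stack T frames of motion vectors into a [2*T][rows][cols] input."""
    channels: List[List[List[float]]] = []
    for mvs in frames:
        channels.extend(mv_frame_to_grid(mvs, cols, rows))
    return channels
```

Because the grid has one cell per block rather than one per pixel, a 16×16 macroblock layout yields 256 times fewer spatial positions than the decoded frame, and no decoding to pixels is needed to obtain it.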

The second key outcome will be the design and prototyping of a crawler-based bitstream parsing and analysis service, in which part of the parsing and processing is carried out by a bitstream crawler running on a remote repository, while the back-end processing is carried out by high-performance servers in the cloud.

For the first time, this will enable the continuous parsing of large compressed video content libraries and NVS repositories with new and improved versions of crawlers, in order to derive continuously improved semantics or to track changes and new content elements, much as search engine bots continuously crawl web content. These outcomes will pave the way for exabyte-scale video datasets to be newly discovered and analysed on commodity hardware.
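The search-engine analogy implies some bookkeeping: a crawler must decide which items in a repository are new, which have changed, and which were last parsed by an older crawler version. The sketch below shows one simple way to make that decision; it is an illustration under stated assumptions, not the project's design. The content hash, the integer crawler version and the `CrawlIndex`/`plan_crawl` names are hypothetical, and the parsing itself and the hand-off to cloud back-end servers are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CrawlIndex:
    """Hypothetical index: item id -> (content hash, crawler version used)."""
    seen: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def needs_crawl(self, item_id: str, content_hash: str,
                    crawler_version: int) -> bool:
        """True for new items, changed items, or items parsed by an older crawler."""
        prev = self.seen.get(item_id)
        if prev is None:
            return True  # never crawled before
        prev_hash, prev_version = prev
        return prev_hash != content_hash or prev_version < crawler_version

    def record(self, item_id: str, content_hash: str,
               crawler_version: int) -> None:
        """Remember that this item was parsed by this crawler version."""
        self.seen[item_id] = (content_hash, crawler_version)


def plan_crawl(index: CrawlIndex, repository: Dict[str, str],
               crawler_version: int) -> List[str]:
    """Return, sorted, the ids in `repository` (id -> content hash) to re-parse."""
    return sorted(item_id for item_id, content_hash in repository.items()
                  if index.needs_crawl(item_id, content_hash, crawler_version))
```

Re-parsing whenever the crawler version increases is what lets an improved crawler refresh the semantics of content it has already seen, while the hash comparison picks up changed and newly added content.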


Team

Lead Institution
UCL

Principal Investigator (PI)
Yiannis Andreopoulos

UCL members
Mohammad Ashraful Anam,
Aaron Chadha,
Andrea Piccione

Academic Collaborators
Queen Mary University of London

Industry Collaborators
Focal International Limited,
iniVation,
Soundmouse Ltd,
Yamaha Motor Co. Ltd