
Institute of Communications and Connected Systems

Optics for Distributed Learning

[Image: split-screen graphic of a neural processing device and fibre optics]

1 October 2020



Developing large-scale distributed learning computing systems using new topologies, partitioning and scheduling strategies over optical networks.
 


Funder: Microsoft
Amount: £73,000

 

Research topics: Optics for distributed learning | Reconfigurable topologies | Neural network computational graphs | Distributed learning partitioning algorithm | Gradient reduce strategies


Description

Data centres have historically been based on a server-centric approach, with fixed amounts of processor and directly attached memory resources within the boundary of a mainboard tray. The mismatch between these fixed proportions and a diverse set of workloads can lead to substantially under-utilized resources (in some cases even below 40%), and these resources account for 85% of the total data centre cost.
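The effect of fixed proportions can be illustrated with a toy calculation. The sketch below is not part of the project's methodology: the server shape (32 cores, 128 GB), the six jobs and the first-fit placement rule are all hypothetical values chosen for illustration, and the disaggregated case ignores the network penalties that the project sets out to quantify. It compares utilisation when jobs must fit inside fixed-ratio servers with utilisation when processors and memory are drawn from separate pools.

```python
import math
from dataclasses import dataclass

# Hypothetical server shape, chosen only for illustration.
SERVER_CPUS = 32
SERVER_MEM_GB = 128


@dataclass(frozen=True)
class Job:
    cpus: int
    mem_gb: int


def servers_needed_first_fit(jobs):
    """Place each job on the first fixed-ratio server with room for it."""
    free = []  # one [free_cpus, free_mem_gb] entry per server in use
    for job in jobs:
        if job.cpus > SERVER_CPUS or job.mem_gb > SERVER_MEM_GB:
            raise ValueError("job does not fit on a single server")
        for slot in free:
            if slot[0] >= job.cpus and slot[1] >= job.mem_gb:
                slot[0] -= job.cpus
                slot[1] -= job.mem_gb
                break
        else:
            # No existing server has room: bring a new one into use.
            free.append([SERVER_CPUS - job.cpus, SERVER_MEM_GB - job.mem_gb])
    return len(free)


def server_centric_utilisation(jobs):
    """Return (servers, cpu_utilisation, mem_utilisation) for fixed servers."""
    if not jobs:
        raise ValueError("need at least one job")
    n = servers_needed_first_fit(jobs)
    cpu = sum(j.cpus for j in jobs) / (n * SERVER_CPUS)
    mem = sum(j.mem_gb for j in jobs) / (n * SERVER_MEM_GB)
    return n, cpu, mem


def disaggregated_utilisation(jobs):
    """Return (cpu_units, mem_units, cpu_utilisation, mem_utilisation).

    CPU and memory are provisioned independently, in units the same
    size as one server's share; network penalties are ignored.
    """
    if not jobs:
        raise ValueError("need at least one job")
    cpu_demand = sum(j.cpus for j in jobs)
    mem_demand = sum(j.mem_gb for j in jobs)
    cpu_units = math.ceil(cpu_demand / SERVER_CPUS)
    mem_units = math.ceil(mem_demand / SERVER_MEM_GB)
    return (
        cpu_units,
        mem_units,
        cpu_demand / (cpu_units * SERVER_CPUS),
        mem_demand / (mem_units * SERVER_MEM_GB),
    )


# Hypothetical mix: three compute-heavy jobs and three memory-heavy jobs.
JOBS = [Job(32, 16)] * 3 + [Job(4, 128)] * 3

if __name__ == "__main__":
    print(server_centric_utilisation(JOBS))
    print(disaggregated_utilisation(JOBS))
```

With this particular job mix, the compute-heavy jobs exhaust a server's cores while leaving most of its memory idle, and the memory-heavy jobs do the reverse, so the fixed-ratio case needs more hardware for the same demand. Real gains would depend on the workload mix and on the interconnect overheads that this toy model leaves out.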

The project aims to explore resource (xPU, memory, storage) disaggregation at rack and cluster level and to identify the scalability limits in terms of the number of end-points, network capacity per CPU/memory and physical distance, along with the associated penalties to processing power, sustained memory bandwidth and similar metrics. In particular, the project will focus on optical network technologies, topologies, strategies and control to support large-scale distributed learning using heterogeneous resources (xPUs), in order to minimize the training time of diverse distributed learning models.

Outputs

View Principal Investigator's Publications