Summary
Foundation models across many domains are growing rapidly, and their performance improves with continued scaling. However, training these Large Language Models (LLMs) demands not only significant compute resources but also a robust and dependable system to keep the training process effective end to end.
Algorithm engineers face numerous challenges when training real-world LLMs, including server crashes, hardware failures, software incompatibilities, network communication errors, and unexplained hangs. Such failures destroy in-progress training output and force repeated restarts, wasting time and resources. For instance, merely launching the training process for a 175B-parameter model in a distributed environment can take several hours; repeated across many restarts, this overhead occupies a substantial fraction of the total training time, a cost many research groups find financially burdensome.
Therefore, establishing a robust and dependable platform that supports the entire lifecycle of LLM development is both technically challenging and urgently needed.
This project aims to explore and develop a resilient deep learning framework, together with its scientific foundations, to improve the LLM development lifecycle, with a specific focus on failover. The system is designed to tolerate the crash or failure of any worker without halting overall execution. An automatic failover process, transparent to upper-level users, efficiently restarts failed workers and re-initializes them from soft or hard state. Given the novelty of this research, students are encouraged and supported to publish ground-breaking papers at top-tier conferences and even to explore technical patents for potential startups.
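To make the failover idea concrete, the following is a minimal sketch of a supervisor loop that runs training steps, periodically checkpoints durable "hard" state, and on a worker failure rolls back to the last checkpoint while re-initializing "soft" state before resuming. All names here (`run_with_failover`, `WorkerFailure`, the state layout) are illustrative assumptions for this sketch, not the project's actual API.

```python
class WorkerFailure(Exception):
    """Simulated crash of a training worker (hypothetical for this sketch)."""


def run_with_failover(train_step, total_steps, checkpoint_every=5, max_restarts=10):
    """Run train_step until total_steps, surviving worker failures.

    Hard state (step counter, weights) is snapshotted every
    `checkpoint_every` steps; on failure we roll back to the snapshot.
    Soft state (RNG, comm groups, caches) would be rebuilt on restart.
    """
    hard_state = {"step": 0, "weights": 0.0}   # durable training state
    checkpoint = dict(hard_state)              # last persisted snapshot
    restarts = 0
    while hard_state["step"] < total_steps:
        try:
            train_step(hard_state)             # may raise WorkerFailure
            hard_state["step"] += 1
            if hard_state["step"] % checkpoint_every == 0:
                checkpoint = dict(hard_state)  # flush hard state
        except WorkerFailure:
            restarts += 1
            if restarts > max_restarts:
                raise                          # give up after too many crashes
            hard_state = dict(checkpoint)      # roll back to last hard state
            # soft state (RNG seeds, NCCL groups, caches) rebuilt here
    return hard_state, restarts


def flaky_step(state):
    """A training step that crashes exactly once, at step 7."""
    if state["step"] == 7 and not flaky_step.crashed:
        flaky_step.crashed = True
        raise WorkerFailure()
    state["weights"] += 0.1


flaky_step.crashed = False
```

With `run_with_failover(flaky_step, 10)`, the single crash at step 7 triggers one rollback to the checkpoint taken at step 5, and training still completes all 10 steps; the caller never observes the failure, mirroring the transparency goal described above.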
