    Designing and Implementing a Resilient Deep Learning Framework
    University of Leeds

    United Kingdom, Leeds

    University Rank (QS Ranking)
    83

    Key Facts

    Program Level

    PhD (Doctor of Philosophy)

    Study Type

    Full Time

    Delivery

    On Campus

    Campuses

    Main Site

    Program Language

    English

    About

    Summary

    Foundation models are growing rapidly across many domains, and they must keep scaling up to improve performance. Training these Large Language Models (LLMs), however, not only demands significant resources but also depends on a robust, dependable system to keep the training process effective.

    Algorithm engineers face numerous challenges when training realistic LLMs, including server crashes, hardware failures, software compatibility issues, network communication errors, and unexplained hangs. These failures cause training output to be lost and force multiple restarts, consuming extra time and resources. For instance, launching the training process for a 175B-parameter model in a distributed environment takes several hours, a substantial portion of the total training stage, and many researchers find this cost financially burdensome.

    Establishing a robust and dependable platform to support the entire lifecycle of LLM development is therefore both challenging and urgently required.

    The project aims to explore and develop a resilient deep learning framework, and to investigate its scientific foundation, in order to improve the LLM development lifecycle, with a specific focus on failover. The system is designed to tolerate the crash or failure of any worker without affecting overall execution. The automatic failover process, transparent to upper-level users, efficiently restarts and re-initializes failed workers based on soft or hard states. Given the novelty of this research, students are encouraged and supported to publish ground-breaking papers at top-tier conferences and even to explore technical patents for potential startups.
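
To make the failover idea above concrete, here is a minimal single-process sketch in Python. It is not the project's design: the names `train`, `supervise`, `WorkerCrash`, and `demo` are hypothetical, the crash is simulated, and it assumes that "hard state" means a checkpoint persisted to disk while "soft state" means in-memory progress that is lost when a worker dies. The listing itself does not define these terms.

```python
import json
import os
import tempfile


class WorkerCrash(RuntimeError):
    """Simulated worker failure (hypothetical stand-in for a crash or hang)."""


def train(ckpt_path, total_steps, crash_at, ckpt_every=5):
    """Toy training loop that resumes from a checkpoint if one exists."""
    state = {"step": 0, "loss": 100.0}  # soft state: lives only in memory
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # re-initialize from hard (persisted) state
    while state["step"] < total_steps:
        if state["step"] in crash_at:
            crash_at.discard(state["step"])  # each simulated fault fires once
            raise WorkerCrash(f"worker died at step {state['step']}")
        state["step"] += 1
        state["loss"] *= 0.9  # placeholder for a real optimizer step
        if state["step"] % ckpt_every == 0:
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, ckpt_path)  # atomic rename: no torn checkpoints
    return state


def supervise(ckpt_path, total_steps, crash_at, max_restarts=10):
    """Restart the worker after each crash; the caller only sees the result."""
    restarts = 0
    while True:
        try:
            return train(ckpt_path, total_steps, crash_at), restarts
        except WorkerCrash:
            restarts += 1
            if restarts > max_restarts:
                raise


def demo():
    """Run 20 steps with simulated crashes at steps 7 and 13."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.json")
        return supervise(path, total_steps=20, crash_at={7, 13})
```

Calling `demo()` finishes all 20 steps despite the two simulated crashes. Each restart rolls back to the most recent checkpoint, so the steps taken since that checkpoint are repeated. That repeated work is the cost a resilient framework would aim to minimize.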

    Requirements

    Entry Requirements

    Applicants to research degree programmes should normally have at least a first class or an upper second class British Bachelors Honours degree (or equivalent) in an appropriate discipline. The criteria for entry for some research degrees may be higher; for example, several faculties also require a Masters degree. Applicants are advised to check with the relevant School prior to making an application. Applicants who are uncertain about the requirements for a particular research degree are advised to contact the School or Graduate School prior to making an application.

    English Program Requirements

    The minimum English language entry requirement for postgraduate research study is an IELTS score of 6.5 overall, with at least 6.5 in writing and at least 6.0 in reading, listening and speaking, or equivalent. The test must be dated within two years of the start date of the course in order to be valid. Some schools and faculties have a higher requirement.

    Fee Information

    Tuition Fee

    GBP 0 

    Application Fee

    GBP  