Federated learning for tackling multimodal and class-imbalance problems in healthcare at University of Leeds

About

Summary

The project aims to develop generalisable models in a distributed Federated Learning framework by investigating novel ways for boosting model performance, improving generalisability, tackling class imbalance, and addressing potential biases in data at client locations. The selected candidate is expected to work in a multi-disciplinary team, for example, with existing clinical collaborators.

Full description

Background

Advances in data-driven machine learning (ML) have shown tremendous potential in many sectors, including healthcare. However, training and deploying ML models in healthcare remains challenging because of significant data heterogeneity and limited access to large datasets. Sharing data between sites is often impractical because of privacy concerns and regulatory constraints. Most ML techniques are data-hungry and require large datasets to generalise to a particular population or data-centre distribution. Large heterogeneous labelled datasets are often not feasible to obtain, because producing ground-truth labels is tedious and time-consuming and requires expensive expert time. As a result, most centres have only small local datasets, which are insufficient to train a model with high accuracy and good generalisability. Furthermore, data from a single centre can be biased towards a particular population.

Federated learning (FL) trains a model across decentralised devices or servers that hold their data locally. This safeguards privacy and data security while still leveraging data from other centres in a distributed way. It effectively increases the training dataset size and hence addresses the limitations described above in the medical domain. Although several FL techniques have been proposed, the use of multi-modal data (images of various modalities, and text) for multi-task learning (e.g., detection and diagnosis) has been limited. Existing work has largely centred on model aggregation/fusion techniques and on local fine-tuning at client locations to avoid the risk of data exposure. ML models trained in an FL setting can still suffer from a performance gap between seen and unseen patient settings and from modality differences [1-2].
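To make the model aggregation step concrete, here is a minimal sketch of a size-weighted parameter average in the style of the widely used FedAvg scheme. It is an illustration only, not the method this project will adopt: the function name `federated_average`, the flat parameter lists, and the three example centres with their dataset sizes are all assumptions made for this example.

```python
def federated_average(client_params, client_sizes):
    """Average client parameters, weighting each client by its local dataset size.

    client_params: list of equal-length lists of floats, one per client (hypothetical flat models).
    client_sizes:  number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: three centres, each holding a two-parameter model.
client_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [100, 100, 200]  # assumed local dataset sizes

# Only parameters and sample counts reach the server; raw patient data stays local.
global_params = federated_average(client_params, client_sizes)
print(global_params)  # [3.5, 4.5]
```

The third centre holds half of all samples, so the global parameters are pulled towards its values. This also illustrates why a large centre with a skewed population can bias the shared model.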

Objectives

The project aims to develop generalisable models in a distributed FL framework by investigating novel ways for boosting model performance, improving generalisability, tackling class imbalance, and addressing potential biases in data at client locations.

A few multimodal datasets will be provided at the start of the project. The project is purely research-based and rests on two fundamental questions: 1) can we leverage multi-modal data from various centres to enhance FL performance and its generalisability? and 2) can we devise a technique to improve local performance, tackle local class imbalance, and provide information about biases in the distributed data used during training?
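As one simple illustration of what tackling local class imbalance can involve, the sketch below computes inverse-frequency class weights from a client's own labels. This is the same "balanced" heuristic offered by scikit-learn's `class_weight="balanced"` option, namely n_samples / (n_classes * class_count). It is a common baseline, not the technique the project will develop. The function name `inverse_frequency_weights` and the label counts are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class so that rarer classes count more in the loss."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical client dataset with a 90/10 class split.
local_labels = ["benign"] * 90 + ["malignant"] * 10
weights = inverse_frequency_weights(local_labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Each client can compute such weights locally without sharing labels. The per-class counts themselves are also a simple summary of how skewed a centre's data is.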

References

[1] Q. Liu, C. Chen, J. Qin, Q. Dou and P. Heng, "FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021 pp. 1013-1023. https://doi.org/10.1109/CVPR46437.2021.0010y

[2] Subedi, R., Gaire, R.R., Ali, S., Nguyen, A., Stoyanov, D., Bhattarai, B. (2023). A Client-Server Deep Federated Learning for Cross-Domain Surgical Image Segmentation. Data Engineering in Medical Imaging. DEMI 2023. Lecture Notes in Computer Science, vol 14314. Springer, Cham. https://doi.org/10.1007/978-3-031-44992-5_3

[3] S. Ali, D. Jha, N. Ghatwary, S. Realdon, R. Cannizzaro, O.E. Salem, D. Lamarque, C. Daul, M.A. Riegler, K.V. Anonsen, A. Petlund. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Scientific Data. 2023;10(1):75. https://doi.org/10.1038/s41597-023-01981-y

Requirements

Entry Requirements

Applicants to research degree programmes should normally have at least a first class or an upper second class British Bachelors Honours degree (or equivalent) in an appropriate discipline. The criteria for entry for some research degrees may be higher; for example, several faculties also require a Masters degree. Applicants are advised to check with the relevant School prior to making an application. Applicants who are uncertain about the requirements for a particular research degree are advised to contact the School or Graduate School prior to making an application.

English Program Requirements

The minimum English language entry requirement for postgraduate research study in the School of Computer Science is an IELTS score of 6.5 overall, with at least 6.5 in writing and at least 6.0 in reading, listening and speaking, or equivalent. The test must be dated within two years of the start date of the course to be valid.

