Summary
The project aims to develop generalisable models in a distributed Federated Learning framework by investigating novel ways for boosting model performance, improving generalisability, tackling class imbalance, and addressing potential biases in data at client locations. The selected candidate is expected to work in a multi-disciplinary team, for example, with existing clinical collaborators.
Full description
Background
Advancements in data-driven machine learning (ML) techniques have shown tremendous potential in various sectors, including healthcare. However, training and deploying ML models remains challenging in healthcare due to significant data heterogeneity and a lack of access to big data. Data sharing between different sites is often intractable due to privacy concerns and regulatory challenges. Most ML techniques are data-voracious and require large datasets to generalise to a particular population or data centre distribution. It is often not feasible to obtain large heterogeneous labelled datasets, as obtaining ground truth labels is tedious and time-consuming and requires expensive expert time. As a result, most centres have only small local datasets that are insufficient to train a model with high accuracy and good generalisability. Furthermore, data from a specific centre may be biased towards a particular population.
Federated learning (FL) allows a model to be trained across decentralised devices or servers that hold data locally. This safeguards privacy and data security while leveraging data from other centres in a distributed way. It increases the effective training dataset size and hence tackles the above-mentioned limitations in the medical domain. Even though several FL techniques have been proposed in the past, the use of multi-modal data (both images of various modalities and text) for multi-task learning (e.g., detection and diagnosis) has been limited, and existing work has largely centred on model aggregation/fusion techniques and local fine-tuning at client locations to avoid the risk of data exposure. ML models trained in an FL setting can still suffer from a performance gap between seen and unseen patient settings and from modality differences [1-2].
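To make the model aggregation step mentioned above concrete, the following is a minimal sketch of sample-size-weighted parameter averaging in the style of the widely used FedAvg baseline. It is an illustration only, not the method this project will develop: the function name `federated_average`, the parameter names `w` and `b`, and the two example centres with their sample counts are all hypothetical. Only model parameters and sample counts are passed to the server; the local data never leaves each centre.

```python
from typing import Dict, List, Tuple

# A model is represented here as named lists of floats (a simplifying assumption).
Weights = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros with the same shape as the first client's parameters.
    first_weights, _ = client_updates[0]
    aggregated = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, n_samples in client_updates:
        share = n_samples / total_samples  # this client's contribution
        for name, vals in weights.items():
            for i, value in enumerate(vals):
                aggregated[name][i] += share * value
    return aggregated


# Hypothetical centres: (locally trained parameters, number of local samples).
centre_a = ({"w": [1.0, 2.0], "b": [0.0]}, 100)
centre_b = ({"w": [3.0, 4.0], "b": [1.0]}, 300)

global_weights = federated_average([centre_a, centre_b])
print(global_weights)
```

Weighting by sample count means a larger centre pulls the global model towards its own distribution, which is one way the site-specific biases and class imbalance discussed in this project can carry over into the shared model.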
Objectives
The project aims to develop generalisable models in a distributed FL framework by investigating novel ways for boosting model performance, improving generalisability, tackling class imbalance, and addressing potential biases in data at client locations.
A few multimodal datasets will be provided at the start of the project. The project is purely research-based and rests on two fundamental questions: 1) can we leverage multi-modal data from various centres to enhance FL performance and generalisability? and 2) can we devise a technique to uplift local performance, tackle local class imbalance, and provide information about biases in the distributed data used during training?
References
[1] Q. Liu, C. Chen, J. Qin, Q. Dou and P. Heng, "FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 1013-1023. https://doi.org/10.1109/CVPR46437.2021.0010y
[2] Subedi, R., Gaire, R.R., Ali, S., Nguyen, A., Stoyanov, D., Bhattarai, B. (2023). A Client-Server Deep Federated Learning for Cross-Domain Surgical Image Segmentation. Data Engineering in Medical Imaging. DEMI 2023. Lecture Notes in Computer Science, vol 14314. Springer, Cham. https://doi.org/10.1007/978-3-031-44992-5_3
[3] S. Ali, D. Jha, N. Ghatwary, S. Realdon, R. Cannizzaro, O.E. Salem, D. Lamarque, C. Daul, M.A. Riegler, K.V. Anonsen, A. Petlund. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Scientific Data. 2023;10(1):75. https://doi.org/10.1038/s41597-023-01981-y
