Summary
Next-generation sequencing makes it feasible to sample thousands of individuals to understand the causes and development of life-threatening diseases such as cancer. Today it is the scientists themselves, rather than the sequencing technology, who are the bottleneck in the analysis of such sequencing data. This is due to a lack of suitable interactive visualization tools and, as a consequence, scientists' over-reliance on automated processing methods. Where visualization is used, scientists tend to rely on ordinary computer displays, despite UHD (4K) displays being a commodity and even higher-definition displays (UHD+; i.e., Powerwalls with multiple 4K displays, etc.) being readily available. Therefore, the overall aim of this PhD is to investigate how UHD displays can transform how scientists use visualization to analyse genomics data.
That aim has several facets, which may be divided into investigations with the following objectives (NB: any single PhD will focus on some of the objectives, not all! Similarly, a PhD will focus on a particular type of genomic analysis, not the analysis of all genomic data):
1. Investigate scientists' workflows to understand the data analysis pipeline, the barriers that scientists face, the assumptions/simplifications they are forced to make, and where visualization is (not) currently used.
2. Determine how scientists use visualization, including characterising the tasks they perform and quantifying the amount of interaction (pan, zoom, filter, select, etc.). This will identify where inefficiencies occur and, therefore, where UHD+ displays should be beneficial.
3. Investigate the scale at which conventional visualizations break down (e.g., become inefficient to use or difficult to gain insights from). This will involve user experiments that identify the circumstances under which users become overwhelmed by the quantity of data, and the extent to which the problems are addressed by: (a) increasing display resolution, (b) particular visual encodings, (c) certain interaction methods (pan, zoom, filter, select, etc.), and (d) analysis aids (analysis wear, algorithms to "find patterns like this", etc.).
4. Investigate how information-rich visualizations should be designed to address the barriers, simplifications, inefficiencies and breakdowns (see above). This includes analysing the design space of information-rich visualizations for different data analysis tasks, and conducting user experiments to gather evidence about factors in the design space.
5. Develop and evaluate proof-of-concept software that implements solutions from the above objectives, to show how UHD+ visualization can transform scientists' analysis of genomics data.
