Search

Chat With Us

    Fast algorithms for efficient biosequence search
    Go to University of Leeds
    University of Leeds

    Fast algorithms for efficient biosequence search

    University of Leeds

    University of Leeds

    flag

    United Kingdom, Leeds

    University RankQS Ranking
    83

    Key Facts

    Program Level

    PhD (Philosophy Doctorate)

    Study Type

    Full Time

    Delivery

    On Campus

    Campuses

    Main Site

    Program Language

    English

    Start & Deadlines

    Next Intake DeadlinesOctober-2026
    Apply to this program

    Go to the official application for the university

    Next Intake October-2026

    Fast algorithms for efficient biosequence search

    About

    Summary

    DNA sequencing and the following analysis of biosequences become a core procedure for many problems in biological research, such as medical diagnosis, development of target medicine, virology, forensic sciences, etc. Examples of biosequence search include the detection of mutations within human DNA in cancer studies and searching for matching patterns in microbial genomes.
    With an ever-increasing volume of data to process, the development of efficient search algorithms becomes of paramount importance.

    Seeding is one of the techniques which is extensively used in biosequence alignment problems to speed up the searching procedure. A few seeding algorithms have been proposed in the past years. The first seeds which have been suggested represent small contiguous matching patterns and explore the seed-and-extend paradigm. Later the idea has been developed into spaced seeds (binary/ternary), allowing mismatches.

    Another technique is k-mers, a set of short matching patterns of length k which is often used in alignment-free methods.

    However, finding optimal seeds/k-mers, which maximize efficiency, is still a topic for study. Usually, efficiency can be assessed as a compromise between the algorithm’s time complexity and its sensitivity (ratio of correctly aligned sequences). Different studies suggest different methods for increasing efficiency. Often the selection is based on a certain probabilistic model and pre-defined properties of seeds.

    In this project, the student will look into the problem of generating optimal seeds/k-mers to improve the efficiency of biosequence searching algorithms. The focus will be made on the development of a mathematical framework to investigate and demonstrate the efficiency of suggested seeds’ structures.

    Full description

    It has been shown in [1] that spaced seeds may be more efficient than contiguous seeds. Other designs of seeding strategy have been suggested, such as adaptive seeds and minimisers to speed up the search at the price of decreasing sensitivity.

    Seeds generators which produce seeds of a certain structure to satisfy given parameters have been proposed in [2] and [3]. The properties of optimal seeds have been investigated in [4] and a list of optimal seeds for predefined structural properties has been suggested.

    An example of investigating an optimal parameter for k-mers in error correction tools can be found in [5].

    [1] Keich Uri, Li Ming, Ma Bin, Tromp John, On spaced seeds for similarity search. Discrete Applied Mathematics. https://doi.org/10.1016/S0166-218X(03)00382-2

    [2] Brejová, B., Brown, D. G., & Vinar, T. (2004). Optimal spaced seeds for homologous coding regions. Journal of bioinformatics and computational biology, 1(4), 595–610. https://doi.org/10.1142/s0219720004000326

    [3] Kucherov, G., Noé, L., & Roytberg, M. (2006). A unifying framework for seed sensitivity and its application to subset seeds. Journal of bioinformatics and computational biology, 4(2), 553–569. https://doi.org/10.1142/s0219720006001977

    [4] Valeriy Titarenko, Sofya Titarenko. PerFSeeB: Designing Long High-weight Single Spaced Seeds for Full Sensitivity Alignment with a Given Number of Mismatches, 15 November 2021, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-1051543/v1]

    [5] Sharma, A., Jain, P., Mahgoub, A. et al. Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing. BMC Bioinformatics 23, 25 (2022). https://doi.org/10.1186/s12859-021-04547-0

    Requirements

    Entry Requirements

    Applicants to research degree programmes should normally have at least a first class or an upper second class British Bachelors Honours degree (or equivalent) in an appropriate discipline. The criteria for entry for some research degrees may be higher, for example, several faculties, also require a Masters degree. Applicants are advised to check with the relevant School prior to making an application. Applicants who are uncertain about the requirements for a particular research degree are advised to contact the School or Graduate School prior to making an application.

    English Program Requirements

    The minimum English language entry requirement for research postgraduate research study is an IELTS of 6.0 overall with at least 5.5 in each component (reading, writing, listening and speaking) or equivalent. The test must be dated within two years of the start date of the course in order to be valid. Some schools and faculties have a higher requirement.

    Fee Information

    Tuition Fee

    GBP 0 

    Application Fee

    GBP  
    University of Leeds

    Fast algorithms for efficient biosequence search

    University of Leeds

    [object Object]

    United Kingdom,

    Leeds

    Similar Programs

    Other interesting programs for you

    Find More Programs
    Wishlist