Roland Faure

I recently completed a Ph.D. in sequence bioinformatics. I like imagining and implementing algorithmic solutions to biological problems. During my Ph.D., I focused on genome assembly, more specifically the problems of haplotyping and metagenome assembly. For my postdoc, I am moving on to indexing problems.

Career

Postdoc

2025-?

Aligning on the SRA

In Rayan Chikhi's lab at the Institut Pasteur, France. The goal is to search efficiently through the Logan database, in order to provide a "super-BLAST" that would work on the full SRA. We'll see what happens...

Ph.D.

2021-2024

Haplotype assembly from long reads

The Ph.D. manuscript is available here.

Under the joint supervision of Dominique Lavenier in the Genscale team in Rennes, France and Jean-François Flot in the EBE team at the Université libre de Bruxelles, Belgium. The main work of my Ph.D. consisted of developing methods to obtain haplotype-separated assemblies. In the realm of noisy long reads, this yielded HairSplitter (see below). In the realm of highly accurate long reads, this yielded the Alice assembler (see below).

Software, publications & talks

Alice

Extremely fast and haplotype-aware assembly of high-fidelity long reads.

Existing fast high-fidelity reads assemblers (mDBG, metaMDBG) are based on k-min-mers sketches of the reads. This sketching method makes it impossible to assemble close strains. The idea of Alice is to use a new sketching method, MSR sketching, which does not have the same shortcoming. Alice is extremely fast while being haplotype-aware.
Presented as a talk at SeqBIM 2023 and Genome Informatics 2024, where I was granted a bursary to come and present it.

The closest thing to a paper that exists so far is Chapter 2 of my Ph.D. thesis.

HairSplitter

Separating noisy long reads into an unknown number of haplotypes

Link to the Peer-Community Journal paper

The goal of HairSplitter is to produce contiguous and uncollapsed (i.e. with all haplotypes still present) (meta-)genome assemblies. It takes an assembly and sequencing reads as input, detects whether contigs have been collapsed and outputs the corresponding uncollapsed assembly. Written in C++. Available on GitHub. The most complete paper yet can be found on bioRxiv.
Presented as a poster at Genome Informatics 2022 and ISMB/ECCB 2023. See the poster. Presented as a talk at SeqBIM 2022, JOBIM 2023 and ISMB/ECCB 2023.

Link to the ILP improvement paper (presented at BIOSTEC 2024)

An Integer Linear Programming version of HairSplitter was developed with Tam Truong and Rumen Andonov and presented at the 17th International Joint Conference on Biomedical Engineering Systems and Technologies, where it received the best student paper award. It is available on GitHub.

GraphUnzip

Untangling assembly graphs to finish assemblies

Developed under the supervision of Nadège Guiglielmoni and Jean-François Flot

Link to the bioRxiv paper

GraphUnzip untangles assembly graphs with the help of long reads and/or Hi-C. It is designed to distinguish the different haplotypes present. Written in Python. Available on GitHub.
The key idea was presented for the first time at JOBIM 2021.
The long read algorithm was presented for the first time at SeqBIM 2021 thanks to a grant from the SFBI.

QuickDeconvolution

Rapidly separating barcoded reads into groups

Developed under the supervision of Dominique Lavenier

Link to the published paper (Bioinformatics Advances)

QuickDeconvolution aims to solve the barcode deconvolution problem for barcoded reads. It strives to be scalable and parallelisable. Written in C++. Available on GitHub.

Reviews

Recomb-SEQ 2024

Reviewed a paper with the help of Claire Lemaitre, February 2024

ISMB/ECCB 2024

Reviewed a paper with the help of Dominique Lavenier, January 2024

NAR Genomics and Bioinformatics

Co-reviewed a paper with Karel Brinda, October 2022

Students

Tam Minh Khac Truong

StrainMiner: Data mining and discrete optimizations for strains separation in metagenomes using long reads

Two master's internships of 4 months (2022) and 6 months (2023). Co-supervised with Rumen Andonov.
Worked on a new approach to separate error-prone sequencing reads from several strains of the same species in a metagenome. The new approach is based on integer linear programming and showed good performance. This work has been integrated into HairSplitter and presented at the 17th International Joint Conference on Biomedical Engineering Systems and Technologies. It is available on GitHub.

Olivier De Thier

Improving ascidian de novo genome assemblies with long reads

Master's thesis of 6 months (2023). Co-supervised with Jean-François Flot and Stefano Tiozzo.
This master's thesis produced new ascidian genomes of unprecedented quality for Botryllus schlosseri and Polyandrocarpa zorritensis.

Baptiste Hilaire

Sequence reduction to increase the speed of bioinformatic tools

Bachelor internship of 6 weeks (2023)
This internship explored the properties of Mapping-friendly Sequence Reductions (MSRs), with the ultimate aim of increasing the speed and/or decreasing the memory needed for some bioinformatic pipelines.

Teaching

Initiation to Java programming

20hx2 of practical sessions with 1st year bachelor computer science students at the University of Rennes, Fall 2021 and Fall 2022

Python object-oriented programming

34hx2 of practical sessions with 1st year master bioinformatics students at the University of Rennes, Winter 2022 and Winter 2023

Bioinformatics crash course

A full-day crash course (6h) in bioinformatics for high-school students selected to represent Belgium at the International Biology Olympiad, Brussels, May 2022, 2023 and 2024.

Educational background

2020 - 2021

Master "Bioinformatics & modelisation"

Sorbonne universités

Paris, France

2017-2020

Engineering school

École polytechnique

Saclay, France

2015 - 2017

Math & Physics preparatory school

Lycée Aux Lazaristes

Lyon, France

Contact

GitHub
RolandFaure
Bluesky
@rfaure