Department of Plant Sciences

A new major study led by Professor Ian Henderson, Head of the Department’s Genetic and Epigenetic Inheritance in Plants Group, has sequenced the genome of the world’s most widely used model plant species, Arabidopsis thaliana, at a level of detail never previously achieved. Prior to this, challenging regions of the genome remained unassembled, including the centromeres. These enigmatic chromosome regions have previously been impossible to probe due to their highly repetitive structure, leading to them being deemed the genomic equivalent of black holes. Now, for the first time, this study reveals the secrets of the Arabidopsis centromeres, shedding light on their evolution and providing insights into a paradox that has mystified scientists for decades.
Arabidopsis thaliana was adopted as a model plant due to its short generation time, small size, ease of growth and prolific seed production through self-pollination. Its fast life cycle and small genome make it well suited to genetic studies and to mapping the key genes that underpin traits of interest. It has led to a multitude of discoveries, and in 2000 it became the first plant to have its genome sequenced. This initial genome release was of an excellent standard in the chromosome arms, where most of the genes are located, but was unable to assemble the highly repetitive and complex centromeres, telomeres and ribosomal DNA repeats. Now, due to advances in sequencing technologies, these challenging regions have been assembled for the first time.
Co-led by Professor Henderson and Professor Mike Schatz from Johns Hopkins University, the study, published in Science, is the first to successfully perform long-read sequencing and assembly of the Arabidopsis thaliana centromeres. Since the genome was first sequenced in 2000, long-read sequencing technologies have advanced, allowing researchers to see the genome in pieces of more than 100,000 nucleotides, instead of pieces of 100-200 nucleotides. This is thanks to the introduction of nanopore sequencing, which measures electrical current as nucleic acids are passed through a protein nanopore - a hollow structure inserted in a membrane. As DNA passes through the nanopore, different nucleic acid bases change the current in distinct ways. The resulting electrical signal is then decoded to provide the specific DNA sequence. “Nanopores have been touted as the ‘holy grail’ of sequencing because they enable the direct reading of DNA,” says Todd Michael, Research Professor at the Salk Institute for Biological Studies and collaborator on the study, “and now with two distinct nanopores we were able to more accurately read strands of DNA over several megabases, which was necessary to traverse the repeat arrays of the Arabidopsis centromeres.”
These data, combined with algorithmic advances that assemble the reads, mean that solving the ‘genomic jigsaw puzzle’ is suddenly possible in a way that it wasn’t previously. Professor Schatz adds, “The centromeres are some of the most interesting, but also the most difficult regions of the genome to analyse - they are like endless ‘blue sky’ within a jigsaw puzzle. Fortunately, advances in sequencing paired with advances in the computational methods for genome assembly now make it possible to accurately assemble even the most challenging of sequences.” Critically, this also means that probing the genetic makeup of the centromere, which had previously proved to be a dead end due to its challenging structure, is now possible.
Each chromosome possesses a centromere, which is the site at which sister chromatids are held together. During cell division, the chromatid pair separates, and the centromere is the point of attachment for the spindle fibres that pull the separated chromosomes into opposite daughter cells. Unlike genes, which are amenable to mapping and sequencing, centromeres are challenging to assemble because they often consist of large, complex arrays of head-to-tail repeats. Now, this breakthrough study is finally able to shine a light on them. “It’s fantastic to be able to see into the centromeres for the first time and use this to understand their unusual modes of evolution,” remarks Professor Henderson.
For decades, researchers have been trying to understand the paradox of how and why centromeric DNA evolves with extraordinary rapidity, whilst remaining stable enough to perform its job during cell division. In contrast, other ancient parts of the cell that have conserved roles, such as ribosomes, which make proteins from mRNA, tend to be very slow evolving. Yet the centromere, despite its conserved role in cell division, is the fastest evolving part of the genome. This study, by revealing the genetic and epigenetic topography of Arabidopsis centromeres, marks a step change in our understanding of this paradox.
As part of the study, the compiled centromere maps provide new insights into the ‘repeat ecosystem’ found in the centromere. The maps reveal the architecture of the repeat arrays, which has implications for how they evolve, the chromatin and epigenetic states of the centromeres and a population of ‘selfish DNA’ called ATHILA transposons that integrate into these regions. The authors present a model that the centromeres are evolving via cycles of satellite repeat homogenisation and retrotransposon-driven diversification. Moving forward they want to use these maps as a foundation to understand how and why centromeres are evolving so rapidly.
This step change was only achieved by bringing together specialists in molecular biology, computer science and cell biology from across the world. Professor Henderson commented, “This was truly a huge team effort that we could only achieve by working together,” with the study involving researchers from the University of Cambridge, Johns Hopkins University, Salk Institute for Biological Studies, University of Sussex, Gregor Mendel Institute, CEITEC Brno, University of Sheffield, University of Tokyo, Cold Spring Harbor Laboratory, LMU Munich and Max Planck Institute for Plant Breeding Research.
Following on from this important study, Professor Henderson and the team will be looking at how the satellite repeats are evolving, which recombination pathways are involved and whether it’s possible to catch this process ‘in the act.’ Professor Henderson explains, “With the support of our funders the BBSRC and Human Frontier Science Program, we want to move forward and use this approach to map centromeres from diverse Arabidopsis species, and ultimately more widely throughout plants, to gain a larger scale perspective for how and why these regions evolve so rapidly, and why they adopt such complex repetitive structures.”

The Arabidopsis Col-CEN assembly recently published by Professor Ian Henderson is now hosted by The Arabidopsis Information Resource. The genome release includes gene, CEN180 satellite repeat and ATHILA retrotransposon annotations, and is available to download and to explore in JBrowse.