Project to read the genomes of all 70,000 vertebrate species reports its first discoveries – ScienceDaily
It’s one of the most daring projects in biology today – reading the entire genome of every bird, mammal, lizard, fish, and every other creature with a backbone.
And now comes the first major payoff from the Vertebrate Genomes Project (VGP): nearly complete, high-quality genomes of 25 species, Howard Hughes Medical Institute (HHMI) researcher Erich Jarvis and dozens of co-authors report on April 28, 2021, in the journal Nature. These species include the greater horseshoe bat, the Canada lynx, the platypus and the kākāpō parrot, whose genome is one of the first high-quality genomes of an endangered vertebrate species.
The paper also lays out the technical advances that let scientists reach a new level of accuracy and completeness, and it paves the way for decoding the genomes of the roughly 70,000 vertebrate species living today, says HHMI researcher and study co-author David Haussler, a computational geneticist at the University of California, Santa Cruz (UCSC). “We will get a spectacular picture of how nature has filled all ecosystems with this incredibly diverse array of animals.”
With a host of accompanying papers, the work begins to deliver on this promise. The project team discovered previously unknown chromosomes in the zebra finch, for example, and made a surprising finding about genetic differences between marmoset and human brains. The new research also offers hope of saving the endangered kākāpō parrot and vaquita porpoise from extinction.
“These 25 genomes represent a milestone,” says Jarvis, VGP chair and a neurogeneticist at Rockefeller University. “We are learning a lot more than expected,” he says. “The work is proof of principle for what is to come.”
From 10K to 70K
The VGP milestone was years in the making. The origins of the project date back to the late 2000s, when Haussler, geneticist Stephen O’Brien, and Oliver Ryder, director of conservation genetics at the San Diego Zoo, decided it was time to think big.
Instead of sequencing just a few species, such as humans and model organisms like fruit flies, why not read the complete genomes of ten thousand animals in a daring “10K Genome” effort? Back then, however, the price tag ran to hundreds of millions of dollars, and the plan never really took off. “Everyone knew it was a great idea, but no one wanted to pay the price,” recalls HHMI researcher and HHMI professor Beth Shapiro, an evolutionary biologist at UCSC and a co-author of the Nature paper.
What’s more, scientists’ early efforts to spell out, or “sequence,” all the letters of DNA in an animal’s genome were riddled with errors. In the original approach used to complete the first rough human genome in 2003, scientists cut DNA into short pieces of a few hundred letters and read those letters. Then came the devilishly difficult job of putting the pieces together in the correct order. The methods were not up to the task, which resulted in misassemblies, major gaps and other errors. Often, it was not even possible to map genes to individual chromosomes.
The introduction of new sequencing technologies with shorter reads made the idea of reading thousands of genomes feasible. These rapidly developing technologies reduced costs, but they also reduced the quality of genome assemblies. Then, in 2015, Haussler and his colleagues brought in Jarvis, a pioneer in deciphering the complex neural circuits that allow birds to learn new tunes after listening to others’ songs. Jarvis had previously shown a knack for handling large and complex efforts. In 2014, he and more than 100 colleagues sequenced the genomes of 48 species of birds, which revealed new genes involved in vocal learning. “David and others asked me to take the lead on the Genome 10K project,” recalls Jarvis. “They thought I had the personality for it.” Or, as Shapiro puts it, “Erich is a very arrogant leader, in a nice way. What he wants to happen, he will achieve.”
Jarvis expanded and rebranded the 10K genome idea to include all vertebrate genomes. He also helped launch a new sequencing center at Rockefeller. Together with a center at the Max Planck Institute in Germany led by Gene Myers, a former group leader at HHMI’s Janelia Research Campus, and another at the Sanger Institute in the UK led by Richard Durbin and Mark Blaxter, it currently produces most of the VGP genome data. He asked Adam Phillippy, a leading genome expert at the National Human Genome Research Institute (NHGRI), to chair the VGP assembly team. Then he found about 60 top scientists willing to use their own grant money to cover the costs of sequencing at the centers, so that they could tackle the genomes they were most interested in. The team also negotiated with Māori in New Zealand and officials in Mexico to obtain samples of kākāpō and vaquita in “a fine example of international collaboration,” says Sadye Paez, VGP program director at Rockefeller.
The massive team of researchers has achieved a series of technological advances. New sequencing machines let them read DNA fragments of 10,000 letters or more, instead of a few hundred. The researchers have also developed clever ways to assemble these segments into individual chromosomes. They have even been able to work out which genes were inherited from the mother and which from the father. This fixes a particularly thorny problem known as “false duplication,” where scientists mistakenly label maternal and paternal copies of the same gene as two separate genes.
“I think this work opens a very important set of doors, because the technical aspects of assembly have been the bottleneck for genome sequencing in the past,” says Jenny Tung, geneticist at Duke University, who was not directly involved in the research. Having high-quality sequencing data “will transform the types of questions people can ask,” she says.
The team’s improved accuracy shows that previous genome sequences are seriously incomplete. In zebra finches, for example, the team discovered eight new chromosomes and around 900 genes that had been thought to be missing. Previously unknown chromosomes have also turned up in the platypus, as team members reported online in Nature earlier this year. The researchers also worked their way through, and correctly assembled, long stretches of repetitive DNA, most of which contain only two of the four genetic letters. Some scientists considered these stretches to be non-functional “junk” or “dark matter”. Wrong. Most of the repeats occur in regions of the genome that code for proteins, Jarvis says, suggesting that this DNA plays a surprisingly crucial role in turning genes on or off.
This is only the beginning of what the Nature paper envisions as “a new era of discovery in the life sciences”. With each new genome sequence, Jarvis and his collaborators make new, and often unexpected, discoveries. Jarvis’s lab, for example, finally found the regulatory region of a key gene that parrots and songbirds need to learn tunes; next, his team will try to figure out how it works. The marmoset genome held several surprises. While marmoset and human brain genes are largely conserved, several marmoset genes encode amino acids that are pathogenic in humans. This highlights the need to take genomic context into account when developing animal models, the team reports in an accompanying article in Nature. And in findings published in Nature last year, a group led by Professor Emma Teeling of University College Dublin in Ireland found that some bats have lost immunity-related genes, which may help explain their ability to tolerate viruses like SARS-CoV-2, which causes COVID-19.
The new information can also boost efforts to save rare species. “It is a crucially important moral duty to help endangered species,” Jarvis says. That’s why the team collected samples from a kākāpō parrot named Jane, which is part of a captive breeding program that brought the parrot back from the brink of extinction. In an article published in the new journal Cell Genomics, part of the Cell family of journals, Nicolas Dussex of the University of Otago and his colleagues described their studies of Jane’s genes alongside those of other individuals. The work revealed that the last surviving kākāpō population, isolated on an island off New Zealand for the past 10,000 years, has somehow purged deleterious mutations, despite the species’ low genetic diversity. A similar finding was reported for the vaquita, with around 10-20 individuals remaining on the planet, in a study published in Molecular Ecology Resources and led by Phil Morin at the National Oceanic and Atmospheric Administration Fisheries in La Jolla, California. “This means there is hope for the conservation of the species,” Jarvis concludes.
A clear path
The VGP is now focusing on sequencing even more species. The project team’s next goal is to complete 260 genomes, representing all orders of vertebrates, and then to raise enough funds to sequence thousands more, representing all families. This job will not be easy and will inevitably bring new technical and logistical challenges, says Tung. Once hundreds or even thousands of animals easily found in zoos or laboratories have been sequenced, scientists may face ethical hurdles in obtaining samples from other species, especially when animals are rare or endangered.
But with the new paper, the way forward seems clearer than it has been in years. The VGP model is even inspiring other large-scale sequencing efforts, including the Earth BioGenome Project, which aims to decode the genomes of all eukaryotic species within 10 years. Perhaps for the first time, it seems possible to fulfill the dream shared by Haussler and many others of reading every letter in the genome of every organism. Darwin saw the enormous diversity of life on Earth as “endless forms most beautiful”, observes Haussler. “Now we have an amazing opportunity to see how these forms came to be.”