Basecamp researchers gathering genetic data in Malta
Greg Funnell
A British biotech firm called Basecamp Research has spent the past few years collecting troves of genetic data from microbes living in extreme environments around the world, identifying more than a million species and nearly 10 billion genes new to science. It claims that this massive database of the planet’s biodiversity will help train a “ChatGPT of biology” that will answer questions about life on Earth – but there’s no guarantee this will work.
Jörg Overmann at the Leibniz Institute DSMZ in Germany, which houses one of the world’s most diverse collections of microbial cultures, says increasing the number of known genetic sequences is valuable, but may not yield useful findings in areas like drug discovery or chemistry without more information about the organisms from which the sequences were collected. “I’m not convinced that in the end the understanding of really novel functions will be accelerated by this brute-force increase in the sequence space,” he says.
In recent years, researchers have developed a number of machine learning models trained to identify patterns and predict relationships in vast amounts of biological data. The most famous of these is AlphaFold, which can predict the 3D structure of a protein from its amino acid sequence alone, and earned its creators at Google DeepMind a share of the 2024 Nobel prize in chemistry.
While such “generative biology” models have grown ever more complex since then, they haven’t gotten much better, says Frances Ding at the University of California, Berkeley. One reason could be a lack of biodiverse data. “Current models in biology are trained on datasets that disproportionately represent well-studied species (e.g., E. coli, mice, humans), and these models are worse at predicting properties about sequences from other parts of the tree of life,” she says.
Researchers at Basecamp set out to address this biodiversity gap. The company’s growing database now contains samples from more than 120 sites in 26 countries, according to a report the company posted. Jonathan Finn, the company’s chief science officer, says the collection e