For the first time, a new public database will link genetic data with records of where and when the source samples were collected, making it easier for researchers to share and reuse genetic data for environmental and ecological analyses. The resource, called the Genomic Observatories Metadatabase (GeOMe), was developed by researchers at the Smithsonian’s National Museum of Natural History in collaboration with researchers at California State University Monterey Bay, UC Berkeley and six other museums and research institutions.
Until now, information on where and when genetic data was collected has been missing from widely shared public databases. Such information about the environment, location and date of each biological sample is critical for comparing biodiversity in different locations worldwide and tracking it across time. Despite calls for more data sharing within the research community, researchers have lacked the tools to make this information readily available.
Developers of the database, which was described Aug. 3 in the journal PLOS Biology, say that standardizing and preserving this metadata will greatly enhance the value of the genetic sequence data that researchers are already collecting. Researchers might investigate, for example, how the inhabitants of a specific altitude throughout the world have shifted as our planet’s climate has changed, or assess the stability of microbial communities facing increasingly acidic marine environments.
GeOMe’s developers include Christopher Meyer at the Smithsonian’s National Museum of Natural History, Eric Crandall at California State University Monterey Bay (CSUMB), Michelle Gaither at CSUMB and the Hawai’i Institute of Marine Biology, and John Deck at the Berkeley Natural History Museums.
“Tracking biodiversity through global change is a collaborative effort,” said Christopher Meyer, a Smithsonian research zoologist who helped lead GeOMe’s development. “We can’t do it on our own. GeOMe will advance big data and discovery for the future, allowing the sum of scientific endeavors to far exceed individual research products.”
The team has worked to ensure that the resource is easy to use and adaptable for a wide range of needs. With the database and toolkit freely available to the research community, the developers say, scientific journals can now mandate that authors make their metadata available in a searchable and standardized format, just as journals have long done for genetic sequence data.
Scientists who analyze ecological samples (whether plants, animals or entire communities of microbes, gathered from the oceans, fresh water or land) have their own individual systems for keeping track of when and where those samples were collected. But for the broader research community, such information has been difficult to share and obtain, and impossible to search comprehensively. GeOMe provides a solution by permanently linking information about samples’ temporal, environmental, geospatial and scholarly context to genetic sequence data stored by the National Center for Biotechnology Information.
The researchers say they devoted the time and resources to developing GeOMe because they knew it would be a powerful tool to accelerate discovery. “Genomic data are the foundational layer for our understanding of biodiversity, but until now it has been difficult to put them in their environmental and geospatial context,” Eric Crandall says. “Biodiversity scientists put a lot of time and effort into developing genomic datasets, and these data deserve to be stored in a way that will maximize their potential for reuse and further discovery.”
GeOMe’s development was a collaboration between researchers and computer scientists at the following institutions: the Smithsonian Institution’s National Museum of Natural History; Berkeley Natural History Museums at the University of California, Berkeley; the Hawai’i Institute of Marine Biology at the University of Hawai’i; Biocode; Texas A&M University; the University of California’s Gump South Pacific Research Station, in Moorea, French Polynesia; Berkeley Institute for Data Science at the University of California; the University of Queensland in Australia; and California State University Monterey Bay.
Data in GeOMe will conform to standards developed by the Genomic Standards Consortium and the Biodiversity Information Standards organization, ensuring that submitters capture and record the same essential information about every sample. Employing these standards is essential to ensure that in the future, researchers will be able to conduct analyses across datasets.
Funding for this study was provided by the National Science Foundation, the Gordon and Betty Moore Foundation and the National Oceanic and Atmospheric Administration. GeOMe reached its current level of development with NSF funding to the Diversity of the Indo-Pacific Network.