Standards for a new genomic era
In a recent issue of Science, Los Alamos geneticist Patrick Chain and his colleagues proposed six labels for the genome sequence data that are, or will become, available in public databases, in place of the two labels used today. The six labels would roughly characterize the completeness and accuracy, and consequently the potential reliability, of genetic sequencing data. This matters because scientists use such data daily to cross-reference unknown genetic material against the genetic material of known organisms.
Every living organism with DNA has chromosomes containing four molecular building blocks, or bases, represented by the letters A, T, G, and C. The bases join in pairs, and one chromosome can contain millions of base pairs arranged like rungs on a ladder of DNA. The base pairs are arranged in specific sequences that make up genes. These gene sequences can contain genetic instructions that help or harm an organism, for example by encoding enzymes that digest certain foods or by inducing cellular aberrations that give rise to certain diseases.
Genome scientists have catalogued genetic data from thousands of organisms and placed them in publicly available libraries. Scientists can use these libraries to cross-check genetic data, for example when attempting to isolate an unknown public health threat, or to determine where a potentially helpful or harmful gene appears to be located on an organism's chromosome. In fields such as biofuels research or environmental remediation, genetic data could help scientists determine whether microorganisms can efficiently break down plant matter to aid in ethanol production, or digest environmental contaminants like hydrocarbons. However, because of the complexity of genetic data, genetic information in public libraries can range from very rough to very refined.
In the past, genetic data has been classified as either "draft" or "finished," leaving wide uncertainty about its potential accuracy.
"In the past few years we've seen major advances in genetic sequencing technology, so we've seen an explosion in the amount of publicly available data," said Chain, who is lead author of the Science paper. "The amount of base-pair sequencing data generated each day is in the billions, orders of magnitude larger than what was generated a few years ago. Different sequencing technologies have different levels of accuracy. High degrees of uncertainty in a sequence can potentially lead a researcher down a wrong path that they could follow for a year or more. We now have a need for standards that will provide scientists with an unambiguous estimation of the quality of genetic sequence data."
Working with scientists from genome sequencing centers big and small (including the U.S. Department of Energy's Joint Genome Institute, the Sanger Institute, the Human Microbiome Project Jumpstart Consortium sequencing centers, Michigan State University, and the Ontario Institute for Cancer Research, among others), Chain and his colleagues have proposed that sequence data be placed into one of six categories that expand on the existing two. The six standards range from "standard draft sequence," representing the minimum requirements for public submission, to "finished sequence," the highest standard, which can be verified to contain no more than one sequencing error per 100,000 base pairs.
"My hope is all the major genome centers and advanced genomics groups use the gradations that fit their needs," said Chris Detter, LANL Genome Science Group Leader and Joint Genome Institute-LANL Center director. "Some centers may want all six, while some may only want three, but as long as they keep them intact, we are in good shape. Then, my hope is that the smaller genomics groups adopt the classes as written to help the rest of the scientific community know what they are generating and submitting."
Posted by: Emily