
Existential Genetics

“Junk DNA” Begins to Find its Purpose in Life with ENCODE

For decades, many geneticists maintained that the majority of our DNA has no real functional purpose. In 1972, geneticist Susumu Ohno coined the term “junk DNA” to refer to non-coding DNA, or DNA between genes that does not code for proteins and, it was assumed, doesn’t send chemical messages anywhere to do anything. Scientists speculated that perhaps all of those additional ACTG sequences were just genetic packing peanuts in which the can-do DNA of genes was nestled.

Until now.

The ENCyclopedia Of DNA Elements (ENCODE) Project has found that what was previously thought to be “junk DNA” serves purposes far beyond that of biological bubble wrap. By the Consortium’s count, about 80% of the genome shows some biochemical activity, which means that DNA formerly relegated to the rubbish bin could be critical to sending messages vital to human life, and to health.

Decoding ENCODE

Launched in September 2003 by the National Human Genome Research Institute (NHGRI), the ENCODE Project is a public research consortium led by NHGRI program staff. It comprises scientists from academia, the federal government, and the private sector, funded through grants awarded under ENCODE-related Requests for Applications (RFAs) issued by NHGRI. (RFAs are formal calls for research in a particular scientific area, issued by one or more of the 27 Institutes and Centers at the National Institutes of Health to fulfill specific program objectives; most often, the soliciting Institute(s) set aside funds in their budgets for RFAs.)

The ENCODE Project aims to identify and describe the elements of the human genome that direct our development and functioning. Essentially, the ENCODE Project extends the what of the Human Genome Project, which catalogued the sequence of the human genome, to help answer questions of how, why, when, and where. As the ENCODE Project Consortium explained in Science in 2004, “Our collective knowledge about putative functional, noncoding elements, which represent the majority of the remaining functional sequences in the human genome, is remarkably underdeveloped at the present time.”

Just as we cannot understand population patterns of health and disease without mountains of data, so goes genetics. And both require analytic lenses through which to make sense of massive amounts of information.

The initial phases of the ENCODE Project were concerned with putting together the right set of tools to illuminate what goes on in the human genome—that is, the critical elements and processes by which ACTG turns into you and me.

ENCODE sought to sort out which existing computational and experimental methods were best suited to probing deeply into a specific swath of the human genome sequence. A companion objective was to develop new high-throughput methods—rapid, automated processing of genetic material enabling millions of analyses—that could be used to identify functional elements of the human genome.

These initial phases brought together both experimental and computational scientists from 35 labs, who provided over 200 data sets focused on a specified target of about 1% of the human genome. If the human genome were a quilt, this would be a tiny square. But, in fact, what these scientists examined consisted of pieces of this tiny square taken from all over the quilt that is our genetic makeup. Roughly half of that 1% came from 14 biologically well-characterized genomic regions selected for this express purpose. The other half came from other, more poorly understood genomic regions, which were selected randomly using a systematic approach.

The explicit goal of these initial phases, as the Consortium explained in Nature in 2007, was “to establish redundancy with respect to the findings represented by different data sets.” In other words, scientists wanted to see if they could take pieces of this quilt produced by a number of different groups, each using different machines, fabric, and thread, and generate a cohesive pattern on which they all agreed.

And that they did.

The results of these initial phases indicated that “junk DNA” has a purpose. And most importantly, these findings renewed the ENCODE Project’s license to explore and quieted some of the skepticism with which the Project was met initially (and continues to meet).

Less than five years after the publication of the pilot results, the ENCODE Project Consortium, which extended the 200 data sets in the initial phases to 1,640, has produced a dizzying array of results presented in 30 papers in three different journals, with the potential to turn some commonly held knowledge on its head, shifting our understanding of how the human genome makes us human, healthy or otherwise.

So just how wrong were we to talk trash about “junk DNA” and what might it mean for population health?

Location, location, location

The spaces between genes—genetic alleyways, as it were—provide homes to “junk DNA.” And while “junk DNA” may reside in genetically sketchy spaces, it is employed. Enhancers, or regulatory DNA bits, and promoters, which set in motion the process by which DNA is transcribed into RNA, occupy those spaces.

Neighbors also include a set of regional managers that stitch together proteins and RNA, forming a regulatory duo in charge of genes that encode proteins. In addition, non-coding DNA provides a context in which new functional molecules can be created, according to Inês Barroso of the Wellcome Trust Sanger Institute, the University of Cambridge Metabolic Research Laboratories, and the NIHR Cambridge Biomedical Research Centre.

Thus, it seems that those genetic alleyways are actually major thoroughfares whose residents aren’t just loitering biologically. And this has potentially important implications for our understanding of the genetic causes of disease.

“Junk DNA” may prove essential to making full sense of the results of genome-wide association studies (GWAS), which connect traits and diseases with DNA sequence variations. The bulk of the variants that GWAS have linked to human diseases lie in non-coding regions. Because those variants do not sit in a protein-coding sequence, it has been hard to work out which gene X they act on and how that relates to trait or disease Y.

As a result, “junk DNA” may offer “new leads for linking genetic variation and disease,” writes Joseph Ecker, of the Howard Hughes Medical Institute and the Salk Institute for Biological Studies. And as Barroso explains, ENCODE “demonstrates that non-coding regions must be considered when interpreting GWAS results, and [the ENCODE research] provides a strong motivation for reinterpreting previous GWAS findings.”

Get the message?

One set of findings from the ENCODE Project shows that approximately 75% of the human genome is eventually transcribed in one cell type or another. Furthermore, it shows that genes themselves are laced with other message-sending genetic material that may be essential to human development and functioning.

This could be key to understanding when, how, and why healthy cells might go awry. For example, if we can better understand the kinds of things that can go wrong with the intracellular transcription of genetic material, as well as the content generated by the material around genes, perhaps we can get a better handle on the genetic origins of disease. (Maybe this is akin to genetic peer pressure: if genetic material surrounding genes is sending negative messages, then good genes might be led astray.)

The findings from the ENCODE Project are certainly pertinent to the approach scientists currently take—or will in the future—to elucidating connections between genes and diseases. More fundamentally, however, they “force a rethink of the definition of a gene and of the minimum unit of heredity,” according to Ecker.

The ENCODE Project is massive in scope, and has many threads that we can only begin to follow. It is part of a grand arc of progress in the biological sciences ushered in by the discovery of the double helix and buttressed at the beginning of this millennium by the Human Genome Project. In essence, the ENCODE Project is now to genetics what the Large Hadron Collider is to physics. (Except ENCODE does not pose potential risks of warping our space-time continuum. Advantage: ENCODE.)

Only time will tell if the ENCODE Project is capable of delivering information that can translate into better treatments for, or the prevention of, disease. But for now, “junk DNA” seems to have begun to find its purpose in life.

Edited by Abdul El-Sayed.

 

The views and opinions expressed on this website are solely those of the authors and do not represent those of the Department of Epidemiology, the Mailman School of Public Health, or Columbia University.