News

A new twist on data storage

By Mollie Rappe

Photography By Lonnie Anderson

Thursday, July 07, 2016

Sandia saving sensitive information securely in DNA

SANDIA BIOENGINEERS Marlene and George Bachand show off their new method for encrypting and storing sensitive information in DNA. Digital data storage degrades and can become obsolete and old-school books and paper require lots of space.  (Photo by Lonnie Anderson)

Z machine, just one of Sandia’s research and test facilities, generates 100-200 gigabytes of data per year. That is a lot of digital data to inscribe on hard drives or beam up to the “cloud.”

George Bachand (1132), a bioengineer at the Center for Integrated Nanotechnology, is exploring a better, more permanent method for encrypting and storing classified data: DNA. Compared to digital and analog information storage, DNA is more compact, more durable, and never becomes obsolete. Readable DNA was extracted from the 600,000-year-old remains of a horse found in the Yukon.

Seven miles of bookshelves

Tape- and disk-based data storage degrades and can become obsolete, requiring rewriting every decade or so. Cloud- or server-based storage requires a vast amount of electricity; in 2011 Google’s server farms used enough electricity to power 200,000 US homes. Furthermore, old-school methods require lots and lots of space. IBM estimated that 1,000 gigabytes of information in book form would take up seven miles of bookshelves. In fact, Sandia recently completed a 15,000-square-foot building to store 35,000 boxes of inactive records and archival documents.

“Historically, the national laboratories and the US government have a lot of highly secure information that they need to store long-term. I see this as a potentially robust way of storing classified information in the future to preserve it for multiple generations,” says George. “The key is how do you go from text to DNA and do that in a way that is safe and secure.”

George was inspired by the recording of all of Shakespeare’s sonnets into 2.5 million base pairs of DNA — about half the genome of the tiny E. coli bacterium. Using this method, the group at the European Bioinformatics Institute could theoretically store 2.2 petabytes of information — 200 times the printed material in the Library of Congress — in one gram of DNA.

Marlene Bachand (1132), a biological engineer at Sandia and George’s wife, adds, “We are taking advantage of a biological component, DNA, and using its unique ability to encode huge amounts of data in an extremely small volume to develop DNA constructs that can be used to transmit and store vast amounts of encrypted data for security purposes.”

The Bachands’ project, funded by Sandia’s Laboratory Directed Research & Development program, has successfully moved from the drawing board to letterhead. Using a practically unbreakable encryption key, the team has encoded an abridged version of the famous Truman letter establishing Sandia into DNA. They then made the DNA, spotted it onto Sandia letterhead, and mailed it — along with a conventional letter — around the country. After the letter’s cross-country trip, the team was able to extract the DNA out of the paper, amplify and sequence the DNA, and decode the message in about 24 hours at a cost of about $45.

Text to DNA and back again

To achieve this proof-of-principle, the first step was to develop the software to generate the encryption key and encrypt text into a DNA sequence. Andrew Gomez worked on this while he was an intern at Sandia; he is now at Senior

Scientific, a nanomedicine company at the University of New Mexico’s Science and Technology Park.

DNA is made up of four bases, commonly referred to by their one-letter abbreviations: A, C, G, and T. Using a three-base code, exactly how living organisms store their information, 64 distinct characters, letters, spaces, and punctuation, can be encoded, with room for redundancy.

For example, spaces make up on average 15 to 20 percent of the characters in a text document, an encryption key could specify that TAG, TAA, and TGA each code for “space” while GAA and CTC could code for “E”. This would reduce the amount of repetition — technically challenging for making and reading DNA — and make brute-force hacking more difficult.

The team’s first test was to encode a 180-character message, about the size of a tweet. Encoding the message into 550 bases was easy; actually making the DNA was hard.

“Our initial approach was very expensive, very time-consuming, and didn’t work,” says George with a chuckle. However, “there’s a new technology that’s come out and made the ability to take synthetic DNA, what are called gene blocks, and stitch them together into these artificial chromosomes. These changes have just happened within the last few years, which has made it pretty extraordinary. Now it is possible to readily make these gene blocks right on the bench top and it can be done in large, production-scale pretty quickly.”

Identifying potential national security applications

Since successfully encoding, making, reading, and decoding the 180-character message and the 700-character Truman letter, George and Marlene are now working on even longer test sequences. However, what the Bachands really want to do is move beyond tests and apply their technique to national security problems.

 “We have achieved the proof-of-principle. Yes, it is possible. Now the big challenge for us is identifying the potential applications,” says George. “Using DNA to store information is pretty cool, it’s science-fiction-y, but the real question is it really good for anything? Can it really supplant any of the current technology and where we’re headed in the future?”

Two possible applications the team has identified are storing historical classified documents and barcoding/watermarking electromechanical components, such as computer chips made in MESA, Sandia’s DoD-certified fabrication facility, prior to storage.

George imagines encoding each component’s history — when it was manufactured, the lot number, starting material, even the results of reliability tests — into DNA and spotting it onto the actual chip. Instead of having to find the serial number and look up that metadata in a digital or paper-based database, future engineers could swab the chip itself, sequence the DNA, and get that information in a practically tamper-proof manner.

Recoverable for 100s of years

To test the feasibility, Marlene spotted lab equipment with a test message, and was able recover and decode the message, even after months of daily use and routine cleaning. DNA spotted onto electronic components and stored in cool, dark environments could be recoverable for hundreds of years.

Another, more straightforward application for the Bachands’ DNA storage method would be for historical or rarely accessed classified documents. DNA requires much less maintenance than disk- or tape-based storage and doesn’t need lots of electricity or tons of space like cloud- or paper-based storage. But conversion of paper documents into DNA requires the “cumbersome” process of scanning, encrypting, then synthesizing the DNA, admits George. Making the DNA is the most expensive part of the process, but the cost has decreased substantially over the past few years and should continue to drop.

“I hope this project progresses and expands the biological scope and nature of projects here at Sandia. I believe the field of biomimicry has no boundaries. Given all of the issues with broken encryption and data breaches, this technology could potentially provide a path to address these timely and ever-increasing security problems,” says Marlene.

Home page thumbnail images credit: Christoph Bock (Max Planck Institute for Informatics)