Hacking DNA. No, really.

Apr 02, 2016

Last year a new genetic engineering technology called CRISPR - Clustered Regularly Interspaced Short Palindromic Repeats - showed up on my radar at a local conference. Long story short, CRISPR is a highly precise technique for editing DNA in situ which follows from the discovery of short sequences of DNA which allow for precise location of individual genes. It's a fascinating technology; there are even tutorials (archived copy, just in case) online for developing your own guide RNA to implement CRISPR/Cas9. What you might not have known is that CRISPR/Cas9 is being actively studied as a therapeutic technique in humans due to the amazing amount of success it's shown in modifying the genomes of other forms of life. Earlier this year at Temple University in Philadelphia, Pennsylvania, molecular biologists successfully used the technique to hack the DNA of cultured human T-cells in vitro that were infected with HIV and delete the HIV DNA entirely. Moreover, when re-exposed to HIV the hacked T-cells were observed to show immunity to the virus. Further observation of the cells after they'd been modified showed that no adverse effects were introduced - the cells were healthy, happy, and just as effective post-CRISPR/Cas9 modification as pre-infection with HIV. The research team's peer-reviewed findings were published in the journal Nature in February of 2016, and the paper went open access online in March of this year.

For many years, computer geeks like myself have best understood DNA through the lens of a programming language. It uses a base-4 code with four nucleotides instead of quaternary digits - adenine, cytosine, guanine, and thymine. Nucleotides are read in DNA as codons, or triplets of nucleotides (for example, AAG or TTC). Of the 64 possible codons, 61 correspond to one of the 20 amino acids and the remaining three are stop signals which cellular machinery responds to. A length of DNA is a string of codons that represent sequences of amino acids that can be assembled into proteins during the process of translation inside of the cell's cytoplasm. When DNA is interpreted to construct proteins it's uncoiled and unzipped into two separate strands to manufacture messenger RNA in a process called transcription. The DNA rezips and repacks itself for later while the mRNA is used in the manufacture of a protein when it's run through an organelle called a ribosome which passes the mRNA piece by piece through itself, sticking free-floating amino acids against the strip of bases in sequence at a rate of about two hundred amino acids or approximately six hundred mRNA bases per minute. The amino acids hook end-to-end with each other in a long chain that coils itself up into a completed protein. This animation is an awesome (in the original definition of the word) depiction of this essential process.
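To make the "DNA as code" idea a little more concrete, here is a minimal Python sketch of the codon lookup described above. The table is the standard genetic code; the names CODON_TABLE and translate are my own, and the function is a toy that assumes it's handed the coding strand already lined up on a codon boundary. It ignores everything a real cell does around transcription.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest).
# '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to a one-letter amino acid code or '*'.
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read a coding-strand DNA string three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3].upper()]
        if amino == "*":  # stop signal: the protein is finished
            break
        protein.append(amino)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGAAGTTCTAA"))  # ATG AAG TTC TAA
```

Counting the entries in CODON_TABLE gives the 61 + 3 split: 61 codons map onto 20 distinct amino acids and three map to the stop marker.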

In introductory computer science we often speak of a Turing machine, a hypothetical device which is simple in structure: There is a readable, writeable tape of arbitrary (usually infinite) length divided into discrete cells, a read head which looks at the contents of a cell on the tape beneath it, a write head which can either change or delete the contents of a cell on the tape beneath it, a motor which can move the tape left and right, and a lookup table that maps symbols on the tape to things the machine can be told to do, which includes not only operating parts of the Turing machine's mechanism but how and when to change the symbols on the tape (when to change a 1 to a 0; when to change an 'a' to a 'b'; when to erase a cell; and so forth). There are some variants of the Turing machine which are capable of using more than one tape, all of which can move independently of one another. If that's tricky to imagine... it is. Here's a brief video of how one kind of Turing machine can operate to solve a simple math problem, with a visual demonstration of what it would actually do. There is also a class of Turing machine called a universal Turing machine which is capable of computing anything that can be computed (the proof is called the UTM theorem if you want to dig more deeply into it). To put it another way, if there is a mathematical problem that can be solved, a UTM can run an algorithm (a program, for our purposes) to solve it. It's also worth noting that Marvin Minsky discovered a universal Turing machine that has seven internal states, uses four symbols, and works by simulating a 2-tag system. In a tag system the tape is a first-in-first-out queue: the machine reads a symbol off of the front of the tape, deletes a fixed number of symbols from the front (two, in a 2-tag system), and appends one or more symbols chosen by the symbol it just read to the end.
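If the description of the tape, the head, and the lookup table is easier to follow as code, here is a small single-tape simulator in Python. Everything in it (the run function, the INCREMENT table, the state names) is made up for illustration, and it moves the head instead of the tape, which amounts to the same thing. The example table adds one to a binary number.

```python
from collections import defaultdict

BLANK = "_"

# Lookup table: (state, symbol read) -> (symbol to write, head move, next state)
# This hypothetical table adds 1 to a binary number written on the tape.
INCREMENT = {
    ("right", "0"): ("0", +1, "right"),      # scan right to the end of the number
    ("right", "1"): ("1", +1, "right"),
    ("right", BLANK): (BLANK, -1, "carry"),  # ran off the end; step back
    ("carry", "1"): ("0", -1, "carry"),      # 1 + 1 = 0, carry to the left
    ("carry", "0"): ("1", 0, "halt"),        # absorb the carry and stop
    ("carry", BLANK): ("1", 0, "halt"),      # number grew by one digit
}


def run(table, tape_input, state="right", max_steps=10_000):
    """Run the machine until it reaches 'halt'; return the non-blank tape contents."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    cells = [tape[i] for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)


if __name__ == "__main__":
    print(run(INCREMENT, "1011"))  # binary 11 plus 1
```

Swapping in a different table changes what the machine computes without touching run at all, which is the whole point: the "program" lives in the lookup table.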

Sound familiar?

A research team spanning MIT, Boston University, and NIST developed a programming language that lets the user design custom DNA which acts like circuitry that reprograms living cells in vivo. The programming language, called Cello (which is available on GitHub!), is based upon another language called Verilog, which is a hardware description language used to describe electronic circuitry, like the guts of processor cores. Right now Cello offers primitives (the most basic building blocks of a programming language) that implement 14 kinds of logic gates, along with sensors that can detect things in the cell's environment (such as light, temperature, and the presence or absence of various chemicals). The programming language was designed to be flexible enough that users can add their own primitives to it, extending the usability of the language; the language was also designed so that the DNA sequences its compiler generates won't muck up the cell when they're spliced into the cell's genome. During testing the research team wrote programs that implement 60 different circuits, three quarters of which worked the first time they were tried (isn't that a sign of having reached enlightenment?) after being spliced into the DNA of cultures of E. coli, a species of bacteria commonly used in laboratory experiments. Right now Cello is optimized for E. coli but it's being ported to other bacteria as I write this. Their work passed peer review and was published in the journal Science, volume 352, issue 6281. Here's the abstract of their paper; you'll have to have a current subscription to their website to read the entire thing. What this means is that it has been demonstrated to be feasible to modify bacterial cells (for now) to carry out additional tasks that they are not presently capable of, such as simple logical operations ("if input A is highest, do X; if input B is highest, do Y; if input C is highest, do Z; otherwise don't respond"). This strongly implies that more complex operations could be spliced into the DNA of cells which would then carry them out autonomously.
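For a feel of what that kind of logic looks like when it's built out of gates, here is a rough Python sketch of the "if A, do X; else if B, do Y; else if C, do Z" example. To be clear about the assumptions: this is not Cello or Verilog syntax, the function names are invented, I'm reading "highest" as "highest-priority input that is switched on", and building everything out of NOR gates is my own choice for the illustration.

```python
def NOR(*inputs):
    """True only when every input is off."""
    return not any(inputs)


def NOT(x):
    """A NOT gate is just a NOR gate with a single input."""
    return NOR(x)


def priority_circuit(a, b, c):
    """Return the three output lines (x, y, z) of a hypothetical priority circuit."""
    x = NOT(NOT(a))        # A on                 -> X
    y = NOR(a, NOT(b))     # B on, A off          -> Y
    z = NOR(a, b, NOT(c))  # C on, A and B off    -> Z
    return x, y, z


def respond(a, b, c):
    """Name the response the circuit selects, or None if no input is on."""
    for name, active in zip("XYZ", priority_circuit(a, b, c)):
        if active:
            return name
    return None


if __name__ == "__main__":
    for inputs in [(True, True, False), (False, True, True),
                   (False, False, True), (False, False, False)]:
        print(inputs, "->", respond(*inputs))
```

At most one output line is ever on at a time, which is what makes the "otherwise don't respond" case fall out for free when all three inputs are off.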

Also, in case you're curious, it is possible to comment that executable DNA using human-readable ASCII text so that it can be read and maintained readily by somebody else later.