An interesting method of data visualisation.

08 February 2007

Data visualisation is a process in which the contents of a given data set are displayed in a graphical format to help the analyst find patterns or anomalies in the data. For example, staring at system logs for a couple of hours is enough to put your mind on autopilot: you'll keep staring and hitting the page down key every once in a while, but your conscious mind doesn't really register the data that your eyes are sending to your brain. Unless there is something unmistakably wrong, even the pattern recognition functions of the brain will be bored to tears, and will probably miss the little variations and oddities that often crop up in a data set. Also, there are some patterns that just won't appear unless you're looking at a very large portion of the data set all at once. A good example of this phenomenon is that there are patterns that arise in large volumes of data encrypted with the RC4 algorithm that won't appear unless you look at something like one gigabyte of data at a time. That sure won't fit on the screen all at once. Not in its usual format, anyway.

As the word 'visualisation' implies, data visualisation is a set of processes which convert data into a visual format, like a line graph, a bar chart, or a map, so that your brain can get the 'big picture' of what's going on and hopefully anything amiss in the data will leap out at you.

Two geneticists have worked up a visualisation method for DNA called DNA rainbows, in which they assign a colour to each of the four bases of DNA (adenine, guanine, cytosine, and thymine) that encode data in the DNA strand. Different combinations and orientations of these four bases make up a base-four information system (the base pair made up of adenine and thymine can occur in two orientations in DNA, as can the pair of guanine and cytosine; however, adenine cannot bond to cytosine or guanine, nor can thymine - these are the only two possible pairings). After writing the visualisation software they fed samples of human DNA into their programme and turned the jobs loose for a while.
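For the curious, the basic idea of one colour per base is easy to sketch. What follows is a minimal illustration in Python, not the geneticists' actual software: the palette, the row width, the function names, and the choice of the plain-text PPM image format are all my own assumptions.

```python
# A rough sketch of "one colour per base". The palette below is hypothetical;
# it is NOT the colour scheme used by the DNA rainbows project.
BASE_COLOURS = {
    "A": (0, 0, 255),    # adenine  -> blue
    "C": (255, 0, 0),    # cytosine -> red
    "G": (0, 255, 0),    # guanine  -> green
    "T": (255, 255, 0),  # thymine  -> yellow
}
UNKNOWN = (0, 0, 0)      # anything else (e.g. 'N') and padding -> black


def sequence_to_pixels(sequence, width):
    """Turn a string of bases into rows of RGB tuples, `width` pixels per row."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = [BASE_COLOURS.get(base, UNKNOWN) for base in sequence.upper()]
    # Pad the final row so that every row has the same length.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([UNKNOWN] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def pixels_to_ppm(rows):
    """Serialise rows of RGB tuples as a plain-text PPM (P3) image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    lines = ["P3", f"{width} {height}", "255"]
    for row in rows:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = "ACGTACGTTTGACA"  # made-up sequence, purely for illustration
    with open("rainbow.ppm", "w") as handle:
        handle.write(pixels_to_ppm(sequence_to_pixels(demo, width=4)))
```

Running it writes a tiny image file called rainbow.ppm that most image viewers can open. With a real chromosome you would want a much wider row, and the row width matters: repeats in the sequence only line up into visible stripes or curves when the width happens to suit them.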

What they found was that their visualisation technique generates images very similar to autostereograms (magic eye pictures), which were popular a couple of years ago. If you look at the images they have up on the website (and there are a lot of them - one for every human chromosome, and a couple of interesting genes on each chromosome) you can see that there are subtle patterns in the nucleotides. There are rainbows (hence, the name of the site), circles, patterns of dots, runs of one colour, runs of two colours...

Somehow, I doubt that the patterns spell out "Sorry for the inconvenience," but I might be wrong.

Also of interest in this vein is DNA as music, which wasn't really well known until the band the Shamen released the album Axis Mutatis, which included such a track ("S2 Translation").