(Professor Shawn Lawson wrote this excellent post for The Approach – enjoy!)
I’m Shawn Lawson, an assistant professor of computer visualization in the Department of Arts. During 2008, I was in an artist residency at the Center for Biotechnology and Interdisciplinary Studies here at Rensselaer Polytechnic Institute. The results of the residency echoed other research on converting text into amino acids. Protein synthesis became too costly, so I stayed within the computational realm.
The idea for this Biocryptography project came from thinking about how science and art can be opposite sides of the same coin. Both are observing and trying to make sense of their environs. When I found the Kraus quotation, “Science is spectral analysis. Art is light synthesis,” I started to see how this concept was not too far-fetched.
I reflected on how the four nucleotides, t, c, a, and g, are symbolic letters that, when combined, create a language for life. This led me to research the origins of human language, because language also uses letters to create words, sentences, and paragraphs. I discovered that hieroglyphic and cuneiform writing systems are both logographic and alphabetic. This means that they have both symbols that represent words and symbols that represent letters. Then I found that Phoenician is the earliest alphabet in which each symbol represents one sound, meaning that the logographs had been removed. Also, it is believed that Phoenician is the mother alphabet that has trickled out to the majority of the world’s alphabets. Working with the origin of alphabets in conjunction with the nucleotides of life became the obvious choice.
Using a process called transliteration, I converted the Roman alphabet back in time to the Phoenician alphabet. The Phoenician letters that had no Roman equivalent were dropped. After matching Roman to Phoenician, the remaining Roman letters were dropped. I realized at that moment that dropping some of the Roman letters could cause trouble for anyone needing a lossless transfer. It worked for my purposes, though, and I found the loss of information an interesting mutation of the original text.
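For readers who think in code, here is a minimal Python sketch of that lossy round trip. The mapping is an assumption for illustration only: it covers a handful of Roman letters paired with historically related Phoenician letters (as Unicode code points), and it is not the full chart used in the project. The names `ROMAN_TO_PHOENICIAN`, `transliterate`, and `round_trip` are hypothetical.

```python
# Illustrative subset only: historically related letter pairs,
# NOT the complete chart used in the project.
ROMAN_TO_PHOENICIAN = {
    "a": "\U00010900",  # alf
    "b": "\U00010901",  # bet
    "d": "\U00010903",  # delt
    "k": "\U0001090A",  # kaf
    "l": "\U0001090B",  # lamd
    "m": "\U0001090C",  # mem
    "n": "\U0001090D",  # nun
    "t": "\U00010915",  # tau
}
PHOENICIAN_TO_ROMAN = {p: r for r, p in ROMAN_TO_PHOENICIAN.items()}


def transliterate(text, table):
    """Map each character through `table`; keep spaces, drop anything unmapped."""
    out = []
    for ch in text.lower():
        if ch == " ":
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
        # Any other character has no counterpart and is silently lost.
    return "".join(out)


def round_trip(text):
    """Roman -> Phoenician -> reduced Roman alphabet."""
    phoenician = transliterate(text, ROMAN_TO_PHOENICIAN)
    return transliterate(phoenician, PHOENICIAN_TO_ROMAN)
```

With this partial table, a phrase built only from mapped letters survives the round trip unchanged, while any letter outside the table disappears, which is the "mutation" described above.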
Let’s look at the process. The image below is a chart showing the Roman alphabet mapped onto the Phoenician alphabet. The first two columns list the respective alphabets. The third column represents the Phoenician letters that are directly related to individual Roman letters. Three letters are dropped because they either are compound sounds or have no representation in the Roman alphabet. The last column shows the Roman letters matched to their historical Phoenician equivalent.
The image below is a chart showing nucleotides mapped into amino acid abbreviations. The right-most column lists the available nucleotides. These nucleotides must be used in triples when making a protein; the triples are shown in the next column to the left, Codons. Of these 64 possible codons, several are interchangeable with each other. The interchangeable sets are grouped and given amino acid names, seen in the next column to the left. The left-most column represents the abbreviations of these amino acids. Note: the amino acid M (Methionine) is used only as a start codon here, even though it can appear in positions other than the beginning. Therefore it is not listed above as part of the cypher.
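The grouping of codons into interchangeable sets can be sketched in a few lines of Python. This sketch assumes the standard genetic code, written as one amino acid letter per codon with the bases ordered t, c, a, g; the chart in the image may be laid out differently. The names `CODON_TO_AMINO` and `interchangeable_sets` are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "tcag"  # same nucleotide order as in the text

# Standard genetic code, one letter per codon in the order
# ttt, ttc, tta, ttg, tct, ... ; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 * 4 * 4 = 64 triples, each paired with its amino acid letter.
CODON_TO_AMINO = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def interchangeable_sets(table):
    """Group codons that code for the same amino acid."""
    groups = defaultdict(list)
    for codon, amino in table.items():
        groups[amino].append(codon)
    return dict(groups)


GROUPS = interchangeable_sets(CODON_TO_AMINO)
```

Here `GROUPS["L"]` holds the six codons for Leucine, while `GROUPS["M"]` and `GROUPS["W"]` each hold a single codon, so those two amino acids have no interchangeable alternatives.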
This final chart, below, re-orders each set of letters or amino acids based on their frequency in English or vertebrates respectively. Here the lists are in descending order with their frequency values in white. Now that the lists are ordered, a simple substitution cypher is used to translate English to amino acid: each letter is replaced by the amino acid at the same position in its list. For example: the word ‘eat’ would become ‘SAL’ in amino acids.
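A rank-based substitution like this one is short to write down. The sketch below assumes the cypher simply pairs the n-th most frequent letter with the n-th most frequent amino acid. The three-entry ranking is hypothetical, chosen only because it reproduces the ‘eat’ to ‘SAL’ example; the full ordering lives in the chart. The function names are likewise illustrative.

```python
def build_rank_cypher(letters_by_frequency, amino_acids_by_frequency):
    """Pair the n-th most frequent letter with the n-th most frequent amino acid."""
    return dict(zip(letters_by_frequency, amino_acids_by_frequency))


def encode(text, cypher):
    """Substitute each letter; characters without an entry are dropped."""
    return "".join(cypher[ch] for ch in text.lower() if ch in cypher)


# Hypothetical top-three rankings, consistent with 'eat' -> 'SAL' only.
LETTERS_BY_FREQUENCY = "eta"
AMINO_ACIDS_BY_FREQUENCY = "SLA"

CYPHER = build_rank_cypher(LETTERS_BY_FREQUENCY, AMINO_ACIDS_BY_FREQUENCY)
encoded = encode("eat", CYPHER)
```

Because the table is a plain dictionary, decoding is just the inverse dictionary, as long as every amino acid is assigned to exactly one letter.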
Now, let’s take the Karl Kraus quote through the process. In the image below, the first line is our source text. It is transliterated to create the second line in Phoenician. Hint: Phoenician is read from right to left. In the third line, the Phoenician is transliterated back into the reduced Roman alphabet. In our case, we didn’t lose any letters. The spaces are removed, as seen in row four. The last row is the amino acid sequence after being cyphered. The amino acid ‘M’ is required at the beginning of every protein sequence and ‘W’ at the end. The sequence is now in FASTA format, which was used to determine the overall shape and helix or beta locations.
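The last two steps, adding the ‘M’ and ‘W’ markers and writing the result as FASTA, can be sketched as follows. FASTA is a plain-text format: a description line starting with ‘>’ followed by the sequence, usually wrapped to a fixed width. The function names, the description text, and the width of 60 are assumptions for illustration, not the settings used in the project.

```python
def add_markers(cyphered, start="M", end="W"):
    """Put the start and end amino acids around the cyphered sequence."""
    return f"{start}{cyphered}{end}"


def to_fasta(sequence, description, width=60):
    """Format a sequence as a FASTA record: '>' header, then wrapped lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{description}"] + lines) + "\n"


# 'SAL' is the cyphered form of 'eat' from the earlier example.
record = to_fasta(add_markers("SAL"), "hypothetical example record")
```

A record built this way is the kind of text file that protein structure prediction tools commonly accept as input.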
I used a tertiary-level protein folding prediction algorithm on a supercomputer to find out what the sculptural form of my amino acid sequence might be. This gave me an ASCII-formatted protein structure file. I was able to convert this into a 3D model and render out an image. See below.
After figuring out a cypher for text, I started thinking about how other artists have designed cyphers for image data. I decided to create a cypher for three-dimensional data. This simple cypher links simple three-dimensional data to amino acid bases. See the green block in the image below for the cypher.
However, the supercomputer prediction algorithms didn’t like my amino acid sequence and refused to fold it. For those looking to decode the 3D animation sequence, the sequence can be found here.
This method of biocryptography is not realistic for precise use. My primary intent was to create ideas and discussion about biology, linguistics, and information. What would happen if we took this method of biocryptography in reverse? Could a genome hide the next great works of literature?
Many thanks to the following people for helping me make sense of the madness: Shekhar Garde, Sapna Sarupria, Philip Shemella, Glenn Monastersky, and Daniela Kostova.