Sunday, November 16, 2008

Biology for Computer Scientists

Look at yourselves. Much of you is made of proteins. Your skin is largely collagen. Your arteries, lungs, bladder, ligaments, etc. get their stretch from elastin. Your eye color comes from melanin, a pigment that is synthesized by enzymes, which are themselves proteins. Your hair is keratin. Mechanical forces are generated by myosin, kinesin and dynein. Insulin signals cells to regulate their metabolism. Enzymes are proteins that reduce the activation energy of a chemical reaction so that it is feasible under biological conditions.

Proteins are polymers: long chains of small repeating units. Polymers are the same kind of stuff used to make CDs and plastic bags. Unlike CDs and plastic bags, which are made from the same group of atoms repeated over and over, proteins are built from 20 different types of groups of atoms known as amino acids. So a protein is a chain of arbitrary length in which these 20 amino acids appear in a predefined sequence.

Proteins can do so much because of their 3D structure and the particular groups of atoms exposed on it. That 3D structure is determined by the predefined sequence of amino acids.

This predefined sequence of amino acids, which you can think of as the chemical formula of a protein, is defined (encoded) in a molecule called DNA. Ordinary human cells have 46 huge molecules of DNA called chromosomes. Every human cell literally "reads" these molecules like a processor reading instructions and literally "prints" the proteins in a nano 3D printer called the ribosome.

DNA is a different type of polymer, made from sequences of 4 different types of groups of atoms known as nucleotides.

So the obvious question for the computer scientist is how to encode chains of 20 types of amino acids using chains of 4 types of nucleotides. We know how to encode the 10 decimal digits using only the 2 binary digits: 4 bits per decimal digit are enough, because 2^4 = 16 is at least 10. The same principle applies here too. We will need 3 nucleotides of a DNA sequence to encode 1 amino acid of the protein. This is because the number of possible sequences of 3 nucleotides, using 4 types of nucleotides, is 4^3 = 64, which is greater than the 20 amino acids. Using only 2 nucleotides to encode 20 different amino acids won't work, because the number of possible sequences of 2 nucleotides is 4^2 = 16, which is less than the required 20.
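The counting argument above can be checked in a few lines. This is only a sketch; the function name `min_code_length` is made up for illustration and does not come from any library.

```python
def min_code_length(alphabet_size: int, symbols: int) -> int:
    """Return the smallest k such that alphabet_size**k >= symbols."""
    if alphabet_size < 2 or symbols < 1:
        raise ValueError("need at least 2 letters and at least 1 symbol")
    k = 0
    capacity = 1  # number of distinct words of length k
    while capacity < symbols:
        capacity *= alphabet_size
        k += 1
    return k


# 4 nucleotides encoding 20 amino acids
print(min_code_length(4, 20))
# 2 binary digits encoding 10 decimal digits
print(min_code_length(2, 10))
```

With 4 letters, words of length 2 give only 16 combinations, so the loop has to go one step further to reach 64.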

So the cell reads the DNA in groups of three nucleotides called codons, and each codon codes for one amino acid of a protein. Once the protein is made, it folds itself into a 3D structure, which gives it its functionality.
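For the programmers, here is roughly what that lookup looks like as code. Treat it as a simplified sketch: it uses the standard genetic code table with one-letter amino acid names, reads the DNA string from position 0, and stops at the first stop codon (written `*`). A real cell first copies the DNA into messenger RNA and starts reading at a start codon, which this sketch ignores. The names `CODON_TABLE` and `translate` are my own, not from any library.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string, codon by codon, into one-letter amino acids."""
    protein = []
    # Step through the string three letters at a time; leftover letters are ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop
```

Note that 64 codons map onto only 20 amino acids plus a stop signal, so several codons share the same meaning.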

Biologists badly need love and attention from Computer Scientists, because Life is Information Technology. We still don't know the algorithm our universe uses to fold these proteins. However, we do know the algorithm the universe uses for Newtonian physics.

So Biologists want us Computer Scientists to do for them what we did for physicists, so that they can design enzymes to act as medicines and design biochemical pathways that facilitate the production of industrial chemicals and fuels.

We need you!

BTW, this is not entirely science fiction. There are companies like Amyris that do exactly this.
