Showing posts with label Edwin Jose Palathinkal. Show all posts
Showing posts with label Edwin Jose Palathinkal. Show all posts

Monday, November 17, 2008

My Favorite XKCD Strips

This strip was about the limitations of complexity analysis. If something is unsolvable like TSP, it just means it is unsolvable on a Turing machine. There are other paradigms to try.

Here is another:


This is not technically impossible. It is about a cellular automata called Rule 34 which has been shown to be Turing Complete. i.e. it can simulate everything a gerneral purpose computer can do.

Sunday, November 16, 2008

Biology for Computer Scientists

Look at yourselves. You are mostly made of proteins. Your skin is made of collagen. Your arteries, lungs, bladder, ligaments etc are made from elastin. Your eye color is due to a protein . Your hair is keratin. Mechanical forces are generated using myosin, kinesin and dynein. Insulin signals cell metabolism. Enzymes are proteins that reduce the activation energy of a chemical reaction so that it is feasible in biological conditions.

Proteins are made of long strings of polymers. Polymers are the same stuff used to make CDs and plastic bags. Unlike CDs and plastic bags which are made from the same repeating group of atoms, proteins are made from 20 different types of groups of atoms known as amino acids. So there are 20 different amino acids repeating in predefined sequence of arbitrary length.

Proteins do so much by relying on their 3D structure and certain groups of atoms on them. The 3D structure is entirely dependent on the predefined sequence of amino acids of arbitrary length.

This predefined sequence of amino acids or better known as the chemical formula of a protein is defined (encoded) in a molecule called the DNA. Ordinary human cells have 46 huge molecules of DNA called chromosomes. Every human cell literally "reads" these molecules like a processor reading instructions and literally "prints" the proteins in a nano 3D printer called the ribosome.

DNA are a different type of polymers made from repeating sequences of 4 different types of groups of atoms known as the nucleotides.

So the obvious question for the computer scientist is how to encode for chains of 20 types of amino acids using chains of 4 types of nucleotides. We know how to encode for 10 decimal digits using 2 binary digits. The same principle applies her too. We will need 3 nucleotide pairs of a DNA sequence to encode for 1 amino acid of the protein. This is because the number of possible permutations of 3 nucleotides of a DNA using 4 types of nucleotides is 43=64 which is greater than 20 amino acids. Using every 2 nucleotides of a sequence to encode for 20 different amino acids won't work because the number of possible permutations of 2 nucleotides of a DNA using 4 types of nucleotides is 42=16 which is less than the required 20.

So the cell reads the DNA in groups of threes called codons. And each codon codes for one amino acid of a protein. Once the protein is made it folds itself into a 3D structure which lends it its functionality.

Biologists really badly need love and attention from Computer Scientists because Life is Information Technology. We still don't know the algorithm our universe uses to fold these proteins.However we know the algorithm the universe uses for Newtonian physics.

So Biologists want us Computer Scientists to do for them what we did for physicists, so that they can design enzymes to act as medicines, design biochemical pathways that facilitate the production of industrial chemicals and fuels.

We need you!

BTW, This is not entirely science fiction. There are companies like Amyris that do exactly this.

Will it halt?

This is perhaps the most important thing we should like to know about any process. OK I agree it doesn't seem like an important thing to know. What's the big deal? So lets just say, I have it with me: I have an algorithm to determine whether an algorithm will halt or not and it is always right. I call it h(a,i). It takes any algorithm a as it's first input and an input string i, algorithm a will work on, as its second input. It i.e. h(a,i) returns "will halt" or "will not halt" as its output.

Imagine what I could do with this h(a,i)! I can prove the Goldbach's conjecture and get the Field's Medal with it. More importantly, I can prove everything and win all the field's medals forever. ROFL!

OK here is how I will do it. But first, Goldbach's conjecture:
Every even integer greater than 2 can be written as the sum of two primes.
Nobody has proved it as such. Distributed supercomputers have verified this to be true until 1018. But hey! I have h(a,i)! So I write a program p(i) that loops over all possible even integers greater than i and halts whenever it finds an even number that cannot be written as sum of two primes. Now the best part is I don't even have to run p till the end of time. All I have to do is run h(p, 2). If it returns "will not halt", then Goldbach's conjecture is true or else it is false! What more evidence do we need than to know all even numbers greater than two can be written as sum of two primes? LOL!

Similarly, I could use h(a,i) to prove everything and anything!

P.S: Sorry to get your hopes up. This article was about the well known Halting Problem. And Alan Turing proved a perfect h(a,i) that computes correct answers for all possible values of a and i are impossible. He used a variant of Cantor's Diagonalization Argument to prove it. However a lot of partial versions h(a,i) that compute on a subset of a and i are possible. That is why automatic theorem provers exist. So tough luck guys! There are no easy paths to win Field's Medal :(

P.P.S: The Matrix Trilogy explores the philosophical implications of the halting problem. In it Oracle is like h(a,i), in that it is designed to predict the behavioral outcome of any process (e.g. a human mind, a set of events). It is right most of the time. However Neo is a special process, in that it tries to create a final outcome that would prove Oracle's prior prediction to be wrong. This is evident in the final conflict in Matrix Revolutions where Agent Smith gets the eyes of The Oracle and predicts Neo would give up in the end, but Neo simply chooses to contradict by not giving up. 

In fact this story is so close to the Halting Problem that Alan Turing's proof is all about proving the existence of algorithm "Neo" for all h(a,i). The existence of algorithm "Neo" also hints on the possibility of generating freewill within an algorithm. All one would need is a genetic algorithm that all become a better approximation of h(a,i) in each generation and an algorithm "Neo" to contradict it.

Fascinating! Isn't it?

Saturday, November 15, 2008

Ways to get caught: Buffer Overflows

Buffer Overflows are a class of techniques used by bad guys to do everything they want. I attempt to give a simplified description of how it works.

In native languages like C and C++, buffers a.k.a arrays can be filled with more data than they are made to hold. (This is not strictly true. Modern OSes and compilers have ways to check this even for native code).

If the array is declared as a local array with a fixed size, just like any other local variable, a space equal to the size of the array is pushed into the stack. This is nothing new. All local variables are allocated on the stack segment. The compiler generated machine code automatically pushes all local variables onto the stack segment.

Did I say segment? Yes. For the uninitiated, let me explain: Every executable in most OSes are organized into segments during runtime. There is the data segment, code segment, stack segment and heap segment. Stack and heap segments may share the same memory space and they grow into each other. Keep in mind that stack and heap segments are not the data structures we studied in CS1102, we are talking about sections of memory allocated for an executable during runtime by the OS.

OK so back to local variables being pushed onto the machine stack. There is one other thing that get pushed onto the stack besides local variables: Return Addresses! The stack stores resturn addresses for the processor to return to after a function call. The funny thing is that even main() has a return address.

So here is what the bad guys do; Since most input to any program, be it a web server or a nuclear detonation device, is stored in arrays (remember char** argv?) the bad guys will give the program specially designed input values which overflow a local buffer and overwrite the return address of the current function to point to the address of the beginning of the array. (There are some technnicalities involved like NOP sleds because we do not know the exact address of the beginning of the array. I do not cover these). So the processor returns to the begnning of the array to execute whatever is written there. And guess what? The array contains the machine code bad guys want to execute on your system.

Lets say the bad guys targeted a web server that was running with administrator privileges. Then they would be able to execute code only an administrator would be allowed to execute. Thus buffer overflows could potentially give unlimited power for the attacker over the system.

Believe me, there are ways to upload entire VNC Server DLLs into certain Windows machines because of a buffer overflow in one of the network services a Windows machine opens for the whole world to connect to. And the attacker would be able to hijack your mouse and screen just like in the movies.

Fortunately, with the introduction of the NX bit most buffer overflows are impossible because the NX bit can be enabled to make a stack segment non-executable.

Ways to get caught: SQL Injections

If you have taken CS2102, or wrote an application that uses a relational database to authenticate users, you might have come across code like this:

SELECT * FROM USERS WHERE USERNAME='[username]' AND PASSWORD='[password]'

..where [username] is usually a variable that holds the username the user inputed into you application and [password] is usually the variable that holds the password the user inputed into your application. There may be string concatenation operators that make this possible in your favorite web application development language.

Assuming that the user uses the inputs directly in the SQL query without manipulation there are ways to break into such a system. And SQL Injection is the way. Here is how you do it.

Let us substitute the following input into the above query:
  • [username] is ' OR 'x'='x
  • [password] is ' OR 'x'='x

SELECT * FROM USERS WHERE USERNAME='' OR 'x'='x' AND PASSWORD='' OR 'x'='x'

As you can see this query will return some user from the database and the remaining application logic will authenticate you as that user.

This is a well known technique therefore it is very unusual to find web applications that have this vulnerability. However my polytechnic hired me after I showed them this vulnerability in one of their intranet web applications. So keep your eyes open, and report all such holes to NUS and be rewarded for your effort.

Monday, November 3, 2008

IQ Tests Suck

Here is a question I found on the Mensa website under "Mensa Workout":
Sally likes 225 but not 224; she likes 900 but not 800; she likes 144 but not 145. Does she like 1600 or 1700?
The pattern the question designers want me to find is that Sally likes perfect squares, and so she likes 1600. But there seems to be other valid solutions:
  1. Sally likes numbers in which the sum of the digits plus the number of digits equals 12. Therefore, she likes 225, 900, 144, and, of course, 1700.
  2. (This one is for the fans of Occam's Razor) Sally likes numbers whose sum of the digits is odd. Therefore she likes 1600, not 1700.
  3. Sally likes 1700 because she likes the roots of x4 - 2969 x3 + 2521800x2 - 648810000 x + 49572000000, which is 144, 225, 900 & 1700.
  4. Sally likes 1600 because she likes the roots of x4 - 2869 x3 + 2394900x2 - 612360000 x + 46656000000
  5. Sally does "not" like 1600 or 1700, because neither number has digits which sum to 9. She "does", however, like 1800.
All these solutions are logical but only 3 out of these 6 rational solutions gives me a score.

This makes me wonder: Is Intelligence a social phenomena? Please note the fact the these questions are designed by people who are recognized by the society as intelligent, i.e people who have achieved something "great" in their lives. That implies that the society defines somebody as "intelligent" just because he thinks like the other "great" people.

What if Beethoven was born before the piano? Spielberg before the invention of the camera? Steve Jobs before the computers? So there are people alive today whose technology has not yet been invented yet, the perfect means to their genius has never materialized. And if society starts to brand, recognize and promote only the people who are like the achievers of yesterday, I think we will soon enter a second Dark Age.

Hello World

Hello, I am Edwin and I will be the guest blogger for this week.

I am an undergraduate student of Computational Biology. I study it because I believe it will speed up the progress of Life Sciences and eliminate the need for slow experimental techniques and ethics that limit it. As of now most of that dream is still science fiction, but I hope somebody will make progress.

You can read more about my academic and professional past here.