Friday, January 9, 2009

Data-driven programming assignments

This afternoon I attended a talk by Randal Bryant on Data-Intensive Scalable Computing. His focus was on computer systems for processing large amounts of data.

I realized the importance of data-driven computation earlier, through my experience setting programming assignments. I think it is important to have realistic problems, where students want to write the programs in order to see the results. Unfortunately, most of the time we are technique-driven: we try to form a task around a specific method we want students to use. Such problems usually have a rather trivial solution, so we need to impose unreasonable constraints or inflate the problem to some unbelievably large size in order to force the use of the specific technique.

I think the correct approach is to start from the data. There is a large amount of interesting data available on the web, from movies to tags. In the UNIX workshop I conducted for freshmen orientation 2008, I made use of the SMS corpus from the WING research group to motivate the use of UNIX pipes.

Dealing with large amounts of publicly available real-world data gives rise to realistic computational problems where the effect of efficient algorithms becomes apparent. Computations that take hours to run using a naive method can be completed in seconds using the correct approach.
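
As a rough illustration of that gap, here is a minimal Python sketch of one task a message corpus invites: finding the most frequent words. The function names and the sample messages are made up for this example; they are not taken from the workshop or from the SMS corpus itself. The naive version rescans the whole word list for every distinct word, so its running time grows roughly quadratically, while the second version counts everything in a single pass with a hash table.

```python
from collections import Counter


def top_words_naive(messages, k):
    """Naive approach: rescan the full word list once per distinct word."""
    words = [w for m in messages for w in m.lower().split()]
    seen = []
    counts = []
    for w in words:
        if w not in seen:                       # linear scan of a list
            seen.append(w)
            counts.append((w, words.count(w)))  # another full scan
    # Most frequent first; ties broken alphabetically.
    counts.sort(key=lambda c: (-c[1], c[0]))
    return counts[:k]


def top_words_fast(messages, k):
    """Single pass: a hash-based counter does the tallying."""
    counts = Counter(w for m in messages for w in m.lower().split())
    return sorted(counts.items(), key=lambda c: (-c[1], c[0]))[:k]


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real corpus.
    sample = ["see you later", "call you later ok", "ok see you"]
    print(top_words_naive(sample, 3))
    print(top_words_fast(sample, 3))
```

On three messages both versions finish instantly and agree. The difference only shows once the input is large, which is exactly why starting from real data makes the case for the better method without any artificial constraints.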
