Showing posts with label Melvin Zhang. Show all posts
Showing posts with label Melvin Zhang. Show all posts

Sunday, January 11, 2009

My favorite algorithm

Many of the mechanical procedures we learned in primary and secondary school for calculating the result of certain mathematical operations are really just numerical algorithms. Usually we start with the Addition algorithm for finding the sum of two numbers, this is typically followed by the Multiplication algorithm, which uses the Addition algorithm and digit shifting. Often we are just taught the steps, but not the reason why following them will give us the correct answer (correctness) or how the steps were developed in the first place (design).

The problem posted by Chris Henry on the frequency of the mode reminded me of a similar problem, that of finding the majority in a collection of n items. The majority is the item whose frequency is more than n/2. This problem has a very clever but simple solution that only needs only one pass over the data. It was listed by one of its inventors, J Stother Moore, as one of his best ideas. It is also my favorite algorithm.

What is your favorite algorithm?

Saturday, January 10, 2009

Using a personal wiki for notes and todos

Taking notes is an important part of learning something new. It helps to jot down the important points while the material is still fresh in your head. I've experimented with a number of different note taking mediums in the past and I hope to share with you some of my insights.

Paper notes are easy to write (unless you have a bad handwriting, like me) and draw diagrams However, organizing the notes afterwards can be a pain. This works fairly well for module notes as you will usually read them in a sequential fashion and the notes can be written next to the slides. If I'm doing a module summary, which is longer, I will type out the notes.

Sometimes, you might need to do some quick research to solve a small problem, such as figuring out how to configure your smart phone to access the NUS network. After finding the answer on the web, I find it useful to type out a short howto to remind myself how is it done, in case I need to do it again in the future.

The trouble with the above methods is that notes are largely independent of one another and it is difficult to find all the information about a particular topic which may be stored in different files/pieces of paper. A more useful note taking system should allow one to create hyperlinks between different notes and to easily search for specific pieces of information.



A screenshot of TiddlyWiki from WikiMatrix.org


My solution, as you might have guessed from the title, is to use a wiki engine. Unfortunately, most wiki software require a web server and database backend, which is too complicated for me. Realizing the need for a lightweight, portable wiki for personal use, Jeremy Ruston built TiddlyWiki, which is a single HTML file powered by JavaScript. The one I'm using for notes is a derivative known as MPTW, it has a cousin MonkeyGTD which I use for managing projects and todos. You simply store the file on your local machine and open it in a browser. When you save your changes,  the JavaScript writes the changes back to the file.

Friday, January 9, 2009

Data driven programming assigments

This afternoon I attended a talk by Randall Bryant on Data Intensive Scalable Computing. His focus was on computer systems for processing large amounts of data.

I realized the importance of data driven computations earlier though my experience setting programming assignments. I think it is important to have problems that are realistic; where students want to write the programs in order to see the results. Unfortunately, most of the time we are technique driven. We try to form a task around a specific method we want students to use. However, most problems usually have a rather trivial solution, therefore we need to impose some unreasonable constraints or increase the size of the problem to some unbelievably large size in order to force the use of specific techniques.

I think the correct approach is to start from the data. There are large amount of interesting data that is available over the web, from movies to tags. In the UNIX workshop I conducted for freshmen orientation 2008, I made use of the SMS corpus from the WING research group to motivate the use of UNIX pipes.

Dealing with large amounts of publicly available real world data gives rise to realistic computational problems where the effect of efficient algorithms become apparent. Computations that takes hours to run using a naive method can be completed in seconds using the correct approach.

Wednesday, January 7, 2009

From Research to Teaching and Outreach

In my previous post I talked about research in CS, one of the goals of research is to advance our understanding in a specific area. IMHO, there are at least three different "frontiers" of understanding:
  1. expertise of other researchers/practitioner in the area (the usual sense of research)
  2. level of understanding attained by undergraduates in the area (teaching)
  3. public or layman understanding/appreciation of the area (outreach)
Both teaching and outreach are important to the development of a field because it expands the talent pool and imparts modes of thinking that are peculiar to the area. In recent years, there is push to promote Computational Thinking as a fundamental skill, on equal terms with reading, writing, and arithmetic.

I've helped out at the NUS SoC open house booth for the last two years, and a recurring problem is that most folks have no clue as to what we do in Computing. I think the Europeans got it right when they used the term "Informatics" instead of "Computer Science". As Dijkstra wrote in EWD 682, "we don't call painting "brush art", nor surgery "knife science"". Incidentally, Dijkstra is probably one of the earliest bloggers, he started to send his writings to colleagues by regular mail, before the advent of the Internet. Now, you can view his writings at the EWD Archive.

I tend to like to use the term "Information Engineering", which puts the emphasis on the object of our study, information, and away from the tool we use, the computer. Most people tend to have a better sense of what is engineering, it is about building useful things. In computing, we design algorithms and build software to manage/process information.

Do you have some simple way to describe what we do in Computing to the layman? Do share your ideas/suggestions in the comments.

Monday, January 5, 2009

Primer on CS research and relevance of research skills

First of all, I would like to thank Juliana for the invitation.  My name is  Melvin Zhang, and I am a third year graduate student in SoC. I graduated with a B. Comp (Hons) in 2006 from SoC, majoring in Computer Science. I am currently doing research in the area of Computational Biology.

In this post, I will try to give an overview of research in CS and the importance of research skills.

In a nutshell, research is a systematic study of a specific problem to derive new methods, frameworks, systems or facts. CS research can be divided into two broad areas, namely theory and systems. Theory is concerned with coming up with new methods or facts that can be proven mathematically, whereas systems is about the implementation and design of computer systems to solve real world problems. Not all research lead to new ways to solve a problem, sometimes it is about developing a unifying framework which combines different methodologies.

Most of us have to perform various research related activities at one point or another, whether it is working on an research project such as FYP or doing a survey for an essay we have to write. In general, a research project  consists of the following components:
  1. identify and formulate the problem to be solved (the MOST important step)
  2. review of existing work on related problems (so that we don't reinvent the wheel)
  3. proposal of a new solution (this is not as hard as it sounds, often it is a refinement of an existing solution or a combination of ideas from different solutions)
  4. validate the proposed solution (equally important to "sell" your idea)
Although this might sound like a sequential process, it is not. Often, the results from an initial solution are not satisfactory and a careful analysis leads to a reformulation of the problem and/or refinements to the original solution.

Almost all serious projects involved the above steps. That is why research skills such as problem formulation, review of related work and design of experiments to support your claims are important. In particular, formulating the right problem is often the key step towards a solution. To quote the words of Jiddu Krishnamurti, an Indian philosopher, "If we can really understand the problem, the answer will come out of it, because the answer is not separate from the problem.".