Monday, September 24, 2007

11. WebSpider

Source Code: Please click here to download

I was able to accomplish all the tasks. However, I completed the third task, which required us to implement logging, only partially, since I chose to use a bunch of System.out.print statements. I could have implemented better logging, but I did not have the time to finish a complete logging setup before the due date. I therefore realized that it is important to start on harder assignments at least a week before the due date.
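For comparison, here is a minimal sketch of what the print statements could be replaced with, assuming the standard java.util.logging package. The class name CrawlLogDemo and the method logVisit are hypothetical and not taken from the submitted code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class CrawlLogDemo {
    // One logger per class, named after the class (a common convention).
    private static final Logger LOG = Logger.getLogger(CrawlLogDemo.class.getName());

    /** Logs one visited page at INFO level and returns the message that was logged. */
    static String logVisit(String url, int linkCount) {
        String message = "Visited " + url + " (" + linkCount + " links)";
        LOG.info(message);
        return message;
    }

    public static void main(String[] args) {
        // At INFO level, info() messages are emitted and fine() messages are dropped.
        LOG.setLevel(Level.INFO);
        logVisit("http://example.com/", 3);
        LOG.fine("Detail that is filtered out at INFO level");

        // Unlike System.out.print, the output can be switched off in one place.
        LOG.setLevel(Level.OFF);
        logVisit("http://example.com/about", 1);
    }
}
```

The main advantage over System.out.print is that the level can be changed in one place, so debugging output does not have to be deleted by hand.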

Writing the web crawler was the hardest part for me. The httpunit package doesn’t come with the Javadoc comments that are always very useful when writing Java code in Eclipse. I also could not find helpful documentation or examples for httpunit through Google or even on the project's own website. If I were more familiar with the package, I am sure I could write better and more robust code than what is available here. That is what made writing the actual web crawler more difficult.

A problem that I ran into in the very beginning was:

When I issued the command ant run, it gave me an error saying that Junit.org does not exist. It took me a while to figure out that adding JUnit to the fileset in the build.xml file would solve my problem. After that it ran fine, but I then ran into problems with the httpunit package.

Finally, I was able to get good coverage from Emma, near 100% across the board. I was not able to reach the full 100% because PMD complained about ArrayList, which I could not resolve in the short time available.

In conclusion, I think this assignment was helpful for learning to deal with unfamiliar packages, and it also gave me a chance to brush up on recursion. I once heard from a senior colleague that he has never used recursive methods in his career. I would agree that recursion is rarely necessary, since it tends to use more memory, can execute more slowly (as can be observed when running this code), and is more prone to errors.

By the way, be patient when executing this code, since it may take a while to recursively traverse the Web links until the specified number of pages has been visited.
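To illustrate the traversal, here is a rough sketch of a page-limited crawl, written both recursively and with an explicit stack. It is not the submitted code: it walks a hypothetical in-memory link map instead of fetching pages with httpunit, and the names CrawlSketch, crawlRecursive and crawlIterative are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrawlSketch {

    /** Recursive depth-first visit that stops once maxPages pages have been visited. */
    static void crawlRecursive(Map<String, List<String>> links, String url,
                               int maxPages, Set<String> visited) {
        // Stop when the page limit is reached or the page was already seen.
        if (visited.size() >= maxPages || visited.contains(url)) {
            return;
        }
        visited.add(url);
        for (String next : links.getOrDefault(url, List.of())) {
            crawlRecursive(links, next, maxPages, visited);
        }
    }

    /** Same traversal with an explicit stack instead of the call stack. */
    static Set<String> crawlIterative(Map<String, List<String>> links, String start,
                                      int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty() && visited.size() < maxPages) {
            String url = stack.pop();
            if (!visited.add(url)) {
                continue; // already visited
            }
            List<String> out = links.getOrDefault(url, List.of());
            // Push in reverse so the first link on the page is visited first.
            for (int i = out.size() - 1; i >= 0; i--) {
                stack.push(out.get(i));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical link graph standing in for real pages.
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d", "a"),
                "c", List.of("d"),
                "d", List.of());

        Set<String> recursive = new LinkedHashSet<>();
        crawlRecursive(links, "a", 3, recursive);
        System.out.println("recursive: " + recursive);
        System.out.println("iterative: " + crawlIterative(links, "a", 3));
    }
}
```

Both versions visit the same pages in the same order on this small graph. In a real crawler most of the waiting time would probably come from downloading the pages, whichever version is used.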
