Monday, March 20, 2006

Too many classes?

This post was originally published last year on my other blog, thoughts for life. I'm reposting it here because this is where it actually belongs.

I can remember, when I was an undergraduate, my brother saying to me that when you're stuck on a programming problem, you should just add another class (the word class in this article refers to interfaces and abstract classes as well as concrete classes).

At the time I don't think he or I understood how profound that statement was. How often does one come to a piece of code, find just two classes that do too many things, and then scratch one's head trying to figure out what's going on?

That statement is born of the notion that having lots of classes is a good thing.

Let's take an example from real life...

Analogy from the Real World
Imagine your motor car's engine was made up of only 5 parts. When the starter motor breaks you have to remove the water pump as well, because they both belong to the same part. That introduces potential for problems: maybe the mechanic does not re-install the water pump properly, and so a repair that involved only the starter motor now takes longer and carries more risk, because you cannot remove the starter motor on its own.

Transfer the analogy into the software design arena and you soon realise that the problems are far worse. When a class which does many things requires one change, the risk is even larger than in the engine analogy, because of the potential complexity of a software component.

To give you an example: a certain component of an application was particularly inefficient and, for want of a better word, just plain bad. I was tasked with upgrading (rewriting) it. When I was finished, the number of classes was about 4 to 5 times higher than in the previous version.

Here are some advantages of having many classes...
  1. Isolation of Bugs: In a system with many small classes, you have many relatively independent sub-systems all working together. If there is a problem, or a bug, chances are the bug will be isolated to one class, and only that class will have to change.
  2. Flexible to Change: I can remember times when I've had to change an already built system and spent a good part of the time figuring out how it worked, because the system consisted of very few classes. If the system were composed of many small classes, then I would only have to understand simple, small pieces of code. I may even find that the piece of functionality where the change applies is wrapped in one class already, and that class may even be named appropriately.
  3. Ease of Comprehension: Having many classes means you need less documentation. If you have one method which is rather large and you break it down into two or three methods, and you name those methods appropriately, then the method names describe what they do and play the role that code comments would play in the larger method. The same is true when you break a system down into many classes.
  4. Potential for Good Design: I have found that when a system consists of many classes, chances are the design is good. The number of classes would, however, have to be relevant to the complexity involved, and this relationship is often an exponential one. As the complexity increases, the number of classes increases even more. It is only by experience that one gets a feel for how many classes are appropriate to perform a particular job, but by and large, if the complexity is high and the number of classes is low, then the design is poor. If the number of classes is high this does not necessarily mean the design is good, but it is one of the indications.
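To illustrate the third point in the list above, here is a minimal sketch in Python (the post itself does not name a language, so that choice is an assumption). All function names are hypothetical. The first function needs comments to explain each step; in the second version the extracted function names carry that information.

```python
def summarise_large(lines):
    """One large function: comments have to explain every step."""
    # strip blank lines
    cleaned = []
    for line in lines:
        if line.strip():
            cleaned.append(line.strip())
    # count words
    total = 0
    for line in cleaned:
        total += len(line.split())
    # format the result
    return f"{len(cleaned)} lines, {total} words"


def remove_blank_lines(lines):
    """Return the stripped, non-empty lines."""
    return [line.strip() for line in lines if line.strip()]


def count_words(lines):
    """Count whitespace-separated words across all lines."""
    return sum(len(line.split()) for line in lines)


def format_summary(line_count, word_count):
    """Render the two counts as a short summary string."""
    return f"{line_count} lines, {word_count} words"


def summarise(lines):
    """Same behaviour as summarise_large, but the names document the steps."""
    cleaned = remove_blank_lines(lines)
    return format_summary(len(cleaned), count_words(cleaned))
```

Both versions return the same result; the second simply reads as a list of named steps.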
I've heard rumours that in the past people were reluctant to have many classes with many methods because of the overheads of method calls and class instantiation. The rationale for this concern has largely been removed by the efficiency of modern compilers and virtual machines and by the speed of hardware. Yes, you do incur an overhead when calling a method and instantiating a class but, in general practice, I have found this to be negligible. Time and time again, "design well, and then optimise" rings true.

Real World Example
Why does this work? Well, it won't work if you just divide your large class into smaller classes in a naive way. If you divide a class into smaller pieces but end up with the same degree of coupling between the pieces as before, then you haven't achieved much. It is far more valuable if, when you divide the class into smaller ones, the coupling between the resulting classes is fairly low. In other words, each class should still have a semblance of meaning outside of its original context.

Take, for example, the problem that says:
  1. Read in a file
  2. Count the number of words in the file
A naive approach would put all the functionality into one class which will read the file in and then count the words by counting the number of spaces. This would require one class.
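As a rough sketch of that naive approach, assuming Python and a hypothetical class name `FileWordCounter`, the single class might look like this. It assumes words are separated by exactly one space, which is precisely the kind of shortcut the one-class version tends to bake in.

```python
class FileWordCounter:
    """Hypothetical single class that both reads the file and counts words."""

    def __init__(self, path):
        self.path = path

    def count_words(self):
        # Reading the file and counting are tangled together in one method.
        with open(self.path, encoding="utf-8") as handle:
            text = handle.read().strip()
        if not text:
            return 0
        # Assumption: words are separated by exactly one space, so the
        # number of words is the number of spaces plus one.
        return text.count(" ") + 1
```

Changing either where the text comes from or how it is counted means editing this one class.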

A far more funky approach would be to separate the reading of the file from the counting. If this is done with a low degree of coupling (via an interface), then changing the source from a file to a string would be a simple process. If the counting is also done in its own class, then you could also change that algorithm to count, say, the letter 'a' without affecting the way the text is sourced.

The classes that would be required would be
  • Interface to communicate with the data source
  • Interface/abstract class to communicate with the counter
  • Concrete implementation of the data source to retrieve the text from a file
  • Concrete implementation of the counter to count spaces/white space
  • Controller to hook the two up
We've gone from 1 class to 5 classes in a very short space of time.
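The five classes listed above could be sketched as follows. This is only one possible arrangement, written in Python with abstract base classes standing in for interfaces; every class and method name here is hypothetical. Two extra classes, `StringSource` and `LetterCounter`, are added to show the swaps mentioned earlier.

```python
from abc import ABC, abstractmethod


class TextSource(ABC):
    """Interface to the data source."""

    @abstractmethod
    def read(self) -> str:
        """Return the full text to be counted."""


class Counter(ABC):
    """Interface to the counting algorithm."""

    @abstractmethod
    def count(self, text: str) -> int:
        """Return a count computed from the text."""


class FileSource(TextSource):
    """Concrete source that retrieves the text from a file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def read(self) -> str:
        with open(self.path, encoding="utf-8") as handle:
            return handle.read()


class WhitespaceWordCounter(Counter):
    """Concrete counter that treats runs of whitespace as word separators."""

    def count(self, text: str) -> int:
        # str.split() with no arguments splits on any run of whitespace.
        return len(text.split())


class CountController:
    """Hooks a source up to a counter; knows nothing about either's details."""

    def __init__(self, source: TextSource, counter: Counter) -> None:
        self.source = source
        self.counter = counter

    def run(self) -> int:
        return self.counter.count(self.source.read())


class StringSource(TextSource):
    """Alternative source: the text comes from a string instead of a file."""

    def __init__(self, text: str) -> None:
        self.text = text

    def read(self) -> str:
        return self.text


class LetterCounter(Counter):
    """Alternative counter: counts occurrences of a single letter."""

    def __init__(self, letter: str) -> None:
        self.letter = letter

    def count(self, text: str) -> int:
        return text.count(self.letter)
```

With this arrangement, `CountController(StringSource("some text"), LetterCounter("a")).run()` swaps both the source and the algorithm without touching the controller or the other classes.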

So when you code, do not be afraid to make many classes.

And when you're stuck, just add another class.

1 comment:

Trumpi said...

This is very true. In fact, there is a design principle surrounding what you have just said. The name of the principle is the Single Responsibility Principle (SRP). The SRP states that any class should have only one reason to change, and not more than one. Now, being a principle, there are cases where it would be better to ignore it. However, since I have coded using the SRP, I've found that I was getting the results mentioned in this post.