Bits, Bytes, and Bureaucracies
You may have noticed that the programming examples I have used have been getting larger and larger with the passing chapters. This is partly because the ideas I have been presenting have been getting more and more involved, requiring larger and more involved examples. You may also have guessed that real programs must be larger than the examples I am giving, and indeed they are. The size of a program is often measured by the number of lines of code written by the programmer. By this way of measuring, my examples in the last chapter were 5-line or 6-line programs. Real programs run considerably larger. Most of my computer games come out at around 10,000 lines of code, and that doesn’t include any of the graphics or sound!
What is staggering about so large a program is not the sheer amount of code itself so much as the complexity represented by all that code. A program is not an inert mass of information like a book. Word for word and character for character, a program represents a far more complex effort than a book. This is because the words in a book, in comparison to the words in a computer program, are pretty much a loosely connected jumble. The words I chose to use in the last chapter have very little impact on the words I choose for this chapter.
A computer program, by contrast, is an immensely more demanding creature. It acts like a gigantic engine, with thousands of gears and wheels and pulleys, all packed into a very small space, everything very tightly connected. The overwhelming complexity of a huge program is enough to try the courage of any programmer. How can one person, or even a group of people, possibly keep track of this maze of interconnections and relationships?
The problem we face here is not a new one. The creation and maintenance of complex structures has plagued civilization since its earliest days, for a civilization is itself a complex structure requiring maintenance. The task that falls on any government (regulating commerce, collecting taxes, adjudicating disputes) is as complex as the devious ways of its many citizens. The first civilization to develop effective techniques for dealing with these problems was Rome.
What was the source of Roman power? How were the Romans able to first create and then maintain an empire over a span of nearly two thousand years? Historians cite many factors, but a crucial factor often underestimated by the layman is the role of the Roman bureaucracy. We normally think of Roman legions marching across Europe, conquering everything in sight, but a much more important factor in Roman success was the mousy bureaucrat following in the wake of the legion, papyri in hand. Rome did not invent bureaucracy, but the Romans refined and developed the art of bureaucracy far beyond anything the world had known. Roman administrative skills made it possible to raise, equip, and train the legions that conquered the territories; these same skills ensured that the conquered lands were smoothly and efficiently governed. A newly won territory quickly became a prosperous component of the Empire rather than a poverty-stricken and sullen vassal. Throughout the Empire, a large and efficient bureaucracy coordinated the flow of goods and people, and brought peace and prosperity to a larger area, for a longer time, than the world has known before or since. Such is the power of bureaucracy.
Exactly what is a bureaucracy? Three primary elements determine the form of a bureaucracy. The bureaus themselves constitute the first element. A bureau is a group of people performing a function. A bureau can be a small, one-person operation, or it can be as large as the Department of Defense. The second element is the assignment of functions to bureaus. Each bureau is responsible for a single function, be it broad or narrow. Each function is assigned to a single bureau. The third element is the set of communications procedures within the bureaucracy. The various bureaus must coordinate their actions; to do this requires a clear and simple communication system for transmitting work orders.
These three elements characterize a bureaucracy, but they do not explain its strengths. Why does a bureaucracy work? What is the source of its ability to handle complex problems?
One strength of the bureaucracy is its modularity. The bureaucracy is broken up into discrete chunks that are much easier to understand. Consider, for example, the United States government. What is it? Well, we could start off by breaking it into three chunks: the legislative, the executive, and the judicial. Each of those three chunks includes within it a great many people. If we wanted more detail, we could break the executive branch into the various departments (State, Defense, Commerce, Labor, etc.). We could then break one of the departments down into its subcomponents, going down further and further. Each module within the structure can be broken down to smaller components, and the modules can be reassembled to form the whole. This breaking down and putting together is one of the "big ideas" of Western civilization. It parades under the name analysis and synthesis. It is the basis for many of our civilization’s achievements. The bureaucracy is an example of analysis and synthesis applied to large organizations. Take all the problems that we require the US government to handle; break them down into components, assigning each component to a bureau. If a component is itself too large to digest, break it down further into sub-components. Continue this process of breaking down into subcomponents as necessary. Once the problems have been broken apart and assigned, allow each bureau to tackle its small problem, then put the pieces together. The result? A Social Security program, an environmental protection policy, or an MX missile.
Analysis and synthesis appears in many other areas. It is fundamental to scientific inquiry. The scientist approaches a complex and little-understood phenomenon and starts by breaking it down into its component aspects, identifying those aspects that can be explained with existing theory and isolating the aspect that represents a mystery. This makes it possible to focus intense attention on the single mysterious item. Once the core problem has been cracked, the components can be reassembled to produce a new theory of stellar evolution, a new chemical, or a cure for cancer.
An engineer follows the same pattern in designing a machine. Break the problem up into components. Put one team of engineers on the carburation system. Have another team tackle the suspension, while a third can worry about engine cooling. Send them off on their respective tasks; when they are done, assemble their work into a new car.
The intelligent hammer
A crucial requirement for successful analysis and synthesis is that the problem be broken up in an intelligent manner. If one attempts to subdivide a problem the way one partitions a vase with a hammer, one gets only a shattered mess. The hidden skill in successful analysis and synthesis is the ability to see clean, natural ways to subdivide the problem. And the basis for clean, natural subdivision, the key criterion, is the simplicity of interaction between the modules.
A problem in analysis and synthesis is essentially a problem of untangling. Suppose that I constructed a tangle of balls connected by springs. Some balls might have many springs attached to them, while other balls might have only one or two springs. How would you go about untangling this mess? If you studied it, you would undoubtedly find at least one group of balls that was tightly interconnected with lots of springs, but connected to other groups of balls by only a single spring. This would form the basis of your untangling effort. You would begin by separating the first group from the main mass. As you pick through the tangle, you would search for easily separated groups. In short, you would analyze the tangle on the basis of the lowest interaction between groups.
This is the key idea to intelligent analysis. One must scan the problem, looking for patterns that break it up into modules that interact with each other in the simplest way. If each module has but one simple interaction with all other modules, then the situation is highly modular and ideal for analysis and synthesis. If some modules have multiple interactions with other modules, then the situation is less modular and will prove more difficult to handle.
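The untangling rule can be tried out in miniature. The Python sketch below is purely illustrative: the six balls, their springs, and the helper names `count_springs` and `best_two_way_split` are hypothetical, and the exhaustive search is practical only for a handful of balls. It counts the springs that stay inside a module and the springs that cross between modules, then tries every way of splitting the tangle in two and keeps the split with the fewest crossing springs.

```python
from itertools import combinations

# Hypothetical tangle: each pair names two balls joined by one spring.
SPRINGS = [
    ("a", "b"), ("a", "c"), ("b", "c"),   # tight cluster a-b-c
    ("d", "e"), ("d", "f"), ("e", "f"),   # tight cluster d-e-f
    ("c", "d"),                           # the lone spring between them
]


def count_springs(springs, groups):
    """Return (internal, external) spring counts for a proposed grouping.

    groups maps each ball to the module it has been assigned to.
    """
    internal = external = 0
    for left, right in springs:
        if groups[left] == groups[right]:
            internal += 1
        else:
            external += 1
    return internal, external


def best_two_way_split(springs):
    """Try every split into two non-empty groups; keep the fewest crossings."""
    balls = sorted({ball for pair in springs for ball in pair})
    best = None
    for size in range(1, len(balls)):
        for chosen in combinations(balls, size):
            groups = {ball: (1 if ball in chosen else 2) for ball in balls}
            _, external = count_springs(springs, groups)
            if best is None or external < best[0]:
                best = (external, chosen)
    return best


natural = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
hammer = {"a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 2}   # a careless split

print(count_springs(SPRINGS, natural))   # (6, 1)
print(count_springs(SPRINGS, hammer))    # (2, 5)
print(best_two_way_split(SPRINGS))       # (1, ('a', 'b', 'c'))
```

The search lands on the lone spring joining the two tight clusters, which is exactly where a person picking through the tangle would make the first cut; the careless "hammer" split leaves five springs stretched between its modules.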
An example is in order. Let’s say that you are a manager in a large corporation and are about to hire a new employee. You have interviewed a number of candidates and have made your decision. To implement it, you merely send a memo to the Personnel Department listing four items: the candidate’s name, the date that this person will start work, the salary to be offered, and the personnel requisition under which the candidate is being hired. The Personnel Department will take care of all the details: notifying the candidate of the job offer; obtaining the candidate’s Social Security number, home address, and telephone number; filling out all the forms for the government; opening a personnel file on the candidate; and all the myriad other tasks that are required for employment in a large corporation. Your interaction with Personnel is small and simple: only four items of information are required from you. Yet, those four pieces of information trigger a great deal of work inside Personnel. In short, there are few springs between you and Personnel, and many springs inside Personnel. That’s a highly modular situation.
Just for laughs, let us consider a situation with very low modularity. Suppose, for example, that you were responsible for notifying the government of the candidate’s pay, but Personnel was responsible for notifying the government of the candidate’s claimed deductions. Then both you and Personnel would have to obtain the candidate’s name and Social Security number, and probably an internal employee number. You would need to check your information with Personnel, and they would need to check their information with you, and you would both need to check your information with the candidate. There is plenty of opportunity for a snafu here, with slightly different or inconsistent information being reported to the government. In terms of my tangled springs analogy, this situation has lots of springs running between you and the government, you and Personnel, and you and the candidate. A messy, tangled situation like this emphasizes the essence of good modularity: lots of internal communication within modules, the absolute minimum of external communication between modules.
There is one other benefit of the highly modular environment: once you have shot off your message to another bureau, you can forget about it. Personnel has their little form, number P-503, that you fill out and send off to them. If you fill it out properly, you need not worry about anything else. They’ll take care of all the little details. Indeed, they are probably taking care of details that you are completely unaware of: new government regulations about hiring, or whatever. Once a module, or bureau, or engine subassembly has been set up and its inputs determined, you treat it as a black box whose internal workings are of no concern. In future decision-making, you merely tell yourself, "So long as I ship the right inputs, or forms, or whatever, to that module, it will spew out the results I need." It simplifies your thinking.
Modularity in computers: subroutines
So what does all this have to do with computers? As it happens, the concepts of modularity, analysis and synthesis, and clear communications procedures are built into computer programming languages. Indeed, they are expressed with pristine clarity in the concept of the subroutine. The subroutine is one of the simplest and subtlest ideas in all of computing. It is trivially simple to implement, yet very difficult to master. If you think of it as a bureau within a bureaucracy, the idea will come more easily, and after you have worked with it, it will help you understand bureaucracies and analysis and synthesis better.
A subroutine is a small section of a program. It can be anything -- loops, IF-THEN statements, INPUT statements. Anything that you can put into a regular section of program, you can put into a subroutine. There are only two rules about subroutines: first, a subroutine is a closed module. You should not jump into or out of the subroutine halfway through. Second, a subroutine is terminated by a new type of statement: the RETURN statement.
Time for an example. Suppose that you have a program in which it is necessary to get input from the keyboard several times during the course of the program’s execution. Suppose further that the input must be conditioned. That is, for some reason, you want to make sure that the right numbers are typed in. Suppose, for example, that the program analyzes different test scores, and all the test scores are between 0 and 100. You could just hope that the user would always type the numbers in correctly, but if you are a careful programmer, you would anticipate the likelihood of somebody typing in crazy test scores by mistake, scores like 537 or -33. You want to make certain that all the test scores are reasonable. You could write some code to check for this:
60 INPUT SCORE
70 IF (SCORE >= 0) AND (SCORE <= 100) THEN GOTO 110
80 PRINT "You typed in ";SCORE
90 PRINT "That number is wrong. Please try again."
100 GOTO 60
This little bit of code ensures that SCORE will always be between 0 and 100, even if the user makes a mistake. Now, you could type this code in every single time your program needed another score. But a much easier way to handle the problem would be to make a subroutine out of it. The subroutine might look like this:
3000 INPUT SCORE
3010 IF (SCORE >= 0) AND (SCORE <= 100) THEN GOTO 3050
3020 PRINT "You typed in ";SCORE
3030 PRINT "That number is wrong. Please try again."
3040 GOTO 3000
3050 RETURN
The only difference between this subroutine and the earlier bit of code is that the numbering is different and the subroutine ends with a RETURN statement. To use this subroutine, your program need only say "GOSUB 3000". The GOSUB statement is like a GOTO with a memory. It means, "Computer, GOTO this line number, but remember the line you’re on right now." The RETURN statement reverses the process; it says, "Computer, remember the line number you came from? Well, go back there and carry on with whatever comes right after the GOSUB."
The advantage of this system is that you can call this subroutine from any part of the program. Consider this example:
120 GOSUB 3000
140 GOSUB 3000
When the computer reaches line 120, it goes off to subroutine 3000. That subroutine will RETURN to line 120. Later on, line 140 will go to subroutine 3000, and the subroutine will then RETURN to line 140. You can call subroutine 3000 from any part of the program without the computer losing track of where you are. A GOSUB call is rather like telling the computer, "Computer, go off and do this chunk of work, then come back when you’re done."
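For readers curious about how the computer "remembers," here is a toy model in Python. It is a simplified sketch, not how any particular BASIC interpreter is built: the program is a dictionary from line numbers to statements, and the interpreter understands only PRINT, GOSUB, RETURN, and END. Lines 130, 150, 160, and 3010 are made up for the demonstration, and line 3000 prints a message instead of reading a score. Each GOSUB pushes the position of the following statement onto a stack, and each RETURN pops it off, so execution resumes just after whichever GOSUB made the call.

```python
# Toy program: line number -> (statement, argument). Hypothetical, for illustration.
PROGRAM = {
    120: ("GOSUB", 3000),
    130: ("PRINT", "back after first call"),
    140: ("GOSUB", 3000),
    150: ("PRINT", "back after second call"),
    160: ("END", None),
    3000: ("PRINT", "inside subroutine 3000"),
    3010: ("RETURN", None),
}


def run(program):
    """Execute a toy program and return the list of PRINTed messages."""
    lines = sorted(program)      # statements run in line-number order
    output = []
    return_stack = []            # the GOSUB "memory": where to resume
    index = 0
    while index < len(lines):
        op, arg = program[lines[index]]
        if op == "PRINT":
            output.append(arg)
            index += 1
        elif op == "GOSUB":
            return_stack.append(index + 1)   # remember the statement after this one
            index = lines.index(arg)         # then jump, like GOTO
        elif op == "RETURN":
            index = return_stack.pop()       # resume just after the matching GOSUB
        elif op == "END":
            break
        else:
            raise ValueError(f"unknown statement {op!r} at line {lines[index]}")
    return output


for message in run(PROGRAM):
    print(message)
```

Keeping a stack rather than a single remembered number is what lets one subroutine GOSUB to yet another subroutine and still find its way home.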
Subroutines as bureaucracies
Subroutines very precisely express the three primary elements earlier associated with bureaucracies: bureaus, assignment of functions, and communications between bureaus. The subroutine itself is a bureau. It may not have any bureaucrats to handle its functions, but it doesn’t need any; its commands take care of its operations. Indeed, the subroutine is a very precise bureau: one knows exactly what it does. None of this ambiguous "Bureau of Assorted Functions Support (BAFS)" nonsense that we so often see with modern bureaucracies. The subroutine executes a precise function specified in its code. And the competent programmer has no qualms about rearranging or eliminating a subroutine that is no longer needed.
The second element of a bureaucracy is the assignment of functions to bureaus, and again we see the concept expressed very clearly with the subroutine. You use a subroutine to execute a particular function. If you need input conditioning, just use the sample subroutine presented earlier. That’s the one and only place you need go for input conditioning. If your needs for input conditioning change, then change the input conditioning subroutine. Certainly makes life easy, doesn’t it?
One of the more abstruse concepts associated with subroutines is the generality with which functions are assigned to subroutines. The subroutine example given above is only capable of handling inputs that should fall between 0 and 100. But what if another portion of your program needs inputs between 100 and 200? You would like to have input conditioning for this part of the program, too, but do you need to write another subroutine? Not if you rewrite the first subroutine to be more general. One way to do this is as follows:
3000 INPUT SCORE
3010 IF (SCORE >= LOWER) AND (SCORE <= HIGHER) THEN GOTO 3050
3020 PRINT "You typed in ";SCORE
3030 PRINT "That number is wrong. Please try again."
3040 GOTO 3000
3050 RETURN
The difference between this subroutine and the earlier one is in line 3010; the constants 0 and 100 have been replaced with variables LOWER and HIGHER. You would now call this subroutine with the following sequence:
100 LOWER = 0
110 HIGHER = 100
120 GOSUB 3000
210 LOWER = 100
220 HIGHER = 200
230 GOSUB 3000
Now the subroutine is able to handle a wider range of functions. However, there is a price we pay for this greater generality: we must now specify the values of LOWER and HIGHER before we use the subroutine.
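Languages with parameter lists let the bounds travel with each call instead of being written down beforehand. The following Python sketch is one way to express the generalized input-conditioning subroutine; the name `get_score` and its `read` argument are our own assumptions. `read` stands in for the keyboard so the routine can be exercised without a user, and defaults to Python's built-in `input`. The sketch also treats text that is not a number as a wrong answer.

```python
def get_score(lower, upper, read=input):
    """Ask until the typed value lies between lower and upper (inclusive)."""
    while True:
        text = read()
        try:
            score = float(text)
        except ValueError:
            score = None          # not a number at all
        if score is not None and lower <= score <= upper:
            return score
        print("You typed in", text)
        print("That number is wrong. Please try again.")


# Usage: feed in canned answers instead of typing them.
answers = iter(["537", "-33", "88"])
print(get_score(0, 100, read=lambda: next(answers)))    # complains twice, then 88.0

answers = iter(["99", "150"])
print(get_score(100, 200, read=lambda: next(answers)))  # complains once, then 150.0
```

Because the bounds are ordinary arguments, forgetting them produces an immediate error rather than a check against whatever values LOWER and HIGHER happened to hold last.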
The third element of a bureaucracy is the set of communications procedures between bureaus. This concept is particularly well-developed in computer programming. In fact, it has its own special term: parameter passing. In a bureaucracy, you send all manner of messages: letters, memos, work orders, and so forth. But in a computer, you only send numbers. The numbers that you send to a subroutine to tell it what to do are called parameters; the act of sending them is called parameter passing. The reason we call it parameter passing instead of parameter sending is that parameters can be both sent and received. In our example subroutine, LOWER and HIGHER are passed in to the subroutine, and SCORE is passed back from the subroutine to the calling statement in the main program.
Actually, BASIC uses a very poor method for passing parameters. The numbers that are passed back and forth between subroutines are always global variables. A global variable is one that every part of the program can see and change. The opposite of a global variable is a local variable, one that belongs to a single subroutine and is invisible to the rest of the program. Imagine a bureaucracy that had no paper, only a gigantic blackboard and a bunch of telescopes, one telescope for each bureaucrat. Suppose then that bureaus communicated with each other not by sending memos back and forth, but rather by writing messages onto the blackboard. Everybody would then read the same blackboard, looking for the messages that concerned them.
BASIC works the same way. All the variables in the program go onto one big blackboard. When our example subroutine had the properly conditioned score to pass back to the calling statement, it wrote the value onto the blackboard in the slot for the global variable SCORE. The calling statement then read the blackboard to find the value of SCORE. The system is very simple, but it can be clumsy when you want to pass lots of parameters. Suppose, for example, that you had a subroutine at line 5000 that needed three variables (V1, V2, and V3) as input parameters and produced another three variables (W1, W2, and W3) as output parameters. Then to call that subroutine you would have to write this much code:
170 V1 = 27
180 V2 = 158
190 V3 = -9
200 GOSUB 5000
210 SCORE = W1
220 GRADE = W2
230 FINAL = W3
All this work just to talk to the subroutine! What a waste of time! As it happens, there is a much better way that some advanced BASICs and many other languages use: it’s called a parameter list. When you use a language with parameter lists, you simply list all of the parameters in parentheses right after the subroutine call. Such a subroutine call with the above example might look something like this:
200 GOSUB 5000(27,158,-9,SCORE,GRADE,FINAL)
That’s much simpler, isn’t it? Unfortunately, the odds are that your version of BASIC doesn’t have this, so you will have to use the old blackboard method. Don’t despair; it is perfectly serviceable, just a little clumsy with some subroutines.
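To see the two styles side by side, here is a Python sketch of both. The text never says what subroutine 5000 computes, so the sum, maximum, and minimum below are a hypothetical stand-in, and the dictionary named `blackboard` plays the role of BASIC's shared table of global variables. Python hands outputs back as a returned tuple rather than listing them inside the call, as the GOSUB 5000(...) line does, but the effect is the same.

```python
# The "blackboard": one shared table of variables that every routine can see.
blackboard = {}


def subroutine_5000():
    """Blackboard style: read V1..V3 from the board, write W1..W3 back to it.

    The arithmetic is a hypothetical stand-in for whatever 5000 really does.
    """
    v1, v2, v3 = blackboard["V1"], blackboard["V2"], blackboard["V3"]
    blackboard["W1"] = v1 + v2 + v3
    blackboard["W2"] = max(v1, v2, v3)
    blackboard["W3"] = min(v1, v2, v3)


# Blackboard call: write the inputs, make the call, copy the outputs off.
blackboard["V1"] = 27
blackboard["V2"] = 158
blackboard["V3"] = -9
subroutine_5000()
score = blackboard["W1"]
grade = blackboard["W2"]
final = blackboard["W3"]
print(score, grade, final)            # 176 158 -9


def routine_5000(v1, v2, v3):
    """Parameter-list style: inputs arrive as arguments, outputs come back."""
    return v1 + v2 + v3, max(v1, v2, v3), min(v1, v2, v3)


# Parameter-list call: one line does the whole exchange.
print(routine_5000(27, 158, -9))      # (176, 158, -9)
```

In the second version v1, v2, and v3 are local variables: nothing outside `routine_5000` can see them or leave stale values in them.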
It is interesting to note that one of the most common bugs in any program is the failure to pass parameters properly. Suppose, for example, that you had a subroutine that needed those three global variables V1, V2, and V3 as inputs. Suppose also that you called it a little earlier in the program and, at that time, gave V1 a value of 33. A little while later, you decide to call the subroutine, but you forget to give V1 a new value appropriate to the situation. When the computer GOSUBs to the subroutine, the subroutine looks at the blackboard through its telescope in the slot marked "V1" and it sees the same old value, 33. It goes ahead and does its job using that number. Of course, that’s an old number, and it’s all wrong, so the subroutine gives you bad outputs. You get mad and try to figure out how that stupid subroutine fouled up, and you can’t find anything wrong with it. The problem is not with the subroutine itself but with the parameters you passed to it.
The same thing happens with bureaucracies. We goof and send the wrong parameters to the office across the street; they do their duty and get it wrong. Then we yell and scream over the phone at these idiots who screwed everything up. Eventually we find out what really happened, croak a thin little "Oops", and crawl into a hole. At least the computer doesn’t have a sense of righteous anger.
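The stale-value bug is easy to reproduce. In this Python sketch (again with a hypothetical stand-in computation, and a dictionary named `blackboard` standing in for BASIC's global variables), the second call is meant to use a new V1, but the programmer forgets to write it. The subroutine cannot tell the difference and quietly reuses 33. The same routine written with a parameter list refuses to run at all when an argument is missing.

```python
blackboard = {}   # stands in for BASIC's global variables


def subroutine_5000():
    # Hypothetical stand-in computation: it trusts whatever V1..V3 hold.
    blackboard["W1"] = blackboard["V1"] + blackboard["V2"] + blackboard["V3"]


# First call: every input is set.
blackboard.update(V1=33, V2=1, V3=1)
subroutine_5000()
print(blackboard["W1"])   # 35

# Later call: we mean V1 to be 10, but forget to write it.
blackboard.update(V2=1, V3=1)
subroutine_5000()
print(blackboard["W1"])   # 35 again, computed from the stale V1 = 33


def routine_5000(v1, v2, v3):
    return v1 + v2 + v3


try:
    routine_5000(v2=1, v3=1)   # forgetting v1 is now a loud error
except TypeError as err:
    print("caught:", err)
```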
Performance advantages and disadvantages
The alternative to a subroutine is called in-line code. In-line code is merely the same code as the subroutine, put in place of the subroutine call. In-line code is like having your own little bureaus within your organization, rather like having your own little Personnel department or Purchasing Office inside your department. The two are functionally identical, but differ somewhat in terms of performance attributes.
A subroutine is always slower than the equivalent in-line code. There are two reasons for this. First, there is a time penalty paid just for talking with the subroutine. It takes time for the computer to make a note of where it is when it encounters a GOSUB statement. It takes more time for the computer to look up the line from which it came when it reaches the RETURN statement. These time penalties, although small, are unavoidable and have nothing to do with what the subroutine actually does. Second, subroutines tend to be generalized where in-line code is customized. If, for example, a particular subroutine is meant to handle five different kinds of input conditioning, then when it comes time to handle any one of those five, it will surely waste a little time handling computations not appropriate to that one situation.
Bureaucracies are the same way. You always pay a time penalty just sending the forms through the inter-office mail. Just getting somebody else to pay attention to your problem takes a little time. And there is the same time penalty associated with generality. When you want to buy a large expensive computer, and you reach the place on the form that asks "Quantity", you are wasting time filling out "One". Any reasonable person would know that you don’t go around buying multimillion dollar computers by the gross. But this form is meant to work for big computers and little calculators, for company cars and paper clips, so we use it and pay the time penalty.
The time penalty of subroutines is counterbalanced by their resource-efficiency. When you use a subroutine, you write the code just once; when you use in-line code, you write it each time you use it. When you consider that programs take up scarce RAM space, you realize that subroutines can save you enough RAM to make the time penalty worthwhile.
Again, bureaucracies are the same way. Having your own Purchasing Office may be faster than going through Corporate Purchasing, but can your company afford the extra expense of your own Purchasing staff? It’s a trade-off between speed and efficiency.
The ideal use of a subroutine comes when it is called occasionally from many different statements in the program. The worst possible use of a subroutine arises when it is called many times (by means of a loop) from a single statement only. In this case, we pay the time penalty each time we use the subroutine, but we enjoy no savings in RAM whatsoever. This situation is analogous to having a hypothetical "Department of Personnel Telephone Answering". Such a department would provide a service to only a single bureau, and would be called on many times a day. Thus, Personnel would pay the time penalty but achieve no resource efficiency. Better to integrate that operation into Personnel.
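A quick experiment shows the time penalty in the worst case described above: a tiny subroutine called many times from a single spot inside a loop. The Python sketch below uses the standard `timeit` module; the function names are hypothetical, and the timings depend on the machine and the Python version, so none are quoted here. On CPython the subroutine version is typically the slower of the two, because every call pays for setting up and tearing down the call, much as every GOSUB pays for recording and looking up its return line.

```python
import timeit


def add_one(x):
    """A deliberately tiny subroutine: calling it costs more than the work inside."""
    return x + 1


def with_subroutine(n):
    total = 0
    for _ in range(n):
        total = add_one(total)   # one call site, executed n times
    return total


def in_line(n):
    total = 0
    for _ in range(n):
        total = total + 1        # the same work, written in place
    return total


N = 100_000
assert with_subroutine(N) == in_line(N) == N   # same answer either way

call_time = timeit.timeit(lambda: with_subroutine(N), number=10)
inline_time = timeit.timeit(lambda: in_line(N), number=10)
print(f"subroutine: {call_time:.3f} s, in-line: {inline_time:.3f} s")
```

Here the subroutine buys nothing in return for its overhead: it is called from only one place, so there is no duplicated code to save.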
The ideal subroutine situation is analogous to a Personnel department within an organization. It is called by nearly every bureau in the organization, for everyone needs to hire a new employee occasionally, but it is called few times by each bureau, because few departments hire en masse. That’s why so many organizations quickly sprout Personnel departments.
The real advantage of subroutines, though, arises from their modularity. Subroutines help you organize your program into clean, understandable modules. They make it easy to see the organization of the program. In a well-written program, you can always see exactly where to go to get any job done. Similarly, in a well-organized bureaucracy, you can find exactly the right bureau to solve your problem. As you organize your program, you should ask yourself, "What kind of bureaucracy am I creating here? Is this a clean, understandable bureaucracy, or is it a messy, snafu-prone one?" Unfortunately, when you encounter a problem-ridden bureaucracy, matters are not so simple. You can’t simply press RESET and start all over. Too bad.