1.8 Abstractions for Computing Machines

1.8.1 Computer Hardware Organization

Let's take a little closer look at the way computing machines operate, so that some of the terms taken for granted throughout this book will not be confusing.

First, a definition of the "nuts and bolts".

The term hardware refers to the physical components, including the electronics, which make up the computer itself. That is, this word distinguishes the machine from the instructions it executes.

Every computer must have hardware that allows it to achieve the following tasks:

1. Input. A computer must be able to accept information from the outside world.

Early computers utilized punched paper cards or tape as their primary means of obtaining data. Later, magnetic media employed reels of tape, soft plastic "diskettes" of various sizes, or precision engineered metallic "hard" disks spinning at high speed. Other input devices include light pens, television cameras, "joysticks" for games, mice, page readers for documents, and even the human voice. However, while many people are predicting that one or more of these will replace the alphanumeric keyboard entirely over the next few years, the keyboard remains the input device of choice for most people.

2. Memory. A computer must be able to store data.

One useful way of thinking about a computer's memory is to view it as a collection of thousands of numbered pigeonholes, much like those used in a post office to sort mail. Information can be stored in a specific location, and then it can later be retrieved for manipulation by the processor.
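The pigeonhole picture can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the names `memory`, `store`, and `fetch` are invented here, and the sixteen locations stand in for the thousands (or more) found in a real machine.

```python
# A toy model of memory as numbered pigeonholes (all names are illustrative).
MEMORY_SIZE = 16                 # a real machine has vastly more locations
memory = [0] * MEMORY_SIZE       # every pigeonhole starts out holding zero


def store(address, value):
    """Place a value in the pigeonhole with the given number."""
    memory[address] = value


def fetch(address):
    """Look at what a pigeonhole holds; the contents stay in place."""
    return memory[address]


store(3, 42)
store(7, fetch(3) + 1)           # retrieve, manipulate, store elsewhere
print(fetch(3), fetch(7))        # prints: 42 43
```

Note that fetching does not empty a location, which is one place where the mail-sorting analogy should not be pushed too far.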

For practical purposes, there are two types of computer memory, each having its distinct purpose. One type contains the programs used to start up the computer, perhaps to test the rest of memory for flaws, and to begin the operation of the disk drive so as to obtain some larger program from there. Since this program must be available every time the machine is turned on, it is permanently coded at the time of manufacture. For this reason, such memory is called Read Only Memory, or ROM for short. A computer may have anywhere from a few hundred to tens of thousands of units of its available memory locations dedicated to the built-in ROM programs.

The programs coded into ROM at the time of manufacture are the firmware of the computer. This term is used to distinguish the programs from the ROM chips which contain them (hardware).

The majority of memory falls into the second category. This is read/write memory, commonly called "Random Access Memory" (RAM for short). This is where user programs and data are stored. Turning off the power causes all the information in RAM to vanish, so it is important to ensure that programs and data are also stored on an external device before this is done. In fact, careful management of resources dictates that a programmer working on a large project should store the work periodically as a precaution against a power failure that would wipe everything out.

3. Processing. A computer has a built-in ability to manipulate the stored data.

The chip (or collection of chips and circuits) that actually moves the data around in memory and manipulates it in other ways is called the Central Processing Unit (CPU). It contains the circuitry to allow some simple arithmetic operations to be performed, to test the memory for the presence of certain results, and to perform a variety of other operations. Instructions to do these things are given as special numeric codes.

The limited capabilities represented by this "machine language" are not usually used directly by the person sitting in front of the computer console. Rather, other languages and programs are written in terms of the machine's language, and it is through these that the operator interacts with the machine.

If the memory is the set of post office pigeonholes, the processor can be thought of as the postmaster, moving items from box to box, sorting them, and taking collections of items from the boxes for further processing elsewhere.

4. Control. The various functions of the computer must be coordinated.

Some of the functions that control the input, processing, and output of the data also may be located in the CPU section (or chip). However, there are usually other specialized circuits for this purpose, and in some computers, these actually make up most of the electronic hardware.

Besides routing data to and from the correct memory locations and input or output devices, some of this circuitry is connected to certain hardware switch locations of the main memory, and interprets references to these by programs as signals to take action (e.g. to turn on the disk drive, or to invert the screen to white on black). This frees the CPU from much time-consuming and unnecessary activity, and allows it to be used more for processing than for control, speeding up the operation of the whole computer.
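The notion of a memory reference acting as a switch can be sketched as follows. This is a rough model only: the class name, the two switch addresses, and the attributes are all hypothetical, since every real machine defines its own switch locations.

```python
# Hypothetical switch addresses; real machines each define their own.
DISK_MOTOR_ON = 0xF000
SCREEN_INVERT = 0xF001


class ControlCircuit:
    """Watches memory references and treats certain addresses as switches."""

    def __init__(self):
        self.ram = {}
        self.disk_running = False
        self.inverted = False

    def write(self, address, value):
        if address == DISK_MOTOR_ON:
            self.disk_running = True          # the reference itself is the signal
        elif address == SCREEN_INVERT:
            self.inverted = not self.inverted
        else:
            self.ram[address] = value         # an ordinary memory location


bus = ControlCircuit()
bus.write(0x0300, 65)         # an ordinary store
bus.write(DISK_MOTOR_ON, 0)   # same kind of reference, but it starts the drive
print(bus.disk_running, bus.ram)
```

From the program's point of view both calls look alike; it is the control circuitry, not the CPU, that recognizes the second one as a request for action.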

5. Output. A computer has ways to send information back to the outside world.

Many of the devices mentioned under the input section may simultaneously handle output. For this reason, the two functions are often grouped and referred to collectively as I/O. At one time, printers were the main devices dedicated strictly to output, but today a cathode-ray tube (such as is found in a television) is commonly used as a video display terminal for most purposes, with the printer being employed only at the last stages of a project when a final printed or "hard" copy of the results is desired.

There have been many kinds of printers. Some printed hundreds of lines per minute, cost a small fortune, and were used only with mainframe installations. Others generated letter quality (typewriter style) output, but at only thirty characters per second (or less). Between these in speed, but at a much lower price, were the dot matrix printers, which formed letters by making an array of dots on the page with pins, ink jets, or an electrical spark. Laser printers and other whole-sheet devices represent a newer technology that has moved rapidly from large networks to the individual small user. These are now available in colours and with large sheet sizes. A variety of other special purpose output devices also are available, particularly for graphics-oriented displays.

The specific technology employed to implement the physical machine changes rapidly, and need not be detailed further here. Hardware design, and even the assembly of pieces out of a box are not the concern of most people who use or program computers. Rather, they are interested in the way that the total working environment appears to them as they use it.

The total environment presented to the user by the combination of hardware and software that is being employed at the moment is called the virtual machine.

Note that the virtual machine to someone employing a word processor or spreadsheet is quite different from that presented to someone who is programming the same computer. Each has a different abstraction for the computer, a different virtual machine. The same user has a different virtual machine at different times of use, depending on the software currently available for the task at hand.

1.8.2 Computer Software Organization

Thus, it is time to pay attention to the programs that actually run on a computer.

The programs resident in the RAM of a computer are collectively referred to as its software.

Software may refer to a purchased "canned" package used to operate an accounting or word processing system. It may also refer to a computer language together with some program written by the user in that language.

The software that handles the disk drives and other I/O, and generally provides the environment in which the programmer works, is referred to as the operating system of the machine.

If the task is actually writing programs, it is important to realize that a computer can take action upon only a limited vocabulary of instructions--usually fewer than one hundred words. However, once given the instructions to follow, the machine will do the set task so rapidly that the programmer saves time in the end, despite the work put into turning those instructions into code that the machine can follow. Here are a few basic definitions.

A set of instructions to a computer that is intended to make it perform a task is called a program. The person devising the program is called a programmer and the collection of all the instructions available to a programmer at a given time is called a programming notation or language.

Because the vocabulary of a computer is limited, a programmer must give the program instructions in a manner carefully chosen for clarity, accuracy, and efficiency. Much of the purpose of this book is to teach the rudiments of strategies for creating such programs. Its students will spend a great deal of time sitting in front of a computer typing in and running sample programs, for computer science, like mathematics (and other worthwhile things), can be learned only through the fingertips. That is, the "hands-on" aspect of programming is not an optional part of the course; it is the course. The book, the professor and the lectures are teaching aids, but the theory they present is worthless unless it is used.

1.8.3 Computing Notations

Since programming is the central issue here, it is worthwhile to consider programming languages in general terms.

The central processing unit (CPU) of a computer can execute programs only through a limited number of instructions placed directly into the memory as numerical codes. These codes and their meanings are collectively referred to as the machine language for that particular processing system.

It is possible to write programs at the machine level by employing a text editor (like a word processor, but somewhat specialized) to write out the instructions as short mnemonic abbreviations, and then using a program called an assembler to generate the actual codes by translating the text file. However, most languages are not machine languages, but are implemented in terms of these low level codes. The commands in the higher level languages (such as Modula-2) are more like English words than the cryptic abbreviations used in assemblers, or the meaningless (to us) numbers that are the actual machine codes. Once a higher level program has been written out, the machine can translate this notation into the appropriate machine codes so that it can be executed on the processor.
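The work of an assembler can be sketched in Python as a simple table lookup. The mnemonics, their numeric codes, and the use of a semicolon for comments are all assumptions made up for this illustration; a real assembler handles far more (labels, addressing modes, and so on).

```python
# Hypothetical mnemonics and numeric codes, invented for this sketch.
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}


def assemble(source):
    """Translate lines such as 'ADD 9' into a flat list of numeric codes."""
    codes = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue                        # skip lines with nothing left
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        codes.extend([OPCODES[mnemonic], operand])
    return codes


text = """
    LOAD 8      ; fetch the first number
    ADD 9       ; add the second
    STORE 10    ; save the sum
    HALT
"""
print(assemble(text))    # prints: [1, 8, 2, 9, 3, 10, 0, 0]
```

Each abbreviation corresponds to exactly one numeric code, which is what distinguishes an assembler from the translators used for higher level languages.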

Of course, higher level languages have developed gradually, as has the hardware on which they run. At one time, only the low level machine codes were used, and entering these was a laborious process indeed. As they developed, these languages or notations became somewhat specialized, reflecting the biases of their creators and principal users.

Two Early High-Level Notations

The first high level language to gain common acceptance was FORTRAN (FORmula TRANslation), which was developed in the 1950s for numerical computations and scientific research. This language exists today in many versions, the most common of which is FORTRAN 77. The newest standard version, FORTRAN 90, has just become available.

A second language from this early era in computing, which is still extensively used, is COBOL (Common Business Oriented Language). Again, there are many versions of this language, but they are all designed to make it easy to program the solution to business problems.

Though the language definitions themselves do not demand it, actual implementations of both FORTRAN and COBOL require that programs be written out in one set of codes as a text file (using the language vocabulary) and then be translated by another program into the code that the machine itself can run.

An implementation of a language that is translated once from the programmed form to the machine version, and thereafter run from the machine version, is said to be a compiled implementation. The program which performs the translation task is called a compiler.

BASIC

BASIC (Beginner's All-purpose Symbolic Instruction Code) was developed as a teaching tool rather than as a major problem-solving notation. It also has many versions, some of them with almost as much power as the FORTRAN of which it is sometimes considered an abbreviation. BASIC is often called a "quick-and-dirty" language, because it allows the programmer to write code and get fast results, using the computer as a kind of giant calculator. Unfortunately, it is very "loose" (the dirty part) and its users easily develop rather sloppy ways of thinking and working that are detrimental to the planning of large programs. The following definition happens to apply to most (but not all) versions of this language.

An implementation of a language which is translated from the written code into the machine code as the program is run, and which must be translated this way every time it is run, is said to be an interpreted implementation.

BASIC, in most of its incarnations, is not suited for large programming projects because its design does not enforce good programming habits. Also, because it is frequently implemented as an interpreted language rather than as a compiled one, its programs are rather slow.

Pascal and Modula-2

In the early 1970s, the Swiss computer scientist Niklaus Wirth devised a new teaching language which he called Pascal. Eventually, Pascal became the mainstream language among university computer science faculties around the world. Because it was designed for teaching, Pascal had many shortcomings for programming commercial applications, and there have come to be several enhanced versions. Perhaps the best known was the P-system version developed at the University of California at San Diego. Here, the product of the compiler was not machine code, but an intermediate called P-code, which itself had to be interpreted when the program was executed.

The advantage was that all that was needed to take the compiled program to another computer was the appropriate final stage interpreter for the target machine, because the P-codes themselves were the same for all machines. Since much of the operating system (filer, editor, etc.) was also written in P-code, the same virtual machine was presented to the operator or programmer, regardless of the type of hardware employed. Wirth, by the way, was also the one who devised the P-code; the UCSD version was merely the major commercial implementation of this, and provided what became the standard operating system to contain it.
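The two-stage arrangement (compile to an intermediate code, then interpret that code on the target machine) can be sketched in Python. The three intermediate codes below are invented for this example and are not actual P-code, and the sketch borrows Python's own `ast` module to do the parsing, which is purely a convenience assumed here.

```python
import ast

# Hypothetical intermediate codes; real P-code has a much larger repertoire.
PUSH, ADD, MUL = "PUSH", "ADD", "MUL"


def compile_expression(text):
    """Compile arithmetic such as '2 + 3 * 4' into machine-independent codes."""
    def emit(node):
        if isinstance(node, ast.Constant):
            return [(PUSH, node.value)]
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            op = ADD if isinstance(node.op, ast.Add) else MUL
            return emit(node.left) + emit(node.right) + [(op, None)]
        raise SyntaxError("only numbers, + and * are supported in this sketch")
    return emit(ast.parse(text, mode="eval").body)


def interpret(codes):
    """The only part that would have to be rewritten for each kind of hardware."""
    stack = []
    for op, argument in codes:
        if op == PUSH:
            stack.append(argument)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == ADD else left * right)
    return stack.pop()


codes = compile_expression("2 + 3 * 4")
print(codes)
print(interpret(codes))      # prints: 14
```

The list held in `codes` plays the role of the portable compiled program: it could be carried unchanged to any machine for which an `interpret` routine had been written.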

There were other contenders for the title of "standard" Pascal, however. The International Organization for Standardization (ISO) produced a version in 1978, and later the American National Standards Institute (ANSI) also published similar but not quite identical standards for the language. These standard versions of Pascal were widely implemented on minicomputers and mainframes and were commonly used in educational institutions, though these often developed local dialects of their own.

Because of the wide experience with Pascal, and the many extensions of it which others created, Wirth decided to derive a new language of his own from this base. Apparently he believed that the many attempts to enhance Pascal confirmed the belief that it was fundamentally flawed, and that he should start again, rather than do a patch job of his own. He had already produced a language called Modula, whose principal feature was the "module." This allowed programs to be compartmentalized for easier design and error detection.

Borrowing much of the style from Pascal, and the module concept from Modula (which was never very widely used), Wirth developed in 1978 and published in 1980 his description of a new language which he called Modula-2. Besides the language, he devised a new intermediate code which he called M-code, and designed and built a computer (the Lilith), which was an optimized workstation for the language, and whose native machine language was M-code. He also designed and distributed both M-code and machine language compilers to implement the language.

It is important to note that Modula-2 is specifically designed for programming large complex systems, and many comments later in this text will serve to point students toward such tasks, though the examples will, of necessity, be at an elementary level.

Modula-2 gained wide acceptance in a short period of time, and wherever it was introduced it quickly replaced Pascal as a teaching language. It also came to be used for large production purposes in a way that Pascal never was. As a result, it too became the subject of a standardization process begun on the international level by ISO in 1987, and completed with the release of the final standard document in 1996. The results of that process are reflected in this revision of the text.

More recently, generic programming extensions and object oriented extensions were added to the base standard by the ISO Modula-2 committee.

Some Other Modern Languages

Two other languages which are generally compiled and have been used in universities for teaching purposes are PL/I (Programming Language One) and "C". The former was a creation of IBM, and the latter was contributed by Bell Labs. (Actually no predecessors "A" or "B" ever existed outside the lab, but there is reputed to have been a Canadian language whose name was pronounced like the former, but was spelled "EH?".) In recent years, C has been extended to allow it to be "object oriented," and the new variations are called C++ and Objective-C.

Still another, whose use has been mandated by the United States government as a standard for defence critical applications, is called Ada, in honour of the Countess of Lovelace (1815-1852), the first computer programmer. Ada is also a descendant of Pascal, but is an enormous language/programming environment. While it must be used for certain military contracts, it has not so far been found to be suitable for beginning instruction, nor does it have many implementations on microcomputers.

A new language from Sun Microsystems called Java borrows notational style from C++ and programming style from Smalltalk. Its programs are compiled to an intermediate code which is then interpreted, so that they can, in theory, be run on any platform. The idea and functionality of Java were also borrowed by Microsoft to create their proprietary language called C#.

Wirth has also not been content to rest on his laurels, and has produced new notations in the Modula-2 style, called Oberon and Oberon-2. These are experiments in minimalism in computing notations, and may give rise to a new mainstream language at a later time. Others have also experimented with the Modula family of languages, producing Modula-3, object oriented Modula, and several other variations.

Other Notations for Problem Solving

In addition to those mentioned above, there are many other specialty languages with small but dedicated followings. There are also a number of very high level (so-called "fourth generation" or 4GL) computing environments available that are not so much languages as they are a means to solve problems without writing programs. These are often designed more for business purposes than for scientific ones, however, because in the latter case, it is virtually impossible to anticipate even the general structure of problems, and scientists and mathematicians often need the flexibility that a high level notation such as Modula-2 supplies. Moreover, 4GLs are usually interpreted and can be exceedingly slow.

In addition, such programming environments as databases and spreadsheet programs often serve more as languages than they do as applications, and there are even programmable word processing programs that also tend to fall into this category.

