The implementations of most of the data structures thus far depended heavily on the use of pointers and dynamic memory allocation and deallocation. Such techniques have the advantage that one does not have to say ahead of time how large the structure is to be (as with static types such as arrays). However, dynamic use of memory in and of itself is not without its problems.
Consider a piece of code that does, in effect, the following:
NEW (p); q := p; DISPOSE (p);
In practice, the offending lines will probably be too far away from each other physically or logically for the programmer to notice. The pointer q now points to an area of memory that the program has given up ownership on, and is therefore logically invalid. It is not NIL, however, so references to it will appear to work. However, the memory in question can be allocated for another purpose meanwhile, and a value assigned there via yet another pointer is now in direct conflict with the use as the invalid q^. The pointer q has been logically orphaned, and the program that does this is headed for disaster. Such errors are extremely difficult to find, and programs that use a lot of pointers are therefore very prone to errors and very hard to debug. As in most situations where the error is logical in nature, no solution can be offered except:
- plan program logic carefully
- avoid pointers except where necessary
- encapsulate dangerous code in a separate module to make errors easier to find
- test routines one at a time to narrow down pointer errors
Now, consider a slightly different piece of code. Again, in real life, the offending lines are likely to be far apart rather than contiguous.
NEW (p); p := q;
That is, the value of a pointer is reused for something else (perhaps instead of base assignment another NEW is performed on it) without first calling DISPOSE to give back the memory that was allocated. The chunk of memory on the heap that p pointed to after the call to NEW (p) remains allocated, but the program has now lost the means to refer to it, because it no longer has a pointer variable with the correct value.
Memory that a program allocates and then loses its reference to without first deallocating is called garbage.
Once again, little can be offered as a solution in such cases except to appeal to the programmer to take more care with program logic. Some debugging environments also offer the use of tools to monitor memory allocation and determine whether or not the program is "leaking" memory and generating a garbage heap.
Suppose that a program has been working for some time, allocating and deallocating memory in the heap area reserved for the program. The sizes of the allocation blocks that have been used may vary considerably, and the memory map becomes a patchwork of allocated and free blocks of varying sizes as time goes on. It is not difficult to get to the point where one asks for a new block of, say 100 LOCs and the memory manager discovers that the largest available block is only, say 70 LOCs. Now, there may be dozens of such blocks, for a total many times what is needed, but became the memory is not all in one place, the allocation fails.
If not all the memory available to a program is in a single contiguous block, the memory is said to be fragmented.
The small free blocks are sometimes loosely called garbage, but this is not quite correct, because the program does have access to them individually if it asks only for a small amount of memory. Slight memory fragmentation probably has little or no effect on program performance. However, heavy use by a program of dynamic memory in blocks of varying sizes will eventually cause severe fragmentation and may mean that the program can only operate for a certain length of time, and then invariably fails.
Some operating systems have a means to defragment and/or do garbage collection on the dynamic memory automatically. The two tasks are sometimes grouped under a single heading as "garbage collection" though this is not quite accurate. A defragmenter might pause when asked for memory and try to move things around so that a chunk becomes available, then allocate from that. It may try to determine what memory is no longer pointed to by a program pointer variable and mark it as available.
However, such a behaviour creates a serious problem for a program that depends on knowing the values of pointer variables. There is no way for the memory manager to know how the pointers have been used in the logic of the program. Unless it could guarantee that every pointer variable that pointed to a specific location was known by the memory manager and changed at the same time, things could go seriously awry. Consider the code:
heap^.root := temp; heap^.lowerLeft := temp; heap^.last := temp;
As this code concludes, the pointer to a specific piece of memory is held in four places altogether. What if this value changes while the assignments are being done? What if the memory manager changes one of the pointers but not all of them? What if the pointer has been CAST into a variable of some other type to do some low level work? What if the code calls for the pointer to be incremented using SYSTEM.ADDADR and the memory consolidator changes the value of the pointer in the middle of the loop that is doing this?
The problems with garbage collection are similar. Unless the collector routines know about all the pointer variables that point to a location and can therefore deallocate memory in a way that is guaranteed to be error free, the integrity of a program that uses pointers cannot be maintained.
Dynamically allocated memory, all the references to which are guaranteed to be known to the memory manager is said to be traced. Other dynamically allocated memory is said to be untraced.
NOTE: In common with most procedural languages such as Pascal and C, base-standard Modula-2 does not guarantee tracing of memory allocated by NEW and referred to by pointer variables.
Thus, at least with respect to garbage, Modula-2 offers no solutions to the programmer who creates it with ill-thought-out code. If a programmer decides to use pointers and therefore takes on the task of memory management, then the job must be done completely. The base-standard Modula-2 rule is:
NOTE: If the programmer has the ISO standard object oriented extensions to the base language and uses them appropriately, some automatic garbage collection becomes available. (See a later chapter for details.)
There may in some cases be a way around the problem of memory fragmentation, however. Provided that the system actually does have a defragmenter in the memory manager, and provided that the programmer cooperates with it, the programmer may be able to designate parts of the heap as "moveable" and allow the memory manager to defragment those sections other parts are "locked" and no defragmentation is allowed in those sections. The details of how to do this are, of course, system dependent. However, the way shown here for the program to maintain its integrity in the face of possible memory moves is fairly common.
Instead of the normal scheme, a program cooperates in integrity with a defragmenting memory manager by having two pointers, one in static memory, and the other either in static memory, or at least in locked (nonrelocatable) memory. The first pointer is actually used by the program, but only points to the second pointer. The second pointer is never referenced directly by the program, and points to the dynamic memory. The defragmenter is then free to move the memory about, changing the second pointer, and the value of the first will remain untouched, because the location of the second does not change just because its value does.
A pointer to a pointer to data is called a handle.
The declarations could look like this:
TYPE MyDataHandle = POINTER TO POINTER TO Node; Node = RECORD fields END; VAR theHandle : MyDataHandle; thePointer : POINTER TO Node; (* not needed if handle is a type *) temp : Node;
In the code, references to the data are done like this:
theHandle^^ := temp;
References via handles are said to be doubly dereferenced or indirectly referenced.
That is, if hand is a handle, then hand^ is a pointer and hand^^ is the data.
In standard Modula-2 the variables would have to be set up in the first place, and this might be thought of as:
theHandle := CAST (MyDataHandle, SYSTEM.ADR (thePointer)); NEW (thePointer);
However, in order for the memory manager to actually trace the variables it needs to, there are usually some non-standard reserved words added to the language, and the code looks like this:
If it is done this way, the handle is the only variable used in the program. It neither declares nor directly knows about the pointer variable.
WARNING: The details will vary from one system to another. There may even be a non-standard pervasive HANDLE that is used in place of POINTER TO POINTER, so that one writes HANDLE TO. References to the data are still done with double indirection, however.
WARNING: The programmer must cooperate with this arrangement by carefully avoiding any single references such as theHandle^, as the value stored there is subject to change at any time by the defragmenter.
While the information here is necessarily incomplete, the ideas and vocabulary ought to be sufficient to allow the programmer to find the additional material in the system documentation that is needed to make all this work.
Examples: Since the size and resolution of monitors varies greatly from one computer to another, the record that comprises the information shown on the screen is usually held in a relocatable block of memory, so that it can be re-sized dynamically. Any program referring to this section of memory must do so through a handle rather than a pointer. The same is often true of other blocks of system information such as those allocated to file records.