5.2 Making One's Own Data Types

Although small in the sense that it employs only a limited number of pre-defined words, Modula-2 is a very flexible and expressive notation. When a feature one needs is lacking, it can usually be added by the programmer. There are a variety of ways of doing this. Some procedures and data types are available in library modules supplied with the implementation and can be imported from them. As discussed in the next chapter, a programmer may even create custom libraries of such modules if desired.

However, many needs are specific to a particular problem and are defined and used only within the confines of a single program module. As indicated above, a program can not only specify new functionality by defining procedures (chapter 4), but can also invent new data types when the built-in ones are insufficient for the clear expression of a problem solution. Consider, for instance, a payroll problem oriented by the days of the week. It may be desirable to be able to write a loop like:

  day := Monday;
  WHILE day # Friday
    DO

This could be accomplished readily enough with a long CONST declaration that equated numbers to the days of the week and by declaring day to be a variable of type CARDINAL. To do this, one would have to write:

  CONST
    Monday = 1;
    Tuesday = 2;
    (* and so on, through Friday = 5 *)
  VAR
    day : CARDINAL;

Conceptually, at least, the variable day is of a type that uses the names of the days of the week. Actually, it remains of type CARDINAL, and there is nothing to prevent some entirely inappropriate value from being assigned to day. Of course, day could be restricted to [1..5] in the declaration, and this would improve things, but the declarations would still be rather clumsy.

5.2.1 Ordinal and Enumerated Types

There is a much better way: Create a new type of variable whose values can be the names of the days themselves, and specify day to be of this new type. Here are the appropriate declarations:

  TYPE
    DayName = (Monday, Tuesday, Wednesday, Thursday, Friday);

  VAR
    day : DayName;

The syntax diagram for a TYPE declaration is in figure 5.1:

Once a new type has been created in this way, variables can be declared to be of that type and manipulated very much as the built-in ones like INTEGER and CARDINAL. There are several other possibilities for creating new types; the one in this section is defined as follows:

An enumerated type has its possible values specifically listed by name in its declaration. There is a finite (Cardinal) number n of these values, and they can be thought of as being associated with the constants 0, 1, 2, 3, ... n-1.

NOTE: Unlike Pascal and Modula-2, some computing languages do not permit the declaration of enumerated types.

That is, Modula-2 already has the template for creating and using enumerated types built in. The representation is still hidden (abstract), though it is a sequential association of numbers with names. The names themselves constitute the potential values for variables of the new type; these are transparent. Once the new type has been declared and the variables created with that type, loops such as the one with which this discussion began are perfectly in order. Note, however, that such a type is not numeric, so that within the loop, the next value of the enumeration cannot be obtained by employing the addition operator. Instead, INC or DEC must be used:

Correct:

  day := Monday;
  WHILE day < Friday  (* INC (day) when day = Friday doesn't work *)
    DO
      statement sequence;
      INC (day);
    END;

Incorrect:

  day := day + 1;

NOTES: 1. As was the case in using INC and DEC with numeric variables, care must be taken not to increment or decrement past the end of the range of the variable type. If the value of day were Monday and one executed DEC (day), or if the value of day were Friday and one executed INC (day), the value of day would become undefined.

2. The alternate forms INC (day, n) and DEC (day, n) may also be used, subject to the stipulation in note (1) above.

3. One may not, however, write INC (day, Tuesday) as if Tuesday were actually a cardinal value.
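
One way to respect note (1) is to test for the last value before incrementing. The following is only a sketch under the five-day declaration of DayName given above; the names NextDayDemo, NextDay and count are hypothetical and do not come from the text.

```modula-2
MODULE NextDayDemo;

(* Illustrative sketch only; NextDayDemo, NextDay and count
   are hypothetical names, not part of the original text. *)

FROM STextIO IMPORT
  WriteString, WriteLn;
FROM SWholeIO IMPORT
  WriteCard;

TYPE
  DayName = (Monday, Tuesday, Wednesday, Thursday, Friday);

VAR
  day : DayName;
  count : CARDINAL;

PROCEDURE NextDay (day : DayName) : DayName;
(* pre: none
   post: returns the day after day, wrapping from Friday
         back to Monday so INC never passes the last value *)
BEGIN
  IF day = Friday
    THEN
      RETURN Monday;
    ELSE
      INC (day);      (* safe: day is known not to be Friday *)
      RETURN day;
    END;
END NextDay;

BEGIN
  day := Thursday;
  count := 0;
  WHILE count < 3
    DO
      day := NextDay (day);   (* Friday, then Monday, then Tuesday *)
      INC (count);
    END;
  IF day = Tuesday
    THEN
      WriteString ("Wrapped around to Tuesday after ");
      WriteCard (count, 0);
      WriteString (" steps");
      WriteLn;
    END;
END NextDayDemo.
```

Whether wrapping around is the right response depends on the problem; a payroll program might prefer to stop at Friday instead.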

Even though it is not possible to directly treat a value in an enumeration as a cardinal, it is possible to convert back and forth between the value of the item and the value of the cardinal associated with its position in the enumeration. This is done in the following way.

If one has:

  TYPE
    DayName = (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);

  VAR
    day : DayName;
    cardNum : CARDINAL;
    ch : CHAR;  (* this is a built-in enumerated type *)

then the following assignments return the indicated values.

  cardNum := ORD (Sunday)       cardNum is now 0
  cardNum := ORD (Thursday)     cardNum is now 4
  day := VAL (DayName, 2)       day is now Tuesday
  day := VAL (DayName, 7)       range error; the last position is 6
  cardNum := ORD ("A")          cardNum is now 65
  ch := VAL (CHAR, 90)          ch is now "Z"

The built-in function ORD takes one of the names associated with the values of an enumerated type and returns the CARDINAL value corresponding to the position of the name in the enumeration.

The built-in function VAL takes the name of an enumerated type and a CARDINAL value of a position number in the enumeration and returns the corresponding value in the specified type.
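
The conversions tabulated above can be tried out in a small program. This is a minimal sketch, assuming the same ISO library modules that the other programs in this chapter import; the module name OrdValDemo is hypothetical, and the character result assumes an ISO/ASCII character set.

```modula-2
MODULE OrdValDemo;

(* Illustrative sketch only; the module name is hypothetical. *)

FROM STextIO IMPORT
  WriteString, WriteLn, WriteChar;
FROM SWholeIO IMPORT
  WriteCard;

TYPE
  DayName = (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);

VAR
  day : DayName;
  cardNum : CARDINAL;
  ch : CHAR;

BEGIN
  (* from an enumeration value to its position number *)
  cardNum := ORD (Thursday);
  WriteString ("ORD (Thursday) = ");
  WriteCard (cardNum, 0);
  WriteLn;

  (* from a position number back to an enumeration value;
     the value itself cannot be printed, so its position is *)
  day := VAL (DayName, 2);
  WriteString ("ORD (VAL (DayName, 2)) = ");
  WriteCard (ORD (day), 0);
  WriteLn;

  (* CHAR behaves the same way; "Z" assumes ISO/ASCII *)
  ch := VAL (CHAR, 90);
  WriteString ("VAL (CHAR, 90) = ");
  WriteChar (ch);
  WriteLn;
END OrdValDemo.
```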

Since the position of an item in a list ranges from 0 through (n-1) where n is the number of items, one might think that it would be proper to use VAL and ORD to convert only to and from CARDINALs, not INTEGERs. See the earlier discussion concerning the compatibility of these two. However, VAL has an extended meaning, and in standard Modula-2 can also be used to convert a value of any numeric type to an appropriate value in any other numeric type. Thus,

real := VAL (REAL, int);

is a way of writing

real := FLOAT (int);

and

lreal := VAL (LONGREAL, real);

converts from REAL to LONGREAL, and so on.
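
As a rough illustration of this extended meaning, the sketch below converts one value among several numeric types. It assumes an ISO standard compiler; the module name ValConvert and the variable names are hypothetical.

```modula-2
MODULE ValConvert;

(* Illustrative sketch only; assumes ISO Standard Modula-2.
   The module and variable names are hypothetical. *)

FROM STextIO IMPORT
  WriteString, WriteLn;
FROM SWholeIO IMPORT
  WriteCard;
FROM SRealIO IMPORT
  WriteFixed;

VAR
  int : INTEGER;
  card : CARDINAL;
  real : REAL;
  lreal : LONGREAL;

BEGIN
  int := 7;
  real := VAL (REAL, int);         (* same effect as FLOAT (int) *)
  lreal := VAL (LONGREAL, real);   (* REAL to LONGREAL *)
  card := VAL (CARDINAL, int);     (* valid only because int is not negative *)

  WriteString ("real = ");
  WriteFixed (real, 1, 0);
  WriteLn;
  WriteString ("card = ");
  WriteCard (card, 0);
  WriteLn;

  real := VAL (REAL, lreal);       (* and back from LONGREAL to REAL *)
  WriteString ("real again = ");
  WriteFixed (real, 1, 0);
  WriteLn;
END ValConvert.
```

A conversion whose result does not fit the target type, such as VAL (CARDINAL, int) with a negative int, is a range error in the same way as VAL (DayName, 7).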

As the examples ORD ("A") and VAL (CHAR, 90) given earlier illustrate, the CHAR type is also an enumeration of the underlying national characters (often ISO/ASCII), whose values could be obtained by writing a program to output them on a printer.

VAL and ORD are two more examples of built-in identifiers for standard functions. As with all standard identifiers, they are not reserved words. Rather, all such functions are automatically imported into every module (they are pervasive identifiers) and should be regarded as unavailable for redefinition by the programmer. It is syntactically correct, but in very bad taste, to write:

  VAR
    VAL : CARDINAL;       (* bad bad bad *)

VAL and ORD are also inverse functions, in the sense that if name is of type T, and num is a CARDINAL that is a valid position number in T, then

  ORD (VAL (T, num));		yields num, and
  VAL (T, ORD (name));		yields name.

The built-in type BOOLEAN is also enumerated. It may be thought of as having the definition:

  TYPE
    BOOLEAN = (FALSE, TRUE);

Thus,

  VAL (BOOLEAN, 0);	yields FALSE,
  ORD (TRUE);		yields the value 1,
  INC (boolVar);		produces TRUE if boolVar was FALSE.

Naturally, if boolVar were TRUE, incrementing it would render it undefined, as would decrementing it when it was FALSE.

Here is a sample program illustrating some of these ideas. It calculates a person's pay for a week, given the number of hours worked and the hourly wage.

MODULE SimplePay;

(* Written by R.J. Sutcliffe *)
(* to illustrate the use of enumerated types *)
(* using ISO Standard Modula-2 *)
(* last revision 1996 12 03 *)

FROM STextIO IMPORT
  WriteString, WriteLn, ReadChar, SkipLine;
FROM SWholeIO IMPORT
   WriteCard;
FROM SRealIO IMPORT
  ReadReal, WriteFixed;

TYPE
  DayName = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);

VAR
  day : DayName;
  num : CARDINAL;
  wage, hours, totHours : REAL;
  key : CHAR;

PROCEDURE WriteDay (day : DayName);
(* write out a string appropriate for the value of day
pre: none
post: a string is written but no line end is written; *)

BEGIN
  IF day = Monday
    THEN
      WriteString ("Monday");
    ELSIF day = Tuesday THEN
      WriteString ("Tuesday");
    ELSIF day = Wednesday THEN
      WriteString ("Wednesday");
    ELSIF day = Thursday THEN
      WriteString ("Thursday");
    ELSIF day = Friday THEN
      WriteString ("Friday");
    END;
END WriteDay;

BEGIN
  WriteString ("This program computes total weekly wages from ");
  WriteLn;
  WriteString ("a wage rate and daily hours worked");
  WriteLn;
  WriteLn;
  WriteString ("What is your hourly wage? ");
  ReadReal (wage);
  SkipLine;
  totHours := 0.0;     (* initialize total hours *)
  WriteLn;
  day := Monday;

  WHILE day <= Friday
    DO
      WriteString ("How many hours did you work ");
      WriteString ("on ");
      WriteDay (day);
      WriteString ("? ==> ");
      ReadReal (hours);
      SkipLine;
      totHours := totHours + hours;
      WriteLn;
      INC (day);
    END;      (* while *)

  WriteString ("Your total wages for the week are $ ");
  WriteFixed (wage * totHours, 2, 0);
  WriteLn;
  WriteString ("Press a key to conclude ==>");
  ReadChar (key);
END SimplePay.

Sample Output:

This program computes total weekly wages from
a wage rate and daily hours worked

What is your hourly wage? 15.75
How many hours did you work on Monday? ==> 8.0
How many hours did you work on Tuesday? ==> 7.5
How many hours did you work on Wednesday? ==> 7.0
How many hours did you work on Thursday? ==> 6.0
How many hours did you work on Friday? ==> 8.0
Your total wages for the week are $ 574.88
Press a key to conclude ==>

As this example illustrates, the ordinal value Monday and the string "Monday" are not the same thing. A common student error is to code WriteString (day) in an attempt to output the string. The names Monday, Tuesday, Wednesday, Thursday, and Friday are for the use of the compiler only; the program sees them as abstract values, and not as strings. That is, the transparency of the type DayName to the program only goes as far as the listing of the possible values it can take on. The letters M, o, n, d, a, and y used to type the ordinal value Monday in the source code are not available as characters in a string "Monday" to the finished program; only the abstract value is.
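
The translation from abstract value to string can be written in other ways than the IF ... ELSIF chain used in SimplePay. The following sketch assumes the CASE statement, which this section has not itself introduced; the module name WriteDayDemo is hypothetical. Every value of DayName is given a label, so no value of day is left unhandled.

```modula-2
MODULE WriteDayDemo;

(* Illustrative sketch only; assumes the CASE statement.
   The module name is hypothetical. *)

FROM STextIO IMPORT
  WriteString, WriteLn;

TYPE
  DayName = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);

VAR
  day : DayName;

PROCEDURE WriteDay (day : DayName);
(* pre: none
   post: the string naming day is written; no line end is written *)
BEGIN
  CASE day OF
    Monday    : WriteString ("Monday");
  | Tuesday   : WriteString ("Tuesday");
  | Wednesday : WriteString ("Wednesday");
  | Thursday  : WriteString ("Thursday");
  | Friday    : WriteString ("Friday");
  | Saturday  : WriteString ("Saturday");
  END;
END WriteDay;

BEGIN
  day := Monday;
  WHILE day <= Friday
    DO
      WriteDay (day);   (* the string, not the abstract value *)
      WriteLn;
      INC (day);        (* Saturday absorbs the final increment *)
    END;
END WriteDayDemo.
```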

The value Saturday is needed to prevent a run time error when the value Friday is incremented.

5.2.2 Subranges Of Existing Types

Still another way to create a new data type is to specify it as a range of consecutive values taken from some built-in or previously defined type (called the host type). The range is indicated by enclosing it in brackets.

A subrange of an ordinal or enumerated type is a sequence of consecutive values of the host type that is indicated by: [start of range .. end of range] where (start of range) <= (end of range).

Here are a few examples:

  TYPE
    Capitals = ['A' .. 'Z'];
    Peg = ['A' .. 'C'];
    Digit = [0 .. 9];
    Smallnum = [-5 .. 5];
    Mistake = [5 .. 1]; (* compiler error as start > end *)

  VAR
    letter : Capitals;
    origin, destination, temporary : Peg;
        (* use in Tower of Hanoi program *)
    num : Digit;
    sNum : Smallnum;

Or, suppose that the type DayName is declared as in the first line of the TYPE declaration below. Then the second declaration of Weekday as a subrange of the user-defined enumeration is also valid.

  TYPE
    DayName = (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);
    Weekday = [Monday .. Friday];

  VAR
    day : DayName;
    wDay : Weekday;

It should be noted that when VAL and ORD are used on a subrange of an enumeration type, the values they use or produce are relative to the host type, and not the subrange. As the standard puts it: "The ordinal number of a value of a subrange shall be the same as the ordinal number which the value has in the host type." Thus:

VAL (Weekday, 1)		produces Monday, and
num := ORD (Tuesday)	produces 2
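
A short program can confirm that position numbers are counted in the host type. This is a sketch only, using the seven-day DayName and the Weekday subrange declared above; the module name SubrangeOrd is hypothetical.

```modula-2
MODULE SubrangeOrd;

(* Illustrative sketch only; the module name is hypothetical. *)

FROM STextIO IMPORT
  WriteString, WriteLn;
FROM SWholeIO IMPORT
  WriteCard;

TYPE
  DayName = (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);
  Weekday = [Monday .. Friday];

VAR
  wDay : Weekday;

BEGIN
  (* position 1 is counted from Sunday, so this is Monday *)
  wDay := VAL (Weekday, 1);
  IF wDay = Monday
    THEN
      WriteString ("VAL (Weekday, 1) is Monday");
      WriteLn;
    END;

  (* Friday is the fifth value of Weekday, but ORD reports
     its position in DayName, which is 5 and not 4 *)
  wDay := Friday;
  WriteString ("ORD of Friday in a Weekday variable = ");
  WriteCard (ORD (wDay), 0);
  WriteLn;
END SubrangeOrd.
```

By the same rule, VAL (Weekday, 0) is a range error, because position 0 in the host type is Sunday, which lies outside the subrange.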

It is also possible, though optional, to name the host type explicitly when declaring a range, in the following manner:

  TYPE
    SmallRange = INTEGER [0..99];

This overrides the compiler's automatic assignment of this range as a sub-type of the host type CARDINAL.

There are potentially three advantages to using subranges. The first is that the subrange is a different abstraction than the original type. Use of the subrange may therefore make a program clearer. Second, some systems may store the values of a subrange more efficiently than those of the original type. This cannot be counted on, and makes a difference (if at all) only when the number of such variables is large and memory is scarce. Third, an error is generated if any attempt is made to assign something to a variable that is not in the subrange. This occurs because the incorrect value is either of a different type than the variable to which the assignment is being made, or lies outside the range of values that the variable's type permits. For instance, with the above declarations, all of the following assignment attempts are incorrect:

  day := 5;		(* because 5 is not a DayName *)
  wDay := Sunday;	(* not in the subrange *)
  letter := "p";	(* lowercase not in this type *)
  num := -5;		(* no negatives in this range *)
  sNum := 10;		(* too large, out of range *)

Notice that the details of the declaration will determine whether two variables are expression compatible (the type or at least the host type must exactly match) or assignment compatible (i.e. the base types are assignment compatible). Consider the following declarations:

TYPE
  ARange = CARDINAL [1 .. 10];
  BRange = INTEGER [1 .. 10];
  CRange = ARange;
VAR
  aRangeVar : ARange;
  bRangeVar : BRange;
  cRangeVar : CRange;

Now,

  cRangeVar := aRangeVar;  is legal (types are actually equal)
  aRangeVar := bRangeVar;  is legal (assignment compatible bases)
  cRangeVar := aRangeVar + cRangeVar;  is legal (expression compatible)
  aRangeVar := bRangeVar + aRangeVar;  is illegal (expression incompatible)

A variable of type CRange can be assigned to or used in the same expression as one of type ARange and vice versa. A variable of type CRange can be assigned to but cannot be used in the same expression as one of type BRange, even though the same bounds are employed to define both, because their host types (CARDINAL and INTEGER) differ. As far as the compiler is concerned, CRange and ARange are expression compatible, but BRange is only assignment compatible with either. The program declaration says they are different types, does it not? There must have been some reason for having two separate types, so once they are declared that way, the programmer must use them that way.
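
When values of two such types really must be combined, one of them can be converted explicitly first. The sketch below relies on the extended meaning of VAL in standard Modula-2 mentioned in section 5.2.1; the module name MixedRanges is hypothetical, and the values are chosen so that the sum stays within [1 .. 10].

```modula-2
MODULE MixedRanges;

(* Illustrative sketch only; assumes ISO Standard Modula-2.
   The module name is hypothetical. *)

FROM STextIO IMPORT
  WriteString, WriteLn;
FROM SWholeIO IMPORT
  WriteCard;

TYPE
  ARange = CARDINAL [1 .. 10];
  BRange = INTEGER [1 .. 10];

VAR
  aRangeVar : ARange;
  bRangeVar : BRange;

BEGIN
  aRangeVar := 3;
  bRangeVar := 4;

  (* bRangeVar + aRangeVar would be rejected by the compiler,
     so the INTEGER-based value is converted to CARDINAL first *)
  aRangeVar := VAL (CARDINAL, bRangeVar) + aRangeVar;

  WriteString ("sum = ");
  WriteCard (aRangeVar, 0);   (* 3 + 4 = 7, still within ARange *)
  WriteLn;
END MixedRanges.
```

The conversion makes the mixing of types visible in the source, which is in keeping with the reason for declaring two separate types in the first place.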

It is also possible to directly assign a variable of type ARange to a variable of type CARDINAL or INTEGER, or to a variable in some range from which ARange is derived (a super range).

Here is some code that is erroneous, but the error cannot be caught by the compiler:

  TYPE
    eighties = [1980..1989];
    nineties = [1990..1999];
  VAR
    year1 : eighties;
    year2 : nineties;
  BEGIN
    ...
    year1 := year2;

In this case, the two variables are of assignment compatible types because both are subranges of the same underlying type, so the compiler has nothing to complain about. However, because their respective ranges share no values in common, an actual assignment will always yield an error at run time.
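
Where a value of uncertain origin must be stored in a subrange variable, the program can test it before making the assignment. This is a minimal sketch of that idea, not a technique prescribed by the text; the module name GuardedAssign and the variable year are hypothetical.

```modula-2
MODULE GuardedAssign;

(* Illustrative sketch only; the module name and the
   variable year are hypothetical. *)

FROM STextIO IMPORT
  WriteString, WriteLn;

TYPE
  eighties = [1980 .. 1989];

VAR
  year : CARDINAL;
  year1 : eighties;

BEGIN
  year := 1995;   (* stands in for a value computed or read elsewhere *)

  (* assign only when the value lies within the subrange *)
  IF (year >= 1980) AND (year <= 1989)
    THEN
      year1 := year;
      WriteString ("year stored in year1");
    ELSE
      WriteString ("year is not in the eighties; year1 left unchanged");
    END;
  WriteLn;
END GuardedAssign.
```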

The method described earlier of deriving a new type from a specified host type also applies when the host type is user-defined. One may write:

  TYPE
    ARange = [1..10];
    BRange = ARange [2..5];

and the values of variables of type BRange would be compatible with those of type ARange because BRange is specifically declared to be a subtype of ARange.

5.2.3 Summary of some Modula-2 compatibility issues:

1. Two variables are (expression) compatible if:
- they are of the same type
- the type of one was declared equal to the type of the other
- the type of one is a subrange of the other, or
- both types are subranges of the same type.

2. Two variables are assignment compatible if:
- their types are compatible
- one is INTEGER and the other CARDINAL or a subrange thereof, or
- one is CARDINAL and the other INTEGER or a subrange thereof.

3. Two variables are incompatible otherwise.

In view of these complications, and the fact that more assignments are likely to fail, one might ask why use subranges at all? The answer is that it is better to get a run time range error at the point at which an inappropriate assignment is first made, rather than much later in the program. The use of a subrange pinpoints the error to the place where remedial action must be taken. Without it, tracing the logic of faulty output back to the appropriate point might be very tedious indeed.

With both enumerations and subranges defined, it is possible to give the following:

5.2.4 Summary of some Modula-2 types:

1. Whole number types:

These are INTEGER, CARDINAL, non-standard long versions of either, if provided, and the type of whole number literals (such as the "5" in thumb := 5). Items of the latter type may be assigned to either of the other two whole number types, provided they are in an appropriate range.

The name of the underlying type of whole number literals (whether signed or unsigned) is the Z-type, a supertype thought of as including all such whole number literals.

2. Ordinal types:

These are the whole number types, enumerations (built-in or user-defined) and subranges. An ordinal type is any type, all of whose values can be put into a one-to-one correspondence with a finite subset of the Z-type. That is, they are the ones that can be counted off with whole numbers.

3. Scalar types:

These include all the ordinal types together with the real types (REAL and LONGREAL). Because in practice there are only a finite number of different reals representable on a machine with finite precision, one could conceivably count the items of these two types with ordinal numbers. However, what would be two consecutive reals on one machine would not necessarily be on another (perhaps neither could even be represented exactly). Thus the counting of the real entities by a Modula-2 program would not be predictable. Moreover, in theory, a real type models the real numbers of mathematics, of which there are an infinite number between any two given reals. At the very least, there are almost certain to be more representable reals on a given machine than there are available cardinal numbers to count them. Thus the real types are distinguished from the ordinal (countable) types.

Real and long real literals and constants are said to be of the R-type, a supertype thought of as including all such real numbers.

4. Number types:

These include the whole number and real types (and, as will be seen in a later section, complex number types) but not enumerations such as the type DayName used above.

5.2.5 Making Comparisons

All scalar types, including user-defined ones, can be compared for equality or inequality. Using the less than and similar operators on them also makes sense because scalar types all have an ordering. Thus the line

  WHILE day <= Friday

in the module SimplePay or

  IF charVar <= "A"

both make sense. One may even write something like

  IF booleanExpression1 <= booleanExpression2

which is true unless the left side is TRUE and the right side is FALSE.
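
The four possible cases of such a comparison can be listed by a short program. This is a sketch only; the names CompareBool, WriteBool and ShowRow are hypothetical.

```modula-2
MODULE CompareBool;

(* Illustrative sketch only; the module and procedure
   names are hypothetical. *)

FROM STextIO IMPORT
  WriteString, WriteLn;

PROCEDURE WriteBool (b : BOOLEAN);
(* BOOLEAN values cannot be written directly, so a string is chosen *)
BEGIN
  IF b
    THEN
      WriteString ("TRUE");
    ELSE
      WriteString ("FALSE");
    END;
END WriteBool;

PROCEDURE ShowRow (p, q : BOOLEAN);
(* writes one line showing the result of p <= q *)
BEGIN
  WriteBool (p);
  WriteString (" <= ");
  WriteBool (q);
  WriteString (" is ");
  WriteBool (p <= q);
  WriteLn;
END ShowRow;

BEGIN
  ShowRow (FALSE, FALSE);
  ShowRow (FALSE, TRUE);
  ShowRow (TRUE, FALSE);    (* the only case that yields FALSE *)
  ShowRow (TRUE, TRUE);
END CompareBool.
```

Because FALSE precedes TRUE in the enumeration, the comparison p <= q has the same truth table as the logical implication "p implies q".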

