7.4 Comparing and Manipulating Strings

Some of the material in this section was touched upon earlier in this chapter, but it is discussed here in more detail. Suppose one has a variable of a string type (that is, an ARRAY OF CHAR of some fixed length) holding keyboard input. One then wishes to compare it with some other string: either a literal, or a variable whose declared length may differ (and which therefore has a different formal type). For instance:

  TYPE
    String80  = ARRAY [0 .. 80] OF CHAR;
    String10  = ARRAY [0 .. 10] OF CHAR;

  VAR
    str1, str3 : String80; 
    str2 : String10;

Now, comparisons such as those in

  IF str1 = str2 ...
  IF str2 = 'January' ...

will certainly produce a "Type conflict" error in the first case, and ought to in the second case as well (though some non-standard versions have been known to relax the rules).

Nor can the programmer expect

  IF str1 = str3 ...

to yield meaningful results. Although str1 and str3 have the same type, an ISO compiler rejects this code, because it does not permit arrays to be compared with "=" at all. Even if it did, the comparison would have to involve the entire array, and two arrays are equal only if all their entries are equal, including those after the string terminator, in which we have no interest. Those leftover characters will usually make the two arrays unequal even when one wishes to regard them as equal strings. (This is one of the problems with not using an abstract implementation of String: one simply knows too much about the structure, and that knowledge can get in the way.)
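The point about leftover characters can be made concrete with a small program. StrEqual is a hypothetical procedure written for this illustration, not a library routine, and the sketch assumes the string terminator is the null character 0C, as in ISO Modula-2; running off the end of an array is treated the same as reaching a terminator.

```modula2
MODULE EqualDemo;
(* Sketch: StrEqual is a hypothetical procedure, not a library routine.
   It assumes the string terminator is the null character 0C. *)

FROM STextIO IMPORT WriteString, WriteLn;

CONST
  terminator = 0C;

PROCEDURE StrEqual (s1, s2 : ARRAY OF CHAR) : BOOLEAN;
(* TRUE if s1 and s2 hold the same characters up to their terminators;
   anything stored after a terminator is ignored. *)
VAR
  i : CARDINAL;
  c1, c2 : CHAR;
BEGIN
  i := 0;
  REPEAT
    (* running off the end of an array counts as reaching a terminator *)
    IF i <= HIGH (s1) THEN c1 := s1 [i] ELSE c1 := terminator END;
    IF i <= HIGH (s2) THEN c2 := s2 [i] ELSE c2 := terminator END;
    INC (i)
  UNTIL (c1 <> c2) OR (c1 = terminator);
  RETURN c1 = c2   (* stopped on a difference, or both strings ended *)
END StrEqual;

VAR
  str1, str3 : ARRAY [0 .. 80] OF CHAR;

BEGIN
  str1 := "January";
  str3 := "January";
  str3 [20] := "x";   (* leftover junk beyond the terminator *)
  IF StrEqual (str1, str3) THEN
    WriteString ("equal as strings")
  ELSE
    WriteString ("not equal")
  END;
  WriteLn
END EqualDemo.
```

Because the loop stops at the first terminator (position 7 here), the junk in str3 [20] never takes part in the comparison, so the program reports the two variables as equal strings even though they are unequal as arrays.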

There is also the difficulty that if one writes something like:

  IF str1 < str2
    THEN
      StringAlongWithUs
    ELSE
      StringUpTheUser
    END;

one will always get a compile-time error, because the relational operators (including "<" and ">") are not defined for arrays. Several solutions to these difficulties are possible; most involve calling procedures such as Equal and CompareString, which are designed for the purpose of making such comparisons.
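Under an ISO Modula-2 compiler, for instance, the standard library module Strings exports two such procedures: Equal, which returns a BOOLEAN, and Compare, which returns a value of the enumeration type Strings.CompareResults = (less, equal, greater) and so plays the role of a CompareString. The following is a minimal sketch, assuming an ISO library that also provides STextIO for output; the qualified import avoids any clash between the constant equal and the procedure Equal.

```modula2
MODULE CompareDemo;
(* Sketch assuming the ISO standard library modules Strings and STextIO. *)

IMPORT Strings;
FROM STextIO IMPORT WriteString, WriteLn;

VAR
  str1 : ARRAY [0 .. 80] OF CHAR;
  str2 : ARRAY [0 .. 10] OF CHAR;

BEGIN
  str1 := "April";
  str2 := "January";

  (* Both procedures take open-array parameters, so arrays of
     different lengths (and literals) may be compared freely. *)
  IF Strings.Equal (str2, "January") THEN
    WriteString ("str2 holds January"); WriteLn
  END;

  CASE Strings.Compare (str1, str2) OF
    Strings.less    : WriteString ("str1 comes first")
  | Strings.equal   : WriteString ("same string")
  | Strings.greater : WriteString ("str2 comes first")
  END;
  WriteLn
END CompareDemo.
```

Because both procedures treat the terminator (or the end of the array) as the end of the string, the characters left over beyond it play no part in the result.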

It would not be hard to write such procedures if the system lacked them. The comparison would proceed character by character through the two strings until it reached a position where they differed, or the end of either string (a terminator or the end of the array), and the function value would be decided by the characters at that position. The key comparison could be something like:

  IF ORD (str1 [count]) < ORD (str2 [count]) ...

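NOTE: In the ASCII character set, which most Modula-2 implementations use, every uppercase letter has a lower ordinal value than any lowercase letter (ORD ("Z") is 90, while ORD ("a") is 97), so a comparison of this kind places "Zebra" before "apple".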

The details involved in completing this are left to the student as an exercise.

Certain very specific situations arise frequently, such as comparing an input string with each of a small list of possibilities. For these, one may use a specially tailored method, as the next example illustrates.

Suppose the need is to compare keyboard input with, say, the names of the months, in order to determine what action to take next. (That action might depend on the number of days in the month, for instance.) One might take advantage of the fact that the first three letters are always sufficient to identify the month, and that in some cases a single letter is enough (F, S, O, N and D each begin only one month). Suppose:

  TYPE
    MonthName = (Err, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec);
  VAR
    month : MonthName;
    answer : ARRAY [0 .. 10] OF CHAR;

Now, after accepting a keyboard input of the name of a month into the variable answer, one could determine which of the twelve enumeration constants it corresponded to in the following way:

PROCEDURE MonthEnum (mon : ARRAY OF CHAR) : MonthName;

(* Examines positions 0 to 2 of mon, so mon must have at least three
   elements, as the variable answer does. *)

BEGIN
  IF HIGH (mon) < 2 THEN
    RETURN Err   (* array too short to examine three positions *)
  END;

  (* check for unique characters in third position *)
  IF CAP (mon [2]) = "B" THEN
    RETURN Feb
  ELSIF CAP (mon [2]) = "C" THEN
    RETURN Dec
  ELSIF CAP (mon [2]) = "G" THEN
    RETURN Aug
  ELSIF CAP (mon [2]) = "L" THEN
    RETURN Jul
  ELSIF CAP (mon [2]) = "P" THEN
    RETURN Sep
  ELSIF CAP (mon [2]) = "T" THEN
    RETURN Oct
  ELSIF CAP (mon [2]) = "V" THEN
    RETURN Nov
  ELSIF CAP (mon [2]) = "Y" THEN
    RETURN May
  END; (* if *)

  (* check for unique characters in second position *)
  IF CAP (mon [1]) = "P" THEN
    RETURN Apr
  ELSIF CAP (mon [1]) = "U" THEN
    RETURN Jun   (* Jul and Aug are done already *)
  END;

  (* look at remaining first letters *)
  IF CAP (mon [0]) = "J" THEN
    RETURN Jan   (* Jun and Jul are done already *)
  ELSIF CAP (mon [0]) = "M" THEN
    RETURN Mar   (* May is done already *)
  ELSE
    RETURN Err   (* anything else is an error *)
  END

END MonthEnum;

Mind you, this method is far from perfect. The first test accepts "Hug" and "Jug" (as Aug), "Rut" (as Oct), and "Hay" and "Pay" (as May); the second accepts "Spa" (as Apr) and "Sun" (as Jun); and the last accepts "Jam" (as Jan) and "Mom" (as Mar). In fact, far more incorrect inputs will be assigned a value of type MonthName than correct ones. Note the need for an error value (Err) that can be returned when the input matches none of the twelve possibilities; a function procedure that reached its end without executing a RETURN would cause a run-time error.

A more elegant approach uses pattern matching against a single string holding the three-letter abbreviations of all twelve months, as in the following:

  TYPE
    MonthName = (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec);


PROCEDURE ConvertToMonth (name : ARRAY OF CHAR;
                          VAR gotIt : BOOLEAN; VAR result : MonthName);

(* MonthName is the type defined above; Jan must have ordinal value 0
   for the VAL conversion below to work. *)

CONST
  terminator = 0C;   (* assumed: the null character ends a string *)

VAR
  length,            (* length of name passed in, at most 3 *)
  monthNameCounter,  (* counter on NamesOfMonths *)
  count              (* counter on name passed in *)
    : CARDINAL;
  NamesOfMonths : ARRAY [0 .. 35] OF CHAR;

BEGIN
  NamesOfMonths := "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC";
  (* First, get length of name passed in, or use 3, whichever is less.
     Also, capitalize it at the same time for easier matching.
     The HIGH test keeps a short array from being indexed out of range. *)

  length := 0;
  WHILE (length <= 2) AND (length <= HIGH (name))
        AND (name [length] <> terminator)
    DO
      name [length] := CAP (name [length]);
      INC (length)
    END;   (* "length" now holds the actual length or 3 *)

  gotIt := FALSE;
  IF length = 0 THEN   (* empty input matches nothing *)
    RETURN
  END;

  monthNameCounter := 0;
  (* "monthNameCounter" will count through "NamesOfMonths" *)

  WHILE (NOT gotIt) AND (monthNameCounter < 36)
    (* last try starts at letter #33 *)
    DO
      count := 0;   (* and "count" counts up to the length *)
      WHILE (count < length) AND (name [count] = NamesOfMonths [monthNameCounter])
        DO    (* try to match "length" characters in a row *)
          INC (count);
          INC (monthNameCounter)
        END;
      IF count = length    (* we did it *)
        THEN
          gotIt := TRUE;   (* tell outside world *)
          DEC (monthNameCounter); (* back inside the matched name *)
          result := VAL (MonthName, monthNameCounter DIV 3)
              (* and return month name *)
        ELSE    (* no, not yet, so skip to start of next month *)
          monthNameCounter := monthNameCounter + (3 - count)
        END    (* if count *)
    END    (* first while *)

END ConvertToMonth;

NOTE: The parameter gotIt is provided because the value of result is undefined when gotIt is FALSE; the caller must test gotIt before using result.

This procedure is also not perfect, and may match things incorrectly if not used with care. For instance, if it is called with just "J", it will match the first abbreviation starting with that letter and return the value Jan. If it is called with "Ma", it will return Mar and never reach the letters describing May. Nor does it matter what is typed after the third letter if a match is possible; thus, "Jan is a good kid" will produce Jan. However, if the input string supplies enough letters to allow a unique match, the procedure will find the correct month. Naturally, the same method could also be used to match somewhat longer strings.
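A minimal sketch of that generalization, under stated assumptions: the table holds fixed-width entries padded with blanks (here the full month names, nine characters each), and up to that many input characters are compared. MatchEntry, Show, monthTable and monthWidth are hypothetical names chosen for this illustration; the sketch assumes an ISO compiler (for the "+" concatenation of string constants and for STextIO and SWholeIO) and a 0C terminator.

```modula2
MODULE PrefixDemo;
(* Sketch: generalizes the ConvertToMonth technique to fixed-width
   entries. All names here are hypothetical, chosen for illustration. *)

FROM STextIO IMPORT WriteString, WriteLn;
FROM SWholeIO IMPORT WriteCard;

CONST
  terminator = 0C;
  monthWidth = 9;   (* length of the longest name, SEPTEMBER *)
  monthTable = "JANUARY  " + "FEBRUARY " + "MARCH    " +
               "APRIL    " + "MAY      " + "JUNE     " +
               "JULY     " + "AUGUST   " + "SEPTEMBER" +
               "OCTOBER  " + "NOVEMBER " + "DECEMBER ";

PROCEDURE MatchEntry (name, table : ARRAY OF CHAR; width : CARDINAL;
                      VAR gotIt : BOOLEAN; VAR index : CARDINAL);
(* Finds the first table entry that begins with the (capitalized)
   input, comparing at most width characters; index is 0-based and
   undefined when gotIt is FALSE. *)
VAR
  length, start, count : CARDINAL;
BEGIN
  length := 0;
  WHILE (length < width) AND (length <= HIGH (name))
        AND (name [length] <> terminator) DO
    name [length] := CAP (name [length]);
    INC (length)
  END;
  gotIt := FALSE;
  IF length = 0 THEN RETURN END;   (* empty input matches nothing *)
  start := 0;
  WHILE (NOT gotIt) AND (start + width <= HIGH (table) + 1) DO
    count := 0;
    WHILE (count < length) AND (name [count] = table [start + count]) DO
      INC (count)
    END;
    IF count = length THEN
      gotIt := TRUE;
      index := start DIV width
    ELSE
      INC (start, width)   (* move to the start of the next entry *)
    END
  END
END MatchEntry;

PROCEDURE Show (input : ARRAY OF CHAR);
VAR
  found : BOOLEAN;
  which : CARDINAL;
BEGIN
  MatchEntry (input, monthTable, monthWidth, found, which);
  WriteString (input);
  IF found THEN
    WriteString (" -> month number ");
    WriteCard (which + 1, 0)
  ELSE
    WriteString (" -> no match")
  END;
  WriteLn
END Show;

BEGIN
  Show ("sept");
  Show ("June");
  Show ("Ma");
  Show ("Smarch")
END PrefixDemo.
```

Longer entries let the user type more of the name, but the same caution applies: "Ma" still selects March, because the first entry that matches the letters supplied wins.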

