7.6 Message Encoding and Cryptography

The encoding of messages into a form that a casual reader cannot decipher is more than just a schoolchildren's game. Sensitive data is often protected in such a manner, either for storage or for transmission to other parties. Intelligence gathering agents rely on such techniques to forward reports to their control authorities without the corresponding agencies of hostile nations having access to their contents. Counter intelligence operations simultaneously work hard to break such codes and read the messages after all. Lives and whole wars have been won or lost because of the quality of a nation's code making or code breaking techniques.

One of the simplest types of ciphers (and one of the easiest to break) is based on a simple substitution. Each letter in the original message is replaced by a corresponding letter in the coded version. The decoder must apply the substitution in reverse to render the code in plain text. The substitution could be a simple shift:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"

is coded by

"BCDEFGHIJKLMNOPQRSTUVWXYZA"

Or, it could be coded by a a less regular replacement:

"QAZXSWEDCVFRTGBNHYUJMKIOLP"

The means for doing this is rather easy to render in Modula-2.

DEFINITION MODULE Substitution;

TYPE
  CodeString = ARRAY [0 .. 25] OF CHAR;

PROCEDURE Encode (source : ARRAY OF CHAR;
       VAR dest : ARRAY OF CHAR; key : CodeString);
(* encode the source string with a substitution cipher defined by the key into the destination string *)

PROCEDURE Decode (source : ARRAY OF CHAR;
      VAR dest : ARRAY OF CHAR; key : CodeString);
(* decode the source string with a substitution cipher defined by the key into the destination string *)

END Substitution.

IMPLEMENTATION MODULE Substitution;

(* several local procedures are useful *)

PROCEDURE AlphaPos (ch : CHAR) : CARDINAL;
(* returns the position of a character in the alphabet in the range 0 .. 25 *)

BEGIN
  RETURN ORD (CAP (ch)) - ORD ('A');
END AlphaPos;

PROCEDURE DecodeKey (skey : CodeString; VAR dKey : CodeString);
(* construct an inverse coding key for use in decoding *)

VAR
  count : CARDINAL;
  ch : CHAR;

BEGIN
  FOR count := 0 TO 25
    DO
      ch := skey [count];
      dKey [AlphaPos (ch)] := CHR (ORD ('A') + count);
    END;
END DecodeKey;

PROCEDURE IsLetter (ch : CHAR) : BOOLEAN;
(* determines if the character passed is a letter or some other char *)

BEGIN
  IF ((ORD (ch) >= ORD ('a')) AND (ORD (ch) <= ORD ('z')))
      OR ((ORD (ch) >= ORD ('A')) AND (ORD (ch) <= ORD ('Z')))
    THEN
      RETURN TRUE
    ELSE
      RETURN FALSE
    END;
    
  END IsLetter;

(* now the exported procedures *)

PROCEDURE Encode (source : ARRAY OF CHAR;
         VAR dest : ARRAY OF CHAR; key : CodeString);

VAR
  max, count : CARDINAL;

BEGIN
  count := 0;
  max := LENGTH (source);
  IF max > 0 (* trap out any empty strings and skip this *)
    THEN
      WHILE (count <=  max - 1) AND (count <= HIGH (dest))
        DO
          IF IsLetter (source [count]) (* only encode letters *)
            THEN
              dest [count] := key [AlphaPos (source [count])];
            ELSE
              dest [count] := source [count];
            END;  (* if *)
          INC (count);
        END; (* while *)
    END; (* if *)
  IF count < HIGH (dest)
    THEN
      dest [count] := "";
    END;
END Encode;


PROCEDURE Decode (source : ARRAY OF CHAR;
           VAR dest : ARRAY OF CHAR; key : CodeString);

VAR
  theKey : CodeString;

BEGIN
  DecodeKey (key, theKey); (* create the decode key *)
  Encode (source, dest, theKey );  (* and encode back with it *)
END Decode;

END Substitution.

Here is a brief module designed to test the code above by encoding and then decoding a string.

MODULE TestSubstitution;
(* By R. Sutcliffe.  Revised 1993 04 06 *)

FROM STextIO IMPORT 
  WriteString, WriteLn, ReadString, SkipLine;

FROM Substitution IMPORT
  Encode, Decode, CodeString;

TYPE
  String = ARRAY [0 .. 255] OF CHAR;

VAR
  inStr, outStr, dStr : String;
  coder : CodeString;

BEGIN
  coder := "PLMOKNIJBUHVYGCTFXRDZESWAQ";
  WriteString ("enter message to encode.");
  WriteString (" End with a carriage return ");
  WriteLn;
  ReadString (inStr); (* get the message *)
  SkipLine;
  WriteString ("Original String:");
  WriteLn;
  WriteString (inStr);  (* echo input *)
  WriteLn;
  Encode (inStr, outStr, coder);
  WriteString ("encoded string:"); (* write it out encoded *)
  WriteLn;
  WriteString (outStr);
  WriteLn;
  Decode (outStr, dStr, coder);
  WriteString ("decoded string:");
  (* write de-encoded message to check *)
  WriteLn;
  WriteString (dStr);
  WriteLn;
END TestSubstitution.

Here is the output (except for the opening message) when this program was run.

Original String:

The quick brown fox jumped over the lazy dog's back.

Encoded String:

DJK FZBMH LXCSG NCW UZYTKO CEKX DJK VPQA OCI'R LPMH.

Decoded String:

THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG'S BACK.

This method of encoding is quite inadequate for any security purposes, for it is too easy to construct the coding string from the pattern of the message. The repeated triplet DJK can at once be guessed as THE, and the character after the apostrophe as an S. The positioning of the letters P and C strongly suggests that both are encoded vowels. From these clues, it would not take long to decode the message without the key. A frequency analysis of the letters in a longer message would open up its key even more rapidly.

Several more sophisticated methods are available and offer greater security. In one, the letters in the original are shifted ahead (mod 26) in the alphabet by an amount equal to the position of some special letter in a code word. When the letters of the code word are exhausted, they are used again. To decode, the letters need to be shifted backwards by the same amount as derived from the original code word. That is, for the message:

Mother has died.  Bring the black lantern.

the code word

beaded

produces the shifts

1,4,0,3,4,3  derived from the alphabetic positions of b,e,a,d,e, and d.

and this pattern of shifts is then applied in a repeating way to each block of six letters throughout the message, yielding

Ostkiu lav gjid.  Cviqk ule fobgk pdoxeur.

The result is that a given letter of the alphabet generally translates into a different letter every time it is encountered, making a frequency analysis of the letters worthless. The resulting code is much more difficult to break.

A secret agent would enter the field with a set of encoding words or phrases. Each would be used only once, then discarded. Even the number of letters in the encoding scheme would vary from one message to the next in a random fashion. The people receiving the messages would progress through the same series of encoders to decrypt the messages as they arrived. (This is called a one-pad system.) The pad of encoders might, perhaps be taken from the first lines of a sequence of pages of a popular novel. For even greater security, they might be a set of strings specially chosen to make decoding difficult.

Another use for this system would be to encode data files on a user-entered password. Any other password would produce nothing but gibberish when the file was read back. To make the result even more difficult to decode, the entire ASCII set of characters (128 in all) could be included in the coding scheme.

The code that follows merely re-implements the library module Substitution. Even in this version, not all the possible characters are encoded. The control characters, whose ASCII codes fall in the range 0..31 and 127, are skipped in the coding scheme so that only readable text is produced. Punctuation and spaces are encoded, however. Note that character coding and decoding are off-loaded into separate procedures.

IMPLEMENTATION MODULE Substitution;

(* several local procedures are useful *)

PROCEDURE CodeChar (VAR ch : CHAR; shifter : CHAR);

VAR
  temp : CARDINAL;
  
BEGIN
  IF (ORD (ch) > 31) AND (ORD (ch) < 127)
    THEN (* does not alter control characters; just the rest *)
      temp := ORD (ch) + ORD (shifter);
      IF temp > 126
        THEN
          DEC (temp, 95) (* subtract 127, add 32 *)
        END;
      ch := CHR (temp);
    END;
    
END CodeChar;

PROCEDURE DeCodeChar (VAR ch : CHAR; shifter : CHAR);

VAR
  temp : CARDINAL;
  
BEGIN
  IF (ORD (ch) > 31) AND (ORD (ch) < 127)
    THEN (* does not alter control characters; just the rest *)
      temp := ORD (ch);
      (* Now, if ord (ch) - ord (shifter) < 32, then 95 was subtracted *)
      IF ORD (shifter) + 32 > temp
        THEN
          INC (temp, 95) (* add 127, subtract 32 *)
        END;
      DEC (temp, ORD (shifter));
      ch := CHR (temp);
    END;
    
END DeCodeChar;

(* now the exported procedures *)

PROCEDURE Encode (source : ARRAY OF CHAR;
         VAR dest : ARRAY OF CHAR; key : CodeString);

VAR
  max, count, shiftCount, coderLen : CARDINAL;
  ch : CHAR;
  
BEGIN
  count := 0;
  max := LENGTH (source);
  IF max > 0
    THEN
      shiftCount := 0;
      coderLen := LENGTH (key);
      WHILE (count <= HIGH (dest)) AND (count <=  max - 1)
        DO  
          ch := source [count];
          CodeChar (ch, key [shiftCount]);
          shiftCount := (shiftCount + 1) MOD coderLen;
          dest [count] := ch;
          INC (count);
        END;
      END;
  IF count <= HIGH (dest)
    THEN
      dest [count] := "";  (* terminator character *)
    END;
END Encode;


PROCEDURE Decode (source : ARRAY OF CHAR;
         VAR dest : ARRAY OF CHAR; key : CodeString);

VAR
  max, count, shiftCount, coderLen : CARDINAL;
  ch : CHAR;
  
BEGIN
  count := 0;
  max := LENGTH (source);
  IF max > 0
    THEN
      shiftCount := 0;
      coderLen := LENGTH (key);
      WHILE (count <= HIGH (dest)) AND (count <=  max - 1) 
        DO  
          ch := source [count];
          DeCodeChar (ch, key [shiftCount]);
          shiftCount := (shiftCount + 1) MOD coderLen;
          dest [count] := ch;
          INC (count);
        END;
      END;
  IF count <= HIGH (dest)
    THEN
      dest [count] := "";  (* terminator character *)
    END;
END Decode;

END Substitution.

The client module used with the first version was re-linked and another sample run conducted with the result below. A better code string would result in an even more scrambled result. Note that if the restriction on coding to and from control characters were removed, the implementation would be easier to run faster (fewer computations) but the result would not be readable as plain text.

Original String:

The quick brown fox jumped over the lazy dog's back.

Encoded String:

EUSo]dSNNuKii_RtMhkde[ahGVp\dU^n^SHuUXtacYV`yXzHU[M 

Decoded String:

The quick brown fox jumped over the lazy dog's back.

There are much more sophisticated coding schemes than these, of course, and for very secure messages, enormous computing power is required to decode, and this can only be done by brute force trial-and-error.


Contents