3 Monoalphabetic ciphers

3.1 Introduction

Clearly, the Caesar cipher is not very secure. It’s probably enough if you are sending an unimaginative message to a friend, but for anything mildly important, you probably want to crank up the security a little bit!

The Caesar cipher is a very simple example of a monoalphabetic substitution cipher: one where each letter of the alphabet is replaced with another. This means that there’s a one-to-one mapping between the letters of the plain text and those of the cipher text. The problem with the Caesar cipher is that the replacement follows a very simple pattern, so once you know one mapping, the rest can be figured out right away.
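To make that weakness concrete, here is a minimal Python sketch of a Caesar cipher together with a helper that recovers the whole key from a single known letter pair. The function names (`caesar_encrypt`, `shift_from_pair`) and the sample message are made up for illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_encrypt(text: str, shift: int) -> str:
    """Shift every letter A-Z by `shift` places; leave other characters alone."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

def shift_from_pair(plain_letter: str, cipher_letter: str) -> int:
    """One known plain/cipher pair is enough to recover the shift."""
    return (ALPHABET.index(cipher_letter) - ALPHABET.index(plain_letter)) % 26

cipher = caesar_encrypt("ATTACK AT DAWN", 3)
# Suppose we learn that the first cipher letter stands for 'A'.
shift = shift_from_pair("A", cipher[0])
recovered = caesar_encrypt(cipher, -shift)  # shifting back undoes the cipher
print(cipher, shift, recovered)
```

Knowing just one pair gives the shift, and the shift gives every other pair.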

We can go one step further and use a cipher where the mapping is completely random. For example, A can be mapped to B, but B can be mapped to Q, and C to I, and so on.
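A random mapping like this can be sketched in a few lines of Python. This is only an illustration: the helper names (`random_key`, `substitute`, `invert`) are hypothetical, and the fixed seed is there purely so the example is repeatable.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def random_key(rng: random.Random) -> dict:
    """Build a one-to-one mapping from A-Z onto a shuffled copy of A-Z."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return dict(zip(ALPHABET, letters))

def substitute(text: str, key: dict) -> str:
    """Replace each letter using `key`; anything not in the key is kept as-is."""
    return "".join(key.get(ch, ch) for ch in text.upper())

def invert(key: dict) -> dict:
    """Swap the direction of the mapping to get the decryption key."""
    return {cipher: plain for plain, cipher in key.items()}

key = random_key(random.Random(0))  # seed chosen arbitrarily for repeatability
cipher = substitute("Hello, world!", key)
plain = substitute(cipher, invert(key))
print(cipher)
print(plain)
```

Because the mapping is one-to-one, inverting the dictionary is all that decryption needs. Note that a plain shuffle may leave some letters mapped to themselves.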

Question: How many possible encryptions are there? Would a brute-force attack on such a cipher be sensible?

Try encoding a piece of text by choosing your own cipher. You can either enter each character of the cipher manually, or use the ‘randomise’ button to generate a random cipher.

3.2 Frequency analysis

Although we realistically cannot use brute force, there is a much more clever way to crack such a code. It relies on the fact that certain letters of the alphabet are much more common than others in typical English text.
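Before turning to letter frequencies, it helps to put a number on why brute force is unrealistic. A key is an ordering of the 26 letters, so there are 26! of them. The sketch below counts them and estimates how long trying them all would take; the rate of one billion guesses per second is an assumption picked purely for illustration.

```python
import math

# Every key is one ordering (permutation) of the 26 letters.
key_count = math.factorial(26)

GUESSES_PER_SECOND = 1e9  # hypothetical attacker speed, not a measured figure
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = key_count / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"{key_count:,} possible keys")
print(f"about {years:.1e} years to try them all")
```

That is roughly 4 × 10^26 keys, and more than ten billion years of guessing at the assumed rate.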

Try pasting some text into the box below (or clicking the samples), and observe the frequency distribution of the letters in the plot that appears:

Question: Try a few different text sources. What are the most and least common letters? Can you think of any reasons why this distribution might systematically vary from text to text?

If you speak a foreign language, try analysing some text in that language to observe how the distribution might change. (Sadly, the box above ignores all accented characters! The schemes we’re discussing today can be adapted to work on non-English letters, but today we’ll focus only on the 26 letters of the English alphabet.)
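If you would like to reproduce the letter count yourself, the following Python sketch does roughly the same job as the plot. Like the box, it only looks at the 26 letters A to Z. The function names are hypothetical and the text-based bar chart is just one simple way to display the result.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

def letter_frequencies(text: str) -> dict:
    """Uppercase the text, keep only A-Z, and return each letter's share."""
    counts = Counter(ch for ch in text.upper() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return {ch: 0.0 for ch in ALPHABET}
    return {ch: counts[ch] / total for ch in ALPHABET}

def print_histogram(freqs: dict, width: int = 50) -> None:
    """Print one bar per letter, most common first."""
    for letter, share in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{letter} {share:6.2%} {'#' * round(share * width)}")

sample = ("It relies on the fact that certain letters of the alphabet are "
          "much more common than others in typical English text.")
print_histogram(letter_frequencies(sample))
```

Short samples give noisy results, so the distribution settles down only with longer texts.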

In English, ‘E’ is by far the most common letter. If we perform the same analysis on the cipher text and find that ‘R’ is the most common letter, then it’s likely that ‘R’ decodes to ‘E’ in the plain text.

Once we have a match, we can fill it in and try to solve the rest in an iterative manner.
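The first step of that iterative process might look like the sketch below. Everything in it is made up for illustration: the demo key (the keyboard row order), the sample message, and the helper names. Unknown letters are shown as underscores so that the guesses made so far stand out.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

# Demo set-up (illustrative only): encrypt a made-up message with a fixed key.
DEMO_KEY = dict(zip(ALPHABET, "QWERTYUIOPASDFGHJKLZXCVBNM"))
plain = "MEET ME BY THE GREEN TREE NEAR THE SEVENTEENTH STREET"
cipher = "".join(DEMO_KEY.get(ch, ch) for ch in plain)

def most_common_letter(text: str) -> str:
    """Return the letter A-Z that occurs most often in `text`."""
    counts = Counter(ch for ch in text if ch in ALPHABET)
    return counts.most_common(1)[0][0]

def partial_decrypt(text: str, guesses: dict) -> str:
    """Apply the cipher->plain guesses; show still-unknown letters as '_'."""
    return "".join(
        guesses.get(ch, "_") if ch in ALPHABET else ch
        for ch in text
    )

# First guess: the most common cipher letter probably stands for 'E'.
guesses = {most_common_letter(cipher): "E"}
print(cipher)
print(partial_decrypt(cipher, guesses))
```

Each new guess is added to `guesses` and the text is printed again. On a short message the most common letter may not be ‘E’, so a guess that produces nonsense should simply be undone.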

Repeated sequences of letters in the cipher text are another useful source of information. For example, once you have found the letter that stands for ‘E’, it makes sense to look for spots where ‘THE’ might be encoded.
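One way to hunt for such repeats is to count every three-letter sequence in the cipher text and then keep only those ending in the letter you believe stands for ‘E’. The sketch below assumes the cipher text still has its spaces, so sequences are counted within words only. The demo key, message and function name are made up for illustration.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count every run of n consecutive letters inside each word."""
    counts = Counter()
    for word in text.upper().split():
        letters = "".join(ch for ch in word if ch in ALPHABET)
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts

# Demo set-up (illustrative only): a made-up message and a fixed key.
DEMO_KEY = dict(zip(ALPHABET, "QWERTYUIOPASDFGHJKLZXCVBNM"))
plain = "THE CAT AND THE DOG SAW THE BIRD"
cipher = "".join(DEMO_KEY.get(ch, ch) for ch in plain)

counts = ngram_counts(cipher)
e_cipher = "T"  # suppose frequency analysis suggested this letter is 'E'
candidates = [gram for gram, _ in counts.most_common() if gram.endswith(e_cipher)]
print(counts.most_common(3))
print(candidates)
```

A frequent three-letter sequence that ends in the suspected ‘E’ is a good candidate for ‘THE’, which would give you ‘T’ and ‘H’ as well.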

3.3 Decryption

With the above information, you should be able to have a go at decrypting this encrypted text. The frequency distribution of letters is shown to the right:

Hint: Once you have put together a few letters, look out for common, repeating two- or three-letter patterns. The most common set of three letters in English is “THE”. Can you find a place where this word might fit?

The first 800 characters from the text above are shown here. In each of the lines below, the upper character represents the cipher text, and the lower character is the decoded plain text: