Page 29 : Beale Ciphers Analyses
C3: More typesetting wizardry?
October 18, 2008
C3 was the last bastion of hope for treasure hunters and cryptanalysts alike. It alone had yielded no clues, no secrets. Here, surely, was the true treasure map or, at least, the author's confession.
These dreams are now faded. For the first time we have a simple, logical, and highly probable explanation for how cipher C3, "Names and Residences", was created.
Acknowledgement
This discovery would not have happened without the stimulating exchange of ideas I have enjoyed with Robert Lewxian over many months. He found the Lewxian Extensions which led to an explanation of C1 (See Page 27). He performed many analyses of the Beale ciphers by laying one code over the other, in what he called Layered Format, and discovered many intriguing similarities between the two in what appeared to be the same numbers jumbled. He concluded this was the result of some type of copycatting.
Manual typesetting 101 - continued
In those days of manual presses, to print 4000 copies of a book, you printed 4000 copies of each page, one page at a time, and then merged them manually for binding.
After a page was printed, the galley had no further use. There was no question of stacking these galleys on a shelf for some future edition. The type each galley contained was too valuable to leave lying around. The galley was disassembled and the sorts were used for other purposes.
Finishing the story
John W. Sherman had expended a great deal of time and energy on this project. The ciphers were particularly troublesome. For C2 he had typeset the whole Declaration of Independence and numbered all the words. He wrote the C2 message, coded it, and typeset the codes. In this laborious process he made many well-documented errors.
He wanted to get it over with. He cut corners with C1. As shown on Page 27, he took preset type from a previous publication of a religious nature, jumbled the type, inserted the Gillogly strings, coded it, and typeset the code. This was far less work than C2, but still too much.
What then was the easiest way to create the third cipher, C3? This had no message; all he needed were the numbers.
Eureka!
The C1 and C2 galleys were already printed. Why not use those numbers?
To prove this is what happened we need to find the segments of code in C3 that are comprised of the same digits from segments in C1 or C2. To do this, we need a computer program.
Description of the Clusters program
The Clusters program was written in MS Visual Basic 5.0 for this purpose. In the following descriptions and tables the word "string" is used to mean the chunk of characters that Sherman lifted from one galley to eventually place in the C3 galley. In counting the string length, we count only the numerical digits, not the commas or spaces.
The program works as follows:
Prior to each run, select the source cipher, C1 or C2, and select the string length to test. String lengths of 20 to 45 were tested.
Using C1 and a length of 30 as an example, the program begins by selecting the first 30 digits of C1. It then goes to C3 and compares them with digits 1-30, 2-31, 3-32, etc., until the end of C3. It is looking for equal digit content, i.e. a C3 string that is a jumbled version of the C1 string. It then goes back to C1, selects digits 2-31, and repeats the C3 process, and so on until the end of C1 is reached. The total number of C1/C2 to C3 string combinations is approximately four million for each string length.
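The search described above can be sketched in a few lines of Python. This is not the original Visual Basic program. The function names `digits_of` and `find_equivalent_strings` are invented for this illustration, and instead of comparing every window pair one by one the sketch indexes the C3 windows by their sorted digits first, which should give the same matches with far less work.

```python
def digits_of(codes):
    """Concatenate the digits of a list of cipher codes, dropping commas and spaces."""
    return "".join(str(code) for code in codes)


def find_equivalent_strings(source_digits, target_digits, length):
    """Return 0-based (source_start, target_start) pairs whose windows of
    `length` digits contain exactly the same digits in any order."""
    # Index every target window by its digit content (the digits, sorted).
    target_index = {}
    for j in range(len(target_digits) - length + 1):
        key = "".join(sorted(target_digits[j:j + length]))
        target_index.setdefault(key, []).append(j)

    # Slide over the source and look up each window's content in the index.
    matches = []
    for i in range(len(source_digits) - length + 1):
        key = "".join(sorted(source_digits[i:i + length]))
        for j in target_index.get(key, []):
            matches.append((i, j))
    return matches


# Toy data only, not the real ciphers.
source = digits_of([81, 34, 69])   # "813469"
target = digits_of([318, 96, 4])   # "318964"
print(find_equivalent_strings(source, target, 3))
```

With the full C1 and C3 transcripts as input, the function would be called once per string length from 20 to 45, mirroring the runs described above.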
As a specific example, the string in the graphic above from C1, sequence 136 to 144, 81-34-69-128-367-460-17-81-1, contains the same digits as the string from C3 sequence 8 to 15, 318-28-96-107-41-631-78-146. These digits are: 01111123344666778889. Note that any string can begin and end in the middle of a specific code. It just happened that Sherman picked it up that way.
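The digit content of this pair is easy to check by hand or by machine. The short Python check below uses only the two strings quoted above; the helper name `sorted_digits` is made up for the illustration.

```python
c1_codes = [81, 34, 69, 128, 367, 460, 17, 81, 1]   # C1 string quoted above
c3_codes = [318, 28, 96, 107, 41, 631, 78, 146]     # C3 string quoted above


def sorted_digits(codes):
    """Join the codes' digits, ignoring separators, and sort them."""
    return "".join(sorted("".join(str(code) for code in codes)))


print(sorted_digits(c1_codes))                             # 01111123344666778889
print(sorted_digits(c1_codes) == sorted_digits(c3_codes))  # True
```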
This is a very time-consuming computation. A 2.4 GHz processor took about 25 minutes for each string length tested, and somewhat less for the shorter lengths.
Program results
These are the numbers of matching strings found:
Strings found
Length | C1 | C2
20 | 12 | 2
21 | 7 | 0
22 | 5 | 1
23 | 4 | 1
24 | 3 | 2
25 | 1 | 1
26 | 0 | 0
27 | 2 | 0
28 | 4 | 0
29 | 3 | 0
30 | 2 | 0
31 | 4 | 0
32 | 4 | 0
33 | 2 | 0
34 | 2 | 0
No strings longer than 34 were found.
This was already very revealing. It clearly showed that C3 had been copied from C1 only, and that with string lengths in the low 20s and below we were encountering chance matches. There is little statistical probability that matching strings of 30 digits or more could be accidental.
On further analysis, many of the strings were found to be overlaps or substrings, a short one within a long one. Eliminating those left us with the following list of unique strings. In this table, the C3 strings are jumbled versions of the C1 strings, containing the same digits.
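How the overlaps and substrings were eliminated is not spelled out here, so the following Python sketch shows only one plausible rule, with invented names (`overlaps`, `unique_matches`): take the matches longest first, and discard any match whose digit range overlaps an already kept match in both ciphers. Each match is assumed to be a tuple of length, C1 start, and C3 start, with the starts counted in digit positions.

```python
def overlaps(start_a, len_a, start_b, len_b):
    """True if the digit ranges [start, start + len) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def unique_matches(matches):
    """matches: iterable of (length, c1_start, c3_start) tuples.
    Keep the longest match of each group that overlaps in both ciphers."""
    kept = []
    # Longest first, so a short string inside a long one is the one dropped.
    for length, c1, c3 in sorted(matches, key=lambda m: -m[0]):
        redundant = any(
            overlaps(c1, length, k_c1, k_len) and overlaps(c3, length, k_c3, k_len)
            for k_len, k_c1, k_c3 in kept
        )
        if not redundant:
            kept.append((length, c1, c3))
    # Report in C3 order, like the table.
    return sorted(kept, key=lambda m: m[2])


# Hypothetical positions, for illustration only.
raw = [(20, 100, 50), (24, 98, 48), (21, 99, 49), (20, 400, 300)]
print(unique_matches(raw))
```

Other rules, such as requiring strict containment rather than any overlap, would give a slightly different list.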
Equivalent string pairs
Length | Sequence | String
20 | C1: 136-144 | 81-34-69-128-367-460-17-81-1
 | C3: 8-15 | 318-28-96-107-41-631-78-146
22 | C1: 367-375 | 216-548-96-11-201-77-364-218-6
 | C3: 36-45 | 66-15-108-68-77-43-24-122-96-11
21 | C1: 409-417 | 11-150-29-38-46-172-85-194-39
 | C3: 99-107 | 96-214-218-311-43-89-51-90-75
20 | C1: 326-334 | 4-23-111-109-62-31-501-823-2
 | C3: 169-176 | 311-96-54-32-120-18-132-102
22 | C1: 445-454 | 716-275-74-83-11-426-89-72-84-1
 | C3: 186-196 | 6-87-75-47-21-29-37-81-44-18-126
24 | C1: 22-32 | 64-27-81-139-213-63-90-1120-8-15-3
 | C3: 197-206 | 5-132-160-181-203-76-81-299-314-3
34 | C1: 84-197 | 30-44-112-18-147-436-195-320-37-122-113-6-140-8
 | C3: 237-252 | 107-98-123-111-214-136-7-33-45-40-13-28-46-42-10
33 | C1: 464-476 | 101-84-16-79-23-16-81-122-324-403-912-227-936
 | C3: 252-264 | 196-227-344-198-203-247-116-19-8-212-230-31-6
21 | C1: 247-254 | 0-1101-365-92-88-181-275-346
 | C3: 334-342 | 1-305-618-951-320-18-124-78-6
20 | C1: 4-13 | 1-89-76-11-83-1629-48-94-63-1
 | C3: 373-382 | 81-89-16-7-81-39-96-14-43-216
34 | C1: 391-405 | 6-1817-51-39-210-36-3-19-540-232-22-141-617-84-2
 | C3: 380-393 | 14-43-216-118-29-55-109-136-172-213-64-8-227-30
29 | C1: 75-86 | 6-117-136-219-27-176-130-10-460-25-485-1
 | C3: 454-466 | 74-63-120-11-54-61-73-92-180-66-75-101-12
28 | C1: 244-254 | 19-26-33-10-1101-365-92-88-181-275-346
 | C3: 476-486 | 5-890-312-413-328-381-96-105-217-66-11
24 | C1: 16-26 | 95-84-341-975-14-40-64-27-81-139-2
 | C3: 540-548 | 8-343-417-845-951-124-209-49-617
20 | C1: 282-290 | 9-81-216-321-603-14-612-81-3
 | C3: 602-609 | 39-86-103-116-138-164-212-2
The images below are accurate transcripts from The Beale Papers. In these, I have underlined and color-coded most of the string pairs from the table above.
How can we be sure that the copying was from C1 to C3 and not the reverse?
By copying and jumbling from C1 to C3 the Gillogly strings and Lewxian Extensions were obliterated. If the reverse were true, we would have to accept that these were created accidentally. This is impossible.
What are the odds?
When this hypothesis was first presented, it was accompanied by statistical calculations which attempted to prove it was a certainty.
Subsequently, a visitor to this site proposed new tests. He suggested creating files containing the same codes as C1 but arranged in random order, and running the Clusters program again with these files in place of C1.
Ten such random files were created and tested. All of them produced some equivalent strings, but on average far fewer than C1. One random file, however, came close to the results above, producing sixteen equivalent strings versus C1's nineteen. On this basis, we must conclude that the hypothesis, while highly probable, is not certain.
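A control test of this kind can be sketched in Python as follows. The function names, the fixed random seed, and the choice to count raw window pairs (before any overlaps are eliminated) are all assumptions made for this illustration, so the numbers it produces are not directly comparable with the sixteen and nineteen quoted above.

```python
import random


def count_equivalent_strings(source_codes, target_codes, length):
    """Count window pairs of `length` digits with identical digit content."""
    source = "".join(str(code) for code in source_codes)
    target = "".join(str(code) for code in target_codes)
    # Tally how often each digit content occurs among the target windows.
    target_keys = {}
    for j in range(len(target) - length + 1):
        key = "".join(sorted(target[j:j + length]))
        target_keys[key] = target_keys.get(key, 0) + 1
    return sum(
        target_keys.get("".join(sorted(source[i:i + length])), 0)
        for i in range(len(source) - length + 1)
    )


def shuffled_control_counts(source_codes, target_codes, length, trials=10, seed=0):
    """Repeat the count with the source codes in random order, once per trial."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    counts = []
    for _ in range(trials):
        shuffled = list(source_codes)
        rng.shuffle(shuffled)  # same codes as the source, random order
        counts.append(count_equivalent_strings(shuffled, target_codes, length))
    return counts


# Toy data only; the real test would pass the C1 and C3 code lists.
print(count_equivalent_strings([81, 34, 69], [318, 96, 4], 3))
```

Comparing the count for the real C1 with the spread of the shuffled counts, at each string length, gives a rough sense of how unusual the real result is.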
Another C3 mystery solved
A peculiarity of C3 versus the other two ciphers is that the high order codes numbered 400 and above are concentrated in the second half. The graph below from Simon Ayrinhac clearly demonstrates this. Why?
The answer is simple: laziness, sloppiness, rush to finish. Sherman was tired, proud of his accomplishment, and this was the end of the effort. He lost track of his original notion of DOI numbering. As he jumbled number strings, these higher order code numbers were created and he didn't bother to correct them.
Additional proof
Type was cast in a soft metal alloy consisting mainly of lead, antimony, and tin. Individual pieces sometimes had manufacturing defects and, over time, they accumulated nicks, dents, and scratches. These unique features can be used to distinguish a specific sort from all others of the same character.
The images we have of The Beale Papers pamphlet are not of sufficient resolution for this purpose, but a microscopic examination of an original printed copy will inevitably show that the same sorts were used for the composition of C1 and C3.
With this objective in mind, I wish to obtain high resolution photographs of ciphers 1 and 3, Locality of the Vault and Names and Residences. If you know of an owner of an original pamphlet who might cooperate in this project, please contact me.
Errors, errors, errors
Sherman was very consistent in one regard: errors.
In his defense, he certainly did not think his dime novel would attain the fame it did. After all, he confessed it was all a joke with the Gillogly strings.
Final comment
All we need now is a signed confession.