
Page 29 : Beale Ciphers Analyses

C3: More typesetting wizardry?

   October 18, 2008 -

    C3 was the last bastion of hope for treasure hunters and cryptanalysts alike. It alone had yielded no clues, no secrets. Here, surely, was the true treasure map or, at least, the author's confession.

    These dreams are now faded. For the first time we have a simple, logical, and highly probable explanation for how cipher C3, "Names and Residences", was created.

Acknowledgement

    This discovery would not have happened without the stimulating exchange of ideas I have enjoyed with Robert Lewxian over many months. He found the Lewxian Extensions which led to an explanation of C1 (See Page 27). He performed many analyses of the Beale ciphers by laying one code over the other, in what he called Layered Format, and discovered many intriguing similarities between the two in what appeared to be the same numbers jumbled. He concluded this was the result of some type of copycatting.

Manual typesetting 101 - continued

   In those days of manual presses, to print 4000 copies of a book, you printed 4000 copies of each page, one page at a time, and then merged them manually for binding.

    After a page was printed, the galley had no further use. There was no question of stacking these galleys on a shelf for some future edition. The value of the type inventory it contained was too great to let it lie around. The galley was disassembled and the sorts used for other purposes.

Finishing the story

    John W. Sherman had expended a great deal of time and energy on this project. The ciphers were particularly troublesome. For C2 he had typeset the whole Declaration of Independence and numbered all the words. He wrote the C2 message, coded it, and typeset the codes. In this laborious process he made many well-documented errors.

    He wanted to get it over with. He cut corners with C1. As shown on Page 27, he took preset type from a previous publication of a religious nature, jumbled the type, inserted the Gillogly strings, coded it, and typeset the code. This was far less work than C2, but still too much.

    What then was the easiest way to create the third cipher, C3? This had no message; all he needed were the numbers.

Eureka!

    The C1 and C2 galleys were already printed. Why not use those numbers?

how.jpg (40880 bytes)

    To prove this is what happened, we need to find segments of code in C3 that are composed of the same digits as segments in C1 or C2. To do this, we need a computer program.

Description of the Clusters program

   The Clusters program was written in MS Visual Basic 5.0 for this purpose. In the following descriptions and tables the word "string" is used to mean the chunk of characters that Sherman lifted from one galley to eventually place in the C3 galley. In counting the string length, we count only the numerical digits, not the commas or spaces.

    The program works as follows:

    Prior to each run, select the source cipher, C1 or C2, and select the string length to test. String lengths of 20 to 45 were tested.

    Using C1 and a length of 30 as an example, the program begins by selecting the first 30 digits of C1. It then goes to C3 and compares the contents of digits 1-30, 2-31, 3-32, etc., until the end of C3. It is looking for equal digit content, i.e. a C3 string that is a jumbled version of the C1 string. It then goes back to C1, selects digits 2-31, and repeats the C3 process, and so on until the end of C1 is reached. The total number of C1/C2 to C3 string combinations is approximately four million for each string length.
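    The search described above can be sketched in Python. This is not the original Visual Basic program, whose source is not shown here, but a minimal sketch under assumptions: the function names are hypothetical, positions are 0-based digit offsets rather than the code sequence numbers used in the tables, and instead of comparing every pair of windows directly it indexes the C3 windows by their digit counts, which should find the same pairs with far fewer comparisons.

```python
from collections import defaultdict

DIGITS = "0123456789"


def digits_only(cipher_text):
    """Keep only the numerical digits, dropping commas, spaces and hyphens."""
    return "".join(ch for ch in cipher_text if ch in DIGITS)


def window_signatures(digits, length):
    """Yield (start, signature) for every window of `length` digits.

    The signature is a tuple of ten counts, one per digit 0-9, so two
    windows share a signature exactly when one is a jumble of the other.
    """
    if len(digits) < length:
        return
    counts = [0] * 10
    for ch in digits[:length]:
        counts[int(ch)] += 1
    yield 0, tuple(counts)
    for start in range(1, len(digits) - length + 1):
        counts[int(digits[start - 1])] -= 1           # digit leaving the window
        counts[int(digits[start + length - 1])] += 1  # digit entering the window
        yield start, tuple(counts)


def find_jumbled_matches(source, target, length):
    """Return (source_start, target_start) pairs of equal digit content."""
    by_signature = defaultdict(list)
    for t_start, sig in window_signatures(target, length):
        by_signature[sig].append(t_start)
    matches = []
    for s_start, sig in window_signatures(source, length):
        for t_start in by_signature.get(sig, []):
            matches.append((s_start, t_start))
    return matches


# Toy data, not the real ciphers: "12345" searched against "943217".
toy_source = digits_only("12, 34, 5")
toy_target = digits_only("9, 43, 2, 17")
print(find_jumbled_matches(toy_source, toy_target, 3))
```

    With the real ciphers, `source` would be the digits of C1 or C2 and `target` the digits of C3, run once per string length from 20 to 45.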

    As a specific example, the string in the graphic above from C1, sequence 136 to 144, 81-34-69-128-367-460-17-81-1, contains the same digits as the string from C3 sequence 8 to 15, 318-28-96-107-41-631-78-146. These digits are: 01111123344666778889. Note that any string can begin and end in the middle of a specific code. It just happened that Sherman picked it up that way.
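    The digit content of this example can be checked with a few lines of Python. The helper name below is hypothetical; it simply drops the hyphens and sorts the remaining digits.

```python
def sorted_digits(codes):
    """Return the digits of a hyphen-separated code string in sorted order."""
    return "".join(sorted(ch for ch in codes if ch in "0123456789"))


c1_string = "81-34-69-128-367-460-17-81-1"   # C1, sequence 136 to 144
c3_string = "318-28-96-107-41-631-78-146"    # C3, sequence 8 to 15

print(sorted_digits(c1_string))   # prints 01111123344666778889
print(sorted_digits(c3_string))   # prints 01111123344666778889
print(sorted_digits(c1_string) == sorted_digits(c3_string))   # prints True
```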

    This is a very time-consuming computation. A 2.4 GHz processor took about 25 minutes for each string length tested, and somewhat less for the shorter lengths.

Program results

    These are the numbers of matching strings found:

Strings found
Length C1 C2
20 12 2
21 7 0
22 5 1
23 4 1
24 3 2
25 1 1
26 0 0
27 2 0
28 4 0
29 3 0
30 2 0
31 4 0
32 4 0
33 2 0
34 2 0
No strings were found longer than 34

    This was already very revealing. It clearly showed that C3 had been copied from C1 only, and that with string lengths in the low 20s and below we were encountering matching strings by random chance. There is little statistical probability that matching strings of 30 digits or more could be accidental.

    On further analysis, many of the strings were found to be overlaps or substrings, a short one within a long one. Eliminating those left us with the following list of unique strings. In this table, the C3 strings are jumbled versions of the C1 strings, containing the same digits.
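    How the overlaps were eliminated is not spelled out, so the following is only one plausible way to do it, sketched in Python. It assumes each match is a hypothetical tuple of (C1 start, C3 start, length) in digit offsets, and it drops a match only when both its C1 span and its C3 span lie entirely inside those of a longer match already kept. Partial overlaps are left alone.

```python
def drop_contained(matches):
    """Remove matches that sit wholly inside a longer kept match.

    matches: list of (src_start, tgt_start, length) tuples.
    Returns the surviving matches, longest first.
    """
    kept = []
    for match in sorted(matches, key=lambda m: -m[2]):
        s, t, n = match
        contained = any(
            ks <= s and s + n <= ks + kn and kt <= t and t + n <= kt + kn
            for ks, kt, kn in kept
        )
        if not contained:
            kept.append(match)
    return kept


# Toy data: the 20-digit match lies inside the 30-digit one on both sides.
toy_matches = [(10, 50, 20), (5, 45, 30), (100, 200, 22)]
print(drop_contained(toy_matches))   # prints [(5, 45, 30), (100, 200, 22)]
```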

 

Equivalent string pairs

Length Sequence String
20 C1: 136-144 81-34-69-128-367-460-17-81-1
C3: 8-15 318-28-96-107-41-631-78-146
22 C1: 367-375 216-548-96-11-201-77-364-218-6
C3: 36-45 66-15-108-68-77-43-24-122-96-11
21 C1: 409-417 11-150-29-38-46-172-85-194-39
C3: 99-107 96-214-218-311-43-89-51-90-75
20 C1: 326-334 4-23-111-109-62-31-501-823-2
C3: 169-176 311-96-54-32-120-18-132-102
22 C1: 445-454 716-275-74-83-11-426-89-72-84-1
C3: 186-196 6-87-75-47-21-29-37-81-44-18-126
24 C1: 22-32 64-27-81-139-213-63-90-1120-8-15-3
C3: 197-206 5-132-160-181-203-76-81-299-314-3
34 C1: 184-197 30-44-112-18-147-436-195-320-37-122-113-6-140-8
C3: 237-252 107-98-123-111-214-136-7-33-45-40-13-28-46-42-10
33 C1: 464-476 101-84-16-79-23-16-81-122-324-403-912-227-936
C3: 252-264 196-227-344-198-203-247-116-19-8-212-230-31-6
21 C1: 247-254 0-1101-365-92-88-181-275-346
C3: 334-342 1-305-618-951-320-18-124-78-6
20 C1: 4-13 1-89-76-11-83-1629-48-94-63-1
C3: 373-382 81-89-16-7-81-39-96-14-43-216
34 C1: 391-405 6-1817-51-39-210-36-3-19-540-232-22-141-617-84-2
C3: 380-393 14-43-216-118-29-55-109-136-172-213-64-8-227-30
29 C1: 75-86 6-117-136-219-27-176-130-10-460-25-485-1
C3: 454-466 74-63-120-11-54-61-73-92-180-66-75-101-12
28 C1: 244-254 19-26-33-10-1101-365-92-88-181-275-346
C3: 476-486 5-890-312-413-328-381-96-105-217-66-11
24 C1: 16-26 95-84-341-975-14-40-64-27-81-139-2
C3: 540-548 8-343-417-845-951-124-209-49-617
20 C1: 282-290 9-81-216-321-603-14-612-81-3
C3: 602-609 39-86-103-116-138-164-212-2

    The images below are accurate transcripts from The Beale Papers. In these, I have underlined and color-coded most of the string pairs from the table above.

NamesResidences.jpg (202236 bytes)

Locality.jpg (168976 bytes)

    How can we be sure that the copying was from C1 to C3 and not the reverse?

    By copying and jumbling from C1 to C3 the Gillogly strings and Lewxian Extensions were obliterated. If the reverse were true, we would have to accept that these were created accidentally. This is impossible.

What are the odds?

   When this hypothesis was first presented, it was accompanied by statistical calculations which attempted to prove it was a certainty.

    Subsequently, a visitor to this site proposed new tests. He suggested that files be created containing the same codes as C1 but arranged in random order, and that the Clusters program be run again with these files instead of C1.

    Ten such random files were created and tested. All of them produced some equivalent strings, but on average far fewer than C1. One random file, however, came close to the results above, producing sixteen equivalent strings versus C1's nineteen. On this basis, we must conclude that the hypothesis, while highly probable, is not certain.
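    A control test of this kind can be sketched in Python as follows. The function names, the use of seeds 0 to 9, and the toy data are all assumptions made for illustration. Note also that this sketch counts every matching pair of windows, whereas the figures quoted above are counts of unique strings after overlaps were removed, so the numbers are not directly comparable.

```python
import random
from collections import Counter


def shuffled_copy(codes, seed):
    """Return the same codes in a random order, reproducible by seed."""
    rng = random.Random(seed)
    shuffled = list(codes)
    rng.shuffle(shuffled)
    return shuffled


def count_jumbled_matches(source_codes, target_codes, length):
    """Count window pairs of `length` digits with equal digit content."""
    src = "".join(str(c) for c in source_codes)
    tgt = "".join(str(c) for c in target_codes)
    tgt_sigs = Counter(
        "".join(sorted(tgt[i:i + length]))
        for i in range(len(tgt) - length + 1)
    )
    return sum(
        tgt_sigs["".join(sorted(src[i:i + length]))]
        for i in range(len(src) - length + 1)
    )


def control_counts(source_codes, target_codes, length, trials=10):
    """Match counts for `trials` randomly reordered copies of the source."""
    return [
        count_jumbled_matches(shuffled_copy(source_codes, seed), target_codes, length)
        for seed in range(trials)
    ]


# Toy data, not the real ciphers.
toy_c1 = [81, 34, 69, 128, 367, 460, 17, 81, 1]
toy_c3 = [318, 28, 96, 107, 41, 631, 78, 146]
print(count_jumbled_matches(toy_c1, toy_c3, 20))
print(control_counts(toy_c1, toy_c3, 20))
```

    Comparing the count for the real C1 against the spread of counts from the shuffled copies gives a rough sense of how unusual the real result is.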

Another C3 mystery solved

   A peculiarity of C3 versus the other two ciphers is that the high order codes numbered 400 and above are concentrated in the second half. The graph below from Simon Ayrinhac clearly demonstrates this. Why?

    C3twoparts.jpg (90613 bytes)

    The answer is simple: laziness, sloppiness, rush to finish. Sherman was tired, proud of his accomplishment, and this was the end of the effort. He lost track of his original notion of DOI numbering. As he jumbled number strings, these higher order code numbers were created and he didn't bother to correct them.
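    The concentration of high codes in the second half of C3 is easy to check once the cipher is available as a list of numbers. A minimal Python sketch, with a hypothetical function name and toy data standing in for the real cipher:

```python
def high_code_counts(codes, threshold=400):
    """Count codes at or above `threshold` in each half of the list."""
    half = len(codes) // 2
    first = sum(1 for c in codes[:half] if c >= threshold)
    second = sum(1 for c in codes[half:] if c >= threshold)
    return first, second


# Toy data, not the real C3.
print(high_code_counts([1, 500, 2, 3, 400, 999]))   # prints (1, 2)
```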

Additional proof

    Type was cast in soft metal alloy consisting mainly of lead, antimony, and tin. There were sometimes manufacturing defects in individual pieces, and, over time, they accumulated nicks, dents, and scratches. These unique features can be used to identify a specific sort versus all others of the same character.

    The images we have of The Beale Papers pamphlet are not of sufficient resolution for this purpose, but a microscopic examination of an original printed copy will inevitably show that the same sorts were used for the composition of C1 and C3.

    With this objective in mind, I wish to obtain high resolution photographs of ciphers 1 and 3, Locality of the Vault and Names and Residences. If you know of an owner of an original pamphlet who might cooperate in this project, please contact me.

Errors, errors, errors

    Sherman was very consistent in one regard: errors.

    In his defense, he certainly did not think his dime novel would attain the fame it did. After all, he confessed it was all a joke with the Gillogly strings.

Final comment

    All we need now is a signed confession.
