
Fundamental Principle of the Comparative Method


In the work to follow I pose an a priori linguistic universal of Historical Linguistics. I then try to develop a mathematical procedure for measuring the degree of resemblance between two words, in order to determine whether they have a common linguistic origin. I also assert that if there are a number of correspondences between two languages, then that could not be by chance. It is not necessary for the Historical Linguist to do any math, but I have been working out some of the math to show that the results Historical Linguists obtain with the comparative method are very real and very certain, and are not artistic or intuitive guesswork, as is implied in much of the literature on this subject.

If a given word of language A is compared with a word of some language B and both have a phonetic resemblance and both have a semantic resemblance, then the two words define a correspondence.

If there is a correspondence between a word of language A and a word of language B, then both words probably have a common linguistic origin. If there are two correspondences between language A and language B, then it is highly probable that each correspondence has behind it a common linguistic origin. If there are three correspondences, then that is strongly indicative of common linguistic origin for each of the three. If there are four correspondences or more, then the presence of such correspondences could not be by chance. If one or two of many correspondences are based on a chance resemblance, there would be no way of knowing that they were by chance.

If two words define a correspondence, that does not mean that there is an immediate relation between the two. There may have been other languages between them through which the word has been borrowed, or there may have been different stages of the language over time, so the word form may have changed in both the proto-language of A and the proto-language of B.

On the matter of phonetic resemblance it is convenient to examine the consonants of words, since vowels change more often. First classify all consonants into 24 categories. So Arabic gh, Greek g, and hard g would be in the category [g]. Common b, retroflex b, and Greek b (labiodental voiced plosive) would be in the category [b]. Aspirated kh, trilled kh, palatal c, and k would be in the category [k]. We would have the following 24 phonetic categories: b, g, d, w, v, z, h, j, y, k, l, m, n, sh, zh, p, f, ch, q, r, s, t, th, dh. We can then combine those phonetic categories into groups of related phonetic values. We do this by observing which sounds readily change into other sounds, or which common sound laws we would expect to find. Some examples are /zh/ > /z/, /b/ > /p/, /m/ > /n/. We can group [b] and [p] together, and we can group [g] and [k] together. In this way we can conservatively estimate the number of minimal phonetic groups at about 10. Unlike the 24 categories, these minimal groups will vary depending on which sound laws are in view. Some number of sound laws should be defined in advance, so that they can serve as a reference indicating what is considered a 'close resemblance'.
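The grouping step can be sketched in Python. This is only an illustration: the function name minimal_groups is hypothetical, and the list of sound-law pairs contains only the four pairs named above. With just those four pairs the 24 categories reduce to 20 groups, so reaching about 10 groups would require further sound laws chosen by the analyst.

```python
# The 24 consonant categories listed in the text.
CATEGORIES = ["b", "g", "d", "w", "v", "z", "h", "j", "y", "k", "l", "m",
              "n", "sh", "zh", "p", "f", "ch", "q", "r", "s", "t", "th", "dh"]

# Only the pairs named in the text; any additions are the analyst's assumption.
SOUND_LAW_PAIRS = [("zh", "z"), ("b", "p"), ("m", "n"), ("g", "k")]


def minimal_groups(categories, pairs):
    """Merge categories linked by a sound law into groups (simple union-find)."""
    parent = {c: c for c in categories}

    def find(c):
        # Follow parent links until reaching the representative of the group.
        while parent[c] != c:
            c = parent[c]
        return c

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for c in categories:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())


groups = minimal_groups(CATEGORIES, SOUND_LAW_PAIRS)
print(len(groups))  # 20 with only the four pairs above
```

Because the groups are derived from the list of pairs, changing the predefined sound laws changes the groups, which matches the remark that the minimal groups depend on what sound laws are in view.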

The following discussion concerns computing the degree of phonetic resemblance. For example take Old English 'cild' meaning "child" and Ancient Hebrew 'yld' meaning "child". The 'l' and 'd' correspond, so they are each assigned a value of 1; the initial 'c' and 'y' are not matched at this stage and contribute nothing. If there were a correspondence which required a sound law, say /y/ > /j/, then we would assign a value of .5. If there were a correspondence where two different phonemes fit into the same phonetic category as defined above, then we would assign a value of .75. The values are added and then divided by the average number of consonants in the two words. So in this case the correspondence is (1+1)/3 = .67, which is about 67 percent. This leads to the definition of a phonetic correspondence: a phonetic correspondence is one in which the percent correspondence is at least 50 percent. Principles of analysis involving vowels are not developed in this article, but certainly words beginning with vowels may sometimes need to have the respective vowels taken into consideration, and semivowel to vowel correspondence should be taken into consideration (such as /y/ > /ee/ and /w/ > /oo/).
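The scoring rule can be written out as a short Python sketch. Several things here are assumptions rather than part of the method as stated: the function name phonetic_score is hypothetical, consonants are compared position by position (the text does not say how the consonants are to be aligned), and the same-category and sound-law pairs are supplied by the caller.

```python
def phonetic_score(cons_a, cons_b, same_category=(), sound_laws=()):
    """Score two consonant lists: 1 for identical consonants, .75 for
    different phonemes in the same category, .5 for a sound law, else 0.
    The sum is divided by the average number of consonants."""
    if not cons_a or not cons_b:
        raise ValueError("each word needs at least one consonant")

    # Store pairs as unordered sets so (a, b) and (b, a) are treated alike.
    category_pairs = {frozenset(p) for p in same_category}
    law_pairs = {frozenset(p) for p in sound_laws}

    total = 0.0
    # Assumption: consonants are aligned by position in the word.
    for a, b in zip(cons_a, cons_b):
        if a == b:
            total += 1.0
        elif frozenset((a, b)) in category_pairs:
            total += 0.75
        elif frozenset((a, b)) in law_pairs:
            total += 0.5

    average_length = (len(cons_a) + len(cons_b)) / 2
    return total / average_length


# 'cild' and 'yld': l and d match, c and y do not.
score = phonetic_score(["c", "l", "d"], ["y", "l", "d"])
print(round(score, 2))  # 0.67
```

A score of at least .5 would then count as a phonetic correspondence under the definition above.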

The determination of semantic correspondence is as follows. If the two meanings are the same assign a value of 1. If the two meanings are synonyms of each other then assign a value of .75. If the meanings are closely related then assign a value of .5.

The percent phonetic correspondence is added to the percent semantic correspondence, and the resulting number is divided by 2. If the final result is at least 50 percent then we have a correspondence. A correspondence should be at least 50 percent to suggest a common linguistic origin. For two-consonant words the percent correspondence should be at least 56 percent. Words of one consonant should have a correspondence of at least 62 percent to suggest a common linguistic origin.
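The combination of the two scores and the thresholds can be sketched as follows. The names SEMANTIC_VALUES, threshold and is_correspondence are hypothetical, and the labels "same", "synonym" and "related" are only shorthand for the three semantic cases described above. Unrelated meanings are given a value of 0 here, which is an assumption, since the text assigns no value to that case.

```python
# Semantic values from the text; the keys are illustrative labels.
SEMANTIC_VALUES = {"same": 1.0, "synonym": 0.75, "related": 0.5}


def threshold(consonant_count):
    """Minimum combined score for a word with this many consonants."""
    if consonant_count <= 1:
        return 0.62
    if consonant_count == 2:
        return 0.56
    return 0.50


def is_correspondence(phonetic, semantic_relation, consonant_count):
    """Return the combined score and whether it meets the threshold."""
    # Assumption: meanings outside the three listed cases score 0.
    semantic = SEMANTIC_VALUES.get(semantic_relation, 0.0)
    combined = (phonetic + semantic) / 2
    return combined, combined >= threshold(consonant_count)


# Three consonants, phonetic score 2/3, identical meaning.
combined, ok = is_correspondence(2 / 3, "same", 3)
print(round(combined, 2), ok)  # 0.83 True
```

With a phonetic score of .5 and merely related meanings, a two-consonant word would score .5 and fall short of the 56 percent threshold.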

Sometimes there does not seem to be a correspondence between two words, but in fact there is. This article does not deal with that situation.

The words 'cild' and 'yld', as discussed above, have a correspondence of (.67 + 1)/2 = .83 which is 83 percent. The sound law /y/ > /j/ is known to have occurred in Hebrew a long time ago, so we can compare /jild/* of Hebrew with /child/, and see that /ch/ is simply voiceless /j/. Note /jld/* or /jild/* was reconstructed based upon the regular distribution of the sound law /y/>/j/ in Hebrew.

A preliminary estimate of the number of semantic synonym categories is about 4800 for English. Let us say that there are about 5000 synonym categories, and about 5 or 10 related semantemes for a word on average. We wish to find the probability of a semantic correspondence. Let us say the phonetic correspondence is a given. Dividing the 5000 synonym categories by 10 related semantemes gives the number 500. So there is a 1 in 500 chance that two words of two particular languages would have a semantic correspondence; that chance is .002 (two tenths of 1 percent). Treating the correspondences as independent, the probability of two semantic correspondences is the product (.002)(.002) = .000004. The probability of four correspondences, given phonetic correspondence, is (.002)(.002)(.002)(.002) = .000000000016. There are only about 550 language families in the world, so the probability of four correspondences in two given languages by chance is effectively zero.
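The arithmetic can be checked with a few lines of Python. This sketch takes the figures of 5000 synonym categories and 10 related semantemes at face value, uses 1/500 = .002 throughout, and assumes, as the multiplication does, that each correspondence is independent of the others. The variable and function names are hypothetical.

```python
from fractions import Fraction

synonym_categories = 5000   # rounded estimate from the text
related_semantemes = 10     # upper figure from the text

# Chance of one semantic correspondence, given a phonetic correspondence.
p_one = Fraction(related_semantemes, synonym_categories)  # 1/500


def chance_of_n(n):
    """Chance of n independent semantic correspondences (exact fraction)."""
    return p_one ** n


for n in (1, 2, 4):
    print(n, chance_of_n(n))
```

Exact fractions are used to avoid floating-point rounding; four correspondences come out at 1 in 62,500,000,000, which is 1.6 hundred-billionths.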

In conclusion, if we have a substantial number of correspondences between two languages, then we can be certain that there is a common linguistic origin for each of the correspondences.

