14 April 2013

Mismatch in the Family


The fact that a language can be regarded as a bundle of coevolving replicators has important consequences for the family-tree model of language evolution. The family tree of a group of languages is the sum of the genealogies of those replicators that have formed coherent bundles (or “have been in the family”) for a sufficiently long time. Any branchings in the family tree tend to correspond to branching-points in the histories of the associated replicators. But this is only a statistical effect. Individual replicators may develop competing variants in the same speech community or invade different languages across communication barriers.

Let’s imagine a situation (illustrated with the diagram below) in which a speech community undergoes internal differentiation into more than two languages. If the process were abrupt, we would expect all such splits to be binary. But a speech community is normally a network of numerous local or social sub-communities. Their historical individuation as separate languages takes time and proceeds gradually; innovations spread easily between mutually intelligible dialects. We have prolonged transitional periods between “a dialect network” (say, Vulgar Latin) and “a group of languages” (say, French, Spanish, Italian, Romanian, etc.), rather than a clean separation point. Little wonder that if replicators produce variants during the dialectal period, those variants do not have to undergo neat resolution, all in the same way, in the emerging languages. Quite the opposite, a good deal of mismatch can be expected.

In our example, replicator A splits into two variants: A1 and A2, and replicator B splits into B1 and B2, all of them coexisting in the same language, Proto-XYZ, which then splits into X, Y, and Z. Let’s suppose that in each of the daughter languages only one variant of A or B survives and the other is lost. The variants may well end up segregated like this:

Conflicting testimony
  • X: A1, B1
  • Y: A1, B2
  • Z: A2, B2
The distribution of {A1, A2} suggests that X and Y share a common innovation (A1) and so are more closely related to each other than either of them is to Z. But the distribution of {B1, B2} shows a common innovation (B2) suggesting a closer relationship between Y and Z (to the exclusion of X). The mutual contradiction is only apparent, though. Not every common innovation of a cluster of languages arose after the separation of their most recent common ancestor from its relatives, and so none of them individually tells us much about the subgrouping of {X, Y, Z}. If nearly all replicators behave like A and only some are like B, we shall prefer {{X, Y}, Z}, but if the evidence is less robust, we may be unable to decide between {{X, Y}, Z} and {X, {Y, Z}} (or perhaps {Y, {X, Z}}, if we take more data into account).
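The weighing of evidence described above can be sketched as a simple tally. This is only an illustration, not a real subgrouping method: the function name `subgrouping_support` and the extra replicators C and D are invented for the purpose, and the sketch assumes we already know which variant of each replicator is the innovation (A1 and B2, as in the example).

```python
from collections import Counter

# Which variant of each replicator survives in each daughter language
# (the distribution from the example above).
survivors = {
    "X": {"A": "A1", "B": "B1"},
    "Y": {"A": "A1", "B": "B2"},
    "Z": {"A": "A2", "B": "B2"},
}

# Assumption taken from the text: A1 and B2 are the innovative variants.
innovative = {"A": "A1", "B": "B2"}


def subgrouping_support(survivors, innovative):
    """Count how many innovations each set of languages shares exclusively."""
    support = Counter()
    for replicator, variant in innovative.items():
        carriers = tuple(sorted(
            lang for lang, kept in survivors.items()
            if kept[replicator] == variant
        ))
        # An innovation found in one language only, or in all of them,
        # says nothing about subgrouping.
        if 1 < len(carriers) < len(survivors):
            support[carriers] += 1
    return support


support = subgrouping_support(survivors, innovative)
print(dict(support))

# Hypothetical extra replicators C and D that pattern like A.
for lang, n in (("X", 1), ("Y", 1), ("Z", 2)):
    survivors[lang].update({"C": f"C{n}", "D": f"D{n}"})
innovative.update({"C": "C1", "D": "D1"})

more_support = subgrouping_support(survivors, innovative)
print(dict(more_support))
```

With A and B alone, {X, Y} and {Y, Z} get one vote each and the tally cannot choose between them. Once most replicators pattern like A, {X, Y} wins by simple majority, which is all that "we shall prefer {{X, Y}, Z}" amounts to here.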

As a real-world example, consider three Slavic languages: Polish, Czech and Slovene. According to handbook classifications, Polish and Czech are members of the West Slavic grouping, characterised by a cluster of shared innovations, for example the regular development of Proto-Slavic *tj, *dj into consonants traditionally transcribed *c, *ʒ (IPA [ʦ, ʣ], a pronunciation preserved in Polish, where the spelling is c, dz; in Czech, *ʒ ends up as /z/, but *c remains /ʦ/). Slovene, a South Slavic language, shows a different phonetic development. The clusters in question are reflected as Slovene č and j (presumably via palatal stops, IPA [c, ɟ]):

Proto-Slavic | Polish | Czech | Slovene
*světja ‘candle’ | świeca | svíce | svẹča
*medja ‘boundary’ | miedza | meze (< *meʒa) | meja

There are, however, other changes that tell a different story. The Proto-Slavic non-initial sequences *or, *er, when followed by a consonant, developed in different ways in different parts of Slavic. In Polish and in some other West Slavic languages the vowel and the consonant simply swapped places, yielding *ro and *re, respectively. But in the Slavic dialects ancestral to Czech and Slovak they followed the South Slavic pattern: the outcome was *ra, *rě (with vowels that can be regarded, in Proto-Slavic terms, as the tense counterparts of lax *o and *e, respectively):

Proto-Slavic | Polish | Czech | Slovene
*morkъ ‘twilight’ | mrok | mrak | mrak
*berza ‘birch’ | brzoza (< *breza) | bříza (< *brěza) | brẹza (< *brěza)

It would be fair to say that the Czech-Slovak group is on the whole “West Slavic” but “South Slavic” in several respects, including the treatment of vowel + *r sequences. As could be expected, there are other complications as well. For example, the groups *tj, *dj do not yield the same outcome everywhere in South Slavic. In Serbo-Croatian we find ć, đ (= IPA [ʨ, ʥ]), which can plausibly be derived from the same source as the Slovene variants, but in Bulgarian (as well as Old Church Slavonic) the development is highly idiosyncratic: št, žd. It is unlikely that the Slovene/Serbo-Croatian type and the Bulgarian one represent a single “Proto-South Slavic” innovation (in fact, the former seems more akin to the West Slavic development). The linguistic diversity of the South Slavic dialectal network must have been considerable even before it started to break up into separate languages, and some of the pre-split variation still persists. The same is true of West Slavic, which is hardly uniform and whose separation from South Slavic on the one hand and East Slavic on the other is not entirely consistent: the family trees of individual replicators often fail to match each other. In such cases it is difficult to represent the historical relationships among the members of the linguistic grouping as a neat phylogeny with clearly distinct branches.

6 comments:

  1. Hi Piotr,

    Out of curiosity, do you know of any asymmetries like the above in the case of lower-level Slavic branchings such as Eastern/Western South Slavic?

    Regards

  2. I'm sure there are some, although it may be disputable which of them go back to "Late Common Slavic", and which result from more recent diffusion in the Balkan Sprachbund. For example, Macedonian clusters with Bulgarian in most respects, but there are also features connecting it with Serbian. The treatment of *tj, *dj (Macedonian /c, ɟ/ rather than /ʃt, ʒd/) is of the "Western South" type, though it has been argued that it reflects relatively recent South Serbian influence.

  3. My impression is that "South Slavic" is just lumping together all extant Slavic languages that are neither West nor East Slavic. I wouldn't be able to name any common South Slavic feature that wouldn't be shared by some East or West Slavic languages as well.

  4. I agree. It's an areal grouping of mixed origin, not reducible to a "Proto-South Slavic" language. There was no single migration of Slavic speakers into the Balkan region. I didn't say so explicitly above, but Proto-XYZ in my diagram should actually be identified with Proto-/Common-Slavic (with only three selected descendants shown in the picture as a matter of didactic simplification).

  5. Hi Piotr,
    Would you say that East Slavic is the most homogeneous 'grouping'?
