I found the lexical distance map fascinating, but the closer I studied it, the more things bothered me. Trawling through the net, I found Tyshchenko’s Cyrillic versions. So I sat down this weekend and translated, adjusted, combined and updated them to create my own version here:
Lexical Distance Among Languages of Europe
Here is a list of my changes.
First, the abbreviations. The Romance language abbreviated Pro, I found out after some research, stands for Provençal, and Ga, I assume, stands for Scottish Gaelic. Tyshchenko’s abbreviations are in Cyrillic; those from Elms are translated into Latin script. Some of the Latin-script abbreviations correspond to ISO 639-3, some do not. I changed them all to the ISO 639-3 standard.
Second, the legend shows bubbles that represent the number of speakers of each language. The bubble area corresponds correctly to the number of speakers, but only in logarithmic classes: >3 000 speakers, >30 000 speakers, >300 000 speakers, etc. That means that the Ukrainian bubble, with 37 million speakers, is the same size as the German or Russian bubble with 95 million speakers (in Europe), and the Icelandic bubble, with 300 000 speakers, is the same size as the bubbles of languages with 2.5 to 3 million speakers. I recalculated the diameter of each bubble and adjusted it to the number of speakers of that language in Europe (source).
Third, the Elms diagrams depict several Indo-European branches: Albanian, Baltic, Celtic, Germanic, Hellenic, Romance (Italic) and Slavic, as well as some Uralic languages. Missing from Europe are the Indo-European Armenian branch (where does Europe end?), all Turkic languages, all Ibero-Caucasian and Kartvelian languages, the sole European Semitic language, Maltese, and finally Basque. Taking the 66 150 Faroese speakers as the cut-off, there are 1 Armenian, 1 Basque, 2 Germanic, 6 Italic, 2 Kartvelian, 5 Caucasian, 1 Semitic, 8 Turkic, 2 Slavic and 14 Uralic languages missing (again depending on where Europe ends and what is considered a language). Adding the Kartvelian, Turkic, Indo-Iranian, Ibero-Caucasian and Uralic languages seemed too daunting a task. I did add many of the other missing languages.
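To make the effect of the logarithmic classes concrete, here is a minimal Python sketch. The class boundaries (3 000, 30 000, 300 000 and so on) come from the legend; the function names and the rule that bubble area is proportional to the number of speakers are assumptions made for illustration, not necessarily the formula used for the map.

```python
import math

def class_floor(speakers: int, base: int = 3_000) -> int:
    """Lower bound of the logarithmic class (3 000, 30 000, ...) a language falls into."""
    if speakers < base:
        raise ValueError("below the smallest class in the legend")
    floor = base
    while speakers >= floor * 10:
        floor *= 10
    return floor

def diameter_ratio(speakers_a: int, speakers_b: int) -> float:
    """Diameter of bubble A relative to bubble B, ASSUMING area is proportional to speakers."""
    return math.sqrt(speakers_a / speakers_b)

# Ukrainian (37 million) and German (95 million) land in the same class,
# so a class-based legend draws them at the same size.
print(class_floor(37_000_000), class_floor(95_000_000))

# With area-proportional scaling the German bubble would be visibly wider.
print(round(diameter_ratio(95_000_000, 37_000_000), 2))
```

Under that assumption, two languages in the same class can still differ noticeably in diameter, which is the distortion described above.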
Additions: Basque, Semitic, Indo-Iranian and Armenian
Adding Basque and Maltese was not that much of a hassle. Basque is at a lexical distance of 70 to the left of Spanish and 95 from Berber (which is in North Africa and is not included). Maltese is 70 down from Italian and an undefined distance from Greek. The Indo-Iranian Romani language is widespread and would have enough speakers to be included, but it diverges so much within itself and is so widely dispersed that it is hard to determine a lexical distance. To get a lexical connection to Armenian I would have to add almost all the others that are missing; my apologies for not attempting it.
Since Scots has no official status or clear boundary, I did not include it. Luxembourgish, on the other hand, has official status and would be close to German and Dutch with a link to French; I placed it where I assume it would be, but with no distances marked. Frisian is presently defined as a language group: Northern Frisian, Eastern Frisian and Western Frisian. Sadly, usage of the northern and eastern languages has diminished, and they have 10 000 and 2 250 speakers respectively. West Frisian, with roughly 467 000 speakers, is in good health. I could not figure out which Frisian Tyshchenko was referring to, but I assume West Frisian and labelled the bubble accordingly.
Norwegian has two official written forms, Bokmål and Nynorsk. I considered combining the two but decided against it. Tyshchenko assessed both separately to determine their divergence, which makes sense: he may not speak Norwegian or many of the other languages researched, so he had to rely on comparing syntax, vocabulary, morphology, etc. to determine lexical distance, and he did not have the resources to survey the Norwegian language to determine a standard Norwegian (there is none). So falling back on those two written forms is the best he could do. It also beautifully displays the relationship between Nynorsk, Bokmål and other languages: Bokmål is closer to Danish than to Nynorsk, and Nynorsk is closer to Icelandic than any other mainland language is.
Back to Provençal. Elms translated провансальська as Provençal, which is probably the correct translation of the Ukrainian word. Provençal is considered either an Occitan dialect or its own language, depending on who you ask. I am going to assume that Tyshchenko was assessing the lexical distance of the Occitan language, and I re-labelled Pro as Oci. Or did Tyshchenko mean Franco-Provençal? Probably not: the line to Spanish is stronger than the line to French. I assume Franco-Provençal is missing and that a bubble labelled Frp should be placed close to Oci with links to French and Italian.
Walloon (Wln) has archaisms coming from Latin and significant borrowings from the Germanic languages Dutch, Luxembourgish and German. Picard has no official status in France but does in Belgium, and it straddles the border between the two nations around Nord-Pas-de-Calais and Picardy. Asturian is recognized as typologically and phylogenetically close to Galician-Portuguese and Castilian, and less so to Navarro-Aragonese (Castilian and Aragonese do not make the cut-off and are not included).
The greatest number of Aromanian speakers is found in Greek Macedonia, with substantial numbers also found in Albania, Bulgaria, Serbia and FYR Macedonia, which also officially recognizes it. Compared to Romanian, the Eastern Romance language Aromanian (Rup) has been influenced more by Greek than by Slavic. I placed Rup close to Ron in the direction of Grk.
Slavic adjustments and updates
It has been a while since the Croats and Serbs decided that they do not speak the same language, and this is accurately depicted above, but the Bosnians and Montenegrins have also decided that they have their own languages. Thus I added a Bos bubble and a Mis1 bubble (for Montenegrin, which is missing an ISO code) right next to the Hrv and Srp bubbles. In Elms’s translation there is a bubble named Sr between the Czech and Polish bubbles; in Tyshchenko’s 1999 diagram there are two bubbles there. I assume the larger one is Silesian and the smaller one Sorbian. I added both, even though Sorbian does not make the cut-off.
That leaves me with 54 languages, representing 670 million people; Europe has an estimated population of 740 million. It checks out.
|Code||Language||Branch or Family||Speakers||Speakers ÷ 20 million|
|deu||German||Germanic||95 000 000||4.75|
|rus||Russian||Slavic||95 000 000||4.75|
|fra||French||Italic-Romance||60 000 000||3.00|
|ita||Italian||Italic-Romance||57 700 000||2.89|
|eng||English||Germanic||55 600 000||2.78|
|spa||Spanish||Italic-Romance||45 000 000||2.25|
|pol||Polish||Slavic||38 663 000||1.93|
|ukr||Ukrainian||Slavic||37 000 000||1.85|
|ron||Romanian||Italic-Romance||23 782 000||1.19|
|nld||Dutch||Germanic||21 944 000||1.10|
|grk||Greek||Hellenic||13 420 000||0.67|
|hun||Hungarian||Uralic||12 606 000||0.63|
|ces||Czech||Slavic||10 619 000||0.53|
|cat||Catalan||Italic-Romance||10 000 000||0.50|
|por||Portuguese||Italic-Romance||10 000 000||0.50|
|swe||Swedish||Germanic||9 197 090||0.46|
|srp||Serbian||Slavic||8 957 906||0.45|
|bul||Bulgarian||Slavic||8 157 770||0.41|
|sqi||Albanian||Albanian||7 400 000||0.37|
|hrv||Croatian||Slavic||5 752 090||0.29|
|dan||Danish||Germanic||5 522 490||0.28|
|fin||Finnish||Uralic||5 392 180||0.27|
|slk||Slovak||Slavic||5 187 740||0.26|
|nob||Norwegian Bokmål||Germanic||3 854 000||0.19|
|bel||Belarusian||Slavic||3 312 610||0.17|
|lit||Lithuanian||Baltic||3 001 860||0.15|
|glg||Galician||Italic-Romance||2 355 000||0.12|
|bos||Bosnian||Slavic||2 225 290||0.11|
|slv||Slovene||Slavic||2 085 000||0.10|
|lav||Latvian||Baltic||1 752 260||0.09|
|mkd||Macedonian||Slavic||1 407 810||0.07|
|srd||Sardinian||Italic-Romance||1 200 000||0.06|
|est||Estonian||Uralic||1 165 400||0.06|
|nno||Norwegian Nynorsk||Germanic||846 000||0.04|
|fry||Western Frisian||Germanic||467 000||0.02|
|gla||Scottish Gaelic||Celtic||68 130||0.00|
Note: This covers Europe only; if you added the Spanish, French, Portuguese and English speakers from elsewhere, this table would look different.
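The last column of the table can be reproduced from the speaker counts: each value matches the count divided by 20 million, rounded to two decimals. A quick Python check on a few rows copied from the table follows; the divisor is inferred from the figures rather than stated anywhere in this post, and the names in the snippet are illustrative.

```python
# A few rows copied from the table: ISO code -> (speakers in Europe, last column).
ROWS = {
    "deu": (95_000_000, 4.75),
    "fra": (60_000_000, 3.00),
    "pol": (38_663_000, 1.93),
    "ron": (23_782_000, 1.19),
    "nld": (21_944_000, 1.10),
    "fry": (467_000, 0.02),
}

SCALE = 20_000_000  # inferred divisor (assumption, not stated in the post)

def last_column(speakers: int) -> float:
    """Speaker count divided by the inferred scale, rounded like the table."""
    return round(speakers / SCALE, 2)

for code, (speakers, listed) in ROWS.items():
    print(code, last_column(speakers), listed)
```

Because the value is linear in the number of speakers, small languages such as Scottish Gaelic round down to 0.00.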
Fourth. I added a list of abbreviations and redid the distance scale and speaker categories.
Fifth. Tyshchenko gave the language branches circular labels and, in the version that includes Iranic, also drew circles around the branches. In another version the spaces between the connecting lines within the branches are coloured in. This all reminded me of an Euler diagram, which also shows the relationships between the branches; in particular, the Celtic, Germanic and Romance circles overlap. I wanted to include this in my version, so I gave each branch and each language family its own bubble. For some I tinkered with fading the edges to symbolise that the boundaries of the languages blend into other branches.
Sixth. I added two gravestones for Anatolian and Tocharian.
Seventh. I added arrows to other languages outside of Europe.
Finally, a note on the lines that link the different language bubbles. If you look at the Germanic branch, you will notice that there are links between English and every other Germanic language except Swedish. The same can be observed for the larger languages in Romance or Slavic. A missing line between two languages does not mean that there is no link between them; it just means that the lexical distance between these two languages has not been researched yet. Thus, for example, the link between Albanian and Serbian or between German and French is real but not shown.
An earlier version of this page had Romansh (Roh) and Latvian mislabelled, and was missing Friulian, with 300 000 speakers and ISO 639-3 code (Fur).
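One way to picture this is to treat the map as a sparse graph in which a missing edge means "not measured" rather than "unrelated". A minimal Python sketch, using only the three distances quoted earlier in this post; the dictionary layout and function name are illustrative assumptions, and Berber is given a plain label because it is not on the map.

```python
from typing import Optional

# Distances quoted in the post; keys are unordered pairs of language labels.
DISTANCES = {
    frozenset({"eus", "spa"}): 70,     # Basque - Spanish
    frozenset({"eus", "berber"}): 95,  # Basque - Berber (not on the map)
    frozenset({"mlt", "ita"}): 70,     # Maltese - Italian
}

def lexical_distance(a: str, b: str) -> Optional[int]:
    """Return the measured distance, or None if the pair has not been researched."""
    return DISTANCES.get(frozenset({a, b}))

print(lexical_distance("spa", "eus"))  # a measured pair, in either order
print(lexical_distance("deu", "fra"))  # None: no line drawn, not "no relationship"
```

Returning None instead of zero or infinity keeps "unknown" separate from "identical" and "unrelated".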