Lexical Distance Among Languages of Europe 2015

I found the lexical distance map fascinating, but the closer I studied it the more things bothered me. Trawling through the net, I found Tyshchenko's Cyrillic versions. So I sat down this weekend and translated, adjusted, combined and updated them to create my own version here:

Lexical Distance Among Languages of Europe

Here is a list of my changes.

First, the abbreviations. After some research I found out that the Romance language abbreviated Pro stands for Provençal, and I assume Ga stands for Scottish Gaelic. Tyshchenko’s abbreviations are in Cyrillic; Elms rendered them in Latin script. Some of the Latin-script abbreviations correspond to ISO 639-3, some do not. I changed them all to the ISO 639-3 standard.

Second, the legend shows bubbles that represent the number of speakers of each language. The bubble area corresponds correctly to the number of speakers, but only in logarithmic classes: >3 000 speakers, >30 000 speakers, >300 000 speakers, etc. That means the Ukrainian bubble, with 37 million speakers, is the same size as the German or Russian bubble with 95 million speakers (in Europe), and the Icelandic bubble, with 300 000 speakers, is the same size as the bubbles of languages with 2.5 to 3 million speakers. I recalculated the diameter of each bubble and scaled it to the number of speakers of that language in Europe (source).
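
To see how much the logarithmic classes flatten the differences, here is a minimal Python sketch. It assumes the classes start at 3 000 and grow tenfold, and that a class includes its lower bound (so that 300 000 lands in the 300 000 class, as the Icelandic example implies). The function name `legend_class` is made up for illustration.

```python
def legend_class(speakers: int, first_class: int = 3000) -> int:
    """Return the lower bound of the legend class a speaker count falls into.

    Assumed classes: 3 000, 30 000, 300 000, ... (each ten times the last).
    Counts below the first class are also mapped to the first class.
    """
    lower = first_class
    while speakers >= lower * 10:
        lower *= 10
    return lower


# Speaker counts taken from the text above.
examples = {
    "Ukrainian": 37_000_000,
    "German": 95_000_000,
    "Icelandic": 300_000,
    "a 2.5 million language": 2_500_000,
}

for name, speakers in examples.items():
    print(f"{name}: {speakers:,} speakers -> class starting at {legend_class(speakers):,}")
```

Ukrainian and German end up in the same class, and so do Icelandic and a language with 2.5 million speakers, which is exactly the flattening described above.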

Third, the Elms diagrams depict several Indo-European branches (Albanian, Baltic, Celtic, Germanic, Hellenic, Romance (Italic) and Slavic) as well as some Uralic languages. Missing from Europe are the Indo-European Armenian branch (where does Europe end?), all Turkic languages, all Ibero-Caucasian and Kartvelian languages, the sole European Semitic language, Maltese, and finally Basque. Taking the 66 150 Faroese speakers as the cut-off line, there are 1 Armenian, 1 Basque, 2 Germanic, 6 Italic, 2 Kartvelian, 5 Caucasian, 1 Semitic, 8 Turkic, 2 Slavic and 14 Uralic languages missing (again depending on where Europe ends and what is considered a language). Adding the Kartvelian, Turkic, Indo-Iranian, Ibero-Caucasian and Uralic languages seemed too daunting a task. I did add many of the other missing languages.

Additions: Basque, Semitic, Indo-Iranian and Armenian
Adding Basque and Maltese was not much of a hassle. Basque sits at a lexical distance of 70 to the left of Spanish and 95 from Berber (which is spoken in North Africa and is not included). Maltese is 70 down from Italian and an undefined distance from Greek. The Indo-Iranian Romani language would have enough speakers to be included, but it diverges within itself and is so widespread that it is hard to determine its lexical distance. To get a lexical connection to Armenian I would have to add almost all the other missing languages; my apologies for not attempting it.

Germanic Adjustments
Since Scots has no official status or clear boundary, I did not include it. Luxembourgish, on the other hand, does have official status and would be close to German and Dutch with a link to French; I placed it where I assume it belongs, but with no distances marked. Frisian is presently defined as a language group: Northern Frisian, Eastern Frisian and Western Frisian. Sadly, use of the northern and eastern languages has diminished, and they have 10 000 and 2 250 speakers respectively. West Frisian, with roughly 467 000 speakers, is in good health. I could not figure out which Frisian Tyshchenko was referring to, but I assume West Frisian and labelled the bubble accordingly.
Norwegian has two official written forms, Bokmål and Nynorsk. I considered combining the two but decided against it. Tyshchenko assessed both separately to determine their divergence, which makes sense: he may not speak Norwegian or many of the other languages researched, so he had to rely on comparing syntax, vocabulary, morphology, etc. to determine lexical distance, and he did not have the resources to survey the Norwegian language to determine a standard Norwegian (there is none). So falling back on those two written forms is the best he could do. It also beautifully displays the relationship between Nynorsk, Bokmål and other languages: Bokmål is closer to Danish than to Nynorsk, and Nynorsk is closer to Icelandic than any other mainland language is.

Romance Adjustments
Back to Provençal. Elms translated провансальська as Provençal, which is probably the correct translation of the Ukrainian word. Depending on whom you ask, Provençal is considered either an Occitan dialect or a language of its own. I am going to assume that Tyshchenko was assessing the lexical distance of the Occitan language, so I re-labelled Pro as Oci. Or did Tyshchenko mean Franco-Provençal? Probably not: the line to Spanish is stronger than the line to French. I assume Franco-Provençal is missing and that a bubble labelled Frp should be placed close to Oci with links to French and Italian.
Walloon (Wln) has archaisms coming from Latin and significant borrowings from the Germanic languages Dutch, Luxembourgish and German. Picard has no official status in France but does in Belgium, and it straddles the border between the two nations around Nord-Pas-de-Calais and Picardy. Asturian is recognized as typologically and phylogenetically close to Galician-Portuguese and Castilian, and less so to Navarro-Aragonese (Castilian and Aragonese do not make the cut-off line and are not included).
The greatest number of Aromanian speakers is found in Greek Macedonia, with substantial numbers also found in Albania, Bulgaria, Serbia and FYRo Macedonia, which also officially recognizes it. Compared to Romanian, the Eastern Romance language Aromanian (Rup) has been influenced more by Greek than by Slavic. I placed Rup close to Ron in the direction of Grk.

Slavic adjustments and updates

It has been a while since the Croats and Serbs decided that they do not speak the same language, and this is accurately depicted above, but the Bosnians and Montenegrins have also decided that they have their own languages. Thus I added a Bos bubble and a Mis1 bubble (for the missing ISO code for Montenegrin) right next to the Hrv and Srp bubbles. In Elms’s translation there is a bubble named Sr between the Czech and Polish bubbles; in Tyshchenko’s 1999 diagram there are two bubbles there. I assume the larger one is Silesian and the smaller one Sorbian. I added both, even though Sorbian does not make the cut-off line.

That leaves me with 54 languages representing 670 million people; Europe has an estimated population of 740 million. It checks out.

ISO 639-3 Abbreviation	Language	Branch or Family	Speakers in Europe	Bubble Diameter
deu	German	Germanic	95 000 000	4.75
rus	Russian	Slavic	95 000 000	4.75
fra	French	Italic-Romance	60 000 000	3.00
ita	Italian	Italic-Romance	57 700 000	2.89
eng	English	Germanic	55 600 000	2.78
spa	Spanish	Italic-Romance	45 000 000	2.25
pol	Polish	Slavic	38 663 000	1.93
ukr	Ukrainian	Slavic	37 000 000	1.85
ron	Romanian	Italic-Romance	23 782 000	1.19
nld	Dutch	Germanic	21 944 000	1.10
grk	Greek	Hellenic	13 420 000	0.67
hun	Hungarian	Uralic	12 606 000	0.63
ces	Czech	Slavic	10 619 000	0.53
cat	Catalan	Italic-Romance	10 000 000	0.50
por	Portuguese	Italic-Romance	10 000 000	0.50
swe	Swedish	Germanic	9 197 090	0.46
srp	Serbian	Slavic	8 957 906	0.45
bul	Bulgarian	Slavic	8 157 770	0.41
sqi	Albanian	Albanian	7 400 000	0.37
hrv	Croatian	Slavic	5 752 090	0.29
dan	Danish	Germanic	5 522 490	0.28
fin	Finnish	Uralic	5 392 180	0.27
slk	Slovak	Slavic	5 187 740	0.26
nob	Norwegian Bokmål	Germanic	3 854 000	0.19
bel	Belarusian	Slavic	3 312 610	0.17
lit	Lithuanian	Baltic	3 001 860	0.15
glg	Galician	Italic-Romance	2 355 000	0.12
bos	Bosnian	Slavic	2 225 290	0.11
slv	Slovene	Slavic	2 085 000	0.10
lav	Latvian	Baltic	1 752 260	0.09
mkd	Macedonian	Slavic	1 407 810	0.07
srd	Sardinian	Italic-Romance	1 200 000	0.06
est	Estonian	Uralic	1 165 400	0.06
nno	Norwegian Nynorsk	Germanic	846 000	0.04
wln	Walloon	Italic-Romance	600 000	0.03
eus	Basque	Basque	545 872	0.03
cym	Welsh	Celtic	536 890	0.03
mlt	Maltese	Semitic	522 000	0.03
szl	Silesian	Slavic	510 000	0.03
mis1	Montenegrin	Slavic	510 000	0.03
fry	Western Frisian	Germanic	467 000	0.02
ltz	Luxembourgish	Germanic	336 710	0.02
isl	Icelandic	Germanic	300 000	0.02
gle	Irish	Celtic	276 310	0.01
oci	Occitan	Italic-Romance	220 000	0.01
bre	Breton	Celtic	206 000	0.01
pcd	Picard	Italic-Romance	200 000	0.01
frp	Franco-Provençal	Italic-Romance	140 000	0.01
rup	Aromanian	Italic-Romance	114 340	0.01
ast	Asturian	Italic-Romance	110 000	0.01
gla	Scottish Gaelic	Celtic	68 130	0.00
fao	Faroese	Germanic	66 150	0.00
lat	Latin	Italic-Romance	30 000	0.00
wen	Sorbian	Slavic	30 000	0.00

Note: This covers Europe only, so if you add the Spanish, French, Portuguese and English speakers from elsewhere, this table would look different.
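
For anyone who wants to reproduce the bubble sizes: the diameters in the table are consistent with dividing the number of speakers by 20 000 000 and rounding to two decimals. That divisor is not stated anywhere in this post; it is inferred from the table values, and the half-up rounding is an assumption as well. The names `parse_speakers` and `bubble_diameter` are made up for this sketch, which checks the rule against a handful of rows copied from the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: one diameter unit per 20 million speakers (inferred from the table).
SPEAKERS_PER_UNIT = Decimal(20_000_000)

# (code, speakers as printed, diameter as printed), copied from the table.
SAMPLE_ROWS = [
    ("deu", "95 000 000", "4.75"),
    ("ita", "57 700 000", "2.89"),
    ("pol", "38 663 000", "1.93"),
    ("ukr", "37 000 000", "1.85"),
    ("nno", "846 000", "0.04"),
    ("fao", "66 150", "0.00"),
]


def parse_speakers(text: str) -> int:
    """Turn a space-grouped count such as '95 000 000' into an int."""
    return int("".join(text.split()))


def bubble_diameter(speakers: int) -> Decimal:
    """Diameter proportional to speakers, rounded half-up to two decimals."""
    raw = Decimal(speakers) / SPEAKERS_PER_UNIT
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for code, speakers_text, printed in SAMPLE_ROWS:
    computed = bubble_diameter(parse_speakers(speakers_text))
    status = "matches" if computed == Decimal(printed) else "differs"
    print(f"{code}: {speakers_text} speakers -> {computed} ({status} the table)")
```

`Decimal` is used instead of floats so that a value such as 2.885 rounds up to 2.89 the way a spreadsheet would, rather than depending on binary floating-point representation.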

Fourth. I added a list of abbreviations and redid the distance scale and speaker categories.

Fifth. Tyshchenko gave the language branches circular labels and, in the version that includes Iranic, also drew circles around the branches. In another version, the spaces between the connection lines within each branch are coloured in. This all reminded me of an Euler diagram, which also shows the relationships between the branches; in particular, the Celtic, Germanic and Romance circles overlap. I wanted to include this in my version, so I gave each branch and each language family its own bubble. On some I tinkered with fading the edges to symbolise that the boundaries of the languages fuse with other branches.

Sixth. I added two gravestones for Anatolian and Tocharian.

Seventh. I added arrows to other languages outside of Europe.

Finally, a note on the lines that link the different language bubbles. If you look at the Germanic branch, you will notice that there are links between English and every other Germanic language except Swedish. The same can be observed for the larger languages in the Romance and Slavic branches. A missing line between two languages does not mean that there is no link between them; it just means that the lexical distance between these two languages has not been researched yet. Thus, for example, the link between Albanian and Serbian or between German and French is real but not shown.
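
One way to keep "not researched" apart from "no link" when working with the map as data is to store only the measured pairs and return nothing for the rest. The following is a small Python sketch under that assumption. It contains only the three distances quoted earlier in this post; the `berber` label and the function name `lexical_distance` are made up for illustration.

```python
from typing import Optional

# Only distances stated in this post; every other pair counts as "not researched".
KNOWN_DISTANCES = {
    frozenset({"eus", "spa"}): 70,     # Basque - Spanish
    frozenset({"eus", "berber"}): 95,  # Basque - Berber (outside Europe, label is made up)
    frozenset({"mlt", "ita"}): 70,     # Maltese - Italian
}


def lexical_distance(a: str, b: str) -> Optional[int]:
    """Return the recorded distance between two languages.

    None means the pair has not been measured, not that the languages
    are unrelated. The order of the arguments does not matter.
    """
    if a == b:
        return 0
    return KNOWN_DISTANCES.get(frozenset({a, b}))


print(lexical_distance("spa", "eus"))  # measured pair
print(lexical_distance("deu", "fra"))  # None: link exists but was never measured
```

Using an unordered pair as the key means a line only has to be recorded once, just as it is drawn once on the map.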

Update 17.05.2015
An earlier version of this page had Romansh (Roh) and Latvian mislabelled, and was missing Friulian, which has 300 000 speakers and the ISO 639-3 code Fur.

90 comments


Anonymous says:

4. March 2024 at 9:46

LOL, lovely comment about Catalan from that Spaniard. There’s written evidence of Catalan as early as the 10th century (already giving itself a name), along with the other languages developing in the area in those years (Leonese, Castilian, Catalan, Riojan, etc., which had more or less continuity in the years to come). This is the level of stupidity we have to live with every day.


Anonymous says:

11. December 2023 at 11:29

Interesting article, except for the fact that Catalan IS NOT a language; Catalan is a dialect of the Occitan language. Catalan was never mentioned as a language until the nationalistic movement at the end of the XIX century, when it was artificially modified by Pompeu Fabra, who was not even an expert in linguistics; he was a chemist. His level of ignorance was such that he eliminated the “ch” present in all Latin languages (in Italian “cc”), substituting it with an “x” because he did not want Catalan to resemble Spanish (wrongly called Castilian). He modified the grammar with the sole purpose of helping the nationalistic movement, which is not even supported by historians.


Anonymous says:

4. December 2023 at 10:16

You have to add Vasc.
It is spoken between the south of France and the north of Spain. It is one of the oldest languages in Europe, but it is often forgotten by the authorities. Don’t forget us, Vasc, euskara.


How different are the Russian and Ukrainian languages? | Spin, strangeness, and charm says:

1. May 2022 at 20:07

[…] in a map by an Ukrainian linguist named Tishchenko that has been making the rounds of the net. (English translation with modifications by Alternative Transportation blogger Stephan […]

