This is a follow-up to a previous post, Lexical Distance Among Languages.

If you study the Elms and Tishchenko lexical distance diagrams, you will notice that the orientation and position of languages and language branches differ from one diagram to another. If you measure the distances on my diagram, you will find that some are exact and some are stretched or shortened. I want to explain how the bubbles in my version ended up where they are, and why some of the distances are off.

I assume this is how Tishchenko created his diagrams as well.

**Step 1:** Assuming that the Germanic and Slavic branches are already complete, this is how to place the remaining languages of the Baltic branch and the Uralic languages in Europe. First draw a circle with a radius of 70 lexical distance around the German ball and one with a radius of 62 around the Polish ball. These lexical distances are to Lithuanian and are taken directly from Tishchenko's research. (Note: the last version Tishchenko published marks no lexical distance between German and Lithuanian. For this explanation I used 70; in my version I use 78.)

**Step 2:** Where the two circles intersect is where Lithuanian should be positioned.

**Step 3:** Draw a circle with a radius of 42 around Lithuanian and extend the 70 circle from German.

**Step 4:** Where the 42 circle and the extension of the 70 circle from German intersect is where Latvian should be positioned.

**Step 5:** Repeat Steps 1 and 2 to position Hungarian.
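The steps above amount to intersecting two circles, each centered on an already placed language and sized by the lexical distance to the new one. Here is a minimal Python sketch of that construction. The coordinates of German and Polish are hypothetical placeholders (the post does not give them), and the function name `circle_intersections` is made up for illustration; only the distances 70, 62 and 42 come from the steps.

```python
import math

def circle_intersections(c0, r0, c1, r1):
    """Return the 0, 1 or 2 points that lie r0 from c0 and r1 from c1."""
    (x0, y0), (x1, y1) = c0, c1
    d = math.hypot(x1 - x0, y1 - y0)
    # No usable crossing: same center, circles too far apart, or one inside the other.
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    a = (r0**2 - r1**2 + d**2) / (2 * d)    # distance from c0 to the common chord
    h = math.sqrt(max(r0**2 - a**2, 0.0))   # half the length of that chord
    mx = x0 + a * (x1 - x0) / d             # foot of the chord on the center line
    my = y0 + a * (y1 - y0) / d
    ox = -h * (y1 - y0) / d                 # offset along the chord
    oy = h * (x1 - x0) / d
    if h == 0:
        return [(mx, my)]
    return [(mx + ox, my + oy), (mx - ox, my - oy)]

# Hypothetical positions of two languages that are already on the diagram.
german = (0.0, 0.0)
polish = (60.0, 0.0)

# Steps 1 and 2: Lithuanian is 70 from German and 62 from Polish.
candidates = circle_intersections(german, 70, polish, 62)
lithuanian = max(candidates, key=lambda p: p[1])  # pick one of the two crossings

# Steps 3 and 4: Latvian is 42 from Lithuanian and 70 from German.
latvian = circle_intersections(lithuanian, 42, german, 70)[0]

print("Lithuanian:", lithuanian)
print("Latvian:", latvian)
```

Two circles usually cross in two places, so the drawer still has to choose a side. That choice is one reason the orientation differs between diagrams.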

Where does mapping this in 2D run into trouble? The examples shown so far are relatively simple. If, however, a language's position is defined from three directions, then one of the lexical distances will fall short or be too long. Here is an example with Estonian, which has a lexical distance of 45 from Finnish, 90 from Hungarian, and more than 71 from Latvian (Tishchenko gave no exact number).

Position of Estonian defined from Latvian, Finnish and Hungarian, including a leftover distance from the >70 distance from Latvian.
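To make the leftover distance concrete, here is a small Python sketch that compares drawn distances with wanted ones. All coordinates are hypothetical and chosen only so that the arithmetic is easy to follow. Only the wanted distances 45, 90 and 71 come from the text, and 71 is a lower bound rather than an exact value. The helper name `mismatches` is made up.

```python
import math

def mismatches(point, targets):
    """Drawn distance minus wanted lexical distance, for each neighbor."""
    return {
        name: math.dist(point, position) - wanted
        for name, (position, wanted) in targets.items()
    }

# Hypothetical neighbor positions, each paired with the wanted lexical distance.
targets = {
    "Finnish": ((0.0, 0.0), 45),
    "Hungarian": ((135.0, 0.0), 90),
    "Latvian": ((45.0, -50.0), 71),  # "more than 71" in the text, so a lower bound
}

# This spot satisfies Finnish and Hungarian exactly, which uses up both
# degrees of freedom a point has on a flat sheet.
estonian = (45.0, 0.0)

gaps = mismatches(estonian, targets)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f}")
```

With these made-up positions the Latvian link comes out about 21 units short. Moving Estonian to fix it would break the Finnish or Hungarian distance instead.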

I encountered these leftover or missing distances regularly while drawing my version, in particular when adding distance markers to Albanian or Greek. On one of Tishchenko's diagrams he adds curves to make distances longer.

Example of Tishchenko's lengthening curves between Italian and Romanian

I adapted that idea and added some curves in my version as well; see for example the distance between Gaelic and Breton.

Section of lexical distance diagram showing Celtic languages.

But to straighten out the curves and shorten the biggest stretches, one would have to leave the two-dimensional diagram and add another dimension.

**Edit 2017.02.27**

There is a mathematical way to calculate the node positions for a matrix like this. Marcin Ciura did this with the Polish rail network; see Warping Maps with SVD. Marcin shared the code, and with this method one can calculate distances in 2D and 3D for such matrices.
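To show what calculating positions from a distance matrix can look like, and why an extra dimension helps, here is a rough Python sketch. It is not Marcin Ciura's SVD code. It swaps in a plain iterative approach that keeps nudging points until the drawn distances are as close as possible to the wanted ones. The toy matrix (four made-up languages, each 50 apart from every other) and the names `stress` and `embed` are assumptions for illustration.

```python
import math
import random

def stress(points, dist):
    """Sum of squared gaps between drawn and wanted distances."""
    n = len(points)
    return sum(
        (math.dist(points[i], points[j]) - dist[i][j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )

def embed(dist, dims, steps=3000, seed=0):
    """Start from random points and repeatedly step downhill on the stress."""
    rng = random.Random(seed)
    n = len(dist)
    pts = [tuple(rng.uniform(-1, 1) for _ in range(dims)) for _ in range(n)]
    rate = 1 / (2 * n)  # conservative step size that keeps the descent stable
    for _ in range(steps):
        new_pts = []
        for i in range(n):
            grad = [0.0] * dims
            for j in range(n):
                if i == j:
                    continue
                drawn = math.dist(pts[i], pts[j])
                if drawn == 0:
                    continue  # coincident points give no direction to move in
                scale = 2 * (drawn - dist[i][j]) / drawn
                for k in range(dims):
                    grad[k] += scale * (pts[i][k] - pts[j][k])
            new_pts.append(tuple(pts[i][k] - rate * grad[k] for k in range(dims)))
        pts = new_pts  # update all points at once
    return pts

# Toy matrix: four made-up languages, every pair 50 apart.
toy = [[0 if i == j else 50 for j in range(4)] for i in range(4)]

flat = embed(toy, dims=2)   # forced onto a flat sheet
solid = embed(toy, dims=3)  # allowed one more dimension

print("2D stress:", round(stress(flat, toy), 1))
print("3D stress:", round(stress(solid, toy), 1))
```

Four points that are all equally far apart cannot sit on a flat sheet without some distances being stretched or shortened, but they fit as the corners of a tetrahedron in 3D. The remaining stress in 2D is the same kind of leftover distance described above.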

Thanks for the mention! If you use my code some day, you should convert the similarity fraction s of a pair of languages, s∈[0, 1] to their distance d∈[0, ∞). I think that setting d=−log s would do, as opposed to Tishchenko’s non-additive formula d=100(1−s).
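A short Python sketch of the two conversions mentioned in this comment. The similarity values are made up, and the chain assumption (the similarity between A and C is the product of the A-B and B-C similarities) is only one possible reading of "additive", used here for illustration.

```python
import math

def tishchenko_distance(s):
    """d = 100(1 - s), for a similarity fraction s in [0, 1]."""
    return 100 * (1 - s)

def log_distance(s):
    """d = -log s; grows without bound as the similarity approaches 0."""
    return math.inf if s == 0 else -math.log(s)

# Made-up similarities along a chain A - B - C.
s_ab, s_bc = 0.5, 0.4
s_ac = s_ab * s_bc  # assumption: similarity multiplies along the chain

# With -log s, the two legs add up to the direct distance.
print(log_distance(s_ab) + log_distance(s_bc), log_distance(s_ac))

# With 100(1 - s), the two legs overshoot the direct distance.
print(tishchenko_distance(s_ab) + tishchenko_distance(s_bc), tishchenko_distance(s_ac))
```

Under that assumption the logarithmic distances add up exactly, while the linear formula gives legs that sum to more than the direct distance.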


