How to, and the trouble with, mapping lexical distance in 2D

This is a follow-up to a previous post, Lexical Distance Among Languages.

If you study the Elms and Tishchenko lexical distance diagrams, you will notice that the orientation and position of languages and language branches differ from one diagram to another. If you measure the distances on my diagram, you will find that some are exact and some are stretched or shortened. I want to explain how the bubbles in my version were positioned and why some of the distances are off.

I assume this is how Tishchenko created his diagrams as well.

Step 1: Assuming the Germanic and Slavic branches are complete, this is how to add the remaining languages of the Baltic branch and the Uralic languages of Europe. First, draw a circle with a radius of 70 (lexical distance) around the German ball and one with a radius of 62 around the Polish ball. These are the lexical distances to Lithuanian, taken directly from Tishchenko's research. (Note: in the last version Tishchenko published, no lexical distance between German and Lithuanian is marked; for this explanation I used 70, and in my current version of the diagram I use 78.)
Lexical Distance Map Step 1

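Step 2: The point where the two circles intersect is where Lithuanian should be positioned. (Two circles usually cross at two points; choose the one that fits the rest of the layout.)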
Lexical Distance Map Step 2
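Steps 1 and 2 are a circle-circle intersection. Here is a minimal Python sketch of that geometry. The coordinates in the usage lines are hypothetical placeholders, not positions measured from any published diagram.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Points at distance r1 from centre c1 and r2 from centre c2 (0, 1 or 2 of them)."""
    (x1, y1), (x2, y2) = c1, c2
    dx, dy = x2 - x1, y2 - y1
    d = math.hypot(dx, dy)
    # No solution if the circles are too far apart, nested, or concentric.
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)  # distance from c1 along c1 -> c2
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))   # distance off that line
    mx, my = x1 + a * dx / d, y1 + a * dy / d
    if h == 0:
        return [(mx, my)]  # the circles just touch
    ox, oy = -dy * h / d, dx * h / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]

# Hypothetical placement: German at the origin, Polish 100 units to the right.
german, polish = (0.0, 0.0), (100.0, 0.0)
for x, y in circle_intersections(german, 70, polish, 62):
    print(f"Lithuanian candidate: ({x:.2f}, {y:.2f})")
```

This prints two candidates that mirror each other across the German-Polish line. The 100 is only a stand-in; the construction uses whatever distance the two bubbles already have on the drawing.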

Step 3: Draw a circle with a radius of 42 around Lithuanian and extend the 70 circle around German.
Lexical Distance Map Step 3

Step 4: The point where the 42 circle and the extended 70 circle around German intersect is where Latvian should be positioned.
Lexical Distance Map Step 4
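Steps 1 to 4 chain the same construction, and each step needs a decision about which of the two crossing points to keep. A sketch of that chaining, using hypothetical placeholder coordinates for German and Polish. The `prefer` rule (the upper point for Lithuanian, the point farther from Polish for Latvian) is an illustrative choice, not a rule taken from either diagram.

```python
import math

def place(anchor_a, ra, anchor_b, rb, prefer):
    """Point at distance ra from anchor_a and rb from anchor_b.

    The two circles usually cross twice; `prefer` is a key function and the
    candidate with the larger key is kept.
    """
    (x1, y1), (x2, y2) = anchor_a, anchor_b
    dx, dy = x2 - x1, y2 - y1
    d = math.hypot(dx, dy)
    if d == 0 or d > ra + rb or d < abs(ra - rb):
        raise ValueError("the circles do not intersect")
    a = (ra * ra - rb * rb + d * d) / (2 * d)  # along the line anchor_a -> anchor_b
    h = math.sqrt(max(ra * ra - a * a, 0.0))   # perpendicular offset
    mx, my = x1 + a * dx / d, y1 + a * dy / d
    candidates = [(mx - dy * h / d, my + dx * h / d),
                  (mx + dy * h / d, my - dx * h / d)]
    return max(candidates, key=prefer)

# Hypothetical placeholder coordinates for two languages of the finished
# Germanic and Slavic branches; not measured from any diagram.
layout = {"German": (0.0, 0.0), "Polish": (100.0, 0.0)}

# Steps 1-2: Lithuanian at 70 from German and 62 from Polish; keep the upper point.
layout["Lithuanian"] = place(layout["German"], 70, layout["Polish"], 62,
                             prefer=lambda p: p[1])
# Steps 3-4: Latvian at 42 from Lithuanian and 70 from German;
# keep the candidate farther from Polish.
layout["Latvian"] = place(layout["Lithuanian"], 42, layout["German"], 70,
                          prefer=lambda p: math.dist(p, layout["Polish"]))

for name, (x, y) in layout.items():
    print(f"{name:10s} {x:7.2f} {y:7.2f}")
```

Because Lithuanian and Latvian are both 70 from German, both end up on the same circle around German; the choice of crossing point only decides on which side of Lithuanian Latvian lands.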

Step 5: Repeat Steps 1 and 2 to position Hungarian.
Lexical Distance Map Step 5

So where does the trouble with mapping this in 2D come in? The examples so far are relatively simple. If a language's position is defined from three directions, however, one of the lexical distances will generally fall short or be too long. Here is an example with Estonian, which has a lexical distance of 45 from Finnish, 90 from Hungarian and more than 71 from Latvian (Tishchenko gave no exact number).

Lexical Distance Map Step 6
Position of Estonian defined from Latvian, Finnish and Hungarian, including a leftover distance from the >70 distance to Latvian.
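The compass construction puts all of the leftover onto the last distance. A different approach, not the one used for the diagram, is a least-squares fit that spreads the error over all three distances. Below is a minimal sketch with plain gradient descent. The Finnish, Hungarian and Latvian coordinates are hypothetical placeholders, and the lower bound ">71" is treated as if it were exactly 71.

```python
import math

def residuals(p, anchors, targets):
    """Signed errors: positive = drawn too long, negative = drawn too short."""
    return [math.dist(p, a) - t for a, t in zip(anchors, targets)]

def fit_position(anchors, targets, start, steps=5000, lr=0.01):
    """Place one point so that sum_i (|p - a_i| - t_i)^2 is as small as possible,
    using plain gradient descent."""
    x, y = start
    for _ in range(steps):
        gx = gy = 0.0
        for (ax, ay), t in zip(anchors, targets):
            dist = math.hypot(x - ax, y - ay)
            if dist == 0:
                continue  # gradient undefined exactly on an anchor; skip that term
            err = dist - t
            gx += 2 * err * (x - ax) / dist
            gy += 2 * err * (y - ay) / dist
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Hypothetical placeholder positions for Finnish, Hungarian and Latvian,
# with Estonian's targets 45, 90 and 71 (71 stands in for ">71").
anchors = [(0.0, 0.0), (80.0, -60.0), (-40.0, 60.0)]
targets = [45.0, 90.0, 71.0]
estonian = fit_position(anchors, targets, start=(10.0, 10.0))
print([round(r, 1) for r in residuals(estonian, anchors, targets)])
```

When the three distances are inconsistent in the plane, the residuals cannot all be zero; the fit only decides how the leftover is shared.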

I regularly encountered these leftover or missing distances while drawing my version, in particular when adding distance markers to Albanian or Greek. In one of his diagrams, Tishchenko adds curves to make distances longer.
Tishchenko lengthening distance lines
Example of Tishchenko lengthening curves between Italian and Romanian

I adapted that idea and added some curves in my version as well; see, for example, the distance between Gaelic and Breton.

Lexical Distance Celtic section
Section of the lexical distance diagram showing the Celtic languages.
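The post does not say how the lengthening curves are constructed. One possible way, sketched here as an assumption: bend the straight line into a symmetric quadratic Bézier curve and find, by bisection, how far the control point must move off the midpoint for the curve to reach the required lexical distance. The 60 and 75 in the example are arbitrary numbers, not values from either diagram.

```python
import math

def bezier_length(p0, c, p2, segments=1000):
    """Length of the quadratic Bezier p0 -> (control c) -> p2, measured on a fine polyline."""
    total, prev = 0.0, p0
    for i in range(1, segments + 1):
        t = i / segments
        u = 1.0 - t
        point = (u * u * p0[0] + 2 * u * t * c[0] + t * t * p2[0],
                 u * u * p0[1] + 2 * u * t * c[1] + t * t * p2[1])
        total += math.dist(prev, point)
        prev = point
    return total

def lengthening_curve(p0, p2, target, tol=1e-6):
    """Control point of a symmetric quadratic Bezier from p0 to p2 whose length is `target`."""
    straight = math.dist(p0, p2)
    if straight == 0 or target < straight:
        raise ValueError("a curve can lengthen a line but never shorten it")
    mx, my = (p0[0] + p2[0]) / 2, (p0[1] + p2[1]) / 2
    nx, ny = -(p2[1] - p0[1]) / straight, (p2[0] - p0[0]) / straight  # unit normal

    def control(h):
        # Control point pushed a distance h off the midpoint, perpendicular to the line.
        return (mx + h * nx, my + h * ny)

    lo, hi = 0.0, straight
    while bezier_length(p0, control(hi), p2) < target:
        hi *= 2  # grow the bracket until the curve is long enough
    while hi - lo > tol:  # bisection: the curve gets longer as h grows
        mid = (lo + hi) / 2
        if bezier_length(p0, control(mid), p2) < target:
            lo = mid
        else:
            hi = mid
    return control(hi)

# Example: two bubbles drawn 60 apart that should be 75 apart.
p0, p2 = (0.0, 0.0), (60.0, 0.0)
c = lengthening_curve(p0, p2, 75.0)
print(c, bezier_length(p0, c, p2))
```

A symmetric curve keeps the bulge centred between the two bubbles; an asymmetric control point would work too but needs a second parameter to pin down.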

But to straighten out the curves and shorten the biggest stretches, one would have to leave the two-dimensional diagram and add another dimension.
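A minimal worked example of why an extra dimension helps, using a hypothetical set of four languages that are all exactly d apart from one another. Three of them fit as an equilateral triangle, but in the plane the only point equally far from all three is the triangle's centre, which is only d/√3 ≈ 0.577·d away from each. Lifting that point out of the plane by √(2/3)·d makes every distance exactly d.

```python
import math

d = 50.0  # hypothetical: every pair of the four languages is 50 apart
# Three of them fit exactly in the plane as an equilateral triangle (z = 0).
a, b, c = (0.0, 0.0, 0.0), (d, 0.0, 0.0), (d / 2, d * math.sqrt(3) / 2, 0.0)
centre = tuple(sum(p[i] for p in (a, b, c)) / 3 for i in range(3))

# In 2D the fourth language can only sit at the centre: every distance is too short.
flat = centre
print([round(math.dist(flat, p), 3) for p in (a, b, c)])

# In 3D, lift it by sqrt(2/3) * d and all three distances become exactly d.
lifted = (centre[0], centre[1], d * math.sqrt(2 / 3))
print([round(math.dist(lifted, p), 3) for p in (a, b, c)])
```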

Edit 2017.02.27
There is a mathematical way to calculate node positions from a distance matrix like this. Marcin Ciura did this for the Polish rail network; see Warping Maps with SVD. Marcin shared his code, and with this method one can calculate positions in 2D and 3D from such matrices.
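The following is not Ciura's code. It is a stand-alone sketch of classical (Torgerson) multidimensional scaling, a standard technique for turning a distance matrix into 2D or 3D coordinates via the eigenvectors of the double-centred matrix. It uses plain power iteration instead of a linear-algebra library so it runs on a bare Python install; the `iters` and `seed` parameters are illustrative choices.

```python
import math
import random

def classical_mds(dist, dims=2, iters=1000, seed=0):
    """Classical (Torgerson) MDS: n points in `dims` dimensions whose pairwise
    distances approximate the symmetric distance matrix `dist`."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    # Shifting by a bound on |eigenvalue| makes power iteration find the
    # largest (most positive) eigenvalue instead of the largest in magnitude.
    shift = max(sum(abs(x) for x in r) for r in b)
    if shift == 0:
        return coords  # all distances are zero
    rng = random.Random(seed)
    for k in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):  # power iteration on B + shift * I
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-12:
            break  # no positive eigenvalue left; more dimensions add nothing
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

def max_error(coords, dist):
    """Largest absolute difference between drawn and requested distances."""
    n = len(dist)
    return max(abs(math.dist(coords[i], coords[j]) - dist[i][j])
               for i in range(n) for j in range(i + 1, n))

# Four hypothetical languages, every pair at distance 1: impossible in 2D, exact in 3D.
tetra = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
print(max_error(classical_mds(tetra, dims=2), tetra))
print(max_error(classical_mds(tetra, dims=3), tetra))
```

Comparing `max_error` for `dims=2` and `dims=3` on a real lexical distance matrix would show how much of the stretching disappears with one extra dimension.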


  1. Those shorter distances imply a hyperbolic space, rather than Euclidean space.

    Hyperbolic space happens in statistics when the sample is too small to have achieved normality. We assume normality prematurely because it makes the math much simpler. Normality is achieved when the mean, median, and mode converge.

    If there is a gap in the distances, then the path is in spherical space. Once a sample has achieved normality, the variance increases. This is where space becomes spherical.

    The geodesics are changing, not the Euclidean distance between the points.

    Great work. Thanks.


  2. Thanks for the mention! If you use my code some day, you should convert the similarity fraction s of a pair of languages, s∈[0, 1] to their distance d∈[0, ∞). I think that setting d=−log s would do, as opposed to Tishchenko’s non-additive formula d=100(1−s).

