Two weeks ago Andrés asked in an email:
[…] I’ve been looking at your lexical distance map, but I think there’s something wrong: the line between Spanish and Italian seems to be the one that should be less than 35, yet it says 41. […]
It is a fair question. If you study the “Lexical Distance Among Languages of Europe 2015” graphic, you will see that what is plotted does not always match what is labelled. I previously explained in “How to and the trouble with mapping lexical distance in 2D” why some of the distortions in that 2D graph occur, and I pondered switching to 3D in “Lexical Distance Diagram in 3D?” to create a more accurate graph. But even then I ran into trouble. This map shows a matrix of relations: each language (node N) is connected by one distance (connection R) to every other node. This is similar to a direct-network public transportation system, where there is a connection from every point (station) to every other point.
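In such a complete network each of the N nodes links to the other N − 1 nodes, and every link is shared by two nodes, so the number of connections is R = N(N − 1)/2. A minimal Python sketch of that count (the function name is only illustrative):

```python
def connection_count(n: int) -> int:
    """Number of links R in a complete network of n nodes: R = n(n - 1) / 2."""
    if n < 0:
        raise ValueError("number of nodes must be non-negative")
    # Each node links to the other n - 1 nodes; halve so no link is counted twice.
    return n * (n - 1) // 2


for n in (2, 3, 4, 5, 55):
    print(n, connection_count(n))
```

This gives one link for two languages, three for three, six for four, ten for five and 1485 for the 55 languages on the 2015 map.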
If one only needs to show the distance between two languages (nodes N=2), one could do this wonderfully in a 1D graph (dimension D=1): a single connection line (R=1) between the two languages is enough.
When comparing three languages (nodes N=3) and their distances (connections R=3), one needs (unless they lie in a perfect line) a 2D graph (dimension D=2) to show the relationship between these languages without distorting the distances.
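For three languages the 2D layout can be constructed directly: put the first at the origin, the second on the horizontal axis, and find the third with the law of cosines. The sketch below does this in Python; the function name and the 3, 4, 5 distances are hypothetical and not taken from the map. It also shows a catch: three distances can only be drawn at all if they satisfy the triangle inequality.

```python
import math


def place_three(d_ab: float, d_ac: float, d_bc: float):
    """Return 2D coordinates for points A, B, C with the given pairwise distances.

    A sits at the origin, B on the x-axis, and C is found from the law of cosines.
    Raises ValueError if the distances violate the triangle inequality.
    """
    if d_ab <= 0:
        raise ValueError("d_ab must be positive")
    # Solve x^2 + y^2 = d_ac^2 and (x - d_ab)^2 + y^2 = d_bc^2 for C = (x, y).
    x = (d_ab**2 + d_ac**2 - d_bc**2) / (2 * d_ab)
    y_squared = d_ac**2 - x**2
    if y_squared < -1e-9:
        raise ValueError("distances violate the triangle inequality")
    y = math.sqrt(max(y_squared, 0.0))  # y == 0 means the three points lie on a line
    return (0.0, 0.0), (d_ab, 0.0), (x, y)


a, b, c = place_three(3.0, 4.0, 5.0)  # hypothetical distances
print(a, b, c)  # (0.0, 0.0) (3.0, 0.0) (0.0, 4.0)
```

Distances such as 1, 1 and 5 raise an error, because no triangle with those sides exists in any number of dimensions.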
With four languages (nodes N=4), we need a 3D graph (dimension D=3) to show their distances (connections R=6) to each other without somehow distorting their relationship (unless they all lie on a plane or in a row).
But if one wants to accurately display the relationship between five or more languages (nodes N≥5) with ten or more distances (connections R≥10), we run out of dimensions (4D?) for a graph, unless these language relationships happen to line up perfectly in fewer dimensions. The “Lexical Distance Among Languages of Europe 2015” shows fifty-five languages (nodes N=55) with a theoretical 1485 distances (connections R=1485) on a 2D surface (dimension D=2). Most of the distances have been omitted, and the languages have been placed so as to squeeze them onto a 2D surface. To show the relationships between the languages without any distortion, one would need up to 54 dimensions (one fewer than the number of languages).
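One way to check how many dimensions a set of distances needs is to place one point at the origin, build the Gram matrix of the other points (as in classical multidimensional scaling), and count how many independent directions it has. The Python sketch below does this with a pivoted Cholesky factorisation; the function names and the tolerance are illustrative choices, and the equal-distance matrices are hypothetical. For N items that are all equally far apart it reports N − 1 dimensions, the worst case described above. Real lexical distances may not fit exactly in any Euclidean space, in which case the sketch raises an error.

```python
def embedding_dimension(dist, tol=1e-9):
    """Smallest number of dimensions in which the distance matrix `dist` can be drawn exactly.

    Raises ValueError if no exact Euclidean layout exists (for example, when the
    triangle inequality is violated).
    """
    n = len(dist)
    if n <= 1:
        return 0
    # Gram matrix of points 1..n-1 with point 0 at the origin:
    # g[i][j] = (d(0,i)^2 + d(0,j)^2 - d(i,j)^2) / 2
    g = [
        [(dist[0][i] ** 2 + dist[0][j] ** 2 - dist[i][j] ** 2) / 2 for j in range(1, n)]
        for i in range(1, n)
    ]
    remaining = list(range(n - 1))
    rank = 0
    while remaining:
        # Pivoted Cholesky: take the point with the largest leftover squared length.
        p = max(remaining, key=lambda k: g[k][k])
        pivot = g[p][p]
        if pivot <= tol:
            break
        remaining.remove(p)
        rank += 1
        # Remove from every remaining point the part that lies along this new axis.
        for i in remaining:
            for j in remaining:
                g[i][j] -= g[i][p] * g[p][j] / pivot
    # What is left should be numerically zero; otherwise the distances cannot be
    # realised as points in any Euclidean space.
    if any(abs(g[i][j]) > tol for i in remaining for j in remaining):
        raise ValueError("no exact Euclidean layout for these distances")
    return rank


def equidistant(n):
    """Hypothetical distance matrix for n items that are all 1 apart."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]


for n in (2, 3, 4, 5, 55):
    print(n, embedding_dimension(equidistant(n)))  # n - 1 in each case
```

Three points at 0, 1 and 2 on a line come out as 1 dimension, and a 3, 4, 5 triangle as 2, matching the exceptions mentioned in the text.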
To depict that many relationships, one could find the smallest adjustment necessary to each connection to squeeze the data into a 2D or 3D graphic. Marcin Ciura used a program to warp the Polish railway network in this fashion.
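A standard way to find such a smallest overall adjustment is metric multidimensional scaling: choose positions in 2D (or 3D) that minimise the stress, the sum over all pairs of (drawn distance − intended distance)². The Python sketch below uses the SMACOF iteration (repeated Guttman transforms, all pairs weighted equally), which never increases the stress. It is one common technique, not necessarily what was used for the Europe map or for Ciura's railway map; the function names, iteration count and seed are assumptions.

```python
import math
import random


def stress(points, target):
    """Raw stress: sum over pairs i < j of (drawn distance - target distance)^2."""
    n = len(points)
    return sum(
        (math.dist(points[i], points[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def smacof(target, dim=2, iterations=300, seed=0):
    """Fit points in `dim` dimensions to a symmetric distance matrix with SMACOF.

    Each step applies the Guttman transform X <- B(X) X / n (unit weights).
    """
    n = len(target)
    rng = random.Random(seed)
    x = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]  # random start
    for _ in range(iterations):
        # b_ij = -target_ij / drawn_ij for i != j (0 if drawn_ij == 0);
        # b_ii = minus the sum of the off-diagonal entries in row i.
        b = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    drawn = math.dist(x[i], x[j])
                    b[i][j] = -target[i][j] / drawn if drawn > 0 else 0.0
            b[i][i] = -sum(b[i][j] for j in range(n) if j != i)
        # Guttman transform, computed from the current positions.
        x = [
            [sum(b[i][k] * x[k][c] for k in range(n)) / n for c in range(dim)]
            for i in range(n)
        ]
    return x


# Hypothetical example: four items that should all be 1 apart.
equal4 = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
print(stress(smacof(equal4, dim=2), equal4))  # stays above zero in 2D
print(stress(smacof(equal4, dim=3), equal4))  # can approach zero in 3D
```

Four equally distant items cannot be drawn exactly in 2D, so the best 2D layout always keeps some stress; the algorithm only spreads that error over the pairs.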
Spinning animation of a 3D graph of Slavic languages, focused on Slovakian (centre), with Czech closest, and Russian (large bubble, dark green), Polish (medium bubble, light green), Ukrainian (cyan green), Belorussian (between Russian and Ukrainian), and Bulgarian and Macedonian (top, light green), with linguistic distances according to Beaufils
But these bubbles have still been drastically warped to fit into three dimensions. Had they been squeezed into a 2D graph, the 20301 connections would have been even more distorted. If you measured the distance between Russian and Slovakian (which should be 6.7) or between Bulgarian and Slovakian (15.5), you would not get those values exactly but something slightly (and in some cases drastically) different.
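Checking a drawn layout against the labelled values comes down to measuring every pair and comparing. The Python sketch below does that for a hypothetical case, not data from the Slavic graph: four items that should all be 1 apart, drawn at the corners of a unit square. The function name and the data are illustrative assumptions.

```python
import math
from itertools import combinations


def distortion_report(points, target, names):
    """Compare drawn and intended distances for every pair of points.

    Returns ((name_i, name_j), intended, drawn, relative_error) tuples,
    where relative_error = (drawn - intended) / intended.
    """
    report = []
    for i, j in combinations(range(len(points)), 2):
        drawn = math.dist(points[i], points[j])
        intended = target[i][j]
        report.append(((names[i], names[j]), intended, drawn, (drawn - intended) / intended))
    return report


# Hypothetical layout: four items meant to be 1 apart, placed on a unit square.
names = ["A", "B", "C", "D"]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
all_one = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]

for pair, intended, drawn, error in distortion_report(square, all_one, names):
    print(pair, intended, round(drawn, 3), f"{error:+.0%}")
```

The four sides come out exact, but the diagonals A-C and B-D are drawn √2 ≈ 1.414 long instead of 1: the same kind of mismatch between drawn and labelled distance that the email pointed out.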
Spanish and Italian being drawn less than 35 apart, even though the line is labelled 41, is the result of a placement decision, which is ultimately necessary because the relationships cannot be laid out perfectly in two dimensions.