]]>

]]>

]]>Thanks for the references. I don’t specifically know of an MDS method for spherical data, but it is an attractive idea. If you look at the set of correlations between objects (i.e., languages), then the inverse cosines of these correlations correspond to angles (i.e., great-circle distances) between points on a unit sphere. So in general, n languages could be mapped, without information loss, onto an (n-1)-dimensional sphere, that is, the unit sphere in n-dimensional space. What you would be looking for is a projection of such a representation onto a 2-sphere with minimal information loss. Depending on how you define information loss, my gut feeling is that the representation would be the same as, or similar to, what you would get from the first three principal components of the correlation matrix.

There remains the rather complicated question of how one would define a “correlation” between languages. An idea that jumps into my head is using the probability of mutual comprehension: how likely is a native speaker of one language to understand a random utterance in another language? This could be determined empirically by testing a sample of speakers of various languages. Could one treat these probabilities as correlations? Correlations can be positive or negative. Does it make sense for two languages to have a negative correlation? Would that mean that one is more likely to misunderstand the foreign utterance (too many false friends)? On the other hand, if we consider zero correlation as the farthest distance, then the graphical representation would be restricted to the first octant of the sphere, which is just a stretched triangle.

I am just thinking as I type; these are some thoughts you might consider.
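To make the idea above concrete, here is a minimal sketch, assuming a made-up correlation matrix rather than any real language data. It keeps the three leading principal components of the correlation matrix, pushes each point radially onto the unit 2-sphere, and reads off great-circle distances as inverse cosines. The function names `sphere_from_correlations` and `great_circle_angles` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def sphere_from_correlations(corr):
    """Place n items on the unit 2-sphere from an n x n correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(corr)
    top = np.argsort(vals)[::-1][:3]
    # Scale eigenvectors so that coords @ coords.T approximates corr (rank-3 part).
    coords = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    # Push every point radially onto the unit sphere (assumes no row is all zeros).
    return coords / np.linalg.norm(coords, axis=1, keepdims=True)

def great_circle_angles(points):
    """Pairwise angles in radians between unit vectors."""
    return np.arccos(np.clip(points @ points.T, -1.0, 1.0))

# Toy input: four unit vectors in 3D, so their correlation matrix has rank 3.
toy = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
toy /= np.linalg.norm(toy, axis=1, keepdims=True)
toy_corr = toy @ toy.T
toy_points = sphere_from_correlations(toy_corr)
```

When the correlation matrix has rank three or less, as in the toy input, the angles are reproduced exactly; for a real matrix of higher rank the radial projection distorts them, and how much is lost depends on how information loss is defined.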

]]>I agree that showing this data in a graph like this is an attractive but flawed method. I think that Elms’ version is, to some degree, better than mine. Her graph is the equivalent of showing a subway system as a diagram in Vignelli’s style, giving the viewer a clear mental image of connections and relationships, whereas my version is messier, less clear-cut, and the equivalent of this 1948 transit map. But both versions are attempts to show a matrix of data with a multitude of dimensions in only two dimensions. Subway systems are much simpler in the number of dimensions they traverse. I tried to explain this dilemma in the blog post Dimensional limits of Graphs. You mentioned multidimensional scaling, which I have attempted with Marcin and Vincent; see Visualizing Lexical Distance in Three Dimensions. My 2015 version used nothing so sophisticated.

However, I feel that the visualization in three dimensions is, again, not as attractive to the casual viewer as the earlier 2D versions.

I think it should either be made into an interactive version, similar to this work from Daniel Probst,

or, since the MDS warps these data nodes in three dimensions into a rough sphere shape, be constrained to a sphere. For the latter, I am looking for an algorithm that would scale the connections so that the nodes are distributed over the surface of a sphere in 3D. The connections between the individual languages would be scaled up or down, but the distance from each language node to the center of the sphere would always be the same. This would allow me to manually draw an “attractive” global map of the languages with calculated language node placements.

My question to you is, do you know of a method of multidimensional scaling limited to the surface of a sphere?
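One way to picture such a constraint is the rough sketch below. It is not an established named method and not what was used for any of the graphics discussed here. It assumes the lexical distances have already been rescaled to angles in radians, matches the cosines of those angles instead of the angles themselves, and after every gradient step renormalizes each node so that it stays at distance 1 from the center. The function name `spherical_mds` and its parameters are hypothetical.

```python
import numpy as np

def spherical_mds(angles, n_iter=500, lr=0.5):
    """Fit unit vectors in 3D whose pairwise angles approximate `angles` (radians)."""
    angles = np.asarray(angles, dtype=float)
    n = angles.shape[0]
    target = np.cos(angles)  # desired dot products between unit vectors

    # Spectral start: three leading eigenvectors of the target cosine matrix.
    vals, vecs = np.linalg.eigh(target)
    top = np.argsort(vals)[::-1][:3]
    x = vecs[:, top] * np.sqrt(np.clip(vals[top], 1e-12, None))
    x /= np.linalg.norm(x, axis=1, keepdims=True)

    for _ in range(n_iter):
        residual = x @ x.T - target
        np.fill_diagonal(residual, 0.0)  # a node's similarity to itself is ignored
        x -= (lr / n) * (residual @ x)   # gradient step on the squared cosine error
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # project back onto the sphere
    return x

# Toy check: six points that really lie on a sphere should be recovered exactly.
toy = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                [1, 1, 1], [1, -1, 0], [0, 1, -1]], dtype=float)
toy /= np.linalg.norm(toy, axis=1, keepdims=True)
toy_angles = np.arccos(np.clip(toy @ toy.T, -1.0, 1.0))
fitted = spherical_mds(toy_angles)
fitted_angles = np.arccos(np.clip(fitted @ fitted.T, -1.0, 1.0))
```

Every fitted node has the same distance to the center by construction, so only the connections are stretched or compressed. How raw lexical distances should be rescaled to angles (for example, mapping the largest distance to some maximum angle) is a separate choice that this sketch leaves open.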


]]>Apologies for the late reply. I agree, Dacian is sorely missing here, as are other Indo-European languages. Romanian might well borrow 11% to 15% of its vocabulary from Slavic, but the graphic above does not compare the total vocabulary of one language with that of the next. That would also be very interesting, and some projects have done exactly that for several languages. The graphic above merely compares a very small number of stable core words with each other.


]]>I drew this with MS PowerPoint, though these days I would use Inkscape for something like this.


]]>

]]>The definition of distance itself is problematic, and you discuss the issues above. But another aspect is: how do you display the distances graphically? The graph links circles with line segments that are coded according to similarity. But it is not at all clear how well the geometric placement reflects these distances. In particular, what does the lack of an edge between two circles mean? You suggest it just means “insufficient data”, which is not at all what the visual effect of the graph implies.

There are statistical techniques, such as multidimensional scaling or principal component analysis, that allow one to present, in two dimensions, a reasonable summary of the distance matrix. From what I hear, nothing so sophisticated was used to come up with this diagram. Are linguists conversant with these statistical methods?
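For readers unfamiliar with the technique, here is a minimal sketch of classical (Torgerson) multidimensional scaling, assuming a symmetric matrix of pairwise distances as input. The toy distances below come from four made-up points in a plane, not from any linguistic data, and the function name `classical_mds` is chosen only for this illustration.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed n items in k dimensions from an n x n matrix of pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the k leading eigenpairs; negative eigenvalues are clipped to zero.
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy input: corners of a 3 x 4 rectangle, which fit in two dimensions exactly.
toy = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=float)
toy_dist = pairwise_distances(toy)
layout = classical_mds(toy_dist, k=2)
```

The layout is determined only up to rotation and reflection, so it is the recovered pairwise distances, not the coordinates themselves, that should be compared with the input. With real lexical distances the two-dimensional summary is approximate, and the discarded eigenvalues indicate how much is lost.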


]]>

]]>Sadly, the hard drive that held the original vector file broke.


]]>

]]>

]]>

]]>

]]>

]]>

]]>

]]>

]]>