If you wanted to update a map showing how to travel from one city, town or village to another, it would help to know where the inhabited places are and which paths connect them. Likewise, if you wanted to update a diagram showing the linguistic distance between language families, languages and dialects, determining their relationships to each other is one of the last steps; finding a list of languages is one of the first.
But which list? Lexical Distance Among Languages of Europe used ISO 639-3. It is maintained by SIL, which also publishes Ethnologue (ɠ). ISO 639-3 has 7 865 entries and an upper limit of 17 029 assignable language codes, so about 46% of the possible codes are already assigned. Each language is assigned a three-letter code: English has the code eng, German deu and Japanese jpn. SIL, the Summer Institute of Linguistics, Inc., was founded to serve Christian missionaries who wished to bring the word of God to all the people of the Earth, whichever language they may speak. This code system is widely used, for example by Wikipedia (where it is not using ISO 639-1), by OLAC and by the CLARIN project.
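To make the format and the arithmetic concrete, here is a minimal Python sketch that checks whether a string has the shape of an ISO 639-3 code (three lowercase letters) and recomputes the share of assigned codes from the figures quoted above. It checks the shape only, not whether SIL has actually registered a code; the function names are hypothetical.

```python
import re

# Shape of an ISO 639-3 identifier: exactly three lowercase ASCII letters.
# This checks the format only, not whether the code is actually assigned.
ISO_639_3_PATTERN = re.compile(r"[a-z]{3}")


def looks_like_iso_639_3(code: str) -> bool:
    """Return True if `code` has the three-letter ISO 639-3 shape."""
    return ISO_639_3_PATTERN.fullmatch(code) is not None


# Figures quoted in the text above.
ASSIGNED = 7865
UPPER_LIMIT = 17029


def assigned_share(assigned: int = ASSIGNED, limit: int = UPPER_LIMIT) -> float:
    """Fraction of the assignable code space that is already in use."""
    return assigned / limit


if __name__ == "__main__":
    for code in ["eng", "deu", "jpn", "en", "stan1293"]:
        print(code, looks_like_iso_639_3(code))
    print(f"{assigned_share():.0%}")  # 46%
```

A regular expression is enough here because the standard fixes the length and alphabet; validating against the real registry would mean loading SIL's published code table instead.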
In 2015, when I updated the lexical distance diagram, I included one language that did not have an ISO 639-3 code: Montenegrin, which is listed with MIS1. If linguists or Montenegrins want this language included in an ISO 639 standard (and they do), they need to apply for its inclusion. For ISO 639-3 this means submitting a change request to SIL in Dallas, Texas. The request can then be accepted, rejected or put on hold pending additional information. This change request process has been criticized as something of an opaque black box: requests go into the box, something happens inside it, and an answer comes out some time later. But SIL is moving to become more transparent.
There are also other ISO 639 lists. ISO 639-2, with 464 entries, also gives each listed language a three-letter code: English has the code eng, German ger (bibliographic) or deu (terminological), and Japanese jpn. It is maintained by the US Library of Congress, whose mission is to support the US Congress in fulfilling its constitutional duties and to further the progress of knowledge and creativity for the benefit of the American people. ISO 639-1 has 184 entries, each consisting of two letters: English has en, German de and Japanese ja. ISO 639-1 is maintained by Infoterm, a Vienna-based association founded by UNESCO, with many members from around the world, which aims to support and co-ordinate international co-operation in the field of terminology. Terminology? Basically, it is about making sure that when different peoples sign a contract or an agreement, they all understand it the same way. This two-letter code system is widely used by companies (e.g. Microsoft), websites (e.g. Wikipedia) and organizations.
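Since the same language carries different codes in the different ISO 639 parts, a crosswalk table is the usual way to move between them. The Python sketch below hard-codes only the three example languages from the paragraph above; the table layout and the helper `to_part3` are hypothetical, and German shows why ISO 639-2 needs two columns (bibliographic ger, terminological deu).

```python
from typing import NamedTuple, Optional


class IsoCodes(NamedTuple):
    """Codes for one language across the ISO 639 parts."""
    part1: str   # ISO 639-1, two letters
    part2b: str  # ISO 639-2 bibliographic code
    part2t: str  # ISO 639-2 terminological code
    part3: str   # ISO 639-3


# Tiny hand-made table with the three example languages from the text.
# German is the interesting case: ISO 639-2 lists both "ger" (B) and "deu" (T).
ISO_TABLE = {
    "English": IsoCodes("en", "eng", "eng", "eng"),
    "German": IsoCodes("de", "ger", "deu", "deu"),
    "Japanese": IsoCodes("ja", "jpn", "jpn", "jpn"),
}


def to_part3(code: str) -> Optional[str]:
    """Map a code from any ISO 639 part in the table to its ISO 639-3 code."""
    for codes in ISO_TABLE.values():
        if code in codes:  # NamedTuple supports plain tuple membership
            return codes.part3
    return None


if __name__ == "__main__":
    print(to_part3("ger"))  # deu
    print(to_part3("ja"))   # jpn
```

A real crosswalk would load the full code tables published by the registration authorities rather than three hand-typed rows, but the lookup would work the same way.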
Have linguists managed to agree on a list themselves?
Of the bodies maintaining the ISO 639 language lists, one is a linguistic organization that evolved from a faith-based organization, one is a US government organization and one is a terminology association founded by UNESCO. Is there perhaps an international organization of linguists that has agreed on a system for listing languages? Maintaining a list of languages would seem a logical task for an international linguistic association. The International Linguistic Association publishes the linguistic journal WORD but no language list. The International Association of Applied Linguistics offers linguistic news, books and reviews, but sadly no language list either. None of the Resource Network for Linguistic Diversity, the Linguistic Data Consortium, the International Association of Multilingualism and the International Network in Biolinguistics maintains its own language list. I find it a shame that linguists around the world have not collectively agreed on a common language list that best fits the needs of their science.
MultiTree, Max Planck Institute & Linguasphere
But there are language lists that have been compiled by linguists. Notable is the collection of lists shared by MultiTree, a project by Linguistlist. Some 383 language lists are available in its database. Most cover a single geographical area or language family; they were compiled separately between 1700 and 2017 and have been painstakingly digitized and shared there.
Max Planck Institute
Rather than relying on the Summer Institute of Linguistics, Inc. to approve or amend changes to ISO 639-3, linguists at the Max Planck Institutes in Leipzig and Jena maintain two lists of languages of their own. The World Atlas of Language Structures (WALS), which has about 2 679 entries, gives each language a three-letter code. Many codes are the same as in the ISO 639 lists: English has the code eng, German ger and Japanese jpn. WALS.info shares a wealth of linguistic information and documents. The WALS code system is also used by other projects, e.g. Phoible.
The Max Planck Institutes also maintain Glottolog (ഗ), a counterpart to Ethnologue (ɠ). Glottolog boasts 7 943 language entries. Each code consists of four letters followed by four digits, e.g. Standard English has the code stan1293, New Zealand English newz1240, Standard German stan1295, Yiddish yidd1256, Yiddish Eastern yidd1258 and Yiddish Western yidd1257. Indo-European has the code indo1319. Remember, ISO 639-3 has an upper limit of 17 029 languages that can be included in its code system; Glottolog has a theoretical upper limit of about 4.6 billion (26⁴ × 10⁴ = 4 569 760 000). That limit is only theoretical, because Glottolog also follows a systematic way of assigning the letters and numbers. The main point: Glottolog will not run out of codes anytime soon, whereas ISO 639-3 will.
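The capacity comparison is easy to check. The Python sketch below assumes the shape described above (four lowercase letters a to z, then four digits; the actual registry may be looser) and computes the raw size of each code space, ignoring any rules for how codes are actually assigned. The names are hypothetical.

```python
import re

# Assumed Glottocode shape, as described in the text: four lowercase
# letters followed by four digits, e.g. "stan1293".
GLOTTOCODE_PATTERN = re.compile(r"[a-z]{4}[0-9]{4}")


def looks_like_glottocode(code: str) -> bool:
    """Return True if `code` matches the assumed Glottocode shape."""
    return GLOTTOCODE_PATTERN.fullmatch(code) is not None


# Raw size of each code space, before any assignment rules are applied.
GLOTTOLOG_SPACE = 26**4 * 10**4  # 4 569 760 000
ISO_639_3_SPACE = 26**3          # 17 576 raw three-letter combinations

if __name__ == "__main__":
    for code in ["stan1293", "newz1240", "yidd1256", "indo1319", "eng"]:
        print(code, looks_like_glottocode(code))
    print(f"{GLOTTOLOG_SPACE:,}")              # 4,569,760,000
    print(GLOTTOLOG_SPACE // ISO_639_3_SPACE)  # 260000
```

Even before reserved codes are subtracted on the ISO side, the Glottolog space is 260 000 times larger, which is why exhausting it is not a practical concern.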
Finally, for a bit of sparkle, there is the massive Linguasphere list with 32 798 entries. The Linguasphere Register is the life's work of David Dalby (with the support of many others), who developed the list as a planetary referential system. Its codes consist of two digits and up to six letters. This system has a theoretical upper limit of about 1 trillion entries, again only theoretical because there is also a system behind the numbers and letters. For example, 52-ACB-gaf is the code for Donau Yiddish (or Danube/Vienna Yiddish). The dialect Donau Yiddish is part of the inner language Yiddish Western, which has the code 52-ACB-ga and in turn is part of the outer language Yiddish with the code 52-ACB-g. The outer language Yiddish is grouped in the DEUTSCH+NEDERLANDS net with the code 52-ACB. That net is part of the FRYSK+DEUTSCH chain, or CONTINENTAL-WEST-GERMANIC, which has the code 52-AC. The chain is in turn part of the NORSK+FRYSK set, with the code 52-A. That set belongs to the GERMANIC phylozone 52=, which is part of the INDO-EUROPEAN phylosector with the code 5= (there are also geosectors and geozones). You can download the Linguasphere Register (🌐?) in PDF form at Linguasphere.
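Because every Linguasphere code contains its ancestors as prefixes, the whole hierarchy walked through in the Donau Yiddish example can be read straight off the code. The Python sketch below assumes the shape described above (two digits, up to three uppercase letters, then up to three lowercase letters) and takes its level names from that example. It does not distinguish phylo- from geo-sectors and ignores any longer extensions the real register may use; `ancestors` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Level names in the order used in the Donau Yiddish example.
LEVELS = ["sector", "zone", "set", "chain", "net",
          "outer language", "inner language", "dialect"]

# Assumed shape: two digits, a dash, 1-3 uppercase letters,
# then optionally a dash and 1-3 lowercase letters.
CODE_PATTERN = re.compile(r"(\d{2})-([A-Z]{1,3})(?:-([a-z]{1,3}))?")


def ancestors(code: str) -> List[Tuple[str, str]]:
    """Return (level, code) pairs from the sector down to `code` itself."""
    m = CODE_PATTERN.fullmatch(code)
    if m is None:
        raise ValueError(f"not a Linguasphere-style code: {code!r}")
    digits, upper, lower = m.group(1), m.group(2), m.group(3) or ""
    if lower and len(upper) != 3:
        raise ValueError("lowercase levels need a complete three-letter net")

    chain = [(LEVELS[0], digits[0] + "="), (LEVELS[1], digits + "=")]
    # Each uppercase letter adds one level: set, chain, net.
    for i in range(1, len(upper) + 1):
        chain.append((LEVELS[1 + i], f"{digits}-{upper[:i]}"))
    # Each lowercase letter adds one level: outer, inner language, dialect.
    for i in range(1, len(lower) + 1):
        chain.append((LEVELS[4 + i], f"{digits}-{upper}-{lower[:i]}"))
    return chain


if __name__ == "__main__":
    for level, code in ancestors("52-ACB-gaf"):
        print(f"{code:<12}{level}")
```

Running it on 52-ACB-gaf lists 5=, 52=, 52-A, 52-AC, 52-ACB, 52-ACB-g, 52-ACB-ga and 52-ACB-gaf, the same path as the prose above. This prefix property is what makes such a code "referential": membership in a group can be tested without looking anything up.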
I put in the effort of combining the Linguasphere PDF version into one Excel file, which you can download in the next blog post. If anyone finds mistakes, or has matched the list to SIL/Ethnologue (ɠ), Glottolog (ഗ) or WALS (🌏?), that would help a lot. To some extent these lists compete with each other, but their creators have also worked together.
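One naive first pass at matching two such lists is an exact match on normalized language names. The Python sketch below uses only names and codes quoted in this post as hypothetical sample rows; it ignores word order, so "Yiddish Western" would also match "Western Yiddish", and it leaves anything unmatched for a human to check. Real matching would need much more care (alternative names, spelling variants, different splits of languages and dialects).

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Row = Tuple[str, str]  # (code, name)

# Hypothetical sample rows, using only names and codes quoted in this post.
LINGUASPHERE: List[Row] = [
    ("52-ACB-g", "Yiddish"),
    ("52-ACB-ga", "Yiddish Western"),
    ("52-ACB-gaf", "Donau Yiddish"),
]
GLOTTOLOG: List[Row] = [
    ("yidd1256", "Yiddish"),
    ("yidd1258", "Yiddish Eastern"),
    ("yidd1257", "Yiddish Western"),
]


def name_key(name: str) -> FrozenSet[str]:
    """Order-insensitive, case-insensitive key built from the words of a name."""
    return frozenset(name.lower().split())


def match_by_name(left: List[Row], right: List[Row]) -> Dict[str, Optional[str]]:
    """Map each left-hand code to a right-hand code with the same name key, or None."""
    # If two right-hand rows share a key, the later one wins.
    index = {name_key(name): code for code, name in right}
    return {code: index.get(name_key(name)) for code, name in left}


if __name__ == "__main__":
    print(match_by_name(LINGUASPHERE, GLOTTOLOG))
    # {'52-ACB-g': 'yidd1256', '52-ACB-ga': 'yidd1257', '52-ACB-gaf': None}
```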
Do you actually need codes?
Which Wikipedia? The English Wikipedia has 8 190 language articles, German 1 981, French 3 542, Russian 3 270, Mongolian 24, Arabic 686, Spanish 1 654, Portuguese 1 995, Swedish 665, Turkish 542, Italian 1 705, Polish 1 050, etc.
Languages would be an obvious and rewarding dataset for the Wikidata project to assemble. Maybe in the coming years the online community will be able to compile a language list that builds on the existing ones.