MIT claims to have found a “language universal” that ties all languages together (arstechnica.co.uk)
181 points by jonbaer on Aug 6, 2015 | 94 comments


A minor feature likely based on human constraints -- hardly any implications for a universal grammar. Even Japanese breaks this easily, which instantly kills "universal" in the Chomsky sense (it has to be absolute to be something our brains have innately). 37 languages aren't much when there are so many out there, particularly minor ones of isolated populations with entirely different features.

Not super exciting just yet.


Yes, I also think dependency length minimization is just a cognitive constraint. (Lead author here.) The idea of calling this a universal in the Chomsky sense is all from the press. The cool thing about it is that (previous work has shown) you can use this constraint in principle to derive a lot of the more substantive "universals" (really, overwhelming tendencies) of language, such as that natural language expressions are usually well-nested, and that in languages where the verb follows the object, the noun also follows the adjective, and the preposition follows the noun, etc.

As for isolated languages, I just finished performing some dependency length preference experiments on indigenous people in Bolivia, but haven't analyzed the data yet, so we'll see :)


What have you learned about how to talk to the press?

Bad coverage like this reflects back negatively on the research and the institution, sadly.

Uni PR offices are famously bad at twisting and over-selling research results.


I think I've learned there's not much you can do.

My advisor and I talked to two reporters: one from MIT News (https://newsoffice.mit.edu/2015/how-language-gives-your-brai...) and one from Science Magazine (http://news.sciencemag.org/social-sciences/2015/08/all-langu...). They both communicated with us about all kinds of details in the articles, and let us comment on the drafts. We clearly stated what we did and didn't want to claim, and they did a good job conveying what we wanted while adding extra connections we hadn't thought of for popular appeal.

On the other hand, we had no contact with anyone about the Ars Technica article. I've also seen some other articles cropping up that are copying the original articles, and making claims I wouldn't stand by. I don't think there's anything we can do about that.


Have you considered demanding a correction?

They are putting words in your mouth. Somebody should keep journalists accountable.

A few days ago I tried to follow the source of an article in an online newspaper. The source was another online newspaper, in a different language, and the source for that one was yet another. Along the way things were added and removed, just like in a crazy game of telephone.

It makes you think about the news we read and take for granted.


"Scientists notice small trend that may or may not be of importance to language" just isn't a very catchy headline.


One good thing we did for our last paper [1]: we set up a FAQ [2] with the most common questions (we updated the FAQ when we received new questions). Thanks to the FAQ, journalists do not "invent" too much, and they have quotes that they can use.

Overall, I think the FAQ helped a lot and avoided some common misinterpretations.

[1] Paper: http://www.nature.com/nature/journal/v521/n7553/full/nature1... | arxiv version & video: http://chronos.isir.upmc.fr/~mouret/website/nature_press.xht...

[2] FAQ: http://chronos.isir.upmc.fr/~mouret/website/nature_press.xht...


>Bad coverage like this reflects back negatively on the research and the institution, sadly

On the contrary! It's usually assisted by the researchers and the institution, and it helps them get grants and free press "advertising".


Also: do not let the Uni PR office write the press release. Write it yourself and insist they use yours.


These abstract patterns are interesting (and I'd love to read more about them), but more so if you can derive some kind of tree that neatly describes all languages. But doing so is essentially solving Chomsky's main theory, which so far hasn't been done, to my understanding?


Very interesting.

Do you think this constraint would hold in artificial languages (Esperanto, Lojban, Tengwar) as well?


Speakers of artificial languages are probably influenced by the syntax of their native languages, so yes, I suspect we'd see dependency length minimization when people speak those languages too.


Glad this is top comment atm, and thanks for saying it. Pop/folk linguistics is too easy and super annoying.

This is a bad article and it misrepresents the notion of a 'universal' in any sense (Chomsky, Greenberg, you name it) but the most purely functional. Things like sentence/word length being bounded by memory or cognitive capacity or whatever aren't a 'universal' in any meaningful or useful sense, and no linguist would argue otherwise. Probably.


I don't think you understand what the universal grammar is, if you're discounting DLM as merely being "likely based on human constraints," because that's the whole hypothesis of the universal grammar. It posits that there are physical limitations in the brain that force certain structures on language, as opposed to language being completely free form.


The hypothesis that the brain imposes constraints on possible languages is trivially true, and the alternative that languages can be free-form is a straw man. That's the problem with universal grammar: depending on who you ask, it is either absurd or trivial. Perhaps there are more sensible formulations of the idea but I have yet to see them.


Finding something universal (and non-trivial) to human languages would still be exciting.


Since they only investigated 37 languages, isn't this only really evidence that they found a feature that ties 98% (~= 38/39) of languages together? Languages are known for having a few outliers with completely crazy rules.

Also, if two sentences are considered together, the average DLM would be significantly lower for those sentences than for one random sentence of the same length. So I'm not sure what this theory implies other than "the definition of a sentence can be vague".


Lead author here. It's true that longer sentences will have longer dependency length on average. That was one of the big methodological problems with previous papers that tried to show this. In the paper, we deal with this by doing stats on dependency length growth rate as sentences get longer, rather than averaging dependency lengths from sentences of different length.
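To make the "growth rate" point concrete, here is a minimal Python sketch (not the actual analysis from the paper; the observations are made up for illustration) comparing how fast total dependency length grows with sentence length against a random-reordering baseline:

    import numpy as np

    def growth_rate(sentence_lengths, total_dep_lengths):
        """Least-squares slope of total dependency length vs. sentence length."""
        slope, _intercept = np.polyfit(sentence_lengths, total_dep_lengths, 1)
        return slope

    # Hypothetical (sentence length, total dependency length) pairs, for illustration only.
    observed = [(5, 6), (10, 14), (15, 25), (20, 37)]
    random_baseline = [(5, 8), (10, 22), (15, 40), (20, 61)]

    obs_slope = growth_rate(*zip(*observed))
    rand_slope = growth_rate(*zip(*random_baseline))

    # The claim under test: real sentences grow their dependency length
    # more slowly than randomly reordered versions of the same sentences.
    print(obs_slope < rand_slope)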

I'm sure that among the 7000 languages of the world there are some that don't minimize dependency length. But I'd be content showing it's an overwhelming tendency rather than a true "universal"!


There's a reasonably well-supported taxonomy of languages, so judging whether the sample size is sufficient is really a question for the experts.


Does anyone have a list of the 37 languages used?


It's on page 3 of the paper:

Ancient Greek, Arabic, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Modern Greek, Persian, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Turkish


70% (26/37) of those are Indo-European (I think). No Amerindian, Polynesian, or Austronesian languages represented.


Nor a single indigenous African language which I find particularly strange given Africa's somewhat central (/s) role in human developmental history.


I'd love to have more interesting languages in the sample. If you speak such a language, and you are willing to parse ~2000 sentences for me, get in touch :)


Did you approach any of the Native Americans tribes? The Navajo in particular are very up on promoting their language (they even translated Star Wars).


Indonesian is an Austronesian language.

Actually, more important is the lack of Niger-Congo languages--this is the family with the largest number of languages.


I count 24 indo-european languages.

Not indo-european: Arabic, Basque, Chinese, Estonian, Finnish, Hungarian, Indonesian, Japanese, Korean, Tamil, Telugu, Turkish, Hebrew

Your point still stands.


[flagged]


Please keep nationalist politics off HN.


It is just history.


That's what nationalist politics always says. It's not welcome here.


Can you prove that? I'm seeing that claim for the first time. Most of the time, nationalists try to hide their history, because, by and large, we all branched from African peoples.


Working with word co-occurrence clustering based on Kolmogorov complexity, I have become convinced that there is a computational complexity element to every language.

Words like "circumvent" and "environment" are close in regards to complexity. Words like "us" and "me" are close in regards to complexity.

The counting argument tells us that most strings are not compressible. It is then a wonderful feature of sensor data, natural language, DNA and computer code that it can be compressed quite a bit. This means there is a certain order in the language that compressors can use to keep the file size smaller.
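As a rough illustration of that compressibility claim (using zlib as a crude, computable stand-in for Kolmogorov complexity; the sample text is arbitrary), natural-language text compresses noticeably better than random bytes of the same length:

    import os
    import zlib

    def compression_ratio(data: bytes) -> float:
        """Compressed size / original size; lower means more exploitable structure."""
        return len(zlib.compress(data, 9)) / len(data)

    english = (
        "In this view languages evolve to use as little energy as possible to "
        "convey as much information as possible. We use short words for concepts "
        "that we use often, and longer words for rarer, more complex concepts."
    ).encode()
    random_bytes = os.urandom(len(english))

    print("english text:", round(compression_ratio(english), 3))      # well below 1.0
    print("random bytes:", round(compression_ratio(random_bytes), 3)) # ~1.0 or slightly above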

There is a cognitive economy trade-off between the energy needed to keep a system running and increased complexity. Less complex language helps us save energy. We use short words for concepts that we use often. Very complex concepts and words like "disambiguation" may be described with shorter simpler words to someone who has not stored that word and general accepted meaning yet.

In this complexity view languages evolve to use as little energy/computational complexity as possible to convey as much information as possible. The results found in this article can also be explained using this view. Parsing a sentence like "Throw the trash out" requires you to store in working memory the word "throw" until you get to the word "out" for the full concept "to throw out". Until you get to the word "out", the "throw" remains in a superstate (could become "throw in", "throw on" etc.). You need both words to form a mental picture of someone throwing out the trash. This requires more computational energy from the listener, and is hence inefficient. If you want your message to be heard, you have to communicate in clear low-energy sentences. So using simpler, less computationally intensive sentences benefits both the speaker and the listener.

This would readily explain why natural languages beat the random benchmark. Randomness has far less structure to use for compression by an intelligent agent. Randomness is not optimized communication, since it is more unpredictable.

In short: Simplicity and conveying information with little energy is a fitness factor that natural selection optimizes for. This is universal to all natural language speaking agents with a limited energy budget.


It’s probably more memetics (constrained by cognitive load) than genetics that select the features of languages, and there are likely many other factors that determine the complexity of languages and grammatical structures: For example, there is a sweet spot between minimal symbol count and minimal lengths of the words. In the extremes you have either short words but many symbols, or few symbols but long words. At the same time the number of distinguishable phonemes and therefore symbols are, of course, restricted by the sounds that are producible by the average vocal tract.

Secondly, the communication channels thought→vocalization→hearing→thought or even thought→typing→reading→thought are inherently very noisy, so you end up with a lot of redundancy, like particles, introduction and transition phrases.

And lastly, I think that there are always some words and phrases that are not shaped by efficiency/cognitive load, but rather by whether it is fun or fashionable to talk in a certain way. There is certainly some cultural variance that can be orthogonal to efficiency.


Very thought provoking answer! Thanks.

As memes house in agents with an energy budget, I think that shorter simpler memes have more chance to take hold and reproduce ("Make something people want").

Words in a sentence are like models in an ensemble. Simple words are more general and have a high bias and low variance ("Make stuff users want"). Highly complex words and sentences have a lower bias, but a higher variance. You need to average a lot of them to get a clear picture. That's why the sentences in scientific papers are usually so long, they need to gradually cancel out the noise.

> There is certainly some cultural variance that can be orthogonal to efficiency.

Yes, agreed! Though same with memes, certain words or symbols without any redundancy may have cultural value. You may gain energy by speaking a certain language to a certain degree of sophistication. You may have to invest energy to gain access to the information contained in symbols (or have agents "unzip" these for themselves).


DLM doesn't seem to be a generative feature of language, just a constraint. Languages have to be understood and used by humans, and all this paper seems to show is that humans have limitations on working memory. Anything, language or otherwise, that expects to be used by humans would minimize the load on working memory.


Especially because they admit themselves that two of the languages in their set don’t even minimize.

Even with short sentences, German and Japanese often have large DLM values.

With more complex, nested sentences (which are really common in German), this becomes more of an issue, because between two connected words you can have 7 subclauses.

(Seriously, read Karl Marx’ Das Kapital, or Günther Grass’ Der Krebsgang, or read any other German author.)


All of the languages are "minimized" in that dependency length is below the random baseline.

German, Japanese, etc. are just much less minimized than other languages like English and Indonesian. Working out why is the next step for us. I don't think it's because these languages are inherently harder to understand. They just represent different solutions to the communication problem.


Thank you! Very interesting research all around.

I think these old German and Japanese languages may have been hard to understand for outsiders, but were used with high sophistication (you have to invest energy to access this information) among insiders. For instance the Japanese pillow words / Makurakotoba or German words for hard to translate concepts like "Weltschmerz", "Kummerspeck" and "Torschlusspanik". All short, useful words for communicating complex rich concepts, provided the agent knows the meaning of these words.


weltschmerz - pity, sorrow

Torschusspanik (sic!) - fear of quick commitment


German, especially, loves deeply nested sentences. This means, on the other hand, though, that we use far fewer sentences.

The grammar of an average English article in a science magazine reads to me, a native German speaker, like a first-grader's text.


Run-on sentences are bad style, if I'm any judge. Although, I wonder how much this relies on normal forms and getting normalization wrong.

   The grammar of an average English science magazine reminds me of German first-grader text.


Well, not run-on sentences. Those are different from nested sentences.

A nested sentence is a sentence structure, a combination of main sentences, which are sentences that could stand alone, and subclauses, which are clauses that are dependent on main sentences, in a specific way that allows for easier understanding, which is done by providing an explanation for a specific part in a subclause.

As you can see, this doesn't really work well in English, but in German sentences actually become MORE readable if done like this.


I doubt there's any difference, because the example and my modification translate to German exactly, word by word.


Could it be that the dependency representation has an influence on this? Perhaps the dependency lengths are shorter with non-projective dependencies?


We included both projective and nonprojective dependencies.

Interestingly, there isn't any evidence that nonprojective dependencies are harder for people to understand than projective ones, so I wouldn't expect nonprojective arcs to be shorter on average than projective ones.


It's not really possible to discuss this article if we can't access it.

  This item requires a subscription to Proceedings of the
  National Academy of Sciences.
This is really not good!

http://www.pnas.org/content/early/2015/07/28/1502134112.full...



Cool. What is the measure of word dependency? I.e., I'll agree that in

John threw the trash out

"threw" and "out" are dependent on one another. But is that an either/or, or are there degrees? It seems like "threw" and "trash" are also "dependent" in that they don't make independent sense.


Specifically, we used hand-parsed corpora developed by Google and a whole bunch of computational linguists over the last 15 years.

The dependency representation is of course an incomplete picture of how words hang together in a sentence. But it's the only format that's flexible enough that you could dream of parsing 37 languages to (approximately) the same standard.


All your replies in this thread are more clear/interesting than the original linked article.

Very cool stuff.


They used sentence dependency graphs - in this sentence "John", "trash" and "out" are dependencies of "threw". There are a few variants, but the graph construction rules are standardised.

https://en.wikipedia.org/wiki/Dependency_grammar
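If you want to play with this yourself, here is a minimal sketch using spaCy's parser as a stand-in for the hand-annotated corpora the paper used (assumes spaCy and its en_core_web_sm model are installed; an automatic parse may differ from a hand parse):

    import spacy

    nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

    def total_dependency_length(sentence: str) -> int:
        """Sum of |word position - head position| over all words in the sentence."""
        doc = nlp(sentence)
        # The root token is its own head, so it contributes 0 to the sum.
        return sum(abs(token.i - token.head.i) for token in doc)

    for s in ("John threw out the old trash sitting in the kitchen.",
              "John threw the old trash sitting in the kitchen out."):
        print(total_dependency_length(s), "-", s)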


Some combination of Enochian and LISP, no doubt. No, I didn't read the article.


I read it hoping to make a Lisp/Esperanto joke but alas, they only found one universal aspect of all languages; they did not find a universal language. The universal tendency is to bundle words together:

> You can see this effect by deciding which of these two sentences is easier to understand: “John threw out the old trash sitting in the kitchen,” or “John threw the old trash sitting in the kitchen out.”


If it's John doing the sitting, I would expect an extra comma before sitting, but for me, the first is a bit ambiguous: is the trash doing the sitting, or John?

That may be because keeping 'threw' and 'out' together in that way in Dutch feels wrong, or at least really, really awkward.


"John threw out the old trash sitting in the kitchen" is a bit of an idiomatic sentence in that understanding its intended meaning depends on knowing that throwing out the trash while sitting down doesn't make a lot of physical sense.

Writing somewhat more formally the sentence would be something like "John threw out the old trash that was sitting in the kitchen." (Although "threw out" itself is somewhat informal language. "John disposed of" or something along those lines would probably be used in a more formal context.)


Actually, I think Lisp has a pretty high average DLM compared to natural languages, which partly explains why some people have so much trouble with it. You can, by just dropping in a pair of parens, arbitrarily separate concepts from one another in a Lisp sentence. In fact, one primary complaint when reading Lisp is trying to pair up closing parens with their opening partners. So if this research shows anything, it's that Lisp could never have developed as a human language naturally.

.. So I guess it really must have been ordained by the gods, after all, then.


Interestingly the human brain blocks any line starting with

    (fhtagn



The title is significantly overstated. DLM isn't a language, it is a rule that many languages seem to follow and says nothing about Chompsky.


It's "Chomsky" of course, but I _really_ like "Chompsky"!


Well there's also this easter egg:

http://left4dead.wikia.com/wiki/Gnome_Chompski



And a famous[1] chimpanzee called Nim Chimpsky![2]

[1] Depends on the social circles you hang out in...

[2] Oblig. Wiki. P. link: https://en.wikipedia.org/wiki/Nim_Chimpsky

Edit: Damn! Someone (literally) beat me to it.


I think the title isn't overstated but awkward in a way that we're all translating it into an overstatement.

Between the quotes and not knowing what a language universal was, I assumed they meant a universal language; apparently I'm not alone in this. The authors might write a paper on this phenomenon next!


From 2012: Daniel Everett: "There is no such thing as universal grammar" http://www.theguardian.com/technology/2012/mar/25/daniel-eve...


> Languages like German and Japanese have markings on nouns that convey the role each noun plays within the sentence, allowing them to have freer word order than English.

Since when does German have a freer word order than English?

German has precise and strict rules about the placement of

1. normal verbs
2. verbs used in conjunction with modal verbs
3. conjunctions
4. particles in separable verbs
5. stressed parts of the sentence

And we are not talking about rules followed only by prescriptivist grammarians, but very common rules used in everyday conversations.

The article (the PR article, not the academic paper that I haven't read) looks like a poorly researched piece.


Delving into the specifics of individual human languages would be a colossal waste of productive time that could otherwise be spent pursuing "harder" science and technology.

The only possible "language universal" will be machine language when humans merge with machines.

The physical makeup of the biological brain, which is subject to random biochemical reactions, just can't maintain something as consistent as a "language universal" would need to be.



This seems to agree quite nicely with Jeff Hawkins' theories, mainly that our brains are primarily doing temporal pattern recognition, and language is no exception according to Hawkins. By minimizing the temporal gap between related items, you minimize the size of the patterns needed to convey an idea.


Can mods change the poorly written misinformation-plagued blog article link to the source paper:

http://www.pnas.org/content/early/2015/07/28/1502134112


Isn't the interesting part of "language universal" some kind of commonality that is directly the byproduct of common human biology?

Closeness of concepts in words would seem to be a natural commonality because of efficiency of communications between any two entities in general.


Language wants to be a picture, even in the case of Japanese and German, grammar aside. 'Threw out' forms a nicer picture-action. There might be some baseline for cognition for which language serves as, *cough*, a 'higher level' interpreter.


By the way if you ever want a great set of tools to play with language check out Python's Natural Language Toolkit: http://www.nltk.org/


NLTK is nice but if you want to work with nontrivial amounts of data it's better to turn to Stanford's NLP tools, or Spacy.


Isn't the universal language love?


yeah yeah, wake me up when they find something that ties all JavaScript implementations together :P


really surprised this isn't getting more action on HN. the implications for CS are immense.


No, the implications are actually quite small. This article is a very bad pop-science summary of relatively banal linguistic research.

Linguists have studied linguistic universals for a long time, which are properties that all human languages have. For example, one could try to imagine (in the style of Borges) a language which had no nouns, and in which all sentences are formed of relationships between verbs—but no natural language has this feature: all natural languages have nouns and verbs.

There are also implicational universals, which are of the form, if [some language] has property X, then it will also have property Y, and tendencies, which are broad driving trends that might have individual exceptions. An example of the latter is that languages that place the verb at the end of the sentence usually have postpositions rather than prepositions, but this has exceptions (e.g., Latin.)

What's being studied here is a tendency in sentence structure: languages usually structure their syntax such that they can minimize the dependency length, or the distance between syntactically related words in a sentence. This has long been hypothesized, but this paper gives evidence for it in the form of a large cross-language survey. Which is cool! But by no means does it have major implications for CS in any way. (At least, no more than any of the copious previous research on linguistic universals.)

EDIT: I should also add that this area of research is not new. In fact, linguist Joseph Greenberg published an article called 'Some universals of grammar with particular reference to the order of meaningful elements' in 1963. This is continuing research and, while good research, not particularly groundbreaking or pioneering.


The cool thing about dependency length minimization is that you can use it as a principle to derive many of Greenberg's word order universals, in addition to sentence-by-sentence preferences. You can also use it to derive the fact that natural language expressions are usually well-nested (though I'm somewhat dubious: it seems like there are other good possible explanations).


I think it has implications for language design. Some syntaxes are going to be better than others, based on this.

It also means that, for a function that takes several parameters, some parameter orders are better than others.


Yes, specifically you should order arguments so that, on average, they go from short to long.

I try to write my code this way. map, filter, and reduce are terrible from this perspective. Unless you have do blocks like in Ruby or Julia!

Also, dplyr's %>% pipe operator is a great way to reduce dependency length in R code.


Thanks for all your answers here! Could you elaborate on the above? Why does it imply short to long orderings are better? Also, I don't follow your comment about map/filter/reduce being bad except unless you have ruby-esque do blocks. Are you referring to something like map(<big function>, array)?


Yeah, map(<big function>, array) creates a dependency that exists from when you read "map" to when you read the name of the array, potentially spanning a very long function. But if you have map(array) do <big function>, then you only have a dependency from "map" to the beginning of the function.

In general, if you have a function call f(a, ..., y, z), when you parse that (mentally, or in a shift-reduce parser) you have to keep the function name f in memory all the way to z. So you want to make a, ..., y as short as possible.

Similarly, dependency length minimization predicts that in English people will want to order expressions from short to long after a verb or preposition. There is a lot of evidence for this preference; it's been documented since the 1930s.

If there were a programming language where the function name came after the arguments, like (a, b)f, then the best order would be long-to-short.

Similarly, the DLM prediction for verb-final languages like Japanese is that people will prefer long-to-short orders. It appears that this preference does exist, but it is much weaker than the short-to-long preference among speakers of English-like languages.
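A small Python analogue of the same intuition (toy data, not from the paper): keep the call head and its short arguments adjacent, and push the bulky argument to the end.

    people = [("Ann", 34), ("Bob", 29), ("Cho", 41)]

    # Short-to-long: "sorted" and the data it applies to sit next to each other;
    # the long key function comes last, so no dependency has to span it.
    by_age_then_name = sorted(people, key=lambda p: (p[1], p[0].lower()))

    # Compare map(<big function>, data): the call stays "open" across the whole
    # lambda before you find out what is actually being mapped over.
    name_codes = list(map(lambda p: p[0].upper() + "-" + str(p[1]), people))

    print(by_age_then_name)
    print(name_codes)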


There's a hidden premise in your argument that I would argue is false, and that's that programming languages should attempt to emulate the same principles as natural languages. Programming languages and natural languages do not at all fill the same niche, and there are many cases in which natural languages optimize for things that would be bad in programming languages.

For example, natural languages are infamously redundant—gender agreement between nouns and adjectives and even (in some languages) verbs, say—but that's because they developed so that they could be understood even if you were shouting over the wind or otherwise didn't hear part of the sentence. Programming languages have no such restrictions, and as such, optimizing a programming language for the same kind of redundancy as a natural language would lead to needless tedium like

    int x = int_addition(int 2, int 3);
but in the context of a programming language, this kind of redundancy ends up being needless bookkeeping without presenting any of the same advantages of redundancy in natural language.

That doesn't mean that your conclusions are wrong—I think some parameter orderings are better than others! But I think that's true for reasons orthogonal to the findings in this paper.


I agree that you don't (shouldn't!) have the redundancy in computer languages. But I think the point of the article is that some word orderings cause less load on our brains. I think that's directly applicable to function parameters. For example, if I have a function that takes an array, a starting offset, an ending offset, and a value to search for within those offsets, then

  int limitedSearch(int *array, int startOffset, int endOffset, int searchValue)
causes less cognitive load than

  int limitedSearch(int *array, int searchValue, int startOffset, int endOffset)
simply because the offsets "go with" the array to make up one concept (where you're searching).

There may be other reasons why some parameter orderings are better than others, but I think the article is directly relevant.


It's much older than 1963. Erasmus was researching this in the 15th/16th century, but John Wilkins' 1668 book is probably the most well known of the early works[1].

[1] https://books.google.com/books?id=BCCtZjBtiEYC


Wilkins' book describes a universal language, which is not at all the same thing as linguistic universals.

Wilkins was trying to derive a language in which each word functioned as an index into a universal ontology of concepts, so that the concept represented by a word could be deduced by breaking apart the structure of the word itself. This is an interesting (if quixotic) experiment, but it's really concerned with building an a priori language.

The study of linguistic universals is the study of properties of natural languages: for example, all languages have pronouns is a linguistic universal, because it is a property that is true of all natural human languages. This is clearly not something that Wilkins was working towards: he was building a new language for the purpose of perfecting and clarifying communication. His Real Character had little—if anything—to do with analysis of the properties of natural language, and therefore also has little to do with the study of linguistic universals.


Why don't you tell us what the implications for CS are?


No, not really. They did not look at enough languages for this to be anything except a starting point for additional research.

As it is you can not draw any conclusions from this.


nah, the implications are mostly biological. The discovery itself only implies that the language module in our brains is bounded by identical rules across all people. Therefore the discovery leads only to greater insight into an existing biological system rather than a fundamental theoretical property of language.

As a result, the implications aren't as closely intertwined with CS.


What implications for CS? Even the implications for the study of natural languages are not that great (and oversold by the article).


The day... has finally come.


Lisp?


But it uses loads and loads of parentheses, so no one will ever use it...



