Neuroscience, evolution, and culture

New paper: ‘Aesop’s fable’ experiments demonstrate trial-and-error learning in birds, but no causal understanding

Well, it seems I have not written here for two years! It has been a busy and exciting period, largely occupied by a book project that looks at cognitive differences between humans and other animals. One of the by-products of this project is the title paper, a meta-analysis carried out in collaboration with Johan Lind. In this paper, we offer a critical look at recent claims that birds, and in particular corvids, can “understand” properties of the physical world such as “light objects float, heavy objects sink,” and are able to use such knowledge to solve new problems. The performance of these birds in some tasks has been compared to that of 5-7-year-old children.

The best way to understand the puzzles presented to the crows is to watch this video, from Jelbert et al. (2014):


From the video, the performance of New Caledonian crows appears impressive. The results of our meta-analysis, however, do not support the original claims. In summary, it seems that crows learn the correct behavior by trial and error as they perform the task. In almost all tasks, the birds start by choosing between the two options at chance, and only gradually do they switch to the more functional option. The video shows the final stage of learning, rather than the initial random behavior.

We also compared the crow data with data from children, and we found clear differences. While younger children do not do well on most tasks, children aged 6 and older perform much, much better than birds, despite having received much less training.

There are one or two examples of tasks in which birds do well from the very beginning, as well as some tasks in which birds do not learn at all. In our paper, we argue that both occurrences can be understood based on established knowledge of animal learning, and especially associative learning.

The full article has appeared in Animal Behaviour.



New paper: Animal memory: A review of delayed matching-to-sample data


Clark’s nutcracker (image source)

Animal memory surprises us in many ways. How come, for instance, that Clark’s nutcracker (pictured) can remember the location of thousands of seeds for many months, but cannot remember the color of a light for more than thirty seconds? To make sense of this and similar paradoxical findings, in a new paper we look at the performance of different species in the delayed matching-to-sample task (DMTS). This somewhat unwieldy name stands for a very simple procedure: we show a sample stimulus for a few seconds, then take it away and wait for a delay. At the end of the delay we show two stimuli: one identical to the sample, the other different. The animal is rewarded (generally, with food) for choosing the stimulus that matches the sample.

It turns out that, while a surprising range of species can learn this task equally well when the delay is very short (bees, pigeons, rats, sea lions, apes, dolphins, you name it), most species have remarkably short memory spans. Bees, those microscopic geniuses, can handle at most a few seconds’ delay, while in most birds memory span is in the range from 10 to 20 s. Mammals seem to do a bit better, a minute or so, but because data have been gathered from just a handful of species (we could find 25) we cannot be sure that this difference is reliable. Among birds, only pigeons have been extensively studied, and it is perfectly possible that other bird species have memory spans comparable to those of mammals. What seems clear, however, is that humans can easily remember simple stimuli for much longer times (48 hours is documented, but it’s easy to imagine much longer memory spans; see the paper linked below for a detailed analysis of the data).

What have we learned from this review? We suspect that long memory spans are possible in non-human species only in the presence of specific adaptations for remembering specific kinds of information (e.g., food locations). Lacking such an adaptation, even simple stimuli like the colored lights often used in DMTS experiments are hard to remember, and there do not seem to be huge differences between species (at least, across vertebrates).

A preprint of the paper is available here.

Media coverage: National Geographic

New paper: Solution of the comparator theory of associative learning

A few weeks ago I had the good news that our paper on the comparator model of associative learning had been accepted in Psychological Review. This is my first published paper co-authored with an undergraduate student, Ismet Ibadullaiev, which makes me even happier. The paper (I put up an unofficial copy on my Papers page) deals with a very interesting model of associative learning in which most of the interesting phenomena are generated as memories are retrieved, rather than when memories are stored, as assumed by most mainstream theories of associative learning (e.g., the Rescorla-Wagner model and its derivatives).

Our conclusion, unfortunately, is that the theory makes a number of paradoxical predictions that are hard to reconcile with empirical data on learning. For example, it predicts that, in many cases, animals would not distinguish which of two stimuli is more strongly associated with a reward (they do distinguish, of course), or that they should learn equally about faint and intense stimuli (in reality, animals learn preferentially about intense rather than faint stimuli).

These problems have been hard to recognize because the theory had been studied exclusively by intuition and computer simulation. Both are fine tools, but they do run into trouble. The predictions of comparator theory, as it turns out, vary greatly depending on the value of a few parameters, and our intuition is not well equipped to reason about the non-linear effects that abound in the theory. Simulations give us correct results, but only for the parameter combinations we simulate. We have been fortunate enough to realize that one could write down a formal mathematical solution to the theory. With this solution it became much easier to see the big picture and actually prove what the theory can or cannot do.

I enjoyed working with comparator theory because of its distinct flavor – as hinted above, it’s rather different from other learning models – and because of the many surprises we had while exploring its predictions. Although we found what appear to be serious flaws in the theory, these might be more in its mathematical implementation than in its core concepts. The ideas that memory retrieval is an important factor in associative learning, and that stimulus-stimulus associations are more important than other models acknowledge, may well be worth pursuing. But the formulae that translate these ideas into a testable model will surely need to be revised.

New paper: On elemental and configural theories of associative learning

A new paper of mine just came out in the Journal of Mathematical Psychology. It considers an old issue that has traditionally split the field of associative learning, and that echoes various scientific disputes between holism and reductionism. The question is, when an animal learns about a stimulus, how is the stimulus endowed with the power to cause a response? Configural models of learning assume that a mental representation of the stimulus “as a whole” acquires associative strength (learning psychologists’ term for a stimulus’ power to cause a response), while elemental theories assume that the stimulus is fragmented into a number of small representational elements (say, shape, color, size, and so on), each of which carries some associative strength.

Long story short, it turns out that there is practically no difference between these two approaches. They amount to different ways of bookkeeping associative strength, and the choice between them need not have any observable consequence. In fact, the main result of the paper is that, given some mild assumptions, for every configural model there is an equivalent elemental model (one that makes exactly the same predictions about animal learning) and, vice versa, every elemental model has an equivalent configural model.

Thus there is no “better way” to think about how stimuli acquire associative strength, something that I expect will surprise some learning scholars. What I have personally most enjoyed discovering while working on this topic is that learning psychologists, and specifically John M. Pearce in this 1987 paper, have re-invented the formalism of kernel machines, a workhorse of machine learning and computer science since the 1960s. In fact, my proof of the equivalence of configural and elemental models is itself a re-discovery, in a much simpler setting, of the “kernel trick” of machine learning (see the previous link, and thanks to an anonymous reviewer for pointing this out).
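To give a flavor of why the two kinds of bookkeeping can coincide, here is a minimal Python sketch of the simplest textbook case. It is not the proof from the paper and it does not use Pearce's similarity function: it assumes, purely for illustration, that the similarity between two stimuli is the plain dot product of their element vectors. The function names and the example numbers are made up.

```python
def dot(u, v):
    """Dot product of two equal-length element vectors."""
    return sum(a * b for a, b in zip(u, v))


def configural_response(stored, strengths, probe, similarity=dot):
    """Configural bookkeeping: each stored stimulus, taken as a whole,
    carries one associative strength, and generalizes to the probe
    in proportion to its similarity with the probe."""
    return sum(a * similarity(s, probe) for s, a in zip(stored, strengths))


def equivalent_elemental_weights(stored, strengths):
    """Elemental bookkeeping: spread each stored stimulus' strength
    over its elements (valid for the dot-product similarity assumed here)."""
    n_elements = len(stored[0])
    return [sum(a * s[j] for s, a in zip(stored, strengths))
            for j in range(n_elements)]


def elemental_response(weights, probe):
    """Response is the summed strength of the elements present in the probe."""
    return dot(weights, probe)


# Two stored compounds over three elements (hypothetical values).
stored = [(1, 1, 0), (0, 1, 1)]
strengths = [0.8, -0.3]
probe = (1, 0, 1)

weights = equivalent_elemental_weights(stored, strengths)
print(configural_response(stored, strengths, probe))
print(elemental_response(weights, probe))
```

Both calls compute the same number for any probe, because summing strength times similarity over stored stimuli is just a regrouping of the same products summed over elements. With other similarity functions the corresponding elemental representation is less obvious, which is where the kernel trick comes in.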

Intriguingly, this is not the first time learning psychologists have independently developed concepts that had been introduced in machine learning. Another remarkable case is Donald Blough’s 1975 re-invention of the least mean square filter (or delta rule), a kind of error-correction learning that had been developed in 1960 to build self-regulating electronic circuits, and that Blough developed as a model of animal learning. I refrain from speculating too much on whether this means that there is only one way to be intelligent – be it for animals or machines.
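For readers who have not met the delta rule, here is a minimal Python sketch of the generic update: each weight changes in proportion to the prediction error times the corresponding input. This is the standard least-mean-square form, not Blough's specific model; the function name, learning rate, number of epochs, and example trials are all illustrative assumptions.

```python
def delta_rule(trials, n_inputs, learning_rate=0.1, epochs=200):
    """Generic delta rule (least mean square) learner.

    trials: list of (stimulus, outcome) pairs, where stimulus is a tuple
    of n_inputs numbers and outcome is the value to be predicted.
    Returns the learned weights (associative strengths).
    """
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for stimulus, outcome in trials:
            prediction = sum(w * x for w, x in zip(weights, stimulus))
            error = outcome - prediction  # error-correction term
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, stimulus)]
    return weights


# Stimulus A is always rewarded, stimulus B never is.
print(delta_rule([((1, 0), 1.0), ((0, 1), 0.0)], n_inputs=2))

# A and B always appear together and are rewarded:
# the two elements end up sharing the available strength.
print(delta_rule([((1, 1), 1.0)], n_inputs=2))
```

In the first example the weight of A approaches 1 while that of B stays at 0; in the second, each element converges to about 0.5, so that their sum predicts the reward.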

New paper: Dog movie stars and dog breed popularity

Our latest paper on the cultural evolution of preferences for dog breeds came out yesterday in PLOS ONE. The message is simple: dog breeds that are featured in successful movies (Lassie Come Home, 101 Dalmatians, and many others) tend to increase in popularity, sometimes for many years after the movie’s release. This influence was quite strong until, approximately, the 1970s, but has declined since, probably because cinema no longer dominates the media as it used to. You can find a nice writeup with more details on co-authors Hal Herzog’s Psychology Today column and Alberto Acerbi’s blog. Some press coverage is here:

What makes a dog breed popular?

Some time ago I wrote about fashions in dog breeds, pointing out the wild fluctuations in popularity of many breeds. Why do these occur? Owning a dog is a serious commitment in terms of time and money, and it would seem natural to try to acquire a dog that is healthy and has a good temperament. I set out to find out whether this is actually the case with my colleagues Alberto Acerbi, Hal Herzog, and James Serpell.

In our new paper Fashion vs. Function in Cultural Evolution: The Case of Dog Breed Popularity, we show that, surprisingly, people do not prefer breeds that are better behaved or healthier. On the contrary, the most popular breeds are the most unhealthy, with a host of genetic defects that are at least partly related to intense selection to adhere to quirky breed standards, and possibly with more behavioral problems such as fear of other dogs, aggressiveness, or separation anxiety. We obtained these results by crossing data from the C-BARQ database of dog behavior created by James (the actual data used in our analysis are here), data about dog registrations provided by the American Kennel Club to Hal Herzog (available here), and previously published health data (references 14-17 in the paper).

Thus many people (at least those interested in breed dogs) prefer to acquire a dog that is socially recognized as meeting a certain “standard” rather than a healthy and well-behaved dog. If you are unfamiliar with breed standards, I can tell you that they are quite exacting, and to many they may appear just pointless. Here is, for example, what the nose of a bulldog is supposed to look like:

The nose should be large, broad and black, its tip set back deeply between the eyes. The distance from bottom of stop, between the eyes, to the tip of nose should be as short as possible and not exceed the length from the tip of nose to the edge of underlip. The nostrils should be wide, large and black, with a well-defined line between them. Any nose other than black is objectionable and a brown or liver-colored nose shall disqualify.

(From the AKC web site)

Note: “disqualify” means that the dog should not be considered a “true bulldog.”

The age of human cultural capacity

Venus of Schelklingen

When did humans evolve, to its full extent, the capacity to create complex culture? We consider this question in a paper appearing in the May 7th issue of Scientific Reports. Here is a quick summary.

Human cultural capacity has traditionally been dated to about 30-40 thousand years ago, based on an impressive cultural explosion in Europe around that time, which left us such evidence as sophisticated stone tools and plenty of “art” (objects without any clear practical use), like the figurine depicted to the right, the lion man, and striking cave paintings.

There is a problem, though. If cultural capacity evolved in Europe 30-40 thousand years ago, how did all the human groups that were living outside Europe get it? We have no evidence of gene flow from Europe to the rest of the world, through which the genes responsible for cultural capacity could have spread. It appears that humans must have had the capacity to create complex culture before they fragmented geographically over a large area. This conclusion, however, appears equally problematic because the first split between human populations is currently dated at about 170,000 years ago. Thus humans would have had the capacity for complex culture for more than 100,000 years before complex culture actually appeared. Although this appears unreasonable, we argue that things actually went this way.

First, we note that archaeologists have unearthed stone tools of complexity comparable to that of the European cultural explosion, but much older (more than 200,000 years old). We also note that other indicators of behavioral modernity appeared earlier than 170,000 years ago, such as genes believed to be important for language and the morphology of the speech apparatus.

Second, we summarize recent work in cultural evolutionary theory showing that cultural evolution is, in its initial stages, exceedingly slow. The reason is essentially that culture is a cumulative process: complex culture can be created only by building on already existing culture. Thus in the initial stages of cultural evolution there was not enough raw material to be elaborated upon, and the creation of culture was slow. Additionally, human groups were at this time small and scattered over a large area, hence it is likely that cultural elements were invented many times but disappeared (we give a couple of examples in the paper).

The bottom line is that there is no evidence inconsistent with an early origin of cultural capacity, and current understanding of cultural evolution shows that a long gap between the genetic evolution of the capacity and the actual invention is, in fact, quite expected.

And, we suggest in the paper, Neanderthals may have had the same cultural capacity as ourselves.

An identity on falling powers

Here is a bit of combinatorics I encountered when preparing a paper on the co-evolution of behavioral repertoire, brain size, and lifespan (I will talk about the paper another time…). Let’s begin with two definitions:

Definition 1: The falling power n^{\underline{m}} is defined as the product n(n-1)\cdots(n-m+1), or:

(1)    {\displaystyle n^{\underline{m}}=\prod_{i=1}^m (n-i+1)}

If you are familiar with binomial coefficients, they are related to falling powers by n^{\underline m} = m! {n \choose m}.

Definition 2: The Stirling numbers of the second kind are a doubly indexed family of numbers that tell us how many ways there are to partition n objects into k non-empty subsets. This is written sometimes S(n,k) and sometimes \left\{n\atop k\right\}. I will use the fancier notation. For example, there are only 3 ways to partition 3 elements into 2 non-empty subsets: (12)(3), (13)(2), (1)(23), where (xy) means that x and y have been put together in the same subset. These numbers, beyond the simplest cases, are nowhere near intuitive (at least to me). For example, there are 7 ways to partition 4 objects into 2 subsets, hence \left\{4\atop 2\right\}=7 (you can figure these ways out for yourself), and 1701 ways to partition 8 objects into 4 subsets, hence \left\{8\atop 4\right\}=1701 (I suggest you do not try this on your own).
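If you would rather let a computer do the counting, here is a small Python sketch of the two definitions. The function names are my own; the Stirling numbers are computed with the standard recurrence \left\{n\atop k\right\} = k\left\{n-1\atop k\right\} + \left\{n-1\atop k-1\right\}, which is not derived in this post and is taken here as given.

```python
from functools import lru_cache
from math import comb, factorial


def falling_power(n, m):
    """n(n-1)...(n-m+1), as in Definition 1 (equals 1 when m == 0)."""
    result = 1
    for i in range(m):
        result *= n - i
    return result


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n objects into k non-empty subsets."""
    if n == k:
        return 1  # includes the empty partition of zero objects
    if k == 0 or k > n:
        return 0
    # The n-th object either joins one of k existing subsets
    # or forms a new subset on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


# Relation between falling powers and binomial coefficients.
print(falling_power(5, 3), factorial(3) * comb(5, 3))

# The three examples from Definition 2: 3, 7 and 1701.
print(stirling2(3, 2), stirling2(4, 2), stirling2(8, 4))
```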

Now, it happens that falling powers and Stirling numbers of the second kind are related by the following identity:

(2)    {\displaystyle n^m = \sum_{k=0}^m\left\{m\atop k\right\}n^{\underline k}}

I have only seen this equation proved by induction, but working on the above-mentioned paper I stumbled upon a direct proof that goes as follows. Note first that n^m is the number of ways to arrange n objects in sequences of length m, with repetitions possible (by sequence I mean an ordered selection so that (1,2,2) and (2,1,2) are two different sequences). So we have

(4)    \mbox{\# different sequences of }m\mbox{ objects chosen with repetition among }n = n^m

Equation (2) then comes from the fact that its r.h.s. is a different (and more laborious) way to count the same sequences. In other words, we can first count the sequences that use exactly k distinct objects out of the n available, and then sum over k:

(5)    {\displaystyle n^{m} = \sum_{k=0}^n \mbox{\# different sequences of }m\mbox{ objects using any }k\mbox{ objects out of }n}

Now we have to calculate the expression in the sum. Consider thus constructing a sequence of length m out of k distinct objects, which in turn have been selected among n. There are n^{\underline k} ways of selecting, in order, the k objects that are going to be part of the sequence, given that the first object can be selected out of n, the second out of n-1, and so on, until the k-th object, which can be selected out of n-k+1. Once we have the k objects, in how many ways can we allocate them among the m places of the sequence? This is exactly the number of ways in which a set of size m can be partitioned into k non-empty subsets (the first selected object fills the subset of places that contains the first place, the second selected object fills the subset that contains the earliest remaining place, and so on), or, if you want, the number of ways in which m balls can be placed in k indistinguishable bins without leaving any bin empty. Thus

(6)    {\displaystyle \mbox{\# different sequences of }m\mbox{ objects using any }k\mbox{ objects out of }n} = \left\{m\atop k\right\} n^{\underline k}

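which, together with (5), gives (2). The sum in (5) can stop at k=m, as in (2), because \left\{m\atop k\right\}=0 when k>m: a sequence of length m cannot contain more than m distinct objects.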
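The counting argument is easy to check by brute force for small n and m. The Python sketch below (function names are my own, and the Stirling numbers again come from the standard recurrence, which is assumed rather than derived here) enumerates all sequences, groups them by the number of distinct objects they use, and compares each group with the r.h.s. of (6).

```python
from functools import lru_cache
from itertools import product


def falling_power(n, k):
    """n(n-1)...(n-k+1); equals 1 when k == 0."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


@lru_cache(maxsize=None)
def stirling2(m, k):
    """Stirling numbers of the second kind, via the standard recurrence."""
    if m == k:
        return 1
    if k == 0 or k > m:
        return 0
    return k * stirling2(m - 1, k) + stirling2(m - 1, k - 1)


def count_by_distinct(n, m):
    """counts[k] = number of sequences of length m over n objects
    that use exactly k distinct objects (brute-force enumeration)."""
    counts = [0] * (m + 1)
    for seq in product(range(n), repeat=m):
        counts[len(set(seq))] += 1
    return counts


def identity_holds(n, m):
    """Check (6) term by term, and (2) as the sum over k."""
    counts = count_by_distinct(n, m)
    terms = [stirling2(m, k) * falling_power(n, k) for k in range(m + 1)]
    return counts == terms and sum(terms) == n ** m


print(all(identity_holds(n, m) for n in range(1, 6) for m in range(0, 6)))
```

This is of course only a sanity check on small cases, not a replacement for the argument above.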

Empirical support for openness-persuasiveness dynamics

A recent study by Aral & Walker provides support for the idea that the openness-persuasiveness dynamics we suggested a few years ago actually goes on in cultural evolution. In short, we had put forward mathematical and simulation models to support the notion that learning from others produces individuals that, over time, become more conservative (less likely to learn from others) and more persuasive (more likely to convince others of one’s own ideas). These predictions have been confirmed by Aral & Walker, who showed that older Facebook users are more difficult to convince to adopt a Facebook app than younger users, and yet are better at convincing others to adopt the app. Up to now, we only had indirect evidence about openness (older people score low on openness in personality tests), and no evidence on persuasion.

We have submitted a comment to the journal relating Aral & Walker’s intriguing findings to our theory. You can find a slightly extended version here, essentially with more references to relevant work.

Human Cognitive Uniqueness conference videos are online!

Watch them here!