Re: Cangelosi/Harnad Symbols

From: Stevan Harnad (harnad@coglit.ecs.soton.ac.uk)
Date: Fri Dec 17 1999 - 17:46:55 GMT


On Thu, 9 Dec 1999, Jelasity Mark wrote:

jm> My problem was (and still is) that the simulations presented in the
jm> paper have nothing to do with your ideas.

"Nothing to do with" is too strong. But if you mean that the support
from that particular toy model is weak and could and should be made
stronger by further testing, I agree.

Let me summarize what I take to be your critique (as it is rather
similar to my own critique of the original Cangelosi/Parisi findings).

    Cangelosi A. and D. Parisi. 1998. The emergence of a "language" in
    an evolving population of neural networks. Connection Science,
    10:83-97.
    http://cogprints.soton.ac.uk/abs/psyc/199803021

That paper showed that in a mushroom world, learning what to eat from
"hearsay" (what I now call "theft," but it basically amounts to using
the naming by others as your cue about what is what) beats learning
directly from features (what I now call "toil" -- trial and error, guided
by error-corrective feedback from the consequences of correct/incorrect
sorting of positive/negative instances).

My critique of this result was that the hearsay is ungrounded, and hence
that this sort of theft is not an "evolutionarily stable strategy"
(ESS). The theft outperforms the toil, because names are easier to
learn than sensory features, and hence the thieves out-survive and
out-reproduce the toilers. But once there are no toilers left, the
thieves are lost, because without hearsay, they have no idea what to
eat.

This would have been apparent if the Cangelosi/Parisi simulations had
been continued for a few more generations after the demise of the last
toilers.
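The instability argument can be made concrete with a toy population sketch (the costs and the reproduction rule here are hypothetical illustrations, not parameters from the Cangelosi/Parisi model): thieves out-reproduce toilers because hearsay is cheaper to learn, but once the last toiler is gone there is no one left to overhear.

```python
import random

random.seed(0)

TOIL_COST = 5    # hypothetical: generations of trial and error to learn the features
THEFT_COST = 1   # hypothetical: hearsay is cheap, just learn the name

def step(population):
    """One generation: the cheaper a strategy's learning, the more offspring."""
    offspring = []
    for strategy in population:
        cost = TOIL_COST if strategy == "toiler" else THEFT_COST
        if random.random() < 1.0 / cost:   # toy reproduction rule
            offspring.append(strategy)
    return offspring or population

pop = ["toiler"] * 50 + ["thief"] * 50
for generation in range(30):
    pop = step(pop)
    if pop.count("toiler") == 0:
        # no toilers left to overhear: the thieves' hearsay source is gone
        print(f"generation {generation}: toilers extinct; "
              f"{pop.count('thief')} ungrounded thieves remain")
        break
```

Under any such cost asymmetry the toilers are driven out within a few generations, which is exactly the point at which the purely hearsay-based strategy stops working.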

My conclusion was that the problem was the ungroundedness of pure theft.

What was needed was that some (how many?) ground-level categories had to
be grounded in toil by EVERYONE, and then the competition of theft vs.
toil should be only in terms of higher-order recombinations of the
grounded categories. For then the categories acquired by theft would
inherit their grounding from the toil categories of which they are
composed ("grounding transfer").

The trivial conceptual example I always use is that if you have learnt
"horse" by toil and "striped" by toil, then there are in principle two
ways to learn "zebra," one being the old hard way (toil again), the
other by means of the proposition "zebra" = "horse" + "striped".
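A minimal sketch of that conceptual example (the feature encoding is hypothetical): two detectors grounded by toil, and a third category acquired purely by recombining them, inheriting their grounding.

```python
def is_horse(features):      # grounded by toil: a learned horse-shape detector
    return features["horse_shape"]

def is_striped(features):    # grounded by toil: a learned stripe detector
    return features["stripes"]

def is_zebra(features):      # acquired by theft: "zebra" = "horse" + "striped"
    return is_horse(features) and is_striped(features)

zebra = {"horse_shape": True, "stripes": True}
plain_horse = {"horse_shape": True, "stripes": False}
print(is_zebra(zebra), is_zebra(plain_horse))  # True False
```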

The simulations in the paper under discussion were meant to show the
evolutionary adaptiveness of exactly that: 5 features, A, B, C, D, E
and three categories:

"EAT" (feature A must be learned to learn this category, and BCDE ignored,
by TOIL)

"MARK" (feature B must be learned, and ACDE ignored, by TOIL)

and

"RETURN" (two ways to learn: either (i) feature conjunction AB must be
learned, and CDE ignored, by TOIL, or (ii) the proposition "RETURN" =
"EAT" + "MARK" must be overheard [hearsay], by THEFT).
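The informational structure of the three categories can be sketched directly (the learning itself is elided; boolean definitions stand in for what the nets must converge on). The two routes to RETURN are extensionally equivalent on every possible mushroom:

```python
import itertools

def eat(f):                 # toil: attend to feature A, ignore BCDE
    return f["A"]

def mark(f):                # toil: attend to feature B, ignore ACDE
    return f["B"]

def return_by_toil(f):      # toil: learn the conjunction AB directly
    return f["A"] and f["B"]

def return_by_theft(f):     # theft: recombine the already grounded categories
    return eat(f) and mark(f)

# the two routes agree on all 32 possible mushrooms
for bits in itertools.product([False, True], repeat=5):
    f = dict(zip("ABCDE", bits))
    assert return_by_toil(f) == return_by_theft(f)
```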

The idea was that, unlike in the Cangelosi/Parisi paper, the theft-based
categories would be grounded (hence theft would be an ESS) because they
were merely recombinations of grounded toil-based categories, and hence
they would inherit their sensorimotor grounding from their components.

Now in the simulation a few components were knowingly finessed:

(1) We did not really have multiple foragers foraging in parallel. We
just had one doing it at a time (but doing it in parallel would not be
a problem in principle; this just simplified the simulation).

(2) The foragers did not actually learn to "vocalize" as they foraged.
They were just hard-wired to do so. Again, this is not a problem of
principle, as vocalization could have been shaped by learning too.

(3) The foragers did not actually imitate one another's vocalizations
when they overheard them, they merely processed them as passive signals
(an extra feature, over and above ABCDE, if you like), but again, this
was in order to simplify the ecology and the simulation, not because
vocal imitation is a problem in principle.

What was critical was only that the correct categorization of RETURN
should depend on the feature conjunction AB, that there should be two
ways to learn this, toil vs. theft, and that, having learned RETURN by
theft rather than toil, the advantage should be evolutionarily stable,
so that when there are no organisms left that have learned RETURN by
toil, the thieves still know which mushrooms to return to.

Now JM is suggesting (and he may be right) that -- not just as a
matter of an alternative interpretation of this toy simulation, but as a
matter of the information inherent in the task itself -- something worse
than "theft" is going on, with the category "RETURN," namely,
"cheating."

JM is suggesting that the only reason theft LOOKS as if it were an ESS
is that the "proposition" itself ("RETURN = EAT + MARK"), with the
help of the magical hard-wiring of the vocalization/imitation cue, is
serving as a persisting feature for "RETURN."

To put it another way, there is both an expensive feature (AB) and a
very cheap feature (the vocalization), both of which identify the
category "RETURN," and we have simply shown the advantages of cheap
features over expensive ones (and the cheap feature is cheating, because
we have built it in).
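The point can be made explicit in a sketch (feature names hypothetical): because the vocalization is hard-wired to be truthful, the overheard call is in effect a sixth, perfectly predictive, dirt-cheap feature, and a learner attending to it alone matches the expensive AB conjunction on every mushroom without ever inspecting A or B.

```python
import itertools

mushrooms = []
for bits in itertools.product([False, True], repeat=5):
    f = dict(zip("ABCDE", bits))
    f["call"] = f["A"] and f["B"]        # hard-wired truthful vocalization
    mushrooms.append(f)

cheap = [f["call"] for f in mushrooms]                # attend to one feature
expensive = [f["A"] and f["B"] for f in mushrooms]    # learn the conjunction
print(cheap == expensive)  # True
```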

JM correctly notes that only if there is independent evidence of
"grounding transfer" of the AB feature, without cheating, will the
simulation have shown what it set out to show.

Well, we do have independent evidence of grounding transfer, but
unfortunately it is not in the same simulation. So I have to agree with
JM that the theft hypothesis is not adequately tested in this
simulation. I am fairly confident, however (and I think the reason
these simulations look so trivial is that it is obvious what the
outcome must be), that in an experiment explicitly controlling for the
cheating artifact, the outcome would be the same: Grounding would
transfer even without cheating, theft would beat toil, and the strategy
would be evolutionarily stable.

jm> on terminology: I used the term "concept" as done in machine learning.
jm> As I understand it, it is basically the same as the psychologists' term
jm> "category", i.e. a subset of some domain (male chicken as chicken,
jm> red as color, etc.).

Good. Then we are talking the same language.

jm> So, again, there IS Baldwinian effect. I didn't want to say more,
jm> it is only a little correction anyway.

Fine, the correction will be made in the paper (thanks: the referees
missed that!).

> jm> Here, "theft" organisms learn return based on the call, and "toil"
> jm> organisms learn based on the mushroom. This means that "theft"
> jm> organisms receive the very same input as toil organisms, except
> jm> they don't receive garbage (C,D,E features).

You are right that the hard-wiring allowed the theft organisms to
"cheat" by not having to bother with AB vs. CDE.

The paper on grounding transfer is:

    Greco, A., Cangelosi, A., & Harnad, S. (2000) A Connectionist Model
    for Categorical Perception and Symbol Grounding. Connection
    Science.
    ftp://gracco.irmkant.rm.cnr.it/pub/angelo/cangelosi-evocom.ps

The simulations will need to be extended in further studies to include
controls for the kind of cheating you point out.

jm> ac> 3) GROUNDING TRANSFER. It is it important to say that theft organisms
jm> ac> actually recognise a RETURN mushroom (1) from the call that describes
jm> ac> it (as expected after the explicit backprop learning) AND ALSO (2)
jm> ac> when they see its features (for a study of the "grounding transfer"
jm> ac> phenomenon, please see also the paper Cangelosi-Greco-Harnad). The
jm> ac> learning of the call and behaviour RETURN will also ground them in
jm> ac> the perceptual categories.
jm>
jm> Without this effect, theft learning of RETURN would be grounded in the
jm> calls. This means that though the organism has EAT and MARK grounded,
jm> and RETURN depends only on these two, it has no way to tell
jm> this relationship.

Not only no way, but no need (that is why this might well be cheating).

jm> It would have three independent concepts, two
jm> grounded in perceptual input, one in calls. To tell the relationship,
jm> RETURN should have an explicit logical structure referring to the names
jm> of eat and mark and rules on how to apply the logical structure in
jm> decision tasks.

No. Now you are going rather further, and recommending a theory of how
propositional structure needs to be encoded (as a formal symbol
system). I think you were right the first time: A control is needed to
eliminate the possibility of cheating. It also has to be shown that the
theft strategy is not only an ESS but that it "scales up" to any boolean
recombination of prior categories, rather than just working for a simple
conjunction.
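The "scaling up" requirement can be stated concretely: once EAT and MARK are grounded, any boolean recombination of them is in principle acquirable by theft, not just the conjunction tested in the paper (the higher-order category names here are hypothetical).

```python
def eat(f):       # grounded by toil
    return f["A"]

def mark(f):      # grounded by toil
    return f["B"]

def avoid(f):     # negation of a grounded category
    return not eat(f)

def forage(f):    # disjunction of grounded categories
    return eat(f) or mark(f)

def return_(f):   # the conjunction studied in the paper
    return eat(f) and mark(f)

f = {"A": True, "B": False}
print(avoid(f), forage(f), return_(f))  # False True False
```

Whether the simulated organisms can in fact acquire all such recombinations stably, without cheating, is exactly what further controls would have to show.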

jm> Neither is present in the model. RETURN has no logical
jm> structure, it is learnt via toil in the domain of calls. It is faster,
jm> because there are fewer features (3 versus 5); furthermore, all three are
jm> relevant in the case of calls, while with perceptual input, C,D and E
jm> are irrelevant, i.e. "garbage".

I would put it the other way: The "garbage" features, CDE, are
irrelevant to the RETURN task when learned by theft, because the
informational structure of the task (and the way we "finessed"
vocalization and imitation) has made them irrelevant. This is why
it is cheating, and hence needs to be controlled for.

It is a further theory (JM's, not mine) that a prior symbol system is
needed for theft to succeed without cheating, and to generalize. I do
not agree. I think the origins of the adaptive advantages of symbolic
theft over sensorimotor toil PRECEDED formal symbol systems. In other
words, the first boolean combinations were learned as primitive
invariants, not as the application of formal logical structure. It is
the origin of that logical structure that we are trying to explain
here.

jm> If there is grounding transfer, that may be only due to the similar
jm> structure of calls and perceptual features.

No, that we did control for. But to force grounding transfer to bear the
weight of the theft, we have to eliminate the possibility of cheating.

jm> With using arbitrary
jm> "words" that are not correlated with perceptual input (as we have it in
jm> natural languages: the word "zebra" has no stripes)

And the vocalizations have no features; it is only the cheating
loophole that makes it seem as if they do, for then the vocalization
itself becomes, or takes the place of, the features.

jm> I suspect the
jm> grounding transfer would disappear making the whole approach
jm> irrelevant.

I suspect (and I think the other grounding transfer tests bear this out)
that the theft strategy would still win, even when the cheating loophole
was closed, but it would not win quite so easily.

jm> And I haven't mentioned the catastrophic forgetting effect, which means
jm> that after too much theft learning the weights from the hidden layer to
jm> the output layer can change so much that EAT and MARK can be forgotten
jm> altogether.

That's true, but that is a net-specific problem; there are better nets.
It is not a problem for the theft vs. toil competition in principle.

jm> I also have to mention that the only relevant response to this
jm> criticism is to prove that with arbitrary words that have no structural
jm> correlation neither with perceptual input nor with action output there
jm> is still significant grounding transfer. It seems to be a mathematical
jm> miracle, however. I am very much interested in the paper Cangelosi
jm> refers to.

I agree that names need to be completely arbitrary and structurally
uncorrelated with what they are naming (this is Saussure's
"l'arbitraire du signe").

It does not require a mathematical miracle (since "Zebra = Horse +
Stripes" clearly does it, unmiraculously). It just requires a slightly
more rigorous toy model.

Note that names are structurally uncorrelated with the category they
name, but they are certainly functionally correlated with it, via the
successful sensory feature detection. The string of arbitrary names
combined in a proposition is still structurally uncorrelated with the
new category, except in the conjunction of the two old categories, and
what it inherits from that (the conjunction of the underlying
features).

jm> One more thing: I can't remember reading about grounding transfer in the
jm> paper, though it may be my fault.

It is mentioned in the introduction on CP, but not further discussed in
this paper.

> jm> Instead of relying on the call
> jm> input, a third strategy could be to use the organisms own
> jm> output as input, i.e. to base the learning of new categories
> jm> on old ones. It would provide the same advantage, and indeed it
> jm> does. The frog's eye recognises concepts connected to size and motion,
> jm> and his concept "eat" depends on these primitive ones, forming
> jm> a hierarchy.
>
jm> I only said here, that once you have some
jm> categories grounded, then you can use them as input to learning higher
jm> level categories. This is also easier than pure honest toil, it is a
jm> sort of self-theft, though the "name" of the target category is of
jm> course missing. The type of learning is irrelevant.

I'm not sure what you mean. You can't learn a new category by talking to
yourself (before you have language!). If you are thinking of reasoning,
it is a little premature in this model. Same for Vygotskian inner
speech. We need to get to outer speech before we can get to inner
speech....

> jm> In other words, we can see the world AND hear the names of things.
> jm> In my view, theft is done the following way. We hear a new name first,
> jm> and AFTER THAT we figure out how to ground it in PERCEPTUAL INPUT and
> jm> OUR OWN old concepts.
> sh>
> sh> I'm not sure what you have in mind. But here's a new name: "ban-ma" and
> sh> if you go to a zoo, you will find some. Now go figure out how to ground
> sh> it.
>
jm> I'd go to the zoo, and I'd take a good look at the animal which has
jm> "ban-ma" written on its cage. Language works like God, who provides
jm> names and helps us learn to ground them from others who already know
jm> their meaning.

Ah, this is similar to your point about the possibility of learning a
category from positive instances alone (and suffers from the same sort
of problem, namely, that in nontrivial cases it is impossible).

Yes, if all members of a category, besides their critical features
(which are hard to find, and normally need to be learnt by honest toil),
also wore their names on their sleeves, then categorization would be a
lot easier (indeed, you would not have to worry about figuring out
features at all). And this is precisely what cheating is.

No, in a nontrivial category learning task, my telling you that THIS is
a ban-ma would do you next to no good in deciding whether or not the
next candidate was a ban-ma too (just as eating one mushroom, and not
getting sick, does not thereby make me capable of
distinguishing the edible from the inedible mushrooms). It's not just a
matter of being given a positive instance and contemplating it till its
critical features leap out at you. The critical features are detected
by trial and error, from sampling many positive and negative
instances (with the help of an internal implicit learning device --
possibly a neural net -- that is good at doing just that).
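A minimal sketch of that kind of honest toil (the encoding is hypothetical; a perceptron stands in for whatever implicit error-corrective learning device is actually involved): the critical conjunction AB is extracted only by sampling many labeled positive and negative instances, not by contemplating one positive example.

```python
import itertools

# all 32 mushrooms over binary features A..E; label = 1 iff A and B (RETURN)
data = [(bits, bits[0] and bits[1])
        for bits in itertools.product([0, 1], repeat=5)]

w, b = [0.0] * 5, 0.0
for _ in range(200):                     # error-corrective passes over the data
    mistakes = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        if pred != y:                    # learn only from corrective feedback
            mistakes += 1
            w = [wi + 0.1 * (y - pred) * xi for wi, xi in zip(w, x)]
            b += 0.1 * (y - pred)
    if mistakes == 0:                    # converged: the conjunction AB is found
        break

correct = sum((1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0) == y
              for x, y in data)
print(f"{correct}/32 mushrooms classified correctly")
```

The conjunction AB is linearly separable over these features, so this toy learner is guaranteed to converge; the point is only that it needs the whole sample of positive and negative instances, and the feedback, to get there.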

--------------------------------------------------------------------
Stevan Harnad                          harnad@cogsci.soton.ac.uk
Professor of Cognitive Science         harnad@princeton.edu
Department of Electronics and          phone: +44 23-80 592-582
Computer Science                       fax:   +44 23-80 592-865
University of Southampton              http://www.cogsci.soton.ac.uk/~harnad/
Highfield, Southampton                 http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM



This archive was generated by hypermail 2b30 : Tue Feb 13 2001 - 16:23:07 GMT