Re: Cangelosi/Harnad Symbols

From: Stevan Harnad (harnad@coglit.ecs.soton.ac.uk)
Date: Wed Dec 08 1999 - 18:47:51 GMT


On Tue, 7 Dec 1999, Jelasity Mark wrote:

jm> the main adaptive advantage of language
jm> is the boosting of concept learning.
jm> The exact way language boosts
jm> learning has not yet been described.

Let me quickly describe it here. ("Concept learning" is rather vague,
and I'm not sure what it means.) I mean learning to act upon, name and
describe new KINDS of things, i.e., learning new categories.

There are two ways to learn new categories: (1) through direct
sensorimotor interaction with the kinds of things in question, learning
by trial and error, with feedback from the consequences of making
mistakes, what to do with what, what to call what. That is what we have
called "sensorimotor toil." It is time-consuming and risky, hence costly.

The second way is through symbols: (2) We are given a string of symbols
that TELLS us what to do with what, what to call what. This is what we
call "symbolic theft" and it has the advantage that it is instantaneous
and costs no time or energy, and entails no risk.

The only problem is that it is not possible to get ALL new categories
this way (2). There has to be a repertoire of old ones that have been
acquired the old way (1). That is the symbol grounding problem.

http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad90.sgproblem.html
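
To make the toil/theft contrast concrete, here is a minimal Python
sketch (all names and data structures are hypothetical; this is not the
simulation's code) of the two acquisition routes and their relative
costs. Mushrooms are assumed to be dicts of boolean features.

    # Route (1), sensorimotor toil: many risky trials, each error paid for.
    # Route (2), symbolic theft: one received proposition, zero trials.

    def acquire_by_toil(samples, consequence, candidate_features):
        """Trial and error over raw samples, with feedback from the
        consequences of each mistake; cost grows with every trial."""
        best, fewest_errors = None, float("inf")
        for f in candidate_features:
            errors = sum(m[f] != consequence(m) for m in samples)
            if errors < fewest_errors:
                best, fewest_errors = f, errors
        return lambda m: m[best]       # paid for in time, energy, and risk

    def acquire_by_theft(grounded, name1, name2):
        """One symbol string -- "NEW = name1 AND name2" -- instantaneous
        and risk-free, PROVIDED name1 and name2 are already grounded."""
        return lambda m: grounded[name1](m) and grounded[name2](m)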

jm> One of the fixed points in the
jm> interpretations is that language
jm> somehow helps learning new categories; it is rather uncontroversial.

The question the paper tries to answer is the HOW of this "somehow"
(that IS controversial). Moreover, the point is that language's
essential function, and the basis for its adaptive value, is that
it is a new and much more powerful and efficient way of acquiring
new categories.

jm> Another fixed point is that the main adaptive advantage of language
jm> lies in this first fixed point. It may be more controversial, but it
jm> is true that at least NOW it is one of the main functions of language.

Correct.

jm> in the following I will show that the model and the experiments
jm> presented here support only a rather trivial statement: it is easier
jm> to learn a category if the learning samples do not contain
jm> irrelevant features.
jm>
jm> I will also argue that this fact does not prove the adaptive advantage of
jm> language, and that the suggested mechanism is not the way concept learning
jm> works through language.

Your plans are well announced. Now let us see:

jm> (a good piece of advice: eat, mark, and return don't mean anything
jm> like eat, mark, return. read them as a1, a2, a3. only motion is real.)

I agree completely. This is a toy model with a small number of
parameters. It is always best to "de-interpret" such a model in weighing
it. On the other hand, if the toy model DOES capture the right
properties of what it is trying to model, that is, if it is capable of
scaling up to life-size, then the interpretation is justified.

  sh> The net's action and call outputs are compared with what they
  sh> should have been; this difference is then backpropagated so as to
  sh> weaken incorrect connections and strengthen correct ones. In this
  sh> way the forager learns to categorize the mushrooms by performing
  sh> the correct action and call. In the second spread of activation
  sh> the forager also learns to imitate the call. It receives as
  sh> input only the correct call for that kind of mushroom, which it
  sh> must imitate in its call output units. This learning is likewise
  sh> supervised by backpropagation.
jm>
jm> Note that imitation learning is independent of the observed
jm> mushroom features. The imitation of any call can be learned
jm> in the presence of any mushroom with the same overall effects.

Correct. The imitation learning and the mushroom learning are two
different tasks.

  sh> At the end of their
  sh> life-cycles, the 20 foragers with the highest fitness in each
  sh> generation are selected and allowed to reproduce by engendering
  sh> 5 offspring each. The new population of 100 (20x5) newborns is
  sh> subject to random mutation of their initial connection weights
  sh> for the motor behavior, as well as for the actions and calls
  sh> (thick arrows in Figure 2); in other words, there is neither
  sh> any Lamarckian inheritance of learned weights nor any Baldwinian
  sh> evolution of initial weights to set them closer to the final stage
  sh> of the learning of 00, A0, 0B and AB categories. This selection
  sh> cycle is repeated until the final generation.
jm>
jm> Of course, there IS Baldwinian evolution, as Cangelosi admitted.

Cangelosi not only admitted but affirmed that there is Baldwinian
evolution in the model, but not in the initial weights!

For others: Baldwinian evolution in learning is an effect in which it is
not the learning itself that is inherited but the propensity to learn
it. There was an inheritance of the propensity to learn in this model,
but it was in the propensity to learn BY SYMBOLIC THEFT, not (as
correctly noted above) in the propensity to learn BY SENSORIMOTOR TOIL.

In other words, each successive generation was more genetically inclined
to learn by theft instead of by toil, but they were not becoming more
genetically inclined to know which mushrooms were edible and which were
not! THAT they had to re-learn in each generation by toil.
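
A minimal Python sketch of the selection cycle may help here (the names
and mutation details are illustrative, not the paper's actual code):
offspring inherit mutated copies of a parent's INITIAL weights, never
the weights the parent ended up with after learning, so learned
mushroom knowledge dies with each generation, while an inborn bias
(such as a readiness to learn from calls) can still be selected for.

    import random

    def next_generation(population, fitness, top=20, n_offspring=5,
                        mutation_rate=0.10):
        """Select the 20 fittest foragers; each engenders 5 offspring."""
        parents = sorted(population, key=fitness, reverse=True)[:top]
        newborns = []
        for parent in parents:
            for _ in range(n_offspring):
                child = [w + random.gauss(0.0, 0.1)
                         if random.random() < mutation_rate else w
                         for w in parent["initial_weights"]]
                # Crucially: copied from "initial_weights", never from
                # the learned weights the parent acquired by toil.
                newborns.append({"initial_weights": child})
        return newborns   # 20 x 5 = 100 newborns, who must re-learn by toil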

jm> Now, my extended version of the model and experiments
jm>
jm> We have a population of 100 organisms.
jm>
jm> The learning procedure for every organism is the following:
jm>
jm> Every organism is put in a new world 20 times; in each world it makes 100
jm> steps. In each step, the following is done:
jm>
jm> the closest mushroom is God, and God teaches the
jm> organism what to do (eat, mark), but not how to move around, making
jm> sure that the organism continues to do a random walk through the space, and
jm> does not teach "return" either, because it will be used to test the theft
jm> effect.

Correct, but note that in the intended interpretation, it is not God who
teaches these things, but the CONSEQUENCES of doing the right/wrong
thing. If you eat a poison mushroom, you get sick; if you eat an edible
mushroom, you get nourished.

There will be consequences for "return" too, but they are not yet
operative in the artificial ecology we have designed.

jm> It is based on the mushroom
jm> description only, through a backprop cycle, and in an additional backprop
jm> cycle God teaches it what to say (based on the correct call only).

I'm not sure what you mean here. Backprop is the error-correction rule
by which the neural net updates its connection strengths. There is no
God; the assumption is that this error backpropagation is triggered by
whether or not the organism has made an error, which in turn is
determined by the consequences.

This is, in other words, simply a mechanism that implements Skinnerian
reinforcement learning. No deus ex machina.

Note to others: Backprop is often criticised for being "supervised." The
idea is that supervision is like having a private teacher that corrects
each of your mistakes every time, internally, and directly, in an
unrealistic way. This is the wrong interpretation of backprop. (Yes,
learning mechanisms can be interpreted and de-interpreted too.) If you
wish to interpret it at all, rather than just take the mechanism to be a
mechanism that generates certain outputs in response to certain inputs,
then interpret it as a mechanism for modifying internal weights and
connectivity on the basis of feedback from the consequences of error.
The only "supervisor" is the environment; again, no deus ex machina.

(I am not, by the way, especially an advocate of backprop, but it is
important to lay to rest right away this common but incorrect
criticism of backprop.)

jm> If an organism is in the cell of a mushroom and happens to do the right
jm> thing, then it gets a reward.

Correct.

jm> When we have the sum of the rewards for all the 100 organisms, we choose
jm> the top 20 organisms, making 5 new organisms from each by randomly changing
jm> 10% of the original's weights. Thus we have a new population of
jm> 100.
jm>
jm> The above population cycle is repeated 200 times. Then in the following 10
jm> generations we can proceed 2 ways.
jm>
jm> 1: everything is the same except that after the usual 2000 steps the organisms
jm> live further, and return is included in the teaching (or ONLY return
jm> is taught?) for an additional 2000 steps.

RETURN is included among the survival conditions: there is a
positive consequence of returning to the AB mushrooms, just as there
was a positive consequence of eating the A mushrooms and marking the B
mushrooms.

jm> 2: the first 2000 steps are the same as in 1, but then every organism
jm> learns return (both call and action) (ONLY return?) based on a
jm> correct call.

They all already know eat and mark.

jm> Now, let's make some simplifications that leave the predictive power of
jm> the model intact.
jm>
jm> First, observe that the motion of an organism is
jm> in fact a random walk, since motion is never taught, at least by
jm> backprop. The genetic algorithm may have some effects, and it may well
jm> be that fitness increases because organisms learn to approach mushrooms
jm> more efficiently. However, this effect is irrelevant from the point of
jm> view of learning concepts about mushrooms through toil. The
jm> results of the paper would not change if motion, and the concept of
jm> "the world of mushrooms" were discarded altogether.
jm> Fitness could be calculated via any measure of learning accuracy over
jm> an example set of different mushrooms.

Correct, but irrelevant. We are talking about learning to sort
mushrooms, not learning to walk or to approach.

jm> Second, observe that the call and action outputs are in fact identical.
jm> The call outputs of the organisms are never used in any experiment.
jm> The mysterious "imitation learning" phase seems to be useless, since
jm> it teaches a function that is never used. The only possible effect of it
jm> is that it somehow "clears the ground" for theft learning, making use
jm> of the fact that the action and call outputs are identical, so in the
jm> theft learning the organism has to IMITATE the call input in its action
jm> output. If this is right then it is cheating. If this is not right, then
jm> the call output is useless. This means that the call output and the
jm> imitation learning phase can be discarded.

It is right, and it is not cheating. The experiment is not on imitation
learning, which is not a problem in principle. It is about the relative
value of the Toil vs. Theft strategy. Without the imitation learning
phase there would be no detectable signal sent or received, so no
hypothesis would be tested.

Look: It was made clear that the model is a toy model, with too few
parameters to bear the weight of a realistic ecological interpretation.
Nevertheless it did test the relative success of learning to categorize
by two radically different means -- one slow, one fast; one direct, one
indirect; one sensorimotor, one symbolic. If the toy model captures
realistic variables, and if the two strategies do indeed capture the
relative effectiveness of the prelinguistic and the linguistic way of
acquiring new categories, then the rapid dominance of the one strategy
over the other is a possible explanation of the adaptive advantage of
language.

jm> Third, evolution and learning both evolve the very same weights of the
jm> organism. The combination of evolution and backprop is virtually a
jm> single learning algorithm that has to find good weights for the given
jm> task (the genetic algorithm is typically used to find structure, not
jm> weights, in which case this is not true). So we can think of the model
jm> as containing only one organism, being taught by some algorithm
jm> based on a set of learning examples.

Toy models can always be interpreted many different ways; it is not
particularly informative to show that other interpretations are possible.

jm> Here, "theft" organisms learn return based on the call, and "toil"
jm> organisms learn based on the mushroom. This means that "theft"
jm> organisms receive the very same input as toil organisms, except
jm> they don't receive garbage (C,D,E features).

Not interesting. The "garbage" was there to make the learning less
trivial on OUR interpretation.

jm> This means that the model only proves, that learning is more effective
jm> without garbage in the input examples.

Nothing of the sort.

I like criticism, but criticism is usually more useful if it is based on
the "charity assumption," which runs: if the interpretation I have put
on what someone is saying would mean, if that were what they meant,
that they must be rather stupid, then maybe I should try another
interpretation. Only if no more charitable interpretation is possible
should I assume that my uncharitable one is the right one...

  sh> 8. Conclusions
  sh>
  sh> We have shown that a strategy of acquiring new categories by
  sh> Symbolic Theft completely outperforms a strategy of acquiring them
  sh> by Sensorimotor Toil as long as it is grounded in categories
  sh> acquired by Toil.
jm>
jm> If theft means learning without garbage, yes.

It does not mean garbage learning, so try again.

  sh> Can results from a 3-bit toy world really cast light on the
  sh> rich and complex phenomenon of the origin and adaptive value of
  sh> natural language? This is really a question about whether such
  sh> findings will "scale up" to human size in the real world.
jm>
jm> The problem is not with the scaling up. I will mention two serious
jm> problems with this approach.
jm>
jm> The first is that in the example of the paper, the organisms had
jm> to learn a concept (return) that depends only on two other
jm> concepts, eat and mark. When the theft phase begins, the organism
jm> has already acquired eat and mark.

That was intentional -- in fact, it was the whole point of the
simulation. "Eat" and "mark" must first be grounded by toil before
"return" can be acquired by theft (through a boolean combination of the
old categories).
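
In miniature, and with illustrative Python names only (not the
simulation's actual feature rules), the intended grounding chain looks
like this:

    # "eat" and "mark" are grounded the hard way, in the mushrooms' own
    # features; "return" is then acquired by theft, yet it still bottoms
    # out in those features rather than in the call that taught it.
    eat = lambda m: m["A"]                        # grounded by toil
    mark = lambda m: m["B"]                       # grounded by toil
    return_ = lambda m: eat(m) and mark(m)        # stolen, still grounded

    assert return_({"A": True, "B": True})        # AB mushroom: return
    assert not return_({"A": True, "B": False})   # A0 mushroom: do not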

jm> Instead of relying on the call
jm> input, a third strategy could be to use the organism's own
jm> output as input, i.e. to base the learning of new categories
jm> on old ones. It would provide the same advantage, and indeed it
jm> does. The frog's eye recognises concepts connected to size and motion,
jm> and its concept "eat" depends on these primitive ones, forming
jm> a hierarchy.

We are not talking about "concepts" here (whatever that means), but
about learning behavioral categories: What kinds of things can I eat?
I can only find this out by trial and error, and feedback from the
consequences of what I tried, if I made an error. Without the external
feedback, there is no way to know right from wrong.

This applies equally to the ground-level learning of eat/mark by toil,
and to the higher-level learning of return by theft. It is not my own
output from my input that will tell me whether I am right/wrong in either
case; it is the feedback from the external consequences of my output.

For others: This recommendation was motivated by jm's preference for
"unsupervised" learning over "supervised" learning. But here it is the
TASK itself that is essentially a supervised one. Just given mushrooms as
input I am incapable of determining which are and are not edible; only
the feedback from the consequences of eating them can guide me in that.
The mushrooms have 5 features, A, B, C, D and E, but just giving me all
the mushrooms over and over will never reveal that it is only the A
mushrooms that are edible.

jm> Though the frog doesn't have names for its concepts, neither do the
jm> foragers. Or in the other direction, if the foragers do, then frogs
jm> do as well.

When the only available strategy is toil (as for the frog, and for the
prelinguistic human), there is no point naming or vocalizing. The
utility of naming and vocalization begins with the possibility and
utility of theft.

By the way, the frog's categories are inborn, not learned, though
experience activates and perhaps fine-tunes them. For this, mere
exposure, without feedback, may be enough. That is why an unsupervised
model could do it. But the task of our foragers cannot be solved that
way.

jm> The second objection is the other side of the first one: if the
jm> concept return depended on C, D or E (why not?), then the theft strategy
jm> would be doomed to failure.

Correct, but that is because if the RETURN category depended on anything
but a boolean combination of already grounded categories (and hence their
underlying features) then it could not be learned by theft.

Just as you learn nothing if I tell you that a "snark" is a "boojum"
that is "wrent" -- if you do not know what "boojum" and "wrent" mean.

jm> Furthermore, observe, that "theft" organisms never
jm> learn how to recognise a mushroom to return to. They can recognise
jm> only the call for such mushrooms.
jm> Though the basic concepts are grounded, they are not connected to
jm> the concept "return" if theft strategy is used. In other words,
jm> "return" is grounded in the perception of calls.

Incorrect. If all the foragers who had learned "return" by toil died,
the ones who learned it by theft could still keep correctly responding
to the return mushrooms, because they still know eat and mark, in which
return is grounded. (This would not be true if -- as in the first
Cangelosi & Parisi simulations, by the way -- they had learned EAT by
theft. For that is an evolutionarily unstable strategy: the thieves beat
the toilers in fitness while there are still toilers around, but when
they are gone, no one knows what to eat any more, because the thieves
were basing their success on the vocalization cue "eat" and not on
the mushroom cue "A". Think about this one.)
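
Continuing the earlier sketch, the difference between the two kinds of
thief is in what their rules bottom out in (again purely illustrative):

    # The RETURN-thief's rule bottoms out in its OWN grounded categories,
    # so it survives the callers' extinction:
    return_thief = lambda m: eat(m) and mark(m)   # needs only the mushroom

    # The EAT-thief's rule bottoms out in someone ELSE's vocalization,
    # so when the toiling callers die out, so does its category:
    eat_thief = lambda call: call == "eat"        # needs a living caller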

jm> This is not how language works. I can recognise
jm> things without somebody telling me the name of the thing, after learning
jm> the concept.

And if you didn't know what a "zebra" was, and your life depended on
correctly identifying the next 10 zebras (mixed in with many other
similar animals -- antelopes, giraffes, warthogs, which you also do not
know), but you knew what a "horse" was and what "stripes" were, then your
life would be quite safe if I told you: A zebra is a striped horse.

And is that not exactly what language does for you: save you the trouble
of having to find things out the hard way?

jm> In other words, we can see the world AND hear the names of things.
jm> In my view, theft is done the following way. We hear a new name first,
jm> and AFTER THAT we figure out how to ground it in PERCEPTUAL INPUT and
jm> OUR OWN old concepts.

I'm not sure what you have in mind. But here's a new name: "ban-ma" and
if you go to a zoo, you will find some. Now go figure out how to ground
it.

jm> The basic intuition of the paper is interesting but the model is
jm> not relevant to language evolution.

I'm afraid that you have not quite grasped either the model or the basic
intuition -- unless I have misinterpreted your comments...

--------------------------------------------------------------------
Stevan Harnad                     harnad@cogsci.soton.ac.uk
Professor of Cognitive Science    harnad@princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
Computer Science                  fax:   +44 23-80 592-865
University of Southampton         http://www.cogsci.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM


