Re: "Classical" Theory of Categorisation

From: Stevan Harnad (harnad@cogsci.soton.ac.uk)
Date: Sun Mar 09 1997 - 16:05:47 GMT


Hi, I've done my comments paragraph by paragraph as I read it. As a
result, many of my points are in fact made by Wendy further on. It was
only time constraints that prevented me from going back and taking them
out. Besides, some things are worth repeating...

I will archive this and Wendy's message as Hypermail, if it's alright
with you, and will continue archiving our coglab discussions.

> From: "Wendy Smith" <WS93PY@psy.soton.ac.uk>
> Date: Tue, 25 Feb 1997 08:52:51 GMT
> Models of Category Learning: The Defining Attributes Theory
>
> The "Classical" Theory of Categorisation
>
> The classical theory of categorisation is considered to be the
> "defining attributes" theory, and if this has to be summed up in
> one phrase, it would be: Singly Necessary and Jointly Sufficient. The
> idea is that a category can be defined by a set of attributes. Each
> attribute is singly necessary, which means that if an item does not
> have one of the attributes, it isn't a member of that category, no
> matter what other attributes it does have. The set of attributes is all
> that is necessary to be a category member. If an item has all the
> attributes deemed sufficient, it is a member of that category, no
> matter what other attributes it does or doesn't have.

This perhaps makes it seem more authoritarian than it was, but consider
that the weasel word in all this is "attributes": What is an attribute?
Abstractly, "being a chair" can be an attribute of a chair, and of
course this is necessary and sufficient!

So could "being A or B or C or...." -- make the list as long as you like.

This is not just a logical defect for opponents of the "classical
theory," but it is a genuine flaw in how the whole thing has been
conceptualised.

Our "prototype" for what an "attribute" is is something like "red" or
"round". But then what about Boolean combinations? Is "red or round" an
attribute? How about "not red or round"? But then everything that is
neither red nor round shares this (negative) attribute -- and countless
(literally) other attributes. (Remember Watanabe's "Ugly Duckling
Theorem" from last year?
http://cogsci.soton.ac.uk/~harnad/Hypermail/Foundations.Cognition/0056.html

The real origin of the "classical theory" is in philosophy, where they
did not speak of "necessary/sufficient attributes, or features" (what on
earth is a "necessary feature"?) but necessary and sufficient CONDITIONS
for propositions (statements) to be true.

If you have a proposition of the form:

     If P (is true) then Q (is true) (1)

then Q (is true) is a necessary condition for P (is true) and
P (is true) is a sufficient condition for Q (is true).
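That logical relation can be checked mechanically. Here is a minimal Python truth-table sketch (the function name `implies` is my own); it verifies that "if P then Q" is equivalent to its contrapositive "if not-Q then not-P", which is just another way of saying that Q is necessary for P and P sufficient for Q:

```python
from itertools import product

def implies(p, q):
    # The material conditional "if P then Q"
    return (not p) or q

# Check all four truth assignments: the conditional always agrees
# with its contrapositive "if not-Q then not-P".
for p, q in product([False, True], repeat=2):
    assert implies(p, q) == implies(not q, not p)
print("contrapositive equivalence holds in all four cases")
```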

That's fine for logic, where these are literally "laws" (if they were
not true, then that would lead to a contradiction) but categories in the
world don't have NECESSARY features in this sense. A horse does not HAVE
TO be four-legged, on pain of contradiction. Only a quadruped (4-legged
thing) HAS to be 4-legged, and that's just for the trivial reason that
it's part of the definition of "quadruped": that's how we've agreed to
use that word. A definition is a two-directional "if/then" statement
(i.e. an equivalence that holds by definition):

If "quadruped" then 4-legged,
and
If 4-legged then "quadruped"

So in a definition, "A" means B,
B is both a necessary and a sufficient condition for being "A."

But in categorising the things in the world we don't have necessary and
sufficient conditions:
A "quadruped" is necessarily 4-legged, but a "cow" is not NECESSARILY
a creature that gives milk and goes "moo."

It may HAPPEN to be the case that all cows hitherto encountered have
given milk and gone moo, and it may HAPPEN to be the case (though we
don't know it) that no one will ever see a cow that does not give milk
or go moo, because there are no such cows; but even that does not mean
that a cow NECESSARILY gives milk and goes moo.

I hope with this I have persuaded you that matters of logical
necessity/sufficiency have nothing to do with Psychology (except in the
trivial case of definitions). All features of real objects are
CONTINGENT: they carry neither the force of logical necessity nor of
logical sufficiency (again: unless we agree to DEFINE things in a
certain way: if a cow was DEFINED as a creature that gives milk and
goes moo, then of course no creature that does not give milk and go moo
can be a cow; it's got to be something else, because we've decided to
use the word "cow" according to a definition that we have made).

So if the "classical" theory of categorisation could not have been based
on necessary and sufficient conditions, what COULD it have been based
on?

Nature, for the most part, decides what's what: We have no choice over
what an edible or a poisonous mushroom looks like. So when we learn to
sort mushrooms, we need a way to sort them correctly. "Invariants"
are attributes of mushrooms on the basis of which we can reliably sort
them correctly. It is neither necessary nor sufficient that mushrooms have
these invariants in order to BE mushrooms. But these are the attributes
of mushrooms that make it POSSIBLE for us to sort them correctly.

For ground-level categories that we learn how to sort on the basis of
actually seeing/smelling/touching them, the invariants will be in the
sensory projection (the shadows that mushrooms cast on our sense
organs) or on some property that our brains can DERIVE from those
sensory projections.

> Implications
>
> The theory has several implications. First, category membership
> is all-or-none. Either you are a bird, or you're not a bird; you
> can't be a kind of a bird.

Birds are not all-or-none because of a theory (let alone a PSYCHOLOGICAL
theory). They are all-or-none because of the way biology happens to be:
Species are not graded into one another; they are all or none. So, to
the extent that we can correctly sort creatures by species, there must
be some direct or indirect invariant that distinguishes birds from
nonbirds. (There doesn't HAVE to be a detectable invariant, but if
there hadn't been then we would never have been able to know what was
and wasn't a bird.)

This example is a good one, because it makes it clear that
psychologists are putting on airs when they try to be "ontologists"
(experts on what does and does not exist in the world) or biological
taxonomists (the ones that decide what is a species and what to call
it). All that psychologists are empowered (in fact, obliged) to do is
this: If people are able to successfully sort things in certain ways,
the psychologist (cognitive scientist) must explain how they do it.

Now it HAPPENS to be true that (1) birds differ from nonbirds in
an all-or-none way and (2) people are able to sort, name and describe
them correctly. How do they do this? That's an empirical question,
but one thing you can be sure of is that they don't do it by magic:
There must be invariants in our sensorimotor interactions with birds
(and pictures of birds) that give us the capacity to do this.

One possibility is that we do it the same way that we learned to write
the letters of the alphabet: There are bird "templates" and fish
"templates" and we judge whether something is a bird or a fish by
whichever template it resembles more. In that case, the invariant would
be a relative threshold on a "closeness to template" attribute:
e.g., if the distance between the bird and fish templates is 30 along
some sort of morphing continuum, then 15 or less on the continuum =
"bird" and more than 15 = "fish". That's a viable model, except in most
cases
it doesn't work. The invariant is something more specific and local than
relative distance from prototypes.
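A minimal sketch of that template model, using the hypothetical numbers from the paragraph above (the positions on the continuum, and the template values, are invented for illustration):

```python
# Toy template-matching model: bird and fish templates sit 30 apart
# on a morphing continuum, so the "relative closeness" invariant
# amounts to a boundary at 15.
BIRD_TEMPLATE = 0.0
FISH_TEMPLATE = 30.0

def classify(stimulus: float) -> str:
    """Label a stimulus by whichever template it is closer to."""
    if abs(stimulus - BIRD_TEMPLATE) <= abs(stimulus - FISH_TEMPLATE):
        return "bird"
    return "fish"

print(classify(10.0))  # closer to the bird template -> "bird"
print(classify(22.0))  # closer to the fish template -> "fish"
```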

> Second, leading on from this, all members
> of the category are equally representative. A penguin is just as much
> or as little a bird as a turkey, an ostrich or a sparrow. Third, again
> following from the all-or-none nature, the boundaries are clear cut,
> there are no straddlers. Fourth, people have a knowledge, or mental
> representation, of these attributes, and are able to apply this to
> novel items to determine category membership.
>
> Criticisms
>
> The theory met with several criticisms. The first criticism
> revolves around the finding of "typicality" effects. It was
> found (e.g. Rosch & Mervis, 1975) that some items were deemed more
> typical of a category than others, and that membership appears graded
> rather than all-or-none. Items which were more typical could be
> categorised more quickly than less typical examples. These criticisms
> led to another view of categorisation (probabilistic theories,
> including family resemblances and prototype theories - which Vered will
> be doing later, so I don't want to pre-empt her too much).

Can you give some examples of graded categories? Big and small are
graded, but no one ever thought otherwise. Bird and fish don't seem
to be separated only by a difference in degree. Being able to judge how
typical things are is an interesting little capacity we have. But the
capacity to say WHAT they are (as opposed to how typical a member of the
"what" category) is a considerably more profound and powerful a
capacity. Typicality judgments occur AFTER the hard work is done: First
you say WHAT something is; then you can say how "good" a "what" it is.

> The second criticism, stemming from the first, revolves around the
> findings that the boundaries between categories are not always
> clear cut. Not only were inconsistencies between subjects found, but
> also within subjects, when they categorised a set of items, then
> categorised the same items a month later (McCloskey & Glucksberg,
> 1978). This suggested that rather than clear cut boundaries and fuzzy
> middles, perhaps the boundaries were fuzzy, and the centres clear
> (prototypes).

We need examples: Were birds mistaken for fish? And if they were, does
that mean a bird is a (remote) kind of fish?

You'd certainly have trouble saying when "big" became "little" (although
once you'd sampled the range you could sort most of the sizes
quite well). The same is true with, say, green vs. blue: Sure they
blend into one another; so do facial expressions and even ba/da/ga
and ba/pa. But where human sorting is unreliable or random, there
is no point looking for an all-or-none invariant, because there isn't
one. The real question is: How many of the nouns and adjectives
in a dictionary name all-or-none vs. graded categories?

(I actually don't know the answer myself: It would be interesting
to sample a few pages in a dictionary at random and make an estimate!)

> Third, it can be very difficult to establish the set of defining
> attributes. The famous example is that of "games"; everyone
> (supposedly) knows exactly what is or isn't a game, but no-one can give
> a clear set of defining attributes.

Who ever promised that we would be able to name the invariants? How
many other cognitive capacities have we been able to explain by
introspection? Isn't it more likely that we learn categories
implicitly? It's only when the invariants we were using implicitly
themselves become named categories (as in Biederman's geon analysis of
the allegedly nonverbalisable basis of chicken-sexing) that we may
discover the invariants we've been using all along.

To the extent that we can all classify things into the category "games"
and "non-games," there has to be an invariant basis for it, even if it's
a long string of disjunctions: "A Game must have property a or b or c
or..."

To the extent that we CANNOT classify some things as game or non-game
(either because we don't agree or because there just isn't a fact of the
matter), invariants will of course not exist. But that's irrelevant to
the question of how we are able to categorise (for the cases where we
are not able, there is nothing to explain).

> In addition, if people hold
> beliefs about what attributes are and are not necessary, how stable are
> these beliefs? First, they appear to change with increasing experience
> (Rey, 1983); and, second, the beliefs can be frankly wrong (McNamara &
> Sternberg, 1983). So, the theory was that people use a knowledge of
> the defining attributes to decide category membership; but this
> knowledge may not exist, and, even if it
> does exist, may not be correct; furthermore, even if it is correct, it
> can change over time as a function of experience.

All true, but what follows from it? Again, it's not what I BELIEVE that
matters, it's what I can (and cannot) SORT (and how, which need not be
accessible in the form of explicit beliefs). It's only success at
sorting that matters. If there is no right or wrong, if people
disagree, if sorting changes with time, then we can hardly expect an
invariance to underlie the sorting, because the sorting itself is
varying.

To expect the invariants underlying reliable, consensual, correct
sorting to be based on explicit beliefs rather than the kind of
"unconscious inference" that is implicit in so much of perception is
simply a theory of categorisation, and a wrong one, at that. Don't
look for explicit beliefs; find the invariants that make correct sorting
possible.

> Evaluation
>
> The strongest criticisms levelled appeared to revolve around the
> issue of typicality. Typicality ratings are discrimination tasks,
> not categorisation tasks. If the discrimination task shows an
> influence due to categorisation, then what we are seeing is a
> demonstration of categorical perception.

Typicality judgments are relative judgments (but not explicit
discrimination or discriminability measures): We can often rank order
the members of a category in terms of their typicality. We can also get
pairwise similarity judgments. The expectation would be that they have
the same multidimensional structure when these two sets of data are
scaled. Discriminability tests are usually done on interstimulus
differences that are smaller than the scale for similarity/typicality
judgments.

If identification is categorical and similarity/typicality are lowest
and discriminability is highest at the crossover point of the
identification curve, then there is CP.
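That operational test can be sketched on invented data: along a stimulus continuum, CP requires discriminability to peak (and similarity/typicality to dip) at the 50% crossover of the identification curve. The numbers below are purely illustrative:

```python
# Invented data for a 7-step continuum between two category labels.
p_label_A = [0.98, 0.95, 0.85, 0.50, 0.15, 0.05, 0.02]  # P(respond "A")
discrim   = [0.55, 0.58, 0.70, 0.92, 0.72, 0.60, 0.55]  # pairwise discriminability

positions = range(len(p_label_A))
# Crossover: the position where labelling is closest to 50/50.
crossover = min(positions, key=lambda i: abs(p_label_A[i] - 0.5))
# Where discriminability is highest along the continuum.
peak = max(positions, key=lambda i: discrim[i])
print("CP pattern" if crossover == peak else "no CP pattern")  # CP pattern
```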

Where there is no continuum, and interstimulus differences are well
above the discriminability threshold, only similarity/typicality
measures can be used, and even they need a baseline -- either from an
objective (nonhuman) measure of interstimulus differences (e.g., the
Gabor filters Irv Biederman talked about: we should be getting them
soon) or from a pre-learning baseline.

> The situations described in
> the vast majority of these studies, therefore, seem equivalent to our
> "post-learning" discrimination tasks. Perhaps what we really need to
> be asking is whether, if defining attributes is the correct mechanism
> for categorisation, it would be able to predict the sorts of results we
> see. The implications from the theory were that there would be clear
> boundaries, but fuzzy middles. This would seem to suggest that,
> following categorisation, there would be a compression within
> categories (because all items within a category are equivalent), and
> separation between categories (because all items are either a member of
> a category, or not a member), relative to the situation before
> categorisation. This is compatible with the findings of absolute C.P.,
> but not so compatible with those of relative C.P. It is particularly
> poor in the situation where there is separation only, and suggests that
> the classical theory cannot be a full explanation of categorisation.

Yes, this is a "post-learning" condition, but because there is neither
a pre-learning baseline nor a quantitative measure of interstimulus
distance, there is no way of saying whether there is CP at all, relative
or absolute.

> The second criticism concerned the fuzziness of the boundaries.
> Before this criticism can be evaluated, we need to decide exactly
> what is meant by fuzziness about the boundary? For example, take the
> boundary between "mammals" and "fish", and the problem of categorising
> a whale and a cat. It will probably take a longer time, and result in
> more uncertainty, to categorise the whale, relative to categorising the
> cat. But what does this mean? Is it that a whale is less of a mammal
> than a cat, or is it that a whale may or may not be a mammal, I just
> don't have enough knowledge to categorise it? That is, let's say the
> defining attributes of a mammal are: hair covered body, viviparous,
> produces milk to feed the young. I have enough experience with cats to
> know that these conditions are satisfied, but I'm not so familiar with
> whales. This doesn't make a whale less of a mammal, or the boundary
> between fish and mammals less clear cut, it just means I don't know
> what attributes a whale possesses. Experience does appear to play a
> role in determining which features are used; it may also play a role in
> the ease with which features can be identified. The nature of the
> experience may also be important, highlighting the role of feedback.
> If there are no consequences to categorising correctly or incorrectly,
> the boundary may remain fuzzy. If, on the other hand, there are
> serious consequences, then a clear boundary may emerge very quickly.

That's all true: It amounts to saying (1) there must be a fact of the
matter, as to what is in what category, and (2) that fact has to matter
to us (getting it right has to have consequences, even if they are only
the esoteric and obscure ones that a microbotanical taxonomist
specialising in Amazonian flora would know or care about!). And above
all, psychologists must be reminded that they are not ontologists: Their
mandate is not figuring out what things there are in the world and what
their features are; their mandate is to describe how we sort things,
and then explain how.

To put it another way, psychologists' territory begins with the nature
of the proximal stimulus (the shadow that distal objects cast on our
sensory organs), and what can be recovered from that, to guide our
categorisation. The distal object and its attributes are none of our
business except in relation to the proximal stimulus and what we prove
to be able to do with it: Sometimes, under some conditions, we can sort
reliably, in an all-or-none fashion. Other times, under other
conditions, we cannot sort reliably. Along every continuum, and for
every set of features, we can generate artificially (unless nature
provides them naturally) cases that we are unable to sort reliably.

Moreover, not all of our categories are grounded directly (through innate
or learned invariance detectors = sensorimotor "toil"). Some are grounded
indirectly -- and indirectly means explicitly, by a verbal description
or definition that tells us what's what, usually by naming the critical
features ("theft").

> Context effects may also cause fuzziness. If a given item can be
> categorised in more than one way (e.g. edible versus inedible, but
> also perfumed versus non-perfumed), then, in the context-free situation
> described in many of these studies, there may be some
> confusion/disagreement concerning the placement of an item.

Contexts are clearly critical for categorisation; they signal or define
the alternatives that matter, and hence what the relevant invariants
are.

> The third criticism, relating to the role of knowledge and belief,
> does have to be addressed, but perhaps not in the way it is
> usually phrased. Furthermore, this criticism is not specific to this
> approach; the assumption of knowledge also exists in other theories,
> e.g. prototype theories and theory-based theories (Margolis, 1994).
>
> The problem is often presented as an inability to state the
> defining attributes, even when accurate categorisation is taking
> place. But do we have to be able to verbalise a set of attributes to
> be able to use them, or can we use them in an implicit fashion?
> Implicit use could explain correct categorisation in the presence of
> both apparently non-existent and incorrect knowledge. Ultimately, if a
> person can categorise correctly, i.e. reap the rewards of successful
> categorisation and avoid the costs of miscategorisation, then whether
> or not they can define the category is immaterial. Perhaps an
> important point here is the term "correctly". How do we decide what is
> and isn't correct? Arguing over what does and what doesn't constitute
> a category is more an intellectual question for experts in that field
> than a practical question for psychologists.

What is correct or incorrect either comes from reality (and its
consequences for us if we sort things the wrong way) or by social
convention (which can be as consequential for us as physical reality,
e.g., if Saddam Hussein is calling the shots).

The job of psychologists is to explain HOW we sort correctly, when
we do.

> The criticism still has a valid point, though, in that if we
> assume knowledge is required, whether this is implicit, explicit,
> or whatever, then how do we get the knowledge in the first place? And
> exactly how do we use this knowledge to determine category membership?
> Defining attributes may provide a structure, or mental representation,
> for the knowledge which is needed, but the classical theory does not
> provide a clear mechanism for how this knowledge is acquired and used.

Nor do its rivals: The vagueness of saying it's done by matching a
prototype, memorising instances, detecting features, or using
"higher-order" knowledge is on a par, isn't it?

> Conclusion
>
> This theory may be singly necessary, but is not jointly sufficient
> as an explanation for categorisation mechanisms.
>
> Discussion
>
> I've put this section as a series of questions which arose out of
> the reading.
>
> What, as psychologists, do we mean by the term "categorisation"?

How about: human sorting behaviour?

> Is there a difference between deciding what a category is, and deciding
> the mechanics of how we categorise?

Yes, the first is ontology, and not a question for cognitive theory.
If by the second you mean a model that can sort things as we do,
then there is indeed a difference.

> How important is the relationship between why we categorise and how we
> categorise?

The "why" question would presumably have different answers for different
categories. If a general reason is wanted, then it's clearly that
sorting things right or wrong has consequences for us.

The "how" question is the one that cognitive theory needs to address.

> How can we study this? - " It is always difficult to decide whether a
> particular observation is a function of the information represented by
> the concept, the structure or the form of that information....or of the
> processes that operate on
> that concept" (Komatsu, 1992, p501)
>
> Is categorisation the same when we have full knowledge of the stimulus
> set, and when we have not yet met all the stimuli?

What is full knowledge of the stimulus-set before meeting all the
stimuli? Have we seen all possible trees? If you've sampled enough of
them to successfully extract the category invariants, then you should
be able to sort new members using those features. If not, then it's
either underextension or overextension (and you're faced with the
"credit/blame assignment problem.")

> Is the mental representation the same in both these cases?

There is always the possibility that your invariants over/underextend
(hence are not invariants at all).

> Is there a difference between "intellectual" type categorisation and
> "practical" categorisation?

Not sure what this means. I suppose sorting mushrooms is important if
you need to eat -- but is it less so if you are a mycologist?
Or do you mean instances versus descriptions? That's the toil/theft
issue.

> Do we have a concept first, then begin to categorise, or does the
> concept arise out of the categorisation process?

What is a concept? By my lights it is either a feature-detector that
picks out a category implicitly or a (grounded) description that does so
explicitly. In both cases the categorisation must precede the concept,
but in practice it's all mostly about category REVISION (as above), and
then of course there is a prior concept (provisional invariant) which
fails and must be revised.

> What role does context play in deciding which factors may or may not be
> important?

Depends what you mean by context. Context can mean:

(1) The text surrounding an utterance.

(2) The location when the utterance was uttered.

(3) The background knowledge of an utterance.

I've committed myself to two related, explicit senses of "context":
The context of acquisition and the context of application of category
names.

In both cases the context consists of THE SET OF CONFUSABLE
ALTERNATIVES amongst which the categorising must be mastered. In
acquisition, it's the set of positive and negative instances on the
basis of which the invariant is extracted. In the context of
application it is the set of alternatives that the category name is
being used to sort. Reducing uncertainty among a set of alternatives is
straightforward information theory.
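That information-theoretic point can be made concrete with a short sketch: naming a category among a set of confusable alternatives reduces uncertainty, and the reduction is measurable in bits. The numbers here are illustrative:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

before = entropy([1/8] * 8)   # 8 equiprobable alternatives: 3 bits
after  = entropy([1/2] * 2)   # "table" narrows it to 2 tables: 1 bit
print(before - after, "bits of uncertainty resolved")  # 2.0 bits
```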

In the context of acquisition the sample of positive and negative
instances is critical for the detection of the invariant because it
is with respect to that sample of +/- variation that the invariant is
invariant! And revision needs to be done whenever the +/- sample is
extended (the context is widened) and the provisional invariants prove
to generate misses and/or false +'s or -'s.

The context of application is similar, except new categories or
invariants need not be learned. The alternatives simply need to be
sampled to determine which name to use to resolve the uncertainty that
is at issue: anything from the object, to the table, to the leftmost
table, to table-charlie if the category is an individual and other
individuals in the same superordinate category are the alternatives.

(By the way, I do not believe in "basic level categories" except as what
happens to be the default level for distinguishing alternatives: i.e.,
the default context. But the default context may not be the same for
everyone. Dog breeders need a finer grain than "dog" or even "collie."
And of course the level of individuals is a special one, but not
necessarily "basic." The same goes for "superordinate" and "subordinate"
levels; these are arbitrary, and relative to whatever the context might
be.)

Whether in the context of acquisition or the context of application,
what "X" is always depends on the answer to the question: "Compared to
What?"

This means we are never doing "absolute" identification, only
"relative" identification: relative to the sample of alternatives we
have encountered so far.

> Can the "rules" change as experience with the stimulus set increases,
> and, if so, how and at what level does this happen?

Surely it happens, because our invariants are not always right; nor are
the descriptions that we get by symbolic theft. So everything is
negotiable.

This state of affairs is called "underdetermination" in the philosophy
of science: Theories are underdetermined by data: the data are always
compatible with more than one theory (and sometimes many).

The same is true of sensorimotor invariants and rule-based descriptions:
Everything is provisional (and only the ontologists know what there
REALLY is out there)... Our context of alternatives is always based on a
finite sample, hence our invariants can only be provisional.

> What would this method of representation have to say about C.P? For
> example, the representation appears to be uniform across all members,
> i.e. they are represented as a set of defining attributes. This would
> appear to suggest that, following learning, C.P. should be demonstrated
> by a strong compression, as well as clear separation. This isn't
> always found, so what does that mean for the theory?

Not sure what "this" refers to here, but I would expect CP only when
it's needed: CP is not needed to sort the zebras from the giraffes:
Nature has already put a gaping gap between them. Not so with baby
chicks, identical twins, interconfusable mushrooms, cancer cells, etc.

I would predict that if the set of confusable alternatives is scaled,
then CP will occur wherever things are not (or not reliably) linearly
separable in that space according to how they must be sorted. How they
must be sorted is determined by the consequences of miscategorisation.
If the category boundaries happen to be along natural "fault" lines,
the need for CP is minimal. It is when they go AGAINST the natural
fault lines that I would expect CP to be maximal.
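The linear-separability prediction can be illustrated with a toy sketch (my own, not a model from the discussion): a perceptron converges only when the required sorting is linearly separable in the similarity space, so an XOR-like sorting, which cuts against any single "fault line", is where CP would be expected on this prediction.

```python
def perceptron_separable(points, labels, epochs=100):
    """Return True if a perceptron finds a separating line (labels +1/-1)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in zip(points, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
            if pred != y:
                # Standard perceptron update on a misclassified point.
                w[0] += y * x1; w[1] += y * x2; b += y
                errors += 1
        if errors == 0:
            return True   # converged: the sorting is linearly separable
    return False          # never converged within the epoch budget

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(perceptron_separable(pts, [-1, -1, 1, 1]))   # separable: True
print(perceptron_separable(pts, [-1, 1, 1, -1]))   # XOR: False
```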

> References
>
> Komatsu, L K (1992) Recent Views of Conceptual Structure Psychological
> Bulletin 112 (3) 500 - 526
>
> McCloskey, M E & Glucksberg, S (1978) Natural Categories: Well defined
> or Fuzzy Sets? Memory and Cognition 6 462 - 472
>
> McNamara, T & Sternberg, R (1983) Mental Models of Word Meaning
> Journal of Verbal Learning and Verbal Behaviour 22 449 - 474
>
> Margolis, I (1994) A Reassessment of the Shift from the Classical
> Theory of Concepts to Prototype Theory Cognition 51 73 - 89
>
> Rey, G (1983) Concepts and Stereotypes Cognition 15 237 - 262
>
> Rosch, E & Mervis, C B (1975) Family Resemblances: Studies in the
> Internal Structure of Categories Cognitive Psychology 7 573 - 605



This archive was generated by hypermail 2b30 : Tue Feb 13 2001 - 16:24:05 GMT