CS/PSYCH 129 Week 6 Reactions

Sean Lewis

The Plunkett and Marchman paper was bitter. It seems that their gripes are well grounded, so I guess it's understandable. I'm also curious as to why we read it. The textbook mentions this experiment, its results, and its basis in Marcus' work. It essentially defends against a challenge that we wouldn't have known about. I guess it does give more explicit details of the experiment in a shorter form than a normal scientific paper. As an aside, I found the type/token distinction confusing.

The Gasser and Smith paper was interesting though not totally satisfying. I was really confused by their input representation, and their claim that between-category errors don't happen seems like a strong statement given that they occurred 35% of the time (experiment 1). The fact that the errors converge for both nouns and adjectives I guess is good, but does that mean that children's lack of error is accounted for by semantics 35% of the time? At least they do indicate that their model works strongly on a form-to-form basis, but their success, while impressive in some regards, still seems a small part of a solution or account.

The text's coverage of Elman's word prediction attempt did not surprise me. It seems that outside of a discourse-established context, predicting words would be near impossible. But, the fact that it did predict part of speech even though one would think it should is pretty cool, I guess.

Julie Corder

Given that it's late, and I've spent the entire day studying for a CS23 exam to the point of my head spinning and such, I can't promise that this will make any sense or that my questions won't have really obvious answers, but...

In general, I found the Gasser and Smith article to be really interesting -- I like the fact that they're looking for an alternative to the idea of an innate distinction between nouns and adjectives, and that they're able to come up with an alternative explanation. On their first experiment, they said that the performance on adjectives never reached the level of nouns. Obviously, this does not seem like an accurate model of language-acquisition in children, who do not continue to make intra-category mistakes on adjectives for their entire lives. In a footnote, they say that "An initial difference in learning but ultimately equal and near-perfect learning of both nouns and adjectives is achieved with larger hidden layers" but no details were given; I would be interested in knowing more about this network -- how much larger the hidden layer needed to be, etc...

Maybe my biggest point of confusion with this article related to the assertion that nouns identified a smaller category of objects. I don't understand, for example, how "dog" is more specific than "wet." Obviously, "wet" can apply to any number of nouns -- dog, cat, towel, etc -- but "dog" can also apply to a very wide range of objects that correspond to many different adjectives -- "wet," "big," "dry," "small," etc. Why are the nouns more constraining than the adjectives?

The final problem I had was with the conclusions drawn from Experiment 6. They used two "question" nodes to correspond to the two categories that they wanted the network to form for the inputs. They then acted like it was surprising that the network would associate distinct outputs with these two input nodes -- isn't that what a neural net will, by definition, do? Children aren't always faced with exactly the same question when they're supposed to respond with a noun, any more than they're faced with exactly the same question every time they're supposed to respond with a dimensional adjective. The association seems somehow flawed to me... like it really over-simplifies the issue or something...

Or maybe I've just spent too much time reading text books and my mind isn't working quite right? ;)

Dan Fairchild

(Gasser and Smith p. 271 etc.) They talk a lot about how common nouns are acquired earlier than adjectives because they represent a more compact region of semantic hyperspace. So I wonder if any studies have been done with regard to children's acquisition of proper nouns, which occupy a considerably smaller region of the hyperspace than common nouns do.

(G & S p. 277) I found the statement that "learners do not just map objects to words but they also map linguistic inputs to linguistic outputs" very interesting. The fact that some linguistic processing can be done without making use of semantics is pretty cool.

(G & S p. 281) I definitely liked the new learning rule they came up with, with its more accurate version of the way parents teach their children. It seems much more plausible.

(G & S p. 295) On their "multidimensional checkerboard": cool, a tesseract tiling! <random tangent>I now have a strong urge to write a 4-D tic-tac-toe program. The branching factor is worse than chess, but better than go, and there are 920 possible three-in-a-rows, so I'm not too sure if I'll be able to get it to run at a reasonable speed.</random tangent>

I didn't really have anything to say about the Plunkett and Marchman; it's cool that they managed to construct a network that reproduced the way that children acquire the past tense, but other than that, all they did in the article was to successfully show that their results were valid.

Once again, not really much to say with regard to Chapter 9, except that I like the structure of the network used for the lexical development.

Andrew Stout

I need a reminder about how (if) a NN implements associative memory, and memory after one exposure. I remember talking about this briefly at some point, but I don't clearly remember the outcome of the discussion. Is there a simple connectionist architecture that can mimic associative memory? People often learn a word after just one exposure -- has anyone been able to get a NN to do that?

McLeod et al., p. 186: These graphs don't look similar at all! Are these supposed to be convincing results, suggesting a similarity between overregularization errors? Granted, it is difficult to compare a simulation with the real world -- Adam's percentage of regular verbs is irregulatable and uncontrollable -- but just the same, this comparison does not do a good job of supporting the claim. An average of many children's performance might have been a better choice.

What exactly is an epoch? I glean that it's some measure of the number of training runs on a network -- is there a more precise definition?

p. 199-200: How is a cluster diagram arrived at? What does the "state space" look like? I understand what the diagram is analyzing and what it means, but I'm still not entirely sure how we got to the diagram from the network. Also, I've seen posters of cluster analyses in the Sun Lab -- who's doing them? What for?

Jeff Wu

The Modelling text provides a very good reason why a connectionist model approach to language acquisition is more useful, in that it is more similar to reality than the other possibilities mentioned. It is very interesting that the model accounts for the overextension and underextension errors occurring at specific stages in language acquisition through the comprehension-production asymmetry observation. However, it is very hard to let go of the Universal Grammar approach in favor of simple recurrent networks. Although SRNs seem to model human development of understanding syntax well, the Universal Grammar provides an easy way to explain how humans think when constructing sentences. UG accounts for things such as ambiguity, and seems to provide the infant with a plausible way of making sentences, whereas SRNs only allow the child to extract from available sequences. How do SRNs account for the fact that children can move the correct auxiliary to the front of a complex sentence, if they have never seen that sequence of words before? The UG, in its hierarchical form, makes this problem easy for kids, whereas I can't see how SRNs would perform this task. Cluster analysis doesn't account for movement, or does it?

The fact that currently SRNs must be trained on well-formed grammatical sentences also wouldn't explain cases where children of pidgin speakers, whose language displays no clear grammar, can form a grammatical creole language by themselves. UG, on the other hand, does.

The Gasser reading provides a good account for why nouns are learned faster than adjectives. My only question is, what occurs when children learn abstract nouns and proper nouns such as "idea" or "grandma"?

Nori Heikkinen

While this week's reading was interesting, I don't have very many comments on it. The Gasser and Smith article started out comprehensible and interesting, but the further it got into the specifics and exact fractions of the experiment, the less sense it made. On page 285, they mention that the network didn't learn that WET and DRY were opposites ("attributes of one kind") and that ROUGH and SMOOTH were another ... I didn't quite understand this. With the setup they have, wouldn't it be easy to tell that with WET and DRY, all properties were the same (all nodes switched to the same position, whatever) except one? And the fact that there was a pair of them, one with node X on and one with it off, would seem like a really easy way to distinguish that they were "attributes of one kind."

The article on Michael Brent's new work with word segmentation evidence is interesting, asserting that babies actually pick up a lot more from isolated, single-word utterances than they do from segmenting long streams of speech. How big of a question is this, however? In the end, mustn't it come down to a little of both? Babies will probably never hear every word they'll ever know in isolation--the 9% figure they quoted was high for words in isolation, but low total--and yet they learn thousands and thousands of words. So segmentation soon kicks in ... both sides of this argument are so easy to take that it makes me think that there's some kind of middle ground that's just being overlooked.

Jeff Ebert

Gasser and Smith

Initially, I scoffed at the authors' treatment of adjective "categories," mainly because it clashed with introspection into my own adjective concepts. Their exemplar-based view would mean that I learn a new adjectival concept such as RED by computation on instances of objects that have been labeled as red (apples, blood, fire engines, etc.). This is unnecessary, since anyone could simply point to an apple, tell me that its color is red, and thereby cause me to know what red is.

But, of course, the infant who does not understand language has a much harder time than I do when learning new adjectives. Gasser and Smith's explanation for the primacy of nouns over adjectives in early word learning thus seems plausible for an infant who has little more than statistical inference to go by. An example might clarify.

When a parent addresses a child with something like "This is a BIG dog," it is an intractable problem for the child to pick out the relevant feature of the object to which the adjective refers. In this case, BIG might mean shaggy or friendly or willing-to-fetch-slippers-from-under-the-bed. It is only after BIG has been used in other contexts ("This is a BIG chair") that a child might discover that the relevant dimension is size. In the meantime, it is possible that the child will use BIG to describe the nice lady who lives next door, much to the lady's chagrin.

Contrast this to noun categories. Because there is a high correlation between the many features of one category instance and the features of the rest, it makes sense that children are usually correct when they extend a noun term to new instances.

Nevertheless, I cannot reconcile the authors' model with some of my intuitions about adjective learning. They claim that adjective categories are hard to learn because members have so little in common besides the one adjectival dimension. I would argue that for adjectives, the less instances have in common, the easier the adjective is to learn. This is because a child will have to encounter fewer instances before eliminating all but one of the candidate meanings for the adjective (in the dog example above, BIG could mean three things, but after encountering a BIG chair -- an object quite unlike the dog except in size -- the child is left with only one possible meaning for BIG).
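The elimination argument can be made concrete with a small sketch. This is only an illustration of my intuition, not Gasser and Smith's network: it assumes each object is a set of discrete features, that the adjective names exactly one of them, and that the child keeps only the features shared by every object labeled so far. The feature names and the function `candidate_meanings` are made up for the example.

```python
# Hypothetical feature sets; the names are illustrative and do not come
# from Gasser and Smith's input representation.
def candidate_meanings(labeled_instances):
    """Return the features shared by every object labeled with the adjective."""
    candidates = None
    for features in labeled_instances:
        if candidates is None:
            candidates = set(features)      # first exposure: everything is possible
        else:
            candidates &= set(features)     # later exposures: keep shared features only
    return candidates if candidates is not None else set()


big_dog = {"large", "shaggy", "friendly"}
big_chair = {"large", "wooden", "four-legged"}               # unlike the dog except in size
big_wolfhound = {"large", "shaggy", "friendly", "four-legged"}  # very much like the dog

after_dog = candidate_meanings([big_dog])
after_dissimilar = candidate_meanings([big_dog, big_chair])
after_similar = candidate_meanings([big_dog, big_wolfhound])

print(len(after_dog), len(after_dissimilar), len(after_similar))
```

Under these assumptions, a second instance that shares little with the first (the chair) removes more candidate meanings than one that shares a lot (another shaggy, friendly dog), which is the point of the paragraph above.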

George Gibbard

I was fairly convinced by the overall thrust of the Gasser and Smith article, although there were a few things that really ought to have been made clearer. For example, I wish there could be some statistical estimate of the scale of 'the phenomenon' in human children which they aim to reproduce in their model. Of course there may be no easy way of doing this. The rest of my comments kind of form a thread:

They recognize that their 'syntactic categories' of noun and adjective are distinct from 'the conventional ones': 'because they are directly associated with objects and their properties, they have a semantic force.' In fact, they are not really syntactic categories at all, but rather semantic categories which the authors believe actually form the basis for the acquisition of syntactic categories. This is the opposite of Pinker's approach when he tears down conventional notions of nouns as 'people, places, things,' verbs as 'actions,' and so on. Nonsense, Pinker says, actions can be nouns (like 'action') and so on. The argument being made by Gasser and Smith is: while Pinker is right, nouns are still prototypically things. I think this has some merit.

After all, while actions sometimes work syntactically as nouns, words for things always do. On the other hand, this may not be saying much. It may just be a kind of tautology. It's not entirely clear to me what the status of our normal terms for 'parts of speech' is, and there are definitely cases linguists don't know how to classify. (Otherwise they'd have much less to argue about.) For example, what is 'eating' in 'eating people is wrong' or in 'steak is better eating'? Both work like nouns in their relationship to the higher-level structure of the sentence. But in terms of their relationship to the words inside the phrase they head, the first 'eating' works like a verb and the second one like a noun. And semantically, all eating is clearly an action. What's going on? Is this a serious problem in categories? I think the answer is really no, it only becomes a problem if we decide our theory should make it one. Rather the point is that our syntax allows us to treat lots of different things like nouns, but of these, the 'nouns' can't behave in the other ways that the 'verbs' can.

Despite my doubt and confusion about categories in syntax, still, having spent a good amount of my time as a ling major trying to figure these things out, I was a little uncomfortable with Gasser and Smith's ignoring syntax proper (they do recognize that children must learn syntax proper, but propose that this comes later than acquiring the categories syntax uses, which they think are learned through semantic characteristics of different categories). So I sat down and tried to come up with an explanation of what exactly an adjective is syntactically. I am aware that in some languages, adjectives are not really a very distinct category from nouns, as in ancient Indo-European languages, while in other languages, adjectives are not terribly distinct from verbs, as in Chinese. Hmm. I came up with a lot of stuff it may not be worth saying here, but most simply:

Adjectives' functions are divisible into two groups, predicate and modifying. Adjectives express a function, having as their principal external argument either that argument they are predicated of, or that argument they modify. But: nouns and verbs can both also be used as modifiers or predicates. And verbs express functions like adjectives. In English, they are different from verbs in the nitty gritty syntax of how predication and modification work. But we can see why adjectives can be verbs in some languages. On the other hand, there are some further (prototypical, not definitive) differences: verbs can require multiple arguments, whereas adjectives like intransitive verbs typically take one, any other requiring more complex syntax involving prepositional phrases.

But while NPs in syntax typically express arguments of which functions are predicated, we can also say (and I think logicians do) that nouns themselves express functions. It is not most common nouns, but rather determined noun phrases (DNPs) that denote entities with a specific reference, which they get from context or deixis; we expect a speaker to provide enough information to determine the NP well enough so that the listener knows which precise entity the speaker intends as an argument. Only a few types of nouns automatically have unique reference: proper nouns and generic nouns (e.g. 'rice' in 'I like rice' must always have the same referent); while pronouns take on a specific reference according to context ('I' means different entities when used by different speakers). We are left with ordinary nouns applying to a set of referents, expressing certain properties of those referents, just like adjectives express certain properties of their referents (although as Gasser and Smith point out, generally simpler and more 'one-dimensional' properties). For these reasons adjectives and nouns can be treated similarly in some languages: in Latin 'bona,' neuter plural form of the word for 'good,' behaves as a noun meaning 'good things,' and modification is basically a matter of an adjective acting like a second, co-referential noun 'in apposition.'

The different categories have behaviors in common both semantically and syntactically. The differences tend to be which operations the syntax of the language makes most basic and convenient for each category: thus in English an adjective modifies a noun by simply being placed in front of a noun, while a verb to do this must be relativized or made into a participle. To get a DNP indexing a present, known object that fits into a given noun category, we need only say 'the X.' If we refer to the entity by an adjective category it fits into, we need to say 'the Y one.' Given that language evolves and reproduces itself according to which of its vocabulary and constructions speakers find useful, the distribution of vocabulary into syntactic categories must at least crudely reflect which words it is most convenient to use in which syntactic structures. So I think in the end, we really are thrown back onto the prototypes Gasser and Smith discuss as the foundations for syntactic categories. As for outlying members of their categories -- nouns that address only a narrow dimension of their referent's properties, like 'memorabilia,' or adjectives that refer to a complex of specific properties, like 'Kafkaesque' -- I think it's plausible that these do not interfere with Gasser and Smith's theory, as they're neither things kids are likely to hear a lot nor things kids would be liable to use correctly; and on the other hand there's no reason the prototype model of category acquisition should interfere with adults' ability to use these non-prototypical terms.