CS/PSYCH 129 Week 10 Reactions

Sean Lewis

Elman's paper is interesting in that it seems to support, within a connectionist framework, some of the things that Pinker might say. I mean this only with regard to syntax, however. I think it was Pinker who said that people figure out the semantic value of words based on where they appear in syntax. Elman's model does just this, and pretty impressively too.

At first glance, the ZOG example seemed kind of silly... since all of the words are completely arbitrary. But when I realized that it was using the new input in the right context, even though it hadn't been trained on it, I thought it was kind of cool. I guess before I thought that he was training the network on a different input in the same context, and by giving it a nonsense label was claiming it could learn new words. But he didn't and that is good.

In terms of Elman's other experiments and in general the power of the recurrent network, I think it is a good way to incorporate context effects into a network. This week's readings showed how the recurrent network effectively handles short-distance relationships. I would be curious to see how this architecture fares, or what changes need to be made to it, in handling long-distance dependencies.
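
To make the context mechanism concrete, here is a minimal sketch of one forward step of a simple recurrent network: the hidden layer sees the current input plus a copy of its own previous activations. The class name, the weight range, the starting context value of 0.5, and the omission of biases and training are all assumptions made for illustration, not details taken from the readings.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_matrix(rows, cols, rng):
    # Small random weights; the range is an arbitrary illustrative choice.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SimpleRecurrentNet:
    """Forward pass only; biases and learning are left out of this sketch."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(n_hidden, n_in, rng)
        self.w_ctx = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_out, n_hidden, rng)
        # Assumed neutral starting value for the context units.
        self.context = [0.5] * n_hidden

    def step(self, x):
        from_input = matvec(self.w_in, x)
        from_context = matvec(self.w_ctx, self.context)
        hidden = [sigmoid(a + b) for a, b in zip(from_input, from_context)]
        output = [sigmoid(z) for z in matvec(self.w_out, hidden)]
        # The context units become a copy of the hidden layer for the next step.
        self.context = list(hidden)
        return output


net = SimpleRecurrentNet(n_in=2, n_hidden=3, n_out=1)
first = net.step([1.0, 0.0])
second = net.step([1.0, 0.0])  # same input, different context
print(first, second)
```

Because the only memory is the one-step copy of the hidden layer, information about an early input has to survive many successive overwrites to influence a late output, which is one way to see why long-distance dependencies are the harder case.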

Julie Corder

I found the textbook reading to be pretty straightforward; the only questions I had were about the Elman model, and those were cleared up when I read the other article. I'm a little bothered by the fact that orthographic input is being used by the model (which includes "silent" letters and the like) but I'm not really convinced that it makes a difference . . I guess if it works for written letters then, in theory, the same process would work with phonemes or whatever representation you wanted to use. The spelling in English is SO different from the pronunciation, though, that that bothers me still . .

In the Elman article, I'm confused as to why they needed to use input nodes for voicing and why they needed separate nodes for "consonant" and "interrupted." They only use this structure for the one experiment (on letter sequences, p. 17) and for all of the letters they use the sounds are voiced; also, the "consonant" node always matches the "interrupted" node. The other thing about that experiment that I wondered about was the decision to use *20* hidden nodes and context nodes. I can't find it right now, but I remember seeing one of their later models that had a similar number of inputs and nowhere near as many hidden nodes. Is there a reason that this task required so many hidden nodes? It didn't seem to be THAT complex . .

Finally, in the experiment where they classify words based on simple sentences, they say that "from the fact that there is a class of items which always precedes 'chase,' 'break,' 'smash,' it infers that the large animals form a class." But I thought their sentences were generated randomly based on the syntactic requirements of the verbs (so a verb got a subject and, if necessary, an object). But unless the nouns were already categorized, why would "The monster chased the girl" be any more likely to have been generated than "The girl chased the monster"?

Andrew Stout

Elman reaction notes: p.6-7 (in my copy) notes that the temporal XOR model applies the XOR rule /all the time/. This doesn't seem that smart to me. Granted, it works, because we only care about the output in cases when the rule applies, but I'd be more impressed with a network that knew when to apply the rule. A slightly more complex task could easily require this sensitivity. Perhaps a hybrid of Jordan's network with "plan" input (McLeod et al 142) and Elman's SRN could accomplish this?
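
A small sketch of the temporal XOR input makes the point about "all the time" easier to see: the stream is built from triples in which the third bit is the XOR of the first two, so only one prediction in three is actually determined by the rule. The function names and the `predictable` flag are hypothetical and are not part of Elman's setup.

```python
import random


def temporal_xor_stream(n_triples, seed=0):
    """Concatenate triples (a, b, a XOR b) into one flat bit stream."""
    rng = random.Random(seed)
    bits = []
    for _ in range(n_triples):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        bits.extend([a, b, a ^ b])
    return bits


def prediction_pairs(bits):
    """Return (input_bit, target_bit, predictable) for next-bit prediction.

    The target is predictable only when it is the third bit of a triple,
    i.e. when its index in the stream is 2 modulo 3.
    """
    pairs = []
    for t in range(len(bits) - 1):
        predictable = (t + 1) % 3 == 2
        pairs.append((bits[t], bits[t + 1], predictable))
    return pairs


stream = temporal_xor_stream(4)
for inp, target, predictable in prediction_pairs(stream):
    print(inp, target, "rule applies" if predictable else "random")
```

A network that "knew when to apply the rule" would need some signal equivalent to the `predictable` flag, which the plain stream does not provide.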

p.8 The graph of RMS error in the letter prediction task is a bit coarse to fully satisfy me. I have several questions pertaining to this graph. How consistent is the pattern of higher error on the second 'i' and second and third 'u'? The error on these vowels seems to be somewhat higher than on first post-consonant vowels. This makes sense, as in these cases the network has to rely more heavily on context since the input does not predict the next letter. This does, however, lessen the strength of Elman's results, though I can't say so with much authority, because no analysis of this pattern was presented. In some cases the error on the last value seems rather significant; not enough, however, to totally discount Elman's findings.
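
The consistency question could be checked by grouping the per-step error by letter position instead of reading it off the graph. The sketch below assumes RMS error is the root of the mean squared difference between predicted and target vectors, which may not match the exact definition in the paper; the position labels and error values in the usage lines are made up for illustration.

```python
import math
from collections import defaultdict


def rms_error(predicted, target):
    """Root mean squared difference between two equal-length vectors."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error_by_label(errors, labels):
    """Average the per-step errors separately for each position label."""
    grouped = defaultdict(list)
    for error, label in zip(errors, labels):
        grouped[label].append(error)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Made-up per-step errors with hypothetical position labels.
errors = [0.9, 0.2, 0.4, 0.8, 0.3, 0.5]
labels = ["consonant", "vowel-1", "vowel-2", "consonant", "vowel-1", "vowel-2"]
print(mean_error_by_label(errors, labels))
```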

p. 24 suggests that the time-varying error signal would be a useful form of feedback to the system. How would this work? Would the 'error context' (i.e. the history of errors on past inputs) somehow be incorporated into the backprop error function? Or would this be implemented external to the network somehow? This seems like an interesting suggestion, but it's not at all clear to me how it could be practically implemented.

George, Jeff, Sean, and I replicated the last experiment in Elman's paper for our midterm project. Our efforts did not yield the clean hierarchical cluster analysis Elman presents in the paper. Our difficulties lead me to suspect that either Elman's presented results are somewhat idealized, or that his generative grammar and sentence templates were very carefully designed to yield such clean results. p. 23 shows the cluster diagram of individual tokens of the words 'boy' and 'girl'. Upon close inspection, I spotted an oversight/simplification in Elman's generative templates which may give us some clue. The figure contains sentences such as "woman eat boy", "man eat girl", and "girl eat woman". Our templates disallowed such cannibalism by only allowing NOUN-HUMs to VERB-CONSUME NOUN-EDIBLEs. This leads me to suspect that Elman's super-categories and semantic co-occurrences were simpler and less idiosyncratic than the ones we used, thereby promoting clean hierarchical representation. This does not necessarily invalidate Elman's findings, but it is important to realize their scope. Elman has shown that a theoretical connectionist network has the capability to learn semantic and syntactic categories, but it would take a really big network to achieve the level of success achieved by the human brain.
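
The difference between the two kinds of template can be sketched as follows. The category names NOUN-HUM, VERB-CONSUME, and NOUN-EDIBLE come from the paragraph above; the word lists, the `PERMISSIVE` template set, and the `generate` function are hypothetical stand-ins, not our actual project code or Elman's grammar.

```python
import random

# Illustrative lexicon; the real word lists were larger.
LEXICON = {
    "NOUN-HUM": ["man", "woman", "boy", "girl"],
    "NOUN-EDIBLE": ["cookie", "bread", "sandwich"],
    "VERB-CONSUME": ["eat"],
}

# Restricted: humans may only consume edible things.
RESTRICTED = [("NOUN-HUM", "VERB-CONSUME", "NOUN-EDIBLE")]

# Permissive (assumed): the object slot may also be filled by a human noun,
# which would produce sentences like "woman eat boy".
PERMISSIVE = RESTRICTED + [("NOUN-HUM", "VERB-CONSUME", "NOUN-HUM")]


def generate(templates, n, seed=0):
    """Pick a template at random, then fill each slot with a random word."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n):
        template = rng.choice(templates)
        sentences.append(tuple(rng.choice(LEXICON[cat]) for cat in template))
    return sentences


for sentence in generate(PERMISSIVE, 5):
    print(" ".join(sentence))
```

Under the permissive set, human nouns appear in both subject and object position after the same verb, so their contexts overlap more; under the restricted set, each noun class occupies a narrower range of slots.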

Daniel Fairchild

I don't have many comments this week; there wasn't much to read, and I've already spent too much time looking at the Elman article. The chapter and the article seemed to cover a lot of the same ground, though the book talked more about the Kare nets, which seem useful for trying to do pre-defined tasks with neural networks.

The thing I noticed most about the Elman article was the annoying lack of detail. The experiments he did were described only very cursorily, which made them rather hard to duplicate.

Jeff Wu

The reading for this week focused strongly on the work of Elman and recurrent networks. Since my group followed his experiment in syntax recognition, I suppose I have the most to say about that aspect of the reading. The most significant aspect of Elman's experiment is that it suggests that an innate sense of universal grammar is not necessary for natural language acquisition. Although the cluster analysis shows that his model clearly obeys the concept of universal grammar by separating words into different noun, verb, and object subcategories, it was not fed symbolic rules in order to do so. This strongly hints that universal grammar is a natural consequence of how human brains work, and that UG more closely resembles a physical law of nature than an innate schema that is applied directly.

However, there are many problems with the experiment. Although it is attempting to prove that natural language acquisition is possible as seen through a simple connectionist model, it doesn't quite capture real-life situations. Babies who are learning grammar never hear every possible sentence repeated six times over. In fact, much of the input they receive may be ungrammatical. The experiment does not capture the poverty of input concept within its framework.

The experiment does show something interesting about the possible way words are "categorized" in our minds. The hierarchical cluster diagram shows that words are grouped together not only by their syntactic properties, but also by their semantic ones. Although the model was not given the lexical meaning or properties of the words, it was still able to abstract the categories through the semantic argument structure. Elman does not touch upon this idea in his paper; however, we feel this is an important note.

I guess another little important note is that the cluster analysis does not provide the fullest and clearest view of what is actually going on in the model. Our group performed principal component (PC) analysis on the hidden node activations, which, in graphical representation, showed many interesting quirks, such as a direction towards animacy and varying closeness of the verbs to the nouns, which I guess we shall discuss later today.
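
For anyone who has not seen it, here is a rough sketch of what a PC analysis of hidden activations involves: center the activation vectors, form their covariance matrix, and find the direction of greatest variance. This version finds only the first component by power iteration in plain Python; it is an illustration under those assumptions, not the code our group ran, and the tiny activation table is invented.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and renormalize."""
    d = len(cov)
    # Uneven start vector, to reduce the chance of starting orthogonal
    # to the leading direction (this is still only a sketch).
    v = [float(i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    return [sum(x * c for x, c in zip(row, direction)) for row in centered]


# Invented hidden activations, one row per word token.
activations = [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8], [0.9, 0.8, 0.1], [0.8, 0.9, 0.2]]
centered = center(activations)
pc1 = first_component(covariance(centered))
print(project(centered, pc1))
```

Plotting each token's projection on the first few components is what lets patterns such as an animacy direction show up that a cluster tree can hide.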