CS/PSYCH 129 Week 9 Reactions

Julie Corder

I think this may have been addressed somewhat at the end of the article, but I'm really not sure, since it ended up confusing me beyond belief. The thing that really bothered me about the first experiment by Hahn and Nakisa was the fact that all of the words were "novel" but the system was still expected to come up with the single "right" response. It seems like the varying answers by the native speakers in their other study suggest that there isn't necessarily one predictable "right" plural form, and it seems unreasonable to expect the models to find one. Further, that does not seem to model in any way the task faced by human speakers on a daily basis: most of the words we pluralize are words we HAVE heard before, not completely new constructs that we must try to fit to a generalization of existing words. I'm not really sure what testing the models on this task says about their ability to represent the pluralization tasks that ARE done by humans, or about their ability to produce results similar to those of a person trying to pluralize novel words.

The other thing I was bothered by was their decision to use right-justified phonemic representations and then to pad the inputs with zeros. While right-justifying may help the models to recognize similarities in word-endings, it doesn't seem to make generalizations among variant-length words possible. I think we talked about this in class before break -- but words that act like the English "sing" and "sang" but are of different lengths could have the CVC feature that "marks" the word for a specific irregular form fall on different input nodes, which would keep a model from forming a generalization. I know I'm not explaining this well, but I'm not sure how to say it more clearly, so I'll assume that everyone remembers discussing it and so has at least SOME idea what I'm talking about :)
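
To make the alignment worry concrete, here is a minimal sketch of right-justified, zero-padded input. It is an illustration only: letters stand in for phonemes, the width of 6 slots is arbitrary, and `right_justify` and `slots_of` are made-up helpers, not Hahn and Nakisa's actual encoding.

```python
PAD = "0"  # stand-in for an all-zero phoneme slot (assumption)

def right_justify(segments, width=6):
    """Pad a list of segments on the left so the word ends in the last slot."""
    if len(segments) > width:
        raise ValueError("word is longer than the input layer")
    return [PAD] * (width - len(segments)) + list(segments)

def slots_of(segments, target, width=6):
    """Return the input-slot indices where `target` lands after padding."""
    padded = right_justify(segments, width)
    return [i for i, seg in enumerate(padded) if seg == target]

# Letters stand in for phonemes; "ng" is treated as a single segment.
sing = ["s", "i", "ng"]
drink = ["d", "r", "i", "ng", "k"]

print(right_justify(sing))
print(right_justify(drink))
# The vowel that changes in sing/sang and drink/drank lands in different slots.
print(slots_of(sing, "i"), slots_of(drink, "i"))
```

Under these assumptions the two words share their last slot, but the vowel that alternates sits one slot apart, so a unit tuned to "i in slot 4" would see it in "sing" and miss it in "drink".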

Nori Heikkinen

a general question about all this type frequency stuff: no matter what language we're speaking, in general, a language's Motherese will contain mostly very common verbs. As the most common verbs in most languages are also the most irregular, does any child's primary linguistic input consist of a majority of regular verbs? ... like in Marcus et al., p. 213, the citation at the bottom--referring to "the distribution of regular and irregular past tenses in the language"--is very different in Motherese than in the actual lexicon of the language.

Marcus et al.
the bit about pattern associators and ablauting verbs (or other common types of irregulars) on pp. 194-5 is somewhat confusing. Given the new verb "spling," I imagine most people here would inflect it "spling-splang-splung," given its hypersimilarity to ring-rang-rung, sing-sang-sung, &c. Marcus et al. say in the middle of p. 195, "there is a broad consensus that pattern associator memories have a role in irregular inflection." But how can we operate on both a pattern-associator and default case system at the same time, as the authors seem to be arguing? By the default case, "spling" should become "splinged," NOT "-splang-splung." I'm confused.

bottom of page 200: "no word in English has the [phonological] sequence found in oink" -- not true! yoink, boink ...

footnote, p. 204 -- Marcus et al. refer here to "misanalysis by some speakers" (and "assum[ing] that the derived verb is directly based on the root"). It's kind of ironic that they, especially with Pinker as a co-author, should label this formation MIS-analysis--isn't this just like what the prescriptive grammarians that Pinker so lambasted in The Language Instinct do? In describing what they believe actually to be happening via symbolic rules, Marcus et al. are here excluding some of their native speakers--those whom a linguist is taught to hold up as their chief authority on a language. Shouldn't both "wolfs" and "wolves" be acceptable in the sentence "With a couple of quick wolfs/wolves, Arnold consumed his entire lunch"? (Or is this indeed just a case of not-so-accidental homophony "fooling" speakers?) (Also referred to on page 207: "18. Speech errors." Hmmmmm ...)

p. 219, in the study of German children, which they use to draw conclusions: the parenthetical phrase "(19 [of 22 children] were language-impaired, but they behaved similarly to the unimpaired children)" -- um, how do you know they behaved similarly, and isn't that kind of a huge conclusion to draw from a study based on 86% language-impaired children? just a thought ...

Hahn and Nakisa:
General question: if we accept Hahn and Nakisa's idea here that there doesn't have to be any symbolic manipulation going on linguistically, does that just destroy all of classical linguistics? this is weird.

(phonological query: p. 321 - they say the umlauts a:, o:, and u: were "treated as one, as is standard in the literature" --why??)

p. 323, describing what potentially-salient information was left out of that which was fed into the model--semantics--isn't that a rather big omission? but no one includes it, i guess, so the exclusion of semantics here isn't going to make a difference in comparing their results against any other model (which will have also left it out). But then, on page 342, H&N question whether it's even a relevant concern!--"would inclusion of semantics ... change the overall pattern of results?" and then imply that it wouldn't. Is this just more destruction of classical linguistics? help!

Jeff Ebert

Marcus et al

When adults learn, for example, English as a foreign language, I presume they are explicitly taught inflection in terms of rules and exceptions. Would a connectionist argue for a subsymbolic, non-rule-based account of how these individuals inflect English words? This would be the converse case of a symbolist arguing that from patterns in the linguistic input a child induces formal grammatical rules.

A related question: Does anyone know about the systematic errors made when non-native English speakers inflect nouns and verbs? Such errors might be relevant to the present debate.

p. 220
The authors discuss type vs. token frequency, citing that in English, regular verbs constitute a majority of types but a minority of tokens. I would argue that once a particular word is learned, its frequency should play little role in learning patterns of inflection. If a person is exposed to the same purple apple 100 times, he or she will still learn that apples are generally red after being exposed to 50 other apples that are red. Type frequency - though not perfect - should guide the selection of training sets.
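
As a rough illustration of the type/token distinction, the sketch below counts both on a toy sample. The verbs and counts are invented for the example and are not data from either article; `frequency_shares` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical toy sample of past-tense tokens, each tagged as regular or irregular.
tokens = (
    [("went", "irregular")] * 40
    + [("said", "irregular")] * 35
    + [("walked", "regular")] * 5
    + [("jumped", "regular")] * 4
    + [("played", "regular")] * 3
)

def frequency_shares(tokens):
    """Return (type_share, token_share) per class as fractions of the total."""
    token_counts = Counter(kind for _, kind in tokens)
    # A type is a distinct word, so duplicates are collapsed before counting.
    type_counts = Counter(kind for _, kind in set(tokens))
    n_tokens = sum(token_counts.values())
    n_types = sum(type_counts.values())
    token_share = {k: v / n_tokens for k, v in token_counts.items()}
    type_share = {k: v / n_types for k, v in type_counts.items()}
    return type_share, token_share

type_share, token_share = frequency_shares(tokens)
print("regular share of types:", type_share["regular"])
print("regular share of tokens:", round(token_share["regular"], 3))
```

In this made-up sample regulars are 3 of 5 types but only 12 of 87 tokens, which is the kind of mismatch the passage on p. 220 describes. A training set sampled by type would show the network the first proportion, and one sampled by token the second.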

Hahn and Nakisa

Interestingly, the authors choose to implement their single-route network with a fixed-alternative output layer - rather than having it output actual inflected lexemes. This implementation presumes that once a manner of inflection is chosen by the network, the inflection can be carried out. But, the authors leave this next step up to the imagination of their readers, and so we fall back on the rule-based, symbolist account of how such an operation can take place.
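
The two steps can be separated in a small sketch. This is only an illustration of the argument, not the authors' network: the class list, the activation values, and the helpers `choose_class` and `apply_class` are all assumptions, and umlauting classes are left out for simplicity.

```python
# Hypothetical plural classes; a fixed-alternative output layer would have
# one output unit per class rather than units for the phonemes of the plural.
PLURAL_CLASSES = ["-s", "-e", "-er", "-en"]

def choose_class(scores):
    """Step 1 (what the network is asked to do): pick the highest-scoring class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return PLURAL_CLASSES[best]

def apply_class(stem, label):
    """Step 2 (left unmodeled): a symbolic operation that builds the plural form."""
    return stem + label.lstrip("-")

scores = [0.7, 0.1, 0.1, 0.1]  # made-up output activations
label = choose_class(scores)
plural = apply_class("Auto", label)
print(label, plural)
```

Note that `apply_class` is an explicit rule over a variable (any stem plus a suffix), which is exactly the kind of operation the network itself is not shown to perform.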

Jeff Wu

Both articles for the week are very convincing in their respective arguments. Marcus et al provide much evidence for a symbolic rule component to language while Nakisa and Hahn provide the experimental data that disprove the concept. However, the evidence gathered by Marcus et al from English is overwhelming in itself. The fact that people can generalize past tense to new verbs supports the claim. However, the most interesting backing uppers are the homophones, onomatopoeias, and the headlessness argument. It's very obvious that the single-route models cannot account for words in these categories. The article also gives the arguments found in Pinker's Language Instinct for a universal grammar which I have found very persuasive from the first read.

However, the evidence from German seems a little sketchy to me. First off, it doesn't seem logical for a language to use a symbolic rule in terms of a minority default. Marcus et al claim that -s would be the default plural, even though it occurs in less than 7 percent of the total number of nouns. Why would any human refer to a default that was only usable 7% of the time? Secondly, all the evidence provided for why -s should be the default is very thin. They seem to have randomly found -s as a plausible default and then run with it with as much proof as they could gather.

Nakisa and Hahn show that the single route model is actually superior to the dual route model. This is not all too surprising, since it seems that they set out to disprove the dual route model in the first place. They set up a test base that seems biased against the dual route model. For instance, they discarded over 5,000 nouns simply because they were in categories with a type frequency less than .1 percent. The dual route model would handle this data well, and maybe even better than the single route model, and I think the results might differ a bit if they were left in. Also, Nakisa and Hahn themselves state that the dual route model might be superior to the single route model if features such as semantics and token frequency were included. Anyway, the results don't show that one model is dramatically better than the other. This does, however, lead to the question of why we would have a more complicated dual route system in our brain if a single route works just fine.

Sean Lewis

Marcus et al make a pretty convincing argument in favor of default rule application of morphological features. But I find it very unsatisfying that they criticize computational efforts, yet provide no model of their own. I suppose it is important for Marcus to point out errors in approach and implausibilities, but the lack of implementation for his model makes his observations sort of useless. If a dual-route model is needed, then it must be implemented to show its computational efficacy.

Since there is no model with which to argue, Hahn and Nakisa coerce dual-route models out of single-route models. I had a couple of problems with their approach, and consequently with their results, however.

The rule-based account proposed by Marcus et al depends upon 100% accurate memory retrieval. Thus, if one of the networks that Hahn and Nakisa use only weakly predicts a certain structure, that doesn't mean the rule should be applied. The reason is that even though the memory of the network failed, it doesn't mean it should have.

The dual-route model would only really be more effective if irregular forms were recognized with 100% accuracy. Then, while regular forms in a single-route model would be subject to error, they would not be in the dual-route model. The Hahn and Nakisa experiments cannot capture this fact, and that is a limitation of their approach (which inherently limits the accuracy of the dual approach). Of course, if Marcus et al had provided their own model, their claims would not have been judged based on a limited model created by detractors.
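
A toy version of the point about memory accuracy, using the English past tense for simplicity. This is a sketch under my own assumptions, not Hahn and Nakisa's implementation: the threshold, the confidence values, and the names `dual_route` and `regular_rule` are all hypothetical.

```python
def regular_rule(stem):
    """Default route: add the regular suffix."""
    return stem + "ed"

def dual_route(stem, memory, threshold=0.5):
    """Use the stored irregular form if memory is confident enough, else the rule.

    `memory` maps a stem to (form, confidence); the confidence stands in for
    the strength of an associative network's response (an assumption).
    """
    form, confidence = memory.get(stem, (None, 0.0))
    if form is not None and confidence >= threshold:
        return form
    return regular_rule(stem)

perfect_memory = {"sing": ("sang", 1.0)}
weak_memory = {"sing": ("sang", 0.3)}  # the associator only weakly predicts "sang"

print(dual_route("sing", perfect_memory))  # irregular retrieved from memory
print(dual_route("sing", weak_memory))     # retrieval fails, so the rule fires
print(dual_route("walk", perfect_memory))  # nothing stored, so the rule fires
```

With perfect retrieval the rule only ever applies to verbs that have no stored form. With a weak associator standing in for memory, the rule also fires on "sing", and that error comes from the memory component rather than from the rule.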

Hahn and Nakisa's critique of Marcus et al's assumptions, based on the latter's data, was quite reasonable, however.

While Marcus et al's solution would seem less prone to error on a subset of the data, Hahn and Nakisa's implementation of the solution does not capture this quality, making their critique rather vacuous. But their analysis of Marcus et al's data may indicate that the rules which appear to be in effect in English are not in fact at work in German.

George Gibbard

In general I found Marcus et al. compelling. I have only gotten through the first set of experiments in Hahn and Nakisa. First of all, I agree that generative linguistics hasn't explained irregular morphology well. However, besides rote memorization and Chomsky's 'minor rules,' the principle of 'analogy' has been well-established in historical linguistics since long before generative linguistics came on the scene. This is what German speakers use to put 50% of loan words into native plural classes, and what English once used to create the past tense 'caught' for the French loan 'catch.' This principle has not been questioned by generative linguistics (possibly apart from Chomsky's 'minor rule theory' -- I'm not sure how he decides which words should come under the operations of minor rules); it just hasn't been analyzed. Instead, all irregularities are treated as 'lexical' and thus taking precedence over the rules that assign regular forms. It's not clear though how, in formal descriptions, 'analogy' would make different lexical entries share irregular features. Hence, it's good that Marcus et al. do adopt the 'pattern associator' view of how irregulars are handled. Both articles agree that they agree on this. Hahn and Nakisa view it as significant that 80% of German plurals can be predicted by the pure pattern associator; and this shows that the German plural system is 'highly structured', not 'genuinely arbitrary.' True -- and this could show that analogy is an important operative principle not just historically but in actual language generation. But Marcus et al. only say that German is 'to varying degrees arbitrary' in this department; in fact they agree with Hahn and Nakisa, accepting the pattern associator model of irregular inflection. I have problems with the claim that 'differing levels of productivity of the [various] irregular forms' is 'a phenomenon inconsistent with a system built around a single unifying symbolic rule.' I would say it's not, and I think Marcus et al. would agree that analogy as processed by a pattern associator could do this, without undermining the more abstract and symbolic default process.

More importantly: Marcus et al.'s strongest claim, as I see it, is that in order to determine a word's inflectional class, a speaker must take into account 'root status and morphological structure.' Hahn and Nakisa classify the arguments they're trying to counter as two: first, that regular classes must be treated as 'default' (hence, not forming 'regions in a similarity space' and opaque to a pattern associator model); and second, that 'default' classes need not be statistically dominant. They ignore the need Marcus et al. point out for 'root status and morphological structure' information; hence the models that they claim represent Marcus et al's views in fact do not do so. Indeed, their models, which use only phonological input, are incapable of taking into account such semantic information. Not that I know how to model this; it just seems like a crucial problem. Basically, Marcus et al. would require a much more formal and symbolic representation of input: one that would be able to determine that an item was not simply the root word, but 'exocentric' ('headless'), and therefore the N involved was not really the phonetically identical root N, i.e. that the Childs are not children. All Hahn and Nakisa's 'dual route models', if intended to represent what Marcus et al. argue for, are straw men.

I also question whether they refute even what they claim to. So far in my reading, they have not shown what they would need to, namely that pure pattern association solves the problem without syntactic/semantic information. They have not shown any ability of pattern associators to assign words to a default class. They have just shown that their version of Marcus et al. (missing what Marcus et al. believe to be crucial and necessary information) also fails to assign the right words to the default class.

Furthermore, Hahn and Nakisa's model itself relies on the use of morphophonemic rules, not really pure pattern association. They expect not actual generation of phonological output, just membership in a class, and a class is defined by a morphophonemic rule. Thus for example 'the umlauts are treated as one.' One what? Traditionally, umlaut is regarded as one (symbolically described) morphophonemic RULE, stateable in different ways -- add a +front feature to the vowel, or, change /a/ to /e/, /o/ to /ø/, /u/ to /y/. You would think pattern association would not like such rules unless it allowed some symbolic representation. Not of course that this is the same kind of rule Hahn and Nakisa are objecting to; they might allow rules to operate on symbolic representations of phonemes or features, just not allow more abstract representations of syntactic categories (like N or V) to be operated on by rules. I don't know if this is actually an important philosophical distinction for them, or just one they don't realize they're making.

Lastly, I think it's ironic that people arguing against the use of rules in the generation of language use such rule-bound, pre-programmed models. Two out of three of their models have their operational algorithm pre-programmed, not something possible for neural nets. I'm not sure how legitimate a complaint this is though, and I think that's about all time allows.