CS/PSYCH 129 Week 12 Reactions

Sean Lewis

I was not particularly interested in the aphasia/phonological error comparison, although I did appreciate Dell et al.'s proposal for why the phonological error model could not handle 'heft lemisphere' as it should.

I was confused by the input of the structural priming model, however. Is every word represented orthogonally? If so, then what exactly is the difference between singular and plural nouns? There would be no relationship between them, as they would be just as distinct from each other as from any other word. I don't see how you can capture number and tense by representing words orthogonally. If they had some sort of morphology included in the input, I could not see exactly how it worked.
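A minimal sketch of the concern, assuming a purely localist (one-hot) input coding. The four-word vocabulary and the helpers `one_hot` and `cosine` are made up for illustration and are not taken from the paper's model. Under that assumption, 'dog' is exactly as dissimilar from 'dogs' as it is from 'walked':

```python
import math

# Hypothetical toy vocabulary; NOT the word list used in the paper.
VOCAB = ["dog", "dogs", "walk", "walked"]

def one_hot(word):
    """Localist vector: 1.0 at the word's own index, 0.0 everywhere else."""
    return [1.0 if w == word else 0.0 for w in VOCAB]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Singular vs. plural of the same noun.
sim_number = cosine(one_hot("dog"), one_hot("dogs"))
# Noun vs. an unrelated past-tense verb.
sim_unrelated = cosine(one_hot("dog"), one_hot("walked"))

print(sim_number, sim_unrelated)
```

Both similarities come out the same, so any relationship between singular and plural forms would have to be learned from the training sentences or supplied by extra input units rather than read off the word codes themselves.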

I found some of the details of Plaut's paper even more confusing. But hey, who's to argue with 98.3% accuracy? For real though, what are log frequency, orthographic neighborhood size, and variance? There are a whole bunch of percentages I just didn't get.

And I'm not sure exactly what fixations are... is it like when you are reading 'bay' and as you read 'b' you output /b/, then /a/ for 'a', but when you see 'y', you realize that you should delete the /a/ and replace it with /ei/ for a phonological output of /bei/?

I'm guessing it is something like this, because it is on fixations that they base the ability of their model to capture length effects. It seems the longer the word, the more chances there are for such re-phonologilisizings(!?#@), as they pointed out with the psychological data concerning larger consonant clusters.

A little more clarity would have been nice.

Henrike Blumenfeld

Some comments about the Plaut paper: I think the discussion is very interesting. But it seems to make some claims that are a little on the strong side. The one about doing away with the regular vs. exception dichotomy on p. 561: yes, his model seems able to make the distinction, but he chose an architecture that at least has different levels. Can't we to some extent look at that as an accumulation of 'different modules'? Not all connectionist architectures can deal with this. We've seen in our past tense simulations that the regular vs. exception distinction definitely is an issue, and even if there are ways to solve this problem, the "apparent dichotomy" still seems to be a useful distinction to make.

I think Plaut's new take on double dissociation as 'graded functional specialization' instead of separate modules is definitely thought provoking. He cites clinical cases where different impairments are to some extent mixed to support it -- I wonder whether there are any cases of one function being clearly injured, while another is left completely intact (and vice versa), and how those could be explained within that model.

Some small things: I'm wondering why he trained the network on different learning rates (see p. 555), and how realistic that is in comparison with people learning how to read. Also, I'm not so clear on how Plaut lesioned the network to simulate the different kinds of dyslexia.

Julie Corder

Overall, I found this week's reading to be really interesting and enjoyable. The article by Dell et al. was particularly interesting to me. I thought they did a really nice job of relating their studies to human performance (e.g. the "tip of the tongue" phenomenon). Mainly, though, I liked the fact that they recognized and acknowledged the limitations of their models (p. 524, their discussion of the limitations of the aphasia model; p. 527, when they explain that the phonological error model can't explain exchanges) and that they used the limitations of one model as the grounds for introducing the next model.

Plaut also did a good job of recognizing/acknowledging the limitations of his models. I thought that his decision to train the network on an additional 50,000 words with a learning rate of .001 was interesting (p. 555) -- he says it's to minimize noise from the original data. This is the first time I've seen anyone do that. It seems to make sense, though; is it a common thing to do?

When he compares the effect of word length on the number of fixations, he compares 4-letter words to 6-letter words and shows that the difference between them is significant. But if you look at the number of non-word-initial letters in the words and the average number of fixations after the first (and, I assume, necessary one -- the beginning of each word), then there is a fixation on 10% of the letters in a 3-letter word, 6.7% of the letters in a 4-letter word, 10% of the letters in a 5-letter word, and 18% of the letters in a 6-letter word. Plaut compares the word length with the best average performance (4 letters) to the one with the worst (6 letters), but doesn't explain why he chooses those two lengths, or why the performance on 3-letter words is the same as the performance on 5-letter words. I'm not sure how much of a difference this actually makes in his argument, but it bothered me... maybe it's just that the performance on 3-letter words should be ignored, and the effect doesn't start to change the performance until the words are longer than that, but...
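The arithmetic behind those percentages can be run backwards as a rough check. This is a sketch that assumes each percentage was computed as (mean fixations after the first) divided by (number of non-initial letters); the rates are the ones quoted above, and the function name is made up for illustration. The implied fixation counts are back-computed here, not values taken from the paper:

```python
# Per-letter refixation rates quoted in the paragraph above (word length -> rate).
rate_per_letter = {3: 0.10, 4: 0.067, 5: 0.10, 6: 0.18}

def implied_extra_fixations(length, rate):
    """Mean fixations beyond the first, assuming rate = extra / (length - 1)."""
    return rate * (length - 1)

# Back-computed values, rounded to two decimals for readability.
implied = {
    length: round(implied_extra_fixations(length, rate), 2)
    for length, rate in rate_per_letter.items()
}

for length in sorted(implied):
    print(f"{length}-letter words: ~{implied[length]} extra fixations per word")
```

Under that assumption, the per-word counts never decrease with length (roughly 0.2, 0.2, 0.4, 0.9), even though the per-letter rate dips at 4 letters. So part of the oddness may come from normalizing by the number of letters rather than from the model's behavior.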

Nori Heikkinen

Dell, Chang, & Griffin
i found parts of this article really interesting. intriguing was the assertion on page 519 that "as the comprehensibility of telegraphic speech shows, some structural cues are largely unnecessary for understanding." while obvious, this is kind of counterintuitive, and gave me pause. i demonstrate this every day, of course--tonight i handed my roommate the pot for the rice cooker while washing dishes and told her, "cooker -- put -- rice -- you", and she understood me perfectly that i wanted her to put the pot in the rice cooker. all we've been reading about in most models, however, seems to have implied that there's a huge syntactic component that's just being left out by these simple networks that only encode phonetic information. this suggests that sure, the semantic element is huge, but the syntactic element is perhaps not as large a component as we (at least *i*) perhaps thought.

i really liked the idea of lemmas, especially the "tip-of-the-tongue" phenomenon. we'd encountered lemmas before in linguistics (semantics class), but they never really made much sense until now. it's so cool that people can produce everything about a word, even its grammatical gender (p. 520)--i find myself doing this all the time, looking for an author's name or something that has X letters and starts with a G, with an R somewhere in the middle, or something. it's such a common phenomenon, and i'd never really considered the fact that that suggests there are two levels of encoding going on from word to meaning--phonetic and the cool lemma thing.
on the opposite hand, is there some kind of aphasia (or common phenomenon like tip-of-the-tongue in reverse) in which people can produce the phonological information but not the lemma? or is that kind of ridiculous and nonsensical? ...

p. 524, the "three limitations in this aphasia model" -- not sequential! is there NO good way around this? [okay, reading on, i see that some other model ... can't find it now ... does read in things sequentially. maybe it was in the Plaut.]

in the discussion of the "Structural Priming Model" (p. 532), the authors say that "the central claim ... is that structural priming is a form of implicit learning." i can't tell if this is just a really neat representation of what goes on, or if it's a gross oversimplification. "Central claims" often aren't entirely correct, and often end up making the whole model beg the question ... if priming is actually a form of learning, wouldn't the network never learn to tell the difference between intransitive-locative sentences and just plain passives ("The 747 was landing by the control tower" as opposed to "The 747 was landed by the control tower")?

Plaut
I don't have much to comment on in this article, except for one question. on page 554, the author says that "position information for internal letters is assumed to be somewhat inaccurate, ... so that the same letter units at neighboring internal positions are activated slightly (to .3)." Later down on the same page, the author says that "the correct phoneme had to be activated above .7 and all others had to be below .3" -- am i reading this wrong, or would all others never be below .3 if surrounding internal letters were activated to .3? okay, bedtime.

Jeff Wu

The Plaut reading for this week seems like a powerful and useful extension to the Elman (1990) work with phonemes. The fact that this work incorporated the issues of dyslexia through the interaction between orthography, phonology, and semantics, as well as working with a continuum of consistency, seems like a very plausible approach to modelling human language acquisition. The impact of the peripheral damage on the model also produced the desired results, which also lends strength to this work.

What seems a bit undesirable is the fact that the model works purely in order to generate correct pronunciations on a phoneme-by-phoneme basis. Since Plaut states earlier that there is interaction between semantics and orthography as well, it would be nice to see the model pick up information about the lexical items it encounters along the way, and use that knowledge for faster recall of the pronunciation in the future. I couldn't find any information in the results and discussion section of how they incorporated semantics into the model. However, they mention it again in the discussion section.

Nonetheless, I feel that I cannot continue any further with my reaction. The article has made me aware of my possible surface dyslexia and I am not handling this new insight very well.

Jeff Ebert

Dell, Chang, and Griffin

The authors mention that Dell et al. (1997) predicted that aphasic naming errors would fall between two extremes: the normal pattern produced by nonaphasics, and a random pattern. Such a result, they say, would support the continuity thesis, meaning that "normal speech errors and aphasic paraphasias reflect the same processes" (p. 522).

First, I do not see this as much of a prediction. The data for patients with even the worst cases of aphasia are nowhere near the random extreme, and, obviously, patients are, by definition, going to perform worse than normal controls. Second, it does not follow from this highly unconstraining prediction that errors for normals and aphasics reflect the same processes.

It appears that the aphasic model described in the first section of the paper consistently made more semantic errors than both normal controls and aphasic patients. From tables 1 and 2, it seems that the model makes twice as many semantic errors as humans. Given that Dell et al. (1997) handpicked all the parameters and weights of the aphasic model so that performance would match that of controls, it would be extremely interesting to find out that the model could not be made to match semantic error rates in humans. Of course, such a mismatch could be due to something as innocent as their having chosen semantic representations that overlap too much, leading to a higher number of semantically-related errors. However, there is also the possibility that the model is inherently flawed in such a way that semantic errors will always be overproduced.

Daniel Fairchild

On Dell, Chang, and Griffin:

On page 518, they mention that the sounds of words seem to be retrieved sequentially, which seems quite counter-intuitive to me, and therefore interesting.

Later, they talk about adjusting the connection weight parameter (p) and the decay parameter (q). I'm not quite clear on what these are. Is p some sort of general coefficient, or a weight limit, or what? And it seems obvious what q does, but how does it work?
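One way to picture the two parameters is the linear spreading-activation update usually used in this family of models. This is a sketch under the assumption that p is a single weight shared by every connection and q is the fraction of a unit's activation lost on each time step. The two-unit network, the parameter values, and the `step` function are invented for illustration, and noise is left out:

```python
def step(activations, connections, p, q):
    """One time step of linear spreading activation (noise omitted).

    Each unit keeps (1 - q) of its previous activation and receives
    p times the previous activation of every unit connected to it.
    """
    updated = {}
    for unit, act in activations.items():
        incoming = sum(activations[src] for src in connections.get(unit, []))
        updated[unit] = act * (1 - q) + p * incoming
    return updated

# Hypothetical toy network: one semantic feature linked both ways to one word.
connections = {"word": ["feature"], "feature": ["word"]}
start = {"feature": 1.0, "word": 0.0}

# Illustrative parameter values only.
normal = step(start, connections, p=0.1, q=0.5)
weak_weights = step(start, connections, p=0.05, q=0.5)   # lower p
fast_decay = step(start, connections, p=0.1, q=0.8)      # higher q

print(normal, weak_weights, fast_decay)
```

On this reading, lowering p means less activation reaches the word unit on each step, while raising q means whatever activation a unit already has drains away faster.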

At one point on p. 527, they say, "The percentage of errors that were phonotactically legal in the model ranged between 87 and 100 percent." Are they referring to different runs, or is there some other reason why there would be a range for something like this?

Also on the same page, the discussion of the production of Spoonerisms like "heft lemisphere" talks about a phoneme being inhibited after it has already been used in the first word in the phrase. So, does this mean that alliteration would be inhibited? Of course, in an alliterative phrase, there wouldn't be any other phoneme with a high activation ready to take the place of the first phoneme in the second word, but there still might be some sort of latency effect.

Finally, about the graphs on pages 537-538: in both cases, the model seemed to experience no effects from the number of intervening sentences, which seems odd. I would think that the intervening neutral sentences would tend to diminish the effect of the priming sentence somewhat. The other thing I noticed was the human data for datives: there actually seems to be a greater effect from the priming sentence after 10 intervening sentences, which seems rather less plausible, and would lead me to suspect the data somewhat without some sort of justification for it.

On Plaut (is that something like ablaut or umlaut?):

p. 546: Heh. "poorly structured orthographic and phonological representations ... based on context-sensitive triples of letters or phonemic features." Down with Wickelphones and Wickelfeatures!

The network that he used seemed very specialized, designed just for that task, and with a lot of preset parameters. This seems like it's making some overassumptions about both the brain's structure and its preconditioning towards reading (generally assumed to be nonexistent, I think).

I also had a bit of a problem with his graphemes. Many of them consist of more than one letter, like "ay", and there doesn't seem to be any explanation of how these were selected. They seem to have been just fed into the model in a preconnected state.

Finally, on pages 558-9, I don't quite understand the differences between the different types of nonwords. If a nonword is derived from an exception word, doesn't that mean that it also has to resemble a regular word, since otherwise the exception word wouldn't be an exception? And what were the controls? If the test they were using for correct pronunciation was resemblance to some word in the corpus, doesn't that mean that each of the nonwords would have to resemble some word in the corpus orthographically? And if so, wouldn't each one also have to resemble a regular word? There do seem to be differences between the success rates (and the number of fixations for the model) obtained for the sets, but they don't give any indication of whether these are significant, which suggests to me that they were not, and that there was no point to dividing the nonwords up.

George Gibbard

I found the Dell, Chang and Griffin article to be one of the most impressive, most thoughtful and clearest of all the articles we've read so far. In all three sections, their results are darn good, compare darn well with human experimental results, and don't appear to be fudged or obfuscated in any way.

In the aphasia model, I am intrigued by their well-supported assertion that human aphasia can be modeled by lowering functional parameters of connection weights and maintenance of activation. Both of these could correspond to comprehensible changes in brain chemistry, no? I wonder, what do we think causes aphasia in terms of neurology? Do different parts of the brain require different levels of neurotransmitters to function properly, and do strokes etc. somehow upset this balance?

In the third part, I am a little uncertain what they mean by 'implicit learning,' though I guess nothing in their experiment actually hinges on this terminology. So the idea is that word order is simply a matter of the 'most prominent' items getting the most activation and therefore being lexicalized and output first, right? 'Structural priming' is then explained by the idea that a patient role as subject in one sentence makes subsequent patient roles have greater activation.

The question is, what does 'most prominent' mean? The answer must be, nothing simple; it must reflect the sum of several different sets of criteria, some based purely on event roles, some taking into account status in discourse, such as whether an item is being newly introduced or already under discussion, whether it is seen as central to our attention or comparatively unimportant, and so on. And different items have to be 'most prominent' in this abstract, neurological sense to speakers of different languages. We could say that this idea of 'most prominent' is not really saying anything not already expressed by 'comes first in the sentence', unless we want to argue that this also necessarily reflects something about the prominence of an item in a speaker's consciousness, not just in some subconscious linguistic area. And here we get into some interesting Sapir-Whorf stuff (the theory that different languages reflect significantly different forms of consciousness and reasoning). Hmmmm....

They point out that 'a connectionist model is only as good as its input and output representations.' We are seriously handicapped, then, because we don't have any good model for a prelinguistic representation of propositions. After all, how are we going to think seriously and analytically about meaning apart from language, and then communicate our thoughts to coworkers? I challenge you guys. Chang et al. have a network outputting sequences of words based on a single, instantaneous input representation of propositional content. But do we really think propositions instantaneously? ....maybe.... Also, Sapir-Whorf might dictate that speakers of different languages are also dealing with different inputs of propositional content.