I found this week's Pinker reading to be more interesting in terms of the experiments and aspects of language he covers. Of particular interest to me was the experiment that tracked eye motion as readers read sentences containing movement. The idea of movement in grammar is a debated topic among grammarians, but the experiment he cites does sort of indicate that movement is part of language processing. In relation to this point, the bits on dangling trees and working memory were the best part of the reading. I've always thought that parsing occurred in a manner similar to what he proposes.
The observation that a child's memory is limited and that phrases must be put together in small bits before larger ones may suggest an approach to modeling language processing. I think I heard someone mention something about it in class, but I'm curious whether people have tried to train computers on phrase construction first, then small sentences, and then more complex ones.
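Here is roughly what I have in mind, as a minimal sketch: a staged curriculum where the model sees bare phrases before whole sentences, and short sentences before embedded ones. The ToyModel class and its train_on method are just placeholders I made up, not any actual system's API.

```python
# Hypothetical curriculum: train on short phrases first, then short sentences,
# then longer ones -- loosely inspired by the idea that children build small
# constituents before larger ones. The model and its train_on method are
# placeholders, not any particular library's interface.

stages = [
    ["the dog", "a red ball", "my big brother"],               # bare phrases
    ["the dog barked", "the ball rolled"],                      # short sentences
    ["the dog that chased the ball barked at my big brother"],  # embedded clauses
]

class ToyModel:
    def __init__(self):
        self.seen = []
    def train_on(self, examples):
        # stand-in for a real learning step
        self.seen.extend(examples)

model = ToyModel()
for stage, examples in enumerate(stages, start=1):
    model.train_on(examples)
    print(f"stage {stage}: trained on {len(examples)} examples")
```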
The two chapters of McLeod both sparked thought. I had been thinking that the way backprop works would not be very reflective of human brain cells in action, which the text confirmed. But the autoassociator chapter I found particularly compelling. It was pretty clever to come up with something that reproduces its input as its output so it can be used to complete incomplete patterns. What is the current thought on how the brain recognizes patterns? Is it one large pattern associator, or are there clusters that associate classes? Has anyone used multiple autoassociators to categorize words into parts of speech... or recognize incomplete phrases during parsing?
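To make the idea concrete for myself, here is a tiny Hopfield-style autoassociator sketch. It is not the specific network McLeod describes, just the general trick: train the network to reproduce its own input, and then a degraded pattern can be filled back in.

```python
import numpy as np

# A tiny autoassociator in the Hopfield style: the network is trained to
# reproduce its own input, so a degraded pattern can be completed by
# repeatedly passing it through the learned weights.

patterns = np.array([
    [ 1, -1,  1, -1,  1, -1,  1, -1],
    [ 1,  1,  1,  1, -1, -1, -1, -1],
])

n = patterns.shape[1]
W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p)          # Hebbian learning: strengthen co-active units
np.fill_diagonal(W, 0)           # no self-connections

def complete(partial, steps=5):
    x = partial.copy().astype(float)
    for _ in range(steps):
        x = np.sign(W @ x)       # each unit takes the sign of its net input
        x[x == 0] = 1
    return x

# Degrade the first pattern: zero out ("forget") half of its units.
probe = patterns[0].astype(float)
probe[4:] = 0
print(complete(probe))           # recovers patterns[0]
```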
From what Pinker describes in the learning of morphological rules and speech processing, it seems like categorization plays a very central role in language acquisition. He also seems to argue that without some innate understanding of the categories natural to language, learners could not solve this problem, and thus that there are restrictions on the kinds of categories that are useful. McLeod addresses category formation, but it looks like that is more in terms of a prototype and whatever is close enough to it. It would be interesting to look at how constraints on what kinds of categories can be formed would be implemented.
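Roughly how I picture the prototype-and-distance idea (the feature vectors here are made up for illustration; it's just "closest prototype wins," not McLeod's actual model):

```python
import numpy as np

# Prototype-style categorization: each category is summarized by the average
# of its exemplars, and a new item goes to whichever prototype it is closest
# to. The binary feature vectors are invented for illustration.

exemplars = {
    "bird": np.array([[1, 1, 0], [1, 1, 1], [1, 0, 1]]),
    "fish": np.array([[0, 0, 0], [0, 1, 0], [0, 0, 1]]),
}
prototypes = {c: xs.mean(axis=0) for c, xs in exemplars.items()}

def categorize(item):
    return min(prototypes, key=lambda c: np.linalg.norm(item - prototypes[c]))

print(categorize(np.array([1, 1, 0])))   # closest to the "bird" prototype
```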
Another thing I was thinking about: Pinker's comparison between past tense forms given by people vs forms given by neural networks (would be nice to know more about these networks). Those are pretty strange formations, and I'm not so sure what's going on. Maybe the network finds the closest thing to the input that it has already learned, and then treats the input V according to that 'model'? But the past tense output does not look like any other existing form, so there must be more to this.
The two things that stood out for me in the reading this week both came from Chapter 6 of Pinker. The first was Pinker's observation that phonetic rules seem to be sensitive to features, not to phonemes (which explains why a rule is triggered by all phonemes that share a feature, rather than by some arbitrary list of individual phonemes). I don't have anything overly revolutionary to add to his point -- it was just a different way of looking at a process that I've seen in every ling class I've taken but that I don't think I've ever heard explained in quite that way before. I thought it was a really interesting way of looking at those rules.
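A toy way to see the point, using the English plural as the example: the rule below picks its allomorph by looking at features of the stem's last sound (voicing, sibilance) rather than by listing phonemes one by one. The feature sets are simplified stand-ins, not a real phonological analysis.

```python
# "Rules refer to features, not individual phonemes": the English plural is
# voiceless [s] after voiceless sounds, voiced [z] after voiced sounds, and
# [iz] after sibilants. The sets below are simplified and incomplete.

VOICELESS = {"p", "t", "k", "f", "th", "s", "sh", "ch"}
SIBILANT  = {"s", "z", "sh", "zh", "ch", "j"}

def plural_suffix(final_sound):
    """Pick the plural allomorph by features of the stem's last sound."""
    if final_sound in SIBILANT:
        return "iz"
    if final_sound in VOICELESS:
        return "s"   # the rule mentions [-voice], so it covers p, t, k, f, ... at once
    return "z"       # everything [+voice]: vowels, b, d, g, m, n, ...

for sound in ["t", "k", "g", "sh"]:
    print(sound, "->", plural_suffix(sound))
```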
The second thing that stood out to me was his discussion of how much of what we hear is a result of the actual phonemes that are transmitted to us and how much of it is based on what we expect to hear. He pointed out a whole series of 'mondegreens' as examples of people mis-hearing things in ways that sometimes make very little sense and don't seem especially plausible or expected. When I was reading his argument here, though, I couldn't help but think that his examples of mis-hearing song lyrics and the like could be a strong argument that we DO base what we hear on what we expect to hear -- once we hear something incorrectly one time, we're likely to hear the same thing every time we hear the song. Obviously, the 'right' hearing of the lyrics is at least as likely as the 'wrong' one if our hearing is mostly based on the phonemes that make up the words of the songs. But once we've interpreted the words in one way, we expect to hear the words that we thought we heard the first time we heard the song, so those are the words we'll hear each subsequent time we listen to it...
Finally, I'm wondering how well a neural net could be trained to recognize the variations in the pronunciation of a phoneme based on context. People who pronounce vowels differently due to dialect tend to do so in very specific and rule-based ways (the vowels are shifted to different places in the mouth in a very systematic way), and it seems like it might be possible for the net to generalize enough to recognize that the 'i' sounds in write and ride are not distinguished in English...
One of the things I found the most interesting in this week's portion of The Language Instinct was the problem of speech perception. Pinker constantly referred to this as our "sixth sense," saying that "we simply hallucinated word boundaries." It's a crazy concept, but we demonstrated it the first day in this class when we tried to visually break down the wav file of a sentence. He also says (p. 157, bottom) that "no system today can duplicate a person's ability to recognize both many words and many speakers." I've played around with a few dictation programs, and they're full of holes, it's true, but what about the Swarthmore College Voice Recognition System? It's not perfect, either (Did you say ... --no! --I'm VERY sorry. Please say the full name ... &c.) How does that work? Even if I have to mangle my friends' names to connect to them, the system obviously has "its own idea" of what the names sound like, and can match any student's voice saying them to the template it has--which sounds to me like an ability to "recognize both many words and many speakers." [Many words--probably 4000 on the high side, with both names and assorted offices? Not a ton but a significant amount, especially when you consider that they can be said by <2000 different speakers on this campus.]
Page 208 (the one with the sentence "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo") struck me as one of the more extreme sets of examples (also "time flies like an arrow, &c.") of the ridiculousness of language, its myriad interpretations, and the utter futility of trying to teach a computer to do anything like this, and do it correctly. How can you train a model to NOT produce the meaning "Measure the speed of flies in the same way that an arrow measures the speed of flies"? And, once that's done, Pinker admits that "real speech is very far from The dog likes ice cream and that there is much more to understanding a sentence than parsing it" (p. 225, last paragraph)--how is any of this to be even attempted? Then again, perhaps I'm not extremely clear on what we're trying to accomplish, or why what we're doing isn't futile. (Please forgive my desperate tone of voice and note the time stamp on this email.)
Pinker states on page 269 that "some computer scientists, inspired by the infant, believe that a good robot should learn an internal software model of its articulators by observing the consequences of its own babbling and flailing." Should our model babble? Cool!
One thought that keeps entering my head as I read this--as I read anything, I confess--is, how does this relate to music? Pinker keeps touching on it and then dancing rapidly away, as on page 237 when he talks about the ridiculousness of trying to find a "verb in musical notation." First, I confess I'm a double major, music and linguistics. And I often hear talk about the inherent similarities of music and language. Last week in discussion someone brought up the idea of a "Music Instinct" (in addition to a Counting Instinct, and others). I restrained myself, because I have no idea where the appropriate forum is to ask these questions, so I'm now taking advantage of the late hour and of the medium, where I can post this and it can be ignored if it's inappropriate. But what do linguists / psychologists say about any of this? Where does music fit into the cognitive scheme, linguistically? When do I go to bed? Now.
The observation that connectionist networks are largely parallel made me think of CS23, which I'm also taking this semester. I wonder how difficult it would be to implement a (useful) connectionist network in hardware? Has anyone ever tried this? If so, what were the results? It seems like the massively parallel structure of connectionist networks could result in great speed if implemented so that the calculations involved were actually executed concurrently--rather than successively as must be the case when a neural network simulation is run on a traditional digital computer architecture. But it might be complicated from an electrical engineering standpoint to implement a network of a useful complexity...
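Here is the miniature version of what I mean: every unit in a layer computes its activation from the same previous-layer vector, independently of its neighbors, so in hardware they could all fire at once; on a conventional machine we just simulate that one unit at a time (or as a matrix product). The weights and inputs below are arbitrary numbers, purely for illustration.

```python
import numpy as np

# Each row of `weights` is one unit's incoming connections. The rows are
# independent of one another, which is what dedicated parallel hardware
# could exploit by computing them concurrently.

inputs = np.array([0.2, 0.9, 0.5])
weights = np.array([[0.1, 0.4, -0.3],
                    [0.7, -0.2, 0.5]])

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Sequential simulation: one unit at a time.
sequential = [sigmoid(weights[i] @ inputs) for i in range(len(weights))]

# "Parallel" form: one matrix product over all units in the layer.
parallel = sigmoid(weights @ inputs)

print("sequential:", np.round(sequential, 3))
print("parallel:  ", np.round(parallel, 3))
```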
What is the effect of trial order on an autoassociator learning multiple patterns, or for that matter on any other neural network? It seemed from the chapter that the order in which the trials are presented to an autoassociator is important, but the chapter did not go into detail about why. What difference does it make if you teach your network all about bagels, then all about dogs, then all about cats, as opposed to cat, dog, bagel, cat, dog, bagel?
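I tried to convince myself with a toy example: a single-layer network trained with the delta rule on two overlapping input-output pairs, presented either blocked (all of one, then all of the other) or interleaved. At least in this little sketch, the second block partly overwrites what the first block taught, because the patterns share input units; the learning rate and the patterns themselves are arbitrary choices of mine, not anything from the chapter.

```python
import numpy as np

# Blocked vs. interleaved presentation order for a single-layer network
# trained with the delta rule. The two pairs overlap on input unit 0, so
# training the second pair can partly undo the first.

pairs = [
    (np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0])),   # "dog"-like pattern
    (np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0])),   # "cat"-like pattern
]

def train(schedule, lr=0.2):
    W = np.zeros((2, 3))
    for x, t in schedule:
        y = W @ x
        W += lr * np.outer(t - y, x)          # delta rule update
    return W

def total_error(W):
    return sum(np.sum((t - W @ x) ** 2) for x, t in pairs)

blocked     = [pairs[0]] * 20 + [pairs[1]] * 20
interleaved = [pairs[i % 2] for i in range(40)]

print("blocked error:    ", round(total_error(train(blocked)), 4))
print("interleaved error:", round(total_error(train(interleaved)), 4))
```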
I was confused by figure 4.2 in McLeod--is there a typo? (b) and (c) seem to be identical...
How does one integrate an autoassociator into a larger system? The chapter made some mention of using autoassociation as part of a larger cognitive process---how do the different connectionist components get linked? Is that even how it works? Can we even think of an autoassociator as distinct from, say, a simpler pattern associator, and use them in series, or would we just create one big massive network which would incorporate both in a distributed manner? I suppose this leads to the larger question of how localization and specialization of different areas of the brain takes place--how do the language processing centers of the brain interact with visual information processing centers or the memory centers? Do they look like somewhat distinct sub-networks, with only limited connection, or what?
The reading in The Language Instinct was not as interesting as the previous one. For the most part, it seemed that Pinker was just reiterating his earlier points and reusing the same examples. Chapters five and six were good lessons in morphology and phonetics, but the other chapters seemed like a repeat of chapters one and three, only with the linguistic theories thrown into the text.
The Sounds of Silence chapter brought up some issues for me. What makes a certain group of consonants illegal or legal? What are the restrictions or rules that tell us that "s" and "r" do not go together? Also, if a syllable begins with a vowel rather than with consonants, does it still have an onset, or just a rime? The last part of the chapter did a nice job of showing why a morphemic spelling is much more useful than a phonemic one.
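Just to play with the onset question from above, here's a toy phonotactic check that treats "legal onset" as membership in a list. The list is deliberately tiny and incomplete; the interesting linguistic question is what general rule (sonority sequencing and the like) the full list would fall out of.

```python
# A toy phonotactic check: a cluster is "legal" if it appears on a small,
# hand-picked list of English onsets. This is an approximation for
# illustration, not a complete inventory.

LEGAL_ONSETS = {
    "", "b", "d", "f", "g", "k", "l", "m", "n", "p", "r", "s", "t", "w",
    "bl", "br", "dr", "fl", "fr", "gl", "gr", "kl", "kr", "pl", "pr",
    "sk", "sl", "sm", "sn", "sp", "st", "sw", "tr",
    "spr", "str", "skr",
}

def legal_onset(cluster):
    return cluster in LEGAL_ONSETS

for cluster in ["sl", "sr", "str", "tl"]:
    print(cluster, "->", "legal" if legal_onset(cluster) else "illegal")
```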
In the Modelling text, the mention of the "poverty of the stimulus" reminded me of the poverty of input discussion we had last week from the Pinker text. It was also interesting that the text pointed out the similarity between retrieving visual and auditory input.
It was odd reading how confident Pinker is in the inabilities of technology, and how he takes the state of technology at the time of the book as indicative of what technology in general can and can't do. When he is arguing for the complexities of the implications we use to decipher even the most mundane conversation, he appears to be arguing that the sheer complexity of the system prevents artificial systems from being able to perform such tasks. I'm curious as to how he can assume a bound on artificial complexity. Already, we can see how that assumption might be flawed; the Swarthmore College voice directory, although far from perfect, is far superior to DragonDictate (157).
The Connectionist Models reading is a sharp contrast in that it shows utter faith in a methodology that Pinker swiftly dismisses (Pinker 125). It's difficult to discuss the Connectionist reading, however, in that the subject matter is primarily objective. For much of the book, it takes for granted not only that artificial modeling is a valid pursuit, but even that a particular approach (the connectionist approach) is valid. The text then focuses on specifics of that approach. The text is useful and informative, but, alas, not that controversial.
Daniel Fairchild
One of the interesting things that I noticed Pinker talks about is concurrence of phonemes -- the way that all the phonemes in a word affect each other, and cannot be separated. In humans, syntactic, semantic, and pragmatic clues are used to help determine what the speaker actually said. However, a simple speech-recognition program would not have this assistance. So perhaps the only way computers will ever be able to really take dictation is if we teach them to understand what they are hearing, with complete integration of all of these systems. Related to this is the fact that humans do not understand speech at a phoneme level, but at levels above this; at a minimum we understand at a morpheme level, and frequently it takes levels higher than that to extract some meaning from an utterance.
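Something like the following is what I imagine "integration" could look like in its simplest form: each candidate transcription gets a score for how well it matches the sound and a score for how plausible it is as a word sequence, and the recognizer picks the best combination. All of the sentences and numbers here are invented for illustration; real systems use probabilistic acoustic and language models, but the shape of the computation is similar.

```python
# Toy combination of "what it sounded like" with "what makes sense":
# the acoustically better candidate can lose to the one that is more
# plausible as a word sequence. All scores are made up.

candidates = {
    "wreck a nice beach": {"acoustic": 0.40},
    "recognize speech":   {"acoustic": 0.35},
}

# Pretend language model: how plausible the word sequence is on its own.
language_model = {
    "wreck a nice beach": 0.001,
    "recognize speech":   0.020,
}

def combined_score(sentence):
    return candidates[sentence]["acoustic"] * language_model[sentence]

best = max(candidates, key=combined_score)
print(best)   # the slightly worse acoustic match wins on plausibility
```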
The other thing I noticed was how little we understand about how children acquire language in the first place. We know that they only take until the age of about four years to obtain an excellent facility with it, but we're really not sure how that happens. With that in mind, it seems much more difficult to teach language to computers: we could be going about it in a completely wrong manner and not even know it. Especially difficult is figuring out how much to hard-code and how much to try to teach, and also just how the hard-coding is to be done. It seems likely that humans do have a universal grammar that influences what kinds of things we can do with language, but there is still plenty of debate as to just how extensive this grammar is, and we haven't even really begun to speculate about how it might be implemented. This makes it much harder to implement such a thing in a machine.
The Tower of Babel chapter sometimes seems to run counter to Pinker's main thesis that language is a built-in instinct and not a cultural invention like writing. While he remains highly skeptical of the claims that all 6,000 modern languages have been traced to a mere six root languages, he seems willing to accept the basic idea that most of today's languages have evolved over millennia from just a few proto-languages. In fact, he even provides evidence that the spread of a certain language family might have been concurrent with (and the result of) the spread of agriculture, which is, of course, an invention and not a hardwired instinct. What Pinker believes, then, is that tribes had possessed other languages before their languages were supplanted by proto-languages such as "Proto-Indo-European." The question is, how can we figure out whether this scenario accurately describes what happened thousands of years ago? I suppose the best evidence comes from the existence of so-called "orphan" languages such as Basque: evidence, perhaps, that one need not be taught language by other tribes in order to possess language.
Talking about linguistic universals can be a tricky matter. At one point Pinker talks about the "universal" that no language forms questions by reversing the order of words within a sentence (234). If we include such universal absences, then our list of universals will be infinite. (Consider: "No language forms the imperative by adding the word armadillo to the end of a sentence.") For Pinker, the clinching universal is that if we randomly pick a language, we can always find things that can be considered subjects, objects, and verbs (237). A question worth struggling with is whether the inclusion of these parts of speech is not a natural and unavoidable consequence of trying to communicate about the world (i.e., The world constrains us to experience reality in such a way that we have no choice but to think in terms of and use subjects, objects, and verbs. In this view, concluding that natural selection is responsible for the way we speak is like giving credit to evolution for our walking on the ground when, in fact, we have no choice but to walk on the ground due to the constraints of gravity.) If such parts of speech are necessary, then the universality of subjects, objects, and verbs says more about how the world has shaped language (at a cultural level) than how our brains have been shaped to handle language.
Pinker points out that, on each level of language organization he examines, language turns out to be a discrete combinatorial system (or something like that), meaning essentially that units at each level are made up of different combinations of distinct units at the lower level: syntax is the combination of words, words of morphemes, morphemes of phonemes, phonemes of distinctive features. He also points out that the enormous number of different concepts we need to be able to express pretty much requires such a combinatorial system; we could never remember 60,000 different-sounding words without remembering them as combinations of a finite number of smaller units.
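A back-of-the-envelope version of that point (the figures are rough, just for illustration):

```python
# With a few dozen phonemes, even short sequences vastly outnumber the
# ~60,000 words an adult knows, so words can be stored as combinations
# rather than as 60,000 unrelated sound-gestalts.

phonemes = 40
for length in range(1, 6):
    print(length, "phonemes:", phonemes ** length, "possible sequences")

print("words an adult knows (Pinker's ballpark):", 60_000)
```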
Pinker conceptually opposes such a combinatorial system to some other kind (I don't remember his terminology); I think his distinction could be thought of as 'digital' versus 'analog.' What is interesting for me is that while the sounds of words must be in a 'digital' (phonemic) format, the meanings of words clearly are not; we cannot find exact definitions or think of meanings as discrete, bounded, and strictly definable in terms of other meanings (George Lakoff has discussed this idea a lot in Women, Fire and Dangerous Things). And we know that thought itself is a complex analog system that depends on imprecise biochemical factors.
This might be stating the obvious in a complicated way, but: I guess the reason speech cannot be an analog system, as thought is, is that mental states are so hugely, subtly complex that to express them in a non-digital format would require an impossibly complex signal. The mental resources to understand such signals would probably be even more complicated than the resources required to think those thoughts. Therefore the best we can do is the kind of digital system which language actually is, and even then we know that a person's linguistic expression and even the linguistic part of their thoughts reveals only a tiny part of what that person is actually thinking and feeling. Incidentally, for me this is the biggest problem with the Sapir-Whorf hypothesis (that what linguistic system a speaker uses is important in determining what the person thinks): we know that even within a given language the same words can reflect a million subtly different thought processes, so whatever power words have to determine thought processes, there must also be so many more non-linguistic factors determining your thoughts that the Sapir-Whorf thesis would become comparatively trivial.
Gordon and Kiparsky's work, showing how children formed compounds with irregular plurals (mice-eater, rat-eater), wowed me, but the assertion that "Motherese just doesn't have any compounds containing plurals" and therefore that this is "another demonstration of knowledge despite 'poverty of the input'" strikes me as false: teethmarks are a regular topic of discussion around Michael now that he has his first teeth, and I'm sure we'll speak of such monsters as purple-people-eaters around him in the coming months. And Michael will be watching plenty of baseball games with Joel and me once the season starts up, so he'll hear flied out (similar concept) quite a lot. Just because these words aren't in Motherese's limited lexicon doesn't mean babies don't hear them!
Reading the section about learning vocabulary ("a new word every ninety waking minutes"), an old memory was stirred up:
In fourth grade my classmates and I got the deranged (we didn't realize it was so deranged until we started in on the project) idea of seeing who was smartest by comparing vocabularies. We did this by trying to write down all the words we knew ... in alphabetical order, no less! After a few days we realized, even at age nine or so, the utter futility of this endeavor.
Mathematically, it does average out to one word every ninety waking minutes for seventeen years, but in reality, isn't this learning much more concentrated at particular ages or phases of development? Or are we so literally always learning? Do we continue absorbing new words so readily after age eighteen? If not, why do we stop?
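Incidentally, the ninety-minute figure does roughly check out, assuming Pinker's ballpark numbers (about 60,000 words by seventeen, learning starting around the first birthday, and sixteen or so waking hours a day):

```python
# Rough check of the "one new word every ninety waking minutes" figure.
# The inputs are ballpark assumptions, not exact data.

words = 60_000
years = 17 - 1
waking_hours_per_day = 16

words_per_day = words / (years * 365)
minutes_per_word = waking_hours_per_day * 60 / words_per_day
print(round(words_per_day, 1), "words/day, i.e. one about every",
      round(minutes_per_word), "waking minutes")
```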
Markman and Hutchinson's work on the lack of true synonyms was another odd moment of revelation. The biff experiment was nifty. And the line about the larynx tightening when you stand up led to everyone in my apartment at the time stopping what they were doing, and trying it several times in surprise. Pinker's good at that.
Several years ago, I compiled a list of what I called "&-terms": pairs of items often referred to together, and almost always listed in a set order, such as "hands and knees", "father and son", and "ham and eggs". (Alas, the full list is not handy.) I sent it to a 'zine I participate in, with a solicitation for theories about what was going on, and several suggestions were tendered, but no conclusions were ever reached. Now I really want to find that list and see how many conform to the phonological reasoning Pinker gives for ping-pong and fiddle-faddle. (I also have a strange sudden urge to find a pool hall, walk in, say "Philadel-fucking-phia", and see what happens.)
Finally, a meta-comment: if we've got any questions for the man himself, here's the perfect time to ask them:
> Mark your calendars: On Feb 8, 2001, our fifth online chat will feature Steven Pinker, a professor at MIT and author of books about mind and language. More details at: http://wordsmith.org/chat

This is being run by Anu Garg, of A.Word.A.Day.