Several things about this chapter interested me, perhaps because it was very biologically grounded and contained a lot of things about human psychological development that I didn't know.
One thing that I found interesting to think about was the way in which a longer time of development becomes selected for. Thus, disruptions in our development (causing deafness, for example) are compensated for by other developmental areas. At first, thinking about this from an evolutionary perspective, I was confused, because if natural selection is eliminating the weak (so to speak), then the brain is essentially working against the "greater good of the species" by compensating for such disruptions. (I'm not trying to upset anyone with this comment -- I could never be called a eugenicist; I was just trying to think about compensation and flexibility in the broadest sense.) Thus, if a wild animal is born with this kind of disadvantage (i.e., poor hearing, sight, etc.), it will be more likely than its otherwise similar siblings to be killed off. However, as I thought about it more, I realized (duh) that since disruptions such as blindness and deafness are not passed on to the offspring, compensation is a good way to go, because one needs one's offspring to be prominent in the population, and for that to happen, a line with a slight, _transient_ (i.e., uni-generational) disadvantage must do as well as the others. What does this tell us about the way that neural networks must be constructed? Not only should the network be able to respond to damage within the system, but it should be able to pass on its _capability_ to solve problems and its _flexibility_ in the face of adversity to future generations (epochs?) of itself, without passing on the problems themselves.
I was confused by R&M's view that their network had not learned the past-tense rule when it did not automatically add -ed to any stem. How did they justify that, since with rules of that type we have all been taught that it is the exception that proves the rule?
Another question that was prominent while I was reading about teaching the past tense was whether the same rules hold for other languages, and in how many other languages this has been tried. Is the way we learn language fundamentally the same, no matter what the specific language is? How does this change in languages without conjugation? Off the top of my head, most of the languages I am familiar with have both regular and irregular endings in the past tense, so I guess it wouldn't make much difference. It would be interesting to test this, however. It might also be interesting to test whether the network, given enough training, could distinguish between, say, English and French verbs and give the proper endings for each.
So maybe it's the way the book is written, but I'm wondering exactly what proof there is AGAINST the connectionist hypothesis. Of course all the information the book presents cries out for their side, but there must be research being done that disproves or attempts to disprove the connectionist approach - otherwise, why are all those ninnies still not on the bandwagon? Could we maybe read some research that is for the other sides of the issue, or that casts shadows on the connectionist framework?
I noticed how the architectures of the various networks had to change as the application and the task changed. It seems to me, supported by the discussion of the R&M vs. P&M past-tense experiments, that the more general the architecture, the more the properties of the network emerge, as opposed to having been built into the input. In the network for reaching for occluded objects, the whole business of prespecifying that there be two modules because "Ungerleider and Mishkin (1982) provide evidence for the separation of these two pathways in the brain" seemed a little shady to me. Part of the connectionist understanding of brain processes is that what SEEMS to be the representation on the output layer is not actually or necessarily the representation on the hidden layer, and that these separate pathways emerge from the constraints of the system (unless, of course, it's a hardwired thing, but that's not how the above reference sounds to me).
It also seems that the treatment of the occluded-objects problem was a little shaky - representing the various objects as symbols brought me right back to the anti-GOFAI arguments we ran over in AI. The cube-with-lit-vertices simulation and the moving-body simulation were a lot more practical in terms of how the input was provided to the network: it was provided very, very close to how we would perceive the simulation if we were in the place of the network. Perhaps one of the greatest challenges for simulators is to build simulations where, if the input pattern went to a screen instead of to the input layer, a human of the target age (in the case of this chapter, neonates to young adolescents) could solve the problem without any conceptual juggling. By conceptual juggling I mean the need to deal with "this represents that, that represents who, ...", etc. - to build a simulation we should use only immediately apprehensible and understandable input. The further the input is from what a human could immediately apprehend and understand as sensory-level information, the further we are from real "emergent" behavior.
One thing I found myself wondering while reading about Elman's experiment on developing phoneme prediction from a string of phonemes was whether the experiment had been tried on many different languages. It might seem like this is a stupid request, and that nothing new can be learned by applying another language's phonemes to an experiment already proven in English, but what if there were some sort of English-speaking bias to the experiment that is only perceptible in the context of a non-English input pattern? This is one experiment where such a bias can be made visible just by changing the input - but what about the other experiments? It's so difficult to build a good experiment that it must be well-nigh impossible to build an unbiased one. Most of us, on reading in the past-tense simulation that R&M jimmied the input pattern by adding more irregulars at the beginning and fewer at the end (and that the experiment would not produce the predicted pattern unless this was done), probably laughed at the built-in bias that input pattern produced - but what about all the architectures and representations that we have across the neural network canon?
In this chapter, we see a shift in focus from what networks do right to what networks do wrong. The networks described in this chapter were applauded, not for their ability to find solutions to problems usually thought to be strictly in the domain of human intelligence, but rather for their ability to make human-like mistakes when attempting to solve these problems. This shift of focus was interesting because one of the most important components of learning is how the network/organism learns from its mistakes.
Another theme covered heavily in this chapter is the overall pattern of a network's learning, and how it corresponds to the pattern of a child's learning of a corresponding concept. The R&M simulation was initially presented as a worthwhile example of similarity between patterns of network learning and human learning, particularly of the English past tense. However, because this simulation had no hidden layer and used the perceptron convergence procedure rather than a more sophisticated learning procedure like backprop, it could only solve problems that are linearly separable. I doubt that the acquisition of the past tense is a linearly separable problem, and if it were, there would likely be little interest in how children learn it. Another problem with R&M's demonstration is the arbitrarily imposed 'vocabulary discontinuity' after ten epochs. If the training set is increased forty-fold at some point during training, of course the network's performance is going to show a change at that point, especially if the increase changes the relative frequency of regular and irregular forms. It seemed to me that R&M were basically cheating by adding this 'vocabulary discontinuity' in order to force their network's performance to imitate patterns found in human learning. P&M's experiment used a backprop net with a hidden layer and achieved more impressive results.
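To make the linear-separability point concrete, here is a minimal sketch of my own (a toy example, nothing like R&M's actual Wickelfeature encoding): the perceptron convergence procedure settles on OR, which is linearly separable, but can never settle on XOR, which is not.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, y):
            out = 1 if xi @ w + b > 0 else 0
            # Perceptron convergence procedure: adjust weights only on error.
            w += lr * (ti - out) * xi
            b += lr * (ti - out)
    return [1 if xi @ w + b > 0 else 0 for xi in X]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_perceptron(X, [0, 1, 1, 1]))  # OR, separable: learns [0, 1, 1, 1]
print(train_perceptron(X, [0, 1, 1, 0]))  # XOR, not separable: never converges
```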
Another example in this chapter that I thought was worthless was the 'balancing-beam' simulation. Of all of the aspects of human intelligence to try to simulate in a neural network, this seemed to be extremely trivial.
Elman's simulations of 'fast learning' showed the error of making certain assumptions about innateness based simply on the relatively young age of the organism. In particular, the psych-textbook example of a person's ability to recognize patterns of moving dots as biological motion was shown not to be particularly indicative of 'innate' knowledge present in the system. It just happens that both the network and the person choose the simplest explanation for the motion, which is that the lights are attached to a certain type of structure (a body), and extrapolate from there.
on page 108 they talk about how some areas of the brain may be predisposed to do particular things. this is exactly what charlie was arguing a few weeks ago.
the stuff about how development gives optimal adaptation to the environment is important, but I can't figure out why I think so. Maybe it has something to do with the stuff that is talked about later with the beam-balancing problem: if the brain gets to develop, it can deal with one variable at a time instead of all at once.
the face recognition stuff: the idea of having two systems appeals to me. It makes sense that nature would program some kind of basic system into the brain so that the kid looks at the right things and gets enough data to survive. It avoids the imprinting problems that birds have. A baby will respond to human faces more than to non-human faces because it has been primed.
the vocab stuff: I have a hard time with the elman-net results. it seems that the net just "learned" to recognize the patterns using statistical tricks and didn't really learn anything about how grammar works or any real vocab. Also, the spurt in the vocab in the networks -- is that just from exposure to stuff or what?
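here is the kind of thing i mean by "statistical tricks" -- a toy sketch of my own (not elman's actual setup), where plain bigram counts already give plausible next-letter predictions with no grammar anywhere:

```python
# count letter-to-letter transitions in a stream, then "predict" by
# picking the most frequent successor -- pure statistics, zero grammar
from collections import Counter, defaultdict

stream = "many years ago a boy and a girl lived by the sea " * 50

bigrams = defaultdict(Counter)
for prev, nxt in zip(stream, stream[1:]):
    bigrams[prev][nxt] += 1

def predict(prev):
    return bigrams[prev].most_common(1)[0][0]

print(predict("t"))  # 'h' -- looks like it knows english, but it's counting
```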
The weight/balance problem being solved twice makes sense. At first, kids judge on one factor; then, as they get more sophisticated, other factors come into play. So it makes sense that the network could be trained that way: first to establish the effect of weight, then distance, then both together. That's the way that ppl get taught.
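Here's a rough sketch of the staged regime I have in mind, with my own toy encoding of the beam (weights and peg distances on each side -- not the book's exact setup): train first on problems where only weight varies, then only distance, then both.

```python
# train a tiny backprop net on balance-beam problems in three phases:
# first only weight varies, then only distance, then both together
import numpy as np

rng = np.random.default_rng(0)
sig = lambda z: 1 / (1 + np.exp(-z))

def make_problems(n, vary):
    X, y = [], []
    while len(X) < n:
        lw, rw = rng.integers(1, 6, 2)        # weights on the two sides
        ld, rd = rng.integers(1, 6, 2)        # peg distances from fulcrum
        if vary == "weight":
            ld = rd                           # distance uninformative
        elif vary == "distance":
            lw = rw                           # weight uninformative
        torque = lw * ld - rw * rd
        if torque == 0:
            continue                          # skip balanced beams
        X.append([lw, ld, rw, rd])
        y.append(1.0 if torque > 0 else 0.0)  # 1 = left side goes down
    return np.array(X) / 5.0, np.array(y)

W1, b1 = rng.normal(0, 0.5, (4, 4)), np.zeros(4)   # a 4-4-1 net
W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)

def train(X, y, epochs=3000, lr=1.0):
    global W1, b1, W2, b2
    for _ in range(epochs):
        h = sig(X @ W1 + b1)
        o = sig(h @ W2 + b2).ravel()
        d_o = (o - y) * o * (1 - o)                 # output-layer error
        d_h = np.outer(d_o, W2.ravel()) * h * (1 - h)
        W2 -= lr * (h.T @ d_o[:, None]) / len(y)
        b2 -= lr * d_o.mean()
        W1 -= lr * (X.T @ d_h) / len(y)
        b1 -= lr * d_h.mean(0)

for phase in ("weight", "distance", "both"):        # the staged curriculum
    train(*make_problems(200, phase))

Xt, yt = make_problems(100, "both")
preds = sig(sig(Xt @ W1 + b1) @ W2 + b2).ravel() > 0.5
print("accuracy on mixed problems:", (preds == yt).mean())
```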
All right, all right already! I'm convinced! Emergent behavior doesn't mean anything!
Truth be told, I really enjoyed this stuff, which did wonders for putting connectionism into practice for me. Especially all the language stuff, and especially that past tense stuff (R&M trained a SINGLE-LAYER network that could figure out the past tenses? What implications does that have? How "enough" is good enough for learning?)
On the other hand, some of this stuff seems like just spinning our wheels to me. On the one hand, the books are pretty cooked as far as the biological model is concerned (the network architectures, for example, range from the merely weird to the totally far-fetched as far as biology is concerned). On the other hand, while all this stuff is pretty cool, it doesn't advance industry-driven AI work all that much either.
Okay, so it largely discredits the nativists, or at least calls their ideas into serious question, which bodes well for further connectionist research, and that's good. But I don't quite want to make the leap yet that the object permanency research, for instance, has anything but the most tenuous connection to the way infants actually learn (or innately know) about object permanency. But I don't want to use it to build a computer vision processor yet either.
So which is connectionism mostly used for, and which is it more useful for? Is it that it can sort of model the human mode of learning and so it's for the cognitive psychologists? Or is that because it can sort of model the human mode of learning it can be mostly useful for goal-driven AI applications?
One more: so let's say the whole brain can be modelled with a whole bunch of these networks. Well then what ties them together? What brings about modularization in the brain? Is that innate, despite being so plastic? If we have thirty networks that do a piece of vision processing then how do we weld them into an eye?
Human minds run at a lot of clock cycles per second. With current networks and robotics, it makes no sense to attempt to train a network for the equivalent of six months. So all the evidence of what infants know after what seems to us only a brief period of development does not indicate that these things are in fact innate.
The evolutionary answer to why there is a developmental period is that a fully formed human brain could not make it out of the birth canal of a woman whose hips are narrow enough to be remotely mobile. What effect does this have on their theory?
If face recognition is innate, an evolutionary argument can be made for why such an ability would be innate. So should we start trying to build robots that innately recognize wall sockets?
As always, prediction of the next letter, association of words with images, etc., is all fine and good, but meaningful communication is still a ways off. If an AI produced the error "go-ed" from being trained on a stream of text (instead of by a system designed to produce that error), it would be highly impressive, as it would indicate the formation of an actual grammatical generalization, as opposed to a statistical guess at the correct letters to occupy the space. Also, humans make such errors while trying to construct a sentence conveying a thought they feel is important to communicate. Currently, no AI capable of language has any such pressing need to share ideas with us.
Doesn't the reasoning from the balance-beam experiment ("neural nets act like this and so do humans, but this is what is going on behind the scenes in a NN, so it is probably that way in humans too") violate the rules about which way results can be translated?
I really enjoyed reading about the work that's been done on the past tense in English, as well as the other work described in this chapter. I'm especially impressed by the quality of results produced using such simple simulations in such a variety of domains, a trend to which I would draw the attention of everyone who looks down on my interest in simulated worlds. It's amazing what a few simplifying assumptions can do -- not that it isn't amazing what they can't do. The applicability of any technique, including simulation, is entirely dependent on the application to which it is being applied. When neural networks are driving cars on real freeways, I want them to have been trained in the real world!
However, if we're interested in investigating the nature of cognition and intelligence, as pure science, we're not ready for the real world. We can barely construct, and certainly can't analyse, the complex models that will be necessary to explore real-world cognition. I think that the vast space of possible experiments lying between the real world and this kind of very simplistic simulation is, or will prove to be, a very fertile ground for connectionist and hybrid research. What I'd particularly like to see is some work with systems that are exposed to a medium-complexity simulated world and asked to solve several different kinds of tasks, particularly if it isn't easy even to tell which task is required in a given situation.
While this would indeed be disregarding the messiness of the real world, it would capture one feature of the real world and real intelligent systems (i.e. us humans) that much research neglects: we're not always solving the same problem, be it the past tense, or tracking and reaching for objects, or predicting the behavior of balance beams. We do a lot, and if we're interested in modeling intelligence, cognition, and learning, we should ask our models to do a lot too.
i liked the chapter, although it was sort of long and almost too elaborate on some of the examples. nevertheless, it was very interesting to see the lines being drawn between child development and neural networks. some of the experiments discussed were rather astonishing, especially the one about physical location and 'grasping' conducted by plunkett and marchman, which kind of went over my head -- that is to say, i did not really understand it in its details, but i did not go over it again. maybe the class will be enlightening on that issue. the experiment dealing with the beam balance was very interesting, and there were two reasons that made me actually conduct it using tlearn. one was the pure fascination i was expecting from seeing that it actually works (despite the fact that i did believe the description); the other was that i wanted to experiment a little on the concept of hard-coded design of a network. tlearn properly learned the patterns after only 30 inputs and generalized pretty well (i looked at the connections between the layers and found very clear patterns representing fig. 3.20). however, this is largely attributable to the initial design of the network, which is based on the implementor's appreciation and interpretation of the problem. when i tried the same patterns on several other architectures, no equally good performance was observed. one could argue that this is related to the innate structure of a human's brain, but i am not willing to accept this 'excuse.' i would like to investigate networks that are nothing at the beginning, and then build themselves according to what they think is needed. the cascade-correlation network by shultz went a little in this direction, and i think i will follow this path in my final project.
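to show what i mean by "build themselves," here is a very stripped-down sketch of the recruiting idea in cascade-correlation (my drastic simplification, not shultz's actual algorithm): start with no hidden units, and whenever the output layer alone can't drive the error down, recruit a unit whose activity correlates with the remaining error, freeze it as a new feature, and retrain the outputs.

```python
# grow a net on XOR by recruiting hidden units tuned to the residual error
import numpy as np

rng = np.random.default_rng(1)
sig = lambda z: 1 / (1 + np.exp(-z))

def fit_outputs(F, y, lr=1.0, epochs=3000):
    # train only the output weights over the current feature set F
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        o = sig(F @ w)
        w -= lr * F.T @ ((o - y) * o * (1 - o)) / len(y)
    return w

def recruit_unit(F, residual):
    # candidate pool: keep the random unit best correlated with the error
    best, best_corr = None, -1.0
    for _ in range(50):
        v = rng.normal(0, 3, F.shape[1])
        corr = abs(np.corrcoef(sig(F @ v), residual)[0, 1])
        if corr > best_corr:
            best, best_corr = v, corr
    return best

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0.0, 1.0, 1.0, 0.0])           # XOR: unsolvable without hiddens
F = np.hstack([X, np.ones((4, 1))])          # inputs plus a bias feature
for _ in range(3):                           # recruit up to 3 hidden units
    w = fit_outputs(F, y)
    residual = y - sig(F @ w)
    if np.all(np.abs(residual) < 0.1):
        break
    F = np.hstack([F, sig(F @ recruit_unit(F, residual))[:, None]])
print(np.round(sig(F @ fit_outputs(F, y)), 2))  # hopefully near [0, 1, 1, 0]
```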
I thought that this chapter did a fine, if not rousing, job of describing the parallels between connectionist learning processes and human infant development. On the other hand, a lot of the examples seemed fairly contrived, especially in their input and training sets. Of course, this doesn't mean that their results are false; it's just that I'm not certain that they weren't picking problems with 'easy' solutions. Another area where my ignorance makes it hard to say something intelligent is the area of other learning methods. We all agree that nets develop very much like humans, but how about other error-reducing learning techniques (if they exist)?
I'm interested in the fact that many of the infant learning experiments in this chapter largely overstep the issues of object recognition or phoneme recognition. This is excusable, since they are more interested in characteristically cognitive behavior, but I'm curious about how the infant acquires the basic tools out of which its world is built. I know that the psychologists aren't party to this debate, but it feels like the old top-down versus bottom-up dispute. Of course the Munakata and Mareschal networks are very interesting to me because they seem to address the issues of object recognition and temporal continuity, but they seem to be happy to use regular backprop. I'm very curious: how does your brain decide that it was wrong and then work to minimize the error?
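For the networks, at least, the mechanical answer is an explicit teacher-supplied error signal: compute target minus output, and nudge each weight in the direction that reduces it. A one-unit sketch of that delta rule (my own illustration, not anything specific from the chapter):

```python
def delta_rule_step(w, x, target, lr=0.1):
    output = sum(wi * xi for wi, xi in zip(w, x))
    error = target - output                     # "deciding it was wrong"
    return [wi + lr * error * xi for wi, xi in zip(w, x)]  # reducing it

w = [0.0, 0.0]
for _ in range(50):
    w = delta_rule_step(w, x=[1.0, 0.5], target=1.0)
print(w)  # weights converge so that w . x ~= 1.0
```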
I think one of my strongest reactions to this reading is that there was a lot of information in it, but not much of it was really solid. I have already seen that connectionism is a workable approach, and I didn't really need to read 60 more pages of proof of concept. I suppose if one were being exposed to this for the first time it might be important, but I had even seen some of the examples this chapter goes into at length in my other reading. The other frustrating thing was that the descriptions were all very vague. I feel that the most important aspect of much of this type of network study is the actual training methodology used, and I feel like that was glossed over, especially in the more complicated networks, where a study of it would be the most interesting.
Even though I found the discussion frustrating, I still thought some of the later examples in the chapter were fascinating. I haven't really been exposed to much of the visual-processing side of neural networks, aside from a discussion of stereo vision and retinas, so the attempt to model object permanence was really interesting. One thing I thought when I looked at the Mareschal et al. network diagram was, "wow, diagrams of the brain look like this." Of course, this is still much, much simpler than the brain, but I think the attempt to do something other than three-layer feedforward networks is important, because our brain has considerably more than three layers. In this example especially, though, I REALLY wanted to know more about the error-processing/learning algorithm than they mentioned. I suppose that it is probably complex enough that it would have been out of place in this chapter, but I'm still very interested.
On the other hand, I thought that the study of the balance beam problem was a little bit forced, and there was too much pre-built into the network to really be a good guess at how the brain might do something like that.
The reading this week was rather interesting. I must admit I was more interested in reading about the responses of the children to the different stimuli than in reading about how the same scenario was encoded into a connectionist net. However, it was good to see practically how you might model something like that. (Good info for final projects, perhaps. ;))
One thing that has been bothering me a little that was brought up a little in the beginning of the chapter is what, if any, learning is possible during gestation? Even newborn infants have existed for some period of time. However, I would only think that this could really influence language/hearing type things. Also, I do not know when the ears of a fetus are developed. Are they usable? Can they actually hear anything? This adds to the uncertainty surrounding language "predispositions."
However, the visual stimuli are a different story. I'm pretty sure infants don't see anything until after they are born, so the possibility of object permanence at 3-4 months is very interesting. However, I felt a little let down by the book on this point. Either that, or I wasn't reading closely enough, but I never really felt that I got an explanation for why 3-4 month old infants are capable of object permanence. It just seemed that the book dismissed both the strong innate-knowledge story and the pure developmental-learning story. The chapter literally said, "It comes as no surprise to the reader that our connectionist perspective would suggest a more complex story." (pg. 158) Yes, that is no surprise. However, I never felt that they gave the story for what they thought was going on. Unless, of course, I lost it in the transition to computer modeling. An explanation of what they believe is the reason for children exhibiting some of these behaviors would be greatly appreciated.