The article on modular connectionist networks did well at bringing to light some advantages of a modular approach to designing a network architecture. The hybrid architecture presented, however, seemed to use the gating network as a central authority, controlling the contributions of all of the expert networks. One of the important points of connectionist modeling is to get rid of such homunculi, so I guess this wasn't connectionism in its purest form. Another problem I had with the architecture was that the gating network used a very convoluted system of equations to determine its output.
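To keep myself honest about what the gating network actually has to compute, here is a minimal sketch of how I read the architecture: the gate produces a set of coefficients via a softmax, and the final output is just a blend of the expert outputs. The layer sizes and the single-layer linear experts are my own simplifications, not the setup from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_out, n_experts = 4, 3, 2                   # made-up sizes

# each expert here is a single linear layer, purely for brevity
expert_W = [rng.normal(size=(n_out, n_in)) for _ in range(n_experts)]
gate_W = rng.normal(size=(n_experts, n_in))        # the gating network

x = rng.normal(size=n_in)
expert_out = np.array([W @ x for W in expert_W])   # shape (n_experts, n_out)
g = softmax(gate_W @ x)                            # gating coefficients, sum to 1

y = (g[:, None] * expert_out).sum(axis=0)          # blended final output
print(g, y)
```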
Despite the modular network's advantages in breaking down complex functions into simpler ones, I found the arguments comparing this network to the brain altogether unconvincing. It seems that the brain would have to have some sort of modular structure, but it would not be so simple to draw lines separating the modules. The connections between modules would also likely be much more complex, with modules taking input from other modules and feeding back to the gating network (if the brain even has an equivalent of one).
The article about the lesioning of neural networks was very interesting for two reasons. One, it seemed as if removing hidden layer nodes and/or erasing connections in the network affected the network in a way that vaguely exhibited dysfunctional human behavior. The actual data were only an approximation of the human brain effects, but that is something neural networks have to deal with in general: no neural network yet exists that directly and exactly models the human brain, and all networks are approximations of better or worse quality.

The second interesting point relates to the idea of concept categorization in human brains. Hinton & Shallice called their paper "Lesioning an attractor network [...]", and they describe the idea that semantic 'pools' draw the input data towards them. The output then corresponds to whichever of those attracting pools the input ended up in. The same theory exists for categorization effects. If presented with a penguin, an uneducated subject is faced with the decision to classify this animal as a bird or some other type of non-mammal. The theory suggests that there then exists a pool for birds and a pool for the other category (which may overlap a little), and the concept of a penguin is eventually sorted into one of the categories. On this basis, human beings judge categorization, or at least this is what the exemplar theory puts forth.
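A toy illustration of the 'pools' idea that helped me, though it is not Hinton & Shallice's actual trained network: a Hopfield-style net whose stored patterns act as attractors, so a corrupted input settles toward whichever basin it starts closest to. The patterns and sizes below are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# two stored "semantic" patterns acting as attractors (made-up 8-unit codes)
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])

# Hebbian weights with zero diagonal (the standard Hopfield construction)
W = patterns.T @ patterns
np.fill_diagonal(W, 0)

# start from a corrupted version of pattern 0
state = patterns[0].copy()
flip = rng.choice(8, size=2, replace=False)
state[flip] *= -1

# asynchronous updates: each unit flips toward the sign of its input
for _ in range(20):
    for i in rng.permutation(8):
        state[i] = 1 if W[i] @ state >= 0 else -1

# the overlap shows which "pool" the corrupted input was drawn into
for k, p in enumerate(patterns):
    print(f"overlap with pattern {k}:", int(state @ p))
```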
The article about modular networks I found very interesting too. After all, this is exactly what I want to do my final project about, so thanks Lisa and Robear for making us read this. I certainly believe that modular networks have great potential for universal task solving, but I was dissatisfied with two points that came up in the article. One is a general point aimed at the network architecture, and the other at the initial design. Let me start from the back. In section 5, the authors propose to incorporate knowledge of the task to be performed into the network by specifying the structure of some of the subnetworks. Although the authors discourage the reader from taking this too far, I think it is completely beside the point to let any knowledge flow into the system from the implementer's side. One strategy for modeling cognition and other systems with neural networks is to take the burden of implementation details completely off the shoulders of the implementer, and thereby reduce the bias that can flow into the system, so I believe that the modular 'system' should create and design its subnetworks completely on its own.
The other point addresses the emphasis the authors put on the subnetwork idea. In the what-and-where visual task, one could have used two completely separate networks to do the tasks independently, but the idea was to have one system develop an understanding of the task. The way that Jacobs et al. emphasize the subnetworks makes me think of two separate networks that are simply put into one box because the input and output layers are shared. Nevertheless, a clear separation of input and output nodes can be found, one set for each network. I believe that the whole system should not have such an obvious internal separation, unless it is used for analysis and debugging of the actual process that the network performs.
The last article kind of zoomed by ...
Task Decomp: Crosstalk is something I had heard about in passing and understood in general, but it is rather impressive exactly how drastic it can be. I remember all the hoops they jumped through with ALVINN to reduce temporal crosstalk. I am also remembering someone writing disparagingly that connectionists have such problems with crosstalk that they justify the hippocampus by saying it is an input separator, etc., for the brain. This seems to be the direction they are taking in saying that modularity is more biologically plausible. If they are right, it would mean that there is some gross pattern in the input which could tell the gating network which expert net to use. But if the input were more complex and the line between the two expert nets' areas of specialization were not very simple, the gating network would practically have to solve the problem in order to properly delegate tasks.
Symbol Grounding: Back in the land of ambitious claims, I am a little surprised at their audacity. They really are serious about drawing as many detailed parallels between the net and a child as they can. This brings up the question of the extent to which we will believe that a net models the brain: is it 'thinking like a kid' or is it just a gross approximation? This isn't a criticism of their model, which I think is really nifty, but rather of their haste to liken its behavior to a human's. On a certain scale I agree with them, but on the other hand it is hard to buy. Maybe I'm just having a midterm crisis of faith in neural nets.
They brought up the idea of over- and under-extension. I'm interested in how people who try to defend innate symbols (meaning static symbols; otherwise everyone is in agreement) cope with the shifting around that symbols clearly do.
Lesioning: I really like the idea of familiar concepts being attractors. It seems to be a very resilient and dependable system which is bound to give you a decent answer, even if you are presenting it with novel data. I'm also recovering my faith in the applicability of neural networks as models of the human mind and brain; these authors were much more careful about presenting arguments for the parallels they put forward.
I found these articles interesting, especially given the connections made to biological knowledge. The Hinton and Shallice article was very interesting, although I was a little skeptical of the assertion that if lesioning a network in a manner similar to that of brain lesions produced similar results, the appeal of connectionist networks as models would be strengthened. I was thinking about what would happen if you looked at some different domain, perhaps something less human-specific. Lesioning certain areas of a simple ganglion in an invertebrate would certainly cause a response. Would a neural network be able to mimic that too? It seems to me that lesioning might be a good method for "checking up" on your work, but I'm not entirely convinced that their case for similarity is particularly strong.
I really like the fact that Plunkett et al. brought up the question, "where does the univocal structure of internal structures come from?" I'm interested in, again, trying to answer this question for simpler, but still biologically viable, nervous systems. Are humans the only animals with symbolic knowledge? How then do birds, for instance, know to avoid a poisonous orange butterfly after they've tried to eat one before? Not only that, but they will avoid the color orange in general. Is this just a simple stimulus response? What is the nature of the structures of learning and memory that support this reaction? If it is to any extent developed over the course of the life of the animal, how does that speak to the development of human symbolic knowledge?
The last article mostly just left me wondering about how a module would be "encoded" within the brain. How would that come about without being prearranged? I'm not sure I can envision how a modular neuron group would arise.
Emergence of Symbols and Symbol Grounding
In good old-fashioned AI, it is assumed that perception and action are relatively easy when compared to cognition, and cognition is modeled as symbol manipulation. Given these assumptions, the most interesting question to me is: How are symbols formed (emergence) and linked (grounding) to phenomena in the world? This question is what drew me to connectionism in the first place. In AI the focus is on "in-the-head" problem solving, but in connectionism the focus is on linking perception and action through mediated layers of units.
The Plunkett et al. model seems to offer a plausible initial explanation of how labels and images might be linked to perform comprehension (i.e., hearing a word and retrieving its image) and production (i.e., seeing an image and saying the associated word). The model exhibited a number of well-documented developmental phenomena seen in children, such as the asymmetry between comprehension and production: initially children can understand more words than they can produce.
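To make the comprehension/production distinction concrete for myself, here is a tiny auto-associative sketch: an image vector and a label vector are trained together, and "comprehension" is tested by presenting the label half alone and reading off the image half. All sizes, data, and the single hidden layer are my assumptions, and the real model was trained differently, so this only illustrates how such a test would be run, not its results.

```python
import numpy as np

rng = np.random.default_rng(2)

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

n_img, n_lab, n_hid = 10, 5, 8            # hypothetical sizes, not the paper's
W1 = rng.normal(0, 0.5, (n_hid, n_img + n_lab))
W2 = rng.normal(0, 0.5, (n_img + n_lab, n_hid))

# made-up training pairs: a random binary "image" plus a one-hot "label"
images = (rng.random((5, n_img)) > 0.5).astype(float)
labels = np.eye(n_lab)
data = np.hstack([images, labels])

for epoch in range(2000):                 # plain backprop auto-association
    for x in data:
        h = sig(W1 @ x)
        y = sig(W2 @ h)
        err = x - y
        d_out = err * y * (1 - y)
        W2 += 0.5 * np.outer(d_out, h)
        W1 += 0.5 * np.outer((W2.T @ d_out) * h * (1 - h), x)

# "comprehension" probe: present the label alone, read off the image half
probe = np.hstack([np.zeros(n_img), labels[0]])
out = sig(W2 @ sig(W1 @ probe))
print(np.round(out[:n_img]), images[0])
```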
Although this model addresses the symbol grounding issue fairly well, it barely touches on the emergence issue. I would have liked to see some post-hoc analysis of the 50-unit hidden layer activations. What kind of clustering is going on? What sorts of abstractions are being made by the network?
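The sort of post-hoc analysis I mean would be something like hierarchical clustering over the hidden activations recorded for each label. The sketch below just uses random stand-in vectors and made-up labels, since I obviously don't have the trained network.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# stand-in data: one 50-unit hidden activation vector per trained label
# (with the real network you would record these after training)
rng = np.random.default_rng(3)
labels = ["ball", "cup", "dog", "cat", "bird"]     # made-up label set
hidden = rng.random((len(labels), 50))

# average-link hierarchical clustering over the activation vectors; the
# resulting tree shows which labels the hidden layer treats as similar
tree = linkage(hidden, method="average", metric="euclidean")
dendrogram(tree, labels=labels, no_plot=True)      # or plot it with matplotlib
print(tree)
```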
Modular Architectures
Just before the conclusion of this article, the authors state: "...we do not advocate providing the modular architecture with a large number of different expert networks. Rather, the experimenter should judiciously design a small set of potentially useful expert networks..." (page 245). To me, one of the most interesting issues with respect to modularity is precisely how a collection of various network modules would self-organize to solve a problem. By hand-picking an ideal set for each task, this bigger issue is sidestepped, and we probably can't learn much about how the brain might modularize. Could this model be modified so that the linking of gating networks to the output nodes could be emergent rather than pre-specified? That said, I do find this architecture useful as an engineering tool.
In the temporal crosstalk experiments, system 1 has 630 weights, system 2 has 1260 weights, and system 3 (the modular case) has 2124 weights (roughly 70% more than system 2). Using system 2 as a control for comparison with system 3 is somewhat problematic due to the much larger size of system 3. Perhaps just having so many more weights makes the problem easier to solve.
Lesioning Networks
Hinton and Shallice explored three ways of lesioning a working network that mapped letter strings to semantic features, trying to produce deficits similar to those seen in acquired dyslexia:
(1) disconnecting a certain proportion of randomly chosen links between two layers;
(2) adding uniformly distributed noise to every weight between two layers; and
(3) removing a specific number of units within a layer.
In every case, the lesioned network produced the errors typical of dyslexia: semantic ("mice" for "cat"), visual ("hat" for "cat"), and mixed ("rat" for "cat"). The authors stress that the connectionist framework provides a very different interpretation of the data than a classic information processing model would.
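For my own notes, all three manipulations amount to simple operations on a weight matrix. A sketch follows, with placeholder proportions and noise ranges rather than the values Hinton and Shallice actually used:

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(20, 30))        # weights between two layers (toy sizes)

# (1) disconnect a proportion of randomly chosen links (set them to zero)
prop = 0.2                           # placeholder proportion
mask = rng.random(W.shape) < prop
W_disconnected = np.where(mask, 0.0, W)

# (2) add uniformly distributed noise to every weight between the layers
noise_range = 0.5                    # placeholder range
W_noisy = W + rng.uniform(-noise_range, noise_range, W.shape)

# (3) remove specific units within a layer: zero out all of their connections
killed = rng.choice(W.shape[0], size=3, replace=False)
W_ablated = W.copy()
W_ablated[killed, :] = 0.0
```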
I have very little to say this week, which may not be surprising considering that I didn't read enough of all three articles to have an opinion. That needed to be said first, just to get it out there. I will respond to this week's reading next week, when I'm not completely insane.
The one thing that did make me think (when I was able to make sense of the jargon-babble the writers liked SO much) about the lesioning attractor networks article was the relation that all these projects have to what actually occurs in the human brain, and the attempt to model that explicitly. So much of what I've been thinking about recently, in terms of project ideas and network articles, centers around what I or other people can get networks to do, while this article really tried to focus on getting a network to replicate the behavior of acquired dyslexia patients, and once that was modeled, on analyzing what had to be done and what the relationship could be. The problem with this process is still the same problem that Marcus had: we really have no idea whether any of the parallels that we want to exist between ANNs and brains really do exist, or whether all the parallels we can draw between these attractor networks and brains are simply the wrong analogies.
I wish that people could write more clearly as well - I'm no genius, but I'm pretty smart - I shouldn't have to wade through every ridiculously layered sentence these writers like to use to get the information that should be apparent. Just an observation.
Well, I should have read the modular task decomposition paper before I worked on my midterm paper. If only I'd known. For one thing, they articulate a source of error which I identified, but not all that clearly. In my paper, I determined that a network with separate parts for each output performed better than a network with one large hidden layer. Now I know that this is because the smaller modular network was not subject to spatial crosstalk. While I do wish I had known this before, I feel sort of vindicated that others have drawn the same conclusions.
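What I mean by "separate parts for each output", sketched with made-up sizes: the modular version keeps the same hidden units, but a block-diagonal mask gives each output its own private hidden pool, so the error from one output cannot drag around the weights the other outputs rely on. This is my own toy, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(5)
n_in, n_hid_per_out, n_out = 6, 4, 3          # made-up sizes

# shared version: every hidden unit feeds every output, so the error
# signals for different outputs all compete over the same weights
W1 = rng.normal(size=(n_hid_per_out * n_out, n_in))
W2_shared = rng.normal(size=(n_out, n_hid_per_out * n_out))

# modular version: same hidden units, but a block-diagonal mask keeps
# each output's hidden pool private (no spatial crosstalk during learning)
mask = np.kron(np.eye(n_out), np.ones((1, n_hid_per_out)))
W2_modular = W2_shared * mask

x = rng.normal(size=n_in)
h = np.tanh(W1 @ x)
print(W2_shared @ h, W2_modular @ h)
```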
It is also exciting to see that there are easily implemented ways to perform modularization. I'm a fan of the theory that three-layer feedforward networks are just not a good solution to many problems, and again I'm glad to know that there are good, workable alternatives.
I suppose that this sort of modularization of networks could potentially even be a good solution to the spatial invariance problem of identifying something no matter where it is. If some sort of mediator could determine approximately where something might be, it could shift it into the correct area of the input to a main network, where it could be identified.
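A toy version of what I'm imagining, with a one-dimensional "retina" and cross-correlation standing in for the mediator; everything here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)

# a large input field with an "object" pattern placed somewhere in it
field = np.zeros(20)
obj = np.array([1.0, 2.0, 3.0])                       # the object's pattern
pos = rng.integers(0, 17)
field[pos:pos + 3] = obj

# the "mediator" only has to localize the object (cross-correlation here)...
center = int(np.argmax(np.convolve(field, obj[::-1], mode="valid")))

# ...and shift it into a fixed window that the main network always sees
canonical = field[center:center + 3]
print(pos, center, canonical)
```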
The second paper I read, Symbol grounding and emergence, while less readable than the first, was still very interesting. Unfortunately, it did not approach the problem of symbol manipulation directly, but even just allowing some symbolic representation to be associated with a fuzzy input is a good start. I do have issues with the way the 'symbols' were represented with a localist representation, as I don't think that localism has anything to do with good neural network function, but I imagine that the network would function just as well with a distributed representation. Perhaps it would even be possible to force the network to come up with its own 'symbol' for each group somehow.
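Concretely, here is the difference I have in mind between a localist and a distributed code for the labels; the labels and the codes themselves are made up.

```python
import numpy as np

labels = ["dog", "cat", "bird"]                 # made-up label set

# localist: one dedicated unit per label (as I read the paper's encoding)
localist = np.eye(len(labels))

# distributed: each label is a pattern over shared units, so similarity
# between labels can live in the overlap of their codes
distributed = np.array([[1, 1, 0, 0, 1],
                        [1, 0, 1, 0, 1],
                        [0, 1, 1, 1, 0]], dtype=float)

print(dict(zip(labels, localist)))
print(dict(zip(labels, distributed)))
```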
Even though I find the idea of attractor networks interesting, especially the way the third article uses recurrence to implement them, I was put to sleep several times trying to get through it. Dense and sort of dully written. However, I think that the idea of using attractor networks to assist in learning pattern mappings is a neat idea, and I may try to use it in my final project. I'm still a little uncertain how the training was carried out, though.