CS/PSYCH 128 student reactions to week 13 readings

Week 13 Reactions


Craig

The article by Shastri and Fontaine was very interesting, but a little confusing to follow. I think most of that was due to trying to read it very quickly. The effort they put into carefully explaining how they went about their experiment was obvious, and unlike last week's article by Hinton and Shallice, this one would probably have been easy to recreate from a careful read of the paper. I think it was very valuable to have read a carefully explained paper right after discussing (and illustrating) the importance of detailed writing.

I thought the experiment was a very interesting one. Obviously of practical value to keeping those postal workers sane. ;) I thought the problems they faced in determining what the proper number was were interesting. Obviously preprocessing was a big part of this project, and the crude network could be considered preprocessing in some ways as well. That leads to the question: how often have people tried to train neural networks to do the preprocessing on data? I don't know whether that would be advantageous in any way, but the idea seemed interesting to me. Finally, while looking over the network's attempts at classification, I found myself making the same mistakes on some of the zip codes. The network's accuracy in classifying whole zip codes (requiring every digit to be correct) was only about 66%. This might have been more interesting had they told us the percentage that actual postal workers get wrong when they don't consider context or any outside information other than reading the zip code.

Chaos

i was really pleased by the research done in the first article. specifically, they seemed more concerned with creating a simulation that could produce a certain behaviour (reading zip codes) than with showing that this could be done using a neural network. they identified aspects of the problem which they thought could be solved better by a neural network than by a prespecified algorithm, but implemented these parts in conjunction with a controlling algorithm and with modularized (and in some cases pretrained) networks. their program tells us nothing about how humans identify zip codes, but it seemed to work relatively well.

that said, i wonder what experimentation they've done since this report using this or a similar system. it seems that it would be fairly easy for them to implement some domain knowledge (such as a list of existing zip codes in the u.s.) which could be used for reality-checking. i also wondered how difficult it would be to develop a similar system for recognizing state codes (there are only 50 of them, after all, which is more than 10, but still relatively small), and then to integrate the two systems into some kind of self-checking system that could figure out where to send actual pieces of mail. if i'm not mistaken, though, the post office uses automated mail-readers already - presumably ones which use some of the techniques to which these authors compare their machine.
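to make the reality-checking idea concrete, here is a rough sketch of what a dictionary check might look like (python is just my own illustration - the paper describes no such mechanism, and the `VALID_ZIPS` list is hypothetical):

```python
# hypothetical zip-code "reality check" -- not from the paper.
# `scores` is assumed to be a list of 5 lists, one per digit position,
# each holding the recognizer's 10 per-digit scores (higher = more confident).

VALID_ZIPS = {"19081", "19010", "02138"}  # stand-in for the real usps list

def best_valid_zip(scores, valid=VALID_ZIPS):
    """pick the valid zip code whose digits the recognizer scored highest."""
    def total_score(zipcode):
        return sum(scores[pos][int(d)] for pos, d in enumerate(zipcode))
    return max(valid, key=total_score)
```

the same trick would work for the 50 state codes, which is presumably how the two systems could check each other.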

i'm really not sure what the second article is trying to accomplish. i see it as saying that dynamical systems are useful for simulating behaviour in ways different from computational systems (prespecified finite-precision state machines or turing machines). maybe i'm being a naive math person here, but does anyone really question this? my understanding is that computational biology uses dynamical systems to model all kinds of things, and has been doing so for quite some time. is he simply trying to make the case that ai people tend not to think in those terms? obviously, neural networks are dynamical systems - dx/dt is a function of x. but, given that, this article seemed mostly to be reiterating arguments we've heard before. granted, the specific modelling examples looked pretty interesting, but they suffered from the two missing pages in the middle of them.
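for what it's worth, the "neural networks are dynamical systems" point is completely literal in beer's case: if i remember his model right (i'm reconstructing this from memory, so treat it as a sketch), each unit in his continuous-time recurrent networks obeys something like

\tau_i \frac{dy_i}{dt} = -y_i + \sum_j w_{ji} \, \sigma(y_j + \theta_j) + I_i(t)

where y_i is the state of unit i, \tau_i its time constant, \theta_j a bias, \sigma the usual logistic squashing function, and I_i(t) any external input - so dy/dt really is just a function of the state (and the input).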


David A.

The autonomous agents paper was certainly interesting. It is great that he can use neural nets to do interesting and concrete things that involve actual real world phenomena. How he relates this to the basic theory of neural network computation is a little less clear; it is only clear that he thinks there is some basic theory. He waves his hands about dynamics, but what does that mean? Dynamics to me would just mean that the neural network interacts with the environment. Does he mean to say that it is this interaction which allows the network to function in a way relevant to the surroundings? I would think that it would be input from the environment, not output to the environment, which would be more important. Even if the action of the network causes change in the surroundings, it is still input which registers this in the network. He talks about setting up his leg controllers which only sometimes have feedback from the environment, but there has to be some teaching signal. What is that signal if not feedback?

As for the handwriting, I just have one question: how is his method of scanning the line of text and then distributing the input using time-delayed connections different from entering one block of the image at a time? It seems like the first hidden layer is still getting all of the input at once. Wouldn't it be easier the other way?
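To make my question concrete, here is roughly how I picture the two alternatives (a sketch in Python; this is my own illustration with made-up dimensions, not their code):

```python
import numpy as np

# a scanned line of text as a 2-D array: rows x columns.
image = np.random.rand(16, 60)  # made-up dimensions

# alternative 1: time-delayed input -- at each time step, the first hidden
# layer sees only a small window of recent columns (here, the last 4),
# with the same weights reused at every step.
def column_windows(img, delay=4):
    for t in range(delay, img.shape[1] + 1):
        yield img[:, t - delay:t]   # shape (16, 4): one step of input

# alternative 2: one block at a time -- the whole image presented at once.
block = image.reshape(-1)           # shape (960,): everything in one shot
```

My worry was that once all the windows have passed through the first hidden layer, the net effect looks a lot like alternative 2 -- though I suppose the weights reused across windows are what buy the shift invariance and keep the network small.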

Other than that, though, I think the handwriting paper is neat. It agrees with what was discussed in class: Smaller, simpler, interconnected networks, logically designed, can solve complex problems. The paper was a little bit difficult to work through, because of its free use of acronyms. But, it is a very impressive integrated system. I particularly liked the method of using a sigmoid function as a teaching signal. It is really the only way I've seen so far of dealing with teaching a temporal network to identify things.


Martin

beer's (man, i want that name...) article asserts the following near the beginning: because implementing the behavior of fluids on computers would make computationalism true of fluids, implementing human behavior on a computer would likewise make computationalism true of cognition. i hope he did not mean that. after all, a fluid (or rather, its behavior) is clearly defined and stated in physical laws, and therefore precisely predictable. clearly, a human mind is neither - not grounded in laws and not predictable. this is a bad simile to use when justifying the use of computers in cognition research, but i could not think of another one, other than taking examples of cognitive models and demonstrating how closely they resemble human behavior.

i like how he emphasises the independence of a computational model's internal structure from its performance. calculators can compute with numbers by shifting bits, which are a form of numbers, if you want to look at them that way. a cognitive model does not use cognition to mimic cognition, but its own inherent structure.

the application of the network in the domain of food chemistry was, i thought, mad cool. i adore chemistry and i adore connectionism...add the two and make me happier :-) also, training the legs of the robot later in the section is an interesting application. the whole idea of state-space graphs in multiple dimensions reminds me of the approach to problem solving where an environment (a task to be solved) is presented to the network in the form of a multidimensional, non-flat graph, and in order to find the solution, the network walks the graph by way of steepest gradient (nilsson, _artificial intelligence_, p. 77).
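just to spell out the steepest-gradient walk i mean, here is a toy sketch (entirely my own, nothing from the readings):

```python
import numpy as np

def error(x):
    """a made-up bumpy 2-d error surface."""
    return np.sin(3 * x[0]) + x[0] ** 2 + x[1] ** 2

def numeric_grad(f, x, eps=1e-5):
    """finite-difference estimate of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

x = np.array([2.0, -1.5])                 # start somewhere on the surface
for _ in range(200):
    x -= 0.05 * numeric_grad(error, x)    # step down the steepest slope
```

a network's weight space is just this picture in many more dimensions.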

it seems as if the decision to assess a model's behavior with respect to its own body rather than with respect to the real environment simplifies the problem. what is the difference between modeling the environment and setting the environment to be a predefined and known 'hull'? i don't really know why i am picking up on this, but it sort of caught my attention.

in the second article, i was unclear how the spatio-temporal approach to feeding input encodes the appropriate information for a traditionally static network. a network without recurrent connections could not care less about what happened a time step ago, right? i am confused...

and there are a lot of acronyms in this paper.

it was interesting to see how well the network learned to perform on 5-digit zip code recognition. i am wondering why the usps does not employ such a system yet. okay, 96% may not be accurate enough, but if we increase the rejection rate from 0 to 5%, we can achieve figures like 98%, which is a lot better. the rejected 5% could then be handled by a human being. hey, this is exactly where computers will replace human beings in the long run.
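the rejection mechanism would presumably look something like this (my own sketch; i'm guessing at the criterion, the paper may use a different one):

```python
# reject a digit when the network's top output unit is not confident enough.
# `outputs` is the 10-unit activation vector for one digit.

def classify_or_reject(outputs, threshold=0.8):
    best = max(range(10), key=lambda d: outputs[d])
    if outputs[best] < threshold:
        return None      # rejected: this piece of mail goes to a human
    return best
```

raising the threshold rejects more digits, but the digits that survive are more reliable - which is how 96% overall could become something like 98% on the 95% you keep.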

in terms of word recognition, i would say that it is easier to recognize full words than digits. the network can do the zip code task, and the word task should be a lot easier, despite the fact that the set of input characters is now 26 letters rather than just 10 digits. why is it easier then? because the output can be compared and pattern-matched against a dictionary. okay, this would not be a perfectly connectionist approach, but i do not think that pure connectionist models have the potential to solve tasks like this one by themselves.

oops, wait, the 96% was for the single-digit task...so for zip codes we have 66%, which is rather bad and unusable.


Nathaniel

Computational and Dynamical Languages for Autonomous Agents:

In addition to a really cleanly thought-out approach to training simple agents to perform real tasks, this paper offered one of the best justifications of neural nets (or dynamical systems) I have yet seen. Beer was very good at formally comparing computational models to dynamical models, and although I had understood how each of them worked independently, I had never really lined them up to see how they differed before.

Almost as a side issue, casually slipped in as passing comments, Beer had some very solid critiques of computational and symbolic AI. His point that just because humans can do computing doesn't mean that they are essentially computers nicely undermines one of the most basic premises of GOFAI. At least as I understood it, the reasoning went that at some level humans reason symbolically, and that therefore we could build a symbol-based model of that level. But Beer seems to suggest that humans may not really be reasoning in strictly symbolic ways: they only simulate it from time to time.

Another glancing shot at GOFAI is his statement that in a computational model the causal structure of the system must be isomorphic to some computation. Although he doesn't bring it up directly, this is a criticism of structural innateness.

Recognizing Handwritten Digit Strings Using Modular Spatio-temporal Connectionist Networks:

I enjoyed this paper because of the clarity and detail of its description of the system they constructed. And I was really impressed by the complexity of the system. This is probably the largest and most complex neural-net-based system that I've read about, and it is downright exciting to see people building these large systems. And yet it made me wonder how far we are from building anything that even remotely approximates a bird brain.

Those models that blew the doors off of this one in terms of recognition rates: what sort of systems were they?


David P.

I found that Shastri presented some interesting ideas about how to represent the data used for handwriting recognition. The idea of sweeping across the letter is one that I had not seen before. After thinking about this idea for a while, I wondered how I would use it to build a recognition system, and I found that I came up with the exact same system that they did. I like the modular approach they took, with a network set up to filter the input stream when the digits are touching. It makes sense to me to do it that way.

The Beer article started off quite interesting, but got rather bogged down. The stuff about computation was kind of interesting, but I didn't find it particularly enlightening. The one good point that he raises, though, is whether we should model systems as huge complicated things, or as collections of simple behaviors that work. Personally, I feel that there is a point where a distinction can be made, but I have no idea where that point is. For example, the insect robots do a really good job of simulating the movement of insects with simple behaviors and processors that control each leg. On the other hand, people walking is definitely not the legs walking by themselves. I think Beer is on the right track in proposing that maybe a lot of the stuff we see in the world is as simple as it seems.


Martine

I thought the readings this week were among the most interesting we've yet had, so I have a lot to say. In the interest of your time, I'm going to try and condense them a bit.

Even though it was sometimes hard to understand what all the different modules were doing in the Shastri and Fontaine paper, I thought that they did a good job of explaining their solution to a problem with real-world applications. The idea of looking at a spatial pattern as a spatio-temporal sequence made a lot of sense to me, and I was surprised (maybe even skeptical) that no one had thought of it before. I mean, words on a page are definitely static for us too, and yet reading is most assuredly a temporal process. While I was reading, I was struck by a number of things. The first was that I seem to remember hearing, sometime when I was much younger, that the mail was sorted by machine. Is this just a figment of my imagination, or is there already some mail sorting done by machines? And if so, how? Also, how would their system deal with international mail, especially mail that contains other 5-digit number sequences?

They also claim that their system has about 70% accuracy in the "worst case." First of all, I would be interested to know what the error rate of humans (or of however mail is sorted now) is. Secondly, it seems a little silly to me that their "worst case" means mis-identifying one digit. If this system were actually in use, missing one digit is practically as bad as missing all of them. Certainly, missing one digit at the beginning of the zip code is much worse, in terms of where the letter ends up, than missing two at the end.

Finally, I would have liked them to discuss other models in more (or any) depth. They give a lot of statistics (our system had this accuracy, theirs had that accuracy), but they didn't explain in detail what the differences between the systems were. And since their percentages were not that different, I wasn't all that convinced of the superiority of their system. Why can't they compare them anyway? Shouldn't they be able to compare based on the fact that their goals were the same?

The other article spoke directly to what I am interested in doing for my final project. I thought that the way Beer discussed the importance of the interaction between the "brain" and its environment, as well as its body, was fascinating, especially since many of the other networks we have looked at don't really address this issue. They seem to operate in a fairly isolated context, where learning takes place along only one "dimension" (i.e., abstract right and wrong) and is not affected by the constraints of the "body" it inhabits -- in fact, in most cases, the body and environment are not introduced at all, and the "brain" is expected to learn without these inputs. In my reaction to the first paper, I talked about the fact that there is increasing evidence that a great deal of function, even in "higher" organisms, is a direct result of a given morphology, either of the body or of the environment. Thus, some actions, such as walking (sidestepping) over uneven ground, have been hypothesized to operate without constant sensory input -- which seems to be what Beer found in his model (p. 138)! I guess I was most curious about how, exactly, this method differs from a connectionist model as we have seen from Elman et al.


Jon

Beer: The distinction between a modeler's use of symbolic computation to mimic the behaviour of a system and the system's actual internal use of symbols to produce this behaviour was quite enlightening. He also points out that "the brain is no more obviously a computer than is a thunderstorm, a solar system, or an economy." We can model to some extent the behaviour of all three of these systems using various equations and computer-based simulations. However, this does not necessarily imply that they themselves use these types of symbolic equations to produce their behaviour. A good example of this is the motion of planets in their orbits around the sun. These elliptical-shaped orbits were modeled quite well by the mathematical equations for ellipses. However, Newton's Theory of Universal Gravitation later showed that the elliptical shape of the planets' orbits was an emergent property arising from the universal attraction of bodies of matter. Subsequently, as pointed out in the article, phenomena, such as the precession of Mercury's orbit, that were previously viewed as anomalies in light of Newtonian mechanics, were accounted for by theories of relativity.

Shastri/Fontaine: This article took a hybrid approach to recognizing handwritten digits, a task with many practical applications. The spatio-temporal method of scanning the image seemed to offer many advantages, one of which was to make the network easier to analyze. It seems that this type of hybrid approach makes much more sense than a more hands-off, purely connectionist approach when trying to tackle a real-world problem like recognizing zip codes.


Ben

1. Recognizing Handwritten Digit Strings

I like this model as an example of an engineering solution to a practical problem. The authors don't claim to be putting forward a model of how humans perform the task their network does -- they just get the network to do it. The system they designed incorporates a lot of features which really intrigued me. First, the hybrid overall architecture the authors used draws a good line between what neural networks are really good at and what symbolic decision makers are really good at. As they said, their architecture would support the symbolic-end incorporation of additional domain knowledge, such as what-follows-what statistics, etc. I would have liked to see further investigation along these lines.

Second, I was really intrigued by the authors' use of a temporal dimension because it really does solve problems such as length variance and shift invariance in a very natural way. As the authors say, it is also more true to human image processing in a variety of cases -- humans can't take in a complex image in its entirety, but scan from area of focus to area of focus. How much smaller this approach makes the network is an added plus.

As to the authors' use of pre-programmed feature recognizers, I think it's a good idea as long as it works, but I'm interested to see how the results compare when the feature recognizers are (a) not included or (b) allowed to change during training along with the rest of the network.
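For (b), the change would be mechanically tiny in a modern framework -- a sketch (assuming something like today's PyTorch; the module and layer sizes are hypothetical, not the authors'):

```python
import torch.nn as nn

class DigitNet(nn.Module):
    """Pre-trained feature recognizers feeding a trainable classifier."""
    def __init__(self, feature_net: nn.Module, train_features: bool):
        super().__init__()
        self.features = feature_net           # the pre-built recognizers
        self.classifier = nn.Linear(64, 10)   # made-up sizes
        # fixed recognizers (the authors' setup) vs. option (b): let them adapt
        for p in self.features.parameters():
            p.requires_grad = train_features

    def forward(self, x):
        return self.classifier(self.features(x))
```

The interesting empirical question is whether further training would erode the hand-built features or refine them.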

I was really glad to see that the authors found a nice, huge data pool. I was quite disappointed with our past-tense friends when I heard that their real-world data was from only a few children. It's really important to have a large enough data pool. Overall, while it's not going to open any major doors to an understanding of the mind, this is a very practical, well-designed system.

2. Computational and Dynamical Languages

This was an interesting discussion of what it means to say that a system is computational or dynamical. To a certain extent, I think that it's quite easy to overplay the difference -- computational theory is quite general over discrete input-state-output systems, just as dynamical theory is general over a broad class of continuous input-state-output systems, and any such continuous system can be approximated arbitrarily well by some sequence of increasingly complex discrete systems. So what does this have to do with "what's really going on in there"? What's really going on in there is that what comes out of the system has some relationship to what has gone into the system in the past and to the system's state, and this relationship is the result of the components that make up the system. In a general sense, talking about a general system, that's all you can say.
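To spell out the approximation step (my own example, not the author's): a continuous system \dot{x} = f(x, u) can be Euler-discretized with step size \Delta t as

x_{t+1} = x_t + \Delta t \, f(x_t, u_t)

and shrinking \Delta t recovers the continuous dynamics to any desired accuracy, which is the sense in which discrete systems can approximate continuous ones arbitrarily well.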

So, in order to understand a system as a cognitive system, you have to impose some structure on it -- you have to see a structure in the way the input and the output are related, and then look for some reflection of that structure in the organization of the system's internal state. The question the author is asking is "what kind of structure?", and I have two responses. The first is, "don't always expect the same kind of structure, but don't ignore what kind of structure you do find," and the second is "what's the difference?"

First, about not always expecting the same kind of structure. Consider us humans. Today, I walked to math class. Like the author of this article, I am very skeptical about the prospect of a computational account of my ability to walk. At some level, even if someday someone showed that the brain is just a binary computer (don't bet on it), the *best* description of locomotion would still be dynamical. Then, I got to my math class. I am equally skeptical of a purely dynamical account of my ability to make sense of mathematics. Even in the face of a dynamical description of the total brain (much more likely), I still think there would be a level of description on which I *really am* operating with symbols -- perhaps fuzzy ones, but symbols nonetheless.

In between locomotion and mathematics are a wide range of activities which humans -- and other cognitive creatures -- do in the course of their daily lives. Some may be best described dynamically, some computationally, and some may lie well enough in between that neither approach alone really does them justice. The theories "it's really a digital computer" and "it's really an analog computer" might not be independently falsifiable, both because *it* could very well be somewhere in between and because, in the limit, digital and analog computers are functionally equivalent.

So, what about the author's models? I'd like to know more about the optimization system he used, but I think his models explore problems that are good examples of what *is* best describable in dynamical terms. I want to see a picture of the behavior of the less common solution to the chemotaxis problem! The more common solution is fairly intuitive, but the other one sounds interesting. With the walking insect model, I found the comparison of systems evolved with proprioception (sensory input of where their own legs are) sometimes, always, or never available fascinating. As mentioned, the tripod gait seems to be the best solution for six-legged walkers. What about eight legs? Dozens of legs? What if the legs differ greatly in length? This could be extended into a really interesting biomechanics problem. The chemotaxis model could also be extended. The world of dynamical systems is fertile ground for further research -- let's just remember not to get too preoccupied with what ultimately comes down to a choice between different kinds of mathematics.


Josh

Beer: Mmmm.... beer. I have to say, first off, that this reading would have gone much better for me if I had a fundamental grounding in dynamical systems theory. Since I don't, this paper appealed to me much more in the opening sections. (By the way, if anybody can recommend good introductory reading on dynamical systems, I'd be much obliged.) So I was very pleased to see someone question the computational model of physical systems -- mostly for pointing out that when we throw ideas about computation around we're frequently confusing our definitions, and also that the computer is merely the most recent model we have for the operation of the brain ("Descartes had his water clocks," etc.). But it seems to me he's giving AI too little credit -- from a theoretical perspective, neural networks seem themselves more given to a dynamical-systems analysis than to a Turing-machine-style computational one. Nevertheless, it's nice to call the big picture into question. Another interesting choice here is his use of a GA to evolve recurrent neural networks for each module in his automata. He's careful to say that his approach works for any such dynamical system, but he never defends why he chose that method of creating the networks -- is there something inherent in it that makes dynamical systems work better with dynamically evolved networks?

Shastri and Fontaine: Now this is an appealing process, and a network construction that I understand. :) The main thing that appealed to me here is the idea of not using neural networks for the Greater Glory of Humanity and Cognitive Science, but rather the refreshing idea of evolving a heuristic expert system (of, shall we say, variable ability to generalize). Cool stuff, and a welcome alternative to both the "here's something that humans can do that neural networks also can do!" and the "here's something we can do with neural networks that we don't care about doing, really" approaches to connectionist research we've seen this semester; see the Beer article again for why these approaches are troublesome.


Simon

The handwriting article described a system not unlike one that I used for my final AI project, in which several small neural network modules were trained to recognize different patterns in the input (in their case, slashes; in mine, patterns on a 3x3 grid). These modules then had their weights fixed and were nested within a greater network. The way in which my model differed from theirs was that they had updatable connections running from the input units to a hidden layer parallel to the fixed networks. The purpose of this active hidden layer was unclear to me; maybe it was intended to allow the network to process information outside of the fixed networks' domains -- information like how the features might relate to each other.

What would interest me even more would be if the network identified its own modules -- this is what we're going to try to do with my existing architecture, hopefully. What this means is that the backslash detector and the bar detector would not be created and trained by the experimenter and placed in the architecture, but would be delineated by the ANN, and trained by it.

Which may not be possible with existing ANN algorithms -- an approach I like is using GAs to develop the architectural information for the network. In the insect walking article, Beer used a GA to develop the connection weights, the biases, and the time constants for each layer, instead of using traditional backprop through the sensor information. This didn't seem to make much sense to me -- isn't there a normal error space associated with the walking problem or the chemotaxis problem? CAN you use gradient descent for this application?
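For reference, the GA side is easy to sketch (my own minimal version, not Beer's actual setup -- his encoding and fitness function were more involved):

```python
import random

def evolve(fitness, n_params, pop_size=50, generations=200, sigma=0.1):
    """Minimal GA: rank selection plus Gaussian mutation over weight vectors."""
    pop = [[random.gauss(0, 1) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # best genotypes first
        parents = pop[: pop_size // 5]        # keep the top 20%
        pop = parents + [
            [g + random.gauss(0, sigma) for g in random.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
    return max(pop, key=fitness)
```

Here `fitness` would decode a genotype into network parameters and score, say, how far the simulated insect walks -- and that, I suspect, answers my own question: the score comes out of a whole behavioral episode, so there's no per-step, differentiable error signal for gradient descent to follow.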

We've been exposed enough to the dynamical systems hypotheses to side with Beer almost automatically against the computational theories of cognition, but I wonder if his use of GAs to develop his networks might have taken some of the robustness out of his argument -- it seems to me that backprop, because it uses completely local information and nothing holistic, might argue more strongly for dynamical systems than a GA, which relies on holistic changes in genotype to follow the error to its minimum.

Beer's argument in his conclusion is contradictory to what he said earlier, and also brings up Charlie's point from Rethinking Innateness about the partial differential equations chapter ("Look, it's magic!"). Beer writes: "In general, there need be no clean decomposition of an agent's dynamics into distinct functional modules and no aspect of the agent's state need be interpretable as a representation. The only requirement is that, when coupled to the environment in which it must function, the agent engages in the patterns of behavior necessary to accomplish whatever task it was designed for." So by this argument, what happens between input and output can't be interpreted -- which contradicts the rule-extraction procedures we read about two weeks ago, which attempted (somewhat successfully) to assign an algorithm to the functioning of a network. Just because dynamical systems are complex doesn't mean they are irreducible, or that there should not be any attempt to reduce them to understandable chunks. This isn't an argument for computational theory, just an argument against dismissing the interaction between input and output as too complex to consider.


Nik

Every now and then, I think that the Shastri and Fontaine project is what neural networks should really be about. Ignore biological plausibility, don't try to emulate a standard cognitive task, but use neural networks as an extremely useful computational tool for engineering applications. The network they built is one of the most impressive I've seen as far as being extremely complicated but still well thought out in design. I don't think they ever mentioned exactly how they trained each of their modules; given our recurring discussion of the subject, it would be interesting to find out how they did it. The Beer article is the other extreme, simulating what should be the easiest possible type of life to simulate: insects, which for the most part act completely on instinct. As a study in biological plausibility, it would be interesting to simulate the behavior of one of the more deterministic insects using an evolutionary search of possible architectures, then map out the insect's nervous system and see how well the two correlate. Once again, I have the same complaint as usual about the over-glorification of dynamical systems: training in response to an environment simulated by a dynamical system may or may not have any relation to responses to the real world, since there is always the possibility that the system could find the underlying dynamical equation, no matter how cryptic it is to us humans. On the other hand, if it pulled that off while training in the real world, that would be highly cool, if we were able to extract it.