CS/PSYCH 128 student reactions to week 8 readings

Week 8 Reactions


Martin

the past tense article was very straightforward in that it tore apart an experiment conducted on the past tense problem. it is astonishing how data can be falsely represented through slight modifications of small factors that do not necessarily stand out at first. why do plunkett and marchman juxtapose two graphs with the obvious intent of fooling the reader into seeing similarities, when the y-axis units differ between the two?

the constructivism article was very interesting. i must admit that i had not noticed that elman et al. actually failed to introduce models appropriate to their theories.

for my final project, i intend to do something along the lines of creating a system that can instantiate and tie in new modules if need be, that is, if the current problem cannot be solved with the hardware available at that moment. this is very similar to the theory that areas in the brain are not 'born' for a specific task but tend to settle into one. it is also, in essence, the idea behind 'starting small.'

obviously, each module in such a system would be hardcoded in some way, but once we accept that one such module could represent a single neuron rather than a set of them, this static quality becomes justified, since neurons are closely similar at that level.

in the second article, marcus also addresses cascade correlation. he lays out the imperfections of this approach: even though a network built by cascade correlation can approximate virtually any function, the approach is unrealistic because of time constraints and the inaccuracy that results from the approximation. here, instead of adding subparts to one network, an alternative might be to instantiate a second network alongside the first and keep the two as one entity.
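to make the 'grow the network' idea concrete, here is a minimal sketch of the constructivist scheme behind cascade correlation. real cascade correlation searches for a candidate unit whose output correlates with the remaining error and retrains the output weights each round; in this toy version (my own, not from the article) the two hidden units and their output weights are hand-picked for XOR, so it only illustrates the growth loop, not the search:

```python
# Grow a network for XOR one frozen hidden unit at a time, stopping
# when the error reaches zero. Candidate units are hand-built here;
# real cascade-correlation would discover them by correlation search.

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def step(z):
    return 1.0 if z >= 0.0 else 0.0

# candidate hidden units as (w1, w2, bias), frozen once installed
CANDIDATES = [(1.0, 1.0, -0.5),   # fires when at least one input is on
              (1.0, 1.0, -1.5)]   # fires only when both inputs are on

def predict(units, out_w, x):
    return sum(w * step(u[0] * x[0] + u[1] * x[1] + u[2])
               for u, w in zip(units, out_w))

def mse(units, out_w):
    return sum((predict(units, out_w, x) - y) ** 2 for x, y in XOR) / len(XOR)

units, out_w, history = [], [], []
history.append(mse(units, out_w))            # error of the empty network
for cand, weight in zip(CANDIDATES, [1.0, -1.0]):
    if history[-1] == 0.0:
        break
    units.append(cand)                       # install and freeze the unit
    out_w.append(weight)                     # output weight for the new unit
    history.append(mse(units, out_w))

print(history)   # error shrinks as units are added: [0.5, 0.25, 0.0]
```

the point of the sketch is just the shape of the algorithm: earlier structure is never retrained, only supplemented, which is exactly the staticity-plus-growth idea above.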

finally, marcus proposes his theory of how general-purpose generalization and categorization could be achieved. in a way i found his proposal to be something of a bad joke, for it stated nothing new. marcus says to restrict rules to certain groups of items only, and to provide methods for expanding the rules so that the set of items they affect can grow. if we take the word recognition problem, it is obvious how this already applies: the group of items under consideration contains words, but other concepts such as color are left out. the rules apply only to this group, but may be extended to include additional members (through the addition of new nodes, in this case). fair enough, marcus does argue that such classes should be at a much higher level, i.e. distinguishing nouns from verbs, but the idea is essentially the same.


David P.

On past tense: Marcus seems to be saying that comparing the network to a human is a bad idea, and I agree with him. The input to the network is different from what a human child gets. Additionally, the training regimen gets changed in the middle of the experiment for the simulation, deviating further from what a human would get. The environment is just too important in the development of the past tense to neglect the way the experimenters do. Also, you really can't compare the ages. The number of epochs that the network goes through doesn't necessarily have anything to do with how old it is.

Furthermore, the size of the vocabulary the network gets cannot come close to what the kid gets, nor can it compare in variety. It seems that these people are just fiddling with the input to get the output that they want.

On constructivism: I didn't really understand this paper. I can't figure out what Marcus is saying. At the beginning he is complaining about the models in RI, then switches to griping about cascade-correlation. I agree with some of his complaints, especially those about how all the examples used could be seen as rigged demos, because the creators of the models set up the training data in such a way that the output would give a yes or no answer. However, he doesn't present anything that works any better. It seems that he is saying that just because we can't make a network that really works the way the brain does, we should abandon this line of research entirely, which I don't agree with. Even if we're going down the wrong track, at minimum we are expanding our knowledge of how a useful tool works, and quite possibly making something that can solve problems in a very different way than humans do.


Jon

Marcus (1995) did well at shedding light upon the questionable graphing practices used by PM (1993). The most striking of these is the change in the x-axis variable between the graph of Adam's performance and the graph of the simulation's performance. The x-axis in the Adam graph represents age in months, while the x-axis variable in the graph of the simulation's performance is the size (in words) of the simulation's vocabulary. This change, according to Marcus, was not discussed by PM. Since there is likely not a linear relationship between these two variables, the lack of explanation for their use in PM casts doubt on the validity of the graphs as a basis for similarity between the learning of the simulation and that of a child. It appears that PM chose to use a kind of "sleight-of-hand" in their graphs to support their claims of similarity.

Marcus (1998) took aim at RI's stance against innate representation. It seems ironic that despite this stance, all of the connectionist models presented in RI inherently manipulate representations, i.e. the input and output each have arbitrarily assigned representations. What I found most damaging to the validity of RI's models is that for each particular problem, RI came up with a different network, each with its own set of constraints, architectural and otherwise. If RI were to have come up with a network that could, say, differentiate between nouns and verbs _and_ learn the past tense, such a network would seem to provide a more credible model of human learning.


Martine

These articles were very interesting because although I disagree with RI often, Marcus' comments made me want to defend RI's views. I guess this kind of dialogue is useful, perhaps even common in the psych/AI realm, but I couldn't help but wonder if this guy had anything better to do with his time than just critique the work of others. He made a bit of an attempt to create something of his own (instead of just criticizing), but it was less than 2 pages out of 30.

He takes serious issue with the absolutist views of RI, one of which being their argument against innate representations. I agree that this may be too much of a claim to reasonably make, but, in his response, Marcus doesn't really address the biological aspects of innateness at all. That is, if knowledge of certain things is innate, where is it stored and how is it constructed? At least RI _try_ to address issues of biological similarity. Marcus has only one example where a nativist acknowledges biology.

What's symbol-manipulation?

There is also a huge leap taken on page 158, where Marcus quotes Katz and Shatz. He uses the development of ocular dominance columns to support innateness. But there is a big gap between vertical or orientation dominance (the potential to be receptive to certain kinds of _physical_ stimuli) and complicated, abstract manipulations like past tense formation. I don't think he has come close to justifying the relevance of the quote. In addition, he accuses RI of using the "poverty of imagination argument." Yet the fact that structures have not been described is by no means trivial!!! Even though RI (as he quotes) make a similar statement, they are doing it at a much smaller scale. Marcus seems to be opening up all sorts of possibilities that may not be physically possible, no matter what structures we may someday find.


David A.

Well, Marcus really is a talented fellow. Most of the critiques of connectionism I've read so far have seemed to have big holes in their logic, or at least poor grasps of the situation and the material. Not so with Marcus. He seems to know exactly what he is talking about, and his arguments are good enough that I've had to rethink some things about connectionism that I otherwise might have taken for granted.

I think that my faith in the abstract representational abilities of neural networks has been shaken somewhat. I still think there is substance to the field, certainly, but his arguments against true generalization are convincing. It seems like he might have something when he says that if input is different enough, the network can't generalize. This is a large flaw. However, my belief is that if the input has enough content in it (say aural or visual data), as opposed to the arbitrary inputs designed by the neural network programmers, then more true generalizations could be made. However, only continued experimentation in the field will tell. I also feel that if we increase the complexity (i.e., number of layers, size, etc.) of the networks we work with, we might begin to see more interesting behavior. However, that might need to wait until we can come up with a learning method that works better for unsupervised (or at least semi-unsupervised) learning.

I don't quite see why his 'alternative' to connectionism is so alternative. It seems like simple rule-based processing to me. Is he being sarcastic? Sure, it seems like a good idea, but how does it work?

One other disagreement I had with him was over his claim that the inputs given to networks by their programmers are representational in themselves, so the networks can't be free of innate representation. For some networks this is true, but for the networks which use distributed representations of the input data, there is no real conceptual difference between that sort of input and actual sensory inputs.

My final thought after reading these two papers is that connectionists need to be extra careful to be scientifically rigorous in their work, because each time data is massaged into looking better than it really is, the field and the concepts in it look weaker. I didn't necessarily believe everything he says about the Plunkett and Marchman model, but he sure seemed to blow it out of the water, and to weaken connectionism in the process.


Nathaniel

I thought that many of Marcus' criticisms of the models presented in RI were valid, and exposed many weaknesses in RI's arguments. In general, because the examples in RI were aimed at people outside the field, they were glossed over and dumbed down in a rather frustrating fashion.

On the other hand, I was nonplussed by Marcus' own models and refutations. I don't know enough about the development of the heart to be sure, but I feel like he is taking advantage of my ignorance to push his comparison of the parallels between brain and heart development. And I know he is pulling something with his proof that an SRN can't generalize. It is very easy to exploit the architecture of an SRN to formulate a problem in such a way that it cannot solve it. Marcus adds a new input to the network and expects it to generalize, which is rather like giving a human the ability to smell radio waves. Whee.
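The trick with the novel input can be made concrete with a toy delta-rule unit (my own sketch, not the SRN from the paper): an input line that is silent during training contributes zero to every gradient, so the network learns literally nothing about it, and its response to that line at test time is just the random initialization.

```python
import random

random.seed(1)

# Three input lines; the third is always zero during training, standing
# in for the novel input that is only presented at test time.
TRAIN = [((1.0, 0.0, 0.0), 1.0),
         ((0.0, 1.0, 0.0), 0.0)]

w = [random.uniform(-0.1, 0.1) for _ in range(3)]
w_silent_init = w[2]

for _ in range(200):                       # plain delta-rule training
    for x, y in TRAIN:
        out = sum(wi * xi for wi, xi in zip(w, x))
        err = out - y
        for i in range(3):
            w[i] -= 0.1 * err * x[i]       # gradient vanishes where x[i] == 0

# The trained lines converge, but the silent line's weight never moved,
# so behavior on a novel input there is arbitrary, not learned.
print(w[2] == w_silent_init)   # True
```

So asking the net to handle that input isn't a fair test of generalization; it's a test of a sense the net was never given.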


Simon

Marcus's more recent paper is a very convincing argument against connectionism - assuming connectionism is what Marcus wants it to be. It sorta seems, however, that both he and Elman (if we want to put them on separate poles of the nativist/connectionist debate) have highly nuanced views of their side and a very "straw man" (to use Marcus's appropriate word) view of the other argument.

When Marcus cites RI's views on innateness (obviously a hot topic in the argument), he cites:

the strongest and most specific form of constraint... is representational, expressed at the neural level in terms of direct constraints on fine-grain patterns of cortical connectivity. This is, we think, the only kind of neural mechanism capable of implementing the claim that detailed knowledge of (for example) grammar, physics or theory of mind are innately specified.

Marcus then goes on to fight against the seeming accusation that nativists think that a neuron is born knowing what it is to do in later life (maybe grammar, physics, or theory of mind). His argument is that RI takes a straw man view of the nativist stance and then easily disproves the nativist agenda based on this view. I fully agree with Marcus, and that's why I asked for the references to defenders of the nativist argument - because RI makes it too easy.

I would also argue that Marcus sets up his own straw man for the connectionists. From RI:

Indeed, we believe that the connectionist framework can help us to rethink, rather than simply accept or reject, the notion of innately specified predispositions and to explore very different hypothetical starting points, be they chronotopic (i.e. temporal), architectural, and/or representational. (p. 114)

This view of innateness seems a little at odds with the revolutionary stance Marcus places the connectionists in. They do not deny that, in the process of exploring the brain and its reaction to stimuli, the models they create may begin with chronotopic, _architectural_, or representational biases that differ from the biases that stronger nativists would impose. The reason I emphasised the architectural in the list of new hypotheses is that the connectionists are perfectly aware that the architectural constraints of the system or model they use help to determine the output. That was the entire point of the receptive fields experiment when we tried to model translation invariance in the exercises. That was also the point of Elman's starting small experiment.

In summary, I think it's really easy to debate a point that you have a strong argument for as long as you feel the opposition has a weak point. It is a little harder to debate your side if you actually do believe that the other argument is pretty strong as well. I would be interested in reading some of the articles Marcus cites regarding connectionists who are not as "extreme" as RI is - he lists a whole bunch on pp. 156-7. The second Marcus article was an amazing debunking of the past tense experiments - it can stand on its own without my comments, but I do have to look it over again to make sure that its points really are as solid as they seem to be (I will never take graphs for granted again - I'll always draw lines at crucial points, and I'll look at the windows of time the experimenters used for the x-axis).


Craig

My big question with the readings for this week was: what exactly is constructivism? I felt like I missed a lot of Marcus's argument because I wasn't, and am not, certain what constructivism is.

It seems that Marcus's big problem with RI is his feeling that RI is proposing a very strong form of no-innateness. Marcus at one point said that "any evidence for the existence of innate representations would thus count only against the extreme position advocated in RI, but not necessarily against other, weaker versions of constructivism." However, I didn't think that RI always advocated a very strong absolutely-nothing-is-innate viewpoint. It seemed that at points RI advocated a position where certain general types of things were innate, perhaps the initial connections or weights of the analogous neural network. There were also points in RI that did seem to support a very radical no-innateness viewpoint, though. This leads me to question exactly what Marcus means when he says "weaker versions of constructivism."

On the other hand, I thought Marcus's argument about networks not really understanding the concept "noun" was very interesting. Not being able to generalize the idea of "an x is an x" etc. was rather thought-provoking. One of the thoughts that it led me to is: do humans really think of things in this abstract way, or do we merely make ourselves think we do? Also, just because an experiment that has so many variables fails to accomplish something does not mean that a network could not accomplish that something. Proofs by negation are very difficult if not impossible to statistically validate to absolute certainty.

Josh

It's obvious from the reading that Marcus is on top of things. He's a smart guy and he knows where the holes in RI's arguments are. After reading it, my faith in RI in particular is shaken somewhat, but at the same time I feel like I want to defend RI against Marcus. Why? Because he goes so out of his way to take down RI at every level and has so little constructive to say. By the time I reached his suggestion for an alternative to connectionism, I thought it was a joke: "hey, what if humans just do rule-based acquisition?" It's as if Chomsky were to republish one of his first papers under a different title. It's unhelpful.

As for Marcus's actual attacks on RI, some were quite powerful: I'm interested in working out just what he means by the idea that the information coded in the input and output nodes of a neural network is innately represented in the nodes. I, for one, think that's cheating: I've always considered this to be a function of the test networks existing in a void, designed for one particular task. Of *course* you have to "innately" represent the idea of "past tense" to a test network in a computer -- it has no other conceptions of the world at all to consider what "past tense" really means.

And hey, on another of Marcus's damaging arguments: I agree that it's damaging to current connectionist work that nobody has built a network that tries to, say, learn all of the English language. On the other hand, all of his arguments about how the network couldn't generalize to new input need to consider two things: a) that's sort of like asking Deep Blue, without any further programming, to play backgammon, and b) one of the reasons humans CAN generalize new input is that we have a lot more information about our environment and a lot MORE input than a network does. Which is not to say that a network given ten thousand inputs COULD generalize on the 10,001st, but that for these small networks living in a void, the training set is the whole universe. So it's more like asking Deep Blue to play water polo.

That said... do any of these non-connectionists have any better ideas than connectionism right now? Anything constructive to say? If Marcus has taken down RI's connectionism, what does he have to replace it with beyond rule-based acquisition?


Charlie

It seems that while both these authors have some valid claims against the types of models presented in RI, they both trump up these claims and misapprehend major features of the RI model. I do not doubt that some of their claims have validity; what I do doubt is the absolute certainty with which they pronounce the failure of the RI model, or, as I think they would be more accurately described, the RI models.

The attacks on the inability of backprop models to succeed at the task of past-tense formation first of all make some pretty nit-pickingly detailed requirements for what constitutes learning the past tense properly. Secondly, they don't seem to address recurrent, Elman-type networks, or any other possibilities beyond the straight backprop model, assuming that anything an Elman net can do, a plain backprop net can do as well. This is wrong. While the traditional learning scheme for an Elman net may not fit the task of past tense formation, the architecture certainly gives tools and a freedom of solution unavailable in traditional backprop models. The final problem I saw with this article is shared with the second one: a misunderstanding of the importance and meaning of the input and output representations.

Both of these articles make bold claims about the inability of networks to generalize in this way and that. The trouble is, because they have not (and, more to the point, cannot possibly have) looked at all possible network architectures, input and output representations, etc., their finding of a "robust" failure to generalize is nothing of the sort. There can be no proof by the absence of contradiction, and that is what this claim about the failure of networks is. There is nothing in RI that says that backprop, Elman, and cascade-correlation nets are the only types of nets that are useful. Quite to the contrary, there are numerous examples of other types of nets in the book. So why can both these authors take the failure of one or more of these types to be the failure of the whole class of ANNs? It doesn't make sense.

Also, a large part of their argument (the ba, da, ga thingie in the one, and the criticism of localist representation as failing to generalize in the other) has to do with the representation of the input. The reason for choosing a localist representation is that it is free from bias. That doesn't mean it's the best choice to use. Of course it isn't the best to use!! It's free from microsemantics and way too costly in terms of nodes. What these authors don't seem to realize is that localist representations, which are inherently symbolic, are a limitation accepted by neural nets designed to show how nets can be unbiased and still achieve hidden layer activations that clearly represent a marked difference between the treatments of abstract categories (such as noun and verb). With more layers/modules of neural nets to serve as input, and the actual input being the words themselves, the network would be freed up to do some of the things these authors claim it could never do, such as overregularize more than underregularize, or understand that ba, da, and ga are all similar, given more information about the nature of the inputs. Of course the only information that ANNs can make inferences about is what you give them, and the localist representation gives next to no information.
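To make the localist point concrete, here is a toy sketch (my own, with made-up phonetic features, not anything from the articles): under a one-hot code every pair of distinct items overlaps in zero features, so ba is exactly as unrelated to da as it is to anything else, while a feature-based code lets similar syllables share structure.

```python
# Compare a localist (one-hot) code with a distributed feature code.
# The feature dimensions for DIST are illustrative guesses:
# (labial, alveolar, velar, voiced, vowel-a).

def overlap(a, b):
    """Number of shared active features (dot product of binary codes)."""
    return sum(x * y for x, y in zip(a, b))

LOCAL = {"ba": (1, 0, 0, 0), "da": (0, 1, 0, 0),
         "ga": (0, 0, 1, 0), "ti": (0, 0, 0, 1)}

DIST = {"ba": (1, 0, 0, 1, 1), "da": (0, 1, 0, 1, 1),
        "ga": (0, 0, 1, 1, 1), "ti": (0, 1, 0, 0, 0)}

print(overlap(LOCAL["ba"], LOCAL["da"]))  # 0: localist codes share nothing
print(overlap(DIST["ba"], DIST["da"]))    # 2: shared voicing and vowel
print(overlap(DIST["ba"], DIST["ti"]))    # 0: genuinely dissimilar syllables
```

A network fed the LOCAL codes has no way to discover that ba, da, and ga belong together; a network fed the DIST codes at least has the raw material.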

Having said this, I would need more time to decide if the "X is a noun," "Y is a Y" thing is still a problem. But I still don't accept this proof by non-contradiction which seems to be the main line of argument against ANNs everywhere, even in the better informed and more serious articles.


Chaos

a lot of what marcus has to say about "rethinking innateness" seems to ignore some valid ideas that (i think) elman et al brought up. in talking about the concept of "representations", marcus ignores the idea that representations aren't just meaningless symbols, but that things like nouns and verbs are categorized as such because of some arbitrary number of features which they share in common (such as tendency to appear in similar places and contexts within sentences). thus, if a network can take a bunch of nouns and verbs *in context* and differentiate between them by successfully identifying some as nouns and some as verbs, then i think that's a meaningful accomplishment. marcus seems to think it's not. and i think there have been networks which have taken phonetic symbols which are encoded in some non-feature-based way, and managed to make meaningful distinctions between them, though i can't remember the specifics of the experiment, so maybe the encoding wasn't so "innocuous".
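as a toy illustration of the context idea (my own sketch, with a made-up four-sentence corpus): if each word is represented just by counts of its immediate neighbours, nouns already end up looking more like other nouns than like verbs, before anything is labeled "noun" at all.

```python
from collections import Counter

SENTENCES = [
    "the dog chases the cat",
    "the cat sees the dog",
    "a boy chases a dog",
    "the boy sees a cat",
]

# context vector per word: counts of immediate left and right neighbours
ctx = {}
for s in SENTENCES:
    words = s.split()
    for i, w in enumerate(words):
        c = ctx.setdefault(w, Counter())
        if i > 0:
            c["L:" + words[i - 1]] += 1
        if i + 1 < len(words):
            c["R:" + words[i + 1]] += 1

def sim(a, b):
    """Overlap of two words' context-count vectors."""
    ca, cb = ctx[a], ctx[b]
    return sum(ca[k] * cb[k] for k in ca)

print(sim("dog", "cat"))      # 5: nouns share determiner contexts
print(sim("dog", "chases"))   # 0: noun and verb share no contexts
```

so 'noun' here isn't a meaningless symbol handed to the system; it falls out of where the words occur, which is exactly the point i think marcus is ignoring.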

on the other hand, marcus does have a point about abstraction. that is, human beings are very good at "meta-concepts" - things like the idea of number which are valid in different areas. neural networks are not good at such things, and, in all our examples in class, have been very tied to the specific location and identity of each of their inputs, unless some sort of very rigid group structure is imposed to force them to generalize across inputs. one question is whether another feature-detecting network could act on the initial input vectors and somehow restructure them such that they could be used in a later network to test for things such as identity. i'm being very vague about this, and i have no idea how it would work. but i have a notion that maybe some things which seem to be more domain-general in humans than networks (such as the ability to recognize identities) might be, rather than pre-specified, simply tested by a separate module which is itself a network, rather than being a part of each network which needs to use the concept of identity.