I was taken a little off-guard by the reading - just a lot of references to concepts I have been introduced to only briefly, and references to projects that, in the case of the newsgroup papers (Jakota), the reader is expected to be familiar with. Besides that, I found a lot of informative material and interesting insights in the articles, especially the analytical machines reading, which talked a lot about some of the more confusing concepts of analysing NNet behavior (cluster analysis and principal components analysis). The author's take on what NNets don't know I found confusing and not pertinent at first - the idea that a NNet doesn't KNOW what a vowel is in the same way that a frog doesn't KNOW what a fly is just completely passed me by, because I actually (heaven forbid) think that a frog can conceptualize about a fly, instead of contextualizing it. Maybe I should start a new paragraph.
Part of the argument about how the NNet doesn't know the information it seems to contain is that it can't be manipulated when the context changes - the example used was that of a loan-approval NNet that was trained in a boom economy and couldn't be changed for a bust economy unless it was retrained - because the "knowledge" is distributed. This argument doesn't carry over to the frog, I feel - a frog anywhere, even transported to a completely different environment, will snatch a fly if it senses one. If the argument is more about the context of the fly, and we ask, "what would the frog do if the fly were completely the same but didn't buzz its wings when it flies anymore?", then really we should consider the case of a human - would you eat something that is slightly different, like milk that was exactly like milk only it was blue? With this argument, granted, I'm removing the larger aspect of the assumption of knowledge, but I think the concern is more about the researcher's bias in wanting to dictate that the NNet has a symbolic ("vowel", "consonant") understanding of the input set, which the chapter is all about arguing against.

My feeling is that for us to begin to understand what the NNet "knows", we have to play the devil's advocate and ask: what if our fundamental understanding of the input set from our human senses is contextual, is distributed, and is the same as a NNet's? Then are we wrong in saying that the NNet "knows" a vowel - and are we just as wrong in applying the same criteria to our own understanding and saying that we "know" a vowel and a consonant? The direction the author took along this line, moving into Karmiloff-Smith's ideas about fundamental units of understanding and our developing abilities (phases) to manipulate these units on higher and higher levels, was what we began to talk about the first day of class, and I think it's really interesting. I would be interested in reading more from this book, if only to read (apropos my intended final project) the promised section on networks that assign units of understanding to different smaller subnets within the larger net.
I also found the Meeden article interesting in its references to the forcing of abstraction from the hidden layer, which I commented on in last week's reaction. The RAAM network is a fascinating example of an analog memory, holding 27 words and a simple grammar in a few weights, and even achieving 80% recall on novel material. I would be interested in seeing how complex a grammar the net could retain - the simple noun-verb-noun and noun-verb grammar seems to be useful for the applications the system was put to, but I would like to see how robust that system is in a larger domain - and how it might further segment the hidden layer activation space.

2/6/99
the first article that i read was clark's associative engines. i liked it. i mean, it was bound to be good after citing the wonderful example of the tank and the sun--it just amuses me every time. the fact that neural networks can learn almost anything has slowly but definitely made its way into my appreciation by now. i simply accept it, and that is good, because there is more to be learnt. clark's article was the first article that demonstrated to me how stupid and static a neural network is once it has learned. take the example of nettalk: nettalk knows exactly how to translate written text into speech sounds, and 'is aware' of the voiced, sonorous, and syllabic nature of a vowel, but could not possibly be modified to distinguish between vowels and consonants, a task that is a prerequisite for a human reader.
i was also very interested in the competence theory. it seems that in many applications, neural networks are overly dependent on the underlying domain of the application. at first this seems perfectly right, but it distances the neural network from the human mind. in the example of ohm's famous law, the network may well know how to calculate any of the variables given the other two, but it can only do so because of the hard-coded formula v = i.r. any other formula could not be calculated using the same mechanism, even if the formula were something as simple as p = f.d. the crucial point is that for a network the actual formula is only an external fact (next to the inputs), whereas a human mind employs methods to decode the symbolic representation and to compute simple arithmetic--notions that are unknown to the neural network. furthermore, if the problem were to calculate the voltage across a 1 ohm resistor with a current of 1 amp flowing, even my little brother could deduce the answer 1 volt from the formula and the input, using his (limited) recollection of arithmetic (he's 10). a network, however well trained, will never arrive at the solution 1x1 = 1 but would possibly consider the case 2x1 = 2 and note that the answer decreased by the same amount as one of the inputs and hence come up with the answer 1.
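just to make that contrast concrete for myself (a sketch of my own, not anything from clark): the symbolic route treats v = i.r as a formula that can be rearranged on demand, which is exactly the handle a trained network never gets.

# my own illustrative sketch, not from the reading -- solving ohm's law by
# rearranging the formula, the way a person would. a trained network only
# maps the input patterns it has seen; it cannot rearrange the formula itself.
def ohms_law(v=None, i=None, r=None):
    """return whichever of v, i, r is missing, using v = i*r."""
    if v is None:
        return i * r      # v = i*r
    if i is None:
        return v / r      # i = v/r
    return v / i          # r = v/i

print(ohms_law(i=1.0, r=1.0))  # -> 1.0 volt, deduced straight from the formula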
the last thing on my agenda for this article is the concept of non-conceptual contents, as in the frog 'i-see-a-fly-over-there' example. i have never given enough thought to the idea of how knowledge is represented in a network, but this example makes perfect sense. it sort of ties in with the chinese room argument and reinforces the point that background and conceptual content knowledge is not necessarily required for problem solving, a fact which is heavily exploited by neural networks, but also found in human beings from time to time. moving on... reading the introductory paragraphs for the raam concept was very interesting because i currently study encryption algorithms on my own, and the technique employed for compression in a raam is essentially the same method as a newly proposed encryption algorithm (by a German scientist) called affe (German: monkey), which stands for associative feed forward encryption (very inventive name, huh?). Konrad Schily, the guy who proposed this algorithm, is an AI person and seemingly used AI techniques for such mathematical operations as encryption. the new method does not have good prospects yet because it is way too slow and experts do not attribute much potential to it.
the rest of the article on raam was very interesting and clear, giving a nice introduction to this simple yet powerful concept. i did not like the reading dealing with the email discussion about the symbolic theory. in my opinion it is necessary to be part of such a discussion in order to appreciate it, not just to passively read what others wrote. there is a difference between a paper and a message addressed to such a discussion, and the latter makes me want to contribute, although this is not possible in a static reading like this one.
The subsymbolic paradigm seems kind of like magic. How do you know when you have gone down far enough?
The thought that each thing we use as a symbol is actually made of many smaller concepts is fine, but the longer you consider each symbol, the deeper and deeper you can go. How do you decide when you have reached the lowest level that is necessary? Does it depend on the application? I like the idea that in symbolic processing the symbol is just a label, while in subsymbolic processing it's more than just a pointer to some other location where the idea is stored - it is fundamental to the idea itself.
This fits with the stuff in the web reading, where breaking the net down into little pieces destroys the connections between the ideas.
RAAM questions:
RE: The recursion that collapses everything into a single node. Does this mean that the entire tree is stored in one node, and if so, does it take up less memory than the tree? (Like changing where the parens are for grouping - just looking at different levels.) See the sketch after these questions.
The sentence operations are impressive, but I'm not sure how they happened.
I'm amazed that RAAM was able to flip the sentences around the way it did ("cheetah chase tarzan" to "tarzan flee cheetah") without knowing grammar.
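To make the "collapses into a single node" idea concrete for myself, here's a rough structural sketch (my own, with untrained made-up weights, so it only shows the shape of the computation, not a working RAAM): the encoder squeezes two fixed-width child vectors down into one vector of the same width, so applying it recursively leaves the whole tree as a single hidden-layer pattern.

import numpy as np

# structural sketch only: random, untrained "encoder" weights, just to show
# how recursive compression collapses a tree into one fixed-width vector.
DIM = 10
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(DIM, 2 * DIM)) * 0.1   # hypothetical encoder weights

def encode(left, right):
    # two DIM-wide children in, one DIM-wide parent code out
    return np.tanh(W_enc @ np.concatenate([left, right]))

# ((tarzan chase) cheetah) as nested pairs of made-up word codes
tarzan, chase, cheetah = (rng.normal(size=DIM) for _ in range(3))
tree_code = encode(encode(tarzan, chase), cheetah)
print(tree_code.shape)   # (10,) -- the whole tree lives in one DIM-wide pattern

So the memory question comes down to whether one DIM-wide vector is smaller than the tree it stands for, and whether the decoder can actually pull the tree back out of it.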
It seems that a lot of the people in this reading feel that trying to break down a connectionist network into little pieces that can be understood individually destroys the ability of the network to function without bias. I kinda agree and kinda don't agree, because on one hand, to have a truly functioning neural network it would have to be so big that we couldn't know all the states of the neurodes at any one time because of the non-linearities inherent in the net. On the other hand, we need to decide ahead of time what function each neurode is going to use as its triggering function. I seem to remember something like this debate in "Zen and the Art of Motorcycle Maintenance", where being able to take apart a motorcycle is not the same thing as being able to use it. A lot of the people in this publication seem to feel the same way: that knowing how to break down a network is not the same thing as getting it to do what you want it to do.
These readings were interesting - it was nice to read a range of angles on this topic.
I'm still not sure I understand the concept of subsymbolic processing though. I understand that the representation can be a combination of factors that change over time, and that somehow they describe what it is they represent, but I'm not sure I'm clear on just HOW they represent something, or how the specific subsymbols are chosen or changed in a given situation. Is there an example of this in language or in our life? Humans seem to be very firmly grounded in symbolic knowledge, so how does the technique of subsymbolic representation map onto the physical realities of the human (or whatever) brain?
Another thing that especially interested me was the concept of generalization. Generalization seems to be the biggest advantage of a neural network, or at least one of its big selling points. But what is the "end goal" of a neural network? What kinds of problems are we trying to use them to solve? For instance, if we were to apply a connectionist paradigm to the travelling salesman problem, how could generalization help us to solve it? I thought the section on word/sentence processing was neat, and I wondered how other computerized decoders work. Are they all neural network type paradigms, or do they follow a more digital construction?
I read with some excitement the RAAM stuff, simply because I thought at first that perhaps it could be leaning towards an answer for my most recent "What's up with AI?" question, namely "How do humans achieve symbolic representation internally and do any of the current AI paradigms even approach duplicating it (whatever it may be)?"
Well, I'm unconvinced by RAAM, although it was six years ago and it is a step in the right direction. When I did my final AI paper last semester on natural language processing, I was frustrated by all those symbolic algorithmic grammars trying to solve the problem because they weren't doing any learning. And it seems that largely what RAAM accomplishes is the ability to have a system that works on the symbolic level but, because it uses subsymbolic representation, can "really learn." Which is cool. What it seems to do for, say, language is show at a basic level how a system that already knows some stuff can learn more stuff based on it.
But what about learning the first stuff to begin with? I'm talking innate language stuff again: what are the steps that need to be taken for a system that is being handed all these "words," all these strings, to begin creating connections between these words and form a "grammar"? Must this grammar be algorithmic, as shown here? Or can it be done heuristically? I'm concerned (no offense to those reading this who might be co-authors of the paper) about the idea that repeated attempts at "filling the Gap" will eventually result in a system which perfectly does so; this bears a striking resemblance, for me, to the GOFAI approach of just trying different symbolic representations until one perfectly works. Or am I misunderstanding the point?
Regarding "What Networks Know" -- yes! I'm increasingly troubled by the way the purely connectionist approach ignores two things about human mental function: temporal considerations and internal symbolic representation. I guess that'd be the Gap, of course, but is the Blank paper right? Have more complex and ambitious systems arisen in the last six or seven years that succeed better in filling in the Gap?
The one problem I have is that I feel somewhat overwhelmed in trying to react to this week's readings. I'm not going to try to react to everything; I just had a few thoughts:
- It was interesting to note that the complaint that NETtalk didn't really know the difference between vowels and consonants, because it could not access its knowledge, did not apply to RAAM, as can be seen. The RAAM "detector" clearly solves the problem posed on page 72 of the What Networks Don't Know article.
- I was really fascinated by the RAAM model's ability to perform so well on the wide variety of tasks set out for it, especially in light of what seemed to be a flawed internal representation (at least from my own classical level 1.5 schema) of nouns and verbs.
- I like the fact that all the articles are giving ways of further describing what we see in individual candidates for cognitive activity. The presence of a more highly articulated manner of describing what is going on cannot but help us in describing the field of artificial intelligence.
- I also liked the way in which the Clark article used the (somewhat over-dramatized) inversion model to describe the contribution of AI to cognitive science in a more general way. The idea that by studying the actual task-based constraints, independent of our biases, we can form a more detailed and accurate picture of valuable task-specific heuristics strikes me as a goal of AI that is uncontroversial enough, yet still powerful and interesting enough, to rally behind as a reason to do experimentation (not that a reason was all that necessary).
A common theme throughout this reading is that of reaching a compromise between symbolic processing and subsymbolic/connectionist processing. In the article concerning the RAAM model, this compromise is reached through the encoding of conventional symbolic structures such as lists and stacks into an abstract representation of the data distributed across the hidden nodes of an auto-associative backprop network. Through the examples of syntactic transformation of the network's input, we are shown the power gained by the ability to holistically enact transformations on data that is represented in a distributed manner. This stands in contrast to the necessary decomposition inherent in performing such operations on symbolic data. However, the added power one would think these holistic operations would bring about did not seem to be capitalized on. It seemed that, despite the theoretical insight gained through the study of RAAM, the RAAM did not learn any better or faster than a typical recurrent NN, like that used by Elman to classify words. Nevertheless, the RAAM model was an interesting example of how systems in the subsymbolic corner of the described paradigmatic space can be used to process data that is more symbolic in its nature.
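To pin down for myself what "holistic" means here, a rough sketch of the contrast (my own illustration, with untrained placeholder weights, so it only shows the shape of the two routes, not a working RAAM):

import numpy as np

# illustrative contrast only (untrained, made-up weights): the holistic route
# applies one learned map to the compressed code without ever unpacking it;
# the symbolic route has to take the structure apart, rearrange it, and rebuild.
DIM = 10
rng = np.random.default_rng(1)
W_t = rng.normal(size=(DIM, DIM)) * 0.1      # hypothetical transformation net

def holistic_transform(code):
    # one pass over the whole distributed code -- no decomposition step
    return np.tanh(W_t @ code)

def symbolic_transform(sentence):
    # the classical route: decompose into constituents, rearrange, rebuild,
    # mirroring the "cheetah chase tarzan" -> "tarzan flee cheetah" example
    subj, _verb, obj = sentence
    return (obj, "flee", subj)

print(symbolic_transform(("cheetah", "chase", "tarzan")))   # ('tarzan', 'flee', 'cheetah')

The point of the contrast is that the holistic map never has to know where the constituents sit inside the code, while the symbolic route cannot even start without that decomposition.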
Another advantage of the kind of compromise shown by RAAM is that it gives a tangible illustration of the computational equivalence of the two types of representations (i.e., that any connectionist network can be simulated by a universal Turing machine).
The exploration of microwriting and video gaming skills in "Associative Engines" cast a new light on the subject of modularization in biological systems. The studies suggested that there is not necessarily an inherent modularization of brain functions, but rather that any skill set can be broken down into a number of smaller subsets that can be recombined as needed to produce a novel combination of already learned skills. This combination is then used to perform the new task. The example of the microsurgeon's microwriting showed that the skill of microwriting was not itself learned, but rather that the two pre-existing skills of manual dexterity on a small scale and of handwriting on a normal scale were combined. The video game example stated that a player skilled at "Pac-Man" could use these skills to play a similar game with the same control apparatus. An example that I thought would be more illustrative of this "modularization on demand" would be that of a skilled video game player playing a game in the arcade, and then switching to a home-console version of the same game with a radically different control apparatus (e.g. arcade joystick vs. home control-pad). The player then separates the game-strategy component from the more general video-game playing skill, and combines it with familiarity with the home-console control apparatus to recreate the previous level of game-playing proficiency. This example closely parallels that of the microsurgeon's small-scale handwriting. Numerous other examples in this same vein could be drawn from many users' experiences with different kinds of machines. PC users learn to adjust to differently laid-out keyboards, different properties of the mouse, etc. Telephone users learn to adjust to touch-tone or rotary phones (some people do still have rotary phones), and the list goes on.
I found the discussion about what is a symbolic versus a subsymbolic representation very interesting. I was also a bit confused about it. I think my confusion derives from the seemingly (I think that this is how I view it) symbolic representation that I have of the world. However, each of my "symbols" has many underlying attachments. I could not simply rename a "computer" as "higwopy" and have everything still make sense. I don't know if this is part of some symbolic representation that I hold, or if I have developed some underlying subsymbolic representation that I cannot differentiate from my apparent (to me) symbolic representation.
In the collection of articles, the article by Garner titled "Symbols, Symbolic Processing and Neural Networks" called for a definition of a symbol. It also offered what it admitted was a poor definition of a symbol, but one that captures a lot of what I think of as a symbol. If I associate a symbol with a variety of different (shall we say "distributed") things and see its apparent (perhaps "emergent") qualities coming not just from it being a symbol, but from the other things around it, or from how it interplays with its environment, is this a symbolic or subsymbolic representation? Why do I think of it as a symbolic representation? It's hard to conceive of this notion of subsymbolic thinking going on in my head. All kinds of bizarre combinations occur inside my mind and then an idea of what a computer is "emerges"? That seems very strange to me and is hard to put together with how I can think about how I am thinking.
it seems to me that the question of how well and to what extent connectionist systems "learn" is pretty heavily dependent on why we want them to learn. clark mentions the fact that a connectionist network can learn to perform a specific task, but that, since it has no concept of the symbolic-level distinctions it is making, it cannot respond appropriately to minor systematic changes in its input set. he contrasts this to human learning, which is fairly generalizable and modular.
however, it seems that neural networks by themselves can adapt to differences in input patterns with more training (in cases like the bank loan case). this evolutionary style of learning doesn't really add modularity, though, which seems likely to be a necessary component of any system that can generalize.
for many kinds of learning that humans do (such as most of what we get credit for here at swat), the test of whether or not learning has taken place is the ability to explain what has been learned. this is another problem for a subsymbolic machine, since it would need to categorize its learning in some symbolic manner before it could try to summarize the knowledge itself. or is there some way for a neural network to summarize its results, without putting things on a symbolic level, that could perform the same function of explaining what it has learned?