In the Large and Kolen paper they mention that recurrent networks haven't been able to successfully learn music sequences. I wonder why that is. It may be that the complexity is too great, but music seems like an ideal temporal sequence to train a recurrent network on. Perhaps the results Elman got with his network are primarily because predicting words in a sentence relies on relatively little temporal information.
One inherent problem with the recurrent network is its inability to selectively save certain memories based on their importance to the current situation. It may learn to react strongly to certain things in the context layer, but all of that information is uniformly painted over by the next set of output-layer activations. I imagine the next generation of recurrent networks might have a selective short-term memory that holds vital memories selected by the network. I can't, however, say now how it might be implemented. Government secrets and all.
After attending Hofstadter's lecture and reading these articles about music, I see a tendency to use a fair amount of strong-AI strategies to solve the problem. They are thinking about its structure instead of letting the computers do most of the work by crunching numbers. I wonder if music will be treated in cognitive science the way language was. First people will try very hard to model it with more symbolic rule-based systems, but then slowly realize the merits of using simple computational frameworks to crunch numbers until they find more complex structures on their own.
At first the Port and Gelder article was a little painful. I can only take so many irrefutable arguments for connectionism, and it seems like wherever I go someone is trying to convince me of its validity (I think it's the best option right now). The idea of time, I think, plays an important and sometimes unnoticed role, and their argument is an important aspect of the bigger picture. Time has a significant role because even if a neuron is highly activated, its final effect will depend entirely on the point in the process at which it plays a role, since the activation will slowly decay.
Ed Large and temporal structure.
So basically we have a bunch of units that are each responsible for detecting simple oscillations in a specific frequency range.
Lo and behold, they learn to predict polyrhythms - each unit predicting one of the simple rhythms in the polyrhythm.
I applaud the authors for postponing analysis of a network of connected units until the units are better understood individually.
It is surprising how complex the units get. One would think that the receptive fields are narrow enough that they'd phase-lock to the mean signal within their window. One would also think that this mean signal would be a fair predictor of the actual beat, because "noise", i.e., off-beat signals, or signals from other simple rhythms within the polyrhythm, would be distributed with mean equal to the signal that the unit is trying to detect.
It will be really interesting to see how networks of such oscillators interact, and whether they can be made to behave similarly to models of networks of biophysical neurons.
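The single-unit behavior described above can be sketched in a few lines. This is only a toy illustration, not the authors' actual oscillator equations: one unit with an intrinsic period nudges both its predicted beat time and its period toward each incoming pulse. The function name and the two gain constants are invented for this sketch.

```python
# A toy beat-tracking unit -- NOT the Large & Kolen equations. The unit
# predicts the time of the next beat and, on every incoming pulse,
# nudges both its phase (predicted beat time) and its intrinsic period
# toward the input.

def entrain(pulse_times, period, coupling=0.5, adapt=0.2):
    """Predict beat times for a pulse train, adapting phase and period."""
    next_beat = pulse_times[0] + period
    predictions = []
    for t in pulse_times[1:]:
        predictions.append(next_beat)
        error = t - next_beat            # how early or late the pulse was
        period += adapt * error          # period drifts toward the true interval
        next_beat += period + coupling * error
    return predictions

# Steady pulses every 0.5 s; the unit starts with a 0.45 s period.
pulses = [0.5 * i for i in range(20)]
preds = entrain(pulses, period=0.45)
# The prediction error shrinks as the unit phase-locks to the train.
print(abs(preds[0] - pulses[1]), abs(preds[-1] - pulses[-1]))
```

With a second unit tuned near the other component's period, each would lock onto its own simple rhythm inside a polyrhythm, which is the behavior the paper reports.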
Gelder and Port
As brevity is the soul of wit, and tediousness its limbs and outward flourishes, let me be brief:
this article is bad; bad call I it; because to define true badness...
my goodness. The author clearly spent his academic career filling page limits instead of writing concisely and coherently.
At any rate I suppose I'd better substantiate my complaints.
Firstly, the overarching dynamic paradigm is a little grandiose. Yeah, time happens, it's kinda weird, and a lot of people think that the real numbers model time pretty well. BUT there is nothing wrong with chopping time up into little pieces as long as the pieces are sufficiently small. If we are worried about events that occur in the 1-second range and we chop time up into milliseconds, then for our purposes time is continuous even though we are technically modelling it discretely.
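The "sufficiently small pieces" point can be made concrete with a standard numerical sketch: simulate a one-second continuous process with millisecond Euler steps and compare against the exact continuous solution. The decay system here is just a stand-in example.

```python
# Chop time into milliseconds and the discrete model is, for events on
# the 1-second scale, effectively continuous. Forward-Euler simulation
# of the decay dx/dt = -x versus the exact solution e^{-t}.

import math

def euler_decay(x0, rate, dt, t_end):
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (-rate * x)    # one small discrete step
    return x

discrete = euler_decay(1.0, 1.0, dt=0.001, t_end=1.0)
exact = math.exp(-1.0)
print(abs(discrete - exact))     # on the order of 1e-4
```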
Secondly, I completely missed whatever subtlety distinguishes Elman nets from processes that model time as discrete. The Elman net is all about time steps, and I still characterize it as dynamic, as do the authors.
Chapter 2
Diff eqs whoo hoo
Dynamic Systems
Look, I just saw the word "cognition" on the same page as pictures of masses on springs and was ready to tear it up on the spot. But I didn't. In fact, using classical mechanics is probably the worst and most laughable idea I've recently heard. The dynamical approach has a number of good uses, but it is by no means a decent language for describing cognition. Notice that I called it a language; I say this because it does not offer us any new tools that we weren't already working with (function spaces are equally extensive). It just puts the phenomena in a completely unwieldy form that makes it hard to understand anything: the form of differential equations. Why should we want to do this? Well, dynamics has the time variable and nothing else in current cognitive models does. That's a good point. But their real reason? "Their usefulness in offering the best scientific explanations of phenomena throughout the natural world has been proved again and again." Yeah, and again and again is a vanishingly small number. In fact, classical mechanics utterly fails to explain most phenomena in physics. In a small regime it can be considered a good approximation, but in ALL cases there is absolutely nothing fundamental about its description! Also, there is nothing continuous about the natural world. Indeed, there is nothing continuous about the pulse trains that comprise our cognition. It would not surprise me if we are hopelessly lost here in the world of some poor old French physicist who knows nothing about neurobiology or modern physics, and who would happily apply his life's work on approximation methods for partial differential equations to something like cognitive phenomena. "Granted, we are a long way off..." Couldn't have said it better.
The power of the methods developed by physicists does not yield much that is useful. The reading showed us how to see the most simple attractors. Now consider this: what are we supposed to make of the next simplest attractor map? Well, that would be the Mandelbrot set, given by the iterative equation Z' = Z^2 + c. That equation is pretty simple to yield such infinite complexity as the Mandelbrot set. Now, let's just try to imagine the attractor map of cognition...
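The iteration Z' = Z^2 + c mentioned above is easy to play with: a point c belongs to the Mandelbrot set if the orbit of 0 under the map stays bounded. A minimal escape-time check (the 100-iteration cutoff and bailout radius 2 are the conventional choices) is enough to see that behavior.

```python
# Escape-time membership test for the Mandelbrot set, Z' = Z^2 + c.

def in_mandelbrot(c, max_iter=100):
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:         # the orbit escapes: c is outside the set
            return False
    return True

print(in_mandelbrot(0j))         # True: the orbit of 0 stays at 0
print(in_mandelbrot(1 + 0j))     # False: 0, 1, 2, 5, 26, ... diverges
```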
So, I've been ranting about the way dynamics was introduced in this article. But now what gets me the most is that they turn around and say that Elman nets are definitely dynamical. This surprised me, since I would have said that they're almost completely on the connectionist side of things, with only a little extension of the architecture built in to let them repeat over longer sequences. They in fact do not have a sense of time, or a time variable. But I like Elman networks, so it is only part of their study, the pure dynamical systems, that I have contention with.
The musical beat thing: you know, we're all taking it for granted that the beat of music is something fundamental and easy for a human to figure out. I would argue that the beat is a very abstract and learned thing. There are a lot of humans who can't find the beat. Much of our music today is for the rhythm-impaired because it uses high-amplitude, ultra-low-frequency pulses. Anyone can find the beat when it is marked so clearly, even a neural network with the cognitive power of a leech. (Sometimes the tone generators used around campus move the air mass significantly, and thus your body for you.) I think the beat of music is learned because we are told what the beat is and exactly where it falls for music that we become familiar with. I think we'd be hard pressed to find the beat in music of a completely unknown style. More importantly, in music that we're familiar with, the identifiable beat is almost always accompanied by an emphasis on the notes. If you played the printed notes without any emphasis, your music teacher would tell you that you had no sense of the beat. And it's true: unless you are expressive with tonal and amplitude variations, I don't think a beat is uniquely determined. I'd like to see if the network could learn if the MIDI sequences had variations in amplitude. More importantly, I'd like to see an experiment showing that humans can do it. An ideal example would be this: take one of Beethoven's symphonies and ask the class (excluding the music majors!) to find the beat. It'd be interesting, and I predict that we'd be hard pressed to find the beat, let alone the downbeat. (No offense to anyone.)
And a note on abstractions: I've been thinking for a while now that the biggest problem with nets is simply the inability to abstract an indefinite number of times. This is made clear by the shape-recognition problem: we want to recognize a square whether it is rotated or scaled. This is tricky because the CCD inputs are in absolute coordinates. So, we could take the spatial Fourier transform and abstract over one dimension so that it doesn't matter where the shape is translated. Then you take another spatial Fourier transform so it doesn't matter how big it is. Then you take an angular Fourier transform so it doesn't matter how it's rotated, and so on. Now, if we were to input the final result into a network, it would always recognize the square any which way. But I could imagine yet another level on which another Fourier transform could be taken. My point is that abstract relationships need abstract perception and inputs if there is no abstraction in the network itself. So I've been racking my brains to come up with some abstraction in a network (a "Fourier network" where the nodes are little integrators and comprise the coefficients of a Fourier series...). What if we made it a practice to input all levels of abstraction into a network on any input? I bet we could solve a lot more problems. In one of the backprop networks used to find the beat, they could train it for four different tempos, but it would never generalize. This is because it isn't abstracting at all, and there's no mathematical reason sums of sigmoids should be able to do this. Thus, we need to provide that service. If you took a time-Fourier transform over segments of that piece, you would recognize it at any tempo!!!!
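The translation half of this Fourier trick is easy to demonstrate: the magnitude spectrum of a pattern is identical no matter where the pattern sits, so a network fed |DFT(x)| never sees absolute position. This is a sketch of the idea only (naive DFT, tiny input); scale and rotation invariance would need the further transforms described above.

```python
# Translation invariance of the magnitude spectrum: shifting a pattern
# changes only the phases of its DFT coefficients, not their magnitudes.

import cmath, math

def dft_magnitudes(xs):
    """Magnitude spectrum of a real sequence (naive O(n^2) DFT)."""
    n = len(xs)
    return [abs(sum(xs[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

pattern = [0, 1, 2, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]   # same shape, moved three slots

a = dft_magnitudes(pattern)
b = dft_magnitudes(shifted)
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True
```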
This article had some interesting points that the authors repeated over and over again (or so it seemed to me). The idea that we should represent cognition as a dynamical system and use dynamical approaches was new to me, and one that has a lot of merit. I especially liked the idea that perhaps our whole framework (and mindset) is wrong, and that although we have found many good approximations for cognition, we are lacking the appropriate model. The analogy to astrology really hits home. Although a dynamical approach may not be the correct one either - who knows.
The embedded idea is something that really makes sense to me and fits much better into a dynamical framework. I believe that things go on at many levels in our heads and work together in some sense to come up with responses to stimuli. In a discrete model it seems nearly impossible to cope with such things in parallel, by the mere fact that we are operating in time steps. Well, not exactly impossible, but the idea that all these things happen at the same time and somehow blend together cannot, I think, be captured without including time in some very real sense.
"Everything is simultaneously affecting everything else."
This is definitely how the world works, and modelling it as such makes a lot of sense. That's why we use dynamics to model the world. So why use sequential machines to model our brains? Our brains also interact with the world through our bodies. Although I believe that some of our intellectual activity is certainly just planning, much or most of what we do is interacting with the world and responding to stimuli. Why then should we not use a dynamical approach, as we do in physics?
I liked the discussion on the relation of neural nets to this "new" approach. "Such networks are little more than sophisticated devices for mapping static inputs into static outputs." This is exactly what we have been saying (pretty much). NNets are just a mapping from one space to another. It is hard for them to capture any essence of time, although we attempt to via different, more sophisticated architectures. But then are these architectures just a patch on a model that just isn't correct?
Finally, although I enjoyed the article and it definitely got me thinking
on a whole new level, it lacked concreteness in some sense. Just what
is this system that they are referring to? Just what are some of the
parameters or "equations"? I may be ahead of the chapter, but
though the authors raved (and repeated) about this model, they failed to mention anything concrete.
Hmm. Obvious to some. Not so obvious to Newell, Simon, or others.
One page later, the authors are highly critical of the discrete
timesteps of Turing machines and the computational approach they
symbolize.
My first reaction was that these are appropriate, and they depend upon
the hardware. Then, under footnote 10, they say:
So now, hardware doesn't matter. In my mind, time and state are not
so clearly distinguishable. Biologically, a timestep makes sense to
me. Not a global timestep, but that neuronal firings are binary,
discrete things. I may be wrong and maybe neurons can half-fire. If
so, then we have the continuity that dynamicists claim.
It's About Time: An Overview of Blah, blah, blah
Okay. This article started out really promisingly. I thought: wow, this is really what learning modelling needs, a way to make things continuous. I wonder how they're going to do that? That's cool.
So I was sort of disappointed. This article knocked down computationalism and promoted dynamicism, but didn't really provide any examples of how dynamicism was a feasible approach to take, or whether any successful attempts to use it had been made. There seemed to be no empirical evidence that the dynamic approach was superior to the computational approach, despite a convincing argument that cognition is dynamic as opposed to computational. And, unless I've seriously misunderstood, the authors went ahead and described a supposedly dynamic model of cognition as being discrete. Now this strikes me as a severe case of the pot arguing with the kettle.
I'm also not sure that the statement that human beings aren't
computers and therefore cognition should not be modelled
computationally is quite as simple a statement as the authors seem to
think. Are we really sure that there is some essence behind the
mechanics?
I also am really confused about the statement that "the most
powerful known medium of representation is symbolic, and hence
cognitive processes must manipulate symbols, i.e., must be
computational in nature." First, are we sure that we use the most
powerful medium of representation, and secondly, didn't we just say
that cognitive processes were dynamic in nature?
Also, it seems that some cognitive processes might be
computational- such as the classic example of chess. True, external
factors are dynamic, but the cognitive processes behind playing the
game would work discretely, no?
I also have issues with their assumption that natural language
is well-understood, but it seems to have been a passing comment.
In short, I think the authors made a lot of claims, and
knocked down a lot of useful stuff, and then went on to either not
support their own claims sufficiently, or contradict themselves.
Though I may just have missed something.
Resonance and the Perception of Meter
I found this article interesting, but difficult because of my
very limited math and physics background. I'm interested to know why
the connectionist approaches were unsuccessful in recognizing melodies
played at different tempos. It would seem that a melody when played
at different speeds would still be quite patterned and therefore
recognizable by a network. Why were the attempts to train the
networks unsuccessful? Would the use of context units help? What
does this say about using networks on patterns in which time is a
factor?
Also, as a technical question, what is the difference between
phase-locked and frequency-locked?
How does the model respond to fluctuations in tempo that are
characteristic of music as played by a human? A tempo can easily
decrease from 88 beats per minute to 40, while the listener still
feels the rhythm. Can the model handle such significant changes? The
article describes the reaction to a ritardando, involving a "slight
phase shift." Does this work with great changes in tempo?
It's About Time: An Overview of the Dynamical Approach to
Cognition
Jonathon Shlens
MIND AS MOTION
- "natural cognitive systems, such as people, aren't computers" - leading
up this part, they destroy the symbolic paradigm of AI.
- "cognitive processes and their context unfold continuously and
simultaneously in real time" - the authors here seem to agree with
Hofstadter's idea of "fluid" thought and concepts.
- "further they (other AI approaches) say nothing about ... deliberation
time ... or how a choice can appear more attractive at one time, less
attractive at another" - I think that the authors make a good point.
Neither connectionism nor symbolism can explain these facts about human
cognition. These approaches do not account for the fact that the same
input at a different time can elicit a whole range or distribution
(Gaussian?) of responses and deliberation times; the other approaches spew
out only one answer. The attraction of backprop networks is that they
begin to stray from this single deliberate answer and can instead output a
mixed answer. However, even backprop networks do not go far enough, for
their model does not explain very well the variation in deliberation time
and in answers over time.
- "the cognitive system does not interact with other aspects of the world
by passing messages or commands; rather, it continuously coevolves with
them" - I think that they hit the spot. By limiting out models to discrete
"messages" and commands" we constrain our models to discrete 'cognition'
and eliminate any possibility of variation or fluidity.
-"representations of stored items are point attractors in the phase space
of the system. Recalling or recognizing an item is a matter of settling
into its attractor, a process that is governed by purely numerical
dynamical rules." - YES! I think that they have hit the mark with this
statement! This approach and belief works well because it explains the
variability of decisions and the self-organization of thought. Since
dynamical systems naturally self-organize around attractors, there is no
need for a teacher telling you when you are right or wrong. Also, this
reminds me of the Lorenz phase plot. Basically, the chaotic system remains
continuously bounded by two attractors over time yet never remains on the
same path. Hence, analogously, a thought or cognition might be represented
by an attractor (which self-organizes); however, because the system never
takes the exact same path twice, the thought continues to exhibit both
variability and fluidity!!! Wow!
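The "recall as settling into an attractor" picture can be sketched with a toy Hopfield net: a stored pattern is a point attractor, and a corrupted cue relaxes back onto it under purely numerical update rules. This is a standard illustration, not the chapter's own model.

```python
# Toy Hopfield net: recall is literally settling into a point attractor.

def train(pattern):
    """Hebbian outer-product weights with zero diagonal."""
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]

def settle(w, state, sweeps=5):
    """Asynchronous threshold updates for a fixed number of sweeps."""
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            h = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if h >= 0 else -1
    return state

stored = [1, 1, -1, -1, 1, -1, 1, -1]
w = train(stored)
cue = list(stored)
cue[0], cue[3] = -cue[0], -cue[3]    # corrupt two bits of the cue
print(settle(w, cue) == stored)      # True: the cue settles onto the pattern
```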
- a problem: how can one attach a motive or a goal to a dynamical system?
For instance, in a backprop network the goal is created by presenting the
desired output for the system. How would one likewise train a dynamical
system? It seems that one would need to somehow manipulate or change the
attractor of the system, or possibly change the meaning of the attractor
inherent in the system (i.e., change the translation of the attractor from
apples='good' to work='not fun', etc.).
- "the claim is that models must be capable of describing change from one
state to another arbitrarily close to it, as well as sudden change from one
state to another discretely distinct from it" - This again is an extremely
good point (the idea of fluidity of thought) and reason why symbolic
systems (and possibly connectionist systems) are flawed in their
approach.
- "dynamical systems are just the simultaneous, mutual influencing activity
of multiple parts or aspects" - WOW! Again, their approach seems to agree
with observation - that is, dynamical systems naturally require parallel
processing!
WOWOWOWOW! I thought that this reading was great! The ideas which
are presented in this chapter strongly agree with the views I had of the
brain coming into the class. I think that the brain is definitely
intertwined and composed of billions of nonlinear feedbacks all of which
create self-organization and fluidity of thought. This approach has
definite merit as it really agrees with my perception of the brain. Also,
if true, this approach also demonstrates the vast complexity of the brain.
Indeed, one linear partial differential equation in two variables is hard
to solve - an hour or two on paper, generating an utter mess. Imagine a NON-
linear partial differential equation in several billion variables (by the
way, even a simple two-variable nonlinear equation can be literally
impossible to solve).
RESONANCE AND THE PERCEPTION OF MUSICAL METER
I liked this article, as it applied a lot of the theory in the previous reading to the problem of musical meter. The whole development of the theory behind creating a system capable of this feat is very interesting. Specifically, this article describes an implementation of the model in which the meter is the oscillatory behavior of interest, and synchronization arises given simple couplings. Keeping this in mind, in the previous article they mentioned that "dynamical systems are known to be able to create structure both in space and in time." Both articles point toward the fact that large dynamical systems naturally converge on some structure (musical meter is just one example). These quotes reminded me of a math lecture last month on chaos by Steven Strogatz. In his lecture, he talked about fireflies in Borneo(?) which at night naturally begin to 'orchestrate' their light. Roughly an hour after nightfall, the fireflies naturally synchronize their behavior and light up their bodies at the exact same frequency. In any case, Strogatz proved (partially) the mathematical reason for this. What it boiled down to was that nonlinear systems naturally organize. Wow! It is nice to see this math relate to theories of how the brain recognizes musical meter (or anything else)!
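The firefly story is the textbook Kuramoto setup: oscillators with slightly different natural frequencies pull one another's phases together through sine coupling. A small simulation shows the population locking; this is a sketch of the phenomenon, not Strogatz's proof, and every constant here is an arbitrary choice for the example.

```python
# Mean-field Kuramoto model: each oscillator's phase is pulled toward
# the others through sine coupling, and the population synchronizes.

import math, random

def kuramoto(n=10, coupling=2.0, dt=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    freqs = [1.0 + 0.1 * rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(steps):
        # synchronous Euler step with mean-field sine coupling
        phases = [phases[i] + dt * (freqs[i] + coupling *
                  sum(math.sin(phases[j] - phases[i]) for j in range(n)) / n)
                  for i in range(n)]
    return phases

def coherence(phases):
    """Order parameter r: 1 means perfect sync, near 0 means incoherence."""
    n = len(phases)
    re = sum(math.cos(p) for p in phases) / n
    im = sum(math.sin(p) for p in phases) / n
    return math.hypot(re, im)

r = coherence(kuramoto())
print(r)   # close to 1 once the population locks
```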
Michael Morton
Response to:
An Overview of the Dynamical Approach to Cognition
Dave Lewis
This is really just an obvious and elementary consequence of the fact
that cognitive processes are ultimately physical processes taking
place in real biological hardware.
What state was the machine in at time 1.5? How long was the machine in
state 1? How long did it take for the machine to change from state 1
to state 2? None of these questions are appropriate.
One inappropriate way to extract temporal considerations from
a computational model is to rely on the timing of operations that
follow from the model's being implemented in real physical
hardware. This is inappropriate because the particular details of a
model's hardware implementation are irrelevant to the nature of the
model, and the choice of a particular implementation is theoretically
completely arbitrary.
Elaine Huang's Briefs
Timothy van Gelder, Robert H. Port
Edward Large and John Kolen
Roger Bock
pg 2 - What is a research paradigm in Kuhn's classic sense?
pg 6 - I find it hard to believe that our minds can be modelled by a
set of state variable equations, however numerous they may be. I can't
believe that a person in one state will always do the same thing next.
I believe that for every me which makes one decision, there is another
hypothetical me which could just as easily have made the other choice.
I think some of our behavior is governed by chance. Although, maybe
it's just that our minds are so chaotic that it appears that chance is
involved. In other words, a very minute difference in my state may
result in me choosing a different course of action. However, because
the difference in the states which results in a different choice is so
infinitesimal, it may as well be chance.
pg 24 - I think the authors are spending a little too much time
redundantly refuting the virtues of the computationalist approach.
It's too bad there is such rivalry inherent in some disciplines of
science.
pg 27 - What is the CNS?
pg 28 - I would love to hear the computationalist rebuttals to some of
the points made in this article, as well as a non-biased discussion of
the deficiencies of the dynamic approach.
pg 35 - I don't think it's necessarily true that using lower
dimensional mathematical models corresponds to studying a system at a
higher level. After all, what if all those dimensions are needed to
show some sort of emergent property?
Dynamics - An Introduction
pg 45 - This chapter was kind of boring, because I am taking Math 30
this semester. Still, it was interesting to hear about dynamics from
another person's viewpoint.
pg 60 - Are asymptotically stable fixed points a subset of
Lyapunov-stable fixed points?
pg 62 - What are separatrices?
pg 64 - What does it mean for a system to have a dense orbit?
Resonance and the Perception of Musical Meter
pg 178 - I'm surprised Cope's work isn't mentioned in this paper; I
wouldn't have thought the field of music production would be that well
populated.
pg 179 - What input would the network trained to recognize a melody at
a specific tempo receive?
pg 182 - How would one have conflict between pitch structure and
temporal structure?
pg 183 - What exactly is Weber's law?
pg 188 - Why is the ratio of the periods referred to as the bare
winding number?
pg 189 - I don't understand what the different regions mean. Do the
Arnol'd tongues overlap, and if so, what is the significance of
that?
pg 191 - How do the light lines show the effect of coupling?
pg 198 - What does the graph of the combined output show, and how is it
related to the graphs for oscillators one and four?
pg 201 - How do the graphs show that the oscillators are responding
correctly? To me, it looks as if their output is wandering all over
the place.
pg 202 - How would these high level units' behavior emerge from the
behavior of individual neurons?