CONNECTIONISM

knowledge is represented in the connections between computational nodes or "neurons"

parallel distributed processing - computation and knowledge are distributed over many nodes and connections that can be evaluated simultaneously
subsymbolic computation - no representations in the sense of Newell and Simon's symbols
modeled after biological neurons

How the Brain Works
    highly interconnected neurons
    neuron
        soma - cell body
        dendrites - receive connections from other cells
        axon - long "cable" that leads to the output synapses
        synapse - connection to another cell
    action potential
    excitatory vs. inhibitory connections
    plasticity - long-term changes in connections

Computer vs. Human Brain
    computational units, storage units, cycle time, bandwidth, updates per second, graceful degradation

NEURAL NETWORKS

Neural networks consist of a number of nodes (units) connected by links. Each link has a numeric weight. Each node has an activation value (normally between 0 and 1), computed using an activation function.

A node's input is a linear function of the activations of its predecessors and the link weights:

    in_i = Sum_j (a_j * W_j,i)

A node's output activation is most often a nonlinear function of this input. A step function is motivated by biological neurons: below a certain input stimulus the neuron does not fire, and above that level it does. Sigmoid functions are also used. Mathematically, the step function's threshold can be modeled as the weight on an extra link into the node (with a fixed activation), which allows learning to be purely the modification of weights on links.

NETWORK STRUCTURES

Nodes are composed into networks of different structures.
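Before turning to particular structures, the single-node computation just described can be sketched in Python. This is an illustrative sketch, not from the notes; the function names and example values are mine:

```python
import math

def sigmoid(x):
    """Sigmoid activation: a smooth alternative to the step function."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(activations, weights):
    """Compute in_i = Sum_j(a_j * W_j,i), then apply the activation function."""
    in_i = sum(a * w for a, w in zip(activations, weights))
    return sigmoid(in_i)

# A node with three incoming links; a fixed activation of 1.0 on the first
# link lets its weight play the role of the (negated) threshold.
print(node_output([1.0, 0.5, 0.9], [-0.3, 0.8, 0.2]))  # ≈ 0.5695
```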
feed-forward -- directed acyclic graph
    simple algorithm to compute outputs from inputs
    stable outputs
recurrent -- includes cycles
    loops allow for feedback
    can oscillate, become unstable, or result in chaotic behavior
    Hopfield networks
        all units are both input and output
        every unit is connected to every other unit, with symmetric weights (W_i,j = W_j,i)
        used for associative memory (a partial input retrieves the "memorized" complete input of a training example)
        # of inputs remembered = 0.138 * (# of nodes)
    Boltzmann machines
        symmetric weights
        include units that are neither input nor output (called hidden)
        stochastic activation function (the probability of the activation being 1 is a function of the weighted input)

MULTILAYER FEED-FORWARD NETWORKS

Nodes are organized into layers:
    input layer
    output layer
    hidden layer(s)

Each node has a link to each node in the next layer:
    each input layer node connects to each node in the first hidden layer
    each node in the last hidden layer connects to each output layer node

Given a set of inputs and desired outputs, the weights in such a network can be trained through nonlinear regression.

How many nodes are needed?
    with too few, the training examples will conflict and pull the weights in differing directions
    with too many, the weights will specialize to individual inputs and generate a lookup table
    "optimal brain damage" algorithm -- create a large network, train it, then start removing nodes that contribute little
    tiling algorithm -- start with a single unit and build up the network as needed (like decision-tree learning)

PERCEPTRONS

Single-layer feed-forward networks, primarily used in the late 1950s
    single layer because no learning algorithms for more complex networks were known
    each output is independent of the other outputs
    each output is a linear threshold function of the inputs
    Minsky and Papert proved limitations in representation: a perceptron can represent boolean AND and OR, but not XOR

Learning (for linearly separable functions)
    initial link weights are assigned random values between -0.5 and 0.5
    Error = correct output (T) - generated output (O)
    if Error > 0 then increase O; if Error < 0 then decrease O
    modify the weights by: W_j <- W_j + alpha * I_j * Error

BACK-PROPAGATION LEARNING

A learning algorithm for multilayer feed-forward networks.

Limitations:
    not guaranteed to converge to a global optimum
    not guaranteed to be efficient

As with Perceptron learning, adjust the weights to improve the output results. But which weights, since each output has many paths back to each input?

Adjusting the weights from hidden units to output units is similar to Perceptron learning:

    W_j,i <- W_j,i + alpha * a_j * g'(in_i) * Error_i

Precompute node-specific terms D_i = g'(in_i) * Error_i:

    W_j,i <- W_j,i + alpha * a_j * D_i

How do we compute the error term for intermediate units? Use a weighted sum of the errors of the units they feed into:

    D_j = g'(in_j) * Sum_i (W_j,i * D_i)

Applying the same weight modification to earlier links:

    W_k,j <- W_k,j + alpha * I_k * D_j

APPLICATIONS OF NEURAL NETWORKS

Recognition
    handwritten character recognition
    face recognition
Control
    driving
    robot arm movement
Generation
    pronunciation
    music composition

What about using neural networks to learn patterns and then generate symbolic knowledge?
Problem: the computation is distributed and subsymbolic. Is there a possibility of moving from connection weights to a symbolic representation?
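As a worked example, the Perceptron learning rule above (W_j <- W_j + alpha * I_j * Error) can be sketched as follows. The function names, the AND training set, and the parameter values are my own illustrative choices:

```python
import random

def train_perceptron(examples, alpha=0.1, epochs=100, seed=0):
    """Train a single perceptron; each example is (inputs, target).
    A fixed input of 1 is prepended so one weight acts as the threshold."""
    random.seed(seed)
    n = len(examples[0][0]) + 1
    # initial link weights assigned random values between -0.5 and 0.5
    W = [random.uniform(-0.5, 0.5) for _ in range(n)]
    for _ in range(epochs):
        for inputs, T in examples:
            I = [1.0] + list(inputs)          # threshold input
            O = 1 if sum(w * x for w, x in zip(W, I)) > 0 else 0
            error = T - O
            # W_j <- W_j + alpha * I_j * Error
            W = [w + alpha * x * error for w, x in zip(W, I)]
    return W

def predict(W, inputs):
    I = [1.0] + list(inputs)
    return 1 if sum(w * x for w, x in zip(W, I)) > 0 else 0

# Boolean AND is linearly separable, so the rule is guaranteed to converge.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
W = train_perceptron(and_examples)
print([predict(W, x) for x, _ in and_examples])  # → [0, 0, 0, 1]
```

Training the same rule on XOR would never converge, which is exactly the Minsky and Papert limitation noted above.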
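The back-propagation update rules can likewise be sketched as a single weight update on a tiny network. This is a minimal sketch under stated assumptions: sigmoid activation g (so g'(in) = a * (1 - a)), a hypothetical 2-2-1 network with hand-picked weights, and function names of my own choosing:

```python
import math

def g(x):
    """Sigmoid activation; its derivative satisfies g'(in) = g(in) * (1 - g(in))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(Wh, Wo, I):
    """Forward pass through one hidden layer: hidden and output activations."""
    a = [g(sum(w * v for w, v in zip(row, I))) for row in Wh]
    ao = g(sum(w * v for w, v in zip(Wo, a)))
    return a, ao

def backprop_step(Wh, Wo, I, T, alpha=0.1):
    """One application of the update rules from the notes."""
    a, ao = forward(Wh, Wo, I)
    # output node: D_i = g'(in_i) * Error_i, with g'(in) = a * (1 - a)
    Do = ao * (1 - ao) * (T - ao)
    # hidden nodes: D_j = g'(in_j) * Sum_i(W_j,i * D_i)
    Dh = [a[j] * (1 - a[j]) * Wo[j] * Do for j in range(len(a))]
    # W_j,i <- W_j,i + alpha * a_j * D_i  and  W_k,j <- W_k,j + alpha * I_k * D_j
    Wo2 = [w + alpha * a[j] * Do for j, w in enumerate(Wo)]
    Wh2 = [[w + alpha * I[k] * Dh[j] for k, w in enumerate(Wh[j])]
           for j in range(len(Wh))]
    return Wh2, Wo2

# Hand-picked 2-2-1 network, one training example with target T = 1.
Wh = [[0.4, -0.2], [0.1, 0.3]]
Wo = [0.5, -0.4]
I, T = [1.0, 0.5], 1.0
_, before = forward(Wh, Wo, I)
Wh, Wo = backprop_step(Wh, Wo, I, T)
_, after = forward(Wh, Wo, I)
print(abs(T - before), abs(T - after))  # the error shrinks after the update
```

Repeating this step over a whole training set is exactly the nonlinear regression mentioned earlier, with all the caveats noted above about local optima.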