CONNECTIONISM

knowledge is represented in the connections between 
computational nodes or "neurons"

	parallel distributed processing - computation and 
	knowledge are distributed over a variety of nodes and 
	connections that can be evaluated simultaneously

	subsymbolic computation - no explicit representation of 
	symbols in the sense of Newell and Simon

	modeled after biological neurons

How the Brain Works

	highly interconnected neurons

	neuron
	  soma - cell body
	  dendrites - receive connections from other cells
	  axon - long "cable" that leads to output synapses
	  synapse - connection to another cell

	action potential
	excitatory vs. inhibitory

	plasticity - long term changes in connections

Computer vs. Human Brain

	computational units, storage units, cycle time, bandwidth, 
	updates per second, graceful degradation



				NEURAL NETWORKS

Neural networks consist of a number of nodes (units), 
connected by links.

Each link has a numeric weight.

Each node has an activation value (normally between 0 
and 1) which is computed using an activation function.

A node's input is a linear function of the incoming activations 
and weights:
	ini = Sum over j of (aj * Wj,i)

A node's output activation is most often a nonlinear function 
of its input.

	A step function is motivated by biological neurons, 
	where below a certain input stimulus the neuron does 
	not fire, and above a level it does fire.

	Sigmoid functions are also used.

	Mathematically, the threshold of the step function can be 
	modeled as the weight on an extra link into the node, fed 
	by a constant input (allows for learning purely as 
	modification of weights on links.)
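
A minimal Python sketch of a single node, under stated assumptions: the 
function names, the example weights, and the choice of -1 as the constant 
input on the extra link are illustrative and do not come from these notes. 
It shows the weighted input, the two activation functions, and the 
threshold folded into an extra weight.

```python
import math

def weighted_input(activations, weights):
    """in_i = sum over j of a_j * W_j,i"""
    return sum(a * w for a, w in zip(activations, weights))

def step(x, threshold=0.0):
    """Fire (1) at or above the threshold, otherwise stay off (0)."""
    return 1 if x >= threshold else 0

def sigmoid(x):
    """Smooth alternative to the step function, with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative values only (assumptions, not from the notes).
inputs = [1, 0]
weights = [0.6, 0.6]
t = 0.5

# Explicit threshold on the step function.
with_threshold = step(weighted_input(inputs, weights), threshold=t)

# Same decision with the threshold as the weight on an extra link
# whose input is the constant -1, compared against 0.
as_extra_link = step(weighted_input(inputs + [-1], weights + [t]))
```

Both forms give the same output, so a learning rule only has to adjust 
weights and never a separate threshold.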



				NETWORK STRUCTURES

Nodes are composed into networks of different structures.

feed-forward -- directed acyclic graph
	simple algorithm to compute from inputs to outputs
	stable outputs

recurrent -- includes cycles
	loops allow for feedback
	can oscillate, become unstable, or result in chaotic 
		behavior

Hopfield networks

	all units are both input and output

	all units have input and output connections to all other 
	units with symmetric weights (Wi,j = Wj,i)

	used for associative memory (partial input retrieves 
	"memorized" total input of training example)

	# of patterns reliably remembered is at most about 
	.138 * (# of nodes)
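
A small sketch of associative recall in Python. Assumptions not stated in 
these notes: units take values +1/-1, weights are set by a Hebbian outer 
product rule with a zero diagonal, and units are updated one at a time. 
The function names are hypothetical.

```python
def train_hopfield(patterns):
    """Symmetric weights: W[i][j] = sum over patterns of p[i] * p[j], W[i][i] = 0."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, sweeps=5):
    """Update units one at a time; each takes the sign of its weighted input."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            total = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if total >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hopfield([stored])

# Partial/corrupted input: the unit at index 6 is flipped.
noisy = [1, -1, 1, -1, 1, -1, -1, -1]
recovered = recall(W, noisy)   # settles back on the stored pattern
```

With 8 nodes the capacity estimate allows only about one stored pattern, 
which is why the sketch memorizes a single example.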

Boltzmann Machines

	symmetric weights

	includes units that are neither input nor output (called hidden)

	stochastic activation function (the probability of the 
	activation being 1 is a function of the total weighted input)
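
A sketch of a stochastic unit in Python. The notes only say the probability 
is a function of the weighted input; the logistic form and the temperature 
parameter used here are assumptions, as are the function names.

```python
import math
import random

def activation_probability(weighted_input, temperature=1.0):
    """P(activation = 1), assumed logistic in the weighted input."""
    return 1.0 / (1.0 + math.exp(-weighted_input / temperature))

def sample_activation(weighted_input, temperature=1.0, rng=random):
    """Draw a 0/1 activation with the probability above."""
    p = activation_probability(weighted_input, temperature)
    return 1 if rng.random() < p else 0

# Seeded generator so the run is repeatable.
rng = random.Random(0)
samples = [sample_activation(2.0, rng=rng) for _ in range(1000)]
fraction_on = sum(samples) / len(samples)   # close to activation_probability(2.0)
```

The same weighted input can give 0 on one evaluation and 1 on the next; 
only the long-run fraction is determined by the weights.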



			MULTILAYER FEED-FORWARD NETWORKS

Nodes organized into layers
	input layer
	output layer
	hidden layer(s)

Each node has a link to each node in the next layer:

	each input layer node connects to each first hidden 
		layer node
	each last hidden layer node connects to each output 
		layer node
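
A sketch of computing outputs from inputs in a fully connected layered 
network, in Python. Assumptions: sigmoid activations in every layer, no 
threshold links, and arbitrary example weights; the names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(activations, weights):
    """weights[i][j] is the weight on the link from node j of the
    previous layer to node i of this layer."""
    return [sigmoid(sum(w * a for w, a in zip(row, activations)))
            for row in weights]

def forward(inputs, layers):
    """Propagate activations layer by layer, from inputs to outputs."""
    activations = inputs
    for weights in layers:
        activations = layer_output(activations, weights)
    return activations

# 2 input nodes -> 3 hidden nodes -> 1 output node (weights are arbitrary).
hidden_weights = [[0.5, -0.5], [0.3, 0.8], [-0.2, 0.1]]
output_weights = [[1.0, -1.0, 0.5]]
outputs = forward([1.0, 0.0], [hidden_weights, output_weights])
```

Each weight row has one entry per node in the previous layer, which is the 
"link to each node in the next layer" structure in list form.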

Given a set of inputs and desired outputs, the weights in such a
network can be trained through nonlinear regression.

How many nodes are needed?

	with too few, the training examples will conflict and pull all 
	the weights in differing directions

	with too many, the weights will specialize to each input and 
	generate a lookup table

	"optimal brain damage" algorithm -- create a large network, 
	train it, then start removing nodes that do little

	tiling algorithm -- start with single unit and build up a 
	network as needed (like decision-tree learning)



				PERCEPTRONS

Single layer feed-forward networks

	primarily used in the late 50's

	single layer because no learning algorithms for more 
		complex networks were known

	each output is independent of other outputs

Minsky and Papert proved limitations in representation 
(can represent Boolean AND and OR, but not XOR)

	each output is a thresholded linear function of its inputs, 
	so only linearly separable functions can be represented

Learning linearly separable functions

	initial links assigned random values between -.5 and .5

	Error = correct-output (T) - generated-output (O)

	if Error > 0 then increase O, if Error < 0, decrease O

	modify weights by:

		Wj <- Wj + alpha * Ij * Error
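
A sketch of this learning rule in Python, trained on Boolean AND. 
Assumptions: a step activation, the threshold handled as an extra weight 
on a constant -1 input, alpha = 0.1, and a fixed random seed; the function 
names are hypothetical.

```python
import random

def predict(weights, inputs):
    """Step output O; the last weight is the threshold on a constant -1 input."""
    total = sum(w * x for w, x in zip(weights, inputs + [-1]))
    return 1 if total >= 0 else 0

def train_perceptron(examples, alpha=0.1, epochs=1000, seed=0):
    rng = random.Random(seed)
    n = len(examples[0][0]) + 1
    # Initial weights are random values between -.5 and .5.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(weights, inputs)   # Error = T - O
            if error != 0:
                mistakes += 1
                for j, x in enumerate(inputs + [-1]):
                    weights[j] += alpha * x * error      # Wj <- Wj + alpha * Ij * Error
        if mistakes == 0:    # every example is already correct
            break
    return weights

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(AND)
learned = [predict(w, x) for x, _ in AND]
```

AND is linearly separable, so the loop stops once all four examples are 
classified correctly. Given XOR examples instead, the same loop would run 
through all its epochs without reaching zero mistakes.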



			BACK-PROPAGATION LEARNING

A learning algorithm for multilayer feed-forward networks.

Limitations:

	not guaranteed to converge to global optimum

	not guaranteed to be efficient

As with perceptron learning, adjust weights to reduce the 
error in the outputs.

	but which weights? each output is reached from each 
		input by many paths

Adjusting weights from hidden units to output units is 
similar to Perceptron learning.

	Wj,i = Wj,i + alpha * aj * g'(ini) * Errori 

	precompute node-specific constants Di = g'(ini) * Errori

	Wj,i = Wj,i + alpha * aj * Di

How to compute error term for intermediate units?

	use a partial weighted sum of errors of outputs

	Dj = g'(inj) * Sum over i of (Wj,i * Di)

Applying same weight modifications to earlier links:

	Wk,j = Wk,j + alpha * Ik * Dj
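
A sketch of these updates in Python for one hidden layer and a single 
output unit i, so the sum over outputs has one term. Assumptions: g is 
the sigmoid, alpha (the learning rate) is 0.5, there are no threshold 
links, and the starting weights are arbitrary. The names are hypothetical.

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    return g(x) * (1.0 - g(x))

def backprop_step(I, T, W_hidden, W_out, alpha=0.5):
    """One weight update. W_hidden[j][k] plays the role of Wk,j and
    W_out[j] the role of Wj,i. Returns Error measured before the update."""
    # Forward pass: inputs -> hidden activations -> output.
    in_hidden = [sum(w * x for w, x in zip(row, I)) for row in W_hidden]
    a = [g(x) for x in in_hidden]
    in_out = sum(w * aj for w, aj in zip(W_out, a))
    error = T - g(in_out)

    # Di = g'(ini) * Errori
    D_i = g_prime(in_out) * error
    # Dj = g'(inj) * Wj,i * Di, using the weights as they were before the update
    D = [g_prime(in_hidden[j]) * W_out[j] * D_i for j in range(len(W_out))]

    for j in range(len(W_out)):
        W_out[j] += alpha * a[j] * D_i              # hidden -> output links
        for k in range(len(I)):
            W_hidden[j][k] += alpha * I[k] * D[j]   # input -> hidden links
    return error

W_hidden = [[0.1, -0.2], [0.4, 0.3]]
W_out = [0.2, -0.1]
errors = [backprop_step([1.0, 0.0], 1.0, W_hidden, W_out) for _ in range(200)]
```

Repeating the step on one example shrinks its error. A real training run 
would cycle over many examples and, as noted under Limitations, may still 
settle in a local optimum.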



			APPLICATIONS OF NEURAL NETWORKS

Recognition
	Handwritten character recognition
	Face recognition

Control
	Driving
	robot arm movement

Generation
	Pronunciation
	Music composition

What about using neural networks to learn patterns 
and then generate symbolic knowledge?

	Problem: distributed subsymbolic computation

	Is there a possibility of moving from connection 
	weights to symbolic representation?

</pre>