VISION AND PERCEPTION

Many of the computer systems, robotic or not, portrayed in science
fiction as being intelligent or "life-like" include the ability to map
visual stimuli into a representation of the physical world.

Why is this so important?

	So far we have talked about agent percepts as being 
	symbolic expressions about an abstract world (we feel 
	a breeze, smell a stench, or hear a scream).

	In the real world, we have to map from input devices, 
	like cameras, microphones, laser range finders, etc. to 
	symbolic expressions.

If a system can map from visual (or other) stimuli to symbolic
expressions, then we can use search, logic, inference, and planning to
determine how to:
	recognize objects in the world,
	navigate about in the world, and
	manipulate objects in the world.

What makes a smart-bomb smart?

	The fact it uses perceptual information to recognize 
	landmarks so it can perform navigational correction 
	during its flight and to recognize its target.



			IMAGE FORMATION

Images are generally captured with a scope or camera.

Perspective projection:

	parallel lines appear to converge in the distance

	the closer something is to the camera/scope the 
	greater the portion of the image it will take up
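
A minimal sketch of perspective projection, assuming an idealized pinhole
camera at the origin looking down the +Z axis. The function name `project`
and the `focal_length` parameter are illustrative, not part of these notes.

```python
def project(point, focal_length=1.0):
    """Project a 3-D point (X, Y, Z) onto the image plane of a pinhole camera."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Dividing by depth is what makes distant things small.
    return (focal_length * X / Z, focal_length * Y / Z)


def rail_gap(depth):
    """Image-space gap between two parallel rails 2 units apart at a given depth."""
    left = project((-1.0, -1.0, depth))
    right = project((1.0, -1.0, depth))
    return right[0] - left[0]


for depth in (1.0, 10.0, 100.0):
    # The gap shrinks toward zero: the parallel rails appear to converge.
    print(depth, rail_gap(depth))
```

The same division by Z explains both bullets above: the rails converge, and
an object of fixed size covers less of the image as its depth grows.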

Optics can be used to determine the physical mappings of 
positions in image to rays extending from the camera or 
scope opening or lens.

	each pixel in the image corresponds to an infinite number 
	of positions in the real world (every point along its ray)
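
The ambiguity can be illustrated with a small sketch, again assuming an
idealized pinhole camera with no lens effects. The helper `points_on_ray` is
hypothetical: it generates world points at several depths that all land on
the same image position.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    return (focal_length * X / Z, focal_length * Y / Z)


def points_on_ray(pixel, depths, focal_length=1.0):
    """World points, one per depth, lying on the ray through the given pixel."""
    x, y = pixel
    return [(x * Z / focal_length, y * Z / focal_length, Z) for Z in depths]


pixel = (0.25, -0.5)
candidates = points_on_ray(pixel, depths=[1.0, 2.0, 4.0, 8.0])
for p in candidates:
    # Every candidate projects back to the same pixel, so a single image
    # cannot tell them apart without additional depth cues.
    print(p, "->", project(p))
```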

	lenses cause focus issues and produce a limited depth of field

Number of pixels determines amount of raw data and 
affects potential for image understanding.

	Common digital cameras produce images from 256x256 to 
	2048x2048 pixels.

	Human eye has approximately 120,000,000 rods and 
	6,000,000 cones arranged in hexagonal grid.

Each pixel will contain one or more intensity values 
indicating amount of light for this point in the image.
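
As a rough illustration of how pixel count drives the amount of raw data,
the sketch below computes uncompressed image size. The defaults (one 8-bit
intensity value per pixel) are assumptions; real sensors and file formats
vary.

```python
def raw_image_bytes(width, height, channels=1, bits_per_channel=8):
    """Uncompressed size in bytes of a width x height image."""
    return width * height * channels * bits_per_channel // 8


# One intensity value per pixel (grayscale) at the small end of the range.
print(raw_image_bytes(256, 256))
# Three intensity values per pixel (e.g. red, green, blue) at the large end.
print(raw_image_bytes(2048, 2048, channels=3))
```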



			IMAGE PROCESSING

Edge Detection

	map from variation in intensities to discontinuities in 
	surfaces in the world

	in human perceptual system, edge detection begins in 
	the retina (before the optic signal reaches the brain)

	in pixelated images, edges are found by smoothing the 
	image (convolving it with a smoothing function) and then 
	identifying large derivatives of brightness
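
A minimal one-dimensional sketch of this idea, applied to a single row of
pixel intensities. The Gaussian smoothing function, the forward-difference
derivative, and the `threshold` value are all assumptions chosen for
illustration; practical edge detectors work in two dimensions and are more
elaborate.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal, kernel):
    """Same-length 1-D convolution; border samples are replicated."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for idx, w in enumerate(kernel):
            j = min(max(i - (idx - radius), 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def find_edges(row, sigma=1.0, threshold=20.0):
    """Indices i where an edge lies between pixel i and pixel i + 1."""
    smooth = convolve(row, gaussian_kernel(sigma, max(1, int(3 * sigma))))
    # Forward difference approximates the derivative of brightness.
    deriv = [smooth[i + 1] - smooth[i] for i in range(len(smooth) - 1)]
    edges = []
    for i in range(1, len(deriv) - 1):
        m = abs(deriv[i])
        # Keep only strong local maxima of the derivative magnitude.
        if m >= threshold and m > abs(deriv[i - 1]) and m >= abs(deriv[i + 1]):
            edges.append(i)
    return edges


row = [10] * 10 + [200] * 10   # dark region followed by a bright region
print(find_edges(row))          # edge between pixels 9 and 10
```

Smoothing first keeps small noise-driven brightness changes from being
reported as edges; the threshold then selects only large derivatives.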

Scene Processing

	segment scene into distinct objects

	determine position and orientation of each object

	determine shape of each object

Extracting 3-D (Depth) Information

	motion -- as camera moves, look for motion of 
		identifiable features

	binocular stereopsis -- compare disparity between two 
		images from different locations

	texture -- patterns of texture vary regularly depending 
		upon orientation and distance

	shading -- brightness of reflection on a surface varies 
		with orientation and distance
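
For binocular stereopsis, the relationship between disparity and depth can
be sketched as follows. This assumes two identical pinhole cameras with
parallel optical axes separated by a known baseline, and a feature already
matched between the two images; the function and parameter names are
illustrative.

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline):
    """Depth of a matched feature from its horizontal disparity.

    x_left, x_right: horizontal image position of the feature (pixels).
    focal_length_px: focal length expressed in pixels.
    baseline: distance between the two cameras (result uses the same unit).
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("expected positive disparity for a point in front of the cameras")
    # Similar triangles: depth = focal length * baseline / disparity.
    return focal_length_px * baseline / disparity


# Nearby features shift more between the two images than distant ones.
near = depth_from_disparity(360.0, 340.0, focal_length_px=700.0, baseline=0.5)
far = depth_from_disparity(342.0, 340.0, focal_length_px=700.0, baseline=0.5)
print(near, far)
```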



		FROM LINES AND OBJECTS TO SHAPES

Once we have moved from pixels to lines and objects, we 
can begin to determine the shapes of the objects.

Approach: Categorize lines based on physical reason for 
their existence.

Categories for lines:
	Boundary -- line marks border between objects
	Interior -- line marks border between surfaces of a single object

	Interior lines classified as to whether the connection of 
	surfaces is convex (protruding) or concave (recessed).

Limiting world to set of objects with three-faced vertices 
yields eighteen possible junction configurations (out of 
208 if there were no physical constraints).
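
The figure of 208 can be reproduced with a short count. The sketch assumes
the usual setup for this kind of line labelling, which these notes do not
spell out: four labels per line (convex, concave, and boundary in either of
two directions) and four junction shapes (L, fork, T, arrow). The names
below are illustrative.

```python
LINE_LABELS = ("convex", "concave", "boundary-forward", "boundary-backward")

# Number of lines meeting at each junction shape.
JUNCTION_LINES = {"L": 2, "fork": 3, "T": 3, "arrow": 3}

# With no physical constraints, every line takes any label independently.
unconstrained = {shape: len(LINE_LABELS) ** n
                 for shape, n in JUNCTION_LINES.items()}
total = sum(unconstrained.values())

print(unconstrained)
print(total)  # 16 + 64 + 64 + 64
```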

David Waltz's Constraint Propagation

By working from any known line types, fill in other lines 
based on physical constraints.

Introducing shadows and cracks increases the number of 
connections and labellings but constraint propagation still 
works (just with a larger search space).
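
A minimal sketch of constraint propagation over junction labellings. The
drawing, the three-label alphabet ("+" convex, "-" concave, "B" boundary
with direction ignored), and the candidate sets are a made-up toy example,
not the real eighteen-entry junction catalog; `waltz_filter` is a
hypothetical name.

```python
from collections import deque


def waltz_filter(junction_lines, candidates):
    """Prune junction labellings that disagree with their neighbours.

    junction_lines: {junction: tuple of line names meeting there}
    candidates:     {junction: set of label tuples, one label per line}
    """
    domains = {j: set(c) for j, c in candidates.items()}
    touching = {}
    for j, lines in junction_lines.items():
        for line in lines:
            touching.setdefault(line, []).append(j)

    queue = deque(domains)
    while queue:
        j = queue.popleft()
        for pos, line in enumerate(junction_lines[j]):
            # Labels junction j still permits for this shared line.
            allowed = {labelling[pos] for labelling in domains[j]}
            for other in touching[line]:
                if other == j:
                    continue
                opos = junction_lines[other].index(line)
                kept = {lab for lab in domains[other] if lab[opos] in allowed}
                if kept != domains[other]:
                    domains[other] = kept
                    if other not in queue:
                        queue.append(other)  # its neighbours need rechecking
    return domains


# Toy drawing: a triangle of junctions A, B, C joined by lines ab, bc, ca.
junction_lines = {"A": ("ab", "ca"), "B": ("ab", "bc"), "C": ("bc", "ca")}
candidates = {
    "A": {("+", "B")},                           # known line types
    "B": {("+", "-"), ("-", "+"), ("B", "B")},
    "C": {("-", "B"), ("+", "+"), ("B", "-")},
}
result = waltz_filter(junction_lines, candidates)
print(result)
```

Starting from the known labelling at A, each neighbouring junction discards
the labellings that disagree on a shared line, and every junction whose set
shrinks is queued so that its own neighbours are rechecked.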