Concept Learning, Neural Networks, and the Pursuit
of Artificial Intelligence
(written for a Psychology of Learning class)
11/30/2003
The Pursuit of AI
One of several
goals utilizing computer modeling of psychological processes is the achievement
of so-called artificial intelligence (AI), man-made networks of programs
coordinated in such a way as to one day achieve consciousness. Efforts
toward this end have produced a rapidly growing history of increasingly
successful approximations, but all still fall short of the most straightforward
measure of apparent consciousness: the Turing test. [Non-fulfillment
of Turing test confirmed as of 2000 (Moor, 2001).] There are two primary
possibilities as to why the many aspiring developers of AI have so far failed
to achieve their goal: either modern computers are not yet up to the task
or no one has yet hit upon the right approach. Given the amazing processing
power of clusters of networked computers arranged in parallel and the computational
equivalency estimates of the human brain given by Kurzweil (1990), it seems
most likely that the latter is the case. With this in mind, it seems
the obvious course of action is to attempt to determine a more effective
approach to AI, perhaps one more closely modeled on what we know of human
learning processes.
First and foremost
of the many concerns in developing a more effective approach to AI is to
identify the level or levels of processing which are most essential to the
emergence of consciousness. Consciousness is perhaps most succinctly
described as relational self-awareness, relying upon a self-concept and
other concepts with which that self-concept is related. It then seems
reasonable to conjecture that the essential level we are looking for is
contingent on the ability to learn and combine concepts, and so that is
where we shall focus our attention. In focusing on the conceptualization
process, it is clear that we are following the information processing paradigm,
and thereby largely neglecting the issue of external inputs and outputs such
as visual and auditory senses, as well as motor function. This seems
perfectly appropriate, as the relevant psychological theories and research
in concept learning frequently border on this level of abstraction anyway.
(Also, other subfields of AI are working diligently on the areas which are
neglected here.)
Concept Learning
A concept is
a set of objects, events, or concepts sharing common features (Lieberman,
2000). Concepts can be considered fundamental units of thought comprising
internal relationships in the mind representing relationships between the
many phenomena experienced through living. Investigation of the nature
of a process is a central component of the development of any model, and
concept formation is no different. Thus it is appropriate to take
a look at concept formation in its basic form in animals as well as in its
more complex form in humans.
In animals, concept
formation is not immediately evident, but for several decades now, experiments
have been going on that provide supporting evidence. Rudimentary concept
formation (post-training recognition of novel pictures with human presence)
was demonstrated in pigeons through the use of operant conditioning of screen
pecking by Herrnstein and Loveland (1964). A later experiment by Herrnstein,
Loveland, and Cable (1976) showed that pigeons can even learn to distinguish
pictures of a particular person from other people in similar pictures.
Cook, Katz, and Cavoto (1997) provided evidence that pigeons can learn to
conceptualize sameness versus difference in a variety of visual stimuli.
These are certainly encouraging results, but they represent only one of the
most basic varieties of concept learning: visual discrimination. These
are only a few of many experiments in this area, many of which may be more
impressive, but these few provide a sufficient indication of the basic animal
capacity for concept formation to allow us to proceed to more abstract
types of concept formation in humans.
In humans, concept
formation begins at a basic level more or less on a par with that seen in
early experiments in concept learning with pigeons. However, no one
need conduct experiments to confirm that humans can distinguish pictures
containing humans from those that do not, identify an individual person
from others, or identify the presence of trees, colors, or other features.
More meaningful is the investigation of the processes involved in such examples
of conceptualization. Prominent theories in this direction include:
prototype theory, described by Wittgenstein (1953) and further developed
by Rosch (1973); exemplar theory; ACT-R, pioneered by John R. Anderson (1996);
and neural network modeling. Each of these theories is worth some
consideration for use, except perhaps ACT-R, because it is already a major
modeling project in its own right. Prototype and exemplar theories
provide possible explanations of the process of categorization in concept
learning, while neural networks provide a modeling basis for an unknown
(but wide) variety of processes.
Prototype theory
describes a predominantly intensional mode of conceptual categorization,
meaning that it assumes that categories are defined by properties shared
by category elements (embodied in a prototype). The theory supposes
that instances of a concept (which in many cases can be thought of as simply
a category) are identified by their similarity to an ideal form of that concept
called a prototype. Thus a prototype is a mental template that is used
to identify whether or not some thing is a member of the set of things encompassed
by the relevant category/concept. These prototypes are rather like
an average of the known characteristics of members of the conceptual class.
The prototype itself need not exist as an actual instance of the category
it represents; instead, it may be an amalgam of category members.
It seems reasonable to think of prototypicality as a measure of commonality
between some thing’s characteristics and the set of a category’s characteristics
weighted by their frequency of occurrence. For example, suppose there
is someone who has never seen a penguin before, and then, upon seeing one,
tries to classify it as being an animal or not an animal. According
to prototype theory, such a penguin-naïve observer thinks of features
typical of animals, such as a head, a mouth, and eyes. The penguin
appears to have each of these things and so it is determined to be an animal
due to its possession of such characteristics as are typical of other animals
(without especial regard to any other particular animals).
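The frequency-weighted notion of prototypicality above can be read as a computation, sketched below. Everything here is illustrative: the six binary features, the four remembered animals, the scoring formula (the share of the prototype's frequency-weighted features that an item possesses), and the 0.5 membership threshold are all assumptions. None of them belongs to prototype theory as published.

```python
import numpy as np

# Hypothetical feature order: head, mouth, eyes, fur, feathers, swims (1 = present).

def make_prototype(members):
    """Prototype = how often each feature occurs among known category members."""
    return np.mean(np.asarray(members, dtype=float), axis=0)

def prototypicality(item, prototype):
    """Share of the category's frequency-weighted features the item has (0 to 1)."""
    item = np.asarray(item, dtype=float)
    return float(item @ prototype / prototype.sum())

def is_member(item, prototype, threshold=0.5):
    # The threshold is a hypothetical cutoff for "typical enough".
    return prototypicality(item, prototype) >= threshold

known_animals = [
    [1, 1, 1, 1, 0, 0],  # dog
    [1, 1, 1, 0, 1, 0],  # sparrow
    [1, 1, 1, 0, 0, 1],  # fish
    [1, 1, 1, 1, 0, 0],  # cat
]
animal_prototype = make_prototype(known_animals)  # frequencies [1, 1, 1, .5, .25, .25]

penguin = [1, 1, 1, 0, 1, 1]
rock = [0, 0, 0, 0, 0, 0]
print(prototypicality(penguin, animal_prototype))  # → 0.875
print(is_member(penguin, animal_prototype), is_member(rock, animal_prototype))
```

The penguin is never compared with any particular animal. Only the single averaged template is consulted.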
Exemplar theory
describes a predominantly extensional mode of conceptual categorization,
meaning that it assumes that categories are defined by the set of their
member elements. Focusing on members of categories, exemplar
theory supposes that instances of a category/concept are identified by their
similarity to real-world examples, or exemplars, of a category. Exemplars
are stored directly as members of a category and are then compared directly
in identifying new category members. Category membership is concluded
if sufficient similarity to existing category members exists. For example,
suppose our penguin-naïve observer first encounters a penguin and tries
to classify it as an animal or non-animal. According to exemplar theory,
the penguin is compared to specific known animals such as a dog, a duck, and
an otter. Assuming the penguin seems relatively similar to these category
exemplars, it will be determined to be an animal as well, and may even be
included in future exemplar sets.
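An exemplar-style sketch works differently. It keeps every remembered animal and compares a newcomer against each one. Several parts of this sketch are illustrative assumptions:

- the simple-matching similarity measure;
- averaging similarity over all exemplars;
- the 0.6 threshold.

Published exemplar models use more elaborate similarity functions. The hypothetical feature order is head, mouth, eyes, fur, feathers, swims.

```python
import numpy as np

def similarity(a, b):
    """Simple matching: fraction of features on which two items agree."""
    return float(np.mean(np.asarray(a) == np.asarray(b)))

class ExemplarCategory:
    def __init__(self, exemplars, threshold=0.6):
        self.exemplars = [list(e) for e in exemplars]
        self.threshold = threshold  # hypothetical "sufficient similarity" cutoff

    def consider(self, item):
        """Compare the item with every stored exemplar; admit it if similar enough."""
        mean_sim = np.mean([similarity(item, e) for e in self.exemplars])
        if mean_sim >= self.threshold:
            self.exemplars.append(list(item))  # joins future exemplar sets
            return True
        return False

animals = ExemplarCategory([
    [1, 1, 1, 1, 0, 0],  # dog
    [1, 1, 1, 0, 1, 1],  # duck
    [1, 1, 1, 1, 0, 1],  # otter
])
penguin = [1, 1, 1, 0, 1, 1]
rock = [0, 0, 0, 0, 0, 0]
print(animals.consider(penguin), animals.consider(rock))
```

Classifying the penguin here costs one comparison per stored animal. The cost grows as the category accumulates members.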
Concept
The prototype
and exemplar approaches to conceptual categorization differ in a subtle
but important way. According to the exemplar approach, category membership
is decided by comparison to numerous specific category members. Contrastingly,
in the prototype approach, category membership is decided by comparison
to a single composite prototype that represents the typical features of the
category. Upon careful analysis, it becomes clear that the comparison
process involved in the prototype approach is a compressed version of the
exemplar comparison process. Instead of making multiple comparisons
of the characteristics of a potential category member to numerous specific
category members, a prototype allows for a single comparison to be made.
Presumably, the prototype is generated by cross-referencing similar category
items, producing a template for recognizing new category members.
Thus it seems that the prototyping approach to categorization may represent
a stage of concept formation that has progressed to a somewhat higher (more
abstract) cognitive level than that represented by the exemplar approach.
Perhaps the prototype approach takes over at the point where the exemplar
approach becomes inefficient or when a sufficient variety of category examples
are available from which to form an effective prototype. Regardless
of the particulars, the result is a conceptual template that is of tremendous
utility.
These conceptual
templates, as representatives of entire categories, can become building
blocks of even more abstract (superordinate) concepts. Continuation
of the process of recursively abstracting categories results in a hierarchy
of concepts, which, to the best of our knowledge, is exclusive to humans.
The development of this hierarchy coincides with the development of language
and underlies its use. The same might be said of intelligence.
When an extensive range of concepts, including self-referent concepts (e.g.
‘I’), are integrated into this conceptual structure, it may be that consciousness
inevitably results. This possibility underlies one of the most profound
ambitions of the field of artificial intelligence, and of this paper’s author
in particular. It is with this in mind that neural networks based
(loosely) on the Collins and Loftus (1975) model are considered as potentially
integral components in modeling the processes of conceptualization.
Neural Networks
Computer models
of neural networks in the brain are commonly referred to as (artificial)
neural networks and are seeing increasing use in computer modeling of psychological
processes. Early neural networks were single layered and processed
signals in a single direction (from input to output), for which quality
they are referred to as feedforward. The most basic type of feedforward
neural network, called a perceptron, consists of an input layer (layer 0)
fully (unidirectionally) connected to an output layer (layer 1). (As
suggested by Wasserman (1989), such a perceptron is best described as being
a single layer neural network, because the input layer (layer 0) performs
no data processing.) Perceptrons were limited to binary inputs as
well as outputs until Widrow and Hoff showed how single-layer networks with
continuous inputs could be trained by an error-correcting rule applied to the
weighted sum of the inputs (Widrow, 1959; Widrow & Hoff, 1960). Although
perceptrons were shown to be capable of learning any function they can
represent (Rosenblatt, 1962), they were really quite limited in their
applications. As Marvin Minsky and Seymour Papert showed in their 1969 book
Perceptrons, such simple functions as the XOR (exclusive or) logic gate cannot
be represented, and therefore cannot be learned, by single-layer perceptrons;
significant advances in neural network design would require one or more hidden
layers nestled between the input and output layers.
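The AND/XOR contrast is easy to reproduce. The sketch below trains a single threshold unit with the perceptron learning rule, in which weights change only when the output is wrong. It trains on both functions. The epoch count and learning rate are arbitrary choices. AND is linearly separable, so it is learned perfectly. No setting of the weights gets all four XOR cases right.

```python
import numpy as np

def train_perceptron(X, targets, epochs=25, lr=1.0):
    """Perceptron learning rule for one threshold output unit."""
    X = np.asarray(X, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1 if x @ w + b > 0 else 0
            # Weights and bias change only when the output is wrong.
            w += lr * (t - y) * x
            b += lr * (t - y)
    return w, b

def accuracy(X, targets, w, b):
    X = np.asarray(X, dtype=float)
    preds = (X @ w + b > 0).astype(int)
    return float(np.mean(preds == np.asarray(targets)))

INPUTS = [[0, 0], [0, 1], [1, 0], [1, 1]]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

for name, targets in [("AND", AND), ("XOR", XOR)]:
    w, b = train_perceptron(INPUTS, targets)
    print(name, accuracy(INPUTS, targets, w, b))
```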
Single layer
neural networks consist of an input layer of nodes with weighted links to the
output layer of nodes, each of which has a summation function (which serves
to sum the weighted inputs) and an activation function which determines,
based on the node’s activation threshold, the extent to which that sum will
result in output. The weights associated with each processing (non-input)
node are scalar values that are modified as the network is trained.
There are two basic types of neural network training processes: supervised
and unsupervised. Supervised training couples input data with desired
output data, which is used to compute error values (consisting of the values
of the desired output minus the actual output), which are in turn used to
modify the weights of the output layer. Each weight of an output node
is adjusted by the product of an arbitrarily chosen learning rate constant,
that node’s error value, and the input value carried by that weight. This
process is called the delta rule, and it is formally equivalent to the
Rescorla-Wagner model of associative learning. Unsupervised training
consists of input data only, leaving the neural network to learn based on
its own internal algorithms alone. Where desired outputs are available,
supervised learning is typically much more effective.
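Below is a minimal sketch of the delta rule for a single linear output unit. It assumes binary cue inputs and a learning rate of 0.2; both are arbitrary choices. Read as Rescorla-Wagner, the pieces map as follows:

- each weight is a cue's associative strength V;
- the target is λ;
- the unit's output is the summed strength of the cues present;
- the learning rate plays the role of αβ.

Under that reading the same update reproduces blocking. Once cue A alone has been trained to predict the outcome, the compound AB produces almost no error, so B gains almost no strength.

```python
import numpy as np

def delta_rule(w, x, target, lr):
    """One step for a linear unit: w <- w + lr * (target - output) * x."""
    output = np.dot(w, x)             # Rescorla-Wagner: summed V of present cues
    return w + lr * (target - output) * x

def run_trials(w, x, target, n_trials, lr=0.2):
    for _ in range(n_trials):
        w = delta_rule(w, x, target, lr)
    return w

A = np.array([1.0, 0.0])   # cue A presented alone
AB = np.array([1.0, 1.0])  # compound of cues A and B

# Blocking: pretrain A, then train the AB compound toward the same outcome.
w_blocked = run_trials(np.zeros(2), A, 1.0, 50)
w_blocked = run_trials(w_blocked, AB, 1.0, 50)

# Control: train the AB compound from scratch; A and B share the strength.
w_control = run_trials(np.zeros(2), AB, 1.0, 50)

print(w_blocked.round(3), w_control.round(3))
```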
Multilayer networks
are simply cascades of single layer networks, with each layer receiving
its input from the previous layer’s output. All of the layers in a
multilayer network, excepting layer 0 (the input layer) and layer n (the
final output layer), are referred to as hidden layers. With the inclusion
of hidden layers, training neural networks to achieve their representational
potential becomes a complex issue. Several training methods have been
devised in the interest of seeking more optimal solutions, one of which
is the backpropagation algorithm. With the advent of the backpropagation
algorithm (see Werbos, 1994), it became possible to use multilayer neural
networks in a wide range of applications for which single layer networks
are simply ineffective.
The ineffectiveness of single-layer networks is due to their incapacity
to make discriminations between data sets more complex than those separable
by a single line (for two-input networks) on a two-dimensional (input) graph.
More generally, if n is the number of inputs in a single-layer network, a
binary-output function can be represented by the network if and only if the
input points that should produce each output value can be divided into two
regions (in n-dimensional input space) by an (n-1)-dimensional hyperplane.
For two-dimensional (2D) spaces the dividing hyperplane is a line, for 3D
spaces it is a plane, for 4D spaces it is a 3D hyperplane, and so on. Many
problems of practical interest, such as the XOR function, are linearly
inseparable and therefore impossible to solve with a single-layer network.
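The XOR limitation can be checked directly. Assume a single threshold unit that outputs 1 exactly when w_1 x_1 + w_2 x_2 + b > 0; these symbols are introduced here for illustration. The four XOR cases then impose contradictory constraints:

```latex
\begin{aligned}
(0,0) \mapsto 0 &:\quad b \le 0 \\
(1,0) \mapsto 1 &:\quad w_1 + b > 0 \\
(0,1) \mapsto 1 &:\quad w_2 + b > 0 \\
(1,1) \mapsto 0 &:\quad w_1 + w_2 + b \le 0
\end{aligned}
\qquad\Longrightarrow\qquad
w_1 + w_2 + b > -b \ge 0
```

Adding the second and third inequalities gives w_1 + w_2 + 2b > 0, that is, w_1 + w_2 + b > -b. The first inequality makes -b nonnegative, so the fourth constraint cannot also hold. No line separates the XOR outputs.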
Multilayer neural
networks are capable of representing far more complex functions than single-layer
networks. A two-layer network of threshold neurons can represent functions
whose output areas (in n-dimensional input space) are bounded by as many
lines, planes, or hyperplanes as there are neurons in its first layer,
provided that the resulting area is convex, meaning that a line drawn
between any two points inside the area lies entirely within the area (thus
excluding shapes like crescents). Three-layer networks can represent any
variety of areas by combining the convex shapes formed by two layers in any
number of ways, bounded only by the number of neurons used. Knowing what
neural networks are capable of representing, the next most important matter
is the process of training them to actually do so in an efficient and
reliable manner.
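The convex-region idea can be made concrete with a hand-wired sketch: a two-layer network of threshold neurons computing XOR. The weights were chosen by hand rather than trained, and the names are hypothetical. Each first-layer neuron draws one line in the two-dimensional input space. The output neuron fires only inside the strip between the two lines. That strip is a convex region containing (0,1) and (1,0) but not (0,0) or (1,1).

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where the input is positive, else 0."""
    return (np.asarray(z) > 0).astype(int)

# First layer: one threshold neuron per boundary line (hand-chosen weights).
W1 = np.array([[1.0, 1.0],     # fires when x1 + x2 > 0.5
               [-1.0, -1.0]])  # fires when x1 + x2 < 1.5
b1 = np.array([-0.5, 1.5])

# Second layer: AND of the two half-planes, i.e. the convex strip between them.
w2 = np.array([1.0, 1.0])
b2 = -1.5

def two_layer_xor(x):
    hidden = step(W1 @ np.asarray(x, dtype=float) + b1)
    return int(w2 @ hidden + b2 > 0)

print([two_layer_xor(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # → [0, 1, 1, 0]
```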
Training Neural Networks
Of the two basic
neural network training modalities, supervised and unsupervised, only the
unsupervised mode is typically considered biologically plausible.
In the case of humans especially, this seems like a ridiculous position
to take. Typical human development is strongly characterized by feedback
at a variety of levels in almost every conceivable realm of behavior, internal
and external. It is fairly obvious that even infants are capable of
distinguishing (at some level) between comfort and discomfort, which is
enough to provide them with feedback about the effectiveness of their basic
bio-survival actions. In learning to understand language, children
attempt to mimic the sounds of the words they hear, and they can judge their
success by comparing the sounds they have made with the sounds they were
mimicking. Children often learn self-control, social skills, and various
intellectual behaviors through parental reinforcement. Students learn
whether or not the exceptional length of time and effort spent on researching,
thinking about, and writing a paper was worthwhile based on the grades and
feedback received from their instructors (as well as from internal satisfaction
with their work, hopefully). All of these are characteristic of the
influence of supervised learning. The learning literature provides
additional examples too numerous to even begin to list.
One might argue
that the given examples are all macro-level phenomena and therefore inapplicable
at the level of biological neural networks (such an argument shows a great
lack of imagination, but it deserves an attempted elaboration nonetheless).
Following the example of speech-mimicry, suppose a sound pattern (word) is
transmitted through ear and cochlea, transformed into a form usable by the
speech-interpretation areas of the brain, which somehow communicates its
interpretation to the speech production areas, resulting in speech (or an
approximation thereof). At first, these speech approximations will
be rather poor, and probably not sound quite right even to the child.
Recognizing that some degree of error has been made (unsatisfied mimicry-intent),
the entire speech processing/production pathway may then make adjustments
at each level of processing until the result is satisfactory. One
plausible description of this process is exemplified by the backpropagation
algorithm.
The backpropagation
algorithm for training multilayer neural networks is an extension of the
basic supervised training algorithm for single-layer networks, the details
of which can be found in Rumelhart, Hinton, and Williams (1986). The
primary novelty of the backpropagation algorithm is that the error values
calculated at the output layer are multiplied by the derivative of that
layer’s activation function and then propagated backwards through the
connection weights; at each hidden layer the propagated errors are again
multiplied by the derivative of that layer’s activation function, yielding
the error values used to adjust that layer’s weights. Because the algorithm
uses the derivative of the activation function, it is essential to use a
continuously differentiable activation function such as the sigmoid
function, which is commonly chosen because it squashes exceptionally high
and low inputs into a manageable range between 0 and 1. While this method
allows neural networks to be trained to represent a wide variety of tasks,
the convergence of the network to a state of stable representation of
sufficient accuracy can take much longer than with more advanced algorithms.
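Below is a minimal NumPy sketch of the procedure just described, for one hidden layer of sigmoid units trained on XOR. The following are assumptions made for illustration:

- the squared-error loss;
- the layer sizes and the learning rate;
- all function names.

The presentation differs from Rumelhart, Hinton, and Williams (1986). The output error is multiplied by the sigmoid derivative y(1 - y) and sent back through the output weights. It is then multiplied by the hidden layer's own derivative before the hidden weights are adjusted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, n_out, seed=0):
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(0.0, 1.0, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 1.0, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(p, X):
    h = sigmoid(X @ p["W1"] + p["b1"])
    y = sigmoid(h @ p["W2"] + p["b2"])
    return h, y

def loss(p, X, T):
    _, Y = forward(p, X)
    return 0.5 * np.sum((T - Y) ** 2)   # assumed squared-error loss

def backprop(p, X, T):
    """Gradients of the loss with respect to every weight and bias."""
    h, Y = forward(p, X)
    # Output error times the sigmoid derivative y(1 - y).
    delta_out = (Y - T) * Y * (1.0 - Y)
    # Propagate back through W2, then times the hidden sigmoid derivative.
    delta_hid = (delta_out @ p["W2"].T) * h * (1.0 - h)
    return {"W2": h.T @ delta_out, "b2": delta_out.sum(axis=0),
            "W1": X.T @ delta_hid, "b1": delta_hid.sum(axis=0)}

def train(p, X, T, lr=0.5, epochs=5000):
    history = []
    for _ in range(epochs):
        grads = backprop(p, X, T)
        for k in p:
            p[k] -= lr * grads[k]      # gradient descent step
        history.append(loss(p, X, T))
    return history

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
params = init_params(2, 4, 1)
history = train(params, X, T)
print(history[0], history[-1])
```

A finite-difference gradient check is a standard way to confirm that hand-written backpropagation code computes the right derivatives. The check perturbs each weight slightly and measures the resulting change in loss.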
A variety of
more advanced algorithms have been developed as improvements upon or alternatives
to the backpropagation algorithm, one of which is Parker’s (1987) second-order
backpropagation algorithm, which uses second derivatives to better approximate
the optimal weight adjustments to be made during training. Perhaps
even more useful was the discovery by Almeida (1987) and Pineda (1987) of
an implementation of the backpropagation algorithm for recurrent networks
(networks whose outputs feed back to their inputs), the result being a significant
improvement in efficiency over feedforward-only types. Recurrent networks
in general may be, in at least some cases, a more realistic representation
of the brain’s functioning than feedforward networks; this is particularly
true in the case of working memory where learning by rehearsal requires a
feedback loop. There exist even more advanced neural networking paradigms
incorporating recurrence, which, though typically more computationally intensive,
may be well worth the extra processing time for AI applications. One
of the most promising approaches, and one especially appropriate for concept
learning, is ART (Adaptive Resonance Theory) (Grossberg, 1987; Carpenter
& Grossberg, 1987).
ART (Adaptive Resonance Theory)
Neural networks
designed under the architecture of ART are formidable tools for constructing
functional models of learning processes (such as conceptualization) that
are essential in the development of mature human intelligence. ART-based
neural networks are considerably more complex than the more basic varieties
of neural networks that use such algorithms as backpropagation, yet remain
highly flexible and applicable to the modeling of virtually any cognitive
function. One of the foremost features of ART-based networks that
make them ideal for modeling cognitive processes is a blend of stability
and plasticity that allows them to learn to recognize new patterns while
never losing the capacity to make previously learned pattern classifications.
While this limits the number of pattern classifications that can be made
by an ART network of fixed size, it also guarantees the reliability of the
classifications that can be made, a worthwhile tradeoff.
All varieties
of ART-based neural networks share a set of basic features, though many
modifications and enhancements have been developed over the years.
Initially though, it is most important to focus on the core features of ART
as described by Wasserman (1989). The basic ART system is an unsupervised
learning model and typically consists of comparison and recognition fields
(one each) of neurons, a vigilance parameter, and a reset module. (The
setting of the vigilance parameter has considerable influence on the functioning
of the system: higher vigilance will result in highly accurate memories of
exacting detail, while lower vigilance will result in more abstract categorizations
of greater generalizability.) The comparison field takes an input vector
(a one-dimensional array of values) and transfers it to the recognition
field, where its best match is found to be the single neuron whose set of
weights (weight vector) most closely matches the input vector. Each
recognition field neuron has an output that feeds a negative signal (proportional
to that neuron’s quality of match to the input vector) to the inputs of each
other recognition field neuron and inhibits their output accordingly.
Thus the recognition field exhibits lateral inhibition, allowing each neuron
in it to represent a category to which input vectors are classified.
After the input vector is classified, the reset module compares the strength
of the recognition match to the vigilance parameter. If the vigilance
threshold is met, training commences. Otherwise, if the match level
does not meet the vigilance parameter, the firing recognition neuron is inhibited
until a new input vector is applied; training commences only upon completion
of a search procedure. In the search procedure, recognition neurons
are disabled one by one by the reset function until the vigilance parameter
is satisfied by a recognition match. If no committed recognition neuron’s
match meets the vigilance threshold, then an uncommitted neuron is committed
and adjusted towards matching the input vector.
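The cycle just described can be sketched for binary inputs with fast learning. This is a simplified rendering in the spirit of ART-1 and rests on several assumptions. It does not reproduce Wasserman's or Carpenter and Grossberg's equations verbatim. The following are illustrative assumptions:

- the choice function |I AND w| / (beta + |w|);
- the match test |I AND w| / |I| >= vigilance;
- the names ART1, present, and templates.

Lateral inhibition is collapsed into picking the highest-scoring neuron. The reset-and-search procedure becomes iteration over neurons in order of score.

```python
import numpy as np

class ART1:
    """Minimal ART-1-style sketch: binary inputs, fast learning (illustrative)."""

    def __init__(self, vigilance=0.75, choice=0.001):
        self.rho = vigilance   # vigilance parameter
        self.beta = choice     # small constant favoring more specific templates
        self.templates = []    # one binary weight vector per committed neuron

    def present(self, pattern):
        I = np.asarray(pattern, dtype=bool)
        if not I.any():
            raise ValueError("input must contain at least one active feature")
        # Bottom-up choice value for every committed recognition neuron.
        scores = [np.sum(I & w) / (self.beta + np.sum(w)) for w in self.templates]
        # Search: best match first; each vigilance failure acts as a reset.
        for j in np.argsort(scores)[::-1]:
            w = self.templates[j]
            if np.sum(I & w) / np.sum(I) >= self.rho:  # vigilance satisfied
                self.templates[j] = I & w              # fast learning
                return int(j)
        # No committed neuron passed: commit an uncommitted one to this input.
        self.templates.append(I.copy())
        return len(self.templates) - 1

# Higher vigilance yields finer categories for the same inputs.
patterns = [[1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0]]
coarse = ART1(vigilance=0.4)
fine = ART1(vigilance=0.9)
print([coarse.present(p) for p in patterns])  # → [0, 0]
print([fine.present(p) for p in patterns])    # → [0, 1]
```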
There are two
basic methods of training ART-based neural networks: slow and fast.
In the slow learning method, the degree to which the recognition neuron’s
weights are trained towards the input vector is computed as continuous values
by differential equations, and is thus dependent on the length of time the
input vector is presented. With fast learning, algebraic equations are used
to calculate the weight adjustments to be made, and binary values are used.
While fast learning is effective and efficient for a variety of tasks, the
slow learning method is more faithful to biological precedents as well as
being far more appropriate for the real-time operation necessary to realize
the goal of para-human consciousness in AI.
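Here is a toy contrast between the two training modes, built on two assumptions:

- Fast learning is the one-step algebraic update w <- I AND w (elementwise minimum).
- Slow learning is Euler integration of the hypothetical equation dw/dt = rate * (min(I, w) - w).

The slow-learning equation is a simplified stand-in, not ART's actual differential equations. The sketch illustrates the point made above. Under slow learning, how far the weights move depends on how long the input is presented.

```python
import numpy as np

def fast_learning(w, I):
    """Fast learning: weights jump directly to I AND w (elementwise minimum)."""
    return np.minimum(I, w)

def slow_learning(w, I, duration, rate=1.0, dt=0.01):
    """Slow learning (hypothetical stand-in): Euler-integrate
    dw/dt = rate * (min(I, w) - w) for as long as the input is presented."""
    w = np.array(w, dtype=float)
    for _ in range(int(round(duration / dt))):
        w += dt * rate * (np.minimum(I, w) - w)
    return w

w0 = np.array([1.0, 1.0, 0.0, 1.0])
I = np.array([1.0, 0.0, 0.0, 1.0])
print(fast_learning(w0, I))               # one algebraic step
print(slow_learning(w0, I, duration=0.5)) # brief presentation: partial change
print(slow_learning(w0, I, duration=10))  # long presentation: approaches fast result
```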
Expanding upon
the basic ART architecture are a variety of systems, each with its own
special properties (summarized here). ART-1 is the simplest variety
of ART networks, accepting only binary inputs. ART-2 extends network
capabilities to support continuous inputs, a more flexible configuration.
ART-2A is a streamlined form of ART-2 with a drastically accelerated runtime,
at the cost of results that are only rarely suboptimal compared
to the full ART-2 implementation (Carpenter, Grossberg, & Rosen, 1991a).
ART-3 (Carpenter & Grossberg, 1990) builds on ART-2 by simulating rudimentary
neurotransmitter regulation of synaptic activity by incorporating simulated
sodium (Na+) and calcium (Ca2+) ion concentrations into the system’s equations,
which results in a more physiologically realistic means of partially inhibiting
categories that trigger mismatch resets. Fuzzy ART (Carpenter, Grossberg,
& Rosen, 1991b) implements fuzzy logic into ART’s pattern recognition,
thus enhancing generalizability. An optional (and very useful) feature
of fuzzy ART is complement coding, a means of incorporating the absence
of features into pattern classifications, which goes a long way towards
preventing inefficient and unnecessary category proliferation. ARTMAP
(Carpenter, Grossberg, & Reynolds, 1991), also known as Predictive ART,
combines two slightly modified ART-1 or ART-2 units into a supervised learning
structure where the first unit takes the input data and the second unit
takes the correct output data, which is then used to make the minimum possible
adjustment of the vigilance parameter in the first unit needed to make the correct
classification. Fuzzy ARTMAP (Carpenter et al., 1992) is merely ARTMAP
using fuzzy ART units, resulting in a corresponding increase in efficacy.
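Fuzzy ART replaces ART-1's binary AND with the fuzzy AND, which is the elementwise minimum. The sketch below assumes inputs scaled to [0, 1]. It shows complement coding, which turns each input a into (a, 1 - a). Absent features therefore carry explicit weight, and every coded input has the same total size. The choice, match, and learning equations follow the commonly cited Fuzzy ART form. The class and parameter names are hypothetical. A learning_rate of 1.0 corresponds to fast learning, and smaller values give slower recoding.

```python
import numpy as np

def complement_code(a):
    """Complement coding: a -> (a, 1 - a), so absent features are explicit."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

class FuzzyART:
    """Minimal Fuzzy ART sketch (illustrative names and defaults)."""

    def __init__(self, vigilance=0.75, choice=0.001, learning_rate=1.0):
        self.rho = vigilance
        self.alpha = choice
        self.beta = learning_rate  # 1.0 = fast learning; < 1.0 = slow recoding
        self.weights = []

    def present(self, a):
        I = complement_code(a)
        # Fuzzy AND is the elementwise minimum; |x| is the sum of components.
        scores = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.weights]
        for j in np.argsort(scores)[::-1]:
            w = self.weights[j]
            if np.minimum(I, w).sum() / I.sum() >= self.rho:  # vigilance test
                self.weights[j] = self.beta * np.minimum(I, w) + (1 - self.beta) * w
                return int(j)
        # Commit an uncommitted node (all-ones weights) and train it on I.
        w_new = np.ones_like(I)
        self.weights.append(self.beta * np.minimum(I, w_new) + (1 - self.beta) * w_new)
        return len(self.weights) - 1

net = FuzzyART(vigilance=0.7)
points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.85), (0.88, 0.9)]
print([net.present(p) for p in points])  # → [0, 0, 1, 1]
```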
All of the above
ART systems are capable of demonstrating impressive capacity for and accuracy
of learning, and each may have its place in a complete artificial intelligence
project, though some systems may be more or less appropriate for various
functions. For instance, ARTMAP and Fuzzy ARTMAP are particularly
effective in situations when supervised learning is feasible; a Fuzzy ARTMAP
network has been shown to satisfactorily learn a test set in 1/4000 as many
training sessions as a backpropagation network (Carpenter, Grossberg, &
Rosen, 1991b) (though it could be that Fuzzy ARTMAP took longer to complete
each training session). Continuous-valued Fuzzy ART may be the most
appropriate choice for unsupervised applications because it is an intuitively
better approximation of biological learning (i.e., it operates in real time).
Fuzzy ART has the additional advantage of using complement coding to ensure
the efficient use of recognition layer neurons, which amounts to
making more efficient use of computer memory.
Hierarchy, Attention, and Context in Conceptual Development
Since we are
developing a functional model of concept formation’s role in the development
of AI (the possibility of which is rather a hypothesis in itself), it is
not necessary to base the design exclusively on empirically verified theories.
Certainly in many cases it will be beneficial to make use of empirical evidence
characterizing the nature of biological intelligence, but by no means will
all of the elements of this AI design derive from biological models.
In fact, in attempting to provide an existence proof of the AI hypothesis
(merely that it can be done), we can afford to use hypothetical arguments
and even wild speculation so long as it seems to be useful in the overall
design. This in mind, the roles of hierarchy, attention, and context
are discussed subsequently.
It is fairly
evident at this point that the organization of conceptual structure in consciousness
is hierarchical in nature. Therefore, it is obvious that the structure
of models of concept learning should be hierarchical as well. It has
been suggested by Dean Keith Simonton (1999) that creativity may be inversely
proportional to the degree of hierarchical organization in a person’s conceptual
structures, and extending this idea to AI, it may be desirable to initially
opt for a lesser degree of creativity by emphasizing hierarchy. It
would not do for a pioneering example of successful AI to appear
overly creative, as such creativity might lead to unpredictable behavior
that could be regarded as dangerous and aversive. Of course, this may
be an overly optimistic concern, but nonetheless, the need for hierarchy
in concept formation is clear.
Attention is
a very slippery subject because of the tremendous role it plays in so many
aspects (perhaps all aspects?) of consciousness, but hopefully by focusing
on its role in the process of conceptualization a useful analysis and synthesis
can be explicated. In concept formation, a set of features is identified
as being common to one or more perceptual objects (be they visual, aural,
kinesthetic, conceptual, or otherwise) that are then combined into a conceptual
category. At first, when this category consists of only a few members
(or even just one) it may resemble the description of exemplar theory, but
it is most likely that as more members are subsumed into the category it
becomes more like a prototype. As members of a conceptual category
are recognized, it becomes apparent which features most consistently and
saliently recur throughout the category. These recurring features
then become focal points of attention in the context of recognizing the
category to which they apply.
Multiple levels
of categorization occur in the process of concept formation, and conceptual
dimensionalization is sometimes active in the formation of higher-level
concepts. In conceptual dimensionalization, as already-formed basic
concepts recur in recognizable patterns, they will themselves become the
basic elements for the formation of new, more abstract, concepts.
In some cases, the conceptual components of these more abstract concepts
are usefully analyzed by virtue of the quality or frequency of their occurrence,
that is, their dimensions. Examples of this dimensionality are clearest
in such categories as polygons (dimension: number of sides) and colors (dimensions:
light frequency and intensity). Hierarchical Fuzzy ART networks should
be capable of such classifications without any special configuration because
of the multiple levels of synaptic weights involved. Thus, in a sense,
a hierarchical neural network approach to AI may be able to pay attention
(and do so selectively at a variety of conceptual levels). In fact,
Grossberg (2003) specifically refers to this capacity of ART-based neural
networks as attentional focus.
Having established
that a hierarchical system constructed with ART-based neural networks could
effectively pay selective attention to relevant components of concepts in
the process of higher-level concept formation, it becomes necessary to establish
the capability to do so in a useful manner. No information is readily
available on the ability of such complex systems to provide the information
they are clearly capable of providing in a way that is relevant to the
context in which it is needed. Therefore, it seems reasonable to hypothesize
a system architecture in which the conveyance of the contextual relevance
of information would be possible.
Architecture and System Function
Many important issues
in the design of a concept-learning based AI system do not arise until larger
portions of total system function are analyzed, and so it becomes prudent
to develop significant portions of system architecture that will produce
the desired functionality. Inevitably, shortcomings will become evident,
and an essential part of the design process is to identify and solve the
problems underlying those shortcomings. Suppose, then, that the hierarchies
of ART-based neural networks are arranged pyramidally, in typical hierarchical
form. Considering the level of necessary connectivity and addressability
for conceptual structures to be very high, these (essentially two-dimensional)
pyramids of conceptual hierarchies then should be arranged in a form that
will allow both the origin of input and level of abstraction of concept
nodes’ output to be easily mapped. A torus in which pyramid tops are
radially arrayed towards the center, with their bases spreading away from the
center seems like a feasible structure. In this arrangement, the pyramids
of neural networks would exhibit radial symmetry, with no more than two
pyramids fully in any given plane. The torus shape has several advantages:
concept nodes are addressable by radial position (general area of input origin)
and distance from the center (level of abstraction); the addition of additional
modules at any orientation to the torus is quite feasible; the torus is arbitrarily
scalable to allow any necessary level of penetrability and exposure of surface
area; the central region is left open for unknown functions that may be
hierarchically superior to the primary conceptual structure; the torus readily
lends itself to a variety of interesting mathematical analyses. It
should be noted that because the system is to be implemented in software
its physical geometry is irrelevant to its function, and is presented only
as a visualizable metaphor for its actual computational functioning.
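Because the geometry is only a metaphor, the addressing scheme can be sketched as plain data. In this minimal Python sketch, each concept node is identified by three values. Its sector is the pyramid it belongs to, which gives the radial position and general area of input origin. Its level is 0 at the outer sensory surface and increases toward the center. Its index is its position within that level. The number of sectors, the base width, the halving of width at each level, and all names are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sizes, chosen only for illustration.
N_SECTORS = 8     # pyramids arrayed radially around the torus (assumed even)
BASE_WIDTH = 16   # nodes at the outermost, sensory level of each pyramid
N_LEVELS = 5      # widths 16, 8, 4, 2, 1 from the surface to the apex


@dataclass(frozen=True)
class NodeAddress:
    sector: int  # radial position: which pyramid, i.e. general area of input origin
    level: int   # 0 = outer surface; larger = closer to the center, more abstract
    index: int   # position of the node within its level


def width_at(level: int) -> int:
    """Pyramidal form: each level has half as many nodes as the level outside it."""
    return BASE_WIDTH >> level


def sector_for_angle(degrees: float) -> int:
    """Map the angular origin of an input onto the pyramid that receives it."""
    return int((degrees % 360) // (360 / N_SECTORS))


def children(addr: NodeAddress) -> List[NodeAddress]:
    """Less abstract nodes, one level farther out, that feed this node."""
    if addr.level == 0:
        return []
    return [NodeAddress(addr.sector, addr.level - 1, 2 * addr.index + k) for k in (0, 1)]


def parent(addr: NodeAddress) -> Optional[NodeAddress]:
    """More abstract node, one level closer to the center; None at the apex."""
    if addr.level == N_LEVELS - 1:
        return None
    return NodeAddress(addr.sector, addr.level + 1, addr.index // 2)


def share_plane(s1: int, s2: int) -> bool:
    """Pyramids lie in one plane through the torus axis only if they are the
    same pyramid or diametrically opposite ones (at most two per plane)."""
    return (s1 - s2) % (N_SECTORS // 2) == 0
```

For example, an input arriving from 90 degrees lands in sector 2, and the apex of that pyramid is `NodeAddress(2, 4, 0)`. Sectors 0 and 4 share a plane, while sectors 0 and 2 do not.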
Suppose, then, that
the AI system’s hierarchies of neural networks are arranged in the form of
a torus, with sensory inputs typically converging upon the outer surface of
the torus and the level of conceptual abstraction increasing toward its
center. Contextual information would not appear to be preserved by a strictly
linear traversal of the conceptual structure from margin to center, since
each level passes inward only its more abstract summary, and the specifics
represented at outer levels would be lost along the way. However, if a second
set of output lines were to branch off from each conceptual node of the
hierarchy and feed all output-processing areas that might need that
information, essential contextual information might be preserved. For
example, such connections would logically extend to language-processing
modules for the production of speech and written language.
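A toy Python sketch can illustrate how branching output lines differ from a strictly linear traversal. Each hypothetical `ConceptNode` relays its activation inward to its parent along the main, margin-to-center path. It also sends that activation directly to every output module tapped into it, such as a language module. The class names, the three-concept chain, and the unchanged relaying of activation are simplifying assumptions. A real ART hierarchy would combine many inputs at each node.

```python
from typing import List, Optional, Tuple


class OutputModule:
    """Hypothetical output-processing area, e.g. a speech or writing module."""

    def __init__(self, name: str):
        self.name = name
        self.received: List[Tuple[str, int, float]] = []  # (node, level, activation)

    def receive(self, node: str, level: int, activation: float) -> None:
        self.received.append((node, level, activation))


class ConceptNode:
    def __init__(self, name: str, level: int, parent: Optional["ConceptNode"] = None):
        self.name = name
        self.level = level    # higher = more abstract, closer to the center
        self.parent = parent  # main line toward the center
        self.taps: List[OutputModule] = []  # the second set of output lines

    def connect_output(self, module: OutputModule) -> None:
        self.taps.append(module)

    def fire(self, activation: float) -> None:
        # Branching lines: subscribed output modules see this node directly.
        for module in self.taps:
            module.receive(self.name, self.level, activation)
        # Main line: relay inward (simplified: activation passed on unchanged).
        if self.parent is not None:
            self.parent.fire(activation)


# Hypothetical chain from a sensory feature to an abstract concept.
speech = OutputModule("speech")
fruit = ConceptNode("fruit", level=2)
apple = ConceptNode("apple", level=1, parent=fruit)
red = ConceptNode("red", level=0, parent=apple)
for node in (red, apple, fruit):
    node.connect_output(speech)
red.fire(0.9)
print([name for name, _, _ in speech.received])  # ['red', 'apple', 'fruit']
```

Suppose only the apex were tapped, as in a strictly linear traversal. The speech module would then receive "fruit" alone, and the fact that the concept arose from something red would be lost.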
The open central
region of the torus would be an ideal place for the keystone of the AI system:
the self-concept. This is highly speculative, but perhaps a specially
interlaced group of ART networks could be inserted into the central region
after stable high-level concepts have developed. Desirable traits could
be chosen from among the available high-level concepts and connected, with
positive weights, to this inner node structure. Undesirable traits
could be connected with negative weights, and irrelevant traits might remain
disconnected. It is then possible that this unification of concepts
would become a self-concept for the AI. It is also quite possible
that the desired effect would not be achieved at all, but regardless, the
results are almost certain to be quite informative.
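The proposed weighting for the self-concept can be written as a sparse weighted sum. The Python sketch below rests on hypothetical simplifications: the trait names, a uniform weight magnitude of 1.0, and a plain linear sum in place of a full ART module. Desirable traits receive positive weights, and undesirable traits receive negative weights. Traits left out of the weight map are disconnected, so they cannot influence the central node at all.

```python
from typing import Dict, Iterable


def build_self_concept(desirable: Iterable[str], undesirable: Iterable[str],
                       magnitude: float = 1.0) -> Dict[str, float]:
    """Sparse weights to the central node; unlisted traits stay disconnected."""
    weights = {trait: magnitude for trait in desirable}
    weights.update({trait: -magnitude for trait in undesirable})
    return weights


def self_concept_input(weights: Dict[str, float],
                       activations: Dict[str, float]) -> float:
    """Net input to the central node from the currently active high-level concepts."""
    # Only connected traits contribute; an inactive trait counts as 0.
    return sum(w * activations.get(trait, 0.0) for trait, w in weights.items())


# Hypothetical example traits.
weights = build_self_concept(["curious", "honest"], ["deceptive"])
net = self_concept_input(weights, {"curious": 0.8, "honest": 0.5,
                                   "deceptive": 0.2, "tall": 1.0})
print(round(net, 2))  # 1.1: "tall" is disconnected, so it contributes nothing
```

The sketch shows only the proposed wiring. Whether such a unification would behave like a self-concept remains the open question raised above.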
Conclusion
It may still
be premature to attempt a complete AI system, but it is certainly useful
to evaluate progress toward that objective continually while researching
and synthesizing the architecture of each component process. A great deal
of work undoubtedly remains to be done in designing and testing all aspects
of the functioning of such a system, but the problems involved are gradually
becoming comprehensible. For many years the prospect of designing artificial
intelligence that might become (or at least seem to be) conscious was little
more than a dream, but it is a dream that is coming ever closer to reality.
The use of neural networks and an emphasis on concept learning seem to be
essential to realizing this dream; in combination they begin to suggest key
pieces of the puzzle of how consciousness arises.
Almeida, L.B. (1987). A learning rule for asynchronous perceptrons
with feedback in a combinatorial environment. Proceedings of the First International
Conference on Neural Networks, 2, 609-618.
Anderson, J.R. (1996). ACT: A simple theory of complex cognition. American
Psychologist, 51, 355-365.
Carpenter, G.A. & Grossberg, S. (1987). ART 2: Self-organization of stable
category recognition codes for analog input patterns. Applied Optics, 26(23),
4919-4930.
Carpenter, G.A. & Grossberg, S. (1990). ART 3: Hierarchical search using
chemical transmitters in self-organizing pattern recognition architectures.
Neural Networks, 3, 129-152.
Carpenter, G.A. & Grossberg, S. (2003). Adaptive Resonance Theory. In
M.A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, Second
Edition (pp. 87-90). Cambridge, MA: MIT Press.
Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., & Rosen,
D.B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental
supervised learning of analog multidimensional maps. IEEE Transactions on
Neural Networks, 3, 698-713.
Carpenter, G.A., Grossberg, S., & Reynolds, J.H. (1991). ARTMAP: Supervised
real-time learning and classification of nonstationary data by a self-organizing
neural network. Neural Networks, 4, 565-588.
Carpenter, G.A., Grossberg, S., & Rosen, D.B. (1991a). ART 2-A: An adaptive
resonance algorithm for rapid category learning and recognition. Neural
Networks, 4, 493-504.
Carpenter, G.A., Grossberg, S., & Rosen, D.B. (1991b). Fuzzy ART: Fast
stable learning and categorization of analog patterns by an adaptive resonance
system. Neural Networks, 4, 759-771.
Collins, A.M. & Loftus, E.F. (1975). A spreading-activation theory of
semantic processing. Psychological Review, 82, 407-428.
Cook, R.G., Katz, J.S., & Cavoto, B.R. (1997). Pigeon same-different
concept learning with multiple stimulus classes. Journal of Experimental
Psychology: Animal Behavior Processes, 23, 417-433.
Grossberg, S. (1987). Competitive learning: From interactive activation to
adaptive resonance. Cognitive Science, 11, 23-63.
Herrnstein, R.J. & Loveland, D.H. (1964). Complex visual concept in
the pigeon. Science, 146, 549-551.
Herrnstein, R.J., Loveland, D.H., & Cable, C. (1976). Natural concepts
in pigeons. Journal of Experimental Psychology: Animal Behavior Processes,
2, 285-302.
Kurzweil, R. (1990). The age of intelligent machines. Cambridge, MA: MIT
Press.
Lieberman, D.A. (2000). Learning: Behavior and Cognition (3rd ed.). Belmont,
CA: Wadsworth/Thomson Learning.
Minsky, M. & Papert, S. (1969). Perceptrons. Cambridge, MA:
MIT Press.
Parker, D.B. (1987). Optimal algorithms for adaptive networks: Second order
back propagation, second order direct propagation, and second order Hebbian
learning. Proceedings of the IEEE International Conference on Neural Networks,
2, 593-600.
Pineda, F.J. (1987). Generalization of backpropagation to recurrent neural
networks. Physical Review Letters, 59(19), 2229-2232.
Rosch, E.H. (1973). Natural categories. Cognitive Psychology, 4, 328-350.
Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the
theory of brain mechanisms. New York: Spartan Books.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal
representations by error propagation. In Parallel distributed processing:
explorations in the microstructure of cognition (pp. 318-362).
Cambridge, MA: MIT Press.
Simonton, D.K. (1999). Origins of genius: Darwinian perspectives on creativity.
New York: Oxford University Press.
Wasserman, P.D. (1989). Neural computing: Theory and practice. New York:
Van Nostrand Reinhold.
Werbos, P.J. (1994). The roots of backpropagation: From ordered derivatives
to neural networks and political forecasting. New York: Wiley.
Wittgenstein, L. (1953). Philosophical investigations (G.E.M. Anscombe,
Trans.). Oxford: Blackwell.