Author’s Note: While BEAGLE is freely available on GitHub here, the modifications described below are not. I hope to have a usable version running here or elsewhere soon.
Introduction
The title alludes to the famous thought experiment that claims to refute the very idea that a computer could truly understand language in the full sense that humans do. We intend to overcome this refutation by showing how true comprehension of a natural human language can be demonstrated with a suitable computer program, a version of Latent Semantic Analysis.
The Chinese Room
The “Chinese Room” was first put forth by the philosopher John Searle. [1] In it, he asks us to imagine a man in a room with a shelf containing two sorts of items: some cards on which Chinese characters are printed, and some books explaining in English the rules of Chinese language use. The man looks at messages that enter through a slot in the door. Since he cannot speak Chinese, he has to consult the books, which tell him which characters should be used in response, and arrange the Chinese characters into the proper reply. He then pushes this reply out through the ‘output’ slot, thus making a proper response to the earlier input.
Nobody would say that the man in the Chinese Room understands Chinese. And any computer programmed to use a natural human language would (Searle claims) be using the same sort of procedure, merely following rules of which it has no true comprehension. Thus no computer could ever truly understand a human language; it could only follow rules mindlessly.
Dreyfus and the ‘lifeworld’
In the following study, we show that Searle’s argument is deeply flawed; we shall explain how currently available software can implement a genuine understanding of vocabulary. It proves that we can create artificial minds that truly understand language much as a normal human does, rather than as the prisoner of the Chinese Room does. This can be seen in light of work by another famous critic of early AI research, Hubert Dreyfus, on the importance of “being-in-the-world” for human intelligence. [2] In Dreyfus’ terms, the man in the Chinese Room and the computers that we use today have no “world”. The present work will show that BEAGLE demonstrates a way that such a world can be implemented artificially.
Background
Early AI research
Searle’s Chinese Room is part of a contentious decades-long debate about the very possibility of AI, and Hubert Dreyfus was teaching at MIT at the time the first AI programs were being created. This early research was unsuccessful, and Dreyfus was well placed to provide feedback from his own philosophical perspective; his suggestions for progress have been widely discussed since then. This early AI research later came to be called “GOFAI”, which stands for “Good Old-Fashioned Artificial Intelligence”. The GOFAI program was based on the idea that “brute force processing” on standard supercomputer architectures would be able to replicate and surpass the performance of naturally evolved human minds. “Shakey” the robot, the “Automated Mathematician”, “SHRDLU”, and many others were made famous in the days of GOFAI. However, in spite of the massive resources poured into these projects, they were unsuccessful. A pattern emerged that came to be called “Moravec’s paradox” [3]: it is more difficult to program a robot to walk around like a toddler, or to tell the contents of an image, than it is to predict the next return of Halley’s Comet.
GOFAI vs. the “world”
Hubert Dreyfus claimed that, given the general tendencies of modern philosophy, the human mind was interpreted as a classical GOFAI computer. Since this approach was unsuccessful, we must rethink many of the dominant ideas in the philosophy of mind that produced it. According to Dreyfus, what is missing from this approach is also what is missing in the “Chinese Room” model: “Being-in-the-World”. This term refers to the fact that humans do not compute the answers to problems in anything like the GOFAI manner but instead grow up into a world of things with which they are engaged. We do not represent our knowledge of the world as Euclidean space filled with things of various shapes, as SHRDLU and Shakey did. Our “life-world” is a network of concern and involvement: it contains things we need, obstacles to our work, ideas, words, tools, materials, objects we care for, and things we depend on that can be broken or missing. The difference between the lifeworld and the GOFAI world is similar to that between object-oriented software and earlier ways of programming: in object-oriented programming, processes are encapsulated in ‘objects’ defined by their usefulness, rather than in variables and values that merely map onto external objects and their uses.
In this view, many forms of knowledge (for example, driving or chess) may be constituted for a beginner by sets of rules to be followed (just like a computer program), but these rules are not elaborated or added to (in the sense of adding more rules) to produce an expert. Rather, they allow the beginner to engage with the activity in such a way that she develops intuitive coping skills and proper emotional responses to the various circumstances of the domain. For example, in driving a car, the driver learns to be relaxed, tense, or apprehensive in the right circumstances, and to be pleased with good performance or pained at bad performance.
On this reading, it would seem that strong AI would be impossible; however, certain forms of computing are suitable for emotions and feelings, such as neural networks:
[N]o mentalistic model, whether empiricist or idealist, can account for the way past experience affects present experience, but fortunately, there are models of what might be going on in the hardware that make no use of empiricist association nor of the sort of symbols and rules presupposed in rationalist philosophy and Artificial Intelligence research. Such models are called feed forward simulated neural networks. According to these models, memories of specific situations are not stored and then somehow associated with current input. Rather, if given any input, the connections between “neurons” are modified by a trainer so that that input is paired with what the trainer holds to be the appropriate output. Thereafter, similar inputs will produce the same or similar output.
Feed-forward neural networks, then, provide a model of how the past can affect present perception and action without the brain needing to store specific memories at all. It is precisely the advantage of simulated neural networks that past experience, rather than being stored as a memory, modifies the connection strengths between the simulated neurons. New input can then produce output based on past experience without the net having to, or even being able to, retrieve any specific memories. The point is not that neural networks provide an explanation of association. Rather they allow us to give up seeking an associationist explanation of the way past experience affects present perception and action. [5]
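To make the quoted idea concrete, here is a minimal sketch (in Python with NumPy, chosen here only for illustration; none of this code comes from Dreyfus or from BEAGLE) of a one-layer feed-forward net. Training a single input–output pairing only adjusts the connection weights; the example itself is never stored, yet a similar input later produces a similar output.

```python
# Minimal feed-forward sketch: training modifies connection weights only;
# no specific memory of the training example is stored anywhere.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 2))      # 4 input units -> 2 output units

def forward(x, W):
    return 1.0 / (1.0 + np.exp(-x @ W))     # sigmoid activation

x_train = np.array([1.0, 0.0, 1.0, 0.0])    # input chosen by the "trainer"
y_train = np.array([1.0, 0.0])              # output the trainer holds appropriate

for _ in range(2000):                       # gradient descent on squared error
    y = forward(x_train, W)
    W -= 0.5 * np.outer(x_train, (y - y_train) * y * (1.0 - y))

x_similar = np.array([0.9, 0.1, 1.0, 0.0])  # never seen during training
print(forward(x_similar, W))                # output close to [1, 0]
```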
Also relevant is recent work by cognitive scientists on modeling various forms of emotional behavior with neural networks; for example, Paul Thagard at Waterloo has modeled moral emotional responses this way. [4] However, implementing such a rationality requires not merely a neural network (however sophisticated); it must also have a lifeworld. This is where BEAGLE comes in, for it reads its lifeworld off of the “world” of natural human language. This defines a set of semantic nodes which are functionally equivalent to a neural network unique to each linguistic lifeworld.
Outline of Solution: Project Design/Methodology
The Matrix as Lifeworld
Our solution is the demonstration of software that implements a “world” of the sort that Dreyfus describes. To distinguish this sort of world from the common meaning of the term, we shall use the phrase “life-world”, a philosophical term that refers to the world as lived, as a context of concern and purpose, not to the world as a mere physical system. In this sense, a novel can express a lifeworld by showing the various concerns, motives, and actions a person lives through. Our trick is to see how a computer program can embody or comprehend a life-world. Since a lifeworld sums up the simultaneous relationships of a multitude of different items, ideas, actions, qualities, etc., it seems that BEAGLE or something like it could read the formal relationships of a lifeworld out of a large sample of natural language.
Solution Implementation: BEAGLE’s mechanics
Our software, BEAGLE, was created by Dr. Michael Jones for his doctoral thesis in psychology at Queen’s University in 2005. The basic idea is that BEAGLE first parses a large corpus of natural-language text. The only preprocessing needed to help the computer is to remove all punctuation and capitalization. The program divides the text into “contexts”, which can be a paragraph, full sentence, or article; in our experiment we defined a context as whatever lies between any two punctuation marks. (Any definition is fine for our purposes: the basic idea is to keep track of which words occur together, and the units of “togetherness” must be chosen somewhat arbitrarily.) We ended up with about 20,000 contexts.
BEAGLE also keeps a list of all words which occur more than once in the entire corpus; generally this adds up to about 50,000, which is a little more than a typical human vocabulary. The program has no way to understand words which occur only once, so they are discarded.
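As a rough illustration of the preprocessing described in the last two paragraphs, here is a minimal sketch (in Python; this is not the actual BEAGLE code, and the helper names are ours): punctuation and capitalization are stripped, the text is split into contexts at punctuation marks, and words occurring only once are discarded from the vocabulary.

```python
# Sketch of the preprocessing step: strip punctuation and case, split the
# corpus into "contexts" at punctuation marks, keep words occurring > 1 time.
import re
from collections import Counter

def build_contexts(raw_text):
    # A "context" is whatever lies between any two punctuation marks.
    contexts = []
    for chunk in re.split(r'[.,;:!?()"]+', raw_text):
        words = re.findall(r"[a-z']+", chunk.lower())
        if words:
            contexts.append(words)
    return contexts

def build_vocabulary(contexts):
    counts = Counter(w for ctx in contexts for w in ctx)
    # Words that occur only once are discarded: the model cannot learn them.
    return sorted(w for w, n in counts.items() if n > 1)

contexts = build_contexts("The judge spoke. The jury listened; the judge ruled.")
vocab = build_vocabulary(contexts)
print(contexts)  # [['the', 'judge', 'spoke'], ['the', 'jury', 'listened'], ...]
print(vocab)     # ['judge', 'the']
```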
[Figure: a word-by-context matrix, adapted from [5].]
These two sets of data (words and their contexts) are arranged in a matrix or table (see above) where the columns are the contexts and the rows are the words. Each cell in the matrix contains the number of times that the word occurs in that context. This two-dimensional matrix tells us a lot about which words occur together most often, and it is from this matrix that BEAGLE computes the “semantic distance” by which our program decides which of a set of words “belongs”.
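A minimal sketch of this matrix in Python (again, not BEAGLE’s own code; the toy vocabulary and contexts below are ours) might look like the following:

```python
# Word-by-context count matrix: rows are words, columns are contexts, and
# each cell counts how many times that word occurs in that context.
import numpy as np

def build_count_matrix(contexts, vocab):
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(contexts)))
    for j, ctx in enumerate(contexts):
        for w in ctx:
            if w in index:
                M[index[w], j] += 1      # word i seen once more in context j
    return M, index

# Toy example (a real run has ~50,000 words and ~20,000 contexts):
contexts = [["judge", "court", "jury"], ["teacher", "school"], ["judge", "jury"]]
vocab = ["court", "judge", "jury", "school", "teacher"]
M, index = build_count_matrix(contexts, vocab)
print(M[index["judge"]])   # [1. 0. 1.] -- "judge" occurs in contexts 0 and 2
```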
However informative this matrix is, it is not enough for our purposes until it is digested through various mathematical transformations into a roughly 100-dimensional matrix. [5] Computationally, this is represented by an m-by-n matrix, with m being the number of words in the system’s vocabulary and n the number of dimensions in semantic space. Each row then becomes a vector in semantic space, which allows the derived knowledge to be manipulated with linear algebra. Similarity between two words then simply boils down to taking the cosine of the angle between their semantic vectors. Having somewhere around 100 dimensions turns out to be very important for getting results that compare to human performance.
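As a sketch of this step, assuming the count matrix M and word index from the previous example, a truncated singular value decomposition (the transformation used in classic LSA [5]; BEAGLE’s own encoding differs in its details) reduces each word to a roughly 100-dimensional vector, and similarity becomes a cosine:

```python
# Reduce the word-by-context counts to ~100 dimensions and compare words by
# the cosine of the angle between their semantic vectors.
import numpy as np

def semantic_vectors(M, k=100):
    U, s, _ = np.linalg.svd(M, full_matrices=False)   # truncated SVD
    k = min(k, len(s))
    return U[:, :k] * s[:k]        # one k-dimensional vector per word (row)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# With the toy M and index from the previous sketch:
# V = semantic_vectors(M, k=100)
# print(cosine(V[index["judge"]], V[index["jury"]]))   # high similarity
```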
It is logically implied that each such matrix is equivalent to one neural network [5], which recalls Dreyfus’ suggestion [2] that neural networks in general are formally congruent to his idea of a life-world. A life-world is also made up of a set of semantic units which relate to each other all at once, with different kinds of relationships and various levels of importance. A by-product of creating a holographic lexicon is that it defines a unique neural network, and vice versa. BEAGLE thus gives us a way to create an artificial lifeworld by reading statistical patterns off of large bodies of natural-language text.
Solution Implementation: “Which One of These Things is Not Like the Others?”
We decided to test the language comprehension of BEAGLE against that of humans by giving the same simple series of questions to our model and to a significant sample of humans and comparing their responses. Every question had the form “Which of these words does not belong?”, a format inspired by the Sesame Street feature “One of these things is not like the other…”. In each question there are four words to choose from, and the test subject chooses the word that “does not belong”.
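As an illustration of how such a question can be answered from the semantic vectors (the vectors below are hypothetical three-dimensional stand-ins for the real ~100-dimensional ones), the word whose average cosine similarity to the other three is lowest is the one that “does not belong”:

```python
# Odd-one-out by semantic distance: the word least similar (on average)
# to the other three is the one that "does not belong".
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def odd_one_out(words, vectors):
    scores = {w: np.mean([cosine(vectors[w], vectors[o])
                          for o in words if o != w]) for w in words}
    return min(scores, key=scores.get)

# Hypothetical vectors standing in for the real semantic space:
vectors = {
    "judge":   np.array([0.90, 0.10, 0.00]),
    "court":   np.array([0.80, 0.20, 0.10]),
    "jury":    np.array([0.85, 0.15, 0.05]),
    "teacher": np.array([0.10, 0.90, 0.30]),
}
print(odd_one_out(["judge", "court", "jury", "teacher"], vectors))  # teacher
```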
Here are the questions from the actual test used in our study; the answer chosen by the majority of humans is in bold, and BEAGLE’s choice is in italics (the only case in which they differ is #13):
“Which one of these things does not belong?”
- Judge court jury teacher
- Hospital pearl school prison
- Sword crossguard English blades
- Christian Church Methodist methods
- President student person myself
- Million one thousand hundred
- Normal weeks months days
- Nation country office union
- One three four five
- British American Russian People
- Company college school student
- Market material business company
- Length width *height* **ladder**
- Matter side bottom top
- Hot cold training temperature
- Square circle shape rectangle
- Temple church synagogue community
- Steak potatoes ***pizza*** chicken
- Blue green red orange
[Figure: examples of semantic distance from the question set.]
It is interesting that BEAGLE seems to know that “pizza” does not “belong”. While “steak”, “pizza”, and “chicken” are all usually the main course, pizza is the odd one out in this set, being both the only composite food item and the only one usually consumed on its own. This difference in role is reflected in the daily use of the word, displacing it semantically from the grouping defined by the other words. This vague, seemingly illogical concept of “belonging-ness” is simply a reflection of how close things are in one’s lifeworld. Our theory is that the lifeworld is formally congruent to the AI concept of “semantic space”, which is exactly what BEAGLE reads off of our text corpus.
Notice that the answers to these questions cannot come from a Google search; they require access to the world of language in the form of intuitions about “what belongs together”. BEAGLE derives these from statistical regularities in how often words occur together or apart. Our hypothesis is that humans do the same thing subconsciously, and therefore that BEAGLE and humans will produce similar scores on tests of “semantic distance”.
Results
General agreement of model with human sample.
We expected that our model’s responses would fall within the human range, meaning that it would agree with humans about as much as one human agrees with another. Our results showed about 93% agreement with humans. We started out with 20 questions, of which one was discarded out of concern that including “religion” and “science” in a question might seem unseemly to some respondents. Of the remaining questions, there was only one that BEAGLE got wrong. The word set in this one was “length, width, height, ladder”, and our model chose “height” rather than “ladder” (or “width”), which is what we would expect from an English speaker. We do not yet know for sure what the problem is, but we are looking into it. It is nonetheless clear from our results that BEAGLE did well on the test. Because it fell within the normal range of human responses, we consider our model to have passed a “modified Turing Test”: modified in that we restricted our study to one module of language use, namely the modelling of a linguistic lifeworld and the computation of semantic distance between the elements within that lifeworld.
Conclusions: Where to go from here?
We used a robust model of the world which has the following advantages:
1) It can be “read off” of an actual and readily-available text sample, giving it potential access to the collective wisdom of the human race.
2) These text samples can be chosen from among the myriad lifeworlds defined by language communities.
3) These worlds can be expanded to include visual and practical data as “contexts” in the same way that we used text clauses, since the model itself is neutral to the content of the matrix; as long as something can co-occur, it can be part of the lifeworld. This could connect our research to (for example) the work of the Science of Imagination Laboratory at Carleton University, where visual imagination is modeled using a set of photos tagged with content labels. [6] If each photo were fed into BEAGLE as a context, and its content tags were fed in as “words”, then visual and textual data would be part of the same lifeworld matrix and could be the basis of the same sorts of AI that we have exhibited in our study (see the sketch after this list).
4) Practical contexts are also fair game for this treatment: each sub-task, obstacle, or other practical factor that shapes actions and results could be added to the matrix along with the text and visual data. Really, there is no limit to what sorts of things can be part of a lifeworld for a truly artificial intelligence once it escapes the Chinese Room.
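As a sketch of point 3, assuming the count matrix M and word index from the earlier examples, tagged photos could simply be appended to the matrix as additional “contexts” (columns), with their content tags treated exactly like words; the photos and tags below are hypothetical, not the Science of Imagination Laboratory’s actual data.

```python
# Extending the lifeworld matrix with non-textual "contexts": each tagged
# photo becomes one extra column, its content tags counted like words.
import numpy as np

def extend_matrix_with_photos(M, index, photo_tag_lists):
    extra = np.zeros((M.shape[0], len(photo_tag_lists)))
    for j, tags in enumerate(photo_tag_lists):
        for tag in tags:
            if tag in index:             # tags outside the vocabulary are skipped
                extra[index[tag], j] += 1
    return np.hstack([M, extra])         # text columns first, photo columns after

# Two hypothetical photos described by their content tags:
photos = [["judge", "court"], ["teacher", "school"]]
# M_full = extend_matrix_with_photos(M, index, photos)
```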
Bibliography
For general background, see this online history of AI: http://projects.csail.mit.edu/films/aifilms/AIFilms.html
[1] “The Chinese Room Argument,” Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/chinese-room/
[2] H. Dreyfus: http://socrates.berkeley.edu/~hdreyfus/html/paper_socrates.html
[3] “Moravec’s paradox,” Wikipedia: http://en.wikipedia.org/wiki/Moravec%27s_paradox
[4] P. Thagard, Hot Thought. Cambridge, MA: MIT Press, 2006.
[5] T.K. Landauer and S.T. Dumais, “A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge,” Psychological Review, vol. 104, no. 2, pp. 211-240, 1997.
[6] From a presentation by Jim Davies and his lab assistants at the Institute of Cognitive Science Lab Fair, Carleton University, 16 October 2014. Their lab is the Science of Imagination Laboratory: http://scienceofimagination.pbworks.com/w/page/15236327/FrontPage