Introduction:
Kismet is a robot made in the late 1990s at the Massachusetts Institute of Technology (MIT) with auditory, visual and expressive systems intended to participate in human social interaction and to demonstrate simulated human emotion and appearance. The name Kismet comes from the Arabic, Turkish, Urdu, Hindi and Punjabi word meaning "fate" or sometimes "luck".
Hardware design and construction:
To interact properly with human beings, Kismet contains input devices that give it auditory, visual, and proprioceptive abilities. Kismet simulates emotion through various facial expressions, vocalizations, and movement. Facial expressions are created through movements of the ears, eyebrows, eyelids, lips, jaw, and head. The physical materials cost an estimated US$25,000.
Four color CCD cameras mounted on a stereo active vision head and two wide-field-of-view cameras allow Kismet to decide what to pay attention to and to estimate distances. A 0.5-inch CCD foveal camera with an 8 mm focal length lens is used for higher-resolution post-attentional processing, such as eye detection.
By wearing a small microphone, a user can influence Kismet's behavior. The auditory signal is carried into a 500 MHz PC running Linux, using software developed at MIT by the Spoken Language Systems Group that can process low-level speech patterns in real time. A 450 MHz PC running Windows NT processes these features in real time to recognize the spoken affective intent of the caregiver.
In addition to the computers mentioned above, there are four Motorola 68332s, nine 400 MHz PCs, and another 500 MHz PC.
Maxon DC servo motors with high-resolution optical encoders give Kismet three degrees of freedom in eye movement, letting it control gaze direction and move and orient its eyes much as a human does. This allows Kismet to simulate human visual behaviors and to focus on what it deems important in its field of vision, and it lets humans assign communicative value to its eye movements.
Software system:
Kismet's social intelligence software system, or synthetic nervous system (SNS), was designed with human models of intelligent behavior in mind. It contains the following six subsystems.
Low-level feature extraction system:
This system processes raw visual and auditory information from the cameras and microphones. Kismet's vision system can perform skin-color detection, eye detection, and motion detection. Whenever Kismet moves its head, it momentarily disables its motion detection system to avoid detecting self-motion. It also uses its stereo cameras to estimate the distance of an object in its visual field, for example to detect threats: large, close objects with a lot of movement.
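The self-motion suppression described above can be illustrated with a minimal sketch. It assumes simple frame differencing on grayscale images, which the source does not specify; the function name, the `head_moving` flag, and the threshold value are all hypothetical.

```python
def motion_mask(prev_frame, frame, head_moving, threshold=30):
    """Return per-pixel motion flags by frame differencing.

    prev_frame, frame: equally sized 2D lists of grayscale values (0-255).
    head_moving: hypothetical flag, True while the head motors are active.
    threshold: hypothetical intensity change that counts as motion.
    """
    if head_moving:
        # Report no motion at all, so the robot's own movement is ignored.
        return [[False] * len(row) for row in frame]
    return [
        [abs(a - b) > threshold for a, b in zip(prev_row, row)]
        for prev_row, row in zip(prev_frame, frame)
    ]


prev = [[10, 10], [10, 10]]
curr = [[10, 200], [10, 10]]
still = motion_mask(prev, curr, head_moving=False)
moving = motion_mask(prev, curr, head_moving=True)
```

With the head at rest, only the pixel that changed is flagged; while the head moves, the same pair of frames yields no motion.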
Kismet's audio system is mainly tuned towards identifying affect in infant-directed speech. In particular, it can detect five types of affective speech: approval, prohibition, attention, comfort, and neutral. To build the affective intent classifier, low-level features such as pitch mean and energy (volume) variance were extracted from samples of recorded speech. Each class of affective intent was then modeled as a Gaussian mixture model and trained on these samples using the expectation-maximization algorithm. Classification proceeds in multiple stages: an utterance is first assigned to one of two general groups (e.g. soothing/neutral vs. prohibition/attention/approval) and then classified in more detail. This architecture significantly improved performance for hard-to-distinguish classes, such as approval ("You're a clever robot") versus attention ("Hey Kismet over here").
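The staged classification can be sketched as follows. This is a simplified illustration, not Kismet's classifier: each class is reduced to a single diagonal Gaussian instead of a mixture trained with expectation-maximization, and every number, group name, and function name below is made up for the example.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# Stage 1 (made-up numbers): mean and variance of the energy-variance feature.
GROUP_MODELS = {
    "soothing_or_neutral": (0.7, 0.2),
    "energetic": (4.0, 2.0),
}
GROUP_MEMBERS = {
    "soothing_or_neutral": ["comfort", "neutral"],
    "energetic": ["prohibition", "attention", "approval"],
}
# Stage 2 (made-up numbers): (means, variances) over (pitch mean, energy variance).
CLASS_MODELS = {
    "comfort": ((180.0, 0.5), (400.0, 0.1)),
    "neutral": ((200.0, 1.0), (400.0, 0.1)),
    "prohibition": ((150.0, 4.0), (400.0, 1.0)),
    "attention": ((350.0, 5.0), (900.0, 1.0)),
    "approval": ((320.0, 3.0), (900.0, 1.0)),
}


def classify(x):
    """Pick a general group first, then a class within that group."""
    energy = (x[1],)
    group = max(
        GROUP_MODELS,
        key=lambda g: log_gauss(energy, (GROUP_MODELS[g][0],), (GROUP_MODELS[g][1],)),
    )
    label = max(GROUP_MEMBERS[group], key=lambda c: log_gauss(x, *CLASS_MODELS[c]))
    return group, label


first = classify((330.0, 3.2))
second = classify((185.0, 0.6))
```

Because the second stage only compares classes inside the chosen group, approval and attention are separated by a model that never has to account for the soothing or neutral classes.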
Attention system:
Kismet's attention system selects which stimuli in the environment the robot should direct its attention and gaze to, for example when something suddenly appears. The system has two stages: a pre-attentive stage, which uses the low-level visual feature detectors to detect colors and motion, and a limited-capacity stage, which processes a particular region of the visual field. Facial expression recognition and object detection, for example, are done in the limited-capacity stage. The attention system is influenced not only by external factors but also by Kismet's current task (seek-people vs. seek-toys) and by habituation (its limited attention span).
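One way to picture how task and habituation bias the selection is a weighted salience score per candidate region. The sketch below assumes a weighted sum with a subtractive habituation penalty; the gain values, feature names, and function name are hypothetical and are not taken from Kismet's implementation.

```python
# Hypothetical per-task gains on the pre-attentive features.
TASK_GAINS = {
    "seek-people": {"skin": 2.0, "color": 0.5, "motion": 1.0},
    "seek-toys": {"skin": 0.5, "color": 2.0, "motion": 1.0},
}


def select_target(regions, task, habituation):
    """Return the region with the highest task-weighted salience.

    regions: region name -> feature strengths in [0, 1].
    habituation: region name -> penalty that grows while the region
    holds attention (assumed to be subtracted from the salience).
    """
    gains = TASK_GAINS[task]

    def salience(name):
        features = regions[name]
        weighted = sum(gains[f] * features.get(f, 0.0) for f in gains)
        return weighted - habituation.get(name, 0.0)

    return max(regions, key=salience)


regions = {
    "face": {"skin": 0.9, "color": 0.1, "motion": 0.2},
    "toy": {"skin": 0.0, "color": 0.9, "motion": 0.3},
}
people_choice = select_target(regions, "seek-people", {})
toy_choice = select_target(regions, "seek-toys", {})
bored_choice = select_target(regions, "seek-people", {"face": 1.5})
```

The same scene yields different targets depending on the task, and a large enough habituation penalty on the face lets the toy win even while seeking people.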
High-level perceptual system:
Kismet's perceptual system translates low-level features into meaningful events. This is done through releasers. A releaser is a kind of checklist, which assesses a combination of low-level features to decide what kind of event it is. For example, "big, fast motion, and close" will indicate a threat.
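The checklist idea can be sketched as a predicate over low-level features. The `Percept` fields, their units, and the cut-off values below are hypothetical, chosen only to illustrate the "big, fast motion, and close" example.

```python
from dataclasses import dataclass


@dataclass
class Percept:
    size: float      # fraction of the visual field covered, 0..1
    speed: float     # hypothetical image-plane speed, pixels per frame
    distance: float  # estimated distance in meters


def threat_releaser(p, min_size=0.3, min_speed=20.0, max_distance=0.5):
    """Fire only when every item on the checklist is satisfied."""
    return p.size >= min_size and p.speed >= min_speed and p.distance <= max_distance


near = threat_releaser(Percept(size=0.5, speed=40.0, distance=0.3))
far = threat_releaser(Percept(size=0.5, speed=40.0, distance=2.0))
```

The same large, fast object counts as a threat when it is close and does not when it is far away.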
Motivation system:
The motivation system coordinates Kismet's drives and emotions. The drive subsystem regulates Kismet's social, stimulation, and fatigue-related needs. As with hunger in an animal, each drive becomes more intense until it is satiated. These drives affect Kismet's emotion system, which contains the six basic emotions described by Paul Ekman: anger, disgust, fear, joy, sorrow, and surprise. In addition, it contains three arousal states: boredom, interest, and calm. These emotional states can activate behaviors; for example, the fear emotion can induce the escape behavior.
Kismet can be in only one emotional state at any given moment. Its creator, Cynthia Breazeal, states, however, that Kismet is not conscious, so it does not have feelings.
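The hunger analogy can be sketched as a drive whose intensity rises on every time step until a satiating stimulus resets it. This is a deliberately simple model under assumed linear growth; the class name, growth rate, and reset-to-zero rule are hypothetical.

```python
class Drive:
    """A need that intensifies over time until it is satiated."""

    def __init__(self, name, growth=0.25):
        self.name = name
        self.growth = growth  # hypothetical increase per time step
        self.intensity = 0.0  # 0.0 = satiated, 1.0 = maximally intense

    def tick(self, satiated=False):
        if satiated:
            # A satiating stimulus (e.g. social contact) resets the drive.
            self.intensity = 0.0
        else:
            self.intensity = min(1.0, self.intensity + self.growth)
        return self.intensity


social = Drive("social")
history = [social.tick() for _ in range(3)]
after_contact = social.tick(satiated=True)
```

After three steps without interaction the social drive has climbed to 0.75, and one satiating interaction brings it back to zero.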
Behavior system:
Kismet's behavior system decides what behavior to carry out. Behaviors include play with toy, greet person, sleep, and so on. Each behavior receives input from the emotion system, drive system, and various releasers. The values from these modules are combined to produce an activation level value for each behavior. If the activation level of the behavior reaches a certain threshold, Kismet performs the associated behavior.
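The threshold rule can be sketched as below. The source says only that the inputs are combined, so the weighted sum here is an assumption, as are the weights, the threshold value, and the function names.

```python
def activation(emotion, drive, releasers, weights=(1.0, 1.0, 1.0)):
    """Combine the inputs into one activation level (assumed weighted sum)."""
    w_emotion, w_drive, w_releaser = weights
    return w_emotion * emotion + w_drive * drive + w_releaser * sum(releasers)


def active_behaviors(inputs, threshold=1.5):
    """Return the behaviors whose activation reaches the threshold."""
    return [name for name, args in inputs.items() if activation(*args) >= threshold]


inputs = {
    # behavior: (emotion input, drive input, releaser inputs)
    "greet person": (0.6, 0.7, [0.5]),
    "sleep": (0.1, 0.2, []),
}
selected = active_behaviors(inputs)
```

Here only the greeting behavior accumulates enough input to cross the threshold.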
Motor system:
The motor system controls Kismet's body posture, facial expressions, speech and lip synchronization, and gaze direction. The robot has nine basis postures, or expressions: fear, accepting, tired, content, stern, disgust, anger, surprise and unhappy. However, Kismet's emotion space is continuous, not discrete. For example, if Kismet shows an unhappy posture and the observer starts speaking in a soothing voice, Kismet's expression can transition smoothly to accepting. In addition to facial features such as eyebrows and mouth shape, Kismet can also change the orientation of its ears. For instance, arousal is conveyed by pointing its ears upward.
Kismet speaks a proto-language with a variety of phonemes, similar to a baby's babbling. It uses the DECtalk voice synthesizer and changes pitch, timing, articulation, and other parameters to express various emotions. Intonation is used to distinguish question-like from statement-like utterances. Lip synchronization was important for realism, and the developers used a strategy from animation: simplicity is the secret to successful lip animation. Thus, they did not try to imitate lip motions perfectly, but instead created a visual shorthand that passes unchallenged by the viewer.
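A smooth transition between two basis postures can be sketched as linear interpolation of actuator targets. The interpolation scheme, the joint names, and the position values are assumptions made for illustration, not Kismet's actual posture data.

```python
def blend(posture_a, posture_b, t):
    """Interpolate joint targets: t=0 gives posture_a, t=1 gives posture_b."""
    return {
        joint: (1 - t) * posture_a[joint] + t * posture_b[joint]
        for joint in posture_a
    }


# Hypothetical normalized actuator positions for two basis postures.
UNHAPPY = {"brow": -1.0, "lip_corner": -1.0, "ear": -0.5}
ACCEPTING = {"brow": 0.5, "lip_corner": 0.5, "ear": 0.5}

halfway = blend(UNHAPPY, ACCEPTING, 0.5)
start = blend(UNHAPPY, ACCEPTING, 0.0)
```

Stepping `t` gradually from 0 to 1 moves every joint a little at a time, which is one way to obtain an expression that changes continuously instead of jumping between discrete faces.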
In the media:
Kismet, a project of Cynthia Breazeal, has been featured on NBC and in Discover magazine. It also played a small role in the Steve Reich opera Three Tales, as a symbol of the development of artificial intelligence and as a voice of traditional ethics.
A replica of Kismet is part of the traveling exhibition Star Wars: Where Science Meets Imagination, in which Breazeal also appears in a pre-recorded segment.