Commentary on Michael A. Arbib
Abstract: In the target article, LR3:Parity, one of the possible criteria for language readiness, is defined by a simple statement: “What counts for the speaker (or producer) must count for the listener (or receiver)”. Based on a self-observation principle, we suggest that this ability plays a much more crucial role in communication than the statement implies, namely the role of “understanding others”.
In everyday life, an ordinary person communicates with another person without much difficulty. Let us consider the following circumstances:
John and Mary are out shopping. Tomorrow is Mary's birthday. John wants to give her a birthday present, but has no idea what she really wants. As they pass in front of a boutique, John notices that Mary is gazing at a red sweater displayed in the show window. The next day, John succeeds in giving her an appropriate birthday present, and Mary, for her part, may have succeeded in getting him to buy the sweater for her.
In the above situation, John estimates Mary's mental state. In other words, John understands that Mary desires to possess the red sweater, and expects that she will be happy if he gives it to her as a birthday present. In addition, Mary might have estimated John's mental state of wondering what to buy for her birthday, and predicted that John would buy the sweater once he perceived her hint of that desire. Successful communication thus involves predicting each other's actions, which requires estimating each other's mental states, such as emotion, intention, belief, and knowledge.
However, John cannot access Mary's mental state directly; the only information available to him comes from her actions and the surrounding environment. To predict Mary's actions, John needs to know the dynamics of her mental state and how it relates to her actions and environment, and vice versa. We generalized this situation within the framework of “dynamics estimation” and uncovered a substantial difficulty in communication (Makino & Aihara, 2003). Mutual estimation of each other's mental state is hard to achieve if each party must estimate the partner's mental state by reconstructing its state space, which should be as high-dimensional and complex as the estimator's own. When the state space is reconstructed only from observed data, the reconstruction dimension may need to be as large as 2n+1 (Takens, 1981), or larger than 2D0 (Sauer et al., 1991), even in a purely deterministic situation, where n is the dimension of the state space and D0 is the box-counting dimension of the possible attractor in the partner's internal dynamics. Moreover, the attractors themselves may be neither low-dimensional nor asymptotically stable (Kaneko and Tsuda, 2003).
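To make the dimension requirement concrete, here is a minimal sketch of delay-coordinate reconstruction from a single observed variable. It is an illustration only, not the formulation used in Makino & Aihara (2003): the function names, the logistic map standing in for the partner's internal dynamics, and the choice n = 1 are all assumptions made for this example.

```python
def takens_dimension(n):
    """Embedding dimension 2n+1 cited from Takens (1981) for an n-dimensional state space."""
    return 2 * n + 1


def delay_embed(series, dim, lag=1):
    """Build delay vectors [x(t), x(t+lag), ..., x(t+(dim-1)*lag)] from a scalar series."""
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive")
    span = (dim - 1) * lag
    return [
        [series[t + k * lag] for k in range(dim)]
        for t in range(len(series) - span)
    ]


def logistic_series(x0, steps, r=3.9):
    """Hypothetical stand-in for the partner's hidden dynamics (one-dimensional, n = 1)."""
    xs = [x0]
    for _ in range(steps - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# The observer sees only the scalar output, never the hidden state itself.
observed = logistic_series(0.2, 50)
m = takens_dimension(1)            # 2n+1 with n = 1
vectors = delay_embed(observed, m)
print(m, len(vectors), len(vectors[0]))
```

Even for this one-dimensional toy system the observer works with three-dimensional delay vectors; for a state space as rich as a partner's mental state, the same bound grows in proportion to n.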
One way to solve this difficult problem is to exploit similarities between peers (i.e., to use parity). John may develop an internal model (Ito, 1970; Kawato et al., 1987) of others by learning from his own dynamics. Since John is similar to Mary in that they belong to the same species, John's model of his own dynamics is applicable to Mary's dynamics to a certain extent. Although several pioneering studies (e.g., Humphrey, 1978, 1984; Wolpert et al., 2003) had already made this point, we (Makino & Aihara, 2003) further pointed out that this self-learning needs to use objective observation of the self, that is, observing oneself through the external world, so that the learned model involves conversion between subjective and objective information (the self-observation principle; see Figure A1). If learning about oneself were closed within one's brain, the learned model would need subjective information as input, and could not be applied to estimate others' mental states from objective observation. Only when the model is learned through objective observation of the self can it be applied to estimate others' mental states from objective observation.
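The self-observation principle can be sketched with a toy example. Everything here is a hypothetical simplification rather than the model of Makino & Aihara (2003): a single scalar "desire", a linear, noise-free mapping `act` from desire to observable gaze time that John and Mary are assumed to share, and a least-squares fit as the learning rule.

```python
def act(desire):
    """Hypothetical shared mapping: hidden desire (0..1) -> observable gaze time (seconds)."""
    return 0.5 + 4.0 * desire


def learn_from_self(own_desires):
    """Fit gaze -> desire from pairs (self-observed gaze, own known desire) by least squares."""
    gazes = [act(d) for d in own_desires]      # John observes his own behaviour from outside
    n = len(gazes)
    mean_g = sum(gazes) / n
    mean_d = sum(own_desires) / n
    cov = sum((g - mean_g) * (d - mean_d) for g, d in zip(gazes, own_desires))
    var = sum((g - mean_g) ** 2 for g in gazes)
    slope = cov / var
    intercept = mean_d - slope * mean_g
    # The learned model takes an objective observation as input.
    return lambda gaze: slope * gaze + intercept


# John trains only on himself, where the subjective state is known to him.
john_model = learn_from_self([0.0, 0.25, 0.5, 0.75, 1.0])

# Mary's desire is hidden from John; he sees only how long she gazes at the sweater.
mary_gaze = act(0.9)
estimate = john_model(mary_gaze)
print(round(estimate, 3))
```

Because the model's input is an externally observable quantity (gaze time), it transfers to Mary unchanged. A model trained only on John's internal signals would have no such input to receive Mary's behaviour.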
If the brain builds such a dynamics model using the self-observation principle, some neurons participating in the model are likely to act similarly to mirror neurons, i.e., to respond to both one's own actions and the actions of others. Indeed, this idea agrees with several claims of the target article concerning mirror neurons. As shown by the MNS model, the origin of grasping mirror neurons can be described as a feedback control system for the hand state, which incorporates prediction of one's own future state under the inherent delay of the sensory-motor system (see also the “forward model” in Wolpert et al., 2003). Moreover, the idea is in line with another claim, on the evolutionary origin of LR3:Parity in Section 5, because the dynamics model provides predictions of both one's own future and that of others, which closely correspond to planning one's own actions and interpreting the actions of others, respectively.
This viewpoint enables us to regard parity as a tool for “understanding others”. Parity is of course important for sharing actions, meanings, and symbols with others, but “understanding others” is far more primitive and essential in communication. In particular, the ability to “understand others” that parity provides lets us give a more substantial explanation for the origins of other “language readiness” abilities, such as LR1:Complex imitation and LR4:Intended communication.
As for complex imitation, it is doubtful whether simple imitation of movements can evolve by itself into complex imitation of goal-oriented actions. A naive account, such as learning the goal of an imitated action from accidental success among many simple imitations of a movement, would fail to explain the imitation of elaborate skills such as nut-cracking. Rather, chimpanzees may endure long and fruitless training in nut-cracking because they recognize the goal of cracking, to eat the nuts, beforehand. We suggest that parity describes goal recognition well, because goal recognition is part of the ability of “understanding others”: John can use his dynamics model, which has learned the association between his goals and his actions, to estimate Mary's goal from her action.
As for intended communication, we can improve the “intended communication hypothesis” by assuming parity. The target article raised this hypothesis because some explanation is needed for the transition from goal-oriented imitation to intended communication, i.e., “those [pantomimes] intended by the utterer to have a particular effect on the recipient.” However, we suggest that this transition is much more easily explained if we assume parity. Because the utterer is able to estimate the recipient's mental state, the utterer can recognize a pantomime's effect on the recipient as a change in the recipient's estimated mental state.
These considerations further lead us to reconsider the necessity of complex imitation in the evolutionary stages towards language. Although the target article required the evolution of imitation as a step towards recognizing others' goals and intentions, our new explanation of the origin of intended communication no longer depends on imitation, but only on parity. Since parity accompanies mirror neurons, this does not undermine the point that mirror neurons are required for intended communication. However, we suggest that complex imitation may be optional for intended communication.

References
Humphrey, N. (1978, June 29). Nature's psychologists. New Scientist, 900-903.
Humphrey, N. (1984). The inner eye: Social intelligence in evolution. Faber and Faber.
Ito, M. (1970). Neurophysiological aspects of the cerebellar motor control system. International Journal of Neurology, 7, 162-176.
Kaneko, K. & Tsuda, I. (eds.). (2003). Focus Issue: Chaotic Itinerancy. Chaos, 13, 926-1164.
Kawato, M., Furukawa, K., & Suzuki, R. (1987). A hierarchical neural network model for the control and learning of voluntary movements. Biological Cybernetics, 57, 169-185.
Makino, T. & Aihara, K. (2003). Self-observation principle for estimating the other's internal state: A new computational theory of communication. Mathematical Engineering Technical Reports METR 2003-36, Department of Mathematical Informatics, Graduate School of Information Science and Technology, the University of Tokyo.
Sauer, T., Yorke, J. A., & Casdagli, M. (1991). Embedology. Journal of Statistical Physics, 65(3/4), 579-616.
Takens, F. (1981). Detecting strange attractors in turbulence. In D. A. Rand & L.-S. Young (eds.), Dynamical systems and turbulence, Vol. 898 of Lecture notes in mathematics (pp. 366-381). Springer-Verlag, Berlin.
Wolpert, D. M., Doya, K., & Kawato, M. (2003). A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society of London B, 358, 593-602.
Acknowledgements
This research was partially supported by the Advanced and Innovational Research Program in Life Science, from the Ministry of Education, Culture, Sports, Science, and Technology of the Japanese Government. This research was also partially supported by Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists.