Information relations between gestures and languages during speech
- Yang Cheng
- Oct 24, 2022
Updated: Oct 30, 2022
Intuitively we know this must be true, but can we find evidence?

We communicate not only through verbal language but also through non-verbal channels: hand gestures, body poses, head motions, and facial expressions. How rich is the information carried by these non-verbal channels? Is there a way to quantify it? Can we use machine learning tools to understand the meaning of gestures, poses, and motions in the same way we understand natural language?
A simple assumption: if we can extract discrete patterns from non-verbal behaviors (poses, for instance) that are robustly observable across various communication scenarios, can't these patterns be treated as words? To test this assumption, we need to overcome several challenges.
The first challenge is how to extract patterns from non-verbal behaviors. This is extremely difficult, since no dataset focuses on labeling body actions over the course of a speech, not to mention that the labeling criteria themselves are unclear. We tried different representations and clustering methods, none of which performed well. So I designed a simple but effective way to classify gestures.

The whole image of the speaker is split into a nine-box grid. Each of the speaker's hands falls into one of the nine boxes, and the pair of boxes is used to represent the pose. For example, in the left picture above, the right hand sits in a box coded as 5, and the left hand sits in the same box, coded as 5*10 = 50. Together, this gesture is coded as 50 + 5 = 55. There are 9 × 9 = 81 gestures in total. They are used as tokens, in the same way words are used as tokens in language models.
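The coding scheme can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code used in the paper: the function names `box_index` and `gesture_code` are hypothetical, hand positions are assumed to be pixel coordinates inside the frame (for example from a pose estimator), and the boxes are assumed to be numbered 1 to 9 row by row from the top-left, so that the center box is 5.

```python
def box_index(x, y, width, height):
    """Return the grid box (1-9) containing pixel (x, y).

    Assumption: boxes are numbered row by row from the top-left corner,
    so the center box is 5. The post does not state the numbering order.
    """
    # Scale the coordinate to the range 0..3 and clamp to a valid column/row.
    col = min(max(int(3 * x / width), 0), 2)
    row = min(max(int(3 * y / height), 0), 2)
    return 3 * row + col + 1


def gesture_code(left_hand, right_hand, width, height):
    """Combine both hands into one code: left box * 10 + right box."""
    left_box = box_index(left_hand[0], left_hand[1], width, height)
    right_box = box_index(right_hand[0], right_hand[1], width, height)
    return 10 * left_box + right_box


# Both hands near the center of a 640x480 frame.
code = gesture_code((320, 240), (330, 250), 640, 480)
print(code)
```

Because each digit ranges over 1 to 9, the codes run from 11 to 99 without any zero digit, which gives the 81 distinct gesture tokens mentioned above.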

We feed both language and gesture tokens into a transformer model to identify the information hidden in body gestures. The results show that adding the gesture tokens improves the model's predictions.
This research has many limitations: the methods are naive, the dataset is small, and so on. It is best seen as a baseline or a starting point for related research. I am working on improvements to both the dataset and the pattern extraction methods, and I have made some progress that I hope to share in the future.
Paper link: https://aclanthology.org/2022.coling-1.12/