Motion capture technology has enabled the acquisition of high-quality human motions for animating digital characters with extremely high fidelity. However, despite all the advances in motion editing and synthesis, it remains an open problem to modify pre-captured motions that are highly expressive, such as contemporary dances, for stylization and emotionalization. In this work, we present a novel approach for stylizing such motions by using emotion coordinates defined by Russell's Circumplex Model (RCM). We extract and analyze a large set of body and motion features, based on Laban Movement Analysis (LMA), and select the features that are effective and consistent for characterizing the emotions of motions. These features provide a mechanism not only for deriving the emotion coordinates of a newly input motion, but also for stylizing the motion to express a different emotion without having to reference the training data. Such decoupling of the training data and new input motions eliminates the need for manual processing and motion registration. We implement the two-way mapping between motion features and emotion coordinates through Radial Basis Function (RBF) regression and interpolation, which can stylize free-style, highly dynamic dance movements at interactive rates. Our results and user studies demonstrate the effectiveness of the stylization framework on a variety of dance movements exhibiting a diverse set of emotions.
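As a concrete illustration of the two-way RBF mapping described in this abstract, the following is a minimal sketch, not the authors' code: it fits a Gaussian RBF regressor from (hypothetical) LMA-based feature vectors to valence/arousal coordinates on Russell's Circumplex Model, and a second regressor in the reverse direction, so that target emotion coordinates can be mapped back to feature values used for stylization.

```python
# Minimal sketch of a two-way RBF mapping between motion features and
# emotion coordinates. All data, dimensions, and the sigma/reg values
# are hypothetical placeholders, not taken from the paper.
import numpy as np

def fit_rbf(X, Y, sigma=1.0, reg=1e-6):
    """Fit Gaussian-RBF regression weights mapping rows of X to rows of Y."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.linalg.solve(K + reg * np.eye(len(X)), Y)

def rbf_predict(Xq, X, W, sigma=1.0):
    """Evaluate the fitted RBF model at query points Xq."""
    d2 = np.sum((Xq[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ W

# Hypothetical training set: LMA-inspired feature vectors and their
# (valence, arousal) coordinates.
features = np.random.rand(50, 16)          # 50 motions, 16 features
emotions = np.random.rand(50, 2) * 2 - 1   # valence/arousal in [-1, 1]

W_fwd = fit_rbf(features, emotions)        # features -> emotion coordinates
W_bwd = fit_rbf(emotions, features)        # emotion coordinates -> features

new_emotion = rbf_predict(np.random.rand(1, 16), features, W_fwd)
target_feats = rbf_predict(np.array([[0.8, 0.6]]), emotions, W_bwd)
```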
Non-playable characters (NPCs) play a crucial role in enhancing immersion in video games. However, traditional NPC behaviors are often hard-coded using methods such as Finite State Machines and Decision and Behavior Trees. This has two main limitations: it is quite difficult to implement complex cooperative behaviors, and it makes it easy for human players to identify and exploit patterns in behavior. To overcome these challenges, Reinforcement Learning (RL) can be used to generate dynamic, real-time NPC responses to human player actions. In this paper, we report first results of applying RL techniques to a non-zero-sum, adversarial, asymmetric game, using a multi-agent team. The game environment simulates a museum heist, in which a team of robbers with different skills (Locksmith, Technician) must steal valuable items from the museum without being detected by the scripted security guards and cameras. Both agents were trained concurrently with separate policies and received both individual and group reward signals. Through this training process, the agents learned to cooperate effectively and to use their skills to maximize both individual and team benefits. These results demonstrate the feasibility of realizing the full game, in which both robbers and security guards are trained at the same time to achieve their adversarial goals.
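To make the combined individual-plus-group reward signal mentioned above more concrete, the following is a speculative sketch under assumed reward terms and weights; the actual reward design, penalty values, and step outcome fields are not specified in the abstract and are invented here for illustration only.

```python
# Hypothetical blending of per-agent and team-level reward signals for
# one robber agent; all fields and weights are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StepOutcome:
    items_stolen: int       # items this agent secured this step
    detected: bool          # whether this agent was spotted
    team_items_stolen: int  # items secured by the whole team this step
    team_detected: bool     # whether any teammate was spotted

def robber_reward(o: StepOutcome, w_individual=1.0, w_group=0.5):
    """Blend individual and group signals into a single scalar reward."""
    individual = o.items_stolen - (5.0 if o.detected else 0.0)
    group = o.team_items_stolen - (5.0 if o.team_detected else 0.0)
    return w_individual * individual + w_group * group

# Example: the Locksmith steals one item while the Technician is spotted.
print(robber_reward(StepOutcome(1, False, 1, True)))  # 1.0 + 0.5 * (-4.0) = -1.0
```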
Synthesizing human motion with a global structure, such as a choreography, is a challenging task. Existing methods tend to concentrate on local smooth pose transitions and neglect the global context or the theme of the motion. In this work, we present a music-driven motion synthesis framework that generates long-term sequences of human motions which are synchronized with the input beats and jointly form a global structure that respects a specific dance genre. In addition, our framework enables the generation of diverse motions that are controlled by the content of the music, and not only by the beat. Our music-driven dance synthesis framework is a hierarchical system that consists of three levels: pose, motif, and choreography. The pose level consists of an LSTM component that generates temporally coherent sequences of poses. The motif level guides sets of consecutive poses to form a movement that belongs to a specific distribution, using a novel motion perceptual loss. Finally, the choreography level selects the order of the performed movements and drives the system to follow the global structure of a dance genre. Our results demonstrate the effectiveness of our music-driven framework in generating natural and consistent movements for various dance types, while providing control over the content of the synthesized motions and respecting the overall structure of the dance.
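The pose level of such a hierarchy can be pictured with a short, schematic sketch; the module below is a hypothetical PyTorch LSTM that conditions the next pose on music features and previous poses. The dimensions, layer sizes, and interface are assumptions, not the paper's architecture.

```python
# Schematic pose-level model: an LSTM mapping music features plus previous
# poses to the next pose sequence. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class PoseLSTM(nn.Module):
    def __init__(self, music_dim=32, pose_dim=69, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(music_dim + pose_dim, hidden,
                            num_layers=2, batch_first=True)
        self.decode = nn.Linear(hidden, pose_dim)

    def forward(self, music_feats, prev_poses, state=None):
        # music_feats, prev_poses: (batch, time, dim); returns next-pose sequence
        x = torch.cat([music_feats, prev_poses], dim=-1)
        h, state = self.lstm(x, state)
        return self.decode(h), state

model = PoseLSTM()
next_poses, _ = model(torch.randn(4, 120, 32), torch.randn(4, 120, 69))
```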
Data-driven skeletal animation relies on the existence of a suitable learning scheme, which can capture the rich context of motion. However, commonly used motion representations often fail to accurately encode the full articulation of motion, or present artifacts. In this work, we address the fundamental problem of finding a robust pose representation for motion, suitable for deep skeletal animation, one that can better constrain poses and faithfully capture nuances correlated with skeletal characteristics. Our representation is based on dual quaternions, mathematical abstractions with well-defined operations, which simultaneously encode rotational and positional information, enabling a rich encoding centered around the root. We demonstrate that our representation overcomes common motion artifacts, and assess its performance compared to other popular representations. We conduct an ablation study to evaluate the impact of various losses that can be incorporated during learning. Leveraging the fact that our representation implicitly encodes skeletal motion attributes, we train a network on a dataset comprising skeletons of different proportions, without the need to first retarget them to a universal skeleton, a step that causes subtle motion elements to be missed. Qualitative results demonstrate the usefulness of the parameterization in skeleton-specific synthesis.
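The dual quaternion construction underlying this representation can be sketched in a few lines; the snippet below is a minimal illustration, assuming w-first quaternions and root-relative joint positions, of how a rotation and a translation are packed into a single 8-dimensional pose feature. It is not the authors' code, only the standard construction.

```python
# Encode a joint's rotation (unit quaternion, w-first) and its root-relative
# position as one dual quaternion: real part = rotation, dual part = 0.5*(0,t)*q.
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def to_dual_quaternion(q_rot, translation):
    """Return the 8-D dual quaternion encoding rotation and translation."""
    q_rot = q_rot / np.linalg.norm(q_rot)
    t = np.concatenate(([0.0], translation))   # pure quaternion (0, tx, ty, tz)
    q_dual = 0.5 * quat_mul(t, q_rot)
    return np.concatenate([q_rot, q_dual])

dq = to_dual_quaternion(np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.1, 1.2, -0.3]))
```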
Folk dances often reflect the socio-cultural influences prevailing in different periods and nations; each dance produces a meaning, a story told with the help of music, costumes and dance moves. However, dances have no borders; they have been transmitted from generation to generation and across countries, mainly through the movements of people carrying and disseminating their civilization. Studying the contextual correlation of dances across neighboring countries unveils the evolution of this unique intangible heritage over time, and helps in understanding potential cultural similarities. In this work, we present a method for contextual motion analysis that organizes dance data semantically, to form the first digital dance ethnography. Firstly, we break dance motion sequences into short, temporally overlapping feature descriptors, named motion and style words, and then cluster them in a high-dimensional feature space to define motifs. The distribution of those motion and style motifs creates motion and style signatures, in the form of a bag-of-motifs representation, offering a succinct yet descriptive portrayal of motion sequences. Signatures are time-scale and temporal-order invariant, capable of exploiting the contextual correlation between dances and distinguishing fine-grained differences between semantically similar motions. We then use quartet-based analysis to organize dance data into a categorization tree, while information inferred from dance metadata descriptions is used to set parent-child relationships. We illustrate a number of different organization trees and portray the evolution of dances over time. The efficiency of our method is also demonstrated by retrieving contextually similar dances from a database.
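The bag-of-motifs idea can be sketched compactly: slice each motion into short overlapping windows (motion words), cluster all words into motifs, and histogram the motif assignments into a signature. The following is a simplified illustration under assumed window sizes and k-means clustering, not the authors' feature pipeline.

```python
# Simplified bag-of-motifs signature: overlapping motion words -> k-means
# motifs -> normalized motif histogram. All parameters are placeholders.
import numpy as np
from sklearn.cluster import KMeans

def motion_words(frames, window=16, stride=4):
    """frames: (T, D) per-frame features -> (N, window*D) overlapping descriptors."""
    return np.stack([frames[i:i + window].ravel()
                     for i in range(0, len(frames) - window + 1, stride)])

def signature(words, kmeans):
    """Normalized motif histogram: time-scale and temporal-order invariant."""
    labels = kmeans.predict(words)
    hist = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

# Hypothetical corpus of dances, each a (T, D) array of per-frame features.
dances = [np.random.rand(np.random.randint(200, 400), 30) for _ in range(10)]
all_words = np.vstack([motion_words(d) for d in dances])
kmeans = KMeans(n_clusters=50, n_init=10).fit(all_words)
signatures = [signature(motion_words(d), kmeans) for d in dances]
```

Comparing such signatures (e.g., with a histogram distance) is what allows contextually similar dances to be retrieved and organized into a categorization tree.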
Data-driven character animation techniques rely on the existence of a properly established model of motion, capable of describing its rich context. However, commonly used motion representations often fail to accurately encode the full articulation of motion, or present artifacts. In this work, we address the fundamental problem of finding a robust pose representation for motion modeling, suitable for deep character animation, one that can better constrain poses and faithfully capture nuances correlated with skeletal characteristics. Our representation is based on dual quaternions, mathematical abstractions with well-defined operations, which simultaneously encode rotational and positional information, enabling a hierarchy-aware encoding centered around the root. We demonstrate that our representation overcomes common motion artifacts, and assess its performance compared to other popular representations. We conduct an ablation study to evaluate the impact of various losses that can be incorporated during learning. Leveraging the fact that our representation implicitly encodes skeletal motion attributes, we train a network on a dataset comprising skeletons of different proportions, without the need to first retarget them to a universal skeleton, a step that causes subtle motion elements to be missed. We show that smooth and natural poses can be achieved, paving the way for fascinating applications.
Exergames do not have the capacity to detect whether players are really enjoying the gameplay. The games are not intelligent enough to detect significant emotional states and adapt accordingly in order to offer a better user experience for the players. We propose a set of body motion features, based on the Effort component of Laban Movement Analysis (LMA), that are used to build classifiers for emotion recognition in a game scenario for four emotional states: concentration, meditation, excitement and frustration. Experimental results show that the system is capable of successfully recognizing the four different emotional states at a very high rate.
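As a rough illustration of this kind of pipeline, the sketch below computes crude Effort-inspired proxies (velocity, acceleration, and jerk statistics) from joint positions and feeds them to an off-the-shelf classifier. The features, labels, and classifier choice are simplified assumptions and do not reproduce the paper's actual feature set.

```python
# Toy LMA-Effort-inspired features plus an SVM over the four emotional states.
# Feature definitions and data are illustrative assumptions only.
import numpy as np
from sklearn.svm import SVC

def effort_features(joints):
    """joints: (T, J, 3) positions. Crude proxies for Weight/Time/Flow Effort."""
    vel = np.diff(joints, axis=0)
    acc = np.diff(vel, axis=0)
    jerk = np.diff(acc, axis=0)
    weight = np.mean(np.sum(np.linalg.norm(vel, axis=-1) ** 2, axis=-1))  # kinetic-energy proxy
    time_ = np.mean(np.linalg.norm(acc, axis=-1))                         # suddenness proxy
    flow = np.mean(np.linalg.norm(jerk, axis=-1))                         # smoothness proxy
    return np.array([weight, time_, flow])

# Hypothetical labelled clips: 0=concentration, 1=meditation, 2=excitement, 3=frustration
clips = [np.random.rand(120, 20, 3) for _ in range(40)]
labels = np.random.randint(0, 4, size=40)
X = np.stack([effort_features(c) for c in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```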
Child characters are commonly seen in leading roles in top-selling video games. Previous studies have shown that child motions are perceptually and stylistically different from those of adults. Creating motion for these characters by motion-capturing children is uniquely challenging because of confusion, lack of patience, and regulations. Retargeting adult motion, which is much easier to record, onto child skeletons does not capture the stylistic differences. In this paper, we propose that style translation is an effective way to transform adult motion capture data to the style of child motion. Our method is based on CycleGAN, which allows training on a relatively small number of sequences of child and adult motions that do not even need to be temporally aligned. Our adult2child network converts short sequences of motion, called motion words, from one domain to the other. The network was trained using a motion capture database collected by our team, containing 23 locomotion and exercise motions. We conducted a perception study to evaluate the success of style translation algorithms, including our algorithm and recently presented style translation neural networks. The results show that the translated adult motions are recognized as child motions significantly more often than the original adult motions.
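The cycle-consistency idea at the heart of such a CycleGAN-style translator can be sketched as follows; the generators below are hypothetical placeholders (the real ones are neural networks over motion words), and only the reconstruction term of the full objective is shown.

```python
# Schematic CycleGAN cycle-consistency term for adult<->child motion words.
# Generator architectures, dimensions, and the weight lam are assumptions.
import torch
import torch.nn as nn

pose_dim, word_len = 69, 30  # hypothetical size of one "motion word"

def make_generator():
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(pose_dim * word_len, 512), nn.ReLU(),
                         nn.Linear(512, pose_dim * word_len),
                         nn.Unflatten(1, (word_len, pose_dim)))

G_adult2child = make_generator()
G_child2adult = make_generator()
l1 = nn.L1Loss()

def cycle_loss(adult_words, child_words, lam=10.0):
    """Translate each domain to the other and back; penalize reconstruction error."""
    rec_adult = G_child2adult(G_adult2child(adult_words))
    rec_child = G_adult2child(G_child2adult(child_words))
    return lam * (l1(rec_adult, adult_words) + l1(rec_child, child_words))

loss = cycle_loss(torch.randn(8, word_len, pose_dim), torch.randn(8, word_len, pose_dim))
```

In the full objective this term is combined with adversarial losses from per-domain discriminators, which is what removes the need for temporally aligned adult/child pairs.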