Research Topics

We conduct a wide range of research, from theory to practice, centered on speech recognition and extending to language processing and spoken dialogue. The following is a list of our research areas.

  • Speech recognition
  • Spoken dialogue systems
  • Natural language processing
  • Conversational voice interfaces
  • Voice interaction
  • Voice interactive content and media

The following is an overview of each study and examples of research themes.

Speech Recognition

Automatic speech recognition, which transcribes human speech into text, is a fundamental technology for speech media processing. It is built by integrating information processing technologies ranging from signal processing to language processing and speech understanding. For many years, we have been working on speech recognition based on statistical models and machine learning.

Highly efficient voice decoding

Research on improving the efficiency and accuracy of continuous speech recognition algorithms. One typical goal is a robust speech recognition engine that can run across a variety of environments, devices, and tasks, from embedded IC chips to cloud environments.

Low latency decoding

In voice interfaces operated in real time, response delay is a major problem. We are working on speech recognition algorithms that achieve extremely fast responses by finalizing the recognition result immediately after, or even before, the end of an utterance.
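One common strategy in such incremental decoding is to commit the part of the hypothesis that has stopped changing across successive partial results, so the interface can act on it before the utterance ends. A minimal sketch of that idea (the function name and the stability criterion are ours, not a specific system's API):

```python
def stable_prefix(partials, k=3):
    """Return the word prefix shared by the last k partial hypotheses.

    Committing this stable prefix early lets the interface start
    responding before the utterance (and full decoding) has finished.
    """
    if len(partials) < k:
        return []
    recent = [p.split() for p in partials[-k:]]
    prefix = []
    for words in zip(*recent):
        if all(w == words[0] for w in words):
            prefix.append(words[0])
        else:
            break
    return prefix

# Successive partial results from a (hypothetical) streaming recognizer:
partials = ["turn", "turn on", "turn on the", "turn on the light"]
print(stable_prefix(partials))  # -> ['turn', 'on']
```

Real systems weigh the stability threshold against latency: a larger k finalizes later but revises less.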

Speech classification

Research on methods for extracting and classifying the various kinds of information carried by the human voice. Examples include language-proficiency assessment for second-language learners, estimation of speaking style in lectures, and speech-emotion classification.
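As a minimal illustration of this kind of classification, a nearest-centroid classifier over toy prosodic features (the feature values and class labels below are invented for the sketch; real systems use learned acoustic features):

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(x, centroids):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

# Toy prosodic features (mean pitch in Hz, mean energy); values are invented.
train = {
    "neutral": [[120.0, 0.30], [125.0, 0.35]],
    "excited": [[220.0, 0.80], [210.0, 0.75]],
}
centroids = {label: centroid(vs) for label, vs in train.items()}
print(classify([215.0, 0.70], centroids))  # -> excited
```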

Sound environment detection

A study on the automatic classification and identification of various acoustic events and acoustic scenes in everyday life.

Spoken Dialogue System

We are conducting research on spoken dialogue systems, in which a user speaks to a machine and receives a spoken response. Achieving human-like intelligent dialogue requires not only speech recognition and speech synthesis but also a variety of intelligent processes such as speech understanding and dialogue management. Our laboratory is engaged in research on models, design, and real-world system operation.

Mobile-oriented conversational agent

Research and development of a dialogue system in mobile environments that allows users to communicate anytime and anywhere.

Statistical spoken dialogue system

Research on statistical, data-driven modeling of spoken dialogue, including automatic completion of dialogue scenarios, system construction from small amounts of task data, task adaptation, and automatic scenario extension.

Situation-aware multi-modal dialogue modeling

A dialogue system operating in a real-world environment must understand the user's context well. We are researching a multimodal interaction system that understands not only the user's voice but also the context of their behavior and the surrounding environment, and responds appropriately.

Direct voice-to-response modeling

Using only the recognized text discards the non-verbal information contained in the voice. We are therefore investigating the direct use of acoustic information at the phoneme, senone, or frame level to select a dialogue response.
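The idea can be sketched as a scoring function that mixes a text-similarity term with an acoustic-similarity term when ranking candidate responses. The embeddings below are hand-made stand-ins for encoder outputs, and the function names are ours:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def select_response(text_emb, acoustic_emb, candidates, alpha=0.5):
    """Rank candidate responses by a weighted mix of text match and
    acoustic match (e.g. frame-level prosody), and return the best reply."""
    def score(c):
        return (alpha * cosine(text_emb, c["text_emb"])
                + (1 - alpha) * cosine(acoustic_emb, c["acoustic_emb"]))
    return max(candidates, key=score)["reply"]

# Two candidates whose text fits equally well; the acoustic term decides.
candidates = [
    {"reply": "calm reply",     "text_emb": [1.0, 0.0], "acoustic_emb": [1.0, 0.0]},
    {"reply": "cheerful reply", "text_emb": [1.0, 0.0], "acoustic_emb": [0.0, 1.0]},
]
print(select_response([1.0, 0.0], [0.0, 1.0], candidates))  # -> cheerful reply
```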

Natural Language Processing

We have been working on end-to-end (E2E) spoken dialogue and response sentence generation based on neural-network (NN) natural language processing since 2017, and have actively participated in international competitions such as DSTC7 and DSTC8.

  • E2E dialogue sentence generation with natural topic transitions
  • Emphasizing speaking styles from small corpora in E2E dialogue
  • Modeling the individuality of speech
  • Response sentence selection in multi-party conversation
  • Robust NN-based spoken language understanding and dialogue state tracking
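As a minimal illustration of one of these themes, dialogue state tracking maintains the user's goal as slot-value pairs across turns. A toy keyword-based tracker (real trackers learn this with NNs; the slot ontology here is invented):

```python
def track_state(state, utterance, ontology):
    """Update the dialogue state with any slot values mentioned in the
    utterance. A toy keyword matcher standing in for a learned tracker."""
    new_state = dict(state)
    text = utterance.lower()
    for slot, values in ontology.items():
        for value in values:
            if value in text:
                new_state[slot] = value
    return new_state

ontology = {"food": ["italian", "chinese"], "area": ["north", "south"]}
state = {}
state = track_state(state, "I want Italian food", ontology)
state = track_state(state, "somewhere in the north, please", ontology)
print(state)  # -> {'food': 'italian', 'area': 'north'}
```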

Conversational Voice Interface

Voice interfaces that let users operate machines by voice have the potential to become the next major interface, replacing the keyboard, mouse, and touch input. We are researching an interactive voice interface that allows anyone to carry out simple interactions as naturally as talking to a person.

Usability assessment

Research on the usability of voice interfaces, including users’ psychological and cognitive load. We also study techniques, designs, and evaluation scales that help users acquire “interactive cognition”: the sense of the machine as a conversational partner to whom they can speak naturally.

Multi-agent interface

Research on multi-agent interfaces that simultaneously present multiple interactive agents according to the multiple tasks they handle.

Voice Interaction

Research aimed mainly at reducing barriers in conversations with humanoid agents. We analyze and evaluate approaches in engineering terms, focusing on cognitive aspects such as interactive cognition, conversational affordances, and the role of emotions. Basic research is conducted from paralinguistic, cognitive-scientific, and interactional perspectives.

The affordance of conversationality

Research to elucidate the “affordance of speech input” and the “affordance of conversation” that a spoken dialogue system or speech interface should express so that anyone can speak to a machine naturally.

Engagement in voice interaction system

Research on a human-friendly voice interaction system that estimates, and attunes itself to, a person’s state and traits.

Dialogue as a Media

In the near future, when interactive speech interfaces and systems are in widespread use, users will be able to freely choose among many designs (speech style, personality) for interactive machines such as smartphones, operating systems, and cars. At that point, elements such as the content of the dialogue, the content of the responses, and the design of the appearance will gradually detach from the technology itself and become independent “content” that is produced and consumed in society. In our laboratory, we define these elements of a spoken dialogue system, such as dictionaries for speech recognition, voice models for speech synthesis, and dialogue management units, as “spoken dialogue content.”

Packaging spoken dialogue systems as media content

Research on the design of an infrastructure for treating the elements of spoken dialogue as system-independent content. We develop the open-source toolkit MMDAgent as a test implementation.

Distribution of Spoken Dialogue Content

Research on element definitions, content structures, and description methods to enable the free distribution, appropriation, and modification of spoken dialogue content.
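As a rough illustration, distributable spoken dialogue content might be described by a self-contained package descriptor listing its assets. The field names and file extensions below are hypothetical, not an actual MMDAgent or distribution format:

```python
import json

# Hypothetical descriptor for a distributable piece of spoken dialogue
# content; field names and file names are illustrative only.
package = {
    "name": "weather-guide",
    "version": "1.0",
    "assets": {
        "recognition_dict": "weather.dic",
        "synthesis_voice": "guide.voice",
        "dialogue_script": "weather.fst",
    },
    "license": "CC-BY",
}
print(json.dumps(package, indent=2))
```

A machine-readable descriptor like this is what would let content be freely distributed, reused, and modified independently of any one system.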

Promoting content creation

Design and development of tools that allow creators to freely build voice interactive content.

Toward user-generated media

This research aims to establish voice interactive content as user-generated media, in the manner of YouTube and Wikipedia. We study users’ motivations and incentives, and conduct demonstrations.

Talking with CG character agents

Dialogues and conversations with CG characters and virtual beings on screen have been attracting much attention recently: “VTubers” are becoming widely accepted and pervasive in society. We have begun targeted research on dialogue systems with anime-style 2-D CG characters. In particular, we are now tackling a new hybrid approach that mixes human-to-human and human-to-machine dialogue by combining a spoken dialogue system with “avatar” control.

Dialogue-capability cognition with CG character agents

We are studying design methodologies and dialogue control schemes that enable humans to easily come to perceive CG characters as “natural, smooth, and sustainable conversation partners” (dialogue perception).

Multi-modal dialogue behavior modeling for CG characters

This research focuses on modeling CG-specific conversation styles and behaviors. Although a CG character’s behavior should be grounded in natural human behavior, it often includes exaggeration and emphasis: styles and movements that are unnatural yet acceptable. Referring to the conversation styles of so-called VTubers as examples of CG-based interaction, we aim to build data-driven models for the automatic generation or conversion of CG-specific dialogue behavior.