Research Topics

We conduct research on a wide range of topics centered on spoken language processing and dialogue, covering both theoretical and practical aspects. Our main research areas are:

  • Speech recognition and synthesis
  • Spoken dialogue systems
  • Natural language processing
  • CG agent interaction
  • Avatar communication support

In recent years, the emergence of neural networks, particularly large language models (LLMs), has significantly transformed these research fields. Below, we describe each area and give examples of specific research themes.

Last updated: 2026/04

Speech Recognition and Synthesis

— Three decades of expertise meets cutting-edge technology

Speech processing technologies that transcribe speech audio into text and attach time-aligned labels, such as speaker and context, form the foundation of speech media processing. This process draws on a wide range of information processing technologies, from signal processing to language processing and speech understanding. Lee Lab has been conducting research on speech recognition for many years. Our open-source speech recognition engine Julius, developed by PI Lee in 1996, has a history spanning nearly three decades and has been widely used both domestically and internationally.

Recently, we have also expanded our research into speech synthesis, aiming at unified speech processing. We are advancing research that integrates speech recognition and synthesis, incorporating cutting-edge techniques such as LLM-based speech synthesis.

  • Speech recognition based on data generation via speech synthesis
  • Speaker diarization (who spoke when) and its applications
  • Expressive speech synthesis (speech-laugh, etc.)
  • LLM-based speech synthesis
Recent publications
  • Sei Ueno, Akinobu Lee "Beam search considering continuity in LLM-based speech synthesis" ASJ 155th Meeting (Spring 2026), 1-5-3
  • Yuuki Yamakawa, Sei Ueno, Akinobu Lee "Rakugo speech synthesis focusing on character portrayal using large-scale models" ASJ 155th Meeting (Spring 2026), 1-Q-31
  • Keigo Ichikawa, Sei Ueno, Akinobu Lee "Data generation for speaker diarization based on speaker transition probability" ASJ 153rd Meeting (Spring 2025), 3-P-2

Spoken Dialogue Systems

— Making machine conversations more natural with LLMs
Remdis dialogue system

A simple voice command system can be built by combining speech recognition and synthesis. A human-like dialogue system capable of natural conversation, however, requires many more problem-solving modules, such as context understanding, recognition of user characteristics, identification of conversation goals, strategy planning, and detection of errors and misunderstandings.

Whereas traditional research focused on individual modules, recent advances in neural networks enable end-to-end approaches that learn these processes jointly, and large language models (LLMs) in particular have improved accuracy significantly. Multimodal dialogue is also effective for engaging, context-sensitive interaction: camera input can help assess the overall situation, and verbal and non-verbal responses with gestures and emotional expressions make the dialogue more intuitive.

Building on our extensive experience with practical dialogue systems, we collaborate with companies and research projects on comprehensive research into spoken dialogue systems. Centered on the multimodal dialogue platform Remdis, we are developing next-generation dialogue systems that leverage LLMs.

  • In-car voice information guidance systems based on LLMs (in collaboration with an automotive company)
  • AI simulated patients and automated medical interview evaluation (in collaboration with Fujita Health University)
  • Remote communication support using information-extracting agents
  • Spoken dialogue systems as media content

Natural Language Processing

— Connecting people through empathetic dialogue

Natural language processing (NLP) is a core technology for dialogue systems, determining how conversation is managed in the text domain. NLP has recently progressed dramatically thanks to very large neural networks and large language models. Our NLP research places particular emphasis on “human-to-human dialogue.”

  • Online motivational interviewing systems
  • Dialogues aimed at clarifying and verbalizing user thoughts
  • Narrative processing: extracting character relationships, generating summaries

CG Agent Interaction

— Talk to CG characters as naturally as you talk to humans
CG agent dialogue with MMDAgent-EX

When conversing with machines, people tend to prefer human-like appearances, as these feel more natural. Our lab focuses on dialogue systems using full-body CG characters capable of physical communication, also known as embodied conversational agents (ECAs).

Although conversational technologies have given users more opportunities to speak to machines, many people still feel awkward talking to one, and such conversations rarely feel truly “human-like.” Our lab has its own fully integrated CG agent system that combines speech recognition, speech synthesis, and dialogue technologies with a CG agent rendering engine. Our research ranges widely, from autonomous CG agents serving as AI front ends to CG avatars operated by humans.

In particular, we aim to eliminate discomfort in dialogues with CG agents, enabling effortless, human-like conversations.

  • Perception of “being an appropriate conversation partner” (dialogue-perception)
  • CG-specific conversational styles
  • Affordances for dialogue
  • Spatial immersion using self-projection avatars
  • Virtual social touch

Avatar Communication Support

— Toward an avatar society anyone can join, anytime, anywhere
Valles avatar operation system

Research on “avatars,” robots or CG characters that humans operate remotely to converse and carry out activities, has been conducted in various fields. Avatars have recently become prominent in online culture, as seen with VTubers who communicate through CG avatars. Virtualized conversation through CG avatars is promising because it can relax constraints of time and place, protect privacy, and support social participation by people with limited mobility.

Despite this potential, conversation through CG avatars still faces challenges and limitations that hinder its widespread adoption. Our lab applies its long-standing expertise in CG agent systems for human-machine dialogue to research on supporting CG avatar communication.

As part of a Moonshot Research and Development Program project toward realizing an avatar-symbiotic society, we aim to create and socially implement an avatar system that anyone can use, anytime, anywhere.

  • Research and development of an integrated CG avatar operation system (avatar1000 system)
  • Defining the styles of and requirements for conversations mediated by CG characters
  • Real-time behavior generation from speech: Speech2Motion
  • Collection of avatar operation corpora