Lee Lab., NITech

Welcome! Here at the Lee Laboratory of the Nagoya Institute of Technology, we focus on human-to-human and human-to-machine communication through speech and language. We conduct research on speech recognition, spoken dialogue systems, natural language processing, speech-based interaction, and avatar communication. Our goal is to advance spoken language processing technologies and to realize sophisticated voice- and language-driven man-machine interfaces that are natural and user-friendly for everyone.

Our research topics include:

Speech recognition and synthesis
Spoken dialogue systems
Natural language processing
CG-based humanoid agent interaction
Avatar communication

Lee Lab is run in collaboration with the Sako Laboratory. We also have a cooperative relationship with the Tokuda, Nankaku and Hashimoto Laboratory.

About the PI

LEE Akinobu was born in Kyoto, Japan, on December 19, 1972. He received the B.E. and M.E. degrees in information science and the Ph.D. degree in informatics from Kyoto University, Kyoto, Japan, in 1996, 1998 and 2000, respectively. He worked at the Nara Institute of Science and Technology as an assistant professor from 2000 to 2005. He is currently a professor at the Nagoya Institute of Technology, Japan. His research interests include speech recognition, spoken language understanding, and spoken dialogue systems. He is a member of IEEE, ISCA, JSAI, IPSJ and the Acoustical Society of Japan.

He is also a researcher who loves coding and has been involved in open-source activities for over 25 years. Below is a list of open-source software and CG avatars for which he serves as the lead developer:

ASR engine Julius (since 1996)
CG agent interaction toolkit MMDAgent (since 2011)
Extended version for CG avatar interaction: MMDAgent-EX (since 2020)
High-quality open CG avatars: Gene and Uka (since 2023)

News and Posts

The following members joined the Lee and Sako lab in April 2021:
ビンヒデオくん / BIN Hideo (M1)
義井健史くん / YOSHII Kensi (M1)
岩澤芙弓さん / IWASAWA Fuyumi
江崎友都くん / ESAKI Yuto
志満津奈央さん / SHIMAZU Nao
田中愛菜さん / TANAKA Mana
東省吾くん / HIGASHI Shogo
藤田敦也くん / FUJITA Atsuya
村松洸兵くん / MURAMATSU Kohei
辰巳花菜さん / TATSUMI Kana
宮下陸くん / MIYASHITA Riku

Julius version 4.6 has been released. You can get it from its GitHub site.

What’s new in Julius-4.6: Julius-4.6 is a minor release with new features and fixes, including GPU integration and grammar handling updates.

GPU-based DNN-HMM computation: Julius can now compute DNN-HMM on a GPU (take a look at the v4.6 performance comparison on YouTube). Total decoding will be four times faster than CPU-based computation on Julius-4.5. Requires CUDA version 8, 9 or 10.

The lab’s website has been reorganized with Hugo, with a refreshed English page. Goodbye WordPress!

Julius has merged a pull request that adds a new feature, “grammar search on the 1st pass”. To use it, get the latest code from the master branch. The feature applies the full grammar on the 1st pass, so the 1st pass outputs a more reliable (grammar-constrained) result. Background: for efficiency, grammar-based recognition in Julius does not apply the full grammar on the 1st pass, but only the word-pair constraint extracted from the grammar.
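The difference between a word-pair constraint and a full grammar can be illustrated with a toy example. The sketch below is not Julius code and does not use the Julius grammar format. The two-sentence grammar, the boundary markers, and all function names are hypothetical, chosen only to show why a word-pair check is weaker than a full-grammar check.

```python
# Toy illustration only: this is NOT Julius code or its grammar format.
# The grammar is given as the explicit (hypothetical) list of sentences it accepts.
GRAMMAR_SENTENCES = [
    ("play", "the", "music"),
    ("open", "the", "door"),
]

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def adjacent_pairs(words):
    """Return the adjacent word pairs of a sentence, including boundaries."""
    padded = (BOS, *words, EOS)
    return list(zip(padded, padded[1:]))


def extract_word_pairs(sentences):
    """Collect every word pair that is adjacent in at least one accepted sentence."""
    pairs = set()
    for sentence in sentences:
        pairs.update(adjacent_pairs(sentence))
    return pairs


def passes_word_pair(words, pairs):
    """Weaker check: every adjacent pair must be allowed somewhere in the grammar."""
    return all(pair in pairs for pair in adjacent_pairs(words))


def passes_full_grammar(words, sentences):
    """Stronger check: the whole word sequence must be an accepted sentence."""
    return tuple(words) in set(sentences)


WORD_PAIRS = extract_word_pairs(GRAMMAR_SENTENCES)

for candidate in (["play", "the", "music"], ["play", "the", "door"]):
    print(candidate,
          "word-pair:", passes_word_pair(candidate, WORD_PAIRS),
          "full grammar:", passes_full_grammar(candidate, GRAMMAR_SENTENCES))
```

In this toy grammar, "play the door" satisfies every word-pair constraint but is not a sentence of the grammar, so only the full-grammar check rejects it. This is the kind of hypothesis that a word-pair-only 1st pass can let through.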

The following members joined the Lee and Sako lab in April 2020:
菊地源くん / KIKUCHI Gen (M1)
畑中 哲哉くん / HATANAKA Tetsuya
松本 優太くん / MATUMOTO Yuta
池口 弘尚くん / IKEGUCHI Hironao
愛甲 拓海くん / AIKO Takumi
小木曽 雄飛くん / OGISO Yuto
渡邉 大地くん / WATANABE Daichi
堀田 義眞くん / HOTTA Yoshimasa
白井 建くん / SHIRAI Takeru
小川 凜人くん / OGAWA Rinto

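The 2019 bachelor's thesis presentation meeting was held. The following ten members gave presentations:
松岡優太: “Automatic melody generation by GA using music playback history information”
中川樹: “Identifying causes of timbre degradation from acoustic signals for the horn”
尾関日向: “Guitar part separation from music acoustic signals”
池田将: “A method for estimating emotion elements and emotion intensity from utterances for interactive music recommendation”
高井幸輝: “Emotion classification of spontaneous speech by Pre-training Fusion using multiple corpora and multiple labels”
岡本空大: “Improving estimation accuracy in automatic speech impression estimation of lecture speech using neural networks”
大平原海斗: “Sound event detection using weakly-labeled data”
加藤弘泰: “Speech rate control based on user speech rate and corpus in spoken dialogue systems”
西山達也: “Response selection in multi-party dialogue by BERT using speaker information”
石島侑弥: “Extraction of important dialogue history using dialogue act tags in dialogue state tracking”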

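The 2019 master’s thesis review meeting was held. The following members gave presentations:
森 凜太朗: “Pronunciation proficiency classification of second-language learners using phoneme posterior probabilities of a language pair”
降籏 暢基: “Acquiring perception of interactivity through emotional expression using an anthropomorphic agent”
冨田 直希: “Dialogue scenario expansion using linguistic and statistical information for robust spoken dialogue systems”
尾関 晃英: “Generating highly stylized responses using an utterance evaluator in neural dialogue systems”
田中 涼太: “Dialogue generation grounded in factual knowledge using convergent-divergent decoding”
神谷 祐太朗: “Vibrato prediction from score information for the violin”
河島 有孝: “Tablature estimation from performance audio that can distinguish same-pitch notes on different strings, incorporating fingering narrowing by chord recognition”
谷口 拓海: “Singing voice segment estimation considering fluctuations in timbre and pitch”
NGUYEN TU NAM: “Fingerspelling recognition using transfer learning and synthetic images”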

A paper by Ryota Tanaka (M2) has been published in the journal Computer Speech & Language. Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee, “Context and Knowledge Aware Dialogue System and System Combination for Grounded Response Generation,” Computer Speech & Language, vol. 62, July 2020. http://www.sciencedirect.com/science/article/pii/S0885230820300036

A paper entitled “Speaker Aware BERT for Multi-Party Dialogue Selection” by Tatsuya Nishiyama has been accepted for poster presentation at AAAI2020/DSTC8, which will be held in New York on February 8, 2020.

About the PI

LEE Akinobu

News and Posts

New members joined in 2021

Julius-4.6 Released

Web site renewal

Julius: added a new feature

New members joined in 2020

The 2019 bachelor's thesis presentation was held

The 2019 master's thesis review meeting was held

A new paper has been published in Computer Speech & Language Journal

A paper has been accepted at AAAI2020/DSTC8