<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Lei Li" />
<meta name="keywords"
content="machine learning deep learning natural language processing machine translation probabilistic programming" />
<meta name="author" content="Lei Li" />
<link href="style/style.min.css" rel="stylesheet" type="text/css" />
<title>Lei Li's Research Projects</title>
</head>
<body class="page-template-template-home page-id-2">
<header class="site-header--mobile" role="banner">
<div class="mobile-header-right">
<button type="button" class="js-toggle-nav menu-button" aria-controls="mobile-nav-tray" aria-expanded="false">
<span class="vh">Menu</span>
<svg class="menu-icon" width="48" height="33" focusable="false">
<use href="#menu-icon" />
</svg>
</button>
</div>
</header>
<div id="mobile-nav-tray" class="mobile-nav-tray" aria-hidden="true">
<div class="mobile-nav-header">
<div class="site-branding">
<a href="https://www.cmu.edu/" target="_blank" rel="home">
<span class="vh">Lei Li</span>
</a>
</div>
<button type="button" class="js-toggle-nav close-button" aria-controls="mobile-nav-tray" aria-expanded="false">
<span class="vh">Close</span>
<svg class="menu-icon-svg" width="33" height="33" focusable="false">
<use href="#close-icon" />
</svg>
</button>
</div>
<nav class="navigation--mobile">
<ul class="mobile-nav mobile-nav--main">
<li><a href="index.html">Home</a></li>
<li class="current-menu-item"><a href="research.html">Research</a></li>
<li><a href="teaching.html">Teaching</a></li>
<li>
<a href="pubs.html">Publications</a>
</li>
<li><a href="people.html">People</a></li>
<li><a href="https://lileicc.github.io/blog/">Blog</a></li>
</ul>
</nav>
</div>
<div>
<header id="masthead" class="site-header" role="banner">
<div class="wrap">
<div class="l-header">
<div class="l-header-col-2">
<div class="site-navigation">
<nav class="js-accessible-menu navigation--main" aria-label="Main Navigation" role="navigation">
<ul class="nav-list">
<li><a href="index.html">Home</a></li>
<li class="current-menu-item"><a href="research.html">Research</a></li>
<li><a href="teaching.html">Teaching</a></li>
<li>
<a href="pubs.html">Publications</a>
</li>
<li><a href="people.html">People</a></li>
<li><a href="https://lileicc.github.io/blog/">Blog</a></li>
</ul>
</nav>
</div>
</div>
</div>
</div>
</header>
<main id="main" class="site-main" role="main">
<header class="banner">
<div class="wrap">
<div class="banner-title">
<span class="h1">
Research </span>
</div>
</div>
</header>
<section class="section">
<div class="wrap">
<h2>Intelligent Writing and Text Generation</h2>
<ul>
<li>developing controllable and interpretable methods for effective
text generation. </li>
<li>Xiaomingbot: an intelligent news-writing robot. [<a href="https://xiaomingbot.github.io">Demo</a>]</li>
<li>Bayesian sampling methods for controllable text generation: <a href="pubs/miao2019cgmh.pdf">CGMH</a>,
              <a href="pubs/zhang2019generating.pdf">MHA</a>, and <a href="pubs/zhang2020language.pdf">TSMH</a>
              control language generation explicitly using various
              constraints (see the Metropolis-Hastings sketch after this list).
            </li>
<li>VAE with hierarchical latent priors: <a href="pubs/shi2020dispersed.pdf">DEMVAE</a>
              solves the training problem for VAEs with mixtures of
              exponential-family distributions.</li>
<li>Training better data-to-text generation with both data-text
              pairs and additional raw text: check out the <a href="pubs/ye2020variational.pdf">variational
              template machine</a>, which learns infinitely many templates for
              generation in the latent space. </li>
<li>One embedding is not enough to represent a word! <a href="pubs/miao2019kernelized.pdf">Bayesian
Softmax</a> improves text generation. </li>
<li>Applications in advertising systems: <a href="pubs/song2021triangular.pdf">generating bidwords for
              sponsored search</a> and <a href="https://arxiv.org/abs/1912.01114">news headline editing</a>.</li>
</ul>
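<p>The sampling loop behind these CGMH-style methods fits in a few lines:
            propose a local word edit, then accept or reject it with a
            Metropolis-Hastings test on a score that combines fluency and
            constraint satisfaction. Below is a minimal, self-contained Python
            sketch; the toy score function and vocabulary are stand-ins for a
            real language model, and the proposal is treated as symmetric so
            the acceptance ratio simplifies.</p>
          <pre><code># Toy Metropolis-Hastings sampler over sentences, in the spirit of
# CGMH-style constrained generation. score() is a stand-in for a real
# language-model fluency score; keyword constraints are kept hard.
import math
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
KEYWORDS = {"cat", "mat"}  # lexical constraints that must survive edits

def score(sentence):
    """Toy score: reward keyword coverage, penalize length."""
    coverage = sum(1 for w in KEYWORDS if w in sentence)
    return 5.0 * coverage - 0.5 * len(sentence)

def propose(sentence):
    """Randomly replace, insert, or delete one non-keyword word."""
    s = list(sentence)
    editable = [i for i, w in enumerate(s) if w not in KEYWORDS]
    op = random.choice(["replace", "insert", "delete"])
    if op == "replace" and editable:
        s[random.choice(editable)] = random.choice(VOCAB)
    elif op == "insert":
        s.insert(random.randrange(len(s) + 1), random.choice(VOCAB))
    elif op == "delete" and len(s) > 2 and editable:
        del s[random.choice(editable)]
    return s

def mh_sample(init, steps=2000):
    current, cur_score = init, score(init)
    for _ in range(steps):
        cand = propose(current)
        cand_score = score(cand)
        # Accept with probability min(1, exp(cand_score - cur_score)).
        if math.log(random.random() + 1e-12) &lt; cand_score - cur_score:
            current, cur_score = cand, cand_score
    return current

print(" ".join(mh_sample(["cat", "mat"])))</code></pre>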
<h2>Multilingual Machine Translation </h2>
<p>How can we develop a unified model that translates many language
            pairs well? Existing neural machine translation relies on rich
            parallel bilingual corpora, which are not readily available for many
            non-English language pairs. </p>
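<p>One widely used recipe for such a unified model, adopted with
            variations by multilingual systems including mRASP below, is to
            share a single encoder-decoder across all directions and signal the
            desired output language with a special target-language token on the
            source side. A minimal sketch, where the token format is an
            illustrative assumption:</p>
          <pre><code># Minimal sketch of many-to-many data preparation: prepend a
# target-language token so one shared encoder-decoder serves every
# translation direction. The "&lt;2xx&gt;" token format is illustrative.
def make_example(src_sentence, tgt_lang):
    return f"&lt;2{tgt_lang}&gt; {src_sentence}"

print(make_example("Wie geht es dir?", "en"))  # the decoder learns to emit English
print(make_example("Comment ça va ?", "zh"))   # same model, different direction</code></pre>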
<ul>
<li>Do pre-trained monolingual language models such as BERT/GPT
              benefit bilingual neural machine translation? Check out the CTNMT <a href="pubs/yang2020towards.pdf">paper</a>.
            </li>
<li><img src="mrasp-logo.png" alt="" title="mRASP" style="width: 149px; height: 56px;" align="left" />Can
we build a universal pre-trained neural model that can improve on
any pairs of language translation? Even if the languages do not
occur in the pre-training corpus? The <a href="pubs/lin2020pre.pdf">mRASP
</a> and <a href="pubs/pan2021contrastive.pdf">mRASP2</a> papers try to answer this question. Read <a
href="https://medium.com/@panxiao1994/mrasp2-multilingual-nmt-advances-via-contrastive-learning-ac8c4c35d63">the
blog post here</a>.</li>
<li>Building a model like a human who knows two languages:
              integrating the translation capabilities in both directions, as
              well as the ability to compose sentences in both languages. Check
              out the Mirror Generative Neural Machine Translation
              <a href="pubs/zheng2020mirror.pdf">paper</a>.
            </li>
<li>Prune-Tune: a method that continually learns multiple domains of
              translation. It improves domain-specific translation successively
              without degrading on the general domain, avoiding the common
              catastrophic-forgetting problem (see the pruning sketch after this
              list). Check out the <a href="pubs/liang2021finding.pdf">Prune-Tune
              paper</a>. [<a href="https://ohlionel.github.io/project/Prune-Tune/">project
              page</a>]</li>
<li><a href="pubs/lin2021learning.pdf">Learning language-specific sub-network</a> is possible for
multilingual
neural machine translation. It improve zero-shot translation. </li>
<li><a href="">Graformer</a>: connecting pretrained BERT and GPT with a small bridge module to boost
performance for multilingual machine translation. It furhter enables easy exploitation of pre-trained
models
using monolingual corpus in multiple languages. </li>
<li><a href="">CIAT</a>: designing small adapter subnets for multilingual machine translation. Should the
adapter be serial or parallel to main backbone network? This study finds parallel adapter works better to
counter interferences in langauges. </li>
<li>How did we achieve top performance in multiple language
              directions at WMT20 (Chinese-English, German-English,
              French-German, English-Khmer, English-Pashto)? Check out our
              experience in this <a href="https://arxiv.org/abs/2010.14806">report</a>
              and this <a href="https://arxiv.org/abs/2010.14029">report</a>. </li>
<li>The algorithms are deployed in production: check out <a
                href="https://translate.volcengine.cn">VolcTrans</a>,
              which serves hundreds of millions of translation requests daily
              in 55 languages. </li>
</ul>
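<p>The core mechanics of Prune-Tune can be illustrated in a few lines
            of numpy: magnitude-prune a trained weight matrix, freeze the
            surviving general-domain sub-network, and let domain updates touch
            only the freed slots. The tiny gradient step below is an
            illustrative stand-in, not the paper's implementation.</p>
          <pre><code># Numpy sketch of the Prune-Tune idea: keep the largest weights frozen
# as the general-domain sub-network, and fine-tune only the pruned slots
# on a new domain, so general-domain behavior cannot be overwritten.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))              # pretrained general-domain weights

# 1) Magnitude pruning: the largest 50% of weights form the frozen sub-network.
threshold = np.quantile(np.abs(W), 0.5)
general_mask = np.abs(W) >= threshold
free_mask = ~general_mask                # slots available for the new domain

# 2) Domain fine-tuning: apply a (stand-in) gradient only on the free slots.
grad = rng.normal(size=W.shape)
W_tuned = W - 0.1 * grad * free_mask

assert np.allclose(W_tuned[general_mask], W[general_mask])  # general weights intact
print("updated", int(free_mask.sum()), "of", W.size, "weights")</code></pre>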
<h2>Speech-to-text Translation</h2>
<p>Can we build a single unified model that takes voice input in one
            language and outputs a translation in another language? Existing
            systems are cascaded, combining an ASR system and an MT system. This
            project aims to build a real working system that achieves this in an
            end-to-end fashion. The major challenges are twofold: a model must
            translate from the source language to the target, and it must
            convert from one modality (audio) to the other (text). In addition,
            existing open datasets for speech translation are limited, usually a
            few hundred hours, much less than what is available for machine
            translation (e.g., 4 million sentence pairs for English-German).</p>
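<p>The contrast between the two designs can be written down directly;
            the toy functions below are placeholders marking where real ASR,
            MT, and end-to-end models would sit.</p>
          <pre><code># Schematic contrast between cascaded and end-to-end speech translation.
# asr() and mt() are toy stand-ins for real trained models.
def asr(audio):                    # speech -> source-language text
    return "bonjour"               # pretend transcription

def mt(text):                      # source text -> target text
    return {"bonjour": "hello"}.get(text, text)

def cascaded_st(audio):
    # Two models, two decoding passes; ASR errors propagate into MT.
    return mt(asr(audio))

def end_to_end_st(audio):
    # One model maps audio directly to target text; transcripts serve
    # only as auxiliary training supervision, not at inference time.
    return "hello"                 # stand-in for a single model's output

print(cascaded_st([0.1, 0.2]), end_to_end_st([0.1, 0.2]))</code></pre>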
<ul>
<li>Can we utilize the additional transcription text in the source
              language to train a better encoder for speech translation?
              Inspired by how humans listen, understand, and then translate, we
              propose the LUT method, which utilizes triplets of (source audio,
              source text, target text) and a pre-trained BERT model to train a
              better end-to-end speech-to-text translation system. Check out the
              LUT <a href="https://arxiv.org/abs/2009.09704">paper</a>.
              [<a href="https://dqqcasia.github.io/projects/LUT/">project
              page</a>]</li>
<li>Can we utilize additional large parallel bilingual sentence
              pairs from machine translation to enhance speech translation? An
              idea based on consecutive decoding achieves this. Check out the
              COSTT <a href="https://arxiv.org/abs/2009.09737">paper</a>. [<a
                href="https://dqqcasia.github.io/projects/COSTT/">project
              page</a>]</li>
<li>Studies of the human brain reveal a common region responsible
              for both text and speech processing. Can a neural network map
              text and speech inputs to the same semantic space? Check out <a
                href="pubs/han2021learning.pdf">Chimera</a>, a speech-to-text
              translation model that utilizes a shared semantic space to
              further improve speech translation. </li>
<li>Training techniques such as progressive multitask training
              improve speech translation: <a href="pubs/ye2021end.pdf">XSTNet</a>
              obtains state-of-the-art translation performance on the MuST-C
              dataset (a toy schedule sketch follows this list). </li>
</ul>
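<p>A flavor of the progressive multitask schedule can be given in a
            few lines; the phase boundaries and weights below are illustrative
            assumptions, not XSTNet's exact values.</p>
          <pre><code># Sketch of a progressive multitask schedule: start from text-only MT,
# mix in the speech tasks, then emphasize the end task (speech translation).
def loss_weights(step):
    if step >= 30_000:   # phase 3: focus on speech translation
        return {"mt": 0.3, "asr": 0.3, "st": 1.0}
    if step >= 10_000:   # phase 2: bring in the speech tasks
        return {"mt": 1.0, "asr": 1.0, "st": 1.0}
    return {"mt": 1.0, "asr": 0.0, "st": 0.0}  # phase 1: MT-only pretraining

def total_loss(losses, step):
    w = loss_weights(step)
    return sum(w[task] * losses[task] for task in losses)

# Per-task cross-entropy values would come from the shared model.
print(total_loss({"mt": 2.1, "asr": 1.7, "st": 3.0}, step=5_000))</code></pre>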
<h2>AI-powered Drug Discovery</h2>
<p>The purpose of this project is to use AI and machine learning to
            power the whole pipeline of drug discovery, testing, trial
            validation, and manufacturing. </p>
<ul>
<li>Find novel and diverse molecules that are effective in terms of
              multiple chemical properties and target proteins. Check out the
              MARS <a href="https://openreview.net/forum?id=kHSu4ebxFXY">paper</a>
              (see the scoring sketch after this list).
            </li>
</ul>
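<p>The multi-objective acceptance test at the heart of MARS-style
            sampling can be sketched as follows; the property predictors here
            are toy stand-ins for learned or rule-based chemistry models.</p>
          <pre><code># Toy sketch of multi-objective Metropolis-Hastings acceptance for
# MARS-style molecular sampling. Property predictors are stand-ins.
import math
import random

def qed(mol): return 0.5               # drug-likeness (toy constant)
def synthesizability(mol): return 0.7  # ease of synthesis (toy constant)
def binding(mol): return 0.3           # affinity to a target protein (toy)

def combined_score(mol):
    # Multi-objective fitness as a log-product of property scores.
    props = [qed(mol), synthesizability(mol), binding(mol)]
    return sum(math.log(max(p, 1e-6)) for p in props)

def accept(old_mol, new_mol):
    # Metropolis test on the combined score (symmetric proposals assumed).
    delta = combined_score(new_mol) - combined_score(old_mol)
    return math.log(random.random() + 1e-12) &lt; delta

print(accept("C1=CC=CC=C1", "C1=CC=CC=C1O"))  # SMILES strings as toy molecules</code></pre>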
</div>
</section>
<section id="software" class="highlight-section section">
<div class="wrap">
<div class="highlight-heading">
<h1> Software </h1>
</div>
<ul>
<li><a href="https://github.com/bytedance/lightseq">LightSeq</a>: A
High Performance Training and Inference Library for Transformer models.
It is widely used for machine translation, text generation, visual recognition, and more.
With the custom CUDA
implementation, it achieves 10x speed-up over the original
tensorflow seq2seq package, and faster than other implementations.
</li>
<li> <a href="https://github.com/bytedance/neurst"> NeurST </a>: A
toolbok with readily available models for neural machine
translation and speech-to-text translation. </li>
<li><a href="https://bayesianlogic.github.io/">BLOG</a>: a
probabilistic programming language for machine learning</li>
<li><a href="https://github.com/lileicc/swift">Swift</a>: a compiler
for the probabilistic programming language BLOG.</li>
<li><a href="software/dynammo-r346.zip">DynaMMo</a>: learning
toolbox for multi-dimensional co-evolving time series. <a href="https://github.com/lileicc/dynammo">github
page</a></li>
<li><a href="software/clds-r347.zip">CLDS</a>: complex-valued linear
dynamical system</li>
<li><a href="software/plif-r345.zip">PLiF</a>: time-shift-invariant
feature extraction for time series </li>
<li><a href="software/bolero-r349.zip">BoLeRO</a>: human motion
capture occlution recovering</li>
<li><a href="paralearn/index.html">paralearn</a>: a parallel
algorithm for learning Markov models and linear dynamical systems
(i.e. Kalman filter) </li>
<li><a href="software/mlds-r662.zip">MLDS</a>: learning dynamical
model for tensor time series </li>
</ul>
</div>
</section>
<section id="dataset" class="highlight-section section">
<div class="wrap">
<div class="highlight-heading">
<h1>Dataset</h1>
</div>
<ul>
<li> TTNews: a dataset for Chinese document summarization. 50,000
              news articles with summaries for training, and 4,000 news articles
              for testing. [<a href="http://tcci.ccf.org.cn/conference/2018/dldoc/taskgline03.pdf">Task
              description</a>] [<a href="https://pan.baidu.com/s/1bppQ4z1">Training
              data</a>] [<a href="https://www.dropbox.com/s/luizl5rftml05nc/nlpcc_summarization_2017-2018_evaluation.zip?dl=0">Testing
              data and evaluation script</a>] [Reports from <a href="pubs/hua2017overview.pdf">NLPCC2017</a>
              and <a href="pubs/li2018overview.pdf">NLPCC2018</a>] </li>
<li> CNewSum: an extended version of TTNews for Chinese document
              summarization. It includes 304,307 documents with human-written
              summaries, plus additional adequacy-level and deducibility-level
              labels. [<a href="https://dqwang122.github.io/projects/CNewSum/">Project
              URL</a>] </li>
<li>MLGSum: a multilingual text summarization corpus with 1.2
              million articles in 12 languages. The average article length is
              570 words. [<a href="https://dqwang122.github.io/projects/CALMS/">Project
              URL</a>] [<a href="https://drive.google.com/file/d/1i9xfOkQ60kixj0rZ-kCo8UCo2fZ51fCY/view?usp=sharing">Data</a>]</li>
</ul>
</div>
</section>
<section class="highlight-section section">
<div class="wrap">
<div class="highlight-heading">
<h1>Past Projects</h1>
</div>
<h2>Probabilistic programming languages and Bayesian inference</h2>
<ul>
<li> <a href="http://bayesianlogic.cs.berkeley.edu/">Bayesian Logic
(BLOG)</a> and its inference system. </li>
</ul>
<h2>Time series learning</h2>
<ul>
<li> Modeling, summarization, clustering, imputation, and
              forecasting for multiple co-evolving time series, with or without
              missing values (a minimal Kalman-filter sketch follows this
              list). [<a href="pubs/li2009dynammo.pdf">DynaMMo
              paper</a>] [<a href="pubs/cao2018brits.pdf">BRITS neural-based
              approach</a>] </li>
<li>Human motion and motion capture analysis: <a href="mocap.stitch/index.html">natural
              motion stitching</a>, <a href="pubs/li2011time.pdf">motion
              clustering</a>. </li>
<li>Data center monitoring: forecasting the temperature distribution
              across servers using approximate thermodynamics, in order to
              control cooling with minimal energy consumption in data
              centers. [ThermoCast paper]</li>
<li> <a href="mlds/index.html">Tensor time series (MLDS)</a> </li>
</ul>
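<p>For reference, the forward recursion of the linear dynamical system
            underlying this line of work fits in a short numpy snippet. DynaMMo
            learns the parameters with EM; the numbers below are made up.</p>
          <pre><code># One Kalman-filter step for a linear dynamical system: predict the
# hidden state, then correct it with the incoming observation.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])  # state transition
C = np.array([[1.0, 0.0]])              # observation matrix
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.1]])                   # observation noise covariance

def kalman_step(mu, Sigma, y):
    # Predict.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update with observation y.
    S = C @ Sigma_pred @ C.T + R             # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    Sigma_new = (np.eye(2) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new

mu, Sigma = np.zeros(2), np.eye(2)
for y in [np.array([0.5]), np.array([0.4]), np.array([0.7])]:
    mu, Sigma = kalman_step(mu, Sigma, y)
print(mu)  # filtered state estimate after three observations</code></pre>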
<h2>Parallel Learning for Sequential Models</h2>
<ul>
<li> <a href="paralearn/index.html"> Parallel algorithms for
graphical model on multicore </a> : (finished) </li>
</ul>
<h2>Network analysis</h2>
<ul>
<li>Social network and social media analysis</li>
<li> <a href="http://www.db.cs.cmu.edu/db-site/Projects/cdem">CDEM</a>
:fly embryo gene pattern mining. (finished) </li>
</ul>
</div>
</section>
</main>
</div>
</body>
</html>