<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Lei Li" />
<meta name="keywords"
content="machine learning deep learning natural language processing machine translation probabilistic programming" />
<meta name="author" content="Lei Li" />
<link href="style/style.min.css" rel="stylesheet" type="text/css" />
<title>Lei Li's Research Projects</title>
</head>
<body class="page-template-template-home page-id-2">
<header class="site-header--mobile" role="banner">
<div class="mobile-header-right">
<button type="button" class="js-toggle-nav menu-button" aria-controls="mobile-nav-tray" aria-expanded="false">
<span class="vh">Menu</span>
<svg class="menu-icon" width="48" height="33" focusable="false">
<use href="#menu-icon" />
</svg>
</button>
</div>
</header>
<div id="mobile-nav-tray" class="mobile-nav-tray" aria-hidden="true">
<div class="mobile-nav-header">
<div class="site-branding">
<a href="https://www.cmu.edu/" target="_blank" rel="home">
<span class="vh">Lei Li</span>
</a>
</div>
<button type="button" class="js-toggle-nav close-button" aria-controls="mobile-nav-tray" aria-expanded="false">
<span class="vh">Close</span>
<svg class="menu-icon-svg" width="33" height="33" focusable="false">
<use href="#close-icon" />
</svg>
</button>
</div>
<nav class="navigation--mobile">
<ul class="mobile-nav mobile-nav--main">
<li><a href="index.html">Home</a></li>
<li class="current-menu-item"><a href="research.html">Research</a></li>
<li><a href="teaching.html">Teaching</a></li>
<li>
<a href="pubs.html">Publications</a>
</li>
<li><a href="people.html">People</a></li>
<li><a href="https://lileicc.github.io/blog/">Blog</a></li>
</ul>
</nav>
</div>
<div>
<header id="masthead" class="site-header" role="banner">
<div class="wrap">
<div class="l-header">
<div class="l-header-col-2">
<div class="site-navigation">
<nav class="js-accessible-menu navigation--main" aria-label="Main Navigation" role="navigation">
<ul class="nav-list">
<li><a href="index.html">Home</a></li>
<li class="current-menu-item"><a href="research.html">Research</a></li>
<li><a href="teaching.html">Teaching</a></li>
<li>
<a href="pubs.html">Publications</a>
</li>
<li><a href="people.html">People</a></li>
<li><a href="https://lileicc.github.io/blog/">Blog</a></li>
</ul>
</nav>
</div>
</div>
</div>
</div>
</header>
<main id="main" class="site-main" role="main">
<header class="banner">
<div class="wrap">
<div class="banner-title">
<span class="h1">
Research </span>
</div>
</div>
</header>
<section class="section">
<div class="wrap">
<h2>Intelligent Writing and Text Generation</h2>
<ul>
<li>developing controllable and interpretable methods for effective
text generation. </li>
<li>Xiaomingbot: an intelligent news-writing robot. [<a href="https://xiaomingbot.github.io">Demo</a>]</li>
<li>Bayesian sampling methods for controllable text generation: <a href="pubs/miao2019cgmh.pdf">CGMH</a>,
              <a href="pubs/zhang2019generating.pdf">MHA</a>, and <a href="pubs/zhang2020language.pdf">TSMH</a>
              control language generation explicitly using various
              constraints (see the Metropolis-Hastings sketch after this list).
            </li>
<li>VAE with hierarchical latent priors: <a href="pubs/shi2020dispersed.pdf">DEMVAE</a>
              solves the training problem for VAEs with mixtures of
              exponential-family distributions.</li>
<li>Training better data-to-text generation with both data-text
              pairs and additional raw text: check out the <a href="pubs/ye2020variational.pdf">variational
              template machine</a>, which learns infinitely many templates for
              generation in the latent space. </li>
<li>One embedding is not enough to represent a word! <a href="pubs/miao2019kernelized.pdf">Bayesian
Softmax</a> improves text generation. </li>
<li>Applications in advertising systems: <a href="pubs/song2021triangular.pdf">generating bidwords for
              sponsored search</a> and <a href="https://arxiv.org/abs/1912.01114">news headline editing</a>.</li>
</ul>
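<p>The sampling loop behind these CGMH-style methods fits in a few lines:
            propose a local word edit, then accept or reject it with a
            Metropolis-Hastings test on a score that combines fluency and
            constraint satisfaction. Below is a minimal, self-contained Python
            sketch; the toy score function and vocabulary are stand-ins for a
            real language model, and the proposal is treated as symmetric so
            the acceptance ratio simplifies.</p>
          <pre><code># Toy Metropolis-Hastings sampler over sentences, in the spirit of
# CGMH-style constrained generation. score() is a stand-in for a real
# language-model fluency score; keyword constraints are kept hard.
import math
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
KEYWORDS = {"cat", "mat"}  # lexical constraints that must survive edits

def score(sentence):
    """Toy score: reward keyword coverage, penalize length."""
    coverage = sum(1 for w in KEYWORDS if w in sentence)
    return 5.0 * coverage - 0.5 * len(sentence)

def propose(sentence):
    """Randomly replace, insert, or delete one non-keyword word."""
    s = list(sentence)
    editable = [i for i, w in enumerate(s) if w not in KEYWORDS]
    op = random.choice(["replace", "insert", "delete"])
    if op == "replace" and editable:
        s[random.choice(editable)] = random.choice(VOCAB)
    elif op == "insert":
        s.insert(random.randrange(len(s) + 1), random.choice(VOCAB))
    elif op == "delete" and len(s) > 2 and editable:
        del s[random.choice(editable)]
    return s

def mh_sample(init, steps=2000):
    current, cur_score = init, score(init)
    for _ in range(steps):
        cand = propose(current)
        cand_score = score(cand)
        # Accept with probability min(1, exp(cand_score - cur_score)).
        if math.log(random.random() + 1e-12) &lt; cand_score - cur_score:
            current, cur_score = cand, cand_score
    return current

print(" ".join(mh_sample(["cat", "mat"])))</code></pre>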
<h2>Multilingual Machine Translation </h2>
<p>How can we develop a unified model that translates many language
            pairs well? Existing neural machine translation relies on rich
            parallel bilingual corpora, which are not readily available for many
            non-English language pairs. </p>
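<p>One widely used recipe for such a unified model, adopted with
            variations by multilingual systems including mRASP below, is to
            share a single encoder-decoder across all directions and signal the
            desired output language with a special target-language token on the
            source side. A minimal sketch, where the token format is an
            illustrative assumption:</p>
          <pre><code># Minimal sketch of many-to-many data preparation: prepend a
# target-language token so one shared encoder-decoder serves every
# translation direction. The "&lt;2xx&gt;" token format is illustrative.
def make_example(src_sentence, tgt_lang):
    return f"&lt;2{tgt_lang}&gt; {src_sentence}"

print(make_example("Wie geht es dir?", "en"))  # the decoder learns to emit English
print(make_example("Comment ça va ?", "zh"))   # same model, different direction</code></pre>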
<ul>
<li>Do pre-trained monolingual language models such as BERT/GPT
              benefit bilingual neural machine translation? Check out the CTNMT <a href="pubs/yang2020towards.pdf">paper</a>.
            </li>
<li><img src="mrasp-logo.png" alt="" title="mRASP" style="width: 149px; height: 56px;" align="left" />Can
we build a universal pre-trained neural model that can improve on
any pairs of language translation? Even if the languages do not
occur in the pre-training corpus? The <a href="pubs/lin2020pre.pdf">mRASP
</a> and <a href="pubs/pan2021contrastive.pdf">mRASP2</a> papers try to answer this question. Read <a
href="https://medium.com/@panxiao1994/mrasp2-multilingual-nmt-advances-via-contrastive-learning-ac8c4c35d63">the
blog post here</a>.</li>
<li>Building a model like a human who knows two languages:
              integrating the translation capabilities in both directions, as
              well as the ability to compose sentences in both languages. Check
              out the Mirror Generative Neural Machine Translation
              <a href="pubs/zheng2020mirror.pdf">paper</a>.
            </li>
<li>Prune-Tune: a method that continually learns multiple domains of
              translation. It improves domain-specific translation successively
              without degrading on the general domain, avoiding the common
              catastrophic-forgetting problem (see the pruning sketch after this
              list). Check out the <a href="pubs/liang2021finding.pdf">Prune-Tune
              paper</a>. [<a href="https://ohlionel.github.io/project/Prune-Tune/">project
              page</a>]</li>
<li><a href="pubs/lin2021learning.pdf">Learning language-specific sub-network</a> is possible for
multilingual
neural machine translation. It improve zero-shot translation. </li>
<li><a href="">Graformer</a>: connecting pretrained BERT and GPT with a small bridge module to boost
performance for multilingual machine translation. It furhter enables easy exploitation of pre-trained
models
using monolingual corpus in multiple languages. </li>
<li><a href="">CIAT</a>: designing small adapter subnets for multilingual machine translation. Should the
adapter be serial or parallel to main backbone network? This study finds parallel adapter works better to
counter interferences in langauges. </li>
<li>How did we achieve top performance in multiple language
              directions at WMT20 (Chinese-English, German-English,
              French-German, English-Khmer, English-Pashto)? Check out our
              experience in this <a href="https://arxiv.org/abs/2010.14806">report</a>
              and this <a href="https://arxiv.org/abs/2010.14029">report</a>. </li>
<li>The algorithms are deployed in production: check out <a
                href="https://translate.volcengine.cn">VolcTrans</a>,
              which serves hundreds of millions of translation requests daily
              in 55 languages. </li>
</ul>
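<p>The core mechanics of Prune-Tune can be illustrated in a few lines
            of numpy: magnitude-prune a trained weight matrix, freeze the
            surviving general-domain sub-network, and let domain updates touch
            only the freed slots. The tiny gradient step below is an
            illustrative stand-in, not the paper's implementation.</p>
          <pre><code># Numpy sketch of the Prune-Tune idea: keep the largest weights frozen
# as the general-domain sub-network, and fine-tune only the pruned slots
# on a new domain, so general-domain behavior cannot be overwritten.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))              # pretrained general-domain weights

# 1) Magnitude pruning: the largest 50% of weights form the frozen sub-network.
threshold = np.quantile(np.abs(W), 0.5)
general_mask = np.abs(W) >= threshold
free_mask = ~general_mask                # slots available for the new domain

# 2) Domain fine-tuning: apply a (stand-in) gradient only on the free slots.
grad = rng.normal(size=W.shape)
W_tuned = W - 0.1 * grad * free_mask

assert np.allclose(W_tuned[general_mask], W[general_mask])  # general weights intact
print("updated", int(free_mask.sum()), "of", W.size, "weights")</code></pre>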
<h2>Speech-to-text Translation</h2>
<p>Can we build a single unified model that takes voice input in one
            language and outputs a translation in another language? Existing
            systems are cascaded, combining an ASR system and an MT system. This
            project aims to build a real working system that achieves this in an
            end-to-end fashion. The major challenges are twofold: a model must
            translate from the source language to the target, and it must
            convert from one modality (audio) to the other (text). In addition,
            existing open datasets for speech translation are limited, usually a
            few hundred hours, much less than what is available for machine
            translation (e.g., 4 million sentence pairs for English-German).</p>
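<p>The contrast between the two designs can be written down directly;
            the toy functions below are placeholders marking where real ASR,
            MT, and end-to-end models would sit.</p>
          <pre><code># Schematic contrast between cascaded and end-to-end speech translation.
# asr() and mt() are toy stand-ins for real trained models.
def asr(audio):                    # speech -> source-language text
    return "bonjour"               # pretend transcription

def mt(text):                      # source text -> target text
    return {"bonjour": "hello"}.get(text, text)

def cascaded_st(audio):
    # Two models, two decoding passes; ASR errors propagate into MT.
    return mt(asr(audio))

def end_to_end_st(audio):
    # One model maps audio directly to target text; transcripts serve
    # only as auxiliary training supervision, not at inference time.
    return "hello"                 # stand-in for a single model's output

print(cascaded_st([0.1, 0.2]), end_to_end_st([0.1, 0.2]))</code></pre>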
<ul>
<li>Can we utilize the additional transcription text in the source
              language to train a better encoder for speech translation?
              Inspired by how humans listen, understand, and then translate, we
              propose the LUT method, which utilizes triplets of (source audio,
              source text, target text) and a pre-trained BERT model to train a
              better end-to-end speech-to-text translation system. Check out the
              LUT <a href="https://arxiv.org/abs/2009.09704">paper</a>.
              [<a href="https://dqqcasia.github.io/projects/LUT/">project
              page</a>]</li>
<li>Can we utilize additional large parallel bilingual sentence
              pairs from machine translation to enhance speech translation? An
              idea based on consecutive decoding achieves this. Check out the
              COSTT <a href="https://arxiv.org/abs/2009.09737">paper</a>. [<a
                href="https://dqqcasia.github.io/projects/COSTT/">project
              page</a>]</li>
<li>Studies of the human brain reveal a common region responsible
              for both text and speech processing. Can a neural network map
              text and speech inputs to the same semantic space? Check out <a
                href="pubs/han2021learning.pdf">Chimera</a>, a speech-to-text
              translation model that utilizes a shared semantic space to
              further improve speech translation. </li>
<li>Training techniques such as progressive multitask training
              improve speech translation: <a href="pubs/ye2021end.pdf">XSTNet</a>
              obtains state-of-the-art translation performance on the MuST-C
              dataset (a toy schedule sketch follows this list). </li>
</ul>
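<p>A flavor of the progressive multitask schedule can be given in a
            few lines; the phase boundaries and weights below are illustrative
            assumptions, not XSTNet's exact values.</p>
          <pre><code># Sketch of a progressive multitask schedule: start from text-only MT,
# mix in the speech tasks, then emphasize the end task (speech translation).
def loss_weights(step):
    if step >= 30_000:   # phase 3: focus on speech translation
        return {"mt": 0.3, "asr": 0.3, "st": 1.0}
    if step >= 10_000:   # phase 2: bring in the speech tasks
        return {"mt": 1.0, "asr": 1.0, "st": 1.0}
    return {"mt": 1.0, "asr": 0.0, "st": 0.0}  # phase 1: MT-only pretraining

def total_loss(losses, step):
    w = loss_weights(step)
    return sum(w[task] * losses[task] for task in losses)

# Per-task cross-entropy values would come from the shared model.
print(total_loss({"mt": 2.1, "asr": 1.7, "st": 3.0}, step=5_000))</code></pre>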
<h2>AI-powered Drug Discovery</h2>
<p>The purpose of this project is to use AI and machine learning to
            power the whole pipeline of drug discovery, testing, trial
            validation, and manufacturing. </p>
<ul>
<li>Find novel and diverse molecules that are effective in terms of
              multiple chemical properties and target proteins. Check out the
              MARS <a href="https://openreview.net/forum?id=kHSu4ebxFXY">paper</a>
              (see the scoring sketch after this list).
            </li>
</ul>
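<p>The multi-objective acceptance test at the heart of MARS-style
            sampling can be sketched as follows; the property predictors here
            are toy stand-ins for learned or rule-based chemistry models.</p>
          <pre><code># Toy sketch of multi-objective Metropolis-Hastings acceptance for
# MARS-style molecular sampling. Property predictors are stand-ins.
import math
import random

def qed(mol): return 0.5               # drug-likeness (toy constant)
def synthesizability(mol): return 0.7  # ease of synthesis (toy constant)
def binding(mol): return 0.3           # affinity to a target protein (toy)

def combined_score(mol):
    # Multi-objective fitness as a log-product of property scores.
    props = [qed(mol), synthesizability(mol), binding(mol)]
    return sum(math.log(max(p, 1e-6)) for p in props)

def accept(old_mol, new_mol):
    # Metropolis test on the combined score (symmetric proposals assumed).
    delta = combined_score(new_mol) - combined_score(old_mol)
    return math.log(random.random() + 1e-12) &lt; delta

print(accept("C1=CC=CC=C1", "C1=CC=CC=C1O"))  # SMILES strings as toy molecules</code></pre>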
</div>
</section>
<section id="software" class="highlight-section section">
<div class="wrap">
<div class="highlight-heading">
<h1> Software </h1>
</div>
<ul>
<li><a href="https://github.com/bytedance/lightseq">LightSeq</a>: A
High Performance Training and Inference Library for Transformer models.
It is widely used for machine translation, text generation, visual recognition, and more.
With the custom CUDA
implementation, it achieves 10x speed-up over the original
tensorflow seq2seq package, and faster than other implementations.
</li>
<li> <a href="https://github.com/bytedance/neurst"> NeurST </a>: A
toolbok with readily available models for neural machine
translation and speech-to-text translation. </li>
<li><a href="https://bayesianlogic.github.io/">BLOG</a>: a
probabilistic programming language for machine learning</li>
<li><a href="https://github.com/lileicc/swift">Swift</a>: a compiler
for the probabilistic programming language BLOG.</li>
<li><a href="software/dynammo-r346.zip">DynaMMo</a>: learning
toolbox for multi-dimensional co-evolving time series. <a href="https://github.com/lileicc/dynammo">github
page</a></li>
<li><a href="software/clds-r347.zip">CLDS</a>: complex-valued linear
dynamical system</li>
<li><a href="software/plif-r345.zip">PLiF</a>: time-shift-invariant
feature extraction for time series </li>
<li><a href="software/bolero-r349.zip">BoLeRO</a>: human motion
capture occlution recovering</li>
<li><a href="paralearn/index.html">paralearn</a>: a parallel
algorithm for learning Markov models and linear dynamical systems
(i.e. Kalman filter) </li>
<li><a href="software/mlds-r662.zip">MLDS</a>: learning dynamical
model for tensor time series </li>
</ul>
</div>
</section>
<section id="dataset" class="highlight-section section">
<div class="wrap">
<div class="highlight-heading">
<h1>Dataset</h1>
</div>
<ul>
<li> TTNews: a dataset for Chinese document summarization. 50,000
              news articles with summaries for training, and 4,000 news articles
              for testing. [<a href="http://tcci.ccf.org.cn/conference/2018/dldoc/taskgline03.pdf">Task
              description</a>] [<a href="https://pan.baidu.com/s/1bppQ4z1">Training
              data</a>] [<a href="https://www.dropbox.com/s/luizl5rftml05nc/nlpcc_summarization_2017-2018_evaluation.zip?dl=0">Testing
              data and evaluation script</a>] [Reports from <a href="pubs/hua2017overview.pdf">NLPCC2017</a>
              and <a href="pubs/li2018overview.pdf">NLPCC2018</a>] </li>
<li> CNewSum: an extended version of TTNews for Chinese document
              summarization. It includes 304,307 documents with human-written
              summaries, plus additional adequacy-level and deducibility-level
              labels. [<a href="https://dqwang122.github.io/projects/CNewSum/">Project
              URL</a>] </li>
<li>MLGSum: a multilingual text summarization corpus with 1.2
              million articles in 12 languages. The average article length is
              570 words. [<a href="https://dqwang122.github.io/projects/CALMS/">Project
              URL</a>] [<a href="https://drive.google.com/file/d/1i9xfOkQ60kixj0rZ-kCo8UCo2fZ51fCY/view?usp=sharing">Data</a>]</li>
</ul>
</div>
</section>
<section class="highlight-section section">
<div class="wrap">
<div class="highlight-heading">
<h1>Past Projects</h1>
</div>
<h2>Probabilistic programming languages and Bayesian inference</h2>
<ul>
<li> <a href="http://bayesianlogic.cs.berkeley.edu/">Bayesian Logic
(BLOG)</a> and its inference system. </li>
</ul>
<h2>Time series learning</h2>
<ul>
<li> Modeling, summarization, clustering, imputation, and
              forecasting for multiple co-evolving time series, with or without
              missing values (a minimal Kalman-filter sketch follows this
              list). [<a href="pubs/li2009dynammo.pdf">DynaMMo
              paper</a>] [<a href="pubs/cao2018brits.pdf">BRITS neural-based
              approach</a>] </li>
<li>Human motion and motion capture analysis: <a href="mocap.stitch/index.html">natural
              motion stitching</a>, <a href="pubs/li2011time.pdf">motion
              clustering</a>. </li>
<li>Data center monitoring: forecasting the temperature distribution
              across servers using approximate thermodynamics, in order to
              control cooling with minimal energy consumption in data
              centers. [ThermoCast paper]</li>
<li> <a href="mlds/index.html">Tensor time series (MLDS)</a> </li>
</ul>
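<p>For reference, the forward recursion of the linear dynamical system
            underlying this line of work fits in a short numpy snippet. DynaMMo
            learns the parameters with EM; the numbers below are made up.</p>
          <pre><code># One Kalman-filter step for a linear dynamical system: predict the
# hidden state, then correct it with the incoming observation.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])  # state transition
C = np.array([[1.0, 0.0]])              # observation matrix
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.1]])                   # observation noise covariance

def kalman_step(mu, Sigma, y):
    # Predict.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update with observation y.
    S = C @ Sigma_pred @ C.T + R             # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    Sigma_new = (np.eye(2) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new

mu, Sigma = np.zeros(2), np.eye(2)
for y in [np.array([0.5]), np.array([0.4]), np.array([0.7])]:
    mu, Sigma = kalman_step(mu, Sigma, y)
print(mu)  # filtered state estimate after three observations</code></pre>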
<h2>Parallel Learning for Sequential Models</h2>
<ul>
<li> <a href="paralearn/index.html"> Parallel algorithms for
graphical model on multicore </a> : (finished) </li>
</ul>
<h2>Network analysis</h2>
<ul>
<li>Social network and social media analysis</li>
<li> <a href="http://www.db.cs.cmu.edu/db-site/Projects/cdem">CDEM</a>
:fly embryo gene pattern mining. (finished) </li>
</ul>
</div>
</section>
</main>
</div>
</body>
</html>