This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

LightLDA

hahahu edited this page Aug 3, 2017 · 3 revisions

This document shows how to install and use LightLDA.

Downloading

git clone --recursive https://github.com/Microsoft/lightlda

Installation

  • Windows

Open windows/LightLDA.sln using Visual Studio 2013 and build all the projects.

  • Linux (Tested on Ubuntu 14.04)

Run $ sh ./build.sh to install the program.

Running LightLDA

We provide some quick guidelines below for your reference. For more detailed instructions on command line arguments, run $ ./lightlda --help

  • Preprocess

    LightLDA takes a specific binary format as its input. To run LightLDA, you should first convert your own data to this format. A tool is provided to convert LibSVM-format data to LightLDA-format data, so for simplicity you can prepare your dataset in LibSVM format first. The following steps assume your dataset is in LibSVM format.

    1. Count the dataset meta information: ./example/get_meta.py input.libsvm output.word_tf_file
    2. Split your LibSVM data into several parts.
    3. Convert each part from LibSVM format to the binary format used by LightLDA: ./bin/dump_binary input.libsvm.part_id input.word_tf_file part_id (the word_tf file passed here is the one produced in step 1).
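No tool is named for step 2, so here is a minimal Python sketch of one way to do the split. It is not part of LightLDA: the function name split_libsvm and the round-robin policy are assumptions made for illustration. The only thing taken from the steps above is the output naming, input.libsvm.part_id, which matches the argument that dump_binary expects in step 3.

```python
from pathlib import Path


def split_libsvm(src, num_parts):
    """Split a LibSVM file into num_parts files named <src>.<part_id>.

    Hypothetical helper (not shipped with LightLDA). Documents (lines) are
    dealt out round-robin so every part gets a similar number of documents.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    src = Path(src)
    part_paths = [Path(f"{src}.{part_id}") for part_id in range(num_parts)]
    outputs = [p.open("w", encoding="utf-8") for p in part_paths]
    try:
        doc_index = 0
        with src.open(encoding="utf-8") as infile:
            for line in infile:
                line = line.rstrip("\n")
                if not line.strip():
                    continue  # skip blank lines, they carry no document
                outputs[doc_index % num_parts].write(line + "\n")
                doc_index += 1
    finally:
        for out in outputs:
            out.close()
    return part_paths
```

For example, split_libsvm("input.libsvm", 4) writes input.libsvm.0 through input.libsvm.3 next to the original file; each of those can then be passed to dump_binary with its own part_id.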
  • Training on single machine

    We provide examples to illustrate how to use LightLDA to train topic models on a single machine. For a quick start, run $ ./example/nytimes.ps1 in PowerShell (Windows) or $ ./example/nytimes.sh in Bash (Linux).

  • Training in a distributed setting with MPI

    Running MPI-based distributed LightLDA is quite similar to the single-machine setting: prepare a machine list file and launch LightLDA through mpiexec. Run $ mpiexec -machinefile machine_list lightlda lda_arguments, where lda_arguments stands for the LightLDA command line arguments.
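As a rough illustration of the MPI launch, the Python sketch below writes a machine list file and assembles the mpiexec command line shown above. The helper names (write_machine_list, mpiexec_command) and the host names are hypothetical, and the one-host-per-line file layout is an assumption based on common MPI implementations; check your MPI distribution's documentation for the exact format it accepts. The LightLDA arguments themselves are left to the caller, since they depend on your dataset.

```python
from pathlib import Path


def write_machine_list(hosts, path="machine_list"):
    """Write one host name per line (assumed machine file layout)."""
    path = Path(path)
    path.write_text("".join(host + "\n" for host in hosts), encoding="utf-8")
    return path


def mpiexec_command(machine_list, lda_arguments, binary="lightlda"):
    """Build the argument list for:
    mpiexec -machinefile <machine_list> <binary> <lda_arguments...>
    """
    return ["mpiexec", "-machinefile", str(machine_list), binary, *lda_arguments]
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where mpiexec and lightlda are on the PATH; building the command as a list rather than a single string avoids shell quoting problems in the arguments.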