DeepRank_PyTorch

A simple implementation of DeepRank and MatchPyramid in PyTorch.

Reference papers: DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval [https://arxiv.org/abs/1710.05649] and Text Matching as Image Recognition [https://arxiv.org/abs/1602.06359].

Quick Start

Toy Dataset Download

You can download data from [here].

$ tar xzvf data.tar.gz

Dataset Format

Word Dictionary File

(e.g. word_dict.txt)

We map each word to a unique number, called a wid, and save this mapping in the word dictionary file.

For example,

word   wid
machine 1232
learning 1156
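
As a rough illustration of the word dictionary format described above, the following sketch reads "word wid" lines into a Python dict. The function name load_word_dict is hypothetical and not part of this repository, and the sketch assumes whitespace-separated fields. It skips any line whose second field is not an integer, in case the file carries a header row.

```python
import io


def load_word_dict(lines):
    """Parse 'word wid' lines into a dict mapping word -> integer wid.

    Hypothetical helper for illustration; not part of the repository.
    """
    word_to_wid = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, wid = parts
        if not wid.isdigit():
            continue  # skip a possible header row such as "word wid"
        word_to_wid[word] = int(wid)
    return word_to_wid


# In practice `lines` would be an open file, e.g. open("word_dict.txt").
sample = io.StringIO("word   wid\nmachine 1232\nlearning 1156\n")
word_to_wid = load_word_dict(sample)
print(word_to_wid)
```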

Corpus File

(e.g. qid_query.txt and docid_doc.txt)

Each line represents one sentence, such as a query or a document. The first field is a string identifier (qid/docid), the second field is the length of the sentence, and the remaining fields are the wids of the words in the sentence.

For example,

docid  sentence_length  sentence_wid_sequence
GX000-00-0000000 42 2744 1043 377 2744 1043 377 187 117961 ...
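
One possible way to read a corpus line is sketched below. The name parse_corpus_line is hypothetical, the fields are assumed to be whitespace-separated, and the sample line is a shortened, made-up record (the example above is truncated, so it is not reused here). The sketch also checks that the declared length matches the number of wids.

```python
def parse_corpus_line(line):
    """Split one corpus line into (identifier, length, wids).

    Hypothetical helper for illustration; not part of the repository.
    """
    parts = line.split()
    identifier = parts[0]           # qid or docid, kept as a string
    length = int(parts[1])          # declared sentence length
    wids = [int(tok) for tok in parts[2:]]
    if length != len(wids):
        raise ValueError(
            f"{identifier}: declared length {length}, found {len(wids)} wids"
        )
    return identifier, length, wids


# Made-up three-word record in the same layout as the corpus files.
identifier, length, wids = parse_corpus_line("GX000-00-0000000 3 2744 1043 377")
print(identifier, length, wids)
```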

Relation File

(e.g. relation.train.fold1.txt, relation.test.fold1.txt ...)

The relation files store the relation between two sentences, such as the relevance label between a query and a document.

For example,

relevance   qid   docid
1 3571 GX245-00-1220850
0 3571 GX004-51-0504917
0 3571 GX006-36-4612449
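
For ranking, it is often convenient to group the relation records by query. The sketch below does this under the assumption that each line holds exactly three whitespace-separated fields in the order shown above. The name load_relations is hypothetical, and lines whose label is not an integer (for instance a header row) are skipped.

```python
from collections import defaultdict


def load_relations(lines):
    """Group 'relevance qid docid' lines as qid -> [(docid, relevance), ...].

    Hypothetical helper for illustration; not part of the repository.
    """
    by_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        label, qid, docid = parts
        try:
            relevance = int(label)
        except ValueError:
            continue  # skip a possible header row
        by_query[qid].append((docid, relevance))
    return dict(by_query)


sample_lines = [
    "1 3571 GX245-00-1220850",
    "0 3571 GX004-51-0504917",
    "0 3571 GX006-36-4612449",
]
relations = load_relations(sample_lines)
print(relations["3571"])
```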

Embedding File

(e.g. embed_wiki-pdc_d50_norm)

We store the word embeddings in the embedding file, one word per line: the wid followed by the components of its embedding vector.

For example,

wid   embedding
13275 -0.050766 0.081548 -0.031107 0.131772 0.172194 ... 0.165506 0.002235
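
A minimal sketch of reading the embedding file into memory follows. The name load_embeddings is hypothetical, the fields are assumed to be whitespace-separated, and the sample vector is cut to four components for brevity (the example above is elided, so its full vector is not reproduced). The resulting lists could later be copied into a PyTorch embedding matrix, but that step is left out here.

```python
def load_embeddings(lines):
    """Parse 'wid v1 v2 ...' lines into a dict mapping wid -> list of floats.

    Hypothetical helper for illustration; not part of the repository.
    """
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        try:
            wid = int(parts[0])
            vector = [float(x) for x in parts[1:]]
        except ValueError:
            continue  # skip a possible header row or elided line
        embeddings[wid] = vector
    return embeddings


# Shortened sample record: a wid followed by four made-up dimensions.
sample_lines = ["13275 -0.050766 0.081548 -0.031107 0.131772"]
embeddings = load_embeddings(sample_lines)
print(len(embeddings[13275]))
```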

Training & Evaluation

Run the following commands to train and evaluate the DeepRank and MatchPyramid models on the LETOR dataset:

$ jupyter notebook
$ # open sandbox-*.ipynb

Requirements

Python 3.6
TensorFlow 1.1.0
