StackSearch

A Recommendation System in Software Engineering (RSSE) built on crowd-sourced data from Stack Overflow.

It uses word embedding models trained on a custom corpus to recommend code snippets and/or Stack Overflow posts in response to user queries.

This project focuses on an RSSE for Java queries and makes use of the Stack Overflow data dump.
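
To make the idea of embedding-based recommendation concrete, here is a minimal sketch, not the StackSearch implementation. It assumes that a query and each post title are represented by the average of their word vectors and compared by cosine similarity. The tiny vectors in `TOY_VECTORS` and the helper names `embed`, `cosine` and `search` are hypothetical; a trained model would supply the real vectors.

```python
import math

# Made-up 3-dimensional "word vectors" for illustration only. A trained
# embedding model would provide learned vectors with many more dimensions.
TOY_VECTORS = {
    "md5": [0.9, 0.1, 0.0],
    "checksum": [0.8, 0.2, 0.1],
    "hash": [0.85, 0.15, 0.05],
    "read": [0.1, 0.9, 0.0],
    "file": [0.2, 0.8, 0.1],
    "sort": [0.0, 0.1, 0.9],
    "list": [0.1, 0.2, 0.8],
}


def embed(text):
    """Average the vectors of the known words in text (None if there are none)."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, titles, top_k=2):
    """Return the top_k titles most similar to the query."""
    query_vec = embed(query)
    if query_vec is None:
        return []
    scored = []
    for title in titles:
        title_vec = embed(title)
        if title_vec is not None:
            scored.append((cosine(query_vec, title_vec), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


titles = ["md5 hash", "read file", "sort list"]
print(search("checksum", titles, top_k=1))  # prints ['md5 hash']
```

Note that the query word "checksum" does not appear in any title; the match comes from the vectors being close, which is the benefit embeddings offer over exact keyword matching.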

Environment Setup

Clone the StackSearch repository and run make setup, which creates the virtual environment and installs the dependencies.

git clone git@github.com:nikosoik/stacksearch.git ${HOME}/stacksearch && cd ${HOME}/stacksearch/src && make setup

Source the activate file to enter the Python virtual environment.
```
source ${HOME}/.stacksearch/bin/activate
```
Trained word vector models, metadata and indices can be found here.

Web App Usage

Make sure the word vector models, metadata and indices are placed in the proper directories.
Run the Flask web app.
```
python3 web_app.py
```
Open http://localhost:5000/ in a browser.
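
Before starting the app, it can help to confirm that the model files are in place. The following is a small sketch of such a check. The file paths are copied from the CLI example further down in this README; that the web app expects the same files in the same layout is an assumption, and `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Paths taken from the CLI example in this README. Whether the web app
# reads exactly these files is an assumption; adjust to your setup.
EXPECTED_FILES = [
    "wordvec_models/fasttext_archive/ft_v0.6.1.bin",
    "wordvec_models/tfidf_archive/tfidf_v0.3.pkl",
    "wordvec_models/index/ft_v0.6.1_post_index.pkl",
    "wordvec_models/index/tfidf_v0.3_post_index.pkl",
    "wordvec_models/index/extended_metadata.pkl",
]


def missing_files(base_dir, relative_paths=EXPECTED_FILES):
    """Return the relative paths that do not exist as files under base_dir."""
    base = Path(base_dir)
    return [p for p in relative_paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files are present.")
```

Run it from the directory that contains wordvec_models; any path it lists still needs to be downloaded or moved.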

Preview

A screenshot of the web app is available as webapp_preview.png in the repository root.

CLI Usage

usage: demo.py [-h] {fasttext,tfidf,hybrid} ...

StackSearch Demo

positional arguments:
  {fasttext,tfidf,hybrid}
    fasttext            Use a FastText model for searching.
    tfidf               Use a TF-IDF model for searching.
    hybrid              Use a Hybrid model (FastText & TF-IDF) for searching.

optional arguments:
  -h, --help            show this help message and exit

Search model 'fasttext'
usage: demo.py fasttext [-h] MODEL INDEX METADATA RESULTS

positional arguments:
  MODEL       Path to the FastText model.
  INDEX       Path to the FastText search index.
  METADATA    Path to the metadata index.
  RESULTS     Number of results for each query.

optional arguments:
  -h, --help  show this help message and exit


Search model 'tfidf'
usage: demo.py tfidf [-h] MODEL INDEX METADATA RESULTS

positional arguments:
  MODEL       Path to the TF-IDF model.
  INDEX       Path to the TF-IDF search index.
  METADATA    Path to the metadata index.
  RESULTS     Number of results for each query.

optional arguments:
  -h, --help  show this help message and exit


Search model 'hybrid'
usage: demo.py hybrid [-h]
                      FASTTEXT MODEL TFIDF MODEL FASTTEXT INDEX TFIDF INDEX
                      METADATA RESULTS

positional arguments:
  FASTTEXT MODEL  Path to the FastText model.
  TFIDF MODEL     Path to the TF-IDF model.
  FASTTEXT INDEX  Path to the FastText search index.
  TFIDF INDEX     Path to the TF-IDF search index.
  METADATA        Path to the metadata index.
  RESULTS         Number of results for each query.

optional arguments:
  -h, --help      show this help message and exit
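
For readers who want to script against this interface, the help text above can be reproduced with argparse subcommands. The sketch below is a reconstruction from that help output, not the actual demo.py source: the destination names (such as `fasttext_model`), the `search_model` attribute, and parsing RESULTS as an integer are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the help output shown in this README."""
    parser = argparse.ArgumentParser(prog="demo.py", description="StackSearch Demo")
    subparsers = parser.add_subparsers(dest="search_model", required=True)

    # fasttext and tfidf share the same four positional arguments.
    for name, label in (("fasttext", "FastText"), ("tfidf", "TF-IDF")):
        sub = subparsers.add_parser(name, help="Use a %s model for searching." % label)
        sub.add_argument("model", metavar="MODEL", help="Path to the %s model." % label)
        sub.add_argument("index", metavar="INDEX", help="Path to the %s search index." % label)
        sub.add_argument("metadata", metavar="METADATA", help="Path to the metadata index.")
        # Assumption: RESULTS is converted to an integer.
        sub.add_argument("results", metavar="RESULTS", type=int,
                         help="Number of results for each query.")

    hybrid = subparsers.add_parser(
        "hybrid", help="Use a Hybrid model (FastText & TF-IDF) for searching.")
    hybrid.add_argument("fasttext_model", metavar="FASTTEXT MODEL",
                        help="Path to the FastText model.")
    hybrid.add_argument("tfidf_model", metavar="TFIDF MODEL",
                        help="Path to the TF-IDF model.")
    hybrid.add_argument("fasttext_index", metavar="FASTTEXT INDEX",
                        help="Path to the FastText search index.")
    hybrid.add_argument("tfidf_index", metavar="TFIDF INDEX",
                        help="Path to the TF-IDF search index.")
    hybrid.add_argument("metadata", metavar="METADATA", help="Path to the metadata index.")
    hybrid.add_argument("results", metavar="RESULTS", type=int,
                        help="Number of results for each query.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["fasttext", "model.bin", "index.pkl", "meta.pkl", "20"])
print(args.search_model, args.results)  # prints: fasttext 20
```

The hybrid subcommand takes both models first and both indices second, so the argument order matters when building the command line.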

Example

./demo.py hybrid wordvec_models/fasttext_archive/ft_v0.6.1.bin wordvec_models/tfidf_archive/tfidf_v0.3.pkl wordvec_models/index/ft_v0.6.1_post_index.pkl wordvec_models/index/tfidf_v0.3_post_index.pkl wordvec_models/index/extended_metadata.pkl 20

Preview

Hybrid model

FastText model ft_v0.6.1.bin loaded.
TF-IDF model tfidf_v0.3.pkl loaded.
Index keys used: TitleV, BodyV

Query [query + enter], quit ['q' + enter]: How to calculate md5 checksums?
Tags (e.g. java, android): md5

1/20
################################# CODE #################################

DigestUtils.md5Hex(str);

################################# CODE #################################

Title: Java calculate MD5 hash
Post: https://stackoverflow.com/questions/7776116
Answer: https://stackoverflow.com/questions/7776244

Answer score: 38
Snippets for this post: 3
Top 8 tags for this query: java, md5, messagedigest, android, spring, checksum, md5sum, hashcode

Next code snippet [enter], new query ['q' + enter]:
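
The session above uses the hybrid model together with a tag filter. This README does not describe how the two models' scores are combined or how tags are applied, so the following is only one plausible sketch: it assumes each model yields a similarity per post, mixes them with a weighted average, and drops posts that share no tag with the query. The function `hybrid_rank`, its `weight` parameter, and the sample post ids and scores are all hypothetical.

```python
def hybrid_rank(fasttext_scores, tfidf_scores, post_tags,
                query_tags=(), weight=0.5, top_k=20):
    """Rank post ids by a weighted mix of two similarity scores.

    fasttext_scores, tfidf_scores: dict mapping post id -> similarity
    post_tags: dict mapping post id -> set of tags
    query_tags: tags given by the user; posts sharing none are dropped
    weight: share of the FastText score in the mix (assumed, not from the source)
    """
    wanted = set(query_tags)
    ranked = []
    # Only posts scored by both models can receive a combined score.
    for post_id in fasttext_scores.keys() & tfidf_scores.keys():
        if wanted and not (wanted & post_tags.get(post_id, set())):
            continue
        score = weight * fasttext_scores[post_id] + (1 - weight) * tfidf_scores[post_id]
        ranked.append((score, post_id))
    # Highest score first; ties broken by post id for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for _, post_id in ranked[:top_k]]


# Made-up scores and tags for three hypothetical posts.
ft = {1: 0.9, 2: 0.6, 3: 0.8}
tfidf = {1: 0.7, 2: 0.9, 3: 0.2}
tags = {1: {"java", "md5"}, 2: {"java", "sorting"}, 3: {"md5", "android"}}

print(hybrid_rank(ft, tfidf, tags, query_tags=["md5"]))  # prints [1, 3]
print(hybrid_rank(ft, tfidf, tags))                      # prints [1, 2, 3]
```

With the md5 tag, post 2 is filtered out even though its combined score is higher than that of post 3, which illustrates how a tag filter can change the result list independently of the similarity scores.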

Citation

Please cite our work

@InProceedings{10.1007/978-3-030-45234-6_6,
  url={https://doi.org/10.1007/978-3-030-45234-6_6},
  title={Extracting Semantics from Question-Answering Services for Snippet Reuse},
  author={Themistoklis Diamantopoulos and Nikolaos Oikonomou and Andreas Symeonidis},
  booktitle={Fundamental Approaches to Software Engineering},
  pages={119--139},
  year={2020},
  publisher={Springer International Publishing}
}
