zihao-ai/AI-Paper-Collector

AI-Paper-Collector

Web demo:

Colab notebook: here

Motivation

Fully automated scripts for collecting AI-related papers, with fuzzy, exact, and boolean search over paper titles.

Search Categories

- [EMNLP 2019-2021] [ACL 2019-2022] [NAACL 2019-2022] [COLING 2020-2022] 
- [ICLR 2014-2022] [ICML 2014-2022] [CCS 2016-2022] [USS 2016-2022] 
- [EUROSP 2016-2022] [AAAI 2014-2022] [IJCAI 2014-2022] [CVPR 2014-2022] 
- [ICCV 2013-2021] [MM 2014-2022] [KDD 2014-2022] [CIKM 2019-2022] 
- [WSDM 2019-2022] [ECCV 2020-2022] [COLT 2016-2022] [AISTATS 2017-2022] 
- [JMLR 2014-2022] [TDSC 2016-2022] [TIP 2016-2022] [TPAMI 2014-2022] 
- [TIFS 2016-2022] [ICDM 2019-2022] [ICASSP 2014-2022] [NDSS 2016-2021] 
- [WWW 2014-2022] [NIPS 2016-2022] [WACV 2020-2022] 

Installation

Currently, the only way to install is to clone this repo:

git clone https://github.com/MLNLP-World/AI-Paper-Collector.git
cd AI-Paper-Collector
pip install -r requirements.txt

Usage (v0.1.0)

We provide three usage modes: interactive (main.py), command-line (cli_main.py), and a web interface (app.py). The interactive mode is recommended for first-time users.

Interactive Usage with Example

To start the interactive mode, run:

python main.py

Initializing the indices and posting lists may take several seconds the first time you run this script. After that, they are stored in ./output/ and do not need to be initialized again. When ./cache/cache.json is updated, you should manually delete ./output/postings.pkl and run main.py again.
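The build-once, reuse-afterwards behaviour can be sketched as follows. This is a minimal illustration, not the project's code: the `tokenize`, `build_postings`, and `load_or_build` helpers and the structure of the pickled object are assumptions made for the example.

```python
import pickle
import re
from collections import defaultdict
from pathlib import Path


def tokenize(title):
    """Lowercase a title and split it into alphanumeric (optionally hyphenated) tokens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())


def build_postings(titles):
    """Map each token to the sorted list of title indices that contain it."""
    postings = defaultdict(set)
    for idx, title in enumerate(titles):
        for token in tokenize(title):
            postings[token].add(idx)
    return {token: sorted(ids) for token, ids in postings.items()}


def load_or_build(titles, path):
    """Reuse the pickled postings if the file exists; otherwise build and store them."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    postings = build_postings(titles)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(postings, fh)
    return postings
```

In this sketch the pickled file is reused whenever it exists, even if the titles have changed since it was written, which is why a stale postings file has to be deleted by hand after the cache is updated.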

The interactive search prompts for the following, in order:

  1. the keyword query
  2. search mode (exact, fuzzy or boolean)
  3. (fuzzy) threshold
  4. the limit of results
  5. a list of conferences, separated by comma
  6. the output file path (the top 5 results are previewed in the terminal; all results are written to this file)

E.g.

[+] Initializing System...
[+] Loading from cache...
[+] Enter your query: few-shot

[+] Select search mode:
	[1] Exact
	[2] Fuzzy
	[3] Boolean
[+] Enter a number between 1 to 3: 2
[+] Enter threshold between 0 and 100 (default: 50):
[+] Enter limit >= 0 (default: None):
[+] Enter the list of confs separated by comma
	E.g. "ACL,CVPR" or "AAAI" or enter nothing for all confs
[+] Enter your list of conferences (default: All Confs): SIGIR,WSDM,CIKM

[+] Search Results:
[=] Only show Top-5, Please Save results to see all.
[1] [CIKM2021] REFORM: Error-Aware Few-Shot Knowledge Graph Completion.
[2] [CIKM2021] Boosting Few-shot Abstractive Summarization with Auxiliary Tasks.
[3] [CIKM2021] Multi-objective Few-shot Learning for Fair Classification.
[4] [CIKM2020] Graph Few-shot Learning with Attribute Matching.
[5] [CIKM2020] Few-shot Insider Threat Detection.

[+] Enter Save filename:
[+] Writing results to output/fuzzy_None_SIGIR_WSDM_CIKM_few-shot.txt
[+] Writing results Done!
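To illustrate how the threshold and limit prompts in the session above interact, here is a minimal sketch of a thresholded fuzzy search. It is an assumption-laden stand-in: the scorer uses `difflib.SequenceMatcher` from the standard library rather than whatever matcher the project uses, and `fuzzy_score` and `fuzzy_search` are hypothetical names.

```python
from difflib import SequenceMatcher


def fuzzy_score(query, title):
    """Return a 0-100 similarity between the query and the best-matching window of title words."""
    q = query.lower()
    words = title.lower().split()
    n = max(1, len(q.split()))
    # Slide a window of as many words as the query has over the title.
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return max(round(100 * SequenceMatcher(None, q, w).ratio()) for w in windows)


def fuzzy_search(query, titles, threshold=50, limit=None):
    """Keep titles scoring at least `threshold`, best first, truncated to `limit` if given."""
    scored = [(fuzzy_score(query, t), t) for t in titles]
    hits = [(s, t) for s, t in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits if limit is None else hits[:limit]


titles = [
    "Few-shot Insider Threat Detection.",
    "Graph Neural Networks.",
]
results = fuzzy_search("few-shot", titles, threshold=90, limit=5)
```

A higher threshold keeps only close matches, while the limit caps how many of the surviving matches are returned.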

Boolean Query Rules:

For boolean search, you can use the standard boolean expressions with [AND, OR, NOT] and brackets. For example, you can write your queries like:

  1. language AND generation AND (pre-train OR pretrain)
  2. (dialogue OR dialog) AND generation AND NOT (response AND selection)
  3. toxic AND (dialogue OR conversation OR dialog)

Note that to search for a phrase (e.g. contrastive learning), you should type contrastive AND learning rather than separating the words with a space, as in contrastive learning.

Boolean queries let you search for exactly the keywords you are interested in. They also make it easy to include near-synonyms (like dialog, dialogue and conversation) and to exclude words you are not interested in (as in the second example).
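The query rules can be illustrated with a small recursive-descent evaluator. This is a sketch under assumptions, not the project's implementation: `matches` is a hypothetical helper, it tests whole lowercase title tokens, and it gives NOT the highest precedence, then AND, then OR.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def matches(query, title):
    """Evaluate a boolean query (AND, OR, NOT, parentheses) against one title."""
    words = set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower()))
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            advance()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = peek()
        if token is None or token in {")", "AND", "OR"}:
            raise ValueError(f"unexpected token: {token!r}")
        advance()
        if token == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        return token.lower() in words

    result = parse_or()
    if pos != len(tokens):
        # Two keywords with no operator between them end up here.
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


query = "(dialogue OR dialog) AND generation AND NOT (response AND selection)"
keep = matches(query, "Dialogue Generation with Persona")
drop = matches(query, "Response Selection for Dialogue Generation")
```

In this sketch a query such as contrastive learning, with no operator between the words, is rejected with a ValueError; how the actual tool reacts to such a query is not specified here.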

Command-line Usage

For command-line usage, you can use the following commands:

# -q, --query:     the input query; wrap multi-word queries in quotation marks
# -m, --mode:      the search mode, fuzzy or exact; default is exact
# -t, --threshold: the threshold for the fuzzy search; default is 50
# -l, --limit:     the maximum number of fuzzy search results; default is None
# -c, --conf:      the list of conferences to search; default is all
# -o, --output:    the output file name; default is [mode]_[threshold]_[confs]_[query].txt
# -f, --force:     force an incremental update of the cache file
python cli_main.py --query QUERY \
    [--mode {fuzzy,exact}] \
    [--threshold THRESHOLD] [--limit LIMIT] [--conf CONF] \
    [--output OUTPUT] [--force]

E.g.

# Note that the input query must be enclosed in `""`, such as "few shot".
python cli_main.py -q "few shot" -m fuzzy -l 10 -t 10 -c AAAI,ACL -o results.txt
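For readers who want to see how the flags above fit together, the following sketch rebuilds an equivalent argument parser with `argparse`. It mirrors only the documented options and defaults; it is not the parser in cli_main.py, and the comma-splitting of `--conf` is an assumption.

```python
import argparse


def split_confs(value):
    """Turn "AAAI,ACL" into ["AAAI", "ACL"], ignoring empty items."""
    return [c.strip() for c in value.split(",") if c.strip()]


def build_parser():
    """Build a parser with the flags documented above (a reconstruction, not the original)."""
    parser = argparse.ArgumentParser(prog="cli_main.py")
    parser.add_argument("-q", "--query", required=True)
    parser.add_argument("-m", "--mode", choices=["fuzzy", "exact"], default="exact")
    parser.add_argument("-t", "--threshold", type=int, default=50)
    parser.add_argument("-l", "--limit", type=int, default=None)
    parser.add_argument("-c", "--conf", type=split_confs, default=None)  # None means all conferences
    parser.add_argument("-o", "--output", default=None)
    parser.add_argument("-f", "--force", action="store_true")
    return parser


# The same arguments as the example command above, passed as a list.
args = build_parser().parse_args(
    ["-q", "few shot", "-m", "fuzzy", "-l", "10", "-t", "10", "-c", "AAAI,ACL", "-o", "results.txt"]
)
```

Because the shell splits arguments on spaces, a multi-word query only reaches the parser as a single value when it is quoted.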

Web Interface Usage

For web interface usage, you can use the following commands:

pip install -r requirements.txt
python app.py

Then open the following URL: http://localhost:5000

How to add new conferences from DBLP

Automatically Updating via an issue-triggered workflow

To add a new list of conferences, please raise an issue following the format of this one. We will check and label it, and the workflow will then run automatically.

For users who cloned the project

  • Add new conferences by editing the conf/dblp_conf.json file:
[
    # add the name and DBLP url of the new conf
    {
        "name": "WWW2021",
        "url": "https://dblp.org/db/conf/www/www2021.html"
    },
    ...
]
  • Run the script:
# force an incremental update of the cache file
python cli_main.py --query '' --force
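The edit to conf/dblp_conf.json can also be scripted. The sketch below is a hypothetical helper, not part of the project; it assumes the file is a plain JSON list of objects with "name" and "url" keys and contains no comments, since the standard `json` module rejects them.

```python
import json
from pathlib import Path


def add_conference(config_path, name, url):
    """Append a {"name", "url"} entry unless the name is already listed."""
    path = Path(config_path)
    entries = json.loads(path.read_text(encoding="utf-8"))
    if any(entry["name"] == name for entry in entries):
        return False  # already present, leave the file untouched
    entries.append({"name": name, "url": url})
    path.write_text(json.dumps(entries, indent=4) + "\n", encoding="utf-8")
    return True


# Example call (run from the repository root):
# add_conference("conf/dblp_conf.json", "WWW2021",
#                "https://dblp.org/db/conf/www/www2021.html")
```

After adding an entry this way, the forced cache update shown above is still needed for the new conference to become searchable.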

Disclaimer

Since the tool is still under development, we cannot guarantee that the papers found will meet your needs; we appreciate your understanding. In addition, all results come from DBLP, ACL, NIPS, and OpenReview. If this violates your copyright, you can contact us at any time and we will delete the content as soon as possible. Thank you :)

Organizers

Contributors

Thanks to the contributors: