Scholarly Recommender

End-to-end product that sources recent academic publications and prepares a feed of recommended readings in seconds.
Try it now »

Explore the Docs · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Methods
  9. Acknowledgements

About The Project

As an aspiring data scientist with a strong passion for deep learning, I am always looking for new technologies and methodologies, and I spend a considerable amount of time reading new publications to keep up. However, over 14,000 academic papers are published every day on average, making it extremely tedious to continuously source papers relevant to my interests. My primary motivation for creating ScholarlyRecommender is to address this: a fully automated, personalized system that prepares a feed of academic papers relevant to me. The feed is prepared on demand through a completely abstracted Streamlit web interface, or sent directly to my email on a schedule. The project was designed to be scalable and adaptable: it can easily be tuned not only to your own interests, but can become a fully automated, self-improving newsletter. Details on how to use the system, the methods used for retrieval and ranking, and features planned or currently in development are listed below.

(back to top)

Built With

Python · Streamlit · Pandas · NumPy · arxiv

(back to top)

Getting Started

To try ScholarlyRecommender, you can use the Streamlit web application found here. It lets you use the system in its entirety without needing to install anything. If you want to modify the system internally or add functionality, follow the directions below to install it locally.

Prerequisites

To install this app locally, you need the following:

  • git
  • Python 3.9 or greater (earlier versions may work)
  • pip3

Installation

To install ScholarlyRecommender, run the following in your command-line shell:

  1. Clone the repository from GitHub and cd into it
    git clone https://github.com/iansnyder333/ScholarlyRecommender.git
    cd ScholarlyRecommender
  2. Set up the environment and install dependencies
    make build

All done: ScholarlyRecommender is now installed. You can run the app with

make run

(back to top)

Usage

Once installed, you will want to calibrate the system to your own interests. The easiest way to do this is through the webapp.py file. Alternatively, you can use calibrate.py, which runs in the console.

Make sure your working directory is the root folder of the cloned repo.

Then run the following in your terminal:

make run

This is the same as running:

streamlit run webapp.py

Navigate to the configure tab and complete the steps. You can then navigate back to the get recommendations tab and generate results! The web app offers full functionality and serves as an API to the system; while you use it, updates made to the configuration are applied and refreshed continuously.

Note: If you are using ScholarlyRecommender locally, certain features such as direct email will not work, because the original application's database will not be available. If you want to configure the email feature yourself, follow the instructions provided in mail.py; this requires some familiarity with SMTP. If you run into issues, feel free to check the docs or make a discussion post here, and someone will help you out.
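For reference, here is a minimal sketch of the kind of SMTP setup mail.py asks for, using only Python's standard library. The host, port, addresses, and credentials below are placeholders, not the project's actual configuration:

import smtplib
from email.message import EmailMessage

# Placeholder values: substitute your own SMTP host, credentials, and addresses.
msg = EmailMessage()
msg["Subject"] = "Your ScholarlyRecommender feed"
msg["From"] = "sender@example.com"
msg["To"] = "you@example.com"
msg.set_content("Here are this week's recommended papers...")

# Many providers (e.g. Gmail) require an app-specific password for SMTP logins.
with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
    server.login("sender@example.com", "app-password")
    server.send_message(msg)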

(back to top)

Roadmap

  • ✅ Email support on the web app
  • OS support, specifically for Windows.
  • Shell scripts to make installs, updates, and usage easier.
  • A database to store user configurations (currently zero user data is saved), along with improved data locality and caching for a better user experience.
  • Making it easier to give feedback on suggested papers to improve the system.
  • Improving the overall labeling experience, specifically making the pooling process and labeling setup more integrated.
  • Improving modularity in the web app and improving caching for faster performance.
  • Many visual and user-experience improvements; a complete overhaul of the UX is likely.
  • Allowing users to navigate between pages without using the navbar (Streamlit does not currently support this directly).

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the Apache License 2.0. See LICENSE for more information.

(back to top)

Contact

Ian Snyder - @iansnydes - idsnyder136@gmail.com

Project Email - scholarlyrecommender@gmail.com

My Website: iansnyder333.github.io/frontend/

Linkedin: www.linkedin.com/in/iandsnyder

(back to top)

Methods

Once candidates are sourced in the context of the configuration, they are ranked. The ranking process uses the normalized compression distance (NCD) between each candidate and the user's labeled papers, combined with an inverse-distance-weighted top-k mean rating. This is a modified version of the algorithm described in the paper "“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors" (1). For each candidate, the algorithm finds the k most similar labeled papers and computes a distance-weighted mean of their ratings as its prediction. The candidates are then sorted by predicted rating, and the highest-rated ones are returned in accordance with the desired amount.
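To make this concrete, here is a minimal sketch of such a ranker. It is not the repository's actual code: the function names (ncd, predict_rating, recommend) and the exact inverse-distance weighting are illustrative assumptions, but the compressor-based distance follows the cited paper:

import gzip

def ncd(x: str, y: str) -> float:
    # Normalized compression distance: small for similar texts, near 1 for unrelated ones.
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def predict_rating(candidate: str, labeled: list[tuple[str, float]], k: int = 5) -> float:
    # Take the k labeled papers most similar to the candidate...
    nearest = sorted((ncd(candidate, text), rating) for text, rating in labeled)[:k]
    # ...and average their ratings, weighting each one inversely by its distance.
    weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
    return sum(w * rating for w, (_, rating) in zip(weights, nearest)) / sum(weights)

def recommend(candidates: list[str], labeled: list[tuple[str, float]], n: int = 10) -> list[str]:
    # Sort candidates by predicted rating, highest first, and return the top n.
    return sorted(candidates, key=lambda c: predict_rating(c, labeled), reverse=True)[:n]

Because the only "model" here is a general-purpose compressor, there is nothing to train, which is the lightweight property discussed below.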

While using a large language model such as BERT might yield higher accuracy, this approach is considerably more lightweight, can run on essentially any computer, and requires virtually no labeled data to source relevant content. If this project scales to the capacity of a self-improving newsletter, implementing a sophisticated deep learning model such as a transformer could be a worthwhile addition.

(back to top)

Acknowledgements

1 - “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors (Jiang et al., Findings of ACL 2023)

README Template - Best-README-Template by othneildrew

(back to top)