
Publicly Visible Simulation Results Archive for Benchmark Datasets from Synergy #1448

Open
rohitgarud opened this issue May 23, 2023 · 2 comments


@rohitgarud
Contributor

Feature Request

Is your feature request related to a problem? Please describe.
Researchers repeatedly run the same simulations on the benchmark datasets with the available models, wasting time and compute. These simulations play an important role in choosing models for their own datasets, whether those come from the same domains as the benchmark datasets or from different ones.

Describe the solution you'd like
An archive of simulation results (perhaps a GitHub repository or a page in the documentation) would make it easy to review the performance of different models on the benchmark datasets. This would support decision-making and spare new researchers who want to use ASReview on their own datasets from rerunning simulations on the benchmarks.

Teachability, Documentation, Adoption, Migration Strategy
I think a filterable table is the ideal way to present the simulation results. It should contain fields for the model configuration (feature extractor, classifier, balancer, and query strategy) and for the results (recall at different levels, WSS, ERF, and ATD), along with dataset information such as the dataset name, topic(s), number of records, and the number and percentage of included records. Other useful details could include who performed the simulation, the random seed, and the time required (with information about the hardware used). Adding the recall plots would be a plus.
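A minimal sketch of what one archived record could look like, in Python. All field names here are illustrative assumptions on my part, not a committed schema:

```python
from dataclasses import dataclass, field


@dataclass
class SimulationRecord:
    """One archived simulation run on a Synergy benchmark dataset.

    Every field name below is illustrative; the real schema would be
    whatever the archive project settles on.
    """

    # Dataset information
    dataset: str                          # e.g. "van_de_Schoot_2017"
    topics: list[str] = field(default_factory=list)
    n_records: int = 0
    n_included: int = 0                   # number of relevant records

    # Model configuration
    feature_extractor: str = ""           # e.g. "tfidf"
    classifier: str = ""                  # e.g. "nb"
    balancer: str = ""                    # e.g. "double"
    query_strategy: str = ""              # e.g. "max"

    # Results
    recall_at: dict[int, float] = field(default_factory=dict)  # recall@k
    wss_95: float = 0.0                   # Work Saved over Sampling at 95% recall
    erf: float = 0.0                      # Extra Relevant records Found
    atd: float = 0.0                      # Average Time to Discovery

    # Provenance
    performed_by: str = ""
    random_seed: int | None = None
    runtime_seconds: float = 0.0
    hardware: str = ""

    @property
    def pct_included(self) -> float:
        """Percentage of included (relevant) records."""
        return 100 * self.n_included / self.n_records if self.n_records else 0.0
```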

Such a table would let researchers quickly see which models to try first for their own simulations, depending on their domain and on factors such as the number of records and the expected number of relevant records.
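As a sketch of the "filterable table" idea: assuming the archive were also exported as a flat CSV with columns matching the record above (an assumption, not an existing file), a researcher could narrow it down to datasets resembling their own like this:

```python
import pandas as pd

# Hypothetical flat export of the archive; the filename and column
# names are assumptions based on the fields proposed above.
results = pd.read_csv("synergy_simulation_results.csv")

# Keep runs on datasets roughly the size of the user's own review,
# with a similarly low prevalence of relevant records.
similar = results[
    (results["n_records"].between(2_000, 10_000))
    & (results["pct_included"] < 5.0)
]

# Rank model configurations by average WSS@95 across those datasets.
ranking = (
    similar.groupby(["feature_extractor", "classifier", "query_strategy"])
    ["wss_95"]
    .mean()
    .sort_values(ascending=False)
)
print(ranking.head(10))
```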

@jteijema
Member

Hi @rohitgarud. We've been playing with this idea for a while and would love your input. Let's set this up as a collaboration!

@rohitgarud
Contributor Author

Great! How are you planning to develop the platform? We can discuss further details in the meeting.
