Manipulate and save a DocumentSet object after loading. #60

Open
rjavierch opened this issue Sep 11, 2023 · 4 comments

@rjavierch
Contributor

rjavierch commented Sep 11, 2023

Hello, I am wondering whether it is possible to manipulate (as with a pandas table) and save a loaded DocumentSet, such as one loaded from .bib or ieee_csv. I would also like to manipulate and save the data after a refinement (for example, using refine_scopus).

Thank you!

@stijnh
Member

stijnh commented Sep 12, 2023

Hi

> I am wondering how possible is to manipulate (like in a pandas table)

Manipulating the documents themselves is not possible. You can, however, manipulate a DocumentSet, which contains a list of documents, by, for example, calculating the intersection, union, or difference between sets (see DocumentSet).
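
For readers unfamiliar with the set operations mentioned above, here is a minimal sketch of what intersection, union, and difference mean. It uses plain Python sets of made-up titles as stand-ins for document sets. This is an illustration only and not the litstudy API; the DocumentSet documentation describes the actual methods.

```python
# Toy stand-in: each "document set" is a plain Python set of titles.
# This is NOT the litstudy API; it only illustrates the set semantics.
from_bib = {"Paper A", "Paper B", "Paper C"}    # hypothetical titles
from_ieee = {"Paper B", "Paper C", "Paper D"}   # hypothetical titles

in_both = from_bib & from_ieee     # intersection: found in both sources
in_either = from_bib | from_ieee   # union: found in at least one source
only_bib = from_bib - from_ieee    # difference: found only in the first source

print(sorted(in_both))
print(sorted(in_either))
print(sorted(only_bib))
```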

> and save a loaded DocumentSet such as .bib, ieee_csv.

Saving a document set is not possible, but it is a highly requested feature. There are open issues for saving a document set as a BibTeX file or RIS file.

If you are interested in looking into these, we welcome all relevant pull requests!

@FlashFFF

FlashFFF commented Oct 19, 2023

I was looking for this as well. A possible workaround might be to export the document set to a CSV file and reimport it later if needed.
Or is there another way to avoid losing my progress every time I shut down my machine? I mean, there must be a database saved somewhere, or is all this data sitting in memory?
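
A rough sketch of the CSV round trip described above, using only the standard library. The rows are plain dictionaries with made-up values standing in for documents, and the column names in FIELDS are an assumption; how to pull these fields out of a real DocumentSet is not shown here.

```python
import csv
import os
import tempfile

# Hypothetical column choice; adapt to whatever fields you want to keep.
FIELDS = ["title", "year", "authors"]


def save_csv(rows, path):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_csv(path):
    """Read the CSV back as a list of dicts (all values come back as strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Stand-in data (made up), not real litstudy documents.
rows = [
    {"title": "Paper A", "year": 2021, "authors": "Doe, J.; Roe, R."},
    {"title": "Paper B", "year": 2019, "authors": "Poe, E."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "documents.csv")
    save_csv(rows, path)
    loaded = load_csv(path)

print(loaded)
```

Note that CSV stores everything as text, so numeric fields such as the year come back as strings and need converting after loading.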

@FlashFFF

Which fields are fetched from the API on refine? Are they all the fields of the class litstudy.types.Document?

@okoknik

okoknik commented Oct 26, 2023

> I was looking for this as well. A possible workaround might be to export the document set to a CSV file and reimport it later if needed. Or is there another way to avoid losing my progress every time I shut down my machine? I mean, there must be a database saved somewhere, or is all this data sitting in memory?

Alternatively, you could pickle the document set, which takes less space than a CSV. You can then reload it whenever you want to perform further analysis on the set. Just use these code snippets:

to save:
    import pickle
    # `data` is the document set you want to store
    with open("data.pickle", "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
to load:
    import pickle
    with open("data.pickle", "rb") as f:
        data = pickle.load(f)
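
The same save and load steps can be wrapped in two small helpers and checked with a round trip. This is a sketch under assumptions: the helper names are made up, and the list of dictionaries is stand-in data, where a real session would pass its document set instead.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path using the highest available pickle protocol."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load a pickled object. Only unpickle files that you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in data (an assumption); a real session would pass its document set.
data = [{"title": "Paper A", "year": 2021}, {"title": "Paper B", "year": 2019}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")
    save_pickle(data, path)
    restored = load_pickle(path)

print(restored == data)
```

Two caveats worth keeping in mind: unpickling runs arbitrary code, so only load pickle files you created yourself, and a pickle of a library object may fail to load after that library is upgraded and its classes change.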
