Possibly creating feature vectors using full text of records #1455
rohitgarud started this conversation in Ideas
Replies: 1 comment 1 reply
-
This looks very promising. OpenAlex makes n-grams available, and I'm wondering whether we could use those in this feature extraction workflow as well, building on your idea.
-
Hi all,
AllenAI has a Longformer model with a context length of 16,384 tokens, which the authors of the QASPER paper used to embed full texts for a question-answering task. With the recently updated functionality (Python API), it could be used as a feature extractor through the SBERT feature extraction model to obtain feature vectors (embeddings) for the full texts of the papers in the dataset. This could be one solution for use cases where the required information is somewhere in the full text and is not available in the abstract.
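To make the idea a bit more concrete, here is a rough sketch of what I have in mind, not a finished implementation. Everything in it is an assumption on my side: the record field names (`title`, `abstract`, `full_text`), the helper names, and the model ID `allenai/longformer-base-4096`, which is a 4096-token encoder-only checkpoint I picked as a stand-in rather than the 16,384-token model mentioned above. The encoder is passed in as a plain callable so that any embedding model can be swapped in.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical type: any callable that maps a list of texts to one vector per text.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def record_to_text(record: Dict[str, Optional[str]]) -> str:
    """Join title, abstract, and full text, skipping missing or empty fields."""
    # Field names are assumptions; adapt them to the columns of your dataset.
    parts = [record.get("title"), record.get("abstract"), record.get("full_text")]
    return "\n\n".join(p.strip() for p in parts if p and p.strip())


def build_feature_matrix(
    records: List[Dict[str, Optional[str]]], encode: Encoder
) -> List[List[float]]:
    """Return one embedding (feature vector) per record."""
    texts = [record_to_text(r) for r in records]
    return [[float(x) for x in vector] for vector in encode(texts)]


def load_long_context_encoder(
    model_name: str = "allenai/longformer-base-4096",  # assumed stand-in model
    max_seq_length: int = 4096,
) -> Encoder:
    """Wrap a long-context transformer with mean pooling via sentence-transformers."""
    # Imported lazily so the helpers above work without the heavy dependency.
    from sentence_transformers import SentenceTransformer, models

    word_model = models.Transformer(model_name, max_seq_length=max_seq_length)
    # Pooling averages the token embeddings by default.
    pooling = models.Pooling(word_model.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[word_model, pooling])
    # batch_size=1 keeps memory use low for very long inputs.
    return lambda texts: model.encode(texts, batch_size=1, show_progress_bar=False)
```

Usage would be something like `X = build_feature_matrix(records, load_long_context_encoder())`, where records without a full text simply fall back to title and abstract. Texts longer than `max_seq_length` tokens would be truncated by the tokenizer, so a model with a longer context window matters for very long papers.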
Please give me your feedback. Thank you.