Possibly creating feature vectors using full text of records #1455

rohitgarud · 2023-05-30T12:11:32Z

Hi all,
AllenAI has a Longformer model with a context length of 16,384 tokens, which the researchers behind the QASPER paper used to embed full texts for a question-answering task. With the recently updated functionality (Python API), it could be used as a feature extractor through the SBERT feature extraction model to obtain feature vectors (embeddings) for the full texts of the papers in a dataset. This could be one solution for use cases where the required information is somewhere in the full text and is not available in the abstract.
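To make the idea concrete, here is a minimal, model-agnostic sketch of turning one full text into a single feature vector. It is not the ASReview or SBERT API: `embed_window` is a hypothetical callable standing in for whichever long-context model is used, whitespace splitting stands in for a real tokenizer, and averaging the vectors of several windows when a text exceeds the context length is my own assumption.

```python
from typing import Callable, List, Sequence

MAX_TOKENS = 16384  # context length mentioned in this post

Vector = List[float]


def split_into_windows(tokens: Sequence[str], max_tokens: int = MAX_TOKENS) -> List[List[str]]:
    """Split a token sequence into consecutive windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [list(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def full_text_features(
    text: str,
    embed_window: Callable[[List[str]], Vector],
    max_tokens: int = MAX_TOKENS,
) -> Vector:
    """Return one feature vector for a full text.

    embed_window is a hypothetical stand-in for a real embedding model.
    """
    tokens = text.split()  # assumption: whitespace tokens, not a real tokenizer
    if not tokens:
        raise ValueError("text is empty")
    windows = split_into_windows(tokens, max_tokens)
    return mean_pool([embed_window(window) for window in windows])


def toy_embed(window: List[str]) -> Vector:
    """Toy embedder for illustration: [token count, mean token length]."""
    return [float(len(window)), sum(len(t) for t in window) / len(window)]
```

Calling `full_text_features(paper_text, toy_embed)` returns a two-element vector; in practice `toy_embed` would be replaced by a function that runs the chosen model on one window of text.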
Please give me your feedback. Thank you.

J535D165 · 2023-05-30T12:49:57Z

Maintainer

This looks very promising.

Ngrams are available in OpenAlex. Based on your idea, I'm wondering if we can somehow use those in this feature extraction workflow as well.

1 reply

rohitgarud May 30, 2023
Author

Yes, @J535D165, we can use those ngrams to create feature vectors by combining them into a single text. We can also extract the full text from PDFs. This will need some extra work, but we can get the entire text. The extracted text can also be used to pull out only the methodology section or other sections and use them for screening instead of abstracts (many users have this use case: #1398, #1384).
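A rough sketch of both ideas, under stated assumptions: the `ngram` and `ngram_count` keys are assumed field names for the ngram records, and the heading regex is a simple heuristic of mine for plain text that has already been extracted from a PDF, not an existing ASReview feature. It will misfire on short body lines that look like headings.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple


def ngrams_to_text(ngrams: Iterable[Dict], repeat_by_count: bool = True) -> str:
    """Join ngram records into one text, optionally repeating each by its count.

    Assumes each record has an "ngram" key and an optional "ngram_count" key.
    """
    parts: List[str] = []
    for item in ngrams:
        count = int(item.get("ngram_count", 1)) if repeat_by_count else 1
        parts.extend([item["ngram"]] * max(count, 1))
    return " ".join(parts)


# Heuristic: an optional section number followed by a short capitalised title
# that fills the whole line, e.g. "2 Methods" or "3.1 Data Collection".
HEADING = re.compile(
    r"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?([A-Z][A-Za-z ]{2,40})[ \t]*$",
    re.MULTILINE,
)


def extract_section(
    full_text: str,
    names: Tuple[str, ...] = ("methods", "methodology", "materials and methods"),
) -> Optional[str]:
    """Return the body of the first section whose heading is in names, else None."""
    headings = list(HEADING.finditer(full_text))
    for i, match in enumerate(headings):
        if match.group(1).strip().lower() in names:
            # The section runs until the next detected heading or the end of the text.
            end = headings[i + 1].start() if i + 1 < len(headings) else len(full_text)
            return full_text[match.end():end].strip()
    return None
```

Either output is an ordinary string, so it could be passed to the feature extractor in place of the abstract.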
