Scientific Paper Summarization using Document AI and Vertex AI

DEPRECATED: This sample is deprecated. Use github.com/GoogleCloudPlatform/generative-ai/language/use-cases/document-summarization instead.

Training Data

ScisummNet - Scientific Article Summarization Dataset

  • Google Cloud Storage Bucket: gs://cloud-samples-data/documentai/ScisummNet
    • pdf - Original PDF files of papers from ACL Anthology
    • summary_txt - Human-written summaries of papers
    • json - Document.json files produced by the Document AI OCR Processor
    • full_txt - Full OCR-extracted text of each paper, taken from the Document.json files
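
The relationship between the json and full_txt folders can be sketched in Python. This is a minimal illustration, not the code used to build the dataset: it assumes each Document.json is a JSON-serialized Document AI Document whose top-level `text` field holds the full OCR text. The helper names `folder_uri` and `full_text_from_document_json` are hypothetical, and the file layout inside each folder is not described here, so the sketch builds only folder-level URIs.

```python
import json

# Bucket prefix and folder names as listed above.
BUCKET_PREFIX = "gs://cloud-samples-data/documentai/ScisummNet"
SUBFOLDERS = ("pdf", "summary_txt", "json", "full_txt")


def folder_uri(subfolder: str) -> str:
    """Return the Cloud Storage URI prefix for one dataset folder (hypothetical helper)."""
    if subfolder not in SUBFOLDERS:
        raise ValueError(f"unknown folder: {subfolder!r}")
    return f"{BUCKET_PREFIX}/{subfolder}/"


def full_text_from_document_json(raw: str) -> str:
    """Extract the OCR text from a serialized Document (hypothetical helper).

    Assumes the full text lives in the top-level "text" field; returns an
    empty string if that field is absent.
    """
    document = json.loads(raw)
    return document.get("text", "")


if __name__ == "__main__":
    # Tiny stand-in for a real Document.json, which would also carry pages, layout, etc.
    sample = json.dumps({"text": "Title\nAbstract ...", "pages": []})
    print(folder_uri("json"))
    print(full_text_from_document_json(sample))
```

To try this on real data, one could first copy a file locally (for example with `gsutil cp`) and pass its contents to `full_text_from_document_json`.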