View first page before entire document is loaded - support range header #419

Closed
5 tasks done
joepio opened this issue Jun 26, 2019 · 11 comments
Labels
question Further information is requested stale

joepio commented Jun 26, 2019

Before you start - checklist

  • I have read documentation in README
  • I have checked sample and test suites to see real life basic implementation
  • I have checked if this question is not already asked

What are you trying to achieve? Please describe.

In our project (issue, demo), I'd like to load only the pages that I'm viewing, and render the first page before the entire document is loaded.

From my understanding, PDF.js supports Range headers and the react-pdf API describes that it's possible to include a PDFDataRangeTransport object in the file property. I fail to see what to do to actually send these Range headers, though!

Describe solutions you've tried

  • Check if the source PDF is optimized for the web
  • Check if the hosting service supports HTTP Range headers
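One way to run the second check by hand is to request a single byte and see whether the server answers with 206 Partial Content. A minimal sketch using only the Python standard library; the names `is_partial_response` and `probe_range_support` are hypothetical, and the URL is whichever PDF you are serving:

```python
import urllib.request
from typing import Optional


def is_partial_response(status: int, content_range: Optional[str]) -> bool:
    """True when a response looks like a valid answer to a byte-range request."""
    return (
        status == 206
        and content_range is not None
        and content_range.strip().lower().startswith("bytes ")
    )


def probe_range_support(url: str, timeout: float = 10.0) -> bool:
    """Ask for the first byte only and report whether the server honoured it."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_partial_response(
            response.status, response.headers.get("Content-Range")
        )
```

A server that ignores the Range header typically answers 200 with the whole file, in which case the probe returns False.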

Environment

  • Chrome 75
  • MacOS 10.14.5
  • React-PDF 4.0.5
  • React-scripts 3.0.1
  • React 16.8.6
@wojtekmaj
Owner

Hi,
Yeah, PDFDataRangeTransport should be supported, as React-PDF just passes it to pdf.js and does not do much else with it. I found this topic on creating PDFDataRangeTransport objects.

It seems like the easiest way to get the behavior you want is to simply pass a URL as the file prop. This should work just fine: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#range

@wojtekmaj wojtekmaj added the question Further information is requested label Jul 4, 2019
@wojtekmaj wojtekmaj self-assigned this Jul 4, 2019
joepio commented Jul 4, 2019

Thanks for the reply (and the awesome library, for that matter) @wojtekmaj!

Unfortunately, I do pass the URL as a file prop (source, demo), but it only renders after the entire document has been fetched.

Also, the request for the PDF file does not appear to have any Range headers.

Perhaps this conditional is never actually true if I pass a string?

    // File is PDFDataRangeTransport
    if (file instanceof PDFDataRangeTransport) {
      return { range: file };
    }

joepio commented Aug 5, 2019

According to PDF.js developers, PDF.js does not support gzip encoding of range responses, so it needs to be set explicitly. According to the PDF.js docs, you can set custom headers. Since Document passes the options object to PDFjs.getDocument, this should work:

<Document
  options={{
    httpHeaders: {
      'Accept-Encoding': 'Identity',
    }
  }}
  file={"https://example.com/some.pdf"}
/>

However, it does not, so I'm still investigating what is going on. It seems likely that it's a pdf.js issue.
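While investigating, it can help to look at the response headers of the initial request. As far as this thread and the pdf.js FAQ suggest, range loading is only attempted when the server advertises `Accept-Ranges: bytes`, sends a `Content-Length`, and does not compress the body. A rough sketch of that check; `range_blockers` is a hypothetical helper, and the exact conditions pdf.js applies are an assumption here rather than something taken from its source:

```python
from typing import Mapping


def range_blockers(headers: Mapping[str, str]) -> list[str]:
    """Return reasons why a client might fall back to downloading the whole file."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    problems = []
    if lowered.get("accept-ranges") != "bytes":
        problems.append("Accept-Ranges is not 'bytes'")
    if lowered.get("content-encoding", "identity") != "identity":
        problems.append("body is compressed (Content-Encoding is set)")
    if not lowered.get("content-length", "").isdigit():
        problems.append("Content-Length is missing or not a number")
    return problems


# Example: a gzip-compressed response that carries no Content-Length.
problems = range_blockers({"Accept-Ranges": "bytes", "Content-Encoding": "gzip"})
```

For cross-origin requests the browser only exposes these headers to scripts if the server lists them in Access-Control-Expose-Headers.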

angel-langdon commented Jan 28, 2022

@joepio Did you manage to solve this issue?

I am using FastAPI as a backend in Python and have not managed to solve it.

I have tried passing this sample URL (a 2 GB PDF) to <Document/>: https://s3.amazonaws.com/pdftron/downloads/pl/2gb-sample-file.pdf
It loads the first page immediately.

joepio commented Jan 28, 2022

@angel-langdon I never managed to get it working, unfortunately...

angel-langdon commented Jan 30, 2022

@joepio Well, I finally managed to do it. It was failing because our backend implementation was not compatible with pdf.js.

Frontend component

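import { memo, useMemo } from "react";
import { Document } from "react-pdf";

interface MemoizedDocumentProps {
  url: string;
  children: JSX.Element | null;
}

const MemoizedDocument = memo((props: MemoizedDocumentProps) => {
  // Memoize the file object so re-renders do not hand Document a new reference
  const file = useMemo(
    () => ({ url: props.url }),
    [props.url]
  );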
  return (
    <Document
      file={file}
    >
      {props.children}
    </Document>
  );
});

Backend implementation (in Python)

import os
from typing import BinaryIO

from fastapi import FastAPI, HTTPException, Request, status
from fastapi.responses import StreamingResponse


def send_bytes_range_requests(
    file_obj: BinaryIO, start: int, end: int, chunk_size: int = 10_000
):
    """Send a file in chunks using Range Requests specification RFC7233

    `start` and `end` parameters are inclusive due to specification
    """
    with file_obj as f:
        f.seek(start)
        while (pos := f.tell()) <= end:
            read_size = min(chunk_size, end + 1 - pos)
            yield f.read(read_size)


def _get_range_header(range_header: str, file_size: int) -> tuple[int, int]:
    def _invalid_range():
        return HTTPException(
            status.HTTP_416_REQUESTED_RANGE_NOT_SATISFIABLE,
            detail=f"Invalid request range (Range:{range_header!r})",
        )

    try:
        # Only a single "bytes=first-last" range is supported
        first, last = range_header.replace("bytes=", "").split("-")
        if first == "":
            # Suffix form "bytes=-N" asks for the last N bytes of the file
            start = max(file_size - int(last), 0)
            end = file_size - 1
        else:
            start = int(first)
            # A missing or too-large end means "up to the last byte"
            end = min(int(last), file_size - 1) if last != "" else file_size - 1
    except ValueError:
        raise _invalid_range()

    if start > end or start < 0:
        raise _invalid_range()
    return start, end


def range_requests_response(
    request: Request, file_path: str, content_type: str
):
    """Returns StreamingResponse using Range Requests of a given file"""

    file_size = os.stat(file_path).st_size
    range_header = request.headers.get("range")

    headers = {
        "content-type": content_type,
        "accept-ranges": "bytes",
        "content-encoding": "identity",
        "content-length": str(file_size),
        "access-control-expose-headers": (
            "content-type, accept-ranges, content-length, "
            "content-range, content-encoding"
        ),
    }
    start = 0
    end = file_size - 1
    status_code = status.HTTP_200_OK

    if range_header is not None:
        start, end = _get_range_header(range_header, file_size)
        size = end - start + 1
        headers["content-length"] = str(size)
        headers["content-range"] = f"bytes {start}-{end}/{file_size}"
        status_code = status.HTTP_206_PARTIAL_CONTENT

    return StreamingResponse(
        send_bytes_range_requests(open(file_path, mode="rb"), start, end),
        headers=headers,
        status_code=status_code,
    )


app = FastAPI()


@app.get("/video")
def get_video(request: Request):
    return range_requests_response(
        request, file_path="path_to_my_video.mp4", content_type="video/mp4"
    )

I would strongly recommend reading the Range Requests RFC (https://datatracker.ietf.org/doc/html/rfc7233) to understand everything; there are a few gotchas.
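One of those gotchas is that both ends of a byte range are inclusive, so a range of `start`-`end` carries `end - start + 1` bytes. A small sketch to make the arithmetic concrete; `partial_headers` is a hypothetical helper mirroring the header logic above, not part of FastAPI:

```python
def partial_headers(start: int, end: int, file_size: int) -> dict[str, str]:
    """Headers for a 206 response covering bytes start..end (both inclusive)."""
    if not 0 <= start <= end < file_size:
        raise ValueError("range must satisfy 0 <= start <= end < file_size")
    return {
        "content-length": str(end - start + 1),
        "content-range": f"bytes {start}-{end}/{file_size}",
    }


# First 100 bytes of a 1000-byte file: positions 0 through 99.
first = partial_headers(0, 99, 1000)
# Last byte only: start and end are the same position.
last = partial_headers(999, 999, 1000)
```

Another is the suffix form `bytes=-500`, which asks for the last 500 bytes of the file rather than bytes 0 through 500.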

duriann commented Mar 1, 2022

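@joepio Well, I finally managed to do it. It was failing because of case-sensitive headers. Also, you need to specify all of these headers for it to work properly:

    headers = {
        "Content-Type": "application/pdf",
        "Accept-Ranges": "bytes",
        "Content-Encoding": "identity",
        "Access-Control-Expose-Headers": (
            "Accept-Ranges, Content-Length, Content-Range"
        ),
        "Content-Length": str(end - start + 1),
        "Content-Range": f"bytes {start}-{end}/{file_size}",
    }

I would strongly recommend reading the Range Requests RFC (https://datatracker.ietf.org/doc/html/rfc7233) to understand everything; there are a few gotchas.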

Hello, did you write it like this?

<Document
  options={{
    httpHeaders: {
      'Content-Type': 'application/pdf',
      'Accept-Ranges': 'bytes',
      'Content-Encoding': 'identity',
      'Access-Control-Expose-Headers': 'Accept-Ranges, Content-Length, Content-Range',
      'Content-Length': '1000000',
      'Content-Range': 'bytes 0-999999/1000000',
    },
  }}
  file={'https://s3.amazonaws.com/pdftron/downloads/pl/2gb-sample-file.pdf'}
/>

I found that it doesn't work for me.

@angel-langdon

@bolosea I don't know whether non-English characters are valid; see my updated answer for full details.

duriann commented Mar 3, 2022

@bolosea I don't know whether non-English characters are valid; see my updated answer for full details.

Thanks for your reply, but I don't have a backend. I found that the URL https://s3.amazonaws.com/pdftron/downloads/pl/2gb-sample-file.pdf works as expected in the pdf.js example project.
What's happening?
The weird thing is that it doesn't work when I download the pdf.js source code and run it with pdf.getDocument. I want to cry, QAQ.

github-actions bot commented Jun 6, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 14 days.

@github-actions github-actions bot added the stale label Jun 6, 2022
@github-actions
Contributor

This issue was closed because it has been stalled for 14 days with no activity.
