Wrong JPEG library version: library is 90, caller expects 80 #205

Open
lzw5399 opened this issue Nov 5, 2020 · 4 comments

lzw5399 commented Nov 5, 2020

Summary

Hi, I wrote an OCR server based on gosseract (the frontend page is based on https://github.com/otiai10/ocrserver). Everything described below can be found in https://github.com/lzw5399/ocrserver.

I need to OCR PDFs that arrive as base64 strings, so I added the https://github.com/gen2brain/go-fitz dependency to convert each PDF to images (PNG) page by page, and then use gosseract to recognize the images. After go-fitz was added, the JPEG-related functionality stopped working. It looks like a version conflict.
Thanks in advance for any help ^-^

  • combine go-fitz and gosseract
  • client.SetImageFromBytes(bytes) then returns the error below
  • after docker exec, running find -name '*libjpeg*' does not turn up any files related to libjpeg 90, which really confuses me
    [image]

Reproducibility

Reproducibility Frequency

  • 100%

Reproducible Dockerfile

# build stage
FROM golang:1.15 as builder

ENV GO111MODULE=on \
    GOPROXY=https://goproxy.cn,direct

WORKDIR /app

COPY . .

RUN rm -rf /etc/apt/sources.list && \
    echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" >> /etc/apt/sources.list && \
    apt-get update

RUN apt-get install -y \
    libleptonica-dev \
    libtesseract-dev \
    tesseract-ocr

RUN GOOS=linux GOARCH=amd64 go build .

RUN mkdir publish && cp bank-ocr publish && \
    cp -r app publish && mkdir publish/config && \
    cp config/appsettings.yaml publish/config/

FROM ubuntu:20.04

WORKDIR /app

COPY --from=builder /app/publish .

RUN rm -rf /etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse'>>/etc/apt/sources.list

RUN apt-get update \
  && apt-get install -y \
    libleptonica-dev \
    libtesseract-dev \
    tesseract-ocr \
    mupdf \
    mupdf-tools

RUN apt-get install -y \
  tesseract-ocr-eng \
  tesseract-ocr-chi-sim

ENV GIN_MODE=release \
    PORT=8080

EXPOSE 8080

ENTRYPOINT ["./bank-ocr"]

Environment

  • uname -a
Linux 4d79f9b136d7 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • go env
golang:1.15 inside dockerfile
  • go version
golang:1.15 inside dockerfile
  • tesseract --version
root@4d79f9b136d7:/app# tesseract --version
tesseract 4.1.1
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.9.2 libzstd/1.4.4
otiai10 self-assigned this Nov 5, 2020

otiai10 (Owner) commented Jan 6, 2021

FYI

# Build stage
FROM golang:1.15 as builder

ENV GO111MODULE=on
RUN rm -rf /etc/apt/sources.list && \
    echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" >> /etc/apt/sources.list && \
    apt-get update -qq

RUN apt-get install -y \
    libleptonica-dev \
    libtesseract-dev \
    tesseract-ocr

RUN echo "Tesseract Version in Builder Stage:" >> /tess-versions && tesseract --version >> /tess-versions

# App stage
FROM ubuntu:20.04 as runner

COPY --from=builder /tess-versions /tess-versions

RUN rm -rf /etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse'>>/etc/apt/sources.list && \
    echo 'deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse'>>/etc/apt/sources.list

RUN apt-get update \
  && apt-get install -y \
    libleptonica-dev \
    libtesseract-dev \
    tesseract-ocr \
    mupdf \
    mupdf-tools

RUN apt-get install -y \
  tesseract-ocr-eng \
  tesseract-ocr-chi-sim

RUN echo "\nTesseract Version in Runner Stage:" >> /tess-versions && tesseract --version >> /tess-versions

CMD ["cat", "/tess-versions"]
│ [issue-205] Tesseract Version in Builder Stage:
│ [issue-205] tesseract 4.0.0
│ [issue-205]  leptonica-1.76.0
│ [issue-205]   libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
│ [issue-205]  Found AVX2
│ [issue-205]  Found AVX
│ [issue-205]  Found SSE
│ [issue-205]
│ [issue-205] Tesseract Version in Runner Stage:
│ [issue-205] tesseract 4.1.1
│ [issue-205]  leptonica-1.79.0
│ [issue-205]   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
│ [issue-205]  Found AVX2
│ [issue-205]  Found AVX
│ [issue-205]  Found FMA
│ [issue-205]  Found SSE
│ [issue-205]  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
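
The two listings above differ in the libjpeg build that leptonica reports: libjpeg 6b (libjpeg-turbo 1.5.2) in the builder stage versus libjpeg 8d (libjpeg-turbo 2.0.3) in the runner stage. The thread does not confirm that this mismatch is the cause of the error, but it is easy to check for automatically. The following is a minimal sketch in Go that pulls that field out of `tesseract --version` output so the two stages can be compared. The helper name `jpegInfo` and the regular expression are assumptions made for illustration and are not part of gosseract or tesseract.

```go
package main

import (
	"fmt"
	"regexp"
)

// libjpegRe matches the libjpeg field printed in `tesseract --version`,
// for example "libjpeg 8d (libjpeg-turbo 2.0.3)".
var libjpegRe = regexp.MustCompile(`libjpeg (\S+) \(libjpeg-turbo ([^)]+)\)`)

// jpegInfo is a hypothetical helper. It returns the reported libjpeg
// version and the libjpeg-turbo version, or ok=false if the field is absent.
func jpegInfo(versionOutput string) (version, turbo string, ok bool) {
	m := libjpegRe.FindStringSubmatch(versionOutput)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	// Lines copied from the builder and runner output in this thread.
	builder := "  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36"
	runner := "  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37"

	bVer, bTurbo, _ := jpegInfo(builder)
	rVer, rTurbo, _ := jpegInfo(runner)
	fmt.Printf("builder: libjpeg %s (turbo %s)\n", bVer, bTurbo)
	fmt.Printf("runner:  libjpeg %s (turbo %s)\n", rVer, rTurbo)
	if bVer != rVer {
		fmt.Println("libjpeg differs between the build and runtime stages")
	}
}
```

One way to avoid this kind of drift is to build and run on the same base image, so that the headers used at compile time match the shared libraries present at run time.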

otiai10 removed their assignment Jan 6, 2021

h4ckitt commented Sep 30, 2022

Hi, I have this exact issue and will be happy to provide any information needed to find a solution.

Like OP, I'm using go-fitz to convert the PDF to an image, then feeding it to gosseract.

Trey2k commented Mar 18, 2023

I am running into this issue as well. I am also using go-fitz. Would love to see a solution.
Edit: Workaround is to just encode it to a PNG instead of JPG with go-fitz.

fpinna commented Sep 26, 2023

Using PNG worked for me also.
thanks @Trey2k
