[BUG] "Error opening data file /usr/share/eng.traineddata" error, regardless of TESSDATA_PREFIX #1412

Open
rezad1393 opened this issue Feb 7, 2022 · 10 comments

@rezad1393

To get the version of CCExtractor, you can use --version.

CCExtractor version: CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.

In raising this issue, I confirm the following:

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [x] I have checked that the issue I'm posting isn't already reported.
  • [x] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues
  • [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [x] I have used the latest available version of CCExtractor to verify this issue exists.
  • [x] I have ticked all the boxes in this section and to prove it I'm deleting the section completely to remove boilerplate text.

Necessary information

  • Is this a regression (i.e. did it work before)? I don't know
  • What platform did you use? linux
  • What were the used arguments? ccextractor -hardsubx VIDEOPATH

Video links

anything really

Additional information

Whatever I set TESSDATA_PREFIX to, CCExtractor still reports the same error with the same path.
Setting TESSDATA_PREFIX does affect tesseract itself, so I know the variable is not the problem.
CCExtractor, however, seems to look at a hardcoded path.

PunitLodha added the OCR label Feb 9, 2022
@paulshields

paulshields commented Mar 24, 2022

I came across this bug report while facing the same issue. I noticed that the ccextractor/linux/build-static.sh file fetches the traineddata with wget:

wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata

though the path appears to have changed.

All the traineddata files are still available here, though: https://github.com/tesseract-ocr/tessdata.git

I downloaded eng.traineddata from GitHub and copied it to the tesseract tessdata directory:

sudo cp eng.traineddata /usr/share/tesseract-ocr/4.00/tessdata/

This then allowed me to run ccextractor against a file with burned-in subs (no need to set TESSDATA_PREFIX), and it (mostly) worked. At least it ran and generated an SRT file. I think I just need to play around with some thresholds to get more accurate OCR.

ccextractor CEA-608-SEI-Captiondata.mp4 -hardsubx -subcolor white -detect_italics -whiteness_thresh 90 -conf_thresh 60 -o cea.srt
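
Playing with the thresholds can be made less tedious with a small dry-run helper. The sketch below only prints one command line per -conf_thresh value, reusing the flags from the command above; the function name sweep_conf_thresh and the out_conf<N>.srt naming are made up for illustration, and input file names are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one ccextractor command per threshold.
# Usage: sweep_conf_thresh INPUT THRESHOLD...
sweep_conf_thresh() {
    input=$1
    shift
    for t in "$@"; do
        # Same flags as the command above; only -conf_thresh and -o vary.
        printf 'ccextractor %s -hardsubx -subcolor white -detect_italics -whiteness_thresh 90 -conf_thresh %s -o out_conf%s.srt\n' \
            "$input" "$t" "$t"
    done
}

# Print the commands for three candidate confidence thresholds.
sweep_conf_thresh CEA-608-SEI-Captiondata.mp4 40 60 80
```

Once the printed commands look right, they can be piped to sh and the resulting SRT files compared side by side.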

By the way, if you prefix ccextractor with strace -e file, you can see that it looks in several places for the tessdata directory:

HardsubX (Hard Subtitle Extractor) - Burned-in subtitle extraction subsystem
openat(AT_FDCWD, "./tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/share/tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
openat(AT_FDCWD, "/usr/share/tesseract-ocr/tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/tesseract-ocr/4.00/tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
openat(AT_FDCWD, "/usr/share/tesseract-ocr/4.00//tessdata/eng.traineddata", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/share/tesseract-ocr/4.00//tessdata/eng.traineddata", O_RDONLY) = 3
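
The lookup order visible in this trace can be approximated with a small shell helper to check where a given machine keeps its language data. This is only a sketch: the function name find_tessdata is made up for illustration, and the candidate list is copied from the strace output rather than from the CCExtractor source, so the real lookup may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory that contains <lang>.traineddata.
# Usage: find_tessdata LANG DIR...
find_tessdata() {
    lang=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$lang.traineddata" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate directories, in the order they appear in the strace output.
find_tessdata eng \
    ./tessdata \
    /usr/share/tessdata \
    /usr/local/share/tessdata \
    /usr/share/tesseract-ocr/tessdata \
    /usr/share/tesseract-ocr/4.00/tessdata \
    || echo "eng.traineddata not found in any candidate directory" >&2
```

If nothing is printed on standard output, copying eng.traineddata into one of the candidate directories, as described above, is the likely fix.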

@ocococococ

ocococococ commented Sep 4, 2022

At least on macOS, with Tesseract 5 installed via brew, the tessdata directory /usr/local/share/tessdata is never found.
This may be too simplistic, but changing the Tesseract version check in ocr.c from
if (!strncmp("4.", TessVersion(), 2))
to
if (TessVersion()[0] >= '4')
seems to do the trick:
it forces the same code path as for Tesseract 4, which appends a slash to the tessdata parent path.

Minor changes to CMakeLists.txt are also required to build on macOS Big Sur.
tesseract_5_mac.patch.zip

@PunitLodha
Member

Could you please check if this is still an issue on the latest master? Should have been fixed by #1479

@rezad1393
Author

rezad1393 commented Mar 15, 2023

Can't test for hardsub

 ~/ccextractor-master/linux % RUST_BACKTRACE=full ./build_hardsubx 
Running pre-build script...
Obtaining Git commit
Git command not present, trying folder approach
Storing variables in file
Commit: Unknown
Date: 2023-03-15
Stored all in compile_info_real.h
Done.
Trying to compile...
Checking for cargo...
rustc >= MSRV(1.54.0)
Building rust files...
   Compiling rusty_ffmpeg v0.10.0+ffmpeg.5.1
   Compiling ccx_rust v0.1.0 (/home/me/ccextractor-master/src/rust)
error: failed to run custom build command for `rusty_ffmpeg v0.10.0+ffmpeg.5.1`

Caused by:
  process didn't exit successfully: `/home/me/ccextractor-master/src/rust/../../linux/rust/release/build/rusty_ffmpeg-0ee1e7c47bb1286e/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at 'No linking method set!', /home/me/.cargo/registry/src/github.com-1ecc6299db9ec823/rusty_ffmpeg-0.10.0+ffmpeg.5.1/build.rs:343:13
  stack backtrace:
     0:     0x5603810c832d - std::backtrace_rs::backtrace::libunwind::trace::h8217d0a8f3fd2f41
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
     1:     0x5603810c832d - std::backtrace_rs::backtrace::trace_unsynchronized::h308103876b3af410
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
     2:     0x5603810c832d - std::sys_common::backtrace::_print_fmt::hc208018c6153605e
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/sys_common/backtrace.rs:66:5
     3:     0x5603810c832d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hf89a7ed694dfb585
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/sys_common/backtrace.rs:45:22
     4:     0x5603810edecc - core::fmt::write::h21038c1382fe4264
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/fmt/mod.rs:1197:17
     5:     0x5603810c4ba1 - std::io::Write::write_fmt::h7dbb1c9a3c254aef
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/io/mod.rs:1672:15
     6:     0x5603810c9c05 - std::sys_common::backtrace::_print::h4e8889719c9ddeb8
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/sys_common/backtrace.rs:48:5
     7:     0x5603810c9c05 - std::sys_common::backtrace::print::h1506fe2cb3022667
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/sys_common/backtrace.rs:35:9
     8:     0x5603810c9c05 - std::panicking::default_hook::{{closure}}::hd9d7ce2a8a782440
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:295:22
     9:     0x5603810c9926 - std::panicking::default_hook::h5b16ec25444b1b5d
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:314:9
    10:     0x5603810ca196 - std::panicking::rust_panic_with_hook::hb0138cb6e6fea3e4
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:698:17
    11:     0x5603810ca049 - std::panicking::begin_panic_handler::{{closure}}::h4cb67095557cd1aa
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:586:13
    12:     0x5603810c87e4 - std::sys_common::backtrace::__rust_end_short_backtrace::h2bfcac279dcdc911
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/sys_common/backtrace.rs:138:18
    13:     0x5603810c9db9 - rust_begin_unwind
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
    14:     0x560380bddf13 - core::panicking::panic_fmt::h1de71520faaa17d3
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
    15:     0x560380be17fa - build_script_build::static_linking::h5176694b9f3d639f
    16:     0x560380be207f - build_script_build::main::h1d1e33981c90d847
    17:     0x560380be7783 - core::ops::function::FnOnce::call_once::hd5fa772c5bfe5459
    18:     0x560380be2259 - std::sys_common::backtrace::__rust_begin_short_backtrace::ha9a647e66b4a3dc7
    19:     0x560380be7479 - std::rt::lang_start::{{closure}}::h824bfacb332716dd
    20:     0x5603810c082e - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h4937aaa125c8d4b2
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:280:13
    21:     0x5603810c082e - std::panicking::try::do_call::h6f5c70e8b0a34f92
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492:40
    22:     0x5603810c082e - std::panicking::try::h68766ba264ecf2e2
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456:19
    23:     0x5603810c082e - std::panic::catch_unwind::hc36033d2f9cc04af
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137:14
    24:     0x5603810c082e - std::rt::lang_start_internal::{{closure}}::h78c037f4a1a28ded
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/rt.rs:128:48
    25:     0x5603810c082e - std::panicking::try::do_call::he6e1fffda4c750ee
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492:40
    26:     0x5603810c082e - std::panicking::try::h48a77ddbb2f4c87a
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456:19
    27:     0x5603810c082e - std::panic::catch_unwind::hfa809b06a550a9e7
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137:14
    28:     0x5603810c082e - std::rt::lang_start_internal::h4db69ed48eaca005
                                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/rt.rs:128:20
    29:     0x560380be7461 - std::rt::lang_start::h0fe959b208925438
    30:     0x560380be2143 - main
    31:     0x7fefa92e3790 - <unknown>
    32:     0x7fefa92e384a - __libc_start_main
    33:     0x560380bde205 - _start
    34:                0x0 - <unknown>
warning: build failed, waiting for other jobs to finish...
Failed. 

@ocococococ

ocococococ commented Mar 15, 2023

> Could you please check if this is still an issue on the latest master? Should have been fixed by #1479

FYI, for my use case (built with the options -DWITH_OCR=ON -DWITHOUT_RUST=ON), it works:
Tesseract 5 is used and tessdata is found correctly.

I still needed to apply minor CMake modifications to build it on macOS Big Sur (see tesseract_5_mac.patch.zip above).

@PunitLodha
Member

> Can't test for hardsub

@prateekmedia could you look into this? It seems the build_hardsubx script was broken by the addition of rusty_ffmpeg, which is a dependency of rsmpeg.

@prateekmedia
Contributor

prateekmedia commented Mar 16, 2023 via email

@PunitLodha
Member

@rezad1393 can you try setting the environment variables FFMPEG_INCLUDE_DIR and FFMPEG_PKG_CONFIG_PATH and then building again?

FFMPEG_INCLUDE_DIR=/usr/include
FFMPEG_PKG_CONFIG_PATH=/usr/lib/pkgconfig

or whatever the correct paths are on your machine.
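
One possible way to find the right values is to ask pkg-config where libavcodec is installed. This is a sketch under assumptions: the helper name ffmpeg_env is made up, it assumes the FFmpeg development files ship a libavcodec.pc that defines includedir, and it falls back to the two paths suggested above when pkg-config or libavcodec is not available.

```shell
#!/bin/sh
# Hypothetical helper: print export lines for the two variables named above.
ffmpeg_env() {
    # Defaults taken from the suggestion above.
    inc=/usr/include
    pc=/usr/lib/pkgconfig
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libavcodec; then
        # includedir comes from libavcodec.pc; pcfiledir is the directory holding that file.
        found_inc=$(pkg-config --variable=includedir libavcodec 2>/dev/null || true)
        found_pc=$(pkg-config --variable=pcfiledir libavcodec 2>/dev/null || true)
        if [ -n "$found_inc" ]; then inc=$found_inc; fi
        if [ -n "$found_pc" ]; then pc=$found_pc; fi
    fi
    printf "export FFMPEG_INCLUDE_DIR='%s'\n" "$inc"
    printf "export FFMPEG_PKG_CONFIG_PATH='%s'\n" "$pc"
}

# Load the variables into the current shell before starting the build.
eval "$(ffmpeg_env)"
echo "FFMPEG_INCLUDE_DIR=$FFMPEG_INCLUDE_DIR"
echo "FFMPEG_PKG_CONFIG_PATH=$FFMPEG_PKG_CONFIG_PATH"
```

With the variables exported in the same shell, ./build_hardsubx can be rerun from the linux directory.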

@rboy1
Contributor

rboy1 commented Oct 3, 2023

@PunitLodha @cfsmp3 when do you think we could see a new release?

@cfsmp3
Contributor

cfsmp3 commented Oct 4, 2023

> @PunitLodha @cfsmp3 when do you think we could see a new release?

When we can merge all the pending PRs, I guess.

7 participants