
OpenCV video decoding too slow for vidgear - ffmpeg might be better suited for decoding #148

Open
golyalpha opened this issue Jul 19, 2020 · 31 comments
Labels
ENHANCEMENT ⚡ New Feature/Addition/Improvement PROPOSAL 📩 A proposal/proposition WORK IN PROGRESS 🚧 currently been worked on.

Comments

@golyalpha

Detailed Description

I've run some encode and decode benchmarks, and it's becoming fairly obvious that VidGear (CamGear API) is currently unable to decode 1080p60 video on computers where, given certain settings, FFmpeg can encode 1080p60 in real time or slightly faster without hardware acceleration.

Context

Currently, it's impossible to pass 1080p60 H264 (the only codec tested) video through VidGear on computers that should be able to decode 1080p60 video just fine. The idea would be to replace the OpenCV VideoStream API with something more performant, like FFmpeg, since FFmpeg is capable of outputting raw video to stdout.

The CamGear API should not need to change from the developer's standpoint.
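
As a rough illustration of the proposed approach (this is only a sketch, not VidGear code; it assumes an ffmpeg binary on PATH, and the source path and resolution are placeholders):

```python
# Sketch: have FFmpeg decode to raw bgr24 on stdout and reshape each
# fixed-size chunk into a numpy frame.
import subprocess
import numpy as np

def iter_raw_frames(source, width, height):
    """Yield height x width x 3 uint8 frames decoded by an ffmpeg subprocess."""
    frame_size = width * height * 3  # bgr24 = 3 bytes per pixel
    proc = subprocess.Popen(
        ["ffmpeg", "-i", source, "-f", "rawvideo", "-pix_fmt", "bgr24", "pipe:1"],
        stdout=subprocess.PIPE,
        bufsize=frame_size * 4,
    )
    try:
        while True:
            buf = proc.stdout.read(frame_size)
            if len(buf) < frame_size:
                break  # end of stream
            yield np.frombuffer(buf, dtype=np.uint8).reshape(height, width, 3)
    finally:
        proc.stdout.close()
        proc.wait()
```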

Your Environment

  • VidGear version: 0.1.8
  • Branch: PyPI
  • Python version: 3.8.2
  • pip version: 19.2.3
  • Operating System and version: Win10Pro 1909

Any Other Important Information

Encode/decode benchmarks for VidGear, in FPS (the encode path outputs compressed video, so it goes through FFmpeg):

> poetry run sp-benchmark
Results:
        Encode:
                1080p: 71.34738617215714
                900p: 94.22808977047293
                720p: 137.51681644444432
                480p: 388.3952044196786
                360p: 506.7212349134308
                240p: 1020.0560010744543
                144p: 1860.4607896260777
        Decode:
                1080p: 36.054442749368185
                900p: 44.78923780306475
                720p: 55.349642074620796
                480p: 76.08067848749076
                360p: 81.93545752827764
                240p: 90.02970867849261
                144p: 99.95882945711747

ffmpeg decode benchmark (4k60):

> ffmpeg -i .\streampipe\benchmark\resources\bbb_sunflower_2160p_60fps_normal.mp4 -f null -
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '.\streampipe\benchmark\resources\bbb_sunflower_2160p_60fps_normal.mp4':
  Duration: 00:10:34.53, start: 0.000000, bitrate: 8487 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 3840x2160 [SAR 1:1 DAR 16:9], 8002 kb/s, 60 fps, 60 tbr, 60k tbn, 120 tbc (default)
    Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 160 kb/s (default)
    Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side), fltp, 320 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
  Stream #0:2 -> #0:1 (ac3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
    Stream #0:0(und): Video: wrapped_avframe, yuv420p(progressive), 3840x2160 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 60 fps, 60 tbn, 60 tbc (default)
    Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, 5.1(side), s16, 4608 kb/s (default)
frame=38072 fps=152 q=-0.0 Lsize=N/A time=00:10:34.56 bitrate=N/A speed=2.53x
video:19928kB audio:356706kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
@abhiTronix
Owner

abhiTronix commented Jul 19, 2020

The idea would be to replace the OpenCV VideoStream API with something more performant, like ffmpeg, since ffmpeg is capable of outputting raw video into stdout.

@golyalpha I don't think replacing OpenCV with FFmpeg is a good idea. The only leverage FFmpeg has over OpenCV is performance, while OpenCV has several advantages:

  • OpenCV seamlessly supports multiple backends, including the powerful GStreamer, libav, and FFmpeg itself.
  • OpenCV is available on pretty much every Linux distribution, while FFmpeg might not be (for legal reasons).
  • OpenCV itself is available under the flexible 3-clause BSD license, while with FFmpeg you have to make sure no GPL components are enabled (some notable examples are x264 (the H.264 encoder) and libac3 (the Dolby AC-3 audio codec)). See https://www.ffmpeg.org/legal.html for details.
  • Plus, in my experience FFmpeg is still too buggy to be adopted completely reliably.

These are some of the reasons why it's not a good idea. That being said, we can still implement another video-capture gear that works purely on FFmpeg. Thanks for this idea anyway.

@abhiTronix abhiTronix added ENHANCEMENT ⚡ New Feature/Addition/Improvement PR WELCOMED 📬 Related Pull Requests are welcomed for this issue! PROPOSAL 📩 A proposal/proposition labels Jul 19, 2020
@golyalpha
Author

golyalpha commented Jul 19, 2020

Hmm, it's quite possible that merely using a different OpenCV backend (FFmpeg in this case) could win back the lost performance, but I'm quite skeptical, to be honest. I'm going to run some more benchmarks and see.

As for FFmpeg not being available on some Linux distributions: AFAIK most offer system or user packages through which you can install FFmpeg; that said, certain features (like hwaccel support) might be missing.

@golyalpha
Author

golyalpha commented Jul 19, 2020

Alright, it seems that OpenCV is the part slowing down the decode process: I just ran the same benchmark with FFmpeg as the OpenCV backend, and the results are pretty much within the margin of error of the first run.
Implementing a new gear specifically for FFmpeg decoding does seem to be a direction we can both agree on.

@Thomasedv

Thomasedv commented Jul 20, 2020

I ended up using another Python module (PyAV) to decode frames, because OpenCV (and, because of it, VidGear as well) just gives me fewer frames, despite reporting a higher total frame count. FFmpeg both reports and delivers all frames.

While less ideal, I used that for decoding and WriteGear for encoding, because it's so well made and easy to use.

I should probably investigate whether the FFmpeg backend for OpenCV works better, though.

@golyalpha
Author

@Thomasedv That frame count mismatch sounds like a bug, though I haven't had that issue.

@Thomasedv

It likely is, probably related to the file format I was working with. But it's what pushed me to swap to something other than OpenCV.

@golyalpha
Author

@abhiTronix, how open are you to using PyAV instead of FFMpeg in a subprocess?

@abhiTronix
Owner

abhiTronix commented Jul 20, 2020

@abhiTronix, how open are you to using PyAV instead of FFMpeg in a subprocess?

@golyalpha PyAV directly provides Pythonic bindings for the FFmpeg libraries, so there's no need for a subprocess. Thanks for bringing this up, @Thomasedv.

@golyalpha
Author

Yeah, no, those were two separate options (PyAV, or FFmpeg in a subprocess); I didn't mean using PyAV in a subprocess.

@abhiTronix
Owner

I should probably investigate if the ffmpeg backend for opencv works better though.

@Thomasedv Nope. It works worse, in my experience.

@abhiTronix
Owner

@golyalpha Can you benchmark PyAV too?

@golyalpha
Author

I can try, sure.


@golyalpha
Author

golyalpha commented Jul 20, 2020

Alright, so I've run two benchmarks: one with pure PyAV, and one adding conversion to an ndarray (in the bgr24 pixfmt).

PyAV, no numpy:

Results:
        Encode:
                1080p: 73.02016990515683
                900p: 96.09936289726127
                720p: 143.28125537304703
                480p: 391.3972834093045
                360p: 552.811206220232
                240p: 1063.424403817687
                144p: 1874.5386877448052
        Decode:
                1080p: 91.65095330166262
                900p: 137.72149202414678
                720p: 222.37910244273223
                480p: 576.4954171016395
                360p: 802.877244418239
                240p: 1662.490813006485
                144p: 3751.7117184715594

The framerates are much lower than I expected, but at least all of them are above 60.

With conversion to bgr24 ndarray:

Results:
        Encode:
                1080p: 71.13727909096043
                900p: 95.8622574395303
                720p: 140.39600917324137
                480p: 396.6082721578848
                360p: 557.6079918236094
                240p: 1079.0677933333425
                144p: 1924.9309030011138
        Decode:
                1080p: 40.38017327435203
                900p: 57.87112250518494
                720p: 88.71823443616022
                480p: 231.0341425143733
                360p: 328.8222122692613
                240p: 610.0679514019871
                144p: 1158.537115184671

Unfortunately, these framerates are a bit too slow for my taste (really within the margin of error compared to OpenCV), though I'm unsure what exactly is causing it: the yuv444p-to-bgr24 conversion, or the conversion to ndarray.

@Thomasedv

Thomasedv commented Jul 20, 2020

@golyalpha Can you share some sample code? There are a few ways to improve performance. Here is a snippet from my loading:

# No idea if buffer size affects anything, but I increased it just to be safe.
self.container = av.open(path, buffer_size=32768 * 1000)
self.v_stream = self.container.streams.video[0]

# The flags below have been tested; enabling them gives only minimal time gains.
cc = self.v_stream.codec_context

# Fast decode: non-significant gain, could possibly break stuff?
cc.flags2 |= cc.flags2.FAST

# Not used, seems slower, might be worth trying for specific uses:
# cc.flags |= cc.flags.LOW_DELAY

# AUTO or FRAME mode is faster than the default SLICE.
# Important!
self.v_stream.thread_type = 'AUTO'

# Iterator that fetches frames from FFmpeg and converts them to PIL images.
self.frame_iter = (i.to_image() for i in self.container.decode(self.v_stream))

for i in self.frame_iter:
    pass

Some basic code that iterates all frames, which I previously used. Take note that with or without FAST the difference is negligible for AUTO/FRAME; depending on the run, one or the other is faster.

Timetable: tested on a ~6 s 1080p 24 fps video. The conversion to images is by far the bigger factor; without it, the run can take as little as 0.23 seconds.

Benchmark [seconds]
1.8150 Flags FRAME
1.8500 Flags AUTO
1.8527 Flags FAST,AUTO
2.0453 Flags FAST,FRAME
3.2467 Flags SLICE
3.2507 Flags FAST,LOW_DELAY,FRAME
3.2537 Flags LOW_DELAY,FRAME
3.2537 Flags FAST,SLICE
3.2553 Flags FAST,LOW_DELAY,SLICE
3.2573 Flags LOW_DELAY,SLICE
3.2640 Flags LOW_DELAY,AUTO
3.4810 Flags FAST,LOW_DELAY,AUTO

@abhiTronix
Owner

@golyalpha Are you using the system FFmpeg or the one provided with PyAV? Try this to use the system one (uninstall the other one): https://pyav.org/docs/stable/overview/about.html#bring-your-own-ffmpeg

@golyalpha
Author

@abhiTronix I'm on Windows, so there's only statically compiled FFmpeg. I'm going to try out the suggestions provided by @Thomasedv, though.

@abhiTronix
Owner

@Thomasedv Take a look at this: https://pyav.org/docs/stable/overview/about.html#unsupported-features. Unfortunately, they don't support hardware decoding.

@abhiTronix abhiTronix added the OPEN MIC 🎙️ Issue/PR in context is open for discussion, Tune in to add your own views. label Jul 20, 2020
@golyalpha
Author

golyalpha commented Jul 20, 2020

Alright, here are the benchmarks for the AUTO and FRAME thread modes:
AUTO:

Results:
        Encode:
                1080p: 71.98127191267378
                900p: 92.41501457500298
                720p: 141.62948499270368
                480p: 383.9361130307915
                360p: 524.1047799011076
                240p: 1049.0982344988033
                144p: 1855.4381733211183
        Decode:
                1080p: 69.06515961116662
                900p: 96.55518458696021
                720p: 142.90701570434987
                480p: 394.51797547183196
                360p: 551.2138532902977
                240p: 927.1320560544067
                144p: 1528.588425018916

FRAME:

Results:
        Encode:
                1080p: 73.55168007667027
                900p: 95.02909909800753
                720p: 129.28694639800915
                480p: 348.7019859159272
                360p: 468.755127009201
                240p: 1026.2475642655454
                144p: 1757.4164806123872
        Decode:
                1080p: 64.21718766606168
                900p: 91.79361979266271
                720p: 143.48533532044576
                480p: 396.51005070372173
                360p: 561.1878288648363
                240p: 951.1565270738812
                144p: 1548.841756773775

As for hardware decoding, the FFmpeg team makes a valid point that most CPUs can decode video just fine, and as the benchmarks above show, it may be unnecessary.

While I do feel there's more performance to be had from FFmpeg software decode, I believe the main goal of this issue (getting decode speeds up to a reasonable frame rate, i.e. at least in the ballpark of encode performance) has been met, and any further performance gains can still be achieved once the video-source gear with the PyAV backend has been implemented.

It should also be noted that all the benchmarks above include the conversion to an ndarray with the bgr24 pixel format in their timings, so there shouldn't be any undue performance drop going from my code to VidGear's implementation.

@Thomasedv

Thomasedv commented Jul 20, 2020

@golyalpha That's good (reasonably close, I guess?), and thanks for the tip on the FFmpeg backend for OpenCV. Also, depending on your CPU, encoding may be eating into the CPU time available for decoding.

@abhiTronix I wouldn't mind a PyAV backend, but the reason I like VidGear's WriteGear is that I can basically just supply the same arguments I use with FFmpeg for other applications and I'm good to go; no need to figure out what to call to make it work the way I want. So I hope that won't get completely replaced. Not to mention it brings hardware support, given the user has an FFmpeg build with support.

@golyalpha
Author

golyalpha commented Jul 20, 2020

As for the code that I currently use to run the decode benchmark, here it is:

import av
from os import path
from time import perf_counter

from tqdm import tqdm

container = av.open(
    path.join(
        data,
        f"{label}.mp4"
    )
)
stream = container.streams.video[0]
stream.thread_type = 'FRAME'
start = perf_counter()
for frame in tqdm(
        container.decode(stream),
        desc=f"Decoding ({res['width']}x{res['height']})",
        unit="f"
    ):
    frame.to_ndarray(format="bgr24")
runs[label] = BENCHMARK_FRAMES / (perf_counter() - start)
container.close()

data is the path to a folder with the videos to decode (1080p.mp4, 900p.mp4, etc.).
label comes from iterating over my RESOLUTIONS constant, a dict of dicts that mostly specifies resolution names and their encode parameters.
runs is a dict (initially empty) that maps each resolution name to the FPS achieved during the benchmark.

In my case, the data folder is generated from a single source file during the encode benchmarks.

I'm gonna write some transcode benchmarks now to measure the throughput, but yeah.

@golyalpha
Author

golyalpha commented Jul 20, 2020

Alright, here are the results for the full benchmark suite. It's probably no surprise that the transcode benchmark took a performance hit compared to both the encode and decode benchmarks:

Results:
        Encode:
                1080p: 72.49778428646773
                900p: 95.24675059757413
                720p: 146.1157946973238
                480p: 386.1656789138575
                360p: 541.801571495458
                240p: 996.2011529368015
                144p: 1911.881386878759
        Decode:
                1080p: 69.3150663488147
                900p: 96.72077482045493
                720p: 146.651004339403
                480p: 405.85843116776283
                360p: 558.9751377175018
                240p: 940.221868855512
                144p: 1552.6584099742868
        Transcode:
                1080p: 43.92896890729546
                900p: 60.798654160991546
                720p: 92.93422626603142
                480p: 260.05522489413437
                360p: 351.4173172889867
                240p: 574.5021579737244
                144p: 1043.884468086483

Fortunately, this can be resolved by using two machines instead of one and passing the frames between them using VidGear's NetGear.

@abhiTronix abhiTronix changed the title VidGear video decoding too slow - ffmpeg might be better suited for decoding OpenCV video decoding too slow for vidgear - ffmpeg might be better suited for decoding Sep 11, 2020
@abhiTronix abhiTronix added WORK IN PROGRESS 🚧 currently been worked on. PROPOSAL 📩 A proposal/proposition and removed PROPOSAL 📩 A proposal/proposition PR WELCOMED 📬 Related Pull Requests are welcomed for this issue! labels Mar 7, 2022
@abhiTronix abhiTronix added this to the 0.2.6 milestone Mar 7, 2022
@abhiTronix
Owner

🎉 A new library is here: https://github.com/abhiTronix/deffcode

👍🏽 Any suggestions are most welcome.

@zorrobyte

@abhiTronix I'm using a Logitech Brio 4K webcam that needs MJPG to stream 4K. Using VideoGear for this is very CPU-intensive: on Windows I get full FPS no problem, but on Ubuntu I get 100% CPU usage and lowered FPS.

Is there a way to hardware-accelerate webcams using OpenCV/VideoGear?

I'm using this for a vehicle autonomy project (Openpilot Webcam).

@abhiTronix
Owner

@zorrobyte Use deffcode instead, since VideoGear works on the OpenCV backend, which is itself slower.

@zorrobyte

zorrobyte commented May 9, 2022

@zorrobyte use deffcode instead since Videogear works on OpenCV backend which itself is slower.

Is it still possible to use cv options with deffcode? For example:

  cap_road.set(cv::CAP_PROP_FRAME_HEIGHT, 480);
  cap_road.set(cv::CAP_PROP_FPS, s->fps);
  cap_road.set(cv::CAP_PROP_AUTOFOCUS, 0); // off
  cap_road.set(cv::CAP_PROP_FOCUS, 0); // 0 - 255?
  // cv::Rect roi_rear(160, 0, 960, 720);

  // transforms calculation see tools/webcam/warp_vis.py
  float ts[9] = {1.50330396, 0.0, -59.40969163,
                  0.0, 1.50330396, 76.20704846,
                  0.0, 0.0, 1.0};

Is there a webcam code example for me to try?

@abhiTronix
Owner

webcam code example

Unfortunately, webcams are not yet supported by deffcode. You can try the CUDA backend for OpenCV VideoCapture decoding to speed things up. FYI, VideoGear (and vidgear in general) cannot accelerate beyond the speed of OpenCV itself: it can offload processing to different threads via multi-threading, but if the producer thread is already slow, it cannot do anything about that.

@AbdulrahmanSoliman1

AbdulrahmanSoliman1 commented Jul 4, 2022

Can you give us an estimate of when webcams will be supported? I tried adding it to the deffcode files:

17:27:04 ::   FFhelper    ::  DEBUG   :: URL scheme `video` is supported by FFmpeg.
17:27:05 ::    Sourcer    ::  DEBUG   :: Retrieving Metadata...
17:27:05 ::   FFdecoder   :: CRITICAL :: Activating Video-Only Mode of Operation.
17:27:06 ::   FFdecoder   ::   INFO   :: Live/Network Stream detected! Number of frames for given source is unknown.
17:27:06 ::   FFdecoder   ::  DEBUG   :: Executing FFmpeg command: `C:\Users\ABOD_\AppData\Local\Temp\ffmpeg-static-win64-gpl/bin/ffmpeg.exe -f dshow -rtbufsize 1024M -i video=Integrated Camera -pix_fmt bgr24 -s 1280x720 -framerate  -f rawvideo -`
17:27:07 :: Helper_Async  ::  DEBUG   :: Found valid WebGear data-files successfully.
17:27:07 ::  WebGear_RTC  ::  DEBUG   :: `C:\Users\ABOD_\.vidgear\webgear_rtc` is the default location for saving WebGear_RTC data-files.
17:27:07 ::  WebGear_RTC  :: CRITICAL :: Using custom stream for its Default Internal Video-Server.
17:27:07 ::  WebGear_RTC  ::  DEBUG   :: Setting params:: Size Reduction:1%
17:27:07 ::  WebGear_RTC  ::  DEBUG   :: Running Starlette application.
ffmpeg version 2022-06-30-git-03b2ed9a50-full_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
INFO:     Started server process [20196]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
Input #0, dshow, from 'video=Integrated Camera':
  Duration: N/A, start: 28852.888804, bitrate: N/A
  Stream #0:0: Video: mjpeg (Baseline) (MJPG / 0x47504A4D), yuvj422p(pc, bt470bg/unknown/unknown), 1280x720, 30 fps, 30 tbr, 10000k tbn
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> rawvideo (native))
Press [q] to stop, [?] for help
Output #0, rawvideo, to 'pipe:':
  Metadata:
    encoder         : Lavf59.25.100
  Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24(pc, gbr/unknown/unknown, progressive), 1280x720, q=2-31, 663552 kb/s, 30 fps, 30 tbn
    Metadata:
      encoder         : Lavc59.34.100 rawvideo

and it worked, but the frames are not showing on the WebGear_RTC website; it always gets stuck at setting timestamps.

@abhiTronix
Owner

abhiTronix commented Jul 5, 2022

@AbdulrahmanSoliman1 It will happen eventually, but I need to finish releasing vidgear v0.2.6 before I can work on anything else. I'm doing this in my free time, which is very limited right now; once I get an appropriate amount of time, I'll work on it.
