
[HOW-TO] get hardware acceleration #1027

Open
dbuscaino opened this issue Apr 30, 2024 · 6 comments

Comments

@dbuscaino

Hello everyone, I'm trying to get hardware acceleration to reduce CPU consumption while using Picamera2 to stream the camera video.

I have a CM4 with two official Raspberry Pi Camera Module 3s.
Streaming a single camera requires around 45% CPU, while streaming with both cameras requires almost 100%.

I currently stream the video this way:

from picamera2 import Picamera2
from picamera2.encoders import H264Encoder
from picamera2.outputs import FfmpegOutput
from libcamera import controls

# server_ip holds the RTSP URL of the streaming server
tcp_command = "-an -preset ultrafast -tune zerolatency -rtsp_transport tcp -f rtsp " + server_ip
raw_camera_resolution = (2304, 1296)
high_bitrate = 1500000
high_video_size = (1280, 720)

camera = Picamera2(0)
output = FfmpegOutput(tcp_command)

video_config = camera.create_video_configuration(main={"size": high_video_size}, raw={"size": raw_camera_resolution})
camera.configure(video_config)
encoder = H264Encoder(high_bitrate)

# FrameDurationLimits is (min, max) frame duration in microseconds
camera.set_controls({"FrameDurationLimits": (15000, 25000), "AfMode": controls.AfModeEnum.Manual, "LensPosition": 0})
camera.start_recording(encoder, output)

More than a year ago, before Bullseye and using the old legacy camera stack, I was able to stream the camera without using the CPU (with the first Raspberry Pi camera version, though). It was possible by compiling userland and ffmpeg with some options and using ffmpeg to stream the video with almost the same command as the one above.

Now this solution no longer works, and it seems hard to find any topics related to my request.

Thanks in advance for your help

@davidplowman
Collaborator

Hi, thanks for the question. You're right that the Arm cores are busier now that we run more of the camera stack on them. Here are some ideas to consider:

  • I thought results were slightly better if I added "format": "YUV420" to your main stream. This causes slightly less memory traffic.
  • Python is of course less efficient than C++ at processing all the frames and interrupts, and this gets worse with higher framerates. Might you be able to use libcamera-vid instead? This is a pure C++ application.
  • libcamera-vid (also known as rpicam-vid) can stream directly to networks. Might this help?
  • The 45% number is, if I understand correctly, a percentage of just a single core, and you have 4 of them. So there should be scope there to run a second camera as well. (Knowing what Python is like with multi-threading, it might run better if you start a separate Python process for each camera, you'd have to try it and see. There's a rough sketch of this, combined with the YUV420 idea, just after this list.)
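
Putting the first and last ideas together, a very rough and untested sketch (the resolution, bitrate and RTSP URLs are just placeholders for whatever you actually use):

from multiprocessing import Process
import time

from picamera2 import Picamera2
from picamera2.encoders import H264Encoder
from picamera2.outputs import FfmpegOutput

def stream_camera(camera_num, ffmpeg_args):
    # Each process owns its own camera, encoder and ffmpeg output
    camera = Picamera2(camera_num)
    config = camera.create_video_configuration(
        main={"size": (1280, 720), "format": "YUV420"})
    camera.configure(config)
    camera.start_recording(H264Encoder(1500000), FfmpegOutput(ffmpeg_args))
    while True:
        time.sleep(1)  # keep the process alive while recording

if __name__ == "__main__":
    for num in (0, 1):
        args = "-an -f rtsp rtsp://server_ip/cam" + str(num)
        Process(target=stream_camera, args=(num, args)).start()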

I'm afraid I don't know too much about compiling customised versions of FFmpeg, so I'm sorry that I can't help you with that.

@dbuscaino
Author

Hi and thanks for your answer!

  • I thought results were slightly better if I added "format": "YUV420" to your main stream. This causes slightly less memory traffic.

I'll try this for sure! I also read (though not in recent posts) that V4L2 might also improve CPU usage. Do you know anything about that?

  • Python is of course less efficient than C++ at processing all the frames and interrupts, and this gets worse with higher framerates. Might you be able to use libcamera-vid instead? This is a pure C++ application.

Yes, I know C++ would be more efficient than Python, but unfortunately the streaming code is just a small part of a larger Python program.
I'll try it anyway just to check whether CPU consumption is lower, but I don't think it's possible to integrate it into the Python program, because I also use the frames for other purposes. In Python, if I call camera = Picamera2(0) more than once, I get a "Device or resource busy" error (of course), and I think I would get the same error if I tried to access the camera from different programs at the same time.
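
To make it concrete, this is roughly (simplified, and the names are made up) what the program does with the single camera object today:

from picamera2 import Picamera2
from picamera2.encoders import H264Encoder
from picamera2.outputs import FfmpegOutput

camera = Picamera2(0)
camera.configure(camera.create_video_configuration(main={"size": (1280, 720)}))
camera.start_recording(H264Encoder(1500000), FfmpegOutput("-an -f rtsp rtsp://server_ip/cam0"))

while True:
    frame = camera.capture_array("main")   # grab frames for the other parts of the program
    # ... hand the frame to the rest of the program (detection, saving, etc.) ...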

  • The 45% number is, if I understand correctly, a percentage of just a single core, and you have 4 of them. So there should be scope there to run a second camera as well. (Knowing what Python is like with multi-threading, it might run better if you start a separate Python process for each camera, you'd have to try it and see.)

Yes, you understood it correctly!
I'll think about that, but the main problem is that I need the frames for other purposes. Maybe I could use a pipe to provide the frames to the other processes, but that requires a bit of work.

@davidplowman
Collaborator

This will already be using the V4L2 encoder. If you run just the camera, with no encoding, the CPU usage is still relatively high (well, as a percentage of a single core). There does just seem to be quite a lot of code churning over, as camera and encoder interrupts are all passed up to and then handled by Python.

You're right that you can't access the same camera from different processes, but you can access different cameras. Not sure if that helps you, though. Piping buffers to other processes is likely to be expensive, of course. You can improve that by passing shared memory buffers around, but I think the whole thing would start to get really complicated.
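
Just to give a flavour of the shared memory approach (and of why it gets complicated), here's a very rough and untested sketch; the block name and sizes are only illustrative, and there's no synchronisation between writer and readers:

import numpy as np
from multiprocessing import shared_memory

from picamera2 import Picamera2

camera = Picamera2(0)
camera.configure(camera.create_video_configuration(
    main={"size": (1280, 720), "format": "YUV420"}))
camera.start()

frame = camera.capture_array("main")  # numpy array for the main stream
# Create a named block once, sized for one frame; other processes attach to it by name
shm = shared_memory.SharedMemory(create=True, size=frame.nbytes, name="cam0_frame")
shared = np.ndarray(frame.shape, dtype=frame.dtype, buffer=shm.buf)

while True:
    shared[:] = camera.capture_array("main")  # readers see the most recent frame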

Is it a problem that running two cameras will burn most of a CPU core? I would expect you might have problems getting the video encoder to deal with two streams at just under 60fps as well. Another thought would be to put force_turbo=1 into your /boot/firmware/config.txt file. The system isn't always great at realising how busy it is, so this can help too.

@dbuscaino
Author

dbuscaino commented May 10, 2024

This will already be using the V4L2 encoder. If you run just the camera, with no encoding, the CPU usage is still relatively high (well, as a percentage of a single core). There does just seem to be quite a lot of code churning over, as camera and encoder interrupts are all passed up to and then handled by Python.

You're right that you can't access the same camera from different processes, but you can access different cameras. Not sure if that helps you, though. Piping buffers to other processes is likely to be expensive, of course. You can improve that by passing shared memory buffers around, but I think the whole thing would start to get really complicated.

Is it a problem that running two cameras will burn most of a CPU core? I would expect you might have problems getting the video encoder to deal with two streams at just under 60fps as well. Another thought would be to put force_turbo=1 into your /boot/firmware/config.txt file. The system isn't always great at realising how busy it is, so this can help too.

Finally I found some time to try what you suggested!

  • Adding "format": "YUV420" helped a lot: CPU consumption is 10-20% lower!

  • I did not see any improvement from adding force_turbo=1 to my /boot/firmware/config.txt

I'd really like to try using libcamera-vid to stream directly to networks, but I can't make it work...
From the documentation you linked I read:

It is possible to use the libav backend as a network streaming source for audio/video. To do this, the output filename specified by the -o argument must be given as a protocol url, see [ffmpeg protocols](https://ffmpeg.org/ffmpeg-protocols.html) for more details on protocol usage.

I usually stream using the RTSP protocol, and of course I can find it among the ffmpeg protocols.

However, when I try to run this command
libcamera-vid -t 0 --codec libav --libav-format mpegts --libav-audio -o "rtsp://public_ip:port/specific_path"

I get this error (I'm copying the whole output because it might be useful):

[0:47:03.821424383] [3791]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:47:03.871220933] [3794]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:47:03.873578621] [3794]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@0/imx708@1a to Unicam device /dev/media1 and ISP device /dev/media0
[0:47:03.873684507] [3794]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
[0:47:03.883941129] [3794]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:47:03.886311354] [3794]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media2 and ISP device /dev/media3
[0:47:03.886419518] [3794]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
Preview window unavailable
Mode selection for 640:480:12:P
    SRGGB10_CSI2P,1536x864/0 - Score: 1486.67
    SRGGB10_CSI2P,2304x1296/0 - Score: 1786.67
    SRGGB10_CSI2P,4608x2592/0 - Score: 2686.67
[0:47:03.889459339] [3791]  INFO Camera camera.cpp:1183 configuring streams: (0) 640x480-YUV420 (1) 1536x864-SBGGR10_CSI2P
[0:47:03.889966438] [3794]  INFO RPI vc4.cpp:611 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10 - Selected unicam format: 1536x864-pBAA
[h264_v4l2m2m @ 0x55aa98e890]  <<< v4l2_encode_init: fmt=179/0
[h264_v4l2m2m @ 0x55aa98e890] Using device /dev/video11
[h264_v4l2m2m @ 0x55aa98e890] driver 'bcm2835-codec' on card 'bcm2835-codec-encode' in mplane mode
[h264_v4l2m2m @ 0x55aa98e890] requesting formats: output=YU12 capture=H264
Input #0, pulse, from 'default':
  Duration: N/A, start: 1715330568.672152, bitrate: 1536 kb/s
  Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
Output #0, mpegts, to 'rtsp://public_ip:port/specific_path':
  Stream #0:0: Video: h264, drm_prime(tv, smpte170m/smpte170m/bt709), 640x480, q=2-31, 200 kb/s, 30 fps, 30 tbr, 1000k tbn
  Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp, 32 kb/s
#0 (0.00 fps) exp 32876.00 ag 5.99 dg 1.01
#1 (30.01 fps) exp 32876.00 ag 5.99 dg 1.01
terminate called after throwing an instance of 'std::runtime_error'
  what():  libav: unable to open output mux for rtsp://public_ip:port/specific_path: Protocol not found
Aborted

A question related to streaming directly using libcamera: is it possible to use this command from Python and also grab the frames for other purposes (I know I can import libcamera from Python)?

@davidplowman
Collaborator

I don't know too much about rtsp specifically, but I'm not sure that our libav integration is able to support it. In my experience, rtsp can be a bit tricky because it requires a server to deliver sdp descriptions and do some negotiation with the client. @naushir might be able to answer that definitively.

Another alternative would be to output your h.264 stream directly to stdout and pipe that into a separate ffmpeg process. This is less elegant, but it's the same thing that Picamera2 does, and you might get some performance benefits in running libcamera-vid instead.
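
Since your application is Python anyway, one way to drive that pipeline is to launch both programs from Python with subprocess, so the per-frame work stays in the two child processes. A rough and untested sketch (the libcamera-vid and ffmpeg options and the URL are only placeholders):

import subprocess

# libcamera-vid writes H.264 to stdout; ffmpeg reads it on stdin and sends it out over RTSP
vid = subprocess.Popen(
    ["libcamera-vid", "-t", "0", "--width", "1280", "--height", "720",
     "--codec", "h264", "--inline", "-o", "-"],
    stdout=subprocess.PIPE)

ffmpeg = subprocess.Popen(
    ["ffmpeg", "-i", "-", "-c:v", "copy", "-an",
     "-f", "rtsp", "-rtsp_transport", "tcp", "rtsp://public_ip:port/specific_path"],
    stdin=vid.stdout)

ffmpeg.wait()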

@dbuscaino
Author

I don't know too much about rtsp specifically, but I'm not sure that our libav integration is able to support it. In my experience, rtsp can be a bit tricky because it requires a server to deliver sdp descriptions and do some negotiation with the client. @naushir might be able to answer that definitively.

Another alternative would be to output your h.264 stream directly to stdout and pipe that into a separate ffmpeg process. This is less elegant, but it's the same thing that Picamera2 does, and you might get some performance benefits in running libcamera-vid instead.

Yes, I know RTSP is a bit tricky, but I use MediaMTX as a server to manage the client connections and it works very well!

Anyway, I will try to pipe the H.264 stream into ffmpeg, and maybe by compiling it myself I'll reach my goal!

Again, thanks so much for your support!
I'll update this answer later with my results.
