FFmpeg-Kit on Android consumes excessive time and memory compared to Termux #959

Open
tasy5kg opened this issue Apr 11, 2024 · 14 comments
Labels
android Affect Android platform needs-analysis We don't know that this is. It must be investigated further performance

Comments

@tasy5kg

tasy5kg commented Apr 11, 2024

Background: Termux is a Linux terminal emulator for the Android platform. FFmpeg can be installed in it by executing pkg install ffmpeg, and the currently provided version is 6.1.1.

Recently, I compared the performance of transcoding videos using FFmpeg via Termux and via FFmpeg-Kit and found that Termux consumes much less time and memory.

For example, on my Android 14 phone, FFmpeg in Termux transcoded a video in 44.4 seconds and consumed 604M of memory. However, using FFmpeg-Kit (implemented with com.arthenica:ffmpeg-kit-full-gpl:6.0-2.LTS and calling FFmpegKit.executeAsync()) for the same task required 82.1 seconds and 724M of memory. Screenshots:

To ensure this is not device-specific, I conducted tests on an Android 10 x86-64 emulator. Termux completed the task in 30 seconds, using 585M of memory, while FFmpeg-Kit took 88 seconds and utilized nearly 1GB of memory. Screenshots:

Despite various attempts to find the cause — including making sure both were using the arm64-v8a platform (or both were x86-64), importing different videos, altering bit depths, and hardware acceleration settings, as well as adjusting power options and background process limits — the huge performance gap persisted consistently.

I tested video transcoding on Windows using FFmpeg versions 6.0 and 6.1.1 and observed no significant performance differences.

So I am wondering whether this behavior is unexpected. It would be great if the FFmpeg inside FFmpeg-Kit could perform as well as the one in Termux.

The FFmpeg-Kit project brings great convenience to Android developers like me who develop functions based on FFmpeg. I am eager to provide additional information or assistance if required. Any insights or guidance would be sincerely appreciated.

@tanersener tanersener added the android Affect Android platform label Apr 11, 2024
@tanersener
Collaborator

Thanks for the benchmarks. According to the screenshots:

The test on Termux:

  1. Decodes an h265 file natively by enabling hw accelerated decoders available via the auto flag
  2. Encodes the video stream using the mediacodec hevc encoder

The test on FFmpegKit:

  1. Decodes a file natively. No hw accelerated decoders are used. Input format is also not visible in the screenshot
  2. Encodes the video stream using the x265 encoder

Which means this test mostly compares Termux's hevc_mediacodec encoder with FFmpegKit's x265 encoder.

Is it normal to see FFmpegKit's x265 encoder consume twice as much CPU and RAM? I guess so. We have an ASM Support wiki page, where we list libraries that cannot fully use CPU-specific instructions. Unfortunately, x265 is on that list for both Android and iOS. We also know that there are memory leaks in x265. So it's no surprise to see x265 perform so badly.

If you have time I suggest testing the performance of the same encoder in both implementations.

@tasy5kg
Author

tasy5kg commented Apr 11, 2024

I apologize for not providing a full screenshot initially. However, I am sure I am not making the wrong comparison. The command lines used in each round of comparison are exactly the same except for the output file path.

In the first round of comparison (on my Android 14 phone), the command lines are visible: both used hevc (native) decoding and hevc_mediacodec (hardware-accelerated) encoding. You can also see that both produced an output file of the same size, 22440KB.

In the second comparison (on the Android 10 emulator), both used hevc (native) decoding and libx265 (software) encoding. x265 [info] appears in the command-line output, which proves that Termux used the software encoder libx265 here. The difference in file size may be due to different versions of libx265.

@tasy5kg
Author

tasy5kg commented Apr 11, 2024

This is hevc (native) -> h264 (libx264). Termux consumes 23.7 seconds and 756M of memory; FFmpeg-Kit consumes 32.2 seconds and 896M of memory.
[Two screenshots of these runs were attached here.]
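For reference, the wall-clock times and memory figures quoted so far in this thread can be turned into relative differences with a few lines of Python. This is only a sketch of the arithmetic: the scenario labels are shorthand introduced here, and the emulator run has no memory entry because the thread only gives "nearly 1GB" for it.

```python
# (Termux, FFmpeg-Kit) pairs as quoted in the comments above.
# The scenario labels are shorthand for this sketch, not official names.
reported = {
    "hevc -> hevc_mediacodec (Android 14 phone)": {
        "seconds": (44.4, 82.1),
        "memory_mb": (604, 724),
    },
    "hevc -> libx265 (Android 10 emulator)": {
        "seconds": (30.0, 88.0),
    },
    "hevc -> h264 (libx264)": {
        "seconds": (23.7, 32.2),
        "memory_mb": (756, 896),
    },
}


def percent_increase(baseline: float, value: float) -> float:
    """Return how much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


for scenario, metrics in reported.items():
    for metric, (termux, ffmpeg_kit) in metrics.items():
        change = percent_increase(termux, ffmpeg_kit)
        print(f"{scenario}: {metric} {change:+.0f}%")
```

With these figures, the libx264 run comes out at roughly +36% wall-clock time and the hevc_mediacodec run at roughly +85%.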

@tanersener
Collaborator

Thanks for updating the screenshots. Now, I see the difference for the same codec.

Well, the x265 result is not surprising, as I said before. But the difference for x264 and mediacodec is too large: x264 takes 50% more time, and mediacodec takes almost twice the time it needs on Termux.

On one hand, it is good to have a reference library to compare our performance against. On the other hand, identifying the root cause requires extensive testing.

I'll add a task to the Roadmap to analyse how termux achieves that.

If you have time, I suggest testing FFmpegKit with the following options and checking if the performance improves in any of those scenarios.

  1. No GUI in FFmpegKit test application
  2. Commands executed via synchronous FFmpegKit.execute() calls
  3. Commands executed after redirection is disabled via FFmpegKitConfig.disableRedirection()

@tanersener tanersener added needs-analysis We don't know that this is. It must be investigated further performance labels Apr 11, 2024
@tasy5kg
Author

tasy5kg commented Apr 19, 2024

> If you have time, I suggest testing FFmpegKit with the following options and checking if the performance improves in any of those scenarios.
>
>   1. No GUI in FFmpegKit test application
>   2. Commands executed via synchronous FFmpegKit.execute() calls
>   3. Commands executed after redirection is disabled via FFmpegKitConfig.disableRedirection()

I tested all three scenarios and did not observe any performance improvement.

Could we customize the version of FFmpeg source code when compiling FFmpeg-Kit? If possible, maybe I can try it myself using a newer version of FFmpeg.

@tanersener
Collaborator

Okay. Thanks for checking. Well, this line defines the FFmpeg version to be compiled. You can try using a newer version there.

@tasy5kg
Author

tasy5kg commented Apr 20, 2024

I noticed that the Speed Optimization wiki page mentions that the --speed option can increase the speed of FFmpeg operations but is not set by default. Could the cause be related to this?

Also, I think the performance issues I mentioned may still need to be reproduced by someone else, to make sure they are not caused by my personal compilation environment. (If this issue arose because of a stupid mistake of mine, I would feel really embarrassed to have inconvenienced you!🥹)

@tanersener
Collaborator

Speed and size are two primary concerns for FFmpegKit users.

I previously tested the --speed option, but I didn't observe significant improvements in my tests. Consequently, I decided not to enable it, at least to reduce size, which is still not good enough for most users.

However, these tests were conducted on older versions. It may be necessary to rerun them to reassess the situation. Unfortunately, time constraints are a significant factor for me. I am currently pressed for time, and we have limited contributions to address these issues.

I appreciate your contribution and feedback. I will make an effort to dedicate some time to Termux in the upcoming 1-2 weeks. I will share my findings here.

@tanersener
Collaborator

Today, I conducted some tests on the libx264 encoder using the largest test file from example.com. It is an 18 MB file.

I ran the following command on an arm64-v8a device.

-y -benchmark -i example.mp4 -c:v libx264 compressed_ffmpeg_kit_full_gpl_x264.mp4

I observed a difference in memory usage. Other than that, I didn't see a significant difference between ffmpeg 6.1.1 @ termux 0.118 and ffmpeg-kit-full-gpl-6.0-2 in terms of cpu usage.

This is termux, where FFmpeg 6.1.1 binary is compiled using NDK r26b and Android API Level 24.

bench: utime=224.977s stime=4.960s rtime=32.341s
bench: maxrss=687092kB

This is ffmpeg-kit 6.0, compiled using NDK r22b and Android API Level 24.

bench: utime=241.973s stime=5.484s rtime=33.303s
bench: maxrss=776324kB

I also repeated my tests on a local ffmpeg-kit 6.0 binary compiled on NDK r26d. There is a very small improvement in cpu usage. But, it is nowhere near the difference you observed in your tests.

bench: utime=219.950s stime=4.762s rtime=31.421s
bench: maxrss=777468kB
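The bench: lines printed by -benchmark are easy to compare by script. Below is a minimal Python sketch that parses the Termux and ffmpeg-kit 6.0 (NDK r22b) outputs quoted above and prints the relative change per field; the function name and the regular expression are choices made for this illustration.

```python
import re

# Matches the key=value pairs that ffmpeg prints on its "bench:" lines.
BENCH_FIELD = re.compile(r"(utime|stime|rtime|maxrss)=([0-9.]+)(s|kB)")


def parse_bench(log: str) -> dict[str, float]:
    """Collect utime/stime/rtime (seconds) and maxrss (kB) from a log."""
    result = {}
    for line in log.splitlines():
        if line.startswith("bench:"):
            for key, value, _unit in BENCH_FIELD.findall(line):
                result[key] = float(value)
    return result


# Output copied from the two runs quoted above.
termux = parse_bench(
    "bench: utime=224.977s stime=4.960s rtime=32.341s\n"
    "bench: maxrss=687092kB\n"
)
ffmpeg_kit = parse_bench(
    "bench: utime=241.973s stime=5.484s rtime=33.303s\n"
    "bench: maxrss=776324kB\n"
)

for key in ("utime", "stime", "rtime", "maxrss"):
    change = (ffmpeg_kit[key] / termux[key] - 1.0) * 100.0
    print(f"{key}: {termux[key]:g} -> {ffmpeg_kit[key]:g} ({change:+.1f}%)")
```

The same parser can be pointed at any saved log, which makes it simpler to repeat the comparison across several input files.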

@tasy5kg
Author

tasy5kg commented Apr 23, 2024

In recent days, I've been reflecting on and investigating this issue extensively, conducting numerous tests in an attempt to identify the cause behind these test results.

Upon reviewing the tests I've conducted, I realized that all the video samples used in my tests were recorded using my phone. This is because I'm trying to develop an Android application to compress videos shot on my phone.

When testing with my own recorded videos, visible performance differences were evident regardless of the encoder used. However, after receiving your response, I attempted testing with video samples downloaded from the internet and obtained results similar to yours - no noticeable performance differences during transcoding.

This prompted me to consider that the issue might lie in decoding performance. Videos from the internet are typically compressed and easier to decode, whereas videos recorded on my phone usually have higher resolution and bitrate, making decoding performance the true bottleneck.

Therefore, I selected some videos I recorded and others downloaded from the internet, and tested decoding performance using the ffmpeg -hide_banner -benchmark -an -i <input.mp4> -f null - command. These tests were conducted on my Android phone (Snapdragon 8 Gen 2, arm64-v8a, API 34, 12GB RAM).
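For anyone who wants to repeat this decode-only test over several files, here is a hedged Python sketch that builds the same command. It assumes an ffmpeg binary on PATH (as in Termux); the helper names are invented for this example, and the optional hwaccel argument simply adds ffmpeg's -hwaccel input option.

```python
import subprocess


def decode_benchmark_cmd(input_path, hwaccel=None, ffmpeg="ffmpeg"):
    """Build a decode-only benchmark command: no audio, output discarded."""
    cmd = [ffmpeg, "-hide_banner", "-benchmark", "-an"]
    if hwaccel is not None:
        # -hwaccel is an input option, so it must come before -i.
        cmd += ["-hwaccel", hwaccel]
    cmd += ["-i", input_path, "-f", "null", "-"]
    return cmd


def run_decode_benchmark(input_path, hwaccel=None):
    """Run the command and return ffmpeg's log, which it writes to stderr."""
    completed = subprocess.run(
        decode_benchmark_cmd(input_path, hwaccel),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stderr


print(" ".join(decode_benchmark_cmd("<input.mp4>")))
```

Inside an app that uses FFmpeg-Kit, the same argument string (without the leading ffmpeg) would be passed to the library instead of being run as a subprocess.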

List of files used for testing:

| file name | size | codec | bit_rate (bit/s) | resolution | pix_fmt | color_space | source |
|---|---|---|---|---|---|---|---|
| VID_20240411_190238_.mp4 | 42.3MB | hevc | 15724862 | 1920*1080 | yuv420p | bt709 | shot by me, uploaded to Google Drive |
| VID_20240410_174528_HDR10PLUS_.mp4 | 139.4MB | hevc | 38935277 | 3840*2160 | yuv420p10le | bt2020nc | shot by me, uploaded to Google Drive |
| file_example_MP4_1920_18MG.mp4 | 17.0MB | h264 | 4486713 | 1920*1080 | yuv420p | bt709 | downloaded from file-examples.com |
| 1918465-uhd_3840_2160_24fps.mp4 | 46.0MB | h264 | 25236664 | 3840*2160 | yuv420p | bt709 | downloaded from pexels.com |
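The thread does not say how these stream properties were collected. One way to gather the same fields, assuming ffprobe is available, is to ask it for JSON output and reshape the result. In the sketch below the helper names and the sample JSON are illustrative: the sample mirrors the first row of the list above rather than being real ffprobe output.

```python
import json


def ffprobe_cmd(path, ffprobe="ffprobe"):
    """Build an ffprobe command that reports the first video stream as JSON."""
    return [
        ffprobe, "-v", "error",
        "-select_streams", "v:0",
        "-show_entries",
        "stream=codec_name,bit_rate,width,height,pix_fmt,color_space",
        "-of", "json",
        path,
    ]


def summarize(ffprobe_json: str) -> dict:
    """Reduce ffprobe's JSON to the columns used in the file list."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return {
        "codec": stream["codec_name"],
        "bit_rate": int(stream["bit_rate"]),
        "resolution": f'{stream["width"]}*{stream["height"]}',
        "pix_fmt": stream["pix_fmt"],
        # color_space can be missing when the file does not signal it.
        "color_space": stream.get("color_space", "unknown"),
    }


# Illustrative sample shaped like ffprobe's JSON; values mirror the first row.
sample = json.dumps({"streams": [{
    "codec_name": "hevc", "bit_rate": "15724862",
    "width": 1920, "height": 1080,
    "pix_fmt": "yuv420p", "color_space": "bt709",
}]})

print(summarize(sample))
```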

Test result:

| file name | label | utime (s) | stime (s) | rtime (s) | maxrss (kB) |
|---|---|---|---|---|---|
| VID_20240411_190238_.mp4 | Termux | 15.648 | 0.498 | 8.283 | 152632 |
| VID_20240411_190238_.mp4 | FFmpeg-Kit | 32.697 | 0.538 | 17.049 | 267496 |
| VID_20240411_190238_.mp4 | compare | +109% | +8% | +106% | +75% |
| VID_20240410_174528_HDR10PLUS_.mp4 | Termux | 83.827 | 0.977 | 56.333 | 565168 |
| VID_20240410_174528_HDR10PLUS_.mp4 | FFmpeg-Kit | 148.439 | 2.138 | 101.33 | 644808 |
| VID_20240410_174528_HDR10PLUS_.mp4 | compare | +77% | +119% | +80% | +14% |
| file_example_MP4_1920_18MG.mp4 | Termux | 6.215 | 0.544 | 1.243 | 146376 |
| file_example_MP4_1920_18MG.mp4 | FFmpeg-Kit | 7.329 | 0.447 | 1.441 | 257008 |
| file_example_MP4_1920_18MG.mp4 | compare | +18% | -18% | +16% | +76% |
| 1918465-uhd_3840_2160_24fps.mp4 | Termux | 16.796 | 0.687 | 2.891 | 360164 |
| 1918465-uhd_3840_2160_24fps.mp4 | FFmpeg-Kit | 20.59 | 0.658 | 3.565 | 463252 |
| 1918465-uhd_3840_2160_24fps.mp4 | compare | +23% | -4% | +23% | +29% |
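One further way to read these results is to divide CPU time by wall-clock time, (utime + stime) / rtime, which gives the average number of cores kept busy during a run. The Python sketch below does this for every row. The interpretation is an assumption added here, not something established in the thread: if both builds keep a similar number of cores busy, the slowdown would point to more CPU work per frame rather than to fewer decoding threads.

```python
# (utime, stime, rtime) in seconds, copied from the results above.
runs = {
    "VID_20240411_190238_.mp4": {
        "Termux": (15.648, 0.498, 8.283),
        "FFmpeg-Kit": (32.697, 0.538, 17.049),
    },
    "VID_20240410_174528_HDR10PLUS_.mp4": {
        "Termux": (83.827, 0.977, 56.333),
        "FFmpeg-Kit": (148.439, 2.138, 101.33),
    },
    "file_example_MP4_1920_18MG.mp4": {
        "Termux": (6.215, 0.544, 1.243),
        "FFmpeg-Kit": (7.329, 0.447, 1.441),
    },
    "1918465-uhd_3840_2160_24fps.mp4": {
        "Termux": (16.796, 0.687, 2.891),
        "FFmpeg-Kit": (20.59, 0.658, 3.565),
    },
}


def busy_cores(utime: float, stime: float, rtime: float) -> float:
    """Average number of CPU cores kept busy over the whole run."""
    return (utime + stime) / rtime


for name, by_label in runs.items():
    for label, timings in by_label.items():
        print(f"{name} [{label}]: {busy_cores(*timings):.2f} cores busy")
```

For each file, the two builds land within about 0.1 cores of each other, so the gap in rtime tracks the gap in total CPU time quite closely.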

Let's review the log output. On the left side of the screenshot is the output from Termux, and on the right side is the output from FFmpeg-Kit.

[Screenshot: Termux log output (left) and FFmpeg-Kit log output (right)]

Here I'm taking VID_20240411_190238_.mp4 as an example, and the logs for the other files are similar. Statistics logs are omitted.

It appears that they both use the decoder called native, but there is a significant performance difference.

It seems like this is the real issue at hand.

@tasy5kg
Author

tasy5kg commented Apr 23, 2024

Additionally, I also tested not using the native decoder, but using the Android hardware accelerated decoder.

Command line used for testing:

ffmpeg -hide_banner -an -benchmark -hwaccel mediacodec -i <input.mp4> -f null -

Log output in Termux:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/storage/emulated/0/FFmpegTest/VID_20240420_182825_8K_.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf60.16.100
Duration: 00:01:08.07, start: 0.000000, bitrate: 105099 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709), 7680x4320, 105097 kb/s, 23.99 fps, 24 tbr, 90k tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
[hevc_mediacodec @ 0xb4000073052a0c00] Both surface and native_window are NULL
[hevc_mediacodec @ 0xb4000073052a0c00] Using surface 0x0
[hevc_mediacodec @ 0xb4000073052a0c00] No Java virtual machine has been registered
[hevc_mediacodec @ 0xb4000073052a0c00] Failed to getCodecNameByType
[hevc_mediacodec @ 0xb4000073052a0c00] Output crop parameters top=0 bottom=4319 left=0 right=7679, resulting dimensions width=7680 height=4320
[hevc_mediacodec @ 0xb4000073052a0c00] MediaCodec started successfully: codec = c2.qti.hevc.decoder, ret = 0
Stream mapping:
Stream #0:0 -> #0:0 (hevc (hevc_mediacodec) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
[hevc_mediacodec @ 0xb4000073052a0c00] Output MediaFormat changed to android._color-format: int32(2141391876), android._video-scaling: int32(1), android._dataspace: int32(260), color-standard: int32(1), color-range: int32(2), color-transfer: int32(3), sar-height: int32(1), rotation-degrees: int32(0), hdr-static-info: data, sar-width: int32(1), crop: Rect(0, 0, 7679, 4319), width: int32(7680), feature-secure-playback: int32(0), frame-rate: int32(30), hdr10-plus-info: data, height: int32(4320), max-height: int32(4320), max-width: int32(8192), mime: string(video/raw), priority: int32(1), color-format: int32(21), image-data: data, stride: int32(7680), slice-height: int32(4320)}
[hevc_mediacodec @ 0xb4000073052a0c00] Output crop parameters top=0 bottom=4319 left=0 right=7679, resulting dimensions width=7680 height=4320
[hevc_mediacodec @ 0xb4000073052a0c00] Output MediaFormat changed to android._color-format: int32(2141391876), android._video-scaling: int32(1), android._dataspace: int32(260), color-standard: int32(1), color-range: int32(2), color-transfer: int32(3), sar-height: int32(1), rotation-degrees: int32(0), hdr-static-info: data, sar-width: int32(1), crop: Rect(0, 0, 7679, 4319), width: int32(7680), feature-secure-playback: int32(0), frame-rate: int32(30), hdr10-plus-info: data, height: int32(4320), max-height: int32(4320), max-width: int32(8192), mime: string(video/raw), priority: int32(1), color-format: int32(21), image-data: data, stride: int32(7680), slice-height: int32(4320)}
[hevc_mediacodec @ 0xb4000073052a0c00] Output crop parameters top=0 bottom=4319 left=0 right=7679, resulting dimensions width=7680 height=4320
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf60.16.100
Stream #0:0(und): Video: wrapped_avframe, nv12(tv, bt709/bt709/smpte170m, progressive), 7680x4320, q=2-31, 200 kb/s, 24 fps, 24 tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 wrapped_avframe
[out#0/null @ 0xb40000730521cfc0] video:765kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
frame= 1633 fps= 54 q=-0.0 Lsize=N/A time=00:01:08.04 bitrate=N/A speed=2.26x
bench: utime=20.202s stime=7.498s rtime=30.168s
bench: maxrss=505652kB

Log output in FFmpegKit:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/storage/emulated/0/FFmpegTest/VID_20240420_182825_8K_.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf60.16.100
Duration: 00:01:08.07, start: 0.000000, bitrate: 105099 kb/s
Stream #0:0[0x1](und): Video: hevc (hvc1 / 0x31637668), yuv420p(tv, bt709), 7680x4320, 105097 kb/s, 23.99 fps, 24 tbr, 90k tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
[hevc_mediacodec @ 0x70158de800] Both surface and native_window are NULL
[hevc_mediacodec @ 0x70158de800] Using surface 0x0
[hevc_mediacodec @ 0x70158de800] Output crop parameters top=0 bottom=4319 left=0 right=7679, resulting dimensions width=7680 height=4320
[hevc_mediacodec @ 0x70158de800] MediaCodec started successfully: codec = c2.qti.hevc.decoder, ret = 0
Stream mapping:
Stream #0:0 -> #0:0 (hevc (hevc_mediacodec) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
[hevc_mediacodec @ 0x70158de800] Output MediaFormat changed to {crop-right=7679, max-height=4320, sar-width=1, color-format=21, slice-height=4320, image-data=java.nio.HeapByteBuffer[pos=0 lim=104 cap=104], mime=video/raw, hdr-static-info=java.nio.HeapByteBuffer[pos=0 lim=25 cap=25], priority=1, stride=7680, color-standard=1, feature-secure-playback=0, color-transfer=3, sar-height=1, hdr10-plus-info=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0], crop-bottom=4319, max-width=8192, crop-left=0, width=7680, color-range=2, crop-top=0, rotation-degrees=0, frame-rate=30, height=4320}
[hevc_mediacodec @ 0x70158de800] Output crop parameters top=0 bottom=4319 left=0 right=7679, resulting dimensions width=7680 height=4320
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf60.3.100
Stream #0:0(und): Video: wrapped_avframe, nv12(tv, bt709/bt709/smpte170m, progressive), 7680x4320, q=2-31, 200 kb/s, 24 fps, 24 tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
encoder : Lavc60.3.100 wrapped_avframe
frame= 1633 fps= 54 q=-0.0 Lsize=N/A time=00:01:08.04 bitrate=N/A speed=2.25x
video:765kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=22.278s stime=6.367s rtime=30.356s
bench: maxrss=620108kB

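As you can see, with the Android hardware-accelerated decoder hevc_mediacodec the decoding speed of Termux and FFmpegKit is nearly identical (rtime 30.168s vs 30.356s), although FFmpegKit's peak memory usage is still about 23% higher (620108kB vs 505652kB).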

@tanersener
Collaborator

Thanks for running those tests. I need some time to review them.

@tanersener
Collaborator

I ran your test scenarios on my end. The results from the native decoder are consistent with the figures in your tests. However, in my case, the MediaCodec decoder in ffmpeg-kit was also 40% slower.

I've noticed the following differences between the termux builds and ffmpeg-kit builds. I believe these differences contribute to the performance gap between the two.

  1. The termux binaries are compiled using a custom Android NDK toolchain, while ffmpeg-kit utilizes the default LLVM toolchain provided with the Android NDK
  2. The termux toolchain implements & enables certain native libraries that are not included in the Android NDK
  3. Several external libraries in termux are compiled with ASM, which unfortunately wasn't possible for the same libraries in ffmpeg-kit
  4. FFmpeg is compiled with different configuration options
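The last point can be checked directly, because every ffmpeg build reports its configure flags in the configuration: line of its banner (and via ffmpeg -buildconf). Below is a small Python sketch for diffing two such lines. The two sample strings are hypothetical and shortened for illustration; they are not the actual Termux or ffmpeg-kit configurations.

```python
import shlex


def configure_flags(configuration: str) -> set[str]:
    """Split an ffmpeg "configuration:" line into a set of flags."""
    text = configuration.strip()
    if text.startswith("configuration:"):
        text = text[len("configuration:"):]
    # shlex keeps quoted values such as --extra-cflags='-O2 -g' together.
    return set(shlex.split(text))


def diff_flags(a: str, b: str) -> tuple[list[str], list[str]]:
    """Return (flags only in a, flags only in b), each sorted."""
    flags_a, flags_b = configure_flags(a), configure_flags(b)
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)


# Hypothetical, shortened configuration lines for illustration only.
build_a = "configuration: --enable-neon --enable-asm --enable-mediacodec --enable-jni"
build_b = "configuration: --disable-asm --enable-neon --enable-mediacodec --enable-jni --enable-small"

only_a, only_b = diff_flags(build_a, build_b)
print("only in build A:", only_a)
print("only in build B:", only_b)
```

Running this against the real banners from both builds would show exactly which options differ, which narrows down where to look first.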

@tanersener
Collaborator

tanersener commented May 4, 2024

I managed to enable ASM for x265 on 64-bit Android architectures in the development branch. This will speed up x265 operations.

There is also a new --toolchain option defined for android.sh to override the default NDK LLVM toolchain. It can be used to build ffmpeg-kit with custom toolchains, e.g. the Termux toolchain.
