
Add Video Super Resolution for Windows (AMD, Intel and NVIDIA) and MacOS #1180

Open
linckosz wants to merge 49 commits into master
Conversation


@linckosz linckosz commented Feb 8, 2024

Hi,

Context:
Video Super Resolution (VSR) is to video what DLSS is to 3D rendering.
Why not make Moonlight one of the first game-streaming solutions to leverage this technology?
AI upscaling means significantly less bandwidth usage without compromising video quality!
NVIDIA, Intel (link in French), and more recently AMD have started advertising their respective technologies for enhancing video quality with AI.

Solution:
VSR was not a straightforward implementation: I needed to add a Video Processor component to D3D11VA in order to offload frame processing from the CPU to the GPU and leverage the GPU's additional capabilities.
I added a UI checkbox in SettingsView.qml, but the main processing logic lives in d3d11va.cpp.
NVIDIA provides VSR and HDR enhancement; I could implement VSR perfectly for SDR content, but not yet for HDR (more detail below).
Intel provides VSR; it has been implemented but has yet to be tested on an Arc GPU (I don't have one).
AMD just released AMF Video Upscaling; I prepared the code but need an RX 7000 series card (I don't have one), and apparently it may require a rather different implementation approach.
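For reference, on Windows the NVIDIA (and, analogously, Intel) VSR path is toggled through a private ID3D11VideoProcessor stream extension. The sketch below is only illustrative: the GUID value and payload layout follow what is publicly visible in Chromium's D3D11 video integration and are assumptions that may differ across driver releases; `GuidT` and `NvVSRStreamExtension` are hypothetical names, not Moonlight's.

```cpp
#include <cstdint>

// GUID layout mirroring Windows' GUID struct, so this sketch also compiles
// without <windows.h>.
struct GuidT {
    uint32_t Data1;
    uint16_t Data2, Data3;
    uint8_t  Data4[8];
};

// Private NVIDIA RTX VSR interface GUID, as seen in Chromium's D3D11 video
// processor code (assumption: subject to change between driver releases).
constexpr GuidT kNvidiaVSRInterfaceGUID =
    { 0xd43ce1b3, 0x1f4b, 0x48ac,
      { 0xba, 0xee, 0xc3, 0xc2, 0x53, 0x57, 0x5b, 0xfe } };

// Payload passed to VideoProcessorSetStreamExtension for this GUID.
struct NvVSRStreamExtension {
    uint32_t version;  // 0x1
    uint32_t method;   // 0x2 = super resolution
    uint32_t enable;   // 1 = on, 0 = off
};

#ifdef _WIN32
// On Windows, enabling VSR for stream 0 would look roughly like:
//   NvVSRStreamExtension ext = { 0x1, 0x2, 1 };
//   HRESULT hr = videoContext->VideoProcessorSetStreamExtension(
//       videoProcessor, 0,
//       reinterpret_cast<const GUID*>(&kNvidiaVSRInterfaceGUID),
//       sizeof(ext), &ext);
#endif
```

Intel's path takes the same shape but uses its own vendor GUID and payload.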

Testing:
The solution runs stably on my rig; I tried different configurations (size, bandwidth, V-Sync, HDR, AV1, etc.) over several days.
AMD Ryzen 5600x
32GB DDR4 3200
RTX 4070 Ti
Moonlight v5.0.1
Sunshine v0.21.0

(Update May 6th, 2024)
A complete report is available in the comment below.
I could test it with a wider range of GPUs:

  • Nvidia RTX 4070 Ti (Windows)
  • AMD RX 7600 (Windows)
  • Intel Arc A380 (Windows)
  • Intel UHD Graphics (16EU), an iGPU from N95 CPU (Windows)
  • M1 Pro (MacOS)

Improvements:

  • (Minor bug) With an RTX GPU, when HDR is active, I needed to force the SwapChain format to DXGI_FORMAT_R8G8B8A8_UNORM instead of DXGI_FORMAT_R10G10B10A2_UNORM. Otherwise, with DXGI_FORMAT_R10G10B10A2_UNORM and VSR active, the screen becomes much darker. I tried many ColorSpace combinations.
  • (Minor bug) With an RTX GPU, when HDR is active, VSR adds a kind of extra white border to many elements, like an over-sharpened picture. SDR looks fine in comparison.
  • (Medium bug) With an RTX GPU in windowed mode, I can scale the window down arbitrarily without crashing, but scaling up while keeping the aspect ratio turns the screen black and often crashes Moonlight. To avoid the crashes, I simply allow the picture to be stretched beyond the initial window size.
  • (Minor bug) With an RTX GPU, I also implemented NVIDIA HDR enhancement. I can activate it with certain color space settings (see the comments in the enableNvidiaHDR method), but in that configuration the screen is always darker. The color space, the format, and perhaps the Sunshine side need more work to understand this behavior. So the feature is there, but users cannot enable it in the current configuration.
  • (Test) I don't have an Intel dGPU; I could only try an Intel N95, which has an iGPU (Intel supports this on iGPUs since 10th Gen Comet Lake). The code works, but I could barely see an improvement; apparently the best results are on the Arc GPU series. I need someone to test it (comparison pictures like those below).
  • (Improvement) AMD VSR has yet to be implemented. The documentation (Upscaling and Denoising) is very clear and it looks achievable at first sight, but it requires an RX 7000 series card, which I don't have...

Results (comparison):
Resolution Test
Banding Test


Commit descriptions

USER INTERFACE
Add a new UI feature called "Video AI-Enhancement" (VAE).

Changes made:

  1. Create a new class, VideoEnhancement, which checks eligibility for the feature.
  2. Add the "Video AI-Enhancement" checkbox to the "Basic Settings" group box.
  3. Disable VAE when fullscreen is selected.
  4. Add a registry record.
  5. Show the mention "AI-Enhanced" on the overlay when activated.
  6. Add a command-line option for the VideoEnhancement class.

BACKEND PROCESSING
Add a VideoProcessor to D3D11VA to offload video processing from the CPU to the GPU and leverage additional GPU capabilities, such as AI-enhanced upscaling and filtering.

Changes made:

  1. VideoProcessor renders the frame only when "Video AI-Enhancement" is enabled; when disabled, the existing pipeline is unchanged.
  2. Add methods to enable Video Super Resolution for NVIDIA and Intel. The AMD method is currently empty; a proof of concept based on the AMF documentation is still needed.
  3. Add methods to enable SDR-to-HDR conversion. Currently only NVIDIA offers this feature, but the code is structured so Intel and AMD can be added too.
  4. Some variables that were local to a method (like BackBufferResource) were moved to class scope so the VideoProcessor methods can also use them.
  5. In ::initialize(), the application checks whether the system can leverage GPU AI enhancement; if so, it informs the UI to display the feature.
  6. The ColorSpace setup (Source/Stream) for HDR is not optimal; further improvement may be possible. Observed issues are commented in the code at the relevant places.

@linckosz linckosz changed the title Vsr Add Video Super Resulotion using NVIDIA and Intel GPUs Feb 8, 2024
@linckosz linckosz changed the title Add Video Super Resulotion using NVIDIA and Intel GPUs Add Video Super Resolution using NVIDIA and Intel GPUs Feb 8, 2024
@cgutman
Member

cgutman commented Feb 25, 2024

I'm going to introduce usage of ID3D11VideoProcessor for color conversion, so you can wait to make further changes until that new code is in to minimize conflicts or duplicated work.

m_IsHDRenabled was a duplication of the existing condition "m_DecoderParams.videoFormat & VIDEO_FORMAT_MASK_10BIT".

Replace the variable m_IsHDRenabled (2)

m_IsHDRenabled was a duplication of the existing condition "m_DecoderParams.videoFormat & VIDEO_FORMAT_MASK_10BIT".
Remove the VideoEnhancement::getVideoDriverInfo() method (which was based on the Windows Registry)
and use the adapter's existing CheckInterfaceSupport() method.
@linckosz
Author

I'm going to introduce usage of ID3D11VideoProcessor for color conversion, so you can wait to make further changes until that new code is in to minimize conflicts or duplicated work.

No worries, I'll keep going with the current ID3D11VideoProcessor implementation and will make the change once your code is available.

- MaxCLL and MaxFALL at 0 as the source content is unknown in advance.
- Output ColorSpace matched SwapChain
…oProcessor"

"reset" was not used in latest code
 - NVIDIA: After updating NVIDIA driver to 551.61, VSR works in Exclusive Fullscreen (Tested on a RTX 4070 Ti)
 - Intel: VSR works in Exclusive Fullscreen (Test on a Arc a380)
 - AMD: VSR is WIP
@linckosz
Author

linckosz commented Feb 27, 2024

I did many tests on color space with HDR on; as part of the final result, I found something interesting that I wanted to share.

I have two graphics cards in the same PC: an RTX 4070 Ti and an Arc A380.
I streamed from another PC running on an Intel N95 CPU.
Attached are two HDR pictures (GPU Output.zip) from the same Moonlight session (H.265 decoding via the Arc) on the same display (Gigabyte M27Q) with the same DP cable, yet they are still quite different. The reason is that one is displayed via the RTX (DP output) and the other via the Arc (DP output).
Arc A380: The picture is clear and the colors are close to accurately rendered; it is perfectly usable. There are just minor artifacts around very high-contrast areas like text.
RTX 4070 Ti: The picture's colors are washed out, and there is significant banding. Unpleasant to use.

Conclusion:
HDR output is handled quite differently from one GPU to another and can lead to a poor-quality picture.

…nt is On

- Simplification of the VideoEnhancement class, as all properties are set at D3D11VA initialization
- Since it never changes during a session, scan all GPUs only once at application launch and keep track of the most suitable adapter index with VideoEnhancement->m_AdapterIndex.
- Adapt setHDRoutput, as the adapter might be different (not the one linked to the display).
- If Video Enhancement is off, we keep the previous behavior (using the adapter linked to the display).
- Update setHDRoutput for multiple displays, to make sure we get the HDR information of the display where Moonlight is shown
During the scan, it is useless to enable enhancement capabilities for all GPUs, as this is done later for only the selected GPU.
- [Minor] No need to set HDR Stream and Output if HDR is disabled in the Moonlight UI
- [Minor] During the selection of the most capable video-enhancement GPU, the best result was not saved, resulting in the last scanned GPU being selected.
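The last bullet describes a classic best-candidate scan bug. Here is a minimal sketch of the corrected selection logic, with hypothetical names (`AdapterInfo`, `selectBestAdapter`) standing in for the real DXGI adapter enumeration:

```cpp
#include <vector>

// Hypothetical capability score per adapter; in the real code this would be
// derived from each adapter's VSR/HDR support, not a plain integer.
struct AdapterInfo {
    int index;
    int enhancementScore;
};

// Pick the adapter with the best enhancement capability, remembering the
// best score seen so far (the bug was that the running best was never
// saved, so the last scanned GPU always won).
int selectBestAdapter(const std::vector<AdapterInfo>& adapters) {
    int bestIndex = -1;
    int bestScore = -1;
    for (const AdapterInfo& a : adapters) {
        if (a.enhancementScore > bestScore) {
            bestScore = a.enhancementScore;  // save the best result
            bestIndex = a.index;
        }
    }
    return bestIndex;  // -1 when no adapter qualifies
}
```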
@linckosz
Author

linckosz commented Apr 1, 2024

I am using an N100 PC. It keeps showing a black screen when I enable AI enhancement. Is there anything I can do to fix that?

@Kobain-Seo ,
That's expected because the code you tried was outdated. I just updated it today, so you can try again. Since I added a submodule (AMF), don't forget to run "git submodule update --init --recursive".
I have an N95 mini PC and it works pretty well; compared to an Arc A380, the result is similar.
Here is a screenshot using the N95 (source: 1440, stream: 1080@150, display: 1440).

With Video Enhancement:
1080@150E@N95

Without:
1080@150S@N95

@linckosz
Author

linckosz commented Apr 1, 2024

@cgutman ,
Can you help review the code again?
I have finished the code and already pushed the latest commits.
I was able to better understand the DirectX pipeline in place, and I resolved the color issue by applying the shaders (the ones already in use) on top of the vendors' upscaling solutions.

@linckosz linckosz changed the title Add Video Super Resolution using NVIDIA and Intel GPUs Add Video Super Resolution on all GPUs (NVIDIA, Intel and AMD) Apr 1, 2024
When the host has HDR off and Moonlight has HDR on, the screen appears slightly grey.
@Kobain-Seo

@linckosz

Thank you for your great work.
I've tested the latest source.
As I mentioned, my client is an N100.
My setup is HEVC or AV1 at 144 FPS.
I think AI upscaling is a little bit heavy for the N100 at 144 FPS.

Screenshots are below:

av1_ai
av1_hdr_ai
av1_hdr_no_ai
av1_no_ai

@linckosz
Author

linckosz commented Apr 4, 2024

@Kobain-Seo ,
Thanks for sharing your results!
You're right: it looks like the extra computation needed for enhancement hits a CPU limit, and the rendering time can exceed what 144 Hz allows (one frame every ~7 ms).
I tested at 120 Hz with my N95 and observed the same limitation.
With enhancement, I could max out at 1440p 90 fps 50 Mbps, while without it I can reach 1440p 120 fps 150 Mbps.
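As a sanity check on the numbers above, the per-frame budget is simply 1000 / fps milliseconds; a small sketch (hypothetical helper name):

```cpp
// Per-frame time budget in milliseconds for a given refresh rate; decoding,
// enhancement, and presentation must all fit within this window or frames
// are dropped.
double frameBudgetMs(double fps) {
    return 1000.0 / fps;
}
// 144 Hz leaves ~6.94 ms per frame; 120 Hz leaves ~8.33 ms.
```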

The checkbox is greyed out instead of being hidden.
@Kobain-Seo

@linckosz
I have a question.
You mentioned :

(Medium bug) With an RTX GPU in windowed mode, I can scale the window down arbitrarily without crashing, but scaling up while keeping the aspect ratio turns the screen black and often crashes Moonlight. To avoid the crashes, I simply allow the picture to be stretched beyond the initial window size.

Does it mean that Moonlight stretches the screen when the host streams a 4:3 aspect ratio at a 16:9 resolution without black bars?
If it does, it would help with issues like #1002.

I hope there will be a stretch option for Moonlight.

@linckosz
Author

@linckosz I have a question. You mentioned:

(Medium bug) With an RTX GPU in windowed mode, I can scale the window down arbitrarily without crashing, but scaling up while keeping the aspect ratio turns the screen black and often crashes Moonlight. To avoid the crashes, I simply allow the picture to be stretched beyond the initial window size.

Does it mean that Moonlight stretches the screen when the host streams a 4:3 aspect ratio at a 16:9 resolution without black bars? If it does, it would help with issues like #1002.

I hope there will be a stretch option for Moonlight.

I was able to fix the black-screen bug later while scaling up the window, and it now keeps the ratio by adding black bars. That's why the box is checked.
I'm not sure I understand your question. Do you mean you would like to change the host's resolution on the fly based on the window, an option like "the client controls the resolution"? If so, I haven't seen such an option yet; resolutions are currently pre-filled in the drop-down menu, all 16:9. As for the feature I developed, when scaled up it helps with the blur mentioned, but the ratio is something else to develop that may involve Sunshine as well.
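For what it's worth, the "keep the ratio with black bars" behavior boils down to a letterbox computation like the following sketch (hypothetical `letterbox` helper, not the actual Moonlight code):

```cpp
#include <algorithm>

struct Rect { int x, y, w, h; };

// Fit a source picture of srcW x srcH inside a window of dstW x dstH while
// preserving the aspect ratio; the leftover space becomes black bars.
Rect letterbox(int srcW, int srcH, int dstW, int dstH) {
    // Scale by whichever dimension is the limiting one.
    double scale = std::min(static_cast<double>(dstW) / srcW,
                            static_cast<double>(dstH) / srcH);
    int w = static_cast<int>(srcW * scale);
    int h = static_cast<int>(srcH * scale);
    // Center the picture; the offsets are the bar sizes on each side.
    return Rect{ (dstW - w) / 2, (dstH - h) / 2, w, h };
}
```

For example, a 4:3 source (1440x1080) in a 1920x1080 window gets 240-pixel pillarbox bars on each side; the stretch option discussed here would instead use the full window rect.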

@Kobain-Seo

@linckosz I have a question. You mentioned:
(Medium bug) With an RTX GPU in windowed mode, I can scale the window down arbitrarily without crashing, but scaling up while keeping the aspect ratio turns the screen black and often crashes Moonlight. To avoid the crashes, I simply allow the picture to be stretched beyond the initial window size.
Does it mean that Moonlight stretches the screen when the host streams a 4:3 aspect ratio at a 16:9 resolution without black bars? If it does, it would help with issues like #1002.
I hope there will be a stretch option for Moonlight.

I was able to fix the black-screen bug later while scaling up the window, and it now keeps the ratio by adding black bars. That's why the box is checked. I'm not sure I understand your question. Do you mean you would like to change the host's resolution on the fly based on the window, an option like "the client controls the resolution"? If so, I haven't seen such an option yet; resolutions are currently pre-filled in the drop-down menu, all 16:9. As for the feature I developed, when scaled up it helps with the blur mentioned, but the ratio is something else to develop that may involve Sunshine as well.

I played some old games which only support 1280x760 or 1024x768, but the streaming resolution is 1920x1080. Since Moonlight only supports a fixed aspect ratio, there are black bars on the left and right. What I hope for is to stretch (or zoom-to-fit) the 4:3 ratio to 16:9. I wondered whether that 'Medium bug' might be relevant to stretching the screen.

Thank you for your kind reply.

@HakanFly

HakanFly commented Apr 11, 2024

Hi there,

I'm here to give feedback on the Legion Go (Z1 Extreme / 780M RDNA3). It has a 16:10, 1600p-capable screen.

As discussed on Discord, I had serious doubts following a misunderstanding about the functionality implemented in Moonlight with AMD and its VSR equivalent.
I can confirm that it has nothing to do with the new Video Upscale option, and that it works with RDNA2.
And I can confirm that it works. It's even pleasant to have a sharper image.

But I find it hard to justify its use on a handheld PC.

Today, with 5 GHz Wi-Fi, I'm able to stream AV1 1600p => 1600p => 1600p at 100 Mbps @ 120 FPS without any packet loss or jitter. And the Legion Go can do this with a TDP of 6-7 W and 4 cores deactivated. That's very handy for making the battery last a long time (~6 h).

1600p => 1600p => 1600p + AI-Enhanced: impossible to maintain 120 FPS, and at 90-100 FPS I'm at a TDP of 12-13 W.

1600p => 1050p/1200p => 1600p + AI-Enhanced: it's hard to go above 70 FPS. Even at 60 FPS and a TDP of 25 W or 30 W, I still get CPU throttling.

Well, I've had some network problems since I recently changed my ISP box (Freebox Ultra), which has led to some jitter due to repeated latency spikes. I was hoping it would support jumbo frames on wired networks, but it doesn't. I'm waiting for some hardware to fix that; then I'll be able to resume my benchmarks more easily.

- Force the use of FSR 1.0 instead of FSR 1.1, as the latter forces a YUV->RGB->YUV conversion, which is too heavy for iGPUs, and we have no efficient way to distinguish a dedicated GPU from an APU.
- Some memory optimizations in the renderFrame method to accelerate rendering.
 - Add the MetalFX library
 - Check the upscaling capability based on the macOS version (13+)
 - Add a spatial scaler for the luma and chroma textures
@linckosz linckosz changed the title Add Video Super Resolution on all GPUs (NVIDIA, Intel and AMD) Add Video Super Resolution for Windows (AMD, Intel and NVIDIA) and MacOS May 6, 2024
@linckosz
Author

linckosz commented May 6, 2024

Based on the work done by @TimmyOVO on the iOS version (metal branch), I was able to reuse the MetalFX method and apply it to the macOS version.
The result is quite interesting: the picture is significantly sharper, and the latency is identical.
Here is a comparison on a MacBook M1 Pro.

MacOS

@HakanFly

HakanFly commented May 7, 2024

Hey @linckosz,
I'm back with new tests on the Legion Go and build r2416 (https://ci.appveyor.com/project/cgutman/moonlight-qt/builds/49660626).

I've applied a few additional optimizations to the OS, and I'm glad it had a (minimal) impact on Moonlight's power consumption.
And with your latest optimizations, I was able to run all the tests with the TDP limited to 8 W. That's just great!

AV1 / 120 Mbps / 120 FPS
V-Sync Enabled
Frame Pacing Enabled

Here are some screenshots to show the difference in a static context :

As I was saying with the debug version, it really is night and day in terms of performance impact.
Good job!

I share other screenshots here:
https://www.dropbox.com/scl/fo/fvs7dwr64qpgjw6w050va/AHXQSAfdc4g2ZZMde0_j7F4?rlkey=3cw0vpgi82lifxwuiyiyabrk8&e=1&st=fi2cm3u9&dl=0

@linckosz
Author

linckosz commented May 8, 2024

Hi @HakanFly ,
Fantastic, thanks for sharing! Dropping from 25 W to 8 W is certainly a significant improvement.
I was able to achieve it by improving the way some objects were created, to limit memory consumption.
I would not have been able to figure out this issue without your support on this PR 👍.

Comparison Legion Go

@DeNT15T

DeNT15T commented May 9, 2024

Hi,

I have an off-topic question: Is it possible to implement Frame Generation on Moonlight?

As far as I know, using Frame Generation on the host side can cause dropped frames in Moonlight. Therefore, I'm curious whether it's possible to use Frame Generation in Moonlight instead.

@linckosz
Author

Is it possible to implement Frame Generation on Moonlight?

Hi @DeNT15T ,
This is possible (at least I know how for AMD GPUs), but it requires DirectX 12, and Moonlight is currently based on DirectX 11.
If you know how to implement DX12, we can try it and benchmark it.

@HakanFly

Hi @DeNT15T, I have no problems with FSR3 FG in games that support it officially or with a mod. But I have an AMD GPU; I don't know about NVIDIA.
On the other hand, I never use AFMF or Lossless Scaling FG, and I don't think they're compatible.
Beyond that, I think everything's fine on RDNA3 because HAGS and Sunshine work well together.

@linckosz I suppose FSR 3.1 should also bring its share of improvements. After all, I know that AFMF already supports DirectX 11; I thought DirectX 12 was just a limitation for DLSS FG.

@linckosz
Author

linckosz commented May 10, 2024

@HakanFly , @DeNT15T

A few points about frame generation methods with DX11.

AFMF and RSR
They do work with DX11 games, but exclusive fullscreen must be active. Moonlight currently does not seem to use exclusive fullscreen: just set a lower streaming resolution than the display, and the display won't change its resolution. I just tried, and AFMF and RSR are not activated. So it may be possible, but I cannot confirm yet; I need to figure out how to activate exclusive fullscreen.
It would also be limited to recent AMD GPUs.

AMD Frame Interpolation (commonly called FG)
There are 2 methods:

  1. This one won't work, as it uses temporal upscaling (FSR 3), which needs 3D data inputs, something Moonlight cannot provide since it only has 2D data (video frames).
  2. This one won't work with DX11 either, as it uses FSR 3, which only works with DX12 (and soon Vulkan). Nevertheless, it can be investigated, as it seems to work with a 2D texture as a replacement for the SwapChain.

AMD AMF FRC
FRC is only available on DX12 for AMD GPUs. I was referring to this solution in the message above.
It may work if we use DX12.

@HakanFly

Wow, thank you for the feedback!

@linckosz
Author

linckosz commented May 22, 2024

Linux Video Super Resolution results:
Unfortunately, I wasn't able to make it work on Linux; the reasons are below.
The only path I found is to upgrade the OpenGL rendering engine, which is something I am not able to do for now.

Vulkan (plvk.cpp)

  • I need to convert the compute shader FSR.glsl to SPIR-V format using glslang, but the conversion always fails.
  • I could not figure out how to load a shader in SDL_Vulkan; the only code samples I found were based on GLFW.

OpenGL ES2 (eglvid.cpp)

  • ES2 can only load vertex and fragment shaders; to load a compute shader (FSR.glsl), we need to upgrade to ES 3.1. For reference, mpv for Linux has implemented FSR.glsl using ES 3.1.

@cgutman
Member

cgutman commented May 23, 2024

ES2 can only load vertex and fragment shaders; to load a compute shader (FSR.glsl), we need to upgrade to ES 3.1. For reference, mpv for Linux has implemented FSR.glsl using ES 3.1.

Despite the renderer being named "opengles2", I think we do generally get a GLES 3.0+ context from SDL.

If you want to force it, I think you can do so with code like:

    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 1);
