Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drm/vc4: Extend BROADCOM_SAND modifier to handle computing automatically #5940

Draft
wants to merge 896 commits into
base: rpi-6.6.y
Choose a base branch
from

Conversation

6by9
Copy link
Contributor

@6by9 6by9 commented Feb 9, 2024

The vc4 HVS can handle almost any column height in the SAND format, and this was signalled via the vendor bits in DRM_FORMAT_MOD_BROADCOM_SAND128.

DRM is now requiring that all potential modifier values are reported, not just the canonical version of them.
It is not feasible to report all the potential column heights, therefore add an option to go for a lowest common denominator calculation that computes the column height as height * 3/2 as it is YUV420.

tomba and others added 30 commits February 6, 2024 16:52
Track the errors from the CSI-2 receiver: overflows and discards. These
are recorded in a table which can be read by the userspace via debugfs.

As tracking the errors may cause much more interrupt load, the tracking
needs to be enabled with a module parameter.

Note that the recording is not perfect: we only record the last
discarded DT for each discard type, instead of recording all of them.
This means that e.g. if the device is discarding two unmatched DTs, the
debugfs file only shows the last one recorded. Recording all of them
would need a more sophisticated recording system to avoid the need of a
very large table, or dynamic allocation.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Drop 'sensor_embedded_data' field, as it is unused.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Set hardcoded values for enum csi2_mode, as the values will be
programmed to HW registers.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
When pisp_fe_pad_set_fmt() is given an mbus code that CFE does not
support, it currently defaults to MEDIA_BUS_FMT_SBGGR10_1X10. This is
not correct, as FE does not support SBGGR10.

Set the default to MEDIA_BUS_FMT_SRGGB16_1X16 instead.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Set default meta format's field to V4L2_FIELD_NONE, instead of zeroing
it which indicates V4L2_FIELD_ANY. Metadata doesn't have fields, so NONE
makes sense, and furthermore the default v4l2 link validation will check
for matching fields, or that the sink field is NONE.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
When the FE is enabled, ensure that the FE_CONFIG node is enabled.
Otherwise fail cfe_start_streaming() entirely.

Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
It was accidentally placed in the audio decoder section.

Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Allow DT aliases of eg DSI2 to force make DRM allocate the
display with the requested name.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Since "drm/vc4: hvs: Support BCM2712 HVS" booting Pi4
with dual 4kp30 displays connected fails with:
vc4-drm gpu: [drm] *ERROR* [CRTC:107:pixelvalve-4] flip_done timed out

It has been tracked down to the referenced commit adding a
path to clear the SCALER_DISPBKGND_FILL when not required.

Dual 4kp30 works with a core clock of 297MHz when background fill
is enabled, but requires a higher value with it disabled.
320MHz still fails, while 330MHz seems okay.

Lets always enable background fill for Pi0-4.

Fixes: e84da23 ("drm/vc4: hvs: Support BCM2712 HVS")

Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
dtoverlays: Fix up edt5406 entries to match with vc4-kms-dsi-7inch

vc4-kms-dsi-7inch expects the touch fragment to be named ts_i2c_frag,
but edt5406 didn't do this.

dt: Add DSI1 and DSI2 aliases to 2712

In order to keep the DRM names consistent as DSI-1 and DSI-2, add
aliases to the Pi5 DT.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Forcing a gpiochip to have a fixed base number now leads to a warning
message. Remove the need to do so by calculating hwirq numbers based
on bank numbers.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Fixes: 3b0213d ("gpio: Add GPIO support for Broadcom STB SoCs")
Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.

Use two reads and manually combine the results.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
In order to use iommu on hevc set dma mask_and_coherent in probe.
I am assured dma_set_mask_and_coherent is benign on Pi4 (which has
no iommu) and it seems to be so in practice.
Also adds a bit of debug to make internal buffer allocation failure
easier to spot in future

Signed-off-by: John Cox <jc@kynesim.co.uk>
Add iommu to rpivid so it can cope with scatter/gather

Signed-off-by: John Cox <jc@kynesim.co.uk>
Remove the MEDIA_BUS_FMT_PISP* format codcs entirely. For the image
pad formats, use the 16-bit Bayer format mbus codes instead. For the
config and stats pad formats, use MEDIA_BUS_FMT_FIXED.

Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
The firmware wants the YUYV format stride alignment to be to a multiple
of 32pixels / 64 bytes. The kernel driver was configuring it to a multiple
of 16 pixels / 32 bytes, which then failed when it tried starting to
stream.

Correct the alignment requirements.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
See: raspberrypi#4179

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
When polled without the use of IRQ, FT5x06 registers may return
undefined initial data, causing unwanted touches or event spamming.
A simple way to filter this out is to suppress touches until the
TD_STATUS register changes for the first time.

Increase the delay before first polling to 300ms, to avoid
transient I2C read flakiness that seems to occur after reset.

Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>
The RP1 I2C interfaces were being left with their default clock rates,
apparently 400kHz.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
bcm2708_fb is disabled by the vc4-kms-v3d overlay, which means that the
DMA memcpy support it provides is not available to allow vclog to read
the VC logs from the top 16MB on Pi 2 and Pi 3. Add the code to the
vc_mem driver, which will still be enabled.

It ought to be possible to do a proper DMA_MEM_TO_MEM copy via the
generic DMA customer API, but that can be a later step.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
11cf37e switched to using drm_fb_dma_get_gem_addr instead of
drm_fb_dma_get_gem_obj and adding fb->offset[].

However the tiled formats need to compute the offset in a more
involved manner than drm_fb_dma_get_gem_addr applies, and we
were ending up with the offset for src_[xy] being applied twice.

Switch back to using drm_fb_dma_get_gem_obj and fully computing
the offsets ourselves.

Fixes: 11cf37e ("drm/vc4: Move the buffer offset out of the vc4_plane_state")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Now that we have removed unique PISP media bus codes, the cfe format
table has multiple entries with the same media bus code for 16-bit
formats. The test in cfe_video_link_validate() did not account for this.
Fix it by testing the media bus code and the V4L2 pixelformat 4cc
together.

As a drive-by, ensure we have a valid CSI2 datatype id when programming
the hardware block.

Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
@pelwell
Copy link
Contributor

pelwell commented Feb 12, 2024

I can't speak to whether this patch is the best solution to the problem, but I can believe that it might be.

@6by9
Copy link
Contributor Author

6by9 commented Feb 12, 2024

I can't speak to whether this patch is the best solution to the problem, but I can believe that it might be.

I'm waiting for @jc-kynesim to check that we can get this working, and then will send it upstream to see what their response is.

The failure mode isn't critical at present, and ideally I want to avoid having to revert a downstream approach if mainline object, hence leaving it as a draft for now.

It appears that bits in the Root Control Register are reset with
perst_n, which means the PCI layer's call to enable CRS prior to
adding/scanning the bus has no effect. Open-code the enable in
brcm_pcie_start_link as a workaround.

Without CRS visibility, configuration reads issued by the CPU don't
retire if the endpoint returns a CRS response - the RC will poll until a
(large) timeout is reached. This means the core can stall for a long
time during boot.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
@popcornmix
Copy link
Collaborator

We'll want to check that 4kp60 hevc decode doesn't degrade with this change, as it was already somewhat borderline on Pi4. My understanding is we previously would choose a larger height to get better sdram bank switching behaviour between adjacent stripes of image.

I don't know how beneficial this actually was, but it's possible it was providing some necessary performance benefit.

@jc-kynesim
Copy link
Contributor

Pretty sure that the current code simply rounds height up to next x16 and then has col height as 3/2*height.

@6by9
Copy link
Contributor Author

6by9 commented Feb 12, 2024

We'll want to check that 4kp60 hevc decode doesn't degrade with this change, as it was already somewhat borderline on Pi4. My understanding is we previously would choose a larger height to get better sdram bank switching behaviour between adjacent stripes of image.

I don't know how beneficial this actually was, but it's possible it was providing some necessary performance benefit.

I consulted with jc on this and what was expected.

The V4L2 driver does round the column height up to a multiple of 16 and *3/2, however as long as it sets the buffer height to be the padded height, then DRM should auto-compute the correct number. This is the reason I'm slightly concerned over looking at src_h, not buffer h.

@popcornmix
Copy link
Collaborator

Okay - there is a more complicated algorithm on the firmware side for calculating an optimal height.

The requirement in the source that the stride be an odd multiple of two pages in size is actually over-restrictive.
The intention is that in a four-bank SDRAM if a motion compensation source location overlaps a stripe boundary in X
and a page transition in Y then the four pages covered are in different banks (call this the "four corners property").
As we have an eight-bank device we get the desired effect if the stride is a multiple of the page size and stride/pagesize & 7 != 0, 1 or 7.
On Pi 2 our wacky custom SDRAM gives us 8k (64-line) pages, while on Pi 1 we have 4k (32-line) pages.
I think our best bet to satisfy the "four corners" property and avoid aliasing is to pad stripes to an odd multiple of 2k (an odd multiple of 16 lines),
and to avoid multiples which bring the banks in adjacent stripes so nearly into alignment that a 23 pixel high motcomp neighbourhood can reach from a bank n page in stripe x to a bank n page in stripe x+1.

(This is used for both the 8-bit and 10-bit column formats).

But I guess if we're not doing this currently then it doesn't affect this PR.
It is perhaps something we should investigate as it could help with 4kp60 decode performance.

@6by9
Copy link
Contributor Author

6by9 commented Feb 12, 2024

V4L2 stateful (H264/MPEG4/MPEG2/H263/VC-1) video decode doesn't produce SAND - that was only available via MMAL that officially we have now deprecated.

The only supported generator of SAND is the HEVC decoder, which (AFAIK) has no specific requirements.

The DRM driver will still accept the old SAND modifier with an explicit column height, however it is not advertising all the relevant modifier values (which is the new requirement).
If we need to introduce it later, then supporting the relevant 4kB page aligned values is still plausible as it will be some subset of the 4kB (32 line) alignments up to max 256kB (2048 lines), which is a max 64 entries and likely far less.

@6by9
Copy link
Contributor Author

6by9 commented Feb 12, 2024

Looking at what I hope is a vaguely current FFmpeg from JC, DRM gets passed the cropped width/height as the buffer size, which is going to cause grief as DRM then hasn't got the padded height.

Seeing as it also sets the source rectangle there appears to be little reason to set the buffer geometry to the cropped value, and DRM can then use fb->height.

@jc-kynesim
Copy link
Contributor

Whilst the height isn't passed directly it is passed implicitly in the plane[1] offset (offset/128)

@jc-kynesim
Copy link
Contributor

Though I'll agree I probably shouldn't have cropped w/h in the AddFB2

@6by9
Copy link
Contributor Author

6by9 commented Feb 12, 2024

Following discussion with jc

  • fixed as fourcc_mod_broadcom_param only extracts an int.
  • aligned height to a multiple of 2 lines

With the matching drm_fourcc.h change imported, https://github.com/jc-kynesim/rpi-ffmpeg.git branch
dev/5.1.4/bcm_auto_mod_1 is working with the brave new world.

@jc-kynesim
Copy link
Contributor

Thanks.
I'd like to note my mild discomfort with how height now needs to be the original allocation height when previously we could get away with (and much code did) applying cropping directly to the stored frame height. The alternative would be to use the chroma offset to calculate the luma height (which would be reliable) but I can see how this might well play badly with upstream so I guess this is the best we can do.

@6by9
Copy link
Contributor Author

6by9 commented Feb 12, 2024

Unless there are serious objections, then I was going to send this upstream and get their response before looking at merging it here.
I'll look at sending that on Wednesday, so please shout soon.

P33M and others added 6 commits February 13, 2024 10:11
Switch to using card-detection via GPIO, and add missing emmc_cmd pin.
Also, "emmc_*" isn't the name of the respective function, but the name
of the pin. These pins are single-function, but need pulls set
accordingly.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The SDIO_CFG register SD_PIN_SEL conflates two settings - whether eMMC
HS or SD UHS timings are applied to the interface, and whether or not
the card-detect line is functional. SD_PIN_SEL can only be changed when
the SD clock isn't running, so add a bcm2712-specific clock setup.

Toggling SD_PIN_SEL at runtime means the integrated card-detect feature
can't be used, so this controller needs a cd-gpios property.

Also fix conditionals for usage of the delay-line PHY - no-1-8-v will
imply no bits set in hsemmc_mask or uhs_mask, so remove it.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The pad control registers are concatenated onto the GPIO pad control
registers, as with previous steppings.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The CM5 is a platform that will appear in multiple boards, each of
which may have different connectivity. Split the CM5 DTS into a common
cm5.dtsi and board-specific dts files, where the CM5 DTS file (the one
loaded by the firmware by default) is an alias for the CM5IO DTS file.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The I2C spec says the DDC link should run at 100kHz or less, however
Pi5/BCM2712 had been configured for 200kHz.
Reduce it to comply with the spec, and match Pi4.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The vc4 HVS can handle almost any column height in the SAND format,
and this was signalled via the vendor bits in DRM_FORMAT_MOD_BROADCOM_SAND128.

DRM is now requiring that all potential modifier values are reported,
not just the canonical version of them.
It is not feasible to report all the potential column heights.
After review, add the aligned column heights that follow the optimal
SDRAM page allocation algorithm for a spread of image heights, and
explicity advertise those.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
@6by9
Copy link
Contributor Author

6by9 commented Feb 13, 2024

Reworked so that V4L2 uses the magic page aligned heights, and DRM advertises those heights as modifiers.

Seems to work without changes to FFmpeg.

There is a small performance gain by having the column strides
aligned to SDRAM banks.
Precompute the column heights that would follow from various common
image heights, and align to the nearest one of them.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
@jc-kynesim
Copy link
Contributor

I suggest you remove the constrain2x lines - they no longer do something useful.
Alternatively leave then as they were previously and only do the new calculation if the user passed in 0. The former is probably better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet