hide sony bootloader unlock status in QSEE api call #7

j4nn · 2020-09-07T21:40:26Z

This patch forces the bootloader unlock status returned from
TrustZone through the Sony proprietary API to always read "locked".
It does not touch the kernel command line arguments that the verified
bootloader sets with unlock-related options, and thus does not
interfere with the rest of Android's default verified boot policy.
Instead, it only changes how the original Sony DRM blobs behave, which
can be beneficial particularly if the Sony device (DRM) key has been
restored after unlock.

Without this patch, the following log line can be observed:
libdevice_security_static: get_rooting_status.cpp:80 rooting_status 2
With this patch, the following is logged instead:
libdevice_security_static: get_rooting_status.cpp:80 rooting_status 1

This patch has been tested with today's build of LineageOS 17.1 in
both TA partition states, i.e. with the device key lost and restored.

Please note also that the same API returns the decrypted device key at
offset 0x20 (16 bytes) if it has been restored in TA unit 66667.
If the device key has been lost and not restored, 16 zero bytes
are returned at offset 0x20 instead.
This means userspace proprietary libs may actually use the device
key without reading it directly from the TA unit, provided the
"still locked" flag check passes (just a theory, without reverse
engineering proof).

Change-Id: I4cea5b666377d71fb63d985839d095aa4240fb44
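
To make the "16 zero bytes at offset 0x20" statement concrete, here is a minimal userspace sketch of how a caller could tell the two TA states apart from a response buffer. The helper name `device_key_present` and its return convention are hypothetical and not part of the patch; only the offset and length come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define DEVICE_KEY_OFFSET 0x20	/* offset stated in the commit message */
#define DEVICE_KEY_LEN    16	/* length stated in the commit message */

/*
 * Hypothetical helper, not part of the patch.
 * Returns  1 if a non-zero key is present at offset 0x20,
 *          0 if the 16 bytes are all zero (key lost, not restored),
 *         -1 if the buffer is too short to contain the key field.
 */
int device_key_present(const uint8_t *resp, size_t resp_len)
{
	size_t i;

	if (resp == NULL || resp_len < DEVICE_KEY_OFFSET + DEVICE_KEY_LEN)
		return -1;

	for (i = 0; i < DEVICE_KEY_LEN; i++)
		if (resp[DEVICE_KEY_OFFSET + i] != 0)
			return 1;

	return 0;
}
```

The length check comes first so that a short response is reported as an error rather than read past its end.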

j4nn · 2020-09-07T21:45:37Z

Applies to (and tested with) derfelot:lineage-17.1_update 8ee9de4 commit (Merge Linux 4.4.235 kernel).

derfelot · 2020-09-08T19:48:51Z

since this doesn't have an effect on devices without restored keys, I'm fine with it. any objections @cryptomilk ?

derfelot · 2020-09-08T19:53:03Z

Oh @j4nn, could you please change the commit title to something like "qseecom: [current title]"?

thanks

j4nn · 2020-09-08T20:05:54Z

Commit title changed as suggested and force-pushed; please pull commit ca508e2.
Thanks.

Flamefire · 2020-10-15T13:11:05Z

I had this code in my build of this ROM a while ago and removed it again. I don't see what exactly it changes, i.e. no benefits in real use cases. I also see missing range checks which make me feel uncomfortable given that this is kernel code.

To be clear: I can confirm that the log output is rooting_status 1 (obviously as this is forced in there).

But integrity checks, bootloader lock state (during power on) etc are still failing and e.g. Magisk Hide is still required.

Can you provide an example how this actually benefits anything? Like something that works now which did not work before.

linckandrea · 2020-10-15T13:27:05Z

> I had this code in my build of this ROM a while ago and removed it again. I don't see what exactly it changes, i.e. no benefits in real use cases. I also see missing range checks which make me feel uncomfortable given that this is kernel code.
>
> To be clear: I can confirm that the log output is rooting_status 1 (obviously as this is forced in there).
>
> But integrity checks, bootloader lock state (during power on) etc are still failing and e.g. Magisk Hide is still required.
>
> Can you provide an example how this actually benefits anything? Like something that works now which did not work before.

"I don't see what exactly it changes"

"This patch forces bootloader unlock status returned from trust
zone in sony proprietary api to be always as "locked"."

"It does not fix kernel command line args that are set by verified
bootloader with unlock related options and thus does not interfere
with other android default verified boot policy."

j4nn · 2020-10-15T19:10:02Z

@Flamefire could you please be more specific about the missing range checks?
As far as I can see, the qsee command structure is normally validated within __validate_send_cmd_inputs(), which ensures that the request and response buffers, including the req and resp lengths, lie within the "shared buffer".
Basically we are peeking into that shared buffer and changing a single byte, and only for a specific request whose response also contains a magic string.
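
As an illustration of what "within the shared buffer" means for a pointer plus a length, here is a minimal sketch of an overflow-safe containment check. It is not the body of __validate_send_cmd_inputs(); the function name `range_in_buffer` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if [ptr, ptr + len) lies entirely
 * inside [base, base + size), 0 otherwise. Written with subtractions
 * only, so that ptr + len cannot wrap around.
 */
int range_in_buffer(uintptr_t base, size_t size, uintptr_t ptr, size_t len)
{
	uintptr_t off;

	if (ptr < base)
		return 0;

	off = ptr - base;
	if (off > size)
		return 0;

	return len <= size - off;
}
```

Note that a response pointer can pass such a check with a small declared length and still sit close to the end of the buffer, which is the case the range-check question later in this thread is about.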

Concerning benefits: you can see examples in the stock ROM. With this patch you can get video enhancements and wifi display working; without the patch, neither works, even with TA-locked restored.
It may be less useful with LineageOS unless someone installs specific Sony apps. On the other hand, a lot of Sony proprietary blobs are included in the LineageOS build, so it may actually have some influence, as the log shows that the locked state is being checked.

So why not have it if someone has the DRM key restored?
Without this, the restored key is provably not used in some Sony apps, which disable DRM features regardless of whether the key is present.

Also as mentioned in the commit log, the qsee api which returns the bl lock status returns also decrypted device key.
Possibly sony proprietary/drm related blobs can check the lock status and use the decrypted device key only in case the status is still locked.

The decrypted device key returned at offset 0x20 is calculated from the last 16 bytes of the hwconfig TA unit (0x7d3) XORed with the 16 bytes from TA unit 66667 (the device key part which gets erased on bootloader unlock).
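
The XOR step described above can be written out as a short sketch. This only restates the arithmetic from the comment; the function name `derive_device_key`, its parameters, and the assumption that both TA units are already available as byte arrays are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 16

/*
 * Hypothetical sketch of the calculation described in the comment:
 * out = (last 16 bytes of the hwconfig TA unit) XOR (16 bytes of
 * TA unit 66667). Returns 0 on success, -1 on bad input.
 */
int derive_device_key(const uint8_t *hwconfig, size_t hwconfig_len,
		      const uint8_t unit66667[KEY_LEN],
		      uint8_t out[KEY_LEN])
{
	const uint8_t *tail;
	size_t i;

	if (hwconfig == NULL || unit66667 == NULL || out == NULL)
		return -1;
	if (hwconfig_len < KEY_LEN)
		return -1;

	tail = hwconfig + hwconfig_len - KEY_LEN;
	for (i = 0; i < KEY_LEN; i++)
		out[i] = tail[i] ^ unit66667[i];

	return 0;
}
```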

This way proprietary blobs can work with the decrypted device key without trying to read it directly from TA, making reverse engineering / finding where the key is used possibly more difficult than just checking which binary links to libmiscta.so.

Moreover, as qseecom calls seem to be statically linked in many blobs since Pie, it is also difficult to spot which binary makes that call.
Because of this, this patch is the easiest way to "fix" the bl lock status in all Sony blobs, without patching them and without even needing to search for what would need to be patched.

Flamefire · 2020-10-15T19:27:31Z

> could you please be more specific about the missing range checks?

I mean the code like if (strncmp((uint8_t *)rb + 0x31, "HWC_Yoshino_Com_", 16) == 0). How can you be sure there are 0x31+ bytes and there is a NULL terminated string or 16 more bytes at this point? AFAIK the checks done are basically a "hack" to detect the actual buffer to patch and don't wait for a specific packet with a known structure and size. Hence an OOB read could happen, possibly leading to a bootloop. So this "peeking" without checking the size of the buffer could(!) be dangerous.
Is it certain that the size is sufficient? Why/how? Otherwise I'd suggest adding range checks.
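
A minimal userspace sketch of the kind of range check being asked for: verify that the response is long enough before looking at the magic string, and only then touch the status byte. The offsets, the magic string, and the "first word is zero" condition are taken from the patch under review; the function name `force_locked_status` and the explicit `resp_len` parameter are assumptions, and this is not the code that was eventually committed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATUS_OFFSET 0x30		/* byte patched by the change */
#define MAGIC_OFFSET  0x31		/* where the magic string starts */
#define MAGIC         "HWC_Yoshino_Com_"
#define MAGIC_LEN     16

/*
 * Hypothetical bounded variant of the peek-and-patch logic.
 * Returns 1 if the status byte was forced to "locked", 0 otherwise.
 */
int force_locked_status(uint8_t *resp, size_t resp_len)
{
	uint32_t result;

	/* Range check first: the magic must fit inside the response. */
	if (resp == NULL || resp_len < MAGIC_OFFSET + MAGIC_LEN)
		return 0;

	/* First 32-bit word must be zero, as in the patch (rb[0] == 0). */
	memcpy(&result, resp, sizeof(result));
	if (result != 0)
		return 0;

	/* memcmp with a fixed length needs no NUL terminator. */
	if (memcmp(resp + MAGIC_OFFSET, MAGIC, MAGIC_LEN) != 0)
		return 0;

	resp[STATUS_OFFSET] = 1;	/* 1 = locked */
	return 1;
}
```

Using memcmp() over exactly 16 bytes also sidesteps the question of whether a terminator follows the string.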

> so it may actually have some influence

That's what I meant: I'd like to see some influence. You mentioned "video enhancements and wifi display working". So seeing this with the ROM would be a good point.

To me "Possibly sony proprietary/drm related blobs can check the lock status" is not enough to warrant changing the kernel, which is a risk. A small one, but still, and it may incur costs in resolving merge conflicts when the kernel is updated. I just want to make sure there is an actual (as opposed to possible) benefit in doing so.

Of course this is my opinion and nothing personal but a purely technical question, I hope you don't mind.

j4nn · 2020-10-15T19:52:58Z

> I mean the code like if (strncmp((uint8_t *)rb + 0x31, "HWC_Yoshino_Com_", 16) == 0). How can you be sure there are 0x31+ bytes and there is a NULL terminated string or 16 more bytes at this point?

I agree it is a hack, but that is what you do when a vendor tries to limit usage of your own device.
I will add the checks, that is no problem.
They are most likely not needed, because the qseecom API requires use of the ION memory allocator for the req & resp buffers, and therefore the "shared buffer" in the qseecom driver is kernel page aligned, including its size, as can be seen in __ion_alloc(): len = PAGE_ALIGN(len);
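
For readers unfamiliar with PAGE_ALIGN(), the rounding it performs can be sketched in userspace as follows. The 4 KiB page size and the names `SKETCH_PAGE_SIZE` and `sketch_page_align` are assumptions for illustration; the kernel macro itself is not reproduced here.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages; the real value depends on the kernel config. */
#define SKETCH_PAGE_SIZE 4096UL

/* Userspace stand-in for PAGE_ALIGN(): round len up to a whole page. */
unsigned long sketch_page_align(unsigned long len)
{
	return (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

With this rounding, even a one-byte allocation occupies a full page. The rounding by itself says nothing about where resp_buf sits inside the buffer, so an explicit length check remains the more direct guarantee.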

> That's what I meant: I'd like to see some influence. You mentioned "video enhancements and wifi display working". So seeing this with the ROM would be a good point.

Feel free to make the Sony stock gallery with video playback work in LOS - I am not interested in doing that.
But I am very sure that you would see your proof in that case, as it happens with the stock ROM.

j4nn · 2020-10-15T21:26:29Z

Added the checks as suggested by @Flamefire. Retested to confirm it still works.
Please see the new force-pushed commit ee107e2.
Thanks.

j4nn · 2020-10-15T21:56:44Z

That follows the convention already used in the qseecom driver - see qseecom_unload_app().
That means if it was wrong, it would need to be patched there too.

Besides, if you wanted to try to crash the kernel from the qseecom driver by an OOB read (which would not happen, as there would be no unmapped page boundary to cross), you would need to have root already (or at least the tee_device SELinux context), so there would be many much easier ways to crash the system, making this discussion academical ;-)

Flamefire

I see, there is an array of size MAX_APP_NAME_SIZE which is 64, so that's fine

> Making this discussion academical

Sure, but as shown by history it usually is not one bug that causes the vulnerability but a chain of minor issues. That's why I wanted to be careful here.

Added a couple minor change suggestions which IMO make sense. E.g. getting the rb pointer only when it will be retained and making the logic easier to follow. Besides that basically just matching style and directly casting to the target type instead of void* which makes the below lines easier to reason about.

I hope you don't feel nagged by that, just want to make sure, everything is in the best possible shape, given that it is a kernel change :) Hence only a comment, not a condition for acceptance (not that I can enforce that in any way ;) )

Flamefire · 2020-10-16T08:49:47Z

drivers/misc/qseecom.c

+		rb = (void *)__qseecom_uvirt_to_kvirt(data,
+						(uintptr_t)req->resp_buf);
+		if (sb != NULL && req->cmd_req_len >= sizeof(uint32_t) * 2)
+			if (sb[0] != 0x07 || sb[1] != 0x04)
+				sb = NULL;
+		if (sb == NULL)
+			rb = NULL;


Suggested change

-		rb = (void *)__qseecom_uvirt_to_kvirt(data,
-						(uintptr_t)req->resp_buf);
-		if (sb != NULL && req->cmd_req_len >= sizeof(uint32_t) * 2)
-			if (sb[0] != 0x07 || sb[1] != 0x04)
-				sb = NULL;
-		if (sb == NULL)
-			rb = NULL;
+		if (sb != NULL && req->cmd_req_len >= sizeof(uint32_t) * 2) {
+			if (sb[0] == 0x07 && sb[1] == 0x04)
+				rb = (uint32_t *)__qseecom_uvirt_to_kvirt(data, (uintptr_t)req->resp_buf);
+			else
+				sb = NULL;
+		}

In my opinion, this suggestion would not make any significant difference in code efficiency, as the __qseecom_uvirt_to_kvirt() is simple arithmetic pointer conversion, not involving any memory mapping or the like, so its cost, even when done unnecessarily, is near zero. I am not sure if this way the code would be more readable - to me it seems not really.
Concerning the cast to uint32_t * instead of void * - I guess that is a matter of taste.
I just followed the pattern already existing in the qseecom source code - you can see there casting to void * even though it is assigned to something else the same way I did it here - in fact, I copied it from the existing code.
So I am not sure what is better - to keep the usage uniform or try to invent something new.
In any case, there is no difference in the end.

> __qseecom_uvirt_to_kvirt() is simple arithmetic pointer conversion,

Ok, then yes, doesn't affect efficiency.

> I am not sure if this way the code would be more readable

It also adds the braces for the multi-line if code which is IMO required according to the style guide. But yeah, doesn't matter much

> Concerning the cast to uint32_t * instead of void * - I guess that is a matter of taste.

Yeah, this is mostly about the adjacent usage of sizeof(uint32_t) and the access so one doesn't have to scroll up to the definition to verify this. Again, just minor

A nested 'if' is not a problem without braces as long as there is no 'else' there. No 'else' in the original code, no braces. You have 'else' in your suggestion, so you have braces too.

Flamefire · 2020-10-16T08:50:39Z

drivers/misc/qseecom.c

+		if (rb[0] == 0) {
+			if (strncmp((uint8_t *)rb + 0x31,
+				    "HWC_Yoshino_Com_", 16) == 0)
+			{
+				((uint8_t *)rb)[0x30] = 1;
+				// 0=not_allowed, 1=locked, 2=unlocked,
+				// 3=allowed_when_sl_is_unlocked,
+				// 4=allowed_since_sl_is_unlocked,
+				// 5=unsupported_bl_status->generic error
+				//   (no info in security test screen "none")
+			}
+		}


Suggested change

-		if (rb[0] == 0) {
-			if (strncmp((uint8_t *)rb + 0x31,
-				    "HWC_Yoshino_Com_", 16) == 0)
-			{
-				((uint8_t *)rb)[0x30] = 1;
-				// 0=not_allowed, 1=locked, 2=unlocked,
-				// 3=allowed_when_sl_is_unlocked,
-				// 4=allowed_since_sl_is_unlocked,
-				// 5=unsupported_bl_status->generic error
-				//   (no info in security test screen "none")
-			}
-		}
+		if (rb[0] == 0 && strncmp((uint8_t *)rb + 0x31, "HWC_Yoshino_Com_", 16) == 0) {
+			((uint8_t *)rb)[0x30] = 1;
+			// 0=not_allowed, 1=locked, 2=unlocked,
+			// 3=allowed_when_sl_is_unlocked,
+			// 4=allowed_since_sl_is_unlocked,
+			// 5=unsupported_bl_status->generic error
+			//   (no info in security test screen "none")
+		}

This seems to be just a formatting change - not really acceptable in my opinion.
I believe it is better to keep code lines shorter than 80 chars - doesn't the kernel coding style even require lines to stay under 80 columns?

I mainly combined the nested if to an AND to reduce the nesting level which is a readability improvement and makes the check(s) explicit.
No idea about the 80 chars TBH, especially as TABs are used I find that hard to reason about...

Yes, I have seen the conditions combined with the 'and'. Still, in the end the difference is only at the formatting level.
Which is wrong in your case - it is very simple: tabs are always considered 8 chars wide in kernel code and there is an 80-column limit per line.
See the kernel coding style, for example here:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html
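
The "tabs count as 8 columns" rule can be checked mechanically. Below is a minimal sketch of a column counter that advances to the next multiple of 8 on each tab; the function name `visual_width` is hypothetical.

```c
#include <stddef.h>

#define TAB_WIDTH 8	/* tab width used by the kernel coding style */

/*
 * Hypothetical helper: visual width of one source line, with each
 * tab advancing to the next multiple of TAB_WIDTH columns.
 */
size_t visual_width(const char *line)
{
	size_t col = 0;

	for (; *line != '\0' && *line != '\n'; line++) {
		if (*line == '\t')
			col += TAB_WIDTH - (col % TAB_WIDTH);
		else
			col++;
	}
	return col;
}
```

Fed the combined `if` line from the suggestion above, indented with two tabs as in the original hunk (the indentation depth is an assumption based on that hunk), the width comes out well over 80 columns.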

Flamefire · 2020-10-16T08:51:00Z

drivers/misc/qseecom.c

@@ -3425,6 +3427,19 @@ static int __qseecom_send_cmd(struct qseecom_dev_handle *data,
 		return -ENOENT;
 	}

+	if (!memcmp(data->client.app_name, "tzxflattest", strlen("tzxflattest")))
+	{
+		sb = (void *)__qseecom_uvirt_to_kvirt(data,


Suggested change

-		sb = (void *)__qseecom_uvirt_to_kvirt(data,
+		sb = (uint32_t *)__qseecom_uvirt_to_kvirt(data,

As mentioned above - check in the upstream code, how __qseecom_uvirt_to_kvirt() is used. Using (void *) just keeps that style.
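
One detail of the hunk above is that `memcmp(..., strlen("tzxflattest"))` is a prefix comparison on a fixed-size name array, not an exact string match. A minimal sketch of that behaviour follows; the function name `app_name_has_prefix` is hypothetical, and MAX_APP_NAME_SIZE is set to the value 64 quoted earlier in the thread.

```c
#include <stddef.h>
#include <string.h>

#define MAX_APP_NAME_SIZE 64	/* array size quoted in the discussion */

/*
 * Hypothetical helper mirroring the memcmp()-with-strlen() pattern:
 * returns 1 if app_name starts with prefix, 0 otherwise. The length
 * guard keeps the comparison inside the fixed-size array.
 */
int app_name_has_prefix(const char app_name[MAX_APP_NAME_SIZE],
			const char *prefix)
{
	size_t n = strlen(prefix);

	if (n > MAX_APP_NAME_SIZE)
		return 0;

	return memcmp(app_name, prefix, n) == 0;
}
```

Because only the first `strlen(prefix)` bytes are compared, a longer name that merely begins with "tzxflattest" matches as well; whether that matters depends on which trusted app names exist on the device.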

j4nn · 2020-10-16T09:51:35Z

> Sure, but as shown by history it usually is not one bug that causes the vulnerability but a chain of minor issues. That's why I wanted to be careful here.

Well, the only write operation in the code is the change of the bl unlock state, made after making very sure that it is exactly the response we want to modify. There is no way this could be escalated to anything.
The mentioned theoretical possibility of a kernel crash on an out-of-bounds read does not exist in practice, as the preceding input args validation makes sure that all pointers are within the qseecom shared buffer, whose size is aligned to the page size in the ION allocator.
I have added the checks as suggested anyway, even though they are not needed in my opinion.

> I hope you don't feel nagged by that, just want to make sure, everything is in the best possible shape, given that it is a kernel change :) Hence only a comment, not a condition for acceptance (not that I can enforce that in any way ;) )

I am ok with any suggestions, but I do not feel that those in the comment above are really worth another commit and re-testing.

Flamefire · 2020-10-16T10:40:06Z

> I am ok with any suggestions, but I do not feel that those in the comment above are really worth another commit and re-testing.

Sure. That's why it is a suggestion and it may still help the maintainer(s) to reason about the change(s). I don't mind keeping it the way it is now.

j4nn · 2020-10-16T11:05:39Z

Thank you. It is always good if somebody reviews the code.

Leo88gav · 2021-03-14T12:49:29Z

Which Sony model is this for? Would it also work on the XA2 (Sony pioneer)?

Flamefire · 2021-04-22T19:34:20Z

@derfelot Are you willing to include this? IMO it's a useful addition

[ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow: [ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows: [ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset' [ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->slot_reset() [ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing... [ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset --> driver unload Uninterruptible tasks ===================== crash> ps | grep UN 213 2 11 c000000004c89e00 UN 0.0 0 0 [eehd] 215 2 0 c000000004c80000 UN 0.0 0 0 [kworker/0:2] 2196 1 28 c000000004504f00 UN 0.1 15936 11136 wickedd 4287 1 9 c00000020d076800 UN 0.0 4032 3008 agetty 4289 1 20 c00000020d056680 UN 0.0 7232 3840 agetty 32423 2 26 c00000020038c580 UN 0.0 0 0 [kworker/26:3] 32871 4241 27 c0000002609ddd00 UN 0.1 18624 11648 sshd 32920 10130 16 c00000027284a100 UN 0.1 48512 12608 sendmail 33092 32987 0 c000000205218b00 UN 0.1 48512 12608 sendmail 33154 4567 16 c000000260e51780 UN 0.1 48832 12864 pickup 33209 4241 36 c000000270cb6500 UN 0.1 18624 11712 sshd 33473 33283 0 c000000205211480 UN 0.1 48512 12672 sendmail 33531 4241 37 c00000023c902780 UN 0.1 18624 11648 sshd EEH handler hung while bnx2x sleeping and holding RTNL lock 
=========================================================== crash> bt 213 PID: 213 TASK: c000000004c89e00 CPU: 11 COMMAND: "eehd" #0 [c000000004d477e0] __schedule at c000000000c70808 #1 [c000000004d478b0] schedule at c000000000c70ee0 #2 [c000000004d478e0] schedule_timeout at c000000000c76dec #3 [c000000004d479c0] msleep at c0000000002120cc #4 [c000000004d479f0] napi_disable at c000000000a06448 ^^^^^^^^^^^^^^^^ #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x] #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x] #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8 #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64 And the sleeping source code ============================ crash> dis -ls c000000000a06448 FILE: ../net/core/dev.c LINE: 6702 6697 { 6698 might_sleep(); 6699 set_bit(NAPI_STATE_DISABLE, &n->state); 6700 6701 while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) * 6702 msleep(1); 6703 while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) 6704 msleep(1); 6705 6706 hrtimer_cancel(&n->timer); 6707 6708 clear_bit(NAPI_STATE_DISABLE, &n->state); 6709 } EEH calls into bnx2x twice based on the system log above, first through bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes the following call chains: bnx2x_io_error_detected() +-> bnx2x_eeh_nic_unload() +-> bnx2x_del_all_napi() +-> __netif_napi_del() bnx2x_io_slot_reset() +-> bnx2x_netif_stop() +-> bnx2x_napi_disable() +->napi_disable() Fix this by correcting the sequence of NAPI APIs usage, that is delete the NAPI after disabling it. 
Fixes: 7fa6f34 ("bnx2x: AER revised") Reported-by: David Christensen <drc@linux.vnet.ibm.com> Tested-by: David Christensen <drc@linux.vnet.ibm.com> Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
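The ordering rule stated in the bnx2x commit above (disable the NAPI instance first, delete it afterwards) can be illustrated with a toy state machine. This is only a sketch under simplifying assumptions: `enum napi_life`, `model_napi_disable()` and `model_napi_del()` are hypothetical names, and the states are not the kernel's NAPI state bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lifecycle states; these are NOT the kernel's NAPI_STATE_* bits. */
enum napi_life { NAPI_LIFE_ENABLED, NAPI_LIFE_DISABLED, NAPI_LIFE_DELETED };

/* Disabling is accepted only while the instance exists and is enabled. */
static bool model_napi_disable(enum napi_life *s)
{
    assert(s != NULL);
    if (*s != NAPI_LIFE_ENABLED)
        return false; /* e.g. already deleted: the ordering the commit fixes */
    *s = NAPI_LIFE_DISABLED;
    return true;
}

/* Deleting is accepted only after the instance has been disabled. */
static bool model_napi_del(enum napi_life *s)
{
    assert(s != NULL);
    if (*s != NAPI_LIFE_DISABLED)
        return false;
    *s = NAPI_LIFE_DELETED;
    return true;
}
```

In this model the corrected sequence (disable, then delete) succeeds at both steps, while a disable attempted on an already deleted instance is rejected instead of sleeping forever as in the backtrace above.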

commit 38c9c22a85aeed28d0831f230136e9cf6fa2ed44 upstream. Syzkaller reported use-after-free bug as follows: ================================================================== BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130 Read of size 2 at addr ffff8880751acee8 by task a.out/879 CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x1c0/0x2b0 print_address_description.constprop.0.cold+0xd4/0x484 print_report.cold+0x55/0x232 kasan_report+0xbf/0xf0 ntfs_ucsncmp+0x123/0x130 ntfs_are_names_equal.cold+0x2b/0x41 ntfs_attr_find+0x43b/0xb90 ntfs_attr_lookup+0x16d/0x1e0 ntfs_read_locked_attr_inode+0x4aa/0x2360 ntfs_attr_iget+0x1af/0x220 ntfs_read_locked_inode+0x246c/0x5120 ntfs_iget+0x132/0x180 load_system_files+0x1cc6/0x3480 ntfs_fill_super+0xa66/0x1cf0 mount_bdev+0x38d/0x460 legacy_get_tree+0x10d/0x220 vfs_get_tree+0x93/0x300 do_new_mount+0x2da/0x6d0 path_mount+0x496/0x19d0 __x64_sys_mount+0x284/0x300 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f3f2118d9ea Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00 RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44 R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The buggy address belongs to the physical page: page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac memcg:ffff888101f7e180 anon flags: 
0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201 raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== The reason is that struct ATTR_RECORD->name_offset is 6485, end address of name string is out of bounds. Fix this by adding sanity check on end address of attribute name string. [akpm@linux-foundation.org: coding-style cleanups] [chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei] Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: ChenXiaoSong <chenxiaosong2@huawei.com> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
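The sanity check described in the NTFS commit above (reject an attribute whose name string would end past the record) can be sketched in user space as follows. `struct attr_hdr` and `attr_name_in_bounds()` are hypothetical simplifications written for illustration; the real on-disk ATTR_RECORD has many more fields, and the kernel code works on pointers rather than offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the on-disk attribute header. */
struct attr_hdr {
    uint16_t name_offset; /* byte offset of the name from the attribute start */
    uint8_t  name_length; /* name length in 16-bit characters */
};

/*
 * Return true if the attribute name lies entirely inside a record of
 * rec_size bytes, where attr_off is the attribute's offset in the record.
 */
static bool attr_name_in_bounds(size_t rec_size, size_t attr_off,
                                const struct attr_hdr *a)
{
    assert(a != NULL);
    if (attr_off > rec_size)
        return false;

    /* End of the name string: start of attribute + offset + length in bytes. */
    size_t name_end = attr_off + a->name_offset
                    + (size_t)a->name_length * sizeof(uint16_t);
    return name_end <= rec_size;
}
```

For a 1024-byte record (an assumed size for this example), a `name_offset` of 6485 as reported in the commit message fails the check before any name comparison would read the string.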

…loc() commit d29f59051d3a07b81281b2df2b8c9dfe4716067f upstream. The voice allocator sometimes begins allocating from near the end of the array and then wraps around, however snd_emu10k1_pcm_channel_alloc() accesses the newly allocated voices as if it never wrapped around. This results in out of bounds access if the first voice has a high enough index so that first_voice + requested_voice_count > NUM_G (64). The more voices are requested, the more likely it is for this to occur. This was initially discovered using PipeWire, however it can be reproduced by calling aplay multiple times with 16 channels: aplay -r 48000 -D plughw:CARD=Live,DEV=3 -c 16 /dev/zero UBSAN: array-index-out-of-bounds in sound/pci/emu10k1/emupcm.c:127:40 index 65 is out of range for type 'snd_emu10k1_voice [64]' CPU: 1 PID: 31977 Comm: aplay Tainted: G W IOE 6.0.0-rc2-emu10k1+ #7 Hardware name: ASUSTEK COMPUTER INC P5W DH Deluxe/P5W DH Deluxe, BIOS 3002 07/22/2010 Call Trace: <TASK> dump_stack_lvl+0x49/0x63 dump_stack+0x10/0x16 ubsan_epilogue+0x9/0x3f __ubsan_handle_out_of_bounds.cold+0x44/0x49 snd_emu10k1_playback_hw_params+0x3bc/0x420 [snd_emu10k1] snd_pcm_hw_params+0x29f/0x600 [snd_pcm] snd_pcm_common_ioctl+0x188/0x1410 [snd_pcm] ? exit_to_user_mode_prepare+0x35/0x170 ? do_syscall_64+0x69/0x90 ? syscall_exit_to_user_mode+0x26/0x50 ? do_syscall_64+0x69/0x90 ? exit_to_user_mode_prepare+0x35/0x170 snd_pcm_ioctl+0x27/0x40 [snd_pcm] __x64_sys_ioctl+0x95/0xd0 do_syscall_64+0x5c/0x90 ? do_syscall_64+0x69/0x90 ? do_syscall_64+0x69/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/3707dcab-320a-62ff-63c0-73fc201ef756@tasossah.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
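The wraparound problem in the emu10k1 commit above can be shown with a few lines of index arithmetic. This is a minimal sketch, assuming the allocation is a contiguous run that wraps modulo the array size; `wrapped_voice_index()` is a hypothetical helper, and only `NUM_G` (64) is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_G 64 /* size of the voice array, as stated in the commit message */

/*
 * Index of the i-th voice of an allocation that starts at first_voice
 * and may wrap around the end of the array.
 */
static size_t wrapped_voice_index(size_t first_voice, size_t i)
{
    assert(first_voice < NUM_G);
    return (first_voice + i) % NUM_G;
}
```

Without the modulo, an allocation starting at index 60 would reach index 65 on its sixth voice, which matches the "index 65 is out of range" UBSAN report; with it, that voice maps to index 1.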