Skip to content

Commit

Permalink
[android] add hardware specific workaround for Nexus9 in armv7 mode
Browse files Browse the repository at this point in the history
we got a couple of bug reports, all with the same failure:

https://bugzilla.xamarin.com/show_bug.cgi?id=44907
https://bugzilla.xamarin.com/show_bug.cgi?id=46482
https://bugzilla.xamarin.com/show_bug.cgi?id=51791

The bugs are private, therefore here what I wrote:

```
Thank you Matthias, this was very helpful.

>>> 09-30 12:24:50.347: A/DEBUG(6211): pid: 6191, tid: 6209, name: Thread-4  >>> com.distech.x50.ui.droid <<<
>>> 09-30 12:24:50.347: A/DEBUG(6211): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8
>>> 09-30 12:24:50.347: A/DEBUG(6211):     r0 d6b58728  r1 00004001  r2 00000000  r3 d6b58738
>>> 09-30 12:24:50.347: A/DEBUG(6211):     r4 00000008  r5 ea97427c  r6 00001f3c  r7 00000015
>>> 09-30 12:24:50.347: A/DEBUG(6211):     r8 00001f38  r9 00000015  sl d6b58728  fp d67fed90
>>> 09-30 12:24:50.347: A/DEBUG(6211):     ip ea9742c0  sp d67fed60  lr ea7a4504  pc ea716a58  cpsr 800e0010

The crash happens on this assignment:
https://github.com/mono/mono/blob/de1865dad5c0350f391fedcaa08f02f610530d3f/mono/mini/mini-generic-sharing.c#L418
Here the according disassembly:
https://gist.github.com/lewurm/b1094749027c9e5ea19fdc4fac7905a7

The crash happens in the last loop iteration (`r9=i`, `r7=slot`).  Looking
at the machine code it just _cannot_ happen, which is confirmed by the C
code as well: `*oti` successfully happens in the if check, but after
returning from `alloc_oti()`, `oti` doesn't contain a valid address anymore.
I suspect some weird hardware issue that fails to restore all registers
properly from the stack.
```
[...]

```
Thanks again Matthias.  Unfortunately, I'm out of ideas, and I can't
blame anything but the hardware. The situation we see is too weird. We
segfault at offset `0xdda60: str sl, [r4]`, but really we should
already segfault at offset `0xdda2c: ldrge   sl, [r4, mono#8]!"`.  So I
suspect two things why this could happen:

(1) The instruction at `dda2c` fails to do the post-increment correctly
    for *whatever* reason.

(2) Something along the execution path corrupts the stackslot, where
    `r4` is saved, in such a way that it *exactly* masks it with `0xf`.
    Everything else on the stack looks fine, so this is sort of very
    unlikely to be honest.

This only happens on a very specific device: The Nexus 9 is the only
device that was ever shipped with the Tegra K1 T132.  I suspect an issue
in the binary translation layer of `armv7` instruction set to the internal
micro-ops of the CPU.  I tried to stress test the instruction in
question (see https://github.com/lewurm/ldrinsntest), however I was not
able to trigger a crash.  So either, I'm missing some context in order
to trigger the bug or I'm on a completely wrong track.

That said, even if we could proof that it is indeed a hardware issue,
the workaround is also non-trivial (it would then either need a fix in
gcc or require a microcode update by the chip vendor).
```

And then we saw:
golang/go#19809 (comment)

Another hint that this device is buggy.
  • Loading branch information
lewurm authored and jonpryor committed Jun 1, 2017
1 parent a5be71a commit f5f0523
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions mono/mini/mini-generic-sharing.c
Expand Up @@ -390,7 +390,17 @@ info_has_identity (MonoRgctxInfoType info_type)
/*
* LOCKING: loader lock
*/
#if defined(PLATFORM_ANDROID) && defined(TARGET_ARM)
/* work around for HW bug on Nexus9 when running on armv7 */
#ifdef __clang__
static __attribute__ ((optnone)) void
#else
/* gcc */
static __attribute__ ((optimize("O0"))) void
#endif
#else
static void
#endif
rgctx_template_set_slot (MonoImage *image, MonoRuntimeGenericContextTemplate *template_, int type_argc,
int slot, gpointer data, MonoRgctxInfoType info_type)
{
Expand Down

0 comments on commit f5f0523

Please sign in to comment.