Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple Display impl doesn't render full output #153

Open
mx00s opened this issue May 1, 2024 · 4 comments
Open

Simple Display impl doesn't render full output #153

mx00s opened this issue May 1, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@mx00s
Copy link
Contributor

mx00s commented May 1, 2024

TLDR: See the minimal example in 41d2649 on the display-bug branch and the following repro steps:

$ git checkout display-bug
$ cargo build --release --package nexus-tools --bin cargo-nexus
$ cd examples
$ ../target/release/cargo-nexus nexus run --bin display_bug
(0)
(0
PANIC

This Display impl doesn't render the trailing ) as expected when using the .to_string() method.

struct Parens(u8);

impl Display for Parens {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "({})", self.0)
    }
}

#[nexus_rt::main]
fn main() {
    let x = Parens(0);

    // as expected, prints `(0)`
    nexus_rt::println!("{}", x);

    // unexpectedly, prints `(0`
    nexus_rt::println!("{}", x.to_string());

    // assertion fails unless the workaround in the `Display` impl is used instead
    assert_eq!(x.to_string(), r#"(0)"#);
}

I'm unclear yet why the .to_string() method in particular doesn't work as expected yet nexus_rt::println!("{}", x); does. I'm also not sure how to write the assertion without .to_string().

Workaround

Using this Display implementation instead is apparently a viable workaround.

impl Display for Parens {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "({}", self.0)?;
        write!(f, ")")
    }
}
@danielmarinq
Copy link
Contributor

danielmarinq commented May 1, 2024

Wow, strange bug -- possible sources I think: syscall, memory allocator, RISC-V to NVM translation

@mx00s mx00s added the bug Something isn't working label May 1, 2024
@mx00s
Copy link
Contributor Author

mx00s commented May 1, 2024

Thanks for the leads. Sam offered some tips too.

It occurred to me I hadn't tried simplifying the struct more. Interestingly, the unexpected behavior disappears with this change.

@@ -7,11 +7,11 @@ use alloc::string::ToString;
 use core::fmt::Display;
 use nexus_rt::Write;
 
-struct Parens(u8);
+struct Parens;
 
 impl Display for Parens {
     fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
-        write!(f, "({})", self.0)
+        write!(f, "({})", 0u8)
     }
 }
 
@@ -27,7 +27,7 @@ impl Display for Parens {
 
 #[nexus_rt::main]
 fn main() {
-    let x = Parens(0);
+    let x = Parens;
 
     // as expected, prints `(0)`
     nexus_rt::println!("{}", x);

I can look into this more and try to isolate the fault. Assigning to myself.

@mx00s mx00s self-assigned this May 1, 2024
@mx00s
Copy link
Contributor Author

mx00s commented May 2, 2024

Found some more clues with the help of cargo nexus run's -v option. Outputs below have been piped through cat -n | awk 'length($2) > 7' | grep ecall, so the first column is the line number, and only lines in which the original first column (now the second) output is over 7 characters long. That latter column, for pc, normally has 7 hex digits, but may be longer when the program's print output is emitted there.

Output without workaround

Here the first println! spans 3 separate ecalls ((, 0, and )), and the (incomplete, (0) second println! happens in one ecall.

   294	(0002840 00000073 ecall                                   0     2844
   474	00002840 00000073 ecall                                   0     2844
   551	)0002840 00000073 ecall                                   0     2844
  2564	(00002840 00000073 ecall                                   0     2844

Output with workaround

And when the aforementioned workaround is applied,

  • the same sort of pattern for the first println! (although the 3rd ecall happens on line 660 of the output as opposed to 551).
  • the second println! is accomplished in a single ecall, with the full expected output, this time on line 2944 of the output as opposed to 2564.
   294	(0002900 00000073 ecall                                   0     2904
   474	00002900 00000073 ecall                                   0     2904
   660	)0002900 00000073 ecall                                   0     2904
  2944	(0)0002900 00000073 ecall                                   0     2904

I also notice the PC (final column) is slightly different between these runs, but that doesn't seem terribly surprising.

@mx00s
Copy link
Contributor Author

mx00s commented May 3, 2024

I made some minor changes to the program in 0e1a90d to facilitate debugging, like adding more print! calls and prefixing+suffixing all outputs with ~~. I also added some tracing::trace! calls to get useful log output from some relevant parts of the codebase.

Full outputs are available in 4dacede.

Excerpt of RUST_LOG="trace" ./examples/repro.sh

Portions retained include:

  • path to the compiled riscv32i ELF binary
  • ranges of ELF segments used to initialize the VM's memory and the virtual address where they're loaded
  • all riscv32i ECALL instructions and the corresponding Nexus VM syscalls
  • interleaved nexus_rt::print! outputs
TRACE nexus_riscv: loading ELF path="/Users/sage/g/nexus-zkvm/display-bug/target/riscv32i-unknown-none-elf/debug/display_bug"
TRACE nexus_riscv: init memory from ELF segment elf_range=4096..29652 virt_addr=0
TRACE nexus_riscv: init memory from ELF segment elf_range=30640..37396 virt_addr=51120
TRACE nexus_riscv: init memory from ELF segment elf_range=37396..37396 virt_addr=57876
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=51576 inp2=6
~~A~~
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=51552 inp2=3
~~(TRACE ...
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=4193543 inp2=1
0TRACE ...
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=51555 inp2=3
)~~TRACE ...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=51640 inp2=1

TRACE ...
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=51676 inp2=6
~~B~~
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=57880 inp2=4
~~(0TRACE ...
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=51640 inp2=1

TRACE ...
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=51724 inp2=6
~~C~~
...
TRACE nexus_riscv::eval: current instruction encoded=[115, 0, 0, 0] decoded=Inst { pc: 12436, len: 4, word: 115, inst: ECALL }
TRACE nexus_vm::syscalls: pc=12436 num=1 inp1=53431 inp2=6
PANIC
...

Observations about syscalls:

  • pc is consistently 12436
  • num is consistently 1 (for write_log)
  • inp1 is the string's address
  • inp2 is the string's length, including the trailing \n when EOL

Excerpt of llvm-objdump -D -t -s /Users/sage/g/nexus-zkvm/display-bug/target/riscv32i-unknown-none-elf/debug/display_bug

Contents of section .data:
 ...
 c960 7e7e2829 7e7e0000 60c90000 03000000  ~~()~~..`.......
 c970 63c90000 03000000 7e7e417e 7e0a0000  c.......~~A~~...
 c980 78c90000 06000000 6578616d 706c6573  x.......examples
 c990 2f737263 2f62696e 2f646973 706c6179  /src/bin/display
 c9a0 5f627567 2e727300 88c90000 1f000000  _bug.rs.........
 c9b0 20000000 05000000 0a000000 b8c90000   ...............
 c9c0 00000000 b8c90000 01000000 88c90000  ................
 c9d0 1f000000 21000000 05000000 7e7e427e  ....!.......~~B~
 c9e0 7e0a0000 dcc90000 06000000 88c90000  ~...............
 c9f0 1f000000 22000000 05000000 88c90000  ...."...........
 ca00 1f000000 23000000 05000000 7e7e437e  ....#.......~~C~
 ca10 7e0a0000 0cca0000 06000000 88c90000  ~...............
 ca20 1f000000 24000000 05000000 7e7e2830  ....$.......~~(0
 ca30 297e7e00 2cca0000 07000000 88c90000  )~~.,...........
 ...

Observations about the syscalls in relation to the .data section:

  • ~~A~~\n: at 0xc978 (51576)
  • ~~(: at 0xc960 (51552)
  • 0: syscall suggests it's at 0x3ffd07 (4193543), which is outside the .data section
  • )~~: at 0xc963 (51555)
  • \n: at 0xc9b8 (51640)
  • ~~B~~\n: at 0xc9dc (51676)
  • ~~(0: syscall suggests it's at 0xe218 (57880), which is outside the .data section
    • Interestingly, the full string we'd want to print here for the program to be correct, ~~(0)~~, already exists in the .data section at 0xca2c, but the Nexus VM (and maybe even RISC-V?) doesn't read it directly.
  • \n: at 0xc9b8 (51640), again
  • ~~C~~\n: at 0xca0c (51724)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

2 participants