
MPFS Icicle Kit knsh config failure #12259

Open
MainframeReboot opened this issue Apr 29, 2024 · 11 comments

Comments

@MainframeReboot

Hi,

I am new to NuttX and have hit a wall when it comes to running NuttX as a kernel build. The provided flat build (nsh) config worked and I was able to run NuttX without issue. Building NuttX using the provided kernel build config (knsh) causes errors when running on the Icicle kit:

[screenshot: crash output from the Icicle kit serial console]

I have attempted to debug this issue on my own, and while comparing the Icicle kit knsh config to other RISC-V platform knsh configs, I've noticed a number of parameters that differ. For instance, in the Icicle kit knsh config, the number of pages for DATA, HEAP, and TEXT are all set to 0, as are the virtual base addresses for TEXT and HEAP:

CONFIG_ARCH_DATA_NPAGES=0
CONFIG_ARCH_DATA_VBASE=0xC0000000
CONFIG_ARCH_HEAP_NPAGES=0
CONFIG_ARCH_HEAP_VBASE=0x00000000
CONFIG_ARCH_TEXT_NPAGES=0
CONFIG_ARCH_TEXT_VBASE=0x00000000

Is this potentially the problem or am I completely off track here? Any direction here would be greatly appreciated!

@acassis (Contributor) commented Apr 29, 2024

@jkivilin @pussuw do you know about this issue?

@pussuw (Contributor) commented Apr 30, 2024

Looks like a PMP/memory protection issue. The knsh config does not work out of the box; you need OpenSBI to run this config.

@MainframeReboot (Author)

> Looks like a PMP/memory protection issue. The knsh config does not work out of the box; you need OpenSBI to run this config.

I took a look and OpenSBI is running (at least the logo displays so I assume it is running as part of the HSS). Is there any more information I can get regarding what needs to be done to get this to work?

@pussuw (Contributor) commented Apr 30, 2024

OpenSBI domains control the PMP configuration. Set the memory areas and access rights accordingly, or open up everything. Disabling PMP is not enough, as supervisor/user modes need to be given explicit access to memory.

I don't use the HSS, so I can't help you with that part.

@sastel commented May 1, 2024

@pussuw to summarize what you are saying, we have to set up the HSS OpenSBI memory areas and access rights in a way that matches/supports the NuttX knsh config. Did I get that right?

@MainframeReboot I'm pretty sure the PMP configuration is done through Microchip's MSS Configurator tool, which generates an .xml file. The HSS makefile runs this .xml file through a Python script to generate a bunch of header/source files for the HSS build. This config is what the HSS applies to the MSS during boot.

You can see that the sample .xml for the Icicle kit has PMP config at ICICLE_MSS_mss_cfg.xml#L797-L800:

            <register address="0x00000016" description="pmp setup register, 64 bits" name="MPU_CFG_PMP2">
            <field Type="RW" name="PMP" offset="0" width="38">0xFFFFFFFFF</field>
            <field Type="RW" name="RESERVED" offset="38" width="18">0x0</field>
            <field Type="RW" name="MODE" offset="56" width="8">0x1F</field>

Here are some resources on OpenSBI.
https://www.thegoodpenguin.co.uk/blog/an-overview-of-opensbi/
https://riscv.org/wp-content/uploads/2019/12/Summit_bootflow.pdf

@pussuw (Contributor) commented May 6, 2024

The easiest way is to just open up everything and not worry about PMP. PMP is used for inter-hart isolation; if you don't use AMP, I would not worry about it.

Our own stripped down / limited SBI does it during boot:

  /* Open everything for PMP */

  WRITE_CSR(pmpaddr0, -1);
  WRITE_CSR(pmpcfg0, (PMPCFG_A_NAPOT | PMPCFG_R | PMPCFG_W | PMPCFG_X));

Don't know how this is done for the HSS bootloader, but there must be some way.

Adding one rule in the first pmpcfg register is enough: the CPU goes through the PMP list in order, and as soon as a rule matching the memory area is found, access is either granted or revoked, so it does not matter what the rest of the pmpcfg registers contain.

Why not just leave PMP unconfigured? Like I said, access must be explicitly GRANTED for U-/S-modes; otherwise access is given only to M-mode. This is why the PMP configuration step is mandatory.

@MainframeReboot (Author)

I have spent another day debugging, and although I have made some progress in pushing the crash later in the boot, I am still stuck on getting the build to run.

So far I have done the following:

  • Confirmed pmpcfg0 is set to 0x1F (NAPOT/R/W/X)
  • Confirmed pmpaddr0 is set to max (0xFFFFFFFF)
  • Confirmed that the satp register is being configured correctly to allow virtual addressing.
  • Attempted this build with the skip-opensbi flag in the hss-payload-generator set to both true and false while toggling the NuttSBI feature on and off.
  • Increased stack sizes.
  • Followed the code to the initial spot of failure, which is not related to any interrupts so apologies for the confusion in my last post.

It appears that in the default knsh config, CONFIG_RAM_VSTART is set to 0x0 and CONFIG_RAM_START is set to 0x80200000 (this matches the linker script). When the code enters the following function with the passed in paddr value set to 0x8020E000:

static inline uintptr_t riscv_pgvaddr(uintptr_t paddr)
{
  if (paddr >= CONFIG_ARCH_PGPOOL_PBASE && paddr < CONFIG_ARCH_PGPOOL_PEND)
    {
      return paddr - CONFIG_ARCH_PGPOOL_PBASE + CONFIG_ARCH_PGPOOL_VBASE;
    }
  else if (paddr >= CONFIG_RAM_START && paddr < CONFIG_RAM_END)
    {
      return paddr - CONFIG_RAM_START + CONFIG_RAM_VSTART;
    }

  return 0;
}

The returned value is 0xE000. This value is then passed to mmu_ln_getentry as lnvaddr and used as the memory address of lntable. The DEBUGASSERT call succeeds, but the subsequent indexing of lntable immediately causes a crash:

uintptr_t mmu_ln_getentry(uint32_t ptlevel, uintptr_t lnvaddr,
                          uintptr_t vaddr)
{
  uintptr_t *lntable = (uintptr_t *)lnvaddr;
  uint32_t  index;

  DEBUGASSERT(ptlevel > 0 && ptlevel <= RV_MMU_PT_LEVELS);

  index = (vaddr >> RV_MMU_VADDR_SHIFT(ptlevel)) & RV_MMU_VPN_MASK;

  return lntable[index]; // Crashes here.
}

Out of curiosity I changed CONFIG_RAM_VSTART to 0x80200000 so that lnvaddr is no longer 0xE000, as that did not seem right, but this just causes issues further down the line (in the riscv_stack_color function, to be exact), so I am assuming this is not correct.

Any ideas on what I could be doing wrong here given all of the things I have tried?

@pussuw (Contributor) commented May 7, 2024

I have not tried the knsh config in many months, maybe there is some regression in upstream.

The mapping should be vaddr=paddr so CONFIG_RAM_START==CONFIG_RAM_VSTART==0x80200000. I think this has been the default behavior before, maybe something has changed that breaks this?

If setting CONFIG_RAM_VSTART explicitly in the defconfig fixes this, then it is a correct fix. I'm just wondering, where does this query for address 0x8020E000 come from? As far as I can remember, riscv_pgvaddr is used only in the MMU / address environment logic, and those addresses should be in page pool memory.

Problems in riscv_stack_color could indicate a problem with user heap, as the process's stack is allocated from there.

I run a downstream mpfs target with kernel mode all the time, so that should work just fine. Thus, it would make sense that the icicle:knsh defconfig has some issue causing your crashes, or that there is a regression upstream we have not detected yet.

@MainframeReboot (Author)

I can provide some more information regarding the queries in question.

I started debugging from the function mpfs_start, which contains a call to mpfs_mm_init. This function does two things:

  1. Sets up the Kernel mapping through a function call to mpfs_kernel_mappings. In this step, lnvaddr looks reasonable (0x8020D000).
  2. Enables the MMU through a call to mmu_enable.

After step 2, nx_start is called, which runs until it hits kmm_map_initialize. Within this function is a call to up_addrenv_kmap_init, and this is where the problems begin. Here, the value of paddr (0x8020E000) is obtained from g_kernel_pgt_pbase, and the value of vaddr is obtained from CONFIG_ARCH_KMAP_VBASE (which is set to 0xBF000000 in the default config). paddr is then passed on to riscv_pgvaddr. Since this value is not >= CONFIG_ARCH_PGPOOL_PBASE (0x80400000) but is >= CONFIG_RAM_START (0x80200000), the value 0xE000 is returned (until I explicitly set CONFIG_RAM_VSTART to 0x80200000).

Hopefully this provides more clarity into what is occurring in my system at the moment. I will attempt a few more things tomorrow, including going over the older versions of the knsh config to see if I can spot anything, as well as looking into my user heap to ensure it's being configured correctly.

Thanks again for your support through this debug.

@pussuw (Contributor) commented May 8, 2024

Yes that makes sense, forgot I enabled kmap for icicle. Adding CONFIG_RAM_VSTART to the defconfig should fix this issue. What's causing the stack coloring issue depends on which stack is in question.

@MainframeReboot (Author)

I did some more debugging and have managed to boot NuttX in kernel mode, sort of.

After updating CONFIG_RAM_VSTART to 0x80200000, the initial crash went away but led to another crash that I tracked down to the function up_initial_state within riscv_initialstate.c.

Within this function, the stack_alloc_ptr field of the passed-in task control block is set to a value obtained from the array g_cpux_idlestack. This array is indexed using the value returned by a call to riscv_mhartid().

The value returned by riscv_mhartid() for my current build is 1. This makes sense, as I am using the HSS and building the NuttX payload to run on the U54_1 core, while the HSS runs on the E51 (hart ID 0).

What isn't good is that g_cpux_idlestack is defined within riscv_common_memorymap.h as EXTERN const uint8_t *g_cpux_idlestack[CONFIG_SMP_NCPUS];. I am not using SMP, so this value is not set in the config and defaults to 1. Then, inside riscv_set_basestack, g_cpux_idlestack is populated using CONFIG_SMP_NCPUS:

static inline void riscv_set_basestack(uintptr_t base, uintptr_t size)
{
  unsigned int i;

  for (i = 0; i < CONFIG_SMP_NCPUS; i++)
    {
      g_cpux_idlestack[i] = (const uint8_t *)(base + size * i);
    }
}

So we can see a value for the idle stack is being created and set, but it is then accessed incorrectly within up_initial_state, which indexes the array with the hart ID obtained from riscv_mhartid(); in my build that is 1, and nothing is set there:

 if (tcb->pid == IDLE_PROCESS_ID)
    {
      tcb->stack_alloc_ptr = (void *)g_cpux_idlestack[riscv_mhartid()]; // hartid is 1 but only index 0 exists
      tcb->stack_base_ptr  = tcb->stack_alloc_ptr;
      tcb->adj_stack_size  = SMP_STACK_SIZE;

Is the idea that the NuttX kernel must be used alongside SMP? Even so, consider a scenario where the HSS boots a non-NuttX application on cores 1 and 2 and NuttX on cores 3 and 4: CONFIG_SMP_NCPUS would be set to 2, so g_cpux_idlestack would have entries at indices 0 and 1, but the riscv_mhartid() calls would return 3 and 4 respectively, so this fault would still occur. Not to mention that anyone using the Icicle kit in its default flow of having the HSS boot payloads would always run into this fault, since the HSS runs on hart 0; even if all four U54 cores were used and the array were filled with four values, the call to obtain the hart ID would inevitably return 4 at some point and cause a fault.

For the purposes of my debug, I have indexed the array directly at 0 just to get NuttX to boot in kernel mode. It now does so, albeit with an error when it attempts to load the ELF file /bin/init, but I am assuming the default mpfs code doesn't mount a file system, so I can work on getting that going now.
