
dsp hangs RedPitaya (caused by zeroing the global timer counter) #11

Open
carandraug opened this issue May 23, 2019 · 6 comments · May be fixed by #15

Comments

@carandraug
Member

The current dsp code hangs the RedPitaya. The simplest way to reproduce this is to use an empty action table.

$ touch empty-action-table
$ ./dsp empty-action-table
hello, w
hello, world!
actiontable is 0 lines long
alloc action table
read action table file.
## output stops here, press enter to flush
exec action table
faffing with actiontables
set time
exec action table done.
## At this point, the system becomes unresponsive. We can no longer ssh in and need to power off.

I have reproduced this on RedPitaya OS versions 0.94 (the oldest we have), 0.95, and 0.97. I have tried Tom's code on the master branch (version after his summer project) and on tom-december-changes (with his work during summer). I have reduced this to a minimal example that sets the global timer counter to zero:

// build with:
// gcc -Wall -g -O2 -c test.c -o test.o
// gcc -Wall -g -O2 -o test test.o

#include <stdint.h>
#include <stdio.h>

#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <errno.h>

#define XPS_SCU_PERIPH_BASE		0xF8F00000U
#define XPAR_GLOBAL_TMR_BASEADDR	(XPS_SCU_PERIPH_BASE + 0x00000200U)

#define GLOBAL_TMR_BASEADDR               (XPAR_GLOBAL_TMR_BASEADDR-0x200U)
#define GTIMER_COUNTER_LOWER_OFFSET       (0x00U+0x200U)
#define GTIMER_COUNTER_UPPER_OFFSET       (0x04U+0x200U)
#define GTIMER_CONTROL_OFFSET             (0x08U+0x200U)

#define PAGE_SIZE ((size_t)getpagesize())
#define PAGE_MASK ((uint64_t)(long)~(PAGE_SIZE - 1))

int
main(int argc, char *argv[])
{
  int TIMER_FD = open("/dev/mem", O_RDWR|O_SYNC);
  if (TIMER_FD < 0) {
      fprintf(stderr, "open(/dev/mem) failed (%d)\n", errno);
      return 1;
  }

  volatile uint8_t* TIMER_MMAP;
  TIMER_MMAP = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
                    TIMER_FD, GLOBAL_TMR_BASEADDR);
  if (TIMER_MMAP == MAP_FAILED) {
      fprintf(stderr, "mmap(0x%zx@0x%x) failed (%d)\n",
              PAGE_SIZE, (unsigned)GLOBAL_TMR_BASEADDR, errno);
      return 1;
  }

  // Disable Global Timer
  *(volatile uint32_t *)(TIMER_MMAP+GTIMER_CONTROL_OFFSET) = 0x00;

  // Set Global Timer Counter Register to zero
  // Comment out these lines and the system no longer hangs.
  *(volatile uint32_t *)(TIMER_MMAP+GTIMER_COUNTER_LOWER_OFFSET) = (uint32_t)0;
  *(volatile uint32_t *)(TIMER_MMAP+GTIMER_COUNTER_UPPER_OFFSET) = (uint32_t)0;

  // Re-enable Global Timer
  *(volatile uint32_t *)(TIMER_MMAP+GTIMER_CONTROL_OFFSET) = (uint32_t)0x1;
  return 0;
}

Tiago has a version of the dsp program on his redpitaya branch that works, but his version never zeroes the counter register. Instead, it reads the initial counter value and computes times as differences from it.

I have asked Tiago, and he remembers seeing the same behaviour I am seeing now.

But I guess this must have worked at some point somehow.

@tomparks can you shed some light on this?

@carandraug
Member Author

For what it's worth, I don't see anything wrong with the code. I have also checked the instructions for the global timer in the ARM Cortex-A9 MPCore Technical Reference Manual (sections 4.4.1 and 4.4.2), and it seems we are doing exactly what they document.

@coralmw
Contributor

coralmw commented Jun 3, 2019

I had a go at reproducing the issue in QEMU: no hang, but this is on Xilinx upstream 2014.4 (gist here). I used 2014.4 because it is the most recent version that matches the file structure assumed by the Xilinx wiki QEMU guide, which has separate kernel, device tree, and rootfs images.

The copy of the code I was working on had a git checkout of the https://github.com/RedPitaya/RedPitaya at tag:0.92 in it, if that helps the bisect process. I'm pretty sure it didn't crash back then!

@carandraug
Member Author

> The copy of the code I was working on had a git checkout of the https://github.com/RedPitaya/RedPitaya at tag:0.92 in it, if that helps the bisect process.

Were you building the system from source back then? The problem is that I can only find OS images from 0.94 onwards (see the RedPitaya downloads: there is a 0.92 directory, but it only includes the ecosystem). I have tried to build 0.92 from source, but that needs gcc 4 or older.

@carandraug
Member Author

I have tried to use web.archive to get the old RedPitaya builds. It seems that back then they were using Dropbox to distribute the builds, which the web archive did not archive (see the archive from February 2015). The next snapshot is from March 2016, which is when new builds required an account and they introduced the download servers, which only include the ecosystem for older releases.

@mickp
Member

mickp commented Sep 25, 2019

I started testing which timer adjustments we could get away with, and it looks like (at least) the SD card interface relies on the global timer:

initTimer: 0
GetTime: 261274587014
GetTime: 261274595081
Add 5 seconds
SetTime 262941261747
GetTime: 262941261777
Subtract 1mmc0: Timeout waiting for hardware interrupt.
sdhci: =========== REGISTER DUMP (mmc0)===========
sdhci: Sys addr: 0x1bdc0000 | Version:  0x00008901
sdhci: Blk size: 0x00007200 | Blk cnt:  0x00000000
sdhci: Argument: 0x0033e0e0 | Trn mode: 0x00000027
sdhci: Present:  0x01ff0000 | Host ctl: 0x00000003
sdhci: Power:    0x0000000f | Blk gap:  0x00000000
sdhci: Wake-up:  0x00000000 | Clock:    0x00000207
sdhci: Timeout:  0x0000000e | Int stat: 0x00000002
sdhci: Int enab: 0x02ff008b | Sig enab: 0x02ff008b
sdhci: AC12 err: 0x00000000 | Slot int: 0x00000001
sdhci: Caps:     0x69ec0080 | Caps_1:   0x00000000
sdhci: Cmd:      0x0000193a | Max curr: 0x00000001
sdhci: Host ctl2: 0x00000000
sdhci: ===========================================

I think that means we can't mess with it safely. Alternatives:

  • Execute the ActionTable in chunks, so that each chunk will end before the next timer overflow.
  • See if we can use the private timer on the core we're executing on.

@mickp
Member

mickp commented Sep 25, 2019

We won't ever need to reset the global timer in our lifetimes. The global timer counter is a 64-bit value (stored as a pair of adjacent 32-bit registers) incremented every 3 ns, so it only rolls over after about 1755 years.

From RedPitaya's usleep.c:

/* Global Timer is always clocked at half of the CPU frequency */
#define COUNTS_PER_USECOND  (XPAR_CPU_CORTEXA9_CORE_CLOCK_FREQ_HZ / (2*1000000))

From xparameters.h

/* Definitions for peripheral PS7_CORTEXA9_1 */
#define XPAR_PS7_CORTEXA9_1_CPU_CLK_FREQ_HZ 666666687

Confirmed on board RP-F0708F: the counter is incremented every 3ns.

@mickp mickp linked a pull request Sep 25, 2019 that will close this issue

3 participants