RC v1.2.0 #10
base: master
Conversation
### Added
- Add API to access and control the HW cycle counters on the HERO accelerator side:
  - `hero_rt_start_cycle_cnt()`
  - `hero_rt_reset_cycle_cnt()`
  - `hero_rt_stop_cycle_cnt()`
  - `hero_rt_get_cycles()`
- Add `testsuite` for CI.
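For readers of the changelog, a minimal usage sketch of the new counter API; the header name and the counter's return type are assumptions, not confirmed by this PR:

```c
#include <stdint.h>
#include <hero-target.h>  // assumed header exposing the hero_rt_* API

// Hypothetical profiling pattern with the new cycle-counter API.
void profile_region(void)
{
  hero_rt_reset_cycle_cnt();   // clear the HW cycle counter
  hero_rt_start_cycle_cnt();   // start counting

  /* ... accelerator code under measurement ... */

  hero_rt_stop_cycle_cnt();    // freeze the counter
  uint32_t cycles = hero_rt_get_cycles();  // read elapsed cycles (assumed return type)
  (void)cycles;                // e.g., print or accumulate
}
```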
Thanks @alessandrocapotondi for creating this PR. I suggest the following two things:
- The proposed fix for #8 (Job IDs are dropped when crossing page boundaries) should be tested on the board.
- The target library should also provide a timer implementation for the host side. This could, for example, use Linux system functions internally.
```c
hero_rt_get_cycles()
{
  return 0x0;
}
```
Missing newline
```c
void
hero_rt_start_cycle_cnt()
{
  return;
```
Shouldn't there be a corresponding timer implementation for the host side as well? I think this would be useful when doing profiling and comparing host vs. accelerator execution. What do you think?
The PULP timer returns the number of elapsed clock cycles. How would you implement this on the host while staying consistent with the unit (clock cycles)?
I see two options:

- You use functions like `clock_gettime()` on the host, which gives you the time in seconds and nanoseconds. You can then convert this into accelerator clock cycles or host clock cycles, depending on what comparison you want to do. The frequencies of these two clocks you can get from the `pulp` struct defined and set up by libpulp (see hero-support), namely `pulp->pulp_clk_freq_mhz` and `pulp->host_clk_freq_mhz`. Alternatively, you can also get the host clock frequency from sysfs (see the sketch after this list).
- Despite executing on the host, you can still use the timer inside the accelerator directly. How to do this is illustrated here:
  https://github.com/pulp-platform/hero-support/blob/6762d055953089eea215c60a58ad3a62c6b717e2/libpulp/src/pulp.c#L503
  You just need to be careful that the accelerator is not manipulating the timer simultaneously.
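To make the first option concrete, here is a minimal sketch; the `pulp_dev_t` struct below is a hypothetical stand-in for the real `pulp` struct from libpulp:

```c
#include <stdint.h>
#include <time.h>

// Hypothetical subset of the `pulp` struct set up by libpulp (hero-support);
// only the two frequency fields mentioned above are sketched here.
typedef struct {
  unsigned int pulp_clk_freq_mhz;
  unsigned int host_clk_freq_mhz;
} pulp_dev_t;

static inline uint64_t timespec_to_ns(const struct timespec *ts)
{
  return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

// Time an interval on the host and convert it to accelerator clock cycles.
// A clock of f MHz ticks f/1000 times per nanosecond, so
// cycles = ns * f_mhz / 1000.
uint64_t host_elapsed_accel_cycles(const pulp_dev_t *pulp)
{
  struct timespec start, end;

  clock_gettime(CLOCK_MONOTONIC, &start);
  /* ... code under measurement ... */
  clock_gettime(CLOCK_MONOTONIC, &end);

  uint64_t elapsed_ns = timespec_to_ns(&end) - timespec_to_ns(&start);
  return elapsed_ns * pulp->pulp_clk_freq_mhz / 1000;
}
```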
Best regards,
Pirmin
```c
// just wait for the last one...
dma = (hero_dma_job_t)plp_dma_memcpy_priv(ext_addr, loc_addr, size_tmp, ext2loc);
//dma_job = (hero_dma_job_t)plp_dma_memcpy_priv(ext_addr, loc_addr, size_tmp, ext2loc);
dma_cmd = plp_dma_getCmd(ext2loc, size, PLP_DMA_1D, PLP_DMA_TRIG_EVT, PLP_DMA_NO_TRIG_IRQ, PLP_DMA_PRIV);
```
Yes, I will test it as soon as my other DMA and compiler issues are resolved. This way I won't have to create a custom test case.
The new `hero_dma_memcpy_async` works great; it's even faster than the old implementation by 0.5 to 1.0 bytes per cycle, depending on transfer size and whether DRAM is involved. Just the typo in line 62 needs to be fixed.
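For context, a sketch of how such a bytes-per-cycle figure could be obtained with the new counter API; `hero_dma_wait()` is an assumed blocking counterpart of the async call, and this is not necessarily the measurement used here:

```c
#include <stdint.h>
#include <hero-target.h>  // assumed header for the hero_dma_* and hero_rt_* APIs

// Hypothetical throughput measurement: bytes transferred per accelerator
// clock cycle for a single async DMA transfer.
float measure_dma_bytes_per_cycle(void *dst, void *src, int size)
{
  hero_rt_reset_cycle_cnt();
  hero_rt_start_cycle_cnt();

  hero_dma_job_t job = hero_dma_memcpy_async(dst, src, size);
  hero_dma_wait(job);  // assumed blocking wait on the returned job

  hero_rt_stop_cycle_cnt();
  uint32_t cycles = hero_rt_get_cycles();

  return cycles ? (float)size / (float)cycles : 0.0f;
}
```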
```diff
@@ -58,7 +58,8 @@ hero_dma_memcpy_async(void *dst, void *src, int size)
   int ext2loc;
   unsigned int ext_addr_tmp, ext_addr, loc_addr;
   int size_tmp = size;
-  hero_dma_job_t dma = 0;
+  hero_dma_job_t dma_job = plp_dma_counter_alloc();
+  uin32_t dma_cmd;
```
Typo: `uin32_t` => `uint32_t`
Changelog: v1.2.0 - 2019-08-02

### Fixed
- `hero_dma_memcpy_async` API. In case of big memory transfers, some DMA jobs were leaked, resulting in the exhaustion of the available DMA channels.

### Changed