Skip to content
This repository has been archived by the owner on Jan 6, 2022. It is now read-only.

RC v1.2.0 #10

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

RC v1.2.0 #10

wants to merge 2 commits into from

Conversation

alessandrocapotondi
Copy link
Contributor

Changelog: v1.2.0 - 2019-08-02

Fixed

  • Fix #8. Fixed hero_dma_memcpy_async API. In case of big memory transfers, some DMA job were leacked, resulting on the termination of DMA channels available.

Changed

  • Added API to access HW cycles counters.

 ### Added
 - Add API to access and control the HW cycle counters on the HERO accelerator-side:
 	- `hero_rt_start_cycle_cnt()`
	- `hero_rt_reset_cycle_cnt()`
	- `hero_rt_stop_cycle_cnt()`
	- `hero_rt_get_cycles()`
 - Add `testsuite` for CI.
 ## Changelog: v1.2.0 - 2019-08-02

 ### Fixed
 - Fix [#8](#8). Fixed `hero_dma_memcpy_async` API. In case of big memory transfers, some DMA job were leacked, resulting on the termination of DMA channels available.

 ### Changed
 - Added API to access HW cycles counters.
@alessandrocapotondi alessandrocapotondi changed the title Rel v1.2.0 RC v1.2.0 Aug 2, 2019
Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alessandrocapotondi for creating this PR. I suggest the following two things:

  1. The proposed fix for Job IDs are dropped when crossing page boundaries #8 should be tested on the board.
  2. The target library should also provide a timer implementation for the host side. This can for example use Linux system functions internally.

hero_rt_get_cycles()
{
return 0x0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline

void
hero_rt_start_cycle_cnt()
{
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be a corresponding timer implementation for the host side as well? I think this would be useful when doing profiling and comparing host vs. accelerator execution. What do you think?

Copy link

@ctbur ctbur Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PULP timer returns the number passed of clock cycles. How would you implement this on the host while staying consistent with the unit (clock cycles)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two options:

  1. You are using functions like clock_gettime() on the host which gives you the time in seconds and nanoseconds. You can then convert this into accelerator clock cycles or host clock cycles depending on what comparison you want to do. The frequencies of these two clocks you can get from the pulp struct defined and set up by libpulp (see hero-support), namely pulp->pulp_clk_freq_mhz and pulp->host_clk_freq_mhz. Alternatively, you can also get the host clock frequency from sysfs.

  2. Despite executing on the host, you can still use the timer inside the accelerator directly. How to do this is illustrated here:
    https://github.com/pulp-platform/hero-support/blob/6762d055953089eea215c60a58ad3a62c6b717e2/libpulp/src/pulp.c#L503
    You just need to be careful that the accelerator is not manipulating the timer simultaneously.

Best regards,
Pirmin

// just wait for the last one...
dma = (hero_dma_job_t)plp_dma_memcpy_priv(ext_addr,loc_addr,size_tmp,ext2loc);
//dma_job = (hero_dma_job_t)plp_dma_memcpy_priv(ext_addr,loc_addr,size_tmp,ext2loc);
dma_cmd = plp_dma_getCmd(ext2loc, size, PLP_DMA_1D, PLP_DMA_TRIG_EVT, PLP_DMA_NO_TRIG_IRQ, PLP_DMA_PRIV);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this change is sufficient to fix the issue described in #8 . This should definitely be tested. @ctbur could you have a look please?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will test it as soon as my other DMA and compiler issues are resolved. This way I won't have to create a custom test case.

Copy link

@ctbur ctbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new hero_dma_memcpy_async works great, it's even faster than the old implementation by 0.5 - 1.0 bytes per cycle, depending on transfer size and whether DRAM is involved or not. Just the typo in line 62 needs to be fixed.

@@ -58,7 +58,8 @@ hero_dma_memcpy_async(void *dst, void *src, int size)
int ext2loc;
unsigned int ext_addr_tmp, ext_addr, loc_addr;
int size_tmp = size;
hero_dma_job_t dma = 0;
hero_dma_job_t dma_job = plp_dma_counter_alloc();
uin32_t dma_cmd;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: uin32_t => uint32_t

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants