Coredevice: After DMA Handle is being terminated during playback on EFC, RTIO and other driver malfunction #2214

Open
linuswck opened this issue Sep 22, 2023 · 4 comments

@linuswck

Bug Report

One-Line Summary

Delays and RTIO-related drivers used outside a self.core_dma.record() block malfunction after DMA playback is terminated mid-execution on EFC.

Issue Details

This bug was discovered while developing the example code for Shuttler (#2193), which can be modified to reproduce it.
It occurs with both DMA and DDMA. I do not know whether it is exclusive to EFC.

After self.core_dma.playback_handle() is terminated during execution, at least the following drivers malfunction when they are used outside a self.core_dma.record() block:

  1. rtio_output()
  • The related gateware does not respond to the rtio_output value
  2. rtio_input_data()
  • Always returns 0
  3. delay() and delay_mu()
  • All delays are skipped

However, self.core.reset() appears to work, judging by the logs on both Kasli and EFC, and all of the drivers listed above still work inside a self.core_dma.record() block.

Steps to Reproduce

  1. Wrap the self.core_dma.playback_handle(example_waveform_handle) statement in a while True loop
  2. Add the following lines after self.led() to test writing to and reading from an RTIO channel:
self.pdq_config.set_offset(0, 0b1111)
delay(.1*s)
print(self.pdq_config.get_offset(0))
delay(.1*s)
  3. Power on the board and wait for a successful connection between Kasli and EFC
  4. Run the coredevice example code and wait for it to enter the infinite loop
  5. Press Ctrl+C to terminate the program
  6. Rerun the coredevice example code; the bug occurs

To show that self.core_dma.playback_handle() is still functional, you can comment out the self.init() line.

Expected Behavior

  • LED blinks
  • Offset is written and read correctly
  • DAC is calibrated by ADC successfully
  • The relays are reset to all off
  • All PDQ output channels are reset
  • The example waveform is output indefinitely

Actual (undesired) Behavior

The LED does not blink and the relays are not reset. None of the delays take effect, and the program stops immediately at
assert self.afe_adc.read_id() >> 4 == 0x038d because read_id() returns zero.
The printed offset value is incorrect: it is 0.

Your System

  • ARTIQ version: ARTIQ8 Beta
  • ARTIQ commit: 36b3678
  • Hardware involved: Kasli, EFC, Shuttler
@occheung

occheung commented Sep 22, 2023

The lack of RTIO output & proper input handling can be reproduced using a standalone variant.

from artiq.experiment import *


class Blinky(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.led0 = self.get_device("led0")
        self.led1 = self.get_device("led1")
        self.ttlin = self.get_device("ttl0")
        self.setattr_device("core_dma")

        self.setattr_argument("led0_state", NumberValue(ndecimals=0, step=1))

    @kernel
    def record(self):
        with self.core_dma.record("blink"):
            self.led1.pulse(1*s)
            delay(1*s)
    
    @kernel
    def run(self):
        self.core.reset()
        self.led0.set_o(bool(self.led0_state))
        if self.led0_state:
            print(self.ttlin.sample_get())
            self.core.break_realtime()
        self.led1.off()
        self.record()

        blink_handle = self.core_dma.get_handle("blink")
        self.core.break_realtime()

        while True:
            self.core_dma.playback_handle(blink_handle)

Steps

  1. Run this code with led0_state=0. LED0 should be off while LED1 blinks.
  2. Ctrl-C
  3. Run this code again with led0_state=1.
    • LED0 is still off.
    • TTL input reports 0 (normally it should hang the kernel due to the lack of inputs).
    • LED1 blinks.

@occheung

occheung commented Sep 23, 2023

I think it is most likely a CRI routing issue.

When DMA starts playback, the CRI listens to the DMA instead of the kernel, until the playback is completed.

csr::cri_con::selected_write(1);
csr::rtio_dma::enable_write(1);
#[cfg(has_drtio)]
if _uses_ddma {
    send(&DmaStartRemoteRequest { id: ptr as i32, timestamp: timestamp });
}
while csr::rtio_dma::enable_read() != 0 {}
csr::cri_con::selected_write(0);

Since playback runs in an infinite loop, terminating the kernel with Ctrl-C will very likely happen before L415 (the final csr::cri_con::selected_write(0) above) executes, which leaves the CRI routing RTIO instructions from the DMA rather than from the kernel.

Executing L415 before starting the new kernel fixes the issue in my example (which does not use DDMA).
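
To make the suspected mechanism concrete, here is a toy Python model of the routing. It is only a sketch and not ARTIQ code: CriMux, dma_playback, run_kernel and demo are hypothetical names, and the model assumes that events from the non-selected master are dropped and that its reads return 0, which matches the symptoms reported above. The value 0x038D is just an illustrative input.

```python
# Toy model of the CRI routing described above. All names are hypothetical;
# this only illustrates the suspected failure mode, it is not firmware code.

KERNEL, DMA = 0, 1  # mirrors cri_con::selected_write(0) and selected_write(1)


class KernelTerminated(Exception):
    """Stands in for Ctrl-C stopping the kernel in the middle of playback."""


class CriMux:
    """Forwards RTIO traffic only for the currently selected master."""

    def __init__(self):
        self.selected = KERNEL
        self.outputs = []  # events that actually reached the RTIO core

    def output(self, source, value):
        # Assumption: events from the non-selected master are silently dropped.
        if source == self.selected:
            self.outputs.append((source, value))

    def input(self, source, actual):
        # Assumption: the non-selected master reads back 0.
        return actual if source == self.selected else 0


def dma_playback(mux, trace, interrupt_at=None):
    """Select the DMA, play the trace, then hand routing back to the kernel."""
    mux.selected = DMA
    for i, value in enumerate(trace):
        if i == interrupt_at:
            # The kernel is killed here, so the reselect below never runs.
            raise KernelTerminated
        mux.output(DMA, value)
    mux.selected = KERNEL


def run_kernel(mux, reset_routing):
    """A fresh kernel that writes one event and reads one input."""
    if reset_routing:
        # Workaround: restore kernel routing before the new kernel runs.
        mux.selected = KERNEL
    mux.output(KERNEL, "led0 on")
    return mux.input(KERNEL, actual=0x038D)


def demo(reset_routing):
    """Interrupt a playback, then run a new kernel; return (wrote, read)."""
    mux = CriMux()
    try:
        dma_playback(mux, ["led1 on", "led1 off"], interrupt_at=1)
    except KernelTerminated:
        pass
    read = run_kernel(mux, reset_routing)
    wrote = (KERNEL, "led0 on") in mux.outputs
    return wrote, read
```

In this model, demo(False) loses the kernel's write and reads 0 because the selection is still stuck on the DMA, while demo(True) behaves normally once the selection is restored before the new kernel runs.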

@linuswck

I found that running the experiment through the scheduler in the following way also reproduces this bug.

Steps to Reproduce:

  1. Add self.setattr_device("scheduler") to def build()
  2. Wrap the self.core_dma.playback_handle(example_waveform_handle) statement in a while not(self.scheduler.check_termination()): loop
  3. Run artiq_master
  4. Use artiq_client submit to submit the same experiment twice; the two runs should get RID 0 and RID 1 respectively.
  5. Terminate RID 0 with artiq_client delete 0.

The RID 0 experiment terminates as usual, but the RID 1 experiment shows the same buggy behavior.
(Same as described in the PR description)

@Spaqin

Spaqin commented Sep 25, 2023

In that case, should core.reset() also reset the cri_con back to 0?
