Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pios_usart on F4: DMA send/receive implementation (WIP) #2032

Open
wants to merge 6 commits into
base: next
Choose a base branch
from

Conversation

glowtape
Copy link
Member

Current implementation switches an USART to DMA RX and/or TX, if there's an respective configuration in board_hw_defs. (Should probably add a bitfield to HwShared to knock out DMA on certain USARTs purpose.)

Test on my Revo clone, which idles at 26%, enabling 250hz fullbore logging, without DMA, it jumps to 63%, however with DMA, just to 34%. (DMA buffer size is currently configured to fixed 64 bytes.)

Using Crossfire on my Seppuku, CPU usage drops by 1.5-2% from 58% (with small dips to 57%) to 56%.

Very likely can't enable all USARTs with DMA, due to eventual stream conflicts with DShot and/or WS2811 and/or Seppuku OSD. Haven't played extensive DMA chart bingo yet. On Seppuku for instance, DMA RX on the dedicated receiver port clashes with DAC.

Can't operate multiple channels per stream without micromanaging everything, i.e. reinitializing DMA stream configuration for each transfer, and scheduling them to avoid collisions, which would require some managing infrastructure.

Not really an issue, tho, since current best use for this would just be OpenLager, from what I can tell. So we'd need theoretically just need a single USART configured for DMA TX. And document it somewhere so that users pick the right pin.

Posted for appraisal.

@tracernz tracernz added this to the Post-Wired milestone Jan 19, 2018
} else {
/* No bytes to send, disable TXE interrupt */
USART_ITConfig(usart_dev->cfg->regs, USART_IT_TXE, DISABLE);
}
}
}

static uintptr_t PIOS_USART_GetDevice(const struct pios_usart_cfg *cfg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the entire PR except this :/

Copy link
Member Author

@glowtape glowtape May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need access to some data in the dev handle. The alternatives to ditching this one I came up with would be:

  • Removing the static qualifier of the PIOS_USART_x_id fields and then extern'ing it in the handler over in board_hw_defs, to pass it as parameter.
  • Implement both handlers per USART over in pios_usart and call them from board_hw_defs, similar to the regular IRQ handlers, but unassigned.
  • Move stuff into the config struct (icky).

The current way is more or less the inverse of this:
https://github.com/d-ronin/dRonin/blob/next/flight/PiOS/STM32F4xx/pios_usart.c#L196

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna do per-USART handlers. Hides everything from board_hw_defs.


// FIXME XXX Clear / reset uart here - sends NUL char else

/* Check DMA transmit configuration and prepare DMA. */
if (cfg->dma_send) {
usart_dev->dma_send_buffer = (uint32_t)PIOS_malloc(PIOS_USART_DMA_SEND_BUFFER_SZ);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no sane zero-copy / way to use the PIOS_Com buffers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably do something with circ_queue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, circ_queue was written with DMA in mind (though I can't promise perfectly so, as I've not used it this way yet).

You can get a bunch of the buffer space, for receive, and fill it and then circ_queue_advance_write_multi. Similarly you can get a block of read data but keep it around until released with circ_queue_read_completed_multi.

The big caveats are: A) You don't want to keep too much of the transmit data around after it's been sent, clogging up the buffer... so you don't want to say, DMA 300 bytes and then only release them from the buffer all at once when the DMA is completed. B) When's the right time to make data available to receive? You really need per-byte interrupts still in case someone is waiting for the last byte, which means the interrupt loading isn't improved much on receive.

Might be worth skipping receive DMA even, for this reason.

Copy link
Member Author

@glowtape glowtape May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Receive DMA would be interesting to axe most of the Crossfire interrupts. :P

Or any interrupts from serial RX. Saves 1.5% CPU on my quad with CRSF, going from interrupts to DMA.

Regarding circ_queue, I'll probably have to introduce two new callbacks for this. The current ones just copy things into a buffer you need to supply.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as receive goes, it uses idle detection available on the F4. So when receive goes idle, even after the last byte, the USARTs IRQ handler stops DMA, which triggers the DMA handler and empties whatever it got into PIOS_Com. Might be a bit heavy handed for straddling bytes, but generally it improves things, see the 1.5% CPU savings mentioned earlier.

I'll probably add a way to hint that we don't want DMA, so depending on the device type, PIOS HAL can request to not use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah. Maybe don't even stop DMA. Get a chunk of circ_queue to receive into, and if you get an interrupt, hand over some of the bytes (this should be cheap, so even if there's brief idle intervals it's not to bad). When DMA ends (because buffer is exhausted) allocate a new chunk from circqueue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Hand over bytes either when DMA ends OR idle).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah shit this may be race-y. Because we could be idle but the last character may not have been DMA'd yet.

@glowtape glowtape modified the milestones: Inconceivable, Ludicrous Apr 10, 2018
@glowtape
Copy link
Member Author

glowtape commented May 1, 2018

Rebased, changed the handler stuff to axe _GetDevice(), added a flag field to pios_usart_params to supply DMA hints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants