word-wise dma copy #175

ricrpi · 2016-08-09T10:44:41Z

If DMA transfers are word aligned then a more efficient copy can be done. I have not seen a dma transfer that is not word aligned however I have not extensively checked all roms so have left the safer byte-wise implementation in.

Narann · 2016-08-09T14:40:43Z

src/pi/pi_controller.c

-    }
+	if (((dram_address | rom_address | longueur) & 3) != 0)
+	{	
+    	for(i = 0; i < longueur; ++i)


Can I suggest to add a tab in from of the for loop to keep code readable?

Narann · 2016-08-09T14:55:36Z

I have not extensively checked all roms so have left the safer byte-wise implementation in.

From the manual:

All DMA transactions in the Nintendo 64 must use 64 bit aligned for data in RDRAM. DMA transactions for data in ROM must use 16 bit aligned addresses. )

So, to keep code clean, I suggest to remove old code and assume DMA are always 16 bits aligned.

What do you think?

wareya · 2016-08-09T15:01:02Z

Depends on what the console actually does. Manuals are known to be "best practice for stability/speed" instead of "the hardware requires this".

I would keep the former behavior until someone pops in with a hardware test.

Narann · 2016-08-09T15:18:36Z

Thinking about it, you're surely right, if it's technically possible we can't assume it will not happen...

richard42 · 2016-08-16T05:29:19Z

I'm a little hesitant to merge this because it increases the code complexity a fair amount, for a performance gain on the raspberry pi / arm architectures. Do you have any quantitative data on the performance improvement from using this aligned memcpy in real world scenarios? If it's just moving a few hundred or thousand bytes per VI then it's probably not worth it. But if these loops are run for millions of iterations per VI then it would be.

ricrpi · 2016-08-16T06:37:36Z

The performance increase is 1 to 2 % faster (timing to frame 501), depending on ROM on a Pi B.
It isn't much of an improvement but given Zelda OOT intro is only running at ~70% full speed it is significant.

It would be interesting to know if any ROMS require byte-wise copy as if none do then the word-wise only implementation would be simpler than the current byte-wise copy.

loganmc10 · 2017-10-21T03:12:09Z

There is merit to this. RSP DMA transfers must be 8-byte aligned. The N64 ignores the lower 3 bits of the source and destination address. It also ignores the lower 3 bits of the length (or rather, it assumes the lower 3 bits are all 1's, then it adds 1).

As long as we properly mask the source and destination addresses, the transfers should always be 8-byte aligned.

@ricrpi I don't know if you're still around. I'm far less confident in the alignment of the PI DMA's, but I am very confident in the 8-byte alignment of the RSP DMA's. If you update this PR to just to 8-byte aligned RSP DMA's (with no fallback) I'd be willing to do a fair bit of testing on it

ricrpi · 2017-10-21T07:56:31Z

I am trying to keep up with all the commits/issues but have very limited time to do any coding at the moment :-(

I will try to take a look at this though as it shouldn't take too long to patch in.

word-wise dma copy

90350ae

Narann reviewed Aug 9, 2016
View reviewed changes

commented and fixed code-alignment

d07c727

Gillou68310 force-pushed the master branch from 9596180 to 0666982 Compare January 29, 2019 15:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

word-wise dma copy #175

word-wise dma copy #175

ricrpi commented Aug 9, 2016

Narann Aug 9, 2016

Narann commented Aug 9, 2016

wareya commented Aug 9, 2016

Narann commented Aug 9, 2016

richard42 commented Aug 16, 2016

ricrpi commented Aug 16, 2016

loganmc10 commented Oct 21, 2017

ricrpi commented Oct 21, 2017

word-wise dma copy #175

Are you sure you want to change the base?

word-wise dma copy #175

Conversation

ricrpi commented Aug 9, 2016

Narann Aug 9, 2016

Choose a reason for hiding this comment

Narann commented Aug 9, 2016

wareya commented Aug 9, 2016

Narann commented Aug 9, 2016

richard42 commented Aug 16, 2016

ricrpi commented Aug 16, 2016

loganmc10 commented Oct 21, 2017

ricrpi commented Oct 21, 2017