I2C clock-stretching bug #4884
Comments
Your "Steps to reproduce the behaviour" does not include the step "find some hardware that uses I2C clock stretching". We're not about to go and spend significant money on a UCD9081 EVM kit, so something else is required. A recent kernel commit (available in 5.10.95 via |
This issue was described very well a few years ago in #254 for the old BCM2835 SoC, and the description still applies exactly to the newer BCM2711 used on the RPi4, which was claimed to include the fix. The suggestions made by @rewolff for a fix in the driver would be a great start. Although that would not eradicate the problem, it would at least make hardware I2C usable with slaves that stretch the clock. The recommended solution for devices that definitely don't work with hardware I2C is the i2c-gpio overlay, which bit-bangs the pins. But for many devices, like the Texas Instruments UCD9081, this is not possible because the device implements a transition timeout, and with bit-banging there can be long pauses depending on what else the processor needs to handle. People, we are talking about simple I2C, and the issue is very clear. @rewolff, have you tested the BCM2711? |
This may well be me misremembering, but did all the I2C controllers get the fix on BCM2711, or only the new ones? (i2c3-6) |
That's not a detail I've picked up on, but it sounds like the kind of thing Broadcom would do. |
Guys, if you need testing hardware, let me know and I'll ship you a module. You won't have to spend a cent on it. I'll sponsor "quick testing by those who can do that". Just get in contact and give me an address. Or find my shop, pick out a module (one that doesn't name a chip), and at the checkout select "bank payment" and include the remark that I promised you a free one here. I'm a bit too swamped to find time to test this myself. Sorry. |
I've just checked with @P33M as the person with probably the most knowledge about this (hopefully he'll correct me if I report this wrong). Recollection is that only the new controllers are fixed in case it broke something else. In each case there is a pin muxing option to map a new controller onto the pins of BSC0 or BSC1. (BSC6 in the case of GPIOs 0&1 alt5, or BSC3 for GPIOs 2&3 alt5). It's going on memory, but clock stretching is only supported in the address ACK phase, not on all bytes, and the stretch has to be for at least 1/2 an I2C clock cycle. |
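To put a number on the "at least 1/2 an I2C clock cycle" condition recalled above, here is a small arithmetic sketch. The condition itself is stated from memory in this thread, not taken from the datasheet, and the function name is only illustrative.

```python
def min_stretch_us(baud_hz: float) -> float:
    """Half of one SCL period, in microseconds, for a given bus speed."""
    return 0.5 * 1e6 / baud_hz


# Bus speeds mentioned in this thread: 10 kHz, 100 kHz and 400 kHz.
for baud in (10_000, 100_000, 400_000):
    print(f"{baud:>7} Hz: stretch must last at least {min_stretch_us(baud):.2f} us")
```

At the default 100 kHz this works out to 5 us, which is longer than many short stretches a slave might request.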
OK then. My protocol specifies that after the address phase, you specify the register number and at that moment, I'll have to go out and gather the data that needs to be sent. I have some "abstract" registers that have 8 bits that come from different places. That might take a bit more than half an I2C clock cycle. So: BCM I2C is still not reasonably compliant with the protocol, and I'll continue to recommend using SPI then. |
Thanks for the contribution, but as I tried to make clear, any chip that stretches the clock and respects the I2C protocol should trigger the bug. That said, I agree it is always important to describe exactly how to reproduce a problem, and that includes citing the hardware used. As for the indicated kernel update, which changed the stretch timeout: that is clearly not the problem, as the stretch does not last very long; it is actually quite short. Anyway, I followed your suggestions and, as I predicted, the bug is unfortunately still present. |
Thank you for your willingness! I am available to test and I have everything needed (hardware and knowledge) to run any test. I've already invested a lot of time in this problem and I have no doubt that there is a serious bug in the system, whether in the BCM2711 or in the driver. I'm taking a close look at the driver to rule out errors there, or even to implement a workaround as you initially suggested. It would be very helpful to have someone responsible at the Raspberry Pi Foundation to follow along and assist us. |
What would this new controller be? Isn't it the BCM2711 you are referring to? Is there more than one version of the BCM2711? How do I know if mine is new? It is extremely important to confirm this information, as this problem has been going on for many years. If stretching was in fact implemented only in the address phase, then it is not fully implemented, and I don't see much use in such an implementation. Clock stretching pauses a transaction by holding the SCL line LOW after acknowledgment. The transaction cannot continue until the line is released HIGH again. On the byte level, a device may be able to receive bytes of data at a fast rate, but needs more time to store a received byte or prepare another byte to be transmitted. Targets can then hold the SCL line LOW after reception and acknowledgment of a byte to force the controller into a wait state until the target is ready for the next byte transfer in a type of handshake procedure. |
AIUI, these are new I2C controllers on the 2711, in addition to what is present on earlier SoCs. The older controllers with the faults are still there, but the additional ones have the fixes as specified. The older ones were not fixed, as there was a worry that the fix might introduce other bugs, leaving things worse. You can mux the new controllers onto the same pins as the old ones. |
@kaedros it is not "after ack", it is inside the bit-cycle meant for "ack" that the clock stretching usually happens. The slave can clock stretch to "buy time" to decide whether or not to ACK the byte. This made me think. If someone misunderstood "only in the ack" to mean "only in the address-ack", then maybe it's usable: there is an ACK after every byte, and that's where my hardware needs the time to process the data. My offer for hardware still stands. Just go to https://bitwizard.nl/shop and choose a module that you might like (e.g. dual relay https://bitwizard.nl/shop/Relay ) and select bank transfer. Then add a remark saying who you are on GitHub and that I promised you free hardware. |
I understand. Can you share with everyone how this new driver can be muxed on the same pins? |
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/arch/arm/boot/dts/overlays/README#L1806
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/arch/arm/boot/dts/overlays/README#L1833
So
and then use /dev/i2c-3 and /dev/i2c-6. |
Thanks for the clarification. So the bcm2711 dtsi file needs to be fixed: But this won't have any effect on this issue. |
I wasn't even aware that that flag existed! I'll try to get further confirmation that BSC0 & BSC1 haven't been updated. |
The I2C clock is generated by subdividing the core clock. When the core clock changes, so does the I2C clock. You can stop the core clock changing by any of the following means:
|
That is precisely the failure mode: usually nothing happens, but occasionally the timing is such that the BCM sees the SCL signal high and then proceeds with the next phase in the state machine: pull the clock low again. As long as the slave sees this pulse, things still work. My AVR slave will tolerate pulses as short as two clocks: 250ns is enough, even though officially it should be 5000ns. The I2C module runs a state machine on a clock divided from the core clock. The trick would be to make the SCL signal a "count enable" for that divider in the situations where clock stretching is allowed. |
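Since the I2C state machine runs on a clock divided from the core clock, a change in core clock with an unchanged divider changes the bus speed. The sketch below only illustrates that relationship; the 500 MHz and 250 MHz core clock values are hypothetical, and the real driver may choose or update its divider differently.

```python
def divider_for(core_hz: int, target_scl_hz: int) -> int:
    """Smallest integer divider that does not exceed the target SCL rate."""
    return -(-core_hz // target_scl_hz)  # ceiling division


def scl_hz(core_hz: int, divider: int) -> float:
    """SCL rate that results from dividing the core clock."""
    return core_hz / divider


# Hypothetical core clocks: divider chosen at 500 MHz, core later drops to 250 MHz.
div = divider_for(500_000_000, 100_000)
print(div, scl_hz(500_000_000, div), scl_hz(250_000_000, div))
```

With these assumed numbers the bus would run at half the intended rate after the core clock drops, which is why keeping the core clock fixed was suggested earlier in the thread.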
This thread helped me a lot, so I'll try to summarize my findings here as well. Unfortunately I have access to neither a logic analyzer nor an oscilloscope. I am trying to communicate with a BNO055 (apparently well known for wonky clock stretching) from a Raspberry Pi 4B. In between sits an I2C mux based on the TCA9548A chip. I'm using the GPIO pins BCM 2 and BCM 3 to drive the I2C bus. I tried the methods suggested above, that is:
In all the situations above, I experienced readout errors from the BNO055. The frequency of the errors might have changed, but they were always there. The errors that I see always manifest themselves as a flip of the MSB in the returned data. Eg an expected
The only way I have found to reliably read data from the BNO055 is to use the software implementation of the I2C protocol, that is, adding the line dtoverlay=i2c-gpio,i2c_gpio_sda=2,i2c_gpio_scl=3,i2c_gpio_delay_us=2,bus=3 to /boot/config.txt. When using the software driver, the communication looks stable, and the readout frequency that I see is actually higher than when using the hardware controller. I suspect that this is due to underlying timing errors that I don't see from userspace. When I say 'readout frequency', I mean 'how many times can I query the device per second from userspace' and not the clock speed on the bus. The i2c-gpio driver will of course impose a much higher CPU load, but it does seem to work. If anyone has suggestions for how reliable communication can be achieved using any of the hardware controllers on the RPi 4, I would be very happy. There are so many better things to do with your CPU cycles than banging out bits on the I2C bus... |
I found the same problem with a BNO055 on Ubuntu 20.04. To change the I2C clock frequency, open "/boot/firmware/syscfg.txt" and add "i2c_arm_baudrate=400000" as follows.
|
What do the signals look like as analogue voltages? |
Thank you for the answer. When I measure my SCL and SDA I2C signals with an oscilloscope, I see the same or similar as in the picture "Rpi3 i2c aborted". You are right that there are some RF interference pulses added, but their amplitude is at most 20% of the digital levels. When the SCL and SDA pulses have a normal width, communication is always reliable. Only when the RPi generates a very narrow pulse (as in the picture above) does the I2C communication abort (I/O write or read error). I can try to save some real data from the oscilloscope and put it here. |
I2C is a protocol that was designed a long time ago, so the "high" level is made with a resistor. Any capacitance on the bus will cause the signal to change "slowly" from low to high. The problem is that the I2C module in the Broadcom CPUs is discrete in the time domain. The Broadcom releases the clock signal and, when there is no clock stretching, it sees that it is HIGH at the next "decision moment" and proceeds with the next action from the protocol. When there is no clock stretching, the clock has been high for almost the full bit-time, so everybody on the bus has SEEN that the signal was high. When there IS clock stretching, the slave will keep the clock signal low after the master (Broadcom) has released the signal. When the slave then happens to release the clock JUST before the Broadcom is ready to check the signal, the Broadcom can detect the clock signal as high even though it has only been high for a few nanoseconds. It will then proceed with pulling the clock low again for the next bus period. Some/many slaves will not see such a short pulse as a real clock pulse and remain waiting for the next high from the master....
The solution in hardware is fairly simple. Just prevent the I2C pacing timer from counting when the clock signal is LOW while we are not driving it. So if the peripheral clock is set to 16MHz and we're doing bog-standard 100kHz I2C, then the counter is set to count to 80 before passing on another clock pulse to the I2C peripheral inside the Broadcom. (Remember: the peripheral needs to "do things" twice in each bit-time.) In normal operation the bus capacitance will keep the clock signal low (as seen by the digital input) for, say, 120 ns after the master releases the signal and allows the pullup to do its work. So now there will be two cycles in which the count-to-80 counter doesn't run, and the effective clock rate will be about 1.2% slower.
But when there is clock stretching and, 4.7 microseconds after the master releases the clock (in the old situation, on count 77), the slave releases the clock signal, the Broadcom would see the signal high and proceed with the next half-bit-time step: clock low again. With the counter paused while SCL is low, the high phase would get its full count instead. You could do this trick ONLY when clock stretching is allowed, resulting in about 0.12% slowdown. Or you could do this every time the clock is supposed to be high. That would result in a 1.2% slowdown in my example case, but it WOULD allow you to increase the bus speed beyond what would otherwise be possible. |
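The race described above can be sketched as a tiny timing model. This is an idealized illustration using the example numbers from the comment (16 MHz peripheral clock, count to 80); it ignores rise time and is not a model of the actual silicon. The function names and the `gate_on_scl` flag are made up for the sketch.

```python
PERIPHERAL_HZ = 16_000_000      # example peripheral clock from the comment
HALF_BIT_TICKS = 80             # 16 MHz / 80 = 200 kHz, two steps per 100 kHz bit
TICK_NS = 1e9 / PERIPHERAL_HZ   # 62.5 ns per counter tick


def high_pulse_ticks(release_tick: int, gate_on_scl: bool = False) -> int:
    """Ticks that SCL stays high after the slave releases it.

    release_tick is counted from the moment the master released SCL.
    Without gating, the master acts at fixed multiples of HALF_BIT_TICKS.
    With gating (the proposed fix), the counter only runs while SCL is high.
    """
    if gate_on_scl:
        return HALF_BIT_TICKS
    return HALF_BIT_TICKS - (release_tick % HALF_BIT_TICKS)


def high_pulse_ns(release_tick: int, gate_on_scl: bool = False) -> float:
    return high_pulse_ticks(release_tick, gate_on_scl) * TICK_NS


for release in (0, 40, 77, 79):
    print(release, high_pulse_ns(release), high_pulse_ns(release, gate_on_scl=True))
```

In this model a release on count 77 leaves a high pulse of only 3 ticks (187.5 ns), while the gated counter always yields the full 5000 ns.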
I was going to say: "you can add a resistor in the SCL line to see who is pulling the CLK line low". That said... The yellow line looks like a clock signal and the blue like a data (SDA) signal. The spike on the SDA line looks like an ACK from a slave. The different LOW voltage level is also a hint. When the clock goes low again for the next byte, the slave quickly releases the SDA line and the RPi slowly reasserts the SDA line low for the next bit. Near the right you do see a clock stretching event. Here the "who's pulling" is not as clear as on the SDA line, so adding a resistor might help. Aim for, say, 0.3V with the 1k pullup, so about 100 ohms. Measure on the pullup side of the resistor and you'll see a nice zero when "this" side of the bus is driving the signal low and 0.3V when the other side is driving it low. I don't think you caught the issue of this topic on the scope. |
First of all, SCL is in essence only driven by the master, the RPi. So suppose there is an "only write makes sense" device, say a DAC. Then SCL is the clock, but SDA is still bidirectional: the device has to ACK the "sending data to device #x", and then the first byte if it needs more than one byte. It looks as if that is happening, with the QN8066 being less strong at driving the line low than the RPi. However... In the first transaction I see that the last "1" (of 0100001) is a bit iffy: it could/should have started on the falling edge of the SCL line, but starts a bit later, only JUST before the rising edge of the clock. Then there seem to be two rising edges on the clock line, where there should be only one. Then there is the 0 (the WRITE bit) and then the ACK.
The enlargement (second picture) looks perfect: 8 neat bits (7 address bits and WRITE) and then the ACK, where the QN8066 is quicker to assert the SDA line for the ACK than the RPi is to release it (allowing it to read the ACK). After the ACK is communicated, the QN8066 releases the SDA line (allowing it to go high) and a short while later the RPi reasserts it because it wants to transmit a 0 as the next bit. All this is perfect. And all this has nothing to do with the bug in the RPi, where it will reassert the SCL line a VERY short while after the slave stops "clock stretching".
So, let me explain clock stretching. Suppose I have a slow processor on the other side instead of a QN8066. It might need to decide "Is he talking to me?" after the 7 address bits and the R/W bit. If that takes a few microseconds, no problem. The device can hold the clock low, so the "ACK" can be clocked in on the next rising SCL edge, after it has figured that out. (You're not supposed to ACK when the master addresses some other device!)
Your clock seems to be set to about 27 microseconds cycle time. That's 37kHz. That's an odd value. Why are you using such an odd frequency? Edit: on the more accurate expansion I'm measuring closer to 30 us. So your frequency comes to between 33 and 38kHz. |
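The conversion from a measured cycle time to a bus frequency used above is just the reciprocal of the period. A minimal sketch, with the two cycle times read off the scope pictures as inputs:

```python
def freq_khz(period_us: float) -> float:
    """Bus frequency in kHz for a measured SCL period in microseconds."""
    return 1000.0 / period_us


# Cycle times mentioned in the comment: about 27 us and about 30 us.
for period in (27.0, 30.0):
    print(f"{period:.0f} us per cycle is about {freq_khz(period):.1f} kHz")
```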
I compared the measured waveforms between "100% reliable communication" (my first oscilloscope picture with strong RF noise, generated by a small Chinese CPU) and all the other pictures, which I collected while driving the QN8066 from the RPi3B+. It seems that it could help to shift the SDA signal by +2 us relative to SCL. Then the I2C signal would be similar to the signal from the Chinese CPU, which is 100% reliable even under strong HF interference. Could you please advise me how I can do this in order to run a test? Regarding the SCL frequency, I still have this set in /boot/config.txt : |
You cannot change the timing of the SDA and SCL signals coming from RPI. So your only option would be to use software I2C. I'm told there is such a feature. |
Thank you for the new information.
Could you please check my ideas, and if you find something incorrect or incomprehensible, please help me with a comment or question. Thank you very much! |
I really don't have much experience in "software I2C". I've never used it. "looks good to me". |
"The only way I have found to reliably read data from the BNO055 is to use the software implementation of the i2c protocol, that is, adding the line dtoverlay=i2c-gpio,i2c_gpio_sda=2,i2c_gpio_scl=3,i2c_gpio_delay_us=2,bus=3 to the /boot/config.txt" - rillbert This comment saved me countless hours. Getting unstable results with Rpi4 on bookworm trying to communicate with a Bno085. The chip kept random resetting itself after a few hundred cycles. Now communicating on /dev/i2c-3 without problems. Thank you. |
Even trying to disable hw I2C on its bus 1 in /boot/config.txt ( |
Could I ask you to please try releasing GPIO 2 and 3 before you load the dtoverlay?
Save /boot/config.txt, then reboot and test I2C communication. |
I did all the tests again (Raspberry Pi 4 Model B Rev 1.1 interfacing MH-Tiny ATTINY88). I also added Notice that the /dev/i2c-1 device driver is missing, as not configured:
The load&stress test (loop of different I2C synchronous requests/responses) fails rather quickly, while no error is found when using other I2C pins. In addition, I verified that the conflict is on SCL (GPIO3, pin 5). So, this works:
|
Just wanted to add: If you are using i2c-gpio (software), make sure the corresponding module is loaded, otherwise it won't show under /dev/i2c-
modprobe i2c-gpio
|
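To check which I2C buses are actually visible after loading the overlay and module, something like the following sketch could be used. It only inspects the /dev/i2c-N device node names mentioned in this thread; the helper names are illustrative.

```python
import glob
import re


def bus_numbers(paths):
    """Extract bus numbers from paths that look like /dev/i2c-N."""
    numbers = []
    for path in paths:
        match = re.fullmatch(r"(?:.*/)?i2c-(\d+)", path)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def has_bus(number, paths=None):
    """True if /dev/i2c-<number> is present (or is in the given path list)."""
    if paths is None:
        paths = glob.glob("/dev/i2c-*")
    return number in bus_numbers(paths)


print("visible I2C buses:", bus_numbers(glob.glob("/dev/i2c-*")))
```

If bus 3 from the i2c-gpio overlay line quoted earlier is missing from the output, the module is probably not loaded.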
I have been busy and will look into this later this week and will get back
to you. Sorry for the delay.
|
Don't worry. There is actually not much to do. Just wanted to add this
little info about the kernel module for software I2C on the Raspberry Pi!
|
Thanks for your feedback. :-) |
Do we know if RPi5 still has this issue? |
Pi 5 uses RP1 which has a completely different I2C design from Synopsys that supports clock stretching correctly. |
Describe the bug
I'm having intermittent problems with the RPi4 when it performs I2C reads on devices (like the Texas Instruments UCD9081) that use clock stretching.
This apparently affects the Raspberry Pi in general and was documented in a lot of detail back in 2013 by someone at Advamation. Here is that post.
There is also an open issue, #3064, for the RPi3, which apparently has not been resolved. Everything indicates that the SoCs (BCM2835, BCM2836 and BCM2837) do not have hardware support for clock stretching, but some interventions were made to stabilize software I2C.
Although I found evidence in chapter 3 of the BCM2711 datasheet that this had been fixed in the RPi4 hardware, this is apparently not the case, as I was getting the problem on an RPi4.
With a baud rate of 10,000 the readings are more stable, but errors still occur, just less often.
dtparam=i2c_baudrate=10000
An alternative solution would be to use software I2C:
dtoverlay=i2c-gpio,bus=3
However, with the UCD9081 this is not an option: the device enforces a maximum time for a byte transition, which software I2C cannot guarantee because it can be interrupted for brief moments to service other interrupts.
Steps to reproduce the behaviour
dtparam=i2c=on,i2c_baudrate=100000
import smbus
import time

# Open I2C bus 1 (the hardware controller on GPIO 2/3)
bus = smbus.SMBus(1)
time.sleep(1)
# Read 16 bytes starting at register 0x0 from the device at address 0x65
read = bus.read_i2c_block_data(0x65, 0x0, 16)
We can see that there was a stretch, and soon afterwards the RPi4 did not guarantee an adequate clock width, resulting in a chain of errors.
In the image below we can see correct communication, where the clock at the same point has its correct width, probably because the device did not request a stretch.
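Because the failure is intermittent, a single read may well succeed. A sketch of how the read could be repeated and the failures counted is shown below. It assumes that a failed transfer surfaces as an OSError from the read call; the helper names and the flaky stand-in reader are hypothetical, and silently corrupted data would not be counted.

```python
from typing import Callable, List


def count_failures(read_block: Callable[[], List[int]], attempts: int) -> int:
    """Call read_block repeatedly and count how many calls raise OSError."""
    failures = 0
    for _ in range(attempts):
        try:
            read_block()
        except OSError:
            failures += 1
    return failures


def make_flaky_reader(fail_every: int) -> Callable[[], List[int]]:
    """Stand-in for real hardware: every fail_every-th read raises OSError."""
    calls = {"n": 0}

    def reader() -> List[int]:
        calls["n"] += 1
        if calls["n"] % fail_every == 0:
            raise OSError(121, "Remote I/O error")
        return [0] * 16

    return reader


# With real hardware, read_block would wrap the read from the steps above, e.g.
#   lambda: bus.read_i2c_block_data(0x65, 0x0, 16)
print(count_failures(make_flaky_reader(4), 20))
```

Comparing the failure count at different baud rates, or between the hardware controller and i2c-gpio, gives a rough measure of how much a given workaround helps.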
Device (s)
Raspberry Pi 4 Mod. B
System
OS: Raspberry Pi reference 2021-10-30 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 88b21fc27e128ea6b330777aca68e0061ebf4fe, stage4
Firmware: Jan 20 2022 13:56:48 Copyright (c) 2012 Broadcom version bd88f66f8952d34e4e0613a85c7a6d3da49e13e2
Kernel: Linux JigaBrise 5.10.92-v7l+ #1514 SMP Mon Jan 17 17:38:03 GMT 2022 armv7l GNU/Linux
SoC: BCM2711 b03112
Logs
No response
Additional context
No response