I2C Broadcom bug workaround. #254

rewolff · 2013-03-20T17:05:25Z

The broadcom BCM2835 has a (hardware) bug in the I2C module.

The BCM implements "clock stretching" in a bad way:

Instead of ensuring a minimum "hightime" for the clock, the BCM lets the I2C clock internally run on, and when it's time to flip the clock again, it checks wether or not the clock has become high, and if it is low, it does a "clock stretch". However if just in the few nanoseconds before this check the slave releases the clock signal indicating it is ready for a new clock, the clock pulse may become as short as 40ns. (that's the lowest I'm able to measure, and that of course is quite rare). With a 100kHz (10 microseconds) I2C clock, the "bad" moments to release the clock are 5 microseconds apart.

My slaves require a clock pulse of at least 500ns. (which should be a factor of ten shorter than what the master should generate when it is configured for 100kHz operation).

That is a hardware bug that we can't do much about at the moment. ( Tip for the hardware guys: instead of implementing clock stretching in this way: inhibit the counting of the CLOCKDIV counter while the clock remains low. This will probably clock-stretch a few fast-clocks on every cycle, but standards conformance will improve substantially).

When this occurs, the BCM will send a byte, with a very short first clock pulse. The slave then doesn't see the first clock pulse, so when at the end of that byte it is time to ACK that byte, the slave is still waiting for the 8th bit. The BCM interprets this as a "NAK" and aborts the I2C transfer. However at that point in time the slave sees its 8th clock pulse and issues an "ACK" (Pulls the SDA line low). This in turn means that the BCM cannot issue a proper "START" condition for the next transfer.

Suggested software workaround to reduce the impact of this hardware bug: When a transfer is aborted due to a "NAK" for a byte, issue a single extra clock cycle before the STOP to synchronize the slave. This probably has to be done in software.

This only happens in the "error path" when the slave doesn't ACK a byte. Maybe it's best to only do this when it is NOT the first byte in a transfer, as the bug cannot happen on the first byte of a transfer.

Does this software fix belong in the driver? Yes. It's a driver for a piece of hardware that has a bug. It is the duty of the driver to make this buggy hardware work the best it can.

popcornmix · 2013-03-20T17:35:22Z

If a workaround in the driver makes it work better for the majority of users, then sure it should be included.
Do you have a suitable patch?

rewolff · 2013-03-20T23:26:32Z

No. Not yet.

I have not delved into the I2C driver yet. Around a year ago, I had plans to write it myself but before I got around to it, someone else had written it. So someone who already knows this driver might be better equipped to do this as a five minute project. It would take me at least several hours to get to know the driver.
It pays to first discuss it if you (and maybe others) agree that a fix/workaround is warranted.
I just wanted to document this as a "todo" we shouldn't forget it....
Maybe someone would suggest a different workaround.

Speaking about workarounds: The workaround could also be triggered by the SDA line still being low when we are trying to initiate a new transaction. That is clearly a "problem" situation.

The "it goes wrong" case might be rare for "normal" cases. Many people will have a hardware-i2c module on the I2C-bus. Back when I2C was invented "10us" might have been a "very short" timeframe for some chips to handle the requests, so clock stretching was necessary. Nowadays any hardware I2C implementation should be able to handle "fast I2C" up to even faster.

This combines with the way the atmel guys implemented their async modules. Instead of having the module run on the externally provided clock (i2c-slave or SPI-slave), the module still runs on the processor clock. And it synchronizes all incoming signals by passing them through a bunch of filpflops. Good hardware design. The alternative is IMHO better: run the module off the external clock, and synchronize when the data passes to the other (cpu-) clock domain. This would for example allow the SPI module to function at 20MHz, even though the CPU runs at only 8.

Anyway. Enough hardware rambling.

Suggested workaround: Issue an extra clock cycle when at the beginning of a transaction the SDA line is still low. (it should start out high to be able to issue a "start" condition).

popcornmix · 2013-03-21T17:43:00Z

Firstly, I know little about I2C. I have spoken to Gert and others about workarounds.

There are two problems with I2C known about:

The clock stretching problem you have described
Problem with I2C state machine when doing restart

For 1, Gert's view is there is no foolproof workaround.
If I2C slave doesn't support clock stretching you are fine.
If I2C clock stretches by a bounded amount, then lowering the I2C clock frequency can avoid the problem
Switching to a bit banging driver for slaves that don't fall in the previous cases is a safe workaround.

I don't know if your proposal will work. I'd be interested to hear if it reduces the problem, or fixes it completely.

Gert says he has a desciption of the bug that he'll try and get released.

For 2, in the GPU I2C driver we have some code working around a problem with state machine:

/***********************************************************
 * Name: i2c_actual_read
 *
 * Arguments:
 *       const I2C_PERIPH_SETUP_T *periph_setup,
         const uint32_t sub_address,
         const uint32_t data_to_read_in_bytes,
         void *data
 *
 * Description: Routine to actually transfer data to the I2C peripheral
 *
 * Returns: int == 0 is success, all other values are failures
 *
 ***********************************************************/
static int32_t i2c_actual_read(  const I2C_PERIPH_SETUP_T *periph_setup,
                                 const I2C_USER_T *user,
                                 const uint32_t sub_address,
                                 const uint32_t read_nbytes,
                                 void *data,
                                 const int using_interrupt)
{
   int32_t rc = -1; /* Fail by default */
   int32_t status = 0;
   int32_t nbytes;

   uint32_t sub;
   uint8_t* read_data = (uint8_t*)data;
   uint32_t data_read = 0;

   if( NULL != periph_setup && NULL != data ) {

      // confirm that the latch is held for this transaction
      assert( i2c_state.periph_latch[ periph_setup->port ] != rtos_latch_unlocked() );

      // -> START
      // -> Slave device address | WRITE
      // <- Ack
      // -> Sub address
      // <- Ack
      // -> Sub address
      // <- Ack
      // -> ReSTART
      // -> Slave device address | READ
      // <- Ack
      // <- Data[0]
      // -> Ack
      // <- Data[1]
      // -> nAck
      // -> STOP

      I2CC_x( periph_setup->port )     = I2CC_EN | I2CC_CLEAR; /* Enable I2C and clear FIFO */
      I2CS_x( periph_setup->port )     = I2CS_ERR | I2CS_DONE | I2CS_CLKT; /* clear ERROR and DONE bits */
      I2CA_x( periph_setup->port )     = periph_setup->device_address;

      sub = sub_address;
      nbytes = periph_setup->sub_address_size_in_bytes;

      if( 0 == nbytes ) {
         /* No subaddress to send - just issue the read */
      } else {
         /*
           See i2c.v: The I2C peripheral samples the values for rw_bit and xfer_count in the IDLE state if start is set.

           We want to generate a ReSTART not a STOP at the end of the TX phase. In order to do that we must
           ensure the state machine goes RACK1 -> RACK2 -> SRSTRT1 (not RACK1 -> RACK2 -> SSTOP1). 

           So, in the RACK2 state when (TX) xfer_count==0 we must therefore have already set, ready to be sampled:
           READ ; rw_bit     <= I2CC bit 0   -- must be "read"
           ST;    start      <= I2CC bit 7   -- must be "Go" in order to not issue STOP
           DLEN;  xfer_count <= I2CDLEN      -- must be equal to our read amount

           The plan to do this is:
           1. Start the sub-add]ress write, but don't let it finish (keep xfer_count > 0)
           2. Populate READ, DLEN and ST in preparation for ReSTART read sequence
           3. Let TX finish (write the rest of the data)
           4. Read back data as it arrives

         */
         assert( nbytes <= 16 );
         I2CDLEN_x( periph_setup->port )  = nbytes;
         I2CC_x( periph_setup->port )    |= I2CC_EN | I2CC_START; /* Begin, latch the WRITE and TX xfer_count values */

         /* Wait for us to leave the idle state */
         while( 0 == ( I2CS_x(periph_setup->port) & (I2CS_TA|I2CS_ERR) ) ) {
            _nop();
         }
      }

      /* Now we can set up the parameters for the read - they don't get considered until the TX xfer_count==0 */
      I2CDLEN_x( periph_setup->port )  = read_nbytes;
      I2CC_x( periph_setup->port )     |= I2CC_EN | I2CC_START | I2CC_READ;

      /* Let the TX complete by providing the sub-address */
      while( nbytes-- ) {
         I2CFIFO_x( periph_setup->port ) = sub & 0xFF; /* No need to check FIFO fullness as sub-address <= 16 bytes long */
         sub >>= 8;
      }

      /* We now care that the transmit portion has completed; the FIFO is shared and we mustn't read out
         any of the data we were planning on writing to the slave! */

      /* Wait for peripheral to get to IDLE or one of the two RX states - this way we *know* TX has completed or hit an error */
      {
         uint32_t state;
         bool_t state_transition_complete;
         bool_t error_detected;
         do { 
            state = (I2CS_x( periph_setup->port ) & 0xf0000000) >> 28;
            state_transition_complete = ((state == 0) || (state == 4) || (state == 5));
            error_detected = (I2CS_x(periph_setup->port) & (I2CS_ERR | I2CS_CLKT)) != 0;
         } while(!state_transition_complete && !error_detected);

         if (error_detected) {
            /* Clean up, and disable I2C */
            I2CC_x( periph_setup->port ) &= ~(I2CC_INTD | I2CC_INTR);
            I2CC_x( periph_setup->port ) &= ~(I2CC_START | I2CC_READ);
            I2CS_x( periph_setup->port ) = I2CS_CLKT | I2CS_ERR | I2CS_DONE;
            I2CC_x( periph_setup->port ) |= I2CC_CLEAR;
            I2CC_x( periph_setup->port ) = 0;
            return -1;
         }
      }

      if (using_interrupt)
      {
         /* Wait for interrupt to complete. */
         i2c_state.active_buffer[periph_setup->port] = data;
         i2c_state.active_buffer_length[periph_setup->port] = read_nbytes;
         i2c_state.active_buffer_offset[periph_setup->port] = 0;
         i2c_state.pending_transfer[periph_setup->port] = I2C_PENDING_TRANSFER_READ;
         RTOS_LATCH_T latch = rtos_latch_locked();
         i2c_state.pending_latch[periph_setup->port] = &latch;

         /* Enable interrupt. */
         I2CC_x( periph_setup->port ) |= I2CC_INTD | I2CC_INTR;

         rtos_latch_get (&latch);

         i2c_state.pending_latch[periph_setup->port] = NULL;
         data_read = i2c_state.active_buffer_offset[periph_setup->port];

         rc = (data_read == read_nbytes) ? 0 : -1;
      }
      else
      {
         uint32_t time_now = 0;

         /* Loop until we've read all our data or failed. */
         while( 0 == ( I2CS_x(periph_setup->port) & (I2CS_TA|I2CS_ERR|I2CS_DONE) ) ) {
            _nop();
         }

         /* Wait for some data to arrive - we should wait, at most, I2C_TIMEOUT_IN_USECS for data to arrive every time we start waiting */
         time_now = i2c_state.systimer_driver->get_time_in_usecs( i2c_state.systimer_handle );
         while( ((i2c_state.systimer_driver->get_time_in_usecs( i2c_state.systimer_handle ) - time_now) < I2C_TIMEOUT_IN_USECS)
                && ( data_read < read_nbytes )
                && !(I2CS_x( periph_setup->port ) & I2CS_ERR) ) 
         {
            if (I2CS_x( periph_setup->port ) & I2CS_RXD) 
            {
               read_data[ data_read ] = I2CFIFO_x( periph_setup->port );
               data_read++;
               time_now = i2c_state.systimer_driver->get_time_in_usecs( i2c_state.systimer_handle  );
            }
         }

         if( (data_read != read_nbytes) /* Did we read all the data we asked for? */
             || ( (read_nbytes - data_read) != I2CDLEN_x( periph_setup->port ) ) /* Has DLEN decremented? */
             || ( 0 != (I2CS_x( periph_setup->port ) & I2CS_ERR) ) ) { /* Are there any errors? */
            rc = -1;
         } else {
            while( I2CS_DONE != (I2CS_x(periph_setup->port) & I2CS_DONE) ); /* Wait for the peripheral */
            rc = 0;
         }
      }

      /* Clean up, and disable I2C */
      I2CC_x( periph_setup->port ) &= ~(I2CC_INTD | I2CC_INTR);
      if(I2CS_x( periph_setup->port ) & I2CS_ERR) {
         //Wait for it to be idle
         while(I2CS_x( periph_setup->port ) & I2CS_TA)
            _nop();
      }
      I2CS_x( periph_setup->port ) = I2CS_ERR | I2CS_DONE;
      //Finally disable the I2C
      I2CC_x( periph_setup->port ) = 0x0;
   }

   if( !user->skip_asserts ) {
      assert( rc >= 0 );
      _nop();
   }

   return rc;
}

I'm not sure if people are hitting this issue, but the pasted could may be helpful if you are.

rewolff · 2013-03-22T08:53:44Z

It is not possible to completely eradicate this hardware bug.

When it happens, the master and slave disagree on the number of clock cycles. Therefore, the data, as interpreted by the slave and the data as the master (= raspberry pi = driver/application) intended differ. You might be writing a completely bogus value in a register somewhere. Highly annoying.

My suggested workaround would at least cause the next transaction to work. The impact would be halved: only one transaction botched instead of two. It would not eliminate the problem.

If I2C slave doesn't support clock stretching you are fine.

If the I2C slave doesn't require clock stretching....

If I2C clock stretches by a bounded amount, then lowering the I2C clock frequency can avoid the problem

Yes. I have 10 devices "in the field" at one location. Apparently the clock frequency (RC clock) of three of the modules is different in such a way that they end up "done" with the clock stretching in exactly the wrong moment. Reducing the clock frequency fixed the problems for those modules.

If the clock stretching delay is NOT fixed, but, say, varies by more than 5 microseconds, there is a more or less statistical chance of hitting the problem. In my case the slave requires 0.25 microseconds of clock width. So clock stretching ends in the 5% of the time leading up to the clock changing moment for the BCM, then things go wrong. So, in this case, about 1 in 20 transfers would go wrong. (specs say min 250ns for a clock pulse. That in fact doesn't mean that it won't be seen if it is shorter. I think the chances of it being seen goes linearly from 100% at > 250ns to 0% at < 125ns. )

I've spent yesterday stretching the clock stretching period in such a way that it would fall around the 2.5 microsecond mark in the 5 microsecond window-of-opportunity. If I'd aim for the start of the window and my RC clock runs bit fast, I'd hit the "0" which triggers the bug. If I aim for 5, same thing. So now I'm aiming in the middle, away from the bad places. But change the I2C clock to 80kHz, and BAM, I'm aiming right for the sensitive spot... (and if it doesn't happen with 80kHz, it'll happen with some value between 80 and 100).

popcornmix · 2013-03-22T16:50:22Z

Here is Gert's description of the I2C bug:
https://dl.dropbox.com/u/3669512/2835_I2C%20interface.pdf

rewolff · 2013-03-23T13:03:15Z

Haha! I have instrumented my test-board in exactly the same way. I've cut the SCL trace and mounted a 0.1" jumper connector on there. While not measuring there is a normal jumper there, or a female connector with a 100 ohm resistor when I am.

I don't think it only happens on the first clock. That would mean I could work around the problem by either NOT clock stretching at all (as most hardware I2C chips probably do), or by causing the clock stretch to last at least a full cycle.

I have instrumented my i2c slave code in such a way that I can cause the clock stretch to change by 0.5 microsecond increments. I can command it to change the clock stretch to XXX over I2C and then it works like that for 50 miliseconds before changing back to the default values. (in case it doesn't work we need it working again. This way I can quickly scan a whole range of values for these settings, while monitoring the SDA and SCL lines with a logic analyzer. Sending a "use delay XXX" and then sending a sting takes 25 miliseconds. THen incrementing a counter in the script and sending the "next delay" takes the shell script about 100ms. So I can scan 100 possible values in about 10 seconds.

I HAVE seen the bug happening on longer delays. So the theory: "it only happens on the first" is not correct.

As to the "super efficient" interrupt handler: Yes, I can do that too. The thing is: The clock stretching feature allows me to put the "handle this byte" code in the interrupt routine and not worry about it taking a few microseconds or a few miliseconds. In theory the clock stretching will make sure that the master waits for me to finish handling this byte before continuing with the next one. That plan is totally busted with the bug in the 2835. I now just cause clock stretching to stop, and then hope to finish handling the byte in the 70 microseconds that follow.

I have to "cycle count" to aim for the middle of the 5 microsecond half-clock for me to release the clock stretching. If I'd aim for the start (to cause a 4.9 microsecond clock-high-period) the clock on my slave might be running a few percent faster and cause the bug to trigger a half-cycle earlier. So both ends of the half-clock cycle are dangerous, I have to aim for a short, but valid enough clock cycle.

The change in the module is IMHO simple: When SCL is "driven high" (i.e. not driven at all), stop the I2C module master clock. This has the side effect of allowing much longer I2C buses because automatic clock stretching occurs when the bus capacitance is above spec.

iz8mbw · 2013-03-23T14:53:25Z

This: http://comments.gmane.org/gmane.linux.kernel.rpi/268 and this: http://news.gmane.org/gmane.linux.kernel.rpi can help?

rewolff · 2013-03-24T06:45:32Z

Thanks for trying to help, iz8mbw, but those have nothing to do with this (hardware) bug.

I've measured my I2C clock and it comes out as 100kHz which the driver thinks it should be. The driver has a "clock source" object which it queries for the frequency and it uses that. I have not yet instrumented my driver to print out the frequency.

rewolff · 2013-03-24T08:09:44Z

Just for information for people who want to SEE the bug in action I have some "proof" in the form of a logic analsyer dump.
Hmm I can attach images, but not the binary LA dump. That one is here: http://prive.bitwizard.nl/LA_with_bug_scan_50_to_100_microseconds_delay.bin.gz

In the image LA_with_bug_shot1.png you'll see the short burst of activity: "set delay to XXX" then a small burst that tries to write some 10 bytes, but it gets aborted due to a NACK. On the right you see a proper 10-byte-packet. The SDA line remains low until the next burst that is misaligned (all bits are shifted, and in my slave the "xth-byte-in-packet" counter has not been reset by the non-existant start condition at the start of that packet.

In the image LA_with_bug_shot2.png I've zoomed in to the section where the short pulse happens. This the longest pulse that I've seen go wrong: 290ns. (above datasheet-spec for the Atmel!)

rfmerrill · 2013-03-26T08:33:19Z

Oh so THAT's where that came from!

I've got a project I'm trying to control with I2C from a raspberry pi and I'm honestly close to just giving up and either switching to a beagleboard or having the Pi talk to an FPGA via SPI which then does the I2C stuff.

rewolff · 2013-03-26T08:47:08Z

I have a board in the mail that should be able to do that (expected delivery: tomorrow). SPI connector, I2C connector. Would be a nice project to try with that board. Send me an email if you want to be kept informed.

Current situation: I'm seeing behaviour compatible with Gert's: "only the first cycle can go wrong" when the slave is an ATMEGA, and I'm seeing the bug in full force when the slave is one of my ATTINYs.

Here are the delays I tested:

and the shortest SCL pulse observed during this test-session was 4.5 microseconds.

Now testing with the attiny slave, the tested delays for the ACK are:

and the minimum pulse width is 41ns (my measurement resolution):

Ah. Found it! It seems that clock stretching before the ACK of a byte is "dangerous" but clock stretching AFTER the ACK is ok.....

markdootson · 2013-04-08T01:50:11Z

I and others are certainly hitting the repeated start issue. I have a userspace hack at http://cpansearch.perl.org/src/MDOOTSON/HiPi-0.26/BCM2835.xs in proc hipi_i2c_read_register_rs. This seems to work for the users who have reported trying it or incorporating the C code ( though that's only 5 people ), even though I had not figured out correctly how to determine that the TX stage had completed. I suppose I need to learn to compile kernel modules and run existing tests against code above.

Ferroin · 2013-04-08T01:59:33Z

Couldn't we just force the kernel to ignore the I2C hardware and use bitbanged GPIO-based I2C on the same pins? I would think the overhead for that would be more acceptable then the bugs in the hardware itself.

rewolff · 2013-04-08T06:18:24Z

What overhead do you expect? The current driver defaults to 100kHz clock. So reading 10 bytes from an i2c intertial sensor takes about 1ms. If you bitbang, you're likely to be churning CPU cycles all that time....

popcornmix · 2013-04-08T13:50:19Z

My preference would be for the driver to optionally support bitbanging i2c through a module parameter (or maybe a separate module if that is more convenient).

Many devices don't use clock stretching (or clock stretch by a bounded amount so can work correctly at reduced speed), so the hardware I2C driver is still preferable in those cases.

Ferroin · 2013-04-08T14:47:40Z

The kernel itself has a driver for bitbanged I2C on GPIO pins, this driver has much lower overhead than bitbangging from userspace, and exposes the same API as the hardware I2C drivers. My suggestion would be to compile both that driver and the hardware I2C driver as modules, then load the hardware driver by default.

popcornmix · 2013-04-08T15:18:50Z

@Ferroin
Do you use the bitbanged I2C driver on Pi?
What .config options have you added to build it? Any source patches?
What do you do to choose the driver to use (modprobe? different /dev/ devices?)

Ferroin · 2013-04-08T15:57:40Z

Haven't used the driver on the Pi, but I have a friend who uses it successfully on a BeagleBoard.
The config option is I2C_GPIO.
If both that and the broadcom driver were configured as modules, choosing the driver would be as simple as loading the kernel module.
The GPIO module does need parameters to tell it which GPIOs to use as SDA and SCL, I don't know what the parameters are, but based on the not very well documented source in dirvers/i2c/busses/i2c-gpio.c, it looks like the apropriate params are 'sda=X scl=Y' where X and Y are the pins you ant to use.

kadamski · 2013-04-09T18:18:59Z

The i2c-gpio module itself does not provide any way to pass parameters to it. So you need another module that registers platform data and configures at least bus number, SDA and SCL pins. I've written proof of concept of such a module and it even worked on pins 0 and 1 but without i2c-bcm2708 loaded (pcf8574 was detected by i2cdetect, I was able to drive some LED with gpio-pcf857x module). But then I burned my pcf8574 when changing the pins (and I don't have any other i2c chips right now) so I couldn't really test it. Here's the code

popcornmix · 2013-04-10T10:33:04Z

@KADAMSI
Making sda_pin/scl_pin module parameters may be useful. As would a pull request.
Can anyone confirm this works for them?

rewolff · 2013-04-10T11:20:23Z

That is less trivial than it looks: It expects to be passed a platformdata structure all the way.

On the other hand, the platform does not dictate which GPIOs you are going to use today, so putting in the work to change it does sound reasonable. (normally the platform data specifies: "How is /this/ computer wired". )

kadamski · 2013-04-10T11:45:56Z

@popcornmix
I'm planning to make them configurable, I just had not enough time yesterday. I would also like it to be able to create more than one bus at a time. Maybe in dynamic way, like GPIOS export and unexport sysfs files (but I'm not sure if it's possible to cleanly unexport such bus)? Oh, and I found another pcf8574 and could confirm that it worked well also on different GPIO pins when external pullup resistors are used. Unfortunately I don't have more sophisticated I²C devices to test. So are you interested in providing such module in default kernel tree?

popcornmix · 2013-04-10T12:51:45Z

@kadamski
I don't have a setup to test modules like this. However any hardware features the chip provides (like I2C) should have kernel drivers to allow more people to use them. If the I2C hardware has bugs that will stop people using certain peripherals then a bitbanging alternative sounds worthwhile.

So, if you produce a new driver that looks harmless for people not using it (i.e. built as a module, and doesn't nothing until loaded), and there's some evidence (i.e. some confirming user reports) it is working correctly, then I'm happy to accept a PR.

commit 7d3d43d (net: In unregister_netdevice_notifier unregister the netdevices.) makes pktgen crashing at module unload. [ 296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267 [ 296.820719] lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1 [ 296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ raspberrypi#254 [ 296.821079] Call Trace: [ 296.821211] [<ffffffff8168a715>] spin_dump+0x8a/0x8f [ 296.821345] [<ffffffff8168a73b>] spin_bug+0x21/0x26 [ 296.821507] [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140 [ 296.821648] [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20 [ 296.821786] [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen] [ 296.821928] [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen] [ 296.822073] [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100 [ 296.822216] [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen] [ 296.822357] [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0 [ 296.822502] [<ffffffff81699652>] system_call_fastpath+0x16/0x1b Hold the pktgen_thread_lock while splicing pktgen_threads, and test pktgen_exiting in pktgen_device_event() to make unload faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>

hkapanen · 2016-03-01T10:51:42Z

Has the HW been fixed on BCM2837 (in RPi 3)?

hmf2015 · 2018-08-15T12:52:23Z

Dear all, today i've tested the BNO055 in the package GY-955 from East Asia what behaves very ugly at 100_000 Hz with a Pi 2B (BMC2835). For tests I've changes the frequency of the I2C bus from 50_000 to 400_000 Hz and I found problems at 100_000 Hz, but for 50_000, 85_000, 115_000 and 125_000Hz the code from Adafruit (Adafruit_Python_BNO055) works pretty fine with the I2C connection, 150_000 Hz seems to be too fast for the BNO055, it works but the values are sometimes wrong. I confess, I didn't test the frequency by an oscilloscope, I've only read out the file /sys/class/i2c-adapter/i2c-1/of_node/clock-frequency, therefore some uncertainty is left open. Now I start to implement a full interface to the BNO055 in ADA.
Facit: The hardware bug in the BMC I2C handling appears only at 100_000 Hz, it must be confirmed.

rewolff · 2018-08-15T13:44:54Z

Just FYI, The hardware bug happens when the slave device does clock stretching and releases the SCL line after 0.9 to 0.999 I2C clock periods. The line will start to rise, and the pi hardware will say: Oh, he didn't do clock stretching so I can provide the next clock pulse by pulling the clock line low again.

The bug happens when the time remaining in the clock period is sufficient for the clk line to pass the "HIGH" threshold for the raspberry but not for the slave devices OR when the slave has noise filtering and considers a high during < x% of the normal I2C clock cycle a glitch.

This does not "only happen at 100kHz", but also at many other frequencies, except that the response time of the slave device then has to be a different value.

Suppose the clock is 100kHz, so a normal cycle would be 5us high and 5us low. Now suppose at t=0 the pi starts the last clock cycle by pulling the clock line low. At t=5us the pi releases the clock signal, but the slave is still busyand keeps the clock low. At t=9us the slave is done and releases the clock signal. At t=10us the pi will look at the clock signal, and think: Ok, no clock stretching, so continue, and it pulls the clock signal low again. The clock signal has now only been high for 1us (a bit less due to capacitance and the pullup resistors' time constant) and might not be seen by the slave. Now a different device does not become ready after 9 microseconds, but after 10.5. Now at 100kHz, the pi will see "still low, clock stretching, wait another 5 microseconds. Now we'll get a 4.5 microsecond clock pulse and things will work. But run this device at 90kHz, and it gets into trouble. Now the pi will look at 11 microseconds from the start, and the clock signal will have been high for less than a microsecond -> Problems.

The people who complain about this are NOT the ones who have those other devices. Those work at the default 100kHz. But the bug is still very real and would occur on those devices simply at other bus speeds than the devices that you and I have.

hmf2015 · 2018-08-15T16:11:57Z

Thank you for the response, it explains very good, why I got a reliable transfer at the frequences I chosen:
at 100_000 Hz the problems raise in the period of 9 to 9.99 µsec
at 85_000 Hz the problems start at 1_000_000/85_000*.9 = 10,59 µsec > 9.99 sec, therefore the clock stretching is ignored I assume, for all frequencies less then 85_000 Hz the start point of problems is higher than 10,59 µsec.
at 115_000 Hz the bus cycle is 1_000_000/125_000 = 8.7 µsec < 9 µsec, therefore the clock stretcing is detected correctly, for all frequencies higher than 115_000 Hz the bus cycle is less then 8.7 µsec.
So when I have problems I can calculate the frequence I need.
(I read the information on http://www.advamation.com/knowhow/raspberrypi/rpi-i2c-bug.html,
there was noted .5 instead of .9 of a period).

rewolff · 2018-08-16T08:42:37Z

Well. It depends on what your slave detects as a valid clock signal. And how much delay the RC on your bus introduces. In my case, the AVR I use as a slave considers pulses less than two clock cycles to be a glitch. With an internal RC clock of 8MHz (+/-10%!!!), the pulse needs to be at least 250ns to be detected. I2C specifications require 5 microseconds, so there is a factor of 20 "margin" between what is officially required and what no longer works.

At first I thought I'd increase the clock stretch in my slave produces a bit to say 12us. That works until the bus capacitance adds enough delay to cause a problem again. So it is very difficult to work around. Annoying to have to tell your clients not to use the standard 100kHz i2c.

hmf2015 · 2018-08-18T10:18:38Z

Sorry, but it I needed some time to write a full implementaion and test scenario.
I thank you for all the information you gave to me, it was very helpfull.
My test scenario is build up with 2 MEMS ( BNO055 packaged as GY-055, MPU9255 packaged as MPU-9250/6500, WHO_AM_I register says MPU9255) and 1 GPIO extender PCF 8574 (packaded PCF 8574).
The I2C clock at the Pi (2b) was set to 125 kHz, verified by reading /sys/class/i2c-adapter/i2c-1/of_node/clock-frequency.
My code runs 100 times reading data from the MEMS and writing data for PCF with a delay 0.1 sec.
Result:

Initialisation of the MEMS was successful.
Reading from the MEMS data over the I2C bus was successfull over all cycles.
Writing data to the PCF over the I2C bus was successfull over all cycles.

Therefore the MEMS and the PCF accept a clock higher then the standard 100 kHz.
In an elder environment the communcation to the AVR is done via serial line, in my second one the the communication will be done over USB, but it is not finished, actual I tripped over the I2C bus problems cause by the BNO055, but I think for my environment I found a workaround for a Pi 2b and I hope it will work with a Pi 3 too, that's the next test. And then I have to integrate 2 other MEMS, 1 VL53L0X and 1 BME280, I hope it will work with the new frequence and the bus capacitance accept all these elements.

piquan · 2018-09-05T07:53:35Z

I'm playing with this issue now on a RPi 3 Model B+, and I'm seeing different results than the Advamation article describes. The silicon wasn't revved recently, was it?

The short version is, things seem to work if I stretch at read-preACK, write-preACK, or write-postACK; the only time I need to be sure to stretch either long or not-at-all is read-postACK. I still need to clean up my experiment before I can say anything certain, though.

First, a note about terminology. The Advamation article says:

So, I2C with the Raspberry Pi only works if
the slave does not use clock-stretching at all, or
the slave stretches the clock only at the end/directly after an I2C-read-ACK-phase (after reading ACK/NACK), but then by more than 0.5 clock periods.

The term "I2C-read-ACK-phase" seems somewhat ambiguous. If it's an I2C read command (i.e., the post-address R/W̅ bit is set), the slave would be writing the A̅C̅K̅ bit, not reading it, contrary to the parenthetical. When I say "read-postACK" here, I mean that the R/W̅ bit was set right after the address (and presumably not changed by protocol like SMBus does), so the slave has just written its A̅C̅K̅ response.

As has been hinted at around here, it seems that if the RPi senses that SCL is being held low by a slave, it will delay its internal SCL clock by one half-cycle (5μS). The bug is triggered if the slave holds SCL low for almost one half-cycle. If that happens, then the RPi will still let SCL go high, and then pull it down immediately afterwards when its SCL clock triggers. This results in a runt clock pulse, which some devices might see as a pulse while others don't. (For reference: the I2C spec implies a nominal SCL high time of 5μS, and establishes a minimum time of 4.0μS at 100kHz, which is the default speed of the RPi's I2C interface.)

In my experiments, in the read-preACK, write-preACK, and write-postACK phases, the RPi would ensure that the maximum SCL high time it would allow was 4μS or so. It seemed willing to push its SCL clock back in those phases. I suspect that the RPi sampled the SCL line shortly after the it released it, and if SCL was still low, it pushed its SCL clock back a half-phase.

However, in the read-postACK phase, I did see that the RPi would emit runt clocks (<4μS, often 2μS or less) if the SCL line was held low for about 6–9μS. I suspect that in this phase, the Broadcom engineers follow a different path that tries to make it less likely that they will have to delay a half-phase on a high-capacitance line. Sadly — as we've seen — it falls short.

Therefore, my current code needs to stretch SCL beyond ½ cycle in the read-postACK phase of reads that may take a while (such as ADC conversions). If I fail to

The current tests that I'm running use a PIC16F1619 as a slave device. The slave device uses a loop to stretch the clock in both read and write commands, both pre- and postACK. The length of this loop plus overhead varies from 3μS (less than the time the RPi holds SCL low) to 90μS in 0.35μS increments. I have a Python program on the RPi that will different values to two registers, then read them back, and assert that the read-back values are correct. The values being used

rewolff · 2018-09-05T08:10:01Z

Piquan,
Broadcom has been downplaying this by proclaiming that this problem only happens on the very first clock stretch. I do not think that this is the case. You should be able to verify that. Advamation seems to be following broadcom statments.

piquan · 2018-09-05T09:12:53Z

@rewolff Well, there may be something to their interpretation. If, by “first clock stretch” they mean “the first time in a given stage that SCL is stretched by more than one clock translation” then their description is accurate: if you stretch by more than one clock cycle, then yeah: in all phases, if you stretch by >.5 cycle, you’re in good shape. At least, in the vague experiments I’ve done so far; more precise experiments are yet to come.

I’m happy to conduct some better experiments to demonstrate when the RPi’s I2C can and can’t handle clock stretches, and get repros that BCM can send to the engineers.

Sadly, there’s a very significant silicon rev they’d need to fix this issue; they need to not tick the SCL divider clock when SCL is held low (by an internal or external source). That’s probably going to require a silicon-level change, which is expensive; most ECOs are metal-level, which is… well, not cheap, but way cheaper than silicon.

My main goal here is to characterize the types of slaves that would and wouldn’t work with the RPi, with some experimental evidence.

It sounds like we need more solid data, such as repro scenarios, to convince Broadcom to re-engineer the I2C controller. Even with solid data, the engineers will have to convince management that it’s worth revising silicon for the next revision.

That said, I’ll put some time into designing reproducible scenarios so the BCM engineers can clearly see and test the problem. I’m going to depend on the community members here to communicate my findings to Broadcom, though; I don’t have a channel to them.

piquan · 2018-09-05T09:27:37Z

I should clarify something about my previous comment for the community, about silicon vs. metal changes.

You know how, if you get a board from SparkFun or Adafruit, it often has traces you can cut or solder together to change its behavior? Most I2C boards have traces you can change to change the address, enable or disable the pull-up resistors, etc.

Microchip designers do the same thing. There’s lots of changes they can make on the metal layers of a microchip, so that they don’t have to change the silicon layers. That’s because changing silicon layers is hard and expensive (like changing the chip on a board), while metal layers are a lot easier to change (like changing the traces on a board).

Many useful changes can be made in the metal layers, because designers think ahead and make metal connections (jumpers, essentially) that change the behavior of a chip.

But if there’s a change they didn’t plan for, and can’t figure out how to make by rerouting traces within the chip, then they have to change the silicon. Silicon needs a lot more planning, verification, and masking, so it’s hard to change.

Many companies, if they have a silicon layer that’s “good enough”, will use it as long as possible. They’ll reuse some parts of the silicon layer in multiple revisions of the chip.

I suspect that’s where BCM is at: they need good evidence that this bug can’t be fixed in drivers or metal before they can justify re-engineering this part of the silicon layer from scratch.

JamesH65 · 2018-09-05T09:28:23Z

It is extremely unlikely that there will be a respin of the exisiting silicon just to fix this issue (got a spare $1M?). Future chips may well have a fix though, who knows.

rewolff · 2018-09-05T10:22:52Z

On Wed, Sep 05, 2018 at 02:13:19AM -0700, Joel Ray Holveck wrote: It sounds like we need more solid data, such as repro scenarios, to convince Broadcom to re-engineer the I2C controller. Even with solid data, the engineers will have to convince management that it’s worth revising silicon for the next revision.

They already have had six years and 3 silicon revisions where they have demonstrated their decision: "won't fix".

if you stretch by >.5 cycle, you~re in good shape.

I do not think so. Every time you release CLK in the "just before the broadcom looks again" time period, you get short pulses. I can easily delay the release of the clock signal by say 5 useless microseconds to always be out of the 0.5 cycle. But that did not work.

rewolff · 2018-09-05T10:28:05Z

On Wed, Sep 05, 2018 at 02:28:53AM -0700, James Hughes wrote: It is extremely unlikely that there will be a respin of the exisiting silicon just to fix this issue (got a spare $1M?). Future chips may well have the fix though.

I doubt it. We went from BCM2835 with KNOWN bug to the BCM2836 with the same bug to the BCM2837 with the exact same bug to the BCM2837B0. Now why do you think the next revision will be different? BCM has a target market: Phones and TV-sticks for these chips. In their target market nobody is doing software-I2C on general purpose microcontrollers like AVRs and PICs. they are NOT going to fix this.

RalphCorderoy · 2018-09-05T11:34:42Z

It is a shame this investigative effort is required, great though it is. When I asked before for an erratum that covers this it was thought there wasn't one. Has that now changed and there's a URL? Ideally, Broadcom would help the community by precisely describing the fault and when it occurs so workarounds or evaluations of compatible components would be easier to work out with confidence. Even to the extent of the snippet of relevant HDL if they don't want to attempt to translate that accurately into a description.

rewolff · 2018-09-05T12:57:37Z

The suggestion to halt the half-bit-clock-divisor when allowing clock stretching is a good one. It would work.

The problem with that is that part of the I2C module would then need to run at "full clock speed". Now the I2C module is a nice low-power one because it runs normally not faster than 200kHz.

piquan · 2018-09-06T07:04:28Z

The main reason for my post was what I said at the beginning of my original post:

I'm playing with this issue now on a RPi 3 Model B+, and I'm seeing different results than the Advamation article describes. The silicon wasn't revved recently, was it?

If we had significant changes to the I2C hardware or driver after the 2013 Advamation writeup, and the community knows how they may affect this bug, then I should make appropriate changes to my experiments.

I know the fundamental problem is in hardware, but for all I know, somebody may have found by experiment that changing a seemingly-unrelated register in the driver actually changes this bug's behavior. I also haven't confirmed that the BCM2837 has the same I2C module as the RPi 2 (the most recent time I could find definitive confirmation that the I2C hardware was unchanged since this problem was first reported). I also haven't found any information about the any of the BCM283x steppings; for all I know, there could have been a metal change in the BCM2836 era.

I'm not interested in bashing Broadcom, speculating how they may change future revisions, or anything like that.

My primary goal here is to document clearly the current behavior; it looks like it's changed over time. I have a secondary goal to allow future users to reproduce my experiments to see if the results are still valid, or to make improvements or test other conditions. I have another secondary goal to allow Broadcom engineers to reproduce the tests easily if they want to. Both of these mean I need to publish my experiments, which also helps: because of my own hubris, if I'm publishing my experiments, I'll be more driven to make them rigorous.

My current experimental plan is to use a PIC16F1619 as the slave implementation; it's fast and versatile enough to do the clock stretching experiments, easy enough to work with, and common enough that other experimenters can reproduce my results. If the results are ambiguous, I might also build an experiment with an FPGA. I might see if I can get reasonable granularity using another Raspberry Pi in a bit-bashing configuration as the slave, but I suspect I'll need something without a whole OS on it to do real-time testing.

To address another couple of things that came up:

Sorry that my original post was incomplete; I accidentally posted an early draft. I'll leave it as-is, since the changes I made are pretty minor.

I don't claim originality on the idea of halting the clock divisor while SCL was externally held low. I read that somewhere else, but don't remember where; I thought it was from Gert van Loo in the I2C clock stretching thread on raspberrypi.org, but I can't find it now, so I must have seen that elsewhere. I'm not convinced that it would change the power profile significantly,

JamesH65 · 2018-09-06T08:31:59Z

The I2C HW block has not been fixed/changed, and is unlikely to be, in the bcm2835/6/7 as that would require respins of the chip which is simply much too expensive.

I do not know who Advamation are, but they are unrealted to the RPF(T) group, so I cannot guarantee the articles accuracy.

hmf2015 · 2018-09-07T07:33:43Z

Dear all, I got a very strange phainomenon on my Pi 2B and 3, I try to explain:
when I started to implement the BNO055 end of June, I used an elder Raspbian (I don't now the version) and I got a lot of invalid I2C addresses using i2cdetect. For that I looked in the net and I found this forum with good informations. For that I changed the I2C frequence as described above and all the ghosts vanish and I can work. Mid of August I upgrated Raspbian to I think raspbi-dev 4.14.62-v7+ and the system became a little bit unstable, but I can work. From update to update the system became more unstable, noted leaks on the sd card. Therefore I re-installed Raspbian via NOOBS (Linux raspbi-dev 4.14.62-v7+ #1134 SMP Tue Aug 14 17:10:10 BST 2018 armv7l) and I forgot to set the I2C frequence but no ghosts appeared. Okay I don't use microkontroller on the bus, only MEMS and GPIO extender, but I feel there is an workaround in Raspbian's i2c code. Further I think the detected leaks came from an imperfect upgrade, I'm using the the same card and no leaks are detected.

piquan · 2018-09-07T08:05:32Z

@hmf2015 This issue is about a very specific I2C problem. Problems with the SD card are not caused by this I2C problem. If your SD card is not working right, you should try a new SD card. Some cards are not good, and stop working quickly. My SD card stopped working last month, and I had to replace it.

Usually, I2C ghosts are not because of this problem. They most often happen because:

You do not have a good power supply. I like this one.
Your cables are too long. I2C does not work well with cables longer than 1m.
Your cables have capacitive coupling. If you have the I2C pins (SCL and SDA) twisted together, that can make a problem. Sometimes, people use twisted-pair cables (like Ethernet cables) that twist the I2C pins, and that makes problems. If you have to use twisted-pair cables, then twist SCL with Vss (ground) and SDA with Vdd (3.3V). Look at the I2C spec in section 7.5.
Your I2C device has 5V. The Raspberry Pi does I/O at 3.3V. If the voltages are different, that can make problems. (The BNO055 needs 3.3V, so if you're using it with 5V, you have a problem.)

I said that we're talking about a particular problem. Some devices might make ghosts because of this problem, but not many devices. If devices make ghosts because of this problem, they don't make many ghosts. I think your problem is different.

I don't know what you mean when you say that you "noted leaks on the sd card". That's an important problem, but it is not a problem that we talk about in this web page's issue. This conversation is about a particular bug. That bug does not affect SD cards.

I do not think that recent Raspbian changed how I2C works. We think that Raspbian cannot fix this bug. We might be wrong. Thank you for telling us about this.

If you are only telling us this information to help us, then we thank you. If you are telling us this information to ask for help, you might want to ask at the Raspberry Pi forums. The forums talk about many things, like I2C, SD cards, GPIO extenders, and many more things. The things you are seeing may be the same things we are talking about, but they may be different things. The people at the forums can help you to know which problems you are seeing.

Thank you for telling us this information!

hmf2015 · 2018-09-07T08:44:22Z

Thanks for the answer, but the sd card problem was produced by the O/S, I checked the card on my (Unix) host, and it was ok, and now the same cad works perfekt after new installation.
The ghosts on the I2C bus vanish by a higher frequence and now after the reinstallation they don't appear, R/C problems and reflections I can, I think, exclude. I know the 3.3V problems, but but the minimal 1 level is less.
The BNO055 package has a voltage reglulator.
I also thought that the I2C problems would be the same as before, but it doesn't, at the moment I have not the time (and the interest) to look into the source code of Raspbian to detect the differences, I'm interest on a working environment.
The list of elements should only show, that I'm working in an Andoid environment (with a real Unix) as BCM supports, not with microcontroller what I attach via serial line or USB since I'm lazy. In your environment the conditions are more restrict than mine, so I cannot say whether your problems become less, but I hope.

rewolff · 2018-09-07T12:33:47Z

Hi hmf2015,
This is not a forum. This is a bug tracker. This list-of-comments discusses a bug which is unrelated to your problems. Please discuss your problem with an appropriate forum or helpdesk. Again, we appreciate your question: "Maybe it's the same? Maybe my description helps solve that bug?" but now we know for sure that what you are seeing is unrelated, please do not "pollute" this discussion further.

sergey-suloev · 2018-11-22T23:36:04Z

@popcornmix
hi, long time ago you have mentioned a problem with I2C state machine during restart and posted some code regarding this issue.
I am not sure what you meant but I have an issue with WM8731 codec connected to I2C.
After 2-3 restarts the I2C bus completely hangs up. i2cdetect doesn't show any info except 1st line and hangs up too. The only way to recover is power down/up cycle.
Does this look like a issues you have mentioned ?
I am using 4.19 mainine kernel.

rewolff · 2018-12-03T10:46:57Z

@sergey-suloev When i2cdetect does not continue scanning after 8 or 16 addresses, I would say that this is unrelated to this issue. The issue pops up when devices do clock stretching to indicate that they are not yet ready for the next clock cycle. When I2C was designed almost 40 years ago, that might have been conceivable for hardware implementations too. But nowadays, hardware is always ready in time to handle the next clockcycle because 10 microseconds is ages of time for something implemented in hardware.

Both drd and gadget interrupt handler use the struct cdns3 pointer as dev_id, it causes devm_free_irq at cdns3_gadget_exit doesn't free gadget's interrupt handler, it freed drd's handler. So, when the host interrupt occurs, the gadget's interrupt hanlder is still called, and causes below oops. To fix it, we use gadget's private data priv_dev as interrupt dev_id for gadget. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000380 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000971d79000 [0000000000000380] pgd=0000000971d6f003, pud=0000000971d6e003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: mxc_jpeg_encdec crct10dif_ce fsl_imx8_ddr_perf CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-03486-g69f4e7d9c54a-dirty #254 Hardware name: Freescale i.MX8QM MEK (DT) pstate: 00000085 (nzcv daIf -PAN -UAO) pc : cdns3_device_irq_handler+0x1c/0xb8 lr : __handle_irq_event_percpu+0x78/0x2c0 sp : ffff800010003e30 x29: ffff800010003e30 x28: ffff8000129bb000 x27: ffff8000126e9000 x26: ffff0008f61b5600 x25: ffff800011fe1018 x24: ffff8000126ea120 x23: ffff800010003f04 x22: 0000000000000000 x21: 0000000000000093 x20: ffff0008f61b5600 x19: ffff0008f5061a80 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 003d090000000000 x13: 00003d0900000000 x12: 0000000000000000 x11: 00003d0900000000 x10: 0000000000000040 x9 : ffff800012708cb8 x8 : ffff800012708cb0 x7 : ffff0008f7c7a9d0 x6 : 0000000000000000 x5 : ffff0008f7c7a910 x4 : ffff8008ed359000 x3 : ffff800010003f40 x2 : 0000000000000000 x1 : ffff0008f5061a80 x0 : ffff800010161a60 Call trace: cdns3_device_irq_handler+0x1c/0xb8 __handle_irq_event_percpu+0x78/0x2c0 handle_irq_event_percpu+0x40/0x98 handle_irq_event+0x4c/0xd0 handle_fasteoi_irq+0xbc/0x168 generic_handle_irq+0x34/0x50 __handle_domain_irq+0x6c/0xc0 gic_handle_irq+0xd4/0x174 el1_irq+0xb8/0x180 arch_cpu_idle+0x3c/0x230 default_idle_call+0x38/0x40 do_idle+0x20c/0x298 cpu_startup_entry+0x28/0x48 rest_init+0xdc/0xe8 arch_call_rest_init+0x14/0x1c start_kernel+0x48c/0x4b8 Code: aa0103f3 aa1e03e0 d503201f f9409662 (f941c040) ---[ end trace 091dcf4dee011b0e ]--- Kernel panic - not syncing: Fatal exception in interrupt SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0002,2100600c Memory Limit: none ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 7733f6c ("usb: cdns3: Add Cadence USB3 DRD Driver") Cc: <stable@vger.kernel.org> #v5.4 Signed-off-by: Peter Chen <peter.chen@nxp.com> Link: https://lore.kernel.org/r/1577437804-18146-1-git-send-email-peter.chen@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit af58e1f upstream. Both drd and gadget interrupt handler use the struct cdns3 pointer as dev_id, it causes devm_free_irq at cdns3_gadget_exit doesn't free gadget's interrupt handler, it freed drd's handler. So, when the host interrupt occurs, the gadget's interrupt hanlder is still called, and causes below oops. To fix it, we use gadget's private data priv_dev as interrupt dev_id for gadget. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000380 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000971d79000 [0000000000000380] pgd=0000000971d6f003, pud=0000000971d6e003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: mxc_jpeg_encdec crct10dif_ce fsl_imx8_ddr_perf CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-03486-g69f4e7d9c54a-dirty #254 Hardware name: Freescale i.MX8QM MEK (DT) pstate: 00000085 (nzcv daIf -PAN -UAO) pc : cdns3_device_irq_handler+0x1c/0xb8 lr : __handle_irq_event_percpu+0x78/0x2c0 sp : ffff800010003e30 x29: ffff800010003e30 x28: ffff8000129bb000 x27: ffff8000126e9000 x26: ffff0008f61b5600 x25: ffff800011fe1018 x24: ffff8000126ea120 x23: ffff800010003f04 x22: 0000000000000000 x21: 0000000000000093 x20: ffff0008f61b5600 x19: ffff0008f5061a80 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 003d090000000000 x13: 00003d0900000000 x12: 0000000000000000 x11: 00003d0900000000 x10: 0000000000000040 x9 : ffff800012708cb8 x8 : ffff800012708cb0 x7 : ffff0008f7c7a9d0 x6 : 0000000000000000 x5 : ffff0008f7c7a910 x4 : ffff8008ed359000 x3 : ffff800010003f40 x2 : 0000000000000000 x1 : ffff0008f5061a80 x0 : ffff800010161a60 Call trace: cdns3_device_irq_handler+0x1c/0xb8 __handle_irq_event_percpu+0x78/0x2c0 handle_irq_event_percpu+0x40/0x98 handle_irq_event+0x4c/0xd0 handle_fasteoi_irq+0xbc/0x168 generic_handle_irq+0x34/0x50 __handle_domain_irq+0x6c/0xc0 gic_handle_irq+0xd4/0x174 el1_irq+0xb8/0x180 arch_cpu_idle+0x3c/0x230 default_idle_call+0x38/0x40 do_idle+0x20c/0x298 cpu_startup_entry+0x28/0x48 rest_init+0xdc/0xe8 arch_call_rest_init+0x14/0x1c start_kernel+0x48c/0x4b8 Code: aa0103f3 aa1e03e0 d503201f f9409662 (f941c040) ---[ end trace 091dcf4dee011b0e ]--- Kernel panic - not syncing: Fatal exception in interrupt SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0002,2100600c Memory Limit: none ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 7733f6c ("usb: cdns3: Add Cadence USB3 DRD Driver") Cc: <stable@vger.kernel.org> #v5.4 Signed-off-by: Peter Chen <peter.chen@nxp.com> Link: https://lore.kernel.org/r/1577437804-18146-1-git-send-email-peter.chen@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Add various tests to check maximum number of supported programs being attached: # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.185325] bpf_testmod: loading out-of-tree module taints kernel. [ 1.186826] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.270123] tsc: Refined TSC clocksource calibration: 3407.988 MHz [ 1.272428] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns [ 1.276408] clocksource: Switched to clocksource tsc #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK <--- (new test) #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_replace:OK #269 tc_opts_revision:OK Summary: 18/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230929204121.20305-2-daniel@iogearbox.net

Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK <--- (new test) #269 tc_opts_replace:OK #270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-4-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK #269 tc_opts_query_attach:OK <--- (new test) #270 tc_opts_replace:OK #271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-6-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

tinue mentioned this issue Dec 7, 2014

I2C error Pi4J/pi4j-v1#117

Closed

notro mentioned this issue Dec 29, 2014

PR and issue list purging? #751

Closed

popcornmix added the Bug label Dec 30, 2014

hamishcunningham mentioned this issue Oct 13, 2015

MoPi/simbamon power button bit set on boot hamishcunningham/pi-tronics#63

Closed

dlech mentioned this issue Nov 9, 2015

Some I2C sensors don't work on PiStorms ev3dev/ev3dev#425

Open

RobertLucian mentioned this issue Jul 24, 2018

fix - getting max values (1023) (or invalid) when using analogRead/DHT/US/etc sensor DexterInd/GrovePi#411

Closed

FDelporte mentioned this issue Jan 26, 2021

Will V2 support I2C clock stretching? Pi4J/pi4j-v2#56

Closed

kaedros mentioned this issue Feb 11, 2022

I2C clock-stretching bug #4884

Open

I2C Broadcom bug workaround. #254

I2C Broadcom bug workaround. #254

Comments

rewolff commented Mar 20, 2013

popcornmix commented Mar 20, 2013

rewolff commented Mar 20, 2013

popcornmix commented Mar 21, 2013

rewolff commented Mar 22, 2013

popcornmix commented Mar 22, 2013

rewolff commented Mar 23, 2013

iz8mbw commented Mar 23, 2013

rewolff commented Mar 24, 2013

rewolff commented Mar 24, 2013

rfmerrill commented Mar 26, 2013

rewolff commented Mar 26, 2013

markdootson commented Apr 8, 2013

Ferroin commented Apr 8, 2013

rewolff commented Apr 8, 2013

popcornmix commented Apr 8, 2013

Ferroin commented Apr 8, 2013

popcornmix commented Apr 8, 2013

Ferroin commented Apr 8, 2013

kadamski commented Apr 9, 2013

popcornmix commented Apr 10, 2013

rewolff commented Apr 10, 2013

kadamski commented Apr 10, 2013

popcornmix commented Apr 10, 2013

hkapanen commented Mar 1, 2016

hmf2015 commented Aug 15, 2018

rewolff commented Aug 15, 2018

hmf2015 commented Aug 15, 2018 • edited

rewolff commented Aug 16, 2018

hmf2015 commented Aug 18, 2018 • edited

piquan commented Sep 5, 2018

rewolff commented Sep 5, 2018

piquan commented Sep 5, 2018

piquan commented Sep 5, 2018

JamesH65 commented Sep 5, 2018 • edited

rewolff commented Sep 5, 2018 via email

rewolff commented Sep 5, 2018 via email

RalphCorderoy commented Sep 5, 2018

rewolff commented Sep 5, 2018

piquan commented Sep 6, 2018

JamesH65 commented Sep 6, 2018

hmf2015 commented Sep 7, 2018 • edited

piquan commented Sep 7, 2018

hmf2015 commented Sep 7, 2018

rewolff commented Sep 7, 2018

sergey-suloev commented Nov 22, 2018 • edited

rewolff commented Dec 3, 2018

hmf2015 commented Aug 15, 2018 •

edited

hmf2015 commented Aug 18, 2018 •

edited

JamesH65 commented Sep 5, 2018 •

edited

hmf2015 commented Sep 7, 2018 •

edited

sergey-suloev commented Nov 22, 2018 •

edited