New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
driver/mtd_spi_nor, pkg/littlefs: improve reliability with corrupted flash (new PR) #20589
base: master
Are you sure you want to change the base?
Conversation
881f219
to
7f550ac
Compare
@benpicco I think I found a suitable solution to get rid of the Kconfig configuration options and have a versatile option to enable manufacturer dependent security options. The flags are defined in the global header file https://github.com/RIOT-OS/RIOT/blob/master/drivers/include/mtd_spi_nor.h Therefore I would introduce four new structures: two for Macronix with security (1byte and 4byte access) and two for ISSI (1byte and 4byte access). I'll probably have the preliminary commit with the proposed changes ready in a couple of hours, that should make everything clearer. |
e07f123
to
c293317
Compare
The static-test codespell complains about these lines in drivers/mtd_spi_nor/mtd_spi_nor.c being a typo (it would like to have WELL instead of WEL):
I'm not sure how to fix that? Oh wel... 😅 |
the static-test checks if the tool that is running a test is installed - so you probably do not have codespell installed.
it there is a long form (write enable L... what is the L? eg WREL?) or alternative of WEL (STATUS_WEL ?) you could use that -and workaround the spell checker or as it suggest it could be added to the ignored words list. |
WEL stands for "Write Enable Latch". However I found out that codespell added an inline option to ignore certain words. That would avoid adding a global ignored word (and potentially missing typos of "well" in the future): https://github.com/codespell-project/codespell?tab=readme-ov-file#inline-ignore |
citing from https://github.com/codespell-project/codespell?tab=readme-ov-file#ignoring-words
i don't think you need to go the inline route ( |
Okay, this was a bit naive of me. Obviously the spellcheck only checks with the ignored word list that is currently in master. So either I have to ignore the failed spell test or open a separate PR with just the updated ignored word list 🤔 |
that PR also would get spellchecked i think -> there is a egg and chicken problem seems like (our)codespell does not follow their doc codespell-project/codespell#3272 (don't know which date our version is seems like our codespell is case-insensitive in either case |
i read it wrong by case sensitive they mean you need to type it the way they have it in their typo database (usually lowercase) (i don't know why they do it like this) codespell-project/codespell#2451 (comment) so for WEL to not trigger you need to add wel but then wel and wEl and Wel would also not trigger |
So this was quite a rabbit hole. No matter what I did I could not reproduce the issue... It turns out that Linux Mint had an old version of codespell. Now I installed it via pip and not form the package repository and now it complains about WEL as well, yay. BUT... the latest Release of codespell does not support the inline ignores yet. It's only implemented in the development version. A workaround is changing "WEL" to "WEL-Flag" and now the spellcheck passes. |
2113699
to
1e2953d
Compare
The last PR adds a new test for the mtd_spi_nor driver which tests the security features. Contrary to what I wrote in the first post, you can trigger the P_FAIL (Program Failed) and E_FAIL (Erase Failed) flags by protecting a block with the Block Protect flags in the status register. The test is not suuuper neat because I had to copy some internal functions from the mtd_spi_nor driver to have direct access to the register. This seemed the least intrusive way to create the tests. This is a log from the most relevant test for this PR, checking the security functions with the Block Protect bits.
These two messages are generated by the new code that was created by @vincent-d:
HOWEVER... I do not love the solution with the configuration (setting both Flag and Opcode) in drivers/mtd_spi_nor/mtd_spi_nor_configs.c. On one hand it's unnecessary duplication and on the other hand it is really easy to forget to set one of the two (ask me how I know...). And if you forget to set the correct opcodes, it results in false positive security behaviour. An undefined opcode is set to 0x00 (NOP) and that returns 0x00. This 0x00 is interpreted as "everything is fine" by the security code. My preferred option would be adding some Pseudomodules, similar to the AT24CXXX driver. However I'm not sure if it would make sense to do it the same way by allocating every namespace that begins with "M25F...", "M25R...", "IS25LP...", "IS25LE...", "SST25V..." etc. A better option would probably be something like "MTD_SPI_NOR_MX", "MTD_SPI_NOR_IS", "MTD_SPI_NOR_SST" for defining the opcode set and "MTD_SPI_NOR_SECURITY" and (in a future expansion) "MTD_SPI_NOR_ECC" for defining the features. Any other ideas about this are very welcome. |
8509a22
to
01d79f8
Compare
@benpicco You reviewed the original PR, could you take a quick look at this to give me some guidance on the previous question? You don't have to review this PR yet. |
cf373f9
to
c2ca283
Compare
I implemented the Pseudomodule solution and IMO it is way more elegant than the original idea and scales a lot better. So one thing to add to this function is trying to clear the flag manually perhaps after a certain number of tries. (As well as adding the timeout). Therefore it took a lot of testing and fiddling until the ISSI chip was running as expected and the solution with the Pseudomodules proved to be superior in this case because I was able to add the additional OpCodes conditionally. Unfortunately I did not separate the changes in separate commits because I did not intend them to become this involved... |
I think this PR is already quite extensive and beyond the originally intended scope, so I would like to postpone adding the timeout tasks for now to a separate PR. |
Contribution description
This is a takeover of PR #14908, which has been abandoned. The goal is to implement the proposed changes in the original PR to make it mergeable.
The changes to make it mergeable after four years were necessary due to the following commits:
60eb75d
This commit removed the "mtd_write_page" function in favor of the "mtd_write_page_raw" function.
ea105d3
This commit removed the "mtd_spi_nor_write" function, which was changed in the original PR as well but these changes are not necessary anymore.
Open Tasks:
The review from @benpicco ( mtd_spi_nor | littlefs: improve reliability with corrupted flash #14908 (review) ) has to be included in this PR still.
My plan was to add a header file to the drivers/mtd_spi_nor folder which would probably be called
something like "drivers/mtd_spi_nor/include/mtd_spi_nor_security.h""drivers/mtd_spi_nor/include/mtd_spi_nor_defines.h".The security bits used by this PR are not universal to all manufacturers of Flash chips, specifically the Flash used by the original author @vincent-d is from Macronix and for example ISSI uses a different register and different bit positions.
I'll have to check if they are manufacturer specific or device specific.The Kconfig related changes
can probably bewere deleted, as the Kconfig subsystem has been removed from the mtd_spi_nor driver.There is no timeout for the added functions wait_for_write_enable_cleared and wait_for_write_enable_set, which would get the thread stuck if the chip does not answer.
The function wait_for_write_complete does not have a timeout either but it counts the attempts and how many times it yielded the thread. This can be used as a basis for a timeout.
Write tests for the Macronix and ISSI flash security features.
Add IS25LP128 and IS25LE01G as DuTs to the test (and possibly more, whatever the magic drawer might supply).
Testing procedure
Testing this might be somewhat difficult, as I do not have a flash chip with broken memory cells (at least none that I know of)
and I do not have the Macronix chip.I will buy the MX25L51245G flash from the original PR to test the functionality and for testing the handling of broken memory cells we'll have to get creative.Update: Two MX25L12873F chips are on the way to me which have the same security register as the aforementioned chip.
Update 2: The mechanisms in the code to detect corrupt memory areas can be triggered by applying protection to a certain area in the chip.
Therefore it is possible to create a test case by "destroying" a chip.The PR comes with new tests that utilize the Block Protect functions from NOR Flashs which triggers the Program Fail and Erase Fail flags. The Block Protect bits are not permanent and can be reset easily.
The nRF52840DK development board has a suitable flash chip on board and is covered by the test.
All tests should run successfully to verify the PR is working correctly.
Issues/PRs references
This closes #14908.