
NFFS can store a valid file-system instance which is unrecoverable after reset. #10

Open
nvlsianpu opened this issue Sep 24, 2018 · 18 comments

Comments

@nvlsianpu
Contributor

nvlsianpu commented Sep 24, 2018

It is possible that, during regular filesystem operation, the filesystem footprint gets shaped in such a way that a subsequent NFFS restore operation exceeds the maximum number of blocks at its peak. This is caused by a difference between the two algorithms, which are supposed to lead to the same expected NFFS runtime state.

The proposed resolution is to rework the NFFS restore so that it is a more sequential process than it is now:

  • First, restore inodes to RAM.
  • Second, restore valid blocks to RAM; since the inode state is then already known, the maximum block number shouldn’t be exceeded.

It also looks like both scans should be performed by traversing areas from oldest to newest. A descending (newest-to-oldest) scan should be even more effective, but it is not supported by the NFFS objects on flash.
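Roughly, the restore entry point could then look like this (a minimal sketch only; `restore_area_inodes()` and `restore_area_blocks()` are hypothetical helpers, not the current nffs_restore.c API):

```c
/* Minimal sketch of the proposed two-pass restore. The helpers below are
 * hypothetical stand-ins for the scanning code in nffs_restore.c: each one
 * walks a single area and restores only the named object kind to RAM. */
static int restore_area_inodes(int area_idx);
static int restore_area_blocks(int area_idx);

static int
nffs_restore_two_pass(int num_areas)
{
    int area_idx;
    int rc;

    /* Pass 1: scan areas from oldest to newest and restore inodes only,
     * so the final inode state (including deletions) is known up front. */
    for (area_idx = 0; area_idx < num_areas; area_idx++) {
        rc = restore_area_inodes(area_idx);
        if (rc != 0) {
            return rc;
        }
    }

    /* Pass 2: scan again and restore data blocks; a block whose owning
     * inode is already known to be deleted is discarded immediately, so
     * the block pool should not peak above its steady-state usage. */
    for (area_idx = 0; area_idx < num_areas; area_idx++) {
        rc = restore_area_blocks(area_idx);
        if (rc != 0) {
            return rc;
        }
    }

    return 0;
}
```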

@nvlsianpu nvlsianpu changed the title NFFS can store valid a file-system instance which is unrecoverable after reset. NFFS can store a valid file-system instance which is unrecoverable after reset. Sep 24, 2018
@nvlsianpu
Contributor Author

@carlescufi @andrzej-kaczmarek Any comments on that? I will try to resolve the issue soon; is there anyone else worth adding to the loop for discussion?

@nvlsianpu
Contributor Author

As a consequence, at some point of execution the call `rc = nffs_restore_object(&disk_object);` returns the FS_ENOMEM error code, which is ignored.

@carlescufi
Contributor

@andrzej-kaczmarek @sjanc can you please help us make this known upstream in Mynewt to see if someone there can help @nvlsianpu with the analysis and fix?

@carlescufi
Contributor

@ccollins476ad could you take a look at this?

@ccollins476ad

ccollins476ad commented Sep 27, 2018

Thanks for reporting this, @nvlsianpu. I agree with the analysis, and I think the proposed solution is a good one.

EDIT: I keep convincing myself that there is still the possibility of failure, even if inodes are processed before data blocks. I will check back after I refamiliarize myself with NFFS!

@ccollins476ad

Just to follow up on my last comment - I do think a problem remains, even if we fully process inodes before data blocks. Specifically, the inode pool can become exhausted during file system restoration.

The below is not the greatest explanation in the world. Please don't hesitate to ask if something is not clear.

Unfortunately, when we discover an inode deletion record, we cannot immediately discard the deleted inode and its children. Instead, we need to remember that these items are deleted. If we were to discard deleted inodes during the restore process, that would introduce an ambiguity for subsequently discovered children: is the parent deleted, or has it just not been discovered yet? Currently, we are allowed to assume that the parent is not discovered yet, because we maintain records of deleted inodes.

I have an idea for how to solve this. I think the core issue here is that the contents of an NFFS disk are not in chronological order. For example, say a disk contains 1) a parent directory P, 2) a child file C, and 3) a deletion record for P. NFFS allows these objects to be located on disk in any order relative to one another. If we had some guarantee that children always appear after their parent, and that deletion records come after the inodes they delete, then the restore process would not need to remember which inodes were deleted. Instead, NFFS could assume that an orphan's parent has been deleted, and could safely discard the orphan.

Currently, NFFS writes an object to any disk area that can accommodate it. This is what introduces the uncertainty in the ordering of objects. Instead, NFFS could write to areas in a sequential fashion. If the object being written does not fit in the current area, NFFS would select the next area as its new current area. Nothing new would get written to the first area until it is garbage collected, even if it could accommodate a subsequent write.
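Roughly, the write-area selection could then look something like this (an illustrative sketch under those assumptions; `area_free_space()` and the current-area bookkeeping are made up, not existing NFFS code):

```c
#include <stdint.h>

/* Hypothetical helper: free bytes remaining in the given area. */
static uint32_t area_free_space(int area_idx);

static int current_area;   /* index of the area currently being filled */

/* Pick the area for the next write without ever going back to an earlier
 * area, even if it still has room. This keeps objects in chronological
 * order across areas; earlier areas are only reused after garbage
 * collection. Returns the area index, or -1 if nothing fits. */
static int
select_write_area(int num_areas, uint32_t object_size)
{
    while (current_area < num_areas &&
           area_free_space(current_area) < object_size) {
        current_area++;
    }
    if (current_area >= num_areas) {
        return -1;      /* no room anywhere; caller must garbage collect */
    }
    return current_area;
}
```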

@nvlsianpu
Contributor Author

Thanks for the elaboration - you are right in your analysis.

What you propose will increase flash wear, especially if objects are big in comparison to the area size.

We could consider using additional RAM temporarily, just while initializing:

inode size on disk: 20 B + file_name/directory_name
data_block size on disk: 24 B + data
nffs_area header on disk: 3 B

File deletion:
the worst case: the smallest possible file is created and immediately deleted:
[data_block of 1 B data] + [inode of 1 B file_name] + [deletion inode] = 25 + 21 + 20 = 66 B

For each deleted inode, only the ID and sequence number must be known during file-system image parsing -> 8 B.

So each such worst-case file requires 8 B of RAM to store the deletion info.

The worst case is a partition entirely filled with deleted files, no directories at all, and regular area sizes:

deleted_inode_info_max = ((partition_size - area_size - areas_count * 3) / 66) * 8. So the ratio is around 1/8 [RAM/ROM].

For an NFFS that contains 2 areas of 4096 B each, 62 deleted-inode info items are required -> 496 B of RAM.
I think this is much too much, and it will be even worse if one considers a directory create-then-delete scenario.
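Spelling the arithmetic out for the two-area example (this only re-derives the numbers above; the macro names are just for illustration):

```c
/* Worked example for the formula above: a partition of two 4096 B areas.
 * One area is always the scratch area, and each area has a 3 B header. */
#define PARTITION_SIZE    (2 * 4096)                        /* 8192 B */
#define USABLE_BYTES      (PARTITION_SIZE - 4096 - 2 * 3)   /* 4090 B */
#define WORST_CASE_FILE   66  /* smallest create-then-delete footprint, B */
#define MAX_DELETED       ((USABLE_BYTES + WORST_CASE_FILE - 1) / WORST_CASE_FILE)  /* 62 */
#define DELETED_INFO_RAM  (MAX_DELETED * 8)                 /* 496 B */
```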

But I also have an idea for changing the management of IDs:

@nvlsianpu
Contributor Author

Let's assume that NFFS will try to avoid making holes in the inode ID range - this means that it will use only a small range of IDs, bounded by the maximum number of each kind of inode. This means that we need to remember the (ID, sequence number) pair of each deleted inode.

struct nffs_deleted_inode {
    uint32_t ndi_id;            /* deleted object ID. */
    uint32_t ndi_seq;           /* Sequence number. */
};

Whenever NFFS attempts to create a new inode, it will take a spare ID from the deletion list (reusing it with an incremented sequence number, obviously), and only if the list is empty will it allocate a brand-new ID.
For the current NFFS we need to create a list of deleted-inode ID/sequence pairs with a capacity of FS_NFFS_NUM_INODES * 2 (as there are FILE and DIRECTORY inode kinds).

Such a patch will ensure that we always fit within the memory constraints when restoring NFFS, but RAM consumption per inode will increase.
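Roughly, ID allocation could then look like this (a sketch only; the list layout and the helper are made up, and FS_NFFS_NUM_INODES is the configuration value mentioned above):

```c
#include <stdint.h>

/* (ID, sequence) pair of a deleted inode, as proposed above. */
struct nffs_deleted_inode {
    uint32_t ndi_id;            /* deleted object ID */
    uint32_t ndi_seq;           /* sequence number */
};

/* Capacity: one slot per possible inode of each kind (FILE and DIRECTORY). */
#define NFFS_DELETED_LIST_CAP   (FS_NFFS_NUM_INODES * 2)

static struct nffs_deleted_inode deleted_list[NFFS_DELETED_LIST_CAP];
static int deleted_count;
static uint32_t next_new_id;

/* Allocate an ID for a new inode, preferring spare IDs from the deletion
 * list so the ID range stays dense and holes are avoided. */
static uint32_t
nffs_alloc_inode_id(uint32_t *out_seq)
{
    if (deleted_count > 0) {
        /* Reuse a spare ID, bumping its sequence number so the new inode
         * supersedes the deleted one on disk. */
        deleted_count--;
        *out_seq = deleted_list[deleted_count].ndi_seq + 1;
        return deleted_list[deleted_count].ndi_id;
    }
    /* Deletion list empty: hand out a brand-new ID. */
    *out_seq = 0;
    return next_new_id++;
}
```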

@ccollins476ad

@nvlsianpu, I think I understand your proposed solution, and I believe it would indeed solve this problem. However, I think there may be another issue that the solution does not address (this issue exists in the current implementation as well).

Key

| Symbol | Meaning |
| --- | --- |
| DEL x | Deletion record for inode x |
| fx(p=y) | File inode with ID x; parent directory has ID y |
| bx(p=y) | Data block with ID x; parent file has ID y |

Say we have the following NFFS layout:

| Area 0 | Area 1 | Scratch | Area 3 |
| --- | --- | --- | --- |
| f100(p=0) | b3(p=100) | ... | b5(p=200) |
| b1(p=100) | b4(p=100) | ... | b6(p=200) |
| b2(p=100) | f200(p=0) | ... | DEL 100 |

In other words, we have two files, 100 and 200. Each file contains several data blocks. Finally, file 100 has recently been deleted.

When garbage collection occurs, the disk will now look like this:

| Area 0 | Area 1 | Area 2 | Scratch |
| --- | --- | --- | --- |
| f100(p=0) | b3(p=100) | b5(p=200) | ... |
| b1(p=100) | b4(p=100) | b6(p=200) | ... |
| b2(p=100) | f200(p=0) | | ... |

That is, the deletion record gets erased.

If the system reboots, file 100 now looks valid again; its deletion record was lost.

I might not be thinking hard enough, but I still believe we need some chronological ordering of records within the disk.

@nvlsianpu
Contributor Author

So deleted-inode records are not garbage-collected properly, as I understand it. I think that if we apply my proposition of managing ID ranges, then we will be able to keep a RAM representation of the deletion-inode entries. It might be an extension of the previously proposed struct nffs_deleted_inode.

@ccollins476ad

@nvlsianpu , perhaps I just don't understand your proposed solution. During garbage collection, how would NFFS know to preserve the deletion record?

@nvlsianpu
Contributor Author

So struct nffs_deleted_inode needs to know where the overridden inode is - so we need to add a reference to it. This reference should be null when the referenced inode is deleted.
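For illustration, something like this (a sketch; the field name is made up, and using a pointer to the in-RAM hash entry of the overridden inode is an assumption):

```c
#include <stdint.h>

struct nffs_hash_entry;     /* NFFS in-RAM object representation */

struct nffs_deleted_inode {
    uint32_t ndi_id;                          /* deleted object ID */
    uint32_t ndi_seq;                         /* sequence number */
    struct nffs_hash_entry *ndi_overridden;   /* where the overridden inode
                                               * is; NULL when the referenced
                                               * inode is deleted */
};
```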

@nvlsianpu
Contributor Author

@ccollins476ad - I just looked at the nffs_gc() code: https://github.com/apache/mynewt-nffs/blob/master/src/nffs_gc.c#L498. It looks like blocks which don't have a parent inode in the `from_area` won't be copied. Am I right, or did I miss something?

@ccollins476ad

@nvlsianpu, I don't believe that is the case. The code applies nffs_gc_inode_blocks() to every inode in the system, regardless of which area they reside in.

@nvlsianpu
Contributor Author

@ccollins476ad You are right - I missed the iteration range. Sorry for the false alarm.
What do you think about:

> struct nffs_deleted_inode needs to know where the overridden inode is - so we need to add a reference to it. This reference should be null when the referenced inode is deleted.

?

@MitterdorferMathias

Hello,
we are currently using NFFS in our project. We have also seen problems during the file system init phase if the same file was written multiple times.
To resolve this issue, do you think changes to the metadata stored in flash will be required?
(This concerns migration of NFFS during a firmware update.)

@nvlsianpu
Contributor Author

@MitterdorferMathias As you can read, there are currently 2 candidate bug-fix approaches. In both, the metadata of data and inode objects is kept the same; it is possible that the metadata of the area object will change.
But with both solutions, NFFS will be unable to parse the previous FS flash representation.

@nvlsianpu
Contributor Author

@ccollins476ad @utzig Any update?

nvlsianpu added a commit to nvlsianpu/zephyr that referenced this issue Jan 17, 2020
NFFS is removed as it has serious bugs (by design) which haven't
been resolved for an extended period of time.
One of the most serious bunches of issues was described here:
apache/mynewt-nffs#10

Since NFFS lacks upstream support, it doesn't make sense to keep
it in Zephyr.

Signed-off-by: Andrzej Puzdrowski <andrzej.puzdrowski@nordicsemi.no>
carlescufi pushed a commit to zephyrproject-rtos/zephyr that referenced this issue Jan 21, 2020
NFFS is removed as it has serious bugs (by design) which haven't
been resolved for an extended period of time.
One of the most serious bunches of issues was described here:
apache/mynewt-nffs#10

Since NFFS lacks upstream support, it doesn't make sense to keep
it in Zephyr.

Signed-off-by: Andrzej Puzdrowski <andrzej.puzdrowski@nordicsemi.no>