"this._changes is undefined" when adding records to auto-loaded db & collection #183

Open
cepm-nate opened this issue Sep 23, 2020 · 9 comments

cepm-nate commented Sep 23, 2020

I'm submitting a...


[ ] Regression (a behavior that used to work and stopped working in a new release)
[X] Bug report
[ ] Feature request
[ ] Documentation issue or request
[ ] Support request

Current behavior

This error shows up after saving 500,000 rows, then refreshing the browser and trying to save the next set of rows. The code to reproduce it is simple.

Error Code


Uncaught (in promise) TypeError: this._changes is undefined
    _createChange collection.ts:1148
    _createInsertChange collection.ts:1158
    _insertMetaWithChange collection.ts:1167
    insertOne collection.ts:726
    insert collection.ts:665

Expected behavior

No error, rows inserted into collection.

Minimal reproduction of the problem with instructions

https://jsbin.com/nejijiberu (Copy to a local dev server if needed; it's hard to make LokiDB work on JS Bin.) Same code as below:

		let testCollection; // we will assign it after persistence
		IndexedStorage.register();
		PartitioningAdapter.register();

		let idbAdapter = new IndexedStorage(dbName);
		let paAdapter = new PartitioningAdapter(
			idbAdapter,
		 	{
				paging: true, 
				pageSize: 5*1024*1024 
			}
		);

		wdb = new Loki(dbName);

		let persistenceOptions = {
			adapter: paAdapter,
			autosave: window.self === window.top,
			autosaveInterval: 1000,
			autoload: window.self === window.top,
			throttledSaves: true,
			persistenceMethod: 'indexed-storage',
		};

		wdb.initializePersistence(persistenceOptions)
		.then(() => {
			console.log('Loki Persistence initialized');
			// Initialize collection
			testCollection = wdb.getCollection('Test');
			if (!testCollection) testCollection = wdb.addCollection('Test');
			// Display number of rows IN collection
			const oldRecords = testCollection.find();
			console.log('Number of rows loaded from IndexedDB collection:',oldRecords.length);
			// Use verbose method to get maximum id of existing records.
			const oldIds = oldRecords.map((r) => parseInt(r.id));
			const oldIdsLength = oldIds.length;
			let maxId = oldIds[0] || 0;
			for (let i = 1; i < oldIdsLength; ++i) {
				if (oldIds[i] > maxId) {
					maxId = oldIds[i];
				}
			}
			// Fetch placeholder data
			fetch('https://jsonplaceholder.typicode.com/photos') 
				.then(response => response.json())
				.then(json => {
					console.log('Enlarging result set from jsonplaceholder...');
					let bSet = json;
					// Make set larger
					bSet = [
						...bSet,...bSet,...bSet,...bSet,...bSet,
						...bSet,...bSet,...bSet,...bSet,...bSet,
					];
					bSet = [
						...bSet,...bSet,...bSet,...bSet,...bSet,
						...bSet,...bSet,...bSet,...bSet,...bSet,
					];
					// bSet = [
					// 	...bSet,...bSet,...bSet,...bSet,...bSet,
					// 	...bSet,...bSet,...bSet,...bSet,...bSet,
					// ];
					console.log('Cloning',bSet.length,'records...');
					bSet = JSON.parse(JSON.stringify(bSet)); // clone so we can reset id field
					console.log('Resetting',bSet.length,'id fields, starting at',maxId+1);
					for (let i = 0; i < bSet.length; i += 1) bSet[i].id = i + maxId + 1; // reset id field
					console.log('Adding',bSet.length,'rows from jsonplaceholder into LokiDB...');
					testCollection.insert(bSet);
					console.log('Added rows from jsonplaceholder. Refresh to see if they stick around.');
				})

		})
		.catch((e) => console.error('initializePersistence error', e.message));
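As an aside, the "verbose method" for finding the maximum existing id can be collapsed into a one-liner. A minimal pure-JS sketch (no LokiDB dependency, assuming each record carries a numeric-string `id` field as in the repro above):

```javascript
// Compute the highest numeric id in a record set; 0 when the set is empty.
// Equivalent to the verbose map + loop above, using reduce instead.
function maxId(records) {
  return records.reduce((max, r) => Math.max(max, parseInt(r.id, 10)), 0);
}

console.log(maxId([]));                                       // 0
console.log(maxId([{ id: "3" }, { id: "10" }, { id: "7" }])); // 10
```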

What is the motivation / use case for changing the behavior?

I need to save a large number of records (about 2 million) across 80 or so collections. I encountered the above bug while trying to build an easy way to reproduce a secondary bug.

The secondary bug: With 80 or so collections and ~2M records saved (and showing up in both IndexedDB and in memory), reloading the browser would cause the collections to vanish, even though I can still see the data sitting in IndexedDB. I figured it'd be best to focus on the primary bug first, then try to figure out why something between LokiDB, IndexedStorage, and PartitioningAdapter was failing to autoload the 80+ collections.

Environment


LokiDB version: 2.1.0

Exact same error in Firefox 81.0 and Chrome 85.0.4183.121.
@Viatorus Viatorus self-assigned this Sep 24, 2020

cepm-nate commented Sep 24, 2020

My current workaround for the main bug mentioned is to put

if (!collection.getChanges()) collection.flushChanges();

just before the collection.insert() line, so that the _changes property is created before the .insert() call. Thanks for accepting this as a bug!

.. and kudos for the speed of LokiDB! In my tests, it outperforms lokiJS by a huge factor!
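Stripped to its essence, the workaround is just a guard before the first insert after a reload. A runnable sketch using a stub collection (StubCollection here is a hypothetical stand-in; only the getChanges()/flushChanges() calls mirror the real LokiDB API):

```javascript
// Minimal stand-in for a LokiDB collection whose _changes array was not
// restored after autoload (the broken state described in this issue).
// The real fix belongs in LokiDB; this only illustrates the guard pattern.
class StubCollection {
  constructor() { this._changes = undefined; } // state after a bad reload
  getChanges() { return this._changes; }
  flushChanges() { this._changes = []; }       // recreates the array
  insert(docs) {
    this._changes.push(...docs);               // throws if _changes is undefined
    return docs;
  }
}

const collection = new StubCollection();
// Workaround: make sure _changes exists before inserting.
if (!collection.getChanges()) collection.flushChanges();
collection.insert([{ id: 1 }, { id: 2 }]);
console.log(collection.getChanges().length); // 2
```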

Viatorus (Member) commented

So the problem is somewhere inside the PartitioningAdapter, but I am not sure where. I have never used this one. I will take a deeper look at the weekend.

cepm-nate (Author) commented

Thank you so much! I'm looking through the partitioning adapter code and can't find any references to _changes, so I was not sure where else to go.

I think I've also run into another facet of the 2nd bug mentioned above, and pinpointed the difference. Sometimes, after running my own data into it and refreshing the browser (making sure to wait until there are no _dirty collections by watching wdb._persistenceAdapter._dirtyPartitions.length), there is a difference between these two data sets:

a) wdb._collections (empty)
b) wdb._persistenceAdapter._dbref.collections (has the first few collections, then lots of empty ones; always stops at the same collection).

Flow:

  • Copy data from external into Loki, and then it autosaves to IndexedDB.
  • Wait for wdb._persistenceAdapter._dirtyPartitions.length = 0
  • Refresh browser
  • Get (a)(b) issue described above.

Oddly, right before refreshing the browser, the count of wdb._persistenceAdapter._dbref.collections where _dirty is true is always 1. And it's always the LAST collection that has that flag set. :-/ Even if I wait for 5-10 minutes, that last collection remains marked as DIRTY, but is not mentioned in wdb._persistenceAdapter._dirtyPartitions. It could just be the way they do loops and unmark the collections, but I find the _dirty flag to be an oddity there.

The real problem is the (a)(b) issue above, where the adapter seems to silently stop loading data from IndexedDB. No errors thrown.

I'm going to try using several different splitting characters to see if that's hitting something in my data, as well as pinpoint exactly the record that's making it drop the ball and will report back here.


cepm-nate commented Sep 25, 2020

2nd Issue Updates

Common results:

  • wdb._collections is empty after browser refresh (refreshed browser after loading data from external source, then waiting for all _dirty flags to be OFF, then waiting 1 more minute).

Tests & Results:

  • Changing 'Page Size' up to 24 MB increases the number of collections in wdb._persistenceAdapter._dbref.collections that have _data to between 11 and 17.
  • Changing 'Page Size' down to 5 MB results in 5 to 7 wdb._persistenceAdapter._dbref.collections that have _data.
  • Changing 'delimiter' around does not change results much. (tried all sorts of characters and combinations that I know are not in the data)
  • Turning paging OFF results: about the same.
  • Refreshing after waiting 10 seconds after 0 _dirty collections: results are about the same as refreshing after waiting two minutes after 0 _dirty collections.
  • Even with exact same settings, wdb._persistenceAdapter._dbref.collections with _data varies by 3 or 4 collections each time. (looks like that's just based on the order the collections are saved/loaded in. Not all are equal size, and order changes by 4 or 5 slots)

Testing Procedure:

  • Chrome DevTools: Application tab: Delete Database (confirm)
  • Shift-Refresh page
  • Sync data from external source into LokiDB
  • Wait for the _dirty flag count to be zero. (and then additional time as described above)
  • Shift-Refresh page
  • Chrome DevTools console: wdb._persistenceAdapter._dbref._collections.map((c) => `${c.name}: ${c._data.length}`) to view collections and data count.

Conclusion so far: somewhere between IndexedDB and the Persistence Adapter, things are failing silently.

Is there a command I can use to 'Reload data from iDB into memory, just as Autoload does'? That way I could test without refreshing the browser. I've tried various combinations of all the loadDatabase() calls I could find, but nothing seemed to trigger the same condition as refreshing the browser.

Further Updates

Something looks off with the keys in IndexedDB, right after loading the database from an external source (before refreshing the browser):
[screenshot: IndexedDB keys with gaps in the partition numbering]
And funnily enough, after I refresh the browser and peer into wdb._persistenceAdapter._dbref._collections, that is right where the data stops.

I dropped a console.warn('dirty:', JSON.stringify(this._dirtyPartitions)); into the end of exportDatabase(dbname, dbref). (Marked up where missing)
[screenshot: console output with the missing partition numbers marked]
...and it looks like those 'missing partition numbers' are never being marked as dirty, and thus never being saved to iDB.

As _loadNextPartition() expects the partition numbers to be contiguous, then it seems the bug has to do with the collection/partition not being marked dirty.

So, I changed the 'autosaveInterval' to 5000 instead of 1000, and got this:
[screenshot: console output, still showing missing dirty partitions]
And as you can see, it's still not catching that some of the collections are dirty. It is as if the save to iDB happens at the same time as other data is being added to the in-memory DB; _autosaveClearFlags is then called, which resets ALL _dirty flags to false instead of just the ones that were actually saved. Do I need to cache ALL the incoming data somewhere else first, then dump it into the LokiDB database at once (all collections) so it all fits within one "saveInterval" window? That still seems a little risky, as data might be added while a background save is happening....
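The suspected lost-update can be modeled in a few lines of plain JS (a toy model of the hypothesis above, not LokiDB code):

```javascript
// Toy model of the suspected race: a save snapshots only the partitions that
// are dirty at save time, but the post-save cleanup clears EVERY dirty flag,
// including ones set while the save was in flight.
function simulateRace() {
  const collections = [
    { name: "A", dirty: true },
    { name: "B", dirty: false },
  ];

  // 1. Autosave starts: capture which partitions will be written.
  const saved = collections.filter((c) => c.dirty).map((c) => c.name);

  // 2. While the (async) save runs, new data arrives in collection B.
  collections[1].dirty = true;

  // 3. Save finishes; an _autosaveClearFlags-style cleanup resets ALL flags.
  collections.forEach((c) => { c.dirty = false; });

  // Collection B was never written, yet it is no longer marked dirty:
  // its changes will never be flushed to IndexedDB.
  const lost = collections.filter((c) => !saved.includes(c.name) && !c.dirty);
  return { saved, lostNames: lost.map((c) => c.name) };
}

console.log(simulateRace()); // saved: ["A"], lostNames: ["B"]
```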

Viatorus (Member) commented

Wow, great analysis!

I found the first bug. I don't know what possessed me to rename the serialization output (removing the underscore)... with the correct naming it works.

Please have a look here: #184

Could you provide a small code example for the second bug? Maybe it is gone with the fix.

cepm-nate (Author) commented

#184 fixed the first bug, thank you!

The 2nd bug remains. I'll see if I can work up some code with timeouts or intervals to repeat it. :-)


cepm-nate commented Sep 25, 2020

2nd Issue Repeatable Code

After running the below code, take a look at the IndexedDB storage, and you should notice some keys missing. On mine it likes to skip mm.db.4, 9, 14, and 19 consistently.

It helps if you stick console.warn('dirty:', JSON.stringify(this._dirtyPartitions)); just above the return statement in exportDatabase() of the partitioningAdapter code file.

		function showAndAddTestData(db) {
			// Initialize collections
			const numCollections = 20;
			let col;

			console.log('Data loaded before reset:');
			db._collections.forEach((c) => {
				console.log(c.find().length + ' rows in ' + c.name);
				db.removeCollection(c.name);
			});

			console.log('Collections cleared. Now fetching...');

			setTimeout(() => { // timer to make sure the CLEAR command cleans out iDB too.
				// do the request just once...
				fetch('https://jsonplaceholder.typicode.com/photos') 
				.then(response => response.json())
				.then(json => {
					console.log('JSON fetched, saving...');
					let collection, i = 0;
					const it = setInterval(() => {
						// only do this a few times
						if (i > numCollections) { clearInterval(it); return; }
						// Get/Create collection
						collection = db.addCollection(`Test_${i}`);
						// Add clone to collection
						collection.insert(JSON.parse(JSON.stringify(json)));
						i += 1;
					}, 200); // END OF interval
				});

			}, 1000); // END OF timer to make sure the CLEAR command is echoed to iDB
		};

		
		let idbAdapter = new IndexedStorage('mm.db');
		let paAdapter = new PartitioningAdapter(idbAdapter);

		twdb = new Loki('mm.db');

		let persistenceOptions = {
			adapter: paAdapter,
			autosave: window.self === window.top,
			autosaveInterval: 500,
			autoload: window.self === window.top,
			throttledSaves: false,
			persistenceMethod: 'indexed-storage',
		};

		console.log('Loki started. Initializing Persistence...');

		return twdb.initializePersistence(persistenceOptions)
		.then(() => {
			console.log('Loki Persistence is Initialized');
			showAndAddTestData(twdb);
			return true;
		})
		.catch((e) => console.error('initializePersistence error', e.message));

I took a stab at letting partitioning-adapter.js unmark the collections as _dirty after saving (instead of depending on the main Loki code to do it all at once), but it seems its _dirty-flag unsetting never made it back up to the main database. Is this an architectural issue? Is there a way to make the _dirty flag changes echo back up to the main database?


cepm-nate commented Sep 25, 2020

As a workaround in my code, I added a callback into the Partitioning Adapter code that is called whenever a partition is saved. Then I tied into that in my application code and reset the _dirty flag for that partition, and it is all working. (I also commented out the this._autosaveClearFlags(); call inside LokiDB's _saveDatabase function for the "reference" mode adapters.) But that feels terribly dirty. I'd love to hear what solution you'd recommend.
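For reference, a toy sketch of that per-partition callback approach (the adapter shape and the onPartitionSaved hook are hypothetical, not part of the real PartitioningAdapter):

```javascript
// Sketch of the per-partition workaround described above. The callback name
// (onPartitionSaved) and the adapter class are invented for illustration;
// the real PartitioningAdapter has no such hook out of the box.
class ToyPartitioningAdapter {
  constructor(onPartitionSaved) {
    this.onPartitionSaved = onPartitionSaved;
  }
  exportDatabase(partitions) {
    for (const p of partitions.filter((q) => q.dirty)) {
      // ...write partition p to IndexedDB here...
      this.onPartitionSaved(p); // notify the application per saved partition
    }
  }
}

const partitions = [
  { id: 0, dirty: true },
  { id: 1, dirty: true },
  { id: 2, dirty: false },
];

// Application side: clear the flag only for partitions that were actually
// saved, instead of letting a global _autosaveClearFlags() wipe everything.
const adapter = new ToyPartitioningAdapter((p) => { p.dirty = false; });
adapter.exportDatabase(partitions);

console.log(partitions.map((p) => p.dirty)); // [false, false, false]
```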


cepm-nate commented Sep 27, 2020

It looks like wdb._collections.forEach((c) => { wdb.removeCollection(c.name); }); only deletes HALF of the collections?? When you run my code above several times, take a close look at the quantity of _data rows. There should always be exactly 5000 per collection, but for some reason the first few seem to keep growing when you refresh the browser.

wdb._collections.forEach((c) => { wdb.removeCollection(c.name); }); deletes half of wdb._collections each time, but wdb._persistenceAdapter._dbref._collections stays at the full number.

I've split the bug of db.removeCollection(name) not triggering a database autosave into a separate issue.
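For what it's worth, the half-deletion is consistent with mutating an array while forEach walks it by index. A plain-JS model (assuming removeCollection splices the collection out of the same array being iterated):

```javascript
// Why removing during forEach skips every other element: forEach walks by
// index while the array shrinks underneath it, so each removal shifts the
// next element into the slot that was just visited.
const db = {
  _collections: ["a", "b", "c", "d"].map((n) => ({ name: n })),
  removeCollection(name) {
    const i = this._collections.findIndex((c) => c.name === name);
    if (i >= 0) this._collections.splice(i, 1); // in-place removal
  },
};

db._collections.forEach((c) => db.removeCollection(c.name));
console.log(db._collections.map((c) => c.name)); // ["b", "d"] — half survive
```

Iterating over a copy (e.g. [...db._collections].forEach(...)) avoids the skip.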
