
allow concurrent MediaInfoStore retrieval #4639

Closed

Conversation

@SurfaceS SurfaceS commented May 8, 2024

allow concurrent MediaInfoStore retrieval.

Less blocking (threads now only block on the same filename), while keeping things synchronized where needed.

For @ik666, on why synchronized is needed around:
STORE.containsKey(filename) && STORE.get(filename).get() != null
Thread 1 => STORE.containsKey(filename) returns true
Thread 2 => STORE.clear()
Thread 1 => STORE.get(filename) returns null, so the chained .get() throws an exception.
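The race above can be reproduced in a minimal sketch (all names hypothetical; String stands in for the real MediaInfo type), together with the synchronized form that closes the window between the two calls:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the check-then-act race; hypothetical names,
// String stands in for MediaInfo.
public class MediaInfoStoreSketch {

    static final Map<String, WeakReference<String>> STORE = new HashMap<>();

    // Unsafe compound check: between containsKey() and get(), another
    // thread may call STORE.clear(), so STORE.get(filename) returns null
    // and the chained .get() throws a NullPointerException.
    static String lookupUnsafe(String filename) {
        if (STORE.containsKey(filename) && STORE.get(filename).get() != null) {
            return STORE.get(filename).get();
        }
        return null;
    }

    // Safe: the whole check-then-get holds the STORE monitor, so no other
    // thread can clear the map between the two steps.
    static String lookupSynchronized(String filename) {
        synchronized (STORE) {
            WeakReference<String> ref = STORE.get(filename);
            return ref != null ? ref.get() : null;
        }
    }

    public static void main(String[] args) {
        STORE.put("a.mkv", new WeakReference<>("parsed media info"));
        System.out.println(lookupSynchronized("a.mkv"));
    }
}
```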

ik666 commented May 8, 2024

@SurfaceS
Yeah, I was rushing a little too fast. But generally speaking, we should check whether we can write code that does not lock on its own when there is a specialized data structure like ConcurrentHashMap with optimized behaviour. If we do lock on our own, the optimization of the ConcurrentHashMap implementation is thrown away, and we could replace it with a plain HashMap again ...

Therefore, the above example could be written this way:

		MediaInfo mediaInfoCached = STORE.get(filename).get();
		if (mediaInfoCached != null) {
			return mediaInfoCached;
		}

No lock needed, as far as I can see ... I will scan the class for more places to optimize and commit them to the other PR.

ik666 commented May 8, 2024

One more thing, please. If you don't mind, it would be nice to work on the same branch. Otherwise, we will probably get merge conflicts, and I am usually the one who runs behind you, because you merge your changes to main before my PR.

ik666 commented May 8, 2024

I added an optimized version to PR #4635. I think that should do it and prevent NPEs.

SurfaceS commented May 8, 2024

STORE.get(filename).get() = NPE

SurfaceS commented May 8, 2024

One more thing, please. If you don't mind, it would be nice to work on the same branch. Otherwise, we will probably get merge conflicts, and I am usually the one who runs behind you, because you merge your changes to main before my PR.

You should split your code into multiple PRs, because some solutions may be merged and some not.

ik666 commented May 8, 2024

STORE.get(filename).get() = NPE

Yeah thanks, but I think you got the idea behind it. Should work now.

SurfaceS commented May 8, 2024

If we do lock on our own, the optimization of the ConcurrentHashMap implementation will be thrown away and we could replace it with a HashMap again ...

True, and that is why we use synchronized.
ConcurrentHashMap does not lock the whole map (object) but only a portion of it per operation; under the hood it uses synchronization itself as well.
synchronized locks the object (here the entire map), so everything done inside the block holds that lock.
Here, that is exactly what we want: we check whether a key is there, then get the value for that key, as one atomic step.

why synchronized ?
STORE.containsKey(filename) && STORE.get(filename).get() != null
Thread 1 => STORE.containsKey(filename) returns true
Thread 2 => STORE.clear()
Thread 1 => STORE.get(filename) returns null, so the chained .get() throws an exception.

STORE.get(filename).get() still throws an NPE with a ConcurrentHashMap.

Latency-wise, the improvement is nearly zero (really relative/subjective), because the ConcurrentHashMap will lock itself each time we make a request.
There are many discussions about this on the web.

In the code proposed here, the STORE is locked only while operating on it (get/set/clear, and so on).

Before, it was locked during parsing: that was the blocking part.
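A minimal sketch of the point above (hypothetical names; String stands in for MediaInfo): switching to ConcurrentHashMap alone does not make the chained STORE.get(filename).get() safe, because the key may vanish between the two calls; only reading once into a local avoids the NPE.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; String stands in for the real MediaInfo type.
public class ConcurrentLookupSketch {

    static final ConcurrentHashMap<String, WeakReference<String>> STORE =
            new ConcurrentHashMap<>();

    // Single map read into a local; null-safe even if another thread
    // calls STORE.clear() concurrently, because we never dereference
    // the result of a second lookup.
    static String lookup(String filename) {
        WeakReference<String> ref = STORE.get(filename);
        return ref != null ? ref.get() : null;
    }

    public static void main(String[] args) {
        STORE.put("a.mkv", new WeakReference<>("parsed media info"));
        System.out.println(lookup("a.mkv"));
        STORE.clear();
        System.out.println(lookup("a.mkv")); // no NPE, just null
    }
}
```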

ik666 commented May 8, 2024

As far as I know, the ConcurrentHashMap synchronizes only on the bucket the object lies in. This is of course a much better approach than syncing on the HashMap itself. Otherwise this class would be useless ...

ik666 commented May 8, 2024

And this doesn't need the sync, I believe ...

WeakReference<MediaInfo> mediaInfoCached = STORE.get(filename);
if (mediaInfoCached != null && mediaInfoCached.get() != null) {
	return mediaInfoCached.get();
}

The benefit is that, when you access objects that lie in different buckets, this implementation is much faster. And the benefit is measurable, since the profiler pointed to this implementation as a bottleneck.

ik666 commented May 8, 2024

@SurfaceS I'd propose to finalize PR #4635 and see if we run into any issues, leaving this PR out of scope, since it brings more complexity without much value. We can pick it up later, if needed.

The NPE should be fixed in 235be72

ik666 commented May 8, 2024

Anyway, guys ... I'm off for the weekend. Will be back on Monday.

SurfaceS commented May 8, 2024

In this file, synchronized (STORE) was used for two things:

  • Prevent concurrent access (as the STORE is a HashMap).
  • Prevent running the same request multiple times (bad usage).

The bottleneck is here:

synchronized (STORE) {
	open db.
	query db.
	parse mediaInfo.
	update db.
	and so on.
}

Removing the synchronized removes the bottleneck, but adds this unwanted issue.

That's why this PR adds the synchronized on the filename.

About ConcurrentHashMap vs synchronized :

You should favor synchronized Map when data consistency is of utmost importance, and you should choose ConcurrentHashMap for performance-critical applications where there are far more write operations than there are read operations (+5% benchmark).

That's why I doubt the real benefit of using ConcurrentHashMap here, as we read far more than we write.
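The per-filename synchronization described above could be sketched like this (all names hypothetical; the actual PR implementation may differ), using one lock object per filename so that only requests for the same file block each other:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-filename locking; not the PR's actual code.
public class PerFilenameLock {

    // One lock object per filename.
    static final Map<String, Object> LOCKS = new ConcurrentHashMap<>();

    static String retrieve(String filename) {
        // computeIfAbsent is atomic, so two threads asking for the same
        // filename always receive the same lock object.
        Object lock = LOCKS.computeIfAbsent(filename, k -> new Object());
        synchronized (lock) {
            // The expensive work (open db, query db, parse mediaInfo,
            // update db, and so on) goes here; only callers for the SAME
            // filename block each other, instead of everyone blocking
            // on the whole STORE.
            return "parsed:" + filename;
        }
    }

    public static void main(String[] args) {
        System.out.println(retrieve("a.mkv"));
    }
}
```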

@SurfaceS SurfaceS changed the title allow concurrent mediainfostore retrieval allow concurrent MediaInfoStore retrieval May 8, 2024
ik666 commented May 8, 2024

  • Prevent running the same request multiple times (bad usage).

Yeah. I think this is solved in 517d264. You might want to have a look at the current PR implementation.

Nadahar commented May 8, 2024

I hate to ruin the party, but there's a reason why media parsing is serialized in the first place. I doubt there's much performance gain to be had here unless there is some other "issue" at hand.

The reason is that media parsing is IO bound. It's not the time the CPU spends on analysis that matters, it's the time it takes to retrieve the data (in almost all cases at least, unless maybe you're using a RAM drive or similar). Asking a hard drive to retrieve multiple files in parallel is slower than doing it serially, because a hard drive is fastest when reading large chunks of data, not when doing random access. I'm pretty sure the same is true for SSDs.

Thus, running multiple parsing processes will slow things down, not speed them up. The only exception is when the data resides on different devices. I'd say that is a rare enough circumstance that it doesn't outweigh the slowdown incurred during "normal" parsing of multiple files on the same device.

I'd say that the order/priority of the parsing is probably "the problem", not the fact that it runs sequentially.

ik666 commented May 8, 2024

@Nadahar You're completely right ... the issue is the startup experience. If the media scanner is building up the database and a user makes a browse request at that time, the browse request and the library scanner block each other. Same for multiple browse requests. This ought to be solved in #6535.

@SurfaceS

included in v14.1

@SurfaceS SurfaceS closed this May 14, 2024
@SurfaceS SurfaceS deleted the main-mediainfostore-block branch May 14, 2024 12:39