
allow concurrent MediaInfoStore retrieval #4639

Closed

Conversation

@SurfaceS SurfaceS commented May 8, 2024

allow concurrent MediaInfoStore retrieval.

Less blocking (threads now only block on the same filename), while keeping things synchronized where needed.

For @ik666, on why synchronized is needed around:
STORE.containsKey(filename) && STORE.get(filename).get() != null
Thread 1 => STORE.containsKey(filename) returns true
Thread 2 => STORE.clear()
Thread 1 => STORE.get(filename) returns null, so the chained .get() throws an exception.
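The race above can be reproduced in a minimal sketch (all names hypothetical; String stands in for the real MediaInfo type), together with the synchronized form that closes the window between the two calls:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the check-then-act race; hypothetical names,
// String stands in for MediaInfo.
public class MediaInfoStoreSketch {

    static final Map<String, WeakReference<String>> STORE = new HashMap<>();

    // Unsafe compound check: between containsKey() and get(), another
    // thread may call STORE.clear(), so STORE.get(filename) returns null
    // and the chained .get() throws a NullPointerException.
    static String lookupUnsafe(String filename) {
        if (STORE.containsKey(filename) && STORE.get(filename).get() != null) {
            return STORE.get(filename).get();
        }
        return null;
    }

    // Safe: the whole check-then-get holds the STORE monitor, so no other
    // thread can clear the map between the two steps.
    static String lookupSynchronized(String filename) {
        synchronized (STORE) {
            WeakReference<String> ref = STORE.get(filename);
            return ref != null ? ref.get() : null;
        }
    }

    public static void main(String[] args) {
        STORE.put("a.mkv", new WeakReference<>("parsed media info"));
        System.out.println(lookupSynchronized("a.mkv"));
    }
}
```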

ik666 commented May 8, 2024

@SurfaceS
Yeah, I was rushing a little too fast. But generally speaking, we should check whether we can write code that does not lock on its own when there is a specialized data structure like ConcurrentHashMap with optimized behaviour. If we do lock on our own, the optimization of the ConcurrentHashMap implementation is thrown away, and we could replace it with a plain HashMap again ...

Therefore, the above example could be written this way:

		MediaInfo mediaInfoCached = STORE.get(filename).get();
		if (mediaInfoCached != null) {
			return mediaInfoCached;
		}

No lock needed, as far as I can see ... I will scan the class for more places to optimize and commit them to the other PR.

ik666 commented May 8, 2024

One more thing, please. If you don't mind, it would be nice to work on the same branch. Otherwise, we will probably get merge conflicts, and I am usually the one who runs behind you, because you merge your changes to main before my PR.

ik666 commented May 8, 2024

I added an optimized version to PR #4635. I think that should do it and prevent NPEs.

SurfaceS commented May 8, 2024

STORE.get(filename).get() = NPE

SurfaceS commented May 8, 2024

One more thing, please. If you don't mind, it would be nice to work on the same branch. Otherwise, we will probably get merge conflicts, and I am usually the one who runs behind you, because you merge your changes to main before my PR.

You should split your code into multiple PRs, because some solutions may be merged and some not.

ik666 commented May 8, 2024

STORE.get(filename).get() = NPE

Yeah thanks, but I think you got the idea behind it. Should work now.

SurfaceS commented May 8, 2024

If we do lock on our own, the optimization of the ConcurrentHashMap implementation will be thrown away and we could replace it with a HashMap again ...

True, and that is why we use synchronized.
ConcurrentHashMap does not lock the whole map (object) but only a portion of it per operation; under the hood it uses synchronization itself as well.
synchronized locks the object (here the entire map), so everything done inside the block holds that lock.
Here, that is exactly what we want: we check whether a key is there, then get the value for that key, as one atomic step.

why synchronized ?
STORE.containsKey(filename) && STORE.get(filename).get() != null
Thread 1 => STORE.containsKey(filename) returns true
Thread 2 => STORE.clear()
Thread 1 => STORE.get(filename) returns null, so the chained .get() throws an exception.

STORE.get(filename).get() still throws an NPE with a ConcurrentHashMap.

Latency-wise, the improvement is nearly zero (really relative/subjective), because the ConcurrentHashMap will lock itself each time we make a request.
There are many discussions about this on the web.

In the code proposed here, the STORE is locked only while operating on it (get/set/clear, and so on).

Before, it was locked during parsing: that was the blocking part.
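A minimal sketch of the point above (hypothetical names; String stands in for MediaInfo): switching to ConcurrentHashMap alone does not make the chained STORE.get(filename).get() safe, because the key may vanish between the two calls; only reading once into a local avoids the NPE.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; String stands in for the real MediaInfo type.
public class ConcurrentLookupSketch {

    static final ConcurrentHashMap<String, WeakReference<String>> STORE =
            new ConcurrentHashMap<>();

    // Single map read into a local; null-safe even if another thread
    // calls STORE.clear() concurrently, because we never dereference
    // the result of a second lookup.
    static String lookup(String filename) {
        WeakReference<String> ref = STORE.get(filename);
        return ref != null ? ref.get() : null;
    }

    public static void main(String[] args) {
        STORE.put("a.mkv", new WeakReference<>("parsed media info"));
        System.out.println(lookup("a.mkv"));
        STORE.clear();
        System.out.println(lookup("a.mkv")); // no NPE, just null
    }
}
```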

ik666 commented May 8, 2024

As far as I know, the ConcurrentHashMap synchronizes only on the bucket the object lies in. This is of course a much better approach than syncing on the HashMap itself. Otherwise this class would be useless ...

ik666 commented May 8, 2024

And this doesn't need the sync, I believe ...

WeakReference<MediaInfo> mediaInfoCached = STORE.get(filename);
if (mediaInfoCached != null && mediaInfoCached.get() != null) {
	return mediaInfoCached.get();
}

The benefit is that, when you access objects that lie in different buckets, this implementation is much faster. And the benefit is measurable, since the profiler pointed to this implementation as a bottleneck.

ik666 commented May 8, 2024

@SurfaceS I'd propose to finalize PR #4635 and see if we run into any issues, leaving this PR out of scope, since it brings more complexity without much value. We can pick it up later, if needed.

The NPE should be fixed in 235be72

ik666 commented May 8, 2024

Anyway, guys ... I'm off for the weekend. Will be back on Monday.

SurfaceS commented May 8, 2024

In this file, synchronized (STORE) was used for two things:

  • Prevent concurrent access (as the STORE is a HashMap).
  • Prevent running the same request multiple times (bad usage).

The bottleneck is here:

synchronized (STORE) {
	open db.
	query db.
	parse mediaInfo.
	update db.
	and so on.
}

Removing the synchronized removes the bottleneck, but adds this unwanted issue.

That's why this PR adds the synchronized on the filename.

About ConcurrentHashMap vs synchronized :

You should favor synchronized Map when data consistency is of utmost importance, and you should choose ConcurrentHashMap for performance-critical applications where there are far more write operations than there are read operations (+5% benchmark).

That's why I doubt the real benefit of using ConcurrentHashMap here, as we read far more than we write.
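The per-filename synchronization described above could be sketched like this (all names hypothetical; the actual PR implementation may differ), using one lock object per filename so that only requests for the same file block each other:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-filename locking; not the PR's actual code.
public class PerFilenameLock {

    // One lock object per filename.
    static final Map<String, Object> LOCKS = new ConcurrentHashMap<>();

    static String retrieve(String filename) {
        // computeIfAbsent is atomic, so two threads asking for the same
        // filename always receive the same lock object.
        Object lock = LOCKS.computeIfAbsent(filename, k -> new Object());
        synchronized (lock) {
            // The expensive work (open db, query db, parse mediaInfo,
            // update db, and so on) goes here; only callers for the SAME
            // filename block each other, instead of everyone blocking
            // on the whole STORE.
            return "parsed:" + filename;
        }
    }

    public static void main(String[] args) {
        System.out.println(retrieve("a.mkv"));
    }
}
```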

@SurfaceS SurfaceS changed the title allow concurrent mediainfostore retrieval allow concurrent MediaInfoStore retrieval May 8, 2024
ik666 commented May 8, 2024

  • Prevent running the same request multiple times (bad usage).

Yeah. I think this is solved in 517d264. You might want to have a look at the current PR implementation.

Nadahar commented May 8, 2024

I hate to ruin the party, but there's a reason why media parsing is serialized in the first place. I doubt there's much performance gain to be had here unless there is some other "issue" at hand.

The reason is that media parsing is IO bound. It's not the time the CPU spends on analysis that matters, it's the time it takes to retrieve the data (in almost all cases at least, unless maybe you're using a RAM drive or similar). Asking a hard drive to retrieve multiple files in parallel is slower than doing it serially, because a hard drive is fastest when reading large chunks of data, not when doing random access. I'm pretty sure the same is true for SSDs.

Thus, running multiple parsing processes will slow things down, not speed them up. The only exception is when the data resides on different devices. I'd say that is a rare enough circumstance that it doesn't outweigh the slowdown incurred during "normal" parsing of multiple files on the same device.

I'd say that the order/priority of the parsing is probably "the problem", not the fact that it runs sequentially.

ik666 commented May 8, 2024

@Nadahar You're completely right ... the issue is the startup experience. If the media scanner is building up the database and a user makes a browse request at that time, the browse request and the library scanner block each other. Same for multiple browse requests. This ought to be solved in #6535.

@SurfaceS

included in v14.1

@SurfaceS SurfaceS closed this May 14, 2024
@SurfaceS SurfaceS deleted the main-mediainfostore-block branch May 14, 2024 12:39