local scan: report progress #9

Open
bernd-wechner opened this issue Jun 12, 2017 · 8 comments
Labels
A-scan Area: Media scanning C-enhancement Category: A PR with an enhancement or an issue with an enhancement proposal

Comments

@bernd-wechner

I'm playing with mopidy local scan and an initial observation is that I can't get any progress info out of mopidy, not even with logging set to the DEBUG level.

Given that a local media store can be large and take a long time to scan, I see it as imperative to get some pulse of life, or progress report, from the scanner, not least because any number of things might cause it to hang, one of them being the local media store residing on a mounted NAS (hardly an unusual scenario, I'd think). That is stretching the definition of "local", but it is nonetheless where a local media library might (in my case, does) reside.

@bernd-wechner
Author

OK, I have come as far as finding that one line would do what I need:

diff --git a/mopidy/internal/path.py b/mopidy/internal/path.py
index 307d733..5e3a531 100644
--- a/mopidy/internal/path.py
+++ b/mopidy/internal/path.py
@@ -136,6 +136,7 @@ def _find_worker(relative, follow, done, work, results, errors):
 
             parents = parents + [(st.st_dev, st.st_ino)]
             if stat.S_ISDIR(st.st_mode):
+                logger.info('Scanning %s ...', entry)
                 for e in os.listdir(entry):
                     work.put((os.path.join(entry, e), parents))
             elif stat.S_ISREG(st.st_mode):

and it seems a reasonable piece of logging info. I wonder, would a pull request for this be welcome?

It's not what I'd call a temporally perfect pulse, given it emits as dirs are thrown onto the job queue for worker threads (i.e. a dir has been found but not scanned yet), but it is plenty good for its intended purpose: on a huge library that takes a long time to scan, some pulse of life and progress is reassuring and vital.

As it stands it just stalls at the start of the scan with no output until the whole tree has been walked ... and that looks stone dead when I point it at a mounted NAS with a huge library!

@adamcik
Member

adamcik commented Jun 12, 2017

We have https://github.com/mopidy/mopidy/blob/develop/mopidy/local/commands.py#L157 which logs each file we process. Or is your problem with the initial scan of what files exist?

@bernd-wechner
Author

Thanks, I did see that in the debug log! And I noticed, as I tweaked my one line, that debug is forced to the log file and info ends up on the console (when running mopidy on the CLI to test).

But yes, the issue I have is with the first-time experience that I had, and that I expect others will have. It runs something like this:

  1. Research music solutions
  2. Find mopidy and think, hopeful ... python, tick, looks alive, tick (not like Amarok and Banshee ;-), and more things ... tick, tick, tick.
  3. OK, read a little to see how it goes.
  4. Install it.
  5. Run it ... seems to start
  6. Point it to my media library and do "mopidy local scan"
  7. Hangs ... kill it. oh well. Something is awry ...

I simply contend this is not likely to be an unusual first-time experience, and it turns from sour to pleasant with the one-line patch above.

Yes, it's the initial scan: the tree walk, it seems, is done in one lump, only at the end is an info message written, and then, with the list in hand, mopidy goes on to scan the actual files for metadata. The latter is of course slow, but on a NAS with a large library (not uncommon, albeit less and less so, as frankly online streaming is - as I predicted a decade ago ;-) - slowly making large local media collections less relevant) this first tree walk is still very time-consuming. I could time it on my system for input, but it's well beyond the threshold of human patience, and a pulse or progress indicator is wanted to differentiate working from stuck ...

Not least because, when working with a NAS, stuck is not all that uncommon either: the NAS can go down, the LAN might go down, or the LAN or NAS might get flooded (say someone else in the house is streaming video off it, or downloading or uploading large files ;-) ...

So I'd still see this one-liner as a value add to the product, IMO, and recommend pulling it into main. Though given it's an initial directory walk, and later logging uses the word "scan", perhaps it should log ("Found %s ...", entry) here for clarity. The point is, when I run with this I get a very pleasing pulse of life as my rather large NAS tree is walked.

I haven't done a whole tree walk yet, so I don't know the timing, but to be honest that's on my list as I proceed, not least because a feature I admire in any scanner (the cross I bear being the management of a large local library) is intelligent projection, which isn't hard to do for rescans (where a previous run is on record). I know that when I run a python tree walk over a backup library on a SATA drive (not a NAS) it's still pretty time-consuming, and adding timing records and feedback has been on my todo list, because I do need multiple passes for tag cleaning ...
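For what I mean by projection on rescans, something along these lines would do - persist the previous run's total so the next run can show a percentage and a rough time estimate. A pure sketch; the state-file path and helper names are my own invention, not anything mopidy has:

import json
import os
import time

# Hypothetical location for the previous run's stats; purely illustrative.
STATE_FILE = os.path.expanduser('~/.cache/scan_state.json')


def load_previous_total():
    """Return the file count recorded by the previous run, or None on a first run."""
    try:
        with open(STATE_FILE) as f:
            return json.load(f).get('total')
    except (IOError, ValueError):
        return None


def save_total(total):
    """Record this run's total for the next run's projection."""
    with open(STATE_FILE, 'w') as f:
        json.dump({'total': total, 'finished': time.time()}, f)

With a previous total on record, a rescan can report "N of roughly M files" rather than a bare count.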

@adamcik
Member

adamcik commented Jun 13, 2017

We also have https://github.com/mopidy/mopidy/blob/develop/mopidy/local/commands.py#L162 further down the same file, which is the non-debug user logging. We could perhaps add a log callback to the tree walk that triggers every n files and just have it log, at info level, that M files have been scanned?

As for the tree walk being slow, it used to be much, much worse; now at least we have a worker pool of threads scanning, so we can make progress while other threads are IO blocked. If you can think of other ways to make this less painful and speed it up, that would be great.
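A minimal sketch of what such a callback might look like - a thread-safe counter the walk workers call once per file, which only emits an info log every n calls. The class and its wiring are illustrative, not an existing mopidy API:

import logging
import threading

logger = logging.getLogger(__name__)


class ProgressCallback(object):
    """Thread-safe counter that logs every `every`-th file found."""

    def __init__(self, every=1000):
        self._every = every
        self._count = 0
        self._lock = threading.Lock()

    def __call__(self, entry):
        with self._lock:
            self._count += 1
            count = self._count
        if count % self._every == 0:
            logger.info('Found %d files so far ...', count)

The tree-walk workers would call it next to wherever they queue a file, so a log line is only paid for on every n-th file.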

@bernd-wechner
Author

I'm not sure what drawback that one line of info logging poses? Perhaps it's bad to do logging IO in a worker thread? I'm having trouble imagining what you mean by a log callback - passing an arg to the worker thread which supplies a function that emits log info from the main thread?

I tried my simple one-liner above on my library to check performance. I'm kind of happy with what I see: it stalls for a moment after the root dir is logged (while the other 9 worker threads wait, I guess, for the first sign of work on the queue), but not for a bothersome period. But in short order I get:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 29: ordinal not in range(128)

suggesting unicode is fouling up that logging line (I do have folders with probable unicode names - the band Άβατον, for example). Alas, I can't easily see which entry is fouling it, as I can't emit the entry to the console ;-) and am done playing for tonight.
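If I had to guess, the entry comes back as a byte string and the logging call coerces it to unicode with the ascii codec. A small sketch of a workaround, assuming byte-string paths (displayable is my own helper name, not part of mopidy):

def displayable(path):
    """Best-effort text version of a filesystem path for log output."""
    if isinstance(path, bytes):
        # Replace undecodable bytes rather than raising UnicodeDecodeError.
        return path.decode('utf-8', 'replace')
    return path

The one-liner would then log displayable(entry) instead of entry.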

Folder by folder pleases me, mind you: as I follow the age-old artist/album/song directory layout (mostly), a folder is an album (mostly). But for the product proper, any pulse of life would please a newbie trialling it too.

I love the worker pool, by the way. It took some reading to get my head around it (python threading is new to me too), but it looks slick (though "relative" seems not to be used anywhere, and was presumably put in with some plan in mind that hasn't eventuated).

Alas, I'm not a pro at speeding up dir walks; they are always a tad slow in my experience. I'm already impressed by your farming the work out to workers, so that while one thread is reading the filenames in a dir, another is actually checking a file. I have typically just used os.walk and been patient ;-). Little choice anyway, when I'm walking an ARM-based NAS on a shared LAN (with my wife, kids, tenants and guests).

(As an aside - I'd PM you, but can't see how from your profile - I'm a Swiss expat ;-).

@jodal jodal transferred this issue from mopidy/mopidy Aug 12, 2019
@jodal jodal added A-scan Area: Media scanning C-enhancement Category: A PR with an enhancement or an issue with an enhancement proposal labels Dec 21, 2019
@Delapouite
Contributor

The current codebase now offers a _ScanProgress class: https://github.com/mopidy/mopidy-local/blob/master/mopidy_local/commands.py#L256

Each time the scan flushes (every 100 files by default, following the value of the scan_flush_threshold config), a useful progress log like this is displayed to the user:

Scanned 300 of 1337 files in 16s, ~145s left
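For the curious, that estimate is essentially a linear extrapolation from the elapsed time. A rough sketch of the idea only, not the actual _ScanProgress code:

import time


class ProgressSketch(object):
    """Rough linear progress/ETA estimate for a known total file count."""

    def __init__(self, total):
        self.total = total
        self.start = time.time()

    def report(self, scanned):
        elapsed = time.time() - self.start
        if scanned:
            remaining = elapsed * (self.total - scanned) / scanned
            print('Scanned %d of %d files in %ds, ~%ds left'
                  % (scanned, self.total, elapsed, remaining))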

Is that enough detail to close this issue?

@jodal
Member

jodal commented Apr 7, 2020

Logging during the scan was greatly improved for the 3.0 release. However, I still don't think we log anything showing progress during the initial file hierarchy traversal, found in mopidy_local/mtimes.py.
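For illustration, the kind of heartbeat that traversal could emit, sketched here with a plain os.walk rather than the actual worker-pool code in mopidy_local/mtimes.py:

import logging
import os

logger = logging.getLogger(__name__)


def walk_with_pulse(root, every=1000):
    """Yield file paths under root, logging a heartbeat every `every` files."""
    found = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            found += 1
            if found % every == 0:
                logger.info('Found %d files so far (currently in %s)',
                            found, dirpath)
            yield os.path.join(dirpath, name)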

@srsgores

I'd like to see this as well. I tried running a scan on my rclone-mounted share, and I had no idea whether the scan was even working. Even a scan of 500k+ lossless tracks on a physically connected zfs pool is still running after an hour, and I have no idea when it will finish.

Why don't we incrementally add tracks as opposed to walking the tree, then adding all of them at once?
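As a sketch of that idea: scan and flush in batches as the walk yields paths, rather than materialising the whole file list first. scan_file and add_to_library are hypothetical stand-ins here, not mopidy-local's actual API:

def scan_incrementally(paths, scan_file, add_to_library, batch_size=100):
    """Scan files as the walk yields them and flush to the library in batches."""
    batch = []
    for path in paths:
        track = scan_file(path)  # hypothetical per-file metadata scan
        if track is not None:
            batch.append(track)
        if len(batch) >= batch_size:
            add_to_library(batch)  # hypothetical library flush
            batch = []
    if batch:
        add_to_library(batch)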
