This repository has been archived by the owner on Sep 25, 2022. It is now read-only.

Parallelise folder import phase #75

Open
pmonks opened this issue Mar 15, 2018 · 0 comments

pmonks commented Mar 15, 2018

Currently the tool imports folders serially, in a first phase of the import. This allows files to be efficiently batched in the second phase without having to worry about parent folders, which is a significant performance improvement over earlier schemes that processed imports folder-by-folder.

Unfortunately, because folders are inter-dependent (i.e. you can't import a child folder until all of its ancestors have been imported), parallelising this phase is more difficult than the file case, and was punted in v2.0 of the tool.

By requiring BulkImportSources to scan directories breadth-first, some level of parallelisation would become possible during the folder import phase: the first-level folders would be imported serially, then each of those folders' sub-folder trees would be imported in parallel.
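As a rough illustration of that scheme (not the tool's actual code), the sketch below imports the first-level folders serially and then hands each first-level folder's sub-tree to a thread pool. The `Folder` class, `importFolder`, and `importAll` are hypothetical names invented for this example; a real implementation would work against the `BulkImportSource` API and the repository rather than an in-memory tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFolderImportSketch {

    /** Hypothetical in-memory folder tree; not a type from the tool. */
    static final class Folder {
        final String path;
        final List<Folder> children = new ArrayList<>();
        Folder(String path) { this.path = path; }
        Folder add(Folder child) { children.add(child); return this; }
    }

    /** Hypothetical stand-in for the real per-folder import: it only records the path. */
    static void importFolder(Folder folder, ConcurrentLinkedQueue<String> imported) {
        imported.add(folder.path);
    }

    /** Imports everything below root on the calling thread, each parent before its children. */
    static void importSubtree(Folder root, ConcurrentLinkedQueue<String> imported) {
        for (Folder child : root.children) {
            importFolder(child, imported);
            importSubtree(child, imported);
        }
    }

    /** Returns the paths in the order they were imported. */
    static List<String> importAll(Folder sourceRoot, int threads) throws Exception {
        ConcurrentLinkedQueue<String> imported = new ConcurrentLinkedQueue<>();

        // Step 1: first-level folders are imported serially.
        for (Folder top : sourceRoot.children) {
            importFolder(top, imported);
        }

        // Step 2: each first-level folder's sub-tree becomes one parallel task.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Folder top : sourceRoot.children) {
                pending.add(pool.submit(() -> importSubtree(top, imported)));
            }
            for (Future<?> task : pending) {
                task.get(); // wait, and surface any failure from the task
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(imported);
    }
}
```

Because each sub-tree is walked by a single task, a child folder is never imported before its parent, while unrelated sub-trees proceed independently. The available parallelism is capped by the number of first-level folders.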

There are worst-case scenarios that need some thought (e.g. when there are fewer first-level folders than the optimal number of threads in the thread pool), but in general this should markedly speed up the folder import phase for large folder trees.
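One possible way to soften that corner case, offered purely as an assumption and not something proposed in this issue, is to parallelise within each breadth-first level instead of per first-level sub-tree: all folders at depth N are imported concurrently, and depth N+1 starts only once depth N has finished. The `Folder` class and `importByLevel` below are hypothetical names for this sketch, and the "import" is just recording the path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LevelByLevelImportSketch {

    /** Hypothetical in-memory folder tree; not a type from the tool. */
    static final class Folder {
        final String path;
        final List<Folder> children = new ArrayList<>();
        Folder(String path) { this.path = path; }
        Folder add(Folder child) { children.add(child); return this; }
    }

    /** Imports one breadth-first level at a time; returns paths in import order. */
    static List<String> importByLevel(Folder sourceRoot, int threads) throws Exception {
        ConcurrentLinkedQueue<String> imported = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Folder> level = sourceRoot.children;
            while (!level.isEmpty()) {
                // One task per folder in the current level (stand-in for the real import).
                List<Callable<Void>> tasks = new ArrayList<>();
                for (Folder folder : level) {
                    tasks.add(() -> { imported.add(folder.path); return null; });
                }
                // invokeAll blocks until the whole level is done, acting as a barrier.
                for (Future<Void> done : pool.invokeAll(tasks)) {
                    done.get(); // surface any failure from the task
                }
                // Collect the next level only after every parent has been imported.
                List<Folder> next = new ArrayList<>();
                for (Folder folder : level) {
                    next.addAll(folder.children);
                }
                level = next;
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(imported);
    }
}
```

With this variant a single first-level folder with a wide sub-tree can still keep several threads busy, at the cost of a synchronisation point per level; a deep, narrow tree would remain effectively serial either way.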
