Efficiency with larger data sets #82

Open
tpmccallum opened this issue Aug 29, 2015 · 1 comment
@tpmccallum

Hi,
I have a question about ingesting text files in stages (as opposed to running the make file in one sitting).
When running the make file with a very large number of records I get the following message, and I can't help but think that there may be a more efficient way of ingesting the items.
```
parallel: Warning: No more processes: Decreasing number of running jobs to 1. Raising ulimit -u or /etc/security/limits.conf may help.
```

Just to clarify: as far as I know there are no issues with the files or the catalog (the encoding is clean, UTF-8 only, etc.). I run smaller sets from time to time for testing and they work fine. This efficiency issue only presents itself when ingesting more than, say, 10 million records.

Please also see the following `ulimit -a` output:

```
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31559
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 9000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31559
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
```
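
In case it helps anyone hitting the same wall, here is a minimal sketch of the two remedies the warning itself suggests. The value 65536 is only an illustrative number, and `tim` stands in for whichever user runs the ingest; neither comes from this project.

```
# Temporary: raise the per-user process limit for the current shell session
# (only works up to the hard limit set by the administrator).
ulimit -u 65536

# Persistent: add soft/hard nproc entries to /etc/security/limits.conf
# (takes effect at the user's next login).
#   tim  soft  nproc  65536
#   tim  hard  nproc  65536
```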

Thanks so much,
Tim

@tpmccallum (Author)

Toying with an idea for line 81:

```
parallel -a files/metadata/jsoncatalog.txt --block 100M --pipepart python bookworm/MetaParser.py > $@
```

instead of

```
cat files/metadata/jsoncatalog.txt | parallel --pipe python bookworm/MetaParser.py > $@
```
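
The appeal of `--pipepart` is that GNU parallel reads blocks straight out of the file on disk instead of funnelling everything through a single pipe from `cat`, which tends to be much faster on large inputs. If the process ceiling itself is the problem, the job count can also be capped explicitly; a hedged variant of the same recipe line, where `-j 8` is just an example value and not something from this repository:

```
# Cap GNU parallel at 8 concurrent jobs while splitting the file into 100M blocks.
parallel -j 8 --pipepart -a files/metadata/jsoncatalog.txt --block 100M python bookworm/MetaParser.py > $@
```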

Will report back soon :)
