Planet import is slow on a very powerful server #511

Open
georgbachmann opened this issue Dec 13, 2023 · 3 comments

@georgbachmann commented Dec 13, 2023

I am using the latest Docker image with this docker-compose configuration:

version: "3.4"

services:

  nominatim:
    image: mediagis/nominatim:4.3
    shm_size: '4gb'
    environment:
      - PBF_PATH=/osm/planet-latest.osm.pbf
      - NOMINATIM_PASSWORD=test
      - POSTGRES_SHARED_BUFFERS='4GB'
      - THREADS=50
      - NOMINATIM_ADDRESS_LEVEL_CONFIG=/config/our_own_ranks.json
    volumes:
      - ./nominatim-data:/var/lib/postgresql/14/main
      - ./flatnode:/nominatim/flatnode
      - ./osm:/osm
      - ./config:/config
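
As an aside, the flatnode volume above is mounted but nothing in the environment points Nominatim at it. A minimal sketch of how that is usually wired up, assuming the 4.3 image forwards the standard NOMINATIM_FLATNODE_FILE setting (not something verified in this thread):

    environment:
      - NOMINATIM_FLATNODE_FILE=/nominatim/flatnode/flatnode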

When I try to import a smaller extract on my local machine (a MacBook M1), it is super fast, so I do understand the stages of the import. The osm2pgsql part is fast on my local machine, and so are the rankings afterwards; I get 2000+ items per second. That seems fine, right?

Now on my server, which is a machine with 60 CPU cores, about 300 GB of RAM and fast NVMe drives, the osm2pgsql part is also super fast, but when it then starts to rank things, it turns super slow...

2023-12-13 07:12:55: Done 3107220 in 12238 @ 253.895 per second - rank 30 ETA (seconds): 1260190.67

So around 250 items per second, which would result in an import time of roughly two weeks (the ETA of 1,260,190 seconds is about 14.6 days).
I am not really good at optimizing Postgres, but I guess that is where my problem lies?

I would be very happy about help figuring out what in my Postgres config is wrong and therefore slowing everything down.
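
For reference, a minimal sketch of the PostgreSQL tuning knobs the mediagis image documents as environment variables; the sizes below are only assumptions for a 60-core / 300 GB RAM machine, not values confirmed in this thread (check the image README for the exact variables your tag supports):

    environment:
      - POSTGRES_SHARED_BUFFERS=16GB
      - POSTGRES_MAINTENANCE_WORK_MEM=10GB
      - POSTGRES_AUTOVACUUM_WORK_MEM=2GB
      - POSTGRES_WORK_MEM=500MB
      - POSTGRES_EFFECTIVE_CACHE_SIZE=200GB
      - POSTGRES_SYNCHRONOUS_COMMIT=off
      - POSTGRES_MAX_WAL_SIZE=5GB
      - POSTGRES_CHECKPOINT_TIMEOUT=60min
      - POSTGRES_CHECKPOINT_COMPLETION_TARGET=0.9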

Desktop / Server (please complete the following information):

  • OS & Version: Ubuntu 22.04.3 LTS
  • Docker Version: 20.10.21
  • Nominatim Version: 4.3
@leonardehrenfried (Collaborator)

Does this server have an SSD or HDD? A fast disk is paramount for good performance.

@georgbachmann (Author) commented Dec 13, 2023

It has NVMe drives... they should be super fast...

@georgbachmann (Author) commented Dec 15, 2023

Our sysadmin changed the file system settings a bit so that it does not write to disk as often, but that only improved the overall speed a little. At the same time I also increased the shm size a bit. We now get around 350 items per second...
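
For context, a sketch of the kind of "write to disk less often" settings usually meant here; this is an assumption about what was changed, not a record of it. Nominatim's install documentation suggests relaxing PostgreSQL's durability settings for the duration of the initial import only (postgresql.conf):

fsync = off                # only safe while the initial import runs
full_page_writes = off     # only safe while the initial import runs
synchronous_commit = off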

But as I let it run a bit further, we now get the following crash:

2023-12-15 08:36:41: Done 3079560 in 8643 @ 356.286 per second - rank 30 ETA (seconds): 898108.41
2023-12-15 08:36:42: Done 3079920 in 8644 @ 356.286 per second - rank 30 ETA (seconds): 898107.17
.......................................................................................................................................................

Traceback (most recent call last):
  File "/usr/local/bin/nominatim", line 12, in <module>
    exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 225, in nominatim
    return get_set_parser().run(**kwargs)
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 121, in run
    return args.command.run(args)
  File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 134, in run
    indexer.index_full(analyse=not args.index_noanalyse)
  File "/usr/local/lib/nominatim/lib-python/nominatim/indexer/indexer.py", line 140, in index_full
    if self.index_by_rank(26, 30) > 1000:
  File "/usr/local/lib/nominatim/lib-python/nominatim/indexer/indexer.py", line 174, in index_by_rank
    total += self._index(runners.RankRunner(rank, analyzer), 20 if rank == 30 else 1)
  File "/usr/local/lib/nominatim/lib-python/nominatim/indexer/indexer.py", line 234, in _index
    runner.index_places(pool.next_free_worker(), part)
  File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 201, in next_free_worker
    return next(self.free_workers)
  File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 209, in _yield_free_worker
    if thread.is_done():
  File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 159, in is_done
    if self.conn.poll() == psycopg2.extensions.POLL_OK:
psycopg2.errors.TriggeredDataChangeViolation: tuple to be updated was already modified by an operation triggered by the current command
HINT:  Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.

Could this have something to do with this: https://lists.openstreetmap.org/pipermail/dev/2020-September/030986.html? So basically some bad data? The data I use is planet-latest from last week.
