
psycopg2.OperationalError: server closed the connection unexpectedly #500

Open
echo292 opened this issue Nov 8, 2023 · 6 comments
Labels: resource problem (Nominatim runs out of resources, either because the machine doesn't have enough or they are limited)


echo292 commented Nov 8, 2023

Describe the bug
Initial container creation from the Docker image returns an error after running for a while; something might be timing out. The North America PBF is a larger file, at roughly 12 GB.
2023-11-08 01:07:43 psycopg2.OperationalError: server closed the connection unexpectedly

To Reproduce
Steps to reproduce the behavior:

  1. Ran the following command:

docker run -it --shm-size=60g -e PBF_URL=https://download.geofabrik.de/north-america-latest.osm.pbf -e IMPORT_WIKIPEDIA=true -e IMPORT_US_POSTCODES=true -e IMPORT_TIGER_ADDRESSES=false -p 8080:8080 --name nominatimNorthAmerica mediagis/nominatim:4.3

  2. See error

2023-11-08 01:07:30 2023-11-08 09:07:30: Done 27322100 in 3651 @ 7481.536 per second - rank 30 ETA (seconds): 2204.75
2023-11-08 01:07:31 2023-11-08 09:07:31: Done 27329580 in 3652 @ 7481.704 per second - rank 30 ETA (seconds): 2203.70
2023-11-08 01:07:32 2023-11-08 09:07:32: Done 27337060 in 3653 @ 7481.461 per second - rank 30 ETA (seconds): 2202.78
2023-11-08 01:07:33 2023-11-08 09:07:33: Done 27344540 in 3655 @ 7481.357 per second - rank 30 ETA (seconds): 2201.81
2023-11-08 01:07:35 2023-11-08 09:07:35: Done 27352020 in 3656 @ 7481.101 per second - rank 30 ETA (seconds): 2200.88
2023-11-08 01:07:35 2023-11-08 09:07:35: Done 27359500 in 3656 @ 7482.438 per second - rank 30 ETA (seconds): 2199.49
2023-11-08 01:07:35 2023-11-08 09:07:35: Done 27366980 in 3656 @ 7483.752 per second - rank 30 ETA (seconds): 2198.10
2023-11-08 01:07:36 2023-11-08 09:07:36: Done 27374460 in 3657 @ 7484.661 per second - rank 30 ETA (seconds): 2196.84
2023-11-08 01:07:36 2023-11-08 09:07:36: Done 27381960 in 3658 @ 7485.282 per second - rank 30 ETA (seconds): 2195.65
2023-11-08 01:07:38 2023-11-08 09:07:38: Done 27389440 in 3659 @ 7485.196 per second - rank 30 ETA (seconds): 2194.68
2023-11-08 01:07:39 2023-11-08 09:07:39: Done 27396920 in 3660 @ 7484.945 per second - rank 30 ETA (seconds): 2193.75
2023-11-08 01:07:40 2023-11-08 09:07:40: Done 27404400 in 3661 @ 7484.936 per second - rank 30 ETA (seconds): 2192.76
2023-11-08 01:07:41 2023-11-08 09:07:41: Done 27411880 in 3662 @ 7484.948 per second - rank 30 ETA (seconds): 2191.75
2023-11-08 01:07:42 2023-11-08 09:07:42: Done 27419380 in 3663 @ 7484.888 per second - rank 30 ETA (seconds): 2190.77
2023-11-08 01:07:43 2023-11-08 09:07:43: Done 27426860 in 3664 @ 7485.145 per second - rank 30 ETA (seconds): 2189.69
2023-11-08 01:07:43 2023-11-08 09:07:43: Done 27434340 in 3664 @ 7486.064 per second - rank 30 ETA (seconds): 2188.43
2023-11-08 01:07:43 Traceback (most recent call last):
2023-11-08 01:07:43 File "/usr/local/lib/nominatim/lib-python/nominatim/indexer/indexer.py", line 234, in _index
2023-11-08 01:07:43 runner.index_places(pool.next_free_worker(), part)
2023-11-08 01:07:43 File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 201, in next_free_worker
2023-11-08 01:07:43 return next(self.free_workers)
2023-11-08 01:07:43 File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 209, in _yield_free_worker
2023-11-08 01:07:43 if thread.is_done():
2023-11-08 01:07:43 File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 159, in is_done
2023-11-08 01:07:43 if self.conn.poll() == psycopg2.extensions.POLL_OK:
2023-11-08 01:07:43 psycopg2.OperationalError: server closed the connection unexpectedly
2023-11-08 01:07:43 This probably means the server terminated abnormally
2023-11-08 01:07:43 before or while processing the request.

Expected behavior
I expected the Nominatim server to complete this command step.

Screenshots & Logs
PostgreSQL logs are posted in the comments below.

Desktop / Server (please complete the following information):
OS & Version: Windows 11 Pro 23H2
Docker Version: Docker Desktop 4.25.0 (126437)
Nominatim Version: 4.3.1

Additional context
Here are the workstation specs:
Intel Core i9-14900K
192 GB of memory
4 TB M.2 SSD

I found this info about PostgreSQL connections timing out, but I can't find these connection parameters in any of the basic Nominatim server settings.

https://www.tomatolist.com/show_blog_page.html?no=9b5e7549-e3d3-484e-9259-7b53432f9f26

conn = psycopg2.connect(database="DBname",
                        host="Your_IP_address",
                        user="username",
                        password="Your_password",
                        port=5432,
                        keepalives=1,
                        keepalives_idle=130,
                        keepalives_interval=10,
                        keepalives_count=15)
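As a sketch of how those keepalive settings fit together (database/host names in the snippet above are placeholders, and the connect call is only shown commented out): psycopg2 accepts the libpq TCP-keepalive parameters either as keyword arguments or inside a DSN string, so they can be kept in one dict and reused.

```python
# TCP keepalive settings from the snippet above, collected in one place.
# psycopg2.connect() accepts these as keyword arguments or in a DSN string.
keepalive_opts = {
    "keepalives": 1,            # enable TCP keepalives on the connection
    "keepalives_idle": 130,     # seconds of idle time before the first probe
    "keepalives_interval": 10,  # seconds between unanswered probes
    "keepalives_count": 15,     # probes lost before the connection is dropped
}

# Equivalent libpq-style DSN fragment:
dsn_fragment = " ".join(f"{k}={v}" for k, v in keepalive_opts.items())
print(dsn_fragment)
# keepalives=1 keepalives_idle=130 keepalives_interval=10 keepalives_count=15

# With psycopg2 installed and a reachable server this would be used as:
# conn = psycopg2.connect(dbname="nominatim", host="localhost", **keepalive_opts)
```

Note that keepalives only guard against idle connections being dropped by the network; they would not prevent the server-side crash shown in the logs.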

Thank you,

-E

echo292 added the bug label Nov 8, 2023

echo292 commented Nov 8, 2023

Here are the PostgreSQL logs.

Part 1

2023-11-08 06:42:33.270 UTC [55] LOG:  checkpoints are occurring too frequently (8 seconds apart)
2023-11-08 06:42:33.270 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 06:42:38.309 UTC [55] LOG:  checkpoints are occurring too frequently (5 seconds apart)
2023-11-08 06:42:38.309 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 06:42:41.471 UTC [208] ERROR:  canceling autovacuum task
2023-11-08 06:42:41.471 UTC [208] CONTEXT:  while scanning block 9152001 of relation "public.planet_osm_nodes"
	automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:42:43.330 UTC [443] ERROR:  canceling autovacuum task
2023-11-08 06:42:43.330 UTC [443] CONTEXT:  while scanning block 2420922 of relation "public.place"
	automatic vacuum of table "nominatim.public.place"
2023-11-08 06:42:44.485 UTC [370] ERROR:  canceling autovacuum task
2023-11-08 06:42:44.485 UTC [370] CONTEXT:  while scanning block 750676 of relation "public.planet_osm_ways"
	automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:42:46.654 UTC [447] ERROR:  canceling autovacuum task
2023-11-08 06:42:46.654 UTC [447] CONTEXT:  while scanning block 13507 of relation "public.placex"
	automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:42:47.968 UTC [447] ERROR:  canceling autovacuum task
2023-11-08 06:42:47.968 UTC [447] CONTEXT:  while scanning block 5163 of relation "public.location_property_osmline"
	automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:46:45.517 UTC [456] ERROR:  canceling autovacuum task
2023-11-08 06:46:45.517 UTC [456] CONTEXT:  while scanning block 398686 of relation "public.planet_osm_nodes"
	automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:46:47.296 UTC [459] ERROR:  canceling autovacuum task
2023-11-08 06:46:47.296 UTC [459] CONTEXT:  while scanning block 2582094 of relation "public.place"
	automatic vacuum of table "nominatim.public.place"
2023-11-08 06:46:48.445 UTC [462] ERROR:  canceling autovacuum task
2023-11-08 06:46:48.445 UTC [462] CONTEXT:  while scanning block 786985 of relation "public.planet_osm_ways"
	automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:46:50.652 UTC [456] ERROR:  canceling autovacuum task
2023-11-08 06:46:50.652 UTC [456] CONTEXT:  while scanning block 22030 of relation "public.placex"
	automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:46:52.008 UTC [459] ERROR:  canceling autovacuum task
2023-11-08 06:46:52.008 UTC [459] CONTEXT:  while scanning block 14075 of relation "public.location_property_osmline"
	automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:47:24.987 UTC [55] LOG:  checkpoints are occurring too frequently (15 seconds apart)
2023-11-08 06:47:24.987 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 06:47:32.862 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:32.862 UTC [716] CONTEXT:  while scanning block 440701 of relation "public.planet_osm_nodes"
	automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:47:34.635 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:34.635 UTC [716] CONTEXT:  while scanning block 2589145 of relation "public.place"
	automatic vacuum of table "nominatim.public.place"
2023-11-08 06:47:35.784 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:35.784 UTC [716] CONTEXT:  while scanning block 791191 of relation "public.planet_osm_ways"
	automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:47:37.955 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:37.955 UTC [716] CONTEXT:  while scanning block 26012 of relation "public.placex"
	automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:47:41.666 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:41.666 UTC [716] CONTEXT:  while scanning block 18797 of relation "public.location_property_osmline"
	automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:50:31.721 UTC [58] LOG:  using stale statistics instead of current ones because stats collector is not responding
2023-11-08 06:51:26.744 UTC [1108] ERROR:  canceling autovacuum task
2023-11-08 06:51:26.744 UTC [1108] CONTEXT:  while scanning block 857934 of relation "public.planet_osm_nodes"
	automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:51:28.715 UTC [1210] ERROR:  canceling autovacuum task
2023-11-08 06:51:28.715 UTC [1210] CONTEXT:  while scanning block 878300 of relation "public.planet_osm_ways"
	automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:51:30.909 UTC [1245] ERROR:  canceling autovacuum task
2023-11-08 06:51:30.909 UTC [1245] CONTEXT:  while scanning block 95895 of relation "public.placex"
	automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:51:32.177 UTC [1108] ERROR:  canceling autovacuum task
2023-11-08 06:51:32.177 UTC [1108] CONTEXT:  while scanning block 8089 of relation "public.place_addressline"
	automatic vacuum of table "nominatim.public.place_addressline"
2023-11-08 06:51:33.681 UTC [1210] ERROR:  canceling autovacuum task
2023-11-08 06:51:33.681 UTC [1210] CONTEXT:  while scanning block 9512 of relation "public.search_name_2"
	automatic vacuum of table "nominatim.public.search_name_2"
2023-11-08 06:51:36.609 UTC [1108] ERROR:  canceling autovacuum task
2023-11-08 06:51:36.609 UTC [1108] CONTEXT:  while scanning block 11042 of relation "public.location_area_large_2"
	automatic vacuum of table "nominatim.public.location_area_large_2"
2023-11-08 06:51:37.813 UTC [1245] ERROR:  canceling autovacuum task
2023-11-08 06:51:37.813 UTC [1245] CONTEXT:  while scanning block 23382 of relation "public.location_property_osmline"
	automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:51:38.945 UTC [1210] ERROR:  canceling autovacuum task
2023-11-08 06:51:38.945 UTC [1210] CONTEXT:  while scanning block 8205 of relation "public.search_name"
	automatic vacuum of table "nominatim.public.search_name"
2023-11-08 07:16:14.562 UTC [55] LOG:  checkpoints are occurring too frequently (29 seconds apart)
2023-11-08 07:16:14.562 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 07:29:51.700 UTC [55] LOG:  checkpoints are occurring too frequently (28 seconds apart)
2023-11-08 07:29:51.700 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 07:37:04.314 UTC [55] LOG:  checkpoints are occurring too frequently (29 seconds apart)
2023-11-08 07:37:04.314 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
*** stack smashing detected ***: terminated
2023-11-08 09:07:43.713 UTC [53] LOG:  server process (PID 8240) was terminated by signal 6: Aborted
2023-11-08 09:07:43.713 UTC [53] DETAIL:  Failed process was running:  UPDATE placex
	                SET indexed_status = 0, address = v.addr, token_info = v.ti,
	                    name = v.name, linked_place_id = v.linked_place_id
	                FROM (VALUES (23803236, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803247, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803270, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803318, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803379, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803387, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803453, hstore(ARRAY['name'], ARRAY['Cherry School'])::hstore, NULL::hstore, NULL::int, '{"names": "{7281851,15536,64307}"}'::jsonb),(23803458, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803469, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803475, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803489, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803514, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(2
2023-11-08 09:07:43.714 UTC [53] LOG:  terminating any other active server processes
2023-11-08 09:07:43.775 UTC [53] LOG:  all server processes terminated; reinitializing
2023-11-08 09:07:43.908 UTC [8279] LOG:  database system was interrupted; last known up at 2023-11-08 09:06:35 UTC
2023-11-08 09:07:43.910 UTC [8279] LOG:  database system was not properly shut down; automatic recovery in progress
2023-11-08 09:07:43.912 UTC [8279] LOG:  redo starts at 5B/8B004358
2023-11-08 09:07:50.464 UTC [8279] LOG:  redo done at 5B/C9361F90 system usage: CPU: user: 2.51 s, system: 1.83 s, elapsed: 6.55 s
2023-11-08 09:07:50.879 UTC [53] LOG:  database system is ready to accept connections
2023-11-08 16:06:28.749 UTC [45] LOG:  starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-08 16:06:28.749 UTC [45] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-11-08 16:06:28.749 UTC [45] LOG:  listening on IPv6 address "::", port 5432
2023-11-08 16:06:28.749 UTC [45] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-11-08 16:06:28.758 UTC [46] LOG:  database system was interrupted; last known up at 2023-11-08 09:07:50 UTC
2023-11-08 16:06:28.760 UTC [46] LOG:  database system was not properly shut down; automatic recovery in progress
2023-11-08 16:06:28.761 UTC [46] LOG:  redo starts at 5B/C93620C0
2023-11-08 16:06:28.761 UTC [46] LOG:  invalid record length at 5B/C93620F8: wanted 24, got 0
2023-11-08 16:06:28.761 UTC [46] LOG:  redo done at 5B/C93620C0 system usage: CPU: user: 0.00 s, 

Part 2

2023-11-08 16:58:31.057 UTC [47] LOG:  checkpoints are occurring too frequently (4 seconds apart)
2023-11-08 16:58:31.057 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:58:53.171 UTC [47] LOG:  checkpoints are occurring too frequently (22 seconds apart)
2023-11-08 16:58:53.171 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:33.156 UTC [47] LOG:  checkpoints are occurring too frequently (3 seconds apart)
2023-11-08 16:59:33.156 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:35.445 UTC [47] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2023-11-08 16:59:35.445 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:37.759 UTC [47] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2023-11-08 16:59:37.759 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:39.547 UTC [272] ERROR:  canceling autovacuum task
2023-11-08 16:59:39.547 UTC [272] CONTEXT:  while scanning block 737428 of relation "public.planet_osm_ways"
	automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 17:26:43.161 UTC [55] LOG:  starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-08 17:26:43.161 UTC [55] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-11-08 17:26:43.161 UTC [55] LOG:  listening on IPv6 address "::", port 5432
2023-11-08 17:26:43.161 UTC [55] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-11-08 17:26:43.162 UTC [56] LOG:  database system was interrupted; last known up at 2023-11-08 16:59:45 UTC
2023-11-08 17:26:43.163 UTC [56] LOG:  database system was not properly shut down; automatic recovery in progress
2023-11-08 17:26:43.164 UTC [56] LOG:  redo starts at 93/C9ECCA38
2023-11-08 17:26:47.006 UTC [56] LOG:  redo done at 93/D2E4FFA0 system usage: CPU: user: 0.54 s, system: 1.87 s, elapsed: 3.84 s
2023-11-08 17:26:47.523 UTC [55] LOG:  database system is ready to accept connections


echo292 commented Nov 8, 2023

Could be related: the python3-psycopg2 package; see psycopg/psycopg2#829.

leonardehrenfried (Collaborator) commented

These sorts of errors happen when your container doesn't have enough disk or memory, for example #484

I would check how much is assigned to your docker VM.

@echo292
Copy link
Author

echo292 commented Nov 8, 2023

Hello,
Thank you for the fast response.

For this issue, I'm running Docker in Windows 11 and not in a VM with the workstation configuration listed below.

Should I increase any of the Postgresql settings?

POSTGRES_SHARED_BUFFERS (default: 2GB)
POSTGRES_MAINTENANCE_WORK_MEM (default: 10GB)
POSTGRES_AUTOVACUUM_WORK_MEM (default: 2GB)
POSTGRES_WORK_MEM (default: 50MB)
POSTGRES_EFFECTIVE_CACHE_SIZE (default: 24GB)
POSTGRES_SYNCHRONOUS_COMMIT (default: off)
POSTGRES_MAX_WAL_SIZE (default: 1GB)
POSTGRES_CHECKPOINT_TIMEOUT (default: 10min)
POSTGRES_CHECKPOINT_COMPLETION_TARGET (default: 0.9)
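For illustration, the WAL/checkpoint settings that the PostgreSQL logs complain about ("consider increasing max_wal_size") map onto these environment variables and could be raised at container creation. The values below are untested guesses for a large-memory host, not verified recommendations:

```shell
# Illustrative only: same import command as above, with the checkpoint-related
# limits raised. Values are guesses, not tested recommendations.
docker run -it --shm-size=60g \
  -e PBF_URL=https://download.geofabrik.de/north-america-latest.osm.pbf \
  -e IMPORT_WIKIPEDIA=true \
  -e IMPORT_US_POSTCODES=true \
  -e IMPORT_TIGER_ADDRESSES=false \
  -e POSTGRES_MAX_WAL_SIZE=10GB \
  -e POSTGRES_CHECKPOINT_TIMEOUT=30min \
  -p 8080:8080 --name nominatimNorthAmerica mediagis/nominatim:4.3
```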


Thank you,

-E

@echo292
Copy link
Author

echo292 commented Nov 8, 2023

One other note, I don't receive any errors about running out of disk space.

#484
psycopg2.errors.DiskFull: could not resize shared memory segment "/PostgreSQL.2013621802" to 50438144 bytes: No space left on device

leonardehrenfried (Collaborator) commented

Docker requires Linux, so on Windows it always runs in a VM, even though that's hidden from you.
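On Windows 11 that hidden VM is usually the WSL 2 backend, whose resource caps live in a `.wslconfig` file in the user profile; a minimal sketch (the limits shown are illustrative, not recommendations) looks like:

```ini
; %UserProfile%\.wslconfig -- caps for the WSL 2 VM that Docker Desktop
; runs in on Windows (values illustrative; adjust to your machine)
[wsl2]
memory=96GB    ; cap VM RAM (default is a fraction of host RAM)
processors=16  ; cap VM CPU count
swap=32GB      ; cap VM swap
```

After editing the file, the VM has to be restarted (e.g. `wsl --shutdown`) for the new limits to apply.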

leonardehrenfried added the resource problem label and removed the bug label Dec 31, 2023