
polardb shared storage file-dio:///var/polardb/shared_datadir is unavailable #503

Open
qwe123520 opened this issue Apr 26, 2024 · 26 comments
Labels
question Further information is requested


@qwe123520

Describe the problem

Starting a single-node polardb-pg in Docker fails after modifying the configuration file, with the error: polardb shared storage file-dio:///var/polardb/shared_datadir is unavailable
The configuration file is as follows:

postgresql.txt

...

@qwe123520 qwe123520 added the question Further information is requested label Apr 26, 2024
@polardb-bot
Contributor

polardb-bot bot commented Apr 26, 2024

Hi @qwe123520 ~ Thanks for opening this issue! 🎉

Please make sure you have provided enough information for subsequent discussion.

We will get back to you as soon as possible. ❤️

@qwe123520
Author

The error log is as follows:
2024-04-26 15:31:50.066 CST [14] [14] LOG: forked new process, pid is 16, true pid is 16
2024-04-26 15:31:50.066 CST [14] [14] LOG: forked new process, pid is 17, true pid is 17
2024-04-26 15:31:50.078 CST [14] [14] LOG: polardb try start vfs process
2024-04-26 15:31:50.078 CST [14] [14] LOG: pfs in localfs mode
2024-04-26 15:31:50.081 CST [14] [14] FATAL: polardb shared storage file-dio:///var/polardb/shared_datadir is unavailable.
2024-04-26 15:31:50.081 CST [14] [14] BACKTRACE:
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(elog_finish+0x1fd) [0x555e31bde55d]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(+0x7db1ae) [0x555e31a4d1ae]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(PostmasterMain+0xf53) [0x555e319dbf63]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(main+0x830) [0x555e316bacf0]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6ace30cd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f6ace30ce40]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(_start+0x25) [0x555e316ca6d5]
2024-04-26 15:31:50.202 CST [14] [14] LOG: database system is shut down

@mrdrivingduck
Member

@qwe123520 What is your docker startup command?

@qwe123520
Author

I used the image "polardb/polardb_pg_local_instance" without any extra startup command.

@mrdrivingduck
Member

@qwe123520 It is not related to the image itself, but to how the container is started from the image, which is why I am asking what your container startup command is. Does starting the container with the commands below work?

docker pull polardb/polardb_pg_local_instance
docker run -it --rm polardb/polardb_pg_local_instance psql

@qwe123520
Author

I started it with this command: docker run -d --name polardb -v /data/polardb/:/var/polardb/ polardb/polardb_pg_local_instance

@qwe123520
Author

docker run -it --rm polardb/polardb_pg_local_instance psql works; it fails as soon as I use -v with a host directory.

@mrdrivingduck
Member

I started it with this command: docker run -d --name polardb -v /data/polardb/:/var/polardb/ polardb/polardb_pg_local_instance

Does the host directory /data/polardb/ exist, and is it non-empty?

@qwe123520
Author

Yes, it exists and is non-empty.

@mrdrivingduck
Member

Yes, it exists and is non-empty.

You need to start the container with a directory that exists but is empty. If the container's startup script finds the directory empty, it runs initdb there to create the data directory; if it finds the directory non-empty, it brings up the database from the data directory layout the script expects. If the directory already contains unrelated files, startup goes wrong.
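The decision described above can be sketched as a small shell function (a minimal sketch, not the actual entrypoint script; the function name `decide_startup` is hypothetical, and the only assumption is that the script keys off whether the mounted directory is empty):

```shell
# Sketch of the entrypoint's decision: an empty mounted directory triggers
# initdb, a non-empty one is assumed to hold an existing data directory.
decide_startup() {
    datadir="$1"
    if [ -z "$(ls -A "$datadir" 2>/dev/null)" ]; then
        echo "initdb"   # empty (or missing) directory: bootstrap a fresh cluster
    else
        echo "start"    # non-empty: start from the existing contents
    fi
}
```

Under this logic, mounting a directory whose contents are not a valid data directory leads to the startup failure reported above.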

@qwe123520
Author

This directory was created during a previous startup. Then I modified postgres.conf and the database would no longer come up.

@SamirWell

@mrdrivingduck Please come and answer the question!

@SamirWell

The problem only appears after modifying postgres.conf inside those directories. I don't see what it has to do with shared_datadir. Please come and look into it~

@SamirWell

Also, restoring the previous conf contents does not help either; the file just cannot be changed.

@mrdrivingduck
Member

mrdrivingduck commented May 22, 2024

@qwe123520 @SamirWell

  1. What exactly did you change? Could you provide a diff?
  2. Judging from the startup command, /data/polardb/ should contain several directories such as primary_dir/. In each directory, check current_logfiles for the name of the error log file, then look at the last entries in that error log.

@SamirWell

2024-05-22 17:56:18.943 CST [20] [20] LOG: vfs_unlink file-dio:///var/polardb/shared_datadir/polar_flog/flashback_log.history.tmp
2024-05-22 17:56:18.944 CST [20] [20] LOG: vfs_rename from file-dio:///var/polardb/shared_datadir/polar_flog/flashback_log.history.tmp to file-dio:///var/polardb/shared_datadir/polar_flog/flashback_log.history
2024-05-22 17:56:18.944 CST [20] [20] LOG: The flashback log will switch from 0/877E0 to 0/10000000
2024-05-22 17:56:18.944 CST [20] [20] LOG: The flashback log shared buffer is ready now, the current point(position) is 0/10000000(0/FF3FFF0), previous point(position) is 0/0(0/0), initalized upto point is 0/10000000
2024-05-22 17:56:18.945 CST [20] [20] LOG: enable persisted slot, read slot from polarstore.
2024-05-22 17:56:18.945 CST [20] [20] LOG: vfs open dir pg_replslot, num open dir 1
2024-05-22 17:56:18.945 CST [20] [20] LOG: vfs open dir file-dio:///var/polardb/shared_datadir/pg_replslot, num open dir 1
2024-05-22 17:56:18.945 CST [20] [20] LOG: vfs_unlink file-dio:///var/polardb/shared_datadir/pg_replslot/replica1/state.tmp
2024-05-22 17:56:18.946 CST [20] [20] LOG: restore slot replica1 with version 10002, replay_lsn is 0/1BA24B8, restart_lsn is 0/1752788
2024-05-22 17:56:18.946 CST [20] [20] LOG: vfs_unlink file-dio:///var/polardb/shared_datadir/pg_replslot/replica2/state.tmp
2024-05-22 17:56:18.946 CST [20] [20] LOG: restore slot replica2 with version 10002, replay_lsn is 0/1BA24B8, restart_lsn is 0/1752788
2024-05-22 17:56:18.946 CST [20] [20] LOG: vfs open dir pg_replslot, num open dir 1
2024-05-22 17:56:18.946 CST [20] [20] LOG: vfs open dir file-dio:///var/polardb/shared_datadir/pg_twophase, num open dir 1
2024-05-22 17:56:18.946 CST [20] [20] LOG: database system was not properly shut down; automatic recovery in progress
2024-05-22 17:56:18.946 CST [20] [20] LOG: state is 4
2024-05-22 17:56:18.965 CST [19] [19] LOG: polar_flog_index log index is insert from 28
2024-05-22 17:56:19.023 CST [19] [19] WARNING: The flashback log record at 0/895F0 will be ignore. and switch to 0/10000028
2024-05-22 17:56:19.023 CST [19] [19] LOG: Recover the flashback logindex to 0/10000000
2024-05-22 17:56:19.362 CST [21] [21] PANIC: polardb shared storage is unavailable.
2024-05-22 17:56:19.362 CST [21] [21] BACKTRACE:
postgres(5432): polar worker process (+0x3fdc5e) [0x560ccc2d4c5e]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/polar_worker.so(polar_worker_handler_main+0xd6) [0x7fdf24745ff6]
postgres(5432): polar worker process (StartBackgroundWorker+0x2d7) [0x560ccc629517]
postgres(5432): polar worker process (+0x76441c) [0x560ccc63b41c]
postgres(5432): polar worker process (+0x765dbe) [0x560ccc63cdbe]
postgres(5432): polar worker process (PostmasterMain+0xd4c) [0x560ccc640d5c]
postgres(5432): polar worker process (main+0x830) [0x560ccc31fcf0]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fdf231fed90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fdf231fee40]
postgres(5432): polar worker process (_start+0x25) [0x560ccc32f6d5]

@SamirWell

I set max_connections = 2000 in the conf inside every directory.

@SamirWell

What I actually did just now: first I let Docker initialize the database without starting it, then I changed the maximum number of connections to 2000 in all of the configuration files.

Starting the container after that was fine.

When I restarted the container once more, it failed. There must be another cause; it looks like a problem with remounting:

inline int
polar_mount(void)
{
	int ret = 0;
	if (polar_vfs[polar_vfs_switch].vfs_mount)
		ret = polar_vfs[polar_vfs_switch].vfs_mount();
	if (polar_enable_io_fencing && ret == 0)
	{
		/* POLAR: FATAL when shared storage is unavailable, or force to write RWID. */
		if (polar_shared_storage_is_available())
		{
			polar_hold_shared_storage(false);
			POLAR_IO_FENCING_SET_STATE(polar_io_fencing_get_instance(), POLAR_IO_FENCING_WAIT);
		}
		else
			elog(FATAL, "polardb shared storage %s is unavailable.", polar_datadir);
	}
	return ret;
}

inline int
polar_remount(void)
{
	int ret = 0;
	if (polar_vfs[polar_vfs_switch].vfs_remount)
		ret = polar_vfs[polar_vfs_switch].vfs_remount();
	if (polar_enable_io_fencing && ret == 0)
	{
		/* POLAR: FATAL when shared storage is unavailable, or force to write RWID. */
		if (polar_shared_storage_is_available())
		{
			polar_hold_shared_storage(true);
			POLAR_IO_FENCING_SET_STATE(polar_io_fencing_get_instance(), POLAR_IO_FENCING_WAIT);
		}
		else
			elog(FATAL, "polardb shared storage %s is unavailable.", polar_datadir);
	}
	return ret;
}

@SamirWell

@mrdrivingduck Could you test the scenario yourself?

@mrdrivingduck
Member

I tested the following scenario and found no problem:

$ mkdir polardb_pg
$ docker run -it --rm \
    --env POLARDB_PORT=5432 \
    --env POLARDB_USER=u1 \
    --env POLARDB_PASSWORD=your_password \
    -v ./polardb_pg:/var/polardb \
    polardb/polardb_pg_local_instance \
    echo 'done'

## edit max_connections in three postgresql.conf files

$ docker run -d \
    -p 54320-54322:5432-5434 \
    -v ./polardb_pg:/var/polardb \
    polardb/polardb_pg_local_instance

36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61

$ docker exec -it 36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61 bash
$ ps -ef
$ exit

$ docker stop 36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61            
36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61

$ docker run -d \                                                                      
    -p 54320-54322:5432-5434 \
    -v ./polardb_pg:/var/polardb \
    polardb/polardb_pg_local_instance

cdbffcd6b3e6e2f55ac98ee61bfd48ac185db624f5142f3dfc7a0f920ac7a154

$ docker exec -it cdbffcd6b3e6e2f55ac98ee61bfd48ac185db624f5142f3dfc7a0f920ac7a154 bash
$ ps -ef

@SamirWell

Could it be because I am deploying on k3s?

@mrdrivingduck
Member

Could it be because I am deploying on k3s?

You need to check whether /var/polardb/shared_datadir can be accessed correctly from inside the container and whether its contents look as expected. Also make sure the volume is not mounted by more than one container.
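The suggested check can be sketched as a small shell function (the helper name is hypothetical; /var/polardb/shared_datadir is the path from this thread):

```shell
# Sketch of the suggested in-container check: the shared data directory must
# exist and be readable and writable by the database process.
check_shared_datadir() {
    dir="$1"
    if [ -d "$dir" ] && [ -r "$dir" ] && [ -w "$dir" ]; then
        echo "accessible"
    else
        echo "unavailable"
    fi
}
```

For example, run check_shared_datadir /var/polardb/shared_datadir inside the container as the database user.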

@SamirWell

SamirWell commented May 22, 2024

You need to check whether /var/polardb/shared_datadir can be accessed correctly from inside the container and whether its contents look as expected. Also make sure the volume is not mounted by more than one container.

So if there is a window during a k3s/k8s rolling upgrade in which two pods mount the volume simultaneously, it will crash, right?

I just retested the delayed-restart scenario and it still crashes o(╥﹏╥)o

@mrdrivingduck
Member

So if there is a window during a k3s/k8s rolling upgrade in which two pods mount the volume simultaneously, it will crash, right?

polardb_pg_local_instance is a demo image for running a shared-storage cluster on a single machine. It contains a simple entrypoint script that handles management, so that an instance can be brought up quickly for a trial. If you have external cluster management and storage management, they will conflict with the entrypoint script running inside the image. To integrate with cluster-management tooling, use the plain binary image polardb/polardb_pg_binary instead; it contains no management scripts.

@SamirWell

In the end, I tested that running the following before each restart fixes it:

rm -f $shared_datadir/DEATH

So this should make it suitable for single-node deployment on k8s/k3s, right?

@mrdrivingduck
Member

In the end, I tested that running the following before each restart fixes it:

rm -f $shared_datadir/DEATH

The presence of this file means that at least two database instances have been started on the same data directory, and that is a problem.
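For a deployment where you can guarantee that only one instance ever mounts the volume, the workaround above could be wrapped in a pre-start hook along these lines (a hedged sketch; the DEATH marker path comes from this thread, the function name is hypothetical, and note the warning above that the marker indicates two instances shared the data directory):

```shell
# Sketch of a pre-start cleanup for a strictly single-instance deployment.
# Removing the DEATH marker bypasses the IO-fencing protection, so this is
# only safe when no second instance can ever mount the same volume.
clear_fencing_marker() {
    shared_datadir="$1"
    if [ -f "$shared_datadir/DEATH" ]; then
        rm -f "$shared_datadir/DEATH"
        echo "removed stale fencing marker"
    else
        echo "no fencing marker present"
    fi
}
```

In a k8s/k3s setting this would run before the database process starts, e.g. from an init container or the pod's startup command.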
