Improve RELOAD ... switchover=1 #2100

Open
3 of 8 tasks
sanikolaev opened this issue Apr 29, 2024 · 0 comments
Proposal:

Notes from @klirichek


Terms:

  • 'base set of files': files like foo.sph, foo.spa, foo.spd, etc., belonging to a table.
  • 'new set of files': files like foo.new.sph, foo.new.spa, ..., with the pre-suffix 'new' between the name and the
    extension.
  • 'old set of files': files like foo.old.sph, foo.old.spa, ..., with the pre-suffix 'old' between the name and the
    extension.

Currently implemented flows of operations over tables in plain mode on startup:

1. The path of a table in the config contains the 'base set of files' belonging to the table.

  1. Load the table.
  2. Create and lock <table>.spl.
  3. Serve the table.

2. The path of a table in the config contains a 'new set of files'.

  1. Rename the new set of files to the base set of files (i.e. remove the 'new' pre-suffixes).
  2. Load the table from that set of files.
  3. Create and lock <table>.spl.
  4. Apply the table's kill-list to its targets, and other kill-lists to this table as a target.
  5. Serve the table.

If the table cannot be loaded for any reason, the renaming is rolled back (the files are renamed back to the 'new set of files') and the table is reported as not served.

3. The path of a table in the config contains both a 'base set of files' and a 'new set of files'.

  1. Rename the base set of files to the 'old set of files'.
  2. Rename the 'new set of files' to the base set of files.
  3. Load the table from that set of files.
  4. Create and lock <table>.spl.
  5. Apply the table's kill-list to its targets, and other kill-lists to this table as a target.
  6. Serve the table.

If the table cannot be loaded for any reason, the roll-back performs the same two renames in reverse order: first rename
the base set of files back to the new set of files, then rename the old set of files back to the base set of files.

Note that if the 'new set of files' is damaged, the daemon doesn't try to fall back to the 'base set of files'. That is
because on startup rotation is performed before serving, so the 'base set of files' is never loaded even if present; it
is only a target of renaming to the 'old set of files' (with unlinking on success).

4. The path of a table in the config contains no files, or the files are damaged.

  1. Report the table as not served.

Note that renaming itself is performed with a 'try-rollback' pattern. That is, we rename the set of files one by one, and if an error happens partway through, we roll back the renames already done. So normally we never leave only part of the files renamed: either all of them are processed, or all of them are returned back. If an exception occurs during such back-renaming, the daemon is killed with a fatal error, to avoid any further damage. For example: we try to rename 'foo1', 'foo2', 'foo3' to 'bar1', 'bar2', 'bar3'; the first two succeed, but 'foo3' fails to be renamed to 'bar3'. At that point we have 'bar1', 'bar2' and a stuck 'foo3' on disk, so we roll back: rename 'bar2' to 'foo2' and 'bar1' to 'foo1'. If any of these back-renames fails, say we can't rename 'bar1' back to 'foo1', the daemon issues a fatal error and dies.
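
The try-rollback pattern above can be sketched in shell. This is a simplified illustration, not the daemon's actual C++ code; the file names and the rollback bookkeeping are hypothetical:

```shell
#!/bin/sh
# Try to rename foo1..foo3 to bar1..bar3 one by one; if a rename fails
# partway through, rename the already-processed files back in reverse
# order; if a back-rename fails too, die with a fatal error.
workdir=$(mktemp -d)
touch "$workdir/foo1" "$workdir/foo2" "$workdir/foo3"

try_rename_all() {
    done_list=""
    for n in 1 2 3; do
        if mv "$workdir/foo$n" "$workdir/bar$n" 2>/dev/null; then
            done_list="$n $done_list"   # prepend: rollback walks in reverse order
        else
            for d in $done_list; do     # roll back what was already renamed
                mv "$workdir/bar$d" "$workdir/foo$d" ||
                    { echo "FATAL: rollback failed, shutting down" >&2; exit 1; }
            done
            return 1
        fi
    done
}

if try_rename_all; then
    echo "renamed: all files now carry the 'bar' names"
else
    echo "rolled back: all files returned to the 'foo' names"
fi
```

Here all three renames succeed, so the files end up fully renamed; injecting a failure on the third `mv` would leave the directory exactly as it started.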

Currently implemented flows of operations over tables in plain mode at runtime, on a HUP signal:

First the daemon reloads the config. If the config has a shebang (i.e. it is a script), the daemon runs it once and collects the output.
The daemon then compares the CRC32 of the fresh and active configs, and also compares their sizes, creation and modification times. If nothing has changed, nothing happens.

So, below we assume the config has changed.
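
The change-detection step can be illustrated with POSIX `cksum`. This is only a sketch of the idea; the sample config lines are hypothetical, and the real daemon also compares sizes and timestamps internally:

```shell
#!/bin/sh
# Compare the active config with a freshly read one, roughly the way
# the daemon does on HUP: identical checksums mean nothing to reload.
active=$(mktemp); fresh=$(mktemp)
printf 'searchd { listen = 9312 }\n' > "$active"
printf 'searchd { listen = 9313 }\n' > "$fresh"

old_sum=$(cksum < "$active")   # prints "CRC byte-count"
new_sum=$(cksum < "$fresh")

if [ "$old_sum" = "$new_sum" ]; then
    verdict="unchanged"
else
    verdict="changed"
fi
echo "config is $verdict"
```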

1. a new table appeared in the config

Load and serve the table.

2. a table vanished from the config

Remove the table from the list of tables and unload it.

3. a table changes its role

That is only possible via seamless rotation. Each query grabs a reference to the affected table and from that moment no longer depends on the global set of tables. The daemon just reconfigures the current set of tables according to the config. If, say, table 'foo', which was plain, became distributed in the fresh config, then existing (running) queries to the plain table 'foo' still work with the previous one, while new queries are directed to the new (distributed) one. When the old queries finish, they release their references, and the old plain table 'foo' is finally released and removed. No one can access the old table anymore, since it is no longer mentioned in any set of tables and lives only via the references held by running queries.

Changing the role may be useful if you started with a local table and then want to split it into shards. In that case you can just rewrite the config and declare the same table as distributed.
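
For illustration, such a role change in plain mode might look like the following config edit (the section contents, source and host names here are hypothetical):

```ini
# before: 'foo' is a single local table
index foo {
    type = plain
    path = /var/lib/manticore/foo
    source = src_foo
}

# after: the same name is re-declared as distributed over two shards
index foo {
    type = distributed
    local = foo_shard1
    agent = box2:9312:foo_shard2
}
```

On HUP, running queries keep the old plain 'foo' alive via their references, while new queries hit the distributed one.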

4. a table changes its name

Now this is processed as a combination of 'new table appeared' and 'table vanished'. That is, a new copy of the table is loaded from the same files, and the old one vanishes. It would be quite easy to provide true renaming in this case: we can identify a table by its path, type and settings, and then just change the name of the already loaded table, without any need to unload/load.

5. a table changes its path; the new path contains a 'base set of files'

The daemon seamlessly loads the table from the new path, then releases the previous version. The previous files are left intact.

6. a table changes its path; the new path contains a 'new set of files'

The daemon rotates the 'new set of files' to the 'base set of files' and loads the table from the new path.
Then it unlinks (deletes) the files at the previous path.

Switchover behavior

Switchover is a way to seamlessly switch to another set of index files. A plain reload from another location first copies the files to the place specified in the config and then performs rotation over this copy. A reload with switchover makes no copy but switches directly to the other location.
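
The difference can be sketched in shell terms (a rough illustration with hypothetical paths and file names, not what the daemon literally executes):

```shell
#!/bin/sh
# Contrast the two reload modes at the filesystem level.
other=$(mktemp -d); configured=$(mktemp -d)
touch "$other/foo.sph" "$other/foo.spa"

# plain 'reload ... from <location>': copy the files under the
# configured path, then rotate that copy
for f in "$other"/foo.*; do
    cp "$f" "$configured/"
done

# switchover: no copy; the daemon simply starts serving the files
# directly from the other location
serving_path=$other

echo "copies under configured path:"; ls "$configured"
echo "switchover would serve from: $serving_path"
```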

Current implementation

'reload table ... option switchover=1' was implemented by adding a special 'link' file: a simple line of text containing the location of the index files. It looks like a redundant piece, since the same goal can be achieved by modifying the config, which in plain mode also looks more consistent.

  • It duplicates the path in the config. That is, if we already have the path there, why do we need an extra one?
  • We don't allow any changes outside the config, but the link file breaks this rule.
  • It keeps table metadata (info about placement) together with the table's data.
  • It is solely a daemon implementation detail and is not supported by other tools like indexer. That is, even if a link file is present you can perform indexing, rotation, whatever, but the daemon will ignore your files and point solely to the place named in the link file. That can easily lead to confusion.

Proposed implementation

The main principle: paths should be defined solely in the config. If you want to change the path, modify the config. You don't need any external things like a 'link' file; just provide the final path in the config and that's all. This approach is transparent for all tools like indextool, converter and indexer: they expect to find everything in the config and should not have to care about any 'proxy' or 'link' files.

So, if you want to switch over, you should modify the config and correct the path to the table's files. Then, if the table files are already present at the new place, run 'reload tables' via the console, or send a HUP signal to the daemon. So there is nothing to implement for now; everything already works. If the index is not yet created, the usual indexing followed by a reload will do the job. Note that indexing with --rotate in this case unlinks (deletes) the previous files.
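
In shell terms the proposed procedure is just a config edit followed by a reload signal. A minimal sketch, with hypothetical paths; the HUP line is commented out since it needs a running searchd:

```shell
#!/bin/sh
# Switch table 'foo' to a new location by editing the config only.
conf=$(mktemp)
cat > "$conf" <<'EOF'
index foo {
    path = /old/path/foo
}
EOF

# point the table at the new location
sed -i 's|/old/path/foo|/some/path/foo|' "$conf"
grep 'path' "$conf"

# then either signal the daemon...
# kill -HUP "$(cat /var/run/manticore/searchd.pid)"
# ...or run 'RELOAD TABLES' via the SQL console.
```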

For a single table we can fix the behaviour of 'reload table' (see p.3 below), and that should be enough.

Current features (or bugs)

  1. switchover (in the config) + indexer --rotate kills the previous files. Maybe this should be fixed. It happens because in the usual rotation flow we do the same, but when the path has changed this behaviour looks more like a bug.
  2. Interference with 'link' files. If you switch over (in the config) to a folder where a 'link' file is present and reload, the daemon will read that file and jump to the place it names. And since the link file is not managed by any tool, only manual investigation and removal of the file can help. This looks like a serious reason to stop using link files and keep things simple.
  3. 'reload table foo' now just reloads from the previous path. That is, it doesn't re-parse the config to pick up a new path, if any; it just uses the original path that was used to load the table. This looks like a bug.
  4. 'reload table foo from 'path' option switchover=1' is redundant. If we fix p.3 above, then to achieve a switchover of a single table it will be enough to provide the new path in the config and reload the table (no need to reload all tables). So the switchover variant can be:
    1. Removed.
    2. Left just as a duplicate of plain 'reload table' (i.e. reload by the path from the config).
    3. Left as an enhanced version of 'reload table' which may use the provided path in different ways. There are several variants. Say, we modified the path in the config, and '/old/path' changed to '/some/path'.
      • Issue 'reload table foo from '/some/path' option switchover=1'. The daemon parses the config, sees that the provided path is the same as '/some/path' in the config, and reloads that single table from the new path. In this case the provided path is used just to check that it matches the path in the config.
      • Issue 'reload table foo from '/another/path' option switchover=1'. The daemon parses the config, sees that the provided path differs from '/some/path' in the config, and issues an error because the paths do not match.
      • Issue 'reload table foo from '/yet/another/path' option switchover=1'. The daemon reloads foo from /yet/another/path, but only temporarily, as an exception. If after such a switchover it receives a HUP or restarts, it forgets about the switchover and just loads the table from the path given in the config.

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Task estimated
  • Specification created, reviewed and approved
  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation proofread
  • Changelog updated
  • OpenAPI YAML updated and issue created to rebuild clients