Skip to content

New Discovery Reconcile Algorithm

Markus Goetz edited this page Jul 28, 2017 · 8 revisions

In this page, I try to explain what I have in mind as a replacement of the current csync algorithm. This is removing the last csync brick.

In order to understand the new one, i'm explaining how this currently works:

The Current Discovery / Reconcile step

(The discovery phase was called update in csync speak)

  • csync_update() local: goes recursively 'stat'ing each file on the FS. Build up the local.tree. Compare with the DB to know the initial intruction.
  • csync_update() remote: goes recursively 'PROPFIND'ing each file on the server. Build up the remote.tree Compare with the DB to know the initial intruction. (Optimisation: if the etag has not change, simply copy the value from the DB in csync_statedb_get_below_path)
  • csync_reconcile: Go over local.tree and remote.tree to find the difference and update the instructions.
  • SyncEngine::treewalkFile: Post-reconcile: merge local.tree and remote.tree together in the _syncItemMap
  • The sync Engine proceed by converting the _syncItemMap into a vector. Check permissions, then start creating propagator jobs
  • Propagation start and do the change to the file system, the server, and update the DB as it goes.

Some things we notice: is that this SyncEngine::treewalkFile function is doing this akward convertion with between the csync and the Qt world. This is quite complex function.

New proposal:

Merge discovery and reconcile in one function.

For every directory:

  • Query the server if the etag has not changed
  • Query the FS for file in that directory
  • Compare the sets of file vs. DB
    • NEW files: check if they are renamed then propagate the file (or the rename)
    • CHANGED files: propagate it now.
    • DELETED files: schedule a propagation at the end of the sync (maybe it is renamed, so we need to keep it)
    • NOTHING: don't do anything
  • Recurse if it is a directory (after its MkcolJob or MoveJob finished).

Pseudocode:

fn process_path(string directory_name, bool etag_has_changed) {
    // If the etag was changed, do a PROPFIND to get the list of files. 
    // Otherwise get them from the database.
    set server_entries = etag_has_changed ? do_propfind(directory_name) 
                                          : get_from_db(directory_name);
    // Iterate over local files in this directory
    for(auto fileEntry : QDirIterator(directory_name)) {
        auto instruction = compare_with_db(fileEntry);
        if (instruction == NEW) {
            if (etag_has_changed && server_entries.contains(fileEntry.name)) {
                 process_sync(fileEntry, server_entries[fileEntry.name]);
            } else if (check_rename_inode(fileEntry)) { // check the db for file with same inode
                 process_rename(fileEntry);
            } else {
                 process_sync(fileEntry, nullptr)
            }
        } else if (instruction == CHANGED) {
            process_sync(fileEntry, server_entries[fileEntry.name]);
        } else if (instruction == NONE) {
            if (etag_has_changed && !server_entries.contains(fileEntry.name)) {
                 process_delete_later(fileEntry.name); // Will delete at the end of the sync (in case it was renamed)
                 server_entries.remove(fileEntry.name)
                 continue;
            } else if (etag_has_changed && compare_with_db(server_entries[fileEntry.name]) != NONE) {
                process_sync(fileEntry, )
            }
        }
        
        server_entries.remove(fileEntry.name)

        if (fileEntry.is_directory()) {
           //recurse
           process_path(fileEntry.name, etag_has_changed && compare_with_db(server_entries[fileEntry.name]) != NONE)
        }
    }

    // Now the file that are not there localy:
    for (auto entry : server_entries) {
       if (etag_has_changed && compare_with_db(entry) == NEW && check_rename_fileid(entry)) {
             process_rename(entry);
       } else if (etag_has_changed && compare_with_db(entry) != NONE) {
           process_sync(nullptr, entry)
           if (fileEntry.is_directory()) {
              // recurse
              process_path(entry.name, true);
           }
       } else {
           process_delete_later(entry.name);
       }
    }
}

// Check if we should download or upload the file, handle conflicts, ....
fn process_sync(auto localEntry, auto remoteEntry) {
    if (!check_permission_and_other_tweak(localEntry, remoteEntry))
       return;
    propagate(localEntry, remoteEntry);
}

// Do the delete after the sync. 
// Note: we still need to recurse as well within the deleted directory to make sure
// there is no files to restore.  Or ignored files that blocks the delete
fn process_delete_later(string fileName);

The idea is to do all step at once. and proceed directory by directory, recursively.

Later, we can only go in the directory that were changed according to the file system watcher. We don't need to go at all into directory we know have not changed.

TODO: Figure how to do UPDATE_METADATA when childs are propagated to the remote

TODO: Find out how to parallelize this to not have the one-remote-PROPFIND-at-a-time issue like with the current codebase.