spec: WebDAV Optimizations

dragotin edited this page Jul 1, 2013 · 5 revisions

E-Tags

Introduction

The ownCloud clients use the WebDAV protocol (RFC 2518 et al) to access the files on the ownCloud Server, adding sync semantics on top.

When a sync run is performed, the clients need to know which files have changed. This can be achieved through time stamp comparison (dangerous due to clock skew on laptops and virtual servers) or through hash sums (not an option with ownCloud's storage backend architecture, where some backends allow shared access and at the same time cannot provide hash sums on their own).

There are two naïve approaches to list the remote contents so the sync algorithm can judge what changed: Recursively iterating through all directories (called collections in WebDAV lingo) on the server via PROPFIND (slow due to roundtrips), or performing a single recursive PROPFIND with infinite depth (resulting in a potentially huge amount of data on every check).
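As an illustration, a single-level PROPFIND (Depth: 1) as defined in RFC 4918 lists a folder and its direct children in one round trip; the host and path below are placeholders:

```
PROPFIND /remote.php/webdav/ HTTP/1.1
Host: cloud.example.com
Depth: 1
Content-Type: application/xml

<?xml version="1.0" encoding="utf-8"?>
<d:propfind xmlns:d="DAV:">
  <d:prop>
    <d:getlastmodified/>
    <d:getetag/>
  </d:prop>
</d:propfind>
```

A recursive listing either repeats this request per folder or uses `Depth: infinity`, which is exactly the trade-off described above.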

Optimizations of Client Sync Operations

The ownCloud clients store the server filesystem layout in a local cache, and hence would only be interested in the difference between two consecutive sync runs. That leads to potential for optimization, which is exploited as follows:

Since ownCloud 4.5, the server provides an E-Tag property for every folder and file as part of its PROPFIND response, as proposed in RFC 4918. As soon as a file changes, it is assigned a new E-Tag. Likewise, its folder receives a new E-Tag, as does that folder's parent, and so on up to the root. This way, client traffic is reduced to a single HEAD or PROPFIND request in case nothing has changed.

If something changes, the relevant changes can be quickly evaluated with a few PROPFINDs for those directories where the E-Tag has changed.
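The resulting top-down walk can be sketched as follows. This is an illustrative in-memory model with hypothetical helper names; the real client issues a Depth: 1 PROPFIND per folder instead of reading from a dictionary:

```python
# Sketch of the E-Tag based discovery walk. Both maps use
# folder path -> (etag, list of child folder paths).

def changed_paths(remote_etags, cached_etags, path="/"):
    """Return folders whose E-Tag differs from the cached value.

    A subtree whose root E-Tag is unchanged is pruned entirely,
    which is what saves the client most of the PROPFIND traffic.
    """
    etag, children = remote_etags[path]
    cached_etag = cached_etags.get(path, (None, []))[0]
    if etag == cached_etag:
        return []                      # subtree unchanged: prune here
    hits = [path]
    for child in children:
        hits += changed_paths(remote_etags, cached_etags, child)
    return hits

remote = {
    "/":     ("e7", ["/docs", "/pics"]),
    "/docs": ("e5", []),               # changed on the server
    "/pics": ("e2", []),               # unchanged
}
cache = {
    "/":     ("e6", ["/docs", "/pics"]),
    "/docs": ("e4", []),
    "/pics": ("e2", []),
}
print(changed_paths(remote, cache))    # -> ['/', '/docs']
```

Note how `/pics` is never descended into: its unchanged E-Tag guarantees that nothing below it changed.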

Case consideration for Client-Operations affecting E-Tags

  • Server has new file or directory: E-Tag gets assigned, E-Tags of parent directories are modified up until the root directory, PROPFIND result contains a new E-Tag yet unknown to client.
  • Server file gets changed: E-Tag changes, changes are propagated to all parent directories.
  • File on server gets deleted: File is deleted, E-Tag of parent directories is modified recursively.
  • Client has new file or directory: No existing entry in the client sync database is found. Once a file has been uploaded to the server via PUT, a PROPFIND is issued to retrieve the E-Tag value, which is then entered into the local sync database.
  • Client file gets deleted: The file entry exists in the client sync database but no longer on the file system.
  • Client file gets changed: The file's modification time differs from the one stored in the database.
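The client-side cases above boil down to set and timestamp comparisons against the sync database. A minimal sketch, assuming a hypothetical data layout that maps path to modification time:

```python
def classify_local(fs_mtimes, db_mtimes):
    """Classify local files as new, deleted, or changed by comparing
    the file system state against the sync database.

    Both arguments map path -> modification time; this flat layout
    is an illustrative assumption, not the real database schema.
    """
    new = sorted(fs_mtimes.keys() - db_mtimes.keys())        # on disk, not in DB
    deleted = sorted(db_mtimes.keys() - fs_mtimes.keys())    # in DB, gone from disk
    changed = sorted(p for p in fs_mtimes.keys() & db_mtimes.keys()
                     if fs_mtimes[p] != db_mtimes[p])        # mtime differs
    return new, deleted, changed

fs = {"a.txt": 10, "b.txt": 12, "new.txt": 5}
db = {"a.txt": 10, "b.txt": 11, "gone.txt": 3}
print(classify_local(fs, db))  # -> (['new.txt'], ['gone.txt'], ['b.txt'])
```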

Server and Backend Implementation Details

Whenever a new file is created on the server, it is added to the server-side meta-data cache and a new E-Tag is generated and stored there. Additionally, the E-Tags of the new file's parent folders are re-generated.

Whenever a file on the server is changed, either through the web interface or through WebDAV, the meta-data in the cache is updated and a new E-Tag is generated for the updated file and its parent folders.
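The upward E-Tag propagation can be sketched like this. The flat cache layout and the random E-Tag format are assumptions for illustration; the real server persists E-Tags in its meta-data tables:

```python
import os
import uuid

def touch_etag(cache, path):
    """Assign a fresh E-Tag to `path` and to every parent folder up to
    the root, mirroring the propagation described above.

    `cache` maps path -> E-Tag (hypothetical layout); the E-Tag value
    here is just a random token, which is an assumption.
    """
    while True:
        cache[path] = uuid.uuid4().hex[:8]
        if path == "/":
            break
        path = os.path.dirname(path) or "/"

cache = {"/": "r0", "/docs": "d0", "/docs/a.txt": "a0", "/pics": "p0"}
touch_etag(cache, "/docs/a.txt")
# /docs/a.txt, /docs and / now carry fresh E-Tags; /pics is untouched.
```

Because siblings keep their E-Tags, a client walking down from the root can prune every unchanged subtree.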

If a change is made not through ownCloud but directly on the underlying filesystem, the server first needs to detect the change. This can happen in one of the following ways:

  • Every time a folder is opened in the web interface or via WebDAV, that folder is checked for changes.
  • A cron job walks through all the folders in the meta-data cache and checks them for changes.
  • A CLI interface that can be used to trigger a check for changes, for example by a daemon using inotify to watch for changes made in the filesystem.
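A detection pass of the kind the cron-job or CLI variant would run can be sketched as a walk that compares on-disk modification times against cached values. The cache layout is a hypothetical simplification; real server-side change detection is backend-specific:

```python
import os

def detect_external_changes(root, cache):
    """Walk `root` and report files whose on-disk mtime differs from
    the cached value, updating the cache as we go.

    `cache` maps absolute path -> mtime (illustrative layout). Each
    reported path is one whose E-Tag would need to be re-generated,
    along with its parent folders.
    """
    dirty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if cache.get(path) != mtime:
                dirty.append(path)
                cache[path] = mtime    # remember the new state
    return dirty
```

A first pass over a tree reports every file as dirty (the cache is empty); an immediate second pass reports nothing.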

Limitations

WebDAV transmits results as XML. For large data sets these responses can grow big, making transfer and parsing time consuming.

The current sync mechanism runs through three phases: the first, called updating, scans the local and remote replica to gather information about the current state on both sides. The second, called reconciling, calculates the required changes, i.e. which files have to be up- or downloaded. The third, called propagation, does the actual copying.

Since some time can pass between the update phase and the actual propagation of a change, there is a risk that the source file changes again in between. Such a change is detected, and the file is synced again in the next run. ...