
YapDatabaseCloudCore

Robbie Hanson edited this page Nov 12, 2018 · 14 revisions

I can sync to that.

YapDatabaseCloudCore is a general cloud syncing system that allows you to support a wide variety of cloud systems. For example: Dropbox, Box, Google Drive, Amazon Cloud Drive, ownCloud ... (you get the idea)

Overview

There are a lot of cloud services out there. And from a consumer's perspective, there's a lot of variation. Some services integrate better with different operating systems. Some are more tailored for enterprise. And then there's pricing.

But what about from a developer's perspective? What if we ignored the client apps, and the pricing tiers, and the marketing? What if we just looked at the developer APIs for each cloud service? What would we find?

We'd find an awful lot of similarity. All of these services offer a REST API. Most are file based (although some are record based). And they all provide some kind of revision system. For example, if we're going to upload a modified version of a file, our HTTP request probably has some kind of header field which indicates the previous version number of the file we're trying to update. And the server would reject the update request if we're out-of-date. (Just like a git system would reject a "push", if we're out-of-date and need to do a "pull" first.)
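The revision check described above is a form of optimistic concurrency control. Below is a minimal sketch in plain C (which also compiles as Objective-C). It assumes a hypothetical server that keeps one file with a numeric etag and handles a conditional PUT the way HTTP If-Match does. The write succeeds with 200 when the client's etag matches. It fails with 412 Precondition Failed when the client is out of date. CloudFile and server_put are illustrative names, not part of any real cloud API.

```c
#include <stdio.h>

/* Hypothetical server-side state for a single cloud file. */
typedef struct {
    char contents[256];
    unsigned etag; /* bumped on every successful write */
} CloudFile;

enum { HTTP_OK = 200, HTTP_PRECONDITION_FAILED = 412 };

/* Conditional PUT (If-Match semantics): only succeeds if the client's
 * etag matches the server's current one. On success the new etag is
 * written to *new_etag; on failure nothing changes. */
int server_put(CloudFile *file, unsigned if_match, const char *body,
               unsigned *new_etag)
{
    if (if_match != file->etag)
        return HTTP_PRECONDITION_FAILED; /* out of date: "pull" first */
    snprintf(file->contents, sizeof file->contents, "%s", body);
    file->etag += 1;
    *new_etag = file->etag;
    return HTTP_OK;
}
```

A client whose etag is stale gets 412 and must fetch the latest version before retrying, which mirrors the git push/pull analogy above.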

YapDatabaseCloudCore is cloud service agnostic. That is, it's designed to be able to support just about any type of cloud service you can throw at it. And it supports both objects (key/value pairs), and also regular files (jpg, pdf, mp4, etc).

Operation Based

A common misperception is that a record-based cloud service is required in order to support syncing objects (records with key/value pairs). This is untrue. And YapDatabaseCloudCore helps you to support either record-based cloud services OR file-based cloud services.

There's a whole bunch of (cheap & efficient) file-based sync services out there. But developers primarily work with objects. And objects aren't files, right? ... But they could be. We already know how to serialize our objects for storage in a database. It's not difficult to extend this concept of serialization for storage in a file in the cloud. We could use JSON, or XML, or protocol buffers, or something custom. So, theoretically, we could store all of our objects as individual files in the cloud.

Locally we're still using a database. We just need to adapt the object for storage in the cloud, in a format which the cloud service supports.

You'll note that this is true for file-based cloud services, as well as record-based cloud services. For example, many record-based cloud services only support JSON, or only support a handful of data types. Thus, to store something like a color in the cloud, you'd need to convert it to/from a string.
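Here is a minimal sketch of that color conversion in plain C. It assumes a hypothetical Color struct with 8-bit channels and a "#RRGGBBAA" hex string as the agreed cloud representation; any documented string format would do. The decoder rejects malformed input instead of guessing.

```c
#include <stdio.h>

/* Hypothetical in-app color type with 8-bit channels. */
typedef struct {
    unsigned char r, g, b, a;
} Color;

/* Encode as "#RRGGBBAA" (9 characters plus the terminating NUL). */
void color_to_string(Color c, char out[10])
{
    snprintf(out, 10, "#%02X%02X%02X%02X",
             (unsigned)c.r, (unsigned)c.g, (unsigned)c.b, (unsigned)c.a);
}

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    return -1;
}

/* Decode "#RRGGBBAA". Returns 1 on success, 0 for malformed input. */
int color_from_string(const char *s, Color *out)
{
    unsigned char ch[4];
    if (s[0] != '#')
        return 0;
    for (int i = 0; i < 4; i++) {
        int hi = hex_value(s[1 + 2 * i]);
        if (hi < 0)
            return 0;             /* also stops at an early NUL */
        int lo = hex_value(s[2 + 2 * i]);
        if (lo < 0)
            return 0;
        ch[i] = (unsigned char)(hi * 16 + lo);
    }
    if (s[9] != '\0')
        return 0;                 /* trailing garbage */
    out->r = ch[0]; out->g = ch[1]; out->b = ch[2]; out->a = ch[3];
    return 1;
}
```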

YapDatabaseCloudCore assists you by helping you store a "queue of operations". That is, an ordered list of changes that you need to push to the server. You have all the control in terms of what information gets stored in an "operation", and you're the one that's responsible for executing the "operation". Thus just about any kind of cloud service can be supported.

Basics

Imagine we want to support FooBarCloud, the latest imaginary cloud service provider. Since YapDatabaseCloudCore is kind of a "bring your own cloud API" system, we can support it.

To break this down, let's go through all the various steps one would need to take in order to sync an object to the FooBarCloud. For example, let's say we have a contact book app, and the user has just modified the firstName of a contact, and hit 'Save'. This translates into a modified MYContact object that gets saved into the local database. Here's everything that needs to happen:

  • Within the same atomic transaction that modified the MYContact object, we need to record the fact that this contact was changed, and needs to be pushed to the cloud.
  • Also within the same atomic transaction, we should record what changes were made. So something like: "old:{firstName: Robby}, new:{firstName: Robbie}". (This will allow us to properly merge changes in the event of a conflict. More on this topic later.)
  • After the transaction, we need to generate the file-based representation of the contact object.
  • Then we need to fetch, from our local database, the latest revision tag for the contact. Since FooBarCloud uses the etag system, this means the latest known etag we have for the corresponding URL.
  • Then we need to perform the proper HTTP REST call to upload the file-based representation to the cloud. For our FooBarCloud, this would mean performing a PUT to /contacts/the_contact_uuid.json. (And we have to specify: If-Match: PreviousETag)
  • Assuming we get a 200 OK response from the server, we need to execute a new atomic transaction to remove the "flag" in the database that says the upload needs to be performed. And remove the saved information regarding what changed. (The "old:{}, new:{}" stuff.)
  • Also in this same transaction we need to record the new etag.
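The steps above can be sketched in plain C as follows. Everything here is a hypothetical stand-in. ContactRow plays the local database row (object, upload flag, recorded change, and last known etag), FooBarFile plays the FooBarCloud file, and the two "transactions" are ordinary function calls. It only illustrates the bookkeeping around the HTTP request and does not reflect how YapDatabaseCloudCore stores any of this internally.

```c
#include <stdio.h>

/* Hypothetical local-database row for one contact. */
typedef struct {
    char first_name[64];
    int needs_upload;          /* the "flag": this contact must be pushed */
    char old_first_name[64];   /* recorded change ("old:{...}")           */
    char new_first_name[64];   /* recorded change ("new:{...}")           */
    unsigned etag;             /* latest known etag for the contact's URL */
} ContactRow;

/* Hypothetical stand-in for the FooBarCloud file /contacts/<uuid>.json. */
typedef struct {
    unsigned etag;
    char body[192];
} FooBarFile;

/* PUT with If-Match: 412 if our etag is stale, otherwise store and 200. */
int foobar_put(FooBarFile *file, unsigned if_match, const char *body,
               unsigned *new_etag)
{
    if (if_match != file->etag)
        return 412;
    snprintf(file->body, sizeof file->body, "%s", body);
    *new_etag = ++file->etag;
    return 200;
}

/* Bullets 1-2: one atomic local transaction modifies the object, sets the
 * flag, and records what changed (keeping the original "old" value if an
 * earlier change is still waiting to be pushed). */
void save_first_name(ContactRow *row, const char *name)
{
    if (!row->needs_upload)
        snprintf(row->old_first_name, sizeof row->old_first_name, "%s",
                 row->first_name);
    snprintf(row->new_first_name, sizeof row->new_first_name, "%s", name);
    snprintf(row->first_name, sizeof row->first_name, "%s", name);
    row->needs_upload = 1;
}

/* Bullets 3-7: serialize, upload with the last known etag, and on 200 OK
 * run a second "transaction" that clears the flag and the recorded change
 * and saves the new etag. Returns the HTTP status code. */
int push_contact(ContactRow *row, FooBarFile *server)
{
    char json[96];
    unsigned new_etag = 0;
    snprintf(json, sizeof json, "{\"firstName\":\"%s\"}", row->first_name);
    int status = foobar_put(server, row->etag, json, &new_etag);
    if (status == 200) {
        row->needs_upload = 0;
        row->old_first_name[0] = '\0';
        row->new_first_name[0] = '\0';
        row->etag = new_etag;
    }
    return status;
}
```

Only foobar_put would change if you moved to a different service. The bookkeeping in save_first_name and push_contact would stay the same.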

So ... that's a lot of boilerplate-type stuff that needs to happen. But you'll notice that the actual HTTP request was only a small portion of it. And in fact, if you switched from FooBarCloud to MooCowCloud, the only thing that would change would be the specifics of the HTTP request.

So all of that boilerplate-type stuff above, everything that isn't the HTTP request, is exactly what YapDatabaseCloudCore is for.

Operations

At the heart of the system is YapDatabaseCloudCoreOperation. When you need to perform some REST operation, you record that fact with one of these objects. That is, you create a YapDatabaseCloudCoreOperation instance to record whatever information you'll need in order to perform the REST operation in the future. And this instance gets stored in the database. So even if the user quits the app, and relaunches it tomorrow, the operation instance will automatically be restored.

Think about it like this: In a magical ideal world, the user's device would always be connected to the Internet. And the connection would be so fast, every network operation would be instantaneous. And there would never be upload errors or merge conflicts. And there would be rainbows and unicorns and rivers made out of chocolate. But in the real world we have to accept certain facts. The user might not have an Internet connection. And it might be slow. And there will be a delay between the moment we save something in the local database, and when that information hits the cloud. And there will be interrupted uploads and merge conflicts and death and taxes.

Long story short: You cannot simply perform the REST operation at the moment you need to, because you cannot guarantee it will succeed. Instead you need to record information about the REST operation that needs to be performed, wait for it to succeed (resolving conflicts as necessary), and then delete the recorded operation. And YapDatabaseCloudCoreOperation is what helps you with this mundane stuff.
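To make "record it, wait for success, then delete it" concrete, here is a minimal in-memory sketch in plain C. It uses hypothetical names (RecordedOp, OpQueue, queue_attempt, simulated_network) and keeps the queue in an array. In YapDatabaseCloudCore the operations are persisted in the database, which is what lets them survive an app relaunch. This sketch processes operations strictly in order and stops at the first failure.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OPS 16

/* Hypothetical recorded operation: just enough information to redo the
 * REST call later (a real one would be persisted in the database). */
typedef struct {
    int id;
    const char *cloud_path;
} RecordedOp;

typedef struct {
    RecordedOp ops[MAX_OPS];
    size_t count;
} OpQueue;

/* Record an operation. In the real system this happens inside the same
 * transaction that modified the object. Returns 0 if the queue is full. */
int queue_record(OpQueue *q, RecordedOp op)
{
    if (q->count == MAX_OPS)
        return 0;
    q->ops[q->count++] = op;
    return 1;
}

/* Try the recorded operations in order, stopping at the first failure so
 * the order is preserved. Succeeded operations are deleted; the rest stay
 * recorded for the next attempt. Returns how many are still pending. */
size_t queue_attempt(OpQueue *q, int (*perform)(const RecordedOp *, void *),
                     void *ctx)
{
    size_t done = 0;
    while (done < q->count && perform(&q->ops[done], ctx))
        done++;
    memmove(q->ops, q->ops + done, (q->count - done) * sizeof q->ops[0]);
    q->count -= done;
    return q->count;
}

/* Simulated network for the example: requests succeed while the budget
 * pointed to by ctx is positive, then the "connection" drops. */
int simulated_network(const RecordedOp *op, void *ctx)
{
    int *budget = ctx;
    (void)op;
    if (*budget <= 0)
        return 0;
    (*budget)--;
    return 1;
}
```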

There's a lot more to be said about operations (dependencies, priorities, graphs, pipelines, etc). But let's start with the basics.

YapDatabaseCloudCoreOperation

YapDatabaseCloudCoreOperation is a bare-bones class that provides the very minimal functionality required by the YapDatabaseCloudCore extension. As such, you're highly encouraged to subclass YapDatabaseCloudCoreOperation, and add your own properties. Whatever you need to facilitate the REST operations and your sync logic.

When subclassing, ensure that you:

  1. properly implement the NSCoding protocol
    (so your custom properties get saved to the database)
  2. properly implement the NSCopying protocol
    (because the system will make copies of your instances)
  3. properly override the isEqualToOperation: method
    (because the system uses this method to see if a database write is required in various cases)

Operation Order

When syncing objects to a cloud service provider, one of the most difficult tasks is properly ordering all the operations.

Your local database (such as YapDatabase) supports atomic transactions. This means you can change multiple objects simultaneously, and save them all to the database in one single atomic transaction. Your cloud service provider, however, likely does NOT support atomic transactions involving multiple files.

This can make things complicated when our objects:

  • have references to other objects
  • and those references are expected to be valid

For example, a new PurchaseOrder object may point to a new Customer object. Within the local database, we can store both the new PurchaseOrder object and new Customer object within the same atomic transaction. But when pushing these objects to the cloud, we can only push one at a time. So, in this case, we'd like to push the Customer object first, and then the PurchaseOrder.

YapDatabaseCloudCore solves these problems using operation dependencies.

Operation Dependencies

Every operation can be assigned a set of dependencies. That is, you can specify that operationB depends on operationA.

NSString *pathA = @"/files/topSecret.txt.key";
NSString *pathB = @"/files/topSecret.txt.encrypted";

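// FooBarCloudOperation: your own YapDatabaseCloudCoreOperation subclass (for the imaginary FooBarCloud)
FooBarCloudOperation *opA = [FooBarCloudOperation uploadWithCloudPath:pathA];
FooBarCloudOperation *opB = [FooBarCloudOperation uploadWithCloudPath:pathB];

// opB (the encrypted file) must not start until opA (its key) has completed
[opB addDependency:opA.uuid];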

The system will then ensure that opB is not started until opA has completed.

Thus you can think of operation dependencies as HARD REQUIREMENTS.

Operation Priorities

In contrast to dependencies, operation priorities are more like SOFT HINTS.

That is, you can give an operation a higher or lower priority as a hint to the system. And YapDatabaseCloudCore will take the priority into consideration (along with dependencies) when dispatching operations to you.

Graphs

Every database commit may generate zero or more operations. If one or more operations are created, then a YapDatabaseCloudCoreGraph instance is created to manage all the operations (for the commit). The graph will take into account each operation's dependencies (and priorities), and it will conceptually create a graph of the operations.

For example, say the following operations are created:

  • opA (priority=100, dependencies={})
  • opB (priority=0, dependencies={})
  • opC (priority=0, dependencies={opB})
  • opD (priority=0, dependencies={opB})

The graph will deduce:

  • opA should be executed first, as it has no dependencies and the highest priority
  • opB can be executed next. opB does NOT need to wait for opA to finish. So opB can be executed in parallel (while opA is still in flight).
  • opC & opD cannot be started until opB completes.
  • opC & opD can be executed in any order, and can execute in parallel.
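The deduction above can be reproduced with a small readiness check. This plain-C sketch uses a hypothetical GraphOp record whose dependencies are a bitmask of indices within the same graph, so it handles at most 32 operations. An operation is ready when it is still pending and all of its dependencies have completed. Ready operations are ordered by descending priority. It models the behavior described here and is not YapDatabaseCloudCoreGraph's actual implementation.

```c
#include <stddef.h>

enum OpStatus { OP_PENDING, OP_IN_FLIGHT, OP_COMPLETED };

/* Hypothetical operation record. deps is a bitmask over indices in the
 * same graph: bit d set means "depends on ops[d]" (so n <= 32). */
typedef struct {
    const char *name;
    int priority;
    unsigned deps;
    enum OpStatus status;
} GraphOp;

/* Ready = still pending, and every dependency has completed. */
int op_is_ready(const GraphOp *ops, size_t n, size_t i)
{
    if (ops[i].status != OP_PENDING)
        return 0;
    for (size_t d = 0; d < n; d++)
        if (((ops[i].deps >> d) & 1u) && ops[d].status != OP_COMPLETED)
            return 0;
    return 1;
}

/* Write the indices of all ready ops into out[] (capacity n), highest
 * priority first; equal priorities keep graph order. Returns the count. */
size_t graph_ready_ops(const GraphOp *ops, size_t n, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!op_is_ready(ops, n, i))
            continue;
        size_t j = count;                 /* stable insertion sort */
        while (j > 0 && ops[out[j - 1]].priority < ops[i].priority) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = i;
        count++;
    }
    return count;
}
```

On the four operations above, the initial ready set is opA followed by opB. Once opB completes, opC and opD become ready together.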

Further, you do NOT need to worry about dependencies between commits. (For example, if objectA was created in commit #4, and objectB was created in commit #5, and objectB references objectA...) This is a non-issue in YapDatabaseCloudCore (when using the default settings), because each commit gets its own graph. And the graph for commit #4 MUST complete in its entirety before the graph for commit #5 can start.
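The commit-by-commit rule can be expressed as a single check. This plain-C sketch uses hypothetical names: PipelineOp, oldest_active_graph, and graph_allows_dispatch. Each operation remembers the index of the graph for the commit that created it. An unfinished operation is eligible only if it belongs to the oldest graph that still has unfinished work. Dependencies and priorities within that graph would then be applied as usual.

```c
#include <stddef.h>

#define NO_ACTIVE_GRAPH ((size_t)-1)

/* Hypothetical view of one operation in a pipeline. graph_idx is the
 * graph (commit) that created it: lower = earlier commit. */
typedef struct {
    size_t graph_idx;
    int completed;
} PipelineOp;

/* The oldest graph that still has unfinished operations. */
size_t oldest_active_graph(const PipelineOp *ops, size_t n)
{
    size_t oldest = NO_ACTIVE_GRAPH;
    for (size_t i = 0; i < n; i++)
        if (!ops[i].completed && ops[i].graph_idx < oldest)
            oldest = ops[i].graph_idx;
    return oldest;
}

/* Default rule: an unfinished operation is eligible only if its graph is
 * the oldest active one; later graphs wait until earlier ones finish. */
int graph_allows_dispatch(const PipelineOp *ops, size_t n, size_t i)
{
    return !ops[i].completed &&
           ops[i].graph_idx == oldest_active_graph(ops, n);
}
```

Suppose commit #4 is graph 0 and commit #5 is graph 1. Then objectB's upload becomes eligible only once objectA's upload has completed.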

(This is the default setting. However, there are several optimizations that you can implement to speed things up !)

Pipelines

A pipeline is the operation-management system that you'll interact with.

Every operation that you create must be assigned to a single pipeline. The pipeline will then insert the operation into a graph, and it will then dispatch operations to the delegate when the operations are ready to be uploaded to the cloud.

Every pipeline has a delegate (that's you), which must implement a single method:

@protocol YapDatabaseCloudCorePipelineDelegate
@required

- (void)startOperation:(YapDatabaseCloudCoreOperation *)operation
           forPipeline:(YapDatabaseCloudCorePipeline *)pipeline;

@end

The delegate method is where you'll perform the REST API stuff for the operation.

Getting Started

The first thing you need to do is register a YapDatabaseCloudCore extension. For general information about extensions within YapDatabase, see the extensions wiki article.

YapDatabaseCloudCore *ext =
  [[YapDatabaseCloudCore alloc] initWithVersionTag: @"v1"
                                           options: nil];

YapDatabaseCloudCorePipeline *pipeline =
  [[YapDatabaseCloudCorePipeline alloc] initWithName: YapDatabaseCloudCoreDefaultPipelineName
                                            delegate: self];

[ext registerPipeline:pipeline];

[database asyncRegisterExtension: ext
                        withName: @"my_cloud_core_ext"
                 completionQueue: dispatch_get_main_queue()
                 completionBlock:^(BOOL ready)
{
  NSLog(@"Cloud core extension registered !");
}];

Once your extension is registered, the general flow of things is like this:

  1. you create operations as needed, and add them to a pipeline

    [databaseConnection readWriteWithBlock:^(YapDatabaseReadWriteTransaction *transaction){
    
        // other stuff, such as modifying an object, etc...
        
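        // Create the operation (usually an instance of your own subclass,
        // such as the FooBarCloudOperation example from above)
        YapDatabaseCloudCoreOperation *op =
          [FooBarCloudOperation uploadWithCloudPath:@"/contacts/the_contact_uuid.json"];
        [[transaction ext:@"my_cloud_core_ext"] addOperation:op]; // add to pipeline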
        
        // Which pipeline? => specified by op.pipeline property
        // If op.pipeline is nil, uses default pipeline
    }];
    
  2. the pipeline handles the grunt work:

    • analyzing the dependencies you set
    • analyzing the priorities you set
    • tracking how many operations are in-flight
    • comparing this to the configured maxConcurrentOperationCount
    • checking to see if the pipeline is suspended (for example, you might suspend the pipeline when you detect there's not an Internet connection).
  3. when the pipeline determines an operation can be dispatched, it will invoke the delegate method

    - (void)startOperation:(YapDatabaseCloudCoreOperation *)operation
               forPipeline:(YapDatabaseCloudCorePipeline *)pipeline
    {
        // Here's where you perform the proper REST API
        // stuff for the operation...
    }
    
  4. If the REST operation succeeds, you update the database accordingly, and also notify the pipeline

    [databaseConnection readWriteWithBlock:^(YapDatabaseReadWriteTransaction *transaction){
    
        // other stuff, such as modifying an object, etc...
        
        [[transaction ext:@"my_cloud_core_ext"] completeOperation:op]; // Done !
    }];
    
  5. Or, if the REST operation doesn't succeed, you can tell the pipeline what to do:

    // Option A:
    // Tell the pipeline to restart it whenever it's ready
    [pipeline setStatusAsPendingForOperation:op];
    
    // Option B:
    // Tell the pipeline to restart it whenever it's ready,
    // but after a delay (e.g. exponential backoff algorithm)
    [pipeline setStatusAsPendingForOperation:op retryDelay:10];
    
    // Option C:
    // Suspend the pipeline in order to resolve a conflict
    [pipeline suspend];
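For Option B, the retry delay is often computed with capped exponential backoff. Below is a minimal plain-C sketch. It assumes you track the number of consecutive failures per operation yourself; backoff_delay and its parameters are hypothetical. The result would be the value passed as retryDelay.

```c
/* Capped exponential backoff: base * 2^(failures - 1), never above max.
 * failures = number of consecutive failed attempts (expected >= 1). */
double backoff_delay(unsigned failures, double base_seconds,
                     double max_seconds)
{
    double delay = base_seconds;
    for (unsigned i = 1; i < failures && delay < max_seconds; i++)
        delay *= 2.0;
    return delay < max_seconds ? delay : max_seconds;
}
```

With a 2-second base and a 60-second cap, successive failures wait 2, 4, 8, 16, 32, then 60 seconds. Adding random jitter is a common refinement when many devices may retry at the same moment.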
    

Dealing with conflicts

If you've read this far, you've probably come to the conclusion that YapDatabaseCloudCore is pretty boring. And you're right.

  • YOU have to create all operations
  • YOU have to figure out what information to store in the operations
  • YOU have to configure the operation dependencies & priorities
  • YOU have to perform the REST API operations
  • YOU have to deal with errors & conflicts

YapDatabaseCloudCore just handles a LOT of various grunt work for you. Grunt work that you'd have to spend a bunch of time coding and testing yourself. And grunt work that is largely the same regardless of cloud platform.
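Conflict handling is left to you. Still, the "old:{...}, new:{...}" record from the Basics section makes a field-by-field three-way merge straightforward. The plain-C sketch below assumes the upload was rejected (for example with 412) and you have fetched the server's current value for the field. merge_field and MergeResult are hypothetical names, and what to do on a true conflict is an app-specific policy.

```c
#include <string.h>

typedef enum {
    MERGE_TAKE_LOCAL,     /* server still has our "old": reapply our "new" */
    MERGE_ALREADY_EQUAL,  /* server already has our "new": nothing to push */
    MERGE_CONFLICT        /* both sides changed the field differently      */
} MergeResult;

/* Three-way merge of one field after a rejected upload.
 * old_value / new_value: the change we recorded locally.
 * server_value: what the cloud holds now. */
MergeResult merge_field(const char *old_value, const char *new_value,
                        const char *server_value)
{
    if (strcmp(server_value, new_value) == 0)
        return MERGE_ALREADY_EQUAL;
    if (strcmp(server_value, old_value) == 0)
        return MERGE_TAKE_LOCAL;  /* nobody else touched this field */
    return MERGE_CONFLICT;        /* app-specific policy decides */
}
```

For MERGE_TAKE_LOCAL you would reapply the local value on top of the server's version and retry the upload with the server's new etag.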

Optimizations - Part 1

Most applications will simply use a single pipeline (the default pipeline). However, YapDatabaseCloudCore supports multiple pipelines, which opens up some interesting possibilities.

For example, let's say that we're making a recipe app. Which means that we're syncing Recipe objects and Photos (that the user takes of the prepared recipe).

Uploading a recipe is quick, as the recipe object/file is rather small. However, uploading a photo of the recipe is going to take a lot longer. Since we want to store full-size photos, we're uploading files that are several megabytes each. This isn't a problem, but it may have an effect on our syncing.

For example, imagine the user performs the following actions (in this order):

  • adds a photo to a recipe
  • creates a new recipe (appetizer)
  • creates a new recipe (cookies)

This results in 3 new operations:

  • upload photo (8 megabytes) (graph #1)
  • upload new recipe (appetizer) (graph #2)
  • upload new recipe (cookies) (graph #3)

Which means the recipes won't hit the server until after the photo has been uploaded. Maybe this is what you want. But what if it's not? What if you want the large photos to not block the recipe objects?

One easy solution is to move the photos to a different pipeline.

Here's how it works:

  • Every operation must be assigned to exactly one pipeline.
  • Every pipeline operates independently.
  • Operations cannot have cross-pipeline dependencies.

So if you moved all photo operations to their own pipeline, then the upload of these large files won't block the upload of changes to recipe objects.

This is the "low-hanging fruit optimization". That is, it addresses the most common complaint (a single large upload blocking the entire queue), in the most common scenario (the large uploads can be done separately from all the other objects, or can otherwise be managed independently).
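A rough timing model shows why a separate pipeline helps. This plain-C sketch assumes each pipeline uploads its operations one at a time, in order, independently of other pipelines. The upload durations are made up, and UploadJob and simulate_pipelines are hypothetical names.

```c
#include <stddef.h>

#define MAX_PIPELINES 4

/* Hypothetical upload job: its pipeline and an estimated duration. */
typedef struct {
    const char *name;
    int pipeline;     /* 0 .. MAX_PIPELINES - 1 */
    double seconds;   /* made-up upload time */
} UploadJob;

/* Each pipeline runs its jobs serially, in order, independently of the
 * other pipelines. Writes each job's finish time into finish[]. */
void simulate_pipelines(const UploadJob *jobs, size_t n, double *finish)
{
    double busy_until[MAX_PIPELINES] = { 0 };
    for (size_t i = 0; i < n; i++) {
        busy_until[jobs[i].pipeline] += jobs[i].seconds;
        finish[i] = busy_until[jobs[i].pipeline];
    }
}
```

With everything in one pipeline, the two recipes finish at 30.5 and 31.0 seconds. With the photo in its own pipeline, they finish at 0.5 and 1.0 seconds, while the photo still takes 30 seconds.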

Optimizations - Part 2

Are you a pro at managing operations & dependencies ?

Are you ready to point a (metaphorical) loaded shotgun at your foot, and prepared to deal with the consequences ?

Well then, you're in luck. Because YapDatabaseCloudCorePipeline supports the concept of a single "flat graph".

Here's the deal:

  • it's much much much easier to deal with operation dependencies within the context of a single commit
  • this is a bite sized amount of changes that's easy to grok, and easy to manage
  • by default, the pipeline will place all operations within a single commit into a single graph
  • and it will refuse to start any operations from that graph until all previous graphs (from previous commits) have fully completed
  • this ensures the cloud state moves steadily from commit to commit, and generally frees you from dealing with millions of crazy edge cases

However, this safety comes at a cost - speed.

If you watch the queue in action, there will likely be many times when you think, "hey, this operation could be running in parallel without any problems". Except it doesn't, because it's from a later commit.

There is a way to improve this, but it comes at a cost - more code & more responsibility.

YapDatabaseCloudCorePipeline supports an advanced mode of operation called "FlatGraph". Here's how it works:

  • it will conceptually add every operation to a single graph
  • in reality, it still puts each operation into a graph corresponding to its specific commit
  • but it doesn't restrict itself to only dispatching operations from the lowest commit graph
  • for example, if the maxConcurrentOperationCount is 5, and there aren't any more operations that can be dispatched from graphA (commitA), then it will go about dispatching operations from graphB (commitB)
  • assuming multiple operations can be dispatched, all with the same priority value, it will give precedence to operations from earlier commits
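Here is one plausible reading of the FlatGraph selection rule, as a plain-C sketch. Among operations from any graph whose dependencies are satisfied, it chooses the highest priority and breaks ties in favor of the earlier commit. FlatOp and flat_pick_next are hypothetical names. The ready flag stands in for the dependency and in-flight checks, and the caller is assumed to enforce maxConcurrentOperationCount.

```c
#include <stddef.h>

/* Hypothetical operation as seen by the flat-graph scheduler. */
typedef struct {
    size_t graph_idx;   /* commit order: lower = earlier commit */
    int priority;
    int ready;          /* pending and all dependencies satisfied */
} FlatOp;

/* Consider ready ops from every graph; prefer higher priority, then the
 * earlier commit. Returns the chosen index, or -1 if nothing is ready. */
long flat_pick_next(const FlatOp *ops, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!ops[i].ready)
            continue;
        if (best < 0 ||
            ops[i].priority > ops[best].priority ||
            (ops[i].priority == ops[best].priority &&
             ops[i].graph_idx < ops[best].graph_idx))
            best = (long)i;
    }
    return best;
}
```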

This means you have a bunch of work to do. Specifically, you need to create a formal dependency graph. That is:

  • given any possible operation (opA) in commitA
  • and given any possible operation (opB) in commitB
  • your formal dependency graph must determine if opB should depend on opA
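What the formal dependency graph looks like depends entirely on your data model. As an illustration only, this plain-C sketch uses a made-up rule. A later operation depends on an earlier one if both touch the same cloud file. It also depends on it if the later operation's object references the file the earlier one writes, as in the PurchaseOrder and Customer case from Operation Order. CloudOpInfo and later_op_depends_on are hypothetical. In practice, a rule like this would decide which dependencies you add when overriding processOperations:inPipeline:withGraphIdx:.

```c
#include <string.h>

/* Hypothetical description of an operation, for an app-specific rule. */
typedef struct {
    const char *cloud_path;       /* file this op writes or deletes       */
    const char *references_path;  /* file its object points to, or NULL   */
} CloudOpInfo;

/* Made-up rule: a later op must wait for an earlier one if they touch the
 * same file, or if the later op's object references the earlier op's file. */
int later_op_depends_on(const CloudOpInfo *earlier, const CloudOpInfo *later)
{
    if (strcmp(earlier->cloud_path, later->cloud_path) == 0)
        return 1;
    if (later->references_path != NULL &&
        strcmp(later->references_path, earlier->cloud_path) == 0)
        return 1;
    return 0;
}
```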

This may or may not be easy, it all depends on your setup. If you think this is a challenge you want to take on, here's how to go about it.

  • Create a subclass of YapDatabaseCloudCore (MyAppCloud)

  • Create a subclass of YapDatabaseCloudCoreConnection (MyAppCloudConnection)

  • Create a subclass of YapDatabaseCloudCoreTransaction (MyAppCloudTransaction)

  • Override this method in YapDatabaseCloudCoreTransaction:

     - (NSArray *)processOperations:(NSArray *)inOperations
                         inPipeline:(YapDatabaseCloudCorePipeline *)pipeline
                       withGraphIdx:(NSUInteger)operationsGraphIdx