KnowledgeBase Data Sharing

dskyle edited this page Jul 22, 2018 · 3 revisions

The KnowledgeBase is capable of storing large objects: strings, integer arrays, double arrays, and binary files (i.e., blobs). The KnowledgeBase avoids copying these large values as much as possible, by storing them behind std::shared_ptrs. This page describes the semantics of this sharing, how to avoid copying of large data, and how to avoid undefined behavior with shared data.


KnowledgeRecord Implicit Sharing

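Large values are shared implicitly whenever they are stored in or retrieved from the KnowledgeBase. For example: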

#include "madara/knowledge/KnowledgeBase.h"
using namespace madara::knowledge;

KnowledgeBase kb;
kb.set("numbers", std::vector<double>(100, 42));

This will store a vector with 100 copies of the number 42 in the Madara variable "numbers". This operation itself will not copy the data of the vector.

// Set the first value to 47, in-place, without copying
kb.set_index("numbers", 0, 47);

KnowledgeRecord numbers = kb.get("numbers");

The KnowledgeRecord named numbers will contain a reference to the same internal vector. This get() operation does not copy the vector. However, calling to_doubles() to access this vector will cause all 100 doubles to be copied into a new vector. In addition, trying to modify the vector in either the KnowledgeBase or KnowledgeRecord with set_index will likewise cause a copy to be made. This is typically called "copy on write".
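The copy-on-write behavior described above can be sketched with the standard library alone. The following is a simplified, single-threaded model and not MADARA's actual implementation; `CowDoubles`, `shares_with`, and `demo_copy_on_write` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a record that holds an array of doubles.
// Copying a CowDoubles copies only the shared_ptr, not the vector.
class CowDoubles {
public:
  CowDoubles(std::size_t count, double value)
    : data_(std::make_shared<std::vector<double>>(count, value)) {}

  // Reading never copies the underlying vector.
  double get(std::size_t i) const { return data_->at(i); }

  // Writing clones the vector first if another handle still shares it.
  void set_index(std::size_t i, double value) {
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<double>>(*data_);
    }
    data_->at(i) = value;
  }

  // True if both handles refer to the same underlying vector.
  bool shares_with(const CowDoubles &other) const {
    return data_ == other.data_;
  }

private:
  std::shared_ptr<std::vector<double>> data_;
};

inline void demo_copy_on_write() {
  CowDoubles stored(100, 42.0);
  CowDoubles fetched = stored;          // like get(): no vector copy
  assert(stored.shares_with(fetched));

  fetched.set_index(0, 47.0);           // first write triggers the copy
  assert(!stored.shares_with(fetched));
  assert(stored.get(0) == 42.0);        // the original is untouched
  assert(fetched.get(0) == 47.0);
}
```

The sketch decides whether to clone by checking `use_count()`, which is only reliable when no other thread is copying handles at the same time.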

To access the shared data directly from a KnowledgeRecord, you can call retrieve_index(), but this only works effectively for integer or double array values.

Explicit Sharing

To access the vector without causing a copy, use share_doubles() instead of to_doubles(). This returns a std::shared_ptr<const std::vector<double>>. Note the const. The KnowledgeBase assumes that only it will modify the values it holds. Modifying the data through a shared pointer would violate that assumption, and would likely result in undefined behavior. You can, however, safely read the data via the shared pointer without any locking: if the KnowledgeBase needs to modify data it has shared, it first makes its own copy and leaves the shared data untouched. For example:

// You can use auto as variable type to avoid this long type name
std::shared_ptr<const std::vector<double>> shr_numbers = kb.share_doubles("numbers");

Now, shr_numbers refers to the same data held within the KnowledgeBase. If we continue with this:

kb.set_index("numbers", 1, 57);

the KnowledgeBase will copy its "numbers" variable before modifying the held data, since it knows it has shared that data previously. Note that this applies even if the shared_ptr has gone out of scope. Once shared externally, the KnowledgeBase treats that data as const.
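One way to picture this "once shared, always const" rule is a flag that is set when data is handed out and only cleared when the holder clones the data. This is a minimal sketch under that assumption, not MADARA's real code; `SharedOnceDoubles`, `share`, and `copies_made` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of a holder that remembers whether its data escaped.
class SharedOnceDoubles {
public:
  explicit SharedOnceDoubles(std::vector<double> values)
    : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

  // Hand out a read-only pointer and remember that the data has escaped.
  std::shared_ptr<const std::vector<double>> share() {
    shared_externally_ = true;
    return data_;
  }

  // Clone before writing if the data was ever shared, even if every
  // external shared_ptr has since been destroyed.
  void set_index(std::size_t i, double value) {
    if (shared_externally_) {
      data_ = std::make_shared<std::vector<double>>(*data_);
      shared_externally_ = false;  // the fresh copy is private again
      ++copies_made_;
    }
    data_->at(i) = value;
  }

  double get(std::size_t i) const { return data_->at(i); }
  int copies_made() const { return copies_made_; }

private:
  std::shared_ptr<std::vector<double>> data_;
  bool shared_externally_ = false;
  int copies_made_ = 0;
};

inline void demo_sticky_share() {
  SharedOnceDoubles store{std::vector<double>(100, 42.0)};
  {
    auto view = store.share();   // data escapes here
    assert(view->at(1) == 42.0);
  }                              // view is destroyed here
  store.set_index(1, 57.0);      // still clones before writing
  assert(store.copies_made() == 1);
  store.set_index(2, 58.0);      // private copy, written in place
  assert(store.copies_made() == 1);
}
```

Using a sticky flag rather than a reference count keeps the check cheap, at the cost of an occasional copy that a count would have avoided.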

Storing Data Without Copying

In the example given above, the "numbers" variable was inserted into the KnowledgeBase without copying because the vector was passed using an rvalue reference to the temporary vector. In contrast, this version does copy:

std::vector<double> numbers(100, 42);
kb.set("numbers", numbers);

Because the vector is not a temporary, it gets passed as an lvalue reference, which forces a copy operation inside set(). To avoid this copying, use this instead:

kb.set("numbers", std::move(numbers));

This will avoid copying the doubles, but will leave the numbers local variable empty. This does still result in a heap allocation to store the new std::vector itself.
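The difference between the two calls can be observed with plain standard-library code. The helpers below, `store_copy` and `store_move`, are hypothetical stand-ins for what a set() overload might do with an lvalue and an rvalue; they are not MADARA functions.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical: store an lvalue vector behind a shared_ptr (copies elements).
inline std::shared_ptr<std::vector<double>>
store_copy(const std::vector<double> &values) {
  return std::make_shared<std::vector<double>>(values);
}

// Hypothetical: store an rvalue vector behind a shared_ptr. The element
// buffer is taken over; only the small vector object is newly allocated.
inline std::shared_ptr<std::vector<double>>
store_move(std::vector<double> &&values) {
  return std::make_shared<std::vector<double>>(std::move(values));
}

inline void demo_copy_vs_move() {
  std::vector<double> numbers(100, 42.0);
  const double *buffer = numbers.data();

  auto copied = store_copy(numbers);
  assert(copied->data() != buffer);   // a second buffer was allocated
  assert(numbers.size() == 100);      // the source is unchanged

  auto moved = store_move(std::move(numbers));
  assert(moved->data() == buffer);    // the same buffer, now owned elsewhere
  assert(numbers.empty());            // the source was left empty
}
```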

You can also pass data into the KnowledgeBase using a std::unique_ptr. In this example:

std::unique_ptr<std::vector<double>> numbers_ptr = madara::mk_unique<std::vector<double>>(100, 42);
kb.set("numbers", std::move(numbers_ptr));

The pointer will be stored directly (although a heap allocation is still performed to store the shared_ptr control block). This unique_ptr approach usually only gives significant benefit to objects that have a large sizeof(), which cannot be moved efficiently.
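The standard library shows the same effect when a std::shared_ptr is constructed from a std::unique_ptr: the pointed-to object is adopted as-is, and only a control block is allocated. The `adopt` helper below is a hypothetical illustration, not part of MADARA.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical: take ownership of an already heap-allocated vector.
// The vector object itself is neither copied nor moved.
inline std::shared_ptr<std::vector<double>>
adopt(std::unique_ptr<std::vector<double>> ptr) {
  return std::shared_ptr<std::vector<double>>(std::move(ptr));
}

inline void demo_adopt() {
  std::unique_ptr<std::vector<double>> numbers_ptr(
      new std::vector<double>(100, 42.0));
  const std::vector<double> *raw = numbers_ptr.get();

  auto stored = adopt(std::move(numbers_ptr));
  assert(numbers_ptr == nullptr);  // ownership was transferred
  assert(stored.get() == raw);     // the very same vector object is held
  assert(stored->size() == 100);
}
```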

Safe Sharing

Either approach for storing existing data in a KnowledgeRecord without copying leaves open the possibility of undefined behavior if a pointer to the data is created prior to storage in a KnowledgeRecord. Even if this pointer isn't invalidated, using it directly is unsafe, as it will violate assumptions made by KnowledgeRecord and KnowledgeBase. If you require a pointer to the data once it is stored, use the appropriate share_* method after insertion. In addition, don't cast away the const qualifier on the shared_ptr. Modifying the data shared by the share_* methods is undefined behavior.

Constructing In-place

A safer and more efficient alternative to moving data into a KnowledgeRecord is to construct it in-place, using the emplace_* methods. Each type (other than scalar integers and doubles) has an emplace_* method which forwards all the arguments passed to the underlying type's constructor. You can also invoke in-place construction by passing an instance of a type tag from madara::knowledge::tags. For example:

// Construct a std::vector<double> in-place
KnowledgeRecord record(tags::doubles, 100, 42);
// Construct a std::string in-place
record.emplace_string("Hello World");
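Tag-based in-place construction of this kind is commonly built from an empty tag type plus perfect forwarding. The sketch below illustrates that general technique under stated assumptions; `sketch_tags`, `SketchRecord`, and its accessors are hypothetical and do not reproduce MADARA's classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch_tags {
// Empty type whose only job is to select a constructor overload.
struct doubles_t {};
constexpr doubles_t doubles{};
}  // namespace sketch_tags

// Hypothetical record that holds either a double array or a string.
class SketchRecord {
public:
  // Forward all remaining arguments to std::vector<double>'s constructor,
  // building the vector directly in its final heap location.
  template <typename... Args>
  explicit SketchRecord(sketch_tags::doubles_t, Args &&... args)
    : doubles_(std::make_shared<std::vector<double>>(
          std::forward<Args>(args)...)) {}

  // Replace the current value with a string constructed in place.
  template <typename... Args>
  void emplace_string(Args &&... args) {
    doubles_.reset();
    string_ = std::make_shared<std::string>(std::forward<Args>(args)...);
  }

  const std::vector<double> *doubles() const { return doubles_.get(); }
  const std::string *string() const { return string_.get(); }

private:
  std::shared_ptr<std::vector<double>> doubles_;
  std::shared_ptr<std::string> string_;
};

inline void demo_in_place() {
  SketchRecord record(sketch_tags::doubles, 100, 42.0);
  assert(record.doubles() != nullptr);
  assert(record.doubles()->size() == 100);

  record.emplace_string("Hello World");
  assert(record.doubles() == nullptr);
  assert(*record.string() == "Hello World");
}
```

Because the object is created where it will live, no pointer to it can exist before the record owns it, which is why this style avoids the hazards described under Safe Sharing.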