
[WIP] C API Proposal

Jack Gerrits edited this page May 8, 2020 · 14 revisions

Note: The object, function, and parameter names used in this document are not final. They only describe each item's intended purpose; the final names will need to be properly namespaced and C-ified.

Objects and typedefs

Exposed Objects

enum HashType 
{ 
  VW_DEFAULT_HASH, 
  VW_STRING_HASH, 
  VW_BYTE_HASH 
}; 

enum ErrorCode { /* TBD */ };

struct vw_feature  // Don’t expose the internal VW feature struct. 
{ 
  float value; 
  size_t weight_index; 
}; 

struct primitive_feature_space  // For manual construction and manipulation of an example's features 
{ 
  unsigned char name; 
  vw_feature* fs; 
  size_t len; 
}; 
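Below is a minimal C sketch of manual construction. It assumes the caller owns the `fs` array and that `name` holds a single-byte namespace identifier. The helpers `make_feature_space`, `set_feature` and `free_feature_space` are hypothetical and are not part of the proposal.

```c
#include <stddef.h>
#include <stdlib.h>

struct vw_feature
{
  float value;
  size_t weight_index;
};

struct primitive_feature_space
{
  unsigned char name;
  struct vw_feature* fs;
  size_t len;
};

/* Hypothetical helper: allocates `len` zeroed features for namespace `name`.
   Returns 0 on success, -1 on allocation failure. The caller owns `fs`. */
int make_feature_space(struct primitive_feature_space* space, unsigned char name, size_t len)
{
  space->name = name;
  space->fs = NULL;
  space->len = 0;
  if (len == 0) return 0;
  space->fs = calloc(len, sizeof(struct vw_feature));
  if (space->fs == NULL) return -1;
  space->len = len;
  return 0;
}

/* Hypothetical helper: sets feature `i`; returns -1 if `i` is out of range. */
int set_feature(struct primitive_feature_space* space, size_t i, float value, size_t weight_index)
{
  if (i >= space->len) return -1;
  space->fs[i].value = value;
  space->fs[i].weight_index = weight_index;
  return 0;
}

/* Releases the feature array and resets the space to empty. */
void free_feature_space(struct primitive_feature_space* space)
{
  free(space->fs);
  space->fs = NULL;
  space->len = 0;
}
```

In C, unlike C++, the member type must be written `struct vw_feature*` (as above) unless a typedef is added. This is one of the details the eventual C-ified header will need to handle.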

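The declarations immediately below are the current C++ initialization, teardown, and argument-handling functions from vw.h, listed for reference. The types after them, starting with ReductionType, would allow for custom reductions to be plugged into the VW reduction stack. This functionality currently does not exist in any form.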


vw* initialize(std::string s, io_buf* model = nullptr, bool skipModelLoad = false, trace_message_t trace_listener = nullptr, void* trace_context = nullptr); 

vw* initialize(int argc, char* argv[], io_buf* model = nullptr, bool skipModelLoad = false, trace_message_t trace_listener = nullptr, void* trace_context = nullptr); 

vw* seed_vw_model(vw* vw_model, std::string extra_args, trace_message_t trace_listener = nullptr, void* trace_context = nullptr);

// Allows the input command line string to have spaces escaped by '\' 
vw* initialize_escaped(std::string const& s, io_buf* model = nullptr, bool skipModelLoad = false, trace_message_t trace_listener = nullptr, void* trace_context = nullptr); 

void cmd_string_replace_value(std::stringstream*& ss, std::string flag_to_replace, std::string new_value); 

VW_DEPRECATED("By value version is deprecated, pass std::string by const ref instead using `to_argv`") 
char** get_argv_from_string(std::string s, int& argc); 

// The argv array returned by either of these functions must be freed with free_args. 
char** to_argv(std::string const& s, int& argc); 
char** to_argv_escaped(std::string const& s, int& argc); 
void free_args(int argc, char* argv[]); 

const char* are_features_compatible(vw& vw1, vw& vw2); 
/* 
  Call finish() after you are done with the vw instance.  This cleans up memory usage. 
 */ 
void finish(vw& all, bool delete_all = true); 

void sync_stats(vw& all); 
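The argv comment in the listing above makes the caller responsible for freeing the returned array. A C version of these helpers would need the same ownership contract. The sketch below is a hypothetical C analogue, not VW's implementation. `split_args` and `free_split_args` are illustrative names, and the code splits on spaces only; it does none of the backslash-escape handling that `to_argv_escaped` does.

```c
#include <stdlib.h>
#include <string.h>

/* Frees every token and then the array itself (hypothetical analogue of free_args). */
void free_split_args(int argc, char** argv)
{
  for (int i = 0; i < argc; i++) free(argv[i]);
  free(argv);
}

/* Hypothetical C analogue of to_argv: splits `s` on spaces (no quoting or
   escape handling). Returns a heap-allocated array of *argc heap-allocated
   strings that the caller must release with free_split_args, or NULL on
   allocation failure. */
char** split_args(const char* s, int* argc)
{
  size_t n = strlen(s);
  /* Tokens are separated by at least one space, so there are at most n / 2 + 1. */
  char** argv = malloc((n / 2 + 1) * sizeof(char*));
  int count = 0;
  size_t i = 0;
  if (argv == NULL) return NULL;
  while (i < n)
  {
    while (i < n && s[i] == ' ') i++; /* skip separators */
    size_t start = i;
    while (i < n && s[i] != ' ') i++; /* scan one token */
    if (i == start) break;            /* only trailing spaces were left */
    char* token = malloc(i - start + 1);
    if (token == NULL)
    {
      free_split_args(count, argv);   /* roll back the tokens copied so far */
      return NULL;
    }
    memcpy(token, s + start, i - start);
    token[i - start] = '\0';
    argv[count++] = token;
  }
  *argc = count;
  return argv;
}
```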
enum ReductionType { /* TBD */ };
enum ReductionDataType { /* TBD */ };

struct reduction // Maybe pass copies of this around to avoid questions of memory ownership. It's lightweight and won't be used in the hot path, so the performance impact is minimal. The struct will need to be versioned carefully if it is ever extended 
{ 
  predict_fn predict; 
  learn_fn learn; 
  ReductionDataType input_data_type; 
  ReductionDataType output_data_type; 
  ReductionType type = CUSTOM; 
  ... // needs to partially mimic the learner struct 
}; 
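The comment on `reduction` suggests passing copies around and versioning the struct carefully. One common C pattern for both is a leading size field. The library copies only as many bytes as the caller's version of the struct contains and zero-fills any newer fields. The sketch below follows that pattern under stated assumptions. The `predict_fn`/`learn_fn` signatures, the `struct_size` and `reduction_data` fields, and `register_reduction` are all hypothetical, since the real signatures are TBD. The data-type and reduction-type fields are omitted for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signatures; the proposal leaves them TBD. */
typedef void (*predict_fn)(void* reduction_data, void* ex);
typedef void (*learn_fn)(void* reduction_data, void* ex);

struct reduction_v1
{
  size_t struct_size;   /* caller sets this to sizeof(struct reduction_v1) */
  predict_fn predict;
  learn_fn learn;
  void* reduction_data; /* caller-owned state passed back to predict/learn */
};

/* Smallest struct_size the library accepts: everything up to and including learn. */
#define REDUCTION_MIN_SIZE (offsetof(struct reduction_v1, learn) + sizeof(learn_fn))

/* Copies the caller's reduction into `out`, so the library never keeps the
   caller's pointer. Fields beyond the caller's struct_size are zeroed, which
   lets older callers keep working if fields are appended later.
   Returns 0 on success, -1 if the struct is too small to be valid. */
int register_reduction(const void* caller_copy, struct reduction_v1* out)
{
  size_t caller_size;
  memcpy(&caller_size, caller_copy, sizeof caller_size);
  if (caller_size < REDUCTION_MIN_SIZE) return -1;
  memset(out, 0, sizeof *out);
  memcpy(out, caller_copy, caller_size < sizeof *out ? caller_size : sizeof *out);
  out->struct_size = sizeof *out;
  return 0;
}

/* Example user reduction that counts how often it is called. */
static void count_calls(void* reduction_data, void* ex)
{
  (void)ex;
  ++*(int*)reduction_data;
}
```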

Function Pointers

Note: Function signatures are TBD. The following type names are used as function pointer types throughout these documents.

trace_message_t // A handler for trace logs. Null will result in no trace log handling
example_factory_t // A factory to create examples from. In practice, this should probably just be a memory pool of some sort
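Below is a minimal sketch of how a nullable trace handler could behave in C. The signature `void (*)(void* context, const char* message)` is an assumption, since a C API cannot pass a `std::string`. `emit_trace`, `trace_buffer` and `buffer_listener` are hypothetical names used only for illustration. The `context` pointer plays the role of the `trace_context` argument in the `initialize` declarations above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C signature; the proposal leaves it TBD. */
typedef void (*trace_message_t)(void* context, const char* message);

/* What the library would do internally whenever it wants to log. */
static void emit_trace(trace_message_t listener, void* context, const char* message)
{
  if (listener == NULL) return; /* null listener: trace output is dropped */
  listener(context, message);
}

/* Example caller-side state: a fixed buffer that collects messages. */
struct trace_buffer
{
  char text[256];
  size_t used;
};

/* Example listener: appends each message plus a newline, dropping it if full. */
static void buffer_listener(void* context, const char* message)
{
  struct trace_buffer* buf = context;
  size_t n = strlen(message);
  if (buf->used + n + 1 >= sizeof buf->text) return; /* need room for '\n' and '\0' */
  memcpy(buf->text + buf->used, message, n);
  buf->used += n;
  buf->text[buf->used++] = '\n';
  buf->text[buf->used] = '\0';
}
```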

Typedefs

typedef void* vw;

typedef void* example;

typedef void* options;
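Below is a minimal sketch of how a `void*` handle could hide the internal `vw` object from C callers. The internal struct, its `num_bits` field, and the `vw_create`/`vw_get_num_bits`/`vw_destroy` functions are hypothetical illustrations, not proposed names.

```c
#include <stdlib.h>

typedef void* vw; /* as proposed: opaque to callers */

/* Internal representation, visible only inside the library. Fields are illustrative. */
struct vw_internal
{
  int num_bits;
};

/* Hypothetical constructor: returns NULL on allocation failure. */
vw vw_create(int num_bits)
{
  struct vw_internal* impl = malloc(sizeof *impl);
  if (impl == NULL) return NULL;
  impl->num_bits = num_bits;
  return impl; /* implicit conversion to void* */
}

/* Hypothetical accessor: the library casts the handle back to its real type. */
int vw_get_num_bits(vw handle)
{
  const struct vw_internal* impl = handle;
  return impl->num_bits;
}

/* Hypothetical destructor: free(NULL) is a no-op, so a null handle is safe. */
void vw_destroy(vw handle)
{
  free(handle);
}
```

One design consideration: because `vw`, `example` and `options` are all `void*`, the compiler cannot catch an `example` passed where a `vw` is expected. Distinct incomplete struct types (for example `typedef struct vw_s* vw;`) would keep the same opacity while restoring type checking.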

The following typedefs would allow for custom reductions to be plugged into the VW reduction stack. This functionality currently does not exist in any form

typedef void* ReductionStack;  // What's the interaction between ReductionStack and vw? 

Internal Objects

vw – The internal representation of the VW object. Possibly in a wrapper to allow easy import and export of data across the API

example – The internal representation of the example object. Probably won't need a wrapper

options – A well-defined options object, possibly a flatbuf or protobuf. TBD

The following types would allow for custom reductions to be plugged into the VW reduction stack. This functionality currently does not exist in any form

ReductionStack – Should we treat the stack as an array or a linked list? Internally it’s a linked list, but an array is easier to manipulate for a user

struct ReductionStack { 
  Stack<learner> _reduction_stack; 
  size_t _size;  // Needed? 
  vw* _vw; 
}; 
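One way to reconcile the two views is sketched below as an assumption, not a decision. The internal linked list stays as it is, and a function copies the stack into a caller-provided array, top layer first. It returns the total depth so the caller can detect truncation and retry with a larger buffer. `learner_node` is an illustrative stand-in for the internal learner, and `reduction_stack_names` is a hypothetical function.

```c
#include <stddef.h>

/* Illustrative stand-in for an internal learner: each layer points to the
   reduction below it, mirroring the internal linked-list representation. */
struct learner_node
{
  const char* name;
  struct learner_node* base;
};

/* Hypothetical API: walks the linked list from `top` and writes up to
   `capacity` layer names into `out`, top first. Returns the total number of
   layers, which exceeds `capacity` when the output was truncated. */
size_t reduction_stack_names(const struct learner_node* top, const char** out, size_t capacity)
{
  size_t count = 0;
  for (const struct learner_node* l = top; l != NULL; l = l->base)
  {
    if (count < capacity) out[count] = l->name;
    count++;
  }
  return count;
}
```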

Exposed Functions

The proposed API functions are divided into separate documents based on their primary purpose. Some functions fall into multiple categories; these were assigned relatively arbitrarily, based on my own judgement. Each document is split into three sections:

  • The current C++ API surface that most language bindings use (found in vw.h).
  • The interfaces that will be deprecated in the proposal
  • The proposed functionality.

VW Setup and Teardown

Parser and Example Lifetime

Example Manipulation

Utilities

Questions/Comments - Jack

  • Remove import_example? - it is unclear what it does
  • Remove export_example? - it is unclear what it does
  • parse_label should return a label
  • Rename new_unused_example -> allocate_example?
  • Does num_weights make sense? Isn't it sparse?
  • feature_space stuff should all be merged into one place
  • I think this is the right time to remove prediction and label from the example at the API level. It fixes and cleans up A LOT