
[WIP] C API Introduction

peterychang edited this page Jun 16, 2020 · 12 revisions

This page contains an initial introduction to the VW C API, including the rationale behind its creation as well as the high-level design concepts and principles that will need to be followed to maintain a consistent interface.

This page does NOT contain any real function signatures. Any object or function found here should be taken as a pseudocode example used to illustrate specific concepts or principles. Many of the specific designs and patterns the API will use will necessarily be guided by the limitations imposed by the C language.

Rationale

Currently, VW does not have an official API surface. Put another way, every header file in VW is effectively part of the API.

Every language binding in VW binds to and exposes different objects and functions. This results in inconsistent workflows and capabilities across our supported languages. Additionally, the Python bindings (currently using our C++ interface) have two problems, both of which a rich C interface will solve. The first is that the boost-python binaries need to be installed to compile or run the VW library. The second is a binary incompatibility between the macOS C++ libraries and Anaconda's Python binaries (see issue #2100).

A well-defined API surface will also allow internal code changes to be made without the risk of changing or removing functionality that consumers of the library depend on. Finally, a carefully designed C API can potentially allow us to maintain backward ABI compatibility, which would open the possibility of using dynamically loaded libraries for faster client-side deployments. This final point should be considered a stretch goal, though, as maintaining a proper ABI requires immense care.

Design Principles

  • VW is a library first. The command line tool is functionality built on top of it.
  • The only entry point into the core VW library will be the C interface
    • At minimum, the following modules will need to be migrated to the new interface:
      • All language bindings (including the creation of a new C++ interface, which will bind to the C interface)
      • The command line tool
      • Any external libraries that use VW
      • Any end-to-end tests that currently use any part of the C++ interface
    • The following will NOT need to be migrated
      • Unit tests
      • Possibly some functional tests
  • Existing functionality should be preserved as much as possible. Legacy language bindings should be recreated on top of the new API if at all possible
    • There may be some existing functionality that is either impossible to replicate or may not make sense anymore. These should be discussed on a case-by-case basis
  • The library should own all memory in the following cases
    • The memory represents an internal data structure (eg: example)
    • A pointer or reference to the memory is saved anywhere, in any form, within the library
      • In the case of data that is used strictly as const input parameters (eg: string constants), the caller should own the memory and the library should perform a copy if necessary.
  • The C API should be designed for a power user, allowing for maximal functionality and flexibility. Simplified interfaces will be built on top of it.
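
The ownership rules above can be sketched as follows. This is a minimal illustration, not a proposed signature: the `VWExample` layout, the `tag` field, and every function name here are hypothetical. It assumes the caller passes a string as a const input, so the caller keeps ownership of it and the library stores its own deep copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, for illustration only. */
typedef struct VWExample {
  char* tag; /* owned by the library, never by the caller */
} VWExample;

/* Allocates an example inside the library. Returns 0 on success. */
int vw_create_example(VWExample** out_example) {
  if (out_example == NULL) return 1;
  VWExample* example = calloc(1, sizeof *example);
  if (example == NULL) return 1;
  *out_example = example;
  return 0;
}

/* `tag` is a const input: the caller keeps ownership of it, and the
   library saves a deep copy instead of the caller's pointer. */
int vw_example_set_tag(VWExample* example, const char* tag) {
  if (example == NULL || tag == NULL) return 1;
  size_t size = strlen(tag) + 1;
  char* copy = malloc(size);
  if (copy == NULL) return 1;
  memcpy(copy, tag, size);
  free(example->tag); /* release any previously stored copy */
  example->tag = copy;
  return 0;
}

/* Hands back a pointer to library-owned memory; the caller must not free it. */
int vw_example_get_tag(const VWExample* example, const char** out_tag) {
  if (example == NULL || out_tag == NULL) return 1;
  *out_tag = example->tag;
  return 0;
}

/* Frees the example and everything the library owns on its behalf. */
int vw_destroy_example(VWExample* example) {
  if (example == NULL) return 1;
  free(example->tag);
  free(example);
  return 0;
}
```

Because the library copies the string, the caller is free to modify or release its own buffer right after `vw_example_set_tag` returns without affecting the stored tag.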

Style Guide

API design and layout

  • Output parameters come at the end of the parameter list
    • Parameters that are both inputs and outputs need not follow this rule. Place them wherever makes the most sense.
  • Every API function will return a status code. Function outputs will be returned via out-params
    • Language-specific bindings should hide this detail and return errors in a language-idiomatic way
  • The library owns all memory associated with opaque types
  • The library owns all memory associated with a pointer or reference that is saved within the library
    • Deep copies should be made of all const pointer types that need to be saved
  • All objects created via a create call must be destroyed via a destroy call
    • The library will never take ownership of memory created in this way. The objects should be copied if necessary
  • The implementation for any API functions in header <blah>.h must be in <blah>.cc
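
As a rough sketch of how these rules might combine, the block below shows an opaque type, a status-code return on every function, and out-params placed last. Everything in it is an assumption made for illustration: the `VWStatus` enum and its values, the `num_bits` setting, and all three function signatures are hypothetical and are not part of any real VW header.

```c
#include <stddef.h>
#include <stdlib.h>

/* ---- would live in a public header such as <blah>.h (hypothetical) ---- */
typedef enum VWStatus {
  VW_STATUS_SUCCESS = 0,
  VW_STATUS_INVALID_ARGUMENT = 1,
  VW_STATUS_OUT_OF_MEMORY = 2
} VWStatus;

/* Opaque type: callers only ever hold a pointer, so the library owns the memory. */
typedef struct VWWorkspace VWWorkspace;

/* Inputs first, out-params last; the return value is always a status code. */
VWStatus vw_create_workspace(size_t num_bits, VWWorkspace** out_workspace);
VWStatus vw_workspace_get_num_bits(const VWWorkspace* workspace, size_t* out_num_bits);
VWStatus vw_destroy_workspace(VWWorkspace* workspace);

/* ---- would live in the matching implementation file ---- */
struct VWWorkspace {
  size_t num_bits; /* placeholder setting, used only for this sketch */
};

VWStatus vw_create_workspace(size_t num_bits, VWWorkspace** out_workspace) {
  if (out_workspace == NULL) return VW_STATUS_INVALID_ARGUMENT;
  VWWorkspace* workspace = malloc(sizeof *workspace);
  if (workspace == NULL) return VW_STATUS_OUT_OF_MEMORY;
  workspace->num_bits = num_bits;
  *out_workspace = workspace;
  return VW_STATUS_SUCCESS;
}

VWStatus vw_workspace_get_num_bits(const VWWorkspace* workspace, size_t* out_num_bits) {
  if (workspace == NULL || out_num_bits == NULL) return VW_STATUS_INVALID_ARGUMENT;
  *out_num_bits = workspace->num_bits;
  return VW_STATUS_SUCCESS;
}

/* Every object made by a create call is released by the matching destroy call. */
VWStatus vw_destroy_workspace(VWWorkspace* workspace) {
  if (workspace == NULL) return VW_STATUS_INVALID_ARGUMENT;
  free(workspace);
  return VW_STATUS_SUCCESS;
}
```

A language binding built on top of this would typically check the returned status after each call and translate any non-success value into an exception or error type that is idiomatic for that language.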

Naming Conventions

Naming conventions should generally follow the GTK coding style.

  • Object names should be pascal cased and prefixed with VW -- eg: VWPascalCase
  • Function and variable names should be snake cased -- eg: vw_snake_case
  • All function names must be prefixed in the following order:
    1. vw -- eg: vw_workspace
    2. create/destroy if applicable -- eg: vw_create_workspace
    3. The component the function is operating on -- eg: vw_create_workspace or vw_example_setup
    4. get if applicable -- eg: vw_example_get_feature_space
  • Functions that allocate and return a pointer to in-library memory must be prefixed with create
  • Functions that free a pointer containing in-library memory must be prefixed with destroy
  • Functions that return a pointer to an internal data structure must be prefixed with get
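
To make the prefix ordering concrete, here is a small sketch that applies it to the names used as examples above. The names `vw_example_setup` and `vw_example_get_feature_space` come from the list, but their parameters, the struct layouts, and the `feature_count` field are assumptions invented for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Object names: Pascal case with a VW prefix (hypothetical layouts). */
typedef struct VWFeatureSpace {
  size_t feature_count;
} VWFeatureSpace;

typedef struct VWExample {
  VWFeatureSpace feature_space; /* internal data structure */
} VWExample;

/* vw + create + component: allocates and returns in-library memory. */
int vw_create_example(VWExample** out_example) {
  if (out_example == NULL) return 1;
  VWExample* example = calloc(1, sizeof *example);
  if (example == NULL) return 1;
  *out_example = example;
  return 0;
}

/* vw + component + action: operates on an existing object. */
int vw_example_setup(VWExample* example, size_t feature_count) {
  if (example == NULL) return 1;
  example->feature_space.feature_count = feature_count;
  return 0;
}

/* vw + component + get: returns a pointer to an internal data structure.
   The result was not produced by a create call, so it is never destroyed
   by the caller. */
int vw_example_get_feature_space(VWExample* example, VWFeatureSpace** out_feature_space) {
  if (example == NULL || out_feature_space == NULL) return 1;
  *out_feature_space = &example->feature_space;
  return 0;
}

/* vw + destroy + component: frees in-library memory. */
int vw_destroy_example(VWExample* example) {
  if (example == NULL) return 1;
  free(example);
  return 0;
}
```

The prefix alone tells the caller what to do with a returned pointer: anything from a `create` function needs a matching `destroy`, while anything from a `get` function stays owned by its parent object.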

Limitations in C

The language features allowed under the standard C specifications are very limited, and may be surprising to the typical C++ developer. Listed below are some of the C++ features that are not available in C.

  • Object-oriented functionality
    • Private member variables
    • Member functions
      • Function pointers are allowed
    • Inheritance, polymorphism, or any form of language-enforced encapsulation (opaque types are the usual substitute)
  • Function overloading
    • Function names cannot be the same regardless of the type or number of arguments
  • References
    • Pointers must be used instead
  • Default parameter values
  • Namespaces
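
The sketch below shows one common way to work around several of these limitations at once. It is deliberately generic: `VWAccumulator` and every function in it are hypothetical names chosen for illustration and have nothing to do with VW's actual internals. The comments note which C++ feature each construct stands in for.

```c
#include <stddef.h>

/* No namespaces: the vw_ / VW prefix plays that role. */
typedef struct VWAccumulator {
  double weighted_sum;
  double weight_total;
} VWAccumulator;

/* No member functions: the object is passed explicitly as the first argument.
   No default parameter values: the weight must always be spelled out here. */
int vw_accumulator_add_weighted(VWAccumulator* accumulator, double value, double weight) {
  if (accumulator == NULL) return 1;
  accumulator->weighted_sum += value * weight;
  accumulator->weight_total += weight;
  return 0;
}

/* No overloading: the "weight defaults to 1" variant needs its own name. */
int vw_accumulator_add(VWAccumulator* accumulator, double value) {
  return vw_accumulator_add_weighted(accumulator, value, 1.0);
}

/* Function pointers are allowed, and can stand in for virtual functions. */
typedef double (*VWTransformFunc)(double value);

double vw_transform_square(double value) { return value * value; }

int vw_accumulator_add_transformed(VWAccumulator* accumulator, double value,
                                   VWTransformFunc transform) {
  if (transform == NULL) return 1;
  return vw_accumulator_add(accumulator, transform(value));
}

/* No references: the result is written through a pointer out-param. */
int vw_accumulator_get_mean(const VWAccumulator* accumulator, double* out_mean) {
  if (accumulator == NULL || out_mean == NULL) return 1;
  if (accumulator->weight_total == 0.0) return 1; /* mean is undefined */
  *out_mean = accumulator->weighted_sum / accumulator->weight_total;
  return 0;
}
```

A C++ wrapper layered on top of functions like these can restore overloads, default arguments, and references for C++ callers while the exported surface stays plain C.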