Featurizer API server-side application for featurization (RESTful API for feature extraction via injected features extraction library)

BDALab/featurizer-api


Featurizer API


Server side application:

This package provides a modern RESTful featurizer API written in Python using the Flask-RESTful library. Thanks to its feature-extractor injection capabilities (instructions are provided in the Configuration section) and flexible input/output data definition (multi-dimensional samples/features, multiple subjects, etc.), it can be used with various feature extraction libraries. On top of that, the featurizer API provides endpoints for user authentication and JWT-based request authorization, and it supports cross-origin resource sharing, request-response caching, advanced logging, etc. It also comes with basic support for containerization via Docker (Dockerfile and docker-compose).

Client side application:

To make the Featurizer API as easy to use as possible, there is a lightweight, PyPI-installable client side application named Featurizer API client that provides method-based calls to all endpoints accessible on the API. For more information about the Featurizer API client, please read the official readme and documentation.

Endpoints:

  1. featurization endpoints (api/resources/featurizer)
    1. /featurize - calls .extract on the specified features-extractor (featurizer interface). This endpoint is designed to be used to compute the features specified in the features-extraction pipeline.
  2. security endpoints (api/resources/security)
    1. /signup - signs-up a new user.
    2. /login - logs-in an existing user (obtains access and refresh JWT tokens).
    3. /refresh - refreshes an expired access token (obtains a refreshed JWT access token).

The full programming sphinx-generated docs can be seen in the official documentation.

Contents:

  1. Installation
  2. Configuration
  3. Featurization
  4. Injection
  5. Workflow
  6. Data
  7. Examples
  8. License
  9. Contributors

Installation

# Clone the repository
git clone https://github.com/BDALab/featurizer-api.git

# Install packaging utils
pip install --upgrade pip
pip install --upgrade virtualenv

# Change directory
cd featurizer-api

# Create and activate a virtual environment
# Linux
virtualenv .venv
source .venv/bin/activate

# Windows
virtualenv venv
venv\Scripts\activate.bat

# Install dependencies
pip install -r requirements.txt

# Two necessary steps (see the Configuration section below):
#
# 1. create the .env file with the JWT secret key at api/.env
# 2. configure the features-extractor library injection at api/configuration/injection.json

Configuration

Necessary configuration

To get the Featurizer API working, two steps must be performed:

  1. create the .env file with the JWT secret key at api/.env to enable proper authorization of the requests (for more information, see point 2, authorization, in the next sub-section)
  2. configure the features-extractor library injection at api/configuration/injection.json to enable automatic injection of the feature extractor used to compute the features (for more information, see point 6, featurization, in the next sub-section)
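
The first step can be scripted. The following is a minimal sketch, assuming that the .env file only needs a single JWT_SECRET_KEY line; the helper names build_env_line and write_env_file are hypothetical and are not part of the Featurizer API.

```python
import secrets
from pathlib import Path


def build_env_line(secret_key=None):
    """Return the JWT_SECRET_KEY line for the .env file."""
    # Generate a random URL-safe key when none is supplied
    key = secret_key if secret_key is not None else secrets.token_urlsafe(32)
    return f'JWT_SECRET_KEY="{key}"\n'


def write_env_file(path="api/.env", secret_key=None):
    """Write the .env file (hypothetical helper) and return its path."""
    env_path = Path(path)
    # Create the parent directory if it does not exist yet
    env_path.parent.mkdir(parents=True, exist_ok=True)
    env_path.write_text(build_env_line(secret_key), encoding="utf-8")
    return env_path
```

Calling write_env_file() from the repository root would create api/.env with a freshly generated key. Keep the generated file out of version control.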

Full configuration

The package provides various configuration files stored at api/configuration. More specifically, the following configuration is provided:

  1. authentication (api/configuration/authentication.json): it supports the configuration of the database of users. In this version, an SQLite database is used for simplicity. The main configuration is the URI for the *.db file (pre-set to api/authentication/database/database/database.db). An empty database file is created automatically.
  2. authorization (api/configuration/authorization.json): it supports the configuration of the request authorization. In this version, JWT authorization is supported. The main configuration is the name of the .env file that stores the JWT secret key. For security reasons, the .env file is not part of this repository. Before using the API, it is therefore necessary to create the .env file at the api level (api/.env) and set the JWT_SECRET_KEY field (e.g. JWT_SECRET_KEY="wfTHu38GpF5y60djwKC0EkFj586jdyZR").
  3. cors (api/configuration/cors.json): it supports the configuration of the cross-origin resource sharing. In this version, no sources are added to the origins (to be updated per deployment).
  4. caching (api/configuration/caching.json): it supports the configuration of API request-response caching. In this version, the simple in-memory caching with the TTL of 60 seconds is used.
  5. logging (api/configuration/logging.json): it supports the configuration of the logging. The package provides logging on three levels: (a) request, (b) response, (c) werkzeug. The log files are created in the logs directory located at the featurizer's root directory.
  6. featurization (api/configuration/injection.json): it supports the configuration of the features-extraction library injection. By design, the features-extraction library is not part of the requirements.txt. The injection of the feature extractor as well as the requirements on the features-extraction library and the process of featurization are summarized in the Featurization and Injection sections.
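
To illustrate what in-memory caching with a TTL of 60 seconds means, here is a conceptual sketch. It is not the caching implementation used by the API (that is set in api/configuration/caching.json); the TTLCache class and its clock parameter are hypothetical.

```python
import time


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock, handy for testing
        self._store = {}

    def set(self, key, value):
        # Remember the value together with the time it was stored
        self._store[key] = (value, self.clock())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        # Drop the entry once its time-to-live has elapsed
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]
            return None
        return value
```

With such a cache, an identical request repeated within 60 seconds can be answered from memory, while a later one triggers a fresh computation.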

Featurization

The featurizer API provides the featurization interface class FeaturesExtractorPipeline, located at api/featurization/interface, which accepts the injected feature extractor class, the sample data, and the extractor's configuration. It also provides the extract method, which accepts the pipeline of features to be extracted. To featurize the data, it calls the extract method on the initialized and configured feature extractor instance. The definition of the featurization interface class can be seen below.

class FeaturesExtractorPipeline(object):
    """Class implementing the features extractor pipeline interface"""

    def __init__(self, extractor, sample, config):
        """
        Initializes the FeaturesExtractorPipeline (using injected extractor).

        :param extractor: feature extractor interface class
        :type extractor: <injected>.interface.featurizer.FeatureExtractor
        :param sample: sample data to extract the features from
        :type sample: api.interfaces.inputs.Sample
        :param config: feature extractor configuration
        :type config: api.interfaces.inputs.FeaturesExtractorConfiguration
        """
        self.extractor = extractor(sample.values, sample.labels, **config.extractor_configuration)

    def __repr__(self):
        return str({"extractor": self.extractor})

    def __str__(self):
        return repr(self)

    def __call__(self, pipeline):
        return self.extract(pipeline)

    def extract(self, pipeline):
        """
        Extracts the features from the features extraction pipeline.

        :param pipeline: pipeline with the feature names and kwargs
        :type pipeline: api.interfaces.inputs.FeaturesPipeline
        :return: extracted features and feature labels
        :rtype: dict
        """

        # Extract the features via the injected features extractor
        extracted = self.extractor.extract(pipeline.pipeline)

        # Return the extracted feature values and labels
        return {
            "values": extracted["features"],
            "labels": extracted["labels"]
        }

The features-extraction library must implement the FeatureExtractor class, which serves as an interface between the featurizer API and the features-extraction library. The interface must be placed at <library>/interface/featurizer. As shown above, the feature extractor must accept **extractor_configuration in its __init__ method so that the configuration can be passed in (the configuration is optional at the library level, but the interface must be consistent). An example template of the feature extractor interface is shown below.

class FeatureExtractor(object):
    """
    Class implementing the features extractor interface for the Featurizer API.

    For more information about featurizer, see the following repositories:
    1. [server side](#github.com/BDALab/featurizer-api)
    2. [client side](#github.com/BDALab/featurizer-api-client)

    For more information about the attributes, see: ``extract(...)``
    """

    def __init__(self, values, labels=None, **configuration):
        """
        Initializes the FeatureExtractor featurizer API interface.

        :param values: data values to extract the features from
        :type values: numpy.ndarray
        :param labels: data labels for data samples, defaults to None
        :type labels: list, optional
        :param configuration: common extractor configuration
        :type configuration: **kwargs, optional
        """

        # Set the data values/labels
        self.values = values
        self.labels = labels if labels else []

        # Set the extractor configuration
        self.configuration = configuration if configuration else {}

    def extract(self, pipeline):
        """
        Interface method: extract the features.

        **Data**

        1. data is of type: ``numpy.ndarray``.
        2. data is mandatory.
        3. data shape: In general, data should have the shape (M, ..., D), where M
           stands for subjects (i.e. subjects are in the first dimension), and
           D stands for D data samples (of shape ...).
            1. in the case of data having the following shape: (D, ), the API
               assumes it is a vector of D data sample points for one subject.
               It transforms the data to a row vector: (1, D) to add the
               dimension for the subject.
            2. in the case of data having the following shape: (M, ..., D),
               the API does not transform the data, but it assumes there are
               M subjects and D data samples, each having (...) dimensionality,
               e.g. if data has the shape (M, 3, 10) it means that there are
               M subjects and each of the subjects has 10 data samples (each
               being three dimensional).

        **Labels**

        1. labels are of type: ``list``.
        2. labels are optional.
        3. labels are of length D (for each data sample, there is one label)

        **Configuration**

        1. configuration is of type: ``dict``.
        2. configuration is optional.
        3. configuration provides common kwargs for feature extraction

        **Pipeline**

        1. pipeline is of type: ``list``.
        2. pipeline is mandatory.
        3. each element in the pipeline is of type: ``dict``.
        4. each element in the pipeline has the following keys: a) ``name``
           to hold the name of the feature to be computed, and b) ``args``
           to hold the arguments (kwargs) for the specific feature extraction
           method that is going to be used (it is of type: ``dict``).

        **Output**

        The extracted features follow the same shape convention as the input
        data: the subjects are in the first dimension, and the features are
        in the last dimension (each feature having shape ...).

        :param pipeline: pipeline of the features to be extracted
        :type pipeline: list
        :return: extracted features and labels
        :rtype: dict {"features": ..., "labels": ...}
        """

        # TODO: computation (implement the feature extraction)
        values, labels = None, None

        # Return the extracted features and feature labels
        return {
            "features": values,
            "labels": labels
        }
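
To show how the TODO in the template could be filled in, here is a minimal sketch of a concrete extractor. The class name MeanStdExtractor and the feature names "mean" and "std" are hypothetical; a real features-extraction library would expose its own features and would name the class FeatureExtractor.

```python
import numpy as np


class MeanStdExtractor:
    """Hypothetical extractor following the FeatureExtractor template."""

    # Hypothetical feature names mapped to numpy reductions
    FEATURES = {"mean": np.mean, "std": np.std}

    def __init__(self, values, labels=None, **configuration):
        values = np.asarray(values, dtype=float)
        # Defensive: treat a (D,) vector as one subject, i.e. (1, D)
        self.values = values.reshape(1, -1) if values.ndim == 1 else values
        self.labels = labels if labels else []
        self.configuration = configuration if configuration else {}

    def extract(self, pipeline):
        features, labels = [], []
        for step in pipeline:
            func = self.FEATURES[step["name"]]
            # Reduce over the last (samples) dimension, passing the step kwargs
            features.append(func(self.values, axis=-1, **step.get("args", {})))
            labels.append(step["name"])
        # Stack so that the features end up in the last dimension: (M, ..., N)
        return {"features": np.stack(features, axis=-1), "labels": labels}
```

For input of shape (M, ..., D) and a pipeline of N features, the sketch returns feature values of shape (M, ..., N), matching the output convention described in the docstring above.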

Injection

The injection of the features-extraction library is configured at api/configuration/injection.json. The configuration looks as follows:

{
  "features_extraction_library": {
    "injection_types": [
      "local",
      "pip"
    ],
    "injection_type": "",
    "injection": {
      "local": {
        "import_name": "",
        "installation_name": ""
      },
      "pip": {
        "import_name": "",
        "installation_name": ""
      }
    }
  }
}

There are two ways to inject a feature extractor:

  1. injection_type is set to pip: the features-extraction library is installed via pip. In this case, both the import_name and the installation_name must be specified (an exception is raised otherwise).
  2. injection_type is set to local: the features-extraction library package is placed inside featurizer-api. In this case, only the import_name is needed because no installation takes place (an exception is raised otherwise).

The installation_name is used to install the features-extraction library via pip install <installation_name>. The import_name is used to import the feature extractor (the FeatureExtractor class from <import_name>.interface.featurizer) and the extractor-specific exceptions (from <import_name>.interface.featurizer.exceptions import *).
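
The lookup described above can be sketched with importlib. This is only an illustration of the idea, not the API's actual injection code: the function name resolve_extractor is hypothetical, the pip installation step is left out, and FeatureExtractor is assumed to be an attribute of the <import_name>.interface.featurizer module.

```python
import importlib


def resolve_extractor(config):
    """Return the FeatureExtractor class named by an injection config (sketch)."""
    library = config["features_extraction_library"]
    injection_type = library["injection_type"]
    if injection_type not in library["injection_types"]:
        raise ValueError(f"Unsupported injection type: {injection_type!r}")

    names = library["injection"][injection_type]
    if not names.get("import_name"):
        raise ValueError("import_name must be specified")
    if injection_type == "pip" and not names.get("installation_name"):
        raise ValueError("installation_name must be specified for pip injection")

    # NOTE: the `pip install <installation_name>` step is omitted in this sketch
    module = importlib.import_module(f"{names['import_name']}.interface.featurizer")
    return module.FeatureExtractor
```

Given a configuration whose injection_type is local and whose import_name is, for example, my_features (a made-up package name), the function would return my_features.interface.featurizer.FeatureExtractor.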

Workflow

In order for a user to use the API, the following steps are required:

  1. a new user must be created via the /signup endpoint
  2. the existing user must log in via the /login endpoint to get the access and refresh tokens
  3. calls to the /featurize endpoint can be made
  4. if the access token expires, a new one must be obtained via the /refresh endpoint

For specific examples for each step of the workflow, see the Examples section.

Data

Input data

The input data is a dict object with the following field-value pairs (example below):

  • samples (dict, mandatory; placeholder for the sample values/labels)
  • samples.values (numpy.array, mandatory; sample values)
  • samples.labels (list, optional; sample labels)
  • features (dict, mandatory; placeholder for the features-extraction pipeline)
  • features.pipeline (list, mandatory; features-extraction pipeline)
  • features.pipeline[0..., F] (dict, mandatory; single feature configuration)
  • extractor_configuration (dict, optional; features-extractor configuration)

Shape:

Shape of the sample values: (first dimension, (inner dimensions), last dimension)

  • the first dimension is dedicated to subjects
  • the inner dimensions are dedicated to the dimensionality of the samples
  • the last dimension is dedicated to samples

An important requirement is that the features-extractor must be given data it can process (shape, format, etc.).

# Example:
# - M subjects, D samples of (... dimensions)
# - N features in the pipeline
# - sampling frequency (fs) in the extractor configuration
{
    "samples": {
        "labels": ["element 1", ... "element D"],
        "values": np.array((M, ..., D))
    },
    "features": {
        "pipeline": [
            {
                "name": "feature 1",
                "args": {"abc": 123, "def": 456}
            },
            ...
            {
                "name": "feature N",
                "args": {}
            },
            ...
        ]
    },
    "extractor_configuration": {"fs": 8000}
}

Examples:

  • 100 subjects, each having 30 1-D samples (shape (1,) or shape (1, 1)): shape = (100, 1, 30)
  • 250 subjects, each having 20 2-D samples (shape (2,) or shape (1, 2)): shape = (250, 2, 20)
  • 500 subjects, each having 10 samples with the shape of (3, 4): shape = (500, 3, 4, 10)
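
The shape convention can be checked with a few lines of numpy. The sketch below uses hypothetical helper names (ensure_subject_dimension, describe); it also applies the (D,) to (1, D) transformation mentioned in the FeatureExtractor template docstring.

```python
import numpy as np


def ensure_subject_dimension(values):
    """Turn a (D,) vector into (1, D); leave other shapes untouched."""
    values = np.asarray(values)
    return values.reshape(1, -1) if values.ndim == 1 else values


def describe(values):
    """Split a shape into subjects, per-sample shape, and number of samples."""
    values = ensure_subject_dimension(values)
    return {
        "subjects": values.shape[0],         # first dimension
        "sample_shape": values.shape[1:-1],  # inner dimensions
        "samples": values.shape[-1],         # last dimension
    }


# 500 subjects, each having 10 samples with the shape of (3, 4)
print(describe(np.zeros((500, 3, 4, 10))))
```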

Output data

The output data is a dict object with the following field-value pairs (example below):

  • features (dict, mandatory; placeholder for the feature values/labels)
  • features.values (numpy.array, mandatory; feature values)
  • features.labels (list, optional; feature labels)

Shape:

Shape of the feature values: (first dimension, (inner dimensions), last dimension)

  • the first dimension is dedicated to subjects
  • the inner dimensions are dedicated to the dimensionality of the features
  • the last dimension is dedicated to features

# Dimensions: M subjects, N features of (... dimensions)
{
    "features": {
        "labels": ["feature 1", ... "feature N"],
        "values": array of shape (M, ..., N)
    }
}

Examples:

  • 100 subjects, each having 30 1-D samples (shape (1,) or shape (1, 1)), samples shape: (100, 1, 30); 1000 1-D features, features shape (100, 1, 1000)
  • 250 subjects, each having 20 2-D samples (shape (2,) or shape (1, 2)), samples shape: (250, 2, 20); 100 2-D features, features shape: (250, 2, 100)
  • 500 subjects, each having 100 samples with the shape of (3, 4), samples shape: (500, 3, 4, 100); 50 features with the shape of (5, 10, 15), features shape: (500, 5, 10, 15, 50)

Serialization/deserialization

As the sample/feature values are stored as a numpy.array, they must be JSON-serialized before they are sent and deserialized after they are received. For this purpose, the package provides the api.wrappers.data.DataWrapper class.
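
The reason for this step is that json cannot encode a numpy.array directly. The sketch below shows one generic way to do the round trip via nested lists. It is an assumption for illustration only and does not reproduce the wire format used by DataWrapper; the function names to_json and from_json are hypothetical.

```python
import json

import numpy as np


def to_json(values):
    """Serialize an array to a JSON string via nested lists."""
    return json.dumps(np.asarray(values).tolist())


def from_json(payload):
    """Rebuild the array from the JSON string."""
    return np.array(json.loads(payload))


# Round trip of a small (2, 3, 4) array
original = np.arange(24, dtype=float).reshape(2, 3, 4)
restored = from_json(to_json(original))
```

When calling the API, use DataWrapper itself so that the client and the server agree on the format.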

Examples

User sign-up

import requests

# Prepare the sign-up data (new user to be created)
body = {
    "username": "user123",
    "password": "pAsSw0rd987!"
}

# Call the sign-up endpoint (locally deployed API)
response = requests.post(
    "http://localhost:5000/signup",
    json=body)

User log-in

import requests

# Prepare the log-in data (already created user)
body = {
    "username": "user123",
    "password": "pAsSw0rd987!"
}

# Call the log-in endpoint (locally deployed API)
response = requests.post(
    "http://localhost:5000/login",
    json=body)

# Get the access and refresh tokens from the response
if response.ok:
    access_token = response.json().get("access_token")
    refresh_token = response.json().get("refresh_token")

Featurization

import numpy
import requests
from pprint import pprint
from api.wrappers.data import DataWrapper

# Set the number of subjects (10)
num_subjects = 10

# Set the shape of the samples for each subject (1, 100): 1-D sample vector with 100 samples
samples_shape = (1, 100)

# Prepare the sample values/labels (labels are optional)
values = numpy.random.rand(num_subjects, *samples_shape)
labels = [f"sample {i}" for i in range(samples_shape[-1])]

# Serialize the sample values
values = DataWrapper.wrap_data(values)

# Prepare the featurization pipeline (example: 2 dummy features)
features_pipeline = [
    {
        "name": "feature 1",
        "args": {"arg_x": 100, "arg_y": [1, 2, 3]}
    },
    {
        "name": "feature 2",
        "args": {"arg_z": True}
    }
]

# Prepare the features extractor configuration (example: fs = 8000)
extractor_configuration = {"fs": 8000}

# Prepare the featurizer data
body = {
    "samples": {
        "labels": labels,
        "values": values
    },
    "features": {
        "pipeline": features_pipeline
    },
    "extractor_configuration": extractor_configuration
}

# Prepare the authorization header (access_token is the token obtained via the /login endpoint)
headers = {
    "Authorization": f"Bearer {access_token}"
}

# Call the featurize endpoint (locally deployed API; endpoints: /featurize)
response = requests.post(
    url="http://localhost:5000/featurize",
    json=body,
    headers=headers,
    verify=True,
    timeout=10)

if response.ok:

    # Get the features
    values = response.json().get("features").get("values")
    labels = response.json().get("features").get("labels")
    
    # Deserialize the features
    values = DataWrapper.unwrap_data(values)
    
    pprint(values)
    pprint(labels)

Expired access token refresh

import requests

# Prepare the refresh headers (refresh_token is the token obtained via the /login endpoint)
headers = {
    "Authorization": f"Bearer {refresh_token}"
}

# Call the refresh endpoint (locally deployed API)
response = requests.post(
    "http://localhost:5000/refresh",
    headers=headers)

# Get the refreshed access token
if response.ok:
    access_token = response.json().get("access_token")
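
The refresh step can be combined with a retry. The following is a minimal sketch, assuming that a request made with an expired access token is answered with HTTP status 401 (this status code is an assumption, not something stated in this readme). The function call_with_refresh and its parameters are hypothetical.

```python
def call_with_refresh(call, refresh, access_token):
    """
    Call the API once and retry after a token refresh if needed (sketch).

    call(access_token) must return a response with a status_code attribute;
    refresh() must return a new access token.
    """
    response = call(access_token)
    # Assumption: 401 signals an expired (or otherwise rejected) access token
    if response.status_code == 401:
        access_token = refresh()
        response = call(access_token)
    return response, access_token
```

Here, call would wrap the requests.post call to /featurize shown earlier, and refresh would wrap the /refresh call above and return the new access_token.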

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributors

This package is developed by the members of Brain Diseases Analysis Laboratory. For more information, please contact the head of the laboratory Jiri Mekyska mekyska@vut.cz or the main developer: Zoltan Galaz galaz@vut.cz.
