This repository has been archived by the owner on Oct 24, 2018. It is now read-only.

Architecture

Gabriel Commeau edited this page Nov 12, 2014 · 2 revisions

Components

The Flume2Storm connector is made of several components:

core

It contains two APIs and the base classes for the Flume2Storm event:

  • Location Service API. It describes how the location service is used, and what a service provider is. Both the storm-sink and the flume-spout use the location service. The storm-sink is a service provider.
  • Connection framework API. It defines:
    • Connection Parameters. They describe how to establish a connection between the event sender and the event receptor. Implementations may contain attributes such as the IP address, the port number, or the connection timeout.
    • Event Sender. It describes how an event sender is used. The storm-sink is an event sender.
    • Event Receptor. It describes how an event receptor is used. The flume-spout is an event receptor.
  • Flume2Storm event. It defines what an F2SEvent is and provides utility classes to facilitate working with it.
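
To make the roles above concrete, here is a minimal sketch of how the two APIs and the event class could be laid out. Every name and signature below is hypothetical (hence the Sketch suffix) and is not taken from the Flume2Storm code base; the event shape (string headers plus a byte body) is assumed by analogy with Flume events.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout: these mirror the roles described above,
// not the actual Flume2Storm signatures.

/** Describes how an event receptor reaches an event sender. */
interface ConnectionParametersSketch {
    String id();
}

/** What the location service stores; a provider exposes its connection parameters. */
interface ServiceProviderSketch {
    ConnectionParametersSketch connectionParameters();
}

/** Receives service provider addition and removal notifications. */
interface ServiceListenerSketch {
    void onProviderAdded(ServiceProviderSketch provider);
    void onProviderRemoved(ServiceProviderSketch provider);
}

/** Used by the storm-sink to register itself and by the flume-spout to find sinks. */
interface LocationServiceSketch {
    void register(ServiceProviderSketch provider);
    void unregister(ServiceProviderSketch provider);
    List<ServiceProviderSketch> providers();
    void addListener(ServiceListenerSketch listener);
}

/** Assumed event shape: string headers plus an opaque body, like a Flume event. */
final class F2SEventSketch {
    private final Map<String, String> headers;
    private final byte[] body;

    F2SEventSketch(Map<String, String> headers, byte[] body) {
        // Defensive copies keep the event immutable once created.
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.body = body.clone();
    }

    Map<String, String> headers() { return headers; }
    byte[] body() { return body.clone(); }
}

/** Implemented by the storm-sink side of a connection. */
interface EventSenderSketch {
    void start();
    void stop();
    int send(List<F2SEventSketch> batch); // returns how many events were delivered
}

/** Implemented by the flume-spout side of a connection. */
interface EventReceptorSketch {
    void start();
    void stop();
    List<F2SEventSketch> drain(int maxEvents); // hands over buffered events
}
```

Keeping the connection parameters behind the service provider lets a location service stay unaware of the transport in use.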

storm-sink

This is the Flume2Storm implementation of a Flume sink that sends Flume events to Storm. In the context of the location service API, this is a service provider. In the context of the connection framework API, it is an event sender.

When opening, it performs two tasks:

  • As an event sender, it initializes and starts the server.
  • As a service provider, it starts the location service and registers its connection parameters.

When closing, it unregisters its connection parameters from the location service, and stops the location service and the event sender server.

While active, and as long as event receptors are connected, it reads events from the upstream Flume channel to form a batch and converts these Flume events into Flume2Storm events. It then sends the batch to the event receptors. How this last step is carried out depends on the actual implementation.
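
The batching step can be sketched as follows. This is only an illustration under stated assumptions: the channel is modelled as a plain java.util.Queue, the Sender interface and RecordingSender class are hypothetical, and the Flume channel transaction that a real sink would use is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

final class SinkBatchSketch {
    /** Hypothetical stand-in for an event sender. */
    interface Sender<T> {
        int connectedReceptors();
        void send(List<T> batch);
    }

    /** Test double that records every batch it is asked to send. */
    static final class RecordingSender<T> implements Sender<T> {
        int receptors;
        final List<List<T>> sent = new ArrayList<>();
        RecordingSender(int receptors) { this.receptors = receptors; }
        @Override public int connectedReceptors() { return receptors; }
        @Override public void send(List<T> batch) { sent.add(batch); }
    }

    /**
     * One processing pass: takes up to batchSize events from the channel,
     * converts each one, and hands the batch to the sender.
     * Returns the number of events sent.
     */
    static <E, T> int processOnce(Queue<E> channel, int batchSize,
                                  Function<E, T> converter, Sender<T> sender) {
        if (sender.connectedReceptors() == 0) {
            return 0; // nobody to send to: leave the events in the channel
        }
        List<T> batch = new ArrayList<>();
        E event;
        while (batch.size() < batchSize && (event = channel.poll()) != null) {
            batch.add(converter.apply(event));
        }
        if (!batch.isEmpty()) {
            sender.send(batch);
        }
        return batch.size();
    }
}
```

Checking for connected receptors before polling keeps events in the channel until someone can receive them.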

flume-spout

This is the Flume2Storm implementation of a Storm spout that receives events from Flume. In the context of the connection framework API, it is an event receptor.

A flume-spout is configured using event emitters, which define how to convert the Flume2Storm events into Storm tuples and how to emit them down the topology.

When opening, the flume-spout starts the location service and registers a listener in order to receive service provider addition and removal notifications. When closing, it stops the location service.

While active, the flume-spout waits for new service providers to "appear". When this happens, it creates an event receptor instance using the associated connection parameters and starts the instance. Note that the connection parameters are provided through the location service. The event receptors of a flume-spout receive Flume2Storm events from the event senders and store them temporarily in memory, until the nextTuple() method of the spout is called. When nextTuple() is called, the event receptor uses the configured event emitter(s) to convert the events into tuples and to emit them down the topology.
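
The buffering between reception and nextTuple() might look like the sketch below. It assumes string events, models each event emitter as a hypothetical function from one event to one tuple, and replaces the Storm output collector with a plain list. Unlike Storm's nextTuple(), which returns nothing, this version returns a boolean so the behaviour is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

final class SpoutBufferSketch {
    // Filled by the receptor's network thread, drained by the spout thread.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    // Hypothetical event emitters: each turns one event into one tuple.
    private final List<Function<String, List<Object>>> emitters;
    // Stand-in for the Storm output collector.
    final List<List<Object>> emitted = new ArrayList<>();

    SpoutBufferSketch(List<Function<String, List<Object>>> emitters) {
        this.emitters = new ArrayList<>(emitters);
    }

    /** Called by an event receptor whenever an event arrives from a sender. */
    void onEventReceived(String event) {
        pending.add(event);
    }

    /** Emits at most one buffered event; returns false when the buffer is empty. */
    boolean nextTuple() {
        String event = pending.poll();
        if (event == null) {
            return false;
        }
        for (Function<String, List<Object>> emitter : emitters) {
            emitted.add(emitter.apply(event));
        }
        return true;
    }
}
```

A concurrent queue is used because reception and emission typically happen on different threads.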

kryonet-flume2storm

This is an implementation of the connection framework that uses KryoNet. It provides:

  • An implementation of the connection parameters, which contains the attributes for the event receptor to connect: address, port, KryoNet buffer sizes.
  • An implementation of the service provider, which basically wraps around the connection parameters. Note that the serialization class for the service provider converts to and from String (which makes it convenient to read when accessing the ZooKeeper data).
  • An implementation of the event receptor, which contains a KryoNet client. Once started, it maintains the connection to the event sender. If the connection is broken, it tries to reconnect, using a configurable reconnection delay, until it is stopped.
  • An implementation of the event sender, which contains a KryoNet server. It currently uses only one strategy to dispatch the events to the event receptors: for each new event to send, it picks the next client in a round-robin fashion and tries to send the event. If the send succeeds, it moves on to the next event. If not, it tries the next client. Once an event has failed to be sent after a configurable number of retries, the sender marks the event as failed and moves on to the next event.
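
The round-robin dispatch strategy of the event sender can be sketched without KryoNet. The class below is hypothetical: the actual network send is abstracted as a BiPredicate that returns false on failure, and the retry limit is interpreted as a maximum number of send attempts per event, which is an assumption about the original behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class RoundRobinDispatcher<C, E> {
    private final List<C> clients;
    private final int maxAttempts;
    private final BiPredicate<C, E> trySend; // returns false when the send fails
    private int next = 0;                    // index of the client to try next

    RoundRobinDispatcher(List<C> clients, int maxAttempts, BiPredicate<C, E> trySend) {
        this.clients = new ArrayList<>(clients);
        this.maxAttempts = maxAttempts;
        this.trySend = trySend;
    }

    /**
     * Tries clients in round-robin order until one accepts the event.
     * Returns false (event marked as failed) after maxAttempts failed attempts
     * or when no client is connected.
     */
    boolean dispatch(E event) {
        if (clients.isEmpty()) {
            return false;
        }
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            C client = clients.get(next);
            next = (next + 1) % clients.size(); // advance even on failure
            if (trySend.test(client, event)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the cursor advances on both success and failure, a failing client is skipped for the current event but is tried again on a later turn.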

static-location-service

This is an implementation of the location service that does not use ZooKeeper, but instead relies on configuration files. In order to use it, a ServiceProviderConfigurationLoader implementation is required, which creates and configures the service providers from the configuration. Upon creation of the static-location-service, the configuration file is read and the list of service providers is generated. After that, calls to register or unregister have no effect.
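
As an illustration of this behaviour, the sketch below loads a fixed provider list once and ignores later registrations. It is built on assumptions: the loader is modelled as a generic Function standing in for ServiceProviderConfigurationLoader (whose real signature is not shown here), providers are plain host:port strings, and the providers property name is invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.function.Function;

final class StaticLocationServiceSketch {
    private final List<String> providers;

    /** Reads the configuration once; the provider list never changes afterwards. */
    StaticLocationServiceSketch(Properties config, Function<Properties, List<String>> loader) {
        this.providers = Collections.unmodifiableList(new ArrayList<>(loader.apply(config)));
    }

    List<String> providers() {
        return providers;
    }

    void register(String provider) {
        // Intentionally a no-op: the list comes from the configuration only.
    }

    void unregister(String provider) {
        // Intentionally a no-op as well.
    }

    /** Hypothetical loader: parses a comma-separated "providers" property. */
    static List<String> loadFromProperties(Properties config) {
        List<String> result = new ArrayList<>();
        for (String entry : config.getProperty("providers", "").split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```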

dynamic-location-service

This is an implementation of the location service that uses ZooKeeper, following the "group membership" pattern. When the dynamic-location-service starts, it connects to the configured ZooKeeper quorum and starts receiving notifications for newly registered or unregistered service providers. While started, the dynamic-location-service tries to maintain the connection to the ZooKeeper quorum. If the connection is broken, it makes a best effort to reconnect using the same session. A call to register() registers a new service provider, and all the connected instances of the ServiceListener receive a notification of the newly registered service provider. Note that a disconnection from ZooKeeper will not trigger unregistration notifications.
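
In the ZooKeeper group membership recipe, each member creates an ephemeral znode under a shared group znode, and observers watch the children of that group. The sketch below shows only the notification logic that could sit on top of it, with no ZooKeeper client involved: it compares successive member snapshots and notifies listeners of the differences. All class and method names are hypothetical, and whether the dynamic-location-service computes its notifications this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MembershipTrackerSketch {
    interface Listener {
        void onAdded(String member);
        void onRemoved(String member);
    }

    /** Listener that records notifications, useful for demonstration. */
    static final class RecordingListener implements Listener {
        final List<String> events = new ArrayList<>();
        @Override public void onAdded(String member) { events.add("+" + member); }
        @Override public void onRemoved(String member) { events.add("-" + member); }
    }

    private final List<Listener> listeners = new ArrayList<>();
    private Set<String> known = new TreeSet<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Called with the latest full member list, e.g. the children of the group node. */
    void onSnapshot(Set<String> current) {
        Set<String> added = new TreeSet<>(current);
        added.removeAll(known);
        Set<String> removed = new TreeSet<>(known);
        removed.removeAll(current);
        known = new TreeSet<>(current);
        for (Listener listener : listeners) {
            for (String member : added) listener.onAdded(member);
            for (String member : removed) listener.onRemoved(member);
        }
    }

    /** A lost connection delivers no snapshot, so nothing is reported as removed. */
    void onDisconnected() {
        // Keep the last known members until a new snapshot arrives.
    }
}
```

Keeping the last known member set across a disconnection matches the note above: listeners see removals only when a later snapshot actually lacks a member.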

Internal Components

The Flume2Storm connector contains other components for internal use by the previously detailed modules:

  • utility: Library for common utilities and tools for the Flume2Storm connector. Amongst others, it contains a CircularList implementation, and a TCPForwarder that is useful for test purposes.

  • kryo-utils: Utility classes related to the usage of Kryo. This library contains a serializer/deserializer for Flume2Storm events, and it is used by the flume-spout as well as the kryonet-flume2storm modules.

  • test-impl: This library contains a simple implementation of the Flume2Storm connector for test and demonstration purposes. All the components of both the location service and connection framework are available. This library is used by the other projects when an implementation of the API is required by the unit tests.

  • integration-tests: Integration tests that put all the components together. It contains a unit test that tests the storm-sink and the flume-spout using all the possible combinations of implementation of the location service and the connection framework. It also contains an example Storm topology in order to validate the Flume2Storm connector using external Storm and Flume clusters.

General workflow

  1. As a service provider, the storm-sink registers its connection parameters to the location service. The location service adds the storm-sink to the service provider list.
  2. The flume-spout connects to the location service and requests the list of service providers. The location service sends the service provider list to the flume-spout. Note that the flume-spout can also receive notification events from the location service in order to listen for the addition or removal of service providers.
  3. The flume-spout uses the service provider list information to connect to the storm-sink.
  4. The storm-sink retrieves the events from the channel, converts them into Flume2Storm events, and finally sends them to the flume-spout.
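
The four steps above can be walked through with a small in-process simulation. Everything here is a hypothetical stand-in: Registry plays the location service, Sink the storm-sink, and Spout the flume-spout. The network connection of step 3 is replaced by a direct method call, and the conversion of step 4 by a simple string prefix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class WorkflowSketch {
    /** Steps 1 and 2: keeps the service provider list, keyed by address. */
    static final class Registry {
        private final Map<String, Sink> providers = new LinkedHashMap<>();
        void register(String address, Sink sink) { providers.put(address, sink); }
        List<String> list() { return new ArrayList<>(providers.keySet()); }
        Sink resolve(String address) { return providers.get(address); }
    }

    /** Steps 1 and 4: registers itself, then drains its channel towards receptors. */
    static final class Sink {
        private final Queue<String> channel = new ArrayDeque<>();
        private final List<Spout> receptors = new ArrayList<>();
        void offer(String flumeEvent) { channel.add(flumeEvent); }
        void accept(Spout spout) { receptors.add(spout); }

        int flush() {
            if (receptors.isEmpty()) {
                return 0; // no receptor connected yet, keep the events
            }
            int sent = 0;
            String event;
            while ((event = channel.poll()) != null) {
                // "Conversion" is just a prefix in this sketch.
                receptors.get(sent % receptors.size()).received.add("f2s:" + event);
                sent++;
            }
            return sent;
        }
    }

    /** Steps 2 and 3: looks up the providers and connects to each of them. */
    static final class Spout {
        final List<String> received = new ArrayList<>();
        void connectAll(Registry registry) {
            for (String address : registry.list()) {
                registry.resolve(address).accept(this);
            }
        }
    }
}
```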