Architecture
The Flume2Storm connector is made of several components:
It contains the APIs and the base classes for the Flume2Storm event. There are 2 APIs:
-
Location Service API. It describes how the location service is used, and what a service provider is. Both the
storm-sink
and theflume-spout
use the location service. Thestorm-sink
is a service provider. -
Connection framework API. It defines:
- Connection Parameters. It describes how to establish a connection between the event sender and Receptor. Implementations may contain attributes such as IP address, port number, or which connection timeout to use.
-
Event Sender. It describes how an event sender is used. The
storm-sink
is an event sender. -
Event Receptor. It describes how an Event Receptor is used. The
flume-spout
is an event receptor.
- Flume2Storm event. It defines what an F2SEvent is and provides utility classes to facilitate working with it.
This is the Flume2Storm implementation of a Flume sink that sends Flume events to Storm. In the context of the location service API, this is a service provider. In the context of the connection framework API, it is an event sender.
When opening, it performs two tasks:
- As an event sender, it initializes and starts the server.
- As a service provider, it starts the location service and registers its connection parameters.
When closing, it unregisters its connection parameters from the location service, and stops the location service and the event sender server.
While active and if event receptors are connected, it reads events from the upstream Flume channel to form a batch of events, and converts the Flume events (from the channel) into Flume2Storm events. It then sends the batch of events to the event receptors. The details of how this last step is done depends on the actual implementation.
This is the Flume2Storm implementation of a Storm spout that receives events from Flume. In the context of the connection framework API, it is an event receptor.
A flume-spout
is configured using event emitters, which define how to convert the Flume2Storm events into Storm tuples and how to emit them down the topology.
When opening, the flume-spout
starts the location service and registers a listener in order to receive service provider addition and removal notifications. When closing, it stops the location service.
While active, the flume-spout
waits for new service providers to "appear". When this happens, it creates an event receptor instance using the associated connection parameters and starts the instance. Note that the connection parameters are provided through the location service. The event receptors of a flume-spout
receive Flume2Storm events from the event senders and store them temporarily in memory, until the nextTuple()
method of the spout is called. When nextTuple()
is called, the event receptor uses the configured event emitter(s) to convert the events into tuples and to emit them down the topology.
This is an implementation of the connection framework that uses KryoNet. It provides:
- An implementation of the connection parameters, which contains the attributes for the event receptor to connect: address, port, KryoNet buffer sizes.
- An implementation of the service provider, which basically wraps around the connection parameters. Note that the serialization class for the service provider converts to and from
String
(which makes it convenient to read when accessing the ZooKeeper data). - An implementation of the event receptor, which contains a KryoNet client. Once started, it maintains the connection to the event sender. If the connection is broken, it tries to reconnect, using a configuration reconnection delay, until it is stopped.
- An implementation of the event sender, which contains a KryoNet server. It currently uses only one strategy to dispatch the events to the event receptors: for each new event to send, pick the next client in a round-robin fashion and try to send the event. If it succeeds, move on to the next event. If not, try the next client. Once a message has failed to be sent for a configurable amount of retries, it marks the event as failed and moves on to the next event.
This is an implementation of the location service that does not use ZooKeeper, but instead relies on configuration files.
In order to use it, a ServiceProviderConfigurationLoader
implementation is required, which creates and configures the service providers from the configuration.
Upon creation of the static-location-service
, the configuration file is read and the list of service providers generated. After that, calls to register or unregister will not do anything.
This is an implementation of the location service that uses ZooKeeper, especially using the "group membership" pattern.
When starting the dynamic-location-service
, it connects to the configured ZooKeeper quorum and starts receiving notifications for newly registered or unregistered service providers. While started, the dynamic-location-service
tries to maintain the connection to the ZooKeeper quorum. If broken, it makes a best effort to reconnect using the same session.
Calls to register()
registers a new service provider and all the connected instances of the ServiceListener
receive a notification of the newly registered service provider. Note that a disconnection from ZooKeeper will not trigger unregistration notifications.
The Flume2Storm connector contains other components for internal use by the previously detailed modules:
-
utility
: Library for common utilities and tools for the Flume2Storm connector. Amongst others, it contains aCircularList
implementation, and aTCPForwarder
that is useful for test purposes. -
kryo-utils
: Utility classes related to the usage of Kryo. This library contains a serializer/deserializer for Flume2Storm events, and it is used by theflume-spout
as well as thekryonet-flume2storm
modules. -
test-impl
: This library contains a simple implementation of the Flume2Storm connector for test and demonstration purposes. All the components of both the location service and connection framework are available. This library is used by the other projects when an implementation of the API is required by the unit tests. -
integration-tests
: Integration tests that put all the components together. It contains a unit test that tests thestorm-sink
and theflume-spout
using all the possible combinations of implementation of the location service and the connection framework. It also contains an example Storm topology in order to validate the Flume2Storm connector using external Storm and Flume clusters.
- As a service provider, the
storm-sink
registers its connection parameters to the location service. The location service adds thestorm-sink
to the service provider list. - The
flume-spout
connects to the location service and requests the list of service providers. The location service sends the service provider list to theflume-spout
. Note that notifications events from the location service can be received in order to listen for addition or removal of service providers. - The
flume-spout
uses the service provider list information to connect to thestorm-sink
. - The
storm-sink
retrieves the events from the Channel, converts them into Flume2Storm events, and finally sends then to theflume-spout
.