Skip to content
JohnDaws edited this page Aug 13, 2018 · 2 revisions

The following example pipelines demonstrate how to use each of the transports to scale a Baleen process. A more detailed introduction can be found in Baleen scaling. It is assumed that the reader is familiar with setting up simple Baleen pipelines and so little explanation is given here regarding the basics of collection readers, annotators and consumers.

The configuration for each transport assumes the default (for example if working with the Docker examples at the foot of this page) but will likely need to be changed real applications.

Vertical scaling configuration

config.yml

pipelines:
- name: receiver
  multiplicity: 2
  file: ./receiver.yml
- name: sender
  file: .sender.yml

Horizontal scaling configuration

master_config.yml

pipelines:
# Optional receiver on master
#- name: receiver
#  multiplicity: 2
#  file: ./receiver.yml
- name: sender
  file: ./sender.yml

slave_config.yml

pipelines:
- name: receiver
  multiplicity: 4
  file: ./receiver.yml

In memory

(for vertical scaling only)

sender.yml

collectionreader:
  class: FolderReader
  folders:
  - ./files
   
consumers:
- class: uk.gov.dstl.baleen.transports.memory.MemoryTransportSender
#   topic: transport
#   capacity:
#   blacklist:
#   whitelist: 

receiver.yml

 
collectionreader:
  class: uk.gov.dstl.baleen.transports.memory.MemoryTransportReceiver 
#   topic: transport
#   blacklist:
#   whitelist: 

annotators:
- myAnnotators

consumers:
- print.Entities

ActiveMQ

http://activemq.apache.org/

sender.yml

activemq:
#  host: localhost
#  port: 61616
#  protocol: tcp
#  brokerargs: 
#  topic: baleen
#  user:
#  pass:

collectionreader:
  class: FolderReader
  folders:
  - ./files  

consumers:
- class: uk.gov.dstl.baleen.transports.activemq.ActiveMQTransportSender
#   topic: transport
#   capacity:
#   blacklist:
#   whitelist: 

receiver.yml

activemq:
#  host: localhost
#  port: 61616
#  protocol: tcp
#  brokerargs: 
#  topic: baleen
#  user:
#  pass:
 
collectionreader:
  class: uk.gov.dstl.baleen.transports.activemq.ActiveMQTransportReceiver 
#   topic: transport
#   blacklist:
#   whitelist: 

annotators:
- myAnnotators

consumers:
- print.Entities  

Kafka

https://kafka.apache.org/

sender.yml

kafka:
#  host: localhost
#  port: 9092
#  consumerGroupId: 1
 
collectionreader:
  class: FolderReader
  folders:
  - ./files
   
consumers:
- class: uk.gov.dstl.baleen.transports.kafka.KafkaTransportSender
#   topic: transport
#   capacity:
#   blacklist:
#   whitelist: 

receiver.yml

kafka:
#  host: localhost
#  port: 9092
#  consumerGroupId: 1

collectionreader:
  class: uk.gov.dstl.baleen.transports.kafka.KafkaTransportReceiver 
#  topic: transport
#  blacklist:
#  whitelist: 
#  timeout: 1000
#  maxPollDocs: 1
#  offset: earliest

annotators:
- myAnnotators

consumers:
- print.Entities  

RabbitMQ

https://www.rabbitmq.com/

sender.yml

rabbitmq:
#  host: localhost
#  port: 5672
#  virtualHost: /
  user: guest
  pass: guest
#  https: false
# Only relevent for http true
#  trustAll: false
# Only relevent for http true and trustAll false
#  sslprotocol: TLSv1.1
#  keystorePass: 
#  keystorePath: 
#  truststorePass: 
#  truststorePath: 
 
collectionreader:
  class: FolderReader
  folders:
  - ./files
  
consumers:
- class: uk.gov.dstl.baleen.transports.rabbitmq.RabbitMQTransportSender
#   topic: transport
#   capacity:
#   blacklist:
#   whitelist:   
#   exchange:
#   routingKey

receiver.yml

rabbitmq:
#  host: localhost
#  port: 5672
#  virtualHost: /
  user: guest
  pass: guest
#  https: false
# Only relevent for http true
#  trustAll: false
# Only relevent for http true and trustAll false
#  sslprotocol: TLSv1.1
#  keystorePass: 
#  keystorePath: 
#  truststorePass: 
#  truststorePath: 
 
collectionreader:
  class: uk.gov.dstl.baleen.transports.rabbitmq.RabbitMQTransportReceiver 
#   topic: transport
#   blacklist:
#   whitelist: 
  
annotators:
  - myAnnotators

consumers:
- print.Entities

Redis

https://redis.io/

sender.yml

redis:
#  host: localhost
#  port: 6379
 
collectionreader:
  class: FolderReader
  folders:
  - ./files
   
consumers:
- class: uk.gov.dstl.baleen.transports.redis.RedisTransportSender
#  topic: transport
#  capacity:
#  blacklist:
#  whitelist: 
 

receiver.yml

redis:
#  host: localhost
#  port: 6379
 
collectionreader:
  class: uk.gov.dstl.baleen.transports.redis.RedisTransportReceiver 
#   topic: transport
#   blacklist:
#   whitelist: 
  
annotators:
- myAnnotators

consumers:
- print.Entities

Docker commands for testing

For testing purposes with vertical scaling the following Docker commands may be useful to run the transports on the localhost. Note that in Windows it may be necessary to change the host to the Docker container IP in the sender and receiver yaml files.

ActiveMQ

docker run -d -e 'ACTIVEMQ_CONFIG_MINMEMORY=512' -e 'ACTIVEMQ_CONFIG_MAXMEMORY=2048' -p 61616:61616 -p 8161:8161 webcenter/activemq:5.14.3

You can then go to http://localhost:8161 in a browser to use the management console with credentials admin:admin

Kafka

docker run -d -p 2181:2181 -p 9092:9092 -e ADVERTISED_HOST=`localhost` -e ADVERTISED_PORT=9092 spotify/kafka

or on Windows if kafka is not visible on the localhost:

docker run -d -p 2181:2181 -p 9092:9092 -e ADVERTISED_HOST=`docker-machine ip \`docker-machine active\`` -e ADVERTISED_PORT=9092 spotify/kafka

NB we use this image as it includes the Zookeeper service in the same image.

RabbitMQ

docker run -d --hostname my-rabbit -p 15672:15672 -p 5672:5672 rabbitmq:3-management

You can then go to http://localhost:15672 in a browser to use the management console with credentials guest:guest

Redis

docker run -d -p 6379:6379 redis:3.2.9