Philip (flip) Kromer edited this page May 16, 2012 · 9 revisions

Guide to the Infochimps Toolscape

  • Gorillib: minimal-dependency core-language extensions and low-level toolkit, with fine-grained control over what is loaded.

    • core -- indispensable improvements to the core language, metaprogramming support, and broadly useful convenience methods.
    • path_helpers -- path_to, autoload_paths (like $PATH)
    • model -- lightweight structured object definition
    • builder_model -- builder (dsl-style) model definition
    • icss -- construct structured objects from a schema
  • Configliere: Manage settings

    • commandline -- read settings from command-line parameters
    • command -- git-style executables that provide multiple commands (including command-scoped options)
    • layer -- project settings through a late-resolved stack of config objects. Intended to solve three problems: patching the config for test/dev/prod; commands that each use only some of the config variables; and the organization / cloud / cluster / facet / server hierarchy.
  • Wukong: data flows and job flows

    • fs -- abstracts local files, HDFS (via JRuby or Thrift), s3n, s3hdfs, and scp.
    • job -- Workflow definition
    • flow -- Dataflow definition
    • streamer -- Black-box data transform
    • widgets -- Common data transforms
    • hadoop_rb -- Hadoop jobs using streamers
    • flume_rb -- Flume flow configuration and wukong-streamer flume decorators
  • Vayacondios: data goes in, the right thing happens. Universal routing of facts, configuration and metrics throughout the organization.

    • server -- receive events via HTTP, websockets, flume, or UDP (statsd), and make simple constrained queries.
    • configliere -- transparently syndicate configuration through configliere
    • notifications -- an ActiveSupport::Notifications observer
    • triggers -- plug in any wukong transformer to manipulate events upon receipt
    • workers -- a collection of small daemons that pull in truth, on a schedule or on demand.
    • backend -- mongo; cube; activity stream
  • Hanuman: Elegant small graph assembly

    • models -- graph, stage, event
    • binding -- bind graph to resources, execute it
    • graphdot -- express a graph in .dot (Graphviz) format, and from there render it as PNG or SVG.
    • canvas -- view and edit a hanuman graph from your browser
  • Swineherd: Common interface on ugly tools

    • commander -- turn a readable hash into a safe command line (parameter conversion, escaping). Open question: does this belong in Configliere?
    • launcher -- execute a command and capture its stdout/stderr
    • reporter -- summarize execution as a Vayacondios-ready hash
    • mixins -- Java, GNU-style, has_input/output
    • template -- template scripts with configliere variables
    • apps -- Hadoop, Pig, Flume; possibly also cp, mv, rm, zip, tar, bz2, gz, ssh, scp
  • Ironfan: System Diagram come to life

    • models -- cluster, facet, server, machine, component, aspect, announcement. Use configliere/layer and gorillib/dsl_model.
    • knife plugins --
    • silverware -- discovery and aspect slicing
    • pantry -- cookbooks
    • ci --
    • chimpstation -- set up a workstation
    • homebase -- organizes all of it
  • Goliath + SenorArmando:
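
The Configliere layer item describes a "late-resolved stack of config objects". The following plain-Ruby sketch illustrates that idea. It is not Configliere's API: the class `LayeredConfig` and its `push`, `[]` and `source_of` methods are hypothetical. Each layer is a hash, and later layers override earlier ones. Lookups walk the stack only when a value is read, so a layer pushed late (for example, command-specific overrides) still takes effect.

```ruby
# Hypothetical sketch of a late-resolved settings stack; NOT Configliere's API.
class LayeredConfig
  def initialize
    @layers = [] # ordered from lowest to highest priority
  end

  # Push a named layer (a Hash); later layers win on lookup.
  def push(name, values)
    @layers << [name, values]
    self
  end

  # Resolve lazily: scan from the most recently pushed layer downward.
  def [](key)
    @layers.reverse_each do |_name, values|
      return values[key] if values.key?(key)
    end
    nil
  end

  # Report which layer supplied a value (handy when debugging overrides).
  def source_of(key)
    @layers.reverse_each do |name, values|
      return name if values.key?(key)
    end
    nil
  end
end

config = LayeredConfig.new
config.push(:defaults, log_level: "info", db_host: "localhost")
config.push(:cluster, db_host: "db.example.com")
config.push(:command_line, log_level: "debug")

puts config[:db_host]            # db.example.com (from :cluster)
puts config[:log_level]          # debug (from :command_line)
puts config.source_of(:db_host)  # cluster
```

Because nothing is merged up front, the per-environment, per-command, and per-cluster concerns stay in separate layers and are only combined at read time.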

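The Swineherd commander and launcher items describe two steps: turning a readable hash into a safely escaped command line, then running it while capturing its output. The following sketch covers both using only Ruby's standard library (`Shellwords` and `Open3.capture3`). The helper names `hash_to_argv` and `build_command_line` are hypothetical, not Swineherd's API. The `--key=value` GNU-style flag convention is also an assumption.

```ruby
require "shellwords"
require "open3"

# Hypothetical helper: convert a readable options hash into GNU-style flags.
# true -> bare flag (--verbose); false/nil -> omitted; anything else -> --key=value
def hash_to_argv(program, options = {})
  argv = [program]
  options.each do |key, value|
    flag = "--#{key.to_s.tr('_', '-')}"
    case value
    when true then argv << flag
    when false, nil then next
    else argv << "#{flag}=#{value}"
    end
  end
  argv
end

# Shell-escaped single string, e.g. for logging or templated scripts.
def build_command_line(argv)
  Shellwords.join(argv)
end

argv = hash_to_argv("echo", output_dir: "my results", verbose: true, dry_run: false)
puts build_command_line(argv)

# Run without a shell (argv form), capturing stdout, stderr and exit status.
stdout, stderr, status = Open3.capture3(*argv)
warn stderr unless stderr.empty?
puts stdout
puts status.success?
```

Passing the argv array straight to `Open3.capture3` avoids shell interpretation entirely. The escaped string is only needed when a human-readable or scriptable form of the command is wanted.
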
Toolscape Ecosystem

We build on the following tools:

  • Testing:

    • rspec
    • guard (not watchr), with plugins for arbitrary processes (guard-process), rspec, chef, and LiveReload.
    • machinist for fixture generation
    • simplecov for code coverage
    • spork, but only if your test environment takes a long time to load.
  • Conversion:

    • nibbler for HTML parsing
    • addressable for URL parsing; see also postrank-uri
    • FasterCSV for CSV parsing
    • Oj for JSON
    • Erubis for simple ERB templating; Tilt for generic templating
    • Redcarpet for Markdown
  • Documentation

  • Web requests

    • within EventMachine, use em-http-request.
    • otherwise, use rest-client
  • Web Framework

    • rails
      • use haml, not ERB
      • use Compass and Sass (the indented syntax, not SCSS), with the Twitter Bootstrap CSS framework
      • use jQuery
    • goliath
    • dalli for caching on Heroku
    • jasmine for Javascript testing
    • devise, warden and omniauth for authentication
  • Database

    • tire for Elasticsearch
    • mongoid for MongoDB
    • mysql2 for mysql
  • Development Helpers