Skip to content

praxis-of-nines/khafra-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Khafra Search

Khafra allows you to easily deploy and run Manticore or Sphinx as an Elixir deployment. The idea is eventually to create a managed cluster similar to ElasticSearch. You will want to use khafra if:

  • You want to keep your deployments and tooling Elixir based and not learn how to maintain Sphinx or Manticore
  • You test and reindex your project often in development and wish to speed up your development cycle
  • You find it messy to keep track of all the configuration files for projects over and over
  • You wish to deploy Sphinx without configuring CRON jobs on the server
  • You want to monitor sphinx from an Elixir or Phoenix project
  • New to search indexing and want to learn

Khafra is a dependency that can be added to any Elixir project and uses Quantum to handle job execution schedules (for non-real time indexing use cases).

To query your running sphinx environment you can use the Giza Sphinx Client for Elixir

Installation

def deps do
  [
    {:khafra_search, "~> 0.1"}
  ]
end

# Add to your application or supervisor
def start(_type, _args) do
    import Supervisor.Spec

    # List all child processes to be supervised
    children = [
      ...,
      supervisor(Khafra.Supervisor, [])
    ]

    opts = [strategy: :one_for_one, name: YourApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

Getting Started

First up set some indexing config in your application to connect to your database and index a table:

config :khafra_search, :source_sqldb,
  adapter: :postgres,
  database: "database_name",
  username: "db_user_name",
  password: "db_user_pass",
  hostname: "localhost"
# Note the \\ newline deliminator for query strings
config :khafra_search, :source_person,
  parent: :source_sqldb,
  query: """
    SELECT id, name, company, title, updated_at \\
    FROM persons
  """,
  attributes: [
    updated_at: :datetime],
  fields: [
    name:    :string,
    company: :string,
    title:   :string
  ]

config :khafra_search, :i_person,
  parent: :index_defaults,
  source: :source_person

# Specifying indices to index allows to change which indexes are rotated per environment
config :khafra_search, 
  indices: [:i_person]

Now let's do a local Sphinx install and query some data

# May take time depending on connection speed
> mix khafra.sphinx.download linux_64

> mix khafra.gen.sphinxconf

# Try out your config
> mix khafra.sphinx.index all

# Start the search daemon
> mix khafra.sphinx.searchd

# You can now query sphinx! (Recommendation: use the Giza Elixir search client).  Let's rotate the index while running
> mix khafra.sphinx.index rotate all

Rotating the index can be set on a schedule using the 'advanced' configuration options. You should now be able to test locally and your deployment depends entirely on how you like to run your Elixir deployments.

Advanced Configuration

Configure the indexer to rotate once per day (see Quantum for more details):

config :khafra_search, Khafra.Scheduler,
  timezone: "America/Los_Angeles",
  global: true,
  timeout: :infinity,
  jobs: [
    {"* * * * *", {Khafra.Job.Index, :run, [
      [{:option, "--rotate"}, {:option, "--all"}]
    ]}},
    {"@daily", {Khafra.Job.Index, :run, [{:option, "--rotate"}, {:option, "--all"}]}}
  ]

Configure other indexer defaults + generate wordforms (see Sphinx Docs for details):

# Note the cwd! keyword so the generator uses an absolute path for all of your environments
config :khafra_search, :index_defaults,
  type: "plain",
  source: {:sql, :source_sqldb},
  morphology: "none",
  min_stemming_len: "1",
  min_word_len: "1",
  min_infix_len: "2",
  html_strip: "0",
  preopen: "0",
  wordforms: "[cwd!]/sphinx/wordforms.txt"

> mix khafra.gen.wordform "s02e02" "season 2 episode 2"

> mix khafra.sphinx.index rotate all

Deployment Example

Coming soon

Development Direction (potential features upcoming):

  • Generators for some of the deployment tasks that are useful. For example when using distillery you probably want to set these commands up:
#!/bin/sh

release_ctl eval --mfa "Khafra.ReleaseTasks.download_sphinx/1" --argv -- "$@"
#!/bin/sh

release_ctl eval --mfa "Khafra.ReleaseTasks.generate_config/1" --argv -- "$@"
#!/bin/sh

release_ctl eval --mfa "Khafra.ReleaseTasks.indexer/1" --argv -- "$@"
#!/bin/sh

release_ctl eval --mfa "Khafra.ReleaseTasks.searchd/1" --argv -- "$@"

And in your rel/config.exs

environment :prod do
  set include_erts: true
  set include_src: false
  set cookie: :"some complicated cookie"
  set commands: [
    index: "rel/commands/indexer.sh",
    searchd: "rel/commands/searchd.sh",
    gen_config: "rel/commands/gen_config.sh",
    download_sphinx: "rel/commands/download_sphinx.sh"
  ]
end
  • Helpers for distributed indexes so that a cluster knows exactly what to do without more specific instructions and sphinx configuration

  • Testing and helpers to maintain a real time sphinx index

  • Mix tasks to generate configuration with sensible defaults (possibly directly from postgres/mysql table data?)

  • Monitoring tools and optional UI

  • Heartbeat and monitoring logic

About

A deployable service for a fast and reliable distributable search index

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages