Steve Cook edited this page Aug 5, 2013 · 3 revisions

Welcome to the open-source-search-engine wiki!

Quick install instructions are available.

Features

  • Live demo at http://www.gigablast.com/
  • Written in C/C++ for optimal performance.
  • Over 500,000 lines of C/C++.
  • 100% custom, built as a single binary. The web server, the database, and everything else are implemented efficiently within this source code.
  • Scalable to thousands of servers. Has scaled to over 12 billion web pages on over 200 servers.
  • Reliable. Has been tested in live production since 2002, serving billions of queries against indexes of over 12 billion web pages.
  • Track record. Has been used by many clients. Has been successfully used in distributed enterprise software.
  • Cached web pages with query term highlighting.
  • Supports any document conversion plugin for converting PDF and other formats to HTML.
  • Shows popular topics of search results (Gigabits).
  • Email alert monitoring.
  • "Synonyms" based on Wiktionary data, applied through query expansion.
  • Customizable "synonym" file: my-synonyms.txt
  • Stores position and format information of each word in an indexed document.
  • Complete scoring details are displayed in the search results.
  • Indexes anchor text of inlinks to a web page.
  • Can cluster results from same site.
  • Duplicate removal from search results.
  • Distributed web crawler.
  • Crawler/spider is highly programmable, and URLs are binned into priority queues. Each priority queue has its own throttle and maximum-outstanding-connections parameters.
  • Complete REST/XML API.
  • Can inject documents into the index in real time using XML or HTML.
  • Automated detection and repair of data corruption caused by hardware failures.
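The wiki does not describe how the spider schedules its priority queues, so the following is only a minimal sketch of the idea, assuming each queue caps its in-flight fetches (`maxOutstanding`) and enforces a minimum delay between launches (`minDelayMs`, the "throttle"). The names `PriorityQueue`, `Scheduler`, `next`, and `done` are hypothetical and are not taken from the Gigablast source.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical per-priority crawl queue (not Gigablast's actual spider code).
struct PriorityQueue {
    std::deque<std::string> urls;  // URLs waiting to be fetched
    int maxOutstanding = 1;        // cap on concurrent connections for this queue
    int64_t minDelayMs = 0;        // throttle: minimum gap between two launches
    int outstanding = 0;           // fetches currently in flight
    int64_t nextAllowedMs = 0;     // earliest time the next launch may happen
};

class Scheduler {
public:
    // Assumption for this sketch: a higher number means a higher priority.
    void addQueue(int priority, int maxOutstanding, int64_t minDelayMs) {
        assert(maxOutstanding > 0 && minDelayMs >= 0);
        PriorityQueue& q = queues_[priority];
        q.maxOutstanding = maxOutstanding;
        q.minDelayMs = minDelayMs;
    }

    void addUrl(int priority, const std::string& url) {
        queues_.at(priority).urls.push_back(url);
    }

    // Returns (priority, url) for the next fetch allowed at time nowMs, scanning
    // queues from highest to lowest priority. Returns nothing if every non-empty
    // queue is at its connection cap or still throttled.
    std::optional<std::pair<int, std::string>> next(int64_t nowMs) {
        for (auto it = queues_.rbegin(); it != queues_.rend(); ++it) {
            PriorityQueue& q = it->second;
            if (q.urls.empty() || q.outstanding >= q.maxOutstanding ||
                nowMs < q.nextAllowedMs) {
                continue;
            }
            std::string url = q.urls.front();
            q.urls.pop_front();
            ++q.outstanding;
            q.nextAllowedMs = nowMs + q.minDelayMs;
            return std::make_pair(it->first, url);
        }
        return std::nullopt;
    }

    // Call when a fetch launched from queue `priority` completes.
    void done(int priority) {
        PriorityQueue& q = queues_.at(priority);
        assert(q.outstanding > 0);
        --q.outstanding;
    }

private:
    std::map<int, PriorityQueue> queues_;
};
```

In this sketch a throttled or saturated high-priority queue does not block lower-priority queues; the scheduler simply falls through to the next queue that is allowed to launch.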
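The synonym feature is described only as query expansion driven by Wiktionary data and a customizable `my-synonyms.txt`. The sketch below illustrates query expansion in general, assuming a hypothetical file format of one comma-separated synonym group per line (for example `car,automobile,auto`). The real format of `my-synonyms.txt` may differ, and `parseSynonymGroups` and `expandQuery` are illustrative names, not Gigablast functions.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using SynonymMap = std::map<std::string, std::vector<std::string>>;

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical format: one synonym group per line, words separated by commas.
// Every word in a group maps to all the other words in that group.
SynonymMap parseSynonymGroups(std::istream& in) {
    SynonymMap syn;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> group;
        std::stringstream ss(line);
        std::string word;
        while (std::getline(ss, word, ',')) {
            std::string w = trim(word);
            if (!w.empty()) group.push_back(w);
        }
        for (const auto& w : group)
            for (const auto& other : group)
                if (other != w) syn[w].push_back(other);
    }
    return syn;
}

// Query expansion: each query term becomes a list of alternatives (the term
// itself first, then its synonyms) that a search would OR together.
std::vector<std::vector<std::string>> expandQuery(
        const std::vector<std::string>& terms, const SynonymMap& syn) {
    std::vector<std::vector<std::string>> expanded;
    for (const auto& t : terms) {
        std::vector<std::string> alts{t};
        auto it = syn.find(t);
        if (it != syn.end())
            alts.insert(alts.end(), it->second.begin(), it->second.end());
        expanded.push_back(alts);
    }
    return expanded;
}
```

With a group line `fast,quick`, the query `fast car` would expand so that a document containing `quick` can also match the first term. Keeping the original term first leaves room for a ranker to weight exact matches above synonym matches.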
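"Can cluster results from same site" is not explained further here. One common way to do it, shown as a minimal sketch, is to walk the results in rank order and keep at most a fixed number per site. This sketch assumes the hostname is the "site", which is a simplification, and the helper names `hostOf` and `clusterBySite` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Simplification: treat the URL's hostname (without port) as its "site".
std::string hostOf(const std::string& url) {
    std::string::size_type start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    std::string::size_type end = url.find_first_of("/:?#", start);
    return url.substr(start, end == std::string::npos ? std::string::npos
                                                      : end - start);
}

// Given results already sorted best-first, keep at most maxPerSite per host,
// preserving the original rank order of the results that survive.
std::vector<std::string> clusterBySite(const std::vector<std::string>& rankedUrls,
                                       int maxPerSite) {
    assert(maxPerSite > 0);
    std::unordered_map<std::string, int> shown;  // results kept so far per host
    std::vector<std::string> out;
    for (const auto& url : rankedUrls) {
        int& count = shown[hostOf(url)];
        if (count < maxPerSite) {
            ++count;
            out.push_back(url);
        }
    }
    return out;
}
```

Because filtering happens after ranking, a site's best-scoring pages are the ones that remain visible; its lower-ranked pages are hidden rather than reordered.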

Features available but currently disabled due to a code overhaul

  • Boolean query support
  • Spellchecker