
Releases: schranz-search/schranz-search

Release 0.4.0 (2024-03-21)

20 Mar 23:45
c0b3df6

Full Changelog

Release 0.3.1 (2024-01-19)

19 Jan 12:21
6e84eb2

Full Changelog

Release 0.3.0 (2024-01-10)

10 Jan 21:00
51ce9d4

Highlights

  • 🔍 Support for Loupe 0.5
  • 🏗️ Support for Symfony 7
  • 🐘 Support for PHP 8.3

Full Changelog

Release 0.2.1 (2023-11-21)

21 Nov 23:04
fb2845d

Full Changelog | Documentation | Repository

Release 0.2.0 (2023-09-29)

28 Sep 22:54
082a303

SEAL welcomes Loupe

A search engine based on only PHP and SQLite

Loupe

An SQLite based, PHP-only fulltext search engine.

Use the new adapter with schranz-search/seal-loupe-adapter.

Read more about the Loupe project by @Toflar: https://github.com/loupe-php/loupe.


Full Changelog

Release 0.1.1 (2023-06-05)

05 Jun 17:58
39c14e3

Full Changelog

First Release of SEAL (2023-05-16)

15 May 23:51
be611c1
Schranz Search Logo with a Seal on it with a magnifying glass

Schranz Search - First Release of SEAL

Monorepository for SEAL, a Search Engine Abstraction Layer with support for different search engines
Documentation | Packages

Elasticsearch | Opensearch | Meilisearch | Algolia | Solr | Redisearch | Typesense
PHP | Symfony | Laravel | Spiral | Mezzio | Yii



Hello and welcome 👋,

About six months ago, at the beginning of December 2022, I started the "Schranz-Search" project, out of which SEAL was later born. At first the project started more as research into the different search engines that are around. At that time, with very limited knowledge about alternatives to Elasticsearch, I was very curious about what exists "beyond the tellerrand". With the support of different communities on Twitter, Reddit, meetups and elsewhere, I could create a list of different search engines, and the list was bigger than expected and still grows.

My personal experience as a Core Developer at Sulu, a Symfony-based CMS, was limited to Elasticsearch. After having a look at the different search engines that exist, I had to sort out which ones make sense to add to such an abstraction and are most used by the PHP community. Besides Opensearch, which as a fork of Elasticsearch should be easy to support, I had a look at Algolia and Meilisearch, and so had the first bunch of search engines together that I wanted to support. And so the start was made for SEAL, the Search Engine Abstraction Layer.

Avoiding bringing complexity and search jargon to the end user

Search engines can be complex, and they all have their own terms for different things. The target for the project was to hide the complexity of the different search engines behind an easily understandable interface and so be very beginner friendly. The important part here was how the definition of the data to be added to the search engine needs to be structured. Different search engines have different terms to define their mappings, fields, options, ... In the search engine abstraction layer I wanted to avoid terms like doc_values: true, index: true, TAGS, keyword or other special terms of the different search engines. In my research I stumbled over the Meilisearch definitions and really liked how they are targeting this issue. Instead of using special search jargon, in Meilisearch you just tell it what you want to do with the data fields you are indexing / saving. So a simple configuration inspired by Meilisearch was shipped with SEAL, using simple, understandable words like searchable, filterable and sortable. And so the following Schema definition was born:

<?php

use Schranz\Search\SEAL\Schema\Field;
use Schranz\Search\SEAL\Schema\Index;
use Schranz\Search\SEAL\Schema\Schema;

$schema = new Schema([
    'blog' => new Index('blog', [
        'id' => new Field\IdentifierField('id'),
        'title' => new Field\TextField('title', sortable: true),
        'description' => new Field\TextField('description'),
        'tags' => new Field\TextField('tags', multiple: true, filterable: true),
        'published' => new Field\DateTimeField('published', sortable: true),
        'comments' => new Field\ObjectField('comments', [
            'text' => new Field\TextField('text', searchable: false),
            'author' => new Field\IntegerField('author'),
        ], multiple: true),
    ]),
]);

To be as near as possible to PHP with the definitions, the following types were supported: Text, Integer, Float, Boolean and DateTime. This way all kinds of PHP types are represented, and with the multiple flag every type can also be an array of data. With a special type called Object, even associative arrays can be added.
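To make the type mapping concrete, here is a sketch of what a document for the blog index defined above could look like as a plain PHP array. All values are made up, and passing the DateTime field as an ISO 8601 string is an assumption made for this illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical example document for the 'blog' index defined above.
// All values are invented for illustration.
$document = [
    'id' => '1',                                // IdentifierField
    'title' => 'Hello SEAL',                    // TextField
    'description' => 'A first post.',           // TextField
    'tags' => ['php', 'search'],                // multiple: true => a list of values
    'published' => '2023-05-16T00:00:00+00:00', // DateTimeField, ISO 8601 string (assumption)
    'comments' => [                             // ObjectField with multiple: true
        ['text' => 'Nice!', 'author' => 42],    // each entry is an associative array
        ['text' => 'Thanks.', 'author' => 7],
    ],
];
```

The multiple flag shows up as a plain list, and the Object type as a nested associative array.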

Strict vs. Dynamic Schema

There was nearly no discussion for me about going with a dynamic schema; I always wanted to go with a strict schema like the ones defined for databases. The first reason was that not all search engines support dynamic schemas. If you are new to search engines: a dynamic schema means that you can push any data to the engine and, by some kind of magic, the engine puts each field into a specific type and configuration. For example, a string will become a text type in Elasticsearch, but if the first string pushed looks like a date, the field becomes a date field and later non-date text will fail. My experience with this kind of mechanism was really bad and I only recommend it for quick prototyping. By going with a fixed and strict schema I wanted to prevent unwanted magic and add support for a wider range of search engines which do not support that kind of magic.
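The difference can be sketched in a few lines of plain PHP. The function name and the field list below are hypothetical and not part of SEAL; the point is only that a strict schema rejects unknown fields and never guesses a type from the first value it sees.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical strict check, not SEAL's implementation.
 *
 * @param array<string, string> $fields   field name => expected type as reported
 *                                        by get_debug_type() ('string', 'int', ...)
 * @param array<string, mixed>  $document
 */
function assertMatchesStrictSchema(array $fields, array $document): void
{
    foreach ($document as $name => $value) {
        // Unknown fields are rejected instead of being added on the fly.
        if (!isset($fields[$name])) {
            throw new InvalidArgumentException(sprintf('Unknown field "%s".', $name));
        }

        // The declared type decides, not the shape of the first value.
        $actual = get_debug_type($value);
        if ($actual !== $fields[$name]) {
            throw new InvalidArgumentException(
                sprintf('Field "%s" expects %s, got %s.', $name, $fields[$name], $actual)
            );
        }
    }
}

$fields = ['id' => 'string', 'title' => 'string', 'rating' => 'float'];

// A title that looks like a date is still just text under a strict schema.
assertMatchesStrictSchema($fields, ['id' => '1', 'title' => '2023-05-16']);

try {
    assertMatchesStrictSchema($fields, ['id' => '1', 'subtitle' => 'oops']);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```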

Creating a single interface to communicate with the search engine

After defining the fields, the next and most important part was how to create the interface through which the user of the library communicates with the search engines. I'm a really big fan of @frankdejonge's work on Flysystem, an abstraction for local and remote filesystems. It uses a single class and the Adapter design pattern to communicate with the different systems. That was a pattern we could definitely reuse for our abstraction. Another library which had an impact on the architecture is @doctrine: in the first implementation of SEAL I went with a SchemaManager and a Connection, which is very similar to how Doctrine/DBAL works. After implementing several Adapters I decided to split the Connection class into two separate classes, the Indexer and the Searcher. Thanks here to @wachterjohannes and @Toflar, who helped me find a good way to split reading and writing and so make things like a ReadWriteAdapter a lot easier. But back to the more important class, the Engine, which is responsible for providing a single interface for the end user of the library to communicate with their different search engines. For this we added the following methods to it:

interface EngineInterface
{
    public function saveDocument(string $index, array $document): void;

    public function deleteDocument(string $index, string $identifier): void;

    /**
     * @throws DocumentNotFoundException
     *
     * @return array<string, mixed>
     */
    public function getDocument(string $index, string $identifier): array;

    public function createSearchBuilder(): SearchBuilder;

    public function createIndex(string $index): void;

    public function dropIndex(string $index): ?TaskInterface;

    public function existIndex(string $index): bool;

    public function createSchema(): void;

    public function dropSchema(): void;
}

Using a string representation of the index makes things easier for the end user: without any imports or loading they are able, with an instance of the Engine, to add, delete, search and manage their search engine's indexes. Internally the Engine forwards the Index instance, and so the configured fields, to the Adapter so that the adapter can work with it.
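The forwarding described above can be sketched roughly as follows. Every class and method name in this block (SketchIndex, SketchAdapter, InMemoryAdapter, SketchEngine) is hypothetical and heavily simplified; it is not SEAL's actual code, only an illustration of resolving the index name to its definition before handing it to an adapter.

```php
<?php

declare(strict_types=1);

// All names below are hypothetical and simplified; this is not SEAL's code.

final class SketchIndex
{
    /** @param array<string, string> $fields field name => type label */
    public function __construct(
        public readonly string $name,
        public readonly array $fields,
    ) {
    }
}

interface SketchAdapter
{
    /** @param array<string, mixed> $document */
    public function save(SketchIndex $index, array $document): void;
}

final class InMemoryAdapter implements SketchAdapter
{
    /** @var array<string, array<string, array<string, mixed>>> */
    public array $documents = [];

    public function save(SketchIndex $index, array $document): void
    {
        // The adapter receives the full index definition, not just its name.
        $this->documents[$index->name][(string) $document['id']] = $document;
    }
}

final class SketchEngine
{
    /** @param array<string, SketchIndex> $schema */
    public function __construct(
        private readonly SketchAdapter $adapter,
        private readonly array $schema,
    ) {
    }

    /** @param array<string, mixed> $document */
    public function saveDocument(string $index, array $document): void
    {
        if (!isset($this->schema[$index])) {
            throw new InvalidArgumentException(sprintf('Unknown index "%s".', $index));
        }

        // Resolve the string name to the configured index and forward it.
        $this->adapter->save($this->schema[$index], $document);
    }
}

$adapter = new InMemoryAdapter();
$engine = new SketchEngine($adapter, [
    'blog' => new SketchIndex('blog', ['id' => 'identifier', 'title' => 'text']),
]);

// The caller only needs the index name as a string.
$engine->saveDocument('blog', ['id' => '1', 'title' => 'Hello SEAL']);
```

Swapping InMemoryAdapter for another implementation of the same interface would not change the calling code, which is the benefit of the adapter pattern here.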


Fighting the search engines

The main difficulty was fighting the different search engines' mappings, schemas and field definitions to make them match the defined Field with its searchable, filterable and sortable options. For example, to make a field only filterable and not searchable, I first thought it would be enough to index it in Elasticsearch as a keyword. But if you searched for the whole word, the document still showed up in the result. After some deep diving into Elasticsearch and Lucene I found out that I could achieve it by configuring the field with index: false but doc_values: true. This was the only solution I found to make Elasticsearch behave the expected way for this combination of options. The easiest engine to support was Meilisearch, as our own mapping was modelled on it and it uses nearly the same type of configuration. For Algolia I first thought it would be the same, but sorting in Algolia requires additional replica indexes. This is also why a strict schema is required for the search engine abstraction: we need to know at creation time which indexes we have to create. So when creating the indexes for Algolia, the AlgoliaSchemaManager creates additional replicas which have the specific sorting defined. At search time we use that replica, and it returns the result in the expected order.
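As a rough illustration of the two translations described above, the following sketch maps the three flags to an Elasticsearch-style keyword mapping and derives replica index names for sortable fields. Both helper functions are hypothetical, only the index: false / doc_values: true combination comes from the text, and the replica naming scheme is an assumption rather than what the AlgoliaSchemaManager actually does.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: translate the three flags into an Elasticsearch-style
 * keyword mapping. Only the index/doc_values combination is taken from the
 * text; everything else is an assumption for illustration.
 *
 * @return array{type: string, index: bool, doc_values: bool}
 */
function sketchKeywordMapping(bool $searchable, bool $filterable, bool $sortable): array
{
    return [
        'type' => 'keyword',
        'index' => $searchable,                   // false: not searchable
        'doc_values' => $filterable || $sortable, // true: usable for filters and sorting
    ];
}

/**
 * Hypothetical helper: one replica per sortable field and direction.
 * The "<index>__<field>_<direction>" naming scheme is an assumption.
 *
 * @param list<string> $sortableFields
 *
 * @return list<string>
 */
function sketchReplicaNames(string $index, array $sortableFields): array
{
    $replicas = [];
    foreach ($sortableFields as $field) {
        foreach (['asc', 'desc'] as $direction) {
            $replicas[] = sprintf('%s__%s_%s', $index, $field, $direction);
        }
    }

    return $replicas;
}

// A filterable-only field, like 'tags' in the schema above.
$tagsMapping = sketchKeywordMapping(searchable: false, filterable: true, sortable: false);

// Replica names for the two sortable fields of the 'blog' index.
$replicas = sketchReplicaNames('blog', ['title', 'published']);
```

Because the list of sortable fields is known from the strict schema, the replica names can be computed once when the index is created.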

Besides Elasticsearch, Opensearch, Algolia and Meilisearch I later also added support for Solr (because it is widely used in the @typo3 community), RediSearch (personally a big fan of @redis) and Typesense (which came up sometimes in my research on Reddit). With some community help from the different search engines I could implement Solr via its Cloud mode and Typesense ...
