Add ManticoresearchAdapter #103
base: 0.5
Hi. I'm a member of the Manticore team. Please let me know if we can help with this. |
@sanikolaev your help is really welcome here. Maybe we can do this step by step. First, it would be nice if you could help with how to map a
The abstraction supports the following kinds of fields for a single representation
I currently use the following mapping, which I think should be correct. Datetimes / timestamps seem to be represented in Manticoresearch as a number, so our converter turns "2023-..." into a numeric timestamp, as we already do for Apache Solr. So a very basic mapping should look like this; I hope at least that is correct:
But now the more difficult part: every field can be multiple, and I'm not yet sure how I can map something other than a
The
While reading the documentation about text / string, I'm not sure whether a field which contains text would maybe be better to be
All kinds of fields can be
The problem with the multiple fields is what currently makes the implementation crash, as I'm not sure how this can be handled with the Manticore search engine or Sphinx:
From the previous discussion, some JSON field would maybe support this, but I'm not sure about correctly defining those types, as it fails in combination with indexed:
As an example, our test has a |
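The date-to-timestamp conversion mentioned in the comment above can be sketched roughly as follows. This is only an illustration in Python, not the adapter's PHP converter; the function name `to_timestamp` is hypothetical, and it assumes the dates arrive as ISO 8601 strings that carry a UTC offset.

```python
from datetime import datetime


def to_timestamp(value: str) -> int:
    """Convert an ISO 8601 date string with a UTC offset to a Unix timestamp.

    Hypothetical helper for illustration only.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Without an offset the result would depend on the local timezone,
        # so this sketch refuses such input instead of guessing.
        raise ValueError("date string must include a UTC offset")
    return int(parsed.timestamp())


print(to_timestamp("2022-01-24T12:00:00+01:00"))  # 1643022000
```

Storing the value as a whole number of seconds drops the original offset, so the display timezone would have to be kept elsewhere if it matters.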
This is only possible using the
mysql> drop table if exists t; create table t(string_array json, float_array json, bool_array json); insert into t values(0, '["abc", "def"]', '[1.23, 2.34]', '[true, false]'),(0, '["ghi", "jkl"]', '[3.45, 4.56]', '[true, true]'); select *, any(x = 'abc' for x in string_array), any(x > 3.0 and x < 4.0 for x in float_array), all(x = 1 for x in bool_array) from t;
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
create table t(string_array json, float_array json, bool_array json)
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
insert into t values(0, '["abc", "def"]', '[1.23, 2.34]', '[true, false]'),(0, '["ghi", "jkl"]', '[3.45, 4.56]', '[true, true]')
--------------
Query OK, 2 rows affected (0.00 sec)
--------------
select *, any(x = 'abc' for x in string_array), any(x > 3.0 and x < 4.0 for x in float_array), all(x = 1 for x in bool_array) from t
--------------
+---------------------+---------------+---------------------+--------------+--------------------------------------+-----------------------------------------------+--------------------------------+
| id | string_array | float_array | bool_array | any(x = 'abc' for x in string_array) | any(x > 3.0 and x < 4.0 for x in float_array) | all(x = 1 for x in bool_array) |
+---------------------+---------------+---------------------+--------------+--------------------------------------+-----------------------------------------------+--------------------------------+
| 1515343812221005444 | ["abc","def"] | [1.230000,2.340000] | [true,false] | 1 | 0 | 0 |
| 1515343812221005445 | ["ghi","jkl"] | [3.450000,4.560000] | [true,true] | 0 | 1 | 1 |
+---------------------+---------------+---------------------+--------------+--------------------------------------+-----------------------------------------------+--------------------------------+
2 rows in set (0.00 sec)
BTW |
@sanikolaev thx for the response, what about
|
I tried skipping the attribute and indexed parts for the JSON fields, but I still run into another error. These are the Manticore field definitions I had to use:
{
"title": {
"type": "text",
"options": [
"indexed"
]
},
"header_image_media": {
"type": "integer",
"options": []
},
"header_video_media": {
"type": "string",
"options": []
},
"article": {
"type": "text",
"options": [
"indexed"
]
},
"blocks_text_title": {
"type": "json",
"options": []
},
"blocks_text_description": {
"type": "json",
"options": []
},
"blocks_text_media": {
"type": "multi",
"options": []
},
"blocks_embed_title": {
"type": "json",
"options": []
},
"blocks_embed_media": {
"type": "json",
"options": []
},
"footer_title": {
"type": "text",
"options": [
"indexed"
]
},
"created": {
"type": "timestamp",
"options": []
},
"commentsCount": {
"type": "integer",
"options": []
},
"rating": {
"type": "float",
"options": []
},
"comments_email": {
"type": "json",
"options": []
},
"comments_text": {
"type": "json",
"options": []
},
"tags": {
"type": "json",
"options": []
},
"categoryIds": {
"type": "multi",
"options": []
},
"_source": {
"type": "string",
"options": []
}
}
This is the document:
{
"title": "New Blog",
"header_image_media": 1,
"article": "<article><h2>New Subtitle<\/h2><p>A html field with some content<\/p><\/article>",
"blocks_text_title": "[\"Titel\",\"Titel 2\",\"Titel 4\"]",
"blocks_text_description": "[\"<p>Description<\\\/p>\",\"<p>Description 4<\\\/p>\"]",
"blocks_text_media": [
3,
4,
3,
4
],
"blocks_embed_title": "[\"Video\"]",
"blocks_embed_media": "[\"https:\\\/\\\/www.youtube.com\\\/watch?v=iYM2zFP3Zn0\"]",
"footer_title": "New Footer",
"created": "2022-01-24T12:00:00+01:00",
"commentsCount": 2,
"rating": 3.5,
"comments_email": "[\"admin.nonesearchablefield@localhost\",\"example.nonesearchablefield@localhost\"]",
"comments_text": "[\"Awesome blog!\",\"Like this blog!\"]",
"tags": "[\"Tech\",\"UI\"]",
"categoryIds": [
1,
2
],
"_source": "{\"unrelated\":\"Unrelated\"}"
}
It is indexed via the PHP client this way:
$searchIndex = $this->client->index('test_complex');
$searchIndex->addDocument($aboveDocument, '23b30f01-d8fd-4dca-b36a-4710e360a965');
But when I try to load that document via:
$searchIndex = $this->client->index('test_complex');
$searchIndex->getDocumentById('23b30f01-d8fd-4dca-b36a-4710e360a965');
It errors with:
Not sure why this is happening. |
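For readers following along, the way multi-valued fields end up as JSON-encoded strings in the document above can be sketched like this. It is a Python illustration under assumptions, not the adapter's actual PHP code: `prepare_document` and its `field_types` argument are hypothetical names, and the type names ("json", "multi", "text") are taken from the field definitions quoted in the comment.

```python
import json


def prepare_document(document: dict, field_types: dict) -> dict:
    """Encode list values of fields typed "json" as compact JSON strings.

    Hypothetical helper: fields of any other type (for example "multi",
    which holds a list of integers) are passed through unchanged.
    """
    prepared = {}
    for name, value in document.items():
        if field_types.get(name) == "json" and isinstance(value, list):
            prepared[name] = json.dumps(value, separators=(",", ":"))
        else:
            prepared[name] = value
    return prepared


doc = {"title": "New Blog", "tags": ["Tech", "UI"], "categoryIds": [1, 2]}
types = {"title": "text", "tags": "json", "categoryIds": "multi"}
print(prepare_document(doc, types))
```

Note that Python's `json.dumps` does not escape forward slashes the way the PHP-encoded strings in the document do, so the exact string output can differ while the decoded values stay the same.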
I see. This is right. Manticore doesn't natively support nested objects and the period sign is used for json, e.g.:
Manticore doesn't support string IDs. The ID requirements can be found here https://manual.manticoresearch.com/Creating_a_table/Data_types#Document-ID. |
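One possible way to work around the missing string IDs is to derive a numeric ID from the string identifier and keep the original string in a separate field such as `uuid`. The sketch below is only an assumption about how that could be done, not something confirmed in this thread: `numeric_id` is a hypothetical helper, and it assumes a positive, non-zero integer that fits into a signed 64-bit value is acceptable (the linked Document-ID requirements are authoritative). Because it truncates a hash, collisions are unlikely but not impossible.

```python
import hashlib


def numeric_id(identifier: str) -> int:
    """Derive a deterministic positive 63-bit integer from a string ID.

    Hypothetical helper: takes the first 8 bytes of a SHA-256 digest and
    clears the top bit so the value fits into a signed 64-bit integer.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF
    # Avoid returning zero, on the assumption that IDs must be non-zero.
    return value or 1


document_id = numeric_id("23b30f01-d8fd-4dca-b36a-4710e360a965")
print(document_id)
```

The mapping is one-way, so looking a document up by its original identifier means computing the same numeric ID again or filtering on the stored string field.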
In the document above we have text that is searchable but represented by an array of texts, because we flattened the whole blocks objects. As you suggested, I now use for these array text fields ( The
[
'type' => 'text',
'index' => true,
'fields' => [
'raw' => ['type' => 'keyword'],
],
]
So a field |
The equivalent of Elasticsearch's
in Manticore is
|
I'm not sure I understood you correctly
{
"uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965",
"tags": ["UI", "UX"]
}
For searchable I think we could use
{
"uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965",
"tags": "UI UX"
}
But then how could we still get filterability to work, so that we can fetch the documents tagged with those tags? I think that would still require a
{
"uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965",
"tags": "UI UX",
"tags_raw": ["UI", "UX"]
}
PS: we are using https://github.com/manticoresoftware/manticoresearch-php here, so we do not actually issue any create table ... statement ourselves. |
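The split proposed in the comment above, one joined text value for searching plus a raw array for filtering, could be produced by a small transformation like the following. This is a Python sketch under assumptions rather than the adapter's PHP code: the function name `split_searchable_and_filterable` is hypothetical, and the `_raw` suffix simply follows the `tags_raw` example.

```python
def split_searchable_and_filterable(document: dict, field: str) -> dict:
    """Replace a list field with a joined text value plus a "<field>_raw" copy.

    Hypothetical helper: the joined string is meant for full-text search,
    the raw list for exact filtering.
    """
    values = document[field]
    result = dict(document)  # shallow copy, the input is left untouched
    result[field] = " ".join(values)
    result[field + "_raw"] = list(values)
    return result


doc = {"uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965", "tags": ["UI", "UX"]}
print(split_searchable_and_filterable(doc, "tags"))
```

Joining with a space loses the boundaries of multi-word values (a tag like "Open Source" becomes two words in the text field), which is one reason the raw copy is still needed for filtering.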
Manticoresearch is a Sphinx fork that provides a PHP client at https://github.com/manticoresoftware/manticoresearch-php. As requested by some people on Reddit, we are trying to support it as well.
TODO
testFindMultipleIndexesSkipped TODO issueExternal