Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Searching Patient by Identifier is slow. #476

Open
whyfate opened this issue May 6, 2022 · 5 comments
Open

Searching Patient by Identifier is slow. #476

whyfate opened this issue May 6, 2022 · 5 comments

Comments

@whyfate
Copy link
Contributor

whyfate commented May 6, 2022

Is your feature request related to a problem? Please describe.
I have 100w patient data,and use identifier to search very slowly.

GET /fhir/Patient?identifier=test
@kennethmyhra
Copy link
Collaborator

kennethmyhra commented May 6, 2022

Doing a test on one of my test servers with the search below returns in about 140-150 milliseconds. There are 405 patient entries in the database.

GET https://spark.incendi.no/fhir/Patient?identifier=12345

The server is a Ubuntu 20.04 with 1 CPU core and 2GB of memory.

@whyfate
Copy link
Contributor Author

whyfate commented May 6, 2022

Yes, when the amount of data is still small, the query efficiency is OK, but when the amount of data becomes more, the efficiency is very low.
My server is Ubuntu 18.04 with 8 CPU core and 32GB of memory.
100w patient entries search need 3s.

@kennethmyhra
Copy link
Collaborator

So we are talking about a million patient entries. I was not aware of the suffix w meaning 10 000.

Have you tried adding these indexes? https://github.com/FirelyTeam/spark/blob/r4/master/scripts/CreateIndexes.js

If you are lacking the index at line 2 adding this should help with the lack of performance you are seeing, could be it needs a slight modification since you are effectively not including system when querying the identifier.

As always when working with indexes. Profiling your database should help further pinpointing the issue. Also please test these indexes in a test environment and backup your database before running that script.

@whyfate
Copy link
Contributor Author

whyfate commented May 7, 2022

Yes,I add a index to Identifier,But it has little effect on the query. I think the problem is here

private static FilterDefinition<BsonDocument> TokenQuery(String parameterName, Operator optor, String modifier, ValueExpression operand)
{
//$elemMatch only works on array values. But the MongoIndexMapper only creates an array if there are multiple values for a given parameter.
//So we also construct a query for when there is only one set of values in the searchIndex, hence there is no array.
string systemfield = parameterName + ".system";
string codefield = parameterName + ".code";
string textfield = parameterName + ".text";
switch (optor)
{
case Operator.EQ:
var typedEqOperand = ((UntypedValue)operand).AsTokenValue();
if (modifier == Modifier.TEXT)
{
return Builders<BsonDocument>.Filter.Regex(textfield, new BsonRegularExpression(typedEqOperand.Value, "i"));
}
else //Search on code and system
{
//Set up two variants of queries, for dealing with single token values in the index, and multiple (in an array).
var arrayQueries = new List<FilterDefinition<BsonDocument>>();
var noArrayQueries = new List<FilterDefinition<BsonDocument>>{
Builders<BsonDocument>.Filter.Not(Builders<BsonDocument>.Filter.Type(parameterName, BsonType.Array))};
var plainStringQueries = new List<FilterDefinition<BsonDocument>>{
Builders<BsonDocument>.Filter.Type(parameterName, BsonType.String)};
if (modifier == Modifier.NOT) //NOT modifier only affects matching the code, not the system
{
noArrayQueries.Add(Builders<BsonDocument>.Filter.Exists(parameterName));
noArrayQueries.Add(Builders<BsonDocument>.Filter.Ne(codefield, typedEqOperand.Value));
arrayQueries.Add(Builders<BsonDocument>.Filter.Exists(parameterName));
arrayQueries.Add(Builders<BsonDocument>.Filter.Ne("code", typedEqOperand.Value));
plainStringQueries.Add(Builders<BsonDocument>.Filter.Exists(parameterName));
plainStringQueries.Add(Builders<BsonDocument>.Filter.Ne(parameterName, typedEqOperand.Value));
}
else
{
noArrayQueries.Add(Builders<BsonDocument>.Filter.Eq(codefield, typedEqOperand.Value));
arrayQueries.Add(Builders<BsonDocument>.Filter.Eq("code", typedEqOperand.Value));
plainStringQueries.Add(Builders<BsonDocument>.Filter.Eq(parameterName, typedEqOperand.Value));
}
//Handle the system part, if present.
if (!typedEqOperand.AnyNamespace)
{
if (string.IsNullOrWhiteSpace(typedEqOperand.Namespace))
{
arrayQueries.Add(Builders<BsonDocument>.Filter.Exists("system", false));
noArrayQueries.Add(Builders<BsonDocument>.Filter.Exists(systemfield, false));
plainStringQueries.Add(Builders<BsonDocument>.Filter.Exists("system", false));
}
else
{
arrayQueries.Add(Builders<BsonDocument>.Filter.Eq("system", typedEqOperand.Namespace));
noArrayQueries.Add(Builders<BsonDocument>.Filter.Eq(systemfield, typedEqOperand.Namespace));
plainStringQueries.Add(Builders<BsonDocument>.Filter.Eq("system", typedEqOperand.Namespace));
}
}
//Combine code and system
var arrayEqQuery = Builders<BsonDocument>.Filter.ElemMatch(parameterName, Builders<BsonDocument>.Filter.And(arrayQueries));
var noArrayEqQuery = Builders<BsonDocument>.Filter.And(noArrayQueries);
var plainStringQuery = Builders<BsonDocument>.Filter.And(plainStringQueries);
return Builders<BsonDocument>.Filter.Or(arrayEqQuery, noArrayEqQuery, plainStringQuery);

@whyfate
Copy link
Contributor Author

whyfate commented May 7, 2022

It's the slow query log.

{
    "s": "I", 
    "c": "COMMAND", 
    "id": 51803, 
    "ctx": "conn581", 
    "msg": "Slow query", 
    "attr": {
        "type": "command", 
        "ns": "spark.searchindex", 
        "command": {
            "find": "searchindex", 
            "filter": {
                "internal_level": 0, 
                "internal_resource": "Patient", 
                "$or": [
                    {
                        "identifier": {
                            "$elemMatch": {
                                "code": "test"
                            }
                        }
                    }, 
                    {
                        "identifier": {
                            "$not": {
                                "$type": 4
                            }
                        }, 
                        "identifier.code": "test"
                    }, 
                    {
                        "$and": [
                            {
                                "identifier": {
                                    "$type": 2
                                }
                            }, 
                            {
                                "identifier": "test"
                            }
                        ]
                    }
                ]
            }, 
            "projection": {
                "internal_selflink": 1
            }, 
            "$db": "spark", 
            "lsid": {
                "id": {
                    "$uuid": "748eacb6-cce5-4088-9e40-67b08adea3ed"
                }
            }
        }, 
        "planSummary": "IXSCAN { internal_level: 1, internal_resource: 1 }", 
        "keysExamined": 1001234, 
        "docsExamined": 1001234, 
        "cursorExhausted": true, 
        "numYields": 1001, 
        "nreturned": 0, 
        "queryHash": "F3A3EA5B", 
        "planCacheKey": "E06A389B", 
        "reslen": 106, 
        "locks": {
            "ReplicationStateTransition": {
                "acquireCount": {
                    "w": 1002
                }
            }, 
            "Global": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Database": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Collection": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Mutex": {
                "acquireCount": {
                    "r": 1
                }
            }
        }, 
        "storage": { }, 
        "protocol": "op_msg", 
        "durationMillis": 2469
    }
}

This query used '$or', so it didn't hit the identifier index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants