Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution time inconsistent with the use of an index with sorting #20674

Open
AntoineAA opened this issue Mar 1, 2024 · 0 comments
Open

Execution time inconsistent with the use of an index with sorting #20674

AntoineAA opened this issue Mar 1, 2024 · 0 comments

Comments

@AntoineAA
Copy link

AntoineAA commented Mar 1, 2024

My Environment

  • ArangoDB Version: 3.11.7 enterprise
  • Deployment Mode: Single Server
  • Deployment Strategy: Manual Start
  • Configuration:
  • Infrastructure: own
  • Operating System: Ubuntu 22.04 / Mint
  • Total RAM in your machine: 32Gb
  • Disks in use: SSD
  • Used Package: Debian or Ubuntu .deb

Component, Query & Data

Affected feature:
AQL query

AQL query (if applicable):
QUERY 1 - slow one

FOR o IN z_obs
    FILTER o.project_id == "k-world-flora"
    
    LET name = (
        o.computed
        ? (o.computed.current_name ? o.computed.current_name : null)
        : o.submitted.name
    )
    
    LET taxo = (
        FOR t IN z_taxa
            FILTER t.name == name
            FILTER "k-world-flora" IN t.sources
            FOR sp IN t.species_list
                FILTER sp.nameAccordingTo == "k-world-flora"
                RETURN {
                    family: sp.family.scientificNameWithoutAuthor,
                    species: {
                        name: sp.scientificNameWithoutAuthor,
                        author: sp.scientificNameAuthorship
                    }
                }
        )[0]
    
    FILTER taxo
    
    SORT o.date_created DESC
    LIMIT 0, 20
    
    RETURN MERGE(o, {
        taxo: taxo
    })

QUERY 2 - fast one

FOR o IN z_obs
    FILTER o.project_id == "k-world-flora"
    
    LET name = (
        o.computed
        ? (o.computed.current_name ? o.computed.current_name : null)
        : o.submitted.name
    )
    
    LET taxo = (
        FOR t IN z_taxa
            FILTER t.name == name
            FILTER "k-world-flora" IN t.sources
            LIMIT 1
            RETURN 1
        )[0]
    
    FILTER taxo
    
    SORT o.date_created DESC
    LIMIT 0, 20
    
    RETURN MERGE(o, {
        taxo: (
        FOR t IN z_taxa
            FILTER t.name == name
            FILTER "k-world-flora" IN t.sources
            FOR sp IN t.species_list
                FILTER sp.nameAccordingTo == "k-world-flora"
                RETURN {
                    family: sp.family.scientificNameWithoutAuthor,
                    species: {
                        name: sp.scientificNameWithoutAuthor,
                        author: sp.scientificNameAuthorship
                    }
                }
        )[0]
    })

Query 3 - slow one without taxo filter (-> fast)

FOR o IN z_obs
    FILTER o.project_id == "k-world-flora"
    
    LET name = (
        o.computed
        ? (o.computed.current_name ? o.computed.current_name : null)
        : o.submitted.name
    )
    
    LET taxo = (
        FOR t IN z_taxa
            FILTER t.name == name
            FILTER "k-world-flora" IN t.sources
            FOR sp IN t.species_list
                FILTER sp.nameAccordingTo == "k-world-flora"
                RETURN {
                    family: sp.family.scientificNameWithoutAuthor,
                    species: {
                        name: sp.scientificNameWithoutAuthor,
                        author: sp.scientificNameAuthorship
                    }
                }
        )[0]
    
    //FILTER taxo
    
    SORT o.date_created DESC
    LIMIT 0, 20
    
    RETURN MERGE(o, {
        taxo: taxo
    })

AQL explain and/or profile (if applicable):
QUERY 1 - slow one

Query String (848 chars, cacheable: true):
 FOR o IN z_obs
     FILTER o.project_id == "k-world-flora"
     
     LET name = (
         o.computed
         ? (o.computed.current_name ? o.computed.current_name : null)
         : o.submitted.name
     )
     
     LET taxo = (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             FOR sp IN t.species_list
                 FILTER sp.nameAccordingTo == "k-world-flora"
                 RETURN {
                     family: sp.family.scientificNameWithoutAuthor,
                     species: {
                         name: sp.scientificNameWithoutAuthor,
                         author: sp.scientificNameAuthorship
                     }
                 }
         )[0]
     
     FILTER taxo
     
     SORT o.date_created DESC
     LIMIT 0, 20
     
     RETURN MERGE(o, {
         taxo: taxo
     })

Execution plan:
 Id   NodeType                Est.   Comment
  1   SingletonNode              1   * ROOT
 27   IndexNode             159532     - FOR o IN z_obs   /* reverse persistent index scan, index scan + document lookup */    
 29   SubqueryStartNode     159532       - LET #4 = ( /* subquery begin */
 28   IndexNode             159532         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`, `species_list`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 12   CalculationNode       159532           - LET #13 = t.`species_list`   /* attribute expression */   /* collections used: t : z_taxa */
 13   EnumerateListNode   15953200           - FOR sp IN #13   /* list iteration */
 14   CalculationNode     15953200             - LET #15 = (sp.`nameAccordingTo` == "k-world-flora")   /* simple expression */
 15   FilterNode          15953200             - FILTER #15
 26   LimitNode                  1             - LIMIT 0, 1
 16   CalculationNode            1             - LET #17 = { "family" : sp.`family`.`scientificNameWithoutAuthor`, "species" : { "name" : sp.`scientificNameWithoutAuthor`, "author" : sp.`scientificNameAuthorship` } }   /* simple expression */
 30   SubqueryEndNode       159532             - RETURN  #17 ) /* subquery end */
 19   CalculationNode       159532       - LET taxo = #4[0]   /* simple expression */
 20   FilterNode            159532       - FILTER taxo
 23   LimitNode                 20       - LIMIT 0, 20
 24   CalculationNode           20       - LET #21 = MERGE(o, { "taxo" : taxo })   /* simple expression */   /* collections used: o : z_obs */
 25   ReturnNode                20       - RETURN #21

Indexes used:
 By   Name              Type         Collection   Unique   Sparse   Cache   Selectivity   Fields                             Stored values   Ranges
 27   idx_11016981662   persistent   z_obs        false    false    false       54.27 %   [ `project_id`, `date_created` ]   [  ]            (o.`project_id` == "k-world-flora")
 28   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))

Functions used:
 Name    Deterministic   Cacheable   Uses V8
 MERGE   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   optimize-subqueries
  4   move-calculations-up-2
  5   move-filters-up-2
  6   use-indexes
  7   remove-filter-covered-by-index
  8   use-index-for-sort
  9   remove-unnecessary-calculations-2
 10   move-calculations-down
 11   move-filters-into-enumerate
 12   reduce-extraction-to-projection
 13   splice-subqueries

Optimization rules with highest execution times:
 RuleName                                    Duration [s]
 remove-unnecessary-calculations-2                0.00039
 use-indexes                                      0.00028

46 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00245

profile

Query String (848 chars, cacheable: false):
 FOR o IN z_obs
     FILTER o.project_id == "k-world-flora"
     
     LET name = (
         o.computed
         ? (o.computed.current_name ? o.computed.current_name : null)
         : o.submitted.name
     )
     
     LET taxo = (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             FOR sp IN t.species_list
                 FILTER sp.nameAccordingTo == "k-world-flora"
                 RETURN {
                     family: sp.family.scientificNameWithoutAuthor,
                     species: {
                         name: sp.scientificNameWithoutAuthor,
                         author: sp.scientificNameAuthorship
                     }
                 }
         )[0]
     
     FILTER taxo
     
     SORT o.date_created DESC
     LIMIT 0, 20
     
     RETURN MERGE(o, {
         taxo: taxo
     })

Execution plan:
 Id   NodeType            Calls   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode           1       1          0       0.00002   * ROOT
 27   IndexNode               3    2000          0       0.03744     - FOR o IN z_obs   /* reverse persistent index scan, index scan + document lookup */    
 29   SubqueryStartNode       5    4000          0       0.00590       - LET #4 = ( /* subquery begin */
 28   IndexNode               4    3000          0       1.58068         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`, `species_list`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 12   CalculationNode         4    3000          0       1.46989           - LET #13 = t.`species_list`   /* attribute expression */   /* collections used: t : z_taxa */
 13   EnumerateListNode      30   29000          0       0.50607           - FOR sp IN #13   /* list iteration */
 14   CalculationNode        30   29000          0       0.01885             - LET #15 = (sp.`nameAccordingTo` == "k-world-flora")   /* simple expression */
 15   FilterNode              3    2000      25894       0.02653             - FILTER #15
 26   LimitNode               3    2000          0       0.00021             - LIMIT 0, 1
 16   CalculationNode         3    2000          0       0.00334             - LET #17 = { "family" : sp.`family`.`scientificNameWithoutAuthor`, "species" : { "name" : sp.`scientificNameWithoutAuthor`, "author" : sp.`scientificNameAuthorship` } }   /* simple expression */
 30   SubqueryEndNode         2    1000          0       0.00078             - RETURN  #17 ) /* subquery end */
 19   CalculationNode         2    1000          0       0.00046       - LET taxo = #4[0]   /* simple expression */
 20   FilterNode              1      20          6       0.00021       - FILTER taxo
 23   LimitNode               1      20          0       0.00001       - LIMIT 0, 20
 24   CalculationNode         1      20          0       0.00020       - LET #21 = MERGE(o, { "taxo" : taxo })   /* simple expression */   /* collections used: o : z_obs */
 25   ReturnNode              1      20          0       0.00001       - RETURN #21

Indexes used:
 By   Name              Type         Collection   Unique   Sparse   Cache   Selectivity   Fields                             Stored values   Ranges
 27   idx_11016981662   persistent   z_obs        false    false    false       54.27 %   [ `project_id`, `date_created` ]   [  ]            (o.`project_id` == "k-world-flora")
 28   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   optimize-subqueries
  4   move-calculations-up-2
  5   move-filters-up-2
  6   use-indexes
  7   remove-filter-covered-by-index
  8   use-index-for-sort
  9   remove-unnecessary-calculations-2
 10   move-calculations-down
 11   move-filters-into-enumerate
 12   reduce-extraction-to-projection
 13   splice-subqueries

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Peak Mem [b]   Exec Time [s]
           0            0           0         3497               0 / 0      25900     2092498944         3.65286

Query Profile:
 Query Stage               Duration [s]
 initializing                   0.00000
 parsing                        0.00021
 optimizing ast                 0.00004
 loading collections            0.00001
 instantiating plan             0.00012
 optimizing plan                0.00157
 instantiating executors        0.00025
 executing                      3.65064
 finalizing                     0.00006

QUERY 2 - fast one

Query String (1005 chars, cacheable: true):
 FOR o IN z_obs
     FILTER o.project_id == "k-world-flora"
     
     LET name = (
         o.computed
         ? (o.computed.current_name ? o.computed.current_name : null)
         : o.submitted.name
     )
     
     LET taxo = (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             LIMIT 1
             RETURN 1
         )[0]
     
     FILTER taxo
     
     SORT o.date_created DESC
     LIMIT 0, 20
     
     RETURN MERGE(o, {
         taxo: (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             FOR sp IN t.species_list
                 FILTER sp.nameAccordingTo == "k-world-flora"
                 RETURN {
                     family: sp.family.scientificNameWithoutAuthor,
                     species: {
                         name: sp.scientificNameWithoutAuthor,
                         author: sp.scientificNameAuthorship
                     }
                 }
   ...

Execution plan:
 Id   NodeType              Est.   Comment
  1   SingletonNode            1   * ROOT
 38   IndexNode           159532     - FOR o IN z_obs   /* reverse persistent index scan, index scan + document lookup */    
 43   SubqueryStartNode   159532       - LET #3 = ( /* subquery begin */
 39   IndexNode           159532         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 12   LimitNode                1           - LIMIT 0, 1
 36   LimitNode                1           - LIMIT 0, 1
 13   CalculationNode          1           - LET #16 = 1   /* json expression */   /* const assignment */
 44   SubqueryEndNode     159532           - RETURN  #16 ) /* subquery end */
 16   CalculationNode     159532       - LET taxo = #3[0]   /* simple expression */
 17   FilterNode          159532       - FILTER taxo
 20   LimitNode               20       - LIMIT 0, 20
 41   SubqueryStartNode       20       - LET #8 = ( /* subquery begin */
 40   IndexNode               20         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`, `species_list`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 27   CalculationNode         20           - LET #24 = t.`species_list`   /* attribute expression */   /* collections used: t : z_taxa */
 28   EnumerateListNode     2000           - FOR sp IN #24   /* list iteration */
 29   CalculationNode       2000             - LET #26 = (sp.`nameAccordingTo` == "k-world-flora")   /* simple expression */
 30   FilterNode            2000             - FILTER #26
 37   LimitNode                1             - LIMIT 0, 1
 31   CalculationNode          1             - LET #28 = { "family" : sp.`family`.`scientificNameWithoutAuthor`, "species" : { "name" : sp.`scientificNameWithoutAuthor`, "author" : sp.`scientificNameAuthorship` } }   /* simple expression */
 42   SubqueryEndNode         20             - RETURN  #28 ) /* subquery end */
 34   CalculationNode         20       - LET #30 = MERGE(o, { "taxo" : #8[0] })   /* simple expression */   /* collections used: o : z_obs */
 35   ReturnNode              20       - RETURN #30

Indexes used:
 By   Name              Type         Collection   Unique   Sparse   Cache   Selectivity   Fields                             Stored values   Ranges
 38   idx_11016981662   persistent   z_obs        false    false    false       54.27 %   [ `project_id`, `date_created` ]   [  ]            (o.`project_id` == "k-world-flora")
 39   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))
 40   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))

Functions used:
 Name    Deterministic   Cacheable   Uses V8
 MERGE   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   optimize-subqueries
  4   move-calculations-up-2
  5   move-filters-up-2
  6   use-indexes
  7   remove-filter-covered-by-index
  8   use-index-for-sort
  9   remove-unnecessary-calculations-2
 10   move-calculations-down
 11   move-filters-into-enumerate
 12   reduce-extraction-to-projection
 13   splice-subqueries

Optimization rules with highest execution times:
 RuleName                                    Duration [s]
 use-indexes                                      0.00033

46 rule(s) executed, 1 plan(s) created, peak mem [b]: 32768, exec time [s]: 0.00262

profile

Query String (1005 chars, cacheable: false):
 FOR o IN z_obs
     FILTER o.project_id == "k-world-flora"
     
     LET name = (
         o.computed
         ? (o.computed.current_name ? o.computed.current_name : null)
         : o.submitted.name
     )
     
     LET taxo = (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             LIMIT 1
             RETURN 1
         )[0]
     
     FILTER taxo
     
     SORT o.date_created DESC
     LIMIT 0, 20
     
     RETURN MERGE(o, {
         taxo: (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             FOR sp IN t.species_list
                 FILTER sp.nameAccordingTo == "k-world-flora"
                 RETURN {
                     family: sp.family.scientificNameWithoutAuthor,
                     species: {
                         name: sp.scientificNameWithoutAuthor,
                         author: sp.scientificNameAuthorship
                     }
                 }
   ...

Execution plan:
 Id   NodeType            Calls   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode           1       1          0       0.00002   * ROOT
 38   IndexNode               3    2000          0       0.03715     - FOR o IN z_obs   /* reverse persistent index scan, index scan + document lookup */    
 43   SubqueryStartNode       4    3000          0       0.00503       - LET #3 = ( /* subquery begin */
 39   IndexNode               3    2000          0       0.06891         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 12   LimitNode               3    2000          0       0.00027           - LIMIT 0, 1
 36   LimitNode               3    2000          0       0.00025           - LIMIT 0, 1
 13   CalculationNode         3    2000          0       0.00017           - LET #16 = 1   /* json expression */   /* const assignment */
 44   SubqueryEndNode         2    1000          0       0.00056           - RETURN  #16 ) /* subquery end */
 16   CalculationNode         2    1000          0       0.00023       - LET taxo = #3[0]   /* simple expression */
 17   FilterNode              1      20          6       0.00014       - FILTER taxo
 20   LimitNode               1      20          0       0.00000       - LIMIT 0, 20
 41   SubqueryStartNode       1      40          0       0.00003       - LET #8 = ( /* subquery begin */
 40   IndexNode               1      40          0       0.02711         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`, `species_list`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 27   CalculationNode         1      40          0       0.02209           - LET #24 = t.`species_list`   /* attribute expression */   /* collections used: t : z_taxa */
 28   EnumerateListNode       1     551          0       0.01754           - FOR sp IN #24   /* list iteration */
 29   CalculationNode         1     551          0       0.00045             - LET #26 = (sp.`nameAccordingTo` == "k-world-flora")   /* simple expression */
 30   FilterNode              5      40        492       0.00076             - FILTER #26
 37   LimitNode               5      40          0       0.00004             - LIMIT 0, 1
 31   CalculationNode         5      40          0       0.00013             - LET #28 = { "family" : sp.`family`.`scientificNameWithoutAuthor`, "species" : { "name" : sp.`scientificNameWithoutAuthor`, "author" : sp.`scientificNameAuthorship` } }   /* simple expression */
 42   SubqueryEndNode         1      20          0       0.00007             - RETURN  #28 ) /* subquery end */
 34   CalculationNode         1      20          0       0.00026       - LET #30 = MERGE(o, { "taxo" : #8[0] })   /* simple expression */   /* collections used: o : z_obs */
 35   ReturnNode              1      20          0       0.00001       - RETURN #30

Indexes used:
 By   Name              Type         Collection   Unique   Sparse   Cache   Selectivity   Fields                             Stored values   Ranges
 38   idx_11016981662   persistent   z_obs        false    false    false       54.27 %   [ `project_id`, `date_created` ]   [  ]            (o.`project_id` == "k-world-flora")
 39   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))
 40   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   optimize-subqueries
  4   move-calculations-up-2
  5   move-filters-up-2
  6   use-indexes
  7   remove-filter-covered-by-index
  8   use-index-for-sort
  9   remove-unnecessary-calculations-2
 10   move-calculations-down
 11   move-filters-into-enumerate
 12   reduce-extraction-to-projection
 13   splice-subqueries

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Peak Mem [b]   Exec Time [s]
           0            0           0         3017               0 / 0        498       47939584         0.18399

Query Profile:
 Query Stage               Duration [s]
 initializing                   0.00000
 parsing                        0.00020
 optimizing ast                 0.00004
 loading collections            0.00001
 instantiating plan             0.00014
 optimizing plan                0.00202
 instantiating executors        0.00030
 executing                      0.18126
 finalizing                     0.00008

Query 3

Query String (850 chars, cacheable: true):
 FOR o IN z_obs
     FILTER o.project_id == "k-world-flora"
     
     LET name = (
         o.computed
         ? (o.computed.current_name ? o.computed.current_name : null)
         : o.submitted.name
     )
     
     LET taxo = (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             FOR sp IN t.species_list
                 FILTER sp.nameAccordingTo == "k-world-flora"
                 RETURN {
                     family: sp.family.scientificNameWithoutAuthor,
                     species: {
                         name: sp.scientificNameWithoutAuthor,
                         author: sp.scientificNameAuthorship
                     }
                 }
         )[0]
     
     //FILTER taxo
     
     SORT o.date_created DESC
     LIMIT 0, 20
     
     RETURN MERGE(o, {
         taxo: taxo
     })

Execution plan:
 Id   NodeType              Est.   Comment
  1   SingletonNode            1   * ROOT
 26   IndexNode           159532     - FOR o IN z_obs   /* reverse persistent index scan, index scan + document lookup */    
 22   LimitNode               20       - LIMIT 0, 20
 28   SubqueryStartNode       20       - LET #4 = ( /* subquery begin */
 27   IndexNode               20         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`, `species_list`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 12   CalculationNode         20           - LET #13 = t.`species_list`   /* attribute expression */   /* collections used: t : z_taxa */
 13   EnumerateListNode     2000           - FOR sp IN #13   /* list iteration */
 14   CalculationNode       2000             - LET #15 = (sp.`nameAccordingTo` == "k-world-flora")   /* simple expression */
 15   FilterNode            2000             - FILTER #15
 25   LimitNode                1             - LIMIT 0, 1
 16   CalculationNode          1             - LET #17 = { "family" : sp.`family`.`scientificNameWithoutAuthor`, "species" : { "name" : sp.`scientificNameWithoutAuthor`, "author" : sp.`scientificNameAuthorship` } }   /* simple expression */
 29   SubqueryEndNode         20             - RETURN  #17 ) /* subquery end */
 23   CalculationNode         20       - LET #21 = MERGE(o, { "taxo" : #4[0] })   /* simple expression */   /* collections used: o : z_obs */
 24   ReturnNode              20       - RETURN #21

Indexes used:
 By   Name              Type         Collection   Unique   Sparse   Cache   Selectivity   Fields                             Stored values   Ranges
 26   idx_11016981662   persistent   z_obs        false    false    false       54.27 %   [ `project_id`, `date_created` ]   [  ]            (o.`project_id` == "k-world-flora")
 27   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))

Functions used:
 Name    Deterministic   Cacheable   Uses V8
 MERGE   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   remove-unnecessary-calculations
  4   optimize-subqueries
  5   move-calculations-up-2
  6   move-filters-up-2
  7   use-indexes
  8   remove-filter-covered-by-index
  9   use-index-for-sort
 10   remove-unnecessary-calculations-2
 11   move-calculations-down
 12   move-filters-into-enumerate
 13   reduce-extraction-to-projection
 14   splice-subqueries

47 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00134

profile

Query String (850 chars, cacheable: false):
 FOR o IN z_obs
     FILTER o.project_id == "k-world-flora"
     
     LET name = (
         o.computed
         ? (o.computed.current_name ? o.computed.current_name : null)
         : o.submitted.name
     )
     
     LET taxo = (
         FOR t IN z_taxa
             FILTER t.name == name
             FILTER "k-world-flora" IN t.sources
             FOR sp IN t.species_list
                 FILTER sp.nameAccordingTo == "k-world-flora"
                 RETURN {
                     family: sp.family.scientificNameWithoutAuthor,
                     species: {
                         name: sp.scientificNameWithoutAuthor,
                         author: sp.scientificNameAuthorship
                     }
                 }
         )[0]
     
     //FILTER taxo
     
     SORT o.date_created DESC
     LIMIT 0, 20
     
     RETURN MERGE(o, {
         taxo: taxo
     })

Execution plan:
 Id   NodeType            Calls   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode           1       1          0       0.00001   * ROOT
 26   IndexNode               1      20          0       0.00081     - FOR o IN z_obs   /* reverse persistent index scan, index scan + document lookup */    
 22   LimitNode               1      20          0       0.00001       - LIMIT 0, 20
 28   SubqueryStartNode       1      40          0       0.00013       - LET #4 = ( /* subquery begin */
 27   IndexNode               1      34          0       0.03182         - FOR t IN z_taxa   /* hash index scan, index scan + document lookup (projections: `sources`, `species_list`) */    FILTER ("k-world-flora" IN t.`sources`)   /* early pruning */   
 12   CalculationNode         1      34          0       0.02084           - LET #13 = t.`species_list`   /* attribute expression */   /* collections used: t : z_taxa */
 13   EnumerateListNode       1     383          0       0.01208           - FOR sp IN #13   /* list iteration */
 14   CalculationNode         1     383          0       0.00031             - LET #15 = (sp.`nameAccordingTo` == "k-world-flora")   /* simple expression */
 15   FilterNode              4      34        336       0.00055             - FILTER #15
 25   LimitNode               4      34          0       0.00004             - LIMIT 0, 1
 16   CalculationNode         4      34          0       0.00011             - LET #17 = { "family" : sp.`family`.`scientificNameWithoutAuthor`, "species" : { "name" : sp.`scientificNameWithoutAuthor`, "author" : sp.`scientificNameAuthorship` } }   /* simple expression */
 29   SubqueryEndNode         1      20          0       0.00008             - RETURN  #17 ) /* subquery end */
 23   CalculationNode         1      20          0       0.00028       - LET #21 = MERGE(o, { "taxo" : #4[0] })   /* simple expression */   /* collections used: o : z_obs */
 24   ReturnNode              1      20          0       0.00001       - RETURN #21

Indexes used:
 By   Name              Type         Collection   Unique   Sparse   Cache   Selectivity   Fields                             Stored values   Ranges
 26   idx_11016981662   persistent   z_obs        false    false    false       54.27 %   [ `project_id`, `date_created` ]   [  ]            (o.`project_id` == "k-world-flora")
 27   idx_10219655942   hash         z_taxa       true     false    false      100.00 %   [ `name` ]                         [  ]            (t.`name` == (o.`computed` ? (o.`computed`.`current_name` ? o.`computed`.`current_name` : null) : o.`submitted`.`name`))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   remove-unnecessary-calculations
  4   optimize-subqueries
  5   move-calculations-up-2
  6   move-filters-up-2
  7   use-indexes
  8   remove-filter-covered-by-index
  9   use-index-for-sort
 10   remove-unnecessary-calculations-2
 11   move-calculations-down
 12   move-filters-into-enumerate
 13   reduce-extraction-to-projection
 14   splice-subqueries

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Peak Mem [b]   Exec Time [s]
           0            0           0           34               0 / 0        336       32079872         0.06930

Query Profile:
 Query Stage               Duration [s]
 initializing                   0.00000
 parsing                        0.00018
 optimizing ast                 0.00003
 loading collections            0.00001
 instantiating plan             0.00012
 optimizing plan                0.00157
 instantiating executors        0.00025
 executing                      0.06712
 finalizing                     0.00007

Dataset:

  • 3.2M documents in z_obs collection
  • 500K documents in z_taxa collection
  • 1 billion documents in the database

Size of your Dataset on disk:
3T

Replication Factor & Number of Shards (Cluster only):

Steps to reproduce

  1. Run queries

Problem:
It is normal for the projection to be faster after the LIMIT clause (read documents to get additional fields). The problem is that by using this sort option (date_created), the first hundred documents match the taxo filter criterion. If I remove the taxo filter from QUERY 1, it becomes as fast as query 2 and the 20 documents returned have the taxo field.

Expected result:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant