Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OrderBy clause is not XQuery 3 compliant #5089

Open
adamretter opened this issue Oct 14, 2023 · 0 comments
Open

OrderBy clause is not XQuery 3 compliant #5089

adamretter opened this issue Oct 14, 2023 · 0 comments
Labels
bug issue confirmed as bug
Milestone

Comments

@adamretter
Copy link
Member

adamretter commented Oct 14, 2023

Specifically, eXist-db's implementation of the XQuery order by clause does not follow the W3C specification. The W3C XQuery 3.1 Order By Clause specification states:

The purpose of an order by clause is to impose a value-based ordering on the tuples in the tuple stream. The output tuple stream of the order by clause contains the same tuples as its input tuple stream, but the tuples may be in a different order.

This means that any tuples in a tuple stream after an Order By Clause should be in the order defined by the order by clause. In the eXist-db implementation that is not the case. In eXist-db any expressions/clauses within a FLWOR expression that occur after the order by clause, but before the return clause, will not see an ordered tuple stream; instead they see the tuple stream prior to the order by clause. The reason for this is that in eXist-db the operation of executing the sort for the Order By clause is always deferred until just before the return clause of the FLWOR expression. In eXist-db this is known as a postEval phase, see:

  1. FLWORClause#postEval(Sequence)
  2. ForExpr#eval(Sequence, Item)
  3. OrderByClause#postEval(Sequence)

Modifying eXist-db to correctly implement the Order By Clause semantics is a large task, and may require a rewrite of all FLWOR expressions implementations. The issue was discovered during development of this PR - #4530

See the pending test ct:order-alpha-ascending-indexes#0 in exist-core/src/test/xquery/xquery3/count.xqm.

On an tangentially related note, we suspect there may be other problems with all other FLWOR expressions in eXist-db that implement postEval with regards to compliance with W3C XQuery 3.1.


In Further Detail

Given the XQuery:

for $x in ('a', 'b')

  count $index1
  
  let $remainder := $index1 mod 2
  order by $remainder, $index1
  
  count $index2

return
    <x index1="{$index1}" index2="{$index2}">{$x}</x>

The result should be computed as:

<x index1="2" index2="1">b</x>
<x index1="1" index2="2">a</x>

However eXist-db produces the incorrect result due to its incorrect implementation of the order by clause:

<x index1="2" index2="2">b</x>
<x index1="1" index2="1">a</x>

This occurs because eXist-db evaluates the query by effectively executing the following steps:

  1. Bind $x := 'a'
  2. Bind $index1 := 1
  3. Bind $remainder := 1
  4. Bind $index2 := 1
  5. Add <x index1="1" index2="1">a</x> to the result sequence.
  6. Bind $x := 'b'
  7. Bind $index1 := 2
  8. Bind $remainder := 0
  9. Bind $index2 := 2
  10. Add <x index1="2" index2="2">b</x> to the result sequence.
  11. Sort the result sequence based by $remainder ascending and then $index1 ascending
  12. Return the sorted result sequence.

The problem is that eXist-db applies the sorting (i.e. the order by clause) after the fact.

I have created a detailed UML Sequence Diagram of the above XQuery execution in eXist-db that shows the exact problem:
eXist-db Count and Order By - UML Sequence Diagram

In case it is helpful for anyone else, the code for the UML Sequence Diagram:

participant user as "User"
participant xquery as "XQuery"
participant pathexpr as "PathExpr"
participant x as "$x: ForExpr"
participant index1 as "$index1: CountExpr"
participant remainder as "$remainder: LetExpr"
participant orderby as "OrderByClause"
participant index2 as "$index2: CountExpr"
participant element as "ElementConstructor"

activate user

'XQuery Compilation Phase
note over user: XQuery Compilation phase starts here
activate xquery
user -> xquery: compile(...)
xquery -> xquery: analyzeAndOptimizeIfModulesChanged(...)
activate pathexpr
xquery -> pathexpr: analyze(...)
activate x
pathexpr -> x: analyze(...)
activate index1
x -> index1: analyze(...)
activate remainder
index1 -> remainder: analyze(...)
activate orderby
remainder -> orderby: analyze(...)
activate index2
orderby -> index2: analyze(...)
activate element
index2 -> element: analyze(...)
element -> index2
deactivate element
index2 -> orderby
deactivate index2
orderby -> remainder
deactivate orderby
remainder -> index1
deactivate remainder
index1 -> x
deactivate index1
x -> pathexpr
deactivate x
pathexpr -> xquery
deactivate pathexpr
xquery -> user
deactivate xquery

'XQuery Execution Phase
note over user: XQuery Execution phase starts here
activate xquery
user -> xquery: execute(...)
activate pathexpr
xquery -> pathexpr: eval(null)
activate x
pathexpr -> x: eval(null)
x -> x: isOuterFor == true
x -> x: $index1 is FLWORClause == true
activate index1
x -> index1: preEval(('a', 'b'))
index1 -> index1: hasPreviousOrderByDescending() == false
index1 -> index1: calcStartPosition == 0
activate remainder
index1 --> remainder: preEval(('a', 'b'))
activate orderby
remainder --> orderby: preEval(('a', 'b')) 
activate index2
orderby -> index2: preEval(('a', 'b')) 
index2 -> index2: hasPreviousOrderByDescending() == false
index2 -> index2: calcStartPosition == 0
index2 -> orderby
orderby --> remainder
remainder --> index1
index1 -> x

'Process first Item 'a'
x -> x: processItem('a')
x -> index1: eval(null, null)
index1 -> index1: increment/decrement count == 1
index1 -> remainder: eval(null, null)
remainder -> orderby: eval(null, null)
note over orderby, index2, element: Error: No ordering of the Tuple Stream as been performed by `order by` before\n the `count $index2` is evaluated and subsequently used by the `x` Element Constructor
orderby -> index2: eval(null, null)
index2 -> index2: increment/decrement count == 1
activate element
index2 -> element: eval(null, null)
element-> index2: (index1=1 index2=1 x='a')
index2-> orderby: (index1=1 index2=1 x='a')
orderby -> orderby: orderedResult == \n  [\n    (index1=1 index2=1 x='a')\n  ]
note over orderby: orderedResult is not yet sorted!
orderby -> remainder: (index1=1 index2=1 x='a')
remainder -> index1: (index1=1 index2=1 x='a')
index1 -> x: (index1=1 index2=1 x='a')

'Process second Item 'b'
x -> x: processItem('b')
x -> index1: eval(null, null)
index1 -> index1: increment/decrement count == 2
index1 -> remainder: eval(null, null)
note over orderby, index2, element: Error: No ordering of the Tuple Stream as been performed by `order by` before\n the `count $index2` is evaluated and subsequently used by the `x` Element Constructor
remainder -> orderby: eval(null, null)
orderby -> index2: eval(null, null)
index2 -> index2: increment/decrement count == 2
index2 -> element: eval(null, null)
element-> index2: (index1=2 index2=2 x='b')
deactivate element
index2-> orderby: (index1=2 index2=2 x='b')
orderby -> orderby: orderedResult == \n  [\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ]
note over orderby: orderedResult is not yet sorted!
orderby -> remainder: (index1=2 index2=2 x='b')
remainder -> index1: (index1=2 index2=2 x='b')
index1 -> x: (index1=2 index2=2 x='b')

'Post Eval phase
x -> x: callPostEval == true
note over x: Post Eval Phase starts here...
x --> index1: postEval([\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ])
index1 --> remainder: postEval([\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ])
remainder --> orderby: postEval([\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ])
orderby -> orderby: orderedResult.sort
orderby -> orderby: orderedResult == \n  [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
note over orderby: orderedResult is now sorted!
note over orderby: Error: The Tuple Stream was sorted after the `count $index2`,\n it should have been sorted before!
orderby -> index2: postEval([\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ])
index2 -> orderby: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate index2
orderby --> remainder: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate orderby
remainder --> index1: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate remainder
index1 --> x: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate index1
x -> pathexpr: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate x
pathexpr -> xquery: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate pathexpr
note over user: The orderedResult overwrites the result sequence,\n and is returned to the User
xquery -> user: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate xquery

deactivate user
@adamretter adamretter added the bug issue confirmed as bug label Oct 14, 2023
@adamretter adamretter added this to the eXist-7.0.0 milestone Oct 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issue confirmed as bug
Projects
None yet
Development

No branches or pull requests

1 participant