
[Feature]: Map,Reduce,Filter in AQL with Special Aggregation Types #611

Open · Kelerchian opened this issue Nov 27, 2023 · 4 comments
Labels: Actyx (this issue leads to a version bump of Actyx), Feature

Kelerchian (Contributor) commented Nov 27, 2023

Product

Actyx

When

Querying with AQL

I want to

use map/reduce/filter-like capabilities with bound variables

So that

I can form an efficient query that would otherwise require multiple requests (e.g. WHERE IN-style queries)

Additional notes

For example, finding all users who reside in a particular location could be done like this:

PRAGMA features := singlepass collections tagspread
FROM SINGLEPASS

LET location_updates = FROM 'user:location' ORDER DESC FILTER _.location = 'stockholm' END -- find all 
LET location_map = Map.from(
  location_updates,                           -- determine the iterable source
  (user) => user.id,                          -- determine key
  (key, value, existing, _index) =>           -- determine how map entry is assigned
    CASE IsDefined(existing) => existing  
    CASE true => value
    ENDCASE
)

LET user_ids = Set.from(location_map.keys)
LET matching_users = FROM 'user:created' & (  -- this query behaves like a WHERE IN-style SQL query
  TagSpread(                                  -- spread into 'user:userid_1' | 'user:userid_2' | 'user:userid_3' | ...
    user_ids,                                 -- determine the source of the tagspread: must be an iterable
    (id) => `user:{id}`,                      -- determine how source items are transformed
    '|'                                       -- determine the spread operator
  )
) END

LET users_map = Map.from(
  matching_users,                             -- determine the iterable source
  (user) => user.id,                          -- determine key
  (key, value, existing, _index) =>           -- determine how map entry is assigned
    SELECT {                                
      ...value,
      location: (                           -- replace _.location
        CASE IsDefined(location_map[key]) => location_map[key]
        CASE true => value.location
        ENDCASE
      )
    }
)

SELECT users_map.values
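For readers unfamiliar with the proposed builtins, here is a rough sketch of the intended semantics in plain TypeScript. The names `mapFrom` and `tagSpread`, the event shapes, and the keep-first-entry policy are my illustrative assumptions, not defined AQL behavior:

```typescript
// Illustrative TypeScript sketch of the proposed Map.from / Set.from /
// TagSpread semantics; event shapes and helper names are assumptions.
type LocationEvent = { id: string; location: string };

// Map.from: build a map from an iterable; the assignment callback decides
// what happens when a key is seen again (here: keep the existing entry).
function mapFrom<T, K>(
  source: Iterable<T>,
  key: (item: T) => K,
  assign: (key: K, value: T, existing: T | undefined, index: number) => T,
): Map<K, T> {
  const out = new Map<K, T>();
  let index = 0;
  for (const item of source) {
    const k = key(item);
    out.set(k, assign(k, item, out.get(k), index++));
  }
  return out;
}

// TagSpread: expand an iterable of ids into a tag expression joined by an operator.
function tagSpread(
  ids: Iterable<string>,
  toTag: (id: string) => string,
  op: string,
): string {
  return [...ids].map(toTag).join(` ${op} `);
}

// Usage: a latest-first stream of location updates; keep the newest per user.
const updates: LocationEvent[] = [
  { id: "u1", location: "stockholm" },
  { id: "u2", location: "stockholm" },
  { id: "u1", location: "stockholm" }, // older update for u1, ignored
];
const locationMap = mapFrom(
  updates,
  (u) => u.id,
  (_k, value, existing) => existing ?? value, // mirrors the CASE IsDefined(existing) branch
);
const userIds = new Set(locationMap.keys());
const tagExpr = tagSpread(userIds, (id) => `user:${id}`, "|");
// tagExpr === "user:u1 | user:u2"
```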
rkuhn (Member) commented Nov 27, 2023

As a smaller proposal I’d try to model this with two new features: aggregating into an object with dynamic keys, and lambdas. The right-hand side of the : separator could be either a CBOR value (keeping only the last inserted one) or a lambda that gets the previous value (defaults to “undefined”) and the new value.

FROM 'user:location' & TIME > 1d ago
AGGREGATE { [_.userId]: _.location } -- obtain latest location for all users
SELECT ...Entries(_) -- yields a sequence of [userId, location] tuples
FILTER _[1] = 'Stockholm' -- keep only those who are in Stockholm
SELECT ...FROM 'user:created' & `user:{_[0]}` END -- and get their details
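A sketch of the assumed fold semantics behind AGGREGATE with dynamic keys, in plain TypeScript (names and event shapes are illustrative only):

```typescript
// Assumed fold semantics of AGGREGATE { [_.userId]: _.location }: one object
// accumulates over the stream, and the last inserted value per key wins.
type LocationEvent = { userId: string; location: string };

function aggregateLatest(events: Iterable<LocationEvent>): Record<string, string> {
  const acc: Record<string, string> = {};
  for (const e of events) acc[e.userId] = e.location; // last write wins
  return acc;
}

const byUser = aggregateLatest([
  { userId: "alice", location: "Berlin" },
  { userId: "alice", location: "Stockholm" }, // later update overrides
  { userId: "bob", location: "Malmö" },
]);

// SELECT ...Entries(_) then FILTER _[1] = 'Stockholm':
const inStockholm = Object.entries(byUser).filter(([, loc]) => loc === "Stockholm");
// inStockholm === [["alice", "Stockholm"]]
```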

The above is suitable when there are not so many location updates and lots of users. If there are only a few users, the below would be faster:

FROM 'user:created'
FILTER 'Stockholm' =
  ((FROM 'user:location' & `user:{_.userId}` ORDER DESC LIMIT 1 END)[0].location ?? 'unknown')
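A sketch of this second strategy in TypeScript; the user list and the per-user, latest-first location streams are illustrative stand-ins for the event queries:

```typescript
// Illustrative: few users, so look up each user's latest location (the inner
// FROM ... ORDER DESC LIMIT 1), defaulting to 'unknown' when there is none.
type User = { userId: string; name: string };

// latest-first location streams per user, standing in for the event store
const locationsByUser = new Map<string, string[]>([
  ["u1", ["Stockholm", "Berlin"]],
  ["u2", ["Malmö"]],
]);
const users: User[] = [
  { userId: "u1", name: "Alice" },
  { userId: "u2", name: "Bob" },
  { userId: "u3", name: "Carol" }, // no location events at all
];

const inStockholm = users.filter(
  (u) => (locationsByUser.get(u.userId)?.[0] ?? "unknown") === "Stockholm",
);
// inStockholm.map((u) => u.name) === ["Alice"]
```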

If in the first case we want to keep also the previous location, we could do it like this:

FROM 'user:location' & TIME > 1d ago
AGGREGATE { [_.userId]: \(prev) => { now: _.location, from: prev.now ?? null } } -- obtain latest location for all users
SELECT ...Entries(_) -- yields a sequence of [userId, {now, from}] tuples
FILTER _[1].now = 'Stockholm' -- keep only those who are in Stockholm
SELECT {
  ...FROM 'user:created' & `user:{_[0]}` END, -- and get their details
  cameFrom: _[1].from
}

(actually I found that the update function only needs the prev parameter; the rest is otherwise available)
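The lambda-valued right-hand side can be sketched the same way; the following TypeScript fold is my reading of the assumed semantics, with the previous entry defaulting to undefined as described above:

```typescript
// Assumed semantics of the lambda-valued right-hand side: the previous entry
// (undefined at first) is folded with the new event into { now, from }.
type LocationEvent = { userId: string; location: string };
type Entry = { now: string; from: string | null };

function aggregateWithHistory(events: Iterable<LocationEvent>): Record<string, Entry> {
  const acc: Record<string, Entry> = {};
  for (const e of events) {
    const prev = acc[e.userId]; // "defaults to undefined", as in the proposal
    acc[e.userId] = { now: e.location, from: prev?.now ?? null };
  }
  return acc;
}

const byUser = aggregateWithHistory([
  { userId: "alice", location: "Berlin" },
  { userId: "alice", location: "Stockholm" },
]);
// byUser.alice === { now: "Stockholm", from: "Berlin" }
```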

Kelerchian (Contributor, Author) replied:

> or a lambda that gets the previous value (defaults to “undefined”) and the new value.

Is it safe to have a lambda as a replacement for a value? I am concerned that if we introduce lambdas in different places, under different circumstances, they will behave differently (e.g. if a lambda appears somewhere other than AGGREGATE, a different parameter is supplied to it).

rkuhn (Member) commented Nov 29, 2023

You’re asking the right question: before introducing lambdas we need to decide on very clear semantics for them. This does not only pertain to capturing behaviour / hygiene, but also to whether a lambda is a value (that can be stored in objects) or something else.

In the syntax, we could allow lambdas only in certain places, but the less uniform the rules are the more users will be confused.


My current thinking is that regarding a lambda as a value means defining its serialized form, making it possible to store it and recreate it. The most common problem with this approach is that all referenced context needs to be captured by value and serialized — you may search for “Java Serialization” to get a picture of the issues. We may be able to optimise this, but it remains a potential foot gun, e.g. when capturing another function and so on.
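To make the foot gun concrete in a host-language setting (TypeScript here, purely illustrative; this is not AQL behavior):

```typescript
// Illustrative: why "lambda as value" forces a serialization story. A host
// language like TypeScript silently drops functions when serializing, and
// even serializing the source text loses the captured environment.
const threshold = 10; // context captured by the closure
const query = { name: "bigOnly", predicate: (x: number) => x > threshold };

const serialized = JSON.stringify(query);
// serialized === '{"name":"bigOnly"}' — the lambda is silently dropped

const source = query.predicate.toString();
// `source` contains the function's code text, but `threshold` is not part of
// it: recreating the function elsewhere needs the environment captured by value.
```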

It may be easier to declare that functions belong to a different universe than values, so that you cannot store them in objects, arrays, etc. A LET binding would then refer to either a value or a function. This limits the number of functional programming patterns that can be used, but I also think that we shouldn’t embed a roughly hewn Haskell into our peculiar database :-) (aeh, sorry, databank)

Kelerchian (Contributor, Author) replied:

So now we have two problems:

1.) The internal representation of a lambda and its serialization problem
2.) The harm of a contextual lambda and its semantic relation to the query


I would like to propose a solution, similar to my initial one, which may solve both at the same time, since they are related.

  • A lambda is a state machine: not hand-sewn, but more akin to how Rust internally converts an async function into a future.
  • If it helps: a lambda is not serializable. It is stored in a context and referenced through a special reference, LambdaRef(ID).
  • A lambda is a stateless pure function (e.g. one whose body is CASE...ENDCASE).
  • A lambda can have a contextually variable number of parameters. The number of parameters can be verified against its context.
  • To solve the problem with context, we introduce other processors with lambdas (for lack of a better word).
  • These processors have flush and apply, which act similarly but hand control over to the passed lambda.
  • Each processor has different requirements on the number of lambda parameters (which can be used as a verification point before execution).

For example, with a new processor PushFold:

AGGREGATE {   -- obtain latest location for all users
  [_.userId]: PushFold(
    _.location,                                           -- pass the currently evaluated event as `next`
    (prev, next) => {
      now: next,
      from: prev.now ?? null
    })
}

PushFold provides a semantic context for the lambda, or in other words, a legitimate reason why it takes (prev, next) as arguments. This enables us to introduce other forms of lambda, with different shapes and different numbers of parameters, without awkwardness.
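The processor idea can be sketched in TypeScript as follows; the `Processor` interface, the arity check, and the `PushFold` class follow the bullet points loosely and are my assumptions, not a proposed implementation:

```typescript
// Illustrative sketch: the processor owns the state machine, verifies the
// lambda's arity up front, and hands control to the lambda at well-defined
// points (apply/flush).
interface Processor<In, Out> {
  apply(input: In): void; // feed one value; may invoke the user lambda
  flush(): Out;           // finish and emit the aggregate
}

class PushFold<T, Acc> implements Processor<T, Acc | undefined> {
  private acc: Acc | undefined;
  constructor(private readonly step: (prev: Acc | undefined, next: T) => Acc) {
    // verification point: PushFold legitimizes exactly a (prev, next) lambda
    if (step.length !== 2) throw new Error("PushFold lambda must take (prev, next)");
  }
  apply(input: T): void {
    this.acc = this.step(this.acc, input);
  }
  flush(): Acc | undefined {
    return this.acc;
  }
}

// The AGGREGATE example above, folded for a single user:
type Entry = { now: string; from: string | null };
const fold = new PushFold<string, Entry>((prev, next) => ({
  now: next,
  from: prev?.now ?? null,
}));
for (const loc of ["Berlin", "Stockholm"]) fold.apply(loc);
const result = fold.flush();
// result === { now: "Stockholm", from: "Berlin" }
```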
