implicit individual/variable functions in PROBABILITY OF (...) #535

Open
riastradh-probcomp opened this issue Feb 17, 2017 · 3 comments
@riastradh-probcomp
Contributor

ESTIMATE * FROM VARIABLES OF p
     ORDER BY PROBABILITY OF (MUTUAL INFORMATION WITH c USING 100 SAMPLES)

Need to transmit the implicit variable into MUTUAL INFORMATION.

This is not trivial!

@fsaad
Collaborator

fsaad commented Feb 18, 2017

simulate weight, t1.(estimate * from columns of p1
    where probability of (mutual information with age < 1) > 0.8)
    from p1 limit 10

@riastradh-probcomp
Contributor Author

riastradh-probcomp commented Feb 24, 2017

Two alternative approaches:

Temporary SQL functions

In an implied-variable (or implied-row) context, i.e., ESTIMATE ... FROM VARIABLES OF p (or ESTIMATE ... FROM [PAIRWISE] p), replace any PROBABILITY OF (<subexpression>) that uses the implied variable (row) by bql_temp_fn_123(name) (or bql_temp_fn_123(rowid)). Here bql_temp_fn_123 is a SQL function, temporarily established for the duration of the query with winders, that executes the query ESTIMATE PROBABILITY OF (<subexpression>) WITHIN p with the implicit variable (row) made explicit and its argument substituted for that explicit variable (row). For example, for the query

ESTIMATE * FROM VARIABLES OF p
    ORDER BY PROBABILITY OF (MUTUAL INFORMATION WITH c < 0.01)

the function bql_temp_fn_123 would assemble and execute the query

ESTIMATE PROBABILITY OF (MUTUAL INFORMATION OF <name> WITH c < 0.01) WITHIN p

with its argument substituted for <name>.
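A minimal sketch of the temporary-function idea, using Python's standard sqlite3 module rather than bayeslite itself. Everything here apart from the name bql_temp_fn_123 is an assumption made for illustration: the variables table, the temp_sql_function helper (standing in for a winder/unwinder pair), and estimate_probability, which returns canned numbers where the real function would assemble and run the ESTIMATE PROBABILITY OF ... WITHIN p query.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def temp_sql_function(conn, name, nargs, func):
    """Register func as SQL function `name` only for the enclosed block.

    Hypothetical stand-in for a winder (register) / unwinder (retire) pair.
    """
    conn.create_function(name, nargs, func)
    try:
        yield
    finally:
        # Passing None in place of the callable retires the function (the
        # sqlite3 docs describe this as removing it); later calls fail.
        conn.create_function(name, nargs, None)


# Canned values standing in for the result of
#   ESTIMATE PROBABILITY OF (MUTUAL INFORMATION OF <name> WITH c < 0.01) WITHIN p
FAKE_PROBABILITIES = {"age": 0.9, "weight": 0.2, "height": 0.5}


def estimate_probability(name):
    """Hypothetical: would assemble and execute the explicit-variable query."""
    return FAKE_PROBABILITIES[name]


def variables_ordered_by_probability(conn):
    """Emulate ESTIMATE * FROM VARIABLES OF p ORDER BY PROBABILITY OF (...)."""
    with temp_sql_function(conn, "bql_temp_fn_123", 1, estimate_probability):
        rows = conn.execute(
            "SELECT name FROM variables ORDER BY bql_temp_fn_123(name)"
        ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variables (name TEXT)")
conn.executemany(
    "INSERT INTO variables VALUES (?)", [("age",), ("weight",), ("height",)]
)
print(variables_ordered_by_probability(conn))  # ['weight', 'height', 'age']
```

The point of the context manager is that the function exists only while the outer query runs, which is the part that the cursor winder/unwinder mechanism would have to take on.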

To do this we must:

  • (~1 hour) Write and test code to detect implicit variables in an AST subtree.
  • (~1 day) Write and test code to transform subexpressions with implicit variables into SQL templates with explicit variables. (Note that we cannot use SQL/BQL parameters here until #156, "query parameters for names, not just values", is implemented.)
  • (~1 day) Adapt the BQL cursor winder/unwinder mechanism to include non-SQL actions, such as creating and destroying a SQL function, test it, and think through the consequences to make sure it is reliable.
  • (~1 day) Write and test code tying it all together and make sure it works.
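
The first two steps above can be sketched roughly as follows. The AST classes here (MutualInformation, LessThan) are hypothetical and far simpler than the real BQL AST, and quote_name is an assumed helper that splices names in as quoted identifiers, since names cannot be passed as query parameters.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class MutualInformation:
    of: Optional[str]  # None means "the implicit variable"
    with_: str


@dataclass
class LessThan:
    left: "Expr"
    right: float


Expr = Union[MutualInformation, LessThan]


def uses_implicit_variable(expr):
    """Step 1: does this subtree refer to the implicit variable?"""
    if isinstance(expr, MutualInformation):
        return expr.of is None
    if isinstance(expr, LessThan):
        return uses_implicit_variable(expr.left)
    raise TypeError("unknown node: %r" % (expr,))


def quote_name(name):
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def to_bql(expr, implicit):
    """Step 2: render the subtree with `implicit` filled in explicitly."""
    if isinstance(expr, MutualInformation):
        of = implicit if expr.of is None else expr.of
        return "MUTUAL INFORMATION OF %s WITH %s" % (
            quote_name(of), quote_name(expr.with_))
    if isinstance(expr, LessThan):
        return "%s < %r" % (to_bql(expr.left, implicit), expr.right)
    raise TypeError("unknown node: %r" % (expr,))


def probability_query(expr, population, name):
    """Build the query the temporary function would run for one variable."""
    return "ESTIMATE PROBABILITY OF (%s) WITHIN %s" % (
        to_bql(expr, name), quote_name(population))


expr = LessThan(MutualInformation(None, "c"), 0.01)
print(uses_implicit_variable(expr))  # True
print(probability_query(expr, "p", "age"))
```

For the example expression this prints ESTIMATE PROBABILITY OF (MUTUAL INFORMATION OF "age" WITH "c" < 0.01) WITHIN "p", matching the shape of the explicit query shown earlier.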

This plan is more clearly articulated up front, with fewer unknowns, but it likely carries a heavier long-term maintenance burden (additional complexity in the BQL->SQL compiler) and offers no long-term benefits.

Virtual tables with table-valued functions

Create a virtual table with table-valued functions for performing the

SIMULATE MUTUAL INFORMATION OF c WITH d FROM MODELS OF p

queries, so that we can compile that into:

SELECT * FROM mutual_information_simulator
    WHERE population = 'p' AND target_variable = 'c' AND reference_variable = 'd'

Then it should be trivial to do the same for implicit variables in some context.
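
A rough sketch of what the compiler side might emit, assuming the column names from the SELECT above. The function compile_simulate_mi and the outer column reference v.name are hypothetical. One thing the sketch makes visible: in this form the variable names are ordinary string values in a WHERE clause, so they can be bound as parameters, and an implicit variable can be expressed as a reference to a column of the enclosing query.

```python
def compile_simulate_mi(population, reference, target=None,
                        implicit_column="v.name"):
    """Return (sql, params) for a mutual_information_simulator lookup.

    If target is None, the target variable is implicit and is taken from a
    column of the enclosing query (implicit_column, an assumed name).
    """
    where = ["population = ?"]
    params = [population]
    if target is None:
        # Implicit variable: correlate with the outer query's column.
        where.append("target_variable = " + implicit_column)
    else:
        where.append("target_variable = ?")
        params.append(target)
    where.append("reference_variable = ?")
    params.append(reference)
    sql = ("SELECT * FROM mutual_information_simulator WHERE "
           + " AND ".join(where))
    return sql, params


# Explicit form: SIMULATE MUTUAL INFORMATION OF c WITH d FROM MODELS OF p
print(compile_simulate_mi("p", "d", target="c"))
# Implicit form: the target comes from the enclosing query's current variable.
print(compile_simulate_mi("p", "d"))
```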

To do this we must:

  • (~1 day) [done privately] Mess around with virtual tables to attain facility with them.
  • (~1 day) [done in 27e21b9] Draft and test a virtual table for simulating mutual information from models of a population.
  • (~1 day) [done in 813e9d5] Adapt the compiler to generate code to use the virtual table, and test it.
  • (~1 day) [done in d537e0f] Adapt the compiler to push implicit variable references down through SIMULATE FROM MODELS OF subqueries.

This plan has more unknowns: there is a bug in sqlite3 affecting virtual tables that might affect us (I need to spend an hour or so reviewing it to be sure), and complexities in working with virtual tables may come up in the 'mess around' stage that require adjusting the rest of the plan. However, I expect it to improve the long-term maintainability of the compiler, and I expect the same mechanism to be useful for other things -- e.g., making it easier to parallelize more queries, and making more of BQL composable and nestable.

@riastradh-probcomp
Contributor Author

Draft is committed and lightly tested. Could use more testing.
