Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Between ordering - string < date #2774

Closed
neumino opened this issue Jul 29, 2014 · 25 comments
Closed

Between ordering - string < date #2774

neumino opened this issue Jul 29, 2014 · 25 comments

Comments

@neumino
Copy link
Member

neumino commented Jul 29, 2014

This return true

r.expr("foo").lt(r.now())

This return the states I expect (basically all of them) - state is a simple index on the field state and states are all strings.

r.db('test').table('test').between(0, 'z', {index: "state"})

However this return nothing:

r.db('test').table('test').between(0, r.now(), {index: "state"})

Ping @mlucy

@neumino
Copy link
Member Author

neumino commented Jul 29, 2014

Hum, actually I just can't get any result when I'm using a date in between.

@mlucy mlucy added this to the 1.14 milestone Jul 30, 2014
@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

The orderings in datum_t::cmp and time_to_str_key don't match up. datum_t::cmp says that pseudotypes are always larger than "real" types, while time_to_str_key says they sort in-between R_NUM and R_STR.

We need to make these consistent for anything to make sense. Since people probably have data on-disk indexed by a pseudotype, I think we should change datum_t::cmp to reflect time_to_str_keys semantics.

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

Assigning to @Tryneus because he told me he has some free time.

@neumino
Copy link
Member Author

neumino commented Jul 30, 2014

@mlucy -- I thought that pseudotypes were behaving like other types, that is to say we compare them by their name.

array < bool < null < number< string < time

Is there a reason a pseudo types are always greater than a normal type? Or is it just that we only have time for now?

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

@neumino -- if you call r.now.typeof the output should be something like PTYPE<TIME>. We generally have the rule that types sort according to the output of typeof. (This is why time_to_str_key says they sort between NUMBER and STRING; N < P < S.) I'm not sure why datum_t::cmp thinks that pseudotypes are always larger than real types. That logic seems odd to me too.

@Tryneus
Copy link
Member

Tryneus commented Jul 30, 2014

I'm pretty sure changing datum_t::cmp will break sindex function compatibility. If we want to maintain compatibility, we would need new terms for anything that can call datum_t::cmp, which I imagine is a lot of things. Opinions?

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

@Tryneus: what would break?

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

(I mean, besides the obvious fact that sindex functions would compare types differently after the change.)

@Tryneus
Copy link
Member

Tryneus commented Jul 30, 2014

So, in the simplest example I can come up with, a user has a table with one entry: {date:<time object>}.

They add an index r.row('date').lt("some string"). Granted, this is a pointless index, but bear with me.

Before changing datum_t::cmp, the index for this row would be false. After changing datum_t::cmp, the index would be true. If they were to upgrade from the old behavior to the new behavior using the same data files, their rows would now be indexed incorrectly (that is, if they were to delete and reinsert the row, the index for it would change). I'm not sure what effects this may have on various queries.

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

I'm honestly not sure what to think. There are basically three options:

  • Require people to re-create sindexes this release.
    • This sucks, and we need to stop doing it.
  • Keep the old behavior for old sindexes
    • This is a pain in the ass.
    • This can be confusing: why does this one sindex act totally differently than new sindexes with the same code?
    • We could make this less bad by logging a warning when we read an sindex with a comparison operator in it.
  • Re-interpret old sindexes to use the new behavior.
    • This is terrible.
    • It might not be so bad here because it only matters if you're comparing times to strings, which is probably exceptionally uncommon.
    • We could still log a warning.

@srh -- I assume you're strongly in favor of the second option?

@gchpaco
Copy link
Contributor

gchpaco commented Jul 30, 2014

Can we not bump the version tag on the sindex and recreate it during import, rather than require the user to do it?

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

@gchpaco: I think the goal is to not force people to re-import their data between minor versions any more.

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

Actually, that's another option: we could leave this broken until 2.0 and fix it then. (There are obvious problems with that, though.)

@gchpaco
Copy link
Contributor

gchpaco commented Jul 30, 2014

Put another way: upon startup, identify old versions of these sindexes and rebuild them quietly? Thus migrating the on-disk format to the current version but without requiring manual operations of the user.

@gchpaco
Copy link
Contributor

gchpaco commented Jul 30, 2014

In this specific case that could be as easy as "old index, ok, drop it and recreate using original specs but not broken this time"

@mlucy
Copy link
Member

mlucy commented Jul 30, 2014

I see, you mean re-create the sindex from scratch to match the new interpretation of the sindex function?

We'd have to re-create almost all the sindexes (since so many things do comparisons), which would prevent queries on the sindexes from running, possibly for a long time, and afterward the results of some queries would be different than they were before. But maybe that's better than the other options; what do other people think?

@gchpaco
Copy link
Contributor

gchpaco commented Jul 30, 2014

We could conceivably retain the old index while we rebuild it to serve queries, although that could be a bad idea from the codebase POV.

@timmaxw
Copy link
Member

timmaxw commented Jul 30, 2014

Here's a proposal: If these changes might break an index, then we flag the index and put a warning message in the log. Queries against a flagged index fail with a message explaining the situation. The user can rebuild the index (which automatically clears the flag) or manually unflag it. If the index should have been rebuilt but the user manually unflags it, then the behavior is undefined.

@srh
Copy link
Contributor

srh commented Jul 31, 2014

@srh -- I assume you're strongly in favor of the second option?

Yes. Also, the datum_t::cmp-specific work is not really a pain in the ass. It comprises some clear and specific code that will exist for a version or maybe two (or maybe five! But it's not so bad.). The harder work is in the general support of this feature.

But it's important that we not crash and lose users' data, that we be capable of fixing bugs in the ReQL implementation or specification, and that users' databases don't refuse to work after an upgrade. This is the way to do it.

In order to tell users they need to rebuild indexes, this is simplest and most transparent: They rebuild the new index with a different name, while the old one still exists, and then they the new index. There are no technical barriers to supporting index renaming. E.g. table.index_rename('before', 'after').

@srh
Copy link
Contributor

srh commented Jul 31, 2014

I'm working on the backwards compatibility stuff in sam_2774. I hope to get it done by today.

@srh
Copy link
Contributor

srh commented Jul 31, 2014

Review 1847 contains disk format stuff in branch sam_2774_disk_structure. This only implements the disk format changes (including 1.13 backwards compatibility of course), and the use of the secondary index reql version found on disk. Scanning the sindex func or anything related to that is not implemented yet.

@mlucy
Copy link
Member

mlucy commented Jul 31, 2014

In light of #2789, I think we may have to just force people to re-import their data this release, which would obviate this problem.

@Tryneus
Copy link
Member

Tryneus commented Aug 5, 2014

This is fixed in next as of f3cf1a3 (review was 1837). Will be in release 1.14.

@Tryneus Tryneus closed this as completed Aug 5, 2014
@aleclarson
Copy link

We generally have the rule that types sort according to the output of typeof.

Sorry for the necropost. But this should be somewhere in the docs, if it's not already. 😄

@marshall007
Copy link
Contributor

@aleclarson thanks, I created an issue in the docs repo: rethinkdb/docs#1210

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants