Between ordering - string < date #2774

neumino · 2014-07-29T18:42:59Z

This return true

r.expr("foo").lt(r.now())

This return the states I expect (basically all of them) - state is a simple index on the field state and states are all strings.

r.db('test').table('test').between(0, 'z', {index: "state"})

However this return nothing:

r.db('test').table('test').between(0, r.now(), {index: "state"})

Ping @mlucy

The text was updated successfully, but these errors were encountered:

neumino · 2014-07-29T19:42:09Z

Hum, actually I just can't get any result when I'm using a date in between.

mlucy · 2014-07-30T03:19:08Z

The orderings in datum_t::cmp and time_to_str_key don't match up. datum_t::cmp says that pseudotypes are always larger than "real" types, while time_to_str_key says they sort in-between R_NUM and R_STR.

We need to make these consistent for anything to make sense. Since people probably have data on-disk indexed by a pseudotype, I think we should change datum_t::cmp to reflect time_to_str_keys semantics.

mlucy · 2014-07-30T07:26:47Z

Assigning to @Tryneus because he told me he has some free time.

neumino · 2014-07-30T08:11:40Z

@mlucy -- I thought that pseudotypes were behaving like other types, that is to say we compare them by their name.

array < bool < null < number< string < time

Is there a reason a pseudo types are always greater than a normal type? Or is it just that we only have time for now?

mlucy · 2014-07-30T10:03:31Z

@neumino -- if you call r.now.typeof the output should be something like PTYPE<TIME>. We generally have the rule that types sort according to the output of typeof. (This is why time_to_str_key says they sort between NUMBER and STRING; N < P < S.) I'm not sure why datum_t::cmp thinks that pseudotypes are always larger than real types. That logic seems odd to me too.

Tryneus · 2014-07-30T21:21:57Z

I'm pretty sure changing datum_t::cmp will break sindex function compatibility. If we want to maintain compatibility, we would need new terms for anything that can call datum_t::cmp, which I imagine is a lot of things. Opinions?

mlucy · 2014-07-30T21:49:25Z

@Tryneus: what would break?

mlucy · 2014-07-30T21:49:51Z

(I mean, besides the obvious fact that sindex functions would compare types differently after the change.)

Tryneus · 2014-07-30T22:19:12Z

So, in the simplest example I can come up with, a user has a table with one entry: {date:<time object>}.

They add an index r.row('date').lt("some string"). Granted, this is a pointless index, but bear with me.

Before changing datum_t::cmp, the index for this row would be false. After changing datum_t::cmp, the index would be true. If they were to upgrade from the old behavior to the new behavior using the same data files, their rows would now be indexed incorrectly (that is, if they were to delete and reinsert the row, the index for it would change). I'm not sure what effects this may have on various queries.

mlucy · 2014-07-30T22:42:05Z

I'm honestly not sure what to think. There are basically three options:

Require people to re-create sindexes this release.
- This sucks, and we need to stop doing it.
Keep the old behavior for old sindexes
- This is a pain in the ass.
- This can be confusing: why does this one sindex act totally differently than new sindexes with the same code?
- We could make this less bad by logging a warning when we read an sindex with a comparison operator in it.
Re-interpret old sindexes to use the new behavior.
- This is terrible.
- It might not be so bad here because it only matters if you're comparing times to strings, which is probably exceptionally uncommon.
- We could still log a warning.

@srh -- I assume you're strongly in favor of the second option?

gchpaco · 2014-07-30T22:43:53Z

Can we not bump the version tag on the sindex and recreate it during import, rather than require the user to do it?

mlucy · 2014-07-30T22:45:33Z

@gchpaco: I think the goal is to not force people to re-import their data between minor versions any more.

mlucy · 2014-07-30T22:46:00Z

Actually, that's another option: we could leave this broken until 2.0 and fix it then. (There are obvious problems with that, though.)

gchpaco · 2014-07-30T22:46:53Z

Put another way: upon startup, identify old versions of these sindexes and rebuild them quietly? Thus migrating the on-disk format to the current version but without requiring manual operations of the user.

gchpaco · 2014-07-30T22:47:17Z

In this specific case that could be as easy as "old index, ok, drop it and recreate using original specs but not broken this time"

mlucy · 2014-07-30T22:53:14Z

I see, you mean re-create the sindex from scratch to match the new interpretation of the sindex function?

We'd have to re-create almost all the sindexes (since so many things do comparisons), which would prevent queries on the sindexes from running, possibly for a long time, and afterward the results of some queries would be different than they were before. But maybe that's better than the other options; what do other people think?

gchpaco · 2014-07-30T23:03:04Z

We could conceivably retain the old index while we rebuild it to serve queries, although that could be a bad idea from the codebase POV.

timmaxw · 2014-07-30T23:12:13Z

Here's a proposal: If these changes might break an index, then we flag the index and put a warning message in the log. Queries against a flagged index fail with a message explaining the situation. The user can rebuild the index (which automatically clears the flag) or manually unflag it. If the index should have been rebuilt but the user manually unflags it, then the behavior is undefined.

srh · 2014-07-31T10:04:06Z

@srh -- I assume you're strongly in favor of the second option?

Yes. Also, the datum_t::cmp-specific work is not really a pain in the ass. It comprises some clear and specific code that will exist for a version or maybe two (or maybe five! But it's not so bad.). The harder work is in the general support of this feature.

But it's important that we not crash and lose users' data, that we be capable of fixing bugs in the ReQL implementation or specification, and that users' databases don't refuse to work after an upgrade. This is the way to do it.

In order to tell users they need to rebuild indexes, this is simplest and most transparent: They rebuild the new index with a different name, while the old one still exists, and then they the new index. There are no technical barriers to supporting index renaming. E.g. table.index_rename('before', 'after').

srh · 2014-07-31T10:45:38Z

I'm working on the backwards compatibility stuff in sam_2774. I hope to get it done by today.

srh · 2014-07-31T15:42:57Z

Review 1847 contains disk format stuff in branch sam_2774_disk_structure. This only implements the disk format changes (including 1.13 backwards compatibility of course), and the use of the secondary index reql version found on disk. Scanning the sindex func or anything related to that is not implemented yet.

mlucy · 2014-07-31T22:24:08Z

In light of #2789, I think we may have to just force people to re-import their data this release, which would obviate this problem.

Tryneus · 2014-08-05T19:48:13Z

This is fixed in next as of f3cf1a3 (review was 1837). Will be in release 1.14.

aleclarson · 2017-08-02T14:10:59Z

We generally have the rule that types sort according to the output of typeof.

Sorry for the necropost. But this should be somewhere in the docs, if it's not already. 😄

marshall007 · 2017-08-02T16:18:01Z

@aleclarson thanks, I created an issue in the docs repo: rethinkdb/docs#1210

neumino added tp:bug labels Jul 29, 2014

mlucy added this to the 1.14 milestone Jul 30, 2014

mlucy added the release_blocker label Jul 30, 2014

mlucy assigned Tryneus Jul 30, 2014

mlucy mentioned this issue Jul 31, 2014

Secondary indexes with common prefixes sometimes sort incorrectly #2789

Closed

Tryneus closed this as completed Aug 5, 2014

srh mentioned this issue Aug 6, 2014

RethinkDB in-place upgrade testing #2821

Open

srh added a commit that referenced this issue Aug 6, 2014

Added tests for #2774 to upgrade.py.

2259ee0

marshall007 mentioned this issue Aug 2, 2017

Document sorting behavior across various types rethinkdb/docs#1210

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Between ordering - string < date #2774

Between ordering - string < date #2774

neumino commented Jul 29, 2014

neumino commented Jul 29, 2014

mlucy commented Jul 30, 2014

mlucy commented Jul 30, 2014

neumino commented Jul 30, 2014

mlucy commented Jul 30, 2014

Tryneus commented Jul 30, 2014

mlucy commented Jul 30, 2014

mlucy commented Jul 30, 2014

Tryneus commented Jul 30, 2014

mlucy commented Jul 30, 2014

gchpaco commented Jul 30, 2014

mlucy commented Jul 30, 2014

mlucy commented Jul 30, 2014

gchpaco commented Jul 30, 2014

gchpaco commented Jul 30, 2014

mlucy commented Jul 30, 2014

gchpaco commented Jul 30, 2014

timmaxw commented Jul 30, 2014

srh commented Jul 31, 2014

srh commented Jul 31, 2014

srh commented Jul 31, 2014

mlucy commented Jul 31, 2014

Tryneus commented Aug 5, 2014

aleclarson commented Aug 2, 2017

marshall007 commented Aug 2, 2017

Between ordering - string < date #2774

Between ordering - string < date #2774

Comments

neumino commented Jul 29, 2014

neumino commented Jul 29, 2014

mlucy commented Jul 30, 2014

mlucy commented Jul 30, 2014

neumino commented Jul 30, 2014

mlucy commented Jul 30, 2014

Tryneus commented Jul 30, 2014

mlucy commented Jul 30, 2014

mlucy commented Jul 30, 2014

Tryneus commented Jul 30, 2014

mlucy commented Jul 30, 2014

gchpaco commented Jul 30, 2014

mlucy commented Jul 30, 2014

mlucy commented Jul 30, 2014

gchpaco commented Jul 30, 2014

gchpaco commented Jul 30, 2014

mlucy commented Jul 30, 2014

gchpaco commented Jul 30, 2014

timmaxw commented Jul 30, 2014

srh commented Jul 31, 2014

srh commented Jul 31, 2014

srh commented Jul 31, 2014

mlucy commented Jul 31, 2014

Tryneus commented Aug 5, 2014

aleclarson commented Aug 2, 2017

marshall007 commented Aug 2, 2017