Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add nonce column to actor_states #1105

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

kasteph
Copy link
Contributor

@kasteph kasteph commented Dec 22, 2022

This PR:

  • persists the actors nonce in the actor_states table and;
  • adds the nonce column to the actor_states table in the schema.
    • the nonce column has a UNIQUE constraint on it but still nullable (postgres docs)
    • I didn't wanna make it into a PK bc we'll need to fill existing rows with some unique values and I don't see the need to create yet another table which will be added overhead for Lily users who are using Postgres for storage. Adding another table would mean creating a view to unify these two tables (actor_states) and (actor_states_with_nonce).

Resolves #1104

@kasteph kasteph marked this pull request as ready for review December 22, 2022 05:45
@codecov-commenter
Copy link

codecov-commenter commented Dec 22, 2022

Codecov Report

Merging #1105 (f29e1f5) into master (852e26b) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1105   +/-   ##
======================================
  Coverage    32.8%   32.8%           
======================================
  Files          46      46           
  Lines        3156    3156           
======================================
  Hits         1037    1037           
  Misses       2023    2023           
  Partials       96      96           

@placer14
Copy link
Contributor

placer14 commented Jan 3, 2023

  • Since the column is nullable with no default defined, the default should be NULL?
  • Would the constraint fail on tables with multiple rows having the same (height,head,nonce) values? I believe that given we already have dupe rows w (height,head) adding NULL nonce values won't make those unique. Correct?

I suspect this migration will not apply cleanly and I would really prefer to test this on a copy of that table in our production DB before we accept/merge this.

@placer14
Copy link
Contributor

placer14 commented Jan 3, 2023

I don't see the need to create yet another table which will be added overhead for Lily users who are using Postgres for storage. Adding another table would mean creating a view to unify these two tables (actor_states) and (actor_states_with_nonce).

Honestly, the view approach seems best here. We're making the schema a "lie" by indicating this column is nullable (which is impossible in the domain). This will require those who consume this schema to understand the semantics of NULL here (which are completely an operational detail and will only confuse those who are looking carefully).

And we're saying there's "overhead" in a view, but if we rename the existing table and put it and the new able behind a view called actor_states it should be seemless to the consumer. Am I missing some other drawbacks?

@kasteph
Copy link
Contributor Author

kasteph commented Jan 6, 2023

Thanks for taking the time to review this and the feedback!

Since the column is nullable with no default defined, the default should be NULL?

Yes.

Would the constraint fail on tables with multiple rows having the same (height,head,nonce) values?

If there are multiple rows with the same (height,head,nonce) values then it will violate that constraint per the Postgres doc:

In general, a unique constraint is violated if there is more than one row in the table where the values of all of the columns included in the constraint are equal. source

I believe that given we already have dupe rows w (height,head) adding NULL nonce values won't make those unique. Correct?

That's a good catch, thanks. I'll add the code column as well so that it will be (height,head,code,nonce). The first three columns are already part of the PK.

I suspect this migration will not apply cleanly and I would really prefer to test this on a copy of that table in our production DB before we accept/merge this.

That sounds good.

We're making the schema a "lie" by indicating this column is nullable (which is impossible in the domain). This will require those who consume this schema to understand the semantics of NULL here (which are completely an operational detail and will only confuse those who are looking carefully).

Do you mind expanding what you mean by the semantics of NULL? From my understanding, it is a marker for missing or unknown values in RDBMSes, according to the creator of RDBMSes. Thus it makes sense to me that it could be accept a NULL value since the nonce was not captured for this task in the first place and therefore a good candidate for a missing or unknown value.

And we're saying there's "overhead" in a view, but if we rename the existing table and put it and the new able behind a view called actor_states it should be seemless to the consumer. Am I missing some other drawbacks?

It is much simpler to include in the column comment that this column will have null values before this change is applied because it was not in the application code rather than create a new table, rename the old table, create a view that unifies these tables, and then document that.

The method you describe is seamless to the consumer, certainly, but there are also people who operate Lily for whom such tasks is just extra work (and sometimes not easy to find and can only be found in the CHANGELOG or hidden deep within the docs -- if it even is, for e.g., miner_sector_infos and miner_sector_infos_v7).

@@ -79,6 +79,8 @@ type ActorState struct {
Code string `pg:",pk,notnull"`
// Top level of state data as json.
State string `pg:",type:jsonb,notnull"`
// The next Actor nonce that is expected to appear on chain.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're already making changes here, let's add a state_root field too. It will make the table re-org safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, let me think harder about the schema then because I don't wanna overcomplicate things but also make sure it's correct.

`
ALTER TABLE {{ .SchemaName | default "public"}}.actor_states
ADD COLUMN nonce BIGINT,
ADD CONSTRAINT actor_states_uniq_nullable_nonce UNIQUE (height,head,nonce);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should replace height with state_root as it's reorg safe.

@placer14
Copy link
Contributor

placer14 commented Jan 12, 2023

Do you mind expanding what you mean by the semantics of NULL? From my understanding, it is a marker for missing or unknown values in RDBMSes, according to the creator of RDBMSes. Thus it makes sense to me that it could be accept a NULL value since the nonce was not captured for this task in the first place and therefore a good candidate for a missing or unknown value.

The schema is all there is as far as a def of the shape of data. NULL is a placeholder for an empty value for us here because we don't know retroactively what these values are suppose to be. We are adding this piece of semantic knowledge as a comment on a column (maybe) but this knowledge does not appear in other use cases where it is important and easily overlooked (for the same reason it would be lost in the CHANGELOG). (Such as how an ORM would interpret the schema automagically, or how certain query will behave in the face of a NULLable column vs not, or even writing the query in a way that gives us the answer we expected in this specific context (which could be surprisingly different from others).

Ideally, the semantics of our internal operations should influence this schema as little as possible. That said, if we're introducing these semantics to simplify dev/op efforts, that's one thing and we've made such concessions in the past. But we've also tried to make schemas mostly read-only and using new models to represent schema changes is the most honest way of tracking how our exports evolve over time. We attempted this with the miner_sector_info_v7 table. Whether we do one or the other is fine, but doing both is hurting us in both ways (making schema semantics fuzzy and complicating our dev/ops lives).

If we are going to deprecate this schema in favor of something slimmer in the future once IDL work lands, then I'm much less concerned about these things in the short term. I wanted to raise this concern (as I have in the past) and make sure we are aware of the costs we are paying.

@placer14
Copy link
Contributor

It is much simpler to include in the column comment that this column will have null values before this change is applied because it was not in the application code rather than create a new table, rename the old table, create a view that unifies these tables, and then document that.

Also, let's not trivialize the differences here. Creating a comment is easy for this PR, but I will need to spend cycles to verify production doesn't break/degrade due to this migration. We are trading off avoidance of one complexity for taking on another one at runtime. (Which, also as a operator, I'm not a fan of.)

@placer14
Copy link
Contributor

@kasteph I hope this doesn't come off as nitpicky. I appreciate you submitting the PR for this and I won't block the merge if others want to go this way. Please let me know once the migration is ready for testing and I'll run it in staging for a bit to measure/test.

@birdychang
Copy link
Contributor

@frrist @kasteph @placer14 Should we just move forward?

@placer14
Copy link
Contributor

placer14 commented Jan 24, 2023

There is a risk that this migration is expensive when we roll out to production and ends up interrupting services while the database blocked writing to this table.

To mitigate this risk, I would like to dupe the actors table in production and test this migration first. I was finishing the removal of data from production so I can have room on disk for this but we could spin up a duplicate instance from backup and run the test on that and pay the cost of a single instance for a day (~$300?). I can start that process now and will report back with results from the test.

We could also handle the rollout manually by turning off actor tasks during migration and then gapfilling them after it completes. This would only affect data from the disabled tasks. I'm not sure about the scope of impact of data missing from this task on alerts and monitoring/analysis capabilities, but I'm certain we use this data and is non-zero.

Lastly, we could avoid testing if this were a new table and not an alter operation. This is guaranteed to have no affect on production.

If you want to abort the migration testing on a new instance, please ping me and I'll tear it down.

@placer14
Copy link
Contributor

placer14 commented Jan 24, 2023

Looks like restoring a production backup to a duplicate instance is broken in the UI. (!!!) I have an open request to support to investigate that.

@placer14
Copy link
Contributor

We discussed that this migration is not necessary. In order to avoid work needed to test/land this, we will postpone discussion on this improvement until some time later if needed. I have abandoned the migration testing work I mentioned previously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicated rows in actor states task when using CSV as storage
5 participants