Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework BufferedRandomAccessFile #907

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

Calvin-L
Copy link
Collaborator

This commit makes major alterations to BufferedRandomAccessFile to bring it in line with the general RandomAccessFile contract:

  • Invariant V1 has been strengthened to include a contract for the length field.
  • Invariants V2-V5 have been altered to account for cases where curr > length, which the RandomAccessFile contract allows.
  • Additionally, invariant V5 has been adjusted to clarify that dirty=true does not imply that there are unflushed bytes. The implication is only one-way: if there are unflushed bytes, then dirty=true.
  • The hi and maxHi fields have been removed. These largely duplicate information that can be quickly recomputed, meaning they add additional complexity to the class invariants. Without those fields, invariant V6 can be removed.
  • The close() method now closes the underlying file descriptor even if flush() throws an exception.

Due to the complexity of this change, I have added a TLA+ specification for BufferedRandomAccessFile to enable model checking of its design.

To ensure that the code is correct, this commit also adds a substantial amount of test infrastructure:

  • Several new regression tests have been added. These are a little cryptic because they were randomly generated, but they do reflect true behaviors of the RandomAccessFile contract.
  • A fuzz test has been added. This test ensures that many sequences of random operations behave correctly. It also minimizes failures so they can be easily converted to regression tests.

This commit also makes some whitespace and formatting adjustments.

Fixes #835. I have verified that the tests now pass under Java 21.

This commit makes major alterations to `BufferedRandomAccessFile` to
bring it in line with the general `RandomAccessFile` contract:

 - Invariant V1 has been strengthened to include a contract for the
   `length` field.

 - Invariants V2-V5 have been altered to account for cases where
   `curr > length`, which the `RandomAccessFile` contract allows.

 - Additionally, invariant V5 has been adjusted to clarify that
   `dirty=true` does not imply that there are unflushed bytes.  The
   implication is only one-way: if there are unflushed bytes, then
   `dirty=true`.

 - The `hi` and `maxHi` fields have been removed.  These largely
   duplicate information that can be quickly recomputed, meaning they
   add additional complexity to the class invariants.  Without those
   fields, invariant V6 can be removed.

 - The `close()` method now closes the underlying file descriptor even
   if `flush()` throws an exception.

Due to the complexity of this change, I have added a TLA+ specification
for `BufferedRandomAccessFile` to enable model checking of its design.

To ensure that the code is correct, this commit also adds a substantial
amount of test infrastructure:

 - Several new regression tests have been added.  These are a little
   cryptic because they were randomly generated, but they do reflect
   true behaviors of the `RandomAccessFile` contract.

 - A fuzz test has been added.  This test ensures that many sequences
   of random operations behave correctly.  It also minimizes failures
   so they can be easily converted to regression tests.

This commit also makes some whitespace and formatting adjustments.

Fixes #835.  I have verified that the tests now pass under Java 21.

Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
@Calvin-L Calvin-L force-pushed the cal-gh835-bufferedrandomaccessfile branch from cd7cf92 to 297c4c8 Compare April 10, 2024 21:54
@lemmy lemmy added bug error, glitch, fault, flaw, ... Tools The command line tools - TLC, SANY, ... labels Apr 10, 2024
@lemmy
Copy link
Member

lemmy commented Apr 10, 2024

Impressive!

ArraySlice(a, startInclusive, endExclusive) == [elems |-> SubSeq(a.elems, startInclusive + 1, endExclusive)]
ArrayConcat(a1, a2) == [elems |-> a1.elems \o a2.elems]

WriteToFile(file, offset, data_to_write) ==

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is interesting to me that most of these functions are about Arrays. Even WriteToFile and ExtendFile are relatively easy make generic array operations, maybe OverlapAt and TruncateOrExtendWith, where both take a placeholder as an argument. It may simply be worth making this an Array module, and then Common stays more focused on the spec itself WriteToFile(file, offset, data_to_write) == Array.Overlap(file, data_to_write, offset, ArbitarySymbol) (I have no idea why I felt compelled to reverse arguments).

I suppose, otherwise I was going to suggest that we add brief documentation on what the semantics of WriteToFile and ExtendFile were, but in many ways something like ExtendFile == Array.TurncateOrExtendWith(file, new_length, ArbitrarySymbol) does this for you.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably overthinking these utility functions, but it struck me that maybe the most intuitive implementation involves slice & concat.

TruncateOrExtend(a, T, len) == IF len > ArrayLen(a) THEN ArrayConcat(a, Array(T, len - ArrayLen(a))) ELSE ArraySlice(a, 0, len)

OverlayAt(a1, a2, T, i) == ArrayConcat(TruncateOrExtend(a2, T, i), a2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should any of those utility operators be contributed as an Array module to our CommunityModules?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some simple documentation to these operators.

but it struck me that maybe the most intuitive implementation involves slice & concat

I agree that the current definitions are a little ugly. I wrote them after one or two failed attempts to come up with simpler definitions.

The ones you've given are certainly cleaner than my early attempts, and I will adopt your definition for TruncateOrExtend. But I might keep the definition I wrote for WriteToFile. The equivalent definition in terms of other operators is actually fairly cryptic (your definition for OverlayAt needs to include the portion of the contents after the written data if the contents were not written at the end of the array):

WriteToFile(file, offset, data_to_write) ==
    ArrayConcat(ArrayConcat(
        TruncateOrExtendFile(file, offset),
        data_to_write),
        ArraySlice(file, offset + ArrayLen(data_to_write), ArrayLen(file)))

Should any of those utility operators be contributed as an Array module to our CommunityModules?

Yes! I'll look into doing that soon.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your definition for OverlayAt needs to include the portion of the contents after the written data if the contents were not written at the end of the array

Whoops! Easy to make things simpler if you forget the critical parts 😆

@@ -0,0 +1,384 @@
\* Copyright (c) 2024, Oracle and/or its affiliates.

----------------------- MODULE BufferedRandomAccessFile -----------------------

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to say that I have absolutely found it useful to have a spec of the class for review, and especially as a refinement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec could also be referenced at tlaplus/examples

Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
This commit also updates the specification to model EXACTLY what the
`RandomAccessFile` docs say about the interaction between `setLength`
and the file pointer.

Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
@Calvin-L
Copy link
Collaborator Author

Thanks a ton @IamfromSpace for the feedback!

I think I've addressed all the major comments so far.

@IamfromSpace
Copy link

These updates look great! For the sake of thoroughness I still have a few comments coming through on tests, but nothing of real concern. Just wrapped up the unit tests, but it's taking me a bit longer to digest the fuzzing file :)

Copy link

@IamfromSpace IamfromSpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty, that's 100% everything from me :) I think there's some test comments that are worth a look over, but there's certainly no deal-breakers in there, so I've got the approval on there now. Not sure if the repo is configured to reset on changes or not, but if so, I'll keep it active for any commits that come in.

Nice work on this!

Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
In particular, "legal" is now "well-defined".

Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
@lemmy
Copy link
Member

lemmy commented May 2, 2024

I won't have time to review this PR anytime soon and I don't want to be a bottleneck. Given @IamfromSpace's review, I feel confident about this PR. @Calvin-L, feel free to merge or consider inviting additional reviewers at the TLA+ Community Meeting in two weeks. Reviewing the spec offers a great opportunity for those unfamiliar with or not fond of Java to contribute to the TLA+ tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug error, glitch, fault, flaw, ... Tools The command line tools - TLC, SANY, ...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Java 21 causes TLC's BufferedRandomAccessFileTest and OffHeapDiskFPSetTest to fail
4 participants