Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack: some uses of Bytes.unsafe_to_string likely unsafe #1967

Closed
tomjridge opened this issue Jul 6, 2022 · 13 comments
Closed

irmin-pack: some uses of Bytes.unsafe_to_string likely unsafe #1967

tomjridge opened this issue Jul 6, 2022 · 13 comments
Assignees
Labels
feature/layered-store Related to the Layered Store type/bug Related to a bug

Comments

@tomjridge
Copy link
Contributor

For example, in mapping_file.ml on main branch, there is:

(* Fill and close [file0] *)
    let buffer = Bytes.create 16 in
    let register_entry ~off ~len =
      (* Write [off, len] in native-endian encoding because it will be read
         with mmap. *)
      (* if Int63.to_int off < 500 then
       *   Fmt.epr "register_entry < 500: %d %d\n%!" (Int63.to_int off) len; *)
      Bytes.set_int64_ne buffer 0 (Int63.to_int64 off);
      Bytes.set_int64_ne buffer 8 (Int64.of_int len);
      Ao.append_exn file0 (Bytes.unsafe_to_string buffer)
    in
...

The buffer is shared over invocations of register_entry, but register_entry calls Bytes.unsafe_to_string. This looks wrong to me.

@metanivek
Copy link
Member

metanivek commented Jul 6, 2022

Good catch. I saw this related issue recently #1815 which I definitely think we ought to address before release. I think we can either create individual issues (like this one) for closing with individual PRs or close this issue to focus on making one PR that addresses all the issues we can find (and would close the earlier issue). What do you think?

@tomjridge
Copy link
Contributor Author

tomjridge commented Jul 6, 2022

Also some comments on #1814

What I wrote then: "At the very least, all uses of Bytes.unsafe_to_string should either be obviously valid (i.e. looking locally at the lines involved can guarantee the use is correct), or there should be a comment as to why the use is valid."

@tomjridge
Copy link
Contributor Author

A single PR would be great, but the problem is: There will be some uses which look like they are unsafe; but if we change them, we kill performance. Clearly if something is definitely unsafe we need to change it. But given the impact on performance, we need to be careful if there is a use where it is not clear either way.

So I would actually favour eg tackling all the obvious cases in a single PR, which also marks the unobvious cases with a "TODO" that we need to investigate further.

@metanivek
Copy link
Member

I like that idea. Open a single PR that closes #1815 by

  1. Addressing all obvious cases
  2. Adding TODO + individual issues for non-obvious cases

@tomjridge
Copy link
Contributor Author

tomjridge commented Jul 6, 2022

Sounds good to me. Let's hope we can mitigate any performance losses for the "obvious cases".

@tomjridge tomjridge added the type/bug Related to a bug label Jul 7, 2022
@tomjridge
Copy link
Contributor Author

tomjridge commented Jul 7, 2022

An initial pass through the irmin-pack code (only), with comments on each occurrence of Bytes.unsafe_to_string: https://github.com/tomjridge/irmin/tree/investigate_unsafe_to_string

We also should look at Bytes.unsafe_of_string (but these should all hopefully be "obviously safe").

@samoht
Copy link
Member

samoht commented Jul 7, 2022

Could we replace the local usage with val with_buffer: int -> (bytes -> unit) -> string to make it this is supposed to be a local buffer (and that we should keep a reference of if when exiting the callback).

@tomjridge
Copy link
Contributor Author

@samoht Sorry to be dim - could you explain further? I don't follow what you mean.

@samoht
Copy link
Member

samoht commented Jul 7, 2022

We should replace these patterns:

let str =
  let buf = Bytes.create n in
  f buf;
  Bytes.unsafe_to_string buf
in
...

into:

let str = with_buffer f in
...

@tomjridge
Copy link
Contributor Author

For a draft PR see #1970

@samoht samoht added the feature/layered-store Related to the Layered Store label Jul 7, 2022
@samoht
Copy link
Member

samoht commented Jul 7, 2022

Kind of critical for the release - work in on-going (@metanivek and @tomjridge are working on this)

@tomjridge
Copy link
Contributor Author

@metanivek has kindly let me of documenting some of the other functions (thanks!) - there is a follow up issue #1972 for the extra documentation. If people are happy I think we should merge.

@tomjridge
Copy link
Contributor Author

Mostly closed by #1970. Follow up issue #1972. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/layered-store Related to the Layered Store type/bug Related to a bug
Projects
None yet
Development

No branches or pull requests

3 participants