Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add draft proposal for package hardening #562

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

samoht
Copy link
Member

@samoht samoht commented Jul 21, 2017

I would like to start the discussion about improving the general quality of MirageOS packages. Ideally we should have a tool to help us do that (something like mirage lint) and/or a dashboard as @hannesm suggested a few time already.

I know some of these criteria are very opinionated but I would like to get a sense of what other people in the community is thinking, so please feel free to disagree or to suggest better criteria :-)

@hannesm
Copy link
Member

hannesm commented Jul 21, 2017

@samoht thanks for writing this up. I'm not sure whether hardening is the right word.

I don't understand why/what "by vendoring opam metadata" means. reproducibility of a MirageOS unikernel should depend only on a) dependencies (which are embedded into it, including version number) b) ld and C compiler c) OCaml compiler.

I'm also unsure what you mean by "have a way to gather licenses" -- you can do this in a (more or less) generic way using opam metadata, can't you?

what I'm missing from the list:

  • have a design document specifying the scope of the library (and esp. what is out of scope)
  • doesn't expose any exceptions in the public interface
  • avoid integer overflows (basically every addition and subtraction, as well as multiplication needs to be guarded unless you know that an overflow can never happen (in this case, a comment should be suggested))
  • works on 32bit (esp. in regards to the last point)?
  • maintainer: GitHub recently introduced code owners, which may be worth to use for some (of our core) libraries (see https://github.com/blog/2392-introducing-code-owners)
  • safe-string (looks like this will be the default in 4.06) switch to safe-string by default (related to MPR#7113) ocaml/ocaml#1252
  • provide pretty printers (e.g. by using fmt)
  • maybe some best practises: use result type (and rresult combinators)! use logs for logging, astring for string processing, bos for operating system interaction, and fpath for file paths (which unfortunately still is not used by mirage-block/fs)
  • follow OCaml programming guidelines https://ocaml.org/learn/tutorials/guidelines.html

EDIT: needed to add some more notes:

@djs55
Copy link
Member

djs55 commented Jul 21, 2017

I like both @samoht and @hannesm's lists.

Regarding "have a way to gather licenses" it's useful to be able to get the license file itself because it will often have text like

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

so if you want to send someone a binary unikernel you need to also include the copyright notice i.e. the copyright holders as distinct from the authors or the maintainers (sometimes they're the same but sometimes they're not) and a bunch of other stuff which isn't readily available from the opam file.

@dbuenzli
Copy link
Contributor

Regarding "have a way to gather licenses" it's useful to be able to get the license file itself because it will often have text like

This will do it:

opam show -f depends: ./opam | cut -f 1 -d ' ' | xargs odig license --no-pager --warn-error

See odig help license for the details.

@cfcs
Copy link

cfcs commented Jul 21, 2017

  • better linting of dependencies would definitely be nice (I've quite often experienced mirage-related packages that didn't properly specify all of their dependencies).

  • perhaps a release tool that requires the author to install the pkg in their own opam (could be as switch) to prevent yolo-maintainers from pushing broken commits.

  • I would suggest banning int and float entirely. I don't see the reason not to use Int32/Int64/Nativeint instead (or @hannesm's usane), but if someone can argue for the existence of the int type I'd be eager to hear their take on this.

  • making package signing mandatory (once https://github.com/hannesm/conex/ takes off) would also be a good addition to the security of the mirage ecosystem.

  • re: jbuilder: I'm a bit skeptical about making jbuilder a requirement, since it still seems a bit unstable (or perhaps people are still learning how to use it) -- I just noticed that most of the time where I run into packages that don't build, it's because of jbuilder. I wouldn't mind if this became a general guideline sometime in the future, but for now I'd be sad to have other build systems outlawed.

@dbuenzli
Copy link
Contributor

but if someone can argue for the existence of the int type I'd be eager to hear their take on this.

Indexing arrays maybe ?

@hannesm
Copy link
Member

hannesm commented Jul 21, 2017

@dbuenzli @cfcs indexing arrays is one use case, another is string or cstruct length operations and arithmetic (e.g. slicing a DNS frame into pieces, and keeping track of offsets... this will never be > 512 (or 2 ^ 16 - 1 in the TCP case)). if you receive a 16bit (8bit/24bit) value from a wire protocol, it is very safe to use int for that (unless you allow clients to fill an int without appropriate checking (since Cstruct wraps (using e.g. set_uint16) 0xFF00FF to 0x00FF)).

@cfcs
Copy link

cfcs commented Jul 21, 2017

@dbuenzli @hannesm I believe that the native array and string indexing using int is a messy thing leftover from when int was not platform-dependent.
Basically I think int should have stayed 31-bits (preferably named int31, but oh well), and that int63 should have been added as a type. Consequently using it in Cstruct is imho also a design mistake.

The wrapping of(aka silent overflowing/underflowing of) ints is exactly what we want to [EDIT: prevent], so ideally we'd have a functor that defined an integer field, and use operations on that field.

@hannesm if you receive a 8-bit length field (0x00) and subtract one (because the length field was supposed to include the length of the packet) you will get -1, and depending on how you use that value, you'd probably want to have used an unsigned integer type and be alerted to the underflow possibility at compile-time, or when parsing the input, rather than sometime later during the execution.

I'm aware of the fact that it's sometimes a bit awkward to not use the basic types, and I still hope we can find a way to make it less awkward. (some kind of polymorphic arithmetic operations like val (+) : ('A).t -> ('A).t -> ('A).t arithmetic_result - pseudo-code since I don't know how to express that in ocaml)

@samoht
Copy link
Member Author

samoht commented Jul 21, 2017

Thanks all for your feedback!

I think we should aim to make that list actionable in the short term -- so the discussion about switching everything to not using int seems a bit unrealistic for now on (we still have libraries without test/doc, and we just started to add fuzz testing to some of the core libs). I agree that aiming for 32 bit support is great, especially if we want to compile things to Javascript. Any idea how to do this? Maybe we should have a special compiler switch set with 32bits that we could use with TravisCI?

Thanks @dbuenzli for the license tip, that's exactly what we need. @hannesm: for the reproducible bit, we started experimenting with having more control over the dependencies which are used to build a released binary (using things similar to lock files): https://github.com/moby/vpnkit/tree/master/repo so that we have a stronger control over the opam universe used in the CI.

@hannesm @cfcs: thanks for your lists. I will try to merge all of this into something consistent :-)

90
```

We now have various libraries covering all aspect of the OS: from network to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...aspects...


We now have various libraries covering all aspect of the OS: from network to
block device drivers -- at all levels of the stack: from virtual network cards
to TLS or virtual block devices to virtual filesystem. These libraries also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...filesystems.

We now have various libraries covering all aspect of the OS: from network to
block device drivers -- at all levels of the stack: from virtual network cards
to TLS or virtual block devices to virtual filesystem. These libraries also
span multiple hypervisors such Xen as well as KVM (via solo5).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...multiple hypervisors including Xen and (via solo5) KVM.

span multiple hypervisors such Xen as well as KVM (via solo5).

All of this is great: this means that as a MirageOS user, you can spend time
developping your own application instead of spending time re-implementing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

developing

All of this is great: this means that as a MirageOS user, you can spend time
developping your own application instead of spending time re-implementing
something that the OS would usually do for you. There are of course still
gaps in the ecosystem, but this is getting better that it used to be.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...but things are getting better!

these libraries. Would be great to be able to answer questions such as:
Does the library has a good test coverage? Was it formally verifired?
Was it used in production? In order to do this, I would like to introduce
the "MirageOS hardining process". Ideally this will live in a tool (e.g.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardening


- have well-identified maintainers
- use the correct set of tags in opam metadata
- use proper names: same name for ocaml and ocamlfind package, and also ideally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean opam rather than ocaml here?

- have documentation
- have a clear LICENSE file
- use Jbuilder to build, in order to have short feedback loops when fixing bugs
spawning multiple packages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spanning rather than spawning?

spawning multiple packages
- use topkg to release, in order to ease the task of the release manager
- have proper indentation (usgin ocp-indent + checked-in ocp-indent file)
- use at most 80 collumns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columns (or combine this bullet with previous, and host a standard ocp-indent file somewhere?)

@hannesm
Copy link
Member

hannesm commented Jul 21, 2017 via email

@avsm
Copy link
Member

avsm commented Jul 21, 2017

@samoht if you make a list of what CI modes would be useful (e.g. afl, 32-bit, ...) I can add those explicitly into mirage-ci

@@ -0,0 +1,118 @@
## Package hardening
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mort mentioned on mirageos-devel, hardening has security connotations, whereas this is more about 'towards high-quality packages' (and not only about security)

@hannesm
Copy link
Member

hannesm commented Jul 25, 2017

the current PR looks more like a laundry list - I agree with mort that pointing to some examples of "good libraries" would be good

and another item would be "semantic versioning" -- although it seems there's no proper definition / understanding inside the MirageOS community (so maybe we should not mention it?)

@samoht
Copy link
Member Author

samoht commented Jul 25, 2017

Yup completely agree about the laundry list, I feel the same. Any opinion on how to organise that better?

Maybe we should split this into "phases" or "levels", e.g:

  1. the 'boring paperwork' tasks to complete to make the library compatible with odig/topkg
  2. then some basic coding style 'sanity'
  3. how to go further towards quality (fuzz testing, discussion about int overflow, etc)
  4. bonus: formal verification, etc

(yes, I quite enjoyed the capnproto protocol levels)

And maybe have badges for each levels to put in the README.

Also I will replace "hardening" by something related to "quality".

For the examples of good library, please submit an example or two :-) I am not aware of any library today which fulfils all the basic items on that list ...

@mor1
Copy link
Member

mor1 commented Aug 1, 2017

That structure into levels LGTM.
How do any of those tasks relate to https://mirage.io/wiki/packaging ?
I don't know of examples of existing libraries that meet all criteria :)
But if restructured as levels, perhaps worth assigning levels to the core libs at least, and we should file relevant issues needed to bring them up to higher levels / next level?
(A good hackathon / pioneer project perhaps?)


- Have unit tests (using alcotest) (ideally with coverage report)

- Have fuzz tests (using crowbar)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great idea for all projects, but the tooling really isn't ready for this to be included yet in a document people might be trying to adhere to. "Have randomized property-based testing" is more achievable and will give people the right foundation to use tools like Crowbar later, once they're more ready to be integrated in a CI pipeline.

@yomimono
Copy link
Contributor

yomimono commented Aug 2, 2017

The coding style items are all good goals, but they're at varying levels of subjectivity and apparent-ness (sorry). "avoid potential overflows" and "don't surface exceptions" are testable and concrete; items that begin with "be aware of" are less so. We should either come up with concrete ways in which a library surfaces these qualities, or note that they are goals somewhere rather than specific properties that can be possessed or not possessed.

@samoht samoht force-pushed the package-hardening branch 2 times, most recently from 931d6ad to d7b3299 Compare August 3, 2017 14:15
@samoht
Copy link
Member Author

samoht commented Aug 3, 2017

Pushed a tentative list of levels ... not really sure where this is going and indeed some of the item will be hard to verify automatically, at least in the short term.


- Use the correct set of tags in opam metadata

- Have documentation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crucial is to define the behaviour on edge cases (see e.g. the discussion about mirage-console being blocking on Xen vs partial write on Solo5 mirage/mirage-flow#4), and whether and how arguments are checked (e.g. values given to set* in cstruct are not checked, see mirage/ocaml-cstruct#78)

- Have a design document specifying the scope of the library (and esp. what is
out of scope)

- Use the correct set of tags in opam metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?


- Depends only on libraries released in opam.

- Provide pretty printers by using [fmt](http://erratique.ch/software/fmt/doc/Fmt.html).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is intertwined with the previous use alcotest... if you use alcotest, you likely already defined pretty printers


- Avoid the use of C bindings for no good reasons. A good reason would be to
improve performance by an order of magnitude, or re-use an existing C library
that has not been rewritten yet.
Copy link
Member

@hannesm hannesm Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe elaborate a bit with links how difficult it is to get C bindings cross compiled atm!? and indicate that MirageOS does not include a C library, but only very few functions which are required for the OCaml runtime (see https://github.com/mirage/ocaml-freestanding/tree/master/nolibc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely should add a documentation. What I'm afraid with MirageOS 4 is the ability to compile C stubs for the Unix target (with the libc according to the dune context) and be blocked when we want to switch the whole codebase to a new target such as ocaml-freestanding/Solo5.

Currently, mirage-crypto or digestif try to compile properly C stubs with ocaml-freestanding flags (including nolibc) but such test (just to check that these C stubs can be cross-compiled with ocaml-freestanding) is not done by mirage and nothing (as far as I can say) can prevent a bad C stubs for all targets.


- Avoid integer overflows (basically every addition and subtraction, as well
as multiplication needs to be guarded unless you know that an overflow can
never happen (in this case, a comment should be suggested))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> in this case a comment is required. and add "you can test this by passing max_int and min_int in your tests for all int which occur"

@yomimono
Copy link
Contributor

I'd like to revisit this, with the goal of making it into a concrete list of things to achieve by interested newcomers, applicable to at least three repositories, by the start of the next Mirage hack retreat on 28 November.

@hannesm
Copy link
Member

hannesm commented Oct 25, 2017

@yomimono sounds great, given the release of ocamlformat, I'd suggest to jump on the train and reindent all the libraries, and deploy pre-commit hooks to never accept any code which ocamlformat disagrees with.

@cfcs
Copy link

cfcs commented Oct 25, 2017

@hannesm can we please have some fair warning before anything like that happens - reformatting everything and breaking everyone's local patches sounds to be me like a pretty high cost to pay to get normalized whitespace.

@avsm
Copy link
Member

avsm commented Oct 25, 2017

I agree with @cfcs -- can we get the current guidelines in place without creating new ones? I've only just started using ocamlformat and it's a young tool with some way to go before it can be imposed as a standard. For example, object syntax isnt supported, and there are still rewrite issues with portions of the AST. The tool is definitely heading in the right direction, but will take some time to mature.

@hannesm
Copy link
Member

hannesm commented Jan 4, 2018

ok, so how do we move forward here? I'd appreciate to have such a document in place (and agree that we should as well have some example libraries which are "high quality"). my experience with ocamlformat didn't lead to immediate adaption (main reason: ocp-indent and ocamlformat disagree (with default configuration) on indentation -- I use ocp-indent while developing, and am willing to use ocamlformat as pre-commit hook, but if they disagree on indentation, that's too much hassle for me)).

there are still some comments (from several people). can someone with a fresh pair of eyes review/revise this PR and we get settled to a first public draft!?

@samoht
Copy link
Member Author

samoht commented Jan 4, 2018

I will be very happy to see this moving forward but I have unfortunately a growing list of TODOs that I want to complete first. If anyone else is interested, please feel free to push directly to my branch :-)

@hannesm
Copy link
Member

hannesm commented Oct 27, 2018

EDIT: sorry for the force-push

I spend some time to revise this PR, and first wrote a paragraph that not every OCaml library works on MirageOS. I introduced tags (ready (works with MirageOS), integrated (uses coding style, packaging, documentation guidelines), automated (uses travis CI), approved (has owners, fuzz testing, etc.), verified) instead of levels (NB: approved has too many subitems).

I think most of our core libraries are in automated state, we should get badges there and strive for approved within the MirageOS4 release cycle!?


Style best practises
- Have proper indentation (using ocp-indent + checked-in ocp-indent file).
- Use at most 80 columns (`ocp-indent` unfortunately doesn't check this).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be add a note about ocamlformat here.


The library uses continuous integration systems to catch issues early:
- The [ocaml-ci-scripts](https://github.com/ocaml/ocaml-ci-scripts) contain
Travis and AppVeyor integration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be add a note about ocaml-ci.

@samoht
Copy link
Member Author

samoht commented Feb 4, 2021

I'd like to revisit that PR at one point, as @TheLortex has started to work on mirage-ci.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants