Allowing Software Preview Submissions #122

Open
arjunsuresh opened this issue Aug 25, 2022 · 22 comments

Comments

@arjunsuresh
Contributor

The current MLCommons rules do not allow a Preview submission solely because a software component used is in the "preview" (not yet available) stage. This restriction is not desirable because:

  • The Available category requires the software to be available as of the submission date. But there can be late modifications to the software that there is not enough time to release, even as a beta.

One option currently is to submit such a system under the RDI category. But the rules are somewhat ambiguous about software RDI components. So my proposal is to restrict RDI components to hardware components only and to allow "preview software" in the Preview category.

@DilipSequeira
Contributor

Preview is narrowly tailored for a reason.

Preview submissions are not subject to review or reproducibility by customers, and the lack of these things is not good for a benchmark suite. The sanction of striking results next time if not replicated as Available is also a limited disincentive, since last round is "old news". So there needs to be commensurate benefit to allowing such submissions. For new hardware, it comes from the intense market and press interest which often accompanies new devices.

Hardware and software are also different in important ways:

Hardware cycles are relatively inelastic, and somewhat unpredictable towards the end of development if gating bugs are found and need to be worked around. In comparison, there's a well-understood iterative model for software lifecycles where a cutoff date is set, and features that don't make the date are deferred to next release. If a single feature is important, and of sufficient maturity to provide to customers in an early state, submitters can reasonably plan for a release of that feature, at least at beta quality. "Preview" is only useful if working software comes so late in the MLPerf cycle that you can't do a QA cycle on it and release as beta.

Further, the iterative nature of software cycles means that you usually have one to ship, and one in development - and with adequate machine resources, it's possible to enter with both. The development one is always faster, of course, so if you think your competitors might submit theirs, you might also feel you have to submit yours. That would just increase everyone's work to nobody's benefit. That's not the case with hardware: it's typically either very close to ready, or unusable.

It would be helpful for submitters in favor of Preview software to explain why the marginal benefit over submitting a beta outweighs those disadvantages.

@arjunsuresh
Contributor Author

Thank you @DilipSequeira for your detailed reply. I completely understand the reasons for narrowing down preview submissions, and they are very reasonable. But in my experience, most new hardware (except from well-established companies) needs last-minute software changes, and those changes are critical for their submissions. Also, I'm not very clear on the beta-release requirement. In particular, I have the following two questions:

  1. If my software is in a public repo, can any commit hash be considered a beta release? (Suppose the release happens after the submission deadline but the commit is made before it.)
  2. If my software is closed and accessible only to customers, does sharing the software binary with one or more customers constitute availability of the software?

Based on your previous comment, I suppose your answer would be no to both questions. While hardware release cycles apply uniformly to almost all submitters working on new hardware, software release cycles are particularly critical for those working on newer hardware that may not yet have a fully stable software stack. For example, while an established vendor might lose 3-5% performance by using released software instead of the latest build, a new vendor might lose 10-50%, or even the chance to submit if the accuracy threshold is not met.

@DilipSequeira
Contributor

DilipSequeira commented Aug 30, 2022

@arjunsuresh if you have preview hardware, you do not have to meet the Availability requirements for software necessary to use that component. Specifically, the availability requirement is waived for "newly developed" hardware components, where the definition of "newly developed" is in the rules.

For your other questions, see the "Available" column here.

My interpretation of the rules is:

  • If your software source is in a public repo, it's available so long as there's a commit hash. No release process is required.
  • If it's a closed binary only available to paying customers, it's required to be a named, versioned release, and it must be available to any qualified customer of the hardware. "Beta" has some wiggle-room, but it would be up to the submitter to justify if necessary that it's "made available to customers as a clear part of the release sequence" - i.e. that the component historically has pre-production releases via some release process, and this follows that process.
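As a rough illustration only, the two-branch interpretation above can be written as a decision function. This encodes one contributor's reading in this thread, not the official MLCommons rules, and the `SoftwareComponent` type and all of its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SoftwareComponent:
    # Hypothetical fields mirroring the two cases discussed in the comment above.
    public_source_repo: bool                   # source lives in a public repository
    has_commit_hash: bool                      # a specific commit is identified
    named_versioned_release: bool              # closed binary has a name and version number
    available_to_any_qualified_customer: bool  # not just a hand-picked customer
    part_of_release_sequence: bool             # e.g. a beta via the component's usual release process


def is_available(sw: SoftwareComponent) -> bool:
    """Sketch of one commenter's reading of the Available rules for software."""
    if sw.public_source_repo:
        # Public source: a commit hash is enough; no release process is required.
        return sw.has_commit_hash
    # Closed binary: must be a named, versioned release, offered to any
    # qualified customer of the hardware, and part of the release sequence.
    return (
        sw.named_versioned_release
        and sw.available_to_any_qualified_customer
        and sw.part_of_release_sequence
    )
```

Under this reading, a versioned closed binary shared with only one customer on request returns `False`, because it is not offered to any qualified customer, while any identified commit in a public repo returns `True`.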

@arjunsuresh
Contributor Author

Thank you @DilipSequeira for the links and interpretations.

Actually, I'm not talking about preview hardware but about hardware on which not all the MLPerf inference models have been tested. My understanding is that only Nvidia and Intel have submitted inference results for all the inference models. So any other submitter might run into software issues on already available hardware that could prevent a submission.

"If your software source is in a public repo, it's available so long as there's a commit hash. No release process is required."

This makes life easier for those working on public repos (we are one). The rules also allow some additional PRs, so I see no issue here for submitters working on public repos.

A "beta release" of a closed binary is where clarity is needed. My interpretation, or expectation, of a beta release is that the binary is properly tagged with a version number and made available to customers. The binary should be built before the submission deadline, and any customer who asks for it after the deadline should have access to it.

Mainly, I think "asking for a proper release cycle" is unnecessary here. But since this is not something we go through ourselves, I would like to hear from other submitters. Qualcomm might be best placed to answer, as their software stack for Cloud AI 100 is not publicly available as of today.

@DilipSequeira
Contributor

Model freeze, and model acceptance for a given round, are controlled by the working group, so if submitters feel there is insufficient time to prepare for a model, it will get pushed to the next round (and this has happened multiple times.) Once submitters have agreed to accept the model for a given round, it's a commitment to implement at beta quality or better if you want to submit on existing hardware - and this reflects a reality of the market: if hardware available in the market is intended for use in a particular class of workloads, customers should be able to port such models to it with reasonable effort.

Of course, inability to produce software on a committed schedule will affect your ability to show the potential of your hardware, but that's not unique to MLPerf.

@arjunsuresh
Contributor Author

Actually, my request is not restricted to new models; it also covers existing models that have not yet been tested on a submitter's hardware (usually not applicable to Nvidia). I understand the reasons for requiring at least "beta" quality, but my concern is the delay between building a software binary and releasing it, which in many cases takes weeks. Consider the following scenario:

As an OEM, I'm building a system using an accelerator from company X. A few days before the deadline, company X provides us a software binary (say, from a nightly build), and this enables us to collect results for an MLPerf model. Under the current rules, this result cannot go in the Available category unless the binary is classified as a "beta release", and the time that takes varies with company X's internal policies, from a day to many weeks. Moreover, I'm not sure this restriction brings a significant "quality difference" to a submission anyway, which is why I'm proposing to remove the "release cycle" requirement on software for the Available category.

@DilipSequeira
Contributor

The "beta" rule predates my involvement in MLPerf, so I don't know the original rationale. However, I would say this is not about the quality of submissions, but their credibility. There should be some threshold to prevent a submitter taking research software that might never be productized, labeling it a "beta", and using that to publish comparisons against other submitters. The "clear part of a release sequence" language means you're making a public commitment to your customers that these optimizations will be in production in a reasonably short time frame, and it's clearly enough decidable that it could be checked in audit.

@arjunsuresh
Contributor Author

I completely agree on the credibility part. But do you agree on the following points?

  1. Say I have nightly staging builds. Can any of these builds qualify as a beta release if it is made available to my customers? These builds are clearly part of a release cycle. As a guarantee, we could even require that the build have a production release within 4-6 weeks of the submission deadline.
  2. Suppose I have research software that is substantially different from my production software, say a compiler based on GCC whereas my current release software is based on LLVM. In that case, I can submit results based on the research software under the RDI category, and this in no way affects my Available submissions in future rounds using the production software.

@DilipSequeira
Contributor

  1. if your CI/CD is sufficiently mature that you post nightlies and any qualified customer can pick them up to use in their own development, then my interpretation would be that all of those nightlies are part of your release sequence, you could deem any one of them to be a beta, and you're covered. While I'm not aware of any submitters with closed source products at that level of software maturity, it's possible.
  2. my understanding is that the intent of RDI is for cutting edge research which is a long way from productization. Software is rarely that far from productization. And your example (LLVM vs gcc) doesn't seem remotely close to approaching the intent of RDI. Others may have a different understanding of the category.

@arjunsuresh
Contributor Author

"if your CI/CD is sufficiently mature that you post nightlies and any qualified customer can pick them up to use in their own development, then my interpretation would be that all of those nightlies are part of your release sequence, you could deem any one of them to be a beta, and you're covered. While I'm not aware of any submitters with closed source products at that level of software maturity, it's possible."

Here, the bold part is not necessarily true. For example, nightly builds may be made available to customers only on request or when a special need arises. I guess those selected nightly builds are still eligible to be part of the release sequence and hence can be considered a "beta release", even if they are not posted as an official beta release on the submitter's website as of the submission deadline.

"my understanding is that the intent of RDI is for cutting edge research which is a long way from productization. Software is rarely that far from productization. And your example (LLVM vs gcc) doesn't seem remotely close to approaching the intent of RDI. Others may have a different understanding of the category."

I did not mean just substituting gcc with llvm. For example, say I have a mature software stack based on one compiler framework that supports a good number of models in production use. Now we are researching a new compiler framework that, as of now, supports only some special cases, or say only one inference model. I can't call this software "available", because there are no previous releases nor any estimate of a future release. In my experience, releasing such software can easily take months if not years, and I suppose it is fair to put it under the RDI category.

@DilipSequeira
Contributor

I think we're really getting into hypotheticals on the betas. My understanding, based on past discussions, is that the threshold for Availability is high confidence that this is not research software and that its presence in your customers' hands is imminent, and the test of that is that it's a "real" beta release.

For RDI... if you're more than 221 days away from production, then you have no problem. If you're going to be (say) 90 days from production by the MLPerf deadline, then you probably want to start planning for an early beta as soon as the submission date is known.

@arjunsuresh
Contributor Author

Thank you @DilipSequeira for the clarification on RDI. I think that part is clear now.

Regarding "beta release", the only point of contention is the requirement for the software binary to be "released" as of the submission date. Of course, this is only relevant for companies where there is a significant delay between producing a binary and a possible beta release. If we relax the Available requirements so that the software binary must be available on the submission date and its beta release must be available within 4 to 6 weeks of the submission deadline (in time for audit), I suppose this problem goes away and we can agree on "no Software Preview submissions".

@DilipSequeira
Contributor

DilipSequeira commented Aug 31, 2022

@arjunsuresh It's important where possible to be able to establish the legality of a submission on submission day, without an obligation to do something later (with the exception of "preview" which implies something is coming later.)

The implication of what you're suggesting is that SW submitters should plan to optimize right up to the deadline, submit, and then work on conformance. But for most submitters, it should work equally well to optimize until 6 weeks before the deadline, start making beta release candidates, roll in critical MLPerf-oriented optimizations to key operators in the last week or two with soak testing of those specific changes, and push the beta out for the submission date.

The WG should only accept models into a round when a critical mass of submitters are confident that they can produce conformant implementations on submission day. The freeze date is set to support that; if we think it takes longer to produce conformant submissions than the current date allows, we should move it earlier.

@arjunsuresh
Contributor Author

@DilipSequeira I understand your point about maintaining a beta release candidate specifically for MLPerf, and that looks okay to me. But I'm not sure all submitters can actually turn a beta candidate into a beta release in a couple of days or within a week (mainly due to legal procedures). I might be wrong here; let's see what other submitters have to say.

@arjunsuresh
Contributor Author

Summarizing our requirements:

  1. The concern is applicable to existing models and not necessarily new ones.
  2. When we test a model on a new SUT we might face accuracy/performance issues and this can necessitate last minute code changes.
  3. We would like to incorporate all those changes if they are done and all experiments completed before the submission deadline.
  4. This means software changes made as late as 2-3 days before the submission deadline should be allowed in the Available/Preview category.
  5. If an accelerator company provides an SDK to another company before the submission deadline, and this qualifies as a "beta release", then all the above concerns are void.

@DilipSequeira
Contributor

@arjunsuresh "Our" meaning OctoML, right?

My understanding is that you guys submit OSS, so none of the above discussion is relevant to you, as anything from an OSS repo is always classed as "Available".

@arjunsuresh
Contributor Author

Yes, Dilip. We also plan to test performance on different hardware platforms, and there we rely on their respective software stacks.

@DilipSequeira
Contributor

What's the problem with asking those manufacturers to provide release (or at least beta) software if they're submitting Available hardware, given that the schedules are known six months in advance?

It seems the only reason is that you want to incorporate all the results of experiments right up to the deadline, rather than having an internal deadline that your partners can target. But the cost of that is allowing more unauditable unreproducible submissions.

Beyond the problems I pointed out above, suppose we allowed this:

  • What prevents the submission of experimental software that isn't fit for customer use?
  • What's the required relationship between the software you submit in Preview and the software you turn up with in the next round to demonstrate the performance in Available? For hardware, it's fairly clear that it's the same physical device, but for software, can it be arbitrarily different software on the same hardware platform?

@arjunsuresh
Contributor Author

Yes, Dilip. The point is to make use of all experiment results as much as possible. As we discussed in the Inference meeting, we could use an rdi-software category for such results (which would be exempt from audit but wouldn't carry the restrictions of the rdi-hardware category). Another option would be to allow an available-hardware category, meaning only the hardware is available (the software is premature), with such systems being part of audit. Either of these would work for us.

DilipSequeira added a commit to DilipSequeira/policies that referenced this issue Sep 30, 2022
This also adds a definition of a Reproducible software component so that RDI does not become a venue for submissions that are not reproducible and can never be reproducible.
@arjunsuresh
Contributor Author

Hi @DilipSequeira, can you please open a PR for your change? This issue came up for Tiny submissions too: under the current rules, it is problematic when the hardware is available but the software is not.

@DilipSequeira
Contributor

#126

@arjunsuresh
Contributor Author

Thank you @DilipSequeira. Sorry, I had missed it earlier.


2 participants