Utilization derived metric #101

johntran-nv · 2022-02-14T22:14:03Z

No description provided.

github-actions · 2022-02-14T22:14:18Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

johntran-nv · 2022-02-14T22:17:33Z

@petermattson, what do you think of this?

tjablin · 2022-02-14T23:48:20Z

peak_system_tensor_flops_per_second means the peak tensor operations of the hardware, counting only tensor math throughput and not additional vector or pointwise math datapaths.

I think peak_system_tensor_flops_per_second is well-defined for architectures like NVIDIA GPUs and Google TPUs, but is not well-defined for CPUs, DSPs, or FPGAs. Furthermore, some architectures may allow tensor and vector math operations to overlap, making it possible to achieve greater than 100% utilization.
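As an illustration of the overlap concern, here is a minimal sketch. It assumes utilization is computed as `model_tensor_flops / (run_time_seconds * peak_system_tensor_flops_per_second)`; this thread does not spell out the formula, so treat that as an assumption. The name `run_time_seconds` and every number below are hypothetical.

```python
def utilization(model_tensor_flops: float,
                run_time_seconds: float,
                peak_system_tensor_flops_per_second: float) -> float:
    """Assumed definition: achieved tensor FLOP/s divided by peak tensor FLOP/s."""
    if run_time_seconds <= 0 or peak_system_tensor_flops_per_second <= 0:
        raise ValueError("run time and peak throughput must be positive")
    achieved_flops_per_second = model_tensor_flops / run_time_seconds
    return achieved_flops_per_second / peak_system_tensor_flops_per_second


# Hypothetical hardware: the peak counts only the tensor datapath.
PEAK_TENSOR = 100e12  # FLOP/s, tensor units
PEAK_VECTOR = 10e12   # FLOP/s, vector units, excluded from the peak

MODEL_FLOPS = 1e18    # hypothetical model_tensor_flops for one run

# Implementation A keeps the tensor units busy half of the time.
time_a = MODEL_FLOPS / (0.5 * PEAK_TENSOR)
util_a = utilization(MODEL_FLOPS, time_a, PEAK_TENSOR)

# Implementation B runs part of the counted work on the vector units,
# overlapped with the tensor units, so both datapaths are saturated.
time_b = MODEL_FLOPS / (PEAK_TENSOR + PEAK_VECTOR)
util_b = utilization(MODEL_FLOPS, time_b, PEAK_TENSOR)

print(f"A: {util_a:.2f}  B: {util_b:.2f}")
```

Under these assumptions, implementation B reports a utilization above 1.0 because the denominator ignores a datapath that contributed to the numerator.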

Comparing utilization for different software implementations on fixed hardware seems useful, but utilization comparisons between different hardware seem meaningless. I guess the intention is to promote utilization as a conversion factor between FLOPS and actual performance, but I think we should try to promote comparisons based on performance directly and not try to fix FLOPS.

Given the experience in MLPerf Inference with NIC to accelerator bandwidth, I would prefer not to involve numbers in MLPerf that cannot be measured directly. I don't want to adjudicate complaints that someone is not calculating the peak tensor operations of their hardware correctly, and I also don't want to get into the business of measuring FLOPS.

The model_tensor_flops term seems like a pain to compute, with many possible edge cases and room for disagreement. If this proposal is accepted, I would prefer that MLCommons provide an official model_tensor_flops for each model rather than allowing submitters to calculate their own.

The definition of model_tensor_flops implies that there is a single unambiguous number of operations required by a given model, but it's unclear how to count required operations when an implementation may choose a sub-cubic dot implementation. How are model_tensor_flops counted for sparse operations?
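A rough sketch of why the count is ambiguous: the same matrix product can be assigned very different operation counts depending on the convention. The function names are hypothetical, Strassen's algorithm stands in for "a sub-cubic dot implementation", and the sparse count assumes zeros in the left operand are skipped entirely. None of this is taken from the proposal text.

```python
def dense_matmul_flops(m: int, k: int, n: int) -> int:
    """Conventional count: one multiply and one add per (m, k, n) triple."""
    return 2 * m * k * n


def strassen_multiplications(n: int) -> int:
    """Scalar multiplications in a full Strassen recursion on n x n matrices.

    Assumes n is a power of two and the recursion runs down to 1 x 1 blocks.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    levels = n.bit_length() - 1  # log2(n)
    return 7 ** levels


def sparse_matmul_flops(nnz_a: int, n: int) -> int:
    """Count for sparse A (nnz_a nonzeros) times dense B with n columns.

    Assumes only nonzero entries of A contribute a multiply and an add.
    """
    return 2 * nnz_a * n


size = 1024
dense = dense_matmul_flops(size, size, size)
cubic_mults = size ** 3
strassen_mults = strassen_multiplications(size)
sparse = sparse_matmul_flops((size * size) // 10, size)  # about 90% zeros

print(f"dense FLOPs:              {dense}")
print(f"cubic multiplications:    {cubic_mults}")
print(f"Strassen multiplications: {strassen_mults}")
print(f"sparse FLOPs (10% dense): {sparse}")
```

Whichever number is chosen as the "required" work changes the reported utilization directly, which is one argument for a single official per-model figure.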

petermattson · 2022-02-21T14:03:41Z

Should we consider making this a "recommended methodology" or something rather than binding it tightly into results guidelines? Then we can resolve some of these issues through discretion in application, but the methodology is citable for consistency when appropriate.

Utilization derived metric

e9b5171

johntran-nv added 2 commits February 14, 2022 16:07

Update RESULTS_GUIDELINES.md

72064a4

Update RESULTS_GUIDELINES.md

b5bd3a3

nv-rborkar mentioned this pull request Apr 7, 2022

Recommended methodology for calculating Utilization mlcommons/training_policies#486

Merged
