[Action] Practice to measure power consumption for a project which CI/CD. #397

Open
1 of 5 tasks
SamYuan1990 opened this issue Apr 25, 2024 · 0 comments

Comments

@SamYuan1990
Contributor

Description

Power consumption becomes a problem when we run LLMs in data centers and on k8s.
According to the cloud native AI white paper, differences between technology stacks make the case more complex:
for example, different GPU devices, different deployment architectures, TEEs from a security point of view, etc.
Recently the kepler community completed a POC that sets up tekton on a clean BM (bare-metal) instance on AWS, and there are other ongoing discussions about validating kepler with a pipeline.
An interesting question is how kepler validates itself between the current testing version and the latest stable version.
If that works, then with a stable version of kepler and a pipeline such as github action, tekton, etc., we can build a pattern for measuring the power consumption of any project via its CI/CD pipeline.

Outcome

  • Concept level: A pattern for any project on k8s to measure power consumption with a group of cloud native tools such as tekton, kind, and kepler.

  • Implementation level: The pattern should be flexible and pluggable enough to cover different cases, with a sample code repo to share and reuse as a github action or through other approaches. Cases to cover:

      • self-hosted github runner.

      • different arch.

      • different OS.

      • BM/VM.

      • etc.

  • Delivery level: A blog and events to share this pattern.

  • Ownership level: Should the kepler community share it with the TAG as common/generic infra?
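
As a rough illustration of the "pluggable" idea at the implementation level, the cases could be expanded into a test matrix that a pipeline iterates over. This is only a sketch: the axis names and values below are hypothetical placeholders, not something the kepler community has agreed on.

```python
from itertools import product

# Hypothetical axes derived from the cases listed above.
# The concrete values are illustrative assumptions only.
AXES = {
    "arch": ["x86_64", "arm64"],
    "os": ["ubuntu", "rhel"],
    "host": ["BM", "VM"],
}


def build_matrix(axes):
    """Return one dict per combination of axis values."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]


matrix = build_matrix(AXES)
for case in matrix:
    # Each case would map to one pipeline run on a self-hosted runner.
    print(case)
```

Adding a new case (for example a GPU type) would then only mean adding one more key to `AXES`.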

To-Do

  • the kepler community completes validation of kepler itself this year.
  • refine the kepler model server tekton logic to decouple the workload phase from model server training, so it can be reused as a workload.
  • based on the workload, build a pipeline to validate kepler between versions.
  • find another project and replace the workload and validation parts, as an example.
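
One possible shape for the "validate kepler between versions" step, as a minimal sketch: run the same workload under the stable and the testing version, take the energy each one reports, and check that the relative deviation stays within a tolerance. The function names and the 5% default tolerance are assumptions for illustration, not an agreed acceptance criterion.

```python
def relative_deviation(stable_joules, candidate_joules):
    """Relative difference of the candidate reading against the stable reading."""
    if stable_joules <= 0:
        raise ValueError("stable reading must be positive")
    return abs(candidate_joules - stable_joules) / stable_joules


def validate(stable_joules, candidate_joules, tolerance=0.05):
    """True if the candidate version agrees with the stable one within tolerance.

    The 5% default is a hypothetical threshold chosen for this example.
    """
    return relative_deviation(stable_joules, candidate_joules) <= tolerance


# Example: energy reported for the same workload by each version (made-up numbers).
print(validate(1000.0, 1030.0))
print(validate(1000.0, 1100.0))
```

A pipeline step could fail the build when `validate` returns `False`.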

Note: as kepler's model splits power into idle and dynamic parts, a workload is probably needed for the target project, to capture the changes in idle and dynamic power?
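
To make the idle/dynamic point concrete, here is a small sketch of how a pipeline might separate the two from raw samples: measure a baseline while the system is idle, then subtract it from the readings taken while the workload runs. This is an assumed simplification for illustration and not kepler's actual model; the sample values are made up.

```python
from statistics import mean


def split_power(idle_samples_w, load_samples_w):
    """Return (idle_w, dynamic_w) from power samples in watts.

    idle_w is the mean of the idle-window samples; dynamic_w is the extra
    power observed under load, clamped at zero.
    """
    idle_w = mean(idle_samples_w)
    dynamic_w = max(mean(load_samples_w) - idle_w, 0.0)
    return idle_w, dynamic_w


# Made-up samples: first with no workload running, then with the workload running.
idle_w, dynamic_w = split_power([50.0, 52.0, 51.0], [80.0, 82.0, 81.0])
print(idle_w, dynamic_w)
```

Without a workload phase there would be no load samples, so only the idle part could be observed.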

cc: @rootfs, @sunya-ch, @marceloamaral, please help correct any mistakes, or we can correct them later on.

Code of Conduct

  • I agree to follow this project's Code of Conduct

Comments

It may take years to complete, so maybe we can break the tasks down and work on them in parallel.
Some previous discussion is in sustainable-computing-io/kepler-model-server#212.
An example of setting up tekton on a newly created ec2 instance: https://github.com/sustainable-computing-io/aws_ec2_self_hosted_runner/blob/main/.github/workflows/ci_integration.yml#L35-L73

@leonardpahlke leonardpahlke changed the title [<Action>] Practice to measure power consumption for a project which CI/CD. [Action] Practice to measure power consumption for a project which CI/CD. Apr 26, 2024