
[dy] Record memory and cpu usage for pipeline runs #4761

Open · wants to merge 2 commits into master
Conversation

@dy46 dy46 commented Mar 14, 2024

Description

This PR adds memory and CPU tracking for pipeline runs in the backend. A follow-up PR will surface the metrics in the frontend.

Calculating memory is relatively straightforward: we can get the memory usage of all processes belonging to a pipeline run, using psutil.Process.memory_percent() for each process. CPU can be measured in a similar way, but getting the CPU usage percentage per process is more complicated. The comments in the code explain the implementation in more depth, but in short we need to sample CPU usage at various points in time and compare the numbers to compute a usage percentage.

Right now, we measure memory and CPU usage every time a heartbeat runs for the pipeline run, so roughly every 10 seconds. The usage is stored in the PipelineRun.metrics field. The pipeline run's CPU and memory usage are also included in the tags for the heartbeat log.
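The sampling approach described above can be sketched as follows. This is a minimal illustration of the arithmetic only: the function name is hypothetical, and it samples the current process's own CPU clock, whereas the PR samples every process of the run via psutil.

```python
import time

def cpu_percent_between_samples(
    cpu_start: float, cpu_end: float, wall_start: float, wall_end: float
) -> float:
    """CPU usage percent over an interval, from two (cpu_time, wall_time) samples.

    100.0 means one full core was busy for the entire interval.
    """
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        return 0.0
    return 100.0 * (cpu_end - cpu_start) / elapsed

# Illustration: sample the current process at two points in time.
wall0, cpu0 = time.monotonic(), time.process_time()
sum(i * i for i in range(200_000))  # busy work between the two samples
wall1, cpu1 = time.monotonic(), time.process_time()
usage = cpu_percent_between_samples(cpu0, cpu1, wall0, wall1)
```

In the PR's scheme, a sample would be taken at each heartbeat (roughly every 10 seconds) and compared against the previous one.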

Example:
[Screenshots from 2024-03-14 showing the recorded usage metrics]

How Has This Been Tested?

  • Tested locally with various pipelines

Checklist

  • The PR is tagged with proper labels (bug, enhancement, feature, documentation)
  • I have performed a self-review of my own code
  • I have added unit tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • If new documentation has been added, relative paths have been added to the appropriate section of docs/mint.json

cc: @wangxiaoyou1993

self.logger.info(
    f'Pipeline {self.pipeline.uuid} for run {self.pipeline_run.id} '
    f'in schedule {self.pipeline_run.pipeline_schedule_id} is alive.',
    **tags,
)

if memory_usage and memory_usage >= MEMORY_USAGE_MAXIMUM:
    self.memory_usage_failure(tags=tags)

def __measure_and_record_usage(self) -> Tuple[float, float]:
Review comment: This will work with the local_python executor but not with other executors.

        self.pipeline.type in [PipelineType.INTEGRATION, PipelineType.STREAMING]
        or self.pipeline.run_pipeline_in_one_process
    ):
        if job_manager.has_pipeline_run_job(self.pipeline_run.id):
Review comment: This will not work when you have multiple schedulers.
