Core affinity issue #175

Open
jpetucci opened this issue May 24, 2023 · 1 comment

@jpetucci

jpetucci commented May 24, 2023

Description:
Running inference with the --enable_workflow (Ray Workflow) option causes all processes to be pinned to a single core.

Steps to Reproduce:
Follow the conda or container installation instructions and run inference with the --enable_workflow option

Expected Behavior:
The workload should be spread across the resources available in the Ray cluster, i.e. processes should run on different cores.

Actual Behavior:
All processes and Ray workers have the same cpu/core affinity:

In top, note that the P (last used CPU) column is 0 for the FastFold processes:

top - 11:12:45 up 11 days, 17:25,  2 users,  load average: 20.82, 17.44, 8.73
Tasks: 734 total,   5 running, 729 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.8 sy,  1.3 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 385438.1 total,   1440.9 free,  18176.0 used, 365821.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 359381.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+  P COMMAND                                                                                                                                                                                                                                                                    
2125171 jmp579    35  15 1782.8g  18.3g   8.8g R  59.6   4.9   2:42.05  0 hhblits                                                                                                                                                                                                                                                                    
2125248 jmp579    35  15 1062268 137828   2996 R  12.3   0.0   1:09.75  0 jackhmmer                                                                                                                                                                                                                                                                  
2125246 jmp579    35  15 1007988  89680   2872 R  11.9   0.0   1:08.86  0 jackhmmer                                                                                                                                                                                                                                                                  
2125247 jmp579    35  15 1025432 105864   2884 R  10.3   0.0   0:58.70  0 jackhmmer                                                                                                                                                                                                                                                                  
2123624 jmp579    20   0  225.7g  86096  13028 S   1.0   0.0   0:02.73  0 raylet                                                                                                                                                                                                                                                                     
2123631 jmp579    20   0  804288 105296  49400 S   1.0   0.0   0:04.43  0 python                                                                                                                                                                                                                                                                     
2123420 jmp579    20   0  129.4g 437152 210308 S   0.7   0.1   0:12.96  4 python                                                                                                                                                                                                                                                                     
2123574 jmp579    20   0 1073304  32960  11948 S   0.7   0.0   0:01.69  0 gcs_server                                                                                                                                                                                                                                                                 
2123706 jmp579    35  15  113.1g 106740  54952 S   0.3   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123718 jmp579    35  15  113.1g 106980  55192 S   0.3   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123722 jmp579    35  15  113.1g 107372  55492 S   0.3   0.0   0:00.75  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123730 jmp579    35  15  113.1g 107144  55368 S   0.3   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123797 jmp579    35  15  114.8g 116336  61196 S   0.3   0.0   0:00.81  0 ray::_workflow_                                                                                                                                                                                                                                                            
2124995 jmp579    35  15  114.9g 115916  60600 S   0.3   0.0   0:00.72  0 ray::WorkflowMa                                                                                                                                                                                                                                                            
2126067 jmp579    20   0  276204   6036   4428 R   0.3   0.0   0:00.15 35 top                                                                                                                                                                                                                                                                        
2123086 jmp579    20   0  170892   6664   4888 S   0.0   0.0   0:00.01 19 sshd                                                                                                                                                                                                                                                                       
2123087 jmp579    20   0  235076   5316   3620 S   0.0   0.0   0:00.07 27 bash                                                                                                                                                                                                                                                                       
2123310 jmp579    20   0  170892   6664   4884 S   0.0   0.0   0:00.07 28 sshd                                                                                                                                                                                                                                                                       
2123311 jmp579    20   0  234772   4924   3516 S   0.0   0.0   0:00.06 31 bash                                                                                                                                                                                                                                                                       
2123407 jmp579    20   0  220704   3420   3084 S   0.0   0.0   0:00.00 31 bash                                                                                                                                                                                                                                                                       
2123592 jmp579    20   0  952648 103220  49492 S   0.0   0.0   0:00.57  0 python                                                                                                                                                                                                                                                                     
2123599 jmp579    20   0  957180 107896  49792 S   0.0   0.0   0:00.60  0 python                                                                                                                                                                                                                                                                     
2123682 jmp579    20   0  883632 110344  49640 S   0.0   0.0   0:01.02  0 python                                                                                                                                                                                                                                                                     
2123705 jmp579    35  15  113.1g 106984  55200 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123707 jmp579    35  15  113.1g 107012  55220 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123708 jmp579    35  15  113.1g 106724  54956 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123709 jmp579    35  15  113.1g 108988  55236 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123710 jmp579    35  15  113.1g 109304  55468 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123711 jmp579    35  15  113.1g 107060  55268 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123712 jmp579    35  15  113.1g 107492  55716 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123713 jmp579    35  15  113.1g 106824  55048 S   0.0   0.0   0:00.72  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123714 jmp579    35  15  113.1g 107204  55416 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123716 jmp579    35  15  113.1g 107036  55248 S   0.0   0.0   0:00.73  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123717 jmp579    35  15  113.1g 108752  54932 S   0.0   0.0   0:00.72  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123719 jmp579    35  15  113.1g 107388  55592 S   0.0   0.0   0:00.74  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123720 jmp579    35  15  113.1g 106728  54956 S   0.0   0.0   0:00.75  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123721 jmp579    35  15  113.1g 107036  55248 S   0.0   0.0   0:00.75  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123723 jmp579    35  15  113.1g 107356  55560 S   0.0   0.0   0:00.75  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123725 jmp579    35  15  113.1g 106744  54964 S   0.0   0.0   0:00.75  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123726 jmp579    35  15  113.1g 106888  55112 S   0.0   0.0   0:00.74  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123727 jmp579    35  15  113.1g 107004  55212 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123728 jmp579    35  15  113.1g 107044  55252 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123729 jmp579    35  15  113.1g 107124  55348 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123731 jmp579    35  15  113.1g 106944  55164 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123732 jmp579    35  15  113.1g 106784  55008 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123733 jmp579    35  15  113.1g 106996  55204 S   0.0   0.0   0:00.77  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123739 jmp579    35  15  113.1g 106872  55100 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123740 jmp579    35  15  113.1g 107312  55536 S   0.0   0.0   0:00.77  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123742 jmp579    35  15  113.1g 107216  55432 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123782 jmp579    35  15  113.1g 106884  55104 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123783 jmp579    35  15  113.1g 107128  55348 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123784 jmp579    35  15  113.1g 107040  55240 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123785 jmp579    35  15  113.1g 107100  55304 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123786 jmp579    35  15  113.1g 107008  55216 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123787 jmp579    35  15  113.1g 108688  54956 S   0.0   0.0   0:00.77  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123788 jmp579    35  15  113.1g 107040  55244 S   0.0   0.0   0:00.76  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123789 jmp579    35  15  113.1g 107216  55436 S   0.0   0.0   0:00.77  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123790 jmp579    35  15  113.1g 106488  54724 S   0.0   0.0   0:00.77  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123791 jmp579    35  15  113.1g 107052  55272 S   0.0   0.0   0:00.77  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123792 jmp579    35  15  113.1g 106908  55132 S   0.0   0.0   0:00.79  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123793 jmp579    35  15  114.8g 116588  61416 S   0.0   0.0   0:00.82  0 ray::_workflow_                                                                                                                                                                                                                                                            
2123794 jmp579    35  15  113.1g 106996  55216 S   0.0   0.0   0:00.79  0 ray::IDLE                                                                                                                                                                                                                                                                  
2123795 jmp579    35  15  114.8g 116376  61224 S   0.0   0.0   0:00.80  0 ray::_workflow_                                                                                                                                                                                                                                                            
2123796 jmp579    35  15  114.8g 115044  61140 S   0.0   0.0   0:00.79  0 ray::_workflow_                                                                                                                                                                                                                                                            
2125095 jmp579    35  15  113.3g 106756  55196 S   0.0   0.0   0:00.65  0 ray::Manager 

This is also confirmed with taskset (output truncated):

$ for pid in $(ps -ef | grep ray | awk '{print $2}'); do taskset -cp $pid; done
pid 2123574's current affinity list: 0
pid 2123592's current affinity list: 0
pid 2123599's current affinity list: 0
pid 2123624's current affinity list: 0
pid 2123631's current affinity list: 0
pid 2123682's current affinity list: 0
pid 2123705's current affinity list: 0
pid 2123706's current affinity list: 0
pid 2123707's current affinity list: 0
pid 2123708's current affinity list: 0
pid 2123709's current affinity list: 0
pid 2123710's current affinity list: 0
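
The same check can be scripted without taskset. The following is a minimal sketch, assuming Linux and Python 3; `allowed_cpus` and `find_pinned` are hypothetical helper names, not part of FastFold or Ray. It uses `os.sched_getaffinity` to report which of the given PIDs are restricted to exactly one core.

```python
import os


def allowed_cpus(pid=0):
    """Return the set of CPU indices the process may run on (pid 0 = this process)."""
    return os.sched_getaffinity(pid)


def find_pinned(pids):
    """Map each PID whose affinity mask holds exactly one CPU to that mask."""
    pinned = {}
    for pid in pids:
        try:
            cpus = allowed_cpus(pid)
        except (ProcessLookupError, PermissionError):
            # The process exited or belongs to another user; skip it.
            continue
        if len(cpus) == 1:
            pinned[pid] = cpus
    return pinned


if __name__ == "__main__":
    # Affinity of the current interpreter, comparable to `taskset -cp $$`.
    print(sorted(allowed_cpus()))
```

Passing the PIDs of the Ray workers to `find_pinned` would return a non-empty mapping in the situation described above, where every process reports an affinity list of 0.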

Environment:

Operating System: Red Hat Enterprise Linux 8.7 (Ootpa)
Software Version: Latest, commit id 05681304651b1b29d7d887db169045ea3dd28fce

Steps Taken to Resolve:
This appears to be a PyTorch issue; see ray-project/ray/issues/34201 and pytorch/pytorch/issues/99625. One workaround is to set KMP_AFFINITY to disabled before running inference:

export KMP_AFFINITY=disabled
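
When inference is started from a Python launcher rather than a shell, the variable can be passed to the child process explicitly. This is only a sketch: `run_with_kmp_affinity_disabled` is a hypothetical helper, and the command it runs is supplied by the caller, since the exact inference command is not given here.

```python
import os
import subprocess
import sys


def run_with_kmp_affinity_disabled(cmd, **kwargs):
    """Run cmd with KMP_AFFINITY=disabled added to a copy of the current environment."""
    env = dict(os.environ, KMP_AFFINITY="disabled")
    return subprocess.run(cmd, env=env, check=True, **kwargs)


if __name__ == "__main__":
    # Stand-in child that only echoes the variable; substitute the real inference command.
    child = [sys.executable, "-c", "import os; print(os.environ['KMP_AFFINITY'])"]
    run_with_kmp_affinity_disabled(child)
```

Because the environment is copied, the parent process keeps its own settings and only the launched command and its children see the override.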
@Gy-Lu
Contributor

Gy-Lu commented Jun 20, 2023

We have noticed this problem also, thanks for the solution :)
