Enhance k8s container status #132

Open
mmmommm opened this issue Apr 17, 2024 · 0 comments
Labels
enhancement An improvement to an existing feature

Comments

Contributor

mmmommm commented Apr 17, 2024

This is just my opinion based on my experience. Thank you for the wonderful product :)

Problem

The current code that monitors container startup returns only the pod's status, so the resulting error messages are not user-friendly.

I believe that by having get_container_status return information other than pod_status, we can display more appropriate errors to the users.
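To illustrate the kind of message this would make possible, here is a minimal sketch rather than the project's code. The helper name `describe_pod_status` and the sample values are hypothetical, and the snake_case keys assume the dictionary comes from the Kubernetes Python client's `to_dict()`.

```python
from typing import Any, Dict, List


def describe_pod_status(status: Dict[str, Any]) -> str:
    """Build a human-readable summary from a pod's status dictionary.

    Hypothetical helper: assumes snake_case keys as produced by the
    Kubernetes Python client's to_dict().
    """
    phase = status.get("phase") or "Unknown"
    details: List[str] = []
    for cs in status.get("container_statuses") or []:
        # "state" and "waiting" may be missing or None, so fall back to {}.
        waiting = (cs.get("state") or {}).get("waiting") or {}
        if waiting.get("reason"):
            detail = f"container '{cs.get('name')}' is waiting: {waiting['reason']}"
            if waiting.get("message"):
                detail += f" ({waiting['message']})"
            details.append(detail)
    if not details:
        return f"Pod phase is {phase}."
    return f"Pod phase is {phase}; " + "; ".join(details) + "."


# Made-up sample data for illustration only.
example = {
    "phase": "Pending",
    "container_statuses": [
        {
            "name": "kernel",
            "state": {
                "waiting": {
                    "reason": "ImagePullBackOff",
                    "message": "Back-off pulling image",
                }
            },
        }
    ],
}
print(describe_pod_status(example))
```

A message like this tells the user why the pod is stuck, which the phase alone cannot do.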

Proposed Solution

This is quite simplified, but here's the idea. I serialize the pod data to a JSON string so that the overridden method can keep returning `str`, but there might be a better way.

k8s.py

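import datetime
import json

# json cannot encode datetime values by default, so define a custom encoder and use it
class DatetimeJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        # Let the base class raise TypeError for any other unsupported type
        return super().default(obj)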

@overrides
def get_container_status(self, iteration: Optional[str]) -> str:
    # Locates the kernel pod using the kernel_id selector.  Note that we also include 'component=kernel'
    # in the selector so that executor pods (when Spark is in use) are not considered.
    # If the phase indicates Running, the pod's IP is used for the assigned_ip.
    pod_status = ""
    kernel_label_selector = f"kernel_id={self.kernel_id},component=kernel"
    ret = client.CoreV1Api().list_namespaced_pod(
        namespace=self.kernel_namespace, label_selector=kernel_label_selector
    )
    if ret and ret.items:
        # if ret.items is not empty, return the pod data as a JSON string
        pod_dict = ret.items[0].to_dict()
        dump_json = json.dumps(pod_dict, cls=DatetimeJSONEncoder)
        return dump_json
    else:
        self.log.warning(f"kernel server pod not found in namespace '{self.kernel_namespace}'")
        return ""
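A quick round-trip check of the encoder idea, as a sketch only. The `pod_dict` below is a made-up stand-in for `ret.items[0].to_dict()`, and this variant of the encoder defers to the base class for types it does not handle, so unsupported values still raise `TypeError`.

```python
import datetime
import json


class DatetimeJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        # Defer to the base class, which raises TypeError for unsupported types.
        return super().default(obj)


# Stand-in for ret.items[0].to_dict(); the values are made up for illustration.
pod_dict = {
    "metadata": {"creation_timestamp": datetime.datetime(2024, 4, 17, 9, 30, 0)},
    "status": {"phase": "Pending"},
}

dumped = json.dumps(pod_dict, cls=DatetimeJSONEncoder)
restored = json.loads(dumped)

# After the round trip the timestamp is an ISO 8601 string, not a datetime.
print(restored["metadata"]["creation_timestamp"])
```

Note that callers who need the timestamp as a `datetime` again would have to parse it themselves, for example with `datetime.datetime.fromisoformat`.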

Additional context

This might be specific to my environment, but once I made startup wait while the k8s pod is in the ContainerCreating state, or while no error has occurred and ContainersReady is false, kernels started to work properly even without a kernel image puller.

This is a quite simplified example:

@overrides
async def confirm_remote_startup(self):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    pod_info = self.get_container_status(str(i))
    # An empty string or None means the pod was not found
    if pod_info:
        pod_info_json = json.loads(pod_info)
        status = pod_info_json["status"]
        pod_phase = (status.get("phase") or "").lower()
        if pod_phase == "running":
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        else:
            # to_dict() yields snake_case keys, so use "container_statuses" here
            container_statuses = status.get("container_statuses") or []
            state = (container_statuses[0].get("state") or {}) if container_statuses else {}
            # "waiting" is None unless the container is actually waiting
            if (state.get("waiting") or {}).get("reason") == "ContainerCreating":
                self.log.info("Container is creating ...")
                continue
            for condition in status.get("conditions") or []:
                if condition["type"] == "ContainersReady" and condition["status"] != "True":
                    self.log.warning("Containers are not ready; waiting 1 second.")
                    await asyncio.sleep(1)
                    # Stop scanning conditions; the enclosing (elided) loop polls again
                    break
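The wait rule described above can also be written as a small pure function, which makes it easy to test without a cluster. This is a sketch under stated assumptions: the name `next_action`, the return values, and the `ERROR_REASONS` set are all hypothetical, and the snake_case keys assume the status came from the Kubernetes Python client's `to_dict()`.

```python
from typing import Any, Dict

# Assumption: which waiting reasons count as errors. Kubernetes does report
# these reasons, but the choice of set is illustrative only.
ERROR_REASONS = {"ErrImagePull", "ImagePullBackOff", "CrashLoopBackOff"}


def next_action(status: Dict[str, Any]) -> str:
    """Return "ready", "wait", "error", or "unknown" for a pod status dict."""
    phase = (status.get("phase") or "").lower()
    if phase == "running":
        return "ready"
    if phase == "failed":
        return "error"
    for cs in status.get("container_statuses") or []:
        # "state" and "waiting" may be missing or None, so fall back to {}.
        waiting = (cs.get("state") or {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason == "ContainerCreating":
            return "wait"
        if reason in ERROR_REASONS:
            return "error"
    # No error was seen; keep waiting while the containers are not ready.
    for condition in status.get("conditions") or []:
        if condition.get("type") == "ContainersReady" and condition.get("status") != "True":
            return "wait"
    return "unknown"


creating = {
    "phase": "Pending",
    "container_statuses": [{"state": {"waiting": {"reason": "ContainerCreating"}}}],
}
print(next_action(creating))
```

Keeping the decision separate from the polling loop means the loop only has to sleep and retry on "wait" and report a descriptive failure on "error".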
mmmommm added the enhancement label Apr 17, 2024