Kopf crashes when there are disabled APIServers. #1073

Open
mehrdad-khojastefar opened this issue Oct 29, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@mehrdad-khojastefar
Contributor

Long story short

I ran into this issue while running my operator inside a Kubernetes cluster that uses linkerd.io as its service mesh. Linkerd is not set up correctly there, so the team decided to disable the API via --runtime-config. As a result, /apis/tap.linkerd.io/v1alpha1/ now returns 503 errors.
Normally I would like to ignore this error, as kubectl does: when I list pods, it prints a short warning that tap.linkerd.io is not available and then shows the list of pods.
Kopf, however, keeps crashing.

I have tried settings.scanning.disabled = True, but that did not help, although from reading the docs I expected it would.
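
To illustrate the kubectl-like behaviour I have in mind, here is a minimal sketch. It is not Kopf's actual scanning code: `read_version`, `scan_tolerantly`, and `GroupUnavailable` are hypothetical names, and the HTTP call is simulated. The idea is to collect per-group discovery results, log a warning for a group that is unavailable, and continue with the rest instead of failing the whole scan.

```python
import asyncio
import logging

logger = logging.getLogger("discovery-sketch")


class GroupUnavailable(Exception):
    """Hypothetical stand-in for a 503 returned by one API group/version."""


async def read_version(group_version: str) -> set[str]:
    # Simulated discovery call; a real operator would issue an HTTP GET here.
    if group_version == "tap.linkerd.io/v1alpha1":
        raise GroupUnavailable(group_version)
    return {f"{group_version}/example-resource"}


async def scan_tolerantly(group_versions: list[str]) -> tuple[set[str], list[str]]:
    # return_exceptions=True keeps one failing group from cancelling the others.
    results = await asyncio.gather(
        *(read_version(gv) for gv in group_versions),
        return_exceptions=True,
    )
    resources: set[str] = set()
    skipped: list[str] = []
    for gv, result in zip(group_versions, results):
        if isinstance(result, GroupUnavailable):
            # Warn and move on, similar to what kubectl does.
            logger.warning("Skipping unavailable API group %s", gv)
            skipped.append(gv)
        elif isinstance(result, BaseException):
            # Any other failure is still treated as fatal.
            raise result
        else:
            resources.update(result)
    return resources, skipped
```

Calling `asyncio.run(scan_tolerantly(["apps/v1", "tap.linkerd.io/v1alpha1"]))` returns the resources found under `apps/v1` and reports the Linkerd group as skipped. Whether skipping is safe depends on whether the operator actually handles resources from the unavailable group.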

Kopf version

1.36.2

Kubernetes version

1.23.13

Python version

3.11

Code

No response

Logs

/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py:179: FutureWarning: Absence of either namespaces or cluster-wide flag will become an error soon. For now, switching to the cluster-wide mode for backward compatibility.
  warnings.warn("Absence of either namespaces or cluster-wide flag will become an error soon."
[2023-10-29 18:31:15,038] kopf.activities.star [INFO    ] Activity 'startup_config' succeeded.
[2023-10-29 18:31:15,128] kopf._core.engines.a [INFO    ] Initial authentication has been initiated.
[2023-10-29 18:31:15,130] kopf.activities.auth [INFO    ] Activity 'login_fn' succeeded.
[2023-10-29 18:31:15,130] kopf._core.engines.a [INFO    ] Initial authentication has finished.
[2023-10-29 18:31:16,932] kopf._core.reactor.o [ERROR   ] Request attempt #1/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:19,133] kopf._core.reactor.o [ERROR   ] Request attempt #2/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:20,236] kopf._core.reactor.o [ERROR   ] Request attempt #3/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:22,335] kopf._core.reactor.o [ERROR   ] Request attempt #4/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:25,347] kopf._core.reactor.o [ERROR   ] Request attempt #5/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:30,437] kopf._core.reactor.o [ERROR   ] Request attempt #6/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:38,454] kopf._core.reactor.o [ERROR   ] Request attempt #7/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:51,477] kopf._core.reactor.o [ERROR   ] Request attempt #8/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:32:12,502] kopf._core.reactor.o [ERROR   ] Request attempt #9/9 failed; escalating: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:32:12,538] kopf._core.reactor.r [ERROR   ] Resource observer has failed: (None, None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 148, in check_response
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
    await coro
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/observation.py", line 113, in resource_observer
    resources = await scanning.scan_resources(groups=group_filter, settings=settings, logger=logger)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 31, in scan_resources
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 83, in _read_new_apis
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 97, in _read_version
    rsp = await api.get(url, settings=settings, logger=logger)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 111, in get
    response = await request(
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
    return await fn(*args, **kwargs, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 85, in request
    await errors.check_response(response)  # but do not parse it!
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
    raise cls(payload, status=response.status) from e
kopf._cogs.clients.errors.APIServerError: (None, None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 148, in check_response
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/kopf", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/cli.py", line 60, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/cli.py", line 109, in run
    return running.run(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py", line 81, in run
    asyncio.run(coro)
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py", line 138, in operator
    await run_tasks(operator_tasks, ignored=existing_tasks)
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py", line 419, in run_tasks
    await aiotasks.reraise(root_done | root_cancelled | hung_done | hung_cancelled)
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 238, in reraise
    task.result()  # can raise the regular (non-cancellation) exceptions.
    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
    await coro
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/observation.py", line 113, in resource_observer
    resources = await scanning.scan_resources(groups=group_filter, settings=settings, logger=logger)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 31, in scan_resources
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 83, in _read_new_apis
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 97, in _read_version
    rsp = await api.get(url, settings=settings, logger=logger)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 111, in get
    response = await request(
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
    return await fn(*args, **kwargs, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 85, in request
    await errors.check_response(response)  # but do not parse it!
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
    raise cls(payload, status=response.status) from e
kopf._cogs.clients.errors.APIServerError: (None, None)

Additional information

No response

mehrdad-khojastefar added the bug label Oct 29, 2023
@prabhatkgupta

@mehrdad-khojastefar I'm also facing the exact same issue in our k8s cluster.

@prabhatkgupta

prabhatkgupta commented Jan 18, 2024

@mehrdad-khojastefar were you able to find any workaround for this issue?

@mehrdad-khojastefar
Contributor Author

@prabhatkgupta I was able to fix it; you can take a look at it here: https://github.com/mehrdad-khojastefar/kopf
As you can tell, I haven't had time to turn it into a proper pull request :). I am using this version in production and it hasn't had any problems since. Please review it and use it with caution; I don't suggest using it everywhere without testing.
I will make it a proper pull request in the upcoming weeks.

@prabhatkgupta

@mehrdad-khojastefar how can I use your code in my Docker image?

@mehrdad-khojastefar
Contributor Author

mehrdad-khojastefar commented Feb 10, 2024

@prabhatkgupta
https://gist.github.com/javrasya/e95ade856ff42e4649972f8a54368459
This should help. You need to modify your requirements.txt file and rebuild your Docker image.

@prabhatkgupta

@mehrdad-khojastefar I tried to pip install from your GitHub repo and hit the following error:

Traceback (most recent call last):
 File "/usr/local/bin/kopf", line 5, in <module>
   from kopf.cli import main
 File "/usr/local/lib/python3.9/site-packages/kopf/__init__.py", line 117, in <module>
   from kopf._core.engines.admission import (
 File "/usr/local/lib/python3.9/site-packages/kopf/_core/engines/admission.py", line 14, in <module>
   from kopf._cogs.clients import creating, errors, patching
 File "/usr/local/lib/python3.9/site-packages/kopf/_cogs/clients/creating.py", line 3, in <module>
   from kopf._cogs.clients import api
 File "/usr/local/lib/python3.9/site-packages/kopf/_cogs/clients/api.py", line 55, in <module>
   ) -> aiohttp.ClientResponse | None:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
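
For context, the traceback shows Python 3.9, while the original report used 3.11. The `X | None` annotation syntax (PEP 604) is only evaluated successfully on Python 3.10 and later, which would explain why the import fails here. The sketch below reproduces the check in isolation; `ClientResponse` is a hypothetical stand-in for `aiohttp.ClientResponse`, and `supports_pep604` and `request` are illustrative names, not code from the fork.

```python
import sys
from typing import Optional


class ClientResponse:
    """Hypothetical stand-in for aiohttp.ClientResponse, so the sketch needs no dependencies."""


def supports_pep604() -> bool:
    # Evaluating `SomeClass | None` is what an unquoted return annotation
    # does at import time; it raises TypeError before Python 3.10.
    try:
        ClientResponse | None
    except TypeError:
        return False
    return True


def request(ok: bool) -> Optional[ClientResponse]:
    # Optional[...] expresses the same type and also imports cleanly on Python 3.9.
    return ClientResponse() if ok else None


if __name__ == "__main__":
    print(sys.version_info[:2], "supports X | None:", supports_pep604())
```

Under that assumption, the likely options are to build the image on Python 3.10 or newer, or to patch the annotation in the fork to use `Optional[...]`.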
