Infinite watch-streams stopping immediately with no obvious reasons why. The watched resources do then spin up. #1094

Open
James-Hirst-1998 opened this issue Jan 25, 2024 · 0 comments
Labels
bug Something isn't working


Long story short

When my cluster is spinning up, one of the pods uses the Kopf framework to watch for service monitors (Prometheus CRDs) to arrive, and should then do some further manipulation. In the failing case, the pod first finds the CRD for the service monitor and starts the watch-stream for the objects, but stops the watch-stream almost immediately.

There are no logs explaining why it stops, there is no error code, and the pod is marked as healthy. I have confirmed that the service monitors do spin up in this failing case, so I am unsure why the infinite watch is terminated so quickly. In a healthy case, the logs show the watch-stream starting; about 30 seconds later it sees the first service monitor and then does its further manipulation.

Kopf version

1.36.2

Kubernetes version

1.28.3

Python version

3.8.10

Code

import kopf


@kopf.on.startup()
def configure(settings, **_):
    """Configure kopf."""
    # Watch timeouts in seconds; the client-side timeout is set slightly above the server-side one.
    settings.watching.server_timeout = 60
    settings.watching.client_timeout = 70


@kopf.on.create("servicemonitor")
async def servicemonitor_create_fn(spec, namespace, logger, **_kwargs):
    """Trigger function for a ServiceMonitor being created."""
    logger.debug("A service monitor has been created")

Logs

{"message": "Starting Kopf 1.36.2.", "timestamp": "2024-01-10T18:11:44.808848+00:00", "severity": "debug"} 
{"message": "Activity 'configure' is invoked.", "timestamp": "2024-01-10T18:11:44.809096+00:00", "severity": "debug"} 
{"message": "Activity 'configure' succeeded.", "timestamp": "2024-01-10T18:11:44.810125+00:00", "severity": "info"}
{"message": "Initial authentication has been initiated.", "timestamp": "2024-01-10T18:11:44.810619+00:00", "severity": "info"} 
{"message": "Activity 'login_with_service_account' is invoked.", "timestamp": "2024-01-10T18:11:44.810795+00:00", "severity": "debug"}
{"message": "Activity 'login_with_service_account' succeeded.", "timestamp": "2024-01-10T18:11:44.811506+00:00", "severity": "info"}
{"message": "Initial authentication has finished.", "timestamp": "2024-01-10T18:11:44.811631+00:00", "severity": "info"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:11:45.505721+00:00", "severity": "warn"} 
{"message": "Starting the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.", "timestamp": "2024-01-10T18:11:45.506829+00:00", "severity": "debug"}
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:11:57.826303+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:11:57.829433+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:02.336769+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:02.347011+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.109591+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.111244+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.114258+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.210684+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.213385+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.214971+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.307980+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.319059+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.655493+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.661128+00:00", "severity": "warn"}
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.709665+00:00", "severity": "warn"}
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.805993+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.916102+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.005047+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.214595+00:00", "severity": "warn"} 
{"message": "Starting the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.311233+00:00", "severity": "debug"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.315617+00:00", "severity": "warn"} 
{"message": "Stopping the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.404687+00:00", "severity": "debug"}
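
To put a number on "almost immediately", the lifetime of the watch-stream can be read off the timestamps in the log lines above. This is only a minimal sketch using the Python standard library; the function name `watch_stream_lifetimes` is made up for illustration and is not part of Kopf, and the two input lines are copied from the logs above.

```python
import json
from datetime import datetime

# Two lines copied verbatim from the logs in this issue.
LOG_LINES = [
    '{"message": "Starting the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.311233+00:00", "severity": "debug"}',
    '{"message": "Stopping the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.404687+00:00", "severity": "debug"}',
]


def watch_stream_lifetimes(lines):
    """Return {resource: seconds between its 'Starting' and 'Stopping' messages}.

    Hypothetical helper for reading these logs; not a Kopf API.
    """
    started = {}
    lifetimes = {}
    for line in lines:
        record = json.loads(line)
        message = record["message"]
        when = datetime.fromisoformat(record["timestamp"])
        for verb in ("Starting", "Stopping"):
            prefix = f"{verb} the watch-stream for "
            if not message.startswith(prefix):
                continue
            # The resource name is the first word after the prefix.
            resource = message[len(prefix):].split(" ")[0]
            if verb == "Starting":
                started[resource] = when
            elif resource in started:
                delta = when - started.pop(resource)
                lifetimes[resource] = delta.total_seconds()
    return lifetimes


lifetimes = watch_stream_lifetimes(LOG_LINES)
for resource, seconds in lifetimes.items():
    print(f"{resource}: {seconds:.3f}s")
```

For the two lines above, the stream for servicemonitors.v1.monitoring.coreos.com lives for roughly 0.09 seconds.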

Additional information

We have found this framework super useful so far. We first saw this issue in Kopf version 1.36.2, so I believe it could be related to this change: a499244

I have no Kubernetes API pods in my cluster to get logs from to help me debug the issue. The code snippet I provided has some of the servicemonitor_create_fn logic stripped out, because we are not getting as far as entering the function. I am simultaneously looking for a workaround for this bug: is there any setting I can update to allow a retry, or to query the Kubernetes API less frequently? Any details on that would be great.

A final thing to note: in a healthy cluster, the final unresolved log
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.315617+00:00", "severity": "warn"}
does not appear after the watch-stream has started, so this may be a timing window issue.
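
The pattern described above (an "Unresolved resources" warning logged while the servicemonitors watch-stream is already running) can be checked mechanically when comparing healthy and failing runs. This is a rough sketch, assuming the logs are JSON lines in the format shown in this issue; `unresolved_after_start` is a hypothetical helper name, not something Kopf provides, and it only detects the pattern without confirming the cause.

```python
import json

# The last three log lines from the failing run in this issue.
SAMPLE = [
    '{"message": "Starting the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.311233+00:00", "severity": "debug"}',
    '{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name=\'servicemonitor\')", "timestamp": "2024-01-10T18:13:00.315617+00:00", "severity": "warn"}',
    '{"message": "Stopping the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.404687+00:00", "severity": "debug"}',
]


def unresolved_after_start(lines, resource_prefix="servicemonitors."):
    """Return timestamps of 'Unresolved resources' warnings that were logged
    while the watch-stream for the given resource was running.

    Hypothetical helper for comparing runs; not a Kopf API.
    """
    streaming = False
    hits = []
    for line in lines:
        record = json.loads(line)
        message = record["message"]
        if message.startswith(f"Starting the watch-stream for {resource_prefix}"):
            streaming = True
        elif message.startswith(f"Stopping the watch-stream for {resource_prefix}"):
            streaming = False
        elif streaming and message.startswith("Unresolved resources cannot be served"):
            hits.append(record["timestamp"])
    return hits


hits = unresolved_after_start(SAMPLE)
print(hits)
```

On the failing run this returns the timestamp of the one warning that falls between the start and stop messages; on a healthy run, as described above, the expectation is an empty list.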

James-Hirst-1998 added the bug label Jan 25, 2024