Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using yamlencode() to define flink-conf.yaml might be broken #13

Open
thodson-usgs opened this issue Jan 22, 2024 · 4 comments
Open

Using yamlencode() to define flink-conf.yaml might be broken #13

thodson-usgs opened this issue Jan 22, 2024 · 4 comments

Comments

@thodson-usgs
Copy link
Contributor

I'm starting to suspect that the output of yamlencode() to flink-conf.yaml is not being parsed correctly.
When I look at the logs, values set my yamlencode() appear as:
"number": "12345"
whereas other values appear as
number: 12345

In fact, both values can occur in the same deployment, which make me think they aren't equivalent.

@yuvipanda
Copy link
Collaborator

yuvipanda commented Jan 22, 2024

Numbers are the tricky bit, but usually yamlencode just passes these through - so "12345" gets encoded to "12345" while 12345 gets encoded to 12345. Do you have an example config to show?

The other option is to switch to jsonencode, because json is a subset of YAML and anything that accepts yaml should accept JSON (with the exception of the presence of hard tabs). This is actually generally what I do, because JSON isn't space significant while YAML is.

@thodson-usgs
Copy link
Contributor Author

thodson-usgs commented Jan 26, 2024

I believe I've verified this while testing restart strategies.
Viewing the job manager config in Flink's dashboard:
I pass the config through terraform and I get
"restart-strategy.type": "exponential-delay"
"restart-strategy.exponential-delay.max-backoff": "20 min"
and the restart fails.

Whereas if I pass the config through the recipe's config.py I get
restart-strategy.type: exponential-delay
restart-strategy.exponential-delay.max-backoff : 20 min
and the restart works.

If I pass both, both appear in the config, which may also indicate they are not equivalent.

I'll report more examples as I come across them, but I'm feeling confident that our terraform-flink-config is broken!

@thodson-usgs
Copy link
Contributor Author

Also checking the job manager logs, all the quoted parameters appear red. Unclear what red indicates, but I would guess an error (color not shown).

INFO  [] - Loading configuration property: blob.server.port, 6124
INFO  [] - Loading configuration property: kubernetes.jobmanager.annotations, flinkdeployment.flink.apache.org/generation:2
INFO  [] - Loading configuration property: kubernetes.jobmanager.replicas, 1
INFO  [] - Loading configuration property: "kubernetes.operator.metrics.reporter.prom.port", "9999"
INFO  [] - Loading configuration property: taskmanager.memory.task.off-heap.size, 256m
INFO  [] - Loading configuration property: jobmanager.rpc.address, gh-2dhytest-2dfeedstocks-2dgpcp-2dfrom-408fbc.default
INFO  [] - Loading configuration property: kubernetes.taskmanager.cpu, 1.0
INFO  [] - Loading configuration property: "prometheus.io/port", "9999"
INFO  [] - Loading configuration property: kubernetes.service-account, flink
INFO  [] - Loading configuration property: kubernetes.cluster-id, gh-2dhytest-2dfeedstocks-2dgpcp-2dfrom-408fbc
INFO  [] - Loading configuration property: "restart-strategy.exponential-delay.initial-backoff", "10 s"
INFO  [] - Loading configuration property: kubernetes.internal.taskmanager.replicas, 5
INFO  [] - Loading configuration property: taskmanager.memory.flink.size, 1536m
INFO  [] - Loading configuration property: kubernetes.container.image, flink:1.16
INFO  [] - Loading configuration property: parallelism.default, 1
INFO  [] - Loading configuration property: kubernetes.namespace, default
INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
INFO  [] - Loading configuration property: taskmanager.memory.jvm-overhead.max, 1024m
INFO  [] - Loading configuration property: kubernetes.rest-service.exposed.type, ClusterIP

@thodson-usgs
Copy link
Contributor Author

thodson-usgs commented Jan 30, 2024

#6 may close this. (...maybe not, did another pull and I see this branch moved back to yamlencode)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants