ECS create server group is throwing exception because of problem with Orca running with Java 17 #6926

Open
jgrumboe opened this issue Feb 1, 2024 · 9 comments

@jgrumboe

jgrumboe commented Feb 1, 2024

Issue Summary:

We wanted to upgrade to 1.33.0, and ECS deployments started failing with the following exception:

Exception ( Create Server Group )
Exception evaluating property 'value' for java.util.ArrayList, Reason: groovy.lang.MissingPropertyException: No such property: value for class: java.lang.String

Downgrading to 1.32.3 resolved the problem.

Cloud Provider(s):

AWS ECS

Environment:

Tested in two environments:

  • production env running on GKE
  • local dev environment running from localgit

Feature Area:

Pipeline stage "Deploy" in combination with a "Find Image from Tags" stage

Description:

This is the complete stacktrace from the execution:

groovy.lang.MissingPropertyException: Exception evaluating property 'value' for java.util.ArrayList, Reason: groovy.lang.MissingPropertyException: No such property: value for class: java.lang.String
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.getAtIterable(DefaultGroovyMethods.java:8760)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.getAt(DefaultGroovyMethods.java:8748)
	at groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:2055)
	at org.codehaus.groovy.runtime.callsite.GetEffectivePojoPropertySite.getProperty(GetEffectivePojoPropertySite.java:63)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGetProperty(AbstractCallSite.java:329)
	at com.netflix.spinnaker.orca.clouddriver.tasks.providers.ecs.EcsServerGroupCreator.getOperations(EcsServerGroupCreator.groovy:125)
	at com.netflix.spinnaker.orca.clouddriver.tasks.servergroup.ServerGroupCreator.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at com.netflix.spinnaker.orca.clouddriver.tasks.servergroup.CreateServerGroupTask.execute(CreateServerGroupTask.groovy:55)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:166)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:122)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withLoggingContext(RunTaskHandler.kt:473)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.access(RunTaskHandler.kt:89)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:122)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:121)
	at com.netflix.spinnaker.orca.q.handler.AuthenticationAware.withAuth-0(AuthenticationAware.kt:51)
	at com.netflix.spinnaker.security.AuthenticatedRequest.lambda-zsh(AuthenticatedRequest.java:272)
	at com.netflix.spinnaker.orca.q.handler.AuthenticationAware.withAuth(AuthenticationAware.kt:51)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withAuth(RunTaskHandler.kt:89)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:121)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:119)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:292)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:281)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.invoke(OrcaMessageHandler.kt:69)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.invoke(OrcaMessageHandler.kt:61)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.invoke(OrcaMessageHandler.kt:86)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.invoke(OrcaMessageHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.withExecution(OrcaMessageHandler.kt:96)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withExecution(RunTaskHandler.kt:89)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.withStage(OrcaMessageHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withStage(RunTaskHandler.kt:89)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.withTask(OrcaMessageHandler.kt:61)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:89)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:281)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle-0(RunTaskHandler.kt:119)
	at com.netflix.spinnaker.orca.lock.NoOpRunOnLockAcquired.execute(ExternalLock.kt:191)
	at com.netflix.spinnaker.orca.lock.RetriableLock.get(RetriableLock.java:128)
	at com.netflix.spinnaker.orca.lock.RetriableLock.get(RetriableLock.java:103)
	at com.netflix.spinnaker.kork.core.RetrySupport.retry(RetrySupport.java:34)
	at com.netflix.spinnaker.orca.lock.RetriableLock.lock(RetriableLock.java:57)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withLocking(RunTaskHandler.kt:298)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:118)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:89)
	at com.netflix.spinnaker.q.MessageHandler.invoke(MessageHandler.kt:36)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler.invoke(OrcaMessageHandler.kt:46)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:89)
	at com.netflix.spinnaker.orca.q.audit.ExecutionTrackingMessageHandlerPostProcessor.invoke(ExecutionTrackingMessageHandlerPostProcessor.kt:72)
	at com.netflix.spinnaker.q.QueueProcessor.invoke-0(QueueProcessor.kt:90)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

So my first thought was that something goes wrong in orca at EcsServerGroupCreator.groovy line 125. That's where a Deploy stage whose image is referenced from a previous Find Image from Tags stage tries to get the imageId.
I checked whether that code had changed, but there haven't been any changes in the last 5 years.

My next thought was that it might be related to the Java 17 upgrade of orca.

I tested multiple scenarios locally:

  • version 1.32.3, all services running Java 11: all fine, no exception
  • version 1.33.0, all services running Java 17: exception in orca for ECS create server group
  • version 1.33.0, orca running Java 11, rest running Java 17: all fine, no exception
  • version 1.33.0, all services running Java 11: all fine, no exception

Steps to Reproduce:

  • Start Spinnaker 1.33.0 with all services running Java 17
  • create a pipeline with the following stages
    • Find Image from Tags
    • Deploy to ECS server group
      • container mapping should reference the previous stage output
  • Run the pipeline; the Deploy stage fails after 1 second with an exception

Additional Details:

It only fails when the Deploy stage references the output of the Find Image from Tags stage.
Standalone Deploy stages with fixed container mappings work.

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale

@jgrumboe
Author

@spinnakerbot remove-label stale

@jgrumboe
Author

@dbyron-sf could you or someone else have a look here?

@dbyron-sf
Contributor

@jgrumboe Can you post the contents of your bake stage? I'm not familiar with ECS, so I'm trying to find the bit of pipeline config that this expression (from here) references:

bakeStage.context.amiDetails.imageId.value.get(0).toString()

Looking at the code more closely, maybe it's your find image stage?

In the output of one of my find image stages, imageId is a string, not a list; amiDetails is the list. See FindImageFromTagsTask for that. ImageFinder seems to be the source of truth that imageId is a String.

So, I'm not sure how this ever worked.
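
To make the shape mismatch concrete, here is a minimal Java sketch that imitates dynamic property lookup over nested maps and lists. It is a simulation under stated assumptions, not Groovy's implementation and not orca code: the class, the `property` helper, and the sample image value are all hypothetical. The only things taken from this thread are that amiDetails is a list, that imageId is a string, and that the stack trace passes through `DefaultGroovyMethods.getAtIterable`, which is where a property lookup on a list is applied to each element. The sketch reproduces the wording of the error only; it does not explain why the same expression succeeded under Java 11.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of property lookup over nested data; not orca or Groovy code.
final class PropertySpreadSketch {

    // Maps are looked up by key, lists apply the lookup to every element,
    // and anything else is treated as having no such property.
    static Object property(Object target, String name) {
        if (target instanceof Map) {
            return ((Map<?, ?>) target).get(name);
        }
        if (target instanceof List) {
            List<Object> spread = new ArrayList<>();
            for (Object element : (List<?>) target) {
                spread.add(property(element, name));
            }
            return spread;
        }
        String type = target == null ? "null" : target.getClass().getName();
        throw new IllegalStateException("No such property: " + name + " for class: " + type);
    }

    // Assumed stage context shape: amiDetails is a list of maps, imageId is a String.
    static Map<String, Object> sampleContext() {
        Map<String, Object> image = new HashMap<>();
        image.put("imageId", "example-image:tag"); // made-up value for illustration
        Map<String, Object> context = new HashMap<>();
        context.put("amiDetails", Arrays.asList(image));
        return context;
    }

    // Walks amiDetails -> imageId -> value and reports where the chain breaks.
    static String describeFailure() {
        Object amiDetails = property(sampleContext(), "amiDetails");
        Object imageIds = property(amiDetails, "imageId"); // a list of Strings
        try {
            property(imageIds, "value"); // each element is a String
            return "no failure";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

In this model the chain breaks at `value`, because by that point every element of the list is a plain String.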

@jgrumboe
Author

There is no bake stage, as we build the image in GitHub Actions and push it to AWS ECR. So we just have a FindImageFromTags stage that looks up the image tag in AWS ECR.

I already had a quick conversation in Slack with @jasonmcintosh, who thinks it could be Groovy 3 related: https://spinnakerteam.slack.com/archives/C091CCWRJ/p1706795583143819

As I said, the FindImageFromTags stage has worked fine for us under Java 11 for years.

@dbyron-sf
Contributor

@jgrumboe Are you set up to make/test the code change here to treat amiDetails as a list and imageId as a string?

@jgrumboe
Author

You mean a localdev setup? Yes, I have it working.
So, I should change https://github.com/spinnaker/orca/blob/da1f0bbb93f8980b87a986cbfc3c1548fed3aeb5/orca-clouddriver/src/main/groovy/com/netflix/spinnaker/orca/clouddriver/tasks/providers/ecs/EcsServerGroupCreator.groovy#L125 to treat amiDetails as a list and imageId as a string?
I can try that in the next few days.
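
For reference, a rough Java sketch of what "treat amiDetails as a list and imageId as a string" could look like as a defensive lookup. This is an illustration under assumptions, not the actual change to EcsServerGroupCreator.groovy: the class and method names are hypothetical, and the context is assumed to be a plain map containing an amiDetails list of maps, as described earlier in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; the real code lives in Groovy and may need a different shape.
final class ImageIdSketch {

    // Returns the imageId of the first amiDetails entry, or empty when the
    // context does not have the assumed shape (list of maps with a String imageId).
    static Optional<String> firstImageId(Map<String, Object> stageContext) {
        Object amiDetails = stageContext.get("amiDetails");
        if (!(amiDetails instanceof List) || ((List<?>) amiDetails).isEmpty()) {
            return Optional.empty();
        }
        Object first = ((List<?>) amiDetails).get(0);
        if (!(first instanceof Map)) {
            return Optional.empty();
        }
        Object imageId = ((Map<?, ?>) first).get("imageId");
        return imageId instanceof String ? Optional.of((String) imageId) : Optional.empty();
    }
}
```

Indexing the list first and only then reading imageId as a String avoids any lookup of `value` on a String. Whether an empty result should fail the stage or fall back to another source is a decision for the real change.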

@dbyron-sf
Contributor

Yes please.

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale
