EC2 Ubuntu instance health check failure: SSM gets the correct "Online" output from the instance, but the UI shows Offline #267

Open
owenCCY opened this issue Mar 14, 2024 · 2 comments
Labels
bug Something isn't working

Comments

Contributor

owenCCY commented Mar 14, 2024

Describe the bug

The customer is using CLO 2.1.1 (upgraded from CLO 1.0.x).
They have more than 100 instances.
The instances run Ubuntu 22.04 with the flb agent installed. SSM gets the correct "Online" output response from the instance, but the UI shows Offline.

Output:
{"fluent-bit":{"version":"1.9.10","edition":"Community","flags":["FLB_HAVE_IN_STORAGE_BACKLOG","FLB_HAVE_PARSER","FLB_HAVE_RECORD_ACCESSOR","FLB_HAVE_STREAM_PROCESSOR","FLB_HAVE_TLS","FLB_HAVE_OPENSSL","FLB_HAVE_METRICS","FLB_HAVE_AWS","FLB_HAVE_AWS_CREDENTIAL_PROCESS","FLB_HAVE_SIGNV4","FLB_HAVE_SQLDB","FLB_HAVE_METRICS","FLB_HAVE_HTTP_SERVER","FLB_HAVE_SYSTEMD","FLB_HAVE_VALGRIND","FLB_HAVE_FORK","FLB_HAVE_TIMESPEC_GET","FLB_HAVE_GMTOFF","FLB_HAVE_UNIX_SOCKET","FLB_HAVE_LIBYAML","FLB_HAVE_ATTRIBUTE_ALLOC_SIZE","FLB_HAVE_PROXY_GO","FLB_HAVE_JEMALLOC","FLB_HAVE_LIBBACKTRACE","FLB_HAVE_REGEX","FLB_HAVE_UTF8_ENCODER","FLB_HAVE_LUAJIT","FLB_HAVE_C_TLS","FLB_HAVE_ACCEPT4","FLB_HAVE_INOTIFY","FLB_HAVE_GETENTROPY","FLB_HAVE_GETENTROPY_SYS_RANDOM"]}}

Expected Behavior

Instance Online

Current Behavior

Instance Offline

Reproduction Steps

Use CLO 2.1.1, have more than one page of instances, install the agent on Ubuntu 22.04, then click "load more".

Possible Solution

No response

Additional Information/Context

No response

Solution Version

2.1.1

AWS Region. e.g., us-east-1

No response

Other information

No response

@owenCCY owenCCY added the bug Something isn't working label Mar 14, 2024
Contributor Author

owenCCY commented Mar 15, 2024

Based on a check of the customer's environment, their frontend sends duplicate instance IDs to the SSM client, causing an API error:
An error occurred (DuplicateInstanceId) when calling the SendCommand operation:

We checked their UI: duplicate instances are listed in the frontend, so the getInstanceAgentStatus API call sends duplicate instance IDs, which causes the error above.
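
To confirm which IDs are repeated in a request before they reach SSM, a small check along these lines could be added while debugging. This is only a sketch: `find_duplicate_ids` is a hypothetical helper, not part of the solution's code, and the sample IDs are made up for illustration.

```python
from collections import Counter


def find_duplicate_ids(instance_ids):
    """Return the instance IDs that appear more than once, in first-seen order."""
    counts = Counter(instance_ids)
    return [instance_id for instance_id, n in counts.items() if n > 1]


# Made-up IDs for illustration only.
sample = ["i-0aaa", "i-0bbb", "i-0aaa", "i-0ccc", "i-0bbb"]
print(find_duplicate_ids(sample))
```

Logging the result next to the incoming `instanceIds` would show whether the duplicates come from the frontend request or are introduced later.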

The solution applied for the customer:
Add deduplication code in CentralizedLogging-APIInstanceAPIInstanceAgentStatus
from: instance_list = args.get("instanceIds", list())
to: instance_list = list(set(args.get("instanceIds", list())))
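
Note that `set()` removes duplicates but does not keep the original order of the IDs. If the order matters to the caller (an assumption; nothing in this issue says it does), an order-preserving variant could look like the sketch below. `dedupe_instance_ids` is a hypothetical helper name, and `args` is assumed to be the same dict used in the change above.

```python
def dedupe_instance_ids(args):
    """Return the requested instance IDs without duplicates, keeping first-seen order."""
    # dict.fromkeys keeps insertion order (Python 3.7+), unlike set().
    return list(dict.fromkeys(args.get("instanceIds", [])))


# Made-up IDs for illustration only.
print(dedupe_instance_ids({"instanceIds": ["i-0bbb", "i-0aaa", "i-0bbb"]}))
```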

Contributor Author

owenCCY commented Mar 15, 2024

Based on a reproduction test, we do not see the same issue in the release version (2.1.1 and above).

We will keep watching whether upgraded customers hit the same issue.

@owenCCY owenCCY closed this as completed Mar 15, 2024
@owenCCY owenCCY reopened this Mar 15, 2024