
Entire lockup of Guardian system, running locally #3450

Open
mattsmithies opened this issue Apr 5, 2024 · 7 comments
Labels: bug (Something isn't working), community

Comments

@mattsmithies
Contributor

mattsmithies commented Apr 5, 2024

Problem description

I started running the Guardian locally yesterday after upgrading to the latest version, 2.23.1, and have been using it throughout today to build a new client against it.

So far I have only been working with the login, register, token, and session endpoints, before building out more advanced flows.

After a day of using the Guardian API, it froze up completely. Just before it froze, I was testing the registration flows.

Steps to reproduce

Reproducing this will be tricky, but I have tried a couple of commands, including:

docker compose down
docker compose up -d

Running docker ps yields the following:

CONTAINER ID   IMAGE                                  COMMAND                  CREATED          STATUS                    PORTS                                                                                        NAMES
98f4298c7ffb   hedera-guardian-web-proxy              "/docker-entrypoint.…"   13 minutes ago   Up 13 minutes             0.0.0.0:3000->80/tcp                                                                         hedera-guardian-web-proxy-1
878809b6661b   hedera-guardian-api-gateway            "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             3002/tcp, 6555/tcp                                                                           hedera-guardian-api-gateway-1
bc5a50efae6a   hedera-guardian-application-events     "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             0.0.0.0:3012->3012/tcp                                                                       hedera-guardian-application-events-1
7d1be65e7575   hedera-guardian-guardian-service       "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             0.0.0.0:5007->5007/tcp, 6555/tcp                                                             hedera-guardian-guardian-service-1
aca8d17e50ed   hedera-guardian-policy-service         "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             0/tcp, 0.0.0.0:5006->5006/tcp                                                                hedera-guardian-policy-service-1
6e46385684f2   hedera-guardian-worker-service         "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             6555/tcp                                                                                     hedera-guardian-worker-service-2
0672d5e935ed   hedera-guardian-worker-service         "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             6555/tcp                                                                                     hedera-guardian-worker-service-1
908ecddee2a0   hedera-guardian-auth-service           "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             0.0.0.0:5005->5005/tcp, 6555/tcp                                                             hedera-guardian-auth-service-1
d3d411240b71   hedera-guardian-notification-service   "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes                                                                                                          hedera-guardian-notification-service-1
a481402a9b6f   hedera-guardian-logger-service         "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             6555/tcp                                                                                     hedera-guardian-logger-service-1
0d307384c4ac   mongo-express:1.0.2-20                 "/sbin/tini -- /dock…"   13 minutes ago   Up 13 minutes             8081/tcp                                                                                     hedera-guardian-mongo-express-1
b8d87a862b6c   ipfs/kubo:v0.26.0                      "/sbin/tini -- /usr/…"   13 minutes ago   Up 13 minutes (healthy)   0.0.0.0:4001->4001/tcp, 0.0.0.0:5001->5001/tcp, 4001/udp, 0.0.0.0:8080->8080/tcp, 8081/tcp   hedera-guardian-ipfs-node-1
59496daff47c   nats:2.9.24                            "/nats-server --http…"   13 minutes ago   Up 13 minutes             4222/tcp, 6222/tcp, 0.0.0.0:8222->8222/tcp                                                   hedera-guardian-message-broker-1
3ab891ae2180   prom/prometheus:v2.44.0                "/bin/prometheus --c…"   13 minutes ago   Up 13 minutes             0.0.0.0:9090->9090/tcp                                                                       hedera-guardian-prometheus-1
cf7285271fa1   mongo:6.0.13                           "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             27017/tcp                                                                                    hedera-guardian-mongo-1
52f4e7100776   hedera-guardian-topic-viewer           "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             3006/tcp, 0.0.0.0:5009->5009/tcp                                                             hedera-guardian-topic-viewer-1
b9fd0330a2b1   hashicorp/vault:1.12.11                "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             0.0.0.0:8200->8200/tcp                                                                       hedera-guardian-vault-1
66213f53878f   grafana/grafana:10.0.10                "/run.sh"                13 minutes ago   Up 13 minutes             3000/tcp, 0.0.0.0:9080->9080/tcp                                                             hedera-guardian-grafana-1
def27ca4aef0   hedera-guardian-mrv-sender             "docker-entrypoint.s…"   13 minutes ago   Up 13 minutes             3003/tcp, 0.0.0.0:5008->5008/tcp                                                             hedera-guardian-mrv-sender-1

Despite every container reporting as up, the API has stopped working.

Expected behavior

The Guardian should keep running and responding normally.

Which raises a question: do any of the authentication methods change state or have side effects?

Screenshots

Screenshot 2024-04-05 at 17 13 19

@mattsmithies
Contributor Author

I tried using the system again this morning, just logging in. While the login screen loaded, I received the error notification below in the bottom-right corner of the screen.

In addition, the entire API appears to be down. Could this be a case of the database becoming corrupted?

Screenshot 2024-04-08 at 10 02 29

@mattsmithies
Contributor Author

mattsmithies commented Apr 8, 2024

I am running macOS 13.4.1. I initially managed to build and run the software successfully and made test API calls to some basic authentication endpoints.

I used a "develop" env file together with docker compose up -d --build.

Before this upgrade, I also followed the troubleshooting directions:

docker builder prune --all
docker compose build --no-cache

@mattsmithies
Contributor Author

mattsmithies commented Apr 8, 2024

Okay, I continued digging and found that my Standard Registry user had been deleted.

I have a test that checks that an existing user cannot be registered again; for the Standard Registry user we use the name 'dovuauthority'.

As the screenshot shows, I was able to create a duplicate user, so data may have been lost.

Screenshot 2024-04-08 at 10 21 24

The image below shows the test I run to check registration of a user with the Standard Registry role.

Screenshot 2024-04-08 at 10 20 23

@mattsmithies
Contributor Author

I then logged in with the new Standard Registry user, and the UI freezes on the initial registration screen for setting up keys.

Has anyone else experienced this behaviour?

Screenshot 2024-04-08 at 10 22 23

@mattsmithies
Contributor Author

Okay, another update.

I believe I've resolved the freeze (though not the apparent data loss), so I will continue with my R&D.

I can log in as a user and run my tests again. The behaviour still feels inconsistent; if I hit this issue again, I will add more comments.

@mattsmithies
Contributor Author

Okay, I think I understand what is happening.

I've had complete DB data loss three times now in the last week on my local machine.

This tends to happen when a task reaches a state from which it cannot finish, producing an almost "infinite loop" effect system-wide.

Perhaps the data is in an invalid state for logic to continue?

What I have observed is that, although tasks have timeouts, the system seems to create new tasks on top of the stuck one (or loop), which then blocks the entire task queue and the underlying event loop.

As a downstream effect, users cannot view data through the Guardian frontend, because fetching data from the API is completely blocked.
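The pile-up described above can be illustrated with a small, generic shell sketch. It is not Guardian code and makes no claim about how Guardian schedules tasks: each `sleep 30` is a hypothetical stand-in for a task stuck in a state it cannot finish from, and the 1-second wait is a hypothetical "timeout" that stops waiting without cancelling the task before a retry is started.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only, not Guardian code: a "timeout" that stops
# waiting for a task but never cancels it, followed by a retry.

pids=()
for attempt in 1 2 3; do
  sleep 30 &            # stand-in for a task that cannot finish
  pid=$!
  pids+=("$pid")
  sleep 1               # wait up to 1 second (the hypothetical "timeout")
  if kill -0 "$pid" 2>/dev/null; then
    # The timeout fires, but the stuck task is left running and a new one is started.
    echo "attempt $attempt timed out; retrying without cancelling PID $pid"
  fi
done

# Count how many abandoned tasks are still running.
still_running=0
for pid in "${pids[@]}"; do
  if kill -0 "$pid" 2>/dev/null; then
    still_running=$((still_running + 1))
  fi
done
echo "tasks still running: $still_running"

# Clean up the demo processes.
kill "${pids[@]}" 2>/dev/null
```

Each timed-out attempt leaves one more stuck task behind, so the count grows with every retry. In a service that processes work on a single event loop, a comparable build-up could plausibly starve the loop, which would be consistent with the freeze reported here.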

@mattsmithies
Contributor Author

mattsmithies commented May 2, 2024

This could be related to #3520 (only time will tell)
