[Bug]: icinga-stack-icinga-kubernetes keeps crashing with "invalid memory address or nil pointer dereference" #69

Open
Foxeronie opened this issue Apr 11, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@Foxeronie

Foxeronie commented Apr 11, 2024

Affected Chart

icinga-stack

Which version of the app contains the bug?

0.3.0

Please describe your problem

Hi!
After installing the Icinga stack via Helm, the icinga-stack-icinga-kubernetes pod keeps crashing with the following log output.

I0411 12:55:03.767592       1 database.go:285] "Connecting to database" logger="database"
E0411 12:55:03.878359       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 743 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x181e020, 0x2a31a80})
	/go/pkg/mod/k8s.io/apimachinery@v0.29.2/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000017a40?})
	/go/pkg/mod/k8s.io/apimachinery@v0.29.2/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x181e020?, 0x2a31a80?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/icinga/icinga-kubernetes/pkg/schema/v1.(*Ingress).Obtain(0xc00048f450, {0x1d31ad0, 0xc0003df760})
	/build/pkg/schema/v1/ingress.go:102 +0x57d
github.com/icinga/icinga-kubernetes/pkg/sync/v1.(*Sync).sync.func1(0xc000d80240)
	/build/pkg/sync/v1/sync.go:85 +0x4f
github.com/icinga/icinga-kubernetes/pkg/sync.(*Sink).Upsert(0x2a34030?, {0x1d1c6f0, 0xc00041c5f0}, 0xc000d67e18?)
	/build/pkg/sync/sink.go:59 +0x38
github.com/icinga/icinga-kubernetes/pkg/sync.(*Controller).stream(0xc0000fd5c0, {0x1d1c6f0, 0xc00041c5f0}, 0xc00049ebd0)
	/build/pkg/sync/controller.go:93 +0x468
github.com/icinga/icinga-kubernetes/pkg/sync.(*Controller).Stream(0xc0000fd5c0, {0x1d1c6f0, 0xc00041c5f0}, 0xc00049ebd0)
	/build/pkg/sync/controller.go:55 +0x369
github.com/icinga/icinga-kubernetes/pkg/sync/v1.(*Sync).sync.func3()
	/build/pkg/sync/v1/sync.go:98 +0x46
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 83
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x153bd7d]

goroutine 743 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000017a40?})
	/go/pkg/mod/k8s.io/apimachinery@v0.29.2/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x181e020?, 0x2a31a80?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/icinga/icinga-kubernetes/pkg/schema/v1.(*Ingress).Obtain(0xc00048f450, {0x1d31ad0, 0xc0003df760})
	/build/pkg/schema/v1/ingress.go:102 +0x57d
github.com/icinga/icinga-kubernetes/pkg/sync/v1.(*Sync).sync.func1(0xc000d80240)
	/build/pkg/sync/v1/sync.go:85 +0x4f
github.com/icinga/icinga-kubernetes/pkg/sync.(*Sink).Upsert(0x2a34030?, {0x1d1c6f0, 0xc00041c5f0}, 0xc000d67e18?)
	/build/pkg/sync/sink.go:59 +0x38
github.com/icinga/icinga-kubernetes/pkg/sync.(*Controller).stream(0xc0000fd5c0, {0x1d1c6f0, 0xc00041c5f0}, 0xc00049ebd0)
	/build/pkg/sync/controller.go:93 +0x468
github.com/icinga/icinga-kubernetes/pkg/sync.(*Controller).Stream(0xc0000fd5c0, {0x1d1c6f0, 0xc00041c5f0}, 0xc00049ebd0)
	/build/pkg/sync/controller.go:55 +0x369
github.com/icinga/icinga-kubernetes/pkg/sync/v1.(*Sync).sync.func3()
	/build/pkg/sync/v1/sync.go:98 +0x46
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 83
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96
    architecture: amd64
    bootID: 406da169-d74e-430b-91d2-4123fa8d3c3d
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 5.15.0-97-generic
    kubeProxyVersion: v1.26.13+rke2r1
    kubeletVersion: v1.26.13+rke2r1
    machineID: 0dbc36be8286482db6e70915edadf354
    operatingSystem: linux
    osImage: Ubuntu 22.04.4 LTS
    Kubernetes: v1.26.13+rke2r1

Best regards,
Patrick

@Foxeronie Foxeronie added the bug Something isn't working label Apr 11, 2024
@lippserd lippserd transferred this issue from Icinga/helm-charts Apr 11, 2024
@lippserd
Member

Hi @Foxeronie,

Thanks for the report. I will have a look.

Best regards,
Eric

@lippserd
Member

Hi @Foxeronie,

Can you please pull the image and try again? The issue should be fixed.

Best regards,
Eric

@Foxeronie
Author

Hi Eric,

This problem seems to be solved. Thank you! :)
But now I'm running into the next error with this pod.

I0412 19:32:26.050838       1 database.go:285] "Connecting to database" logger="database"
2024-04-12T21:32:26.417133870+02:00 I0412 19:32:26.417013       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
2024-04-12T21:32:26.417151387+02:00 I0412 19:32:26.417046       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="context canceled"
2024-04-12T21:32:26.417376280+02:00 I0412 19:32:26.417254       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="context canceled"
I0412 19:32:26.417356       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
2024-04-12T21:32:26.417409427+02:00 I0412 19:32:26.417372       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
2024-04-12T21:32:26.417406338+02:00 [invalid connection]
2024-04-12T21:32:26.417483281+02:00 I0412 19:32:26.417387       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
2024-04-12T21:32:26.417496599+02:00 I0412 19:32:26.417393       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
2024-04-12T21:32:26.417504349+02:00 I0412 19:32:26.417411       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
2024-04-12T21:32:26.417524360+02:00 I0412 19:32:26.417407       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
2024-04-12T21:32:26.417533293+02:00 I0412 19:32:26.417390       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp: lookup icinga-stack-kubernetes-database: operation was canceled"
I0412 19:32:26.417633       1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="context canceled"
2024-04-12T21:32:26.417790525+02:00 F0412 19:32:26.417719       1 main.go:204] can't retry: can't perform "INSERT INTO `persistent_volume_claim_ref` (`kind`, `name`, `uid`, `persistent_volume_id`) VALUES (:kind, :name, :uid, :persistent_volume_id) ON DUPLICATE KEY UPDATE `kind` = VALUES(`kind`), `name` = VALUES(`name`), `uid` = VALUES(`uid`), `persistent_volume_id` = VALUES(`persistent_volume_id`)": Error 1406 (22001): Data too long for column 'name' at row 7
failed to create fsnotify watcher: too many open files

Should I create a new issue for this?

Best regards,
Patrick

@lippserd
Member

A new issue is not necessary yet. Please run the following statement in the database:

ALTER TABLE persistent_volume_claim_ref MODIFY COLUMN name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;

After that, if the daemon no longer crashes, please run the following query:

SELECT * FROM persistent_volume_claim_ref WHERE LENGTH(name) > 63;

And if it is okay data-protection-wise, please share the result. Maybe the result of the kind column is enough.
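
To double-check the column definition before and after the change, a standard MariaDB statement such as the following should do (nothing Icinga-specific is assumed here):

SHOW CREATE TABLE persistent_volume_claim_ref;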

@Foxeronie
Author

Foxeronie commented Apr 15, 2024

I'm now getting the following error.

2024-04-15T09:57:52.542959591+02:00 F0415 07:57:52.542760       1 main.go:204] can't retry: can't perform "INSERT INTO `node_volume` (`mounted`, `node_id`, `device_path`) VALUES (:mounted, :node_id, :device_path) ON DUPLICATE KEY UPDATE `mounted` = VALUES(`mounted`), `node_id` = VALUES(`node_id`), `device_path` = VALUES(`device_path`)": Error 1364 (HY000): Field 'name' doesn't have a default value

Edit:
Sorry, I forgot the output:

MariaDB [kubernetes]> SELECT * FROM persistent_volume_claim_ref WHERE LENGTH(name) > 63;
+----------------------+-----------------------+----------------------------------------------------------------------------------------+--------------------------------------+
| persistent_volume_id | kind                  | name                                                                                   | uid                                  |
+----------------------+-----------------------+----------------------------------------------------------------------------------------+--------------------------------------+
| �6�2/`2@���4�bo���          | PersistentVolumeClaim | prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0 | 31eabb2b-3768-4b01-a39f-5b3523ca8540 |
+----------------------+-----------------------+----------------------------------------------------------------------------------------+--------------------------------------+
1 row in set (0.001 sec)

MariaDB [kubernetes]> 
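
For reference, the name shown above is longer than the 63 characters checked by the query; its exact length can be confirmed in the same session with something like:

SELECT name, CHAR_LENGTH(name) AS name_length FROM persistent_volume_claim_ref WHERE CHAR_LENGTH(name) > 63;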

@lippserd
Member

Hi @Foxeronie,

Thanks for sharing the output.

Regarding your last error: I pushed a fix. Please pull the image and try again.

Best regards,
Eric

@Foxeronie
Author

Hi Eric,

Thanks for your work. The last error is fixed; sadly, the next one appeared.
Is it outside any normal conventions to have names this long? Just to rule out that we are generally the problem here.

2024-04-15T14:11:28.323929591+02:00 F0415 12:11:28.323793       1 main.go:204] can't retry: can't perform "INSERT INTO `pvc` (`storage_class`, `phase`, `name`, `volume_mode`, `actual_capacity`, `desired_access_modes`, `namespace`, `uid`, `created`, `actual_access_modes`, `minimum_capacity`, `id`, `volume_name`, `resource_version`) VALUES (:storage_class, :phase, :name, :volume_mode, :actual_capacity, :desired_access_modes, :namespace, :uid, :created, :actual_access_modes, :minimum_capacity, :id, :volume_name, :resource_version) ON DUPLICATE KEY UPDATE `storage_class` = VALUES(`storage_class`), `phase` = VALUES(`phase`), `name` = VALUES(`name`), `volume_mode` = VALUES(`volume_mode`), `actual_capacity` = VALUES(`actual_capacity`), `desired_access_modes` = VALUES(`desired_access_modes`), `namespace` = VALUES(`namespace`), `uid` = VALUES(`uid`), `created` = VALUES(`created`), `actual_access_modes` = VALUES(`actual_access_modes`), `minimum_capacity` = VALUES(`minimum_capacity`), `id` = VALUES(`id`), `volume_name` = VALUES(`volume_name`), `resource_version` = VALUES(`resource_version`)": Error 1406 (22001): Data too long for column 'name' at row 18

Best regards,
Patrick

@lippserd
Member

Is it outside any normal conventions to have names this long? Just to rule out that we are generally the problem here.

I thought that they were restricted, which is why the schema is limited in that regard, but obviously they're not 😆.

Please execute the following statements to increase the available length of all volume name columns:

ALTER TABLE node_volume MODIFY COLUMN name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE pod_pvc MODIFY COLUMN volume_name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE pod_pvc MODIFY COLUMN claim_name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE pod_volume MODIFY COLUMN volume_name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE container_mount MODIFY COLUMN volume_name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE pvc MODIFY COLUMN name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE pvc MODIFY COLUMN volume_name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE persistent_volume MODIFY COLUMN name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
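
If further columns turn out to be similarly limited, one way to list all name-like columns that are still shorter than 255 characters is a query against information_schema ('kubernetes' being the database name visible in the MariaDB prompt above):

SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'kubernetes'
  AND COLUMN_NAME IN ('name', 'volume_name', 'claim_name')
  AND CHARACTER_MAXIMUM_LENGTH < 255;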

@Foxeronie
Author

I thought that they were restricted, which is why the schema is limited in that regard, but obviously they're not 😆.

Ah, okay. :D

I ran all the statements. One additional error appeared for the data table's name column.
I also modified this column with

ALTER TABLE data MODIFY COLUMN name varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;

and now the pod is running. :)
Thanks for the help!
