Previous versions of tt-kmd may still be the one used if not explicitly removed on Ubuntu 20.04 via DKMS #1

Open
tt-rkim opened this issue Dec 7, 2023 · 1 comment
Labels: bug (Something isn't working), documentation (Improvements or additions to documentation), internal_report

tt-rkim commented Dec 7, 2023

We have a few bare-metal Ubuntu 20.04 machines with a previous version of tt-kmd installed that don't pick up the newest version after it is added to dkms, even after a module reload or reboot.

Example output:

tt-admin@e08cs08:~$ sudo dkms status tenstorrent
tenstorrent, 1.21, 5.4.0-166-generic, x86_64: built
tenstorrent, 1.21, 5.4.0-167-generic, x86_64: installed
tenstorrent, 1.26, 5.4.0-166-generic, x86_64: installed

Note that we've also seen this on systems with 1.20.1 and 1.23, so it seems to be version-independent.

In tt-smi, we see that the driver in use is not the most recent one. 1.26 is our desired version.
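
One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: 1.26 is installed only for 5.4.0-166-generic, while 5.4.0-167-generic still has 1.21 installed. The sketch below defines a hypothetical helper, `installed_version_for_kernel`, that reads `dkms status` output in the comma-separated format shown above and prints the version marked as installed for one kernel, so it can be compared against `uname -r`.

```shell
#!/bin/sh
# Hypothetical helper (not part of dkms or tt-kmd).
# Reads `dkms status` output on stdin, assuming the format shown above:
#   module, version, kernel, arch: state
# Prints each version whose state starts with "installed" for the given kernel.
installed_version_for_kernel() {
    module="$1"
    kernel="$2"
    awk -F'[,:] *' -v m="$module" -v k="$kernel" \
        '$1 == m && $3 == k && $5 ~ /^installed/ { print $2 }'
}

# Example use on a live system (left commented out so that sourcing this
# file has no side effects):
#   sudo dkms status tenstorrent | installed_version_for_kernel tenstorrent "$(uname -r)"
```

If the printed version is older than the one just added, the running kernel probably never had the new version built for it.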

One way to deal with this from the user side is to remove all dkms modules before adding the newest one. However, this is cumbersome because the dkms command-line interface requires each version to be listed explicitly for removal. sed and awk are friends here, but we would prefer to sidestep that and have a nicer install experience.
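
As a rough sketch of the awk route mentioned above, the helper below collects every distinct module/version pair from `dkms status` so each can be passed to `dkms remove`. The function name `list_dkms_versions` is made up for illustration, and the parsing assumes the comma-separated status format shown earlier in this issue; other dkms releases may print a different layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of dkms or tt-kmd).
# Reads `dkms status` output on stdin, assuming lines of the form
#   module, version, kernel, arch: state
# and prints one "module/version" pair per distinct version.
list_dkms_versions() {
    module="$1"
    awk -F', *' -v m="$module" '$1 == m { print $1 "/" $2 }' | sort -u
}

# Example use on a live system (left commented out so that sourcing this
# file has no side effects). The echo makes it a dry run; drop it to
# actually remove each version for all kernels.
#   sudo dkms status tenstorrent | list_dkms_versions tenstorrent |
#   while read -r mv; do
#       echo sudo dkms remove "$mv" --all
#   done
```

Keeping the dry run as the default makes it easy to review the list before anything is removed.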

@warthog9 added the bug, documentation, and internal_report labels on Apr 26, 2024
@alewycky-tenstorrent
Contributor

If we install a package built with "dkms mkdeb", it will invoke /usr/lib/dkms/common.postinst, which will additionally build for the newest installed kernel, or for all kernels if autoinstall_all_kernels="y" is set in /etc/dkms/framework.conf. Building for the newest installed kernel still fails in unlikely corner cases (install multiple new kernels, then boot one that isn't the newest), but it's close enough.
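
For provisioning scripts, the setting mentioned above could be applied with something like the following sketch. The helper name `set_autoinstall_all_kernels` is hypothetical, and it assumes GNU sed (for `-i`), which is what Ubuntu ships. It rewrites an existing or commented-out `autoinstall_all_kernels=` line if one is present and appends the setting otherwise.

```shell
#!/bin/sh
# Hypothetical helper (not part of dkms).
# Ensures autoinstall_all_kernels="y" is set in the given config file.
# Assumes GNU sed for in-place editing.
set_autoinstall_all_kernels() {
    conf="$1"
    if grep -q '^[# ]*autoinstall_all_kernels=' "$conf" 2>/dev/null; then
        # Replace an existing (possibly commented-out) assignment.
        sed -i 's/^[# ]*autoinstall_all_kernels=.*/autoinstall_all_kernels="y"/' "$conf"
    else
        # No such line yet: append the assignment.
        printf '%s\n' 'autoinstall_all_kernels="y"' >> "$conf"
    fi
}

# Example use on a live system (commented out; requires root):
#   set_autoinstall_all_kernels /etc/dkms/framework.conf
```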

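The other approach, which I recommend, is to run "dkms autoinstall" on boot. Here are the instructions I worked up: create the unit with the first command, paste in the unit contents that follow, then enable it with the last command.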
sudo systemctl edit --force --full dkms-autoinstall.service

[Unit]
Description=Recompile DKMS modules for running kernel
DefaultDependencies=no
Before=systemd-udev-trigger.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/dkms autoinstall

[Install]
WantedBy=systemd-udev-trigger.service

sudo systemctl enable dkms-autoinstall.service
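
For unattended setups where an interactive `systemctl edit` is not practical, the same unit could be written from a script. This is only a sketch: the function name `write_dkms_autoinstall_unit` is made up, and the target path in the usage comment assumes the usual location for administrator-created units, /etc/systemd/system.

```shell
#!/bin/sh
# Hypothetical helper: writes the dkms-autoinstall unit shown above to the
# given path without opening an editor.
write_dkms_autoinstall_unit() {
    unit_path="$1"
    cat > "$unit_path" <<'EOF'
[Unit]
Description=Recompile DKMS modules for running kernel
DefaultDependencies=no
Before=systemd-udev-trigger.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/dkms autoinstall

[Install]
WantedBy=systemd-udev-trigger.service
EOF
}

# Example use on a live system (commented out; requires root):
#   write_dkms_autoinstall_unit /etc/systemd/system/dkms-autoinstall.service
#   systemctl daemon-reload
#   systemctl enable dkms-autoinstall.service
```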
