"since last reflash" metrics and flags #1245

Open
gabrielburnworth opened this issue Sep 21, 2020 · 0 comments

Goals

  • Get specific information about OS and OTA system stability.
  • Gather data over time to establish a baseline to which any changes in performance can be compared.

Requirements

  • Data must reset when the SD card is flashed, so that it is empty on the first boot thereafter.

Existing metrics

  • Existing logs and telemetry allow us to view problems from the server, but patterns must be interpreted from a large number of data points and may be either outdated or lost when storage limits are reached.
  • Existing bot state and API data fields show current device info and last connection times, but do not provide the necessary context to determine if stability issues (such as boot looping or OTA failures) are present.

Possible SD card state fields:

first_boot_at

data type: string (ISO timestamp) or integer (epoch)
details: set upon boot if value does not already exist
reportable metric: days since last reflash (calculated). Also: date of SD card flash.
interpretation/motivation: A high number of days since the last SD card flash would suggest system stability and corruption resistance. A low number would suggest a new user or unrecoverable OTA errors or app crashes.
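
A minimal sketch of how first_boot_at could be set and turned into the "days since last reflash" metric. It assumes the ISO timestamp option and models the SD card state as a plain dict; the function names record_first_boot and days_since_reflash are hypothetical and are not part of FarmBot OS.

```python
from datetime import datetime, timezone


def record_first_boot(state: dict, now: datetime) -> dict:
    """Set first_boot_at on boot, but only if no value exists yet."""
    if "first_boot_at" not in state:
        state["first_boot_at"] = now.isoformat()
    return state


def days_since_reflash(state: dict, now: datetime) -> int:
    """Calculate whole days elapsed since the recorded first boot."""
    first_boot = datetime.fromisoformat(state["first_boot_at"])
    return (now - first_boot).days


# Hypothetical usage: the state starts empty after a reflash.
state = record_first_boot({}, datetime(2020, 9, 1, tzinfo=timezone.utc))
# Later boots must not overwrite the original value.
record_first_boot(state, datetime(2020, 9, 21, tzinfo=timezone.utc))
```

Storing the timestamp rather than a day counter keeps the metric calculable at report time and also yields the date of the SD card flash directly.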

boot_count

data type: integer
details: increment upon each boot
reportable metric: number of reboots since last reflash
interpretation/motivation: A low number, when compared with a high number of days since last reflash, would suggest system stability. A high number could indicate boot looping or app crashes.
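
As a rough illustration, the counter and its comparison against days since last reflash might look like the following. The state dict and the names record_boot and boots_per_day are assumptions made for this sketch, and no threshold for "high" is proposed here.

```python
def record_boot(state: dict) -> dict:
    """Increment boot_count on each boot, starting from zero after a reflash."""
    state["boot_count"] = state.get("boot_count", 0) + 1
    return state


def boots_per_day(boot_count: int, days_since_reflash: int) -> float:
    """Normalize the boot count by device age; treat day zero as one day."""
    return boot_count / max(days_since_reflash, 1)


# Hypothetical usage: three boots after a fresh flash.
state = {}
for _ in range(3):
    record_boot(state)
```

A ratio is used because the raw count alone cannot distinguish an old, stable device from a new one that is boot looping.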

upgrade_count

data type: integer
details: increment upon each OTA update installation
reportable metric: number of FarmBot OS upgrades since last reflash
interpretation/motivation: High counts that match the number of available updates during the reporting period would indicate OTA system success. Low counts (suggesting corruption) or abnormally high counts (suggesting looping) would indicate OTA system issues.
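
The comparison described here could be sketched as below. The number of releases available during the reporting period is assumed to be known on the server side, and the names record_upgrade and classify_upgrade_count, as well as the three labels, are hypothetical. Treating any count above the number of available releases as "high" is a simplification, since downgrades or channel switching could also raise the count.

```python
def record_upgrade(state: dict) -> dict:
    """Increment upgrade_count after each OTA update installation."""
    state["upgrade_count"] = state.get("upgrade_count", 0) + 1
    return state


def classify_upgrade_count(upgrade_count: int, releases_available: int) -> str:
    """Compare installed upgrades with releases published in the same period."""
    if upgrade_count < releases_available:
        return "low"  # updates were missed; possible corruption
    if upgrade_count > releases_available:
        return "high"  # more installs than releases; possible looping
    return "expected"


# Hypothetical usage: two upgrades recorded since the last reflash.
state = record_upgrade(record_upgrade({}))
```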

unstable_os_installed

data type: boolean
details: set to true upon installation of any version not from the "stable" channel (i.e., release candidates)
reportable metric: Has a device had an alpha or beta version installed since the last SD card reflash?
interpretation/motivation: A truthy value would indicate an increased possibility of database errors from past or present unstable development releases, in which case a "hard reset" should be performed before troubleshooting further. Because downgrades and channel switching are allowed, the currently installed version and channel aren't sufficient to determine whether an unstable version has ever been installed.
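
A small sketch of the "sticky" behavior this field needs: once set, the flag is never cleared by a later stable install. The state dict, the function name record_install, and the "beta" channel string are assumptions for illustration; only the "stable" channel name comes from the description above.

```python
def record_install(state: dict, channel: str) -> dict:
    """Set the flag on any non-stable install; never reset it afterwards."""
    if channel != "stable":
        state["unstable_os_installed"] = True
    else:
        # A stable install only initializes the flag; it does not clear it.
        state.setdefault("unstable_os_installed", False)
    return state


# Hypothetical usage: a beta install followed by a return to stable.
state = record_install({}, "beta")
record_install(state, "stable")
```

After the second call the device reports a stable channel, yet the flag remains true, which is the information the current version and channel fields cannot provide.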
