Skip to content

Guidelines for choosing logging levels

Bert Kleewein edited this page Aug 31, 2020 · 1 revision

High level view

  • logger.critical is used to report conditions that may be unrecoverable. It complete library/app failure is likely, then logger.critical should be used.
  • logger.error is used to report things that need the immediate attention of the app or library developers. If something doesn't warrent a new bug in some tracking system, it probably shouldn't be reported with logger.error
  • logger.warning is used to report things that are outside the bounds of "normal operation", but not problematic enough to warrant opening a new bug.
  • logger.info is used to report significant events in the flow of the library. We want logger.info to show what the app is doing, but not with so much detail that the meaning is lost.
  • logger.debug is used to report details on how the app is doing things.

Of these, the line between logger.info and logger.debug is the most vague. To decide between these two, use the following guidelines:

  • If in doubt, use logger.debug. We don't want to flood logger.info with too much detail.
  • If you're logging to prove that your algorithm is flowing corectly, use logger.debug.
  • If something calls into the library, either from above or below, logger.info might be appropriate. (unless the calls are really noisy)
  • If something external returns an error, logger.info might be appropriate. (unless we expect frequent errors in this condition and we recover)
  • If internal logic determines that an error occured, we report it using logger.info
  • If the flow of the program does something unexpected or out of the ordinary, logger.info might be appropriate.
  • If we get more than 1 or 2 logger.info calls per second, there are probably too many logger.info calls.

Some customers will redirect logger calls to a web service, such that each call to logger.foo will result in an HTTP call pushing the output string over the network. This is why we want to be conservative with calls to any level above logger.debug.

If you think about viewing the log as a tech on the customer side: logger.critical requires immediate action. Either a phone-call to the person on call or an emergency reboot. logger.error gets reported and handled as soon as possible. logger.warning may or may not even be noticed. logger.info is used when someone at the customer site wants to figure out what's going on. logger.debug is used when one of the library developers (or someone else with deep understanding) wants to figure out what's going on.

Double logging:

We want to avoid logging the same condition multiple times. This typically happens with errors that get returned, where a function logs an error, then returns the error, and the calling function logs it before returning, and so on.

To avoid this, we use the following convention:

  1. If we get an error from an external API, we log it and info level. We use `info because it is somewhat important to note that an external component failed, but it is not yet an error that we need to report to the caller.
  2. If we get an error from an internal function, we don't log it. We assume that the error was logged when it actually happened or when it was returned to us from external code.
  3. If we call handle_background_exception, we don't need to log because handle_background_exception will log it for us.

logger.critical

(very rare)

Something bad has happened and we can no longer guarantee that things are going to work.

Examples

  • threading.Thread() call raises an Exception

logger.error

(rare)

Something bad has happened, and we either couldn't report it to you or it's so bad that you need to investigate it.

Examples:

  • Unexpected state
    • operation still not handled at last stage
    • attempting to complete an already-completed op.
    • No onConnected callback is registered (error becasue our code always registers this).
  • Our code raises an exception inside of a transport callback, we catch it and use logger.error to report it, but don't want to crash the transport.

logger.warning

(very rare)

Something happened that we couldn't report to you, but we think it might not be that bad.

Examples:

  • Message received on a module input, but user is not subscribed for that input.

logger.info

(common)

Something significant happened, but you only get the high level info.

Examples:

  • User called into our code and we started something
  • We returned an error to the user
  • We called into the transport
  • The transport calls into us
  • An error was created inside our code out of something that wasn't an error. e.g. if queue_size == 0: raise Exception("queue should not be empty").
  • Errors that come from another layer do not warrant a logger.info call.
  • A new operation was initiated as a side-effect of another. e.g. starting a connection on publishing
  • A new event is crated as a side-effect of something unrelated.
  • Completion of an operation is cut-short. e.g. ConnectionOp() completes early because transport is already connected.
  • We receive a message on a topic that isn't registered or the MID or RID is unexpected
  • An op is put into a queue.
  • The queue unblocks and ops start again
  • A timer caused something new to start. e.g. "starting sas token renewal".

logger.debug

(super-duper common)

Anything else.

Examples:

  • Calling into some internal object.
  • An error is returned from some internal object
  • An internal object got a call from some other internal object
  • An internal object returned an error to some other internal object.
  • An operation is completed or ignored
  • Details on some internal structure or some internal processing. e.g. "topic parsed. method name=blah"
  • Progress report on some internal flow. (e.g. Calling into SSL, done initializing pipeline)
  • State changes. (e.g. "CONACK received. Setting connected=True")
  • Internal handoff (e.g. "incoming message put into message inbox")
  • Internal object created or destroyed.
  • Timer start, stop, run

~ ~