The etch status website should distinguish between transient and recurring failures. For example, a host that has failed only the last one or two times should be in one category, but after, say, three or more consecutive failures it should move to another category. If I were investigating, I'd want to focus on hosts that are failing repeatedly, since there are lots of reasons why a host might fail occasionally in a large environment.
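To make the proposed rule concrete, here is a minimal sketch in Python of how a host could be categorized from its recent results. It is not taken from the etch codebase: the function names, the category labels ("ok", "transient", "recurring"), the boolean result history, and the default threshold of three are all assumptions for illustration.

```python
from typing import Sequence


def consecutive_failures(results: Sequence[bool]) -> int:
    """Count failures at the end of the history.

    `results` is a hypothetical per-host history, ordered oldest to
    newest, where True means success and False means failure.
    """
    count = 0
    for ok in reversed(results):
        if ok:
            break  # the most recent success ends the failure streak
        count += 1
    return count


def classify_host(results: Sequence[bool], threshold: int = 3) -> str:
    """Return a category label for a host (labels are illustrative).

    `threshold` is the assumed number of consecutive failures at which
    a host moves from "transient" to "recurring".
    """
    failures = consecutive_failures(results)
    if failures == 0:
        return "ok"
    if failures < threshold:
        return "transient"
    return "recurring"
```

With these assumptions, `classify_host([True, False, False])` gives `"transient"` and `classify_host([False, False, False])` gives `"recurring"`. Only the streak of failures at the end of the history counts, so a single success resets a host to `"ok"`.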
Moved from https://sourceforge.net/apps/trac/etch/ticket/6