swidup edited this page Sep 12, 2013 · 5 revisions

General Guidance

Here are some guidelines and requirements that we have developed over time for coding public cases.

Public Knowledge Only

The VCDB is intended to be a research repository of public incident data. As such, people contributing data should not provide any information which is not available to the public and not contained in any of the reference URLs.

Process for Getting Involved

You'll need a GitHub user ID to participate. There are two ways to get involved: adding new candidate incidents to the Issues for this repository, or volunteering to code the incidents into VERIS format.

Adding New Candidate Incidents

Since this is a public repository, anyone can add new candidate incidents for consideration. Search on the organization's name within this repository first to make sure it isn't already listed. If it doesn't show up, create a new issue.

Coding Issues

If you want to help code the issues into VERIS format, we will need your GitHub user ID so that we can add you to the group that is allowed to claim issues. Send an email with your GitHub user ID to participate(at)vcdb(dot)org so that we can give you access. Please familiarize yourself with VERIS before claiming incidents. Questions can also be sent to the same address. Once you have access, follow the process below:

  • Find an unclaimed issue in GitHub and assign it to yourself
  • Code the incident in the Survey Gizmo tool and submit it.
  • Close the issue in GitHub.

The AwaitingDetails tag means that there isn’t enough information yet to code the incident, so we are letting it mature. The Update tag means the incident may already be coded in the JSON incidents, so it probably isn’t a good candidate to claim either. All the others are up for grabs. Some incidents may involve no data breach at all: we track other types of security events as well, and code them in VERIS whether or not they are breaches. Denial-of-service (DoS) attacks and defacements are frequent examples.
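The claiming rules above (skip issues that are already assigned or tagged AwaitingDetails or Update) can be expressed as a simple filter. This is a minimal sketch only: the issue records and their "labels"/"assignee" keys are hypothetical stand-ins for whatever the GitHub issue list returns, not a description of any tooling we use.

```python
# Hypothetical issue records; in practice these would come from this
# repository's GitHub issue list. The "labels"/"assignee" keys are assumptions.
EXCLUDED_TAGS = {"AwaitingDetails", "Update"}


def is_claimable(issue: dict) -> bool:
    """Return True if an issue is unassigned and has neither excluded tag."""
    if issue.get("assignee"):
        return False  # someone has already claimed it
    return not (set(issue.get("labels", [])) & EXCLUDED_TAGS)


issues = [
    {"title": "Example Corp defacement", "labels": [], "assignee": None},
    {"title": "Example Clinic laptop", "labels": ["AwaitingDetails"], "assignee": None},
    {"title": "Example Bank phishing", "labels": ["Update"], "assignee": None},
    {"title": "Example Shop breach", "labels": [], "assignee": "another-coder"},
]

claimable = [i["title"] for i in issues if is_claimable(i)]
print(claimable)  # ['Example Corp defacement']
```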

Right now we have the bulk of our older incidents waiting to be migrated over to GitHub, so there is a bit of a competition going for who can grab and code incidents. We hope to get them migrated in the next week. New issues are added to GitHub several times a day.

You won’t see your incident show up in the JSON until we do our regular export from the Survey Gizmo tool. These exports are done in batches, and the validation rules are applied at that time. While you are new to incident coding, we will review the incidents you code and provide feedback until you get up to speed on the nuances of VERIS.

Quality of the Dataset

There are three elements of quality that we are concerned about in this dataset. Here they are described in order from most important to least important.

  1. Correctness - Information in the database is accurate. A record does not indicate that physical theft was present when there was no physical theft.
  2. Consistency - Similar incidents should result in similar records. A record of a lost laptop computer should look like all the other records of a lost laptop computer. Another part of consistency is ensuring that every incident is internally consistent. If malware was one of the actions, then integrity should be one of the affected attributes. A record should not contradict itself.
  3. Completeness - All of the public information about an incident is captured. This may mean sifting through dozens of articles to find an additional nugget of information. That effort does not always offer a favorable time-to-value tradeoff, and should be skipped when it doesn't. However, all things being equal, a more complete record is better than a less complete one.
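Internal consistency rules like "if malware was one of the actions, integrity should be one of the affected attributes" lend themselves to automated checks. The sketch below assumes a VERIS-style JSON layout with top-level "action" and "attribute" objects; it illustrates the idea and is not the validation code applied during our exports.

```python
def consistency_problems(incident: dict) -> list:
    """Return internal contradictions found in a VERIS-style incident dict."""
    problems = []
    actions = incident.get("action", {})
    attributes = incident.get("attribute", {})
    # Rule from the guidance: if malware is an action, integrity must be affected.
    if "malware" in actions and "integrity" not in attributes:
        problems.append("malware action without an integrity attribute")
    return problems


record = {
    "action": {"malware": {"variety": ["Unknown"]}},
    "attribute": {"confidentiality": {}},
}
print(consistency_problems(record))  # ['malware action without an integrity attribute']
```

Further rules can be added as extra `if` checks in the same function, so each record gets one list of every contradiction found.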

Victim Names in the Database

If the victim is a government agency, try to be consistent in naming convention. Always use the two-letter ISO country code with no periods (e.g., "US Department of the Treasury" instead of "U.S." or "United States"). Beyond that, try to use the full and official name (e.g., "Department of the Treasury" rather than "US Treasury Department" or "U.S. Dept of Treasury").

Always use the full/official name of the victim rather than abbreviations or shortened versions. For organizations like IBM, where the acronym is the official name, use the acronym. This helps keep records consistent (it's hard for a database to know that "BOA" and "Bank of America" are the same entity).

Victim name is not a required field (in fact, the entire victim section is optional). If the victim's name is not known, leave the field blank. If you record something like "Unknown retailer," the systems won't be able to distinguish it from a real name, and it will be messy to get a clean list of victim names.
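A coder could sanity-check a victim name against the conventions above before submitting. This is a hedged sketch: the placeholder word list and the patterns for dotted or spelled-out country prefixes are illustrative assumptions, not an exhaustive or official rule set.

```python
import re
from typing import List, Optional

# Illustrative (not exhaustive) words suggesting a placeholder, not a real name.
PLACEHOLDER = re.compile(r"\b(unknown|unnamed|undisclosed)\b", re.IGNORECASE)
# Country prefixes written with periods, e.g. "U.S." or "U.K."
DOTTED_COUNTRY = re.compile(r"^(?:[A-Z]\.){2,}")


def victim_name_warnings(name: Optional[str]) -> List[str]:
    """Return reasons a victim name might break the naming conventions."""
    if not name:
        return []  # blank is the correct entry when the name is unknown
    warnings = []
    if PLACEHOLDER.search(name):
        warnings.append("looks like a placeholder; leave the field blank instead")
    if DOTTED_COUNTRY.match(name) or name.startswith("United States "):
        warnings.append("use the ISO country code without periods, e.g. 'US'")
    return warnings


print(victim_name_warnings("Unknown retailer"))
print(victim_name_warnings("U.S. Department of the Treasury"))
print(victim_name_warnings("US Department of the Treasury"))  # []
```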

Scenarios

Lost and Stolen Devices

Lost and stolen devices will always have the attributes of Confidentiality and Availability recorded. Confidentiality covers a loss of possession, while Availability captures not recovering the asset/data.
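The rule that lost and stolen devices always carry both Confidentiality and Availability can be checked mechanically. In this sketch we assume a VERIS-style record where a theft appears as "Theft" under action.physical.variety and a loss as "Loss" under action.error.variety; treat those field paths as assumptions to confirm against the VERIS schema.

```python
def is_lost_or_stolen(incident: dict) -> bool:
    """Assumed encoding: physical theft or an error of type 'Loss'."""
    action = incident.get("action", {})
    thefts = action.get("physical", {}).get("variety", [])
    errors = action.get("error", {}).get("variety", [])
    return "Theft" in thefts or "Loss" in errors


def missing_device_attributes(incident: dict) -> list:
    """For lost/stolen devices, list any required attribute that is absent."""
    if not is_lost_or_stolen(incident):
        return []
    present = incident.get("attribute", {})
    return [a for a in ("confidentiality", "availability") if a not in present]


lost = {
    "action": {"error": {"variety": ["Loss"]}},
    "attribute": {"confidentiality": {}},
}
print(missing_device_attributes(lost))  # ['availability']
```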

Don't automatically check "Yes" for data_disclosure. If there is no evidence of disclosure and no reason to believe the data is exposed to further loss, go with "No." If there's reason to think it might be or might have been exposed or compromised, but there is no hard evidence, go with "Potentially." If there is direct evidence or strong circumstantial evidence that it was disclosed, even on a small scale to a single unauthorized person, go with "Yes."
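The three-way data_disclosure decision reads naturally as an ordered check, strongest evidence first. The function and parameter names below are hypothetical; the coder still has to make the underlying judgment calls about evidence, and this only shows how those judgments map to a value.

```python
def choose_data_disclosure(evidence_of_disclosure: bool,
                           reason_to_suspect_exposure: bool) -> str:
    """Map the coder's judgment about evidence to a data_disclosure value."""
    # Direct or strong circumstantial evidence, even for one unauthorized person.
    if evidence_of_disclosure:
        return "Yes"
    # Plausible exposure or compromise, but no hard evidence.
    if reason_to_suspect_exposure:
        return "Potentially"
    # No evidence and no reason to believe the data is exposed.
    return "No"


print(choose_data_disclosure(False, True))  # Potentially
```

Checking the strongest evidence first means a record with hard evidence can never fall through to "Potentially."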

Brute Force Attacks

Any time an actor successfully uses a brute-force or dictionary attack, select "Yes" for data_disclosure with one credential, even if no other data was compromised. If an actor successfully used phishing, pretexting, keyloggers, etc. to obtain a credential, set data_disclosure to "Yes" for that as well. That may seem a bit odd, but we're looking for simple black-and-white rules here, and it is true that the actor now knows non-public data about the victim. This will make for more consistent results.
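The credential rule above can be sketched as a function that updates a VERIS-style record. Assumptions, stated loudly: the method labels in CREDENTIAL_METHODS are hypothetical shorthand rather than VERIS enumeration values, and the attribute.confidentiality.data layout with a "Credentials" variety and an "amount" field is our reading of the VERIS schema, to be confirmed before relying on it.

```python
# Hypothetical labels for how a credential was obtained (not VERIS enums).
CREDENTIAL_METHODS = {"Brute force", "Dictionary attack", "Phishing",
                      "Pretexting", "Keylogger"}


def apply_credential_rule(incident: dict, method: str, succeeded: bool) -> dict:
    """If a credential was successfully obtained, record one disclosed credential."""
    if not succeeded or method not in CREDENTIAL_METHODS:
        return incident  # failed attempts do not trigger the rule
    conf = incident.setdefault("attribute", {}).setdefault("confidentiality", {})
    conf["data_disclosure"] = "Yes"
    data = conf.setdefault("data", [])
    # Add the credential only if one isn't already recorded.
    if not any(d.get("variety") == "Credentials" for d in data):
        data.append({"variety": "Credentials", "amount": 1})
    return incident


print(apply_credential_rule({}, "Brute force", True))
```

The function modifies the record in place and also returns it, so it can be chained with other checks.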