E03: provide guidance on variable naming #76

Open
korenmiklos opened this issue Jul 14, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@korenmiklos
Contributor

  • Box 'Variable names':
    • The rules discussed for variable names are not exhaustive: a variable name must also not be one of the reserved names (see Stata Manual Section 11.3).
    • I would propose introducing a naming convention for variables so that students start off with good practice (for example: use only English words, avoid abbreviations, use only lowercase characters). This would also be a good chance to stress the importance of being consistent, whatever rule or system they follow.
  • In my opinion, special (non-ASCII) characters should not appear anywhere in a project, except in raw data we have no control over. This matters for backward compatibility, for cross-platform compatibility, and for colleagues, referees, and editors who don't speak the language.
  • legibility = readability?
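To make the complete rule set concrete, here is a minimal sketch in Python (not Stata) of a name check that includes the reserved names. It assumes the ASCII-only form of the rules as I recall them from Section 11.3: 1 to 32 characters, a letter or underscore first, then letters, digits, or underscores. Recent Stata versions also accept Unicode letters, which this sketch deliberately rejects. `is_valid_stata_name` is a hypothetical helper, and the reserved list should be checked against the manual for your Stata version.

```python
import re

# Reserved names as recalled from the Stata manual, Section 11.3 (assumption:
# verify against the manual for your Stata version). Stata is case-sensitive.
RESERVED = {
    "_all", "_b", "byte", "_coef", "_cons", "double", "float", "if", "in",
    "int", "long", "_n", "_N", "_pi", "_pred", "_rc", "_skip", "strL",
    "using", "with",
}

# ASCII-only rule: a letter or underscore first, then letters, digits or
# underscores, at most 32 characters in total.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")

# The manual's "str#" entry covers the string storage types str1, str2, ...;
# here any "str" followed by digits is treated as reserved.
STR_TYPE = re.compile(r"str[0-9]+")


def is_valid_stata_name(name: str) -> bool:
    """Return True if `name` passes the ASCII syntax and reserved-name checks."""
    if not NAME_PATTERN.fullmatch(name):
        return False  # bad first character, illegal character, or too long
    if name in RESERVED or STR_TYPE.fullmatch(name):
        return False  # syntactically fine, but reserved by Stata
    return True


if __name__ == "__main__":
    for candidate in ["real_gdp", "2nd_gdp", "_cons", "str10", "x" * 33]:
        print(candidate, is_valid_stata_name(candidate))
```

Checking the reserved list separately from the syntax pattern keeps the two kinds of failure distinct. `_cons` and `str10` look like perfectly ordinary names, which is exactly why the syntax rules alone are not exhaustive.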
@korenmiklos korenmiklos added the enhancement New feature or request label Jul 14, 2020
@korenmiklos korenmiklos added this to the EEA-2020 milestone Jul 14, 2020
@sergiocorreia

Agreed. On the naming convention, some advice would be to:

  • Stick to ASCII as much as possible. Recent versions of Stata allow UTF-8 names (e.g. gen fóßé = 2), but, as you said, that makes collaboration and debugging difficult.
  • Use snake case (lowercase words separated by underscores) instead of camel case or other alternatives, which some studies have found harder to read in code.
  • I can't remember where I read this, but StataCorp suggests putting the specific qualifier before the general term (specific-to-general naming). For instance, use nominal_gdp and real_gdp instead of gdp_nominal and gdp_real.
  • Lastly, although names can be up to 32 characters long, it's best to avoid very long names. They increase reliance on abbreviations, and I've seen my share of bugs caused by abbreviations. In fact, nowadays I always start with set varabbrev off, to the dismay of my coauthors.
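Three of these points (ASCII only, snake case, moderate length) are mechanical enough to check automatically. The following is a minimal Python sketch of such a style check, not a Stata feature. `naming_warnings` is a hypothetical helper, and the 20-character soft limit is an arbitrary illustrative assumption, since the advice only says "not too long".

```python
import re

# Hypothetical soft limit: an arbitrary illustrative threshold, well under
# Stata's hard limit of 32 characters.
SOFT_MAX_LENGTH = 20

# snake_case: lowercase ASCII words (letters and digits) separated by single
# underscores, starting with a letter.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def naming_warnings(name: str, soft_max: int = SOFT_MAX_LENGTH) -> list:
    """Return a list of style warnings for a variable name (empty if none)."""
    warnings = []
    if not name.isascii():
        warnings.append("non-ASCII characters")
    elif not SNAKE_CASE.fullmatch(name):
        # Catches CamelCase, UPPERCASE, double or trailing underscores, etc.
        warnings.append("not snake_case")
    if len(name) > soft_max:
        warnings.append(f"longer than {soft_max} characters")
    return warnings


if __name__ == "__main__":
    for name in ["nominal_gdp", "NominalGDP", "pib_réel",
                 "nominal_gross_domestic_product", "ngdp", "gdp_nominal"]:
        print(name, naming_warnings(name))
```

Note what such a check cannot do. It accepts cryptic abbreviations like ngdp, and it cannot tell nominal_gdp from gdp_nominal. Those remain judgment calls that are better taught by example.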

Maybe one way of presenting these is through examples? Say we have nominal and real GDP as above. A table would be a way of showing rather than telling:

| Definition | Nominal GDP | Real GDP |
|---|---|---|
| Suggested | nominal_gdp | real_gdp |
| Too short | ngdp | rgdp |
| Too long | nominal_gross_domestic_product | real_gross_domestic_product |
| Less readable: CamelCase | NominalGDP / NominalGdp | RealGDP / RealGdp |
| Less readable: general-to-specific | gdp_nominal | gdp_real |
| Less portable: UTF-8 | pib_nominale | pib_réel |

@korenmiklos korenmiklos removed this from the EEA-2020 milestone Aug 13, 2020