Variable/Identifier Name Analysis

Source code is for humans

This is a project I have been thinking about for some time. It explores the human-communication side of programming: the parts of source code that are not reserved keywords or syntax, but identifiers that people choose in creative or appropriate human language.

Potential uses: find commonly misused or improvable naming patterns, identify high- or low-quality code and commits, and look for patterns related to code authorship, training, or sibling projects.

Steps

acquire repo/directory/files
read source code files
extract variables, array keys, function names
analyze strings
- split multi-word names (e.g. camelCase => camel case)
- show list of files/projects containing one or more strings
- view as tag clouds
- count occurrences in the corpus and per file; use TF-IDF to relate "similar" files/projects
create a report
add modules/functions/scripts/REST APIs to find identifiers in other languages
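The splitting and TF-IDF steps above could be sketched roughly as follows. This is only an illustration, not the project's code: the function names (`split_identifier`, `tf_idf`, `cosine`) and the input shape (a dict mapping file name to a list of identifiers) are assumptions made for the example, and the TF-IDF variant used here (term frequency times log of inverse document frequency) is one of several common ones.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split camelCase, PascalCase and snake_case names into lowercase words."""
    # Acronym runs (HTTP), capitalized or lowercase words, and digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def tf_idf(files):
    """files: hypothetical mapping of file name -> list of identifier names.

    Returns a mapping of file name -> {word: tf-idf score}.
    """
    word_counts = {
        f: Counter(w for n in names for w in split_identifier(n))
        for f, names in files.items()
    }
    # Number of files in which each word appears at least once.
    doc_freq = Counter(w for counts in word_counts.values() for w in counts)
    n_files = len(files)
    scores = {}
    for f, counts in word_counts.items():
        total = sum(counts.values())
        scores[f] = {
            w: (c / total) * math.log(n_files / doc_freq[w])
            for w, c in counts.items()
        }
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse {word: score} vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

Files whose score vectors have a higher cosine similarity share more of their distinctive vocabulary, which is one possible way to relate "similar" files or projects.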

Beyond

attempt to classify source code application by its identifiers
vote for good/bad names
rate repo on naming readability and descriptive precision
extend beyond PHP with existing static analysis tools/abstract syntax trees

Thoughts:

PHP is easy ($xxx, function xxx(...), ->xxx, ['xxx']) (Bash/Perl similar)
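As a rough illustration of why PHP is easy, the four shapes listed above can each be matched with a small regular expression. The sketch below is an assumption-laden example, not the contents of varnames.sh: the names `PATTERNS` and `extract_php_identifiers` are made up here, the name pattern is ASCII-only, and plain regexes will also match inside comments and string literals.

```python
import re

# ASCII-only identifier pattern (a simplifying assumption for this sketch).
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

PATTERNS = {
    "variable": re.compile(r"\$(" + NAME + r")"),                    # $xxx
    "function": re.compile(r"\bfunction\s+&?\s*(" + NAME + r")\s*\("),  # function xxx(...)
    "member": re.compile(r"->\s*(" + NAME + r")"),                    # ->xxx
    "array_key": re.compile(r"\[\s*['\"](" + NAME + r")['\"]\s*\]"),  # ['xxx']
}


def extract_php_identifiers(source):
    """Return {kind: [names in order of appearance]} for a PHP source string."""
    return {kind: pattern.findall(source) for kind, pattern in PATTERNS.items()}


sample = """<?php
function shortenUrl($longUrl) {
    $row = $this->db->findByUrl($longUrl);
    return $row['short_code'];
}
"""

for kind, names in extract_php_identifiers(sample).items():
    print(kind, names)
```

A parser-based approach would avoid the false positives from comments and strings, at the cost of needing PHP tooling.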

Java is pretty easy, with declarations ([scope] TypeName xxx, [scope] ReturnType xxx(...))

Other languages: not sure yet. This should make use of native tooling (compilers, analyzers); there may be libraries that extract these and more details from many or all languages.

analyze and report as above for PHP

Prototype/Proof-of-Concept

varnames.sh - proof of concept with basic PHP variable regex only

varnames.txt - raw output

varnames_word_parts.txt - after programming-case conversion tool

varnames_word_parts_counts.txt - word parts with corpus frequency
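The last step, going from word parts to corpus frequencies, might look something like the sketch below. The file format is an assumption: it treats the word-parts file as one identifier per line with its words separated by whitespace, which may not match the actual files. The function names are hypothetical.

```python
from collections import Counter


def count_word_parts(lines):
    """Count how often each word part occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def format_counts(counts):
    """Render counts as 'count<TAB>word' lines, most frequent first."""
    # Ties are broken alphabetically so the output order is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{n}\t{w}" for w, n in ordered]


if __name__ == "__main__":
    # Assumed input: word parts separated by spaces, one name per line.
    example = ["short url", "url id", "long url"]
    for row in format_counts(count_word_parts(example)):
        print(row)
```

To run it against a real file, pass an open file object to `count_word_parts`, since iterating a file yields its lines.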

The PoC corpus comes from URL shortener repos on GitHub (~700 repos, but only a fraction of them in PHP; approximately 5000 PHP files).


techbio/varnames
