Add .C as a cpp extension and make extensions case-sensitive #1025
Motivation
It turns out that some projects use a capital .C extension for C++ source files.
One example is the OpenFOAM project, a widely used open-source package for physics simulations.
Although I do not know of other projects that follow this convention, it is worth noting that both GitHub and cloc recognize .C as a C++ extension. This is shown in more detail in issue #1024.
Implementation
The file languages.json contains a list of extensions for each supported language, so adding ".C" should have been enough for this feature. However, the code was converting extensions to lowercase, which makes .c and .C indistinguishable. For this reason the .to_lower call was also removed from the get_extension utility function.
The to_lower call was probably added for a purpose. To avoid breaking existing functionality, I added a second check: if the case-sensitive lookup returns no result, the lookup is repeated with the lowercased extension. This makes the code slightly slower, but only when the case-sensitive check fails, and the repeated check is fast.
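The two-step lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the EXTENSIONS table, lookup_exact, and language_from_path are hypothetical names, and the table is a tiny stand-in for the data in languages.json.

```rust
use std::path::Path;

/// Hypothetical stand-in for the extension table built from languages.json.
/// Note that "c" and "C" are distinct entries.
const EXTENSIONS: &[(&str, &str)] = &[
    ("c", "C"),
    ("C", "Cpp"),
    ("cpp", "Cpp"),
    ("rs", "Rust"),
];

/// Exact, case-sensitive match of an extension against the table.
fn lookup_exact(ext: &str) -> Option<&'static str> {
    EXTENSIONS
        .iter()
        .find(|(known, _)| *known == ext)
        .map(|(_, lang)| *lang)
}

/// Try the extension as written first; only if that fails,
/// retry with the lowercased extension (the previous behaviour).
fn language_from_path(path: &Path) -> Option<&'static str> {
    let ext = path.extension()?.to_str()?;
    lookup_exact(ext).or_else(|| lookup_exact(&ext.to_lowercase()))
}
```

With this ordering, a path such as solver.C resolves through the case-sensitive entry, while something like MAIN.CPP has no exact entry and is resolved by the lowercase fallback, so files that were matched before the change should still be matched.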
Testing
Tests are generated by the build.rs script from the files present in tests/data. To test my solution I copied tests/data/cpp.cpp to a file called tests/data/cpp_C.C.
Adding this file without modifying the code causes a failing test, which is what we want. I think this is mostly a coincidence: for some reason the C language summary counts blank lines differently than the Cpp summary.
I think a more appropriate test would require each of these files to also contain the name of the language it is meant to be, as part of the top comment. This would require additional code changes, which I think are a bit outside the scope of this change. I intend to open a separate PR adding this to the tests, if that is OK.
Observation: because this new file is a copy of an existing one, it does not occupy extra space in the Git object store, which stores file contents by content hash.
Additional considerations
On Windows the file system is case-insensitive but case-preserving. I believe this means that the current version of the code will correctly identify .C as cpp on Windows, but I do not have a Windows system to test this on.