
Add ".C" as a C++ extension #1024

Open
glazari opened this issue Aug 13, 2023 · 0 comments

glazari commented Aug 13, 2023

The OpenFOAM software uses the peculiar convention of marking its C++ files with a capital ".C" extension.

https://github.com/OpenFOAM/OpenFOAM-dev

GitHub recognizes the project as being primarily C++ based.

Running tokei, however, reports it as primarily C code:

➜  OpenFOAM-11 git:(master) tokei
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Alex                    1           25           20            0            5
 BASH                   16         2992         1641          899          452
 C                    4062       877863       577069       157483       143311
 C Header             4137       626652       252159       236572       137921
 CMake                   5          239          191           18           30
 C Shell                 1           52            0           47            5
 C++                     2         1050          727          102          221
 CSS                     1           68           52            6           10
...
===============================================================================
 Total                8929      1534683       846827       401359       286497
===============================================================================

cloc, on the other hand, correctly recognizes the project as primarily C++:

➜  OpenFOAM-11 git:(master) cloc .
   15074 text files.
   13298 unique files.
    6423 files ignored.

github.com/AlDanial/cloc v 1.90  T=3.04 s (2876.3 files/s, 503256.6 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C++                            4062         161063         140143         577250
C/C++ Header                   4135         167367         207285         251901
Bourne Shell                    447           3356           4493           9588
liquid                           63            299              0           2505
Bourne Again Shell               16            452            883           1657
XML                               4             27             34            339
lex                               1            101             73            332
C                                 2             94            110            253
awk                               4             34            120            194
CMake                             5             30             18            191
...
--------------------------------------------------------------------------------
SUM:                           8752         332929         353496         844873
--------------------------------------------------------------------------------

I am not aware of any other projects that use this ".C" extension for C++, but OpenFOAM is a very popular open source simulation package widely used in industry. Also, given that both GitHub and cloc correctly identify the project as C++, I think it makes sense to include this as an extension.

Am I missing some reason why this extension should not be added? Does it conflict with the conventions of some other C project?

I can help with the implementation if there is agreement that this is a good addition.

My version of tokei is:

➜ tokei --version
tokei 12.1.2 compiled with serialization support: json
glazari added a commit to glazari/tokei that referenced this issue Aug 14, 2023
# Motivation
It turns out that some projects use a capital .C extension for C++ files.

This is true for the OpenFOAM project, which is widely used open
source software for physics simulations.

Although I do not know of other projects that follow this convention,
it is worth noting that both GitHub and `cloc` recognize the .C
extension as a C++ file. This was shown in more detail in this issue:

XAMPPRocky#1024

# Implementation

The file `languages.json` contains a list of extensions for each
supported language, so adding ".C" should have been enough for this
feature. However, the code was converting extensions to lowercase, which
makes .c and .C indistinguishable. Because of this, the `.to_lower` call
was also removed from the `get_extension` utility function.
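To illustrate why lowercasing collapses the two extensions, here is a
minimal Rust sketch using only `std::path::Path`. Both helpers are
hypothetical stand-ins for illustration, not tokei's actual
`get_extension`:

```rust
use std::path::Path;

/// Hypothetical helper that lowercases the extension
/// (the behaviour this commit removes). Not tokei's actual code.
fn extension_lowercased(path: &Path) -> Option<String> {
    path.extension().map(|e| e.to_string_lossy().to_lowercase())
}

/// Hypothetical helper that keeps the extension exactly as written.
fn extension_case_preserving(path: &Path) -> Option<String> {
    path.extension().map(|e| e.to_string_lossy().into_owned())
}

fn main() {
    let c_file = Path::new("main.c");
    let cpp_file = Path::new("solver.C");

    // With lowercasing, both files map to "c" and cannot be told apart.
    assert_eq!(extension_lowercased(c_file), extension_lowercased(cpp_file));

    // Without it, ".c" and ".C" stay distinct.
    assert_ne!(
        extension_case_preserving(c_file),
        extension_case_preserving(cpp_file)
    );

    // prints: Some("c") vs Some("C")
    println!(
        "{:?} vs {:?}",
        extension_case_preserving(c_file),
        extension_case_preserving(cpp_file)
    );
}
```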

The `to_lower` was probably added for a purpose. In order not to break
existing functionality, I added a second check: if the case-sensitive
lookup does not return any result, a second lookup is done with the
lowercased extension. This makes the code a tiny bit slower, but only
when the case-sensitive check fails, and it only repeats a rather fast
check.
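The two-step lookup described above can be sketched roughly as follows.
The `Lang` enum, the `demo_table` contents and the `lookup` function are
all hypothetical; tokei builds its real extension table from
`languages.json`, and this sketch makes no claim about how that code is
structured:

```rust
use std::collections::HashMap;

/// Hypothetical language enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    C,
    Cpp,
}

/// Hypothetical extension table. Keys are stored exactly as written,
/// so "c" (C) and "C" (C++) are separate entries.
fn demo_table() -> HashMap<&'static str, Lang> {
    [("c", Lang::C), ("h", Lang::C), ("C", Lang::Cpp), ("cpp", Lang::Cpp)]
        .into_iter()
        .collect()
}

/// Case-sensitive lookup first; only if that fails, retry with the
/// lowercased extension.
fn lookup(table: &HashMap<&'static str, Lang>, ext: &str) -> Option<Lang> {
    table
        .get(ext)
        .copied()
        .or_else(|| table.get(ext.to_lowercase().as_str()).copied())
}

fn main() {
    let table = demo_table();
    assert_eq!(lookup(&table, "c"), Some(Lang::C)); // exact match
    assert_eq!(lookup(&table, "C"), Some(Lang::Cpp)); // exact match, case matters
    assert_eq!(lookup(&table, "CPP"), Some(Lang::Cpp)); // falls back to "cpp"
    assert_eq!(lookup(&table, "H"), Some(Lang::C)); // falls back to "h"
    assert_eq!(lookup(&table, "rs"), None); // unknown extension
    println!("all lookups behaved as expected");
}
```

In this sketch the fallback only finds entries whose keys are stored in
lowercase, so every key except deliberately case-sensitive ones like "C"
is assumed to be lowercase.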

# Testing

Tests are created by the `build.rs` script according to the files
present in `tests/data`. To test my solution, I copied
`tests/data/cpp.cpp` into a file called `tests/data/cpp_C.C`.

Adding this file without modifying the code causes a failing test,
which is what we want. I think this is mostly a coincidence: for some
reason, the C language summarizes blank lines differently than the C++
language does.

I think a more appropriate test would be for each of these files to
also contain, in its top comment, the name of the language it is meant
to be detected as. This would require additional code changes, which I
think are a bit outside the scope of this change. I do intend to create
a separate PR adding this to the tests, if that is OK.
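A rough sketch of what such a check could look like, assuming a made-up
`language:` marker on the first line of each test file. The marker
format, the helper names and the language name "Cpp" are all invented
for this example:

```rust
/// Hypothetical helper: reads the expected language name from an
/// invented `language:` marker on the file's first line.
fn expected_language(source: &str) -> Option<&str> {
    let first_line = source.lines().next()?;
    let (_, rest) = first_line.split_once("language:")?;
    let name = rest.trim();
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

/// Hypothetical check: does the marker agree with the detected language?
fn marker_matches(source: &str, detected: &str) -> bool {
    expected_language(source) == Some(detected)
}

fn main() {
    let sample = "// language: Cpp\nint main() { return 0; }\n";
    assert_eq!(expected_language(sample), Some("Cpp"));
    assert!(marker_matches(sample, "Cpp"));
    // A file copied to the wrong extension would be caught here.
    assert!(!marker_matches(sample, "C"));
    // Files without a marker yield no expectation.
    assert_eq!(expected_language("int x;\n"), None);
}
```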

Note: since this new file is a copy of an existing one, it does not
take up extra space in git's object store, because git stores file
contents by their content hash.

# Additional considerations

On Windows, the file system is case-insensitive but case-preserving.
I believe this means that the current version of the code will
correctly identify .C files as C++ on Windows, but I do not have a
Windows system to test this on.
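Someone with a Windows machine could check the case-preserving part of
this assumption with a small program like the following. It is only a
sketch: the scratch directory name is arbitrary, and it checks what the
directory listing reports, not tokei itself:

```rust
use std::fs;
use std::io;

/// Creates a file called `name` in a fresh scratch directory, lists the
/// directory, and returns the extension as the file system reports it.
fn listed_extension(name: &str) -> io::Result<Option<String>> {
    // Arbitrary scratch directory, unique per process.
    let dir = std::env::temp_dir().join(format!("case_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir); // start from an empty directory
    fs::create_dir_all(&dir)?;
    fs::write(dir.join(name), "int main() { return 0; }\n")?;

    // The directory holds exactly one file, so this sees one entry.
    let mut ext = None;
    for entry in fs::read_dir(&dir)? {
        let path = entry?.path();
        ext = path.extension().map(|e| e.to_string_lossy().into_owned());
    }
    fs::remove_dir_all(&dir)?;
    Ok(ext)
}

fn main() -> io::Result<()> {
    // On a case-preserving file system the listing keeps "C" as written.
    let ext = listed_extension("cpp_C.C")?;
    println!("{:?}", ext);
    assert_eq!(ext.as_deref(), Some("C"));
    Ok(())
}
```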