Activity 2: Analysis #2

Open
selkins13 opened this issue Nov 20, 2020 · 20 comments

@selkins13
Collaborator

Duration: 20 minutes

We will use several analysis tools to identify problematic practices in a repository. To do so, we will use the following commands:

  • git-sizer
  • git-find-dirs-many-files
  • git-find-lfs-extensions
  • git-find-dirs-unwanted
  • git-filter-repo

Before starting any analysis, pick one repository of your preference that you would like to analyze.

⚠️ Throughout this exercise, make sure you don't post any private information that should not be shared publicly.

Clone this repository; we have added all the tools to it to make the workshop more convenient:

# Clone the repository
git clone https://github.com/githubuniverseworkshops/grafting-monorepos.git


#  or use the GitHub CLI
gh repo clone githubuniverseworkshops/grafting-monorepos

Stats of repo size: git-sizer

  1. Download the corresponding compiled version of git-sizer.

Optionally, you can install git-sizer using Homebrew if you are on a Mac.

  2. Run the tool from the root of the repository to analyze:
/path/to/git-sizer --verbose

Find files that should be in LFS: git-find-lfs-extensions

  1. Check out the grafting-monorepos repository
  2. Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-lfs-extensions

Print directories with the number of files contained: git-find-dirs-many-files

  1. Check out the grafting-monorepos repository
  2. Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-dirs-many-files

Find dirs that should not be committed: git-find-dirs-unwanted

  1. Check out the grafting-monorepos repository
  2. Run the tool from the root of the repository to analyze:
/path/to/grafting-monorepos/scripts/git-find-dirs-unwanted | head -n 15            

Analyze the repository: git-filter-repo --analyze

  1. Clone the git-filter-repo tool
  2. Execute the tool from the linux repository
/path/to/git-filter-repo/git-filter-repo --analyze

Report out

Report your findings from the above commands in the comments section below. Be sure to include answers to the following questions in your comments, if possible:
- Do you find any patterns?
- Was there anything surprising?

⚠️ Throughout this exercise, make sure you don't post any private information that should not be shared publicly.

For examples and more information, please see README.md -> Activity 2.

@toddocon

Here's my git-sizer output:

Processing blobs: 186589                        
Processing trees: 323092                        
Processing commits: 51356                        
Matching commits to trees: 51356                        
Processing annotated tags: 42                        
Processing references: 783                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  51.4 k   |                                |
|   * Total size               |  13.4 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |   323 k   |                                |
|   * Total size               |   237 MiB |                                |
|   * Total tree entries       |  6.14 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |   187 k   |                                |
|   * Total size               |  51.4 GiB | *****                          |
| * Annotated tags             |           |                                |
|   * Count                    |    42     |                                |
| * References                 |           |                                |
|   * Count                    |   783     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  2.08 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |  1.58 k   | *                              |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   198 MiB | ********************           |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  11.1 k   |                                |
| * Maximum tag depth      [5] |     1     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [6] |  5.08 k   | **                             |
| * Maximum path depth     [6] |    25     | **                             |
| * Maximum path length    [6] |   280 B   | **                             |
| * Number of files        [7] |  29.2 k   |                                |
| * Total size of files    [8] |  8.30 GiB | ********                       |
| * Number of symlinks     [9] |   175     |                                |
| * Number of submodules  [10] |    20     |                                |

@larsxschneider
Collaborator

@toddocon That looks good in general. The 200MB file might be a good candidate for Git LFS.
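To pull only the flagged rows out of a git-sizer table like the one above, something along these lines might help. It is a minimal sketch that assumes the pipe-delimited layout shown in this thread; `concern_rows` is a hypothetical helper name, not part of git-sizer.

```shell
# Print only the git-sizer rows whose "Level of concern" column is non-empty.
# Assumes the three-column pipe table printed by `git-sizer --verbose`.
concern_rows() {
  awk -F'|' '
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    $4 ~ /[*!]/ {
      name = $2
      gsub(/ +/, " ", name)       # squeeze the column padding
      name = trim(name)
      sub(/^\* /, "", name)       # drop the leading bullet
      printf "%s: %s (%s)\n", name, trim($3), trim($4)
    }'
}
```

Usage would be `git-sizer --verbose | concern_rows`, run from the root of the repository being analyzed.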

@alubchuk

Here is my git-sizer output:

Processing blobs: 51587
Processing trees: 100112
Processing commits: 15133
Matching commits to trees: 15133
Processing annotated tags: 17
Processing references: 785
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  15.1 k   |                                |
|   * Total size               |  6.52 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |   100 k   |                                |
|   * Total size               |  42.2 MiB |                                |
|   * Total tree entries       |  1.16 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  51.6 k   |                                |
|   * Total size               |  1.21 GiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |    17     |                                |
| * References                 |           |                                |
|   * Count                    |   785     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  33.5 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |    87     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  20.3 MiB | **                             |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  4.27 k   |                                |
| * Maximum tag depth      [5] |     1     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [6] |   623     |                                |
| * Maximum path depth     [7] |    10     | *                              |
| * Maximum path length    [8] |   111 B   | *                              |
| * Number of files        [6] |  2.81 k   |                                |
| * Total size of files    [9] |  37.9 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

@onetrickwolf

My results 😬

git-sizer

git-sizer --verbose
Processing blobs: 19360                        
Processing trees: 34523                        
Processing commits: 7588                        
Matching commits to trees: 7588                        
Processing annotated tags: 0                        
Processing references: 116                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  7.59 k   |                                |
|   * Total size               |  3.02 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  34.5 k   |                                |
|   * Total size               |  16.5 MiB |                                |
|   * Total tree entries       |   444 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  19.4 k   |                                |
|   * Total size               |  1.88 GiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |   116     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  39.5 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   193     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   521 MiB | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  3.20 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |   218     |                                |
| * Maximum path depth     [6] |    10     | *                              |
| * Maximum path length    [7] |   138 B   | *                              |
| * Number of files        [5] |  1.26 k   |                                |
| * Total size of files    [8] |   813 MiB |                                |
| * Number of symlinks     [9] |     1     |                                |
| * Number of submodules       |     0     |                                |

git-find-lfs-extensions

/Users/brentconn/IdeaProjects/grafting-monorepos/scripts/git-find-lfs-extensions

Type           Extension              LShare    LCount     Count      Size       Min       Max
-------        ---------             -------   -------   -------   -------   -------   -------
all            *                       1.0 %        17      1141       144         0       105
binary         bson                   16.0 %         2        12       107         0       105
binary         gz                     10.0 %         4        39         7         0         2
binary         png                     3.0 %         5       146        18         0         0
text           json                    3.0 %         4       101         4         0         1
text           js                      0.0 %         1       553         3         0         1
text           svg                    12.0 %         1         8         1         0         0

Can't post much else due to sensitive information :)

@toddocon

toddocon commented Dec 11, 2020

Extensions:

% ./git-find-lfs-extensions 

Type             Extension                      LShare    LCount     Count      Size       Min       Max
-------          ---------                     -------   -------   -------   -------   -------   -------
all              *                               4.0 %       534     11707      3351         0       149
binary           dn                            100.0 %        94        94      1034         2       149
binary           dll                            72.0 %       108       149       497         0        42
binary           lib                            43.0 %        16        37       271         0        93
binary           so                             35.0 %        32        90       164         0        38
binary           png                             8.0 %        40       493       146         0        21
binary           dylib                          70.0 %        19        27       131         0        32
binary           woff                           64.0 %         9        14       122         0        13
binary           otf                            70.0 %         7        10       116         0        16
binary           psd                            20.0 %        13        62        91         0        51
binary           gif                            74.0 %        69        93        86         0         3
binary           a                              44.0 %         4         9        62         0        31
binary           jar                           100.0 %         1         1        59        59        59
binary           dds                            34.0 %         8        23        57         0        16
binary           exe                            76.0 %        10        13        42         0        14
binary           pdb                           100.0 %         7         7        33         0        13
binary           8bf                           100.0 %         2         2        23         7        15
binary           exr                            27.0 %         9        33        18         0         5
binary           jpg                             7.0 %         3        42        15         0         9
binary           glb                            11.0 %         2        17        11         0         6
binary           bin                            28.0 %         2         7         6         0         3
binary           gz                            100.0 %         3         3         4         0         2
binary           mod                            25.0 %         3        12         4         0         1
binary           eps                           100.0 %         1         1         2         2         2
binary           pdf                            13.0 %         2        15         3         0         0
binary           texture                        50.0 %         1         2         1         0         1
binary           csf                            40.0 %         2         5         1         0         0
binary           tga                           100.0 %         1         1         0         0         0
binary           blend                         100.0 %         1         1         0         0         0
binary w/o ext   ZZZZpm                        100.0 %         6         6        33         2         9
binary w/o ext   ZZZZZ_caps                    100.0 %        10        10        32         1         8
binary w/o ext   standard multiplugin          100.0 %         2         2        25        11        13
binary w/o ext   doxygen                       100.0 %         1         1        17        17        17
binary w/o ext   amtlib                        100.0 %         4         4         8         2         2
binary w/o ext   getZZZZZZ                     100.0 %         1         1         6         6         6
binary w/o ext   updaternotifications          100.0 %         4         4         2         0         0
binary w/o ext   bcp                           100.0 %         1         1         0         0         0
text             obj                            27.0 %         3        11        20         0        18
text             h                               0.0 %        12      3819        57         0         6
text             fbx                            25.0 %         2         8         9         0         8
text             c                               0.0 %         1       154        10         0         6
text             hpp                             0.0 %         2       742        16         0         2
text             hdr                           100.0 %         3         3         4         0         2
text             js                              0.0 %         2       340        18         0         1
text             html                            0.0 %         1       287         5         0         2
text             dat                            50.0 %         4         8         2         0         0
text             cpp                             0.0 %         2      1851        30         0         1
text             f90                             5.0 %         1        19         1         0         0
text             fi                              6.0 %         1        16         0         0         0
text w/o ext     configure                     100.0 %         2         2         1         0         0

@larsxschneider
Collaborator

@alubchuk your repo looks really good. Again, the 20 MB file might be a good candidate for LFS. In general, I recommend using LFS for files >1 MB that are changed regularly.
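One way to see which files cross that 1 MB line is to list every blob in the history with its size and filter by a threshold. The sketch below uses `git rev-list --objects` and `git cat-file --batch-check`; the function names `list_objects` and `large_blobs` are made up for illustration. It only checks size, not how often a file changes, so treat the output as a list of candidates.

```shell
# Emit one line per object reachable from any ref: "type sha size path".
list_objects() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'
}

# Keep blobs at or above a byte threshold (default 1 MiB), largest first.
# Reads "type sha size path" lines on stdin and prints "size path".
large_blobs() {
  threshold=${1:-1048576}
  awk -v min="$threshold" '
    $1 == "blob" && $3 + 0 >= min + 0 {
      size = $3
      $1 = $2 = $3 = ""           # blank the first three fields
      sub(/^ +/, "")              # what remains is the path
      print size, $0
    }' | sort -rn
}
```

For example, `list_objects | large_blobs | head -n 10` would show the ten largest blobs of 1 MiB or more.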

@larsxschneider
Collaborator

@onetrickwolf A 521 MB file is definitely a candidate for LFS.

@larsxschneider
Collaborator

@toddocon You have a lot of compiled libraries like dll, lib, so in your repo. If possible, it would be good to move them to an artifact store or use Git LFS for them.
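For the Git LFS route, tracking an extension comes down to a line in `.gitattributes`. The sketch below prints the lines that `git lfs track "*.dll"` and similar calls would normally write; `lfs_attr_lines` is a hypothetical helper, and the extension list is only an example taken from the table above.

```shell
# Print .gitattributes entries that route the given extensions through Git LFS.
# These match what `git lfs track "*.<ext>"` appends to .gitattributes.
lfs_attr_lines() {
  for ext in "$@"; do
    printf '*.%s filter=lfs diff=lfs merge=lfs -text\n' "$ext"
  done
}
```

Something like `lfs_attr_lines dll lib so dylib >> .gitattributes` would cover the compiled libraries. Note that tracking only affects new commits; files already in the history stay where they are until the history is rewritten.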

@larsxschneider
Collaborator

@toddocon I don't know what dn files are... but all of them in your repo are larger than 500kb (the default cut off)... and in total they consume 1GB in your repo. LFS candidate 👍

binary           dn                            100.0 %        94        94      1034         2       149
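The LShare column appears to be LCount divided by Count, rounded down. That is inferred from the rows in this thread (for example dll: 108 of 149 gives 72 %), not taken from the script's documentation, so treat it as an assumption. A small sketch of the arithmetic, with `lshare` as a hypothetical helper name:

```shell
# Share of files above the size cut-off, as a whole percentage (rounded down).
# Arguments: number of large files, total number of files.
lshare() {
  awk -v large="$1" -v total="$2" 'BEGIN {
    if (total + 0 == 0) { print "n/a"; exit }
    printf "%d %%\n", int((100 * large) / total)
  }'
}
```

With the dn row, `lshare 94 94` gives 100 %, meaning every dn file is above the cut-off.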

@toddocon

Which of these result files are of the most interest?

% time git-filter-repo --analyze

Processed 561079 blob sizes
Processed 51072 commits
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 12580 and retry the command.
Processed 51356 commits
Writing reports to .git/filter-repo/analysis...done.
git-filter-repo --analyze  956.38s user 57.45s system 98% cpu 17:05.08 total


README
blob-shas-and-paths.txt
directories-all-sizes.txt
directories-deleted-sizes.txt
extensions-all-sizes.txt
extensions-deleted-sizes.txt
path-all-sizes.txt
path-deleted-sizes.txt
renames.txt

@larsxschneider
Collaborator

larsxschneider commented Dec 11, 2020

The first 3 lines of each of those could be interesting:

directories-all-sizes.txt
directories-deleted-sizes.txt
extensions-all-sizes.txt
extensions-deleted-sizes.txt
path-all-sizes.txt
path-deleted-sizes.txt
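A quick way to grab those lines could look like the sketch below. It assumes the reports live in `.git/filter-repo/analysis` (the path printed by the tool above) and that each report starts with two header lines, as the excerpts in this thread suggest; `show_top` is a hypothetical helper name.

```shell
# Print the header plus the first N data rows of each size report.
# Arguments: report directory (default .git/filter-repo/analysis), rows (default 3).
show_top() {
  dir=${1:-.git/filter-repo/analysis}
  rows=${2:-3}
  for name in directories-all-sizes directories-deleted-sizes \
              extensions-all-sizes extensions-deleted-sizes \
              path-all-sizes path-deleted-sizes; do
    file="$dir/$name.txt"
    [ -f "$file" ] || continue          # skip reports that were not written
    printf '==> %s <==\n' "$name.txt"
    head -n "$((rows + 2))" "$file"     # two header lines, then the data rows
    printf '\n'
  done
}
```

Running `show_top` from the root of the analyzed repository prints the six excerpts in one go. Remember to replace any sensitive path names before posting.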

@toddocon

toddocon commented Dec 11, 2020

Example from directories-deleted-sizes.txt:

=== Deleted directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
  6411041769 3865142944 2019-11-19 apps/path1
  2526122419 2121845383 2019-11-05 apps/path2
  1721073323 1670074223 2018-08-27 apps/path3

Example from directories-all-sizes.txt:

=== All directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
  66362262776 22878379764 <present>  <toplevel>
  36655294341 6728535969 <present>  external
  11090835223 5978643918 <present>  apps
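The sizes in these reports are raw byte counts, which are hard to read at this scale. A minimal sketch for converting them to GiB follows; it assumes the four-column layout shown above (unpacked size, packed size, date deleted, name), and `summarize_sizes` is a hypothetical helper name.

```shell
# Convert the byte counts of a filter-repo size report to GiB.
# Reads report lines on stdin; header lines are skipped because
# their first two fields are not plain numbers.
summarize_sizes() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
    printf "%s %.2f GiB unpacked, %.2f GiB packed\n",
           $4, $1 / 1073741824, $2 / 1073741824
  }'
}
```

For instance, `summarize_sizes < .git/filter-repo/analysis/directories-deleted-sizes.txt` would report apps/path1 at about 5.97 GiB unpacked and 3.60 GiB packed.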

@larsxschneider
Collaborator

larsxschneider commented Dec 11, 2020

@toddocon These apps are just present in the history. They are not in the HEAD commit but you carry around the data with every clone. In the next section we will discuss what you can do about it 😉


external sounds like 3rd party components. Since it is in second place, it might make sense to explore a dependency management system. Can you reveal what language is used in this repo?

@toddocon

the final ones:

==> extensions-all-sizes.txt <==
=== All extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  8227796558 7999310896 <present>  .png
  6482556756 1893380227 <present>  .dn
  2531400184 1608367250 2019-11-18 .k2

==> extensions-deleted-sizes.txt <==
=== Deleted extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  2531400184 1608367250 2019-11-18 .k2
   490733568  220449697 2019-07-11 .raw
   111890545  111923458 2016-05-04 .7z

==> path-all-sizes.txt <==
=== All paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
   723820832  259456521 2019-11-05 components/path1
   963509828  235721785 <present>  external/path1
   239188752  223828914 2019-06-03 external/path2

==> path-deleted-sizes.txt <==
=== Deleted paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name(s)
   723820832  259456521 2019-11-05 components/path1
   239188752  223828914 2019-06-03 external/path1
   217239244  203439053 2017-05-20 apps/path1

@larsxschneider
Collaborator

==> extensions-all-sizes.txt <==
=== All extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  8227796558 7999310896 <present>  .png
  6482556756 1893380227 <present>  .dn
  2531400184 1608367250 2019-11-18 .k2

png files use up most of the space by far. Again using Git LFS might be useful.

==> extensions-deleted-sizes.txt <==
=== Deleted extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
  2531400184 1608367250 2019-11-18 .k2
   490733568  220449697 2019-07-11 .raw
   111890545  111923458 2016-05-04 .7z

k2 files use up significant space although these files are not used anymore.

==> path-all-sizes.txt <==
=== All paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
   723820832  259456521 2019-11-05 components/path1
   963509828  235721785 <present>  external/path1
   239188752  223828914 2019-06-03 external/path2

components/path1 is not used anymore but uses lots of space. Same for external/path2 ... plus that might be a 3rd party?

@toddocon ☝️
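Comparing the packed and unpacked columns also hints at how well each kind of file compresses inside Git. The sketch below prints packed size as a percentage of unpacked size, rounded down; it assumes the same four-column report layout as above, and `pack_ratio` is a hypothetical helper name.

```shell
# Print "name percent" where percent = packed size / unpacked size.
# Reads filter-repo size report lines on stdin; header lines are skipped.
pack_ratio() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $1 > 0 {
    printf "%s %d%%\n", $4, int((100 * $2) / $1)
  }'
}
```

On the extension rows quoted above this gives roughly 97% for .png and 29% for .dn, so the png files barely shrink when packed, while the dn files compress much better.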

@thomas-schuster

git-sizer

Processing blobs: 2222
Processing trees: 2519
Processing commits: 515
Matching commits to trees: 515
Processing annotated tags: 0
Processing references: 6

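| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |   515     |                                |
|   * Total size               |   151 KiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  2.52 k   |                                |
|   * Total size               |   824 KiB |                                |
|   * Total tree entries       |  22.2 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  2.22 k   |                                |
|   * Total size               |  90.7 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |     6     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   827 B   |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |    28     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   980 KiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |   378     |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |    56     |                                |
| * Maximum path depth     [5] |     5     |                                |
| * Maximum path length    [5] |    61 B   |                                |
| * Number of files        [5] |   209     |                                |
| * Total size of files    [6] |  2.05 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |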

@larsxschneider
Collaborator

@thomas-schuster your repo looks perfect!

@neilwang0913

Processing blobs: 4388
Processing trees: 5043
Processing commits: 2073
Matching commits to trees: 2073
Processing annotated tags: 0
Processing references: 48

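| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  2.07 k   |                                |
|   * Total size               |   572 KiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  5.04 k   |                                |
|   * Total size               |  3.22 MiB |                                |
|   * Total tree entries       |  82.3 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  4.39 k   |                                |
|   * Total size               |   228 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |    48     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   914 B   |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |    44     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  58.4 MiB | ******                         |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  1.85 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |    82     |                                |
| * Maximum path depth     [6] |     8     |                                |
| * Maximum path length    [6] |   144 B   | *                              |
| * Number of files        [7] |   399     |                                |
| * Total size of files    [8] |   157 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules   [9] |     1     |                                |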

@neilwang0913

Any comments or suggestions on the “git-sizer --verbose” results for my work repository?

@larsxschneider
Collaborator

@neilwang0913 In general your repository is in great shape and there is no reason for concern. The single 58 MB file might be a good candidate for Git LFS, but since your overall repository size is rather low that is no big concern.
