Skip to content
This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

Consolidate to a single .gitignore in the root directory #106

Open
kevindigo opened this issue Mar 5, 2021 · 9 comments
Open

Consolidate to a single .gitignore in the root directory #106

kevindigo opened this issue Mar 5, 2021 · 9 comments

Comments

@kevindigo
Copy link
Collaborator

We currently have a second .gitignore file in CSVsToParse. While that works, it is a bit unusual to have subtree .gitignore files. Plus, we have to then have a line in that .gitignore telling it not to ignore .gitignore itself. It seems cleaner to just get rid of the extra .gitignore file, and have all of the ignore rules in the main .gitignore file in the root directory.

@keeganwitt
Copy link
Contributor

keeganwitt commented Mar 11, 2021

I think you may not want to do this. If you do, the CSVsToParse directory itself will no longer be tracked, because Git doesn't track empty directories.

One option some folks do to work around this is to create an empty .gitkeep file in the directory you wish to keep. I somewhat disfavored this solution since it's basically a workaround, it's not in any way directly supported by Git tooling.

While I do favor a single root .gitignore file in general, the empty directory scenario is a reasonable exception to the rule in my view. .gitkeep isn't a terrible choice either though (it's a fairly common pattern). You could even put a comment in the file describing what the empty directory is for (using comment syntax, including plaintext, since nothing will be reading that file).

@kevindigo
Copy link
Collaborator Author

Thanks for the comment. It inspired me to think about our real goal with all of this. Currently we track CSVsToParse but then ignore (don't track) anything inside it. I was thinking it would be just as good to ignore CSVsToParse itself, rather than tracking it. That would be much simpler, as a single .gitignore entry could handle that.

However, it would force users not only to create CSVsToParse themselves, but also to spell it correctly in order for it to automatically be ignored. That wouldn't be great, especially since files in that directory could contain PII that shouldn't be committed into git.

Using a .gitkeep seems fine, especially if it contains explanatory text. An alternative would be to have a non-dot normal markdown file in CSVsToParse, which would explain the situation. I'm not sure what to name it, but it could be something like CSVFileDocumentation.md. I'm not sure how much of the related documentation currently in README.md would get moved here, or copied here.

@keeganwitt
Copy link
Contributor

keeganwitt commented Mar 12, 2021

I think the issue with a normal (non dot) file is that if you wanted to read multiple CSVs at once (which is why the directory exists in the first place), that file would get sent in over stdin if you relied on shell globbing (which I think is a natural way to do it).

Just to be clear what I mean by that,

$ echo 1 > .1
$ echo 2 > 2
$ cat *
2

vs

$ echo 1 > 1
$ echo 2 > 2
$ cat *
1
2

So if I were using cat * as input to Starfish, it'd include the contents of the new file you propose (unless you do it as a hidden file).

.gitkeep is a perfectly reasonable alternative. These are competing conventions. The only reason I favored .gitignore over that is that it's a defined standard, being part of Git, whereas .gitkeep has no particular standard. However, for the purposes of solving problems like this one, there's no functional reason to choose one over the other.

Taking a further step back: why have this directory in Git at all? All Starfish knows about is what comes in on stdin. I don't normally add directories to Git unless there are expectations in the code about the directory structure. I've accidentally committed a change to an existing file I didn't intend to, but I've never in my life accidentally done git add (that could just be me though -- I never do git add .)

@danisyellis
Copy link
Collaborator

Good question. I agree that ideally our tracked file structure would match only what the code requires, but in this case I'm prioritizing ease of using Starfish over code clarity for devs.

Since Starfish is a community management tool, I'd love for non-engineers to be able to follow the README instructions and run it, so I'm making everything as explicit as possible. So we have the empty CSVsToParse in Git as an attempt to make Starfish as easy to run as possible.

I'm assuming that some users will want to have multiple CSVs in a separate directory. Having the empty CSVsToParse directory available by default means that
a) there's one less step for the user who wants that directory
b) every user will have their CSVs in the same place, so in the README I can give an example with the exact path. (Thinking about this, iff there's a way to enforce this path with the code, instead of command line input from the user, that would be even better.)

Since this folder will likely contain PII I also want tracking of the files turned off by default.

@keeganwitt
Copy link
Contributor

keeganwitt commented Mar 16, 2021

If there's a way to enforce this path with the code, instead of command line input from the user, that would be even better.

If you wanted to do it that, what do you think of just having the directory hard-coded and read all CSVs in that directory?
I tend to think stdin is less grokkable for less technical folks. For additional flexibility, you could have the directory to read CSVs from default to that, but allow it to be overridden with a command line arg, which can specify either a file or directory of CSVs (using isFile() or isDirectory()).

@kevindigo
Copy link
Collaborator Author

We definitely want to read from a file path rather than STDIN. It's on the to-do list for when we circle back to starfish.

One interesting option (which I'm not advocating for at this point) would be to prevent users from reading the file from anywhere inside the starfish git working directory tree. Basically, prevent them from accidentally committing it.

My first thought was to prevent it from being in any git tree, but they might specifically want to keep the list tracked in an internal git repo, so that seems too restrictive. I suppose we could add a command line option to override that (--allow-input-file-to-be-in-a-git-tree), but I don't love that.

@danisyellis
Copy link
Collaborator

Oh yeah, right! I've even created an issue for that.
We'd talked about how it would make the code less confusing, but I hadn't thought through how it will also make Starfish easier to run for a user. Now I'm excited for that refactor.

@keeganwitt
Copy link
Contributor

keeganwitt commented Mar 16, 2021

@danisyellis Yeah, I noticed that after I'd already left the last comment.

@kevindigo I was curious how you'd prevent being in any Git tree. Keep moving up the directory tree until root (or you find a .git)? I know you weren't necessarily advocating that, just thinking through what that'd mean.
I agree though, that seems too much. Also, one might decide to have a local Git repo with no remote, just for versioning purposes.

@kevindigo
Copy link
Collaborator Author

Yes, I was imagining climbing the tree looking for .git files. Good point about the repo with no remotes.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants