Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsideration of Dockerfile.* file association #2128

Closed
simpleapples opened this issue Jul 5, 2020 · 27 comments
Closed

Reconsideration of Dockerfile.* file association #2128

simpleapples opened this issue Jul 5, 2020 · 27 comments

Comments

@simpleapples
Copy link

simpleapples commented Jul 5, 2020

I have a file named dockerfile.go, which is recognised as a Dockerfile because of #1908 .

I know this decision has a point #1907 , but I think it's unresonable. When a Dockerfile is recognised as Unknown, nothing will happen. But when a go file is recoginsed as Dockerfile, VSCode will format it as Dockerfile. (Especially when I choose Format on save, I think it's widely used.). As you can see on the screenshot below.

When it happend, I have to revert the code and add a *.go file association. It's a heavier action than just adding a dockerfile associaiton.

So, I think it necessary to remove Dockerfile.* file association.

Annotation 2020-07-05 114338

@bwateratmsft
Copy link
Contributor

There were overwhelmingly (about 50x) more people with actual Dockerfiles named Dockerfile.***, than non-Dockerfiles named that, so we plan to keep #1908. You can override this in your project with something like this in settings.json:

    "files.associations": {
        "dockerfile.go": "go"
    }

@simpleapples
Copy link
Author

simpleapples commented Jul 6, 2020

I read issue #1907 and I know there are a lot of people use Dockerfile.***, but it can be covered by adding Dockerfile.prod, Dockerfile.dev and so on. You can cater the large group by adding Dockerfile.xxx as more as possible. But it does not mean you you can hurt the small group.

Besides, the pattern Dockerfile.* mess up the file association of other extensions and cause conflicts.

@bwateratmsft
Copy link
Contributor

it can be covered by adding Dockerfile.prod, Dockerfile.dev and so on.

I saw hundreds of different extension endings on searches in GitHub and in telemetry; it is not sustainable to try to add them case-by-case.

Besides, the pattern Dockerfile.* mess up the file association of other extensions and cause conflicts.

Can you elaborate? How does setting a language association affect other extensions?

@simpleapples
Copy link
Author

Can you elaborate? How does setting a language association affect other extensions?

For example, I install go extension and dockerfile.go is recognized as go file, but it will be recognized as a Dockerfile after installing docker extension.

@bwateratmsft
Copy link
Contributor

I see what you mean. Can you try the language association? That should make dockerfile.go be treated as a Go file.

@simpleapples
Copy link
Author

I see what you mean. Can you try the language association? That should make dockerfile.go be treated as a Go file.

It's a solution for me, but what I want is removing the Dockerfile.* file association, if you also think it will affect other extensions too.

@bwateratmsft
Copy link
Contributor

I know it will overclassify Dockerfiles in some cases, like this one, however, this happened only a small fraction as often as underclassifying. From what I saw in telemetry it was about 2% named Dockerfile.* that wasn't a Dockerfile, compared to about 98% that was.

Due to the limitations of VSCode there's no perfect solution, but manually adding each of the hundreds of variants I saw isn't realistic. We decided that since the workaround (setting a language association) is the same for each, then it was better to do what is right for 98% instead of 2%.

@simpleapples
Copy link
Author

simpleapples commented Jul 6, 2020

I think there are other elements we should consider except the number.

Let's say I created a dockerfile.go, what I expect for this file is treating as a go file. Because it has an extension .go, there is no reason to treat it as a Dockerfile.

But when I created a dockerfile.prod, it's ok if the vscode doesn't format it as a Dockerfile, because I known Dockerfile.prod is not the offical name of a Dockerfile. Meanwhile, we can edit a Dockerfile without highlights easily, but it's hard to write a go file without auto formation.

@simpleapples
Copy link
Author

If you really want to cover situations as much as possible, I think it's better to add a First Line Pattern, because a dockerfile must begin with a FROM instruction, documents said.

@bwateratmsft
Copy link
Contributor

I'll look into this firstLine property, I didn't know about it until now.

@bwateratmsft bwateratmsft reopened this Jul 6, 2020
@simpleapples
Copy link
Author

Thank you, perl extension use the first line property (https://github.com/microsoft/vscode/blob/7a5bca9fafbbb23de289179be95e9c4832540c02/extensions/perl/package.json#L23).

@rcjsuen
Copy link
Contributor

rcjsuen commented Jul 6, 2020

If you really want to cover situations as much as possible, I think it's better to add a First Line Pattern, because a dockerfile must begin with a FROM instruction, documents said.

Ignoring newlines and taking leading whitespace into consideration, Dockerfiles should start with a comment (or a parser directive), an ARG instruction, or a FROM instruction.

@simpleapples
Copy link
Author

I did a small test and added a First Line Property on package.json of docker extension, it works.

Annotation 2020-07-06 234920

Ignoring newlines and taking leading whitespace into consideration, Dockerfiles should start with a comment (or a parser directive), an ARG instruction, or a FROM instruction.

But unfortunately, the First Line Property can not cover cases which start with a comment.

@dbreshears dbreshears added this to the 1.5.0 milestone Jul 9, 2020
@dbreshears dbreshears added the P2 label Jul 9, 2020
@bwateratmsft
Copy link
Contributor

I've done some investigating. The firstLine property gets treated as an OR alongside filenamePatterns--so, if a file matches filenamePatterns OR firstLine, it gets treated as a Dockerfile. If a conflicting filenamePattern matches, it will use that instead of Dockerfile. For instance, "test.py" starting with FROM alpine will be treated as a Python file.

With "firstLine": "\\s*([Ff][Rr][Oo][Mm].+|#.*|[Aa][Rr][Gg].+)", leading whitespaces are tolerated, and FROM ..., ARG ..., #comments, and #parser=directives seem to all be matched correctly.

Some examples that are correctly treated as Dockerfiles, assuming the "Dockerfile.*" association is removed:

Dockerfile.test:

FROM alpine
# Because the first line matches
    from alpine
# Because the first line matches
# The language server does warn about lowercase
    #a leading comment with leading whitespace
FROM alpine
# Because the first line matches
    # a leading comment with different leading whitespace
FROM alpine
# Because the first line matches
#fake=parserdirective
FROM alpine
# Because the first line matches
ARG test
FROM alpine
# Because the first line matches
ARG #blah--the language server correctly flags this as an error
FROM alpine
# Because the first line matches

Dockerfile:

using namespace std;
// Because the file name matches "Dockerfile"

Some examples that are correctly not treated as Dockerfiles:

Dockerfile.py:

print('hello world')
# Because the file name matches "*.py" from Python

Some examples that are incorrectly not treated as Dockerfiles:

Dockerfile.test:


FROM alpine
# No match on the first line, because it is blank

Dockerfile.py:

FROM alpine
# Because the file name matches "*.py" from Python

@rcjsuen @simpleapples (also @chrisdoherty4 since he originally opened #1907)

There are three shortcomings:

  • A "Dockerfile.*" file that doesn't match first line when opened, then is changed to match, does not pick up on the new language until the tab is closed and opened again. For example, if I make an empty file named "Dockerfile.test", and open it to add contents, it will not be treated as a Dockerfile (which also means no intellisense to help you draft it). If I add "FROM alpine", save, close, and reopen, it is now recognized.
  • A "Dockerfile.*" filename that begins with blank will not be treated as a Dockerfile (the second-to-last example), unless they go set the language or a file association.
  • A "Dockerfile.*" matching another language's filenamePattern will always be overridden as that language (the last example). For example, "Dockerfile.py" will be treated as a Python file, even if it is a valid Dockerfile.

What do you all think? I'm a little more concerned about the first shortcoming--it might justify adding back some of the most common Dockerfile.* names. The second and third seem pretty minor to me.

@rcjsuen
Copy link
Contributor

rcjsuen commented Jul 13, 2020

@bwateratmsft What about cases like Dockerfile.sh with a shebang?

#!/bin/sh
docker build .

Shell comment syntax is the same as Docker and Python with the #. How will this resolve?

@simpleapples
Copy link
Author

What do you all think? I'm a little more concerned about the first shortcoming--it might justify adding back some of the most common Dockerfile.* names. The second and third seem pretty minor to me.

@bwateratmsft Thank you for your investigating.

For the first shorcoming, It seems like there is no better way to solve the problem. But we can add frequently used dockerfile.xxx to filenamePattern to cover majority of situations. firstLine match can be regarded as a plan B.

For a file started with a #comment, I think it should not be treated as a dockerfile. It will hit too many types of files, such as a shell script file, as @rcjsuen said.

@bwateratmsft
Copy link
Contributor

bwateratmsft commented Jul 13, 2020

But we can add frequently used dockerfile.xxx to filenamePattern to cover majority of situations.

I don't think that's feasible long-term. We found hundreds of values in #1907, a very long tail--so adding the top 10 still leaves hundreds of people out.

I'm not yet convinced that Dockerfile.* is much of a problem. I absolutely recognize it interferes in @simpleapples' original case, but the workaround for that scenario is the same as the workaround before, just the other way around. However, the number of people with the new problem (Dockerfile.* is not a Dockerfile, but is recognized) is far smaller than with the old problem (Dockerfile.* is a Dockerfile, but is not recognized). Additionally, I still feel that dockerfile.go and similar is an antipattern.

@chrisdoherty4
Copy link

chrisdoherty4 commented Jul 13, 2020

Given I raised the original ticket I thought I would contribute in brief (and essentially echo @bwateratmsft ).

It's nonsensical to revert a change that compliments the significant majority of use-cases because of 1 edge case that can be alleviated with a rename, or an additional configuration line.

@simpleapples , you'll probably move on from your current project that uses dockerfile.go and never hit the problem again.

@simpleapples
Copy link
Author

I know it's hard to be perfect, based on several reasons: Dockerfile is a special file type which usually doesn't have a specific file extension, and VSCode allows xxx.* on filenameMatch.

So, It makes sense if you insist on remaining changes in #1908, I could set a user-defined file association to avoid the problem.

@bwateratmsft Thank you for your trying on this issue.

@bwateratmsft
Copy link
Contributor

I'm going to leave this open for now and keep thinking about it. Perhaps there's some combination of filename pattern, first line pattern, and MIME type that removes the need to compromise.

@bwateratmsft bwateratmsft self-assigned this Jul 17, 2020
@bwateratmsft bwateratmsft removed this from the 1.5.0 milestone Jul 29, 2020
@bwateratmsft bwateratmsft added this to the 1.6.0 milestone Jul 29, 2020
@TigerC10
Copy link

Another use case, template generators. I've got a Dockerfile.ejs.t file which is getting picked up by the Dockerfile.* rule rather than the ejs parser.

In my settings json I added:

{
    "files.associations": {
        "Dockerfile.ejs.t": "ejs"
    }
}

But the trouble here is that I can't share this setting with my team. 😕

@rcjsuen
Copy link
Contributor

rcjsuen commented Aug 23, 2020

In my settings json I added:

{
    "files.associations": {
        "Dockerfile.ejs.t": "ejs"
    }
}

But the trouble here is that I can't share this setting with my team. 😕

@TigerC10 You can't put it in a .vscode/settings.json file in your project?

@bwateratmsft
Copy link
Contributor

I opened microsoft/vscode#105590, I think the easiest thing would be if we could add specific exceptions (like Dockerfile.go). The number of unique filenames that are Dockerfiles was big but the number that aren't Dockerfiles seemed relatively small--Dockerfile.go seemed to be probably the most common. If we could exclude it we'd improve our language classification accuracy a fair bit.

@bwateratmsft bwateratmsft modified the milestones: 1.6.0, Future Aug 28, 2020
@bwateratmsft bwateratmsft removed their assignment Aug 28, 2020
@segevfiner
Copy link

Another issue is that after a dockerfile.go is misidentified as a Dockerfile switch the language using the Change Language Mode command leaves syntax highlighting completely messed up.

@bwateratmsft
Copy link
Contributor

Given the upstream microsoft/vscode#105590 is rejected, I'll close this. I think that the best-for-the-most strategy is to keep things as they are now, with Dockerfile.* recognized as a Dockerfile. If we continue to get feedback we can reconsider.

@bwateratmsft bwateratmsft removed this from the Future milestone Nov 3, 2020
@rcjsuen
Copy link
Contributor

rcjsuen commented Dec 8, 2020

@segevfiner That sounds like an issue with Visual Studio Code itself. Is there an issue open about this?

@segevfiner
Copy link

@rcjsuen Couldn't find any, but who knows what's the right keywords to find an existing one

@vscodebot vscodebot bot locked and limited conversation to collaborators Dec 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants