Handle multi-document YAML streams #534

Draft: wants to merge 3 commits into main

hairyhenderson (Owner)

Note: This probably won't work for a while - I just want to park this here so I don't forget about it 😉

YAML files (or "streams") can contain multiple documents, where the --- sequence denotes the start of a document. Currently gomplate will only parse the first document in the stream, ignoring others.

Usually this isn't a big deal, because most YAML files only contain one document, but multi-document streams are becoming more common (like with Kubernetes).

Consider:

# ignored comment
---
# empty document, but not ignored due to the ---
---
foo: bar
---
foo: baz

$ gomplate -c in.yaml -i '{{ .in | toJSON }}'
{}

I think the behaviour should be:

  • when parsing with data.YAML (MIME application/yaml), the value should be the first non-empty document
  • when parsing with data.YAMLArray, the value should be an array of maps, one per document, with empty documents represented as null

$ gomplate -c in.yaml -i '{{ .in | toJSON }}'
{"foo":"bar"}
$ gomplate -d in.yaml -i '{{ include "in" | data.YAMLArray | toJSON }}'
[null,{"foo":"bar"},{"foo":"baz"}]

Signed-off-by: Dave Henderson <dhenderson@gmail.com>

hairyhenderson changed the title from "[WIP] Handle multi-document YAML streams" to "Handle multi-document YAML streams" on Nov 15, 2019

hairyhenderson (Owner, Author)

Some updates and random thoughts on this... go-yaml now supports multi-doc streams (!). The way it works is you call Decode multiple times on the reader, and it spits out a new document each time. This means that shimming this into data.YAMLArray isn't as straightforward as it sounds. I'd have to detect whether the stream contains multiple documents first (maybe as simple as "are there many ---s in this string?").
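The go-yaml approach (calling Decode on a yaml.Decoder until it returns io.EOF) needs the third-party package. As a standard-library-only illustration of the simpler "count the ---s" check, here is a naive document counter. `countDocuments` is a hypothetical helper; it treats any line starting with the `---` marker as a document start, counts non-comment content before the first marker as an implicit document, and ignores directives and `...` end markers.

```go
package main

import (
	"fmt"
	"strings"
)

// countDocuments naively counts the documents in a YAML stream.
func countDocuments(s string) int {
	markers := 0
	implicit := false
	for _, line := range strings.Split(s, "\n") {
		line = strings.TrimRight(line, " \t\r")
		// "---" alone, or "--- " followed by content, starts a new document.
		if line == "---" || strings.HasPrefix(line, "--- ") {
			markers++
			continue
		}
		// Real content before the first marker forms an implicit document;
		// blank lines and comments do not.
		body := strings.TrimSpace(line)
		if markers == 0 && body != "" && !strings.HasPrefix(body, "#") {
			implicit = true
		}
	}
	if implicit {
		markers++
	}
	return markers
}

func main() {
	stream := "# ignored comment\n---\n# empty document, but not ignored due to the ---\n---\nfoo: bar\n---\nfoo: baz\n"
	n := countDocuments(stream)
	fmt.Printf("%d documents, multi-document: %t\n", n, n > 1)
}
```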

I've explored another interesting way of handling this in gomplate, which is to have a function (data.YAMLStream) that returns a chan interface{} which lets you range on the output. A nice property of this is that if you don't range on it, you don't get an array, you get the first document.

The problem with this approach, of course, is that you lose chainability - you can't just do (data.YAMLStream $s).value, because it returns a channel.

I could just return an array, which would make it less flexible, but more predictable.

The other challenge is what to do with multi-document YAML stream datasources? A common use-case (for me, at least) is to use a Kubernetes resource YAML file as a datasource, which often contains multiple documents. Right now, gomplate will simply read the first document and ignore the rest. Sometimes this is good enough, sometimes not. Sometimes I'll strip out the ---s and re-parse, which is ugly but workable.

One way to do it would be to add a new bogus MIME type like application/stream+yaml, which turns on multi-document parsing, but I'm not a huge fan of that, since these are valid YAML streams, and application/yaml is the correct MIME type for that.

A perhaps more user-friendly way to approach this is to detect whether the stream contains more than one document, and return an array if so, else return the single document. There's a downside here, though: when a datasource has an unpredictable number of documents (like a log file, or an arbitrary collection of Kubernetes resources that might contain only a single document), the template can't know in advance whether it will get an array or a single document.
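A small sketch of the auto-detect approach, showing how the result's type depends on the document count. `docsOrSingle` and `describe` are hypothetical helpers operating on already-decoded documents.

```go
package main

import "fmt"

// docsOrSingle unwraps a one-document stream and returns anything else
// (including an empty stream) as a slice.
func docsOrSingle(docs []interface{}) interface{} {
	if len(docs) == 1 {
		return docs[0]
	}
	return docs
}

// describe is the kind of check a template would be forced to make.
func describe(v interface{}) string {
	if arr, ok := v.([]interface{}); ok {
		return fmt.Sprintf("array of %d documents", len(arr))
	}
	return "single document"
}

func main() {
	one := []interface{}{map[string]interface{}{"kind": "Service"}}
	two := append(one, map[string]interface{}{"kind": "Deployment"})
	fmt.Println(describe(docsOrSingle(one)))
	fmt.Println(describe(docsOrSingle(two)))
}
```

This also shows an extra ambiguity: a single document whose top level is a YAML sequence would decode to a []interface{} too, so `describe` would misreport it as a multi-document stream.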

Another way to handle this could be to use a fragment in the URL (like http://example.com/file.yaml#2 to get the 3rd document, counting from zero), though this could be quite obscure, and loses the ability to range through documents dynamically.
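A sketch of the fragment idea, reading the fragment as a zero-based document index with net/url. `selectDocument` is hypothetical, and falling back to the first document when there is no fragment is an assumption chosen to match today's single-document behaviour.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// selectDocument picks one decoded document using the URL fragment.
func selectDocument(rawURL string, docs []interface{}) (interface{}, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	if u.Fragment == "" {
		// No fragment: behave like the current first-document parsing.
		if len(docs) == 0 {
			return nil, fmt.Errorf("stream has no documents")
		}
		return docs[0], nil
	}
	i, err := strconv.Atoi(u.Fragment)
	if err != nil {
		return nil, fmt.Errorf("fragment %q is not a document index: %w", u.Fragment, err)
	}
	if i < 0 || i >= len(docs) {
		return nil, fmt.Errorf("document %d out of range (stream has %d)", i, len(docs))
	}
	return docs[i], nil
}

func main() {
	docs := []interface{}{"first", "second", "third"}
	doc, err := selectDocument("http://example.com/file.yaml#2", docs)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc) // third
}
```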

🤔 More thinking probably necessary on this...

github-actions bot commented May 1, 2023

This pull request is stale because it has been open for 60 days with no activity. Remove stale label or comment or this will be automatically closed in a few days.

github-actions bot added the Stale label on May 1, 2023

hairyhenderson (Owner, Author)

ah, good reminder - thanks bot 😉

it's probably obvious that this has gone nowhere - mostly because I'm stuck figuring out a good API for this

github-actions bot removed the Stale label on May 8, 2023

github-actions bot commented Jul 7, 2023

This pull request is stale because it has been open for 60 days with no activity. If it is no longer relevant or necessary, please close it. Given no action, it will be closed in 14 days.

If it's still relevant, one of the following will remove the stale marking:

  • A maintainer can add this pull request to a milestone to indicate that it's been accepted and will be worked on
  • A maintainer can remove the stale label
  • Anyone can post an update or other comment
  • Anyone with write access can push a commit to the pull request branch

github-actions bot added the Stale label on Jul 7, 2023
hairyhenderson added this to the future milestone on Jul 8, 2023