asciicast format version 2 #196


Conversation

@ku1ik commented Mar 13, 2017

The ideas behind asciicast v2 format are:

  • enable live, incremental writing to a file during recording (with v1
    format the final recording JSON can only be written as a whole after finishing
    the recording session),
  • allow the players to start the playback as soon as they read the meta-data line (contrary to v1 format, which requires reading the whole file), without the need to buffer the whole recording in memory,
  • use the same data representation whether you're recording to a file or streaming via UNIX pipe or WebSocket.

Preview of the doc: https://github.com/asciinema/asciinema/blob/v2/doc/asciicast-v2.md

@dhobsd commented Mar 16, 2017

CasTTY is an asciicast-compatible recorder and web player that also records / plays back audio. I'd like to keep it compatible with the asciicast format, but also have some of the needs of my tools supported. In some cases (audio, resizing), CasTTY already has a solution, so compatibility between the tools would be great to maintain.

Audio

(From an IRC discussion in #asciinema)

There is no interest in supporting PCM audio formats for recorders or players. All audio should be output / input in a format that formally describes the structure of that audio. As such, PCM audio and properties shouldn't be part of this spec.

However, we should add an attribute to specify a path to some audio data. This should be specified as a URI to allow for audio stored on non-HTTP endpoints (for example, on a local or network filesystem while recording).

For live streaming, we should consider the complexity of synchronizing the events with audio. In particular, events and audio are going to be delivered over separate channels, and the audio must get priority. However, people tuning into the stream in the middle will need some method of synchronization to understand the position of their starting audio WRT the start of the cast so that:

  1. Events received that occurred before the delivered audio started are discarded.
  2. Events received that occur after the audio is delivered are engaged at the right time, with respect to the audio position.

The CasTTY player synchronizes events with respect to audio.currentTime (because audio gets real-time treatment whereas setTimeout is a joke, especially when events are consistently ~100ms apart).

Compatibility & Streaming

v2 players will not be compatible with v1 video, given NDJSON, and vice versa. Well, at least v1 definitely can't support v2, but v2 needs a fair bit of back-compat glue around parsing and/or content type to support v1. The two reasons for using NDJSON are for flushing each event to avoid issues with crashes, and to better support live streaming from a recording session.

Handling partial output

The first argument for NDJSON is that JSON requires an entire object to be valid. CasTTY writes to its output stream with no buffer. According to the rationale in the ticket mentioned, even outputting a partial recording would have been preferable. That would be possible simply by not buffering output (which is what CasTTY does). However, a partial output file in the v1 format is fundamentally broken, possibly in two ways.

The most obvious is that you may have a partial event at the end. The second is that the duration of the video is still unknown.

Live Streaming

Live streaming is assumed to be complicated with a complete-JSON format, because during a live session you never have a complete JSON object to send.

Suggestion

This suggestion allows trivial compatibility of v2 players with v1 format files, and compatibility of v1 players with v2 recordings(!) -- v1 players would only not support live streaming and e.g. keyboard input subtitles.

Both cases of the partial output problem can be fixed trivially with a utility that follows this silly algorithm (a rough sketch follows the list):

  • Parse the JSON incrementally until you find the last element. During parsing, keep a total of the times in a duration counter
  • If the last element is complete, close the event array by appending ],"duration":<whatever>
  • If the last element is incomplete, seek to the previous comma, and replace it with the string above. Then truncate the file to the current position.
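A rough Python sketch of that repair utility (function and file names are mine, not part of any tool; it assumes the v1 recorder wrote "stdout" as the last key and the file was cut off somewhere inside that array):

import json

def repair_partial_v1(path):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    start = text.index('"stdout"')
    pos = text.index('[', start) + 1          # first position inside the event array
    decoder = json.JSONDecoder()
    duration = 0.0
    last_good = pos
    while pos < len(text):
        while pos < len(text) and text[pos] in ' \t\r\n,':
            pos += 1                           # skip separators between events
        if pos >= len(text) or text[pos] == ']':
            break                              # array already closed, nothing to fix
        try:
            event, pos = decoder.raw_decode(text, pos)
        except ValueError:
            break                              # partial event at the end: drop it
        duration += event[0]                   # v1 deltas sum up to the duration
        last_good = pos
    with open(path, "w", encoding="utf-8") as f:
        f.write(text[:last_good] + '],"duration":%.6f}' % duration)

repair_partial_v1("broken.json")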

For live streaming, we can set up websockets for each event channel. In this case, the event format does not change. Each event is still represented as [<delta>, "<event>"]. WebSocket is a message-oriented protocol, so there is no concern about receiving a partial JSON array in the event stream.

Therefore, I would propose the new protocol look exactly like v1 for a recording, except for the introduction of a new stdin key:

{
  "version": 2,
  "width": 80,
  "height": 24,
  "duration": 1.515658,
  "command": "/bin/zsh",
  "title": "",
  "env": {
    "TERM": "xterm-256color",
    "SHELL": "/bin/zsh"
  },
  "stdout": [
    [0.248848, "\u001b[1;31mHello \u001b[32mWorld!\u001b[0m\n"],
    [1.001376, "I am \rThis is on the next line."]
  ],
  "stdin": [
    [0.248848, "Hello World!\n"],
    [1.001376, "I am \rThis is on the next line."]
  ]
}

v1 players can then play back the v2 recording format. v2 players can play back v1 recording format. Live streams would be slightly different:

{
  "version": 2,
  "width": 80,
  "height": 24,
  "command": "/bin/zsh",
  "title": "",
  "env": {
    "TERM": "xterm-256color",
    "SHELL": "/bin/zsh"
  },
  "stdout": "ws://uri.to/stdout",
  "stdin": "ws://uri.to/stdin"
}

The data over the websockets would be presented in the form [[<delta0>, "<event0>"], [<delta1>, "<event1>"],...,[<deltaN>, "<eventN>"]]. In this case, the duration key becomes optional. It would be fine to specify that "title", "command", and "env" are optional in all cases.

A v1 player can be modified to detect the type of stdout and refuse to play it if !Array.isArray(data.stdout).
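For illustration, a hypothetical Python analogue of that check (not code from any existing player) would branch on the shape of "stdout":

def embedded_events_or_refuse(doc):
    # v1-style playback only works when "stdout" is an embedded event array,
    # not a stream URI as in the live-streaming variant above.
    if not isinstance(doc.get("stdout"), list):
        raise ValueError("stdout is a stream URI; this player only plays embedded events")
    return doc["stdout"]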

Using websockets here has an additional advantage: multiplexing data from N different event streams (keyboard input, terminal output, audio) into M different outputs where M<N is going to be difficult. I suspect that asciinema's recorder forks at least once to do a forkpty(3) equivalent. CasTTY in fact has three processes (input, output, shell / command) and 5 threads (input, output, shell / command, audio reader, audio writer). When recording audio, you need to synchronize events to a single clock (which in the case of audio is the audio clock, defined by the sample rate). Sharing the clock between processes is annoying, and I'll need to figure out how to do that in CasTTY. But worse than that would be handling multiplexed input from keyboard and shell from different processes, synced to the same clock, into a single output stream. When all output streams are different, the only shared resource is the clock, and less sharing is always easier.

Environment

Can we further specify what is taken from the environment? Environment can be sensitive, and while the file format only specifies SHELL and TERM, it leaves the possibility open for recording other environment variables. (In particular, CasTTY records PS1 as well.) Since environment is a resource that people sometimes consider private, I think it makes sense for this spec to mandate what conforming players / recorders are allowed to observe as "useful for debugging purposes".

Input and Pausing

There were some concerns mentioned about input. Though the spec doesn't cover it yet, the idea is that input events would be used to show keystrokes during playback as if they were subtitles. The use-case is that keystrokes for utilities like GNU screen or tmux might not appear in output.

However, other keystrokes that do not appear in output are those typed while echo is disabled (termios.c_lflag &= ~ECHO) -- i.e. typing passwords. CasTTY supports pausing, which might be useful for avoiding this. However, that doesn't work very well during a live stream. I propose also supporting an "input mute" for keystrokes such that one may disable them during a broadcast, should the need arise. This would function like an audio mute, but would mute output on the keyboard input channel.

Resizing

CasTTY handles resizing by enforcing that the resize is never larger than the terminal window at the start of recording, or the options passed to it in -r and -c (which also cannot be larger than the terminal window at the start of recording). By doing this, resizes are guaranteed to fit within the player window as it is determined at start.

I would recommend not making resizing an event encoded in the stream for a few reasons. But in particular, it's hard to get correct. Window resize events are delivered via SIGWINCH, and may be delivered to a different process or thread than the one that is doing the output. Synchronizing this event with input (which may be coming from a pipe, not a keyboard) and output from the executed program is not easy / possible to get correct without some kind of serialization or total order of events, which either means buffering / sorting output or locking.

If you really wanted to add this, it would still be much easier as a separate event stream, instead of being interleaved -- especially since SIGWINCH is going to be delivered to the process on the master end of the pty.

@ku1ik commented Mar 18, 2017

@dhobsd wow, that's a comment!

Re audio:

I think we're on the same page. I agree that audio/screen sync is a challenge, and syncing screen updates to audio clock sounds like a good solution for it (although see my suggestion/question at the end of this comment re using audio clock).

A meta-data attribute like audio_url/uri or similar could be used to keep the path to audio file/stream. There's one thing to take into account here though. When moving/copying asciicast with audio reference, you need to always copy the audio file with it, and you need to make sure the audio URI is still valid after move (you may need to adjust it in case it was absolute file path like /home/user/audio.mp3 and you copied the recording to a different machine). Another problem would appear for recordings uploaded to asciinema.org - there's no plan to host audio files on asciinema.org (in near future, maybe some day...), but what can we do with recordings referencing local audio file? Show the warning before upload saying the audio won't be uploaded? Ask the user to upload it manually to some public URL and adjust the URI in asciicast file? Not sure yet. Or maybe we need extra "container" file, pointing to both JSON and audio file? (that would make audio stream and "video" stream equal class citizens, but feels like too much complexity at the same time)

Re compatibility & streaming:

Re knowing the duration of the recording: for live streams you obviously don't know and don't need to know. For recorded sessions the player can just go through all the stream events and add that up right after loading the file into memory.

One important reason for writing to disk in realtime is to not buffer whole recording in memory. I mentioned crashes, but preventing incremental mem usage growth (as the recording proceeds) is also desired. This would enable creating arbitrary-length recording sessions (only disk space is the limit), and would also allow asciinema play (or any other player) to play straight from disk, without loading the whole thing into mem.

Making the player support both v1 and v2 doesn't look like a big problem to me, and in my opinion this extra overhead is worth it given the problems v2 solves (more on why I don't think extending v1 would work below).

I don't like the idea that we may be producing broken JSON files. Some people record sessions automatically when their shell is started (they basically log everything they do in terminal to some central directory of sessions) and when they reboot in one way (soft) or another (hard reboot) they would be getting broken files. This would require extra "fixing" logic in all tools reading asciicasts (asciinema play in terminal, web players etc) and would make life harder for anyone using simple ruby/python scripts they wrote to re-process their recordings.

About this example you gave:

{
  "version": 2,
  "width": 80,
  "height": 24,
  "duration": 1.515658,
  "command": "/bin/zsh",
  "title": "",
  "env": {
    "TERM": "xterm-256color",
    "SHELL": "/bin/zsh"
  },
  "stdout": [
    [0.248848, "\u001b[1;31mHello \u001b[32mWorld!\u001b[0m\n"],
    [1.001376, "I am \rThis is on the next line."]
  ],
  "stdin": [
    [0.248848, "Hello World!\n"],
    [1.001376, "I am \rThis is on the next line."]
  ]
}

This prevents writing to a file in real-time, because now you can't append to both the stdout and stdin arrays as events arrive. Of course you can write these two data streams to two separate tmp files during recording and at the end read them and build a single JSON file. This however leaves you with tmp files in case of crash/reboot.

Re environment:

I fully agree about being more specific here. We can consider removing "env" and putting the values of these 2 env vars (SHELL, TERM) as top level meta-data under "shell" and "term" keys for example.

Re Input and Pausing:

I love this idea ❤️ , but I don't see how this relates to the file/stream format. It seems to be a recorder-only thing to me.

Re resizing:

This whole resizing business is tricky, and I don't think there's one good way to solve it. If I understand correctly, when castty receives SIGWINCH it passes down min(initial_width, current_width), min(initial_height, current_height) so the recorded process gets it clamped. Right? That's neat. Indeed with this approach you can avoid extra SIGWINCH stream.
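For reference, a minimal sketch of that clamping (helper names are mine, not castty's): on SIGWINCH, pass min(initial, current) down to the recorded process via the pty, so the recording never outgrows the geometry captured at start.

import fcntl, signal, struct, sys, termios

def term_size(fd):
    rows, cols, _, _ = struct.unpack(
        'HHHH', fcntl.ioctl(fd, termios.TIOCGWINSZ, b'\0' * 8))
    return rows, cols

stdin_fd = sys.stdin.fileno()
init_rows, init_cols = term_size(stdin_fd)

def install_clamped_winch(pty_master_fd):
    def handler(signum, frame):
        rows, cols = term_size(stdin_fd)
        rows, cols = min(init_rows, rows), min(init_cols, cols)
        fcntl.ioctl(pty_master_fd, termios.TIOCSWINSZ,
                    struct.pack('HHHH', rows, cols, 0, 0))
    signal.signal(signal.SIGWINCH, handler)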

Re total order of events:

I believe having total order of all kinds of events makes things simpler and easier to reason about.

We're not dealing here with a High-Frequency-Trading system or any other where microseconds make a difference, so multiplexing everything that happens onto a single stream doesn't have practical downsides to me. I have a feeling many of your suggestions come out of the way you implemented castty. You talk about synchronizing clocks, threads, processes, SIGWINCH delivered to a random process/thread, and coordination problems related to all that.

If you create a dedicated thread for writing all sorts of events (not including audio here) into the file/websocket, then this thread can have a single clock and read events from a buffered, thread-safe queue. All other processes/threads (handling stdin, stdout, all of them trapping sigwinch) can just push events to this queue in fire-and-forget fashion. I guess having such a thread-safe buffered queue is okay if you only have threads and gets tricky when you have multiple processes. But maybe you can just use a pipe here as a buffered queue (not sure how UNIX pipes deal with multiple concurrent writers though...). Then there's audio, but maybe there's no need to use the audio clock for timing non-audio events. What's wrong with dumping the raw audio stream to an audio file, just like that, and using a single getCurrentTime value obtained at the start of the recording session in the event-writer thread as the zero/base time for calculating the absolute time of each event (and deriving the delta from that)? What are the bad cases here? (de-sync of video and audio in some circumstances?)
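For what it's worth, a minimal sketch of that single-writer design (illustrative only; absolute-from-start timestamps, made-up file name): producer threads push events onto a thread-safe queue, and one writer thread timestamps them against a single clock and appends them line by line.

import json, queue, threading, time

events = queue.Queue()

def event_writer(path):
    start = time.monotonic()                   # the single clock
    with open(path, "a") as f:
        while True:
            item = events.get()
            if item is None:                   # sentinel: recording finished
                break
            kind, data = item
            t = time.monotonic() - start       # time since start of recording
            f.write(json.dumps([round(t, 6), kind, data]) + "\n")
            f.flush()                          # incremental, crash-tolerant writes

threading.Thread(target=event_writer, args=("session.jsonl",), daemon=True).start()
# Producers just do, e.g.: events.put(("o", chunk)) or events.put(("i", keys))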

Thanks for the comment, it definitely opened my eyes to many things!

@dhobsd commented Mar 18, 2017

Audio

I would suggest that, in the case that asciinema.org were to support audio upload, the recording would always reference a local file. The upload functionality would send both the JSON and the audio file referenced from it to the remote server, which would then rewrite the JSON to reference a path to the uploaded audio. To download such an asciicast, you would receive a tarball that extracted both the JSON and the audio into a directory; the JSON would have an audio_uri that referenced a relative path from it to the audio file. If the player sees a relative path in the URI (something like file://./foo.mp3), it would use realpath(3) (or similar functionality) to resolve where the audio file is.

Regarding the name, audio_uri would be my preference.

Compatibility

Let's throw out the idea of player compatibility entirely, then. I thought it might be useful, but maybe not. You made a point about having input commands go through a control channel to the output process (which basically is what they do anyway), so I guess NDJSON is fine. But there are still clock problems, and more on that later.

It would still be nice if the 3-tuple of each event was (delta, data, type) instead of (delta, type, data), because then at least the v2 player doesn't have to deal with variable indexes depending on the format it's looking at. I.e. the event decoder for v1 is the same as for v2, except it doesn't look at e[2].

Pause, Input Mute, and Format

It doesn't have to do with the format, but I think the spec should warn people implementing an input recorder to consider the security implications of that feature, and that pause / input mute are recommended for that reason.

Total Order of Events and Clocks

Total order makes the events simpler to think about during playback, but the complexity of linearizing time ends up somewhere. Your point about just sending input events to the output thread over a separate channel is reasonable. That's how I implement commands, and how I was going to solve the clock sharing issue, but I didn't consider the implications on the requirements for the stream (and in particular that interleaving it stops being a problem then). So good point.

Regarding using multiple clocks when recording audio, you just can't reasonably do it. Audio is recorded at some fixed sample rate in real-time, and the system clock has no real-time guarantees. Every time your read of the system clock is late, you either accumulate that latency or try to synchronize it with the audio clock. In which case, you may as well have just used the audio clock.

So what I do right now is effectively what you describe. The audio clock is just the sample rate, which is incremented whenever a frame is queued for writing. The "delta" of an event is defined as the difference between two readings of (frames_sampled * 1000) / sample_rate.
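In other words (toy numbers, purely illustrative):

sample_rate = 44100          # Hz
frames_sampled = 0           # incremented as audio frames are queued for writing

def audio_clock_ms():
    # time derived purely from queued frames, so it advances at exactly the
    # sample rate regardless of system-clock jitter
    return (frames_sampled * 1000) / sample_rate

last = audio_clock_ms()
# ... later, when a terminal event is emitted ...
now = audio_clock_ms()
delta_ms = now - last        # the event's delta-from-previous
last = now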

There is still one real problem with this approach, which is that you can (and basically almost always will) have latency reading the clock. Because of this, you still carry accumulating skew every time you read the clock later than before. The delta of two absolute values from a start time does give you a way to get out of the skew, which is that you can do an exact read and get rid of all previously accumulated latency, or you can do a read with less latency and get rid of some of it. But basically there's some constant factor of clock read latency present, and any time that fluctuates to a higher value, the only way to counter it is with an exact read, which doesn't necessarily happen often.

If instead we recorded these at absolute offsets, we would have [[1.001, "e1"], [2.002, "e2"], ... -- the latency of the read only skews the late event, not any subsequent events. And this is only the best case you can achieve with prev-delta.

When audio is in the mix, any latency sucks, and I still see drift come and go in longer recordings.

From this perspective, I'd really love to go back to what I used to do, which is having offsets recorded as delta from start, instead of delta from previous. This makes live streaming harder, but only if your header doesn't include audio start time from 0 (which probably isn't hard).

Live streaming without audio isn't difficult because you're just going to play whatever events you have whenever you get them, so the delta is basically useless. Live streaming with audio means that you're going to play events you have whenever you get them, unless they're in the future wrt the audio.

Playback and seek of a regular recording with or without audio is actually easier, because the player doesn't have to keep track of an event duration anywhere: the time until the next event is just the difference between consecutive absolute timestamps (modulo the audio time, if that's playing).

Soooo, this makes me wonder whether I could convince you to make the event delta a delta from start instead of delta from previous? The JSON is bigger, but compression helps.

@ku1ik commented Mar 18, 2017

Let me start by addressing the last thing you wrote: making time absolute (seconds elapsed since the beginning). I'll get to all the other things in following comments.

I would consider this.

Let's assume we use a single interleaved stream of events. If we used absolute time, the players/tools which don't understand/use certain events could just completely filter them out, without affecting all subsequent events. With delta-from-prev, as we have now, you still need to take the delta from ignored/unused events and accumulate it into the next ones. That would be one (relatively small) argument to have absolute time.

Having absolute time would be more precise, not only because of not accumulating clock read skew, but also because of not accumulating float addition skew.

The downsides of absolute time, as I see today, would be:

"Time compression" (-w / --max-wait) would be slightly harder to implement, on any stage (recording, post-processing, playback). When clamping "pause time" of given event, we would need to adjust absolute time of all subsequent events by "total subtracted time accumulated so far".

The other related thing is, if you want to remove a stdout print event today (because it printed some secret you don't want to disclose, or you just want to remove some part of the recording), then you just open the JSON and remove the lines you want. With absolute time you can't just do that - after removing any line you need to adjust the time of all the following events, which isn't feasible for a human; you need a tool for that.

The size of the file would slightly go up, but in a minor way, so that's fine.

@dhobsd commented Mar 19, 2017

I hadn't even thought of the delta-prev issue for dealing with interleaved event streams with possibly unsupported event types. It is indeed a small burden to support the additional delta, but it's an interesting point. The float skew is another one I hadn't thought about, though I suspect that's going to be relatively minor: events are basically ms-level accuracy, recorders and players are using IEEE-754 doubles, and the precision of setTimeout is terrible anyway.

One additional point is that it does actually make the code simpler. My LOC went up on both the recording and player side to do delta-prev. I mean, it's obviously a trivial amount, but deleted code is debugged code.

Time compression isn't a thing I care much about, because I eventually want to support appending to a recording, and one can already compress time by "pausing" during recording. Since I'm mostly concerned with syncing to audio tracks, compressing events would be a poor UX. It also seems like this could be solved on the player side as well, where the person actually watching the video could define a maximum timeout, or watch the video with some timer coefficient.

Removing an event because of secrets I do very much care about, but since I also care about audio, removing an event actually shouldn't shift the times of any subsequent events. Here again, pausing during recording provides a reasonable strategy to avoid the problem entirely. But even if we assume that's not enough, I'm not sure I agree that you can always just remove the event anyway. What if the secret-containing event also has some terminal escape sequences in it that move the cursor? All of a sudden, you have to actually go edit one or more events, and if you don't preserve terminal sequences, the rest of your recording is wonky.

I've considered adding editing tools for this kind of thing (to be able to cut bits of audio and shift bits of recording along with it). But I really think these should be tools: if you want a time shift, or you want to "mute" a secret over some sequence of events, having a tool that knows how to preserve the rest of your terminal is pretty helpful.

@dhobsd commented Mar 21, 2017

Summary of points:

I consider necessary and non-controversial:

  • Warn implementors on the security implications of recording keyboard input and environment variables
  • Require a practical subset of which environment variables should be recorded (suggested: TERM, SHELL, PS1, PS2), when they are recorded
  • Reserve audio_uri for audio-enabled recordings
  • Ditch window resize event type in favor of constraining that in recording

I consider necessary and possibly controversial:

  • Make time representation absolute from start of recording, instead of relative to previous event

I don't really care about, but I think would be nice from an implementor perspective:

  • Change event format to [<time>, <event>, <type>] (instead of currently-specified [<time>, <type>, <event>]) so that v1 and v2 can at least share event decoding.

@ku1ik commented Mar 24, 2017

Quick note on resize events: it seems there's a family of CSI sequences for controlling the terminal window, including the one to resize it. See here: #198

That's another argument to ditch a separate stream of resize events. The recorder implementation could either do what castty does (not record SIGWINCH, and clamp sizes for the slave), or insert them as a \e[8;h;w;t seq in the recorded stdout stream. The latter would mean JS players could make use of it (or not), and when replaying a recording in an actual terminal, it would resize it (for terminals supporting this seq) without any extra code on the cli player side.
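A hypothetical helper for the second option (illustrative only): on SIGWINCH the recorder would append the xterm window-resize control sequence to the recorded stdout stream instead of emitting a separate resize event.

def resize_escape(rows, cols):
    # xterm window manipulation: CSI 8 ; rows ; cols t
    return "\x1b[8;%d;%dt" % (rows, cols)

# e.g. append resize_escape(30, 100) to the "o" stream when SIGWINCH arrives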

@ku1ik commented Mar 24, 2017

Given we're left with only 1 currently supported event type ("o" - print to stdout), and 1 possibly useful but not yet supported/used ("i" - stdin), I'm considering going back to [delta, "stdout text"] (supporting only the stdout event stream) for now.

I have different priorities now than displaying keystrokes (really not sure when I would get to this, it's already been waiting for years), and there's always the possibility to make a v3 format in the future.

The most important thing for me right now is to make this incremental writing / appending / streaming friendly.

@dhobsd is stdin recording important to you in the short term?

@ku1ik commented Mar 24, 2017

I realized something re using a control seq for storing resize events. If this seq ends up in stdout not as a result of SIGWINCH but because an app printed it (essentially requesting a window resize), and the terminal doesn't support this (doesn't send SIGWINCH here), then the full-screen app (e.g. vim) doesn't resize, but the JS player would resize (it has no way to tell whether this esc seq resulted in a resize or not).

Need to give it more thought...

@dhobsd commented Mar 25, 2017

I don't have any short term goals for supporting stdin recording. Right now my focus is playback with audio in the console and possibly some editing tools. I'm fine punting on stdin for now and on window sizing entirely.

But can we make it delta-start instead of delta-prev? :)

@josegonzalez

I added this comment in #127 - and it's certainly not as well thought out as the above comments - but adding here for posterity.


One thing that would be good would be to have the start time of the session stored in the initial metadata. This could be extracted in other tooling to provide auditable ssh sessions.

Additionally, being able to "inject" metadata might be interesting, so you could potentially tag a created session with stuff like:

  • user that ssh'd
  • hostname
  • server environment

And have that be exposable in some external UI.

@dhobsd commented Apr 28, 2017

I think custom keys could be useful. It'd be up to the recording tool to determine how to let users set them, but I think it's a good idea.

I like the color palette suggestion in asciinema/discussions#8 as well. If automatically detecting the colors used turns out to be a hard problem, we can always export a CLI option that allows using a pre-set palette by name, or defining the colors to use in some palette namespace in tools that support it.

@XANi commented May 16, 2017

Currently I'm developing a re-implementation of a server for similar session capture (just focused more on auditing than casting) and I was looking into using the asciicast format. Some feedback about the format:

  • Extra metadata is very useful, and it is not always just environment variables, so maybe there should be an additional meta, or meta_{app_specific_suffix}, key? That way, say, an audit app could have a meta_auth key that describes the auth method, user login, etc., while a screencast could have meta_author with author data. Or just a simple meta_timestamp with the absolute start of the recording (or maybe that should be a field of its own?)
  • Time absolute to the start of the cast, or an option to have it. Aside from previously mentioned issues, it allows for trivial rough index creation (just divide the file X times and read the timestamp of the first valid event from each part to create a basic index) and, with that, quick jumping across big files with greatly reduced load time.
  • As for resizing, the protocol should not be dictated by what the app can or can't do right now; having support in the protocol now will, at the very worst, make it work the same as v1, and when someone figures out a way to do it reliably it will just work better without having to modify the format.
  • On terminal metadata in general: maybe instead of "size" it should output "all I know about the current terminal state right now"? Currently that isn't much more than size, but if/when someone implements input recording, states like "is input echo on" become important.

@ku1ik commented Sep 2, 2017

@XANi I think the unix timestamp of the start of the recording could be a top-level field of its own - that's a good idea (also suggested by @josegonzalez).

Re "is input echo on?" example, in this particular case we have this information in recorded stdout stream in a form of non-printable esc sequence, as apps turn it on/off by just writing to stdout. Almost all terminal's internal state is driven by stuff written to stdout, and resizing of terminal is an event coming from outside of the terminal, which is very different in nature than state modified by the apps running within this terminal. I don't know about any other "external force" other than SIGWINCH as of this moment (correct me if I'm wrong).

@ku1ik commented Sep 2, 2017

About collected environment variables:

What could work is to have a white-list of env variables saved under the env key in the recording header. It could default to SHELL,TERM, and you could override it either via a command line switch when recording or via a config file option.

Having this, you could set it to, for example, SHELL,TERM,USER,HOSTNAME,CUSTOM_STUFF and have all the information that's useful to you.
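A minimal sketch of that idea (the default and the way the whitelist is passed here are made up, not a real asciinema option):

import os

def collect_env(whitelist="SHELL,TERM"):
    names = [n.strip() for n in whitelist.split(",") if n.strip()]
    return {n: os.environ[n] for n in names if n in os.environ}

header_env = collect_env("SHELL,TERM,USER,HOSTNAME,CUSTOM_STUFF")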

UPDATE: I opened separate issue to discuss env var collection: #222

@ku1ik commented Sep 2, 2017

As for extra, non-environment meta-data in the header:

I see the following options:

  • use the above env vars feature: put any extra info in new env vars and white-list these. The downside is that your shell and all apps started in it would inherit these env vars, which could potentially leak some sensitive data
  • use underscore-prefixed keys, like "_session_id": "67fds6futysgh", "_my_internal_thing": "blah" - the UNIX way, similar to .-prefixed hidden files. You could add them manually / with a script after the recording is finished by editing the file, or the recording tool (e.g. castty) could add them automatically, prefixed with the tool name if it's a tool-specific thing (_castty_audio_url)

UPDATE: I opened separate issue to discuss this in detail: #223

@ku1ik commented Sep 2, 2017

About event timing in v2: I'm convinced to go with absolute (relative from the start of the recording) time. /cc @dhobsd

UPDATE: this is already implemented in this branch.

@ku1ik commented Sep 2, 2017

command key should probably be dropped (or disabled by default) - see the issue with it here: #216

@ku1ik commented Sep 2, 2017

There's also a case of file extension for v2.

We used .json for v1 asciicasts, but I wouldn't use it with v2, as json-lines/ndjson is not JSON.parse-able. Each line is json-parsable, the whole thing is not, so using .json would send a false message to users and/or tools.
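To illustrate (file name and event shape here are just examples): the whole file can't be parsed in one go, but reading the header line and then one event per line is trivial.

import json

with open("demo.jsonl") as f:
    header = json.loads(next(f))                                 # first line: {...} header object
    events = [json.loads(line) for line in f if line.strip()]    # remaining lines: [...] events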

From what I understand, the json-lines and ndjson specs allow each line to be any JSON value, which means our format of a first header line (object) + subsequent event lines (arrays) conforms to them. So I'd rather go with .jsonl or .ndjson than just .json.

I'd consider going even further with this: use a new media type (content-type), something like application/vnd.asciicast.v2, and a file extension like .cast. The reasoning behind using a custom media-type/extension is: if you see a file with a .jsonl ext, how do you know it's an asciicast and not a Hadoop/Spark dataset or some other thing? Any json-lines supporting tool (like jq) can read and process .jsonl files with the assumption that each line represents a distinct and equal entry ({...} object in 99% of cases). We use { ... } only in the first line, and [...] in all subsequent lines, and processing such a file with generic .jsonl tools would probably not work (easily) anyway. So using a custom extension would give it a domain-specific meaning, saying "this is asciicast v2", and the v2 spec would say "it's encoded as json-lines, with the following meaning of lines: ...".

UPDATE: I opened separate issue to discuss this in detail: #224

@XANi commented Sep 2, 2017

A custom extension/type is definitely a good idea, as then it can be bound to an app on the desktop or in the browser and "just work" when clicked. As long as it is still parseable as (nd)json(-lines).

As for metadata, I think there should just be a meta key at the top level with all of the extra data under it, just to keep it contained in one place that, if not needed/used, can be dropped by dropping a single key. The same way env is.

That way, if you want to "anonymize" a cast, you just drop 2-3 keys instead of iterating over the whole structure to drop a bunch of them.

command should definitely stay in the spec, just not be used by default (or at all) by asciinema.

That allows the format to be used not just for pretty demos, but also for stuff like auditing and access control.

@ku1ik commented Sep 4, 2017

I opened a separate issue for discussing the color theme shape: #221 (let's discuss this topic there)

@ku1ik commented Sep 4, 2017

@XANi thanks for the feedback. This thread is getting really long, so I decided to split the v2 format discussion into sub-topics. Would you mind copying your comments about the file ext and meta-data into #223 and #224?

@ku1ik changed the base branch from master to v2 on September 16, 2017

@ku1ik commented Sep 16, 2017

I added a timestamp to the asciicast header in #229.
