'conversion' records are underspecified #40

Open
ato opened this issue Jul 11, 2018 · 1 comment

Comments

@ato
Member

ato commented Jul 11, 2018

Problem 1: dates

What should WARC-Date on a 'conversion' record be? Section 5.4 says:

The timestamp shall represent the instant that data capture for record creation began.

Does 'data capture' in the context of a conversion refer to the capture of the original record? Or does it refer to the moment you started writing the transformed content? If the former, how do you record the date of the transformation? If the latter, how do you know the date the resource was originally archived? Presumably by following the WARC-Refers-To header, right?

However section 6.8 'conversion' includes this statement:

Each transformation should result in a freestanding, complete record, with no dependency on survival of the original record.

Which implies you should not rely on the original record for anything... but how do you actually do that?

One solution to this problem would be to allow and recommend WARC-Refers-To-Date on 'conversion' records. The case of a conversion of a conversion needs specifying too.
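To make the proposal concrete, here is a rough sketch of what the header block of such a record could look like. It assumes the "date of transformation" reading of WARC-Date and puts the original capture date in WARC-Refers-To-Date, which is the change proposed here and not something the standard currently sanctions on 'conversion' records. The function name, parameters and sample values are made up for illustration.

```python
import uuid


def build_conversion_headers(target_uri, original_record_id, original_date,
                             conversion_date, content_type, block):
    """Return the WARC header block for a hypothetical 'conversion' record.

    original_date   -- WARC-Date of the record the conversion was made from
    conversion_date -- instant the transformed content was written
    Both are expected as UTC timestamps, e.g. "2018-07-11T00:00:00Z".
    """
    headers = [
        ("WARC-Type", "conversion"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # Assumption: WARC-Date is the moment of transformation.
        ("WARC-Date", conversion_date),
        ("WARC-Target-URI", target_uri),
        ("WARC-Refers-To", original_record_id),
        # Proposed in this issue, not (yet) specified for 'conversion' records:
        # carry the original capture date so it survives loss of the original.
        ("WARC-Refers-To-Date", original_date),
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),
    ]
    lines = [f"{name}: {value}\r\n" for name, value in headers]
    return "WARC/1.1\r\n" + "".join(lines) + "\r\n"


example = build_conversion_headers(
    target_uri="http://example.org/image.tif",
    original_record_id="<urn:uuid:11111111-2222-3333-4444-555555555555>",
    original_date="2010-03-01T12:00:00Z",
    conversion_date="2018-07-11T00:00:00Z",
    content_type="image/png",
    block=b"\x89PNG...",
)
print(example)
```

For a conversion of a conversion it is still unclear whether WARC-Refers-To-Date should hold the date of the immediate source or of the original capture; the sketch does not settle that.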

Problem 2: protocol headers

If you convert a request or response record, do you convert the HTTP headers too? If you don't, we run into the 'freestanding, complete record' problem again. Some HTTP headers are necessary for replay.

The examples and this statement sort of imply you don't include protocol headers:

For ‘conversion’ records, the payload is defined as the record block.
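A small sketch of how a reader might act on that statement, contrasted with an HTTP 'response' record where the payload is what follows the protocol headers. The function name is hypothetical, the HTTP split is deliberately naive (no transfer-encoding handling), and returning the whole block for every other record type is a simplification, not something taken from the standard.

```python
def payload_of(warc_type: str, content_type: str, block: bytes) -> bytes:
    """Return the payload of a record block under a simplified reading."""
    if warc_type == "conversion":
        # Per the quoted statement: the payload is the record block itself,
        # so no protocol headers are expected to be stripped.
        return block
    if warc_type in ("request", "response") and content_type.startswith("application/http"):
        # Naive split: everything after the first blank line is the body.
        _head, sep, body = block.partition(b"\r\n\r\n")
        return body if sep else b""
    # Simplification for all other record types.
    return block


http_block = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"
print(payload_of("response", "application/http; msgtype=response", http_block))
print(payload_of("conversion", "text/plain", b"hi"))
```

Under this reading, any HTTP headers needed for replay (Content-Type, for instance) would have to be carried some other way once the original record is gone.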

Can you use a conversion record to transform from one protocol to another?

Problem 3: determining the type of the original record

Again we trip over 'freestanding, complete'. After the original record is lost, how do you know whether the conversion was made from a 'response' or a 'request' record? Nothing seems to imply you couldn't make a 'conversion' of a 'request', or even of a 'warcinfo' for that matter.

@ato
Member Author

ato commented Jul 11, 2018

The implementation guidelines have this to say on the WARC-Date matter:

Note that a different behavior should be adopted for payload migration: according to the standard, the WARC-date of a conversion record is the date of the creation of the new record, that is when the migration occurred. There is indeed a great difference between converting a file from a container format to another, and migrating the format of this file.
