Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Squarespace #3243

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Squarespace #3243

wants to merge 10 commits into from

Conversation

EruditeLying
Copy link

@EruditeLying EruditeLying commented Jan 30, 2024

Translator for Squarespace blogs. Should work with most templates

Only search pages that put a data-url attribute on results are supported

Has a priority of 300 so that it has higher priority than Embedded Metadata, which is supported for Squarespace blogs but insufficient

Translator for Squarespace blogs. Should work with most templates

Only search pages that put a data-url attribute on results are supported
Normalize line endings to CR LF
We are in 2024
Remove code to scrape <meta name="datePublished"> tags, which Squarespace blogs don't use. They use <meta itemprop="datePublished"> instead, which is already handled
Lower priority to 330, as a relatively low quality metadata translator
Raise priority to 299, as it should be preferred to Embedded Metadata, which can't extract all metadata from Squarespace blogs
Add test case for tags
Priority 300 is better to take priority over Embedded Metadata
Scrape tags, if present
Copy link
Member

@AbeJellinek AbeJellinek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I wrote out a bunch of suggestions, but I'm actually wondering whether it would make more sense to fold this into the Embedded Metadata translator like we already do for WordPress blogs rather than adding a new empty-target translator.

Squarespace.js Outdated Show resolved Hide resolved
Squarespace.js Outdated Show resolved Hide resolved
Squarespace.js Outdated Show resolved Hide resolved
Squarespace.js Outdated Show resolved Hide resolved
Squarespace.js Outdated
}

function getMetadataItem(doc, prop) {
return attr(doc, 'html>head>meta[itemprop="' + prop + '"]', 'content');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we need the extra specification and not just meta[itemprop="${prop}"]? <meta> tags are valid in the body and we don't normally restrict the selector unless there's a very good performance argument for doing so (e.g. detection in Embedded Metadata).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the selector from some other translator (Embedded Metadata probably). I'll simplify it

As suggested by @AbeJellinek:
- use relative URLs in search results
- accept meta tags from anywhere in the page
- don't set website type
- use ZU.cleanAuthor to build the creator
- use Array.from instead of Array.prototype.slice.call

Additionally, call ZU.trimInternal on the abstract, to normalize whitespace

Update test cases accordingly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants