JSONDecodeError: Extra data: line 21 column 1 (char 572) for URL https://lubelska.co.uk/ #143

Open
advance512 opened this issue Jun 16, 2020 · 2 comments · May be fixed by #144

advance512 commented Jun 16, 2020

It seems the problem is that the JSON-LD document is wrapped in CDATA markers written as JavaScript comments:

// <![CDATA[
{
  "@context": "http:\/\/schema.org\/",
  "name": "Lubelska",
  "@type": "Organization",
  "logo": "https://lubelska.co.uk/wp/wp-content/uploads/2019/05/Lubelska-1.jpg",
  "url": "https://lubelska.co.uk/",
  "sameAs": [
    "https://twitter.com/EdwardHowey",
    "https://www.facebook.com/Lubelska-309144763268698/",
    "https://www.pinterest.co.uk/lubelskaltd/",
    "https://www.instagram.com/lubelska1/"
  ],
  "contactPoint": [{
    "@type": "ContactPoint",
    "telephone": "+44 20 3911 5526",
    "email": "info@lubelska.co.uk",
    "contactType": "sales"
  }]
}
// ]]&gt;

After the substitution in jsonLd._extractItems():

            # sometimes JSON-decoding errors are due to leading HTML or JavaScript comments
            data = json.loads(
                HTML_OR_JS_COMMENTLINE.sub('', script), strict=False)

it becomes:

{
  "@context": "http:\/\/schema.org\/",
  "name": "Lubelska",
  "@type": "Organization",
  "logo": "https://lubelska.co.uk/wp/wp-content/uploads/2019/05/Lubelska-1.jpg",
  "url": "https://lubelska.co.uk/",
  "sameAs": [
    "https://twitter.com/EdwardHowey",
    "https://www.facebook.com/Lubelska-309144763268698/",
    "https://www.pinterest.co.uk/lubelskaltd/",
    "https://www.instagram.com/lubelska1/"
  ],
  "contactPoint": [{
    "@type": "ContactPoint",
    "telephone": "+44 20 3911 5526",
    "email": "info@lubelska.co.uk",
    "contactType": "sales"
  }]
}
// ]]&gt;

Only the leading comment line is stripped. The trailing one is left in place:

// ]]&gt;

so json.loads() finds extra data after the closing brace and raises the error.
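To make the failure easy to reproduce, here is a minimal sketch. It assumes a start-anchored pattern similar in spirit to HTML_OR_JS_COMMENTLINE; the exact regex in the library may differ, and LEADING_COMMENT and the shortened JSON document are made up for illustration.

```python
import json
import re

# Assumption: a pattern anchored at the start of the string only (no re.MULTILINE),
# so it can strip a leading comment line but never a trailing one.
LEADING_COMMENT = re.compile(r'^\s*(//.*|<!--.*-->)')

# Shortened stand-in for the script body quoted above.
script = """// <![CDATA[
{"@type": "Organization", "name": "Lubelska"}
// ]]>
"""

stripped = LEADING_COMMENT.sub('', script)

error_message = None
try:
    json.loads(stripped, strict=False)
except json.JSONDecodeError as exc:
    # The leftover "// ]]>" line is reported as extra data after the object.
    error_message = str(exc)

print(error_message)
```

With this input the leading `// <![CDATA[` line is removed, but the decoder still stops at the trailing comment line.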

Vitiell0 commented Jul 27, 2020

I'm having the same problem with this URL: https://www.eatwell101.com/shrimp-and-broccoli-foil-packs-recipe

After running HTML_OR_JS_COMMENTLINE, script has this value:

'\n{
"@context":"https:\\/\\/schema.org\\/",
"@type":"Recipe",
"mainEntityOfPage":{
"@type":"WebPage",
"@id":"https:\\/\\/www.eatwell101.com\\/shrimp-and-broccoli-foil-packs-recipe"},
"name":"Baked Shrimp and Broccoli Foil Packs with Garlic Lemon Butter Sauce",
"url":"https:\\/\\/www.eatwell101.com\\/shrimp-and-broccoli-foil-packs-recipe",
"headline":"Baked Shrimp and Broccoli Foil Packs with Garlic Lemon Butter Sauce",
"Description":"This baked shrimp foil pack meal is ready in under 30 minutes - The easiest way to cook shrimp in your oven!",
"author":{
"@type":"Person",
"name":"Christina Cherrier"},
"image":"https:\\/\\/www.eatwell101.com\\/wp-content\\/uploads\\/2019\\/04\\/shrimp-and-broccoli-recipe-2.jpg",
"datePublished":"2020-01-10 07:47:21",
"dateModified":"2020-06-20 17:47:39",
"Publisher":"Eatwell101",
"ingredients":"",
"prepTime":"PT10M",
"cookTime":"PT15M",
"recipeYield":"2 servings"}
// ]]>\n'

So it is the same problem: the trailing // ]]>\n' was not stripped.
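One possible way to handle both reports is to strip comment lines wherever they occur, not just at the start. This is only a sketch under assumptions, and not necessarily what any eventual fix does: COMMENT_LINE_ANYWHERE and loads_ignoring_comment_lines are hypothetical names, and the JSON is shortened.

```python
import json
import re

# Hypothetical pattern: with re.MULTILINE, ^ matches at the start of every line,
# so both the leading "// <![CDATA[" and the trailing "// ]]>" lines are removed.
COMMENT_LINE_ANYWHERE = re.compile(r'^\s*(//.*|<!--.*-->)', re.MULTILINE)


def loads_ignoring_comment_lines(script):
    """Drop whole-line JS/HTML comments, then decode the remaining JSON."""
    return json.loads(COMMENT_LINE_ANYWHERE.sub('', script), strict=False)


script = '// <![CDATA[\n{"@type": "Recipe", "recipeYield": "2 servings"}\n// ]]>\n'
data = loads_ignoring_comment_lines(script)
print(data["@type"])
```

A caveat with this approach: because strict=False permits raw newlines inside JSON strings, a multi-line string value with a line that begins with // would also be altered, so a real fix may want something more targeted.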

@Vitiell0

Just opened a PR with a fix here: #144

@Vitiell0 Vitiell0 linked a pull request Aug 27, 2020 that will close this issue