Extruct not matching up with Schema.org structured data testing tool (Incorrect image Urls) #191

Open
dconnx opened this issue Feb 22, 2022 · 3 comments


dconnx commented Feb 22, 2022

Test Url: https://www.fabucci.ie/ladies-shoes/marian-gold-stiletto-with-black-toe-cap.html
Schema.org Structured Data Testing Tool for same: https://validator.schema.org/#url=https%3A%2F%2Fwww.fabucci.ie%2Fladies-shoes%2Fmarian-gold-stiletto-with-black-toe-cap.html

This page contains one product marked up with microdata. The product has 3 images, and the structured data testing tool shows 3 JPG image URLs for it.

Extruct parses the same page HTML and returns an "images" array but, instead of the JPG image URLs, the array contains the Product URL (from the structured data) repeated 3 times.

Schema.org structured data testing tool:
[screenshot]

Extruct parsed data:
[screenshot]
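
A minimal reproduction sketch for the comparison described above, assuming `extruct` is installed and the page is still reachable. The helper names (`fetch_html`, `extract_microdata`, `as_list`, `product_images`) are hypothetical and not part of extruct; the only extruct call used is `extruct.extract` with its `base_url` and `syntaxes` arguments. The browser-like User-Agent header is also an assumption.

```python
from urllib.request import Request, urlopen


def fetch_html(url):
    """Download a page as text. The User-Agent value is an assumption."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_microdata(html, url):
    """Return extruct's list of top-level microdata items for the page."""
    import extruct  # third-party: pip install extruct

    data = extruct.extract(html, base_url=url, syntaxes=["microdata"])
    return data["microdata"]


def as_list(value):
    """Normalize a missing, single, or list value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def product_images(items):
    """Collect the 'image' values of every top-level item typed as Product."""
    images = []
    for item in items:
        types = as_list(item.get("type"))
        if any(str(t).rstrip("/").endswith("/Product") for t in types):
            images.extend(as_list(item.get("properties", {}).get("image")))
    return images


# Usage (needs network access and extruct):
# url = "https://www.fabucci.ie/ladies-shoes/marian-gold-stiletto-with-black-toe-cap.html"
# print(product_images(extract_microdata(fetch_html(url), url)))
```

Comparing the printed list with the image URLs shown by the validator should make the mismatch, if it still exists, easy to see.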


The-Nightwing commented May 3, 2022

How can I get this parsed data from the given URL? In other words, how can I reproduce this error?

I ask because I get a correct array of JPG image URLs.

For example:
(part of the extruct parsed data)

'type': 'http://schema.org/BreadcrumbList'},
               {'properties': {'description': 'Read more',
                               'image': ['https://www.fabucci.ie/14873-medium_default/-marian-navy-suede-envelope-clutch-bag.jpg',
                                         'https://www.fabucci.ie/14874-medium_default/-marian-navy-suede-envelope-clutch-bag.jpg',
                                         'https://www.fabucci.ie/14875-medium_default/-marian-navy-suede-envelope-clutch-bag.jpg',
                                         'https://www.fabucci.ie/14876-medium_default/-marian-navy-suede-envelope-clutch-bag.jpg'],

Also, the URL you gave is no longer working, so I picked a different product from the same website.


getorca commented Oct 1, 2022

I've noticed this issue before as well. It shouldn't be in the breadcrumb list.

> because i am able to get a perfect array of jpg image urls, and that too correct.

You're sharing the breadcrumb list. I would check the item with type Product.

I've noticed this issue before as well with extruct.
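
To check which item an `image` array actually belongs to, a small sketch like the following may help. It assumes the usual shape of extruct's microdata output (dicts with `type` and `properties`, possibly nested inside property values); `iter_items` and `images_by_type` are hypothetical helper names, not extruct functions. Note that if an excerpt was printed with `pprint`, dict keys are sorted by default, so `'properties'` appears before `'type'` and a truncated excerpt can make an image list look attached to the preceding item's type.

```python
def iter_items(node):
    """Yield every microdata item dict in node, including nested items."""
    if isinstance(node, list):
        for child in node:
            yield from iter_items(child)
    elif isinstance(node, dict) and "properties" in node:
        yield node
        # Nested items live inside the property values.
        yield from iter_items(list(node["properties"].values()))


def images_by_type(items):
    """Return (type, images) pairs for every item that has an 'image' property."""
    rows = []
    for item in iter_items(items):
        image = item["properties"].get("image")
        if image is None:
            continue
        images = image if isinstance(image, list) else [image]
        rows.append((item.get("type"), images))
    return rows


# Usage, given the microdata list from extruct:
# for item_type, images in images_by_type(microdata_items):
#     print(item_type, images)
```

Printing the pairs side by side shows whether the Product item carries JPG URLs or the product page URL.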

@The-Nightwing

Can we resolve this issue internally? @getorca
