
Potential bug with getting parent node ( /.. ) #59

Open
dastorda opened this issue Nov 26, 2020 · 2 comments

@dastorda

Steps to reproduce: save the page source from view-source:https://www.productfrom.com/product/416492-adidas-copa-mundial-soccer-shoes as test.html, then run the following program:

package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadDoc("test.html")
	if err != nil {
		panic(err)
	}
	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text:right')]//span")
	if product != nil {
		fmt.Println("product:", htmlquery.InnerText(product))
	}
}

When I test the XPath expression online, e.g. at https://htmlstrip.com/xpath-tester, it finds a match using this expression: //div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, ' text:right')]//span. The Go program above prints nothing.

@da70

da70 commented Jan 7, 2021

Your Go program will work if you change "text:right" in your XPath expression to "text-right":

$ curl -o test.html https://www.productfrom.com/product/416492-adidas-copa-mundial-soccer-shoes
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 37512    0 37512    0     0  68327      0 --:--:-- --:--:-- --:--:-- 68327
$ cat main.go 
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadDoc("test.html")
	if err != nil {
		panic(err)
	}
	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text:right')]//span")
	if product != nil {
		fmt.Println("product:", htmlquery.InnerText(product))
	}
}
$ diff main.go main-typo-fixed.go 
14c14
< 	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text:right')]//span")
---
> 	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text-right')]//span")
$ go run main.go
$ go run main-typo-fixed.go 
product: Adidas COPA MUNDIAL soccer shoes

I am not sure why your original XPath expression, with the colon in the class name, works at https://htmlstrip.com/xpath-tester; it does not work in the Chrome dev tools console. There too, if you change "text:right" to "text-right", you get the correct element:

image

@zhengchun
Contributor

Hello, I checked the URL you gave, view-source:https://www.productfrom.com/product/416492-adidas-copa-mundial-soccer-shoes, and the HTML source does not contain the string text:right anywhere; it only contains text-right.
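
The point above can be checked without any XPath engine. XPath's contains() is a literal, case-sensitive substring test, so a colon in the needle is not special and never matches a hyphen. The following is only a minimal sketch using the Go standard library: classMatches is a hypothetical helper, and the class value is an assumed example, since the page's full class list is not quoted in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// classMatches mirrors what XPath's contains(@class, needle) does:
// a literal, case-sensitive substring test on the attribute value.
// It is a hypothetical helper for illustration, not part of htmlquery.
func classMatches(class, needle string) bool {
	return strings.Contains(class, needle)
}

func main() {
	// Assumed class value; the real page's class list may differ.
	class := "col-span-6 text-right"

	for _, needle := range []string{"text:right", "text-right"} {
		fmt.Printf("contains(@class, %q) = %v\n", needle, classMatches(class, needle))
	}
}
```

Under these assumptions, only the "text-right" needle matches, which is consistent with the Go program and the Chrome console both returning nothing for "text:right". A quick search of test.html for both strings before filing a bug would show the same thing.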
