Select all between two things? #82

Open
didibus opened this issue Jul 30, 2023 · 1 comment

didibus commented Jul 30, 2023

I'm struggling to figure out how to select all elements between two other elements. For example, given:

<ul class="pagination">
   <li class="prevnext"><a href="#" onclick="return false;" class="disablelink">&lt;</a></li>
   <li class="current"><a href="/pc-game-pass/games">1</a></li>
   <li><a href="/pc-game-pass/games?page=2" rel="nofollow">2</a></li>
   <li class="h-ellip"><span>...</span></li>
   <li><a href="/pc-game-pass/games?page=3" rel="nofollow">3</a></li>
   <li class="l"><a href="/pc-game-pass/games?page=4" rel="nofollow">4</a></li>
   <li class="prevnext"><a href="/pc-game-pass/games?page=2" rel="nofollow">&gt;</a></li>
</ul>

I want to select all li elements between the one with class current and the one with class l using hickory selectors, so that I get back:

   <li><a href="/pc-game-pass/games?page=2" rel="nofollow">2</a></li>
   <li class="h-ellip"><span>...</span></li>
   <li><a href="/pc-game-pass/games?page=3" rel="nofollow">3</a></li>

How do you do that?

Thank You

Mertzenich commented

I made a couple of functions to help with this task, based on Hickory's select-locs and select functions. Please pardon the docstrings; they contain the necessary information but are a bit hard to read quickly.

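;; Aliases used by the two functions below.
(require '[clojure.zip :as zip]
         '[hickory.select :as sel]
         '[hickory.zip :as hzip])

(defn select-locs-between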
  "Given a start selector function, an end selector function, a
   filter selector function, and a hickory data structure, returns a
   vector of the locs that lie between the loc matched by the start
   selector and the loc matched by the end selector (both excluded)
   and that also satisfy the filter selector."
  [start-selector-fn end-selector-fn filter-selector-fn hickory-tree]
  (loop [loc (->> hickory-tree
                  (hzip/hickory-zip)
                  (sel/select-next-loc start-selector-fn)
                  (zip/right)
                  (sel/select-next-loc filter-selector-fn))
         selected-nodes (transient [])]
    (if (nil? loc)
      (persistent! selected-nodes)
      (recur (sel/select-next-loc filter-selector-fn (zip/right loc) zip/right end-selector-fn)
             (conj! selected-nodes loc)))))

(defn select-between
  "Given a start selector function, an end selector function, a
   filter selector function, and a hickory data structure, returns a
   vector of the nodes that lie between the node matched by the start
   selector and the node matched by the end selector (both excluded)
   and that also satisfy the filter selector."
  [start-selector-fn end-selector-fn filter-selector-fn hickory-tree]
  (mapv zip/node
        (select-locs-between start-selector-fn end-selector-fn filter-selector-fn hickory-tree)))

As mentioned above, select-locs-between is based on the existing select-locs function. It finds the first loc that matches your start selector function, steps to the loc on its right, and then keeps moving right, collecting every loc that matches the filter selector function, until it reaches one that matches your end selector function. The start and end locs themselves are not included. The code assumes that both selectors match something and that the end loc is a later sibling of the start loc.
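
The same "between two markers" idea can be sketched on a plain sequence, without zippers. This is only an illustration: `between` and `items` are hypothetical names that are not part of Hickory, and the maps stand in for the li elements of the pagination list.

```clojure
;; Hypothetical helper, not part of Hickory: returns the items strictly
;; between the first item matching start? and the next item matching end?.
(defn between
  [start? end? coll]
  (->> coll
       (drop-while (complement start?))  ; skip everything before the start marker
       rest                              ; drop the start marker itself
       (take-while (complement end?))))  ; stop just before the end marker

;; Stand-ins for the <li> elements, keeping only their class.
(def items
  [{:class "prevnext"} {:class "current"} {:class nil}
   {:class "h-ellip"} {:class nil} {:class "l"} {:class "prevnext"}])

(between #(= "current" (:class %))
         #(= "l" (:class %))
         items)
;; => ({:class nil} {:class "h-ellip"} {:class nil})
```

select-locs-between does roughly the same walk, but over zipper locs, so each result keeps its position in the tree.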

Here is an example:

(require '[hickory.core :as h]
         '[hickory.select :as s])

(def html
  "<ul class=\"pagination\">
       <li class=\"prevnext\"><a href=\"#\" onclick=\"return false;\" class=\"disablelink\">&lt;</a></li>
       <li class=\"current\"><a href=\"/pc-game-pass/games\">1</a></li>
       <li><a href=\"/pc-game-pass/games?page=2\" rel=\"nofollow\">2</a></li>
       <li class=\"h-ellip\"><span>...</span></li>
       <li><a href=\"/pc-game-pass/games?page=3\" rel=\"nofollow\">3</a></li>
       <li class=\"l\"><a href=\"/pc-game-pass/games?page=4\" rel=\"nofollow\">4</a></li>
       <li class=\"prevnext\"><a href=\"/pc-game-pass/games?page=2\" rel=\"nofollow\">&gt;</a></li>
   </ul>")

(def htree
  (-> html
      (h/parse)
      (h/as-hickory)))

(select-between (s/class "current")
                (s/class "l")
                s/element
                htree)

Returns:

[{:type :element,
  :attrs nil,
  :tag :li,
  :content
  [{:type :element,
    :attrs {:href "/pc-game-pass/games?page=2", :rel "nofollow"},
    :tag :a,
    :content ["2"]}]}
 {:type :element,
  :attrs {:class "h-ellip"},
  :tag :li,
  :content
  [{:type :element, :attrs nil, :tag :span, :content ["..."]}]}
 {:type :element,
  :attrs nil,
  :tag :li,
  :content
  [{:type :element,
    :attrs {:href "/pc-game-pass/games?page=3", :rel "nofollow"},
    :tag :a,
    :content ["3"]}]}]
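
Because the result is plain Clojure data, it can be processed further with ordinary sequence functions. As a minimal sketch, assuming the vector above is bound to a hypothetical var named `selected` (it is written out literally here so the snippet stands alone), the page links could be pulled out like this:

```clojure
;; `selected` is a hypothetical name holding the vector returned above.
(def selected
  [{:type :element, :attrs nil, :tag :li,
    :content [{:type :element,
               :attrs {:href "/pc-game-pass/games?page=2", :rel "nofollow"},
               :tag :a, :content ["2"]}]}
   {:type :element, :attrs {:class "h-ellip"}, :tag :li,
    :content [{:type :element, :attrs nil, :tag :span, :content ["..."]}]}
   {:type :element, :attrs nil, :tag :li,
    :content [{:type :element,
               :attrs {:href "/pc-game-pass/games?page=3", :rel "nofollow"},
               :tag :a, :content ["3"]}]}])

;; Take the :href of the first child of each li; `keep` drops the
;; ellipsis item, whose span child has no :href.
(def hrefs
  (keep #(get-in % [:content 0 :attrs :href]) selected))

hrefs
;; => ("/pc-game-pass/games?page=2" "/pc-game-pass/games?page=3")
```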

The question is likely a common one (I had it myself when I started using Hickory recently), so I wonder whether it would be worth putting this somewhere. Since it's a little less general than the typical selectors, perhaps an extra namespace that provides this sort of functionality would be in order? Or a section in the docs with larger examples? Not sure.
