After bundling hickory in an uberjar, it throws an exception when running the .jar file #80

JovanToroman opened this issue May 6, 2023 · 2 comments


I created a Clojure web app that uses hickory to scrape web pages. To create an uberjar of the app I used io.github.clojure/ {:git/tag "v0.9.1" :git/sha "27ff8a4"}, building it with clj -T:build uber. My deps.edn looks like this:

{:paths ["src/clj" "src/dev"]
 :deps {ring/ring {:mvn/version "1.4.0"}
        http-kit/http-kit {:mvn/version "2.5.3"}
        com.taoensso/timbre {:mvn/version "5.2.1"}
        metosin/reitit {:mvn/version "0.5.17"}
        metosin/ring-http-response {:mvn/version "0.9.3"}
        org.clj-commons/hickory {:mvn/version "0.7.3"}
        hiccup/hiccup {:mvn/version "1.0.5"} {:mvn/version "1.2.0"}
        org.clojure/core.async {:mvn/version "1.6.673"}
        com.draines/postal {:mvn/version "2.0.5"}}
 :aliases {:build {:extra-paths ["src/build"]
                   :extra-deps {io.github.clojure/ {:git/tag "v0.9.1" :git/sha "27ff8a4"}
                                org.clj-commons/hickory {:mvn/version "0.7.3"}}
                   :ns-default uberjar}
           :dev {:main-opts ["-m" "gajbe.server"]}}}
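The build script under src/build is not included in the issue. For anyone trying to reproduce this, a minimal tools.build script consistent with the :build alias above (:ns-default uberjar, invoked as clj -T:build uber) might look like the following. It is a sketch based on the tools.build guide, not the reporter's actual script; the file name src/build/uberjar.clj, the target/gajbe.jar output path, and gajbe.server as the main namespace (taken from the :dev alias) are all assumptions.

```clojure
(ns uberjar
  ;; Hypothetical src/build/uberjar.clj; the real build script is not shown in the issue.
  (:require [clojure.tools.build.api :as b]))

(def class-dir "target/classes")
;; Assumed output path, matching the `java -jar target/gajbe.jar` command.
(def uber-file "target/gajbe.jar")
;; Delayed so the basis is only computed when a build task actually runs.
(def basis (delay (b/create-basis {:project "deps.edn"})))

(defn clean [_]
  (b/delete {:path "target"}))

(defn uber [_]
  (clean nil)
  ;; Copy the source dirs listed under :paths in deps.edn into the staging dir.
  (b/copy-dir {:src-dirs ["src/clj" "src/dev"]
               :target-dir class-dir})
  ;; AOT-compile the app; Clojure also compiles the namespaces it requires.
  (b/compile-clj {:basis @basis
                  :src-dirs ["src/clj" "src/dev"]
                  :class-dir class-dir})
  ;; Assumes gajbe.server declares (:gen-class) and defines -main.
  (b/uber {:class-dir class-dir
           :uber-file uber-file
           :basis @basis
           :main 'gajbe.server}))
```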

When I then ran the app with java -jar target/gajbe.jar, I got this exception:

Exception in thread "async-dispatch-1" java.lang.NoClassDefFoundError: hickory/core/HickoryRepresentable
        at gajbe.rasclanjivaci.ProcesorBeogradskiOglasi.izvuci_oglase(rasclanjivaci.clj:97)
        at gajbe.rasclanjivaci$fn__24183$G__24153__24185.invoke(rasclanjivaci.clj:10)
        at gajbe.rasclanjivaci$fn__24183$G__24152__24188.invoke(rasclanjivaci.clj:10)
        at clojure.core$map$fn__5935.invoke(core.clj:2770)
        at clojure.lang.LazySeq.sval(
        at clojure.lang.LazySeq.seq(
        at clojure.lang.RT.seq(
        at clojure.core$seq__5467.invokeStatic(core.clj:139)
        at clojure.core$apply.invokeStatic(core.clj:662)
        at clojure.core$mapcat.invokeStatic(core.clj:2800)
        at clojure.core$mapcat.doInvoke(core.clj:2800)
        at clojure.lang.RestFn.invoke(
        at gajbe.rasclanjivaci$dohvati_oglase.invokeStatic(rasclanjivaci.clj:117)
        at gajbe.rasclanjivaci$dohvati_oglase.invoke(rasclanjivaci.clj:115)
        at gajbe.poslovi$pokreni_obavestavaca$fn__24511$state_machine__21095__auto____24512$fn__24514.invoke(poslovi.clj:12)
        at gajbe.poslovi$pokreni_obavestavaca$fn__24511$state_machine__21095__auto____24512.invoke(poslovi.clj:12)
        at clojure.core.async.impl.runtime$run_state_machine.invokeStatic(runtime.clj:62)
        at clojure.core.async.impl.runtime$run_state_machine.invoke(runtime.clj:61)
        at clojure.core.async.impl.runtime$run_state_machine_wrapped.invokeStatic(runtime.clj:66)
        at clojure.core.async.impl.runtime$run_state_machine_wrapped.invoke(runtime.clj:64)
        at gajbe.poslovi$pokreni_obavestavaca$fn__24511.invoke(poslovi.clj:12)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.base/java.util.concurrent.ThreadPoolExecutor$
        at clojure.core.async.impl.concurrent$counted_thread_factory$reify__15124$fn__15125.invoke(concurrent.clj:29)
        at java.base/

Caused by: java.lang.ClassNotFoundException: hickory.core.HickoryRepresentable
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(
        at java.base/java.lang.ClassLoader.loadClass(
        ... 27 more

It isn't related to the code running on an async thread: the same error occurs when it runs synchronously. I've attached the uberjar and the only file in which I use the library.
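Because the root cause is a ClassNotFoundException for hickory.core.HickoryRepresentable, one quick check is whether that class file is present in the uberjar at all (the shell equivalent is listing the jar with jar tf). A minimal sketch using only java.util.jar; the namespace jar-check and the functions jar-contains? and entries-under are hypothetical helpers, not part of hickory or tools.build.

```clojure
(ns jar-check
  (:import (java.util.jar JarFile JarEntry)))

(defn jar-contains?
  "True if the jar at jar-path has an entry named entry-name,
  e.g. \"hickory/core/HickoryRepresentable.class\"."
  [jar-path entry-name]
  (with-open [jar (JarFile. ^String jar-path)]
    (some? (.getJarEntry jar ^String entry-name))))

(defn entries-under
  "Sorted names of all jar entries starting with prefix, e.g. \"hickory/\".
  The seq is fully realized before the jar is closed."
  [jar-path prefix]
  (with-open [jar (JarFile. ^String jar-path)]
    (->> (enumeration-seq (.entries jar))
         (map #(.getName ^JarEntry %))
         (filter #(.startsWith ^String % ^String prefix))
         sort
         vec)))
```

For example, (jar-contains? "target/gajbe.jar" "hickory/core/HickoryRepresentable.class") tells you whether the class was packaged, and (entries-under "target/gajbe.jar" "hickory/") shows whether hickory ended up in the jar as source, as compiled classes, or both.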

Clojure ns using hickory:

(ns gajbe.rasclanjivaci
  (:require [clojure.string :as str]
            [hickory.core :as h]
            [ :as hs]
            [gajbe.urlovi :as url]
            [gajbe.util :refer [-m]]
            [java-time.api :as jt])
  (:import (java.time ZoneId)))

(defprotocol IzvuciOglase
  (dohvati-stranicu [this] "takes the list of all URLs and returns the HTML page")
  (izvuci-oglase [this] "takes the page and returns a collection of maps, each of which is one listing")
  (obradi-oglas [this oglas] "takes the hickory data of a single listing and returns its elements"))

(defn datum-string->instant [^String datum ^String format]
  (let [formater (jt/formatter format)
        local-date-time (.atStartOfDay (jt/local-date formater datum))]
    (jt/instant (jt/zoned-date-time local-date-time (ZoneId/systemDefault)))))

(defn- relativni-u-apsolutni
  [^String datum]
  (condp re-matches datum
    #"[Dd]anas" (jt/instant)
    #"[Jj]u[a-zA-Z\u00C0-\u024F]e" (jt/minus (jt/instant) (jt/days 1))
    #"pre nedelju dana" (jt/minus (jt/instant) (jt/weeks 1))
    #"pre ([0-9]+) nedelj[a-z]" :>> (fn [[_ broj-nedelja]]
                                      (jt/minus (jt/instant) (jt/weeks (read-string broj-nedelja))))
    #"pre ([0-9]+) dana" :>> (fn [[_ broj-dana]]
                               (jt/minus (jt/instant) (jt/days (read-string broj-dana))))
    #"([0-9]+) dan[a-z]?[\s]+pre" :>> (fn [[_ broj-dana]]
                                        (jt/minus (jt/instant) (jt/days (read-string broj-dana))))
    #"([0-9]+) sat[a-z]?[\s]+pre" :>> (fn [[_ broj-sati]]
                                        (jt/minus (jt/instant) (jt/hours (read-string broj-sati))))
    #"([0-9]+) minut[a-z]?[\s]+pre" :>> (fn [[_ broj-minuta]]
                                          (jt/minus (jt/instant) (jt/minutes (read-string broj-minuta))))
    #"([a-zA-Z]+) ([0-9]+), ([0-9]+)" :>> (fn [[_ mesec dan godina]] ;; e.g. Mar 21, 2023
                                            (datum-string->instant (str/join "/" [godina mesec (inc (read-string dan))])

  (relativni-u-apsolutni "8 sati pre"))

(deftype ProcesorKP [urlovi imena-domena]
  IzvuciOglase
  (dohvati-stranicu [_this]
    (slurp (first (:KP urlovi))))
  (izvuci-oglase [this]
    (let [oglasi (hs/select (hs/tag :article) (h/as-hickory (h/parse (dohvati-stranicu this))))]
      (map (partial obradi-oglas this) oglasi)))
  (obradi-oglas [_this oglas]
    (let [[{{link-oglasa :href} :attrs}] (hs/select (hs/class :Link_link__J4Qd8) oglas)
          link-oglasa (str (:KP imena-domena) link-oglasa)
          [{[naslov] :content}] (hs/select (hs/class :AdItem_name__RhGAZ) oglas)
          [{[opis] :content}] (hs/select (hs/child (hs/class :AdItem_adTextHolder__Fmra9) (hs/tag :p)) oglas)
          [{[cena] :content}] (hs/select (hs/class :AdItem_price__jUgxi) oglas)
          [{{link-fotografije :src} :attrs}] (hs/select (hs/child (hs/class :AdItem_imageHolder__LZaKa) (hs/tag :img)) oglas)
          [{[mesto] :content}] (hs/select (hs/child (hs/class :AdItem_originAndPromoLocation__HgtYj) (hs/tag :p)) oglas)
          datum (relativni-u-apsolutni
                  (first (:content (last
                                     (hs/select (hs/child (hs/class :AdItem_postedStatus__swUhG)
                                                  (hs/tag :p)) oglas)))))
          kp-obnovljen? (some? (first (:content (first (hs/select (hs/child (hs/class :AdItem_postedStatus__swUhG)
                                                                    (hs/tag :a)) oglas)))))
          izvor :KP]
      (-m link-oglasa naslov opis cena link-fotografije mesto datum kp-obnovljen? izvor))))

(deftype ProcesorHaloOglasi [urlovi imena-domena]
  IzvuciOglase
  (dohvati-stranicu [_this]
    (slurp (first (:halo-oglasi urlovi))))
  (izvuci-oglase [this]
    (let [oglasi (hs/select (hs/and (hs/class :product-item) (hs/el-not (hs/class :banner-list)))
                   (h/as-hickory (h/parse (dohvati-stranicu this))))]
      (map (partial obradi-oglas this) oglasi)))
  (obradi-oglas [_this oglas]
    (let [[{[naslov] :content {link-oglasa :href} :attrs}] (hs/select (hs/child (hs/class :product-title) (hs/tag :a)) oglas)
          link-oglasa (str (:halo-oglasi imena-domena) link-oglasa)
          [{[{[cena] :content}] :content}] (hs/select (hs/attr :data-value) oglas)
          [{{link-fotografije :src} :attrs}] (hs/select (hs/descendant (hs/tag :figure) (hs/tag :img)) oglas)
          mesto (str/join "/" (map (comp first :content)
                                (:content (first (hs/select (hs/class :subtitle-places) oglas)))))
          [tip kvadratura broj-soba] (map (comp first :content)
                                       (hs/select (hs/descendant (hs/class :product-features) (hs/class :value-wrapper)) oglas))
          [{[datum] :content}] (hs/select (hs/class :publish-date) oglas)
          datum (datum-string->instant datum "dd.MM.yyyy.")
          izvor :halo-oglasi]
      (-m link-oglasa naslov cena link-fotografije mesto tip kvadratura broj-soba datum izvor))))

(deftype ProcesorBeogradskiOglasi [urlovi imena-domena]
  IzvuciOglase
  (dohvati-stranicu [_this]
    (slurp (first (:beogradski-oglasi urlovi))))
  (izvuci-oglase [this]
    (let [oglasi (hs/select (hs/class :classified) (h/as-hickory (h/parse (dohvati-stranicu this))))]
      (map (partial obradi-oglas this) oglasi)))
  (obradi-oglas [_this oglas]
    (let [[{[naslov] :content}] (hs/select (hs/child (hs/class :title) (hs/tag :h3)) oglas)
          [{{link-oglasa :href} :attrs}] (hs/select (hs/child (hs/class :classified) (hs/tag :a)) oglas)
          link-oglasa (str (:beogradski-oglasi imena-domena) link-oglasa)
          [{[opis] :content}] (hs/select (hs/child (hs/class :fbac) (hs/tag :p)) oglas)
          [{[cena] :content}] (hs/select (hs/class :sl-price) oglas)
          cena (str/trim cena)
          [{{link-fotografije :src} :attrs}] (hs/select (hs/class :cpic) oglas)
          [{[mesto] :content}] (hs/select (hs/class :sl-loc) oglas)
          [{[datum] :content}] (hs/select (hs/child (hs/class :fbac) (hs/class :small-light)) oglas)
          datum (relativni-u-apsolutni (str/trim (second (str/split datum #" "))))
          kp-obnovljen? (some? (first (:content (first (hs/select (hs/child (hs/class :AdItem_postedStatus__swUhG)
                                                                    (hs/tag :a)) oglas)))))
          izvor :beogradski-oglasi]
      (-m link-oglasa naslov opis cena link-fotografije mesto datum kp-obnovljen? izvor))))

(defn dohvati-oglase []
  (sort-by :datum jt/after?
    (mapcat izvuci-oglase
      [(->ProcesorBeogradskiOglasi url/urlovi-oglasa url/imena-domena)
       (->ProcesorHaloOglasi url/urlovi-oglasa url/imena-domena)
       (->ProcesorKP url/urlovi-oglasa url/imena-domena)])))

(comment
  (mapcat izvuci-oglase [(->ProcesorBeogradskiOglasi url/urlovi-oglasa) (->ProcesorHaloOglasi url/urlovi-oglasa)
                         (->ProcesorKP url/urlovi-oglasa url/imena-domena)]))

Let me know if you need any other info.

@JovanToroman Sorry for the delay on this. Is this still a problem? There's a lot going on here; would you be able to create a small reproducible sample project, without any of your own code, to see whether the problem still exists?

Unfortunately, I didn't commit regularly while changing my code, so I can't go back to the version in which this error occurred. I think you can close this, as I found a different way of building and running the jar.
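For reference, a reproduction of the kind requested above could be a single namespace that depends only on hickory and calls it from a deftype method, the same call shape as ProcesorBeogradskiOglasi.izvuci_oglase in the stack trace. This is a minimal sketch: repro.core, the Extract protocol, and the sample HTML are hypothetical, and whether an uberjar built from it reproduces the NoClassDefFoundError is untested.

```clojure
(ns repro.core
  ;; Hypothetical minimal namespace: only hickory, no application code.
  (:require [hickory.core :as h]
            [hickory.select :as hs])
  (:gen-class))

(defprotocol Extract
  (extract [this] "Returns the text of every <p> element in the page."))

;; Same shape as the failing code: a deftype method that calls
;; hickory's protocol function h/as-hickory.
(deftype Extractor [html]
  Extract
  (extract [_]
    (->> (h/as-hickory (h/parse html))
         (hs/select (hs/tag :p))
         (mapv (comp first :content)))))

(defn -main [& _]
  (println (extract (->Extractor "<html><body><p>one</p><p>two</p></body></html>"))))
```

Running it once with clj -M -m repro.core and once from the uberjar with java -jar would show whether the error depends on the packaging rather than on the scraping code.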

