Jsouper

No API, no problem. Jsouper helps you parse HTML into Java objects. It borrows from Square's Moshi and is powered by jsoup:

Document document = Jsoup.connect("https://play.google.com/store").get();

Jsouper jsouper = new Jsouper.Builder().build();
ElementAdapter<Movie> elementAdapter = jsouper.adapter(Movie.class);

Movie movie = elementAdapter.fromElement(document);
System.out.println(movie);

or with Retrofit:

Retrofit retrofit = new Retrofit.Builder().baseUrl(PlayStoreApi.BASE_URL)
        .addConverterFactory(JsoupConverterFactory.create())
        .build();

playStoreApi = retrofit.create(PlayStoreApi.class);
Call<List<Movie>> movies = playStoreApi.getMovies();

Sample

See the sample module for Java and Android examples that use Jsouper to build a Google Play Movies UI purely from the website's HTML.

Custom Element Adapters

Parsing HTML into Java objects takes considerably more work than parsing JSON with Moshi or Gson, because HTML tags and attributes rarely map directly to how you want to structure your objects.

At minimum, you will need to define a custom ElementAdapter for your most primitive classes. query() returns the CSS selector that identifies the top-most Element mapping to your Java object; Jsouper passes it to jsoup's Element.select(query). You will probably want to familiarize yourself with jsoup's selector syntax and attribute extraction before proceeding.

For each Element that the query matches, define how the object is constructed in fromElement:

public class CoverAdapter extends ElementAdapter<Cover> {
  @Override
  public String query() {
    return "cover";
  }

  @Override
  public Cover fromElement(Element element) throws IOException {
    final String imageUrl =
        element.select("div.cover-inner-align").select("img").first().attr("data-cover-large");
    final String targetUrl = element.select("a.card-click-target").attr("href");
    return new Cover(imageUrl, targetUrl);
  }
}

Jsouper can generate adapters for objects composed of types that already have an ElementAdapter. You still need to supply the query, but you can do so with the @SoupQuery annotation on your model class:

@SoupQuery("div.card.no-rationale.tall-cover.movies.small")
public class Movie {
  public final Cover cover;
  public final Detail detail;
  public final Rating rating;

  public Movie(Cover cover, Detail detail, Rating rating) {
    this.cover = cover;
    this.detail = detail;
    this.rating = rating;
  }
}

...

ElementAdapter<Movie> movieAdapter = jsouper.adapter(Movie.class);
Movie movie = movieAdapter.fromElement(document);
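
To make "generated by Jsouper" more concrete, here is a dependency-free sketch of one plausible way a composed object could be assembled from adapters for its parts. It is not Jsouper's implementation, and the real mechanism may differ (Moshi, for instance, populates fields rather than calling constructors). `CompositionSketch`, `compose`, and `sampleAdapters` are hypothetical names, and a delimited String stands in for a jsoup Element, with each leaf adapter extracting its own piece much as each ElementAdapter runs its own query.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CompositionSketch {
  // Tiny stand-ins for the README's model classes.
  static class Cover {
    final String imageUrl;
    Cover(String imageUrl) { this.imageUrl = imageUrl; }
  }

  static class Rating {
    final double stars;
    Rating(double stars) { this.stars = stars; }
  }

  static class Movie {
    final Cover cover;
    final Rating rating;
    Movie(Cover cover, Rating rating) {
      this.cover = cover;
      this.rating = rating;
    }
  }

  /** Hypothetical leaf adapters: each extracts its own piece from the shared source. */
  static Map<Class<?>, Function<String, ?>> sampleAdapters() {
    Map<Class<?>, Function<String, ?>> adapters = new HashMap<>();
    adapters.put(Cover.class, s -> new Cover(s.substring(0, s.indexOf('|'))));
    adapters.put(Rating.class, s -> new Rating(Double.parseDouble(s.substring(s.indexOf('|') + 1))));
    return adapters;
  }

  /** Builds a T by resolving every constructor parameter through a registered adapter. */
  static <T> T compose(Class<T> type, Map<Class<?>, Function<String, ?>> adapters, String source) {
    // Assumption for the sketch: the model class declares exactly one constructor.
    Constructor<?> constructor = type.getDeclaredConstructors()[0];
    Class<?>[] parameterTypes = constructor.getParameterTypes();
    Object[] arguments = new Object[parameterTypes.length];
    for (int i = 0; i < parameterTypes.length; i++) {
      Function<String, ?> adapter = adapters.get(parameterTypes[i]);
      if (adapter == null) {
        throw new IllegalArgumentException("No adapter for " + parameterTypes[i].getName());
      }
      arguments[i] = adapter.apply(source);
    }
    try {
      return type.cast(constructor.newInstance(arguments));
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not construct " + type.getName(), e);
    }
  }

  public static void main(String[] args) {
    Movie movie = compose(Movie.class, sampleAdapters(), "cover.png|4.5");
    System.out.println(movie.cover.imageUrl + " " + movie.rating.stars);
  }
}
```

The point of the sketch is that only the leaf types need hand-written extraction logic; the composite type is described entirely by its constructor signature.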

There is no serialization support at the moment.

Registering Adapters

Jsouper supports two ways to register an adapter. Add it explicitly when you build Jsouper:

Jsouper jsouper = new Jsouper.Builder()
    .add(Cover.class, new CoverAdapter())
    .build();

Or annotate it in your model class:

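@SoupAdapter(CoverAdapter.class)
public class Cover {
  public final String imageUrl;
  public final String targetUrl;

  public Cover(String imageUrl, String targetUrl) {
    this.imageUrl = imageUrl;
    this.targetUrl = targetUrl;
  }
}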
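
The two registration routes can be pictured as a lookup that checks explicitly added adapters first and falls back to the class annotation. The following is a minimal sketch of that idea, not Jsouper's source: `Adapter`, `UseAdapter`, and `Registry` are hypothetical stand-ins for `ElementAdapter`, `@SoupAdapter`, and `Jsouper`, a plain String stands in for a jsoup Element so the block runs without dependencies, and the precedence order is an assumption made for the sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

public class AdapterRegistrySketch {
  /** Hypothetical stand-in for ElementAdapter; a String plays the role of an Element. */
  interface Adapter<T> {
    T fromText(String text);
  }

  /** Hypothetical stand-in for @SoupAdapter; RUNTIME retention makes it visible to reflection. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface UseAdapter {
    Class<? extends Adapter<?>> value();
  }

  static class Registry {
    private final Map<Class<?>, Adapter<?>> explicit = new HashMap<>();

    <T> Registry add(Class<T> type, Adapter<T> adapter) {
      explicit.put(type, adapter);
      return this;
    }

    @SuppressWarnings("unchecked")
    <T> Adapter<T> adapter(Class<T> type) {
      Adapter<?> found = explicit.get(type);
      if (found == null) {
        // Nothing was added explicitly, so look for the annotation on the model class.
        UseAdapter annotation = type.getAnnotation(UseAdapter.class);
        if (annotation == null) {
          throw new IllegalArgumentException("No adapter registered for " + type.getName());
        }
        try {
          found = annotation.value().getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("Adapter needs a no-argument constructor", e);
        }
      }
      return (Adapter<T>) found;
    }
  }

  @UseAdapter(TitleAdapter.class)
  static class Title {
    final String text;
    Title(String text) { this.text = text; }
  }

  static class TitleAdapter implements Adapter<Title> {
    @Override public Title fromText(String text) { return new Title(text.trim()); }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    // Annotation route: no add() call yet, so Title's annotation supplies TitleAdapter.
    Title viaAnnotation = registry.adapter(Title.class).fromText("  Alien ");
    // Explicit route: in this sketch an added adapter wins over the annotation.
    registry.add(Title.class, text -> new Title(text.toUpperCase()));
    Title viaExplicit = registry.adapter(Title.class).fromText("Alien");
    System.out.println(viaAnnotation.text + " / " + viaExplicit.text);
  }
}
```

The annotation route keeps the adapter choice next to the model class, while the builder route lets you swap adapters without touching the model, for example in tests.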

Built-in Type Adapters

Jsouper currently has built-in support for:

  • Collections, Lists, Sets

Contributions to add support for Maps and Arrays (and anything else that is currently missing) are welcome.

Collection adapters require the parameterized type to be specified explicitly (the Retrofit converter handles this automatically):

Type listOfMoviesType = Types.newParameterizedType(List.class, Movie.class);
ElementAdapter<List<Movie>> moviesAdapter = jsouper.adapter(listOfMoviesType);
List<Movie> movies = moviesAdapter.fromElement(Jsoup.connect("https://play.google.com/store").get());
movies.forEach(System.out::println);
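
The parameterized type is needed because Java erases generics at runtime: `List<Movie>.class` is not valid Java, and `List.class` alone says nothing about the element type. The sketch below shows, using only the JDK, what a helper such as `Types.newParameterizedType` has to supply. `ParameterizedTypeSketch`, `listOf`, and `elementTypeOf` are hypothetical names for illustration and are not part of Jsouper.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class ParameterizedTypeSketch {
  /** Hypothetical helper: describes List<elementType> as a java.lang.reflect.Type. */
  static ParameterizedType listOf(final Class<?> elementType) {
    return new ParameterizedType() {
      @Override public Type[] getActualTypeArguments() { return new Type[] {elementType}; }
      @Override public Type getRawType() { return List.class; }
      @Override public Type getOwnerType() { return null; }
    };
  }

  /** Reads the element type back, as a collection adapter would need to. */
  static Class<?> elementTypeOf(Type type) {
    if (!(type instanceof ParameterizedType)) {
      throw new IllegalArgumentException("Element type unknown for raw type " + type);
    }
    // The cast assumes a plain class argument; nested generics would need more handling.
    return (Class<?>) ((ParameterizedType) type).getActualTypeArguments()[0];
  }

  public static void main(String[] args) {
    Type listOfStrings = listOf(String.class);
    // The element type survives because it is carried by the Type object itself.
    System.out.println(elementTypeOf(listOfStrings).getSimpleName());
    try {
      elementTypeOf(List.class); // a raw List.class carries no element type
    } catch (IllegalArgumentException e) {
      System.out.println("raw type rejected");
    }
  }
}
```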

Download

Snapshot builds are currently available in Sonatype's snapshots repository. Get the latest JAR or depend via Maven:

<dependency>
  <groupId>com.ekchang.jsouper</groupId>
  <artifactId>jsouper</artifactId>
  <version>0.0.3-SNAPSHOT</version>
</dependency>

or Gradle:

compile 'com.ekchang.jsouper:jsouper:0.0.3-SNAPSHOT'
compile 'org.jsoup:jsoup:1.9.1'

A Retrofit 2 converter is also available:

compile 'com.ekchang.jsouper:retrofit-converter-jsouper:0.0.3-SNAPSHOT'
compile 'com.squareup.retrofit2:retrofit:2.0.2'

License

Copyright 2016 Erick Chang
Copyright 2015 Square, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
