
Tutorial (Indexers, newznab, API, *arr, etc.)

theotherp edited this page Jan 5, 2023 · 9 revisions

The following is meant as an introduction to some of the concepts you need to understand to properly use NZBHydra. Some of the chapters are intentionally kept short because the focus lies elsewhere.

How do indexers work?

Indexers scrape usenet for nice stuff (mostly movies, TV shows and porn, but to a lesser degree also games, apps and music). These releases are uploaded split into many parts and often under names that a) do not make them easily discoverable and b) make it hard to find out what exactly they contain. So an indexer might find a couple of ZIP files named "ABCD123" without knowing whether they contain a movie or something else. Some indexers actually download such files and look inside them. But many uploads are named more or less like their content. It's up to the indexers to find these releases and index them, hence the name. For each release they create an NZB file, which contains all the information you need to download that release. You don't need to know anything about what this file looks like or how it's created. In the simplest form the indexer just saves this file and some basic information about the release (size, upload date, etc.). This is what raw search engines like Binsearch do. You can search by the names of these releases and filter by size and age, but if a release is named weirdly you probably won't find it. And if you search for a title consisting only of common words you will get a lot of unrelated results.

"Proper" indexers try to find out what each release actually contains and assign it proper metadata, e.g. identify the movie and save that information along with the release in their database. If an indexer finds an episode of "Generic TV Show" it will add it to its internal list of "Generic TV Show" episodes, along with season and episode number. That way you can go to its website, search for "Generic TV Show, Season 1, Episode 12" and find releases for exactly that. If you tried that with a raw search engine you'd need to enter "Generic TV Show s01e12" and would miss any releases named "Generic TV Show 1x12", for example.

Where do the indexers get all the information about movies and TV shows? From metadata providers like TheTVDB or IMDb (https://www.imdb.com/). The indexers not only save the metadata but also the provider's ID for the movie or TV show. In our example let's say the TVDB ID for "Generic TV Show" is 73739.

What's an API?

Every indexer provides an API (application programming interface) endpoint, a certain URL that can be called to programmatically retrieve and search the indexer's releases. This API is described by the newznab spec (https://newznab.readthedocs.io/en/latest/misc/api/), but note that no indexer actually implements all the functions described there. The API is only meant to be called by programs like NZBHydra or Sonarr. Indexers usually limit API access to a certain number of hits per day (because each API request costs a little bit of processing power). Many indexers allow a few hits for free users and thousands for paying VIP users.

An API search URL might look like this: https://www.indexer.com/api?apikey=someapikey&t=search&q=whatever.

API searches (the API also allows downloading and some other stuff, but we'll ignore that) can be made using several search parameters which determine which results are returned.

Pagination of results

Indexers never return more than a certain number of results at once. Every block of results is called a page. The default number of results per page is 100, although the maximum can differ between indexers. When more results than those on the first page are needed, we have to query the indexer again using an offset, which is the number of results we've already seen. For example, if the first query returned 100 results, we ask for the next page using an offset of 100. Paging reduces the amount of data the indexer and NZBHydra have to process. But every further page means another query (and another API call counted by the indexer), and with slow indexers each new page of 100 results may take a couple of seconds.

NZBHydra asks every indexer for the maximum number of results per page and shows these (if you're using the UI), along with the total number of available results. Hydra will show something like "Loaded 300 of 10000 results", in which case we probably queried 3 indexers, each of which returned 100 results. If you want to see more results you have to use the "Load more" button. You can also use the dropdown option "Load all", but this may take very long because it may require a lot of calls to the indexers (e.g. 5 calls for another 500 results or, in the example above, about 33 per indexer).
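The offset arithmetic above can be sketched in a few lines of Python. This is a minimal sketch assuming a fixed page size of 100; `page_offsets` and `calls_needed` are hypothetical helpers for illustration, not NZBHydra code.

```python
import math

PAGE_SIZE = 100  # assumption: the common default page size mentioned above


def page_offsets(total_results: int, already_loaded: int, page_size: int = PAGE_SIZE) -> list[int]:
    """Offsets for every further API call needed to load the remaining results."""
    # Each call asks for the next page, starting right after what we've already seen
    return list(range(already_loaded, total_results, page_size))


def calls_needed(extra_results: int, page_size: int = PAGE_SIZE) -> int:
    """Number of additional API calls needed to fetch `extra_results` more results."""
    return math.ceil(extra_results / page_size)


# After the first page (offset 0), loading 500 more results from one indexer:
print(page_offsets(total_results=600, already_loaded=100))  # [100, 200, 300, 400, 500]
print(calls_needed(500))  # 5
# "Loaded 300 of 10000 results": assuming ~3333 results per indexer and 100 already loaded
print(calls_needed(10000 // 3 - 100))  # 33
```

Every offset in that list is one more API hit counted against your daily limit, which is why "Load all" can be expensive.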

Categories

Each release is assigned a category. These categories are predefined to a certain degree (some indexers invent new ones). Each category has a fixed number which identifies it in searches. The categories are split into main categories and subcategories. Main categories are, for example, "Movies" (2000), "TV" (5000) and "Audio" (3000). Each main category has several subcategories: "Movies" has the subcategories "HD" (2040), "SD" (2030) and others; "TV" has the subcategories "HD" (5040), "SD" (5030) and others. You can see that subcategories always start with the same digit as their main category. If an API search is made with the parameter cat=2000, only movie results should be returned (i.e. any results with a category that starts with "2"). Likewise, if an API search is made with cat=2040, only HD movie results should be returned. It's also possible to combine multiple categories: cat=2000,5000 will only return movie and TV results, and cat=2010,2020 will only return foreign and "other" movies (whatever that is), i.e. movies that have either category assigned. And since cat=2000 already returns any results with a category that starts with "2", it doesn't make any sense to search for cat=2000,2010,2020,2030; that's the same as searching for cat=2000.
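The matching rule described above can be expressed as a small Python sketch. It assumes four-digit newznab-style categories where main categories are whole thousands; the function names are hypothetical and this is not NZBHydra's or any indexer's actual code.

```python
def parse_cats(cat_param: str) -> list[int]:
    """Parse a cat parameter such as "2000,5040" into category numbers."""
    return [int(c) for c in cat_param.split(",") if c.strip()]


def is_main_category(cat: int) -> bool:
    # Main categories are whole thousands: 2000 (Movies), 3000 (Audio), 5000 (TV), ...
    return cat % 1000 == 0


def matches(release_cat: int, requested: list[int]) -> bool:
    """Would a release in `release_cat` be returned for the requested categories?"""
    for cat in requested:
        if is_main_category(cat):
            # A main category covers every category in its thousands block (same leading digit)
            if release_cat // 1000 == cat // 1000:
                return True
        elif release_cat == cat:
            # A subcategory only matches itself
            return True
    return False


def simplify(requested: list[int]) -> list[int]:
    """Drop subcategories that are already covered by a requested main category."""
    mains = {cat // 1000 for cat in requested if is_main_category(cat)}
    return [cat for cat in requested if is_main_category(cat) or cat // 1000 not in mains]


print(matches(2040, parse_cats("2000")))  # True: HD movies are movies
print(simplify(parse_cats("2000,2010,2020,2030")))  # [2000]
```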

Search types

API searches can be made using several functions. These determine how results are searched and what parameters can be added to the search. The search type is defined by the t parameter in the URL.

SEARCH

This is a search in its most basic form (t=search). You can provide a simple text query using q=whatever, which limits the returned results to those with "whatever" in their name. But even that parameter is optional: it's possible to search just t=search&cat=2000 to get a list of the latest movies. NZBHydra calls this an "update query" because it doesn't search for anything in particular. That's the kind of query Sonarr makes periodically to keep up to date.
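To make the URL structure concrete, here is a minimal Python sketch that builds a plain search or an update query. The endpoint and API key are the placeholders from the example URL above, and `search_url` is a hypothetical helper; NZBHydra may build its URLs differently.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint taken from the example URL above
BASE_URL = "https://www.indexer.com/api"


def search_url(apikey: str, query: Optional[str] = None, cats: Optional[list[int]] = None) -> str:
    """Build a t=search URL; without a query it is an "update query"."""
    params = {"apikey": apikey, "t": "search"}
    if query:
        params["q"] = query
    if cats:
        params["cat"] = ",".join(str(cat) for cat in cats)
    # safe="," keeps the comma-separated category list readable
    return BASE_URL + "?" + urlencode(params, safe=",")


print(search_url("someapikey", query="whatever"))
# https://www.indexer.com/api?apikey=someapikey&t=search&q=whatever
print(search_url("someapikey", cats=[2000]))
# https://www.indexer.com/api?apikey=someapikey&t=search&cat=2000
```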

It's important to understand that the search function with a query parameter (q=whatever) doesn't use any special logic. It just searches the release title, nothing else. This comes with all the downsides described above, i.e. you might miss releases if you use the wrong words or get too many false positives if your query is too generic.

MOVIE / TVSHOW

So the indexer has already indexed all its releases, assigned metadata and knows exactly which TV show a release contains. Wouldn't it be nice to search for that exact TV show (or even episode)? That can be achieved by using the specific search types t=tvsearch (or t=movie for movies). These allow providing a media ID (as described above) that specifies what you're looking for: t=tvsearch&tvdbid=73739 will search for all episodes of "Generic TV Show", and t=tvsearch&tvdbid=73739&cat=5040&season=1&ep=12 will search for all HD releases of season 1, episode 12 of "Generic TV Show".

The same goes for movies: search for t=movie&imdbid=tt0012345 to only find "Blockbuster Movie" releases.
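The ID-based searches above can be built the same way as a plain search. This is a minimal sketch using the placeholder endpoint from the earlier example URL and the parameter names shown in the text (`tvdbid`, `season`, `ep`, `imdbid`); `tvsearch_url` and `movie_url` are hypothetical helpers, and individual indexers may expect slightly different parameters.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://www.indexer.com/api"  # placeholder indexer


def tvsearch_url(apikey: str, tvdbid: int, season: Optional[int] = None,
                 ep: Optional[int] = None, cats: Optional[list[int]] = None) -> str:
    """Build a t=tvsearch URL identifying the show by its TVDB ID."""
    params = {"apikey": apikey, "t": "tvsearch", "tvdbid": str(tvdbid)}
    if cats:
        params["cat"] = ",".join(str(cat) for cat in cats)
    if season is not None:
        params["season"] = str(season)
    if ep is not None:
        params["ep"] = str(ep)
    return BASE_URL + "?" + urlencode(params, safe=",")


def movie_url(apikey: str, imdbid: str, cats: Optional[list[int]] = None) -> str:
    """Build a t=movie URL identifying the movie by its IMDB ID."""
    params = {"apikey": apikey, "t": "movie", "imdbid": imdbid}
    if cats:
        params["cat"] = ",".join(str(cat) for cat in cats)
    return BASE_URL + "?" + urlencode(params, safe=",")


print(tvsearch_url("someapikey", 73739, season=1, ep=12, cats=[5040]))
# https://www.indexer.com/api?apikey=someapikey&t=tvsearch&tvdbid=73739&cat=5040&season=1&ep=12
print(movie_url("someapikey", "tt0012345"))
# https://www.indexer.com/api?apikey=someapikey&t=movie&imdbid=tt0012345
```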

There are several media ID types and not all of them are supported by all indexers:

  • IMDB ID (for movies). Nearly every indexer supports this.
  • TheTVDB ID (for TV shows). Nearly every indexer supports this.
  • TVmaze (for TV shows). Many indexers support this.
  • The Movie Database. Many indexers support this.
  • IMDB ID (for TV shows). Few indexers support this.
  • TVRage. This was a TV show metadata provider that has been offline for a while. It's still supported by many indexers for older TV shows.

It's also possible that a search type is supported but no IDs are. That means you can search specifically for movies or TV shows, but only using plain text queries.

MUSIC / BOOK

As with TV shows and movies, the spec also defines searches for music and books using specific search parameters (like author & title or artist & album). Some indexers support these. I can't say how good the results are.

What does all that mean for NZBHydra, Sonarr, etc?

NZBHydra can be used two ways:

  1. As an "artificial" indexer that you plug into Sonarr.
  2. As a GUI to manually search all your indexers in one place.

Either way you have to add all your indexers to NZBHydra. An automatic "caps check" will determine which of the described search types (SEARCH, TVSHOW, MOVIE, AUDIO, BOOK) and which of the media IDs (TVDB, IMDB, etc.) the indexer supports. This is done via "brute force": NZBHydra executes a search for each of the types and IDs and checks whether at least 90% of the returned results match the search. That way we can be sure that the type/ID is actually supported. If the caps check doesn't find a certain search type to be supported, that does not mean that the indexer won't return any results in this area. An indexer that doesn't support AUDIO will certainly return audio releases; you just can't search it by artist and album or such.
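The 90% rule can be sketched like this. The sketch assumes each result exposes the ID the indexer assigned it (here simply as a dict key); how NZBHydra actually checks whether a result matches, and how it treats an empty response, are not described here, so both are assumptions and `caps_check` is a hypothetical name.

```python
def caps_check(results: list[dict], id_key: str, expected_id: str, threshold: float = 0.9) -> bool:
    """Consider an ID search supported if at least `threshold` of the results carry the searched ID."""
    if not results:
        # Assumption: without any results, support can't be confirmed
        return False
    matching = sum(1 for result in results if result.get(id_key) == expected_id)
    return matching / len(results) >= threshold


# An indexer that honors tvdbid returns (almost) only matching releases;
# one that ignores it returns whatever is newest
honest = [{"tvdbid": "73739"}] * 9 + [{"tvdbid": "99999"}]
ignoring = [{"tvdbid": "73739"}] * 2 + [{}] * 8
print(caps_check(honest, "tvdbid", "73739"))    # True (9 of 10 match)
print(caps_check(ignoring, "tvdbid", "73739"))  # False (2 of 10 match)
```

A threshold instead of a strict 100% leaves room for the odd mislabeled release without accepting an indexer that silently ignores the ID.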

So let's say you added NZBHydra to Sonarr. Sonarr makes an update query every 30 minutes or so. NZBHydra queries all its configured indexers, aggregates the results, removes duplicates, filters out some results (if you configured any filters) and returns the list to Sonarr, which may ask for another batch of results. That's the "update query" described above. Now let's say you're missing a certain episode. You can trigger a search for it manually, but Sonarr will also execute a backlog search now and then. It will call NZBHydra searching for this particular show and episode, e.g. t=tvsearch&tvdbid=73739&season=1&ep=12. NZBHydra will search all indexers that support this search type and ID, but will skip any that don't.
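Selecting which indexers take part in such a search can be sketched as a simple filter over the caps each indexer was found to support. The `Indexer` class and `indexers_for` function are hypothetical; the sketch only shows the idea of skipping indexers whose caps don't cover the request.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Indexer:
    name: str
    search_types: set[str] = field(default_factory=set)  # e.g. {"search", "tvsearch", "movie"}
    ids: set[str] = field(default_factory=set)           # e.g. {"tvdbid", "imdbid"}


def indexers_for(indexers: list[Indexer], search_type: str, id_type: Optional[str] = None) -> list[str]:
    """Names of the indexers whose caps cover the search type and (if given) the ID."""
    return [
        idx.name
        for idx in indexers
        if search_type in idx.search_types and (id_type is None or id_type in idx.ids)
    ]


indexers = [
    Indexer("full", {"search", "tvsearch"}, {"tvdbid"}),
    Indexer("noids", {"search", "tvsearch"}),
    Indexer("raw", {"search"}),  # e.g. a raw search engine
]
print(indexers_for(indexers, "tvsearch", "tvdbid"))  # ['full']
print(indexers_for(indexers, "search"))              # ['full', 'noids', 'raw']
```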

Query generation

To "fix" this you can enable query generation. NZBHydra will then convert the ID into a title, add season and episode if needed, and make a text-based query using the SEARCH function. In the example above the indexer will be queried using "Generic TV Show s01e12". It's also possible to enable this only as a fallback, which means that an indexer supporting the search type and ID will be searched using these and, if it doesn't return any results, NZBHydra will then execute a search using the generated query.
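Here is a minimal sketch of query generation and the fallback behavior. It assumes the ID has already been converted into a title (that lookup is not shown), and for brevity `search_indexer` combines both behaviors: indexers without ID support always get the generated query, while indexers with ID support only get it as a fallback. Both function names are hypothetical.

```python
from typing import Callable, Optional


def generate_query(title: str, season: Optional[int] = None, episode: Optional[int] = None) -> str:
    """Turn an already looked-up title plus optional season/episode into a text query."""
    query = title
    if season is not None and episode is not None:
        query += f" s{season:02d}e{episode:02d}"
    elif season is not None:
        query += f" s{season:02d}"
    return query


def search_indexer(supports_id: bool,
                   id_search: Callable[[], list],
                   text_search: Callable[[str], list],
                   query: str) -> list:
    """Use the ID search where possible and fall back to the generated text query."""
    if not supports_id:
        # The indexer can't handle the ID, so only the generated query can work
        return text_search(query)
    results = id_search()
    # Fallback: only run the text query if the ID search returned nothing
    return results if results else text_search(query)


print(generate_query("Generic TV Show", season=1, episode=12))  # Generic TV Show s01e12
```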

Short note about raw search engines (Binsearch, Nzbindex, etc.): These never support any kind of ID. You can only search them using plain text queries, so if you want to use them with *arr you'll have to enable query generation.

Manual configuration of caps

It's almost never a good idea to manually change the search types and IDs that NZBHydra determined to be supported by an indexer. If you remove any, you will get fewer results, and if you add any that aren't actually supported, you will get errors.

Torrent trackers and torznab

Most torrent trackers work completely differently from indexers. They don't index stuff; every torrent is usually uploaded manually by somebody, although some may also be scraped from other indexers. They rarely have any of the metadata the indexers have, so they don't know which TV show a certain torrent belongs to. There are private trackers that do stuff usenet users can only dream of, but that's another story.

Torrent trackers also usually don't have an API that can be searched. To fix that, torznab was invented, which is basically a slightly modified newznab format used to translate tracker searches into a format that can be read programmatically. The most popular program providing this API access is Jackett. You configure trackers there, and they can then be called by NZBHydra or Sonarr. Jackett executes the search against the trackers, translates the results and returns a torznab result. Jackett is basically for torrent trackers what NZBHydra is for usenet indexers.

Due to their nature these trackers often don't support any search types (or perhaps only one) and rarely any IDs. NZBHydra can read the Jackett config and automatically add all trackers configured there. In this case the supported search types and IDs are taken from the config and not determined by brute force.

Linking programs to NZBHydra

By now you hopefully understand what NZBHydra does and how it works, but perhaps not why you should (or perhaps shouldn't) use it. This part does not cover NZBHydra as a manual search tool; there's no downside there. Instead it discusses the pros and cons of adding NZBHydra as an indexer to your programs.

Why?

Pro

  1. You point all your programs (Sonarr, Radarr, Lidarr, LL, Mylar, etc.) to NZBHydra and enter your indexers only once, in NZBHydra. Whenever you get a new indexer you only need to configure it once.
  2. You can add all your Jackett trackers automatically instead of having to add each one manually.
  3. You have vastly more options to (pre)filter the results, e.g. by filtering out results with or without certain words or regexes, usenet posters, usenet groups, etc.
  4. You get fancy stats and a unified download and search history.
  5. You get finer control over the access of the indexers: You can do load balancing, account for the API limit, use indexers only for update queries or only specific search queries or only certain categories.
  6. You get query generation and conversion between IDs (e.g. if a program provides a TMDB ID and the indexer only supports IMDB, NZBHydra will convert it).

Contra

  1. You add NZBHydra as a single point of failure. If it crashes or has a bug, none of the connected programs will work properly. Once a download fails (e.g. because the indexer is offline or the download limit has been reached), Sonarr may disable NZBHydra for a while.
  2. NZBHydra aggregates all your search results and returns the newest 100. *arr will use paging to get up to 1000 results and then stop. All your results must fit into this limit of 1000, whereas when you add each indexer individually to *arr, the limit applies to each indexer's results separately. That means with NZBHydra you may miss results because they fall outside these 1000 results. I personally don't think that's ever a problem, but it's possible and becomes more likely with every indexer (and especially tracker) you add.
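The effect of the shared 1000 result limit can be illustrated with a toy model. It assumes results are ordered purely by age (newest first) and that *arr stops after exactly 1000 results; real ordering and paging in NZBHydra and *arr may involve more than that. Both functions are hypothetical.

```python
def visible_through_hydra(results: list[tuple[str, int]], limit: int = 1000) -> list[tuple[str, int]]:
    """All indexers' (name, age in days) results merged, newest first, cut off at *arr's limit."""
    return sorted(results, key=lambda r: r[1])[:limit]


def visible_individually(results: list[tuple[str, int]], limit: int = 1000) -> list[tuple[str, int]]:
    """Each indexer added to *arr on its own: the limit applies per indexer."""
    by_indexer: dict[str, list[tuple[str, int]]] = {}
    for name, age in results:
        by_indexer.setdefault(name, []).append((name, age))
    visible = []
    for items in by_indexer.values():
        visible.extend(sorted(items, key=lambda r: r[1])[:limit])
    return visible


# Three indexers with 600 results each (ages 0..599 days)
results = [(name, age) for name in ("a", "b", "c") for age in range(600)]
print(len(visible_through_hydra(results)))  # 1000
print(len(visible_individually(results)))   # 1800
```

In this model the 800 oldest results are only missed when going through NZBHydra; whether that matters depends on whether what you're looking for is among them.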

How?

On the NZBHydra config page there's an "API?" button in the upper right. Click it. Use the shown URLs and API key to add NZBHydra as you would add any indexer to your program. Make sure to use the correct endpoint (one for newznab / usenet and one for torznab / torrents).

Duplicates and indexer priority

With multiple indexers it's likely that NZBHydra will find results on different indexers that actually link to the same usenet upload. I call these duplicates. Two results are considered duplicates if they have the same title, usenet group and usenet poster and roughly the same size and age. Two results which share the same title but differ in any of those other attributes are not duplicates. The idea is that it doesn't matter which of the duplicates you download: if the download fails, it doesn't make sense to download any of the others, because they're the same upload on usenet and will fail as well.
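The duplicate rule can be sketched as a pairwise check. The text only says "roughly the same size and age", so the 1% size and 12 hour age tolerances below are hypothetical values chosen for illustration, not NZBHydra's actual thresholds.

```python
from dataclasses import dataclass

# Hypothetical tolerances: the text only says "roughly the same size and age"
SIZE_TOLERANCE = 0.01       # sizes within 1% of each other
AGE_TOLERANCE_HOURS = 12.0  # upload ages within 12 hours


@dataclass
class Result:
    indexer: str
    title: str
    group: str
    poster: str
    size_mb: float
    age_hours: float


def are_duplicates(a: Result, b: Result) -> bool:
    """Same title, group and poster, plus roughly the same size and age."""
    if (a.title, a.group, a.poster) != (b.title, b.group, b.poster):
        return False
    size_close = abs(a.size_mb - b.size_mb) <= SIZE_TOLERANCE * max(a.size_mb, b.size_mb)
    age_close = abs(a.age_hours - b.age_hours) <= AGE_TOLERANCE_HOURS
    return size_close and age_close
```

Note that the indexer is deliberately not part of the comparison: the whole point is to recognize the same upload found on different indexers.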

By default duplicates are hidden in the GUI although you can enable their display via the display options dropdown. For API calls just one of the duplicates is returned. In either case NZBHydra must decide which of the results of a duplicate group to show first / return. This decision is made using two values (in that order):

  1. The indexer priority.
  2. The result age.

By default all indexers have the same priority. The result from the indexer with the highest priority value is chosen. If those are equal then the newest result is chosen (just to be deterministic, it doesn't really matter as the results are duplicates and their age is nearly the same).
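The selection order (priority first, then age) maps directly onto a sort key. This is a minimal sketch with a hypothetical `Result` class and `pick_shown` function, not NZBHydra's implementation.

```python
from dataclasses import dataclass


@dataclass
class Result:
    indexer: str
    priority: int     # indexer priority, higher wins
    age_hours: float  # smaller means newer


def pick_shown(duplicates: list[Result]) -> Result:
    """Choose which result of a duplicate group is shown first / returned via API."""
    # Highest priority first; on a tie the newest result (smallest age) wins
    return max(duplicates, key=lambda r: (r.priority, -r.age_hours))


group = [Result("a", 0, 5.0), Result("b", 0, 3.0), Result("c", 0, 4.0)]
print(pick_shown(group).indexer)  # b (equal priorities, so the newest wins)
```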

The idea is that you give indexers which allow many downloads / API hits a higher priority than those which allow fewer downloads. If you don't use indexers with low limits or never reach them, there's really no reason to set the priority; I never did.

Indexer priority does not and cannot in any way influence an external program's decision about which result to download, or whether to prefer a usenet result over a torrent result or anything else. Results from trackers (torznab results) are never considered duplicates of anything because (unless you use a bunch of public trackers) they're effectively different downloads.