tall

Promise-based, no-dependency URL unshortener (expander) module for Node.js 16+.

Note: This library is written in TypeScript and type definitions are provided.

Install

Using npm

npm install --save tall

or with yarn

yarn add tall

Usage

ES6+ usage:

import { tall } from 'tall'

tall('http://www.loige.link/codemotion-rome-2017')
  .then((unshortenedUrl) => console.log('Tall url', unshortenedUrl))
  .catch((err) => console.error('AAAW 👻', err))

With async/await:

import { tall } from 'tall'

async function someFunction() {
  try {
    const unshortenedUrl = await tall(
      'http://www.loige.link/codemotion-rome-2017'
    )
    console.log('Tall url', unshortenedUrl)
  } catch (err) {
    console.error('AAAW 👻', err)
  }
}

someFunction()

ES5 (CommonJS):

var tall = require('tall').tall
tall('http://www.loige.link/codemotion-rome-2017')
  .then(function (unshortenedUrl) {
    console.log('Tall url', unshortenedUrl)
  })
  .catch(function (err) {
    console.error('AAAW 👻', err)
  })

Options

You can pass an options object as the second parameter to the tall function.

Available options are the following:

  • method (default "GET"): any available HTTP method
  • maxRedirects (default 3): the maximum number of redirects that will be followed
  • headers (default {}): custom request headers, e.g. {'User-Agent': 'your-custom-user-agent'}
  • timeout (default 120000): timeout in milliseconds after which the request will be cancelled
  • plugins (default [locationHeaderPlugin]): a list of plugins for adding advanced behaviours

In addition, any other option accepted by http.request() or https.request() can be passed. This includes, for example, rejectUnauthorized, which can be set to false to disable certificate checks.
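To make the maxRedirects option concrete, here is a minimal sketch of a capped redirect loop. It is not tall's implementation: the redirect table, the example hosts and the expand function are hypothetical stand-ins for real HTTP requests, and returning the last URL reached once the cap is hit is an assumption made for illustration.

```typescript
// Hypothetical in-memory redirect table standing in for real HTTP responses.
const redirects = new Map<string, string>([
  ['http://a.example/', 'http://b.example/'],
  ['http://b.example/', 'http://c.example/'],
  ['http://c.example/', 'http://d.example/']
])

// Follows redirects until there is none left or the cap is reached.
// Assumption: when the cap is reached, the last URL reached is returned.
function expand(url: string, maxRedirects: number): string {
  let current = url
  for (let followed = 0; followed < maxRedirects; followed++) {
    const next = redirects.get(current)
    if (next === undefined) break // no further redirect: this is the final URL
    current = next
  }
  return current
}

console.log(expand('http://a.example/', 3)) // http://d.example/
console.log(expand('http://a.example/', 1)) // http://b.example/
```

With a cap of 1, the loop stops after the first hop even though further redirects exist, which is why a long redirect chain may need a higher maxRedirects value.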

Example:

import { tall } from 'tall'

tall('http://www.loige.link/codemotion-rome-2017', {
  method: 'HEAD',
  maxRedirects: 10
})
  .then((unshortenedUrl) => console.log('Tall url', unshortenedUrl))
  .catch((err) => console.error('AAAW 👻', err))

Plugins

Since tall v5, a plugin system for extending the default behaviour of tall is available.

By default, tall comes with a single plugin, locationHeaderPlugin, which is enabled unless you override the plugins option. This plugin follows redirects by looking at the location header in the HTTP response received from the source URL.

You might want to write your own plugins to have more sophisticated behaviours.

Some examples:

  • Normalise the final URL if the final page has a <link rel="canonical" href="http://example.com/page/" /> tag in the <head> of the document
  • Follow HTML meta refresh redirects (<meta http-equiv="refresh" content="0;URL='http://example.com/'" />)
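As a rough illustration of the first idea, the sketch below extracts a canonical URL from an HTML string. The findCanonicalUrl function is a hypothetical helper, not part of tall, and the regex only handles the exact attribute order shown in the bullet above; a real plugin should use a proper HTML parser.

```typescript
// Hypothetical helper: returns the canonical URL declared in the HTML, if any.
// The regex is for illustration only and assumes rel comes before href.
function findCanonicalUrl(html: string, pageUrl: URL): URL | undefined {
  const href = html.match(/<link\s+rel="canonical"\s+href="([^"]+)"/i)?.[1]
  // Resolve relative hrefs against the URL of the page being inspected
  return href ? new URL(href, pageUrl) : undefined
}

const page = new URL('http://example.com/page/?utm_source=newsletter')
const html = '<head><link rel="canonical" href="http://example.com/page/" /></head>'

console.log(findCanonicalUrl(html, page)?.href) // http://example.com/page/
```

A plugin built on this helper could read the response body, call the helper, and stop at the canonical URL when one is found.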

Known plugins

Did you create a plugin for tall? Send us a PR to have it listed here!

How to write a plugin

A plugin is simply a function with a specific signature:

export interface TallPlugin {
  (url: URL, response: IncomingMessage, previous: Follow | Stop): Promise<
    Follow | Stop
  >
}

So the only thing you need to do is to write your custom behaviour following this interface. But let's discuss briefly what the different elements mean here:

  • url: the current URL being crawled
  • response: the HTTP response object (an IncomingMessage) received for the current URL
  • previous: the decision from the previous plugin execution (continue following a given URL or stop at a given URL)

Every plugin is executed asynchronously, so a plugin returns a Promise that needs to resolve to a Follow or a Stop decision.

Let's dive deeper into these two concepts. Follow and Stop are defined as follows (touché):

export class Follow {
  follow: URL
  constructor(follow: URL) {
    this.follow = follow
  }
}

export class Stop {
  stop: URL
  constructor(stop: URL) {
    this.stop = stop
  }
}

Follow and Stop are effectively simple classes to express an intent: should we follow the follow URL or should we stop at the stop URL?
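A short sketch of how code might tell the two decisions apart. The classes are repeated locally so the snippet runs without installing tall, and targetOf is a hypothetical helper, not part of the library.

```typescript
// Local copies of the classes shown above, so the snippet is self-contained.
class Follow {
  follow: URL
  constructor(follow: URL) {
    this.follow = follow
  }
}

class Stop {
  stop: URL
  constructor(stop: URL) {
    this.stop = stop
  }
}

// Hypothetical helper: returns the URL carried by a decision, whichever kind it is.
function targetOf(decision: Follow | Stop): URL {
  return decision instanceof Follow ? decision.follow : decision.stop
}

const next = new Follow(new URL('https://example.com/next'))
const done = new Stop(new URL('https://example.com/final'))

console.log(next instanceof Follow, targetOf(next).href) // true https://example.com/next
console.log(done instanceof Stop, targetOf(done).href) // true https://example.com/final
```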

Plugins are executed following the middleware pattern (or chain of responsibility): they are executed in order and the information is propagated from one to the other.

For example, if we initialise tall with { plugins: [plugin1, plugin2] }, then for every URL plugin1 will be executed before plugin2, and the decision of plugin1 will be passed on to plugin2 through the previous parameter.
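The chaining described above can be sketched as a simple loop. This is an illustration of the pattern, not tall's actual source: runPlugins, FakeResponse and SketchPlugin are hypothetical names, a plain object stands in for the IncomingMessage, and starting the chain from a Stop at the current URL is an assumption.

```typescript
// Local copies of Follow and Stop, so the sketch runs without installing tall.
class Follow {
  follow: URL
  constructor(follow: URL) {
    this.follow = follow
  }
}

class Stop {
  stop: URL
  constructor(stop: URL) {
    this.stop = stop
  }
}

type Decision = Follow | Stop
// Simplified stand-in for the HTTP response (the real plugins receive an IncomingMessage).
type FakeResponse = { headers: Record<string, string | undefined> }
type SketchPlugin = (url: URL, response: FakeResponse, previous: Decision) => Promise<Decision>

// Runs the plugins in order, handing each one the decision of the one before it.
// Assumption: the chain starts from a Stop at the current URL.
async function runPlugins(url: URL, response: FakeResponse, plugins: SketchPlugin[]): Promise<Decision> {
  let decision: Decision = new Stop(url)
  for (const plugin of plugins) {
    decision = await plugin(url, response, decision)
  }
  return decision
}

// plugin1: follow the location header when there is one.
const plugin1: SketchPlugin = async (url, response, previous) => {
  const location = response.headers.location
  return location ? new Follow(new URL(location, url)) : previous
}

// plugin2: hypothetical policy that overrides plugin1 when the redirect leaves the current host.
const plugin2: SketchPlugin = async (url, _response, previous) => {
  if (previous instanceof Follow && previous.follow.host !== url.host) {
    return new Stop(url)
  }
  return previous
}

runPlugins(
  new URL('http://short.example/a'),
  { headers: { location: 'http://other.example/' } },
  [plugin1, plugin2]
).then((decision) => console.log(decision instanceof Stop)) // true
```

Because plugin2 runs last, it has the final say: it sees the Follow produced by plugin1 and replaces it with a Stop. Swapping the order would let plugin1 overwrite that decision.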

How to write and enable a plugin

Let's say we want to add a plugin that allows us to follow HTML meta refresh redirects. The code could look like this:

// metarefresh-plugin.ts
import { IncomingMessage } from 'http'
import { Follow, Stop } from 'tall'

export async function metaRefreshPlugin(
  url: URL,
  response: IncomingMessage,
  previous: Follow | Stop
): Promise<Follow | Stop> {
  let html = ''
  for await (const chunk of response) {
    html += chunk.toString()
  }

  // note: This is just a dummy example to illustrate how to use the plugin API.
  // It's not a great idea to parse HTML using regexes.
  // If you are looking for a plugin that does this in a better way check out
  // https://npm.im/tall-plugin-meta-refresh
  const metaHttpEquivUrl = html.match(
    /meta +http-equiv="refresh" +content="\d;url=(http[^"]+)"/
  )?.[1]

  if (metaHttpEquivUrl) {
    return new Follow(new URL(metaHttpEquivUrl))
  }

  return previous
}

Then, this is how you would use your shiny new plugin:

import { tall, locationHeaderPlugin } from 'tall'
import { metaRefreshPlugin } from './metarefresh-plugin'

const finalUrl = await tall('https://loige.link/senior', {
  plugins: [locationHeaderPlugin, metaRefreshPlugin]
})

console.log(finalUrl)

Note that we have to explicitly pass the locationHeaderPlugin if we want to retain tall's original behaviour.

Contributing

Everyone is very welcome to contribute to this project. You can contribute just by submitting bugs or suggesting improvements by opening an issue on GitHub.

Note: Since tall v6, the project is structured as a monorepo, so you'll need a recent version of npm that supports workspaces (e.g. npm 8.5+).

License

Licensed under MIT License. © Luciano Mammino.