Routing jobs by something other than path #282

Open
iszulcdeepsense opened this issue Jul 17, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@iszulcdeepsense
Collaborator

iszulcdeepsense commented Jul 17, 2023

Routing jobs by path can be painful and harmful. Here's why:
For instance, if we were to deploy WordPress to Racetrack, we'd need to deploy it at a non-root path, i.e. /pub/job/wordpress-job/0.0.2/. Racetrack requires this because PUB routes requests by path.
However, WordPress doesn't seem to work with a non-root base path (or at least I wasn't able to make it work).
This is not just about WordPress. Drupal commits the same sin. I was finally able to deploy it to Racetrack by rewriting its HTML output in a hacky way, which is unreliable and can't be generalized.

Furthermore, a job can be accessed by its exact version (/pub/job/adder/0.0.2/), an alias (/pub/job/adder/latest/) or a wildcard (/pub/job/adder/0.0.x/), so what we really need is to serve a job at a wildcarded /pub/job/adder/*/ URL path. Again, common applications don't support this.

The only cure I see is not to do path routing at all. Instead, jobs could be routed by hostname.
That is, PUB could parse the hostname (e.g. adder-0-0-2.pub.dev-racetrack-cluster.example.com), which is part of every HTTP request as the Host header.
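A minimal sketch of how PUB could turn the Host header into a job name and version. It assumes a naming scheme where the first DNS label is <job>-<major>-<minor>-<patch> (or <job>-latest) under a fixed base domain; PUB_DOMAIN, JOB_LABEL and job_from_host are hypothetical names for illustration, not Racetrack's actual code, and Python is used only as a sketch language.

```python
import re
from typing import Optional, Tuple

# Hypothetical base domain under which every job gets its own subdomain
PUB_DOMAIN = "pub.dev-racetrack-cluster.example.com"

# Hypothetical label scheme: <job-name>-<major>-<minor>-<patch|x>, or <job-name>-latest.
# The lazy name group plus the anchored version lets job names contain dashes too.
JOB_LABEL = re.compile(r"^(?P<name>[a-z0-9-]+?)-(?P<version>\d+-\d+-(?:\d+|x)|latest)$")


def job_from_host(host: str) -> Optional[Tuple[str, str]]:
    """Map a Host header value to (job name, version), or None if it isn't a job host."""
    # Drop an optional port and a trailing dot (IPv6 literals are not handled here)
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    label, _, domain = hostname.partition(".")
    if domain != PUB_DOMAIN:
        return None
    match = JOB_LABEL.match(label)
    if match is None:
        return None
    # Dashes stand in for the dots of the version: "0-0-2" -> "0.0.2"
    return match.group("name"), match.group("version").replace("-", ".")
```

Encoding the version with dashes keeps the whole job identity inside a single DNS label. That matters for the certificate side: a wildcard certificate such as *.pub.dev-racetrack-cluster.example.com matches exactly one label, so it would cover adder-0-0-2.pub.dev-racetrack-cluster.example.com but not a dotted adder.0.0.2.pub.dev-racetrack-cluster.example.com.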

I know this requires a lot of effort from the infrastructure providers (Kubernetes ingress, wildcard HTTPS certificates), and local testing might become more complex with the addition of a local DNS server. However, Racetrack would become open to nearly any kind of application.

(This is not urgent; it's rather a nice-to-have feature.)

@iszulcdeepsense iszulcdeepsense added the enhancement New feature or request label Jul 27, 2023
@iszulcdeepsense
Collaborator Author

Path rewriting breaks front-end apps

Here's a detailed explanation of why path rewriting causes this problem. It is common to many front-end apps, like Drupal, WordPress or pgAdmin.

  1. A job is hosted by Pub at the base URL /pub/job/drupal-job/0.0.2/ (to distinguish it from other jobs).
  2. The target application (e.g. Drupal) expects requests at the root path, so the docker proxy job rewrites the URL by trimming the prefix, transforming /pub/job/drupal-job/0.0.2/index.html into /index.html, and passes the request to the target container.
  3. The target front-end app doesn't know it's being proxied and generates absolute paths in its HTML content:
    <link rel="stylesheet" href="/assets/style.css" />
  4. The HTML page goes back to the user's browser, which tries to load the CSS file by making another request to Pub for /assets/style.css.
  5. Pub rejects that request with 404 Not Found, as it's not a valid job URL.
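The steps above can be sketched as a tiny path router. This assumes job URLs have the shape /pub/job/<name>/<version>/<rest>; JOB_PATH and route are hypothetical names, not Pub's real implementation. It shows both the prefix trimming in step 2 and why the stylesheet request in steps 4 and 5 ends in a 404.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for job URLs: /pub/job/<name>/<version>/<rest>
JOB_PATH = re.compile(r"^/pub/job/(?P<name>[^/]+)/(?P<version>[^/]+)(?P<rest>/.*)?$")


def route(path: str) -> Optional[Tuple[str, str, str]]:
    """Return (job name, version, path forwarded to the container), or None for a 404."""
    match = JOB_PATH.match(path)
    if match is None:
        return None  # not a job URL, so Pub answers 404 Not Found
    # Trim the job prefix so the app sees a path relative to its root
    rest = match.group("rest") or "/"
    return match.group("name"), match.group("version"), rest


# Steps 1-2: the page request carries the job prefix and gets rewritten
print(route("/pub/job/drupal-job/0.0.2/index.html"))  # ('drupal-job', '0.0.2', '/index.html')
# Steps 4-5: the stylesheet request built from an absolute href has no prefix at all
print(route("/assets/style.css"))  # None
```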

Note that while the issue occurs in almost every kind of front-end app, there are exceptions (like Sphinx or Hugo) that are "proxy-friendly" and work well because they use relative paths in their HTML/JS content.
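Why relative paths help can be shown with standard URL resolution, which is what the browser does when it follows an href. This sketch uses Python's urllib.parse.urljoin (which implements RFC 3986 reference resolution) on a hypothetical page URL behind Pub's prefix.

```python
from urllib.parse import urljoin

# URL of the page as the browser sees it, behind Pub's job prefix (example host)
page = "https://racetrack.dev.example.com/pub/job/drupal-job/0.0.2/index.html"

# An absolute path replaces the whole path, so the job prefix is lost
print(urljoin(page, "/assets/style.css"))
# https://racetrack.dev.example.com/assets/style.css

# A relative path is resolved against the page's directory, so the prefix survives
print(urljoin(page, "assets/style.css"))
# https://racetrack.dev.example.com/pub/job/drupal-job/0.0.2/assets/style.css
```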

@iszulcdeepsense
Collaborator Author

In general, to address this issue, we need a way to put additional information into the HTTP request that allows Racetrack to identify the job and dispatch the request to the right place. An HTTP request usually looks like this:

GET /pub/job/adder/0.0.3/docs HTTP/2
Host: racetrack.dev.example.com
accept-encoding: deflate, gzip, br, zstd
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
accept-language: en-US,en;q=0.9,pl;q=0.8,da;q=0.7
cache-control: max-age=0
cookie: racetrack_sessionid=***; X-Racetrack-Auth=***
sec-ch-ua: "Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

Possible options:

  • Path: "/pub/job/adder/0.0.3/docs" - this is the current approach; it isn't perfect, though, as described above.
  • Host name: "Host: adder-0-0-2.pub.dev-racetrack-cluster.example.com" - has the certificate difficulties described above.
  • Host port: "Host: racetrack.dev.example.com:8565" - each job gets a unique port number. This should work even with HTTPS, but the infrastructure target has to support opening new ports on demand (e.g. a new port in the Kubernetes Ingress Controller, plus opening the port on Cloudflare and in the firewalls). One drawback is that users don't know which job they're looking at, as all they see is a random port number like 8565.
  • Cookie: "cookie: job=adder:0.0.3" - Bad idea. The cookie persists between requests, but since it's shared across the whole browser, you couldn't open different jobs in separate tabs.
  • Custom header: "X-Job: adder:0.0.3" - Bad idea. The browser won't send this header again on subsequent requests, such as following a link or loading assets.
  • Additional query param: "/index.html?racetrack_job=adder:0.0.3" - It might break whenever the browser follows a link or redirect to another absolute URL, which drops this query param.
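The weakness of the query-param option can be demonstrated with the same URL resolution rules a browser applies to links and redirects. In this sketch racetrack_job is the parameter name from the option above, while job_from_query and the example URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urljoin, urlsplit


def job_from_query(url: str) -> Optional[str]:
    """Read the racetrack_job query parameter from a URL, if present."""
    values = parse_qs(urlsplit(url).query).get("racetrack_job")
    return values[0] if values else None


page = "https://racetrack.dev.example.com/index.html?racetrack_job=adder:0.0.3"
print(job_from_query(page))  # adder:0.0.3

# When a link or redirect target has its own path, RFC 3986 resolution takes the
# query from the target, not from the current page, so the parameter is gone.
next_url = urljoin(page, "/assets/style.css")
print(next_url)                  # https://racetrack.dev.example.com/assets/style.css
print(job_from_query(next_url))  # None
```

Unless the application itself appends the parameter to every URL it generates, which is exactly the kind of cooperation front-end apps don't offer, the job identity is lost after the first navigation.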

@iszulcdeepsense iszulcdeepsense changed the title Routing by hostname Routing jobs not by path Dec 1, 2023
@iszulcdeepsense iszulcdeepsense changed the title Routing jobs not by path Routing jobs by something other than path Dec 1, 2023
@iszulcdeepsense
Collaborator Author

Currently, the best option I see is to go with hostname subdomains.
This would be an optional feature for the special apps that need it, working only with the infrastructure targets that support it (the local Docker target doesn't even know about hostnames).
Two steps would be required:

  1. Ask your Kubernetes operator to create a subdomain for you. Infrastructure targets would be responsible for issuing subdomains. This could take a while, as it has to be done outside of Racetrack.
  2. Claim the hostname in the job manifest: a separate field instructs Racetrack to route requests carrying this hostname to the job.
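A minimal sketch of how such a claim could be combined with the existing path routing: claimed hostnames are looked up first, and everything else falls back to /pub/job/... paths. The CLAIMED_HOSTS table (imagined as built from a manifest field such as hostname, whose name is an assumption) and the dispatch function are hypothetical, not part of Racetrack today.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical table built from job manifests that claim a hostname
CLAIMED_HOSTS: Dict[str, Tuple[str, str]] = {
    "drupal.example.com": ("drupal-job", "0.0.2"),
}

# Existing-style path routing: /pub/job/<name>/<version>/<rest>
JOB_PATH = re.compile(r"^/pub/job/(?P<name>[^/]+)/(?P<version>[^/]+)(?P<rest>/.*)?$")


def dispatch(host: str, path: str) -> Optional[Tuple[str, str, str]]:
    """Return (job name, version, forwarded path), or None for a 404."""
    hostname = host.split(":", 1)[0].lower()
    claimed = CLAIMED_HOSTS.get(hostname)
    if claimed is not None:
        # Hostname routing: the app receives the original path untouched
        name, version = claimed
        return name, version, path
    # Fallback: the current path-based routing with prefix trimming
    match = JOB_PATH.match(path)
    if match is None:
        return None
    return match.group("name"), match.group("version"), match.group("rest") or "/"
```

With a claimed hostname, the /assets/style.css request from the earlier walkthrough reaches the Drupal job as-is, while jobs without a claim keep working exactly as before.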
