Preview service

Romaric Mourgues edited this page Jul 2, 2021 · 4 revisions

Preview Service Spec

General

The preview service is in charge of generating previews (thumbnails) for any file published in Twake, whenever possible. When files are attached to resources, the preview becomes available once the generation process has finished, and can then be used in the Twake UI.

To be defined: a file may have several previews, and each preview may be saved in several sizes. 'Several previews' means that a document with several pages may produce several images, i.e. one image per page.

Supported types

The preview service should be able to generate a preview as a lightweight, web-compliant image for the most commonly used file types: images, documents (pdf, doc, xls, ...) and potentially videos. The priorities are, in order: static images (jpg, png, etc.), gifs (a static image generated from the first frame), pdf, office documents, videos, vector images (svg).
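
As an illustration of the priority list above, here is a minimal sketch that maps a file's MIME type to a preview kind. The names PreviewKind, previewKindFor and priorityOf are hypothetical, and the sketch assumes the MIME type is known at upload time; the list of office MIME types is deliberately incomplete.

```typescript
// Preview kinds, listed in the priority order given in this section.
type PreviewKind = "image" | "gif" | "pdf" | "office" | "video" | "svg";

const priorityOrder: PreviewKind[] = ["image", "gif", "pdf", "office", "video", "svg"];

// Hypothetical helper: returns the preview kind for a MIME type,
// or null when no preview is planned for that type.
function previewKindFor(mimeType: string): PreviewKind | null {
  const mime = mimeType.toLowerCase();
  // Check the specific image types before the generic "image/" prefix.
  if (mime === "image/gif") return "gif";
  if (mime === "image/svg+xml") return "svg";
  if (mime.indexOf("image/") === 0) return "image";
  if (mime === "application/pdf") return "pdf";
  if (mime.indexOf("video/") === 0) return "video";
  if (
    mime === "application/msword" ||
    mime === "application/vnd.ms-excel" ||
    mime === "application/vnd.ms-powerpoint" ||
    mime.indexOf("application/vnd.openxmlformats-officedocument.") === 0
  ) {
    return "office";
  }
  return null;
}

// Lower number means higher priority.
function priorityOf(kind: PreviewKind): number {
  return priorityOrder.indexOf(kind);
}
```

Returning null for unknown types lets the caller skip preview generation without treating it as an error.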

Supported size

In the first version, any document larger than 50 MB or made of more than one chunk will not have a preview.
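
The size rule could be checked before any preview request is sent. The sketch below is only an assumption of how that might look: the UploadedFileInfo shape and the canGeneratePreview name are hypothetical, and the limit is read here as 50 * 1024 * 1024 bytes.

```typescript
// Assumption: the size limit is interpreted as 50 * 1024 * 1024 bytes.
const MAX_PREVIEW_SIZE_BYTES = 50 * 1024 * 1024;

// Hypothetical description of an uploaded file.
interface UploadedFileInfo {
  sizeBytes: number;
  chunkCount: number;
}

// A file gets a preview only if it is within the size limit
// and was stored as a single chunk.
function canGeneratePreview(file: UploadedFileInfo): boolean {
  return file.sizeBytes <= MAX_PREVIEW_SIZE_BYTES && file.chunkCount <= 1;
}
```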

Requirements

The process generating preview:

  1. should be 'deployable' on any machine (laptop, cloud provider, ...) without major difficulty. Given this, Docker is probably a good candidate.
  2. must be written in TypeScript and use the same technologies as other platform services (Fastify, RabbitMQ, ...)
  3. must be able to talk with other Twake platform services (get the file to generate a preview for, tell a service that the process is complete, ...)
  4. must be a service, i.e. it can be called from other 'platform services' through a well-defined API (JSON/REST, a pubsub layer, or both)
  5. is automatically called by the backend when a preview needs to be generated. The user does not have to ask for preview generation.
  6. is 'trackable'. Since generating a preview can take time, we should be able to know whether the process is queued, started, running, completed or in error.
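
The 'trackable' requirement can be pictured as a small in-memory status tracker. This is a sketch only: the PreviewProcessTracker class, the status spellings and the transition table are assumptions, and a real service would likely persist the status rather than keep it in memory.

```typescript
// Status names taken from the requirement above; exact spelling is an assumption.
type PreviewStatus = "queued" | "started" | "running" | "completed" | "error";

// Assumed transition table: which statuses may follow which.
const allowedTransitions: Record<PreviewStatus, PreviewStatus[]> = {
  queued: ["started", "error"],
  started: ["running", "error"],
  running: ["completed", "error"],
  completed: [],
  error: [],
};

// Hypothetical tracker keeping the status of each preview process.
class PreviewProcessTracker {
  private statuses = new Map<string, PreviewStatus>();

  // A newly registered process always starts as "queued".
  register(processId: string): void {
    this.statuses.set(processId, "queued");
  }

  // Moves a process to the next status, rejecting invalid transitions.
  transition(processId: string, next: PreviewStatus): void {
    const current = this.statuses.get(processId);
    if (current === undefined) {
      throw new Error("Unknown process: " + processId);
    }
    if (allowedTransitions[current].indexOf(next) === -1) {
      throw new Error("Invalid transition: " + current + " -> " + next);
    }
    this.statuses.set(processId, next);
  }

  getStatus(processId: string): PreviewStatus | undefined {
    return this.statuses.get(processId);
  }
}
```

Rejecting invalid transitions keeps a completed or failed process from being reported as running again.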

Process

This section describes how the preview service talks with other parts of the platform.

First version

  1. An actor uploads a file using the file API. Once the upload is complete, the backend asks for preview generation by sending a JSON message somewhere (either pubsub or an HTTP endpoint). The message contains everything needed to generate the preview, but does not contain the file itself. In response, the preview process sends back a process id. With this process id, the actor can get the process status (queued, starting, running, completed, errored).
  2. The preview process gets the message from step 1. Based on the file type, it starts a preview process (generating a thumbnail for an image may not be the same as generating one for a PDF, for example).
  3. When the preview process is complete, the preview image is saved. Saving the image means pushing it to the backend, for example:
    POST /platform/api/images/:id/thumbnails
    form-data with image
    
  4. When the preview is saved, the backend updates the file metadata in the database by referencing the preview identifier. This identifier will be used by the UI to get the thumbnail, for example GET /api/images/:id/thumbnails/:tid where :tid can be empty (depending on whether we can generate several previews).
  5. The actor can then send a 'technical notification' (a websocket message) to the user, so that the UI can be updated with the thumbnail(s).
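
The steps above can be sketched as follows. Everything here is hypothetical: the message fields (fileId, mimeType, downloadUrl), the PreviewProcessRegistry class and the synchronous generate callback are assumptions made to keep the example short. A real implementation would be asynchronous and would receive the message over pubsub or HTTP.

```typescript
// Hypothetical request message: it describes the file but does not contain it.
interface PreviewRequestMessage {
  fileId: string;
  mimeType: string;
  downloadUrl: string; // assumed location where the preview process fetches the file
}

type FirstVersionStatus = "queued" | "starting" | "running" | "completed" | "errored";

interface PreviewProcessEntry {
  request: PreviewRequestMessage;
  status: FirstVersionStatus;
  thumbnailId?: string;
}

class PreviewProcessRegistry {
  private entries = new Map<string, PreviewProcessEntry>();
  private nextId = 1;

  // Step 1: accept the message and send back a process id.
  submit(request: PreviewRequestMessage): string {
    const processId = "preview-" + this.nextId++;
    this.entries.set(processId, { request: request, status: "queued" });
    return processId;
  }

  // Steps 2 and 3: run the generator and record the resulting thumbnail id.
  run(processId: string, generate: (request: PreviewRequestMessage) => string): void {
    const entry = this.entries.get(processId);
    if (entry === undefined) {
      throw new Error("Unknown process: " + processId);
    }
    entry.status = "running";
    try {
      entry.thumbnailId = generate(entry.request);
      entry.status = "completed";
    } catch (err) {
      entry.status = "errored";
    }
  }

  statusOf(processId: string): FirstVersionStatus | undefined {
    const entry = this.entries.get(processId);
    return entry === undefined ? undefined : entry.status;
  }

  thumbnailOf(processId: string): string | undefined {
    const entry = this.entries.get(processId);
    return entry === undefined ? undefined : entry.thumbnailId;
  }
}

// Step 3: path used to push the generated image to the backend.
function thumbnailUploadPath(fileId: string): string {
  return "/platform/api/images/" + encodeURIComponent(fileId) + "/thumbnails";
}
```

The generate callback stands in for the type-specific generator chosen in step 2, so the registry itself stays independent of how each file type is rendered.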

Second version

The second version is almost the same as the first one, but adds a workqueue to better manage thumbnail generation. The goal is to avoid crashing the service under heavy load: it may be hard to generate several thumbnails in parallel if it takes tens of seconds to start something like LibreOffice, for example.

The workqueue buffers incoming requests and, since the process is async, queueing a task sends back a process id which is 'trackable'.

The workqueue implementation can be more or less smart: it can be built in-house (not so easy to implement), can use tools such as RxJS or Node.js queue systems like Bull, or can be more advanced and use Apache Kafka...
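
To make the in-house option concrete, here is a minimal sketch of a workqueue that limits how many previews are generated at once. It is not a proposal for the final design: the PreviewWorkQueue class, the callback-style task signature and the status names are assumptions, and nothing is persisted.

```typescript
// A task receives a `done` callback and calls it when generation ends.
type PreviewTask = (done: (error?: Error) => void) => void;
type QueueStatus = "queued" | "running" | "completed" | "errored";

class PreviewWorkQueue {
  private concurrency: number;
  private pending: Array<{ id: string; task: PreviewTask }> = [];
  private statuses = new Map<string, QueueStatus>();
  private running = 0;
  private nextId = 1;

  constructor(concurrency: number) {
    this.concurrency = concurrency;
  }

  // Buffers the task and returns a trackable process id.
  enqueue(task: PreviewTask): string {
    const id = "job-" + this.nextId++;
    this.statuses.set(id, "queued");
    this.pending.push({ id: id, task: task });
    this.startNext();
    return id;
  }

  statusOf(id: string): QueueStatus | undefined {
    return this.statuses.get(id);
  }

  // Starts buffered tasks while there is free capacity.
  private startNext(): void {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const job = this.pending.shift();
      if (job === undefined) break;
      const jobId = job.id;
      this.running++;
      this.statuses.set(jobId, "running");
      let finished = false;
      const done = (error?: Error): void => {
        if (finished) return; // ignore repeated calls to done
        finished = true;
        this.running--;
        this.statuses.set(jobId, error ? "errored" : "completed");
        this.startNext();
      };
      try {
        job.task(done);
      } catch (err) {
        done(err instanceof Error ? err : new Error(String(err)));
      }
    }
  }
}
```

With a concurrency of 1, a slow generator such as an office document conversion holds the only slot and later requests simply wait as queued instead of starting in parallel.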

-- Romaric Notes

I wonder if we can do a version 0.5 to make sure we can start using the node file API (with frontends) before September.

We need to talk about where this is going to be. I would really like not to add an additional container to Twake, as the strategy is to go the other way (reduce the number of containers running). Why not leverage our current "micro-service" based microservice? Of course I agree that the communication between this preview service and the outside must be done as you describe in the wiki (pubsub / REST / JSON, etc.).