
The same message is constantly dispatched to the Worker #67

Closed
gildas-lormeau opened this issue Jun 28, 2018 · 18 comments

Comments

@gildas-lormeau

gildas-lormeau commented Jun 28, 2018

I'm submitting a ... (check one with "x")

[x] bug report => search github for a similar issue or PR before submitting
[ ] feature request

Scope (check one with "x")

[ ] @zetapush/cli
[ ] @zetapush/cometd
[ ] @zetapush/core
[ ] @zetapush/create
[ ] @zetapush/platform
[x] @zetapush/worker

Current behavior

Just after starting the worker, I get these logs even though the client has not called the worker's process method.

⠧ Starting worker...
[INFO] Worker is up!
[EXPERIMENTAL] Create HTTP Server
[EXPERIMENTAL] Running at http://localhost:3000
WorkerInstance::dispatch { name: 'process',
  namespace: '',
  parameters: { ... },
  requestId: '2m5kwjb4dpbzqx114l17wgreiljo:9vrjcyc:1',
  taskId: 'zR6yA4ZXrBM' }
WorkerInstance::dispatch { name: 'process',
  namespace: '',
  parameters: { ... },
  requestId: '2m3i2v3dsz45pnh116p50lrcaxvk:2o8yfu1:1',
  taskId: '19EDoWq4L6M' }
^C[WARN] Properly disconnect client
[WARN] Client properly disconnected

The requestId values are always the same.

Expected behavior

I should not see these messages, since the client has not interacted with the worker yet.

Minimal reproduction of the problem with instructions

Start my worker (id: lFnXaV9u).

What is the motivation / use case for changing the behavior?

It seems to be a bug.

Please tell us about your environment:

  • Operating System: Windows 10 Pro/WSL (Ubuntu 16.04.4 LTS)
  • Node Version: 8.11.1
  • Npm Version: 6.1.0
  • @angular/* Version: 8.27.14
@gildas-lormeau
Author

gildas-lormeau commented Jun 28, 2018

I redeployed the worker and the issue disappeared, so it's no longer reproducible.

@ghoullier
Contributor

ghoullier commented Jun 28, 2018

This behavior can be explained by the following scenario:

Your front-end application sends 2 calls to the process method.
These 2 calls are stored in a queue on the zetapush platform.
When your worker starts, the zetapush platform dispatches the 2 queued tasks.
Your worker does not seem to handle the process task correctly, so the zetapush platform keeps the tasks in its internal queue.
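The scenario above can be illustrated with a small in-memory model. This is a hypothetical sketch, not the zetapush platform's actual implementation: `TaskQueue`, `enqueue`, `dispatchAll` and `ack` are illustrative names, and it assumes a task stays queued until the worker acknowledges it, so an unacknowledged task is dispatched again (with the same requestId) every time the worker starts.

```javascript
// Hypothetical model of a platform-side task queue with redelivery.
// Assumption: a task is removed only when the worker acknowledges it.
class TaskQueue {
  constructor() {
    this.pending = new Map(); // requestId -> task
  }

  // A client call is stored until a worker handles it.
  enqueue(task) {
    this.pending.set(task.requestId, task);
  }

  // On worker start, every pending task is dispatched again,
  // keeping its original requestId.
  dispatchAll(worker) {
    for (const task of this.pending.values()) {
      worker.dispatch(task);
    }
  }

  // Only an acknowledgement removes the task from the queue.
  ack(requestId) {
    this.pending.delete(requestId);
  }
}

// Usage: two calls are queued; the worker never acknowledges them,
// so each start dispatches the same two requestIds again.
const queue = new TaskQueue();
queue.enqueue({ name: "process", requestId: "req-1" });
queue.enqueue({ name: "process", requestId: "req-2" });

const seen = [];
const worker = { dispatch: task => seen.push(task.requestId) };
queue.dispatchAll(worker); // first start
queue.dispatchAll(worker); // restart: same tasks again
console.log(seen); // [ 'req-1', 'req-2', 'req-1', 'req-2' ]
```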

@gildas-lormeau
Copy link
Author

gildas-lormeau commented Jun 28, 2018

As far as I know, my worker handles the task correctly. However, the response is usually not delivered to the client because of a 5000 ms timeout. Could that be related?

@ghoullier
Contributor

Can you share the code of your process method?

@gildas-lormeau
Author

It's a bit long... Can't you access it on your end?

@gildas-lormeau
Author

gildas-lormeau commented Jun 28, 2018

Anyway, here it is:

module.exports = class {
  async process(parameters, context) {
    return await DocProcessor.run(parameters.url);
  }
};

// ------------
// DocProcessor
// ------------
class DocProcessor {
  static async run(url) {
    const processor = new DOMProcessor(url);
    await processor.initialize();
    processor.removeScripts();
    processor.removeObjects();
    processor.removeMediaSrcAttributes();
    processor.removeFrames();
    processor.removeMetaCharset();
    processor.removeMetaRefresh();
    processor.removePreloadLinks();
    processor.insertDefaultFavicon();
    processor.resolveURLs();
    await Promise.all([processor.processInlineStylesheets(), processor.processImages(), processor.processLinkStylesheets(), processor.processStyleAttributes()]);
    return processor.getDocContent();
  }
}

// ------------
// DOMProcessor
// ------------
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const mapElements = (elements, lambda) => Array.prototype.map.call(elements, lambda);

class DOMProcessor {
  constructor(url) {
    this.baseURI = url;
  }

  async initialize() {
    const content = await Resource.getTextContent(this.baseURI);
    this.dom = new JSDOM(content, { url: this.baseURI, virtualConsole: new jsdom.VirtualConsole() });
    const doc = this.dom.window.document;
    this.selectElements = elements => doc.querySelectorAll(elements);
    this.createElement = tagName => doc.createElement(tagName);
  }

  getDocContent() {
    return this.dom.serialize();
  }

  removeScripts() {
    this.selectElements("script").forEach(element => element.remove());
    this.selectElements("[onload]").forEach(element => element.removeAttribute("onload"));
  }

  removeObjects() {
    this.selectElements('applet, object:not([type="image/svg+xml"]):not([type="image/svg-xml"]), embed:not([src*=".svg"])').forEach(element => element.remove());
  }

  removeMediaSrcAttributes() {
    this.selectElements("audio[src], video[src]").forEach(element => element.removeAttribute("src"));
  }

  removePreloadLinks() {
    this.selectElements("link[rel*=preload], link[rel*=prefetch]").forEach(element => element.remove());
  }

  removeFrames() {
    this.selectElements("iframe, frame").forEach(element => element.remove());
  }

  removeMetaRefresh() {
    this.selectElements("meta[http-equiv=refresh]").forEach(element => element.remove());
  }

  removeMetaCharset() {
    this.selectElements("meta[charset]").forEach(element => element.remove());
  }

  resolveURLs() {
    this.selectElements("[href]").forEach(element => element.setAttribute("href", element.href));
    this.selectElements("[src]").forEach(element => element.setAttribute("src", element.src));
  }

  insertDefaultFavicon() {
    const docHead = this.selectElements("html > head")[0];
    const faviconElements = this.selectElements('link[href][rel="shortcut icon"], link[href][rel="apple-touch-icon"], link[href][rel="icon"]');
    // querySelectorAll() returns a NodeList, which is always truthy, so check its length
    if (docHead && !faviconElements.length) {
      const faviconElement = this.createElement("link");
      faviconElement.setAttribute("type", "image/x-icon");
      faviconElement.setAttribute("rel", "shortcut icon");
      faviconElement.setAttribute("href", "/favicon.ico");
      docHead.appendChild(faviconElement);
    }
  }


  async processImages() {
    await Promise.all([
      Resource.setElementsDataURI(this.selectElements('link[href][rel="shortcut icon"], link[href][rel=apple-touch-icon], link[href][rel=icon]'), "href"),
      Resource.setElementsDataURI(this.selectElements("img[src], input[src][type=image]"), "src"),
      Resource.setElementsDataURI(this.selectElements("video[poster]"), "poster"),
      Resource.setElementsDataURI(this.selectElements('object[type="image/svg+xml"], object[type="image/svg-xml"], embed[src*=".svg"]'), "src")
    ]);
  }

  async processInlineStylesheets() {
    await Promise.all(mapElements(this.selectElements("style"), async styleElement => {
      let textContent = await DomUtil.processStylesheet(styleElement.textContent, this.baseURI);
      if (styleElement.media) {
        textContent = "@media " + styleElement.media + "{ " + textContent + " }";
      }
      styleElement.textContent = textContent;
    }));
  }

  async processStyleAttributes() {
    await Promise.all(mapElements(this.selectElements("[style]"), async element => {
      const textContent = await DomUtil.processStylesheet(element.getAttribute("style"), this.baseURI);
      element.setAttribute("style", textContent);
    }));
  }

  async processLinkStylesheets() {
    await Promise.all(mapElements(this.selectElements("link[rel*=stylesheet]"), async linkElement => {
      const resourceURL = linkElement.getAttribute("href");
      if (resourceURL) {
        let textContent = await Resource.getTextContent(resourceURL);
        textContent = await DomUtil.processStylesheet(textContent, resourceURL);
        const styleElement = this.createElement("style");
        if (linkElement.media) {
          textContent = "@media " + linkElement.media + "{ " + textContent + " }";
        }
        styleElement.textContent = textContent;
        linkElement.parentElement.replaceChild(styleElement, linkElement);
      }
    }));
  }
}

// --------
// Resource
// --------
const request = require("request-promise-native");
const dataUri = require("strong-data-uri");

class Resource {
  static async getTextContent(resourceUrl) {
    if (resourceUrl.startsWith("data:")) {
      return dataUri.decode(resourceUrl).toString();
    } else {
      const requestOptions = {
        method: "GET",
        uri: resourceUrl,
        resolveWithFullResponse: true,
        pool: { maxSockets: 5 }
      };
      try {
        const resourceContent = await request(requestOptions);
        return resourceContent.body;
      } catch (e) {
        return "";
      }
    }
  }

  static async getDataURI(resourceUrl) {
    if (resourceUrl.startsWith("data:")) {
      return resourceUrl;
    } else {
      const requestOptions = {
        method: "GET",
        uri: resourceUrl,
        resolveWithFullResponse: true,
        encoding: null,
        pool: { maxSockets: 5 }
      };
      try {
        const resourceContent = await request(requestOptions);
        return dataUri.encode(resourceContent.body, resourceContent.headers["content-type"]);
      } catch (e) {
        return resourceUrl;
      }
    }
  }

  static async setElementsDataURI(elements, attributeName) {
    await Promise.all(mapElements(elements, async element => {
      const resourceURL = element.getAttribute(attributeName);
      if (resourceURL) {
        try {
          const dataURI = await Resource.getDataURI(resourceURL);
          element.setAttribute(attributeName, dataURI);
        } catch (e) {
          // keep the original attribute value if the resource cannot be inlined
        }
      }
    }));
  }
}

// -------
// DomUtil
// -------
const url = require("url");

class DomUtil {
  static async processStylesheet(content, baseURI) {
    content = DomUtil.removeCssComments(content);
    content = await DomUtil.processImports(content, baseURI);
    const urls = content.match(/url\s*\(([^\)]*)\)/gi) || [];
    await Promise.all(urls.map(async resourceUrl => {
      const result = resourceUrl.match(/^url\s*\(\s*(?:'|")?\s*([^('"\))]*)\s*(?:'|")?\s*\)$/i);
      if (result && result[1]) {
        const origUrl = result[1];
        const absoluteUrl = url.resolve(baseURI, origUrl);
        const dataUri = await Resource.getDataURI(absoluteUrl);
        if (content.indexOf(origUrl) != -1) {
          content = content.replace(origUrl, dataUri);
        }
      }
    }));
    return content;
  }

  static async processImports(content, baseURI) {
    content = DomUtil.removeCssComments(content);
    const imports = content.match(/(@import\s*url\s*\([^\)]*\)\s*(.*)(;|$))|(@import\s*('|")?\s*[^\(;'"]*\s*('|")?\s*(.*)(;|$))/gi) || [];
    let processed = false;
    await Promise.all(imports.map(async cssImport => {
      if (!cssImport.endsWith(";")) {
        cssImport += ";";
      }
      const result = cssImport.match(/(url\s*\(\s*(?:'|")?\s*([^('"\))]*)\s*(?:'|")?\s*\)\s*([^;]*);)|(@import\s*\(?\s*(?:'|")?\s*([^('"\))]*)\s*(?:'|")?\s*(?:\)\s*([^;]*);))/i);
      // groups 2/3: url(...) form; groups 5/6: @import(...) form (group 4 is the whole @import match)
      if (result && (result[2] || result[5])) {
        const origUrl = result[2] || result[5];
        const absoluteUrl = url.resolve(baseURI, origUrl);
        const media = result[3] || result[6];
        processed = true;
        let textContent = await Resource.getTextContent(absoluteUrl);
        if (media) {
          textContent = "@media " + media + "{ " + textContent + " }";
        }
        if (content.indexOf(cssImport) != -1) {
          content = content.replace(cssImport, textContent);
        }
      }
    }));
    if (processed) {
      return await DomUtil.processImports(content, baseURI);
    } else {
      return content;
    }
  }

  static removeCssComments(content) {
    let start, end;
    do {
      start = content.indexOf("/*");
      end = content.indexOf("*/", start);
      if (start != -1 && end != -1) {
        content = content.substring(0, start) + content.substr(end + 2);
      }
    } while (start != -1 && end != -1);
    return content;
  }
}

@ghoullier
Contributor

Indeed, the problem came from the worker method timeout. This value is not configurable yet; it should be in the next version.
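To illustrate how a method that runs too long loses its response, here is a minimal sketch of a timeout wrapper in plain Node.js. It is not the @zetapush/worker implementation: `withTimeout`, `slowTask` and `TIMEOUT_MS` are hypothetical names, and the 5000 ms default only mirrors the value mentioned earlier in this thread. Once the timer fires, the caller gets an error even if the method finishes later.

```javascript
// Hypothetical timeout wrapper: not the actual @zetapush/worker code.
const TIMEOUT_MS = 5000; // value reported in this thread before 0.28

function withTimeout(promise, ms = TIMEOUT_MS) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    // Whichever settles first wins; the timer is cleared if the task settles first.
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      error => { clearTimeout(timer); reject(error); }
    );
  });
}

// Stand-in for a slow process() call (e.g. fetching and inlining a heavy page).
const slowTask = ms => new Promise(resolve => setTimeout(() => resolve("done"), ms));

withTimeout(slowTask(50), 100).then(result => console.log(result)); // prints "done"
withTimeout(slowTask(200), 100).catch(error => console.log(error.message)); // prints "Timed out after 100 ms"
```

Passing the limit as a parameter is one way such a value could be made configurable per method.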

@gildas-lormeau
Author

gildas-lormeau commented Jun 28, 2018

Could you please increase it to something like 10s or 15s in the meantime? It would help.

@ghoullier
Contributor

I just released a canary version, 0.28.0-alpha.b4399b84, with the worker timeout increased to 60000 ms.

@gildas-lormeau
Author

Thanks! 👍

@ghoullier
Contributor

We released version 0.28 on the stable channel.

@gildas-lormeau
Author

gildas-lormeau commented Jul 2, 2018

It looks like the timeout issue is still partly present. The log disappeared, but if you go to my app, type "https://zetapush.com/" in the input and click "OK", you will see that the response message returned by the method call is never received on the client side. That's why the buttons at the bottom right of the screen never become enabled. If you test a smaller page like "https://www.quikinvoice.com/", it works though.

EDIT: I'm using zeta v0.28.0.

@aurelien-baudet
Contributor

I have tested your application. You are right: there is no response when using the zetapush.com page.
Maybe it is not an issue with the timeout but an error in the processor that is never reported.

I have attached the logs of your application. If you look at the timestamps, they stop after 3 seconds.
The timeout is not reached, but nothing happens after the last log line, which, judging by what happens with https://www.quikinvoice.com/, should not be the last one.

singlefile.log

@gildas-lormeau
Author

@aurelien-baudet The processor works fine on https://zetapush.com. You can test it with the index page at the root of the project. I don't see any errors anywhere and it works as expected on my machine. However, when I run the code within a zeta worker, I see the response in the logs but it's never received by the client.

@gildas-lormeau
Author

gildas-lormeau commented Jul 2, 2018

BTW, your logs show the process ran as expected (i.e. there were no errors during execution). The last line of the logs is the response that is never received by the client.

@gildas-lormeau
Author

@aurelien-baudet You're right. This issue does not seem to be related to the timeout. I was not able to reproduce it with a simple test, and it does not seem related to the response size either. Could you please confirm whether the issue is on your end?

@aurelien-baudet
Contributor

aurelien-baudet commented Jul 4, 2018

@gildas-lormeau After several tests, the behavior is the same locally and in the cloud. With URLs of "light/small" websites, the front end receives a response. But with URLs of "heavy" websites, the worker sends the response and the front end never receives it.

The issue is related to the size of the data transferred over the network. Internally, we use CometD as the real-time protocol, and CometD is currently configured with a message size limit of 1 MB.
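Because an oversized response is dropped without any error on the worker side, a worker could at least detect the situation before replying. This is a hypothetical sketch: `MAX_MESSAGE_BYTES`, `payloadBytes` and `fitsInMessage` are illustrative names, and it assumes the 1 MB limit means 1024 × 1024 bytes of JSON-encoded UTF-8 payload. CometD's real accounting (message envelope, encoding) may differ, so some headroom is advisable.

```javascript
// Hypothetical guard: the exact limit and how CometD counts bytes are assumptions.
const MAX_MESSAGE_BYTES = 1024 * 1024; // "1 MB", as described above

// Size of the payload once serialized to JSON and encoded as UTF-8.
function payloadBytes(payload) {
  return Buffer.byteLength(JSON.stringify(payload), "utf8");
}

function fitsInMessage(payload, limit = MAX_MESSAGE_BYTES) {
  return payloadBytes(payload) <= limit;
}

// A small page fits; a 2 MB string does not.
console.log(fitsInMessage("<html>small</html>")); // true
console.log(fitsInMessage("x".repeat(2 * 1024 * 1024))); // false
```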

We could increase this limit, but we don't think it is the best option, because some websites could still exceed the new limit.

A better option is to store your result using our cloud file system service (the new documentation is not ready yet, so here is the old one: https://ref.zpush.io/#it_zpfs_hdfs):

  1. Import and inject the Zpfs_hdfs service from @zetapush/platform.
  2. Ask for an upload URL (call newUploadUrl).
  3. Post the content over HTTP to this URL (with any library you like).
  4. Finish the file upload (call newFile); the download URL is returned.
  5. Return the download URL to the client.
  6. Download the content from your client.
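The steps above can be sketched as follows. The storage calls are deliberately hidden behind a hypothetical `store` adapter (`requestUploadUrl`, `upload` and `finalizeUpload` are illustrative names): in a real worker they would wrap newUploadUrl, the HTTP upload and newFile from the Zpfs_hdfs service, whose exact parameters and return values are not shown here and should be taken from its documentation. An in-memory store stands in for the service so the flow can run locally.

```javascript
// Sketch of steps 2 to 5: store the (possibly large) result, return a small link.
// `store` is a hypothetical adapter; its method names and shapes are assumptions.
async function storeResult(content, store) {
  const upload = await store.requestUploadUrl(); // step 2 (would wrap newUploadUrl)
  await store.upload(upload, content); // step 3 (HTTP upload)
  const downloadUrl = await store.finalizeUpload(upload); // step 4 (would wrap newFile)
  return { downloadUrl }; // step 5: only the URL goes back to the client
}

// In-memory stand-in so the flow can be exercised without the cloud service.
function createMemoryStore() {
  const pending = new Map();
  const files = new Map();
  let counter = 0;
  return {
    files,
    async requestUploadUrl() {
      counter += 1;
      return `memory://upload/${counter}`;
    },
    async upload(uploadUrl, content) {
      pending.set(uploadUrl, content);
    },
    async finalizeUpload(uploadUrl) {
      const downloadUrl = uploadUrl.replace("/upload/", "/download/");
      files.set(downloadUrl, pending.get(uploadUrl));
      pending.delete(uploadUrl);
      return downloadUrl;
    }
  };
}

// Usage: process() would return storeResult(html, store) instead of the HTML
// itself; the client then downloads the file from the returned URL (step 6).
const store = createMemoryStore();
storeResult("<html>large page</html>", store).then(result => {
  console.log(result.downloadUrl); // memory://download/1
  console.log(store.files.get(result.downloadUrl)); // <html>large page</html>
});
```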

Do not hesitate to give us feedback about this cloud service.

@aurelien-baudet
Contributor

I created several issues that are related to the trouble you encountered:
