[WIP] PDF Comments Extractor - New Feature, Help Needed #3300

Open
wants to merge 7 commits into
base: master

ibraude
Contributor

@ibraude ibraude commented Oct 22, 2019

#New Feature - Help Needed
This is a PR referring to #3296.

I've started working on a new feature that is very useful for me and I thought it might be useful for other Boostnote users who read and summarize long texts.

The Feature

Import all annotations, highlights, and comments from a PDF file into a new Boostnote note.
For example:

I read a long document and highlight the main points as I go.

Then, instead of copying and pasting content into my summary in Boostnote, or constantly switching between windows, I can import the annotated PDF file into Boostnote.

This extracts all of the annotations from the PDF and opens a new note containing them.

This could be very useful for students or anyone that wants to summarize long texts.
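
To make the extraction step described above concrete, here is a minimal sketch of how annotations might be collected with pdf.js. It is not the code from this PR. The helper names `extractAnnotations` and `toMarkdown` are hypothetical, the library object is passed in as a parameter so the sketch does not depend on how pdf.js is bundled, and it assumes that comment text is available in each annotation's `contents` field (a highlight may carry no text of its own, in which case it is skipped here).

```javascript
// Hypothetical helper: collect annotation text from a PDF through a pdf.js-like API.
// `pdfjsLib` is injected so the sketch is independent of the bundling setup.
async function extractAnnotations (pdfjsLib, source) {
  const pdf = await pdfjsLib.getDocument(source).promise
  const notes = []
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber)
    const annotations = await page.getAnnotations()
    for (const annotation of annotations) {
      // Assumption: comment text lives in `contents`; annotations without text are skipped.
      const text = typeof annotation.contents === 'string' ? annotation.contents.trim() : ''
      if (text) notes.push({ page: pageNumber, subtype: annotation.subtype, text })
    }
  }
  return notes
}

// Hypothetical formatter: one Markdown bullet per annotation, ready for a new note.
function toMarkdown (notes) {
  return notes.map(n => `- (p. ${n.page}, ${n.subtype}) ${n.text}`).join('\n')
}
```

Passing the library in as an argument also makes the helper easy to exercise with a stub, which may help when the real library behaves differently between dev and production builds.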

Current behavior

This works perfectly in dev mode.
However, when I compile and build, webpack emits multiple warnings. The build completes, but the new feature does not work.

WARNING in ./~/pdfjs-dist/build/pdf.js
Critical dependencies:
9858:6-13 require function is used in a way in which dependencies cannot be statically extracted
 @ ./~/pdfjs-dist/build/pdf.js 9858:6-13

WARNING in ./~/pdfjs-dist/build/pdf.min.js
Critical dependencies:
1:121738-121745 require function is used in a way in which dependencies cannot be statically extracted
1:121755-121762 require function is used in a way in which dependencies cannot be statically extracted
1:122078-122085 require function is used in a way in which dependencies cannot be statically extracted
 @ ./~/pdfjs-dist/build/pdf.min.js 1:121738-121745 1:121755-121762 1:122078-122085

WARNING in ./~/pdfjs-dist/build/pdf.js.map
Module parse failed: C:\Users\Itai Braude\Documents\Coding\Boostnote\node_modules\pdfjs-dist\build\pdf.js.map Unexpected token (1:10)
You may need an appropriate loader to handle this file type.
SyntaxError: Unexpected token (1:10)

WARNING in ./~/pdfjs-dist/build/pdf.worker.js.map
Module parse failed: C:\Users\Itai Braude\Documents\Coding\Boostnote\node_modules\pdfjs-dist\build\pdf.worker.js.map Unexpected token (1:10)
You may need an appropriate loader to handle this file type.
SyntaxError: Unexpected token (1:10)

I have started investigating the issue, and I found that changing "webpack-production-config.js" to load the pdfjs source maps using raw-loader resolves the "Module parse failed" warnings, like so:

const skeleton = require('./webpack-skeleton')
const webpack = require('webpack')
const path = require('path')
const NodeTargetPlugin = require('webpack/lib/node/NodeTargetPlugin')

var config = Object.assign({}, skeleton, {
  module: {
    loaders: [
      {
        test: /pdf(\.worker)?(\.min)?\.js\.map$/,
        loader: 'raw-loader'
      },
      {
        test: /(\.js|\.jsx)?$/,
        exclude: [/(node_modules|bower_components)/, /pdf(\.worker)?(\.min)?\.js\.map$/],
        loader: 'babel'
      },
      {
        test: /\.styl$/,
        exclude: /(node_modules|bower_components)/,
        loader: 'style!css?modules&importLoaders=1&localIdentName=[name]__[local]___[path]!stylus?sourceMap'
      },
      {
        test: /\.json$/,
        loader: 'json'
      }
    ]
  },
  output: {
    path: path.join(__dirname, 'compiled'),
    filename: '[name].js',
    libraryTarget: 'commonjs2',
    sourceMapFilename: '[name].map',
    publicPath: 'http://localhost:8080/assets/'
  },
  plugins: [
    new webpack.NoErrorsPlugin(),
    new NodeTargetPlugin(),
    new webpack.optimize.OccurenceOrderPlugin(),
    new webpack.DefinePlugin({
      'process.env': {
        'NODE_ENV': JSON.stringify('production'),
        'BABEL_ENV': JSON.stringify('production')
      }
    })
  ]
})

module.exports = config
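
One side observation on the config above, offered as a sketch rather than a diagnosis: the babel entry's test pattern has an optional group, so it matches every path, and only the exclude list narrows it down. The snippet below emulates that selection in a simplified way (the `matchesLoader` helper and the sample file paths are hypothetical, and webpack's real matching has more cases). Nothing in this thread confirms that this pattern is related to the warnings.

```javascript
// Simplified emulation of how a webpack 1 loader entry is selected:
// a file is handled when `test` matches and no `exclude` pattern matches.
function matchesLoader (rule, filePath) {
  const excludes = [].concat(rule.exclude || [])
  return rule.test.test(filePath) && !excludes.some(pattern => pattern.test(filePath))
}

// The pattern from the config above: the trailing `?` makes the extension optional.
const looseBabelRule = { test: /(\.js|\.jsx)?$/, exclude: [/(node_modules|bower_components)/] }
// A stricter alternative that only matches paths ending in .js or .jsx.
const strictBabelRule = { test: /\.jsx?$/, exclude: [/(node_modules|bower_components)/] }

// Hypothetical sample paths, chosen to sit outside node_modules.
const files = ['browser/main/Main.js', 'browser/styles/index.styl', 'lib/pdf.js.map']
for (const file of files) {
  console.log(file, matchesLoader(looseBabelRule, file), matchesLoader(strictBabelRule, file))
}
```

With the loose pattern, all three sample paths are handed to babel; with the stricter one, only the .js file is.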

I also know that the following code in node_modules/pdfjs-dist/build/pdf.js is the section that triggers the "Critical dependencies: 9858:6-13 require function is used in a way in which dependencies cannot be statically extracted" warning for ./~/pdfjs-dist/build/pdf.js:

{
  var useRequireEnsure = false;

  if (typeof window === 'undefined') {
    isWorkerDisabled = true;

    if (typeof require.ensure === 'undefined') {
      require.ensure = require('node-ensure');
    }

    useRequireEnsure = true;
  } else if (typeof require !== 'undefined' && typeof require.ensure === 'function') {
    useRequireEnsure = true;
  }

  if (typeof requirejs !== 'undefined' && requirejs.toUrl) {
    fallbackWorkerSrc = requirejs.toUrl('pdfjs-dist/build/pdf.worker.js');
  }

  var dynamicLoaderSupported = typeof requirejs !== 'undefined' && requirejs.load;
  fakeWorkerFilesLoader = useRequireEnsure ? function () {
    return new Promise(function (resolve, reject) {
      require.ensure([], function () {
        try {
          var worker;
          worker = require('./pdf.worker.js');
          resolve(worker.WorkerMessageHandler);
        } catch (ex) {
          reject(ex);
        }
      }, reject, 'pdfjsWorker');
    });
  } : dynamicLoaderSupported ? function () {
    return new Promise(function (resolve, reject) {
      requirejs(['pdfjs-dist/build/pdf.worker'], function (worker) {
        try {
          resolve(worker.WorkerMessageHandler);
        } catch (ex) {
          reject(ex);
        }
      }, reject);
    });
  } : null;

  if (!fallbackWorkerSrc && (typeof document === "undefined" ? "undefined" : _typeof(document)) === 'object' && 'currentScript' in document) {
    var pdfjsFilePath = document.currentScript && document.currentScript.src;

    if (pdfjsFilePath) {
      fallbackWorkerSrc = pdfjsFilePath.replace(/(\.(?:min\.)?js)(\?.*)?$/i, '.worker$1$2');
    }
  }
}

However, I could not find a solution to this problem.
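
One direction that might be worth trying, sketched here under assumptions and untested against this branch: since the build targets Electron with libraryTarget 'commonjs2', the package could be marked as a webpack external so that it is loaded at runtime through Node's require instead of being bundled, which would keep webpack from analysing the dynamic require calls in pdf.js. The `withExternals` helper and the `base` config below are hypothetical, and the approach assumes the packaged app ships node_modules/pdfjs-dist.

```javascript
// Hypothetical helper: return a copy of a webpack config in which the given
// packages are treated as externals instead of being bundled.
function withExternals (config, packageNames) {
  const existing = [].concat(config.externals || [])
  return Object.assign({}, config, { externals: existing.concat(packageNames) })
}

// Stand-in for the shared skeleton config; the real one has many more fields.
const base = { output: { libraryTarget: 'commonjs2' }, externals: ['electron'] }
const production = withExternals(base, ['pdfjs-dist'])
console.log(production.externals)
```

String externals match the exact request, so a deep import such as 'pdfjs-dist/build/pdf' would need its own entry in the list.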

Expected behavior

The PDF annotation extractor feature should work in production mode as it does in dev mode.

Steps to reproduce

  1. Clone my forked repo and switch to the "pdf-extractor" branch
git clone https://github.com/ibraude/Boostnote.git
cd Boostnote
git checkout pdf-extractor
  2. Run Boostnote in dev mode to see the working version
yarn install
yarn run dev
  3. Build Boostnote to see the error
grunt pre-build

If anyone has any ideas regarding this issue, I would really appreciate it.

Also, I would like to hear if this is a feature that other users find useful and if so I will make sure to develop it further and make a PR.

Thanks a lot 🙏

My package.json:

{
  "name": "boost",
  "productName": "Boostnote",
  "version": "0.13.0",
  "main": "index.js",
  "description": "Boostnote",
  "license": "GPL-3.0",
  "scripts": {
    "start": "electron ./index.js",
    "compile": "grunt compile",
    "test": "npm run ava && npm run jest",
    "ava": "cross-env NODE_ENV=test ava --serial",
    "jest": "jest",
    "fix": "eslint . --fix",
    "lint": "eslint .",
    "dev": "cross-env NODE_ENV=development node dev-scripts/dev.js",
    "watch": "webpack-dev-server --hot"
  },
  "config": {
    "electron-version": "3.0.8"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/BoostIO/Boostnote.git"
  },
  "keywords": [
    "boostnote",
    "b00st",
    "boostio",
    "note",
    "snippet",
    "storage",
    "electron"
  ],
  "author": "Junyoung Choi <fluke8259@gmail.com> (https://github.com/Rokt33r)",
  "contributors": [
    "Kazu Yokomizo (https://github.com/kazup01)",
    "dojineko (https://github.com/dojineko)",
    "Romain Bazile (https://github.com/gromain)",
    "Bruno Paz (https://github.com/brpaz)",
    "Fabian Mueller (https://github.com/dotcs)",
    "Yoshihisa Mochihara (https://github.com/yosmoc)",
    "Mike Resoli (https://github.com/mikeres0)",
    "tjado (https://github.com/tejado)",
    "Sota Sugiura (https://github.com/sota1235)",
    "Milo Todt (https://github.com/MiloTodt)"
  ],
  "bugs": {
    "url": "https://github.com/BoostIO/Boostnote/issues"
  },
  "homepage": "https://boostnote.io",
  "dependencies": {
    "@enyaxu/markdown-it-anchor": "^5.0.2",
    "@hikerpig/markdown-it-toc-and-anchor": "^4.5.0",
    "@rokt33r/js-sequence-diagrams": "^2.0.6-2",
    "@rokt33r/markdown-it-math": "^4.0.1",
    "@rokt33r/season": "^5.3.0",
    "@susisu/mte-kernel": "^2.0.0",
    "aws-sdk": "^2.48.0",
    "aws-sdk-mobile-analytics": "^0.9.2",
    "chart.js": "^2.7.2",
    "codemirror": "^5.40.2",
    "codemirror-mode-elixir": "^1.1.1",
    "connected-react-router": "^6.4.0",
    "copy-webpack-plugin": "^5.0.4",
    "electron-config": "^1.0.0",
    "electron-gh-releases": "^2.0.4",
    "escape-string-regexp": "^1.0.5",
    "file-uri-to-path": "^1.0.0",
    "file-url": "^2.0.2",
    "filenamify": "^2.1.0",
    "flowchart.js": "^1.6.5",
    "font-awesome": "^4.3.0",
    "fs-extra": "^5.0.0",
    "highlight.js": "^9.13.1",
    "i18n-2": "^0.7.2",
    "iconv-lite": "^0.4.19",
    "immutable": "^3.8.1",
    "invert-color": "^2.0.0",
    "js-yaml": "^3.12.0",
    "jsonlint-mod": "^1.7.4",
    "katex": "^0.10.1",
    "lodash": "^4.11.1",
    "lodash-move": "^1.1.1",
    "markdown-it": "^6.0.1",
    "markdown-it-abbr": "^1.0.4",
    "markdown-it-admonition": "^1.0.4",
    "markdown-it-emoji": "^1.1.1",
    "markdown-it-footnote": "^3.0.0",
    "markdown-it-imsize": "^2.0.1",
    "markdown-it-kbd": "^1.1.1",
    "markdown-it-multimd-table": "^2.0.1",
    "markdown-it-plantuml": "^1.1.0",
    "markdown-it-smartarrows": "^1.0.1",
    "markdown-it-sub": "^1.0.0",
    "markdown-it-sup": "^1.0.0",
    "markdown-toc": "^1.2.0",
    "mdurl": "^1.0.1",
    "mermaid": "^8.0.0-rc.8",
    "moment": "^2.10.3",
    "mousetrap": "^1.6.2",
    "mousetrap-global-bind": "^1.1.0",
    "node-ipc": "^8.1.0",
    "pdfjs-dist": "^2.2.228",
    "prettier": "^1.18.2",
    "prop-types": "^15.7.2",
    "query-string": "^6.5.0",
    "raphael": "^2.2.7",
    "raw-loader": "^3.1.0",
    "react": "^16.8.6",
    "react-autosuggest": "^9.4.0",
    "react-codemirror": "^1.0.0",
    "react-color": "^2.2.2",
    "react-composition-input": "^1.1.1",
    "react-debounce-render": "^4.0.1",
    "react-dom": "^16.8.6",
    "react-image-carousel": "^2.0.18",
    "react-redux": "^7.0.3",
    "react-router-dom": "^5.0.0",
    "react-sortable-hoc": "^0.6.7",
    "react-transition-group": "^2.5.0",
    "redux": "^3.5.2",
    "sander": "^0.5.1",
    "sanitize-html": "^1.18.2",
    "striptags": "^2.2.1",
    "turndown": "^4.0.2",
    "turndown-plugin-gfm": "^1.0.2",
    "typo-js": "^1.0.3",
    "unique-slug": "2.0.0",
    "uuid": "^3.2.1"
  },
  "devDependencies": {
    "ava": "^0.25.0",
    "babel-core": "^6.14.0",
    "babel-jest": "^22.4.3",
    "babel-loader": "^6.2.0",
    "babel-plugin-react-transform": "^2.0.0",
    "babel-plugin-webpack-alias": "^2.1.1",
    "babel-preset-env": "^1.6.1",
    "babel-preset-es2015": "^6.3.13",
    "babel-preset-react": "^6.24.1",
    "babel-preset-react-hmre": "^1.0.1",
    "babel-register": "^6.11.6",
    "browser-env": "^3.2.5",
    "color": "^3.0.0",
    "concurrently": "^3.4.0",
    "copy-to-clipboard": "^3.0.6",
    "cross-env": "^5.2.0",
    "css": "^2.2.4",
    "css-loader": "^0.19.0",
    "devtron": "^1.1.0",
    "dom-storage": "^2.0.2",
    "electron": "3.0.8",
    "electron-debug": "^2.2.0",
    "electron-devtools-installer": "^2.2.4",
    "electron-packager": "^12.2.0",
    "eslint": "^3.13.1",
    "eslint-config-standard": "^6.2.1",
    "eslint-config-standard-jsx": "^3.2.0",
    "eslint-plugin-react": "^7.8.2",
    "eslint-plugin-standard": "^3.0.1",
    "faker": "^3.1.0",
    "grunt": "^0.4.5",
    "grunt-electron-installer": "2.1.0",
    "history": "^4.9.0",
    "husky": "^1.1.0",
    "identity-obj-proxy": "^3.0.0",
    "jest": "^22.4.3",
    "jest-localstorage-mock": "^2.2.0",
    "jsdom": "^9.4.2",
    "json-loader": "^0.5.4",
    "markdownlint": "^0.11.0",
    "merge-stream": "^1.0.0",
    "mock-require": "^3.0.1",
    "nib": "^1.1.0",
    "react-css-modules": "^4.7.9",
    "react-input-autosize": "^1.1.0",
    "react-test-renderer": "^16.8.6",
    "redux-devtools": "^3.5.0",
    "redux-devtools-dock-monitor": "^1.1.3",
    "redux-devtools-log-monitor": "^1.4.0",
    "signale": "^1.2.1",
    "standard": "^8.4.0",
    "style-loader": "^0.12.4",
    "stylus": "^0.52.4",
    "stylus-loader": "^2.3.1",
    "webpack": "^1.12.2",
    "webpack-dev-server": "^1.12.0"
  },
  "optionalDependencies": {
    "grunt-electron-installer-debian": "^0.2.0",
    "grunt-electron-installer-redhat": "^0.3.1"
  },
  "optional": false,
  "ava": {
    "files": [
      "tests/**/*-test.js"
    ],
    "require": [
      "babel-register",
      "./tests/helpers/setup-browser-env.js",
      "./tests/helpers/setup-electron-mock.js"
    ],
    "babel": "inherit"
  },
  "jest": {
    "moduleNameMapper": {
      "\\.(jpg|jpeg|png|gif|eot|otf|webp|svg|ttf|woff|woff2|mp4|webm|wav|mp3|m4a|aac|oga)$": "<rootDir>/__mocks__/fileMock.js",
      "\\.(css|less|styl)$": "identity-obj-proxy"
    },
    "setupFiles": [
      "<rootDir>/tests/jest.js",
      "jest-localstorage-mock"
    ]
  },
  "husky": {
    "hooks": {
      "pre-commit": "npm run lint"
    }
  }
}

@Flexo013 Flexo013 added help wanted 🆘 Pull request/issue requires extra help from the community. Check these out if you're new! question ❓ Issue concerns a question. labels Oct 22, 2019
@ibraude
Contributor Author

ibraude commented Oct 22, 2019

As this seems to be a problem with webpack, perhaps upgrading to webpack 4, or even 2, might solve the issue?
Any ideas would be appreciated.

@ZeroX-DG
Member

Have you tried using this configuration?
https://github.com/mozilla/pdf.js/tree/master/examples/webpack
I'll try to investigate this later.

@ibraude
Contributor Author

ibraude commented Oct 24, 2019

I have, but there's a good chance I've missed something, as I don't have much experience with webpack configuration.

Using 'pdfjs-dist/webpack' works great in dev mode, even without any further configuration.
For some reason, the problem only appears in production.

I've also followed the thread mozilla/pdf.js#7612
and tried to see whether any of the fixes there are relevant, but unfortunately with no success.

@ZeroX-DG
Member

ZeroX-DG commented Jun 3, 2020

@ibraude sorry for the wait. Can you resolve the conflicts and give me more information on the error you get when you try to run it in production mode?

@ZeroX-DG
Member

@ibraude ping!

@ibraude
Contributor Author

ibraude commented Jun 11, 2020

@ZeroX-DG conflicts resolved. Regarding the error, I don't have any more information other than what's described in the initial comment.

@Flexo013 Flexo013 requested a review from ZeroX-DG June 14, 2020 10:11
@Flexo013 Flexo013 added the awaiting review ❇️ Pull request is awaiting a review. label Jun 14, 2020
@ZeroX-DG
Member

I've been investigating this for a while and I'm still stuck. I'm not an expert in pdf.js; I'll try to investigate more, but it will take a bit longer.

Labels
awaiting review ❇️ Pull request is awaiting a review. help wanted 🆘 Pull request/issue requires extra help from the community. Check these out if you're new! question ❓ Issue concerns a question.
3 participants