Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add v1 of typescript definition file #9369

Closed
wants to merge 4 commits into from
Closed

Conversation

acchou
Copy link
Contributor

@acchou acchou commented Jan 16, 2018

This is a WIP typescript definition file (index.d.ts). There are several issues to resolve before this is ready:

  • Determine where in the repository the file should live. I've placed it in the root for now.
  • Modify gulpfile.js to copy index.d.ts to the root of the distribution directory build/dist. If the file is named index.d.ts there is no need for further configuration, it is optional but suggested by the TypeScript docs to add a reference to package.json.
  • Add some tests. Testing the types will require having TypeScript. I am hesitant to add TypeScript as a devDependency, but perhaps this is the best way to proceed... Alternatively we could omit tests or add tests in a separate repo. Some guidance here would be helpful.
  • Cull down the type file to the desired public interface. I've erred on the side of completeness, a fair amount of culling should be done to remove types not intended for external consumption. Some guidance would be helpful here, as I'm not entirely sure what was intended to be in the public API.

@timvandermeij
Copy link
Contributor

If complete, this fixes #7909. I'm wondering how the maintenance of this will go. Should we maintain it manually on can we generate this file somehow?

@yurydelendik
Copy link
Contributor

let's place it in e.g. "external/types/" folder and also split into several files, e.g. display.ts/core.ts/shared.ts

@acchou
Copy link
Contributor Author

acchou commented Jan 17, 2018

I manually created this file, and I recommend that it be manually updated. There are many cases where there's a stylistic choice on how to specify the types, and in this code base in particular I think it serves as useful documentation, even for developers who are not using TypeScript.

The best way to help maintain it is probably to create a test file that exercises some of the most common APIs. When the APIs change, the test will fail and prompt the developer who changed the code to update the TypeScript definition and tests.

I have taken a crack at this with an external test file (not included in this pull request), but I'm having trouble setting up the test scaffolding. Some issues:

  • Where should the test go
  • How to specify the dependency on TypeScript (e.g. is it ok to add TypeScript as a devDependency)
  • How to set up the test itself. I'm not familiar with the test framework that PDFJS uses. If one of you could create an empty test file that is already working, I can add the tests themselves much easier.

@acchou
Copy link
Contributor Author

acchou commented Jan 17, 2018

@yurydelendik It is possible to split the types into separate files, but it introduces several issues:

  • The type files will need to be refactored if definitions move around, or if files are added or removed (which have a public API)
  • It requires TypeScript references, which are a bit less well understood (at least by me) and will require some additional configuration. I'm worried this may cause issues for users down the road.
  • I'm not sure about how well symbol lookup will work with IDEs.

I think the best solution would be to convert PDFJS itself to TypeScript, then the types would be part of the source files and no parallel type definitions or files would be needed. But this is probably too ambitious :) However I'll say that this type definition file goes a long way towards that goal, if it's something the team would consider.

For now I'd suggest we put the effort into adding tests, then we can consider whether refactoring to split the types makes sense. Does that sound ok?

@acchou
Copy link
Contributor Author

acchou commented Jan 17, 2018

I'll also add that as I mentioned in the original PR, there's much culling that could probably be done to make the type file smaller. Doing that first might result in a small enough file that there's less of a feeling that it needs to be split up.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jan 17, 2018

The best way to help maintain it is probably to create a test file that exercises some of the most common APIs. When the APIs change, the test will fail and prompt the developer who changed the code to update the TypeScript definition and tests.

I suppose that my general question/objection here is if people using TypeScript will actually help maintain these definitions, since it seems somewhat unfair to put that burden on every PDF.js contributor considering that the library is only using regular JavaScript!?

Cull down the type file to the desired public interface. I've erred on the side of completeness, a fair amount of culling should be done to remove types not intended for external consumption. Some guidance would be helpful here, as I'm not entirely sure what was intended to be in the public API.

In general, the public interface is the one that's found in https://github.com/mozilla/pdf.js/blob/master/src/display/api.js (and perhaps in some of the other files in https://github.com/mozilla/pdf.js/tree/master/src/display); i.e. what ends up in the build/pdf.js file.
Probably it also makes sense to document e.g. the viewer components, as listed in https://github.com/mozilla/pdf.js/blob/master/web/pdf_viewer.component.js.

However, note that all code found in https://github.com/mozilla/pdf.js/tree/master/src/core is running in a web worker, and is thus not even accessible much less part of any public interface.
Finally, note that as part of the upcoming version 2.0 of PDF.js, which is tracked in https://github.com/mozilla/pdf.js/projects/5, we're also aiming to remove the global PDFJS object.

@yurydelendik
Copy link
Contributor

yurydelendik commented Jan 17, 2018

The type files will need to be refactored if definitions move around, or if files are added or removed (which have a public API)

We already tracking changes to public API. So modifying/syncing with .d.ts might not be an issue

It requires TypeScript references, which are a bit less well understood (at least by me) and will require some additional configuration. I'm worried this may cause issues for users down the road.

When TS builds .ts, it generates .d.ts file that references other files. So it's pretty common practice.

I'm not sure about how well symbol lookup will work with IDEs.

As above, IDE will support stuff generated by standard TS compiler.

convert PDFJS itself to TypeScript,

There is also alternative: Flow. I fonder how easy to convert TS to Flow (or other way around).

@Snuffleupagus
Copy link
Collaborator

I think the best solution would be to convert PDFJS itself to TypeScript, then the types would be part of the source files and no parallel type definitions or files would be needed. But this is probably too ambitious :)

We're still in the process of slowly transitioning the code-base to ES6, and generally more modern JavaScript usage. However, I'd seriously question if it's worth the time and effort required to migrate everything to TypeScript (especially considering the size of the code-base and the number of existing contributors).

@acchou
Copy link
Contributor Author

acchou commented Jan 17, 2018

@Snuffleupagus I understand that the intended API is in api.js, but the actual interface exposed includes other types that are required to support those in api.js. Those types in turn expose access to other types. I took the closure of accessible types in this manner to create the type definition file. This is the set of files the types actually come from:

src/display/global.js
src/shared/util.js
src/display/api.js
src/display/dom_utils.js
src/display/annotation_layer.js
src/display/text_layer.js
src/display/metadata.js
src/display/svg.js
src/core/obj.js
src/core/primitives.js
src/core/annotation.js
web/interfaces.js
web/pdf_link_service.js
web/ui_utils.js
web/pdf_rendering_queue.js
web/base_viewer.js
web/text_layer_builder.js
web/pdf_find_controller.js
web/annotation_layer_builder.js
web/download_manager.js
web/pdf_page_view.js
web/pdf_viewer.js
web/pdf_thumbnail_viewer.js
web/pdf_thumbnail_view.js
web/pdf_history.js
src/display/canvas.js
src/core/fonts.js
src/core/font_renderer.js
src/core/cmap.js
src/core/stream.js
src/display/global.js
src/pdf.js

@yurydelendik This is another thing to consider, do we really want to add a .ts file for every one of these? We should cull this list first, then decide if splitting into separate files is needed.

To your other question, unfortunately Flow types and TypeScript types aren't compatible at the moment, though it would be a very nice feature to have. My understanding is that TS is getting more traction, and in practice my understanding is that there are more requests for TS than Flow types for pdf.js from the user community.

@acchou
Copy link
Contributor Author

acchou commented Jan 17, 2018

If you'd like to see which types come from which files, they're documented in the source code: https://github.com/acchou/pdf.js/blob/41205e3a8dd09e49f710760658a0c9d2168b0381/index.d.ts

The regex //.*\.js should find the comments that specify source files.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jan 18, 2018

[...] but the actual interface exposed includes other types that are required to support those in api.js. Those types in turn expose access to other types.

Please keep in mind here that what's returned through the API is not at all the same types that exist in the worker. When data is passed from a web worker to the main thread (or vice versa), via postMessage, the "structured cloning algorithm" is being used.

What this means in practice is that if you have e.g. an object on the worker side which is instanceof Name, see https://github.com/mozilla/pdf.js/blob/master/src/core/primitives.js#L19, then such an object once sent to the main thread (i.e. the API) consists of nothing more than a plain object of the format { name: 'some_string', }.
Edit: Hence https://github.com/acchou/pdf.js/blob/41205e3a8dd09e49f710760658a0c9d2168b0381/index.d.ts#L1871-L1875 is unfortunately not correct on the API side.

The above is just one very simply example out of many more, but as you can see the types used in the worker doesn't really make sense on the main/API side of things. What's returned by various API methods could probably be thought of more like a proxy to the underlying worker data instead.

@acchou
Copy link
Contributor Author

acchou commented Jan 18, 2018

Ok, this was something that I didn't understand. When the types are corrected, they should serve as helpful documentation of what the public API actually looks like.

I'll take some time to make this correction; what would be most helpful is to understand how to create a test for this, then I can write actual tests for these types. I need help with the test scaffolding/setup. Where should I put tests for types, and how should I set it up?

@acchou
Copy link
Contributor Author

acchou commented Jan 18, 2018

It should be easy to convert: just change classes to interfaces, and remove methods. This is for all data objects returned by postMessage, which should help cull down the type file a lot.

For example, I could modify Name as:

  interface Name {
    name: string;
  }

@acchou
Copy link
Contributor Author

acchou commented Jan 18, 2018

Let's consider PDFPageProxy as an example. It has a method:

    getAnnotations(params?: GetAnnotationsParameters): Promise<Annotation[]>;

In general it looks like in the pdfjs API, things returned as Promise<T> are posted from workers using structured cloning, so I'm assuming that Annotation[] will be a cloned object.

Currently I have Annotation typed like this:

class Annotation {
    data: {
      annotationFlags: number;
      borderStyle: AnnotationBorderStyle;
      color: Color | null;
      hasAppearance: boolean;
      id: string;
      rect: Rect;
      subtype: AnnotationSubtype | null;
    };

    viewable: boolean;
    printable: boolean;

    /**
     * Check if a provided flag is set.
     *
     * @public
     * @memberof Annotation
     * @param {number} flag - Hexadecimal representation for an annotation
     *                        characteristic
     * @return {boolean}
     * @see {@link shared/util.js}
     */
    hasFlag(flag: AnnotationFieldFlag): boolean;
  }

There are a few things I could change:

  • Turn Annotation into an interface instead of class.
  • Remove the hasFlag method because it won't be accessible in the API.
  • Because hasFlag is the only place that mentions the enum type AnnotationFieldFlag, I can remove that too. So the following enum is not accessible to the public API:
 enum AnnotationFieldFlag {
    READONLY,
    REQUIRED,
    NOEXPORT,
    MULTILINE,
    PASSWORD,
    NOTOGGLETOOFF,
    RADIO,
    PUSHBUTTON,
    COMBO,
    EDIT,
    SORT,
    FILESELECT,
    MULTISELECT,
    DONOTSPELLCHECK,
    DONOTSCROLL,
    COMB,
    RICHTEXT,
    RADIOSINUNISON,
    COMMITONSELCHANGE
  }

Therefore the final type for Annotation would be:

  interface Annotation {
    data: {
      annotationFlags: number;
      borderStyle: AnnotationBorderStyle;
      color: Color | null;
      hasAppearance: boolean;
      id: string;
      rect: Rect;
      subtype: AnnotationSubtype | null;
    };

    viewable: boolean;
    printable: boolean;
  }

Does that sound correct?

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jan 18, 2018

Does that sound correct?

Not quite, since the only thing that's returned for annotations is the data object, as can be seen in

pdf.js/src/core/document.js

Lines 310 to 319 in 96c573a

getAnnotationsData: function Page_getAnnotationsData(intent) {
var annotations = this.annotations;
var annotationsData = [];
for (var i = 0, n = annotations.length; i < n; ++i) {
if (!intent || isAnnotationRenderable(annotations[i], intent)) {
annotationsData.push(annotations[i].data);
}
}
return annotationsData;
},

Hence the getAnnotations method of the API simply returns a regular Array, containing zero or more objects of the form { annotationFlags, borderStyle, color, ... }.


A piece of general advice: I'd suggest playing with the API locally to check what's actually returned from the various methods, since it might perhaps be difficult to reason about that theoretically if you're not already familiar with a particular part of the code-base.

@acchou
Copy link
Contributor Author

acchou commented Jan 18, 2018

I'll have to do that some more. It's often not enough though, because dynamically it's hard to know if you're seeing everything - there are plenty of optional keys in various types that don't always show up every time, for example.

I'll try adding a unit test, that should help document things and also provide dynamic verification of the types.

@acchou
Copy link
Contributor Author

acchou commented Jan 22, 2018

Question: 'enableStats' is defined as a case in getDefaultSetting but there's no corresponding field in global.js. Is this a bug or an intentional omission...?

@acchou
Copy link
Contributor Author

acchou commented Jan 22, 2018

In StatTimer, are functions like time(name), timeEnd(name), and reset() intended to be part of the public API? I'm guessing the answer is no...

@timvandermeij
Copy link
Contributor

The StatTimer class is used in our debugging tools, so it's not part of the API.

@acchou
Copy link
Contributor Author

acchou commented Jan 22, 2018

Ok thanks. Another issue: AnnotationType doesn't appear to be exported, which means the annotationType field of Annotation will be an opaque number to clients. Am I missing something or should AnnotationType be exported?

@acchou
Copy link
Contributor Author

acchou commented Jan 23, 2018

Is PDFWorker meant to be opaque, or are its methods intended for public consumption? Here's the typing I have so far, but it's unclear what is intended. api_spec.js uses these some of these methods.

    class PDFWorker {
        constructor(name: string, port?: MessagePort);
        name: string;
        destroyed: boolean;
        postMessageTransfers: boolean;
        promise: Promise<void>;
        port: MessagePort;
        messageHandler: MessageHandler;
        destroy(): void;
        fromPort(port: MessagePort): PDFWorker;
    }

@acchou
Copy link
Contributor Author

acchou commented Jan 23, 2018

I just pushed several changes:

  • Separate types into multiple files in types/*.d.ts, as suggested by @yurydelendik
  • As suggested by @Snuffleupagus, verified static types by examining runtime types. Made some fixes to the static types as a result.
  • Added types for viewer components
  • Add tests for the most common APIs to verify runtime type matches type declarations. These tests live in examples/typescript. The tests have a dependency on pdf files from test/pdfs.
  • Modify dist target in gulpfile to copy type files to build/dist/types
  • Modify generated package.json to add reference to toplevel type declaration file pdf.d.ts. This means TypeScript consumers of the pdfjs-dist module automatically get the correct typings.

For the viewer components, I exported them by default into the PDFJS object. I'm not sure I've set that part up correctly.

@acchou
Copy link
Contributor Author

acchou commented Jan 23, 2018

Also, the tests are not run as part of any other gulp task, they are separate at the moment.

@timvandermeij
Copy link
Contributor

I have not done an in-depth review, but I did notice two things. Most of the files are just type definitions, but why does examples/typescript/src/domstubs.ts contain actual logic, i.e., is it not possible to use the existing logic in any way since we'd really prefer not to duplicate it? Moreover, since we're in the process of deprecating the global PDFJS object (see https://github.com/mozilla/pdf.js/blob/master/src/display/global.js#L36), we should find another solution for that here, but I'm not entirely sure what a good approach for this would be.

I notice that there may be some unanswered questions above, but maybe in the meantime you already found some answers. If not, could you compile a list of remaining questions in one comment so it's easier for us to answer them?

For reviewing, it would help to rebase onto the current master and to squash the commits (see https://github.com/mozilla/pdf.js/wiki/Squashing-Commits if you're not familiar with it). Now there are 21 commits and 24 changes files, which makes it hard to get a good overview of the changes since some changes are just there because of merge commits. After rebasing and squashing, there should be no more merge commits and one (or perhaps two if you want to split the testing from the actual type definitions) functional commits. Thank you!

@timvandermeij
Copy link
Contributor

For Gulp, I think the best place is

gulp.task('externaltest', function () {
. The externaltest target already runs some separate test files for code in the external folder, so you can add your tests there too.

@acchou
Copy link
Contributor Author

acchou commented Jan 23, 2018

For the external test, my test requires an npm install to pull in some dependencies. I'm not super familiar with gulp and I'm not sure it's a good idea to blindly do an npm install and npm test. Open to suggestions on that.

The only remaining question I have is how to structure the viewer types. These are currently collected in types/web.d.ts and exported in types/pdf.d.ts. But this means an import of "pdfjs-dist" automatically pulls the viewer's exported types into the PDFJS variable as well (following what's done by https://github.com/mozilla/pdf.js/blob/master/web/pdf_viewer.component.js. Perhaps if PDFJS is going away then there is a more modular approach.

On the topic of removing PDFJS, this should be straightforward to accommodate. Everything is defined in the _pdfjs namespace, and re-exported in pdf.d.ts. To reshape the API just move things around in the latter file to re-export appropriately.

I'll squash the commits shortly.

@acchou
Copy link
Contributor Author

acchou commented Jan 23, 2018

Squashed.

@acchou
Copy link
Contributor Author

acchou commented Jan 23, 2018

Regarding domstubs.ts, I could work on removing it. It will mean there's a dependency between examples/typescript and examples/node. Maybe domstubs.js should live somewhere else? Or we can just accept the sibling dependency.

@acchou
Copy link
Contributor Author

acchou commented Mar 27, 2018

After looking at pdfjs-dist some more, I believe @nicolas-schreiber is right. I modified the module declaration for the viewer to pdfjs-dist/web/pdf_viewer and updated the constructor function type for PDFLinkService, and squashed again.

@acchou
Copy link
Contributor Author

acchou commented Mar 27, 2018

Note that the declaration for the pdfjs-dist/web/pdf_viewer requires use as a module, not as a global pdfjsViewer variable in a script. Given that TypeScript clients are more likely to use pdf.js as a module than a script, I think this is ok.

I did play around with the TypeScript methods of declaring UMD modules but couldn't figure it out. Maybe someone else who's more familiar with it can issue a future PR if that's an important use case for them.

In the meantime, the viewer is accessible as a module.

@SylvesterMachielse
Copy link

Does anyone know the status?

@acchou
Copy link
Contributor Author

acchou commented May 23, 2018

It's largely complete (as of 3/27 - I haven't checked more recent updates), but the lack of feedback from maintainers means it's basically in limbo... Maybe someone from the team could select one of these options:

  • we can't review this now, maybe by XYZ date
  • sorry we're just not interested in adding typescript type declarations to pdf.js
  • we have some issues with the type declaration file and/or its quality, style, or process
  • we'd like to try out flow or some other system
  • adding this is too low priority compared with other things on the roadmap
  • this belongs outside pdf.js proper, perhaps it should go in @types/pdfjs-dist
  • other?

I might submit this to @types/pdfjs-dist instead if there is no further response.

@timvandermeij
Copy link
Contributor

@yurydelendik Could you chime in for the above?

@timvandermeij
Copy link
Contributor

timvandermeij commented Jun 27, 2018

@acchou First of all, sorry for the delay here. We put a lot of effort in getting version 2.0 done for release and spare time was a bit limited for me personally.

I looked at this again and even though I think the type definitions make sense I still have concerns regarding maintenance, mainly because PDF.js itself does not use any TypeScript and keeping this in sync with the code will be challenging. Therefore, to answer your question above, I personally think it's better to place it in @types/pdfjs-dist since that contains type definitions for a lot of other libraries too and it seems that people are used to having their type definitions for non-TypeScript projects there.

@brendandahl
Copy link
Contributor

Sorry for the delay, Yury and I haven't had much time to work on pdf.js lately. We discussed this a bit recently though and we'd like to accept this and then work on removing duplicate comments and letting the typescript definitions be the documentation for the public API. I think we can accept as is, then do the cleanup.

@acchou
Copy link
Contributor Author

acchou commented Jun 28, 2018

Sounds good. Let me know if I can help with any cleanup here.

@FraserKillip
Copy link

Sorry to be that guy.. but any update on the merge for this? The current @types are completely wrong and it makes working with this lib very hard :/

@timvandermeij
Copy link
Contributor

timvandermeij commented Oct 22, 2018

I'm also fine with merging this after the 2.0 stable release, which should happen soon, if it's updated as we have added a few more annotation types in the meantime. Since this doesn't touch any core files, I agree with @brendandahl that even more cleanup can be done at a later stage.

@acchou
Copy link
Contributor Author

acchou commented Oct 22, 2018

If you can point me in the general direction of where these changes are located, I could update the type annotations.

external/types/shared.d.ts Outdated Show resolved Hide resolved
annotationType: AnnotationType.POLYLINE;
vertices: Vertex[];
}
interface PolygonAnnotation extends Annotation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation layer changes are in https://github.com/mozilla/pdf.js/blob/master/src/core/annotation.js (core) and https://github.com/mozilla/pdf.js/blob/master/src/display/annotation_layer.js (display). I think InkAnnotation is missing, but maybe there is more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, see b74c5fd.

@swftvsn
Copy link
Contributor

swftvsn commented Feb 11, 2019

Hi, sorry to be that guy again but, any update on this? @timvandermeij @acchou ?

@timvandermeij
Copy link
Contributor

timvandermeij commented Sep 20, 2019

I'm still a bit on the fence here. Even though this PR looks good, there is an alternative in #10575 that aims to generate the TypeScript files from type annotations in the docstrings. To me this seems much more maintainable since we don't have to keep multiple files in sync. I think the end result would then be a combination of both, where we move the type annotations from this PR into the docstrings. What do you think? I think providing TypeScript definitions is good, but since it isn't our main focus it should really be easy to maintain.

@brendandahl
Copy link
Contributor

I was recently made aware of typescripts --checkJs mode which works off of jsdoc comments. If we can get that working on travis, I think that's the path going forward.

@burtonator
Copy link

May I recommend just moving forward with even a manually updated typescript definition?

This ticket was updated 1y10m ago... This is REALLY needed for external developers because without it the API is very very very opaque.

Even just copying @types/nodejs-dist over and running with that would be good because we can manually merge the types in the mean time as they evolve.

Sending a 1 line PR is a lot easier than trying to build an auto-generating system. I can't imagine the core API changes that much in 1y10m where we couldn't have updated it manually.

@brendandahl
Copy link
Contributor

I wasn't hoping to build an auto generating system. I was thinking we could have all our typescript definition files and then use things like @implements SomeInterface. However, I played around more with checkJs and it's not as powerful as I was hoping, as it does not support interfaces and doesn't look like it ever will.

@tamuratak
Copy link
Contributor

tamuratak commented Oct 11, 2019

If the approach of #10575 is preferred, what prevents #10575 to be merged? To my testing, rebasing the branch of #10575 to the current master is not difficult. Even only merging the comments of JSDoc in that PR seems valuable.

With npx gulp dts, it generates build/dts/types.d.ts.

@wojtekmaj
Copy link
Contributor

If there are no TS experts in PDF.js team I'd suggest to move these typings to e.g. definitely-typed. Otherwise, we will have a hard time maintaining these typings, with not many people willing & able to review the changes when necessary. Also, this would require contributors to know TypeScript whenever any API change is done.

@acchou
Copy link
Contributor Author

acchou commented Dec 27, 2019

Agreed, that seems to be the right thing to do at this point.

One thing before I do that though... has anyone tried to use the new --declaration flag with --allowJs (https://www.typescriptlang.org/docs/handbook/release-notes/overview.html#--declaration-and---allowjs)? This seems to allow the typescript compiler to natively generate declaration files from jsdoc annotated javascript source. This seems to be the preferred approach for the team from PR #10575.

@timvandermeij
Copy link
Contributor

@acchou I don't think anyone tried that yet, but it would be good to look into. I think we've come to the conclusion that manually maintaining the TypeScript bindings here is just not feasible and I agree with @wojtekmaj's analysis here. Let's close this PR and focus our efforts either on getting #10575 landed. We have already merged most of the documentation improvements from that patch in separate PRs, so we can either choose to rebase that PR or make a new PR with the solution where TypeScript natively generates the declaration files from the JSDoc comments. In any case, if we are to provide the bindings, they must be generated in some form to avoid a maintenance burden. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet