Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] PHP84 : Add spec compliance #74

Draft
wants to merge 7 commits into
base: 3.x
Choose a base branch
from
Draft

Conversation

veewee
Copy link
Owner

@veewee veewee commented Feb 21, 2024

Q A
Type improvement
BC Break yes
Fixed issues #72

Summary

A playground for the new spec-compliance RFC:

WIP: Extended DOM

Current stats:

 ...............................................................  63 / 607 ( 10%)
............................................................... 126 / 607 ( 20%)
............................................................... 189 / 607 ( 31%)
............................................................... 252 / 607 ( 41%)
............................................................... 315 / 607 ( 51%)
............................................................... 378 / 607 ( 62%)
............................................................... 441 / 607 ( 72%)
............................................................... 504 / 607 ( 83%)
............................................................... 567 / 607 ( 93%)
........................................                        607 / 607 (100%)

Time: 00:00.183, Memory: 34.43 MB

OK (607 tests, 1047 assertions)

Target PHP version: 8.1 (set by config file) Enabled extensions: dom.
Scanning files...
Analyzing files...

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  60 / 383 (15%)
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 120 / 383 (31%)
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 180 / 383 (46%)
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 240 / 383 (62%)
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 300 / 383 (78%)
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 360 / 383 (93%)
░░░░░░░░░░░░░░░░░░░░░░░
------------------------------

       No errors found!

------------------------------

Checks took 2.46 seconds and used 87.322MB of memory
Psalm was able to infer types for 100% of the codebase

TODO:

  • Upgrade to new DOM spec compliance implementation.
  • Unit tests
  • Psalm
  • Cs-fixer
  • Infection
  • Update documentation for changed functions / added functions.
    • Review : Does the new API makes sense... ?
  • Remove open TODOs
  • Git rid of Assert library
  • Add support for callables on xslt and xpath : https://wiki.php.net/rfc/improve_callbacks_dom_and_xsl

@veewee veewee marked this pull request as draft February 21, 2024 15:08
@veewee veewee force-pushed the php84-spec-compliance branch 2 times, most recently from 13f0140 to 5fbf4d4 Compare February 21, 2024 15:27
@veewee veewee force-pushed the php84-spec-compliance branch 10 times, most recently from c938d8c to 2a0559f Compare February 23, 2024 09:50
src/Xml/Dom/Configurator/pretty_print.php Outdated Show resolved Hide resolved
src/Xml/Dom/Configurator/canonicalize.php Show resolved Hide resolved
src/Xml/Dom/Locator/Attribute/xmlns_attributes_list.php Outdated Show resolved Hide resolved
src/Xml/Dom/Locator/Node/value.php Outdated Show resolved Hide resolved

interface Loader
{
public function __invoke(DOMDocument $document): void;
public function __invoke(): XMLDocument;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO : document new API for loading DOM

@@ -8,6 +8,8 @@
use XSLTProcessor;

/**
* TODO : Add support for callables : https://wiki.php.net/rfc/improve_callbacks_dom_and_xsl (either here or through a separate configurator)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO : Add support

@nielsdos
Copy link

nielsdos commented Mar 3, 2024

Just letting you know that while waiting for the reviews, I spent some time today implementing the missing APIs in a separate branch. (Can be found at nielsdos/php-src#93)
In particular:

  • DOM\Element::$substitutedNodeValue (I might add the same one for DOM\Attr too, not sure yet)
  • DOM\Element::getInScopeNamespaces(): array
  • DOM\Element::rename(?string $namespaceURI, string $qualifiedName): void and also for DOM\Attr.

Fortunately, as expected, these were rather easy to implement.

@veewee
Copy link
Owner Author

veewee commented Mar 4, 2024

Cool! I'm a bit busy this week but will try to play around with it soon :)

@veewee veewee force-pushed the php84-spec-compliance branch 2 times, most recently from 26d0875 to 644dd7d Compare March 10, 2024 08:09
@veewee
Copy link
Owner Author

veewee commented Mar 10, 2024

Just wanted to give you a small update @nielsdos :

  • The new entity substitution works flawlessly
  • Renaming nodes works in general, but I have a feeling there are a few cases not working properly (will come back to this when later, I now covered about half of the test-suite but I need to dive into the specifics so that I can give you more detailed information)
  • Removing namespaces works as described above (one case still failing - need to dive into the specifics as well) - It's getting a bit/log more complex now though.
  • The getInScopeNamespaces works theoretically, but I expect it to work a bit different. I've mentioned in your PR : Extra dom additions 1 nielsdos/php-src#93 (comment) (this results in the optimization code throwing 21 failures right now)

Tests: 608, Assertions: 1022, Errors: 20, Failures: 21.

Getting closer to the end goal :)

@nielsdos
Copy link

Thanks for the update! Nice to see ^^

I'll try to have a look soon-ish at how to deal with the getInScopeNamespaces() discrepancy.

As for the other issues: if you're stuck debugging them, I could always try to have a look at your code with a debug PHP build, if I know what test case to look for :)

@veewee veewee force-pushed the php84-spec-compliance branch 2 times, most recently from dd60a1d to 4628c16 Compare March 14, 2024 07:05
@veewee
Copy link
Owner Author

veewee commented Apr 1, 2024

@nielsdos All unit tests are succeeding, which is nice.

Next up - static analysis:

64 errors found

I'm using the stubs that are included in your PR which are not able to infer specific types in specific situations. What would be the best way forward there? Is it a matter of just fixing the provided stub files? Or should some kind of effort be needed in psalm ad well? How do you see the path forward there?

(I know it' still very early for adding PHP 8.4 support - but I'm just asking to figure out what the next steps would be in that space.)

@nielsdos
Copy link

nielsdos commented Apr 1, 2024

Psalm has stubs built in, which they also further annotate with version information, e.g. https://github.com/vimeo/psalm/blob/5.x/stubs/extensions/dom.phpstub
So when 8.4 support comes they'll have to copy over the new stuff from our own stubs and add the PHP 8.4 version annotation for the new stuff.

Can you share an example of a type inference error?

@veewee
Copy link
Owner Author

veewee commented Apr 2, 2024

Can you share an example of a type inference error?

I'm just gonna paste the full list of errors here so that you can skim around it - but that doesnt mean I expect answers for them ;)
I still need to go through the list and eliminate my own mistakes and will try to narrow down the list by altering the stub file from my end. Maybe it can be shared with spalm when I'm done with it.

Reveal the errrors
ERROR: MissingDependency - src/Xml/Dom/Assert/assert_dom_node_list.php:15:45 - DOM\NodeList depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
function assert_dom_node_list(mixed $node): \DOM\NodeList


ERROR: MoreSpecificReturnType - src/Xml/Dom/Assert/assert_dom_node_list.php:15:45 - The declared return type 'DOM\NodeList' for VeeWee\Xml\Dom\Assert\assert_dom_node_list is more specific than the inferred return type 'object' (see https://psalm.dev/070)
function assert_dom_node_list(mixed $node): \DOM\NodeList


ERROR: LessSpecificReturnStatement - src/Xml/Dom/Assert/assert_dom_node_list.php:17:12 - The type 'object' is more general than the declared return type 'DOM\NodeList' for VeeWee\Xml\Dom\Assert\assert_dom_node_list (see https://psalm.dev/129)
    return instance_of(\DOM\NodeList::class)->assert($node);


ERROR: MissingDependency - src/Xml/Dom/Assert/assert_dom_node_list.php:17:24 - DOM\NodeList depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
    return instance_of(\DOM\NodeList::class)->assert($node);


ERROR: MixedArgument - src/Xml/Dom/Assert/assert_dom_node_list.php:17:24 - Argument 1 of Psl\Type\instance_of cannot be mixed, expecting class-string (see https://psalm.dev/030)
    return instance_of(\DOM\NodeList::class)->assert($node);


ERROR: MissingDependency - src/Xml/Dom/Collection/NodeList.php:63:15 - DOM\HTMLCollection depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
     * @param \DOM\HTMLCollection $list


ERROR: MixedInferredReturnType - src/Xml/Dom/Collection/NodeList.php:64:16 - Could not verify return type 'VeeWee\Xml\Dom\Collection\NodeList<DOM\Element>' for VeeWee\Xml\Dom\Collection\NodeList::fromDOMHTMLCollection (see https://psalm.dev/047)
     * @return NodeList<\DOM\Element>


ERROR: MissingDependency - src/Xml/Dom/Collection/NodeList.php:73:15 - DOM\NodeList depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
     * @param DOMNodeList<X> $list


ERROR: MixedInferredReturnType - src/Xml/Dom/Collection/NodeList.php:74:16 - Could not verify return type 'VeeWee\Xml\Dom\Collection\NodeList<X>' for VeeWee\Xml\Dom\Collection\NodeList::fromDOMNodeList (see https://psalm.dev/047)
     * @return NodeList<X>


ERROR: InvalidClass - src/Xml/Dom/Collection/NodeList.php:196:15 - Class, interface or enum DOMXpath has wrong casing (see https://psalm.dev/007)
     * @param list<callable(DOMXpath): DOMXpath> $configurators


ERROR: InvalidArgument - src/Xml/Dom/Collection/NodeList.php:208:52 - Argument 2 of VeeWee\Xml\Dom\Xpath::fromUnsafeNode expects callable(DOM\XPath):DOM\XPath, but callable(DOMXpath):DOMXpath provided (see https://psalm.dev/004)
                => Xpath::fromUnsafeNode($node, ...$configurators)->query($xpath, $node)


ERROR: InvalidClass - src/Xml/Dom/Collection/NodeList.php:214:15 - Class, interface or enum DOMXpath has wrong casing (see https://psalm.dev/007)
     * @param list<callable(DOMXpath): DOMXpath> $configurators


ERROR: InvalidArgument - src/Xml/Dom/Collection/NodeList.php:222:52 - Argument 2 of VeeWee\Xml\Dom\Xpath::fromUnsafeNode expects callable(DOM\XPath):DOM\XPath, but callable(DOMXpath):DOMXpath provided (see https://psalm.dev/004)
                => Xpath::fromUnsafeNode($node, ...$configurators)->evaluate($expression, $type, $node)


ERROR: InaccessibleProperty - src/Xml/Dom/Configurator/document_uri.php:17:9 - DOM\XMLDocument::$documentURI is marked readonly (see https://psalm.dev/054)
        $document->documentURI = $documentUri;


ERROR: MixedReturnTypeCoercion - src/Xml/Dom/Locator/Attribute/attributes_list.php:14:12 - The declared return type 'VeeWee\Xml\Dom\Collection\NodeList<DOM\Attr>' for VeeWee\Xml\Dom\Locator\Attribute\attributes_list is more specific than the inferred return type 'VeeWee\Xml\Dom\Collection\NodeList<mixed>|VeeWee\Xml\Dom\Collection\NodeList<never>' (see https://psalm.dev/197)
 * @return NodeList<\DOM\Attr>


ERROR: InvalidArgument - src/Xml/Dom/Locator/Attribute/attributes_list.php:22:26 - Argument 1 of Psl\Vec\values expects iterable<mixed, mixed>, but DOM\NamedNodeMap provided (see https://psalm.dev/004)
    $attributes = values($node->attributes);


ERROR: MixedReturnTypeCoercion - src/Xml/Dom/Locator/Attribute/attributes_list.php:24:12 - The type 'VeeWee\Xml\Dom\Collection\NodeList<mixed>' is more general than the declared return type 'VeeWee\Xml\Dom\Collection\NodeList<DOM\Attr>' for VeeWee\Xml\Dom\Locator\Attribute\attributes_list (see https://psalm.dev/197)
    return new NodeList(...$attributes);


ERROR: MixedArgument - src/Xml/Dom/Locator/Attribute/attributes_list.php:24:28 - Argument 1 of VeeWee\Xml\Dom\Collection\NodeList::__construct cannot be mixed, expecting DOM\Node (see https://psalm.dev/030)
    return new NodeList(...$attributes);


ERROR: InvalidArgument - src/Xml/Dom/Locator/Element/children.php:20:9 - Argument 1 of Psl\Vec\filter expects iterable<mixed, mixed>, but DOM\NodeList provided (see https://psalm.dev/004)
        $node->childNodes,


ERROR: MissingDependency - src/Xml/Dom/Locator/Element/siblings.php:20:9 - DOM\NodeList depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
        $node->parentNode?->childNodes?->getIterator() ?? [],


ERROR: MixedArgument - src/Xml/Dom/Locator/Element/siblings.php:20:9 - Argument 1 of Psl\Vec\filter cannot be array<never, never>|mixed, expecting iterable<mixed, mixed> (see https://psalm.dev/030)
        $node->parentNode?->childNodes?->getIterator() ?? [],


ERROR: MoreSpecificReturnType - src/Xml/Dom/Locator/Node/detect_document.php:17:44 - The declared return type 'DOM\XMLDocument' for VeeWee\Xml\Dom\Locator\Node\detect_document is more specific than the inferred return type 'DOM\Document|null' (see https://psalm.dev/070)
function detect_document(\DOM\Node $node): \DOM\XMLDocument


ERROR: LessSpecificReturnStatement - src/Xml/Dom/Locator/Node/detect_document.php:19:12 - The type 'DOM\Document|DOM\XMLDocument|null' is more general than the declared return type 'DOM\XMLDocument' for VeeWee\Xml\Dom\Locator\Node\detect_document (see https://psalm.dev/129)
    return is_document($node) ? $node : $node->ownerDocument;


ERROR: NullableReturnStatement - src/Xml/Dom/Locator/Node/detect_document.php:19:12 - The declared return type 'DOM\XMLDocument' for VeeWee\Xml\Dom\Locator\Node\detect_document is not nullable, but the function returns 'DOM\Document|null' (see https://psalm.dev/139)
    return is_document($node) ? $node : $node->ownerDocument;


ERROR: PossiblyNullPropertyFetch - src/Xml/Dom/Locator/Xsd/locate_namespaced_xsd_schemas.php:20:19 - Cannot get property on possibly null variable $document->documentElement of type DOM\Element|null (see https://psalm.dev/082)
    $attributes = $document->documentElement->attributes;


ERROR: MixedAssignment - src/Xml/Dom/Locator/Xsd/locate_namespaced_xsd_schemas.php:23:10 - Unable to determine the type that $schemaLocation is being assigned to (see https://psalm.dev/032)
    if (!$schemaLocation = $attributes->getNamedItemNS($schemaNs, 'schemaLocation')) {


ERROR: MissingDependency - src/Xml/Dom/Locator/Xsd/locate_namespaced_xsd_schemas.php:23:28 - DOM\NamedNodeMap depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
    if (!$schemaLocation = $attributes->getNamedItemNS($schemaNs, 'schemaLocation')) {


ERROR: PossiblyNullReference - src/Xml/Dom/Locator/Xsd/locate_namespaced_xsd_schemas.php:23:41 - Cannot call method getNamedItemNS on possibly null value (see https://psalm.dev/083)
    if (!$schemaLocation = $attributes->getNamedItemNS($schemaNs, 'schemaLocation')) {


ERROR: MixedPropertyFetch - src/Xml/Dom/Locator/Xsd/locate_namespaced_xsd_schemas.php:28:25 - Cannot fetch property on mixed var $schemaLocation (see https://psalm.dev/034)
    $parts = split(trim($schemaLocation->textContent), '/\s+/');


ERROR: MixedArgument - src/Xml/Dom/Locator/Xsd/locate_namespaced_xsd_schemas.php:28:25 - Argument 1 of trim cannot be mixed, expecting string (see https://psalm.dev/030)
    $parts = split(trim($schemaLocation->textContent), '/\s+/');


ERROR: PossiblyNullPropertyFetch - src/Xml/Dom/Locator/Xsd/locate_no_namespaced_xsd_schemas.php:21:19 - Cannot get property on possibly null variable $document->documentElement of type DOM\Element|null (see https://psalm.dev/082)
    $attributes = $document->documentElement->attributes;


ERROR: MixedAssignment - src/Xml/Dom/Locator/Xsd/locate_no_namespaced_xsd_schemas.php:22:10 - Unable to determine the type that $schemaLocNoNamespace is being assigned to (see https://psalm.dev/032)
    if (!$schemaLocNoNamespace = $attributes->getNamedItemNS($schemaNs, 'noNamespaceSchemaLocation')) {


ERROR: MissingDependency - src/Xml/Dom/Locator/Xsd/locate_no_namespaced_xsd_schemas.php:22:34 - DOM\NamedNodeMap depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
    if (!$schemaLocNoNamespace = $attributes->getNamedItemNS($schemaNs, 'noNamespaceSchemaLocation')) {


ERROR: PossiblyNullReference - src/Xml/Dom/Locator/Xsd/locate_no_namespaced_xsd_schemas.php:22:47 - Cannot call method getNamedItemNS on possibly null value (see https://psalm.dev/083)
    if (!$schemaLocNoNamespace = $attributes->getNamedItemNS($schemaNs, 'noNamespaceSchemaLocation')) {


ERROR: MixedPropertyFetch - src/Xml/Dom/Locator/Xsd/locate_no_namespaced_xsd_schemas.php:27:25 - Cannot fetch property on mixed var $schemaLocNoNamespace (see https://psalm.dev/034)
    $parts = split(trim($schemaLocNoNamespace->textContent), '/\s+/');


ERROR: MixedArgument - src/Xml/Dom/Locator/Xsd/locate_no_namespaced_xsd_schemas.php:27:25 - Argument 1 of trim cannot be mixed, expecting string (see https://psalm.dev/030)
    $parts = split(trim($schemaLocNoNamespace->textContent), '/\s+/');


ERROR: InvalidNullableReturnType - src/Xml/Dom/Locator/document_element.php:16:52 - The declared return type 'DOM\Element' for /users/verweto/projects/github/veewee/xml/src/xml/dom/locator/document_element.php:16:231:-:closure is not nullable, but 'DOM\Element|null' contains null (see https://psalm.dev/144)
    return static fn (\DOM\XMLDocument $document): \DOM\Element => $document->documentElement;


ERROR: NullableReturnStatement - src/Xml/Dom/Locator/document_element.php:16:68 - The declared return type 'DOM\Element' for /users/verweto/projects/github/veewee/xml/src/xml/dom/locator/document_element.php:16:231:-:closure is not nullable, but the function returns 'DOM\Element|null' (see https://psalm.dev/139)
    return static fn (\DOM\XMLDocument $document): \DOM\Element => $document->documentElement;


ERROR: PossiblyNullArgument - src/Xml/Dom/Locator/elements_with_namespaced_tagname.php:23:46 - Argument 1 of VeeWee\Xml\Dom\Locator\Element\locate_by_namespaced_tag_name cannot be null, possibly null value provided (see https://psalm.dev/078)
            => locate_by_namespaced_tag_name($document->documentElement, $namespace, $localTagName);


ERROR: PossiblyNullArgument - src/Xml/Dom/Locator/elements_with_tagname.php:23:35 - Argument 1 of VeeWee\Xml\Dom\Locator\Element\locate_by_tag_name cannot be null, possibly null value provided (see https://psalm.dev/078)
            => locate_by_tag_name($document->documentElement, $tagName);


ERROR: PossiblyNullPropertyFetch - src/Xml/Dom/Locator/root_namespace.php:15:63 - Cannot get property on possibly null variable $document->documentElement of type DOM\Element|null (see https://psalm.dev/082)
    return static fn (\DOM\XMLDocument $document): ?string => $document->documentElement->namespaceURI;


ERROR: MixedInferredReturnType - src/Xml/Dom/Manipulator/Attribute/rename.php:19:42 - Could not verify return type 'DOM\Attr' for /users/verweto/projects/github/veewee/xml/src/xml/dom/manipulator/attribute/rename.php:19:522:-:closure (see https://psalm.dev/047)
    return disallow_issues(static fn (): \DOM\Attr => match(true) {


ERROR: MixedReturnStatement - src/Xml/Dom/Manipulator/Attribute/rename.php:19:55 - Possibly-mixed return value (see https://psalm.dev/138)
    return disallow_issues(static fn (): \DOM\Attr => match(true) {
        is_xmlns_attribute($target) => rename_xmlns_attribute($target, $newQName),
        default => tap(fn () => $target->rename($newNamespaceURI, $newQName))($target)
    });


ERROR: MixedReturnStatement - src/Xml/Dom/Manipulator/Attribute/rename.php:19:55 - Could not infer a return type (see https://psalm.dev/138)
    return disallow_issues(static fn (): \DOM\Attr => match(true) {
        is_xmlns_attribute($target) => rename_xmlns_attribute($target, $newQName),
        default => tap(fn () => $target->rename($newNamespaceURI, $newQName))($target)
    });


ERROR: PossiblyNullArgument - src/Xml/Dom/Manipulator/Document/optimize_namespaces.php:23:37 - Argument 1 of VeeWee\Xml\Dom\Locator\Xmlns\recursive_linked_namespaces cannot be null, possibly null value provided (see https://psalm.dev/078)
        recursive_linked_namespaces($documentElement),


ERROR: InvalidNullableReturnType - src/Xml/Dom/Manipulator/Document/optimize_namespaces.php:24:47 - The declared return type 'string' for /users/verweto/projects/github/veewee/xml/src/xml/dom/manipulator/document/optimize_namespaces.php:24:711:-:closure is not nullable, but 'null|string' contains null (see https://psalm.dev/144)
        static fn (\DOM\NamespaceInfo $info): string => $info->namespaceURI


ERROR: NullableReturnStatement - src/Xml/Dom/Manipulator/Document/optimize_namespaces.php:24:57 - The declared return type 'string' for /users/verweto/projects/github/veewee/xml/src/xml/dom/manipulator/document/optimize_namespaces.php:24:711:-:closure is not nullable, but the function returns 'null|string' (see https://psalm.dev/139)
        static fn (\DOM\NamespaceInfo $info): string => $info->namespaceURI


ERROR: RiskyTruthyFalsyComparison - src/Xml/Dom/Manipulator/Element/copy_named_xmlns_attributes.php:17:13 - Operand of type null|string contains type string, which can be falsy and truthy. This can cause possibly unexpected behavior. Use strict comparison instead. (see https://psalm.dev/356)
        if ($xmlns->prefix && !$target->hasAttribute($xmlns->nodeName)) {


ERROR: PossiblyNullOperand - src/Xml/Dom/Manipulator/Element/rename.php:27:88 - Cannot concatenate with a possibly null null|string (see https://psalm.dev/080)
    if ($newPrefix && $newPrefix !== $target->prefix && $target->hasAttribute('xmlns:'.$target->prefix)) {


ERROR: PossiblyNullOperand - src/Xml/Dom/Manipulator/Element/rename.php:28:43 - Cannot concatenate with a possibly null null|string (see https://psalm.dev/080)
        $target->removeAttribute('xmlns:'.$target->prefix);


ERROR: InvalidPropertyAssignmentValue - src/Xml/Dom/Traverser/Visitor/RemoveNamespaces.php:26:25 - $this->filter with declared type 'callable(DOM\Attr|DOM\Element):bool|null' cannot be assigned type 'callable(DOM\Attr):bool|null' (see https://psalm.dev/145)
        $this->filter = $filter;


ERROR: UndefinedPropertyFetch - src/Xml/Dom/Traverser/Visitor/RemoveNamespaces.php:84:38 - Instance property DOM\Node::$localName is not defined (see https://psalm.dev/039)
        return new Action\RenameNode($node->localName, null);


ERROR: MixedArgument - src/Xml/Dom/Traverser/Visitor/RemoveNamespaces.php:84:38 - Argument 1 of VeeWee\Xml\Dom\Traverser\Action\RenameNode::__construct cannot be mixed, expecting string (see https://psalm.dev/030)
        return new Action\RenameNode($node->localName, null);


ERROR: MissingDependency - src/Xml/Dom/Xpath/Locator/query.php:24:27 - DOM\NodeList depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
            static fn (): DOMNodeList => assert_dom_node_list(


ERROR: MixedInferredReturnType - src/Xml/Dom/Xpath/Locator/query_single.php:28:66 - Could not verify return type 'DOM\Node' for /users/verweto/projects/github/veewee/xml/src/xml/dom/xpath/locator/query_single.php:28:695:-:closure (see https://psalm.dev/047)
        static function (\DOM\XPath $xpath) use ($query, $node): \DOM\Node {


ERROR: MissingDependency - src/Xml/Dom/Xpath/Locator/query_single.php:31:31 - DOM\NodeList depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
                static fn (): \DOM\NodeList => assert_dom_node_list(


ERROR: InvalidArgument - src/Xml/Dom/Xpath/Locator/query_single.php:40:17 - Argument 1 of Webmozart\Assert\Assert::count expects Countable|array<array-key, mixed>, but DOM\NodeList provided (see https://psalm.dev/004)
                $list,


ERROR: InvalidArgument - src/Xml/Dom/Xpath/Locator/query_single.php:42:96 - Argument 1 of count expects Countable|array<array-key, mixed>, but DOM\NodeList provided (see https://psalm.dev/004)
                format('Expected to find only one node that matches %s. Got %s', $query, count($list))


ERROR: MissingDependency - src/Xml/Dom/Xpath/Locator/query_single.php:45:20 - DOM\NodeList depends on class or interface dom\iteratoraggregate that does not exist (see https://psalm.dev/157)
            return $list->item(0);


ERROR: MixedReturnStatement - src/Xml/Dom/Xpath/Locator/query_single.php:45:20 - Could not infer a return type (see https://psalm.dev/138)
            return $list->item(0);


ERROR: InvalidReturnType - src/Xml/Encoding/Internal/Decoder/Builder/element.php:14:12 - The declared return type 'array<string, array<array-key, mixed>|string>' for VeeWee\Xml\Encoding\Internal\Decoder\Builder\element is incorrect, got 'non-empty-array<string, array<array-key, mixed>|null|string>' (see https://psalm.dev/011)
 * @return array<string, string|array>


ERROR: InvalidReturnStatement - src/Xml/Encoding/Internal/Decoder/Builder/element.php:25:16 - The inferred type 'non-empty-array<string, null|string>' does not match the declared return type 'array<string, array<array-key, mixed>|string>' for VeeWee\Xml\Encoding\Internal\Decoder\Builder\element (see https://psalm.dev/128)
        return [$name => $element->textContent];


ERROR: RiskyTruthyFalsyComparison - src/Xml/Encoding/Internal/Decoder/Builder/namespaces.php:25:26 - Operand of type null|string contains type string, which can be falsy and truthy. This can cause possibly unexpected behavior. Use strict comparison instead. (see https://psalm.dev/356)
                        ($node->prefix ? $node->localName : '') => $node->value


ERROR: InvalidArgument - src/Xml/Xslt/Transformer/document_to_string.php:22:44 - Argument 1 of XSLTProcessor::transformToXml expects DOMDocument, but DOM\XMLDocument provided (see https://psalm.dev/004)
                $processor->transformToXML($document->toUnsafeDocument()),

@nielsdos
Copy link

nielsdos commented Apr 2, 2024

At least some errors are legit, e.g. Class, interface or enum DOMXpath has wrong casing (see https://psalm.dev/007)

Some others, e.g. Instance property DOM\Node::$localName is not defined (see https://psalm.dev/039), are because localName is now only defined on DOM\Attr and DOM\Element.

Some others are about nullability.

So all in all, quite a few seem to be because of type changes.

DOM\HTMLCollection depends on class or interface dom\iteratoraggregate

This one is a "mistake" in the stub and you can fix it by copying the stub from the master branch. This was found already by one of the Symfony developers and fixed in master. What's going on here is that I didn't put a backslash before IteratorAggregate so it first searches in the DOM namespaces and only after that fails in the global namespace. I put "mistake" in quotes because technically it works due to how namespace lookup works in PHP, but looks like psalm is very strict.

@veewee
Copy link
Owner Author

veewee commented Apr 16, 2024

Stubs issues:

@nielsdos I was fixing some psalm issues and was bumping against following issues:

  • ✅ Collections don't contain more detailed iterable<Tk, Tv> information : manually adding to the stub files from my side. This should probably be ported to psalm.

  • ❌ Document::documentURI:

ERROR: InaccessibleProperty - src/Xml/Dom/Configurator/document_uri.php:17:9 - DOM\XMLDocument::$documentURI is marked readonly (see https://psalm.dev/054)
        $document->documentURI = $documentUri;

documentURI is marked as readonly in the stubs but is not readonly:
See tests/Xml/Dom/Configurator/DocumentUriTest.php. This feature is being used for inline XSD schema parsing in WSDLs from my end and I suppose should be left writeable?

  • ❌ Node::$ownerDocument is null|Document? I would expect it to never be null? What is the case for null here?
  • ❌ Document::$documentElement is null|Document? I would expect it to never be null? What is the case for null here?

If those are resolved, I think the rest of them are located inside my repository.
However the 3 items marked with a cross seem invalid to me.

Can you check these?

@nielsdos
Copy link

✅ Collections don't contain more detailed iterable<Tk, Tv> information : manually adding to the stub files from my side. This should probably be ported to psalm.

Yeah this information should be put into psalm eventually.

documentURI is marked as readonly in the stubs but is not readonly:

Fixed in php/php-src#13982, good catch

❌ Node::$ownerDocument is null|Document? I would expect it to never be null? What is the case for null here?

This is, unfortunately, the right type. $ownerDocument is only NULL for document nodes, and the owner document for other nodes. See spec: https://dom.spec.whatwg.org/#dom-node-ownerdocument

❌ Document::$documentElement is null|Document? I would expect it to never be null? What is the case for null here?

This is also unfortunately possible. I.e.: https://3v4l.org/bsF7g/rfc#vgit.master

@nielsdos
Copy link

In other news, I've been wrapping up the implementation work for the remaining features I wanted to add to ext-dom in PHP 8.4.

One thing I'm having some difficulties with is the API design of Element::rename function proposed earlier.

The problem is as follows:
In one of the feature additions I'm proposing for 8.4, I'm adding Document::$body. This has the type HTMLElement. HTMLElement is a class that extends Element, and the idea here is that if an element is in the HTML namespace, getting an element should return an instance of HTMLElement instead of Element. (For example if firstElementChild is in the HTML namespace, it returns an HTMLElement)
This clashes with Element::rename. Let's say we start with an element in the namespace "urn:x" and we use Element::rename to change the namespace to the HTML namespace. Suddenly our Element instance should actually be an HTMLElement object, but that's not possible because we can't change the object type of an instance. Similarly, the opposite problems happens when we go from the HTML namespace to a non-HTML namespace.

I haven't been able to come up with a good solution. Here are some things I thought about:

  • Ignore the problem. The easy solution, but might also be weird because suddenly you can't assign it to properties that have the HTMLElement type, even if you changed the element to the HTML namespace.
  • Return a new instance from rename(). This would work but is kind of awkward for developers I think. Also when the element is already stored somewhere this is a recipe for disaster as the existing instance of the element is stale.
  • Leave rename() out. Also an easy solution but would be annoying for developers...

I'm not sure how to go about it. Maybe you have some ideas.

@veewee
Copy link
Owner Author

veewee commented Apr 17, 2024

@nielsdos

If I understand the problem correctly, then it's about some kind of transition from HTMLElement to regular Element and the other way around?

I'm a bit confused by the sentence:

and the idea here is that if an element is in the HTML namespace, getting an element should return an instance of HTMLElement instead of Element.

This means that wether it returns a HTMLElement is based on a namespace URI like http://www.w3.org/1999/xhtml?

There are some ways to look at this:

  • Would renaming an HTML element be something someone wants to do in PHP context? I guess it will probably be a rather limited approach. Renaming the namespace there would most likely be many bridges too far.
  • Renaming an XML element into a HTML element would most likely also be an operation that would hardly ever happen. I can imagine a scenario where you want to add something that looks like HTML to XML, but it won't really be HTML compliant I suppose.

Let's take a look at the provided solutions:

Ignore the problem. The easy solution, but might also be weird because suddenly you can't assign it to properties that have the HTMLElement type, even if you changed the element to the HTML namespace.

I'm not sure what properties you are talking about here - but I would say that switching type of node is not something you want to do here. So to me this is a no-go.

Return a new instance from rename(). This would work but is kind of awkward for developers I think. Also when the element is already stored somewhere this is a recipe for disaster as the existing instance of the element is stale.

This is indeed not a good idea either. However I do like the idea of making it return the node you are renaming so that it can be chained. That's another discussion I suppose.

Leave rename() out. Also an easy solution but would be annoying for developers...

You mean leave it out of the HTMLElement class? This might be confusing since you are extending the regular Element and it would be there and throw an exception. It is still needed in regular Element though for various operations you can find in this repository, so not having it would be kinda blocking.

Maybe some other ideas:

  • Add a hard block on renaming cross node-types by throwing an exception? It can be done by checking the source and target namespace I suppose. Since this is an operation that wouldn't or shouldnt happen in many scenarios, it might be a sensible thing to do?

  • Limiting the rename() method on the HTMLElement so that you cannot change namespaces. This would still lead to the issue from XML to HTML namespace. I'm not sure this would have a lot of impact, cause it inherits from a regular element.

In other news, I've been wrapping up the implementation work for the remaining features I wanted to add to ext-dom in PHP 8.4.

Cool, great work. I'm sure your work here will be viable in the long run!

I do have some last thing I am thinking on and off about for some while now, maybe I'dd just ask here:

Why do both XMLReader and XMLWriter don't support a regular resource stream (like fopen) as input / output target? It's kind off annoying that you need to e.g. write an xml resource stream to a file first before opening it or write to a file first and then do something with it. Would that be something that would be easy to add to e.g. PHP 8.4? (Maybe disputable since they are trying to move away from resources for the past years)

To be clear, I don't want to add more work for you to the table since you are already doing so much. Let me know if I can help with anything.

@veewee
Copy link
Owner Author

veewee commented Apr 17, 2024

@nielsdos Thanks for the review on the static analysis part. I was able to get it all covered now.

@nielsdos
Copy link

nielsdos commented Apr 17, 2024

@nielsdos Thanks for the review on the static analysis part. I was able to get it all covered now.

Awesome! 🎉

If I understand the problem correctly, then it's about some kind of transition from HTMLElement to regular Element and the other way around?

Indeed

This means that wether it returns a HTMLElement is based on a namespace URI like http://www.w3.org/1999/xhtml?

Yes

There are some ways to look at this:

Would renaming an HTML element be something someone wants to do in PHP context? I guess it will probably be a rather limited approach. Renaming the namespace there would most likely be many bridges too far.
Renaming an XML element into a HTML element would most likely also be an operation that would hardly ever happen. I can imagine a scenario where you want to add something that looks like HTML to XML, but it won't really be HTML compliant I suppose.

You are right about both these things: these are strange things to do. However, users have always managed to surprise me :-)

I agree with your conclusions about the provided workarounds.

Leave rename() out. Also an easy solution but would be annoying for developers...

You mean leave it out of the HTMLElement class? This might be confusing since you are extending the regular Element and it would be there and throw an exception. It is still needed in regular Element though for various operations you can find in this repository, so not having it would be kinda blocking.

Yes, leave it out for HTMLElement for example. I think you're right this is strange though.
Or leave the rename method out entirely... What also crossed my mind is that rename is also incompatible if we ever want to make something like the registerNodeClass function such that you could register custom Element classes (e.g. being able to do registerElementClass('xhtml namespace', 'body', MyBodyElement::class).

Maybe some other ideas:
Add a hard block on renaming cross node-types by throwing an exception? It can be done by checking the source and target namespace I suppose. Since this is an operation that wouldn't or shouldnt happen in many scenarios, it might be a sensible thing to do?
Limiting the rename() method on the HTMLElement so that you cannot change namespaces. This would still lead to the issue from XML to HTML namespace. I'm not sure this would have a lot of impact, cause it inherits from a regular element.

These seem like sensible solutions, I'm not sure though on how well this generalises and how well this would work for potential future DOM APIs (e.g. the registerElementClass example from earlier).


Why do both XMLReader and XMLWriter don't support a regular resource stream (like fopen) as input / output target?

I don't know, ask my predecessors :-) It should not be hard to add though. I found two feature requests about this: https://bugs.php.net/bug.php?id=46146 and https://bugs.php.net/bug.php?id=63506

It's kind off annoying that you need to e.g. write an xml resource stream to a file first before opening it or write to a file first and then do something with it. Would that be something that would be easy to add to e.g. PHP 8.4?

I easily see the use case for XMLWriter, it's often the case that XML is embedded into something, and with XMLWriter being stream-oriented it makes perfect sense to have an openStream method. I'll look into this, stay tuned ;)

As for XMLReader, I find it harder to see the use case. You can use XMLReader::open and read from any uri (even from custom stream wrappers). Can you give me a concrete use case for XMLReader? A concrete use case can also help if I need to draft an RFC text.

(Maybe disputable since they are trying to move away from resources for the past years)

Well, using existing resource types is fine, introducing new resource types is not. The stream resource will stay with us for some time anyway.

@veewee
Copy link
Owner Author

veewee commented Apr 17, 2024

Yes, leave it out for HTMLElement for example. I think you're right this is strange though. Or leave the rename method out entirely... What also crossed my mind is that rename is also incompatible if we ever want to make something like the registerNodeClass function such that you could register custom Element classes (e.g. being able to do registerElementClass('xhtml namespace', 'body', MyBodyElement::class).

What would be the benifit of having a registerElementClass? As-in : what does it solve?
It makes things complexer and I can't think of a good reason to do so.

There are 2-3 ways to go in this case :

  • Either accept that the node-type can change when performing these operations based on the XMl / HTML rules or the customly registered classes.
  • Or Don't allow cross-node renames (which is basically what I proposed earlier)
  • Don't to renaming at all (which is kind off a blocker)

I easily see the use case for XMLWriter, it's often the case that XML is embedded into something, and with XMLWriter being stream-oriented it makes perfect sense to have an openStream method. I'll look into this, stay tuned ;)

Nice :)

As for XMLReader, I find it harder to see the use case. You can use XMLReader::open and read from any uri (even from custom stream wrappers). Can you give me a concrete use case for XMLReader? A concrete use case can also help if I need to draft an RFC text.

For example XML responses in HTTP Responses:
An HTTP response you receive over an already opened up resource stream contains XML.
Now you first need to write this stream to a file before you can use XMLReade from a file.
Afterwards you need to make sure to remove that file.
Those are some steps too far to me, not to mention the I/O overhead.

Even in PSR-7, you could use a StreamWrapper to make sure you don't have to write to a file before reading this stream.

It would be nice if one could just XMLReader::openStream($httpResponseStream);

@nielsdos
Copy link

Yes, leave it out for HTMLElement for example. I think you're right this is strange though. Or leave the rename method out entirely... What also crossed my mind is that rename is also incompatible if we ever want to make something like the registerNodeClass function such that you could register custom Element classes (e.g. being able to do registerElementClass('xhtml namespace', 'body', MyBodyElement::class).

What would be the benifit of having a registerElementClass? As-in : what does it solve? It makes things complexer and I can't think of a good reason to do so.

I was thinking about hypothetical future additions that could cause problems. registerElementClass was once suggested on the mailing list so that's why I brought it up. But you may be right that this is thinking about problems that don't exist.
I'm just a bit scared of implementing APIs that will bite me in my ass later on when implementing other features.

There are 2-3 ways to go in this case :

* Either accept that the node-type can change when performing these operations based on the XMl / HTML rules or the customly registered classes.

* Or Don't allow cross-node renames (which is basically what I proposed earlier)

* Don't to renaming at all (which is kind off a blocker)

The first two bullet points are the most sensible in my opinion. I'll have another think about it. I'll also discuss this internally a bit more.
I don't want to get rid of renaming because it's a handy feature.

I easily see the use case for XMLWriter, it's often the case that XML is embedded into something, and with XMLWriter being stream-oriented it makes perfect sense to have an openStream method. I'll look into this, stay tuned ;)

Nice :)

Please see nielsdos/php-src#106
This implements a new method XMLWriter::openStream(resource $stream). It takes an open stream and writes its data to it.

When implementing this I dug into the libxml APIs and found that the underlying API allows setting a character encoding. So it's possible to extend the function signature to this: XMLWriter::openStream(resource $stream, ?string $encoding) where you could set a particular character encoding via the optional $encoding argument. I think it's sensible to add that encoding argument, and maybe even for the other 2 open methods (i.e. openUri and openMemory) too? Please let me know what you think.

As for XMLReader, I find it harder to see the use case. You can use XMLReader::open and read from any uri (even from custom stream wrappers). Can you give me a concrete use case for XMLReader? A concrete use case can also help if I need to draft an RFC text.

For example XML responses in HTTP Responses: An HTTP response you receive over an already opened up resource stream contains XML. Now you first need to write this stream to a file before you can use XMLReade from a file. Afterwards you need to make sure to remove that file. Those are some steps too far to me, not to mention the I/O overhead.

Even in PSR-7, you could use a StreamWrapper to make sure you don't have to write to a file before reading this stream.

It would be nice if one could just XMLReader::openStream($httpResponseStream);

Yeah okay that makes perfect sense, thanks.
I'd model it after https://www.php.net/manual/en/xmlreader.open.php though, so XMLReader::openStream(resource $stream, ?string $encoding = null, int $flags = 0). Let me know what you think.

@veewee
Copy link
Owner Author

veewee commented Apr 18, 2024

Please see nielsdos/php-src#106 This implements a new method XMLWriter::openStream(resource $stream). It takes an open stream and writes its data to it.

Wow that was fast 😮

When implementing this I dug into the libxml APIs and found that the underlying API allows setting a character encoding. So it's possible to extend the function signature to this: XMLWriter::openStream(resource $stream, ?string $encoding) where you could set a particular character encoding via the optional $encoding argument. I think it's sensible to add that encoding argument, and maybe even for the other 2 open methods (i.e. openUri and openMemory) too? Please let me know what you think.

Sure makes sense to add it. There is no option to do so at this moment, so makes sense!

Yeah okay that makes perfect sense, thanks. I'd model it after https://www.php.net/manual/en/xmlreader.open.php though, so XMLReader::openStream(resource $stream, ?string $encoding = null, int $flags = 0). Let me know what you think.

That would indeed be a good API. (The false return on failure is not the best design, but at least in line with what there is today.)

I notice from my library, I did not implement encoding and flags - so might need to do that here as well. Maybe I'll also create some kind of libxml flag builder someday, cause those are confusing me everytime I look at them :)

@nielsdos
Copy link

FWIW, after discussion internally about rename we decided the best option would be to disallow cross-node renames, which is what you suggested earlier. I.e. if (namespaceURI is the HTML namespace || node is in the HTML namespace) { the namespace must be the HTML namespace for both namespaceURI and the node, otherwise we throw }

For XMLReader & XMLWriter, I can do a short RFC for this. I'll work on adding the encoding parameter too, it makes me wonder what the encoding is it uses right now actually or if it does some kind of autodetect... Of course the libxml docs aren't really helping here...

Maybe I'll also create some kind of libxml flag builder someday, cause those are confusing me everytime I look at them :)

The options aren't always amazingly named. E.g. LIBXML_NOENT is my "favourite" one because people often think it stands for "NO ENtity substitution", but it does the opposite :-(

@nielsdos
Copy link

FWIW, I made a draft RFC for the XMLReader&Writer openStream function: https://wiki.php.net/rfc/xmlreader_writer_streams. I'm sharing this with a few people and if nothing comes up I'll post it to the mailing list tomorrow evening or on Tuesday or so, I'll see.
In the end I decided to keep the signatures for XMLWriter the same and not add the $encoding argument to openStream() because XMLWriter::startDocument() already has an encoding argument and it is handled nicely there.

@veewee
Copy link
Owner Author

veewee commented Apr 21, 2024

@nielsdos nice! Let's hope for a smooth acceptance path then :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants