Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INWX: support changing from null MX to non-null MX #2800

Open
synotna opened this issue Jan 17, 2024 · 17 comments
Open

INWX: support changing from null MX to non-null MX #2800

synotna opened this issue Jan 17, 2024 · 17 comments

Comments

@synotna
Copy link

synotna commented Jan 17, 2024

Describe the bug
With INWX, when changing from null MX to non-null MX records, the ordering matters because INWX enforces that you cannot add further MX records while you have a null MX record

dnscontrol does not seem to handle this, leading to try to add non-null MX records before the null MX record is modified

To Reproduce
Steps to reproduce the behavior:

  1. Have an INWX domain that has null MX
  2. Change to having non-null MX records e.g. the standard set for Google
  3. The additions are attempted before the modification of the existing null MX record, causing them to fail

#1: + CREATE MX example.org 10 alt3.aspmx.l.google.com. ttl=3600
FAILURE! (2308) Data management policy violation
#2: + CREATE MX example.org 10 alt4.aspmx.l.google.com. ttl=3600
FAILURE! (2308) Data management policy violation
#3: + CREATE MX example.org 5 alt1.aspmx.l.google.com. ttl=3600
FAILURE! (2308) Data management policy violation
#4: + CREATE MX example.org 5 alt2.aspmx.l.google.com. ttl=3600
FAILURE! (2308) Data management policy violation
...
#6: ± MODIFY MX example.org: (0 . ttl=3600) -> (1 aspmx.l.google.com. ttl=3600)
SUCCESS!

Expected behavior
The existing null MX record should be modified first so that the subsequent additions do not fail

DNS Provider
INWX

@tlimoncelli
Copy link
Contributor

CC @blackshadev (author of dnsgraph, which decides the ordering of updates)
CC @patschi (INWX maintainer)

@blackshadev
Copy link
Contributor

I am interested to know if any other provider has the same issue. I do not (yet) have any exceptions in the ordering flow for any provider. And it seems weird to me to make an exception since technically you can depent on empty records and the diff shown below should work in most systems.

I don't know of the INWX maintainer (@patschi) agrees, but it seems INWX specific and thus is should be best handled as an exception in the provider. My suggestion would be to detect the case in the provider : modify-ing an empty MX record. And treat is as a Delete + Create. The TransIP also have some weird edgecases were I ended up deleting and recreating the record instead.

@tlimoncelli
Copy link
Contributor

On one hand....it does seem prudent that an NullMX should have a different weight than regular MX.

On the other hand... how often will this happen? Is there a sufficient workaround? (The workaround would be to delete all MX records in one push then add the Null MX in a second push).

@tlimoncelli
Copy link
Contributor

tlimoncelli commented Jan 17, 2024

I'd like to see if its possible to create an integration test that detects this situation.

I created this: #2801

Could someone with an INWX account try running the integration tests from that branch?

Tom

P.S. The providers I do have creds for all passed the test: https://github.com/StackExchange/dnscontrol/actions/runs/7559961364/job/20584943571?pr=2801

@patschi
Copy link
Contributor

patschi commented Jan 17, 2024

I'd like to see if its possible to create an integration test that detects this situation.

Seems to work (the test itself at least):

$ go test -v -verbose -timeout 0 -provider INWX | grep -i fail
--- FAIL: TestDNSProviders (130.29s)
    --- FAIL: TestDNSProviders/dnscontrol1.com (130.09s)
        --- FAIL: TestDNSProviders/dnscontrol1.com/16:NullMX:unnull (0.12s)
        --- FAIL: TestDNSProviders/dnscontrol1.com/17:NullMXApex:unnull (1.27s)
        --- FAIL: TestDNSProviders/dnscontrol1.com/19:complex_TXT:TXT_with_1_dq-1interior (0.27s)
        --- FAIL: TestDNSProviders/dnscontrol1.com/20:TXT_backslashes:TXT_with_backslashs (0.41s)
FAIL
FAIL    github.com/StackExchange/dnscontrol/v4/integrationTest  130.876s

TRANSIP seems to fail the test as well: https://github.com/StackExchange/dnscontrol/actions/runs/7559961364/job/20585007084?pr=2801

If we prefer having some sort of workaround in INWX provider itself, I can try to work something out.

@tlimoncelli Is the CI running basically the integrationTest just like when running them locally? I could then create a new account with INWX with one test domain and add it to the CI pipeline. So that we have this covered as well.

@tlimoncelli
Copy link
Contributor

@patschi Yes, the CI is running the same integrationTest. If you'd like to send me creds, please follow these instructions: https://docs.dnscontrol.org/developer-info/byo-secrets (this is a new doc; if you see something that needs correcting please let me know!)

@tlimoncelli
Copy link
Contributor

Since TRANSIP also fails the test (link) I lean towards fixing this in dnsgraph. If 2 have this issue, that's a trend.

I'm concerned (but not certain!) that fixing it at the dnsgraph level might not help INWX. INWX uses diff.NewCompat(dc).IncrementalDiff(foundRecords) which reorders the changes randomly. Would you be interested in converting to diff2? Looking at the code... the conversion looks relatively straightforward.

@patschi
Copy link
Contributor

patschi commented Jan 17, 2024

Looks like almost all providers seem to use the same?
https://github.com/search?q=repo%3AStackExchange%2Fdnscontrol%20%22diff.NewCompat(dc).IncrementalDiff%22&type=code

Shouldn't all providers then be adjusted to the new diff2 code to be consistent? (Was looking for some example code how I can convert it)

@tlimoncelli
Copy link
Contributor

@blackshadev hmm... this is interesting!

The BIND provider outputs this message:

https://github.com/StackExchange/dnscontrol/actions/runs/7559961364/job/20585004068?pr=2801

PASS integrationTest.TestDNSProviders/example.com/17:NullMXApex:renull (0.00s)
  2024/01/17 18:12:10 Warning: Found unresolved records [example.com example.com www.example.com].
  This can indicate a circular dependency, please ensure all targets from given records exist and no circular dependencies exist in the changeset. These unresolved records are still added as changes and pushed to the provider, but will cause issues if and when the provider checks the changes.
  For more information and how to disable the reordering please consolidate our documentation at [https://docs.***.org/developer-info/ordering](https://docs.%2A%2A%2A.org/developer-info/ordering)
      integration_test.go:247: 
          - DELETE example.com MX 8 example.com. ttl=300
          - DELETE example.com A 1.2.3.4 ttl=300
          ± MODIFY example.com MX (4 www.example.com. ttl=300) -> (0 . ttl=300)
          - DELETE www.example.com A 1.2.3.8 ttl=300
  WRITING ZONEFILE: zones/example.com.zone
          --- PASS: TestDNSProviders/example.com/17:NullMXApex:renull (0.00s)

I'm seeing the same message in NAMEDOTCOM, HEXONET, HEDNS, and then I stopped looking.

@tlimoncelli
Copy link
Contributor

Looks like almost all providers seem to use the same? https://github.com/search?q=repo%3AStackExchange%2Fdnscontrol%20%22diff.NewCompat(dc).IncrementalDiff%22&type=code

Shouldn't all providers then be adjusted to the new diff2 code to be consistent? (Was looking for some example code how I can convert it)

The diff.NewCompat() call is a bandaid. I wrote that so that I could get rid of the old diff code and not have to maintain the old and new code in parallel. I would definitely like to see all providers move to the diff2 functions. About half of them have converted:

$ grep diff2.By providers/*/*.go | wc -l
      22
$ grep diff.NewCompat providers/*/*.go | wc -l
      23

I think INWX would use diff2.ByRecord(). Take a look at providers/bunnydns/records.go for a relatively clean example.

@blackshadev
Copy link
Contributor

Since TRANSIP also fails the test (link) I lean towards fixing this in dnsgraph. If 2 have this issue, that's a trend.

I'm concerned (but not certain!) that fixing it at the dnsgraph level might not help INWX. INWX uses diff.NewCompat(dc).IncrementalDiff(foundRecords) which reorders the changes randomly. Would you be interested in converting to diff2? Looking at the code... the conversion looks relatively straightforward.

This is a different "issue" all together, one I cannot fix: TransIP Will not allow an empty MX record. So the whole test setup doesn't even work for TransIP. I suspect most providers will not accept an empty MX record and thus fail the test.

@blackshadev hmm... this is interesting!

The BIND provider outputs this message:

https://github.com/StackExchange/dnscontrol/actions/runs/7559961364/job/20585004068?pr=2801

PASS integrationTest.TestDNSProviders/example.com/17:NullMXApex:renull (0.00s)
  2024/01/17 18:12:10 Warning: Found unresolved records [example.com example.com www.example.com].
  This can indicate a circular dependency, please ensure all targets from given records exist and no circular dependencies exist in the changeset. These unresolved records are still added as changes and pushed to the provider, but will cause issues if and when the provider checks the changes.
  For more information and how to disable the reordering please consolidate our documentation at [https://docs.***.org/developer-info/ordering](https://docs.%2A%2A%2A.org/developer-info/ordering)
      integration_test.go:247: 
          - DELETE example.com MX 8 example.com. ttl=300
          - DELETE example.com A 1.2.3.4 ttl=300
          ± MODIFY example.com MX (4 www.example.com. ttl=300) -> (0 . ttl=300)
          - DELETE www.example.com A 1.2.3.8 ttl=300
  WRITING ZONEFILE: zones/example.com.zone
          --- PASS: TestDNSProviders/example.com/17:NullMXApex:renull (0.00s)

I'm seeing the same message in NAMEDOTCOM, HEXONET, HEDNS, and then I stopped looking.

Hmm very weird, I hope I can take a crack at this issue upcoming friday. It is weird, there shouldn't be a difference in the behavior of providers, but for some reason there is?

@tlimoncelli
Copy link
Contributor

About TransIP: Ooops! My bad. Yes, TransIP is a non-issue as NullMX is not processed.

@blackshadev
Copy link
Contributor

@tlimoncelli I am unable to reproduce your issue with the integration test on the BIND provider:

~ go test -v -verbose -provider BIND -start 17 -end 17                    
                                                                                                                                                                           
=== RUN   TestDNSProviders
=== RUN   TestDNSProviders/example.com
=== RUN   TestDNSProviders/example.com/Clean_Slate:Empty
    integration_test.go:247: 
        - DELETE example.com NS ns1.example.com. ttl=300
        - DELETE example.com NS ns2.example.com. ttl=300
WRITING ZONEFILE: zones/example.com.zone
=== RUN   TestDNSProviders/example.com/17:NullMXApex:create
    integration_test.go:247: 
        + CREATE example.com A 1.2.3.2 ttl=300
        + CREATE example.com MX 0 . ttl=300
        + CREATE www.example.com A 1.2.3.8 ttl=300
WRITING ZONEFILE: zones/example.com.zone
=== RUN   TestDNSProviders/example.com/17:NullMXApex:unnull
    integration_test.go:247: 
        + CREATE example.com MX 8 www.example.com. ttl=300
        ± MODIFY example.com MX (0 . ttl=300) -> (2 example.com. ttl=300)
WRITING ZONEFILE: zones/example.com.zone
=== RUN   TestDNSProviders/example.com/17:NullMXApex:renull
    integration_test.go:247: 
        ± MODIFY example.com MX (2 example.com. ttl=300) -> (0 . ttl=300)
        - DELETE example.com MX 8 www.example.com. ttl=300
WRITING ZONEFILE: zones/example.com.zone
--- PASS: TestDNSProviders (0.02s)
    --- PASS: TestDNSProviders/example.com (0.02s)
        --- PASS: TestDNSProviders/example.com/Clean_Slate:Empty (0.00s)
        --- PASS: TestDNSProviders/example.com/17:NullMXApex:create (0.01s)
        --- PASS: TestDNSProviders/example.com/17:NullMXApex:unnull (0.00s)
        --- PASS: TestDNSProviders/example.com/17:NullMXApex:renull (0.00s)
=== RUN   TestDualProviders
    integration_test.go:369: Clearing everything
    integration_test.go:363: #1:
        - DELETE example.com A 1.2.3.2 ttl=300
        - DELETE example.com MX 0 . ttl=300
        - DELETE www.example.com A 1.2.3.8 ttl=300
WRITING ZONEFILE: zones/example.com.zone
    integration_test.go:376: Adding nameservers from another provider
    integration_test.go:363: #1:
        + CREATE example.com NS ns1.example.com. ttl=300
        + CREATE example.com NS ns2.example.com. ttl=300
WRITING ZONEFILE: zones/example.com.zone
    integration_test.go:379: Running again to ensure stability
--- PASS: TestDualProviders (0.00s)
=== RUN   TestNameserverDots
=== RUN   TestNameserverDots/No_trailing_dot_in_nameserver
--- PASS: TestNameserverDots (0.00s)
    --- PASS: TestNameserverDots/No_trailing_dot_in_nameserver (0.00s)
PASS
ok      github.com/StackExchange/dnscontrol/v4/integrationTest  0.025s

I also ran it without the start parameter. But everything runs fine without warning.

What did you do differently? I am on the main branch commit d926e45

@tlimoncelli
Copy link
Contributor

Welp, now I can't reproduce it either. I even re-ran it in GHA: https://github.com/StackExchange/dnscontrol/actions/runs/7586343357/job/20664596836

:confused emoji:

Ok, let's say that was a fluke.

That said, I'm still interested in seeing if the original bug can be solved at the dnsgraph level. It is similar to the fact that when converting an A record to a CNAME yet a challenge because it is different.

@blackshadev
Copy link
Contributor

Wel given that it works as expected on BIND. How do we expect it to work any other way on INWX any differently?

If @patschi uses the diff2 algorithm DNSGraph should sort things out right? If not, we need a way for me to debug and run some test suite.

For now: My understanding is without using diff2 , it will not use DNSGraph . So lets do that first and check for the result afterward. Or am I missing the point completely?

@tlimoncelli
Copy link
Contributor

I think the next step should be to convert INWX to use diff2. If that fixes the NullMX problem, we've killed two birds with one stone.

If @patschi is interested in doing the conversion, that'd be great. If you want to add a test account (https://docs.dnscontrol.org/developer-info/byo-secrets) it would enable me to help out. (Don't let the long doc fool you... the actual work to set this up is rather small.)

If converting to diff2 doesn't work... then we'll loop back to @blackshadev with a repro case. :)

@tlimoncelli
Copy link
Contributor

@tlimoncelli Is the CI running basically the integrationTest just like when running them locally? I could then create a new account with INWX with one test domain and add it to the CI pipeline. So that we have this covered as well.

@patschi To securely send the credentials for use with the project, use this link: https://transfer.secretoverflow.com/u/tlimoncelli (I should be able to get to it in the next week)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants