New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
factor produces wrong output #26497
Comments
The bug seems to be related to factoring of multivariate expressions including import sympy
a, b = sympy.symbols("a b")
((b - sympy.I)**2 * (b + sympy.I) * (a + 1)).expand().factor() which produces an incorrect expression |
This is a serious bug. It has been around since SymPy started factorising such polynomials automatically: d3eab46. Before that you would need to use In [1]: import sympy
...: a, b = sympy.symbols("a b")
...: e = ((b - sympy.I)**2 * (b + sympy.I) * (a + 1))
In [6]: e.expand().factor()
Out[6]:
⎛ 2 ⎞
(a + 1)⋅⎝b + 1⎠
In [7]: e.expand().factor(extension=True)
Out[7]:
⎛ 2 ⎞
(a + 1)⋅⎝b + 1⎠
In [8]: e.expand().factor(domain=QQ_I)
Out[8]:
⎛ 2 ⎞
(a + 1)⋅⎝b + 1⎠
In [11]: e.expand().factor(domain=QQ.algebraic_field(I))
Out[11]:
⎛ 2 ⎞
(a + 1)⋅⎝b + 1⎠ There must be a bug in the code for factorisation over algebraic fields: sympy/sympy/polys/factortools.py Lines 1280 to 1307 in d11d0b2
|
The bug is in In [25]: sqf(expand(((y - I)**2 * (y + I) * (z + 1))))
Out[25]:
2
(y - ⅈ) ⋅(y + ⅈ)
In [26]: sqf(expand(((y - I)**2 * (y + I) * (x + 1))))
Out[26]: x + 1 Both of those answers are wrong. The factors for one variable are missing in each case. The bug is here: sympy/sympy/polys/sqfreetools.py Lines 409 to 429 in 87bf9d4
That seems to use the same logic as for the univariate case that you can compute the square free factorisation by differentiating and then computing the gcd. The difference though is that in the multivariate case there is more than one choice of variable to differentiate with respect to and this seems to make an arbitrary choice. We might then end up with a factor like What should probably happen is that once sympy/sympy/polys/sqfreetools.py Lines 404 to 407 in 87bf9d4
The implemented algorithm seems to be Yun's algorithm but applying the univariate version of the algorithm naively to multivariate polynomials. In Yun's paper it is stated that the algorithm requires the multivariate polynomials to be primitive in the given varaible that is used for differentiation:
I think this means that to apply this algorithm in the multivariate case you would first need to use In [12]: p = expand((y - I)**2 * (y + I) * (z + 1))
In [13]: p
Out[13]:
3 3 2 2
y ⋅z + y - ⅈ⋅y ⋅z - ⅈ⋅y + y⋅z + y - ⅈ⋅z - ⅈ
In [14]: p.as_poly().primitive()
Out[14]: (1, Poly(y**3*z + y**3 - I*y**2*z - I*y**2 + y*z + y - I*z - I, y, z, domain='ZZ_I'))
In [15]: p.as_poly(y).primitive()
Out[15]: (z + 1, Poly(y**3 - I*y**2 + y - I, y, domain='ZZ_I[z]')) |
The bug for In [1]: sqf(expand(((y - 2)**2 * (y + 2) * (x + 1))))
Out[1]: x + 1
In [2]: sqf(expand(((y - 2)**2 * (y + 2) * (z + 1))))
Out[2]:
2
(y - 2) ⋅(y + 2) I think that the difference in the case of To be clear the expected output for sqf is like: In [3]: sqf(expand((x-1)*(x-2)*(x-3)**2*(x-4)**5))
Out[3]:
5 2 ⎛ 2 ⎞
(x - 4) ⋅(x - 3) ⋅⎝x - 3⋅x + 2⎠ It is supposed to be a factorisation of the expression like with |
This fixes the bug in diff --git a/sympy/polys/sqfreetools.py b/sympy/polys/sqfreetools.py
index e11eebc7b7..2fd4f0b8ca 100644
--- a/sympy/polys/sqfreetools.py
+++ b/sympy/polys/sqfreetools.py
@@ -23,7 +23,7 @@
from sympy.polys.euclidtools import (
dup_inner_gcd, dmp_inner_gcd,
dup_gcd, dmp_gcd,
- dmp_resultant)
+ dmp_resultant, dmp_primitive)
from sympy.polys.galoistools import (
gf_sqf_list, gf_sqf_part)
from sympy.polys.polyerrors import (
@@ -401,30 +401,36 @@ def dmp_sqf_list(f, u, K, all=False):
deg = dmp_degree(f, u)
if deg < 0:
return coeff, []
- elif deg == 0:
- coeff2, factors = dmp_sqf_list(dmp_LC(f, u), u-1, K, all=all)
- factors = [([fac], exp) for fac, exp in factors]
- return coeff*coeff2, factors
- result, i = [], 1
+ content, f = dmp_primitive(f, u, K)
- h = dmp_diff(f, 1, u, K)
- g, p, q = dmp_inner_gcd(f, h, u, K)
+ result = []
- while True:
- d = dmp_diff(p, 1, u, K)
- h = dmp_sub(q, d, u, K)
+ if deg != 0:
- if dmp_zero_p(h, u):
- result.append((p, i))
- break
+ h = dmp_diff(f, 1, u, K)
+ g, p, q = dmp_inner_gcd(f, h, u, K)
- g, p, q = dmp_inner_gcd(p, h, u, K)
+ i = 1
- if all or dmp_degree(g, u) > 0:
- result.append((g, i))
+ while True:
+ d = dmp_diff(p, 1, u, K)
+ h = dmp_sub(q, d, u, K)
- i += 1
+ if dmp_zero_p(h, u):
+ result.append((p, i))
+ break
+
+ g, p, q = dmp_inner_gcd(p, h, u, K)
+
+ if all or dmp_degree(g, u) > 0:
+ result.append((g, i))
+
+ i += 1
+
+ coeff_content, result_content = dmp_sqf_list(content, u-1, K, all=all)
+ coeff *= coeff_content
+ result += [([fac], exp) for fac, exp in result_content]
return coeff, result Unfortunately that does not fix the original problem. It turns out that In [3]: a, b = symbols("a b")
In [4]: e = ((b - I)**2 * (b + I) * (a + 1))
In [5]: e
Out[5]:
2
(a + 1)⋅(b - ⅈ) ⋅(b + ⅈ)
In [6]: expand(e)
Out[6]:
3 2 3 2
a⋅b - ⅈ⋅a⋅b + a⋅b - ⅈ⋅a + b - ⅈ⋅b + b - ⅈ
In [7]: sqf_part(expand(e))
Out[7]:
2 2
a⋅b + a + b + 1
In [8]: sqf_part(expand(e)).factor()
Out[8]:
⎛ 2 ⎞
(a + 1)⋅⎝b + 1⎠
In [9]: expand(e).factor() # wrong
Out[9]:
⎛ 2 ⎞
(a + 1)⋅⎝b + 1⎠ Here the output from The bug is somewhere in sympy/sympy/polys/factortools.py Lines 1280 to 1307 in 87bf9d4
I'm not familiar with the algorithm in use here. Again it looks very similar to the corresponding univariate algorithm. In the debugger we already have a wrong output from dmp_factor_list_include because we only get two factors when there should be three. The output of dmp_factor_list is correct for the given input so the problem needs to be earlier. The input f seems correct and dmp_sqf_part gets the right square free part.
|
Maybe the bug is triggered because the square free part has only real coefficients i.e. we're trying to factorise a polynomial in In [11]: e
Out[11]:
2
(a + 1)⋅(b - ⅈ) ⋅(b + ⅈ)
In [12]: sqf_part(e)
Out[12]:
2 2
a⋅b + a + b + 1 It should be possible for In [20]: quo(e, sqf_part(expand(e)))
Out[20]: b - ⅈ I'm not sure that's how the algorithm is supposed to work though. I don't actually know where this particular algorithm comes from. |
The algorithm in That algorithm is written for univariate polynomials though. I'm not sure if |
Possibly the Algebraic factoring and rational function integration |
The problem with |
The algorithm in sympy/sympy/polys/sqfreetools.py Lines 55 to 74 in af0c855
It is not enough just to differentiate wrt one arbitrarily chosen variable in the multivariate case. Hence: In [22]: p = x**2*y**4 + 2*x**2*y**2 + x**2 + 2*x*y**4 + 4*x*y**2 + 2*x + 2*y**4 + 4*y**2 + 2
In [23]: p
Out[23]:
2 4 2 2 2 4 2 4 2
x ⋅y + 2⋅x ⋅y + x + 2⋅x⋅y + 4⋅x⋅y + 2⋅x + 2⋅y + 4⋅y + 2
In [24]: p.factor()
Out[24]:
2
⎛ 2 ⎞ ⎛ 2 ⎞
⎝y + 1⎠ ⋅⎝x + 2⋅x + 2⎠
In [25]: Poly(p).is_sqf # wrong
Out[25]: True I think that wrong result from |
We can fix diff --git a/sympy/polys/sqfreetools.py b/sympy/polys/sqfreetools.py
index e11eebc7b7..e00907b84d 100644
--- a/sympy/polys/sqfreetools.py
+++ b/sympy/polys/sqfreetools.py
@@ -10,7 +10,7 @@
from sympy.polys.densebasic import (
dup_strip,
dup_LC, dmp_ground_LC,
- dmp_zero_p,
+ dmp_zero_p, dmp_ground_p,
dmp_ground, dmp_LC,
dup_degree, dmp_degree,
dmp_raise, dmp_inject,
@@ -70,8 +70,20 @@ def dmp_sqf_p(f, u, K):
"""
if dmp_zero_p(f, u):
return True
- else:
- return not dmp_degree(dmp_gcd(f, dmp_diff(f, 1, u, K), u, K), u)
+
+ for i in range(u+1):
+
+ fp = dmp_diff_in(f, 1, i, u, K)
+
+ if dmp_zero_p(fp, u):
+ continue
+
+ gcd = dmp_gcd(f, fp, u, K)
+
+ if not dmp_ground_p(gcd, None, u):
+ return False
+
+ return True
def dup_sqf_norm(f, K): I am wondering though whether there should actually be a distinction between the existing function that basically determines if a polynomial in In any case fixing
However the algorithm presented as The simplest fix I can imagine is to try shifts in turn like:
We would then need to return a list of shifts for |
It might just be sufficient to do this repeatedly:
Possibly that is even what the paper is describing... |
I believe all issues are fixed by gh-26515. The OP example gives: In [2]: import sympy
In [3]: a_n, m = sympy.symbols('a_n m', real=True, positive=True)
...: expr = (-a_n**4 *m**2 / (a_n**2 + 1) - 2 * a_n**4 *m / (a_n**2 + 1) - a_n**4 - a_n**4 / (a_n**2 +
...: 1) + 2 * sympy.I * a_n**3 *m**2 / (a_n**2 + 1) + 4 * sympy.I * a_n**3 *m / (a_n**2 + 1) + 2 * sy
...: mpy.I * a_n**3 + 2 * sympy.I * a_n**3 / (a_n**2 + 1) + 2 * sympy.I * a_n *m**2 / (a_n**2 + 1) + 4
...: * sympy.I * a_n *m / (a_n**2 + 1) + 2 * sympy.I * a_n + 2 * sympy.I * a_n / (a_n**2 + 1) +m**2 /
...: (a_n**2 + 1) + 2 *m / (a_n**2 + 1) + 1 + 1 / (a_n**2 + 1))
In [4]: expr.subs({a_n: 1, m: 1})
Out[4]: 12⋅ⅈ
In [5]: expr.factor().subs({a_n: 1, m: 1})
Out[5]:
3
-3⋅(1 - ⅈ) ⋅(1 + ⅈ)
In [6]: expr.factor().subs({a_n: 1, m: 1}).expand()
Out[6]: 12⋅ⅈ The PR still needs tests and things to finish off but I don't have time right now. At least I think that the problem is identified and we have a fix. If anyone wants to finish the PR then go ahead. |
This is the fix I went with. I think it's valid. I think there are only finitely many bad shifts |
After fixing this in gh-26515 I find the following for the OP example: In [16]: expr
Out[16]:
4 2 4 4 3 2 3 3 2 ↪
aₙ ⋅m 2⋅aₙ ⋅m 4 aₙ 2⋅ⅈ⋅aₙ ⋅m 4⋅ⅈ⋅aₙ ⋅m 3 2⋅ⅈ⋅aₙ 2⋅ⅈ⋅aₙ⋅m 4⋅ⅈ⋅ ↪
- ─────── - ─────── - aₙ - ─────── + ────────── + ───────── + 2⋅ⅈ⋅aₙ + ─────── + ───────── + ──── ↪
2 2 2 2 2 2 2 2 ↪
aₙ + 1 aₙ + 1 aₙ + 1 aₙ + 1 aₙ + 1 aₙ + 1 aₙ + 1 aₙ ↪
↪ 2
↪ aₙ⋅m 2⋅ⅈ⋅aₙ m 2⋅m 1
↪ ──── + 2⋅ⅈ⋅aₙ + ─────── + ─────── + ─────── + 1 + ───────
↪ 2 2 2 2
↪ + 1 aₙ + 1 aₙ + 1 aₙ + 1 aₙ + 1
In [17]: expr.factor()
Out[17]:
3 ⎛ 2 2 ⎞
-(aₙ - ⅈ) ⋅(aₙ + ⅈ)⋅⎝aₙ + m + 2⋅m + 2⎠
─────────────────────────────────────────
2
aₙ + 1
In [18]: expr.cancel()
Out[18]:
4 3 2 2 2 2 2 2
- aₙ + 2⋅ⅈ⋅aₙ - aₙ ⋅m - 2⋅aₙ ⋅m - aₙ + 2⋅ⅈ⋅aₙ⋅m + 4⋅ⅈ⋅aₙ⋅m + 4⋅ⅈ⋅aₙ + m + 2⋅m + 2
In [19]: expr.cancel().factor()
Out[19]:
2 ⎛ 2 2 ⎞
-(aₙ - ⅈ) ⋅⎝aₙ + m + 2⋅m + 2⎠ I believe that all of these expressions are equivalent but ideally In [21]: factor(a_n**2 + 1)
Out[21]:
2
aₙ + 1
In [22]: factor(a_n**2 + 1, domain=QQ_I)
Out[22]: (aₙ - ⅈ)⋅(aₙ + ⅈ) This is also a bug although much less serious. |
This reference gives a multivariate form of Yun's algorithm on page 8:
https://d-nb.info/1036637972/34 That looks possibly better than the fix I used in #26514 |
I have completed the PR #26515 that fixes this. All issues identified should be fixed I believe. |
sympy.factor(expr)
produces the wrong output if an underscore is used in the name of one of the symbols inexpr
.EDIT: The issue is more general. The correctness of the factorization output depends on the names of the variables. Using symbol names with more than one character also changes the output, for example.
Observed behavior:
For example,
gives different numerical results if I factor
expr
and replacea_n
andm
with numbers.expr.subs({a_n: 1, m: 1})
produces12i
, whereasexpr.factor().subs({a_n: 1, m: 1})
produces-6
.A correct behavior would result in
12i
, or equivalent numerical expressions, in both cases.On the other hand, defining
correctly produces the equivalent outputs
12i
in both cases.Sympy version
I observe this in the stable branch, but also in 1.12 and 1.10. Didn't check other versions.
Other modifications that produce the same error
Using
a_n, m = sympy.symbols('an m', real=True, positive=True)
, so a symbol with a name of more than one character, also reproduces the error.The text was updated successfully, but these errors were encountered: