Some unit tests crash when using symengine #12359

iyanmv · 2024-05-07T15:13:22Z

Environment

Qiskit version: 1.1.0rc1 (but also in previous stable versions)
Python version: 3.12.3
Operating system: Arch Linux

What is happening?

Some unit tests fail due to symengine crashes. For example, the following five tests from test_circuit_load_from_qpy.py crash on my system:

test_parameter_expression
test_evolutiongate_param_expr_time
test_parameter_expression_global_phase
test_parameter_vector_element_in_expression
test_symengine_full_path

From the first test, the exact line when the the test crashes is this:

qiskit/qiskit/qpy/binary_io/value.py

Line 58 in 6ff0eb5

expr_bytes = obj._symbol_expr.__reduce__()[1][0]

Which produces a src/tcmalloc.cc:304] Attempt to free invalid pointer 0x6146ecca5b60.

How can we reproduce the issue?

Here is a small self-contained script that can replicate the crash described above.

from qiskit.circuit.parameter import Parameter
from qiskit.circuit.parameterexpression import ParameterExpression


theta = Parameter("theta")
phi = Parameter("phi")
sum_param = theta + phi

sum_param._symbol_expr.__reduce__()
# src/tcmalloc.cc:304] Attempt to free invalid pointer 0x645c8e3eaa30 
# [1]    987079 IOT instruction (core dumped)  python

The following running on the same system works:

import symengine

theta = symengine.Symbol("theta")
phi = symengine.Symbol("phi")
sum_param = theta + phi

sum_param.__reduce__()

What should happen?

Ideally, all unit tests would pass in a clean chroot env on Arch Linux.

Any suggestions?

No response

The text was updated successfully, but these errors were encountered:

iyanmv · 2024-05-07T16:32:32Z

For some reason, if symengine is imported, there is no crash, i.e.,

import symengine
from qiskit.circuit.parameter import Parameter
from qiskit.circuit.parameterexpression import ParameterExpression


theta = Parameter("theta")
phi = Parameter("phi")
sum_param = theta + phi

sum_param._symbol_expr.__reduce__()

jakelishman · 2024-05-07T17:34:18Z

Thanks for the report. Do you know what version of symengine you're using?

Fwiw, it would be really pretty weird if this is caused by Qiskit and not some allocator problem within Symengine - our use of it is very much inline with their public interface, and we don't go messing around in their internals or anything.

iyanmv · 2024-05-07T20:08:04Z

I'm using symengine 0.11.2, and Arch Linux builds the library with the following options:

cmake -B build -S $pkgname-$pkgver \
  -DCMAKE_INSTALL_PREFIX=/usr \
  -DBUILD_SHARED_LIBS=ON \
  -DWITH_TCMALLOC=ON \
  -DWITH_PTHREAD=ON \
  -DWITH_SYMENGINE_THREAD_SAFE=ON \
  -DWITH_ARB=ON \
  -DWITH_ECM=ON \
  -DWITH_LLVM=ON \
  -DWITH_MPFR=ON \
  -DWITH_MPC=ON \
  -DWITH_PRIMESIEVE=ON \
  -DWITH_BOOST=ON \
  -DWITH_COTIRE=OFF \
  -DWITH_SYSTEM_CEREAL=ON
cmake --build build

jakelishman · 2024-05-07T20:38:56Z

If you installed symengine from pip or conda, you'd probably have got their pre-packaged wheels, which is the same situation we're usually expecting. If you're using it from Arch's repos in the system Python installation, then this could very well be a difference: I don't think symengine's published wheels use tcmalloc, and it could be some fault there (or perhaps some questionable behaviour that happens to work with glibc's malloc, but tcmalloc is rejecting).

The difference between pre-importing symengine and not probably isn't indicative of everything being super fixed (since qiskit does just import symengine internally) - it's more likely a spooky-action-at-a-distance memory bug.

If you have the means to try, would you be able to see if the error repeats if you drop the -DWITH_TCMALLOC=ON (or manually set it to off) while you're building the symengine your Python installation is linking against?

iyanmv · 2024-05-07T21:00:12Z

I was going to do that experiment next, yes. I will rebuild symengine and python-symengine without tcmalloc and see if that solves the issue.

I just installed qiskit quickly in a venv and when I use the symengine from PyPI I also can't replicate the issue.

iyanmv · 2024-05-10T08:30:30Z

I can confirm those tests don't fail when using symengine without tcmalloc. I will create an issue on Arch Linux.

jakelishman · 2024-05-10T10:09:03Z

Thanks so much for looking into it in depth! I guess if the issue is somewhere in Arch / tcmalloc / symengine, we can close this issue on Qiskit? I'll close it, but feel free to re-open if there's more to discuss.

iyanmv · 2024-05-10T11:14:41Z

After solving the crashes due to tcmalloc, I found another error related to symengine (this time a numerical rounding error).

In particular is this test:

qiskit/test/python/qobj/test_pulse_converter.py

Line 343 in 235e581

def test_parameterized_shift_frequency(self):

It works running on a virtual env, but if fails using system shared libraries. It's a numerical error happening in this line:

qiskit/qiskit/circuit/parameterexpression.py

Line 534 in 235e581

return float(real_expr)

Running on my system real_expr has type symengine.lib.symengine_wrapper.RealDouble and value 3140000.0. However, running float(real_expr) returns 3139999.9999999995. I think this possibility was considered here but it doesn't work as expected:

qiskit/qiskit/pulse/utils.py

Lines 71 to 74 in 235e581

    
           # Remove truncation error and convert the result into Python builtin type. 
        
           # Value could originally contain a rounding error, e.g. 1.00000000001 
        
           # which may occur during the parameter expression evaluation. 
        
           evaluated = np.round(operand, decimals=decimal).item()

The reason is that decimal has a value 10 so this error is not removed with the rounding. Passing decimals=9 fixes the issue for me.

Note also this relevant comment from NumPy (see Notes):

np.round uses a fast but sometimes inexact algorithm to round floating-point datatypes. For positive decimals it is equivalent to np.true_divide(np.rint(a * 10decimals), 10decimals), which has error due to the inexact representation of decimal fractions in the IEEE floating point standard [1] and errors introduced when scaling by powers of ten.
(...)
If your goal is to print such values with a fixed number of decimals, it is preferable to use numpy’s float printing routines to limit the number of printed decimals:
(...)
The float printing routines use an accurate but much more computationally demanding algorithm to compute the number of digits after the decimal point.
Alternatively, Python’s builtin round function uses a more accurate but slower algorithm for 64-bit floating point values:

Perhaps it's better to use round(operand, ndigits=decimal) and change the default value of decimal from 10 to 9? What do you think? I can make a PR if you think it's a good idea (I will check first that there are no other side effects after changing that).

iyanmv · 2024-05-10T12:06:05Z

Sorry, I guess round() is out of the table because of complex numbers.

iyanmv · 2024-05-10T12:18:21Z

Also, after thinking a bit I don't understand why there should be such a rounding error in the first place. Do you know why is that error expected from that comment in the code and why decimal has a default value of 10 instead of 14 for example (which should be fine with double precision I guess)? I also can't replicate the error using pure symengine code. I can try to compile symengine without MPFR as a wild guess... but I don't understand what is going on.

iyanmv · 2024-05-12T12:57:08Z

No difference compiling symengine with -DWITH_MPFR=OFF

iyanmv · 2024-05-12T14:35:23Z

I noticed that the ParameterExpression(f/1000) is multiplied by a float here:

qiskit/qiskit/qobj/converters/pulse_instruction.py

Line 780 in f20dad0

frequency = self.disassemble_value(instruction.frequency) * 1e9

This causes frequency to have the value ParameterExpression(1000000.0*f) in this particular unit test instead of ParameterExpression(1000000*f) which could have be obtained by multiplying by an integer instead, i.e.,

frequency = self.disassemble_value(instruction.frequency) * 10**9

This is what ultimately leads to the numerical error that could have been avoided and has nothing to do with the double precision of the float type.

iyanmv · 2024-05-12T14:40:29Z

And here is a small self-contained example replicating the issue with pure symengine:

import symengine

f = symengine.Symbol("f")
expr = f / 1000

expr_wrong = expr * 1e9
expr_good = expr * 10**9

wrong_value = float(expr_wrong.subs("f", 3.14))
good_value = float(expr_good.subs("f", 3.14))

print(wrong_value)
# 3139999.9999999995
print(good_value)
# 3140000.0

Classes in pulse_instruction.py scale frequency values to GHz by multipliying `ParameterExpression` with float 1e9. This can lead to numerical errors on some systems using symengine. Instead, this scaling can be done multiplying by integer 10**9. See: Qiskit#12359 (comment)

jakelishman · 2024-05-15T17:27:28Z

Sorry for the lack of response here - I reopened the issue, started typing a comment, then got distracted and forgot I hadn't posted.

Thanks so much for tracking this down and providing the fix - that's a very tricksy bug to find, and thanks for filing it on symengine.

iyanmv · 2024-05-16T05:49:39Z

No problem ;)

* Avoid lossing precision when scaling frequencies Classes in pulse_instruction.py scale frequency values to GHz by multipliying `ParameterExpression` with float 1e9. This can lead to numerical errors on some systems using symengine. Instead, this scaling can be done multiplying by integer 10**9. See: #12359 (comment) * Add release note --------- Co-authored-by: Jake Lishman <jake.lishman@ibm.com>

* Avoid lossing precision when scaling frequencies Classes in pulse_instruction.py scale frequency values to GHz by multipliying `ParameterExpression` with float 1e9. This can lead to numerical errors on some systems using symengine. Instead, this scaling can be done multiplying by integer 10**9. See: #12359 (comment) * Add release note --------- Co-authored-by: Jake Lishman <jake.lishman@ibm.com> (cherry picked from commit 96607f6)

* Avoid lossing precision when scaling frequencies Classes in pulse_instruction.py scale frequency values to GHz by multipliying `ParameterExpression` with float 1e9. This can lead to numerical errors on some systems using symengine. Instead, this scaling can be done multiplying by integer 10**9. See: #12359 (comment) * Add release note --------- Co-authored-by: Jake Lishman <jake.lishman@ibm.com> (cherry picked from commit 96607f6) Co-authored-by: Iyán <me@iyanmv.com>

iyanmv added the bug Something isn't working label May 7, 2024

jakelishman closed this as completed May 10, 2024

jakelishman reopened this May 10, 2024

iyanmv mentioned this issue May 11, 2024

Some qiskit unit tests fail when using symengine with tcmalloc symengine/symengine.py#475

Open

This was referenced May 12, 2024

Small example with different values when using PyPI symengine and shared library symengine/symengine.py#476

Open

Avoid lossing precision when scaling frequencies #12392

Merged

jakelishman closed this as completed in #12392 May 16, 2024

mergify bot mentioned this issue May 16, 2024

Avoid lossing precision when scaling frequencies (backport #12392) #12416

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some unit tests crash when using symengine #12359

Some unit tests crash when using symengine #12359

iyanmv commented May 7, 2024

iyanmv commented May 7, 2024

jakelishman commented May 7, 2024

iyanmv commented May 7, 2024

jakelishman commented May 7, 2024

iyanmv commented May 7, 2024

iyanmv commented May 10, 2024

jakelishman commented May 10, 2024

iyanmv commented May 10, 2024

iyanmv commented May 10, 2024

iyanmv commented May 10, 2024

iyanmv commented May 12, 2024

iyanmv commented May 12, 2024

iyanmv commented May 12, 2024 •

edited

jakelishman commented May 15, 2024

iyanmv commented May 16, 2024

Some unit tests crash when using symengine #12359

Some unit tests crash when using symengine #12359

Comments

iyanmv commented May 7, 2024

Environment

What is happening?

How can we reproduce the issue?

What should happen?

Any suggestions?

iyanmv commented May 7, 2024

jakelishman commented May 7, 2024

iyanmv commented May 7, 2024

jakelishman commented May 7, 2024

iyanmv commented May 7, 2024

iyanmv commented May 10, 2024

jakelishman commented May 10, 2024

iyanmv commented May 10, 2024

iyanmv commented May 10, 2024

iyanmv commented May 10, 2024

iyanmv commented May 12, 2024

iyanmv commented May 12, 2024

iyanmv commented May 12, 2024 • edited

jakelishman commented May 15, 2024

iyanmv commented May 16, 2024

iyanmv commented May 12, 2024 •

edited