Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibility of taking out numpy dependency? #6

Open
pchtsp opened this issue Jul 9, 2021 · 47 comments
Open

Possibility of taking out numpy dependency? #6

pchtsp opened this issue Jul 9, 2021 · 47 comments

Comments

@pchtsp
Copy link

pchtsp commented Jul 9, 2021

Hello!

Thanks for this library. We'd like to add it as dependency in PuLP (https://github.com/coin-or/pulp/) but I do not want to have to add numpy as dependency. In fact, we maintain an numpy-free version of your mps reader: https://github.com/coin-or/pulp/blob/9a71ae08e9416a6dc37c702dab1dd752e6bb0d00/pulp/mps_lp.py#L31

I'd have no issue in helping with a PR if needed.

in any case, thanks again!

@jmaerte
Copy link
Owner

jmaerte commented Jul 9, 2021

Hello!

Is there any particular reason for the need of a project to be numpy-free?
I actually never encountered any problem with numpy.

Anyway; i can look into it and try to modify it towards being numpy-free if that really helps you out.

Best Regards!

@pchtsp
Copy link
Author

pchtsp commented Jul 9, 2021 via email

@jmaerte
Copy link
Owner

jmaerte commented Jul 9, 2021

I see.

No problem. From my side numpy is not really needed as it is only used in the return-types. I can definitely factor it out.

I just used it since what I'm returning are matrices and it is the most common representation of a matrix in python imo.

Nonetheless - I will probably refactor the code in the next couple of days.

Best regards!

@jmaerte
Copy link
Owner

jmaerte commented Jul 10, 2021

Hello!

To this topic: I have created a new branch called "numpy_free", see https://github.com/jmaerte/pysmps/tree/numpy_free.

Since I do not have a test case on my PC atm I would need you to test this version if you don't mind (I assume you might have plenty of examples).

Can you also test the stochastic part of the reader? That would be great.

If you can confirm, I will merge this into the master branch and will commit it as the latest version in pypi.

Thanks in advance!

@pchtsp
Copy link
Author

pchtsp commented Jul 11, 2021 via email

@jmaerte
Copy link
Owner

jmaerte commented Jul 14, 2021

Has everything worked out so far?

Best regards!

@pchtsp
Copy link
Author

pchtsp commented Jul 14, 2021

Hello!
In fact I was just testing it out before you wrote. I'm trying, first of all, to load the same mps file in pulp and pysmps and see if they load the same information.
But for some reason pysmps "blocks" with the first file I tried:

dance_pairing_-pulp.mps.txt

I did the following:

import pysmps.smps_loader as mps
file = 'dance_pairing_-pulp.mps.txt'
pysmps_data = mps.load_mps(file)
print(pysmps_data)

Once I manage to run and successfully checked manually some mps files, I will adapt pulp to integrate directly with pysmps and then I will be able to run all pulp unittests using pysmps and see if anything breaks.

@jmaerte
Copy link
Owner

jmaerte commented Jul 14, 2021

So I have loaded this file myself now and the problem with your code is not the loading (even tho it takes quite long too - round about half a minute; maybe I should implement a loading bar) but the printing of the returned matrices and vectors... so the "blocking" you are describing lays in print(pysmps_data) on my side..

I would rather test it with a smaller problem to check it on plausibility by the eye.

Note aswell: The current pysmps version on pypi is still using numpy as i do not want to update it to a version that might not be running properly.

Best regards!

@jmaerte
Copy link
Owner

jmaerte commented Jul 14, 2021

Just so you can check if the loading is blocking or if something else causes this: I have just commited a version to the numpy_free branch of this repository that implements a progress bar for loading the file.

Best regards.

@jmaerte
Copy link
Owner

jmaerte commented Jul 14, 2021

I would suggest cloning this branch and just executing the python code in the console so you get a local function called load_mps instead of one imported by pypi.

@pchtsp
Copy link
Author

pchtsp commented Jul 14, 2021 via email

@jmaerte
Copy link
Owner

jmaerte commented Jul 14, 2021

I will check out your pulp version for differences to what I did.

However: The Result returned should be the same and should take an eternity to print to the console even with your pulp version, right?

@pchtsp
Copy link
Author

pchtsp commented Jul 15, 2021

The complete code I ran was:

import pulp as pl
import pysmps.smps_loader as mps

file = 'dance_pairing_-pulp.mps.txt'

# this runs very fast
pul_data = pl.mpslp.readMPS(file, pl.LpMinimize)
print(pul_data)

# this takes X seconds to run
pysmps_data = mps.load_mps(file)
# this blocks for some reason
print(pysmps_data)

Unfortunately, the returned information is no quite the same for my pulp function and yours (I modified the code to match pulp's own internal format). But the print(pul_data) is also instantaneous (it prints a dictionary). You're right in that the mps.load_mps takes just some seconds and block during the print(pysmps_data). Knowing this, I'll continue to validate the output data and probably integrate it in pulp to test it.

Still, the current execution time for the load_mps method is too large. Specially compared with the current pulp version (I'm not sure if I took out parts, I don't think so). I think we can also work in getting it to run faster. If you check the pulp implementation maybe it will give some hints.

Thanks!

@jmaerte
Copy link
Owner

jmaerte commented Jul 15, 2021

So I see some problems with my code for this size of linear programs. I will try to fix this asap.

Though I can't guarantee that the result is exactly what your MPS function returns.

@jmaerte
Copy link
Owner

jmaerte commented Jul 15, 2021

As far as I can judge it from looking at it your code can't handle multiple different RHS?
Also it only parses the first RHS given in the respective line. Compare to this linear program given in the wikipedia article on MPS file formats:

https://en.wikipedia.org/wiki/MPS_(format)

First line of RHS1 has 2 identifiers. One for LIM1 and one for LIM2. You would only catch LIM1 as far as I can see?

Best regards!

@pchtsp
Copy link
Author

pchtsp commented Jul 15, 2021 via email

@jmaerte
Copy link
Owner

jmaerte commented Jul 15, 2021

I have just uploaded a new commit to the numpy_free branch. Feel free to test it.

You now need to import the code from mps_reader.py.

The reason for this is that i want to keep compatability with old code using the library. So the old mps reader for now stays inside the smps_loader.py file so I do not need to change the whole smps reader aswell.

As a stand-alone MPS file reader i would offer the one in the mps_loader.py file since the output is now completely different, not returning the constraint matrix as a 2d array anymore (this was by the way the computational bottleneck of the code earlier since it needed to initialize an all-0-matrix of the size of the problem because it was not programmed for sparse or large problems).

I'm excited for your feedback on the new MPS reader.

P.S.: The output of the new MPS function comes closer to what you are returning i guess.

@pchtsp
Copy link
Author

pchtsp commented Jul 17, 2021 via email

@jmaerte
Copy link
Owner

jmaerte commented Jul 17, 2021

Well i do not plan to keep it like this. I want to make my smps reader reference back to the improved mps reader. So it is not written just for pulp but rather as a new stand-alone mps reader. But to keep compatability i need the old output format.

So my plan is to use the improved reader in the old mps reader function and just reparse the output.

@jmaerte
Copy link
Owner

jmaerte commented Jul 20, 2021

Is there a need for an mps reader still?

@pchtsp
Copy link
Author

pchtsp commented Jul 20, 2021 via email

@MLopez-Ibanez
Copy link

I have just uploaded a new commit to the numpy_free branch. Feel free to test it.

-            matrix = np.zeros((len(self.places),1))
+           self.matrix = [[0]] * len(self.places)

This way of creating a "matrix" in Python is not recommended because lists are mutable objects that are shallow copied, which means that matrix[0][0] is the same variable as matrix[1][0]. The following testcase shows the problem:

vector = [0] * 5
vector[0] = 5
print(vector)

bad_matrix = [[0]] * 5
bad_matrix[0].append(0) 
print(bad_matrix)
bad_matrix[0][0] = 5
print(bad_matrix)

matrix = [[0] for _ in range(5) ]
matrix[0].append(0) 
print(matrix)
matrix[0][0] = 5
print(matrix)

gives the following output:

[5, 0, 0, 0, 0]
[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
[[5, 0], [5, 0], [5, 0], [5, 0], [5, 0]]
[[0, 0], [0], [0], [0], [0]]
[[5, 0], [0], [0], [0], [0]]

@jmaerte
Copy link
Owner

jmaerte commented Aug 27, 2021

Of course. I see.

Nonetheless it seems like this inofficial branch is dead anyway due to no interest in a numpy-free version anymore.

Thanks anyway; I will fix this.

Best regards!

@MLopez-Ibanez
Copy link

I think there is interest, just that @pchtsp has lots on his plate and may take some time to get to this, but I trust he will eventually do. (It is good to see more collaboration between Pulp and other projects rather than wheel re-invention).

@MLopez-Ibanez
Copy link

By the way, I should have said that this is not the only instance where this happens. There is at least one more (strange that no test failed because of this):

                       A = [[0]] * len(row_names)

@jmaerte
Copy link
Owner

jmaerte commented Aug 27, 2021

Thanks again - now all occurrences of this should be resolved.

@pchtsp
Copy link
Author

pchtsp commented Aug 31, 2021

Hello again. Thanks both @jmaerte @MLopez-Ibanez .
Indeed, as @MLopez-Ibanez said, my plate is quite full (and then I added holidays).
I've pulled the new version of the branch and did some tests:

import pulp as pl
import pysmps.smps_loader as mps
try:
    from time import process_time as clock
except ImportError:
    from time import clock

file = 'dance_pairing_-pulp.mps.txt'

time = - clock()
pul_data = pl.mpslp.readMPS(file, pl.LpMinimize)
time += clock()
print(time)

time = - clock()
pysmps_data = mps.load_mps(file)
time += clock()
print(time)

First of all, minor comment: there is a import sys missing in the smps_loader.py that I had to add to avoid an import error. Also, pycharm shows there is a reference to a function named __load_mps that is not imported here.

Regarding time: the time difference is still too large unfortunately. For this file (which is not a large problem I think): the pulp readMPS function takes 0.53s while the mps.load_mps function takes 21s

@pchtsp
Copy link
Author

pchtsp commented Aug 31, 2021

Sorry, I just realized I was using the incorrect import. If I change my second line to import pysmps.mps_loader as mps then the speed is the same. I will continue with the tests.

@jmaerte
Copy link
Owner

jmaerte commented Aug 31, 2021

I thought so - the smps_loader in this package should have nothing changed as far as i remember (except that i swapped numpy for built-in lists).

The speed boost by using sparse matrices for the boundary conditions is only implemented in the mps_loader file.

@pchtsp
Copy link
Author

pchtsp commented Aug 31, 2021

Hello again,

I've adapted the code in the following pulp branch: https://github.com/coin-or/pulp/tree/pysmps
Tests still fail to run. If you want to try the tests you can clone the repository and then create a virtualenv installing pysmps locally.

# clone repo
# branch to pysmps
python3 -m venv venv
source venv/bin/activate
pip install -U -e /LOCAL/PATH/TO/pysmps
python3 -m unittest pulp/tests/test_pulp.py

I haven't yet added pysmps to the package requirements so it will fail if not installed.

Some issues to iron out:

  1. Is there a reason why load_mps returns a dictionary with keys constraints and variable (one in singular, one in plural)?
  2. Binary variables are getting an upper bound of inf instead of 1. In the following example:
*SENSE:Maximize
NAME          test_importMPS_binary
ROWS
 N  OBJ
 E  _C1
 L  _C2
COLUMNS
    MARK      'MARKER'                 'INTORG'
    c1        _C1        1.000000000000e+00
    c1        _C2        1.000000000000e+00
    MARK      'MARKER'                 'INTEND'
    MARK      'MARKER'                 'INTORG'
    c2        _C1        1.000000000000e+00
    MARK      'MARKER'                 'INTEND'
    dummy     OBJ        1.000000000000e+00
RHS
    RHS       _C1        2.000000000000e+00
    RHS       _C2        0.000000000000e+00
BOUNDS
 BV BND       c1      
 BV BND       c2      
 FR BND       dummy   
ENDATA

I get the following:

pysmps_data = mps.load_mps(file)
print(pysmps_data['variable']['c1'])
# {'type': 'Integer', 'name': 'c1', 'bnd_lower': 0, 'bnd_upper': inf}
  1. Some lower bounds are not being read correctly by load_mps, I think. In the following example:
*SENSE:Minimize
NAME          test_importMPS_integer
ROWS
 N  obj
 L  c1
 G  c2
 E  c3
COLUMNS
    x         c1         1.000000000000e+00
    x         c2         1.000000000000e+00
    x         obj        1.100000000000e+00
    y         c1         1.000000000000e+00
    y         c3        -1.000000000000e+00
    y         obj        4.100000000000e+00
    MARK      'MARKER'                 'INTORG'
    z         c2         1.000000000000e+00
    z         c3         1.000000000000e+00
    z         obj        9.100000000000e+00
    MARK      'MARKER'                 'INTEND'
RHS
    RHS       c1         5.000000000000e+00
    RHS       c2         1.000000000000e+01
    RHS       c3         7.500000000000e+00
BOUNDS
 UP BND       x          4.000000000000e+00
 LO BND       y         -1.000000000000e+00
 UP BND       y          1.000000000000e+00
 LO BND       z          0.000000000000e+00
ENDATA

I get the following:

pysmps_data = mps.load_mps(file)
print(pysmps_data['variable']['y'])
# {'type': 'Continuous', 'name': 'y', 'bnd_lower': 0, 'bnd_upper': 1.0}

But I'd say y should have bnd_lower=-1. Right?

@jmaerte
Copy link
Owner

jmaerte commented Aug 31, 2021

I will look into this. Thanks for the test cases.

@pchtsp
Copy link
Author

pchtsp commented Oct 1, 2021

Hello @jmaerte could you find some time to check on this? if you need any help, do ask.

@jmaerte
Copy link
Owner

jmaerte commented Oct 1, 2021

Hello, I will schedule it for the next days. Currently I was not available for the project.

@pchtsp
Copy link
Author

pchtsp commented Oct 4, 2021 via email

@jmaerte
Copy link
Owner

jmaerte commented Oct 17, 2021

The latest commit should resolve those three issues.

@pchtsp
Copy link
Author

pchtsp commented Oct 18, 2021

Good news! I'll give it a look sometime this week I hope.

@jmaerte
Copy link
Owner

jmaerte commented Oct 18, 2021

Would be great. I will try to look into the other issues over the upcoming week.

@pchtsp
Copy link
Author

pchtsp commented Oct 25, 2021

Hello again, I re-run the tests. There's still one issue with the default lower bound. In the second example I provided:

    *SENSE:Minimize
    NAME          test_importMPS_integer
    ROWS
     N  obj
     L  c1
     G  c2
     E  c3
    COLUMNS
        x         c1         1.000000000000e+00
        x         c2         1.000000000000e+00
        x         obj        1.100000000000e+00
        y         c1         1.000000000000e+00
        y         c3        -1.000000000000e+00
        y         obj        4.100000000000e+00
        MARK      'MARKER'                 'INTORG'
        z         c2         1.000000000000e+00
        z         c3         1.000000000000e+00
        z         obj        9.100000000000e+00
        MARK      'MARKER'                 'INTEND'
    RHS
        RHS       c1         5.000000000000e+00
        RHS       c2         1.000000000000e+01
        RHS       c3         7.500000000000e+00
    BOUNDS
     UP BND       x          4.000000000000e+00
     LO BND       y         -1.000000000000e+00
     UP BND       y          1.000000000000e+00
     LO BND       z          0.000000000000e+00
    ENDATA

The current version of pysmps gives a "-inf" lower bound to variable "x". PuLP currently does not write lower bounds equal to 0 in mps files.

Just to make sure, I checked the GUROBI docs and they explicitly say:

BOUNDS section

The next section in an MPS file is the optional BOUNDS section. By default, each variable takes a lower bound of 0 and an infinite upper bound. Each line in this section can modify the lower bound of a variable, the upper bound, or both.

Also, the lpsolve docs seem to agree on this:

The optional BOUNDS section lets you put lower and upper bounds on
individual variables (no * wild cards, unfortunately), instead of
having to define extra rows in the matrix. All the bounds that have
a given name in column 5 are taken together as a set. Variables not
mentioned in a given BOUNDS set are taken to be non-negative (lower
bound zero, no upper bound).

So, if there is no lower bound in the mps file, it should be assumed a lower bound of 0.

Could you please change that?

Thanks,

Franco

@jmaerte
Copy link
Owner

jmaerte commented Oct 25, 2021

Latest commit fixes this.

@pchtsp
Copy link
Author

pchtsp commented Oct 25, 2021

great! with the changes, the pulp tests pass.

@jmaerte
Copy link
Owner

jmaerte commented Oct 25, 2021

I guess thats great news!

@jmaerte
Copy link
Owner

jmaerte commented Oct 25, 2021

Just out of curiosity; I modified the code such that it does not support different boundary groupings (tho it supports multiple RHS groups still) since you did not catch those cases.

I guess it would be best practice to catch those again, right?

The returned dict would be a little bit blown up but this way you could store multiple configurations of the same linear program in one file instance. Similar to the RHS groups stored in the 'rhs_names' attribute of the dict we would have a 'bnd_names' field and if only a single boundary is given the rest of the dict would look the same as now but in case multiple groups are given each variable would have an array instead of a value in 'bnd_lower' and so on...

What do you think? Is this a common practice in MPS files to save it this way. I think I have not seen it used in practice so far...

@pchtsp
Copy link
Author

pchtsp commented Nov 1, 2021

I've never seen the boundary groupings, actually. And through pulp we do not support that so at least from our side it is not necessary.

Having said that, I don't mind if pysmps goes back to supporting that functionality as long as its interface remains consistent. In your example, this does not appear the case (sometimes an array is returned, sometimes a value). As long as the returned object has always the same format, we will edit the code in pulp to match the definitive translation.

thanks,

@jmaerte
Copy link
Owner

jmaerte commented Dec 15, 2022

It has taken a while for me to come back to this issue - I have just reworked the MPS reader to fit the needs we discussed; the MPS format is now stored in a class rather than a dict (which can be transformed into a dict though) which is very helpful in that the user can now choose the bounds and rhs groups active (if there are multiple). I consider to add an MPS writer using this class as you requested in another issue.

Is there still interest in this? The newer version has no (known) runtime issues so far and doesn't use numpy. I will adjust the Stochastic part of the reader too and then commit. Let me know if this issue is still active. However - the backwards compatability will be broken by this update.

@pchtsp
Copy link
Author

pchtsp commented Jan 7, 2023

Thanks for the time to work on this. Yes, I still want to use this library in pulp, if possible. I will try to integrate it again soon.

I will report back as soon as I do it : )

@jmaerte
Copy link
Owner

jmaerte commented Jan 7, 2023

I have just updated the numpy_free branch and merged it into master. This loses backwards compatability. If you could check it out and test it with some of your examples this would help me a lot. For my examples this works just fine!

If you can confirm this and are okay with the output format I have decided to use for the MPS-class I will update the pip-version to the current state.

@jmaerte
Copy link
Owner

jmaerte commented Jan 7, 2023

I need to adjust the test-cases still - so don't wonder if they fail. Its mainly because i changed the entire output format so that every case can be represented the same (one problem with the old format was that it changed conditionally; depending on the file given).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants