
Trigger scheduler API endpoint fails to save the results to the database on subsequent calls #683

Closed
victorgarcia98 opened this issue May 14, 2023 · 12 comments · Fixed by #709
Assignees
Labels
API bug Something isn't working

Comments

@victorgarcia98
Contributor

When trying to run the toy schedule multiple times through the API endpoint, I found that the second call to the scheduler fails to save the results to the database (see Error below).

To reproduce this, I triggered the schedule twice with the toy account (see Setup to reproduce this behavior below). The error persists across server restarts but disappears on a fresh database.

⚠️ Reminder: you need to use different arguments in the API calls to bypass the "anti-repeated call feature".

A potential fix consists of deleting all the objects in the session. This can be done by setting the expunge_session argument of add_to_session to True and exposing it in save_to_db as an optional argument with default value False.

Follow up

Please let me know if this issue shows up for you as well. If it does, I would like to know whether the proposed solution is on point or requires more research into the origin of this error.


Error

HANDLING RQ SCHEDULING WORKER EXCEPTION: <class 'sqlalchemy.exc.InvalidRequestError'>:Can't attach instance <Sensor at 0x7f0f9a1e5e70>; another instance with key (<class 'flexmeasures.data.models.time_series.Sensor'>, (1,), None) is already present in this session.

Setup to reproduce this behavior

1. Add accounts

1.0. Delete database

make clean-db db_name=flexmeasures

1.1. toy-account

flexmeasures add toy-account

1.2. admin account

flexmeasures add account --name "admin"
flexmeasures add user --username admin --email admin@admin.com --account-id 2 --roles=admin

2. Authentication

URL

http://localhost:5000/api/requestAuthToken

HEADERS

Content-Type: application/json

BODY

{
	"email" : "admin@admin.com",
	"password" : "admin"
}

3. Post Measurements

URL

http://localhost:5000/api/v3_0/sensors/data

HEADERS

Content-Type: application/json
Authorization: <auth_token>

BODY

{
	"sensor" : "ea1.2023-06.localhost:fm1.2",
	"start" : "2023-05-14T00:00:00+02:00",
	"duration" : "PT24H",
	"unit" : "EUR/MWh",
	"values" :  [10, 11, 12, 15, 18, 17,
          10.5, 9, 9.5, 9, 8.5, 10,
          8, 5, 4, 4, 5.5, 8,
          12, 13, 14, 12.5, 10, 7]
}

4. Trigger Storage Schedule

URL

localhost:5000/api/v3_0/sensors/1/schedules/trigger

HEADERS

Authorization: <auth_token>
Content-Type: application/json

BODY

{
	"start" : "2023-05-14T00:00:00+02:00",
	"duration" : "PT12H",
	"flex-model" : {
		"soc-unit" : "MWh",
		"soc-at-start" : 0.212,
		"soc-targets" : []
	},
	"flex-context" : {
		"production-price-sensor" : 2,
		"consumption-price-sensor" : 2
	}
}
@victorgarcia98 victorgarcia98 added bug Something isn't working API labels May 14, 2023
@victorgarcia98 victorgarcia98 self-assigned this May 14, 2023
@nhoening
Contributor

nhoening commented May 14, 2023

Thanks for this report @victorgarcia98 - I researched this behavior before (I just can't find it right now), and I saw that this only happened if the very same data gets written a second time.
I hope I find my analysis again. Also, I'm not sure where we left this at the time. Maybe we didn't understand the error, but computing the same schedule twice seemed like something we would prevent from happening anyway.
I do want to understand what SQLAlchemy really complains about here.

@victorgarcia98
Contributor Author

Great!

Indeed, this shouldn't matter when running the scheduler with the same arguments (then make_schedule wouldn't be called, due to the new repeated-call feature). However, it is happening to me when running the schedule multiple times with different soc-at-start values.

For reproducing the error, I'm using Insomnia to make the API calls. If you want, I can export the requests I'm making to a file, which should make it easier/faster to import them into a tool such as Postman.

@nhoening
Contributor

@Flix6x do you recall where I wrote about this a few weeks ago?

@nhoening
Contributor

@victorgarcia98 I found my previous research on the issue.

I concluded that "overwriting of pre-existing data might be broken". This means overwriting with data for the same timing, but with different values. The logic which handles that might be in timely-beliefs.

@Flix6x
Contributor

Flix6x commented May 15, 2023

Some potentially relevant thoughts.

When saving beliefs data that may contain such "changed beliefs" (from the same source, at the same belief time, about the same event, but assigning a different value), we should use a dedicated function we made (I think save_to_db, in data.utils). I expect this function has relevant logic to either prevent such errors or raise more informative error messages.

I would actually expect our scheduling logic (both through the API and the CLI) to generate beliefs with a unique belief time each time the logic is executed. That is, somewhere in the validation of the prior field as the belief_time, it would default to server_now.

@victorgarcia98
Contributor Author

victorgarcia98 commented May 19, 2023

After some research, I found the following:

Potential Fixes

Method 1

A potential fix consists of deleting all the objects in the session. This can be done by setting the expunge_session argument of add_to_session to True and exposing it in save_to_db as an optional argument with default value False.

Instead of expunging all the objects, there's the possibility to expunge just the Sensor object. The downside is that we might lose the Sensor's data, such as attributes that get updated after the call. Maybe we need to commit that information first in the persist_flex_model method.
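Method 1 can be illustrated with a minimal stand-alone SQLAlchemy sketch (stand-in Sensor model, not the real flexmeasures code path through save_to_db): once the session's own copy is expunged, the identity key is free again and the detached instance can be attached.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Sensor(Base):  # stand-in model, not the flexmeasures Sensor
    __tablename__ = "sensor"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as setup:
    setup.add(Sensor(id=1))
    setup.commit()

# A detached copy, e.g. left over from a previous scheduling job.
other = Session(engine)
detached_sensor = other.get(Sensor, 1)
other.close()

session = Session(engine)
session.get(Sensor, 1)  # the session now tracks its own copy

# Method 1: expunge everything, then attaching the detached copy works.
session.expunge_all()
session.add(detached_sensor)
session.flush()
assert detached_sensor in session
```

Note the downside mentioned above: expunge_all removes every object from the session, so any pending, uncommitted changes on those objects will no longer be flushed.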

Method 2

Another potential fix is to create a fresh new Sensor object after consumption_schedule = scheduler.compute(). It turns out that the object becomes detached after the call to compute, regardless of the initial conditions.
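Method 2 in the same stand-alone style (stand-in Sensor model, not the real flexmeasures one): instead of reusing a possibly-detached object, fetch a fresh copy through the session; that copy is guaranteed to be the one instance the session tracks for that identity key, so later saves cannot conflict.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Sensor(Base):  # stand-in model, not the flexmeasures Sensor
    __tablename__ = "sensor"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as setup:
    setup.add(Sensor(id=1))
    setup.commit()

# A sensor object left detached by an earlier unit of work
# (standing in for the object that compute() leaves behind).
stale = Session(engine)
sensor = stale.get(Sensor, 1)
stale.close()

# Method 2: fetch a fresh instance and use it for all further work.
session = Session(engine)
fresh_sensor = session.get(Sensor, sensor.id)
assert fresh_sensor in session
assert fresh_sensor.id == sensor.id
```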

Hypothesis

Relationship

On saving the results, the TimedBelief objects try to fetch the Sensor object by merging the sensor table into timed_belief (or via a subquery). This is probably where it complains, given that the Sensor object is already in the session.

Uncommitted sensor

On compute, we update the SOC depending on whether we compute the schedule with the same parameters or not. If it is updated, we need to commit the results to the database. That is probably why creating a new sensor instance avoids the problem: it discards the uncommitted changes.

@Flix6x
Contributor

Flix6x commented May 19, 2023

Thank you for diving deeper into this. I suspect that we no longer need to keep the logic that updates sensor attributes on calling compute, as it is connected to API versions that have now been sunset. So that might be one way forward I could try.

@Flix6x
Contributor

Flix6x commented May 25, 2023

I'm still not able to reproduce. I am running:

Versions:

  • Python 3.9 (from running python --version)
  • Postgres: PostgreSQL 13.5 (from running SELECT version(); in a postgres terminal)
  • SQLAlchemy 1.4.47 (from running pip list)
  • Flask-SQLAlchemy 2.5.1 (from running pip list)

Preparation

In flexmeasures.cfg:

SQLALCHEMY_DATABASE_URI = "postgresql://toy:toy@127.0.0.1/toy"

Create test_client.py containing:

import asyncio

from flexmeasures_client.client import FlexMeasuresClient as Client


async def my_script():
    await client.post_measurements(
        sensor_id=2,
        entity_address="ea1.2023-06.localhost:fm1",
        start="2023-05-14T00:00:00+02:00",
        duration="PT24H",
        unit="EUR/MWh",
        values=[10, 11, 12, 15, 18, 17,
                10.5, 9, 9.5, 9, 8.5, 10,
                8, 5, 4, 4, 5.5, 8,
                12, 13, 14, 12.5, 10, 7],
    )
    await client.trigger_storage_schedule(
        sensor_id=1,
        start="2023-05-14T00:00:00+02:00",
        duration="PT12H",
        soc_unit="MWh",
        soc_at_start=0.212,
        soc_targets=[],
        consumption_price_sensor=2,
        production_price_sensor=2,
    )
    await client.trigger_storage_schedule(
        sensor_id=1,
        start="2023-05-14T00:00:00+02:00",
        duration="PT11H",  # this changed to bypass the "anti-repeated call feature"
        soc_unit="MWh",
        soc_at_start=0.212,
        soc_targets=[],
        consumption_price_sensor=2,
        production_price_sensor=2,
    )
    await client.trigger_storage_schedule(
        sensor_id=1,
        start="2023-05-14T00:00:00+02:00",
        duration="PT12H",
        soc_unit="MWh",
        soc_at_start=0.213,  # change something else
        soc_targets=[],
        consumption_price_sensor=2,
        production_price_sensor=2,
    )

client = Client(
    email="admin@admin.com",
    password="admin",
    host="localhost:5000",
)

asyncio.run(my_script())

First terminal

Bash terminal for setting up and running the server:

make clean-db db_name=toy db_user=toy
flexmeasures add toy-account
flexmeasures add user --username admin --email admin@admin.com --account-id 2 --roles=admin
# password: admin
flexmeasures run

Second terminal

Bash terminal for running a worker:

flexmeasures jobs run-worker --queue scheduling

Third terminal

Bash terminal for running the client:

python test_client.py

@victorgarcia98
Contributor Author

Thanks for such a detailed description!

Running through this locally, I encountered the same error in both Python 3.9.16 and Python 3.10.6 (PostgreSQL v12.14 and v13.11 both showing the same).

Trying with @Flix6x, we confirmed the following:

  • Creating a new sensor object after calling compute fixes the issue but feels hacky.
  • Expunging all the objects (current_app.db.session.expunge_all()) at the end of the job fixes the issue.
    • As a side note, @Flix6x proposed to add the decorator as_transaction, as well, in order to rollback in case of failure.
  • We commented out persist_flex_model to test if it might be the cause, but the error persisted.
  • We tried to disable lazy loading of attributes in SensorIdField, but it didn't change the outcome.

New ideas:

  • DB isolation levels: an SO post discussing this and why expunge might be solving our issue
  • A comment by Ivan Velichko on an SO answer that resembles our problem.

@victorgarcia98
Contributor Author

In the process of using the new command flexmeasures add report multiple times, I encountered this same error:

Can't attach instance <Sensor at 0x7ff5ee5208b0>; another instance with key (<class 'flexmeasures.data.models.time_series.Sensor'>, (58,), None) is already present in this session.

As with the scheduling, running the command works fine on the first try but fails on subsequent calls for the same time period.

Similar to the scheduling case, I tried to expunge all the objects, but it didn't fix it :(

To explore this issue, I ran a step-by-step debugging session to check the status of the sensor instance object.

To check the status of the sensor instance, we can use the following:

from sqlalchemy import inspect
inspect(sensor).detached

The first report Sensor instance is created in the custom field SensorIdField, as the --sensor-id attribute is deserialized into a proper Sensor object. Inside the _deserialize method, the instance is attached to the session and we can retrieve the beliefs via the backref:

>> sensor.beliefs
[<TimedBelief: at 202...3.463149)>, <TimedBelief: at 202...3.463149)>, ...]

After leaving the _deserialize method, the report Sensor instance is detached, and we can no longer get the beliefs via the backref:

>> sensor.beliefs

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/victor/Work/Seita/flexmeasures/venv/lib/python3.10/site-packages/sqlalchemy/orm/attributes.py", line 482, in __get__
    return self.impl.get(state, dict_)
  File "/home/victor/Work/Seita/flexmeasures/venv/lib/python3.10/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
    value = self._fire_loader_callables(state, key, passive)
  File "/home/victor/Work/Seita/flexmeasures/venv/lib/python3.10/site-packages/sqlalchemy/orm/attributes.py", line 978, in _fire_loader_callables
    return self.callable_(state, passive)
  File "/home/victor/Work/Seita/flexmeasures/venv/lib/python3.10/site-packages/sqlalchemy/orm/strategies.py", line 863, in _load_for_state
    raise orm_exc.DetachedInstanceError(
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Sensor at 0x7f1b9db976a0> is not bound to a Session; lazy load operation of attribute 'beliefs' cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)

With this in mind, the solution is to reattach the report Sensor instance to the session by either of the following options:

  • db.session.add(sensor)
  • result.sensor = Sensor.query.get(sensor.id)
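Both options can be seen in a minimal stand-alone sketch (stand-in Sensor/TimedBelief models with a beliefs backref, not the real flexmeasures ones): a detached instance cannot lazy-load its relationships, and reattaching it to a session restores that.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()


class Sensor(Base):  # stand-in model, not the flexmeasures Sensor
    __tablename__ = "sensor"
    id = Column(Integer, primary_key=True)
    beliefs = relationship("TimedBelief", backref="sensor")


class TimedBelief(Base):  # stand-in model, not the flexmeasures TimedBelief
    __tablename__ = "timed_belief"
    id = Column(Integer, primary_key=True)
    sensor_id = Column(Integer, ForeignKey("sensor.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as setup:
    setup.add(Sensor(id=1, beliefs=[TimedBelief(id=1)]))
    setup.commit()

loader = Session(engine)
sensor = loader.get(Sensor, 1)
loader.close()  # like leaving _deserialize: the instance is now detached
assert inspect(sensor).detached

# Lazy-loading the backref fails while detached.
error = None
try:
    sensor.beliefs
except DetachedInstanceError as exc:
    error = exc
assert error is not None

# Option 1: reattach the same instance (cf. db.session.add(sensor)).
session = Session(engine)
session.add(sensor)
assert len(sensor.beliefs) == 1  # lazy load works again

# Option 2 would instead fetch a fresh, attached copy:
# session.get(Sensor, sensor.id)
```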

@Flix6x
Contributor

Flix6x commented Jun 1, 2023

Two thoughts:

  • Does removing the backref fix the issue? Maybe we don't need it, because we usually use sensor.search_beliefs (because we don't typically need to fetch all beliefs for a sensor).
  • Could we extend the SensorIdField with some post-deserialization method, where we add the sensor to the session?

@victorgarcia98
Contributor Author

victorgarcia98 commented Jun 1, 2023

Instead, a more DRY solution is to add sensor = Sensor.query.get(sensor.id) inside the TimedBelief class constructor.

Tested this solution for both cases and it works 🎆
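A rough, generic sketch of that constructor-level fix (stand-in models and a plain Session in place of Flask-SQLAlchemy's db.session; the real change would live in the actual TimedBelief class): the constructor swaps any passed-in sensor for the copy attached to the current session, so saving never hits an identity-map conflict.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()
session = None  # stand-in for Flask-SQLAlchemy's db.session, set below


class Sensor(Base):  # stand-in model, not the flexmeasures Sensor
    __tablename__ = "sensor"
    id = Column(Integer, primary_key=True)


class TimedBelief(Base):  # stand-in model, not the flexmeasures TimedBelief
    __tablename__ = "timed_belief"
    id = Column(Integer, primary_key=True)
    sensor_id = Column(Integer, ForeignKey("sensor.id"))
    sensor = relationship(Sensor)

    def __init__(self, sensor, **kwargs):
        # Always use the copy of the sensor that the current session
        # tracks (cf. Sensor.query.get(sensor.id) in the comment above).
        sensor = session.get(Sensor, sensor.id)
        super().__init__(sensor=sensor, **kwargs)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as setup:
    setup.add(Sensor(id=1))
    setup.commit()

# A detached sensor, as produced by SensorIdField deserialization.
stale = Session(engine)
detached_sensor = stale.get(Sensor, 1)
stale.close()

session = Session(engine)
session.get(Sensor, 1)  # the session already tracks its own copy

belief = TimedBelief(sensor=detached_sensor)  # no conflict raised
session.add(belief)
session.flush()
assert belief.sensor in session
```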
