add compile-time config option to globally set aggregate lifespan #534

jvantuyl · 2023-06-02T20:58:34Z

The default behavior of "aggregate processes stay around forever" can present difficulty in a production environment. For any system that creates a steady stream of new aggregates, memory creeps up slowly until exhausted.

It is currently possible to override this on a per-router basis, but that only creates a situation where forgetting to specify it one place creates an even more subtle "memory leak".

This commit allows changing the default globally at compile-time. It does not allow changing it at run-time, as this seemed like it would cause unpredictable behavior and possibly negatively impact performance.

Testing this is difficult since it's a compile-time change. As a compromise, I set the value to the default explicitly in the testing config. While this doesn't fully test the functionality, it does at least work-out the code.

The default behavior of "aggregate processes stay around forever" can present difficulty in a production environment. For any system that creates a steady stream of new aggregates, memory creeps up slowly until exhausted. It is currently possible to override this on a per-router basis, but that only creates a situation where forgetting to specify it one place creates an even more subtle "memory leak". This commit allows changing the default globally at compile-time. It does not allow changing it at run-time, as this seemed like it would cause unpredictable behavior and possibly negatively impact performance. Testing this is difficult since it's a compile-time change. As a compromise, I set the value to the default explicitly in the testing config. While this doesn't fully test the functionality, it does at least work-out the code.

yordis

💜 🚀

jvantuyl · 2023-06-12T22:10:50Z

Any idea what I need to do to get this merged?

fahchen · 2023-06-15T04:17:57Z

I think Commanded.Application.Config might be a suitable place to store aggregate_default_lifespan, which allows applications to decide whether to set it at runtime or compile-time.

yordis · 2023-06-15T04:23:41Z

These types of configurations are a danger to be swapped at runtime, IMO.

If somebody comes across an actual use case, maybe at that point could be refactored to allow it.

fahchen · 2023-06-15T07:45:27Z

I think that these types of configurations are actually used by commanded applications only.

I created a commit to demo the changes: fahchen@7c1741f

yordis · 2023-06-30T17:52:12Z

Hey, @slashdotdash Is there anything we could do to help you to get some love over here and get this feature released?

jvantuyl · 2023-07-27T10:01:27Z

If we can, I'd like to separate the "compile-time vs run-time" discussion from this PR. The compile-time option is useful enough to include whether or not we later want to add the option to change at run-time.

In general, I'm of the opinion that frameworks should generally have "safe" defaults. This timeout behavior is effectively the only "non-safe" default I can think of in Commanded. Any application with a growing count of aggregates effectively leaks processes (and, thus, grows in memory usage until it runs out).

I wouldn't be opposed to even adding a default timeout (maybe next version), since it shouldn't break most existing code. At most, it'll have some performance impact for people with very large numbers of events to a single aggregate and no snapshots. That seems (to me) like a reasonable price so that new Commanded apps don't basically start with a memory leak.

But, yeah... if we can't have a safe default, it would be nice to at least be able to globally change the default for a given app so nothing can slip through unnoticed.

fahchen · 2023-07-27T15:24:28Z

I think this PR may not be a good approach.
Firstly, each application has the responsibility for its own configuration, other applications should not depend on that.(official document states at https://hexdocs.pm/elixir/main/Application.html#module-the-application-environment).
Secondly, the config we try to set should belong to a commanded application, not the commanded itself.

The approach I mentioned before it’s not about choosing between compile-time and runtime config, it’s try to respect the two reasons above.

yordis · 2023-07-27T16:11:56Z

I see that you're looking to configure a specific value and you're wondering at which level this could be done. Well, you have a few options, each offering different degrees of control.

Globally: If you're after a broad solution, you can set it globally like this:

config :commanded, :aggregate_default_lifespan, Commanded.Aggregates.DefaultLifespan

Per Application: Looking to configure the value on a per-application basis? Here's how you can do that:

config :myapp, MyApp.CommandedApp,
  aggregate_default_lifespan: Commanded.Aggregates.DefaultLifespan

Per Router: Or, perhaps, you'd prefer to control it on a router level? That can be done in the following way:

defmodule MyApp.BankAccount.Router do
  use Commanded.Commands.Router
  
  lifespan BankAccountAggregate, Commanded.Aggregates.DefaultLifespan
end

So, depending on where you want to exercise control and your needs, you can choose the most suitable option.

Secondly, the config we try to set should belong to a commanded application, not the commanded itself.

You suggested that these should belong to a specific commanded application, rather than commanded itself. I see where you're coming from and agree that having control over such settings per application could indeed be quite useful.

Consider, for instance, configuring it according to the Router. That could be a very viable option as well.

Firstly, each application has the responsibility for its own configuration, other applications should not depend on that.(official document states at https://hexdocs.pm/elixir/main/Application.html#module-the-application-environment).

Let's address the point about each application being responsible for its own configuration.

You're absolutely right that this is the general guideline outlined in the Elixir documentation. However, it's crucial to remember that this is a guideline and not a rule set in stone.

In reality, there are configurations that are global to commanded, such as:

config :commanded, type_provider: MyApp.TypeProvider

So, the Elixir documentation's statement regarding "application" needs to be understood in the right context. Elixir is referring to OTP applications, since he is indeed adding to :commanded application scope, he is respecting and following the guidelines.

It all boils down to the context and specific needs. There's no one-size-fits-all solution here. In other words, "It Depends ™️."

Happy coding! 💜 💜 💜

yordis

@jvantuyl would you be OK trying to also support the configuration at the Commanded.Application instance level?

jvantuyl · 2023-08-02T06:52:32Z

For the use-case I'm most concerned with, Application-level would work. So I'll take it.

Still not a fan of unsafe defaults. I'd rather not have "Every Commanded Application, with default settings, is a time bomb." be a true statement. And, given how most applications don't see the load in staging that they do in production... it's a time bomb that's most likely to go off in production. It's not great.

fahchen · 2023-08-02T16:32:39Z

Maybe we could divide these problems into two parts.
One is to support the default lifespan for aggregates, another is to attempt to fix the time bomb, or alternatively, leave it untouched.

yordis · 2023-08-02T16:41:44Z

IMO, If we can handle this under the guideline, why should we break it? :)

@fahchen I do not think you understand honestly, but it is not that important. Yes, I agree with you, your proposal makes sense.

yordis · 2023-08-02T16:45:44Z

I'd rather not have "Every Commanded Application, with default settings, is a time bomb."

@jvantuyl I would like to share some history about it based on HMB.

It was critical to have such "sharding" opportunities at multiple levels and understand how scalable the codebase could be as the application performance changes without having to rearchitect or change the system drastically. It was part of the decision-making and reasons to adopt an entirely new stack.

TL;DR about scaling the system over time: https://10consulting.com/2021/03/18/commanded-application-architecture/

You could disagree with it still, but everyone knew.

By design, it is good these level configurations exist.

Your concerns are valid, but it is a programmer problem rather than a technology problem. Commanded already allows you to do multi-app architecture styles.

jvantuyl · 2023-08-03T14:23:50Z

I don't particularly disagree that users should learn their system and architect it accordingly. And I'm a huge fan of systems that are flexible enough to handle multiple styles of multi-app architectures. But...

"Having sharp edges coming out of the box" isn't a programmer problem. The rule of least astonishment is pretty well established. Defaults can surprise people. That's entirely possible. But it's still just poor ergonomics.

Consider the example aggregates in the docs: bank accounts, account balances. Is there any real-world case where those should have infinite lifespans? If the developer is thinking of this as a "database", is there any other database or cache that you can think of that doesn't automatically reclaim memory as it runs?

Don't just take it from me. Take a look at blog posts. One of the first ones you'll find is the excellent "Event Sourcing With Elixir" series by @saurdaukar. Aggregate Lifespans show up in the very last installment (part 7). How are they introduced? By an example of this exact behavior surprising the developer and them not noticing it until it was deployed and running for some time. It's even in a section that's explicitly about the gotcha's of running Commanded in production.

Here @rodrigobotti runs across it when adopting Commanded. How does he introduce this? "It might not be that apparent but our application has both memory and performance problems coming from our aggregates." That's right. It has a "memory problem" and "it might not be apparent". This is not praise.

How about when @ChristianAlexander mentions it in this blog post? He describes it as "That sounds like an OOM waiting to happen."

Or here where the entire blog post is about how this was killing an app in production?

Or search on Stack Overflow and you'll find that this has been a problem from the beginning. Note how the user describes it? A "memory leak". There's the clear connotation that this is so unexpected that it falls into a class of common bug. And how did they find it? Their app was running out of memory frequently.

In fact, if you Google for "elixir commanded lifespan", there isn't a single relevant result that doesn't cast this as a "gotcha" or a "problem".

If we're dead-set on keeping the "default behavior is to never free memory" thing, perhaps we can update the documentation to make it more obvious. It's not even mentioned in the section on aggregates. It's only briefly mentioned in at the very bottom of the section on Command Handlers. Even then, it's pretty easy for a developer to miss that "By default an aggregate instance process will run indefinitely once started." means "Use this or you'll leak memory until you run out."

Or maybe we could alter Commanded.Aggregates.DefaultLifespan to emit a one-time warning the first time after_command/1 is called?

Just something to make users think about this if they miss that part of the documentation? Preferably before they have to explain to their boss why the tool they picked is crashing in production?

yordis · 2023-08-03T15:55:11Z

There is a miscommunication,

I'd rather not have "Every Commanded Application, with default settings, is a time bomb."

My reply was "not having Commanded Apps with default settings statements". It was NOT about sensitive defaults. I pointed out that Commanded has a different level of architecture; therefore, it needs those default settings (regardless of sensitivity settings or not)

I agree with you regarding "good defaults," The Account Balances or any other situation are well-known to me (I can put together your system since I have it all in my head, literally), so I get it, and I agree with you.

I don't like it, but 🤷🏻 hopefully, your feature in because it is much needed a long-time ago. The intent (at HMB) was for Seniors to provide better tools, hopefully mitigating these issues and giving better copy-pasting.

Just something to make users think about this if they miss that part of the documentation? Preferably before they have to explain to their boss why the tool they picked is crashing in production?

Tell Billy in Slack to pay attention and copy-paste the right stuff! It is always Billy's fault!

slashdotdash · 2024-01-09T15:10:08Z

This PR doesn't address the legitimate concern raised by @jvantuyl that Commanded provides a default aggregate lifespan which keeps aggregate processes run indefinitely. I'm in agreement that it is OK for the default value to be changed. We could set it to an appropriate timeout value, something like 24 hours?

Secondly, the default aggregate lifespan configuration should be associated with a Commanded Application, rather than being a global value associated with the :commanded app. Such global configuration is undesirable (e.g. difficult or impossible to test). We can support per Commanded Application configuration by using the Commanded.Application.Config module and with appropriate documentation added to the Commanded.Application module docs.

yordis approved these changes Jun 2, 2023

View reviewed changes

yordis suggested changes Jul 27, 2023

View reviewed changes

This comment was marked as off-topic.

Sign in to view

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add compile-time config option to globally set aggregate lifespan #534

add compile-time config option to globally set aggregate lifespan #534

jvantuyl commented Jun 2, 2023

yordis left a comment

jvantuyl commented Jun 12, 2023

fahchen commented Jun 15, 2023

yordis commented Jun 15, 2023

fahchen commented Jun 15, 2023 •

edited

yordis commented Jun 30, 2023

jvantuyl commented Jul 27, 2023 •

edited

fahchen commented Jul 27, 2023

yordis commented Jul 27, 2023

yordis left a comment

jvantuyl commented Aug 2, 2023

fahchen commented Aug 2, 2023

This comment was marked as off-topic.

yordis commented Aug 2, 2023

yordis commented Aug 2, 2023

jvantuyl commented Aug 3, 2023 •

edited

yordis commented Aug 3, 2023

slashdotdash commented Jan 9, 2024 •

edited

add compile-time config option to globally set aggregate lifespan #534

Are you sure you want to change the base?

add compile-time config option to globally set aggregate lifespan #534

Conversation

jvantuyl commented Jun 2, 2023

yordis left a comment

Choose a reason for hiding this comment

jvantuyl commented Jun 12, 2023

fahchen commented Jun 15, 2023

yordis commented Jun 15, 2023

fahchen commented Jun 15, 2023 • edited

yordis commented Jun 30, 2023

jvantuyl commented Jul 27, 2023 • edited

fahchen commented Jul 27, 2023

yordis commented Jul 27, 2023

yordis left a comment

Choose a reason for hiding this comment

jvantuyl commented Aug 2, 2023

fahchen commented Aug 2, 2023

This comment was marked as off-topic.

yordis commented Aug 2, 2023

yordis commented Aug 2, 2023

jvantuyl commented Aug 3, 2023 • edited

yordis commented Aug 3, 2023

slashdotdash commented Jan 9, 2024 • edited

fahchen commented Jun 15, 2023 •

edited

jvantuyl commented Jul 27, 2023 •

edited

jvantuyl commented Aug 3, 2023 •

edited

slashdotdash commented Jan 9, 2024 •

edited