Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add compile-time config option to globally set aggregate lifespan #534

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jvantuyl
Copy link

@jvantuyl jvantuyl commented Jun 2, 2023

The default behavior of "aggregate processes stay around forever" can present difficulty in a production environment. For any system that creates a steady stream of new aggregates, memory creeps up slowly until exhausted.

It is currently possible to override this on a per-router basis, but that only creates a situation where forgetting to specify it one place creates an even more subtle "memory leak".

This commit allows changing the default globally at compile-time. It does not allow changing it at run-time, as this seemed like it would cause unpredictable behavior and possibly negatively impact performance.

Testing this is difficult since it's a compile-time change. As a compromise, I set the value to the default explicitly in the testing config. While this doesn't fully test the functionality, it does at least work-out the code.

The default behavior of "aggregate processes stay around forever" can
present difficulty in a production environment.  For any system that
creates a steady stream of new aggregates, memory creeps up slowly until
exhausted.

It is currently possible to override this on a per-router basis, but
that only creates a situation where forgetting to specify it one place
creates an even more subtle "memory leak".

This commit allows changing the default globally at compile-time.  It
does not allow changing it at run-time, as this seemed like it would
cause unpredictable behavior and possibly negatively impact performance.

Testing this is difficult since it's a compile-time change.  As a
compromise, I set the value to the default explicitly in the testing
config.  While this doesn't fully test the functionality, it does at
least work-out the code.
Copy link
Contributor

@yordis yordis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💜 🚀

@jvantuyl
Copy link
Author

Any idea what I need to do to get this merged?

@fahchen
Copy link
Contributor

fahchen commented Jun 15, 2023

I think Commanded.Application.Config might be a suitable place to store aggregate_default_lifespan, which allows applications to decide whether to set it at runtime or compile-time.

@yordis
Copy link
Contributor

yordis commented Jun 15, 2023

These types of configurations are a danger to be swapped at runtime, IMO.

If somebody comes across an actual use case, maybe at that point could be refactored to allow it.

@fahchen
Copy link
Contributor

fahchen commented Jun 15, 2023

I think that these types of configurations are actually used by commanded applications only.

I created a commit to demo the changes: fahchen@7c1741f

@yordis
Copy link
Contributor

yordis commented Jun 30, 2023

Hey, @slashdotdash Is there anything we could do to help you to get some love over here and get this feature released?

@jvantuyl
Copy link
Author

jvantuyl commented Jul 27, 2023

If we can, I'd like to separate the "compile-time vs run-time" discussion from this PR. The compile-time option is useful enough to include whether or not we later want to add the option to change at run-time.

In general, I'm of the opinion that frameworks should generally have "safe" defaults. This timeout behavior is effectively the only "non-safe" default I can think of in Commanded. Any application with a growing count of aggregates effectively leaks processes (and, thus, grows in memory usage until it runs out).

I wouldn't be opposed to even adding a default timeout (maybe next version), since it shouldn't break most existing code. At most, it'll have some performance impact for people with very large numbers of events to a single aggregate and no snapshots. That seems (to me) like a reasonable price so that new Commanded apps don't basically start with a memory leak.

But, yeah... if we can't have a safe default, it would be nice to at least be able to globally change the default for a given app so nothing can slip through unnoticed.

@fahchen
Copy link
Contributor

fahchen commented Jul 27, 2023

I think this PR may not be a good approach.
Firstly, each application has the responsibility for its own configuration, other applications should not depend on that.(official document states at https://hexdocs.pm/elixir/main/Application.html#module-the-application-environment).
Secondly, the config we try to set should belong to a commanded application, not the commanded itself.

The approach I mentioned before it’s not about choosing between compile-time and runtime config, it’s try to respect the two reasons above.

@yordis
Copy link
Contributor

yordis commented Jul 27, 2023

I see that you're looking to configure a specific value and you're wondering at which level this could be done. Well, you have a few options, each offering different degrees of control.

  1. Globally: If you're after a broad solution, you can set it globally like this:
config :commanded, :aggregate_default_lifespan, Commanded.Aggregates.DefaultLifespan
  1. Per Application: Looking to configure the value on a per-application basis? Here's how you can do that:
config :myapp, MyApp.CommandedApp,
  aggregate_default_lifespan: Commanded.Aggregates.DefaultLifespan
  1. Per Router: Or, perhaps, you'd prefer to control it on a router level? That can be done in the following way:
defmodule MyApp.BankAccount.Router do
  use Commanded.Commands.Router
  
  lifespan BankAccountAggregate, Commanded.Aggregates.DefaultLifespan
end

So, depending on where you want to exercise control and your needs, you can choose the most suitable option.


Secondly, the config we try to set should belong to a commanded application, not the commanded itself.

You suggested that these should belong to a specific commanded application, rather than commanded itself. I see where you're coming from and agree that having control over such settings per application could indeed be quite useful.

Consider, for instance, configuring it according to the Router. That could be a very viable option as well.


Firstly, each application has the responsibility for its own configuration, other applications should not depend on that.(official document states at https://hexdocs.pm/elixir/main/Application.html#module-the-application-environment).

Let's address the point about each application being responsible for its own configuration.

You're absolutely right that this is the general guideline outlined in the Elixir documentation. However, it's crucial to remember that this is a guideline and not a rule set in stone.

In reality, there are configurations that are global to commanded, such as:

config :commanded, type_provider: MyApp.TypeProvider

So, the Elixir documentation's statement regarding "application" needs to be understood in the right context. Elixir is referring to OTP applications, since he is indeed adding to :commanded application scope, he is respecting and following the guidelines.

It all boils down to the context and specific needs. There's no one-size-fits-all solution here. In other words, "It Depends ™️."

Happy coding! 💜 💜 💜

Copy link
Contributor

@yordis yordis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvantuyl would you be OK trying to also support the configuration at the Commanded.Application instance level?

@jvantuyl
Copy link
Author

jvantuyl commented Aug 2, 2023

For the use-case I'm most concerned with, Application-level would work. So I'll take it.

Still not a fan of unsafe defaults. I'd rather not have "Every Commanded Application, with default settings, is a time bomb." be a true statement. And, given how most applications don't see the load in staging that they do in production... it's a time bomb that's most likely to go off in production. It's not great.

@fahchen
Copy link
Contributor

fahchen commented Aug 2, 2023

Maybe we could divide these problems into two parts.
One is to support the default lifespan for aggregates, another is to attempt to fix the time bomb, or alternatively, leave it untouched.

@fahchen

This comment was marked as off-topic.

@yordis
Copy link
Contributor

yordis commented Aug 2, 2023

IMO, If we can handle this under the guideline, why should we break it? :)

@fahchen I do not think you understand honestly, but it is not that important. Yes, I agree with you, your proposal makes sense.

@yordis
Copy link
Contributor

yordis commented Aug 2, 2023

I'd rather not have "Every Commanded Application, with default settings, is a time bomb."

@jvantuyl I would like to share some history about it based on HMB.

It was critical to have such "sharding" opportunities at multiple levels and understand how scalable the codebase could be as the application performance changes without having to rearchitect or change the system drastically. It was part of the decision-making and reasons to adopt an entirely new stack.

TL;DR about scaling the system over time: https://10consulting.com/2021/03/18/commanded-application-architecture/

You could disagree with it still, but everyone knew.

By design, it is good these level configurations exist.


Your concerns are valid, but it is a programmer problem rather than a technology problem. Commanded already allows you to do multi-app architecture styles.

@jvantuyl
Copy link
Author

jvantuyl commented Aug 3, 2023

I don't particularly disagree that users should learn their system and architect it accordingly. And I'm a huge fan of systems that are flexible enough to handle multiple styles of multi-app architectures. But...

"Having sharp edges coming out of the box" isn't a programmer problem. The rule of least astonishment is pretty well established. Defaults can surprise people. That's entirely possible. But it's still just poor ergonomics.

Consider the example aggregates in the docs: bank accounts, account balances. Is there any real-world case where those should have infinite lifespans? If the developer is thinking of this as a "database", is there any other database or cache that you can think of that doesn't automatically reclaim memory as it runs?

Don't just take it from me. Take a look at blog posts. One of the first ones you'll find is the excellent "Event Sourcing With Elixir" series by @saurdaukar. Aggregate Lifespans show up in the very last installment (part 7). How are they introduced? By an example of this exact behavior surprising the developer and them not noticing it until it was deployed and running for some time. It's even in a section that's explicitly about the gotcha's of running Commanded in production.

Here @rodrigobotti runs across it when adopting Commanded. How does he introduce this? "It might not be that apparent but our application has both memory and performance problems coming from our aggregates." That's right. It has a "memory problem" and "it might not be apparent". This is not praise.

How about when @ChristianAlexander mentions it in this blog post? He describes it as "That sounds like an OOM waiting to happen."

Or here where the entire blog post is about how this was killing an app in production?

Or search on Stack Overflow and you'll find that this has been a problem from the beginning. Note how the user describes it? A "memory leak". There's the clear connotation that this is so unexpected that it falls into a class of common bug. And how did they find it? Their app was running out of memory frequently.

In fact, if you Google for "elixir commanded lifespan", there isn't a single relevant result that doesn't cast this as a "gotcha" or a "problem".

If we're dead-set on keeping the "default behavior is to never free memory" thing, perhaps we can update the documentation to make it more obvious. It's not even mentioned in the section on aggregates. It's only briefly mentioned in at the very bottom of the section on Command Handlers. Even then, it's pretty easy for a developer to miss that "By default an aggregate instance process will run indefinitely once started." means "Use this or you'll leak memory until you run out."

Or maybe we could alter Commanded.Aggregates.DefaultLifespan to emit a one-time warning the first time after_command/1 is called?

Just something to make users think about this if they miss that part of the documentation? Preferably before they have to explain to their boss why the tool they picked is crashing in production?

@yordis
Copy link
Contributor

yordis commented Aug 3, 2023

There is a miscommunication,

I'd rather not have "Every Commanded Application, with default settings, is a time bomb."

My reply was "not having Commanded Apps with default settings statements". It was NOT about sensitive defaults. I pointed out that Commanded has a different level of architecture; therefore, it needs those default settings (regardless of sensitivity settings or not)


I agree with you regarding "good defaults," The Account Balances or any other situation are well-known to me (I can put together your system since I have it all in my head, literally), so I get it, and I agree with you.

I don't like it, but 🤷🏻 hopefully, your feature in because it is much needed a long-time ago. The intent (at HMB) was for Seniors to provide better tools, hopefully mitigating these issues and giving better copy-pasting.

Just something to make users think about this if they miss that part of the documentation? Preferably before they have to explain to their boss why the tool they picked is crashing in production?

Tell Billy in Slack to pay attention and copy-paste the right stuff! It is always Billy's fault!

@slashdotdash
Copy link
Member

slashdotdash commented Jan 9, 2024

This PR doesn't address the legitimate concern raised by @jvantuyl that Commanded provides a default aggregate lifespan which keeps aggregate processes run indefinitely. I'm in agreement that it is OK for the default value to be changed. We could set it to an appropriate timeout value, something like 24 hours?

Secondly, the default aggregate lifespan configuration should be associated with a Commanded Application, rather than being a global value associated with the :commanded app. Such global configuration is undesirable (e.g. difficult or impossible to test). We can support per Commanded Application configuration by using the Commanded.Application.Config module and with appropriate documentation added to the Commanded.Application module docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants