New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add compile-time config option to globally set aggregate lifespan #534
base: master
Are you sure you want to change the base?
add compile-time config option to globally set aggregate lifespan #534
Conversation
The default behavior of "aggregate processes stay around forever" can present difficulty in a production environment. For any system that creates a steady stream of new aggregates, memory creeps up slowly until exhausted. It is currently possible to override this on a per-router basis, but that only creates a situation where forgetting to specify it one place creates an even more subtle "memory leak". This commit allows changing the default globally at compile-time. It does not allow changing it at run-time, as this seemed like it would cause unpredictable behavior and possibly negatively impact performance. Testing this is difficult since it's a compile-time change. As a compromise, I set the value to the default explicitly in the testing config. While this doesn't fully test the functionality, it does at least work-out the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💜 🚀
Any idea what I need to do to get this merged? |
I think |
These types of configurations are a danger to be swapped at runtime, IMO. If somebody comes across an actual use case, maybe at that point could be refactored to allow it. |
I think that these types of configurations are actually used by commanded applications only. I created a commit to demo the changes: fahchen@7c1741f |
Hey, @slashdotdash Is there anything we could do to help you to get some love over here and get this feature released? |
If we can, I'd like to separate the "compile-time vs run-time" discussion from this PR. The compile-time option is useful enough to include whether or not we later want to add the option to change at run-time. In general, I'm of the opinion that frameworks should generally have "safe" defaults. This timeout behavior is effectively the only "non-safe" default I can think of in Commanded. Any application with a growing count of aggregates effectively leaks processes (and, thus, grows in memory usage until it runs out). I wouldn't be opposed to even adding a default timeout (maybe next version), since it shouldn't break most existing code. At most, it'll have some performance impact for people with very large numbers of events to a single aggregate and no snapshots. That seems (to me) like a reasonable price so that new Commanded apps don't basically start with a memory leak. But, yeah... if we can't have a safe default, it would be nice to at least be able to globally change the default for a given app so nothing can slip through unnoticed. |
I think this PR may not be a good approach. The approach I mentioned before it’s not about choosing between compile-time and runtime config, it’s try to respect the two reasons above. |
I see that you're looking to configure a specific value and you're wondering at which level this could be done. Well, you have a few options, each offering different degrees of control.
config :commanded, :aggregate_default_lifespan, Commanded.Aggregates.DefaultLifespan
config :myapp, MyApp.CommandedApp,
aggregate_default_lifespan: Commanded.Aggregates.DefaultLifespan
defmodule MyApp.BankAccount.Router do
use Commanded.Commands.Router
lifespan BankAccountAggregate, Commanded.Aggregates.DefaultLifespan
end So, depending on where you want to exercise control and your needs, you can choose the most suitable option.
You suggested that these should belong to a specific commanded application, rather than commanded itself. I see where you're coming from and agree that having control over such settings per application could indeed be quite useful. Consider, for instance, configuring it according to the Router. That could be a very viable option as well.
Let's address the point about each application being responsible for its own configuration. You're absolutely right that this is the general guideline outlined in the Elixir documentation. However, it's crucial to remember that this is a guideline and not a rule set in stone. In reality, there are configurations that are global to config :commanded, type_provider: MyApp.TypeProvider So, the Elixir documentation's statement regarding "application" needs to be understood in the right context. Elixir is referring to OTP applications, since he is indeed adding to It all boils down to the context and specific needs. There's no one-size-fits-all solution here. In other words, "It Depends ™️." Happy coding! 💜 💜 💜 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jvantuyl would you be OK trying to also support the configuration at the Commanded.Application
instance level?
For the use-case I'm most concerned with, Application-level would work. So I'll take it. Still not a fan of unsafe defaults. I'd rather not have "Every Commanded Application, with default settings, is a time bomb." be a true statement. And, given how most applications don't see the load in staging that they do in production... it's a time bomb that's most likely to go off in production. It's not great. |
Maybe we could divide these problems into two parts. |
This comment was marked as off-topic.
This comment was marked as off-topic.
@fahchen I do not think you understand honestly, but it is not that important. Yes, I agree with you, your proposal makes sense. |
@jvantuyl I would like to share some history about it based on HMB. It was critical to have such "sharding" opportunities at multiple levels and understand how scalable the codebase could be as the application performance changes without having to rearchitect or change the system drastically. It was part of the decision-making and reasons to adopt an entirely new stack. TL;DR about scaling the system over time: https://10consulting.com/2021/03/18/commanded-application-architecture/ You could disagree with it still, but everyone knew. By design, it is good these level configurations exist. Your concerns are valid, but it is a programmer problem rather than a technology problem. Commanded already allows you to do multi-app architecture styles. |
I don't particularly disagree that users should learn their system and architect it accordingly. And I'm a huge fan of systems that are flexible enough to handle multiple styles of multi-app architectures. But... "Having sharp edges coming out of the box" isn't a programmer problem. The rule of least astonishment is pretty well established. Defaults can surprise people. That's entirely possible. But it's still just poor ergonomics. Consider the example aggregates in the docs: bank accounts, account balances. Is there any real-world case where those should have infinite lifespans? If the developer is thinking of this as a "database", is there any other database or cache that you can think of that doesn't automatically reclaim memory as it runs? Don't just take it from me. Take a look at blog posts. One of the first ones you'll find is the excellent "Event Sourcing With Elixir" series by @saurdaukar. Aggregate Lifespans show up in the very last installment (part 7). How are they introduced? By an example of this exact behavior surprising the developer and them not noticing it until it was deployed and running for some time. It's even in a section that's explicitly about the gotcha's of running Commanded in production. Here @rodrigobotti runs across it when adopting Commanded. How does he introduce this? "It might not be that apparent but our application has both memory and performance problems coming from our aggregates." That's right. It has a "memory problem" and "it might not be apparent". This is not praise. How about when @ChristianAlexander mentions it in this blog post? He describes it as "That sounds like an OOM waiting to happen." Or here where the entire blog post is about how this was killing an app in production? Or search on Stack Overflow and you'll find that this has been a problem from the beginning. Note how the user describes it? A "memory leak". There's the clear connotation that this is so unexpected that it falls into a class of common bug. And how did they find it? Their app was running out of memory frequently. In fact, if you Google for "elixir commanded lifespan", there isn't a single relevant result that doesn't cast this as a "gotcha" or a "problem". If we're dead-set on keeping the "default behavior is to never free memory" thing, perhaps we can update the documentation to make it more obvious. It's not even mentioned in the section on aggregates. It's only briefly mentioned in at the very bottom of the section on Command Handlers. Even then, it's pretty easy for a developer to miss that "By default an aggregate instance process will run indefinitely once started." means "Use this or you'll leak memory until you run out." Or maybe we could alter Just something to make users think about this if they miss that part of the documentation? Preferably before they have to explain to their boss why the tool they picked is crashing in production? |
There is a miscommunication,
My reply was "not having Commanded Apps with default settings statements". It was NOT about sensitive defaults. I pointed out that Commanded has a different level of architecture; therefore, it needs those default settings (regardless of sensitivity settings or not) I agree with you regarding "good defaults," The Account Balances or any other situation are well-known to me (I can put together your system since I have it all in my head, literally), so I get it, and I agree with you. I don't like it, but 🤷🏻 hopefully, your feature in because it is much needed a long-time ago. The intent (at HMB) was for Seniors to provide better tools, hopefully mitigating these issues and giving better copy-pasting.
Tell Billy in Slack to pay attention and copy-paste the right stuff! It is always Billy's fault! |
This PR doesn't address the legitimate concern raised by @jvantuyl that Commanded provides a default aggregate lifespan which keeps aggregate processes run indefinitely. I'm in agreement that it is OK for the default value to be changed. We could set it to an appropriate timeout value, something like 24 hours? Secondly, the default aggregate lifespan configuration should be associated with a Commanded Application, rather than being a global value associated with the |
The default behavior of "aggregate processes stay around forever" can present difficulty in a production environment. For any system that creates a steady stream of new aggregates, memory creeps up slowly until exhausted.
It is currently possible to override this on a per-router basis, but that only creates a situation where forgetting to specify it one place creates an even more subtle "memory leak".
This commit allows changing the default globally at compile-time. It does not allow changing it at run-time, as this seemed like it would cause unpredictable behavior and possibly negatively impact performance.
Testing this is difficult since it's a compile-time change. As a compromise, I set the value to the default explicitly in the testing config. While this doesn't fully test the functionality, it does at least work-out the code.