
When to Use the Persistor?


An obvious question, given the sheer number of possible variants, is: which option is better, and what are their limitations?

Below are some of the most important things to keep in mind, depending on the variant used for a particular service:

Notes for BINDING Variants
GENERAL BINDING NOTES
  • Bear in mind that how well Azure Functions keep up with incoming traffic is limited by their scaling mechanism.
  • If you wish to store each message in its own individual blob, it is best to generate the functions with an output binding to a blob enabled; testing has shown this results in lower resource usage (see the sketch after these notes).
  • Wherever possible, enable a retry/max delivery attempt policy on your subscriptions. For the Event Hub this becomes problematic, because its checkpoints are always updated when a batch/event is handed to a function, regardless of what that function's result might ultimately be.
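To make the output-binding note above concrete, here is a minimal sketch (not the Persistor's actual code) of an Event Grid-triggered Python function that writes each event to its own blob purely through an output binding. The binding name outputBlob and the blob path are placeholder assumptions that would need a matching function.json entry.

    import json
    import azure.functions as func

    # Assumes a function.json that pairs an eventGridTrigger with a blob output
    # binding named "outputBlob", e.g. with a path like "container/{rand-guid}.json".
    def main(event: func.EventGridEvent, outputBlob: func.Out[str]) -> None:
        # Each event lands in its own blob via the binding, so the function body
        # never has to create a blob client itself.
        outputBlob.set(json.dumps({
            "id": event.id,
            "subject": event.subject,
            "eventType": event.event_type,
            "data": event.get_json(),
        }))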
EVENT GRID
  • Keep in mind that Azure Functions give far more weight to the number of events sent per second than to their size: each message triggers its own function execution. At around 5,000 events/second, the functions started failing to find workers and had to recover, which may lead to event loss if no Retry Policy is enabled.
EVENT HUB
  • If you intend to stream messages to storage and have a high message throughput, it is strongly suggested that you consider using the Event Hub Capture functionality or Data Explorer, if possible. Otherwise, the execution costs of Azure Functions may end up being quite high, depending on your actual traffic. In addition, if you are on the Consumption plan, your estimated costs will likely fluctuate from day to day, even if the rate of incoming messages never changes.
  • If you are on a Basic Hub namespace (given that the Basic-tier only allows for a single consumer group, you would be using the Persistor only as the final part of the pipeline) and/or do not wish to use the Capture/Data Explorer, consider using the PULL variant for higher traffic.
  • Increasing the number of partitions will improve the message output rate, but will inevitably cost more resources, due to the way Hub-bound functions scale.
  • Using the output binding option is an absolute must here if you wish to even get close to the ingress rate; otherwise, the output rate will be around three times lower than the input rate, regardless of message size (see the sketch after these notes).
  • You may further improve performance by editing the host.json file of your function app, as shown on this page.
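As a hedged illustration of the output-binding point above (a sketch rather than the Persistor's actual implementation), an Event Hub-triggered Python function with "cardinality": "many" can hand an entire batch to a blob output binding in one call. The binding name outputBlob and the blob path are assumed placeholders defined in function.json:

    import json
    from typing import List

    import azure.functions as func

    # Assumes a function.json with an eventHubTrigger ("cardinality": "many") and
    # a blob output binding named "outputBlob" (the blob path is a placeholder).
    def main(events: List[func.EventHubEvent], outputBlob: func.Out[str]) -> None:
        # Per the note above, writing through the output binding is what keeps the
        # output rate close to the ingress rate; the whole batch becomes one blob.
        bodies = [event.get_body().decode("utf-8") for event in events]
        outputBlob.set(json.dumps(bodies))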
SERVICE BUS
  • The Service Bus functions were generally stable, being able to keep up with the incoming rate of messages relatively well.
Notes for PULL Variants
GENERAL PULL NOTES
  • For either service, increasing the number of concurrent tasks per function did not seem to improve performance at all. As a result, we heavily recommend using the invoker: calling several HTTP functions through the invoker generally gives each of them its own instance.
  • Whenever concurrency is discussed below, it refers to the invoker variants.
  • Generally speaking, increasing the number of coroutines had a negative impact compared to running a single function with a single task. This behavior may improve on a different App Service plan, with the virtual machines running more than one CPU core.
  • The number of functions the invoker can trigger at a time is limited to 32.
  • The RECEIVE_DURATION parameter should be set regardless of which version of the Pull you use. HTTP-triggered functions on Azure time out after 230 seconds, regardless of the plan you are using. Setting this value is especially important for the Event Hub variant; otherwise, the client may keep running past the HTTP timeout, with no real control over the resources beyond that point (see the sketch after these notes).
  • It is not recommended to trigger the Invoker periodically through Logic Apps: both invocation and response take noticeably longer than when tested locally, which causes timeout issues.
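To illustrate why bounding the receive window matters for the Event Hub Pull, here is a minimal sketch using the azure-eventhub v5 SDK (not the Persistor's actual code; the connection string and hub name are placeholders, and the RECEIVE_DURATION name is borrowed from the setting above):

    import threading
    from azure.eventhub import EventHubConsumerClient

    RECEIVE_DURATION = 180  # seconds; kept safely below the 230 s HTTP timeout

    def on_event(partition_context, event):
        # Persist the event here, then checkpoint it.
        partition_context.update_checkpoint(event)

    client = EventHubConsumerClient.from_connection_string(
        conn_str="<EVENT_HUB_CONNECTION_STRING>",  # placeholder
        consumer_group="$Default",
        eventhub_name="<EVENT_HUB_NAME>",          # placeholder
    )

    # Close the client once the receive window elapses, so the HTTP-triggered
    # function returns before Azure cuts the request off.
    threading.Timer(RECEIVE_DURATION, client.close).start()

    with client:
        client.receive(on_event=on_event, starting_position="-1")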
EVENT HUB
  • A single running function pulls roughly 24K messages/minute on average at a message size of 2 kB, on a Hub with only two partitions. The number increases with the number of partitions.
  • During testing, the best performance was achieved at roughly 4 partitions, at around 42K messages/minute, after which the number fell; at 10 partitions, a single function could still pull 30K+ messages/minute on average (still above the two-partition figure).
  • When run with the invoker, performance proved inconsistent: only with two partitions did the pull performance double at 2 kB messages. Performance in this regard increases with a higher number of throughput units (TUs).
  • When it comes to duplicates, you can track the number of expected duplicates by checking the traces table in your App Insights. A typical such log looks like so:
RECEIVE CANCELLED PREMATURELY! EXPECT {N} MESSAGES TO BE DUPLICATED IN NEXT RUN!
  • Performance is generally not significantly improved when running more than N parallel functions, where N is the number of partitions on the Event Hub.
SERVICE BUS
  • A single function can, on average, pull around 26K messages/minute at a message size of 2 kB. (Bizarrely, this performance drops when the function is called through Logic Apps.) This number decreases as message size increases.
  • Generally, increasing the number of concurrent functions (with the invoker) seems to improve performance, albeit not to the point where N functions perform N times better than a single function. In fact, running more than one function at a time can throttle the Service Bus namespace, and is not encouraged.
  • To slightly improve performance, the Service Bus version prefetches 512 messages. To prevent prefetched messages from ending up on the dead-letter queue when the Persistor is stopped, make sure to set the maximum delivery count when configuring the subscription for the Persistor (see the sketch below).
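For reference, a minimal sketch of pulling with a prefetch of 512 and explicitly completing messages, so that prefetched-but-unsettled messages are not left to hit the dead-letter queue. This assumes the azure-servicebus v7 SDK and placeholder connection/topic/subscription names; it is not the Persistor's actual code:

    from azure.servicebus import ServiceBusClient

    CONN_STR = "<SERVICE_BUS_CONNECTION_STRING>"    # placeholder
    TOPIC_NAME = "<TOPIC_NAME>"                     # placeholder
    SUBSCRIPTION_NAME = "<PERSISTOR_SUBSCRIPTION>"  # placeholder

    with ServiceBusClient.from_connection_string(CONN_STR) as client:
        receiver = client.get_subscription_receiver(
            topic_name=TOPIC_NAME,
            subscription_name=SUBSCRIPTION_NAME,
            prefetch_count=512,  # mirrors the prefetch mentioned above
        )
        with receiver:
            while True:
                batch = receiver.receive_messages(max_message_count=100,
                                                  max_wait_time=5)
                if not batch:
                    break
                for message in batch:
                    payload = str(message)  # message body as text
                    # ... persist `payload` to storage here ...
                    receiver.complete_message(message)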
General Notes
  • Testing was done under the Consumption Plan.
  • The Storage Account used was V2.
  • During testing, we found a large number of ClientOtherError errors logged per minute on the Storage Account (the actual number depends on the service being tested and on whether the append blob feature is used; it is generally much larger for Event Grid). The number of these errors does not reflect the number of messages lost; in fact, it seems completely unrelated, as there were numerous tests in which all messages were stored despite the aforementioned errors. As of now, their cause is unknown. The Persistor is set up to catch and log exceptions relevant to actual storing; whatever this error is, it is caused by Persistor activity but does not seem to impact the Persistor's storage capabilities.
  • The following error will often be found in the traces table of your Application Insights. It is a well-known issue with Azure Functions written in Python:
    AI: Local storage access has resulted in an error (User: app) (CustomFolder: ). 
    If you want Application Insights SDK to store telemetry locally on disk in case of transient network issues please give the process access to %LOCALAPPDATA% or %TEMP% folder. 
    If application is running in non-windows platform, create StorageFolder yourself, and set ServerTelemetryChannel.StorageFolder to the custom folder name. 
    After you gave access to the folder you need to restart the process. 
    Currently monitoring will continue but if telemetry cannot be sent it will be dropped. 
    Error message: .