The Orleans/ShoppingCart setup fails when deployed to a Web App on both Production and Staging slots #5625

Open
ArunBee84 opened this issue Jan 5, 2023 · 12 comments

Comments

@ArunBee84

ArunBee84 commented Jan 5, 2023

Issue description
Using the project Orleans/ShoppingCart, I was able to build and deploy successfully on an Azure Web App. To build the new environment on Azure, I used the Bicep templates mentioned here. Everything worked fine up to this point.
Then I created a staging slot, copying the production slot configuration, and deployed the same ShoppingCart build. The staging slot silo is stuck in the Joining state with the error below.

Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S172.17.0.254:20106:31922292. See InnerException
 ---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 172.17.0.254:20106. Error: AccessDenied
   at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 54
   at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 61
   at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 193
   --- End of inner exception stack trace ---
   at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 221
   at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 106
   at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 224

If I open Kudu on the staging slot and try to ping the production silo address, I get the following error:

PS C:\home> tcpping 172.17.0.254:20106
tcpping 172.17.0.254:20106
Connection attempt failed: An attempt was made to access a socket in a way forbidden by its access permissions 172.17.0.254:20106

Anyone got any ideas as to what could be wrong?

@IEvangelist
Member

I'm not entirely sure; let's ask @bradygaster and @ReubenBond for their thoughts on this.

@IEvangelist
Member

Seems to have been originally reported here: Azure-Samples/Orleans-Cluster-on-Azure-App-Service#3

@IEvangelist
Member

Hi @ArunBee84 - thank you for posting this issue; I see that you also posted it on the Azure-Samples repo. Sorry for not seeing this sooner. In talking this over with the team, they're suggesting that you try configuring two different silo names so that they aren't trying to dial up the same cluster. Does that make sense?

@ReubenBond
Member

I would guess that your staging slot and production slot have the same ClusterId set, so they are trying to form a single cluster... but they cannot because they have no network connectivity between them.
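
For illustration, here is a minimal Bicep sketch of giving the staging slot a cluster id distinct from production via a slot-level app setting. The setting name ORLEANS_CLUSTER_ID is an assumption, not something the sample is confirmed to read; the silo code would have to map it onto ClusterOptions.ClusterId.

```bicep
// Sketch only: assumes the silo reads its ClusterId from an app setting named
// ORLEANS_CLUSTER_ID (a hypothetical name, not confirmed to be in the sample).
param webAppName string

resource webApp 'Microsoft.Web/sites@2022-03-01' existing = {
  name: webAppName
}

resource stagingSlot 'Microsoft.Web/sites/slots@2022-03-01' existing = {
  parent: webApp
  name: 'staging'
}

// Give the staging slot a cluster id that differs from production so the two
// slots form separate Orleans clusters instead of trying to join each other.
// Note: deploying the 'appsettings' config resource replaces the slot's full
// set of app settings, so in a real template include the other settings too.
resource stagingAppSettings 'Microsoft.Web/sites/slots/config@2022-03-01' = {
  parent: stagingSlot
  name: 'appsettings'
  properties: {
    ORLEANS_CLUSTER_ID: 'shoppingcart-staging'
  }
}
```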

@windperson

windperson commented Jan 5, 2023

You need to set up VNet integration on the staging slot as well; just creating the deployment slot and copying the app configuration does not set up VNet integration for it. I've made an example project in which both the production slot and the staging slot have two instances running simultaneously. The Bicep code is here.
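
For reference, a rough Bicep sketch of what wiring VNet integration for the staging slot could look like; the slot name, parameters, and subnet here are placeholders, not values taken from the sample's templates:

```bicep
param webAppName string
param location string = resourceGroup().location
// Resource ID of a subnet delegated to Microsoft.Web/serverFarms.
param stagingSubnetId string

resource webApp 'Microsoft.Web/sites@2022-03-01' existing = {
  name: webAppName
}

// Creating a deployment slot and copying app settings does not carry over the
// parent site's VNet integration; the slot needs its own integration configured.
resource stagingSlot 'Microsoft.Web/sites/slots@2022-03-01' = {
  parent: webApp
  name: 'staging'
  location: location
  properties: {
    virtualNetworkSubnetId: stagingSubnetId
    siteConfig: {
      vnetRouteAllEnabled: true
    }
  }
}
```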

@ArunBee84
Author

> You need to set up VNet integration on the staging slot as well; just creating the deployment slot and copying the app configuration does not set up VNet integration for it. I've made an example project in which both the production slot and the staging slot have two instances running simultaneously. The Bicep code is here.

Hi @windperson,
Sorry, I missed mentioning this before, but I did verify that the staging silo is also connected to the same subnet as the production silo. Here is a screenshot from the Azure Storage table showing the silo instances that were created:
[screenshot: silo instance rows in the Azure Storage table]

The silo with status Joining is the staging instance. As you can see, its IP address is the same as the production silo's, but its status remains Joining.

@andrethy

andrethy commented Jan 6, 2023

> Hi @ArunBee84 - thank you for posting this issue; I see that you also posted it on the Azure-Samples repo. Sorry for not seeing this sooner. In talking this over with the team, they're suggesting that you try configuring two different silo names so that they aren't trying to dial up the same cluster. Does that make sense?

Colleague of @ArunBee84 here

The issue with that would be that the staging slot connects to the same Service Bus as the production slot and would therefore activate grains, but it would be spawning new grains rather than continuing from the same state, since the grains would live in a new cluster.
This could be circumvented by ensuring that the staging slot doesn't connect to our Service Bus, hence not activating any grains, but we would like to figure out why the staging silo has no connectivity to the production cluster, as @ReubenBond also mentions.

Do we know if this is expected behavior?

@ArunBee84
Author

> I would guess that your staging slot and production slot have the same ClusterId set, so they are trying to form a single cluster... but they cannot because they have no network connectivity between them.

Hi @ReubenBond,
Both the production and staging slots are connected to the same VNet/subnet, with no NSG rules applied to them.

@ArunBee84
Author

Hi Guys,

Any update on this issue?

@bradygaster
Member

Adding @btardif to this discussion to get additional eyes on this issue from the App Service team side. I'm going to deploy a slotted instance of this app to see if I can emulate this environmental setup and replicate the issue. @windperson - your Bicep code - does that represent the entire topology deployed in a slotted instance? If not, I think I'll update this sample to reflect just that, so I'd appreciate any pull requests if you have that Bicep available.

@windperson

windperson commented Jan 16, 2023

Hi @bradygaster
The Bicep sample is from my Chinese article series, which should be re-published as a printed book in the next few months.
It produces a lab resource group with the Azure resources shown in the following picture:
https://github.com/windperson/2022ithome_30days/blob/main/articles/day37/OrleansUrlShortener.svg
The Azure App Service contains a production slot and a staging slot, each with two instances running. You can take that as part of the official sample if you like 😄👌

@bradygaster
Member

bradygaster commented Jan 16, 2023

In my fork of this repo, I've created a slots branch in which I create a slotted version of the site. I've tweaked the code so that this is the setup:

  • There exists a default subnet and a staging subnet
  • Each slot is in its own subnet
  • Each slot is in its own cluster
  • The cluster ID is a slot configuration setting (sketched below), so the production slot always hits production data, and the staging slot always hits staging data

I know this topology is somewhat different from the one @ArunBee84 described, but I wanted to see if it would mitigate some of the issues folks have run into in this thread and in the original issue. cc @btardif to see if he has any recommendations on this front, and @IEvangelist, as I think it'd be good to create a PR to the main fork of this sample, but that would require some doc updates, so I'd prefer to coordinate those together.
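
As an illustration of that layout, here is a hedged Bicep sketch with a default and a staging subnet plus a slot-sticky cluster-id setting. The resource names, address ranges, and the ORLEANS_CLUSTER_ID setting name are illustrative, not taken from the slots branch:

```bicep
param location string = resourceGroup().location
param webAppName string

// One VNet with a subnet per slot, both delegated to App Service,
// so each slot gets its own regional VNet integration target.
resource vnet 'Microsoft.Network/virtualNetworks@2022-07-01' = {
  name: 'orleans-vnet'
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [ '10.0.0.0/16' ]
    }
    subnets: [
      {
        name: 'default'
        properties: {
          addressPrefix: '10.0.1.0/24'
          delegations: [
            {
              name: 'appServiceDelegation'
              properties: { serviceName: 'Microsoft.Web/serverFarms' }
            }
          ]
        }
      }
      {
        name: 'staging'
        properties: {
          addressPrefix: '10.0.2.0/24'
          delegations: [
            {
              name: 'appServiceDelegation'
              properties: { serviceName: 'Microsoft.Web/serverFarms' }
            }
          ]
        }
      }
    ]
  }
}

resource webApp 'Microsoft.Web/sites@2022-03-01' existing = {
  name: webAppName
}

// Keep the (hypothetical) cluster-id app setting pinned to its slot so a swap
// moves code between slots without also swapping cluster identities.
resource stickySettings 'Microsoft.Web/sites/config@2022-03-01' = {
  parent: webApp
  name: 'slotConfigNames'
  properties: {
    appSettingNames: [
      'ORLEANS_CLUSTER_ID'
    ]
  }
}
```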
