
Incomplete and possibly poor advice on private endpoints with Hub & Spoke topology #4410

Open
fabio-s-franco opened this issue Mar 20, 2024 · 5 comments

@fabio-s-franco

fabio-s-franco commented Mar 20, 2024

I was looking for validation of what I believe is correct for my scenario, and eventually found this article:
Azure Private Link in a hub-and-spoke network

The scenario

The spoke is the main consumer and is highly dependent on a service that has no public endpoint accessibility, for example a storage account.
So naturally, since the storage account is tightly coupled to the spoke, it is only logical that the private link is exposed through a private endpoint in the spoke. This avoids all the problems that could arise if traffic had to hop to the hub.

Now, when we evolve from the simple scenario above and there is a need to provide accessibility also to a VPN peered on-prem network (not mission critical), the decision chart of the article suggests only one possible outcome:

The private endpoint should be in the Hub.

The problem

This advice fails to account for an important factor:

It introduces a single point of failure, and a disruption is much more likely when accessing a critical resource cross-vnet than intra-vnet.

The hop to the hub introduces many more variables that, in my opinion, dramatically increase the odds of disruption.
Off the top of my head, the extra factors introduced with that approach:

  • Changes on the Hub's firewall rules
  • Interruptions on vnet peering
  • Misconfiguration of route tables in the hub
  • Private DNS zone link disruptions
  • Hub vnet outages

I could go on, but I think the idea is clear.

The solution (in my opinion)

The advice should cover resources that are shared, but conceptually owned by a single spoke.
In this case, what I would do, and what I am trying to validate, is to have two private endpoints:

  • One private endpoint stays within the spoke and is used by services that already reside there. Very simple: there is no routing to be concerned about, the default rules of the vnet's routing table take care of it (unless explicitly overridden), and there are no extra hops.
  • One private endpoint is created in the hub to serve subnets that are external to the spoke that "owns" the resource.

Obviously this is an oversimplification. It also requires separate private DNS zones and the corresponding vnet associations, but I believe it introduces far fewer variables that can cause network disruptions towards the private resource.
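To make the idea concrete, here is a minimal Azure CLI sketch of the dual-endpoint setup. All resource groups, vnet/subnet names, and the storage account are hypothetical placeholders, not anything from the article:

```shell
# Sketch only: every name below (rg-spoke, rg-hub, stowner, vnet-*, snet-*) is a placeholder.
STORAGE_ID=$(az storage account show -g rg-spoke -n stowner --query id -o tsv)

# 1) Private endpoint inside the owning spoke, next to its consumers (no extra hops).
az network private-endpoint create \
  -g rg-spoke -n pe-storage-spoke \
  --vnet-name vnet-spoke --subnet snet-endpoints \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob \
  --connection-name conn-storage-spoke

# 2) Additional private endpoint in the hub, serving on-prem and the other spokes.
az network private-endpoint create \
  -g rg-hub -n pe-storage-hub \
  --vnet-name vnet-hub --subnet snet-endpoints \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob \
  --connection-name conn-storage-hub
```

Each endpoint gets its own NIC and private IP, so name resolution has to be split per vnet with separate same-name private DNS zones.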

Beyond suggesting that the documentation should be updated, I am also looking for feedback on how I am reasoning about this.

Any of it is appreciated.

Best,



@Naveenommi-MSFT Naveenommi-MSFT added cxp CXP team is reviewing doc-idea Suggestion for a new article triaged labels Mar 20, 2024
@Naveenommi-MSFT

@fabio-s-franco
Thanks for your feedback! We will investigate and update as appropriate.

@Jar1-MSFT Jar1-MSFT assigned jangelfdez and unassigned Jar1-MSFT Mar 21, 2024
@Jar1-MSFT Jar1-MSFT added assigned-to-author CXP assigned issue to author and removed cxp CXP team is reviewing labels Mar 21, 2024
@jangelfdez

jangelfdez commented Mar 21, 2024

Hi @fabio-s-franco!

The initial recommendation was based on the restrictions that existed at the time on the best way to route traffic for that specific scenario. I'm trying to remember the specific details, but I can't right now, three years after the initial version.

I moved out of the Networking team and @ivapplyr took ownership of this feature. Ivens, can you check if this recommendation is still valid nowadays or should we update the diagram?

Regards!

Edit: Thinking about it, I believe at that time traffic from on-premises to a PE could not pass across the hub to a spoke successfully without an NVA doing the routing. If you were not using an NVA, the only option was to deploy the endpoint in the hub.

@fabio-s-franco
Author

fabio-s-franco commented Mar 21, 2024

Hey @jangelfdez ,

Thanks for taking the time to reply, and I understand things change over time. For example, back then there was no storage.Global service, so it wasn't possible to create a private service connection from a region other than the one the storage account was created in.

Nevertheless, on-prem peering was just one example scenario. It would mean the same thing if, for example, you replaced the on-prem network with another spoke that is peered to the hub and plays a similar non-critical role.
Say there is an "Analytics" spoke that gathers metrics from all other spokes through their storage accounts.

Moving the private endpoint to the hub would still expose the "owner", mission-critical spoke to unnecessary risks.
And I am not suggesting avoiding the endpoint in the hub; I am suggesting that it should be an additional private endpoint.

It would satisfy the need for an NVA in the hub and mitigate peering risks for the spoke that is tightly coupled to the resource.

So in the case of a shared resource, I do believe that the proper advice should be:

  • Private endpoint in the hub
  • A private endpoint in every vnet that is critically dependent on or tightly coupled to the resource (one endpoint per vnet), with the caveat that this needs to be carefully considered so as not to trample the security paradigm that hub and spoke provides.
  • Although it would be technically possible to have multiple private endpoints representing multiple spokes that are critically dependent on the resource, I believe multiple spokes being critically dependent on a single resource is a design flaw worth a second look.

I say this because, as the hub incorporates more complex routing rules and peers more vnets, the risk of disruption (accidental or incidental) increases, and this approach would be immune to that increase in complexity.

@ivapplyr
Member

Thank you for looping me in @jangelfdez.
@fabio-s-franco I do appreciate the commentary here. We understand that many different configurations are possible in the hub and spoke model with private endpoints; the article in question was written to address the most common scenario with the least opportunity for misconfiguration.

It is recommended to have the private endpoint within the hub because that is the simplest configuration pattern for the majority of use cases, mainly due to the automated DNS benefits and the commonality of customer configurations. However, let's explore the scenario mentioned above with some examples.

Central Hub vs Central Spoke:

  • It is common in centralized networks to have a centralized hub that differentiates between your E/W and N/S traffic. Running traffic through a hub vnet provides cost optimizations, reduced complexity, and improved security compared to having dedicated dependencies within each spoke. We have found that large customers leverage this configuration the most.
  • However, without the use of vnet peerings, spoke-to-spoke communication is difficult. A client VM in spoke1 would need quite a few hops to connect to a PE in spoke2, and it becomes even more complex when both spokes have overlapping address spaces.

Which brings us to the second thought process: a private endpoint in the hub and one in the spoke.

  1. If establishing a connection with the same exact resource, there is not a way to support 2 A records on the same Private DNS Zone, as there are 2 separate IPs pointing to the same resource (see https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-dns).

There are customers who find success leveraging a decentralized topology like a mesh network. This offers resiliency and efficiency benefits, but comes with the understanding that compute vnets (vnets mainly containing client VMs) and PE vnets (vnets mainly containing private endpoints) will have some form of distinction. This helps ensure that you have a dedicated private endpoint per resource, so you only need to link private DNS zones to the relevant vnets.

@fabio-s-franco
Author

fabio-s-franco commented Mar 22, 2024

Hi @ivapplyr, thanks for the detailed explanation.

I understand the advantages of having a centralized location. It's just that, from my own experience setting this up, in this one particular scenario where several spokes need access to the same resource (but only one is critically dependent on it), my approach is far simpler and less prone to disruption, for the reasons I mentioned. If you add the fact that the spoke might be in a different region than the hub, there are not only connectivity risks but also added latency.

I appreciate that you can't cover all possible scenarios, but I do think it is relevant to mention that it is possible to have more than one private endpoint per resource (I didn't see it documented anywhere and had to work it out by trial and error).

If establishing a connection with the same exact resource, there is not a way to support 2 A records on the same Private DNS Zone

I know; that is why I have two private DNS zones with the same name:

  1. A private DNS zone created in a resource group of the spoke that owns the resource (like the storage account). This zone is linked only to the spoke vnet.
  2. A private DNS zone created in the hub. This one is linked to the hub and to the vnets of other spokes that also need access to the resource.
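A hedged Azure CLI sketch of that split-horizon DNS setup, assuming the two endpoints already exist; zone name follows the standard blob-storage private-link zone, while resource groups, vnets, and endpoint names are hypothetical placeholders:

```shell
# Sketch only: rg-spoke, rg-hub, vnet-*, pe-storage-* are placeholders.
ZONE=privatelink.blob.core.windows.net

# Zone 1: lives with the owning spoke, linked ONLY to the spoke vnet,
# so spoke workloads resolve the account to the spoke-local endpoint IP.
az network private-dns zone create -g rg-spoke -n "$ZONE"
az network private-dns link vnet create -g rg-spoke -n link-spoke \
  --zone-name "$ZONE" --virtual-network vnet-spoke --registration-enabled false
az network private-endpoint dns-zone-group create -g rg-spoke \
  --endpoint-name pe-storage-spoke --name default \
  --private-dns-zone "$ZONE" --zone-name blob

# Zone 2: same zone NAME, created in the hub, linked to the hub vnet and to
# the other spokes, which therefore resolve to the hub endpoint IP.
# (Use the full vnet resource ID when the vnet lives in another resource group.)
az network private-dns zone create -g rg-hub -n "$ZONE"
az network private-dns link vnet create -g rg-hub -n link-hub \
  --zone-name "$ZONE" --virtual-network vnet-hub --registration-enabled false
az network private-dns link vnet create -g rg-hub -n link-analytics \
  --zone-name "$ZONE" --virtual-network vnet-analytics --registration-enabled false
az network private-endpoint dns-zone-group create -g rg-hub \
  --endpoint-name pe-storage-hub --name default \
  --private-dns-zone "$ZONE" --zone-name blob
```

Because a vnet resolves through whichever zone it is linked to, the two same-name zones never conflict: each vnet sees exactly one A record for the account.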

It has already happened that a change in the topology disrupted connectivity between the spokes and the resource. That experience led me to conclude that a better approach is to give the spoke that is critically dependent on the resource a dedicated private endpoint in its own vnet.

A mesh network is an interesting concept, but I do appreciate the benefits of isolation, which is desirable security-wise. Especially if you would like relaxed constraints on a spoke (such as one hosting a development environment), which would otherwise be riskier if it were meshed with production spokes.

I may have sounded a bit harsh in my initial post (I only realized it after reading it again), and that was not my intention at all. I am taking the time to report this here and validate my thought process partly to help others who may encounter a similar situation.

Thanks for all the feedback, nonetheless.
