Pet Set in beta #28718

Closed · bprashanth opened this issue Jul 8, 2016 · 67 comments

@bprashanth (Contributor) commented Jul 8, 2016

Important petset issues in no particular order:

  1. Local storage: There should be a local storage persistent volume #7562
  2. Upgrade: Pet set upgrades #28706
  3. Richer events: Feature request: A way to signal pods #24957
  4. Public ips: public network identities for pets #28660
  5. Leader election and failover: Support master election #1542, Set pod conditions from a container probe #28658
  6. Non-database examples: e.g. Kafka on top of a zookeeper petset (Support Kafka in PetSet #23794)
  7. Petset fencing: PetSet fencing #31762

IMO the following features would make PetSet more usable and should gate beta:

  1. Local storage in some form, so people can deploy databases without cloud provider help
  2. Basic rolling-update style upgrades as described in (2)
  3. 1 non-database example like Kafka (which might bring more feature requests with it)
  4. Fencing?

Richer events would be really nice, but I don't think it blocks beta unless we need it for 2 or 3. Lack of public IPs will probably block WAN deployments.

@smarterclayton @kubernetes/sig-apps anything else?

@chrislovecnm (Contributor)

Sticky IPs ;) I don't think Cassandra and other tech are going to play nice without IP addresses. Java DNS is hosed, and products that have been around a while still use IP addresses. I think Elastic does as well. Need to check.

@chrislovecnm (Contributor)

Btw, unless someone gets to it, I will ask whether my client or I can contribute Kafka on PetSet. We are going to be using it in production soon.

@bprashanth (Contributor, Author)

> Btw, unless someone gets to it, I will ask whether my client or I can contribute Kafka on PetSet.

Go for it. Keep communicating hurdles you run into.

> Sticky IPs ;) I don't think Cassandra and other tech are going to play nice without IP addresses. Java DNS is hosed.

Duly noted. grr.

@chrislovecnm (Contributor)

@bprashanth do you need an issue for sticky IP addresses?

@smarterclayton (Contributor)

We've had lots of very positive results with low-TTL DNS in Java with JBoss and other portfolio products - I'm not convinced it can't be fixed.


@smarterclayton (Contributor)

I'd put reducing the complexity required to make a "correct" petset at the top of the list - I'd even be willing to postpone other more sophisticated operationalization things in favor of that. If someone has to be an expert at distributed systems to build a correct PetSet, we'll probably not succeed.


@chrislovecnm (Contributor)

@smarterclayton it isn't about whether it can be fixed; it's about how distributed systems have implemented it. I need to check on Elastic, but I am 90% certain that Cassandra is not going to fix it. Going to file an issue with them anyway.

@bprashanth (Contributor, Author)

> I'd put reducing the complexity required to make a "correct" petset at the top of the list - I'd even be willing to postpone other more sophisticated operationalization things in favor of that. If someone has to be an expert at distributed systems to build a correct PetSet, we'll probably not succeed.

This is less of a problem for modern databases like etcd, but it is a problem by and large.

With a better event model the system can pass down observations about the current state of the cluster to individual members. The exact events will probably depend on the type of PetSet. Obviously we can't have a type=Cassandra and say "tell me when the seeds are down", but we can say something like: indices 0 and 1 are always seeds; send all members of the set a role-based event if they fail a probe.

This requires bidirectional event/response infrastructure. I think the easiest way to do this is with a sidecar (might need shared PID namespaces). For example, we could add the following fields to petset:

checkIndices: 0
jsonProbe:
  httpGet:
    path: /

Where the json probe returns whatever metadata a member needs to elect itself into that role, if any (e.g. a transaction id). All the sidecars would then heartbeat the probe and race for the position on failure. The winner writes its index into checkIndices and the process continues.

Did you mean something like this? Getting it right is tricky (shortcomings of an HTTP health check, flapping leaders, leader resurrection, etc.), but it's probably what a lot of people are doing themselves.
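Pulling that together, a purely hypothetical sketch of how those fields might sit in a PetSet spec - checkIndices and jsonProbe do not exist in the API, and the image, path, and port here are placeholders:

```yaml
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  # Hypothetical: index currently holding the elected role; the winning
  # sidecar writes its own index here after a successful race.
  checkIndices: 0
  # Hypothetical: probe returning the metadata a member needs to elect
  # itself into the role, if any (e.g. a transaction id).
  jsonProbe:
    httpGet:
      path: /election-state
      port: 8080
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: example/db:latest   # placeholder image
```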

@magicwang-cn (Contributor)

@kubernetes/huawei

@haizaar commented Jul 12, 2016

ElasticSearch uses a dedicated k8s plugin for node discovery. We are using it now on k8s 1.2, and we don't need sticky IPs for that.
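For reference, a sketch of the elasticsearch.yml settings that style of discovery plugin uses - the key names follow the fabric8 elasticsearch-cloud-kubernetes plugin and should be treated as assumptions, not exact syntax:

```yaml
# Discovery through the Kubernetes API instead of sticky IPs or multicast:
# the plugin lists the endpoints of a Service to find peer ES nodes.
cloud:
  kubernetes:
    service: elasticsearch-discovery   # Service whose endpoints are ES nodes
    namespace: default
discovery:
  type: kubernetes
```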

@chrislovecnm (Contributor)

@haizaar you know of any other use case by any chance?

@haizaar commented Jul 12, 2016

No :(

We are an ES-only shop.

@damoon commented Jul 14, 2016

Ceph monitors and etcd need sticky IPs as far as I know, or at least fixed DNS names. Currently I use a Service per single-pod Deployment, with a hostPath volume and a nodeSelector.
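A sketch of that workaround with hypothetical names: one single-replica Deployment pinned to a node, storing data on a hostPath volume, fronted by its own Service for a stable address:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-0
spec:
  selector:
    app: etcd-0          # one Service per single-pod Deployment
  ports:
  - port: 2379
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: etcd-0
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: etcd-0
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1   # pin to the node holding the data
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:v2.3.7
        volumeMounts:
        - name: data
          mountPath: /var/lib/etcd
      volumes:
      - name: data
        hostPath:
          path: /var/lib/etcd            # node-local storage
```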

@bprashanth (Contributor, Author)

Petset already gives you fixed DNS names: http://kubernetes.io/docs/user-guide/petset/#network-identity. If you have some working form of Ceph with petset, I'd be glad to help work out the kinks and understand what we need to grow petset. I believe there's a prototype etcd in the works: kubernetes-retired/contrib#1295
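For context, those fixed names come from the governing headless Service; a minimal sketch using the nginx example from the linked docs:

```yaml
# Each pet gets a stable DNS name of the form
#   <pet-name>.<service-name>.<namespace>.svc.cluster.local
# e.g. web-0.nginx.default.svc.cluster.local for the first pet.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None   # headless: DNS resolves to the pets themselves
  selector:
    app: nginx
  ports:
  - port: 80
```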

@chrislovecnm (Contributor)

@damoon I am looking for a use case that requires IP addresses to be sticky. Otherwise, I will not ask for it. I think we are coming up dry.

@chrislovecnm (Contributor)

@bprashanth what about autoscaling and deletes? Do we want to keep PetSet deletions manual?

@bprashanth (Contributor, Author)

Most pets are not going to run amok in the 1000s. For those that do, you can write a custom autoscaler. Ideally we would integrate with HPA, but unless someone has hard use cases for a broad category of pets that need autoscaling, I'm tempted to punt on that for beta.

We should add a flag that allows auto-GC of the PVC when a pet is killed.

@chrislovecnm (Contributor) commented Jul 15, 2016

@bprashanth I kinda don't agree. These are persistent data stores, and having autoscaling based on memory or CPU would be HUGE. These apps are the backbone of most systems, and often a scaling challenge.

@bprashanth (Contributor, Author) commented Jul 15, 2016

HPA already knows how to scale RCs; teaching it to bump up the replica count on a petset can happen without any additional features or API changes, if someone does the plumbing. @kubernetes/autoscaling

Autoscaling across zones/regions is a different story (i.e. mysql-us-central is saturated, so spin up a new petset for mysql-asia and have it replicate data). For that we need WAN deployment prototypes.
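If someone did that plumbing, the result might look like an ordinary HPA pointed at a petset - a purely hypothetical sketch, since HPA could not target PetSet at the time:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: mysql
spec:
  scaleTargetRef:
    apiVersion: apps/v1alpha1
    kind: PetSet        # hypothetical target kind: HPA only understood RCs here
    name: mysql
  minReplicas: 3
  maxReplicas: 9
  targetCPUUtilizationPercentage: 80
```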

@smarterclayton (Contributor)

Not "probably" - they are, and having to rediscover every failure. Distributed systems need training wheels, and Kubernetes needs to be the tricycle.


@damoon commented Jul 15, 2016

Ceph: http://docs.ceph.com/docs/hammer/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address - Ceph mons NEED static IPs.

etcd: fixed DNS names are good enough, but it seems a DNS request from a pod targeting itself fails for some reason. It looks like kube-dns tries to prevent services from calling themselves recursively.

@bprashanth (Contributor, Author)

Let's maintain a list of apps that need sticky IPs on the sticky IPs bug (#28969), so we can at least start by cautioning users in documentation. Currently you can create a Service per pet for sticky IPs.

@damoon can you comment on the deployment style you'd use for Ceph? It sounds like one might want to run Ceph as a DaemonSet and only the monitors as a PetSet, if PetSet had sticky IPs.
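A sketch of that Service-per-pet workaround; the per-pet label is an assumption, since PetSet did not stamp a unique per-pod label at the time, so you would have to apply one out of band:

```yaml
# One Service per pet: its clusterIP is a stable virtual IP for that pet.
apiVersion: v1
kind: Service
metadata:
  name: ceph-mon-0
spec:
  selector:
    app: ceph-mon
    pet: ceph-mon-0   # hypothetical label identifying exactly one pet
  ports:
  - port: 6789
```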

@magicwang-cn (Contributor) commented Jul 16, 2016

I think PetSet could add the concept of a Role, which represents a component of an app; every role has its own (Replicas, PodSpec, etc.). For example:

  • mongodb is the composition of config server, router, and shard; these three components can be three roles.
  • etcd has just one role.

What's more, we could supply a common image (like peer) that finds every role's identity information (like DNS names) via init containers and stores it in a specific config file, so users can read that information from the config file and start their apps themselves.
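A hypothetical sketch of what such a roles field might look like, using the mongodb example above - none of this exists in the PetSet API:

```yaml
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: mongodb
spec:
  roles:                  # hypothetical field: one entry per component
  - name: configsvr
    replicas: 3
    # PodSpec for the config servers would go here
  - name: router
    replicas: 2
    # PodSpec for the mongos routers would go here
  - name: shard
    replicas: 3
    # PodSpec for the shard servers would go here
```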

@leseb commented Jul 18, 2016

@bprashanth basically, when it comes to Ceph monitor nodes, once they get deployed their IP address cannot be changed. That's why we need sticky IP addresses. Other components can have their IPs changed.

@erictune (Member) commented Oct 27, 2016

My take on beta:

Before Code Freeze:

After Code Freeze but before 1.5:

  • Document and test the procedure to convert resources from PetSet/v1alpha1 to StatefulSet/v1beta1 whilst upgrading from 1.4 to 1.5.
  • Greatly expand documentation on best practices for PetSets, reasoning about availability in the face of node updates, debugging, etc, etc...
  • Rename PetSet to StatefulSet in all kubernetes.github.io docs

Right after 1.5:

  • convert incubating Charts that use PetSet to StatefulSet and promote to stable if stable (@viglesiasce will lead this)

Soon after that:

  • move filenames that contain pet to new names.
  • more examples/Charts for more apps.

Everything else is for second beta or GA or not doing.

@smarterclayton

@bprashanth (Contributor, Author)

There's something else we (well, I) kind of (read: completely) forgot: a useful feature we added to Services. The annotation is shown in many petset examples: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/testing-manifests/petset/zookeeper/service.yaml#L6

It enables pets to make decisions about their peers being ready/unready using internal protocols, so, e.g., an HTTP readiness probe timeout doesn't end up removing the DNS records for zookeeper-0, forcing a re-election.

You can still create petsets without this annotation, of course; just don't give them a readiness probe. But that means no service can leverage the magic of a readiness probe (i.e. there are 2 kinds of services: the governing Service, which always needs DNS, and any other type of overlay Service, which should ideally not direct traffic to an "unhealthy" pet).
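For reference, the annotation as it appears in the linked zookeeper example, on the governing headless Service (ports abbreviated):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zk
  annotations:
    # Publish DNS records for pets even when they fail readiness, so a
    # probe timeout doesn't remove zookeeper-0 and force a re-election.
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None   # governing headless Service
  selector:
    app: zk
  ports:
  - port: 2888
    name: peer
```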

Thoughts on renaming this to beta.services? (#35713)

@chrislovecnm (Contributor)

Guys, this is a show stopper: #33727. Can we get it in by code freeze and backported? We are about to go into prod with a bunch of pets, and seamless upgrades are super important to our SLA.

@alexouzounis

For us, being able to get the index/ordinal in an environment variable would be very useful. #30427 is tracking this, and it would be great if it were included in 1.5.
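Until #30427 lands, a common workaround sketch is to derive the ordinal from the pet's stable hostname at startup - my-app and its flag are placeholders:

```yaml
containers:
- name: app
  image: example/my-app:latest   # placeholder image
  command:
  - sh
  - -c
  - |
    # Pet hostnames are <petset-name>-<ordinal>, e.g. web-0, web-1, ...
    ORDINAL="${HOSTNAME##*-}"
    echo "starting with ordinal ${ORDINAL}"
    exec my-app --index="${ORDINAL}"
```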

@smarterclayton (Contributor)

The API changes are too significant and we're past the window for that (in terms of review bandwidth). It's just not going to make it for 1.5.


k8s-github-robot pushed a commit that referenced this issue Nov 3, 2016
Automatic merge from submit-queue

Move Statefulset (previously PetSet) to v1beta1

**What this PR does / why we need it**: #28718

**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes #

**Special notes for your reviewer**: depends on #35663 (PetSet rename)
cc @erictune @foxish @kubernetes/sig-apps 

**Release note**:

``` release-note
v1beta1/StatefulSet replaces v1alpha1/PetSet.
```
@erictune (Member) commented Nov 9, 2016

I've created a detailed documentation plan for PetSets, to be completed by 1.5 docs freeze: kubernetes/website#1655

@dims (Member) commented Nov 17, 2016

Moving to 1.6 per @smarterclayton's comment.

@dims modified the milestones: v1.6, v1.5 (Nov 17, 2016)
@bprashanth (Contributor, Author)

I think Clayton was talking about #28718 (comment). PetSet should be in beta under the alias "StatefulSet" in 1.5. I'll leave it to the people actively working on the transition to decide when to close this, and what to migrate out into a "PetSet in GA" bug.

@dims (Member) commented Nov 18, 2016

Ack @bprashanth. At least it does not sound like a stop-ship for 1.5, so marking as non-release-blocker. Please correct me if I am mistaken.

@Stono commented Dec 5, 2016

I need this in my Production life.

@dims (Member) commented Dec 9, 2016

@bprashanth @foxish Is it appropriate to move this to the next milestone or clear the 1.5 milestone? (and remove the non-release-blocker tag as well)

@bprashanth (Contributor, Author)

PetSet is beta in 1.5, and I think all issues discussed here have spin-off bugs.
