Pet Set in beta #28718

Closed · bprashanth opened this issue Jul 8, 2016 · 67 comments

@bprashanth (Contributor) commented Jul 8, 2016

Important petset issues in no particular order:

  1. Local storage: There should be a local storage persistent volume #7562
  2. Upgrade: Pet set upgrades #28706
  3. Richer events: Feature request: A way to signal pods #24957
  4. Public ips: public network identities for pets #28660
  5. Leader election and failover: Support master election #1542, Set pod conditions from a container probe #28658
  6. Non-database examples: e.g. Kafka on top of a zookeeper petset (Support Kafka in PetSet #23794)
  7. Petset fencing: PetSet fencing #31762

IMO the following features would make PetSet more usable and should gate beta:

  1. Local storage in some form, so people can deploy databases without cloud provider help
  2. Basic rolling-update style upgrades as described in (2)
  3. 1 non-database example like Kafka (which might bring more feature requests with it)
  4. Fencing?

Richer events would be really nice, but I don't think it blocks beta unless we need it for 2 or 3. Lack of public IPs will probably block WAN deployments.

@smarterclayton @kubernetes/sig-apps anything else?

@chrislovecnm (Contributor)

Sticky IPs ;) I don't think Cassandra and other tech are going to play nice without IP addresses. Java DNS is hosed, and products that have been around a while still use IP addresses. I think Elastic does as well. Need to check.

@chrislovecnm (Contributor)

Btw, unless someone gets to it, I will ask whether my client or I can contribute Kafka on PetSet. We are going to be using it in production soon.

@bprashanth (Contributor, Author)

> Btw, unless someone gets to it, I will ask whether my client or I can contribute Kafka on PetSet.

Go for it. Keep communicating hurdles you run into.

> Sticky IPs ;) I don't think Cassandra and other tech are going to play nice without IP addresses. Java DNS is hosed.

Duly noted. grr.

@chrislovecnm (Contributor)

@bprashanth do you need an issue for sticky IP addresses?

@smarterclayton (Contributor)

We've had lots of very positive results with low-TTL DNS in Java with JBoss and other portfolio products - I'm not convinced it can't be fixed.


@smarterclayton (Contributor)

I'd put reducing the complexity required to make a "correct" petset at the top of the list - I'd even be willing to postpone other more sophisticated operationalization things in favor of that. If someone has to be an expert at distributed systems to build a correct PetSet, we'll probably not succeed.


@chrislovecnm (Contributor)

@smarterclayton it isn't about whether it can be fixed; it's about how distributed systems have implemented it. I need to check on Elastic, but I am 90% certain that Cassandra is not going to fix it. Going to file an issue with them anyway.

@bprashanth (Contributor, Author)

> I'd put reducing the complexity required to make a "correct" petset at the top of the list - I'd even be willing to postpone other more sophisticated operationalization things in favor of that. If someone has to be an expert at distributed systems to build a correct PetSet, we'll probably not succeed.

This is less of a problem for modern databases like etcd, but it is a problem by and large.

With a better event model the system can pass down observations about the current state of the cluster to individual members. The exact events will probably depend on the type of PetSet. Obviously we can't have a type=Cassandra and say "tell me when the seeds are down", but we can say something like: indices 0 and 1 are always seeds; send all members of the set a role-based event if they fail a probe.

This requires bidirectional event/response infrastructure. I think the easiest way to do this is with a sidecar (might need shared PID namespaces). For example, we could add the following fields to petset:

checkIndices: 0
jsonProbe:
  httpGet:
    path: /

Where the json probe returns whatever metadata a member needs to elect itself into that role, if any (e.g. a transaction id). All the sidecars would then heartbeat the probe and race for the position on failure. The winner writes its index into checkIndices and the process continues.

Did you mean something like this? Getting it right is tricky (shortcomings of an HTTP health check, flapping leaders, leader resurrection, etc.), but it's probably what a lot of people are doing themselves.
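Pulling that together, a purely hypothetical sketch of how those fields might sit in a PetSet spec - checkIndices and jsonProbe do not exist in the API, and the image, path, and port here are placeholders:

```yaml
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  # Hypothetical: index currently holding the elected role; the winning
  # sidecar writes its own index here after a successful race.
  checkIndices: 0
  # Hypothetical: probe returning the metadata a member needs to elect
  # itself into the role, if any (e.g. a transaction id).
  jsonProbe:
    httpGet:
      path: /election-state
      port: 8080
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: example/db:latest   # placeholder image
```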

@magicwang-cn (Contributor)

@kubernetes/huawei

@haizaar commented Jul 12, 2016

ElasticSearch uses a dedicated k8s plugin for node discovery. We are using it now on k8s 1.2, and we don't need sticky IPs for that.
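For reference, a sketch of the elasticsearch.yml settings that style of discovery plugin uses - the key names follow the fabric8 elasticsearch-cloud-kubernetes plugin and should be treated as assumptions, not exact syntax:

```yaml
# Discovery through the Kubernetes API instead of sticky IPs or multicast:
# the plugin lists the endpoints of a Service to find peer ES nodes.
cloud:
  kubernetes:
    service: elasticsearch-discovery   # Service whose endpoints are ES nodes
    namespace: default
discovery:
  type: kubernetes
```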

@chrislovecnm (Contributor)

@haizaar you know of any other use case by any chance?

@haizaar commented Jul 12, 2016

No :(

We are an ES-only shop.

@damoon commented Jul 14, 2016

Ceph monitors and etcd need sticky IPs as far as I know, or at least fixed DNS names. Currently I use a Service per single-pod Deployment, with a hostPath volume and a nodeSelector.
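A sketch of that workaround with hypothetical names: one single-replica Deployment pinned to a node, storing data on a hostPath volume, fronted by its own Service for a stable address:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-0
spec:
  selector:
    app: etcd-0          # one Service per single-pod Deployment
  ports:
  - port: 2379
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: etcd-0
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: etcd-0
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1   # pin to the node holding the data
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:v2.3.7
        volumeMounts:
        - name: data
          mountPath: /var/lib/etcd
      volumes:
      - name: data
        hostPath:
          path: /var/lib/etcd            # node-local storage
```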

@bprashanth (Contributor, Author)

Petset already gives you fixed DNS names: http://kubernetes.io/docs/user-guide/petset/#network-identity. If you have some working form of Ceph with petset, I'd be glad to help work out the kinks and understand what we need to grow petset. I believe there's a prototype etcd in the works: kubernetes-retired/contrib#1295
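For context, those fixed names come from the governing headless Service; a minimal sketch using the nginx example from the linked docs:

```yaml
# Each pet gets a stable DNS name of the form
#   <pet-name>.<service-name>.<namespace>.svc.cluster.local
# e.g. web-0.nginx.default.svc.cluster.local for the first pet.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None   # headless: DNS resolves to the pets themselves
  selector:
    app: nginx
  ports:
  - port: 80
```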

@chrislovecnm (Contributor)

@damoon I am looking for a use case that requires IP addresses to be sticky. Otherwise, I will not ask for it. I think we are coming up dry.

@chrislovecnm (Contributor)

@bprashanth what about autoscaling and deletes? Do we want to keep PetSet deletions manual?

@bprashanth (Contributor, Author)

Most pets are not going to run amok in the 1000s. For those that do, you can write a custom autoscaler. Ideally we would integrate with HPA, but unless someone has hard use cases for a broad category of pets that need autoscaling, I'm tempted to punt on that for beta.

We should add a flag that allows auto-GC of the PVC when a pet is killed.

@chrislovecnm (Contributor) commented Jul 15, 2016

@bprashanth I kinda don't agree. These are persistent data stores, and having autoscaling based on memory or CPU would be HUGE. These apps are the backbone of most systems, and often a scaling challenge.

@bprashanth (Contributor, Author) commented Jul 15, 2016

HPA already knows how to scale RCs; teaching it to bump up the replica count on a petset can happen without any additional features or API changes, if someone does the plumbing. @kubernetes/autoscaling

Autoscaling across zones/regions is a different story (i.e. mysql-us-central is saturated, so spin up a new petset for mysql-asia and have it replicate data). For that we need WAN deployment prototypes.
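If someone did that plumbing, the result might look like an ordinary HPA pointed at a petset - a purely hypothetical sketch, since HPA could not target PetSet at the time:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: mysql
spec:
  scaleTargetRef:
    apiVersion: apps/v1alpha1
    kind: PetSet        # hypothetical target kind: HPA only understood RCs here
    name: mysql
  minReplicas: 3
  maxReplicas: 9
  targetCPUUtilizationPercentage: 80
```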

@smarterclayton (Contributor)

Not "probably" - they are, and having to rediscover every failure. Distributed systems need training wheels, and Kubernetes needs to be the tricycle.


@damoon commented Jul 15, 2016

Ceph: http://docs.ceph.com/docs/hammer/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address - Ceph mons NEED static IPs.

etcd: fixed DNS names are good enough, but it seems a DNS request from a pod targeting itself fails for some reason. It looks like kube-dns tries to prevent services from calling themselves recursively.

@bprashanth (Contributor, Author)

Let's maintain a list of apps that need sticky IPs on the sticky IPs bug (#28969), so we can at least start by cautioning users in documentation. Currently you can create a Service per pet for sticky IPs.

@damoon can you comment on the deployment style you'd use for Ceph? It sounds like one might want to run Ceph as a DaemonSet and only the monitors as a PetSet, if PetSet had sticky IPs.
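A sketch of that Service-per-pet workaround; the per-pet label is an assumption, since PetSet did not stamp a unique per-pod label at the time, so you would have to apply one out of band:

```yaml
# One Service per pet: its clusterIP is a stable virtual IP for that pet.
apiVersion: v1
kind: Service
metadata:
  name: ceph-mon-0
spec:
  selector:
    app: ceph-mon
    pet: ceph-mon-0   # hypothetical label identifying exactly one pet
  ports:
  - port: 6789
```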

@magicwang-cn (Contributor) commented Jul 16, 2016

I think PetSet could add the concept of a Role, which represents a component of an app; every role has its own (Replicas, PodSpec, etc.). For example:

  • mongodb is the composition of config server, router, and shard; these three components can be three roles.
  • etcd has just one role.

What's more, we could supply a common image (like peer) that finds every role's identity information (like DNS names) via init containers and stores it in a specific config file, so users can read that information from the config file and start their apps themselves.
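A hypothetical sketch of what such a roles field might look like, using the mongodb example above - none of this exists in the PetSet API:

```yaml
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: mongodb
spec:
  roles:                  # hypothetical field: one entry per component
  - name: configsvr
    replicas: 3
    # PodSpec for the config servers would go here
  - name: router
    replicas: 2
    # PodSpec for the mongos routers would go here
  - name: shard
    replicas: 3
    # PodSpec for the shard servers would go here
```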

@leseb commented Jul 18, 2016

@bprashanth basically, when it comes to Ceph monitor nodes, once they get deployed their IP address cannot be changed. That's why we need sticky IP addresses. Other components can have their IPs changed.

@erictune (Member) commented Oct 27, 2016

My take on beta:

Before Code Freeze:

After Code Freeze but before 1.5:

  • Document and test the procedure to convert resources from PetSet/v1alpha1 to StatefulSet/v1beta1 whilst upgrading from 1.4 to 1.5.
  • Greatly expand documentation on best practices for PetSets, reasoning about availability in the face of node updates, debugging, etc, etc...
  • Rename PetSet to StatefulSet in all kubernetes.github.io docs

Right after 1.5:

  • convert incubating Charts that use PetSet to StatefulSet and promote to stable if stable (@viglesiasce will lead this)

Soon after that:

  • move filenames that contain pet to new names.
  • more examples/Charts for more apps.

Everything else is for second beta or GA or not doing.

@smarterclayton

@bprashanth (Contributor, Author)

There's something else we (well, I) kind of (read: completely) forgot: a useful feature we added to Services. The annotation is shown in many petset examples: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/testing-manifests/petset/zookeeper/service.yaml#L6

It enables pets to make decisions about their peers being ready/unready using internal protocols, so, e.g., an HTTP readiness probe timeout doesn't end up removing the DNS records for zookeeper-0, forcing a re-election.

You can still create petsets without this annotation, of course; just don't give them a readiness probe. But that means no service can leverage the magic of a readiness probe (i.e. there are 2 kinds of services: the governing Service, which always needs DNS, and any other type of overlay Service, which should ideally not direct traffic to an "unhealthy" pet).
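For reference, the annotation as it appears in the linked zookeeper example, on the governing headless Service (ports abbreviated):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zk
  annotations:
    # Publish DNS records for pets even when they fail readiness, so a
    # probe timeout doesn't remove zookeeper-0 and force a re-election.
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None   # governing headless Service
  selector:
    app: zk
  ports:
  - port: 2888
    name: peer
```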

Thoughts on renaming this to beta.services? (#35713)

@chrislovecnm (Contributor)

Guys, this is a show stopper: #33727. Can we get it in by code freeze and backported? We are about to go into prod with a bunch of pets, and seamless upgrades are super important to our SLA.

@alexouzounis

For us, being able to get the index/ordinal in an environment variable would be very useful. #30427 is tracking this, and it would be great if it were included in 1.5.
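Until #30427 lands, a common workaround sketch is to derive the ordinal from the pet's stable hostname at startup - my-app and its flag are placeholders:

```yaml
containers:
- name: app
  image: example/my-app:latest   # placeholder image
  command:
  - sh
  - -c
  - |
    # Pet hostnames are <petset-name>-<ordinal>, e.g. web-0, web-1, ...
    ORDINAL="${HOSTNAME##*-}"
    echo "starting with ordinal ${ORDINAL}"
    exec my-app --index="${ORDINAL}"
```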

@smarterclayton (Contributor)

The API changes are too significant and we're past the window for that (in terms of review bandwidth). It's just not going to make it for 1.5.


k8s-github-robot pushed a commit that referenced this issue Nov 3, 2016
Automatic merge from submit-queue

Move Statefulset (previously PetSet) to v1beta1

**What this PR does / why we need it**: #28718

**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes #

**Special notes for your reviewer**: depends on #35663 (PetSet rename)
cc @erictune @foxish @kubernetes/sig-apps 

**Release note**:

``` release-note
v1beta1/StatefulSet replaces v1alpha1/PetSet.
```
@erictune (Member) commented Nov 9, 2016

I've created a detailed documentation plan for PetSets, to be completed by 1.5 docs freeze: kubernetes/website#1655

@dims (Member) commented Nov 17, 2016

Moving to 1.6 per @smarterclayton's comment.

@dims modified the milestones: v1.6, v1.5 (Nov 17, 2016)
@bprashanth (Contributor, Author)

I think Clayton was talking about #28718 (comment). PetSet should be in beta under the alias "StatefulSet" in 1.5. I'll leave it to the people actively working on the transition to decide when to close this, and what to migrate out into a "PetSet in GA" bug.

@dims (Member) commented Nov 18, 2016

Ack @bprashanth. At least it does not sound like a stop-ship for 1.5, so marking as non-release-blocker. Please correct me if I am mistaken.

@Stono commented Dec 5, 2016

I need this in my Production life.

@dims (Member) commented Dec 9, 2016

@bprashanth @foxish Is it appropriate to move this to the next milestone or clear the 1.5 milestone? (and remove the non-release-blocker tag as well)

@bprashanth (Contributor, Author)

PetSet is beta in 1.5, and I think all issues discussed here have spin-off bugs.
