Renaming Bus Factor #632

Open
geekygirldawn opened this issue Mar 28, 2024 · 63 comments
@geekygirldawn
Member

I know that renaming what is probably our most widely used metric is going to be painful, but I think it's time to rename Bus Factor to something else.

The number of people I've had express pretty severe dislike of the name Bus Factor is quite high, and I often try to avoid calling it Bus Factor.

I often call it "Lottery Factor" because it's easy to understand. How likely is your project to survive if someone suddenly won the lottery, retired to a beach, and never looked at your project again?

Pony factor is more widely used, because it's been adopted by the Apache Software Foundation, but I find that it's harder for people to understand outside of the ASF. There isn't an easy narrative around it like what I have above for Lottery Factor.

I'd be really curious about the opinions from folks involved in inclusive naming initiatives and whether they've seen a commonly suggested substitute for Bus Factor.

I'm also curious about what the academic folks have seen. Is there a particular term that is more widely used in Academia / research?

cc-ing a few folks that I think would be interested in this discussion: @GeorgLink @germonprez @sgoggins @ElizabethN @klumb @dicortazar

I welcome any Chaotics to jump in with opinions.

@danielskatz

In my experience, academia (at least the part that's aware of open source software) uses bus factor. In academia, most people who don't know it, after having it explained, generally either get it and laugh, or still don't understand it (and don't really understand open source either).

I know I've also heard at least one other term that was more pleasant and still worked, but I can't quite remember it for the minute. (I've also heard truck factor, but I guess that's not really much of an improvement.)

@GeorgLink
Member

GeorgLink commented Mar 28, 2024 via email

@geekygirldawn
Member Author

Maybe we can choose a name that is directly descriptive of the problem or threat: concentration of knowledge, distribution of effort, ...

When I talk about this issue, I generally frame it as a discussion of "Contributor Sustainability", but it's probably only one of a number of things that impact contributor sustainability.

I still think the metric should be named something that's already established within our community and the literature, which might make Pony Factor a better choice. I like Lottery Factor, but it's definitely not as well known.

@GaryPWhite

GaryPWhite commented Mar 28, 2024

There are two things I think don't work well about "Bus Factor"

  1. Not everyone likes or intuitively understands the purpose of "bus" in the phrase without explanation
  2. "Factor" feels like a word that works, but isn't as intuitive as something like "count".

I find skeptical looks and confusion when I'm explaining that a "bus factor" of a project is 3, and that's a bad thing. I feel that if we had some language that better indicated what the bus factor is, intuitively, that would be more persuasive and useful. I know "bus" is what we're moving away from, but... When I think of projects making progress, I usually think of forms of transportation. Trains, planes, boats, cars, etc. They move many people around and need critical pieces to keep them moving.

We could borrow some of these ideas, and swap "factor" for "count" like:

Captain Count
Pilot Count
Engine Count
Turbine Count
Tether Count
Anchor Count
Driver Count
Wing Count (gets a little dark, if you think of it)
Battery Count

I'm also happy to turn away from "bus" like things, maybe options from nature without conflicting with git?

Root Count
Host Count
Monarch Count

That's if we're willing to get creative though. I think if we plan on changing the name to something, Bus Factor is certainly the most well known -- so we should update it to something more intuitive and descriptive. Out of the options I gave, I like Host, Pilot, Captain, and Monarch. Excited to have a discussion about it.

The confusing wordplay also exists for "elephant factor" -- but that should probably be a separate discussion :)

@danielskatz

I would vote against Pony Factor, as it's based on an in-joke that is just confusing to people who aren't in the group
(because ASF is full of ponies, or people who think they are ponies)

@geekygirldawn
Member Author

@justaugustus I'm curious if the Inclusive Naming Initiative has had any discussions about this or related terminology?

@starsplatter

I usually use lottery as well, pony doesn't make a lot of sense to me. On count vs factor, what about something like frequent contributors count, which parallels with 'inactive contributors' and 'new contributors'. Bus factor assumes something about the impact of these specific people leaving which might or might not be true depending on additional context. Just calling it a count of people who undertake a certain level of activity leaves it more neutral and more clearly as just part of the fuller picture.

@cdolfi

cdolfi commented Mar 28, 2024

I usually use lottery factor as well

@PaulaPaul

I like to use names/terms that are easy to read and do not have implied meaning or metaphor. For something like this in other kinds of projects or organizations, it is sometimes called 'key person risk' (or key people/member risk). I'll toss 'key maintainer risk' in here for consideration, since when I read things like 'lottery factor' or 'pony factor' I have to go look up what that means in this context (and it may be even harder for those who don't have English as a first language); 'key maintainer risk' seems closer to describing exactly what is being measured.

@ElizabethN
Member

I would love to not use "bus factor" and usually use "lottery factor" instead. And I always explain in a few words what that means, as I'd explain the naming for any of our metrics. In my opinion, nothing is intuitive to everyone. Even something like "event location inclusivity" requires a few words of explanation by what we mean by that.

  • I understand the need for consistency, but if we have the opportunity to make open source more inclusive overall, I think we should do that.
  • I agree with @GeorgLink that focusing on the issue that is causing risk makes a lot of sense. I think it's also about access, not just knowledge. Who has the keys to the castle, so to speak.
  • I like @GaryPWhite's suggestion to use "Count" instead of Factor.
  • I agree with @danielskatz in that the use of "pony" is confusing and gate-keepy (I'd actually never heard that).
  • I agree with @starsplatter in that simply focusing on a count of people who undertake a certain level of activity is neutral
  • I like @PaulaPaul's suggestion of "key person" or "key maintainer" because we're also talking about the folks that have access to all the levels of the project.

What about "Key Maintainer Count" or "Core Maintainer Count"?

@cdolfi

cdolfi commented Mar 28, 2024

@ElizabethN I would be concerned with using the term "maintainer" as for many projects that has a very specific meaning. Maybe "Key Contributor Count"

@GaryPWhite

KCC has a nice ring to it, and it's definitely more clear than an analogy.

@geekygirldawn
Member Author

Oooh, I like Key Contributor Count.

@klumb
Member

klumb commented Mar 28, 2024

For some reason, I thought we had already addressed this one. Thanks for bringing this up @geekygirldawn. The name is definitely problematic, and I agree pony factor isn't a good option either. We could use them as keywords, though, to link them to the new name.

I like Key Contributor Count... Or Key Contributor Risk.

@klumb
Member

klumb commented Mar 28, 2024

...or Core Contributor Risk.

We have defined Occasional Contributors (which was previously problematic as "Drive-by Contributors").

However, we have not defined key or core contributors. Academic literature usually uses core but key may be more descriptive of contribution importance.

@danielskatz

I like Risk better than Count, as it has the same sense (of urgency/danger) that Bus Factor has. It also feels less like something people would try to game

@GaryPWhite

At risk of sounding like a typical tech exec.... Wouldn't "gaming" this metric be a good thing? More people contributing to OSS at a level to constitute bus factor seems like a good thing...

I'll put in that I think "risk" being a number runs the same risk (ha) as using a word like "factor". Without explanation, "my key contributor risk is 3" is a nonsensical phrase.

@GeorgLink
Member

GeorgLink commented Mar 28, 2024 via email

@klumb
Member

klumb commented Mar 28, 2024

In truth, without explanation, any name we choose is likely to be nonsensical. Some are more descriptive than others though. The metric should describe what we are trying to measure, which is the risk associated with key contributors abandoning a project (I think). I wouldn't get hung up on a number.

@GaryPWhite

@klumb I get ya. The measurement is absolutely indicating the risk. I would agree more with "risk" if the metric itself wasn't a count/number. It's descriptive of what we're measuring to name the measurement. If we're renaming it anyways, why be vague? We could keep metaphorical names like "pilot risk" etc. but that hardly solves the #2 problem I mentioned above, where I regularly have to explain what the metric actually is for people to buy into why it's useful. Just my experience, though.

I like @GeorgLink 's observation. Majority Contributor Count is ultra-succinct and descriptive. I didn't even think about how "key" could mean like "having a key". Majority is much more specific.

@klumb
Member

klumb commented Mar 28, 2024

Also, I think a value judgement is going to be necessary for this metric. What the value is, is the question. Is it related to ownership/authorship of a percentage of the codebase?

@klumb
Member

klumb commented Mar 28, 2024

If it is about percentage of codebase, rather than contributor, maybe we need it to be about contribution authorship. For example, Majority Contribution Authorship, Majority Contribution Spread, or Majority Contribution Maintainership? Contribution Maintenance Risk? Majority Contribution Count?

Just throwing some more out there. ;)

@starsplatter

I really like majority, I think it removes the value judgement of 'key'. This count in the chaoss metric reads to me as just a naive count of how many contributions people make as a percent of the total number of contributions, it says nothing about the value of those contributions in terms of code quantity or quality, which I think argues for keeping the metric as more of a single neutral data point. The metric says it wants to answer "how many contributors can we lose before a project stalls?" but that seems packed with assumptions to me.

@danielskatz

Sorry, but I have no idea what majority means in this context. And given that not all open source contributions are captured in a repository, how would it be measured?

@GaryPWhite

The metric says it wants to answer "how many contributors can we lose before a project stalls?" but that seems packed with assumptions to me.

It totally is! That's part of the magic IMO 😄 There's some stake-in-grounding happening here. What kind of assumptions need to get made to actually measure something, ya kno?

And given that not all open source contributions are captured in a repository, how would it be measured?

While it's perfectly possible this isn't a perfectly accurate representation, I believe that the metric and its implementations are usually disjointed. I believe most of the time, contributions are "counted" here as "commits" or "pull request open/close" or "issue open/close". That's just a function of using the GH API / history to make measurements... More tools could definitely get built to measure more though 😃

Sorry, but I have no idea what majority means in this context.

"Majority" here meaning who is making the majority of contributions. Majority Contributor Count = count of contributors who make the majority of contributions in the project for some time window.
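To make that definition concrete, here is a minimal sketch of how such a count could be computed. It is not the CHAOSS reference implementation: the function name `majority_contributor_count`, the input shape (one author entry per contribution in the chosen time window), and the 50% default threshold are all assumptions for illustration.

```python
from collections import Counter


def majority_contributor_count(authors, threshold=0.5):
    """Smallest number of contributors whose combined contributions
    exceed `threshold` of the total.

    `authors` holds one entry per contribution (e.g. a commit author)
    inside the time window being measured. The 0.5 default is an
    assumption, not something fixed by this thread.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk contributors from most active to least active.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered > threshold * total:
            return rank
    return len(counts)


# Hypothetical data: 10 commits, 6 of them by one person.
commits = ["ana"] * 6 + ["ben"] * 3 + ["chi"] * 1
print(majority_contributor_count(commits))  # prints 1
```

What counts as a "contribution" (commits, pull requests, issues) is left to whoever builds the `authors` list, which is where most of the assumptions in this metric end up living.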

@emmairwin

emmairwin commented Mar 28, 2024

I like what the bus factor means, in that a project is one disaster away from being completely abandoned or unmaintained. Some maintainers might keep maintaining if they win the lottery :)

I think the seriousness should be retained because that seriousness is what gets people (leaders, people with influence) to act (majority contributor count, IMHO, not so much), but I agree bus factor is morbid. I propose, then, something more like 'disaster factor' because it has meaning immediately.

@klumb
Member

klumb commented Mar 28, 2024

Adoption may be better for Disaster Factor because it is close enough to the previous problematic name. It also signals the risk part. I think that could work.

@klumb
Member

klumb commented Mar 28, 2024

Disaster Factor = The risk associated with a count of contributors, who authored a majority of contributions in the project for some time window, abandoning a project. It is probably a good idea to review the description and objective of this metric as well.

@PaulaPaul

This is such an interesting discussion!
I understand the concerns with the use of terms like 'key' (that would require definition), and the discussion of what we are really trying to measure or gauge here. There is an element of 'risk' (is this project at risk if any one person decides to stop contributing?), and an element of project 'resilience' and 'sustainability' (could the project survive and thrive without a specific, small, number of engaged contributors?).

At the risk of making this more complicated, is there a rubric that is used to come up with this number, so the name of the metric might be less of a concern (its explanation would be the rubric)? I like the words 'adoption', 'risk', 'sustainability', and 'resilience' because they are less problematic (for me) than 'bus' or 'disaster' -

@PaulaPaul

We could use the GitHub poll capability in the Discussions area to create a poll from the names that have been suggested here and solicit votes from the community. Let me know if you'd like me to put that into a Discussion thread (I'm still relatively new to this community - sorry if that's been rehashed or if there is a different norm for this sort of thing!)

@dicortazar
Member

Hello there! A bit late here, but even if this might be an unfortunate name, it seems to be used in many places, even on Wikipedia.

According to Wikipedia, it seems there's an existing and older concept coming from the insurance world called Key Person Risk. This is indeed a term similar to other proposals here mentioned.

Even in Google Scholar this seems to be a term used time and again.

My proposal would be to either keep using this term, or if this is updated, we should identify this with this more usual way. As an example, we could say: Key Person Risk (aka bus factor).

Some thoughts on this :).

@emmairwin

Crediting some recent work of @JustinGOSSES, I'll also nominate 'Nebraska Factor', which also shows that with one false move ...

It also depends how granular we believe this metric to be - for me sustainability and resilience are more like metric-models, which would contain the bus factor among others, not itself the measure.

@klumb
Member

klumb commented Mar 29, 2024

I like some of the other names mentioned as well, but I do tend to agree with Daniel here. My favorite name is a variation on that: Key Contributor Risk.

@klumb
Member

klumb commented Mar 29, 2024

We could do a ranking poll in our community for the names mentioned in this discussion and then discuss the top three in the next metrics meeting.

@dicortazar
Member

@emmairwin that's a great point! But this is perhaps too US-centric... we can always talk about the Antarctica factor XD.

Anyway, my understanding (without having the proper definitions in front of me) is that the Nebraska factor focuses more on a tiny but important piece in the whole SBoM ecosystem where there's a risk of not being aware of the use of such pieces of code and its risks, while the bus factor is specific to one project. Perhaps we could say the Nebraska factor is a meta-bus factor.

@emmairwin

emmairwin commented Mar 29, 2024 via email

@dicortazar
Member

Ha! A joke on the joke ;).

Let me rephrase myself, I've seen this used in SBoM contexts :). And yep xkcd is great!

@JustinGOSSES

JustinGOSSES commented Mar 29, 2024

I'll just second the previous comments that the "key person risk" phrasing seems an immediately understandable replacement for bus factor, at least to me.

I would also suggest it might be helpful when renaming to also call out that there are some common variations on this idea that are related but different metrics. Elephant factor is one I've heard confused before. There's also the DDS (Distributed Development Score). It is most similar to bus factor in that it is trying to measure something that correlates to the same negative event that people want to avoid but is calculated slightly differently.

The negative event people are trying to predict a chance of is development suddenly stopping. Mechanisms for that stopping vary a bit. Methods of calculating metrics that approximate mechanisms that lead to the negative event vary even more.

I will also note that in practice, I've sometimes also looked for # of maintainers (or approvers of PRs) in past year. Any bus-factor-ish metric used by itself will say there's no risk of abandonment of a project that's only had PRs approved by a single person in the past 3 years if 80% of the commits were developed equally by 10 people in 2016 who haven't touched the project in 5 years. In the other direction, a project developed 70% by one person in 2016 will have a risky key person factor, but if 20% of the other commits were done & approved equally by a group of 10 over the past 2 years the actual chance of sudden abandonment is probably very low.
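The effect of ignoring recency can be shown with a small sketch. Everything here is hypothetical: the helper names `top_share_count` and `windowed_count`, the made-up commit history, and the 50% threshold are assumptions chosen only to illustrate the point above, not a description of any existing tool.

```python
from collections import Counter
from datetime import date


def top_share_count(authors, threshold=0.5):
    """Smallest number of authors covering more than `threshold` of entries."""
    counts = Counter(authors)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered > threshold * total:
            return rank
    return 0  # no contributions at all


def windowed_count(commits, since, threshold=0.5):
    """Same count, restricted to commits dated on or after `since`."""
    recent = [author for author, day in commits if day >= since]
    return top_share_count(recent, threshold)


# Hypothetical history: ten people shared the work in 2016,
# then a single person did everything afterwards.
history = [(f"dev{i}", date(2016, 6, 1)) for i in range(10) for _ in range(8)]
history += [("solo", date(2023, 3, 1))] * 20

all_time = windowed_count(history, date.min)
recent = windowed_count(history, date(2022, 1, 1))
print(all_time, recent)  # prints 5 1
```

Measured over the whole history the project looks comfortably spread out, while the recent window shows a single active person, which is why the choice of time window matters as much as the name.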

@schalkneethling

schalkneethling commented Apr 2, 2024

I have also been using the Lottery Factor, but I have to say I really like "Core Contributor Factor". The main idea to communicate is that a project with a single point of failure is risky to adopt and so if there are already several projects that depend on said project, the single point of failure must be addressed.

After brainstorming with our AI friends I would like to propose SPOC

Single
Point
Of
Critical Failure

or SPOC-R

Single
Point
Of
Critical Failure
Risk

Used in a sentence

Unfortunately, this project has a high SPOC risk.

@arnuojo

arnuojo commented Apr 3, 2024

+1 for Lottery Factor.
Compared with Bus Factor, Lottery Factor is much easier to understand and gives no offense to anyone.

@geekygirldawn
Member Author

In an attempt to summarize this very long discussion, here are the options that seem to be getting the most traction:

  • Lottery Factor
  • Disaster Factor
  • [Core / Key] [Contributor / Person] Count
  • [Core / Key] [Contributor / Person] Risk
  • SPOC (Single Point Of Critical Failure)

If this seems reasonable, I would like to create a poll to gauge reactions with the caveat that the CHAOSS project operates by consensus, so this poll is limited to understanding what people think, and the one with the most votes will not necessarily be the "winner".

Notes:

  • I'm using contributor / person rather than maintainer / developer because this is a decision we've made when defining other metrics to make sure that we don't discount the work of other types of contributors.
  • I've left out ones that created confusion and required extra explanation (e.g., majority, pony)
  • Anything in [Brackets] contains similar words that we can decide on later after more research.

@danielskatz

I think this "vote" seems like a good idea.

Just as a point on one of the options, I think it will be confusing when the single point of critical failure metric is not 1... "Points of Critical Failure (PSOCF)" and "Critical Failure Points (CFPS)" are alternatives for this idea.

@geekygirldawn
Member Author

OK, we now have a poll based on the above options. Please vote here: #634

@Jefro

Jefro commented Apr 8, 2024

Hey folks, sorry I am late to the party. Just wanted to throw in one more term that I have heard but didn't see mentioned here - succession plan, or succession strategy. I offer only for consideration, not because I think the poll needs to be changed - I like "lottery factor".

@geekygirldawn
Member Author

succession plan, or succession strategy

@Jefro - this is an excellent point. I don't see it as part of the metric itself, but it's an important part of what you should be doing as an outcome or action as a result of what you learn from measuring this. I've just added a paragraph about succession planning to a Practitioner Guide for Contributor Sustainability that I've been working on. It's a first draft that's ready for feedback, so if you have time, feel free to have a look and leave comments / suggestions :)

@RichardLitt

I do not like the term Lottery Factor. It reminds me strongly of problem gambling, which can be diagnosed as a pathological addiction disorder, and which is much more prevalent in poorer communities. I think any gains we have by getting rid of the term bus factor are lost when moving to Lottery Factor.

Disaster Factor also ignores the fact that a contributor may leave a project for good reasons, personally for themselves and for the project as a whole (which may have reached a point in its lifecycle where sunsetting is best for everyone).

SPOC makes me think of Star Trek, which is fine, but it'll be confusing when talking to others. 🖖

@geekygirldawn
Member Author

geekygirldawn commented May 7, 2024

I was a big fan of Lottery Factor, but with the comment from @RichardLitt, now I'm not so sure.

Another option that @decause-gov proposed on the poll itself is "Nebraska Factor", which I think is something else to consider:

After visiting OSSNA this year, "Nebraska Factor" a la XKCD (https://xkcd.com/2347/) is a name that has been bigtime popularized in the Supply Chain world. Perhaps this could be a decent candidate, if the namespace is not already claimed elsewhere in CHAOSS?

The benefit of Nebraska Factor is that it doesn't imply a reason (disaster, bus, lottery), because as mentioned above, a person can leave for positive reasons, too.

@danielskatz

danielskatz commented May 7, 2024

I'll vote against Nebraska Factor, as an inside joke that's likely off-putting to those not in on it. It also doesn't have any inherent meaning, so you have to know what it is to know what it is; the name doesn't help you understand it (unless you already know the cartoon).

@starsplatter

Agree. Nebraska factor seems off-putting. When I used lottery factor it was largely in response to the negativity of bus factor, not because I particularly liked it. I ended up voting for the
[Core / Key] [Contributor / Person] Risk
which I felt did a better job describing what was actually being talked about.

@sgoggins
Member

sgoggins commented May 7, 2024

How about: Critical contributor index?

@klumb
Member

klumb commented May 7, 2024

Keep in mind, if we use key/core/critical contributor, we will also need to address this in our ongoing discussions about the boundaries between - core/regular contributors, occasional contributors, conversion rate, and 2nd contributors. What is a key/core/critical contributor? I think Bus Factor addresses the risk associated with 'people' leaving but I don't believe it defines who they are or why they are important.

@geekygirldawn
Member Author

I'm thinking maybe we keep it simple. We don't need to make the judgement about whether someone is "core", "key", or "critical" in this metric, so maybe just "Contributor Risk"?

@jeffabailey

I like "The Lottery Bus Factory". 🚌 💰

The Lottery Factor works as well.

@samanthavenialogan
Contributor

samanthavenialogan commented May 15, 2024

Hey All, just wanted to pop in here to let you all know that I came across an 'industry standard' (in quotes because I'm not sure it actually is) here in Australia wherein Dr. Jennifer Beckett has a measure for the bus factor but she referred to it more informally as the 'moses effect.'

She "firmly believes" (this is in quotes because I'm pretty sure it was a joke) that Bus factor is a MUCH better term, but we are probably going to be using bus factor as a primary measurement for internet toxicity because it specifically acts to measure the Bridging Capital of individual influence from inside of a community, to the outside public. More importantly it connects the three social capitals (bonding, linking and bridging) together in one singular user journey so we can easily understand the risk of that person propping everything up. Reversing the metric also allows us to gauge the level of reputation and bridging capital that someone will have coming INTO the community (reputation being a social currency metric, and bridging capital being a social capital metric).

If you'd like I can have a longer conversation with her about this and it may give us a new insight into the way that Bus Factor impacts the socio-cultural stability of online communities. Worth a look I think!

@samanthavenialogan
Contributor

samanthavenialogan commented May 15, 2024

In a separate comment on this, I also wonder if we should be basing the name of this off of what is actually happening when you graph the bus factor on a network diagram - not just on contributors to an open source project, but actually place it within the context of social capital and currency theories IN GENERAL.

In reality the Bus factor occurs when a 'node' (member in a community) has garnered a lot of linking capital (they are connected to a lot of people) and Bonding capital (they have close ties and have grown in reputation so their voice is recognized), shows a potential threat or likelihood of leaving a community--and their linked members are only connected to them in the project, so those nodes are at risk of disappearing from the community network diagram.

In other words, that individual has garnered the linking capital and bonding capital to prop up a community, BUT if they were to leave, the community would be at risk. Within the context of an open source project that is the likelihood that their leaving would put the project at risk, but even for an entertainment community this can be measured in the amount of engagements that a member of import has caused, in comparison to the amount that surrounding nodes have caused.

We also see this example in real life with malls (🤮). American malls were architected such that there was always a food court in the middle for people to connect with as a 3rd space. Then at either end of a mall (usually a line in 2 or 3 directions) there would be an 'anchor store' that people would go to for low-dollar, but high-value items such as grocery, or mid-market stores. The smaller novelty stores and specialty services such as Asian-import stores, mini-gold outlets, video game labs, and whatnot relied on the big-box anchors to force people to go between the communal space, and the larger store. Larger stores relied on the people being there for those specific interests to keep them in the mall for long periods of time.

What caused malls to die, and also what causes bigger contributors who commit frequently to leave, is usually that the perceived reward becomes too much for the work that they have to commit. (I've talked about the burden of contribution in CHAOSS meetings before).

So there could be something involved in the generalized issues for bus factor that we could use to rename it. This might be 'lost link likelihood' or 'at-risk supporter' or something along those lines?

@geekygirldawn
Member Author

In the metrics meeting, we discussed renaming this to "Contributor Risk" to keep it simple and descriptive, similar to our other metrics names.

@danielskatz

This might be the least worst option 😄

@RichardLitt

RichardLitt commented May 23, 2024

Another idea: kujenga factor. This is the Swahili word meaning "to build", and it is the basis for the popular game Jenga™. As you may not have played it, this game involves taking wooden blocks out of a tower, until the tower eventually falls down, at which point the person who removed the last block loses. https://en.wikipedia.org/wiki/Jenga

I believe that Jenga is trademarked. Kujenga, however, isn't. It's relatively easy to explain.

Also, it looks like:

[Image: dependency_2x]

@danielskatz

Are we thinking about projects or people?

@RichardLitt

Or ecosystems.
