
Hubot can DOS itself via conversations.info events #554

Open
4 of 9 tasks
mistydemeo opened this issue Mar 12, 2019 · 8 comments
Labels
enhancement M-T: A feature request for new functionality

Comments

@mistydemeo
Contributor

Description

My company recently experienced an outage of our Hubot when it was unexpectedly rate-limited by Slack. After discussing it with Slack support, the cause was apparently that conversations.info was being called a large number of times.

What type of issue is this? (place an x in one of the [ ])

- [x] bug
- [ ] enhancement (feature request)
- [ ] question
- [ ] documentation related
- [ ] testing related
- [ ] discussion

Requirements (place an x in each of the [ ])

- [x] I've read and understood the Contributing guidelines and have done my best effort to follow them.
- [x] I've read and agree to the Code of Conduct.
- [x] I've searched for any related issues and avoided creating a duplicate issue.

Bug Report

It looks like hubot-slack calls this API method during the process of sending a message:

```coffeescript
# The `envelope.room` property is intended to be a conversation ID. Even when that is not the case, this method will
# make a reasonable attempt at sending the message. If the property is set to a public or private channel name, it
# will still work. When there's no `room` in the envelope, this method will fall back to the `id` property. That
# affordance allows scripts to use Hubot User objects, Slack users (as obtained from the response to `users.info`),
# and Slack conversations (as obtained from the response to `conversations.info`) as possible envelopes. In the first
# two cases, `envelope.id` will contain a user ID (`Uxxx` or `Wxxx`). Since Hubot runs using a bot token (`xoxb`),
# passing a user ID as the `channel` argument to `chat.postMessage` (with `as_user=true`) results in a DM from the bot
# user (if `as_user=false` it would instead result in a DM from slackbot). Leaving `as_user=true` has no effect when
# the `channel` argument is a conversation ID.
```
It only happens sometimes, and it looks like there's an attempt at maintaining an internal mapping:
```coffeescript
###*
# Fetch conversation info from conversation map. If not available, call conversations.info
# @public
###
fetchConversation: (conversationId) ->
  # Current date minus 5 minutes (time of expiration for conversation info)
  expiration = Date.now() - (5 * 60 * 1000)
  # Check whether conversation is held in client's channelData map and whether information is expired
  return Promise.resolve(@channelData[conversationId].channel) if @channelData[conversationId]?.channel? and
    expiration < @channelData[conversationId]?.updated
  # Delete data from map if it's expired
  delete @channelData[conversationId] if @channelData[conversationId]?
  # Return conversations.info promise
  @web.conversations.info(conversationId).then((r) =>
    if r.channel?
      @channelData[conversationId] = {
        channel: r.channel,
        updated: Date.now()
      }
    r.channel
  )
```
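For readers less familiar with CoffeeScript, the caching logic above can be sketched in plain JavaScript. This is a minimal sketch, not the actual hubot-slack API: `fetchInfo` stands in for the real `conversations.info` web API call, and `makeConversationCache` is a name I've made up for illustration.

```javascript
// Minimal sketch of the TTL-cache pattern used by fetchConversation above.
// `fetchInfo` stands in for the real `conversations.info` API call; the
// function and variable names here are illustrative, not hubot-slack's own.
const DEFAULT_TTL_MS = 5 * 60 * 1000; // 5 minutes, as in hubot-slack 4.6.0

function makeConversationCache(fetchInfo, ttlMs = DEFAULT_TTL_MS) {
  const channelData = {}; // conversationId -> { channel, updated }
  return async function fetchConversation(conversationId) {
    const entry = channelData[conversationId];
    const expiration = Date.now() - ttlMs;
    // Serve from the cache only when an entry exists and has not expired
    if (entry && entry.channel && entry.updated > expiration) {
      return entry.channel;
    }
    delete channelData[conversationId]; // drop any expired entry
    const r = await fetchInfo(conversationId); // the real API request
    if (r.channel) {
      channelData[conversationId] = { channel: r.channel, updated: Date.now() };
    }
    return r.channel;
  };
}
```

The key point for this issue: any conversation ID not seen within the TTL window triggers a real API request, so the request volume scales with the number of distinct conversations the bot touches per window.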
But for whatever reason, we saw a lot of cases where real API requests were made anyway.

I'm not sure exactly why; I don't have much visibility into it. Our application itself doesn't call this method at all except through whatever hubot-slack calls on its own. Since conversations.info has a much lower rate limit than actually posting messages, this can reduce the number of messages that can safely be posted under some circumstances.

Reproducible in:

hubot-slack version: 4.6.0

node version: 8.2.0

OS version(s): Debian jessie

Steps to reproduce:

Unclear

Expected result:

Hubot doesn't rate-limit itself in normal operation.

Actual result:

Hubot was rate-limited.

@aoberoi
Contributor

aoberoi commented Mar 14, 2019

Oh noes! Thanks for digging into the source for this issue.

Based on your answer of "Unclear" for the reproduction steps, I'm guessing it hasn't happened more than once. But just to be clear, is this recurring without a known cause, or was it really just a one-time occurrence?

Yes, there is caching of the results from conversations.info method calls. This was added intentionally to reduce the number of requests down to a "reasonable" rate, but the definition of reasonable is relative to your workspace size (or more specifically, the number of conversations/channels your Bot User is a member of). Let's do some math.

conversations.info is a "Tier 3" rate-limited method. This means that at a minimum (there are burst tolerances that I can't publicly share), your token is limited to 50 requests/minute. Every unique conversation ID is cached for 5 minutes before it's eligible to be requested again. Let's say your bot token is being used in two separate processes, so they don't share a cache. This means you get approximately (50 * 5) / 2 = 125 unique conversations.info requests in every 5-minute window (the TTL of the cache) before you'd hit a rate limit (in truth, with burst, you'd get significantly more than that). So, even from a simultaneous cold start of your two Hubot instances, where upon start a script made an attempt to contact each one of the channels it was a member of, this is only theoretically possible if your Hubot tries to do that in more than 125 different conversations.
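The arithmetic above can be written out explicitly. This is just a sketch of the back-of-envelope math, using the published Tier 3 floor of 50 requests/minute and ignoring burst tolerances; `uniqueRequestBudget` is a name invented for illustration.

```javascript
// Unique conversations.info calls available per cache-TTL window, for N
// processes that do not share a cache (a sketch of the arithmetic above).
function uniqueRequestBudget(ratePerMinute, ttlMinutes, processes) {
  return (ratePerMinute * ttlMinutes) / processes;
}

// 50 requests/minute (Tier 3 floor), 5-minute TTL, 2 Hubot processes:
console.log(uniqueRequestBudget(50, 5, 2)); // → 125
```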

Is there any sort of behavior in your scripts that might trigger that many requests? It's not hard evidence, but knowing the number of channels your Bot User is a member of would give us an idea of what the upper bound here is (I realize that the number of DM channels is in theory equal to the number of team members in your workspace, so that could blow this out of the water, but I'm going to go out on a limb and say that your bot probably doesn't DM everyone upon startup all on its own).

Do you have any idea of the volume of conversations.info requests (and over what time window) your Hubot was making? (I presume support had access to this.)

If you feel more comfortable sharing the answers to these questions over a private channel, you can DM me in the Bot Developers Hangout 😄

@aoberoi
Contributor

aoberoi commented Mar 14, 2019

Yikes, actually there are two usages of conversations.info, and they are not upon trying to send a message (my mistake). The first is upon receiving a message, and the second is upon trying to set the topic of a channel. I think it's more likely that within 5 minutes your bot witnesses messages from 125 unique conversations, or tries to set topics in enough conversations that it adds up. Do you think that's possible in your team with your scripts?

If so, I think the best path forward would be to make the TTL on the values in the conversations.info cache longer than 5 minutes. It wouldn't resolve issues from cold start, necessarily, but it should make a difference in terms of reliability.

I wish we could take better advantage of asynchronously processing queued requests when rate-limiting occurs, but if processing of an incoming event was delayed by tens of minutes, I'm not sure that's much better than an outage, because to your users it's probably effectively the same.

@mistydemeo
Contributor Author

> I think its probably more likely that within 5 minutes your bot is witness to messages from 125 unique conversations

This is the most likely situation. We have over a thousand employees and about 1500 public channels, though they're not all in regular use. Between the channels we have Hubot in and private messages, receiving commands from 125 unique conversations within 5 minutes is very likely.

@aoberoi
Contributor

aoberoi commented Mar 15, 2019

I'm trying to get a sense of what a better value would be, and I'm ultimately coming to the conclusion that 5 mins is fine as a default, but it should be configurable.

For example, if we made it 10 minutes, we'd effectively be saying your app can witness incoming messages from 250 unique channels over 10 minutes before hitting the limit. Just for demonstration, with no limit at all (making it Infinity) you'd almost never bump into an issue, but you'd also increase the likelihood that the channel info is stale and that something has changed that makes the behavior of the bot (sometimes dangerously) incorrect. There's a fundamental constraint here that we're only making varying degrees of tradeoffs around: the rate limit that Slack has put on the method. If we find that none of the tradeoffs fit our expectations of the system, the path forward is to see whether we can change the rate limit value on the Slack Platform.

Do you think a configurable value would be the right solution for your situation, or do you think this goes beyond finding the right TTL and we should figure out how to increase the limits from the Slack Platform side?

@mistydemeo
Contributor Author

> do you think this goes beyond finding the right TTL and we should figure out how to increase the limits from the Slack Platform side?

I think this would be the best long-term solution; it sounds like any other potential fixes are going to have noteworthy tradeoffs.

@mistydemeo
Contributor Author

This happened to us again today. Since we use Hubot extensively internally, this has a major impact on us. I'd really appreciate any action we can get here!

@seratch
Member

seratch commented Apr 14, 2020

Can we safely close this issue as we have the HUBOT_SLACK_CONVERSATION_CACHE_TTL_MS env variable introduced by #560 now?
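For anyone landing here later, here is a sketch of how an adapter might read that environment variable, falling back to the original 5-minute default. The variable name `HUBOT_SLACK_CONVERSATION_CACHE_TTL_MS` is the one mentioned above; the surrounding function is illustrative, not the adapter's actual source.

```javascript
// A sketch of reading the cache TTL from the environment, falling back to
// the original 5-minute default when the variable is unset or invalid.
// The variable name matches the one added by #560; the rest is illustrative.
const DEFAULT_CONVERSATION_CACHE_TTL_MS = 5 * 60 * 1000;

function conversationCacheTtl(env = process.env) {
  const parsed = parseInt(env.HUBOT_SLACK_CONVERSATION_CACHE_TTL_MS, 10);
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_CONVERSATION_CACHE_TTL_MS;
}
```

So a deployment could, for example, set `HUBOT_SLACK_CONVERSATION_CACHE_TTL_MS=600000` (10 minutes) in its environment before launching Hubot to double the per-window request budget.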

@seratch seratch added the enhancement M-T: A feature request for new functionality label Apr 14, 2020
@github-actions

github-actions bot commented Dec 5, 2021

👋 It looks like this issue has been open for 30 days with no activity. We'll mark this as stale for now, and wait 10 days for an update or for further comment before closing this issue out.
