Deprecate Gitter rooms, Download all data #8418

Closed
QuincyLarson opened this issue May 3, 2016 · 114 comments

@QuincyLarson
Contributor

Many old rooms such as /helpBonfires are now deprecated, but campers still join them. We need a contributor who's earned write access to this repo to go through and update the topic of these rooms by running:

/topic This room is inactive, and only exists for archival purposes. Join some active rooms - here's the full list: https://www.freecodecamp.com/wiki/en/official-free-code-camp-chat-rooms/
@QuincyLarson added the "help wanted" label May 3, 2016
@raisedadead self-assigned this May 3, 2016
@raisedadead
Member

Rooms Notified

  • Bonfires
  • Basejumps
  • TwitchTV
  • NodeSchool

Left out the City specific rooms.

@QuincyLarson I think that should be it.

P.S. With this audit, I just realized that we have 540 rooms.
Some of them were really funny, and some were created and then forgotten.

@raisedadead added the "status: waiting review" label May 3, 2016
@sludge256
Contributor

Added:

  • Ziplines

@raisedadead removed the "status: waiting review" label May 4, 2016
@BerkeleyTrue
Contributor

Nice work!

@QuincyLarson
Contributor Author

@sludge256 @raisedadead actually, I think we need to do this in literally every room that isn't on our official rooms list (unless it happens to have significant activity within the past 3 days).

@QuincyLarson reopened this May 5, 2016
@raisedadead
Member

I have checked most rooms. The rooms with no activity in more than a month have not been touched, simply because updating them would have triggered discussion in them.

The campsite rooms do not need this: they already have a message, and the above also applies to all of them.

Closing.

@BerkeleyTrue
Contributor

@raisedadead We have discontinued the fcc wiki. Do the deprecated rooms point to the corresponding post in the forum?

@raisedadead
Member

@BerkeleyTrue
I have updated the topic.
They still link to the wiki in the chat message. I can update that, but I don't want to trigger notifications. It's best to leave them as they are, considering there is no recent activity in most.

However, Ziplines and Bonfires sometimes get visitors, because Gitter shows them as suggested rooms.

@QuincyLarson
Contributor Author

@raisedadead @BerkeleyTrue we may actually just want to delete the deprecated rooms. Few would be missed, and that way, Gitter's native discovery features would work properly. We wouldn't have to list our official rooms - we would only have official rooms (all the other, unofficial rooms would be run by campers themselves and not under the freecodecamp prefix).

Downside: we lose some history and some small amount of Google search results
Upside: the chatrooms become much simpler to explain to people.

With the success of the forum, my goal is to shift a lot of communication that would have taken place intermittently on Gitter over to the forum, where the expectation is that it may take days to get a response.

So many of the chat rooms are ghost towns - hence us frequently pruning or deprecating them.

We would need to go through the official rooms and see which are active.

@raisedadead
Member

Yeah, I agree. I think the upside outweighs the downside, so I'm in favor of deleting them. I will do the audit and post a list of rooms that can or must be deleted.

@raisedadead changed the title from "Update topic of deprecated Gitter rooms" to "Deprecate Gitter rooms" Aug 2, 2016
@raisedadead removed the "help wanted" label Aug 2, 2016
@raisedadead reopened this Aug 2, 2016
@QuincyLarson
Contributor Author

many: > 100/day
some: > 10/day, < 100/day
few: < 10/day
inactive: no posts today

FreeCodeCamp - many
Help - many
HelpJavaScript - many
HelpFrontEnd - many
HelpDataViz - some
HelpBackEnd - some
Python - few
Java - few
Ruby - inactive
PHP - few
Go - inactive
Elixir - inactive
.NET - inactive
C++ - inactive
Vagrant - inactive
Git - inactive
Linux - inactive
SQL - inactive
CodeReview - many
YouCanDoThis - few
CodingJobs - many
Casual - some
CurriculumDevelopment - some
DataScience - some
Albanian - inactive
Arabic - few
Chinese - many
Dutch - inactive
German - inactive
French - inactive
Japanese - inactive
Korean - inactive
Persian - inactive
Portuguese - inactive
Romanian - inactive
Russian - few
Spanish - some
Swedish - inactive
Tagalog - inactive
Thai - inactive
Vietnamese - inactive
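
For reference, the thresholds above can be written down as a small function. This is only a sketch: the name `classify_activity` is made up for illustration, and since the thresholds don't say what happens at exactly 10 or 100 messages per day, the code assumes 10 counts as "few" and 100 counts as "some".

```python
def classify_activity(messages_per_day: float, posted_today: bool = True) -> str:
    """Map a room's daily message count to the labels used in the list above.

    Assumption: the boundary values are not defined in the list, so exactly
    10/day is treated as "few" and exactly 100/day as "some".
    """
    if not posted_today or messages_per_day <= 0:
        return "inactive"  # no posts today
    if messages_per_day > 100:
        return "many"
    if messages_per_day > 10:
        return "some"
    return "few"


if __name__ == "__main__":
    for rate in (150, 40, 3, 0):
        print(rate, classify_activity(rate))
```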

@QuincyLarson
Contributor Author

QuincyLarson commented Aug 2, 2016

Our community is primarily spread out across 3 places:

  • Gitter
  • Our forum
  • Facebook groups

For what it's worth, here's my thinking on each of these mediums:

  • Chat rooms - in order for a chat room to be useful, people should generally get a response within a few minutes of posting a question or comment. That's how it is with our main chat room and most of our help rooms. Chat rooms should hold your attention. They are less good for asynchronous communication, as many teams are discovering.
  • Facebook groups - fun places to casually share links, have superficial (non-threaded) discussions asynchronously, and just go and scroll through recent posts.
  • Forums - ideal for deep, topical discussions. Someone posts a link or asks a question. Someone else answers. A discussion emerges. These are less fun than Facebook groups and chat rooms, but generally more practical.

Of these three, chat rooms forge the tightest bonds. Talking with people in real time is exhilarating, and it can be hard to leave a chat room. That said, chat rooms suffer from the "ghost town" effect much more than forums or Facebook groups. Once things die down in a chat room, it can be hard to get the conversation started again.

By concentrating as much of our chat room-going community as possible into a few key rooms, we mitigate the risk of ghost towns.

The question is what rooms do we keep?

Based on the above research, I humbly propose we:

  • move all discussion of non-JavaScript languages over to the forum
  • move all world languages over to the forum and only leave the Chinese and Spanish Gitter rooms open. We can further encourage the campers who are using these to also try using the forum, and see whether those rooms continue to be active.
  • for sure leave the main room, help rooms, codeReview, coreTeam
  • discuss the future of CurriculumDevelopment, Hikes, and other rooms. These could all be combined into a more general, more active "Contributors" room.
  • By reducing the number of rooms, the remaining rooms become more prominent. For reference, here is what our "rooms" page looks like now: https://gitter.im/orgs/FreeCodeCamp/rooms

I am excited to hear everyone's thoughts on this.

@evaristoc
Contributor

evaristoc commented Aug 2, 2016

QuincyLarson

I saw your proposed list and your invitation to discuss the future of a chatroom like the DataScience one. I don't think this room fits a "general room" format: the discussions that are occurring there are mostly about Data Science. Merging that room into another more general one could kill the current content and motivation of the room unless it is merged with rooms with similar intentions. A room like "contributors" doesn't look like the best title for this one. I can also say that it is attracting people: there are currently 680 subscribers to the room, and some people are monitoring the activity.

As it is now, I don't think the nature of this room and the activity involved would fit the forum format.

I can only talk about this room because it is the room I have been managing since its foundation.

@evaristoc
Contributor

evaristoc commented Aug 2, 2016

I have also monitored the activity in the Python room: it is not 'few' but 'some'. However, this is a room that due to its nature could be moved somewhere else.

@evaristoc
Contributor

evaristoc commented Aug 2, 2016

The Spanish room is one of the most active ones I have seen too... The people there are really good at maintaining the room activity.

This won't fit the forum format either.

If you think that the future is to delete those rooms, I think that the Spanish one will be affected by deleting related rooms like HelpDataViz, HelpJavaScript, etc. The Spanish room is also a place to help solve issues that Spanish speakers ran into in those rooms. Once you delete the Help-related rooms and move activity to the forum, it is possible that the Spanish room won't survive, or that its activity will drop. If activity on Gitter declines in general, the Spanish room could tend to disappear.

@evaristoc
Contributor

I suggest the following:
For the rooms with a "some" level of activity, either merge or wait to see what happens with them after deleting other rooms that might be related.

I think deleting the Help- rooms will simply reduce overall activity on Gitter, as Gitter will no longer be a reference point for the set of help resources.

@SamAI-Software
Member

SamAI-Software commented Aug 2, 2016

either merge or wait to see what happens with them after deleting other rooms that might be related.

@evaristoc just came up with a great idea. Instead of making a big change at once, let's make these changes step by step.
To start, we can close the other language rooms (not JS), try to move discussions to the forum, and see the result. We will also get feedback from regular users of those rooms.
If the feedback is very negative, then we will rethink the whole idea of closing rooms.

@evaristoc
Contributor

evaristoc commented Aug 2, 2016

@QuincyLarson
If the actual plan is to delete them no matter the activity, I would suggest offering alternatives to users by talking to each room's main moderator about:

  • How FCC sees the room (is it part of the main core? is it adding value to FCC?)
  • The possibilities, and how FCC will consider the group (is it still core to FCC? or is it an alternative that should be managed by users themselves? will it receive support from FCC in the future?)
  • A decommission plan to agree on: a time frame after which the room will be definitively moved or deleted.

At the least, this would give users enough time to "move their stuff somewhere else".

My understanding is:
DataScience, Spanish, Russian, Chinese, Casual are not part of the main core and mission of Free Code Camp. Although I wouldn't like to suggest that for the room I am managing, and I know that it will affect the activity of that room completely, those rooms could be decommissioned into other platforms (like FB), with the caveat that they will lose A LOT of traffic - they will anyway if Gitter becomes less prominent as an FCC platform.

This should be done only if you decide that those rooms are not contributing to the FCC project directly. Otherwise, I suggest you keep them until you see what happens with Gitter activity after deleting other related rooms.

@CarlJKashnier

I agree with Quincy. If there are very active city rooms, keeping them might be a consideration (I know Cleveland, where I am from, has had nothing since April). I do think that paring down the rooms that duplicate other rooms would make things easier, like the old help rooms before the changeover.

Now, Spanish/Chinese we don't touch, Chinese especially because of the lack of a good FB alternative. I think we should be able to get by with about 15-ish rooms using a flow of skill sets: Front, Data, Back, Code-review, Pairing, CodingJobs, Contributors, Core. I am pretty sure that there are other rooms that should be added, but these are the ones I feel most strongly about. Once we move to the new curriculum, each segment of the certification being its own certification might warrant its own room.

Maybe keep off topic as a place to spend a pomodoro break away from coding.

@abhisekp
Member

abhisekp commented Dec 18, 2016

@QuincyLarson I've downloaded the FreeCodeCamp/FreeCodeCamp main room completely, up to 17-12-2016 3:39:24 PM GMT.

Total Uncompressed Size: 588 MB (tab separated format .tsv)
Compressed using 7z Size: 95 MB (will be uploaded to repo using git-lfs)

Sample Format

room_id room_uri sent_at from_userid from_username message_id text
546fd572db8155e6700d6eaf FreeCodeCamp/FreeCodeCamp 2014-11-22T00:26:21.469Z 546fd823db8155e6700d6eb4 Rybar 546fd82da07c098d4401b480 Hola.
546fd572db8155e6700d6eaf FreeCodeCamp/FreeCodeCamp 2014-11-22T00:15:04.643Z 546fd58872a00ba87914fcfe @freeCodeCamp first person here

Note: the from_userid and from_username fields were empty in the original message.
This is the First Ever message in FreeCodeCamp room 😃
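
For anyone who wants to load this format, here is a minimal Python sketch using only the standard library. It assumes the file starts with the header row shown above and that message text contains no raw tab or newline characters (how the archive escapes those isn't stated here). The function name `read_messages` is just for illustration.

```python
import csv
import io


def read_messages(stream):
    """Yield one dict per message from a tab-separated archive stream.

    Assumptions: the first line is the header row shown above, and tabs or
    newlines inside message text have already been escaped by the exporter.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    for row in reader:
        yield dict(zip(header, row))


if __name__ == "__main__":
    sample = (
        "room_id\troom_uri\tsent_at\tfrom_userid\tfrom_username\tmessage_id\ttext\n"
        "546fd572db8155e6700d6eaf\tFreeCodeCamp/FreeCodeCamp\t"
        "2014-11-22T00:15:04.643Z\t\t\t546fd58872a00ba87914fcfe\t"
        "@freeCodeCamp first person here\n"
    )
    for message in read_messages(io.StringIO(sample)):
        print(message["sent_at"], repr(message["from_username"]), message["text"])
```

Empty `from_userid` and `from_username` fields, as in the first-ever message, simply come back as empty strings.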


  • Ran archiving script in C9.io and the whole room downloaded in just 1 night.
  • Will release the script soon so that gitter archives can be made for any community in gitter. ⚡

Note: I monitored the whole download and there were absolutely NO errors while downloading the messages.
I used http://papertrailapp.com for logging.

// cc: @evaristoc

@abhisekp
Member

git-lfs cannot be used because it is a paid service, and GitHub is not the right place to upload large files.

https://github.com/ckolivas/lrzip gives the best compression. I was able to compress the 588 MB file to only 75 MB and then decompress it (both using C9, i.e. within a 512 MB memory limit).

@QuincyLarson
Contributor Author

@abhisekp Awesome! I didn't realize Git-LFS costs money. Can't we just push these files to a repo? FreeCodeCamp/FreeCodeCamp is the largest Gitter room by an order of magnitude, so at that level of compression, the other rooms shouldn't be much of a problem.

@raisedadead
Member

Can we dump this to Amazon S3? GitHub clearly isn't the right place for data storage. S3 comes with 5 GB of free storage, and data retrievals cost $0.01 per GB.

I think we already have an instance for this?

@QuincyLarson
Contributor Author

@raisedadead yes - I can put them into our AWS S3 account for hosting. Once you have the files ready for all the rooms we're archiving, let's hop on a call and figure out how to get them from your computer to our S3 instance.

@raisedadead
Member

@QuincyLarson, @abhisekp has the data currently, so he is probably the best person to guide this.
Uploading should be pretty straightforward as long as you can (maybe with help from Berkeley) create credentials for him with the correct scopes.

But, do let me know if I can help in any way.

@QuincyLarson
Contributor Author

@raisedadead Thanks for the idea! I've created an S3 key especially for @abhisekp and sent it to him.

@ladybugtju
Contributor

ladybugtju commented Dec 20, 2016

Hi there
Glad to see such progress :) @abhisekp, the rooms downloaded are on the same link I shared sometime ago: https://docs.google.com/spreadsheets/d/1HSRL-HTOREYF86mNDczNp7XNA5Tuo4TWP4AiMYyZDU8/edit#gid=0
It dates back to October though, so probably better to realign the data. I actually picked important and active rooms so there is for sure lots of new stuff. Did you automate downloading all rooms or do you have to do that separately? Let me know if you need help. Cheers

PS: Here is a link for the downloaded rooms: https://we.tl/lcBU6Cs7E4
12 rooms on October 21. It's just a WeTransfer link, but you could use GitHub, AWS, or another service.

@abhisekp
Member

abhisekp commented Dec 23, 2016

gitter-archive-cli Released 🎉

Archive gitter communities worry-free 😃

Feature

  • ⚡ Fast download using multiple gitter tokens
  • ✋ ⚙️ Pause and Resume feature (CTRL + C to end the process) and on start, it resumes from where it left off (as per auto-generated file gitterarchive-settings.json).
  • Archive and No archive room list using wildcard pattern matching. (example below)

How to use

# install globally
$ npm i -g gitter-archive-cli

Create a directory where you want to save the gitter community archives.

Create a .env file and .gitterarchiverc.json as per example below, in that directory.

NOTE: noArchiveList takes precedence over archiveList.

Now simply start the archiving process using gitter-archive command. 😄

Example

.env

# Gitter Tokens
GITTER_TOKEN_username1=
GITTER_TOKEN_username2=
GITTER_TOKEN_username3=

.gitterarchiverc.json

{
  "rooms": {
    "noArchiveList": [
      "FreeCodeCamp/HelpJavaScript",
      "FreeCodeCamp/FreeCodeCamp",
      "FreeCodeCamp/Help",
      "FreeCodeCamp/HelpFrontend",
      "FreeCodeCamp/HelpBackend",
      "FreeCodeCamp/[a-m]*"
    ],
    "archiveList": [
      "FreeCodeCamp/*"
    ]
  },

  "groups": {
    "enabled": [{
      "uri": "FreeCodeCamp",
      "id": "57542cf4c43b8c6019778297"
    }],
    "disabled": []
  }
}
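
To illustrate the precedence rule, here is a rough Python sketch of how the two lists could interact. This is not the tool's actual code: `should_archive` is a hypothetical helper, and it assumes shell-style, case-sensitive wildcard matching (Python's `fnmatchcase`), which may differ from the matcher gitter-archive-cli really uses.

```python
from fnmatch import fnmatchcase


def should_archive(room_uri, archive_list, no_archive_list):
    """Return True if a room would be archived under the two pattern lists.

    Assumption: patterns are shell-style wildcards matched case-sensitively.
    noArchiveList is checked first, so it wins over archiveList.
    """
    if any(fnmatchcase(room_uri, pattern) for pattern in no_archive_list):
        return False
    return any(fnmatchcase(room_uri, pattern) for pattern in archive_list)


if __name__ == "__main__":
    no_archive = [
        "FreeCodeCamp/HelpJavaScript",
        "FreeCodeCamp/FreeCodeCamp",
        "FreeCodeCamp/Help",
        "FreeCodeCamp/HelpFrontend",
        "FreeCodeCamp/HelpBackend",
        "FreeCodeCamp/[a-m]*",
    ]
    archive = ["FreeCodeCamp/*"]
    for room in ("FreeCodeCamp/Help", "FreeCodeCamp/Python", "FreeCodeCamp/linux"):
        print(room, should_archive(room, archive, no_archive))
```

Under these assumptions, a room is skipped as soon as any noArchiveList pattern matches, even though "FreeCodeCamp/*" also matches it.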

Run 🏃‍♂️

$ gitter-archive

If it stops abruptly, then simply run the above command again and it will auto resume from where it left off. ✅


Some known bugs

If you see an Error Response 429 status, wait for one or two minutes and don't end the process. It will work fine after a few minutes.
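
For anyone scripting their own downloader, the same wait-and-retry idea can be sketched in Python. This is not how gitter-archive-cli is implemented: `RateLimited`, `fetch_with_retry`, and the delay values are hypothetical, and `fetch` stands for whatever function performs one API request and raises `RateLimited` on an HTTP 429 response.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by `fetch` when the API answers with HTTP 429."""


def fetch_with_retry(fetch, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call `fetch` and, on a rate-limit error, wait and try again.

    The wait doubles after each failed attempt (60 s, 120 s, ...). These
    numbers are illustrative, not documented Gitter limits.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_fetch():
        # Simulated request: rate-limited twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RateLimited()
        return ["message batch"]

    # Pass a no-op sleep so the demo finishes instantly.
    print(fetch_with_retry(flaky_fetch, sleep=lambda seconds: None))
```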


Sidenote: It works seamlessly in C9.io


// cc: @evaristoc @ladybugtju

@QuincyLarson
Contributor Author

@abhisekp Awesome! If this works well, can you go ahead and set it to run non-stop until all of our rooms are backed up? Then we can zip that up and push it to S3.

@ladybugtju
Contributor

@abhisekp Let the magic happen :) Cheers

@abhisekp
Member

abhisekp commented Dec 24, 2016

@QuincyLarson @ladybugtju Thanks. The magic is almost complete 😏
👉 https://github.com/FreeCodeCamp/gitter-history


GitHub has a strict file size limit of 100 MB but gives a warning at 50 MB. I was able to push a 53 MB file (the FreeCodeCamp/HelpFrontend room archive).
https://help.github.com/articles/what-is-my-disk-quota/
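
A quick pre-push check along these lines can flag archives that are too big before `git push` fails. It is a sketch only: `github_push_status` is a made-up helper, and it assumes the 50 MB and 100 MB figures above are measured in units of 1024 * 1024 bytes.

```python
WARN_BYTES = 50 * 1024 * 1024    # assumed warning threshold (50 MB)
LIMIT_BYTES = 100 * 1024 * 1024  # assumed hard limit (100 MB)


def github_push_status(size_bytes: int) -> str:
    """Classify a file size against the limits quoted above."""
    if size_bytes > LIMIT_BYTES:
        return "rejected"
    if size_bytes > WARN_BYTES:
        return "warning"
    return "ok"


if __name__ == "__main__":
    mb = 1024 * 1024
    for label, size in (("HelpFrontend archive", 53 * mb), ("Help uncompressed", 291 * mb)):
        print(label, github_push_status(size))
```

To check a real file, pass `os.path.getsize(path)` as the size.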


The only remaining rooms are

  • FreeCodeCamp/Help (messageCount: 1368452) (291 MB uncompressed)
  • FreeCodeCamp/HelpJavaScript
  • FreeCodeCamp/FreeCodeCamp

These are the largest rooms.


Update

Pushed FreeCodeCamp/Help room after archiving it using zip with maximum compression level of 9.
File size: 62 MB (compressed)

@QuincyLarson
Contributor Author

@abhisekp Amazing work. So now that these rooms are in version control, do you think we're safe to start deleting these rooms?

As for the larger rooms, we might want to break them up into separate files by date. For example: FreeCodeCamp/FreeCodeCamp August 1 2015 - November 30 2015

This will make it easier for us to continue pushing updated archives to them.
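
One way to do the split, sketched in Python: group messages by the year and month of their `sent_at` timestamp, then write each group to its own file. The helper name `group_by_month` is hypothetical, monthly buckets are just one possible granularity, and the code assumes `sent_at` is an ISO 8601 string such as 2014-11-22T00:26:21.469Z, as in the archive sample earlier in this thread.

```python
from collections import defaultdict


def group_by_month(rows):
    """Bucket message dicts by the "YYYY-MM" prefix of their sent_at field.

    Assumption: sent_at is an ISO 8601 timestamp, so its first seven
    characters are the year and month.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["sent_at"][:7]].append(row)
    return dict(buckets)


if __name__ == "__main__":
    rows = [
        {"sent_at": "2015-08-01T10:00:00.000Z", "text": "first"},
        {"sent_at": "2015-08-15T12:30:00.000Z", "text": "second"},
        {"sent_at": "2015-11-30T23:59:59.000Z", "text": "third"},
    ]
    for month, messages in sorted(group_by_month(rows).items()):
        print(month, len(messages))
```

Only the most recent bucket changes when new messages arrive, so older files would not need to be pushed again.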

This will be a huge public dataset and I imagine a lot of people will be interested in it. We can publish this on Kaggle once it's ready :)

@QuincyLarson
Contributor Author

@abhisekp how is this process going? Have you managed to do a full archive of our Gitter rooms? Can we start closing rooms that we don't plan on keeping now?

@QuincyLarson
Contributor Author

QuincyLarson commented Feb 25, 2017

Our plan is to:

  • download ALL messages from FreeCodeCamp/Help
  • download ALL messages from FreeCodeCamp/HelpJavaScript
  • download ALL messages from FreeCodeCamp/FreeCodeCamp
  • download messages from all other rooms that occurred after Dec 24th

Once all these messages are in the https://github.com/FreeCodeCamp/gitter-history repo, I will go through and delete all rooms except for those on our official rooms list. All of these rooms already have deprecation messages and aren't being used anyway.

Campers can still create their own unofficial Gitter rooms, but freeCodeCamp's chatroom structure will be clean and simple, with minimal ambiguity about which rooms to go to.

Then we will use https://gitter.im/FreeCodeCamp/home as our main entryway to our chat rooms.

@evaristoc
Contributor

Sorry @QuincyLarson, may I ask why FreeCodeCamp/FreeCodeCamp?

@erictleung
Member

@evaristoc sounds like the answer is

...freeCodeCamp's chatroom structure will be clean and simple, with minimal ambiguity about which rooms to go to. - #8418 (comment)

A room called just freeCodeCamp isn't really descriptive of what the room does, per se.

@evaristoc
Contributor

evaristoc commented Mar 10, 2017

@QuincyLarson @erictleung for a small project I was planning to do, I downloaded almost all messages from the main room. The data runs through yesterday.
That is about 2.5 GB of data without compression (if the data I downloaded is correct). Is this something that would need to be added to the repo?

@erictleung good point, but as I understand it this is the list of official rooms. The FreeCodeCamp room is still there:
#8418 (comment)

Is there a change in that list that I am not aware of?

@QuincyLarson
Contributor Author

@erictleung there's no way to rename a Gitter room. Otherwise we would indeed rename that chat room to "general" or "casual".

@QuincyLarson
Contributor Author

@evaristoc yes - that would be awesome. If you pull all of freecodecamp/freecodecamp be sure to add it to https://github.com/FreeCodeCamp/gitter-history

@QuincyLarson
Contributor Author

OK - all the rooms that were deprecated were backed up afterward. I haven't heard anything from @abhisekp recently so I've gone ahead and deleted the deprecated rooms. Thanks everyone!

@evaristoc
Contributor

evaristoc commented Dec 7, 2017

New rooms scheduled for archiving:

  • FreeCodeCamp/NewYorkCity (id: 5593982115522ed4b3e3263f)
  • FreeCodeCamp/CoreTeam

Currently exploring @abhisekp's approach to archiving before initiating the process of downloading the data:

#8418 (comment)
#8418 (comment)
#8418 (comment)

@evaristoc
Contributor

evaristoc commented Dec 7, 2017

@QuincyLarson :

I was trying to use the great package made by @abhisekp: https://www.npmjs.com/package/gitter-archive-cli but unfortunately it didn't work on my computer. It is giving a 404 error that I am finding hard to debug.

I will likely work on this in Python. My current code seems to be outdated, though. Apparently I am also affected by the rate limits, with a 459 error. That didn't happen before: I managed to download messages over the limit in March 2017 with simpler code.

If it works, I will make my Python code available, hoping that it will help establish a standard approach to chatroom archiving in the future.

@evaristoc
Contributor

evaristoc commented Dec 8, 2017

@evaristoc
Contributor

This is a previous message by @abhisekp to be kept here as reference: #8418 (comment)

@raisedadead
Member

@evaristoc please continue on the new thread linked above.
