Team Blind Review dataset

About the dataset

Team Blind is a professional's anonymous social networking platform. Work email-verified professionals can connect with their coworkers and other professionals by holding conversations on various topics, like compensation, work-life balance, pros/cons, and their overall opinion about their workplace.

According to a Fortune article:

On the platform, employees talk openly about pay transparency and seek career advice. The forum also has job postings, as part of the Talent by Blind recruiting portion of the app. But they also make more honest and frustrated statements about how they disagree with company culture.

According to the above article, an average Blind user is ambitious, independent, and direct and says that the users are set apart by their questions about how they should advance their careers. Thus it becomes more plausible to understand the various reviews that employees of particular tech companies have shared about their work over the dataset.

The Blind App Dataset is a dataset of reviews of over 25 companies, distributed across tech & consulting, and what their employees think about their workplace. The dataset covers the reviews of employees ever since Blind App was launched till May 8, 2022, when we generated this dataset. It covers an overall rating, description, pros, cons, author information, and their resignation reason (if they are a former employee).

The various companies covered in this dataset along with their number of reviews are:

Company	Review size
Adobe	954
Airbnb	515
Amazon	9903
Apple	1797
Atlassian	458
Bloomberg	1116
Bytedance	688
Cisco	1488
Coinbase	305
Deloitte	1047
Goldman Sachs	897
Google	5315
IBM	1247
Intel	1227
Intuit	785
Meta	1680
Microsoft	5830
Netflix	288
Oracle	1469
SAP Labs	822
Salesforce	1732
Stripe	496
Twitter	685
Uber	1679
Walmart	1377

The data is available in a CSV and a JSON file. Each review has a Rating, Description, Pros, Cons, Author Info & Resignation Reason. This dataset can be used to answer questions such as:

What is each company's rating, and how do reviews differ across various rating levels?
What are the main topics discussed for each company when it comes to Pros versus Cons?
What key aspects make a company good or bad across various factors (like work-life balance, management, compensation)?
What are the main reasons for former employees resigning from a company?
What are the main factors from which a prospective employee should choose a particular company?

The dataset has been scraped from Team Blind, and we used individual company reviews to develop the dataset. You can navigate to any specific company directory and check out the reviews on the original company-specific forum. Given that the data is scraped from Blind, all data attribution rests with them and the specific authors (who are anonymous at the moment).

Using the dataset

The dataset can be cloned using git:

git clone git@github.com:HarshCasper/Blind-App-Reviews.git

After a successful clone, you can use the dataset to load the individual CSV/JSON file of a particular company and start the analysis. Alternatively, you can download the ZIP of the entire repository, albeit without the version control available.

You can also use the dataset over Kaggle. Install the Kaggle library using pip install kaggle and create a new API token JSON file and save it over ~/.kaggle/ directory. Download the dataset using kaggle datasets download -d harshcasper/blind-app-company-reviews.

Contributing to the dataset

The scrapper engine used to develop the dataset is currently not open-source. However, it will be open-sourced at a later stage. To contribute to the dataset, you may create an issue to request new companies to be added to the dataset or the dataset to be updated for existing companies. At a later stage, you can self-generate new datasets using the scrapper engine and submit patches.

To fix existing dataset issues or clean certain areas, feel free to raise a pull request. Follow the GitHub guide to submit a pull request to contribute to the sanctity of the dataset. You may also develop Exploratory Data Analysis, Topic Modelling, and other NLP-related analyses with the dataset to create more data-driven insights.

License

This project uses the following license: MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
Adobe		Adobe
Airbnb		Airbnb
Amazon		Amazon
Apple		Apple
Atlassian		Atlassian
Bloomberg		Bloomberg
Bytedance		Bytedance
Cisco		Cisco
Coinbase		Coinbase
Deloitte		Deloitte
Goldman-Sachs		Goldman-Sachs
Google		Google
IBM		IBM
Intel		Intel
Intuit		Intuit
Meta		Meta
Microsoft		Microsoft
Netflix		Netflix
Oracle		Oracle
SAP-Labs		SAP-Labs
Salesforce		Salesforce
Stripe		Stripe
Twitter		Twitter
Uber		Uber
Walmart		Walmart
LICENSE		LICENSE
README.md		README.md
_config.yml		_config.yml

License

HarshCasper/Blind-App-Reviews

Folders and files

Latest commit

History

Repository files navigation

Team Blind Review dataset

About the dataset

Using the dataset

Contributing to the dataset

License

About

Topics

Resources

License

Stars

Watchers

Forks