Mantel Group Data Science Challenge

Thank you for taking the time to do the Mantel Group Data Science challenge. We're looking forward to hearing your thoughts and your approach to these common client asks we've recreated!

The challenge contains two exercises. Our preference is that you complete exercise 2 in Python, because it is the tool we most commonly use across Mantel Group Data and Analytics. You are welcome to create a Jupyter notebook and use any Python packages. Please send your submission via email, including your code for exercise 2 and any materials for exercise 1.

The purpose of this challenge is to kick off a conversation that dives a little deeper into your consulting and technical experience. This is a chance for you to demonstrate how you might approach business problems, but we don't expect you to try every possible variation and approach! Don't stress if you're tight on time - we get it, you have a life outside of applying for jobs - we're not expecting you to spend endless hours perfecting it (or on infinite hyperparameter tuning! ;) ). Just make sure you do enough to show us how awesome you are at data science. In exercise 1 we're looking to explore how you would approach solving business problems in a client setting, while exercise 2 focuses on some of the core data science skills.

Exercises

Exercise #1

Mantel Group has been approached by an artisan brewer who is considering adding a new hoppy beer to their range: an India pale ale (IPA). The client's marketing team would like to know how consumers feel about this product and how attitudes toward IPA have changed over the past few years. They believe a question-answering website such as beer.stackexchange.com could be used as a source of genuine opinions.

Given this information, you are asked to prepare for the initial meeting with the prospective client where you will represent Mantel Group as a domain expert and help with sourcing information to shape the engagement. It may help to think about:

what approach could be recommended
what further information we need
what the deliverables should be
how success can be measured

No code is expected for this exercise, and a slide or two describing what you would present to the client's marketing team is sufficient to guide our conversation.

Exercise #2

Use the dataset in ./dataset/questions.csv to develop a classifier that predicts the forum from the question body. Your objective is to maximise the model’s performance; however, we encourage you to avoid spending time training multiple models. You are free to preprocess the data as you like and use any classification algorithms. Produce a Jupyter Notebook that includes all code, details, and decisions made over the course of your model development.
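To give a feel for the scale of solution this exercise has in mind, here is a minimal sketch of a single baseline model: a bag-of-words multinomial Naive Bayes classifier written with only the Python standard library. It is not a reference solution. The column names `body` and `forum` are assumptions (check them against the schema), the tokeniser is deliberately crude, and the toy questions merely stand in for questions.csv. In a notebook you would more likely reach for a library such as scikit-learn.

```python
import csv
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def load_questions(path, text_col="body", label_col="forum"):
    """Read (texts, labels) from a CSV file.

    The column names are assumptions; adjust them to the real schema.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [r[text_col] for r in rows], [r[label_col] for r in rows]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / n_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data standing in for questions.csv, purely for illustration.
train_texts = [
    "Which hops give an IPA its bitter flavour?",
    "How long should I ferment a pale ale?",
    "What grind size works best for espresso?",
    "Does a darker roast contain less caffeine?",
]
train_labels = ["beer", "beer", "coffee", "coffee"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("Is espresso roast different from filter roast?")
```

With the real data, something like `texts, labels = load_questions("./dataset/questions.csv")` followed by a held-out split would replace the toy lists. Whichever model you choose, reporting a per-forum metric on unseen questions is more informative than training accuracy alone.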

Data

We have compiled a dataset for exercise 2, which can be found in the ./dataset/ folder and is described below.

The data is compiled from the Stack Exchange data dump, which is distributed under the Creative Commons Attribution-ShareAlike 4.0 International license.

Of the Stack Exchange forums, we have selected ai, astronomy, beer, coffee, computergraphics, martialarts, opendata, quantumcomputing, sports.

The schema for the data can be found here.

Modifications:

Extracted question bodies for Exercise 2.
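
Question bodies in the Stack Exchange data dump are stored as HTML, so a light clean-up step may be worthwhile before tokenising, depending on how much markup survives in questions.csv. The sketch below uses only the standard library. `strip_html` is a hypothetical helper name, and whether the extracted bodies still contain markup is an assumption to verify against the file. Dropping the contents of code blocks is a design choice, not a requirement; for some forums the code itself may be a useful signal.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <code> and <pre>."""

    SKIP_TAGS = {"code", "pre"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(body):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(" ".join(parser.parts).split())


# Illustrative input only; not taken from the dataset.
raw = "<p>Why is my <strong>IPA</strong> so bitter?</p><pre><code>OG 1.060</code></pre>"
clean = strip_html(raw)
```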
