Terrible Data from your wonderful job #25

Open
HasturDev opened this issue Apr 10, 2020 · 8 comments

Comments

@HasturDev

Synopsis

The idea is to work through 4 phases, each built around a progressively more difficult CSV file to parse.

Examples

The first input will be Easy_CSV.csv, which will require:

    import csv

    # newline='' lets the csv module handle line endings itself
    with open('Easy_CSV.csv', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        for row in csv_reader:
            print(row)

It will then show the names and integers from the CSV file.

The second input will be Intermediate.csv, which will require more effort, followed by hard.csv and then hellmode.csv.

@HasturDev
Author

0skhMH: I created a specific Dojo. I will have the last one collected and ready to post before Thursday.

@HasturDev
Author

I do need help thinking of a way to create something slightly harder or easier than the medium: something that involves placing a single piece of data inside a regular CSV, or something that feels like a medium or hard CSV.
I just don't have any idea what that would be.

@xvillaneau
Collaborator

Looking into the examples so far in the Dojo session, this is a great start. Thanks for your work!

I see that the "easy" file has a header line (with column names) while the "medium" one does not. I would suggest inverting that so that the first exercise is a little easier (getting numbers from a list of lines is a great beginner-level problem) and the medium one a little more interesting (ask users to make sure the column names are preserved in the output). And I would not put those names in quotes just yet.
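
To make the suggested split concrete, here is a minimal sketch of what the two exercises could ask for. The sample data, the column names `name` and `count`, and the function names are all made up for illustration; the real Dojo files may look different.

```python
import csv
import io

# Hypothetical sample data; the real Dojo files may differ.
EASY = "apple,3\nbanana,5\ncherry,7\n"        # no header line
MEDIUM = "name,count\napple,3\nbanana,5\n"    # header line with column names


def read_easy(text):
    """Headerless file: return (name, number) pairs."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [(name, int(number)) for name, number in reader]


def double_counts(text):
    """File with a header: double every count and keep the column names."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()  # the column names survive in the output
    for row in reader:
        row["count"] = int(row["count"]) * 2
        writer.writerow(row)
    return out.getvalue()


print(read_easy(EASY))
print(double_counts(MEDIUM))
```

The easy version only needs `csv.reader`; the medium version uses `csv.DictReader` and `csv.DictWriter` so that the header is read once and written back unchanged.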

For a "hard" puzzle, here are some ideas that could make a good difficulty curve:

  • Have incomplete lines, i.e. a line where the last field is blank but the trailing comma is missing. E.g. banana,3 instead of banana,3, when there are 3 fields.
  • Have quoted string values with commas in them (they would be in quotes, so that the data is still technically valid CSV). E.g.: "Hello, World!"
  • To be extra evil, put in some "nice" numbers. E.g. "10,000" instead of 10000
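
As a rough sketch of what a solution to such a "hard" file might have to do, the snippet below handles all three ideas at once. The sample rows, the three-field layout (name, quantity, note) and the function name `parse_hard` are assumptions made for this example only.

```python
import csv
import io

# Hypothetical "hard" sample with three fields per row: name, quantity, note.
HARD = 'banana,3\n"Hello, World!","10,000",greeting\napple,7,\n'


def parse_hard(text, width=3):
    """Return (name, quantity, note) tuples from messy but valid CSV text."""
    rows = []
    for row in csv.reader(io.StringIO(text, newline="")):
        # Pad short rows so a missing trailing comma still yields `width` fields.
        row = row + [""] * (width - len(row))
        name, quantity, note = row
        # Strip thousands separators before converting "10,000" to an int.
        rows.append((name, int(quantity.replace(",", "")), note))
    return rows


for parsed in parse_hard(HARD):
    print(parsed)
```

The `csv` module already deals with the quoted commas; the padding and the separator stripping are the parts a participant would have to think of themselves. Rows with more than `width` fields are not handled here and would raise an error.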

@HasturDev
Author

So I found out that the CSV size I want to use will throw a site error. 800k is just too much for the system to handle, and even 100k is too large. There are a few options:
One, I can hand people the CSV file to use if they want to do the full challenge with all 11 rows and however many thousands of columns.
Two, I can go through the full tedium of removing some of the more difficult things within the file that also increase its size.
Three, I can split away the actually difficult parts of the file and just leave the easier pieces, which are still largish.

Here is some of the file so you can get an idea of what I'm working with: Terrible_Csv_File.txt

@xvillaneau
Collaborator

I see, yes there are limitations in cyber-dojo. I usually avoid files that are more than 1000 lines long.
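
One possible way to cut a large file down to that size is sketched below. The 1000-row cap comes from the rule of thumb above; the function name `truncate_csv` and the demo data are made up, and the sketch counts parsed rows rather than physical lines, so a quoted field containing a line break would still count as one row.

```python
import csv
import io
from itertools import islice

MAX_ROWS = 1000  # assumed cap, taken from the "1000 lines" rule of thumb


def truncate_csv(src, dst, max_rows=MAX_ROWS):
    """Copy at most max_rows parsed rows from src to dst; return rows written."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in islice(reader, max_rows):
        writer.writerow(row)
        count += 1
    return count


# Small in-memory demo; with real files, open both with newline="".
demo_in = io.StringIO("a,1\nb,2\nc,3\n", newline="")
demo_out = io.StringIO(newline="")
written = truncate_csv(demo_in, demo_out, max_rows=2)
print(written, repr(demo_out.getvalue()))
```

Because rows are read lazily through `islice`, the whole source file never has to fit in memory.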

I like the idea for the last exercise, that's the kind of web-scraping messy data you were talking about. Looking forward to that!

@HasturDev
Author

Nice. Also, I need someone to look over the tests for the hard and medium. I don't know how you do them, so I just wrote assertions that look for specific lines. The medium test looks for the copied list, and the hard test looks for the first product in each webpage.
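
For reference, assertions that look for specific lines could be structured roughly like this. It is only a sketch: `parse` stands in for whatever a participant writes, the sample data is invented, and the `test_` naming simply follows the convention that a runner such as pytest collects.

```python
import csv
import io

# Hypothetical stand-in for an exercise input; not the real Dojo data.
SAMPLE = "name,count\napple,3\nbanana,5\n"


def parse(text):
    """Hypothetical solution under test: return all rows as lists of strings."""
    return list(csv.reader(io.StringIO(text, newline="")))


def test_header_is_kept():
    assert parse(SAMPLE)[0] == ["name", "count"]


def test_first_data_row():
    assert parse(SAMPLE)[1] == ["apple", "3"]


def test_row_count():
    assert len(parse(SAMPLE)) == 3
```

Checking one or two known rows plus the total row count keeps the tests short while still catching a parser that drops or merges lines.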

@HasturDev
Author

I think the tests for the Medium and Hard are looking for the correct output, but I think the test for the Easy one might be incorrect.

@HasturDev
Author

Do I even need to personally write the tests or is that for everyone else?

@fimion fimion mentioned this issue May 7, 2020