Extra consistency checks on category codes and URLs #400

Open
hellais opened this issue Oct 11, 2018 · 3 comments

@hellais
Collaborator

hellais commented Oct 11, 2018

While using these lists in OONI, we ran into the following problems with some country lists:

  1. The same URL appears in different country-specific lists with different category codes

ex.
id.csv:http://denypagetests.netsweeper.com,NEWS,News Media,2014-04-15,citizenlab,
kw.csv:http://denypagetests.netsweeper.com,CTRL,Control content,2014-04-15,citizenlab,

  2. The same URL is present in both the global list and a country-specific list

ex.
global.csv:http://www.crazyshit.com,PORN,Pornography,2014-04-15,citizenlab,Updated by OONI on 2017-02-14
sg.csv:http://www.crazyshit.com,NEWS,News Media,2014-04-15,citizenlab,

We should add checks to the lint-lists.py script that verify that:

  1. Category codes for the same URL are consistent across lists
  2. A URL present in the global list is not also present in a country-specific list
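
The first check could be sketched roughly as follows. This is only an illustration, not the actual lint-lists.py code: the function names `read_rows` and `find_category_conflicts` are hypothetical, and the column order (URL first, category code second, optional `url` header row) is an assumption based on the example rows above.

```python
import csv
from collections import defaultdict


def read_rows(lines):
    """Yield (url, category_code) pairs from CSV lines.

    Assumes the first column is the URL and the second the category code,
    and skips an optional header row whose first cell is "url".
    """
    for row in csv.reader(lines):
        if len(row) < 2 or row[0] == "url":
            continue
        yield row[0], row[1]


def find_category_conflicts(lists):
    """Return URLs that carry more than one category code across lists.

    `lists` maps a list name (e.g. "id.csv") to an iterable of CSV lines.
    The result maps each conflicting URL to {category_code: [list names]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for name, lines in lists.items():
        for url, code in read_rows(lines):
            seen[url][code].append(name)
    return {
        url: dict(codes)
        for url, codes in seen.items()
        if len(codes) > 1
    }


if __name__ == "__main__":
    example = {
        "id.csv": ["http://denypagetests.netsweeper.com,NEWS,News Media,2014-04-15,citizenlab,"],
        "kw.csv": ["http://denypagetests.netsweeper.com,CTRL,Control content,2014-04-15,citizenlab,"],
    }
    for url, codes in find_category_conflicts(example).items():
        print(url, codes)
```

In a real linter the lines would come from the opened CSV files rather than in-memory lists, and a non-empty result would make the lint run fail.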

On the second point, I would like to hear from @sneft and others whether this is reasonable or whether it is just an OONI-specific usage of the lists.

@darkk
Contributor

darkk commented Oct 11, 2018

Check №2 is a bit tricky for cis.csv. Should cis.csv be treated the same way as global.csv for the corresponding countries? Which definition of CIS should it use? E.g., should it include Georgia?

@hellais
Collaborator Author

hellais commented Oct 11, 2018

At OONI we don't actually use cis.csv at all, and that list has not been updated in quite a long while. I would go so far as to suggest that we remove it or move it to another directory.

@sneft
Collaborator

sneft commented Oct 11, 2018

For point 1, I have no doubt there are a number of these inconsistencies. We tried to fix them as we encountered them but have never made a systematic effort to clean them up.

For point 2, I agree that if a URL is present on the global list it should not be on a local list. Our old testing system flagged when you attempted to upload a local list with a URL duplicated in the global list. Our logic was that the global and local lists are meant to be run as a single unit, so we wanted to avoid duplication. I know we had some cases where this was inconvenient (e.g. wanting to test a very narrow sample in a bandwidth-limited place) but to my knowledge OONI is flexible enough to better accommodate custom lists for special circumstances.
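
A minimal sketch of such a duplicate check, for illustration only. It is not the old testing system's code: `urls_in` and `find_global_duplicates` are hypothetical names, the URL is assumed to be the first CSV column, and the URLs in the usage example are placeholders.

```python
import csv


def urls_in(lines):
    """Return the set of URLs in the given CSV lines.

    Assumes the URL is the first column and skips an optional header row
    whose first cell is "url".
    """
    return {
        row[0]
        for row in csv.reader(lines)
        if row and row[0] != "url"
    }


def find_global_duplicates(global_lines, local_lists):
    """Return {list name: sorted URLs that also appear in the global list}.

    `local_lists` maps a country list name (e.g. "sg.csv") to an iterable
    of CSV lines. Lists without duplicates are left out of the result.
    """
    global_urls = urls_in(global_lines)
    duplicates = {}
    for name, lines in local_lists.items():
        shared = urls_in(lines) & global_urls
        if shared:
            duplicates[name] = sorted(shared)
    return duplicates


if __name__ == "__main__":
    global_list = ["http://example.org/,NEWS,News Media,2014-04-15,citizenlab,"]
    local = {
        "sg.csv": ["http://example.org/,NEWS,News Media,2014-04-15,citizenlab,"],
        "kw.csv": ["http://example.net/,CTRL,Control content,2014-04-15,citizenlab,"],
    }
    print(find_global_duplicates(global_list, local))
```

The comparison here is an exact string match, so variants of the same site (trailing slash, http versus https) would not be caught without some URL normalization first.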

(This requirement does add a small burden of labour on list compilers, as in my experience the average person compiling a local list will reasonably (and often appropriately) add certain URLs that are duplicated in the global list. Perhaps this is just a matter of good documentation and instructions to list compilers.)
