[Metadata] Documenting current metadata in th.csv #1723

Open
bact opened this issue May 9, 2024 · 0 comments

bact commented May 9, 2024

This issue documents how the "notes" field in th.csv, the test list for Thailand, is used to encode metadata about the URL itself or about the test status of the URL.

  • Most of these annotations are not strictly formatted, but they are not entirely "free text" either.
  • The only one that is consistently coded is the language code, which can be parsed easily.

Available metadata:

  • “[<lang_code>]” -- Appended to the end of the notes. Natural language of the page at the URL. <lang_code> is an ISO 639-1 language code (2 characters). The tag is repeated when the page is available in more than one language.

  • “Regional site” -- Appended to the end of the notes. Indicates that the website at the URL is a “regional site”, meaning the same site is intended to serve more than one country. Useful when reporting on the characteristics of the website.

  • “blocked in <date>” or “blocked on <date>” -- Date on which a court issued a block order for the URL, or on which the URL is known to have been blocked according to media or other sources. Currently some dates are in ISO 8601 format and some are not. Ideally, all of them should be ISO 8601.

  • “blocked in <date>, see:” -- Date on which a block order was issued for the URL or on which the URL is known to have been blocked, followed by a reference.

  • “last updated on” -- Date on which the page was most recently updated, as far as a human annotator can verify from the web content.
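
To illustrate how the two trailing markers could be read programmatically, here is a minimal Python sketch. It assumes the notes end with zero or more “[<lang_code>]” tags, optionally followed by “Regional site”, as in the examples in this issue. The function name `parse_notes` and the returned keys are hypothetical and not part of any existing test-lists tooling.

```python
import re

# A two-letter language tag at the very end of the string, e.g. "[th]".
TRAILING_LANG = re.compile(r"\[([a-z]{2})\]$")
REGIONAL_MARKER = "Regional site"


def parse_notes(notes):
    """Split a notes value into free text, language codes, and a regional flag.

    Hypothetical helper: assumes language tags and "Regional site" only
    appear at the end of the notes.
    """
    text = notes.strip()

    # "Regional site" comes last, after any language tags.
    regional = text.endswith(REGIONAL_MARKER)
    if regional:
        text = text[: -len(REGIONAL_MARKER)].rstrip()

    # Peel language tags off the end one at a time, keeping their order.
    languages = []
    match = TRAILING_LANG.search(text)
    while match:
        languages.insert(0, match.group(1))
        text = text[: match.start()].rstrip()
        match = TRAILING_LANG.search(text)

    return {"text": text, "languages": languages, "regional": regional}
```

Notes without any trailing marker come back unchanged in `text`, with an empty `languages` list. The sketch does not check that a tag is a valid ISO 639-1 code, only that it is two lowercase letters.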

Examples from the actual th.csv:

  • Thai politics review journal [en]

  • Asian politics review and analysis [en]

  • Thai Lawyers for Human Rights (old website) [th] [en]

  • Issues in Deep South of Thailand [th] [en] [ms]

  • Telecom Asia, also cover ICT news in Asia. Announced closure on 2019-05-31. No longer updated. [en]

  • Anti-censorship group. (As of June 2020, the blog was last updated on 13 April 2019)

  • Asian porn. Found blocked in 2014, see: https://citizenlab.ca/2014/07/information-controls-thailand-2014-coup/ [en]

  • Human Rights Watch - Thailand page. Was blocked on Nov 2014, after the coup. https://www.blognone.com/node/63330 [en]

  • Think tank on civil society. Based in Singapore. [en] Regional site

  • Midnight University, was blocked in 2006. Found anomaly on OONI Explorer (most recent on 2020-06-09, as of 2020-06-11). [th]
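
The date phrases in examples like these could be audited with a rough heuristic such as the Python sketch below. It looks for “blocked in”, “blocked on” and “last updated on”, takes the text up to the next comma, period or bracket as the date, and reports whether that text already looks like an ISO 8601 calendar date. Treating the reduced-precision forms YYYY and YYYY-MM as acceptable is an assumption of this sketch, and `find_dates` is a hypothetical name. Since the pattern is purely textual, a phrase such as "blocked in" followed by something other than a date would also be captured and simply reported as non-ISO.

```python
import re

# Keyword, then the date text up to the next comma, period, or bracket.
DATE_PHRASE = re.compile(
    r"(blocked (?:in|on)|last updated on) ([^,.\[\]()]+)"
)

# YYYY, YYYY-MM, or YYYY-MM-DD (assumption: reduced precision is acceptable).
ISO_DATE = re.compile(
    r"\d{4}(?:-(?:0[1-9]|1[0-2])(?:-(?:0[1-9]|[12]\d|3[01]))?)?"
)


def find_dates(notes):
    """Return (keyword, raw_date_text, looks_iso) for each date phrase found."""
    results = []
    for match in DATE_PHRASE.finditer(notes):
        raw = match.group(2).strip()
        looks_iso = ISO_DATE.fullmatch(raw) is not None
        results.append((match.group(1), raw, looks_iso))
    return results
```

Running this over the notes column would list the entries whose dates still need converting, for example “Nov 2014” or “13 April 2019”.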
