
Fix villages data #11

Open. Wants to merge 1 commit into base: master


@prasastoadi commented Nov 3, 2016

Fix '0' (zero) to 'O' (letter):

ALUE DUA MUKA 0 -> ALUE DUA MUKA O
SITIMERT0 -> SITIMERTO

@jayvdb (Contributor) commented Nov 4, 2016

This data file is created by a script extracting data from http://mfdonline.bps.go.id/ . See https://github.com/edwardsamuel/Wilayah-Administratif-Indonesia/blob/master/scripts/run.sh#L12

It is not useful to modify this generated file. Your changes will be overwritten when the script runs next time.

Is the BPS data wrong? If it is wrong, it needs to be fixed in the BPS source.

You can see "ALUE DUA MUKA 0" and "SITIMERT0" are used in
https://web.archive.org/web/20150207100538/http://www.bps.go.id/eng/download_file/Population_of_Indonesia_by_Village_2010.pdf

Other places where this data has appeared:

https://www.google.com/search?q=%22SITIMERT0%22+%223506190010%22

And bot-created Wikipedia articles:
https://nl.wikipedia.org/wiki/Alue_Dua_Muka_0
https://nl.wikipedia.org/wiki/Sitimert0

And it appears in a wordlist here:
https://id.wiktionary.org/wiki/Wiktionary:ProyekWiki_bahasa_Indonesia/Daftar_kata/Nama/Tempat/Semua

@jayvdb (Contributor) commented Nov 4, 2016

If we can confirm that the BPS data is wrong, one solution is for this repository to have a 'fixes' list, which run.sh uses to fix the generated csv files.
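A minimal Python sketch of such a 'fixes' list, applied to the generated CSV text after the scraper has run. It assumes each row of the villages CSV is `id, district_id, name` with no header; that layout, the `FIXES` mapping and the `apply_fixes` helper are illustrative assumptions, not part of run.sh or the existing scripts.

```python
import csv
import io

# Hypothetical fixes list: village id -> corrected name.
# The two entries are the corrections proposed in this PR.
FIXES = {
    "1105130121": "ALUE DUA MUKA O",
    "3506190010": "SITIMERTO",
}


def apply_fixes(csv_text, fixes, id_col=0, name_col=2):
    """Return csv_text with the name replaced for every id listed in `fixes`.

    Assumes rows shaped `id, district_id, name`; the column indexes are
    parameters so they can be adjusted to the real file layout.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Blank lines come through as empty lists; leave them untouched.
        if row and row[id_col] in fixes:
            row[name_col] = fixes[row[id_col]]
        writer.writerow(row)
    return out.getvalue()
```

Because the fixes are applied after generation, re-running the scraper does not lose them; they are simply re-applied.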

@edwardsamuel (Owner)

Hi @prasastoadi,

I agree with @jayvdb. Generated files can't be edited manually; they will be overwritten on the next run. You need to modify the script that generates the files, which in this project is either run.sh or the Python script. But first you need to make sure that the source data (BPS MFD Online) is actually wrong.

@prasastoadi (Author) commented Nov 7, 2016

I am very confident that the two village names are wrong: 0 (zero) is not a letter of the alphabet.

Here is Sitimerto village:
https://goo.gl/maps/qMH3K7LjahB2

Alue Dua Muka O
http://lmgtfy.com/?q=alue+dua+muka+o+site%3Ago.id

I propose a very simple method that applies the fixes before the data is written to CSV.
I think it would be better to check the villages/districts/regencies/provinces one by one to catch typos in the data. I hope someone does that in a later patch 😉
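Part of that one-by-one check could be automated. A hypothetical sketch, assuming rows shaped `[id, parent_id, name]`: it flags any name containing a digit, which is exactly the pattern of the two typos in this PR. It is only a heuristic; a name that legitimately contains a digit would be a false positive, and typos that involve no digit are not caught.

```python
import re

# Heuristic: a digit inside a place name is likely a data-entry typo
# (e.g. '0' typed instead of 'O'). This is an assumption, not a rule.
DIGIT = re.compile(r"\d")


def suspicious_names(rows, id_col=0, name_col=2):
    """Yield (id, name) for every row whose name contains a digit."""
    for row in rows:
        if DIGIT.search(row[name_col]):
            yield row[id_col], row[name_col]
```

The flagged names would still need a human to confirm them against another source before any fix is added.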

```diff
@@ -72,6 +77,7 @@ def write_dict_to_csv(fname, data_dict, upper_level_key_length=0):
 def main(argv):
     if (len(argv) > 0):
         read_html_data(argv[0] + '/' + argv[1])
+        fix_villages({1105130121: 'ALUE DUA MUKA O', 3506190010: 'SITIMERTO'})
```
Owner

I think this method is quite dangerous. If BPS renames 1105130121 or 3506190010, the generated data will no longer follow the BPS update. What do you think?
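One way to address this concern, sketched under assumptions: store the known-bad name next to each correction and apply the fix only while the source still publishes that exact value. `GUARDED_FIXES` and `apply_guarded_fixes` are hypothetical names, and the data is assumed to be an `id -> name` dict.

```python
import warnings

# Hypothetical guarded fixes: id -> (name BPS currently publishes, corrected name).
GUARDED_FIXES = {
    "1105130121": ("ALUE DUA MUKA 0", "ALUE DUA MUKA O"),
    "3506190010": ("SITIMERT0", "SITIMERTO"),
}


def apply_guarded_fixes(names, fixes):
    """Return a copy of `names` (id -> name) with guarded fixes applied.

    A fix is applied only if the current name equals the recorded bad name,
    so a later BPS rename is kept instead of being silently overwritten.
    """
    fixed = dict(names)
    for village_id, (expected, corrected) in fixes.items():
        current = fixed.get(village_id)
        if current == expected:
            fixed[village_id] = corrected
        else:
            # Source changed or id disappeared: keep BPS's value, flag the stale fix.
            warnings.warn(f"fix for {village_id} skipped: source has {current!r}")
    return fixed
```

The warning doubles as a reminder that a stale entry can be removed from the fixes list.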

Contributor

In this case, @prasastoadi only found issues with those two villages. What about the rest of the data? Has he already checked the entire village data set?

Contributor

#16 is a way to check for more problems. But I think we should not wait for all problems to be found. They will be reported when people find them.

And we can't wait for the government to fix them; that doesn't happen quickly.
But the fixes should be optional, so we can still use this repo's tools to obtain raw data.
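A sketch of making the fixes opt-in, assuming a hypothetical `--apply-fixes` command-line flag (the existing script reads positional `argv` values and has no such flag). Without the flag, the raw BPS names pass through unchanged.

```python
import argparse

# Hypothetical fixes list: the two corrections proposed in this PR.
FIXES = {"1105130121": "ALUE DUA MUKA O", "3506190010": "SITIMERTO"}


def build_parser():
    parser = argparse.ArgumentParser(description="Generate region CSV files.")
    # Opt-in: without this flag the raw BPS data is kept unchanged.
    parser.add_argument("--apply-fixes", action="store_true",
                        help="apply this repository's known corrections")
    return parser


def village_names(raw, apply_fixes):
    """Return id -> name data, with FIXES applied only if requested."""
    if not apply_fixes:
        return dict(raw)
    # Only correct ids present in the source; never add new villages.
    return {vid: FIXES.get(vid, name) for vid, name in raw.items()}
```

For example, `build_parser().parse_args(["--apply-fixes"]).apply_fixes` is `True`, and passing that value to `village_names` switches the corrections on.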

@feryardiant (Contributor)

IMO it's pointless to update anything in this repo while the source data from BPS is still wrong.

Dear @prasastoadi, one thing that you should do is ask BPS to update their data instead.

@jayvdb (Contributor) commented Feb 17, 2018

Maybe fixes should be wrapped in a separate function call (and possibly a separate data file), so that users can easily apply all fixes on top of the existing data.
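A possible shape for this, sketched with hypothetical names: the fixes live in their own CSV data file with the header `id,wrong_name,correct_name`, and one function applies all of them on top of existing `id -> name` data. The file format, the path in the comment and both functions are assumptions for illustration only.

```python
import csv
import io


def load_fixes(fileobj):
    """Read fixes from a CSV file object with header id,wrong_name,correct_name.

    Returns {id: (wrong_name, correct_name)}.
    """
    return {row["id"]: (row["wrong_name"], row["correct_name"])
            for row in csv.DictReader(fileobj)}


def apply_all_fixes(names, fixes):
    """Apply every fix in place to an id -> name dict; return the changed ids.

    A fix is skipped when the current name is not the recorded wrong name,
    so an upstream rename is never overwritten.
    """
    changed = []
    for village_id, (wrong, correct) in fixes.items():
        if names.get(village_id) == wrong:
            names[village_id] = correct
            changed.append(village_id)
    return changed


if __name__ == "__main__":
    # In practice: open("fixes/villages.csv", newline="") (hypothetical path).
    data = io.StringIO("id,wrong_name,correct_name\n3506190010,SITIMERT0,SITIMERTO\n")
    names = {"3506190010": "SITIMERT0"}
    print(apply_all_fixes(names, load_fixes(data)), names)
```

Keeping the fixes in data rather than code means they can be reviewed, extended, or skipped without touching the generator.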

contactjavas added a commit to contactjavas/Wilayah-Administratif-Indonesia that referenced this pull request Aug 12, 2020
4 participants