Fix villages data #11

Open
wants to merge 1 commit into
base: master
4 changes: 2 additions & 2 deletions csv/villages.csv
@@ -1115,7 +1115,7 @@
1105130118,1105130,KUALA IDI
1105130119,1105130,KEUTAPANG MAMEH
1105130120,1105130,ULEE BLANG
-1105130121,1105130,ALUE DUA MUKA 0
+1105130121,1105130,ALUE DUA MUKA O
1105130122,1105130,KUTA LAWAH
1105131001,1105131,ASAN RAMPAK
1105131002,1105131,BUKET KUTA
@@ -42033,7 +42033,7 @@
3506190007,3506190,MENANG
3506190008,3506190,TENGGER KIDUL
3506190009,3506190,SEMANDING
-3506190010,3506190,SITIMERT0
+3506190010,3506190,SITIMERTO
3506190011,3506190,PAGU
3506190012,3506190,BENDO
3506190013,3506190,JAGUNG
6 changes: 6 additions & 0 deletions scripts/mdf_parser.py
@@ -46,6 +46,11 @@ def process_buffer(buf):
villages_dict[village_id] = village_name


+def fix_villages(villages):
+    for key, value in villages.items():
+        villages_dict[key] = value


def write_data_to_csv(tmp_dir, key):
    print 'Writing provinces data...'
    write_dict_to_csv(tmp_dir + '/provinces-' + key + '.csv', provinces_dict)
@@ -72,6 +77,7 @@ def write_dict_to_csv(fname, data_dict, upper_level_key_length=0):
def main(argv):
    if (len(argv) > 0):
        read_html_data(argv[0] + '/' + argv[1])
+        fix_villages({1105130121: 'ALUE DUA MUKA O', 3506190010: 'SITIMERTO'})
Owner

I think this approach is quite dangerous. If BPS renames villages 1105130121 and 3506190010, the generated data will no longer follow the BPS update, because the hardcoded names will overwrite it. What do you think?

Contributor

In this case, @prasastoadi only found issues with those two villages. What about the rest of the data? Has he already checked the entire village dataset?
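
Both errors in this PR are a digit zero typed in place of the letter O, so one cheap way to look for similar problems across the whole file might be to flag every village name that contains a digit. The sketch below is only an illustration: `find_suspect_names` is a hypothetical helper that is not part of this repository, it assumes the three-column `id,district_id,name` layout seen in `csv/villages.csv`, and some flagged names may turn out to be legitimate.

```python
import csv
import re

# Any digit inside a village name is treated as suspicious (heuristic only).
DIGIT = re.compile(r'\d')


def find_suspect_names(lines):
    """Yield (village_id, name) for rows whose name contains a digit.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        village_id, name = row[0], row[2]
        if DIGIT.search(name):
            yield village_id, name
```

Passing an open handle on `csv/villages.csv` to `find_suspect_names` would list candidates for manual review; it does not decide what the correct name is.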

Contributor

#16 is a way to check for more problems. But I think we should not wait for all problems to be found. They will be reported when people find them.

And we can't wait for the government to fix them; that doesn't happen quickly.
But the fixes should be optional, so we can still use this repo's tools to obtain the raw data.
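
A possible way to reconcile the two concerns raised in this thread is sketched below: each fix is applied only while the parsed name still equals the known-bad value, so a later BPS rename is not overwritten, and a flag turns the fixes off entirely to get raw data. `KNOWN_FIXES`, `apply_fixes`, and the `enabled` flag are hypothetical names, not part of `scripts/mdf_parser.py`, and the integer keys are an assumption taken from the dict literal in this PR.

```python
# Hypothetical sketch, not the PR's implementation.
# Maps village id -> (known-bad name from the source, corrected name).
KNOWN_FIXES = {
    1105130121: ('ALUE DUA MUKA 0', 'ALUE DUA MUKA O'),
    3506190010: ('SITIMERT0', 'SITIMERTO'),
}


def apply_fixes(villages, fixes=KNOWN_FIXES, enabled=True):
    """Return a corrected copy of `villages`; the input dict is not modified.

    A fix is applied only if the current name still matches the known-bad
    value, so upstream renames take precedence over the hardcoded fix.
    """
    result = dict(villages)
    if not enabled:
        return result  # raw data requested: skip all fixes
    for village_id, (bad_name, good_name) in fixes.items():
        if result.get(village_id) == bad_name:
            result[village_id] = good_name
    return result
```

In `main`, the `enabled` flag could be driven by an extra command-line argument, which is left out here because the existing argument handling is positional.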

        write_data_to_csv(argv[0], argv[2])
    else:
        print "usage: mdf_parser.py <directory> <html_input_file> <key>"