[Feature] Generate Synopsis for Changed Laws #23

Closed
joacmue opened this issue Mar 28, 2021 · 7 comments

Comments

@joacmue

joacmue commented Mar 28, 2021

Based on the discussions here, I thought about parsing the law changes (Gesetzesänderungen) discussed in the Bundestag and generating a human-readable diff, in markdown, between the current text of a law and the proposed new one. The exact format still needs to be discussed. It should probably resemble the MS Word "Track Changes" mode (although strikethrough and text colors are not really supported by markdown) or a table view of "old text" vs. "proposed change".
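To make the two candidate formats concrete, here is a minimal sketch using only Python's standard `difflib`. It is an illustration, not an agreed-upon format: the function names are made up, the sample sentence is invented, and the `~~strikethrough~~` syntax relies on the GitHub Flavored Markdown extension rather than core markdown.

```python
import difflib


def markdown_diff(old: str, new: str) -> str:
    """Word-level diff: deletions as ~~text~~, insertions as **text**."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        # A "replace" opcode contributes both a deletion and an insertion.
        if tag in ("delete", "replace"):
            parts.append("~~" + " ".join(old_words[i1:i2]) + "~~")
        if tag in ("insert", "replace"):
            parts.append("**" + " ".join(new_words[j1:j2]) + "**")
    return " ".join(parts)


def markdown_table(old: str, new: str) -> str:
    """Two-column table view: old text vs. proposed change."""
    def cell(text: str) -> str:
        # Pipes would otherwise be read as column separators.
        return text.replace("|", "\\|")

    return "\n".join([
        "| Old text | Proposed change |",
        "| --- | --- |",
        f"| {cell(old)} | {cell(new)} |",
    ])


print(markdown_diff("Die Frist beträgt zwei Wochen.",
                    "Die Frist beträgt vier Wochen."))
# → Die Frist beträgt ~~zwei~~ **vier** Wochen.
```

Splitting on whitespace keeps the sketch short but discards the original line breaks, so a real version would probably diff paragraph by paragraph.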

Links of interest:

Issues I've seen for that so far:

  • Law names in the Änderungsgesetz (amending act) texts are not always compatible with the names in lawdown, so some search or user query will be needed here
  • I'm not versed enough in CSS/JavaScript to see how to scrape recent changes from the search results
  • In order to match the changes in the Änderungsgesetz, I'd like to have the corresponding original law downloaded already. Issue 🚀 [Feature] Implement github workflow to publish data daily #16 seems to address that point, though ;)
  • IP issues: I'm not sure whether I am actually allowed to parse the documents published by the Bundestag. Maybe someone around here can shed some light on the terms of use there and whether this would be covered as "fair use"
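For the name-mismatch point, one possible starting place is fuzzy matching with the standard library, falling back to a user query when nothing scores well. This is only a sketch: `candidate_laws`, the `0.6` cutoff, and the sample names are assumptions for illustration, not anything taken from lawdown.

```python
import difflib


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def candidate_laws(cited_name: str, known_names: list[str], limit: int = 3) -> list[str]:
    """Return the known law names most similar to the cited one, best first."""
    lookup = {normalize(name): name for name in known_names}
    hits = difflib.get_close_matches(
        normalize(cited_name), list(lookup), n=limit, cutoff=0.6
    )
    return [lookup[hit] for hit in hits]


# Hypothetical list of names already available locally.
known = ["Bürgerliches Gesetzbuch", "Strafgesetzbuch", "Infektionsschutzgesetz"]

# The amending act cites the law in the genitive case.
matches = candidate_laws("Infektionsschutzgesetzes", known)
print(matches[0] if matches else "no match, ask the user")
```

An empty result is the signal to ask the user instead of guessing.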

If someone is around to talk to about the legal issues, I'm willing to put in some elbow grease to get a prototype running that parses the Änderungsgesetz PDF and maps the changed paragraphs.
The long-term goal would be to set up a web front end that offers a drop-down of recently discussed law changes, where the user can select one and get the human-readable synopsis/change document. I'd need some guidance on the front-end work here, but I'm willing to learn.

@darkdragon-001
Collaborator

Isn't the reason for using git that one can use the visually appealing diff tools that git/GitHub provide? One could use branches for the proposals 😉

@joacmue
Author

joacmue commented Mar 28, 2021

Sure, you can diff the laws after they have changed, but what I am interested in here is getting hold of the proposed changes that are being discussed in the Bundestag. I might be on the wrong track here, given that I don't really know what gets published in the Bundesanzeiger and such...
I like the idea of branches for proposals, though. That would actually fit the working style quite nicely. The question remains: how do you get the proposals into git? The proposals I could find were not published as full texts, but rather as "diffs". I'm not sure whether the tools already include a scraper and parser for that publishing channel.

@jbruechert
Contributor

I could only find PDFs containing human-readable descriptions of the changes. Are there any better documents? Parsing those seems impossible to me.

@joacmue
Author

joacmue commented Mar 28, 2021

Yeah, the documents I found are pretty much legalese, so they are not really machine-readable and barely understandable to humans. But they do seem to follow a pretty strict syntax that might be exploited. I might give it a try over the Easter holidays.
There are probably more pressing issues around here; I just wanted to post the issue that got me here in some "official" way.
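As a rough idea of what exploiting that syntax could look like, here is a sketch that handles a single instruction type, a word replacement. The sample sentence is invented for illustration, and the pattern and the `Replacement` class are assumptions; real amending acts use many more instruction forms (insertions, deletions, renumbering) that this does not cover.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replacement:
    paragraph: str  # e.g. "5" or "5a"
    old: str        # wording to be replaced
    new: str        # proposed wording


# Matches sentences of the form:
#   In § <n> ... wird das Wort „<old>“ durch das Wort „<new>“ ersetzt.
PATTERN = re.compile(
    r"In § (?P<paragraph>\d+[a-z]?)"
    r".*?"
    r"(?:wird|werden) (?:das Wort|die Wörter|die Angabe) „(?P<old>[^“]+)“ "
    r"durch (?:das Wort|die Wörter|die Angabe) „(?P<new>[^“]+)“ ersetzt"
)


def parse_replacement(sentence: str) -> Optional[Replacement]:
    """Return the parsed instruction, or None if the sentence has another form."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return Replacement(match["paragraph"], match["old"], match["new"])


sample = "In § 5 Absatz 2 Satz 1 wird das Wort „zwei“ durch das Wort „vier“ ersetzt."
print(parse_replacement(sample))
# → Replacement(paragraph='5', old='zwei', new='vier')
```

Returning `None` for unrecognized sentences makes it easy to count how much of a document such patterns actually cover before investing further.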

@ulfgebhardt
Member

ulfgebhardt commented Mar 28, 2021

I believe an approach like the one followed by these repos would be good:

They use a crawler that runs automatically every day. The changes are checked into git and therefore build up a history of changes. As stated, this would only cover the laws after they have changed.

Extracting the proposed changes would be another task which can be done in this tool. But in general the following principles should be followed:

  • Only extract data; don't combine things, calculate things, or do other processing. That would be the job of another tool. We want a tool that gets clean and complete data first
  • Extract as much data as you can, so we do not miss anything. If you crawl a website, get all the information available on it if you can, even though it might not be relevant for your cause. It might be for someone else's.
  • Don't worry about legal issues. I take the responsibility if someone has a problem with us publishing Bundestag/law/... related data. In the past, no legal issues arose from us publishing data like we do. I tend to publish stuff under the Unlicense, since the Bundestag IT department could not answer the question of what license their data is made available to the public under.
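The "extract only, keep everything" principle could be sketched as follows: every crawled record is written out unmodified, one file per record, with stable formatting so that the git history only shows real changes. This is a hypothetical helper, not code from the repos mentioned; the `store_raw` name and the assumption that each record carries an `"id"` key are made up for illustration.

```python
import json
from pathlib import Path


def store_raw(record: dict, out_dir: Path) -> bool:
    """Write one crawled record as-is; return True if the file content changed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: every record has a unique "id" usable as a file name.
    path = out_dir / f"{record['id']}.json"
    # Sorted keys and fixed indentation keep diffs free of formatting noise.
    text = json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.write_text(text, encoding="utf-8")
    return True
```

A daily job would call this for every record and commit only when at least one call returned `True`; any analysis or combination of the data would then live in a separate tool that reads these files.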

@joacmue
Author

joacmue commented Mar 28, 2021

Thanks @ulfgebhardt for pointing that out.
Now that I have a better understanding of the actual scope of the tools here, I feel that the idea sketched here should indeed move to another tool. It might still be worthwhile to add a dip21-style crawler here, though. I'll need to browse through those a bit to understand what's going on there. Getting all the information in the DIP database might be a bit steep, though.

@joacmue joacmue closed this as completed Mar 28, 2021
@ulfgebhardt
Member

The dip21 data is scraped with this: https://github.com/bundestag/scapacra-bt

-> And don't get me wrong. I do not mind at all doing more data analysis or whatnot. All I am saying is that I consider it wise to create a solid data basis first, to get the shit the official websites give us into an actually useful format. From there we can go further. The approach I describe seems logical to me: get all the data and keep this process separate from the processing part. But as I said, that's just an idea of mine and not set in stone.
