Seed the database #73

Open
1 of 2 tasks
tyliec opened this issue Feb 20, 2024 · 15 comments

@tyliec
Member

tyliec commented Feb 20, 2024

Objective

To implement support for seeding the database.

Context

Some of us have gotten the UIPA portal running; however, it looks a little weird with zero data. For development, it is quite helpful to have an initial seeding of data present so that all the normal operations of the portal work locally.

This capability was present in the previous iteration of the UIPA portal in the reset.sh script.

Tasks

  • Initially seed the database with Hawaii's Public Bodies
  • Initially seed the database with some sample requests.

Success Criteria

The ability to seed the database is available, for both public bodies and requests.

Related Items

Parent Epic: #50

Open Questions:

  • Do we need to update our list of Hawaii's public bodies? How can we get a more recent list now, and how often do we need to update this list?
@tyliec
Member Author

tyliec commented Feb 20, 2024

@yenhtran made some progress on this, but we found that a simple python manage.py loaddata ... or import_csv isn't going to cut it. Something about our current version of the portal doesn't accept the data format used by the previous version.

@tyliec
Member Author

tyliec commented Feb 28, 2024

Found some docs on this - https://github.com/CodeWithAloha/uipa/blob/d9e5322e0ed21b680f0e597997c20274f670220e/docs/importpublicbodies.rst, going to give it a shot this Wednesday.

tyliec added the task label Mar 2, 2024
@russtoku
Member

russtoku commented Mar 2, 2024

Isn't this:

Found some docs on this - https://github.com/CodeWithAloha/uipa/blob/d9e5322e0ed21b680f0e597997c20274f670220e/docs/importpublicbodies.rst, going to give it a shot this Wednesday.

the same as Froide's Docs on Importing Public Bodies?

Both suggest using the Google spreadsheet linked in the docs to create a CSV file and loading it with python manage.py import_csv public_bodies.csv. The 13 fields in that spreadsheet, which are described in the docs, could be populated from the public body seed data in the uipa_org/fixtures directory in the master branch, as mentioned in the Related Items section of the first comment. Those are JSON data files, so it should be straightforward to extract data from them.
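
A minimal sketch of that extraction, assuming the fixture files use Django's standard serialization layout (a JSON list of objects with "model", "pk", and "fields" keys). The function name fixture_to_csv and the column names in the usage note are hypothetical; the real 13 column names come from the docs, and some of them may not map one-to-one onto fixture field names.

```python
import csv
import io
import json


def fixture_to_csv(fixture_text, columns, model="publicbody.publicbody"):
    """Convert a Django JSON fixture into CSV text.

    Only records whose "model" matches are kept. Each requested column is
    read from the record's "fields" dict; missing or null values become "".
    """
    records = json.loads(fixture_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        if record.get("model") != model:
            continue
        fields = record.get("fields", {})
        row = {}
        for col in columns:
            value = fields.get(col)
            row[col] = "" if value is None else value
        writer.writerow(row)
    return out.getvalue()
```

For example, fixture_to_csv(open("publicbody.publicbody.json").read(), ["name", "email", "url"]) would return CSV text with those three (illustrative) columns, which could then be written to public_bodies.csv.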

I restored a backup of my notes on UIPA development from 2018 and have been trying to recreate a development version of the UIPA website by loading the seed data from the uipa_org/fixtures directory. While I got a server running with a SQLite 3 database, I was only able to load the flatpages, jurisdictions, and sites from the JSON fixture files. I could not load the foilaw, public body, public body tag, or tagged public body data.

@russtoku
Member

russtoku commented Mar 4, 2024

I was able to:

  • Load Jurisdiction data from the old UIPA.org fixture file, publicbody.jurisdiction.json.
  • Manually add Classification based on data from the old UIPA.org fixture file, publicbody.publicbody.json.
  • Load the Flatpages data from the old UIPA.org fixture file, flatpages.flatpage.json.
    • These flatpages are still useful because they are associated with URLs that aren't linked from Froide pages.
  • Load the public bodies from the old UIPA.org fixture file, publicbody.publicbody.json.
    • I did so by extracting the data to a CSV file and loading the CSV file through the Admin site.

@yenhtran

yenhtran commented Mar 7, 2024

Notes on where we left off:

  • Got SOME data seeded in the Public Bodies UI interface...
    [Screenshot 2024-03-06 at 10:01:40 PM]

However, we are noticing that only 28 got uploaded. We get the following error when we run the command: python3 manage.py loaddata publicbody.publicbody.json
[Screenshot 2024-03-06 at 10:04:02 PM]

When comparing pk:28 (seeded) and pk:29 (not seeded), we don't find any significant difference:
[Screenshot 2024-03-06 at 10:05:35 PM]

Was reviewing this chunk of code: https://github.com/CodeWithAloha/froide/blob/5584e46107fce5ffee409a2172d07974d9e9103e/froide/publicbody/models.py#L360

cc: @tyliec

@russtoku
Member

russtoku commented Mar 7, 2024

One of the problems with loading the database from the JSON fixture files is that the files were dumped from a database that had data in related tables, so records in one file refer to rows that must already exist in another table.

So, I wrote a Python program to extract the data from the publicbody.publicbody.json fixture file and create a CSV file to load from the Admin site or using python manage.py import_csv. I also wrote a Python program to get the unique names for classifications to load them before loading public bodies. I didn't include tags because I didn't realize that they are in the Categories table.
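
A rough sketch of the second program described above (collecting the unique classification names), not the actual script. It assumes the fixture follows Django's standard "model"/"pk"/"fields" layout and that each public body stores its classification as a plain value under a "classification" key; both the function name and that key are assumptions.

```python
import json


def classification_names(fixture_text, field="classification"):
    """Return the sorted, de-duplicated classification names in a fixture.

    Records with a missing, null, or blank value for `field` are skipped.
    """
    names = set()
    for record in json.loads(fixture_text):
        raw = record.get("fields", {}).get(field)
        value = "" if raw is None else str(raw).strip()
        if value:
            names.add(value)
    return sorted(names)
```

The resulting list is what would need to be added to the Classifications table before any public bodies are loaded.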

I just got through loading these tables and dumping the data into JSON fixture files:

  • Jurisdictions (loaded from publicbody.jurisdiction.json)
  • Classifications (manually added based on publicbody.publicbody.json)
  • Freedom of Information Laws (manually added based on publicbody.foilaw.json)
  • Public Bodies (loaded from CSV file based on publicbody.publicbody.json)

I'm using the names from the Public Body administration page in the Admin site to refer to these tables.
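
The load order above could be scripted roughly as follows. This is only a sketch: it assumes the order of the list reflects the dependencies between tables, and the file names classification.json and foilaw.json are hypothetical placeholders for the re-dumped fixtures. publicbody.jurisdiction.json and public_bodies.csv are the names used earlier in this thread.

```python
import subprocess
import sys

# (management command, file) pairs in the order the tables were loaded.
SEED_STEPS = [
    ("loaddata", "publicbody.jurisdiction.json"),
    ("loaddata", "classification.json"),  # hypothetical file name
    ("loaddata", "foilaw.json"),          # hypothetical file name
    ("import_csv", "public_bodies.csv"),
]


def seed_commands(python=sys.executable, steps=SEED_STEPS):
    """Build one `manage.py` command line per seed step, in order."""
    return [[python, "manage.py", command, path] for command, path in steps]


def run_seed(dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cmd in seed_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling run_seed() only prints the commands; run_seed(dry_run=False) would execute them from the directory that contains manage.py.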

I was also looking in the old UIPA codebase and saw these CSV files in the data folder:

I think it might be a good idea to use 2017-11-21-Hawaii_UIPA_Public_Bodies_All.csv as it is probably the most recent in terms of data used for the go-live of UIPA.org.

I'm going to redo my data load to use this file.

@russtoku
Member

russtoku commented Mar 7, 2024

As a side note, the old UIPA.org development used a SQLite database. I'm going to assume that using PostgreSQL for development is currently the preferred method. Thus, I created a clear_db.sh script to "reset" the database so you can run python manage.py migrate --skip-checks to initialize the database. This should help calm any fears about breaking stuff.

@yenhtran

Thank you @russtoku for the explanation. Any chance you'd be able to push up your changes? We are still stuck...

@russtoku
Member

russtoku commented Mar 14, 2024

Shall I make a pull request to add a seed folder under the uipa/data folder in the https://github.com/CodeWithAloha/uipa repo?

Can you tell me what repo, branch, and commit you're using? The main branch of https://github.com/CodeWithAloha/uipa before March 6, 2024 was renamed to main-copy and a new main branch was created.

@yenhtran

@russtoku - I think a pull request would be super helpful.

The repo/branch/commit I'm using is a bit complicated. I had been working off the main branch from before March 6 (commit: d9e5322e0ed21b680f0e597997c20274f670220e) and have not kept it updated, since there are a lot of breaking changes that might make the current issue harder to investigate. @tyliec and I agreed that, for now, I won't pull in the changes until we have something working; then I'll branch off the most up-to-date branch and apply the solution.

@russtoku
Member

Great! You are working at the same point I was at when I was able to load public bodies, as mentioned above in my #73 (comment).

I will make a pull request (PR) against the main branch of https://github.com/CodeWithAloha/uipa so you can grab the files from it without updating your working directory. It shouldn't matter whether you get the files from the PR or from the main branch after the PR is merged (assuming that it will be).

@yenhtran

Yay! Mahalo @russtoku !🤙

@russtoku
Member

In regards to @yenhtran 's comment:

  • Got SOME data seeded in the Public Bodies UI interface...

However we are noticing only 28 got uploaded. We get the following error when we run the command: python3 manage.py loaddata publicbody.publicbody.json

When comparing pk: 28 (seeded) and pk:29 (not seeded), we don't find any significant difference:

The difference is that pk:29 has classification and classification-slug values while pk:28 doesn't. The classifications those values refer to must already exist in the database before publicbody.publicbody.json, or a CSV file of public bodies, can be loaded.
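
One way to confirm this against the fixture is to list the records that carry a classification at all. The sketch below assumes Django's standard fixture layout; the key names "classification" and "classification_slug" are guesses at how the two values are spelled in the file and may need adjusting, and the function name is hypothetical.

```python
import json


def pks_with_classification(fixture_text,
                            keys=("classification", "classification_slug")):
    """Return the pk of every record with a non-empty value for any key.

    The keys are assumed spellings; pass the names actually used in the
    fixture if they differ.
    """
    hits = []
    for record in json.loads(fixture_text):
        fields = record.get("fields", {})
        if any(fields.get(key) not in (None, "") for key in keys):
            hits.append(record.get("pk"))
    return hits
```

If the first pk returned matches the first record that fails to load, that supports the explanation that the missing classification rows are what stops the import.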

@russtoku
Member

See #80 for a first pass at seeding the database for development.

@yenhtran

In regards to @yenhtran 's comment:

  • Got SOME data seeded in the Public Bodies UI interface...

However we are noticing only 28 got uploaded. We get the following error when we run the command: python3 manage.py loaddata publicbody.publicbody.json
When comparing pk: 28 (seeded) and pk:29 (not seeded), we don't find any significant difference:

The difference is that pk:29 has classification and classification-slug values while pk:28 doesn't. The classifications those values refer to must already exist in the database before publicbody.publicbody.json, or a CSV file of public bodies, can be loaded.

While we were still debugging this, we did notice that the classification and classification-slug fields were empty on every record up to pk:29, so we tried setting those fields on pk:29 to empty strings, but that did not change the error. Creating the classification records first, as you describe, makes sense.

Status: In Progress
3 participants