[APP] Force SSL #130

Open
billimarie opened this issue Oct 4, 2020 · 9 comments

Comments

@billimarie
Owner

A lot of the images we are hotlinking to are not served over https. Is there a method we can use to force https?
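
For context, a minimal sketch of what forcing https could look like on our end, assuming the image links live in a JSON file like the ones in this repo (the filename is just a placeholder, not an actual file):

import json

def force_https(url):
    # Rewrite plain http:// links to https://; leave other schemes untouched.
    if url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url

# "prosecutors.json" is a placeholder filename.
with open("prosecutors.json") as f:
    records = json.load(f)

for record in records:
    if record.get("headshot"):
        record["headshot"] = force_https(record["headshot"])

with open("prosecutors.json", "w") as f:
    json.dump(records, f, indent=2)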

@michaelknowles
Contributor

I would advise that we not hotlink to other websites' images. This steals their bandwidth and opens our website up to abuse. For example, if we are using another website's image, they could change that image to whatever they want (e.g., something inappropriate) while keeping the same URL, so we would show it on our website unknowingly.

Instead, we should be uploading our own versions/copies of images and using them.

@billimarie
Owner Author

Hi @michaelknowles, I agree. A few years back, we used Sirv for image hosting. The only problem was that it slowed down our data collection: we had to save, upload, & tag each image.

We then shifted to a model where images were stored on GitHub in a folder called headshots. That was a little faster, but again, the data import was slowed by the need to organize the images.

For this year's Hacktoberfest, if you'd like to take the lead on steering us toward a sustainable image hosting solution, I'd love to assign you an updated issue. What are your thoughts?

@michaelknowles
Contributor

It looks like people are just supplying links to images in the JSON.

{
  ...
  "headshot":"https://www.pdaa.org/wp-content/uploads/2019/07/adamsco.jpg",
  ...
}

Can you explain how this JSON is then getting uploaded into the database? Ideally, we'd have a script that is uploading this data. The same script would fetch the linked image, transform it, then store it somewhere.
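
To illustrate, a rough sketch of that fetch-and-store step (the field name mirrors the JSON above; the filename and output directory are placeholders):

import json
import pathlib
import requests

def download_headshot(record, out_dir="headshots"):
    # Fetch the hotlinked image and save a local copy that we control.
    url = record.get("headshot")
    if not url:
        return None
    pathlib.Path(out_dir).mkdir(exist_ok=True)
    dest = pathlib.Path(out_dir) / url.split("/")[-1]
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    dest.write_bytes(response.content)
    return str(dest)

# "prosecutors.json" is a placeholder filename.
with open("prosecutors.json") as f:
    records = json.load(f)

for record in records:
    record["headshot_local"] = download_headshot(record)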

As for where to store the images, there are a couple options:

  • GitHub - Not recommended. There is a 5 GB repository size limit, and storing images makes the repository more time-consuming to clone and work with.
  • A CDN like Cloudinary - Possible. The free tier would likely support this application (see the sketch below this list).
  • Others?
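
If we go the Cloudinary route, the upload step is small with their Python SDK; a sketch (the credentials and folder name are placeholders, not anything already configured for this project):

import cloudinary
import cloudinary.uploader

cloudinary.config(
    cloud_name="YOUR_CLOUD_NAME",
    api_key="YOUR_API_KEY",
    api_secret="YOUR_API_SECRET",
)

# Cloudinary can fetch a remote URL itself, so we can hand it the hotlinked
# image and get back a permanent https URL that we control.
result = cloudinary.uploader.upload(
    "https://www.pdaa.org/wp-content/uploads/2019/07/adamsco.jpg",
    folder="prosecutors",
)
print(result["secure_url"])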

@billimarie
Owner Author

It sounds like writing that kind of script is the first step. An older scraper I wrote might be worth tweaking; it pulled the necessary data from local prosecutor websites. You can find it here: https://github.com/billimarie/prosecutor-database/blob/501ef012324d3d11f520bb9aeeb334beb32f4278/README.md#optional-python-script

Your work might overlap with @janel-developer's, who is researching alternative data sources. You can check out that issue at #145 in case you are able to assist.

Currently, I manually review & import the data via Terminal. It is entirely possible to create a script that first sanitizes the data, then imports it to MongoDB; the hard part (which we have attempted numerous times without success) is creating a scraper that uniformly grabs the data from multiple sources in the first place.
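
For the import half, a bare-bones pymongo sketch (the database, collection, & file names below are guesses, not what we actually use):

import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["prosecutor_database"]  # guessed database name

# "prosecutors.json" is a placeholder filename.
with open("prosecutors.json") as f:
    records = json.load(f)

# A minimal "sanitize" pass: drop records that are missing a name field.
clean = [r for r in records if r.get("name")]

db["prosecutors"].insert_many(clean)  # guessed collection name
print("Inserted", len(clean), "records")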

Are you interested in tackling any of these issues? If so, we can create a new issue for you. There is data within the 100 DA folder which you can experiment with, & notes in the DOCS.md on how to spin up a MongoDB instance.

@michaelknowles
Contributor

We can keep this as two separate scripts:

  • Scrape data into JSON format
  • Upload JSON data and images

That way we can work in parallel, and it will also make unit tests easier to develop.

@billimarie
Owner Author

Sounds good! Which would you like to work on first? Feel free to create an issue if you have time.

@michaelknowles
Contributor

Let's see what discussion happens on that other issue first. I don't want to create duplicate or conflicting work.

@billimarie
Owner Author

@michaelknowles That issue is not for creating scripts; it is for researching data sources.

@michaelknowles
Contributor

Ah got it. In that case, I can work on the upload script first. I'm assuming the data will be stored in the same JSON format.
