
Make it easier to map search results to URLs #30

Open
chosak opened this issue Dec 2, 2020 · 1 comment

chosak commented Dec 2, 2020

Searching through this repository using the suggested grep commands generates a list of filenames. It would be more useful if it were easier to generate a list of matching URLs instead.

Most of the time the conversion is straightforward (e.g. www.consumerfinance.gov/index.html -> https://www.consumerfinance.gov/), but it would be nice not to have to do another step. And there are some cases where the conversion may not be straightforward, for example with long URLs as documented in #13.
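For the straightforward cases, the conversion described above could be sketched as a small helper. This is a minimal sketch, not a definitive implementation: it assumes the crawl mirrors each host into a directory named after the domain, with index.html standing in for a trailing slash, per the example above. The function name and the HTTPS assumption are illustrative.

```python
def filename_to_url(filename):
    """Convert a crawled filename to its likely URL.

    Hypothetical helper: assumes filenames look like
    'www.consumerfinance.gov/path/index.html', where index.html
    stands in for a trailing slash, and that pages are served
    over HTTPS.
    """
    path = filename
    # Treat a trailing index.html as the directory's trailing slash.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    elif path == "index.html":
        path = ""
    return "https://" + path


print(filename_to_url("www.consumerfinance.gov/index.html"))
# https://www.consumerfinance.gov/
```

Truncated filenames like those in #13 would still need special handling, which is part of why a log-based mapping is attractive.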

As suggested on #13:

One idea would be to write a script that parses our wget.log file to generate a list of URLs and their truncated filenames.
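The script idea above might look something like this. A hedged sketch under stated assumptions: it assumes wget's default log output, where each download begins with a `--TIMESTAMP--  URL` line and is later followed by a `Saving to: 'FILENAME'` line (wget's quote characters vary by locale, so both straight and curly quotes are accepted). The actual wget.log in this repository may differ.

```python
import re

# A download start line, e.g. "--2020-12-02 14:00:00--  https://example.com/"
URL_RE = re.compile(r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")
# The corresponding save line, e.g. "Saving to: 'example.com/index.html'"
SAVE_RE = re.compile(r"^Saving to: [‘'\"](.+)[’'\"]$")


def parse_wget_log(lines):
    """Yield (url, filename) pairs from wget log lines.

    Sketch only: assumes the default wget log format described above.
    Pairs each URL with the next "Saving to:" line that follows it.
    """
    current_url = None
    for line in lines:
        m = URL_RE.match(line)
        if m:
            current_url = m.group(1)
            continue
        m = SAVE_RE.match(line)
        if m and current_url:
            yield current_url, m.group(1)
            current_url = None


sample = [
    "--2020-12-02 14:00:00--  https://www.consumerfinance.gov/",
    "Saving to: 'www.consumerfinance.gov/index.html'",
]
for url, filename in parse_wget_log(sample):
    print(f"{filename} -> {url}")
# www.consumerfinance.gov/index.html -> https://www.consumerfinance.gov/
```

A mapping built this way would also cover the truncated-filename cases from #13, since the log records the full original URL regardless of what the file was named on disk.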

Current behavior

Documented search commands produce a list of filenames.

Expected behavior

Documented search commands produce a list of URLs.

@csebianlander

Future feature suggestion: have the crawler also pull the relevant code snippet(s) from the matched pages and include them as an additional column in a downloadable CSV.
