Contextually show where links can be found in the Wikipedia pages themselves #39

Open
DyeffersonAz opened this issue Oct 10, 2018 · 9 comments

Comments

@DyeffersonAz

It would be nice to show where the links were found, because sometimes I can't find a given link on the page.

@jwngr
Owner

jwngr commented Oct 10, 2018

Thanks for the suggestion! I agree it would be a cool feature, but given the data source I'm using, it is not really easy to do. I don't ever actually see the full text of the Wikipedia page itself, just the Wikipedia database containing all the links. So I can't easily show you the context around where the link shows up in the actual page. Also, since the database is only updated monthly, it is possible the link is actually no longer on the page itself as it may have been edited since the latest database dump. Maybe I'll figure out a way to do this in the future, but for now, this is not feasible with my current architecture.

@jwngr jwngr changed the title To show where the links are Contextually show where links can be found in the Wikipedia pages themselves Oct 10, 2018
@DyeffersonAz
Author

Couldn't you just grab the HTML of the page?

@jwngr
Owner

jwngr commented Oct 10, 2018

I definitely could try something like that, and I honestly think that is the way this would need to be implemented. But it wouldn't be very efficient, and the system currently never looks at the raw HTML.

@DyeffersonAz
Author

Also, it would be better than having to re-dump the database so many times; it would be automatic.

@jwngr
Owner

jwngr commented Oct 12, 2018

There is no way to run the actual search algorithm against live pages, as it would take way too long: thousands to tens of thousands of pages need to be touched. What I was referring to was just pulling the context for a single page when you, for example, click on it in the graph view.

@DyeffersonAz
Author

Yep

@Quifisto

Quifisto commented Nov 6, 2019

Maybe you could look through the HTML after the search has completed. Then do some web scraping to look for the link on the page and return the title of the section or subsection it was found in.

@DyeffersonAz
Author

> Maybe you could look through the HTML after the search has completed. Then do some web scraping to look for the link on the page and return the title of the section or subsection it was found in.

This is what I was suggesting: a way for SDOW to go to the live Wikipedia page, search for each link, and then return the parent header of that <a> element, for example.
Unfortunately, I don't yet have the web-development knowledge to help with this.
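The scraping approach suggested in the last two comments (take a page's HTML, find the link, report the heading above it) could look roughly like the following. This is not SDOW's code; it is a minimal sketch using only Python's standard-library `html.parser`, and it assumes that article links use `href="/wiki/Title"`, that section titles live in `<h2>` to `<h6>` elements, and that Wikipedia's `[edit]` links sit inside `mw-editsection` spans. `LinkSectionFinder` and `find_link_sections` are hypothetical names, and fetching the live HTML from Wikipedia is deliberately left out so the example runs offline on a small sample.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

# Assumption: section titles are rendered as <h2>..<h6> elements.
HEADING_TAGS = {"h2", "h3", "h4", "h5", "h6"}


class LinkSectionFinder(HTMLParser):
    """Hypothetical helper: records the heading in effect at each matching link."""

    def __init__(self, target_title):
        super().__init__()
        # Assumption: article links look like href="/wiki/Some_Title".
        self.target_href = "/wiki/" + target_title.replace(" ", "_")
        self.current_section = "(lead section)"  # text before the first heading
        self.heading_tag = None   # heading element we are currently inside, if any
        self.heading_text = []    # text fragments collected for that heading
        self.skip_depth = 0       # >0 while inside an "[edit]" span (assumed class)
        self.sections = []        # one section title per matching link, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS and self.heading_tag is None:
            self.heading_tag = tag
            self.heading_text = []
        # Count nested spans so "[edit]" text is never mistaken for a heading.
        if tag == "span" and (self.skip_depth or "mw-editsection" in (attrs.get("class") or "")):
            self.skip_depth += 1
        if tag == "a":
            # Ignore any "#fragment" so links to a subsection still match.
            href = unquote(attrs.get("href") or "")
            if href.split("#")[0] == self.target_href:
                self.sections.append(self.current_section)

    def handle_endtag(self, tag):
        if tag == "span" and self.skip_depth:
            self.skip_depth -= 1
        elif tag == self.heading_tag:
            # Collapse whitespace and make this heading the current section.
            self.current_section = " ".join("".join(self.heading_text).split())
            self.heading_tag = None

    def handle_data(self, data):
        if self.heading_tag and not self.skip_depth:
            self.heading_text.append(data)


def find_link_sections(page_html, target_title):
    """Return the section title containing each link to target_title."""
    finder = LinkSectionFinder(target_title)
    finder.feed(page_html)
    finder.close()
    return finder.sections


# Made-up sample mixing the newer and older heading markup styles.
SAMPLE_HTML = """
<p>Intro linking to <a href="/wiki/Albert_Einstein">Einstein</a>.</p>
<div class="mw-heading mw-heading2"><h2 id="History">History</h2>
<span class="mw-editsection"><span>[</span><a href="/w/index.php?action=edit">edit</a><span>]</span></span></div>
<p>See <a href="/wiki/Physics">physics</a>.</p>
<h3><span class="mw-headline">Later work</span><span class="mw-editsection">[<a href="/w/index.php?action=edit">edit</a>]</span></h3>
<p>More on <a href="/wiki/Albert_Einstein#Career">his career</a>.</p>
"""

if __name__ == "__main__":
    print(find_link_sections(SAMPLE_HTML, "Albert Einstein"))
    # prints ['(lead section)', 'Later work']
```

Because this parses only one page, it fits the owner's point above: the context would be pulled for a single page on demand (for example, when you click it in the graph view) rather than during the search itself.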

@xavzz

xavzz commented Mar 10, 2024

That would be very nice, since I cannot find any of the links shown in the results on either of the requested pages.


4 participants