A search engine built from scratch that lets users select the sites they want to search and download selected search results as a CSV, XML, or JSON file.
- Back-end development
- Database administration and optimization
- Front-end development
- Hosting and deployment
- The back-end module takes the page next in line to be indexed, reads its content, parses it into words, and updates the database as specified in the previous section. (Note that it needs to update the tables page, word, and page_word, but not search.) For each page slated to be indexed, it also collects additional URLs to be indexed by scanning the page for “href” attributes. This is somewhat akin to a breadth-first search.
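
The indexing loop described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual implementation: the names `PageParser`, `index_page`, `crawl`, and the `fetch` callable are hypothetical, and an in-memory dictionary stands in for the page, word, and page_word tables.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import re


class PageParser(HTMLParser):
    """Collects visible text and href values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def index_page(url, html):
    """Return (word frequencies, absolute outgoing URLs) for one page."""
    parser = PageParser()
    parser.feed(html)
    words = re.findall(r"[a-z0-9]+", " ".join(parser.text_parts).lower())
    links = [urljoin(url, href) for href in parser.links]
    return Counter(words), links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first indexing: a FIFO queue holds the pages next in line.

    `fetch` is a hypothetical callable mapping a URL to its HTML (or None).
    """
    queue = deque([start_url])
    seen = {start_url}
    index = {}  # url -> Counter of words; stands in for the database tables
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        freqs, links = index_page(url, html)
        index[url] = freqs
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real indexer would also need to respect robots.txt, limit the crawl to the selected sites, and write each word count to page_word.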
[To do]
- The Admin screen in which the user can type or paste a URL to be indexed, passing it to the Indexing Engine mentioned above. Alternatively, one can add an option to the previously created screens so that, for any search result in Phase 1, the user can click a button to index the selected items.
- Present a list of all user searches: the terms, the number of search results, and how long each search took.
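
One way to collect the data for this list is to time each search and store a row per search. The sketch below assumes SQLite and a search table with the hypothetical columns `searchId`, `terms`, `resultCount`, and `elapsedSeconds`; the real column names are whatever the database design specifies, and `run_search` is a placeholder for the actual search function.

```python
import sqlite3
import time


def create_search_table(conn):
    """Create the (assumed) search log table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS search (
               searchId INTEGER PRIMARY KEY AUTOINCREMENT,
               terms TEXT NOT NULL,
               resultCount INTEGER NOT NULL,
               elapsedSeconds REAL NOT NULL
           )"""
    )


def timed_search(conn, terms, run_search):
    """Run a search, then record its terms, result count, and duration."""
    start = time.perf_counter()
    results = run_search(terms)
    elapsed = time.perf_counter() - start
    conn.execute(
        "INSERT INTO search (terms, resultCount, elapsedSeconds) VALUES (?, ?, ?)",
        (terms, len(results), elapsed),
    )
    conn.commit()
    return results


def search_history(conn):
    """Return every logged search, oldest first, for the history screen."""
    return conn.execute(
        "SELECT terms, resultCount, elapsedSeconds FROM search ORDER BY searchId"
    ).fetchall()
```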
- This is the screen in which a user types one or more search terms, clicks Search, and waits for results. Since it is ambiguous how a user intends to search, the choices should ideally be handled by explicit options on the screen (using radio buttons or checkboxes), such as: • case-insensitive • allow partial match
This fundamental query will drive the search:
SELECT * FROM page, word, page_word WHERE page.pageId = page_word.pageId AND word.wordId = page_word.wordId AND word.wordName = 'wordEntered' ORDER BY freq DESC
where wordEntered is the search word entered in the search box.
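
A sketch of how the query above might be run with the search word bound as a parameter rather than pasted into the SQL string, which avoids quoting problems and SQL injection. It assumes SQLite, a `url` column on page, and a small demo schema; `create_demo_schema` and `search` are hypothetical names, and the two keyword arguments are one possible mapping of the case-insensitive and partial-match options onto the WHERE clause.

```python
import sqlite3

BASE_QUERY = """
SELECT page.url, word.wordName, page_word.freq
FROM page, word, page_word
WHERE page.pageId = page_word.pageId
  AND word.wordId = page_word.wordId
  AND {condition}
ORDER BY page_word.freq DESC
"""


def create_demo_schema(conn):
    """Assumed minimal schema; the real one comes from the database design."""
    conn.executescript(
        """
        CREATE TABLE page (pageId INTEGER PRIMARY KEY, url TEXT);
        CREATE TABLE word (wordId INTEGER PRIMARY KEY, wordName TEXT);
        CREATE TABLE page_word (pageId INTEGER, wordId INTEGER, freq INTEGER);
        """
    )


def search(conn, word_entered, case_insensitive=False, partial_match=False):
    """Return (url, wordName, freq) rows, most frequent first."""
    column = "LOWER(word.wordName)" if case_insensitive else "word.wordName"
    value = word_entered.lower() if case_insensitive else word_entered
    if partial_match:
        # instr() is a case-sensitive substring test in SQLite.
        condition = f"instr({column}, ?) > 0"
    else:
        condition = f"{column} = ?"
    # Only fixed SQL fragments are formatted in; the user's text is bound via "?".
    return conn.execute(BASE_QUERY.format(condition=condition), (value,)).fetchall()
```

For example, `search(conn, "python", case_insensitive=True, partial_match=True)` would match both "Python" and "pythonic" if those words are in the word table.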
- The query in the previous section should be sent to the database. One then iterates through the database results (much as one iterated through the contents of the JSON, CSV, and XML files or the Google search results) and builds and displays a set of results.
The same download feature available in Phase 1 should also be available in Phase 2 (checkboxes, Select/Deselect All, Save As JSON/XML/CSV).
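
The download step could look roughly like the sketch below, which serializes only the rows whose checkboxes were ticked. It uses the Python standard library; the field names `url`, `word`, and `freq`, the XML element names, and the `export` function are assumptions made for illustration, not the project's actual output format.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

FIELDS = ["url", "word", "freq"]  # assumed result fields


def to_json(rows):
    return json.dumps(rows, indent=2)


def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    root = ET.Element("results")
    for row in rows:
        item = ET.SubElement(root, "result")
        for field in FIELDS:
            ET.SubElement(item, field).text = str(row[field])
    return ET.tostring(root, encoding="unicode")


def export(rows, selected, fmt):
    """Serialize the rows at the selected indexes as 'json', 'csv', or 'xml'."""
    chosen = [rows[i] for i in selected]
    writers = {"json": to_json, "csv": to_csv, "xml": to_xml}
    if fmt not in writers:
        raise ValueError(f"unsupported format: {fmt}")
    return writers[fmt](chosen)
```

Select All then amounts to passing `range(len(rows))` as `selected`, and Deselect All to passing an empty list.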
[To do]