Organizations

@html-extract @dosbox-staging @next-LI

brandonrobertz/README.md

Um Yes Hello

I'm Brandon Roberts. I'm an independent data journalist specializing in open source and bringing computational techniques to journalism projects. You can read more on my site: bxroberts.org

Pinned

  1. SparseLSH (Public)

    A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.

    Python, 136 stars, 26 forks

  2. propublica/django-collaborative (Public)

    ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

    Python, 94 stars, 18 forks

  3. autoscrape-py (Public)

    An automated, programming-free web scraper for interactive sites

    HTML, 102 stars, 17 forks

  4. chatgpt-document-extraction (Public archive)

    A proof of concept tool for using ChatGPT to transform messy text documents into structured JSON

    Python, 115 stars, 12 forks

  5. html-extract/hext.js (Public)

    Use Hext in a browser or with node. Hext is a domain-specific language for extracting structured data from HTML documents.

    C++, 5 stars, 1 fork

  6. tabula-draw-columns (Public)

    Simple tool to visually build column config strings for tabula-java

    HTML 1