Time and again, I've needed to save webpage URLs I come across, either for future reference or for later reading. Over time, I also wanted to share this list with selected people. Eventually, the list grew long enough to be worth sharing with the wider community.
So here it is: my list of URLs, ranging from technical topics I am interested in to articles on people, social and communication skills, diversity and LGBTQI+.
- Temporary home for data processing/machine learning SQL snippets on Greenplum/HAWQ (https://github.com/vatsan/gp-sql-snippets)
- Using Trilogy testing framework with Greenplum (https://github.com/ihuston/trilogy_gpdb)
- SAS MACROs which allow users to publish SAS models into Greenplum, HAWQ and PostgreSQL databases with MADlib SQL-based advanced analytics and machine learning library installed for in-database scoring (https://github.com/jvawdrey/madlib-sas-macros)
- SQL Server to Greenplum Migration Code using Java (https://github.com/rupendrab/sqlservertogreenplum)
- Web page classification using GPText and MADlib on Greenplum DB (https://github.com/hdlee4u/text_analysis)
- A packaging mechanism for deploying additional PL/Python packages in Apache HAWQ and Pivotal Greenplum (https://github.com/kdunn926/pivpyWheelie)
- Scripts and tools used for data cleansing and integration into Greenplum Database (https://github.com/bkersteter/GPDB-scripts)
- Python script to scrape state-level unemployment and consumer price index data from the Bureau of Labor Statistics FTP site (http://download.bls.gov/pub/time.series/) and ingest the data into a Postgres/Greenplum/HAWQ database (https://github.com/jvawdrey/bls-scraper)
- Greenplum DB Window Function examples (https://github.com/csylvester-pivotal/Greenplum-Window-Function-Examples)
- Collection of tutorials on text analytics/NLP, including vector space models, neural language models and topic models on the Pivotal MPP platform (Greenplum/HAWQ) (https://github.com/vatsan/text_analytics_on_mpp)
- Sentiment classifier using PL/Python on PostgreSQL, Greenplum Database, or Apache HAWQ (https://github.com/crawles/gpdb_sentiment_analysis_twitter_model)
- Some Greenplum SQL commands for DBAs (https://github.com/faisaltheparttimecoder/Greenplum)
- Mock data in PostgreSQL/Greenplum/HAWQ databases (https://github.com/pivotal-legacy/mock-data)
- Example of using greenplum-spark connector (https://github.com/kongyew/greenplum-spark-connector)
- Various management scripts I've written while supporting Greenplum (https://github.com/MarcPaquette/Greenplum-Management-Scripts)
- A curated list of awesome Greenplum resources and tools (https://github.com/kongyew/awesome-greenplum)
- A collection of examples illustrating data processing, data science, and machine learning on the Pivotal Greenplum and HAWQ MPP databases (https://github.com/gautamsm/data-science-on-mpp)
- Predict Visitor Purchases with a Classification Model in BQML (https://google.qwiklabs.com/focuses/1794?parent=catalog)
- Regionation with PostGIS (https://www.endpoint.com/blog/2018/02/08/regionating-with-postgis)
- Pivotal Greenplum useful DBA SQL queries (https://anonymousbi.wordpress.com/2014/11/11/pivotal-greenplum-useful-dba-sql-queries/)
- Be careful with CTE in PostgreSQL (https://medium.com/@hakibenita/be-careful-with-cte-in-postgresql-fca5e24d2119)
- PostgreSQL schema differences and views (https://www.endpoint.com/blog/2016/10/14/postgres-schema-differences-and-views)
- Google BERT — Pre Training and Fine Tuning for NLP Tasks (https://medium.com/@ranko.mosic/googles-bert-nlp-5b2bb1236d78)
- Building an ETL Pipeline in Python (https://towardsdatascience.com/building-an-etl-pipeline-in-python-f96845089635)
- Streaming Twitter Data into a MySQL Database (https://towardsdatascience.com/streaming-twitter-data-into-a-mysql-database-d62a02b050d6)
- Linear Regression in 6 lines of Python (https://towardsdatascience.com/linear-regression-in-6-lines-of-python-5e1d0cd05b8d)
- Exploring a powerful SQL pattern: ARRAY_AGG, STRUCT and UNNEST (https://medium.freecodecamp.org/exploring-a-powerful-sql-pattern-array-agg-struct-and-unnest-b7dcc6263e36)
- Introduction to Data Masking Transformation in Informatica (https://blogs.perficient.com/2018/03/27/introduction-to-data-masking-transformation-in-informatica/)
- Superset: benefits and limitations of the open source data visualization tool by Airbnb (https://medium.com/@InDataLabs/superset-benefits-and-limitations-of-the-open-source-data-visualization-tool-by-airbnb-8dc8ac81efa9)
- Lesser Known Python Libraries for Data Science (https://medium.com/analytics-vidhya/python-libraries-for-data-science-other-than-pandas-and-numpy-95da30568fad)
- 5 Bite-Sized Data Science Summaries (https://towardsdatascience.com/5-bite-sized-data-science-summaries-a5afb8509353)
- 10 Useful tools and libraries for Programmers and IT Professionals (https://hackernoon.com/10-useful-tools-and-libraries-for-programmer-and-it-professionals-914e64e0eabc)
- Recurrent Neural Networks by example in Python (https://towardsdatascience.com/recurrent-neural-networks-by-example-in-python-ffd204f99470)
- The most in-demand skills for data scientists (https://www.kdnuggets.com/2018/11/most-demand-skills-data-scientists.html)
- The five best frameworks for Data Scientists (https://medium.com/@ODSC/the-top-five-best-frameworks-for-data-scientists-6a0c42865755)
- Why do people with no experience want to become data scientists? (https://www.datasciencecentral.com/profiles/blogs/why-do-people-with-no-experience-want-to-become-data-scientists)
- Why do I call myself a data scientist? (https://towardsdatascience.com/why-do-i-call-myself-a-data-scientist-d50649ddd6fe)
- Creating business value with Data Science - Part 2: A birds’ eye view (https://towardsdatascience.com/creating-business-value-with-data-science-part-2-a-birds-eye-view-552f1ed04fae)
- Cool factor: how to steal styles with Machine Learning, Turi Create and ResNet (https://towardsdatascience.com/cool-factor-how-to-steal-styles-with-machine-learning-turi-create-and-resnet-54f95fa9f26f)
- 10 Data Structure, Algorithms, and Programming Courses to crack any coding interview (https://hackernoon.com/10-data-structure-algorithms-and-programming-courses-to-crack-any-coding-interview-e1c50b30b927)
- Draco: representing, applying and learning visualization design guidelines (https://medium.com/@uwdata/draco-representing-applying-learning-visualization-design-guidelines-64ce20287e9d)
- Spotify Data Project Part 1 — from Data Retrieval to First Insights (https://towardsdatascience.com/spotify-data-project-part-1-from-data-retrieval-to-first-insights-f5f819f8e1c3)
- A Simple Example of Pipeline in Machine Learning with Scikit-learn (https://towardsdatascience.com/a-simple-example-of-pipeline-in-machine-learning-with-scikit-learn-e726ffbb6976)
- A few useful things to know about Machine Learning (https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)
- Learning Python: from zero to hero (https://medium.freecodecamp.org/learning-python-from-zero-to-hero-120ea540b567)
- A Guide to Machine Learning in R for Beginners: Logistic Regression (https://medium.com/@parulnith/a-guide-to-machine-learning-in-r-for-beginners-part-5-4c00f2366b90)
- A Light Introduction to Transfer Learning in NLP (https://medium.com/@ibelmopan/a-light-introduction-to-transfer-learning-for-nlp-3e2cb56b48c8)
- Graph Databases for Beginners: Graph Search Algorithm Basics (https://neo4j.com/blog/graph-search-algorithm-basics)
- How to version control Jupyter notebooks (https://nextjournal.com/schmudde/how-to-version-control-jupyter)
- AWS vs Google Cloud Pricing – A Comprehensive Look (https://www.parkmycloud.com/blog/aws-vs-google-cloud-pricing/)
- Azure VM Comparison (https://azureprice.net/)
- Azure Billing REST API (https://docs.microsoft.com/en-us/rest/api/billing/)
- Easy Amazon EC2 Instance Comparison (https://www.ec2instances.info)
- Comparing AWS vs Azure vs Google Cloud Platforms For Enterprise App Development (https://medium.com/@distillerytech/comparing-aws-vs-azure-vs-google-cloud-platforms-for-enterprise-app-development-28ccf827381e)
- GCP BigQuery Documentation, Slots (https://cloud.google.com/bigquery/docs/slots)
- GCP BigQuery Documentation, Introduction to Optimizing Query Performance (https://cloud.google.com/bigquery/docs/best-practices-performance-overview)
- GCP BigQuery Documentation, Optimizing Query Computation (https://cloud.google.com/bigquery/docs/best-practices-performance-compute)
- GCP BigQuery Documentation, Optimizing Communication Between Slots (https://cloud.google.com/bigquery/docs/best-practices-performance-communication)
- GCP BigQuery Documentation, BigQuery Best Practices: Controlling Costs (https://cloud.google.com/bigquery/docs/best-practices-costs)
- GCP BigQuery Documentation, Quotas & Limits (https://cloud.google.com/bigquery/quotas)
- GCP BigQuery Documentation, Creating an Authorized View in BigQuery (https://cloud.google.com/bigquery/docs/share-access-views)
- GCP BigQuery Documentation, Avoiding SQL Anti-Patterns (https://cloud.google.com/bigquery/docs/best-practices-performance-patterns)
- GCP BigQuery Documentation, Standard SQL Query Syntax (https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax)
- AWS vs Google Cloud Platform. What is better for DevOps in the cloud? (https://medium.com/@Iren.Korkishko/aws-vs-google-cloud-platform-what-is-better-for-devops-in-the-cloud-434a7cefa25a)
- Google Cloud vs. AWS in 2018, comparing the giants (https://kinsta.com/blog/google-cloud-vs-aws/#)
- Amazon Redshift Performance Optimization - Leveraging fleet telemetry (https://www.allthingsdistributed.com/2018/11/amazon-redshift-performance-optimization.html)
- Purpose-Built Databases in AWS - A one size fits all database doesn’t fit anyone (https://www.allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html)
- Integrated querying of SQL Databases data and S3 data in Amazon Redshift (http://sites.computer.org/debull/A18june/p82.pdf)
- Introduction to Kubernetes (https://www.edx.org/course/introduction-to-kubernetes)
- Storytelling in the workplace (https://www.edx.org/course/storytelling-workplace-ritx-skills104x-1)
- Spraying the bullshit off “vision” and “strategy” (https://medium.com/@cote/spraying-the-bullshit-off-vision-strategy-9f9b8e266b36)
- How to go from “Procrastinate Hero” to “Procrastinate Zero” (https://convertkit.s3.amazonaws.com/assets/documents/77957/1374161/from-procrastinate-hero-to-procrastinate-zero-2018.pdf)
- 50+ Data Structure and Algorithms Interview Questions for Programmers (https://hackernoon.com/50-data-structure-and-algorithms-interview-questions-for-programmers-b4b1ac61f5b0)
- Performance matters: Amazon Redshift is now up to 3.5x faster for real-world workloads (https://aws.amazon.com/blogs/big-data/performance-matters-amazon-redshift-is-now-up-to-3-5x-faster-for-real-world-workloads/)
- Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 1 (https://blogs.msdn.microsoft.com/analysisservices/2017/05/15/building-an-azure-analysis-services-model-on-top-of-azure-blob-storage-part-1/)
- AMPLab Big Data Benchmark (https://amplab.cs.berkeley.edu/benchmark/)
- Intel HiBench Hadoop Big Data Benchmark (https://github.com/intel-hadoop/HiBench)
- SQL at Scale with Apache Spark SQL and DataFrames — Concepts, Architecture and Examples (https://towardsdatascience.com/sql-at-scale-with-apache-spark-sql-and-dataframes-concepts-architecture-and-examples-c567853a702f)
- Transgender as an 11-Year-Old Middle School Student: ‘I. Am. A. Boy. That’s it.’ (https://medium.com/thewashingtonpost/transgender-as-an-11-year-old-middle-school-student-i-am-a-boy-thats-it-be088a206959)
- 'Growing up, it felt like I was too gay to be black and too black to be gay' (https://www.bbc.co.uk/bbcthree/article/c9625c21-d69f-4524-88d8-ab2f50d0e587)
- Data Analysis for Beginners; SQL, Data Visualizations, Python, R and more (https://medium.com/data-analysis-for-beginners)
- Queer voices, in the workplace, business, technology and culture (https://medium.com/queer-voices)