Skip to content

onsitejs/offsitejs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

15 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

OffSiteJS

πŸ“Š Tiny, decentralized analytics
🧠 privacy by design
πŸŽ‰ Free and open source

Deploy to Cloudflare Workers TypeScript


Features:

  • 🦐 Tiny serverless function with both pixel and lite analytics payload in Javascript
  • πŸ” Privacy-conscious, with ability to hash or delete traffic like IP address or cookies
  • 🚒 Forward-anywhere logging (see function logData)
  • πŸ”₯ Host local with miniflare, or
  • β›ˆοΈ Deploy on the Cloudflare Worker platform for global edge script delivery

To-Do:

  • βœ… Pixel payload
  • βœ… Lite Javascript payload
  • ⏹️ Integration with Orbit as a logging destination
  • ⏹️ Integration with Backblaze B2 as a logging destination
  • ⏹️ WebUI for folks to create-their-own analytics tag
  • ⏹️ Web3 integration to encrypt analytics data and store user-sepcific content

Contents

Using OffSiteJS

There are two options for using OffSiteJS: the use of a pixel, or with the JavaScript payload.

β›” Logging is currently limited to console.log(). Integrations forthcoming. β›”

Pixel analytics

Place the following tag on your webpage:

<img src="https://s.offsitejs.org/pixel.gif" style="display:none"></img>

Javascript payload analytics

Place the following tag in the <body> tag, near the bottom before </body>:

<script src="https://s.offsitejs.org/offsite.js"></script>

From a signals collection standpoint, there is no benefit to using both tags.

Hosting your own OffSiteJS instance

If you are hosting OffSiteJS yourself, you would replace the hostname in the tags aboe like so.

https://s.yourwebsite.com/offsite.js

or maybe you could even put the record to a subdomain like

https://analytics.yourwebsite.com/offsite.js

Running Locally and On Server

With miniflare (recommended):

miniflare src/index.js  -c wrangler.toml -w -d --host 127.0.0.1 --port 8080

As you might imagine for miniflare, the host can be changed to 0.0.0.0 and the port 80 should you want to run this on a host, server, VPS. There are no dependencies other than miniflare needed to do this.

with wrangler:

wrangler dev

Publishing to Cloudflare

Assuming you are logged into Cloudflare with wrangler:

# must be in src/ directory: cd src
wrangler publish

Ensure the proper configuration of route, and zone_id in src/wrangler.toml.

Building from TypeScript

While it is possible to compile TypeScript source to a Javascript worker acceptable by the Cloudflare Workers platform / miniflare, there are some eccentricities that prevent it from working out of the box. Namely, the use of exports.module in TS is not accepted, and Workers insists on the export {function | const | etc} syntax.

To accomplish a portable worker in this repo, there is a patchfile in the src/ directory intended to be applied on index.js after it is built from TypeScript. The patch was created like so:

  1. First, command tsc --lib es2020,dom index.ts was run to compile index.ts into index.js
  2. After adding and commiting index.js with git, the proper changes were made to the source file (e.g. deletion of certain lines).
  3. A patchfile was generated with the command git diff index.js > patchfile and saved in the src/ directory.

Now anyone can compile the index.ts file as it is in the src/ directory, and then apply the patch to the newly generated index.js with patch -p1 < patchfile. This process is run as a build command everytime the worker is built locally with miniflare or wrangler dev.

There are some limitations, of course, as index.ts becomes more and more modified over time and by users. The patch command can only do so much to find the necessary changes to be made. But hopefully this description of how this system works will help future users, or, will indicate that the benefits of writing in TypeScript are not worth the hassle and that we should just move to pure JavaScript instead :)

Contributions welcome

Please help yourself to filing bugs, features, comments in the Issues tab! And of course, you are welcome to launch a pull request if you'd like to contribute directly.

Design and Development Decisions

The point of this project is to provide web analytics without the hassle of signing up for a service or a 3rd party.

There is false dichotomy between being able to conduct deep signals collection and protecting user privacy. Several proponents of the privacy by design school of thought have demonstrated there are plenty of technological opportunites to provide better privacy-conscious experiences on the web.

We also don't want to lose out on data key to valuable insights. But those insights cannot come at the cost of sacrificing privacy. Ideally, OffSiteJS would provide a library to automate the aggregation of data so no one individual session can being targeted, or is perhaps presented in the context of a frequency vector for that specific field to show anomalies without showing individual sessions.

Core Principles

  1. Decentralize where possible.
  2. Distributed when necessary.
  3. Do good.

The principles above extend the Privacy By Design principles of:

  1. Minimise: The amount of personal information that is processed should be minimal.
  2. Hide: Any personal information that is processed should be hidden from plain view.
  3. Separate: The processing of personal information should be done in a distributed fashion whenever possible.
  4. Aggregate: Personal information should be processed at the highest level of aggregation and with the least possible detail in which it is (still) useful.
  5. Inform: Data subjects should be adequately informed whenever personal information is processed.
  6. Control: Data subjects should have agency over the processing of their personal information.
  7. Enforce: A privacy policy compatible with legal requirements should be in place and should be enforced.
  8. Demonstrate: Demonstrate compliance with the privacy policy and any applicable legal requirements.

About Cloudflare and Miniflare

Cloudflare was chosen due to its ability to provide lightweight edge services with little infrastructure configuration on the part of the user. Obviously, Cloudflare isn't exactly what comes to mind when people think of a decentralized computing paradigm. Perhaps one might consider Cloudflare's Workers platform a step in the right direction towards distributed computing, as there is no central server where OffSiteJS is running. Ultimately, the hosted flavor of OffSiteJS won't store any logs. This can be verified by the source code here and the GitHub deploy action that deploys the serverless worker.

miniflare also poses some challenges. It is excellent in many ways, as it does IP WHOIS and geolocation lookup at request time; this is great, as we can get approximate location information without storing the IP address and looking it up ourselves. However, the way this works is that miniflare does the lookup to a trusted Cloudflare endpoint, meaning by default, miniflare will be doing a lookup on an external service. In their own words:

For a more accurate development experience, Miniflare automatically fetches the cf object for incoming requests (containing IP and location data) from a trusted Cloudflare endpoint, caching it for 30 days. You can disable this behaviour, falling back to a default cf object, using the --no-cf-fetch flag.

So using the --no-cf-fetch flag with miniflare will make sure it won't do lookups to Cloudflare, albeit at the cost of IP WHOIS and location data. If you wish to get that data at request time, the solution would to integrated with the industry-preferred MaxMind Geolite DB.

Room for Improvement

OffSiteJS is a fork of an ongoing project, OnSiteJS(https://onsitejs.org) whose mission is to provide a better alternative to Google Analytics. Where OffSiteJS will differ most will hopefully be its focus on decentralization. This will in part be accomplished with integrations with great projects like IPFS, Orbit, web3.js and more. But it will also require some radical thinking as to:

  • what is "enough" data for an average user to be able to make great insights about their website?
  • what is "too much" data for the software, data pipeline, infrastructure provider, etc, to be able to see in transit?

Let's say Google Analytics is the ceiling of what is acceptable. That is not because they give the customer sooooo much data, but rather they are collecting sooooooooooooooooo much data for themselves. In comparison, many projects will surely suffice the overall goal of providing privacy-conscious analytics.

So, the goal of OffSiteJS should be to make the most decentralized web analytics possible, without sacrificing usability or utility.

The roadmap is somewhat open, as I am not completely sure how this will be accomplished (yet). Integration with a web3 wallet is an interesting concept, not just because it offers user authentication without a central services provider, but also due to the great APIs and developer community around the use of web3.

References

Related Projects