A way to sanitize spiked/flooded download data #173

Open
samal-rasmussen opened this issue Mar 14, 2023 · 1 comment

samal-rasmussen commented Mar 14, 2023

The Svelte package recently had its download numbers flooded with a huge number of downloads over a short period. The spike makes the graph unreadable, because it compresses the rest of the data down towards the bottom of the chart.
https://npmtrends.com/svelte
[Screenshot, 2023-03-14 15:37:11]

I am searching for a way to clean up this spike, so I can actually view the unsquashed graph.

I did make this super hacky workaround to flatten the data, if you're really desperate to see a realistic graph:

In the .js bundle on the npmtrends site I found the success handler for fetch requests. It looks like this:

  e.fetch = function(t, e) {
      ...
      return this.retryer = new a.m4({
          fn: y.fetchFn,
          abort: null == h || null == (o = h.abort) ? void 0 : o.bind(h),
          onSuccess: function(t) {
              s.setData(t),

I put a breakpoint on the setData line and, once it was hit with t holding the graph data, executed this in the devtools console:

// t is the fetched data. We update it here.
t = t.map((tt) => {
    tt.downloads = tt.downloads.map((d) => {
        // These values are hand picked from reading the svelte download data.
        // You'll have to hand pick some different values for a different package.
        d.downloads = d.downloads > 155911 ? 70000 : d.downloads;
        return d;
    })
    return tt;
})

You can trigger the downloads data fetch by selecting a new time interval in the dropdown on the page.

@dominikg

Note that a similar spike happened to vue. So you're looking at either manually patching historical data per package name and timeframe, which is an ongoing effort, or applying some kind of heuristic to find unrealistic spikes and flatten them.
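
For what it's worth, here is a rough sketch of what such a heuristic might look like. It is not anything npmtrends implements. It assumes the same data shape as the workaround above (an array of packages, each with a `downloads` array of points that carry a numeric `downloads` field), and the names `median`, `flattenSpikes`, `sanitize`, `windowSize` and `factor` are all made up for illustration. Each point is compared against the median of its neighbours and replaced by that median when it is more than `factor` times larger.

```javascript
// Hypothetical helper: median of a numeric array.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Replace any point that exceeds `factor` times the median of its
// neighbours with that median. `windowSize` (neighbours considered on each
// side) and `factor` are assumed tuning knobs, not values from npmtrends.
function flattenSpikes(points, { windowSize = 3, factor = 3 } = {}) {
  return points.map((point, i) => {
    const start = Math.max(0, i - windowSize);
    const end = Math.min(points.length, i + windowSize + 1);
    // Neighbours on both sides, excluding the point itself.
    const neighbours = points
      .slice(start, end)
      .filter((_, j) => start + j !== i)
      .map((p) => p.downloads);
    if (neighbours.length === 0) return { ...point };
    const baseline = median(neighbours);
    const isSpike = baseline > 0 && point.downloads > factor * baseline;
    // Return a copy so the fetched data is not mutated.
    return { ...point, downloads: isSpike ? Math.round(baseline) : point.downloads };
  });
}

// Apply the heuristic to every package in the fetched data.
function sanitize(data, options) {
  return data.map((pkg) => ({
    ...pkg,
    downloads: flattenSpikes(pkg.downloads, options),
  }));
}
```

At the breakpoint this could be used as `t = sanitize(t)` instead of the hand-picked threshold. A flood that lasts longer than roughly `windowSize` consecutive points will drag the median up with it and go undetected, so the window would need to be widened for longer spikes.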
