Proposal (WIP): PushReader, a simpler reader API #113

josh-newman · 2020-09-16T19:34:46Z

This is a concept for making data reading simpler than with ReaderFunc, by allowing "normal" Go state (local variables in a closure).

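```go
// Proposed API: each shard's reader is an ordinary function, so state such as
// the fuzzer lives in local variables. Nshard and N are defined elsewhere; fuzz
// is assumed to be github.com/google/gofuzz.
slice := bigslice.PushReader(Nshard, func(shard int, push func(string, int)) error {
	fuzzer := fuzz.NewWithSeed(1)
	var row struct {
		// Exported fields: gofuzz only fills fields it can set, so
		// unexported (e.g. embedded string/int) fields would stay zero.
		S string
		I int
	}
	for i := 0; i < N; i++ {
		fuzzer.Fuzz(&row)
		push(row.S, row.I)
	}
	return nil
})
```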

The performance cost of this may be significant; I haven't measured yet. I wanted to start by having a concrete example of what user code will look like.
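For contrast, here is a small, self-contained approximation of the bookkeeping that the closure style removes. Everything in it is hypothetical: `pullRead` only mimics the ReaderFunc pattern (explicit per-shard state, caller-provided column buffers, `io.EOF` at the end), `pushRead` mimics the proposed callback, and the `drain*` helpers stand in for what the framework would do. None of these are bigslice's actual types or signatures.

```go
package main

import (
	"fmt"
	"io"
)

// pullState is the explicit per-shard state a ReaderFunc-style reader has to
// carry between calls.
type pullState struct {
	next int // index of the next row to produce
}

// pullRead fills up to len(strs) rows into the column buffers and returns how
// many it wrote, or io.EOF once all n rows have been produced.
func pullRead(n int, st *pullState, strs []string, ints []int) (int, error) {
	if st.next >= n {
		return 0, io.EOF
	}
	k := 0
	for ; k < len(strs) && st.next < n; k++ {
		strs[k] = fmt.Sprint("row", st.next)
		ints[k] = st.next
		st.next++
	}
	return k, nil
}

// pushRead produces the same rows; the loop variable is its only state.
func pushRead(n int, push func(string, int)) error {
	for i := 0; i < n; i++ {
		push(fmt.Sprint("row", i), i)
	}
	return nil
}

// drainPull calls pullRead repeatedly with fixed-size buffers, the way a
// framework would, and collects the int column.
func drainPull(n, batch int) []int {
	var st pullState
	strs, ints := make([]string, batch), make([]int, batch)
	var out []int
	for {
		k, err := pullRead(n, &st, strs, ints)
		if err != nil { // io.EOF signals the end of the shard
			return out
		}
		out = append(out, ints[:k]...)
	}
}

// drainPush runs pushRead once and collects the int column.
func drainPush(n int) []int {
	var out []int
	_ = pushRead(n, func(_ string, i int) { out = append(out, i) })
	return out
}

func main() {
	fmt.Println(drainPull(10, 4))
	fmt.Println(drainPush(10))
}
```

For n = 10 and a batch size of 4, both drains return [0 1 2 3 4 5 6 7 8 9]; the difference is that `pullRead` has to resume from `st.next` and respect the buffer size on every call, while `pushRead` is a plain loop.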

mariusae · 2020-09-17T08:55:12Z

Shouldn't this be called WriterReader? "Push" seems to imply streaming to me.

mariusae · 2020-09-17T08:55:23Z

or "WriteReader" maybe

josh-newman · 2020-09-18T00:54:53Z

I'd worry about WriteReader looking confusing to a newcomer, especially if we add a similar variant of WriterFunc (since those users may want to use defers, too), and we end up with ReaderFunc, WriteReader, ReadWriter, WriterFunc 😄.
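On the defer point: a push-style reader runs once per shard, so setup and teardown can share one function scope, whereas a reader that is called once per batch has no single scope to defer in and has to trigger cleanup itself when it returns `io.EOF` or an error. A minimal sketch, where `resource` and `readShard` are hypothetical names rather than bigslice API:

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical per-shard handle (think: a file or an RPC
// stream) that has to be closed however the shard ends.
type resource struct {
	closed bool
}

// Close marks the resource as released.
func (r *resource) Close() error {
	r.closed = true
	return nil
}

// readShard is a push-style reader. It opens its resource, pushes n rows, and
// fails at row failAt (pass a negative value to never fail). The single defer
// releases the resource on every return path.
func readShard(open func() (*resource, error), n, failAt int, push func(int)) error {
	r, err := open()
	if err != nil {
		return err
	}
	defer r.Close()
	for i := 0; i < n; i++ {
		if i == failAt {
			return errors.New("simulated read failure")
		}
		push(i)
	}
	return nil
}

func main() {
	var last *resource
	open := func() (*resource, error) {
		last = &resource{}
		return last, nil
	}
	var rows []int
	err := readShard(open, 5, 3, func(i int) { rows = append(rows, i) })
	fmt.Println(rows, err, last.closed)
}
```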

josh-newman · 2020-09-18T00:58:10Z

But I'll think more about naming.

mariusae · 2020-09-18T07:53:14Z

I agree. We should probably be consistent about what the prefix and suffix mean. For example, if the suffix indicated the data flow (in which case we should have called it FuncReader instead of ReaderFunc), then this would be clearer, I think.

jschellenberger · 2020-10-20T04:01:32Z

I would fully support having this functionality. It's a vast simplification over ReaderFunc().

josh-newman · 2020-11-06T03:53:51Z

I haven't thought more about naming yet, but I wrote a basic in-memory benchmark with a trivial reader task. Very roughly, it seems ~30% slower than ReaderFunc. For a non-trivial program, though, that extra overhead may be insignificant compared to the readability advantage.

josh-newman · 2020-11-10T03:42:58Z

I vectorized channel operations, which has a minor (if any) effect on the existing benchmarks, but intuitively may help when there's more parallelism.

I added a benchmark to demonstrate that the overhead involved is minimal if each row computation does more work, which might make this suitable for several GRAIL-internal usages. As @jcharum pointed out, there's still some reflect overhead, but it's the same situation in other bigslice ops, so it's probably ok here.

In terms of naming: most of the existing Slice-producing API functions have active/verb names. In that spirit, we could call this bigslice.Read and make it the default choice for users, since it's easier to use. Then we can mark ReaderFunc (or even rename it) as the second choice, for special situations.

josh-newman force-pushed the pushreader branch from da76a35 to 71de8bb · September 16, 2020 19:56

josh-newman added 2 commits November 6, 2020 02:34

Push reader · 5c99597

Comparative benchmark · 434562a

josh-newman force-pushed the pushreader branch from 71de8bb to 434562a · November 6, 2020 03:50

josh-newman added 3 commits November 10, 2020 03:11

vectorize chan ops · a75067d

simulate more expensive per-row work in benchmark · d22fdea

fix error reflection · 1746e1c
