Dynamically calculate flow sizes when using ">" or "<" as an amount #77

Merged May 28, 2024 (1 commit)


Ivoah
Contributor

@Ivoah Ivoah commented Mar 20, 2024

Flows with "<" as an amount will fill the remaining space from their source.
Flows with ">" as an amount will be the sum of their targets.

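
To make the two rules concrete, here is a minimal Python sketch. It is not the code in this pull request: the function names are hypothetical, clamping a negative remainder to zero is an assumption, and reading "the sum of their targets" as the total flowing out of the target node is only one interpretation.

```python
def fill_from_source(source_total, known_outflows):
    """'<' as submitted: the flow takes whatever the source has left over."""
    remainder = source_total - sum(known_outflows)
    # Assumption: if the known outflows already exceed the total, use 0.
    return max(remainder, 0)


def sum_of_targets(target_outflows):
    """'>' as submitted: the flow is as large as what its target sends onward."""
    return sum(target_outflows)
```

For a source of 1500 with known outflows of 400 and 500, `fill_from_source(1500, [400, 500])` gives 600.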
@nowthis nowthis self-assigned this Apr 5, 2024
@nowthis
Owner

nowthis commented Apr 5, 2024

Thanks @Ivoah. This is a good start.

I've run some scenarios with it and there are some edge cases I'm working on resolving. Some examples:

  • A Node may have flows both in and out which want to use any existing surplus (e.g. in the default Budget diagram, "Other" could have [>] and "Food" could have [<]). That should be handled gracefully.
  • If more than one flow on one side of a Node is trying to use the surplus, how should that be handled?
    • My current set of revisions will split the surplus evenly among all consuming flows.
  • What happens if every flow in the whole diagram is using one of these tokens and nothing has an actual numeric value?
  • ...etc., etc.

No need to make any changes on your end, I think I've got it from here.

I expect to ship a revised version of this that handles all the edge cases I can think of in the next few weeks.

@nowthis
Owner

nowthis commented May 19, 2024

So, I kept finding interesting edge cases to deal with. Most are resolved now.

The ones that remain are too deep to deal with at this time (and do have practical workarounds), so I am close to merging this and then adding my follow-up changes.

Here are some samples of what it currently looks like in practice.

The simple case where only one flow is unknown:
[image: one-remainder]

Note - there will be a new 'Console' area underneath the diagram which will log each flow calculation. This will help in cases where you end up with unexpected results.

Here's what it looks like for the simple case above:
[image: one-remainder-console]

Splitting a remainder into more than one calculated flow will divide it evenly.
Here's a group of three distinct flows consuming the same remainder:
[image: three-remainders-small]
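
A minimal Python sketch of the even split described here. The function name and the sample numbers are hypothetical and are not taken from the screenshot.

```python
def split_evenly(remainder, consumers):
    """Divide one node's unused amount equally among its calculated flows."""
    if consumers == 0:
        return []  # nothing is asking for the remainder
    return [remainder / consumers] * consumers
```

With a remainder of 90 and three calculated flows, each flow would get 30.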

If you don't want the remainders to appear evenly divided, one approach is to actually add more unknowns and then group them as intended.

Here's a pair of remainder-flow groups divided 60/40 by using five flows (3 of which are flowing to 'Remainder A' and 2 to 'Remainder B'), plus their console output:
[image: sixty-forty-small] [image: sixty-forty-console]
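
The 60/40 trick can be sketched as follows, assuming every calculated flow gets an equal share and the shares are then added up per destination. The function name and the remainder of 100 are hypothetical.

```python
from collections import Counter


def weighted_by_flow_count(remainder, targets):
    """Split a remainder evenly per flow, then total the shares per target.

    `targets` lists the destination of each calculated flow, so a target
    that appears three times ends up with three equal shares.
    """
    if not targets:
        return {}
    share = remainder / len(targets)
    return {target: share * count for target, count in Counter(targets).items()}


flows = ["Remainder A"] * 3 + ["Remainder B"] * 2
result = weighted_by_flow_count(100, flows)
```

Here `result` maps 'Remainder A' to 60 and 'Remainder B' to 40, because each of the five flows carries 20.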

Notice that when an amount is divided, we sometimes have to increase the displayed precision in order to convey the result accurately.

All of the amounts in these inputs had no decimal places, but adding more decimal places for these divided flows was better than rounding them.

(Fun edge cases which convinced me that the displayed precision had to be increased when dividing:

  • Given a node of size 1 split into 2 even subflows: the child flow sizes would each display their (rounded) value as 1 instead of 0.5, implying that the original amount of 1 had magically become 2.
  • Given a node of size 1 split into 3 or more even subflows: all children would show their rounded sizes as 0, which made no sense.)
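
Both edge cases can be reproduced with a small Python sketch. It assumes round-half-up display rounding (which matches the "1 instead of 0.5" behavior described above); the helper name is hypothetical and this is not SankeyMATIC's own formatting code.

```python
from decimal import Decimal, ROUND_HALF_UP


def display(value, places):
    """Format a Decimal with a fixed number of decimal places, rounding half up."""
    quantum = Decimal(1).scaleb(-places)  # 1, 0.1, 0.01, ...
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))


half = Decimal(1) / Decimal(2)
third = Decimal(1) / Decimal(3)

no_decimals = (display(half, 0), display(third, 0))    # ('1', '0')
more_decimals = (display(half, 1), display(third, 2))  # ('0.5', '0.33')
```

At zero decimal places the two halves each read as 1 and the thirds each read as 0; adding decimal places makes the displayed values plausible again.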

These calculations do work in the reverse direction - you can have a flow which is made the right size to provide any missing amount for a node:

[image: calculated-to-the-left-small]
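
The reverse calculation can be sketched like this, using the numbers from the "Income to add" example later in this thread (one known inflow of 12, outflows of 10 and 10). The function name is hypothetical, and returning 0 when nothing is missing is an assumption.

```python
def missing_input(known_inflows, outflows):
    """Size a calculated inflow so the node's inputs cover its outputs."""
    needed = sum(outflows) - sum(known_inflows)
    # Assumption: if the inputs already cover the outputs, nothing is added.
    return max(needed, 0)


income_to_add = missing_input([12], [10, 10])
```

In this case `income_to_add` is 8.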


It is possible to legitimately have flows of zero in some cases, because there isn't always a remainder to be consumed.

Here's the logical endpoint of that path: what if nothing in the whole graph has an actual defined value?

[image: all-zeroes]

(Making 'nothing' still render in a balanced sort of way was one of the interesting diversions here.)


I still have a conundrum to resolve about the syntax itself; I will post a separate comment about that.

@nowthis
Owner

nowthis commented May 19, 2024

On syntax:

Here's a use case I expect to be common:

A budget with one main unknown value, possibly being split into sub-values in a later stage.

In this first image, the remainder for Savings to consume is 300.
[image: savings-300]

In the second, the only difference in the inputs is that 'All other expenses' was decreased by 300; the colorful flows all automatically adjusted to consume the new remainder:
[image: savings-600]

The question is: Which of these two syntaxes will be more intuitive to the casual user to indicate this relation?

Option 1, using <:

Wages [1500] Budget

Budget [400] Housing
Budget [500] All other expenses
Budget [<] Savings

Savings [<] 401K
Savings [<] Bank Account
Savings [<] Mattress

Option 2, using >:

Wages [1500] Budget

Budget [400] Housing
Budget [500] All other expenses
Budget [>] Savings

Savings [>] 401K
Savings [>] Bank Account
Savings [>] Mattress
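
Whichever symbol is chosen, the calculation is the same. The Python sketch below resolves this budget under several stated assumptions: the regular expression, the function name, and the rule that source nodes appear upstream-first are all hypothetical, and multiple unknowns on one node are split evenly as described earlier in the thread.

```python
import re
from collections import defaultdict

# Hypothetical pattern for lines like: Source [amount] Target
FLOW_RE = re.compile(r"^(.+?)\s*\[([^\]]*)\]\s*(.+)$")


def resolve(lines, unknown="<"):
    """Fill in every `unknown` amount with its source's leftover total."""
    flows = []
    for line in lines:
        match = FLOW_RE.match(line.strip())
        if match:  # blank lines simply do not match
            source, amount, target = match.groups()
            flows.append([source, amount.strip(), target])

    inflow = defaultdict(float)
    # Assumption: each source is listed after the flows that feed it.
    for source in dict.fromkeys(f[0] for f in flows):
        outs = [f for f in flows if f[0] == source]
        known = [f for f in outs if f[1] != unknown]
        unknowns = [f for f in outs if f[1] == unknown]
        for f in known:
            f[1] = float(f[1])
        remainder = max(inflow[source] - sum(f[1] for f in known), 0)
        for f in unknowns:
            f[1] = remainder / len(unknowns)  # even split
        for f in outs:
            inflow[f[2]] += f[1]
    return [tuple(f) for f in flows]


budget = [
    "Wages [1500] Budget",
    "",
    "Budget [400] Housing",
    "Budget [500] All other expenses",
    "Budget [<] Savings",
    "",
    "Savings [<] 401K",
    "Savings [<] Bank Account",
    "Savings [<] Mattress",
]
resolved = resolve(budget)
```

With these inputs, Savings receives 600 (1500 minus 400 and 500) and each of its three children receives 200. Passing `unknown=">"` with the second listing gives the same amounts.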

The argument for <: The target amount is drawn from the source.

  • If you look at the flow declaration and ask 'where is this amount coming from?' you can think of this symbol as an arrow answering that question by pointing back to the source node.

The argument for >: The source node is sending out its remainder to one or more targets.

  • When there are multiple outputs (like the last three in this graph), you can read it as three arrows from Savings to all three children.

I think it's also useful to consider the less-frequent, opposite case of calculated inflows too.
[image: calculated-to-the-left-small]

The source for this "Income to add" flow would look like one of these two options:

Option 1, using >:

Current Income [12] Budget
Income to add [>] Budget

Budget [10] Goods
Budget [10] Services

Option 2, using <:

Current Income [12] Budget
Income to add [<] Budget

Budget [10] Goods
Budget [10] Services

The argument for representing a left-calculation with >:
"Income to add" is getting its value from "Budget". The arrow is pointing to the data source.

The argument for representing a left-calculation with <:
"Budget"'s size is what determines the amount for "Income to add". In a sense it is providing its amount to the source node. The arrow is pointing in the direction of the data flow.


I've been round and round about this...

When I contemplate writing the documentation for this feature, that's what makes me lean toward swapping the two symbols from what was submitted in the pull request.

I believe it will be easier to explain (& easier for people to remember) if the convention is that the arrow represents the direction the data is flowing, namely:

  • > means the source node is providing a value to the target.
  • < means the target node is providing a value to the source.

I am not absolutely convinced this is right, but when working on some complicated sample diagrams I did find myself getting confused when the symbols were the other way around.

Feedback on this is welcome; my intuition may not match anyone else's on this.

@Anthchirp

"Feedback on this is welcome"

Well here goes: I agree with your intuition.

The entire diagram description is about the flow.

Budget [...] Goods

If I put 300 in there it is clear that 300 goes from Budget to Goods. Equally, if I read

Budget [→] Goods  # yes, I know it's '>'

my immediate interpretation is that remainder goes from Budget to Goods. In my LTR-mind this represents the simple case, reads natural, and feels intuitive and right. Say there was only one outgoing flow from Budget. You wouldn't need to read the syntax documentation to understand what was going on.

Whereas, if I read

Budget [←] Goods  # yes, I know it's '<'

I basically stumble over it when reading it. I know that something more complicated is happening. Something is going from Goods to Budget - in this case information - yet the diagram description tells me numbers are still going from Budget to Goods.

However, swapping the symbols doesn't help with that. In fact, for me, it makes both scenarios more difficult to reason about.

Another alternative could be to use a different symbol for <, but would that be any clearer? E.g.

Budget [*] Goods
Budget [~] Goods
Budget [:] Goods
Budget [=] Goods

The closest of those, I think, would be = for equalize, but I suspect one would like to read it as equal, which isn't helpful.

@duffry

duffry commented May 24, 2024

I think the use of angle brackets only works if they aren't considered arrows. For me they indicate a branching flow. So conceptually they would "point" towards the source, or collective element. This seemed to work well with:

Wages [1500] Budget

Budget [400] Housing
Budget [500] All other expenses
Budget [<] Savings

Savings [<] 401K
Savings [<] Bank Account
Savings [<] Mattress

Budget to savings doesn't seem quite as natural as there's only one flow, but the children of savings make a lot of intuitive sense to me.

That said, angle brackets can read as arrows, so I agree with Anthchirp that this can feel "backwards" to the flow of the diagram and cause stumble.

Use of alternate symbols seems reasonable. Is a blank appropriate (Budget [] Goods) or will that be read as an omission?
My first thought for an alternate was the tilde. It has the "feel" of flow and the "sense" of the unknown which "seems" appropriate. Reaching for intuition here.

Obviously a symbol is brief but would a word work instead?

Budget [calc] Goods
Budget [split] Goods
Budget [other] Goods

A quick thought on the remainder splitting.

I like the thinking above, especially the provision to allow for weighted calculated pots. However, fractional remainders and the increasing of precision felt uncomfortable to me. It seems a simple solution and one that is far from wrong but left me glitching on things that just don't split below 1 well (people, for example).

I don't have a better solution but my first thought was to let the precision stand and assign remainder based on order - leaving unequal sets.

Employees [1] Owners
Employees [8] Staff

Staff [<] Sales
Staff [<] Production
Staff [<] Admin

Resulting in

[image: Screenshot 2024-05-24 113732]

Like I say, it's not a better solution but maybe a toggleable option at some point so you can choose 'accuracy' vs 'display precision' primacy.
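
One possible reading of "assign remainder based on order" is sketched below in Python. The function name is hypothetical, and the numbers in the screenshot are not recoverable here, so the result shown is what this particular rule produces rather than a confirmed match.

```python
def split_whole(total, consumers):
    """Split a whole-number total into whole-number shares.

    Earlier consumers receive one extra unit until the leftover is used up.
    """
    base, extra = divmod(total, consumers)
    return [base + 1 if i < extra else base for i in range(consumers)]


staff_split = split_whole(8, 3)  # Sales, Production, Admin
```

Under this rule the 8 staff would be assigned as 3, 3 and 2.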

@nowthis
Owner

nowthis commented May 25, 2024

I'm thinking of changing tack since (as just demonstrated) people's intuitive interpretations about < and > can differ.

My current thought is ? and !... I'll walk through the thinking below.

Also, from another perspective, on software-based keyboards (like tablets and phones), < and > are sometimes two taps away from the default keyboard. It would be nice to pick more convenient characters to reach. (Unfortunately the brackets for [amounts] are in the same situation as <>, but that syntax is pretty established for now... I can at least try not to make things worse.)

In the old pull request #41, the symbol * was proposed, and that one had appealed to me a lot as the one to use for 'consume any remainder that the source node has'.

  • (I do disagree with the idea in that PR that it is unambiguous which direction any given flow would be pulling from. I think it will be important for clarity to have a separate symbol for 'consume the source's remainder' versus 'fill in any missing input for the target'.)

So if * = "Use Remainder", then what symbol would make sense as "Fill in Missing Input"?

I was thinking of ? as the "Fill in Missing Input" operator for a while. (There's a vague parallel here where both * and ? have meanings/usage that relate to each other in Regular Expressions, but those meanings don't really map to this usage well, so that's not particularly helpful.)

Then I took a step back and thought about it this way:

Imagine a person encountering ? as an amount, say in a set of inputs like this:

Wages [1500] Budget

Budget [400] Housing
Budget [500] All other expenses
Budget [?] Savings

...I think that even if a person is brand new to these diagrams, they would fairly quickly be able to interpret that ? as 'use anything left over from Budget in that flow'. And I think a * would not be as obvious to interpret that way.

If ? = "Use Remainder", I would want something in the same semantic neighborhood to represent its opposite, "Fill in Missing Inputs".

  • I'm definitely not going to use . or , since those can be parts of actual numbers... also they're so small that it's hard to even parse visually which is which on a small display.
  • You could do a double question mark, ??. But I think that could introduce other confusion. (Which version is for which direction? And how do you remember?)
  • * might work...
  • but ! has the virtue of being the semantic opposite of ? in everyday life. If ? represents one case, I think that it's not going to be too hard to remember (after learning it in a hint) that ! represents the reverse case.

Here's how that would look in the above 'Fill in Missing Inputs' example:

Current Income [12] Budget
Income to add [!] Budget

Budget [10] Goods
Budget [10] Services

Also, design-wise, I am taking care to define these symbols in only one place in the code, so that if someone does wish to use different symbols in their own fork, it will be trivial to change either or both of them.
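
The "define the symbols in only one place" idea might look something like this Python sketch, using the ? and ! assignment proposed in this comment. The constant and function names are hypothetical and do not come from the SankeyMATIC source.

```python
# The only place the two operator characters are spelled out.
USE_REMAINDER = "?"
FILL_MISSING_INPUT = "!"


def classify_amount(text):
    """Return (kind, value) for the text found between the brackets."""
    text = text.strip()
    if text == USE_REMAINDER:
        return ("use_remainder", None)
    if text == FILL_MISSING_INPUT:
        return ("fill_missing_input", None)
    return ("number", float(text))
```

Because everything else refers to the constants, a fork that prefers other characters would only need to change those two lines.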

@nowthis
Owner

nowthis commented May 26, 2024

Side note – keying off of @duffry's comment about how a blank flow like a [] b might be interpreted – I'm trying out treating that kind of empty flow as a skippable line (basically like a comment) rather than objecting to it as a syntax error or showing it as a 0-size flow. (The new Console section will note that the line was skipped as an empty flow.)

The user experience I'm anticipating there is that one can then just cut/delete the amount from between the brackets to see the diagram without that flow, then Cmd-Z/Ctrl-Z to see it put back.

I think that'll be slightly more convenient than going to the start of the line and typing // (though that will also still work).
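
A rough Python sketch of that skipping behavior follows. The pattern, function name, and console message wording are all hypothetical; only the idea of skipping an empty flow and noting it in the Console comes from the comment above.

```python
import re

# Hypothetical pattern for lines like: Source [amount] Target
FLOW_RE = re.compile(r"^(.+?)\s*\[([^\]]*)\]\s*(.+)$")


def read_flows(lines):
    """Collect usable flows and a console log of skipped empty flows."""
    flows, console = [], []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("//"):
            continue  # blank line or comment
        match = FLOW_RE.match(text)
        if not match:
            continue  # not a flow line; out of scope for this sketch
        source, amount, target = match.groups()
        if amount.strip() == "":
            console.append(f"Line {number}: skipped empty flow {source} -> {target}")
            continue
        flows.append((source, amount.strip(), target))
    return flows, console


flows, console = read_flows(["a [5] b", "a [] c", "// a [2] d"])
```

Here only `a [5] b` is kept, and the console records that line 2 was skipped.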

@nowthis
Copy link
Owner

nowthis commented May 26, 2024

On tweaking the splitting mechanism -

  1. I'm not sure how often people will be splitting remainders into multiple flows in the first place; I expect most remainders are going to be captured with one [?] per source node (and therefore not require any splitting). If someone does use 3 [?]s and they were expecting whole numbers, I think it may be a helpful feature to let them know it's not dividing evenly.
  2. I don't have any sense what percentage of diagrams assume whole-number flows.
    In a financial diagram I think you'd more often want to see $9 become 2 x $4.50, not $5 / $4...
    Job-search type stories do use whole numbers, but in a job-search diagram I wouldn't expect someone to be splitting a remainder flow, either.
    All of which is to say: for now, I'm sticking with the current concept of the split being even, with a bit of added precision.
  3. However!
    I do intend to extend this syntax in the future to support specific percentages of the remainder so that one doesn't have to do the multi-flow hack I mentioned above to produce non-even proportions. At this time I think the syntax would look like the following to achieve the effect in @duffry's example:
Employees [1] Owners
Employees [8] Staff

Staff [?.375] Sales
Staff [?.375] Production
Staff [?.25] Admin

...though I can recognize that approach might not be ideal because you could only decide those percentages when you already know the size of Staff...
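
As a sketch of how such a proportional amount might be interpreted (this is a proposed, unimplemented syntax, so the parsing rule and function names below are assumptions):

```python
def parse_share(text, symbol="?"):
    """Return the fraction of the remainder requested, or None if not calculated.

    Assumption: the bare symbol means "all of the remainder".
    """
    if not text.startswith(symbol):
        return None
    rest = text[len(symbol):]
    return 1.0 if rest == "" else float(rest)


def apply_shares(remainder, amounts):
    """Turn amounts such as '?.375' into sizes taken from the remainder."""
    return [remainder * parse_share(a) for a in amounts]


staff = apply_shares(8, ["?.375", "?.375", "?.25"])
```

With Staff's remainder of 8, the three flows come out as 3, 3 and 2.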

I guess an alternative would be to add a toggle which basically requires that whole numbers in remainders would never be split.

In my theoretical example of a node of size 1 getting split into 3, that would mean that the first consumer of the remainder here (c) would get the 1 and the other two consumers (d,e) would both get 0:

a [1] b
b [?] c
b [?] d
b [?] e

That seems like it might be practical. I would likely have it off by default though.

(P.S. If anyone is wondering why % has not come up as a syntax option this whole time, that's because I would also still like to implement the percentage feature mentioned in #32, which is different from everything discussed here. That issue is concerned with percentages of a node's entire total, not of an unused remainder.)

@nowthis nowthis merged commit 6282e91 into nowthis:main May 28, 2024
@nowthis
Owner

nowthis commented May 28, 2024

When actually merging this and marshaling the other 10+ commits that followed it (from 6282e91 to b1af9c2), I came to a couple of revised conclusions in the cold light of day:

  • ? + ! didn't feel right - ! is an imperative mark, which doesn't make sense when expressing an unknown.
  • So I stuck with the original wildcard from #41 (Add support for [*] operator), *, and assigned ? as the operator for "Fill in missing inputs". In practice that has felt just fine as a syntax.
  • For now I'm skipping the even splitting of remainders when there are multiple unknowns being calculated.
    It introduced a lot more code complexity than all of these other changes did (including the precision issue discussed above), and it may not be needed if it later becomes possible to use *.3 to express "use 30% of the remainder".
    What that means for now is that the first unknown that can consume an amount consumes all of it, leaving none for others. So in the case of a node with size = 1 having three consumers: the first one has a value of 1 and the other 2 get 0.
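
The behavior described in the last bullet can be sketched in Python as follows. The function name is hypothetical and this is not the merged JavaScript.

```python
def first_takes_all(remainder, unknown_count):
    """The first calculated flow consumes the whole remainder; the rest get 0."""
    if unknown_count == 0:
        return []
    return [remainder] + [0] * (unknown_count - 1)


sizes = first_takes_all(1, 3)
```

For a node of size 1 with three `*` consumers, `sizes` is 1, 0, 0.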

The changes are here on GitHub but are not yet promoted to sankeymatic.com. The site should get these changes applied in the next day or two.


4 participants