Provide data to graphs beyond Float pairs #27

Open
KarthikRIyer opened this issue Jun 11, 2019 · 19 comments

Comments

@KarthikRIyer
Owner

KarthikRIyer commented Jun 11, 2019

Initial discussion regarding this can be found here:
#23 (comment)

KarthikRIyer changed the title from "Provide data beyond Float pairs" to "Provide data to graphs beyond Float pairs" on Jun 11, 2019
@KarthikRIyer
Owner Author

@BradLarson @marcrasi, as the first evaluation is coming up on the 24th, is there anything specific I need to complete before then?
I am trying to convert one plot to generics to see how that works out. If I can't get it working before the first evaluation, I could update the documentation for the work done so far.

@KarthikRIyer
Owner Author

KarthikRIyer commented Jun 20, 2019

Now I am not sure if generics is the correct way to proceed.
I am unable to use operators like:

static func + (left: Point, right: Point) -> Point {
    return Point(left.x + right.x, left.y + right.y)
}

Even if I write a protocol Addable that overloads the + operator, it won't work, because I still won't be able to use other operations like *, -, /, etc.
If I write protocols for each of these, then I won't be able to use String.

Making the input for each plot conform to specific protocols seems like the right way. But isn't that roughly what we are doing right now? We only accept specific data types. Maybe we could also add overloads for Int, Double, etc., but then we'd have to add extra variables for them in the Point type, as I did for String...

How was this handled in CorePlot?

@marcrasi
Collaborator

@BradLarson @marcrasi, as the first evaluation is coming up on 24th, is there anything specific I need to complete before that?

No, you've already done everything I'm looking for in the first milestone and you're doing really great in everything the evaluation asks about :)

@marcrasi
Collaborator

About generics, I haven't thought much about how you could use them for plotting data types, but here's an initial answer to your question.

If you make Point generic over the x and y types like Point<X, Y> then you can define operations that only make sense for certain types of point like this:

extension Point where X: AdditiveArithmetic, Y: AdditiveArithmetic {
  static func + (left: Point, right: Point) -> Point {
    ...
  }
}
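A minimal, self-contained sketch of the constrained-extension approach, assuming a Point with labelled x and y properties (the memberwise initializer and the scaling operator are illustrative, not SwiftPlot code). Each operator only exists for the coordinate types that support it:

```swift
// Hypothetical Point, generic over its x and y coordinate types.
struct Point<X, Y> {
    var x: X
    var y: Y
}

// `+` exists only when both coordinate types support addition.
extension Point where X: AdditiveArithmetic, Y: AdditiveArithmetic {
    static func + (left: Point, right: Point) -> Point {
        return Point(x: left.x + right.x, y: left.y + right.y)
    }
}

// Illustrative scaling operator: only when both axes share one numeric type.
extension Point where X == Y, X: Numeric {
    static func * (point: Point, factor: X) -> Point {
        return Point(x: point.x * factor, y: point.y * factor)
    }
}

let a = Point(x: 1.5, y: 2)           // Point<Double, Int>
let b = Point(x: 0.5, y: 3)
let sum = a + b                        // x: 2.0, y: 5
let scaled = Point(x: 2.0, y: 3.0) * 2.0

let label = Point(x: "Mon", y: 4.0)   // Point<String, Double>
// `label + label` would not compile: String is not AdditiveArithmetic.
```

Because the constraint lives on the extension rather than on Point itself, a Point<String, Double> can still be created and stored; it simply has no + operator.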

@KarthikRIyer
Owner Author

Thanks! I didn't know we could write extensions constrained on a type's generic parameters like that. I'll try this and see if it helps.

@BradLarson
Collaborator

@KarthikRIyer - On the matter of evaluations, as Marc said, you're doing great. You've hit the initial milestone objectives and gotten this to a demonstrable state both at the console and in Jupyter notebooks.

In fact, your work in using lodepng, setting up your Swift / C / C++ targets, and interacting with Jupyter was really informative to watch and helped me finally get my GPUImage framework operational once again on Linux and with Jupyter notebooks. Learning goes both ways on a project like this.

@KarthikRIyer
Owner Author

@BradLarson glad that working together on this helped you too!
I'm hoping to complete all the milestones by the second evaluation, and then maybe I can work on supporting macOS and iOS.

@KarthikRIyer
Owner Author

@BradLarson @marcrasi I don't think generics are a good fit. When I started converting the Point type to a generic, I eventually realised that I would have to convert all the types to generics, even types like Series, Axis, etc. Handling different types is also becoming difficult. For example:
Previously, to find the number of digits in 324.01, I converted it to an Int (324) and counted its digits. Now I am using a generic T conforming to a protocol I wrote called NumericType, and I can't convert it to an Int unless the generic conforms to BinaryInteger. I checked an existing plotting framework, thinking it would help: https://github.com/danielgindi/Charts

This didn't help. They seem to have used Double everywhere and have different data input types for each plot, like LineChartDataEntry, PieChartDataEntry, BarChartDataEntry.
It's done in a similar manner in a popular android plotting framework: https://github.com/PhilJay/MPAndroidChart

I faced the same issue when working on my previous plotting project, Graph-Kit. I opted to use Float then.

Can we use the same method we're using right now? Maybe use Double instead of Float?
How was this handled in CorePlot? I was trying to read the file CPTNumericDataType.m, but couldn't understand much.
Any ideas on how we can proceed?
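To make the digit-counting problem above concrete, here is a minimal sketch of two possible workarounds. DoubleConvertible, integerDigitCount and floatingDigitCount are hypothetical names; the second function relies only on the standard library's Int initializer that accepts any BinaryFloatingPoint value.

```swift
// Hypothetical protocol: a NumericType-like requirement that every
// conforming type can be converted (lossily) to a concrete Double.
protocol DoubleConvertible {
    var doubleValue: Double { get }
}
extension Int: DoubleConvertible { var doubleValue: Double { Double(self) } }
extension Float: DoubleConvertible { var doubleValue: Double { Double(self) } }
extension Double: DoubleConvertible { var doubleValue: Double { self } }

// Digits in the integer part, e.g. 324.01 -> 324 -> 3 digits.
// Note: Int(_:) traps on NaN, infinity, or values outside Int's range.
func integerDigitCount<T: DoubleConvertible>(_ value: T) -> Int {
    let integerPart = Int(abs(value.doubleValue))   // truncates toward zero
    return String(integerPart).count
}

// Alternative without a custom protocol: constrain T to BinaryFloatingPoint,
// for which Int already has a converting initializer.
func floatingDigitCount<T: BinaryFloatingPoint>(_ value: T) -> Int {
    return String(Int(abs(value))).count
}
```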

@KarthikRIyer
Owner Author

Until this is figured out, I'll start work on the histogram.

@KarthikRIyer
Owner Author

Also could you give me an idea of how people would want to plot using Tensors?
I gave this a fleeting look: https://www.tensorflow.org/swift/tutorials/model_training_walkthrough
But in the end, the plotting is done using arrays.

@BradLarson
Collaborator

Sorry I've been away from this, have been busy with other matters. For types, I think it's worth approaching this from the standpoint of what is needed by the plot types and what people will want to provide as inputs.

For a scatter plot, for example, your plots will need a type on the X and Y axes that can be converted to a floating-point number so that display arithmetic can be done with it. For a bar chart, the Y axis will need to be floating-point convertible, but the X axis will need hashable values (for binning) that can be represented as a String. You need to somehow define the inputs (probably using protocols) to a plot, and then allow users to supply the various types of input data that can be converted to that.

You may not even need to use generics here, because I don't know that it's essential to preserve type information all the way through to your plot. We'd have to think about if type erasure would be acceptable.

I'm thinking that Point might not be the most expressive name here, given the different input types you might be dealing with. I could see renaming it to Pair<T, U> and possibly Triplet<T, U, V> for 2-D and 3-D plots, respectively. If type erasure wasn't a concern, you could then have scatter plots take in series of Pair<FloatConvertible, FloatConvertible> values, and bar charts could have Pair<CustomStringConvertible, FloatConvertible> values. Arithmetic and other operations could be done by first casting the FloatConvertible values to Float and working with them from there.
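A minimal sketch of the FloatConvertible / Pair idea, assuming type erasure is acceptable. The protocol, its floatValue requirement and the xRange helper are hypothetical; existential types are written with the any keyword (Swift 5.6 and later).

```swift
// Hypothetical protocol: anything that can become a Float for display math.
protocol FloatConvertible {
    var floatValue: Float { get }
}
extension Int: FloatConvertible { var floatValue: Float { Float(self) } }
extension Float: FloatConvertible { var floatValue: Float { self } }
extension Double: FloatConvertible { var floatValue: Float { Float(self) } }

// 2-D datum; a Triplet<T, U, V> would be the 3-D analogue.
struct Pair<T, U> {
    var first: T
    var second: U
}

// Scatter input: both coordinates erased to FloatConvertible.
typealias ScatterPoint = Pair<any FloatConvertible, any FloatConvertible>
// Bar chart input: a printable label plus a numeric value.
typealias BarEntry = Pair<any CustomStringConvertible, any FloatConvertible>

// Arithmetic happens only after casting to Float, e.g. an axis's data range.
func xRange(of points: [ScatterPoint]) -> ClosedRange<Float>? {
    let xs = points.map { $0.first.floatValue }
    guard let lo = xs.min(), let hi = xs.max() else { return nil }
    return lo...hi
}

// Mixed input types in one series, since each value is boxed.
let points: [ScatterPoint] = [Pair(first: 1, second: 2.5),
                              Pair(first: Float(3.5), second: 4)]
let entries: [BarEntry] = [Pair(first: "2018", second: 12),
                           Pair(first: 2019, second: 15.5)]
```

The trade-off is that the original types (Int, Double, String, ...) are no longer visible to the plot once they are boxed in an existential.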

@BradLarson
Collaborator

Also, is pairing off each datapoint going to work for all cases? In a scatter plot, I can see people wanting to provide two or more series of Y values that share the same series of X values. Is this approach generic enough to support that?

@KarthikRIyer
Owner Author

I'll try working with the Pair<> approach.

Regarding the scatter plot, the input is given as an array of x values and an array of y values through the addSeries function.
So if we need the same x values with different y values, we can just add another series, passing in the same x array.
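A minimal sketch of the shared-x case, assuming a simplified stand-in for the plot type (Series, ScatterPlot and this addSeries signature are hypothetical, not the library's exact API):

```swift
// Hypothetical stand-in: each series stores its own x and y arrays.
struct Series {
    var x: [Float]
    var y: [Float]
    var label: String
}

struct ScatterPlot {
    private(set) var series: [Series] = []

    // Two series may pass the same x array.
    mutating func addSeries(_ x: [Float], _ y: [Float], label: String) {
        precondition(x.count == y.count, "x and y must have the same length")
        series.append(Series(x: x, y: y, label: label))
    }
}

let days: [Float] = [1, 2, 3, 4]
var plot = ScatterPlot()
plot.addSeries(days, [20, 22, 19, 24], label: "Max temp")
plot.addSeries(days, [11, 12, 10, 14], label: "Min temp")   // same x values
```

Since Swift arrays are copy-on-write, passing the same days array to both series does not duplicate its storage unless one of the copies is later mutated.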

@KarthikRIyer
Owner Author

@BradLarson just one clarification...On the user end API do we want the users to pass in arrays as it is being done now, or would they pass in an array of Pair<Float,Float>?
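Either API shape can be converted to the other cheaply, so the choice is mostly about ergonomics. A minimal sketch using the standard library's zip; Pair, makePairs and unzipPairs are hypothetical names:

```swift
// Hypothetical 2-D datum, as proposed earlier in the thread.
struct Pair<T, U> { var first: T; var second: U }

// Accept parallel arrays and build pairs internally...
func makePairs<T, U>(_ xs: [T], _ ys: [U]) -> [Pair<T, U>] {
    // zip stops at the shorter array, so extra elements are dropped.
    return zip(xs, ys).map { Pair(first: $0, second: $1) }
}

// ...or accept pairs and split them back into parallel arrays.
func unzipPairs<T, U>(_ pairs: [Pair<T, U>]) -> ([T], [U]) {
    return (pairs.map { $0.first }, pairs.map { $0.second })
}
```

One design point: zip silently truncates to the shorter array, so an API that accepts parallel arrays should probably validate their lengths up front.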

@KarthikRIyer
Owner Author

Leaving this open for now. Just in case we need to make changes to this part in the near future.

@karwa
Contributor

karwa commented Dec 10, 2019

I have some things to add to this discussion (which is very old, I know).

Over the last week or so, I added a heatmap (example output). While doing so, I started to wonder about a couple of things:

  • It would be great to accept generic (user-provided) sequences/collections. For 2D data like heatmaps, you might have quite large datasets. It would be nice if we could graph them without having to copy all of that into an Array. I was a little disappointed that I couldn't directly make a bar chart out of a Range.
  • Actually, not copying the input data is quite good in principle. For things like Heatmaps and bar-charts, we just need to compute some scale information and the coordinates can be calculated on-demand in the drawing function.
  • Lots of these charts are very broadly-applicable. Why can't I plot a heatmap of my custom data-type? All a heatmap really needs is a way to find the minimum and maximum elements, and where each other element lies on that scale between them.

So given all of that, I made this Heatmap work with any Sequence where Element: Sequence (i.e. a 2D sequence). To handle the actual data we require for the plot, I created an adapter type (I called it Interpolator for lack of a better name... perhaps Mapping would be more appropriate?), erasing the required calls inside a couple of closures, with some convenience methods for KeyPaths and for splitting 1-dimensional collections into an array of slices, thereby making them 2-dimensional and able to be heat-mapped.
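A minimal sketch of what such an adapter could look like, assuming the interpolator only needs an ordering (to find the extremes) and a way to place a value between them. Interpolator, linear, via and heatmapFractions are illustrative names, not the actual SwiftPlot API; the function also assumes the input can be iterated twice, which holds for any Collection but not for every Sequence.

```swift
// Hypothetical adapter: erases "how do I order/measure an element" into closures.
struct Interpolator<Element> {
    var isLessThan: (Element, Element) -> Bool
    // Position of `value` between `min` and `max`, in 0...1.
    var fraction: (_ value: Element, _ min: Element, _ max: Element) -> Float
}

extension Interpolator where Element: BinaryFloatingPoint {
    static var linear: Interpolator {
        Interpolator(isLessThan: <, fraction: { value, lo, hi in
            hi == lo ? 0 : Float((value - lo) / (hi - lo))
        })
    }
}

extension Interpolator {
    // Convenience: interpolate any element by a floating-point property.
    static func via<Value: BinaryFloatingPoint>(_ path: KeyPath<Element, Value>) -> Interpolator {
        let base = Interpolator<Value>.linear
        return Interpolator(
            isLessThan: { base.isLessThan($0[keyPath: path], $1[keyPath: path]) },
            fraction: { base.fraction($0[keyPath: path], $1[keyPath: path], $2[keyPath: path]) })
    }
}

// Hypothetical heatmap core: works on any 2-D sequence without copying it first.
func heatmapFractions<Rows: Sequence>(
    _ rows: Rows, interpolator: Interpolator<Rows.Element.Element>
) -> [[Float]] where Rows.Element: Sequence {
    // Pass 1: find the minimum and maximum elements.
    var extremes: (min: Rows.Element.Element, max: Rows.Element.Element)?
    for row in rows {
        for value in row {
            if let current = extremes {
                if interpolator.isLessThan(value, current.min) { extremes?.min = value }
                if interpolator.isLessThan(current.max, value) { extremes?.max = value }
            } else {
                extremes = (value, value)
            }
        }
    }
    guard let bounds = extremes else { return [] }
    // Pass 2: where does each element lie between them?
    return rows.map { row in row.map { interpolator.fraction($0, bounds.min, bounds.max) } }
}

let temps: [[Float]] = [[0, 5], [10, 20]]
let cells = heatmapFractions(temps, interpolator: .linear)

struct Reading { var celsius: Double }
let readings = [[Reading(celsius: 10)], [Reading(celsius: 30)]]
let viaKeyPath = heatmapFractions(readings, interpolator: .via(\Reading.celsius))
```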

The upshot of this is that I can draw a heatmap of a String (by its ASCII values). And rather than copying all the ASCII values or losing that type information in another way, the type is literally a Heatmap<[Substring]> (not a Heatmap<UInt8>). Note that Substring's Element type is Character, which isn't even a numeric type. The idea is that by preserving that type information, it's easier to manipulate that data again - for example, I could take a pass over that array and remove Substrings which contain a particular character, using Substring's own APIs rather than comparing ASCII values.
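A minimal sketch of the String case, assuming a hypothetical rows(ofWidth:) helper that slices any Collection into consecutive SubSequences. For a String the result is [Substring], so each row keeps the full String API while its Characters are mapped to ASCII values only for display:

```swift
extension Collection {
    // Hypothetical helper: split into consecutive slices of `width` elements,
    // turning 1-D data into 2-D rows without copying the elements.
    func rows(ofWidth width: Int) -> [SubSequence] {
        precondition(width > 0, "width must be positive")
        var result: [SubSequence] = []
        var start = startIndex
        while start != endIndex {
            let end = index(start, offsetBy: width, limitedBy: endIndex) ?? endIndex
            result.append(self[start..<end])
            start = end
        }
        return result
    }
}

let text = "Hello, heatmaps!"
let rows = text.rows(ofWidth: 4)   // [Substring]: "Hell", "o, h", "eatm", "aps!"

// Display values: each Character's ASCII value (0 if it isn't ASCII).
let cellValues = rows.map { row in row.map { char in Float(char.asciiValue ?? 0) } }

// The rows are still Substrings, so String APIs work on them directly.
let withoutCommas = rows.filter { !$0.contains(",") }
```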

Anyway, buoyed by this apparent success with Heatmaps, I tried to see if this could apply to other plots by prototyping a new BarGraph. It works in very much the same way, with any Sequence, and you provide an adapter (which may be a KeyPath) which tells us how to measure the distance to some "origin" element that you provide. So yes, you can now finally make a bar-graph out of a Range<Int>, or even an Array<Student>!
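A minimal sketch of a bar graph that stores the caller's sequence unchanged and measures each element against an origin through an adapter closure. BarGraph, barLengths, Student and the initializers are hypothetical; bar lengths are computed on demand rather than copied into a [Float] up front.

```swift
// Hypothetical bar graph over any Sequence, keeping the caller's type.
struct BarGraph<Data: Sequence> {
    var data: Data
    var origin: Data.Element
    // Adapter: signed distance from the origin to an element.
    var distance: (_ from: Data.Element, _ to: Data.Element) -> Float

    // Computed on demand instead of being stored.
    var barLengths: [Float] { data.map { distance(origin, $0) } }
}

extension BarGraph where Data.Element: BinaryInteger {
    init(_ data: Data, origin: Data.Element = 0) {
        self.init(data: data, origin: origin, distance: { Float($1) - Float($0) })
    }
}

extension BarGraph {
    // KeyPath convenience: measure elements by a floating-point property.
    init<Value: BinaryFloatingPoint>(_ data: Data, origin: Data.Element,
                                     measuring path: KeyPath<Data.Element, Value>) {
        self.init(data: data, origin: origin,
                  distance: { Float($1[keyPath: path] - $0[keyPath: path]) })
    }
}

struct Student { var name: String; var score: Double }

let rangeGraph = BarGraph(5..<9)   // BarGraph<Range<Int>>, no Array copy
let students = [Student(name: "Ann", score: 72), Student(name: "Raj", score: 85)]
let studentGraph = BarGraph(students, origin: Student(name: "", score: 50),
                            measuring: \.score)
```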

Then came stacking, and I thought: it would be kind of boring if I could only stack other ranges on top of my range. Why not make the stack its own plot, with its own Sequence and adapter, which wraps a bar graph and handles measuring and drawing its own segment? That worked surprisingly easily, and fixed a couple of bugs in the existing stacking implementation along the way. It results in an ugly, recursive generic type, but in exchange you can dig in anywhere in the hierarchy (via .parent), or jump straight to the root with .bargraph, and it preserves the full type information of those sequences.
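A minimal sketch of how such a recursive generic stack can be typed, assuming each layer exposes its parent and a computed path to the root bar graph. BarGraphLayer, BarGraph, StackedBarGraph and stackedWith(_:) here are simplified, hypothetical versions that only carry data:

```swift
// Hypothetical protocol shared by the root graph and every stacked layer.
protocol BarGraphLayer {
    associatedtype RootData: Sequence
    // The bottom-most bar graph in the stack.
    var bargraph: BarGraph<RootData> { get }
}

struct BarGraph<Data: Sequence>: BarGraphLayer {
    typealias RootData = Data
    var data: Data
    var bargraph: BarGraph<Data> { self }
}

struct StackedBarGraph<Parent: BarGraphLayer, Segment: Sequence>: BarGraphLayer {
    typealias RootData = Parent.RootData
    var parent: Parent     // the layer underneath, fully typed
    var segment: Segment   // this layer's own data, any Sequence type
    var bargraph: BarGraph<Parent.RootData> { parent.bargraph }
}

extension BarGraphLayer {
    // Each call wraps `self`, adding one level to the generic type.
    func stackedWith<S: Sequence>(_ segment: S) -> StackedBarGraph<Self, S> {
        StackedBarGraph(parent: self, segment: segment)
    }
}

let stack = BarGraph(data: 5..<20).stackedWith(0..<15).stackedWith([1.5, 2.5])
// StackedBarGraph<StackedBarGraph<BarGraph<Range<Int>>, Range<Int>>, Array<Double>>
```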

So yeah, those ugly names. They were a bit of a problem. For example, I have a test that creates a StackedBarGraph<StackedBarGraph<StackedBarGraph<BarGraph<[Int]>, Data>, [Float]>, LazyMapSequence<(ClosedRange<Int>), MyStruct>>.

That issue can be solved by moving to more of a functional, declarative API. Luckily, that seems to be something of a trend these days, with things like SwiftUI (which also has to deal with enormous generic towers and solves it in a similar way). The resulting code does look very pretty:

let barGraph = (5..<20).plots
    .barChart() {
        $0.label = "Existing product"
        $0.color = .orange
        $0.formatter = .custom { String(2000 + $1) }

        $0.graphOrientation = .horizontal
        $0.plotTitle.title = "Financial Results"
        $0.plotLabel.xLabel = "Profit ($m)"
        $0.plotLabel.yLabel = "Year"
    }
    .stackedWith(0..<15) {
        $0.segmentLabel = "New product"
        $0.segmentColor = .green
    }
    .stackedWith(-10..<1) {
        $0.segmentLabel = "Bad product"
        $0.segmentColor = .red
    }

This creates a StackedBarGraph<StackedBarGraph<BarGraph<(Range<Int>)>, Range<Int>>, Range<Int>>, which is much less pretty - but luckily you never had to write that and the results end up looking like this.

That heatmap I showed at the start was generated with the following:

let data: [[Float]] = median_daily_temp_boston_2012
let heatmap = data.plots
    .heatmap(interpolator: .linear) {
        $0.plotTitle.title = "Maximum daily temperatures in Boston, 2012"
        $0.plotLabel.xLabel = "Day of the Month"
        $0.colorMap = ColorMap.fiveColorHeatMap.lightened(by: 0.35)
        $0.showGrid = true
        $0.grid.color = Color.gray.withAlpha(0.65)
    }

Anyway, I thought it would be a good idea to discuss this before moving much further. There is still some work to do and refinements to be made, but I think these are some pretty interesting results.

This will all come after GCI, of course (or maybe after the current issues are done?), to avoid disturbing participants.

@KarthikRIyer
Owner Author

@karwa the above results are really cool! I don't fully understand how all of this works, though; I'm not that proficient in Swift yet.
Could you give me an idea of how this part works?

(5..<20).plots
    .barChart()

@karwa
Contributor

karwa commented Dec 10, 2019

@KarthikRIyer Sure - so .plots is a namespace struct. Every Sequence has one, and it just wraps that sequence. It's generic, so we don't lose the underlying sequence type.

Then, for heatmaps for example, we extend SequencePlots and add construction shorthands. For a 2D sequence, we can just make a heatmap like that. If SequencePlots.Base (the original sequence type) is a Collection or RandomAccessCollection, we can add convenient shorthands to slice it and make a 2D dataset. BarChart does the same thing, extending SequencePlots and offering convenient shorthands.

I found this useful because autocomplete will show all of the available charts for your data, and type inference will pick up the correct type. Extending fundamental protocols like Sequence or Collection can be awkward because those functions appear everywhere, but by putting everything in a namespace struct we avoid annoying users and help them discover what they can do with swiftplot.

Oh, and all of those construction helpers end with a style closure, which gives the chart type as inout and allows you to set it up before it gets returned and you chain it with other things.
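A minimal sketch of the namespace pattern described above: a single plots property on Sequence, chart constructors in constrained extensions of the wrapper, and a style closure that receives the chart as inout. SequencePlots, BarChart and rows(ofWidth:) are simplified, hypothetical versions:

```swift
// Hypothetical namespace: wraps the sequence and keeps its exact type.
struct SequencePlots<Base: Sequence> {
    let base: Base
}

extension Sequence {
    // The only member added to every Sequence.
    var plots: SequencePlots<Self> { SequencePlots(base: self) }
}

struct BarChart<Data: Sequence> {
    var data: Data
    var label = ""
    var isHorizontal = false
}

// Constructors appear only for data they make sense for.
extension SequencePlots where Base.Element: BinaryInteger {
    // Construction shorthand ending in a style closure that gets the chart as inout.
    func barChart(style: (inout BarChart<Base>) -> Void = { _ in }) -> BarChart<Base> {
        var chart = BarChart(data: base)
        style(&chart)
        return chart
    }
}

// Slicing shorthand when the base is a RandomAccessCollection (e.g. for heatmaps).
extension SequencePlots where Base: RandomAccessCollection {
    func rows(ofWidth width: Int) -> [Base.SubSequence] {
        precondition(width > 0, "width must be positive")
        return stride(from: 0, to: base.count, by: width).map { offset in
            let start = base.index(base.startIndex, offsetBy: offset)
            let end = base.index(start, offsetBy: min(width, base.count - offset))
            return base[start..<end]
        }
    }
}

let chart = (5..<20).plots.barChart {
    $0.label = "Existing product"
    $0.isHorizontal = true
}
```

Because barChart is declared on SequencePlots rather than on Sequence, it only shows up in autocomplete after typing .plots.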

@KarthikRIyer
Owner Author

Thanks for the explanation @karwa!

4 participants