Mozlandia gfx

sanxiyn edited this page Dec 8, 2014 · 3 revisions

Servo + gfx discussion

  • Current plan: Use Moz2d + SkiaGL
  • What is causing the huge work backlog for the GFX team? Is it the multiple Moz2d backends across the various hardware, or is it something we will hit even if we stick with just SkiaGL?
  • Long-term, should we be building our own rasterization library?

Azure/Moz2d

  • jack: Do we get benefit from using azure with one backend?
  • gfx: no.
  • jack: We were thinking of using azure to let us work on Windows.
  • gfx: Just use Skia on Windows.
  • nical: Not sure about direct-mode rendering

SkiaGL

  • jgilbert: Not sure you're getting a lot of benefit from SkiaGL. With OpenGL, you're not going to be able to do parallel rendering.
  • zwarich: Yeah, have to use something other than OpenGL, because the driver takes a global lock.
  • jack: We have problems with Skia creating way too many resources, particularly with GL rendering.
  • jgilbert: Try to drive CPU rendering as far as you can.
  • gfx: Being Skia-only is not a bad place; I wish gecko was there.
  • nical: Trying to move away from immediate-mode APIs. Retained-mode might be nice to avoid streaming geometry all the time. Be able to do more batching to reduce the number of draw calls. But, can't do it with an API that doesn't have a scene graph. We're having this discussion now, and it won't happen soon, but hopefully eventually. Instead of a backend at the drawtarget boundary, have a displaylist/graph and pass that to an optimizing backend. One of the backends would probably be Azure inside for doing Canvas (since it can't use a scene graph anyway). If you want to go crazy & be in the best place, about half of the gfx team would advise you to create a retained mode API.
  • jack: Not just display list, but also...
  • jgilbert: A scene graph out of layers content.
  • nical: Go through the display list. Build up batches. SkiaGL uses a huge amount of memory because it has this gigantic texture cache because it doesn't know what it will have to use the next frame. So you're using all sorts of heuristics to avoid using SkiaGL when it will use too much memory. But, that's a lot of people + engineer time. So I'd understand if you want to just stick with Skia.
  • zwarich: One of the reasons you want to do it is incremental update. Seemed from roc's notes that he wanted to go from a graph of layers to a unified scene graph for the 2D layer content and then the hierarchy. With incremental updates, doing them on the level of layers / stacking contexts is straightforward. But, if it's the 100x200 px rectangle, the incremental patching of display lists has been, in practice, less efficient than the WebKit approach of just updating the 100x200px buffer: traverse the frame tree with aggressive clipping in place. That's also much simpler than building a whole scene graph. How do we know if the scene graph approach is a good one?
  • nical: Well, nobody's tried implementing a whole browser using a scene graph. NVIDIA has a demo SVG rendering engine using a scene graph.
  • gfx: The whole page is laid out and rendered on the GPU.
  • jack: nvpath?
  • nical: Yeah. It's hard to have a benchmark for Skia vs. scene graph to see the difference.
  • zwarich: It would just be nice to come up with an idea of a benchmark that we think the scene graph will be better on so that we have a dummy check against an optimized retained mode implementation. Easy to get tied in the idea of creating a new library and get lost in the weeds.
  • nical: Something with lots of restyling and page changes? Something where sharing the layer with the compositor is going to be very expensive and chatty. Browsers aren't that different from games except for transparency / anti-aliasing. Otherwise, not that different, and game engines have done their 2d rendering very differently. Plus, the more we do things differently than game engines, the more we hit driver bugs. Off main thread compositing on Windows is not very stable, but we released it. Anything we do that is more gamey is likely to hit fewer driver bugs. The Google guys put all their rendering in one process, and pay overhead for proxying their OpenGL stuff to it, but with all the rendering in one place, there are fewer issues like surface sharing where we run into...
  • jgilbert: Dependency tracking is a huge pain point.
  • nical: We have a huge lock for synchronizing our textures. And it turns out that on some drivers, we end up with racing during compositing.
  • zwarich: So you don't even get a guarantee that submit order is respected?
  • jgilbert: Definitely not guaranteed.
  • nical: Only window managers use them, and window managers are much simpler. As soon as we don't do exactly what they do, drivers break.
  • jgilbert: It's also hard to falsify. Because a lot of the time, you can drop all the locks and the OpenGL will still work!
  • jack: i.e., Servo has no synchronization and it basically works... we're just getting lucky.
  • zwarich: Also, OSX gives you a submit order guarantee.
  • jgilbert: That flush primitive is great, and I wish we had it elsewhere!
  • zwarich: Tradeoff between batching and immediate mode is due to overhead from state changes in OpenGL drivers? Would being on Mantle/DX12/etc. be better?
  • jgilbert: My sense is not the state changes, but all of the synchronous upload stuff.
  • gfx: Modern GPU APIs let you do streaming APIs.
  • pcwalton: We do it well on the Mac, but don't know about others.
  • jgilbert: Lock an IOSurface and just do it?
  • pcwalton: Yeah. Was easy on Mac; for Android I spent a month and never really got it working well.
  • zwarich: Most of the machines we're targeting in the next five years will have shared-memory graphics on mobile (sometimes programmer-managed coherence, but increasingly automatic) with a similar situation on Intel integrated graphics.
  • pcwalton: You can on Android, but have to be root!
  • nical: Was just added recently. SurfaceTexture.
  • pcwalton: Chromium doesn't use it, though...
  • gfx: Android browser does.
  • nical: This also depends on if you want to render on the CPU and upload textures or render everything on the GPU. If you think you can render everything on the GPU, it's a different problem. That's all about setting up buffers, states, uploading patches, trashing those caches as little as possible. The scenegraph thing is that if you have the info about more than just a rectangle, you can be better about updating those states. Immediate mode backends with a GL backend throw away a lot of what they could reuse because they don't know. They don't know that stuff from previous frames can be reused...
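
The "aggressive clipping" repaint that zwarich describes for a small dirty region can be sketched roughly as below. This is only an illustration under stated assumptions: `Rect`, `DisplayItem`, and `items_to_repaint` are hypothetical names invented for the sketch, not Servo, WebKit, or Skia APIs, and a real engine would walk the frame tree rather than a flat slice.

```rust
/// Axis-aligned rectangle in device pixels (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x: i32,
    y: i32,
    w: i32,
    h: i32,
}

impl Rect {
    /// Returns the overlapping region of two rectangles, if there is one.
    fn intersection(&self, other: &Rect) -> Option<Rect> {
        let x0 = self.x.max(other.x);
        let y0 = self.y.max(other.y);
        let x1 = (self.x + self.w).min(other.x + other.w);
        let y1 = (self.y + self.h).min(other.y + other.h);
        if x1 > x0 && y1 > y0 {
            Some(Rect { x: x0, y: y0, w: x1 - x0, h: y1 - y0 })
        } else {
            None
        }
    }
}

/// One entry of a flat display list (hypothetical; real items carry much more).
struct DisplayItem {
    name: &'static str,
    bounds: Rect,
}

/// Returns, in paint order, the items that touch the dirty rectangle,
/// each paired with the clip to apply while redrawing it.
/// Items entirely outside the dirty rectangle are skipped.
fn items_to_repaint<'a>(
    list: &'a [DisplayItem],
    dirty: &Rect,
) -> Vec<(&'a DisplayItem, Rect)> {
    list.iter()
        .filter_map(|item| item.bounds.intersection(dirty).map(|clip| (item, clip)))
        .collect()
}
```

With a 100x200 px dirty rectangle, only the items whose bounds intersect it are revisited, which is the cost model the scene graph approach would have to beat in a benchmark.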

Planning

  • pcwalton: Most interested in how we get from where we are today to where we'd like to be tomorrow incrementally, without a full stack rewrite. Maybe lots of new kinds of layers until it becomes a scene graph? e.g., textlayer, cssborderlayer, etc. Right now, we don't have lots of displaylist types. Could imagine having layers for each of them, and then you're a scenegraph.
  • nical: For each layer, want to have info about the surrounding layers with minimal draw calls.
  • gw: When I took Nokia's 2D immediate-mode renderer and made it batched, I took their scene graph, flattened it into a list, worked out the z-order/transparency, and then in multi-thread categorized by rectangle overlaps. Took us down by a factor of five or so on the rendering time. Looking at what we have in Servo today, we could do that and it'd be a big win. But I'm not sure what we're not doing in Servo yet (especially more primitives) that would ruin that approach.
  • pcwalton: In Servo, each frame has a stacking context that may or may not have a layer. If you don't have a layer, you're rendered with your parent. If you do, then you get your own. Each stacking context has a flat list of display items. Each display item has bounds, cliprect, and other display item stuff. A stacking context can also have an opacity and a transform (no transforms on display items). Tricky thing is multiple flat lists because of stacking contexts, which has to happen because of clips/transforms.
  • gw: That looks almost identical to the original Nokia renderer I worked on and we got lots of wins on it. The question is, are there massive item counts or things we don't handle yet?
  • pcwalton: SVG.
  • nical: It's complicated, but if we end up with a slower SVG renderer and faster web content, that's a big win.
  • pcwalton: Definitely cutting SVG content corners. I'd love to polyfill with a canvas backend :-)
  • gfx: It's becoming more common...
  • pcwalton: But canvas is where we are putting our optimization effort.
  • zwarich: With global scene graphs (flattened or not), you are doing way more work every time you scroll on the GPU than when you're just scrolling flat tiles. When you think of low power, you don't think "game engine."
  • jgilbert: You could do the rendering, build a flat layer, build the tiles, and treat the tiles as a cache. So do the scene graph, but keep the layers.
  • nical: Definitely, not saying we shouldn't retain layers.
  • zwarich: Just mentioning we might end up burning a lot more power if we work like a game engine.
  • nical: Also, if you're doing it more in the way that the GPU expects, you might do way better. Especially with GPU rendering.
  • zwarich: I'm assuming you'd do GPU rasterization.
  • pcwalton: If you're doing retained layers, you're sort of recreating the same system we have now. You have to retain them somewhere; you have to have tiling... and then you have layers again!
  • nical: One thing that differentiates native and web platforms is starting an animation fast. Right now, starting a border-radius animation is slow because we have to go through a bunch of architectural stuff to kick it off.
  • gfx: Does that work poorly in IE?
  • pcwalton: Start wanting to do incremental display list construction, which is really hard...
  • nical: With DLBI, we have the infra to get a list of rendering items that are incrementally updated. Hardest part aside from the rendering code itself. Need to talk with roc; he thinks it's doable.
  • pcwalton: I'll have to talk with roc. You can retain at the level of stacking contexts easily.
  • zwarich: That's basically our current model... but it doesn't give much incrementality.
  • pcwalton: Yeah, that's the nut that's hard to crack. Maybe use DLBI to prove that two display items are identical and then avoid creating the display items?
  • gfx: I think that's what roc was thinking.
  • zwarich: Also one slightly different one for Servo. Otherwise, could walk the tiles; grab a lock on the frame tree, and emit the frames. But in Servo, we try to not lock the frame tree - we build a display list and try to render that.
  • pcwalton: In the traditional model, we'd just have the display items render immediately. Instead of creating them, perform the operation...
  • jack: But layout and rendering are in separate tasks!
  • pcwalton: Have to merge them. Oh, and you have to respect painting order, so walk the tree multiple times.
  • nical: Need roc around to talk more about stacking contexts and rendering orders.
  • zwarich: Servo's default for parallelism is different from other browsers', which requires us to already have layout produce some retained-mode result. Ones that aren't built that way can cheat.
  • nical: Having the retained thing outside of layout does make it easier to experiment with a scene graph, since you already have it.
  • zwarich: Definitely would be easier to experiment with it here than WebKit or Gecko. We have less stuff and we already have the separate items.
  • nical: We should have looked more carefully at what we do that's different from what normal applications do with graphics hardware.
  • jgilbert: We used to do some external combination testing to get some light coverage of strange configurations.
  • nical: Yeah, it's just hard to know because it's easy to break 10% of your Windows users since we don't have the hardware to test on. Should keep in mind: are we adding something that's a new trick that is unlikely to work on every version of hardware that every user has? Or is it something that everybody does? Surface sharing is not used as heavily in other apps as Firefox, so it breaks.
  • pcwalton: Yeah, we gave up on surface sharing on linux. We need Wayland.
  • jgilbert: But, you're gonna be stuck with Mir :-(
  • gfx: It is sort of interesting about how having sane graphics APIs help with this problem.
  • zwarich: It's not clear if our question should be what should the right web graphics platform be for writing a browser? Or is it how do we engineer a graphics platform that works for the widest range of hardware. Clearly not on a single piece of Android hardware; but also don't want to support every Linux/WinXP configuration.
  • gfx: Lots of the retained mode stuff is because the draw calls are expensive. If you're on hardware that can do lots of draw calls, all this work on the CPU is saving you nothing because you didn't need to batch.
  • zwarich: Most of the time when CPU rendering shows up faster than GPU rendering, it leads me to suspect that there's a ton of CPU overhead calling into the GPU...
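
The approach gw describes (flatten the tree into one list, keep the z-order, then group by rectangle overlap) could look roughly like the following. It is a minimal single-threaded sketch under stated assumptions, not the Nokia renderer or Servo code: `StackingContext`, `Item`, `Kind`, `flatten`, and `batch_items` are hypothetical names, `Kind` stands in for whatever GPU state decides that two items can share a draw call, and the flattening order is a simplification of real CSS painting order.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x: i32,
    y: i32,
    w: i32,
    h: i32,
}

impl Rect {
    /// True when the two rectangles share at least one pixel.
    fn overlaps(&self, o: &Rect) -> bool {
        self.x < o.x + o.w && o.x < self.x + self.w
            && self.y < o.y + o.h && o.y < self.y + self.h
    }
}

/// Hypothetical state key: items of the same kind can share one draw call.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    SolidColor,
    Text,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Item {
    kind: Kind,
    bounds: Rect,
}

/// A stacking context: a flat list of display items plus child contexts.
struct StackingContext {
    items: Vec<Item>,
    children: Vec<StackingContext>,
}

/// Flattens the tree into one back-to-front list.
/// Assumption: own items first, then children in order.
fn flatten(ctx: &StackingContext, out: &mut Vec<Item>) {
    out.extend_from_slice(&ctx.items);
    for child in &ctx.children {
        flatten(child, out);
    }
}

struct Batch {
    kind: Kind,
    items: Vec<Item>,
}

/// Groups items into batches (one draw call each). An item may join an
/// earlier batch of the same kind only if it overlaps nothing in the
/// batches painted after that one, so overlapping items keep their order.
fn batch_items(items: &[Item]) -> Vec<Batch> {
    let mut batches: Vec<Batch> = Vec::new();
    for item in items {
        let mut target = None;
        // Walk from the most recent batch backwards.
        for i in (0..batches.len()).rev() {
            if batches[i].kind == item.kind {
                target = Some(i);
                break;
            }
            if batches[i].items.iter().any(|b| b.bounds.overlaps(&item.bounds)) {
                break; // must be painted after this batch
            }
        }
        match target {
            Some(i) => batches[i].items.push(*item),
            None => batches.push(Batch { kind: item.kind, items: vec![*item] }),
        }
    }
    batches
}
```

The fewer overlaps between items of different kinds, the fewer batches come out. Whether primitives Servo does not handle yet would break this grouping is the open question gw raises.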

What's causing the gfx backlog and how do we avoid it?

  • nical: Driver bugs. Five years ago, FF was a single event loop; everything on the CPU; everything we need now to go to a great place we didn't think we needed years ago. Plumbing the infrastructure for this is hard. Just off main thread compositing is a ton of work in Gecko! All of what I've been doing has just been getting OMTC features on. It's not hard for you guys because you started with it.
  • jgilbert: Our "legacy" is that everything was done on the CPU. You can always just re-render to the same thing; refcount everything and it works; etc. Now have to have crazy cross-process refcounted surfaces...
  • pcwalton: Everybody's had to deal with this - WebKit; Blink.
  • nical: Just takes a lot of time.
  • pcwalton: A lot of the Servo pipeline is just based on the Android Fennec reboot graphics API strategy.
  • nical: Another thing is that the layer stuff had to be refactored... but it turned into a six month rewrite.
  • pcwalton: My fault!
  • nical: Fennec made everything OpenGL. Then B2G was big and made everything OpenGL. Then, suddenly needed Windows layers, and we had to replace a ton of OpenGL stuff by D3D. That took like another 6 months.
  • pcwalton: When we go to Windows, should we do ANGLE or go to layers?
  • jgilbert: D3D is really easy.
  • pcwalton: Already using Azure, so can just use D3D backend.
  • nical: Don't use azure for compositing.
  • pcwalton: We don't.
  • nical: Good, don't use Azure for it.
  • pcwalton: We have rust-layers that we use for compositing. Just does layers and tiled layers and has an azure backend.
  • gw: Only a couple hundred lines of code; easy to port.
  • pcwalton: CSS 3d transforms might hurt; having to do something with depth... the shooter demo runs badly on Chrome and Firefox but great on Safari; I think they just kick it to CoreAnimation.
  • gfx: Depth doesn't work because of transparency. Have to get the ordering right. So unless you're sorting
  • gfx2: Need order independent transforms.
  • nical: Looking at the video game approaches, they do crazy things we don't want to...
  • pcwalton: Not super concerned about 3d transforms for building shooters, but want to handle things correctly and be simpler. Simple tends to help because static layerization is much easier than frame layerization, which has some terrible performance cliffs that are hard to understand.
  • nical: Gotta be flexible in how you layerize. Most of the perf work on B2G for interactivity was getting applications to have exactly the right layer tree.
  • gfx: Much easier as a web dev if we have static layerization. Know from CSS what will vs. what will not be a layer. In gecko, you have to hope everybody does the right things...
  • jack: is there inspection for layer building? Can web devs see it?
  • nical: Layer borders.
  • gfx: Layer scope, too.
  • jgilbert: Layerization platform-independent?
  • gfx: No? Mostly no. Scroll bars are special.
  • pcwalton: At least on Servo we can always do overlay scrollbars, because we can't really use native widgets on Servo anyway.
  • nical: Not sure how static layerization will work out, but you will definitely have to be able to tune it well for hardware like B2G. Still have some bad things where we have to tell the Gaia people that if they have a background and text there they should put the color on the div with the text rather than the background. Easy for us to tell Gaia devs, but it's really bad for external devs.
  • jgilbert: Cases like a background image with text scrolling on top of it are unusual...
  • gfx: But it's the natural way to write these things with CSS!
  • nical: It's hard to tell web developers that they should do these weird things...
  • jgilbert: To be fair, people will fix it if you tell them!
  • pcwalton: Not really; animating top/left/right on position:absolute or margins in general are just terrible ways.
  • nical: Common cases are sometimes not that straightforward.
  • zwarich: Lots of horrific mobile layout problems come from bad layerization problems. Lots of your memory is spent on graphics, and it's easy to accidentally double your memory usage.
  • nical: Not even counting drivers that keep a second copy of the texture, like Intel.
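
To make the memory point concrete, here is a back-of-the-envelope sketch of how layer count drives graphics memory. The 4 bytes per pixel (RGBA8), the 1080x1920 screen size used below, and the `driver_shadow_copy` flag are assumptions chosen for illustration; the function names are hypothetical.

```rust
/// Bytes needed for one RGBA8 layer surface (assumes 4 bytes per pixel).
fn layer_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

/// Total bytes for a set of (width, height) layers. `driver_shadow_copy`
/// models a driver that keeps a second copy of every texture.
fn total_layer_bytes(layers: &[(u64, u64)], driver_shadow_copy: bool) -> u64 {
    let sum: u64 = layers.iter().map(|&(w, h)| layer_bytes(w, h)).sum();
    if driver_shadow_copy { sum * 2 } else { sum }
}
```

Under these assumptions one full-screen 1080x1920 layer costs 8,294,400 bytes (about 7.9 MiB). Accidentally promoting a second full-screen layer doubles that to 16,588,800 bytes, and a driver-side copy doubles it again to 33,177,600 bytes.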

console-style APIs

  • zwarich: Lots of things are in flux - Metal, etc. Should we be abstracting over all those instead?
  • nical: If you subscribe to the Khronos API with your Mozilla address, you can look at the GL next API discussions.