
Issue #22 - Initial support to render paths

  • To start the simple test app:
    nvp wgpu_path

I will use the library https://github.com/styluslabs/nanovgXC as a reference.

We should be able to define a “GUI” by combining different pieces. But in the end a GUI is something that we want to render, so we should really end with a render pass. That doesn't mean we won't use compute passes before it to perform some of the preparation work.

In any case, we should start with a component representing the GUI we want to render.

⇒ First I should build the example code available in the nanovgXC library.

The library comes with a makefile.msvc so let's try to build from the VS prompt:

Open x64 Native Tools Command Prompt then:

cd D:\Projects\nanovgXC
nmake /F Makefile.msvc

Microsoft (R) Program Maintenance Utility Version 14.34.31933.0
Copyright (C) Microsoft Corporation.  All rights reserved.

Makefile.msvc(4) : fatal error U1033: syntax error : '=' unexpected
Stop.

Alright, this just doesn't work 😉. Never mind. Let's study what we should do.

First I think we need something similar to nvgBeginFrame() but as a class constructor.

We could call that class a Frame, but that sounds a bit too confusing. So let's go instead with GUIFrame.

So we should be able to create a GUIFrame object as follows:

GUIFrame frame({800, 600});

Except that, generally speaking, we probably want to keep our GUI frames as referenced objects: these are elements that may change dynamically depending on user interactions.

So let's go instead with:

auto f = GUIFrame::create({800,600});
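A minimal sketch of what this factory could look like, assuming a std::shared_ptr-based handle (the Ptr alias, Size struct, and class layout here are hypothetical, just to illustrate the referenced-object idea):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical sketch: a GUIFrame handled through shared ownership,
// created with a static factory instead of a plain constructor.
class GUIFrame {
public:
    struct Size { uint32_t width, height; };
    using Ptr = std::shared_ptr<GUIFrame>;

    static auto create(Size size) -> Ptr {
        return Ptr(new GUIFrame(size));
    }

    auto size() const -> Size { return _size; }

private:
    explicit GUIFrame(Size size) : _size(size) {}
    Size _size;
};
```

This way `auto f = GUIFrame::create({800, 600});` keeps the frame alive for as long as any other object (e.g. a render node) still references it.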

Note: I should consider using the nanovgXC software backend as reference for implementation:

#define NANOVG_SW_IMPLEMENTATION
#include "nanovg_sw.h"

On a GUIFrame we need support to create paths, so something like this:

    auto f = GUIFrame::create({800, 600});
    (*f).beginPath()
        .moveTo(100, 100)
        .lineTo(200, 100)
        .lineTo(150, 50)
        .closePath()
        .fillColor({255, 192, 0, 255})
        .fill();

Now I'm adding some helper structs in the main GUIFrame class, like Transform, Paint, State, etc.

Note: we can run our simple unit test with:

nvp nvl_test_gpu -t path_rendering

And now I've just stopped on the implementation of the GUIFrame::arcTo() method (which will apparently require an arc() method too).

Update: Still working on the arcTo implementation (added some extensions in the Vec2 class).

When we need to fill a path, for each point of the path we will emit 4 vertices to draw a quad containing the segment ending at this point.

Then, when drawing this quad, on a first pass we will add to the coverage value of each pixel, and on a second pass we will use that coverage to write the color value.

Consider we are processing one vertex.

We have a vertex id, and since we know that we are rendering 4 vertices per point, we can get the point index with:

U32 pointIndex = vertexId/4;

With the point index we can retrieve the point data:

PointData& pdata = points[pointIndex];

Next we get the path data:

PathData& path = paths[pdata.pathIndex];

Next we can get the start and end point coordinates for the current segment:

Vec2f startCoords;
if (pdata.pointIndex == 0) {
    // First point of the path: use the path's final point as start point:
    startCoords = points[pointIndex - pdata.pointIndex + path.numPoints - 1].coords;
} else {
    // Otherwise use the previous point:
    startCoords = points[pointIndex - 1].coords;
}

We also need the path minY value:

F32 minY = path.minY; 

And now we have minY, startCoords, and endCoords (the coords of the current point): with these, we can emit the 4 vertices to draw the enclosing quad.
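The indexing steps above can be put together in a small host-side sketch (PointData, PathData, and the points/paths arrays are assumed layouts here, not the actual engine types):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec2f { float x, y; };

struct PointData {
    Vec2f coords;
    uint32_t pathIndex;   // which path this point belongs to
    uint32_t pointIndex;  // index of this point inside its path
};

struct PathData {
    uint32_t numPoints;
    float minY;
};

struct Segment { Vec2f start, end; float minY; };

// Given a vertex id (4 vertices per point), recover the segment
// start/end coordinates as described above:
Segment segment_for_vertex(uint32_t vertexId,
                           const std::vector<PointData>& points,
                           const std::vector<PathData>& paths) {
    uint32_t pointIndex = vertexId / 4;
    const PointData& pdata = points[pointIndex];
    const PathData& path = paths[pdata.pathIndex];

    Vec2f startCoords;
    if (pdata.pointIndex == 0) {
        // First point of its path: wrap to the path's final point.
        startCoords = points[pointIndex - pdata.pointIndex + path.numPoints - 1].coords;
    } else {
        startCoords = points[pointIndex - 1].coords;
    }
    return {startCoords, pdata.coords, path.minY};
}
```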

Next in the fragment shader we compute the coverage of the fragment. And we add the result to the current pixel coverage.

Now the problem is, this only works as long as the coverage is computed for a single path 🤔…

And I've been trying to turn this around in all possible directions, but I can't find a way to work around this issue: when writing the coverage, we really need to process only one path at a time, or at least paths that do not overlap, which is quite limiting.

Otherwise, for a given pixel, the path below could have full coverage (1.0) while the path on top has only partial coverage (say 0.5), and we would have no way to keep track of both coverages at the same time.

So let's say we have a single render call for a given path: what next?

To render that path, we first need to clear the coverage texture,

Then we compute the coverage,

And finally we render a quad covering the full path, and reading for each fragment the coverage value to decide how we should process each pixel.

Hmmm 🤔… And now with a quick check, I would say that we don't have support for atomic operations on storage textures in WebGPU, which means we will probably need to use a storage buffer instead. That should still work: the default maxStorageBufferBindingSize limit is 134217728 bytes (128 MiB), which still allows an 8192x4096 coverage buffer of single u32 values.
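Quick sanity check on that size math (134217728 is indeed the WebGPU default maxStorageBufferBindingSize):

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one u32 coverage counter per pixel:
constexpr uint64_t coverage_bytes(uint64_t w, uint64_t h) {
    return w * h * sizeof(uint32_t);
}

// WebGPU default maxStorageBufferBindingSize limit:
constexpr uint64_t kMaxStorageBytes = 134217728ull; // 128 MiB

static_assert(coverage_bytes(8192, 4096) == kMaxStorageBytes,
              "an 8k x 4k u32 coverage buffer fits exactly");
```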

crunching… crunching… crunching…

Whhoooo! And now I have my first “path rendering ever”:

Generated from this code:

    auto f = GUIFrame::create({800, 600});
    (*f).beginPath()
        .moveTo(100, 100)
        .lineTo(200, 150)
        .lineTo(240, 100)
        .lineTo(170, 80)
        .lineTo(120, 60)
        // .lineTo(200, 100)
        // .lineTo(150, 50)
        .closePath()
        .fillColor({255, 192, 0, 255})
        .fill();

    rpass.add_bundle(f->get_render_node());

Okay… I agree, that doesn't look like a “Path” at all, but still, this is all I'm asking for at the moment: just displaying a quad from the start to the end of each segment on that path 😉.

Here is the corresponding shader in fact:

#include "base_utils"

struct CanvasParams {
    width: f32,
    height: f32,
};

struct RenderParams {
    innerCol: vec4f,
    outerCol: vec4f,
    bounds: vec4f,
    renderMode: u32,
    segsOffset: u32,
    segsCount: u32,
};

@group(0) @binding(0) var<uniform> canvas : CanvasParams;
@group(0) @binding(1) var<uniform> params : RenderParams;
@group(0) @binding(2) var<storage, read> segments : array<vec4f>;
@group(0) @binding(3) var<storage, read_write> coverage : array<atomic<u32>>;

struct VertexOutput {
    @builtin(position) Position: vec4f,
    @location(0) coords: vec2f,
}

@vertex
fn main(input: StdVsInput) -> VertexOutput {

    // Get the segment Index based on the vertex id (we have 6 vertices per segment)
    var segIdx: u32 = input.vertexID / 6;

    // Get the vertex index for the current segment:
    var uv: vec2f = get_quad_uv(input.vertexID);

    // Get the segment data:
    var seg: vec4f = segments[segIdx];

    // The segment contains the start and the end point for the rendering,
    // But these are given in the canvas frame, so we should manually remap
    // those coords to the clip space (ie [0,width]x[0,height] -> [-1,1] x [-1,1])
    // Note: we use the bottom left point as origin.

    var w: f32 = canvas.width;
    var h: f32 = canvas.height;

    seg = vec4f(-1.0, -1.0, -1.0, -1.0) + seg * vec4f(2.0 / w, 2.0 / h, 2.0 / w, 2.0 / h);

    var pos: vec2f = seg.xy + uv * (seg.zw - seg.xy);

    // Prepare the output structure:
    var output: VertexOutput;
    output.Position = vec4f(pos, 0.0, 1.0);
    output.coords = uv;
    return output;
}

@fragment
fn main_fs(in: VertexOutput) -> @location(0) vec4<f32> {
    var color = vec3f(in.coords, 0.0);
    return vec4f(color, 1.0);
}
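The get_quad_uv() helper comes from base_utils, which isn't shown here; a typical implementation (this one is an assumed equivalent, not the actual engine code) maps the vertex index modulo 6 to the corners of a unit quad built from two triangles:

```cpp
#include <cassert>
#include <cstdint>

struct Vec2 { float x, y; };

// Hypothetical equivalent of the get_quad_uv() shader helper: maps a
// vertex id to a unit-quad corner, two CCW triangles per quad.
Vec2 get_quad_uv(uint32_t vertexId) {
    // Corner index per triangle-list vertex, over the corners
    // (0,0) (1,0) (0,1) (1,1):
    static const uint32_t corner[6] = {0, 1, 2, 2, 1, 3};
    uint32_t c = corner[vertexId % 6];
    return {float(c & 1), float(c >> 1)};
}
```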

Now time to refine a bit on this…

Next step to take is to introduce support for coverage computation.

This means that we should perform 2 rendering passes for each path to draw. The first one accumulating the coverage, and the second one effectively rendering the fragments.

For a moment I was considering using a compute pass to accumulate the coverage, but thinking about it, that would mean alternating between the compute pipeline and the render pipeline for each quad, whereas this way we can use the same render pipeline to perform the 2 tasks: I think that should be more efficient (🤔?). And it's not clear how I would “generate the segment quads” in the compute shader anyway…

I built a first version of a coverage computation function as follows:

// Function used to compute the coverage for a given pixel:
// the pixel center coordinates are given in 'coords',
// and the width/height of the pixel is 1.0.
// The 'seg' parameter gives us the 2 top points of the trapezoid,
// while the 'minY' value defines the base Y level of that trapezoid.
fn compute_coverage_v0(seg: vec4f, minY: f32, coords: vec2f) -> f32 {
    // Stop if our segment is vertical:
    if seg.x == seg.z {
        return 0.0;
    }

    // First we get the min and max x coords of the trapezoid:
    // We call the point on the left side p0 and the one on the right p1:

    // Note: WGSL select(f, t, cond) returns 't' when the condition is true,
    // so the left point seg.xy goes in the second value slot:
    var p0: vec2f = select(seg.zw, seg.xy, seg.x < seg.z);
    var p1: vec2f = select(seg.xy, seg.zw, seg.x < seg.z);

    // compute our square rect:
    var left: f32 = coords.x - 0.5;
    var right: f32 = coords.x + 0.5;
    var top: f32 = coords.y + 0.5;
    var bottom: f32 = coords.y - 0.5;

    // Check if we can finish the test early:
    if left >= p1.x || right <= p0.x || top <= minY {
        return 0.0;
    }

    // Compute the intersection considering the left/right/bottom edges of the trapezoid:
    left = max(left, p0.x);
    right = min(right, p1.x);
    bottom = max(bottom, minY);

    // Next we compute the top points y coords on the trapezoid corresponding
    // to the x coords "left" and "right" of our clipped rect:
    var alpha: f32 = (p1.y - p0.y) / (p1.x - p0.x);
    var t0: f32 = p0.y + (left - p0.x) * alpha;
    var t1: f32 = p0.y + (right - p0.x) * alpha;

    if bottom >= max(t0, t1) {
        // our clipped rect is above the trapezoid top edge:
        return 0.0;
    }

    // Compute the clipped rectangle coverage:
    var dx: f32 = right - left;

    var cov: f32 = max(dx, 0.0) * max(top - bottom, 0.0);

    // Check if we are fully in the trapezoid:
    if top <= min(t0, t1) {
        // Full coverage from the top trapezoid edge:
        return cov;
    }

    // if t0==t1, then the top edge is horizontal, and we can clip the 
    // rectangle further:
    if t0 == t1 {
        top = min(top, t0);
        cov = max(right - left, 0.0) * max(top - bottom, 0.0);
        return cov;
    }

    // We are partially clipped by the trapezoid top edge:
    // We have 6 possible cases to deal with here:
    // But if t1 > t0, then we can flip t0 and t1 by symmetry:
    if t1 > t0 {
        var tmp = t0;
        t0 = t1;
        t1 = tmp;
    }

    var dy = t0 - t1;

    if t0 > top && t1 >= bottom {
        // We remove the top right triangle from the coverage:
        var h: f32 = top - t1;
        // compute the left length on top edge before intersection point:
        var w: f32 = (t0 - top) * dx / dy;
        // And now the right length:
        w = dx - w;
        return cov - w * h * 0.5;
    }

    if t0 > top && t1 < bottom {
        // compute the left length on top edge before intersection point:
        var w1: f32 = (t0 - top) * dx / dy;
        // compute the left length on bottom edge before intersection point:
        var w2: f32 = (t0 - bottom) * dx / dy;

        // Compute coverage with the trapezoid rule:
        return (top - bottom) * (w1 + w2) * 0.5;
    }

    // Now we have t0 < top:
    if t1 >= bottom {
        // remember we also have t0 > t1 (so here top > t0 > t1)
        // compute with horizontal trapezoid rule:
        return (right - left) * (t0 - bottom + t1 - bottom) * 0.5;
    }

    // Now we have t0 <= top, t0 > bottom and t1 < bottom:
    // only the triangle below the edge remains covered:
    var w: f32 = (t0 - bottom) * dx / dy;

    return (t0 - bottom) * w * 0.5;
}
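To pin down the geometry, here's a host-side C++ port of the same trapezoid-coverage logic, handy for unit-testing against hand-computed cases (the points are ordered explicitly instead of using select(), and the last branch returns the triangle below the edge directly):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// C++ port of the trapezoid coverage function above: the segment goes from
// (x0, y0) to (x1, y1), the trapezoid base sits at minY, and the pixel is
// the 1x1 square centered on (cx, cy).
float coverage(float x0, float y0, float x1, float y1,
               float minY, float cx, float cy) {
    if (x0 == x1) return 0.0f; // vertical segment: no area contribution

    // p0 = leftmost point, p1 = rightmost point:
    float p0x = x0, p0y = y0, p1x = x1, p1y = y1;
    if (p0x > p1x) { std::swap(p0x, p1x); std::swap(p0y, p1y); }

    float left = cx - 0.5f, right = cx + 0.5f;
    float top = cy + 0.5f, bottom = cy - 0.5f;

    // Early exit if the pixel cannot intersect the trapezoid:
    if (left >= p1x || right <= p0x || top <= minY) return 0.0f;

    // Clip against the left/right/bottom edges of the trapezoid:
    left = std::max(left, p0x);
    right = std::min(right, p1x);
    bottom = std::max(bottom, minY);

    // Top edge y values at the clipped rect's left and right x:
    float alpha = (p1y - p0y) / (p1x - p0x);
    float t0 = p0y + (left - p0x) * alpha;
    float t1 = p0y + (right - p0x) * alpha;

    if (bottom >= std::max(t0, t1)) return 0.0f; // rect above the top edge

    float dx = right - left;
    float cov = std::max(dx, 0.0f) * std::max(top - bottom, 0.0f);

    if (top <= std::min(t0, t1)) return cov; // rect fully below the top edge

    if (t0 == t1) { // horizontal top edge: just clip the rect further
        top = std::min(top, t0);
        return std::max(dx, 0.0f) * std::max(top - bottom, 0.0f);
    }

    if (t1 > t0) std::swap(t0, t1); // ensure t0 >= t1 by symmetry
    float dy = t0 - t1;

    if (t0 > top && t1 >= bottom) { // clip off the top-right triangle
        float h = top - t1;
        float w = dx - (t0 - top) * dx / dy;
        return cov - w * h * 0.5f;
    }
    if (t0 > top) { // t1 < bottom: trapezoid rule between top and bottom
        float w1 = (t0 - top) * dx / dy;
        float w2 = (t0 - bottom) * dx / dy;
        return (top - bottom) * (w1 + w2) * 0.5f;
    }
    if (t1 >= bottom) { // edge fully inside vertically: trapezoid rule
        return dx * (t0 - bottom + t1 - bottom) * 0.5f;
    }
    // t0 in range, t1 below: only the triangle below the edge is covered:
    float w = (t0 - bottom) * dx / dy;
    return (t0 - bottom) * w * 0.5f;
}
```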

Then I optimized that into the following version:

// Optimized version:
fn compute_coverage(seg: vec4f, minY: f32, coords: vec2f) -> f32 {
    // Early exit for vertical segment
    if seg.x == seg.z {
        return 0.0;
    }

    // First we get the min and max x coords of the trapezoid:
    // We call the point on the left side p0 and the one on the right p1:
    // Note: WGSL select(f, t, cond) returns 't' when the condition is true,
    // so the left point seg.xy goes in the second value slot:
    var p0: vec2f = select(seg.zw, seg.xy, seg.x < seg.z);
    var p1: vec2f = select(seg.xy, seg.zw, seg.x < seg.z);

    // Pre-calculate slopes and y-intercepts for trapezoid edges
    let slope = (p1.y - p0.y) / (p1.x - p0.x);
    // Find the intersection with the Y axis to produce the
    // reduced line equation:
    let y0 = p0.y - slope * p0.x;

    // The top line equation is:
    // y = slope * x + y0

    // Clip rectangle against trapezoid bounds
    let left = max(coords.x - 0.5, p0.x);
    let right = min(coords.x + 0.5, p1.x);
    let bottom = max(coords.y - 0.5, minY);

    // Compute the local y coord of the top edge at the clipped rect left, right positions:
    let t0 = y0 + slope * left;
    let t1 = y0 + slope * right;
    let tmax = max(t0, t1);
    let tmin = min(t0, t1);
    let top = min(coords.y + 0.5, tmax);

    // Check for complete miss
    if left >= p1.x || right <= p0.x || top <= minY || bottom >= tmax {
        return 0.0;
    }

    // Calculate clipped rectangle area
    let area = max(right - left, 0.0) * max(top - bottom, 0.0);

    // Check for full coverage:
    if top <= tmin || area == 0.0 || tmax == tmin {
        return area;
    }

    // Partial coverage calculation:
    let dy = tmax - tmin;
    let rw = right - left;
    let rh = top - bottom;

    // Compute the vertical diff from the top of our clipped rectangle:
    let h = top - dy;

    // We have 2 cases now: either h < (top - bottom) or not.
    // In the first case we remove h*(right-left)*0.5 from the coverage.
    // And otherwise, we need to compute the below remaining triangle width:
    let w1 = rh * rw / h;
    return select(rh * w1 * 0.5, area - h * rw * 0.5, h <= rh);
}

Arrggh… unfortunately, it seems I have another severe issue here: I'm trying to write to and read from the same buffer resource in a single render pass, and that doesn't seem to be supported 😭: cf. https://github.com/gpuweb/gpuweb/issues/59

⇒ Okay, so now I'm using a dedicated pass for each rendering operation to ensure buffer memory consistency. That's a shame, but at least it works this way, so it's a start!

And here is an initial working non trivial result:

Generated with this code:

    F32 scale = 1.0;
    auto f = GUIFrame::create({U32(400 * scale), U32(400 * scale)});
    (*f).beginPath()

        // .moveTo(100, 100)
        // .lineTo(200, 100)
        // .lineTo(300, 300)
        // .lineTo(100, 200)
        // .lineTo(100, 100)

        // .moveTo(100, 100)
        // .lineTo(200, 100)
        // .lineTo(200, 200)
        // .lineTo(100, 200)

        .moveTo(100 * scale, 100 * scale)
        .lineTo(200 * scale, 100 * scale)
        .lineTo(300 * scale, 200 * scale)
        .lineTo(100 * scale, 200 * scale)
        // .lineTo(100 * scale, 100 * scale)
        .closePath()

        .moveTo(225 * scale, 220 * scale)
        .lineTo(275 * scale, 270 * scale)
        .lineTo(225 * scale, 270 * scale)
        .closePath()

        .moveTo(125 * scale, 125 * scale)
        .lineTo(125 * scale, 175 * scale)
        .lineTo(175 * scale, 175 * scale)
        .lineTo(150 * scale, 150 * scale)
        .set_winding(GUIFrame::WIND_CW)
        .closePath()

        .moveTo(125 * scale, 25 * scale)
        .lineTo(150 * scale, 50 * scale)
        .lineTo(175 * scale, 75 * scale)
        .lineTo(125 * scale, 75 * scale)
        .set_winding(GUIFrame::WIND_CCW)
        .closePath()

        // .lineTo(250 * scale, 175 * scale)
        // .lineTo(200 * scale, 125 * scale)
        // .lineTo(125 * scale, 125 * scale)

        // .lineTo(120, 60)
        // .lineTo(200, 100)
        // .lineTo(150, 50)
        .set_fill_color({255, 192, 0, 255})
        .fill()
        // .set_texture(tex);
        .render();

In fact, to produce this result I initially tried my compute_coverage() implementation but got some artifacts, so I then tried with the reference areaEdge2() function (from nanovgXC):

fn areaEdge2(v0: vec2f, v1: vec2f) -> f32 {
    // if disableAA {
    //     return v1.x < v0.x ? coversCenter(v1, v0) : -coversCenter(v0, v1);
    // }

    var window: vec2f = clamp(vec2f(v0.x, v1.x), vec2f(-0.5), vec2f(0.5));

    var width = window.y - window.x;

  //if(v0.y > 0.5f && v1.y > 0.5f)  -- didn't see a significant effect on Windows
  //  return -width;

    var dv: vec2f = v1 - v0;
    var slope: f32 = dv.y / dv.x;
    var midx: f32 = 0.5 * (window.x + window.y);
    var y: f32 = v0.y + (midx - v0.x) * slope;  // y value at middle of window
    var dy: f32 = abs(slope * width);

  // credit for this to https://git.sr.ht/~eliasnaur/gio/tree/master/gpu/shaders/stencil.frag
  // if width == 1 (so midx == 0), the components of sides are: y crossing of right edge of frag, y crossing
  //  of left edge, x crossing of top edge, x crossing of bottom edge.  Since we only consider positive slope
  //  (note abs() above), there are five cases (below, bottom-right, left-right, left-top, above) - the area
  //  formula below reduces to these cases thanks to the clamping of the other values to 0 or 1.
  // I haven't thought carefully about the width < 1 case, but experimentally it matches areaEdge()
    var sides: vec4f = vec4f(y + 0.5 * dy, y - 0.5 * dy, (0.5 - y) / dy, (-0.5 - y) / dy);  //ry, ly, tx, bx
    sides = clamp(sides + 0.5, vec4f(0.0), vec4f(1.0));  // shift from -0.5..0.5 to 0..1 for area calc
    var area: f32 = 0.5 * (sides.z - sides.z * sides.y - 1.0 - sides.x + sides.x * sides.w);
//   return width == 0.0f ? 0.0 : area * width;
    return select(area * width, 0.0, width == 0.0);
}

… and this worked just fine!
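For host-side checking, here's a C++ transcription of areaEdge2() with a couple of hand-verified cases (it relies on IEEE float semantics for the divide-by-zero when the edge is horizontal, just like the WGSL version):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// C++ transcription of the areaEdge2() reference: signed coverage
// contribution of one edge over the 1x1 pixel centered at the origin.
float area_edge2(Vec2 v0, Vec2 v1) {
    float wx0 = std::clamp(v0.x, -0.5f, 0.5f);
    float wx1 = std::clamp(v1.x, -0.5f, 0.5f);
    float width = wx1 - wx0; // signed: encodes the edge direction

    float slope = (v1.y - v0.y) / (v1.x - v0.x);
    float midx = 0.5f * (wx0 + wx1);
    float y = v0.y + (midx - v0.x) * slope; // y at the middle of the window
    float dy = std::fabs(slope * width);

    // ry, ly, tx, bx crossings, shifted from -0.5..0.5 to 0..1 and clamped
    // (dy == 0 yields +/-inf which the clamps absorb, as in the WGSL code):
    float ry = std::clamp(y + 0.5f * dy + 0.5f, 0.0f, 1.0f);
    float ly = std::clamp(y - 0.5f * dy + 0.5f, 0.0f, 1.0f);
    float tx = std::clamp((0.5f - y) / dy + 0.5f, 0.0f, 1.0f);
    float bx = std::clamp((-0.5f - y) / dy + 0.5f, 0.0f, 1.0f);

    float area = 0.5f * (tx - tx * ly - 1.0f - ry + ry * bx);
    return width == 0.0f ? 0.0f : area * width;
}
```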

And with more investigation, I finally realized the issue was that the “coords” I provide as input are not always at the center of the current fragment, so I'm now using this as follows:

#ifdef USE_AREA_EDGE
        var area: f32 = areaEdge2(seg.zw - coords, seg.xy - coords);
        var wf: f32 = -1.0;
#else
        // Ensure we really use the center of the current pixel:
        coords = floor(coords) + vec2f(0.5, 0.5);
        var area: f32 = compute_coverage(seg, minY, coords);

        // Compute the winding factor:
        // positive if the segment goes from right to left:
        var wf: f32 = select(-1.0, 1.0, seg.x > seg.z);
        // var wf: f32 = select(1.0, -1.0, seg.x > seg.z);
#endif

Important note: one thing I noticed that seemed really strange is that the following select statements don't do what I expected:

    var p0: vec2f = select(seg.xy, seg.zw, seg.x < seg.z);
    var p1: vec2f = select(seg.zw, seg.xy, seg.x < seg.z);

So instead I had to use something like this:

    var p0: vec2f = seg.xy;
    var p1: vec2f = seg.zw;
    if seg.x >= seg.z {
        p0 = seg.zw;
        p1 = seg.xy;
    }

The explanation is simply the WGSL argument order: select(f, t, cond) returns its second argument t when the condition is true, so the two value arguments above are swapped. The intended expressions are select(seg.zw, seg.xy, seg.x < seg.z) and select(seg.xy, seg.zw, seg.x < seg.z).

An article on multisample in webgpu: https://webgpufundamentals.org/webgpu/lessons/webgpu-multisampling.html

⇒ It seems that for now only a sample count of 4 is supported.

Idea: it might be possible to render our paths in a pass with 2 attachments: one for the color output and another for the coverage computation.

… But no: that won't work: because after writing to the coverage, we need to read from it. So we would need another render pass 😞.

Compute shaders are synchronized at the dispatch level, so we could use this property to our advantage when writing the coverage. But the question then is: how do we rasterize the quads for each segment?

Actually, thinking about this, there might be a way to do it with compute shaders: in this case we could get multiple dispatches in a single compute pass. But at the same time, we don't have a “ComputeBundle” concept, so we would still need to dispatch all the steps manually every time we need to do an update.

TODO: should consider this path more carefully at some point.

For now I introduced support for solid colors only, which was a simple change in the fragment shader:

    // Use the solid color from the render params:
    var color = params.innerCol;
    
    // Apply the coverage on the color alpha:
    color.a *= alpha;
    return color;

I'm now providing the scissorMat, scissorExt and scissorScale in the Render parameters (assigned in convert_paint):

    if (scissor.extent[0] < -0.5F || scissor.extent[1] < -0.5F) {
        // No scissor to apply:
        rdata.scissorMat.make_zero();
        rdata.scissorExtScale.set(1.0F, 1.0F, 1.0F, 1.0F);
    } else {
        // Write the inverse scissor matrix:
        rdata.scissorMat = scissor.xform.inverse().toMat4f();

        // Get the scissor scale factor:
        auto sca = scissor.xform.get_scale() * _traits.pixelRatio;
        rdata.scissorExtScale.set(scissor.extent, sca);
    }

And then defining the scissorMask() function in shader:

// Scissoring:
// Note: this will be applied to canvas coordinates (ie for instance canvas size: [800x600])
fn scissorMask(p: vec2f) -> f32 {

    var sc: vec2f = (abs((params.scissorMat * vec4f(p, 1.0, 0.0)).xy) - params.scissorExtScale.xy);
    sc = vec2(0.5, 0.5) - sc * params.scissorExtScale.zw;
    return clamp(sc.x, 0.0, 1.0) * clamp(sc.y, 0.0, 1.0);
}
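Assuming an identity scissor transform and unit scale factors (i.e. a scissor rect centered at the origin), the mask above reduces to a simple per-axis falloff, which can be checked on the host with this hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// scissorMask() specialized for an identity xform and unit scale: 1.0 well
// inside the scissor half-extent, 0.0 outside, with a ~1px smooth edge
// (0.5 exactly on an edge, 0.25 on a corner).
float scissor_mask(float px, float py, float extX, float extY) {
    float sx = 0.5f - (std::fabs(px) - extX);
    float sy = 0.5f - (std::fabs(py) - extY);
    return std::clamp(sx, 0.0f, 1.0f) * std::clamp(sy, 0.0f, 1.0f);
}
```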

And applying this method when producing the final color:

    let alpha = clamp(cov, 0.0, 1.0);

    // Use the solid color from the render params:
    var color = params.innerCol;
    
    // Apply the coverage on the color alpha and also the scissor mask:
    color.a *= alpha * scissorMask(coords);
    return color;

… But for now, this scissorMask will only return 1.0 anyway 😅. I will do some testing later on.

We can now draw a circle/ellipse path with the methods:

    // Creates a new ellipse-shaped sub-path:
    auto ellipse(F32 cx, F32 cy, F32 rx, F32 ry) -> GUIFrame&;

    // Creates a new circle-shaped sub-path:
    auto circle(F32 cx, F32 cy, F32 r) -> GUIFrame&;
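Under the hood this follows the classic nanovg construction: each 90-degree arc of the ellipse is a single cubic Bézier using the constant kappa = 4/3 * (sqrt(2) - 1). A quick check that the t = 0.5 point of such a quadrant lands exactly on the circle:

```cpp
#include <cassert>
#include <cmath>

// nanovg-style circle/ellipse construction: one cubic Bezier per 90-degree
// arc, with the classic constant kappa = 4/3 * (sqrt(2) - 1):
const double kKappa90 = 4.0 / 3.0 * (std::sqrt(2.0) - 1.0); // ~0.5522847493

// Evaluate one component of a cubic Bezier at parameter t:
double cubic(double p0, double p1, double p2, double p3, double t) {
    double u = 1.0 - t;
    return u * u * u * p0 + 3.0 * u * u * t * p1 + 3.0 * u * t * t * p2
         + t * t * t * p3;
}
```

For the first quadrant of the unit circle, the curve runs from (1, 0) to (0, 1) with control points (1, kappa) and (kappa, 1); at t = 0.5 it evaluates to (4 + 3*kappa)/8 = sqrt(2)/2 on both axes, i.e. exactly the 45-degree circle point.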

For instance, this gives the following result:

Next I added support to draw rounded rects, and it looked just fine on the first try:

The added methods in the GUIFrame class are:

    // Creates new rectangle shaped sub-path.
    auto rect(F32 x, F32 y, F32 w, F32 h) -> GUIFrame&;

    // Creates new rounded rectangle shaped sub-path.
    auto rounded_rect(F32 x, F32 y, F32 w, F32 h, F32 r) -> GUIFrame&;

    // Creates new rounded rectangle shaped sub-path with varying radii for each
    // corner.
    auto rounded_rect(F32 x, F32 y, F32 w, F32 h, F32 radTopLeft,
                      F32 radTopRight, F32 radBottomRight, F32 radBottomLeft)
        -> GUIFrame&;

⇒ Actually it seems this is only applicable if we disable antialiasing for the shape rendering, so maybe it's not that relevant?

⇒ Discovered there is a clever usage of the rounded-rect SDF in the original nanovg code to render gradients 😉!
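For reference, that SDF (transcribed here to C++ from nanovg's sdroundrect() fragment-shader helper) measures the signed distance to a rounded rectangle centered at the origin:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Signed distance to a rounded rectangle centered at the origin:
// (ex, ey) is the half-extent of the rect, rad the corner radius.
// Negative inside, zero on the boundary, positive outside.
float sd_round_rect(float px, float py, float ex, float ey, float rad) {
    float ex2 = ex - rad, ey2 = ey - rad;
    float dx = std::fabs(px) - ex2, dy = std::fabs(py) - ey2;
    return std::min(std::max(dx, dy), 0.0f)
         + std::hypot(std::max(dx, 0.0f), std::max(dy, 0.0f)) - rad;
}
```

nanovg then feeds this distance into a clamped ramp to get the box-gradient factor.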

And here is the result with one linear gradient:

And here is the result after adding also a radial gradient on the circle and a box gradient on the custom shape:

This was not too complicated: I just had to implement the expand_stroke() method, and now I can render strokes like this:

I really feel I need a dedicated Path2d class to encapsulate my rendering more properly.

An interesting resource I should use is: https://css-tricks.com/svg-path-syntax-illustrated-guide/

  • public/projects/nervland/notes/issue22_initial_support_render_paths.txt
  • Last modified: 2024/05/18 08:17