
NervLand DevLog #32: Working our way to display Proland terrains

In this article, we resume our work on the Proland integration, focusing on getting a first tile producer (i.e. the Elevation Producer) working on top of our TerrainNode. This was definitely a critical stage, and it took a significant effort to get the first version working. Unfortunately, towards the end of the article I stopped taking notes on what I was doing, since I was really burning my head on some tricky stuff. But anyway, we will resume this in the next devlog session ;-).

YouTube videos (3 parts) for this article are available at:

References:

I just finished the core implementation of the ElevationProducer class: now wondering how I'm supposed to integrate this in the TerrainNode 🤔? And the solution seems to be with the TileSamplerZ class: investigating…

Reminder: Current TerrainNode rendering process:

  • In the TerrainNode::process() method we clear the count of drawn quads
  • Then we recursively iterate over all the TerrainQuads with the TerrainQuad::update() method
  • If applicable for a given quad, the TerrainNode::draw_quad() method is called to add that quad to the storage buffer of elements to draw (see the sketch just after this list).
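
To make this concrete, here is a minimal sketch of that recursion (not the actual NervLand code: the is_visible()/should_subdivide() helpers and the member names are assumptions):

void TerrainNode::process() {
    _numDrawnQuads = 0;          // clear the count of drawn quads
    _rootQuad->update(*this);    // recursive update starting from the root quad
}

void TerrainQuad::update(TerrainNode& node) {
    if (!node.is_visible(*this)) {
        return;                  // quad culled: nothing to draw below it
    }
    if (should_subdivide(node)) {
        for (auto& child : _children) {
            child->update(node); // recurse into the four children
        }
    } else {
        node.draw_quad(*this);   // append this quad to the draw storage buffer
    }
}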

If we refer to the XML files describing the scene in the reference implementation we see that we have:

    <sequence name="updateTerrainMethod">
        <updateTerrain name="this.terrain"/>
        <updateTileSamplers name="this.terrain"/>
    </sequence>

⇒ So it seems we start with the update of the terrain and then follow with the update of the Tile samplers (?) [yes, confirmed].

So let's assume we create a couple of TileSamplers in our TerrainNode to link it to the tile producers: what will we do next?

In the Proland implementation the TileSampler will create a complete tree representing the available terrain quads: this sounds a bit like a duplication of the actual TerrainQuad tree built in the TerrainNode already: maybe we could do without this 🤔?

But anyway, for now, let's focus on how to process the rendering with task scheduling: I think this should be relatively easy to achieve:

- Starting from WGPUEngine::render_frame() we can collect a task for processing all the scenes from SceneManager.
- Then we run that task in the scheduler.
- But that task should really be a TaskGraph in which multiple scenes will be rendered,
- Or, maybe this is not even needed and we could simply add tasks into the scheduler from inside that root task?

So I tried to update the rendering loop right in the wgpu_sdl implementation:

    // Use the scheduler to run the main application loop:
    auto renderTask = make_task([this]() { render_frame(); });

    auto& sch = app.get_scheduler();

    while (!app.is_done()) {
#ifdef __APPLE__
        // Should add this pool on apple.
        utils::ScopedAutoreleasePool pool;
#endif
        wman->handle_events();
        eng->process_events();
        update_fps();
        // render_frame();
        sch.run(renderTask);

        // Mark the task as requiring execution again:
        renderTask->set_done(false, 0);

        // sleep_us(16000);
    }

With this I will indeed get something to display, but I very quickly get validation errors as follows:

2023-09-25 12:09:55.840134 [ERROR] Dawn: device lost: D3D12 closing pending command list failed with E_FAIL (0x80004005)
 - While handling unexpected error type Internal when allowed errors are (Validation|DeviceLost).

And in fact I know where the issue comes from in this case: when calling WGPUEngine::build_commands() we access a shared version of the command builder, while we should really use one command builder per thread! Let's fix that. OK, we now have some per-thread elements in the WGPUEngine:

    struct ThreadData : public RefObject {
        /** Command builder */
        RefPtr<WGPUCommandBuilder> commandBuilder;

        /** Default render pipeline builder */
        RefPtr<WGPURenderPipelineBuilder> renderPipelineBuilder;

        /** Default compute pipeline builder */
        RefPtr<WGPUComputePipelineBuilder> computePipelineBuilder;
    };
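
For reference, here is a hypothetical sketch of how those per-thread elements could be looked up (the accessor name, the map member and the mutex are assumptions, not the actual engine code):

auto WGPUEngine::get_thread_data() -> ThreadData& {
    const U32 idx = get_thread_index();     // same helper used in the task lambdas
    WITH_RMUTEXLOCK(_threadDataMutex);      // hypothetical mutex protecting the map
    auto it = _threadData.find(idx);
    if (it == _threadData.end()) {
        logDEBUG("WGPUEngine: creating specific data for thread {}", idx);
        it = _threadData.emplace(idx, create_ref_object<ThreadData>()).first;
    }
    return *it->second;
}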

But unfortunately it seems we still have some kind of multithreading issue here 😒: Investigating further.

I ended up trying to use the scheduler from the Scene::process() method, but then even if I comment everything out and just initialize the scheduler, I get a CPU usage of 100% in my app 😭 not too good…

void Scene::process() {

    auto& sch = NervApp::instance().get_scheduler();

    auto ptask = make_task([this]() {
        // update the camera:
        // _camera->update_buffers();

        auto* eng = WGPUEngine::instance();
        // auto& bld = eng->build_commands();

        // for (auto& node : _nodes) {
        //     node->process();
        // }

        // // render the passes:
        // for (auto& gpass : _passes) {
        //     if (gpass.renderPass != nullptr) {
        //         bld.execute_render_pass(*gpass.renderPass);
        //     } else {
        //         bld.execute_compute_pass(*gpass.computePass);
        //     }
        // }
        // bld.submit();
    });

    // sch.run(ptask);
}

⇒ To fix this high CPU usage issue the key is actually to avoid the timed dequeue ops both in the LogManager:

        // U32 count = _msgQueue.wait_dequeue_bulk_timed(
        //     _msgConsumerToken, mtags.data(), maxNumStrings, 200);
        U32 count = _msgQueue.wait_dequeue_bulk(_msgConsumerToken, mtags.data(),
                                                maxNumStrings);

        if (count == 0) {
            continue;
        };

        if (_stop) {
            break;
        }

and in the MultithreadScheduler:

        // if (!_taskQueue.wait_dequeue_timed(ctok, task, 100)) {
        //     continue;
        // };
        _taskQueue.wait_dequeue(ctok, task);

        // Break the loop early if exit is requested:
        if (_stop) {
            break;
        }

After that CPU usage will go to almost 0% 😅!

Alright! And this actually also fixed the whole scheduler usage issue 😲! (currently running from the Scene::process, but let's try to generalize that now.)

OK, when using the scheduler in the wgpu_sdl base app, I initially had some trouble creating a “render task” in the engine itself.

Then I got even more problems trying to turn that task into a task graph 😂 but I keep digging anyway…

For instance I got errors such as this:

2023-09-26 08:24:57.024879 [NOTE] WGPUEngine: creating specific data for thread 6
2023-09-26 08:24:57.028276 [ERROR] Dawn: device lost: IDXGISwapChain::Present failed with DXGI_ERROR_DEVICE_REMOVED (0x887A0005)
    at CheckHRESULTImpl (D:/Projects/NervProj/build/libraries/dawn-git-20230509/src/dawn/native/d3d/D3DError.cpp:94)
    at PresentDXGISwapChain (D:/Projects/NervProj/build/libraries/dawn-git-20230509/src/dawn/native/d3d/SwapChainD3D.cpp:228)
    at PresentImpl (D:/Projects/NervProj/build/libraries/dawn-git-20230509/src/dawn/native/d3d12/SwapChainD3D12.cpp:80)
 (reason: unknown)

Checking online, it seems that the SwapChain::Present() call is basically not thread-safe… but still, I don't really understand how this could be happening… 🤔. Investigating now.

Hmmm, okay, I think I might have a clue here: when I use this code:

    auto tg = make_taskgraph();

    // Pre-render step:
    auto t1 = tg->add_func([this]() {
        logDEBUG("in t1 on {}", get_thread_index());
        pre_render();
    });

    // Rendering scenes:
    auto t2 = tg->add_func([]() {
        logDEBUG("in t2 on {}", get_thread_index());
        auto* sman = SceneManager::instance();
        sman->process_all();
    });

    // GUI render and presenting:
    auto t3 = tg->add_func([this]() {
        logDEBUG("in t3 on {}", get_thread_index());
        auto& bld = build_commands();
        bld.submit(true);

        present();

        post_render();
    });

    // Task 3 depends on 1 & 2:
    tg->add_dependency(t3, t1);
    tg->add_dependency(t3, t2);

    // _renderTask = tg;
    // }

    // _renderTask->schedule();
    tg->schedule();

It seems the issue might come from the concurrency between t1 and t2 (if I move the content of one of those to remove the concurrency, everything seems OK). So what are we using here that requires synchronization?

Okay, so, with further investigations, it seems the issue is within the GuiManager::render_frame() call: this one is dedicated to the rendering of the IMGUI interface, but it doesn't seem to like concurrent execution with other WebGPU calls… But okay, that sounds fair enough. And if I use a sequential task graph now, it seems to work fine:

    tg->add_dependency(t2, t1);
    tg->add_dependency(t3, t2);

With this, we basically do the pre_render() then everything to render the scenes, then the final GUI rendering and presentation, which seems an appropriate sequence to use for now 😊.

Next issue I got was when I was trying to reuse a “render task” instead of creating a new one each time: at first, this was only executed once, and I finally traced this down to the fact that the task is considered “up-to-date” with respect to its dependencies, and thus not executed again. So the trick is to artificially update the predecessor completion date for a task (or taskgraph) before re-submitting it:

void Task::schedule(Bool forceRun, Task::ExecReason r, U64 deadline) {
    auto& sch = NervApp::instance().get_scheduler();

    if (forceRun) {
        set_predecessors_completion_date(get_completion_date() + 1);
    }

    if (is_done()) {
        sch.reschedule(this, r, deadline);
    } else {
        sch.schedule(this); 
    }
}

⇒ And with this, it seems I now have the first fully multithreaded rendering implementation working in my engine 😲!! (Basically each task can be executed on a different thread, and one of those tasks is the final gui rendering/swapchain Present operation). That is pretty cool.

What I really need to keep in mind here is how the MultithreadScheduler uses the “tasks”: on scheduling, those tasks are stored either for immediate or pending execution, but it does not explicitly track the content of a task graph, for instance. This means that the next time a task graph is scheduled, it would not really matter if its content was updated [and I think this will be a key element for us].

Okay, so I'm now starting to refactor the system above to use a dedicated task for the processing of each scene (so that we could potentially process multiple scenes concurrently). So injecting an overall task from the scene manager in our render task:

        const auto& t2 =
            tg->add_task(SceneManager::instance()->get_process_all_task());

Now, this works just fine, but this is getting me thinking about the “solution” I introduced above to re-schedule a complete taskgraph with this:

    if (forceRun) {
        set_predecessors_completion_date(get_completion_date() + 1);
    }

The problem with this is that every task we might only want to run once in a task graph, as a preliminary step, would still get re-executed here, because all the predecessor completion dates will be updated to a future time. Hmmm 🤔, not quite what I would like to have. So?

First thing is, in the TaskGraph, we only set all sub tasks as not-done in case we have a “DEPENDENCY_CHANGED” situation:

void TaskGraph::set_done(Bool done, U64 t, ExecReason r) {
    Task::set_done(done, t, r);
    if (!done) { // calls sub tasks recursively only if task must be reexecuted
        // if a dependency of this task graph has changed, then all sub tasks
        // must be reexecuted; otherwise, if the data produced by this graph
        // is needed again then, a priori, only the sub tasks without successors
        // must be reexecuted (these sub tasks may need other sub tasks to be
        // reexecuted if they need their data; in this case they can change
        // their execution state recursively in their own set_done method).
        auto& tlist = r == DEPENDENCY_CHANGED ? _allTasks : _lastTasks;
        for (const auto& task : tlist) {
            task->set_done(done, t, r);
        }
    }
}

So let's assume for a while that we keep this flag when we “reschedule” our existing tasks, but that we don't override the predecessor completion date before the submission. Since all the subtasks are marked as not-done, they will be rescheduled appropriately. But then when it comes to execution they will be considered up to date. So that's where we need to change something [thinking…].

⇒ Solution: I introduced the concept of “perpetual” tasks, which will never be up-to-date with respect to a predecessor completion date:

    virtual void init(TaskPtrSet& initialized) {
        if ((_flags & FLAG_PERPETUAL) != 0) {
            // This is a continuous task so we should update the predecessors
            // completion date to ensure it is not considered "up-to-date"
            set_predecessors_completion_date(get_completion_date() + 1);
        }
    };

Next I just need to make the first task in a task sequence a perpetual one, and this seems to do the trick to trigger a proper cascade of task executions 👍!:

        const auto& t1 = tg->add_perpetual([this]() {
            // logDEBUG("in t1 on {}", get_thread_index());
            pre_render();
        });

Now, when rendering a scene, we need to perform the following:

void Scene::process() {

    // update the camera:
    _camera->update_buffers();

    auto* eng = WGPUEngine::instance();
    auto& bld = eng->build_commands();

    for (auto& node : _nodes) {
        node->process();
    }

    // render the passes:
    for (auto& gpass : _passes) {
        if (gpass.renderPass != nullptr) {
            bld.execute_render_pass(*gpass.renderPass);
        } else {
            bld.execute_compute_pass(*gpass.computePass);
        }
    }
    bld.submit();
}

Above, each node->process() execution could be done concurrently by default (?). But take the case of a TerrainNode for instance: during its process(), we will collect the TerrainQuads to show, so we will generate a list of additional tasks that should be performed before the actual rendering is done. [hmmm, thinking 🤔…]

Arrghh… I am realizing I have a serious limitation here: whenever I want to prepare a task graph for the rendering of a scene, if that scene contains a SceneNode that requires some significant processing, we want to execute that logic in a dedicated task, right? But then what if that logic itself might give birth to additional tasks that also need to be executed before the final rendering task is done? (typically, this is what we would get in a TerrainNode: the process() method will go through all the terrain quads, and for the visible quads, it should trigger the tasks to generate the missing tiles with the producers). But you might still have cases where none of those “additional tasks” is created, and you don't know this before the execution of the main TerrainNode::process() task…

So you could say: okay, then I just execute the first part first, the TerrainNode::process(), and when this is done I get a list of additional tasks; from there I schedule either the execution of those tasks before the final rendering, or directly the final rendering…

Except that the full scene processing itself is a task, and after it is submitted and completed, the render engine will present the results! And if we wait for the TerrainNode::process() task to be completed before submitting the additional tasks & executing the rendering, then the overall TaskGraph simply doesn't care: the scheduled precursor tasks are done, so we can move to the present task, and boom! You don't get anything to present, and worse: you start executing render passes that depend on an already presented swapchain texture. Nothing good will come out of this, believe me, that's the lesson I just learnt 🤣.

⇒ So the key here is really to dynamically modify a current task graph being executed, and the solution I found for this is to introduce the support for SubTasks inside a Task:

* During the execution of a Task we can now call this method:

    /* Schedule a subtask to this task */
    void schedule_subtask(const RefPtr<Task>& stask);

Note: the subtask to be scheduled can be either a Simple Task or even a TaskGraph in fact!

So this task will be submitted for scheduling immediately at that time.
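
To illustrate how this could be used, here is a hypothetical usage sketch (the helper names are assumptions, not the actual NervLand code): while a TerrainNode processing task is running, it discovers missing tiles and injects the corresponding work as subtasks of itself (each subtask could also be a full TaskGraph).

void TerrainNode::run_process_task(Task& self) {
    for (auto* quad : visible_quads()) {       // assumed helper
        if (!quad->has_tile()) {               // assumed helper
            // 'self' will stay "not done" until this subtask completes:
            self.schedule_subtask(
                make_task([this, quad]() { generate_tile_for(*quad); }));
        }
    }
}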

Then when the scheduler completes the “parent” task, we check if there are any pending “subtasks” still being executed, and if it is the case we don't mark the parent task as completed yet:

            case OUT_TASK_COMPLETED: {
                // Add the duration of the task if it was executed (eg.
                // changed):
                if (tout.changes) {
                    record_task_duration(tout.task->get_type(), tout.duration);
                }

                // Check if we still have subtasks running for that task:
                if (tout.task->get_num_subtasks() == 0) {
                    // handle the task done state:
                    mark_task_done(tout.task, tout.changes);
                }
            } break;

Instead, that parent task is kept “not done” until all the subtasks are completed, and on completion of each subtask we decrement an atomic counter to check if the parent task can now be considered as done:

void Task::set_done(Bool done, U64 t, ExecReason r) {
    if (_done != done) {
        _done = done;
        if (done || r != DEPENDENCY_CHANGED) {
            this->_completionDate = t;
        }
        for (auto& listener : _listeners) {
            listener->task_state_changed(this, done, r);
        }

        if (done && _parent != nullptr) {
            Bool parent_done = _parent->on_subtask_completed(*this);

            if (parent_done) {
                // logDEBUG("Completed subtask {} for task {}", id(),
                //          _parent->id());
                _parent->notify_done();
            }

            // Reset the parent:
            _parent = nullptr;
        }
    }
}
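
For reference, a minimal sketch of that atomic decrement, assuming the pending subtasks are tracked with a simple atomic counter member (_numSubtasks is an assumption):

auto Task::on_subtask_completed(const Task& /*stask*/) -> Bool {
    // fetch_sub returns the previous value, so the parent can be considered
    // done when the last pending subtask brings the counter down to zero:
    return _numSubtasks.fetch_sub(1, std::memory_order_acq_rel) == 1;
}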

⇒ By keeping the parent task “not-done” we prevent further execution of the initially submitted taskgraph until we are done with the subtasks which is exactly what we needed to be able to inject those dynamic logic elements, and this seems to work just fine now 👍!

I added support to store TileSamplers in a TerrainNode as follows:

    GPUTileStorage::CreateInfo infos{
        .tileSize = 50, .nTiles = 100, .format = TextureFormat::RGBA32Float};
    auto storage = create_ref_object<GPUTileStorage>(infos);

    // Create the tile cache:
    auto& sch = NervApp::instance().get_scheduler();
    auto cache = create_ref_object<TileCache>(storage, "my_cache", &sch);

    // Create an elevation producer:
    ElevationProducer::CreateInfo elevInfos = {.cache = cache,
                                               .gridMeshSize = 100};
    auto producer = create_ref_object<ElevationProducer>(elevInfos);
    sman->register_tile_producer(producer.get());

    // auto& tnode = scene.add_terrain({.maxLevel = 7, .size = 20.0});
    auto& tnode = scene.add_terrain(
        {.maxLevel = 7, .size = 20.0, .shader = "tests/sine_wave_terrain"});
    _tnode = &tnode;

    auto elevSampler = create_ref_object<TileSampler>(*producer);
    _tnode->add_tile_sampler(*elevSampler);

And when updating the terrain quads, I'm now acquiring/releasing tiles with that elevation sampler. The next step is to provide an actual implementation for TileProducer::create_tile() ⇒ Let's get started!
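
As a starting point, I imagine the producer wrapping the actual tile generation in a task, roughly along these lines (the signature, the _elevationPass member and the slot type are assumptions, not the final implementation):

auto ElevationProducer::create_tile_task(I32 level, I32 tx, I32 ty,
                                         GPUTileStorage::Slot* slot)
    -> RefPtr<Task> {
    return make_task([this, level, tx, ty, slot]() {
        auto* eng = WGPUEngine::instance();
        auto& bld = eng->build_commands();
        // Dispatch the compute pass that writes the elevation data for the
        // (level, tx, ty) quad into the target storage slot:
        bld.execute_compute_pass(*_elevationPass);
        bld.submit();
    });
}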

Ohhh 😟… and now I'm learning something… I was taking it for granted that access to the WebGPU API with Dawn would just be thread-safe. But now it seems this might not be true everywhere, as I can get a crash when firing multiple threads for the elevation tile computations, and the crash led me here:

auto WGPUEngine::create_buffer(const void* data, size_t size, BufferUsage usage)
    -> Buffer {
    BufferDescriptor buffer_desc = {
        .usage = BufferUsage::CopyDst | usage,
        .size = size,
    };

    logDEBUG("Create buffer!");
    auto buffer = _device.CreateBuffer(&buffer_desc);
    logDEBUG("Done Create buffer.");

    if (data != nullptr) {
        write_buffer(buffer.Get(), 0, data, size);
    }

    return std::move(buffer);
}

⇒ Above I get the message “Create buffer!” but not the “Done Create buffer.” one: what does that mean?? 😨 Plus, other locations seem to behave the same, for instance when I'm creating a texture too.

Arrghh… asked ChatGPT and got this answer:

WebGPU itself is designed to be a low-level graphics API that allows for more explicit control over the GPU, which means that developers need to manage threading and synchronization themselves. While WebGPU provides mechanisms to work with multiple threads, it doesn't automatically make all methods thread-safe.

Here are some general guidelines for working with threads and WebGPU:

Device Creation: Creating a WGPUDevice is generally a single-threaded operation and should be done on the main thread or a dedicated initialization thread.

Command Encoding: Command encoding is typically done on separate threads to maximize parallelism. Multiple threads can create command encoders and record commands simultaneously.

Command Submission: You should be cautious when submitting command buffers to the GPU. It's your responsibility to ensure that command buffers are not submitted concurrently from multiple threads unless you explicitly handle synchronization.

Resource Access: Accessing GPU resources (buffers, textures, etc.) from multiple threads requires proper synchronization using mechanisms like locks, fences, or semaphores. It's not inherently thread-safe.

Pipeline Creation: Pipeline creation is typically a one-time setup operation and can be done on a single thread during initialization.

Swap Chain: If you are rendering to a swap chain, you need to be mindful of synchronization when presenting images to the screen.

In summary, while WebGPU provides a foundation for multi-threaded graphics programming, it's up to the developer to manage threading and synchronization appropriately. Some operations, like device creation and pipeline creation, are typically done on a single thread, while others, like command encoding and resource access, can be parallelized with careful synchronization. Always refer to the specific documentation and best practices for your chosen WebGPU implementation for details on thread safety and synchronization.

So it seems it could really be that accessing/creating the resources is not thread safe here and requires synchronization 😭.

And in fact further tests seem to indicate that access to the queue should also be protected 😭😭
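
As a first mitigation, the resource creation path can be serialized explicitly; here is a sketch using a plain std::mutex (the _deviceMutex member is an assumption; as noted further below, the engine ended up using spinlocks instead):

#include <mutex>

auto WGPUEngine::create_buffer(const void* data, size_t size, BufferUsage usage)
    -> Buffer {
    // Serialize device resource creation across worker threads:
    std::lock_guard<std::mutex> lock(_deviceMutex);
    BufferDescriptor buffer_desc = {
        .usage = BufferUsage::CopyDst | usage,
        .size = size,
    };
    auto buffer = _device.CreateBuffer(&buffer_desc);
    if (data != nullptr) {
        write_buffer(buffer.Get(), 0, data, size);   // also goes through the queue
    }
    return buffer;
}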

OOhh… I just realized I had a validation error before this, so maybe that's the reason for all this mess 🤔? Fixing that first:

2023-09-28 14:31:20.810842 [ERROR] Dawn: Validation error: [Texture] usage (TextureUsage::TextureBinding) doesn't include TextureUsage::CopyDst.
 - While encoding [CommandEncoder].CopyTextureToTexture([Texture], [Texture], [Extent3D width:101, height:101, depthOrArrayLayers:1]).

2023-09-28 14:31:20.810851 [ERROR] Dawn: Validation error: [Invalid CommandBuffer] is invalid.
 - While calling [Queue].Submit([[Invalid CommandBuffer]])

Another thing I'm starting to understand now is that dynamically scheduling the acquire/release of tiles while I'm iterating over the TerrainQuad tree is not such a good idea: I may schedule a task to acquire a parent tile because I see it as not done, but maybe it was already in process, and when I submit my second taskgraph I only see a done task, which is not quite what I expected. Plus, I could end up posting the same task multiple times for concurrent scheduling, so 2 threads could be working on the same task 😲! Let's try a different path…

hmmm… this thing is just exploding in my head right now lol. So trying to take a step back: I feel I really need to stick to the “DATA_NEEDED” reason when rescheduling a TaskGraph, but this means reconsidering a lot of stuff now 😒.

Yeah… I have to admit that most of the time (so far) I have been working without using a debugger for my code… at first it's incredibly challenging, but when you start understanding better how the code works, your brain usually becomes a debugger on its own.

But here, really… with all those threads running concurrently, with code that I mostly didn't structure myself initially, and where I'm anyway injecting significant modifications, I'm having too hard a time trying to understand what's happening… and there is a limit to how much simple debug log messages can help you with this. So time to get the beast out: I'm now rebuilding NervLand in Debug mode (with clang primarily for now) and will use Visual Studio Code with CodeLLDB to perform the investigations.

Initial set of debug flags:

  set(CMAKE_CXX_FLAGS_DEBUG
      "-g -Xclang -gcodeview -O0 -fno-omit-frame-pointer -DDEBUG -Wall -Wno-unused-function"
  )
  set(CMAKE_CXX_FLAGS_RELEASE "-DNDEBUG -O3")

And now also looking into https://nullprogram.com/blog/2023/04/29/

So I tried to enable -fsanitize=address, but unfortunately it seems the version of clang I built doesn't have that available (?):

[113/1100] Linking CXX shared library sources\nvCore\shared\nvCore.dll
FAILED: sources/nvCore/shared/nvCore.dll sources/nvCore/shared/nvCore.lib
cmd.exe /C "cd . && D:\Projects\NervProj\libraries\windows_clang\LLVM-15.0.4\bin\clang++.exe -fuse-ld=lld-link -nostartfiles -nostdlib -std=c++20 -fpch-instantiate-templates -Wno-deprecated-declarations -g -Xclang -gcodeview -O0 -fno-omit-frame-pointer -DDEBUG -Wall -Wno-unused-function -fsanitize=address  -D_DLL -D_MT -Xclang --dependent-lib=msvcrt   -shared -o sources\nvCore\shared\nvCore.dll  -Xlinker /MANIFEST:EMBED -Xlinker /implib:sources\nvCore\shared\nvCore.lib -Xlinker /pdb:sources\nvCore\shared\nvCore.pdb -Xlinker /version:0.0 @CMakeFiles\nvCore.rsp  && cd ."
lld-link: error: could not open 'D:\Projects\NervProj\libraries\windows_clang\LLVM-15.0.4\lib\clang\15.0.4\lib\windows\clang_rt.asan_dll_thunk-x86_64.lib': no such file or directory

Then I tried to enable -fsanitize=undefined, but I get a runtime mismatch issue in that case:

FAILED: sources/nvCore/shared/nvCore.dll sources/nvCore/shared/nvCore.lib
cmd.exe /C "cd . && D:\Projects\NervProj\libraries\windows_clang\LLVM-15.0.4\bin\clang++.exe -fuse-ld=lld-link -nostartfiles -nostdlib -std=c++20 -fpch-instantiate-templates -Wno-deprecated-declarations -g -Xclang -gcodeview -O0 -fno-omit-frame-pointer -DDEBUG -Wall -Wno-unused-function -fsanitize=undefined -D_DLL -D_MT -Xclang --dependent-lib=msvcrt   -shared -o sources\nvCore\shared\nvCore.dll  -Xlinker /MANIFEST:EMBED -Xlinker /implib:sources\nvCore\shared\nvCore.lib -Xlinker /pdb:sources\nvCore\shared\nvCore.pdb -Xlinker /version:0.0 @CMakeFiles\nvCore.rsp  && cd ."
lld-link: error: /failifmismatch: mismatch detected for 'RuntimeLibrary':
>>> harfbuzz.lib(harfbuzz.cc.obj) has value MD_DynamicRelease
>>> clang_rt.ubsan_standalone_cxx-x86_64.lib(ubsan_type_hash_win.cpp.obj) has value MT_StaticRelease
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

FIXME: ⇒ will need to look into those… one day.

Okay so, this was a pretty good move actually, as it already helped me fix a couple of bugs, including the one I've been fighting with for quite some time now in the MultithreadScheduler, where I could get this check to fail at some point:

NVCHK(!_allReadyTasks.empty(), "All ready tasks should not be empty here.");

And now moving to the next issue, which is this error message:

2023-09-29 11:00:59.027946 [DEBUG] TileSampler: should acquire tile Quad(7, 54, 10)
2023-09-29 11:00:59.027958 [DEBUG] Cache elevation_cache tiles: 77 used, 946 reusable, total 1024
2023-09-29 11:00:59.027967 [DEBUG] Cache elevation_cache tiles: 78 used, 946 reusable, total 1024
2023-09-29 11:00:59.027968 [DEBUG] scheduling createTile task 2074
2023-09-29 11:00:59.027971 [DEBUG] TileSampler: should acquire tile Quad(7, 54, 11)
2023-09-29 11:00:59.027982 [FATAL] Missing createTile task.
2023-09-29 11:00:59.027984 [FATAL] A fatal error occured

Let's see if a nice debugging session can tell me more about this 😁.

Alright! So this issue comes from:

void TileCache::on_create_tile_task_deleted(const TileId& tid) {
    WITH_RMUTEXLOCK(_mutex);
    auto it = _deletedTileTasks.find(tid);
    NVCHK(it != _deletedTileTasks.end(), "Missing createTile task.");
    _deletedTileTasks.erase(it);
}

Which gets triggered when one of our CreateTile tasks is deleted. I think that's kind of expected actually, so now trying to handle that gracefully. OK

Next one on the list: now I get a case where I get this (which is also due to the partial implementation for now):

2023-09-29 11:41:41.078153 [DEBUG] Adding 2 quads, removing 0 quads
2023-09-29 11:41:41.078161 [DEBUG] TileSampler: should acquire tile Quad(3, 7, 4)
2023-09-29 11:41:41.078190 [FATAL] Calling CreateTileTaskGraph::restore()

Investigating… OK, getting rid of the CreateTileTaskGraph completely: I can just use a TaskGraph instead for now.

Next, working my way through more issues… and in the end I disabled the mechanism to save “deleted tasks”: for now I'm just going to recreate them, as it doesn't seem to work fine yet.

And one additional thing I learnt is that the calls to swapchain.Present() must be considered as a critical section along with other queue functions like WriteBuffer for instance.
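
A sketch of what that implies (SpinLockGuard and the _swapchain member are assumptions; _queueSP is the engine spinlock mentioned just below):

void WGPUEngine::present() {
    // Present() shares a critical section with the other queue operations
    // (WriteBuffer, Submit, ...):
    SpinLockGuard guard(_queueSP);
    _swapchain.Present();
}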

And continuing on this path, I also seem to have a synchronization issue with my custom allocation system used in a multithreaded context 🤔 interesting…

Okay so, a lot of tweaking here… and in the process I even had to merge the _devSP and _queueSP spinlock objects in the WGPUEngine, but now (crossing fingers) it seems that the application is starting to get a bit more stable [finally!]

Now it's time to move forward and finally get those elevation tiles to show on our terrain! But this means we need to provide some additional data per quad to render, hmmm 🤔.

First thing: we will obviously need a sampler to sample our tiles, so updating the simple_terrain shader to have:

@group(0) @binding(0) var<uniform> terrain : TerrainInfos;
@group(0) @binding(1) var<storage, read> quads : array<QuadInfos>;
@group(0) @binding(2) var tileSampler: sampler;

Next, how are we going to pass the tile data? This should go into the QuadInfos structure, but we may actually end up with a dynamic structure for those infos, so I think the best option would be to consider this as a list of Vec4f values instead.
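
To illustrate the idea, here is a small self-contained sketch (the field meanings are assumptions) of what one of those Vec4f entries could carry per sampler and per quad:

// Illustration only: for each TileSampler, the per-quad entry packs the tile
// origin inside the storage texture, the array layer holding the tile, and
// the index of the storage texture itself.
struct Vec4f { float x, y, z, w; };

inline auto pack_tile_coords(float tileX, float tileY,
                             unsigned layer, unsigned texIdx) -> Vec4f {
    // The shader reads z/w back as u32 values (with a +0.5 rounding offset,
    // see the sample_tile() function further below):
    return {tileX, tileY, static_cast<float>(layer), static_cast<float>(texIdx)};
}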

And to clarify what data we are going to pass to the shader, we need to provide the list of TileSamplers during the construction of the TerrainNode.

Okay… so here is the first display result I got [After quite a lot of intensive work to be honest… pretty tricky to set all of this up!]:

It's… well, clearly not correct 😅, but still, at least it is displaying some kind of noise! Which is not too bad actually! Now time to investigate how to improve on this, one step at a time 😉!

Actually, the UVs I'm using to sample the tiles already seem wrong to me:

I think the tile xy coords are OK (both at 0.0), but then the scale value I'm applying to the tile should really be set to 1, no 🤔?

var uv: vec2f = coords.xy + uv_in * size.xy;

Okay, so now I'm starting to understand a bit better how this “ds” value is computed in the reference implementation, and indeed it should be set to 1.0 if we have the correct tile (which is always the case for us for the moment), or close to 1.0 since we remove the border pixels in the process.
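
Just to illustrate the idea (this is not the exact reference formula): with a border of a few pixels stripped on each side of a stored tile, the usable fraction of the tile stays just below 1.0:

// Illustration only, not Proland's exact formula: the fraction of a stored
// tile that remains usable once the border pixels are removed.
inline auto usable_tile_fraction(int tileSize, int border) -> float {
    // e.g. tileSize = 101, border = 2  ->  97/101 ~ 0.96
    return static_cast<float>(tileSize - 2 * border) / static_cast<float>(tileSize);
}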

Yet it also seems that I'm simply not taking the parent coarse texture into account when generating the sub tiles: this doesn't seem correct to me, so investigating.

There is still something going totally wrong with the execution of the tasks. I added some checks in the code:

    const auto& t2 = _processTask->add_func([this, t1]() {
        // The task 1 should really be done when we reach this one:
        NVCHK(t1->is_done(), "!!! Scene Task 1 should be done here!");

        auto* eng = WGPUEngine::instance();
        auto& bld = eng->build_commands();
        // render the passes:
        for (auto& gpass : _passes) {
            if (gpass.renderPass != nullptr) {
                bld.execute_render_pass(*gpass.renderPass);
            } else {
                bld.execute_compute_pass(*gpass.computePass);
            }
        }
        NVCHK(t1->is_done(), "!!! Scene Task 1 should be done here (bis)!");
        bld.submit();
    });

And this will produce the following error:

2023-10-06 09:23:22.390689 [DEBUG] TileSampler: should release tile Quad(1, 0, 1)
2023-10-06 09:23:22.390692 [DEBUG] TileSampler: should release tile Quad(1, 1, 0)
2023-10-06 09:23:22.390694 [DEBUG] TileSampler: should release tile Quad(1, 1, 1)
2023-10-06 09:23:22.390697 [DEBUG] TileSampler: should acquire tile Quad(0, 0, 0)
2023-10-06 09:23:22.390700 [DEBUG] Cache elevation_cache tiles: 1 used, 320 reusable, total 1024
2023-10-06 09:23:22.924838 [FATAL] !!! Scene Task 1 should be done here (bis)!
2023-10-06 09:23:22.924851 [FATAL] A fatal error occured

How could that be? Could it be because of the task listeners? I'm thinking maybe I'm adding the same listener multiple times here 😲? [Testing…] Nope: I still seem to be able to reproduce the same kind of issue at another location:

        const auto& t3 = tg->add_func([this, t2]() {
            // logDEBUG("in t3 on {}", get_thread_index());
            NVCHK(t2->is_done(), "!!! Render task 2 should be done here!");
            _guiManager->render_frame();

            NVCHK(t2->is_done(),
                  "!!! Render task 2 should be done here (bis)!");
            auto& bld = build_commands();
            bld.submit(true);

            present();

            post_render();
        });

Will give me:

2023-10-06 09:42:49.348449 [DEBUG] TileSampler: should acquire tile Quad(0, 0, 0)
2023-10-06 09:42:49.348453 [DEBUG] Cache elevation_cache tiles: 1 used, 226 reusable, total 1024
2023-10-06 09:43:40.950019 [DEBUG] Executed 10001 render cycles
2023-10-06 09:44:29.033694 [FATAL] !!! Render task 2 should be done here!
2023-10-06 09:44:29.033706 [FATAL] A fatal error occured

⇒ Let's replace the Task done flag with an atomic value: done… but that doesn't help.

Now thinking about something else: suppose that in a task I submit sub tasks for execution. A request is posted towards the main thread. Next, suppose that the main thread immediately dispatches the sub tasks (which is very possible) and all those sub tasks complete before the work on the parent task is done. In this case, we will get to call parent->notify_done() before that parent task is completed. So the parent task gets marked as “done” before it is effectively completed, not good! 😖. I need to fix that.

To try to resolve this, I'm now introducing another flag in the Task class to check if a given task is currently being processed in a worker thread:

    /** Returns true if this task is being processed. */
    auto is_processing() const -> bool {
        return _processing.load(std::memory_order_acquire);
    }

    /** Set the processing state of this task */
    void set_processing(bool val) {
        _processing.store(val, std::memory_order_release);
    }

⇒ And indeed, this seems to help quite a lot 😲! The app doesn't seem to be crashing with the checks mentioned above now 🥳!

Hmmm, in fact I updated this concept of “processing” further, now storing the id of the worker processing the task, as it seems there may still be cases where 2 threads would start working on the same task:

    /** Set the processing state of this task */
    void set_processor_id(I32 pid) {
        _processing.store(pid, std::memory_order_release);
    }

    /** Get the processor for this task */
    auto get_processor_id() const -> I32 {
        return _processing.load(std::memory_order_acquire);
    }

    /** Returns true if this task is being processed. */
    auto is_processing() const -> bool { return get_processor_id() >= 0; }

Ohh crap: even with this change I eventually get the previous error I reported again:

2023-10-06 11:34:08.594344 [DEBUG] Executed 280001 render cycles
2023-10-06 11:34:13.744227 [DEBUG] Executed 290001 render cycles
2023-10-06 11:34:18.916941 [DEBUG] Executed 300001 render cycles
2023-10-06 11:34:19.673610 [FATAL] !!! Scene Task 1 should be done here (bis)!
2023-10-06 11:34:19.673623 [FATAL] A fatal error occured

my my my… 😢

I have now reduced the number of levels to draw on the terrain to 2, and I can see that we are rendering the 16 quads just as expected.

Okay, I finally found the reason for the flickering of the terrain rendering: this was due to the conversion from f32 to u32 for the texture index and layer index below (we really need to add +0.5 to fix the issue!):

fn sample_tile(uv_in: vec2f, coords: vec4f, size: vec4f) -> vec4f
{
    // The texture sampling code below will use the available tile textures,
    // the tileSampler, 'uv' (vec2f) texture coords, a 'layer' (u32) indicating the 
    // texture array index, and 'texIdx' (u32) indicating the texture number:

    var uv: vec2f = coords.xy + uv_in * size.xy;
    var layer: u32 = u32(coords.z+0.5);
    var texIdx: u32 = u32(coords.w+0.5);

    ${NV_TERRAIN_TILE_TEXTURE_SAMPLING()}

    return vec4f(0.749019,0.25098,0.749019, 1.0);
}

Arrggghh… And with more investigations, I noticed that my root quad size value was set to 0.0. Stupid me 😖!

Okay! So now we finally have the elevation tiles displaying correctly with some noise added at each level yeepee! https://x.com/magik_engineer/status/1710934626921824471?s=20

To achieve this, the first thing I believe I need to do is to provide support to configure the “grid size” when constructing a TerrainNode: this should define how many quads are drawn in both the X and Y dimensions, and thus control the number of vertices drawn.
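
For instance, this could simply become one more field of the terrain creation parameters (the gridSize name is an assumption, mirroring the existing maxLevel/size/shader fields used earlier):

    // Hypothetical extension of the add_terrain() parameters shown earlier:
    auto& tnode = scene.add_terrain({.maxLevel = 7,
                                     .size = 20.0,
                                     .gridSize = 25,   // 25x25 cells per rendered quad
                                     .shader = "tests/sine_wave_terrain"});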
