
NervLand: LLVM JIT - Back to basics

So in our previous post we were playing a bit with push constants, and came to the conclusion that using luajit over C++ to define dynamic behavior only introduced a performance penalty of about 5%. Still, now I really feel like diving back into the LLVM JIT compiler implementation to check what has changed there since the last time I tried that 😁. And who knows, maybe I can easily reclaim that 5% hit in the end ? 😏

  • A good tutorial to get started is the official Building a JIT: Starting out with KaleidoscopeJIT
  • Now checking my previous nvLLVM module in the NervSeed project…
    • The previous NervJIT declaration looked like this:
      namespace nv
      {
      
      struct NervJITImpl;
      
      class NVLLVM_EXPORT NervJIT
      {
      public:
          enum HeaderType {
              HEADER_SYSTEM,
              HEADER_ANGLED,
              HEADER_QUOTED,
          };
      
      private:
          NervJITImpl* impl;
      
      public:
      
          NervJIT();
          ~NervJIT();
      
          void loadModuleFromFiles(const FileList& files) {
              for(auto& f: files) {
                  loadModuleFromFile(f);
              }
          }
      
          void clearMacroDefinitions();
          void addMacroDefinition(std::string val);
          void addMacroDefinition(const std::string& key, const std::string& val) {
              addMacroDefinition(key+"="+val);
          }
          
          void linkModule(const std::string& outFile, const std::vector<std::string>& inputList, bool onlyNeeded, bool internalize, bool optimize, bool preserveUseListOrder);
          
          void clearHeaderSearchPaths();
          void addHeaderSearchPath(std::string path, HeaderType htype);
      
          void generateBitcodeFromFile(const std::string& inputFile, std::string outFile);
          void generateBitcodeFromBuffer(const std::string& buffer, std::string outFile);
      
          void generateObjectFromFile(const std::string& inputFile, std::string outFile);
          void generateObjectFromBuffer(const std::string& buffer, std::string outFile);
          
          void generatePCHFromFile(const std::string& inputFile, std::string outFile);
          void generatePCHFromBuffer(const std::string& buffer, std::string outFile);
          void usePCHFile(std::string pchfile);
      
          void loadModuleBitcode(const std::string& bcfile);
          void loadObject(const std::string& objFile);
      
          void loadModuleFromFile(const std::string& file);
      
          void loadModuleFromBuffer(const std::string& buffer);
          
          uint64_t lookup(const std::string& name);
      
          void addCurrentProcess();
          void addDynamicLib(const char* filename);
          
          void setupCommandLine(const std::vector<std::string>& args);
      
          void setCurrentDylib(const std::string& libname);
          void createDylib(const std::string& libname);
          bool hasDylib(const std::string& libname);
      
      };
      
      };
  • ⇒ So we had quite a lot happening there already apparently.
  • [Oh my… did I actually write this code ? 😅 I can barely understand what it's all about now]
  • ⇒ Maybe I should first re-read my previous posts on this topic lol.
  • Okay ⇒ let's start with a simple module setup for our new version of nvLLVM: OK
  • I'm also preparing a minimal lua test app for this:
    ---@class app.LLVMApp: app.AppBase
    local Class = createClass { name = "LLVMApp", bases = "app.AppBase" }
    
    local jit = import "base.JITCompiler"
    
    function Class:__init(args)
        Class.super.__init(self, args)
    end
    
    function Class:run()
        logDEBUG("Done running app.")
    end
    
    return Class
  • And here is the simplified version of my JITCompiler lua class for now:
    ---@class base.JITCompiler: base.Object
    local Class = createClass { name = "JITCompiler", bases = "base.Object" }
    
    local bm = import "base.BindingsManager"
    
    function Class:__init()
        Class.super.__init(self)
        logDEBUG('Loading luaLLVM module...')
        bm:loadBindings("LLVM", "luaLLVM")
        logDEBUG('Done loading luaLLVM module.')
    
        -- Create the NervJIT engine:
        self.nvjit = nvll.NervJIT()
    end
    
    -- Return an instance of that class:
    return Class()
    
  • ⇒ This will create the NervJIT object as expected and manage the object lifetime correctly. So now time to start extending this a bit.
  • First we get the target machine builder for our host:
    NervJITImpl::NervJITImpl() {
        logDEBUG("Creating NervJITImpl.");
    
        // First we detect the host here:
        auto jtmb = CHECK_LLVM(llvm::orc::JITTargetMachineBuilder::detectHost());
        logDEBUG("Detected CPU: {}", jtmb.getCPU());
        logDEBUG("Detected Triple: {}", jtmb.getTargetTriple().str());
    }
To get this to link we need the libraries: LLVMSupport LLVMOrcJIT LLVMMC
  • And the outputs I get from this are for instance:
    2022-11-17 12:36:58.508021 [DEBUG] Creating NervJIT object.
    2022-11-17 12:36:58.508033 [DEBUG] Creating NervJITImpl.
    2022-11-17 12:36:58.508215 [DEBUG] Detected CPU: ivybridge
    2022-11-17 12:36:58.508223 [DEBUG] Detected Triple: x86_64-pc-windows-msvc
  • So far so good! Let's keep moving.
  • Then, as already discussed in this article, I need to disable the TLS emulation if it is enabled, to avoid issues later with static variables, so let's preemptively do that here too:
        // Resetting emulateTLS to false if needed:
        auto& tgtOpts = jtmb.getOptions();
        if (tgtOpts.EmulatedTLS || tgtOpts.ExplicitEmulatedTLS) {
            tgtOpts.EmulatedTLS = false;
            tgtOpts.ExplicitEmulatedTLS = false;
            logDEBUG("=> Explicitly disabling emulated TLS.");
        }
We need to link to LLVMTarget and LLVMCore to use the TargetMachine and DataLayout classes.
To find which library actually provides a given LLVM class, we simply search the text content of the libs for the expected implementation file: for instance, searching for “DataLayout.cpp” will give us only “LLVMCore.lib” as result ;-)
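  • The corresponding creation step is not shown above, so here is a minimal sketch of it (assuming hypothetical targetMachine and dataLayout members on NervJITImpl):
        // Sketch: build the TargetMachine and derive the DataLayout
        // from the detected host configuration:
        targetMachine = CHECK_LLVM(jtmb.createTargetMachine());
        dataLayout = std::make_unique<llvm::DataLayout>(
            CHECK_LLVM(jtmb.getDefaultDataLayoutForTarget()));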
  • Then we also need to link to the following libs to fix linker issues: LLVMAnalysis LLVMBinaryFormat LLVMRemarks LLVMProfileData LLVMBitstreamReader LLVMDemangle
  • And we also need the zlibstatic library actually.
  • Then I can build the app, but I get an exception:
    2022-11-17 13:17:54.486091 [DEBUG] Detected CPU: ivybridge
    2022-11-17 13:17:54.486106 [DEBUG] Detected Triple: x86_64-pc-windows-msvc
    2022-11-17 13:17:54.486115 [DEBUG] => Explicitly disabling emulated TLS.
    2022-11-17 13:17:54.486119 [DEBUG] Creating target machine...
    2022-11-17 13:17:54.486248 [FATAL] Error in lua app:
    C++ exception
    2022-11-17 13:17:54.486260 [DEBUG] Destroying NervApp...
    2022-11-17 13:17:54.486498 [DEBUG] Destroying all lua states.
    2022-11-17 13:17:54.486511 [INFO] Closing lua state...
  • ⇒ I suppose that could be because so far I didn't even bother to init/uninit LLVM properly 🤣 Let's look into this first.
  • In our previous implementation we were simply initializing LLVM directly from lua before creating the NervJIT instance: so let's just keep it that way:
    function Class:init()
      nv.initLLVM()
    
      self.jit = nv.NervJIT()
      
      -- more stuff here.
    end
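  • For reference, the nv.initLLVM() binding essentially boils down to the standard native-target initialization calls below (a minimal sketch; the actual binding may do a bit more):
    #include <llvm/Support/TargetSelect.h>
    
    // Register the host target plus its ASM printer/parser, so that the
    // JITTargetMachineBuilder/LLJIT can actually emit native code:
    void initLLVM() {
        llvm::InitializeNativeTarget();
        llvm::InitializeNativeTargetAsmPrinter();
        llvm::InitializeNativeTargetAsmParser();
    }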
  • ⇒ Also need to link to LLVMX86Info LLVMX86CodeGen LLVMX86AsmParser LLVMX86TargetMCA now… And in fact, I had to include many more additional LLVM libraries! (doing this one step at a time to ensure I'm not trying to link to non-required libs)
  • In this process I also updated the lua layer to properly handle uninit callbacks for any class like the following:
    function Class:__init()
        Class.super.__init(self)
        logTRACE('Loading luaLLVM module...')
        bm:loadBindings("LLVM", "luaLLVM")
        logTRACE('Done loading luaLLVM module.')
    
        -- Initialize:
        self:init()
    
        -- Register uninit callback:
        utils.on_uninit(function() self:uninit() end)
    end
  • ⇒ The uninit callbacks will then be executed when we close the lua state in the uninit.lua file:
    nvLogTRACE("Running uninit.lua")
    
    local utils = import "base.utils"
    utils.execute_uninit_callbacks()
    
  • As expected, when initializing LLVM before creating the “TargetMachine” in NervJIT, I don't experience the previous crash anymore 👍!
  • Then I was trying to restore the usage of the SingleSectionMemoryManager + SEHFrameHandler, but I'm facing an issue with the compilation of the call to the _CxxThrowException method:
    _CxxThrowException(CxxExcept, (_ThrowInfo*)Info);
  • ⇒ I can't seem to find any definition of the _ThrowInfo struct: so wondering if this is still available somehow 🤔?
  • Anyway, let's start simple: I will just bypass the setObjectLinkingLayerCreator call for now, and proceed with creating the lljit object:
        logDEBUG("Creating LLJIT object.");
        llvm::orc::LLJITBuilder llb;
        llb.setJITTargetMachineBuilder(std::move(jtmb)).setNumCompileThreads(2);
    #if 0
        // We use our custom memory manager here:
        llb.setObjectLinkingLayerCreator(
            [](llvm::orc::ExecutionSession& ES, const llvm::Triple& triple)
                -> std::unique_ptr<llvm::orc::ObjectLayer> {
                auto GetMemMgr = []() {
                    return std::make_unique<SingleSectionMemoryManager>();
                };
    
                auto ObjLinkingLayer =
                    std::make_unique<llvm::orc::RTDyldObjectLinkingLayer>(
                        ES, std::move(GetMemMgr));
    
            // Not sure this is needed/appropriate?
                if (triple.isOSBinFormatCOFF()) {
                    ObjLinkingLayer->setOverrideObjectFlagsWithResponsibilityFlags(
                        true);
                    ObjLinkingLayer->setAutoClaimResponsibilityForObjectSymbols(
                        true);
                }
    
                return {std::move(ObjLinkingLayer)};
            });
    #endif
    
        lljit = CHECK_LLVM(llb.create());
  • OK, with that code I get the lljit constructed successfully 👍.
  • Next we get a reference to the main JIT dylib, and with that, it also means we need to handle setting up and then uninitializing that dylib (mainly to support potential calls to atexit() currently ?):
    NervJITImpl::~NervJITImpl() {
        logDEBUG("Destroying NervJITImpl.");
    
        // We should start with running the at exit callbacks:
        run_exit_callbacks();
    
        while (!dylibList.empty()) {
            std::string libname = dylibList.back();
            dylibList.pop_back();
            logDEBUG("Uninitializing dylib {}...", libname);
            checkLLVMError(lljit->deinitialize(*lljit->getJITDylibByName(libname)));
        }
    }
    
    void NervJITImpl::setup_dylib(llvm::orc::JITDylib& JD) const {
    
        llvm::orc::SymbolMap RuntimeInterposes;
    
        // But we still need to manually take care of the atexit function itself:
        RuntimeInterposes[(*mangler)("atexit")] = llvm::JITEvaluatedSymbol(
            toTargetAddress(&at_exit_override), llvm::JITSymbolFlags::Exported);
    
        checkLLVMError(JD.define(absoluteSymbols(std::move(RuntimeInterposes))));
    }
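  • The at_exit_override() and run_exit_callbacks() helpers used above are not detailed here, but a minimal sketch of them (assuming a simple global callback stack executed in reverse registration order, like the CRT does) could be:
    #include <vector>
    
    // Hypothetical sketch: atexit() calls from JIT'd code are redirected
    // here, and the collected callbacks are run in reverse order on shutdown.
    static std::vector<void (*)()> gExitCallbacks;
    
    extern "C" auto at_exit_override(void (*func)()) -> int {
        gExitCallbacks.push_back(func);
        return 0;
    }
    
    static void run_exit_callbacks() {
        while (!gExitCallbacks.empty()) {
            auto* func = gExitCallbacks.back();
            gExitCallbacks.pop_back();
            func();
        }
    }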
  • With the previous stage I'm building some kind of “draft JIT” engine inside the NervJITImpl container class. But then I had a look at the updated content of llvm/tools/lli/lli.cpp and just realized we have a pretty advanced implementation of a JIT engine (optionally using an Orc Lazy JIT version as a top layer) in there, so I should probably give this a more careful look and try to use it as a reference.
  • The only significant thing is that the lli tool expects IR (ie. bitcode .bc files) as inputs when creating the modules: that's where we need an additional stage internally: we will use clang to compile C++ source files to IR bitcode (with storage either on disk or just in memory ?), and then we can process that bitcode to create the modules to be added to the JIT, right ?
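  • In other words, the core of that additional stage would look something like the sketch below (assuming the lljit member, plus a hypothetical tsContext ThreadSafeContext member on NervJITImpl):
        // Sketch of the bitcode file -> module -> JIT path:
        auto buf = CHECK_LLVM(llvm::errorOrToExpected(
            llvm::MemoryBuffer::getFile(bcFile)));
        auto mod = CHECK_LLVM(llvm::parseBitcodeFile(
            buf->getMemBufferRef(), *tsContext.getContext()));
        checkLLVMError(lljit->addIRModule(
            llvm::orc::ThreadSafeModule(std::move(mod), tsContext)));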
  • So as a first step I should maybe rather focus on the IR compilation stage for the moment… ⇒ So let's prepare a dedicated CXXCompiler class for that.
  • Initial setup of the CXXCompiler is as follows:
    CXXCompiler::CXXCompiler(const llvm::Triple& triple) {
        logDEBUG("Creating CXX Compiler.");
        diagnosticOptions = new clang::DiagnosticOptions;
    
        textDiagnosticPrinter = std::make_unique<clang::TextDiagnosticPrinter>(
            llvm::outs(), diagnosticOptions.get());
    
        // The diagnostic engine should not own the client below (or it could if we
        // release our unique_ptr.)
        diagnosticsEngine = new clang::DiagnosticsEngine(
            diagIDs, diagnosticOptions, textDiagnosticPrinter.get(), false);
    
        compilerInstance = std::make_unique<clang::CompilerInstance>();
        auto& compilerInvocation = compilerInstance->getInvocation();
    
        auto& targetOptions = compilerInvocation.getTargetOpts();
        auto& frontEndOptions = compilerInvocation.getFrontendOpts();
        frontEndOptions.ShowStats = false;
    
        auto& headerSearchOptions = compilerInvocation.getHeaderSearchOpts();
        headerSearchOptions.Verbose = false;
        headerSearchOptions.UserEntries.clear();
    
        // targetOptions.Triple = llvm::sys::getDefaultTargetTriple();
        targetOptions.Triple = triple.str();
        compilerInstance->createDiagnostics(textDiagnosticPrinter.get(), false);
    }
  • Note: In the code above, I decided to use the triple detected for my current host when building the NervJITImpl as argument for the CXXCompiler, instead of using the getDefaultTargetTriple() as I was doing in my previous implementation: hopefully this will not change much 🤔?
  • Then I have a few “actions” defined in the previous NervJIT implementation, but I'm not quite sure yet which ones I will actually need, so for the moment, I'm not including any of them. ⇒ Let's just see how I was compiling the files from Lua before.
  • First compilation stage I see in the previous JITCompiler implementation is the following:
      -- We try to generate the simple PCH file:
      local pchfile = self:getPCHForBuffer([[
    #include <core_lua.h>
    #include <lua/LuaManager.h>
    #include <NervApp.h>
    #include <view/WindowManager.h>
    
    using namespace nv;
      ]])
    
      local startTime = nv.SystemTime.getCurrentTime()
      logDEBUG("JIT: Loading Lua extensions...")
      self:usePCHFile(pchfile)
    
      self:loadBitcodeForFile(self.script_dir.."lua_base_extensions.cpp")
    
      -- self.jit:generateBitcodeFromFile(self.script_dir.."lua_base_extensions.cpp", self.bc_dir.."lua_base_extensions.bc")
      -- self.jit:loadModuleBitcode(self.bc_dir.."lua_base_extensions.bc")
      
      -- self.jit:loadModuleFromFile(self.script_dir.."lua_base_extensions.cpp")
      local endTime = nv.SystemTime.getCurrentTime()
      logDEBUG(string.format("Script compiled in %.3fms", (endTime - startTime)*1000.0))
    
      self.jit:call("loadLuaBaseExtensions")
  • ⇒ So let's try to replicate that 😊!
  • For now I simply create a minimal “hello world” test using our LogManager to output a debug message as follows:
    #include <core_common.h>
    
    using namespace nv;
    
    extern "C" void helloWorld() { logDEBUG("Hello world from JIT function."); };
    
  • Okay, continuing with the implementation, next I had to add a couple of functions in the CXXCompiler to actually generate bitcode from a given input buffer:
    void CXXCompiler::set_input_buffer(llvm::MemoryBuffer* buf) const {
        auto& compilerInvocation = compilerInstance->getInvocation();
        auto& frontEndOptions = compilerInvocation.getFrontendOpts();
        frontEndOptions.Inputs.clear();
    
        llvm::MemoryBufferRef rbuf(*buf);
    
        frontEndOptions.Inputs.push_back(
            clang::FrontendInputFile(rbuf, clang::InputKind(clang::Language::CXX)));
    }
    
    void CXXCompiler::generate_bitcode(const char* out_file) const {
        auto& compilerInvocation = compilerInstance->getInvocation();
        auto& frontEndOptions = compilerInvocation.getFrontendOpts();
        std::string prevFile = std::move(frontEndOptions.OutputFile);
        frontEndOptions.OutputFile = out_file;
    
        // keep a copy of the current program action:
        auto prevAction = frontEndOptions.ProgramAction;
        frontEndOptions.ProgramAction = clang::frontend::EmitBC;
    
        if (!compilerInstance->ExecuteAction(*emit_bc_action)) {
            logERROR("Failed executing emit_bc_action with compiler instance!");
        }
    
        // Restore the previous values:
        frontEndOptions.OutputFile = std::move(prevFile);
        frontEndOptions.ProgramAction = prevAction;
    }
I'm not quite sure we need the “frontEndOptions.ProgramAction” setup above ? Maybe we could simply bypass that.
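  • Usage-wise, those two functions combine into something like this (just a sketch, with a hypothetical compiler instance and sourceCode buffer; the real code goes through the caching layer described further below):
        // Hypothetical usage: compile an in-memory C++ buffer to a bitcode file.
        auto buf = llvm::MemoryBuffer::getMemBuffer(sourceCode, "jit_input.cpp");
        compiler.set_input_buffer(buf.get());
        compiler.generate_bitcode("compiled/jit_input.bc");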
  • And now running my first compilation test in NervLand ✌️! With this simple command:
    function Class:run()
        logDEBUG("Running simple compilation...")
        jit:buildBitcodeForFile("tests/hello_world.cpp")
        logDEBUG("Done running app.")
    end
    
  • ⇒ This is failing of course, but interestingly I already have useful outputs on the reason why it's failing:
    2022-11-19 22:36:25.073421 [DEBUG] Running simple compilation...
    2022-11-19 22:36:25.073946 [DEBUG] Generating bytecode file D:/Projects/NervLand/dist/compiled/3777ff771e2dffff7aff657249ffff550550ff6932ffffff60ff79ff38ffffffffffb4.bc...
    :1:10: fatal error: 'core_common.h' file not found
    #include <core_common.h>
             ^~~~~~~~~~~~~~~
    1 error generated.
    2022-11-19 22:36:25.083238 [ERROR] Failed executing emit_bc_action with compiler instance!
    2022-11-19 22:36:25.083310 [DEBUG] Bitcode compiled in 9.325ms
    2022-11-19 22:36:25.083328 [DEBUG] Done running app.
  • As shown above, we need to set up the include header paths, macros, etc. correctly now to proceed with the compilation stage, so let's do just that.
  • ⇒ So I now have a few additional helper functions on the NervJIT facade accessing the CXXCompiler under the hood:
        void setup_command_line(const nv::StringList& args);
        void clear_header_search_paths();
        void add_header_search_path(const char* path, HeaderType htype);
        void clear_macro_definitions();
        void add_macro_definition(const char* macro);
  • Nothing too fancy to report on those functions, and for the lua part, I mostly just took the existing implementation I already had:
    -- Method used to register new lists:
    function Class:registerCommandLineArgList(clName, args)
        self.cl_args_lists[clName] = args
    end
    
    function Class:registerHeaderPathList(name, list)
        self.header_paths_lists[name] = list
    end
    
    function Class:registerMacroDefList(name, list)
        self.macro_defs_lists[name] = list
    end
    
    -- Setup usage of given lists:
    function Class:useCommandLineArgLists(listNames, keep)
        if not keep then
            self.current_cl_args = {}
        end
    
        for _, lname in ipairs(listNames) do
            local plist = self.cl_args_lists[lname]
            for _, entry in ipairs(plist) do
                table.insert(self.current_cl_args, entry)
            end
        end
    
        self.jit:setup_command_line(self.current_cl_args)
    
        logDEBUG("Done setting up command line args.")
        -- This will anyway clear the other settings:
        self.current_header_paths = {}
        self.current_macro_defs = {}
    
        self:updateCommandLineArgsHash()
        self:updateHeaderPathsHash()
        self:updateMacroDefsHash()
    end
    
    function Class:useHeaderPathLists(listNames, keep)
        if not keep then
            -- clear the previous list of header search paths:
            self.jit:clear_header_search_paths();
            self.current_header_paths = {}
        end
    
        for _, lname in ipairs(listNames) do
            local plist = self.header_paths_lists[lname]
            for _, entry in ipairs(plist) do
                table.insert(self.current_header_paths, entry[1])
                self.jit:add_header_search_path(entry[1], entry[2]);
            end
        end
    
        -- We now generate the hash:
        self:updateHeaderPathsHash()
    end
    
    function Class:useMacroDefLists(listNames, keep)
        if not keep then
            -- clear the previous list of header search paths:
            self.jit:clear_macro_definitions();
            self.current_macro_defs = {}
        end
    
        for _, lname in ipairs(listNames) do
            local plist = self.macro_defs_lists[lname]
            for _, dval in ipairs(plist) do
                table.insert(self.current_macro_defs, dval)
                self.jit:add_macro_definition(dval)
            end
        end
    
        -- We now generate the hash:
        self:updateMacroDefsHash()
    end
    
    function Class:addHeaderPaths(paths, ptype)
        ptype = ptype or nvll.HeaderType.ANGLED
    
        for _, path in ipairs(paths) do
            table.insert(self.current_header_paths, path)
            self.jit:add_header_search_path(path, ptype);
        end
    
        -- We now generate the hash:
        self:updateHeaderPathsHash()
    end
    
    function Class:addMacroDefs(dvals)
        for _, dval in ipairs(dvals) do
            table.insert(self.current_macro_defs, dval)
            self.jit:add_macro_definition(dval)
        end
    
        -- We now generate the hash:
        self:updateMacroDefsHash()
    end
    
    -- Methods used to generate a hash value from a string list:
    function Class:computeStringListHash(list)
        return nv.SHA256.from_buffer(table.concat(list, " "))
    end
    
    function Class:updateContextHash()
        self.current_context_hash = self.cl_args_hash .. self.header_paths_hash .. self.macro_defs_hash .. self.pch_file_hash
        -- logDEBUG("Current context hash is: ", self.current_context_hash)
    end
    
    function Class:updatePCHHash()
        self.pch_file_hash = nv.SHA256.from_buffer(self.current_pch_file)
        self:updateContextHash()
    end
    
    function Class:updateMacroDefsHash()
        self.macro_defs_hash = self:computeStringListHash(self.current_macro_defs)
        self:updateContextHash()
    end
    
    function Class:updateHeaderPathsHash()
        self.header_paths_hash = self:computeStringListHash(self.current_header_paths)
        self:updateContextHash()
    end
    
    function Class:updateCommandLineArgsHash()
        self.cl_args_hash = self:computeStringListHash(self.current_cl_args)
        self:updateContextHash()
    end
    
    -- Get the hash for a given buffer in the current compilation context:
    function Class:getContextualBufferHash(buf)
        local hash = nv.SHA256.from_buffer(buf)
        return nv.SHA256.from_buffer(self.current_context_hash .. hash)
    end
Note the SHA256 hashing mechanism used to check if the bitcode for a given source buffer should be recompiled or not: this is needed as the actual compilation itself still takes a fair bit of time in fact.
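  • Conceptually, the resulting cache check looks like this on the native side (a sketch with hypothetical helper names, the real logic being split between Lua and C++):
        // Hypothetical sketch: recompile only if no bitcode file exists yet
        // for the (compilation context, source buffer) hash pair:
        std::string hash = compute_contextual_hash(buffer); // hypothetical
        std::string bcFile = compiledDir + hash + ".bc";
        if (!file_exists(bcFile)) { // hypothetical
            compiler.set_input_buffer(buf.get());
            compiler.generate_bitcode(bcFile.c_str());
        }
        load_bitcode_file(bcFile.c_str()); // hypothetical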
  • ⇒ And with this in place we could compile our first source file successfully (ie. the “hello world” cpp file mentioned above):
    2022-11-22 15:51:50.429210 [DEBUG] Lua: Done creating NervJIT.
    2022-11-22 15:51:50.429489 [DEBUG] Parsed args: {}
    2022-11-22 15:51:50.429502 [DEBUG] Running simple compilation...
    2022-11-22 15:51:50.434819 [DEBUG] Generating bytecode file D:/Projects/NervLand/dist/compiled/84b9b5238dba0d3c71dbe14e726fb5a4a6e477b34731e6b957ab6933d49e2e07.bc...
    2022-11-22 15:51:53.347740 [DEBUG] Bitcode compiled in 2912.878ms
    2022-11-22 15:51:53.347779 [DEBUG] Done running app.
    2022-11-22 15:51:53.347790 [DEBUG] Destroying NervApp...
  • Next, before trying to load our freshly compiled bitcode, let's use the lli.cpp file as a reference to update our implementation of the LLJIT.
  • ⇒ This is mostly implemented in the runOrcJIT() method.
  • First thing we need is a ThreadSafeContext I think:
      // Parse the main module.
      orc::ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());
      auto MainModule = ExitOnErr(loadModule(InputFile, TSCtx));
    
      // Get TargetTriple and DataLayout from the main module if they're explicitly
      // set.
      Optional<Triple> TT;
      Optional<DataLayout> DL;
      MainModule.withModuleDo([&](Module &M) {
          if (!M.getTargetTriple().empty())
            TT = Triple(M.getTargetTriple());
          if (!M.getDataLayout().isDefault())
            DL = M.getDataLayout();
        });
  • Note: we should not need to retrieve the Triple and DataLayout this way ourselves since we already configure these with a different mechanism (before compiling any module in fact).
  • Note: For the moment I'm not using an Object CacheManager, so I'm not using that part from the reference implementation (to be considered later maybe ?):
      // If the object cache is enabled then set a custom compile function
      // creator to use the cache.
      std::unique_ptr<LLIObjectCache> CacheManager;
      if (EnableCacheManager) {
    
        CacheManager = std::make_unique<LLIObjectCache>(ObjectCacheDir);
    
        Builder.setCompileFunctionCreator(
          [&](orc::JITTargetMachineBuilder JTMB)
                -> Expected<std::unique_ptr<orc::IRCompileLayer::IRCompiler>> {
            if (LazyJITCompileThreads > 0)
              return std::make_unique<orc::ConcurrentIRCompiler>(std::move(JTMB),
                                                            CacheManager.get());
    
            auto TM = JTMB.createTargetMachine();
            if (!TM)
              return TM.takeError();
    
            return std::make_unique<orc::TMOwningSimpleCompiler>(std::move(*TM),
                                                            CacheManager.get());
          });
      }
  • We will use the JITLink linker, so we use that part:
      std::unique_ptr<orc::ExecutorProcessControl> EPC = nullptr;
      if (JITLinker == JITLinkerKind::JITLink) {
        EPC = ExitOnErr(orc::SelfExecutorProcessControl::Create(
            std::make_shared<orc::SymbolStringPool>()));
    
        Builder.setObjectLinkingLayerCreator([&EPC](orc::ExecutionSession &ES,
                                                    const Triple &) {
          auto L = std::make_unique<orc::ObjectLinkingLayer>(ES, EPC->getMemMgr());
          L->addPlugin(std::make_unique<orc::EHFrameRegistrationPlugin>(
              ES, ExitOnErr(orc::EPCEHFrameRegistrar::Create(ES))));
          L->addPlugin(std::make_unique<orc::DebugObjectManagerPlugin>(
              ES, ExitOnErr(orc::createJITLoaderGDBRegistrar(ES))));
          return L;
        });
      }
  • Also, we keep the per-function lazy compilation instead of per-module, so we won't use that part:
      if (PerModuleLazy)
        J->setPartitionFunction(orc::CompileOnDemandLayer::compileWholeModule);
  • Finally we end the setup with the preparation of the main Dylib:
    void NervJITImpl::setup_dylib(llvm::orc::JITDylib& JD) const {
    
        // make process symbols available to JIT'd code:
        JD.addGenerator(CHECK_LLVM(
            llvm::orc::DynamicLibrarySearchGenerator::GetForCurrentProcess(
                lljit->getDataLayout().getGlobalPrefix(),
                [MainName =
                     (*mangler)("main")](const llvm::orc::SymbolStringPtr& Name) {
                    return Name != MainName;
                })));
    
        // Not sure we need the following anymore ?
        llvm::orc::SymbolMap RuntimeInterposes;
    
        // But we still need to manually take care of the atexit function itself:
        RuntimeInterposes[(*mangler)("atexit")] = llvm::JITEvaluatedSymbol(
            toTargetAddress(&at_exit_override), llvm::JITSymbolFlags::Exported);
    
        checkLLVMError(JD.define(absoluteSymbols(std::move(RuntimeInterposes))));
    }
  • Next step is to consider the AddModule function:
      // Regular modules are greedy: They materialize as a whole and trigger
      // materialization for all required symbols recursively. Lazy modules go
      // through partitioning and they replace outgoing calls with reexport stubs
      // that resolve on call-through.
      auto AddModule = [&](orc::JITDylib &JD, orc::ThreadSafeModule M) {
        return UseJITKind == JITKind::OrcLazy ? J->addLazyIRModule(JD, std::move(M))
                                              : J->addIRModule(JD, std::move(M));
      };
    
  • ⇒ I guess I could turn this into a member function on my side. OK
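  • Something along those lines, assuming a hypothetical useLazyJIT flag and that the lljit member really holds an LLLazyJIT when lazy mode is enabled (a sketch, not the final code):
    void NervJITImpl::add_module(llvm::orc::JITDylib& JD,
                                 llvm::orc::ThreadSafeModule M) const {
        // Lazy modules resolve on call-through, regular modules
        // materialize as a whole:
        if (useLazyJIT) {
            checkLLVMError(static_cast<llvm::orc::LLLazyJIT*>(lljit.get())
                               ->addLazyIRModule(JD, std::move(M)));
        } else {
            checkLLVMError(lljit->addIRModule(JD, std::move(M)));
        }
    }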
  • Also, I'm disregarding the “thread entry points” for now, but eventually this could also be something to consider:
      // Run any -thread-entry points.
      std::vector<std::thread> AltEntryThreads;
      for (auto &ThreadEntryPoint : ThreadEntryPoints) {
        auto EntryPointSym = ExitOnErr(J->lookup(ThreadEntryPoint));
        typedef void (*EntryPointPtr)();
        auto EntryPoint = reinterpret_cast<EntryPointPtr>(
            static_cast<uintptr_t>(EntryPointSym.getAddress()));
        AltEntryThreads.push_back(std::thread([EntryPoint]() { EntryPoint(); }));
      }
  • Okay, so first thing, with the updated implementation above I could not load my minimal hello world example because apparently there is a missing symbol from the fmt library:
    2022-11-23 08:19:33.801336 [DEBUG] Loading module from tests/hello_world_v1.cpp
    2022-11-23 08:19:33.801716 [DEBUG] Generating bytecode file D:/Projects/NervLand/dist/compiled/5e4e0b1a6312893612634b3515416b4efc2545b760e8f6cdbf1dbb60b6d685b2.bc...
    2022-11-23 08:19:36.480648 [DEBUG] Bitcode compiled in 2678.904ms
    2022-11-23 08:19:36.496620 [DEBUG] Module dumped functions: [ __orc_init_func.D:/Projects/NervLand/dist/compiled/5e4e0b1a6312893612634b3515416b4efc2545b760e8f6cdbf1dbb60b6d685b2.bc.submodule.0x6e951aa21ea83511.ll ]
    LLVM ERROR: Associative COMDAT symbol '??_7?$basic_memory_buffer@D$0BPE@V?$allocator@D@std@@@v9@fmt@@6B@' is not a key for its COMDAT.


  • I will come back to this point soon but first I decided I should try with an even simpler program using just std::cout:
    #include <iostream>
    
    extern "C" void helloWorld() {
        // Simple hello world output:
        std::cout << "Hello world from JIT function." << std::endl;
    };
    
  • And with this version, the module loading process seems to work! But then I face an error apparently related to the JITLink library (?) when uninitializing the main dylib:
    2022-11-23 08:28:09.322860 [DEBUG] Destroying NervJITImpl.
    2022-11-23 08:28:09.322896 [DEBUG] Uninitializing dylib main...
    2022-11-23 08:28:09.323587 [DEBUG] Module dumped functions: [ __lljit_run_atexits ]
    JIT session error: Unsupported file format


  • ⇒ So, checking the LLVM sources, I could notice that in version 14.0.6 we don't seem to have support for the COFF format in JITLink, but in version 15.0.4 we do:
    void link(std::unique_ptr<LinkGraph> G, std::unique_ptr<JITLinkContext> Ctx) {
      switch (G->getTargetTriple().getObjectFormat()) {
      case Triple::MachO:
        return link_MachO(std::move(G), std::move(Ctx));
      case Triple::ELF:
        return link_ELF(std::move(G), std::move(Ctx));
      case Triple::COFF:
        return link_COFF(std::move(G), std::move(Ctx));
      default:
        Ctx->notifyFailed(make_error<JITLinkError>("Unsupported object format"));
      };
    }
  • ⇒ So time for an LLVM upgrade I guess ? ;-) ⇒ So I updated my LLVM library settings:

  - name: LLVM
    version: 15.0.4
    extracted_dir: llvm-project-llvmorg-15.0.4
    windows_url: https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-15.0.4.zip
    linux_url: https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-15.0.4.tar.gz
    # version: 14.0.6
    # extracted_dir: llvm-project-llvmorg-14.0.6
    # windows_url: https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-14.0.6.zip
    # linux_url: https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-14.0.6.tar.gz

  • And now requesting the build of that dependency:
    .\nvp.bat build libs llvm -c msvc
  • ⇒ Compilation for MSVC went just fine in about 2000 seconds:
    -- Install configuration: "Release"
    -- Installing: D:/Projects/NervProj/libraries/windows_msvc/LLVM-15.0.4/lib/c++.lib
    -- Installing: D:/Projects/NervProj/libraries/windows_msvc/LLVM-15.0.4/bin/c++.dll
    -- Installing: D:/Projects/NervProj/libraries/windows_msvc/LLVM-15.0.4/lib/libc++.lib
    -- Installing: D:/Projects/NervProj/libraries/windows_msvc/LLVM-15.0.4/lib/libc++experimental.lib
    2022/11/23 10:21:05 [nvp.core.build_manager] INFO: Removing build folder D:\Projects\NervProj\libraries\build\LLVM-15.0.4
    2022/11/23 10:21:14 [nvp.core.build_manager] INFO: Done building LLVM-15.0.4 (build time: 2067.87 seconds)
  • Now same thing for the clang version:
    .\nvp.bat build libs llvm -c clang
  • Here also, compilation went just fine in about 2400 seconds:
    -- Install configuration: "Release"
    -- Installing: D:/Projects/NervProj/libraries/windows_clang/LLVM-15.0.4/lib/c++.lib
    -- Installing: D:/Projects/NervProj/libraries/windows_clang/LLVM-15.0.4/bin/c++.dll
    -- Installing: D:/Projects/NervProj/libraries/windows_clang/LLVM-15.0.4/lib/libc++.lib
    -- Installing: D:/Projects/NervProj/libraries/windows_clang/LLVM-15.0.4/lib/libc++experimental.lib
    2022/11/23 12:38:21 [nvp.core.build_manager] INFO: Removing build folder D:\Projects\NervProj\libraries\build\LLVM-15.0.4
    2022/11/23 12:38:30 [nvp.core.build_manager] INFO: Done building LLVM-15.0.4 (build time: 2421.34 seconds)
  • Next, let's integrate this new version in the NervLand project: arrfff, of course this is not working just out of the box for the clang lua module 😅, so let's update the files properly first: OK
  • So I updated the luaClang files, then made a few minor changes on the nvLLVM module, then had some fight with the linking to the new LLVM libraries, and finally I got everything to compile fine 👍!
  • Next stop, trying to run the lua vkapp… but that's failing because the vulkan validation layers are not available on Ragnarok lol, I guess I should install them: OK, got the vkapp to run (except that I get a crash on close, but let's not worry about that for now) (I get more than 6000fps by the way on the RTX 3090 :-), that's nice!)
  • And then checking the llvmapp… Again, I can compile the module file successfully, but then I get a “relocation error” on dylib deinit this time:
    2022-11-23 21:51:29.911254 [DEBUG] Lua: Uninitializing JIT Compiler...
    2022-11-23 21:51:29.911512 [DEBUG] Destroying NervJIT object.
    2022-11-23 21:51:29.911521 [DEBUG] Destroying CXX Compiler.
    2022-11-23 21:51:29.911557 [DEBUG] Destroying NervJITImpl.
    2022-11-23 21:51:29.911561 [DEBUG] Uninitializing dylib main...
    2022-11-23 21:51:29.912430 [DEBUG] Module dumped functions: [ __lljit_run_atexits atexit ]
    JIT session error: Unsupported x86_64 relocation:1


  • ⇒ I shall investigate this further. But first let's ensure that the lua bindings I'm generating with the updated luaClang module are still the same: OK not a single change ;-).
  • Note: But here also, I get a crash when closing the lua state, hmmm… 🤔 And same thing when just running the sdlapp ⇒ maybe I should rebuild boost/luajit with clang 15 then.
  • Rebuilding luaJIT with clang 15 with the command:
    nvp.bat build libs luajit -c clang --rebuild
  • ⇒ Arrgg… still crashing… OOhhh… lol, actually it's also crashing from the LLVM 14 branch, so I guess I messed something up… investigating. ⇒ Fixed [Was due to the uninit callbacks unloading the luaCore module apparently]
To keep compatibility between ragnarok/saturn builds I also had to upgrade to VS2022 version 17.4.1 on both sides
  • And to be able to compile my CPP files on both systems I'm using the following list of header paths now:
        self:registerHeaderPathList("default", {
            { "D:/Projects/NervProj/libraries/windows_clang/LLVM-15.0.4/lib/clang/15.0.4/include", nvll.HeaderType.SYSTEM },
            { "D:/Softs/VisualStudio2022CE/VC/Tools/MSVC/14.34.31933/include", nvll.HeaderType.SYSTEM },
            { "D:/Softs/VisualStudio2022CE/VC/Tools/MSVC/14.34.31933/atlmfc/include", nvll.HeaderType.SYSTEM },
            { "D:/Windows Kits/10/Include/10.0.19041.0/ucrt", nvll.HeaderType.SYSTEM },
            { "D:/Windows Kits/10/Include/10.0.19041.0/shared", nvll.HeaderType.SYSTEM },
            { "D:/Windows Kits/10/Include/10.0.19041.0/um", nvll.HeaderType.SYSTEM },
            { "D:/Windows Kits/10/Include/10.0.19041.0/winrt", nvll.HeaderType.SYSTEM },
            { "D:/Projects/NervProj/libraries/windows_clang/LLVM-15.0.4/lib/clang/15.0.4/include", nvll.HeaderType.SYSTEM },
            { "D:/Softs/VisualStudio/VS2022/VC/Tools/MSVC/14.34.31933/include", nvll.HeaderType.SYSTEM },
            { "D:/Softs/VisualStudio/VS2022/VC/Tools/MSVC/14.34.31933/atlmfc/include", nvll.HeaderType.SYSTEM },
            { "C:/Program Files (x86)/Windows Kits/10/Include/10.0.19041.0/ucrt", nvll.HeaderType.SYSTEM },
            { "C:/Program Files (x86)/Windows Kits/10/Include/10.0.19041.0/shared", nvll.HeaderType.SYSTEM },
            { "C:/Program Files (x86)/Windows Kits/10/Include/10.0.19041.0/um", nvll.HeaderType.SYSTEM },
            { "C:/Program Files (x86)/Windows Kits/10/Include/10.0.19041.0/winrt", nvll.HeaderType.SYSTEM },
        })
  • Now back at the problem at hand: we can compile our module, but when unloading the main dylib we get this error message:
    2022-11-25 11:39:38.637546 [DEBUG] Lua: Uninitializing JIT Compiler...
    2022-11-25 11:39:38.637961 [DEBUG] Destroying NervJIT object.
    2022-11-25 11:39:38.637975 [DEBUG] Destroying CXX Compiler.
    2022-11-25 11:39:38.641635 [DEBUG] Destroying NervJITImpl.
    2022-11-25 11:39:38.641667 [DEBUG] Uninitializing dylib main...
    2022-11-25 11:39:38.642465 [DEBUG] Module dumped functions: [ __lljit_run_atexits atexit ]
    JIT session error: Unsupported x86_64 relocation:1
  • So what could that be 🤔? Let's check the lli.cpp version again: OK, nothing special from there.
  • The error message string comes from JITLink/COFF_x86_64.cpp, and the relocation type we get is “1”, so what's the mapping for that ? ⇒ This corresponds to IMAGE_REL_AMD64_ADDR64
  • ⇒ I'm not quite sure what to do with this, but I believe the problem comes from JITLink, so let's try to use the RuntimeDyld linker instead.
  • Cool: no crash when I disable the usage of JITLink, by simply disabling this code section:
    #if 0
        logDEBUG("Creating execution process control");
        // Prepare the JITLinker:
        executionProcessControl =
            CHECK_LLVM(llvm::orc::SelfExecutorProcessControl::Create(
                std::make_shared<llvm::orc::SymbolStringPool>()));
    
        logDEBUG("Setting up object linking layer creator.");
        llb.setObjectLinkingLayerCreator(
            [this](llvm::orc::ExecutionSession& ES, const llvm::Triple&) {
                auto L = std::make_unique<llvm::orc::ObjectLinkingLayer>(
                    ES, executionProcessControl->getMemMgr());
    
                // L->addPlugin(std::make_unique<llvm::orc::EHFrameRegistrationPlugin>(
                //     ES, CHECK_LLVM(llvm::orc::EPCEHFrameRegistrar::Create(ES))));
                // L->addPlugin(std::make_unique<llvm::orc::DebugObjectManagerPlugin>(
                //     ES, CHECK_LLVM(llvm::orc::createJITLoaderGDBRegistrar(ES))));
                return L;
            });
    #endif
  • I added a call method in the NervJIT impl class:
    void NervJITImpl::call(const char* fname) const {
        auto funcAddr = CHECK_LLVM(lljit->lookup(fname));
    
        using func_t = void();
        auto func = funcAddr.toPtr<func_t*>();
        CHECK(func != nullptr, "Invalid pointer for {}", fname);
    
        try {
            func();
        } catch (const std::exception& e) {
            logERROR("Exception catched from JIT code: {}", e.what());
        } catch (...) {
            logERROR("Unknown exception catched from JIT code.");
        }
    }
  • And then trying to call that from lua, but of course, this is not working and will instead freeze the whole process with those final outputs:
    2022-11-25 14:26:47.113095 [DEBUG] JIT: Setting up macro definitions...
    2022-11-25 14:26:47.113163 [DEBUG] Lua: Done creating NervJIT.
    2022-11-25 14:26:47.113775 [DEBUG] Parsed args: {}
    2022-11-25 14:26:47.113799 [DEBUG] Loading module from tests/hello_world_v0.cpp
    2022-11-25 14:26:47.121210 [DEBUG] Module dumped functions: [ helloWorld ]
    2022-11-25 14:26:47.128711 [DEBUG] Module dumped functions: [ ]
    2022-11-25 14:26:47.131765 [DEBUG] Module dumped functions: [ ??$?6U?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@PEBD@Z ]
    2022-11-25 14:26:47.145352 [DEBUG] Module dumped functions: [ ?length@?$_Narrow_char_traits@DH@std@@SA_KQEBD@Z ]
    2022-11-25 14:26:47.149317 [DEBUG] Module dumped functions: [ ??0sentry@?$basic_ostream@DU?$char_traits@D@std@@@std@@QEAA@AEAV12@@Z ]
    2022-11-25 14:26:47.155943 [DEBUG] Module dumped functions: [ ??0_Sentry_base@?$basic_ostream@DU?$char_traits@D@std@@@std@@QEAA@AEAV12@@Z ]
  • Ohhh…. 😳 I commented a few more code sections, and now… it's working! 🥳🥂🥳:
    2022-11-25 14:42:16.086668 [DEBUG] Lua: Done creating NervJIT.
    2022-11-25 14:42:16.086955 [DEBUG] Parsed args: {}
    2022-11-25 14:42:16.086974 [DEBUG] Loading module from tests/hello_world_v0.cpp
    Hello world from JIT function.
    2022-11-25 14:42:16.140975 [DEBUG] Done running app.
    2022-11-25 14:42:16.141002 [DEBUG] Destroying NervApp...
  • And even ending properly with no crash, cool ;-)
  • Let's clarify where the freeze comes from exactly now. Okay: this seems to be related to the call to debug_dump_funcs(), not quite sure why, so just commenting that call for now.
  • Now let's get back to our second version of the helloworld() function using the LogManager:
    #include <core_common.h>
    
    using namespace nv;
    
    extern "C" void helloWorld() {
        logDEBUG("Hello world from JIT function with LogManager.");
    };
    
  • As expected, this ends with a missing symbol:
    2022-11-25 15:41:52.169769 [DEBUG] Parsed args: {}
    2022-11-25 15:41:52.169792 [DEBUG] Loading module from tests/hello_world_v1.cpp
    2022-11-25 15:41:52.170334 [DEBUG] Generating bytecode file D:/Projects/NervLand/dist/compiled/ffaca731a73d87e83bc6973ca2f44373239012167e21d832130c254d792daea9.bc...
    2022-11-25 15:41:54.785121 [DEBUG] Bitcode compiled in 2614.747ms
    LLVM ERROR: Associative COMDAT symbol '??_7?$basic_memory_buffer@D$0BPE@V?$allocator@D@std@@@v9@fmt@@6B@' is not a key for its COMDAT.
  • Hmmm, I added support for loading the static library archives with the following:
    void NervJITImpl::add_library_from_file(const char* path) const {
        auto gen = CHECK_LLVM(llvm::orc::StaticLibraryDefinitionGenerator::Load(
            lljit->getObjLinkingLayer(), path, targetTriple));
        logDEBUG("Adding generator for library {}", path);
        currentDylib->addGenerator(std::move(gen));
    
        checkLLVMError(lljit->initialize(*currentDylib));
    }
  • But unfortunately I'm still getting the same missing symbol (and I also checked that the symbol can indeed be found in that lib file), so there is something not quite working here.
  • But this works just fine if I don't need access to the fmt library (by introducing a simpler version of the function, LogManager::debug(const char* msg)):
    2022-11-25 16:15:34.538482 [DEBUG] Lua: Done creating NervJIT.
    2022-11-25 16:15:34.538759 [DEBUG] Parsed args: {}
    2022-11-25 16:15:34.538771 [DEBUG] Loading module from tests/hello_world_v1.cpp
    2022-11-25 16:15:34.561055 [DEBUG] Hello world from JIT function with LogManager.
    2022-11-25 16:15:34.561107 [DEBUG] Done running app.
    2022-11-25 16:15:34.561127 [DEBUG] Destroying NervApp...
    2022-11-25 16:15:34.561576 [DEBUG] Lua: Uninitializing JIT Compiler...
  • Is that a critical problem ? Naaahh, not really… I don't think I really need to plan support for loading static libraries for now 😅, but it would be good to find a workaround for this specific situation though, so let's think a little about it.
  • Update: Actually, after switching to the Non-Lazy JIT version in the steps below, I realized that this part was then working as expected, and the JIT is able to access the needed symbols from the fmt.lib file 👍, amazing!:
    2022-11-26 08:56:22.021585 [DEBUG] Adding fmt library...
    2022-11-26 08:56:22.027984 [DEBUG] Adding generator for library D:/Projects/NervProj/libraries/windows_clang/fmt-9.1.1/lib/fmt.lib
    2022-11-26 08:56:22.028052 [DEBUG] Loading module from tests/hello_world_v1.cpp
    2022-11-26 08:56:22.074953 [INFO] Hello world from JIT function with LogManager.
    2022-11-26 08:56:22.074993 [DEBUG] Done running app.
  • I've then been trying to create a pretty complex class directly in this script as follows:
    #include <core_common.h>
    #include <vulkan_common.h>
    
    #include <base/RefPtr.h>
    #include <base/VulkanCommandBuffer.h>
    #include <base/VulkanFence.h>
    #include <base/VulkanFramebuffer.h>
    #include <base/VulkanPipeline.h>
    #include <base/VulkanPipelineCache.h>
    #include <base/VulkanPipelineLayout.h>
    #include <base/VulkanRenderPass.h>
    #include <base/VulkanSemaphore.h>
    #include <engine/VulkanRenderer.h>
    #include <engine/VulkanVertexBuffer.h>
    #include <vulkan_wrappers.h>
    
    using namespace nv;
    
    namespace nvk {
    
    class MyCmdBuffersProvider : public CmdBuffersProvider {
        NV_DECLARE_NO_COPY(MyCmdBuffersProvider)
        NV_DECLARE_NO_MOVE(MyCmdBuffersProvider)
    
      public:
        MyCmdBuffersProvider(){};
        MyCmdBuffersProvider(VulkanRenderer* renderer, VulkanRenderPass* rpass,
                             VulkanVertexBuffer* vbuf,
                             VulkanPipelineLayout* playout,
                             VulkanGraphicsPipelineCreateInfo* cfg,
                             VulkanPipelineCache* pcache,
                             const VulkanCommandBufferList& cbufs,
                             const nv::ByteArray& pushArr){};
    
        ~MyCmdBuffersProvider() override{};
        void get_buffers(FrameDesc& fdesc, VkCommandBufferList& buffers) override;
    
        // auto get_push_constants() -> nv::ByteArray* { return &_pushArr; }
    
      protected:
        nv::RefPtr<VulkanRenderer> _renderer;
        nv::RefPtr<VulkanPipeline> _pipeline;
        nv::RefPtr<VulkanPipelineCache> _pipelineCache;
        nv::RefPtr<VulkanRenderPass> _rpass;
        nv::RefPtr<VulkanVertexBuffer> _vbuf;
        nv::RefPtr<VulkanPipelineLayout> _playout;
        nv::RefPtr<VulkanGraphicsPipelineCreateInfo> _cfg;
        VulkanCommandBufferList _cbufs;
    
        nv::ByteArray _pushArr;
    
        U32 _width{0};
        U32 _height{0};
    };
    
    // MyCmdBuffersProvider::MyCmdBuffersProvider(
    //     VulkanRenderer* renderer, VulkanRenderPass* rpass, VulkanVertexBuffer*
    //     vbuf, VulkanPipelineLayout* playout, VulkanGraphicsPipelineCreateInfo*
    //     cfg, VulkanPipelineCache* pcache, const VulkanCommandBufferList& cbufs,
    //     const nv::ByteArray& pushArr)
    //     : _renderer(renderer), _pipelineCache(pcache), _rpass(rpass),
    //     _vbuf(vbuf),
    //       _playout(playout), _cfg(cfg), _cbufs(cbufs), _pushArr(pushArr) {}
    
    void MyCmdBuffersProvider::get_buffers(FrameDesc& fdesc,
                                           VkCommandBufferList& buffers) {
        // Write the command buffer:
        U32 idx = fdesc.swapchainImageIndex;
    
        // Re-record the command buffer as above:
        auto* cbuf = _cbufs[idx].get();
    
        auto* fbuf = _renderer->get_swapchain_framebuffer(idx);
    
        // We update our push constants here to contain a time value:
        F32 time = (F32)fdesc.frameTime;
    
        // logDEBUG("Writing time value: {}", time);
    
        // We write the time as the z element of the first vec4:
        _pushArr.write_f32(time, 8);
    
        // Push constants stages :
        U32 pstages = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT;
    
        U32 width = _renderer->get_swapchain_width();
        U32 height = _renderer->get_swapchain_height();
    
        // Check if we need to rebuild the pipeline:
        if (_width != width || _height != height) {
            _cfg->getCurrentViewportState()->setViewport((float)width,
                                                         (float)height);
            _pipeline = _renderer->get_device()->create_graphics_pipeline(
                _cfg->getVk(), _pipelineCache->getVk());
            _width = width;
            _height = height;
        }
    
        fbuf->set_clear_color(0, 0.2, 0.2, 0.2, 1.0);
    
        cbuf->begin(0);
    
        // Begin rendering into the swapchain framebuffer:
        cbuf->begin_inline_pass(_rpass.get(), fbuf);
    
        // Bind the graphics pipeline:
        cbuf->push_bind_graphics_pipeline(_pipeline->getVk());
    
        // Bind the vertex buffer:
        cbuf->bind_vertex_buffer(_vbuf.get(), 0);
    
        // add the push constants
        cbuf->write_push_contants(_playout->getVk(), pstages, 0,
                                  _pushArr.get_size(), _pushArr.get_data());
    
        // Draw our triangle:
        cbuf->draw(3);
    
        // End the render pass
        cbuf->end_render_pass();
    
        // Finish the command buffer:
        cbuf->finish();
    
        // Add the buffer to the list:
        buffers.push_back(cbuf->getVk());
    }
    
    } // namespace nvk
    
    extern "C" void helloWorld() {
        // nv::LogManager::debug("Hello world from JIT function with LogManager.");
        logDEBUG("Hello world from JIT function with LogManager.");
        nv::RefPtr<nvk::MyCmdBuffersProvider> obj = new nvk::MyCmdBuffersProvider();
        logDEBUG("Created object.");
    
        // logINFO("Hello world from JIT function with LogManager.");
    };
    
  • Unfortunately, this will not work and produces the following error message:
    2022-11-25 18:50:20.097813 [DEBUG] Loading module from tests/cmd_buf_prov_v1.cpp
    2022-11-25 18:50:20.098411 [DEBUG] Generating bytecode file D:/Projects/NervLand/dist/compiled/3c3bfaf5dae61d7dedccfcf21789db4b1b2c497fd5ab0b2a2dccb88ff8f5061d.bc...
    2022-11-25 18:50:23.813658 [DEBUG] Bitcode compiled in 3715.204ms
    LLVM ERROR: Associative COMDAT symbol '??_7MyCmdBuffersProvider@nvk@@6B@' is not a key for its COMDAT.
  • Yet, this works with a very simplified version such as this:
    #include <core_common.h>
    #include <vulkan_common.h>
    
    #include <base/RefPtr.h>
    #include <base/VulkanCommandBuffer.h>
    #include <base/VulkanFence.h>
    #include <base/VulkanFramebuffer.h>
    #include <base/VulkanPipeline.h>
    #include <base/VulkanPipelineCache.h>
    #include <base/VulkanPipelineLayout.h>
    #include <base/VulkanRenderPass.h>
    #include <base/VulkanSemaphore.h>
    #include <engine/VulkanRenderer.h>
    #include <engine/VulkanVertexBuffer.h>
    #include <vulkan_wrappers.h>
    
    using namespace nv;
    
    namespace nvk {
    
    class MyTestClass {
        NV_DECLARE_NO_COPY(MyTestClass)
        NV_DECLARE_NO_MOVE(MyTestClass)
    
      public:
        explicit MyTestClass(U32 val) : _value(val){};
        ~MyTestClass() = default;
    
        auto get_value() -> U32 { return _value; }
    
      protected:
        U32 _value{0};
    };
    
    } // namespace nvk
    
    extern "C" void helloWorld() {
        // nv::LogManager::debug("Hello world from JIT function with LogManager.");
        logDEBUG("Hello world from JIT function with LogManager.");
        nvk::MyTestClass obj(42);
        std::cout << "Meaning of life is: " << obj.get_value() << "!" << std::endl;
        // logDEBUG("Meaning of life is: {}", obj.get_value());
        // logINFO("Hello world from JIT function with LogManager.");
    };
  • And in fact this slightly modified version doesn't work anymore:
    #include <core_common.h>
    #include <vulkan_common.h>
    
    #include <base/RefPtr.h>
    #include <base/VulkanCommandBuffer.h>
    #include <base/VulkanFence.h>
    #include <base/VulkanFramebuffer.h>
    #include <base/VulkanPipeline.h>
    #include <base/VulkanPipelineCache.h>
    #include <base/VulkanPipelineLayout.h>
    #include <base/VulkanRenderPass.h>
    #include <base/VulkanSemaphore.h>
    #include <engine/VulkanRenderer.h>
    #include <engine/VulkanVertexBuffer.h>
    #include <vulkan_wrappers.h>
    
    using namespace nv;
    
    namespace nvk {
    
    class MyTestClass : public RefObject {
        NV_DECLARE_NO_COPY(MyTestClass)
        NV_DECLARE_NO_MOVE(MyTestClass)
    
      public:
        explicit MyTestClass(U32 val) : _value(val){};
        ~MyTestClass() override { logDEBUG("Destroying test object."); };
    
        auto get_value() const -> U32 { return _value; }
    
      protected:
        U32 _value{0};
    };
    
    } // namespace nvk
    
    extern "C" void helloWorld() {
        // nv::LogManager::debug("Hello world from JIT function with LogManager.");
        logDEBUG("Hello world from JIT function with LogManager.");
        nv::RefPtr<nvk::MyTestClass> obj = new nvk::MyTestClass(42);
        std::cout << "Meaning of life is: " << obj->get_value() << "!" << std::endl;
        // logDEBUG("Meaning of life is: {}", obj.get_value());
        // logINFO("Hello world from JIT function with LogManager.");
    };
    
  • Producing the same kind of error:
    2022-11-25 18:54:46.771240 [DEBUG] Parsed args: {}
    2022-11-25 18:54:46.771299 [DEBUG] Adding fmt library...
    2022-11-25 18:54:46.771309 [DEBUG] Loading module from tests/hello_world_v2.cpp
    2022-11-25 18:54:46.771670 [DEBUG] Generating bytecode file D:/Projects/NervLand/dist/compiled/ed1caf0a470bc67c84bf94e9305dd9afc2866f93ed055b3cf8b07d487e386a99.bc...
    2022-11-25 18:54:50.740277 [DEBUG] Bitcode compiled in 3968.571ms
    LLVM ERROR: Associative COMDAT symbol '??_7MyTestClass@nvk@@6B@' is not a key for its COMDAT.
  • So what do we have here 🤔? I'm starting to think maybe this is due to the “Lazy” part of the JIT since here we cannot find symbols that are defined directly in the JITed code itself.
  • ⇒ Reverting to building an llvm::orc::LLJIT instance instead of an llvm::orc::LLLazyJIT, I'm now getting a different error message (which sounds familiar actually):
    2022-11-25 19:17:29.210786 [DEBUG] Parsed args: {}
    2022-11-25 19:17:29.210832 [DEBUG] Loading module from tests/hello_world_v2.cpp
    JIT session error: Symbols not found: [ ??_7type_info@@6B@ ]
    2022-11-25 19:17:29.250670 [FATAL] Error in lua app:
    C++ exception
    2022-11-25 19:17:29.250698 [DEBUG] Destroying NervApp...
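  • For reference, switching between the two variants is essentially just a matter of which builder type is used (a sketch, assuming the same jtmb in both cases):
        // Lazy variant: functions are only compiled on first call-through.
        llvm::orc::LLLazyJITBuilder lazyBuilder;
        lazyBuilder.setJITTargetMachineBuilder(jtmb);
        auto lazyJit = CHECK_LLVM(lazyBuilder.create());
    
        // Non-lazy variant: modules are fully compiled when materialized.
        llvm::orc::LLJITBuilder builder;
        builder.setJITTargetMachineBuilder(std::move(jtmb));
        auto jit = CHECK_LLVM(builder.create());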
  • According to this previous section, this is precisely where I need to start considering adding my LLVM symbol re-export helper module, so let's add that back in the loop.
  • To export the missing symbols I actually just added a new cpp file directly in my nvLLVM module (which will be loaded in the process):
    // we just export the symbols we need from here:
    
    #include <sstream>
    #include <vector>
    
    // Helper module used to re-export the missing symbols that may be needed in
    // LLVM JIT code.
    #pragma comment(linker, "/export:??3@YAXPEAX_K@Z")
    #pragma comment(linker, "/export:??2@YAPEAX_K@Z")
    #pragma comment(linker, "/export:??3@YAXPEAX@Z")
    #pragma comment(linker, "/export:??_7type_info@@6B@")
    
    #pragma comment(linker, "/export:_Init_thread_header")
    #pragma comment(linker, "/export:_Init_thread_footer")
    #pragma comment(linker, "/export:_Init_thread_abort")
    #pragma comment(linker, "/export:_tls_index")
    #pragma comment(linker, "/export:_Init_thread_epoch")
    
    #pragma comment(linker, "/export:?_Facet_Register@std@@YAXPEAV_Facet_base@1@@Z")
    #pragma comment(linker, "/export:??2@YAPEAX_KAEBUnothrow_t@std@@@Z")
    #pragma comment(linker, "/export:?nothrow@std@@3Unothrow_t@1@B")
    // #pragma comment(linker, "/export:atexit")
    
    #pragma comment(linker, "/export:__security_check_cookie")
    // #pragma comment(linker, "/export:__security_cookie")
    
    #pragma comment(linker, "/export:?__type_info_root_node@@3U__type_info_node@@A")
    #pragma comment(linker, "/export:??_V@YAXPEAX@Z")
    
  • And surprisingly, this worked with the previous failing example! 😳!
  • So, naturally I then tested the more complex Vulkan provider script, but this time I got a bunch of missing symbols:
    JIT session error: Symbols not found: [ ?write_push_contants@VulkanCommandBuffer@nvk@@QEAAXPEAUVkPipelineLayout_T@@IIIPEBX@Z, ??0CmdBuffersProvider@nvk@@QEAA@XZ, ?set_clear_color@VulkanFramebuffer@nvk@@QEAA?A?<auto>@@IMMMM@Z, ?setViewport@VulkanPipelineViewportStateCreateInfo@nvk@@QEAAAEAU12@MMMMMM@Z, ??1CmdBuffersProvider@nvk@@UEAA@XZ, ?push_bind_graphics_pipeline@VulkanCommandBuffer@nvk@@QEAAXPEAUVkPipeline_T@@@Z, ?get_swapchain_width@VulkanRenderer@nvk@@QEBAIXZ, ?get_swapchain_height@VulkanRenderer@nvk@@QEBAIXZ, ?get_swapchain_framebuffer@VulkanRenderer@nvk@@QEAAPEAVVulkanFramebuffer@2@I@Z, ?get_device@VulkanRenderer@nvk@@QEBAPEAVVulkanDevice@2@XZ, ?begin@VulkanCommandBuffer@nvk@@QEAAXI@Z, ?begin_inline_pass@VulkanCommandBuffer@nvk@@QEAAXPEAVVulkanRenderPass@2@PEAVVulkanFramebuffer@2@HHII@Z, ?bind_vertex_buffer@VulkanCommandBuffer@nvk@@QEAAXPEAVVulkanVertexBuffer@2@I@Z, ?create_graphics_pipeline@VulkanDevice@nvk@@QEAA?AV?$RefPtr@VVulkanPipeline@nvk@@@nv@@AEBUVkGraphicsPipelineCreateInfo@@PEAUVkPipelineCache_T@@@Z, ?getVk@VulkanPipelineLayout@nvk@@QEBAPEAUVkPipelineLayout_T@@XZ, ?draw@VulkanCommandBuffer@nvk@@QEAAXIIII@Z, ?end_render_pass@VulkanCommandBuffer@nvk@@QEAAXXZ, ?finish@VulkanCommandBuffer@nvk@@QEAAXXZ, ?getVk@VulkanPipelineCache@nvk@@QEBAPEAUVkPipelineCache_T@@XZ, ?getCurrentViewportState@VulkanGraphicsPipelineCreateInfo@nvk@@QEAAAEAV?$RefPtr@UVulkanPipelineViewportStateCreateInfo@nvk@@@nv@@XZ, ?getVk@VulkanCommandBuffer@nvk@@QEBAPEAUVkCommandBuffer_T@@XZ, ?getVk@VulkanGraphicsPipelineCreateInfo@nvk@@QEBAAEAUVkGraphicsPipelineCreateInfo@@XZ, ?getVk@VulkanPipeline@nvk@@QEBAPEAUVkPipeline_T@@XZ ]
    2022-11-25 19:41:20.403431 [INFO] Closing lua state...
  • But in fact this also makes sense, since I have not loaded any vulkan library yet in this test process! Let's change that… And that worked!! 😭 Incredible.
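  • The loading itself can rely on the standard ORC generator for shared libraries; a minimal sketch (assuming the currentDylib and lljit members) would be:
    // Sketch: make the symbols exported by an already-loaded shared library
    // (e.g. our vulkan module DLL) visible to the JIT'd code:
    void NervJITImpl::add_dynamic_lib(const char* filename) const {
        currentDylib->addGenerator(CHECK_LLVM(
            llvm::orc::DynamicLibrarySearchGenerator::Load(
                filename, lljit->getDataLayout().getGlobalPrefix())));
    }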
  • First some minimal injection test inside the lua environment:
    extern "C" void helloWorld() {
        // nv::LogManager::debug("Hello world from JIT function with LogManager.");
        logDEBUG("Hello world from JIT function with LogManager.");
    
        auto& lman = LuaManager::instance();
        auto& L = lman.get_main_state();
        L.new_table();          // Create new table.
        L.push_int(1);          // push key
        L.push_string("hello"); // push value
        L.raw_set(-3);          // set k=val in table at -3 and pop key & val
        L.push_int(2);          // push key
        L.push_string("manu");  // push value
        L.raw_set(-3);          // set k=val in table at -3 and pop key & val
    
        L.set_global("jit_test");
    
        nv::RefPtr<nvk::MyCmdBuffersProvider> obj = new nvk::MyCmdBuffersProvider();
        logDEBUG("Created object.");
    
        // logINFO("Hello world from JIT function with LogManager.");
    };
  • Then, checking the result in lua:
        logDEBUG("Loading module from tests/cmd_buf_prov_v1.cpp")
        jit:addModuleFromFile("tests/cmd_buf_prov_v1.cpp")
    
        CHECK(jit_test == nil, "Unexpected non nil")
        jit:execute("helloWorld")
        CHECK(jit_test[1] == "hello", "Unexpected value 1")
        CHECK(jit_test[2] == "manu", "Unexpected value 2")
        logDEBUG("Message: ", jit_test[1], " ", jit_test[2])
    
        logDEBUG("Done running app.")
  • And this produces the expected results:
    2022-11-25 23:31:01.655741 [DEBUG] Loading vulkan extensions...
    2022-11-25 23:31:01.655817 [DEBUG] Loading module from tests/cmd_buf_prov_v1.cpp
    2022-11-25 23:31:01.758399 [DEBUG] Hello world from JIT function with LogManager.
    2022-11-25 23:31:01.758444 [DEBUG] Created object.
    2022-11-25 23:31:01.758462 [DEBUG] Message: hello manu
    2022-11-25 23:31:01.758479 [DEBUG] Done running app.
  • With the previous developments, it seems that we should now be able to write custom classes in C++ scripts compiled dynamically on request from Lua, and from those scripts, we should be able to extend the lua state with new elements, which could then be used directly from lua. For instance, we should be able to add a factory function for our custom “CmdBufferProvider” discussed above, returning a pointer to the parent class, and thus hiding the implementation completely (see the sketch below).
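  • E.g. a hypothetical factory entry point in the JIT'd script could look like this (reusing the default constructor from the example above):
    // Hypothetical sketch: only a C entry point returning the base class
    // pointer is exposed, so the Lua side never sees MyCmdBuffersProvider:
    extern "C" auto createCmdBuffersProvider() -> nvk::CmdBuffersProvider* {
        return new nvk::MyCmdBuffersProvider();
    }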
  • ⇒ That would be awesome, but I think this is a topic for another post, as here, it seems we already have a working JIT layer anyway 😎.
  • Additional nice points here are that:
    • I have refactored my NervJIT implementation, splitting it into more manageable sub-components (the CXXCompiler on one side and the actual NervJIT implementation on the other side),
    • I now have a slightly better understanding of how all those layers work together,
    • The usage of the JIT is a bit simplified (missing symbols included in base module, using modules from current process directly, etc)