blog:2020:0425_jit_compiler_part5_improvements

JIT Compiler with LLVM - Part 5 - Bitcode, PCH, exceptions handling, module linking and more...

So a good week has passed since my last article on my JIT Compiler experiment. And I must admit I've been playing with this code a lot in the past few days :-). In case you don't remember, my primary goal was to be able to generate C++ code directly from the Lua scripting language. So that's precisely what I did, and in the process, I built a “frontend” for my JIT compiler in Lua, that kept evolving and that I'm now using to perform most of my tests. During this part of my journey, I also worked on precompiled header (PCH) generation, LLVM module constructors and destructors, C++ unit testing from scripts, exceptions handling, and module linking concerns. So I think it's now already high time I stop coding a moment and try to share what I learnt on all those points in case this could be of interest to someone (or actually, even just to remember what I did in case I need to get back to it someday ;-))

So as mentioned above, I built a JITCompiler class in my lua environment, that I could then use to construct and use a C++ JIT object. I don't think it's worth getting too much into the details of how you would setup the lua bindings to achieve this results. So let's just say that I use the excellent sol3 library to generate those bindings, and I simply provided the bindings for my NervJIT class plus a couple of init/uninit functions so that I can fully setup/tear down the LLVM JIT environment from lua directly.

As a quick hint, here is what the bindings currently look like:

#include <llvm_common.h>
#include <core_lua.h>
#include <NervJIT.h>

namespace nv
{

void loadLLVMBindings(sol::state& lua)
{
    logTRACE2("Loading Lua bindings for LLVM module.");

    auto space = lua["nv"].get_or_create<sol::table>();

    space["initLLVM"] = &initLLVM;
    space["uninitLLVM"] = &uninitLLVM;

    SOL_BEGIN_ENUM(space, "LLVMHeaderType")
    SOL_ENUM("SYSTEM", nv::NervJIT::HEADER_SYSTEM);
    SOL_ENUM("ANGLED", nv::NervJIT::HEADER_ANGLED);
    SOL_ENUM("QUOTED", nv::NervJIT::HEADER_QUOTED);
    SOL_END_ENUM()

    SOL_BEGIN_CLASS(space, "NervJIT", NervJIT)
    SOL_CALL_CONSTRUCTORS(class_t());
    SOL_CLASS_FUNC(loadModuleFromFiles);
    SOL_CLASS_FUNC(loadModuleFromFile);
    SOL_CLASS_FUNC(loadModuleFromBuffer);
    SOL_CLASS_FUNC(generatePCHFromFile);
    SOL_CLASS_FUNC(generatePCHFromBuffer);
    SOL_CLASS_FUNC(generateBitcodeFromFile);
    SOL_CLASS_FUNC(generateBitcodeFromBuffer);
    SOL_CLASS_FUNC(loadModuleBitcode);
    SOL_CLASS_FUNC(usePCHFile);
    SOL_CLASS_FUNC(clearMacroDefinitions);
    SOL_OV2_FUNCS(addMacroDefinition, void(std::string), void(const std::string&, const std::string&)); 
    SOL_CLASS_FUNC(clearHeaderSearchPaths);
    SOL_CLASS_FUNC(addHeaderSearchPath);
    SOL_CLASS_FUNC(addCurrentProcess);
    SOL_CLASS_FUNC(addDynamicLib);
    SOL_CUSTOM_FUNC(linkModule) = [](class_t& obj, const std::string& outfile, sol::table t, bool onlyNeeded, bool internalize, bool optimize, bool preserveUseListOrder)
    {
        U32 count = t.size();
        std::vector<std::string> inputs(count);

        for (U32 i = 0; i < count; ++i)
        {
            inputs[i] = t[i + 1];
        }

        obj.linkModule(outfile, inputs, onlyNeeded, internalize, optimize, preserveUseListOrder);

    };
    SOL_CUSTOM_FUNC(setupCommandLine) = [](class_t& obj, sol::table t) {
        U32 count = t.size();
        std::vector<std::string> args(count);

        for (U32 i = 0; i < count; ++i)
        {
            args[i] = t[i + 1];
        }
        obj.setupCommandLine(args);
    };

    SOL_CUSTOM_FUNC(call) = [](class_t& obj, const std::string& name) {
        auto func = (void(*)())obj.lookup(name);
        CHECK(func, "Cannot find function with name "<<name);
        try {
            func();
        }
        catch(const std::exception& e) {
            logERROR("Exception catched from JIT code: "<<e.what());
        }
        catch(...) {
            logERROR("Unknown exception catched from JIT code.");
        }
    };
    SOL_END_CLASS()

    logTRACE2("Done loading Lua bindings for LLVM module.");
}

}

NV_REGISTER_BINDINGS(LLVM)

Don't think about trying to compile the code provided above: it's full of macros that I defined myself elsewhere, so you won't be able to get this compiling as is. But still you should get the idea on what we have available in lua afterwards ;-)
Many of the NervJIT methods available in this binding file are new: we didn't discuss them in any of the previous articles, but don't worry: we will describe them later in this post.

The first test I performed in lua then was to try to… extend the Lua bindings directly from lua itself! So I wrote the following C++ script:

#include <core_lua.h>
#include <lua/LuaManager.h>
#include <NervApp.h>

using namespace nv;

extern "C" void loadLuaBaseExtensions()
{
  auto& lman = LuaManager::instance();
  auto& lua = lman.getMainState();

  lua["nvFileExists"] = &fileExists;
  logDEBUG("Done loading Lua extensions.");
};

Then I would load that script directly from lua with something like this:

  local startTime = nv.SystemTime.getCurrentTime()
  logDEBUG("JIT: Loading Lua extensions...")
  self:loadBitcodeForFile(self.script_dir.."lua_base_extensions.cpp")

  -- self.jit:generateBitcodeFromFile(self.script_dir.."lua_base_extensions.cpp", self.bc_dir.."lua_base_extensions.bc")
  -- self.jit:loadModuleBitcode(self.bc_dir.."lua_base_extensions.bc")
  
  -- self.jit:loadModuleFromFile(self.script_dir.."lua_base_extensions.cpp")
  local endTime = nv.SystemTime.getCurrentTime()
  logDEBUG(string.format("Script compiled in %.3fms", (endTime - startTime)*1000.0))

  self.jit:call("loadLuaBaseExtensions")
</sxhjs>

=> And this actually worked without any serious trouble! If you think about it a moment this is already a pretty nice feature: in the JITCompiler lua frontend, we also have a "loadBitcodeForBuffer" for instance, so this means you could think about extending lua with concrete C++ extensions without even leaving the lua script where you want to use that extension with a code such as:

<sxhjs lua>
local jit = require "base.JITCompiler"

jit:loadBitcodeForBuffer[[
#include <core_lua.h>
#include <lua/LuaManager.h>
#include <NervApp.h>

using namespace nv;

extern "C" void loadLuaBaseExtensions()
{
  auto& lman = LuaManager::instance();
  auto& lua = lman.getMainState();

  lua["nvFileExists"] = &fileExists;
  logDEBUG("Done loading Lua extensions.");
};
]]

jit:execute("loadLuaBaseExtensions")

if nvFileExists("C:/temp/dummy.txt") then
	logDEBUG("Yes! my nvFileExists function is really available!")
else
	logDEBUG("Never mind! My nvFileExists function is really available anyway! :-)")
end
</sxhjs>

===== Bitcode serialization for LLVM module =====

In the previous section, we have seen that we could "load" some "bitcode" either from a C++ source file or from some given C++ source code in a memory buffer (ie. a simple string in lua). Now, this "bitcode" stuff is something new  we didn't mention in the previous articles. But the rationale is actually pretty simple: 

=> Initially, in my NervJIT class I was simply providing a C++ source file and generating an LLVM Module **in memory** directly for that file. But with my various compilation tests, I started to realize that compiling from source files would still take some **significant time** in some cases, and thus it would make sense to try to cache the resulting Module from a given script compilation so that we could simply **reuse** the corresponding Module in case the content of that script is not changed the next time we need it! So that's where bitcode comes into play ;-) LLVM can read/write its so called "Internal Representation" (IR) modules as **bitcode** files (usually, we would use the ".bc" extension for that).

Thus, in the NervJIT compiler, instead of directly generating a Module object from a given source file, we are now rather writing that generated module to a bitcode file (that step is only done if really needed), and then loading the Module object by reading the content of that bitcode file, before we inject that module into our LLJIT instance.

==== Writing Bitcode to file ====

The core function used to generate the bitcode file is the following:

<sxh cpp>
void NervJITImpl::generateBitcode(std::string outFile)
{
    auto& compilerInvocation = compilerInstance->getInvocation();
    auto& frontEndOptions = compilerInvocation.getFrontendOpts();
    std::string prevFile = std::move(frontEndOptions.OutputFile);
    frontEndOptions.OutputFile = std::move(outFile);

    // keep a copy of the current program action:
    auto prevAction = frontEndOptions.ProgramAction;
    frontEndOptions.ProgramAction = clang::frontend::EmitBC;

    if (!compilerInstance->ExecuteAction(*emit_bc_action))
    {
        ERROR_MSG("Cannot execute emit_bc_action with compiler instance!");
    }

    // Restore the previous values:
    frontEndOptions.OutputFile = std::move(prevFile);
    frontEndOptions.ProgramAction = prevAction;
}

Note that, here, the function is called after we provided an input file or an input buffer, with one of the following functions:

void NervJITImpl::setInputFile(const std::string& filename)
{
    auto& compilerInvocation = compilerInstance->getInvocation();
    auto& frontEndOptions = compilerInvocation.getFrontendOpts();
    frontEndOptions.Inputs.clear();
    frontEndOptions.Inputs.push_back(clang::FrontendInputFile(llvm::StringRef(filename), clang::InputKind(clang::Language::CXX)));
}

void NervJITImpl::setInputBuffer(llvm::MemoryBuffer* buf)
{
    auto& compilerInvocation = compilerInstance->getInvocation();
    auto& frontEndOptions = compilerInvocation.getFrontendOpts();
    frontEndOptions.Inputs.clear();
    frontEndOptions.Inputs.push_back(clang::FrontendInputFile(buf, clang::InputKind(clang::Language::CXX)));
}

I think it's worth explaining a little what we do in the generateBitcode: basically, we use the clang frontend itself, to generate the bitcode file for us from the source file. In fact, I think what I'm providing here is an implementation simulating a call such as:

clang -emit-llvm -o foo.bc -c foo.c

⇒ So, we keep everything as before: the command line setup of the compiler invocation, the additional header search paths, the preprocessor definitions, etc… But then, we do not perform the regular EmitLLVMOnlyAction that we had been using so far, instead:

  1. We override the ProgramAction field in the invocation frontendOptions, to be clang::frontend::EmitBC (I would assume the default value I would read here given my default invocation setup would be clang::frontend::EmitLLVMOnly)
  2. We assign the frontend option OutputFile to point to the location of the .bc file we want to write in the process.
  3. Finally we request the execution of the action “emit_bc_action” on our compiler instance. Note that this action is create before hand in our NervJITImpl constructor:
        action = std::make_unique<clang::EmitLLVMOnlyAction>(tsContext->getContext());
        emit_bc_action = std::make_unique<clang::EmitBCAction>();
    

Once the action is completed successfully, we have the resulting Module written in the provided OutputFile, and we “cleanup” the frontend options (just in case), restoring the default values we were using before.

There are many similar “emit LLVM” actions available in clang, such as EmitLLVM, EmitLLVMOnly, EmitBC, etc. From what I understand, EmitLLVMOnly would generate a Module object, but not write anything to the OutputFile, EmitBC generate bitcode and write it to output, EmitLLVM might generate assembly code, I'm not quite sure ? Anyway, if you need to investigate this you could start with the source file clang/lib/frontend/CompilerInstance.cpp as reference.
There are other ways to write an LLVM Module to a bitcode file, for instance, there is a LLVM helper function WriteBitcodeToFile: we will come back to that one later in this article.

The second part of the deal then is to be able to read the bitcode content of a given file to re-create an LLVM module object from it. And this is currently done in the NervJIT class using the parseIRFile LLVM helper function:

void NervJITImpl::loadModuleBitcode(const std::string& bcfile)
{
    // Note: we could also use getLazyIRFileModule here?
 
    llvm::SMDiagnostic Err;
    std::unique_ptr<Module> module(llvm::parseIRFile(llvm::StringRef(bcfile), Err, *tsContext->getContext()));
    if(!module) {
        THROW_MSG("Cannot load IR module from file "<<bcfile);
    }

    loadModule(std::move(module));
}

That part is really straightforward, just note that we are using here an helper function void loadModule(std::unique_ptr<Module> module); to perform the “injection” of our newly create IR module into our JIT execution session, because there are other important considerations to take into account here ⇒ We will come back to that point a bit later.

With the previous bitcode serialization mechanism implemented, it was pretty easy to setup a simple and minimal caching layer on top of it: when we generate bitcode from a source file there are only a few key elements that may change the result of the compilation:

  1. The command line arguments that we provided to setup the compiler instance,
  2. The additional header search paths that we specified,
  3. The preprocessor definitions that we provide,
  4. The actual content of the source script that we want to compile.

⇒ So, in my JITCompiler class, I use specific methods to set and at the same time cache the compilation settings (command line args, headers, macros) and I generate a SHA256 hash from those settings, updating the hash each time the settings are updated. using a simple lua function such as:

-- Methods used to generate a hash value from a string list:
function Class:computeStringListHash(list)
  return nv.sha256_from_buffer(table.concat(list, " "))
end
</sxhjs>

<note>The actual **sha256_from_buffer** function is implemented in C++ using this source as template: http://www.zedwood.com/article/cpp-sha256-function</note>

Then, when I need to compile a file or a buffer, I also generate a hash for the corresponding source code content, and then get an "overall hash" also taking into account the settings hashes mentioned just above:

<sxhjs lua>-- Get the hash for a given buffer in the current compilation context:
function Class:getContextualBufferHash(buf)
  local hash = nv.sha256_from_buffer(buf)
  return nv.sha256_from_buffer(self.current_context_hash .. hash)
end
</sxhjs>

And then I simply turn this hash string into a bitcode file name adding the ".bc" extension :-)

Finally, to figure out if I can use a cached bitcode result for that compilation, I only need to check if that bitcode file already exists, and if not, I do the compilation writing the file in the process:

<sxhjs lua>function Class:buildBitcodeForBuffer(buf, force)
  -- First we should compute the hash of the input buffer:
  local bufHash = self:getContextualBufferHash(buf)

  local bcfile = self.bc_dir..bufHash..".bc"
  if force or not nv.fileExists(bcfile) then
    logDEBUG("Generating bytecode file ", bcfile, "...")
    local startTime = nv.SystemTime.getCurrentTime()
    self.jit:generateBitcodeFromBuffer(buf, bcfile)
    local endTime = nv.SystemTime.getCurrentTime()
    logDEBUG(string.format("Bitcode compiled in %.3fms", (endTime - startTime)*1000.0))
  end

  return bcfile
end

function Class:loadBitcodeForBuffer(buf, force)
  local bcfile = self:buildBitcodeForBuffer(buf, force)
  -- Finally we load the generated bytecode:
  self.jit:loadModuleBitcode(bcfile)
end
</sxhjs>

**Important note**: There are unfortunately 2 important limitations with this caching mechanism (but I can live wit these for the time being, so this system is good enough for me):
  - As decribed above, the "compilation context hashes" will not take into account any change made in **headers** that are included into your source script. So if you are not careful about that point, you might use a cached version of a module that doesn't take into account you latest "header only" updates. [But of course, there are a few options to fix this problem if really needed]
  - With this system, you will write multiple "hashed name .bc file" (such as: "fffdc84aec07abae174c32795464209fb8e85bd2b6e3fea29521ce0da6bb8831.bc") in a given folder, and each time you make a change to your source script content (or other compilation context elements) you will get a different hash for the same [or at least, very similar] code content. So you get two sides of the same coin here: on one side, its nice because this means that if during your testing you "revert" to some previously already generated hash, then you already have the corresponding cache file and you save some compilation time. And on the other side, obviously, you get "orphan cache files" that will start to accumulate with time, so, from time to time, you need to do some cleanup ;-) [But again, this doesn't seem to be a too serious limitation from my current perspective, and I think there are a few "appropriate ways" to deal with this].


===== Precompiled Headers (PCH) handling =====

The next important element I think it's worth discussing here is the support for Precompiled Header into our JIT Compiler. This is important because often we can build "large modules" from multiple C++ source files and using PCH can save a lot of compilation time in those cases.

So we have 2 parts to handle for this feature: on one side, we must provide support to **generate a PCH file**, and on the second side, we need to provide that PCH file as "input" when compiling the source files afterwards.


==== Generation PCHs ====

The main function I use tu generate the PCH file is the following:

<sxh cpp>
void NervJITImpl::generatePCH(std::string outFile)
{
    auto& compilerInvocation = compilerInstance->getInvocation();
    auto& frontEndOptions = compilerInvocation.getFrontendOpts();
    std::string prevFile = std::move(frontEndOptions.OutputFile);
    frontEndOptions.OutputFile = std::move(outFile);

    // keep a copy of the current program action:
    auto prevAction = frontEndOptions.ProgramAction;
    frontEndOptions.ProgramAction = clang::frontend::GeneratePCH;

    if (!compilerInstance->ExecuteAction(*gen_pch_action))
    {
        ERROR_MSG("Cannot execute gen_pch_action with compiler instance!");
    }

    // Restore the previous values:
    frontEndOptions.OutputFile = std::move(prevFile);
    frontEndOptions.ProgramAction = prevAction;
}

⇒ As you can see, the idea is very similar to what we did to generate the module bitcode files already in the section above: again, we keep “everything else as is”, and when we make a request to generate a PCH file we just change the ProgramAction, and the OutputFile in the frontend options of our compiler invocation. The action that we use this time is the following (also created in the NervJITImpl constructor):

gen_pch_action = std::make_unique<clang::GeneratePCHAction>();

And basically, this is it! :-) By convention, you would specify a “.pch” file as output file in this case, and if the action is executed without error then this output file is written, and your next step is to use it to perform actual module generation.

Note: this implementation is (I think) doing the same thing as what you would get with the command line version:

clang -cc1 test.h -emit-pch -o test.h.pch

To specify that a given PCH file should be used for the following compilations is even simpler, we just need to update the preprocessor options accordingly:

void NervJIT::usePCHFile(std::string pchfile)
{
    auto& compilerInvocation = impl->compilerInstance->getInvocation();
    auto& opts = compilerInvocation.getPreprocessorOpts();
    opts.ImplicitPCHInclude = std::move(pchfile);
}

After that, you can generate your module and bitcode as usual, and the content of the PCH will be used as expected. Not much to add on this point, right ?

On the “lua side of things”, I'm also using a PCH caching mechanism strictly equivalent to the bitcode caching mechanism I described earlier. So for my part, I'm also generating pch file with “hash names” corresponding to the content of the file that was used to generate that PCH + compilation contextual hashes

Continuing with my C++ script tests, I eventually faced another serious issue when I tried to use some static variables inside my functions. For instance, I would get a compilation error trying to compile the following minimal test function:

#include <core_common.h>

extern "C" void test_func() {
    static int t = (int)1000000*cos(34.5);
    logDEBUG("My int value is: "<<t);
}

⇒ This would lead to a missing symbols error:

JIT session error: Symbols not found: [ __emutls_v._Init_thread_epoch, __emutls_get_address, _Init_thread_header, _Init_thread_footer, _Init_thread_abort ]

The “_Init_thread_header”, “_Init_thread_footer”, “_Init_thread_abort” were easy to find directing inside the vcruntime part of the microsoft CRT. So I added them into my llvm_syms module where I reexport all the required symbols that are otherwise missing when linking my JIT modules.

For the records, in my previous posts I called that export module llvm_helper, then I actually tried to exported all the symbols I needed directly from my nvLLVM shared library. But eventually, I had to revert to using a dedicate helper module which I named llvm_syms to export those missing symbols, because otherwise I was getting some duplicate symbols when trying to link the nvLLVM library to my lua LLVM bindings library…

But then, I couldn't find at all where the __emutls_v._Init_thread_epoch and __emutls_get_address could come from :-(, So digging keeper into the LLVM sources I finally realized that those functions were in fact used in some kind of TLS emulation layer (TLS stands for Thread Local Storage here in case you are wondering) that is used on some platforms where proper TLS support is not available.

⇒ normally you could also select to enable/disable TLS emulation explicitly with the cland command line arguments -femulated-tls (or -fno-emulated-tls), and this would update the entries EmulatedTLS and ExplicitEmulatedTLS from the codegen options of your compiler invocation settings. But this didn't seem to work for me as I was expecting: in my NervJIT compiler, it seems that I was still using emulated TLS anyway :-(. It took me some good time to understand what was happening here, but I finally got it reading a bit more carefully the comments on the JITTargetMachineBuilder::detectHost() function:

  /// Create a JITTargetMachineBuilder for the host system.
  ///
  /// Note: TargetOptions is default-constructed, then EmulatedTLS and
  /// ExplicitEmulatedTLS are set to true. If EmulatedTLS is not
  /// required, these values should be reset before calling
  /// createTargetMachine.
  static Expected<JITTargetMachineBuilder> detectHost();

Then, of course, I updated my LLJIT creation section as follow:

    auto jtmb = CHECK_LLVM(JITTargetMachineBuilder::detectHost());

    // Resetting emulateTLS to false:
    auto& tgtOpts = jtmb.getOptions();
    tgtOpts.EmulatedTLS = false;
    tgtOpts.ExplicitEmulatedTLS = false;
    
    targetMachine = CHECK_LLVM(jtmb.createTargetMachine());
    DEBUG_MSG("Target machine using emulated TLS: "<<(targetMachine->useEmulatedTLS() ? "YES":"NO"));

    auto dl = CHECK_LLVM(jtmb.getDefaultDataLayoutForTarget());
    DEBUG_MSG("Default layout prefix for target: '" << dl.getGlobalPrefix() << "'");

    DEBUG_MSG("Creating LLJIT object.");

    LLJITBuilder llb;
    llb.setJITTargetMachineBuilder(std::move(jtmb)).setNumCompileThreads(2);

    lljit = CHECK_LLVM(llb.create());
    
    DEBUG_MSG("Done creating LLJIT object.")

And with that change, those missing “__emutls_XXX” symbols immediately went await :-) Yeepee! Instead I got two new missing symbols (obviously related to the problem at hands): _tls_index and _Init_thread_epoch. Yet, it turns out that those 2 symbols were available in the microsoft crt this time! So just exporting what I needed from my llvm_syms module again:

#pragma comment(linker, "/export:_Init_thread_header")
#pragma comment(linker, "/export:_Init_thread_footer")
#pragma comment(linker, "/export:_Init_thread_abort")
#pragma comment(linker, "/export:_tls_index")
#pragma comment(linker, "/export:_Init_thread_epoch")

And this did the trick! I could then compile and run the test function mentioned above without any problem. Feeeeww! That was tricky…

At that point, I felt confident enough to consider trying “something bigger”: so I thought I should try executing some C++ unit tests directly with those scripts. Unfortunately, it seems it was still a bit too early after all lol. Explanations:

Until now, I'd been using the Boost test framework to build regular unit tests, yet I decided I should take this opportunity as a chance to upgrade my test environment to something new. I was particularly interested in header only solutions, so, obviously, I quickly found the Catch/Catch2 project, and this sound really promising/interesting.

⇒ So of course I decided to give it a try… but I didn't get much luck on that path :-( I spent a significant time trying to figure out what was going wrong (adding debug outputs everywhere in catch and then trying to run test scripts), and made some good progress, but still for the moment this option is still not working for me, arff!!

Anyway, seeing that Catch would not lead me anywhere before long, I thought I should give the boost test framework a try too… but here also, I didn't get this to work at that point: I tried many things, but still, I'm stuck with segmentation fault on that one… too bad.

Conclusion: that first attempt to execute C++ unit tests with scripts didn't went very well at all :-) But still, as I said, I discovered a few interesting things when trying to understand why the catch framework was not working as expected:

  1. First, I noticed that my global/static variables were simply not constructed at all in my C++ scripts
  2. And also, C++ exceptions were apparently not handled properly in my JIT code

⇒ Both were pretty serious issues, so I had to investigate those points.

As I just said, “globals” were apparently just not working in my JIT code, so I prepared a minimal reference test to investigate this one:

local jit = import "base.JITCompiler"

jit:runFunction("test_func", [[
#include <iostream>

#define DEBUG_MSG(msg) std::cout << msg << std::endl;

class MyTest {
public:
    MyTest() {
        std::cout << "Creating a MyTest object." << std::endl;
    }

    ~MyTest() {
        std::cout << "Deleting a MyTest object." << std::endl;
    }

    void hello() {
        std::cout << "Hello!" << std::endl;
    }
};

static MyTest test;

extern "C" void test_func() {
    DEBUG_MSG("Running test function.");
    //test.hello();
}
]])</sxhjs>

When executing that lua script I was expecting to get the messages "Creating a MyTest object." at first, and then on completion of the script (when my JIT compiler is finally unloaded), the message "Deleting a MyTest object.", but of course, I wasn't receiving anything at first.

<note>In the C++ script just above, I declared a **static MyTest test;** object, but in fact this problem/solution is exactly the same if you do not use the **static** specifier.</note>

So back to my best friend (google lol!): searching for explanations on why this would happen and what to do about it... And I finally figured out this has to do with the **LLVM modules initialization and uninitialization**. To be more precise, when you load a module in your JIT session, you are supposed to execute the **constructor** functions available in that module, and you should also run the **destructor** functions when that modules gets out of scope or the JIT session is destroyed.

Now the "problem is", this module init/uninit process has been evolving significantly lately:

  * "Initially", you would "collect" all the constructors in a **llvm::orc::CtorDtorRunner**, then add your module to the JIT, then run those constructors as follow: <sxh cpp>
llvm::orc::CtorDtorRunner R(lljit->getMainJITDylib());
R.add(llvm::orc::getConstructors(*module));

DEBUG_MSG("Adding module from bytecode.");
auto err = lljit->addIRModule(ThreadSafeModule(std::move(module), *tsContext));
checkLLVMError(std::move(err));

checkLLVMError(R.run());

⇒ For additional details you could have a look at this discussion: https://lists.llvm.org/pipermail/llvm-dev/2019-March/131057.html

  • Then at some point in the LLVM version 10 implementation the runConstructors()/runDestructors() functions were introduced on the LLJIT class. So you would rather do:
    DEBUG_MSG("Adding module from bytecode.");
    auto err = lljit->addIRModule(ThreadSafeModule(std::move(module), *tsContext));
    checkLLVMError(std::move(err));
    
    checkLLVMError(lljit->runConstructors());  
    
  • And now in LLVM version 11 (current version on git), the runConstructors()/runDestructors() functions were removed and replaced with the initialize(…)/uninitialize(…) functions:
    DEBUG_MSG("Adding module from bytecode.");
    auto err = lljit->addIRModule(ThreadSafeModule(std::move(module), *tsContext));
    checkLLVMError(std::move(err));
    
    checkLLVMError(lljit->initialize(lljit->getMainJITDylib()));
    

⇒ For more details on this latest implementation you could start with this page: https://groups.google.com/forum/#!msg/llvm-dev/DU5YYthVbrY/wXR1zZ7TAAAJ

I tested the 3 options described just above (because I temporarily switched to LLVM version 10.0.0 at some point in the process): all seemed to work with similar results, so for now I'll just stick to the lastest version officially available, using the initialize()/uninitialize() function. And thus, here is the loadModule(std::unique_ptr<Module> module) function I mentioned earlier, which I use to ensure that the module globals are initialized appropriately:

void NervJITImpl::loadModule(std::unique_ptr<Module> module)
{
    // llvm::orc::CtorDtorRunner R(lljit->getMainJITDylib());
    // R.add(llvm::orc::getConstructors(*module));

    DEBUG_MSG("Adding module from bytecode.");
    auto err = lljit->addIRModule(ThreadSafeModule(std::move(module), *tsContext));
    checkLLVMError(std::move(err));

    // Now we should try to run the static initializers if any:
    DEBUG_MSG("Calling dyn lib initialize()");
    checkLLVMError(lljit->initialize(lljit->getMainJITDylib()));
    // checkLLVMError(lljit->runConstructors());
    // checkLLVMError(R.run());
    DEBUG_MSG("Done calling dyn lib initialize()");
}

And of course, I also call uninitialize() for my main JITDylib in the NervJITImpl destructor for proper clean-up:

NervJITImpl::~NervJITImpl()
{
    // We should uninitialize our main dyn library here:
    DEBUG_MSG("Uninitializing main JIT lib.");
    checkLLVMError(lljit->deinitialize(lljit->getMainJITDylib()));
    DEBUG_MSG("Done uninitializing main JIT lib.");
}

Unfortunately, this was still not enough to get my minimal test script to work properly: Indeed I could now see that my global constructor was called as expected, but then the corresponding destructor was not called when I was destroying my JIT compiler, and instead I would inevitably get a segmentation fault at the very end of my program execution :-(. So, to me it really seemed like if the global destructor was register with my “program level” atexit() handler which is exactly the problem describe by Lang Hames on the second part of this page: and this is where the LocalCXXRuntimeOverrides is supposed to come into play.

After some investigations I found the file llvm/tools/lli/lli.cpp in LLVM 10.0.0 sources containing a [presumably] working usage example of that utility:

orc::MangleAndInterner Mangle(J->getExecutionSession(), J->getDataLayout());

orc::LocalCXXRuntimeOverrides CXXRuntimeOverrides;
ExitOnErr(CXXRuntimeOverrides.enable(J->getMainJITDylib(), Mangle));

// Then later on destruction, we call:
CXXRuntimeOverrides.runDestructors();

So I tried to use this in my code, but once more this didn't really work for me:

  1. First, this is only compatible with LLVM 10.0.0: in version 11.0.0 it seems the LLJIT class would already register absolute symbols for the names that this utility is trying to register too, and thus you get duplicate symbols issues:
    Error LocalCXXRuntimeOverrides::enable(JITDylib &JD,
                                            MangleAndInterner &Mangle) {
      SymbolMap RuntimeInterposes;
      RuntimeInterposes[Mangle("__dso_handle")] =
        JITEvaluatedSymbol(toTargetAddress(&DSOHandleOverride),
                           JITSymbolFlags::Exported);
      RuntimeInterposes[Mangle("__cxa_atexit")] =
        JITEvaluatedSymbol(toTargetAddress(&CXAAtExitOverride),
                           JITSymbolFlags::Exported);
    
      return JD.define(absoluteSymbols(std::move(RuntimeInterposes)));
    }
    
  2. Even when using LLVM 10.0.0, I would still get my segmentation fault when using this utility… It took me quite a long time to figure out what was happening but I finally realize this was because somehow, the code I'm generating in my JIT compiler is not registering a destructor with the “__cxa_atexit” handler, but instead, with the legacy/obsolete “atexit” handler . Unfortunately, I have no real idea why (yet).
I've tried various setups with the command line arguments “-fno-use-cxa-atexit”, “-fuse-cxa-atexit”, “-fregister-global-dtors-with-atexit”, but none of these really helped to use the cxa_atexit handler. But as a remainder, don't forget that I'm on windows 10 platform and using all the ms compatibility flags for Visual Studio… maybe that could be part of the explanation [Or maybe I just need to try harder…]

But anyway, this gave me an idea: I could simply try to mimic the work done in that LocalCXXRuntimeOverrides utility and thus provide my own axexit handler, so I came up with this kind of code:

typedef void(*ExitFunc)();
std::vector<ExitFunc> exitFuncList;

static int AtExitOverride(ExitFunc func)
{
    DEBUG_MSG("Registering destructor in my AtExit override: "<<(const void*)func);
    exitFuncList.push_back(func);
    return 0;
}

static void runDestructors() {
    // We execute the functions in LIFO order:
    while(!exitFuncList.empty()) {
        DEBUG_MSG("Executing one at exit function.")
        exitFuncList.back()();
        exitFuncList.pop_back();
        DEBUG_MSG("Done executing one at exit function.")
    }
}

NervJITImpl::NervJITImpl()
{
	// Creating lljit object here.

    {
        auto& JD = lljit->getMainJITDylib();
        SymbolMap RuntimeInterposes;

        // We need to manually take care of the atexit function itself:
        RuntimeInterposes[(*mangler)("atexit")] =
            JITEvaluatedSymbol(toTargetAddress(&AtExitOverride),
                            JITSymbolFlags::Exported);
        checkLLVMError(JD.define(absoluteSymbols(std::move(RuntimeInterposes))));
    }
}

NervJITImpl::~NervJITImpl()
{
    // We should uninitialize our main dyn library here:
    DEBUG_MSG("Uninitializing main JIT lib.");
    checkLLVMError(lljit->deinitialize(lljit->getMainJITDylib()));
    runDestructors();
    DEBUG_MSG("Done uninitializing main JIT lib.");
}

⇒ And with those updates, my test script finally worked! Producing this kind of output:

[Debug]               Bitcode compiled in 1221.376ms
[DEBUG]: Adding module from bytecode.
[DEBUG]: Calling dyn lib initialize()
Creating a MyTest object.
[DEBUG]: Registering destructor in my AtExit override: 000002B73EE00340
[DEBUG]: Done calling dyn lib initialize()
Running test function.
[DEBUG]: Deleting NervJIT object.
[DEBUG]: Uninitializing main JIT lib.
[DEBUG]: Executing one at exit function.
Deleting a MyTest object.
[DEBUG]: Done executing one at exit function.
[DEBUG]: Done uninitializing main JIT lib.
[DEBUG]: Deleted NervJIT object.
Deleted LogManager object.

And this concludes this section on the globals construction & destruction support in our JIT compiler. Now it's time to move to the second significant issue I discovered when trying to run my C++ unit tests with catch from JIT code: the support for C++ exceptions.

From what I read on the Internet (mainly from the llvm.org email archives) exceptions handling in C++ on windows 64 doesn't seem to really be support inside the JIT compiled code: this is a pretty complex subject, so I don't pretent I understand everything here, but still I built the following minimal test script on this topic:

local jit = import "base.JITCompiler"

jit:runFunction("test_func", [[
#include <iostream>

#define DEBUG_MSG(msg) std::cout << msg << std::endl;

extern "C" void test_func() noexcept(false) {
    DEBUG_MSG("Begin test.")
    try {
        DEBUG_MSG("I'm throwing an exception.");
        throw std::exception("My exception message");
    }
    catch(const std::exception& e) {
       DEBUG_MSG("Catched exception: "<<e.what());
    }
    catch(...) {
        DEBUG_MSG("Catched exception.");
    }
    DEBUG_MSG("End test.")
    
    throw std::exception("Throwing from extern C :-)!");
    DEBUG_MSG("Real end test.")
}
]])
</sxhjs>

And of course this didn't work for me (simply crashing the program completely after the "I'm throwing an exception." message).

<note>Yet, you should still be able to compile this code just fine, but keep in mind you have to enable the exceptions handling support first using the command line arguments "-fcxx-exceptions", "-fexceptions" and "-fexternc-nounwind"</note>

So, again, I spent a significant time trying to find a proper solution to this problem (or at the very least some hints on how to handle that) online, and eventually I found this commit review page: https://reviews.llvm.org/D35103

<note>Actually, I'm not quite sure what happened with that list of commits :-) It's quite old (it was closed in March 2018) but "this feature" doesn't seem to be anywhere in the current LLVM sources, so ?</note>

=> Anyway, this was the only valid template I found so I really had to try to use that. Basically, from that code, I took the complete implementation of the **SingleSectionMemoryManager** class and kept it **almost as is**: the only real change I made on this level was in the SingleSectionMemoryManager constructor: in there **I do not register** a symbol for the "_CxxThrowException" function (I tried at first, but that didn't seem to work: and anyway with the LLJIT class you are supposed to proceed differently AFAIK, so we will get to this just after):

<sxh cpp>
#if defined(_WIN64)
// cf. https://reviews.llvm.org/D35103
#define NV_JIT_HANDLE_EXCEPTIONS 1
#endif

#if NV_JIT_HANDLE_EXCEPTIONS
class SingleSectionMemoryManager : public llvm::SectionMemoryManager {
  struct Block {
    uint8_t *Addr = nullptr, *End = nullptr;
    void Reset(uint8_t *Ptr, uintptr_t Size);
    uint8_t *Next(uintptr_t Size, unsigned Alignment);
  };
  Block Code, ROData, RWData;

public:
  uint8_t *allocateCodeSection(uintptr_t Size, unsigned Align, unsigned ID,
                               llvm::StringRef Name) final;

  uint8_t *allocateDataSection(uintptr_t Size, unsigned Align, unsigned ID,
                               llvm::StringRef Name, bool RO) final;

  void reserveAllocationSpace(uintptr_t CodeSize, uint32_t CodeAlign,
                              uintptr_t ROSize, uint32_t ROAlign,
                              uintptr_t RWSize, uint32_t RWAlign) final;

  bool needsToReserveAllocationSpace() override { return true; }

  using llvm::SectionMemoryManager::EHFrameInfos;

  SingleSectionMemoryManager();

  void deregisterEHFrames() override;

  bool finalizeMemory(std::string *ErrMsg) override;

private:
  uintptr_t ImageBase = 0;
};

void SingleSectionMemoryManager::Block::Reset(uint8_t *Ptr, uintptr_t Size) {
  assert(Ptr != nullptr && "Bad allocation");
  Addr = Ptr;
  End = Ptr ? Ptr + Size : nullptr;
}

uint8_t *SingleSectionMemoryManager::Block::Next(uintptr_t Size,
                                                 unsigned Alignment) {
  uintptr_t Out = (uintptr_t)Addr;

  // Align the out pointer properly
  if (!Alignment)
    Alignment = 16;
  Out = (Out + Alignment - 1) & ~(uintptr_t)(Alignment - 1);

  // RuntimeDyld should have called reserveAllocationSpace with an amount that
  // will fit all required alignemnts...but assert on this to make sure.
  assert((Out + Size) <= (uintptr_t)End && "Out of bounds");

  // Set the next Addr to deliver at the end of this one.
  Addr = (uint8_t *)(Out + Size);
  return (uint8_t *)Out;
}

uint8_t *SingleSectionMemoryManager::allocateCodeSection(uintptr_t Size,
                                                         unsigned Align,
                                                         unsigned ID,
                                                         StringRef Name) {
  return Code.Next(Size, Align);
}

uint8_t *SingleSectionMemoryManager::allocateDataSection(
    uintptr_t Size, unsigned Align, unsigned ID, StringRef Name, bool RO) {
  return RO ? ROData.Next(Size, Align) : RWData.Next(Size, Align);
}

void SingleSectionMemoryManager::reserveAllocationSpace(
    uintptr_t CodeSize, uint32_t CodeAlign, uintptr_t ROSize, uint32_t ROAlign,
    uintptr_t RWSize, uint32_t RWAlign) {
  // FIXME: Ideally this should be one contiguous block, with Code, ROData,
  // and RWData pointing to sub-blocks within, but setting the correct
  // permissions for that wouldn't work unless we over-allocated to have each
  // Block.Base aligned on a page boundary.
  const unsigned SecID = 0;
  Code.Reset(SectionMemoryManager::allocateCodeSection(CodeSize, CodeAlign,
                                                       SecID, "code"),
             CodeSize);

  ROData.Reset(SectionMemoryManager::allocateDataSection(ROSize, ROAlign, SecID,
                                                         "rodata", true/*RO*/),
               ROSize);

  RWData.Reset(SectionMemoryManager::allocateDataSection(RWSize, RWAlign, SecID,
                                                         "rwdata", false/*RO*/),
               RWSize);

  ImageBase =
      (uintptr_t)std::min(std::min(Code.Addr, ROData.Addr), RWData.Addr);
}

// FIXME: Rather than this static and overriding _CxxThrowException via
// DynamicLibrary::AddSymbol, a better route would be to transform the call
// to _CxxThrowException(Arg0, Arg1) -> RaiseSEHException(Arg0, Arg1, this)
// where 'this' is the SingleSectionMemoryManager instance.  This could probably
// be done with clang, and definitely possible by injecting an llvm-IR function
// into the module with the name '_CxxThrowException'
//
static SEHFrameHandler sFrameHandler;

void SingleSectionMemoryManager::deregisterEHFrames() {
  sFrameHandler.DeRegisterEHFrames(ImageBase, EHFrames);
  EHFrameInfos().swap(EHFrames);
}

bool SingleSectionMemoryManager::finalizeMemory(std::string *ErrMsg) {
  sFrameHandler.RegisterEHFrames(ImageBase, EHFrames);
  ImageBase = 0;
  return SectionMemoryManager::finalizeMemory(ErrMsg);
}

SingleSectionMemoryManager::SingleSectionMemoryManager() {
  // Override Windows _CxxThrowException to call into our local version that
  // can throw to and from the JIT.
//   that function should not be called here, instead, we register an absolute symbol in our JIT libs.
//   sys::DynamicLibrary::AddSymbol(
//       "_CxxThrowException",
//       (void *)(uintptr_t)&SEHFrameHandler::RaiseSEHException);
}

#endif

I think the change I made on the “_CxxThrowException” symbol is actually what is suggested in the “FIXME” comment above :-)

Same thing for the SEHFrameHandler class: I could just keep it as is, and it would compile just fine:

#ifdef NV_JIT_HANDLE_EXCEPTIONS

// Map an "ImageBase" to a range of adresses that can throw.
//
class SEHFrameHandler {
  typedef SingleSectionMemoryManager::EHFrameInfos EHFrameInfos;
  typedef std::vector<std::pair<DWORD, DWORD>> ImageRanges;
  typedef std::map<uintptr_t, ImageRanges> ImageBaseMap;
  ImageBaseMap m_Map;

  static void MergeRanges(ImageRanges &Ranges);
  uintptr_t FindEHFrame(uintptr_t Caller);

public:
  static __declspec(noreturn) void __stdcall RaiseSEHException(void *, void *);
  void RegisterEHFrames(uintptr_t ImageBase, const EHFrameInfos &Frames,
                        bool Block = true);
  void DeRegisterEHFrames(uintptr_t ImageBase, const EHFrameInfos &Frames);
};

// Merge overlaping ranges for faster searching with throwing PC
void SEHFrameHandler::MergeRanges(ImageRanges &Ranges) {
  std::sort(Ranges.begin(), Ranges.end());

  ImageRanges Merged;
  ImageRanges::iterator It = Ranges.begin();
  auto Current = *(It)++;
  while (It != Ranges.end()) {
    if (Current.second + 1 < It->first) {
      Merged.push_back(Current);
      Current = *(It);
    } else
      Current.second = std::max(Current.second, It->second);
    ++It;
  }
  Merged.emplace_back(Current);
  Ranges.swap(Merged);
}

// Find the "ImageBase" for Caller/PC who is throwing an exception
uintptr_t SEHFrameHandler::FindEHFrame(uintptr_t Caller) {
  for (auto &&Itr : m_Map) {
    const uintptr_t ImgBase = Itr.first;
    for (auto &&Rng : Itr.second) {
      if (Caller >= (ImgBase + Rng.first) && Caller <= (ImgBase + Rng.second))
        return ImgBase;
    }
  }
  return 0;
}

// Register a range of adresses for a single section that
void SEHFrameHandler::RegisterEHFrames(uintptr_t ImageBase,
                                       const EHFrameInfos &Frames, bool Block) {
  if (Frames.empty())
    return;
  assert(m_Map.find(ImageBase) == m_Map.end());

  ImageBaseMap::mapped_type &Ranges = m_Map[ImageBase];
  ImageRanges::value_type *BlockRange = nullptr;
  if (Block) {
    // Merge all unwind adresses into a single contiguous block for faster
    // searching later.
    Ranges.emplace_back(std::numeric_limits<DWORD>::max(),
                        std::numeric_limits<DWORD>::min());
    BlockRange = &Ranges.back();
  }

  for (auto &&Frame : Frames) {
    assert(m_Map.find(DWORD64(Frame.Addr)) == m_Map.end() &&
           "Runtime function should not be a key!");

    PRUNTIME_FUNCTION RFunc = reinterpret_cast<PRUNTIME_FUNCTION>(Frame.Addr);
    const size_t N = Frame.Size / sizeof(RUNTIME_FUNCTION);
    if (BlockRange) {
      for (PRUNTIME_FUNCTION It = RFunc, End = RFunc + N; It < End; ++It) {
        BlockRange->first = std::min(BlockRange->first, It->BeginAddress);
        BlockRange->second = std::max(BlockRange->second, It->EndAddress);
      }
    } else {
      for (PRUNTIME_FUNCTION It = RFunc, End = RFunc + N; It < End; ++It)
        Ranges.emplace_back(It->BeginAddress, It->EndAddress);
    }

    ::RtlAddFunctionTable(RFunc, N, ImageBase);
  }

  if (!Block)
    MergeRanges(Ranges); // Initial sort and merge
}

void SEHFrameHandler::DeRegisterEHFrames(uintptr_t ImageBase,
                                         const EHFrameInfos &Frames) {
  if (Frames.empty())
    return;

  auto Itr = m_Map.find(ImageBase);
  if (Itr != m_Map.end()) {
    // Remove the ImageBase from lookup
    m_Map.erase(Itr);

    // Unregister all the PRUNTIME_FUNCTIONs
    for (auto &&Frame : Frames)
      ::RtlDeleteFunctionTable(reinterpret_cast<PRUNTIME_FUNCTION>(Frame.Addr));
  }
}

// Adapted from VisualStudio/VC/crt/src/vcruntime/throw.cpp
#ifdef _WIN64
#define _EH_RELATIVE_OFFSETS 1
#endif
// The NT Exception # that we use
#define EH_EXCEPTION_NUMBER ('msc' | 0xE0000000)
// The magic # identifying this version
#define EH_MAGIC_NUMBER1 0x19930520
#define EH_PURE_MAGIC_NUMBER1 0x01994000
// Number of parameters in exception record
#define EH_EXCEPTION_PARAMETERS 4

// A generic exception record
struct EHExceptionRecord {
  DWORD ExceptionCode;
  DWORD ExceptionFlags;               // Flags determined by NT
  _EXCEPTION_RECORD *ExceptionRecord; // Extra exception record (unused)
  void *ExceptionAddress;             // Address at which exception occurred
  DWORD NumberParameters; // No. of parameters = EH_EXCEPTION_PARAMETERS
  struct EHParameters {
    DWORD magicNumber;            // = EH_MAGIC_NUMBER1
    void *pExceptionObject;       // Pointer to the actual object thrown
    struct ThrowInfo *pThrowInfo; // Description of thrown object
#if _EH_RELATIVE_OFFSETS
    DWORD64 pThrowImageBase; // Image base of thrown object
#endif
  } params;
};

__declspec(noreturn) void __stdcall
SEHFrameHandler::RaiseSEHException(void *CxxExcept, void *Info) {
    // DEBUG_MSG("Entering SEHFrameHandler::RaiseSEHException()!");

  uintptr_t Caller;
  static_assert(sizeof(Caller) == sizeof(PVOID), "Size mismatch");

  USHORT Frames = CaptureStackBackTrace(1, 1, (PVOID *)&Caller, NULL);
  assert(Frames && "No frames captured");
  (void)Frames;

  const DWORD64 BaseAddr = sFrameHandler.FindEHFrame(Caller);
  if (BaseAddr == 0)
    _CxxThrowException(CxxExcept, (_ThrowInfo *)Info);

  // A generic exception record
  EHExceptionRecord Exception = {
      EH_EXCEPTION_NUMBER,      // Exception number
      EXCEPTION_NONCONTINUABLE, // Exception flags (we don't do resume)
      nullptr,                  // Additional record (none)
      nullptr,                  // Address of exception (OS fills in)
      EH_EXCEPTION_PARAMETERS,  // Number of parameters
      {EH_MAGIC_NUMBER1, CxxExcept, (struct ThrowInfo *)Info,
#if _EH_RELATIVE_OFFSETS
       BaseAddr
#endif
      }};

// const ThrowInfo* pTI = (const ThrowInfo*)Info;

#ifdef THROW_ISWINRT
  if (pTI && (THROW_ISWINRT((*pTI)))) {
    // The pointer to the ExceptionInfo structure is stored sizeof(void*)
    // infront of each WinRT Exception Info.
    ULONG_PTR *EPtr = *reinterpret_cast<ULONG_PTR **>(CxxExcept);
    EPtr--;

    WINRTEXCEPTIONINFO **ppWei = reinterpret_cast<WINRTEXCEPTIONINFO **>(EPtr);
    pTI = (*ppWei)->throwInfo;
    (*ppWei)->PrepareThrow(ppWei);
  }
#endif

  // If the throw info indicates this throw is from a pure region,
  // set the magic number to the Pure one, so only a pure-region
  // catch will see it.
  //
  // Also use the Pure magic number on Win64 if we were unable to
  // determine an image base, since that was the old way to determine
  // a pure throw, before the TI_IsPure bit was added to the FuncInfo
  // attributes field.
  if (Info != nullptr) {
#ifdef THROW_ISPURE
    if (THROW_ISPURE(*pTI))
      Exception.params.magicNumber = EH_PURE_MAGIC_NUMBER1;
#if _EH_RELATIVE_OFFSETS
    else
#endif // _EH_RELATIVE_OFFSETS
#endif // THROW_ISPURE

    // Not quite sure what this is about, but pThrowImageBase can never be 0
    // here, as that is used to mark when an "ImageBase" was not found.
#if 0 && _EH_RELATIVE_OFFSETS
    if (Exception.params.pThrowImageBase == 0)
      Exception.params.magicNumber = EH_PURE_MAGIC_NUMBER1;
#endif // _EH_RELATIVE_OFFSETS
  }

// Hand it off to the OS:
#if defined(_M_X64) && defined(_NTSUBSET_)
  RtlRaiseException((PEXCEPTION_RECORD)&Exception);
#else
  RaiseException(Exception.ExceptionCode, Exception.ExceptionFlags,
                 Exception.NumberParameters, (PULONG_PTR)&Exception.params);
#endif
}

#endif

And then we have the required changes to make the code provided above compatible with the LLJIT class in LLVM 11.0.0:

  1. First, we need to inject the symbol for “_CxxThrowException” into our JIT Dylib, so I'm using this updated code section to achieve this:

NervJITImpl::NervJITImpl()
{
	// Creating lljit object here.

    {
        auto& JD = lljit->getMainJITDylib();
        SymbolMap RuntimeInterposes;

        // We need to manually take care of the atexit function itself:
        RuntimeInterposes[(*mangler)("atexit")] =
            JITEvaluatedSymbol(toTargetAddress(&AtExitOverride),
                            JITSymbolFlags::Exported);
#if NV_JIT_HANDLE_EXCEPTIONS
        RuntimeInterposes[(*mangler)("_CxxThrowException")] =
            JITEvaluatedSymbol(toTargetAddress(&SEHFrameHandler::RaiseSEHException),
                            JITSymbolFlags::Exported);
#endif

        checkLLVMError(JD.define(absoluteSymbols(std::move(RuntimeInterposes))));
    }
}

And then we also need to explicitly “tell our LLJIT object” to use this new “SingleSectionMemoryManager” we just implemented. In the “clang interpreter” reference code this was done as follow:

// (Warning: this code doesn't apply when building a LLJIT object)
static llvm::ExecutionEngine *
createExecutionEngine(std::unique_ptr<llvm::Module> M, std::string *ErrorStr) {
  llvm::EngineBuilder EB(std::move(M));
  EB.setErrorStr(ErrorStr);
  EB.setMemoryManager(llvm::make_unique<SingleSectionMemoryManager>());
  llvm::ExecutionEngine *EE = EB.create();
  EE->finalizeObject();
  return EE;
}

… But of course, this doesn't apply for us, because we don't have an ExecutionEngine component in out LLJIT object AFAIK. So searching in the LLJIT sources, I finally got how you are supposed to do this in the newer implementation: you have to provide a ObjectLinkingLayerCreator to your LLJIT builder as shown below:

    LLJITBuilder llb;
    llb.setJITTargetMachineBuilder(std::move(jtmb)).setNumCompileThreads(2);
#if NV_JIT_HANDLE_EXCEPTIONS
    // We use our custom memory manager here:
    llb.setObjectLinkingLayerCreator([](ExecutionSession &ES, const Triple &triple) -> std::unique_ptr<ObjectLayer> {
        auto GetMemMgr = []() { return std::make_unique<SingleSectionMemoryManager>(); };
        auto ObjLinkingLayer = std::make_unique<RTDyldObjectLinkingLayer>(ES, std::move(GetMemMgr));
        
        // Note sure this is needed/appropriate ?
        if (triple.isOSBinFormatCOFF()) {
            ObjLinkingLayer->setOverrideObjectFlagsWithResponsibilityFlags(true);
            ObjLinkingLayer->setAutoClaimResponsibilityForObjectSymbols(true);
        }
        
        return std::unique_ptr<ObjectLayer>(std::move(ObjLinkingLayer));
    });
#endif

    lljit = CHECK_LLVM(llb.create());

At this point I was starting to feel quite nervous: by experience I'd say that adding so much code when you don't fully understand what is happening can usually only lead to one path: crashes, garbage, and more crashes… until you finally completely understand what you are doing [and you start contemplating how dump and crazy you were before… LOL]. But maybe this was my lucky day then… because this just worked! Yeaaaaahhh! :-D

Note: I also updated my LLVM bindings to be able to catch exceptions coming out of my JIT code since this implementation was also supposed to provide support for that:

    SOL_CUSTOM_FUNC(call) = [](class_t& obj, const std::string& name) {
        auto func = (void(*)())obj.lookup(name);
        CHECK(func, "Cannot find function with name "<<name);
        try {
            func();
        }
        catch(const std::exception& e) {
            logERROR("Exception catched from JIT code: "<<e.what());
        }
        catch(...) {
            logERROR("Unknown exception catched from JIT code.");
        }
    };

Then running the test script above, I got the following (correct) results:

[DEBUG]: Adding module from bytecode.
[DEBUG]: Calling dyn lib initialize()
[DEBUG]: Done calling dyn lib initialize()
Begin test.
I'm throwing an exception.
Catched exception: My exception message
End test.
[Error]         Exception catched from JIT code: Throwing from extern C :-)!
[DEBUG]: Deleting NervJIT object.
[DEBUG]: Uninitializing main JIT lib.
[DEBUG]: Done uninitializing main JIT lib.
[DEBUG]: Deleted NervJIT object.

After those good results on the globals construction/destruction and the C++ exceptions handling, I thought I should give my unit tests system another try. And in fact, with additional investigations, I found another header only test framework called Lest (cf. https://github.com/martinmoene/lest), it's not as complex/evolved as catch2, but in my position, this was actually a good thing! Because I got it to work without much trouble in my JIT code:

local jit = import "base.JITCompiler"

jit:runFunction("test_func", [[
#include <test/lest.hpp>
#include <iostream>
#include <core_common.h>

#define DEBUG_MSG(msg) std::cout << msg << std::endl;

using namespace std;

const lest::test specification[] =
{
    CASE( "Empty string has length zero (succeed)" )
    {
        EXPECT( 0 == string(  ).length() );
        EXPECT( 0 == string("").length() );

        EXPECT_NOT( 0 < string("").length() );
    },

    CASE( "Text compares lexically (fail)" )
    {
        EXPECT( string("hello") > string("world") );
    },

    CASE( "Unexpected exception is reported" )
    {
        EXPECT( (throw std::runtime_error("surprise!"), true) );
    },

    CASE( "Unspecified expected exception is captured" )
    {
        EXPECT_THROWS( throw std::runtime_error("surprise!") );
    },

    CASE( "Specified expected exception is captured" )
    {
        EXPECT_THROWS_AS( throw std::bad_alloc(), std::bad_alloc );
    },

    CASE( "Expected exception is reported missing" )
    {
        EXPECT_THROWS( true );
    },

    CASE( "Specific expected exception is reported missing" )
    {
        EXPECT_THROWS_AS( true, std::runtime_error );
    },
};

extern "C" void test_func()
{
    char* argv[] = { "my_dummy_app.exe" };
    int argc = 1;
    DEBUG_MSG("Running session.");
    int res = lest::run( specification, argc, argv /*, std::cout */  );
    if(res!=0) {
        DEBUG_MSG("lest detected "<<res<<" failing tests.");
    }
    DEBUG_MSG("Done running session.");
};

]])

logDEBUG("Done running tests.")
</sxhjs>

The script above produced the expected outputs (with C++ exceptions enabled obviously):

<code>
[DEBUG]: Adding module from bytecode.
[DEBUG]: Calling dyn lib initialize()
[DEBUG]: Registering destructor in my AtExit override: 000001C804B8B490
[DEBUG]: Done calling dyn lib initialize()
Running session.
(21): failed: Text compares lexically (fail): string("hello") > string("world") for "hello" > "world"
(26): failed: got unexpected exception with message "surprise!": Unexpected exception is reported: (throw std::runtime_error("surprise!"), true)
(41): failed: didn't get exception: Expected exception is reported missing: true
(46): failed: didn't get exception of type std::runtime_error: Specific expected exception is reported missing: true
4 out of 7 selected tests failed.
lest detected 4 failing tests.
Done running session.
[Debug]               Done running tests.
[DEBUG]: Deleting NervJIT object.
[DEBUG]: Uninitializing main JIT lib.
[DEBUG]: Executing one at exit function.
[DEBUG]: Done executing one at exit function.
[DEBUG]: Done uninitializing main JIT lib.
[DEBUG]: Deleted NervJIT object.
</code>

===== JIT Modules linking process =====

Now, there is one last thing I would like to discuss in this already terribly long article: while trying to build some more tests with lest, I wanted to try the "test-autoregistration" support from multiple translation units (cf. https://github.com/martinmoene/lest/blob/master/example/13-module-auto-reg-1.cpp). So I prepared 3 separated C++ scripts:

<sxh cpp>
// nv_land_test.cpp file
define lest_FEATURE_AUTO_REGISTER 1
#include <test/lest.hpp>
#include <iostream>
#include <core_common.h>

#define TEST_CASE( name ) lest_CASE( specification(), name )

using namespace std;

lest::tests & specification()
{
    static lest::tests tests;
    return tests;
}

TEST_CASE( "Empty string has length zero (succeed)" )
{
    EXPECT( 0 == string(  ).length() );
    EXPECT( 0 == string("").length() );

    EXPECT_NOT( 0 < string("").length() );
}
extern "C" void nv_land_tests()
{
    char* argv[] = { "my_dummy_app.exe" };
    int argc = 1;
    DEBUG_MSG("Running session.");
    int res = lest::run( specification(), argc, argv /*, std::cout */  );
    if(res!=0) {
        DEBUG_MSG("lest detected "<<res<<" failing tests.");
    }
    DEBUG_MSG("Done running session.");
};

// nv_land_1_spec.cpp
#define lest_FEATURE_AUTO_REGISTER 1
#include <test/lest.hpp>
#include <iostream>
#include <core_common.h>

#define TEST_CASE( name ) lest_CASE( specification(), name )

extern lest::tests & specification();

TEST_CASE( "A passing test" "[pass]" ) 
{
    EXPECT( 42 == 42 );
}

// nv_land_2_spec.cpp
#define lest_FEATURE_AUTO_REGISTER 1
#include <test/lest.hpp>
#include <iostream>
#include <core_common.h>

#define TEST_CASE( name ) lest_CASE( specification(), name )

extern lest::tests & specification();

TEST_CASE( "A failing test" "[fail]" ) 
{
    EXPECT( 42 == 7 );
}

Then I would naively try to load those 3 files as “separated modules” into my JIT session:

local jit = import "base.JITCompiler"

jit:usePCHFile("")
jit:loadScript("test/nvland/nv_land_tests")
jit:loadScript("test/nvland/nv_land_1_spec")
jit:loadScript("test/nvland/nv_land_2_spec")

jit:execute("nv_land_tests")

logDEBUG("Done running tests.")
</sxhjs>

But of course this would not work because as soon as I try to load the second script, I get a duplicate symbol error from LLVM (and this completely makes sense): <code>[ERROR]: LLVM error: Duplicate definition of symbol '??_7success@lest@@6B@'</code>

=> So I really had to figure out how you should proceed to link multiple modules together and fix this kind of symbols duplication issue correctly. And I found the **llvm-link** tool which seems to provide exactly this feature: it will allow you to take multiple Modules as input, and **merge** them into a single module, resolving symbols as needed :-)

From there I built the following helper functions (using the llvm-link sources as reference):

<sxh cpp>
static void linkFiles(LLVMContext &Context, Linker &L,
                      const std::vector<std::string> &Files,
                      unsigned Flags, bool internalize) {

  // Filter out flags that don't apply to the first file we load.
  unsigned ApplicableFlags = Flags & Linker::Flags::OverrideFromSrc;
  // Similar to some flags, internalization doesn't apply to the first file.
  bool InternalizeLinkedSymbols = false;

  for (const auto &File : Files) {
    llvm::SMDiagnostic dErr;
    std::unique_ptr<Module> M(llvm::parseIRFile(llvm::StringRef(File), dErr, Context));

    if (!M.get()) {
        THROW_MSG("Cannot load module file: "<< File);
    }

    if(verifyModule(*M, &errs())) {
        THROW_MSG("Module file "<<File<<" is broken.");
    }

    bool Err = false;
    if (InternalizeLinkedSymbols) {
      Err = L.linkInModule(
          std::move(M), ApplicableFlags, [](Module &M, const StringSet<> &GVS) {
            internalizeModule(M, [&GVS](const GlobalValue &GV) {
              return !GV.hasName() || (GVS.count(GV.getName()) == 0);
            });
          });
    } else {
      Err = L.linkInModule(std::move(M), ApplicableFlags);
    }

    if (Err) {
        THROW_MSG("Error while linking in module.");
    }

    // Internalization applies to linking of subsequent files.
    InternalizeLinkedSymbols = internalize;

    // All linker flags apply to linking of subsequent files.
    ApplicableFlags = Flags;
  }
}

void NervJITImpl::linkModule(const std::string& outFile, const std::vector<std::string>& inputList, bool onlyNeeded, bool internalize, bool optimize, bool preserveUseListOrder)
{
    auto Composite = std::make_unique<Module>("llvm-link", *tsContext->getContext());
    llvm::Linker L(*Composite);

    unsigned Flags = Linker::Flags::None;
    if (onlyNeeded)
        Flags |= Linker::Flags::LinkOnlyNeeded;

    linkFiles(*tsContext->getContext(), L, inputList, Flags, internalize);

    if(verifyModule(*Composite, &errs())) {
        THROW_MSG("Composite module is broken.");
    }

    if(optimize) {
        optimizeModule(Composite.get());
    }

    // cf. https://stackoverflow.com/questions/13903686/writing-module-to-bc-bitcode-file
    std::error_code EC;
    llvm::raw_fd_ostream OS(outFile, EC, llvm::sys::fs::F_None);
    WriteBitcodeToFile(*Composite, OS, preserveUseListOrder);
    OS.flush();
}

⇒ The idea above is to use the llvm::Linker object to perform the required link stage, and then we write the populated composite module into a bitcode file with the WriteBitcodeToFile() function.

Then I implemented the necessary frontend functions in Lua to use this new module linking feature:

function Class:linkModule(files, force)
  -- For each file, we generate the bitcode:
  local bcfiles = {}
  for _,file in pairs(files) do 
    file = self:findFile(file)
    table.insert(bcfiles, self:buildBitcodeForFile(file, force))
  end

  local modHash = self:computeStringListHash(bcfiles)

  -- Now we should link the module and generate the corresponding hash:
  local modfile = self.bc_dir..modHash..".bc"

  if force or not nv.fileExists(modfile) then
    logDEBUG("Linking module file ", modfile, "...")
    local startTime = nv.SystemTime.getCurrentTime()
    self.jit:linkModule(modfile, bcfiles, true, true, true, true)
    local endTime = nv.SystemTime.getCurrentTime()
    logDEBUG(string.format("Linked module  in %.3fms", (endTime - startTime)*1000.0))
  end

  return modfile
end

function Class:loadModule(files, force)
  local modfile = self:linkModule(files, force)
  self.jit:loadModuleBitcode(modfile)
end

And finally, I could update my test script to ensure I would “link” all my C++ scripts together to build a working module before loading it into the JIT session:

local jit = import "base.JITCompiler"

jit:usePCHFile("")
-- jit:loadScript("test/nvland/nv_land_tests")
-- jit:loadScript("test/nvland/nv_land_1_spec")
-- jit:loadScript("test/nvland/nv_land_2_spec")
jit:loadModule{
    "test/nvland/nv_land_tests",
    "test/nvland/nv_land_1_spec",
    "test/nvland/nv_land_2_spec",
}

jit:execute("nv_land_tests")

logDEBUG("Done running tests.")

And this time, this would work just fine:

[DEBUG]: Adding module from bytecode.
[DEBUG]: Calling dyn lib initialize()
[DEBUG]: Done calling dyn lib initialize()
[Debug]               Running session.
(12): failed: A failing test[fail]: 42 == 7 for 42 == 7
1 out of 3 selected tests failed.
[Debug]               lest detected 1 failing tests.
[Debug]               Done running session.
[Debug]               Done running tests.

Okay, hmmm, first of all, I'm absolutely sorry I made this article so long… I should definitely have cut this into smaller pieces :-S But Here we are anyway :-). If you made it that far, then congratulations! (Seriously man lol). And I hope you will find some useful hints and ideas in the code and details I provided here.

As usual, in case this could help, here is a package containing the latest C++ sources for this NervJIT compiler:

nv_llvm_20200427.zip

And for me, now it's time to get a coffee and take a break before I get back to some more testing hi hi hi.

Happy coding everyone!

  • blog/2020/0425_jit_compiler_part5_improvements.txt
  • Last modified: 2022/11/18 09:28
  • by 127.0.0.1