blog:2020:0617_nervluna_file_naming

NervLuna: Fixing generated file names

In our previous post, we were trying to generate the bindings for the nvCore module, and even if we cannot say this was a complete success, we made some significant steps forward. And now, it's time to continue on this path, trying to push it even further. Here we will mainly focus on how the actual binding files are named and what we can do to use short aliases for some template classes when applicable (both issues are closely related, since we use the class name to generate the file name.)

First things first, in our generated “wip_core” folder, we are still missing some of the key files required to actually perform the compilation (ie. the cmake files). So we should auto-generate them if they are missing.

⇒ I'm using my helper class CommonHeaderWriter to handle that task extending it with the following content:

  self:writeRootCMakeLists()
  self:writeOutputFile("CMakeLists.txt")
  self:reset()

  self:writeSrcCMakeLists()
  self:writeOutputFile("src/CMakeLists.txt")
  self:reset()
</sxhjs>

Where I'm using the additional function definitions:

<sxhjs lua>
function Class:writeRootCMakeLists()
  self:writeLine[[ADD_SUBDIRECTORY(src)]]
end

function Class:writeSrcCMakeLists()
  self:writeSubLine([[
SET(TARGET_NAME "${1}")
SET(TARGET_DIR ".")

SET(SHARED_TGT ${TARGET_NAME})
FILE(GLOB_RECURSE SOURCE_FILES "*.cpp")
]], luna:getLibraryName());
  self:newLine();
  
  local cfg = luna:getCMakeConfig()
  
  for _,ipath in ipairs(cfg.includeDirs or {}) do
    self:writeSubLine("INCLUDE_DIRECTORIES(${1})", ipath)
  end
  self:newLine();

  for _,ipath in ipairs(cfg.libDirs or {}) do
    self:writeSubLine("LINK_DIRECTORIES(${1})", ipath)
  end
  self:newLine();

  self:writeSubLine([[ADD_MSVC_PRECOMPILED_HEADER("bind_precomp.h" "bind_precomp.cpp" SOURCE_FILES)

# Build the shared library:
ADD_LIBRARY (${SHARED_TGT} SHARED ${SOURCE_FILES})
]])
  self:writeSubLine("TARGET_LINK_LIBRARIES(${SHARED_TGT} nvCore ${LUAJIT_LIBS} ${1})", table.concat(cfg.libs or {}," "))
  self:newLine()
  self:writeSubLine([[
INSTALL(TARGETS ${SHARED_TGT}
RUNTIME DESTINATION ${TARGET_DIR}
LIBRARY DESTINATION ${TARGET_DIR})

# Install the pdb if applicable:
INSTALL_PDB()
]])
end
</sxhjs>

And thus, now I'm also specifying a "cmake configuration" element in my binding generation file to be able to generate those files correctly. That element is mostly empty for the wip_core project, but for instance in another simple Vulkan binding module I've been working on in parallel I now have:

<sxhjs lua>
cfg.cmakeConfig = {
    includeDirs = {
        "${VULKAN_DIR}/include",
        "${VULKAN_DIR}/include/vulkan",
        "../include"
    },
    libDirs = {
        "${VULKAN_DIR}/lib"
    },
    libs = {
        "vulkan-1"
    }
}
</sxhjs>

=> With those additional files, I can now try compiling the bindings.

===== Fixing the invalid file names =====

On the first attempt to compile the nvCore bindings I got an error from cmake itself, because some of the file names are just completely wrong: <code>CMake Error at sources/lua_bindings/wip_Core/src/CMakeLists.txt:14 (ADD_LIBRARY):
  Legacy variable expansion in source file
  "W:/Projects/NervSeed/sources/lua_bindings/wip_Core/src/luna/bind_std_map_std_string,${T}.cpp"
  expanded to
  "W:/Projects/NervSeed/sources/lua_bindings/wip_Core/src/luna/bind_std_map_std_string,.cpp"
  in target "luaCoreWIP".  This behavior will be removed in a future version
  of CMake.
</code>

Of course I didn't intent to //expand anything// in the file name :-) Let's see how this should be fixed now.

Hmmm... and actually, the file I mentioned above is really confusing... I don't quite understand why this would be generated, but at least I think I know where this comes from:

<sxh cpp>
template<typename T>
struct EnumNames {
  static std::map<std::string, T> mapping;
};

So in the code above, I have a static field “mapping” that is using the type of interest. But that field is still inside a Class template, so when we create a concrete class instance we should replace the template parameter somehow [Now that I mention it, I don't think I have that handled anywhere… so it's not so surprising we still have the raw template parameter name in there afterall ?]

[I keep digging…]

⇒ the strange part now is really on this line in my log file:

[Debug] 	      Resolved template argument 1: ${T}

After some investigations I realized this was due to the fact that, when I detect a new type that take some template arguments, I was simply assuming that I was building a concrete type. This is true for instance when parsing “std::map<std::string, int>”. But in this case we were parsing “std::map<std::string, T>”. The type representing “T” here was already previously registered, so its name was changed to “${T}”, and then I was simply considering it was a concrete type, and thus building the class “std::map<std::string, ${T}>”. Not very good :-)

The solution here, is rather to check for each of the template argument I receive if we have a concrete type or if it is just a “form of template”:

local isTemplateType = false
local templateArgs = {}
for i=0,ntargs-1 do
  local atype = self:resolveType(cxtype:getTemplateArgumentType(i))
  logDEBUG("Resolved template argument ",i,": ", atype:getName())
  
  -- If the "type" we just retrieved is still a "template", then this
  -- new type will also be a template itself.
  isTemplateType = isTemplateType or atype:isTemplateType()
  table.insert(templateArgs, atype)
end
</sxhjs>

Then when finalizing the construction of our new type we check if the result is a concrete class that should be registered or if we should only register another template type (for a partial specialization for instance like here):

<sxhjs lua>
        if isTemplateType then
          -- This is still a templated type, so we do not want to create an actual class for it:
          -- We just assign the refTname to it as well as the template parameters:
          logDEBUG("Extracting template parameters from class name: ",clname, " for type ", tname)

          local refTname, tparams  = typeManager:collectTemplateParameters(clname)

          t:setName(refTname)
          t:setTemplateParameters(tparams)
        else
          -- This is a concrete class:
          cl = tplNS:getOrCreateClass(clname)

          -- We assign this class as target:
          CHECK(cl~=nil, "Invalid class generated from template.")
          t:setTarget(cl)
        end
</sxhjs>

=> This seem to work as expected, since I don't have those invalid file names generated anymore.

In the process, I also updated my code to **sanitize** the C++ element names as follow:

<sxhjs lua>function Class:sanitizeName(name)
  name = name:gsub("::","_")
  name = name:gsub("[<>,%s]","_")
  name = name:gsub("__+","_")
  name = name:gsub("_$", "")
  return name
end</sxhjs>

So no more commas or spaces in my file names either.

===== Too long file names issue =====

Yet, I'm facing another problem with those class names/file names: some of them are becoming **really long**, to the point were we have more than 250 characters for some file paths, and cmake will complain about it:

<code>CMake Warning in sources/lua_bindings/wip_Core/src/CMakeLists.txt:
  The object file directory

    W:/Projects/NervSeed/build/msvc64/sources/lua_bindings/wip_Core/src/CMakeFiles/luaCoreWIP.dir/

  has 94 characters.  The maximum full path to an object file is 250
  characters (see CMAKE_OBJECT_PATH_MAX).  Object file

    luna/bind_std_vector_std_basic_string_char_std_char_traits_char_std_allocator_char_std_allocator_std_basic_string_char_std_char_traits_char_std_allocator_char.cpp.obj

  cannot be safely placed under this directory.  The build may not work
  correctly.</code>

And if we check the content of those kind of source files we see the following:

<sxh cpp>
std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int>>* LunaTraits< std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int>> >::construct(lua_State* L) {
	luaL_error(L, "No public constructor available for class std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int>>");
	return nullptr;
}

void LunaTraits< std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int>> >::destruct(std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int>>* obj) {
	THROW_MSG("No public destructor available for class std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int>>");
}

Clearly not something you would like to play with lol.

⇒ So I'm thinking that I need to find a way to collect all the names provided for a given type (with typedefs or “using” directives ?) and then I should use the shortest name available when generating the binding code.

For instance, in the code above the part “const std::basic_string<char, std::char_traits<char>, std::allocator<char>” is really the same thing as a “const std::string” in fact.

Yet I'm not sure how easy/complex this will be… So let's investigate what we can do about it.

Note: Another serious issue I noticed while investigating on this long type names is the following:

[Debug] 	      No class template name provided, manually extracting class template name from type name: std::basic_string<char, std::char_traits<char>, std::allocator<char>>
[Debug] 	      Entering namespace std
[Debug] 	      Entering namespace basic_string<char, std
[Debug] 	      Creating namespace std::basic_string<char, std
[Debug] 	      Entering namespace char_traits<char>, std
[Debug] 	      Creating namespace std::basic_string<char, std::char_traits<char>, std
[Warning] 	No class template registered for 'allocator<char>>' creating default place holder class: 'allocator<char>>' in namespace 'std::basic_string<char, std::char_traits<char>, std'
[Debug] 	      Creating class std::basic_string<char, std::char_traits<char>, std::allocator<char>>

⇒ Clearly the namespace parsing is not done correctly in that case. So this should be fixed. Done

OK, so I'm now starting to handle the “aliases” for the types and classes: the filenames are now getting a “bit better”, for instance the system will correctly use the typedef “std::string” for “std::basic_string<char, std::char_traits<char>, std::allocator<char>>”. But then I also have another typedef called “nv::String” (which is a string class with a special allocator), and in the bindings I get this kind of content:

const char LunaTraits< nv::String >::className[] = "basic_string_char_std_char_traits_char_nv_STLAllocator_char_nv_DefaultPoolAllocator";
const char LunaTraits< nv::String >::fullName[] = "nv_String";
const char* LunaTraits< nv::String >::namespaces[] = {"std",0};
const char* LunaTraits< nv::String >::parents[] = {0};
const StringID LunaTraits< nv::String >::id = SID("nv::String");
const StringID LunaTraits< nv::String >::baseIDs[] = {0};

⇒ When using an alias, I really need to retrieve the “simple class name” and the “actual namespace” from the alias itself.

The “className” and “namespaces” variables above are the one used to find the class in lua. So we really don't want to use those too long names there either.

Okay, the issue above is now fixed. Yet, I still think there are some locations where the class names in use are too long and could be reduced. For instance, we have the class “std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int>” defined, and with no typedef for it. So of course, this will produce a very long filename and type references in the class binding file.

Yet, we could notice that the type “std::basic_string<char, std::char_traits<char>, std::allocator<char>>” is mentioned in that full type, and we have a typedef on that one (ie. “std::string”) so we should be able to refactor our type into: “std::pair<const std::string, int>”, no ? ⇒ Let's investigate this.

After some good investigations on the subject, I realized that there was a fondamental issue I couldn't find a way to fix: I can find [I mean libclang can find…] the declaration of the “std::string” somewhere in “xstring”:

using string = basic_string<char, char_traits<char>, allocator<char>>;

And from there I could retrieve a TemplateRef for “basic_string”. But from there, it would be pretty hard to say what is the actual namespace for “basic_string” ? [Or at least I couldn't find anything method in libclang that would provide me with this info] we could be inside “std”, or another namespace, and even if we are in a given namespace, we could be “using” the namespace where “basic_string” comes from and so on ?

⇒ So I ended up extending my custom version of the libclang module with this additional helper function:

CXString clang_getCursorFullRefSpelling(CXCursor C) {
  if (clang_isTranslationUnit(C.kind))
    return clang_getTranslationUnitSpelling(getCursorTU(C));

  PrintingPolicy P = getCursorContext(C).getPrintingPolicy();
  P.FullyQualifiedName = 1;

  if (clang_isReference(C.kind)) {
    switch (C.kind) {
    case CXCursor_TypeRef: {
      const TypeDecl *Type = getCursorTypeRef(C).first;
      assert(Type && "Missing type decl");

      return cxstring::createDup(
          getCursorContext(C).getTypeDeclType(Type).getAsString(P));
    }
    case CXCursor_TemplateRef: {
      const TemplateDecl *Template = getCursorTemplateRef(C).first;
      assert(Template && "Missing template decl");

      return cxstring::createDup(Template->getQualifiedNameAsString());
    }

    case CXCursor_NamespaceRef: {
      const NamedDecl *NS = getCursorNamespaceRef(C).first;
      assert(NS && "Missing namespace decl");

      return cxstring::createDup(NS->getQualifiedNameAsString());
    }

    case CXCursor_MemberRef: {
      const FieldDecl *Field = getCursorMemberRef(C).first;
      assert(Field && "Missing member decl");

      return cxstring::createDup(Field->getQualifiedNameAsString());
    }

    case CXCursor_OverloadedDeclRef: {
      OverloadedDeclRefStorage Storage = getCursorOverloadedDeclRef(C).first;
      if (const Decl *D = Storage.dyn_cast<const Decl *>()) {
        if (const NamedDecl *ND = dyn_cast<NamedDecl>(D))
          return cxstring::createDup(ND->getQualifiedNameAsString());
        return cxstring::createEmpty();
      }
      if (const OverloadExpr *E = Storage.dyn_cast<const OverloadExpr *>())
        return cxstring::createDup(E->getName().getAsString());
      OverloadedTemplateStorage *Ovl =
          Storage.get<OverloadedTemplateStorage *>();
      if (Ovl->size() == 0)
        return cxstring::createEmpty();
      return cxstring::createDup((*Ovl->begin())->getQualifiedNameAsString());
    }

    case CXCursor_VariableRef: {
      const VarDecl *Var = getCursorVariableRef(C).first;
      assert(Var && "Missing variable decl");

      return cxstring::createDup(Var->getQualifiedNameAsString());
    }

    default:
      return cxstring::createRef("<not implemented>");
    }
  }

  return cxstring::createEmpty();
}

⇒ And the good news is this seems to work as expected :-)

The function above will only work if provided with a cursor that is some kind of reference, such as TemplateRef or TypeRef. Maybe this could be extended further, but that's all that I need for the moment, so this is good enough for me.

[… Some deep investigations taking place here …]

So, after quite a few trial and error sessions, I'm finally able to replace all the occurences of “std::basic_string<char, std::char_traits<char>, std::allocator<char>>” with “std::string” (even in the types where “const std::string” is used instead). Yet, we are still not done yet: next issue on the table is the fact that I have the bindings for “std::vector<std::string, std::allocator<std::string>>” defined in one file, and the bindings for “nv::StringList” in another file… “Yes, and what ?” you will ask, well, the definition of the StringList type is the following:

typedef std::vector<std::string> StringList;

⇒ So I'm effectively generating the bindings for the same class twice [and this will not compile of course]. How could that be ?

Well, this was yet another subtle hidden issue that could lead to registering twice the same class when resolving some complex types. But this is now fixed, feewww!

Note: Initially I thought this was somehow related to some of the default template parameter values preventing the 2 types to be considered as the same thing, but it seems I was wrong on that: in libclang it rather seems I always get *all* the template parameters, with the default automagically specified too.

One last issue I noticed the was still somehow related to the file naming was the handling of some types where we would use “aliases” that would not contain some namespaces.

For instance, I define somewhere in my code something like this:

namespace nv {

typedef std::uint64_t U64;

class MyClass {
protected:
  std::atomic<U64> _refCount;
};

}

And this would produce the bindings for a class named “std::atomic<U64>”.

During the parsing process, the libclang will actually build a type “std::atomic<nv::U64>” as desired, but then we see the name “std::atomic<U64>” for it, and since it is shorter, I use it as an alias.

Except that, that name for the class is only valid if we are inside the nv namespace (or using the “nv” namespace). Problem was then that I couldn't find a way to ensure that clang would alwas provide me with a fully qualified name for a given type when I request it. So again, I had to extend the base libclang library with the following additional function:

// cf. https://stackoverflow.com/questions/33643923/get-fully-qualified-template-template-argument-name-using-libtooling
// cf. https://root.cern.ch/gitweb/?p=root.git;a=blob;f=interpreter/cling/lib/Utils/AST.cpp;hb=f5321511fe583c2c343991c4fbb2ee83fa123eff
CXString clang_getTypeFullSpelling(CXType CT) {
  QualType T = GetQualType(CT);
  if (T.isNull())
    return cxstring::createEmpty();

  CXTranslationUnit TU = GetTU(CT);
  PrintingPolicy PP(cxtu::getASTUnit(TU)->getASTContext().getLangOpts());

  PP.FullyQualifiedName = 1;
  PP.SuppressScope = false;
  PP.AnonymousTagLocations = false;

  std::string fname = clang::TypeName::getFullyQualifiedName(T, cxtu::getASTUnit(TU)->getASTContext(),PP);
  return cxstring::createDup(fname);
}

⇒ And good news: this worked great! I could then simply get rid of that in appropriate “std::atomic<U64>” alias, and this didn't break anything else [as far as I can tell for now :-)]. So we are all good now!

So, I now think this is good enough for this part, during this dev session we fixed a lot of important issues on how the binding files are generated. Of course my nvCore bindings are still not compiling [I still have a lot of incorrectly named operator functions for instance],but I feel we are getting close :-) ⇒ We will simply keep working on this in a third post!

Meanwhile, happy hacking!

  • blog/2020/0617_nervluna_file_naming.txt
  • Last modified: 2020/07/10 12:11
  • by 127.0.0.1