Deslang: From Code to CodeDOM and Back

Updated on 2019-12-15

Code generation, faster

Introduction

Please note that this project is part of a larger project of mine called the Build Pack, which is a set of utilities and tools for building build tools - usually source code generators.

Many of these tools are powered by Slang, a technology that allows a subset of C# code to be rendered out to a myriad of potential .NET languages, including of course, VB.NET. This allows code generator tools using Slang to render code without concern for the target project's build language. It will happily support VB.NET projects, C# projects, perhaps even F# projects.

Deslang was written as an optimization tool, though its uses are definitely not limited to that. It shrinks and speeds up Slang powered projects that don't need all of Slang's capabilities at runtime.

Deslang may be useful even if you don't use Slang, but it was written as a companion to Slang.

Deslang allows you to store a language neutral/agnostic representation of your .NET code as static fields in your code. These static fields contain fully instantiated CodeCompileUnit/CodeDOM objects that represent the code you fed to this build tool. This is especially useful when the code started out as Slang source but you don't need to process it through the Slang engine every time. For example, if you have static library code written in Slang and you export that as part of your tool's execution, you can shrink your tool and make it faster by precooking that static code into the tool as static fields. Precooked code like this does not need the Slang engine to be present to reinstantiate it. Instead of running Slang to parse and resolve the code, the code is already stored as fully resolved CodeDOM objects, ready to be spit out in any target language.

Even purely as a curiosity, the tool is cool tech, as it turns C# code into code used to render that code (to a CodeDOM provider).

Background

Slang is a huge win for developers of code generation tools because instead of writing language neutral code generation by hand using the CodeDOM, Slang allows you to write code in a subset of C# that can then be transformed into any target language there is a CodeDOM provider for.

Now instead of a bunch of ugly CodeDOM code, your code generation code is simply in C#. Write it in C#, it will render to VB or C# automatically.

However, the downside is that it isn't free in terms of resources. Parsing and type resolution in the CodeDOM is a bear, and keeps Slang busy eating up your CPU cycles. It's not very memory hungry but it's CPU intensive in spurts. It also makes Slang's binary footprint around 200k compiled (release) even when source embedded into other projects.

Deslang allows you basically to can Slang's magic for later use, so you can still use Slang, but put it on a diet.

The fields it outputs look like this (example for Token.cs, explored below):

public static System.CodeDom.CodeCompileUnit Token =
  Shared._CompileUnit(new string[0], new CodeNamespace[] {
    Shared._Namespace("Rolex", new CodeNamespaceImport[0], new CodeTypeDeclaration[] {
      Shared._TypeDeclaration("Token", false, false, false, true, false,
        ((MemberAttributes)(0)), TypeAttributes.NotPublic, new CodeTypeParameter[0],
        new CodeTypeReference[0], new CodeTypeMember[] {
          Shared._MemberField(new CodeTypeReference("System.Int32"), "Line", null,
            (MemberAttributes.Final | MemberAttributes.Public),
            new CodeCommentStatement[] {
              new CodeCommentStatement(new CodeComment("<summary>", true)),
              new CodeCommentStatement(new CodeComment("Indicates the line where the token occurs", true)),
              new CodeCommentStatement(new CodeComment("</summary>", true))},
            new CodeAttributeDeclaration[0], new CodeDirective[0], new CodeDirective[0], null),
          Shared._MemberField(new CodeTypeReference("System.Int32"), "Column", null,
            (MemberAttributes.Final | MemberAttributes.Public),
            new CodeCommentStatement[] {
              new CodeCommentStatement(new CodeComment("<summary>", true)),
              new CodeCommentStatement(new CodeComment("Indicates the column where the token occurs", true)),
              new CodeCommentStatement(new CodeComment("</summary>", true))},
            new CodeAttributeDeclaration[0], new CodeDirective[0], new CodeDirective[0], null),
          Shared._MemberField(new CodeTypeReference("System.Int64"), "Position", null,
            (MemberAttributes.Final | MemberAttributes.Public),
            new CodeCommentStatement[] {
              new CodeCommentStatement(new CodeComment("<summary>", true)),
              new CodeCommentStatement(new CodeComment("Indicates the position where the token occurs", true)),
              new CodeCommentStatement(new CodeComment("</summary>", true))},
            new CodeAttributeDeclaration[0], new CodeDirective[0], new CodeDirective[0], null),
          Shared._MemberField(new CodeTypeReference("System.Int32"), "SymbolId", null,
            (MemberAttributes.Final | MemberAttributes.Public),
            new CodeCommentStatement[] {
              new CodeCommentStatement(new CodeComment("<summary>", true)),
              new CodeCommentStatement(new CodeComment("Indicates the symbol id or -1 for the error symbol", true)),
              new CodeCommentStatement(new CodeComment("</summary>", true))},
            new CodeAttributeDeclaration[0], new CodeDirective[0], new CodeDirective[0], null),
          Shared._MemberField(new CodeTypeReference("System.String"), "Value", null,
            (MemberAttributes.Final | MemberAttributes.Public),
            new CodeCommentStatement[] {
              new CodeCommentStatement(new CodeComment("<summary>", true)),
              new CodeCommentStatement(new CodeComment("Indicates the value of the token", true)),
              new CodeCommentStatement(new CodeComment("</summary>", true))},
            new CodeAttributeDeclaration[0], new CodeDirective[0], new CodeDirective[0], null)},
        new CodeCommentStatement[] {
          new CodeCommentStatement(new CodeComment("<summary>", true)),
          new CodeCommentStatement(new CodeComment("Reference implementation for generated shared code", true)),
          new CodeCommentStatement(new CodeComment("</summary>", true))},
        new CodeAttributeDeclaration[0], new CodeDirective[0], new CodeDirective[0], null)},
      new CodeCommentStatement[0])},
    new CodeAttributeDeclaration[0], new CodeDirective[0], new CodeDirective[0]);

Hey look, that's a CodeDOM graph of a bunch of code! It's not pretty, but don't worry. What it generates is pretty enough - here is the output of the above rendered to C#:

namespace Rolex
{
    /// <summary>
    /// Reference implementation for generated shared code
    /// </summary>
    struct Token
    {
        /// <summary>
        /// Indicates the line where the token occurs
        /// </summary>
        public int Line;
        /// <summary>
        /// Indicates the column where the token occurs
        /// </summary>
        public int Column;
        /// <summary>
        /// Indicates the position where the token occurs
        /// </summary>
        public long Position;
        /// <summary>
        /// Indicates the symbol id or -1 for the error symbol
        /// </summary>
        public int SymbolId;
        /// <summary>
        /// Indicates the value of the token
        /// </summary>
        public string Value;
    }
}

Now that's fine looking code, well formatted, with the comments intact. That's what that ugly Token field above contains. So if we need to spit this at an end consumer, we can just pass that field to a CodeDOM provider's GenerateCodeFromCompileUnit() method. It will look just as pretty in VB or F#, don't worry.
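To make the rendering step concrete, here is a minimal sketch that uses only the stock System.CodeDom API. It builds a tiny stand-in for a deslanged field by hand (a Rolex.Token struct with a single field) and renders it with two providers. The RenderSketch class and its BuildUnit() and Render() helpers are names made up for this illustration and are not part of Deslang or the CodeDOM Go Kit. On .NET Core and later, this assumes the System.CodeDom NuGet package is referenced.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

static class RenderSketch
{
    // Hand-built stand-in for a deslanged field:
    // namespace Rolex { struct Token { public int Line; } }
    public static CodeCompileUnit BuildUnit()
    {
        var field = new CodeMemberField(new CodeTypeReference(typeof(int)), "Line");
        field.Attributes = MemberAttributes.Public | MemberAttributes.Final;

        var type = new CodeTypeDeclaration("Token");
        type.IsStruct = true;
        type.Members.Add(field);

        var ns = new CodeNamespace("Rolex");
        ns.Types.Add(type);

        var unit = new CodeCompileUnit();
        unit.Namespaces.Add(ns);
        return unit;
    }

    // Renders a compile unit with the named provider ("CSharp", "VisualBasic", ...)
    public static string Render(CodeCompileUnit unit, string language)
    {
        using (var provider = CodeDomProvider.CreateProvider(language))
        using (var writer = new StringWriter())
        {
            var options = new CodeGeneratorOptions();
            options.BracingStyle = "C";
            options.BlankLinesBetweenMembers = false;
            provider.GenerateCodeFromCompileUnit(unit, writer, options);
            return writer.ToString();
        }
    }

    static void Main()
    {
        var unit = BuildUnit();
        // the same object graph, rendered in two languages
        Console.WriteLine(Render(unit, "CSharp"));
        Console.WriteLine(Render(unit, "VisualBasic"));
    }
}
```

With a real deslanged file, BuildUnit() would go away and you would pass the generated static field (Token, in the example above) to Render() instead.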

Typically though, I don't just spit static code at consumers. That wouldn't be very useful. Often, I'll take code I've stored this way using Deslang, and modify it using CodeDomVisitor before giving it to the downstream consumer. That way, I can get my dynamism without requiring all of Slang and the CodeDOM Go Kit. Cutting Slang out as mentioned shaves about 200k off the end redistributable size any way you do it, and speeds up the app. I just use the visitor feature by including its source file which only adds about 30k to the end binary instead.

Building this Mess

This solution uses its own output to build itself. You heard right. Several of the projects have pre-build steps that use the Release binaries of other projects to build their source code. Since I deleted the binaries from the zip I'm giving you, you'll need to do a Release build first. You may have to build two or three times until the errors go away because Visual Studio likes to try to move files before they're closed. It will eventually build. If it doesn't, you forgot to switch it to "Release". Once you're done, switch it back to Debug.

For example, Rolex uses Deslang to generate its template and library code, while CodeDomGoKit uses Rolex to build Slang's tokenizer (and yes, this is circular but that doesn't matter much). RolexDemo uses Rolex to build the example tokenizer that its parser uses.

Using this Mess

Using this, we can start with a class declared like this (in Widget.cs):

using System;
namespace CorporateHellscape
{
    /// <summary>
    /// Base widget implementation
    /// </summary>
    partial class Widget
    {
        // our payload - to be filled
        byte[] _payload;

        public override string ToString()
        {
            return "[Widget - " + Convert.ToBase64String(_payload) +"]";
        }
    }
}

The above is the static portion of our implementation. We're going to modify this code to give the _payload field a value.

First, let's generate our deslanged version of this.

deslang Widget.cs /output DeslangedWidget.cs /namespace DeslangedDemo /ifstale

This is already a pre-build step in the included DeslangDemo project.

Now by including DeslangedWidget.cs in our project, we can access the CodeDOM for the Widget code with:

CodeCompileUnit code = Deslanged.Widget;

where code is our compile unit.

So now, we can turn to the CodeDOM Go Kit's CodeDomVisitor and use its Visit() method:

CodeDomVisitor.Visit(Deslanged.Widget, (ctx) =>
{
    // look for our _payload field
    var f = ctx.Target as CodeMemberField;
    if(null!=f && "_payload"==f.Name)
    {
        // give it some data (_Hash() is a helper that isn't shown here)
        f.InitExpression = CodeDomUtility.Literal(_Hash(DateTime.UtcNow.ToString()));
        // we're done searching
        ctx.Cancel = true;
    }
});

Now if we dump Deslanged.Widget to the console, we'll see that the _payload field has been populated. Note that we would have marked it readonly but the CodeDOM doesn't support that, so neither does Slang, unfortunately.
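For comparison, here is a rough sketch of the same kind of edit done with nothing but the stock System.CodeDom API, in case you want to see what the visitor is saving you from. It is not how CodeDomVisitor is implemented. The PayloadSketch class and its helpers are hypothetical names for this illustration, the Widget unit is rebuilt by hand rather than deslanged, and the walk only looks at top-level types (it does not descend into nested types the way a real visitor would).

```csharp
using System;
using System.CodeDom;

static class PayloadSketch
{
    // Hand-built stand-in for Deslanged.Widget (assumption: the real one comes from Deslang)
    public static CodeCompileUnit BuildWidgetUnit()
    {
        var type = new CodeTypeDeclaration("Widget");
        type.IsPartial = true;
        type.Members.Add(new CodeMemberField(typeof(byte[]), "_payload"));

        var ns = new CodeNamespace("CorporateHellscape");
        ns.Types.Add(type);

        var unit = new CodeCompileUnit();
        unit.Namespaces.Add(ns);
        return unit;
    }

    // Finds the first field with the given name among the unit's top-level types
    public static CodeMemberField FindField(CodeCompileUnit unit, string name)
    {
        foreach (CodeNamespace ns in unit.Namespaces)
            foreach (CodeTypeDeclaration type in ns.Types)
                foreach (CodeTypeMember member in type.Members)
                {
                    var field = member as CodeMemberField;
                    if (field != null && field.Name == name)
                        return field;
                }
        return null;
    }

    // Builds a "new byte[] { ... }" expression from raw bytes
    public static CodeExpression ByteArrayLiteral(byte[] data)
    {
        var result = new CodeArrayCreateExpression(new CodeTypeReference(typeof(byte)));
        foreach (var b in data)
            result.Initializers.Add(new CodePrimitiveExpression(b));
        return result;
    }

    static void Main()
    {
        var unit = BuildWidgetUnit();
        var field = FindField(unit, "_payload");
        // give the field an initializer, as the visitor example does
        field.InitExpression = ByteArrayLiteral(new byte[] { 1, 2, 3 });
        Console.WriteLine(field.Name + " has an initializer: " + (field.InitExpression != null));
    }
}
```

The nested loops are fine for one field in one flat class, but they get old quickly once the thing you're hunting for can live anywhere in the tree, which is the case a visitor handles for you.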

This was a very simple, contrived example. Rolex is more real world though, and makes extensive use of this feature, not only housing 3 deslanged files which it selectively includes in its generated output, but also two additional deslanged files which it modifies like above for generating its code. The old version of Rolex did not have Slang, and so you can compare how complicated the alternative was:

// constructor
var ctor = new CodeConstructor();
ctor.Attributes = MemberAttributes.Public;
ctor.Parameters.Add(new CodeParameterDeclarationExpression(typeof(IEnumerable<char>), "input"));
ctor.BaseConstructorArgs.Add(new CodeFieldReferenceExpression(null, dfaTableField.Name));
ctor.BaseConstructorArgs.Add(new CodeFieldReferenceExpression(null, blockEndsField.Name));
ctor.BaseConstructorArgs.Add(new CodeFieldReferenceExpression(null, nodeFlagsField.Name));
ctor.BaseConstructorArgs.Add(new CodeArgumentReferenceExpression(ctor.Parameters[0].Name));
result.Members.Add(ctor);

That was just to declare a constructor. To declare the class is a page of code so I omitted everything but the above. Now, in late model Rolex, we declare the entire class using the C# Slang subset:

using System.Collections.Generic;

namespace Rolex
{
    class TableTokenizerTemplate : TableTokenizer
    {
        internal static DfaEntry[] DfaTable; // to be populated
        internal static int[] NodeFlags; // to be populated
        internal static string[] BlockEnds; // to be populated
        // this was what the above code declared:
        public TableTokenizerTemplate(IEnumerable<char> input) :
               base(DfaTable, BlockEnds, NodeFlags, input)
        {
        }
    }
}

Obviously, we need to fill in those fields, and change the class name to what the user selected, but we've covered updating the fields using a visitor. Updating 3 fields and a class name isn't much more complicated:

CodeDomVisitor.Visit(Shared.TableTokenizerTemplate, (ctx) => {
    // td and origName are declared in the enclosing scope (not shown)
    td = ctx.Target as CodeTypeDeclaration;
    if(null!=td && td.Name.EndsWith("Template"))
    {
        // we need the original name for later but not here
        origName += td.Name;
        td.Name = name;
        var f = CodeDomUtility.GetByName("DfaTable", td.Members) as CodeMemberField;
        f.InitExpression = CodeGenerator.GenerateDfaTableInitializer(dfaTable);
        f = CodeDomUtility.GetByName("NodeFlags", td.Members) as CodeMemberField;
        f.InitExpression = CodeDomUtility.Literal(nodeFlags);
        f = CodeDomUtility.GetByName("BlockEnds", td.Members) as CodeMemberField;
        f.InitExpression = CodeDomUtility.Literal(blockEnds);
        CodeGenerator.GenerateSymbolConstants(td, symbolTable);
        ctx.Cancel = true;
    }
});

Basically, we find the first type declaration in our source file whose name ends in "Template". From there, we find each of the fields and set its value, much like we did with Widget. We also call CodeGenerator.GenerateSymbolConstants() with our class to put the symbol constants on it. If we were using full Slang instead of a deslanged file, we could have used T4 preprocessing to render these, which would have been nice, but doing so with CodeDomUtility.Field() isn't really so bad. Either way, we have to make sure there are no name collisions and that the identifiers are valid. That is what the routine does, and why it doesn't simply return an array of fields.
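To give a feel for the collision and validity problem, here is a small sketch of one way such a routine could work using only the stock CodeDOM. It is not Rolex's actual GenerateSymbolConstants() code. SymbolConstantsSketch, MakeIdentifier() and AddConstants() are hypothetical names, and the numeric-suffix scheme is just one possible choice. Names are compared case-insensitively because a target language like VB would treat Foo and foo as the same identifier.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.Collections.Generic;
using System.Text;

static class SymbolConstantsSketch
{
    // Turns an arbitrary symbol name into an identifier that is safe in any CodeDOM language
    public static string MakeIdentifier(string symbol)
    {
        var sb = new StringBuilder();
        foreach (var ch in symbol)
            sb.Append(char.IsLetterOrDigit(ch) ? ch : '_');
        // identifiers can't be empty or start with a digit
        if (sb.Length == 0 || char.IsDigit(sb[0]))
            sb.Insert(0, '_');
        return sb.ToString();
    }

    // Adds one public const int per symbol to the target type, avoiding name collisions
    public static void AddConstants(CodeTypeDeclaration target, IList<string> symbols)
    {
        // seed with existing member names so we don't collide with those either
        var used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (CodeTypeMember member in target.Members)
            used.Add(member.Name);

        for (var id = 0; id < symbols.Count; ++id)
        {
            var baseName = MakeIdentifier(symbols[id]);
            var name = baseName;
            // on a collision, try baseName2, baseName3, ... until one is free
            for (var n = 2; !used.Add(name); ++n)
                name = baseName + n;
            if (!CodeGenerator.IsValidLanguageIndependentIdentifier(name))
                throw new InvalidOperationException("Invalid identifier: " + name);

            var field = new CodeMemberField(typeof(int), name);
            field.Attributes = MemberAttributes.Const | MemberAttributes.Public;
            field.InitExpression = new CodePrimitiveExpression(id);
            target.Members.Add(field);
        }
    }

    static void Main()
    {
        var type = new CodeTypeDeclaration("ExampleTokenizer");
        AddConstants(type, new[] { "id", "int-literal", "int literal", "1st" });
        foreach (CodeTypeMember member in type.Members)
            Console.WriteLine(member.Name);
    }
}
```

Because the routine needs to see the members already on the class in order to avoid them, it makes sense for it to take the type declaration and add to it directly rather than hand back a list of fields.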

Note that we couldn't use CodeDomUtility.Literal() to serialize our DFA state table because the array contains structs which we technically don't have defined (yet), at least not in binary form, so we just serialize it ourselves.

We also had to make sure to update our type references - simply swapping out the old names of Rolex.TableTokenizerTemplate with our final class's type. We do that in a final visit pass below the fold. I'd show it, but it's trivial.

Hooray! Now we've got our new generation routine backed by the source template TableTokenizerTemplate.cs file, and changes to it should still allow the rest of this to work, as long as the key pieces are in place. All without Slang being run by the Rolex tool itself.

Limitations and Gotchas

Remember to turn off compilation for the Slang files you are using for generation. For example, Rolex has compilation turned off for everything in its Shared folder because those are just C# templates used by the build process. They aren't compiled into this project. It can get confusing to think about it that way though. The simple way to think about it is - don't compile Slang input files! - including any files fed to Deslang.

Remember that Slang is still somewhat experimental, and your code may not work with it. The general rule is, keep it simple, and try not to do things the CodeDOM wouldn't let you do. Support for nested classes and generic classes is pretty dodgy, and explicit base references are not supported. Also, sometimes it just gets cranky. Read the deslang output when it builds. It will warn you if Slang couldn't resolve something. You may have to tweak your source to get it right. Slang is improving daily right now so every day brings a bit more code it can deal with.

Remember to feed Deslang all your compile units that need to work together. If one uses code from the other, they both need to be read by deslang on the same pass in order for Slang to resolve them. This can mean feeding deslang multiple inputs, even if you don't intend to use all of the output.

Remember to reference the assemblies you use in your files using /asms.

Remember that once you modify the code trees you get under the Deslanged (or alternatively named) class, you won't be able to get the originals back until the next time the app is run. The trees live in static fields and are modified in place.

Remember you can include CodeDomVisitor.cs and CodeDomUtility.cs from the CodeDOM Go Kit to do code modification on these trees that deslang cooks for you.

Points of Interest

This whole project is a bit zany. It generates code that generates code, for starters. It might be a little hard to wrap one's mind around. It's easy to start playing with though.

History

  • 12th December, 2019 - Initial submission