Updated on 2020-09-05
Parse, analyze, and transform your code with Slang
Slang, in brief, is a subset of C#. I'll cover why shortly.
I've published articles dealing with Slang before, but nothing comprehensive from beginning to end, and I don't think I made my earlier efforts easy to understand.
I hope to present to you a clear picture of what Slang and the CodeDOM Go! Kit can do, plus how and where to use them.
This article assumes some familiarity with the .NET CodeDOM, although I do summarize it.
Note that this solution uses Microsoft's CodeDOM NuGet package rather than the .NET Framework's built-in one. The reason for this is that .NET Core and .NET Standard do not have it built in, and we must use one or the other CodeDOM API consistently throughout the solution or it just won't compile.
Microsoft needed a way to facilitate language independent code generation for ASP.NET and for the Visual Studio designer code in InitializeComponent() in WinForms. To that end, they produced the CodeDOM. The CodeDOM is a kind of language independent representation of code using objects. For example, to represent a variable declaration, we have the CodeVariableDeclarationStatement class. When you have code that is represented as a CodeDOM object graph, you can render it in VB, C#, and potentially other target languages. An instance of CodeVariableDeclarationStatement might produce this in C#:
int i = 1234;
And this in VB.NET: (hopefully - my VB is rusty!)
Dim i as Integer = 1234
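For reference, here is a minimal sketch of how that declaration could be built and rendered with nothing but the stock CodeDOM. The class and method names (VarDeclDemo, BuildDeclaration, Render) are made up for illustration, and it assumes the System.CodeDom package (or the .NET Framework) is referenced:

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

static class VarDeclDemo
{
    // Builds the language independent representation of "int i = 1234"
    public static CodeStatement BuildDeclaration()
    {
        return new CodeVariableDeclarationStatement(
            typeof(int), "i", new CodePrimitiveExpression(1234));
    }

    // Renders one statement with whichever language provider is passed in
    public static string Render(CodeDomProvider provider, CodeStatement statement)
    {
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
            return writer.ToString().Trim();
        }
    }

    static void Main()
    {
        var decl = BuildDeclaration();
        using (var cs = new CSharpCodeProvider())
        using (var vb = new VBCodeProvider())
        {
            // the same object graph, rendered twice
            Console.WriteLine(Render(cs, decl));
            Console.WriteLine(Render(vb, decl));
        }
    }
}
```

The point is that the graph is built once and only the provider changes between target languages.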
Slang is a programming language. Well, technically, it's a subset of another programming language, C#. That means anything that is valid Slang code is valid C# code, but not everything that is valid C# code is valid Slang code. I hope that makes it clear.
The reason it's a subset is because the Slang parser produces abstract syntax trees using CodeDOM constructs, and the CodeDOM simply can't represent everything that C# can do. Slang is restricted to what the CodeDOM can support, and for good reason.
Any good developer would ask why Slang needs to exist at all. The primary reason Slang was created was to facilitate advanced language independent code generation and make the result easier to maintain.
Slang allows you to write code in a familiar C# syntax and produce a CodeDOM object graph representing the code. The upshot of this is that you can then take that same graph and render it to VB, C# or potentially other target languages. The other thing you can do is analyze and transform the CodeDOM graphs.
Slang is great for use in code generation tools, but it's not necessary if all you want to do is code analysis, which Roslyn excels at. Slang isn't a Roslyn alternative. The bottom line to remember is that Slang is geared for use with code generation, not code analysis. Analysis is possible over the abstract syntax trees Slang produces, and indeed such analysis is quite useful when doing code transformation, but it does not replace the advanced code analysis features of Roslyn. Use Roslyn where appropriate. Use Slang where appropriate.
The CodeDOM works for representing code and rendering it to VB or C#, but what about searching it, analyzing it, and transforming it?
We've briefly covered that we can parse to it using Slang, but what then?
If you're writing code generation tools, some of that code needs to be produced programmatically, otherwise what's the point of generating the code? Slang doesn't help you there. It's just a language.
Enter the CodeDOM Go! Kit. The CodeDOM Go! Kit allows you to search and transform CodeDOM object graphs.
It provides facilities to visit each object in the graph in turn and even provides the ability to use something very much like reflection on these code graphs. For example, if your CodeDOM graph includes a class, the Go! Kit will allow you to do things like query it for its members almost as if it was a "real" compiled Type and you were using reflection, taking into account things like inherited members.
In addition, it also provides a less verbose method of building CodeDOM graphs in situations where Slang would be overkill.
Slang was designed to be used with the CodeDOM Go! Kit and depends on it. Using Slang together with the CodeDOM Go! Kit, it is possible to write complex code transformations on code. You might do this to make templates out of Slang code.
Basically, you parse the code using Slang, and then take the CodeDOM graph you get back and visit it using the supplied CodeDomVisitor, looking for certain patterns of code. Once you find something, you can then edit what you found, perhaps using CodeDomUtility's builder methods. Finally, you can render the result to VB.NET, C#, or potentially something else like F#. We'll get into this when we dive into the code next.
The reason you may want to do this is because you might have created a type of document that has code interspersed with other text, like in ASP.NET or in Parsley. Using Slang allows you to parse segments of a document that contain code. Normally, you can't deal with this using the C# compiler since it only recognizes code and nothing else. If you want to embed C# like this, Slang is one option. There are some other ways to do it, but the advantage of using Slang is that Slang code can be rendered to VB.NET as well.
Here are some of the things Slang can help take to the next level:
Slang helps you write code that writes code. You can deliver more powerful code generation tools using Slang. Using it allows you to focus less on building and maintaining CodeDOM graphs manually and more on writing great code generation tools.
Like C#, Slang is ambiguous without type information, so when you parse it, you must then go back and "patch" the CodeDOM tree using type information or else some of the CodeDOM elements will end up incorrect. For example, the syntax for invoking a delegate and the syntax for invoking a method are the same. Therefore, Slang can't know whether the code it's examining is a method invocation or a delegate invocation without type information to match it to, and that information isn't available at parse time. After parsing with SlangParser, you must use SlangPatcher to fix up the result with the type information you have.
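To make the ambiguity concrete, here is a small sketch using only stock CodeDOM types. The member name Handler and the class InvokeAmbiguityDemo are hypothetical. It builds two different graphs, one treating Handler as a method and one treating it as a delegate-typed field, and renders both to C#:

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

static class InvokeAmbiguityDemo
{
    public static string RenderCSharp(CodeExpression expr)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromExpression(expr, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    // "Handler" treated as a method declared on this
    public static CodeExpression AsMethodCall()
    {
        return new CodeMethodInvokeExpression(
            new CodeThisReferenceExpression(), "Handler",
            new CodePrimitiveExpression(1));
    }

    // "Handler" treated as a delegate-typed field on this
    public static CodeExpression AsDelegateCall()
    {
        return new CodeDelegateInvokeExpression(
            new CodeFieldReferenceExpression(new CodeThisReferenceExpression(), "Handler"),
            new CodePrimitiveExpression(1));
    }

    static void Main()
    {
        // two different graphs, identical C# text
        Console.WriteLine(RenderCSharp(AsMethodCall()));
        Console.WriteLine(RenderCSharp(AsDelegateCall()));
    }
}
```

Since both graphs produce the same C# text, a parser reading that text cannot tell which graph to build without knowing what Handler is. Other target languages may render the two forms differently, which is why the distinction matters for language independent output.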
Parsing is simple as long as your code can know what to expect:
// expect that codeText contains a *compile unit* which is
// the contents of one source file.
CodeCompileUnit ccu = SlangParser.ParseCompileUnit(codeText);
// now that we have done that, we must patch what we've got
SlangPatcher.Patch(ccu);
// Note that if the code spans more than a single compile
// unit, then you will have to parse each compile unit in
// turn and then pass them *all* to SlangPatcher.Patch()
// at the same time like .Patch(ccu1,ccu2,ccu3);
Above, we're parsing a compile unit - one source file's text, but we could be using other parse methods like ParseExpression(). The only issue comes with patching code that isn't a compile unit - for example, a single method, or an expression. SlangPatcher.Patch() only works with compile units, so you must create a compile unit, add the necessary cruft, and then insert the method or expression inside that code somewhere. Finally, patch the whole thing and then extract the method or expression again. It's not straightforward to do, but the demo code provided illustrates the technique with statements and expressions. Note that Patch() can take some time, especially when it's working with a large graph.
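As a rough sketch of the wrap-and-extract idea for an expression, using only stock CodeDOM types: the names ScaffoldNS, ScaffoldType and Scaffold are placeholders, and the provided demo code may well go about it differently. The patch call itself is left as a comment so the snippet has no dependency on Slang:

```csharp
using System.CodeDom;

static class PatchScaffoldDemo
{
    // Wraps an expression in namespace > class > method > statement
    // so that it lives inside a complete compile unit
    public static CodeCompileUnit Wrap(CodeExpression expr)
    {
        var method = new CodeMemberMethod { Name = "Scaffold" };
        method.Statements.Add(new CodeExpressionStatement(expr));
        var type = new CodeTypeDeclaration("ScaffoldType");
        type.Members.Add(method);
        var ns = new CodeNamespace("ScaffoldNS");
        ns.Types.Add(type);
        var ccu = new CodeCompileUnit();
        ccu.Namespaces.Add(ns);
        return ccu;
    }

    // Walks back down the same path and returns whatever expression
    // is there now. Patching may have swapped in a new object, so we
    // re-read it from the graph instead of reusing the original reference
    public static CodeExpression Unwrap(CodeCompileUnit ccu)
    {
        var method = (CodeMemberMethod)ccu.Namespaces[0].Types[0].Members[0];
        return ((CodeExpressionStatement)method.Statements[0]).Expression;
    }
}

// usage:
//   var ccu = PatchScaffoldDemo.Wrap(expr);
//   SlangPatcher.Patch(ccu);   // as described above
//   expr = PatchScaffoldDemo.Unwrap(ccu);
```

In practice the scaffold would also need whatever fields, variables, and imports the expression refers to, otherwise the patcher has nothing to resolve against.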
You should use SlangPatcher.GetNextUnresolvedElement() to check if Patch() was able to resolve everything. If it returns null, you are fine. Otherwise, Patch() was unable to fix up the returned element (and perhaps other elements besides). You'll still get a CodeDOM graph to work with, but it might not be entirely correct. That being said, even if it's not correct, many times, the VB.NET or C# render will still produce the code you want. You can't count on that however.
As an optimization, you can "precook" the parsing and patching using the Deslang tool beforehand, and then include the resulting source file, eliminating the need to perform the above steps at runtime. This is described next.
One of the drawbacks of using the CodeDOM as our underlying abstract syntax tree representation is that it's unindexed, and otherwise not really great for searching. Add to that the ambiguous grammar of C# and therefore Slang - much of it can't be resolved without type information, and you have a lot of work to give the CPU. Therefore, Slang can be slower than we might like.
This solution ships with a project called Deslang. Deslang allows you to take a set of Slang source files and then turn them into code that can near instantly reinstantiate the CodeDOM graph that represents that Slang source code. That's a bit complicated I guess, but the upshot is it means you don't have to reference Slang from your project and the performance is much better since the hard bit is moved from runtime to compile time.
What you're doing is "precooking" your Slang source into CodeDOM graphs. That means you don't have to do this at runtime. Since Slang has to bend over backward to use the CodeDOM underneath, it's much slower than it would otherwise be, but now we only need to do it once, and then include the source file Deslang generates. That file allows access to the contents of each source file that was passed in, where each source file is represented by its own CodeCompileUnit. It's all easier to use and understand than it sounds.
Say we have this simple declaration in Slang code:
using System;
using System.Collections.Generic;
using System.Text;
namespace Example
{
struct Token
{
public string Symbol;
public int SymbolId;
public int Line;
public int Column;
public long Position;
public string Value;
public override string ToString()
{
return Symbol + " (" + SymbolId.ToString() + ") : " + Value;
}
}
}
This will be generated in the output of Deslang, which produces C# source:
public static System.CodeDom.CodeCompileUnit Token {
get {
return Deslanged._CompileUnit(new string[0], new CodeNamespace[] {
Deslanged._Namespace("", new CodeNamespaceImport[] {
new CodeNamespaceImport("System"),
new CodeNamespaceImport("System.Collections.Generic"),
new CodeNamespaceImport("System.Text")},
new CodeTypeDeclaration[0], new CodeCommentStatement[0]),
Deslanged._Namespace("Example",
new CodeNamespaceImport[0], new CodeTypeDeclaration[] {
Deslanged._TypeDeclaration("Token", false, false,
false, true, false, (MemberAttributes.Final |
MemberAttributes.Private),
TypeAttributes.NotPublic, new CodeTypeParameter[0],
new CodeTypeReference[0], new CodeTypeMember[] {
Deslanged._MemberField
(new CodeTypeReference(typeof(string)),
"Symbol", null,
(MemberAttributes.Final |
MemberAttributes.Public),
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null),
Deslanged._MemberField
(new CodeTypeReference(typeof(int)),
"SymbolId", null,
(MemberAttributes.Final |
MemberAttributes.Public),
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null),
Deslanged._MemberField
(new CodeTypeReference(typeof(int)),
"Line", null,
(MemberAttributes.Final |
MemberAttributes.Public),
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null),
Deslanged._MemberField
(new CodeTypeReference(typeof(int)),
"Column", null,
(MemberAttributes.Final |
MemberAttributes.Public),
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null),
Deslanged._MemberField
(new CodeTypeReference(typeof(long)),
"Position", null,
(MemberAttributes.Final |
MemberAttributes.Public),
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null),
Deslanged._MemberField
(new CodeTypeReference(typeof(string)),
"Value", null,
(MemberAttributes.Final |
MemberAttributes.Public),
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null),
Deslanged._MemberMethod
(new CodeTypeReference(typeof(string)),
"ToString",
(MemberAttributes.Override |
MemberAttributes.Public),
new CodeParameterDeclarationExpression[0],
new CodeStatement[] {
new CodeMethodReturnStatement
(new CodeBinaryOperatorExpression
(new CodeFieldReferenceExpression
(new CodeThisReferenceExpression(),
"Symbol"), CodeBinaryOperatorType.Add,
new CodeBinaryOperatorExpression
(new CodePrimitiveExpression(" ("),
CodeBinaryOperatorType.Add,
new CodeBinaryOperatorExpression
(new CodeMethodInvokeExpression
(new CodeMethodReferenceExpression
(new CodeFieldReferenceExpression
(new CodeThisReferenceExpression(),
"SymbolId"), "ToString"),
new CodeExpression[0]),
CodeBinaryOperatorType.Add,
new CodeBinaryOperatorExpression
(new CodePrimitiveExpression(") : "),
CodeBinaryOperatorType.Add,
new CodeFieldReferenceExpression
(new CodeThisReferenceExpression(),
"Value"))))))},
new CodeTypeReference[0], null,
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null)},
new CodeCommentStatement[0],
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0], null)},
new CodeCommentStatement[0])},
new CodeAttributeDeclaration[0],
new CodeDirective[0],
new CodeDirective[0]);
}
}
Sorry about the line wrapping. This code is machine generated, so it's not really formatted with readability in mind. You don't need to understand all of what it's doing, as long as you understand that it's creating a CodeCompileUnit with all your parsed Slang code in it. It is equivalent to the compile unit you'd get back if you were using the Slang assembly directly. As long as all your Slang code is known at compile time, you can use Deslang to crunch it into a C# source file that contains the CodeDOM graphs you would have gotten with Slang itself. Instantiating these is fast. Using Slang is not.
Using it from the command line looks something like:
deslang "Token.cs" /output "Deslanged.cs"
It will accept multiple input files if you list them. You must always include source files that depend on each other even if you don't plan on using the code in those dependencies directly.
Using it in code looks something like:
CodeCompileUnit tokenCcu = Deslanged.Token; // now tokenCcu can be manipulated
// or rendered as normal
Quite frequently, after you've patched the CodeCompileUnit(s) or fetched them from the Deslang output, you'll need to search them and modify them. We can use CodeDomVisitor for this. This class implements a visitor pattern over any CodeDOM object graph or subgraph. You use it like this:
CodeDomVisitor.Visit(ccu, (ctx) => { ... });
ctx here is an instance of CodeDomVisitContext. You can use the context to get the current Target being visited, the Parent object, the Root, the Member it was retrieved from, and the Index into the collection where the visitor is if applicable. Let's look at a simple example:
CodeDomVisitor.Visit(Deslanged.Token, (ctx) => {
var tr = ctx.Target as CodeTypeReference;
if (null != tr)
Console.WriteLine(CodeDomUtility.ToString(tr));
});
What the above does is look for any and all CodeTypeReference objects in the graph - Deslanged.Token being the root of the graph in this case. It then uses CodeDomUtility to get a string for each CodeTypeReference it finds which it writes to the console.
This works because Visit() is called for each object in the graph it encounters, so all we had to do was examine the current Target to see if it was a CodeTypeReference.
Since it's very common to visit something and then replace the value being visited, CodeDomVisitor.ReplaceTarget() is provided to do this. Just pass it the current context and the new value.
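If you're curious what visiting a graph involves, here is a deliberately naive, reflection-based walker over stock CodeDOM objects. It is only a sketch of the idea and not how CodeDomVisitor is implemented: TinyVisitor is a made-up name, it offers no context object or replacement support, and it skips the few CodeDOM types that don't derive from CodeObject (CodeCatchClause, for one):

```csharp
using System;
using System.CodeDom;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

static class TinyVisitor
{
    // Calls action once for every CodeObject reachable from root
    public static void Visit(object root, Action<object> action)
    {
        Visit(root, action, new HashSet<object>());
    }

    static void Visit(object target, Action<object> action, HashSet<object> seen)
    {
        // only CodeObject instances are visited, and each only once
        if (!(target is CodeObject) || !seen.Add(target)) return;
        action(target);
        var props = target.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (PropertyInfo prop in props)
        {
            if (prop.GetIndexParameters().Length != 0) continue; // skip indexers
            object value = prop.GetValue(target, null);
            if (value is CodeObject)
                Visit(value, action, seen);
            else if (value is IEnumerable && !(value is string))
                foreach (object item in (IEnumerable)value)
                    Visit(item, action, seen); // non-CodeObject items are ignored
        }
    }

    static void Main()
    {
        // print the base type of every type reference under a field
        var field = new CodeMemberField(typeof(long), "Position");
        Visit(field, o =>
        {
            var tr = o as CodeTypeReference;
            if (tr != null) Console.WriteLine(tr.BaseType);
        });
    }
}
```

A real visitor also has to track the parent, the member, and the collection index for each target so that replacement is possible, which is the part the Go! Kit takes care of for you.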
In many circumstances, you'll need to create CodeDOM graphs without writing them in Slang. This is very often the case with portions of the CodeDOM that will be generated dynamically based on some kind of input data. You can use the CodeDOM directly, or you can use CodeDomUtility's builder methods. The advantage of the latter is it's less verbose.
Usually, you'll want to alias the CodeDomUtility class to something shorter, and then you can use all of its abbreviated creators. Here's an example:
using CU = CD.CodeDomUtility;
...
var result = CU.For(
CU.Var(typeof(int),"i",CU.Zero),
CU.Lt(CU.VarRef("i"),CU.Literal(10)),
CU.Let(CU.VarRef("i"),CU.Add(CU.VarRef("i"),CU.One)),
CU.Call(CU.TypeRef(typeof(Console)),"WriteLine",CU.Literal("Hello World!"))
);
This is entirely static for demonstration purposes, but normally the code you create would be dynamically built based on some kind of input data. The above is difficult to read at first, but it gets easier with practice. It's also easier than using the stock CodeDOM classes directly.
Here's the CodeDOM graph above rendered to C#:
for (int i = 0; (i < 10); i = (i + 1)) {
System.Console.WriteLine("Hello World!");
}
See if you can match up the calls in the first figure with the code in the second figure. You'll start to see how they line up, and with some practice it gets easier to read and use. We could have just written the whole thing in Slang in this case and saved some typing, but as I said, normally you'd dynamically produce that code based on some input data, in which case this technique makes sense.
To ease maintenance while keeping flexibility, you can combine the power of Slang with programmatic edits of the CodeDOM graph in order to templatize your Slang code. At a high level, you parse the template into a CodeDOM graph and patch it, find the placeholder elements in the graph, modify them, and then render the result.
Consider the Slang "template" that follows:
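For comparison, here is roughly what the same loop looks like when built with the stock CodeDOM classes directly. StockForLoopDemo and its helper names are made up for the example:

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

static class StockForLoopDemo
{
    // a fresh reference to the loop variable each time it is needed
    static CodeExpression I() { return new CodeVariableReferenceExpression("i"); }

    public static CodeIterationStatement Build()
    {
        return new CodeIterationStatement(
            // int i = 0
            new CodeVariableDeclarationStatement(
                typeof(int), "i", new CodePrimitiveExpression(0)),
            // i < 10
            new CodeBinaryOperatorExpression(
                I(), CodeBinaryOperatorType.LessThan, new CodePrimitiveExpression(10)),
            // i = i + 1
            new CodeAssignStatement(
                I(),
                new CodeBinaryOperatorExpression(
                    I(), CodeBinaryOperatorType.Add, new CodePrimitiveExpression(1))),
            // Console.WriteLine("Hello World!")
            new CodeExpressionStatement(
                new CodeMethodInvokeExpression(
                    new CodeTypeReferenceExpression(typeof(Console)), "WriteLine",
                    new CodePrimitiveExpression("Hello World!"))));
    }

    public static string Render(CodeStatement statement)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    static void Main()
    {
        Console.WriteLine(Render(Build()));
    }
}
```

The structure is the same as the CU version, but every node needs its full class name and constructor, which is the verbosity the builder methods are meant to cut down.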
// namespace will be replaced
namespace T_NAMESPACE
{
// type name will be replaced
class T_TYPE
{
// init value will be replaced
public static int[] Primes = null;
}
}
Note that when you include a template file with Visual Studio, you should go to the template document's properties and set the Build Action to None so it won't be compiled as part of your project.
In addition to setting the namespace and type name, we're going to precompute prime numbers and fill Primes with the prime number array we build.
Here's how we go about it. Since it's relatively simple, we'll avoid the overhead of CodeDomVisitor and just query the graph directly:
// compute the primes. algorithm borrowed
// from SLax at https://stackoverflow.com/questions/1510124/program-to-find-prime-numbers
var primesMax = 100;
var primesArr = Enumerable.Range(0,
(int)Math.Floor(2.52 * Math.Sqrt(primesMax) / Math.Log(primesMax))).Aggregate(
Enumerable.Range(2, primesMax - 1).ToList(),
(result, index) =>
{
var bp = result[index]; var sqr = bp * bp;
result.RemoveAll(i => i >= sqr && i % bp == 0);
return result;
}
).ToArray();
// read the template into the compile unit
CodeCompileUnit ccu;
using (var stm = File.OpenRead(@"..\..\Template.cs"))
ccu=SlangParser.ReadCompileUnitFrom(stm);
// find the target namespace and change it
var ns = CU.GetByName("T_NAMESPACE", ccu.Namespaces);
ns.Name = "TestNS";
// find the target class
var type= CU.GetByName("T_TYPE", ns.Types);
// change the name
type.Name = "TestPrimes";
// get the Primes field:
var primes = CU.GetByName("Primes", type.Members) as CodeMemberField;
// change the init expression to the primes array
primes.InitExpression = CU.Literal(primesArr);
// now write the result out
Console.WriteLine(CU.ToString(ccu));
Ignore the prime number generation code, since it's just for demonstration. Instead, look at the lines past the ToArray() call. What we're doing here is reading the template, looking for T_NAMESPACE, T_TYPE and Primes. When we find them, we make the appropriate modifications to the graph. If you're paying some serious attention you might notice that we are using CU.Literal() to serialize an entire array to a literal value. This is powerful, and for many generation projects, this will be the primary way to create hard coded tables of data in your generated code. I didn't clutter the above with error handling, so it will throw if the things it looks for aren't in the Template.cs template.
You can preprocess Slang code using a built-in mini T4 engine. It's simple, and doesn't support things like codebehind or any T4 template directives for that matter. However, it's very suitable for what it's meant to do.
You can either access it programmatically using SlangPreprocessor.Preprocess() or you can use it with Deslang simply by using T4 tags inside your input documents.
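To show what the edit amounts to underneath, here is a sketch of the same find-and-modify steps using only stock CodeDOM types and LINQ. It builds a stand-in for the parsed template by hand instead of calling SlangParser, and it spells out the array literal with CodeArrayCreateExpression. TemplateEditDemo and its method names are hypothetical:

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using System.Linq;
using Microsoft.CSharp;

static class TemplateEditDemo
{
    // Stand-in for the graph you would get from parsing the template
    public static CodeCompileUnit BuildTemplate()
    {
        var field = new CodeMemberField(typeof(int[]), "Primes")
        {
            Attributes = MemberAttributes.Public | MemberAttributes.Static,
            InitExpression = new CodePrimitiveExpression(null)
        };
        var type = new CodeTypeDeclaration("T_TYPE");
        type.Members.Add(field);
        var ns = new CodeNamespace("T_NAMESPACE");
        ns.Types.Add(type);
        var ccu = new CodeCompileUnit();
        ccu.Namespaces.Add(ns);
        return ccu;
    }

    // Renames the placeholders and fills in the array initializer.
    // First() throws if a placeholder is missing, so there is no error handling here
    public static void Fill(CodeCompileUnit ccu, int[] values)
    {
        var ns = ccu.Namespaces.Cast<CodeNamespace>().First(n => n.Name == "T_NAMESPACE");
        ns.Name = "TestNS";
        var type = ns.Types.Cast<CodeTypeDeclaration>().First(t => t.Name == "T_TYPE");
        type.Name = "TestPrimes";
        var field = type.Members.OfType<CodeMemberField>().First(f => f.Name == "Primes");
        // one CodePrimitiveExpression per element
        field.InitExpression = new CodeArrayCreateExpression(
            typeof(int),
            values.Select(v => (CodeExpression)new CodePrimitiveExpression(v)).ToArray());
    }

    public static string Render(CodeCompileUnit ccu)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(ccu, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    static void Main()
    {
        var ccu = BuildTemplate();
        Fill(ccu, new[] { 2, 3, 5, 7, 11 });
        Console.WriteLine(Render(ccu));
    }
}
```

The longer the table, the more the one-call literal approach pays off compared with building each element by hand.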
Here's an example of using it with Deslang:
Here is TestTemplate.cst:
using System;
namespace Test
{
class TestTemplate
{
public void <#=Arguments["Method1"]#>() { Console.WriteLine("foo"); }
public void <#=Arguments["Method2"]#>() { Console.WriteLine("bar"); }
<# for(var i=0;i<10;++i) {
#>public int TestField<#=(i+1).ToString()#> = <#=i.ToString()#>;<#
}
#>
}
}
Here is the Deslang command line:
deslang TestTemplate.cst /t4args "Method1=Test1&Method2=Test2"
Note the /t4args parameter. This allows you to pass through a list of arguments to each of the templates. The same list is used for all templates. These arguments are formed like a query string argument list, and are URL-encoded. Quote the list on the command line, since most shells treat a bare & as a command separator. The arguments can be accessed through the Arguments dictionary from within the template.
Deslang doesn't show this, but here is the post-processed output of TestTemplate.cst with those arguments, before being parsed and serialized by Deslang:
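To illustrate the argument format, here is one way such a list could be decoded into a dictionary. This is not Deslang's actual parsing code, just a sketch of the query-string convention, and T4ArgsDemo.Parse is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;

static class T4ArgsDemo
{
    // Splits "a=1&b=2" into name/value pairs and URL-decodes each part
    public static Dictionary<string, string> Parse(string raw)
    {
        var result = new Dictionary<string, string>();
        foreach (var pair in raw.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var eq = pair.IndexOf('=');
            // a name without '=' gets an empty value
            var name = eq < 0 ? pair : pair.Substring(0, eq);
            var value = eq < 0 ? "" : pair.Substring(eq + 1);
            result[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
        }
        return result;
    }

    static void Main()
    {
        var args = Parse("Method1=Test1&Method2=Test2");
        Console.WriteLine(args["Method1"]);
        Console.WriteLine(args["Method2"]);
    }
}
```

Note that Uri.UnescapeDataString() decodes %20 but leaves + alone, so whether + means a space depends on the decoder actually used.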
using System;
namespace Test
{
class TestTemplate
{
public void Test1() { Console.WriteLine("foo"); }
public void Test2() { Console.WriteLine("bar"); }
public int TestField1 = 0;
public int TestField2 = 1;
public int TestField3 = 2;
public int TestField4 = 3;
public int TestField5 = 4;
public int TestField6 = 5;
public int TestField7 = 6;
public int TestField8 = 7;
public int TestField9 = 8;
public int TestField10 = 9;
}
}
The above technique is one way of building CodeDOM graphs dynamically using Slang.
The CodeDOM Go! Kit contains a powerful facility for doing reflection style queries over CodeDOM graphs. It handles the considerable complexity of doing things like binding to a particular method overload or resolving members, including inherited members. SlangPatcher uses this facility extensively to figure out the type information from the Slang source code (well technically from any CodeDOM graph), such as all declared classes and structs, and each of their members, or all the declared variables within a method or property. You can use it too, although it's not something you'll be using most of the time, if at all.
CodeDomResolver handles most of the dirty work, and it's a complicated class with a lot of features.
One of the main features of interest is being able to, from any point inside a CodeDOM graph, take a snapshot of all of the scope information, such as all the variables in the current scope, and all of the accessible members and types from that scope. This is extremely powerful analysis that gives the same kind of information for CodeDOM graphs that the C# compiler uses internally when compiling C# code. You can do this from any point using GetScope() and passing it the CodeDOM element you want to retrieve the scope for. It returns a CodeDomResolverScope instance that contains all of the scope information. This might be the most common operation with this class.
Another common feature is GetTypeOfExpression() which takes a CodeExpression instance and returns a CodeTypeReference instance that represents the type the expression will evaluate to.
Using the CodeDomResolver goes something like this:
// create a resolver
var res = new CodeDomResolver();
// read the resolver sample into the compile unit
CodeCompileUnit ccu;
using (var stm = File.OpenRead(@"..\..\Resolver.cs"))
ccu = SlangParser.ReadCompileUnitFrom(stm);
// remember to patch it!
SlangPatcher.Patch(ccu);
// add the compile unit to the resolver
res.CompileUnits.Add(ccu);
// prepare the resolver
// any time you add compile units you'll need
// to call Refresh()
res.Refresh();
// go through all expressions in the
// graph and try to get their type
CodeDomVisitor.Visit(ccu, (ctx) => {
var expr = ctx.Target as CodeExpression;
if (null != expr)
{
// we want everything except CodeTypeReferenceExpression
var ctre = expr as CodeTypeReferenceExpression;
if (null == ctre)
{
// get the scope of the expression
var scope = res.GetScope(expr);
CodeTypeReference ctr = res.TryGetTypeOfExpression(expr,scope);
if (null != ctr)
{
Console.WriteLine(CU.ToString(expr) + " is type: " + CU.ToString(ctr));
Console.WriteLine("Scope Dump:");
Console.WriteLine(scope.ToString());
}
}
}
});
Finally, we come to a somewhat less used feature - member selection and binding. Basically, what this allows you to do is query for member methods and properties based on their signature, including the parameters they take, since methods and indexed properties can be overloaded.
Consider the following Slang code (Binding.cs):
using System;
namespace scratch
{
class Binding
{
public void Test(string text)
{
Console.WriteLine(text);
}
public void Test(int value)
{
Console.WriteLine(value);
}
}
}
Let's say we want to select overloaded methods for compatible signatures like a compiler would:
// we'll need the resolver in a bit
var res = new CodeDomResolver();
// read the binding sample into the compile unit
CodeCompileUnit ccu;
using (var stm = File.OpenRead(@"..\..\Binding.cs"))
ccu = SlangParser.ReadCompileUnitFrom(stm);
// add the compile unit to the resolver
res.CompileUnits.Add(ccu);
// prepare the resolver
res.Refresh();
// get the first class available. Namespaces[0] is the
// unnamed namespace that holds the using directives
var tdecl = ccu.Namespaces[1].Types[0];
// capture the scope at the typedecl level
var scope = res.GetScope(tdecl);
// create a new binder with that scope
var binder = new CodeDomBinder(scope);
// get the method group for Test(...)
var methodGroup = binder.GetMethodGroup
(tdecl, "Test", BindingFlags.Public | BindingFlags.Instance);
// select the method that can take a string value
var m = binder.SelectMethod(BindingFlags.Public, methodGroup, new CodeTypeReference[]
{ new CodeTypeReference(typeof(string)) }, null);
Console.WriteLine(CU.ToString((CodeMemberMethod)m));
// select the method that can take a short value
// (closest match accepts int)
m = binder.SelectMethod(BindingFlags.Public, methodGroup, new CodeTypeReference[]
{ new CodeTypeReference(typeof(short)) }, null);
Console.WriteLine(CU.ToString((CodeMemberMethod)m));
Here, we're using a CodeDomBinder together with a CodeDomResolver to select a particular method overload. Note the references to a "method group". A method group is the set of all methods whose signatures are the same except for the type and number of parameters they take. In other words, it's a method and all of its overloads. Note that this may also return actual reflection MethodInfo objects if they're inherited from a compiled type. Above, we always presume that they're not going to be. In a real world case, you'd have to check the type of the m value to see whether it's a CodeMemberMethod or a MethodInfo.
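That check might look something like the following sketch. BoundMemberDemo.Describe is a made-up helper, and treating null as "nothing matched" is an assumption about the binder's behavior rather than something verified here:

```csharp
using System;
using System.CodeDom;
using System.Reflection;

static class BoundMemberDemo
{
    // Handles both kinds of result a selection might hand back
    public static string Describe(object m)
    {
        // declared in the CodeDOM graph itself
        var declared = m as CodeMemberMethod;
        if (declared != null)
            return "declared in CodeDOM: " + declared.Name;
        // inherited from a compiled type
        var compiled = m as MethodInfo;
        if (compiled != null)
            return "compiled: " + compiled.DeclaringType.FullName + "." + compiled.Name;
        // assumption: null means no compatible overload was found
        return "no match";
    }

    static void Main()
    {
        Console.WriteLine(Describe(new CodeMemberMethod { Name = "Test" }));
        Console.WriteLine(Describe(typeof(object).GetMethod("ToString")));
    }
}
```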
Here is the way I recommend setting up your build environment:
When you add a C# file that holds Slang code, make sure to set its Build Action to None in the document's properties. Also make sure you update the pre-build step command line whenever you add or remove files in the Exports folder.
First, you'll want to do all the steps from the Standalone Projects section.
Next, add references to both Slang and CodeDomGoKit.
Now you can use Deslang for your static Slang portions and still have all the features of Slang and CodeDOM Go! Kit available for use at runtime.
param arrays or optional parameters.