csppg: A Little C# Preprocessor for .. Whatever

Updated on 2021-10-28

This is a little templating engine build tool I use to make my code generation projects more maintainable.

Introduction

I tried using T4 templates. I thought, "hey that sounds like a good idea!" until it started adding references to my project, and making me hunt through Google to figure out how to invoke it from the command line.

I didn't need that mess. K.I.S.S. Everything should be as simple as it can be and no simpler. Enter csppg. It's a quick and dirty tool that lets you write an input template using <#/#> or <%/%> breakouts and C# code. From that it generates a C# file with a single class containing a Run() method that takes a TextWriter and some arguments, and writes the rendered output to that TextWriter. You can then include that class in your project and use it as a code generation template. Or, say you were making mailers, you could use it as a mail merge. It honestly doesn't care what the output is. I just usually use it to render C# code, or code in other languages.

Understanding this Mess

This thing is harder to explain than it is to use, but if you've ever used ASP (remember that?) you'll recall how it let you templatize HTML output. This does nearly the same thing, except the output is whatever you want, and the "server side" language is always C#. You have a Response TextWriter and an Arguments IDictionary<string, object>.

If that's all this was, it would be workable, but not quite good enough. By itself, a template can't do much if, for example, it doesn't have access to the caller's internal types and methods. It's also pretty nasty to do all your meta-coding inside an ASP-like page and lose proper Intellisense. You really want some sort of code behind, and you want access to your running application's objects.

To solve that, this spits out the code used to spit out the code instead. You can then include that mess into your project and call it from inside your code where it can have access to everything.
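
To make "code that spits out code" a bit more concrete, here is a minimal sketch of the translation step. It is not csppg's actual implementation: TinyTemplateTranslator, Translate and ToLiteral are hypothetical names, it only handles <%...%> and <%=...%> breakouts, and it assumes the sequence %> never appears inside your C# code.

```csharp
using System;
using System.Text;

// Hypothetical, simplified translator. NOT csppg's real code.
static class TinyTemplateTranslator
{
    // Turns template text into the C# statements that would form the body of Run().
    public static string Translate(string template)
    {
        var body = new StringBuilder();
        var pos = 0;
        while (pos < template.Length)
        {
            var open = template.IndexOf("<%", pos, StringComparison.Ordinal);
            if (open < 0) open = template.Length;
            // Literal text before the next breakout becomes a Response.Write of a string literal.
            if (open > pos)
                body.AppendLine("Response.Write(" + ToLiteral(template.Substring(pos, open - pos)) + ");");
            if (open == template.Length) break;
            var close = template.IndexOf("%>", open + 2, StringComparison.Ordinal);
            if (close < 0) throw new FormatException("Unterminated <% block");
            var code = template.Substring(open + 2, close - open - 2);
            if (code.StartsWith("=", StringComparison.Ordinal))
                // <%=expr%> writes the value of the expression.
                body.AppendLine("Response.Write(" + code.Substring(1).Trim() + ");");
            else
                // <%code%> is copied into the generated method as-is.
                body.AppendLine(code.Trim());
            pos = close + 2;
        }
        return body.ToString();
    }

    // Escapes text so it can be emitted as a C# string literal.
    static string ToLiteral(string text)
    {
        var sb = new StringBuilder("\"");
        foreach (var ch in text)
        {
            switch (ch)
            {
                case '\\': sb.Append("\\\\"); break;
                case '"': sb.Append("\\\""); break;
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                case '\t': sb.Append("\\t"); break;
                default: sb.Append(ch); break;
            }
        }
        return sb.Append('"').ToString();
    }
}
```

The real tool also wraps the result in a class with a Run() method. The sketch only shows the part that matters for understanding the context switching between literal text and code.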

This might help if you understand how ASP context switching works. Consider the following:

<%
var tokenizerDfa = (int[])Arguments["tokenizerDfa"];
var arrayCount = 1;
%>
static readonly int[] _TokenizerDfa = new int[] {
<%for(var i = 0;i<tokenizerDfa.Length;++i) {%><%=tokenizerDfa[i]%><%
if(i<tokenizerDfa.Length-1){%>, <%}%><%if(0==((arrayCount++)%50)){%>
<%}%><%}%>
};

Forgive the formatting. Word wrapping cleanly is impossible here without impacting the resulting output.

This resolves to the following bit of code:

var tokenizerDfa = (int[])Arguments["tokenizerDfa"];
var arrayCount = 1;

            Response.Write("\r\nstatic readonly int[] _TokenizerDfa = new int[] {\r\n");
for(var i = 0;i<tokenizerDfa.Length;++i) {
            Response.Write(tokenizerDfa[i]);
if(i<tokenizerDfa.Length-1){
            Response.Write(", ");
}
if(0==((arrayCount++)%50)){
            Response.Write("\r\n");
}
}
            Response.Write("\r\n};\r\n");

The code just above is what csppg generates. When it is executed with an int[] tokenizerDfa argument holding a DFA state machine packed as an array of integers, it renders that array to the output:

var args = new Dictionary<string, object>();
args.Add("tokenizerDfa", _ToDfaTable(tokenizer));
TokenizerGenerator.Run(Console.Out, args); // invokes the csppg created code

Depending on the tokenizer passed, the output would be something like this:

static readonly int[] _TokenizerDfa = new int[] {
-1, 9, 1146, 2, 9, 13, 32, 32, 1154, 1, 34, 34, 1210, 1, 39, 39, 1250,
1, 46, 46, 1322, 1, 47, 47, 1330, 1, 48, 48, 1376, 1, 49, 57, 1790, 1,
64, 64, 1814, 554, 65, 90, 97, 122, 170, 170, 181, 181, 186, 186, 192, 214,
216, 246, 248, 705, 710, 721, 736, 740, 748, 748, 750, 750, 880, 884,
886, 887, 890, 893, 895, 895, 902, 902, 904, 906, 908, 908, 910, 929,
931, 1013, 1015, 1153, 1162, 1327, 1329, 1366, 1369, 1369, 1377, 1415,
1488, 1514, 1520, 1522, 1568, 1610, 1646, 1647, 1649, 1747,
1749, 1749, 1765, 1766, 1774, 1775, 1786, 1788, 1791, 1791, 1808, 1808,
1810, 1839, 1869, 1957, 1969, 1969, 1994, 2026, 2036, 2037, 2042, 2042,
2048, 2069, 2074, 2074, 2084, 2084, 2088, 2088, 2112, 2136, 2208, 2228, 2308,
2361, 2365, 2365, 2384, 2384, 2392, 2401, 2417, 2432, 2437, 2444, 2447, 2448,
2451, 2472, 2474, 2480, 2482, 2482, 2486, 2489, 2493, 2493, 2510, 2510, 2524,
2525, 2527, 2529, 2544, 2545, 2565, 2570, 2575, 2576, 2579, 2600, 2602, 2608,
2610, 2611, 2613, 2614, 2616, 2617, 2649, 2652, 2654, 2654,
2674, 2676, 2693, 2701, 2703, 2705, 2707, 2728, 2730, 2736, 2738, 2739, 2741,
2745, 2749, 2749, 2768, 2768, 2784, 2785, 2809, 2809, 2821, 2828, 2831, 2832,
2835, 2856, 2858, 2864, 2866, 2867, 2869, 2873, 2877, 2877, 2908, 2909, 2911,
2913, 2929, 2929, 2947, 2947, 2949, 2954, 2958, 2960, 2962, 2965, 2969, 2970,
2972, 2972, 2974, 2975, 2979, 2980, 2984, 2986, 2990, 3001, 3024, 3024,
3077, 3084, 3086, 3088, 3090, 3112, 3114, 3129, 3133, 3133, 3160, 3162, 3168,
3169, 3205, 3212, 3214, 3216, 3218, 3240, 3242, 3251, 3253, 3257, 3261, 3261,
3294, 3294, 3296, 3297, 3313, 3314, 3333, 3340, 3342, 3344,
3346, 3386, 3389, 3389, 3406, 3406, 3423, 3425, 3450, 3455, 3461, 3478, 3482,
3505, 3507, 3515, 3517, 3517, 3520, 3526, 3585, 3632, 3634, 3635, 3648, 3654,
3713, 3714, 3716, 3716, 3719, 3720, 3722, 3722, 3725, 3725, 3732, 3735,
3737, 3743, 3745, 3747, 3749, 3749, 3751, 3751, 3754, 3755, 3757, 3760, 3762,
3763, 3773, 3773, 3776, 3780, 3782, 3782, 3804, 3807, 3840, 3840, 3904, 3911,
3913, 3948, 3976, 3980, 4096, 4138, 4159, 4159, 4176, 4181, 4186, 4189, 4193,
4193, 4197, 4198, 4206, 4208, 4213, 4225, 4238, 4238, 4256, 4293, 4295, 4295,
4301, 4301, 4304, 4346, 4348, 4680, 4682, 4685, 4688, 4694,
4696, 4696, 4698, 4701, 4704, 4744, 4746, 4749, 4752, 4784, 4786, 4789, 4792,
4798, 4800, 4800, 4802, 4805, 4808, 4822, 4824, 4880, 4882, 4885, 4888, 4954,
4992, 5007, 5024, 5109, 5112, 5117, 5121, 5740, 5743, 5759, 5761, 5786, 5792,
5866, 5873, 5880, 5888, 5900, 5902, 5905, 5920, 5937, 5952, 5969, 5984, 5996,
5998, 6000, 6016, 6067, 6103, 6103, 6108, 6108, 6176, 6263, 6272, 6312, 6314,
6314, 6320, 6389, 6400, 6430, 6480, 6509, 6512, 6516, 6528, 6571, 6576, 6601,
6656, 6678, 6688, 6740, 6823, 6823, 6917, 6963, 6981, 6987, 7043, 7072, 7086,
7087, 7098, 7141, 7168, 7203, 7245, 7247, 7258, 7293, 7401, 7404, 7406, 7409,
7413, 7414, 7424, 7615, 7680, 7957, 7960, 7965, 7968, 8005, 8008, 8013, 8016,
8023, 8025, 8025, 8027, 8027, 8029, 8029, 8031, 8061, 8064, 8116, 8118, 8124,
8126, 8126, 8130, 8132, 8134, 8140, 8144, 8147,
8150, 8155, 8160, 8172, 8178, 8180, 8182, 8188, 8305, 8305, 8319, 8319, 8336,
8348, 8450, 8450, 8455, 8455, 8458, 8467, 8469, 8469, 8473, 8477, 8484, 8484,
8486, 8486, 8488, 8488, 8490 ...
};
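
For a round trip you can run without Reggie, here is a hedged sketch that pairs a hand-written stand-in for the generated method with a call that captures the output in a string. TokenizerGeneratorSketch and Render are hypothetical names. The body of Run() is copied from the generated code shown earlier, only reindented.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hand-written stand-in for what csppg would generate from the template above.
static class TokenizerGeneratorSketch
{
    public static void Run(TextWriter Response, IDictionary<string, object> Arguments)
    {
        var tokenizerDfa = (int[])Arguments["tokenizerDfa"];
        var arrayCount = 1;
        Response.Write("\r\nstatic readonly int[] _TokenizerDfa = new int[] {\r\n");
        for (var i = 0; i < tokenizerDfa.Length; ++i)
        {
            Response.Write(tokenizerDfa[i]);
            if (i < tokenizerDfa.Length - 1) Response.Write(", ");
            // Break the line after every 50th element.
            if (0 == ((arrayCount++) % 50)) Response.Write("\r\n");
        }
        Response.Write("\r\n};\r\n");
    }

    // Runs the template against an in-memory writer and returns the rendered text.
    public static string Render(int[] dfa)
    {
        var args = new Dictionary<string, object> { ["tokenizerDfa"] = dfa };
        using (var writer = new StringWriter())
        {
            Run(writer, args);
            return writer.ToString();
        }
    }
}
```

Calling TokenizerGeneratorSketch.Render(new[] { 1, 2, 3 }) returns the array declaration with "1, 2, 3" between the braces. Swapping the StringWriter for a StreamWriter would send the same text to a file.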

Obviously, spitting out formatted arrays isn't even something an ASP-like engine is especially good at, which is why I typically write a codebehind method to do it for me.

static readonly int[] _TokenizerDfa = <%
var dt = ToLexer(rules);
WriteCSArray(dt,Response);
%>

I started using this method over the traditional manual writing out of code for my latest version of Reggie because the routines were getting difficult to update and maintain.
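
A codebehind helper along the lines of WriteCSArray might look like the sketch below. This is an assumption, not Reggie's actual code: it guesses that the table is an int[], that the helper emits the whole "new int[] { ... };" initializer, and the perLine parameter and the ArrayWriterSketch class name are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;

static class ArrayWriterSketch
{
    // Writes values as a C# array initializer, perLine elements per row.
    public static void WriteCSArray(int[] values, TextWriter writer, int perLine = 20)
    {
        writer.WriteLine("new int[] {");
        for (var i = 0; i < values.Length; ++i)
        {
            // Indent the first element of each row.
            if (i % perLine == 0) writer.Write("    ");
            // Invariant culture keeps the digits and minus sign valid C#.
            writer.Write(values[i].ToString(CultureInfo.InvariantCulture));
            if (i < values.Length - 1) writer.Write(",");
            var endOfRow = (i % perLine == perLine - 1) || i == values.Length - 1;
            if (endOfRow) writer.WriteLine();
            else writer.Write(" ");
        }
        writer.Write("};");
    }
}
```

Keeping the row-breaking logic in ordinary C# like this is what lets the template itself stay a one-liner.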

Using this Mess

Using the tool is pretty simple.

Usage: csppg.exe <inputfile> [/output <outputfile>] [/class <codeclass>]
   [/namespace <codenamespace>] [/internal] [/ifstale]

csppg 0.5.0.0 - Runs an ASP/T4 style template over some input,
and generates code that can be used to run the template.

   <inputfile>     The input template
   <outputfile>    The preprocessor source file - defaults to STDOUT
   <codeclass>     The name of the main class to generate - default derived from <outputfile>
   <codenamespace> The namespace to generate the code under - defaults to none
   <internal>      Mark the generated class as internal - defaults to public
   <ifstale>       Only generate if the input is newer than the output
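
The article doesn't show how /ifstale is implemented, but a staleness check of this kind is usually just a timestamp comparison. The following is a sketch under that assumption. StaleCheck and IsStale are hypothetical names, and it presumes an output file was given rather than STDOUT.

```csharp
using System;
using System.IO;

static class StaleCheck
{
    // True when the output is missing or older than the input template.
    public static bool IsStale(string inputPath, string outputPath)
    {
        if (!File.Exists(outputPath)) return true;
        return File.GetLastWriteTimeUtc(inputPath) > File.GetLastWriteTimeUtc(outputPath);
    }
}
```

A check like this is what makes it cheap to run the tool as a pre-build step on every build.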

The output file is kind of grotty, but it can be included in your code generator project to run the template you fed it.

Creating a codebehind is simply a matter of creating a partial class with the same name.
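
As a rough illustration of that pairing, the sketch below puts both halves in one listing. In a real project they would live in two files. GreetingGenerator and Shout are hypothetical names, and the first half is a hand-written stand-in for what the tool would generate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Stand-in for the generated file. Normally you would not edit this half.
internal partial class GreetingGenerator
{
    public static void Run(TextWriter Response, IDictionary<string, object> Arguments)
    {
        var name = (string)Arguments["name"];
        Response.Write("Hello, ");
        Response.Write(Shout(name)); // helper lives in the codebehind half
        Response.Write("!");
    }
}

// Hand-written codebehind: same class name, same namespace, marked partial.
internal partial class GreetingGenerator
{
    static string Shout(string text) => text.ToUpperInvariant();
}
```

Both declarations have to agree on the class name, namespace and accessibility, so the /class, /namespace and /internal options need to match what you wrote by hand.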

Here's a snippet of some of the output:

using System;
using System.IO;
using System.Text;
using System.Collections.Generic;
namespace Reggie {
    internal partial class TableMatcherGenerator {
        public static void Run(TextWriter Response, IDictionary<string, object> Arguments) {

var rules = (IList<LexRule>)Arguments["rules"];
var ignoreCase = (bool)Arguments["ignorecase"];
var inputFile = (string)Arguments["inputfile"];
var blockEnds = BuildBlockEnds(rules,inputFile,ignoreCase);

            Response.Write("\r\n");
 for(var k = 0;k<2;++k) {
bool reader = k==1;
string curtype = reader ?
                 "System.IO.TextReader":"System.Collections.Generic.IEnumerator<char>";
string curname = reader ? "text":"cursor";
string texttype = reader ?
                  "System.IO.TextReader":"System.Collections.Generic.IEnumerable<char>";

            Response.Write("\r\nstatic System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<long, string>> _MatchTable(int[] entries,int[] blockEnd, ");
            Response.Write(texttype);
            Response.Write(" text, long position)\r\n{\r\n    var sb = new System.Text.StringBuilder();\r\n");
if(!reader) {
            Response.Write("    var cursor = text.GetEnumerator();");
}

Forgive the formatting. It's generated by a tool and not really all that hand editable.

Now that you have something like that you can use it like so:

var args = new Dictionary<string, object>();
args.Add("rules", rules);
args.Add("inputfile", inputFile);
args.Add("blockEnds", blockEnds);
args.Add("lines", false);
args.Add("ignorecase", ignoreCase);
// late bind so if we break the build it doesn't blow up all over the place
// when we have to delete the output
var tp = Type.GetType("Reggie.TableMatcherGenerator");
tp.GetMethod("Run", BindingFlags.Static |
              BindingFlags.Public).Invoke(null, new object[] { writer, args });
//TableMatcherGenerator.Run(writer, args);

Notice I'm late binding, rather than using the simpler call (commented out above). This isn't strictly necessary, but it helps when bad code in the template makes the project fail to compile. To get the project compiling again, you can delete the code in the generated file. Since the call is late bound, the missing method doesn't cause a compile error. It just makes life a little easier.
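
If you go the late bound route, it can be worth guarding against the generated class being absent at run time too. The helper below is a sketch of one way to do that. LateBoundTemplate and TryRun are hypothetical names, not part of csppg.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

static class LateBoundTemplate
{
    // Returns false instead of throwing when the generated class or its Run method is missing.
    public static bool TryRun(string typeName, TextWriter writer, IDictionary<string, object> args)
    {
        // Type.GetType returns null when the type can't be found.
        var type = Type.GetType(typeName);
        var run = type?.GetMethod("Run", BindingFlags.Static | BindingFlags.Public);
        if (run == null) return false;
        run.Invoke(null, new object[] { writer, args });
        return true;
    }
}
```

Two things to keep in mind: Type.GetType with a plain namespace-qualified name only looks in the calling assembly and the core library, so this helper needs to live in the same assembly as the generated class, and any exception thrown inside the template comes back wrapped in a TargetInvocationException.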

History

  • 28th October, 2021 - Initial submission