Dynamic code execution in Roslyn

Introduction

This is a companion piece for the original article about dynamic compilation in .NET Framework with CodeDomProvider, which is a fun developer war story filled with horror, glory, and the unmistakeable stench of tech debt on fire. If you haven't read that one yet, maybe do that first. Not because it'll make you a better engineer, but because it'll lower your expectations for what's considered "quality code" during a crisis.

While every responsible manager will tell you these kinds of solutions are completely unacceptable, the decision process usually gets way more lax when the deadlines start resembling a communal noose around the dev team's necks.

I actually wrote the original article and the accompanying code around 2018, but somehow I never got around to publishing it. Coming across it recently, I thought it would be a good piece among the starter posts for my blog, so I freshened it up a bit and spiced up the writing. But the general idea (and, surprisingly, the code itself) remains unchanged.

We could leave it at that, but... aren't you even a little bit curious about how to achieve similar levels of irresponsibility as a developer using a modern stack?

Because I am.

Now, why am I saying "irresponsibility"? There aren't many valid reasons why you'd want to compile and execute code dynamically at runtime, but there are some, of course!

For example, video games that allow in-game scripting, like Space Engineers (which nowadays has much less Klang! than it used to, back in the day), or the OG of in-game scripting, Colobot.

Fair warning, though: if you don't approach Colobot with nostalgia tears blurring your vision... you might not like what you'll see. But the concept is great, regardless of how ugly it looks.

So.

This time, it's not a war story, but me learning something new. I have avoided the concept of dynamic code execution outside of SQL queries since introducing the good ol' "Converter" rewrite into that cursed ERP software several employers ago. Maybe it's time to revisit the concept, but without CodeDomProvider.

Oh, and here's an IMPORTANT DISCLAIMER:

When not to do dynamic evaluation

In every damned situation where it's NOT absolutely necessary. The whole "Converter" should go into a separate microservice that would be maintained as a different product. With an entire CI/CD pipeline properly automated, deployment of fixes would be only marginally longer than editing .php files on prod or .cs files in the databases - but you would get complete and absolute safety through actually following the most basic principles of software development, such as:

"Don't be a clod, ne'er do eval in prod".

If you REALLY must add some sort of scripting engine to your application, even that should be heavily sandboxed in an enterprise product. And by that, I don't mean tossing the script into some .NET sandbox but actually running it in a containerized environment and allowing access only to whatever data it absolutely needs - and with read-only privileges. But why am I even saying this - y'all already know this, right?

Right?

Anyway, the task at hand and the irresponsibility? That whole "Converter" bit could've been dealt with in a multitude of ways, and dynamic code execution is possibly the worst of them all, with all its caveats, headaches, and whatnot. But what is life without a little FAFODD? So, for those who didn't read the original article, let's recap the task first.

The problem, revisited

Just like in the old war story, let's go through the requirements. They're pretty much copy-pasted from the other article. We're dialing down the sardonic tone for the sake of readability - just by a notch, though, don't worry.

Business requirements

  1. The module must accept raw .txt files (PDF invoices preprocessed with an external tool) and output CSV files containing structured data: SKU, count, price per unit.
  2. Each vendor has at least one unique format. The system must support maintaining separate parsing configurations assigned to vendors in a many-to-one relationship.
  3. Parsing rules must support arbitrary and unknown complexity, including contextual lookbacks, conditional transformations, and derived logic.
  4. When a vendor changes their invoice format, the fix must be deployable within approximately two hours without application downtime.
  5. Maintainers must be able to author, edit, and test the parsing logic live.

Technical requirements

  1. Parsing logic must be editable at runtime without recompiling or redeploying the application.
  2. Runtime-editable logic must be written in C# and dynamically compiled within the host .NET Framework application.
  3. The module has to be integrated into the existing ERP as a module, not a standalone app.
  4. The solution cannot depend on CI/CD infrastructure.
  5. Dynamically compiled code must be type-safe.

Interpretation

Same shit, different .dll - and that makes a huge difference actually. This time, we're on modern .NET Core, with Roslyn at our side. And maybe we actually care about security. So, let's talk about a "Better Converter", now with sandboxing to safeguard ourselves from malicious script-kiddies. Sounds easy?

If it does, good. First part - the actual implementation - isn't that hard. And it's much cleaner than the .NET Framework way.

Roslyn, the better .NET compiler platform

First, I think we should take a good look at what Roslyn does and how. The previous disclaimer about dynamic code execution aside, there are some valid cases for using Roslyn that aren't just laziness or tech constraints due to managerial demand that you deliver while they don't give you the right tools to succeed. While the article here is partially an "I've done it, so you don't have to" piece, if you really have to do it - let's make sure you do it right.

Naive native implementation

Let's assume you have a working project, something already existing or something completely fresh. Roslyn's scripting API doesn't ship with .NET Core out of the box. What we'll need are two packages, so let's install them into our project right away:

ps
Install-Package Microsoft.CodeAnalysis.Scripting
Install-Package Microsoft.CodeAnalysis.CSharp.Scripting

Now, what does that get us? The ability to run C# code from within our app, like this:

cs
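using Microsoft.CodeAnalysis.CSharp.Scripting;

var script = "2 + 2";

// EvaluateAsync hands back the result directly
var evalResult = await CSharpScript.EvaluateAsync<int>(script);
// RunAsync hands back a state object that carries the result
var runResult = (await CSharpScript.RunAsync<int>(script)).ReturnValue;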

There are two different methods that can handle dynamic execution, as you can see. First, the important bit: for both, the generic overload defines the return value type. In our case, it's int. Otherwise, we'd get an object. But you'll also notice that EvaluateAsync returns the end result of the evaluated script, while RunAsync returns an object with a property ReturnValue that contains the value itself. Why?

The answer immediately shows why Roslyn is so great.

States in Roslyn

The EvaluateAsync method, as the name hints, simply evaluates some logic. Done and gone, you can maybe evaluate it again, but that's about it. On the other hand, RunAsync is a stateful approach that executes the code and returns its state. We can use that to continue the execution, like this:

cs
var state = await CSharpScript.RunAsync<int>("int x = 10; return x;");
Console.WriteLine(state.ReturnValue); // Prints 10
state = await state.ContinueWithAsync<int>("return x + 5;");
Console.WriteLine(state.ReturnValue); // Prints 15

There's much more to it. For example, you can grab values of internal variables within the state. The ReturnValue doesn't reflect the variable values - while we return x + 5, we don't change the value of x, meaning inside the state, x still equals 10. Each execution returns a new state object, meaning we can store them all and see how they changed historically.
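To make the "grab values of internal variables" bit concrete, here's a minimal sketch. The variable names first and second are mine, made up for illustration. GetVariable is part of Roslyn's ScriptState, and it looks the variable up by the name it was declared with inside the script.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp.Scripting;

// First submission declares x, second one only reads it
var first = await CSharpScript.RunAsync<int>("int x = 10; return x;");
var second = await first.ContinueWithAsync<int>("return x + 5;");

// The return value of the second submission...
Console.WriteLine(second.ReturnValue);            // Prints 15
// ...versus the variable itself, which was never reassigned
Console.WriteLine(second.GetVariable("x").Value); // Prints 10
```

If you'd rather list everything the scripts declared so far, the state's Variables collection exposes each variable's Name, Type, and Value.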

It's fun, it's amazing, it's powerful. But let's focus on our "Converter 2026" task at hand. Before we skip forward, let's talk about how scripts are built.

Types of scripts

They can be simple C# logic with one or many lines. The logic can declare variables, call methods, and even define its own, internal classes. It ends either with a return statement or by simply stating the return value without a semicolon at the end. There's no "official" Roslyn terminology for these script types that I could find because Roslyn scripting doesn't seem big on ceremony. But we can... sort of infer the names for them.

cs
// This is a valid Roslyn script, returns False
// Notice no return statement, and no semicolon at the end
var scriptReturnless = @"
var a = 2;
var b = 4;
a == b
";

// This is also a valid Roslyn script, returns 6
// This time, there's a return statement and a semicolon
var scriptWithReturn = @"
var a = 2;
var b = 4;
return a + b;
";

// And that's a valid Roslyn script that defines its own
// class, creates it, and then prints the result of an inner
// method into the Console
var scriptNoReturnValue = @"
using System;
public class Calc {
    public int Add(int a, int b) {
        return a + b;
    }
}
var calc = new Calc();
Console.WriteLine(calc.Add(2, 7));
";

So, how would we name these?

  • scriptReturnless uses a simple expression to return the value. We could name those expression scripts. Alternatively, considering how we never directly state "Hey, here's a value to return" and Roslyn just wings it, we could call those implicit-return scripts.
  • scriptWithReturn uses a return statement. Logically, we could call those statement scripts. Or, following the above logic, we could name them explicit-return scripts.
  • scriptNoReturnValue simply doesn't return anything. You could call those resultless scripts (with the two previous ones being resultful scripts), though it sounds a bit weird. Kinda like a restless API versus restful API. But hey, I didn't come up with those.

Hold up, actually, I did come up with those names myself.
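For the curious, here's a small sketch of what the host gets back from each kind. The script strings are trimmed-down stand-ins for the ones above, and the variable names are hypothetical. The third script is a shorter resultless one, so the console stays quiet.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp.Scripting;

// Implicit return: the trailing expression becomes the result
var implicitResult = await CSharpScript.EvaluateAsync<bool>("var a = 2; var b = 4; a == b");

// Explicit return: a regular return statement
var explicitResult = await CSharpScript.EvaluateAsync<int>("var a = 2; var b = 4; return a + b;");

// Resultless: nothing to return, so the non-generic call yields null
var noResult = await CSharpScript.EvaluateAsync("var a = 2;");

Console.WriteLine(implicitResult);   // Prints False
Console.WriteLine(explicitResult);   // Prints 6
Console.WriteLine(noResult is null); // Prints True
```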

Hydration break 💧

It's a good moment to take a short pause before the last stretch. We've covered a lot already, so take a moment to reset your attention and drink some water. Statistically speaking, we don't drink nearly enough.

Done? Let's go.

Using arguments, adding references

Now, there wouldn't be much point in doing any of this if we couldn't pass some arguments into our scripts or load assemblies and libraries. Let's talk about that because this is where Roslyn gets a little bit more complicated. But just a little bit, luckily. Let's consider a nice piece of code like this:

Problem 1: Missing references
var script = @"numbers.Sum(x => x)";

// Below fails miserably:
var sum = await CSharpScript.EvaluateAsync<int>(script);
Console.WriteLine(sum);

We can infer what the script does - it takes an enumerable collection and returns the sum of all its values. Two issues arise (ones that totally aren't foreshadowed by the title of this part): the script doesn't know what the numbers variable is, and it has no access to the System.Linq namespace. But we can easily fix that bit. First, we need to create some script options and tell Roslyn to load them before executing the script:

Problem 2: Missing variables
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting; // ScriptOptions lives here

var script = @"numbers.Sum(x => x)";

// We create ScriptOptions and tell Roslyn to load some assemblies
var options = ScriptOptions.Default
    .AddReferences(typeof(System.Linq.Enumerable).Assembly)
    .AddImports(typeof(System.Linq.Enumerable).Namespace);

// Below still fails, less miserably:
var sum = await CSharpScript.EvaluateAsync<int>(script, options);
Console.WriteLine(sum);

First, we're adding a reference to the assembly that contains Linq by using the AddReferences method. It's roughly the equivalent of adding an assembly reference to your .csproj file. It tells Roslyn, "Hey, here's that DLL you gotta load before you execute the script." Then, we use AddImports, which is the equivalent of adding using System.Linq; at the top of the file. Technically, you could simply write the using directive inside the script itself, but let's stay nice, clean, and readable.
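For comparison, here's a minimal sketch of the "write it inside the script" variant. The names inlineUsingScript, inlineOptions, and inlineSum are made up for this example, and the script sums a hard-coded array because we haven't passed any parameters in yet.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

// The using directive lives in the script text instead of AddImports
var inlineUsingScript = @"
using System.Linq;
new[] { 1, 2, 3, 4 }.Sum(x => x)
";

// The assembly reference is still supplied from the host side
var inlineOptions = ScriptOptions.Default
    .AddReferences(typeof(System.Linq.Enumerable).Assembly);

var inlineSum = await CSharpScript.EvaluateAsync<int>(inlineUsingScript, inlineOptions);
Console.WriteLine(inlineSum); // Prints 10
```

The trade-off is that every script author now has to remember their own usings, which is why keeping them in the options tends to be the tidier choice.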

What's still missing are parameters, so let's add them too. We could just do it by script formatting, but we're above that type of madness, right? We're here to lose sanity over something truly inhuman, not oversized string interpolation with pointless extra logic. Let's define a class that'll hold globals for Roslyn, then add it to our script.

This version finally works
// We instantiate the class that we
// define at the end of this file
var parameters = new ScriptParameters {
    numbers = [1, 2, 3, 4]
};

var script = @"numbers.Sum(x => x)";
var options = ScriptOptions.Default
    .AddReferences(typeof(System.Linq.Enumerable).Assembly)
    .AddImports(typeof(System.Linq.Enumerable).Namespace);

// Below finally evaluates properly:
var sum = await CSharpScript.EvaluateAsync<int>(
    script,
    options,
    // We add the global parameters:
    parameters,
    // And tell Roslyn what Type they are:
    typeof(ScriptParameters)
);
Console.WriteLine(sum); // Prints 10

public class ScriptParameters
{
    // Naming needs to match the naming in the script,
    // hence the lower-case "numbers", preserved for the
    // sake of readability
    public required IEnumerable<int> numbers { get; set; }
}

Quick note: By default, Roslyn loads the current assembly into its context. But in some cases, referencing the ScriptParameters assembly might be required. For example, when we've got a separate MyApp.Common project holding models and interfaces, and want one of those inside our Roslyn script, we'll need to add an extra reference to our script options. Like this:

cs
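// ScriptOptions is immutable: AddReferences returns a new instance,
// so the result has to be assigned back
options = options.AddReferences(typeof(ScriptParameters).Assembly);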

Class instantiation with Roslyn

We need to switch things around a little, but a lot of code will remain eerily similar. Let's assume we start from scratch, so it's a new project, but we've already imported the required libraries from NuGet. First, we need to define the InvoiceParser class.

cs
var invoiceParserScript = @"
public class InvoiceParser : IInvoiceParser {
    public IEnumerable<string> ParseInvoice(IEnumerable<string> lines) {
        return lines.Select(line => ""Parsed: "" + line);
    }
}
return new InvoiceParser();";

Notice an interesting thing we're doing: this is still a script. It defines a class first, but then it instantiates a new object and returns it to us. With CodeDomProvider, we needed to compile an assembly and then use it to instantiate our class inside our program. Here, we can tell Roslyn to simply return the instance of said class to us.

cs
// This goes at the very end of our Program.cs
public interface IInvoiceParser
{
    IEnumerable<string> ParseInvoice(IEnumerable<string> lines);
}

Next comes the interface contract for our invoice parser. It goes, of course, at the very end of our top-level code, because type declarations have to follow top-level statements. Otherwise, the compiler will throw a fit. It's a very simple interface, with just one method defined. But here's why it's a neat solution: you could define more than one method if you need to, and the returned object can then be pushed around your app, where the logic uses it in a type-safe manner.

cs
var options = ScriptOptions.Default
    .AddReferences(typeof(IInvoiceParser).Assembly)
    .AddImports(
        typeof(System.Linq.Enumerable).Namespace,
        typeof(System.Collections.Generic.IEnumerable<>).Namespace
    );

Don't forget to add proper usings via AddImports method and assemblies through AddReferences. I know we talked about it already, and I'm not assuming you're all dorks who need to be told the basics repeatedly. It's because I forgot to import System.Collections.Generic and spent half an hour trying to figure out why Roslyn is screaming at me about having no idea what an IEnumerable is.

I am the one who dorks.

Now, for the last part, it's not gonna be much different from what we've already covered.

cs
var script = CSharpScript.Create<IInvoiceParser>(invoiceParserScript, options);
// Awaiting instead of blocking on .Result
var instance = (await script.RunAsync()).ReturnValue;

// Drumroll...
var result = instance.ParseInvoice(["Line 1", "Line 2", "Line 3"]);
Console.WriteLine(string.Join(Environment.NewLine, result));

And there you have it. Our very MVP implementation of "Converter 2025" is ready. Not prod-ready, mind you, but ready enough to brag about dynamic compilation to your friends.
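One small step towards prod-ready is not letting a broken script blow up the host. Below is a minimal sketch, assuming you just want to show the compiler's complaints to whoever maintains the script. The names brokenScript and errorCount are hypothetical. CompilationErrorException and its Diagnostics property come from Microsoft.CodeAnalysis.Scripting.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

// Deliberately broken: "numbers" is never declared anywhere
var brokenScript = "numbers.Sum(x => x)";
var errorCount = 0;

try
{
    await CSharpScript.EvaluateAsync<int>(brokenScript);
}
catch (CompilationErrorException ex)
{
    // Each diagnostic carries the error code, message, and position in the script
    errorCount = ex.Diagnostics.Length;
    foreach (var diagnostic in ex.Diagnostics)
    {
        Console.WriteLine(diagnostic);
    }
}
```

This only covers compile-time errors. Exceptions thrown by the script while it runs surface as regular exceptions and would need their own handling.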

An afterword

Now, I promised you a third part, and I will deliver, of course. Eventually.

But just for the sake of clarity, let me drop the important disclaimer one very last time before we part ways:

This is dynamic code execution. If you ever allow anyone to input just about anything into those scripts, they could cause significant damage unless everything is properly sandboxed, secured, and limited. Even with all that security in place, I wouldn't say Roslyn is 100% secure. For most of your career, you probably won't need dynamic code execution. There's almost always another way to achieve the same result, and Roslyn will simply be a nasty shortcut that'll end up making you miserable somewhere down the road. Try to avoid it if you can.