Improving fixed width file parsing :: KSarisc — My journey focussed on software development

At work, I have been tasked with writing a Windows service in C# that parses fixed width files, which follow very specific patterns. One of the patterns is that number fields are entirely numeric and padded with leading zeros (e.g. a 5 character field with the final integer value of 10 would be 00010).

Span Parser

I don’t have a good reason to use Span to parse the files and fields, but I have been thinking about improving the performance and my understanding of Span.

The first and easiest changes were to read the file byte by byte and use pooled arrays, following "Strings are evil" as a guideline for improving file parsing performance. Unfortunately, each line splits according to different rules that are determined by its first 4 characters. I currently add each possible line parser (each implementing a shared interface to keep things generic) to a dictionary and process a line only if it has a match in the dictionary. I pass the line to the parser as a ReadOnlySpan, and each parser defines its fields as slices of the span data.
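As a rough illustration of that dispatch step, here is a minimal sketch. The names ILineParser, HeaderLineParser and LineDispatcher, the "HEAD" record type and the 10 character name field are all made up for the example and are not taken from the repository. It also assumes the line has already been decoded to characters, and it packs the 4 character prefix into a ulong key so the dictionary lookup itself does not allocate a string, which is just one possible way to key the dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: one implementation per record type.
public interface ILineParser
{
    void Parse(ReadOnlySpan<char> line);
}

// Hypothetical parser for an assumed layout:
// 4 character record type, then a 10 character, space padded name.
public sealed class HeaderLineParser : ILineParser
{
    public string Name { get; private set; } = "";

    public void Parse(ReadOnlySpan<char> line)
    {
        // Slicing a span creates a view over the same memory; nothing is copied.
        ReadOnlySpan<char> nameField = line.Slice(4, 10);

        // The only allocation: the final string for a field that must be a string.
        Name = new string(nameField.TrimEnd());
    }
}

public static class LineDispatcher
{
    // Packs exactly 4 UTF-16 chars (16 bits each) into one 64 bit key.
    public static ulong PackKey(ReadOnlySpan<char> prefix)
    {
        if (prefix.Length != 4)
            throw new ArgumentException("Prefix must be 4 characters.");

        ulong key = 0;
        for (int i = 0; i < 4; i++)
            key = (key << 16) | prefix[i];
        return key;
    }

    // Returns false when the line is too short or no parser is registered.
    public static bool TryDispatch(
        Dictionary<ulong, ILineParser> parsers, ReadOnlySpan<char> line)
    {
        if (line.Length < 4)
            return false;

        if (!parsers.TryGetValue(PackKey(line.Slice(0, 4)), out ILineParser? parser))
            return false;

        parser.Parse(line);
        return true;
    }
}
```

Registering a parser would then look like `parsers[LineDispatcher.PackKey("HEAD")] = new HeaderLineParser();`, and lines whose prefix has no entry are simply skipped.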

Memory Allocations

With these steps, I’m mainly trying to limit allocations, but obviously, some allocations will be required from time to time. When the field is intended to be a string, the allocation is simply a new string built from a character array. When the final result is an integer or decimal type, I don’t want to jump through the hoops of converting to a string and then parsing that to an int, or make any heap allocations (if they can be avoided). Basically, I am going to loop through the field in reverse and add the numeric characters together, scaling each digit by its place value.
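The reverse loop could look something like the sketch below. FixedWidthNumber and TryParseInt32 are hypothetical names for this example, not the code in the repository, and the sketch assumes an unsigned, digits-only field such as 00010. The overflow guards are my own addition so that very wide fields fail cleanly instead of wrapping around.

```csharp
using System;

public static class FixedWidthNumber
{
    // Hypothetical helper: parses a zero padded, digits-only field
    // without creating an intermediate string.
    public static bool TryParseInt32(ReadOnlySpan<char> field, out int value)
    {
        value = 0;
        if (field.IsEmpty)
            return false;

        long total = 0;
        long place = 1; // 1, 10, 100, ... for the digit being read

        // Walk from the last character (ones place) to the first.
        for (int i = field.Length - 1; i >= 0; i--)
        {
            int digit = field[i] - '0';
            if ((uint)digit > 9)
                return false; // not an ASCII digit

            if (digit != 0)
            {
                // A non-zero digit this far left cannot fit in an int.
                if (place > int.MaxValue)
                    return false;

                total += digit * place;
                if (total > int.MaxValue)
                    return false;
            }

            // Stop growing the multiplier once it is past the int range,
            // so long runs of leading zeros cannot overflow it.
            if (place <= int.MaxValue)
                place *= 10;
        }

        value = (int)total;
        return true;
    }
}
```

For the field 00010 the loop reads 0, 1, 0, 0, 0 from right to left, so only the second digit contributes: 1 × 10 = 10.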

Check it out

GitHub - Span Parser