Cinchoo ETL — Compare Two CSV Files for Add, Changed or Deleted Records (Master vs Detail)

Cinchoo
2 min readDec 13, 2021

--

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.

This article talks about comparing two CSV files for Add, Deleted and Changed records using Cinchoo ETL framework. It is very simple to use, with few lines of code, the comparison can be done. You can do the comparison of large files easily as the the operation is stream based, quite fast and with low memory footprint.

2. Requirement

This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.

3. How to Use

3.1 Sample Data

Let’s begin by looking into below sample CSV files. Assuming these CSV files are large in sizes, comes with different fields, may have column counts vary on them.

Listing 3.1.1. Master CSV file (master.csv)

ID,name
1,Danny
2,Fred
3,Sam

Listing 3.1.2. Detail file (detail.csv)

ID,name
1,Danny
3,Pamela
4,Fernando

After successful comparison, the expected CSV file should be produced as below:

In here, each record we captured the statuses from the comparison operation and output them to the file.

Listing 3.1.3. CSV output (output.csv)

ID,name,Status
1,Danny,Unchanged
2,Fred,Deleted
3,Pamela,Changed
4,Fernando,New

The first thing to do is to install ChoETL/ChoETL.NETStandard nuget package. To do this, run the following command in the Package Manager Console.

.NET Framework

Install-Package ChoETL

.NET Core

Install-Package ChoETL.NETStandard

Now add ChoETL namespace to the program.

using ChoETL;

3.2 Comparision Operation

As input files may be large in sizes, we need to consider way to merge them efficiently. Here is an approach to adapt to merge such CSV files.

  1. First open each CSV file (master and detail CSV files), put them into variables.
  2. Then open ChoCSVWriter for writing.
  3. Finally, call the Compare extension method on input CSV file streams for comparison. In here, specify the key ("ID") columns as well as compare ("name") columns. Key columns used to match the records between CSV files. Compare columns used to find if there is any change in the matched records. Compare operation yields Tuple<Master, Detail, Status> records.

Listing 3.2.1. Compare Master — Detail CSV files

private static void CompareCSVFiles()
{
//Open master CSV file
var r1 = ChoCSVReader.LoadText("master.csv"). WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();
//Open detail CSV file
var r2 = ChoCSVReader.LoadText("detail.csv"). WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();
//Open final output CSV file
using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
//Compare Master and Detail records
foreach (var t in r1.Compare(r2, "ID", "name" ))
{
dynamic v1 = t.MasterRecord as dynamic;
dynamic v2 = t.DetailRecord as dynamic;
//Check on the status and capture it
if (t.Status == CompareStatus.Unchanged || t.Status == CompareStatus.Deleted)
{
v1.Status = t.Status.ToString();
w.Write(v1);
}
else
{
v2.Status = t.Status.ToString();
w.Write(v2);
}
}
}
}

Sample fiddle: https://dotnetfiddle.net/uPR5Sq

--

--

Cinchoo
Cinchoo

Written by Cinchoo

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

No responses yet