Cinchoo ETL — Compare Two CSV Files for Add, Changed or Deleted Records (Master vs Detail)
- Download Cinchoo ETL source
- Download Cinchoo ETL binary (.NET Core)
- Download Cinchoo ETL binary (.NET Framework)
- Working Sample 1 (.NET Fiddle)
1. Introduction
ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.
This article talks about comparing two CSV files for Add, Deleted and Changed records using Cinchoo ETL framework. It is very simple to use, with few lines of code, the comparison can be done. You can do the comparison of large files easily as the the operation is stream based, quite fast and with low memory footprint.
2. Requirement
This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.
3. How to Use
3.1 Sample Data
Let’s begin by looking into below sample CSV files. Assuming these CSV files are large in sizes, comes with different fields, may have column counts vary on them.
Listing 3.1.1. Master CSV file (master.csv)
ID,name
1,Danny
2,Fred
3,Sam
Listing 3.1.2. Detail file (detail.csv)
ID,name
1,Danny
3,Pamela
4,Fernando
After successful comparison, the expected CSV file should be produced as below:
In here, each record we captured the statuses from the comparison operation and output them to the file.
Listing 3.1.3. CSV output (output.csv)
ID,name,Status
1,Danny,Unchanged
2,Fred,Deleted
3,Pamela,Changed
4,Fernando,New
The first thing to do is to install ChoETL/ChoETL.NETStandard
nuget package. To do this, run the following command in the Package Manager Console.
.NET Framework
Install-Package ChoETL
.NET Core
Install-Package ChoETL.NETStandard
Now add ChoETL
namespace to the program.
using ChoETL;
3.2 Comparision Operation
As input files may be large in sizes, we need to consider way to merge them efficiently. Here is an approach to adapt to merge such CSV files.
- First open each CSV file (master and detail CSV files), put them into variables.
- Then open ChoCSVWriter for writing.
- Finally, call the
Compare
extension method on input CSV file streams for comparison. In here, specify thekey
("ID
") columns as well ascompare
("name
") columns. Key columns used to match the records between CSV files. Compare columns used to find if there is any change in the matched records. Compare operation yieldsTuple<Master, Detail, Status>
records.
Listing 3.2.1. Compare Master — Detail CSV files
private static void CompareCSVFiles()
{
//Open master CSV file
var r1 = ChoCSVReader.LoadText("master.csv"). WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>(); //Open detail CSV file
var r2 = ChoCSVReader.LoadText("detail.csv"). WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>(); //Open final output CSV file
using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
//Compare Master and Detail records
foreach (var t in r1.Compare(r2, "ID", "name" ))
{
dynamic v1 = t.MasterRecord as dynamic;
dynamic v2 = t.DetailRecord as dynamic; //Check on the status and capture it
if (t.Status == CompareStatus.Unchanged || t.Status == CompareStatus.Deleted)
{
v1.Status = t.Status.ToString();
w.Write(v1);
}
else
{
v2.Status = t.Status.ToString();
w.Write(v2);
}
}
}
}
Sample fiddle: https://dotnetfiddle.net/uPR5Sq