andyvroberts/crimson

Crimson

Ingest the Land Registry's complete file of UK sold property prices.

Application

The UK Land Registry provides a single file containing all sold property prices accumulated since the 1st of January 1995.
http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv

As of April 2023, this file contained approx. 28 million records and was 4.6 GB (uncompressed).

The most recent records in this file can be several weeks old, because the file is only updated periodically from the smaller yearly version. To find the newest date in the file, sort by column 3 in descending order and take the first line.

sort -t',' -k3,3 -r pp-complete.csv | head -1
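Sorting a file of this size just to read one line is slow, so a single streaming pass may be preferable. The following is a minimal sketch, not part of the application: `newest_date` is a hypothetical helper name, and it assumes that column 3 holds a double-quoted date in a year-first format (so that string order matches date order) and that no earlier column contains a comma.

```shell
# Hypothetical helper: print the largest value found in column 3 (the sale date).
# Assumes year-first dates, so plain string comparison gives date order.
newest_date() {
  awk -F',' '
    {
      d = $3
      gsub(/"/, "", d)                 # drop the surrounding double quotes
      if ((d "") > (max "")) max = d   # concatenating "" forces a string comparison
    }
    END { print max }
  ' "$1"
}
```

Called as `newest_date pp-complete.csv`, it reads the file once and keeps only the running maximum in memory.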

The application should read the whole property price file, group and sort the records, and write the outputs to Azure in a single process.

Objectives

  1. Create consumable data files of historical property prices grouped by Postcode.
  2. Take advantage of multi-core/multi-threading CPUs.
  3. Enable the process to be re-executable.
  4. Store the output files in a compressed format.
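Objectives 1, 3 and 4 can be illustrated with a small shell sketch. This is not how the .NET application works; it is only a rough model of the intended result. It assumes the postcode is the double-quoted fourth column and that no earlier column contains a comma. The function name `group_by_postcode`, the `NO_POSTCODE` bucket and the `.csv.gz` output naming are all hypothetical.

```shell
# Hypothetical sketch: write one gzip-compressed CSV per postcode.
# Usage: group_by_postcode INPUT_CSV OUTPUT_DIR
group_by_postcode() {
  input="$1"; outdir="$2"
  mkdir -p "$outdir"
  # Sort on column 4 so that all rows for a postcode are adjacent.
  LC_ALL=C sort -t',' -k4,4 "$input" |
    awk -F',' -v out="$outdir" '
      {
        key = $4
        gsub(/"/, "", key)                 # strip the quotes
        gsub(/ /, "_", key)                # make the key safe as a file name
        if (key == "") key = "NO_POSTCODE" # assumed bucket for rows without a postcode
        file = out "/" key ".csv"
        if (file != prev) {                # postcode changed: close the previous file
          if (prev != "") close(prev)
          prev = file
        }
        print > file                       # first write truncates, so reruns replace old output
      }'
  # Compress every output file; -f overwrites archives from an earlier run.
  find "$outdir" -type f -name "*.csv" -exec gzip -f {} +
}
```

Only one output file is open at a time because the input is sorted first, which avoids running out of file descriptors. Rerunning the function overwrites the files it produced previously, but it does not remove archives for postcodes that no longer appear in the input.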

Dotnet

Create a classlib to contain the application.

dotnet new classlib --name Crimson --framework "net6.0"

Add the required packages from NuGet.

dotnet add package Microsoft.Extensions.Hosting --version 7.0.1

In the csproj file, ensure the project builds as an executable by adding this property inside a PropertyGroup element:

<OutputType>Exe</OutputType>

If you are not using Visual Studio, you have to manage the projects within a solution yourself.
For example, create a new classlib, create a new solution file, then register the classlib with the solution.

dotnet new classlib --name Crimson.CompRoot --framework "net6.0"

dotnet new sln --name Crimson

dotnet sln add Crimson.CompRoot

Some projects will now need references to other projects in their csproj files.
For example, in the executable (entry point) project, add a reference to the composition root project:

dotnet add Crimson/Crimson.csproj reference Crimson.CompRoot/Crimson.CompRoot.csproj

To delete a large number of files on Linux when a plain rm with a wildcard fails with the "Argument list too long" error, try this.

find . -name "*.json.gz" -print0 | xargs -0 rm
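Before deleting a large batch of files it can help to count the matches first. The sketch below wraps the same idea in a hypothetical helper, `delete_matching`, with an optional `--dry-run` argument (both names are illustrative). It uses `find ... -exec rm -f {} +`, which batches file names into as many `rm` invocations as needed and so avoids the argument-length limit without `xargs`.

```shell
# Hypothetical helper: delete regular files under DIR whose names match PATTERN.
# Usage: delete_matching DIR PATTERN [--dry-run]
delete_matching() {
  dir="$1"; pattern="$2"
  if [ "${3:-}" = "--dry-run" ]; then
    # Only report how many files would be removed.
    find "$dir" -type f -name "$pattern" | wc -l
  else
    # find batches the names, so no single rm call exceeds the argument limit.
    find "$dir" -type f -name "$pattern" -exec rm -f {} +
  fi
}
```

For example, `delete_matching . "*.json.gz" --dry-run` prints the number of matching files, and the same call without `--dry-run` removes them.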
