Generating sample data with PowerShell and lambda expressions aka ScriptBlocks

by Klaus Graefensteiner 23. February 2011 03:33

Introduction

For a project that I am currently working on I needed to create a large amount of sample data that is being stored in a SQL Server database. Since I wanted to keep the format of the sample data very generic and be usable not only in relational database systems like SQL Server, Oracle or MySQL, but also in real time historians like OSI PI, Wonderware InSQL or GE Proficy, I decided to generate the raw output format as CSV file. This CSV file can then be imported using the various import methods of each storage system. In further blog posts I am going to talk in detail about each of these import methods. The first one will be describing how to import a CSV file into SQL Server using the bcp (bulk copy) utility.

I decided to use PowerShell and ScriptBlocks to generate the CSV file.

Sample Data Requirements

The PowerShell script generates only data for the main table that I called TagFloatPoint. This table stores  Time Series data. Time Series data is data that you collect over time and store it with a time stamp associated. Recording the Dow Jones or the outside temperature over time creates Time Series data. The DDL (Database Definition Language) equivalent of the CSV file structure looks in SQL like this:

CREATE TABLE [dbo].[FloatTagPoint](
	[TagPointID] [int] IDENTITY(1,1) NOT NULL,
	[TagID] [int] NOT NULL,
	[TagValue] [float] NOT NULL,
	[TagTime] [datetime] NOT NULL,
	[TagQuality] [int] NOT NULL
) ON [PRIMARY]
GO

The PowerShell script takes as input a start time and a end time and iterates a “time curser” from start time to end time in 1 minutes intervals. For each iteration the “time curser’s” time value will be used as a parameter for a collection of “lambda” expressions, which are called ScriptBlocks in PowerShell. Each lambda expression contains a formula for calculating a floating point value that is specific to the TagID and the time value.

I designed the ScriptBlocks to generate values that can be easily identified based on the TagID and the time stamp column. That will make it easy for me to verify later whether the data is correct or not.

Since I needed sample data for tables with Tag Values that are of types float, integer, string and discrete, I used one script for each type. In a relational database these four resulting CSV files will be imported into four corresponding tables.

The resulting CSV file will look like this:

image

Figure 1: Sample Data CSV file structure

PowerShell Script

Set-StrictMode -Version "Latest"


<#
           (0 ,'float_tag_+0000_+0059'),
           (1 ,'float_tag_+0100_+0123'),
           (2 ,'float_tag_+0201_+0231'),
           (3 ,'float_tag_+0301_+0312'),
           (4 ,'float_tag_+0400_+0459'),
           (5 ,'float_tag_+0500_+0523'),
           (6 ,'float_tag_+0601_+0631'),
           (7 ,'float_tag_+0701_+0712'),
           (8 ,'float_tag_-0800_-0059'),
           (9 ,'float_tag_-0900_-0923'),
           (10 ,'float_tag_-1001_-1031'),
           (11 ,'float_tag_-1101_-1112'),
           (12 ,'float_tag_-1200_-1259'),
           (13 ,'float_tag_-1300_-1323'),
           (14 ,'float_tag_-1401_-1431'),
           (15 ,'float_tag_-1501_-1512'),
           (16 ,'integer_tag_+0000_+0059'),
           (17 ,'integer_tag_+0100_+0123'),
           (18 ,'integer_tag_+0201_+0231'),
           (19 ,'integer_tag_+0301_+0312'),
           (20 ,'integer_tag_+0400_+0459'),
           (21 ,'integer_tag_+0500_+0523'),
           (22 ,'integer_tag_+0601_+0631'),
           (23 ,'integer_tag_+0701_+0712'),
           (24 ,'integer_tag_-0800_-0059'),
           (25 ,'integer_tag_-0900_-0923'),
           (26 ,'integer_tag_-1001_-1031'),
           (27 ,'integer_tag_-1101_-1112'),
           (28 ,'integer_tag_-1200_-1259'),
           (29 ,'integer_tag_-1300_-1323'),
           (30 ,'integer_tag_-1401_-1431'),
           (31 ,'integer_tag_-1501_-1512'),
           (32 ,'string_tag_+0000_+0059'),
           (33 ,'string_tag_+0100_+0123'),
           (34 ,'string_tag_+0201_+0231'),
           (35 ,'string_tag_+0301_+0312'),
           (36 ,'string_tag_+0400_+0459'),
           (37 ,'string_tag_+0500_+0523'),
           (38 ,'string_tag_+0601_+0631'),
           (39 ,'string_tag_+0701_+0712'),
           (40 ,'string_tag_-0800_-0059'),
           (41 ,'string_tag_-0900_-0923'),
           (42 ,'string_tag_-1001_-1031'),
           (43 ,'string_tag_-1101_-1112'),
           (44 ,'string_tag_-1200_-1259'),
           (45 ,'string_tag_-1300_-1323'),
           (46 ,'string_tag_-1401_-1431'),
           (47 ,'string_tag_-1501_-1512'),
           (48 ,'discrete_tag_+0000_+0059'),
           (49 ,'discrete_tag_+0100_+0123'),
           (50 ,'discrete_tag_+0201_+0231'),
           (51 ,'discrete_tag_+0301_+0312'),
           (52 ,'discrete_tag_+0400_+0459'),
           (53 ,'discrete_tag_+0500_+0523'),
           (54 ,'discrete_tag_+0601_+0631'),
           (55 ,'discrete_tag_+0701_+0712'),
           (56 ,'discrete_tag_-0800_-0059'),
           (57 ,'discrete_tag_-0900_-0923'),
           (58 ,'discrete_tag_-1001_-1031'),
           (59 ,'discrete_tag_-1101_-1112'),
           (60 ,'discrete_tag_-1200_-1259'),
           (61 ,'discrete_tag_-1300_-1323'),
           (62 ,'discrete_tag_-1401_-1431'),
           (63 ,'discrete_tag_-1501_-1512')
#>


$TagCalculations = @{}

$TagCalculations[0] = { param([DateTime] $time) return ($time.Minute + $time.Minute * 0.0001)  }
$TagCalculations[1] = { param([DateTime] $time) return ($time.Hour + 100 + ($time.Hour + 100) * 0.0001)  }
$TagCalculations[2] = { param([DateTime] $time) return ($time.Day + 200 + ($time.Day + 200) * 0.0001)  }
$TagCalculations[3] = { param([DateTime] $time) return ($time.Month + 300 + ($time.Month + 300) * 0.0001)  }
$TagCalculations[4] = { param([DateTime] $time) return ($time.Minute + 400 + ($time.Minute + 400) * 0.0001)  }
$TagCalculations[5] = { param([DateTime] $time) return ($time.Hour + 500 + ($time.Hour + 500) * 0.0001)  }
$TagCalculations[6] = { param([DateTime] $time) return ($time.Day + 600 + ($time.Day + 600) * 0.0001)  }
$TagCalculations[7] = { param([DateTime] $time) return ($time.Month + 700 + ($time.Month + 700) * 0.0001)  }
$TagCalculations[8] = { param([DateTime] $time) return ($time.Minute + $time.Minute * 0.0001)  }
$TagCalculations[9] = { param([DateTime] $time) return -($time.Hour + 100 + ($time.Hour + 100) * 0.0001)  }
$TagCalculations[10] = { param([DateTime] $time) return -($time.Day + 200 + ($time.Day + 200) * 0.0001)  }
$TagCalculations[11] = { param([DateTime] $time) return -($time.Month + 300 + ($time.Month + 300) * 0.0001)  }
$TagCalculations[12] = { param([DateTime] $time) return -($time.Minute + 400 + ($time.Minute + 400) * 0.0001)  }
$TagCalculations[13] = { param([DateTime] $time) return -($time.Hour + 500 + ($time.Hour + 500) * 0.0001)  }
$TagCalculations[14] = { param([DateTime] $time) return -($time.Day + 600 + ($time.Day + 600) * 0.0001)  }
$TagCalculations[15] = { param([DateTime] $time) return -($time.Month + 700 + ($time.Month + 700) * 0.0001)  }


$CSVContent = New-Object -TypeName "System.Text.StringBuilder"
#$StartTime = New-Object -TypeName "DateTime" -ArgumentList 2010, 12, 1, 0, 0, 0, 0 #Tuesday, December 01, 2009 12:00:00 AM
#$EndTime = New-Object -TypeName "DateTime" -ArgumentList 2011, 1, 31, 23, 59, 0, 0 #Monday, January 31, 2011 11:59:00 PM


$EndTime = Get-Date
$StartTime = $EndTime.AddDays(-40)


$CurrentTime = $StartTime

while($CurrentTime -le $EndTime)
{
    $CurrentTime = $CurrentTime.AddMinutes(1)
    
    for($i = 0; $i -lt 16; $i++)
    {
        $TagValue = & $TagCalculations[$i] $CurrentTime
        $Line = "12`t{0}`t{1}`t{2}`t{3}`r`n" -f $i, $TagValue, ($CurrentTime.ToString("yyyy-MM-dd HH:mm:ss.fff")), 192
        $Null = $CSVContent.Append($Line)
        
    }

}

Set-Content -Path "C:\Users\Administrator\Desktop\CreatingSampleDataWithPowerShellAndLambdaExpressions\FloatTagPointImport.csv" -Value ($CSVContent.ToString()) -Encoding "ASCII"

Download

The four script files and four CSV files that the scripts generated can be downloaded here: GeneratingTimeSeriesSampleData.zip

Tags: , , , , , , , , ,

PowerShell | Automation | SQL | Tips & Tricks

About Klaus Graefensteiner

I like the programming of machines.

Add to Google Reader or Homepage

LinkedIn FacebookTwitter View Klaus Graefensteiner's profile on Technorati
Klaus Graefensteiner

Klaus Graefensteiner
works as Developer In Test and is founder of the PowerShell Unit Testing Framework PSUnit. More...

Open Source Projects

PSUnit is a Unit Testing framwork for PowerShell. It is designed for simplicity and hosted by Codeplex.
BlogShell is The tool for lazy developers who like to automate the composition of blog content during the writing of a blog post. It is hosted by CodePlex.

Administration

About

Powered by:
BlogEngine.Net
Version: 1.6.1.0

License:
Creative Commons License

Copyright:
© Copyright 2013, Klaus Graefensteiner.

Disclaimer:
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Theme design:
This blog theme was designed and is copyrighted 2013 by Klaus Graefensteiner

Rendertime:
Page rendered at 5/22/2013 6:03:36 PM (PST Pacific Standard Time UTC DST -7)