SQL Server Integration Services Design Patterns, Second Edition (2014)

Chapter 3. Scripting Patterns

As shown throughout this book, SQL Server Integration Services (SSIS) is a multifaceted product with many native capabilities that can handle even the most difficult of data challenges. With highly flexible transformations, such as the Lookup, Conditional Split, Derived Column, and Merge Join, the data flow is well-equipped to perform a limitless number of transformations to in-flight data. On the control flow side, tools including the File System task, the For Each Loop (and its cousin, the For Loop), the File System task, and the Data Profiling task provide critical services to support fundamental ETL operations. Right out of the box, you get a toolbox that’s full of flexible and powerful objects.

However, even the most casual ETL developer will eventually encounter scenarios that require more flexibility than what is afforded in the native components. Dealing with data movement and transformation is often ugly and unpredictable, and requires a level of extensibility that would be difficult to build into general-purpose tools. Fortunately, there is a solution for these uncommon ETL needs: custom .NET scripting.

The Toolset

SQL Server Integration Services has the capability to build very powerful ETL logic directly into your SSIS packages. Through Visual Studio and its various niceties (familiar development environment, IntelliSense, and project-based development, among many others), the burden of embedding custom logic in your ETL processes is made significantly easier.

Unlike its predecessor Data Transformation Services (DTS), SQL Server Integration Services exposes the entire .NET runtime within its scripting tools. Gone is the requirement to use only ActiveX scripts within ETL packages (although this capability does still exist in SSIS, for those loyal to VBScript). With the introduction of the rich scripting environments in SSIS, you now have the ability to access the same framework features used in “real” software development. True object-oriented development, events, proper error handling, and other capabilities are now fully accessible within custom scripts in SSIS.

SQL Server Integration Services includes two different vehicles for leveraging .NET code into your packages, each designed to allow different types of custom behaviors. The Script task, which resides in the control flow toolbox, is a broad, general-purpose tool intended to perform support and administrative tasks. Within the data flow toolbox, you’ll find the Script component, a versatile yet precise data movement and manipulation tool.

If you’re new to scripting in SSIS, you might wonder why there are two different script tools in SSIS. Beyond that, the critical design pattern will include a decision on which of these tools to use in a given scenario. The short answer? It depends. As mentioned, the Script task is typically the better choice for operational behavior and is most commonly used for operations affecting overall package flow (as opposed to data movement). On the other hand, if your ETL needs require generating, consuming, or manipulating rows of data, then the Script component is normally the better tool.

Although they have nearly identical interfaces, the Script task and Script component differ greatly in their default design. As you explore each of these tools, you’ll find that there is a significant amount of code automatically added to the script project when you introduce it to your work surface. As a tool designed for direct data interaction, the Script component will include preconfigured code defining inputs and/or outputs to allow data to flow through the component. In contrast, the behaviors built into the Script task have no facilities for either inputs or puts, further illustrating that this tool was built for purposes other than direct data manipulation.

The Script task and Script component share many similarities. Both tools feature a script designer that resembles the familiar Visual Studio development environment used for mainstream software development. In both tools, you’ll find the workspace organized into a virtual solution (displayed very similarly to the solution container found in Visual Studio development) that may include multiple files and folders. Also common to both script tools is the ability to include external code or services within the virtual solution. This capability allows you to leverage code that has already been written and compiled elsewhere, whether it’s a DLL file you include with the project or some external resource such as a web service. The language behaviors in both tools will be identical; functionality such as error handling, compiler directives, and core framework functionality is common to both the Script task and the Script component.

Should I Use Script?

Before we get started exploring the design patterns for using scripting in SSIS, a fundamental question should be answered: “Should I be using the scripting tools at all?”

As much as I believe in using the script tools in SSIS to solve difficult problems, there’s one piece of advice I almost always dish out when talking or writing about this topic: only use the Script task or Script component in situations where existing tools can’t easily address the problem you’re trying to solve. Although you gain a great deal in terms of flexibility, you also lose a few things—design-time validation, easy reusability, and a GUI settings editor, among others—when you deploy an instance of script into your packages.

Now don’t take any of this to mean that scripting is bad, or that it is reflective of poor package design (after all, this is a chapter describing recommended practices for using script!). Quite the opposite, in fact—I’ve found many situations where the only tool that could satisfy the ETL requirement was a well-designed script instance. The point is that the Script task and Script component are complex tools intended for use in atypical situations. Native components are much simpler to use and maintain. In situations where one or more native elements of SSIS can easily dispatch any ETL issues, don’t complicate the issue by reinventing the wheel with a script.

The Script Editor

Though their purposes differ greatly, you can expect a similar experience in the script editor for both the Script task and the Script component tools. Features of both tools include

·     The ubiquitous code window

·     Project Explorer

·     Full .NET runtime

·     Compiler

I’ll cover the semantics of writing code in each of these tools later in the chapter. Now, I’ll explore some of the other features shared by the Script task and the Script component.

Project Explorer

Software development elements in Visual Studio are stored in logical groups called projects. The Visual Studio environment in SQL Server Data Tools (SSDT) will behave in much the same way; as shown in Figure 3-1, the file(s) for a given script is represented in the Project Explorer window. In the same figure, you can see that the project has a rather lengthy and arbitrary name. This is a system-generated identifier used to identify the project and the namespace in which it resides, but it can be changed if you prefer to have a more standardized naming convention in your script.

9781484200834_Fig03-01.jpg

Figure 3-1. Script task Project Explorer

It’s worth pointing out that the C# and VB.NET code files that you create in Project Explorer are not physically materialized in your project. Rather, the filenames exist only to logically separate the code files within the project—the code itself is embedded inline within the XML of the package (the dtsx file).

Also included in the Project Explorer is a virtual folder named References. In this folder, you will find the assembly references for core functionality required by the script. In addition, you can add your own references to this list to further extend the capability of the Script task or Script component.

Because each instance of the Script task or Script component in your package is surfaced as a Visual Studio project, you have a significant amount of control over the properties of that project. Much like a full-featured software development project, an instance of the Script task or Script component allows the ETL developer the ability to control various parameters of behavior, including the version of the .NET Framework, the compilation warning level, and the assembly name. The project properties window is shown in Figure 3-2, and it can be accessed by right-clicking the project name in Project Explorer and choosing Properties. Do keep in mind, however, that the scope of any changes you make is limited to the single instance of the Script task or Script component you’re currently working with.

In practice, it’s quite rare to have to modify the project-level properties of instances of the Script task or Script component. In the majority of cases, the default project settings will suffice.

9781484200834_Fig03-02.jpg

Figure 3-2. Script task project properties

Full .NET Runtime

Both scripting tools in SSIS have full access to the objects, methods, and events within the .NET Framework runtime. This level of accessibility allows the ETL developer to leverage exiting assemblies—including network libraries, file system tools, advanced mathematical operations, and structured error handling, among many others—as part of their scripts. Need to create a complex multidimensional array? Looking for an easier way to perform tricks with string manipulation? All this and more is easily accessible, just below the covers of the .NET runtime environment.

Compiler

Finding errors in code is critical and is generally easier to do early in the development process. To understand how this works in SSIS scripting, it’s useful to understand the life cycle of script development.

When SSIS first surfaced with SQL Server 2005, the ETL developer could choose to store the code text in a Script task or Script component without actually building (compiling) the script project. By opting not to precompile the script, a trivial amount of processing resources would be saved (or more specifically, delayed) by forcing the SSIS design environment to accept the code as written. Two problems arose from this second, the risk of runtime errors increased, due to the minimized up-front validation in the designer.

Starting with SQL Server 2008, script precompilation is required. Now when you write a script, it is compiled in the background and the resulting binary output is serialized and stored inline within the package XML. As soon as you modify your script and close the editor, the .NET compiler is invoked and creates the serialized binary data. If you’ve made an error in your code that prevents compilation, you’ll be presented with a validation error on the Script task or Script component indicating that the binary code cannot be found (see Figure 3-3).

9781484200834_Fig03-03.jpg

Figure 3-3. Compilation error with a Script task

This validation step is a safety mechanism to make sure your packages don’t make it to deployment with uncompilable code. However, it’s also possible to build the script project from within the script editor, so you can periodically check your code for compilation errors before closing the editing window. In the menu bar of the script editor, you’ll find the option to Build image Build [script project name] to compile all of the code in your script project. When you build the project in this manner, any errors found during compilation will be reported in the Error List window (Figure 3-4).

9781484200834_Fig03-04.jpg

Figure 3-4. Error List

Also present in the Error List windows are any warnings generated by the compiler; although they don’t prevent the code from being compiled, it’s always a wise idea to perform a thorough review of any warnings that appear in this list before sending the code any further down the development process.

The Script Task

Within the control flow pane, the workspace for writing custom code is the Script task. Although it technically can be used for direct data manipulation, this tool is best suited for supporting operations that can’t easily be accomplished using native control flow tasks. Think of the Script task as a helper object to be used to tie together other data-centric operations within the package.

The following are a few requirements frequently served by the Script task:

·     Checking for the existence and accessibility of source or destination files

·     Using a file archive utility API to zip or unzip files

·     Generating custom event messages (such as HTML-formatted e-mail messages)

·     Setting variable values

·     Performing cleanup operations upon package completion or if an error occurs

·     Inspecting environmental data, such as available disk space, the status of Windows services, and so on

Because it’s not designed for manipulating data, there is no need to define input or output metadata, which makes the script task relatively easy to configure. As shown in Figure 3-5, there are just a few required configuration elements for this task: specify the programming language you want to use, choose the variables you want to make visible within the script, and you’re ready to write code!

9781484200834_Fig03-05.jpg

Figure 3-5. Script Task Editor

Image Note  For both the Script task and the Script component, the decision to use either C# or VB.NET should be considered permanent. Once you’ve selected a language and opened the script editor, the language selector is disabled. This behavior is by design; although both languages compile to the same intermediate language (MSIL) code, it would be difficult for any tool to convert source code from one language to another. The default language for both tools is C#, so those who prefer VB.NET will have to change the ScriptLanguage property.

Once the language and variables are set, you’re ready to open up the code editor. When you click the Edit Script button, you’ll be presented with the familiar Visual Studio development environment. If you have written code using this IDE in the past (to develop Windows applications, web apps, etc.), you’ll recognize many entities—the Solution Explorer, the code window, the properties window, and many other common components. Although the behaviors will be a bit different in this minimized version of Visual Studio, the concepts remain the same.

The Script Component

Although both SSIS scripting tools have similar properties, they serve very different roles. Unlike the Script task, which is intended mostly for administrative and operational programmability, the Script component is designed for the more traditional moving parts of ETL: retrieving data from a source, performing some manner of transformation or validation on said data, and loading data to a destination.

Common uses of the script component include:

·     Connecting to sources of data for which there is no native source component

·     Sending data to destinations that do not offer a native destination or are structured differently than the typical tabular layout

·     Performing advanced data manipulation that requires functionality not offered with the built-in SSIS transformations

·     Complex splitting, filtering, or aggregating of the in-pipeline data

The Script component is built with the versatility to behave in one of three modes: transformation, source, or destination. When you introduce an instance of the Script component into your data flow workspace, you’ll be prompted to select a configuration, as shown in Figure 3-6. This is an important choice, since the selection you make will determine which code template is used to initially configure the component. I’ll dig more into each of these roles momentarily.

9781484200834_Fig03-06.jpg

Figure 3-6. Script component configuration

Image Note  You should consider the selection of the Script component role (transformation, source, or destination) to be permanent. If you mistakenly select the wrong configuration role, it’s usually easier to delete and re-create the script instance rather than trying to reconfigure it.

Chances are good that you’ll use the Script component in each of these roles (source, transformation, and destination) from time to time. However, my experience has been that transformation is the most frequently used role for the Script component.

Script Maintenance Patterns

Designing custom logic in your ETL processes is a hard, but rewarding, task. However, the job is not done once the code is promoted from development to testing or even to production. The effective ETL architect maintains a long-term vision when designing script solutions, because the packages often survive well beyond the tenures of those who create those solutions.

To that end, an integral part of the solution design process should be evaluating the flexibility, maintainability, and reusability of the project as a whole, making specific allowances for the project-within-a-project instances of the script.

Code Reuse

Laziness is a good thing. (Pause for effect.) To clarify: all great technologists find ways to avoid solving the same problems repeatedly. Let’s say you’ve spent some time working with script tasks and script components, and you come up with a whiz-bang idea for The Next Big ETL Thing. The Thing is so narrowly focused that it adds behavior not present in native SSIS tools, but it’s also versatile enough to be used in multiple packages across several domains. Since you worked so hard on it once, you’ll want to avoid reinventing The Thing again. The solution: find a way to make The Thing reusable.

To that end, there are several ways to reuse code within SSIS, from the old-fashioned copy/paste to fancy modularization.

Copy/Paste

No further definition is needed here: code reuse via copy/paste is exactly as it sounds. Although copying and pasting code as a reuse mechanism is a bit crude and unstructured, it’s also the simplest and most commonly used means to do so within SSIS. The upside is a quick and easy deployment with few administrative limitations. However, this simplicity comes at a cost. Deployed in this manner, each copy of the code exists in its own silo and must be independently and manually maintained.

External Assemblies

As I mentioned earlier in the chapter, both the Script task and the Script component allow you to reference external assemblies (compiled code generated from a separate project) to import supplemental behavior into the instance of the Script task/component. The details of creating an external assembly are beyond the scope of this chapter, but in a nutshell, you would use a separate development environment to code and compile the classes, methods, and events to be used in your ETL processes. The resulting binary file, known as an assembly, would be deployed to the development machine and any server machine(s) that would execute the package. The assembly would then be added to the list of script references, and the result would be that the behaviors defined in the assembly would be accessible from within the instance of the Script task or Script component.

There are several upsides to this approach. First of all, it’s a more modular way of handling code reuse within your package. Rather than relying on rudimentary copy/paste operations, this method permits a single point of administration and development for the shared custom functions of your ETL processes. Since all references to the custom behavior would point to a single assembly on each development machine or server, any updates to the code would be addressed at the machine level rather than having to touch every script in every package. In addition, the behaviors built into the external assemblies could be used by other processes or applications; because these standalone assemblies are built using the Common Language Runtime (CLR), their use could be leveraged beyond the borders of SSIS.

There are a few limitations to this approach. First, you cannot use SSDT to create custom assemblies. Although both tools use the Visual Studio shell, they are only installed with the templates to create business intelligence projects and do not natively support other project types. To create an assembly containing custom code, you’d need to use a version of Visual Studio that was configured to generate class library projects (the Standard and Professional versions, or even the language-specific free Express versions)—or, for highly experienced developers, plain old Notepad and the .NET compiler. Another limitation is that any assemblies referenced in your script must be deployed to and registered in the Global Assembly Cache, or GAC, on the machine(s) that will execute the package. This deployment and registration process is not complex, but it does add to the total of moving parts in your ETL infrastructure.

Custom Tasks/Components

At the top of the SSIS reusability food chain you will find the custom task and custom component. As with a custom assembly, the ability to add your own tasks and components to SSIS allows you create highly customized behaviors within SSIS. In addition, custom tasks and components enable you to create a more customized user interface for these behaviors, allowing for relatively simple drag-and-drop use in your SSIS packages. In the interest of brevity, we won’t detail the use of custom tasks or custom components in this chapter, but it is worth mentioning that if there is script behavior that is often repeated in many of your packages, it’s worth considering converting the Script task or Script component into a custom tool that can easily integrate into SSDT.

Source Control

Ask any good developer for their short list of required project elements, and source control will almost always be near the top of the list. Because any SSIS project really is software development—albeit in a mostly graphical environment— the same consideration must be made by ETL developers as well. Being that the storage for an SSIS package is simply an XML file, it’s not difficult to add SSIS packages to most any existing source control system.

To some, the Script task and Script component have the appearance of residing outside the SSIS package—after all, both of these are managed through what appears to be a separate Visual Studio project. This thinking sometimes brings up the question of how to integrate SSIS script logic into source control. The easy answer is that there is no requirement above and beyond source controlling the package itself. Because all of the code is stored inline within the package XML, there is no need to separately track in source control the code within instances of the Script task or Script component.

Scripting Design Patterns

As a born-and-raised Southerner, I was always taught that there’s more than one way to skin a cat. Although I’ve never attempted this particular exercise, I can confirm that for any given problem (in life, as well as in SSIS) there may be dozens or perhaps even hundreds of correct solutions.

Because of the highly flexible nature of scripting solutions, it’s not possible to predict every possible permutation of logic that could find its way into SSIS code. However, as Integration Services has evolved, some commonly used design patterns have emerged. In this section, I’ll look at some of these patterns.

Connection Managers and Scripting

Connection managers are built into SQL Server Integration Services as a modular way to reuse connections to databases, data files, and other sources of information. Most everyone with even a little experience using SSIS is aware of connection managers and how they relate to conventional components such as OLE DB Source/Destination and Flat File Source/Destination, as well as tasks such as the FTP task and the Execute SQL task. You can instantiate a connection object once as a package-level entity (as shown in Figure 3-7) and use it throughout the remainder of the package.

9781484200834_Fig03-07.jpg

Figure 3-7. Connection manager objects

Not as widely known is the fact that you can access most connection manager objects within instances of the Script task and the Script component as well. Connecting to a data source is a very common task in SSIS scripting, particularly within instances of the Script component used as a source or destination. However, in real-world usage, I find that a lot of ETL developers go through the exercise of programmatically creating a new connection within the script, even if the package already has a connection manager.

If a connection manager object already exists in your SSIS package for for a particular connection, it’s preferable to use that existing connection manager object when connecting to a data store from within a script. If the connection manager does not yet exist, consider creating one: there are numerous benefits to abstracting the connection from the script, including ease of change and the ability for the connection to engage in transactions within SSIS.

Image Note  For an in-depth analysis on why to use connection manager objects as opposed to connections created entirely in script, you can review an excellent blog post on this topic by Todd McDermid here: http://toddmcdermid.blogspot.com/2011/05/use-connections-properly-in-ssis-script.html. In this post, the author specifically discusses the use of connection managers in the Script task, but most of the same principles would apply to the use of connection managers within the Script component as well.

Although it’s possible to reuse connection managers for most any connection type within a script, to keep things simple, I’ll limit my discussion to SQL Server database connections. With some modification, many of these principles would apply to other connection types as well.

Using Connection Managers in the Script Task

Although not entirely intuitive, the coding syntax to reference an existing connection manager in the Script task is relatively easy to understand. I’ll look at examples for the two most common ways to connect to SQL Server—through the OLE DB connection and the ADO.NET connection.

Connecting to an ADO.NET connection manager through the Script task is a two-step process, as shown in Listing 3-1. First, create a reference to your existing ConnectionManager object (using the name you gave it in the SSIS package), and acquire a connection from that object in code.

Listing 3-1. Use an Existing ADO.NET Connection in the Script Task

// Create the ADO.NET database connection
ConnectionManager connMgr = Dts.Connections["ADONET_PROD"];
System.Data.SqlClient.SqlConnection theConnection
        = (System.Data.SqlClient.SqlConnection)connMgr.AcquireConnection(Dts.Transaction);

Using an OLE DB connection manager in the Script task requires a little more code but is a similar exercise to its ADO.NET counterpart. As shown in Listing 3-2, you have to add an intermediate object to make the appropriate data type cast when using the OLE DB provider:

Listing 3-2. Using an Existing OLE DB Connection in the Script Task

// Create the OLEDB database connection
// Note that we'll need to add references to Microsoft.CSharp and Microsoft.SqlServer.DTSRuntimeWrap
ConnectionManager cm = Dts.Connections["AdventureWorks2012"];
Microsoft.SqlServer.Dts.Runtime.Wrapper.IDTSConnectionManagerDatabaseParameters100 cmParams
   = cm.InnerObject as Microsoft.SqlServer.Dts.Runtime.Wrapper.IDTSConnectionManagerDatabaseParameters100;
System.Data.OleDb.OleDbConnection conn = (System.Data.OleDb.OleDbConnection)cmParams.GetConnectionForSchema();

Image Note  It’s worth mentioning that it is technically possible to reference a connection by number rather than by name (e.g., using Dts.Connections[3] rather than Dts.Connections["Conn_Name"]). I strongly recommend against this practice! It makes for rather ambiguous connection references, and since the order of connection managers cannot be guaranteed, you might end up with a runtime error at best—and at worst, wildly unexpected behavior in your package.

Using Connection Managers in the Script Component

As I mentioned in the previous section, many of the same concepts apply to the reuse of connection managers, whether you’re working in the Script task or the Script component. For the same reasons, it’s almost always a best practice to reuse an existing connection manager object (or create a new one if necessary) rather than to build a connection object from scratch within code.

Logistically, connection managers are a little easier to use in the Script component. Because the purpose of this tool is to move data (which is not always the case with the Script task), some additional functionality comes baked in to make the job of reusing connection managers less cumbersome. In the Script component, you can declare the use of one or more connections within the script’s graphical editor (much like you declare the use of read-only or read-write variables, to be discussed shortly). As shown in Figure 3-8, this interface allows you to easily reference an existing connection.

9781484200834_Fig03-08.jpg

Figure 3-8. Declaring connection managers in the Script component

Once they are referenced here, the syntax to use the connection within your code is much simpler as well. Rather than using the connection name as an indexer, you can access any of these connections through the UserComponent.Connections collection. Here’s an example:

Listing 3-3. Using a Previously Declared Connection Manager in Script Component

// Connect to the ADO database connection
System.Data.SqlClient.SqlConnection conn = (System.Data.SqlClient.SqlConnection)Connections.OLEDBPROD.AcquireConnection(null);

Variables

In many—if not most—instances of the Script task and Script component, you’ll need to inspect or manipulate values stored in SSIS variables. Because they are so prevalent in these implementations, it’s important to understand how best to address SSIS variables within the scripting tools.

Image Note  It is important to draw a distinction between variables in SSIS and variables declared within the Script task and Script component. Although there’s some commonality in their usage, they are separate and distinct entities with very different properties. SSIS variables are defined as part of the SSIS package and may be used across many different tasks and components. Script variables, on the other hand, are declared within individual instances of the Script task or Script component and are only valid within the instance in which they are defined.

Variable Visibility

In both the Script task and Script component, you can explicitly expose one or more variables using the GUI editor for each. In Figure 3-9, you can see that we have the option of including SSIS variables in the script and can specify whether those variables will be surfaced as read-only or read-write.

9781484200834_Fig03-09.jpg

Figure 3-9. Including variables in script

It is possible to read or modify SSIS variables with a script even if you don’t explicitly include them. However, it’s usually preferable to declare any required variables as shown, since the syntax within the script is much simpler when references to the SSIS variables are declared ahead of time in this manner.

Variable Syntax in Code

Declaring read-only or read-write variables is a similar experience whether you’re using the Script task or the Script component. However, the syntax to address these variables in code is different depending on the tool you’re using. As shown in Listing 3-4, SSIS variables within a Script task instance are addressed by using a string indexer to specify the name of the variable.

Listing 3-4. Script Task Variable Syntax

public void main()
{
        // Get the current RunID
        int runID = int.Parse(Dts.Variables["RunID"].Value.ToString());

        // Set the ClientID
        Dts.Variables["ClientID"].Value = ETL.GetClientID(runID);

        Dts.TaskResult = (int)ScriptResults.Success;
}

When using an instance of the Script component, the syntax is noticeably different. Rather than using an indexer to read from or write to the referenced SSIS variables, you can use the Variables.<Variable Name> syntax as shown in Listing 3-5:

Listing 3-5. Script Component Variable Syntax

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
        // Push the SSIS variable ClientName to the value in the current row
        Row.ClientName = Variables.ClientName;
}

It is possible to access variables within instances of the Script task or the Script component even if you do not explicitly declare their use in the graphical settings editor. Within the SSIS scripting toolset, you will find a couple of functions that will allow you to programmatically lock and unlock variables for either read-only or read-write access. As shown inListing 3-6, you can use the VariableDispenser.LockOneForRead() function to capture the value of a variable that was not previously declared.

Listing 3-6. Manually Lock a Variable

// Programmatically lock the variable we need
Variable vars = null;
Dts.VariableDispenser.LockOneForRead("RunID", ref vars);

// Assign to script variable
runID = int.Parse(vars["RunID"].Value.ToString());

// Unlock the variable object
vars.Unlock();

Using a method similar to the one shown in Listing 3-6, you can manipulate variable values by using the function VariableDispenser.LockOneForWrite(), which would allow you to write to as well as read from the variable value.

Variable Data Types

As you may have derived from Listing 3-4 and Listing 3-5, the interpreted data type for variable values will differ between the Script task and the Script component. With the latter, any variable that you declare in the graphical settings editor will surface as the .NET data type equivalent of the SSIS variable type, and there is no need to perform a type cast. When using the Script task (and the Script component, if you opt to use either the LockOneForRead() or LockOneForWrite() method), variables are presented as the generic Object type, and most of the time you’ll have to cast to an appropriate type any variable used in script code. As shown in Figure 3-10, you’ll get a compiler error if you forget to cast a variable value to the appropriate type.

9781484200834_Fig03-10.jpg

Figure 3-10. Script task variables must be cast

Naming Patterns

If you have worked as a software developer in the past, the following section will be nothing more than a review. If you haven’t, I’ll share an important tidbit of information: naming conventions are important.

I’ve known many a developer in my time, and I’ve never found one who wasn’t loyal to some type of naming convention. Why? It’s familiar. It’s predicListing. When patterns emerge in the way you develop software, it becomes easier to maintain—by others as well as yourself. Technically, there’s no difference between code written in camel case, Hungarian notation, mnemonic notation, or Pascal style. This is purely a matter of clarity, readability, and maintainability. By finding and sticking with a style (even if it’s a hybrid of other styles), you’ll have more navigable code and will likely find that your fellow developers appreciate the approach.

Here are a few suggestions regarding naming conventions:

·        Be consistent: This is the number-one rule and should be followed above all others. Whatever style you develop, stick with it. You can change or modify your naming convention style later, but at least be consistent within each project.

·        Be clear: I can’t tell you how many times I’ve had to debug code (and yes, sometimes it was my own) littered with single-character object names, ambiguous function names, and other pull-your-hair-out kinds of practices. Now, don’t go overboard here. Most object names don’t need to read like database_write_failed_and_could_not_display_interactive_error, but there’s probably some happy medium between that and variable names such as f.

·        Be a follower: If you don’t have your own style, do as those around you do. Some organizations, especially larger ones, may dictate the naming convention style you’ll use.

Conclusion

The scripting tools in SQL Server Integration Services are both capable and complex. When scripting for ETL first surfaced with DTS many years ago, it was a quick-and-dirty way to solve some data movement and manipulation problems. With SSIS, the latest generation of scripting tools is robust and enterprise-ready. With a few recommended practices, scripting can be a great addition to any ETL developer’s toolkit.