SQL Server Integration Services Design Patterns, Second Edition (2014)

Chapter 10. Expression Language Patterns

The expression language in SSIS might appropriately be referred to as the glue that holds the product together. Expressions in SSIS provide a relatively simple and easy-to-use interface to allow data developers to introduce dynamic logic into the ETL infrastructure. When you think through the various moving parts within Integration Services, you’ll likely discover that you can manipulate all of them in one way or another by using expressions.

Expressions provide a fast, effective, and, dare I say, fun way to solve specific ETL challenges. In this chapter, we’ll look into some of the basics of the expression language and I’ll describe a few instances where SSIS expressions are ideal (and a few where they might not be) for effectively solving difficult ETL problems.

Getting to Know the Expression Language

Before we dive into the design patterns surrounding the SSIS expression language, let’s spend a little time defining and becoming familiar with the nuances of the language. Reviewing the language-specific patterns can help you get up to speed and use the language correctly.

What Is the Expression Language?

The SSIS expression language is an interpreted language that has been built into the SSIS runtime environment. This specialized language is used to craft scalar-valued snippets of code (individually referred to as expressions) that you may use at various points within the SSIS environment.

The SSIS designer exposes dozens of interfaces where expressions can be used in place of hard-coded values, allowing the BI professional to leverage that flexibility to create dynamic and reusable elements within SSIS. Conceptually, it’s not unlike the product-specific dialects that exist in other Microsoft development environments. For example, when you are developing reports in SSDT or BIDS that you want to deploy to SQL Server Reporting Services, you can use Visual Basic for Applications (VBA) code to generate dynamic behavior during report execution and rendering.

As you explore the expression language, you’ll find it to be a very powerful addition to the natural capabilities of SQL Server Integration Services. It has a rich library of functionality that will be familiar to both developers and DBAs. Among the functional domains of the SSIS expression language are

·     A full complement of mathematical functions and operators

·     An impressive set of string functions that may be used in comparisons, analysis, and value manipulation

·     Common date and time functionality, including date part extraction, date arithmetic, and comparison

The expression language serves two different roles within the package life cycle:

·        Evaluation: You can use expressions to determine whether a specified condition is true and to change the behavior of the package accordingly. When it is used as part of the control flow, an expression used as an evaluation may check a certain value and dynamically alter the execution path based on the results of that comparison. Within the data flow, expressions allow you to evaluate data a row at a time to determine how to proceed in the ETL.

·        Assignment: In addition to using expressions as decision-making elements, you can use SSIS expressions to programmatically modify data during package execution. Typically you might use them as expression-based property settings and to transform in-process data within the data flow.

Expressions in SSIS may derive their comparisons or assignments from several fronts. Built-in system variables permit visibility into software environmental data such as package and container start times, machine environment information, package versioning metadata, and more. You can interrogate and manipulate the values of user-defined SSIS variables by using expressions and access values of package parameters to be leveraged elsewhere during package execution. In the data flow, expressions may interact with running data at the cellular level.

Expressions are value-driven at runtime. Unlike settings that are generally only configurable at design time (think data flow column definitions), expressions will calculate their values when the package is actually executed. Furthermore, a single expression may be evaluated many times (perhaps with a different result each time) during the execution life cycle of the package. Consider the case of the ForEach Loop, a container that loops through a specified set of objects or values until it reaches the end of said collection. Expressions that are manipulated within the loop may be updated dozens or even hundreds of times during this process.

Why Use Expressions?

The ability to use expressions is one of the greatest strengths of SQL Server Integration Services. Simply put, expressions help to fill in the small gaps. The expression language isn’t a tool in itself, but rather, it is an interface that helps other SSIS tools more effectively perform their respective functions. That’s all well and good, but in the interests of simplicity, why would an ETL developer choose to use expressions instead of other languages such as T-SQL, C#, or VB.NET? Here are a few compelling reasons to employ the expression language in your SSIS packages:

·        Simplicity: Expressions language can be used to quickly add flow logic or make small changes to in-pipeline data in the data flow. You can often handle small ETL changes that might otherwise be relegated to a Script task or component inline without needing to introduce extra code to the package.

·        Consistency: Use of the expression language can lead to a consistent approach to data or program flow challenges. For example, if your ETL requires that you convert blank strings to NULLs, the approach and syntax would otherwise be different for flat files, Access databases, and relational database sources. By applying the expression language to the same task, you reduce the amount of distinct code you have to write by relying on the built-in string manipulation functions in the expression language.

·        Maintenance scope: By applying the design pattern of using the expression language for cleansing needs, you can eliminate much of the sleuth work you need to do to track down and change cleansing rules as your business expectations change. Using expressions in the package itself provides you with a single point of maintenance rather than forcing you to inspect the upstream data sources each time you need to make a change.

I’ve done a number of presentations for novice SSIS developers, and when I bring up the topic of the expression language, one question almost always seems to come up: “Where do I use this expression language stuff?” My answer: “Everywhere!” Part of the beauty of expressions is that they can be used almost anywhere within SSIS packages. You can employ expressions on the control flow in precedence constraints. It’s convenient to make your SSIS package variables dynamic by replacing their static values with expressions. You can leverage expressions within the data flow to manipulate data and even control the execution path. The bottom line is that you can manipulate many of the common properties of packages, tasks, constraints, and data flow elements by using expressions.

Although its syntax may seem unusual, the expression language isn’t difficult to learn. Anyone with logical scripting experience (even if that experience is limited to T-SQL) can quickly pick up on the basics and should be able to master the language with a reasonable amount of practice.

Language Essentials

Even for those who have experience scripting in other Microsoft development environments, the first exposure to the SSIS expression language can be a little unsettling. The syntax and functionality are unlike any other language, either interpreted or compiled. It appears to be a strange hybrid of several languages and is certainly a dialect all of its own.

Developers who have spent time using the C-style languages (C, C++, C#, Java) will recognize some of the syntactical nuances within the expression language:

·     Case-sensitive column and variable names

·     Case-sensitive string comparisons

·     Double-equal (==) comparison operators

·     Simplified conditional (if/then/else) operators

Similarly, anyone experienced in T-SQL will find a great deal of familiar behavior within the SSIS expression language:

·     Case-insensitive function names

·     Date arithmetic and string manipulation functions much like those in T-SQL

The SSIS expression language is quite powerful, with its wide variety of functions and operators. With native behavior including equality tests, type casts, string manipulation, and date arithmetic, the use of expressions within SSIS packages can help to overcome ETL challenges both large and small.

Limitations

As useful as the expression language is, there are a few key limitations to its use. Bear in mind that these are relatively minor hang-ups; the SSIS expression language is not intended to be a full-featured programming language, but rather a lightweight tool to supplement the behavior of existing SSIS tasks and components. Among some of the challenges are the following:

·        Expressions are limited to single-value statements: This almost goes without saying since it’s an expression language and not a programming language. Still, it’s worth mentioning that you can’t, for example, use a single expression to iterate through a list or process a string character-by-character.

·        No IntelliSense: Unlike other scripting/expression environments, there is no built-in IntelliSense within the native expression editors. Although the expression editor in SSIS does have field, variable, and function lists, the convenience and coding reliability of IntelliSense has not yet made it into the product.

·        No error handling: This limitation is most visible when you attempt to change data type or length. Because there is no try/catch or TryParse() behavior found in the .NET-based languages, you cannot, for example, attempt to cast a text value to a number and programmatically handle any type cast errors in the same expression.

·        No comments allowed: The fact that there is no provision for code comments can be a significant downside when you are using lengthy or complex expressions. Any comments documenting the purpose of the expression have to be done peripherally—for example, on the data flow or control flow surface as an SSIS annotation.

·     Complex statements can be difficult: Simple assignments or comparisons are easy to do and are usually easy to understand after the fact. However, introducing even a moderate amount of complexity to an expression can make for a lengthy and convoluted statement. Consider the case of a multiconditional If statement. In most other dialects, one could simply perform an If/Then/Else If operation to account for more than one test condition. However, the expression language doesn’t have such behavior, so to build such logic you need to nest conditional operators. Listing 10-1 shows how you might easily address four possible conditions in a CASE operation in Transact-SQL. By contrast, Listing 10-2 shows a similar example using the expression language (note that I manually wrapped the text to fit it on the page). Although the result of the operation is the same, the latter has conditional operators nested two levels deep and is more difficult to develop and maintain.

Listing 10-1. Multiconditional Evaluation in T-SQL

SELECT CASE WHEN @TestCase = 3 THEN 'Test case = Solid'
            WHEN @TestCase = 2 THEN 'Test case = Liquid'
            WHEN @TestCase = 1 THEN 'Test case = Gas'
            ELSE 'Unknown test case' END [TestCaseType]

Listing 10-2. Multiconditional Evaluation in the Expression Language

(TestCase == 1) ? "Test case = Gas" : (TestCase == 2 ? "Test case = Liquid" : (TestCase == 3 ? "Test case = Solid" : "Unknown Test Case"))

Despite its minor shortcomings, the SSIS expression language remains an integral part of the product, and as you’ll see later in this chapter, it has some very practical uses in a well-designed ETL ecosystem.

Putting the Expression Language to Work

Now that you understand what the expression language is (and is not), let’s talk about some design patterns where you might use it.

Package Expressions

Although not as common as other uses, it is possible to use SSIS expressions to configure package-level properties. Here are a handful of properties that may be set at the package level by using expressions:

·     Disable

·     DisableEventHandlers

·     CheckpointFileName

·     MaxConcurrentExecutables

·     DelayValidation

·     Description

Consider the example of MaxConcurrentExecutables, which defines how many executables (packages, tasks, etc.) can run concurrently. By setting this property through an expression, the ETL developer would be able to dynamically control this value based on any criteria visible through an expression.

Although these properties are configurable by using expressions, it’s far more common to find package-level options set by using package parameters (with later versions of SSIS) or package configurations (SQL Server 2008 and earlier). It is usually best to share common values across package ancestries using parameters or configurations, which allow you greater flexibility and easier maintenance. I expose this particular design pattern more for the purpose of identifying it as an antipattern than for defining parameters for its use. Unless there’s some business case or regulation dictating otherwise, it’s a better long-term solution to externalize these values rather than rely on expressions.

Variable Expressions

As shown in Figure 10-1, you can configure each variable with a static value in the Value field or define a value expression that will be evaluated at runtime. Note that the variable window was improved starting in SQL Server 2012—in older versions, static values were shown in the Variable window, but you had to use the Properties window to view or alter an expression for a variable.

9781484200834_Fig10-01.jpg

Figure 10-1. Expressions with variables

In practice, I often see expressions applied to variable values, and then the resulting variable is used as a property on a task or component (as opposed to it being used as an expression to set the property directly). I’m a fan of this design pattern for one simple reason: reusability. It’s not uncommon for components to share certain properties, and building expressions on each of those shared properties for every applicable component is both redundant and unnecessary. For those properties that will be shared across multiple tasks or components, it’s far easier to centralize the expression logic into a variable and then use that variable to set the shared properties. This approach allows for faster development as well as easier maintenance should the logic require changes down the road.

When using this design pattern, don’t forget that you can also “stack” variable values. In the expression statement, you can leverage other variables to set the value of the current variable.

Connection Managers

One of the most practical and common places to use SSIS expressions is the Connection Managers tray. Generally speaking, it’s typically preferable to store dynamic connection properties not in expressions, but rather as parameters, particularly when you’re dealing with structured data. Because of the sensitive and frequently changing nature of connection metadata (server names, user names, and passwords), most ETL professionals choose to externalize those settings to keep them stored securely and externally to the package so they can be globally changed (rather than modified package by package).

One recurring exception to this pattern is connections that interact with the file system. There are several cases where using expressions helps to lighten the load of processing file-based sources or destinations:

·     When working with flat files, the files are to be named according to the current date and/or time stamp (such as Medicare_2014_06_01.txt, for example).

·     The files are expected to be filed in the file system according to the date (such as D:\Data\2014\06\01\Medicare.txt).

·     A scheduled job loads a text file that always has the same file name, but a copy of each day’s file needs to be saved without overwriting the file processed on the previous day.

For these cases, you can use a little dab of expression language to dynamically build directory paths and file names that your connections in SSIS will use. For this example, let’s assume that you’re generating a flat text file from within your package, and you want to use a dynamic file name based on the current date. By setting the ConnectionString property from within the Properties window of the instance of the Flat File Connection Manager, you can manipulate the runtime value of the file name. As shown in Figure 10-2, you’re specifying the base file name and then appending the elements of the current date to build a customized file name.

9781484200834_Fig10-02.jpg

Figure 10-2. Dynamic file name using expression

Note that the pattern just discussed could be further extended to include elements of time (hours/minutes/seconds) should your ETL requirements include a restraint for that level of granularity.

Since we’re not going in-depth into all the syntactical elements of the expression language, I’ll just point out a couple of things I’ve done here:

·     Because the backslash (\) is a special character in the expression language, I have to “escape” it (to negate its status as a special character) by using a double backslash (\\) when I include it as a string literal.

·     Using (DT_STR, n, 1252), I’m converting the integer value returned by the DATEPART function to ASCII text. In this case, I’m using code page 1252 with a maximum length of either 2 or 4 depending on the component of the date element.

·     Using the RIGHT function, I’ll pad any single-digit month or day value with zeroes (e.g., so that “3” becomes “03”) to maintain consistency.

Remember that this pattern is highly flexible. It can be utilized with almost any file connection, whether it’s to be used as a source or a destination. You’re not limited to just flat file connections here either; you can extend this logic to some of the other SSIS connections as well. I’ve used this same design pattern when dealing with FTP data as both a source and destination. By embedding the same logic within the properties of an FTP source, you can programmatically “walk” the directory structure of a remote server when it is in a known and predictable format such as this.

Project-Level Connection Managers

When working with projects in the catalog deployment model, you can expose connection information across multiple packages in the same project by way of project connection managers. When you are using older versions of SSIS (2008 or earlier), or when you are working in the package deployment model in later versions of SSIS, any connection manager defined within a package is independent of those in other packages. Starting with SSIS in SQL Server 2012, however, you now have the ability to attach a connection manager to your workspace at the project level. These are accessible to all packages within the same project.

We’ll not go deeply into the new deployment model in this chapter, but it is important to point out how the use of expressions impacts project connection managers. Because they are attached to the project and not one particular package, the properties of these shared connections are common to all packages in the project. As such, any property setting on these project connection managers—including the use of expressions—would be immediately reflected in all packages in the project. This is a welcome and much needed improvement to the way packages interact with one another, but for those of us who have worked with previous versions of SSIS, it’s a bit of a paradigm shift. Don’t get caught off guard when an expression applied to a project connection in one package gets applied to the other packages in the project!

Control Flow

Within the control flow, there are a couple of different ways to implement SSIS expressions. First, each of the tasks and containers will expose several properties that are configurable using expressions. In addition, the paths between them (known as precedence constraints) allow ETL developers to customize the decision path when moving from one task/container to another.

Conditional Execution Through Expressions and Constraints

The essential function of the control flow is to manage the execution of package elements. By using precedence constraints, you can design a package so that tasks and containers are fired in the proper order and with the correct dependencies intact. For a simple example of this, think about a package that truncates and then loads a staging table. You can perform both of these tasks in the same package, but without a precedence constraint to cause the insert operation to occur after the TRUNCATE TABLE execution, you run the risk of inadvertently loading and then deleting the same data.

You can configure precedence constraints to manage flow-based successful completion of the preceding task (the default behavior), or you may set them to cause the task to execute only if the preceding task fails. In addition, you can set the constraint to Completion, allowing the downstream task to fire when the upstream task is finished, regardless of its outcome. Tasks may have multiple precedence constraints, and you may set these so that any or all of them must be satisfied before the task to which they are attached will execute. Figure 10-3 shows a fairly typical use of precedence constraints; note that the unlabeled arrows represent Success constraints, and the others are labeled as to their purpose. The dashed lines indicate that the task is configured to execute upon completion of either of the preceding tasks.

9781484200834_Fig10-03.jpg

Figure 10-3. Precedence constraints

As useful as precedence constraints are, the domain of variability that they address is fairly limited: the only conditions that can be tested are whether a task completed as well as the success or failure of said task. In the brief example shown in Figure 10-3, you can probably infer that I’m downloading one or more files from an external source, loading the data from those files into staging tables, and then merging (upserting) the data into a database table. Although there’s nothing technically wrong here, there is room for improvement. For example, what happens if there are no files to be processed? In the example shown, the truncation of the staging table, the loop through the file system to find the downloaded files (even if none exist), and the merge operation will all be executed even if there are no files to process.

In the first job I ever held, I was responsible for, among other things, gathering stray shopping carts from the store parking lot and bringing them back inside. My boss once told me, “This job requires an excessive amount of walking, so do what you can to save steps.” All these years later, that advice still holds true. Why run through extra steps when you can simply skip past them if they are not needed? For the previous example, you can include a relatively simple expression to bypass the execution of the majority of the package when no files are found to process. Saving those steps saves CPU cycles, disk I/O, and other resources.

Precedence constraints also have the ability to use expressions to enforce proper package flow. In Figure 10-4, you’ll see that the evaluation operation is set to Expression to enforce both the execution value of the prior task as well as the value defined in the Expression box. For illustration purposes, assume that you’ve populated an SSIS variable to store the number of files downloaded in the Script task operation, and you’re using the expression to confirm that at least one file was processed. From here, you can either type the expression into the window manually or use the ellipsis button to open the Expression Editor (note that in earlier versions (2008 and earlier) of the product, you will have to enter the expression by hand without the benefit of the Expression Editor).

9781484200834_Fig10-04.jpg

Figure 10-4. Precedence Constraint Editor

Refer back to the original package; you’ll see that the precedence constraint between the first Script task and the truncation SQL task now reflects the presence of an expression in the constraint (Figure 10-5).

9781484200834_Fig10-05.jpg

Figure 10-5. Expression notation in precedence constraint

It’s worth noting that the example in Figure 10-5 shows a non-standard notation on the constraint. By default, only the function icon (fx) will appear when you are using an expression as part of a constraint. Assuming that the expression is not a lengthy one, I typically change theShowAnnotation option of the constraint to ConstraintOptions, which will include the expression itself on the label of the constraint. This is an easy reminder of the expression used in the constraint, and it doesn’t require opening the properties window to see the expression.

Task-Level Expressions

In addition to the control flow uses of expressions, most every task and container in SSIS has its own properties that can be configured using expressions. The options for configuration using expressions will vary from one executable to the next, but there are elements common to most tasks and containers:

·     Description

·     Disable

·     DisableValidation

·     TransactionOption

·     FailPackageOnFailure

·     FailParentOnFailure

A common design pattern using a task-level expression is to employ the SqlStatementSource property of the Execute SQL task. In most cases, you can use this task combined with query parameters to create dynamic statements in T-SQL. However, some language constructs (such as subqueries) don’t always work well with parameters, exposing a need to build the SQL string in code. By using an expression instead of static text for the SqlStatementSource property, the ETL developer can have complete control of the T-SQL statement when query parameters don’t fit.

Image Note  There was a limit on string size in SQL Server 2008 Release 2 that’s been greatly relaxed in 2012.

Data Flow Expressions

As we move from the control flow into the data flow, we find the more traditional use of expressions as part of our ETL strategy. Like the higher-level executables, we find that every component in the data flow is affected either directly or indirectly by SSIS expressions.

Data Cleansing

Lightweight data cleansing is one of the most common uses of the expression language within the data flow of SSIS. Most frequently used within the derived column transformation, expressions can be used for certain cleanup tasks, including these:

·     Changing the case of data

·     Grabbing a substring from within a longer string

·     Trimming extraneous space characters

·     Replacing inappropriate characters (such as removing letters from text)

·     Changing data length or type

Often, you can minimize the need for data cleansing in the data flow simply through well-designed query statements in the extraction from the various data sources. However, sometimes cleanup at the source is just not an option. Many sources of data are nonrelational: consider text files and web services as data sources, for example, which generally do not have the option of cleaning up the data before its arrival into the SSIS space. Sometimes even relational sources fit in this box: I’ve encountered a number of scenarios where the only interface to the data was through a predefined stored procedure that could neither be inspected nor changed by the ETL developer. For cases such as these where source cleansing is not possible, using expressions within the data flow is a good second-level defense.

One design pattern that I use frequently is to trim out extra whitespace and convert blank strings to NULL values. As shown in the following, such an operation could be performed with a single, relatively simple expression:

(LEN(TRIM([Street_Address])) > 0) ? TRIM([Street_Address]) : (DT_WSTR, 100)NULL(DT_WSTR, 100)

Regarding data cleansing using the expression language, I will offer a brief word of caution: if you find yourself needing to do complex, multistep cleansing operations within your SSIS packages, consider using some other means to do the heavy lifting. As I mentioned earlier, the expression language is best suited for lightweight data cleansing; because complex expressions can be difficult to develop and debug, you might instead use a richer tool such as the Script task or Script component, or perhaps Data Quality Services, for these advanced transformations.

Branching

Sometimes you will find that you need to create forks in the road with ETL data flow. There are several reasons why you might need to create such branches within your data flow:

·        Different outputs: For data that exists in a single data flow but is bound for different destinations, creating branches is an effective solution. I recall a case I worked on several years back when we were building a system to distribute data to several financial vendors. Each of these vendors required the same type of data, but each vendor would get a different “slice” of the data based on several criteria, and each required the data in a slightly different format. Rather than design multiple end-to-end data flows that would essentially duplicate much of the logic, I created a single package that employed a conditional split transformation to split the data flow based on a specified condition, and from there, the data branched out to the respective outputs.

·        Inline cleansing: A very common ETL scenario in SSIS is to split “good” data from “bad” data within a single data flow, attempt to clean the bad data, and then merge the cleansed data with the good data. This allows you to leave intact any data that does not require cleansing, which may help to conserve processing resources.

·        Disparate data domains: In cases where data is structurally similar but syntactically different, you might want to employ branching to handle the data differently within your data flow. Consider the example of geographical address data: although they both describe a physical address, you might need to process domestic addresses differently than you would handle international addresses. By using branching tools such as the conditional split, various address types from a single source type can be handled within one data flow task.

·        Varying metadata: Although relatively rare, there will be the occasion where a source may contain rows with varying metadata. Consider a text file with a ragged structure in which some rows are missing columns at the end of the line. By splitting the data based on the absence of certain columns, you can account for the metadata differences inline.

Figure 10-6 exposes this design pattern by showing the use of expression logic to break apart a data stream into multiple outputs. In this case, you are processing a billing file by using comparison expressions within the conditional split transformation (see the callout) to determine whether each row is paid on time, not yet due, or past due, and then you’re sending it to the appropriate output accordingly.

9781484200834_Fig10-06.jpg

Figure 10-6. Using expressions to define multiple paths

One interesting caveat regarding the application of expressions within the data flow is the way in which SSIS exposes component-level expressions. Although the expression language is very useful within the pipeline of the data flow, most components do not actually expose properties that can be set using expressions. For those that do allow expressions on certain properties, these expressions are surfaced as elements of the data flow itself and will appear as part of the options in the Data Flow Properties window while working in the control flow.

As shown in Figure 10-7, you are using the expression properties of the data flow to access the ADO.NET data source within that data flow. As you can see, the identifier in the Property column shows that this expression belongs to the data source within the data flow, allowing you to set the SqlCommand property of that source. It’s useful to note here that I used the ADO.NET source purposefully in this example. Since this source does not currently allow the use of parameters, setting the SqlCommand property is often an acceptable substitute for dynamically retrieving data from a relational database using this component.

9781484200834_Fig10-07.jpg

Figure 10-7. Data flow expression

Application of Business Rules

Although they share some of the same methods, the applying business rules differs conceptually from data cleansing. For the most part, data cleansing is considered to be universal: misspelled words, inconsistent casing, extraneous spacing, and the NULL-versus-blank-versus-zero quandary are all common problems that must be dealt with in almost every ETL process. Business rules, on the other hand, are specific use cases in which data is manipulated, extrapolated, or discarded based on custom logic that is specific to the business at hand. These rules may be general enough to apply to an entire industry (healthcare billing workflows, for example) or as specific as the arrangement of data to suit the preferences of an individual manager.

Generally speaking, the use of expressions to apply business logic works best when limited to a small number of simple business rule cases. As mentioned earlier, the expression language is not ideal for multiple test conditions and therefore may not be ideally suited for multifaceted and complex business rules. For enterprise-level business rule application, consider other tools in SSIS, such as the Script component or Execute SQL task (for operations that can be performed at the relational database level), or perhaps a separate tool such as SQL Server Data Quality Services or Master Data Services.

CHOOSING BETWEEN COMPLEX EXPRESSIONS AND OTHER TOOLS

In my experience, the majority of uses of SSIS expressions involve short, simple expressions. Interrogating the value of a variable, modifying the contents of an existing column, comparing two values, and other similar operations tend to require relatively brief and uncomplicated logic as an SSIS expression. However, there are many cases where a short-and-sweet expression just won’t get it done.

In these cases of more complicated logic, is an SSIS expression still the best choice? In some instances, the answer is no. As mentioned earlier, there are instances in the ETL cycle where the expression language is ill-suited to solve the problem. In cases where the logic required involves complexity that exceeds that which is practical or convenient for the SSIS expression language, a common pattern is to engage a separate tool to address the problem at hand. Some of the other methods for handling these complex logical scenarios are as follows:

·     Data source component: Especially when working with relational source data, it can be simpler and faster (both in design time and runtime) to build the necessary logic into the source component instead of using an expression in SSIS.

·     Execute SQL task: Sometimes it’s easier to load the data to a relational store and then perform transformation and cleansing there rather than doing it inline within the SSIS package. This methodology differs slightly from that of traditional ETL and is typically branded as ELT (extract/load/transform). Using this model, you could use the Execute SQL task to transform the data once it has been loaded from the source to the relational database in which it will be transformed.

·     Script task: When working in the control flow, you can substitute an instance of the Script task in place of an overly complex SSIS expression. When using a Script task for this purpose, you get the added benefits of IntelliSense, error handling, multistep operations, and the ability to include comments in your code.

·     Script component: Replaces complex expressions within the data flow, for the same reasons as stated previously. In addition, the Script component may be used as a source, transformation, or destination in the data flow surface, giving you even greater control of the manipulation of data than by strictly using expressions.

·     Custom task/component: If you find yourself reusing the same complex logic in many packages, consider creating a custom task or component that you can distribute to multiple packages without having to copy and paste script code to each package.

·     Third-party task/component: Sometimes it’s easier to buy (or borrow) than to build. There are hundreds, perhaps even thousands, of third-party tasks and components designed to extend the native behaviors of SSIS. In fact, many of these tools are freely available—often with the underlying source code in case you need to further customize the behavior of the tool.

There are no hard-and-fast rules defining when an expression may not be the best solution. However, there are a few design patterns that I tend to follow when deciding whether to use an expression or some other tool when applying dynamic logic in my SSIS packages. Typically, I will avoid using expressions in situations where

·     The expression will be exceptionally lengthy: If the logic required in an expression would exceed more than a few hundred characters, a script or other tool is often a better choice.

·     The expression requires more than three levels of nesting: Especially in cases where If/Then/Else logic is required, there’s frequently a need to respond to more than one condition (if/then/elseif/then/else), and unfortunately, the only way to accomplish this in the SSIS expression language is by nesting conditional operators.

·     Complex string interrogation or manipulation is required: Simple string manipulation is easy enough through SSIS expressions with the use of well-known functions such as SUBSTRING, REPLACE, LEFT/RIGHT, UPPER/LOWER, and REVERSE. However, more advanced operations (extracting text from the middle of a string, replacing multiple patterns of character(s), extracting numbers embedded in text, etc.) usually requires overly complex expressions. Further, some text operations such as regular expression (RegEx) matching are not natively supported in the SSIS expression language.

·     The logic requires a volatile type cast: Because the SSIS expression language has no error handling in itself, a conversion that is prone to failure (text to number, Unicode to ASCII, moving from a larger to smaller capacity of the same type) may cause an undesired interruption in your package flow. Often, I’ll wrap these into a Script task or Script component using a TryParse() method or a Try-Catch block, which allows a greater amount of flexibility in the event of a type cast failure.

The bottom line is that not every ETL challenge within SSIS should be solved using expressions. The expression language was intended as a lightweight solution, and used in that context, it is an outstanding supplement to the product line. Try to think of SSIS expressions as spackle; small, light, elegant, and used pervasively, but in small doses. As effective as spackle is, a building contractor would never think of building an entire house using only spackle. As with any tool, expressions in SSIS are best used in proper context and should not be considered as a one-size-fits-all solution to every problem.

Conclusion

ETL can be hard. Often, it’s not the big design problems but the small “how do I . . .?” tactical questions that collectively cause the most friction during SSIS development. The SSIS expression language was designed for these types of questions. Its small footprint, somewhat familiar syntax, and extensive usability across the breadth of SSIS make it an excellent addition to the capabilities within Integration Services. Used properly, it can help to address a variety of problem domains and hopefully ease the burden on the ETL developer.