Professional Microsoft SQL Server 2014 Integration Services (Wrox Programmer to Programmer), 1st Edition (2014)
Chapter 6. Containers
WHAT’S IN THIS CHAPTER?
· Learning when to use containers
· Working with Sequence Containers
· Working with the For Loop Container
· Using a Foreach Loop Container to iterate through a list
WROX.COM DOWNLOADS FOR THIS CHAPTER
You can find the wrox.com code downloads for this chapter at www.wrox.com/go/prossis2014 on the Download Code tab.
In Chapter 3, you read about tasks and how they interact in the Control Flow. Now we’re going to cover one of the special types of items in the Control Flow: containers. Containers are objects that help SSIS provide structure to one or more tasks. They can help you loop through a set of tasks until a criterion has been met or group a set of tasks logically. Containers can also be nested, containing other containers. They are set in the Control Flow tab in the Package Designer. There are three types of containers in the Control Flow tab: Sequence, For Loop, and Foreach Loop Containers.
TASK HOST CONTAINERS
The Task Host Container is the default container under which single tasks fall and is used only behind the scenes for SSIS. You won’t notice this type of container. You’ll notice that this type of container is not in the Toolbox in Visual Studio and is implicitly assigned to each task. In fact, even if you don’t specify a container for a task, it will be placed in a Task Host Container. The SSIS architecture extends variables and event handlers to the task through the Task Host Container.
Sequence Containers handle the flow of a subset of a package and can help you divide a package into smaller, more manageable pieces. Some nice applications that you can use Sequence Containers for include the following:
· Grouping tasks so that you can disable a part of the package that’s no longer needed
· Narrowing the scope of the variable to a container
· Managing the properties of multiple tasks in one step by setting the properties of the container
· Using one method to ensure that multiple tasks have to execute successfully before the next task executes
· Creating a transaction across a series of data-related tasks, but not on the entire package
· Creating event handlers on a single container, wherein you could send an e-mail if anything inside one container fails and perhaps page if anything else fails
Sequence Containers show up like any other task in your Control Flow tab. Once you drag and drop any container from your SSIS Toolbox onto the design pane, you just drag the tasks you want to use into the container. Figure 6-1 shows an example of a Sequence Container in which two tasks must execute successfully before the task called Run Script 3 will execute. If you click the up-pointing arrow at the top of the container, the tasks inside the container will minimize.
A container can be considered to be a miniature package. Inside the container, all task names must be unique, just as they must be within a package that has no containers. You cannot connect a task in one container to anything outside of the container. If you try to do this, you will receive the following error:
Cannot create connector.
Cannot connect executables from different containers.
Containers such as the Sequence Container can also be nested in each other. As a best practice, each of your SSIS packages should contain a series of containers to help organize the package and to make it easy to disable subject areas quickly. For example, each set of tables that you must load probably fits into a subject area, such as Accounting or HR. Each of these loads should be placed in its own Sequence Container. Additionally, you may want to create a Sequence Container for the preparation and cleanup stages of your package. This becomes especially handy if you want to execute all the tasks in the container at one time by right-clicking on the container and selecting Execute Container.
Groups are not actually containers but simply a way to group components together. A key difference between groups and containers is that properties cannot be delegated through a container. Because of this, groups don’t have precedence constraints originating from them (only from the tasks). And you cannot disable the entire group, as you can with a Sequence Container. Groups are good for quick compartmentalization of tasks for aesthetics. Their only usefulness is to quickly group components in either a Control Flow or a Data Flow together.
To create a group, highlight the tasks that you wish to place in the group, right-click, and select Group. To ungroup the tasks, right-click the group and select Ungroup. To add additional tasks, simply drag the task into the group. This can be done in the Control Flow or Data Flow.
The same type of logic you saw in the Sequence Container in Figure 6-1 is also shown in Figure 6-2. In this case, the precedence constraint is originating from the task Run Script 2 to the task Run Script 3. You should always consider using Sequence Containers, instead of groups, because they provide a lot more functionality than groups.
FOR LOOP CONTAINER
The For Loop Container enables you to create looping in your package similar to how you would loop in nearly any programming language. In this looping style, SSIS optionally initializes an expression and continues to evaluate it until the expression evaluates to false.
In the example in Figure 6-3, the Script Task called Wait for File to Arrive is continuously looped through until a condition is evaluated as false. Once the loop is broken, the Script Task is executed.
For another real-world example, use a Message Queue Task inside the loop to continuously loop until a message arrives in the queue. Such a configuration would allow for scaling out your SSIS environment.
The following simple example demonstrates the functionality of the For Loop Container, whereby you’ll use the container to loop over a series of tasks five times. Although this example is rudimentary, you can plug in whatever task you want in place of the Script Task.
1. Create a new SSIS project called Chapter 6, and change the name of the default package to ForLoopContainer.dtsx.
2. Open the ForLoopContainer.dtsx package, create a new variable, and call it Counter. You may have to open the Variables window if it isn’t already open. To do this, right-click in the design pane and select Variables or click on the Variables icon on the top right of your package designer screen. Once the window is open, click the Add Variable button. Accept all the defaults for the variable (int32) and a default value of 0.
3. Drag the For Loop Container to the Control Flow and double-click it to open the editor. Set the InitExpression option to @Counter = 0. This will initialize the loop by setting the Counter variable to 0. Next, in the EvalExpression option, type @Counter < 5 and @Counter = @Counter + 1 for the AssignExpression (shown in Figure 6-4). This means that the loop will iterate as long as the Counter variable is less than 5; each time it loops, 1 will be added to the variable. Click OK.
4. Drag a Script Task into the For Loop Container and double-click the task to edit it. In the General tab, name the task Pop Up the Iteration.
5. In the Script tab, set the ReadOnlyVariables (see Figure 6-5) to Counter and select Microsoft Visual Basic 2012. Finally, click Edit Script to open the Visual Studio designer. By typing Counter for that option, you’re going to pass in the Counter parameter to be used by the Script Task.
6. When you click Edit Script, the Visual Studio 2012 design environment will open. Replace the Main() subroutine with the following code. This code will read the variable and pop up a message box that displays the value of the Counter variable:
7. Public Sub Main()
9. ' Add your code here
12. Dts.TaskResult = ScriptResults.Success
Close the script editor and click OK to save the Script Task Editor.
13.Drag over a Data Flow task and name it Load File. Connect the success precedence constraint to the task from the For Loop Container. This task won’t do anything, but it shows how the container can call another task.
14.Save and exit the Visual Studio design environment, then click OK to exit the Script Task. When you execute the package, you should see results similar to what is shown in Figure 6-6. That is, you should see five pop-up boxes one at a time, starting at iteration 0 and proceeding through iteration 4. Only one pop-up will appear at any given point. The Script Task will turn green and then back to yellow as it transitions between each iteration of the loop. After the loop is complete, the For Loop Container and the Script Task will both be green.
FOREACH LOOP CONTAINER
The Foreach Loop Container is a powerful looping mechanism that enables you to loop through a collection of objects. As you loop through the collection, the container assigns the value from the collection to a variable, which can later be used by tasks or connections inside or outside the container. You can also map the value to a variable. The types of objects that you will loop through vary based on the enumerator you set in the editor in the Collection page. The properties of the editor vary widely according to what you set for this option:
· Foreach File Enumerator: Performs an action for each file in a directory with a given file extension
· Foreach Item Enumerator: Loops through a list of items that are set manually in the container
· Foreach ADO Enumerator: Loops through a list of tables or rows in a table from an ADO recordset
· Foreach ADO.NET Schema Rowset Enumerator: Loops through an ADO.NET schema
· Foreach From Variable Enumerator: Loops through an SSIS variable
· Foreach Nodelist Enumerator: Loops through a node list in an XML document
· Foreach SMO Enumerator: Enumerates a list of SQL Management Objects (SMO)
The most important of the enumerators is the Foreach File enumerator since it’s used more frequently. In this next example, you’ll see how to loop over a number of files and perform an action on each file. The second most important enumerator is the Foreach ADO enumerator, which loops over records in a table.
Foreach File Enumerator Example
The following example uses the most common type of enumerator: the Foreach File enumerator. You’ll loop through a list of files and simulate some type of action that has occurred inside the container. This example has been simplified in an effort to highlight the core functionality, but if you would like a more detailed example, turn to Chapter 8, which has an end-to-end example. For this example to work, you need a folder full of some dummy files and an archive folder to move them into, which SSIS will be enumerating through. The folder can contain any type of file.
1. To start, create a new package called ForEachFileEnumerator.dtsx. Then create a string variable called sFileName with a default value of the word default. This variable will hold the name of the file that SSIS is working on during each iteration of the loop.
2. Create the variable by right-clicking in the Package Designer area of the Control Flow tab and selecting Variables. Then, click the Add New Variable option, changing the data type to a String.
3. Next, drag a Foreach Loop Container onto the Control Flow and double-click on the container to configure it, as shown in Figure 6-7. Set the Enumerator option to Foreach File Enumerator.
4. Then, set the Folder property to the folder that has the dummy files in it and leave the default Files property of ∗.∗. In this tab, you can store the filename and extension (Readme.txt), the fully qualified filename (c:\directoryname\readme.txt), or just the filename without the extension (readme). You can also tell the container to loop through all the files in subfolders as well by checking Transverse Subfolders.
5. Click the Variable Mappings tab on the left, select the variable you created earlier from the Variable dropdown box, and then accept the default of 0 for the index, as shown in Figure 6-8. Click OK to save the settings and return to the Control Flow tab in the Package Designer.
6. Drag a new File System Task into the container’s box. Double-click the new task to configure it in the editor that appears. After setting the operation to Copy file, the screen’s properties should look like what is shown in Figure 6-9. Select <New Connection> for the DestinationConnection property.
7. When the Connection Manager dialog opens, select Existing Folder and type the archive folder of your choosing, such as C:\ProSSIS\Containers\ForEachFille\Archive.
8. Lastly, set the IsSourcePathVariable property to True and set the SourceVariable to User::sFileName.
9. You’re now ready to execute the package. Place any set of files you wish into the folder that you configured the enumerator to loop over, and then execute the package. During execution, you’ll see each file picked up and moved in Windows Explorer, and in the package you’ll see something resembling Figure 6-10. If you had set the OverwriteDestination property to True in the File System Task, the file would be overwritten if there was a conflict of duplicate filenames.
Foreach ADO Enumerator Example
The Foreach ADO Enumerator loops through a collection of records and will execute anything inside the container for each row that is found. For example, if you had a table such as the following that contained metadata about your environment, you could loop over that table and reconfigure the package for each iteration of the loop. This reconfiguration is done with SSIS expressions. We cover these in much more depth in Chapter 5. At a high level, the expression can reconfigure the connection to files, databases, or variables (to name just a few items) at runtime and during each loop of the container.
The first time through the loop, your Connection Managers would point to Client1, and retrieve their files from one directory. The next time, the Connection Managers would point to another client. This enables you to create a single package that will work for all your clients.
In this next example, you will create a simple package that simulates this type of scenario. The package will loop over a table and then change the value for a variable for each row that is found. Inside the container, you will create a Script Task that pops up the current variable’s value.
1. Create a new OLE DB Connection. To start the example, create a new package called ForeachADOEnumerator.dtsx. Create a new Connection Manager called MasterConnection that points to the master database on your development machine. Create two variables: one called sDBName, which is a string with no default, and the other called objResults, which is an object data type.
2. Next, drag over an Execute SQL Task. You’ll use the Execute SQL Task to populate the ADO recordset that is stored in a variable.
3. In the Execute SQL Task, point the Connection property to the MasterConnection Connection Manager. Change the ResultSet property to Full Result Set, which captures the results of the query run in this task to a result set in a variable. Type the following query for the SQLStatement property (as shown in Figure 6-11):
Select database_id, name from sys.databases
4. Still in the Execute SQL Task, go to the Result Set page and type 0 for the Result Name, as shown in Figure 6-12.
This is the zero-based ordinal position for the result that you want to capture into a variable. If your previously typed query created multiple recordsets, then you could capture each one here by providing its ordinal position. Map this recordset to a variable calledobjResults that is scoped to the package and an object data type. The object variable data type can store up to 2GB of data in memory. If you don’t select that option, the package will fail upon execution because the object variable is the only way to store a recordset in memory in SSIS.
5. Back in the Control Flow tab, drag over a ForEach Loop Container and drag the green line from the Execute SQL Task to the Foreach Loop. Now open the Foreach Loop to configure the container. In the Collection page, select Foreach ADO Enumerator from the Enumerator dropdown box. Next, select the objResults variable from the ADO Object Source Variable dropdown, as seen in Figure 6-13. This tells the container that you wish to loop over the results stored in that variable.
6. Go to the Variable Mappings page for the final configuration step of the container. Just like the Foreach File Enumerator, you must tell the container where you wish to map the value retrieved from the ADO result set. Your result set contains two columns: ID and Name, from the sys.databases table.
7. In this example, you are working with the second column, so select the sDBName variable by selecting the variable from the Variable dropdown and type 1 for the Index (shown in Figure 6-14). Entering 1 means you want the second column in the result set, as the index starts at 0 and increments by one for each column to the right. Because of this behavior, be careful if you change the Execute SQL Task’s query.
8. With the container now configured, drag a Script Task into the container’s box. In the Script tab of the Script Task, set ReadOnlyVariables to sDBName and select Microsoft Visual Basic 2012.
9. Finally, click Edit Script to open the Visual Studio designer. By typing sDBName for the ReadOnlyVariables option in the Script tab, you’re going to pass in the Counter parameter to be used by the Script Task.
10.When you click Edit Script, the Visual Studio 2012 design environment will open. Double-click ScriptMain.vb to open the script, and replace the Main() subroutine with the following code. This one line of code uses the MessageBox method to display the sDBNamevariable.
11. Public Sub Main()
13. ' Add your code here
16. Dts.TaskResult = ScriptResults.Success
17. End Sub
18.Close the editor and task and execute the package. The final running of the package should look like Figure 6-15, which pops up the value of the sDBName variable, showing you the current database. As you click OK to each pop-up, the next database name will be displayed. In a less contrived example, this Script Task would obviously be replaced with a Data Flow Task to load the client’s data.
In this chapter, you explored groups and the four containers in SSIS: the Task Host, the Sequence Container, the For Loop Container and the Foreach Loop Container.
· The Task Host Container is used behind the scenes on any task.
· A Sequence Container helps you compartmentalize your various tasks into logical groupings.
· The For Loop Container iterates through a loop until a requirement has been met.
· A Foreach Loop Container loops through a collection of objects such as files or records in a table.
This chapter covered one of the most common examples of a Foreach Loop Container, looping through all the records in a table and looping through files. Each of the looping containers will execute all the items in the container each time it iterates through the loop.