Professional Microsoft SQL Server 2014 Integration Services (Wrox Programmer to Programmer), 1st Edition (2014)

Appendix C. Interviewing for an ETL Developer Position

This appendix should be used as a preparation guide for interview questions you may encounter when interviewing for an ETL Developer position. This is not an all-encompassing list of what you may be asked, but it does provide a good sample of common questions. These questions also only focus on the technical aspects of interviewing and do not offer guidance for personality and work behavior, which are likely questions to occur during an interview. Each question includes a chapter reference to guide you on where you can read more details on the topic.

Oftentimes in technical interviews you will be asked to demonstrate skills or whiteboard concepts. So be prepared to not only answer questions verbally but also visualize processes to the interviewer. You should anticipate the interviewer requesting you to do this, so practice whiteboarding common ETL concepts.

QUESTIONS

1.           Discuss why you would use checkpoints and how they affect the execution of an SSIS package (See Chapter 15).

2.           If you have a package that runs fine in SQL Server Data Tools (SSDT) but fails when running from a SQL Agent Job what would be your first step in troubleshooting the problem (See Chapter 22)?

3.           Discuss why you would use transactions and how they affect the execution of an SSIS package (See Chapter 15).

4.           What techniques would you consider to add notification to your packages? You’re required to send e-mails to essential staff members immediately after a package fails (See Chapter 22).

5.           Explain what breakpoints are and how you would use them (See Chapter 18).

6.           Discuss what method you would use for looping over a folder of files and loading the file contents into a database (See Chapter 6).

7.           What techniques would you consider when adding auditing to your packages? You’re required to log when a package fails and how many rows were extracted and loaded in your sources and destinations (See Chapter 22).

8.           In the SSIS Data Flow what is the difference between synchronous (non-blocking) transformations and asynchronous (blocking) transformations (See Chapter 16)?

9.           How can you configure your SSIS package to run in 32-bit mode on a 64-bit machine when using some data providers which are only available on the 32-bit machines (See Chapter 14)?

10.       How is a Data Flow path different from a precedence constraint (See Chapter 1)?

11.       What are annotations and why should they be used in your package development (Chapter 2)?

12.       What feature in SSIS can be used to make a package reconfigure itself at runtime to add the current date to the end of a file name (Chapter 5)?

13.       Which task allows you to perform common file operations and name one limitation it has (Chapter 3)?

14.       Name one reason why you would have to choose a Merge Join over a Lookup Transformation (Chapter 7)?

15.       Explain what buffers are and how the SSIS Data Flow uses them (Chapter 4)?

16.       How can Data Quality Services be used with SSIS to clean incoming address data with known issues (Chapter 11)?

17.       What is Team Foundation Server and how can it integrate with SSIS (Chapter 17)?

18.       In what scenarios would you potentially utilize a Script Component in your SSIS Data Flow design (Chapter 9)?

19.       You need to run a SQL Server stored procedure inside your SSIS package to start an update process. How can you run a stored procedure in the Control Flow and also pass values into any parameters it may have (Chapter 3)?

20.       What does the FastParse do and which Data Flow components can it be enabled on (Chapter 4)?

ANSWERS

1.  When checkpoints are enabled on a package, if the package fails it will save the point at which the package fails. This way you can correct the problem and then rerun from the point that it failed instead of rerunning the entire package. The obvious benefit to this is if you load a million record file just before the package fails you don’t have to load it again.

2.  The account that runs SQL Agent Jobs likely doesn’t have the needed permissions for one of the connections in your package. Either elevate the account permissions or create a proxy account. To create a proxy account you need to first create new credentials with the appropriate permissions. Next, assign those credentials to a proxy account. When you run the job now, you will select Run As the newly created proxy account.

3.  If transactions are enabled on your package and tasks, then when the package fails it will rollback everything that occurred during the package. First make sure MSDTC (Microsoft Distributed Transaction Coordinator) is enabled in the Control Panel ⇒ Administrative Tools ⇒ Component Services. Transactions must be enabled not only on the package level but also on each task you want included as part of the transaction. To have the entire package in a transaction, set TransactionOption at the package level to Required and each task to Supported.

4.  This could either be set in a SQL Agent Job when the package runs or actually inside the package itself with a Send Mail Task in the Event Handlers to notify when a package fails. You could also recommend third-party tools to accomplish this.

5.  Breakpoints put pauses in your package. It’s a great tool for debugging a package because you can place a breakpoint on a task and it will pause the package based on execution events. A situation where I have used breakpoints is when I have a looping container and I want to see how my variables are changed by the loop. You can place a watch window on the package and type the variable name in. Set a breakpoint on the container and then stop after each iteration of the loop.

6.  This would require a Foreach Loop using the Foreach File Enumerator. Inside the Foreach Loop Editor you need to set a variable to store the directory of the files that will be looped through. Next, select the connection manager used to load the files and add an expression to the connection string property that uses the variable created in the Foreach Loop.

7.  This could be done by creating a database that is designated for package auditing. Track row counts coming from a source and which actually make it to a destination. Row counts and package execution should be all in one location and then optionally report off that database. This reporting database could allow you to see package execution trends and calculate other interesting metrics about package executions. There are also third-party tools that can accomplish this for you.

8.  A synchronous transformation is one in which the buffers are immediately handed off to the next downstream transformation at the completion of the transformation logic. A common example of a synchronous component is a Derived Column Transformation.

A transformation output is asynchronous if the buffers used in the input are different from the buffers used in the output. An asynchronous transform is blocking because it must first consume all the data flow rows first prior to sending any to the next task. A common example of an asynchronous component is an Aggregate Transformation.

9.  In order to run an SSIS package in Visual Studio in 32-bit mode, the SSIS project property Run64BitRuntime needs to be set to False. The default configuration for this property is True. This configuration is an instruction to load the 32-bit runtime environment rather than 64-bit, and your packages will still run without any additional changes. The property can be found under SSIS Project Property Pages ⇒ Configuration Properties ⇒ Debugging. When running a package from a SQL Server Agent job, you must check the property “Use 32 bit runtime” on the Execution Options tab of the job.

10.Precedence constraints are used to control the order of operations of task execution in the Control Flow. These constraints can run a task based on the success, failure, or completion of the previous task. They can also be made dynamic with the evaluation of an SSIS expression. A Data Flow path controls the direction of data movement inside the Data Flow. These allow for the developer to send error rows down special paths, while successful rows may be loaded into a table.

11.An annotation is a comment that you place in your package to help others and yourself understand what is happening in the package. Often annotations are used for identifying the version number of a package and a list of changes that have been made in the package. Using annotations can help clearly define what the package is intended for, who developed it, and what changes have been made.

12.This can be done with SSIS expressions and variables. Concatenating the GETDATE() with a filename will produce this result. This also requires a casting function to bring together a datetime value with the string value of the filename.

13.The File System Task is capable of copying, deleting, moving, and renaming directories or files. One limitation that this task has is that it does not participate in transactions. So if you turn on a transaction on the entire package with a File System Task that deletes a file, you will not recover that file during a rollback.

14.The Lookup Transformation can only join data from OLE DB connections, so if you need to join data from a Flat File, then you must use a Merge Join Transformation to accomplish this.

15.Consider buffers like buckets of memory resources that can be used by SSIS. Data flows out of a source in memory buffers that are 10 megabytes in size or 10,000 rows (whichever comes first) by default. As the first transformation is working on those 10,000 rows, the next buffer of 10,000 rows is being processed at the source.

16.Data Quality Services requires that you define a Knowledge Base that can define what makes a good address value and what makes a bad address value. Once the Knowledge Base is set up you can integrate it into SSIS using the DQS Cleansing Transformation in the Data Flow of your package. Incoming rows will be evaluated by DQS and output corrected records that will be sent to a destination.

17.Team Foundation Server (TFS) is an enterprise software development life cycle suite and project management repository consisting of collaborative services, integrated functionality, and an extensible application programming interface (API). TFS allows developers to easily manage code developed and to work together on solutions through a process of checking in and checking out files while they are under development. TFS tightly integrates with Visual Studio, making it easy for developers to manage their code in the same application they use for developing SSIS packages.

18.The Script Component in the SSIS Data Flow can be used as a custom Source, Transformation, or Destination. You may need to use a Script Source when you are loading unusual file formats. For example, the data does not follow standard column and row formats. You may use a Script Transformation if the native transformations that are provided to you don’t solve the data extraction problem you have. For example, the Script Transformation may be used to apply regular expression formatting to your data beingextracted. You may need to use the Script Destination if your required output of data does not follow traditional column and row formats. For example, your business is requiring a file sent to customers that has summary rows after each category of data that would cause a break in the traditional row and column format of the source data.

19.Using the Execute SQL Task you can run SQL Server stored procedures. If the stored procedure has parameters, those are identified in SSIS with a question mark. You can then pass values into the question mark(s) using the Parameter Mappings page of the Execute SQL Task Editor.

20.The FastParse property can be turned on to skip validation steps that SSIS does when evaluated date, time or numeric data. Turning this property on can in some cases significantly help performance. FastParse can be enabled on either the Flat File Source or the Data Conversion Transformation. It is turned on at a column level, so for each column of the appropriate data types you must change the FastParse property to TRUE.