SSIS Basics

An Introduction to Key SSIS Functionality

dataops-icon-dark-bg

Published: May 8, 2020

SQL Server Integration Services (SSIS) is a popular and mature tool for performing data movement and cleanup operations. In the Microsoft ecosystem, SSIS is one of the most common extract, load, and transform (ETL) tools in use today. SSIS is powerful and configurable, yet surprisingly easy to use.

 

This guide walks you through the essential moving parts of SSIS.  Check out the next guide in this series that focuses on how to build your first SSIS package.

SSIS Terminology

Before we explore the machinery of SSIS, it’s important to understand some of the terms you’ll often hear used by SSIS professionals. Here are some of the key terms and phrases you’ll need to know to be successful with SSIS.

  • Package: The SSIS package is the central component of the SSIS code, and it’s the canvas on which you will spend most of your development time. An SSIS package is a collection of one or more operations that are invoked together.
  • Task: A task is a single operation within a package. There are dozens of different types of tasks available in SSIS.
  • Component: A component is part of a data pipeline, representing either a source from which data is retrieved, a destination to which data is written, or a transformation that manipulates or reshapes the data.
  • Execution: This is the act of invoking, or running, the logic in an SSIS package. Packages can be executed from within the SQL Server Data Tools (SSDT) development environment or directly on a properly configured instance of SQL Server.
  • Deployment: A deployment occurs when the fully developed SSIS project is pushed from the development workstation to an instance of SQL Server, where it can then be executed either manually or through a scheduling tool such as SQL Server Agent. Deployment is usually more complex than copying code from one machine to another, although SQL Server does a good job of hiding that complexity for most deployments.
  • Project: Source code in SSIS is arranged into functional units called projects. A project can contain one package or many packages. In most cases, when deploying SSIS code, the entire project is deployed to the server.
  • Solution: A solution is a logical grouping of related projects.
  • The SSIS runtime engine: This is the logic that allows a package to run. When you’re working with SSIS packages in SSDT, the packages will be executed using the SSIS runtime on your development machine. After the code is deployed to SQL Server, any execution run on that server will use the server’s SSIS runtime.

The SSIS Toolset

SSIS is licensed as part of SQL Server, and the internals of running an SSIS package can be installed as part of the normal SQL Server install. As an SSIS developer, however, you will need a couple of additional tools installed on your workstation:

  • SSDT: SSDT is a lighter version of Microsoft Visual Studio, configured with the extensions to create SSIS projects. When building or testing SSIS packages, SSDT is the tool in which you will spend most of your time.
  • SQL Server Management Studio (SSMS): Although not strictly required for building SSIS packages, at some point during the development cycle you’ll need SSMS to test and deploy the ETL processes you create in SSIS.

When setting up your SSIS development environment, it’s important to remember that you must install the SSDT version of Visual Studio to be able to work with SSIS projects. If you already have Visual Studio installed, you’ll need to download and install the SSDT components.

 

 

Do you want to accelerate ETL processes and eliminate many tedious SSIS programming tasks?

 

SentryOne Test icon

Task Factory

Dozens of high-performing SSIS components that help you save time managing ETL tasks.

Learn More

 

The SSIS Development Environment

If you’ve ever worked with Visual Studio, the SSIS development environment will look familiar. SSDT uses the same multiple-document interface (MDI) that enables you to open or close each SSIS package independently, open several packages at once, and conveniently dock them as needed.

Inside this development environment, you’ll be frequently using the following assets:

  • Solution Explorer: This window allows easy browsing of the current solution and all the files contained within it. You can also add, delete, or rename files in the current solution.
  • Properties Window: This context-aware window shows the properties for the currently selected item.
  • SSIS Toolbox: Like the name implies, this is your toolbox of available operations in SSIS. This list is also context-aware and shows the tools for either the control flow or data flow (more on these later in the article) based on where you’re currently working.
  • SSIS Package: This is the package you’re currently working on. Although a single package is shown in Figure 1, the MDI allows you to have more than one package open at the same time.
  • Connection Manager Tray: This is the area where any currently available connections are shown. We’ll go into more details on the connection managers shortly.

Figure 1 shows the SSDT designer with each of these assets called out.

 

 

 

Figure 1 - SSIS Development EnvironmentFigure 1: The SSIS Development Environment

The Moving Parts of SSIS

Within each SSIS package, there are several fundamental moving parts.

Connection Managers

A connection manager defines the data structures that can be read from or written to during the execution of the package. A connection manager can refer to a relational database, a flat file, a database file (such as Microsoft Access), a web service, or a cloud structure (such as blob storage).

Once defined inside an SSIS package, a connection manager can be used by any of the tasks or components within that package. Think of the connection manager as a data gateway; it can be used to retrieve data (source), to write data (destination), and to manipulate or validate the data as it passes through the package logic (transformation). In fact, the same connection manager could be used as source, destination, and transformation within the same package.

Connection managers are unique in that they can be defined at the package level or at the project level. Package and project connections have the exact same functionality; the only difference is the scope of visibility. Package connections are usable within the package in which they are created, while project connections are accessible to any package in that project. The latter is very handy when creating a project that uses the same connection in multiple packages.

As shown in Figure 3, both the package and project connections show up in the Connection Managers tray, and any project connections are also shown in the Solution Explorer under the Connection Managers folder. In the Connection Managers tray, any project-scoped connection will display a (project) prefix in the name.

Package-level

Figure 3: Package-Level and Project-Level Connection Managers

Control Flow and Data Flow

The Control Flow and Data Flow tabs represent the workspace in which most of the package logic resides. We’ll go into greater detail on each of these tabs in the next section.

Variables

Like any good programming environment (and yes, SSIS is a programming environment), SSIS allows user-defined variables that can be statically set or dynamically manipulated during package runtime.

Variables are defined at the package level and, by default, are accessible by any task or component within the package. Each variable will have the following attributes:

  • A name: For obvious reasons.
  • A scope: By default, each variable is scoped at the package level. You can optionally limit the scope to a task or container. However, it’s a very rare situation that would require the variable scope to be changed.
  • A data type: You’ll find that everything in SSIS is strongly typed, and variables are no exception. Each variable is created with a specific data type, and any values written to that variable must honor the constraints of that variable. For example, the package will fail if you try to load a value of February 31st to a DateTime variable or if you mistakenly try to assign the value “apple” to an Int32 variable. Each of the SSIS variable data types has an analog in SQL Server (e.g., Int32 corresponds to the INT in SQL Server and the string data type aligns with NVARCHAR).
  • A value: For some data types (such as the string data type), the value can be left blank. For others (such as DateTime or any of the integer types), there must be a default value provided.
  • An expression (optional). Variables can be set to a static value or can be configured to use an expression for more dynamic behavior.

The Variables window, shown in Figure 4, is used to create or edit variable values.

SSIS Variables Fig 4Figure 4: The SSIS Variables Window

Variables in SSIS provide a great deal of flexibility when designing SSIS packages. Variables can be used to store operational values (such as row counts), configuration values (such as a particular file name when looping through multiple files), and can even be used to process serialized object data.

 

Parameters

Parameters were introduced in SSIS with the release of SQL Server 2012 to easily pass in runtime values to a package. Using SSIS parameters makes a project much more flexible by eliminating the reliance on hard-coded values.

Parameters behave a lot like SSIS variables. They both require a data type definition, can both be set with a default value, and can reduce administrative work by eliminating hard-coded static values within a package. There are some important differences, including:

  • Parameters can be scoped at either the package level or the project level, but the scope of a variable only goes as high as the package.
  • Parameters can be marked as Sensitive, in which case the value of that parameter will be encrypted when the package or project is saved. Variable values are always stored in plain text.
  • Parameters can be set to Required, which will prevent the package from executing if a value is not supplied at runtime. There is no such behavior with variables.
  • Parameters cannot use expressions to define their values, but variables can.
  • The value of a parameter, once set, remains static for the duration of that execution and cannot be changed after the execution starts. The value of a variable can be programmatically changed at any time (and as many times as necessary) during package execution.

Parameters are incredibly powerful in that they allow you to completely decouple values that might change over time, including database connection strings, file paths, and login credentials, from the source code. By using parameters to pass in these values at runtime, you can eliminate a lot of unnecessary code changes. Parameterizing values such as these also makes testing easier, by allowing a separate set of test values to be passed in at runtime with no code modification.

Parameters TabFigure 5: The Parameters Tab

To create or edit package parameters, open the package and click the Parameters tab. Project parameters can be found by opening the Project.params file in the SSIS project.

 

Expressions

SSIS has its own expression language for creating dynamic behaviors. Almost every task and component in SSIS can use expressions for replacing otherwise static values with a short bit of code that is interpreted at runtime.

Some of the practical uses for expressions in SSIS include:

  • Replacing a static query with one that uses a customized WHERE clause
  • Replacing an output file name with one that includes the current date and time in that file name
  • Creating a variable that uses an expression to concatenate several other variable values together (e.g., a directory path concatenated to a relative file name)
  • Substituting a hard-coded database connection string with an expression that uses a parameter, allowing the connection string to be supplied at runtime

Expressions can be used throughout SSIS. Many properties of tasks, components, and containers can be modified to substitute expression values for static text. Expressions can be used in the value expression for a variable, and an expression can reference other variables, parameters, or environment information (such as the date and time the current package began executing).

To view the expressions available for any given task or component in SSIS, you can select the item (indicated by the arrow) and find the Expressions item in the Properties window (shown inside the rectangle), as shown in Figure 6.

 

Expressions TabFigure 6: Expressions

 

After selecting the item and the property in which the expression will apply, you can use the Expression Builder window, shown in Figure 7, to design the expression. The list of variables and parameters (shown in the upper-left pane) and functions (shown in the upper-right pane) can provide a lot of help when building your expression. For the daring, you can code your expression entirely by hand in the Expression box.

Expressions are incredibly powerful in SSIS, but they do take a bit of getting used to. Don’t feel alone if you initially struggle with the syntax of the SSIS expression language; every SSIS developer who has ever worked with it has had moments of frustration when adapting to this new lingo. Use the Expression Builder to get acquainted with the nuance of creating SSIS expressions.

Expressions BuilderFigure 7: Expression Builder

The SSIS Control Flow

The control flow in SSIS is the starting point at which package development begins. The control flow is the design surface onto which tasks are added. The SSIS control flow is responsible for directing which tasks will execute and in what order.

The control flow is the default view when creating or opening a package. The Control Flow tab, at the top left of an opened package, reveals the control flow surface. On a new package, this will be blank; Figure 8 shows a package with two tasks added to the control flow.

When working with the SSIS control flow, there are three different types of objects you’ll use: tasks, containers, and precedence constraints.

SSIS Control Flow ScreenshotFigure 8: The SSIS Control Flow

Figure 12 shows an example of using multiple precedence constraint for both success and failure operations.

The setup shown in Figure 12 results in the following workflow:

  • DFT - Data Flow Task 1 and DFT - Data Flow Task 2 will start at the same time
  • If both DFT - Data Flow Task 1 and DFT - Data Flow Task 2 complete successfully, the Sequence Container will then be started
  • When the Sequence Container starts, DFT - Data Flow Task 3 and DFT - Data Flow Task 4 will start at the same time, since there is no precedence constraint between them
  • If DFT - Data Flow Task 1 fails, DFT - Data Flow Task 5 will start

Using containers and precedence constraints, the workflow of the package can be as simple or as complex as needed.

Precedence Constraint 2Figure 12: Precedence Constraint Example

Event Handlers

Event handlers are a special type of SSIS control flow. An event handler is designed to execute when a specific event occurs. Figure 13 shows an event handler set to execute when the executable (in this case, the package itself) encounters an error (defined as OnError, which is the default setting).

Note that, in this example, the event handler logic has not yet been set up; to create an event handler for this executable and this event, you can click the hyperlinked message in the middle of the design surface. The SSIS Toolbox for this tab shows all the tasks commonly available in the control flow.

Event handlers are useful tools but be careful that you don’t overuse them. Because the event handler logic can be set up individually for every task and container (as well as for the package itself), the overuse of event handlers creates a labyrinth of logic that is very difficult to maintain and debug.

Event HandlerFigure 13: Event Handler

 

The SSIS Data Flow

As mentioned earlier, the Data Flow task is one of many tasks in the SSIS Toolbox. However, this task differs significantly from the other tasks in that it has its own child elements, referred to as components, that can be connected to create end-to-end data flow operations.

The data flow can be accessed through the Data Flow tab in the package editor. If a package contains multiple data flows, you can use the Data Flow Task drop-down menu, shown in Figure 14, to select the data flow to edit.

 

Data Flow Layout ScreenshotFigure 14: Data Flow Layout

To learn how to create your first SSIS package, check out this guide or download the free PDF!

 

image (43)

Download the PDF

About The Author

yjPWypEd_400x400

Tim Mitchell is a Business Intelligence architect, author, and trainer. He has been building data solutions for over 15 years, specializing in data warehousing, ETL/SSIS, and reporting. He holds a Bachelor’s Degree in Computer Science from Texas A&M at Commerce, and is recognized as a Microsoft Data Platform MVP. Tim is the founder and principal data architect at Tyleris Data Solutions, a consulting firm focused on delivering on-premises and cloud-based solutions on the Microsoft Data Platform.

As an active member of the community, Tim has spoken at international, regional, and local venues, including the PASS Summit, SQLBits, SQL Connections, SQLSaturday events, and various user groups and webcasts. Tim is coauthor of the book “SSIS Design Patterns,” and is a contributing author on the charity book project “MVP Deep Dives 2.” He is a member of the Boulder BI Brain Trust, a Melissa Data MVP, and is active in the North Texas SQL Server Users Group in the Dallas area.

You can visit his website and blog at TimMitchell.net or follow him on Twitter.