SSIS Interview Questions
SSIS is short for 'SQL Server Integration Services'. It is one of the significant components of Microsoft SQL Server and was initially introduced with SQL Server 2005, the next generation of the SQL Server product after SQL Server 2000, where it replaced DTS (Data Transformation Services).
 
* SSIS is very beneficial for accomplishing a wide variety of data transformation, data integration, and data migration jobs.
* SSIS is also known as an ETL tool mainly used to handle tasks related to data extraction, data transformation, and load operations.
* Additionally, SSIS enables developers to use it for tasks other than ETL, such as data profiling, file system manipulation, etc. Apart from this, we can perform powerful batch operations in SSIS with the support of programming languages like C# or VB.NET.
* Most companies use SSIS to store and manage their data in warehouses, and further extract and load it whenever required.
SSIS packages are organized collections of connections, data flow elements, control flow, event handlers, variables, configurations, and parameters, assembled either with the graphical design tools or built programmatically. They can also be used to populate data from different sources for standardization and administrative purposes. They are created in BIDS - Business Intelligence Development Studio.
SSIS control flow allows you to define graphically how the tasks will run, using logical connectors between tasks. There are three basic logical connectors you can use: success, failure, and completion. Furthermore, you can use the FX (expression) connectors to handle more complex conditions in your control flow.
The important components in an SSIS package are :
 
* Data flow
* Control flow
* Package Explorer
* Event handler
A task in SSIS is similar to a method in a programming language: it represents and carries out an individual unit of work.

Tasks are grouped into two categories :
 
* Control Flow Tasks
* Database Maintenance Tasks
SSIS expressions are used for filtering information based on parameters and conditions to get the desired information. They make it easy to work with conditions, loops, and dynamic connections. An expression can consist of symbols, literals, identifiers, operators, and functions, and expressions are mainly used to update properties dynamically at runtime. Once an expression is ready, an expression evaluator parses it to validate the expression's rules.

Expressions are used in the following two areas :
 
Control Flow:
* Variable values are derived from other variables using expressions.
* Precedence constraints use expressions to validate conditions.
* Tasks and containers use expressions to set properties dynamically at runtime.

Data Flow:
* In a Conditional Split transformation, an expression is evaluated against each row to true or false.
* In a Derived Column transformation, expressions applied to existing columns produce new, derived columns.
Data transformation is the critical SSIS process used to extract specific data from the corresponding data source. Once the extraction is completed, the process further helps to manage and transfer the extracted data to a separate file we have selected. The data transformation generally implements several rules and functionalities to the extracted data to derive the data for loading into the separate file or end file (also called end target).
 
| SSIS | DTS |
| --- | --- |
| SSIS stands for SQL Server Integration Services. | DTS stands for Data Transformation Services. |
| It has powerful and complex error-handling support. | It has limited error-handling support. |
| It contains message boxes in .NET scripts. | It contains message boxes in ActiveX scripting. |
| It consists of an interactive deployment wizard. | It does not have any deployment wizard. |
| It offers a wide range of transformations. | It has a limited number of transformations. |
| It is suitable for complete BI (Business Intelligence) integration. | It does not support any BI functionality. |
The different types of connections that work within SSIS are :
 
* ODBC
* OLEDB
* .NET SqlClient
* Flat File
* Excel
* XML
In SSIS, a container is a logical grouping of tasks, and it allows you to manage the scope and properties of a group of tasks as a unit.
 
Types of containers in SSIS are :
 
* Sequence container
* For loop container
* Foreach loop container
* Task host container
SSIS and Informatica both perform well in medium-sized SQL Server environments and are secure enough to use. However, there are some differences :
 
| SSIS | Informatica |
| --- | --- |
| It is easy to use and maintain. | It is easy to use but difficult to maintain compared to SSIS. |
| It can be easily implemented. | It is a bit difficult to implement. |
| It is free to use, as it ships with SQL Server. | It is quite expensive. |
| It is considered a moderate option in terms of productivity. | It has comparatively higher productivity when implementing larger applications. |
| SSIS is the newer tool. | Informatica is older than SSIS. |
The 'bulk insert task' in SSIS is mainly used to upload vast amounts of data from flat files to SQL Server. In this process, only OLE DB connections to the destination database are supported.
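As a sketch, the Transact-SQL that a Bulk Insert task effectively issues looks like the following (the table name, file path, and delimiters here are hypothetical examples):

```sql
-- Hypothetical sketch of the T-SQL behind a Bulk Insert task.
BULK INSERT dbo.SalesStaging
FROM 'C:\data\sales.csv'
WITH (
    FIELDTERMINATOR = ',',   -- column delimiter in the flat file
    ROWTERMINATOR   = '\n',  -- row delimiter
    FIRSTROW        = 2,     -- skip the header row
    TABLOCK                  -- take a table lock for a faster bulk load
);
```

Because the task is a thin wrapper over the bulk-load path, it performs no transformations; the flat file's columns must already line up with the destination table.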
The manifest file is a utility file containing the information needed to deploy packages, via the deployment wizard, to the file system or to a SQL Server database.
SSIS permits the creation of two variable types: global and task-specific. Global variables are available to all tasks of a given process, while task-specific variables are scoped only to their task.
SSIS has more advantages when dealing with complex transformation processes involving slow-changing dimensions. They are:
 
* SSIS is known for managing memory efficiently thereby resulting in much bigger improvements in performance when compared to stored procedures.
* SSIS supports GUI (Graphical User Interface) and helps developers develop complex transformations easily and in a reliable way.
* SSIS helps in easier interaction with external sources which are particularly useful during data cleansing.

If the ETL (Extract, Transform, Load) process consists majorly of Extract and Load with very little Transform, then Stored Procedures work well because they do not deal with cursors.
Conditional Split transformations are similar to the IF-ELSE construct of programming languages: they check a condition and route a row down a path if the condition evaluates to true. Here, we can also specify the order in which the conditions are evaluated and provide a default output for rows that match no condition.
There are some disadvantages while using SSIS. They are as follows :
 
* If we need to see the report of package execution, then we will have to install new software called Management Studio. SSIS does not have the option to publish to reporting services.
* SSIS utilizes high memory which can cause conflicts with memory utilization of SQL. This causes problems if there are requirements to run multiple packages in parallel.
* SSIS also uses high CPU resources, which again causes problems when running multiple packages in parallel. Proper CPU allocation needs to be ensured between SQL and SSIS to avoid either of them running slowly.
While gathering data from different sources and writing it to a destination, connection managers are helpful. A connection manager facilitates the connection to a system and includes information such as the data provider, server name, authentication mechanism, and database name.
A breakpoint enables you to pause the execution of the package in the business intelligence development studio during troubleshooting or development of an SSIS package.
In SSIS, event logging allows you to select any specific event of a task or a package to be logged. It is beneficial when you are troubleshooting your package to understand its behavior and performance.
SSIS packages and all the associated tasks have a property called LoggingMode. This property accepts three possible values.
 
Disabled : To disable logging of the component
Enabled : To enable logging of the component
UseParentSetting : To use the parent's setting for the component
The following are the advantages of using SSIS :
 
* SSIS is capable of handling the data from heterogeneous data sources in the same package
* Works with complex data, such as FTP, HTTP, MSMQ, analysis services, etc.
* Offers transformation functionality and is easier to maintain
* Tight integration with Microsoft Visual Studio and SQL Server
* Helps remove the network as a bottleneck for the insertion of data by SSIS into SQL
* Good at performing complex transformations, structured exception-handling, multi-step operations, and aggregating data from several data sources.
* Allows the loading of data in parallel across multiple destinations
The following are some of the most important features of SSIS :
 
* Improves data quality by providing data cleansing and profiling
* Helps in smooth data integration from several data sources
* Easy integration with other components of Microsoft SQL products
* Enriched studio environment and graphical tools
* Offers workflow functionalities such as file transfer protocol
* Provides the APIs for SSIS object modeling
* Implements high-speed data connectivity efficiently
* Comes with packaged data source connectors
* Organizes data mining query and lookup transformation
The following are some significant reasons to use SSIS :
 
* SSIS helps merge the data from several data stores and load data to various destinations while ensuring great speed
* Very versatile and flexible, reducing the need for dedicated developers
* Tightly integrates with the Microsoft family of products
* Cost-effective tool, offering robust error handling and a smart programming style that will help developers write lengthy code quickly
* Provides business intelligence output with the data transformation process
* Automates the process of data loading and other administrative operations
* Detailed documentation and strong community support
* This tool can analyze, clean, load, transform, aggregate, merge, and extract the data with ease.
* Easy to configure, manage and handle
The following are the major components of the SSIS work environment:
 
SSIS Toolbox : Contains all the necessary operational tools for controlling the data flow

Properties Window :
Displays the properties of the selected component

SSIS Package :
Shows the exact package that is currently in use

Solution Explorer :
Determines the browsing of existing solutions and all the available files

Connection Manager :
Displays existing active connections
There is always a probability of an error while transferring vast amounts of data from various data sources to their destination. Therefore, it becomes necessary to identify the data that cause errors. To find the error, you must log the errors.
 
Each component of SSIS (source, transformation, and destination) should have log support using a secondary pipe that will define the behavior during errors. You can connect the error flow to another transformation as per the requirement. You can include the details such as the error column, error description, and codes in your error log for better understanding.
In SSIS, deployment is the process of moving a package from development mode to executable mode. You can deploy by right-clicking the Integration Services project and then selecting the Build option. This creates the package (.dtsx) files under the bin folder. You can use the deployment utility to deploy the packages to the SQL server.
 
You can follow these steps for creating a deployment utility :
 
* Navigate to Project and right-click on it
* Double click on the properties option
* Set the Create Deployment Utility property to True and choose the path for your deployment output
* Close the window after saving all the changes
* Right-click again on the project and then click the build option. A deployment folder will be created under the BIN folder.
* Find the .manifest file in the deployment folder
* To perform the package deployment on the SQL server, double-click the manifest file and choose the deploy option
Merge transformations : This merges the data from two sources and creates a single output. You can use it to break the data flow path into a separate error path and, after the error is handled, merge it back into the main data flow. To implement a Merge, the input data must be sorted and the metadata must be identical for the two paths.
 
Union All transformations : This works similarly to the Merge transformation, with the difference that the data does not have to be sorted. It merges the data from different sources into a single output.
Late-arriving dimensions are inevitable; to manage them, we insert a placeholder dimension member with the business key and keep the remaining attributes at their default values until the full dimension row arrives.
The best way to perform an incremental load is to use a timestamp column in the source table and store the timestamp of the last successful ETL run.
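A minimal T-SQL sketch of this pattern follows; the table, column, and control-table names are hypothetical:

```sql
-- Hypothetical sketch of a timestamp-based incremental load.
DECLARE @LastETL datetime2;

-- Read the high-water mark saved by the previous successful run.
SELECT @LastETL = LastLoadTime
FROM etl.LoadControl
WHERE SourceName = 'Orders';

-- Extract only the rows modified since that run.
SELECT OrderID, CustomerID, Amount, ModifiedDate
FROM dbo.Orders
WHERE ModifiedDate > @LastETL;

-- After a successful load, record the new high-water mark.
UPDATE etl.LoadControl
SET LastLoadTime = SYSDATETIME()
WHERE SourceName = 'Orders';
```

In an SSIS package, the first query would typically feed a data flow source, with the control-table update running as a final Execute SQL task.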
The following containers are present in SSIS :
 
Task Host Containers : The default container in which every single task runs; it works behind the scenes in SSIS. It is not present in the Visual Studio toolbox and is assigned to a task implicitly. This container is also used to extend event handlers and variables to the task.

Sequence Containers : These handle the flow of a package subset and help divide a package into smaller pieces. They are displayed in the Control Flow tab; you can drag a container from the toolbox onto the design pane and then add a set of tasks to it. The following can be achieved using sequence containers:
* Group tasks so that an unused portion of the package can be disabled.
* Narrow the scope of variables to the container.
* Manage task properties easily by setting container properties.
* Execute a group of tasks multiple times as a single unit.
* Create transactions for data-related tasks rather than for the package as a whole.
* Create event handlers, such as sending mail in case things go wrong.
 
For Loop Containers : Helps to create a loop in the execution of packages. SSIS optionally initializes an expression and continues executing the contained tasks until the expression evaluates to false. For example, a "Wait for file to arrive" task placed inside a For Loop container executes repeatedly until the loop expression evaluates to false, after which a subsequent "Load File" task runs.
 
Foreach Containers : This provides a powerful loop mechanism that iterates over a collection of objects. As the collection is looped, a variable is assigned each value from the collection, which can then be used by connections or tasks inside or outside the container.
Data profiling is the process of examining data to understand its condition: patterns, lengths, null counts, and value distributions. This is performed at the beginning of the development cycle to help design the database schema and assess the quality of the data.
There are mainly three different types of Lookup Cache Modes present in SSIS Lookup Transformation :
 
Full Cache Mode : This type of cache mode helps SSIS query the database before the beginning of the data-flow task execution. This mode is a critical part of the pre-execute phase. Besides, SSIS copies all the data from the reference table (or lookup table) into the SSIS lookup cache during full cache mode.
 
Partial Cache Mode : This cache mode helps SSIS to query the database against new rows from different sources. In this mode, the row is cached into the SSIS lookup cache only in the case when there is a subsequent match. Once the cache gets full, SSIS automatically starts removing existing rows based on the match and usage stats. After that, new matching rows are loaded into the lookup cache.
 
No Cache Mode : As the name suggests, SSIS doesn't cache any rows in this mode, except when two consecutive rows share the same lookup value. In 'No Cache Mode', the database is queried to get the matching data/value from the reference table for each row coming through the source.
In most cases, the error occurs during transformation due to unexpected input of data values. There can be several different scenarios when an error may occur. For example- while applying a transformation to column data, loading data into destinations, extracting data from sources, etc.
 
The most critical errors commonly found in SSIS are :
 
Data Connection Errors : This type of error is commonly seen when the connection manager cannot be initialized with a connection string. This can be seen in both the data-source and the data-destination, along with the control flow that uses the connection strings.

Data Transformation Errors : This type of error is observed while converting the data from the data source to a destination (in the data pipeline).

Expression Evaluation Errors : This type of error can usually be seen when expressions evaluated at run time produce invalid values or perform invalid operations.
The ignore failure option in SSIS is mainly used to ignore the errors during the transformation process. Using this, an error is ignored, and the data row is iterated to move on to the next transformation.
 
This option helps in a scenario where we have incorrect data (JUNK data) coming from the source. We can use the Ignore failure option to redirect such data to another transformation instead of failing the entire package. The option simply allows us to move only valid data to the destination. Additionally, it helps to move incorrect data into a separate file.
In SSIS, all the tasks and packages contain a property referred to as LoggingMode. This property accepts the following three values :
 
Enabled : This option enables logging of the component.
 
Disabled : This option disables logging of the component.
 
UseParentSetting : This option makes the component use its parent's logging setting.
If your package is deployed on SQL Server, then you can back up the MSDB database, as all packages deployed to SQL Server are stored in MSDB.
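A simple sketch of such a backup (the disk path is a hypothetical example):

```sql
-- Backing up MSDB also backs up packages deployed to SQL Server,
-- since they are stored in MSDB tables.
BACKUP DATABASE msdb
TO DISK = 'C:\Backups\msdb.bak'   -- hypothetical backup path
WITH INIT;
```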
The OLE DB Command Transform is a component designed to execute a SQL Statement for each row in an input stream. This task is analogous to an ADO Command Object being created, prepared, and executed for each row of a result set. The input stream provides that data for parameters that can be set into the SQL Statement that is either an Inline statement or a stored procedure call.
SSIS provides an Execute T-SQL Statement task that is similar to the Execute SQL task. The differences between the two are :
 
Execute T-SQL Statement Task :
Pros : Takes less memory; faster performance.
Cons : Output into a variable is not supported; only supports ADO.NET connections.
 
Execute SQL Task :
Pros : Supports output into variables, multiple connection types, and parameterized queries.
Cons : Takes more memory; slower performance compared to the T-SQL task.
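For illustration, the Execute SQL task supports parameter markers (`?` with an OLE DB connection) mapped positionally to package variables; the statement and names below are hypothetical:

```sql
-- Hypothetical parameterized statement for an Execute SQL task.
-- With an OLE DB connection, each '?' maps positionally to a
-- package variable configured on the Parameter Mapping page.
UPDATE dbo.LoadControl
SET LastLoadTime = ?
WHERE SourceName = ?;
```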
A config file in SSIS supplies values for connection managers and other package properties so they can be assigned dynamically at runtime.
 
Users only need to change the config file, and the package picks up the new values automatically at runtime; you don't have to modify the package every time you deploy it to multiple servers or locations. There are multiple ways in which configuration values can be stored.
 
* XML configuration file : Store the config as an XML file.
* Environment variable : Store the config in an environment variable.
* Registry entry : Store the config in the registry.
* Parent package variable : Store the config as a variable in the package that contains the tasks.
* SQL Server : Store the config in a table in SQL Server.
* The control flow is for designing the flow of the package; the data flow is for the ETL process.
 
* Data flow is a subset of control flow.
 
* There is only one control flow, while multiple data flows can exist.
 
* A data flow cannot work without a control flow.
 
All process-based tasks are part of the control flow, while ETL-related tasks are part of the data flow, which is again a subset of the control flow.
VARCHAR(MAX) column values will be stored IN the table row, space permitting. So if you have a single VARCHAR(MAX) field and it’s 200, 300 bytes, chances are it’ll be stored inline with the rest of your data. No problem or additional overhead here.
 
Only when the entire data of a single row cannot fit on a single SQL Server page (8K) anymore, only then will SQL Server move VARCHAR(MAX) data into overflow pages.

So, all in all, you get the best of both worlds: inline storage when possible, overflow storage when necessary.
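A quick sketch of the behavior described above (the table name is hypothetical):

```sql
-- Hypothetical table with a VARCHAR(MAX) column.
CREATE TABLE dbo.Notes (
    NoteID int IDENTITY PRIMARY KEY,
    Body   varchar(max)
);

-- A few hundred bytes fits on the 8 KB page: stored in-row.
INSERT INTO dbo.Notes (Body) VALUES (REPLICATE('x', 300));

-- Data that pushes the row past the page size is moved to
-- overflow/LOB pages automatically by SQL Server.
INSERT INTO dbo.Notes (Body)
VALUES (REPLICATE(CAST('x' AS varchar(max)), 20000));
```

Note the CAST in the second insert: without it, REPLICATE would truncate its result at 8,000 bytes.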
Yes, you can schedule packages to run at any time as required. You do this by creating SQL Server Agent jobs with job steps that run the SSIS packages. The job invokes the 'dtexec' command-line utility to execute the package, and you can run the job on demand or on a schedule as needed.
You can use variables in Script tasks to exchange data with other objects in the package. The Script task reads and writes package variables through the Variables property of the Dts object. To make a variable available to the custom script, add it to the task's read-only or read-write variables list. These variable names are case-sensitive, so make sure you add the exact variable name to the list.
Yes, you can stop a forever-running package, but it depends on where the package is running. If the package is running within the SQL Agent, you can kill the entire process using T-SQL. But if the package is running within the SSIS catalog, you need to use the stop-operation method. Apart from this, you can use the Active operations window to stop the SSIS catalog’s running process.
The following are the different data viewers available in SSIS:
 
* Grid
* Histogram
* Scatter Plot
* Column Chart
The following are possible locations :
 
* SQL Server.
* Package Store.
* File System.
There are three ways of debugging in SSIS :
 
* Execute Package partially
* By using breakpoints
* By using Data Viewers
The SSIS connection manager has issues storing passwords. Hence, we can follow one of the below-proposed solutions:
 
Method 1 : Create and use a SQL Server Agent proxy account with a credential, so that SQL Server Agent can run the job as an account that has the required permissions, or as the account that created the package.
* This method works for decrypting secrets when the user's key requirements are satisfied.
* This method can fail because secrets are decrypted using the current user and current computer information, so it will fail if the project or package is moved to a different computer.

Method 2 : Set the value of Package Protection Level Property to Server Storage. This stores packages on the SQL Server and lets users access this through SQL Server database roles.

Method 3 : Set the value of Package Protection Level property as “Encrypt Sensitive with Password”. The method uses a password for encrypting sensitive information. This can be added to the SQL Server Agent job to run the job effectively.

Method 4 : Package configuration can be used to save sensitive information which can later be stored in a secured folder. When the package is run, the information is obtained from this config file. If we do not want to save the secrets in the package, we can set the property of Protection Level to “DontSaveSensitive”.

Method 5 : Package templates having a good protection level can be created for long-term solutions.