SSIS
Brief description of SSIS?
- It is a platform for building high performance data integration solutions, including E,T,L(Extract, Transform, Load).
- Solution For automating data movements.
- Build a package made up of a task that can move data around from place to place and alter it on the way.
- Hosted within business Intelligence development studio to help to build those package as well as an API for programming SSIS objects from other application.
Package is a collection of SSIS objects including:-
- Connection - To data Source
- Data Flow- Which include the source and destination that extract and load data.
- Control Flow- Where you tell the SSIS package what the package will do exactly.It includes Tasks and Containers that execute when package runs.We can organise task in sequence and in loops.
- Event Handler- Which are workflow that runs in response to events raised by a package task or container.
It provide the functionality to our package. It is a individual unit of work.
Precedence Constraints:
Provide ordinal relationship between various items in your package.
Batch:
A batch is defined as a group of sessions.
2 types:
- Parallel batch processing
- Sequential batch processing
For deploying the package we can use "Package Deployment Utility"
Difference Between Data Flow Vs Control Flow
SSIS uses Connection Manager to integrate different datasource into packages. it include a wide variety of different Connection Manager
e.g ADO, ADO.Net,Cache, MSOLAP, Excel, File, Flat File etc.
How to create Connection Manager?
Steps:- Right Click anywhere in the conn mngr area of a package in BIDS & choose the appropriate shortcut from menu. Each conn mngr has its own custom configuration dialog box with specific options.
Name Few Command Line Utilities/tools in SSIS ?
- DTEXECUI - A user interface is loaded in order to configure each of the application parameter to execute an SSIS package.
- DTEXEC- Where all of the needed switches must be passed into the cmd for successful executing of SSIS package.
3 types
- For Loop- Repeat a task a fixed number of times.
- For Each Loop- Repeat a task by enumerating over a group of objects.
- Sequence- Group multiple task into a single unit for easier management.
what is the extension of SSIS package?
dtsx(Data transformation services with XML format)
Parallelism is achieved using MaxConcurrentExecutable property of the package.Its default is -1 and calculated as number of processor +2.
How would you restart the package from the previous failure point. What are the Checkpoints and how we can implement in SSIS? (Do it practically ).
when a package is configured to use checkpoints, information about package execution is written to a checkpoint file.
When a failed package is rerun , the checkpoint file is used to restart the package from the point of failure.
If the package runs successfully, then the checkpoint file is deleted and then recreated the next time that the package is run.
Difference between Union All and Merge Join ?
- Merge transformation can accept only two inputs where as union all can take more than 2 inputs
- Data has to be sorted before Merge Join Transaformation, where as Union all doesn't have any condition like that.
What is Logfile and how to send log file to conn manager ?
It is specially useful when the package has been deployed to the production enviorement, and you cant use BIDS to debug the package.
SSIS enables you to implement logging code through the DTS log method is called in the script, the SSIS engine will route the msg to the log provider that are configured in the containing package.
How do you logging in SSIS ?
It includes logging feature that write log entries when runtime events occurs and can also custom msg.
log the details of various events like on error, on warning, etc to the various option say flat file, Sql Server table, XML, or SQL Profiler. Logs are associated with a package and are configured at the package level.
To enable logging in a Package :- STEPS
- In BIDS open Integration service project that contains the package.
- On SSIS menu click Logging.
- Select a log provider in the provider type list and then click add.
In MSDB database
Name the task to execute batch file in SSIS ?
Execute Process task
How would you do Error Handling in SSIS ?
A SSIS package could mainly have 2 types of error
- Precedence Error : can be handle in Control Flow through the Precedence Control and redirecting the execution flow.
- Data Error: it can be handled in Data Flow Task but redirecting the data flow using error output of a component.
For e.g
data conversion fails because a column contain a string instead of a number, i.e insertion fail into database
Deployment Steps of SSIS Package
We have to configure like
Two files are created after build which is in Deployment folder
1) .dtsx file
2) .manifest file
Take the Deployment folder to the production environment.
So, once we have the .Manifest file
Double click on it, so it will pop up a package installation wizard
There is Two Type of installation
1) File system installation
2) Sql Server installation
In short you may say:
1) Create a deployment utility by setting its property TRUE
2) It will be created in the bin folder of the solution as soon as the package is build
3) Copy all the files in the utility and use .manifest file to deploy it on production environment.
How to pass property value at runtime? How do you implement package configuration.
e.g By using configuration, we can change connection string of a connection manager, or update the value of a variable.
Package Configuration Benefits:
1) Configurations make it easier to move package from development to production environment.
e.g. – Configuration can be update the path of a source file and change the name of the database etc.
2) Configuration is useful when you deploy package to many different servers.
3) Configuration makes package more flexible.
e.g. – configuration can update the values of a variable i.e. used in a property expression.
TRANSFORMATIONS IN SSIS??
It is an object that generates, modifies, or passes data.
1.AGGEGATE :-It applies an aggregate function to grouped records and produces new output records from aggregated results.
2.AUDIT :-the t/r adds the value of a system variable, such as machine name or execution instance GUID to a new output column.
3.CHARACTER MAP :-this t/r makes string data changes such as changing data from lower case to upper case.
4.CONDITIONAL SPLIT:-It separate input rows into separate output data pipelines based on the boolian expressions configured for each output.
5.COPY COLUMN:-add a copy of column to the t/r output we can later transform the copy keeping the original for audIting personal
6.DATA CONVERSION:-converts a columns data type to another data type.
7.DATA MINING QUERY:-perform a data mining query against analysis services.
8.DERIVED COLUMN:-create a new derive column calculated from expression.
9.EXPORT COLUMN:-It allows you to export a column from the data flow to a file.
10.FUZZY GROUPING:-perform data cleansing by finding rows that are likely duplicates.
11.FUZZY LOOKUP:-matches and standardizes data based on fuzzy logic.
eg:-transform the name jon to john
12.IMPORT COLUMN:-reads the dat from a file & adds It into a dataflow.
13.LOOKUP:-perform the lookup of data tobe used later in a transform.
ex:-t/f to lookup a cIty based on zipcode.
1.getting a related value from a table using a key column value
2.update slowly changing dimension table
3.to check whether records already exist in the table.
Deployment Steps of SSIS Package
- Go to the properties of your package by right click—Properties
- Property page is open
- Under config properties – Deployment utility
We have to configure like
- Allow Configuration changes TRUE
- Create Deployment Utility TRUE by default it is False.
- Deployment Output path – Bin\Deployment
- Build your package.
Two files are created after build which is in Deployment folder
1) .dtsx file
2) .manifest file
Take the Deployment folder to the production environment.
So, once we have the .Manifest file
Double click on it, so it will pop up a package installation wizard
There is Two Type of installation
1) File system installation
2) Sql Server installation
In short you may say:
1) Create a deployment utility by setting its property TRUE
2) It will be created in the bin folder of the solution as soon as the package is build
3) Copy all the files in the utility and use .manifest file to deploy it on production environment.
How to pass property value at runtime? How do you implement package configuration.
- SSIS provides package configuration that we can use to update the values of properties at runtime.(property values like connection string of a conn mngr can be passed to the package using configuration.)
- A configuration is a property/value pair that you add to compiled package. Typically you create a package set properties on the package object during package development and then add the configuration to the package.
e.g By using configuration, we can change connection string of a connection manager, or update the value of a variable.
Package Configuration Benefits:
1) Configurations make it easier to move package from development to production environment.
e.g. – Configuration can be update the path of a source file and change the name of the database etc.
2) Configuration is useful when you deploy package to many different servers.
3) Configuration makes package more flexible.
e.g. – configuration can update the values of a variable i.e. used in a property expression.
TRANSFORMATIONS IN SSIS??
It is an object that generates, modifies, or passes data.
1.AGGEGATE :-It applies an aggregate function to grouped records and produces new output records from aggregated results.
2.AUDIT :-the t/r adds the value of a system variable, such as machine name or execution instance GUID to a new output column.
3.CHARACTER MAP :-this t/r makes string data changes such as changing data from lower case to upper case.
4.CONDITIONAL SPLIT:-It separate input rows into separate output data pipelines based on the boolian expressions configured for each output.
5.COPY COLUMN:-add a copy of column to the t/r output we can later transform the copy keeping the original for audIting personal
6.DATA CONVERSION:-converts a columns data type to another data type.
7.DATA MINING QUERY:-perform a data mining query against analysis services.
8.DERIVED COLUMN:-create a new derive column calculated from expression.
9.EXPORT COLUMN:-It allows you to export a column from the data flow to a file.
10.FUZZY GROUPING:-perform data cleansing by finding rows that are likely duplicates.
11.FUZZY LOOKUP:-matches and standardizes data based on fuzzy logic.
eg:-transform the name jon to john
12.IMPORT COLUMN:-reads the dat from a file & adds It into a dataflow.
13.LOOKUP:-perform the lookup of data tobe used later in a transform.
ex:-t/f to lookup a cIty based on zipcode.
1.getting a related value from a table using a key column value
2.update slowly changing dimension table
3.to check whether records already exist in the table.
14.MERGE:-merges two sorted data sets into a single data set into a single data flow.
15.MERGE JOIN:-merges two data sets into a single dataset using a join junction.
16.MULTI CAST:-sends a copy of two datato an addItional path in the workflow.
17.ROW COUNT:-stores the rows count from the data flow into a variable.
18.ROW SAMPLING:-captures the sample of data from the dataflow by using a row count of the total rows in dataflow.
19.ROW SAMPLING:-captures the sample of the data from the data flow by using a row count
of the total rows in data flow.
20.UNION ALL:-merge multiple data sets into a single dataset.
21.PIVOT:-converts rows into columns
22.UNPIVOT:-converts columns into rows
What is the use of Slowly changing dimension Task in SSIS 2008 ? How do we use it ?
Although there's lots of in depth information on the web about what a Slowly Changing Dimension is, here's a very common and simple use of it:
Let's say that you are trying to keep the contents of one table up to date with another table's contents. For example, if you have one database (database A) that contains an item master (eg. an ERP system), and you want to have another database (database B) have the same item master data. A slowly changing dimension will allow you to propagate changes from the item master Database A into Database B automatically. For example, if a an item's field changes in db A, then that change will get propagated to db B; if a new item is inserted into db A, then that same item will get inserted into db B. (Note that, in this scenario, deletions don't get propagated, since there's no record to trigger the deletion.)
What is environment variable in SSIS?
a) An environment variable configuration sets a package property equal to the value in an environment variable.
Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.
Tell me about your experience with SSIS
Debugging Packages: Setting break points, using dataviews, running SQL profiler
Creating custom configurations: XML and DB
Deployment Strategies
Custom Logging
Running control flow and data flow in Transactions
Can you explain the SQL Server Integration Services functionality in Management Studio?
You have the ability to do the following:
Login to the SQL Server Integration Services instance
View the SSIS log
View the packages that are currently running on that instance
Browse the packages stored in MSDB or the file system
Import or export packages
Delete packages
Run packages
What is a breakpoint in SSIS? How is it setup? How do you disable it?
A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package.
10 unique conditions exist for each breakpoint.
Breakpoints are setup in BIDS. In BIDS, navigate to the control flow interface. Right click on the object where you want to set the breakpoint and select the 'Edit Breakpoints...' option.
Can you explain different options for dynamic configurations in SSIS?
Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables
How would you schedule a SSIS packages?
Using SQL Server Agent. Read about Scheduling a job on Sql server Agent
Comments
Post a Comment