3 people following this project (follow)

What is Dataflow Discoverer?
Dataflow Discoverer (DFLD) is a command-line discovery utility that detects and writes the metadata of SSIS dataflow columns to a SQL Server lookup table. DFLD detects Dataflows nested within containers up to any level of nesting.

Why use Dataflow Discoverer?
SSIS dataflow column metadata generated by DFLD utility could be used to enhance the error reporting capability of your SSIS package. For example, to get the name of ErrorColumn from the Error Outputs which is not available by default in SSIS yet.

What are the minimum system requirements to use Dataflow Discoverer?
  1. .NET framework 3.5 or above
  2. SQL Server Database (any version)
How to install Dataflow Discoverer?
Verify that your system meets minimum requirements, then
  1. Download sql script from Downloads page. Execute the script on your SQL Server. By default the sql script will create a SQL Server database - ETLMetadata and a lookup table – DFLMetadata. You could also create the table DFLMetadata in any other SQL Server database, in that case just comment out the database creation segment of the sql script.
  2. Download DFLD executable from Downloads page. Copy the executable to a local folder. It is recommended to add the executable path to System Path environment variable.
How to use Dataflow Discoverer?
From the command prompt, type:
dfld /param1:value1 /param2:value2 /param3:value3 ... /paramN:valueN

List of Parameters:
ParameterShorthandMandatoryDescription
/package /p Y Fully qualified path of SSIS Package in File System. If the path contains spaces, enclose in double quotes.
/server /s Y IP Address/Name of Database Server hosting the DFLMetadata table. If Instance Name is applicable, then specify as Database Server/Instance Name.
/database /d Y Database Name hosting the DFLMetadata table. This is usually ETLMetadata, unless you have deployed the table DFLMetadata to a different database.
/user /u N Database User Name for SQL authentication. This parameter is not required for Windows authentication.
/pwd /pwd N Password of Database User required for SQL authentication. This parameter is not required for Windows authentication.


If any parameter contains space, then enclose in double quotes. From command prompt, type dfld /? for more help options.
Examples:
  • Database uses windows authentication
dfld /p:C:\users\eagle\populateDim.dtsx /s:localhost /d:ETLmetadata
  • Database uses SQL server authentication
dfld /p:C:\users\eagle\populateRef.dtsx /s:localhost /d:ETLmetadata /u:userxyz /pwd:XXXXXXX
  • Package Name contains space
dfld /p:"C:\users\eagle\populate fact.dtsx" /s:localhost /d:ETLmetadata

On successful completion, your console will display statistics similar to the one shown in this screen capture

What metadata is captured by Dataflow Discoverer?

Metadata DataTypeDescription
PackageID Varchar(38) Unique GUID of the package
PackageName Varchar(200) Recommended to keep package name under 200 characters
DataflowID Varchar(38) GUID of the Dataflow
DataflowName Varchar(200) Recommended to keep dataflow name under 200 characters
IOType Char(1) Flag to indicate whether the collection is Input or Output. I= Input Collection, O= Output Collection
IOName Varchar(200) Name of Input/output collection
IOID Int ID of Input/output collection
ColumnName Varchar(200) Recommended to keep Column name under 200 characters
ColumnDesc Varchar(500)
LineageID Int It’s an integer value which identifies the column in the buffer
ColumnID Int Unique value that identifies a column within a Dataflow in the package.
ColumnDataType Varchar(30)
ColumnPrecision Int
ColumnScale Int
UpstreamComponentName Varchar(200)
SortKeyPosition Int
ColumnCodePage Int
MappedColumnID Int
UsageType Int
ColumnType Int


How to get name of ErrorColumn from the Error Outputs in my SSIS Package?
From your SSIS Dataflow, lookup DFLMetadata table to fetch the name of ErrorColumn. Check-out this post which demonstrates how this is done SSIS: How to get name of ErrorColumn from Error Output?

At what stage in my ETL development cycle should I use Dataflow Discoverer?
You can generate the dataflow column metadata using DFLD at any stage in your SSIS package development cycle. It is recommended to refresh your dataflow column metadata just before deployment when your package is ready to go.

How to capture/refresh Dataflow column metadata from multiple packages using Dataflow Discoverer?
It’s a typical ops requirement to keep the DFLMetadata up to date. This can be automated by creating a SSIS package that scans your deployment folder for package (* .dtsx) files. For each package found, execute the DFLD using an Execute Process Task in the control flow. Check-out this post which demonstrates how this is done DFLD: How to capture/refresh Dataflow column metadata from multiple packages using Dataflow Discoverer?

Where could I find the documentation for the source code of Dataflow Discoverer?
Download the source code from Downloads page. The source code is adequately documented to be self-explanatory.

Author's Blog

http://bennyaustin.wordpress.com

Recent Posts

 Benny Austin News Feed 
Thursday, May 17, 2012  |  From Benny Austin

Design Tip to Cascade Logging using LoggingMode property of package and control flow elements Continue reading »

Wednesday, February 22, 2012  |  From Benny Austin

The default SSIS log provider for SQL Server captures package trace in sysssislog table. It does capture some useful but limited trace information about the package execution. This trace is of little use without user defined variable values like record count, status flags, control metadata, selection from and to dates etc. By combining your execution log with sysssislog, you could take advantage of SSIS logging capabilities and customize it for your specific logging requirements. Continue reading »

Wednesday, January 25, 2012  |  From Benny Austin

This post discusses the challenges for the ETL to handle Late Arriving Dimensions/Early Arriving Facts. and proposes solutions to tackle these challenges and optimize the ETL for Late Arriving Dimensions. Continue reading »

Monday, November 07, 2011  |  From Benny Austin

So you have several cubes to build and perhaps wondering what is the best way to build and deploy your cubes. Should you build all the cubes within the same Analysis Service database or create one database per cube? While there are benefits of having all cubes in one database there are scenarios where having one cube per database might be just as helpful. The following guidelines might help you to choose the best approach to build and deploy your SSAS cubes. Continue reading »

Saturday, October 29, 2011  |  From Benny Austin

SSAS provides Process Incremental as one of the processing options for cubes, measure groups and partitions. It is important to understand how Process Incremental works because it differs significantly from the seemingly equivalent Process Update for dimensions. Continue reading »

Monday, July 25, 2011  |  From Benny Austin

This post is about troubleshooting and finding a workaround for this exception - [Analysis Services Execute DDL Task] Error: Internal error: An unexpected error occurred (file 'mdinfo.cpp', line nnnn, function 'MDInfo::Init').
Continue reading »

Monday, July 25, 2011  |  From Benny Austin

This post is about troubleshooting and finding a workaround for this exception - [Analysis Services Execute DDL Task] Error: Internal error: An unexpected error occurred (file 'mdinfo.cpp', line nnnn, function 'MDInfo::Init').
Continue reading »

Monday, July 25, 2011  |  From Benny Austin

This post is about troubleshooting and finding a workaround for this exception - [Analysis Services Execute DDL Task] Error: Internal error: An unexpected error occurred (file 'mdinfo.cpp', line nnnn, function 'MDInfo::Init').
Continue reading »

Wednesday, July 20, 2011  |  From Benny Austin

'A fact relationship cannot be defined. The dimension and measure group must be based on the same table in the same data source view.' If you get this message when defining the relationship type as Fact between the measure group and Degenerate/Fact dimension, what could possibly go wrong especially when the second part of the message clearly tells what needs to happen? Continue reading »

Tuesday, July 05, 2011  |  From Benny Austin

Some coding tips when using Analysis Management Objects (AMO) methods Continue reading »

Tuesday, May 24, 2011  |  From Benny Austin

Analysis Management Objects (AMO) opens up the object model of SSAS cube for programming. With a bit of .NET programing, AMO can be effectively used to automate administration tasks. Securing the cube is one such task that can be easily automated using AMO. The rest of this post explains how to automate cube security using AMO.

Wednesday, March 23, 2011  |  From Benny Austin

A Role Playing Dimensions reuses its definition to represent same concept in different contexts. Common example of Role Playing Dimension is date dimension. For instance, date dimension can be used to represent order date, delivery date and shipment date. Currency, staff, organisation hierarchy are other dimensions that are usually role played. While Role Playing Dimensions promote dimension reuse and share dimension data security, the effect on cube browsing must also be considered.

Tuesday, March 01, 2011  |  From Benny Austin

SSAS Dynamic Management Views (DMV's) are very useful to query metadata of a cube. For developers, this is quite handy to provide simple documentation of the cubes they build.

Saturday, February 05, 2011  |  From Benny Austin

Dataflow Discoverer (DFLD) is a command-line discovery utility that detects and writes the metadata of SSIS dataflow columns to a SQL Server lookup table. DFLD detects Dataflows nested within containers up to any level of nesting.DFLD works on one package at a time. It's a typical ops requirement to keep the DFLMetadata of all SSIS packages up to date. In this post a technique is described to automate refresh of dataflow column metadata.

Friday, February 04, 2011  |  From Benny Austin

SSIS automatically creates two columns on the Error Output from dataflow components - ErrorColumn and ErrorCode. ErrorColumn contains the column ID of the erroneous column. Column ID is a number that is unique to a dataflow within the package. Similarly ErrorCode contains the error number that caused the row to be rejected. At the moment, SSIS by default does not provide ErrorColumn name and ErrorCode description. This post describes a technique to get the name of ErrorColumn.

Friday, February 04, 2011  |  From Benny Austin

Dataflow Discoverer (DFLD) is a command-line discovery utility that detects and writes the metadata of SSIS dataflow columns to a SQL Server lookup table. DFLD detects Dataflows nested within containers up to any level of nesting.

Tuesday, November 16, 2010  |  From Benny Austin

This post highlights the differences between Analysis Services and PowerPivot in 30 seconds.

Wednesday, September 22, 2010  |  From Benny Austin

If you recently upgraded to SQL Server 2008 or considering an upgrade, you would be glad to know that the Deployment Wizard of Analysis Services works in the way it should.Roles retain their existing permission to cubes and mining structures after deployment. Existing permissions are not lost. Dimension Data and Cell Data permissions are preserved in the role definition after deployment. And Role membership stays put.

Sunday, September 19, 2010  |  From Benny Austin

Analysis Service supports role based model to secure cube data. Data can be secured at object level, cell level (Cell Data Access) and at member level (Dimension Data Access). A user can be a member of more than one role and it is important to understand the behaviour of Analysis Services role permissions in such a scenario. Analysis Services role permissions are additive in nature. This means if you are a member of a role that has been granted access to an Analysis Service object; you have access to that object even if you belong to another role which denies access to the same object. In other word the least restrictive roles permissions override the most restrictive role permissions. This could throw some nasty surprises that you never expected.

Sunday, August 08, 2010  |  From Benny Austin

The greatest strength of PowerPivot is the ability to create relationship with different sources of data and piggyback on this relationship for various kinds of analysis. PowerPivot relationship requires one-to-many relationship between the parent and child tables. But what happens if the relationship between parent and child tables is many-to-many instead? For e.g. you might want to establish a relationship between a fact table that has a Business Key to the dimension table and that dimension table maintains history. In this post I explain how to establish a relationship between two tables that do not exhibit one-to-many relationship on their Business Key using a technique that is similar to an ETL Type 2 SCD lookup using Data Analysis Expressions (DAX).

 Benny Austin News Feed 

Last edited Feb 4 2011 at 11:56 PM by BennyAustin, version 32