Software Science: Impractical SSIS Configurations

If you are developing SSIS packages for use in an enterprise, it is likely that you have different (and possibly multiple) development and production environments. When you deploy a package to production, therefore it is necessary to change all your connections and other settings from their development values to production. Without SSIS configurations, your workflow would be to load the package into BIDS, change all the connections and variables, save the modified package and then deploy it to the production environment. In many of our ETL packages, there can be half a dozen connections that need to be changed, and we may have 20 or so SSIS packages, so we’re talking on the order of 100 changes that need to be made to release an updated ETL process.

This is the problem configurations in SSIS are intended to solve.

But they don’t.

I don’t think our requirements are unusual or unreasonable: manage sets of connections and variable settings. However, here is some of the things we tried which didn’t really seem practical.

Use XML configuration files, with different versions deployed to the same location on the development and production servers.

Problem 1: Unless *all* your SSIS packages use the same set of connections and variables, you will require a different configuration file for each SSIS package. Failure to have *only* the correct settings in the configuration file (i.e., spurious settings in your configuration file) will result in a myriad of errors and warnings at design time and/or execution time. (I forget precisely which, as I am writing this on an iPad somewhere over the Pacific Ocean between Tahiti and Auckland, but if memory serves me, it was errors at run time.). You might think you could mitigate this problem by creating *every* connection in *every* package regardless of whether it is used or not, but (1) this is an awkward hack, and (2) when you add a new connection to one package, you now have to add it to every package.

Problem 2: Because the configuration file is server-wide, this approach only allows you to have one configuration per server. However, this is impractical when you might have your regular ETL process, but you may also have the need for additional processes to run in parallel. One example of this would be when your regular process does an incremental load, but you also want to run a full load. Since the full load may take a considerable amount of time (possibly several days), you want to run the full load in parallel with the incremental load and into different databases until the full load is complete, at which time you will make the newly fully-loaded database the active production database. This situation is exacerbated by the fact that configuration files are not overridden by command line options — configuration files are applied after command line options — so it is impossible to correct this problem. Using environment variables to identify the appropriate configuration file also does not alleviate the situation, since environment variables, too, are server-wide, so we are still restricted to a single configuration per server.
Use database-stored configuration settings.

This approach suffers the same as problem 1 with XML configuration files: we cannot have a single generic set of settings in our database table, because that would require every package to have every database connection. Luckily, with a database table, we have at our full disposal the tools of SQL Server to assist us.

Our first thought was to have a table (or set of tables) describing the settings and connections for each package, and then define a view that contains all the requisite settings. We are thwarted, however, by BIDS, which won't accept a view in place of the settings table. (I believe it works OK at run time, just BIDS doesn't like it at design time.) While we could have created update and insert triggers on the view to try to fool BIDS into thinking our view is a table, we would rather just have a physical table and create a stored procedure that populates it (probably with a MERGE statement), since this gives us some added flexibility in case a strict view will not work (or would be unwieldy), and also we did not know for sure whether we could fool BIDS into thinking the view was a table.

So our final idea, as yet untried, is to use a set of tables describing the connections and settings in our packages, and create a stored procedure that populates an SSIS configuration table with the appropriate values. To accommodate our requirement to have multiple configurations on the same server, we may have multiple ETL settings databases with the difference sets of setting values.

This ETL database will also likely be used for other command and control requirements of our ETL processes.

Software Science

Friday, July 27, 2012

Impractical SSIS Configurations

1 comment: