UPDATE (18APR2013): If you find this post interesting, then you will want to watch the Enterprise Data Integration for CRM, featuring Daniel Cai webcast.
“I am building a solution using CRM 2011, but I need to migrate a significant amount of data from an existing system. The Import Data wizard in the CRM 211 UI doesn’t meet my needs due to the amount of data and the logic necessary to transform the data from the source system into the destination system.”
This scenario is often referred to as Extract Transform and Load (ETL). The first thing most people look at is using SQL Server Integrations Services (SSIS), because it is the ETL tool that comes with SQL Server. The only supported way to perform Create Update and Delete operations against CRM 2011 is to go through the CRM Web Services. While it is possible to call a web service from SSIS out of the box, the reality is that the level of effort is arguably higher than most expect when interacting with the CRM 2011 Web Services. Because of this, there are two companies (that I am aware of) who provide add-ons to SSIS:
These add-ons greatly improve productivity when performing CRUD operations through the CRM Web Services. I’ve personally never used either, but I’ve spoken with folks who have. Those people claim that these add-ins are well worth the license fee compared to creating the SSIS packages without one. Once people have built SSIS packages to perform data migration, they inevitably want to know how they can improve the performance of the data import. One of the first things you should consider is executing parts of your package in parallel where possible. There’s another add-on for SSIS called the Balanced Data Distributor that helps you parallelize:
This one is free. Make sure you check out link to the video in the comments of the post. Of course performance will improve if you run this on hardware that has more CPUs/Cores. There are also some other things to take into consideration when it comes to performance of data imports for CRM 2011.
- First, you want to make sure you’ve optimized your CRM server infrastructure (see here). Following the optimization guidelines makes a difference.
- Where possible, deactivate workflows and plugin messages before importing large amounts of data. Reactivate after import is complete.
- It’s also possible to improve performance by dropping indexes and rebuilding them after the import is complete. However, index rebuild time needs to be considered.
- Disable statistics, then re-enable after import is completed (see here)
Whether you can apply some of these depends on your data import scenarios. For example, if you have to import data during normal usage hours, some of these may not be feasible. But then again, you probably shouldn’t be “migrating a significant amount of data” during normal usage hours.
You may be asking yourself: “What about Scribe?” Scribe makes wonderful tools that do much more than what’s needed for basic ETL scenarios. The focus of this post the scenarios for which ETL with SSIS makes the most sense.
UPDATE: Last, but definitely not least as a few folks have pointed out, it’s important to understand the fundamentals of a well built / optimized SSIS package. This post was intended to cover with the CRM nuances. Here are a few links to help with the fundamentals: