Developing for SQL Server merge replication
Lately, we’ve been pushing our application from single server environment into an environment with merge replication. Not only that, but we also migrated from another RDBMS to Microsoft SQL server. While application had no issues working in replication environment, I learned the hard way there are some things that I could and should have done better. This are my findings.
If possible, avoid auto-incrementing identities
Auto-incrementing identities are a bit tricky for merge replication. Of course SQL Server knows how to handle that, thus merge replication handles it by assigning pools of numbers that a publisher and subscribers can use. By default, publisher gets first 100,000 numbers and each client gets first 10,000 numbers that are available. When publisher or subscriber runs out of numbers in pool (which is by default at 80% of current pool), he is assigned next available numbers.
To simplify, let’s imagine you have 1 publisher and 1 subscriber. In this case, a publisher would get numbers from 1 to 100,000 and a subscriber would get numbers from 100,001 to 110,000. When a subscriber runs out of pool numbers from 110,001 to 120,000 would be assigned to it. Let’s say a publisher’s pool runs out. Numbers assigned to publisher will be from 120,001 to 220,000. And so on and so forth.
So, what should you use? As odd as it sounds, unique identifiers also known as GUIDs. Apparently GUIDs are random enough that you shall not experience a conflict. I strongly recommend reading this article about auto generated identity and unique identifier keys.
Stay away from user triggers
Many developers use triggers to store old values in an archive table on insert or update. There is nothing wrong with this in single server environment, but, please, avoid it in replication environments (this also applies for all other replication types). Why? Well, you can disable replication of user defined triggers, but this doesn’t mean that triggers created on tables on multiple servers, will not get executed on replication. Thus, if you are not really really careful, you will end up with n-variations of the same entry in your archive table.
Create your data model so it minimizes conflicts
A replication in SQL server environment (or any other for that matter) is not instantaneous. Continuous replication is executed in 1 minute intervals. Also, the only useful setting I found in subscription configuration is the one where all entries made on publisher take precedence over anything done at subscribers. Thus, it can easily happen that if someone changes data on subscriber and inside 1 minute period, someone changes data on publisher, the changes from subscriber will be overwritten.
My advice. Try to minimize situations, where same entries in a table can be modified on all sides of replication.
Leave a Reply