SQL SERVER

Ques: What is .LDF – SQL Server?

Ans:

I was quite surprised to see search engines throwing very less information on this one, yet it’s a frequently asked question. LDF acronym stands for Log Data File (though could have been called – Transactional Log Data File). This is a must to have file, and is created at the same time the database file (.mdf) is created by SQL Server.

So why do you need it? For keeping .mdf consistent. To cut short I will copy these lines straight from wiki :
“SQL Server ensures that any change to the data is ACID-compliant, i.e., it uses transactions to ensure that any operation either totally completes or is undone if fails, but never leaves the database in an intermediate state. Using transactions, a sequence of actions can be grouped together, with the guarantee that either all actions will succeed or none will. SQL Server implements transactions using a write-ahead log. Any changes made to any page will update the in-memory cache of the page, simultaneously all the operations performed will be written to a log, along with the transaction ID which the operation was a part of. Each log entry is identified by an increasing Log Sequence Number (LSN) which ensure that no event overwrites another. SQL Server ensures that the log will be written onto the disc before the actual page is written back. This enables SQL Server to ensure integrity of the data, even if the system fails. If both the log and the page were written before the failure, the entire data is on persistent storage and integrity is ensured. If only the log was written (the page was either not written or not written completely), then the actions can be read from the log and repeated to restore integrity. If the log wasn’t written then integrity is also maintained although the database state remains unchanged as if the transaction never occurred. If it was only partially written, then the actions associated with the unfinished transaction are discarded. Since the log was only partially written, the page is guaranteed to have not been written, again ensuring data integrity. Removing the unfinished log entries effectively undoes the transaction. SQL Server ensures consistency between the log and the data every time an instance is restarted.”

Are there any other uses of .LDF? Yes they can be used for Log Shipping, a high availability solution.

What are the recommended optimizations for LDF file? LDF could be stored on a separate disk having a dedicated disk controller in high volume scenario. Growth of LDF should be managed as daily maintenance activity as it can easily eat up the available storage capacity. A 2 disk RAID 1 is also a good trade off for LDF is terms of performance vs. failure protection (if you are able to invest money on disks RAID 10 is good for MDF (faster read)).

Hope that helps you get started .

Snapshot vs. LogShipping vs. Mirroring vs. Replication vs. Failover Clustering

All these SQL SERVER terms were quite confusing for me. Luckily, i got it while surfing . So I am going to jot down my understanding of them, & will look forward to read your comments on it.

1) Snapshot is a static read only picture of database at a given point of time. Snapshot is implemented by copying a Page (8KB for SQL SERVER) at a time. For e.g. assume you have a table in your DB, & you want to take a snapshot of it. You specify the physical coordinates for storing snapshot & when ever original table changes the affected rows are pushed first to the the snapshot & then changes happen to the DB. (N.B. There is also something called as Snapshot Isolation Level which is different from Database Snapshot).

Usage Scenario: You have a separate DB for report generation, and want to ensure that latest data for that is available. You can periodically take snapshot of your transactional database.

2) Log Shipping is an old technique available since SQL SERVER 2000. Here the transactional log (ldf) is transferred periodically to the standby server. If the active server goes down, the stand by server can be brought up by restoring all shipped logs.

Usage Scenario: You can cope up with a longer down time. You have limited investments in terms of shared storage, switches, etc.

3) Mirroring which was introduced with 2005 edition, works on top of Log Shipping. Main difference is the uptime for the standby server is quite less in mirroring. Standby server automatically becomes active in this case (through help of a broker server which is called as Witness in SQL SERVER parlance), without having to restore logs (actually logs are continuously merged in this scenario – no wonder it’s called Mirror ). Additional advantages of Mirroring include support at .NET Framework level (read no switching/routing code – requires ADO.NET 2.0 & higher) plus some new features like Auto Page Recovery introduced with SQL SERVER 2008.

Usage Scenario: You want very less down time and also a cost effective solution in terms of shared storage, switches, etc. Also you are targeting a single database which easily fits in your disks.

4) Replication is used mainly when data centers are distributed geographically. It is used to replicate data from local servers to the main server in the central data center. Important thing to note here is, there are no standby servers. The publisher & subscriber both are active.

Usage Scenario: A typical scenario involves syncing local / regional lookup servers for better performance with the main server in data center periodically, or sync with a remote site for disaster recovery.

5) Failover Clustering is a high availability option only (unlike others above which can be used for disaster recovery as well) used with clustering technology provided by hardware + OS. Here the data / databases don’t belong to either of servers, and in fact reside on shared external storage like SAN. Advantages of a SAN storage is large efficient hot pluggable disk storage. Here’s a good article on adding geo redundancy to a failover cluster setup.

You might want to look at the licensing options of SQL Server, various editions available and how they map to above features. You can find this information in detail here.

Hope that helps to some extent

COUNT(*) vs. COUNT(ColumnReference)

Most of us are familiar with this but one of my team members recently got stuck with his query. The difference is COUNT(*) includes NULL values while COUNT(column_reference) doesn’t. As an example consider the two below queries for Northwind database that find the number of orders placed by each Northwind customer:

WRONG:
SELECT C.CompanyName, COUNT(*) AS NoOfOrders
FROM dbo.Customers AS C LEFT OUTER JOIN dbo.Orders AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CompanyName
ORDER BY NoOfOrders

RIGHT:
SELECT C.CompanyName, COUNT(O.OrderId) AS NoOfOrders
FROM dbo.Customers AS C LEFT OUTER JOIN dbo.Orders AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CompanyName
ORDER BY NoOfOrders

Search This Blog

MSBI - Business Intelligence

SQL SERVER

Comments

Post a Comment

Popular posts from this blog

host

Steps to create SSH key from git bash

requirement.txt