Making a Case for SQL Server over SQL Database in Azure

When Windows Azure went online some years back, the buzz prompted me to check on it too. But my main reason to check out Microsoft’s cloud solution Azure was Azure’s SQL Database (formerly SQL Azure). Initially, the Azure SQL Database was pretty lame if you start comparing it to its on-premise roots (the Microsoft SQL Server). A SQL Server veteran’s top 10 lists of things he can find in Azure SQL Database would probably be composed of things that SQL Database doesn’t have.

  • no way of backing up databases
  • no automatic database sharding mechanisms
  • no fulltext indexing and seaching
  • and yeah it doesn’t do this
  • and it doesn’t do that
  • etc …

Then Microsoft introduced FEDERATION as a way to make things elastic in terms of scaling database growth and people have hailed the feature as something that will allow Azure SQL Database to make a name of its own as the feature isn’t even available in SQL Server.

Then, one begs to ask if Microsoft could have allowed SQL Server to run on Windows Azure instead. Today, Microsoft gave us that power to provision, install and run SQL Server on an Azure Virtual Machine just like what we do on an in-premise environment. To some, this could put Azure’s SQL Database to the back end of the long line.

Why SQL Server Over Azure SQL Database?

The answer is simple. It is so we can run existing applications on Azure that are using SQL Server features not found in Azure SQL Database.

However, if the goal is just to migrate existing SQL Server based applications to the cloud, then we might be missing some of the major reasons why we have the cloud in the first place. And we dont have to go to that direction for this article. People are aware that aside from wanting to get away with dealing with the nuances of an in-premise infra, to some, going for the cloud simply means cloud scale load. And for those of us who still wanted an SQL way of dealing with data in the cloud, Azure SQL Database, can definitely handle any cloud scale load thrown its way as it has inherent mechanisms, like FEDERATION, to address issues related to handling load on a massive scale. But there are also a lot of things to be desired that are still not available in Azure SQL Database. Eventually, Azure SQL Database will mature and we will not have to deal with the missing pieces but some of us can’t wait any longer.

So the questions now are the challenges one need to face to use SQL Server in Azure instead of Azure SQL Database, specifically one that will allow cloud scaling. The same questions have been bugging my company so we can transition our applications and services over to the cloud (Azure specifically) without throwing away so much of the current SQL Server related assets that we have. The primary reason for this is that, aside from adopting the cloud, we are still maintaining our in-premise code base until when there are no more clients depending on it. But I don’t see our existing clients going in that direction so soon. The way I see it, the challenge would be our ability to maintain and run two code-bases. One for in-premise, and the other for the cloud. Eventually, the code for the cloud will become the code for in-premise.

So to rationalize all these based on my own and company’s scenario, it seems running SQL Server on an Azure VM will be the way to go. The challenge will be to use SQL Server on Azure as storage and data management engine of choice for a multi-client SAAS (software-as-a-service) type of system. I have identified a few major issues that I need to deal with from the get go:

(1) how to handle horizontal scaling

How would I shard? I would think a single database per client would be easier. Given that SQL Server wasn’t designed to be automatically elastic, I would probably need to develop a mechanism that would create a new database on some SQL Server instance on an Azure VM as soon as a new client registers. These VMs can be on the same or different Azure instance. The SQL SErver instance on Azure VM can be pre-determined, provisioned and should be running on standby and wait for new registrants.

(2) full text

Fulltext indexing is currently not available in Azure SQL Database. To provide a similar service, one needs to look at 3rd party search solutions like Lucene.NET. The difference between the fulltext indexing in SQL Server and that of Lucene is that, the former allows you to mix things up in your TSQL queries in the same TSQL statements. By using SQL Server on an Azure VM, one can have fulltext indexing and searching. However, SQL Server fulltext can only be configured on a per database basis. To do fulltext across multiple databases across multiple SQL Server instances on various Azure VM instances, one has to probably make SQL SErver fulltext work like that of Lucene.NET. To probably do that, one has to treat each client database as a document.

(3) reporting services

The simplest implementation for a multi-client setup I can think of right now is to include the login information in the report’s parameter and hi-jack the info to get the right database to query. But I am not sure yet if this is a good approach security wise. The other option is to use the report RDL control and access the RDL files from the apps.

(4) High Availability & Backup

Going for SQL Server on an Azure VM means you have to deal with High Availability issues like you do with in-premise implementations. Though Microsoft made sure that the Azure instances one is using all have redundant instances, there were downtimes that you don’t have control that you have to deal with. The good thing with using SQL Server on an Azure VM is that, full database backups like those you do in in-premise setup are available.

Given the above challenges, I would definitely go for SQL Server in an Azure VM over Azure SQL Database. It is probably much easier to fashion out a custom sharding mechanism or horizontal scaler, a lucene.NET like implementation using SQL Server’s fulltext than deal with a lot of Azure SQL Database missing pieces. By also picking SQL Server as the storage of choice, I could probably port the entire system with negligible changes to other cloud infrastructure provider. Should a client decides to run everything off the cloud, then that wouldn’t be a problem either.

It seems to me now that Microsoft has provided me a potent option with having SQL Server on an Azure VM. I don’t have to wait until they get everything I need in Azure SQL Database. Surely, my personal and company’s needs  would make a case for SQL Server in Azure.

**************************************************

f677754256
Maira-ira Beach
Photographed in Pagudpud, Ilocos Norte

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Building, Deploying Databases Incrementally On Multiple Servers

SSDT + DACFx = Cool Database Version Control

 

For years, building and deploying a database to multiple clients was all pain and a lot of work. I have relied mostly in hand-coded solution to keep my sanity intact. The goal was simply to synchronize all databases deployed to various clients from a single schema which our company maintains. The goal was very simple, yet it was easier said than done. Often, one or more of these databases gets less and less updates over time which made things difficult to track. Some of these are of our own doing and some due to the inherent nature of our business as we don’t necessarily and automatically update databases as we wish. Clients need to sign up and approve of the updates and upgrades to the system and the databases.

In our case, there are tens of thousands of stored procedures to maintain and hundreds of tables, indexes and other SQL objects to track.These objects should be also in synched with the applications as version 1.0 of apps have different sets of Sql Server objects than application version 2.00. And we maintain up to 3 different versions hoping the clients would sign-up to upgrade to the latest faster than we solve all these complications. Some of these objects evolved from more than a decade of operation. In the end, databases get out of synched from the model that we have at hand as versions upon versions in various clients change. Today, the challenges are even greater. As we transition to a more faster and smaller roll out of updates (people refer nowadays to a more ‘agile’ way of evolving applications), new features, changes, fixes, deprecations, or whatever is on the plate increase the chances of our database out there getting out of synched.

A Viable Solution : SSDT and DACFx

Now enter the realm of SSDT and DACFx, it has provided us a free and automated way of doing what yesterday was a major headache for most of us. Some semblance of these workings started with SQL Server 2005 SP4 but I have waited for SSDT to have gone out to fully embrace the solution. Now there is that ‘silver bullet’ that would allow us to slay a werewolf of database proportions. So what are these tools to build databases incrementally and have all the increments deployed in an automatic manner (yes, it only involves a single command with a few parameters to dabble with!)?

The Tools

All we need now are the following:

  • SQL Server Data Tools (SSDT). A free database development tool that will allow us to design our databases and package it for deployment from version 1.0 to a series of minor and major updates to its schema. It is better to check out what it is if you still have not heard of SSDT. Coupled with DACFx, you can build Data Tier Applications (DAC).
  • DACFx. SQL Server Data-Tier Application Framework which is a set of tools, API’s and services intended to simplify our development, deployment and management of SQL Server schemas. The framework specifically allows us to build Data Tier Applications (DAC).
  • SQLPackage.Exe, a DAC tool use to deal with DACPACs and a lot more.

How These Things Work

Here, I would simply present a typical scenario everyone can relate and see how these tools work. Our goal is to deploy TESTDB v1.0 and update it with TESTDB v1.1. Using SSDT, I will have to author my TESTDB v1.0 with probably Table1 and StoredProcedure1 in it. Then I have to get the database snapshot.

snapshotdb

This forces SSDT to generate a DACPAC file containing metadata and instructions on how my TESTDB v1.0 is like when it is deployed. For this article I got the file TESTDB_20130301_11-32-20.dacpac under the snapshots folder of the project. The file naming gives me instant version management of some sort and that is neat. Using SQLPackage.Exe, I could go to any client’s SQL Server, fire up CMD and run TESTDB_20130301_11-32-20.dacpac.

sqlpackage

SQLPackage.Exe will proceed to deploy TESTDB v1.0 on any instance of SQL Server that I want. I basically only need to know the proper connection string that I can use for the process. But of course there is more to that, but I leave all that outside of this article for now as I wanted to show how easy it is all these can be achieved.

At this point in time, I got the following:

  • A simple, stand-alone dacpac file that I can use to deploy TESTDB v1.0 on more servers. I can have others or the clients do that for me. No more of those hand-coded scripts that could easily get tampered or accidentally modified for me. I can keep this file knowing this sets up the schema version 1.0 for my TESTDB. And I can use it weeks, months or even years after without worrying to recall what it needs to do.
  • A ‘testdb1’ database sitting on a new server
  • and a head-ache free afternoon

At some point in time, I probably might need to upgrade TESTDB with new tables and stored procs. We can refer to this as TESTDB v1.1. I can do that with SSDT by opening the TESTDB project, add the new objects and have a snapshot once again of the latest version of TESTDB to get a new DACPAC associated to TESTDB’s new version/updates. I just have to run the new DACPAC file on new or existing servers. This is where these tools get interesting and there are possibly varying outcomes depending on what SQLPackage.Exe encounters:

  • if testdb1 exists in the target server, SQLPackage.EXE will update the schema as necessary. It will check existing objects, leave it as is or update it as necessary and add new ones.  It will do this without DATA LOSS. Yes, no DATA LOSS.
  • if testdb1 does not exist in the target server, it will simply create testdb1. Objects from previous versions (e.g. TESTDB v1.0) will be recreated up to the latest version (which is TESTDB v1.1)
  • A possibility of DATA LOSS is also detected during this process. For example, if a certain column from v1.0 was changed from VARCHAR to INT in v1.1, SQLPackage.Exe prompts the user and cancels the entire update. THIS prevents DATA LOSS.
  • A version drift is also detected during this process. A user is prompted if a modification of the database schema outside this formal and controlled process of deployment is present. Once detected, it also cancels the entire update. THIS prevents DATA LOSS and structural modifications on databases that were tweaked outside of this process for some valid reasons.

With all these free and simple yet very effective deployment tools, I’d have my weekends for more non-tech activities. Oh well, I should have gone fishing but these tools tell me to tell others so they can go fishing too. 🙂

**************************************************

IMG_0468
𝗦𝘂𝗹𝗳𝘂𝗿-𝗯𝗶𝗹𝗹𝗲𝗱 𝗡𝘂𝘁𝗵𝗮𝘁𝗰𝗵, 𝘚𝘪𝘵𝘵𝘢 𝘰𝘦𝘯𝘰𝘤𝘩𝘭𝘢𝘮𝘺𝘴
Photographed in Mountain Province

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Cleaning Up Old Database Dirt with SQL Server 2012

I am about to venture into transitioning a 10 year old database that started with SQL Server 2000 which has surely some (a lot?) accumulated  ‘dirt’ in it that started way back in 2001. In the past (am sure it still is with 2012 edition), SQL Server has been very forgiving for all the sins of its handlers (DBAs, developers, users, & pretenders). It had allowed a lot of minor issues to creep into the database without constantly complaining and compromising its ability to run in production. In a perfect world, those issues could have been eradicated a long time ago.

The ‘dirt’ has accumulated as SQL Server hasn’t been pestering people that there is some cleaning up to do. In the real world, that is acceptable. We just can’t stop the show after someone spilt water on the stage floor. After 10 years,with better tools and ways to clean up dirt, it would be a worthwhile exercise so we can navigate easily through the challenges of modern times.

Enter SQL Server 2012, it packs with efficient and reliable tools to accomplish this tasks of cleaning things up. To start on the cleaning process though I need to see the mostly unseen ‘dirt’. SSDT comes in handy for this job.

Using SQL Server Data Tools (SSDT) As Dirt Probe

Here is how to begin identifying what needs to be cleaned.

Then allow SSDT to scan your database for ‘dirt’. I am actually referring to ‘errors’ and ‘warnings’!

After scanning, go to the Error List tab and VOILA!!! now you see the ‘dirt’.

Armed with a long ugly list, you can now hire a cleaner to scrub off each and every dirt you have identified.

That is all for now for the cleaning up!

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

SQL Server Data Tools (SSDT) Lengthens Life of SQL Server 2005!

The arrival of SQL Server 2012 last week sparked a semblance of positivity among people who have been using version 2005 who also held off on upgrading and skipping two SQL Server 2008 releases. For a lot of folks like me though, SQL Server 2012 gave a lot of boost and now lengthens the lifespan of 2005 databases. Not that 2005 is dying, but the tool lend more than a helping hand to a lot of 2005’s inherent goodies that some might even consider sticking with the version a little bit longer. Enter SQL Server Data Tools (SSDT), formerly aliased as “Juneau“, is the newest tool Microsoft has provided for its legion of SQL Server professionals.

Developing for SQL Server 2005

Though SQL Server 2012 has a lot to offer to anyone, SQL Server 2005 is very much a formidable data platform that would not go away anytime soon. My company still supports a lot of clients running SQL Server 2005 that might not be considering an upgrade anytime soon. And as a matter of strategy, we still very much believe that we have to pick the most common denominator among our target space. So here, just in time, SSDT gets to be initiated with a new project that would at least support SQL Server 2005.

Working Right Away

The last time I use SSDT was in SQL Server 2012 CTP3 and utilized it for some presentation/demo. I downloaded it earlier today and wasted no time and have it installed smoothly. No hiccups. It gets me so excited to play around a new tool with something that won’t be laid only to be forgotten after some tinkering. This time, SSDT gets to do real work right away with a small database.

Here are some of goodies SQL Server 2005 users from would get from a production release of SQL Server Data Tools:

  • Stand Alone SSDT (No SQL Server 2012, No Visual Studio 2010 needed). I was glad Microsoft had a stand-alone distro for this tool. The last time I tried it, it was blended with SQL Server 2012 CTP3. Now, you need not have SQL Server 2012 and Visual Studio 2010. I mean, you don’t need to have an installation of SQL Server 2012 and Visual Studio 2010 to use SSDT. Without the two, your SSDT off the web installer installs the necessary SQL Server 2012 and Visual Studio 2010 / SP1 components so it can run standalone. I had installed on a bare Windows 7 machine with no SQL Server nor Visual Studio components prior and it just went with no hassles.
  • SSDT Supports SQL Server 2005 as Target Platform. As I have tried before, SSDT fully supports development on SQL Server 2005 (lowest version allowed). You get all the bells and whistles of SSDT even if your target deployment platform is only SQL Server 2005. SSDT tunes itself to support 2005 so you are assured that every object you have in your SSDT project is fully supported by 2005’s engine and syntax.
  • TSQL Code Analysis On Build. Most common coding malpractices are easily detected. There are around 14 coding rules that can be detected for any violations. It is all up to you to weed the problems as they appear as warnings (can be set to appear as errors if you wanted to tidy up your TSQL code.
  • Catches TSQL Design/Compile Time Errors. For users who were contented to use just the SQL Server Management Studio, catching SQL compile time error isn’t a routine. Often, code errors are resolved only when executed against a database. For example, this following statement “SELECT col1, col2 FROM table1 ORDER BY col1, col3”, when parse or included in a stored procedure, produces no error upon the creation of the stored procedure. With SSDT, errors an be resolved at design/compile time.
  • Import SQL Server 2005 databases. This tool comes very handy as I attempt to do some cleaning up of some old 2005 databases and probably copy some old stuff for the new project I am doing. I was able to import 2005 databases into SSDT projects with the goal of running some code analysis. As expected, I was able to detect a lot of areas that arent flagged by the old platform as potential cause of problems (e.g. unresolved references, data typing inconsistencies, possible points of data loses, etc). I was also able to copy easily old structures and stored procedures and have it copied to a new SSDT project I am working on.
  • Offline Development. Unless I need to get some data in and out of a database while developing my SQL Server objects, I can completely do my development without needing to connect to a development or production database.
  • It is FREE! Thank you Microsoft!

My initial feedback on SSDT is that it is a very helpful and very promising tool. For the first release, there is so much goodies that can alleviate some tasks that used to be very tedious before. For the few hours of tinkering, I was able to fashion out a database with ease.

There is a lot more to discover from SSDT. So far, only a few hours of playing time was spent and I felt my productivity has been tremendously boosted.

Indeed, SQL Server 2012 is now the flagship product of Microsoft when it comes to handling data … but with SSDT’s help, SQL Server 2005 is here to stay for a bit more.

For more information about SSDT, check out the links below:

 

The Philippine Eagle-Owl. The largest of all owls endemic to the Philippines. It stands almost 20 inches and preys on small mammals like rodents and snakes.

Juneau That You Are On Steroids With Denali?

Supporting multiple versions of SQL Server for a single application typically means headache. And that is what I have been in since I started making commercial applications (banking mostly on the benefits provided by T-SQL). We have not abstracted our databases with non-TSQL objects in the mid-tier for very valid and interesting reasons. In my current situation, I am dealing with SQL Server 2005, 2008, 2008/R2 supporting bits of features from each version. Thank heavens all our clients have upgraded already from version 2000. My world though is still 80% SQL Server 2005, and being still a very formidable version, I don’t think I will be able to part ways with this build anytime soon. Case in point, clients cannot just easily upgrade to higher versions every time one comes out.

As a result, I am mired in managing our SQL Server development assets using various tools. Anything that can help tract and manage everything. Ever since, I had always wished for a tool that will address most of my SQL Server development concerns if not all. All these years, I have heavily relied on SSMS and supported by various other tools. When Database Projects arrive with Visual Studio, it got me interested but it tanks out every time I load it up with what I have. The last one was with Visual Studio 2010. With the sheer number of objects that we have, capacity and performance was always the issue and each time, I had to resort back to SSMS without exploring further those tools.

When Juneau, now officially called SQL Server Developer Tools, or SSDT, was announced as one of the touted tool in the next version of SQL Server  codenamed “Denali”, I was pretty excited about the possibilities it will bring to my situation. After getting my hands on it with Denali CTP3, I say it really got me excited. Though I haven’t tried steroids, based on the effects that I have seen, SSDT (sounds like drug testing haha) gets my nod despite some issues that I have encountered. I am aware that what I got is preview build and for sure Microsoft will eventually iron out those. The benefits far outweigh the quirks that I have seen. It would be worth to upgrade even with SSDT alone.

So what is in SQL Server Developer Tools?

I barely scraped the surface yet but here is what I have found that will a big impact with my situation:

  • It is a unified environment/tool for SQL Server, Business Intelligence, SSIS. Everything I need is provided for within the SSDT shell. I dont have to switch back and forth between Visual Studio, SSMS, and BIDS. Creation of new reports and SQL Server objects such as tables, views, stored procedures, file groups, full text catalogs, indexes … all can be done within SSDT. Existing ones can be added too. When I had the chance to explore MySQL Workbench, I find it cool to have objects that I create scripted first before it is created physically. This capability is now available in SSDT.
  • “Intellisense support for TSQL” for SQL Server 2005. With SSDT, it has not actually upgraded SQL Server 2005 then gave it TSQL Intellisense. TSQL Intellisense is outright available within SSDT regardless of your targetted SQL Server platform. With SSDT, you model your database schema/objects without the tool actually being connected to a targetted version 2005 live database.  This way, you work with your TSQL with Intellisense provided by SSDT, and you can publish your project to an SQL Server 2005 database by choosing 2005 as your target platform. Build/error checks are done prior to the publishing of the database project.
  • SQL Server Platform Targetting. SSDT supports versions 2005, 2008, 2008 R2, Denali and SQL Azure. You do your thing and just select your target platform when you are done. During the build, you will be prompted for platform violations.
  • Refactoring (Rename, wildcard expansion, Fully qualified naming). While in TSQL writing mode, you can rename a field and SSDT does the rest to all affected objects (tables, stored procedures, etc). Neat. No more 3rd party stuff to achieve this. 🙂
  • Supports Incremental Project Publishing. You can keep a development version of your database and upgrade your live version incrementally either via script or while connected to the database. It can even prompt you when your changes/upgrades causes a data loss. Very cool.
  • SSDT detects changes done outside of SSDT. It prompts you during publishing of a database project when it detects objects created outside of the tool. This is necessary so you know if somebody is messing up with your databases. I do wish though that SSDT would have the capability to reverse engineer these detected objects when it is determined to be legit as sometimes, you get to tweak directly the database using other tools like SSMS.
  • Importing of Existing Database Schema (from 2005 to Denali) into a database development project. One need not start from an empty project, existing databases can be dealt with.
  • Scalability. I am glad SSDT successfully imported the more than 20,000 objects in our current database. The previous attempts I had using Visual Studios’ Database Project (both 2008 and 2010) just hanged midstream after hours of importing all the objects. Took SSDT more than 20 minutes to import our database using a lowly test Denali test machine.

This is all for now folks! Regardless of some unwanted quirks I have encountered when using SSDT, I can still say, it is worth looking at. I am just waiting for some decent Denali machine to use so I get to load up my projects in SSDT and take the next level from there.

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group (PHISSUG), a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Tomorrow, Microsoft Will Bet Its Farm On Apple!!!

Will the next version of Microsoft’s Windows Phone OS be codenamed blackberry? or apple? 😛 Should Ballmer call it Microsoft apple, it will be my first time to own an apple. Hehehe. And one can imagine a news headline saying, millions and millions of Microsoft users now use apple! That would be bedlam. Then we probably hear Mac fanatics or Steve Jobs saying apple sucks! Ahh … that would surely perk up my day!

Expertise Bug Leads to Major Database Design Blunder

Often, we hear software developers (myself included) say, “yeah … we can definitely do that!” or “yeah … you have come to the right place, anything you want .. we can build it for you”! Then often, we too hear stories of software projects fail here and there. And now I begin to wonder how things could be worked out properly so problems, or worst case, failures can be avoided.

What happened to me recently could probably be one of the reasons why software projects fail. For the past several months, I have been working on a system that concerns the health of people. The system is intended for use by doctors, dentists, physical therapists, nurses, or any practice or profession that deals with people’s health, etc. The system, when done, will handle quite an extensive amount of data gathered from a good number of processes and sources that are realized every minute of the day.

For the past several months, there was a good amount of communication between my group and those potential users of the system. A lot of time were spent in requirements discovery and gathering, analysis, and everybody even subject a lot of items to questions just to sort things out clearly so things come out fine. A near functional prototype has been established for several months and it looks like everybody is happy of everyone’s progress … including the state of the system. Beta testing was conducted, had 6 major beta builds in the last few months, and seemed not a thing was amiss. In fact, it is almost done that the potential users are so eager to have the system deployed already and pioneered as soon as possible (that would have been like last month). I was into the finishing touches and was like in the process of putting icing on the cake.

Then … it was KABLAAAAAAMMMMMM!!! As the project’s database designer, I thought of something that hurled everything back into the drawing board. I miss identifying one piece of information that should have been in the database design from the very start. With the set of people contributing to make this happen … this one thing never had any manifestation of being thought out. The medical guys involved in the project never thought of the item. The software development guys never had any clue. I never had any clue. And I have contemplated so deeply to analyze how I could miss something this important. People who played the analyst role came short in thinking about this (I am one of those). But I, being the database designer, am blaming of missing something so important.

After gathering myself, I came to a conclusion that the only time that I’d be able to easily identify or come across such piece of information, is probably when I am a doctor, a practicing one, and at the same time a database designer who had lots of databases and experience tucked under my belt. I caught the missing piece from a totally unrelated event, not even related to what I am doing.

To cut the story short, what happened thereafter was 1 table was added to the database structure with 1 new column that would serve as a reference for 70% of other tables. With the change, 70% of sql code were re-written, 50% of critical UI got revamped and lots of time lost and gained lots of sleepless nights.

The moral of my story, though completely lacking of juicy technical details as I cannot divulge those for fear of legal ramifications and of ridicule (ano ako hilo? hahahha), software development / technical expertise can only bring us to certain extents and domain expertise is clearly a desirable attribute one can have, especially if you are a database designer. The reality and funny thing though is that, I can’t picture myself as a doctor and a database person so I can eliminate this problem in the future and this is probably the reason why there are just too many software project that have failed. And I can still hear myself saying “yeah … of course, anything you want, I can build it for you!”.

The system I talked about here is almost done and is looking really good! And I don’t think I have missed some more that would screw up my day. How about you? Do you say, being considered as an expert, can do of anything that is asked of you? 😛

Thoughts on Designing Databases for SQL Azure – Part 3

In the first article of this series, I raised an issue considered to be one of sharding’s oddities. The issue raised was what would one do should a single tenant occupying a shard exceeds a shard’s capability (e.g. in terms of storage and computing power). The scenario I was referring in the first article was that I opted to choose “country” as my way of defining a tenant (or sharding key). In this iteration, I’ll once again attempt to share my thoughts on how I would approach the situation.

Off the bat, I’d probably blurt out the following when ask how to solve this issue:

  • Increase the size of the shard
  • Increase the computing power of the machine where the shard is situated

On in-premise sharding implementations, throwing in more hardware is easier to accomplish. However, doing the above suggestions when you are using SQL Azure, is easier said than done. Here is why:

  • Microsoft limits SQL Azure’s database sizes to 1GB, 5GB, and 50GB chunks.
  • The computing instance of where a shard can reside in SQL Azure is as finite as well

I have heard of unverified reports that Microsoft allows on a case-to-case basis to increase an SQL Azure’s database size to more than 50GB and probably situate a shard on some fine special rig. This however leads to a question on how much Microsoft allows each and every SQL Azure subscriber to avail of such special treatment. And it could probably cost one a fortune to get things done this way.

However, there are various ways to circumvent on the issue at hand without getting special treatment. One can also do the following:

  • You can tell your tenant not to grow big and consume much computing power (Hey … Flickr does this. :P)
  • You can probably shard a shard. Sometimes, things can really go complicated but anytime of the day, one can chop into pieces a shard. Besides, at this point, you could have probably eaten sharding for breakfast, lunch and dinner.

So how does one shard a shard?

In the first part of this series, I used as an example of sharding a database by Country. To refresh, here is an excerpt from the first article:

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
6 edu file1 grmy
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy
Server: CDE
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
3 allan file1 can
4 allan file2 can
5 allan file3 can
9 jon file1 can
10 jon file2 can
11 jon file3 can
Server: CDE
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

In the sample above, the first shard contains data related only to grmy (germany) and the second shard contains data related only to can (canada). To break the shard further into pieces, one needs to find a another candidate key for sharding. If there is none, as in the case of our example, one should create one. We can probably think of splitting up a country by introducing regions from within (e.g. split by provinces, by cities, or by states). In this example, we can probably pick city as a our sharding key. To illustrate how, see the following shards:

Shard #1
Server: ABC1
Database: DB1
Table: userfiles
userfiles_pk User uploaded file country_code city_code
1 John file1 grmy berlin
2 John file2 grmy berlin

 

Shard #2
Server: ABC2
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code city_code
6 edu file1 grmy hamburg

By deciding to further subdivide a country by cities where each city becomes a shard, the following statements would be true:

  • The new sharding key is now city_code.
  • Our shard would only occupy data related to a city.
  • Our shard would only occupy data related to a city.
  • Various shards can be in the same server. Shards don’t need to be in separate servers.
  • The increase in the number of shards would also increase the amount we spend on renting SQL Azure databases. According to Wikipedia, Germany alone have 2062 cities. This is some serious monthly spending that we have here. However this example is just for illustration purposes to convey the idea of sharding. One can always pick/create the most practical and cost-effective key for further sharding to address the issue of going beyond a shard’s capacity without the spending overhead due to poor design choices.
  • At a certain point in the future, we might exceed a shard’s capacity once again breaking our design.

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Thoughts on Designing Databases for SQL Azure – Part 2

In the first article, I showed an example how a database’s design could impact us technically and financially. And sharding isn’t all just about splitting up data. It also brings to the table a group of terrible monsters to slay. There are a lot of concerns that needs to be considered when one attempts to shard a database, especially in SQL Azure.

NoSQL, NoRel, NoACID

In breaking things apart, one is bordering on clashing religions. One monster to slay is the issue of ACIDity. People discuss NoSQL, NoRel, NoACID to be one of the trends out there. And most even swear to the fact that these approaches are better than SQL. In my case, I prefer to call it NoACID and it is not by any means more or less than SQL. I have NoACID implementations on some projects I had. And I love SQL.  To simplify, I’ll put in these trends in a NoX lump as they commonly attempt to disengage with the realities of SQL.

For me, NoX is not a religion, it is simply a requirement. The nature of the app you build will dictate if you need to comply to the principles of ACID (Atomicity, Consistency, Isolation, Durability). If ACID is required, it is required regardless of your data and storage engine or your prefered religion. If it is required, you have to support it. Most cloud apps that we see, like Google and Facebook, could probably have ACID to be absent in their requirements list. Google is primarily read only so it does make sense to have data scattered all over various servers in all continents without the need for ACID. By nature, ACID in this regard, can be very minimal or absent. Facebook on the otherhand is read/write intensive. Seems like it is driven by a massive highly sophisticated message queuing engine. Would ACID be required in Facebook? I am not quite sure about Facebook’s implementation but the way I look at it, ACID can be optional. ACID can well be present in operations concerned only to one tenant in case of an FB account. Outside of this, the absence of ACID could probably be compensated by queuing and data synching.

If Facebook and Google decided to require ACID, they could be facing concerns on locking a lot of things. While locked on, latency could be one of the consequences. It is therefore very important to lay out firsthand if ACID is a requirement or not. For a heavy transactional system, a sharded design presents a lot of obstacles to hurdle. In SQL Azure, this is even harder as SQL Azure does not support distributed transactions like we used to with SQL Server. This means, if your transaction spans across multiple shards, there is no simple way to do it as SQL Azure does not support it, thus ACID can be compromised. SQL Azure however does support local transactions. This means you can definitely perform ACIDic operations within a shard.

To be continued…

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Thoughts on Designing Databases for SQL Azure

Cloud has been the new darling in the world of computing and Microsoft has provided once again developers a new compeling platform to build their new generation applications on. In this picture enters SQL Azure, a cloud-based edition of Microsoft’s leading database product, Microsoft SQL Server. One would ask how one designs an RDBMS database for the cloud and specifically for the Azure platform. It is basically easy to create a simple database in SQL Azure but database creation and database design are two different life forms. So we ask, how are databases be in the cloud and in SQL Azure? In this blog, I hope to provide an answer.

Scale Out

First and foremost, the cloud has been one of the primary solutions to consider when an application is in the realm of a massive scalability issue. Or to put it in another perspective, the Cloud was both a catalyst of the emergence of extremely large applications and a result of the webification of a lot of things. Design considerations need to factor in unheard of scales that can span from almost everywhere. Bandwidth needed is of the global scale, hard disk space are in the petabytes, and users are in the millions. When you talk about building an application to handle this scale, one can’t do anymore the usual stuff any one of us used to do. One now needs to think massively in terms of design. E.g. No single piece of hardware can handle the kind of scale I just mentioned. So how does one design a simple database that is intended to scale up to the degree that I just mentioned?

The Problem

Supposing, we have a great idea of developing the next big thing and we named it FreeDiskSpace. FreeDiskSpace allows us to upload massive amounts of files where files can be anything one can think of. The scale of this next big thing is thought to be in the range of having 500 million users scattered around the globe with each users allowed to upload an unlimited amount of files.

The Challenge

With such order of magnitude, the many challenges we will be facing will definitely lead and force us to question every bit of our knowledge and development capability that we might have been using when we are used to doing gigs in the few GBytes of databases with a few thousand users. If what is allowed makes each user upload files in the range of not less than 5GB, one would need to store 2,500 petabytes worth of files. How does one store 2,500 petabytes worth of files and manage the uploading/downloading of 500 million users?

In the Cloud, you are less likely to see a bunch of supercomputers handling this kind of scale. Rather, you are more likely to see hundreds of thousands of commodity servers woven together forming a cohesive unit. Making use of these mesh to behave like it is as if it is operating as a unit requires design patterns that are more appropriate for these environments.

Database Sharding in SQL Azure

First, what is database sharding. To put it simply, database sharding is a design pattern to split up data into shards or fragments where each shard can be run on its own on a separate instance running on a separate server located somewhere. Each shard contains the same structure as that of another but contains different subsets of data.

Current version of SQL Azure offers NO feature that handles automatic scaling. This will surely change in the near future as everybody doing applications handling these massive loads will somehow need an automatic way to allow the application to handle this amount of load and data easily.  One way that is guaranteed to be commonly used will be sharding databases. However for now, sharding in SQL Azure isn’t automatic. If one needs to shard, one needs to do it by hand. As I have said, in the future, Microsoft will be compelled to give us an easy way to practice sharding. Hopefully, that future is near.

Now we would ask, why do we need to shard databases in SQL Azure. The cloud and specifically with SQL Azure, with its limitations, forces us to use this design pattern. SQL Azure only allows us 50GB for a database’s maximum size on a virtual instance running on an off the  shelf commodity server. With that, we surely need a way to work around these constraints if we see our data to grow for like 500 terabytes. Second, if we foresee a huge number of users hitting our applications, doing this on a single piece of hardware would spawn concurrency nightmares that would choke our databases to death.

Though there are other approaches to solve our scaling challenges, there are other interesting motivating factors where sharding becomes a very compelling option. Firstly, it is more likely for a database server to process small databases faster than larger ones. Secondly, by utilizing multiple instances of computing power, we can spread processing load across these instances to more processes can be executed all at once.

How To Shard

Let us try to go back with our FreeDiskSpace killer app (assuming we provide each user up 1GB in database space) and practice some very simple sharding so we have a partial understanding of what sharding is all about. If we are to design the database using a single instance, the table would look like this:

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
3 allan file1 can
4 allan file2 can
5 allan file3 can
6 edu file1 grmy
7 roman file1 ity
8 roman file2 ity
9 jon file1 can
10 jon file2 can
11 jon file3 can
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

If we are to shard, we need to decide on how we split up our database into logical, manageable fragments. We then decide what data is going to occupy each fragment (we can probably call the occupier our “tenant”). Based on tables we see above, there are 2 fields that we can potentially use as keys for sharding. These fields are user and country. These fields should provide answers to a question: How to we sensibly split up our database? Do we have to split it up by user, or by country? Both are actually qualified to be used as basis for splitting up our database. We can investigate each though.

Sharding by field: User

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy
Server: CDE
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
3 allan file1 can
4 allan file2 can
5 allan file3 can
Server: CDE
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

By deciding to make each user as a tenant in our database, the following statements would be true:

  • Our shard would only occupy data related to a single user.
  • Various shards can be in the same server. Shards don’t need to be in separate servers.
  • It would render country table a bit useless and awkward. But we can always decide to get rid of it and just maintain a column indicating a user’s country of origin.
  • Now with a one-user-one-database structure, 1 million users mean 1 million separate databases. In SQL Azure’s terms, assuming a 1GB costs us 10$ a month, designing our shards by user means we have to spend $10 million monthly. Now that is no joke!

Sharding by field: Country

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
6 edu file1 grmy
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy
Server: CDE
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
3 allan file1 can
4 allan file2 can
5 allan file3 can
9 jon file1 can
10 jon file2 can
11 jon file3 can
Server: CDE
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

By deciding to make country as a tenant for each of our database, the following statements would be true:

  • Our shard would only occupy data related to a country.
  • Various shards can be in the same server. Shards don’t need to be in separate servers.
  • It would render country table unnecessary as each database represent a country. But we can always decide to get rid of this table.
  • Asnwer.com says there are only 196 countries in the world. With a one-country-one-database structure,  this means,  if we are getting the larger 10GB, 100$ a month SQL Azure databases, we only need to spend $19,600 a month.
  • This design decision though is pretty useful until we ran out of space in our 10GB design to contain all users related to a country. There are solutions to this problem though. However, I will try to discuss this in my future blogs.

It is pretty clear that picking the right key for sharding has obvious ramifications. It is very important for us to determine which ones are appropriate for our needs. It is obvious that picking country is the most cost-effective in this scenario while picking user could shield you much longer on the 10GB limit problem.

This article only presents partial concepts of sharding but nevertheless has given one the very basic idea of how database sharding can be applied, and how all these translate when you choose SQL Azure as a platform.

Thoughts on Designing Databases for SQL Azure – Part 2

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com