The Architecture of StackOverflow


A couple of months ago, I read a comment on Hacker News arguing that .NET is a poor technology choice for startups for performance reasons. I disagree wholeheartedly. Not because .NET is always the best choice, but because calling a platform or language a poor choice for performance without any context is nonsensical. For most web-based applications today, performance ranks below licensing cost, talent pool, existing frameworks, etc.

Bad design, a cookie-cutter approach and a lack of proper architecture are more likely causes of performance issues than the underlying technology. Here is a case that demonstrates it.

StackOverflow serves about 500M page views a month, and it does so on a grand total of about a dozen servers. It also happens to use .NET for all of this. How is this possible after the “poor decision” of going with .NET? This video explains how StackOverflow was able to use .NET to create a high-performing web app.
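To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch. It assumes a 30-day month, traffic spread evenly over time, and load split evenly across all twelve servers. Real traffic peaks well above the average, and not every server necessarily serves web pages.

```python
# Rough average load implied by "500M page views a month on a dozen servers".
# Assumptions (not from the source): 30-day month, evenly spread traffic,
# load split evenly across all servers.
PAGE_VIEWS_PER_MONTH = 500_000_000
SERVERS = 12
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_rate(page_views: int, seconds: int) -> float:
    """Average page views per second over the given period."""
    return page_views / seconds


overall = average_rate(PAGE_VIEWS_PER_MONTH, SECONDS_PER_MONTH)
per_server = overall / SERVERS

print(f"overall:    {overall:.1f} page views/s")
print(f"per server: {per_server:.1f} page views/s")
```

Even allowing for peaks several times higher than this average, the per-server numbers are modest. This is the point of the post: design choices matter more than the platform.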

Clearly the development team at StackOverflow thinks about design and architecture quite a bit and does not follow “out of the box” patterns. Sometimes they even use anti-patterns. But they do all of this consciously, with specific design goals in mind. They ended up writing some libraries that fit their needs better than the mainstream ones, and they even open sourced most of these libraries and shared them on GitHub.

As the saying goes, “It’s the poor craftsman who blames his tools.” Any technology, when used properly, can perform well. It’s the architect’s job to pick it and apply it the right way.

Next time you hear a developer/architect blame the technology for poor results, you should ask: “What have you tried to address the issue?”

Dynamic Resource Allocation in the Enterprise


By now all of us architects are very much used to the idea of spinning up a new server in the cloud and scaling our solution horizontally as needed. Virtually nonexistent setup times and good APIs provided by cloud vendors make this possible. Infrastructure has come a long way in the form of IaaS and PaaS. Open source software has also been a great enabler of this movement, namely deployment automation technologies and Linux distributions customized per use case.

The situation in most enterprise systems, however, is not that great. I often run into horizontally scaled solutions that do not use their computing resources efficiently. Take the following example: a software solution that serves an end-to-end business process is built as a single service. When the client’s volume increases, the service is scaled by creating an exact replica and using a load balancer to distribute the load.

To elaborate on the issue at hand, I will use a hypothetical car insurance company. As you may already know, the car insurance business has some basic concepts, such as finding a quote that fits your needs, comparing it with competitors’ quotes, signing up for a policy and finally paying for it. If everything goes well and you don’t get into an accident, your interaction with the insurance company may end there. Their “service” oriented software may be running on a server that looks like the following:


When this company becomes successful and experiences permanent growth, the current computing capacity may no longer serve its needs. By permanent growth I mean a net gain in the number of users, say from X to 2X. This is a happy and desirable scenario, and it can be the result of geographic expansion, an acquisition or a marketing campaign. Any sane architect would just double the capacity, distribute the traffic with a load balancer and go on with life. The server stacks may look something like this after the capacity upgrade:


While this is going on, an X-ray of the business processes may show a distribution of resource usage across services that looks like this:


Now let’s think about a slightly more interesting scenario. Hurricane Sandy happens. A lot of cars are damaged, and as a result the organization experiences a spike in the number of claims it receives. Since the servers are tuned for the current capacity (with an anticipated +/- 5% elasticity), things start slowing down. Customers on the phone with customer service reps start experiencing longer wait times. Claims submitted from mobile phones start timing out. Overall, customer satisfaction goes down, and there is very little the organization can do about it, because it cannot procure and deploy new servers overnight and reconfigure its client software to handle the spike. With access to cloud resources, it could temporarily grow the number of servers into the cloud and shut them down later. Without that, however, there is not much this organization can do.

This is exactly why a DRA-like (Dynamic Resource Allocation) framework is needed in the enterprise. Imagine if this organization could temporarily limit (or even turn off) its capacity to sell new policies and reallocate those computing resources to the business function it needs at the moment. The service allocation could look something like this:
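To make the idea more concrete, here is a minimal sketch of the allocation step. It is a hypothetical illustration, not an existing framework. The function name `allocate`, the demand figures and the minimums are all assumptions. A fixed pool of servers is divided among business functions in proportion to their current demand. A function can be throttled to zero, like policy sales during the hurricane, or guaranteed a minimum, like payments.

```python
from __future__ import annotations

from math import floor


def allocate(total_servers: int,
             demand: dict[str, float],
             minimum: dict[str, int] | None = None) -> dict[str, int]:
    """Hypothetical allocator: split a fixed server pool across business functions.

    Each function first receives its guaranteed minimum (default 0). The remaining
    servers are shared in proportion to demand, using largest-remainder rounding so
    the allocation always adds up to total_servers.
    """
    minimum = minimum or {}
    alloc = {name: minimum.get(name, 0) for name in demand}
    spare = total_servers - sum(alloc.values())
    if spare < 0:
        raise ValueError("minimums exceed the available servers")
    total_demand = sum(demand.values())
    if total_demand == 0:
        return alloc  # no demand signal: leave the spare servers unassigned

    # Exact (fractional) share of the spare servers for each function.
    shares = {name: spare * d / total_demand for name, d in demand.items()}
    for name, share in shares.items():
        alloc[name] += floor(share)

    # Hand the servers lost to rounding down to the largest fractional remainders.
    leftover = spare - sum(floor(s) for s in shares.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - floor(shares[n]),
                          reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical demand figures (e.g. requests per minute) for a pool of 12 servers.
normal = allocate(12, {"quotes": 30, "policies": 30, "claims": 20, "payments": 20})
# After the hurricane: policy sales are switched off and claims dominate,
# but payments are guaranteed at least one server.
spike = allocate(12, {"quotes": 5, "policies": 0, "claims": 80, "payments": 15},
                 minimum={"payments": 1})
print(normal)
print(spike)
```

The total stays at twelve servers in both cases. Only the split between business functions changes, and that is the reallocation described above. A real framework would also need to drain and redeploy services safely, which this sketch ignores.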


This reconfiguration could save the company thousands of unhappy customers and, more importantly, ensure that its computing resources were used to the fullest when they were needed.

This concept is not entirely new. It has similarities to Cory Isaacson’s Software Pipelines and SOA, and to cloud computing in general. Some advanced organizations with very good engineering teams are able to achieve this scenario by making the most of technologies like Chef/Puppet in the cloud. However, it is not easily accessible to the common enterprise.

What if there were an application/services framework that facilitated this? I believe a solution like this would truly align the needs of the business with the IT capabilities of the enterprise.

Fixing T-Mobile’s Activation Form


I was in the process of porting my home phone number to Google Voice and using an Obi to connect it to my home phone. For those of you who have not heard of it yet, Google Voice offers free calling within the US and Canada and has many advantages, such as ringing multiple phone numbers. Obi, on the other hand, is a nice SIP client that can work with Google Voice and other VoIP providers and connect them to your house phone. You can purchase an Obi for about $50 on Amazon.

If you currently have landline service as part of a triple-play offering from Comcast, Verizon FiOS, etc., you cannot port your number directly to Google Voice. You need to first port your number to a wireless carrier and then to Google Voice.

I chose to do this with T-Mobile. I ordered a prepaid SIM card activation kit from Amazon for about $10. Then you need to put it into a spare phone and activate the number.

T-Mobile has an activation site at

It is fairly straightforward to go through. However, when you hit Step 5 to fund your prepaid account, the website drives you crazy. You literally cannot go forward because of a form validation error saying the Auto-Pay date is not set. Surprisingly, there is no such field on the form!

After pulling my hair out, I resorted to a little hack that helped me get through the process. Before I show you what you need to do, here is what is happening:

Behind the scenes, the form contains a hidden Automatic Payment field. So the server thinks you are trying to set up recurring payments while you are only trying to make a one-time payment.

What we need to do is get into the “hidden” fields of the form and change the value so the server realizes that we are not trying to set up a recurring payment.

Here is how to do it. Just open the page in Chrome (or IE):


Now hit the inspect element button to start looking at the form:


Hit Control + F to bring up the search box in the developer tools:


Now search for “autorefill”. You will see something similar to the following:


Double-click on the “autorefill” part of value="autorefill" and change it to value="autorefill1" by adding a number to the end.

Now fill in the rest of the form as you normally would and submit. You should be all set!

What’s your data strategy in the cloud?


A couple of years ago I wrote a blog post about Cost Oriented Architectures (COA) becoming one of the most important aspects of creating solutions in the cloud. In the cloud, everything within your application is metered and charged for, not just what crosses your domain boundaries.

A recent presentation by Jeremy Edberg on Reddit’s scalability journey also touches on this concept in the context of data gravity.

Have you thought about your company’s data gravity in the cloud? How do you architect your applications to minimize data transfer in the cloud while maximizing performance?


Below is a link to the High Scalability website, which summarizes the presentation very nicely.

High Scalability – Reddit: Lessons Learned from Mistakes Made Scaling to 1 Billion Pageviews a Month.

Windows Azure – TypeLoader Exception


I am working on a side project that uses Microsoft Azure. Although the experience has been pretty good so far, I ran into an interesting problem that took a while to fix.

After a basic JavaScript update to the site, I ended up receiving the following error:

Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information

After a couple of attempts, it turned out that some of the references added by NuGet were not set to “Copy Local”. Setting them to Copy Local fixed the problem.
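A small script can catch this before deployment. Below is a minimal sketch that assumes a classic (pre-SDK-style) `.csproj` in the MSBuild 2003 namespace. In that format, Visual Studio records Copy Local = True as a `<Private>True</Private>` element under each `<Reference>`. The script lists the references that do not set it explicitly. The function name and the sample project are hypothetical. A reference without `<Private>` is not always a problem, but it leaves the decision to MSBuild. MSBuild may skip copying an assembly that happens to be in the GAC on the build machine even though it is missing in the cloud.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Classic .csproj files use the MSBuild 2003 XML namespace (assumption: not SDK-style).
MSBUILD_NS = {"msb": "http://schemas.microsoft.com/developer/msbuild/2003"}


def references_not_copied_local(csproj_xml: str) -> list[str]:
    """Return the Include names of <Reference> items lacking <Private>True</Private>."""
    root = ET.fromstring(csproj_xml)
    missing = []
    for ref in root.iterfind(".//msb:Reference", MSBUILD_NS):
        private = ref.find("msb:Private", MSBUILD_NS)
        # Treat a missing or non-"true" <Private> element as "not Copy Local".
        if private is None or (private.text or "").strip().lower() != "true":
            missing.append(ref.get("Include", ""))
    return missing


# Hypothetical project fragment: one NuGet reference without Copy Local, one with it.
SAMPLE = r"""<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <Reference Include="System.Web.Mvc">
      <HintPath>..\packages\Microsoft.AspNet.Mvc\lib\net45\System.Web.Mvc.dll</HintPath>
    </Reference>
    <Reference Include="Newtonsoft.Json">
      <HintPath>..\packages\Newtonsoft.Json\lib\net45\Newtonsoft.Json.dll</HintPath>
      <Private>True</Private>
    </Reference>
  </ItemGroup>
</Project>"""

print(references_not_copied_local(SAMPLE))
```

To check a real project, pass the contents of the `.csproj` file (for example via `pathlib.Path(...).read_text()`) instead of the sample string.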

Lesson learned: do not count on environment-supplied components; make sure your deployment package is self-contained.