Wednesday 30 January 2013


Market Interest in Elastic Cloud Infrastructure Accelerates — "With enterprises, web application companies and service providers searching for true elastic infrastructure solutions, we are seeing increasing prospect and sales activity surrounding our Open Cloud System," said Michael Grant, CEO of Cloudscaling. "To support that growth," he continued, "we've made thoughtful additions to the leadership of our channels, engineering, enterprise sales and product management functions."

The above excerpt from CloudBlogs daily makes it all the more clear that organizations are increasingly seeking to adopt an Elastic Cloud infrastructure to cater to their scalability needs. As pointed out earlier, Elastic Cloud delivers higher performance under an extremely large number of requests, a load that typically becomes a bottleneck for conventional cloud-based operations.

Sunday 13 January 2013

Different Virtualization Paradigms

This post was a long time coming. Unlike the other posts, the source of its content is a web article rather than a YouTube video clip.

For my readers' reference, the link [1] is provided in the reference section below.

From [1], we can see that Virtualization manifests itself in four forms:


                                                       Fig. 1: Hardware Virtualization [1]

First and foremost, there is Hardware Virtualization, which is essentially emulation of the underlying hardware. The name is quite misleading, for it may suggest that the application running atop the hardware is not serviced by the hardware itself but instead by software that emulates the hardware. Although this interpretation is partially valid, what we need to remember is that the purpose of hardware virtualization is not so much to substitute the hardware with its emulator as to help application developers and designers test and debug their code and check its behavior in the target environment. This lets them perform preliminary tests even when the actual hardware is not available. We can see from Fig. 1 that multiple hardware virtual machine instances run on top of the hardware layer, each of which emulates a different hardware environment. For example, VM1 may emulate a system with 4 GB of physical memory and VM2 a system with 2 GB of physical memory.
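To make the phrase "software that emulates the hardware" a little more concrete, here is a toy sketch of my own (not taken from [1], and hugely simplified compared to any real emulator): a fetch-decode-execute loop in C that emulates an imaginary machine with one accumulator and two instructions.

#include <stdio.h>

/* A toy machine: two "instructions" operating on a single accumulator. */
enum { OP_HALT = 0, OP_ADD = 1, OP_PRINT = 2 };

int main()
{
    /* A tiny "program" for the emulated machine: add 5, add 7, print, halt */
    int program[] = { OP_ADD, 5, OP_ADD, 7, OP_PRINT, OP_HALT };
    int acc = 0;        /* emulated accumulator register */
    int pc  = 0;        /* emulated program counter      */

    for (;;) {                          /* fetch-decode-execute loop */
        int op = program[pc++];
        if (op == OP_ADD)
            acc += program[pc++];       /* operand follows the opcode */
        else if (op == OP_PRINT)
            printf("acc = %d\n", acc);
        else
            break;                      /* OP_HALT */
    }
    return 0;
}

The application code (the toy program) never touches real hardware directly; every instruction is serviced by the emulator, which is exactly the role hardware virtualization plays during testing.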


                                                          Fig. 2: Full Virtualization [1]

Secondly, we have Full Virtualization. For all my readers who have used VMware, this is exactly the category of virtualization that VMware falls under. Here, a program layer called the hypervisor or Virtual Machine Monitor (VMM) runs on top of the hardware layer, and several guest operating systems may be installed above this VMM layer. The idea here is not to emulate the hardware but to make the presence of multiple guests transparent to each of them. This means that each guest operating system believes it is the sole owner of the underlying hardware resources. What we need to understand here is that each guest operating system runs inside its own virtual machine; in other words, what we essentially mean by a virtual machine is a framework or abstraction that houses a single guest. The hypervisor is tasked with handling privileged instructions that require access to hardware resources which the guests do not actually own. Fig. 2 shows a VMM layer sitting between the hardware and the guest operating systems; it monitors and manages each of the guests above and coordinates their access to the underlying hardware in a manner that keeps the presence of multiple guests transparent to each of them.


                                                           Fig. 3: Paravirtualization [1]

Another virtualization technique is called Paravirtualization. This is similar to Full Virtualization, the main difference being that support for virtualization is built into the guest operating system itself. In other words, the guest operating system code is virtualization-aware and cooperates with the hypervisor to ease execution in a virtual environment. The thin cream strip shown in the guest operating system section of Fig. 3 represents the virtualization-aware code that has been added to each of the guests to enable them to cooperate with the VMM.


                                                           Fig. 4: OS level Virtualization [1]

Besides the three we have discussed above, there is also virtualization at the operating system level, where the operating system itself creates multiple isolated instances on top of a single kernel. In my opinion, this is closely related to the familiar notion of concurrent processes running in a system: the operating system can create new processes dynamically and then perform management tasks such as scheduling, resource allocation and commitment. We can have multiple instances of the same program and use these separate instances to service separate requests. This is precisely how a server handles multiple incoming requests using the same, single physical resource: it creates multiple logical instances of the single physical resource to create the illusion that it has not just one but several units of each resource.
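To connect this with the server example, here is a minimal sketch of my own (not from [1]) showing how a server process can spawn one worker process per "request" with fork() on a Unix-like system; each child is another logical instance of the same program sharing the one physical machine. Proper OS-level virtualization adds isolation of file systems, users and networks on top of this idea, but the resource-sharing principle is the same.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void handle_request(int request_id)
{
    /* stand-in for the real work done on behalf of one request */
    printf("worker pid %d handling request %d\n", (int)getpid(), request_id);
}

int main()
{
    for (int i = 1; i <= 3; i++) {      /* three incoming "requests" */
        pid_t pid = fork();             /* clone the current process */
        if (pid == 0) {                 /* child: one logical instance */
            handle_request(i);
            exit(0);
        }
    }
    while (wait(NULL) > 0)              /* parent reaps all workers */
        ;
    return 0;
}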

So what's the idea behind this post in this blog page? Well, to be frank with you, this post may be seen as a sister post of the previous one which introduced virtualization as a current IT trend. In this post, I compare and contrast the commonly seen manifestations of virtualization to better discern one from the other. One thing I can add here is that the fourth paradigm is what businesses use these days to reduce their maintenance costs. That said, I clarify yet again that the role of this post is only to present to you folks the more intricate, technical details of virtualization. That's it for now, stay tuned for more updates.

Reference:
[1] http://www.ibm.com/developerworks/library/l-linuxvirt/index.html

Friday 11 January 2013

Virtualization: Why do we need it?

We have lately been hearing a lot about Virtualization whenever there is talk about cloud computing. Although many of us may have used virtual machines in the past for a very different purpose altogether, most of us are not really sure what virtualization is and why it is so beneficial!

As usual, I post a link to a relevant YouTube video here to get you readers started:


The contents of the video are fairly complete, but I might as well expand on them here. Most businesses use a combination of application servers, web servers, image servers, document servers, audio and video servers and, not to forget, database servers.

Although contemporary web usage trends may suggest that all of the above-mentioned hardware infrastructure is being used well almost all the time, this is largely a myth, or more precisely, an ill-founded belief! Even if 75% of the hardware appears to be in use at any point in time, going by the average number of server requests recorded, the servers are still largely under-utilized. Hmm, it's a bit of a challenge to present this more convincingly, but I shall nevertheless give it a try!

What appears active to us is largely superficial. A server typically takes only about 1-10 ms to service each request, and if my estimate is flawed, I can only tell you that it should be even faster! Given this extremely short service time, the time the server machine is kept up and running is far greater than the time it actually spends servicing requests. This means a significant amount of energy is wasted per server just to keep it up and ever-ready to service requests as they arrive. I must reiterate that the cumulative energy wasted is actually quite high, considering that we use not one server for each purpose but a number of them for different purposes.
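To put a rough, made-up number on this (my own back-of-the-envelope figures, not measurements): a server receiving 10 requests per second and spending 5 ms on each is doing useful work for only 50 ms out of every second, i.e. about 5% utilization, even though it is powered on 100% of the time. The arithmetic in C:

#include <stdio.h>

int main()
{
    double requests_per_sec = 10.0;   /* assumed arrival rate       */
    double service_time_ms  = 5.0;    /* assumed time per request   */

    double busy_ms = requests_per_sec * service_time_ms;          /* ms of real work per second */
    printf("utilization = %.1f%%\n", busy_ms / 1000.0 * 100.0);   /* prints 5.0% */
    return 0;
}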

What we must remember here is that any effort to maximize server utilization is limited by the number of incoming requests. Even if you have done your best to ensure that a server spends a good fraction of its time servicing requests, utilization is still capped by how many requests the server actually receives at any point in time. So, how exactly do we eliminate this wastage and thereby maximize profits? The answer lies with virtualization.

Virtualization essentially means creating multiple logical instances of software or hardware on a single physical hardware resource. The technique simulates the available hardware and gives every application running on top of it the feeling that it is the sole owner of the resource; the details of the virtual, simulated environment are kept transparent from the application. As illustrated by the video, organizations may use this technique to do away with many of their physical servers and map their function onto one robust, always-on physical server. The advantages are reduced maintenance cost and reduced energy wastage, which is not very surprising: with fewer physical servers to look after, maintenance becomes much easier and cheaper, and since the energy wasted is a function of the number of physical servers, it is clearly much lower in a virtualized environment. Also, as far as desktop virtualization is concerned, the video points out that updates can be rolled out much sooner, because a single update is applied not to one client machine but to several instances at once.

Now, I am not extending the scope of this post to include the technical minutiae. This post is only targeted at enlightening the readers with regard to why exactly we need virtualization. The working details will be covered in a subsequent post which is due shortly :-P



Friday 4 January 2013

NetApp Filers and Amazon Direct Connect

This afternoon, I stumbled upon NetApp's Agile Data Infrastructure and the three 'I's associated with the same: Intelligent, Immortal and Infinite.

I checked out a YouTube video by NetApp TV where their sales representative talks about the Agile Data Infrastructure, which really wasn't too clear to me until that point. The video (linked below) answered a few questions but made me ask many more. It is really one of those days where you keep uncovering new problems as you unravel each mystery piecemeal, and then, when the day draws to a close, you heave a huge sigh of relief, look back at the long day and pat yourself on the back for having learnt a whole lot about contemporary IT trends.



So apparently, this team at NetApp provides storage solutions similar to Amazon S3 that promise efficient management of the monstrous amounts of business-critical data (which they refer to as intelligent), durability of the stored data in the sense that it will never be lost (immortal), and support for business growth by scaling storage along with the concomitant growth in business data (infinite).

Now let's delve into some technicalities. A NetApp storage solution is built around a NetApp filer or FAS (Fabric-Attached Storage) device, which in my opinion is nothing less than a full-fledged computer system. "Storage solution" it is called, and fairly misleading the name is, for it is practically a cabinet that houses a processor, battery-backed NVRAM (Non-Volatile Random Access Memory; I'll most likely have a post coming soon about this one too) and the actual physical devices that store the data, which could be SATA, Fibre Channel or SAS disk drives. The filer serves data over file-based protocols that range from FTP and HTTP, which most of us already know about, to the lesser-known NFS (not Need for Speed! Mega facepalm, if even for a moment you thought it was the EA game), CIFS and TFTP. The filer runs NetApp's Data ONTAP operating system, and I hear that the current version is ONTAP 8, which many businesses describe as simply stunning!

This dedicated storage solutions provider has tied the knot with Amazon, as evidenced by a blog post on their website, https://communities.netapp.com/community/netapp-blogs/tim/blog/2012/11/28/netapp-private-storage-for-amazon-web-services, which again made little sense to me at first, when the state of my mind could best be described as random ideas scattered hither and thither like an unsolved jigsaw puzzle. I thereby set off to join the dots and solve the puzzle, and I must say I am nearly there, at least 90% done, which is precisely why I am writing this post.

Businesses that use AWS for managing their colossal amounts of data can now use a two-fold strategy. They can continue to use AWS for managing a great proportion of their data in the public domain, but can also achieve a limited level of private control by caching the frequently accessed data in a privately managed NetApp filer. At this point, I want to draw a distinction between the business goals of NetApp and Amazon; both seem to be serving essentially the same goal, but believe me, they are not! NetApp aims at providing storage solutions for Big Data, but their mission statement clearly indicates that their solution is suited to privately managed, enterprise-level storage. This means that you, as their client, still need to hire experts to manage the data storage infrastructure, but you do not need to keep hiring more of them as your business grows, because the NetApp filer provides the inherent scalability to support your ever-growing business. Amazon storage solutions like S3, on the contrary, are storage and maintenance services that you pay for: you, as the client, only pay for the service and are not really in charge of any other associated support activity. Phew! The two seemed pretty close at first, but on deeper dissection they turn out to be quite distinct.

Now, as suggested by the link above, NetApp filers owned and managed by you can securely connect to the Amazon cloud infrastructure using the Amazon Direct Connect feature, which is essentially a dedicated link carrying VLAN traffic tagged along the lines of the IEEE 802.1Q standard. This again has a two-fold use. One, you can now move data in and out between the larger cloud, S3, and your private NetApp filer. The second benefit follows as a natural corollary of the first: since data can be moved in and out, you can unleash the high performance of the Amazon EC2 processing infrastructure even on your privately held NetApp filer data.
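For the curious, the 802.1Q mechanism mentioned above boils down to a 4-byte tag inserted into each Ethernet frame: a 16-bit TPID of 0x8100 followed by a 3-bit priority, a 1-bit drop-eligible flag and a 12-bit VLAN ID. Here is a small illustrative sketch in C of packing that tag; it has nothing to do with NetApp's or Amazon's actual implementation, it only shows the frame-tagging idea.

#include <stdio.h>
#include <stdint.h>

/* Pack the 16-bit Tag Control Information (TCI) field of an 802.1Q tag. */
static uint16_t make_tci(unsigned prio, unsigned dei, unsigned vlan_id)
{
    return (uint16_t)(((prio & 0x7) << 13) | ((dei & 0x1) << 12) | (vlan_id & 0xFFF));
}

int main()
{
    uint16_t tpid = 0x8100;               /* fixed EtherType for 802.1Q   */
    uint16_t tci  = make_tci(0, 0, 42);   /* priority 0, DEI 0, VLAN 42   */

    printf("TPID=0x%04X TCI=0x%04X VLAN=%u\n",
           (unsigned)tpid, (unsigned)tci, (unsigned)(tci & 0xFFF));
    return 0;
}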

For those of you still reading, let me tell you that the post is now over. If you didn't really understand it, I would suggest that you read the previous paragraph once again. Of course, for any help you need beyond that, I have a fairly ubiquitous presence these days :D. Just ping me on Facebook or LinkedIn or Twitter and I will get back to you.

The NetApp-Amazon alliance is definitely a healthy sign of more innovation to come in the cloud, and my belief that most of the solutions for businesses, both today and tomorrow, can only be found in the cloud has become even more entrenched!



Amazon Simple Storage Service or S3

We have previously discussed how critical data has become today with the fast growth of businesses. Businesses are willing to pay any price for the efficient management, maintenance and security of data, which has become a vital business asset. Services like AWS have smartly tapped into this opportunity to provide reliable, enterprise-wide data storage solutions.

Amazon S3, or Amazon Simple Storage Service, is a data management and maintenance feature of the Amazon cloud infrastructure popularly known as Amazon Web Services. S3 is aimed at providing reliable storage for astronomically vast amounts of business data. By reliable, I mean that the data will be available when it is needed and that the integrity of the business-critical data it houses is guaranteed. Security is a critical concern in cloud computing because the business enterprise premises and the data storage premises are disjoint; this is worrisome because critical and confidential data is maintained at a remote location that is accessible via the internet.

Amazon S3 guarantees the integrity of data as well as its 24x7 availability. It is an efficient storage solution that can intelligently scale up as the business grows. Storage has become a critical facet of IT solutions today, and what was for a long time a rather stagnant, neglected corner of high-performance computing is now pressing the IT giants to innovate at speed.

Thursday 3 January 2013

What really is Elastic Cloud?

I have lately been hearing more and more about Elastic Cloud and, as with all the other enterprise solutions I have already posted about in this blog, it took me a while to clearly distinguish it from the cloud infrastructure already discussed.

So what exactly is elastic cloud? With due credit to http://aws.amazon.com/ec2/, an elastic cloud infrastructure provides users of the cloud computing environment with greater flexibility and scalability. As businesses start up on the cloud, they need fewer virtual servers and fewer disks for the storage of business-critical data. As the businesses expand, they will need more resources, and the Elastic Cloud framework provides exactly this.

I can illustrate the above with an example. A social networking website like Facebook initially started off with tens of users, then hundreds, then thousands and so on. Currently, there are over a billion Facebook users, and I would estimate that at least half a billion of them are active and online at any time. They are active in that they keep posting status updates, pull updates from their friends onto their feeds, upload new pictures or simply browse through existing pictures. A whole lot of data is constantly being processed in these activities, and it would have been really difficult for a company like Facebook to plan and procure the hardware infrastructure to accommodate the needs of so many users well beforehand.

This is where the Elastic Cloud environment gives ever-growing companies like Facebook respite. In this framework, businesses pay as they use. So if Facebook had initially procured 1000 servers and 1000 disks and paid a monthly rental of USD 10,000 for them, it could later scale up its resources as the need grew by making additional payments. Elastic Cloud infrastructure providers like Amazon Web Services (AWS) facilitate exactly this scalability.

What we need to note here is that the program code should be kept independent of the number of resources, so that it is unaffected by a dynamic increase in them. For example, if the code has been written for 100 disks in a way that references disks by a fixed identifier, it may not accommodate a dynamic increase in storage, and care must be taken to avoid exactly this. That independence is what imparts elasticity to an Elastic Cloud, and services like AWS also shield the user's operating system and programming language from the underlying hardware infrastructure.
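As a toy illustration of that pitfall (my own example, not from any AWS documentation): the first function below bakes the number of disks into the code, so adding a disk means editing and recompiling; the second takes the current count at run time, so storage can grow without touching the code. The environment variable used here is purely hypothetical, a stand-in for a config file or an API call.

#include <stdio.h>
#include <stdlib.h>

#define NUM_DISKS 100                   /* rigid: adding disk 101 means recompiling */

static void store_block_rigid(int block_id)
{
    int disk = block_id % NUM_DISKS;    /* disk chosen by a hard-coded count */
    printf("rigid: block %d -> disk %d\n", block_id, disk);
}

static void store_block_elastic(int block_id, int disks_available)
{
    int disk = block_id % disks_available;   /* count supplied at run time */
    printf("elastic: block %d -> disk %d\n", block_id, disk);
}

int main()
{
    store_block_rigid(250);

    /* hypothetical source for the current disk count */
    const char *env = getenv("DISK_COUNT");
    int disks = env ? atoi(env) : 100;
    if (disks <= 0)
        disks = 100;                    /* fall back to a sane default */
    store_block_elastic(250, disks);
    return 0;
}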

I think I have missed another vital point. The Elastic Cloud is also intelligent enough to automatically scale up at times of heavy load, using a facility like Elastic MapReduce (again with due credit to Amazon): http://aws.amazon.com/elasticmapreduce/. This means that if between 4 AM and 7 AM you have only 100 users and the count jumps to a million past 7, the cloud will automatically deploy more processing capacity to meet the increased demand for computation.
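Here is a deliberately simplified sketch of the kind of rule such automatic scaling might follow (my own illustration of the idea, not Amazon's actual algorithm): the number of servers is recomputed from the current load, so the 4 AM trickle and the post-7 AM flood get very different allocations.

#include <stdio.h>

/* Decide how many servers a given load needs (purely illustrative). */
static int servers_needed(long active_users, long users_per_server, int min_servers)
{
    long n = (active_users + users_per_server - 1) / users_per_server;  /* ceiling division */
    return n < min_servers ? min_servers : (int)n;
}

int main()
{
    long users_per_server = 10000;   /* assumed capacity of one server */

    printf("4 AM, 100 users     -> %d server(s)\n",
           servers_needed(100, users_per_server, 1));
    printf("7 AM, 1000000 users -> %d server(s)\n",
           servers_needed(1000000, users_per_server, 1));
    return 0;
}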

What's strikingly impressive is the inherent flexibility. Services like AWS have made many new start-ups possible and enabled them to grow at their own pace without much initial planning. This has truly revolutionized businesses, which can now start very quickly without having to wait for the procurement and setup of hardware infrastructure. In my opinion, this is just the beginning; we are going to see many more similar innovations this decade. So stay alert and awake!


Wednesday 2 January 2013

Storage classes in C

A rather surprising yet monumental discovery of a concept in C that I knew little about. Storage classes are all about the lifetime of a variable and the extent to which its declaration is visible within a C program.

There are 4 storage classes in C:

1) auto: the default class when the programmer does not give any storage class specifier for a local variable; it is tantamount to a regular variable declaration inside a function.

e.g.: int a; and auto int a; inside a function are equivalent to each other

2) static: fairly intuitive to all Java programmers. A static variable has static storage duration: it is initialized only once, before the program starts running, and it retains its value across any number of function calls that follow. At file scope, static additionally restricts the variable's visibility to the current file.

e.g.

#include <stdio.h>

static int count = 10;          /* initialized once, before main() runs */

void function_A()
{
    count = count + 1;          /* the updated value persists across calls */
    printf("%d\n", count);
}

int main()
{
    for (int i = 0; i < 5; i++)
        function_A();
    return 0;
}

The output of the above program will read:
11
12
13
14
15

3) extern: This is one of the most interesting and least known C concepts for most Java programmers. A variable declared with the extern specifier may be accessed from a file other than the one in which it is defined. Such variables are global: extern does not create a new variable at all, it merely declares that a global variable defined elsewhere (typically in another file) is being used here.

e.g.

//Program 1 (first file)
#include <stdio.h>

int A = 0;                /* definition: the storage for A lives in this file */
void function_A(void);    /* prototype for the function defined in Program 2  */

int main()
{
    function_A();
    return 0;
}

//Program 2 (second file)
#include <stdio.h>

extern int A;             /* declaration only: A is defined in Program 1 */

void function_A(void)
{
    printf("A=%d\n", A);
}

The two files must be compiled and linked together for the reference to A to resolve.

4) register: Another cool concept that asks the compiler to keep a variable in a CPU register rather than in physical memory, letting you tap into the much higher access speed of registers. This does imply that the variable should fit in a register, and you also cannot take the address of a register variable. The storage class specifier has a fairly simple syntax, shown below:

{
    register int A;    /* hint to the compiler: keep A in a register if possible */
}


Hope this post was informative. Stay tuned for more updates. You may follow me on twitter. My twitter handle is @imancrsrk. You can also search for my science and tech group, GIC: Glean Info Club on facebook.