Infogain Perspectives

NOSQL = Not Only SQL by Tushar Jain

Tushar Jain is a Senior Solution Architect and has been with Infogain for three years. He has over thirteen years of performance-driven professional experience working in the IT, service and manufacturing arenas to deliver enterprise-class, N-tier, object-oriented integration, SOA and BPM solutions. He has used systems and software engineering experience to help customers successfully deliver and maintain their technical solutions. Tushar has educational training and professional experience in various aspects of the Object Oriented Engineered Software Project Life Cycle and the Structured Software Project Life Cycle. He has performed software and systems engineering, analysis, data modeling, design, development, integration, engineering management and deployment, including technical evaluations.


Structured vs Unstructured Data

The need to store and retrieve data has existed since the invention of computers and storage devices. Storage media have ranged from the iconic paper punch cards to today’s ubiquitous hard drives, NAS and SAN solutions and, more recently, virtualized storage, but all of these relate only to the physical aspect of storage. The data placed on these devices has historically been categorized into two types: structured and unstructured.

Structured data storage and retrieval was traditionally done through file access mechanisms such as sequential and random access files. Then came databases, which added a whole new chapter to this story.

More recently, with the advent of Web 2.0 technologies, data storage and management have taken on a new dimension, and the demands placed on data have necessitated innovative and radical departures from traditional mechanisms, which we explore in the following sections. For many of these workloads, the type and volume of data have outgrown the existing mechanisms for dealing with it.

Data Management in the Web 2.0 Era

With the growing penetration of electronic networks, managing and architecting structured and unstructured data is becoming more challenging and is changing the way data is stored and processed. At Web 2.0 businesses such as Google, Facebook and Twitter, it’s not unheard of to process terabytes and even petabytes of data each day. The architectural challenge of handling such masses of data while providing real-time or near real-time access to it has led to a broad movement to find alternatives to the relational data management systems prevalent in enterprise business applications.

In a typical software application, data can be classified into two groups on the basis of its age. The first class is data created in real time and accessed frequently at a given point in time. The second class is a collection of once real-time data along with master and configuration data. In addition, much of the data in today’s software systems is hierarchical, key/value or graph-structured, which makes it difficult to store, retrieve, update and process in a traditional relational database management system (RDBMS). These challenges often result in reduced performance, higher hardware, manpower and license costs, and scalability pains.
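As a rough illustration of why hierarchical data sits awkwardly in relational tables, the sketch below stores one nested record under a single key and then flattens the same record into row sets that would have to be joined again at read time. The profile fields, key format and table names are hypothetical, chosen only for this example, and a plain Python dict stands in for a real key/value store.

```python
import json

# Hypothetical user profile with nested, hierarchical data.
profile = {
    "user_id": "u42",
    "name": "Asha",
    "followers": ["u7", "u19"],
    "posts": [
        {"post_id": "p1", "text": "hello", "tags": ["intro"]},
        {"post_id": "p2", "text": "nosql notes", "tags": ["db", "scaling"]},
    ],
}

# Key/value style: the whole hierarchy is serialized under one key,
# so a single lookup returns the complete record.
kv_store = {}
kv_store["user:u42"] = json.dumps(profile)
loaded = json.loads(kv_store["user:u42"])

# Relational style: the same record is flattened into several row sets
# that must be joined again on user_id / post_id at read time.
users = [(profile["user_id"], profile["name"])]
followers = [(profile["user_id"], f) for f in profile["followers"]]
posts = [(profile["user_id"], p["post_id"], p["text"]) for p in profile["posts"]]
post_tags = [(p["post_id"], t) for p in profile["posts"] for t in p["tags"]]


def table_row_counts():
    """Number of rows each hypothetical table needs for this one profile."""
    return {
        "users": len(users),
        "followers": len(followers),
        "posts": len(posts),
        "post_tags": len(post_tags),
    }
```

One read by key returns the full profile in the first form, while the second form spreads it over four row sets. The trade-off is that the key/value form gives up ad hoc queries across records, such as finding every post with a given tag.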

To tackle such challenges, one of the emerging solutions is NOSQL, which the community interprets as Not Only SQL. NOSQL implementations focus on dynamic scalability, high availability, real-time access and storage virtualization while pushing features like consistency and transaction management down a few rungs on the priority ladder. As such, the NOSQL approach is applicable where the data store’s ACID requirements can be relaxed, join operations can be avoided and horizontal scaling is needed.

NOSQL systems often provide weak consistency guarantees, such as eventual consistency, and restrict transactions to single data items. Stronger, ACID-style guarantees can in some cases be layered on top by adding a middleware layer to a NOSQL system. By not providing relational capabilities, these systems avoid the costs associated with relational guarantees, which makes data storage much easier to scale.
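Eventual consistency can be pictured with a toy model: each replica accepts writes on its own, a read may return stale data for a while, and a background exchange later brings every copy to the newest value. The sketch below is a minimal illustration under assumed rules (integer versions, last-write-wins conflict resolution); the class and function names are hypothetical and it does not describe how any particular NOSQL product works.

```python
class Replica:
    """One copy of the data; each key stores a (version, value) pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Last-write-wins: keep the value carrying the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every entry to every replica so all copies converge."""
    for source in replicas:
        # Snapshot the items because writes may modify source.data.
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart:u42", ["book"], version=1)
b.write("cart:u42", ["book", "pen"], version=2)  # newer write reaches b only

stale = a.read("cart:u42")   # a still serves the older value
anti_entropy([a, b])
fresh = a.read("cart:u42")   # after the exchange, a has caught up
```

Note that each write touches a single key, which mirrors the restriction of transactions to single data items; nothing here coordinates an update across several keys.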

Several NOSQL systems employ a distributed architecture, with the data held redundantly on several servers, often using a distributed hash table. In this way, the system can be scaled out easily by adding more servers, and the failure of a server can be tolerated. The CAP theorem still applies: many such systems give up some consistency in order to remain available when servers or network links fail.
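One common way to build such a distributed hash table is consistent hashing: servers and keys are hashed onto the same ring, and each key is stored on the next few servers clockwise from its position. The sketch below is a simplified illustration, assuming unique server names, one ring position per server and a replication factor of two; the `HashRing` class and its method names are hypothetical.

```python
import bisect
import hashlib


def _hash(text):
    # Stable hash so placement does not change between runs.
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with a fixed replication factor."""

    def __init__(self, servers, replicas=2):
        self.replicas = replicas
        self.ring = sorted((_hash(s), s) for s in servers)

    def add_server(self, server):
        bisect.insort(self.ring, (_hash(server), server))

    def remove_server(self, server):
        self.ring.remove((_hash(server), server))

    def servers_for(self, key):
        """Return the servers that should hold copies of key, primary first."""
        if not self.ring:
            return []
        # Index of the first server at or after the key's ring position.
        start = bisect.bisect(self.ring, (_hash(key), ""))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = HashRing(["s1", "s2", "s3"], replicas=2)
owners = ring.servers_for("user:u42")  # two distinct servers hold this key
```

With this layout, adding a server only takes over the keys that fall just before its ring position, and when a server fails the next replica on the ring already holds a copy. Production systems usually add virtual nodes to even out the load, which this sketch omits.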

On the basis of storage medium, NOSQL systems can be classified as:

1. Data remains in memory: These systems are based on the premise that disk is no less risky than memory, and that memory spread across distributed, redundant machines provides a higher level of reliability and throughput. A few implementations of such systems are Memcached and GigaSpaces XAP.

2. Data is stored on disk: These storage solutions are typically based on both key/value pairs and distributed storage. Prominent implementations in this class are Bigtable, Dynamo and Cassandra.
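The difference between the two classes can be sketched with a pair of toy key/value stores: one keeps values in RAM and relies on copies held by peers, the other persists each value to a file. Both classes below are hypothetical illustrations; the disk variant uses Python’s standard `shelve` module purely as a stand-in for a real on-disk engine, and the peers are ordinary in-process objects, not remote machines.

```python
import os
import shelve
import tempfile


class MemoryStore:
    """Keeps values in RAM and copies each write to peer stores for redundancy."""

    def __init__(self, peers=()):
        self.data = {}
        self.peers = list(peers)

    def put(self, key, value):
        self.data[key] = value
        for peer in self.peers:
            peer.data[key] = value  # redundancy through replication

    def get(self, key):
        return self.data.get(key)


class DiskStore:
    """Persists values to a file, so they outlive the object that wrote them."""

    def __init__(self, path):
        self.path = path

    def put(self, key, value):
        with shelve.open(self.path) as db:
            db[key] = value

    def get(self, key):
        with shelve.open(self.path) as db:
            return db.get(key)


# In-memory: the value survives the loss of one node because a peer holds a copy.
backup = MemoryStore()
primary = MemoryStore(peers=[backup])
primary.put("session:1", "active")

# On disk: a fresh DiskStore pointed at the same file sees the earlier write.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "kv")
    DiskStore(path).put("session:1", "active")
    reread = DiskStore(path).get("session:1")
```

The in-memory store avoids disk I/O on every operation but depends on at least one replica staying up, while the disk store pays for I/O in exchange for durability on a single machine.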

What Lies Ahead?

Big names in the industry are using NOSQL implementations today. Some primary examples are Bigtable from Google, Cassandra from Facebook and Dynamo from Amazon. In fact, some Web 2.0 leaders have announced moves away from their MySQL implementations in favor of NOSQL systems, primarily for speed and scalability; Twitter and Digg have recently made such announcements. These applications are write-intensive, have real-time access requirements and must be able to scale dynamically to meet the needs of a worldwide user base processing potentially hundreds of millions of records each day.

That said, NOSQL solutions aren’t necessarily just for the big players; they lend themselves well to cloud computing environments. The economics of the open source options are much better than those of high-cost RDBMS solutions, and many small and medium businesses opting for cloud-based solutions do not have the high-end transaction processing requirements that call for all the bells and whistles of relational database systems.

Since NOSQL is optimized for environments with massive data stores and high-volume transactions accessing that data in real time, Telecom, Insurance, Banking and Retail are prime industry targets for early adoption. However, it’s unlikely that NOSQL will replace the enterprise RDBMS in the near future. Rather, matching data access and storage requirements to the appropriate data management solution will drive point adoption of NOSQL, as we are also seeing with the overall cloud movement.

Posted by Tushar Jain on 15 April, 2010