
Monday, 19 December 2016

How and Why To Bridge between SQL and NoSQL

SQL has long been synonymous with "database" for us. For any sort of data management, SQL has been our instinctive choice. However, the past decade saw the emergence of NoSQL, which gave rise to a fierce competition of preferences.
  
What haunts the mind of every aspiring database developer today is the question of choice: To SQL or NoSQL. We want to keep in touch with the latest trends in the technology, but don't want the established technologies to slip away either. However, the most basic point that most people seem to miss is this: SQL and NoSQL are not competitors, and most certainly not antonyms of each other.



SQL, or Structured Query Language, is the standard language of relational database management systems today. The relational model behind it stores data in tables called relations, which consist of tuples (rows) and attributes (columns). While this concept was a hugely successful improvement over the data-storage systems of its time, such as flat files, things have changed today.

NoSQL came as a breath of fresh air in an industry that was rapidly changing. The world is going digital, and the digital world is messy. We can never predict the volume, variety or velocity of incoming data. The data, apart from being unpredictable, is also often unstructured. Since relational databases are not inherently suited to handling such data, something else was required. At the same time, distributed computing is all the rage today, because most businesses are moving towards the cloud. Relational databases struggle to expand at that pace; thus, NoSQL entered the scene.


Why Migrate from SQL to NoSQL

Strictly speaking, NoSQL aims to do what SQL cannot. It is not based on relations, and some NoSQL databases do not even guarantee the ACID properties. Contrary to what you may have been taught, ACID properties, though really useful, are not the ultimate necessity. The ultimate necessity is fault tolerance, and NoSQL manages to achieve that anyway.

NoSQL cannot be defined in a single line, as there is no single definition. While SQL-based databases follow strict guidelines that adhere to SQL standards, NoSQL gives databases free rein. With so little standardization, one might wonder: are the reasons enough to migrate to NoSQL?

Yes, because we have only scratched the surface of NoSQL's importance in the modern world. The two biggest reasons why NoSQL trumps SQL are agility and scalability.

With the rapid changes that occur daily in the industry, being agile is the only way to survive. Relational databases, with their rigid schemas and complex development, struggle to achieve that. These rapid changes also come with growing data sizes, which require rapid scalability. However, horizontal scalability was not a design goal of SQL databases, which date from a time before the web existed. To cope with these issues, NoSQL seems like our best bet.


Why Bridge SQL and NoSQL

"Now that we know how NoSQL differs from SQL, the question arises: why bridge them? Why not adopt NoSQL altogether?"

Simply because NoSQL doesn't have the same penetration as SQL. A huge number of companies have their entire existing architecture based on relational databases, which would be quite a headache to change. But that doesn't mean that one has to remain stuck with SQL forever. The best option in such scenarios is to bridge the existing SQL framework with a NoSQL database. The benefit? Put simply, it brings out "the best of both worlds".

As far as the "bridging" goes, there is no single, simple way to do it. The easiest way would be to use third-party drivers, such as those from Easysoft, which provide ODBC-based bridging capabilities. However, as they come from a third-party vendor, they may carry their own security and licensing issues.

An alternative approach would be to develop languages that extend SQL functionality to NoSQL databases. One example is N1QL, introduced by Couchbase, which extends SQL to JSON documents.
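To make the idea of bridging more concrete, here is a minimal Python sketch of one possible approach: reading rows from a relational store and reshaping them into nested JSON documents that a document database could ingest. It is not N1QL and not any vendor's driver. SQLite stands in for the relational side, and the customers and orders tables, their columns and the customers_as_documents helper are all invented for this illustration.

```python
import json
import sqlite3

# Hypothetical relational data: table and column names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'desk');
""")


def customers_as_documents(conn):
    """Fold each customer row and its order rows into one nested document."""
    docs = []
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    for cust in customers:
        orders = conn.execute(
            "SELECT id, item FROM orders WHERE customer_id = ? ORDER BY id",
            (cust["id"],),
        ).fetchall()
        docs.append({
            "id": cust["id"],
            "name": cust["name"],
            # The one-to-many join becomes an embedded array.
            "orders": [{"id": o["id"], "item": o["item"]} for o in orders],
        })
    return docs


documents = customers_as_documents(conn)
print(json.dumps(documents[0]))
```

The join between the two tables turns into an embedded array inside each document, which is the kind of reshaping a bridge layer has to perform in one direction or the other.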

The ways to bridge the gap between these two technologies may differ and evolve; but we can all agree that co-existence of the two is best for the progress of industry.




Please share your thoughts on this topic. If you like this post, please share it on Google by clicking on the Google +1 button.

Read more on NoSQL (Not Only SQL) here: WhatisNoSQL

Wednesday, 14 December 2016

Unix vs Linux: What Is The Difference?



The two terms look similar, but there are significant differences between them.

Unix is a proprietary operating system that originated at AT&T's Bell Labs in 1969, although there are now free derivative versions. UNIX is usually favored for large-scale environments such as universities and big enterprises. The proprietary version today has a number of variants that developed over time but are mostly based on one of the original editions. A few of the top ones are Sun's Solaris, Hewlett-Packard's HP-UX, Mac OS X and IBM's AIX®.

“Linux is a free, open-source implementation of the same ideas as UNIX, behaving similarly but not a clone per se.”

The development of Linux started off with a desire to have a free alternative to Unix. The GNU Project, started in 1983, set out to build a free Unix-like system, and its tools were later combined with the kernel that Linus Torvalds began writing in 1991. Linux in itself is only a kernel, while Unix is a complete operating system with all components coming from a single source. Linux in conjunction with the GNU Project's software is a complete system, and the code is freely available.

A few popular Linux distributions (operating systems) are Red Hat Enterprise Linux, Debian, Fedora, SUSE Linux Enterprise and Ubuntu.




Understanding the Differences

Although they share the same foundations, Linux and Unix have a number of technical differences. Commercial Unix versions remain largely consistent because they follow published standards and retain established norms. Linux, on the other hand, is more diverse: different developers have produced different versions, modifying elements as required. This often makes it difficult for developers to switch between versions or keep track of changes.

Both software packages come with their own set of tools, firewall systems, backup software, and other applications.

A major difference is in filesystem support. Linux was created for the personal computer, but it is more flexible than UNIX, as it supports far more filesystem types. Commercial Unix versions usually support two or three filesystem types, whereas Linux supports almost all the filesystem types available under any operating system. This flexibility has made Linux an extremely popular and powerful tool. Not surprisingly, Linux is today used on a wide variety of hardware, ranging from mobile phones and video game systems to supercomputers.

Linux is available in numerous distributions, both free and paid. The paid versions, which are cheaper than commercial Unix, come with support, training and consultancy services. For Unix, a commercial license must be procured before deploying the software.


Read more about UNIX: Unix in Detail



Friday, 9 December 2016

7 Top Big Data Tools for Enterprise Developers


Big Data is now a critical technology utilized for leveraging data to enable better decision-making. Developers today have a wide variety of choice in picking a tool for their needs – open source or proprietary.

A thorough assessment of the existing data structure and the business requirements is essential for developers to identify the right tool. Predominant data formats, existing database types, available budget and the desired output analytics are some important factors. A few of the top tools that developers can consider are given below:



1.   MongoDB
This is an open-source platform that is heavily document-oriented and supports full-fledged indexing. Users can index any attribute and can also scale the data horizontally. It's a cross-platform tool that gives developers great control over final results.

Read more about this at Mongo DB-Features


2.   SAP HANA
A proprietary platform from tech giant SAP, HANA offers in-memory data storage that speeds up data processing and delivers more insightful data. It is flexible in handling all forms of data, including spatial, graph and text data, in real time, making it a powerful though expensive tool for holistic data analytics.


3.   Google Charts
A free charting service from Google, this has a wide range of capabilities for handling data on a website. Primarily used for visualization, it can be plugged into a website with a few lines of JavaScript. Developers can use it to create dashboards, carry out data management tasks and pull data from an external database, among other tasks.


4.   Hadoop
Hadoop is the Big Data tool that everyone has been talking about. An open-source framework, it can handle, store and process massive amounts of data. Thanks to its distributed computing model, data processing is fast and powerful. The tool is highly flexible and scalable, making it an ideal choice for Big Data analytics on enormous data sets.

Read more about this at Hadoop-Features


5.   Spark
Spark is a highly popular open-source data processing platform and is said to be the most active Apache open-source project in the Big Data space. It is an extremely fast (its developers claim up to 100x faster than Hadoop MapReduce) and flexible platform, enabling analysis of all forms of structured and unstructured data. Its advantages include support for multiple languages and access to other databases.


6.   Splice Machine
Splice Machine is a SQL-on-Hadoop database that analyses data in real time. It lets developers use standard SQL, which makes it flexible and easier to use.
Splice Machine, which is also ACID-compliant, is available on a freemium basis and has a list-price annual license fee of $5,000 per node.


7.   Splunk
It’s a popular IT search tool, used by many companies, that derives information from machine data. Put simply, Splunk can search, monitor and analyze machine-generated data such as the log data produced by applications, servers and network devices across an organization.

The product also indexes structured as well as unstructured data and helps in diagnosing problems, making it easy for administrators, business analysts and managers to find the information they need.

Read more at Splunk-Features

Thursday, 1 December 2016

Oracle’s DYN Acquisition Fits Into Its Goal To Become A Cloud Leader


Last week tech giant Oracle announced that it was acquiring DYN, the popular cloud-based DNS provider, for an undisclosed amount. Some reports have said that it could be in the region of $600-700 million.

DYN’s cloud-based platform manages and optimises the performance of internet applications and infrastructure by using analytics and intelligent routing. Its Internet performance and Domain Name System (DNS) solution is used by over 3,500 companies, including top digital brands like Netflix, Twitter and Reddit. On a daily basis, it handles over 40 billion traffic optimization decisions, making it one of the leading DNS service providers.

For Oracle, buying DYN offers an opportunity to challenge the current leaders of cloud computing, such as Google Cloud. Oracle currently lags significantly behind these companies in cloud computing market share, with a portfolio that is primarily limited to data centre systems.


The enterprise services leader does have a variety of Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) products, but adding DYN’s range of scalable services would give Oracle’s customers access to cutting-edge traffic optimization technologies, filling a gap that might have taken far longer to plug with organic product development.

Oracle has pegged it as a natural extension to its cloud solutions – a service that is the link between hosting data and incoming traffic, resulting in improved metrics for access and user satisfaction for its clients.

This latest acquisition is in keeping with Oracle's strategy of buying companies that have noteworthy products in cloud computing: it has in recent times acquired cloud-based applications firm LogFire, cloud access security broker Palerra and NetSuite, the integrated cloud business software suite.

The pattern of these acquisitions clearly shows Oracle's intent to move away from its legacy software-led business towards the cloud, which in recent times has significantly reshaped how businesses and IT infrastructure are built and run.

In fact, Oracle has a stated goal to become the first tech company to reach $10 billion in revenue from cloud business. The NetSuite deal alone is expected to add close to $1 billion in revenue, giving a boost to its cloud business.

"With the DYN acquisition, Oracle will surely be able to leapfrog into direct competition with the leaders of cloud computing, and make an attempt at taking the leadership position in the market."




Monday, 28 November 2016

TOP 6 BIG DATA TRENDS IN THE NEAR FUTURE

Big Data is a buzzword we are all familiar with now. But behind the buzz, there have been rapid developments which have changed business models and brought big data to the strategic foreground. 2016 has been a pretty eventful year for Big Data, and the future looks promising. Let’s take a look at the top trends to watch in the upcoming year:



1) CUSTOMER DIGITAL ASSISTANTS
One of the surprising trends we saw this year was the growing interest in digital assistants. The logic is simple: if we can gather and process data to generate meaningful results, why do we need humans to convey them to customers? The most devoted users are perhaps the gamers, who have fully accepted this technology on consoles like the Xbox One and Sony PS4. With advanced NLP and audio recognition, mobile digital assistants like Cortana, Siri and Google Now are almost must-haves today, and all signs indicate that digital assistants will play an even more important role in the upcoming year.



2) SIMPLER DATA ANALYSIS
As in many past years, data saw unprecedented growth in volume and variety. At this rate, current data analysis techniques will soon be obsolete. The upcoming trend in 2017 may therefore focus on simplifying the data analysis process, to the extent that even non-coders can easily analyze huge datasets. Giants like Microsoft and Salesforce are working on it, while tools that complement SQL, like Spark, will continue to make storage of and access to data easier.



3) MACHINE LEARNING IS THE FUTURE
Not long ago, machine learning was considered purely a research field. For the benefit of all, this perception soon changed, and today machine learning has dedicated departments in numerous companies. For business purposes, the idea of machine learning is to serve as an extension of predictive analytics, thereby minimizing the work and maximizing the profits. This trend will continue to be one of the top business strategies in the future.



4) DATA-AS-A-SERVICE
It took a long, long time, but today companies realize the importance of their data. This, in turn, is giving rise to an entirely new business model: data-as-a-service (DaaS). With IBM's acquisition of The Weather Company's digital assets, more tech giants might realize that their data can, in fact, be converted into a profitable service.



5) THE TRANSITION OF BIG DATA to “ACTIONABLE DATA”
Big data will continue to face its existing challenges, the most prominent being the manpower required to handle the ever-increasing volume. Privacy concerns will also continue to haunt the general perception of the increased use of Big Data. Amidst all that is a new question: why worry about big data when most companies only use a fraction of it anyway? The answer to this question is giving rise to a new trend of "actionable data", data that is relevant to the business. It is entirely possible that big data may be replaced by actionable data in the upcoming years.



6) INTERNET-OF-THINGS
One of the most revolutionary digital concepts of this century, IoT still fascinates the masses, even if its application continues to face hurdles. But the rise and success of IoT is inevitable. With devices rapidly becoming integral parts of our lives, IoT offers untold potential. While the initial cost of turning every device into a node in a vast digital world is pretty high, it is estimated that IoT will grow by 30% in the next 5 years, creating an economic value of $4-11 trillion by 2025.


Thursday, 20 October 2016

How to Drop Indexes/ Unique Indexes in Oracle?

There can be multiple situations where we don’t require indexes and have to drop them.

  • Sometimes it’s better to drop an index when it brings little performance gain for your table.
  • Once an index becomes invalid, you must drop it before rebuilding it.
  • If an index is too fragmented, it’s better to drop it and create a new one, since rebuilding an index requires twice the space of the index.

When you drop an index, all the extents of the index segment are returned to the containing tablespace, where they become available to other objects in that tablespace.

Below is the command to drop indexes:
SYNTAX : DROP INDEX [OWNER.]INDEXNAME
EXAMPLE:
SQL> DROP INDEX EMP_NAME_IDX;
INDEX DROPPED
SQL>


However, you can't use the DROP INDEX command to drop an implicitly created index, such as one created by defining a UNIQUE key constraint on a table. If you try to do so, Oracle raises an error.

SQL> DROP INDEX EMP_NAME_IDX;
DROP INDEX EMP_NAME_IDX
           *
ERROR AT LINE 1:
ORA-02429: CANNOT DROP INDEX USED FOR ENFORCEMENT OF UNIQUE/PRIMARY KEY


If you want to drop such an index you have to first drop the constraint defined on the table. In order to drop a constraint, issue the drop constraint command, as shown here:

SQL> ALTER TABLE EMP DROP CONSTRAINT emp_name_PK1;
TABLE ALTERED.
SQL>


You can query the ALL_CONSTRAINTS data dictionary view to find out which constraint uses the index:


SELECT OWNER, CONSTRAINT_NAME, CONSTRAINT_TYPE,
 TABLE_NAME, INDEX_OWNER, INDEX_NAME
FROM ALL_CONSTRAINTS
WHERE INDEX_NAME = 'EMP_NAME_IDX';
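The decision described in this post (plain DROP INDEX, or DROP CONSTRAINT when the index enforces a key) can be sketched as a small Python helper. This is only an illustration under stated assumptions: drop_statement is a hypothetical function, the rows are hand-written dictionaries shaped like the result of the ALL_CONSTRAINTS query above, and EMP_DEPT_IDX is an invented index name. It simply formats text and does not connect to Oracle or validate identifiers.

```python
def drop_statement(index_name, constraint_rows):
    """Return the statement needed to get rid of index_name.

    constraint_rows is a list of dicts shaped like rows of the
    ALL_CONSTRAINTS query (TABLE_NAME, CONSTRAINT_NAME, INDEX_NAME).
    """
    for row in constraint_rows:
        if row["INDEX_NAME"] == index_name:
            # The index enforces a constraint: drop the constraint instead.
            return "ALTER TABLE {} DROP CONSTRAINT {}".format(
                row["TABLE_NAME"], row["CONSTRAINT_NAME"])
    # No constraint depends on the index, so a plain DROP INDEX works.
    return "DROP INDEX {}".format(index_name)


# Hand-written sample row standing in for a real query result.
rows = [{"TABLE_NAME": "EMP",
         "CONSTRAINT_NAME": "EMP_NAME_PK1",
         "INDEX_NAME": "EMP_NAME_IDX"}]

print(drop_statement("EMP_NAME_IDX", rows))  # constraint-backed index
print(drop_statement("EMP_DEPT_IDX", rows))  # ordinary index
```

In a real script the generated statement should be reviewed before it is run, since dropping a constraint also removes the rule it enforced.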






Please go through similar Oracle Posts @DWHLAUREATE:



Saturday, 1 October 2016

Oracle Indexes Performance and Creation Guidelines

These guidelines will help you create and manage indexes, and improve performance through their correct usage.

DON’T ADD INDEXES NEEDLESSLY:
Adding indexes can improve query performance, but every index also consumes disk space. Add only as many indexes as the measured performance improvement justifies.

MARK INDEXES AS UNUSABLE OR INVISIBLE RATHER THAN DROPPING
Before dropping an index, consider marking it as unusable or invisible instead. This gives us a chance to check for performance issues before the index is actually dropped. If any appear, we can revert by rebuilding the index or making it visible again, without needing the data definition language (DDL) statement that created it.

You can read more about Invisible Indexes here:

It’s better to drop indexes that are no longer used, as this frees up physical space and improves performance.

INDEXING METHODOLOGY:
Indexing the columns that are used in queries executed against a table will help improve the performance.
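As a rough illustration of this guideline, the Python sketch below compares the optimizer's plan for the same query before and after an index is created on the filtered column. It uses SQLite from the Python standard library as a stand-in, not Oracle, so the plan wording differs from what Oracle's EXPLAIN PLAN would show; the emp table, its columns and the plan helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("emp{}".format(i), "dept{}".format(i % 10)) for i in range(1000)],
)


def plan(sql):
    """Return the optimizer's plan text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


query = "SELECT id FROM emp WHERE name = 'emp42'"

before = plan(query)  # no index on name yet, so the table is scanned
conn.execute("CREATE INDEX emp_name_idx ON emp (name)")
after = plan(query)   # the same query can now use emp_name_idx

print(before)
print(after)
```

The exact plan text depends on the SQLite version, but the first plan reports a scan of emp and the second names emp_name_idx.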

CREATE PRIMARY/UNIQUE CONSTRAINTS:
Build primary key constraints on all tables and unique constraints wherever applicable. This automatically creates a B-tree index if the columns are not already indexed.

USING SEPARATE TABLESPACE FOR INDEXES
Using a separate tablespace helps in managing indexes independently of tables. Table and index data may have different storage and/or backup and recovery requirements.

USE BITMAP INDEXES IN DATAWAREHOUSE ENVIRONMENT
Bitmap indexes are used for complex queries in a data warehouse environment to avoid long access and retrieval times. B-tree indexes suit high-cardinality columns, while bitmap indexes have predominantly been used for low-cardinality columns.

Bitmap indexes play an important role in answering data warehouse queries because they can perform operations at the index level before fetching data.
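The index-level operation described above can be pictured with a toy bitmap index in Python. This is a conceptual sketch only, not how Oracle stores or compresses its bitmaps: each distinct column value maps to an integer used as a bit string, and the sample rows, column values and build_bitmap_index helper are invented for the illustration.

```python
# Toy table: one row per employee. The values are invented for this sketch.
rows = [
    {"name": "Asha",  "gender": "F", "region": "EAST"},
    {"name": "Ben",   "gender": "M", "region": "WEST"},
    {"name": "Chen",  "gender": "F", "region": "WEST"},
    {"name": "Divya", "gender": "F", "region": "EAST"},
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = {}
    for position, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << position)
    return index


gender_idx = build_bitmap_index(rows, "gender")
region_idx = build_bitmap_index(rows, "region")

# WHERE gender = 'F' AND region = 'EAST' becomes a single bitwise AND,
# evaluated entirely on the index before any row is read.
matches = gender_idx["F"] & region_idx["EAST"]

# Only now are the matching rows fetched.
result = [rows[i]["name"] for i in range(len(rows)) if (matches >> i) & 1]
print(result)
```

Low-cardinality columns such as gender or region produce only a handful of bitmaps, which is why this style of index fits them better than columns with many distinct values.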

To learn more about Bitmap & B-tree indexes check our previous post


USE APPROPRIATE NAMING STANDARDS
Consistent naming standards make maintenance and troubleshooting easier.








Saturday, 17 September 2016

How to Create/Change/Set Databases in Hive?

As discussed in previous posts, HIVE makes it easier for developers to port SQL-based applications to Hadoop, compared with other Hadoop languages and tools.


Hive is best suited for data warehouse applications where data is static, fast response times are not required, and record-level inserts, updates and deletes are not needed.

Creating a Database
The simplest syntax for creating a database in Hive is shown in the following example.
Open the Hive shell by running the command sudo hive, then enter the command:


CREATE DATABASE <DATABASE_NAME>
EXAMPLE
HIVE> CREATE DATABASE HR_STAGING;

HIVE> CREATE DATABASE IF NOT EXISTS HR_STAGING;


Hive raises an error if the database hr_staging already exists; adding IF NOT EXISTS suppresses it. The general syntax for creating a database in Hive is given below. The keyword ‘SCHEMA’ can be used instead of ‘DATABASE’ when creating a database.



CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] DATABASE_NAME
  [COMMENT DATABASE_COMMENT]
  [LOCATION HDFS_PATH]
  [WITH DBPROPERTIES (PROPERTY_NAME=PROPERTY_VALUE, ...)]


The CREATE DATABASE command creates the database under HDFS at the default location: /user/hive/warehouse.
Hive creates a directory for each database. Tables in that database will be stored in sub directories of the database directory. The exception is tables in the default database, which doesn’t have its own directory.
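The directory layout described above can be sketched as a small Python function. It assumes the default warehouse location mentioned in this post and no LOCATION clause on the database; by default Hive names each database directory after the database with a ".db" suffix. The table_directory helper and the employees table name are hypothetical and used only for illustration.

```python
import posixpath

WAREHOUSE_ROOT = "/user/hive/warehouse"  # default location named in this post


def table_directory(database, table, root=WAREHOUSE_ROOT):
    """Return the HDFS directory where a managed table's files would live."""
    if database.lower() == "default":
        # Tables in the default database sit directly under the warehouse root.
        return posixpath.join(root, table.lower())
    # Every other database gets its own "<name>.db" directory.
    return posixpath.join(root, database.lower() + ".db", table.lower())


print(table_directory("HR_STAGING", "employees"))
print(table_directory("default", "employees"))
```

If a database is created with an explicit LOCATION, or the warehouse directory is configured differently, the paths will not follow this pattern.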

Use SHOW DATABASES to see the databases that already exist in Hive:

  

HIVE> SHOW DATABASES;
HR_STAGING

HIVE> CREATE DATABASE EMP_STAGING ;
HIVE> SHOW DATABASES;
HR_STAGING
EMP_STAGING

HIVE> SHOW DATABASES LIKE 'H.*';
HR_STAGING


Using a Database
The USE command sets a database as your working database, similar to changing working directories in a file system:

SYNTAX:
USE <DATABASE_NAME>
HIVE> USE HR_STAGING;
HIVE> USE DEFAULT;


Dropping Database in Hive
Syntax to drop a database:

HIVE> DROP DATABASE IF EXISTS HR_STAGING;


Hive won’t allow you to drop a database that still contains tables. In that case, either drop the tables first or append the CASCADE keyword to the command, which makes Hive drop the tables in the database first.


DROP (DATABASE|SCHEMA) [IF EXISTS] DATABASE_NAME
[RESTRICT|CASCADE];

HIVE> DROP DATABASE IF EXISTS HR_STAGING CASCADE;



Alter Database in Hive
You can set key-value pairs in the DBPROPERTIES associated with a database, or change its owner, using the ALTER DATABASE command. Other metadata, such as the database name, cannot be changed:

ALTER (DATABASE|SCHEMA) DATABASE_NAME
SET DBPROPERTIES (PROPERTY_NAME=PROPERTY_VALUE, ...); 

ALTER (DATABASE|SCHEMA) DATABASE_NAME
SET OWNER [USER|ROLE] USER_OR_ROLE;

HIVE> ALTER DATABASE HR_STAGING
SET DBPROPERTIES ('EDITED-BY' = 'XXXX');

