What is Big Data? It is a collection of data sets so large and complex that they
are difficult to process with traditional tools. The size of a single data set
can range from a few dozen terabytes to many petabytes. This data can be posts
to social media sites, digital pictures and videos, or any other information.
Apache Hadoop is a very popular solution for Big Data. To store Big Data, we can
use different kinds of storage, such as Amazon S3 and the Hadoop Distributed File
System (HDFS).
- Amazon S3 file system
What is S3?
Amazon S3 (Simple Storage Service) is an online file storage web service offered
by Amazon Web Services: an Internet hosting service designed to store user files.
Apache Hadoop file systems can be hosted on S3, and sites such as Tumblr,
Formspring, Pinterest, and Posterous have hosted their images on S3 servers.
S3 stores arbitrary objects (computer files) up to 5 terabytes in size. Objects
are organized into containers called buckets. S3 can store anything from web
application data to media files, and the data can be retrieved from anywhere on
the Web.
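To make the bucket/object idea concrete, here is a minimal in-memory sketch of S3-style storage: buckets hold objects addressed by a key, and each object is capped at 5 TB. The class `ToyObjectStore` and its methods are hypothetical names made up for illustration; this is not the AWS SDK or the real S3 API. The `s3://bucket/key` address format is the style used by tools such as the AWS CLI.

```python
MAX_OBJECT_SIZE = 5 * 1024 ** 4  # 5 TB (binary units), the per-object limit mentioned above


class ToyObjectStore:
    """Hypothetical in-memory model of S3-style storage (not the real S3 API).

    Buckets are top-level containers; inside a bucket, each object is
    stored under a string key.
    """

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: bytes}

    def create_bucket(self, name):
        if name in self._buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self._buckets[name] = {}

    def put_object(self, bucket, key, data):
        if bucket not in self._buckets:
            raise KeyError(f"no such bucket: {bucket!r}")
        if len(data) > MAX_OBJECT_SIZE:
            raise ValueError("object exceeds the 5 TB limit")
        self._buckets[bucket][key] = bytes(data)

    def get_object(self, bucket, key):
        # Raises KeyError if the bucket or key does not exist.
        return self._buckets[bucket][key]

    def address_of(self, bucket, key):
        # Same "s3://bucket/key" style used by tools such as the AWS CLI.
        return f"s3://{bucket}/{key}"


store = ToyObjectStore()
store.create_bucket("photos")
store.put_object("photos", "2013/cat.jpg", b"\xff\xd8\xff")
print(store.get_object("photos", "2013/cat.jpg"))
print(store.address_of("photos", "2013/cat.jpg"))
```

The key is just a string, so a "folder" such as `2013/` is only part of the object's name, not a real directory.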
- Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable
file system written in Java for the Hadoop framework.
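HDFS is designed to store very large amounts of data by distributing the storage
and computation across many servers. HDFS stores large files as a sequence of
fixed-size blocks, so the ideal file size is a multiple of the block size: 64 MB
by default in early Hadoop releases (Hadoop 2 raised the default to 128 MB).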
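The sketch below shows how a file maps onto fixed-size blocks, and why a size that is an exact multiple of 64 MB leaves no partially filled final block. It also spreads each block's replicas over several servers to illustrate how storage is shared. The function names are hypothetical, and the round-robin placement is a simplification: real HDFS uses a rack-aware placement policy. The replication factor of 3 matches the HDFS default.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size mentioned above


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes occupies."""
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    full, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full
    if remainder:
        blocks.append(remainder)  # the final block is only partly filled
    return blocks


def place_blocks(num_blocks, servers, replication=3):
    """Hypothetical round-robin placement of each block's replicas.

    Each block gets `replication` copies on distinct servers. This is a
    simplification, not HDFS's actual rack-aware placement policy.
    """
    if replication > len(servers):
        raise ValueError("not enough servers for the replication factor")
    return {
        b: [servers[(b + r) % len(servers)] for r in range(replication)]
        for b in range(num_blocks)
    }


MB = 1024 * 1024
blocks = split_into_blocks(200 * MB)  # 3 full 64 MB blocks + one 8 MB block
print([size // MB for size in blocks])
print(place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

A 200 MB file ends with an 8 MB block, while a 128 MB file fills exactly two blocks.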
Below are some of the
organizations that are using Hadoop.
- Facebook
- Yahoo
- Amazon.com
- LinkedIn
- StumbleUpon
- Twitter
- Google and many more companies…