Nowadays almost all companies use Cloud platforms, whatever technology they work with, and there are many reasons for this. In this blog we will see what Cloud computing is, which services it provides, which Cloud service providers are on the market, and why Cloud is used for Big data processing.
What is Big data processing?
A traditional RDBMS can process data, but only up to a certain volume; beyond that it can no longer process the data quickly. As the data grows, it is often moved to a data warehouse, where historical data is stored, but a data warehouse is designed for storing and querying historical data rather than for large-scale processing. Processing data sets of terabytes (TBs) or petabytes (PBs) needs more than a traditional RDBMS installed on a single server, and a single processor cannot handle that much data on its own. Hadoop was developed to solve this problem. Hadoop is a framework that processes data on a cluster of computers: it distributes chunks of the data across the machines, processes each chunk in parallel, and finally aggregates the results. Hadoop's MapReduce engine stores both the data and the intermediate results on disk, which makes processing slow. Spark was developed to address this: it keeps intermediate data in memory, which is much faster to read and write than disk, so most of the disk I/O is avoided.
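To make the "distribute chunks, process them in parallel, then aggregate" idea concrete, here is a minimal single-machine sketch in Python that counts words in the MapReduce style. It is not Hadoop or Spark code: threads stand in for cluster nodes (they show the structure, not the speed-up), and the hypothetical `spill_to_disk` flag loosely mimics Hadoop writing intermediate results to disk, while the default keeps them in memory, closer to what Spark does.

```python
import json
import os
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, num_chunks):
    """Split the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_workers=4, spill_to_disk=False):
    """Hypothetical helper: MapReduce-style word count on one machine."""
    chunks = split_into_chunks(lines, num_workers)
    # Threads stand in for cluster nodes, each processing one chunk.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))

    if spill_to_disk:
        # Hadoop-like: write each intermediate result to disk, then read it back.
        with tempfile.TemporaryDirectory() as tmp:
            paths = []
            for i, partial in enumerate(partials):
                path = os.path.join(tmp, f"part-{i}.json")
                with open(path, "w", encoding="utf-8") as f:
                    json.dump(partial, f)
                paths.append(path)
            partials = []
            for path in paths:
                with open(path, encoding="utf-8") as f:
                    partials.append(Counter(json.load(f)))

    # Reduce step: aggregate the partial counts into one final result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = ["the cat sat", "the dog sat", "a cat ran"]
print(word_count(lines, num_workers=2))
```

Both modes return the same counts; the only difference is the extra write-and-read round trip through disk, which is the kind of cost Spark reduces by keeping intermediate data in memory.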
What is Cloud computing?
To process data in a Big data environment, we need a lot of hardware, and all of it has to be set up for processing. We also need to carry out regular maintenance, such as applying patches and upgrades, which requires staff available 24/7. Cloud computing lets us avoid this. Several Cloud service providers on the market give us remote access to clusters with all the software pre-configured, so we don't need to install software, maintain the machines, or apply patch upgrades ourselves.
Cloud service providers
In the early days of Cloud computing, Amazon's Cloud platform, AWS, dominated the market. More recently, many other organizations have launched their own Cloud services, including Google Cloud Platform (GCP), Microsoft Azure, and IBM Cloud.
Why is Cloud computing used in Big data?
Big data and Cloud computing work very efficiently together, for several reasons:
Advanced analysis - Cloud platforms provide both hardware and software as services, including storage, so you don't need to buy hardware or install and configure software before you start analyzing data.
Scalability - In a Big data environment, users cannot know in advance how much data they will need to process, and sometimes the volume grows suddenly. A non-Cloud cluster that was sized for the old volume may then not be large enough, whereas on a Cloud platform we can simply resize the cluster and use it.
Lower maintenance cost - Companies don't need to buy large amounts of hardware to set up a cluster, or install and configure software on it. Choosing a Cloud platform therefore reduces both the purchase cost and the maintenance cost of all these activities.
Security - Security is the first concern when dealing with enterprise data, because of growing threats and cyber attacks aimed at stealing it. Cloud platforms provide strong security controls, for example by requiring authorization keys and certificates to access data.
Storage - Cloud platforms solve the problem of storing large volumes of data, because they provide practically unlimited storage.
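The Scalability point above can be illustrated with a back-of-the-envelope sizing formula. The sketch below is a hypothetical estimate, not any provider's sizing rule: it assumes each byte is stored `replication` times (HDFS defaults to 3 copies) and that a `headroom` fraction of each node's disk is kept free for intermediate data.

```python
import math


def nodes_needed(data_tb, storage_per_node_tb, replication=3, headroom=0.25):
    """Hypothetical estimate of worker nodes needed for a given data volume.

    Assumptions (for illustration only):
    - every byte is stored `replication` times,
    - `headroom` is the fraction of each node's disk kept free.
    """
    if data_tb < 0 or storage_per_node_tb <= 0:
        raise ValueError("data size must be >= 0 and node capacity > 0")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = storage_per_node_tb * (1 - headroom)
    return max(1, math.ceil(data_tb * replication / usable_per_node))


# Data volume grows from 10 TB to 40 TB on nodes with 8 TB of disk each.
print(nodes_needed(10, 8))  # 5
print(nodes_needed(40, 8))  # 20
```

With these illustrative numbers, a sudden jump from 10 TB to 40 TB means going from 5 to 20 nodes. On a Cloud platform that is a change to the cluster size; on a non-Cloud cluster it means buying and installing new hardware first.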
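The Security point above mentions authorization keys. One common way keys are used is request signing: the client signs each request with a secret key, and the server recomputes the signature before granting access. The sketch below is a generic HMAC-SHA256 scheme built on Python's standard `hmac` module; it is not the actual protocol of any Cloud provider, and the key, paths, and function names are hypothetical.

```python
import hashlib
import hmac
import time


def sign_request(secret_key: bytes, method: str, path: str, timestamp: int) -> str:
    """Client side: sign the request details with a shared secret key."""
    message = f"{method}\n{path}\n{timestamp}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify_request(secret_key: bytes, method: str, path: str, timestamp: int,
                   signature: str, max_age_s: int = 300, now=None) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > max_age_s:
        return False  # reject stale requests to limit replay attacks
    expected = sign_request(secret_key, method, path, timestamp)
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-secret-key"
ts = 1_700_000_000
sig = sign_request(key, "GET", "/data/sales.csv", ts)
print(verify_request(key, "GET", "/data/sales.csv", ts, sig, now=ts + 10))
```

Because the secret key never travels with the request, a request for a different path, a request signed with the wrong key, or an old request replayed later fails verification.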
Please let us know in the comments what you think of this post.