All you need to know about Hadoop

1) Hadoop and Big Data

i) What is Big data?

– Big data is a marketing term, not a technical one. Everything is big data these days.
– Big data is usually characterized by three Vs:

a) Volume – data is now collected in very large amounts
b) Velocity – the speed at which data arrives and has to be accessed and processed
c) Variety – data comes in all formats: structured, semi-structured and unstructured, including log files, pictures, audio files, communication records and email.

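Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it. (Dan Ariely)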

ii) What is Hadoop?

Hadoop has two core components:
a) Storage – HDFS (Hadoop Distributed File System), an open-source distributed data store
b) Processing – the MapReduce API

Definition: Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing.

– Hadoop is not a database. It is an alternative, distributed file system with a processing layer on top.
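To make the processing half more concrete, here is a minimal sketch of the MapReduce idea as a word count in plain Python. It does not use Hadoop or its Java API at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration, and everything runs in one process instead of on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

In a real cluster the map calls run on the machines that hold the data blocks and the reduce calls run in parallel per key; the sketch only shows the shape of the computation.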

2) How did Hadoop get here?

– Hadoop was created by Doug Cutting. He was working on the Nutch project, an open-source web search engine. The main goal was to find a way to return web search results faster by distributing data and calculations across different computers so that multiple tasks could be accomplished simultaneously. Around the same time, another search engine project called Google was working on the same concept.

– In 2006, Cutting joined Yahoo and took with him the Nutch project as well as ideas based on Google’s early work with automating distributed data storage and processing. In 2008, Yahoo released Hadoop as an open-source project.

Fun fact: Hadoop was the name of a yellow toy elephant owned by Doug Cutting's son.

3) When should you use Hadoop?

a) When the volume of data is huge
b) When the data is unstructured
c) When the data is non-transactional – written once and read many times
d) When you work with behaviour data – observational information collected about users' actions and activities. A good example is Flipkart's product recommendations.

4) When not to use Hadoop?

a) You require random, interactive access to data
b) You have a small dataset, or a large number of small files
c) You want to store sensitive data
d) You need to process data in real time

5) How does data get into Hadoop?

There are numerous ways to get data into Hadoop. Here are just a few:

a) Using a Java program that writes data to HDFS
b) Using shell scripts or commands
c) Using Sqoop to import structured data from a relational database into HDFS, Hive and HBase
d) Using Flume to continuously load data from logs into Hadoop.
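As a rough sketch of options b) and c), the snippet below builds the command lines for an HDFS upload and a Sqoop import. The helper names, file paths, JDBC URL and table name are all hypothetical; only the commands themselves (hdfs dfs -put, and sqoop import with --connect, --table and --target-dir) are real. The code only builds and prints the commands, so it runs without a Hadoop installation.

```python
import shlex


def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list that copies a local file into an HDFS directory."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def sqoop_import_command(jdbc_url, table, target_dir):
    """Build the argument list that imports one relational table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    # All paths, hosts and names below are placeholders for illustration.
    put = hdfs_put_command("/tmp/access.log", "/data/logs")
    imp = sqoop_import_command("jdbc:mysql://dbhost/shop", "orders", "/data/orders")
    print(" ".join(shlex.quote(arg) for arg in put))
    print(" ".join(shlex.quote(arg) for arg in imp))
```

On a machine where the Hadoop and Sqoop clients are installed, either list could be passed to subprocess.run(cmd, check=True) to actually execute it.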

6) Hadoop Ecosystem

a) Pig – a platform for manipulating data stored in HDFS. It consists of a compiler for MapReduce programs and a high-level language called Pig Latin. It provides a way to perform data extractions, transformations and loading, and basic analysis without having to write MapReduce programs.

b) Hive – a data warehouse system with an SQL-like query language that presents data in the form of tables. Hive programming is similar to database programming.

c) HBase – a non-relational, distributed database that runs on top of Hadoop. HBase tables can serve as input and output for MapReduce jobs.

d) ZooKeeper – an application that coordinates distributed processes.

e) Ambari – a web interface for managing, configuring and testing Hadoop services and components.

f) Flume – software that collects, aggregates and moves large amounts of streaming data into HDFS.

g) Sqoop – a connection and transfer mechanism that moves data between Hadoop and relational databases.

h) Oozie – a Hadoop job scheduler.

You can see the full list of Apache Hadoop projects on the official website.

PHP 7 release – are there major syntax changes?

Were you surprised to see PHP 7 announced right after the 5.x (5.6) versions?

Here is the story – Andrei Zmievski initiated a project to implement native Unicode support throughout PHP. It was planned for release as PHP 6 along with other new features. However, this development was abandoned, so there was an intense debate about the name of the next major version.

Finally they decided to vote and settled on the name PHP 7. The arguments and the voting results are available in the document – Name of Next Release of PHP

The whole feature set of PHP 7 is not defined yet, but we can already look at the major changes.

Top Features for PHP 7

1) Huge Performance Improvements

phpng (PHP next generation) – The main goal is to improve performance so that it at least matches what Facebook's HHVM provides.

Zeev Suraski wrote an article in which he clearly treats HHVM as a competitor.

I believe it will be good enough to beat the speed of HHVM.

2) JIT Engine

According to Dmitry Stogov of Zend, the development of PHPNG was started with the motivation to research the implementation of a JIT engine for the Zend Engine based PHP.

A JIT engine can dynamically compile Zend opcodes into native machine code that eventually would make the code run faster next time it is run.

3) AST (Abstract Syntax Tree)

Another change that could boost performance. An AST is an intermediary step in the PHP compilation process.

An AST would provide several advantages described in the proposal, including the potential for more optimizations that would make PHP run even faster.
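To show why an AST makes optimizations easier, here is a small sketch that uses Python's own ast module (not PHP's compiler) to fold constant arithmetic before the code is run. The ConstantFolder class and the fold helper are names invented for this example, and ast.unparse needs Python 3.9 or newer.

```python
import ast


class ConstantFolder(ast.NodeTransformer):
    """Replace + and * between two numeric literals with the computed result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the child expressions first
        left, right = node.left, node.right
        if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            a, b = left.value, right.value
            if isinstance(a, (int, float)) and isinstance(b, (int, float)):
                if isinstance(node.op, ast.Add):
                    return ast.copy_location(ast.Constant(value=a + b), node)
                if isinstance(node.op, ast.Mult):
                    return ast.copy_location(ast.Constant(value=a * b), node)
        return node  # anything else is left untouched


def fold(source):
    """Parse an expression, fold its constants, and return the new source text."""
    tree = ast.parse(source, mode="eval")
    folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(folded)


if __name__ == "__main__":
    print(fold("x + 2 * 3 * 4"))
```

Because the optimizer works on a tree rather than on raw text or opcodes, it can spot that 2 * 3 * 4 is a constant subtree and replace it with a single value, while leaving x alone.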

4) Asynchronous Programming

Facebook's Hack language already implements asynchronous programming, which is pushing the PHP core development team to integrate an asynchronous programming feature in PHP 7.

An event loop is the part of the code that takes care of handling events related to I/O operations and other asynchronous tasks that may be going on in parallel, such as access to files, the network, databases, timers, etc.

In simpler terms, this would allow future PHP versions to easily support the execution of parallel tasks within the same request, thus pushing PHP's performance improvement potential to a totally different level.
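The event loop idea can be sketched with Python's asyncio, used here only as a stand-in because PHP 7's own API for this was not defined. The fake_io coroutine is a made-up placeholder for a database, file or network call; the three waits overlap, so one request takes roughly as long as its slowest task rather than the sum of all three.

```python
import asyncio
import time


async def fake_io(name, delay):
    """Placeholder for a slow I/O call; it only sleeps without blocking the loop."""
    await asyncio.sleep(delay)
    return name


async def handle_request():
    """Start three I/O tasks at once and wait for all of them to finish."""
    return await asyncio.gather(
        fake_io("database", 0.2),
        fake_io("file", 0.2),
        fake_io("network", 0.2),
    )


def run():
    """Run one simulated request and return its results and elapsed seconds."""
    start = time.perf_counter()
    results = asyncio.run(handle_request())
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run()
    print(results, round(elapsed, 2))
```

Run one after another, the same three calls would take about 0.6 seconds; with the event loop they finish in a little over 0.2 seconds.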


5) Standalone Multi-threading Web Server

This is what would make PHP more scalable. PHP can already be run from multi-threaded Web servers like nginx, lighttpd or even Apache in worker mode; however, that is not the same as having PHP run on its own multi-threading Web server.

A multi-threading Web server can handle many simultaneous requests using a single memory pool, thus avoiding the memory waste that happens when you run PHP as FastCGI or in Apache pre-fork mode.

Although running PHP as a standalone multi-threading Web server is not yet in the plans for PHP 7, it is certainly something good to have, at least for PHP 8.

When is the PHP 7 release date?

Different people estimate it will take between 1 and 3 years. A reasonable guess is to expect a final PHP 7 release some time in 2016, although it is not impossible that early alpha versions will appear by mid-October 2015.

Conclusions

We should thank Facebook for making the PHP core developers wake up and move faster to integrate these great features from HHVM and the Hack language. PHP 7 will certainly be exciting. Keep up with all the CodePlateau social accounts to stay in touch with the latest news about Web development.

Resource: PHPClasses