
Thursday, June 21, 2012

Secondary Namenode - What does it really do?

The Secondary Namenode is one of the most poorly named components in Hadoop. The name suggests that it is a backup for the Namenode, but in reality it is not. Many beginners in Hadoop get confused about what exactly the Secondary Namenode does and why it is present in HDFS. So in this blog post I will try to explain the role of the Secondary Namenode in HDFS.

From the name, you might assume that it has something to do with the Namenode, and you would be right. So before we dig into the Secondary Namenode, let us see what exactly the Namenode does.

Namenode

The Namenode holds the metadata for HDFS, such as namespace information and block information. While in use, all of this information is kept in main memory, but it is also written to disk for persistent storage.




The image above shows how the Namenode stores this information on disk. There are two different files:
  1. fsimage - the snapshot of the filesystem taken when the Namenode started
  2. Edit logs - the sequence of changes made to the filesystem after the Namenode started (the toy sketch below shows how the two fit together)
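Here is that sketch: plain Java with no Hadoop dependencies, where the paths and operations are purely illustrative and nothing like the real binary on-disk format. It only models the essential idea: replaying a change log over a snapshot yields the current state of the namespace.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model of the fsimage/edit-log idea. The real files are binary and
    // far richer; this only illustrates "snapshot + replayed changes".
    public class FsImageToy {
        public static void main(String[] args) {
            // "fsimage": the namespace snapshot from the last Namenode start,
            // here reduced to path -> size.
            Map<String, Long> fsimage = new HashMap<>();
            fsimage.put("/user/data/file1", 128L);

            // "edit log": every mutation since the snapshot, in order.
            List<String[]> editLog = new ArrayList<>();
            editLog.add(new String[]{"ADD", "/user/data/file2", "256"});
            editLog.add(new String[]{"DELETE", "/user/data/file1", ""});

            // Replaying the edits over the snapshot yields the current
            // namespace, which is exactly what happens on a restart.
            for (String[] op : editLog) {
                if (op[0].equals("ADD"))    fsimage.put(op[1], Long.parseLong(op[2]));
                if (op[0].equals("DELETE")) fsimage.remove(op[1]);
            }
            System.out.println("Current namespace: " + fsimage);
        }
    }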

The edit logs are applied to the fsimage only when the Namenode restarts, which is how it obtains the latest snapshot of the filesystem. But Namenode restarts are rare in production clusters, which means the edit logs can grow very large on a cluster where the Namenode runs for a long period of time. In that situation we run into the following issues:

  1. The edit log becomes very large, which makes it challenging to manage
  2. A Namenode restart takes a long time, because a lot of changes have to be merged
  3. In the case of a crash, we can lose a huge amount of metadata, since the fsimage is very old

To overcome these issues, we need a mechanism that keeps the edit log at a manageable size and keeps the fsimage up to date, so that the load on the Namenode is reduced. It is very similar to a Windows restore point, which lets us take a snapshot of the OS so that if something goes wrong, we can fall back to the last restore point.

So now we understand what the Namenode does and the challenge of keeping its metadata up to date. What does all of this have to do with the Secondary Namenode?

Secondary Namenode

The Secondary Namenode helps overcome these issues by taking over the responsibility of merging the edit logs with the fsimage from the Namenode.


The figure above shows how the Secondary Namenode works:

  1. It gets the edit logs from the Namenode at regular intervals (configurable; see the sketch after this list) and applies them to the fsimage
  2. Once it has a new fsimage, it copies it back to the Namenode
  3. The Namenode uses this fsimage on its next restart, which reduces the startup time
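In Hadoop 1.x, two properties control that checkpoint schedule: fs.checkpoint.period, the number of seconds between two checkpoints (one hour by default), and fs.checkpoint.size, the edit log size in bytes that forces an early checkpoint (64 MB by default). A minimal sketch that reads these settings, assuming the Hadoop 1.x client jars (hadoop-core) are on the classpath:

    import org.apache.hadoop.conf.Configuration;

    // Minimal sketch: read the Secondary Namenode checkpoint settings.
    // Assumes Hadoop 1.x property names and hadoop-core on the classpath.
    public class CheckpointSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Seconds between two consecutive checkpoints (default: one hour).
            long period = conf.getLong("fs.checkpoint.period", 3600);
            // Edit log size in bytes that triggers an early checkpoint.
            long size = conf.getLong("fs.checkpoint.size", 64 * 1024 * 1024);
            System.out.println("Checkpoint every " + period
                    + " seconds, or once the edits reach " + size + " bytes");
        }
    }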

The whole purpose of the Secondary Namenode is to provide a checkpoint in HDFS. It is just a helper node for the Namenode, which is why it is also known as the checkpoint node within the community.

So now we understand that all the Secondary Namenode does is put a checkpoint in the filesystem, which helps the Namenode function better. It is not a replacement or backup for the Namenode. So from now on, make a habit of calling it the checkpoint node.

Tuesday, August 23, 2011

Hadoop workshop : First success story


We completed our first Hadoop workshop on 20th August with great success. This post summarizes some of the insights and feedback we got from the event.

People love to learn a hot new technology. Many people are interested in learning Hadoop but just did not have the right place to start, and I think our workshop gave them the right platform to kick-start their journey into Hadoop. We sold all 17 tickets to the event within a few days. The next workshop has sold out as well, and tickets for the third are already selling. Yeah, it's on fire! We are keeping the workshops small so that we can gather feedback and improve the overall experience.

Out of the 17, twelve people attended the workshop. Participants thoroughly enjoyed the interactive sessions and said that the hands-on labs were great. The labs went as planned and gave the participants an insight into Hadoop and map/reduce. In their own words, here is what people said:

“Great work by small company having effective people...Impressed! I want to have the same training once again” -Vijesh
“Good and Interactive sessions delivered. Nice job by Madhu and company” -Devang Gandhi
“Hands-on trainings were good” -Uma Mahewari
“Content delivery was very good” -Puneetha

With this kind of positive response, we are charged up to host more workshops. We have already sold a number of tickets for the student-centric workshop on 27th August, and people are signing up for our third workshop. So if you are interested, register here as soon as possible: http://hadoopworkshopsept.eventbrite.com/ since we are sure it will sell out soon too.

We are also launching advanced training, particularly for workshop attendees, which gives them the opportunity to go deep into Hadoop and start their career as a Hadoop developer. If you know Hadoop and want to learn more, this will be a great opportunity.

So overall it was a great experience, and it gave us the feeling that we are on the right path.
If you are interested in Hadoop and its ecosystem, meet us at any of the above events. We can assure you that it will be a great experience.

Tuesday, August 2, 2011

One-day Hadoop Workshop in Bangalore


After releasing Nectar, our open source analytics framework, we got positive feedback, and many people wanted to know more about how we use Hadoop in our company and how to get started with Hadoop development. So we thought that a workshop on Hadoop would be a great idea.

We have therefore arranged a Hadoop workshop on 20th August, 2011 in Bangalore. In the workshop we will cover how we use Hadoop to build our own analytics products, as well as Nectar itself. We are also going to talk about how you can use Hadoop in your organization. Attendees will get hands-on experience in the labs: setting up a Hadoop cluster, running map/reduce jobs, and more. For more details about the event, refer to this page.

Hadoop and small things

As you know, Hadoop always wants to play with Big Data; it doesn't like small files. Initially we thought we would run a workshop for 10 people, and the tickets were made free. But within 12 hours, all the tickets were sold out! Now we have a workshop for 30 people, having added 20 more paid tickets. On a lighter note, we learnt that we cannot do small things with Hadoop! ;)

So if you are interested in Hadoop and want to know more about it, do come and join us at the workshop. You can register here.

Sunday, July 24, 2011

Nectar : Developing an open source predictive modeling framework on Hadoop


I am very happy to tell you that we have finally released the first version of Nectar, the first open source predictive modeling framework on Apache Hadoop. As a member of the Nectar development team, I want to share some insights into the framework and Apache Hadoop in general.

Apache Hadoop : LAMP of a new era
Back in the 1990s, the LAMP stack (Linux, Apache, MySQL, Perl/PHP) enabled many startups to build innovative products. Some of them became big companies, like Google, Twitter, and Facebook, which redefined the whole Web. By embracing open source, these companies grew in a remarkable way.

Now, when it comes to solving Big Data problems, Apache Hadoop is the best fit. It provides a powerful stack for building powerful applications. Its vibrant community and Apache licence make it very attractive for startups to use as a base. Companies like Cloudera and Karmasphere are building innovative products on top of Hadoop.


Nectar : Predictive Modeling meets Apache Hadoop
Here at Zinnia Systems we see Apache Hadoop as one of the most powerful stacks available. But with power, complexity creeps in: though Hadoop is powerful by nature, it is complex to use, because every problem must be expressed in terms of map/reduce. So we thought we could develop a framework that abstracts away the map/reduce details and lets you focus on the application itself. Thus we developed Nectar, which offers basic modeling algorithms like regression behind a simple Java API.
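To give a feel for what such an API abstracts away, here is a self-contained toy in plain Java, with no Hadoop involved; this is not Nectar's actual API, whose real classes live in the GitHub repository linked below. It fits a simple linear regression with ordinary least squares, and Nectar's contribution is to run this kind of computation over HDFS-scale data as map/reduce jobs behind a single method call.

    // Toy ordinary-least-squares fit in plain Java. This is NOT Nectar's API;
    // it only shows the kind of computation such a framework hides behind a
    // simple call, executed as map/reduce jobs when the data lives in HDFS.
    public class SimpleRegression {
        public static void main(String[] args) {
            double[] x = {1, 2, 3, 4, 5};
            double[] y = {2.1, 4.0, 6.2, 7.9, 10.1};
            int n = x.length;
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            for (int i = 0; i < n; i++) {
                sumX += x[i]; sumY += y[i];
                sumXY += x[i] * y[i]; sumXX += x[i] * x[i];
            }
            // Closed-form OLS estimates for y = a + b * x.
            double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            double intercept = (sumY - slope * sumX) / n;
            System.out.println("y = " + intercept + " + " + slope + " * x");
        }
    }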

We have open sourced the framework because we believe open source is the best model to promote, encourage, and envision innovation. So if you are interested in Hadoop and want to play with modeling problems, have a look at our framework.

All the details about the project are available at the following links.

Info Page : http://zinniasystems.com/zinnia.jsp?lookupPage=blogs/nectar.jsp

Github Page : https://github.com/zinnia-phatak-dev/Nectar

Google Group : http://groups.google.com/group/nectar-user-group

Please check out the code and feel free to send us comments or suggestions.