Author Topic: Anyone else building Hadoop Clusters?  (Read 1533 times)

Rover

  • Master of Smilies
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 630
Anyone else building Hadoop Clusters?
« on: March 03, 2013, 10:30:23 PM »
I've been working with a group of very smart and cool guys at my company, a large medical industry provider, to build an enterprise-class Hadoop cluster.

The Good:  It's new, exciting, improving and growing.  Besides all of that it freaking works like a silver bullet.  If huge data scans are your bane, Hadoop is your balm.

We've secretly journaled some of our thoughts here: DataForProfit

I'm just curious if any other DC's have been playing w/ Hadoop.  I really think it could be a game changer for large enterprises.

$0.02  ..  :two:
Insert Brilliant Sig line here
« Last Edit: March 04, 2013, 06:27:12 PM by Rover, Reason: fixing typeo »

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,768
Re: Anyone else building Hadoop Clusters?
« Reply #1 on: March 04, 2013, 06:54:09 AM »
I'm curious - what exactly are you building with it - assuming you can talk about it?

Or, if you can't, can you at least recommend a good book on Hadoop?

I was never able to make it through the O'Reilly "Definitive Guide" on Hadoop because it was (surprise!) so poorly written. And there don't seem to be many other books out on Hadoop yet.

Rover

Re: Anyone else building Hadoop Clusters?
« Reply #2 on: March 04, 2013, 06:44:24 PM »
OK, so Core Hadoop is HDFS with Map/Reduce.
HDFS is the Hadoop Distributed File System.  Much like Gluster and Ceph, it uses local disk and replication across nodes to provide a resilient cluster FS. 
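Map/Reduce is a method of writing parallel applications: a map step runs against each chunk of the data and picks out (and filters, based on your logic) the records you care about, and a reduce step combines those partial results into the answer that goes back to the user.  Hadoop provides the API for writing the Map and Reduce functions.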
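
To make the map and reduce steps concrete, here's a minimal single-machine sketch in plain Python. It is not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample order data are made up for illustration, and a real job would run the map step on many nodes at once.

```python
from collections import defaultdict

def map_phase(record):
    """Map: turn one input record into zero or more (key, value) pairs."""
    region, amount = record
    if amount > 0:  # filtering logic lives in the map step
        yield region, amount

def shuffle(pairs):
    """Group all values by key (the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample data: (region, order amount)
orders = [("east", 120), ("west", 80), ("east", 30), ("west", -15)]
totals = run_job(orders)
print(totals)
```

The negative row is dropped in the map step, so the reduce step only ever sees the values that passed the filter.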

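Since Hadoop ships your code out to the nodes that already hold the data (instead of dragging the data across the network to the code), each node can process its own blocks locally, which makes it really fast for table-scan type work.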
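
A rough sketch of the "run the work where the data lives" idea, assuming a toy round-robin placement with three copies of each block. The helper names (place_blocks, pick_local_node) and node names are hypothetical, and real HDFS placement and scheduling are considerably more involved than this.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

def pick_local_node(block, placement, busy=()):
    """Pick a node that already stores the block, skipping busy nodes."""
    for node in placement[block]:
        if node not in busy:
            return node
    return None  # no free local copy; a real scheduler would fall back to a remote read

nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_0", "blk_1"], nodes)

# node1 is busy, so the task for blk_0 goes to the next node holding a copy.
chosen = pick_local_node("blk_0", placement, busy={"node1"})
print(placement, chosen)
```

Because every block lives on several nodes, losing or overloading one node still leaves a local copy somewhere else to run against.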

We're building our cluster to help do analytics on our Sales and Order Data,  Medical Trial results, and other stuff.

I'd grab a copy of Hadoop Operations or Hadoop in Action.

Here's a good visual.  A traditional database is like you going thru a deck of shuffled cards looking for the Ace of Spades.  Hadoop is like getting a group of 52 ten-year-olds and giving each a card.  You ask who has the Ace and you get the answer instantly.  They're smaller and less intelligent than you, but they can still find the card faster.  So instead of building monster-sized servers for databases, you buy lots of smaller, cheaper systems.  Even with the duplication of data, our cost per GB of storage has dropped by 90%.
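
The card analogy above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, it uses 4 workers instead of 52, and threads on one machine stand in for separate nodes (so it shows the shape of the idea, not a real speedup).

```python
import random
from concurrent.futures import ThreadPoolExecutor

def build_deck():
    ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
    suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
    return [f"{rank} of {suit}" for suit in suits for rank in ranks]

def split(deck, n_workers):
    """Deal the deck out so each worker holds its own small pile."""
    return [deck[i::n_workers] for i in range(n_workers)]

def scan(pile, target):
    """Each worker only looks through its own pile."""
    return target in pile

def find_holders(deck, target, n_workers=4):
    piles = split(deck, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = list(pool.map(scan, piles, [target] * n_workers))
    # Collect the answers: which workers said "I have it"?
    return [worker for worker, hit in enumerate(hits) if hit]

deck = build_deck()
random.Random(0).shuffle(deck)
holders = find_holders(deck, "A of Spades", n_workers=4)
print(holders)
```

Each worker scans 13 cards instead of one person scanning all 52, and exactly one of them reports back with the Ace.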

Hadoop is kewl.  :Thmbsup:

Edvard

  • Coding Snacks Author
  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 2,888
Re: Anyone else building Hadoop Clusters?
« Reply #3 on: March 05, 2013, 12:52:17 AM »
Linky:

http://hadoop.apache.org/

Sounds very interesting. I've always wondered about other practical uses for clusters besides monstrous number-crunching, and your illustration of cards and 10-year-olds is eminently descriptive.

40hz

Re: Anyone else building Hadoop Clusters?
« Reply #4 on: March 05, 2013, 07:54:57 AM »
Quote
Here's a good visual.  Traditional Database is like you searching thru a deck of shuffled cards searching for the Ace of Spades.  Hadoop is like getting a group of 52 10 year-olds  and giving each a card.  You ask who has the Ace and you get the answer instantly.  They're smaller and less intelligent than you, but they can still find the card faster.

Interesting concept - and one I first learned about while reading Yale University professor David Gelernter's 1992 book Mirror Worlds which described the exact same approach - along with some other even more interesting ideas such as "plunge" and "squish." It's also the first place I ever read about the concept of linking generic networked computers together in order to create ad hoc massive parallelism. Something we now refer to as "clustered computing."

Much, if not most, of what's in this book was dubbed 'sci-fi,' 'fantasy,' and 'speculation' back in the early 90s, and has since become our daily reality. (And also the basis for numerous patent disputes. :mrgreen:)

Quote
From Kirkus Reviews

Within ten years, Gelernter (Computer Science/Yale) predicts here, scientists will deploy computer systems able to capture extensive data about a particular "reality" (hospital, city, etc.), and to present a constantly updated model on a desktop computer.

"A Mirror World is some huge institution's moving, true-to-life mirror image trapped inside a computer--where you can see and grasp it whole," Gelernter writes. Citizens will be able to visit these computer models like public squares, gaining unprecedented access to data on what's going on (and the officials in charge, the author intimates, will presumably welcome a chance to have their performance monitored). Building such mirror worlds will be extraordinarily difficult: streams and rivers of raw data need to be constantly flowing; thousands of computers must process the data in parallel fashion; and tying it all together will demand new kinds of software of immense complexity.

Gelernter explains clearly the problems to be solved and describes pieces of the technology already working in research labs. Left unchallenged is his assumption that such technology will remain benign--giving honest folk a way of grasping an ever-more complex world instead of providing the powerful owners of such technology a superb way to distort and control "reality."

Plausible but potentially frightening view of what the future could hold if those who view "reality" as merely a vast array of numbers waiting to be crunched have their way.

(Twenty illustrations--not seen.) -- Copyright ©1991, Kirkus Associates, LP. All rights reserved. --This text refers to an out of print or unavailable edition of this title.

Well worth hunting down a copy if you're interested in parallel computing and data analysis. (Also available on Kindle) :Thmbsup:
« Last Edit: March 05, 2013, 08:05:39 AM by 40hz »