Tuesday, March 17, 2009

Making Sense Of All The Data: Google, Hadoop & Cloudera

The key to success is not to own THE world's largest computer but to own the largest number of computers ... many small streams make a big river ...


The New York Times article "Hadoop, a Free Software Program, Finds Uses Beyond Search" explains the very interesting history behind Hadoop. What is Hadoop? It’s distributed computing software that enables data mining and analysis on a huge scale. It is also, apparently, an open-source version of proprietary software developed by Google to process and analyze massive volumes of data for search. Here’s how the NY Times explains the problem Google was addressing:

By 2003, Google found it increasingly difficult to ingest and index the entire Internet on a regular basis. Adding to these woes, Google lacked a relatively easy to use means of analyzing its vast stores of information to figure out the quality of search results and how people behaved across its numerous online services.

To address those issues, a pair of Google engineers invented a technology called MapReduce that, when paired with the intricate file management technology the company uses to index and catalog the Web, solved the problem.

The MapReduce technology makes it possible to break large sets of data into little chunks, spread that information across thousands of computers, ask the computers questions and receive cohesive answers. Google rewrote its entire search index system to take advantage of MapReduce’s ability to analyze all of this information and its ability to keep complex jobs working even when lots of computers die.

MapReduce represented a couple of breakthroughs. The technology has allowed Google’s search software to run faster on cheaper, less-reliable computers, which means lower capital costs. In addition, it makes manipulating the data Google collects so much easier that more engineers can hunt for secrets about how people use the company’s technology instead of worrying about keeping computers up and running. [...]
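
To make the idea of breaking data into little chunks and combining the answers concrete, here is a minimal single-machine sketch in Python. It is not Google's or Hadoop's actual code: the function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_groups`, `word_count`) and the word-counting task are assumptions chosen purely for illustration, and the chunks are processed one after another rather than on thousands of computers.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Break the input into small chunks, one per (hypothetical) worker."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: combine the values for each key into a single answer."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2):
    chunks = split_into_chunks(lines, chunk_size)
    # On a real cluster each chunk would be mapped on a different machine;
    # here they simply run one after another (illustrative assumption).
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    sample = ["the web is big", "the index is bigger", "big data"]
    print(word_count(sample))
```

Because each chunk is mapped independently of the others, a chunk whose machine dies could in principle be handed to another machine and mapped again without disturbing the rest, which is roughly the property the article credits for keeping jobs running when computers fail.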

Read more: http://searchengineland.com/making-sense-of-all-the-data-google-cloudera-and-hadoop-explained-16962
