
Understanding before you get started with Hadoop

Posted on 12/8/2017 1:33:48 PM

What is Hadoop?
(1) Hadoop is an open-source framework for writing and running distributed applications that process large-scale data. It is designed for offline, large-scale data analysis and is not suited to online transaction processing workloads that randomly read and write a handful of records. Hadoop = HDFS (the file system, responsible for data storage) + MapReduce (data processing). Hadoop's input data can take any form. It handles semi-structured and unstructured data better and more flexibly than a relational database, because whatever form the data starts in, it is eventually converted into key/value pairs, and the key/value pair is the basic unit of data. Where a relational database uses SQL query statements, MapReduce uses functional-style scripts and code. For people accustomed to SQL and relational databases, the open-source tool Hive offers a SQL-like alternative on top of Hadoop.
(2) Hadoop is a distributed computing solution.
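
To make the key/value idea concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It runs in a single plain Python process and does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration and only mimic what the framework would do across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) key/value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single (key, total) pair."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in order over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data"]
    print(word_count(sample))
```

In SQL terms this is roughly a GROUP BY word with COUNT(*), which is the kind of query Hive lets you write instead of hand-coding the map and reduce functions.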

What can Hadoop do?
In 2009, 30% of non-programmers at Facebook used HiveQL for data analysis. Hive is also used for custom filters in Taobao search. Pig can be used for more advanced data processing: Twitter and LinkedIn use it to discover people you may know, and it can achieve Amazon.com-style collaborative filtering recommendations. Taobao's product recommendations are built the same way. At Yahoo!, 40% of Hadoop jobs are run with Pig, including spam identification and filtering as well as user feature modeling. (Update, August 25, 2012: Tmall's recommendation system uses Hive, with Mahout being tried on a small scale.)
Download the latest version of Hadoop: http://hadoop.apache.org/releases.html

To build and install Hadoop 2.x or later on Windows, see: https://wiki.apache.org/hadoop/Hadoop2OnWindows

1. Introduction

Hadoop version 2.2 and above includes native support for Windows. The official Apache Hadoop version does not include Windows binaries (as of January 2014). However, building a Windows package from source is fairly simple.

Hadoop is a complex system with many components. Some high-level familiarity with it is helpful before you attempt to build or install it for the first time. Familiarity with Java is necessary if you need to troubleshoot.


Hadoop developers used Windows Server 2008 and Windows Server 2008 R2 during development and testing. Windows Vista and Windows 7 are also likely to work because their Win32 API is similar to that of the corresponding server SKUs. Hadoop has not been tested on Windows XP or any earlier version of Windows and is unlikely to work there. Any issues reported on Windows XP or earlier will be closed as invalid.

Do not try to run the installation from within Cygwin. Cygwin is neither required nor supported.
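
If you script your setup, a quick check like the following can catch an accidental Cygwin shell before you start. This is only a sketch: it relies on Python's sys.platform value ("cygwin" under Cygwin's Python, "win32" under native Windows Python), the helper names are made up for illustration, and it is not part of Hadoop or its build tooling.

```python
import sys


def running_under_cygwin(platform_name=None):
    """Return True if the interpreter reports a Cygwin platform."""
    name = sys.platform if platform_name is None else platform_name
    return name.startswith("cygwin")


def is_native_windows(platform_name=None):
    """Return True if the interpreter reports native Windows (sys.platform == 'win32')."""
    name = sys.platform if platform_name is None else platform_name
    return name == "win32"


if __name__ == "__main__":
    if running_under_cygwin():
        print("Cygwin detected: use a native Windows command prompt instead.")
    elif is_native_windows():
        print("Native Windows detected.")
    else:
        print("Not running on Windows:", sys.platform)
```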






