Big data refers to collections of data sets so large that they cannot be processed with traditional computing techniques. It is not a single technology or tool; it spans many areas of business and technology.

Currently, the three major mainstream distributed computing systems are Hadoop, Spark, and Storm:

Hadoop is one of the current standards for big data management and is used in many commercial applications. It can easily integrate structured, semi-structured, and even unstructured data sets.

Spark uses in-memory computing. Designed around multi-pass batch processing, it allows data to be loaded into memory once and queried repeatedly. It also integrates multiple computing paradigms, such as data warehousing, stream processing, and graph computing. Spark can run on top of HDFS and works well with Hadoop; its RDD (Resilient Distributed Dataset) abstraction is a distinguishing feature. A minimal caching sketch follows this overview.

Storm is a distributed real-time computation system for processing high-velocity, large-volume data streams. It adds reliable real-time data processing capabilities to the Hadoop ecosystem.
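To make Spark's "load once, query repeatedly" idea concrete, here is a minimal sketch using Spark's Java API. The HDFS path and the ERROR/WARN filters are hypothetical placeholders; cache() is what keeps the RDD in memory across the repeated queries.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddCacheDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RddCacheDemo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load a text file from HDFS into an RDD; the path is a placeholder.
        JavaRDD<String> logs = sc.textFile("hdfs:///data/app.log");

        // cache() keeps the RDD in memory after the first action,
        // so the repeated queries below avoid re-reading from disk.
        logs.cache();

        // First query: counting all lines materializes and caches the RDD.
        long total = logs.count();

        // Subsequent queries reuse the in-memory copy.
        long errors = logs.filter(line -> line.contains("ERROR")).count();
        long warnings = logs.filter(line -> line.contains("WARN")).count();

        System.out.printf("total=%d errors=%d warnings=%d%n", total, errors, warnings);
        sc.close();
    }
}
```

Without cache(), each count() would re-read the file from HDFS; with it, only the first action pays the I/O cost, which is exactly the multi-pass batch-processing scenario described above.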
Hadoop is Apache's open-source framework, written in Java, for processing large data sets across clusters of computers using a simple programming model. The Hadoop framework provides a distributed storage and computing environment that spans the cluster, and it is designed to scale from a single server to thousands of machines, each offering local computation and storage.
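To illustrate that simple programming model, here is the canonical MapReduce word-count job in Java, following the pattern from the standard Hadoop tutorial; the input and output HDFS paths are passed as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The job is typically packaged into a jar and submitted with something like `hadoop jar wordcount.jar WordCount /input /output`; the framework then distributes the map and reduce tasks across the cluster, keeping computation close to where the data blocks are stored.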