
Submitting Tasks to Spark



1. Scenario

  With the Hadoop + Spark environment set up, I now want to submit a simple job to Spark in that environment, run the computation, and print the result. Setup process: http://www.linuxidc.com/Linux/2017-06/144926.htm

  Since I'm most familiar with Java, I'll use a Java WordCount as the example to walk through the whole process: counting how many times each word appears in a given text.

2. Environment Testing

  Before getting into the example, I want to verify the environment set up earlier.

  2.1 Testing the Hadoop environment

  First create a file wordcount.txt with the following content:

Hello hadoop
hello spark
hello bigdata
yellow banana
red apple

  Then run the following commands:

  hadoop fs -mkdir -p /Hadoop/Input (create the directory on HDFS)

  hadoop fs -put wordcount.txt /Hadoop/Input (upload wordcount.txt to HDFS)

  hadoop fs -ls /Hadoop/Input (list the uploaded file)

  hadoop fs -text /Hadoop/Input/wordcount.txt (view the file content)

  2.2 Testing the Spark environment

  I'll use spark-shell to run a simple WordCount test, reusing the wordcount.txt file that the Hadoop test above uploaded to HDFS.

  First start spark-shell:

  spark-shell


  Then enter the following Scala statements directly:

  val file=sc.textFile("hdfs://Master:9000/Hadoop/Input/wordcount.txt")

  val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ + _)

  rdd.collect()

  rdd.foreach(println)
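  If everything is working, collect() returns the per-word counts as an array of tuples; the ordering may vary between runs, but given the wordcount.txt content above it should contain:

  Array((Hello,1), (hello,2), (hadoop,1), (spark,1), (bigdata,1), (yellow,1), (banana,1), (red,1), (apple,1))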


  To exit, use the following command:

  :quit

  That completes the environment testing.

3. Implementing WordCount in Java

package com.example.spark;

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public final class WordCount {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("kevin's first spark app");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the input file; the HDFS path is passed as the first program argument
        JavaRDD<String> lines = sc.textFile(args[0]).cache();
        // Split each line into individual words
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(SPACE.split(s)).iterator();
            }
        });

        // Map each word to a (word, 1) pair
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        // Sum the counts for each word
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        // Collect the results back to the driver and print them
        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }

        sc.close();
    }
}
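  For reference, on Spark 2.x the same pipeline can be written far more compactly with Java 8 lambdas. This is just a sketch with the same semantics as the anonymous-class version above; it assumes the same imports and the sc and SPACE definitions from WordCount:

JavaRDD<String> lines = sc.textFile(args[0]).cache();
JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(SPACE.split(line)).iterator()) // split each line into words
        .mapToPair(word -> new Tuple2<>(word, 1))                     // pair each word with a count of 1
        .reduceByKey((a, b) -> a + b);                                // sum the counts per word
counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));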

4.职责交给完成

  Package the Java word-count implementation above as spark-example-0.0.1-SNAPSHOT.jar and upload the jar to the Master node; I put it under /opt. This article submits the job to Spark in two ways: the first uses the spark-submit command, and the second submits from a Java web application.

  4.1 Submitting the job with the spark-submit command

  spark-submit --master spark://114.55.246.88:7077 --class com.example.spark.WordCount /opt/spark-example-0.0.1-SNAPSHOT.jar hdfs://Master:9000/Hadoop/Input/wordcount.txt
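  If the job completes successfully, the driver prints one "word: count" line per distinct word. Given the wordcount.txt uploaded earlier, the output should look like this (ordering may differ):

hello: 2
Hello: 1
hadoop: 1
spark: 1
bigdata: 1
yellow: 1
banana: 1
red: 1
apple: 1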

  4.2 Submitting the job from a Java web application

  I built the Java web project with Spring Boot; the implementation is as follows:

  1) Create a new Maven project, spark-submit

  2) The pom.xml content. Note that the Spark dependencies must match your Scala version: in spark-core_2.11, the trailing 2.11 is the version of Scala you installed.

<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.4.1.RELEASE</version>
    </parent>
    <artifactId>spark-submit</artifactId>
    <description>spark-submit</description>
    <properties>
        <start-class>com.example.spark.SparkSubmitApplication</start-class>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
        <commons.version>3.4</commons.version>
        <org.apache.spark-version>2.1.0</org.apache.spark-version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>${commons.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.tomcat.embed</groupId>
            <artifactId>tomcat-embed-jasper</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.jayway.jsonpath</groupId>
            <artifactId>json-path</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <exclusions>
                <exclusion>
                    <artifactId>spring-boot-starter-tomcat</artifactId>
                    <groupId>org.springframework.boot</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jetty</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.eclipse.jetty.websocket</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>jstl</artifactId>
        </dependency>
        <dependency>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>apache-jsp</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-solr</artifactId>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.11</artifactId>
            <version>1.6.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-graphx_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>2.6.5</version>
        </dependency>

    </dependencies>
    <packaging>war</packaging>

    <repositories>
        <repository>
            <id>spring-snapshots</id>
            <name>Spring Snapshots</name>
            <url>https://repo.spring.io/snapshot</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>maven2</id>
            <url>http://repo1.maven.org/maven2/</url>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>spring-snapshots</id>
            <name>Spring Snapshots</name>
            <url>https://repo.spring.io/snapshot</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </pluginRepository>
        <pluginRepository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-war-plugin</artifactId>
                <configuration>
                    <warSourceDirectory>src/main/webapp</warSourceDirectory>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.mortbay.jetty</groupId>
                <artifactId>jetty-maven-plugin</artifactId>
                <configuration>
                    <systemProperties>
                        <systemProperty>
                            <name>spring.profiles.active</name>
                            <value>development</value>
                        </systemProperty>
                        <systemProperty>
                            <name>org.eclipse.jetty.server.Request.maxFormContentSize</name>
                            <!-- -1 means no limit -->
                            <value>600000</value>
                        </systemProperty>
                    </systemProperties>
                    <useTestClasspath>true</useTestClasspath>
                    <webAppConfig>
                        <contextPath>/</contextPath>
                    </webAppConfig>
                    <connectors>
                        <connector implementation="org.eclipse.jetty.server.nio.SelectChannelConnector">
                            <port>7080</port>
                        </connector>
                    </connectors>
                </configuration>
            </plugin>
        </plugins>

    </build>
</project>

 

  3)SubmitJobToSpark.java

package com.example.spark;

import org.apache.spark.deploy.SparkSubmit;

/**
 * @author kevin
 *
 */
public class SubmitJobToSpark {

    public static void submitJob() {
        String[] args = new String[] { "--master", "spark://114.55.246.88:7077", "--name", "test java submit job to spark", "--class", "com.example.spark.WordCount", "/opt/spark-example-0.0.1-SNAPSHOT.jar", "hdfs://Master:9000/Hadoop/Input/wordcount.txt" };
        SparkSubmit.main(args);
    }
}
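  One caveat: SparkSubmit.main runs the submission inside the web container's JVM, and on failure it can call System.exit, taking the container down with it. Spark also ships a programmatic launcher API (the spark-launcher artifact) that runs spark-submit as a separate child process. The sketch below is a hypothetical alternative, not part of the original project; it assumes the spark-launcher dependency is on the classpath and SPARK_HOME is set on the host:

package com.example.spark;

import org.apache.spark.launcher.SparkLauncher;

/**
 * Hypothetical alternative to SubmitJobToSpark that launches the job
 * in a child process via Spark's launcher API.
 */
public class SubmitJobWithLauncher {

    public static void submitJob() throws Exception {
        Process spark = new SparkLauncher()
                .setMaster("spark://114.55.246.88:7077")
                .setAppName("test java submit job to spark")
                .setMainClass("com.example.spark.WordCount")
                .setAppResource("/opt/spark-example-0.0.1-SNAPSHOT.jar")
                .addAppArgs("hdfs://Master:9000/Hadoop/Input/wordcount.txt")
                .launch();                      // starts a spark-submit child process
        int exitCode = spark.waitFor();         // block until the job finishes
        System.out.println("spark-submit exited with code " + exitCode);
    }
}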

  4)SparkController.java

package com.example.spark.web.controller;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

import com.example.spark.SubmitJobToSpark;

@Controller
@RequestMapping("spark")
public class SparkController {
    private Logger logger = LoggerFactory.getLogger(SparkController.class);

    @RequestMapping(value = "sparkSubmit", method = { RequestMethod.GET, RequestMethod.POST })
    @ResponseBody
    public String sparkSubmit(HttpServletRequest request, HttpServletResponse response) {
        logger.info("start submit spark tast...");
        SubmitJobToSpark.submitJob();
        return "hello";
    }

}

  5) Package the spark-submit project as a war, deploy it to Tomcat on the Master node, and access the endpoint exposed by the controller above (per its request mappings, the path is /spark/sparkSubmit).


  The computed result can be seen in Tomcat's log.

