Original article. Please do not repost without permission.

For work and study I deal with the Spark source code a lot, which inevitably means modifying and testing it. I always look for tools that improve efficiency, and while developing on Spark I have kept exploring how to debug the source code more smoothly.

Spark is written in Scala, so IntelliJ IDEA together with sbt is the natural choice for day-to-day development. There is plenty of material online on how to import and build the Spark project, and the guide on the official website is also quite detailed.

This article is based on the Spark 2.x source code and focuses on how to use sbt together with IDEA to debug Spark with breakpoints and to develop against it. It should be most useful for readers who frequently modify or study the Spark source. Enough preamble; let's get down to business.

Compiling the Spark source code

If you check out the Spark source for the first time and import it straight into IDEA, you will see a lot of errors. The SQL parser in the catalyst subproject of Spark SQL depends on an ANTLR grammar definition, and the parser code has to be generated by a build first. The following shows how to package and compile with sbt:

git clone https://github.com/apache/spark.git
cd spark
build/sbt package

... After a long wait and a successful build, import the project into IDEA and the source code will resolve normally.

You can use Aliyun's Maven repository to speed up dependency downloads; see my article: https://zhuanlan.zhihu.com/p/25279570

Writing test cases

I am used to writing a TestCase directly inside the Spark project and using it as the entry point for running Spark. This suits development scenarios in which the Spark source changes frequently. Compared with writing test code in spark-shell, it has the following benefits:

  • The code stays in a file, so it is easy to modify and re-run
  • The code lives in the same project, so IDEA does not need to re-index after the source changes
  • It works well with continuous testing (Continuous Test)

The Spark source ships with many test cases that we can use as a reference. Taking Spark's SQL project as an example, copy spark/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala to SimpleSuite.scala.

Note: do not use IDEA's built-in copy function here, because IDEA reorders the import statements when it copies a file, which can cause compilation errors. The right way is:

  1. In IDEA, find the file to copy, right-click it, and copy its path

  2. In IDEA's Terminal window, run cp xxx xxx2 to make the copy
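
The two steps above can be sketched as a small shell helper. This is only an illustration: the function name copy_suite is made up here, and the class rename via sed is an assumption (the article itself only describes the copy), added because testOnly *SimpleSuite selects the test by class name.

```shell
# Hypothetical helper: copy a test suite with plain cp (so the import order
# is left untouched), then rename the class to match the new file name.
copy_suite() {
  src="$1"    # path of the suite to copy, e.g. .../SQLQuerySuite.scala
  dest="$2"   # path of the new suite, e.g. .../SimpleSuite.scala
  old_class=$(basename "$src" .scala)
  new_class=$(basename "$dest" .scala)

  cp "$src" "$dest" || return 1

  # Rewrite only the class declaration. A temp file is used instead of
  # sed -i, which behaves differently on GNU and BSD systems.
  sed "s/class $old_class /class $new_class /" "$dest" > "$dest.tmp" &&
    mv "$dest.tmp" "$dest"
}
```

From the Spark source directory it could be called as copy_suite sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala sql/core/src/test/scala/org/apache/spark/sql/SimpleSuite.scala. The license header and imports are carried over unchanged, which is the point of copying outside IDEA.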

We base SimpleSuite on a copy of SQLQuerySuite because Spark enforces a consistent code style through the Scalastyle checker built into the project. For example, every source file must start with the Apache License header, and imports must be ordered java, scala, 3rdParty, spark. Code that violates the rules fails at compile time. Copying an existing file and modifying it avoids the pitfalls of the style check. After copying, I edited the contents of SimpleSuite.

Open IDEA's Terminal window, run build/sbt to enter the sbt interactive shell, and run our SimpleSuite:

> project sql
> testOnly *SimpleSuite

project sql switches to the SQL project, so that testOnly can locate our SimpleSuite class quickly. You can run projects to list all submodules defined by Spark; the current module is marked with a leading *. The first test run takes a long time, and later runs are faster. If the test passes, sbt reports the run as successful.

Run exit in sbt to leave the interactive shell. Next, let's look at how to use sbt together with IDEA for breakpoint debugging.

Breakpoint debugging Spark with sbt and IDEA

Because sbt runs as a separate process started from the Terminal, debugging it requires IDEA's remote debugging feature. In IDEA, choose Run -> Edit Configurations... from the menu and add a Remote configuration.

The configuration name is arbitrary; I named mine Spark. The remote debugging port is 5005. If port 5005 is already in use locally, change it to another port.

Then go back to the Terminal and restart sbt with the remote debugging option: build/sbt -jvm-debug 5005. During startup it prints Listening for transport dt_socket at address: 5005. Once sbt is up, we can attach to it from IDEA and debug it.
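
For readers curious about what -jvm-debug does: it starts the sbt JVM with a JDWP debug agent listening on the given port, using the same kind of option string that IDEA displays in its Remote configuration dialog. The sketch below only builds and prints such a string. The function name jdwp_opts is hypothetical, and the exact flags that Spark's launcher script passes to the JVM are an assumption here and may differ between versions.

```shell
# Hypothetical helper: print a JDWP agent option for a given port.
jdwp_opts() {
  port="${1:-5005}"   # default to the port used in this article
  # transport=dt_socket: debug over a TCP socket
  # server=y:            the JVM listens and the IDE attaches to it
  # suspend=n:           the JVM starts right away instead of waiting for the IDE
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=$port"
}

jdwp_opts 5005
# prints: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
```

The port in the option string must match the port in IDEA's Remote configuration; otherwise IDEA has nothing to attach to.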

Next, set a breakpoint anywhere inside a test method of SimpleSuite, go back to sbt, and run:

> project sql
> set fork in Test := false
> testOnly *SimpleSuite

If everything goes well, our breakpoint is hit while testOnly is running.

If you modify the Spark source or SimpleSuite, simply run testOnly *SimpleSuite again.

One statement is key to letting IDEA hit the breakpoint: set fork in Test := false. It tells sbt not to fork a child process when running tests. The remote debugging port we added at startup belongs to the sbt process itself, so if the tests run in a different process, IDEA cannot hit the breakpoint.

If you change code frequently, re-running testOnly by hand gets tedious. sbt's continuous execution feature simplifies this: prefix the command with ~, that is, ~testOnly *SimpleSuite. Whenever we modify code and save it, sbt detects the file change and runs the tests automatically, which is very convenient. The same works for compile, test, run and other commands.


To recap the key steps:

# In the Spark source directory (using SimpleSuite as an example):
$ build/sbt -jvm-debug 5005
> project sql
> set fork in Test := false
> testOnly *SimpleSuite

OK. With these techniques in hand, we can happily dig into the Spark source code and understand how Spark works.
