
Analysis of the Apache DolphinScheduler 1.3 Core Table Structure

2020-11-13 00:23:20 Dai Lidong

Apache DolphinScheduler is a distributed, decentralized, and extensible visual DAG workflow task scheduling system, dedicated to untangling the complex dependencies in data processing so that the scheduler works out of the box in data pipelines.

Recently, Bo Yi contributed this very thorough analysis of the core workflow table structure to the community. Feel free to share it if you find it useful.

1. Overall workflow storage structure

All workflow definitions (templates) created in DolphinScheduler are stored in the t_ds_process_definition table.

The table structure is shown below:

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 1 | id | int(11) | Primary key |
| 2 | name | varchar(255) | Process definition name |
| 3 | version | int(11) | Process definition version |
| 4 | release_state | tinyint(4) | Release state of the process definition: 0 offline, 1 online |
| 5 | project_id | int(11) | Project id |
| 6 | user_id | int(11) | Id of the user who owns the process definition |
| 7 | process_definition_json | longtext | Process definition JSON |
| 8 | description | text | Process definition description |
| 9 | global_params | text | Global parameters |
| 10 | flag | tinyint(4) | Whether the process is available: 0 unavailable, 1 available |
| 11 | locations | text | Node coordinate information |
| 12 | connects | text | Node connection information |
| 13 | receivers | text | Recipients |
| 14 | receivers_cc | text | Cc recipients |
| 15 | create_time | datetime | Creation time |
| 16 | timeout | int(11) | Timeout |
| 17 | tenant_id | int(11) | Tenant id |
| 18 | update_time | datetime | Update time |
| 19 | modify_by | varchar(36) | User who last modified it |
| 20 | resource_ids | varchar(255) | Resource ids |
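
As a quick illustration, a workflow definition can be read straight out of this table. This is a minimal sketch, assuming a reachable MySQL instance and the third-party PyMySQL driver; the connection settings and workflow id are placeholders, not part of the original article:

import json
import pymysql  # assumed third-party MySQL driver; any DB-API driver would do

# Placeholder connection settings, for illustration only.
conn = pymysql.connect(host="127.0.0.1", user="ds", password="***", database="dolphinscheduler")
with conn.cursor() as cursor:
    cursor.execute(
        "SELECT name, version, process_definition_json "
        "FROM t_ds_process_definition WHERE id = %s",
        (1,),
    )
    name, version, definition_json = cursor.fetchone()

definition = json.loads(definition_json)  # the JSON structure described below
print(name, version, len(definition["tasks"]))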

The core field is process_definition_json, which defines the task information of the DAG and is stored as JSON.

Its common top-level structure is shown in the table below:

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 1 | globalParams | Array | Global parameters |
| 2 | tasks | Array | Set of tasks in the process (see the following sections for the structure of each task type) |
| 3 | tenantId | int | Tenant id |
| 4 | timeout | int | Timeout |

Data sample:

{
    "globalParams":[
        {
            "prop":"global_bizdate",
            "direct":"IN",
            "type":"VARCHAR",
            "value":"${system.biz.date}"
        }
    ],
    "tasks":Array[1],
    "tenantId":0,
    "timeout":0
}
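
To make the tasks/preTasks relationship concrete before diving into each node type, here is a small illustrative sketch (plain Python, not DolphinScheduler code) that rebuilds the DAG edges from a parsed process_definition_json string; build_dag is a hypothetical helper:

import json

def build_dag(definition_json):
    """Map each task name to the names of its direct downstream tasks."""
    definition = json.loads(definition_json)
    edges = {task["name"]: [] for task in definition["tasks"]}
    for task in definition["tasks"]:
        for upstream in task.get("preTasks", []):
            edges.setdefault(upstream, []).append(task["name"])  # edge: upstream -> task
    return edges

For example, a task with "preTasks":["test01"] contributes the edge test01 -> that task.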

2. Detailed storage structure of each task type

2.1 Shell node

**The Shell node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SHELL |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Shell script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Downstream nodes on success | |
| 12 | failedNode | Array | Downstream nodes on failure | |
| 13 | dependence | Object | Task dependence | Mutually exclusive with params |
| 14 | maxRetryTimes | String | Max retry times | |
| 15 | retryInterval | String | Retry interval | |
| 16 | timeout | Object | Timeout control | |
| 17 | taskInstancePriority | String | Task priority | |
| 18 | workerGroup | String | Worker group | |
| 19 | preTasks | Array | Predecessor tasks | |

**Shell node data sample:**

{
    "type":"SHELL",
    "id":"tasks-80760",
    "name":"Shell Task",
    "params":{
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "rawScript":"echo \"This is a shell script\""
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.2 SQL node

Executes query and update operations against the specified data source via SQL.

**The SQL node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SQL |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | type | String | Database type | |
| 6 | datasource | Int | Data source id | |
| 7 | sql | String | Query SQL statement | |
| 8 | udfs | String | UDF functions | UDF function ids, comma-separated |
| 9 | sqlType | String | SQL node type | 0 query, 1 non-query |
| 10 | title | String | Email title | |
| 11 | receivers | String | Recipients | |
| 12 | receiversCc | String | Cc recipients | |
| 13 | showType | String | Email display type | TABLE table, ATTACHMENT attachment |
| 14 | connParams | String | Connection parameters | |
| 15 | preStatements | Array | Pre-SQL statements | |
| 16 | postStatements | Array | Post-SQL statements | |
| 17 | localParams | Array | Custom parameters | |
| 18 | description | String | Description | |
| 19 | runFlag | String | Run flag | |
| 20 | conditionResult | Object | Conditional branch | |
| 21 | successNode | Array | Downstream nodes on success | |
| 22 | failedNode | Array | Downstream nodes on failure | |
| 23 | dependence | Object | Task dependence | Mutually exclusive with params |
| 24 | maxRetryTimes | String | Max retry times | |
| 25 | retryInterval | String | Retry interval | |
| 26 | timeout | Object | Timeout control | |
| 27 | taskInstancePriority | String | Task priority | |
| 28 | workerGroup | String | Worker group | |
| 29 | preTasks | Array | Predecessor tasks | |

**SQL node data sample:**

{
    "type":"SQL",
    "id":"tasks-95648",
    "name":"SqlTask-Query",
    "params":{
        "type":"MYSQL",
        "datasource":1,
        "sql":"select id , name , age from emp where id = ${id}",
        "udfs":"",
        "sqlType":"0",
        "title":"xxxx@xxx.com",
        "receivers":"xxxx@xxx.com",
        "receiversCc":"",
        "showType":"TABLE",
        "localParams":[
            {
                "prop":"id",
                "direct":"IN",
                "type":"INTEGER",
                "value":"1"
            }
        ],
        "connParams":"",
        "preStatements":[
            "insert into emp ( id,name ) value (1,'Li' )"
        ],
        "postStatements":[
        ]
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
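
The ${id} placeholder in the sql field above is filled in from localParams at run time. The following is a minimal sketch of that substitution idea (illustrative only; render_sql is a hypothetical helper, not the scheduler's actual implementation):

import re

def render_sql(sql, local_params):
    """Replace ${prop} placeholders with matching IN-direction localParams values."""
    values = {p["prop"]: p["value"] for p in local_params if p.get("direct") == "IN"}
    return re.sub(r"\$\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), sql)

params = [{"prop":"id", "direct":"IN", "type":"INTEGER", "value":"1"}]
print(render_sql("select id , name , age from emp where id = ${id}", params))
# -> select id , name , age from emp where id = 1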

2.3 Spark node

**The Spark node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SPARK |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar package | |
| 9 | deployMode | String | Deployment mode | local, client, cluster |
| 10 | driverCores | String | Number of driver cores | |
| 11 | driverMemory | String | Driver memory | |
| 12 | numExecutors | String | Number of executors | |
| 13 | executorMemory | String | Executor memory | |
| 14 | executorCores | String | Number of executor cores | |
| 15 | programType | String | Program type | JAVA, SCALA, PYTHON |
| 16 | sparkVersion | String | Spark version | SPARK1, SPARK2 |
| 17 | localParams | Array | Custom parameters | |
| 18 | resourceList | Array | Resource files | |
| 19 | description | String | Description | |
| 20 | runFlag | String | Run flag | |
| 21 | conditionResult | Object | Conditional branch | |
| 22 | successNode | Array | Downstream nodes on success | |
| 23 | failedNode | Array | Downstream nodes on failure | |
| 24 | dependence | Object | Task dependence | Mutually exclusive with params |
| 25 | maxRetryTimes | String | Max retry times | |
| 26 | retryInterval | String | Retry interval | |
| 27 | timeout | Object | Timeout control | |
| 28 | taskInstancePriority | String | Task priority | |
| 29 | workerGroup | String | Worker group | |
| 30 | preTasks | Array | Predecessor tasks | |

**Spark node data sample:**

{
    "type":"SPARK",
    "id":"tasks-87430",
    "name":"SparkTask",
    "params":{
        "mainClass":"org.apache.spark.examples.SparkPi",
        "mainJar":{
            "id":4
        },
        "deployMode":"cluster",
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "driverCores":1,
        "driverMemory":"512M",
        "numExecutors":2,
        "executorMemory":"2G",
        "executorCores":2,
        "mainArgs":"10",
        "others":"",
        "programType":"SCALA",
        "sparkVersion":"SPARK2"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.4 MapReduce(MR) node

**The MapReduce (MR) node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | MR |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar package | |
| 9 | programType | String | Program type | JAVA, PYTHON |
| 10 | localParams | Array | Custom parameters | |
| 11 | resourceList | Array | Resource files | |
| 12 | description | String | Description | |
| 13 | runFlag | String | Run flag | |
| 14 | conditionResult | Object | Conditional branch | |
| 15 | successNode | Array | Downstream nodes on success | |
| 16 | failedNode | Array | Downstream nodes on failure | |
| 17 | dependence | Object | Task dependence | Mutually exclusive with params |
| 18 | maxRetryTimes | String | Max retry times | |
| 19 | retryInterval | String | Retry interval | |
| 20 | timeout | Object | Timeout control | |
| 21 | taskInstancePriority | String | Task priority | |
| 22 | workerGroup | String | Worker group | |
| 23 | preTasks | Array | Predecessor tasks | |

**MapReduce (MR) node data sample:**

{
    "type":"MR",
    "id":"tasks-28997",
    "name":"MRTask",
    "params":{
        "mainClass":"wordcount",
        "mainJar":{
            "id":5
        },
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "mainArgs":"/tmp/wordcount/input /tmp/wordcount/output/",
        "others":"",
        "programType":"JAVA"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.5 Python node

**The Python node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | PYTHON |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Python script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Downstream nodes on success | |
| 12 | failedNode | Array | Downstream nodes on failure | |
| 13 | dependence | Object | Task dependence | Mutually exclusive with params |
| 14 | maxRetryTimes | String | Max retry times | |
| 15 | retryInterval | String | Retry interval | |
| 16 | timeout | Object | Timeout control | |
| 17 | taskInstancePriority | String | Task priority | |
| 18 | workerGroup | String | Worker group | |
| 19 | preTasks | Array | Predecessor tasks | |

**Python node data sample:**

{
    "type":"PYTHON",
    "id":"tasks-5463",
    "name":"Python Task",
    "params":{
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "rawScript":"print(\"This is a python script\")"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.6 Flink node

**The Flink node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | FLINK |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar package | |
| 9 | deployMode | String | Deployment mode | local, client, cluster |
| 10 | slot | String | Number of slots | |
| 11 | taskManager | String | Number of TaskManagers | |
| 12 | taskManagerMemory | String | TaskManager memory | |
| 13 | jobManagerMemory | String | JobManager memory | |
| 14 | programType | String | Program type | JAVA, SCALA, PYTHON |
| 15 | localParams | Array | Custom parameters | |
| 16 | resourceList | Array | Resource files | |
| 17 | description | String | Description | |
| 18 | runFlag | String | Run flag | |
| 19 | conditionResult | Object | Conditional branch | |
| 20 | successNode | Array | Downstream nodes on success | |
| 21 | failedNode | Array | Downstream nodes on failure | |
| 22 | dependence | Object | Task dependence | Mutually exclusive with params |
| 23 | maxRetryTimes | String | Max retry times | |
| 24 | retryInterval | String | Retry interval | |
| 25 | timeout | Object | Timeout control | |
| 26 | taskInstancePriority | String | Task priority | |
| 27 | workerGroup | String | Worker group | |
| 28 | preTasks | Array | Predecessor tasks | |

**Flink node data sample:**

{
    "type":"FLINK",
    "id":"tasks-17135",
    "name":"FlinkTask",
    "params":{
        "mainClass":"com.flink.demo",
        "mainJar":{
            "id":6
        },
        "deployMode":"cluster",
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "slot":1,
        "taskManager":"2",
        "jobManagerMemory":"1G",
        "taskManagerMemory":"2G",
        "executorCores":2,
        "mainArgs":"100",
        "others":"",
        "programType":"SCALA"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.7 Http node

**The Http node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | HTTP |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | url | String | Request URL | |
| 6 | httpMethod | String | Request method | GET, POST, HEAD, PUT, DELETE |
| 7 | httpParams | Array | Request parameters | |
| 8 | httpCheckCondition | String | Check condition | Default: response code 200 |
| 9 | condition | String | Check content | |
| 10 | localParams | Array | Custom parameters | |
| 11 | description | String | Description | |
| 12 | runFlag | String | Run flag | |
| 13 | conditionResult | Object | Conditional branch | |
| 14 | successNode | Array | Downstream nodes on success | |
| 15 | failedNode | Array | Downstream nodes on failure | |
| 16 | dependence | Object | Task dependence | Mutually exclusive with params |
| 17 | maxRetryTimes | String | Max retry times | |
| 18 | retryInterval | String | Retry interval | |
| 19 | timeout | Object | Timeout control | |
| 20 | taskInstancePriority | String | Task priority | |
| 21 | workerGroup | String | Worker group | |
| 22 | preTasks | Array | Predecessor tasks | |

**Http node data sample:**

{
    "type":"HTTP",
    "id":"tasks-60499",
    "name":"HttpTask",
    "params":{
        "localParams":[
        ],
        "httpParams":[
            {
                "prop":"id",
                "httpParametersType":"PARAMETER",
                "value":"1"
            },
            {
                "prop":"name",
                "httpParametersType":"PARAMETER",
                "value":"Bo"
            }
        ],
        "url":"https://www.xxxxx.com:9012",
        "httpMethod":"POST",
        "httpCheckCondition":"STATUS_CODE_DEFAULT",
        "condition":""
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
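
To illustrate how httpCheckCondition and condition work together, here is a hedged sketch. STATUS_CODE_DEFAULT comes from the sample above; the other condition names (BODY_CONTAINS, BODY_NOT_CONTAINS) are assumptions from memory of the 1.3 source and should be verified; http_result_ok is a hypothetical helper:

def http_result_ok(status_code, body, check_condition, condition):
    """Decide task success from the HTTP response (illustrative only)."""
    if check_condition == "STATUS_CODE_DEFAULT":
        return status_code == 200  # default: response code 200 means success
    if check_condition == "BODY_CONTAINS":      # assumed enum value
        return condition in body
    if check_condition == "BODY_NOT_CONTAINS":  # assumed enum value
        return condition not in body
    return False

print(http_result_ok(200, "", "STATUS_CODE_DEFAULT", ""))  # True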

2.8 DataX node

**The DataX node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | DATAX |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | customConfig | Int | Whether to use a custom configuration | 0 no (structured fields below), 1 yes (json field) |
| 6 | dsType | String | Source database type | |
| 7 | dataSource | Int | Source database id | |
| 8 | dtType | String | Target database type | |
| 9 | dataTarget | Int | Target database id | |
| 10 | sql | String | SQL statement | |
| 11 | targetTable | String | Target table | |
| 12 | jobSpeedByte | Int | Rate limit (bytes) | |
| 13 | jobSpeedRecord | Int | Rate limit (records) | |
| 14 | preStatements | Array | Pre-SQL statements | |
| 15 | postStatements | Array | Post-SQL statements | |
| 16 | json | String | Custom configuration | Takes effect when customConfig=1 |
| 17 | localParams | Array | Custom parameters | Takes effect when customConfig=1 |
| 18 | description | String | Description | |
| 19 | runFlag | String | Run flag | |
| 20 | conditionResult | Object | Conditional branch | |
| 21 | successNode | Array | Downstream nodes on success | |
| 22 | failedNode | Array | Downstream nodes on failure | |
| 23 | dependence | Object | Task dependence | Mutually exclusive with params |
| 24 | maxRetryTimes | String | Max retry times | |
| 25 | retryInterval | String | Retry interval | |
| 26 | timeout | Object | Timeout control | |
| 27 | taskInstancePriority | String | Task priority | |
| 28 | workerGroup | String | Worker group | |
| 29 | preTasks | Array | Predecessor tasks | |

**DataX node data sample:**

{
    "type":"DATAX",
    "id":"tasks-91196",
    "name":"DataxTask-DB",
    "params":{
        "customConfig":0,
        "dsType":"MYSQL",
        "dataSource":1,
        "dtType":"MYSQL",
        "dataTarget":1,
        "sql":"select id, name ,age from user ",
        "targetTable":"emp",
        "jobSpeedByte":524288,
        "jobSpeedRecord":500,
        "preStatements":[
            "truncate table emp "
        ],
        "postStatements":[
            "truncate table user"
        ]
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.9 Sqoop node

**The Sqoop node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SQOOP |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | concurrency | Int | Concurrency | |
| 6 | modelType | String | Flow direction | import, export |
| 7 | sourceType | String | Data source type | |
| 8 | sourceParams | String | Data source parameters | JSON format |
| 9 | targetType | String | Target data type | |
| 10 | targetParams | String | Target data parameters | JSON format |
| 11 | localParams | Array | Custom parameters | |
| 12 | description | String | Description | |
| 13 | runFlag | String | Run flag | |
| 14 | conditionResult | Object | Conditional branch | |
| 15 | successNode | Array | Downstream nodes on success | |
| 16 | failedNode | Array | Downstream nodes on failure | |
| 17 | dependence | Object | Task dependence | Mutually exclusive with params |
| 18 | maxRetryTimes | String | Max retry times | |
| 19 | retryInterval | String | Retry interval | |
| 20 | timeout | Object | Timeout control | |
| 21 | taskInstancePriority | String | Task priority | |
| 22 | workerGroup | String | Worker group | |
| 23 | preTasks | Array | Predecessor tasks | |

**Sqoop node data sample:**

{
    "type":"SQOOP",
    "id":"tasks-82041",
    "name":"Sqoop Task",
    "params":{
        "concurrency":1,
        "modelType":"import",
        "sourceType":"MYSQL",
        "targetType":"HDFS",
        "sourceParams":"{\"srcType\":\"MYSQL\",\"srcDatasource\":1,\"srcTable\":\"\",\"srcQueryType\":\"1\",\"srcQuerySql\":\"select id , name from user\",\"srcColumnType\":\"0\",\"srcColumns\":\"\",\"srcConditionList\":[],\"mapColumnHive\":[{\"prop\":\"hivetype-key\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"hivetype-value\"}],\"mapColumnJava\":[{\"prop\":\"javatype-key\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"javatype-value\"}]}",
        "targetParams":"{\"targetPath\":\"/user/hive/warehouse/ods.db/user\",\"deleteTargetDir\":false,\"fileType\":\"--as-avrodatafile\",\"compressionCodec\":\"snappy\",\"fieldsTerminated\":\",\",\"linesTerminated\":\"@\"}",
        "localParams":[
        ]
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
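
Note that sourceParams and targetParams are themselves JSON documents encoded as strings, so reading them takes a second json.loads pass. A tiny illustrative sketch:

import json

params = {
    "sourceParams": "{\"srcType\":\"MYSQL\",\"srcQueryType\":\"1\"}",
}
source = json.loads(params["sourceParams"])  # second decode: the value is a JSON string
print(source["srcType"])  # MYSQL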

2.10 Conditional branch node

**The conditional branch node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | CONDITIONS |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | null |
| 5 | description | String | Description | |
| 6 | runFlag | String | Run flag | |
| 7 | conditionResult | Object | Conditional branch | |
| 8 | successNode | Array | Downstream nodes on success | |
| 9 | failedNode | Array | Downstream nodes on failure | |
| 10 | dependence | Object | Task dependence | Mutually exclusive with params |
| 11 | maxRetryTimes | String | Max retry times | |
| 12 | retryInterval | String | Retry interval | |
| 13 | timeout | Object | Timeout control | |
| 14 | taskInstancePriority | String | Task priority | |
| 15 | workerGroup | String | Worker group | |
| 16 | preTasks | Array | Predecessor tasks | |

**Conditional branch node data sample:**

{
    "type":"CONDITIONS",
    "id":"tasks-96189",
    "name":"Conditions",
    "params":{
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            "test04"
        ],
        "failedNode":[
            "test05"
        ]
    },
    "dependence":{
        "relation":"AND",
        "dependTaskList":[
        ]
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
        "test01",
        "test02"
    ]
}
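
How conditionResult drives routing can be shown with a short sketch (plain Python, not the scheduler's actual code): the node hands control to the successNode list when its dependence check passes, and to the failedNode list otherwise; next_nodes is a hypothetical helper:

def next_nodes(task, dependence_passed):
    """Pick downstream nodes from a CONDITIONS task's conditionResult."""
    branch = "successNode" if dependence_passed else "failedNode"
    return task["conditionResult"][branch]

conditions_task = {"conditionResult": {"successNode": ["test04"], "failedNode": ["test05"]}}
print(next_nodes(conditions_task, True))   # ['test04']
print(next_nodes(conditions_task, False))  # ['test05']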

2.11 Sub-process node

**The sub-process node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SUB_PROCESS |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | processDefinitionId | Int | Process definition id | |
| 6 | description | String | Description | |
| 7 | runFlag | String | Run flag | |
| 8 | conditionResult | Object | Conditional branch | |
| 9 | successNode | Array | Downstream nodes on success | |
| 10 | failedNode | Array | Downstream nodes on failure | |
| 11 | dependence | Object | Task dependence | Mutually exclusive with params |
| 12 | maxRetryTimes | String | Max retry times | |
| 13 | retryInterval | String | Retry interval | |
| 14 | timeout | Object | Timeout control | |
| 15 | taskInstancePriority | String | Task priority | |
| 16 | workerGroup | String | Worker group | |
| 17 | preTasks | Array | Predecessor tasks | |

**Sub-process node data sample:**

{
    "type":"SUB_PROCESS",
    "id":"tasks-14806",
    "name":"SubProcessTask",
    "params":{
        "processDefinitionId":2
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.12 Dependent (DEPENDENT) node

**The dependent (DEPENDENT) node data structure is as follows:**

| # | Parameter | Type | Description | Notes |
| --- | --- | --- | --- | --- |
| 1 | id | String | Task code | |
| 2 | type | String | Task type | DEPENDENT |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Shell script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Downstream nodes on success | |
| 12 | failedNode | Array | Downstream nodes on failure | |
| 13 | dependence | Object | Task dependence | Mutually exclusive with params |
| 14 | relation | String | Relation | AND, OR |
| 15 | dependTaskList | Array | Dependent task list | |
| 16 | maxRetryTimes | String | Max retry times | |
| 17 | retryInterval | String | Retry interval | |
| 18 | timeout | Object | Timeout control | |
| 19 | taskInstancePriority | String | Task priority | |
| 20 | workerGroup | String | Worker group | |
| 21 | preTasks | Array | Predecessor tasks | |

**Dependent (DEPENDENT) node data sample:**

{
    "type":"DEPENDENT",
    "id":"tasks-57057",
    "name":"DependentTask",
    "params":{
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
        "relation":"AND",
        "dependTaskList":[
            {
                "relation":"AND",
                "dependItemList":[
                    {
                        "projectId":1,
                        "definitionId":7,
                        "definitionList":[
                            {"value":8,"label":"MRTask"},
                            {"value":7,"label":"FlinkTask"},
                            {"value":6,"label":"SparkTask"},
                            {"value":5,"label":"SqlTask-Update"},
                            {"value":4,"label":"SqlTask-Query"},
                            {"value":3,"label":"SubProcessTask"},
                            {"value":2,"label":"Python Task"},
                            {"value":1,"label":"Shell Task"}
                        ],
                        "depTasks":"ALL",
                        "cycle":"day",
                        "dateValue":"today"
                    }
                ]
            },
            {
                "relation":"AND",
                "dependItemList":[
                    {
                        "projectId":1,
                        "definitionId":5,
                        "definitionList":[
                            {"value":8,"label":"MRTask"},
                            {"value":7,"label":"FlinkTask"},
                            {"value":6,"label":"SparkTask"},
                            {"value":5,"label":"SqlTask-Update"},
                            {"value":4,"label":"SqlTask-Query"},
                            {"value":3,"label":"SubProcessTask"},
                            {"value":2,"label":"Python Task"},
                            {"value":1,"label":"Shell Task"}
                        ],
                        "depTasks":"SqlTask-Update",
                        "cycle":"day",
                        "dateValue":"today"
                    }
                ]
            }
        ]
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
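
The dependence block above is a two-level relation tree: an outer relation combines the groups in dependTaskList, and each group applies its own relation over its dependItemList. A minimal evaluation sketch (illustrative only; evaluate_dependence and the check_item callback, which would decide whether a single upstream item succeeded, are hypothetical, not the scheduler's actual code):

def evaluate_dependence(dependence, check_item):
    """Evaluate the two-level AND/OR relation tree of a DEPENDENT node."""
    outer = all if dependence.get("relation", "AND") == "AND" else any

    def group_passed(group):
        inner = all if group.get("relation", "AND") == "AND" else any
        return inner(check_item(item) for item in group["dependItemList"])

    return outer(group_passed(group) for group in dependence["dependTaskList"])

Here check_item would receive one {projectId, definitionId, depTasks, cycle, dateValue} item and report whether that upstream run succeeded in the given cycle.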

About the DolphinScheduler community:

Apache DolphinScheduler is a very diverse community: so far it has nearly 100 contributors from more than 30 different companies, and its WeChat user groups have about 3,000 members.


Some Apache DolphinScheduler user stories (in no particular order)

More than 300 enterprises and research institutions are already using DolphinScheduler to handle all kinds of scheduling and timed tasks, and another 500+ companies have started trials of DolphinScheduler:

Apache DolphinScheduler project origin: determined by demand


The four characteristics of Apache DolphinScheduler


Apache DolphinScheduler capabilities:

  • Links tasks by their dependencies in a DAG and monitors the running status of tasks in real time

  • Supports rich task types: Shell, MR, Spark, Flink, SQL (MySQL, PostgreSQL, Hive, Spark SQL), Python, Http, Sub_Process, Procedure, etc.

  • Supports scheduled, dependency-triggered, and manual workflow runs, manual pause/stop/resume, as well as failure retry/alerting, recovery from a specified failed node, killing tasks, etc.

  • Supports workflow priority, task priority, task failover, and task timeout alerting/failure

  • Supports workflow-level global parameters and node-level custom parameters

  • Supports online upload/download and management of resource files, plus online file creation and editing

  • Supports online viewing and tailing of task logs, online log download, etc.

  • Implements cluster HA: decentralized Master and Worker clusters coordinated through ZooKeeper

  • Supports online viewing of Master/Worker CPU load and memory usage

  • Supports tree and Gantt chart views of workflow run history, plus task and process status statistics

  • Supports backfilling historical data

  • Supports multi-tenancy

  • Supports internationalization

New features in Apache DolphinScheduler 1.3

* Refactored the Worker implementation for better performance

* Master and Worker now communicate via Netty

* Removed the ZooKeeper task queue

* Three Worker node selection strategies: random, round-robin, and linear weighted load balancing on CPU and memory

* Worker no longer accesses the database directly

* The resource center supports multiple directories

* Added if/else conditional tasks

* Added sqoop/datax tasks

* Supports k8s deployment

* Added one-click DAG layout formatting

* Beautified flow charts

* Supports installation via the ambari plugin

* Batch export and import of workflows

* Process definitions can be copied

* Greatly simplified configuration items and deployment

Online demo:

http://106.75.43.194:8888/

DolphinScheduler Slogan


Join Apache DolphinScheduler

If you have any questions, ideas, or suggestions while using DolphinScheduler, you can take part in building the DolphinScheduler community through the Apache mailing list.

You are welcome to join the contributor team; getting into the open source community starts with submitting your first PR:

  • Find an issue tagged "easy to fix" or something very simple (such as a typo) and use your first PR to get familiar with the submission process. If you have any questions, feel free to reach out.
