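1. Environment preparation
1.1 Introduction
When working with Flink and Spark, we found that much of the programming model, startup configuration, and operations management can be abstracted and shared. StreamPark currently provides a one-stop platform for developing and managing Flink stream-processing jobs, covering the full lifecycle from job development to going live.
Spark support is planned but not yet available.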
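1.2 Download
StreamPark installation package: https://streampark.apache.org/download
StreamPark official documentation: https://streampark.apache.org/docs/intro
The latest version at the time of writing is 2.1.2, which is the version installed here.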
1.3 Existing components and versions
No. | Name | Configuration |
---|---|---|
1 | K8S (TKE) | CPU: 316, Mem: 361.57G, Storage: 349.1G, Pods: 3253, IP: 3*273 (v1.26.1) |
2 | NFS (CFS) | 40G disk (currently; expandable) |
3 | Harbor (container registry) | unknown |
4 | Flink | flink-(1.13.0/1.14.4/1.16.2) |
5 | StreamPark | 2.1.2 |
6 | MySQL | 5.7 |
2. Mounts and permissions
2.1 k8s access
Ask the operations team to grant access, mainly kubectl permissions.
Download kubectl.
Credentials file: /root/.kube/config
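Before continuing, it can help to confirm that the credentials file is in place and that the account is allowed to create pods. The following is a minimal sketch: kubeconfig_ready is a hypothetical helper name, and the flink namespace is the one used later in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kubeconfig file exists, is readable,
# and is not empty. $1 = path (defaults to ~/.kube/config)
kubeconfig_ready() {
  cfg=${1:-$HOME/.kube/config}
  [ -r "$cfg" ] && [ -s "$cfg" ]
}

# Usage (requires kubectl on the PATH and access to the cluster):
#   kubeconfig_ready /root/.kube/config && kubectl auth can-i create pods -n flink
```

kubectl auth can-i answers yes or no, so a missing permission shows up here rather than halfway through the deployment.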
2.2 NFS mount
Ask the operations team for the service IP.
#Before mounting, make sure nfs-utils (or nfs-common) is installed
sudo yum install nfs-utils
#Create the local mount directory
#Syntax: mkdir <local mount directory>
mkdir /localfolder/
#Mount the file system
#Mount the CFS root directory
#The commands below can be copied from the CFS console (File System Details > Mount Target Details). Some older file systems do not support the noresvport option, so prefer the command the console suggests. With noresvport, the client uses a new TCP port when the network reconnects, so the connection to the file system is not interrupted between a network failure and its recovery; enabling it is recommended.
#Some older Linux kernels need vers=4; if mounting with vers=4.0 fails, try vers=4.
#Syntax: sudo mount -t nfs -o vers=4.0,noresvport <mount target IP>:/ <local mount directory>
sudo mount -t nfs -o vers=4.0,noresvport 10.0.24.4:/ /localfolder
#Mount a CFS subdirectory (the noresvport and vers notes above apply here too)
sudo mount -t nfs -o vers=4.0,noresvport 10.0.24.4:/subfolder /localfolder
Tencent Cloud NFS (CFS) documentation: https://cloud.tencent.com/document/product/582/11523
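The fallback from vers=4.0 to vers=4 described in the mount notes can be scripted. The following is a minimal sketch under the assumption that noresvport is supported by the file system; cfs_mount_opts and mount_cfs are hypothetical helper names, not part of any CFS tooling.

```shell
#!/bin/sh
# Hypothetical helper: build the NFS option string.
# $1 = NFS version (4.0 or 4), $2 = "yes" to append noresvport
cfs_mount_opts() {
  opts="vers=$1"
  if [ "$2" = "yes" ]; then
    opts="$opts,noresvport"
  fi
  echo "$opts"
}

# Hypothetical helper: try vers=4.0 first, then fall back to vers=4.
# $1 = mount target IP, $2 = remote path, $3 = local directory
mount_cfs() {
  for v in 4.0 4; do
    if sudo mount -t nfs -o "$(cfs_mount_opts "$v" yes)" "$1:$2" "$3"; then
      echo "mounted with vers=$v"
      return 0
    fi
  done
  return 1
}

# Usage, with the values from this article:
#   mount_cfs 10.0.24.4 / /localfolder
```

If the console recommends a command without noresvport, pass "no" as the second argument of cfs_mount_opts instead.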
2.3 Harbor
Ask the operations team to grant access.
Note: the project repository must be set to public before it can be used.
3. Testing Flink
Create a dedicated namespace and service account for Flink in k8s, then bind the account to the edit role:
kubectl create namespace flink
kubectl create serviceaccount flink -n flink
kubectl create clusterrolebinding flink-role-bind --clusterrole=edit --serviceaccount=flink:flink
On a node with k8s access, submit a test job to the Flink session cluster:
bin/flink run \
-e kubernetes-session \
-Dkubernetes.namespace=flink \
-Dkubernetes.rest-service.exposed.type=NodePort \
-Dkubernetes.cluster-id=flink-cluster \
-c WordCount1 \
/data/package/jar/flink_test-1.0-SNAPSHOT.jar
#Reference configuration for starting the session cluster (the session must be running before the job above is submitted)
bin/kubernetes-session.sh \
-Dkubernetes.namespace=flink \
-Dkubernetes.jobmanager.service-account=flink \
-Dkubernetes.rest-service.exposed.type=NodePort \
-Dkubernetes.cluster-id=flink-cluster \
-Dkubernetes.jobmanager.cpu=0.2 \
-Djobmanager.memory.process.size=1024m \
-Dresourcemanager.taskmanager-timeout=3600000 \
-Dkubernetes.taskmanager.cpu=0.2 \
-Dtaskmanager.memory.process.size=1024m \
-Dtaskmanager.numberOfTaskSlots=1
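With kubernetes.rest-service.exposed.type=NodePort, the session's web UI is reached through a NodePort Service. The following is a minimal sketch for looking up that port. It assumes the REST Service is named `<cluster-id>-rest` (confirm with `kubectl get svc -n flink` if unsure); both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: name of the REST Service for a given cluster-id
# (assumes the "<cluster-id>-rest" naming convention).
rest_service_name() {
  echo "$1-rest"
}

# Hypothetical helper: print the NodePort of the REST Service.
# $1 = namespace, $2 = cluster-id
rest_node_port() {
  kubectl get svc "$(rest_service_name "$2")" -n "$1" \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

# Usage (requires kubectl access to the cluster):
#   rest_node_port flink flink-cluster
# then open http://<any node IP>:<printed port> in a browser.
```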
4. Installing StreamPark
4.1 Building the StreamPark (k8s) image
#Build StreamPark
./build.sh #check the Maven mirror configuration, otherwise dependencies cannot be resolved; npm must also be installed
#Check that npm is installed
npm -v
#Add the MySQL connector jar to the StreamPark package
cp /data/module/streampart_2.12-2.1.2/lib/mysql-connector-java-8.0.30.jar lib/
#Copy the Maven settings
cp /data/module/maven-3.6.3/conf/settings.xml /data/module/docker/streampark-docker/
#Edit application.yml
profiles.active: mysql #[h2,pgsql,mysql]
lark-url: https://open.feishu.cn
workspace:
  local: /opt/streampark_workspace
#Write application-mysql.yml
tee /data/module/docker/streampark-docker/streampark-2.1.2/conf/application-mysql.yml <<-'EOF'
spring.datasource.driver-class-name: com.mysql.cj.jdbc.Driver
streampark.docker.http-client.docker-host: ${DOCKER_HOST:}
streampark.maven.settings: ${MAVEN_SETTINGS_PATH:/root/.m2/settings.xml}
streampark.workspace.local: ${WORKSPACE_PATH:/opt/streampark_workspace}
EOF
#Write the Dockerfile
#Prepare kubectl, settings.xml, and config (the kubeconfig used by kubectl) in the build directory beforehand
tee Dockerfile <<-'EOF'
FROM flink:1.17.1-scala_2.12-java8
WORKDIR /opt/streampark/
ADD ./streampark-2.1.2/ /opt/streampark/
ADD ./kubectl /opt/streampark/
ADD ./settings.xml /root/.m2/
USER root
RUN sed -i -e 's/eval $NOHUP/eval/' bin/streampark.sh \
 && sed -i -e 's/>> "$APP_OUT" 2>&1 "&"//' bin/streampark.sh \
 && install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl \
 && mkdir -p ~/.kube
WORKDIR /opt/streampark/
ADD ./config /root/.kube/
RUN chown -R flink /opt/streampark/
EXPOSE 10000
EOF
#Build the image
docker build -f Dockerfile -t apache/streampark-flink:2.1.2 .
#Push the image to the registry
docker tag apache/streampark-flink:2.1.2 storage/bigdata/streampark-flink:2.1.2
docker push storage/bigdata/streampark-flink:2.1.2
#The Deployment in 4.4 pulls the 2.1.2-rc4 tag; these two commands assume an image with that tag was built the same way
docker tag apache/streampark-flink:2.1.2-rc4 storage/bigdata/streampark-flink:2.1.2-rc4
docker push storage/bigdata/streampark-flink:2.1.2-rc4
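The two sed edits in the Dockerfile strip the nohup prefix and the background redirection from bin/streampark.sh, presumably so that the JVM stays in the foreground as the container's main process. The sketch below applies the same two expressions to a made-up launch line to show their effect. The real line in streampark.sh may differ, and foreground_patch is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: apply the Dockerfile's two sed expressions to one line of text.
foreground_patch() {
  printf '%s\n' "$1" | sed -e 's/eval $NOHUP/eval/' -e 's/>> "$APP_OUT" 2>&1 "&"//'
}

# Made-up sample line, for illustration only.
sample='eval $NOHUP java -jar app.jar >> "$APP_OUT" 2>&1 "&"'

# Prints the line without the $NOHUP prefix and without the
# redirection and background marker.
foreground_patch "$sample"
```

Running the expressions against a copy of the real script in the same way is a quick check that both patterns still match after a StreamPark upgrade.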
4.2 Deploying the MySQL pod
#Create the mysql namespace, service account, and role binding in k8s
#Syntax: kubectl create clusterrolebinding <ClusterRoleBinding name> --clusterrole=<role to bind> --serviceaccount=<namespace>:<service account>
kubectl create namespace mysql
kubectl create serviceaccount mysql -n mysql
kubectl create clusterrolebinding mysql-role-bind --clusterrole=edit --serviceaccount=mysql:mysql
#Expected output:
#clusterrolebinding.rbac.authorization.k8s.io/mysql-role-bind created
#Inspect the role bindings (ClusterRoleBindings are cluster-scoped, so no -n flag is needed)
kubectl get clusterrolebinding flink-role-bind -o yaml
kubectl get clusterrolebinding mysql-role-bind -o yaml
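Reading the binding's YAML shows what was configured. To check what a service account can actually do, kubectl auth can-i can impersonate it. The following is a minimal sketch; sa_user and sa_can_i are hypothetical helper names, and the calling user needs permission to impersonate service accounts.

```shell
#!/bin/sh
# Hypothetical helper: the user name under which a service account authenticates.
# $1 = namespace, $2 = service account
sa_user() {
  echo "system:serviceaccount:$1:$2"
}

# Hypothetical helper: ask the API server whether the service account may
# perform an action. $1 = namespace, $2 = service account, $3 = verb, $4 = resource
sa_can_i() {
  kubectl auth can-i "$3" "$4" -n "$1" --as="$(sa_user "$1" "$2")"
}

# Usage (requires kubectl access to the cluster):
#   sa_can_i mysql mysql create pods
#   sa_can_i flink flink create deployments
```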
#Define the PV and PVC backed by NFS (Tencent CFS can be used directly)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-mysql
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 10Gi
  csi:
    driver: com.tencent.cloud.csi.cfs
    volumeAttributes:
      host: xxx
      path: /data_mysql
      vers: "4"
    volumeHandle: cfs #must be different for every PV, otherwise mounting two PVCs fails
  persistentVolumeReclaimPolicy: Retain
  storageClassName: data-mysql
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-mysql
  namespace: mysql
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: data-mysql
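Because every PV needs its own volumeHandle, it is worth checking the manifests for duplicates before applying them. The following is a minimal sketch; duplicate_volume_handles is a hypothetical helper, and the file names in the usage line are placeholders for wherever the PV manifests are saved.

```shell
#!/bin/sh
# Hypothetical helper: print every volumeHandle value that appears more than
# once across the given manifest files (prints nothing if all are unique).
duplicate_volume_handles() {
  grep -h '^ *volumeHandle:' "$@" | awk '{print $2}' | sort | uniq -d
}

# Usage, with placeholder file names:
#   duplicate_volume_handles pv-pvc-mysql.yaml pv-pvc-streampark.yaml
```

Any value the helper prints is shared by at least two PVs and should be changed in one of them.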
#Pull the MySQL image with docker and push it to the registry
docker pull mysql:5.7
docker tag mysql:5.7 storage/bigdata/mysql:5.7
docker push storage/bigdata/mysql:5.7
#List the cluster nodes and their IPs
kubectl get node -o wide
#Write the MySQL manifest and its configuration
sudo mkdir -p /data/module/docker/mysql/{conf,data}/
sudo tee /data/module/docker/mysql/conf/pod-db-mysql.yaml <<-'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: conf-mysql
  namespace: mysql
data:
  mysql.cnf: |
    [mysqld]
    #Unique ID of this MySQL server; each server needs its own ID
    server-id=1
    #Address the server listens on (0.0.0.0 accepts connections on all interfaces)
    bind-address=0.0.0.0
    #Time zone
    default-time_zone='+8:00'
    #Default character set; utf8mb4 supports 4-byte characters such as emoji
    character-set-server=utf8mb4
    #Collation; must match character-set-server
    collation-server=utf8mb4_general_ci
    #Character set applied when a client connects, to avoid garbled text
    init_connect='SET NAMES utf8mb4'
    #Case sensitivity of table names; 1 means case-insensitive
    lower_case_table_names=1
    #Maximum number of connections
    max_connections=400
    #Maximum number of failed connection attempts
    max_connect_errors=1000
    #TIMESTAMP columns not explicitly declared NOT NULL allow NULL values
    explicit_defaults_for_timestamp=true
    #Maximum packet size; consider raising it to 1G if BLOB objects are stored
    max_allowed_packet=128M
    #Idle connections are closed after this many seconds
    #The default wait_timeout is 8 hours; interactive_timeout must be set together with it to take effect
    interactive_timeout=3600
    wait_timeout=3600
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-db-mysql
  namespace: mysql
spec:
  #serviceAccount: mysql
  nodeName: xxx
  hostNetwork: true #use the host network (occupies ports on the node)
  containers:
    - name: mysql-k8s
      image: storage/bigdata/mysql:5.7
      env:
        - name: TZ
          value: "Asia/Shanghai"
        - name: LANG
          value: "zh_CN.UTF-8"
        - name: MYSQL_ROOT_PASSWORD
          value: "xxxx"
      ports:
        #- containerPort: 3306
      volumeMounts:
        - mountPath: /var/lib/mysql
          subPath: mysql
          name: data-mysql
        - mountPath: /etc/mysql/conf.d
          name: conf-volume
          readOnly: true
  volumes:
    - name: data-mysql
      persistentVolumeClaim:
        claimName: data-mysql
    - name: conf-volume
      configMap:
        name: conf-mysql
EOF
#Start the pod
kubectl apply -f /data/module/docker/mysql/conf/pod-db-mysql.yaml
#To remove the pod later:
#kubectl delete -f /data/module/docker/mysql/conf/pod-db-mysql.yaml
#Wait a moment, then check the pod
kubectl get pod -A -o wide
kubectl describe pod pod-db-mysql -n mysql
kubectl logs --tail=100 -f pod-db-mysql -n mysql
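Instead of waiting "a moment" and checking by hand, the wait can be scripted with a small retry loop. The following is a minimal sketch; retry is a hypothetical helper, and the mysqladmin ping probe in the usage line assumes the mysqladmin client is available inside the MySQL container.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeeds as soon as the command succeeds.
# Note: attempts, delay and i are plain (global) shell variables.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (requires kubectl access to the cluster):
#   retry 30 5 kubectl exec -n mysql pod-db-mysql -- mysqladmin ping
```

The loop probes the server itself rather than only the pod status, which matters because the Pod above defines no readiness probe.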
4.3 Initializing the MySQL database
#Copy the SQL scripts (from the source tree or from the unpacked package) into the NFS directory that the MySQL pod mounts at /var/lib/mysql
cp -r /data/software/incubator-streampark-2.1.2-rc3/streampark-console/streampark-console-service/src/main/assembly/script/ /localnfs/data_mysql/mysql/streampark-sql/
cp -r /data/module/docker/streampark-docker/streampark-2.2.0/script/ /localnfs/data_mysql/mysql/streampark-sql/
#Create the user and the database
#Enter the MySQL container
kubectl exec -n mysql -it pod-db-mysql -- bash
#------------------------inside the MySQL container----------------------------
#Enter the MYSQL_ROOT_PASSWORD set in the pod manifest when prompted
mysql -uroot -p
create database if not exists `streampark` character set utf8mb4 collate utf8mb4_general_ci;
create user 'xxxx'@'%' IDENTIFIED WITH mysql_native_password by 'xxx';
grant ALL PRIVILEGES ON streampark.* to 'xxxx'@'%';
flush privileges;
-- Import the schema and data
use streampark;
source /var/lib/mysql/streampark-sql/schema/mysql-schema.sql
source /var/lib/mysql/streampark-sql/data/mysql-data.sql
-- Leave mysql
quit
#------------------------leave the MySQL container------------------------
exit
#Inspect the schema and data scripts
vim /data/module/docker/streampark-docker/streampark-2.2.0/script/schema/mysql-schema.sql
vim /data/module/docker/streampark-docker/streampark-2.2.0/script/data/mysql-data.sql
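The source commands fail if the scripts were copied to a different directory level than expected, so a quick check before entering the container can save a round trip. The following is a minimal sketch; missing_sql_files is a hypothetical helper that looks for the two files referenced by the source commands.

```shell
#!/bin/sh
# Hypothetical helper: print the expected SQL scripts that are missing under
# the given directory; prints nothing when both are present.
missing_sql_files() {
  for f in schema/mysql-schema.sql data/mysql-data.sql; do
    [ -f "$1/$f" ] || echo "$f"
  done
}

# Usage on the NFS client, with the path used in this article:
#   missing_sql_files /localnfs/data_mysql/mysql/streampark-sql
```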
4.4 Creating the StreamPark pod
#Create the streampark namespace, service account, and role binding in k8s
#Syntax: kubectl create clusterrolebinding <ClusterRoleBinding name> --clusterrole=<role to bind> --serviceaccount=<namespace>:<service account>
kubectl create namespace streampark
kubectl create serviceaccount streampark -n streampark
kubectl create clusterrolebinding streampark-role-bind --clusterrole=edit --serviceaccount=streampark:streampark
#Expected output:
#clusterrolebinding.rbac.authorization.k8s.io/streampark-role-bind created
#Define the PV and PVC backed by NFS
vim /data/module/docker/k8s/pv-pvc-streampark.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-streampark
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 10Gi
  csi:
    driver: com.tencent.cloud.csi.cfs
    volumeAttributes:
      host: xxx
      path: /data_streampark
      vers: "4"
    volumeHandle: cfs-streampark #must be different for every PV, otherwise mounting two PVCs fails
  persistentVolumeReclaimPolicy: Retain
  storageClassName: data-streampark
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-streampark
  namespace: streampark
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: data-streampark
#Apply the manifest
kubectl apply -f /data/module/docker/k8s/pv-pvc-streampark.yaml
#To remove the PV and PVC later:
#kubectl delete -f /data/module/docker/k8s/pv-pvc-streampark.yaml
#Run on xxx: install StreamPark on the designated node xxx
sudo tee /data/pod-app-streampark.yaml <<-'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pod-app-streampark
  name: pod-app-streampark
  namespace: streampark
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pod-app-streampark
  template:
    metadata:
      labels:
        app: pod-app-streampark
    spec:
      nodeName: xxx
      hostNetwork: true #use the host network (occupies ports on the node)
      containers:
        - name: streampark
          image: storage/bigdata/streampark-flink:2.1.2-rc4
          imagePullPolicy: Always
          env:
            - name: TZ
              value: "Asia/Shanghai"
            - name: LANG
              value: "zh_CN.UTF-8"
            - name: SPRING_PROFILES_ACTIVE
              value: "mysql"
            - name: SPRING_DATASOURCE_URL
              value: "jdbc:mysql://xxx:3306/streampark?useSSL=false&useUnicode=true&characterEncoding=UTF-8&allowPublicKeyRetrieval=false&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=GMT%2B8"
            - name: SPRING_DATASOURCE_USERNAME
              value: "xxxx"
            - name: SPRING_DATASOURCE_PASSWORD
              value: "xxxx"
            - name: DOCKER_HOST
              value: "tcp://xxx:2375"
            - name: DEBUG_OPTS #debug port options
              value: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:10001"
          ports:
            #- containerPort: 10000
          volumeMounts:
            - mountPath: /root/.kube
              subPath: .kube
              name: conf-volume
            - mountPath: /opt/streampark_workspace
              subPath: streampark_workspace
              name: data-volume
          command: ["sh","-c","bash bin/startup.sh debug"]
      volumes:
        - name: conf-volume
          hostPath:
            path: /root
            type: DirectoryOrCreate
        - name: data-volume
          nfs:
            path: /data_streampark
            server: xxx
EOF
#Start the pod
kubectl apply -f /data/pod-app-streampark.yaml
#To remove the Deployment later:
#kubectl delete -f /data/pod-app-streampark.yaml
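The serverTimezone=GMT%2B8 parameter in SPRING_DATASOURCE_URL is the URL-encoded form of GMT+8; in a URL query string a literal + is conventionally read as a space, so it has to be escaped. The following is a minimal sketch of that encoding step. encode_plus and jdbc_url are hypothetical helper names, and the parameter list is shortened for illustration.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode every "+" in a string.
encode_plus() {
  printf '%s\n' "$1" | sed 's/+/%2B/g'
}

# Hypothetical helper: build a (shortened) MySQL JDBC URL.
# $1 = host, $2 = port, $3 = database, $4 = time zone (for example GMT+8)
jdbc_url() {
  echo "jdbc:mysql://$1:$2/$3?useSSL=false&serverTimezone=$(encode_plus "$4")"
}

# prints jdbc:mysql://xxx:3306/streampark?useSSL=false&serverTimezone=GMT%2B8
jdbc_url xxx 3306 streampark 'GMT+8'
```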
#Wait a moment, then check the pod
kubectl get pod -n streampark -o wide
kubectl describe pod -n streampark -l app=pod-app-streampark
kubectl logs --tail=1000 -f deployment/pod-app-streampark -n streampark
#Open a shell in the StreamPark container
kubectl exec -n streampark -it deployment/pod-app-streampark -- bash
#Add -c streampark to select the container explicitly
#Grant permissions to the default service account of a namespace, for example:
#kubectl create clusterrolebinding flink-role-binding-default --clusterrole=edit --serviceaccount=flink_dev:default
kubectl create clusterrolebinding flink-role-binding-default --clusterrole=edit --serviceaccount=default:default