Fluid 0.3 officially released: generalized data acceleration for cloud-native scenarios
2020-12-07 19:19:03 【Aliyun yunqi】
Brief introduction: Big data, AI, and other data-intensive applications that run in the cloud with compute separated from storage suffer from high data access latency, difficult joint analysis, and complex multi-dimensional management. To address these pain points, Nanjing University PASALab, Alibaba, and Alluxio jointly launched the open source project Fluid in September 2020. Version 0.3 was released recently.
Author | Gu Rong, Nanjing University PASALab
Reading guide: To address high data access latency, difficult joint analysis, and complex multi-dimensional management for big data, AI, and other data-intensive applications under compute-storage separation in the cloud, Nanjing University PASALab, Alibaba, and Alluxio jointly launched the open source project Fluid in September 2020.
Fluid is an efficient supporting platform for data-intensive applications in cloud-native environments. Since its open source release, the project has attracted attention from many experts and engineers in related fields, and with their active feedback the community has been developing rapidly. Fluid 0.3 was officially released recently with three major new features:
- General-purpose storage acceleration: access acceleration for Kubernetes data volumes
- Stronger data access security: fine-grained permission control for datasets
- Simpler configuration: built-in, cloud-native parameter optimization that replaces complex manual tuning
Fluid project address: https://github.com/fluid-cloudnative/fluid
The requirements behind these three features come from production feedback from many community users. Fluid v0.3 also includes a number of bug fixes and documentation updates. You are welcome to try Fluid v0.3! Thank you to everyone who contributed to this release. We will continue to follow and adopt community suggestions to move the Fluid project forward, and we look forward to hearing more feedback from you!
Fluid v0.3 download link: https://github.com/fluid-cloudnative/fluid/releases
The following sections introduce the features of this release in more detail.
1. Kubernetes data volume access acceleration
Earlier versions of Fluid already supported many underlying storage systems (such as HDFS and OSS). In real production environments, however, internal storage systems are often more diverse, and some of them still could not be connected to Fluid because of incompatibility. For example, a user running the Lustre distributed file system could not use Fluid, because the distributed cache engine used by earlier versions of Fluid was not yet compatible with Lustre.
To make Fluid's data access acceleration more general across cloud-native scenarios, Fluid v0.3 adds mount acceleration support for Persistent Volume Claim (PVC) data volumes and host directories (Host Path). This gives all kinds of storage systems a general way to connect to Fluid: whichever underlying storage system you use, as long as it can be mapped to a Kubernetes-native PVC resource object or to a host directory on the cluster nodes, it can benefit from Fluid features such as distributed data caching and data-affinity scheduling. The basic concept is shown in the figure below:
Usage is simple. Users only need to specify pvc://nfs-imagenet in mountPoint, where nfs-imagenet is a data volume that already exists in the Kubernetes cluster.
```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: fluid-imagenet
spec:
  mounts:
    - mountPoint: pvc://nfs-imagenet
      name: nfs-imagenet
```
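A Dataset on its own only declares where the data lives. To actually cache it, Fluid pairs the Dataset with a cache runtime, and an application then mounts the volume that Fluid exposes. The sketch below is illustrative rather than taken from this release note. It assumes that the runtime is bound to the Dataset by sharing its name, and that Fluid exposes the dataset as a PVC named after the Dataset. The cache sizing values, the Pod name `imagenet-reader`, the `busybox` image, and the `/data` mount path are hypothetical and should be checked against the sample documents for your Fluid version.

```yaml
# Sketch only: an Alluxio-based cache runtime for the Dataset above.
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: fluid-imagenet        # assumption: same name as the Dataset binds the two
spec:
  replicas: 1                 # hypothetical: number of cache workers
  tieredstore:
    levels:
      - mediumtype: MEM       # cache in memory
        path: /dev/shm
        quota: 2Gi            # hypothetical cache capacity per worker
        high: "0.95"
        low: "0.7"
---
# Hypothetical Pod that reads the data through the cache.
apiVersion: v1
kind: Pod
metadata:
  name: imagenet-reader
spec:
  containers:
    - name: reader
      image: busybox
      command: ["sh", "-c", "ls /data && sleep 3600"]
      volumeMounts:
        - name: imagenet
          mountPath: /data
  volumes:
    - name: imagenet
      persistentVolumeClaim:
        claimName: fluid-imagenet   # assumption: PVC created by Fluid, named after the Dataset
```

Note that the Dataset is named `fluid-imagenet` while the original volume is `nfs-imagenet`. Keeping the two names different avoids a clash between the native PVC and the one assumed to be created by Fluid.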
We verified PVC access acceleration in a test scenario that trains the ResNet-50 model with TensorFlow Benchmark. The speedup is shown below:
| Metric | PVC data volume used directly | PVC data volume accelerated by Fluid |
|---|---|---|
| Speed at step 1000 (images/second) | 3,136 | 8,889 |
| Final speed (images/second) | 15,024 | 20,506 |
| Accuracy @ 5 | 0.9228 | 0.9204 |
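As a rough reading of the throughput rows above, the ratios can be worked out directly. This is a back-of-the-envelope check, not a figure from the benchmark itself. It assumes that both runs process the same number of images and that the final speed is representative of sustained throughput, so that training time is inversely proportional to it.

$$
\frac{8889}{3136} \approx 2.83, \qquad \frac{20506}{15024} \approx 1.36, \qquad 1 - \frac{15024}{20506} \approx 0.267
$$

Under these assumptions, throughput at step 1000 is about 2.8 times higher, final throughput is about 1.36 times higher, and the run takes roughly 27% less time.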
Judging from the evaluation results, the distributed cache provided by Fluid speeds up the whole training task and shortens the overall training time by more than 20%. For more test details, see the related sample documents on GitHub:
- PVC acceleration document: https://github.com/fluid-cloudnative/fluid/blob/master/docs/zh/samples/accelerate_pvc.md
- Host directory acceleration document: https://github.com/fluid-cloudnative/fluid/blob/master/docs/zh/samples/hostpath.md
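For the host directory case, a Dataset might look like the sketch below. The `local://` scheme is an assumption based on the host directory sample linked above and should be confirmed there. The Dataset name, the mount name, and the path `/mnt/nfs-imagenet` are hypothetical, and the directory is assumed to exist on the nodes where the cache runs.

```yaml
# Sketch only: accelerate a directory that is already mounted on cluster nodes.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hostpath-imagenet                    # hypothetical name
spec:
  mounts:
    - mountPoint: local:///mnt/nfs-imagenet  # assumption: local:// denotes a host directory
      name: nfs-imagenet
```

This route fits storage systems such as Lustre that are mounted on the nodes themselves and not exposed as a PVC.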
2. Dataset access control
Many enterprises that provide machine learning platform services have multiple users sharing one storage system. For security reasons, these providers need strict access control to keep data isolated between users, so that no unauthorized user can access another user's datasets.
Fluid v0.3 supports this scenario. After an underlying storage system shared by multiple users is mounted into Fluid, the file permission information that Fluid exposes (such as owner and file mode) stays consistent with the underlying storage system. In other words, file permissions are passed through transparently from the underlying storage system to the nodes where Fluid is deployed. Access control defined in the underlying storage system therefore also applies on every node where Fluid is deployed, and data isolation between users is preserved.
In addition, Fluid v0.3 provides a "temporary borrowing" feature for datasets. Temporary borrowing means that one user is given temporary access to a dataset that belongs to another user. In Fluid v0.3, an administrator can use flexible configuration to change the ownership of a dataset on the nodes where Fluid is deployed, which grants the specified user temporary access to someone else's dataset. This helps cluster administrators manage dataset permissions in a more fine-grained and flexible way.
Document on accessing data as a non-root user: https://github.com/fluid-cloudnative/fluid/blob/master/docs/zh/samples/nonroot_access.md
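A minimal sketch of what such a configuration could look like is shown below. It is not taken from this release note. It assumes that the runtime accepts a `runAs` block naming the user that the cache components run as, and that the Dataset accepts an `owner` block naming the user who should own the exposed files. Both field names, and the uid, gid, user, and group values, are assumptions to verify against the non-root access document linked above.

```yaml
# Sketch only: expose the dataset as owned by a specific non-root user.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: fluid-imagenet
spec:
  mounts:
    - mountPoint: pvc://nfs-imagenet
      name: nfs-imagenet
  owner:                 # assumption: user who should own the exposed files
    uid: 1201            # hypothetical values throughout
    gid: 1201
    user: user1
    group: group1
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: fluid-imagenet
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi
        high: "0.95"
        low: "0.7"
  runAs:                 # assumption: run cache components as this user, not root
    uid: 1201
    gid: 1201
    user: user1
    group: group1
```

Under this reading, an administrator would lend a dataset to another user by creating a Dataset over the same underlying storage with that user's identity filled in.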
3. Default parameter configuration optimization
Fluid exposes many parameters so that users can tailor it to their own applications. Before Fluid 0.3, users had to configure these manually according to their environment and business goals, but manual tuning is difficult and labor-intensive for most users.
Fluid v0.3 builds in a large set of default configurations for the Alluxio and FUSE components, so users no longer need to focus on parameter tuning. In our experience, the optimized defaults deliver good performance in most common Fluid scenarios.
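With sensible defaults built in, explicit settings become the exception. Where a workload still needs a different value, an override might look like the sketch below. It assumes that the runtime accepts a `properties` map that is passed to Alluxio. The property shown is a standard Alluxio client setting, but the value is hypothetical and is not a recommendation from this release.

```yaml
# Sketch only: override one default while leaving the rest to Fluid.
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: fluid-imagenet
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi
        high: "0.95"
        low: "0.7"
  properties:            # assumption: key-value pairs forwarded to Alluxio
    alluxio.user.block.size.bytes.default: 256MB   # hypothetical override value
```

Any key that is left out keeps the default that Fluid supplies.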
Fluid v0.3 mainly addresses problems and requirements that community users reported from real production environments. Mount support for host directories and PVCs provides a general solution for compatibility with different underlying storage systems. Dataset access control lets Fluid meet the needs of production environments shared by multiple users. The optimized default parameter configuration makes Fluid easier to use while maintaining stable performance in most scenarios.
If you have any questions, you are welcome to join the DingTalk discussion group: https://img.alicdn.com/tfs/TB1Cm4ciNvbeK8jSZPfXXariXXa-452-550.png
- Thanks to Xu Zhihao and Luo Yili (Nanjing University PASALab) for their contribution to Kubernetes data volume access acceleration
- Thanks to Lu Dongdong and Xie Yuandong (Unisound) for their contribution to the dataset permission control feature
About the author
Dr. Gu Rong is an associate researcher in the Department of Computer Science at Nanjing University, working on big data processing systems. He has published more than 20 papers in leading journals and conferences such as TPDS, ICDE, JPDC, IPDPS, and ICPP, and has led several projects funded by the National Natural Science Foundation of China (General Program and Youth Program) and by a special grant from the China Postdoctoral Science Foundation. His research results have been applied at companies including Alibaba, Baidu, ByteDance, Sinopec, and Huatai Securities, and in the open source projects Apache Spark and Alluxio. He received the 2018 Jiangsu Province Science and Technology First Prize and the 2019 Jiangsu Computer Society Youth Science and Technology Award. He serves as a member of the System Software Technical Committee and a corresponding member of the Big Data Technical Committee of the China Computer Federation, as secretary general of the Big Data Technical Committee of the Jiangsu Computer Society, as co-founder of the Fluid open source project, and as a PMC member of the Alluxio open source project.
This article is original content from Alibaba Cloud and may not be reproduced without permission.