
Atlas vs DataHub vs Amundsen

2020-11-11 09:29:15 Irving the procedural ape

Data governance matters. Traditional data governance is managed through documents, which can no longer meet the needs of data governance in the big data era. Data governance suited to the Hadoop big data ecosystem is therefore very important.

Data governance for big data is a major challenge for many enterprises, and few ready-made solutions are available. In recent years, however, many companies have built their own tools and open-sourced them. This article analyzes these data discovery platforms in detail. Internationally, there are more than ten implementations.

Problems a data discovery platform can solve

Why is a data discovery platform needed?

During data governance we often run into questions like these: Where is the data stored? How should this data be used? What is the data for? How was the data created? How is the data updated?

.....

The purpose of a data discovery platform is to answer these questions and help users find, understand, and use data more effectively.

For example, Facebook's Nemo uses full-text search so that users can quickly find the data they are looking for.
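To make the idea of full-text search over a catalog concrete, here is a minimal sketch of an inverted index over table names and descriptions. It is not how Nemo is implemented; the table names, descriptions, and the helper functions `tokenize`, `build_index`, and `search` are all hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(tables):
    """Map each token to the set of table names whose name or description contains it."""
    index = defaultdict(set)
    for name, description in tables.items():
        for token in tokenize(name + " " + description):
            index[token].add(name)
    return index


def search(index, query):
    """Return the table names that contain every token of the query, sorted by name."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


# Hypothetical catalog entries, for illustration only.
TABLES = {
    "orders_daily": "Daily order totals per store",
    "user_profile": "One row per registered user",
    "order_items": "Line items for each order",
}

INDEX = build_index(TABLES)
print(search(INDEX, "order"))
```

A real platform would typically delegate this to a dedicated search engine and add ranking. The sketch only shows why a token-to-table index makes lookups fast.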

When a user is browsing a table, how can they quickly understand the data? The usual approach is to display the column names, data types, and descriptions. If the user has permission, they can also preview the data.

Below is Amundsen's column display feature.
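The table view described above (column names, types, descriptions, and a preview that depends on permission) can be sketched as a small data model. This is an illustrative assumption, not the schema of Amundsen or any other platform; `Column`, `Table`, `describe`, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""


@dataclass
class Table:
    name: str
    columns: list
    sample_rows: list = field(default_factory=list)
    readers: set = field(default_factory=set)  # users allowed to preview data


def describe(table, user):
    """Render the column listing, plus a data preview only for permitted users."""
    lines = [f"Table: {table.name}"]
    for col in table.columns:
        lines.append(f"  {col.name} ({col.data_type}): {col.description}")
    if user in table.readers:
        for row in table.sample_rows:
            lines.append(f"  preview: {row}")
    else:
        lines.append("  preview: not permitted")
    return "\n".join(lines)


# Hypothetical table, for illustration only.
ORDERS = Table(
    name="orders_daily",
    columns=[
        Column("store_id", "int", "Store identifier"),
        Column("total", "decimal", "Order total for the day"),
    ],
    sample_rows=[(1, "19.90")],
    readers={"alice"},
)

print(describe(ORDERS, "alice"))
print(describe(ORDERS, "bob"))
```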

Data ETL is a hard problem, and presenting it is especially difficult. In practice, the way data moves through ETL can be represented as a data flow chart. Many platforms support this feature, such as Databook and Metacat.

Amundsen also integrates very well with the data scheduling platform Airflow.

Comparison of data discovery platforms

The following table compares how each major platform supports the features above.
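The flow-chart view of ETL is essentially a directed graph of tables. As a rough sketch under that assumption (not the lineage model of Databook, Metacat, or any other platform), the following code stores hypothetical lineage edges and walks them to find every upstream source of a table. The table names and the `all_upstream` helper are made up for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key is a table,
# each value lists the tables it is built from.
UPSTREAM = {
    "report_sales": ["orders_clean", "stores"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "stores": [],
}


def all_upstream(table, upstream):
    """Breadth-first walk returning every direct and indirect source of a table."""
    seen = []
    queue = deque(upstream.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


print(all_upstream("report_sales", UPSTREAM))
```

The same traversal run over the reversed edges would answer the opposite question: which downstream tables are affected when a source changes.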

Platform | Search | Recommendations | Table description | Data preview | Statistics | Usage metrics | Permissions | Ranking | Data lineage | Change notifications | Open source | Documentation | Supported data sources
Amundsen (Lyft) | Todo | Hive, Redshift, Druid, RDBMS, Presto, Snowflake, etc.
Datahub (LinkedIn) | Hive, Kafka, RDBMS
Metacat (Netflix) | Todo | Todo | Hive, RDS, Teradata, Redshift, S3, Cassandra
Atlas (Apache) | HBase, Hive, Sqoop, Kafka, Storm
Marquez (Wework) | S3, Kafka
Databook (Uber) | Hive, Vertica, MySQL, Postgres, Cassandra
Dataportal (Airbnb) | Unknown
Data Access Layer (Twitter) | HDFS, Vertica, MySQL
Lexikon (Spotify) | Unknown
The five open source solutions are described below.

DataHub (LinkedIn)

Open-sourced by LinkedIn and originally called WhereHows. After one .........

Copyright notice
This article was written by [Irving the procedural ape]. Please include a link to the original when reposting. Thank you.