Julien Le Dem is the CTO and Co-Founder of Datakin. He co-created Apache Parquet and is involved in several open source projects including Marquez (LF AI), Apache Pig, Apache Arrow, Apache Iceberg and a few others. Previously, Julien was a senior principal at WeWork; principal architect at Dremio; tech lead for Twitter’s data processing tools, where he also obtained a two-character Twitter handle (@J_); and a principal engineer and tech lead working on content platforms at Yahoo, where he received his Hadoop initiation. His French accent makes his talks particularly attractive.
Relevant Links
LinkedIn - Twitter
As data becomes core to every product, data operations become critical. The OpenLineage API enables data pipeline observability.
Introduction to presentation - 0:00
Presentation Agenda - 0:20
Why Metadata? - 0:47
Team Interdependencies - 2:09
Today: Limited Context - 2:57
Data hierarchy of needs - 3:28
OpenLineage and contributors - 6:00
Purpose - 7:21
Problem. Process Before vs. With OpenLineage - 9:28
OpenLineage scope - 11:38
Core Model - 13:34
Lifecycle - 15:40
Protocol - 16:12
Facets - 16:34
Facet examples - 18:09
Join the conversation (GitHub, Slack, Twitter, Email links) - 19:36
Introducing Marquez - 20:06
Marquez at the intersection of Data Operations, Data Governance, and Data Discovery - 21:05
Marquez takes a lot of inspiration from the [Ground: Data Context as a Service](http://cidrdb.org/cidr2017/papers/p111-hellerstein-cidr17.pdf) paper - 21:43
Walking through Marquez architecture - 22:06
Marquez Data Model - 22:42
Design benefits: debugging and backfilling - 23:42
Metadata Service - 24:35
Marquez Metadata collection - 25:35
Datakin leverages Marquez metadata - 25:55
Community (https://marquezproject.github.io/marquez) - 27:00
Part of the LF AI & Data foundation - 27:27
Contributors and community welcome (https://github.com/MarquezProject/marquez) - 28:35
Share your questions and comments below!