Joseph (JJ) Jacks for COSS Community

OCS 2020 Breakout: Julien Le Dem

Julien Le Dem is the CTO and Co-Founder of Datakin. He co-created Apache Parquet and is involved in several open source projects, including Marquez (LF AI), Apache Pig, Apache Arrow, Apache Iceberg, and a few others. Previously, Julien was a senior principal at WeWork; principal architect at Dremio; tech lead for Twitter’s data processing tools, where he also obtained a two-character Twitter handle (@J_); and a principal engineer and tech lead working on content platforms at Yahoo, where he received his Hadoop initiation. His French accent makes his talks particularly attractive.

Relevant Links
LinkedIn - Twitter

As data becomes core to every product, data operations become critical. The OpenLineage API enables data pipeline observability.

Introduction to presentation - 0:00

Presentation Agenda - 0:20

Why Metadata? - 0:47

Team Interdependencies - 2:09

Today: Limited Context - 2:57

Data hierarchy of needs - 3:28

OpenLineage and contributors - 6:00

Purpose - 7:21

Problem: process before vs. with OpenLineage - 9:28

OpenLineage scope - 11:38

Core Model - 13:34

Lifecycle - 15:40

Protocol - 16:12

Facets - 16:34

Facet examples - 18:09

Join the conversation (GitHub, Slack, Twitter, Email links) - 19:36

Introducing Marquez - 20:06

Marquez at the intersection of Data Operations, Data Governance, and Data Discovery - 21:05

Marquez takes a lot of inspiration from the "Ground: Data Context as a Service" paper - 21:43

Walking through Marquez architecture - 22:06

Marquez Data Model - 22:42

Design benefits: debugging and backfilling - 23:42

Metadata Service - 24:35

Marquez Metadata collection - 25:35

Datakin leverages Marquez metadata - 25:55

Community - 27:00

Part of the LF AI & Data foundation - 27:27

Contributors and community welcome - 28:35

