Shirshanka Das has been working on data infrastructure and platforms for the past 10+ years at LinkedIn. He is currently the uber architect for LinkedIn’s Analytics Infra, Apps and Platforms team (acting virtually as the CTO for a 200-engineer team). The team provides infrastructure and a data platform that enable, among other things, a global metadata system for LinkedIn that powers privacy, compliance, and search and discovery; a self-service hybrid (on-prem + Azure) data lake with hundreds of petabytes of data; and automatic data management (ingestion, movement, organization, compliance) for over 1M data assets. Shirshanka led and delivered LinkedIn’s GDPR strategy and implementation as its chief architect. He continues to stay deeply involved in data privacy initiatives as new regulations like CCPA and e-Privacy get rolled out. Since 2013, Shirshanka has been leading LinkedIn’s metadata strategy, including open-sourcing LinkedIn’s first attempt at this (WhereHows) in 2015, followed by a rewrite and a second open-source release in 2019 (LinkedIn DataHub). He is also a committer on Apache Gobblin (incubating), a highly scalable Swiss Army knife for data ingestion, data deletion and lifecycle management, which is in use at multiple top-tier companies (e.g. LinkedIn, Apple, PayPal).
This episode covers an architecture survey and success stories of metadata-driven data management with LinkedIn’s DataHub and Apache Gobblin.
You recently organized an industry-wide metadata event. What are some of the insights that emerged from that, and can you tell the audience what “metadata” means to you? - 0:24
I have some background in data tooling: about a decade ago I was working at a company called Talend in the data integration and ETL space. They had a product portfolio of middleware tools and a master data management (MDM) tool. Is there a metadata management tool emerging, and what is your vision for metadata management tooling or data catalogs around that? - 7:28
Metadata feels almost like an ephemeral exhaust. How is metadata viewed through the lens of a system of record? It seems like the format and modeling of metadata would be different from the golden records and centralized entities that data systems have evolved around. Is that scratching at the right thing, or does it all just kind of become data? - 13:40
What are the opportunities for open source in this area? What does the open-source community hold for metadata as it continues to evolve? - 16:52
What are the big challenges that are underappreciated by people learning about these tools, or the existing open-source solutions you mentioned? What’s unsolved? - 19:57
What are you excited about these days? What are you working on, what’s next? - 23:10
Closing thoughts - 24:52