Sr. Data Engineer
Greylock Partners
Summary:
A growth-stage company is building a new class of AI-native platform where data is the product. The system processes millions of structured and unstructured outputs daily across a rapidly growing set of external sources, and the underlying data infrastructure directly determines product reliability, correctness, and user trust.
The company is forming its first dedicated data function. This role will define how ingestion, processing, and data quality systems are architected and operated at scale. The decisions made here will shape the data platform for years, with direct impact on product velocity and customer experience.
What You’ll Do:
* Own the architecture and reliability of a multi-source ingestion platform operating at high throughput and increasing scale
* Design and evolve distributed data systems that support asynchronous processing, fault tolerance, and horizontal scalability
* Define and enforce data contracts across ingestion, transformation, and serving layers to ensure end-to-end correctness
* Establish best practices for pipeline observability, including monitoring, alerting, and system health tracking
* Drive performance improvements across ingestion and processing layers, ensuring throughput stays ahead of product demand
* Partner with product and engineering leadership to ensure data systems are designed proactively for upcoming features
* Set technical direction for the data platform and mentor engineers as the team grows
Where You’ll Work in the Stack:
* Ingestion: resilient, multi-provider data collection and normalization
* Processing: distributed pipelines, async job orchestration, and transformation logic
* Storage: columnar systems and query optimization for large-scale analytics
* Serving: tight integration with application systems and product features
* Observability: system-wide monitoring, alerting, and debugging infrastructure
What We’re Looking For:
* Deep experience designing and operating production data systems at scale, including ingestion, ETL, and distributed processing
* Strong expertise with modern data infrastructure, including columnar databases (e.g., ClickHouse or similar)
* Proven track record owning system reliability, data quality, and observability in production environments
* Experience working with external data sources, APIs, or scraping systems at scale
* Comfort operating close to the application layer and understanding how data systems power user-facing features
* Strong engineering fundamentals (Python or similar; experience with distributed systems required)
* Ability to lead technical direction and influence system design across teams
Signals We’re Especially Excited About:
* You’ve owned data platforms where correctness and latency directly impacted customers
* You’ve built high-throughput systems with strict reliability and completeness requirements
* You’ve designed observability systems that provide real-time insight into system health
* You’ve operated in early-stage environments and built systems from first principles
* You’ve led or mentored engineers while remaining deeply hands-on
Why This Role:
* Define the data foundation for a rapidly scaling AI-native product
* High ownership across architecture, reliability, and long-term system design
* Direct impact on customer-facing product quality and trust
* Opportunity to build and shape a data function from the ground up
* Close collaboration with experienced founders and engineering leadership
About Greylock
Greylock is a leading early-stage venture capital firm that partners with exceptional founders building category-defining companies. Our portfolio includes Figma, Anthropic, Ramp, Abnormal Security, Rubrik, Airbnb, LinkedIn, Roblox, Dropbox, and Coinbase.
About the Greylock Recruiting Team
Our team members are full-time Greylock employees who provide candidate referrals and introductions to our portfolio companies. We work closely with founders to build exceptional teams and bring deep experience across startups and large-scale technology companies.