Lead Data Engineer | Enterprise Tech & Streaming Systems
Designing data platforms that stay fast, reliable, and cost-efficient at production scale.
I build real-time, low-latency data systems on AWS with a strong bias toward operational clarity, dependable delivery, and measurable business impact. Across 8+ years, I have worked with Flink, Kafka/MSK, Spark, Kinesis Data Analytics, DynamoDB, S3, and modern lakehouse patterns to ship business-critical pipelines that teams can trust.
Exactly-once
Reliability designed into the pipeline, not added later.
Strong grounding in checkpoint alignment, idempotent sinks, state tuning, and long-lived stream behavior.
Low latency
Fast paths for event-driven systems that need predictable runtime behavior.
Built to reduce hot-path IO, improve recovery time, and keep production workloads responsive at scale.
Cost discipline
Performance work tied directly to infrastructure efficiency.
Experience optimizing storage layout, state management, data models, and compute patterns for real savings.
Technologies used repeatedly across streaming, analytics, platform engineering, and delivery.
About
Lead data engineering grounded in production behavior, scale, and operational confidence.
My work sits at the intersection of real-time systems, AWS data platforms, and practical engineering leadership. I care about architecture that performs well in production and stays understandable for the teams operating it.
What I Build
Streaming and batch platforms for critical data flows, with a focus on reliability, observability, storage efficiency, and low-latency execution.
How I Work
I favor simple, high-leverage architecture decisions, careful tuning, and delivery patterns that help teams move faster without sacrificing confidence in production.
Why It Matters
The strongest systems are not just scalable on paper. They recover predictably, stay observable under load, and keep costs under control as usage grows.
Reliable data platforms are built by sweating the runtime details that others skip.
Experience
Experience building real-time and analytics platforms across product and enterprise domains.
The through-line across these roles is consistent: design dependable pipelines, improve performance and cost behavior, and ship systems that downstream teams can operate with confidence.
- Moved into Flutter Entertainment's Enterprise Tech organization to support broader company-wide data and platform initiatives.
- Bringing production discipline from Streaming AI into enterprise-facing data workflows with a focus on reliability, clarity, and scalable delivery.
- Promoted to lead the Streaming AI data engineering track, guiding architecture and delivery for real-time workloads on AWS.
- Continued optimization of hot-path IO, storage layout, and state handling to improve runtime behavior and infrastructure efficiency.
- Strengthened production readiness through exactly-once sinks, checkpoint strategy, and recovery tuning across Flink-based services.
- Designed real-time processing on AWS using Flink on Kinesis Data Analytics, MSK, DynamoDB, and S3 for high-value production workloads.
- Reduced hot-path IO and delivered major annual cost savings through DynamoDB modeling, payload compaction, and S3 layout tuning.
- Built exactly-once sinks with checkpoint alignment and idempotent upserts while tuning RocksDB state and JVM behavior for recovery and latency.
- Delivered regulated pipelines with secure ingestion, lineage, and data quality gates to improve analytics readiness and operational trust.
- Standardized batch and streaming jobs with reproducible configuration, deployment discipline, and monitoring that reduced delivery friction.
- Built event-driven analytics with Kafka, Spark, and Delta Lake and exposed downstream access through Dremio and REST services.
- Improved query performance with partitioning, Z-ordering, predicate pushdown, and compaction to lower compute and storage cost.
- Integrated Medicare and Medicaid datasets with SQL and distributed data processing to improve revenue capture and reporting readiness.
- Supported analytics workflows with reliable pipeline behavior across Spark, Flink, Kafka, PostgreSQL, and AWS services.
- Delivered optimized ETL on Teradata and Informatica while standardizing SLAs, validations, and delivery quality for healthcare data workflows.
- Built a strong foundation in enterprise data movement, operational rigor, and quality-minded delivery.
Expertise
Built around streaming systems, AWS data platforms, and operational reliability.
I bring hands-on depth across languages, data frameworks, cloud services, and platform design patterns, with a practical bias toward runtime behavior and production supportability.
Languages
Java, Python, Go, Rust, Scala, C++, SQL, and shell, used with a practical bias toward maintainability and runtime performance.
Streaming and Batch
Apache Flink, Kafka/MSK, Spark, Kinesis Data Analytics, and Airflow across both event-driven and analytical workloads.
Storage and Infrastructure
AWS-centric delivery with DynamoDB, S3, EMR, Glue, Athena, Redshift, Kubernetes, and Docker for production-grade systems.
Specialties
Exactly-once processing, checkpointing, schema evolution, serialization, observability, and performance-plus-cost optimization.
Architecture Focus
- Designing pipelines that stay understandable as they scale in complexity and traffic.
- Keeping throughput, resilience, and cost efficiency aligned instead of trading one against another blindly.
- Making operational behavior visible through stronger monitoring, lineage, and debugging hooks.
Delivery Strengths
- Reproducible job configuration, disciplined deployment patterns, and production-minded defaults.
- Hands-on tuning of storage layout, state handling, JVM/runtime behavior, and data models.
- Clear collaboration with downstream analytics, platform, and product teams.
Where I Add Leverage
- Greenfield streaming architecture and modernization of high-volume legacy pipelines.
- Platform hardening for reliability, observability, and easier incident response.
- Performance optimization efforts that translate directly into lower cloud spend.
Projects
Selected engineering work that reflects the systems I enjoy building most.
These projects mirror my interest in stream processing internals, data platform design, and practical developer-facing tooling for real-time systems.
Selected Projects
DataWizz
Local-first lakehouse and analytics workspace inspired by Databricks, Snowflake, Airflow, and Superset, with file ingestion, SQL exploration, Delta publishing, orchestration, and dashboards.
GoXStream
Flink-inspired stream processor in Go with operator graphs, checkpoints, and connectors for Kafka, files, and databases.
FlowCore
Rust-powered real-time stream processing engine inspired by Apache Flink, featuring event-time processing, windows, watermarks, late-event handling, checkpointing, and a live dashboard.
Astra Sentinel
Rust desktop malware triage application for fast local file inspection with hash matching, optional YARA scanning, recursive directory analysis, and JSON report export.
Education
Jawaharlal Nehru Technological University, Hyderabad
B.Tech in Electrical Engineering
GPA 4.0/4.0 (2014 - 2018)
Professional Summary
Lead Data Engineer with hands-on depth in streaming systems, AWS-native pipelines, distributed runtime tuning, and production-grade observability.
Contact
- Email: rohankumardubey497@gmail.com
- GitHub: github.com/Rohan-flutterint
- LinkedIn: rohan-kumar-dubey-3a9a31156
- Portfolio: rohan-flutterint.github.io
Open to lead engineering conversations
Available for lead data engineering roles focused on streaming systems, platform reliability, and AWS-scale infrastructure.
If you are building critical data products and need somebody who can think deeply about runtime behavior, reliability, and cost, I would love to connect.