Analytics Engineering, Data Engineering

Optimizing the Data Warehouse of a Y Combinator-Backed Startup

Reducing average query time on business-critical queries by 80%.

Background

A Y Combinator-backed startup engaged Beyond Data Consulting to optimize its data warehouse. Billions of data points per month were being loaded into the company's AWS Athena data warehouse, and business-critical queries were suffering significant delays because of a poorly designed partitioning scheme.
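A partitioning scheme matters because a query engine such as Athena can skip partitions that a `WHERE` filter rules out. The minimal sketch below assumes a hypothetical Hive-style daily layout (`event_date=YYYY-MM-DD/`). The key name and date range are illustrative and are not the client's actual schema. It counts how many daily partitions a one-week query would read out of a year of data.

```python
from datetime import date, timedelta
from typing import List


def partition_prefixes(start: date, days: int) -> List[str]:
    """Build one Hive-style prefix per day (hypothetical layout)."""
    return [
        f"event_date={(start + timedelta(days=i)).isoformat()}/"
        for i in range(days)
    ]


def prune(prefixes: List[str], lo: date, hi: date) -> List[str]:
    """Keep only prefixes whose event_date lies in [lo, hi], mimicking the
    partition pruning a query engine applies for a date filter."""
    kept = []
    for prefix in prefixes:
        value = prefix.rstrip("/").split("=", 1)[1]
        if lo <= date.fromisoformat(value) <= hi:
            kept.append(prefix)
    return kept


# One year of daily partitions (2024 is a leap year).
all_parts = partition_prefixes(date(2024, 1, 1), 366)
scanned = prune(all_parts, date(2024, 3, 1), date(2024, 3, 7))
print(f"{len(scanned)} of {len(all_parts)} partitions scanned")
```

If the column that business-critical queries filter on is not a partition key, this pruning cannot happen and the engine has to read every partition.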

Solution

The consulting team designed a new partitioning and clustering structure for the company's data lake and migrated the existing data to it. The work included:

  1. Migrating historical data from the original S3 bucket into a new, repartitioned S3 bucket.
  2. Applying the same partitioning optimizations to the AWS Kinesis ingestion pipeline, so that new data is written directly into the repartitioned layout.
  3. Introducing handling and reporting of low-quality data, flagging records that did not conform to the expected schema.
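The steps above share one piece of per-record logic: decide where a record belongs in the new layout, or set it aside if it does not fit the schema. The sketch below is a standard-library Python illustration under hypothetical assumptions: the field names in `EXPECTED_SCHEMA`, the `events/event_type=.../event_date=.../` prefix layout, and the `quarantine/` prefix are invented for the example. In practice, this kind of logic would run inside the migration and ingestion jobs rather than in a plain loop.

```python
from collections import Counter
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical schema: required field name -> expected Python type.
EXPECTED_SCHEMA = {"event_id": str, "event_type": str, "timestamp": str}


def schema_error(record: dict) -> Optional[str]:
    """Return why a record violates the schema, or None if it conforms."""
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            return f"missing field: {field}"
        if not isinstance(record[field], expected_type):
            return f"wrong type for {field}"
    try:
        datetime.fromisoformat(record["timestamp"])
    except ValueError:
        return "unparseable timestamp"
    return None


def route(record: dict) -> Tuple[str, Optional[str]]:
    """Return (S3 key prefix, rejection reason) for one record."""
    error = schema_error(record)
    if error is not None:
        # Bad records go to a separate prefix for reporting, not deletion.
        return "quarantine/", error
    day = datetime.fromisoformat(record["timestamp"]).date().isoformat()
    return f"events/event_type={record['event_type']}/event_date={day}/", None


records = [
    {"event_id": "a1", "event_type": "click", "timestamp": "2024-03-01T12:00:00"},
    {"event_id": "a2", "event_type": "view"},
    {"event_id": "a3", "event_type": "click", "timestamp": "yesterday"},
]

# Tally rejection reasons so data-quality issues can be reported.
report = Counter()
for rec in records:
    prefix, reason = route(rec)
    if reason is not None:
        report[reason] += 1
    print(rec["event_id"], "->", prefix)

print(dict(report))
```

Routing bad records to their own prefix instead of dropping them keeps them available for data-quality reports, and the per-reason counts give a simple signal to monitor on the ingestion stream.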

Results

  • Improved Efficiency: Average query time on high-frequency, business-critical queries was reduced by 80%.
  • Improved Visibility: Handling and reporting of low-quality data significantly reduced the time needed to respond to issues in the data ingestion stream.

Technologies Used

Python, Apache Spark, AWS Kinesis, AWS Glue, AWS Athena