Handling large-scale data operations is a common hurdle in modern applications, often leading to increased operational costs and slower response times. At FloQast, we've tackled these challenges head-on by implementing effective caching strategies. This guide walks you through our approach to caching large data sets, ensuring scalability and optimal performance.
Step 1: Identifying Bottlenecks
Every optimization journey starts with pinpointing the pain points. At FloQast, we discovered that repeatedly fetching vast amounts of data from third-party cloud storage was significantly slowing down our application responses.
Start by analyzing your application's performance metrics and API response times to identify operations that consistently take longer than expected – these are prime candidates for caching optimization.
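As a rough illustration, a small timing wrapper can help surface those slow paths; the 500 ms threshold and the fetchReportData call below are hypothetical placeholders, not part of our actual instrumentation.

```typescript
// Time an async operation and flag it when it runs longer than expected.
async function timed<T>(label: string, operation: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await operation();
  } finally {
    const elapsedMs = Date.now() - start;
    if (elapsedMs > 500) {
      console.warn(`${label} took ${elapsedMs}ms - a candidate for caching`);
    }
  }
}

// Usage: wrap the call you suspect is slow and watch the logs under real traffic.
// const report = await timed('fetchReportData', () => fetchReportData(reportId));
```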

Step 2: Choosing the Right Cache Storage
When selecting a storage solution for large-scale data caching, we evaluated Redis and Amazon S3.
Redis excels at handling small, frequently changing data but becomes impractical for large data sets due to memory constraints and scaling costs.
Amazon S3 emerged as the ideal solution for our large data sets, offering:
- Durability: Built-in redundancy protects cached data against loss
- Scalability: Easily handles growing data volumes
- Cost-Effectiveness: Pay-as-you-go pricing model
- Cache Invalidation: Lifecycle rules can automatically delete old data, reducing storage costs over time (see the sketch below)

💡 Tip: Consider Redis for high-speed access to frequently changing, small data. For durable, scalable storage with lower maintenance, S3 is often a better choice.
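For example, expiration can be handled with a lifecycle rule on the cache bucket. The sketch below applies one via the AWS SDK; the bucket name, the cache/ key prefix, and the 7-day retention window are illustrative values, not our actual configuration.

```typescript
import { S3 } from 'aws-sdk';

const s3 = new S3({ region: 'us-west-2' });

// Expire cached objects automatically so stale entries don't pile up.
// The bucket name, 'cache/' prefix, and 7-day window are illustrative values.
async function applyCacheLifecycleRule() {
  await s3
    .putBucketLifecycleConfiguration({
      Bucket: 'my-cache-bucket',
      LifecycleConfiguration: {
        Rules: [
          {
            ID: 'expire-cache-entries',
            Status: 'Enabled',
            Filter: { Prefix: 'cache/' },
            Expiration: { Days: 7 },
          },
        ],
      },
    })
    .promise();
}
```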
Step 3: Building Your Cache Client
With Amazon S3 selected as our storage solution, the next step was to develop a cache client to manage our caching operations efficiently.
Set Up the Basic Structure
First, create a new file called `S3Cache.ts` that will manage our S3 interactions:
```typescript
import { S3 } from 'aws-sdk';

export class S3Cache {
  private s3: S3;
  private bucketName: string;

  constructor(options: { bucketName: string; region: string }) {
    this.s3 = new S3({ region: options.region });
    this.bucketName = options.bucketName;
  }

  // Methods will be implemented here
}
```
Implement the Cache Methods
With the basic structure in place, implement the following methods:
- `get(key)`: Retrieves data from the cache for the given key.
- `set(key, value, options)`: Stores data in the cache with the given key.
- `getOrSet(key, fetchFunction, options)`: Retrieves data from cache if present; otherwise, fetches, stores, and returns it.
💡 Tip: Consider how your caching client will evolve with your application. Plan for future functionality such as expiration times, data invalidation, and parallel fetching.
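Building on that tip, here is a rough sketch of how the S3Cache class from Step 3 could look with these methods filled in. It is not our production implementation: the CacheOptions type, its ttlSeconds field, and the JSON envelope with an expiresAt timestamp are assumptions added to show where expiration could hook in.

```typescript
import { S3 } from 'aws-sdk';

interface CacheOptions {
  // Hypothetical option: time-to-live for the cached entry, in seconds.
  ttlSeconds?: number;
}

export class S3Cache {
  private s3: S3;
  private bucketName: string;

  constructor(options: { bucketName: string; region: string }) {
    this.s3 = new S3({ region: options.region });
    this.bucketName = options.bucketName;
  }

  // Retrieve and parse a cached entry; treat missing or expired entries as a miss.
  async get<T>(key: string): Promise<T | null> {
    try {
      const result = await this.s3
        .getObject({ Bucket: this.bucketName, Key: key })
        .promise();
      const entry = JSON.parse((result.Body as Buffer).toString('utf8'));
      if (entry.expiresAt && entry.expiresAt < Date.now()) {
        return null; // Expired entry: report a miss so callers re-fetch.
      }
      return entry.value as T;
    } catch (error: any) {
      if (error.code === 'NoSuchKey') {
        return null; // Nothing cached under this key yet.
      }
      throw error;
    }
  }

  // Serialize the value (plus an optional expiry timestamp) and write it to S3.
  async set<T>(key: string, value: T, options: CacheOptions = {}): Promise<void> {
    const entry = {
      value,
      expiresAt: options.ttlSeconds
        ? Date.now() + options.ttlSeconds * 1000
        : undefined,
    };
    await this.s3
      .putObject({
        Bucket: this.bucketName,
        Key: key,
        Body: JSON.stringify(entry),
        ContentType: 'application/json',
      })
      .promise();
  }

  // Return the cached value when present; otherwise fetch, cache, and return it.
  async getOrSet<T>(
    key: string,
    fetchFunction: () => Promise<T>,
    options: CacheOptions = {}
  ): Promise<T> {
    const cached = await this.get<T>(key);
    if (cached !== null) {
      return cached;
    }
    const value = await fetchFunction();
    await this.set(key, value, options);
    return value;
  }
}
```

Storing the expiry timestamp alongside the payload keeps a read down to a single getObject call, and a lifecycle rule like the one in Step 2 can still sweep away entries that are never read again.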
Step 4: Application Integration
Now, let's integrate our caching client into our application:
```typescript
import { S3Cache } from './S3Cache';

const cache = new S3Cache({
  bucketName: 'my-cache-bucket',
  region: 'us-west-2',
});

async function fetchJoke(id: string) {
  // Simulating an API call or database query
  return {
    id,
    setup: "What do you call it when your cache needs a cache?",
    punchline: "A Cache-22",
  };
}

async function getJoke(jokeId: string) {
  try {
    // Serve from the cache when possible; otherwise fetch, store, and return.
    return await cache.getOrSet(jokeId, () => fetchJoke(jokeId));
  } catch (error) {
    console.error('Cache lookup failed for joke:', jokeId);
    throw error;
  }
}

// Usage
getJoke('bad-joke-001')
  .then(joke => console.log(`${joke.setup} ${joke.punchline}`))
  .catch(error => console.error('Failed to get joke:', error));
```
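With this wiring in place, the first request for a given jokeId invokes fetchJoke and writes the result to S3; later requests for the same key are served straight from the cache, and any errors propagate to the caller's catch handler.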
Conclusion
S3-based caching offers a scalable, cost-effective solution for handling large datasets without the operational complexity of traditional caching systems. By leveraging S3's built-in features, teams can focus on building their applications while confidently managing large-scale data operations.
