How do you optimize Firestore queries for a large dataset?

Optimizing Firestore Queries for Large Datasets

Firestore is powerful, but inefficient queries can increase costs, slow performance, and hit usage limits. Here’s how to optimize queries for large datasets effectively.


1. Structure Data for Fast Queries (No Joins!)

Firestore doesn’t support joins like SQL, so data modeling is crucial.

Example: Chat App Data Model

Avoid storing all messages inside the chat document, which does not scale. Use a subcollection for messages instead:

/chats/{chatId}/messages/{messageId}
Why?
  • A document is capped at 1 MiB, so an ever-growing array of messages eventually hits the limit.
  • Each message is its own indexed document, so messages can be filtered, sorted, and paginated.
  • Reading the chat document does not load its messages; a query returns only the documents it matches.
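
As a rough sketch of this model, the helpers below write a message into a chat's subcollection and read back the most recent ones. The function names, the text and createdAt fields, and the db handle (an initialized Firestore instance using the same chained API as the other snippets here) are assumptions made for illustration.

```javascript
// Hypothetical helpers; `db` is an initialized Firestore instance passed in by the caller.

// Adds one message document under /chats/{chatId}/messages.
async function addMessage(db, chatId, text) {
  return db
    .collection("chats").doc(chatId)
    .collection("messages")
    .add({ text, createdAt: Date.now() }); // createdAt in epoch ms (assumed field)
}

// Reads the newest `count` messages without loading the parent chat document.
async function recentMessages(db, chatId, count = 20) {
  const snapshot = await db
    .collection("chats").doc(chatId)
    .collection("messages")
    .orderBy("createdAt", "desc")
    .limit(count)
    .get();
  return snapshot.docs.map(doc => ({ id: doc.id, ...doc.data() }));
}
```

Because each message is a separate document, the read cost scales with count rather than with the total length of the conversation.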

2. Use Indexes for Faster Queries

Firestore automatically creates single-field indexes, which cover simple queries. Queries that combine equality filters with a range filter or an orderBy() on a different field require a composite index.

Example: Query Without Index
db.collection("orders")
  .where("status", "==", "shipped")
  .where("customerId", "==", "user123")
  .orderBy("orderDate", "desc")
  .limit(10);

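* Firestore rejects this query with a "requires an index" error until a matching composite index exists. The error message includes a link that creates the index for you.

Fix: Create an Index
  1. In the Firebase console, open the Firestore "Indexes" tab and click "Create Index" (or follow the link in the error message).
  2. Add fields:
    • status (Ascending; used as an equality filter)
    • customerId (Ascending; used as an equality filter)
    • orderDate (Descending; used for sorting)
  3. Click "Create" and wait for the index to finish building.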
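
The same index can also be kept in source control. The fragment below is a sketch of a firestore.indexes.json file for the orders query, assuming the project is deployed with the Firebase CLI (firebase deploy --only firestore:indexes); the collection and field names come from the example query.

```json
{
  "indexes": [
    {
      "collectionGroup": "orders",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "status", "order": "ASCENDING" },
        { "fieldPath": "customerId", "order": "ASCENDING" },
        { "fieldPath": "orderDate", "order": "DESCENDING" }
      ]
    }
  ],
  "fieldOverrides": []
}
```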

3. Use .select() to Reduce Data Transfer

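By default, Firestore returns every field of a document, even if you don’t need them. .select() trims the response to the listed fields. It is available in the server SDKs (such as the Node.js Admin SDK); the web and mobile client SDKs always return whole documents.

Example: Fetch Only Required Fields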
db.collection("users")
  .select("name", "email")  // Fetch only name & email (not full profile)
  .get();

* Reduces data transfer and response time. Each returned document still counts as one billed read.


4. Paginate Large Queries with .startAfter()

Fetching too much data at once can cause slow performance and expensive reads.

Paginated Query (Limit 20)
const firstQuery = db.collection("posts")
    .orderBy("createdAt", "desc")
    .limit(20);

const snapshot = await firstQuery.get();
const lastVisible = snapshot.docs[snapshot.docs.length - 1];  // Last doc of this page (undefined if the page is empty)

const nextQuery = db.collection("posts")
    .orderBy("createdAt", "desc")
    .startAfter(lastVisible)  // Resume right after the previous page
    .limit(20);
Why?
  • Each page costs at most 20 reads, instead of one read per document in the collection.
  • The cursor resumes from the last document, so the query fetches only the next page rather than re-reading earlier ones.
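
To walk through every page rather than just the first two, the cursor logic can be wrapped in a loop. The helper below is a minimal sketch: forEachPage and its parameters are hypothetical names, and baseQuery is assumed to be a Firestore query that already carries its orderBy() clause.

```javascript
// Hypothetical helper: walks an ordered query page by page.
// `baseQuery` must already include an orderBy() clause.
async function forEachPage(baseQuery, pageSize, onPage) {
  let cursor = null; // last document of the previous page
  let total = 0;

  while (true) {
    // Resume after the previous page, then cap the page size.
    const positioned = cursor ? baseQuery.startAfter(cursor) : baseQuery;
    const snapshot = await positioned.limit(pageSize).get();
    if (snapshot.docs.length === 0) break; // no more results

    await onPage(snapshot.docs);
    total += snapshot.docs.length;
    cursor = snapshot.docs[snapshot.docs.length - 1];

    if (snapshot.docs.length < pageSize) break; // short page: this was the last one
  }
  return total; // number of documents visited
}

// Usage (assumes an initialized `db`):
// await forEachPage(db.collection("posts").orderBy("createdAt", "desc"), 20,
//   docs => docs.forEach(doc => console.log(doc.id)));
```

In a UI you would typically fetch one page per user action instead of looping, but the cursor handling is the same.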

5. Filter First, Then Sort

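Firestore resolves filters and sorting together through an index, so the order in which you chain where() and orderBy() does not change the result. Writing filters first is a readability convention; what matters is which fields you combine.
Mistake: Combining a filter on one field with an orderBy() on another field without a matching composite index, which fails with a missing-index error.

* Recommended Form:
db.collection("products")
  .where("category", "==", "electronics")  // Equality filter
  .orderBy("price", "asc")                 // Sort (needs a composite index on category + price)
  .get();
Why?
  • Firestore can only sort through an index, so this query needs a composite index on category and price.
  • With a range filter (<, <=, >, >=), order by that same field first; Firestore has historically rejected queries whose first orderBy() is on a different field.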

6. Avoid != & Array Queries When Possible

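Firestore supports != (and not-in), but both skip documents where the field is missing, and older SDK versions do not offer them at all. Array queries can be costly. When the set of wanted values is known, a positive filter is clearer.

Query With != (Skips Documents Without a role Field)
db.collection("users")
  .where("role", "!=", "admin")  // Matches only documents that have a role field
  .get();
Alternative Using in
db.collection("users")
  .where("role", "in", ["editor", "viewer"])  // Instead of '!='
  .get();
Why?
  • in states exactly which values match. It accepts a limited number of values per query (10 originally, 30 in current documentation).
  • != cannot be combined with not-in in the same query.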
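
When the list of values is longer than the in limit, a common workaround is to split it into chunks and run one query per chunk. The sketch below assumes hypothetical helper names (chunk, whereIn) and a conservative default of 10 values per query; check the limit documented for your SDK version.

```javascript
// Splits `values` into arrays of at most `size` items.
function chunk(values, size) {
  const chunks = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: runs one `in` query per chunk and merges the results.
// `maxValues` is a conservative default, not an authoritative limit.
async function whereIn(collectionRef, field, values, maxValues = 10) {
  const snapshots = await Promise.all(
    chunk(values, maxValues).map(part => collectionRef.where(field, "in", part).get())
  );
  return snapshots.flatMap(snapshot => snapshot.docs);
}

// Usage (assumes an initialized `db` and an array `roles`):
// const docs = await whereIn(db.collection("users"), "role", roles);
```

Each chunk is an independent query, so any sorting or overall limit has to be applied to the merged results in application code.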

7. Use Firestore TTL (Time-To-Live) for Expiring Data

If old data isn’t needed, automate deletion to reduce query size.

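Firestore also has built-in TTL policies that delete documents once a designated timestamp field has passed. The scheduled function below is the manual alternative.

* Example: Auto-Delete Old Logs
  1. Store a timestamp in each document.
  2. Use a scheduled Cloud Function to delete old entries.
// Assumes functions (firebase-functions, 1st-gen API) and db (admin.firestore()) are initialized elsewhere.
exports.cleanupOldLogs = functions.pubsub.schedule('every 24 hours').onRun(async () => {
    // Assumes createdAt is stored as epoch milliseconds; compare against a Date if it is a Firestore Timestamp.
    const cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000; // 30 days ago
    // Cap the batch size; batched writes have historically been limited to 500 operations.
    const query = db.collection("logs").where("createdAt", "<", cutoff).limit(500);
    const snapshot = await query.get();

    const batch = db.batch();
    snapshot.forEach(doc => batch.delete(doc.ref));
    await batch.commit(); // Any remaining old logs are removed on later runs
});

* Keeps the dataset small, so queries match and return fewer documents.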
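
If more old documents accumulate than a single batch can hold, the delete can be repeated until the query comes back empty. The helper below is a sketch under that assumption; deleteInBatches and its batchSize default of 500 are illustrative choices, not part of the Firestore API.

```javascript
// Hypothetical helper: deletes every document matched by `query`, `batchSize` at a time.
// `db` is an initialized Firestore instance; `query` is a Firestore query.
async function deleteInBatches(db, query, batchSize = 500) {
  let deleted = 0;

  while (true) {
    const snapshot = await query.limit(batchSize).get();
    if (snapshot.empty) break; // nothing left to delete

    const batch = db.batch();
    snapshot.docs.forEach(doc => batch.delete(doc.ref));
    await batch.commit();

    deleted += snapshot.size;
  }
  return deleted; // total number of documents removed
}

// Usage inside the scheduled function (cutoff as computed there):
// await deleteInBatches(db, db.collection("logs").where("createdAt", "<", cutoff));
```

Every deleted document is first read by the query, so very large cleanups still incur one read and one delete per document.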


8. Optimize Real-Time Queries with .onSnapshot()

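With a real-time listener, every snapshot exposes the full result set, so re-processing all of it on each update wastes work. After the initial load, Firestore sends (and bills) only the documents that changed.

Fix: Use docChanges() to Process Only What Changed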
db.collection("messages")
  .orderBy("timestamp", "desc")
  .limit(20)
  .onSnapshot(snapshot => {
      snapshot.docChanges().forEach(change => {
          if (change.type === "added") {
              console.log("New message:", change.doc.data());
          }
      });
  });
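Why?
  • The listener still fires on every change, but docChanges() lists only the documents added, modified, or removed since the previous snapshot.
  • Checking change.type === "added" handles new messages only. The first snapshot reports every matching document as "added".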
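
To keep a local copy in sync, all three change types need handling, not only "added". The reducer below is a minimal sketch; applyDocChanges and the Map-based cache are hypothetical names chosen for illustration.

```javascript
// Hypothetical reducer: applies a snapshot's docChanges() to a Map keyed by document ID.
function applyDocChanges(messages, changes) {
  for (const change of changes) {
    if (change.type === "removed") {
      messages.delete(change.doc.id);
    } else {
      // "added" and "modified" both carry the latest document data
      messages.set(change.doc.id, change.doc.data());
    }
  }
  return messages;
}

// Usage (assumes an initialized `db`):
// const messages = new Map();
// const unsubscribe = db.collection("messages")
//   .orderBy("timestamp", "desc")
//   .limit(20)
//   .onSnapshot(snapshot => applyDocChanges(messages, snapshot.docChanges()));
// Call unsubscribe() when the view closes to stop receiving (and paying for) updates.
```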

Summary: Firestore Query Optimization Cheat Sheet
Issue                          | Solution
Large data sets                | Use subcollections instead of nested arrays
Slow queries                   | Index fields used in where() & orderBy()
Large responses                | Use .select() (server SDKs) to fetch only necessary fields
Too much data at once          | Use pagination (startAfter())
Filtering plus sorting         | Create a composite index covering the filter and sort fields
Inefficient real-time updates  | Use docChanges() instead of full reloads
Too many old documents         | Set up TTL & batch deletes