Highway to Scale – Part 7

How we scaled the BeeHero API server: from ad hoc HTTP and ORM usage to a consistent RESTful standard, optimized queries and schemas, table partitioning, and a React client that uses the API efficiently

Scaling Our API Server

Where We Started

The API was built on stateless HTTP from day one, but we had no shared standard for naming, implementation, or when to use which pattern. Routes were action-oriented instead of resource-oriented (e.g. “get this” or “do that” in the URL). That made understanding the API non-trivial and made it hard to reuse data across endpoints, since each response was custom-tailored to its specific action.

We used SQLAlchemy and had wrappers for common DB operations (bulk insert/update, commit/rollback), but ORM usage was on a case-by-case basis - no standard approach and no systematic optimization of queries. That was fine at first; as traffic and data grew, it became a bottleneck.

First Phase: A RESTful Standard

We introduced a RESTful API standard, documented in a Confluence page at the time; today the standard is part of our AI agents' skills. The rules are clear: resource-based URLs in kebab-case and plural, HTTP verbs with defined semantics (GET for read, PUT for update, etc.), filter/select/range and `with_nested` query params, and sub-collections under a primary resource. We shared this with the team and made adherence a requirement in PRs.
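As a toy illustration of the naming rules (this checker is not part of our codebase, just a sketch of the convention), a standard-compliant path is built from plural, kebab-case resource segments, optionally interleaved with ids for sub-collections:

```python
import re

# A minimal sketch (not actual production code): check that a URL path follows
# the convention — resource-based, kebab-case, plural segments, with numeric
# ids allowed between resources (e.g. /carbon-templates/7/seasonal-activities).
SEGMENT = re.compile(r"^[a-z]+(-[a-z]+)*s$")  # crude plurality check: ends in "s"

def follows_standard(path: str) -> bool:
    segments = [s for s in path.strip("/").split("/") if s]
    return all(SEGMENT.match(s) or s.isdigit() for s in segments)

print(follows_standard("/seasonal-activities"))                   # True
print(follows_standard("/carbon-templates/7/seasonal-activities"))  # True
print(follows_standard("/getSeasonalActivities"))                 # False: action-oriented
```

Action-oriented routes like the last one are exactly what the standard replaced.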

Because the approach was generic (resources, filters, range, nested data), we could implement generic request handlers: shared logic for parsing query params and building queries. That let us implement standard GET and PUT collection endpoints with very little route-specific code.

GET – Parse query params, build the query (filters, order, range), choose schema, run query, serialize:

```python
@admin_blueprint2.route('/seasonal-activities', methods=['GET'])
@roles_accepted(*ALL_ADMINS, UserRoles.AWS)
def get_seasonal_activities():
    query_params = {**request.args}
    schema = QueryParams(query_params).get_schema(
        SeasonalActivity, SeasonalActivityWithNestedSchema,
        seasonal_activities_with_nested_schema, seasonal_activities_schema, True
    )
    return collection_get(request, SeasonalActivity, schema)
```

PUT – Parse body (list of entities with `id` or filter + single object), apply changes, commit:

```python
@admin_blueprint2.route('/seasonal-activities', methods=['PUT'])
@roles_accepted(*ALL_ADMINS)
def update_seasonal_activities():
    return collection_put(request, SeasonalActivity)
```

We still implement custom logic where a resource needs it (e.g. special validations or side effects), but the majority of collection GET and PUT flows go through these handlers. Over time we extended the request handlers to support join tables, extra conditions, dynamic Marshmallow schema selection, and range (X-Range / X-Items-Range) for pagination, so more complex resources could stay in the same pattern.
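The range support boils down to two small pieces of header plumbing. A minimal sketch (header formats assumed from how the client parses them; function names are hypothetical, not our actual helpers): the client sends `X-Range: "start-end"`, and the server answers with `X-Items-Range: "start-end/total"` so the client learns the full count.

```python
# Hypothetical helpers sketching the X-Range / X-Items-Range contract.

def parse_range(header: str, default_page: int = 100) -> tuple[int, int]:
    """Parse an X-Range request header like '100-199' into (offset, limit)."""
    if not header:
        return 0, default_page
    start, end = (int(p) for p in header.split("-"))
    return start, end - start + 1

def items_range_header(start: int, count: int, total: int) -> str:
    """Build an X-Items-Range response header like '100-199/1234'."""
    end = min(start + count, total) - 1
    return f"{start}-{end}/{total}"

print(parse_range("100-199"))              # (100, 100)
print(items_range_header(100, 100, 1234))  # 100-199/1234
```

The `total` after the slash is what lets the client compute the remaining ranges up front and fetch them in parallel.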

ORM and DB Utils: The Good and the Trap

Our SQLAlchemy ORM usage and `db_utils` wrappers gave us a consistent way to commit/rollback, clear faulty data, and handle bulk actions. The main challenge was how we used the ORM across the codebase, especially around joins and relationship loading.

Lazy loading was convenient: you touch `entity.relationship` and SQLAlchemy runs a query. The trap was serialization: when a Marshmallow schema rendered a collection of entities, it often accessed a relationship on each entity. That meant one query per entity (N+1 queries), which didn’t scale.
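The effect is easy to see in a toy model (this is a pure-Python illustration of the pattern, not SQLAlchemy itself): lazy serialization issues one query per entity, while an eager strategy batches the relationship into a single query.

```python
# Toy illustration of the N+1 trap: a fake DB that only counts queries.

class FakeDB:
    def __init__(self):
        self.queries = 0
    def run(self, sql):
        self.queries += 1
        return []

db = FakeDB()

def serialize_lazy(hive_ids):
    # One relationship query per entity: N entities -> N queries.
    return [{"id": h, "sensors": db.run(f"SELECT * FROM sensors WHERE hive_id = {h}")}
            for h in hive_ids]

def serialize_eager(hive_ids):
    # One batched query up front fetches the relationship for the whole collection.
    db.run(f"SELECT * FROM sensors WHERE hive_id IN ({','.join(map(str, hive_ids))})")
    return [{"id": h, "sensors": []} for h in hive_ids]

hive_ids = list(range(100))
serialize_lazy(hive_ids)
lazy_count = db.queries      # 100 — one per entity
db.queries = 0
serialize_eager(hive_ids)
eager_count = db.queries     # 1 — single batched query
print(lazy_count, eager_count)
```

With real traffic, that difference between N+1 queries and a constant number of queries per request is exactly what stopped scaling.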

Continuous Optimization: Load Options and Schemas

We started continuously reviewing endpoint performance, looking for N+1s and heavy queries. The levers were:

  1. Load options – Use `joinedload` (to-one) and `subqueryload` (to-many) so relationships are eager-loaded in a predictable way instead of lazy-loaded during serialization.
  2. Marshmallow schemas – Only serialize what the client needs; avoid loading and dumping relationships that aren’t used.
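A runnable sketch of the first lever, with toy models rather than our real ones: `subqueryload` fetches a to-many relationship for the whole result set in one extra query, and `joinedload` brings a to-one relationship back in the same joined query, so nothing is lazily loaded later.

```python
# Toy SQLAlchemy models demonstrating eager-load options against in-memory SQLite.
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import (declarative_base, joinedload, relationship,
                            sessionmaker, subqueryload)

Base = declarative_base()

class Hive(Base):
    __tablename__ = "hives"
    id = Column(Integer, primary_key=True)
    sensors = relationship("Sensor", back_populates="hive")

class Sensor(Base):
    __tablename__ = "sensors"
    id = Column(Integer, primary_key=True)
    hive_id = Column(Integer, ForeignKey("hives.id"))
    hive = relationship("Hive", back_populates="sensors")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    session.add(Hive(sensors=[Sensor(), Sensor()]))
    session.commit()

with Session() as session:
    # to-many: one extra query loads sensors for ALL hives in the result set
    hives = session.query(Hive).options(subqueryload(Hive.sensors)).all()
    # to-one: the hive comes back in the same (joined) query as each sensor
    sensors = session.query(Sensor).options(joinedload(Sensor.hive)).all()
    session.expunge_all()  # detach everything: lazy loading is now impossible

print(len(hives[0].sensors))  # 2 — already in memory, no query after detach
```

Accessing `hives[0].sensors` after detaching would raise with lazy loading; with the load options it is just an attribute read.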

The generic request handlers support this end-to-end. The client can request nested data via the `with_nested` query param (e.g. `with_nested=gateway_config,sensor_config`). We then:

- Choose or create a schema that includes only the requested nested fields (`get_schema`).

- Apply load options only for those relationships (`filter_load_options_by_with_nested`), so we don’t over-fetch.

```python
@admin_blueprint2.route('/carbon-templates', methods=['GET'])
@roles_accepted(*ALL_ADMINS)
def get_carbon_templates():
    query_params = {**request.args}
    query_params_parser = QueryParams(query_params, raise_error_on_invalid_column_filter=False)
    schema = query_params_parser.get_schema(
        CarbonTemplate, carbon_template_schema,
        carbon_templates_with_nested_schema, carbon_templates_schema
    )
    load_options = query_params_parser.filter_load_options_by_with_nested({
        'seasonal_activities': [subqueryload(CarbonTemplate.seasonal_activities)],
    })
    return collection_get(
        request, CarbonTemplate, schema,
        raise_error_on_invalid_column=False,
        load_options=load_options,
    )
```

So: dynamic schema (flat vs nested, or a subset of nested fields) and dynamic load options (only what the client asked for) keep both DB and response size under control.

Table Partitioning

For large tables, we introduced table partitioning (e.g. by time or another partition key). That only helps if:

  1. Queries always use the partition key in filters, so the planner can prune partitions.
  2. Partitions are created ahead of time - we added a small Lambda that creates future partitions on a schedule.
  3. We monitor the default partition - alerts fire when rows land there, so we don’t accumulate unpartitioned data.
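The core of the partition-creating Lambda is just generating `CREATE TABLE ... PARTITION OF` statements for upcoming periods. A minimal sketch, assuming monthly range partitioning and a `<table>_<YYYY>_<MM>` naming scheme (both assumptions; our actual scheme may differ):

```python
from datetime import date

def monthly_partition_sql(table: str, month: date) -> str:
    """Emit DDL for one monthly partition of a range-partitioned table."""
    # First day of the following month (handles the December -> January rollover).
    nxt = date(month.year + (month.month == 12), month.month % 12 + 1, 1)
    name = f"{table}_{month:%Y_%m}"
    return (f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {table} "
            f"FOR VALUES FROM ('{month}') TO ('{nxt}');")

print(monthly_partition_sql("samples", date(2024, 12, 1)))
# CREATE TABLE IF NOT EXISTS samples_2024_12 PARTITION OF samples FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');
```

Running this on a schedule for the next few months, and alerting when anything lands in the default partition, keeps pruning effective.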

We also reconsidered indexes: on partitioned tables, indexes should be compound with the partition key to be effective. Adding an index without the partition key often doesn’t help as much as we’d expect.

Learning from DB Logs

We used RDS Performance Insights (and DB logs in general) to see which queries caused load or wait. For the heaviest ones we:

  • Reconsidered joins (necessary vs redundant, order).
  • Used LATERAL joins and CTEs where they simplified or sped up the plan.
  • Tightened WHERE clauses and filters so the database could use indexes and partition pruning.
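As an example of the LATERAL pattern (hypothetical table and column names, not our actual schema): a correlated LATERAL subquery can fetch the latest sample per sensor in one pass, instead of an N+1 loop or a heavy GROUP BY over the whole table.

```python
# Hypothetical schema — shown as a SQL string for illustration.
LATEST_SAMPLE_SQL = """
SELECT s.id, latest.sampled_at, latest.value
FROM sensors AS s
LEFT JOIN LATERAL (
    SELECT sampled_at, value
    FROM samples
    WHERE samples.sensor_id = s.id   -- correlated: driven by the outer row
    ORDER BY sampled_at DESC
    LIMIT 1
) AS latest ON true;
"""
```

If `samples` is partitioned by time, adding a partition-key bound inside the subquery lets the planner prune as well.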

This was iterative: find a slow or busy query, optimize it, then look for the next one.

React Client: Reuse and Parallel Requests

On the client we wanted to call the API less and use data more across components.

  • React Query – we use it to decide when to call endpoints (cache, invalidation, refetch). That reduces duplicate requests and keeps UI state in sync with the server.
  • Generic RESTful endpoints – because GET collections support filters, `select`, and `with_nested`, different pages and components can reuse the same endpoint with different params instead of each having its own “custom” endpoint that returns a one-off shape. That improves cache reuse and consistency.
  • RxJS for parallel range requests – when we need a large collection, we use the API’s X-Range support. We fetch the first page, read `X-Items-Range` for the total count, then fetch the remaining ranges in parallel (with a cap on concurrency) and merge. Retries and timeouts are handled in one place.

Example: fetching sensors with parallel range requests via `getWithParallelRequests`, and using it from a React Query hook

```typescript
export const getWithParallelRequests = <T>(
    url: string,
    queryParams: any = {},
    pageSize: number = DEFAULT_PAGE_SIZE,
    maxParallelRequests: number = DEFAULT_MAX_PARALLEL_REQUESTS
): Promise<T[]> => {
    return lastValueFrom(
        fetchFirstPageWithRetry<T>(url, queryParams, pageSize).pipe(
            mergeMap(({ data, headers }) => {
                // The whole collection fit in the first page — nothing else to fetch.
                if (headers['x-items-all-range'] === 'true') return of(data);

                // Total count comes from the "start-end/total" X-Items-Range header.
                const totalItems = parseInt(headers['x-items-range']?.split('/')[1]);
                const resultsArray = Array(totalItems);
                fillResultsArray(resultsArray, data, 0);

                const pageRanges = [];
                for (let i = pageSize; i < totalItems; i += pageSize) {
                    pageRanges.push({ start: i, end: Math.min(i + pageSize - 1, totalItems - 1) });
                }

                // Fetch the remaining ranges in parallel, capped at maxParallelRequests,
                // writing each page into its slot so order doesn't matter.
                return from(pageRanges).pipe(
                    mergeMap(
                        ({ start, end }) => from(fetchRangeWithRetry<T>(url, queryParams, start, end)),
                        maxParallelRequests
                    ),
                    reduce((results, { start, data }) => {
                        fillResultsArray(results, data, start);
                        return results;
                    }, resultsArray)
                );
            })
        )
    );
};

export const fetchSensors = ({ queryKey }: { queryKey: QueryKey }): Promise<Sensor[]> => {
    const [_key, queryParams, isParallel] = queryKey;
    if (isParallel) {
        return getWithParallelRequests<Sensor>('/sensors', queryParams);
    }
    return axios.get<Sensor[]>('/sensors', { params: queryParams }).then((res) => res.data);
};
```

We have not moved to client-side paging (e.g. fetching only the page currently shown and loading the next page on demand) because our users need to filter quickly; we don't want to rely on server-side filters, which would require an additional server call for every change the user makes to a filter. If we had a case where the dataset was too large even with range and filters, we'd consider client-side paging or virtualized lists for that flow.

Summary

Scaling the API server came from:

  • A single RESTful standard and generic request handlers, especially for GET and PUT collections.
  • Consistent ORM use: load options and schemas aligned with what the client requests (`with_nested`, filters, range).
  • Partitioning large tables and keeping queries and indexes partition-key-aware.
  • Using DB metrics to find and optimize the most expensive queries.
  • Client-side discipline: React Query for when to call, generic endpoints for reuse, and RxJS-driven parallel range requests for large collections.

Together, these changes let us grow traffic and data size while keeping the API predictable for both backend and frontend teams.