Prevent TypeORM from eating up all your memory

Prevent TypeORM from eating up all your memory
Photo by Ross Sokolovski / Unsplash

Recently, a client told me that the code I had written to sync their systems and a new datasource blew up their AWS EC2 instances. Apparently, the synchronisation engine kept getting killed for using up too much memory. More than a few gigabytes, actually. To sync a few tens of thousands of SQL rows. That didn't seem quite right, but they were adamant, so I reluctantly and doubtfully looked into it. Only to realise they were absolutely right.

Each time I ran a sync locally, node's memory usage blew up. 1GB, 2GB, 3GB.... sometimes even 4, before going suddenly back to less than a hundred MB. Now that was weird. Because there was no way a bit of text (ok, a lot of text, for a human, but certainly not a full gigabyte worth of text, let alone 4) could do that. Did I make a mistake? Somehow copy some dictionaries and objects all over the place.

It turns out, we were using TypeORM (a client requirement). And, unbeknownst to all of us, TypeORM is a greedy bastard, sometimes taking up more than 2.5Gb to fetch 4 rows (and taking its sweet time while doing so). It especially seems to have an issue with resolving one to many and many to many relationships through joins.

TypeORM, when it finds the memory cabinet - Photo by Nathan Dumlao / Unsplash

From what I've found, the only solution, aside from switching to another ORM, or getting rid of ORMs altogether, is to fetch the relationships ourselves and then repopulate our models ourselves. Meaning more work and less reliance on our ORM's features for us.

Here's an example.

Context

We have students, each in a school class. Here are our schemas for this:

Schemas

Students

import {
	BaseEntity,
	Column,
	Entity,
	JoinColumn,
	ManyToOne,
	PrimaryColumn,
} from "typeorm";

import { SchoolClassSchema } from './schoolClass.schema';

const STUDENTS_TABLE = "students";

@Entity(STUDENTS_TABLE)
export class StudentSchema extends BaseEntity {

  @ManyToMany((type) => StudentSchema)
  classMates: StudentSchema[]

  @ManyToOne((type) => SchoolClassSchema, { nullable: true })
  @JoinColumn({ name: "id_class", referencedColumnName: "id_class" })
  schoolClass: SchoolClassSchema | null;
  
  @Column({
    nullable: false,
    type: String,
  })
  first_name: string;
 
  @Column({
    nullable: true,
    type: String,
  })
  id_class: number | null;

  @PrimaryColumn({
    nullable: false,
    type: Number,
  })
  id_student: number;

  @Column({
    nullable: false,
    type: String,
  })
  last_name: string;
}

School Classes

import {
	BaseEntity,
	Column,
	Entity,
	JoinColumn,
	PrimaryColumn,
	OneToMany
} from "typeorm";

import { StudentSchema } from "./student.schema";

const CLASSES_TABLE = "classes";

@Entity(CLASSES_TABLE)
export class SchoolClassSchema extends BaseEntity {
  @PrimaryColumn({
    nullable: false,
    type: Number,
  })
  id_class: number;

  @Column({
    nullable: false,
    type: String,
  })
  class_name: string;
  
  @OneToMany((type) => StudentSchema)
  @JoinColumn({ name: "id_class", referencedColumnName: "id_class" })
  students: StudentSchema[]
}

The task at hand

We've been tasked with indexing all that in Algolia. To achieve that, we would want to fetch all the students and, for each student, their class and classmates.

The naive, memory hungry and noticeably slow (anti)solution

In this project's architecture, we isolate the database and ORM-specific logic from the rest of the app by wrapping them in services. Hence, for each model, a TypeORM Entity Repository and a Service.

TypeORM Entity Repository

And this would be our naive TypeORM repository for our students

import { AbstractRepository, EntityRepository } from "typeorm";
import { StudentSchema } from "../schemas";


@EntityRepository(StudentSchema)
export class StudentsRepository extends AbstractRepository<StudentSchema> {
	/**
	*
	* Returns a single page of Students
	* @returns promise of page's Students
	*/
	async getPage(params: {
		page: number;
		perPage: number;
	}): Promise<StudentSchema[]> {
		const { page, perPage } = params;
		return this.repository
		.createQueryBuilder('student')
		.leftJoinAndSelect('student.schoolClass', 'schoolClass')
		.leftJoinAndSelect('schoolClass.students', 'classMates')
		.skip(perPage * page)
		.take(perPage)
		.getMany()
	}
		
	/**
	*
	* Returns the count of all Students
	* @returns promise of Students count
	*/
	async getItemsCount(): Promise<number> {
		return this.repository.createQueryBuilder('student').getCount();
	}
	
	/**
	*
	* Returns all the Students
	* @returns promise of all Students
	*/
	async list(): Promise<StudentSchema[]> {
		const perPage = 1000;
		const totalItemsCount = await this.getItemsCount();
		const pagesCount = Math.ceil(totalItemsCount / perPage);
		const pages = Array.from({ length: pagesCount }, (v, index) => index);
		const results = await Promise.all(pages.map((pageNumber) => this.getPage({ page: pageNumber, perPage })));
		return results.reduce(
			(acc, resultsPage) => acc.concat(resultsPage),
			[] as StudentSchema[],
		);
	}
}

Service

import { Student } from '../models';
import {
  StudentsRepository,
} from './repositories';

import { StudentUtils } from './utils'; 

export interface StudentsServiceInterface {
  list: () => Promise<Student[]>;
}

/**
 * API for Students
 * @param repository The Students repository
 * @param students The list of students (initially empty) returned at the end of this function call
 * @param pageToken The possible string defining the token for accessing the next page of results
 */
export const StudentsClient = (
  repository: StudentsRepository,
): StudentsServiceInterface => ({
  /**
   * API Call to list all the students from a repository
   */
  list: async (): Promise<Student[]> =>
    repository
      .list()
      .then((students) => students.map((student) => StudentUtils.schemaToModel(student, student.classMates, student.schoolClass))),
});

Clean, short, clear. In many words rather than one, easy on the eye (aka readable). Sadly, this is precisely what leads to TypeORM gobbling up more than a few gigabytes of RAM while loading a measly few thousand items with, at most,  ten relationships each.

TypeORM discussion · Issue #1131 · bloom-housing/bloom
The problem with our current setup (my subjective opinion): Excessive memory usage when querying listings and applications because of nested joins, see issues: typeorm/typeorm#4499 typeorm/typeorm#...

Improved, more performant and less memory hungry

Instead of naively relying on the library's .leftJoin query building method, we'll fetch those relationships separately and rebuild the relations ourselves, in the Service. It involves more work on our side, but the result (ie not blowing up our client's EC2 instances anymore) is more than worth it.

Students Repository

import { AbstractRepository, EntityRepository } from "typeorm";
import { StudentSchema } from "../schemas";


@EntityRepository(StudentSchema)
export class StudentsRepository extends AbstractRepository<StudentSchema> {
  /**
   *
   * Returns a single page of Students
   * @returns promise of page's Students
   */
  async getPage(params: {
    page: number;
    perPage: number;
  }): Promise<StudentSchema[]> {
    const { page, perPage } = params;
    return this.repository
       .createQueryBuilder('student')
       .skip(perPage * page)
       .take(perPage)
       .getMany()
  }

  /**
   *
   * Returns the count of all Students
   * @returns promise of Students count
   */
   async getItemsCount(): Promise<number> {
    return this.repository.createQueryBuilder('student').getCount();
  }
  
  /**
   *
   * Returns all the Students
   * @returns promise of all Students
   */
   async list(): Promise<StudentSchema[]> {
    const perPage = 1000;
    const totalItemsCount = await this.getItemsCount();
    const pagesCount = Math.ceil(totalItemsCount / perPage);
    const pages = Array.from({ length: pagesCount }, (v, index) => index);
    const results = await Promise.all(
      pages.map((pageNumber) => this.getPage({ page: pageNumber, perPage })),
    );
    return results.reduce(
      (acc, resultsPage) => acc.concat(resultsPage),
      [] as StudentSchema[],
    );
  }
}

School Classes Repository

import { AbstractRepository, EntityRepository } from "typeorm";
import { SchoolClassSchema, StudentSchema } from "../schemas";


@EntityRepository(SchoolClassSchema)
export class SchoolClasssRepository extends AbstractRepository<SchoolClassSchema> {
  /**
   *
   * Returns a single page of SchoolClasss
   * @returns promise of page's SchoolClasss
   */ 
	 async schoolClassesForStudents(students: StudentSchema[]): Promise<SchoolClassSchema[]> {
    return this.repository
      .createQueryBuilder('schoolClass')
      .where('schoolClass.id_class IN (:ids)', {
        ids: students.reduce(
          (acc: number[], student) =>
            student.id_class === null || acc.includes(student.id_class)
              ? acc
              : acc.concat(student.id_class),
          [],
        ),
      })
      .getMany();
  }
}

Services

Students Service

import { Student } from '../models';
import {
  SchoolClassesRepository,
  StudentsRepository,
} from './repositories';

import { StudentUtils } from './utils'; 

export interface StudentsServiceInterface {
  list: () => Promise<Student[]>;
}

/**
 * API for Students
 * @param repository The Students repository
 * @param students The list of students (initially empty) returned at the end of this function call
 * @param pageToken The possible string defining the token for accessing the next page of results
 */
export const StudentsClient = (
  schoolClassesRepository: SchoolClassesRepository,
  repository: StudentsRepository,
): StudentsServiceInterface => ({
  /**
   * API Call to list all the students from a repository
   */
  list: async (): Promise<Student[]> =>
    repository.list().then(
      async (students) => {
        const schoolClasses = await schoolClassesRepository.classesForStudents(students);
        const studentsWithRelationships = students.map((student) => {
          const schoolClass = schoolClasses.find((schoolClass) => schoolClass.id_class === student.id_class);
          let repositoryStudent = {
            ...student,
            classMates: students.filter((classMate) => classMate.id_class === student.id_class),
            schoolClass: schoolClass
          }
          return StudentUtils.schemaToModel(repositoryStudent, repositoryStudent.classMates, repositoryStudent.schoolClass)
        });
        return studentsWithRelationships;
      },
    ),
});

As you can see, readability took a hit here. And our service now relies on more dependencies than before. But requests take less than a second, and, just as importantly, they've stopped crashing our servers by eating up all of their memory.


Epilogue

Quite often, the most obvious way isn't the best. What I sorely missed here was a simple and systematic way of measuring my program's compute and memory consumption. Something I will try to find and implement a solution for in the future.

Because while I certainly am not an optimisation fanatic, and I do indeed favour clarity and readability over performance, these kinds of trade-offs should be actual, knowingly chosen, trade-offs. Not mere costly accidents.

Icecream
Who said eating ice cream had to be unreasonable and messy? - Photo by Courtney Cook / Unsplash