Complex-ish SQL queries in Typescript using TypeORM

I have, as a teenager, fiddled around with phpMyAdmin when fooling around with WordPress and OGSpy, back when OGSTeam still relied on SVN for versioning, and I didn't know crap about development, let alone architecture.

Complex-ish SQL queries in Typescript using TypeORM
Photo by Rick Mason / Unsplash

I have, as a teenager, fiddled around with phpMyAdmin when fooling around with WordPress and OGSpy, back when OGSTeam still relied on SVN for versioning, and I didn't know crap about development, let alone architecture.

Simply put, while I did have some general culture on the topic of relational databases, and had written a few SELECT while trying to get through some prospective client's or employer's online coding challenges, it never went further than that. Until last year when I had to work with an SQL database for the first time in my life as a software engineer.

Which is also when I faced a small albeit fun challenge.

Context

Let's suppose we work for a huge software services consulting and contracting firm. The kind most of us would rather not work at.

The kind of firm that gets sued for $32 million for failing to deliver a non-responsive design e-commerce website.

The kind of firm responsible for this sort of software (the post isn't available anymore, hence the Wayback Machine link; given how badly it breaks the design, I suggest using your browser's reader mode for readability).

Or the kind of firm that somehow manages to spend over a decade and €80 million (I couldn't find a page in English, but it should be easy to translate using DeepL) building a steaming pile of crap that somehow manages to cost an additional €76 million in damage control.

Lost Hallway
How I imagine the second post's firm's offices - Photo by Matthew Feeney / Unsplash
lost in space
Insight into the software's architecture - Photo by Martijn Baudoin / Unsplash
Splatter paint on white table
Louvois' latest UI mockup - Photo by Ricardo Viana / Unsplash

The Actual Context

No. Let's not do that. Instead, let's suppose we work for a much better, yet still huge, software services consulting and contracting firm, simultaneously working on a few thousand projects.

The (fictional) project we are working on is an internal one. A SaaS that will enable our coworkers to better understand the different projects the company is involved in, how they are doing, etc... to avoid ending up like the three I mentioned earlier.

And this project uses an SQL database.

Models

Each project has a name, an associated client, and different types of work associated with it. Whether it's sourcing requirements, investigating other tools and off-the-shelf solutions, designing the product, gathering feedback on various releases, re-designing the product based on this feedback, testing, planning it all, and actually developing the damn thing.

Each of these types of work will have its own estimate. Estimates that will be evaluated, re-evaluated, and refined regularly depending on many parameters, from input gathered during the initial phases and external factors to product issues identified during feedback, unanticipated technical limitations encountered while producing the project, and evolving requirements.

To better retrospectively analyse how projects went and why, it has been decided that each and every estimate for each and every type of work should always be kept.

So here we have it.

Projects

A projects table, where the model would look like this:

interface ProjectModel {
	id: Int,			// The table's primary key
	id_client: UUID,		// A relationship we won't care about
	name: String,
}

Work Types

A work_types table, with the following model:

interface WorkTypesModel {
	id: Int,			// The table's primary key
	name: String,
}

Estimates

A project_estimates table, of which the model would look like this:

interface ProjectEstimate {
	id: Int,			// The table's primary key
	id_project: Int,
	id_work_type: Int,
	duration: Int,
}

Basically, we have a many-to-one relationship between our project estimates and our projects, and a many-to-one relationship between our project estimates and our work types. Through the project_estimates table, which acts as a join table, we also have a many-to-many relationship between our projects and our work types.

The job to be done

Today's task, should we accept it, is an ETL (Extract, Transform, Load) task. Our architecture relies on ElasticSearch for complex searches, regularly synced with the SQL database's data, and the frontend team would like it if we could add each project's latest estimate in the data synced over to ElasticSearch[^1].

Now, you'll notice an annoying thing. Estimates don't have a creation date associated with them. Luckily for us, someone had the odd, yet helpful to us today, idea of using incremental integers for our tables' primary keys. Meaning that the most recent estimate also has the greatest id.

How I did it

Basically, what we want to do is

  • For each project, get the most recent estimate for each work_type.
  • Sum the estimates
  • Return each project's id with it's associated estimates durations sum.

And here is how I did it in Typescript, using TypeORM (not my first choice, but more on that later, and had I known I would have insisted we use something else).

  /**
   *
   * Returns a map of all the Projects' estimated durations
   * It does so by summing up, for each id_project, the durations of the most recent rows for each id_work_type & id_project combination from the project_estimates table
   * @returns promise of a map of each Project's total estimated duration
   */
  async getProjectDurations(): Promise<{ [projectId: number]: number }> {
    const estimatedDurations: ProjectEstimate[] = await this.repository.query(`
      SELECT
      t1.id_project,
      SUM(t1.duration) AS total_duration
      FROM (
        SELECT
          MAX(t2.id_estimate) AS most_recent_es,
          t2.id_project,
          t2.id_work_type,
          t2.duration
        FROM project_estimates AS t2
        GROUP BY t2.id_project, t2.id_work_type
      ) AS t3
      INNER JOIN project_estimates AS t1
      ON t3.id_project = t1.id_project AND t3.id_work_type = t1.id_work_type AND t3.most_recent_es = t1.id_estimate
      GROUP BY id_project
    `);
    const estimatedDurationsMap = estimatedDurations.reduce(
      (map, estimatedDuration) => ({
        ...map,
        [estimatedDuration.id_project]: estimatedDuration.total_duration,
      }),
      {},
    );
    return estimatedDurationsMap;

Epilogue

Conclusion? Well, if possible, don't build systems that rely on happenstance (the incremental integer id), try to plan your models for future use cases (saving the creation date and the latest update's date seems a safe bet), and keep finding joy in solving small puzzles. In my eye, along with actually building something, it's a big part of what makes what we do so great. That and building stuff that matters and helps people. But having learned my lesson, I will happily settle on not contributing to products that pollute our daily lives, or actually kill people.

Anyway, I hope it helped you, and I wish you a wonderful day! Also, as always, reach out to me on Twitter should you have any questions, feedback, or improvements to suggest! Bye!

Also, I've recently, and finally, started traveling, so that's nice - ©ElFitz 2022

[1]: Now, perhaps all this could have been avoided with a CQRS / Event Sourcing architecture, but remember that this is an internal tool, not the LMAX. The consulting firm we work at may be one the best, but we're lucky to get an actual full-time dedicated team for this project, and some time to tackle technical debt. Most of the places I've been weren't like that.