When modelling users in a hierarchical organization, it's common to need to retrieve the line manager or subordinates of a given user. In MySQL, two of the techniques that can be used to handle these recursive relationships are the
WITH RECURSIVE common table expression and using closure tables.
Using CTEs with
A common table expression (CTE) is a temporary result set that is defined within the execution scope of a single SQL statement. In MySQL, you can use the
WITH RECURSIVE clause to define a CTE that can be used to retrieve all of the users above or below a given user in the hierarchy.
Say we have a
users table that looks like this:
1CREATE TABLE users (2 id INTEGER PRIMARY KEY,3 name VARCHAR(255) NOT NULL,4 manager_id INTEGER REFERENCES users(id)5);
To create a CTE that gets the users above a given user, you can use the following query:
1WITH RECURSIVE managers AS (2 SELECT id, name, manager_id3 FROM users4 WHERE id = :user_id5UNION ALL6 SELECT u.id, u.name, u.manager_id7 FROM users u8 INNER JOIN managers m ON m.manager_id = u.id9)10SELECT * FROM managers;
This query uses the
WITH RECURSIVE clause to define a CTE called
managers that first selects the user with the given
user_id and then uses the
UNION ALL operator to combine that user with all of the users above them in the hierarchy. The
INNER JOIN clause is used to link the
managers CTE to the users table, so that the
managers CTE can keep growing until it includes all of the users above the given user in the hierarchy.
To get the users below a given user, you can use a similar query, but with the
id fields swapped in the INNER JOIN clause:
1WITH RECURSIVE subordinates AS (2 SELECT id, name, manager_id3 FROM users4 WHERE id = :user_id5UNION ALL6 SELECT u.id, u.name, u.manager_id7 FROM users u8 INNER JOIN subordinates s ON s.id = u.manager_id9)10SELECT * FROM subordinates;
This query uses the same technique as the previous example, but it retrieves the users below the given user instead of above.
Using closure tables
Another technique for handling recursive relationships in MySQL is to use closure tables. A closure table is a special type of table that is used to represent hierarchical data in a relational database. It consists of two tables: one for the entities in the hierarchy (e.g. users) and one for the relationships between those entities.
To create a closure table for a hierarchy of users, you can use the following DDL statement:
1CREATE TABLE users (2 id INTEGER PRIMARY KEY,3 name VARCHAR(255) NOT NULL4);56CREATE TABLE hierarchy (7 ancestor INTEGER NOT NULL REFERENCES users(id),8 descendant INTEGER NOT NULL REFERENCES users(id),9 PRIMARY KEY (ancestor, descendant),10 CHECK (ancestor != descendant)11);
The users table is similar to the one in the previous example, but it only contains the
name fields. The relationships between the users are defined in the
To insert data into these tables, you would first need to insert rows into the users table for each user in the hierarchy, and then insert rows into the hierarchy table to define the relationships between those users. Here is an example of how this could be done:
1INSERT INTO users (id, name) VALUES2 (1, 'Alice'),3 (2, 'Bob'),4 (3, 'Carol'),5 (4, 'Dave'),6 (5, 'Eve');78INSERT INTO hierarchy (ancestor, descendant) VALUES9 (1, 1),10 (1, 2),11 (1, 3),12 (1, 4),13 (1, 5),14 (2, 2),15 (2, 3),16 (2, 4),17 (3, 3),18 (4, 4);
This inserts five users into the
users table and then inserts rows into the
hierarchy table to define the relationships between those users. The ancestor and descendant values are both set to the same id for each user, which indicates that each user is a descendant of themselves. The hierarchy table also defines the relationships between the users, with each user being a descendant of the user above them in the hierarchy.
To get all of the users above a given user using a closure table, you can use the following query:
1SELECT u.*2FROM users u3INNER JOIN hierarchy h ON h.descendant = :user_id4WHERE u.id = h.ancestor
This query uses an
INNER JOIN to link the
users table to the
hierarchy table, and then filters the results to only include users that are ancestors of the user with the given
user_id. This query will return a list of all of the users above the given user in the hierarchy.
To get the users below a given user, you can use a similar query, but with the ancestor and descendant fields swapped in the INNER JOIN clause:
1SELECT u.*2FROM users u3INNER JOIN hierarchy h ON h.ancestor = :user_id4WHERE u.id = h.descendant
This query will return a list of all of the users below the given user in the hierarchy.
One downside of the closure table approach is that when the hierarchical structure changes, the
hierarchy table needs to be updated to reflect the changes. There are a number of approaches you can take to handle this, including:
- Creating MySQL triggers to update the
hierarchytable automatically when a user is inserted, updated or deleted
- Using events, such as Eloquent model events, to apply the changes in application code
- Truncating and repopulating the
hierarchytable from scratch
The first approach is generally the most efficient, but has the downside that triggers aren't generally exported from
mysqldump, making it difficult to manage when importing the production database locally. The last approach often makes the most sense in cases where users are populated from some kind of regular import, in which case the hierarchy will only ever change as a result of that import.
Which one should I use?
Which of these approaches you should choose in a given situation is highly dependent on the specific needs of your application, since each has advantages and limitations.
Using a closure table results in a smaller, simpler, and generally more efficient query that is easy to express using an ORM or query builder, but requires that you take steps to update the separate closure table when the hierarchy changes. Using the
WITH RECURSIVE CTE doesn't require a separate table, eliminating the need to populate said table, but for some queries it may not be as efficient. In addition, it can be difficult to express with some ORMs and query builders, necessitating either additional third party packages or falling back to raw queries. If you're stuck using an older version of MySQL, such as on a legacy application, and can't upgrade, you might also not be able to use
WITH RECURSIVE (though at this point you really shouldn't be using a version that old).
At times I've found it necessary to combine both techniques. One application I maintain has a nightly import process for all the users and derives the permissions to view various pieces of content in part from the hierarchy - the only way to determine the hierarchy is by following the line managers back all the way to the managing director for each individual user, but permissions can be assigned to individual business units within the company and cascade down to child business units, and so to know what permissions a user has, we need to know where they sit in the hierarchy. This query would be too cumbersome to perform on the fly for each user, so we use the
WITH RECURSIVE CTE to detetermine a user's place within the hierarchy, and then populate a closure table from it, as a part of the nightly import.
In MySQL, there are a number of techniques that can be used to handle recursive relationships in hierarchical data, but two of the most performant and flexible are the
WITH RECURSIVE CTE and using closure tables. Both techniques have their own benefits and drawbacks, and the best choice will depend on the specific requirements of your application.