SQLAlchemy ORM: Setting Up Self-Referential Many-to-Many Relationships
If you’re new to SQLAlchemy, or are trying to get started, this may not be the post for you as I assume some familiarity with Python, SQL, and SQLAlchemy. Perhaps the tutorials section on the package’s website is a better place to start. SQLAlchemy has proven to be pretty handy once you figure out how to do what you’re trying to do. However, getting there can be a bit challenging, at least it was for me.
In this post, I will go over one challenge that I recently ran into, and that is how to set up models for database tables with many-to-many relationships to themselves.
I will assume that you are familiar with relational database concepts and somewhat familiar with Python and SQLAlchemy’s ORM. So, let’s get to it! I’ll start with what the tables look like.
The Database Tables
Table 1: Person table
PersonId PersonType Name
1 Parent Bob
2 Parent Mary
3 Child Tommy
4 Child Sally
Table 2: ParentChild (intersect) table
ParentChildId ParentId ChildId
1 1 3
2 2 3
3 1 4
4 2 4
These tables aren’t exactly the ones I was working with, but they should better illustrate the idea that the Person table has a many-to-many relationship to itself. I.e., one parent can have many children, and one child can have many parents.
Joining this table to itself to get a child’s parents, or a parent’s children, would look like the following:
At first, it wasn’t clear to me whether SQLAlchemy could handle this scenario, but as it turns out, it can! So, why don’t we take a look at the code first, then I’ll explain.
Building the SQLAlchemy Models
Imports
If we start at the top, the first 4 lines are just the necessary imports from SQLAlchemy. On the 3rd line, I am just importing the MS SQL Server datatypes provided by SQLAlchemy. You should be able to use those from whatever dialect you are using (e.g., MySQL, Postgres, etc.)
Base object
Next, in a similar fashion to the documentation, we create a Base
object via the declarative_base()
function. This Base
class will be the class that our database-table model classes inherit from. For more info, see the Declare a Mapping section of the documentation.
Intersect table
After that, we create a variable for the intersect table called parent_child
. Interestingly, SQLAlchemy seems to require us to represent intersect tables this way, where we have a variable that we set equal to a Table
object which is initialized with the name of the table, the Base.metadata
, and then the Column
objects. This as opposed to being created like the other tables, which is to say, as classes that subclass the Base
object.
Notice that the two Columns
after the primary key column also specify a ForeignKey
param, each pointing back to the primary key of the Person table.
Note: Some people might find it confusing that the value you pass to the ForeignKey
class must be the table name and column name as they appear in the database, not how you specified them in your Python models. So here, that value is ForeignKey(‘Person.PersonId’)
, which is the actual table and column name in the database.
Table Model class
Lastly, the Person
class is the model of the database table that we are trying to represent. In this class, we add the required __tablename__
attribute, add Column
objects for each column making sure to specify the person_id
as the primary key, and finally the relationship.
For the relationship, we need to add an attribute to the table model class Person
to represent the list of foreign objects, which in this case are more Person
objects. So, let’s call it parents
.
Next, we set parents
equal to SQLAlchemy’s relationship
function, which we imported at the top.
The relationship function takes a variable number of arguments depending on the type of relationship you’re setting up. The first argument to pass is the string name of the foreign table model’s class name. Since we are creating a relationship from Person
back to itself, this is just ‘Person’
.
The next argument is a keyword argument called secondary
that must be set equal to the intersect-table variable, parent_child
.
In my experience, this is all that would be required for your average many-to-many relationship, but for our case, we also need to specify the primaryjoin
and secondaryjoin
keyword arguments so that SQLAlchemy knows what to do with the two ForeignKey
columns in our intersect-table variable. And this makes sense if you were to recreate the join order in SQL, which I suggest doing so that you have a clear picture of the relationships you’re trying to represent.
Note: for anyone reading this that might be new to SQLAlchemy and has made it this far, everything described above only pertains to setting up the table models, and does not touch on how to execute queries against them using the ORM syntax. For that, I would check out the Object Relational Tutorial.
Conclusion
Usually, I find myself linking one table model to another via a one-to-many relationship, or many-to-one, with the occasional many-to-many relationship via an intersect table. This case was unusual for me, as I needed to link the table to itself as a many-to-many relationship.
However, each time I’ve come across an edge-case, SQLAlchemy has had something to support that scenario. It can take a bit of hunting to find the solution in the documentation, though, so I thought I’d document this one here just in case I need to come back to it, or someone else finds it useful.