[InterMine Dev] Why don't subclasses use postgres table inheritance?

Julie Sullivan julie at flymine.org
Fri Apr 8 10:26:17 BST 2016


Hi Sam,

Yes, as Justin says, this design decision was made prior to us. We are 
assuming the InterMine team knew about this feature and decided not to 
use it!

Looking at the docs, I see one problem:

	http://www.postgresql.org/docs/current/static/ddl-inherit.html

"A serious limitation of the inheritance feature is that indexes 
(including unique constraints) and foreign key constraints only apply to 
single tables, not to their inheritance children."

Being a data warehouse, indexes are our everything. In some cases, when 
querying we don't even go to the tables, we use the index only.

	https://wiki.postgresql.org/wiki/Index-only_scans

Maybe that's why? Just a guess. However, there is a ray of hope:

"These deficiencies will probably be fixed in some future release, but 
in the meantime considerable care is needed in deciding whether 
inheritance is useful for your application."

We're taking a really good look at postgres this summer, so I've put 
that on our list of things to investigate.

And thanks for this!!

Cheers
Julie

On 30/03/16 12:33, Justin Clark-Casey wrote:
> That's a good question.  Unfortunately, I can't answer it directly (if
> anybody else can shed more light please jump in).  As you say,
> PostgreSQL has had inheritance since at least 6.0 afaict.  This was
> released in 1997 and so precedes InterMine by a good few years.
>
> Perhaps originally InterMine did not want its ORM system tied too
> tightly into PostgreSQL (e.g. systems like Hibernate that can use many
> different SQL backends don't use PostgreSQL inheritance afaik).  Or
> maybe there was some old (perhaps now dead) InterMine feature that
> prevented using PostgreSQL inheritance.  This is all speculation.
>
> There are some caveats involved with inheritance around things like
> constraints [1].  On a quick read, it doesn't seem any of these would be
> an issue for InterMine but the proof would be in the pudding.
>
> On a quick Google, there's no obvious good performance information on
> how using inheritance impacts row insert times/storage
> requirements/query performance. However, one of our current concerns is
> the time it takes to load InterMine with data.  Intuitively, one might
> expect it to also reduce database size but this would be another case of
> suck it and see.  DB size does not seem to be a critical issue (please
> feel free to correct me!).
>
> Thanks for bringing this up, Sam.  It's now on our radar as a possible
> experiment to see if using PostgreSQL inheritance could improve data
> load times without affecting query performance.
>
> [1] http://www.postgresql.org/docs/current/static/ddl-inherit.html
>
> --
> Justin Clark-Casey, Synbiomine/InterMine Developer
> http://synbiomine.org
> http://twitter.com/justincc
>
> On 30/03/16 02:33, Sam Hokin wrote:
>> Actually, I'll note that those count queries would have returned the
>> same numbers if table inheritance were used. But, table inheritance
>> isn't used, so those
>> are actual table-specific record counts. And then, there's the true
>> daddy of them all:
>>
>> multimine=# select count(*) from intermineobject;
>>    count
>> ---------
>>   7107971
>>
>> Which made me think about heirarchy: intermineobject -> bioentity -> a
>> zillion other things. So perhaps that's why table inheritance isn't
>> used? Gets too hairy?
>> But Postgres is happy to let you inherit a table which, in turn,
>> inherits another table. The common fields just keep cascading down.
>> Still puzzling to me.
>>
>> On 03/29/2016 05:03 PM, Sam Hokin wrote:
>>> Hi, devs. I was just shocked to discover that the subclass of an
>>> entity winds up populating both the subclass table and the parent
>>> table with the same data (the extra subclass fields missing from the
>>> parent table, of course). For example:
>>>
>>>    <class name="GOAnnotation" extends="OntologyAnnotation"
>>> is-interface="true">
>>>      <attribute name="annotationExtension" type="java.lang.String"/>
>>>      <collection name="evidence" referenced-type="GOEvidence"/>
>>>    </class>
>>>
>>> results in:
>>>
>>> beanmine=> select count(*) from goannotation;
>>>   count
>>> -------
>>>   44844
>>>
>>> beanmine=> select count(*) from ontologyannotation;
>>>   count
>>> -------
>>>   44844
>>>
>>> A terrible price to pay for subclassing! Of course, the king of
>>> parents, BioEntity, gives:
>>>
>>> soymine=> select count(*) from bioentity;
>>>    count
>>> ---------
>>>   1458068
>>>
>>> Yikes!!!!!
>>>
>>> Postgres provides table inheritance, so you'd be able to simply
>>> define, for example:
>>>
>>> CREATE TABLE goannotation (
>>>    annotationextension  text
>>> ) INHERITS (ontologyannotation);
>>>
>>> In which case you'd put a record in goannotation and it'd show up
>>> automagically in ontologyannotation (without annotationextension).
>>>
>>> Is there a reason why the InterMine devs chose not to use table
>>> inheritance for subclasses and, instead, populate both tables with
>>> the same data? I use table inheritance all the time, it's one of the
>>> great features of Postgres going back to the beginning!
>>
>> _______________________________________________
>> dev mailing list
>> dev at intermine.org
>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>
> _______________________________________________
> dev mailing list
> dev at intermine.org
> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>



More information about the dev mailing list