Re: Updating list of _audit.schema
- To: comcifs@iucr.org
- Subject: Re: Updating list of _audit.schema
- From: "john.westbrook--- via comcifs" <comcifs@iucr.org>
- Date: Thu, 7 Jan 2021 08:32:39 -0500
- Cc: "john.westbrook@rcsb.org" <john.westbrook@rcsb.org>
- In-Reply-To: <CABcsX25c_nL9K-Mx7jnAfwvDjqQ55ZBVcFw5fvh+6Wd=PDaNuA@mail.gmail.com>
Hi James,

Related to the loop_ presentation issue. Nothing prohibits creating a loop_
with a single row even if the category logically has unit cardinality.

Regards,

John

On 1/7/21 7:00 AM, Herbert J. Bernstein via comcifs wrote:
> Dear James,
>   This reminds me very much of the bitter fight in the early 1970s between
> the proponents of hierarchical databases and relational databases. At the
> time I was on the wrong side and thought that there was something very neat
> and organized about always forcing your information into a tree that allowed
> the use of highly efficient pointers. Just putting information into tables
> of unsorted tuples and eschewing pointers seemed horribly inefficient. Those
> of us who liked hierarchies and pointers were wrong. Codd was right. My
> enthusiasm is the enthusiasm of a convert. Relations rule!!!
>   Regards,
>     Herbert
>
> On Wed, Jan 6, 2021 at 11:39 PM James Hester <jamesrhester@gmail.com> wrote:
>
> OK Herbert, I can only go on what is in the dictionaries. Please explain how
> mmCIF can "loop freely" the categories containing an _entry.id child key
> data name within a single data block given the definition. I will leave
> John W to further comment on how _entry.id is supposed to be used if he
> wishes. Meanwhile, in order to make progress I suggest simply
> (i) removing the "Macromolecular" option, noting the previous "Experiments"
> option covers multi-wavelength, multi-crystal setups.
> (ii) removing the "imgCIF" option
> (iii) returning in the future to add corresponding _audit.schema options
> corresponding to mmCIF and imgCIF if necessary
>
> By the way, I think I share your enthusiasm for managing information using
> the relational model, but data containers (data blocks/files/directories
> etc.) are unavoidable, and your characterisation of them as projections over
> particular values of one or more key data names I think is the precise way
> of defining the relationship between a data block and a relational schema
> and is vital for proper understanding of how to build datasets from
> constituent pieces.
>
> On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> Dear James,
>   John Westbrook will have to speak to the question of why his dictionary
> says that, but the reality is that he also runs a database that in fact
> supports a lot more than one entry id, and it is certainly the case that
> imgCIF data can have a very complex and tangled relationship with mmCIF
> entry ids. Further it is not unusual to present one entry as multiple
> datablocks pulled together by a common entry id when they are not in the
> same data file, e.g. for structure factors and coordinates.
>   All of which is beside the point. Both imgCIF and mmCIF are database
> schema and stick keys on everything and loop them quite freely. Sure you can
> always pick any key and extract only the tuples that contain that key at a
> single value and such other tuples from related child categories as make
> some kind of sense and call it a datablock, but that is a very narrow and
> minimally useful view of how to manage information.
>   Regards,
>     Herbert
>
> On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester@gmail.com> wrote:
>
> Herbert - are you arguing that imgCIF and mmCIF should not be assigned
> different schema names? If your comments are not about that, feel free to
> ignore the following.
>
> If you scrutinize the definition in mmCIF of _entry.id
> (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html),
> you will see that it "identifies the data block" so is therefore restricted
> to a single value in a single data block. It follows that all the child data
> items of _entry.id are restricted to single values, so where these child
> items are the sole keys of their categories those categories become
> single-row categories. Such categories are entirely functionally equivalent
> to DDLm Set categories and so it would be possible to list which Set
> categories in core CIF are multi-row in mmCIF, satisfying the criteria for a
> schema. Frankly I was a bit too lazy to write the code to determine this but
> from memory it is only diffrn and exptl_crystal. If there are objections to
> the label "macromolecular" we can change it to "multi-crystal
> multi-wavelength" to avoid any implications or restrictions on mmCIF.
>
> Although imgCIF does not have any categories that have child data names of
> _entry.id (so every imgCIF category can have multiple rows), it does add new
> key data names to a few mmCIF categories, thereby creating a distinct
> "_audit.schema". I don't think that is a controversial statement. For
> example, the Diffrn_Detector category in imgCIF has key data names
> "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas mmCIF
> has only the former (as per the text at bottom of p203 of Vol G).
>
> all the best,
> James.
>
> On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> I believe both imgCIF and mmCIF only use loop categories and any set
> categories picked up for inclusion with their datasets will need to have
> keys added and be mapped into loop categories. That is certainly the case
> for imgCIF -- Herbert
>
> On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester@gmail.com> wrote:
>
> Apologies for the lax terminology. By "looped" I mean "able to have more
> than one row in a loop". Perhaps the explanations should be rewritten to use
> 'Loop category' and 'Set category' rigorously?
>
> On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> In imgCIF (as with mmCIF) any and all categories may be looped -- it's how
> you put information into database tables. - Herbert
>
> On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs@iucr.org> wrote:
>
> Dear COMCIFS,
>
> First of all, Happy New Year to you all, I hope you've all been keeping
> well.
>
> I am writing to propose updating the list of _audit.schema in the core
> dictionary. Normally this would be core DMG business, but as it concerns
> most dictionaries covered by COMCIFS I believe this is the more appropriate
> forum. This has been prompted by reviewing the DDLm dictionary chapters for
> the next edition of Volume G. Please examine the list below and discuss any
> changes you would like to see. The formal changes to the dictionary can be
> viewed as a diff at this link:
> https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>
> As a reminder, the _audit.schema dataname indicates that one or more
> categories have become looped relative to the core CIF dictionary. For
> example, where multiple crystals are used in a measurement, the
> exptl_crystal category becomes looped. Ideally software will check this
> dataname and exit if the dataname has an incompatible value.
>
> best wishes,
> James.
>
> =====================================================
> loop_
> _enumeration_set.state
> _enumeration_set.detail
>   Base                 'Original Core CIF schema'
>   'Space group tables' 'space_group category is looped'
>   Entry
> ;
>   entry category is defined and looped: multiple experiments
>   with results may be present
> ;
>   Powder               'Multiple compounds (phases) may be present'
>   Modulated            'Multiple subsystems may be present'
>   Experiments
> ;
>   diffrn and exptl_crystal categories are looped: multiple
>   diffraction measurements on multiple samples may be present
> ;
>   Macromolecular
> ;
>   mmCIF equivalent. Only single-key mmCIF categories containing children
>   of _entry.id are Set categories
> ;
>   Raw
> ;
>   imgCIF equivalent. As for Macromolecular, with the addition of
>   multiple detectors.
> ;
>   Laue
> ;
>   diffrn_radiation is looped: Multiple wavelengths are used.
> ;
>   Custom               'Examine dictionaries provided in _audit_conform'
>   Local                'Locally modified dictionaries. Datafile not for distribution'
> _enumeration.default   Base
> =======================
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs

--
John Westbrook
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Institute for Quantitative Biomedicine at Rutgers
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
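
[Editorial illustration, not part of the original message.] The quoted proposal says that software should ideally check _audit.schema and exit on an incompatible value. A minimal sketch of such a check follows, assuming a program that only understands the "Base" and "Space group tables" schemas. The function names, the regular expression and the choice of supported values are hypothetical and are not taken from any CIF library. The sketch only recognises the simple "name value" form on a single line; the single-row loop_ presentation mentioned above is equally legal and would need a real CIF parser.

```python
import re

# Hypothetical: the schemas this imaginary program can handle. The names come
# from the proposed enumeration quoted in the thread.
SUPPORTED_SCHEMAS = {"Base", "Space group tables"}

# The proposal lists Base as _enumeration.default, so a file that does not
# give _audit.schema at all is treated as Base here.
DEFAULT_SCHEMA = "Base"

# Matches "_audit.schema <value>" on one line, where <value> is single-quoted,
# double-quoted, or a bare word. Data names are matched case-insensitively.
_AUDIT_SCHEMA = re.compile(
    r"""^[ \t]*_audit\.schema[ \t]+(?:'([^']*)'|"([^"]*)"|(\S+))[ \t]*$""",
    re.IGNORECASE | re.MULTILINE,
)


def read_audit_schema(cif_text):
    """Return the _audit.schema value found in cif_text, or the default."""
    match = _AUDIT_SCHEMA.search(cif_text)
    if match is None:
        return DEFAULT_SCHEMA
    # Exactly one of the three alternatives captured the value.
    return next(group for group in match.groups() if group is not None)


def check_schema(cif_text, supported=SUPPORTED_SCHEMAS):
    """Return the schema name, or raise ValueError if it is not supported."""
    schema = read_audit_schema(cif_text)
    if schema not in supported:
        raise ValueError(f"unsupported _audit.schema value: {schema!r}")
    return schema
```

A caller would run check_schema on the file contents before interpreting any category as single-row, and stop (or fall back to a more general reader) when ValueError is raised.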