Discussion List Archives


Re: Updating list of _audit.schema


Hi James,
Regarding the loop_ presentation issue: nothing prohibits creating a loop_ with a single row, even if the category logically has unit cardinality.
Regards,
John
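
A minimal sketch of the single-row point above, using the exptl_crystal category that comes up in the quoted thread. The data values are made up for illustration and are not taken from any message or dictionary example; the two presentations are intended to carry the same information.

```cif
# Key-value presentation of a category that has exactly one row
_exptl_crystal.id        1
_exptl_crystal.colour    'colourless'

# The same single row presented as a loop_ (illustrative values)
loop_
_exptl_crystal.id
_exptl_crystal.colour
1    'colourless'
```
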
On 1/7/21 7:00 AM, Herbert J. Bernstein via comcifs wrote:
> Dear James,
>    This reminds me very much of the bitter fight in the early 1970s between the proponents of
> hierarchical databases and relational databases.  At the time I was on the wrong side and thought
> that there was something very neat and organized about always forcing your information into a tree
> that allowed the use of highly efficient pointers.  Just putting information into tables of unsorted
> tuples and eschewing pointers seemed horribly inefficient.  Those of us who liked hierarchies and
> pointers were wrong.  Codd was right.  My enthusiasm is the enthusiasm of a convert.  Relations
> rule!!!
>    Regards,
>      Herbert
>
> On Wed, Jan 6, 2021 at 11:39 PM James Hester <jamesrhester@gmail.com> wrote:
>
>     OK Herbert, I can only go on what is in the dictionaries. Please explain how mmCIF can
>     "loop freely" the categories containing an _entry.id child key data name within a single
>     data block, given the definition. I will leave John W to comment further on how _entry.id
>     is supposed to be used if he wishes. Meanwhile, in order to make progress I suggest simply
>     (i) removing the "Macromolecular" option, noting the previous "Experiments" option covers
>         multi-wavelength, multi-crystal setups.
>     (ii) removing the "imgCIF" option
>     (iii) returning in the future to add _audit.schema options corresponding to mmCIF and
>         imgCIF if necessary
>
>     By the way, I think I share your enthusiasm for managing information using the relational
>     model, but data containers (data blocks/files/directories etc.) are unavoidable, and your
>     characterisation of them as projections over particular values of one or more key data
>     names is, I think, the precise way of defining the relationship between a data block and a
>     relational schema, and is vital for a proper understanding of how to build datasets from
>     constituent pieces.
>
>     On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>         Dear James,
>            John Westbrook will have to speak to the question of why his dictionary says that,
>         but the reality is that he also runs a database that in fact supports a lot more than
>         one entry id, and it is certainly the case that imgCIF data can have a very complex
>         and tangled relationship with mmCIF entry ids.  Further, it is not unusual to present
>         one entry as multiple datablocks pulled together by a common entry id when they are
>         not in the same data file, e.g. for structure factors and coordinates.
>            All of which is beside the point.  Both imgCIF and mmCIF are database schemas and
>         stick keys on everything and loop them quite freely.  Sure, you can always pick any
>         key and extract only the tuples that contain that key at a single value, and such
>         other tuples from related child categories as make some kind of sense, and call it a
>         datablock, but that is a very narrow and minimally useful view of how to manage
>         information.
>            Regards,
>              Herbert
>
>         On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester@gmail.com> wrote:
>
>             Herbert - are you arguing that imgCIF and mmCIF should not be assigned different
>             schema names? If your comments are not about that, feel free to ignore the
>             following.
>
>             If you scrutinize the definition in mmCIF of _entry.id
>             (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html),
>             you will see that it "identifies the data block" and so is restricted to a single
>             value in a single data block. It follows that all the child data items of
>             _entry.id are restricted to single values, so where these child items are the sole
>             keys of their categories, those categories become single-row categories. Such
>             categories are entirely functionally equivalent to DDLm Set categories, and so it
>             would be possible to list which Set categories in core CIF are multi-row in mmCIF,
>             satisfying the criteria for a schema. Frankly I was a bit too lazy to write the
>             code to determine this, but from memory it is only diffrn and exptl_crystal. If
>             there are objections to the label "macromolecular" we can change it to
>             "multi-crystal multi-wavelength" to avoid any implications or restrictions on
>             mmCIF.
>
>             Although imgCIF does not have any categories that have child data names of
>             _entry.id (so every imgCIF category can have multiple rows), it does add new key
>             data names to a few mmCIF categories, thereby creating a distinct "_audit.schema".
>             I don't think that is a controversial statement. For example, the Diffrn_Detector
>             category in imgCIF has key data names "_diffrn_detector.diffrn_id" as well as
>             "_diffrn_detector.id", whereas mmCIF has only the former (as per the text at the
>             bottom of p203 of Vol G).
>
>             all the best,
>             James.
>
>             On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>                 I believe both imgCIF and mmCIF only use loop categories, and any set
>                 categories picked up for inclusion with their datasets will need to have keys
>                 added and be mapped into loop categories.  That is certainly the case for
>                 imgCIF -- Herbert
>
>                 On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester@gmail.com> wrote:
>
>                     Apologies for the lax terminology. By "looped" I mean "able to have more
>                     than one row in a loop". Perhaps the explanations should be rewritten to
>                     use 'Loop category' and 'Set category' rigorously?
>
>                     On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>                         In imgCIF (as with mmCIF) any and all categories may be looped -- it's
>                         how you put information into database tables.  - Herbert
>
>                         On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs@iucr.org> wrote:
>
>                             Dear COMCIFS,
>
>                             First of all, Happy New Year to you all, I hope you've all been
>                             keeping well.
>
>                             I am writing to propose updating the list of _audit.schema values
>                             in the core dictionary. Normally this would be core DMG business,
>                             but as it concerns most dictionaries covered by COMCIFS I believe
>                             this is the more appropriate forum. This has been prompted by
>                             reviewing the DDLm dictionary chapters for the next edition of
>                             Volume G. Please examine the list below and discuss any changes
>                             you would like to see.  The formal changes to the dictionary can
>                             be viewed as a diff at this link:
>                             https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>
>                             As a reminder, the _audit.schema dataname indicates that one or
>                             more categories have become looped relative to the core CIF
>                             dictionary. For example, where multiple crystals are used in a
>                             measurement, the exptl_crystal category becomes looped. Ideally
>                             software will check this dataname and exit if the dataname has an
>                             incompatible value.
>
>                             best wishes,
>                             James.
>
>                             =====================================================
>                             loop_
>                             _enumeration_set.state
>                             _enumeration_set.detail
>                                  Base                'Original Core CIF schema'
>                                 'Space group tables' 'space_group category is looped'
>                                  Entry
>                             ;
>                                  entry category is defined and looped: multiple experiments
>                                  with results may be present
>                             ;
>                                  Powder              'Multiple compounds (phases) may be present'
>                                  Modulated           'Multiple subsystems may be present'
>                                  Experiments
>                             ;
>                                  diffrn and exptl_crystal categories are looped: multiple
>                                  diffraction measurements on multiple samples may be present
>                             ;
>                                  Macromolecular
>                             ;
>                                  mmCIF equivalent. Only single-key mmCIF categories containing children
>                                  of _entry.id are Set categories
>                             ;
>                                  Raw
>                             ;
>                                  imgCIF equivalent. As for Macromolecular, with the addition of
>                                  multiple detectors.
>                             ;
>                                  Laue
>                             ;
>                                  diffrn_radiation is looped: Multiple wavelengths are used.
>                             ;
>                                  Custom              'Examine dictionaries provided in _audit_conform'
>                                  Local               'Locally modified dictionaries. Datafile not for distribution'
>                             _enumeration.default    Base
>                             =======================
>                             -- 
>                             T +61 (02) 9717 9907
>                             F +61 (02) 9717 3145
>                             M +61 (04) 0249 4148
>                             _______________________________________________
>                             comcifs mailing list
>                             comcifs@iucr.org
>                             http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs

-- 
John Westbrook
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Institute for Quantitative Biomedicine at Rutgers
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs
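
As a rough illustration of the reminder in James's original proposal (that _audit.schema signals which categories have become looped relative to core CIF), a data block might declare the schema and then loop exptl_crystal. This is only a sketch: the block name and all data values are hypothetical, and "Experiments" is one of the values in the proposed list quoted in the thread, not a confirmed final enumeration.

```cif
data_example_multi_crystal

# Declares that diffrn and exptl_crystal may have more than one row
# ("Experiments" is taken from the proposed list in the thread)
_audit.schema    Experiments

# exptl_crystal is looped: two crystals used in the measurement
# (identifiers and colours are illustrative only)
loop_
_exptl_crystal.id
_exptl_crystal.colour
crystal_1    'colourless'
crystal_2    'pale yellow'
```

Software that only understands the Base schema could, on reading a value other than Base here, stop rather than silently treating exptl_crystal as a single-row category.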