Re: Updating list of _audit.schema
- To: comcifs@iucr.org
- Subject: Re: Updating list of _audit.schema
- From: "john.westbrook@rcsb.org" <john.westbrook@rcsb.org>
- Date: Thu, 7 Jan 2021 08:32:39 -0500
Hi James,

Related to the loop_ presentation issue. Nothing prohibits creating a loop_ with a single row even if the category logically has unit cardinality.

Regards,
John

On 1/7/21 7:00 AM, Herbert J. Bernstein via comcifs wrote:
> Dear James,
>   This reminds me very much about the bitter fight in the early 1970s between the proponents of hierarchical databases
> and relational databases. At the time I was on the wrong side and thought that there was something very neat and
> organized about always forcing your information into a tree that allowed the use of highly efficient pointers. Just putting
> information into tables of unsorted tuples and eschewing pointers seemed horribly inefficient. Those of us who
> liked hierarchies and pointers were wrong. Codd was right. My enthusiasm is the enthusiasm of a convert. Relations
> rule!!!
>   Regards,
>     Herbert
>
> On Wed, Jan 6, 2021 at 11:39 PM James Hester <jamesrhester@gmail.com> wrote:
>
> OK Herbert, I can only go on what is in the dictionaries. Please explain how mmCIF can "loop freely" the categories containing
> an _entry.id child key data name within a single data block given the definition. I will leave John W to
> further comment on how _entry.id is supposed to be used if he wishes. Meanwhile, in order to make progress I
> suggest simply
> (i) removing the "Macromolecular" option, noting the previous "Experiments" option covers multi-wavelength, multi-crystal setups.
> (ii) removing the "imgCIF" option
> (iii) returning in the future to add corresponding _audit.schema options corresponding to mmCIF and imgCIF if necessary
>
> By the way, I think I share your enthusiasm for managing information using the relational model, but data containers (data
> blocks/files/directories etc.)
> are unavoidable, and your characterisation of them as projections over particular values of one
> or more key data names I think is the precise way of defining the relationship between a data block and a relational schema and
> is vital for proper understanding of how to build datasets from constituent pieces.
>
> On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> Dear James,
>   John Westbrook will have to speak to the question of why his dictionary says that, but the reality is that he also runs a
> database that in fact supports a lot more than one entry id, and it is certainly the case that imgCIF data can have a very
> complex and tangled relationship with mmCIF entry ids. Further it is not unusual to present one entry as multiple datablocks
> pulled together by a common entry id when they are not in the same data file, e.g. for structure factors and coordinates.
>   All of which is beside the point. Both imgCIF and mmCIF are database schema and stick keys on everything and loop them
> quite freely. Sure you can always pick any key and extract only the tuples that contain that key at a single value and such
> other tuples from related child categories as make some kind of sense and call it a datablock, but that is a very narrow and
> minimally useful view of how to manage information.
>   Regards,
>     Herbert
>
> On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester@gmail.com> wrote:
>
> Herbert - are you arguing that imgCIF and mmCIF should not be assigned different schema names? If your comments are not
> about that, feel free to ignore the following.
>
> If you scrutinize the definition in mmCIF of _entry.id
> (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html), you will see that it "identifies the
> data block" so is therefore restricted to a single value in a single data block. It follows that all the child data
> items of _entry.id are restricted to single values, so where these child items are the sole keys of
> their categories those categories become single-row categories. Such categories are entirely functionally equivalent to
> DDLm Set categories and so it would be possible to list which Set categories in core CIF are multi-row in mmCIF,
> satisfying the criteria for a schema. Frankly I was a bit too lazy to write the code to determine this but from memory
> it is only diffrn and exptl_crystal. If there are objections to the label "macromolecular" we can change it to
> "multi-crystal multi-wavelength" to avoid any implications or restrictions on mmCIF.
>
> Although imgCIF does not have any categories that have child data names of _entry.id (so every imgCIF
> category can have multiple rows), it does add new key data names to a few mmCIF categories, thereby creating a distinct
> "_audit.schema". I don't think that is a controversial statement. For example, the Diffrn_Detector category in imgCIF
> has key data names "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas
> mmCIF has only the former (as per the text at bottom of p203 of Vol G).
>
> all the best,
> James.
>
> On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> I believe both imgCIF and mmCIF only use loop categories and any set categories picked up for inclusion with their
> datasets will need to have keys added and be mapped into loop categories.
> That is certainly the case for imgCIF -- Herbert
>
> On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester@gmail.com> wrote:
>
> Apologies for the lax terminology. By "looped" I mean "able to have more than one row in a loop". Perhaps the
> explanations should be rewritten to use 'Loop category' and 'Set category' rigorously?
>
> On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
>   In imgCIF (as with mmCIF) any and all categories may be looped -- it's how you put information into
> database tables.  - Herbert
>
> On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs@iucr.org> wrote:
>
> Dear COMCIFS,
>
> First of all, Happy New Year to you all, I hope you've all been keeping well.
>
> I am writing to propose updating the list of _audit.schema in the core dictionary. Normally this would
> be core DMG business, but as it concerns most dictionaries covered by COMCIFS I believe this is the more
> appropriate forum. This has been prompted by reviewing the DDLm dictionary chapters for the next edition
> of Volume G. Please examine the list below and discuss any changes you would like to see. The formal
> changes to the dictionary can be viewed as a diff at this link:
> https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>
> As a reminder, the _audit.schema dataname indicates that one or more categories have become looped
> relative to the core CIF dictionary. For example, where multiple crystals are used in a measurement, the
> exptl_crystal category becomes looped. Ideally software will check this dataname and exit if the
> dataname has an incompatible value.
>
> best wishes,
> James.
>
> =====================================================
> loop_
> _enumeration_set.state
> _enumeration_set.detail
>   Base                  'Original Core CIF schema'
>   'Space group tables'  'space_group category is looped'
>   Entry
> ;
>   entry category is defined and looped: multiple experiments
>   with results may be present
> ;
>   Powder                'Multiple compounds (phases) may be present'
>   Modulated             'Multiple subsystems may be present'
>   Experiments
> ;
>   diffrn and exptl_crystal categories are looped: multiple
>   diffraction measurements on multiple samples may be present
> ;
>   Macromolecular
> ;
>   mmCIF equivalent. Only single-key mmCIF categories containing children
>   of _entry.id are Set categories
> ;
>   Raw
> ;
>   imgCIF equivalent. As for Macromolecular, with the addition of
>   multiple detectors.
> ;
>   Laue
> ;
>   diffrn_radiation is looped: Multiple wavelengths are used.
> ;
>   Custom                'Examine dictionaries provided in _audit_conform'
>   Local                 'Locally modified dictionaries. Datafile not for distribution'
> _enumeration.default   Base
> =======================
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs

--
John Westbrook
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Institute for Quantitative Biomedicine at Rutgers
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290
Fax: (732) 445-4320
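
The original proposal quoted above says that software should ideally check the _audit.schema dataname and exit if it has an incompatible value. A minimal sketch of such a check follows, in Python. It rests on several assumptions that are not in the thread: the item appears unlooped with a bare or quoted value, a regular expression stands in for a real CIF parser, values are compared exactly, and the names read_audit_schema, require_schema and IncompatibleSchema are hypothetical. The fallback to Base mirrors the `_enumeration.default Base` line in the proposed list.

```python
import re

# Default from the proposed enumeration (`_enumeration.default Base`).
DEFAULT_SCHEMA = "Base"

# Assumption: _audit.schema is given as a simple unlooped item whose value is
# bare, single-quoted, or double-quoted. A real CIF parser should replace this.
# CIF data names are case-insensitive, hence re.IGNORECASE.
_SCHEMA_RE = re.compile(
    r"""^\s*_audit\.schema\s+(?:'([^']*)'|"([^"]*)"|(\S+))""",
    re.MULTILINE | re.IGNORECASE,
)


class IncompatibleSchema(Exception):
    """Raised when a data block declares a schema the software cannot handle."""


def read_audit_schema(cif_text):
    """Return the declared _audit.schema value, or the default if absent."""
    match = _SCHEMA_RE.search(cif_text)
    if match is None:
        return DEFAULT_SCHEMA
    # Exactly one of the three alternatives captured the value.
    return next(group for group in match.groups() if group is not None)


def require_schema(cif_text, supported):
    """Return the schema if it is in `supported`, otherwise raise."""
    schema = read_audit_schema(cif_text)
    if schema not in supported:
        raise IncompatibleSchema(
            f"_audit.schema is {schema!r}; supported: {sorted(supported)}"
        )
    return schema
```

A caller that only understands unlooped core CIF could call require_schema(text, {"Base"}) before reading any category and turn IncompatibleSchema into a clean exit, so that a looped exptl_crystal category is never silently misread as single-valued.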
- Follow-Ups:
- Re: Updating list of _audit.schema (James Hester)
- Re: Updating list of _audit.schema (Bollinger, John C)
- References:
- Updating list of _audit.schema (James Hester)
- Re: Updating list of _audit.schema (Herbert J. Bernstein)
- Re: Updating list of _audit.schema (James Hester)
- Re: Updating list of _audit.schema (Herbert J. Bernstein)
- Re: Updating list of _audit.schema (James Hester)
- Re: Updating list of _audit.schema (Herbert J. Bernstein)
- Re: Updating list of _audit.schema (James Hester)
- Re: Updating list of _audit.schema (Herbert J. Bernstein)