Re: [ddlm-group] Objectives of CIF2 syntax discussion. .. .. .
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Objectives of CIF2 syntax discussion. .. .. .
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Thu, 20 Jan 2011 15:12:12 -0500 (EST)
- In-Reply-To: <4D388C3D.3040707@mcmaster.ca>
- References: <AANLkTikZoEF_D+5-3+Eg4pbCx0cAu+SJvR-a_XkC3zK2@mail.gmail.com><alpine.BSF.2.00.1101190833560.91751@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166D7D1ECE@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1101191042290.42382@epsilon.pair.com><4D371BE7.3050501@mcmaster.ca><alpine.BSF.2.00.1101191234221.42382@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166D7D1ED0@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1101191632410.65107@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166D7D1ED1@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1101191855500.30768@epsilon.pair.com><AANLkTi=xn2ntdNTvdTBKQQTsJhCQFbKcxceJ1C_u1oOf@mail.gmail.com><alpine.BSF.2.00.1101200440460.66943@epsilon.pair.com><alpine.BSF.2.00.1101200604480.4054@epsilon.pair.com><4D382A29.20809@rcsb.rutgers.edu> <4D388C3D.3040707@mcmaster.ca>
Dear David,

If we give it a try, we might succeed. If we don't even try, we
definitely won't succeed.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Thu, 20 Jan 2011, David Brown wrote:

> Dear Colleagues,
>
> It is unlikely that we will be able to abandon our DDL1 and DDL2
> dictionaries as lightly as Herbert suggests, though the goal is a noble
> one. We must remember that it is not just the tags that differ
> between the dictionaries but also the structure. DDLm is more
> structured than DDL2, which in turn is more structured than DDL1. Many
> CIFs based on DDL1 bent the rules while we were learning how best to
> structure CIF, and it is possible that some problems will occur in
> reading early CIFs because of this. We might also find cases where a CIF
> includes a loop that is not allowed in the DDLm dictionaries, but we will
> only discover this by experiment, which requires working software.
> Similarly, any datafile written using a DDLm dictionary will be able to
> produce a CIF filled with DDL1 dictionary datanames, but the structure
> will still correspond to DDLm. In most cases this is unlikely to be a
> problem, but we will only find out when we have working software.
>
> The aliases should contain the following information: the tag, the
> dictionary in which it appears, the version of this dictionary, the DDL
> in which the dictionary is written (a given dictionary may be written
> using different DDLs; for example, the symmetry dictionary was written in
> DDL2 and parts converted to DDL1), a flag to indicate whether the
> dataname is deprecated (needed for writing files) and a pointer to where
> the named dictionary can be found.
> This may be a public archive or a local file that
> in turn points either to a local source or the public archive, depending
> on the local institution. There may be rare occasions when someone may
> want to write a program to produce a CIF in an earlier version that is
> compatible with software that is unaware of the later datanames.
>
> The goal, as I say, is noble and is worth shooting for. Whether we reach
> the goal can only be determined when we have working software, but we
> should design the system on the assumption that it will work.
>
> David
>
> John Westbrook wrote:
>
> Herbert and David,
>
> Could I ask for some clarification on the requirements for the aliasing
> mechanism? In particular, is this intended to provide more than naming
> correspondence between the current dictionary and some prior dictionary?
> In DDL2 we have used ITEM_ALIASES, as in the example below, to provide
> name correspondences between our dictionary, the CIF core dictionary, and
> other recognized variant dictionaries. The image dictionary has done this
> similarly, I believe.
>
> I am confused by the last set of messages about how this will be used
> backwardly for validation. The semantics of new dictionaries may
> enforce a new, potentially stricter set of rules that are not necessarily
> backwardly compatible. I am wondering what is expected here.
>
> John
>
> save__atom_site.Cartn_z
>     _item_description.description
> ;              The z atom-site coordinate in angstroms specified according to
>                a set of orthogonal Cartesian axes related to the cell axes as
>                specified by the description given in
>                _atom_sites.Cartn_transform_axes.
> ;
>     _item.name                      '_atom_site.Cartn_z'
>     _item.category_id                 atom_site
>     _item.mandatory_code              no
>     _item_aliases.alias_name        '_atom_site_Cartn_z'
>     _item_aliases.dictionary          cif_core.dic
>     _item_aliases.version             2.0.1
>     loop_
>     _item_dependent.dependent_name
>                                     '_atom_site.Cartn_x'
>                                     '_atom_site.Cartn_y'
>     _item_related.related_name      '_atom_site.Cartn_z_esd'
>     _item_related.function_code       associated_esd
>     _item_sub_category.id             cartesian_coordinate
>     _item_type.code                   float
>     _item_type_conditions.code        esd
>     _item_units.code                  angstroms
>     save_
>
> On 1/20/11 6:10 AM, Herbert J. Bernstein wrote:
>
> Dear Colleagues,
>
> There is an important part of James' suggestions that,
> if Brian is willing, I think it would be a good idea
> to add to the _alias.tag_style proposal, and that is
> a central registry of styles to facilitate dictionary
> merging. The ground rules would be:
>
> COMCIFS approval for any style, such as DDL1, DDL2,
> DDLm, etc., unless prefixed by a prefix from Brian's
> prefix registry, e.g. pdbx_. The special prefix
> local_ could be used for styles for use purely
> locally, i.e. for private dictionaries for which
> collisions on merging are not a concern.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Thu, 20 Jan 2011, Herbert J. Bernstein wrote:
>
> Dear Colleagues,
>
> If a DDLm dictionary is to be a fully functional replacement
> for, say, a DDL1 dictionary, a dictionary against which one
> can validate the use of purely DDL1 tags, we need a way to
> not only specify the desired DDL1 tag as an alias to the
> DDLm tag used in the dictionary, but also to specify that
> we do _not_ want to accept the DDLm tag used as the save
> frame name as the valid name.
> As David has noted, in
> order not to still be maintaining both a DDL1 and a
> DDLm dictionary, we want this information _in_ the DDLm
> dictionary, so simply aliasing back to some other
> DDL1 dictionary as a way to say "use that dictionary
> URI as the style indicator" is suboptimal. Worse, it is a
> source of future errors and confusion in that it is
> defining properties of the tag that may end up disagreeing
> with the properties we actually wish to have and that we
> defined in the DDLm dictionary.
>
> OK, so far, so good: all we need then is John B.'s tag-by-tag
> style preference flag to say, for this dictionary we want to
> be DDL1'ish.
>
> Ah, but now we say, we are in the situation of maintaining
> the core (David's problem), in which we have to maintain
> a dictionary for validation against both DDL1 and DDL2
> tag names. Now there are times when we wish the DDL1
> alias to be the preferred alias and for both the DDL2
> and DDLm tags to fail a validation check, and other times when
> we wish the DDL2 alias to be the preferred alias and
> for both the DDL1 and DDLm tags to fail a validation check.
> Now it becomes simpler to just have a common style key,
> such as "DDL1" or "DDL2", and to select on that key just the way we
> do for alternate conformers.
>
> OK, that was not so bad, but now we are at, say, the PDB
> and in addition to having DDL1 and DDL2 style tags from
> the core, we also have prefixed tags (pdbx) that should
> eventually get promoted to be prefix-free. Now we can
> use the styles to validate for strict use of
> the prefixes when we are producing output that we want
> to be certain actually does use the prefixes, or
> relax the validation to allow both the prefixed and promoted
> tags, or go strict again on the far side to be sure we
> are only producing promoted tags.
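
The style-keyed validation described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `ALIASES` table, the functions `accepted_names` and `validate`, and the choice of `_atom_site.Cartn_z` as a DDLm definition name are all hypothetical and are not part of DDLm or of any existing CIF library. The tag spellings are taken from examples quoted elsewhere in this thread.

```python
# Hypothetical toy dictionary: DDLm definition name -> {style: alias}.
ALIASES = {
    "_diffrn_standards_decay_percent": {
        "DDL1": "_diffrn_standards_decay_%",
        "DDL2": "_diffrn_standards.decay_%",
    },
    # Assumed here: the DDLm name already follows DDL2 conventions,
    # so only a DDL1-style alias is listed.
    "_atom_site.Cartn_z": {"DDL1": "_atom_site_Cartn_z"},
}


def accepted_names(styles):
    """Return the set of tags accepted for the given styles.

    An empty `styles` means plain DDLm validation. For each requested
    style, a definition contributes its alias for that style, or its
    DDLm name when no alias is listed for that style.
    """
    if not styles:
        return set(ALIASES)
    names = set()
    for ddlm_name, by_style in ALIASES.items():
        for style in styles:
            names.add(by_style.get(style, ddlm_name))
    return names


def validate(tags, styles=()):
    """Return the tags that fail validation (data names compared case-insensitively)."""
    ok = {name.lower() for name in accepted_names(styles)}
    return [tag for tag in tags if tag.lower() not in ok]


# Strict DDL1 validation: the DDLm spelling is rejected.
print(validate(["_diffrn_standards_decay_percent"], ["DDL1"]))
# Relaxed validation: both DDL1 and DDL2 spellings are accepted.
print(validate(["_diffrn_standards_decay_%", "_diffrn_standards.decay_%"],
               ["DDL1", "DDL2"]))
```

Passing one style gives the strict check, and passing several styles gives the relaxed check; the same mechanism would presumably cover prefixed versus promoted tags if a prefix were registered as a style.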
>
> Note that none of these style-based input validation choices
> is based on the choice of dictionary. It is one dictionary,
> so it does not really help to be maintaining the styles
> dictionary by dictionary. The grain of identification is
> too coarse, and involves multiple maintenance issues when
> in reality only one nice new DDLm dictionary needs to be
> maintained.
>
> On the output side, essentially the same issues arise,
> though there are fewer users. As I said, it is a harmless
> addition to the DDLm spec for those who do not wish to
> be aware of it, and for those of us for whom it is
> useful, it really is useful.
>
> The fundamental disagreement is on whether we will have
> to have a DDL1 dictionary, a DDL2 dictionary, a DDLm
> dictionary, a prefix dictionary, etc., and plant them
> on assorted web sites, or just one DDLm dictionary that
> handles everything and can be local or remote or in
> local and remote pieces without changing the behavior
> of the validation or of the output.
>
> I hope that those who are uncomfortable with this change
> will reconsider and support it. Thanks to David's clear
> thinking it is a clean, simple and useful idea, much
> better than my original import suggestion.
>
> Please support it.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Thu, 20 Jan 2011, James Hester wrote:
>
> I'm trying to get a grip on what problem the tag_style proposal
> solves. I'll just emphasise at the outset, in case there are any
> misconceptions, that it is incorrect to suppose that the dREL method
> knows or needs to know anything about the particular syntax in which
> an input or output value is expressed; dREL is concerned purely with
> describing relationships.
>
> Here are the two scenarios that I think are being discussed under the
> rubric of DDLm compatibility with CIF1:
>
> Scenario 1: given a DDLm dictionary, a program wishes to generate and
> (validate/insert) the value for some given CIF1 dataname in a CIF1
> datafile, using other CIF1 tags found in that datafile. We are all
> agreed (I think) that locating the relevant DDLm dictionary entries
> for a CIF1 dataname is a simple and well-defined task. The formatting
> of the eventual output value of the DDLm method is also not in the
> purview of the dictionary, but rather of the application that is using
> the dictionary. The particular CIF1 tag to put in the datafile is
> also not an issue, as that was given at the beginning. So the
> tag_style proposal is not relevant here.
>
> Scenario 2: given a CIF2 datafile, a DDLm application wishes to
> produce an equivalent CIF1 datafile. For many of the CIF2 datanames
> found in the CIF2 datafile, there are multiple possible datanames
> listed as aliases. How is the application to ensure that it writes a
> set of datanames from DDL1 dictionaries only or DDL2 dictionaries
> only? The simple solution alluded to by John B would be to do as
> follows: for each dictionary URI mentioned in the alias list, use the
> IUCr CIF dictionary register (and/or other canonical sources) to
> determine the DDL version of that dictionary. DDL conformance is a
> standard entry in the dictionary register. The latest dictionary
> version as given in the dictionary register could be selected where
> multiple versions are presented (URL for the register is
> ftp://ftp.iucr.org/pub/cifdics/cifdic.register).
>
> Of course, any program wanting to do such conversions efficiently
> would pregenerate a table mapping dictionaries to DDL versions once and
> refer to that. I therefore see no use, either in terms of efficiency or
> new functionality, for the tag_style attribute.
>
> Please advise if I have misunderstood the problem.
>
> James.
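
The register-based selection proposed in Scenario 2 above might look something like the following sketch. It makes several assumptions: the register is represented as already-parsed `(dictionary URI, DDL conformance)` pairs rather than read from the real register file (whose format is not shown in this thread), the `urn:example:...` identifiers are made up, and `build_ddl_table` and `pick_alias` are hypothetical names. Version selection is left out.

```python
# Hypothetical, already-parsed register rows: (dictionary URI, DDL conformance).
REGISTER_ROWS = [
    ("urn:example:core-ddl1", "DDL1"),
    ("urn:example:mm-ddl2", "DDL2"),
]


def build_ddl_table(register_rows):
    """Build the dictionary -> DDL version table once, up front."""
    return {uri: ddl for uri, ddl in register_rows}


def pick_alias(aliases, ddl_table, wanted_ddl):
    """Return the first alias whose dictionary conforms to `wanted_ddl`.

    `aliases` is a list of (dataname, dictionary URI) pairs taken from
    one DDLm definition. Returns None when no alias matches.
    """
    for dataname, uri in aliases:
        if ddl_table.get(uri) == wanted_ddl:
            return dataname
    return None


ddl_table = build_ddl_table(REGISTER_ROWS)
aliases = [
    ("_diffrn_standards_decay_%", "urn:example:core-ddl1"),
    ("_diffrn_standards.decay_%", "urn:example:mm-ddl2"),
]
print(pick_alias(aliases, ddl_table, "DDL1"))
print(pick_alias(aliases, ddl_table, "DDL2"))
```

In this reading, the grouping key is the dictionary URI itself, so the lookup table does the work that a per-alias style string would otherwise do.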
> On Thu, Jan 20, 2011 at 11:20 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>
> No, a tag style is simply supposed to identify a grouping of alias
> tag choices that belong together, so you can decide to put out
> those particular versions of tags. It is just a text string,
> just like an alternate conformer identifier.
>
> The same tag name could be marked with as many tag styles as
> you choose. It is just text. But you could not give multiple
> aliases for the same DDLm tag for the same tag style when allowing
> DDLm missing value generation, or you would not know which version to
> put out, and for validation, there is no reason not to use different
> styles for the different alternatives.
>
> The way I will write the extraction algorithm, if you choose
> a tag style, you will get the DDLm name for the tags that don't
> have an alias for the chosen style, but the tag alias given for the
> specified style if there is one. That way, for a dictionary that is
> intended to support DDL1, DDL2 and DDLm and for which the DDLm
> tags happen to be primarily consistent with DDL2 conventions,
> the tags that conform to DDL2 conventions will
> not need a DDL2 style alias, just a DDL1 style alias. You will
> only need both a DDL1 style alias and a DDL2 style alias for
> a tag for which the DDLm tag is different from both, e.g.
> for _diffrn_standards_decay_% (DDL1), _diffrn_standards.decay_%
> (DDL2) and _diffrn_standards_decay_percent (DDLm). When you
> want DDLm output and validation, you don't specify a style at all.
>
> This will be very nice to allow an automatic cleanup for dictionaries
> using a prefix, say pdbx, for tags that later get promoted
> to not need a prefix.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Wed, 19 Jan 2011, Bollinger, John C wrote:
>
> On Wednesday, January 19, 2011 3:47 PM, Herbert J. Bernstein wrote:
>
> > The definition_id most certainly does not exhibit the tag
> > style. For example, there is no way to distinguish DDLm
> > tag style from DDL2 or DDL2 tag style from context. That
> > is intentionally inherent in the design of DDLm.
>
> Then I'm afraid I don't quite comprehend the meaning of "tag style". I
> would like to, so that I can form a well-founded opinion about it.
>
> As I thought I had understood the idea, the tag style is proposed to
> identify the set of DDL conventions with which the given alias complies.
> If that were indeed what it was intended to mean, however, then (1) as
> you observe, some names would comply with more than one set of
> conventions, but also (2) a set of candidate tag styles, at least, could
> be computed for any alias name.
>
> What would be the significance of marking an alias that conforms with
> both DDL2 and DDLm conventions with tag style DDL2?
>
> Might it ever be needful or useful to mark the same alias with more than
> one tag style?
>
> > As for defining a hypothetical URI, that can break,
> > or at least time out, programs trying to get additional
> > information about an aliased tag from that URI. URIs
> > should be for things that really exist on the web,
> > not a substitute for a tag that really defines something
> > different, in this case the style of tags.
>
> I don't think the issue is nearly so clear cut. I would hold, for
> example, that the primary purpose of a URI is to *identify*
> a resource. That's what the "I" stands for, as I'm sure you're aware.
> RFC 3986 (Uniform Resource Identifier (URI): General
> Syntax) explicitly provides that a URI may identify an abstract
> resource. RFC 2396 (now obsoleted by 3986) says the same.
> Although many URIs fulfill their purpose by serving as resolvable web
> addresses, some, even among those formatted as URLs, do
> not. Examples of the latter abound in various XML communities.
>
> Personally, however, I think a bit more like you do: a URL ought to
> refer to a retrievable resource on the web. For an
> abstract or virtual resource, therefore, I prefer to use a URN. For
> something like your virtual DDL1 imgCIF dictionary, I
> might choose something like urn:x-imgCIF:DDL1. If a URN were used, then
> programs assuming a resolvable URL might still break,
> but only if they were poorly crafted indeed would they hang pending a
> time out. The whole issue could largely be mooted by
> clarifying the purpose and intended usage of _alias.dictionary_uri in
> its definition. That need not prevent programs from
> attempting to resolve dictionary URIs, but if it specified that
> dictionary URIs might be permanently unresolvable then
> programmers would know to prepare for that possibility.
>
> > We already do something very similar to this with
> > alternate conformers and with NMR model numbers. It
> > really is a simple concept for organizing information
> > that belongs in groups, in this case the group of
> > DDL1 or DDL2 or DDLm or ... style tags.
>
> I think that makes it a bit clearer to me what you want to do, but I'm
> still interested in the answers to my questions above.
> I'm a bit uncomfortable with defining generic groups of aliases with
> per-dictionary semantics, if that's indeed what you're
> proposing. For one thing, it does not play well with dictionary
> merging. For another, the meaning of the groupings is nowhere
> defined, at least not without adding at least one more data name to
> DDLm for that purpose.
>
> On the other hand, data names have at least one natural grouping: the
> dictionaries in which they are defined. This grouping is
> already modeled in DDLm, and as far as I can tell, it is conceptually a
> perfect fit for what you want to do.
>
> That doesn't necessarily mean that there is no use for a more general
> grouping mechanism. I am curious indeed whether there
> are use cases for grouping data names that do not align well with
> dictionaries or dictionary-defined attributes. Can anyone
> suggest some?
>
> > It solves
> > a very real problem for me with imgCIF. It does
> > no harm to anybody else. If nobody uses it in
> > another dictionary, it still would have been a useful
> > addition to DDLm.
>
> I very much want you to have a solution to your problem, and I have
> suggested one that still seems absolutely natural to me.
> It may be that there are better alternatives, and perhaps even that tag
> style would be one such. Of the latter, however, I am
> not yet persuaded.
>
> Perhaps "harm" is too charged a word, but adding an additional
> attribute to DDLm certainly does cost everyone else. Every DDLm
> application must support all the DDLm attributes, so every additional
> attribute places a development and maintenance burden on
> multiple developers. That incrementally slows software release cycles
> and introduces additional space for bugs and
> incompatibilities to hide. It's a small cost for most people, but
> everyone pays it. The proposed tag style is no different in
> that regard from any other DDLm attribute, of course, but that doesn't
> mean that its cost should be ignored.
>
> As for whether it would be a useful addition to DDLm, that is exactly
> what I am trying to decide. Potential use cases such as
> I solicited above would help me make that decision.
>
> > In the end, I suspect that both core and mmCIF DDLm
> > dictionaries will be built this way, because it
> > makes it simpler and clearer and allows multi-purpose
> > dictionaries to be self-contained and avoid the
> > maintenance headache David spotted.
>
> If by "multi-purpose dictionaries" you mean defining multiple virtual
> dictionaries via a single DDLm dictionary, such as you
> plan, then I still see the dictionary_uri as the natural way to use
> aliases for that purpose. If there is a broader concept
> here then please help me see it.
>
> Regards,
>
> John
>
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
>
> Email Disclaimer: www.stjude.org/emaildisclaimer
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
> nlikely to be a problem. The file will be more structured than a file
> that conforms to a DDL1 compliant CIF, but there may be some legacy
> software that will not be able to read it. Again, we can only gauge the
> extent of the problem by experiment. No amount of hot air emails will
> solve that problem. If all goes well we may be able to abandon the older
> dictionaries in favour of DDLm. Let's hope.
>
> The information that should be available in an alias is the tag, the
> dictionary in which the tag first appears (including the version number),
> the DDL that the dictionary conforms to, a flag to indicate if a
> particular name has been deprecated, and a pointer to where the
> dictionary can be found. It is better to keep these pieces of information
> separate (as is done in the mmCIF dictionary), since using a single item
> to convey two distinct types of information is inelegant and can lead to
> problems if there is conflict between the two meanings. The pointer might
> be to a public archive, but it may make more sense for it to point to a
> local source that in turn can point either to a local source or an
> archive. The deprecation flag is not needed on reading, but is needed on
> writing to ensure that only the current tag is used. Alternatively, if
> the output file is to conform to a particular version of the CIF
> dictionary (so as to use an early piece of software that is only aware of
> the deprecated name), the file can be written in this version. It is
> unlikely that general software would include this as an option, but the
> dictionary should make it possible.
>
> I agree with Herbert that the possibility of using the DDLm dictionaries
> to manage the whole CIF archive is worth pursuing, even if it eventually
> proves not to be possible.
>
> David
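
The alias information listed in the message above, kept as separate fields, could be modelled along these lines. This is a minimal sketch rather than a DDLm definition: the class `AliasRecord`, the function `tag_for_writing`, the field names, and the example tags, dictionary name and location are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AliasRecord:
    tag: str          # the alias dataname
    dictionary: str   # dictionary in which the tag appears
    version: str      # version of that dictionary
    ddl: str          # DDL in which the dictionary is written
    deprecated: bool  # not needed on reading, needed on writing
    location: str     # pointer to where the dictionary can be found


def tag_for_writing(records, dictionary, version=None) -> Optional[str]:
    """Pick the tag to write for one definition.

    By default only a non-deprecated tag from `dictionary` is returned.
    If `version` is given, the tag recorded for exactly that dictionary
    version is returned even if it has since been deprecated.
    """
    for rec in records:
        if rec.dictionary != dictionary:
            continue
        if version is not None:
            if rec.version == version:
                return rec.tag
        elif not rec.deprecated:
            return rec.tag
    return None


# Made-up records for a single definition.
records = [
    AliasRecord("_example_old_name", "example.dic", "1.0", "DDL1", True,
                "file:///local/dicts/example.dic"),
    AliasRecord("_example_new_name", "example.dic", "2.0", "DDL1", False,
                "file:///local/dicts/example.dic"),
]
print(tag_for_writing(records, "example.dic"))
print(tag_for_writing(records, "example.dic", version="1.0"))
```

Keeping the deprecation flag and the DDL as their own fields means neither has to be inferred from the dictionary pointer.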
_______________________________________________ ddlm-group mailing list ddlm-group@iucr.org http://scripts.iucr.org/mailman/listinfo/ddlm-group