On Aug 11, 10:52am, Peter Keller wrote:
> Subject: Re: _item.mandatory_code can be undefined! (plus other developmen
>
> > There is some history to this issue
>
> Yes, I thought that there might be....
>
> > which is related to providing compliance to earlier dictionaries and CIFs. It was agreed that, in order to provide an easier integration with older dictionaries, there be a placeholder definition for every item in the mmCIF dictionary. This really results in a large number of essentially redundant definitions for data items that are children of other items. In these cases only the definition of the data item, and perhaps the item name, have been specified in the mmCIF dictionary. Placing a default value on the mandatory code would result in conflicting definitions for this attribute, as in almost all cases these items are part of the key for the category in which they reside.
>
> Yes, I see - I hadn't quite realised the implications of this. I suppose that ideally there ought to be some kind of inheritance mechanism.....

Yep.. I have a proposal on this issue, which follows...

> > Rather than load up all of the definitions with an additional mandatory code attribute, we have chosen to make this specification optional.
>
> OK, so as it stands, there are in effect four possible values of _item.mandatory_code: yes, no, implicit and undefined, with undefined being the effective default. I have no fundamental problem with this - just that there is no way of knowing, from publicly available information, what an application is supposed to do with an item for which _item.mandatory_code is undefined. At the very least, this issue should be discussed in the documentation, and some sort of convention for dictionary developers proposed. This kind of thing makes it very difficult for people (like myself) who were not part of the original CIF/DDL effort to write effective and robust applications and libraries.
> In this particular case, if this isn't clarified, you risk running into what the compiler writers call 'implementation-dependent behaviour'. It is very dangerous to rely on the exercise of common sense by different people, who are not in regular contact, to produce consistent results! You have been warned!

Your point is well taken; please see my later comments on this.

> > However, you will note in the mmCIF dictionary that it is provided for all data items except in the case of redundant children.
>
> Actually, if you look, you will see that it is provided for all data items _including_ the case of redundant children, as well as in their parent item definitions. This is causing me problems, because there is nothing in the DDL documentation about how multiply defined properties of dictionary items are to be treated, and no mechanism (at least in publicly available documentation) which ensures that conflicts don't arise. Ah, well, I suppose that I shall just have to use my common sense....:-).

It is good that you bring up the point of the "update policy". This was in an early version of the DDL and was removed because it had too much of a "database" flavor. When I removed it, I was of the mindset that we would avoid such problems at the dictionary level because we would not typically encounter redundant definitions; this was in fact something we set out to avoid. Since that time the redundancy has crept back into the dictionary, and we are confronted with the problem once again. This actually relates to other criticisms that you raise regarding the enforcement of a mandatory policy for data type and mandatory code. One of the major reasons for making these optional was to avoid the possibility of an inconsistent update in redundant data definitions.

From my last look at the mmCIF dictionary, it appears that most of these instances now include a complete respecification of at least the item category (which includes the mandatory code).
I would prefer, if this were acceptable to everyone, that we simply not include the redefinition of the item category and instead provide only an item description, which in some cases may need to be refined relative to the parent description.

Back to updating... From the software perspective, we are treating these situations as overwrites. When we encounter a duplicate key in a category, we assume it updates the row associated with that key. Consequently, any respecification of a row must be complete. We further process the updates in the order in which we parse the definitions in the dictionary. This is problematic, as the result should really be order independent, but there should really not be any redundancy either. We have experimented with partial row updates, and with simply discarding duplicate rows, but there are problems with both of these approaches. I think that the full update approach is the most reasonable. From the standpoint of checking the dictionary, we have been printing diagnostics whenever a row update produces a conflict in a data item value that does not involve a NULL value. Now there is a question about where this sort of information/rule should be encoded. If there are no objections, I will add this to the description of the category key for the moment.

An issue related to this is the propagation of an update through a collection of categories related by parent/child relationships. For instance, an update of _entity.id in the entity category could have profound consequences down the structural hierarchy. There are well-known ways of describing the alternative actions, but this sort of information has been stripped out of DDL 2.1. As Peter points out, perhaps it should at least be dealt with in some associated documentation.

> The section on the ITEM_LINKED category points out (quite rightly) the difficulties which can arise from cyclical linkages, but says nothing about conflicts such as:
>
> save__cat1.name
> ....
> _item.mandatory_code yes
>
> save_
>
> save__cat2.name
>
> loop_
> _item_linked.parent_name _item_linked.child_name
> '_cat2.name' '_cat1.name'
> ...
>
> loop_
> _item.name _item.mandatory_code
> '_cat2.name' yes
> '_cat1.name' no
> ...
>
> save_
>
> Can I rely on SIFLIB (or other tools being used by dictionary developers) ensuring that this kind of thing doesn't happen, or do I have to check for this in my own applications? I think that those of us who are dictionary users (i.e. CIF application developers), as opposed to dictionary developers, should be told.

In this case we would detect an update, and a conflict in the value of mandatory_code. Simply relying on an overwriting update in this case is problematic.

> Back to the common sense point. Those of you who were at Montreal may remember that I said that there must be a forum for developers to discuss these, and other, issues, and to check their interpretations of the DDL and dictionaries. An example - is there someone out there who can clarify this point:
>
> _category.mandatory_code for the ITEM_TYPE category is no, i.e. it is not compulsory to define the type of a data item in a dictionary. So, if an application uses my library to request an item from a CIF file, and _item_type.code is undefined for that item in the dictionary, what is my library supposed to do? Refuse to process the item, and stop with an error? Assume a type of text, and let the application program sort out

As I explained before, we are in a catch-22 situation here. If we make this mandatory, then it must be respecified for each redundant item in the dictionary. We very much wish to minimize this sort of redundancy as much as possible. I appreciate the problems that this causes with respect to data type... What if we expand the enumeration for mandatory_code to include "mandatory/inherit", which formally specifies that an item is required and can inherit the property of a parent?
If there is no parent, or the value is not specified, then there is clearly an error condition. This would involve a rather small change to the DDL that would not require any changes in the mmCIF dictionary.

Regards..
John
--
****************************************************************************
* John Westbrook                        Ph: (908) 445-5156                 *
* Department of Chemistry               Fax: (908) 445-5958                *
* Rutgers University                                                       *
* PO Box 939                            e-mail: jwest@rutchem.rutgers.edu  *
* Piscataway, NJ 08855-0939                                                *
****************************************************************************
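For illustration, here is a minimal Python sketch of the two mechanisms discussed above. The first is the full-row update policy: a duplicate key replaces the whole row, rows are applied in parse order, and a diagnostic is recorded when two non-NULL values conflict. The second is the proposed "mandatory/inherit" value, resolved by following _item_linked parent links. All names here (Category, resolve_mandatory, INHERIT) are hypothetical and are not part of DDL 2.1, SIFLIB, or any other library. CIF null values are assumed to be mapped to None. Reading "mandatory/inherit" as "take the parent's mandatory_code, and fail if there is none" is one interpretation of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical spelling of the proposed enumeration value.
INHERIT = "mandatory/inherit"


@dataclass
class Category:
    """Rows of one dictionary category, keyed on the category key.

    CIF null values ('.' and '?') are assumed to have been mapped to None.
    """
    key: str
    rows: Dict[str, dict] = field(default_factory=dict)
    diagnostics: List[str] = field(default_factory=list)

    def update(self, row: dict) -> None:
        """Apply one row in parse order; a duplicate key replaces the whole row."""
        k = row[self.key]
        old = self.rows.get(k)
        if old is not None:
            for attr in sorted(set(old) | set(row)):
                a, b = old.get(attr), row.get(attr)
                # Report a conflict only when neither value is null.
                if a is not None and b is not None and a != b:
                    self.diagnostics.append(f"{k}: {attr} {a!r} -> {b!r}")
        # Full update: attributes missing from the new row are lost, so any
        # respecification must be complete.
        self.rows[k] = dict(row)


def resolve_mandatory(name: str, items: Category,
                      parent_of: Dict[str, str]) -> Optional[str]:
    """Return the effective _item.mandatory_code for an item.

    INHERIT is followed up the _item_linked parent chain. A missing parent,
    a parent that does not specify the code, or a cycle raises ValueError.
    An item with no code at all returns None (still 'undefined').
    """
    seen = set()
    code = items.rows.get(name, {}).get("mandatory_code")
    while code == INHERIT:
        if name in seen:
            raise ValueError(f"cyclic _item_linked chain at {name}")
        seen.add(name)
        parent = parent_of.get(name)
        if parent is None:
            raise ValueError(f"{name}: {INHERIT} but no parent")
        name = parent
        code = items.rows.get(name, {}).get("mandatory_code")
        if code is None:
            raise ValueError(f"{name}: parent does not specify mandatory_code")
    return code


# Usage, mirroring the _cat1/_cat2 example above (_cat3.name is hypothetical).
items = Category(key="name")
items.update({"name": "_cat2.name", "mandatory_code": "yes"})
items.update({"name": "_cat1.name", "mandatory_code": "yes"})
items.update({"name": "_cat1.name", "mandatory_code": "no"})  # conflicting redefinition
items.update({"name": "_cat3.name", "mandatory_code": INHERIT})
parent_of = {"_cat1.name": "_cat2.name", "_cat3.name": "_cat2.name"}

print(items.diagnostics)  # ["_cat1.name: mandatory_code 'yes' -> 'no'"]
print(resolve_mandatory("_cat3.name", items, parent_of))  # yes
```

Modelled this way, an absent mandatory_code (None, i.e. "undefined") stays distinct from a failed inheritance (an error). An application can therefore decide separately what to do with each case.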