(66) Further thoughts on pdCIF categories
- To: COMCIFS@iucr.ac.uk
- Subject: (66) Further thoughts on pdCIF categories
- From: bm
- Date: Mon, 19 May 1997 17:53:58 +0100
Dear Colleagues

D61.1. pdCIF categories
-----------------------

(The unattributed comments in this first part are Brian Toby's reaction to David Brown's critique.)

D> As far as I can make out, Brian T does not believe in
D> categories, never has and sees no reason to change now.

I have no quibble with categories. They could serve a valuable purpose. I object to the restriction on mixing categories in a loop since it either makes categories useless, as Paula so correctly points out for pdCIF, or requires very complex dictionaries with lots of inter-loop pointers and professional computer programmers to create software.

D> different times. But what about the case where Brian and his
D> students work on the file on a number of different occasions and
D> then I subsequently make my own contribution? His flexible
D> structure does not allow for this possibility. Either all authors

Yes, I had thought about this. Then the only solution is to assign a date to every person in the loop with the implicit connection made that people who did work at the same time must have worked together. Not elegant, but functional. The alternative is to define a new set of loops to differentiate between collaboration and sequential processing. This will probably require multiple loops and pointers between loops. I would prefer that we wait until there is a demonstrated need for this level of structure.

D> DATA NAMES
D> I see no particular reason why all the datanames in the pd
D> dictionary need to start with _pd_. We have not adopted this
D> convention in any other dictionary and I do not see that it offers

I don't like the _pd_ prefix and if we are going to make a revision of the pdCIF names to match categories, I would like to discuss dropping the prefix. On the other hand, I would prefer to see the dictionary approved in its current form. Syd has convinced me to live with the prefix.

D> _PD_DATA
D> My vote is to divide this into two categories, _pd_meas and
D> _pd_proc.
D> There should be no need to have pointers between them
D> since there may not be a one-to-one correspondence between them.
D> _*_2theta should serve to connect the information in the two
D> categories. If it makes sense to list these two sets of numbers
D> together this can surely be done by the software. We should not
D> confuse the cif with the output produced by the software, just as
D> we should not necessarily confuse the structure of the cif with the
D> structure of the database into which the cif is to be copied (as
D> Brian points out). Cifs are for the transfer and archiving of
D> information, not for providing a convenient layout for research or
D> publication.

I would have no problem with this as long as it is considered "proper CIF" to create a file that looks like this:

    loop_
    _pd_meas_angle_2theta
    _pd_meas_counts_total
    _pd_proc_intensity_net
    _pd_calc_intensity_net
    10    131 101 100
    10.05 127  97 100

where it is not necessary to include _pd_proc_2theta_corrected if it is identical to _pd_meas_angle_2theta. Alas, _pd_meas and _pd_proc cannot appear in a single loop. Requiring the following structure only makes CIF less transparent, except perhaps to the database folks:

    loop_
    _pd_meas_angle_2theta
    _pd_meas_counts_total
    10    131
    10.05 127
    ...
    loop_
    _pd_proc_2theta_corrected
    _pd_proc_intensity_net
    _pd_calc_intensity_net
    10    101 100
    10.05  97 100
    ...

D> Why is _pd_proc_2theta_range_* needed in the _pd_data category
D> since it is not looped with the profile points? I am also puzzled
D> to know how _*_range_ fields are used to describe fixed-angle
D> profiles. Does one use _*_min or _*_max to give this angle or is
D> it necessary to set both equal to each other and set _*_inc to 0.0?
D> This seems rather complicated. Why not a _pd_proc_2theta_fixed
D> field as is done with _pd_meas?

It is OK with me to move _pd_proc_2theta_range_ to a different category, but I still do not see what the advantage is in placing _pd_proc_2theta_range_ in a different category from _pd_proc_2theta_corrected when they specify the same information. For stationary detectors one uses _pd_meas_2theta_fixed and there is no need for a _pd_proc_2theta value (perhaps a zero correction in _pd_calib_2theta_offset).

D> _pd_calib_std_external_id
D> Shouldn't this be called _pd_calib_std_ext_block_id since it
D> contains the name of a datablock, not the name of a link to another
D> part of the same datablock?

This is an excellent suggestion.

Brian T.

-----------------------------------

Herbert Bernstein has sent to me the following comments:

H> ... I agree very much
H> that the nub of the issue is whether categories are to be taken seriously
H> or not. I hope there will be an effort at promoting use of categories.
H> It is not just a database issue. It is also very much an issue of
H> creating a well-organized interchange format that can be used effectively
H> by both people and software. Consistency and good style reduce the chances
H> of error by either one. That is what I like about mmCIF and the new core.

I am becoming increasingly confused over where the present discussions are taking us (if anywhere). Surely the fundamental difference is over the degree to which the mmCIF and pdCIF data models are 'normalised', in the sense in which the term is used in relational databases to describe the homogeneity of entries in a table. The mmCIF (DDL2) model is well normalised: there is a table for apples, a table for pears and a table for oranges. The pdCIF model is less clean; it has a table for "fruit", in which properties of apples, oranges and pears may be deposited willy-nilly. But there is nothing in relational database theory to prescribe one view as fundamentally more "correct" than another.
The professional fruiterer may find the distinct apple/orange/pear tables essential for his business; the general stores manager may be able to make do with a "fruit" table and a scrap of paper and pencil tucked behind his ear. It seems to me that Brian T. is adopting the role of the man with the pencil; the mmCIF requirements are those of a multinational fruit wholesaler.

Nothing I've seen so far convinces me that it would be death to the CIF effort to proceed with a DDL1.4 pdCIF dictionary - essentially the one we now have - and migrate later to a DDL2 formulation if it's demonstrated to be necessary. The powder community may not be able to validate their pd data files against the dictionary using John W.'s software, but they will be able to write their own validator or use CIFtbx or Paul Edgington's new HICCUP software. mmCIF tools will not be able to validate a pdCIF data file against the pdCIF dictionary; but they can validate the embedded core data names (through aliases) against the DDL2 version of the core dictionary, and accept the _pd_* tokens as effectively local datanames.

My instinct is still to approve the dictionary now, so that people can begin archiving data sets today, send powder papers to Acta today. Recall that we have some 7000 data sets at the Acta offices in DDL0 format; ingenuity and hard work have ensured that they are accessible to the DDL2 software; and I am sure that a little more ingenuity could build a rules-driven normalisation converter from DDL1 format pdCIF data to DDL2 format if there were sufficient need to do so.
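
To make the idea of a rules-driven converter concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the CATEGORY table stands in for the dictionary's _category values, KEY_RULES is a hypothetical rule format for supplying a missing 2theta column, and split_loop and format_loop are invented helper names. It is not taken from any existing CIF tool and does not attempt real CIF parsing or value quoting.

```python
# Hypothetical category table, standing in for the dictionary's _category
# values. _pd_calc_intensity_net is placed in pd_proc to mirror the two-loop
# example earlier in this message; guessing from the name prefix would not.
CATEGORY = {
    "_pd_meas_angle_2theta": "pd_meas",
    "_pd_meas_counts_total": "pd_meas",
    "_pd_proc_2theta_corrected": "pd_proc",
    "_pd_proc_intensity_net": "pd_proc",
    "_pd_calc_intensity_net": "pd_proc",
}

# Hypothetical rule format: category -> (key item, item to copy it from)
# when the key item is absent from the mixed loop.
KEY_RULES = {"pd_proc": ("_pd_proc_2theta_corrected", "_pd_meas_angle_2theta")}


def split_loop(names, rows):
    """Split one mixed-category loop into one loop per category."""
    columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    loops = {}
    for name in names:
        # Category comes from the table lookup, never from the token itself.
        loops.setdefault(CATEGORY[name], {})[name] = columns[name]
    for category, (key, source) in KEY_RULES.items():
        if category in loops and key not in loops[category] and source in columns:
            # Put the copied key first so the loop leads with its 2theta column.
            loops[category] = {key: list(columns[source]), **loops[category]}
    return loops


def format_loop(loop):
    """Render one loop as CIF-style text (values are written unquoted)."""
    lines = ["loop_"] + list(loop)
    for row in zip(*loop.values()):
        lines.append(" ".join(row))
    return "\n".join(lines)


# The mixed loop from the example earlier in this message.
names = [
    "_pd_meas_angle_2theta",
    "_pd_meas_counts_total",
    "_pd_proc_intensity_net",
    "_pd_calc_intensity_net",
]
rows = [["10", "131", "101", "100"], ["10.05", "127", "97", "100"]]

for loop in split_loop(names, rows).values():
    print(format_loop(loop))
```

Run as written, this prints the two-loop layout shown earlier in this message, apart from the elided rows. Looking the category up in a table rather than reading it off the name is what lets _pd_calc_intensity_net land in the pd_proc loop.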

I would prefer to see us commit to this approach and deliver a well-documented DDL1.4 dictionary (by which I mean that the documentation will include specific directives such as "category assignment of datanames must be derived from the dictionary _category value, and not inferred from the structure of dataname tokens") by this summer's ACA and ECM meetings, rather than have to report that the powder project has gone back into limbo.

Regards
Brian