I want to discuss some ideas I have about the separation of the mmCIF data into a diffraction data block and a model data block. I know the committee will not want to hear such basic matters being questioned at this time. All I want to accomplish is to make the community aware of the problems.
As I understand the situation, the mmCIF committee and the PDB have decided to allow the diffraction data to be stored in one file and models to be stored in another. I think it is very reasonable to divide the data in this fashion because of the different natures of the two kinds of data. However, whenever you split data you have to examine each data group and decide which way it must go. I disagree with the details of this split as it appears to be implemented.
First, we must recognize that the mapping between these two data blocks is many to many. For a given diffraction data set there will be many models. For a given model there may be several data sets upon which it is based (e.g. an X-ray data block and a neutron data block, or X-ray and NMR data).
As I understand the current division of these files, the diffraction file contains the ID of the model data block but the model data block does not contain a pointer to the diffraction data block. It is difficult to place a table of models in the diffraction data block because this file could be constructed prior to the solution of the structure (and the structure might never be solved, leaving an empty list). In my work I generate many models which need to be passed between program packages and which would, in a perfect world, be in mmCIF. The table in the diffraction mmCIF file would have to be updated almost daily. I suggest that the list of models which depend on a diffraction data block be optional. In an archive where both model and diffraction data blocks are stored you could put in a table that is complete within that restricted universe of models. In the lab you would not maintain this table.
However, it would be easy to have the diffraction data blocks listed in the model data block because the programs need to know that information anyway. This field should be mandatory for any model based on diffraction data. However, because there may be multiple diffraction data blocks, the definition of this dependency must be in a loop construction.
Because many models will be based upon any particular diffraction file, the calculated F's cannot be stored with the observed F's. While I recognize some (but not much) utility in storing the calculated F's, if you are going to have them they must be in the model's data block: they are a property of the model alone. The many-to-many relationship between model and diffraction data cannot be represented when the Fc's are in the diffraction data file.
The second problem is which data groups go into each data block. Currently the data collection, data reduction, and agreement statistics are stored in the model file. These data belong in the diffraction file. They do not change when the model changes and it would be redundant to write the same values over and over again. You would also have to place it all in loops to cover each diffraction data set. With this information in the diffraction data block life becomes much simpler. You do not have to place your statistics inside loops, nor do you have the confusion of listing diffraction intensities from multiple crystals in a single loop.
For structures currently in the PDB without deposited structure factors one could construct small diffraction data blocks to contain the statistics but without structure factors. These would be like the current PDB files which contain no coordinates. However, they would contain the proper cross links (hyperlinks?) and data dependencies. Their presence would make clear the huge gaps in deposition of these data and might encourage some people to deposit older diffraction data sets.
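
To make the proposed linkage concrete, here is a rough sketch in Python rather than in mmCIF syntax. All class and field names are my own invention for illustration and do not correspond to actual mmCIF categories. The model block carries a mandatory list (the "loop") of the diffraction blocks it is based on, the calculated F's are stored with the model and keyed by diffraction block, and the list of models in the diffraction block is optional.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Hkl = Tuple[int, int, int]


@dataclass
class DiffractionBlock:
    block_id: str
    observed_f: Dict[Hkl, float]
    # Optional back-references to models; only an archive would maintain these.
    model_ids: List[str] = field(default_factory=list)


@dataclass
class ModelBlock:
    block_id: str
    # Mandatory "loop": one entry per diffraction block the model depends on.
    diffraction_ids: List[str]
    # Calculated F's belong to the model, one table per diffraction block.
    calculated_f: Dict[str, Dict[Hkl, float]] = field(default_factory=dict)


def models_for(diffraction_id: str, models: List[ModelBlock]) -> List[str]:
    """Recover the list of dependent models from the models' own pointers."""
    return [m.block_id for m in models if diffraction_id in m.diffraction_ids]


# Hypothetical example: one model refined against two data sets,
# and a second model refined against only one of them.
xray = DiffractionBlock("xray_1", {(1, 0, 0): 120.5})
neutron = DiffractionBlock("neutron_1", {(1, 0, 0): 45.2})
model_a = ModelBlock("model_a", ["xray_1", "neutron_1"])
model_b = ModelBlock("model_b", ["xray_1"])
dependents = models_for("xray_1", [model_a, model_b])
```

Because the pointers live in the model blocks, an archive can rebuild the table of dependent models whenever it likes, and the diffraction block never has to be rewritten when a new model appears.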
The final point I would like to make is the most outlandish, but it does come as a natural progression of these thoughts. The cell constants are not a property of the model and should not be stored in the model's data block. If you have a model which was refined against two diffraction patterns, in all likelihood you will have two different sets of cell constants. In the models already deposited the sets will be very similar, but there are cases where people have refined models with constrained NCS between nonisomorphic crystal forms. In such a case each diffraction pattern would have not only unrelated cell constants but a different space group. The cell constants and space group belong in the diffraction data block.
Placing the cell constants in the diffraction data block immediately solves another problem. What are the cell constants of an NMR structure? The concept does not apply. If you place the cell constants in the diffraction data block you can make their presence mandatory without affecting the validation of NMR or theoretical models. Currently the provision of the cell constants cannot be mandatory.
Related to the cell constants is the deorthogonalization matrix. In fact the deorthogonalization matrix is a composite of two things, the cell constants and the convention. The cell constants are a property of the diffraction data block (which indicates that there cannot be a single deorthogonalization matrix because there may be more than one crystal type). This implies that the deorthogonalization matrix should be in the diffraction data block. However, it is possible that differing conventions might be used in different models, implying that the orthogonalization convention should be in the model data block. Since mmCIF seems to want the matrix and not its convention, you must have a loop construction in the model data block which identifies each diffraction data block and the deorthogonalization convention used to move the model into that crystal's coordinate system.
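
The way the matrix depends on both the cell constants and the convention can be sketched as follows. This assumes one common convention (a along x, b in the xy plane, c* along z) purely for illustration; other conventions give other matrices for the same cell. The function names are mine and are not part of mmCIF.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix; lengths in Angstroms, angles in degrees.

    Assumed convention: a along x, b in the xy plane, c* along z.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]


def deorthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Cartesian -> fractional matrix: inverse of the upper-triangular matrix above."""
    m = orthogonalization_matrix(a, b, c, alpha, beta, gamma)
    m00, m01, m02 = m[0]
    m11, m12 = m[1][1], m[1][2]
    m22 = m[2][2]
    return [
        [1.0 / m00, -m01 / (m00 * m11), (m01 * m12 - m02 * m11) / (m00 * m11 * m22)],
        [0.0, 1.0 / m11, -m12 / (m11 * m22)],
        [0.0, 0.0, 1.0 / m22],
    ]


def apply(matrix, xyz):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(row[j] * xyz[j] for j in range(3)) for row in matrix]


# Two hypothetical crystals: the same model position maps to different
# fractional coordinates, so one matrix per diffraction data block is needed.
frac_1 = apply(deorthogonalization_matrix(10.0, 20.0, 30.0, 90.0, 90.0, 90.0), [5.0, 10.0, 15.0])
frac_2 = apply(deorthogonalization_matrix(12.0, 20.0, 30.0, 90.0, 100.0, 90.0), [5.0, 10.0, 15.0])
```

A model refined against both crystals would need a loop with one row per diffraction data block, each row naming the block and the convention (or the resulting matrix).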
It would be cleaner to simply list the convention and not the matrix, but I don't know of a good way to do this in general. Currently mmCIF has the cell constants, the convention, and the matrix listed (or listable). This information is redundant and must be kept self-consistent. Without a standard form to describe the convention I would not like to be assigned the job of writing the validation software.
If there is interest in this approach I could put more time into filling in the details.
Dale Tronrud