Discussion List Archives


Re: Items for the Agenda of the COMCIFS closed meeting

I support the breadth of the discussion suggested by Herbert. We
should always be examining new alternatives to the way that we are
currently handling and defining data, and these closed meetings are
the place to do this. All of us recognise the investment and inertia
involved in some of the current data applications and that changes
to these will need to be carefully planned and staged.  It has taken
us 15 years to get to this stable level of managing crystallographic
data and it may well take another 15 to get to the next!

John Westbrook and I intend to present our ideas on an integrated DDL
for crystallographic dictionaries based on StarDDL.  As far as I can
judge crystallography is currently a leader in the development and use
of data ontologies, and I think we should try to keep it that way.

Professor Sydney R. Hall
School Biomedical & Chemical Sciences
University of Western Australia
Crawley, 6009  AUSTRALIA.
Ph: +61 (8) 6488 2725
Fx: +61 (8) 6488 1118
"Data data everywhere but not a thought to think!" - Theodore Roszak

On 22/03/2005, at 5:18 AM, Herbert J. Bernstein wrote:

> I would like to suggest that the real question is not the future direction
> of CIF, but the future direction of information management in
> crystallography and allied sciences.  The issues of category key naming
> and the relationship between DDL1 and DDL2 or between parent and child
> data items are not central issues for our science.  The issue of how well
> our data can move into and among experiment control systems, databases
> and publications is.  The definition of CIF and the structure of its
> supporting software should be moving in a direction that supports
> efficient and reliable data management in crystallography.
>
> John and Syd made an important contribution to CIF when they made DDL2
> CIFs better able to carry the information needed to load SQL-based
> databases.  dREL will add more useful features.  We should be asking what
> additional functional capabilities, if any, we may need, and then working
> to create a single framework with supporting software within which all
> the features we need are easily available to working crystallographers,
> archivists and publishers.
>
> At 3:17 PM -0500 3/21/05, David Brown wrote:
>> To members of COMCIFS
>> I would like to place the following two topics on the agenda for the 
>> closed meetings in Florence.  I welcome suggestions for other agenda 
>> items.
>> 1. What is the role of CIF in the current rapidly changing world of 
>> information technology?
>> 2. How can we make transparent the boundary between CIFs written with 
>> DDL1 dictionaries and those written with DDL2?
>> David Brown
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> It should be no surprise that an information technology language 
>> adopted in 1990 needs to be reviewed after fifteen years of 
>> operation.   The rapid advances in the field and the introduction of 
>> XML make such a review more than timely.  A further urgency is added 
>> by the need to ensure that incremental changes that we make in the 
>> dictionaries and other documents are compatible with future 
>> directions of crystallographic information technology.  Two current 
>> problems illustrate how this impacts on dictionary structures.
>> 1. Is it better to have a semantically meaningless item as the 
>> _list_reference (DDL1) or _category_key (DDL2) to label each line in 
>> a loop, or should we use semantically meaningful items (such as 
>> _atom_site_label) that are already present?  The former solution 
>> allows more straightforward programming and avoids possible 
>> conflicts between the information technology and crystallographic use 
>> of the item, but the latter leaves the CIF less cluttered and easier 
>> for humans to follow because the links are more readily followed by 
>> eye.  The current revision of the core dictionary needs an answer to 
>> this question, because the answer will affect future CIF data 
>> structures.
>> 2. Should there be rules defining the relationships that are allowed 
>> to be expressed by parent-child links?  These links have been 
>> developed in an ad hoc way, but as we move towards more advanced data 
>> structures, we may find that we have developed links that are 
>> impossible to manipulate.  One way of exploring the logic of the 
>> linked structures is to use the Resource Description Framework (RDF) 
>> which is being developed as part of the Semantic Web (see 
>> http://www.w3.org/RDF/ and http://www.w3.org/RDF/FAQ ).  This scheme 
>> expresses the parent-child links as a graph, making it easier to trace 
>> the logic.  Another possibility is to use the Unified Modeling 
>> Language ( www.uml.org ).
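
The graph view of parent-child links mentioned above can be illustrated with a minimal Python sketch. It assumes each child item names at most one parent, stores the links as a child-to-parent mapping, and checks whether following them ever loops back on itself. The link table and the function name `find_cycle` are hypothetical and for illustration only; they are not taken from any dictionary file or existing CIF software.

```python
# Illustrative child -> parent links (an assumed, simplified table,
# not read from a real DDL1 or DDL2 dictionary).
LINKS = {
    "_geom_bond_atom_site_label_1": "_atom_site_label",
    "_geom_bond_atom_site_label_2": "_atom_site_label",
}


def find_cycle(parent_of):
    """Return the item names that form a loop of parent links, or None.

    parent_of maps each child item name to its single parent item name.
    """
    for start in parent_of:
        seen = [start]
        node = parent_of.get(start)
        while node is not None:
            if node in seen:
                # Walked back onto an item already visited: report the loop.
                return seen[seen.index(node):] + [node]
            seen.append(node)
            node = parent_of.get(node)
    return None


print(find_cycle(LINKS))
print(find_cycle({"a": "b", "b": "a"}))
```

A link set with no loops can always be traced back to its root items, which is the kind of property such a graph representation makes easy to check.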
>> As interest focuses on software that explores the interactions of 
>> small and large molecules, the incompatibility between the Dictionary 
>> Definition Language 1 (DDL1) and DDL2 is becoming a hindrance.
>> CoreCIF is designed for use with small molecules and is written in 
>> DDL1, but mmCIF, designed for reporting macromolecules, is written in 
>> DDL2.  While most of the features of the two standards are similar, 
>> there are two significant differences: first, DDL2 has a tighter 
>> structure designed to make automatic computer manipulation of the 
>> information easier; second, the names given to the data items have a 
>> different structure.  As the similarities between the two languages 
>> are far greater than their differences, it should be possible to 
>> achieve some convergence;  already the core dictionary is evolving 
>> towards the DDL2 standard, but a complete convergence would require 
>> major reworking of some dictionaries.
>> Convergence can be achieved in different ways.  One way is to ensure 
>> that software is able to validate CIFs against both DDL1 and DDL2 
>> dictionaries, and since the dictionaries contain synonyms of the data 
>> names (alternative data names for items with essentially the same 
>> definition, listed under _related_item (DDL1) and 
>> _item_aliases.alias_name (DDL2)), any character string used to 
>> represent a particular data name should be recognized by software 
>> that takes note of any alias names present regardless of the 
>> dictionary or version being used.   Since all the items in the 
>> coreCIF dictionary appear (transformed to DDL2) in the mmCIF 
>> dictionary with their original DDL1 data names given as aliases, 
>> mmCIF software should be able to read coreCIFs without difficulty. 
>> mmCIF aliases are currently not present in the coreCIF dictionary but 
>> could easily be added.  Alternatively, a DDL2 version of the coreCIF 
>> dictionary could be separated out and used as an alternative to the 
>> DDL1 core dictionary.
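
The alias mechanism described above can be sketched in a few lines of Python: software keeps one table that maps every known spelling of a data name, whether a DDL2 name or its DDL1 alias, to a single canonical name. This is only a sketch under stated assumptions. The alias table is a small illustrative sample rather than the contents of a real dictionary, and the names `ALIASES`, `build_lookup` and `canonical_name` are hypothetical. Lookups are case-insensitive because CIF data names are.

```python
# Illustrative sample: canonical (DDL2-style) name -> list of alias names.
# A real table would be built from the alias entries in the dictionaries.
ALIASES = {
    "_cell.length_a": ["_cell_length_a"],
    "_cell.length_b": ["_cell_length_b"],
}


def build_lookup(aliases):
    """Map every known spelling (lower-cased) to its canonical data name."""
    lookup = {}
    for canonical, alternatives in aliases.items():
        lookup[canonical.lower()] = canonical
        for alt in alternatives:
            lookup[alt.lower()] = canonical
    return lookup


def canonical_name(name, lookup):
    """Return the canonical data name for any alias, or None if unknown."""
    return lookup.get(name.lower())


LOOKUP = build_lookup(ALIASES)
print(canonical_name("_cell_length_a", LOOKUP))
```

With such a table, a reader can accept a file written with either set of names and work internally with one of them.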
>> _______________________________________________
>> comcifs mailing list
>> comcifs@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/comcifs
> -- 
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
