Discussion List Archives


Re: CIF 2.0 syntax proposal for retaining backwards CIF 1.x compatibility

Dear colleagues,

thanks to everyone for your time and for the comments on my proposal.

On 2013-09-17 20:21, Bollinger, John C wrote:
> From a higher perspective, those costs may include some or all of
> the following:
> - Loss of developer good will
> - Lack of community acceptance
> - Technical issues at various levels arising from confusing one 
>   format with the other
> - User confusion

I cannot emphasize strongly enough how precisely John has expressed my

On 2013-09-18 01:56, yayahjb wrote:
> It appears that the PDB intends to keep the macromolecular 
> community files in the CIF 1 world, even if the dictionaries move up
> to DDLm

I can attest that the COD would follow the same path as well. Having
both CIF1 and CIF2 in one database and one dataflow would be, IMHO, too
costly for us to maintain, and switching completely to CIF2 would be too
disruptive both for us and probably for our users. In any case, there
needs to be a gradual mechanism for such a change; we at COD cannot
(and may not) drop CIF1 compatibility overnight.

On 2013-09-16 05:58, James Hester wrote:
> Rather, we would allow ourselves to be disruptive in order to add
> the new datastructures for DDLm and to fix deficiencies in CIF1. We 
> therefore called the new standard CIF2, rather than (for example) 
> CIF1.2.

Actually, what about making CIF 1.2, as an *extension* of CIF 1.1, in
parallel to the fresh development of CIF2? I understand that this is
extra work; therefore I am ready to offer my time and effort to write
a draft for CIF 1.2 together and to submit it for COMCIFS consideration
in as complete a form as possible.

This would permit the development of CIF 2.0 as planned (without
reconsidering the CIF2 syntax at the moment), would allow a smooth
transition for CIF1 users (like COD and PDB) to a richer syntax without
breaking backwards compatibility, and (as it has just occurred to
me) would also ease the transition from CIF1 to CIF2 in the future.

CIF 1.2 would include all key elements that Herbert has identified
as necessary for the CIF 1 and CIF 2 interconversion. The CIF 1.2
proposal would:

a) retain compatibility with CIF1.1, as in my currently discussed proposal;

b) introduce the UTF-8 encoding;

c) introduce prefixed text fields, as adopted in the CIF2 syntax, for quoting.

Under this proposal, the CIF1.2 <-> CIF2.0 conversion would be
straightforward, and would survive an "automatic data round-trip":

On 2013-09-18 01:56, yayahjb wrote:

> 1.  Handling UTF8.

UTF-8 text would not need any conversion between CIF1.2 and CIF2.0, in
either direction.

CIF1.1 <-> CIF1.2 conversion would be possible by using XML-style
entities, as in "Gra&#x17E;ulis" or "Gra&zcaron;ulis" ;) Or, indeed, as
"Gra\<zulis" ...

> 2.  Handling bracketed constructs.  Almost any quoting scheme will 
> allow a bracketed construct to be carried as an opaque value in a
> CIF 1 file. I propose that we carry CIF 2 bracketed constructs in CIF
> 1 files as semicolon delimited quoted text, beginning either with 
> \n;$\n (newline, semicolon, dollar, newline) for non-line-folded 
> versions or with \n;\\$\n (newline, semicolon, backslash, dollar, 
> newline) for line-folded versions

The bracketed constructs would be carried straightforwardly between CIF
1.2 and CIF 2.0 in both directions, without any loss of information, by
switching between '{' and '[[' and quoting keys where necessary under the
CIF2 syntax. The special convention of the text field semantics that
Herbert proposes is thus not needed.

CIF1.2 -> CIF1.1 conversion would still be problematic but would
probably not be needed -- since CIF1.2 would be an extension of CIF1.1,
any application that has to deal with bracketed lists could probably
switch to the CIF1.2 syntax easily (if only to ignore the bracketed
values after parsing them correctly...).

> 3.  Handling the different quoting and white-space conventions. This 
> will require aggressive use of both the CIF 1 and CIF 2 quoting 
> mechanisms, but should be doable.

Carrying over different quoted strings can be done in a uniform way
between CIF1.2 and CIF2.0 by using prefixed text fields that both
syntaxes would support. Since prefixed text fields can represent *any*
value, the conversion can be made in a uniform and automatic way (say,
all values containing "'", "\"" or ";" characters would be represented
by prefixed text fields in both CIF1.2 and CIF 2.0).
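
The uniform rule above can be sketched as follows. This is a simplified illustration, assuming the prefixed text field layout of the CIF2 drafts (an opening line of ";", the prefix and a backslash, then every content line carrying the same prefix, then a closing ";" line). Line folding is ignored, and the function names and the default prefix ">" are hypothetical.

```python
def needs_prefixed_field(value: str) -> bool:
    """Apply the rule of thumb: quote characters or ';' force a prefixed field."""
    return any(ch in value for ch in "'\";")


def to_prefixed_text_field(value: str, prefix: str = ">") -> str:
    """Wrap a value so that no content line can start with a bare ';'."""
    if not prefix or "\\" in prefix or "\n" in prefix:
        raise ValueError("prefix must be non-empty, without backslash or newline")
    body = "\n".join(prefix + line for line in value.split("\n"))
    return ";" + prefix + "\\\n" + body + "\n;"


def from_prefixed_text_field(field: str) -> str:
    """Invert to_prefixed_text_field by stripping the delimiters and prefix."""
    if not (field.startswith(";") and field.endswith("\n;")):
        raise ValueError("not a text field")
    first, *rest = field[1:-2].split("\n")
    if not first.endswith("\\"):
        raise ValueError("missing prefix declaration line")
    prefix = first[:-1]
    if any(not line.startswith(prefix) for line in rest):
        raise ValueError("content line without prefix")
    return "\n".join(line[len(prefix):] for line in rest)


value = "first line\n;a line that starts with a semicolon and has 'quotes'"
if needs_prefixed_field(value):
    field = to_prefixed_text_field(value)
    print(field)
    print(from_prefixed_text_field(field) == value)
```

Because the same field text would be valid in both syntaxes under this proposal, a converter would not need to touch such values at all.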

CIF1.1 <-> CIF1.2 conversion would be straightforward (after converting
UTF-8 to entities) since both use the same quoting conventions.

On 2013-09-16 05:58, James Hester wrote:
> [The compatible CIF2 syntax proposal] it gives '[ [' and '[['
> different meanings, which is counter-intuitive and a step backward
> towards the CIF1 approach of giving whitespace extra significance.

May I disagree with this particular statement? Double-character
tokens are common in computer languages, viz. the C/C++ (<-- :) tokens
"+" and "++"; bash and C have '<' vs. '<<' and '>' vs. '>>'; some
languages even introduced triple (!!!) tokens such as """ :). At least
for C, spaces matter in the '++' tokens ('+ +' is not the same as '++').

CIF itself *is* space sensitive, in some very peculiar ways. Thus
there is no such invariant in CIF (be it CIF1 or CIF2) as "spaces can be
arbitrarily removed between non-alphanumeric tokens" (e.g. the CIF values
'? ?' and '??' would be interpreted in quite different ways). Thus,
introducing the '[ [' vs. '[[' distinction does not invalidate any useful
general invariants about CIF1. Using double characters to introduce
additional tokens is a usual practice in many computer languages; it's
nothing "against the fur" for contemporary computer people.
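
A minimal sketch of how a lexer can tell '[[' from '[ [' by always preferring the longest match, the same way C lexers treat '++'. The token set here is hypothetical and far smaller than any real CIF grammar; it only illustrates the whitespace sensitivity.

```python
import re

# Alternatives are tried left to right, so the two-character tokens
# must come before the one-character ones (longest match first).
TOKEN = re.compile(r"\[\[|\]\]|\[|\]|[^\s\[\]]+")


def tokenize(text: str) -> list[str]:
    """Split text into bracket tokens and bare words, skipping whitespace."""
    return TOKEN.findall(text)


print(tokenize("[[ a b ]]"))
print(tokenize("[ [ a b ] ]"))
```

The first call yields '[[' and ']]' as single tokens, while the second yields four separate single brackets, so the two inputs are distinguishable without any extra rules.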

I guess that, of all computer languages, only FORTRAN code could be
parsed after removing *all* spaces, right? :)

The specification of CIF1.2 would open a non-disruptive, smooth
migration path CIF1.0 -> CIF1.1 -> CIF1.2 -> CIF2.0, with the roadmap
viable for the coming decade.

What do you think?


Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
comcifs mailing list
