Discussion List Archives


Re: CIF 2.0 syntax proposal for retaining backwards CIF 1.x compatibility

Dear Saulius

> The specification of CIF1.2 would open a non-disruptive, smooth
> migration path CIF1.0 -> CIF1.1 -> CIF1.2 -> CIF2.0

I am convinced that this is not the right way to go. While it has
appeal on paper, the introduction of another format provides another
opportunity for end-users and "part-time" developers to get things
wrong.

By "part-time" I mean (but not in a derogatory sense) crystallographers
who write software primarily to perform scientific calculations, who
see the information exchange (and general i/o) aspects of programming
as at best a necessary evil. There are still many of them around.

>> It appears that the PDB intends to keep the macromolecular
>> community files in the CIF 1 world, even if the dictionaries move up
>> to DDLm
> I can attest that the COD would follow the same path as well.

And it is equally likely that IUCr journals will follow suit, at least
until such time as robust, feature-rich tools for CIF2 exist. If that
happens quickly - and I hope it does - so much the better. If it takes
longer, at least we are not in a position where we have acute problems
with the existing workflow and file format.

Best wishes

On Wed, Sep 18, 2013 at 11:34:58AM +0300, Saulius Gražulis wrote:
> Dear colleagues,
> thanks everyone for your time and for the comments on my proposal.
> On 2013-09-17 20:21, Bollinger, John C wrote:
> > From a higher perspective, those costs may include some or all of
> > the following:
> > 
> > - Loss of developer good will
> > - Lack of community acceptance
> > - Technical issues at various levels arising from confusing one 
> >   format with the other
> > - User confusion
> I cannot emphasize strongly enough how precisely John has expressed my
> concerns!
> On 2013-09-18 01:56, yayahjb wrote:
> > It appears that the PDB intends to keep the macromolecular
> > community files in the CIF 1 world, even if the dictionaries move up
> > to DDLm
> I can attest that the COD would follow the same path as well. Having
> both CIF1 and CIF2 in one database and one dataflow would be, IMHO, too
> costly for us to maintain, and switching completely to CIF2 would be too
> disruptive both for us and probably for our users. In any case, there
> needs to be a gradual mechanism for such a change; we at COD cannot (and
> may not) drop CIF1 compatibility overnight.
> On 2013-09-16 05:58, James Hester wrote:
> > Rather, we would allow ourselves to be disruptive in order to add
> > the new datastructures for DDLm and to fix deficiencies in CIF1. We 
> > therefore called the new standard CIF2, rather than (for example) 
> > CIF1.2.
> Actually, what about making CIF 1.2, as an *extension* of CIF 1.1, in
> parallel to the fresh development of CIF2? I understand that this is
> extra work, therefore I am ready to offer my time and work to write
> together a draft for CIF 1.2 and offer it for COMCIFS consideration in
> as complete a form as possible.
> This would permit the development of CIF 2.0 as planned (without
> reconsidering the CIF2 syntax at the moment), would allow smooth
> transition for CIF1 users (like COD and PDB) to a richer syntax, without
> breaking the backwards compatibility, and (as it has just occurred to
> me) would also ease a smooth transition from CIF1 to CIF2 in the future.
> The CIF 1.2 would include all key elements that Herbert has identified
> as necessary for the CIF 1 and CIF 2 interconversion. The CIF 1.2
> proposal would:
> a) retain compatibility with CIF1.1 as in my proposal currently discussed;
> b) introduce the UTF-8 encoding;
> c) introduce prefixed text fields, as adopted in the CIF2 syntax, for quoting.
> Under this proposal, the CIF1.2 <-> CIF2.0 conversion would be
> straightforward, and would survive an "automatic data round-trip":
> On 2013-09-18 01:56, yayahjb wrote:
> > 1.  Handling UTF8.
> No conversion would be needed between CIF1.2 and CIF2.0, in either
> direction.
> CIF 1.1 <-> CIF1.2 would be possible by using XML-style entities like in
> "Gra&#x17E;ulis" or "Gra&zcaron;ulis" ;) Or, indeed, as "Gra\<zulis" ...
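The entity-based CIF1.1 <-> CIF1.2 mapping mentioned above can be sketched in Python roughly as follows. This is only an illustration and not part of any CIF specification: the function names are hypothetical, hexadecimal numeric references are chosen to match the "Gra&#x17E;ulis" example, and escaping a literal "&" as "&amp;" is an added assumption that keeps the round trip unambiguous.

```python
import html


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal XML-style
    character reference, so the result is pure ASCII."""
    # Assumption: escape literal ampersands first so that decoding
    # cannot mistake ordinary text for an entity.
    text = text.replace("&", "&amp;")
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text
    )


def from_entities(text: str) -> str:
    """Decode numeric and named character references back to Unicode."""
    # html.unescape understands both &#x17E; and named forms such as &zcaron;
    return html.unescape(text)


ascii_form = to_entities("Gražulis")
restored = from_entities(ascii_form)
```

Because html.unescape also accepts named entities, the "&zcaron;" spelling decodes to the same character, although the encoder here only ever emits the numeric form.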
> > 2.  Handling bracketed constructs.  Almost any quoting scheme will 
> > allow a bracketed construct to be carried as an opaque value in a
> > CIF 1 file. I propose that we carry CIF 2 bracketed constructs in CIF
> > 1 files as semicolon delimited quoted text, beginning either with 
> > \n;$\n (newline, semicolon, dollar, newline) for non-line-folded
> > versions or with \n;\\$\n (newline, semicolon, backslash, dollar, 
> > newline) for line-folded versions
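A minimal Python sketch of the non-line-folded variant described in point 2, assuming the construct is carried verbatim between the opening "\n;$\n" and a closing newline-semicolon. The function names are hypothetical, and refusing input with a line that starts with ";" is an assumption of this sketch, since such a line would end the text field early.

```python
MARKER = "$"  # marker character taken from the proposal quoted above


def wrap_bracketed(construct: str) -> str:
    """Carry a CIF2 bracketed construct as an opaque CIF1 text field."""
    if any(line.startswith(";") for line in construct.split("\n")):
        # Assumption: without line folding or prefixing, a line that
        # begins with ';' cannot be carried safely.
        raise ValueError("construct contains a line starting with ';'")
    return f"\n;{MARKER}\n{construct}\n;"


def unwrap_bracketed(field: str) -> str:
    """Recover the bracketed construct from the marked text field."""
    head, tail = f"\n;{MARKER}\n", "\n;"
    well_formed = (
        len(field) >= len(head) + len(tail)
        and field.startswith(head)
        and field.endswith(tail)
    )
    if not well_formed:
        raise ValueError("not a '$'-marked text field")
    return field[len(head):-len(tail)]


wrapped = wrap_bracketed("[1 2 {'a':3}]")
```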
> The bracketed constructs would be carried straightforwardly between CIF
> 1.2 and CIF 2.0 in both directions, without any loss of information, by
> switching between '{' and '[[' and quoting keys where required by CIF2
> syntax. The special convention of the text field semantics that Herbert
> proposes is thus not needed.
> CIF1.2 -> CIF1.1 conversion would still be problematic but would
> probably not be needed -- since CIF1.2 would be an extension of CIF1.1,
> any application that has to deal with bracketed lists could probably
> easily switch to CIF1.2 syntax (if only to ignore the bracketed values
> after parsing them correctly...).
> > 3.  Handling the different quoting and white-space conventions. This 
> > will require aggressive use of both the CIF 1 and CIF 2 quoting 
> > mechanisms, but should be doable.
> Carrying over different quoted strings can be done in a uniform way
> between CIF1.2 and CIF2.0 by using prefixed text fields that both
> syntaxes would support. Since prefixed text fields can represent *any*
> value, the conversion can be made in a uniform and automatic way (say,
> all values containing "'", "\"" or ";" characters would be represented
> by prefixed text fields in both CIF1.2 and CIF 2.0).
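The uniform rule suggested above (use a prefixed text field whenever a value contains "'", "\"" or ";") might look like this in Python. The layout is an assumption based on my reading of the CIF2 text-prefix convention: the opening line is ";", then the prefix, then a backslash, and every content line starts with the same prefix. The prefix "CIF>" and all function names are hypothetical.

```python
PREFIX = "CIF>"  # hypothetical prefix; must not contain a backslash or newline


def needs_prefixed_field(value: str) -> bool:
    """Rule from the message: quote characters or ';' force a prefixed field."""
    return any(ch in value for ch in ("'", '"', ";"))


def encode_prefixed(value: str, prefix: str = PREFIX) -> str:
    """Write a value as a prefixed text field (assumed layout, see lead-in)."""
    body = "\n".join(prefix + line for line in value.split("\n"))
    return f";{prefix}\\\n{body}\n;"


def decode_prefixed(field: str) -> str:
    """Strip the prefix from each content line and return the original value."""
    lines = field.split("\n")
    header, body, closer = lines[0], lines[1:-1], lines[-1]
    if not (header.startswith(";") and header.endswith("\\") and closer == ";"):
        raise ValueError("not a prefixed text field")
    prefix = header[1:-1]
    if not all(line.startswith(prefix) for line in body):
        raise ValueError("content line without the declared prefix")
    return "\n".join(line[len(prefix):] for line in body)


value = "it's\n;tricky \"one\""
field = encode_prefixed(value)
```

Since no content line can start with ";" once the prefix is applied, the same encoded field would be safe in either syntax, which is the property the round-trip argument relies on.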
> CIF1.1 <-> CIF1.2 conversion would be straightforward (after converting
> UTF-8 to entities) since both use the same quoting conventions.
> On 2013-09-16 05:58, James Hester wrote:
> > [The compatible CIF2 syntax proposal] it gives '[ [' and '[['
> > different meanings, which is counter-intuitive and a step backward
> > towards the CIF1 approach of giving whitespace extra significance.
> May I disagree with this particular statement. The double-character
> tokens are common in computer languages, viz. C/C++(<-- :) tokens "+"
> and "++"; bash and C have '<' vs. '<<' and '>' vs. '>>'; some languages
> even introduced triple (!!!) tokens such as """ :). At least for C,
> spaces matter in the '++' tokens ('+ +' is not the same as '++').
> The CIF itself *is* space sensitive, in some very peculiar ways. Thus
> there is no such invariant in CIF (be it CIF1 or CIF2) as "spaces can be
> arbitrarily removed between non-alphanumeric tokens" (e.g. CIF values
> '? ?' and '??' would be interpreted in quite different ways). Thus,
> introducing '[ [' vs. '[[' distinction does not invalidate any useful
> general invariants about CIF1, and using double characters is a usual
> practice in many computer languages to introduce additional tokens, it's
> nothing "against the fur" for contemporary computer people.
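The point about double-character tokens can be illustrated with a toy longest-match tokenizer in Python. The token set below is hypothetical and far simpler than any real CIF grammar; it only shows why '[[' and '[ [' come out differently, and why '??' and '? ?' do too.

```python
import re

# Longest alternatives are listed first, so '[[' wins over '[' when both match.
TOKEN = re.compile(r"\[\[|\]\]|\[|\]|[^\s\[\]]+")


def tokens(text: str) -> list[str]:
    """Split text into tokens; whitespace only separates, it is never a token."""
    return TOKEN.findall(text)


double = tokens("[[")       # one double-character token
spaced = tokens("[ [")      # two single-character tokens
unknowns = tokens("? ?")    # two separate '?' values
literal = tokens("??")      # one value spelled '??'
```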
> I guess from all computer languages only FORTRAN code could be parsed
> after removing *all* spaces, right? :)
> The specification of CIF1.2 would open a non-disruptive, smooth
> migration path CIF1.0 -> CIF1.1 -> CIF1.2 -> CIF2.0, with the roadmap
> viable for the coming decade.
> What do you think?
> Regards,
> Saulius
> -- 
> Dr. Saulius Gražulis
> Vilnius University Institute of Biotechnology, Graiciuno 8
> LT-02241 Vilnius, Lietuva (Lithuania)
> fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
> mobile: (+370-684)-49802, (+370-614)-36366
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://mailman.iucr.org/mailman/listinfo/comcifs