Re: [ddlm-group] Relationship of CIF2 to legacy platforms

Dear John,

> 1. support legacy ascii cifs in cif2 apps

     Most legacy ASCII CIFs should be using the 7-bit ASCII subset of the 
UTF-8 character set, so the problem should not be the ASCII itself, but the 
change in rules for unquoted strings, the new rules on quoted strings, and 
the impact of the changed use of the reverse solidus in quoted strings 
on the line-folding protocol.  I am trying to keep my code parametrized to 
be able to work both ways (or rather three ways: old CIF, 2008 DDLm, and 
CIF2).
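
The point that 7-bit ASCII is a subset of UTF-8 can be illustrated with a short check. This is only a sketch: the function names `is_seven_bit_ascii` and `decode_legacy_cif` are hypothetical and are not taken from any CIF library, and the sketch says nothing about the quoting or line-folding rule changes, which are the harder part.

```python
def is_seven_bit_ascii(data: bytes) -> bool:
    """Return True if every byte lies in the 7-bit ASCII range 0x00-0x7F."""
    return data.isascii()


def decode_legacy_cif(data: bytes) -> str:
    """Decode the bytes of a legacy CIF (hypothetical helper).

    Pure 7-bit ASCII decodes to the same text under ASCII and UTF-8,
    so a UTF-8 reader needs no special case for such files.
    """
    if not is_seven_bit_ascii(data):
        raise ValueError("not 7-bit ASCII; the encoding must be handled separately")
    return data.decode("utf-8")
```

A file that passes this check is byte-for-byte valid UTF-8, so any remaining incompatibility comes from the syntax rules rather than from the character encoding.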

>
> 2. support for an upper limit on line length in cif2
>

As long as there is a finite upper limit, it should be feasible for most 
Fortran systems to cope.  Right now we are at 2048.  If there is a reason 
to pick, say, 4096 or 8192, it should be possible to cope with it.
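
A finite limit is also easy to check mechanically. The sketch below is an illustration under assumptions: the name `overlong_lines` is hypothetical, and it counts characters, whereas a fixed-size Fortran record buffer would more likely care about encoded bytes.

```python
MAX_LINE_LENGTH = 2048  # the limit quoted above; 4096 and 8192 are the alternatives mentioned


def overlong_lines(text: str, limit: int = MAX_LINE_LENGTH) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit` characters.

    Lines are split on "\n" only; a trailing "\r" is dropped so that
    CRLF line endings do not count towards the length.
    """
    report = []
    for number, line in enumerate(text.split("\n"), start=1):
        length = len(line.rstrip("\r"))
        if length > limit:
            report.append((number, length))
    return report
```

Passing `limit=4096` or `limit=8192` shows how the same file would fare under the larger limits.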

> 3. extended character set support in cif2
>
>   +  identification of character set via a leading directive or comment

     The group consensus was to make CIF2 be strictly UTF-8, which is a 
nice, extended character set.  For CBF, I will also support UCS-2, as well 
as the pure binary formats.

>
>   +  choice of default character set for cif2 (utf-8? or via a directive).

     I believe the group consensus was that a CIF would use UTF-8 without 
any need for directives, so I am writing my code on that assumption.  For 
non-CIF applications, I intend to recognize BOMs and to write the vim 
style encoding comment.
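
BOM recognition and an encoding comment could look roughly like the sketch below. It is not the author's code: the names `sniff_bom` and `encoding_comment` are hypothetical, and the comment format is an assumption, namely the `# vim: set fileencoding=... :` form commonly used for source-file encoding declarations. Python has no separate UCS-2 codec, so the UTF-16 codec names stand in for it here.

```python
import codecs

# UTF-32 LE (FF FE 00 00) must be tested before UTF-16 LE (FF FE),
# because the shorter BOM is a prefix of the longer one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8") -> tuple[str, int]:
    """Return (encoding, bom_length); fall back to `default` when no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def encoding_comment(encoding: str) -> str:
    """Build a vim-style encoding comment line (assumed format, see above)."""
    return f"# vim: set fileencoding={encoding} :"
```

One caveat: UTF-16 LE text whose first character after the BOM is U+0000 is indistinguishable from a UTF-32 LE BOM, so a BOM can only ever be a hint.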

>
>   +  your need for extended character other than utf-8

    imgCIF uses binutf as one of its encoding options.  binutf is a UCS-2 
based encoding.  The current code is capable of supporting it purely 
within a MIME section, but for imgCIF it makes sense to support it for the 
entire file as well for convenience in debugging such an imgCIF file in a 
text editor.  I know there are people who don't think any of the ascii/utf 
versions of CBF are needed when all images are collected in the pure 
binary mode, but it has been a real life-saver to have the pure ascii 
representations for debugging.  The pure UCS-2 version will make the files 
I have to deal with a lot smaller and easier to work with.

Let me know if you need more information.

Regards,
   Herbert
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Sat, 31 Oct 2009, John Westbrook wrote:

> Hi Herb,
>
> I am trying to understand your position with respect to the line length
> and character set.
>
> Could you briefly summarize where you stand on the following points -
>
> 1. support legacy ascii cifs in cif2 apps
>
> 2. support for an upper limit on line length in cif2
>
> 3. extended character set support in cif2
>
>   +  identification of character set via a leading directive or comment
>
>   +  choice of default character set for cif2 (utf-8? or via a directive).
>
>   +  your need for extended character other than utf-8
>
> Thanks very much,
>
> John
>
>
> Herbert J. Bernstein wrote:
>> Dear Joe,
>>
>>    This is _not_ a matter of legacy software, but of currently maintained 
>> data collection software that happens to be written in Fortran.
>>
>>    People have work to get done.  The failure of CIF2 to support Fortran on 
>> a wide range of platforms will not stop those applications from doing what 
>> they need to do; it will just further hinder the adoption of CIF in the 
>> macromolecular community.
>>
>>    Regards,
>>      Herbert
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya@dowling.edu
>> =====================================================
>> 
>> On Fri, 30 Oct 2009, Joe Krahn wrote:
>> 
>>> IMHO, even though Fortran is not dead yet, its quirky I/O semantics
>>> should not be an important consideration for CIF2. I still write Fortran
>>> code, so I am not suggesting that Fortran code be neglected. However,
>>> anyone still using Fortran should have a modern compiler that supports
>>> STREAM I/O (including GFortran), which avoids these text I/O problems.
>>> 
>>> If you have old software that you don't want to maintain, there can
>>> always be a CIF2-to-CIF1 utility, so that the old program will still
>>> work as-is. For new code, it really only makes sense to use Fortran for
>>> number crunching, and just use a C library to do CIF I/O.
>>> 
>>> Joe Krahn
>>> 
>>> 
>>> James Hester wrote:
>>>> By 'systems' I had in mind computer operating systems and programming
>>>> environments, in particular multilingual support and Fortran.  So, for
>>>> example, as Herbert's replies have been indicating, Fortran behaviour
>>>> continues to influence the CIF standard.
>>>> 
>>>> On Thu, Oct 29, 2009 at 1:04 AM, David Brown <idbrown@mcmaster.ca> wrote:
>>>>> James asks whether we should require CIF2 to support legacy systems.  I am
>>>>> not sure what James means by 'systems'.  Are these datafiles or programs?
>>>>> That is to say, is the question 'should CIF2 applications be able to read
>>>>> legacy CIFs?', or 'should legacy CIF1 programs be able to read CIF2
>>>>> datafiles?'?
>>>>> 
>>>>> The answer to the first question is definitely 'yes'.  It is part of the
>>>>> mandate of CIF2 that its programs should be able to process the existing
>>>>> archive so that the archive can take advantage of the enhanced functions of
>>>>> DDLm.  The CIF2 dictionaries will alias all the datanames appearing in the
>>>>> CIF1 dictionaries in a way that makes such reading easy.
>>>>> 
>>>>> The answer to the second question is almost certainly no, at least in cases
>>>>> where the CIF data file makes use of the added syntax features.  All the
>>>>> datanames in CIF1.0 dictionaries differ from those in the CIF2 dictionary by
>>>>> not using a period at the end of the category part of the name and in some
>>>>> cases the names differ in other ways.  There would be no point in trying to
>>>>> produce CIF2 compatible CIF1 dictionaries, since the CIF1 dictionaries are
>>>>> poorly designed for maintenance and have poor aliasing features.
>>>>> 
>>>>> David
>>>>> 
>>>>> 
>>>>> James Hester wrote:
>>>>> 
>>>>> Dear All,
>>>>> 
>>>>> I think it would be helpful to make a policy decision regarding our
>>>>> treatment of legacy systems in CIF2.0.  This concerns first and
>>>>> foremost Fortran derived line-length constraints, but may impact on
>>>>> the encoding discussion in deciding which encodings might get some
>>>>> special treatment.  There may be other such issues as well.
>>>>> We have a few choices:
>>>>> 
>>>>> 1. Disregard legacy system issues when designing CIF2, on the basis
>>>>> that such systems can continue to use CIF1 and will eventually
>>>>> disappear at about the same time that it does (sort of like ASCII and
>>>>> Fortran...)
>>>>> 
>>>>> 2. Continue to support legacy systems on the basis that we don't want
>>>>> to deny such systems the chance to partake of the raw unadulterated
>>>>> goodness of CIF2, or perhaps more seriously that such legacy systems
>>>>> are integral to CIF2 takeup.
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> James.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> ddlm-group mailing list
>>>>> ddlm-group@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>> 
>>>>> 
>>>> 
>>>> 
>
> -- 
> ******************************************************************
>  John Westbrook, Ph.D.
>  Rutgers, The State University of New Jersey
>  Department of Chemistry and Chemical Biology
>  610 Taylor Road
>  Piscataway, NJ 08854-8087
>  e-mail: jwest@rcsb.rutgers.edu
>  Ph:  (732) 445-4290  Fax: (732) 445-4320
> ******************************************************************
>
