Re: [ddlm-group] CIF header

Dear Nick,

   I was not trying to argue the merits -- I think both sides have done
that to death -- but simply to present what the two proposals are.

   As for your question -- I do not know what the user intended and do
not find myself any more enlightened on that issue under either proposal.
If I had a dictionary, I might well know more from the stated purpose
of the data value, or if we added some sort of tags throwing light
on what is intended.  But with what is on the table right now, I
don't know and I doubt anyone else does.

   Now, could we please just have that straw vote and be done with this.

   Regards,
     Herbert
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Tue, 24 Nov 2009, Nick Spadaccini wrote:

>
> On 23/11/09 9:29 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
> wrote:
>
>> So, to try to reduce this to two simple choices:
>>
>> For both choices, we are discussing quoted strings delimited by
>>
>>    1.  The single quote character '\''
>>    2.  The double quote character '"'
>>    3.  The newline-semicolon digraph "\n;"
>>    4.  The treble single quote trigraph "'''"
>>    5.  The treble double quote trigraph "\"\"\""
>>
>> In the first two cases the string may not contain a new-line character. In
>> the last three cases the string may contain new-line characters.
>>
>> In all cases, the processing of a quoted string ceases when an unprotected
>> terminal quoting character, digraph or trigraph is encountered.
>>
>> The two proposals have to do with the handling of the protection of a
>> terminal quoting character, digraph or trigraph.
>>
>> Proposal 1:  The string delivered to the application will be literally the
>> string of characters starting from the first character after the initial
>> quoting character, digraph or trigraph through the last character prior to
>> the first unprotected terminal quoting character, digraph or trigraph, not
>> including the initial or terminal quoting character, digraph or trigraph.
>> An unprotected reverse-solidus will protect the following character for
>> interpretation as a reverse solidus, or as a terminal quoting character or
>> as a member of a terminal digraph or as a member of a terminal trigraph,
>> but will be included in the string delivered to the application. Thus
>> "ab\"cd" and 'ab\"cd' would return the same string (ab\"cd) to the
>> application.
>>
>> Proposal 2:  The string delivered to the application will be literally the
>> string of characters starting from the first character after the initial
>> quoting character, digraph or trigraph through the last character prior to
>> the first unprotected terminal quoting character, digraph or trigraph, not
>> including the initial or terminal quoting character, digraph or trigraph
>> and not including any reverse-solidus protecting a terminal quoting
>> character or protecting any individual character in a terminal digraph or
>> trigraph.  Any reverse-solidus will protect the following character for
>> interpretation as a terminal quoting character or as a member of a
>> terminal digraph or as a member of a terminal trigraph, and will be
>> removed from the string delivered to the application. Thus "ab\"cd" and
>> 'ab\"cd' would return two different strings (ab"cd vs. ab\"cd) to the
>> application.
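
The difference between the two proposals, for the single-character delimiters only, can be sketched as follows. This is a minimal illustration in Python, not code from either proposal: the function name `read_quoted` and the `keep_backslash` flag are hypothetical, the digraph and trigraph delimiters are not handled, and for Proposal 2 it assumes that a reverse solidus not followed by the delimiter is delivered unchanged, which the summary above does not state explicitly.

```python
def read_quoted(text, keep_backslash):
    """Deliver the content of a quoted string that starts at text[0].

    keep_backslash=True  sketches Proposal 1 (protecting backslash is kept).
    keep_backslash=False sketches Proposal 2 (backslash before the
    terminal quote character is removed).
    """
    quote = text[0]
    if quote not in ("'", '"'):
        raise ValueError("expected a single or double quote")
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if keep_backslash:
                # Proposal 1: the next character is protected and both
                # characters are delivered to the application.
                out.append(ch + nxt)
                i += 2
                continue
            if nxt == quote:
                # Proposal 2: the backslash protecting the terminal
                # quote character is dropped.
                out.append(nxt)
                i += 2
                continue
            # Assumption: any other backslash is an ordinary character.
            out.append(ch)
            i += 1
            continue
        if ch == quote:
            # Unprotected terminal quote: processing ceases here.
            return "".join(out)
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted string")


double_quoted = r'"ab\"cd"'   # the text "ab\"cd"
single_quoted = r"'ab\"cd'"   # the text 'ab\"cd'

for name, keep in (("Proposal 1", True), ("Proposal 2", False)):
    print(name, read_quoted(double_quoted, keep), read_quoted(single_quoted, keep))
```

Under these assumptions, `keep_backslash=True` delivers ab\"cd for both inputs, while `keep_backslash=False` delivers ab"cd for the double-quoted input and ab\"cd for the single-quoted one.
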
>>
>> Please check whether I have presented the substance of the difference in
>> approaches fairly, and if so, let us please just have a straw vote and
>> finish this.
>
> What you have presented is not the substance of the difference in
> approaches. What this shows is that by choosing to consider the elision of a
> terminal character ONLY when it is the delimiting character, you get
> differences in what is returned. James and I could have argued that any "
> and ' that is elided is skipped. If that were the approach then our
> algorithm would also be consistent in returning
>
> "ab\"cd" -> ab"cd
> 'ab\"cd' -> ab"cd
>
> By the way this is the Python approach to elided terminals.
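
The Python behaviour referred to here can be checked directly. A small sketch, using `ast.literal_eval` from the standard library to evaluate the two literals as Python source text; the variable names are only for illustration.

```python
import ast

# Raw strings hold the literal source text, backslash included.
double_quoted_source = r'"ab\"cd"'
single_quoted_source = r"'ab\"cd'"

# Evaluate each piece of source text as a Python string literal.
double_value = ast.literal_eval(double_quoted_source)
single_value = ast.literal_eval(single_quoted_source)

print(double_value)  # ab"cd
print(single_value)  # ab"cd
```
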
>
> The substance of the differences in the two approaches is what is
> consistently done with terminating characters within delimited strings and
> where it is done. We argue it is consistently done by the parser at write and
> read and the user doesn't need to see it. You say the user is responsible at
> both ends.
>
> My question is, when your system reads "ab\"cd" and returns to me ab\"cd
> what was the original string the user was trying to output? Was it ab\"cd or
> was it ab"cd? Because I can't tell.
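
One way to picture handling "by the parser at write and read" is a symmetric escape and unescape pair. The sketch below is an illustration under stated assumptions, not the algorithm proposed in this thread: `write_quoted` and `read_escaped` are hypothetical names, only single-character delimiters are covered, and it assumes that a reverse solidus also protects a reverse solidus, so that the two candidate originals produce different file text.

```python
def write_quoted(value, quote='"'):
    """Escape backslashes and the delimiter, then wrap in the delimiter."""
    escaped = value.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + escaped + quote


def read_escaped(text):
    """Inverse of write_quoted: strip the delimiters and undo the escapes."""
    quote = text[0]
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in (quote, "\\"):
            # Escaped delimiter or escaped backslash: keep only the
            # protected character.
            out.append(text[i + 1])
            i += 2
        elif ch == quote:
            return "".join(out)
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated quoted string")


# The two candidate originals from the question above.
for original in ('ab"cd', 'ab\\"cd'):
    on_disk = write_quoted(original)
    print(original, "->", on_disk, "->", read_escaped(on_disk))
```

With this pairing the two originals are written as different text, so the reader can tell them apart without the application having to interpret backslashes itself.
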
>
>>
>> I favor proposal #1.
>>
>> Regards,
>>    Herbert
>>
>> On Mon, 23 Nov 2009, Nick Spadaccini wrote:
>>
>>> Agree to
>>> #<something>CIF_2.0
>>>
>>> <something> is whatever character sequence people are happy with.
>>>
>>> <something>=\# seems reasonable enough.
>>>
>>> Don't see the need for the _ between CIF and 2.0, but agnostic about it. I am
>>> assuming there must be an existing CIF_1.1 magic sequence in use - though I
>>> don't recall any formal specification for it.
>>>
>>>
>>> On 23/11/09 1:32 PM, "James Hester" <jamesrhester@gmail.com> wrote:
>>>
>>>> I agree with Brian's suggestion.  Can other participants also indicate
>>>> their agreement or alternative suggestions?
>>>>
>>>> James.
>>>>
>>>> On Fri, Nov 20, 2009 at 11:15 PM, Brian McMahon <bm@iucr.org> wrote:
>>>>>>>> Is there a reason why it can't be #!, to make it consistent with other
>>>>>>>> *nix based directives.
>>>>>
>>>>> As James says, #! is normally understood by Unix shells to specify
>>>>> an appropriate shell interpreter, not quite what we're aiming for here.
>>>>>
>>>>> A characteristic initial set of bytes (file 'magic') is often used
>>>>> by GUI file managers and other generic file-handling software to
>>>>> associate icons or applications (in association with, or sometimes
>>>>> competing against, the use of a filename extension). We use this
>>>>> approach to identify the type of file uploaded in our submission
>>>>> system. It's useful for that initial byte sequence to be (a) short
>>>>> to facilitate rapid scanning, (b) specific to an individual file type.
>>>>> For that reason we suggested for CIF 1.1 the magic string
>>>>>      #\#CIF_1.1
>>>>> For CBF it is
>>>>>      ###CBF: VERSION
>>>>>
>>>>> I recommend #\#CIF_2.0 to be consistent with version 1.1 and so that
>>>>> generic file magic handling can map all #\#CIF_ strings to files of type
>>>>> "cif". (A sophisticated file manager could extend the scan to allow for
>>>>> different icons to be associated with version 1.1 and version 2 CIFs.)
>>>>> It seems a pity from the viewpoint of neatness that the CIF and CBF
>>>>> magic strings aren't more similar in structure.
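
The generic magic handling described above could look roughly like this. A minimal sketch in Python: the function name `sniff_file_type` and its return shape are hypothetical, and only the byte strings quoted in this message are taken from the source.

```python
CIF_MAGIC = b"#\\#CIF_"          # the bytes  #\#CIF_
CBF_MAGIC = b"###CBF: VERSION"


def sniff_file_type(head):
    """Classify a file from its first bytes.

    Returns (file_type, version); version is only extracted for CIF,
    where it is the text between the magic prefix and the first
    whitespace.
    """
    if head.startswith(CIF_MAGIC):
        fields = head[len(CIF_MAGIC):].split()
        version = fields[0].decode("ascii", "replace") if fields else ""
        return ("cif", version)
    if head.startswith(CBF_MAGIC):
        return ("cbf", None)
    return (None, None)


print(sniff_file_type(b"#\\#CIF_1.1\ndata_example\n"))
print(sniff_file_type(b"#\\#CIF_2.0\ndata_example\n"))
print(sniff_file_type(b"data_example\n"))
```

Because both versions share the `#\#CIF_` prefix, a single prefix test maps them to the same type, and the version can be read as an optional second step.
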
>>>>>
>>>>> Brian
>>>>>
>>>>> On Fri, Nov 20, 2009 at 02:09:56PM +0800, Nick Spadaccini wrote:
>>>>>>
>>>>>> We don't need an extra character, a single hash would suffice, but I guess
>>>>>> an extra character may uniquely identify it as the CIF header to a parser,
>>>>>> rather than as just a comment. An extra character also moves you away
>>>>>> from an ordinary comment which is smart, to a smart comment which has its
>>>>>> own unique tag. I am NOT a fan of smart comments, or comments which can be
>>>>>> smart, but they seem to be the modus operandi of many systems.
>>>>>>
>>>>>> On 20/11/09 1:59 PM, "James Hester" <jamesrhester@gmail.com> wrote:
>>>>>>
>>>>>>> Wouldn't this cause a UNIX-style OS to try to execute 'CIF2' if
>>>>>>> someone accidentally typed the filename in a command context?  This is
>>>>>>> not a huge problem in that it will otherwise attempt to execute
>>>>>>> 'data_xxxx', and only if the file is executable.
>>>>>>>
>>>>>>> I guess I don't understand why we need an extra character after the
>>>>>>> hash.  If we really do need an extra character, why not just another
>>>>>>> hash?
>>>>>>>
>>>>>>> On Mon, Nov 9, 2009 at 7:30 PM, Nick Spadaccini <nick@csse.uwa.edu.au>
>>>>>>> wrote:
>>>>>>>> On 30/10/09 11:47 PM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
>>>>>>>>
>>>>>>>>> A directive embedded in an initial comment really does make sense,
>>>>>>>>> because it is irrelevant once the correct parser is selected. It might
>>>>>>>>> make sense to add a specific 2nd character, similar to the POSIX shell
>>>>>>>>> #!. For example, the STAR format could define an initial line beginning
>>>>>>>>> with #% as a parsing directive rather than just a plain comment. That
>>>>>>>>> makes the abuse of a comment line a bit less of a hack.
>>>>>>>>
>>>>>>>> Is there a reason why it can't be #!, to make it consistent with other
>>>>>>>> *nix based directives.
>>>>>>>>
>>>>>>>> cheers
>>>>>>>>
>>>>>>>> Nick
>>>>> _______________________________________________
>>>>> ddlm-group mailing list
>>>>> ddlm-group@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>
>>>>
>>>>
>>>
>>> cheers
>>>
>>> Nick
>>>
>>> --------------------------------
>>> Associate Professor N. Spadaccini, PhD
>>> School of Computer Science & Software Engineering
>>>
>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>>> MBDP  M002
>>>
>>> CRICOS Provider Code: 00126G
>>>
>>> e: Nick.Spadaccini@uwa.edu.au
>
> cheers
>
> Nick
