Re: Revised draft of CIF 1.1 syntax document

  • Subject: Re: Revised draft of CIF 1.1 syntax document
  • From: Brian McMahon <bm@xxxxxxxx>
  • Date: Wed, 25 Sep 2002 10:12:16 +0100 (BST)
> Please understand that while it is appropriate to speak of lines in a CIF,
> it will cause a great deal of machine dependent trouble if we require the
> inclusion of a "newline" within the definition of a line.  The terminator
> of a line is a _very_ machine/system dependent concept.

I think this is now understood and accepted by the contributors to this
discussion. However, you have identified the difficulty that, while unable to
specify a particular character or set of characters as a line terminator, we
must be able to handle lines as discrete entities according to whatever
implementation is appropriate for the current software environment.

>                                                ...   The obligation of a
> parser is to provide one empty line in the first case, ...

This is a helpful way to think about it. More generally, I see the obligation
of a parser to be to pass to a back-end application the information that the
application needs to work upon the contents of the file. So, when handling a
semicolon-delimited text field, the parser needs to pass on the information
that the "value" of the field is one or more character strings, each
considered a separate line of text. The back-end application has the
responsibility, if an author so decides, of padding or truncating lines with
whitespace, of concatenating folded lines, or of trying to convert between
semicolon delimiters and quote delimiters: the parser does not need to worry
about that.
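
As a rough illustration of that division of labour, here is a minimal Python
sketch. It assumes the input has already been split into lines by whatever
means the local environment provides, so no terminator characters appear in
the strings. The names parse_text_field and pad_lines are hypothetical and
are not taken from CIFtbx or any other existing parser.

```python
def parse_text_field(lines):
    """Return the content lines of a semicolon-delimited text field.

    `lines` is an iterable of strings with line terminators already
    removed. The first line must begin with ';'; the next line that
    begins with ';' closes the field.
    """
    it = iter(lines)
    first = next(it)
    if not first.startswith(";"):
        raise ValueError("text field must open with ';' at the start of a line")
    content = [first[1:]]          # text following the opening ';'
    for line in it:
        if line.startswith(";"):   # closing delimiter reached
            return content
        content.append(line)
    raise ValueError("unterminated text field")


def pad_lines(content, width):
    """Back-end choice (hypothetical): pad or truncate each line to `width`."""
    return [line[:width].ljust(width) for line in content]
```

The parser only reports the separate strings; padding, truncating, unfolding
or changing delimiters is left to whatever code consumes that list.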

What Herbert is saying is that Fortran will lose trailing space on a line,
so a general statement about parsers must carry the caveat that "trailing
white space on a line may however be elided", as is already in para. 17 of
the syntax document. An *application* that depends on trailing whitespace
must be aware of this, and be designed in some way that it can guarantee
that the whitespace is properly handled. This may mean that the application
is restricted to a particular programming language, and/or uses a specific
parser that is attested to retain trailing whitespace.

The question then is: should the parser identify the last line of the input
text field as:
   (i)  a character string forming a last line of text
or
   (ii) a character string to be emitted without termination in the
        current output line?

The argument boils down to: "(i) seems more natural and is how one would
read the description of these things in the STAR papers; but (ii) is the
way it has actually been implemented in CIFtbx and allows easier
transformation between semicolon and quote-delimited strings, particularly
if they don't extend over multiple lines".
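
To make the difference concrete, here is a small Python sketch of the two
readings. It takes the field's content as a list of strings and uses "\n"
purely as a stand-in for whatever the local line terminator is; that
stand-in, and the function names, are assumptions made for illustration only.

```python
def value_terminated(content, eol="\n"):
    """Reading (i): every string, including the last, is a complete line."""
    return "".join(line + eol for line in content)


def value_unterminated(content, eol="\n"):
    """Reading (ii): the last string is emitted without a terminator."""
    return eol.join(content)


single = ["abc"]
multi = ["abc", "def"]

# Under (ii) a one-line text field has the same value as the quoted
# string 'abc'; under (i) it carries an extra line terminator.
assert value_unterminated(single) == "abc"
assert value_terminated(single) == "abc\n"

# For longer fields the two readings differ only in the final terminator.
assert value_terminated(multi) == value_unterminated(multi) + "\n"
```

The single-line case is where the choice matters most for converting
between the two delimiter styles.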

I should certainly like to hear from Syd and Nick on this. It's another
small point of detail, but it's something from which we need to remove the
ambiguity.



>                                                        ...  It would be
> an equally valid approach to have a parser spew out the entire text field
> as a series of lines, but be warned that some valid imgCIF files may
> demand more memory from such a parser than it is likely to have available.

I'm not quite sure I understand this. Do you mean that the parser would read
the entire text field into memory and at the end of the process emit the
same byte-stream unchanged, leaving downstream applications to recognise
separate lines?

Brian
