RE: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains.
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: RE: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains.
- From: Matthew Towler <towler@ccdc.cam.ac.uk>
- Date: Thu, 10 Mar 2011 09:22:18 +0000
- Accept-Language: en-US, en-GB
- In-Reply-To: <a06240800c997f77ca4d3@[192.168.2.102]>
- References: <AANLkTikfLNd6mQB9hB9haGek_52ceO3GjXrtAR5tbsnj@mail.gmail.com><AANLkTin+DsXM58+gQ=H4vXGyuRS7xcDHcmAKKYMztvDL@mail.gmail.com><AANLkTimzgzLHrAg_pKHv82Qjzsz6ME1NPFsfZ87P2tQ8@mail.gmail.com><AANLkTi=pQoaya+9eyChCzn5HnkGkcOcbZxL=rQEN=jDL@mail.gmail.com><a06240800c996972c073b@[149.72.35.130]><8F77913624F7524AACD2A92EAF3BFA54168ECD35D9@SJMEMXMBS11.stjude.sjcrh.local ><20110305125300.GA4352@emerald.iucr.org><a06240800c997f77ca4d3@[192.168.2.102]>
I agree with many of the points made by Peter. I believe the decision on the byte order mark (BOM) should be made only after considering what type of format CIF should be. As I see it there are two options.

1) An easily human-editable, text-based format, as CIF 1.1 is at present. Users can quickly edit files using the text editor of their choice. Tools such as enCIFer that validate the content are helpful to users, but entirely optional. For this situation to continue with the new format, both the BOM and the encoding of Unicode characters need to be something standard that is already supported by a number of text editors. IMO this means it must be a standard Unicode BOM (as described in the Unicode standard and on http://en.wikipedia.org/wiki/Byte_order_mark), and either UTF-8 or UTF-16 format.

2) A machine-editable or non-text format, such as XML or PDF, or a text file with a non-standard encoding. As an aside, I do realise that it is entirely practicable to edit XML by hand, but it is certainly more difficult than editing a CIF. This would be the situation with a non-standard BOM or encoding, and would imply that users need special tools to edit the files. I feel that use of numeric encodings similar to HTML entity encodings (e.g. &#1234;) also falls into this category, because in a non-HTML file standard text editors will not understand the encoding scheme.

A major disadvantage of (2) is that it will create a chicken-and-egg situation for the adoption of the new format. Users will not be able to create the new files as there will be no tools, whilst providers of tools will be less inclined to develop them as the format is not in widespread use. I expect a few enthusiasts would in this case produce some tools to get the community going, but these will likely have fewer features than those already existing for CIF 1.1, imposing a barrier to adoption.
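
To illustrate what "standard" buys in option (1), here is a minimal Python sketch of how a reader could recognise a standard Unicode BOM and decode the file using only stock library codecs. The function names `sniff_encoding` and `decode_text` are hypothetical, the fallback to UTF-8 when no BOM is present is an assumption rather than anything COMCIFS has decided, and only the UTF-8 and UTF-16 BOMs named above are considered.

```python
import codecs

# Standard Unicode BOMs for the two encodings named in option (1).
# Only UTF-8 and UTF-16 are handled; UTF-32 is deliberately out of scope
# (its little-endian BOM begins with the same bytes as the UTF-16 one).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),   # codec strips the BOM on decode
    (codecs.BOM_UTF16_LE, "utf-16"),  # codec reads byte order from the BOM
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a Python codec name chosen from the leading BOM, if any.

    The default used when no BOM is found is an assumption of this sketch.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return default


def decode_text(data: bytes) -> str:
    """Decode file contents to text, dropping any standard BOM."""
    return data.decode(sniff_encoding(data))
```

A non-standard marker or encoding would need equivalent logic to be written and maintained separately for every editor and tool, which is the adoption barrier described above.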
A custom format will also raise the likelihood of users using the wrong type of editor to adjust files, resulting in more syntactically incorrect files or foreign characters being corrupted by use of different encodings. Such errors will not help efforts to automatically store and curate crystallographic data.

In summary, I feel that creating a non-standard standard will impede the usage of the new files, and therefore the best choice is to use standard Unicode files.

Matt

In case my signature does not make it apparent, although I work for the CCDC and have been involved in the development of enCIFer, these views are entirely my own as a scientific software developer, and have not been endorsed nor approved by the CCDC.

LEGAL NOTICE
Unless expressly stated otherwise, information contained in this message is confidential. If this message is not intended for you, please inform postmaster@ccdc.cam.ac.uk and delete the message. The Cambridge Crystallographic Data Centre is a company Limited by Guarantee and a Registered Charity. Registered in England No. 2155347. Registered Charity No. 800579. Registered office 12 Union Road, Cambridge CB2 1EZ.
- References:
- Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains (James Hester)
- Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains (James Hester)
- Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains (Peter Murray-Rust)
- Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains (James Hester)
- Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains (Herbert J. Bernstein)
- Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains. (Brian McMahon)
- Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains. (Herbert J. Bernstein)