Re: [Cif2-encoding] Addressing Simon's concerns
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] Addressing Simon's concerns
- From: SIMON WESTRIP <simonwestrip@xxxxxxxxxxxxxx>
- Date: Wed, 29 Sep 2010 08:31:48 +0000 (GMT)
- In-Reply-To: <AANLkTin1J+CdNDN8_S-7N6UN+K9EWOgdUfNb7WwJcq9H@mail.gmail.com>
- References: <AANLkTin1J+CdNDN8_S-7N6UN+K9EWOgdUfNb7WwJcq9H@mail.gmail.com>
Dear James
I was *not* suggesting that "We would alienate biologists if they are unable to submit manuscripts in their native language".
Rather, I would like them to start working with mmCIF and feel happy to do so, whether it be one of the 60000+ mmCIFs in
the PDB archives, or one of the future mmCIF2s (if such a thing comes into being). So I concluded that the less difference between
CIF1 and CIF2 as perceived by the user, the better.
This particular message was an attempt to explain why I had switched to a more flexible approach. To quote myself:
"Granted this may not be the most compelling argument in favour of 'any encoding', but recognizing the hurdles that
may have to be overcome once we move beyond ASCII whatever the CIF2 specification, I support 'any encoding'
as 'a means to an end'."
As it stands, the mmCIF user group is very different from the core CIF user group and the adoption of 'specified' encodings
by the PDB is likely to have far less impact on users than the adoption of 'specified' encodings by the IUCr. My use of
mmCIFs as an example was not intended to sway anyone - it just happened to be the consideration that prompted me to
request that I could switch my vote.
Cheers
Simon
From: James Hester <jamesrhester@gmail.com>
To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@iucr.org>
Sent: Wednesday, 29 September, 2010 3:16:07
Subject: [Cif2-encoding] Addressing Simon's concerns
Some comments on Simon's concerns as raised last week:
Simon wrote:
I have been developing a new docx template which the IUCr editorial office is shortly to release for use by
authors. The template will be packaged with some tools to extract data from CIFs
and tabulate them in the Word document, e.g. open an mmCIF, click a button, and standard
tables populated with data from the CIF will be included in the document, acting as
table templates for the author to edit as appropriate for their manuscript.
Inclusion of the mmCIF tools is part of an unofficial policy to 'coax' biologists to start using/accepting mmCIF
as a useful medium, rather than as a product of their deposition to the PDB, and to encourage them to become comfortable
with passing mmCIFs between applications, and even to edit the things (in the same way as the core-CIF community
treats CIFs). For example, our perception is that there is no reason why an author should not feel free to take an mmCIF
that has been created by e.g. pdb_extract and populate it using third-party software before uploading to the PDB for
deposition.
This cause would not be furthered by effectively invalidating an mmCIF if it were not to be encoded in one of
the specified encodings.
This cause would also not be furthered if the PDB or other colleagues are unable to figure out what the symbols in the text were without hassling the provider of the CIF (think of a phone conversation "OK, try this, now send it again....no, what about trying this format and send it again...that works, except that the Greek characters don't come out...."). I think a rough equivalent of what you are saying is "We would alienate biologists if they are unable to submit manuscripts in their native language". However, scientists are used to making some extra effort in order to achieve international communication. Furthermore, are there any macromolecular data exchange formats that allow characters from the Unicode range to be interchanged reliably? Isn't there a carrot as well as a (small) stick here?
So although I am uneasy about a specification that propagates uncertainty, I'm also uneasy about alienating users,
especially when we are struggling to change their mindset as in the case of the biological community
(my perception of the biological community's attitude to mmCIF is based on feedback from authors/coeditors to
IUCr journals).
Granted this may not be the most compelling argument in favour of 'any encoding', but recognizing the hurdles that
may have to be overcome once we move beyond ASCII whatever the CIF2 specification, I support 'any encoding'
as 'a means to an end'.
Just to make sure that I understand, you are concerned that third party software may take a UTF8 mmCIF template provided by the IUCr and populate it with further information, and at some stage transcode to a different encoding. By mandating UTF8, we are therefore forcing biologists to jump through more hoops than they would otherwise have to. I don't see how that last sentence follows: surely those writing third party software are the ones that will be dealing with encoding issues, and as long as we say right now, at the beginning, that the encoding is UTF8/16, the software will be written as intended and biologists will be oblivious to the encoding.
all the best,
James.
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________ cif2-encoding mailing list cif2-encoding@iucr.org http://scripts.iucr.org/mailman/listinfo/cif2-encoding