Discussion List Archives

Re: [ddlm-group] UTF-8 BOM

Dear Herbert:

I'm puzzled by your opinion that 0xFEFF is not a valid Unicode code
point, as you were an active participant in recent discussions where
we were talking about the appearance of 0xFEFF in UTF8 CIF2 files.
Anyway:

The third message in this thread, from John Bollinger, discusses the
treatment of 0xFEFF, with reference to the Unicode standard.  I
recommend that you read that message, and particularly note the phrase
from the Unicode FAQ: "For backwards compatibility it should be
treated as ZERO WIDTH NON-BREAKING SPACE (ZWNBSP), and is then part of
the content of the file or string."  In a nutshell, it is up to us to
decide how to treat 0xFEFF in the decoded data stream, and you have
contributed to that particular discussion in the message below.  So:
voting on the treatment of 0xFEFF in the datastream is appropriate,
and from your contribution I infer that you wish to ignore it (Option
2(d))?
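The distinction at issue can be checked mechanically. Below is a minimal Python sketch, not part of any CIF specification. The function names `is_noncharacter` and `apply_feff_policy` and the policy labels are hypothetical. The first function tests membership in Unicode's 66 noncharacters: U+FDD0 to U+FDEF, plus the last two code points of each of the 17 planes. It shows that U+FFFE is a noncharacter, whereas U+FEFF is an assigned character. The second function applies the two treatments of U+FEFF in the decoded stream discussed here: keeping it as ZWNBSP content, or ignoring it as in Option 2(d).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+xxFFFE and U+xxFFFF in each of the 17 planes (34 more).
    return cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def apply_feff_policy(text: str, policy: str) -> str:
    """Apply a hypothetical CIF2 policy for U+FEFF in already-decoded text.

    "content": keep U+FEFF as ZERO WIDTH NO-BREAK SPACE, part of the data.
    "ignore":  drop every U+FEFF wherever it occurs (cf. Option 2(d)).
    """
    if policy == "content":
        return text
    if policy == "ignore":
        return text.replace("\ufeff", "")
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    print(is_noncharacter(0xFFFE))  # True
    print(is_noncharacter(0xFEFF))  # False
    value = "Zn\ufeffO"
    print(len(apply_feff_policy(value, "content")))  # 4
    print(len(apply_feff_policy(value, "ignore")))   # 3
```

Under the "ignore" policy, two data values that differ only by an embedded U+FEFF become indistinguishable. Under "content" they remain distinct, which matches the Unicode FAQ's wording that U+FEFF "is then part of the content of the file or string."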

Note also that http://www.unicode.org/faq/utf_bom.html states:

"A byte order mark (BOM) consists of the character code U+FEFF at the
beginning of a data stream, where it can be used as a signature
defining the byte order and encoding form, primarily of unmarked
plaintext files. Under some higher level protocols, use of a BOM may
be mandatory (or prohibited) in the Unicode data stream defined in
that protocol."
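To illustrate how U+FEFF serves as a signature, the following Python function (hypothetical name `sniff_bom`) compares the leading bytes of a stream with the byte sequences that U+FEFF takes in each Unicode encoding form. It only illustrates the FAQ's description and is not a CIF2 requirement.

```python
from typing import Optional, Tuple

# Byte sequences that U+FEFF takes in each Unicode encoding form.
# UTF-32 entries come first: the UTF-32LE signature (FF FE 00 00)
# begins with the UTF-16LE signature (FF FE).
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (codec name, BOM length) if data starts with a BOM, else (None, 0)."""
    for signature, codec in SIGNATURES:
        if data.startswith(signature):
            return codec, len(signature)
    return None, 0


if __name__ == "__main__":
    raw = "#\\#CIF2.0\n".encode("utf-8-sig")  # UTF-8 with a leading BOM
    codec, skip = sniff_bom(raw)
    print(codec, skip)               # utf-8 3
    print(raw[skip:].decode(codec))  # #\#CIF2.0
```

Byte-order detection works because the byte-swapped value U+FFFE is a noncharacter, as the code chart quoted below notes. Even so, the scheme is not fully unambiguous. A UTF-16LE stream whose first character is U+0000 also begins with FF FE 00 00.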

That is, a BOM is only recognised by the Unicode standard at the start
of a data stream. Voting option 3(b) is there due to your advocacy of
using a UCS-2 BOM as an encoding switch inside a data stream (because the
Unicode standard *does not* mandate that, we should explicitly state
this if this is what we want).
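Here is a minimal Python sketch of this position rule, assuming UTF-8 input (the function name `decode_utf8_stream` is hypothetical). Only a U+FEFF at the very start of the byte stream is treated as a BOM and dropped. Any later U+FEFF survives decoding as an ordinary character, to be handled by whatever policy CIF2 adopts. Python's built-in `utf-8-sig` codec behaves the same way when decoding.

```python
def decode_utf8_stream(data: bytes) -> str:
    """Decode UTF-8, treating U+FEFF as a BOM only at the start of the stream."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    if text.startswith("\ufeff"):
        text = text[1:]  # a leading U+FEFF is a BOM, not content
    # Any other U+FEFF is left in place as ordinary text (ZWNBSP).
    return text


if __name__ == "__main__":
    raw = "\ufeffdata_x\n_tag 'a\ufeffb'\n".encode("utf-8")
    text = decode_utf8_stream(raw)
    print(text.count("\ufeff"))             # 1
    print(text == raw.decode("utf-8-sig"))  # True
```

Because at most one leading U+FEFF is stripped, a file assembled with `cat` from BOM-headed pieces carries interior U+FEFF characters into its content. This is one of the concatenation hazards discussed later in the thread.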

On Fri, Jun 18, 2010 at 11:30 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> Sorry, you are mistaken.  What the code chart says is:
>
> Special
> FEFF    ZERO WIDTH NO-BREAK SPACE
>        = BYTE ORDER MARK (BOM), ZWNBSP
>         may be used to detect byte order by contrast
>         with the noncharacter code point FFFE
>          use as an indication of non-breaking is
>         deprecated; see 2060 instead
>         -> 200B zero width space
>         -> 2060 word joiner
>         -> FFFE <not a character>
>
> So, under the latest version of Unicode, the use you are describing
> is deprecated. The Unicode Consortium has returned the character to what
> it originally was -- the BOM, which is not a character, and I intend
> to process it that way, not in the very odd way that some people followed
> for a few recent Unicode versions, which made no sense and has now been
> deprecated.
>
> In theory there could be old Unicode UTF-8 files somehow with stray FEFF
> characters in them as code points, but inasmuch as CIF2 is new, we are
> all spared the puzzlement of this non-problem: a noncharacter that
> became a strange character and is now again a noncharacter.
>
> In addition, if you read the bizarre discussions on FEFF when people
> were trying to use it as a code point instead of just stopping it
> at the text processing level, you will see that the only thing they
> could do with it was throw it away (that is what a zero width no-break
> space means).
>
> The _only_ fully compliant use for FEFF in the current standard is
> as a BOM, not as a valid code point, so it is not really an issue
> for CIF2 any more than FFFF or FFFE are, none of which should
> be delivered as code points in text processing.
>
> The set of options you have proposed is a false trichotomy.
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Fri, 18 Jun 2010, James Hester wrote:
>
>> I suggest you look again (perhaps you found 0xFFFE instead?).  Unicode
>> hexadecimal code point 0xFEFF is Zero Width Non-Breaking Space
>> (ZWNBSP).  Previous recent emails have discussed this at some length.
>>
>> On Fri, Jun 18, 2010 at 10:55 AM, Herbert J. Bernstein
>> <yaya@bernstein-plus-sons.com> wrote:
>>>
>>> Dear Colleagues,
>>>
>>>  As I said, I reject the false trichotomy presented, and vote to reject
>>> this binary approach to CIF2.  The question posed is what should be done
>>> if the Unicode code point 0xFEFF is encountered in the text stream.  FFFE
>>> is not a Unicode text character (I just checked the latest Unicode
>>> standard, and it is still not a character, explicitly called a
>>> "noncharacter"), so a properly functioning text system simply will not
>>> deliver it as text to an application, just as in older ASCII-based
>>> systems characters such as NUL and SYN are stripped before delivery of
>>> text to an application.
>>>
>>>  Regards,
>>>    Herbert
>>>
>>> On Fri, 18 Jun 2010, James Hester wrote:
>>>
>>>> Herbert and Simon: regardless of your concerns about what encodings
>>>> should be acceptable for CIF2, I would invite you to vote on the
>>>> treatment of Unicode code point 0xFEFF when encountered in the decoded
>>>> text stream.  If you think an initial BOM should not be part of the
>>>> decoded text, then you are deciding how to treat code point 0xFEFF as
>>>> the first character in a CIF2 file, and the only consistent stance
>>>> would be that such a file is non-conformant, as the magic number
>>>> convention is violated.
>>>>
>>>> On Thu, Jun 17, 2010 at 9:21 PM, SIMON WESTRIP
>>>> <simonwestrip@btinternet.com> wrote:
>>>>>
>>>>> Dear all
>>>>>
>>>>> I've been watching this thread with the viewpoint that whatever is
>>>>> decided for the spec, I am going to have to be aware that CIFs may
>>>>> contain mixed encodings or an encoding other than the one specified.
>>>>> We meet this situation elsewhere, especially with text uploaded from
>>>>> web forms.
>>>>>
>>>>> So I quite like Herbert's latest description and would prefer to hold
>>>>> back from voting until I've considered this in more detail.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Simon
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>>>>> To: Group finalising DDLm and associated dictionaries
>>>>> <ddlm-group@iucr.org>
>>>>> Sent: Wednesday, 16 June, 2010 12:29:41
>>>>> Subject: Re: [ddlm-group] UTF-8 BOM
>>>>>
>>>>> Dear Colleagues,
>>>>>
>>>>>   As I said in my last message, I am proposing that we do what
>>>>> most of the world really does with Unicode -- treat a CIF2 as
>>>>> a text file in which the information presented is a sequence
>>>>> of valid printable Unicode code points, no matter what the
>>>>> encoding.
>>>>>
>>>>>   For convenience in interchange, I am proposing that all
>>>>> CIF2 processing software working on systems that provide
>>>>> support for UTF-8 must provide support for that particular
>>>>> encoding, but if someone happens to be working in a system
>>>>> that only supports UTF-7 or UTF-16 or an old code-page-based
>>>>> encoding then I see no reason to declare what they produce
>>>>> erroneous in any way -- just a reason to require that they
>>>>> clearly identify the encoding used so that one of the
>>>>> many reliable encoding conversion programs that are available
>>>>> may be passed over their file when it needs to be handled
>>>>> in the preferred encoding.  I happen to use Cyclone on my
>>>>> Mac for that purpose.
>>>>>
>>>>>   The use of a BOM is just a quick, simple way to clearly
>>>>> specify an encoding when a text file is a Unicode file, but
>>>>> it really is not part of the text itself.
>>>>>
>>>>>   I, the strong proponent of supporting binary with CIF,
>>>>> am proposing that we return to the original approach
>>>>> to CIF -- that it really is a text file, not a binary file.
>>>>> I do so precisely to help me support the handling of
>>>>> binary with CIF.
>>>>>
>>>>>   Regards,
>>>>>     Herbert
>>>>>
>>>>> On Wed, 16 Jun 2010, James Hester wrote:
>>>>>
>>>>>> Dear Herbert,
>>>>>>
>>>>>> Would you mind enlarging a little on what you are responding to here,
>>>>>> as I don't follow your thinking.
>>>>>> Perhaps I was not clear: I am not in favour of allowing a variety of
>>>>>> encodings to be included within the CIF2 standard.  I am advocating
>>>>>> UTF8 only.  Is this what you are responding to, or are you discussing
>>>>>> the suggestion of allowing a variety of encodings?
>>>>>>
>>>>>> On Wed, Jun 16, 2010 at 12:33 PM, Herbert J. Bernstein
>>>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>>>
>>>>>>> Dear Colleagues,
>>>>>>>
>>>>>>>  This is quite a disruptive change.  Until now CIF has always
>>>>>>> assumed machine-dependent encoding changes.  I am in favor of
>>>>>>> working the entire world towards a common representation of text,
>>>>>>> and the use of multiple Unicode representations supported on
>>>>>>> current systems is going to be a large positive step.  I think
>>>>>>> it is a little premature (by about 10 years) to assume a
>>>>>>> world of UTF-8 purity.  We ain't there yet.
>>>>>>>
>>>>>>>  You are essentially making CIF2 into a binary format instead
>>>>>>> of a text format.  That is a truly disruptive change.  I think
>>>>>>> it is a serious mistake that will discourage use of CIF as an
>>>>>>> interchange format, not encourage it.
>>>>>>>
>>>>>>>  Regards,
>>>>>>>    Herbert
>>>>>>>
>>>>>>> On Wed, 16 Jun 2010, James Hester wrote:
>>>>>>>
>>>>>>>> My concern with opening up the suite of possible CIF encodings is
>>>>>>>> that we need to maintain a guarantee that any CIF2-conformant writer
>>>>>>>> will produce files that any CIF2-conformant reader can read.  As we
>>>>>>>> are a data transfer and archiving standard, this is a core guarantee
>>>>>>>> that we make, so we cannot specify optional behaviour.  Note that we
>>>>>>>> are not restricted to someone transferring files between computers at
>>>>>>>> a single point in time, when some negotiation of encoding protocol
>>>>>>>> could take place; we may be talking about a third party retrieving a
>>>>>>>> file archived some years ago by someone else in the local university
>>>>>>>> repository.
>>>>>>>>
>>>>>>>> What people are and have always been free to do is to encapsulate and
>>>>>>>> encode CIFs in whatever way they wish, as long as the result is not
>>>>>>>> touted as being 'CIF2 conformant'.  The optional UTF8 BOM that we have
>>>>>>>> more or less agreed to is purely in deference to poorly-written text
>>>>>>>> editors, rather than an encoding signature as such.
>>>>>>>>
>>>>>>>> On Tue, Jun 15, 2010 at 6:09 AM, Bollinger, John C
>>>>>>>> <John.Bollinger@stjude.org> wrote:
>>>>>>>>      On Monday, June 14, 2010 9:26 AM, Brian McMahon wrote:
>>>>>>>>
>>>>>>>>      >I'm coming to this late, I fear, but I would prefer that the spec
>>>>>>>>      >be kept as simple as possible. I note the following comments in
>>>>>>>>      >the Unicode FAQ document referenced by John B
>>>>>>>>      >(http://www.unicode.org/faq/utf_bom.html):
>>>>>>>>      >
>>>>>>>>      >    "Where UTF-8 is used transparently in 8-bit environments, the use
>>>>>>>>      >    of a BOM will interfere with any protocol or file format that expects
>>>>>>>>      >    specific ASCII characters at the beginning, such as the use of "#!"
>>>>>>>>      >    of at the beginning of Unix shell scripts."
>>>>>>>>
>>>>>>>> Well yes, but that applies to protocols defined in terms of 8-bit,
>>>>>>>> ASCII-derived character sets ("8-bit environments").  It does not
>>>>>>>> argue for BOMs to be forbidden in Unicode environments such as CIF2.
>>>>>>>>  Of course, neither does it require that BOMs be accepted or
>>>>>>>> recognized in Unicode environments.
>>>>>>>>
>>>>>>>>>    "In the absence of a protocol supporting its use as a BOM and when
>>>>>>>>>    not at the beginning of a text stream, U+FEFF should normally not
>>>>>>>>>    occur."
>>>>>>>>
>>>>>>>> I'm disappointed that you truncated the quote there.  It continues
>>>>>>>> with "For backwards compatibility it should be treated as ZERO WIDTH
>>>>>>>> NON-BREAKING SPACE (ZWNBSP), and is then part of the content of the
>>>>>>>> file or string."  It goes on to advocate using U+2060 instead, and (in
>>>>>>>> the interest of full disclosure) it closes by commenting that a
>>>>>>>> language or protocol can specify that U+FEFF is unsupported in the
>>>>>>>> middle of a file.
>>>>>>>>
>>>>>>>>> I suggest the CIF specification deprecate the use of U+FEFF so that
>>>>>>>>> *any* occurrence of it be treated formally as an error. However, a
>>>>>>>>> note should acknowledge that U+FEFF is permitted according to the
>>>>>>>>> Unicode standard at the start of a data stream, and that therefore a
>>>>>>>>> CIF reading application may at its discretion accept U+FEFF followed
>>>>>>>>> by #\#CIF2.0 as a valid magic number at the start of a file.
>>>>>>>>
>>>>>>>> I don't see what is gained by forbidding U+FEFF from appearing inside
>>>>>>>> data values, where one might arrive via any number of innocent means.
>>>>>>>> As it currently stands, the draft permits this.  It is somewhat
>>>>>>>> problematic to allow it at the beginning or end of a
>>>>>>>> whitespace-delimited value, but U+FEFF is by no means the only
>>>>>>>> character that is allowed but problematic at such a position.
>>>>>>>>
>>>>>>>> On the other hand, it is viable to specify that CIF itself does not
>>>>>>>> (directly) include a BOM.  That's where we started.  (Pedantic note:
>>>>>>>> "initial BOM" is redundant.  As the term is used in relation to
>>>>>>>> Unicode, a BOM necessarily appears at the beginning of a data stream;
>>>>>>>> anywhere else, U+FEFF is just U+FEFF.)  If CIF does not formally
>>>>>>>> allow a BOM, then an otherwise well-formed CIF stream headed by a BOM
>>>>>>>> would need to be interpreted either
>>>>>>>>
>>>>>>>> 1) as an unrecognized file, or
>>>>>>>>
>>>>>>>> 2) as an ill-formed CIF, or
>>>>>>>>
>>>>>>>> 3) as a well-formed CIF (any version) encapsulated in another
>>>>>>>> protocol.  Such "another protocol" does not need to be the concern of
>>>>>>>> CIF.
>>>>>>>>
>>>>>>>>> The idea is that any fully-conformant CIF writer will never write an
>>>>>>>>> initial UTF-8 BOM, and so any software designed to handle only fully
>>>>>>>>> conformant CIFs will not be troubled by it.
>>>>>>>>
>>>>>>>> I could live with that.  I can't imagine writing a CIF processor
>>>>>>>> limited to that mode of operation, nor would I want to use one, but I
>>>>>>>> can handle CIF's formal scope being limited in that way.
>>>>>>>>
>>>>>>>> In that case, however, let's carry it to the logical conclusion.
>>>>>>>> Rather than put one particular encoding detail outside CIF's scope,
>>>>>>>> why not put character encoding out of scope altogether?  CIF can
>>>>>>>> easily be defined simply in terms of "Unicode characters".  Perhaps
>>>>>>>> instead of anointing UTF-8 as the One True Encoding for CIF, it would
>>>>>>>> be better to make encoding an entirely separate concern.
>>>>>>>>
>>>>>>>> Practically speaking, you're going to have that anyway.  Even
>>>>>>>> disregarding imgCIF, does anyone really expect never to hear "it's a
>>>>>>>> CIF, except encoded in <FOO-13> instead of UTF-8"?  Does anyone really
>>>>>>>> think they need the authority of the CIF specification to require that
>>>>>>>> CIFs be delivered to them in a particular encoding?  How is that
>>>>>>>> qualitatively different from requiring particular CIF content, as most
>>>>>>>> programs do?
>>>>>>>>
>>>>>>>>> Of course the world does contain CIFs created other than by
>>>>>>>>> fully-conformant CIF writers. To an extent the community should
>>>>>>>>> decide for itself how best to attempt to handle deviations from full
>>>>>>>>> conformance. It would help, perhaps, if those of us writing CIF
>>>>>>>>> readers would document specific practices that the software takes to
>>>>>>>>> accommodate such deviations. Ideally, such software should have a
>>>>>>>>> verbose logging mode that can be activated whenever surprising
>>>>>>>>> behaviour in reading CIFs is encountered by the user.
>>>>>>>>
>>>>>>>> I think it's exceedingly optimistic to expect "the community" to
>>>>>>>> arrive at and abide by a single, consistent set of best practices.
>>>>>>>> The best you can hope for is that a small number of organizations
>>>>>>>> and/or programs will exert enough influence to establish their own de
>>>>>>>> facto standards.
>>>>>>>>
>>>>>>>> We can exert some influence there, however.  Either the CIF spec or a
>>>>>>>> companion spec could establish conformance requirements for CIF
>>>>>>>> *processors*, including, for example, the ability to diagnose
>>>>>>>> particular malformations.  The XML spec does this, as do some
>>>>>>>> programming language specs.
>>>>>>>>
>>>>>>>> Such a document could also establish, perhaps, that CIF processors
>>>>>>>> must be able to accept the UTF-8 encoding, and maybe even that they
>>>>>>>> must assume UTF-8 by default.  That would establish the baseline and a
>>>>>>>> guaranteed interoperability mode that we would otherwise lose by
>>>>>>>> pushing character encoding outside the format specification.
>>>>>>>>
>>>>>>>>> Notice that naive concatenation of CIFs will remain a bad idea for
>>>>>>>>> all sorts of reasons - beyond the purely syntactic issues, one will
>>>>>>>>> get multiple "data_TOZ" declarations for example. Undoubtedly this
>>>>>>>>> will continue to happen, but perhaps increasing the number of
>>>>>>>>> occasions when blindly concatenating files triggers software errors
>>>>>>>>> will help to raise awareness and/or the use of better software
>>>>>>>>> tools.
>>>>>>>>
>>>>>>>> You are preaching to the choir with that as far as I am concerned.
>>>>>>>> It has never been altogether safe or reliable to assemble CIFs by
>>>>>>>> concatenation of fragments or complete CIFs, and I don't see why CIF2
>>>>>>>> needs to make special accommodation for behavior that was never
>>>>>>>> correct in the first place.  No matter what treatment is chosen for
>>>>>>>> U+FEFF, people who exercise due care will still be able to assemble
>>>>>>>> well-formed CIF2 files from fragments, even by using 'cat' if they do
>>>>>>>> so shrewdly.
>>>>>>>>
>>>>>>>> John
>>>>>>>> --
>>>>>>>> John C. Bollinger, Ph.D.
>>>>>>>> Department of Structural Biology
>>>>>>>> St. Jude Children's Research Hospital
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ddlm-group mailing list
>>>>>>>> ddlm-group@iucr.org
>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> T +61 (02) 9717 9907
>>>>>>>> F +61 (02) 9717 3145
>>>>>>>> M +61 (04) 0249 4148
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
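Two points in the thread reduce to a single switch in a reader. Brian McMahon suggests that a reader may, at its discretion, accept U+FEFF followed by #\#CIF2.0 as a valid magic number. James Hester observes that, without such an allowance, a leading U+FEFF violates the magic number convention. Below is a minimal Python sketch. The function name and the `allow_bom` flag are hypothetical. The magic string is copied as written in this thread, so check the current CIF2 specification for its definitive form.

```python
# Magic code as written in this thread (check the CIF2 specification).
CIF2_MAGIC = "#\\#CIF2.0"


def has_cif2_magic(text: str, allow_bom: bool = False) -> bool:
    """Return True if decoded text begins with the CIF2 magic code.

    With allow_bom=True, a single leading U+FEFF is tolerated, as a
    discretionary reader might do; otherwise it makes the check fail.
    """
    if allow_bom and text.startswith("\ufeff"):
        text = text[1:]
    return text.startswith(CIF2_MAGIC)


if __name__ == "__main__":
    headed = "\ufeff#\\#CIF2.0\ndata_x\n"
    print(has_cif2_magic(headed))                  # False
    print(has_cif2_magic(headed, allow_bom=True))  # True
```

A strictly conformant reader would leave `allow_bom` off. A lenient one would turn it on and, ideally, log that it did so, in the spirit of the verbose logging mode suggested in the thread.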





International Union of Crystallography
