From gsinai@yudit.org Sun Feb  3 13:51:32 2002 +0900
Date: Sun, 3 Feb 2002 11:41:11 +0900 (JST)
From: Gaspar Sinai <gsinai@yudit.org>
To: Unicode List <unicode@unicode.org>
Subject: Unicode and Security
Message-ID: <Pine.LNX.4.33.0202031140220.11992-100000@macska.yudit.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Unicode and Security

I would like to start a series of discussions about
the security aspects of Unicode.

I would also like to know your opinion about the
need to create another, or an 'intermediate', standard.

I have many issues on my mind; security is
the top one.

With the introduction of digital signatures, security
will become a very important aspect of character
encoding.

Is Unicode secure? What character standards can be
considered secure?

I ran into the following problems, where Unicode could
not be used because of security issues. In every case
the signer of a document can be misled about the
wording of the document he/she is about to sign: what
is displayed differs from what is actually signed.

How can this happen? These are the problems I found:

1. Character Order Problem

   The BIDI algorithm is too complex and not reversible.
   I could create a BIDI document using only the RLO, LRO
   and PDF characters for which Word, Java and KDE
   produced different word orderings. I don't have access
   to an MS platform now to reproduce this, but as far as
   I can tell it was like:

    <RLO>text1<PDF>U+0020<RLO>text2<PDF>

   Because the BIDI algorithm is so complex and vague,
   it can be argued that all of these programs displayed
   the text correctly, yet differently:

      text1 text2
      text2 text1
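
   One defensive measure a signing application might take
   (an assumption on my part, not something the standard
   prescribes) is to list the explicit directional controls
   for the signer instead of silently applying them. A
   minimal Python sketch using only the standard unicodedata
   module; find_bidi_controls is a hypothetical name:

```python
import unicodedata

# Bidi classes of the explicit embedding, override and isolate controls
# (U+202A..U+202E and U+2066..U+2069), as reported by unicodedata.
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF",
                         "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(text):
    """Hypothetical helper: return (index, code point, name) for each
    explicit bidi control, so a signer can inspect them. It does not
    decide which display order is the 'right' one."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES:
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "?")))
    return found


# The sequence from the example: <RLO>text1<PDF>U+0020<RLO>text2<PDF>
sample = "\u202Etext1\u202C \u202Etext2\u202C"

for index, code, name in find_bidi_controls(sample):
    print(index, code, name)
```

   For the sample it reports the two RLO and two PDF
   characters at positions 0, 6, 8 and 14. Flagging them
   does not settle which display order is correct; it only
   makes the ambiguity visible before signing.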

2. Character Shape Problem

   I got different character shapes because of:
   a) Ligatures
      In complex scripts, Devanagari for instance, the
      ZERO WIDTH JOINER should be used to prevent ligature
      formation while still joining the characters normally.

      Whether a ligature actually forms or not is
      completely up to the font: if the font has the
      ligature, it will be formed. The standard does not
      define all the compulsory ligatures.

      I even considered putting a ZERO WIDTH JOINER after
      each character. But why do we have a ZERO WIDTH
      JOINER at all? I think a ZERO WIDTH LIGATURE FORMER
      would be better; then at least I would know that a
      ligature may appear at that point.

   b) Hidden Marks
      It is possible to make a combining mark, such as a
      negation mark, appear inside the base character's
      body, making it invisible. It is nearly impossible
      to test the rendering engine for all possible
      combinations.
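
   Since the rendered glyphs cannot be trusted, a signing
   tool could also show the underlying characters by name.
   The sketch below (a hypothetical describe_clusters,
   Python standard library only) groups each base character
   with the combining marks and format controls that follow
   it. The grouping is a deliberate simplification, not the
   full grapheme cluster rules of UAX #29, and the sample
   strings are my own illustrations:

```python
import unicodedata


def describe_clusters(text):
    """Hypothetical helper: group each base character with the marks and
    format controls after it, and return the characters by name.

    Simplified: a "cluster" is any character plus the following characters
    of category Mn, Mc, Me (combining marks) or Cf (format controls such
    as ZERO WIDTH JOINER). This is NOT the UAX #29 grapheme algorithm.
    """
    clusters = []
    for ch in text:
        attached = unicodedata.category(ch) in ("Mn", "Mc", "Me", "Cf")
        if attached and clusters:
            clusters[-1].append(ch)
        else:
            clusters.append([ch])
    return [[unicodedata.name(c, f"U+{ord(c):04X}") for c in cluster]
            for cluster in clusters]


# Devanagari KA + VIRAMA + SSA: whether a conjunct ligature appears is up to the font.
plain = "\u0915\u094D\u0937"
# The same with ZERO WIDTH JOINER after the virama (typically requests a half-form of KA).
with_zwj = "\u0915\u094D\u200D\u0937"
# A base letter carrying an overlay mark that a font might hide inside the glyph body.
overlaid = "O\u0338"

for s in (plain, with_zwj, overlaid):
    print(describe_clusters(s))

# Normalization keeps the joiner, so the two Devanagari strings stay distinct.
print(unicodedata.normalize("NFC", with_zwj) == unicodedata.normalize("NFC", plain))
```

   The ZERO WIDTH JOINER and the overlay mark show up by
   name even when a font renders them invisibly, and NFC
   normalization does not remove the joiner, so the two
   Devanagari strings remain different.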

3. Text Search Problem

   It is possible to create texts that look the same
   but cannot be found by searching, because even when
   fully decomposed and canonically ordered they are
   different.
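
   The distinction can be sketched in Python with
   unicodedata.normalize: canonically equivalent strings
   become identical after NFD, but look-alike letters from
   different scripts never do. The helper names and the
   Cyrillic example are illustrative assumptions:

```python
import unicodedata


def canonically_equal(a, b):
    """Hypothetical helper: True if a and b are equal after NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equal(a, b):
    """Hypothetical helper: True if a and b are equal after NFKD."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# Precomposed e-acute versus e + COMBINING ACUTE ACCENT: normalization unifies them.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Latin "a" versus CYRILLIC SMALL LETTER A (U+0430): usually identical glyphs,
# but no normalization form maps one to the other.
latin = "agree"
mixed = "\u0430gree"

print(canonically_equal(precomposed, decomposed))  # True
print(canonically_equal(latin, mixed))             # False
print(compatibility_equal(latin, mixed))           # False
print(mixed.find("agree"))                         # -1: a plain search misses it
```

   A search that normalizes both the query and the document
   still misses the Cyrillic variant; catching it would need
   a separate confusable-character mapping, which
   normalization does not provide.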

I am sure this is not a complete list, but these are the
things that concern me most at the moment.

Thank you for your attention
Gaspar
