This is what the HTML standard says:
Contents
The Document Character Set
Character entities
Human languages define a large number of text characters and human beings have invented a wide variety of systems for representing these characters in a computer. Unless proper precautions are taken, differing character representations may not be understood by user agents in all parts of the world.
The Document Character Set
To promote interoperability, SGML requires that each application (including HTML), as part of its definition, define its document character set. A document character set is a set of abstract characters (such as the Cyrillic letter "I", the Chinese character meaning "water", etc.) and a corresponding set of integer references to those characters. SGML considers a document to be a sequence of references in the document character set.
The document character set for HTML is the Universal Character Set (UCS) of [ISO10646]. This set is character-by-character equivalent to Unicode 2.0 ([UNICODE]). Both of these standards are updated from time to time with new characters and the amendments should be consulted at the respective Web sites.
In the current specification, references to ISO/IEC-10646 or Unicode imply the same document character set. However, the current document also refers to the Unicode specification for other issues such as the bidirectional text algorithm.
Conforming HTML user agents may receive or output a document, or represent a document internally, using any character encoding. A character encoding represents some subset of the document character set. Character encodings such as ISO-8859-1 (commonly referred to as "Latin-1" since it encodes most Western European languages), ISO-8859-5 (which supports Cyrillic), SHIFT_JIS (a Japanese encoding), and euc-jp (another Japanese encoding) save bandwidth by representing only slices of the document character set.
Thus, character encodings allow authors to work with a convenient subset of the document character set. Authors should not have to know anything about the underlying character encoding of the document or tool they are using; writing Japanese in a UTF-8 editor is as easy as writing Japanese in a JIS or SHIFT_JIS editor.
Character encodings also mean that authors are not required to enter a document's text in the form of references to the document character set. Requiring authors to work with such a large character encoding would be cumbersome and wasteful (although encodings such as UTF-8 that cover all of Unicode do exist).
To allow this convenience, conforming user agents must correctly map to [UNICODE] all characters in any character encodings ("charsets") they recognize (or behave as if they did). A list of recommended character encodings for various scripts and languages will be provided in a separate document.
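As a rough illustration of the two points above (an encoding covers only a slice of the document character set, and every recognized encoding maps back onto Unicode), here is a minimal Python sketch. It relies on the codecs shipped with Python's standard library; the helper name can_encode and the sample characters are chosen for this illustration only.

```python
# Sample characters mentioned in the text: a-ring, Cyrillic capital I, "water".
A_RING, CYRILLIC_I, WATER = "\u00e5", "\u0418", "\u6c34"


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Each legacy encoding represents only a slice of the document character set.
latin1_bytes = A_RING.encode("iso-8859-1")        # a single byte
cyrillic_bytes = CYRILLIC_I.encode("iso-8859-5")  # a single byte
water_bytes = WATER.encode("euc_jp")              # two bytes in EUC-JP

# Decoding with the matching charset recovers the same Unicode characters.
round_trip = (
    latin1_bytes.decode("iso-8859-1"),
    cyrillic_bytes.decode("iso-8859-5"),
    water_bytes.decode("euc_jp"),
)
```

ISO-8859-1 cannot represent the Cyrillic or Chinese character at all, while UTF-8 can encode all three at the cost of more bytes per character.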
How does a user agent know which character encoding has been used to encode a given document?
In many cases, before a Web server sends an HTML document over the Web, it tries to figure out the character encoding (by a variety of techniques such as examining the first few bytes of the file, checking its encoding against a database of known files and encodings, etc.). The server transmits the document and the name of the character encoding to the receiving user agent by way of the charset parameter of the HTTP "Content-Type" field. For example, the following HTTP header announces that the character encoding is "euc-jp".
Content-Type: text/html; charset=euc-jp
The value of the "charset" parameter must be the name of a "charset" as defined in [RFC2045].
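A receiving program might extract the "charset" parameter along the following lines. This is only a sketch: it borrows the MIME header parser from Python's standard email package rather than a dedicated HTTP library, and the function name charset_from_content_type is hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


declared = charset_from_content_type("text/html; charset=euc-jp")
missing = charset_from_content_type("text/html")
```

Here declared holds the charset name from the header, and missing is None because the server sent no charset parameter.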
Unfortunately, not all servers send information about the character encoding (even when the character encoding is different from the widely used ISO-8859-1 encoding). HTML therefore allows authors a way to tell user agents which character encoding has been used by specifying it explicitly in the document header with the META element. For example, to specify that the character encoding of the current document is "euc-jp", include the following META declaration:
<META http-equiv="Content-Type" Content="text/html; charset=euc-jp">
This mechanism has a notable limit: the user agent cannot interpret the META element to determine the character encoding if it doesn't already know the character encoding of the document. The META declaration must only be used when the character encoding is organized such that ASCII characters stand for themselves at least until the META element is parsed. In this case, conforming user agents must correctly interpret the META element.
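The limit described above can be seen in a simplified byte-level scan. The sketch below is not how any particular browser works; the regular expression, the 1024-byte limit, and the name sniff_meta_charset are assumptions made for illustration. It only finds the declaration when ASCII characters stand for themselves in the raw bytes.

```python
import re
from typing import Optional

# Looks for charset=NAME inside a META tag, working on raw bytes.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def sniff_meta_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Scan the first `limit` bytes for a META charset declaration."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None


page = '<META http-equiv="Content-Type" Content="text/html; charset=euc-jp">'

# EUC-JP keeps ASCII bytes unchanged, so the declaration can be read.
found = sniff_meta_charset(page.encode("euc_jp"))

# UTF-16 does not keep ASCII bytes contiguous, so the scan finds nothing.
not_found = sniff_meta_charset(page.encode("utf-16"))
```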
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
Explicit user action to override erroneous behavior.
An HTTP "charset" parameter in a "Content-Type" field.
A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
The "charset" attribute set for the A and LINK elements.
User agent heuristics and user settings. For example, user agents typically assume that in the absence of other indicators, the character encoding is ISO-8859-1. This assumption may lead to an unreadable presentation of certain documents.
In all cases, the value of the "charset" attribute or parameter must be the name of a "charset" as defined in [RFC2045].
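The priority order listed above could be expressed as a simple first-match lookup. This is a minimal sketch; the function name resolve_charset and its parameter names are hypothetical, and the ISO-8859-1 fallback stands in for whatever heuristics and user settings a real user agent applies.

```python
from typing import Optional


def resolve_charset(
    user_override: Optional[str] = None,
    http_charset: Optional[str] = None,
    meta_charset: Optional[str] = None,
    link_charset: Optional[str] = None,
    fallback: str = "iso-8859-1",
) -> str:
    """Return the first available charset, from highest priority to lowest."""
    candidates = (user_override, http_charset, meta_charset, link_charset)
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    # No explicit indicator: fall back to the assumed default.
    return fallback


# The HTTP parameter wins over a conflicting META declaration.
chosen = resolve_charset(http_charset="EUC-JP", meta_charset="shift_jis")
```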
If, for a specific application, it becomes necessary to refer to characters outside [ISO10646], characters should be assigned to a private zone to avoid conflicts with present or future versions of the standard. This is highly discouraged, however, for reasons of portability.
Note: Modern web servers can be configured with information about which document is using which character encoding. Webmasters should use these facilities but should take pains to configure the server properly.
Character entities
Your hardware and software configuration probably won't allow you to refer to all Unicode characters through simple input mechanisms, so SGML offers character encoding-independent mechanisms for specifying any character from the document character set:
Numeric character references (either decimal or hexadecimal form).
Named character references.
Numeric character references specify the integer reference of a Unicode character. A numeric character reference with the syntax &#D; refers to Unicode decimal character number D. A numeric character reference with the syntax &#xH; refers to Unicode hexadecimal character number H. The hexadecimal representation is a new SGML convention and is particularly useful since character standards use hexadecimal representations.
Here are some examples:
Entity &#229; refers to the letter "a" with a small circle above it, å (used, for example, in Norwegian).
Entity &#xE5; refers to the same character with the hexadecimal representation.
Entity &#1048; refers to the Cyrillic capital letter "I" (И).
Entity &#x6C34; refers to the Chinese character for water (水) with the hexadecimal representation.
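The relationship between a character and its numeric references can be checked with a few lines of Python. This is a small sketch: numeric_reference is a hypothetical helper, and html.unescape from the standard library is used only to resolve the references back to characters.

```python
import html


def numeric_reference(char: str, hexadecimal: bool = False) -> str:
    """Build the decimal (&#D;) or hexadecimal (&#xH;) reference for one character."""
    code = ord(char)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"


a_ring_dec = numeric_reference("\u00e5")                   # decimal form
a_ring_hex = numeric_reference("\u00e5", hexadecimal=True)  # hexadecimal form
cyrillic_i = numeric_reference("\u0418")
water_hex = numeric_reference("\u6c34", hexadecimal=True)

# Both forms resolve to the same Unicode character.
same_char = html.unescape(a_ring_dec) == html.unescape(a_ring_hex)
```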
To give authors a more intuitive way to refer to characters in the document character set, HTML offers a set of named character entities. Named character references replace integer references with symbolic names. The named entity &aring; refers to the same Unicode character as &#229;. There is no named entity for the Cyrillic capital letter "I". The full list of named character entities is included in this specification.
Four named character entities deserve special mention since they are frequently used to "escape" special characters. For text appearing as part of the content of an element, you should escape < as &lt; to avoid possible confusion with the beginning of a tag. The & character should be escaped as &amp; to avoid confusion with the beginning of an entity reference.
You should also escape & within attribute values since entity references are allowed within CDATA attribute values. In addition, you should escape > as &gt; to avoid problems with older user agents that incorrectly perceive this as the end of a tag when coming across this character in quoted attribute values.
Rather than worry about rules for quoting attribute values, it's often easier to encode any instance of " as &quot; and to always use " for quoting attribute values. Many people find it simpler to always escape these four characters in element content and attribute values:
"&amp;" to represent the & sign.
"&lt;" to represent the < sign.
"&gt;" to represent the > sign.
"&quot;" to represent the " mark.
Names of named character entities are case-sensitive. Thus, &Aring; refers to a different character (upper case A, ring) than &aring; (lower case a, ring).
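The escaping advice and the case sensitivity of entity names can both be tried out with Python's standard html module. The sketch below is illustrative only: the attribute helper is hypothetical, and note that html.escape with quote=True also escapes the apostrophe, which goes beyond the four characters discussed here.

```python
import html
from html.entities import name2codepoint


def attribute(name: str, value: str) -> str:
    """Build name="value" with the value escaped for a double-quoted attribute."""
    return f'{name}="{html.escape(value, quote=True)}"'


# Element content: escape &, < and >.
escaped_text = html.escape("1 < 2 & 3 > 2", quote=False)

# Attribute value: additionally escape the double quote mark.
escaped_attr = attribute("title", 'say "hi" & <wave>')

# Entity names differ only by case but name different characters.
upper = chr(name2codepoint["Aring"])
lower = chr(name2codepoint["aring"])
```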
Note: In SGML, it is possible to eliminate the final ";" after a numeric or named character reference in some cases (e.g., at a line break or directly before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ";" in all cases to avoid problems with user agents that require this character to be present.
