SoFunction
Updated on 2025-04-14

XML Easy Learning Manual (Good) Page 2/3


6. Strict formatting of XML

Having learned from the loose formatting of HTML, XML has insisted on being "well-formed" from the very beginning.
Let's first look at some HTML statements of a kind that can be seen everywhere:
1.
sample

2. <b><i>sample</b></i>

3. <td>sample</TD>

4. <font color=red>samplar</font>

In an XML document, every one of the statements above is a syntax error, because:
1. Every tag must have a corresponding end tag;
2. All XML tags must be properly nested;
3. All XML tags are case sensitive;
4. All attribute values must be enclosed in quotation marks ("");
So the correct way to write the above statements in XML is:
1.
sample
2. <b><i>sample</i></b>
3. <td>sample</td>
4. <font color="red">samplar</font>
In addition, XML tag names must follow these naming rules:
1. Names can contain letters, numbers, and other characters;
2. Names cannot start with a number or a punctuation character such as "-" or "."; a leading "_" (underscore) is allowed;
3. Names cannot start with the letters xml (or XML, Xml, and so on);
4. Names cannot contain spaces.
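The naming rules can be tried out against a real parser. The following is a minimal sketch in Python using the standard-library xml.etree.ElementTree module; the helper name is_accepted_name and the candidate names are made up for illustration. Note that this parser, in line with the XML 1.0 specification, accepts a name that starts with an underscore. The reserved xml prefix is not tested here.

```python
import xml.etree.ElementTree as ET

def is_accepted_name(name: str) -> bool:
    """Return True if the parser accepts `name` as an element name."""
    try:
        # Build a one-element document that uses the candidate name.
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

# Hypothetical candidate names, one per naming rule.
for candidate in ["title", "chapter2", "_note", "2chapter", "my title"]:
    print(candidate, is_accepted_name(candidate))
```

Names that start with a digit or contain a space are rejected, while letters, trailing digits, and a leading underscore are accepted.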
Any error in an XML document produces the same result: the page cannot be displayed. Browser developers have agreed to parse XML strictly, so even a minor error is reported. For example, take the document below, in which the start tag <email> has been changed to <Email>; open it directly in IE5 and you will get an error page:

<?xml version="1.0" encoding="GB2312"?>
<myfile>
<title>XML Easy Learning Manual</title>
<author>ajie</author>
<Email>ajie@</email>
<date>20010115</date>
</myfile>
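The same strictness can be observed outside the browser. Here is a small sketch, assuming Python and its standard-library xml.etree.ElementTree parser rather than IE5; the check helper is hypothetical, and the XML declaration is left out because the text is passed in as an already decoded string.

```python
import xml.etree.ElementTree as ET

# The document above, with the mismatched <Email> ... </email> pair.
BROKEN = """<myfile>
<title>XML Easy Learning Manual</title>
<author>ajie</author>
<Email>ajie@</email>
<date>20010115</date>
</myfile>"""

# Restoring the lowercase start tag makes the pair match again.
FIXED = BROKEN.replace("<Email>", "<email>")

def check(document: str) -> str:
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(document)
        return "well-formed"
    except ET.ParseError as error:
        return f"error: {error}"

print(check(BROKEN))
print(check(FIXED))
```

The first call reports a parse error for the mismatched tag, and the second reports that the corrected document is well-formed.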
7. More about XML
OK, you now know:
1. What XML is;
2. How XML relates to and differs from HTML and SGML;
3. Some simple applications of XML.
Congratulations! You are no longer a stranger to XML and are already at the forefront of network technology. The whole learning process doesn't seem very difficult :)
If you are interested in XML and want to learn more about its details and its practical applications, please continue to the next chapter: The concept of XML.
XML Easy Learning Manual (2) XML Concept
Chapter 2 XML Concept
Introduction
After the quick start in Chapter 1, you already know that XML is a language that lets you create your own tags. It separates data from formatting in web pages, and it can store and share data. These characteristics make XML extremely versatile. If you want to study XML in depth and understand it systematically, we must first return to the concept of XML itself. XML stands for Extensible Markup Language. "Extensible", "Markup", "Language": each word points to an important characteristic and function of XML. Let's analyze them carefully:
1. Extensibility
2. Markup
3. Language
4. Structure
5. Metadata
6. Display
7. DOM
1. Extensibility: with XML, you can create your own tags for your documents.
The first word in XML is "extensible", which is what makes XML powerful and flexible.
HTML has many fixed tags that we must remember and use, and you cannot use tags that are not in the HTML specification. In XML, you can create any tags you need. You can use your imagination to give the parts of your document easy-to-remember tag names. For example, if your document contains game strategy guides, you can create a tag called <game> and then, under <game>, create tags such as <RPG> and <SLG> for each game category. As long as they are clear and easy to understand, you can create as many tags as you like.
You may find this hard to get used to at first, because when we learn HTML there are fixed tags to learn and use directly (many people, including myself, built their own web pages while analyzing other people's code and tags). XML has no fixed tags to learn, and hardly any two documents use exactly the same tags. What should we do? If the tag you need doesn't exist, just create it yourself. Once you actually start writing XML documents, you will find it fun to create new tags as you like. You can create tags with your own character, or even build your own HTML-like language.
Extensibility gives you more choices and greater power, but it also means that you must learn to plan. You need to understand your document: what parts it consists of, how they relate to each other, and how to mark them up.
Note that a tag describes the type or characteristic of data, such as width <width>, age <age>, or name <name>, rather than the data content itself; <10pxl>, <18>, and <Zhang San> are useless tags. If you have worked with databases, you can think of a tag as a field name.
2. Markup: XML is used to mark up the elements in a document.
The second word in XML is "markup", which indicates that the purpose of XML is to mark up the elements in a document.
Whether in HTML or XML, the essence of markup is to make a document easy to understand. Without markup, your document is just one long string to the computer: every word looks the same and nothing stands out.
Markup makes your document easy to read and understand. You can divide it into paragraphs and set off headings. In XML, you can use extensibility to create tags that suit your document better.
One thing to keep in mind, however, is that a tag only marks up information; it does not convey information itself. For example, take HTML code like this:
<b>first step</b>
Here <b> means bold, and it only indicates that the text "first step" is displayed in bold. <b> itself contains no actual information, and you cannot see it on the page. The real information is "first step".
3. Language: to use XML you must follow a specific syntax when marking up your document.
The third word in XML is "language". This shows that, as a language, XML must follow certain rules. Although XML's extensibility lets you create new tags, it must still follow a specific structure and syntax and be clearly defined.
In computing, "language" often means a programming language used to implement functions and applications, but not every language is used for programming. XML is simply a language for defining tags and describing information.
Next, let's take a closer look at the principles behind XML. It may be dry, but it is very important for the overall picture. You can skim it first to form a rough idea; the details will sink in gradually with practice.
4. Structure: XML encourages structured documents, in which all information is arranged according to defined relationships.
"Structured" sounds abstract. Think of it this way: structure is a framework for your document, like the outline you write before writing an article. Structure keeps your document orderly, with every part closely linked to form a whole.
There are two principles for structuring:
1. Each part (each element) is related to other elements. The chain of relationships forms the structure.
2. The meaning of a tag is separate from the information it describes.
Let's look at a simple example to help understand:
<?xml version="1.0" encoding="GB2312"?>
<myfile>
<title>XML Easy Learning Manual</title>
<chapter>XML Quick Start
<para>What is XML</para>
<para>The benefits of using XML</para>
</chapter>
<chapter>XML concept
<para>Extensibility</para>
<para>Markup</para>
</chapter>
</myfile>
This is an XML description of this article. You can see that the tags are related on three levels, which makes the structure very clear:
<myfile>
<chapter>
<para>
...
</para>
</chapter>
</myfile>
We also call this document structure a "document tree". The trunk is the parent element, such as <myfile>, and the branches and leaves are child elements, such as <chapter> and <para>.
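To make the tree concrete, here is a rough sketch in Python (standard-library xml.etree.ElementTree) that walks a trimmed copy of the example document and prints one indented line per element. The outline function is a made-up helper, and the document text is shortened for illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the example document.
DOCUMENT = """<myfile>
<title>XML Easy Learning Manual</title>
<chapter>XML Quick Start
<para>What is XML</para>
<para>The benefits of using XML</para>
</chapter>
<chapter>XML concept
<para>Extensibility</para>
</chapter>
</myfile>"""

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOCUMENT)
print("\n".join(outline(root)))
```

Each level of indentation in the printed outline corresponds to one level of the tree: <myfile> at the top, <title> and <chapter> beneath it, and <para> beneath <chapter>.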
5. Metadata: professional XML users work with metadata.
In HTML, we know that meta tags can define a page's keywords, description, and so on. These tags are not displayed in the page, but they can be read by search engines and affect the ordering of search results.
XML deepens and extends this principle. With XML, you can describe where your information is located. You can use metadata to verify information, perform searches, force a display, or process other data.
Here are some uses of XML metadata in practical applications:
1. Digital signatures can be verified, so that submissions in online business are valid.
2. Documents can be indexed easily and searched more efficiently.
3. Data can be transferred between different languages.
The W3C is working on a metadata framework called RDF (Resource Description Framework), which can exchange information automatically. The W3C claims that RDF combined with digital signatures will enable "real and trustworthy" e-commerce on the network.
6. Display
XML alone cannot display a page. We use formatting technologies, such as CSS or XSL, to display documents created with XML tags.
In the first chapter, we said that XML separates data from formatting. An XML document does not know how it should be displayed, so auxiliary files are needed. (XML does away with all predefined tags, including style-related ones such as font, color, and p, so XML defines document styles in a way similar to CSS in DHTML.) The file types used to set display styles for XML are:

XSL: The full name of XSL is Extensible Stylesheet Language, and it is intended to be the main file type for designing the display style of XML documents in the future. It is itself based on XML. With XSL, you can set the display style of a document flexibly, and the document will adapt automatically to any browser or PDA (handheld computer).
XSL can also convert XML into HTML, so that old browsers can display XML documents too.

CSS: Everyone is familiar with CSS. The full name is Cascading Style Sheets, and it is currently the main way to display XML documents in a browser.

Behaviors: Behaviors have not become a standard yet. They are a feature unique to Microsoft's IE browser, and they can attach some interesting actions to XML tags.
7. DOM
The full name of DOM is Document Object Model. What is the DOM used for? If you treat your document as an object, the DOM is the standard for how to manipulate and control that object, whether the document is HTML or XML.
Object-oriented thinking has become very popular and is used in programming languages such as Java and JavaScript. With XML, we treat the web page as an object to manipulate and control, and we can create our own objects and templates. To communicate with objects, you use an API to issue commands to them. API stands for Application Programming Interface: the rules for accessing and manipulating objects. The DOM is an API that describes the HTML/XML document object in detail. It specifies the naming conventions, programming model, communication rules, and so on for HTML/XML document objects. In an XML document, we can think of each element as an object with its own name and attributes.
XML creates the tags, and the purpose of the DOM is to tell scripts how to manipulate and display those tags in the browser window.
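The idea of treating each element as an object can be sketched outside the browser as well. The example below assumes Python's standard-library xml.dom.minidom module, which implements a subset of the W3C DOM; in a browser the same kind of calls would be made from a script. The sample document is a shortened version of the one used earlier in this manual.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<myfile><title>XML Easy Learning Manual</title>"
    "<author>ajie</author></myfile>"
)

# Every element is an object with a name and child nodes.
root = document.documentElement
title = document.getElementsByTagName("title")[0]
print(root.tagName)
print(title.firstChild.data)

# The same API can also change the document: add a <date> element.
date = document.createElement("date")
date.appendChild(document.createTextNode("20010115"))
root.appendChild(date)
print(root.toxml())
```

Reading a tag name, reading the text inside an element, and appending a new element all go through the same object interface, which is the point of having a DOM.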
Above we have briefly explained some basic principles of XML. Now let's look at how these parts relate to each other and how they work together:

[Figure: how XML, CSS, scripts, and the DOM work together]

XML describes the data type. For example: "King learn" is a title element;
CSS stores and controls the display style of elements. For example: the title will be displayed in an 18pt font;
The script controls how elements behave. For example: when a title element is "out of stock", it is displayed in red;
The DOM provides a common platform for communication between scripts and objects, and displays the results in the browser window.
If an error occurs in any part, the correct result will not be obtained.
OK, by now we have a general picture of how XML works. After this chapter, you may feel that XML leans toward data processing and is easier for programmers to learn. That is in fact the case: XML was designed to make it easy to share and exchange data. In the next chapter, we will go through the terminology of XML systematically. Please read on.
XML Easy Learning Manual (3) Terms of XML
Chapter 3 XML Terms
outline:

Introduction
1. Related terms for XML documents
2. Related terms of DTD
Introduction

The hardest part of getting started with XML is the large number of new terms to understand. XML is itself a brand-new technology that is constantly developing and changing, and organizations and major network companies (Microsoft, IBM, SUN, etc.) keep introducing their own ideas and standards, so it is no surprise that new concepts are flying everywhere. There is no authoritative institution in China that officially names these terms. Most Chinese textbooks about XML translate them according to the author's own understanding. Some translations are right and some are wrong, which further hinders our understanding of these concepts.
The explanations of XML terms below are likewise the author's own understanding and translation. Ajie bases them on the XML 1.0 specification and related official documents published by the W3C, so they should be essentially correct, or at least not wrong. If you want to read further, I have listed the sources and links of the relevant resources in the last part of this article. OK, let's get to the point:
1. Related terms for XML documents
What is an XML document? If you know what an HTML source file is, you already have the idea: an XML document is a source file written with XML tags. An XML document is also a plain ASCII text file, which you can create and edit with Notepad. The file extension of an XML document is .xml. You can open an .xml file directly in IE 5.0 or later, but what you see is the XML source code; the page content is not rendered. You can save the following code to a file and try it:

<?xml version="1.0" encoding="GB2312"?>
<myfile>
<title>XML Easy Learning Manual</title>
<author>ajie</author>
<email>ajie@</email>
<date>20010115</date>
</myfile>

An XML document contains three parts:
1. An XML declaration;
2. A document type definition;
3. Content marked up with XML tags.

For example:
<?xml version="1.0"?>
<!DOCTYPE filelist SYSTEM "">
<filelist>
<myfile>
<title>QUICK START OF XML</title>
<author>ajie</author>
</myfile>
......
</filelist>
The first line, <?xml version="1.0"?>, is the XML declaration. The second line states that the document uses a DTD file to define its document type. From the third line onward is the main content.
Let's learn the terms related to XML documents:

Element:
You already know elements from HTML, where they are the smallest units that make up a document; the same is true in XML. An element is defined by tags and consists of the start tag, the end tag, and the content between them, like this: <author>ajie</author>

The only difference is that in HTML the tags are fixed, while in XML you create the tags yourself.

Tag:
Tags are used to define elements. In XML, tags must appear in pairs surrounding the data between them. The name of the tag is the same as the name of the element. For example, in this element:
<author>ajie</author>
<author> is the tag.

Attribute:
What is an attribute? Look at this HTML code: <font color="red">word</font>. Here color is one of the attributes of font.
An attribute describes a tag further. A tag can have several attributes; for example, font also has a size attribute. Attributes in XML work the same way as in HTML. Each attribute has its own name and value, and the attribute is part of the tag. For example:
<author sex="female">ajie</author>
Attributes in XML are also defined by you. We recommend that you avoid attributes where possible and turn them into child elements instead. For example, the code above can be changed to this:
<author>ajie
<sex>female</sex>
</author>
The reason is that attributes are hard to extend and awkward for programs to manipulate.
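The two spellings can be compared from a program's point of view. This is only an illustrative sketch using Python's standard-library xml.etree.ElementTree; the variable names are made up, and the child-element version is written on one line for brevity.

```python
import xml.etree.ElementTree as ET

# The same fact stored as an attribute and as a child element.
with_attribute = ET.fromstring('<author sex="female">ajie</author>')
with_child = ET.fromstring("<author>ajie<sex>female</sex></author>")

# An attribute is a single name/value pair attached to the tag.
print(with_attribute.get("sex"))

# A child element is a full element of its own in the document tree.
print(with_child.findtext("sex"))
print(len(with_attribute), len(with_child))
```

Both forms carry the value "female", but only the child-element form adds a node to the tree, which can later receive children or attributes of its own.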

Declaration:
The first line of every XML document is an XML declaration. It states that the document is an XML document and which version of XML it follows. An XML declaration looks like this:
<?xml version="1.0"?>

DTD (document type definition):
A DTD defines the elements and attributes of an XML document and the relationships between elements.
A DTD file can be used to check whether the structure of an XML document is correct, but creating an XML document does not necessarily require a DTD file. DTD files are described in detail in a separate section below.

Well-formed XML:
A document that follows the XML syntax rules and complies with the XML specification is called "well-formed". If all your tags strictly comply with the XML specification, your XML document does not necessarily need a DTD file to define it.
A well-formed document should start with an XML declaration, for example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The declaration first specifies the XML version the document complies with, currently 1.0; second, it specifies the character encoding of the document, which defaults to UTF-8 (if the file is saved in a Chinese encoding, declare it, for example GB2312); third, it states that the document is "standalone", meaning that no DTD file is needed to check whether its tags are valid. The three items must appear in this order.
A well-formed XML document must have a root element, which is the first element created right after the declaration. All other elements are descendants of this root element.
The content of a well-formed XML document must be written in accordance with XML syntax. (We will explain XML syntax in the next chapter.)

Valid XML:
An XML document that follows the XML syntax rules and also conforms to its DTD is called a valid XML document. Comparing "well-formed XML" with "valid XML", the biggest difference is that the former only has to comply with the XML specification, while the latter must also conform to its own document type definition (DTD).
The process of comparing an XML document with its DTD to see whether it conforms to the DTD rules is called validation. This is usually done by a piece of software called a parser.
A valid XML document also starts with an XML declaration, for example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Unlike the previous example, standalone is set to "no" here because the document must be used together with its DTD. A DTD file is referenced as follows:
<!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">
where:
"!DOCTYPE" means that you are declaring a DOCTYPE;
"type-of-doc" is the name of the document type, which you define yourself; it must match the name of the root element and is usually the same as the DTD file name;
Only one of "SYSTEM" and "PUBLIC" is used. SYSTEM means the URL points to a private DTD file used by the document, while PUBLIC means the document references a public DTD file.
"dtd-name" is the URL and name of the DTD file. All DTD files have the extension ".dtd".
For the example above, this would be written as:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE filelist SYSTEM "">
2. Related terms of DTD
We have already touched on what a DTD is. A DTD is an effective way to make sure an XML document has the correct format: by comparing the XML document with the DTD file, you can check whether the document conforms to the rules and whether its elements and tags are used correctly. A DTD contains rules defining the elements, rules defining the relationships between elements, the attributes that elements may use, and the entities or symbols that may be used.
A DTD file is also an ASCII text file, with the extension .dtd.
Why use DTD files? As I understand it, they serve network sharing and data exchange. The biggest benefit of a DTD is that it can be shared (this is the PUBLIC keyword in the DOCTYPE declaration above). For example, if two people in the same industry but in different regions use the same DTD file as the rulebook for creating documents, their data can easily be exchanged and shared. Anyone else online who wants to contribute data only needs to create documents according to the public DTD and can join in immediately.
A large number of ready-made DTD files are already available. They establish common element and tag rules for different industries and applications. You don't need to start from scratch; just add the new tags you need on top of them.
Of course, if you prefer, you can create your own DTD, which may fit your documents better. Creating your own DTD is also quite simple; generally you only need to define four or five elements.
There are two ways to use a DTD:
1. Include the DTD directly in the XML document
You just insert some special instructions into the DOCTYPE declaration. Suppose we have this XML document:
<?xml version="1.0" encoding="GB2312"?>
<myfile>
<title>XML Easy Learning Manual</title>
<author>ajie</author>
</myfile>
We can insert the following code after the first line:
<!DOCTYPE myfile [
<!ELEMENT myfile (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ENTITY copyright "Copyright 2001, Ajie.">
]>
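To see what a parser does with such an internal DTD, here is a hedged sketch using Python's standard-library xml.etree.ElementTree. Two things in it are assumptions made for illustration and are not part of the original example: an element declaration for myfile is included so that the DTD covers every element, and the &copyright; reference is placed inside <title> to show the entity being used. The standard-library parser is non-validating: it reads the entity declaration but does not check the document against the element rules.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE myfile [
<!ELEMENT myfile (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ENTITY copyright "Copyright 2001, Ajie.">
]>
<myfile>
<title>XML Easy Learning Manual (&copyright;)</title>
<author>ajie</author>
</myfile>"""

root = ET.fromstring(DOCUMENT)

# The entity reference is replaced by the text declared in the DTD.
print(root.findtext("title"))
```

Checking the document against the element rules as well would require a validating parser, which the Python standard library does not provide.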

2. Reference a standalone DTD file
Save the DTD as a .dtd file and then reference it in the DOCTYPE declaration. For example, save the following code as a .dtd file:
<!ELEMENT myfile (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>

Then call it in the XML document and insert it after the first line:
<!DOCTYPE myfile SYSTEM "">

As you can see, referencing a DTD file is similar to referencing a js file in HTML. We will cover how to write DTDs, together with the syntax of XML documents, in the next chapter.

Let's learn the terms related to DTDs:
Schema:
A schema is a description of the rules for data. A schema does two things:
a. It defines the element data types and the relationships between elements;
b. It defines the types of content that an element can contain.
A DTD is a schema for XML documents.
Document Tree:
The "document tree" was mentioned in Chapter 2. It is a visual representation of the hierarchy of a document's elements. A document tree has a root element, which is the top-level element (the first element right after the XML declaration). See this example:
<?xml version="1.0"?>
<filelist>
<myfile>
<title>...</title>
<author>...</author>
</myfile>
</filelist>
The example above is arranged in three levels as a "tree", where <filelist> is the root element. In both XML and DTD files, the first element defined is the root element.

Parent Element / Child Element:
A parent element is an element that contains other elements, and the elements it contains are called its child elements. In the tree above, <myfile> is a parent element, <title> and <author> are its child elements, and <myfile> is in turn a child element of <filelist>. A last-level element such as <title>, which contains no child elements, is also called a "leaf element".
Parser:
A parser is a software tool that checks whether an XML document conforms to the rules.
XML parsers fall into two categories. One is the "non-validating parser", which only checks whether the document follows the XML syntax rules and builds the document tree from the element tags. The other is the "validating parser", which checks the document syntax and tree and also checks whether the element tags you use conform to the corresponding DTD.
A parser can be used on its own or as part of an editor or browser. In the resource list that follows, I have listed some of the most popular parsers.

Well, in this third chapter we have learned some basic XML and DTD terms, but we do not yet know how to write these files or what syntax to follow. In the next chapter, we will focus on the syntax for writing XML and DTD documents. Please read on, thank you!
XML Easy Learning Manual (4) XML Syntax
Chapter 4 XML Syntax
Outline:
1. XML syntax rules
2. Element syntax
3. Comment syntax
4. CDATA syntax
5. Namespaces syntax
6. Entity syntax
7. DTD syntax
Through the study of the previous three chapters, we have already understood what XML is, its implementation principles and related terms. Next, we start to learn the syntax specifications of XML and write our own XML documents.
1.XML syntax rules
An XML document resembles HTML source code and also uses tags to mark up content. The following important rules must be followed when creating XML documents:
Rule 1: Start with an XML declaration
We covered this in the previous chapter. The declaration is the first statement of an XML document, and its format is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes/no"?>
The purpose of the declaration is to tell the browser or other processor that this document is an XML document. In the declaration, version gives the version of the XML specification the document complies with; encoding gives the character encoding used by the document, which defaults to UTF-8; standalone indicates whether the document depends on an external DTD file, and if it does, the value is no.
Rule 2: Whether there is a DTD file
If the document is a "valid XML document" (see the previous chapter), it must have a corresponding DTD file and strictly follow the rules laid down by that DTD. The DTD declaration comes immediately after the XML declaration, in the following format:
<!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">
where:
"!DOCTYPE" means that you are declaring a DOCTYPE;
"type-of-doc" is the name of the document type, which you define yourself; it must match the name of the root element and is usually the same as the DTD file name;
Only one of "SYSTEM" and "PUBLIC" is used. SYSTEM means the URL points to a private DTD file used by the document, while PUBLIC means the document references a public DTD file.
"dtd-name" is the URL and name of the DTD file. All DTD files have the extension ".dtd".
For the example above, this would be written as:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE filelist SYSTEM "">
Rule 3: Mind the case
XML documents are case sensitive: <P> and <p> are different tags. When writing an element, make sure the start and end tags use the same case. For example, <Author>ajie</Author> is correct, while <Author>ajie</author> is wrong.
It is best to adopt one habit: all uppercase, all lowercase, or an initial capital. This reduces errors caused by case mismatches.
Rule 4: Quote attribute values
In HTML code, attribute values may be quoted or not. For example, <font color=red>word</font> and <font color="red">word</font> are both interpreted correctly by the browser.
In XML, however, all attribute values must be quoted (with either single or double quotes); otherwise it is an error.
Rule 5: Every tag must have a corresponding end tag
In HTML, tags do not have to appear in pairs, for example <br>. In XML, all tags must appear in pairs: where there is a start tag, there must be an end tag. Otherwise it is an error.
Rule 6: Empty tags must also be closed
An empty tag is one with no content between the start and end tags, for example <br>, <img>, and similar tags. In XML, every tag must have an end tag. For empty tags, XML handles this by adding / at the end of the original tag. For example:
<br> should be written as <br />;
<META name="keywords" content="XML, SGML, HTML"> should be written as <META name="keywords" content="XML, SGML, HTML" />;
<IMG src= ""> should be written as <IMG src= "" />
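Rules 4 to 6 can be checked quickly with a parser. The sketch below assumes Python's standard-library xml.etree.ElementTree; the is_well_formed helper and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Rule 6: both spellings of an empty element produce the same tree.
short_form = ET.fromstring("<p>line one<br />line two</p>")
long_form = ET.fromstring("<p>line one<br></br>line two</p>")
same_tree = ET.tostring(short_form) == ET.tostring(long_form)
print(same_tree)

def is_well_formed(fragment: str) -> bool:
    """Return True if the parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Rule 5: an HTML-style <br> that is never closed is rejected.
print(is_well_formed("<p>line one<br>line two</p>"))

# Rule 4: attribute values need single or double quotes.
print(is_well_formed("<font color=red>word</font>"))
print(is_well_formed("<font color='red'>word</font>"))
```

The closed forms of <br> are interchangeable, while the unclosed tag and the unquoted attribute value are both reported as errors.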

2. Element syntax
An element consists of a pair of tags and the content between them, like this: <author>ajie</author>. The name of the element is the same as the name of the tag. A tag can be described further by attributes.
XML has no reserved words, so you can use any word you like as an element name. However, the following rules must be followed:
1. Names can contain letters, numbers, and other characters;
2. Names cannot start with a number or a punctuation character such as "-" or "."; a leading "_" (underscore) is allowed;
3. Names cannot start with the letters xml (or XML, Xml, and so on);
4. Names cannot contain spaces;
5. Names should not contain ":" (colon), which is reserved for namespaces.
To make elements easier to read, understand, and work with, we have some further suggestions:
1. Do not use "." in names, because in many programming languages "." is used to access an object's properties. For the same reason, it is best to avoid "-" and use "_" instead;
2. Keep names as short as possible;
3. Use one consistent convention for upper and lower case in names;
4. Names can use non-English characters, such as Chinese, but some software may not support them. (IE5 currently supports Chinese element names.)
A few more words about attributes. In HTML, attributes can define how an element is displayed; for example, <font color="red">word</font> displays word in red. In XML, attributes only describe the tag and have nothing to do with how the element content is displayed. The same line, <font color="red">word</font>, does not display word in red. (So some readers will ask: how do I display text in red in XML? That requires CSS or XSL, which we will explain in detail later.)
3. Syntax of comments
Comments make a document easier to read and understand. They are extra information added to an XML document; they are not interpreted by programs or displayed by the browser.
The syntax of a comment is as follows:
<!-- Here is the comment information -->
As you can see, it is the same as the comment syntax in HTML, so it is very easy. Good commenting habits make your documents easier to maintain and share, and make them look more professional.
4.CDATA syntax
The full name of CDATA is character data. When writing XML documents, we sometimes need to display characters such as "<" literally. In XML, these characters already have special meanings. What should we do? This is where the CDATA syntax comes in. The format is as follows:
<![CDATA[Place the characters to be displayed here]]>
For example:
<![CDATA[<AUTHOR sex="female">ajie</AUTHOR>]]>
The content displayed on the page will be "<AUTHOR sex="female">ajie</AUTHOR>"
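A parser treats everything inside a CDATA section as plain text. The following sketch, again assuming Python's standard-library xml.etree.ElementTree, wraps the example in a hypothetical <example> element and compares it with the same text written using the escapes &lt; and &gt;.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    '<example><![CDATA[<AUTHOR sex="female">ajie</AUTHOR>]]></example>'
)
with_escapes = ET.fromstring(
    '<example>&lt;AUTHOR sex="female"&gt;ajie&lt;/AUTHOR&gt;</example>'
)

# The markup characters arrive as ordinary text, not as a child element.
print(with_cdata.text)
print(len(with_cdata))

# Escaping each special character by hand yields the same text.
print(with_cdata.text == with_escapes.text)
```

CDATA is mainly a convenience: it saves escaping every special character by hand when a block of text contains many of them.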