XML study (Lesson 1: Understanding the structure and syntax of XML)

To see how widespread the influence of XML is in modern information technology, just note that XML is the reason for existence (raison d'être) of Microsoft .NET. From Windows XP onwards, the internals are full of XML. Microsoft has invested over 3 billion dollars in this technology, and in the near future all Microsoft software that is not ported to .NET will at least be .NET-enabled. Alongside .NET is SQL Server 2000, a database that fully supports XML.

Maybe you have heard of Web Services. These are services on the Web that you can use on demand, i.e. whenever your program needs them, by calling them the same way you would call a function. Web Services are built on XML and HTTP, the standard protocol used to send web pages.

An important point about XML technology is that it does not belong to any single company. It is a standard recognized by everyone, drafted by the World Wide Web Consortium (W3C), a drafting body in which all the major players in the computing world are represented, together with anyone who wants to contribute via email. XML itself is not hard to understand, but the set of standard tools for working with XML, such as the Document Object Model (DOM), XPath and XSL, is very powerful, and these standards are constantly being developed.

Microsoft has been committed to XML from the beginning. It not only has representatives working in the W3C but also contributes actively by submitting proposals. Microsoft's position is that where an XML standard is not yet complete, its products comply with what seems to be most widely accepted, and once the standard is complete they will comply fully.

The most valuable XML tool from Microsoft is the ActiveX component MSXML. It is used in Visual Basic 6, ASP (Active Server Pages) on IIS, and Internet Explorer since version 5.5. The current version of MSXML is 4.0. MSXML parses (reads and analyzes) and validates (checks for validity) an XML file and gives us the DOM, a tree of nodes representing the XML elements inside. MSXML can also use an XSL file to transform an XML file into a web page (HTML) or into another XML file.

What is XML?

A little history

As we all know, XML is an abbreviation of eXtensible Markup Language, but what is a markup language?

In the printing industry, to instruct typesetters on how to lay out the text, the author or editor would draw circles on the manuscript and annotate it in a notation similar to shorthand. That notation is called a markup language.

XML is a relatively new markup language because it is a subset of, and derived from, an older markup language called Standard Generalized Markup Language (SGML). HTML is also based on SGML; it is in fact an application of SGML.

SGML was invented by Ed Mosher, Ray Lorie and Charles F. Goldfarb, members of a research group at IBM, in 1969, the year man set foot on the moon. At first it was called Generalized Markup Language (GML), and it was designed to be used as a meta-language, a language used to describe other languages: their grammar, their vocabulary, and so on. In 1986, SGML was adopted by ISO (International Organization for Standardization) as a standard for storing and exchanging data. When Tim Berners-Lee developed HyperText Markup Language (HTML) for web pages in the early 1990s, keep in mind that HTML was an application of SGML.

Because SGML is very complicated and HTML has many limitations, in 1996 the W3C began designing XML. XML version 1.0 is defined in the W3C Recommendation of February 1998, which, like an Internet Request for Comments (RFC), is a "standard".

From HTML to XML

In a web page, the HTML markup language uses pairs of tags to mark the start and end positions of pieces of data, to help the browser parse (read and analyze) and display the page as its designer intended. For example, take the following line of HTML:
<P align="center">Welcome to the
   <STRONG>Vovisoft</STRONG> Web site
</P>
The HTML above contains two markup tags, <P> and <STRONG>. Each tag pair wraps the data it marks between an opening tag and a closing tag. The two closing tags here are </P> and </STRONG>. Everything inside a pair of tags is called an element. In addition to its content, an element can carry attributes such as align, which are inserted into the opening tag in the form AttributeName="value", e.g. align="center".

Because tags in HTML are used to format (present) a document, the browser needs to know the meaning of each tag. A browser, or any other HTML parser, will gather the following directives from the HTML line above:
  1. Start a new paragraph and center its text ( <P align="center"> ).
  2. Display the text "Welcome to the".
  3. Display the text "Vovisoft" with strong emphasis ( <STRONG>Vovisoft</STRONG> ).
  4. Display the text "Web site".
  5. End the paragraph ( </P> ).
To process HTML code, the browser must not only locate the tags but also understand the meaning of each one, because every tag has its own meaning: P stands for paragraph and STRONG for emphasis, rendered for example in bold.
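The way a parser turns that HTML line into a sequence of directives can be sketched with Python's standard html.parser module. This is only an illustration, not how a real browser is implemented; the class name DirectiveLogger is made up for this example, and note that this particular parser reports tag names in lowercase.

```python
from html.parser import HTMLParser


class DirectiveLogger(HTMLParser):
    """Hypothetical helper: records each event the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        self.events.append(("text", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = DirectiveLogger()
logger.feed('<P align="center">Welcome to the <STRONG>Vovisoft</STRONG> Web site</P>')
logger.close()

for event in logger.events:
    print(event)
```

The parser only finds the tags and the text between them. Deciding that P means "paragraph" and STRONG means "emphasize" is extra knowledge the browser has to bring along.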

Like HTML, XML derives from SGML. It also uses tags to encode data. The main difference between HTML and XML is that HTML tags carry the formatting (layout) meaning of the data, whereas XML tags carry the structural meaning of the data. For example, here is an order document in XML:
<Order OrderNo="1023">
   <OrderDate>2002-3-27</OrderDate>
   <Customer>Peter Collingwood</Customer>
   <Item>
      <ProductID>1</ProductID>
      <Quantity>5</Quantity>
   </Item>
   <Item>
      <ProductID>4</ProductID>
      <Quantity>3</Quantity>
   </Item>
</Order>
This document contains only data and gives no hint about presentation. This means that an XML parser (a program that reads and analyzes XML) does not need to understand the meaning of the tags. It only has to find the tags and determine that this is a proper XML document. Because the parser does not need to understand the meaning of the tags, you can use any tags you like. That is why the word eXtensible is used, although when abbreviating it people chose the letter X instead of E, perhaps because X sounds more mysterious and attractive.

Let's take a closer look at the structure of this XML. First, the Order element has an attribute OrderNo with the value 1023. Inside the Order element are:
  • One child element OrderDate with the value 2002-3-27.
  • One child element Customer with the value Peter Collingwood.
  • Two child elements Item, each of which contains a child element ProductID and a child element Quantity.
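The structure just described can be walked in code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name summarize_order and the dictionary keys are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<Order OrderNo="1023">
   <OrderDate>2002-3-27</OrderDate>
   <Customer>Peter Collingwood</Customer>
   <Item>
      <ProductID>1</ProductID>
      <Quantity>5</Quantity>
   </Item>
   <Item>
      <ProductID>4</ProductID>
      <Quantity>3</Quantity>
   </Item>
</Order>
"""


def summarize_order(xml_text):
    """Return the root's attribute, its simple children and its items."""
    root = ET.fromstring(xml_text)
    # Each Item element holds one ProductID and one Quantity child
    items = [
        (item.findtext("ProductID"), item.findtext("Quantity"))
        for item in root.findall("Item")
    ]
    return {
        "root": root.tag,
        "order_no": root.get("OrderNo"),          # attribute of Order
        "order_date": root.findtext("OrderDate"),  # child element value
        "customer": root.findtext("Customer"),
        "items": items,
    }


summary = summarize_order(ORDER_XML)
print(summary)
```

The parser never needs to know what an "order" is. It simply reports which elements are nested inside which, and the program gives them meaning.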
Sometimes an element has a name but contains no value, because we want to use it as an optional element that may or may not have content. The most natural way to write it is to put the closing tag immediately after the opening tag. For example, the empty element MiddleInitial in the customer below:
<Customer>
   <FirstName>Stephen</FirstName>
   <MiddleInitial></MiddleInitial>
   <LastName>King</LastName>
</Customer>
Another way to write an empty element is to drop the closing tag and add a "/" (slash) at the end of the opening tag. We can rewrite the customer as follows:
<Customer>
   <FirstName>Stephen</FirstName>
   <MiddleInitial/>
   <LastName>King</LastName>
</Customer>
An empty element can of course still have attributes, like the second PhoneNumber element below:
<Customer>
   <FirstName>Stephen</FirstName>
   <MiddleInitial></MiddleInitial>
   <LastName>King</LastName>
   <PhoneNumber Location="Home">9847 2635</PhoneNumber>
   <PhoneNumber Location="Work"></PhoneNumber>
</Customer>
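To a parser, the two ways of writing an empty element are equivalent. A small sketch with Python's xml.etree.ElementTree shows this; the helper is_empty is a name invented for the example.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<Customer><MiddleInitial></MiddleInitial></Customer>")
short_form = ET.fromstring("<Customer><MiddleInitial/></Customer>")


def is_empty(element):
    """An element is empty when it has no text and no child elements."""
    return not (element.text or "").strip() and len(element) == 0


# Both spellings produce the same tree
print(is_empty(long_form.find("MiddleInitial")))
print(is_empty(short_form.find("MiddleInitial")))
print(ET.tostring(long_form) == ET.tostring(short_form))

# An empty element can still carry attributes
phone = ET.fromstring('<PhoneNumber Location="Work"></PhoneNumber>')
print(is_empty(phone), phone.get("Location"))
```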

Data representation in XML

An XML document must be well-formed and, to be useful in practice, should also be valid. Although the two terms sound similar at first, they have different meanings. A well-formed XML document is one that an XML parser can process. That is, it obeys the XML rules for tags, elements, attributes, values and so on, so that the parser can recognize and distinguish each part.

Note that a well-formed XML document does not necessarily contain data that is useful for the business. Well-formed only means that the XML is structured properly. To do useful work, an XML document must be not only well-formed but also valid. An XML document is valid when it contains the data that a document of its type or class is required to contain. For example, an order document may be required to have an OrderNo attribute and an OrderDate child element. A parser validates an XML document by checking its data against what is defined in a specification for that kind of document. This specification can be a Document Type Definition (DTD) or a Schema.
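The difference between the two terms can be sketched in a few lines of Python. Note the assumptions: the standard xml.etree.ElementTree parser only checks well-formedness, so the "valid" rule below is a hand-written stand-in for a real DTD or Schema, and the function name check_order is invented for this illustration.

```python
import xml.etree.ElementTree as ET


def check_order(xml_text):
    """Classify a document as 'not well-formed', 'well-formed' or 'valid'.

    The validity rule is a hand-written stand-in for a DTD or Schema:
    an Order root with an OrderNo attribute and an OrderDate child.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"
    has_required = (
        root.tag == "Order"
        and root.get("OrderNo") is not None
        and root.find("OrderDate") is not None
    )
    return "valid" if has_required else "well-formed"


print(check_order('<Order OrderNo="1023"><OrderDate>2002-3-27</OrderDate></Order>'))
print(check_order("<Order><Customer>Peter Collingwood</Customer></Order>"))
print(check_order("<Order><OrderDate>2002-3-27</Order>"))
```

The second document parses without error, yet it lacks the data an order is required to carry. That is exactly the case of a document that is well-formed but not valid.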

I will discuss valid documents later. For now, let's talk about well-formed ones.

Creating a well-formed XML document

To be well-formed, an XML document must obey the following rules:
  1. There must be exactly one root element, called the document element, which contains all the other elements in the document.
  2. Every opening tag must have a matching closing tag.
  3. Tags in XML are case sensitive, i.e. the opening tag and closing tag must be spelled exactly the same, including uppercase and lowercase.
  4. Every child element must fit entirely within its parent element.
  5. Attribute values in XML must be enclosed in a pair of quotation marks or a pair of apostrophes.
The first rule requires a single root element, so the following document is not well-formed because it has no top-level element:
<Product ProductID="1">Chair</Product>
<Product ProductID="2">Desc</Product>
An XML document without a root element is called an XML fragment. To make it well-formed we need to add a root element, as follows:
<Catalog>
   <Product ProductID="1">Chair</Product>
   <Product ProductID="2">Desc</Product>
</Catalog>
The second rule says that every opening tag must have a matching closing tag. That is, every tag that is opened must be closed. An empty element written compactly, like <MiddleInitial/>, is called a self-closing tag. All other tags must have a closing tag. The following XML is not well-formed because it contains an <Item> tag that is missing its closing tag </Item>:
<Order>
   <OrderDate>2002-6-14</OrderDate>
   <Customer>Helen Mooney</Customer>
   <Item>
      <ProductID>2</ProductID>
      <Quantity>1</Quantity>
   <Item>
      <ProductID>4</ProductID>
      <Quantity>3</Quantity>
   </Item>
</Order>
To make it well-formed we must add the closing tag for the first Item element:
<Order>
   <OrderDate>2002-6-14</OrderDate>
   <Customer>Helen Mooney</Customer>
   <Item>
      <ProductID>2</ProductID>
      <Quantity>1</Quantity>
   </Item>
   <Item>
      <ProductID>4</ProductID>
      <Quantity>3</Quantity>
   </Item>
</Order>
The third rule says that tag names are case sensitive, i.e. the closing tag must be spelled exactly like the opening tag, including uppercase and lowercase. Since <order> is different from <Order>, we cannot use the tag </Order> to close the tag <order>. The following XML is not well-formed because the opening and closing tags of the OrderDate element are not spelled the same:
<Order>
   <OrderDate>2001-01-01</Orderdate>
   <Customer>Graeme Malcolm</Customer>
</Order>
To make it well-formed, we must change the lowercase d into an uppercase D as follows:
<Order>
   <OrderDate>2001-01-01</OrderDate>
   <Customer>Graeme Malcolm</Customer>
</Order>
The fourth rule says that every child element must fit entirely within its parent element, i.e. a parent element cannot be closed while a child element started inside it is still open. For example, the following XML document is not well-formed because the closing tag of Category appears before the closing tag of Product.
<Catalog>
   <Category CategoryName="Beverages">
       <Product ProductID="1">
          Coca-Cola
       </Category>
   </Product>
</Catalog>
To make it well-formed, we need to close the Product tag first, like this:
<Catalog>
   <Category CategoryName="Beverages">
       <Product ProductID="1">
          Coca-Cola
       </Product>
   </Category>
</Catalog>
The final rule for a well-formed XML document requires attribute values to be enclosed in a pair of apostrophes or a pair of quotation marks. The following document is not well-formed because its attribute values are not quoted properly: the value 1 has no quotes at all, and the value 2 opens with an apostrophe but closes with a quotation mark:
<Catalog>
   <Product ProductID=1>Chair</Product>
   <Product ProductID='2">Desc</Product>
</Catalog>
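The five rules above can be checked mechanically, because any XML parser refuses a document that breaks them. The sketch below uses Python's xml.etree.ElementTree; the names is_well_formed, BROKEN and FIXED are invented for this example, and the documents are condensed versions of the ones shown above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True when the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One broken document per rule (rule 5 is shown in two variants)
BROKEN = {
    "no single root": '<Product ProductID="1">Chair</Product>'
                      '<Product ProductID="2">Desc</Product>',
    "missing closing tag": "<Order><Item><ProductID>2</ProductID>"
                           "<Item><ProductID>4</ProductID></Item></Order>",
    "case mismatch": "<Order><OrderDate>2001-01-01</Orderdate></Order>",
    "bad nesting": '<Catalog><Category CategoryName="Beverages">'
                   '<Product ProductID="1">Coca-Cola</Category></Product></Catalog>',
    "unquoted attribute": "<Catalog><Product ProductID=1>Chair</Product></Catalog>",
    "mismatched quotes": """<Catalog><Product ProductID='2">Desc</Product></Catalog>""",
}

FIXED = ('<Catalog><Product ProductID="1">Chair</Product>'
         '<Product ProductID="2">Desc</Product></Catalog>')

results = {name: is_well_formed(doc) for name, doc in BROKEN.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "rejected")
print("fixed catalog ->", "well-formed" if is_well_formed(FIXED) else "rejected")
```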

Processing Instructions and Comments

In addition to the data needed for the business, an XML document can also contain processing instructions (instructions on how to process the document) for the parser and comments (notes) for the reader.

A processing instruction is enclosed in the tag pair <? and ?>. Usually it states which version of the XML specification the parser should follow. Sometimes it also states which encoding the data in the XML uses, for example utf-8. There is also the attribute standalone, which tells the XML parser that the document can be processed on its own, without relying on external declarations such as a DTD.

Although a well-formed XML document does not require a processing instruction, one is usually placed at the very beginning of the document, in the part called the prolog. (Strictly speaking, this first line is named the XML declaration in the specification.) Here is an example of a processing instruction in the prolog of an XML document:
<?xml version="1.0" encoding="utf-8"  standalone="yes"?>
<Order>
   <OrderDate>2002-6-14</OrderDate>
   <Customer>Helen Mooney</Customer>
   <Item>
      <ProductID>1</ProductID>
      <Quantity>2</Quantity>
   </Item>
   <Item>
      <ProductID>4</ProductID>
      <Quantity>1</Quantity>
   </Item>
</Order>
Another very common processing instruction names the stylesheet for the XML, for example:
<?xml-stylesheet type="text/xsl" href="order.xsl"?>
Here the XML parser learns that the stylesheet is of type text/xsl and is contained in the file named order.xsl. You can also add comments using the tag pair <!-- and --> as follows:
<?xml version="1.0" encoding="utf-8"  standalone="yes"?>
<!-- Below are details of a purchase order.  -->
<Order>
   <OrderDate>2002-6-14</OrderDate>
   <Customer>Helen Mooney</Customer>
   <Item>
      <ProductID>1</ProductID>
      <Quantity>2</Quantity>
   </Item>
   <Item>
      <ProductID>4</ProductID>
      <Quantity>1</Quantity>
   </Item>
</Order>
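A program can read processing instructions and comments just as it reads elements. The sketch below uses Python's standard xml.dom.minidom, which keeps these nodes in the document tree; the function name describe_prolog and the shortened order document are assumptions made for this illustration.

```python
from xml.dom import minidom, Node

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="order.xsl"?>
<!-- Below are details of a purchase order. -->
<Order>
   <OrderDate>2002-6-14</OrderDate>
</Order>"""


def describe_prolog(xml_text):
    """List the top-level nodes of the document in order of appearance."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            # target is the name after "<?", data is the rest of the instruction
            found.append(("pi", node.target, node.data))
        elif node.nodeType == Node.COMMENT_NODE:
            found.append(("comment", node.data.strip()))
        elif node.nodeType == Node.ELEMENT_NODE:
            found.append(("element", node.tagName))
    return found


prolog = describe_prolog(DOC)
for entry in prolog:
    print(entry)
```

The first line, with version="1.0", does not show up as a node in the list: the parser consumes it while reading the document.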

Namespaces

A very important concept in XML is the namespace. It allows elements with the same name to refer to two different kinds of data in one XML document. It is like having two students named Tuan in the same classroom: we must add their family names to tell them apart, calling one Tuan Tran and the other Tuan Le. For example, here is an order placed with a bookstore:
<?xml version="1.0"?>
<BookOrder OrderNo="1234">
   <OrderDate>2001-01-01</OrderDate>
   <Customer>
      <Title>Mr.</Title>
      <FirstName>Graeme</FirstName>
      <LastName>Malcolm</LastName>
   </Customer>
   <Book>
      <Title>Treasure Island</Title>
      <Author>Robert Louis Stevenson</Author>
   </Book>
</BookOrder>
Looking closely, we find that the use of the Title element may cause confusion. The document contains two kinds of Title: one used under Customer for a form of address such as Mr., Mrs. or Dr., and the other used under Book for the title of a book.

To avoid confusion, you can use a namespace to state which "family" an element name belongs to. The family is identified by a Uniform Resource Identifier (URI). A URI can be a URL or any other string that identifies something uniquely. A namespace does not need to point to a real Internet address; it only needs to be unique.

You can declare a namespace in an element by using the xmlns attribute (the ns in xmlns stands for namespace). You can also declare a default namespace that applies to everything inside the element where it is declared. For example, the order document can be rewritten as follows:
<?xml version="1.0"?>
<BookOrder OrderNo="1234">
   <OrderDate>2001-01-01</OrderDate>
   <Customer xmlns="http://www.northwindtraders.com/customer">
      <Title>Mr.</Title>
      <FirstName>Graeme</FirstName>
      <LastName>Malcolm</LastName>
   </Customer>
   <Book xmlns="http://www.northwindtraders.com/book">
      <Title>Treasure Island</Title>
      <Author>Robert Louis Stevenson</Author>
   </Book>
</BookOrder>
We have now avoided the confusion: inside Customer the namespace http://www.northwindtraders.com/customer is used, and inside Book the namespace http://www.northwindtraders.com/book is used.

However, what do we do if the order contains many customers and many books? Switching namespaces over and over throughout the document would be a real headache. One solution is to declare abbreviations for the namespaces right in the root element (i.e. the document element). Then, inside the document, every element that needs its namespace identified is given a prefix, which is the abbreviation of that namespace. For example:
<?xml version="1.0"?>
<BookOrder  xmlns="http://www.northwindtraders.com/order" 
       xmlns:cust="http://www.northwindtraders.com/customer" 
       xmlns:book="http://www.northwindtraders.com/book" OrderNo="1234">
   <OrderDate>2001-01-01</OrderDate>
   <cust:Customer>
      <cust:Title>Mr.</cust:Title>
      <cust:FirstName>Graeme</cust:FirstName>
      <cust:LastName>Malcolm</cust:LastName>
   </cust:Customer>
   <book:Book>
      <book:Title>Treasure Island</book:Title>
      <book:Author>Robert Louis Stevenson</book:Author>
   </book:Book>
</BookOrder>
In this XML document we use three namespaces: a default namespace named http://www.northwindtraders.com/order, the namespace http://www.northwindtraders.com/customer (abbreviated as cust) and the namespace http://www.northwindtraders.com/book (abbreviated as book). Elements with no prefix (i.e. no abbreviation in front), such as BookOrder and OrderDate, are considered to belong to the default namespace. Note that the default namespace does not apply to attributes: an unprefixed attribute such as OrderNo belongs to no namespace. To mark an element or attribute as belonging to a namespace other than the default, the prefix representing that namespace is attached to its name, for example cust:LastName or book:Title.
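How a parser sees these prefixes can be sketched with Python's xml.etree.ElementTree, which expands every element name to the form {namespace-URI}name. The dictionary NS below, including its "order" prefix for the default namespace, is a lookup table chosen for this example; the prefixes used in code do not have to match the ones in the document, only the URIs do.

```python
import xml.etree.ElementTree as ET

BOOK_ORDER = """<?xml version="1.0"?>
<BookOrder xmlns="http://www.northwindtraders.com/order"
       xmlns:cust="http://www.northwindtraders.com/customer"
       xmlns:book="http://www.northwindtraders.com/book" OrderNo="1234">
   <OrderDate>2001-01-01</OrderDate>
   <cust:Customer>
      <cust:Title>Mr.</cust:Title>
      <cust:FirstName>Graeme</cust:FirstName>
      <cust:LastName>Malcolm</cust:LastName>
   </cust:Customer>
   <book:Book>
      <book:Title>Treasure Island</book:Title>
      <book:Author>Robert Louis Stevenson</book:Author>
   </book:Book>
</BookOrder>"""

# Prefix-to-URI table used only by the queries below
NS = {
    "order": "http://www.northwindtraders.com/order",
    "cust": "http://www.northwindtraders.com/customer",
    "book": "http://www.northwindtraders.com/book",
}

root = ET.fromstring(BOOK_ORDER)

# The two Title elements are now distinct names
customer_title = root.findtext("cust:Customer/cust:Title", namespaces=NS)
book_title = root.findtext("book:Book/book:Title", namespaces=NS)
order_date = root.findtext("order:OrderDate", namespaces=NS)

print(root.tag)      # the default namespace is applied to the unprefixed root
print(root.attrib)   # the unprefixed attribute keeps a plain name
print(customer_title, "|", book_title, "|", order_date)
```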

CDATA

CDATA is a section of XML data enclosed between <![CDATA[ and ]]>. Data inside a CDATA section passes through the parser as is, without being modified. This is important when you want to include data containing text that would otherwise be interpreted as markup. You can put such text in a CDATA section, and the XML parser will not treat anything in it as markup. When using an XSL stylesheet to transform an XML file into HTML, any script you include must also be placed in a CDATA section. Below are examples using CDATA:
<![CDATA[...place your data here...]]>

<SCRIPT>
  <![CDATA[
      function warning()
    {
        alert("Watch out!");
    }
   ]]>
</SCRIPT>
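The effect of a CDATA section can be seen by parsing the same script text three ways. This is a small sketch with Python's xml.etree.ElementTree; the element name Script and the script text are invented for the example.

```python
import xml.etree.ElementTree as ET

SCRIPT_TEXT = 'if (a < b && b > c) { alert("Watch out!"); }'

# 1. Inside CDATA the characters < and & need no special treatment
with_cdata = ET.fromstring("<Script><![CDATA[" + SCRIPT_TEXT + "]]></Script>")

# 2. Without CDATA the same characters must be written as entity references
with_entities = ET.fromstring(
    '<Script>if (a &lt; b &amp;&amp; b &gt; c) { alert("Watch out!"); }</Script>'
)

# 3. Written raw, the text is taken for markup and the parser rejects it
try:
    ET.fromstring("<Script>" + SCRIPT_TEXT + "</Script>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(with_cdata.text == SCRIPT_TEXT)
print(with_entities.text == SCRIPT_TEXT)
print(raw_accepted)
```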

Entity References

An entity reference is a way of writing a special character that has been predefined in XML. There are five such entities:
Entity     Description
&apos;     apostrophe (')
&amp;      ampersand (&)
&gt;       greater-than sign (>)
&lt;       less-than sign (<)
&quot;     quotation mark (")
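These five entities can be produced and decoded with Python's standard library, as the sketch below shows. The escape and unescape functions from xml.sax.saxutils handle the ampersand and the two angle brackets by default; the two quote characters are passed in through an extra dictionary. The sample text and the element name Title are made up for the example.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = """Tom & Jerry's "<best>" show"""

# escape() covers &, < and > by itself; the quotes are added explicitly
encoded = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(encoded)

# unescape() reverses the process with the inverse dictionary
decoded = unescape(encoded, {"&quot;": '"', "&apos;": "'"})

# A parser resolves all five entity references while reading the document
parsed = ET.fromstring("<Title>" + encoded + "</Title>").text

print(decoded == raw, parsed == raw)
```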
In the next article we will learn how to process an XML document.
