XML class for processing and building simple XML documents

  • Download release 6.5 lite source files only - 166 Kb
  • Download release 6.5 lite source with exe - 326 Kb

This article has been re-written with the help of 2 years of feedback, and the new source code has benefited from all of the fixes and developments during that time period. See release notes below.

Introduction

Often you don't want to invest in learning a complex XML tool to implement a little bit of XML processing in your application. It's SO easy! Just add Markup.cpp and Markup.h to your Visual C++ MFC project, #include "Markup.h", and begin using it. There are no other dependencies.

Features

  • Light: one small class that maintains one single document string with a simple array of indexes
  • Fast: the parser builds the index array in one quick pass
  • Simple: EDOM methods make it ridiculously easy to create or process XML strings
  • Independent: compiles into your program without requiring MSXML or any tokenizer
  • UNICODE: can be compiled for UNICODE for Windows CE and NT/XP platforms (define _UNICODE)
  • UTF-8: when not in UNICODE or MBCS builds, it works with UTF-8, ASCII, or Windows extended sets
  • MBCS: can be compiled for Windows double-byte character sets such as Chinese GB2312 (define _MBCS)

XML for Everyday Data

We often need to store and/or pass information in a file, or send a block of information from computer A to computer B. And the issue is always the same: How shall I format this data? Before XML, you might have considered "env" style e.g. PATH=C:\WIN95; "ini" style (grouped in sections); comma-delimited or otherwise delimited; or fixed character lengths. XML is now the established answer to that question, except that programmers are sometimes discouraged by the size and complexity of XML solutions when all they need is something convenient to help parse and format angle brackets. For good minimalist reading on the syntax rules for XML tags, I recommend Beginning XML - Chapter 2: Well-Formed XML posted here on the Code Project.

XML is better because of its flexible and hierarchical nature, plus its wide acceptance. Although XML uses more characters than delimited formats, it compresses down well if needed. The flexibility of XML becomes apparent when you want to expand the types of information your document can contain without requiring every consumer of the information to rewrite processing logic. You can keep the old information identified and ordered the same way it was while adding new attributes and elements.

CMarkup Lite Methods

CMarkup is based on the "Encapsulated" Document Object Model (EDOM), the key to simple XML processing. It's a set of methods for XML processing with the same general purpose as DOM (Document Object Model). But while DOM has numerous types of objects, EDOM defines only one object, the XML document. EDOM harks back to the original attraction of XML which was its simplicity. To keep overhead low, CMarkup takes a very light non-conforming non-validating approach to XML, and it does not verify the XML is well-formed.

The CMarkup "Lite" in this article is the free version of the CMarkup product sold at firstobject.com. CMarkup Lite implements a subset of EDOM methods for creating and parsing XML document strings. The Lite methods also encompass some modification functionality such as setting an attribute or adding additional elements to an existing XML document, but not changing the data of, or removing, XML elements. See the EDOM specification to compare the full CMarkup with CMarkup Lite. The full CMarkup is available in Evaluation (Educational) and licensed Developer versions with many more methods, STL and MSXML versions, Base64, and additional documentation. But this Lite version here at Code Project is more than adequate for parsing and creating simple XML strings in MFC.

The CMarkup Lite methods are grouped into Creation and Navigation categories listed below.

CMarkup Lite Creation Methods

    CString GetDoc() const { return m_csDoc; };
    bool AddElem( LPCTSTR szName, LPCTSTR szData=NULL );
    bool AddChildElem( LPCTSTR szName, LPCTSTR szData=NULL );
    bool AddAttrib( LPCTSTR szAttrib, LPCTSTR szValue );
    bool AddChildAttrib( LPCTSTR szAttrib, LPCTSTR szValue );
    bool SetAttrib( LPCTSTR szAttrib, LPCTSTR szValue );
    bool SetChildAttrib( LPCTSTR szAttrib, LPCTSTR szValue );

GetDoc is used to get the document string after adding elements and setting attributes. The AddAttrib and SetAttrib methods do the same thing as each other (as do AddChildAttrib and SetChildAttrib). They will change the attribute's value if it already exists, and add the attribute if it doesn't.

CMarkup Lite Navigation Methods

    bool SetDoc( LPCTSTR szDoc );
    bool IsWellFormed();
    bool FindElem( LPCTSTR szName=NULL );
    bool FindChildElem( LPCTSTR szName=NULL );
    bool IntoElem();
    bool OutOfElem();
    void ResetChildPos();
    void ResetMainPos();
    void ResetPos();
    CString GetTagName() const;
    CString GetChildTagName() const;
    CString GetData() const;
    CString GetChildData() const;
    CString GetAttrib( LPCTSTR szAttrib ) const;
    CString GetChildAttrib( LPCTSTR szAttrib ) const;
    CString GetError() const;

When you call SetDoc it parses the szDoc string and populates the CMarkup object. If it fails, it returns false, and you can call GetError for an error description. The IsWellFormed method returns true if the CMarkup object has at least a root element; it does not verify well-formedness.

Using CMarkup

The CMarkup class encapsulates the XML document text, structure, and current positions. It has methods both to add elements and to navigate and get element attributes and data. The locations in the document where operations are performed are governed by the current position and the current child position. This current positioning allows you to work with the XML document without instantiating additional objects that point into the document. At all times, the object maintains a string representing the text of the document which can be retrieved using GetDoc.

Check out the free firstobject XML editor which generates C++ source code for creating and navigating your own XML documents with CMarkup Lite.

Creating an XML Document

To create an XML document, instantiate a CMarkup object and call AddElem to create the root element. At this point, if you called AddElem("ORDER") your document would simply contain the empty ORDER element <ORDER/>. Then call AddChildElem to create elements under the root element (i.e. "inside" the root element, hierarchically speaking). The following example code creates an XML document and retrieves it into a CString:

    CMarkup xml;
    xml.AddElem( "ORDER" );
    xml.AddChildElem( "ITEM" );
    xml.IntoElem();
    xml.AddChildElem( "SN", "132487A-J" );
    xml.AddChildElem( "NAME", "crank casing" );
    xml.AddChildElem( "QTY", "1" );
    CString csXML = xml.GetDoc();

This code generates the following XML. The root is the ORDER element; notice that its start tag <ORDER> is at the beginning and its end tag </ORDER> is at the bottom. When an element is under (i.e. inside or contained by) a parent element, the parent's start tag is before it and the parent's end tag is after it. The ORDER element contains one ITEM element. That ITEM element contains 3 child elements: SN, NAME, and QTY.

    <ORDER>
    <ITEM>
    <SN>132487A-J</SN>
    <NAME>crank casing</NAME>
    <QTY>1</QTY>
    </ITEM>
    </ORDER>

As shown in the example, you can create elements under a child element by calling IntoElem to move your current main position to where the current child position is so you can begin adding under what was the child element. CMarkup maintains a current position in order to keep your source code shorter and simpler. This same position logic is used when navigating a document.

Navigating an XML Document

The XML string created in the above example can be parsed into a CMarkup object with the SetDoc method. You can also navigate it right inside the same CMarkup object where it was created; just call ResetPos if you want to reset the current position back to the beginning of the document.

In the following example, after populating the CMarkup object from the csXML string, we loop through all ITEM elements under the ORDER element and get the serial number and quantity of each item:

    CMarkup xml;
    xml.SetDoc( csXML );
    while ( xml.FindChildElem("ITEM") )
    {
        xml.IntoElem();
        xml.FindChildElem( "SN" );
        CString csSN = xml.GetChildData();
        xml.FindChildElem( "QTY" );
        int nQty = atoi( xml.GetChildData() );
        xml.OutOfElem();
    }

For each item we find, we call IntoElem before interrogating its child elements, and then OutOfElem afterwards. As you get accustomed to this type of navigation you will know to check in your loops to make sure there is a corresponding OutOfElem call for every IntoElem call.
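One way to keep every IntoElem paired with an OutOfElem, even when a loop exits early, is a small scope guard. The sketch below is only an illustration and is not part of CMarkup: ElemScope and the DepthCounter stand-in are hypothetical names, and the template assumes nothing more than a class with IntoElem and OutOfElem methods returning bool, as in the method list above.

```cpp
#include <cassert>

// Hypothetical RAII helper, not part of CMarkup: calls IntoElem() on
// construction and the matching OutOfElem() when the scope ends.
template <class Markup>
class ElemScope
{
public:
    explicit ElemScope( Markup& xml )
        : m_xml( xml ), m_bInside( xml.IntoElem() ) {}
    ~ElemScope() { if ( m_bInside ) m_xml.OutOfElem(); }

    ElemScope( const ElemScope& ) = delete;            // not copyable
    ElemScope& operator=( const ElemScope& ) = delete;

    bool IsInside() const { return m_bInside; }

private:
    Markup& m_xml;
    bool m_bInside; // true if IntoElem succeeded and must be undone
};

// Stand-in that only tracks nesting depth, so the sketch compiles
// without MFC or Markup.h.
struct DepthCounter
{
    int nDepth;
    DepthCounter() : nDepth( 0 ) {}
    bool IntoElem() { ++nDepth; return true; }
    bool OutOfElem()
    {
        if ( nDepth == 0 )
            return false;
        --nDepth;
        return true;
    }
};

// Returns the depth after a loop that leaves early; 0 means balanced.
int DepthAfterEarlyExit()
{
    DepthCounter xml;
    for ( int n = 0; n < 3; ++n )
    {
        ElemScope<DepthCounter> scope( xml );
        if ( n == 1 )
            break; // the destructor still calls OutOfElem
    }
    return xml.nDepth;
}
```

Whether a guard like this is worth it is a matter of taste; the explicit IntoElem and OutOfElem calls in the article's examples are just as valid.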

Adding Elements and Attributes

The above example for creating a document only created one ITEM element. Here is an example that creates multiple items loaded from a previously populated data source, plus a SHIPMENT information element in which one of the elements has an attribute. This code also demonstrates that instead of calling AddChildElem, you can call IntoElem and AddElem. It means more calls, but some people find this more intuitive.

    CMarkup xml;
    xml.AddElem( "ORDER" );
    xml.IntoElem(); // inside ORDER
    for ( int nItem=0; nItem

This code generates the following XML. The root ORDER element contains 2 ITEM elements and a SHIPMENT element. The ITEM elements both contain SN, NAME and QTY elements. The SHIPMENT element contains a POC element which has a type attribute, and NAME and TEL child elements.

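    <ORDER>
    <ITEM>
    <SN>132487A-J</SN>
    <NAME>crank casing</NAME>
    <QTY>1</QTY>
    </ITEM>
    <ITEM>
    <SN>4238764-A</SN>
    <NAME>bearing</NAME>
    <QTY>15</QTY>
    </ITEM>
    <SHIPMENT>
    <POC type="...">
    <NAME>John Smith</NAME>
    <TEL>555-1234</TEL>
    </POC>
    </SHIPMENT>
    </ORDER>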

Finding Elements

The FindElem and FindChildElem methods go to the next sibling element. If the optional tag name argument is specified, then they go to the next element with a matching tag name. The element that is found becomes the current element, and the next call to Find will go to the next sibling or matching sibling after that current position.

When you cannot assume the order of the elements, you must reset the position in between calling the Find method. Looking at the ITEM element in the above example, if someone else is creating the XML and you cannot assume the SN element is before the QTY element, then call ResetChildPos() before finding the QTY element.
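The forward-only search and the effect of resetting can be illustrated with a toy model. This is a sketch under stated assumptions, not CMarkup's implementation: SiblingCursor, FindChild and HasSnAndQty are hypothetical names, the children of one element are reduced to a list of tag names, and a failed search is assumed to leave the position unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of a forward-only search through the children of one element.
// Hypothetical illustration only; CMarkup's real internals differ.
class SiblingCursor
{
public:
    explicit SiblingCursor( std::vector<std::string> tags )
        : m_tags( std::move(tags) ), m_next( 0 ) {}

    // Search forward from the current position for the next matching tag.
    // On success the position moves just past the match; on failure the
    // position is left unchanged (an assumption of this model).
    bool FindChild( const std::string& name )
    {
        for ( std::size_t i = m_next; i < m_tags.size(); ++i )
        {
            if ( m_tags[i] == name )
            {
                m_next = i + 1;
                return true;
            }
        }
        return false;
    }

    // Go back to before the first child so the next search sees them all.
    void ResetChildPos() { m_next = 0; }

private:
    std::vector<std::string> m_tags;
    std::size_t m_next; // index of the first child not yet searched
};

// Returns true if both SN and QTY are present, whatever their order.
bool HasSnAndQty( SiblingCursor& item )
{
    item.ResetChildPos();
    if ( ! item.FindChild( "SN" ) )
        return false;
    item.ResetChildPos(); // QTY may come before SN
    return item.FindChild( "QTY" );
}
```

In this model an item whose children arrive as QTY, NAME, SN yields SN on the first search but misses QTY on the second unless the position is reset in between, which is the situation the paragraph above describes.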

To find the item with a particular serial number, you can loop through the ITEM elements and compare the SN element data to the serial number you are searching for. This example differs from the original navigation example by calling IntoElem to go into the ORDER element and using FindElem("ITEM") instead of FindChildElem("ITEM"); either way is fine. And notice that by specifying the "ITEM" element tag name in the Find method we ignore all other sibling elements such as the SHIPMENT element.

    CMarkup xml;
    xml.SetDoc( csXML );
    xml.FindElem(); // ORDER element is root
    xml.IntoElem(); // inside ORDER
    while ( xml.FindElem("ITEM") )
    {
        xml.FindChildElem( "SN" );
        if ( xml.GetChildData() == csFindSN )
            break; // found
    }

Encodings

ASCII refers to the character codes under 128 that we have come to depend on, programming in English. Conveniently, if you are only using ASCII, UTF-8 encoding is the same as your common ASCII set.

If you are using a character set not corresponding to one of the Unicode sets UTF-8, UTF-16 or UCS-2, you really should declare it in your XML declaration for the sake of interoperability and viewing it properly in Internet Explorer. Character sets like ISO-8859-1 (Western European) assign characters to the values in a byte between 128 and 255, so that every character still only uses one byte. Windows double-byte character sets such as GB2312, Shift_JIS and EUC-KR use one or two bytes per character. For these Windows charsets, put _MBCS in your preprocessor definitions and make sure your user's Operating System is set to the corresponding code page.

To prefix your XML document with an XML declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>, pass it to SetDoc or the CMarkup constructor. Include a CRLF at the end as shown so that the root element goes on the next line.

    xml.SetDoc( "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\r\n" );
    xml.AddElem( "island", "Curaçao" );
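To make the byte-level difference between a single-byte charset and UTF-8 concrete, here is a minimal sketch that converts ISO-8859-1 text to UTF-8. Latin1ToUtf8 is a hypothetical helper, not part of CMarkup, and it assumes true ISO-8859-1 input (Windows-1252 differs in the range 0x80 to 0x9F).

```cpp
#include <cassert>
#include <string>

// Convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Hypothetical helper for illustration; it is not part of CMarkup.
std::string Latin1ToUtf8( const std::string& latin1 )
{
    std::string utf8;
    utf8.reserve( latin1.size() );
    for ( unsigned char ch : latin1 )
    {
        if ( ch < 0x80 )
        {
            // ASCII: the UTF-8 encoding is the same single byte
            utf8 += static_cast<char>( ch );
        }
        else
        {
            // 0x80..0xFF: Latin-1 values equal Unicode code points
            // U+0080..U+00FF, which UTF-8 encodes as two bytes
            utf8 += static_cast<char>( 0xC0 | (ch >> 6) );
            utf8 += static_cast<char>( 0x80 | (ch & 0x3F) );
        }
    }
    return utf8;
}
```

Pure ASCII text passes through unchanged, while the ç in Curaçao, a single byte 0xE7 in ISO-8859-1, becomes the two bytes 0xC3 0xA7 in UTF-8. This is why a document stored in a non-Unicode charset should say so in its XML declaration.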

Depth First Traversal

You can use the following code to loop through every element in your XML document. In the part of the code where you process the element, every element in the document (except the root element) will be encountered in depth first order. For illustrative purposes, it gets the tag name of the element. If you were searching for a particular element tag name you could break out of the loop at this point. "Depth first" means that it traverses all of an element's children before going to its sibling.

    BOOL bFinished = FALSE;
    xml.ResetPos();
    if ( ! xml.FindChildElem() )
        bFinished = TRUE;
    while ( ! bFinished )
    {
        // Process element
        xml.IntoElem();
        CString csTag = xml.GetTagName();

        // Next element (depth first)
        BOOL bFound = xml.FindChildElem();
        while ( ! bFound && ! bFinished )
        {
            if ( xml.OutOfElem() )
                bFound = xml.FindChildElem();
            else
                bFinished = TRUE;
        }
    }

Loading and Saving Files

CMarkup Lite does not have Load and Save methods. To load a file, look in the CMarkupDlg::OnButtonParse method which loads a file into a string. Once you have it in a string, you can put it into the CMarkup object using SetDoc. To save it to a file, call GetDoc to get the string and then implement your own code to write the string to your file. When you need to implement any of your own project specific I/O error handling, streaming, permissions/locking, and charset conversion, it is actually good software design to keep this outside of the CMarkup class, allowing CMarkup to remain a generic class.
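As a rough idea of what such file code might look like outside MFC, here is a minimal sketch using only the C++ standard library. ReadTextFile and WriteTextFile are hypothetical names, not functions from the CMarkup source or the test dialog, and the sketch deliberately does no charset conversion: bytes are read and written as they are.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Read a whole file into a string as raw bytes (no charset conversion).
// Returns false if the file cannot be opened or a read error occurs.
bool ReadTextFile( const std::string& path, std::string& contents )
{
    std::ifstream in( path.c_str(), std::ios::in | std::ios::binary );
    if ( ! in )
        return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();
    contents = buffer.str();
    return ! in.bad();
}

// Write a string to a file as raw bytes, replacing any existing file.
// Returns false if the file cannot be opened or the write fails.
bool WriteTextFile( const std::string& path, const std::string& contents )
{
    std::ofstream out( path.c_str(),
        std::ios::out | std::ios::binary | std::ios::trunc );
    if ( ! out )
        return false;
    out.write( contents.data(),
        static_cast<std::streamsize>( contents.size() ) );
    out.flush();
    return out.good();
}
```

In a non-Unicode build, loading would then amount to calling ReadTextFile and passing contents.c_str() to SetDoc, and saving to passing the string from GetDoc to WriteTextFile. A _UNICODE build would need a conversion step between the file bytes and the wide string, which is exactly the kind of project-specific logic the paragraph above suggests keeping outside the class.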

The Test Dialog

The Markup.exe test bed for CMarkup is a Visual Studio 6.0 MFC project (it also compiles in VS .NET). When the dialog starts, it performs diagnostics in the RunTest function to test CMarkup in the context of the particular build options that have been selected. You can step through the RunTest function to see a lot of examples of how to use CMarkup. Use the Open and Parse button in the dialog to test a file.

In the following illustration, the Build Version is shown as "CMarkup Lite 6.5 Debug Unicode." This means that it is the debug version built with _UNICODE defined. The RunTest completed successfully. A parse error was encountered in the order_e.xml file. It also shows the load and parse times, and file size.

The Test Dialog keeps track of the last file parsed and the dialog screen position for convenience. This is kept in the registry under HKEY_CURRENT_USER/Software/First Objective Software/Markup/Settings.

How CMarkup Works

The CMarkup strategy is to leave the data in the document string and maintain a hierarchical arrangement of indexes mapping out the document.

  • increase speed: parse in one pass and maintain hierarchy of indexes
  • reduce overhead: do not copy or break up the text of the document

CMarkup parses the 250k play.xml sample document in about 40 milliseconds (1/25th of a second) on a 500 MHz machine, holding it as a single string, and allocating about 200k for a map of the 6343 elements. From then on, navigation does not require any parsing. As a rule of thumb, the map of indexes takes up approximately the same amount of memory as the document, so the memory footprint of the CMarkup object should settle down around 2 times the size of the document. For each element in the document a struct of eight integers (32 bytes) is maintained.

    int nStartL;
    int nStartR;
    int nEndL;
    int nEndR;
    int nReserved;
    int iElemParent;
    int iElemChild;
    int iElemNext;

Look at the start and end tags of an element such as <QTY>1</QTY>. The struct contains the offsets of the left and right of both the start and end tags (i.e. all the < and > signs). The reserved integer is not currently used but could be used for a delete flag and/or level (i.e. depth) in the hierarchy to support indentation. The other three integers are indexes to the structs for the parent, child and next elements.

When the document is first parsed an array of these structs is built, and then as elements are modified and inserted in the XML, the structs are modified and added. Rather than allocating structs individually, they are allocated in an array using a "grow-by" mechanism to reduce the number of allocations to a handful. That is why integer array indexes rather than pointers are used for the links. Once an element is assigned an index in the array, that index does not change. So the index can be used as a way of referring to and locating an element.
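The idea of linking elements by array index can be sketched in a few lines of standard C++. This is an illustration under assumptions, not the code in Markup.cpp: the names ElemPos and ElemMap, the use of index 0 as a "none" marker and virtual parent of the root, and the use of std::vector's own growth policy in place of the grow-by mechanism are all choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The eight integers listed above; the struct name is an assumption.
struct ElemPos
{
    int nStartL, nStartR; // offsets of '<' and '>' of the start tag
    int nEndL, nEndR;     // offsets of '<' and '>' of the end tag
    int nReserved;
    int iElemParent, iElemChild, iElemNext; // array indexes, 0 = none
};
static_assert( sizeof(ElemPos) == 8 * sizeof(int), "eight integers" );

class ElemMap
{
public:
    // Slot 0 is a zeroed placeholder acting as parent of the root.
    ElemMap() : m_elems( 1 ) {}

    // Append an element as the last child of iParent; return its index.
    int AddChild( int iParent )
    {
        assert( iParent >= 0
            && iParent < static_cast<int>( m_elems.size() ) );
        const int iNew = static_cast<int>( m_elems.size() );
        ElemPos elem = ElemPos(); // all fields zero
        elem.iElemParent = iParent;
        m_elems.push_back( elem ); // may reallocate; indexes stay valid
        if ( m_elems[iParent].iElemChild == 0 )
            m_elems[iParent].iElemChild = iNew;
        else
        {
            // walk the sibling chain to its end and link the new element
            int iLast = m_elems[iParent].iElemChild;
            while ( m_elems[iLast].iElemNext != 0 )
                iLast = m_elems[iLast].iElemNext;
            m_elems[iLast].iElemNext = iNew;
        }
        return iNew;
    }

    const ElemPos& At( int i ) const { return m_elems[i]; }

    // Collect the indexes of all elements below iTop in depth first order.
    void DepthFirst( int iTop, std::vector<int>& order ) const
    {
        for ( int i = m_elems[iTop].iElemChild; i != 0;
              i = m_elems[i].iElemNext )
        {
            order.push_back( i );
            DepthFirst( i, order );
        }
    }

private:
    std::vector<ElemPos> m_elems; // one contiguous, growing array
};
```

Because the links are indexes, they remain correct when the array is reallocated, whereas pointers into the old block would be left dangling. With 4-byte integers each struct is 32 bytes, so the 6343 elements of play.xml come to 6343 × 32 = 202,976 bytes, which matches the "about 200k" figure quoted above.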

Release Notes

The public methods of this release 6.5 of CMarkup Lite are backwards compatible with the previous release 6.1 posted here in August 2001 except for one rare usage of IntoElem. In 6.1, if you called IntoElem without a current child element, it would find the first child element. Now in 6.5 when there is no current child position, IntoElem puts the main position before the first child element so that a subsequent call to FindElem will not bypass the first element. So, the quick way to check this when upgrading is to scan all occurrences of IntoElem and make sure the previous CMarkup navigation call before it is FindChildElem. Or, if the child element was just created with AddChildElem then it's okay because that sets the current child position too. For full details on this, see the IntoElem Changes in Release 6.3.

Other major changes since 6.1:

  • Fix: MBCS double-byte text x_TextToDoc *thanks knight_zhuge
  • Performance: parsing is roughly twice as fast
  • Debugging: see m_pMainDS and m_pChildDS class members while debugging to see string pointers showing current main and child positions
  • New Test Dialog interface with diagnostic results and load vs. parse times, and RunTest code for startup

License

CMarkup Lite is free for compiling into your commercial, personal and educational applications. Modify it as much as you like, but retain the copyright notice in the source code remarks. Redistribution of the modified or unmodified CMarkup Lite class source code is limited to your own development team and it cannot be made publicly available or distributable as part of any source code library or product, even if that offering is free. For source code products that derive from or utilize CMarkup Lite, please refer users to this article to obtain the source files for themselves. You are encouraged to discuss this source code and share enhancements here in the discussion board under this article. Enjoy!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ben Bryant



United States

Raised in Southern Ontario Canada. Bachelor of Science from the University of Toronto in Computer Science and Anthropology. Living near Washington D.C. in Virginia, USA.