ABOUT
A pure Perl module to complement the pure Perl XML::TreePP module. XMLPath is
similar to XPath, and it attempts to conform to the XPath standard when
possible, but it is far from being fully XPath compliant.
AVAILABILITY
View the XML-TreePP-XMLPath README and CHANGES online. Download the source code from Subversion.
Download from CPAN: http://cpan.perl.org/modules/by-module/XML/
POD DOCUMENTATION
NAME
XML::TreePP::XMLPath - Similar to XPath, defines a path as an accessor to nodes of an XML::TreePP parsed XML Document.
SYNOPSIS
use XML::TreePP;
use XML::TreePP::XMLPath;
my $tpp = XML::TreePP->new();
my $tppx = XML::TreePP::XMLPath->new();
my $tree = { rss => { channel => { item => [ {
    title => "The Perl Directory",
    link  => "http://www.perl.org/",
}, {
    title => "The Comprehensive Perl Archive Network",
    link  => "http://cpan.perl.org/",
} ] } } };
my $xml = $tpp->write( $tree );
Get a subtree of the XMLTree:
my $xmlsub = $tppx->filterXMLDoc( $tree , q{rss/channel/item[title="The Comprehensive Perl Archive Network"]} );
print $xmlsub->{'link'};
Iterate through all attributes and Elements of each <item> XML element:
my $xmlitems = $tppx->filterXMLDoc( $tree , q{rss/channel/item} );
my $h_attr = $tppx->getAttributes( $xmlitems );
my $h_elem = $tppx->getElements( $xmlitems );
foreach my $attrHash ( @{ $h_attr } ) {
    while ( my ( $attrKey, $attrVal ) = each %{$attrHash} ) {
        ...
    }
}
foreach my $elemHash ( @{ $h_elem } ) {
    while ( my ( $elemName, $elemVal ) = each %{$elemHash} ) {
        ...
    }
}
Example of using XML::TreePP::XMLPath to access a tree of Perl reference data that is not a parsed XML document:
use XML::TreePP::XMLPath;
my $tppx = XML::TreePP::XMLPath->new();
my $hashtree = {
    config => {
        nodes => {
            "10.0.10.5" => {
                options => [ 'option1', 'option2' ],
                alerts => {
                    email => 'someone@nowhere.org'
                }
            }
        }
    }
};
print $tppx->filterXMLDoc($hashtree, '/config/nodes/10.0.10.5/alerts/email');
print "\n";
print $tppx->filterXMLDoc($hashtree, '/config/nodes/10.0.10.5/options[2]');
print "\n";
Result
someone@nowhere.org
option2
DESCRIPTION
A pure Perl module to complement the pure Perl XML::TreePP module. XMLPath is similar to XPath, and it attempts to conform to the XPath standard when possible, but it is far from being fully XPath compliant.
Its purpose is to implement an XPath-like accessor methodology to nodes in an XML::TreePP parsed XML Document. In contrast, XPath is an accessor methodology to nodes in an unparsed (or raw) XML Document.
The advantage of using XML::TreePP::XMLPath over any other Perl implementation of XPath is that XML::TreePP::XMLPath is an accessor to XML::TreePP parsed XML Documents. If you are already using XML::TreePP to parse XML, you can use XML::TreePP::XMLPath to access nodes inside that parsed XML Document without having to convert it into a raw XML Document.
As an additional side benefit, any Perl HASH/ARRAY reference data structure can be accessed via the XPath-like accessor method provided by this module. It does not have to be a parsed XML structure. The last example in the SYNOPSIS illustrates this.
REQUIREMENTS
The following Perl modules are depended on by this module: (Note: Dependency on Params::Validate was removed in version 0.52; dependency on Data::Dump was removed in version 0.64.)
IMPORTABLE METHODS
When the calling application invokes this module in a use clause, the following methods can be imported into its namespace. Example:
use XML::TreePP::XMLPath qw(parseXMLPath filterXMLDoc getValues getAttributes getElements);
REMOVED METHODS
The following methods are removed in the current release.
XMLPath PHILOSOPHY
General Illustration of XMLPath
Referring to the following XML data:
<paragraph>
    <sentence language="english">
        <words>Do red cats eat yellow food</words>
        <punctuation>?</punctuation>
    </sentence>
    <sentence language="english">
        <words>Brown cows eat green grass</words>
        <punctuation>.</punctuation>
    </sentence>
</paragraph>
After XML::TreePP parses the above XML, it looks like this:
{
    paragraph => {
        sentence => [
            {
                "-language" => "english",
                punctuation => "?",
                words => "Do red cats eat yellow food",
            },
            {
                "-language" => "english",
                punctuation => ".",
                words => "Brown cows eat green grass",
            },
        ],
    },
}
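As a rough illustration of how XMLPath expressions relate to this parsed structure, the following sketch reaches the same nodes with plain Perl dereferencing. It does not call XML::TreePP::XMLPath at all; the paths in the comments are only the ones a caller might pass to filterXMLDoc, and the variable names are chosen for this example.

```perl
use strict;
use warnings;

# The parsed structure shown above, written out as a Perl literal.
my $tree = {
    paragraph => {
        sentence => [
            { "-language" => "english", punctuation => "?", words => "Do red cats eat yellow food" },
            { "-language" => "english", punctuation => ".", words => "Brown cows eat green grass" },
        ],
    },
};

my @sentences = @{ $tree->{paragraph}{sentence} };

# Roughly what paragraph/sentence[@language="english"]/words selects:
# attributes live under keys carrying the "-" prefix.
my @english = map { $_->{words} }
              grep { ( $_->{"-language"} // '' ) eq 'english' } @sentences;

# Roughly what paragraph/sentence[2]/words selects: XMLPath positions
# start at 1, while Perl array indexes start at 0.
my $second = $sentences[1]{words};

# Roughly what paragraph/sentence[punctuation="."]/words selects:
# child elements are plain keys without a prefix.
my @statements = map { $_->{words} }
                 grep { ( $_->{punctuation} // '' ) eq '.' } @sentences;

print "english: $_\n" for @english;
print "second: $second\n";
print "statement: $_\n" for @statements;
```

The point of the module is that the caller does not have to write these lookups by hand; the sketch only shows where each part of a path lands in the parsed tree.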
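Noting Attribute Identification in Parsed XML
Note that attributes are specified in the XMLPath as @attribute_name, while XML::TreePP identifies them in the parsed document as -attribute_name, as in the structure above. XMLPath requires attributes to be specified as @attribute_name and converts the prefix when it accesses the parsed document.
Child elements on the next level of a parent element are accessible as attributes as attribute_name, that is, the same form without the @ symbol.
Noting Text (CDATA) Identification in Parsed XML
Additionally, the values of child elements are identified in XML parsed by XML::TreePP under its text node key (the text_node_key property, #text by default in XML::TreePP).
Accessing Child Element Values in XMLPath
Child element values are only accessible as CDATA. Take the following XML:
<jungle>
    <animal>
        <cat>tiger</cat>
    </animal>
</jungle>
The XMLPath used to access the key=value pair of cat=tiger for the element animal would be as follows: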
jungle/animal[cat='tiger']
And in version 0.52, in this second case, the above XMLPath is still valid:
<jungle>
    <animal>
        <cat color="black">tiger</cat>
    </animal>
</jungle>
In version 0.52, the period (.) is supported as it is in XPath to represent the current context node. As such, the following XMLPaths would also be valid:
jungle/animal/cat[.='tiger']
jungle/animal/cat[@color='black'][.='tiger']
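One should realize that in these previous two XMLPaths, the element cat is being evaluated, and not the element animal as in the first case. To perform the same evaluation, but return the matching animal node, the following XMLPath can be used:
jungle/animal[cat='tiger']
To evaluate animal and return the matching cat node, or to evaluate cat directly, the following XMLPaths can be used:
jungle/animal[cat='tiger']/cat
jungle/animal/cat[.='tiger']
The first path analyzes animal, and the second path analyzes cat, but both match the same cat node.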
Matching Attributes
Prior to version 0.52, attributes could only be used in XMLPath to evaluate an element for a result set. As of version 0.52, attributes can also be matched in XMLPath to return their values. This next example illustrates:
<jungle>
    <animal>
        <cat color="black">tiger</cat>
    </animal>
</jungle>
/jungle/animal/cat[.='tiger']/@color
The result set of this XMLPath would be "black".
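To make the mapping concrete, here is a small sketch of what the XML above looks like once XML::TreePP has parsed it with its default settings, and which keys the two parts of the path correspond to. It uses plain Perl only and is not how the module itself evaluates a path; the variable names are chosen for this example.

```perl
use strict;
use warnings;

# What XML::TreePP yields for the XML above with its default settings:
# the attribute gets the "-" prefix and the element text moves to "#text".
my $tree = {
    jungle => {
        animal => {
            cat => { "-color" => "black", "#text" => "tiger" },
        },
    },
};

my $cat = $tree->{jungle}{animal}{cat};

# cat[.='tiger'] tests the text of the current node; /@color then
# selects the attribute value of the node that passed the test.
my $color = $cat->{"#text"} eq 'tiger' ? $cat->{"-color"} : undef;

print defined $color ? "$color\n" : "no match\n";
```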
METHODS
tpp
This module is an extension of the XML::TreePP module. As such, it uses the module in many different methods to parse XML Documents, and to get the value of the attr_prefix and text_node_key properties.
To avoid having this module load the XML::TreePP module, do not pass in unparsed XML documents. The caller would instead want to parse the XML document with XML::TreePP before passing it in.
Alternately, if the caller has loaded a copy of XML::TreePP, that instance can be given to this module through this method, so that it is used instead of a newly loaded one.
If this module has loaded an instance of XML::TreePP, this instance can be directly accessed or retrieved through this method. For example, the aforementioned properties can be set.
$tppx->tpp->set('attr_prefix','@'); # default is (-) dash
$tppx->tpp->set('text_node_key','#'); # default is (#) pound
If you want to only get the internally loaded instance of XML::TreePP, and have undef returned when no instance is loaded yet, use the get() method:
my $tppobj = $tppx->get( 'tpp' );
warn "XML::TreePP is not loaded in XML::TreePP::XMLPath.\n" if !defined $tppobj;
This method was added in version 0.52.
set
Set the value for a property in this object instance. This method can only be accessed in object-oriented style. This method was added in version 0.52.
get
Retrieve the value set for a property in this object instance. This method can only be accessed in object-oriented style. This method was added in version 0.52.
new
Create a new object instance of this module.
charlexsplit
An analysis method for single character boundary and start/stop tokens.
parseXMLPath
Parse a string that represents the XMLPath to an XML element or attribute in an XML::TreePP parsed XML Document.
Note that the XML attributes, known as "@attr", are transformed into "-attr". The preceding minus (-) in place of the at (@) is the recognized format of attributes in the XML::TreePP module. Because this is intended to be a submodule of XML::TreePP, the format of "@attr" is converted to "-attr" to conform with how XML::TreePP handles attributes. The prefix is controlled by the attr_prefix property. See:
my $tppx = new XML::TreePP::XMLPath();
$tppx->tpp->set( attr_prefix => '@' );
XMLPath Filter by index and existence
Also, as of version 0.52, there are two additional types of XMLPaths understood.
XMLPath with indexes, which is similar to the way XPath does it:
$path = '/books/book[5]';
This defines the fifth book in a list of book elements under the books root. When using this to get the value, the fifth book is returned. When using this to test an element, there must be five or more books to return true.
XMLPath by existence, which is similar to the way XPath does it:
$path = '/books/book[author]';
This XMLPath represents all book elements under the books root which have one or more author child elements. It does not evaluate whether the element or attribute has a value, so it is a test for existence of the element or attribute.
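The two filter types can be pictured with plain Perl over a parsed structure. The sketch below is only an illustration of the semantics described above, under the assumption that an existence test looks at the presence of a key and not at its value; the sample data and variable names are hypothetical, and the module is not called.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape XML::TreePP produces for
# repeated <book> elements; an empty <author/> parses to undef.
my $doc = {
    books => {
        book => [
            { title => "First",  author => "A. Writer" },
            { title => "Second" },
            { title => "Third",  author => undef },
        ],
    },
};
my @books = @{ $doc->{books}{book} };

# /books/book[2]: positions in an XMLPath start at 1.
my $second = $books[ 2 - 1 ];

# /books/book[author]: only the presence of the key matters,
# so the book with an empty author still qualifies.
my @with_author = grep { exists $_->{author} } @books;

print "second: $second->{title}\n";
print "has author: $_->{title}\n" for @with_author;
```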
assembleXMLPath
Assemble an ARRAY or HASH ref structure representing an XMLPath. This method can be used to construct an XMLPath from an array ref that has been parsed by the parseXMLPath method.
Note that the XML attributes can be identified as "-attribute" or "@attribute". When identified as "-attribute", they are transformed into "@attribute" upon assembly. The preceding minus (-) in place of the at (@) is the recognized format of attributes in the XML::TreePP module.
This method was added in version 0.70.
filterXMLDoc
To filter down to a subtree or set of subtrees of an XML document based on a given XMLPath. This method can also be used to determine if a node within an XML tree is valid based on the given filters in an XML path. This method replaces two methods from earlier releases. This method was added in version 0.52.
getValues
Retrieve the values found in the given XML Document at the given XMLPath. This method was added in version 0.53 as getValue, and changed to getValues in 0.54.
getAttributes
Retrieve the attributes found in the given XML Document at the given XMLPath.
getElements
Gets the child elements found at a specified XMLPath.
EXAMPLES
Method: new
It is not necessary to create an object of this module. However, if you choose to do so anyway, here is how you do it.
my $obj = new XML::TreePP::XMLPath;
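This module supports being called in two ways: in object-oriented style through an object instance, or in function style through methods imported into the caller's namespace.
Using either way works the same and returns the same output.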
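A short sketch of both calling styles side by side, reusing the tree from the SYNOPSIS. It assumes XML::TreePP::XMLPath is installed; the variable names are chosen for this example, and the exact shape of the returned data is left to Data::Dumper to show instead of being asserted here.

```perl
use strict;
use warnings;
use XML::TreePP::XMLPath qw(filterXMLDoc);
use Data::Dumper;

my $tree = { rss => { channel => { item => [ {
    title => "The Perl Directory",
    link  => "http://www.perl.org/",
}, {
    title => "The Comprehensive Perl Archive Network",
    link  => "http://cpan.perl.org/",
} ] } } };

my $path = q{rss/channel/item[title="The Perl Directory"]};

# Object-oriented style: call the method on an instance.
my $tppx       = XML::TreePP::XMLPath->new();
my $via_object = $tppx->filterXMLDoc( $tree, $path );

# Function style: call the imported function directly.
my $via_function = filterXMLDoc( $tree, $path );

# Sort hash keys so the two dumps can be compared line by line.
$Data::Dumper::Sortkeys = 1;
print Dumper( $via_object );
print Dumper( $via_function );
```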
Method: charlexsplit
Here are three steps that can be used to parse values out of a string:
Step 1: First, parse the entire string delimited by the / character.
my $el = charlexsplit (
    string        => q{abcdefg/xyz/path[@key='val'][@key2='val2']/last},
    boundry_start => '/',
    boundry_stop  => '/',
    tokens        => [qw( [ ] ' ' " " )],
    boundry_begin => 1,
    boundry_end   => 1
);
print Dumper( $el );
Output:
["abcdefg", "xyz", "path[\@key='val'][\@key2='val2']", "last"],
Step 2: Second, parse the elements from step 1 that have key/val pairs, such that each single key/val is contained by the [ and ] characters
my $el = charlexsplit (
    string        => q( path[@key='val'][@key2='val2'] ),
    boundry_start => '[',
    boundry_stop  => ']',
    tokens        => [qw( ' ' " " )],
    boundry_begin => 0,
    boundry_end   => 0
);
print Dumper( $el );
Output:
["\@key='val'", "\@key2='val2'"]
Step 3: Third, parse each element from step 2, which is a single key/val pair delimited by the = character
my $el = charlexsplit (
    string        => q{ @key='val' },
    boundry_start => '=',
    boundry_stop  => '=',
    tokens        => [qw( ' ' " " )],
    boundry_begin => 1,
    boundry_end   => 1
);
print Dumper( $el );
Output:
["\@key", "'val'"]
Note that in each example the tokens are given as start and stop pairs. So if you have a start token without a stop token, you will get undesired results. This example demonstrates this data error.
my $el = charlexsplit (
    string        => q{ path[@key='val'][@key2=val2'] },
    boundry_start => '[',
    boundry_stop  => ']',
    tokens        => [qw( ' ' " " )],
    boundry_begin => 0,
    boundry_end   => 0
);
print Dumper( $el );
Undesired output:
["\@key='val'"]
In this example of bad data being parsed, the second element, [@key2=val2'], contains a quote token with no matching partner. And there is no error message. The charlexsplit method throws away the second element silently due to the token start and stop mismatch.
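For readers who want to see the general idea of splitting on a boundary character while respecting start/stop tokens, here is a minimal standalone sketch. It is not the charlexsplit implementation and does not share its parameters; split_outside_tokens is a hypothetical helper written only for this illustration.

```perl
use strict;
use warnings;

# Split $string on $boundary, ignoring boundary characters that fall
# inside a start/stop token pair. %pairs maps start token => stop token.
# Hypothetical helper for illustration; not part of XML::TreePP::XMLPath.
sub split_outside_tokens {
    my ( $string, $boundary, %pairs ) = @_;
    my ( @parts, @stack );
    my $current = '';
    for my $ch ( split //, $string ) {
        if ( @stack && $ch eq $stack[-1] ) {
            pop @stack;                  # closes the innermost open token
        }
        elsif ( exists $pairs{$ch} && !( @stack && $stack[-1] =~ /['"]/ ) ) {
            push @stack, $pairs{$ch};    # opens a token; remember its stop
        }
        elsif ( !@stack && $ch eq $boundary ) {
            push @parts, $current;       # boundary outside any token
            $current = '';
            next;
        }
        $current .= $ch;
    }
    push @parts, $current;
    return \@parts;
}

my $parts = split_outside_tokens(
    q{abcdefg/xyz/path[@key='val/ue'][@key2='val2']/last},
    '/',
    '[' => ']', "'" => "'", '"' => '"',
);
print join( ' | ', @$parts ), "\n";
```

The slash inside 'val/ue' is kept as part of the third piece because it sits between a start and a stop token, which is the same reason a missing stop token leads to the silent data loss described above.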
Method: parseXMLPath
use XML::TreePP::XMLPath qw(parseXMLPath);
use Data::Dumper;
my $parsedPath = parseXMLPath(
q{abcdefg/xyz/path[@key1='val1'][key2='val2']/last}
);
print Dumper ( $parsedPath );
Output:
[
["abcdefg", undef],
["xyz", undef],
["path", [["-key1", "val1"], ["key2", "val2"]]],
["last", undef],
]
Method: filterXMLDoc
Filtering an XML Document, using an XMLPath, to find a node within the document.
#!/usr/bin/perl
use XML::TreePP;
use XML::TreePP::XMLPath qw(filterXMLDoc);
use Data::Dumper;
#
# The XML document data
my $xmldata=<<XMLEND;
<level1>
<level2>
<level3 attr1="val1" attr2="val2">
<attr3>val3</attr3>
<attr4/>
<attrX>one</attrX>
<attrX>two</attrX>
<attrX>three</attrX>
</level3>
<level3 attr1="valOne"/>
</level2>
</level1>
XMLEND
#
# Parse the XML document.
my $tpp = new XML::TreePP;
my $xmldoc = $tpp->parse($xmldata);
print "Output Test #1\n";
print Dumper( $xmldoc );
#
# Retrieve the sub tree of the XML document at path "level1/level2"
my $xmlSubTree = filterXMLDoc($xmldoc, 'level1/level2');
print "Output Test #2\n";
print Dumper( $xmlSubTree );
#
# Retrieve the sub tree of the XML document at path "level1/level2/level3[@attr1='val1']"
$xmlSubTree = filterXMLDoc($xmldoc, 'level1/level2/level3[@attr1="val1"]');
print "Output Test #3\n";
print Dumper( $xmlSubTree );
Output:
Output Test #1
{
level1 => {
level2 => {
level3 => [
{
"-attr1" => "val1",
"-attr2" => "val2",
attr3 => "val3",
attr4 => undef,
attrX => ["one", "two", "three"],
},
{ "-attr1" => "valOne" },
],
},
},
}
Output Test #2
{
level3 => [
{
"-attr1" => "val1",
"-attr2" => "val2",
attr3 => "val3",
attr4 => undef,
attrX => ["one", "two", "three"],
},
{ "-attr1" => "valOne" },
],
}
Output Test #3
{
"-attr1" => "val1",
"-attr2" => "val2",
attr3 => "val3",
attr4 => undef,
attrX => ["one", "two", "three"],
}
Validating attribute and value pairs of a given node.
#!/usr/bin/perl
use XML::TreePP;
use XML::TreePP::XMLPath qw(filterXMLDoc);
use Data::Dumper;
#
# The XML document data
my $xmldata=<<XMLEND;
<paragraph>
<sentence language="english">
<words>Do red cats eat yellow food</words>
<punctuation>?</punctuation>
</sentence>
<sentence language="english">
<words>Brown cows eat green grass</words>
<punctuation>.</punctuation>
</sentence>
</paragraph>
XMLEND
#
# Parse the XML document.
my $tpp = new XML::TreePP;
my $xmldoc = $tpp->parse($xmldata);
print "Output Test #1\n";
print Dumper( $xmldoc );
#
# Retrieve the sub tree of the XML document at path "paragraph/sentence"
my $xmlSubTree = filterXMLDoc($xmldoc, "paragraph/sentence");
print "Output Test #2\n";
print Dumper( $xmlSubTree );
#
my (@params, $validatedSubTree);
#
# Test the XML Sub Tree to have an attribute "-language" with value "german"
@params = (['-language', 'german']);
$validatedSubTree = filterXMLDoc($xmlSubTree, [ ".", \@params ]);
print "Output Test #3\n";
print Dumper( $validatedSubTree );
#
# Test the XML Sub Tree to have an attribute "-language" with value "english"
@params = (['-language', 'english']);
$validatedSubTree = filterXMLDoc($xmlSubTree, [ ".", \@params ]);
print "Output Test #4\n";
print Dumper( $validatedSubTree );
Output:
Output Test #1
{
paragraph => {
sentence => [
{
"-language" => "english",
punctuation => "?",
words => "Do red cats eat yellow food",
},
{
"-language" => "english",
punctuation => ".",
words => "Brown cows eat green grass",
},
],
},
}
Output Test #2
[
{
"-language" => "english",
punctuation => "?",
words => "Do red cats eat yellow food",
},
{
"-language" => "english",
punctuation => ".",
words => "Brown cows eat green grass",
},
]
Output Test #3
undef
Output Test #4
{
"-language" => "english",
punctuation => "?",
words => "Do red cats eat yellow food",
}
Method: getAttributes
#!/usr/bin/perl
#
use XML::TreePP;
use XML::TreePP::XMLPath qw(getAttributes);
use Data::Dumper;
#
# The XML document data
my $xmldata=<<XMLEND;
<level1>
<level2>
<level3 attr1="val1" attr2="val2">
<attr3>val3</attr3>
<attr4/>
<attrX>one</attrX>
<attrX>two</attrX>
<attrX>three</attrX>
</level3>
<level3 attr1="valOne"/>
</level2>
</level1>
XMLEND
#
# Parse the XML document.
my $tpp = new XML::TreePP;
my $xmldoc = $tpp->parse($xmldata);
print "Output Test #1\n";
print Dumper( $xmldoc );
#
# Retrieve the attributes of the elements at path "level1/level2/level3"
my $attributes = getAttributes($xmldoc, 'level1/level2/level3');
print "Output Test #2\n";
print Dumper( $attributes );
#
# Retrieve the attributes of the elements at path "level1/level2/level3[attr3="val3"]"
$attributes = getAttributes($xmldoc, 'level1/level2/level3[attr3="val3"]');
print "Output Test #3\n";
print Dumper( $attributes );
Output:
Output Test #1
{
level1 => {
level2 => {
level3 => [
{
"-attr1" => "val1",
"-attr2" => "val2",
attr3 => "val3",
attr4 => undef,
attrX => ["one", "two", "three"],
},
{ "-attr1" => "valOne" },
],
},
},
}
Output Test #2
[{ attr1 => "val1", attr2 => "val2" }, { attr1 => "valOne" }]
Output Test #3
[{ attr1 => "val1", attr2 => "val2" }]
Method: getElements
#!/usr/bin/perl
#
use XML::TreePP;
use XML::TreePP::XMLPath qw(getElements);
use Data::Dumper;
#
# The XML document data
my $xmldata=<<XMLEND;
<level1>
<level2>
<level3 attr1="val1" attr2="val2">
<attr3>val3</attr3>
<attr4/>
<attrX>one</attrX>
<attrX>two</attrX>
<attrX>three</attrX>
</level3>
<level3 attr1="valOne"/>
</level2>
</level1>
XMLEND
#
# Parse the XML document.
my $tpp = new XML::TreePP;
my $xmldoc = $tpp->parse($xmldata);
print "Output Test #1\n";
print Dumper( $xmldoc );
#
# Retrieve the multiple same-name elements of the XML document at path "level1/level2/level3"
my $elements = getElements($xmldoc, 'level1/level2/level3');
print "Output Test #2\n";
print Dumper( $elements );
#
# Retrieve the elements of the XML document at path "level1/level2/level3[attr3="val3"]"
$elements = getElements($xmldoc, 'level1/level2/level3[attr3="val3"]');
print "Output Test #3\n";
print Dumper( $elements );
Output:
Output Test #1
{
level1 => {
level2 => {
level3 => [
{
"-attr1" => "val1",
"-attr2" => "val2",
attr3 => "val3",
attr4 => undef,
attrX => ["one", "two", "three"],
},
{ "-attr1" => "valOne" },
],
},
},
}
Output Test #2
[
{ attr3 => "val3", attr4 => undef, attrX => ["one", "two", "three"] },
undef,
]
Output Test #3
[
{ attr3 => "val3", attr4 => undef, attrX => ["one", "two", "three"] },
]
AUTHOR
Russell E Glaue, http://russ.glaue.org
SEE ALSO
XML::TreePP::XMLPath on Codepin: http://www.codepin.org/project/perlmod/XML-TreePP-XMLPath
COPYRIGHT AND LICENSE
Copyright (c) 2008-2013 Russell E Glaue, Center for the Application of Information Technologies, Western Illinois University. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.