Class: Parser
- Package: saf.Parser
- Author: John Luxford <lux@simian.ca>
- Copyright: Copyright (C) 2001-2003, Simian Systems Inc.
- License: http://www.sitellite.org/index/license Simian Open Software License
- Version: 1.0, 2003-01-18, $Id: Parser.php,v 1.4 2008/03/09 18:46:06 lux Exp $
- Access: public
Generic parser class from which new and more complex parsers can be derived.
Provides basic lexical analysis via regular expressions, each of which is
assigned to a callback function. These callbacks are invoked as the parser
iterates over the resulting token list. Subclasses extend the class in the
syntax analysis and code generation/execution stages. Parser can also be
used as a Finite State Machine (FSM), in which case saf.Parser.Buffer is
handy for building the resulting data structure.
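The token-to-callback idea can be sketched without the library. The
following is a minimal stand-alone illustration, not the saf.Parser source:
the class TinyDispatcher, its run() method and its $log property are
hypothetical names, and the split pattern is hard-coded rather than built
from a token list.

```php
<?php
// Hypothetical stand-alone sketch; it does not use or reproduce saf.Parser.
class TinyDispatcher {
    // token name => literal string; the name doubles as the handler method
    public $tokens = array ('_comma' => ',', '_newline' => "\n");
    public $log = array ();

    function _comma ($token, $name) { $this->log[] = 'comma'; }
    function _newline ($token, $name) { $this->log[] = 'newline'; }
    function _default ($token, $name) { $this->log[] = 'text:' . $token; }

    function run ($data) {
        // split on the literals, keeping them in the resulting list
        $pieces = preg_split ('/(,|\n)/s', $data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            // registered literals go to their own handler, the rest to _default
            $name = array_search ($piece, $this->tokens, true);
            if ($name === false) {
                $name = '_default';
            }
            call_user_func (array ($this, $name), $piece, $name);
        }
        return $this->log;
    }
}

$d = new TinyDispatcher ();
$log = $d->run ("a,b\nc");
```

Each piece of input reaches exactly one handler, which is the point where a
subclass would add its own syntax analysis.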
Usage Example
<?php
// This example creates a comma-separated values (CSV) parser.
// PHP can do this much more easily by other means, but it serves
// as an example of what Parser can do, and hopefully suggests more
// complex and more interesting uses for it.
class CSVParser extends Parser {
    function CSVParser () {
        // define our tokens
        $this->addInternal ('_comma', ',');
        $this->addInternal ('_newline', "\n");
        $this->addInternal ('_escape', '\\');

        // Define our own state. $list is an array because we are
        // parsing CSV data into a 2D array. This example does not
        // use $output, but a subclass must not overwrite Parser's
        // own properties, so treat $output, $struct, $tokens and
        // $regex as reserved names.
        $this->list = array ();
        $this->skip = false;
        $this->row = 0;
        $this->column = 0;
        $this->list[$this->row] = array ();
    }

    function appendToCell ($text) {
        // create the cell on first use so that .= never
        // touches an undefined index
        if (! isset ($this->list[$this->row][$this->column])) {
            $this->list[$this->row][$this->column] = '';
        }
        $this->list[$this->row][$this->column] .= $text;
    }

    function _default ($token, $name) {
        $this->appendToCell ($token);
    }

    function _comma ($token, $name) {
        // commas are the separators, unless escaped
        if ($this->skip) {
            $this->appendToCell (',');
            $this->skip = false;
        } else {
            $this->column++;
        }
    }

    function _newline ($token, $name) {
        // start a new row
        $this->row++;
        $this->column = 0;
        $this->list[$this->row] = array ();
    }

    function _escape ($token, $name) {
        // a doubled backslash yields one literal backslash
        if ($this->skip) {
            $this->appendToCell ('\\');
            $this->skip = false;
        } else {
            $this->skip = true;
        }
    }
}

$data = 'Joe,Smith,joe@yoursite.com
Phil,Johnson,phil@yoursite.com
Bert,Morris,bert@yoursite.com';

$csv = new CSVParser ();
$csv->parse ($data);

echo '<pre>';
print_r ($csv->list);
echo '</pre>';
?>
Properties
$output
- Access: public
Contains the output of parse().
$buffers = array ()
- Access: public
The array buffer.
$original
- Access: public
Contains the original data sent to parse().
$struct
- Access: public
Contains the array of parsed elements, aka tokens.
$tokens
- Access: public
Contains all registered tokens as hashes containing 'name',
'token', 'callback', and 'object' keys.
$regex
- Access: public
Contains the output of makeRegex() on the current token list.
$switches = 's'
- Access: public
Contains the list of switches (pattern modifiers) passed to the
preg_split() and preg_match() regular expression evaluations. Default
is 's', for dot-all mode. Note that these are PCRE (Perl-Compatible
Regular Expression) expressions, not ereg() calls. For more
information about switches, see the PHP documentation at
http://www.php.net/manual/en/pcre.pattern.modifiers.php
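As a quick illustration of what the default 's' switch changes, the sketch
below runs the same pattern with and without it using plain preg_match().
How Parser itself attaches $switches to its patterns is not shown here;
the variable names are illustrative only.

```php
<?php
$text = "first\nsecond";

// Without 's', the dot stops at a newline, so the match ends with the first line.
preg_match ('/first.*/', $text, $plain);

// With 's' (dot-all), the dot also matches newlines and the match spans both lines.
preg_match ('/first.*/s', $text, $dotall);
```

Here $plain[0] holds only 'first', while $dotall[0] holds the whole string.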
Methods
Parser ()
- Access: public
Constructor method.
addInternal ($name, $token, $quote = true)
- Access: public
Alias of addToken().
addToken ($name, $token, $quote = true)
- Access: public
Defines a token whose callback function has the same name as
$name, and is a method defined in the subclass of Parser (the class
you create when you create a custom parser). Tokens are literal
strings, unless $quote is set to false, in which case their values
become active pieces of the token parsing regular expression.
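The difference that $quote makes can be sketched with plain PCRE calls.
This assumes that quoting behaves like preg_quote(), which this document
does not confirm; the variables below are illustrative only.

```php
<?php
$data = 'a12b';
$token = '\d+';

// Quoted (assumed equivalent of $quote = true): the pattern only matches
// the literal characters \d+, which do not occur in $data.
$quoted = preg_split ('/(' . preg_quote ($token, '/') . ')/s', $data, -1, PREG_SPLIT_DELIM_CAPTURE);

// Unquoted (assumed equivalent of $quote = false): \d+ is live regex
// syntax and matches the run of digits.
$raw = preg_split ('/(' . $token . ')/s', $data, -1, PREG_SPLIT_DELIM_CAPTURE);

// In a Parser subclass the second case would be registered as:
// $this->addToken ('_number', '\d+', false);
```

The quoted form leaves 'a12b' in one piece, while the unquoted form splits
it into 'a', '12' and 'b'.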
makeRegex ()
- Access: public
- Return: string
Turns the $tokens list into a regular expression.
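One plausible shape for such a conversion is sketched below. It is not the
library's makeRegex(): the function sketchMakeRegex() is hypothetical, and
it assumes that each entry's 'token' value is already in regex-ready form
and that '/' is the pattern delimiter.

```php
<?php
// Hypothetical helper, not the library's makeRegex().
function sketchMakeRegex ($tokens, $switches = 's') {
    $alternatives = array ();
    foreach ($tokens as $t) {
        $alternatives[] = $t['token'];
    }
    // one capturing group, so preg_split() can also return the matched tokens
    return '/(' . implode ('|', $alternatives) . ')/' . $switches;
}

$tokens = array (
    array ('name' => '_comma', 'token' => preg_quote (',', '/')),
    array ('name' => '_number', 'token' => '\d+'),
);
$regex = sketchMakeRegex ($tokens);
```

For the two tokens above the result is the pattern /(,|\d+)/s.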
parse ($data)
- Access: public
- Return: string
This is the main loop of the parser. The strings returned by the token
handlers are added to the output string.
_default ($token, $name)
- Access: public
- Return: string
This is the default token handler. It merely returns
the token sent to it, which will be added to the output string
in the parse() method, thereby recreating the original source
data. This method is usually overridden when this class is
extended.
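The round trip described above can be checked with a stand-alone sketch.
identityHandler() is a hypothetical stand-in for _default(), and the split
pattern is hard-coded instead of coming from makeRegex().

```php
<?php
// Hypothetical identity handler, mirroring the documented behaviour of _default().
function identityHandler ($token, $name) {
    return $token;
}

$source = "Joe,Smith\nPhil,Johnson";

// Keep the delimiters in the list so that nothing from the input is lost.
$pieces = preg_split ('/(,|\n)/s', $source, -1, PREG_SPLIT_DELIM_CAPTURE);

// Appending every handler's return value rebuilds the input unchanged.
$output = '';
foreach ($pieces as $piece) {
    $output .= identityHandler ($piece, '_default');
}
```

After the loop, $output is identical to $source, which is why an overriding
handler only needs to change the tokens it cares about.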
