

Class: Parser

  • Package: saf.Parser
  • Author: John Luxford <lux@simian.ca>
  • Copyright: Copyright (C) 2001-2003, Simian Systems Inc.
  • License: http://www.sitellite.org/index/license Simian Open Software License
  • Version: 1.0, 2003-01-18, $Id: Parser.php,v 1.4 2008/03/09 18:46:06 lux Exp $
  • Access: public

Generic parser class from which new and more complex parsers can be derived.
It provides basic lexical analysis via regular expressions, each of which is
assigned to a callback function; these callbacks are invoked as the resulting
token list is iterated over. You extend this class to implement the syntax
analysis and code generation/execution stages. Parser can also be used as a
Finite State Machine (FSM), in which case saf.Parser.Buffer is handy for
building the resulting data structure.


Usage Example


<?php

// This example creates a comma-separated values (CSV) parser. PHP can
// parse CSV much more easily than this, but the example shows what
// Parser can do, and hopefully suggests more complex and more
// interesting uses for it. It assumes saf.Parser has already been loaded.

class CSVParser extends Parser {

  function CSVParser () {
    // define our tokens
    $this->addInternal ('_comma', ',');
    $this->addInternal ('_newline', "\n");
    $this->addInternal ('_escape', '\\');

    // define our internal variables. $list is an array because
    // parsing a CSV file produces a 2D array. We don't use $output
    // here, but we must not overwrite it or the other internal
    // variables: treat $output, $struct, $tokens and $regex as
    // reserved names.
    $this->list = array ();
    $this->skip = false;
    $this->row = 0;
    $this->column = 0;
    $this->list[$this->row] = array ();
  }

  // appends text to the current cell, creating the cell if needed
  function addToCell ($text) {
    if (! isset ($this->list[$this->row][$this->column])) {
      $this->list[$this->row][$this->column] = '';
    }
    $this->list[$this->row][$this->column] .= $text;
  }

  function _default ($token, $name) {
    // a backslash that is not followed by a comma or another
    // backslash is kept as a literal character
    if ($this->skip) {
      $this->addToCell ('\\');
      $this->skip = false;
    }
    $this->addToCell ($token);
  }

  function _comma ($token, $name) {
    // commas are the separators, unless escaped with a backslash
    if ($this->skip) {
      $this->addToCell (',');
      $this->skip = false;
    } else {
      $this->column++;
    }
  }

  function _newline ($token, $name) {
    // start a new row
    $this->row++;
    $this->column = 0;
    $this->list[$this->row] = array ();
  }

  function _escape ($token, $name) {
    if ($this->skip) {
      // two backslashes in a row produce one literal backslash
      $this->addToCell ('\\');
      $this->skip = false;
    } else {
      $this->skip = true;
    }
  }
}

$data = 'Joe,Smith,joe@yoursite.com
Phil,Johnson,phil@yoursite.com
Bert,Morris,bert@yoursite.com';

$csv = new CSVParser ();
$csv->parse ($data);

echo '<pre>';
print_r ($csv->list);
echo '</pre>';

?>




Properties


$output

  • Access: public

Contains the output of parse().


$buffers = array ()

  • Access: public

The array buffer.


$original

  • Access: public

Contains the original data sent to parse().


$struct

  • Access: public

Contains the array of parsed elements, aka tokens.


$tokens

  • Access: public

Contains all registered tokens as hashes containing 'name',
'token', 'callback', and 'object' keys.
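As an illustration only, the $tokens list might look like the array below. The four keys come from the description above; the values stored under 'callback' and 'object' here are guesses, and find_token() is a hypothetical helper, not part of Parser.

```php
<?php
// Sketch only: an array shaped like the documented $tokens property.
// The 'callback' and 'object' values are assumptions; this reference
// does not say exactly what Parser stores there.
$tokens = array (
    array ('name' => '_comma',   'token' => ',',  'callback' => '_comma',   'object' => null),
    array ('name' => '_newline', 'token' => "\n", 'callback' => '_newline', 'object' => null),
);

// Hypothetical helper: return the registered token whose literal text
// equals a piece of input, or null for ordinary text.
function find_token (array $tokens, $text) {
    foreach ($tokens as $t) {
        if ($t['token'] === $text) {
            return $t;
        }
    }
    return null;
}

$hit = find_token ($tokens, ',');
echo $hit['name'], "\n"; // prints "_comma"
```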


$regex

  • Access: public

Contains the output of makeRegex() on the current token list.


$switches = 's'

  • Access: public

Contains the pattern modifiers ("switches") passed to the
preg_split() and preg_match() regular expression evaluations.
The default is 's', for dot-all mode. Note that these are PCRE
(Perl-Compatible Regular Expression) modifiers, not ereg() calls.
For more info about switches, check out the PHP documentation at
http://www.php.net/manual/en/pcre.pattern.modifiers.php
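A small demonstration of the default 's' switch, using only PHP's own preg_match(). Appending the switches string after the closing delimiter is an assumption about how Parser applies $switches, not something this reference confirms.

```php
<?php
// With the 's' (dot-all) modifier, '.' also matches a newline.
$text = "first\nsecond";

$without = preg_match ('/first.second/', $text);   // 0: '.' stops at "\n"
$with    = preg_match ('/first.second/s', $text);  // 1: '.' matches "\n"

// Building a pattern by appending a switches string after the
// delimiter (assumed to mirror how a value like $switches is used).
$switches = 's';
$pattern = '/first.second/' . $switches;
echo preg_match ($pattern, $text), "\n"; // prints 1
```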




Methods


Parser () 

  • Access: public

Constructor method.


addInternal ($name, $token, $quote = true)

  • Access: public

Alias of addToken().


addToken ($name, $token, $quote = true)

  • Access: public

Defines a token whose callback function is the method named
$name, defined in your subclass of Parser (the class you create
when you build a custom parser). Tokens are matched as literal
strings unless $quote is set to false, in which case their values
are inserted into the token-parsing regular expression as active
regex syntax.
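A minimal sketch of what $quote changes, assuming quoted tokens are escaped with preg_quote() before they are placed in the regular expression (this reference does not say which function Parser uses):

```php
<?php
$token = '.';

// quoted: the dot is escaped, so it only matches a literal "."
$quoted   = '/' . preg_quote ($token, '/') . '/';  // "/\./"
// unquoted: the dot keeps its regex meaning, "any character"
$unquoted = '/' . $token . '/';                    // "/./"

echo preg_match ($quoted, 'abc'), "\n";    // 0: no literal dot in "abc"
echo preg_match ($unquoted, 'abc'), "\n";  // 1: '.' matches "a"
```

Setting $quote to false is therefore how a token can match a class of strings (for example a pattern such as '[0-9]+') rather than one fixed string.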


makeRegex () 

  • Access: public
  • Return: string

Turns the $tokens list into a regular expression.
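A sketch of how a token list could be turned into a single regular expression. sketch_make_regex() and the 'quote' key are hypothetical; the actual format makeRegex() produces is not documented here.

```php
<?php
// Hypothetical makeRegex()-style function: join the token strings into
// one alternation wrapped in a single capturing group.
function sketch_make_regex (array $tokens, $switches = 's') {
    $parts = array ();
    foreach ($tokens as $t) {
        // escape literal tokens; leave regex tokens as written
        $parts[] = $t['quote'] ? preg_quote ($t['token'], '/') : $t['token'];
    }
    return '/(' . implode ('|', $parts) . ')/' . $switches;
}

$regex = sketch_make_regex (array (
    array ('token' => ',',  'quote' => true),
    array ('token' => '\\', 'quote' => true),   // a literal backslash
    array ('token' => '\n', 'quote' => false),  // regex escape for newline
));

print_r (preg_split ($regex, "a,b", -1, PREG_SPLIT_DELIM_CAPTURE));
```

Wrapping the whole alternation in one capturing group lets preg_split() with PREG_SPLIT_DELIM_CAPTURE return the matched tokens along with the text between them. Alternatives are tried left to right, so a token that is a prefix of a longer token should come after it in the list.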


parse ($data)

  • Access: public
  • Return: string

This is the main loop of the parser.
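The following is a hypothetical, self-contained miniature of a parser main loop, written to show the general technique rather than Parser's actual code. MiniParser, Shouter and _bang() are invented names, and tokens are stored as a simple token => callback map instead of the hashes described under $tokens.

```php
<?php
// Hypothetical miniature: split the input on the token regex, keep the
// delimiters, and pass each piece to the callback named for its token,
// or to _default() for ordinary text. Return values build $output.
class MiniParser {
    var $tokens = array (); // literal token string => callback name
    var $output = '';

    function addToken ($name, $token) {
        $this->tokens[$token] = $name;
    }

    function parse ($data) {
        $quoted = array_map (function ($t) { return preg_quote ($t, '/'); },
            array_keys ($this->tokens));
        $regex = '/(' . implode ('|', $quoted) . ')/s';
        $this->output = '';
        foreach (preg_split ($regex, $data, -1, PREG_SPLIT_DELIM_CAPTURE) as $piece) {
            if ($piece === '') {
                continue; // empty text between adjacent tokens or at the ends
            }
            $name = isset ($this->tokens[$piece]) ? $this->tokens[$piece] : '_default';
            $this->output .= $this->$name ($piece, $name);
        }
        return $this->output;
    }

    // identity handler: ordinary text passes through unchanged
    function _default ($token, $name) {
        return $token;
    }
}

// A subclass registers one token and handles it with its own method.
class Shouter extends MiniParser {
    function __construct () {
        $this->addToken ('_bang', '!');
    }

    function _bang ($token, $name) {
        return '!!!';
    }
}

$p = new Shouter ();
echo $p->parse ('hi! there!'), "\n"; // prints "hi!!! there!!!"
```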


_default ($token, $name)

  • Access: public
  • Return: string

This is the default token handler. It merely returns
the token sent to it, which will be added to the output string
in the parse() method, thereby recreating the original source
data. This method is usually overridden when this class is
extended.
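Why an identity _default() recreates the source: assuming parse() splits its input with preg_split() and PREG_SPLIT_DELIM_CAPTURE (the $switches description mentions preg_split(), but the exact call is an assumption), the text pieces and the captured tokens together cover the input in order, so appending each one unchanged gives back the original data.

```php
<?php
$data = "Joe,Smith\nPhil,Johnson";
$pieces = preg_split ('/(,|\n)/s', $data, -1, PREG_SPLIT_DELIM_CAPTURE);

// identity handling of every piece, like the default _default()
$output = '';
foreach ($pieces as $piece) {
    $output .= $piece;
}

var_dump ($output === $data); // bool(true)
```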
