Text::Merge - v.0.34 General purpose text/data merging methods in Perl.
$merge = new Text::Merge;
$merge->line_by_line();   # query
$merge->line_by_line(0);  # turn off
$merge->line_by_line(1);  # turn on
$merge->set_delimiters('<<', '>>'); # user defined delims
$success = $merge->publish($template, \%data);
$success = $merge->publish($template, \%data, \%actions);
$success = $merge->publish($template, $item);
$success = $merge->publish_to($handle, $template, \%data);
$success = $merge->publish_to($handle, $template, \%data, \%actions);
$success = $merge->publish_to($handle, $template, $item);
$text = $merge->publish_text($template, \%data);
$text = $merge->publish_text($template, \%data, \%actions);
$text = $merge->publish_text($template, $item);
$success = $merge->publish_email($mailer, $headers, $template, \%data);
$success = $merge->publish_email($mailer, $headers, $template, \%data, \%actions);
$success = $merge->publish_email($mailer, $headers, $template, $item);
$datahash = $merge->cgi2data();      # if you used "CGI(:standard)"
$datahash = $merge->cgi2data($cgi);  # if you just used CGI.pm
The Text::Merge package is designed to provide a quick, versatile, and extensible way to combine presentation templates and data structures. The Text::Merge package attempts to do this by assuming that templates are constructed with text and that objects consist of data and functions that operate on that data. Text::Merge is very simple, in that it works on one file and one object at a time, although an extension exists to display lists (Text::Merge::Lists) and Text::Merge itself could easily be extended further.
This is not XML and is intended merely to "flatten" the learning curve for non-programmers who design display pages for programmers or to provide programmers with a quick way of merging page templates with data sets or objects without extensive research.
The templates can be interpreted "line by line" or taken as a whole.
This object is normally inherited and so the new() function is the constructor. It just blesses an anonymous HASH reference, sets two flags within that HASH, and returns it. I am acutely aware of the criticisms of the overuse of OOP (Object Oriented Programming). This module needs to be OO because of its extensibility and encapsulation; I wanted to impose classification of the objects to allow the greatest flexibility in context of implementation. Text::Merge is generally used on web servers, and can become integrated quickly into the httpd using mod_perl; hence the encapsulation and inheritance provided by the Perl OO model clearly outweighed the constraints thereby imposed. That's my excuse...what's yours?
There are four public publishing methods for the Text::Merge object: publish(), publish_to(), publish_text(), and publish_email(). The first, publish(), sends output to the currently selected file handle (normally STDOUT). The second, publish_to(), sends output to the handle you specify. The third, publish_text(), returns the merged output as a text block. The last, publish_email(), sends the merged output as a formatted e-mail message to the designated mailer.
Support is provided to merge the data and the functions performed on that data with a text template that contains substitution tag markup used to designate the action or data conversion. Data is stored in a HASH that is passed by reference to the publishing methods. The keys of the data hash correspond to the field names of the data, and they are associated with their respective values. Actions (methods) are similarly referenced in a hash, keyed by the action name used in the template.
Here is a good example of a publishing call in Perl:
$obj = new Text::Merge;
%data = ( 'Name'=>'John Smith', 'Age'=>34, 'Sex'=>'not enough' );
%actions = ( 'Mock' => \&mock_person, 'Laud' => \&laud_person );
$obj->publish($template, \%data, \%actions);
In this example, mock_person() and laud_person() would be subroutines that took a single hash reference, the data set, as an argument. In this way you can create dynamic or complex composite components and reference them with a single tag in the template. The actions HASH has been found to be useful for default constructs that can be difficult to code manually, giving page designers an option to work with quickly.
Simply put, tags are replaced with what they designate. A tag generally consists of a prefix, followed by a colon, then either an action name or a field name followed by zero or more formatting directives separated by colons. In addition, blocks of output can be contained within curly brackets in certain contexts for conditional display.
REF:
The REF: tag is replaced by the data value associated with the named field. Here is an example of the use of a REF: tag in context. Assume we have a key-value pair in our data HASH associating the key 'Animal' with the value 'turtle':
The quick brown REF:Animal jumped over the lazy dog.
when filtered, becomes:
The quick brown turtle jumped over the lazy dog.
The REF: tag designators may also contain one or more format directives. These are chained left to right, and act to convert the data before it is displayed. For example:
REF:Animal:lower:trunc3
would result in the first three letters of the SCALAR data value associated with Animal in lower case. See the section, Data Conversions Formats, for a list of the available SCALAR data formatting directives. Note that some conversions may be incompatible or contradictory. The system will not necessarily warn you of such cases, so be forewarned.
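The left-to-right chaining of format directives can be pictured with a small stand-alone sketch. This is not the module's code: the apply_formats() helper, the two-entry format table, and the mixed-case 'Turtle' value are all assumptions made for illustration, and only the upper, lower, and trunc## directives are modelled.

```perl
use strict;
use warnings;

# Toy format table: an illustrative subset, NOT the module's internals.
my %format = (
    upper => sub { uc $_[0] },
    lower => sub { lc $_[0] },
);

# Apply directives left to right, e.g. ('lower', 'trunc3').
sub apply_formats {
    my ($value, @directives) = @_;
    for my $d (@directives) {
        if ($d =~ /^trunc(\d+)$/) {
            $value = substr($value, 0, $1);    # keep at most that many characters
        }
        elsif (my $code = $format{$d}) {
            $value = $code->($value);
        }
        # unknown directives are silently ignored in this sketch
    }
    return $value;
}

my %data = (Animal => 'Turtle');               # hypothetical data value
my ($tag, $field, @directives) = split /:/, 'REF:Animal:lower:trunc3';
my $result = apply_formats($data{$field}, @directives);
print "$result\n";                             # prints "tur"
```

Because each directive receives the output of the one before it, the order in the tag matters.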
Any REF: tag designator can be surrounded by curly brace pairs containing text that would be included in the merged response only if the result of the designator is not empty (has a length). There must be no spaces between the tag and the curly braced text. If line-by-line mode is turned off, then the conditional text block may span multiple lines. For example:
The {quick brown }REF:Animal{ jumps over where the }lazy dog lies.
Might result in:
The quick brown fox jumps over where the lazy dog lies.
or, if the value associated with the data key 'Animal' was undefined, empty, or zero:
The lazy dog lies.
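The conditional brace rule can be sketched in a few lines of plain Perl. The sketch below is only an illustration, not the module's parser: merge_ref() and render_ref() are hypothetical helpers, format directives are not handled, and it assumes that undefined, empty, and zero values all suppress the surrounding blocks.

```perl
use strict;
use warnings;

# Optional {prefix}, then REF:Field, then optional {suffix}.
my $tag_re = qr/(?:\{([^{}]*)\})?REF:(\w+)(?:\{([^{}]*)\})?/;

# Return prefix + value + suffix, or nothing at all for an empty value.
sub render_ref {
    my ($value, $prefix, $suffix) = @_;
    return '' unless $value;    # assumption: undefined, empty, or zero
    return join '', grep { defined } $prefix, $value, $suffix;
}

sub merge_ref {
    my ($template, $data) = @_;
    $template =~ s/$tag_re/render_ref($data->{$2}, $1, $3)/ge;
    return $template;
}

my $template = 'The {quick brown }REF:Animal{ jumps over where the }lazy dog lies.';
print merge_ref($template, { Animal => 'fox' }), "\n";
print merge_ref($template, {}), "\n";
```

The two print statements reproduce the two outcomes shown above, with and without a value for 'Animal'.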
IF:
The IF: tag designator performs a conditional display. The syntax is as follows:
IF:FieldName:formats{Text to display}
This designator would result in the string Text to display being returned if the formatted data value is not empty. The curly braced portion is required, and no curly braces are allowed before the designator.
NEG:
The NEG: tag designator is similar to the IF: tag, but the bracketed text is processed only if the formatted data value is empty (zero length) or zero. Effectively, NEG: can be thought of as "if not".
Here is an example:
NEG:FieldName:formats{Text to display if the result is empty.}
ACT:
The ACT: tag designates that an action is to be performed (a subroutine call) to obtain the result for substitution. The key name specified in the designator is used to look up the reference to the appropriate subroutine, and the data HASH reference is passed as the sole argument to that subroutine. The returned value is the value used for the substitution.
ACT: is intended to be used to insert programmatic components into the document. It can only specify action key names and has no equivalent tags to IF: and NEG:. The curly brace rules for the ACT: tag are exactly the same as those for the REF: tag.
If the line_by_line() switch is set, then the entire tag designator must be on a single line of text, but if the switch is OFF (default) then the conditional text can span multiple lines.
The two conditional tags, IF: and NEG:, require a single conditional text block, surrounded by curly braces, immediately following (suffixing) the field name or format string. For example:
IF:SomeField{this text will print}
The REF: and ACT: tags allow for curly braces both at the beginning (prefixing) and at the end (suffixing). For example:
{Some optional text }REF:SomeValue{ more text.}
The [[IF:VerboseVar{quick, brown }]]fox jumped over the lazy dog.
assuming that 'VerboseVar' represented some data value, the above example would result in one of:
The quick, brown fox jumped over the lazy dog.
or
The fox jumped over the lazy dog.
upper       - converts all lowercase letters to uppercase
lower       - converts all uppercase letters to lowercase
proper      - treats the string as a Proper Noun
trunc##     - truncate the scalar to ## characters (## is an integer)
words##     - reduce to ## words separated by spaces (## is an integer)
paragraph## - converts to a paragraph ## columns wide
indent##    - indents plain text ## spaces
int         - converts the value to an integer
float       - converts the value to a floating point value
string      - converts the numeric value to a string (does nothing)
detab       - replaces tabs with spaces, aligned to 8-char columns
html        - replaces newlines with HTML <BR> tags
dollars     - converts the value to 2 decimal places
percent     - converts the value to a percentage
abbr        - converts a time value to m/d/yy format
short       - converts a time value to m/d/yy H:MMpm format
time        - converts a time value to H:MMpm (localtime am/pm)
24h         - converts a time value to 24hour format (localtime)
dateonly    - converts a time value to Jan. 1, 1999 format
date        - same as 'dateonly' with 'time'
ext         - converts a time value to extended format:
              Monday, January 12th, 1999 at 12:20pm
unix        - converts a time value to UNIX date string format
escape      - performs a browser escape on the value (&#123;)
unescape    - performs a browser unescape (numeric only)
urlencode   - performs a url encoding on the value (%3B)
urldecode   - performs a url decoding (reverse of urlencode)
Most of the values are self-explanatory; however, a few may need explanation:
The trunc format must be suffixed with an integer to define the maximum number of characters to display, as in trunc14.
The html format just inserts a <BR> construct at every newline in the string. This allows text to be displayed appropriately in some cases.
The escape format performs an HTML escape on all of the reserved characters of the string. This allows values to be displayed correctly on browsers in most cases. If your data is not prefiltered, it is usually a good idea to use escape on strings where HTML formatting is prohibited. For example, a '$' value would be converted to '&#36;'.
The unescape format does the reverse of an escape format; however, it does not operate on HTML mnemonic escapes, allowing special characters to remain intact. This can be used to reverse escapes inherent in the use of other packages.
The urlencode and urldecode formats convert a value (text string) to and from URL-encoded format: urlencode converts special characters to their %xx equivalents, and urldecode restores the original text by decoding the %xx sequences.
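As a rough illustration of the %xx conversion, here is a stand-alone sketch in plain Perl. The url_encode() and url_decode() helpers are hypothetical, the input is assumed to be a byte string, and the set of characters left unencoded (letters, digits, '-', '_', '.', '~') is an assumption; the module's own choice of special characters may differ.

```perl
use strict;
use warnings;

# Assumption: everything except letters, digits, '-', '_', '.', and '~' is encoded.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Reverse step: turn each %xx sequence back into the character it encodes.
sub url_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

my $encoded = url_encode('a;b c');
print "$encoded\n";                  # prints "a%3Bb%20c"
print url_decode($encoded), "\n";    # prints "a;b c"
```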
The publishing methods all require at the very least a template, a data set, and the action set; although either the data set or the action set or both could be empty or null. You may also bundle this information into a single HASH (suitable for blessing as a class) with the key 'Data' associated with the data HASH reference, and the key 'Actions' associated with the action HASH reference. A restatement of a previous example might look like this:
$obj = new Text::Merge;
$data = { 'Name'=>'John Smith', 'Age'=>34, 'Sex'=>'not enough' };
$actions = { 'Mock' => \&mock_person, 'Laud' => \&laud_person };
$item = { 'Data' => $data, 'Actions' => $actions };
$obj->publish($template, $item);
In addition, if you specify a key 'ItemType' in your $item and give it a value, then the item reference will be handed to any methods invoked by the ACT: tags, rather than just the data hash. This allows you to construct items that can be merged with templates. For example, the following code is valid:
%data = ( 'Author' => 'various', 'Title' => 'The Holy Bible' );
%actions = ( 'Highlight' => \&highlight_item );
$item = { 'ItemType'=>'book', 'Data'=>\%data, 'Actions'=>\%actions };
bless $item, 'Some::Example::Class';
$obj->publish($template, $item);
In this last example, the designator ACT:Highlight would result in the object $item being passed as the only argument to the subroutine highlight_item() referenced in the action HASH.
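The choice of argument handed to an action can be pictured with the following stand-alone sketch. It is not the module's code: run_action() is a hypothetical dispatcher written only to illustrate the rule, and this highlight_item() is an invented example action that accepts either form of argument.

```perl
use strict;
use warnings;

# Toy dispatcher, not the module's code: picks the argument handed to an action.
sub run_action {
    my ($item, $name) = @_;
    my $action = $item->{Actions}{$name} or return '';
    # With a non-empty 'ItemType' the whole item is passed; otherwise only the data hash.
    my $arg = $item->{ItemType} ? $item : $item->{Data};
    return $action->($arg);
}

# Example action that works with either the item or the bare data hash.
sub highlight_item {
    my ($arg) = @_;
    my $data = exists $arg->{Data} ? $arg->{Data} : $arg;
    return "*** $data->{Title} ***";
}

my %data = (Author => 'various', Title => 'The Holy Bible');
my $item = {
    ItemType => 'book',
    Data     => \%data,
    Actions  => { Highlight => \&highlight_item },
};
print run_action($item, 'Highlight'), "\n";    # prints "*** The Holy Bible ***"
```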
By default, the publishing methods slurp in the entire template and process it as a text block. This allows for multi-line conditional text blocks. However, in some cases the resulting output may be very large, or you may want the output to be generated line by line for some other reason (such as unbuffered output). This is accomplished through the line_by_line() method, which accepts an optional boolean value: if the value is specified it becomes the current setting, and if it is omitted the method returns the current setting. Note that this has the most notable impact on the publish() and publish_email() methods, since the results of the merge operations are sent to a handle. If the line-by-line switch is set, then the publish_text() method will substitute line by line, but will still return the entire merged document as a single text block (not line by line).
This is turned OFF by default.
Templates consist of text documents that contain special substitution designators as described previously. The template arguments passed to the publishing functions can take one of three forms:
FileHandle
Use the FileHandle package that comes with the Perl distribution for this type of template argument. Processing begins at the current file position and continues until the end-of-file condition is reached.
Note that you should not pass the template as a text string if your template is very large and you are using line by line mode. In this case you should use a FileHandle or file path argument.
new()
_Text_Merge_LineMode
Other keys can be added by objects which inherit Text::Merge.
line_by_line($setting)
Returns the current line-by-line setting if the $setting argument is omitted. Otherwise it resets the line-by-line mode to the setting requested. A non-zero value tells the publishing methods to process the template line by line. For those methods that output results to a handle, the results will also be echoed line by line.
set_delimiters($start, $end)
Both the $start and $end delimiters must be provided, and they cannot be identical.
publish_to($handle, $template, \%data, \%actions)
Sends the merged output to $handle, or to the currently selected handle, normally STDOUT, if the $handle argument is omitted.
publish_text($template, \%data, \%actions)
Works like the publish_to() method, except it returns the filtered output as text rather than sending it to the currently selected filehandle.
publish_email($mailer, $headers, $template, \%data, \%actions)
Works like publish() but opens a handle to $mailer and sends the merged data formatted as an e-mail message. $mailer may contain the sequences RECIPIENT and/or SUBJECT. If either does not exist, it will be echoed at the beginning of the email (in the form of a header), allowing e-mail to be passed preformatted. This is the preferred method; use a mailer that can be told to accept the "To:", "Subject:" and "Reply-To:" fields within the body of the passed message and do not specify the RECIPIENT or SUBJECT tags in the $mailer string. Returns false if failed, true if succeeded. The recommended mail program is 'sendmail'. $headers is a HASH reference, containing the header information. Only the following header keys are recognized:
To
Subject
Reply-To
CC
From (works for privileged users only)
The values associated with these keys will be used to construct the desired e-mail message header. Security-minded site administrators might put hooks in here or, even better, clean the data beforehand, as a precaution to protect access to the system and to avoid accidental mistakes.
Note: the $mailer argument string should begin with the type of pipe required for your request. For sendmail, this argument would look something like (note the vertical pipe):
'|/usr/bin/sendmail -t'
Be careful not to run this with write permission on the sendmail file and forget the process pipe!!!
cgi2data($cgi)
Converts CGI.pm parameters to a data hash reference suitable for merging. The $cgi parameter is a CGI object and is optional, but you must have imported the :standard methods from CGI.pm if you omit the $cgi parameter. This method returns a hash reference containing the parameters as data. Basically it turns list values into list references and puts everything in a hash keyed by field name.
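The shape of the resulting data hash can be sketched without CGI.pm. The params_to_data() helper below and its input format, a list of [name, value] pairs, are stand-ins invented for illustration. The sketch also assumes that single-valued fields stay plain scalars, which the description above implies but does not state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI parameters: [name, value] pairs, names may repeat.
sub params_to_data {
    my (@pairs) = @_;
    my (%grouped, %data);
    push @{ $grouped{ $_->[0] } }, $_->[1] for @pairs;
    for my $name (keys %grouped) {
        my @values = @{ $grouped{$name} };
        # Assumption: single values stay scalars, repeated fields become list references.
        $data{$name} = @values > 1 ? \@values : $values[0];
    }
    return \%data;
}

my $data = params_to_data(
    [ Name  => 'John Smith' ],
    [ Color => 'red' ],
    [ Color => 'blue' ],
);
print "$data->{Name}: @{ $data->{Color} }\n";    # prints "John Smith: red blue"
```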
This module was written and tested in Perl 5.005 and runs with -Tw set and use strict. It requires use of the package FileHandle, which is part of the standard Perl distribution.
This software is released under the Perl Artistic License. Derive what you wish, as you wish, but please attribute releases and include derived source code. (C) 1997-2004 by Steven D. Harris, perl@nullspace.com