Python: Context-Sensitive Formatting

Introduction

Python’s new string formatting syntax, introduced in Python 2.6, provides many advantages over the old %-based formatting.  I’ve set out to extend this string formatting, in order to help developers avoid making silly mistakes like forgetting to quote special characters in URLs or html.

Python string formatting—a quick summary

If you’re familiar with Python’s old %-based string formatting, pretend you’re not. Forget about it. Let me introduce you to Python string formatting the way it’s done today.

To specify a template for string substitution, I use curly braces to delimit fields. To perform the substitution and formatting, I use the format() method.

>>> template = 'Did you know that {name} likes to {action} on {day}?'
>>> print template.format(name='J. D. Bartlett',
...         action='publish blog posts', day='Mondays')
Did you know that J. D. Bartlett likes to publish blog posts on Mondays?

If I don’t want to use keywords to specify the parameters, I can use numbers instead of names in the curly braces. (In Python 2.7+ you can even omit the numbers and just use {}.)

>>> template = 'Perhaps {0} > {1}.'
>>> print template.format(17, 6)
Perhaps 17 > 6.
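As a quick illustration of the auto-numbered form mentioned above, here is a small sketch. It uses only the built-in str.format() method and should behave the same on Python 2.7 and Python 3.

```python
# Auto-numbered fields: empty braces are filled from the positional
# arguments in order (Python 2.7+ and 3.1+).
template = 'Perhaps {} > {}.'
result = template.format(17, 6)
print(result)   # Perhaps 17 > 6.

# Explicit numbers are still needed to reorder or reuse arguments.
swapped = 'Perhaps {1} < {0}.'.format(17, 6)
print(swapped)  # Perhaps 6 < 17.
```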

If I want to specify format details, I put them after a colon in the field definition.

>>> print '{num:7.3g}'.format(num=10.0)
     10

To do a str() or repr() of the object before formatting it, I use !s or !r. I can also specify things like alignment and padding in the format description. For instance, the following example takes the repr() of the text 'foo', centres it in a field of 15 characters, and uses underscores for padding.

>>> print '{txt!r:_^15}'.format(txt='foo')
_____'foo'_____
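To show how the pieces of a format description fit together, here is a short sketch with a few more specs. The layout given in the first comment is a simplified reading of the format spec grammar, not the complete one.

```python
# Simplified shape of a format spec: [[fill]align][sign][width][.precision][type]
padded = '{0:*<10}'.format('left')    # fill '*', left-align, width 10
fixed = '{0:>8.2f}'.format(3.14159)   # right-align, width 8, 2 decimal places
signed = '{0:+d}'.format(5)           # always show the sign

print(padded)  # left******
print(fixed)   #     3.14
print(signed)  # +5
```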

For more details, see PEP 3101 or the Python documentation for format strings.

My changes

The formattools module which I’ve written defines a number of tools that extend this string formatting.

Define string-based types

I said earlier that my module would help programmers avoid silly mistakes like forgetting to correctly escape a string. Let’s take the example where we’re generating html code and we want to make sure we don’t accidentally leave the code vulnerable to JavaScript injection attacks.

The first thing we would do is to define a string-based type which represents html code.

import cgi
from formattools import TextBased, PlainText, raw

class Html(TextBased):
    @classmethod
    def _convert_from_(cls, other):
        if isinstance(other, PlainText):
            other = raw(other)
        if isinstance(other, (str, unicode)):
            return Html(cgi.escape(other))
        raise NotImplementedError

    @staticmethod
    def valid_raw_text(text):
        return True

In the class definition above, the _convert_from_ class method says that if we try to insert plain text into some html, the text needs to be escaped using cgi.escape(). The valid_raw_text static method says that any raw text is valid html. We won’t worry about details of what is and isn’t valid html just yet.

Now we can construct Html and PlainText objects.

>>> text1 = Html('<b>Hello there</b>')
>>> text2 = PlainText('I think that 3 < 5')

The formattools module provides two special functions. The convert() function tries to convert an object to another type. The raw() function returns the raw text of an object where applicable. These are made to mimic built-in functions like repr() and len() in that they call special methods (_convert_to_, _convert_from_ and _raw_; note the single leading and trailing underscores). This is done so that any object may be defined to take advantage of these functions, even if that object does not inherit from TextBased.

>>> from formattools import convert, raw
>>> text1
Html('<b>Hello there</b>')
>>> raw(text1)
'<b>Hello there</b>'
>>> raw(text2)
'I think that 3 < 5'
>>> convert(text2, Html)
Html('I think that 3 &lt; 5')

Note that calling convert(text1, PlainText) will not work because we have not defined how to convert html into plain text. In fact, it should not work, because in general, html code is not plain text.
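To make the special-method idea concrete, here is a minimal stand-alone sketch of how functions like raw() and convert() could dispatch to methods defined on a type. This is not the formattools source: the lookup order, the error handling, and the Username and Greeting classes are assumptions made purely for illustration.

```python
def raw(obj):
    """Return the raw text of obj by calling its _raw_ special method."""
    method = getattr(type(obj), '_raw_', None)
    if method is None:
        raise TypeError('%r has no raw text' % (obj,))
    return method(obj)


def convert(obj, target):
    """Convert obj to the type target, asking each side in turn."""
    if isinstance(obj, target):
        return obj
    # Assumed order: first the source object's hook, then the target's.
    attempts = (
        (getattr(type(obj), '_convert_to_', None), (obj, target)),
        (getattr(target, '_convert_from_', None), (obj,)),
    )
    for method, args in attempts:
        if method is None:
            continue
        try:
            return method(*args)
        except NotImplementedError:
            continue
    raise TypeError('%r cannot be automatically converted to %s'
                    % (obj, target.__name__))


class Username(object):
    """Hypothetical class that only knows how to expose raw text."""
    def __init__(self, name):
        self.name = name

    def _raw_(self):
        return self.name


class Greeting(object):
    """Hypothetical class that knows how to build itself from a Username."""
    def __init__(self, text):
        self.text = text

    @classmethod
    def _convert_from_(cls, other):
        if isinstance(other, Username):
            return cls('Hello, ' + raw(other))
        raise NotImplementedError


user = Username('ada')
greeting = convert(user, Greeting)
print(raw(user))      # ada
print(greeting.text)  # Hello, ada
```

Neither class inherits from anything special; defining the hooks is enough for the two functions to work with them. Converting in the other direction, convert(greeting, Username), raises TypeError because no hook covers it.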

Use string-based types as format templates

If we use a string-based type, such as our Html type, as a format template, formattools will automatically make sure that the things we substitute are compatible. For example:

>>> template = Html('<i>{text}</i>')
>>> template.format(text=text2)
Html('<i>I think that 3 &lt; 5</i>')

>>> t2 = PlainText('I {text} you.')
>>> t2.format(text=text1)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 ⋮
TypeError: Html('<b>Hello there</b>') cannot be automatically
…   converted to PlainText

This is all quite neat, and does pretty much what you would expect.

“But,” I hear you ask, “what if I want to substitute html code without it being escaped?”
Why, you simply substitute an Html object. For instance:

>>> template.format(text=text1)
Html('<i><b>Hello there</b></i>')

Because both template and text1 are Html objects, no conversion is performed.
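For readers who want to experiment with the idea without formattools, the behaviour above can be approximated with string.Formatter from the standard library. The sketch below is an assumption-laden stand-in, not how formattools is implemented: HtmlText and _EscapingFormatter are hypothetical names, and anything that is not an HtmlText is simply treated as plain text and escaped.

```python
import string

try:
    from html import escape   # Python 3.2+
except ImportError:
    from cgi import escape    # Python 2, as in the examples above


class HtmlText(str):
    """Hypothetical stand-in for an html string type (not formattools.Html)."""

    def format(self, *args, **kwargs):
        # Delegate to a formatter that escapes anything that is not html.
        result = _EscapingFormatter().vformat(str(self), args, kwargs)
        return HtmlText(result)


class _EscapingFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        text = string.Formatter.format_field(self, value, format_spec)
        if isinstance(value, HtmlText):
            return text        # already html: insert unchanged
        return escape(text)    # anything else: treat as plain text


template = HtmlText('<i>{text}</i>')
escaped = template.format(text='I think that 3 < 5')
nested = template.format(text=HtmlText('<b>Hello there</b>'))
print(escaped)  # <i>I think that 3 &lt; 5</i>
print(nested)   # <i><b>Hello there</b></i>
```

Unlike the PlainText example above, this sketch never refuses a substitution; it only decides whether to escape.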

Use any object as a format template

If you want to define your own objects that act as format templates, formattools provides a Formattable class, which supplies the format() method. For your object to act as a format template, you’ll need to define the _parse_format_() method and, optionally, the _finish_format_() method. I won’t go into detail here about what these methods need to do; check their docstrings for more details.

Explicit field types

Suppose we want to have a plain string template that contains some substitution fields that are in Html format. The formattools module extends the syntax of field definitions to allow you to do this.

>>> from formattools import formatter
>>> formatter.registertype(Html, 'html')
>>> t3 = PlainText('If you want to write "{text}" in html, you write '
...                '"{text:/html}".')
>>> print t3.format(text=text2)
If you want to write "I think that 3 < 5" in html, you write "I think that 3 &lt; 5".

If you want to include format specifiers, just include them between the colon and the slash.

In the example code above, we’ve registered Html as a field type with the default formatter.  If you don’t want to register your field type with the default formatter, there’s still hope.  You can create your own formatter.

>>> from formattools import Formatter
>>> f = Formatter()
>>> f.registertype(Html, 'html')
>>> print f.format(t3, text=text2)
If you want to write "I think that 3 < 5" in html, you write "I think that 3 &lt; 5".
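The extended field syntax can be imitated on top of the standard library's string.Formatter. The following is only a sketch of the idea under stated assumptions: TypedFormatter is a hypothetical class, its registertype() merely mirrors the call shape used above, and it registers a plain conversion function rather than a string-based type. It also assumes that the last slash in the format description separates the ordinary format specifiers from the field type name.

```python
import string

try:
    from html import escape   # Python 3.2+
except ImportError:
    from cgi import escape    # Python 2


class TypedFormatter(string.Formatter):
    """Hypothetical formatter understanding '{name:spec/type}' fields."""

    def __init__(self):
        self._types = {}

    def registertype(self, converter, name):
        # converter is any function taking and returning text.
        self._types[name] = converter

    def format_field(self, value, format_spec):
        # Everything before the last slash is an ordinary format spec,
        # everything after it names a registered field type.
        spec, slash, type_name = format_spec.rpartition('/')
        if not slash:
            return string.Formatter.format_field(self, value, format_spec)
        text = string.Formatter.format_field(self, value, spec)
        return self._types[type_name](text)


f = TypedFormatter()
f.registertype(escape, 'html')
plain = f.format('"{text}" becomes "{text:/html}".', text='3 < 5')
padded = f.format('[{text:>6/html}]', text='a<b')
print(plain)   # "3 < 5" becomes "3 &lt; 5".
print(padded)  # [   a&lt;b]
```

Note that in this sketch the padding is applied before the conversion, so the escaped text can end up wider than the requested field width.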

Get the code

The formattools module is licensed under the MIT license.  You may not download, use, modify or redistribute this code unless you agree to the terms of the license. If you accept the license, you may download the module here (zipped Python source file).

Where to next?

If you’re feeling adventurous, you could override the _parse_format_() method in your subclass of TextBased to perform rudimentary parsing of the template string and so provide better default field types. For instance, if we create a template with our Html class, then by default all fields within the template are Html fields. But if we override _parse_format_() in the Html class, we could choose the default field type of each substitution field based on its context. We could, for example, make Html('<a href="{link}">{link}</a>') behave the same as Html('<a href="{link:/url}">{link:/html}</a>'). Of course, we’d have to define and register a field type for URLs.
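To give a feel for what such context detection might look like, here is a rough stand-alone sketch that does not use formattools at all. It walks the template with string.Formatter.parse() and picks URL quoting or html escaping depending on the literal text just before each field. The format_html() function, the regular expression, and the choice of safe characters are all assumptions for illustration; a real implementation would need a much more careful notion of context.

```python
import re
import string

try:
    from html import escape            # Python 3
    from urllib.parse import quote
except ImportError:
    from cgi import escape             # Python 2
    from urllib import quote

# Assumed heuristic: a field directly after href=" or src=" holds a URL.
_URL_ATTR = re.compile(r'(?:href|src)\s*=\s*"$')


def format_html(template, **fields):
    """Fill simple {name} fields, choosing the escaping from the context."""
    out = []
    parsed = string.Formatter().parse(template)
    for literal, name, spec, conversion in parsed:
        out.append(literal)
        if name is None:
            continue                   # trailing literal text, no field
        # Only plain keyword names are handled; conversions are ignored.
        value = format(fields[name], spec)
        if _URL_ATTR.search(literal):
            # URL context: percent-quote, then escape for the attribute.
            out.append(escape(quote(value, safe=':/?=&'), True))
        else:
            # Text context: ordinary html escaping.
            out.append(escape(value))
    return ''.join(out)


link = 'http://example.com/a b?x=<1>'
anchor = format_html('<a href="{link}">{link}</a>', link=link)
print(anchor)
# <a href="http://example.com/a%20b?x=%3C1%3E">http://example.com/a
#   b?x=&lt;1&gt;</a>   (printed on one line)
```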
