
SVG and Typography: Characters
by Fabio Arciniegas A.
May 12, 2004
In the second part of our
discussion of SVG and typography we explore some time-honored
practices of typographic excellence; as we go along, each
“type issue” will lead to the discussion of relevant
technical aspects of SVG. The typography issues covered are listed
below. Beside each one of them is the associated technical SVG issue
discussed:
- Quotes, Hyphens, Ellipses (Character References)
- Fonts (Embedding SVG Fonts, Creating SVG Fonts from TrueType)
- Non-Latin Scripts (Proper fonts vs. cheating fonts, encodings,
bidirectionality)
- Ligatures and the Euro Sign
Quotes, Hyphens, Ellipses
“The devil is in the details,” details such as the
quotes surrounding the previous phrase.

Fig. 1 Smart vs Dumb quotes
There are two kinds of quotes: “straight” or
“dumb” quotes and “curly” or
“smart” quotes. As you can see in Fig. 1 on the left,
using smart quotes gives the text a professional look. This is
because the paired quotation marks are specifically designed for each
font, unlike the neutral “dumb” marks which are often
considered a faux pas in professional typesetting.
The common misuse of dumb quotes is just the most popular version
of a larger problem: using characters that resemble the correct ones
and are easier to type, but are not appropriate. For some, editing
software such as MS Word takes care of the problem by automatically
converting straight quotes to curly ones; but since a large part of
SVG development is manual or the output of our own custom software, we
cannot count on an editor to fix it. We need to understand the details
of how these characters are included.
Numeric Character References
The correct way to introduce curly quotes in SVG documents is
through numeric character references. The Left Double Quotation Mark
is character U+201C in the Unicode standard, so it can be included in
your SVG document via the decimal numeric character
reference &#8220;; its right counterpart, character
U+201D, is included via &#8221;. It is also possible to
use hexadecimal character references, in which case you would
use &#x201C; and &#x201D;. The SVG code for
Fig. 2 (quotes.svg) mixes the two approaches.
<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg"
width="220" height="220" version="1.1">
<rect x="1" y="20" width="110" height="110"
style="fill:black;"/>
<text x="80" y="40" style="font-family:Arial;
font-size: 10pt; fill:red;">Eno</text>
<text x="3" y="110" style="font-family:Arial;
font-size: 12pt; fill:white;"> &#8220; Fabrication &#x201D;
</text>
</svg>
Fig 2. Quotes.svg
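The relation between a Unicode code point and its two reference forms can be sketched in a few lines of Python. This is only an illustration: the function name numeric_refs is hypothetical and not part of any SVG tool.

```python
def numeric_refs(code_point: str) -> tuple[str, str]:
    """Return the (decimal, hexadecimal) XML character references
    for a code point written in U+XXXX notation."""
    if not code_point.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {code_point!r}")
    value = int(code_point[2:], 16)  # the digits after "U+" are hexadecimal
    return f"&#{value};", f"&#x{value:X};"


for cp in ("U+201C", "U+201D"):
    decimal, hexadecimal = numeric_refs(cp)
    print(cp, decimal, hexadecimal)
```

Both forms resolve to the same character once the document is parsed, so the choice between them only affects how the source reads.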
There are several reasons for using numeric character references in
SVG instead of other methods such as HTML character entity
references. Let's examine them briefly:
- HTML character entity references (&ldquo; and &rdquo; for double
curly quotes) are not defined in SVG. If you try to include them
in an SVG document, compliant viewers such as the Adobe SVG viewer v3.01
will not show the character, because such entities are not predefined as
they are in HTML.
- Specifying an encoding such as UTF-8 and including the
character directly in the document is a technically valid
alternative. However, many programming tools have difficulty
displaying and manipulating UTF-8 and other encodings. This
difficulty is relevant not only for curly quotes but also for
any character that is hard to input or display in common
programming tools, including non-Latin alphabetic
characters.
Part of the beauty of SVG is that you can
write simple programs to generate it and use common text
programs to manipulate it; however, that simplicity is blurred
when common operations like searching from the command line
become difficult because the characters in question are not
supported by your input methods. In other words, you can use any
good old terminal to write grep -n "&#x424;"
code/*.svg, but you would have to go through contortions
to type grep -n "Ф" code/*.svg.
- Non-standard character sets such as Microsoft windows-1252 are
being deprecated and should be avoided because of their conflict
with Unicode. For a more detailed explanation of the problems
related to windows-1252, please refer to David Wheeler's
article “Curling
Quotes in HTML, SGML, and XML”, which also mentions (in
a slightly different light) the two points above.
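The first two points can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for an SVG viewer's parser, which is an assumption: viewers differ in how they report the error. The helper name text_of is made up for this example.

```python
import xml.etree.ElementTree as ET


def text_of(xml_source: str):
    """Parse a one-element XML snippet and return its text,
    or None when the snippet is not well-formed."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        return None


# HTML entity names are undefined in plain XML, so parsing fails.
entity = text_of("<text>&ldquo;Fabrication&rdquo;</text>")
# Numeric character references need no declaration at all.
reference = text_of("<text>&#x201C;Fabrication&#x201D;</text>")
# The literal characters (written here as Python escapes) give the same text.
literal = text_of("<text>\u201cFabrication\u201d</text>")

print(entity)
print(reference == literal)
```

The reference and the literal character are indistinguishable after parsing; the difference is only in how easy the source file is to type, display, and search.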
XML allows both decimal and hexadecimal numeric character
references, so just as shown in Fig. 2 you can use either one in SVG. I
prefer and recommend hexadecimal references. Some respectable sources
advocate the exclusive use of decimal references to keep
“maximum backwards compatibility with SGML” because before
XML, SGML only supported decimal references. In practice, however, one
is more likely to use XML tools to process SVG, and there are ways in
most modern SGML tools to enable hexadecimal references. More
important is the argument that the Unicode standard and literature
refer to every character by its hexadecimal code, making hexadecimal
references very convenient.
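If you inherit files that use decimal references, a small script can normalize them to the hexadecimal form. The following is a minimal sketch, not a full XML-aware tool: the name decimal_to_hex_refs is hypothetical, and the regular expression also rewrites matches inside comments or CDATA sections, where they are not really references.

```python
import re

# Matches decimal references such as &#8220; but not hexadecimal ones (&#x201C;).
DECIMAL_REF = re.compile(r"&#([0-9]+);")


def decimal_to_hex_refs(source: str) -> str:
    """Rewrite every decimal character reference in its hexadecimal form."""
    return DECIMAL_REF.sub(lambda m: f"&#x{int(m.group(1)):04X};", source)


print(decimal_to_hex_refs("&#8220; Fabrication &#x201D;"))
```

After the conversion, the digits in each reference match the U+XXXX notation used by the Unicode charts, which makes lookups straightforward.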
Commonly Bungled Characters and their Correct Codes
Now that we know the how and why of inserting special
characters in SVG, let's go back to typography and some variations of
the bungled curly quotes syndrome, including single quotes, hyphens,
and ellipses.
| Character(s) | Common Error | Examples |
| --- | --- | --- |
| Single Quotes | Using the ASCII grave accent (U+0060) and a “corresponding” acute accent (U+00B4) is a common error, which looks about as bad as using two apostrophes (U+0027). The correct single quote marks are U+2018 and U+2019 (except when writing code that uses apostrophes). | `this is a hack from typewriter days´ / 'This is also wrong' / ‘Nice, no?,’ she asked |
| Double Quotes | Using the ASCII quotation mark (a.k.a. dumb quotes) when quoting text is a common mistake. Instead, use smart quotes, characters U+201C and U+201D, inserted in SVG documents via their corresponding hexadecimal numeric character references &#x201C; and &#x201D;. | "this is a common typographic typo too" / print OUT "in code it is ok"; / “This is not an exit,” Pat says |
| Hyphen, n-dash, m-dash | The character U+002D is the plain hyphen found on your keyboard. Its typographic purpose is to break words at the end of a line (to hyphenate); however, the hyphen is commonly abused to indicate ranges or to break the flow of a sentence. The correct characters for such purposes are, respectively, the n-dash (U+2013) and the m-dash (U+2014). The hyphen is the shortest of the three characters; the n-dash is longer, commonly about as wide as the letter “n”. The m-dash is the longest of the three and should not be replaced by two hyphens, as I'm sure you've seen done before. | Using an n-dash in March 3–8 is a subtle but elegant improvement over March 3-8 / this is wrong -- and ugly -- / Boggart&#x2014;or at least his presence&#x2014;will remain. |
| Ellipses | Although many people are used to creating ‘faux-ellipses’ out of three periods (something that some packages like MS Word correct automatically), the horizontal ellipsis has its own character, U+2026, and we must include it explicitly using &#x2026;. | Isn't that special… / This isn't... |
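The errors in the table are regular enough that a script can flag most of them before a document ships. The sketch below is a rough heuristic under stated assumptions: the names CHECKS and lint are invented for this example, and it should be run only on the text that will be displayed, never on code samples or attribute values, where straight quotes are correct.

```python
import re

# Each entry pairs a pattern for a commonly bungled character with a hint.
CHECKS = [
    (re.compile(r'"'), "straight double quote; consider &#x201C; and &#x201D;"),
    (re.compile(r"[`\u00B4]"), "accent used as a quote; consider &#x2018; and &#x2019;"),
    (re.compile(r"\.\.\."), "three periods; consider &#x2026;"),
    (re.compile(r"--"), "double hyphen; consider &#x2014;"),
    (re.compile(r"(?<=\d)-(?=\d)"), "hyphen in a numeric range; consider &#x2013;"),
]


def lint(text: str) -> list[str]:
    """Return one warning per suspicious character sequence found in text."""
    warnings = []
    for pattern, message in CHECKS:
        for match in pattern.finditer(text):
            warnings.append(f"col {match.start() + 1}: {message}")
    return warnings


for sample in ("March 3-8", "this is wrong -- and ugly --", "This isn't..."):
    print(sample, lint(sample))
```

Straight apostrophes are deliberately not flagged here, since telling a misused quote from a legitimate apostrophe needs more context than a regular expression has.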
The SVG graphic in Figure 3 and its associated code illustrate the
points above.
<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg" height="400" width="400"
xmlns:xlink="http://www.w3.org/1999/xlink">
<image xlink:href="triceratops.png"
width="303" height="216" x="1" y="1"/>
<text x="55" y="45" style="font-family:Arial; font-size: 24pt;
fill:#F8431C;">
sands of time&#x2026;
</text>
<text x="2" y="60" style="font-family:Times New Roman;
font-size:14pt; fill:#F8431C;">
&#x201C;Not an experience&#x2014;a revelation&#x201D;
</text>
<text x="125" y="185" style="font-family:Times New Roman;
font-size: 14pt; fill:#F8431C;">
Stefan George Institute
</text>
<text x="210" y="205" style="font-family:Times New Roman;
font-size: 14pt; fill:#F8431C;">
June 10&#x2013;24
</text>
<image height="16" width="16" y="192" x="99"
xlink:href="triceratops.png"/>
</svg>
Fig 3. triceratops.svg
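When SVG like this is produced by a program rather than by hand, the serializer can write the references for you. The sketch below assumes Python's standard xml.etree.ElementTree module and reproduces only three text elements of the figure with simplified attributes; the helper text_element is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace


def text_element(x: int, y: int, content: str) -> ET.Element:
    """Build an SVG <text> element holding the given Unicode string."""
    element = ET.Element(f"{{{SVG_NS}}}text", {"x": str(x), "y": str(y)})
    element.text = content
    return element


svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "400", "height": "400"})
svg.append(text_element(55, 45, "sands of time\u2026"))
svg.append(text_element(2, 60, "\u201cNot an experience\u2014a revelation\u201d"))
svg.append(text_element(210, 205, "June 10\u201324"))

# Serializing as ASCII makes ElementTree replace every non-ASCII
# character with a numeric character reference.
markup = ET.tostring(svg, encoding="us-ascii").decode("ascii")
print(markup)
```

ElementTree writes decimal references, which a viewer treats exactly like the hexadecimal ones used in the listing above; the resulting file stays plain ASCII and remains easy to search from the command line.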
Before moving on, a word of caution about smart quotes: always use
curly quotes except when showing code. String literals in
programming languages, attributes in XML, and other such technical
code are only correctly presented in dumb quotes, the way they would
compile or parse. Using curly quotes to show code is not only incorrect
but looks cluelessly affected, roughly similar to eating a Snickers
bar with fork and knife.