IT:AD:Markdown

Summary

Markdown is quickly gaining acceptance beyond geek circles as the solution for keeping notes usable over the long term.

It doesn't have the power of more specialized markups (*.docx, LaTeX, etc.), but that's its point: it's the common, human-readable denominator syntax for information.

The Problem

The problem is perhaps best exemplified by blogs and wikis that use a WYSIWYG rich HTML editor to build notes that are then stored in a database (SharePoint's Wiki is one solution designed with such an architecture).

The intent of most corporations is for users to assemble notes of value in a common store that all can access, and use over a long time.

Unfortunately, things don't work out so well.

Html is killing your notes

Consider saving a post that contains some html, and maybe a simple code fragment such as the following:

    void AnImportantExample {
       ...
    } 

In a database-backed blog or CMS, the above generates the following database entry:

<div class="code"><span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;void AnImportantExample {</span><span class="codeLine alt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...</span><span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;}</span>

The first thing to notice is that the text is basically unreadable without a tool. Secondly, you'll probably need a tool configured for the original era (notice that this formatted text refers to “code”, “codeLine” and “alt” CSS styles: that's how we did it here in 2006).

When 2009 came around, and we did a refresh of the website, …well…things never looked the same again.

Later still, in 2011, when we combined two different syntax solutions to try to get ready for mobile – everything went to pot.

What happened?

Things are constantly changing

What happened is that the web and its standards (HTML, CSS, etc.) are living, changing things. So are the target devices. Once there were no mobile devices to target; now Mobile is king.

Things change. Websites have to be adjusted. And if you are making the mistake – as we did at first – of saving Data + Markup, the Data will get taken down when the Markup becomes out of date.

Don't fool yourself: there is practically no way to successfully regular expression your way through a database full of Html markup in order to correctly update to the new styles.

What we're saying is simple: the data is made perishable by the styling.

You are not the first to notice this

Wikipedia and the other wikis, understanding that they wanted their data to last a very long time, decided to break away from Html and chose a format that could be translated into Html, or any future format that comes along, by saving the Data plus only the most minimal markup set: a Common Denominator markup syntax.

Wikipedia's own variant is known as wikitext. Markdown, published by John Gruber in 2004, applies the same idea and is the variant recommended here.

The other issues with Html Markup listed below are less critical than the longevity of data, but worth considering as well.

Readable Even without a System

The second issue with Html Markup (or LaTeX, or Docx, or most other markups) is that it is unreadable without a tool.
You have terabytes of information, but you can't use it because it's stored as formatted text or, worse, binary. How much effective value did you lose when you transitioned your corporate data from WordPerfect to Word? Then to online Html docs?

In recognition of the perishability of not only the syntax, but the systems themselves, Wiki developers are making a concerted effort to work towards making the data readable even if the system is down.

Many wiki developers have stopped using databases to store the entries and are instead saving each entry in its own *.txt file, and using alternate systems for indexing, searching, etc.
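
As a rough illustration of such an alternate system, the sketch below builds a tiny in-memory word index over a folder of *.txt notes. The function names `build_index` and `search` are made up for this example, and a real wiki would use something far more capable; the point is only that the notes stay readable on disk while the index can be thrown away and rebuilt at any time.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_index(notes_dir):
    """Map each lowercase word to the set of note files that contain it."""
    index = defaultdict(set)
    for path in sorted(Path(notes_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path.name)
    return index


def search(index, query):
    """Return the sorted names of notes that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Intersect the file sets so that only notes matching all words remain.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

If the index is lost or the indexing tool dies, nothing of value is gone: the *.txt files are still there, and anything that can read text can read them.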

Secondly, the syntax chosen for Wikis was meant to use the same markup used on earlier devices…

Markdown, the WWII TypeWriter Syntax

Consider the following typewritten message:

# IMPORTANT #
===========

This is an **important** message from ***HQ***.  

## Prelude ##
We expect *everyone* to read it...

## Message ##
Blah, Blah, Blah...

We're a little out of practice reading typewriter messages such as the above (although I would bet your granddad can read it better than he can read email and other “modern fandangleness”), but even in its raw format (i.e., not yet translated upwards to Html) it is understandable by anyone at a glance. As compared to:

<div class="code"><span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;void AnImportantExample {</span><span class="codeLine alt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...</span><span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;}</span>

The syntax is rudimentary, but it covers the essentials of formatting:

# Header1 #
## Header2 ##
### Header3 ###

*Italic* (note this)
**Bold** (note this, it's important)
***Bold Italic*** (note this, it's very important)

Note:
Regarding underlines. I found it very interesting that they decided to not
include underlines, as underlines were only an artifact of typewriters being
unable to tilt their heads to make italics, i.e. underlines meant the same
as italics and were therefore deemed irrelevant by the wiki syntax committee. 
Interesting history...

Would you like more options? Sure. Everyone would. There are extensions to Markdown for tables, etc. But, honestly, the core of it is just enough to get the job done.

  Note:
  If it was enough formatting to express how to win a world war, I'm sure it will do just fine in any corporation.    

Portability

The interesting thing is that the above Markdown (or WWII Typewriter syntax, if you want to call it that) can be translated upwards to any current format.

There are converters for Html, Word docx, LaTeX, and many more.
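
To give a feel for how little such a converter needs, here is a minimal sketch that handles only the core syntax shown earlier: three header levels plus italic, bold and bold italic. The names `inline_to_html` and `markdown_to_html` are invented for this example, every non-empty line is treated as its own paragraph, and real converters such as Pandoc cover vastly more ground.

```python
import html
import re


def inline_to_html(text):
    """Convert ***bold italic***, **bold**, and *italic* spans to HTML."""
    text = html.escape(text, quote=False)
    # Longest marker first, so *** is not consumed as ** followed by *.
    text = re.sub(r"\*\*\*(.+?)\*\*\*", r"<strong><em>\1</em></strong>", text)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def markdown_to_html(source):
    """Convert headers (# to ###) and plain lines, one line at a time."""
    out = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        # Leading #'s give the level; trailing #'s are optional decoration.
        match = re.match(r"(#{1,3})\s+(.*?)\s*#*$", line)
        if match:
            level = len(match.group(1))
            out.append(f"<h{level}>{inline_to_html(match.group(2))}</h{level}>")
        else:
            out.append(f"<p>{inline_to_html(line)}</p>")
    return "\n".join(out)
```

Swapping the output strings for another target format is the whole job of writing a new converter, which is why one tends to appear for every format that comes along.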

And (here's the million dollar point) there will be converters for any system you use in the future.

In other words, anything you can write down in Markdown will have usable value for as long as you live. And probably beyond.

That's essentially why corporate data should not be put into the SharePoint wiki. It's a self-defeating strategy.

There are other, lesser reasons as well, listed below.

Verbose

It's a common understanding among developers that verbosity is usually a code smell (i.e., a sign that something is wrong with the way one is going about solving the problem). We usually store facts tightly and expand them on screen (e.g., save “1980-01-01” and show on screen something that takes more space, such as “January 1st, 1980”).

Working with HTML as the formatting solution goes against this well-known pattern. It's not a problem in itself, but it is a good indicator that something is wrong.
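
A small sketch of that store-tight, expand-on-screen pattern, using the date from the example above. The names `MONTHS`, `ordinal` and `display_date` are invented for illustration, and the output is English-only by assumption.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal(day):
    """Return the day with its English ordinal suffix (1st, 2nd, 3rd, 4th, ...)."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def display_date(stored):
    """Expand a compact stored date such as "1980-01-01" for display."""
    d = date.fromisoformat(stored)
    return f"{MONTHS[d.month - 1]} {ordinal(d.day)}, {d.year}"
```

The stored form stays at ten characters and can be re-rendered in any style later; only the display step knows about month names and suffixes.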

Consider saving a note that contains a code fragment such as the following:

    void AnImportantExample {
       ...
    } 

A blog program might save the above 3 lines of code as the following in the database:

<div class="code"><span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;void AnImportantExample {</span><span class="codeLine alt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...</span><span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;}</span>

Line Breaks

Html is a syntax that doesn't use system line breaks. In fact, it's intended to be a line-break-less syntax, where text is supposed to flow.

This is an important design concept for the web, where one doesn't know the size of the target screen or its target OS, whose line break code could be any of the following:

  • LF: Multics, Unix and Unix-like systems (GNU/Linux, Mac OS X, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS and others.
  • CR+LF: Microsoft Windows, DEC TOPS-10, RT-11 and most other early non-Unix and non-IBM OSes, CP/M, MP/M, DOS (MS-DOS, PC-DOS, etc.), Atari TOS, OS/2, Symbian OS, Palm OS, Amstrad CPC
  • LF+CR: Acorn BBC and RISC OS spooled text output.
  • CR: Commodore 8-bit machines, Acorn BBC, ZX Spectrum, TRS-80, Apple II family, Mac OS up to version 9 and OS-9
  • RS: QNX pre-POSIX implementation.
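
Plain-text notes meet the same zoo of conventions when they travel between systems. As a minimal sketch (the function name `normalize_newlines` is made up for illustration), the variants listed above can be folded into a single LF before a note is stored or compared:

```python
import re


def normalize_newlines(text):
    """Convert CR+LF, LF+CR, lone CR, and RS line breaks to a single LF."""
    # Two-character sequences come first so they are replaced as one break.
    return re.sub(r"\r\n|\n\r|\r|\x1e", "\n", text)
```

HTML sidesteps the whole question by treating source line breaks as ordinary whitespace.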

But it plays havoc with copy/paste.

Try to copy/paste the output from a div/span'ed formatted fragment of code, and instead of getting:

    void AnImportantExample {
       ...
    } 

You'll end up with text in which you have to re-enter the line breaks before it can be used. For a developer, the following is very annoying, as it takes time to fix and can inadvertently introduce bugs where there should be none:

    void AnImportantExample {...} 
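
To see where the line breaks go, here is a rough sketch using Python's standard `html.parser`. It assumes a plain-text paste receives only the text nodes of the stored fragment; `TextExtractor`, `visible_text` and `STORED` are illustrative names, and a real browser may behave differently depending on the CSS in play.

```python
from html.parser import HTMLParser

# The database entry from the example above, stored as one long line.
STORED = (
    '<div class="code">'
    '<span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;void AnImportantExample {</span>'
    '<span class="codeLine alt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...</span>'
    '<span class="codeLine">&nbsp;&nbsp;&nbsp;&nbsp;}</span>'
)


class TextExtractor(HTMLParser):
    """Collect only the text nodes and ignore every tag."""

    def __init__(self):
        super().__init__()  # character references such as &nbsp; are decoded
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(fragment):
    """Return the text of an HTML fragment with all markup stripped."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Turn non-breaking spaces back into ordinary spaces.
    return "".join(parser.parts).replace("\xa0", " ")
```

Calling `visible_text(STORED)` returns a single line: the line structure lived only in the span tags, so it disappears the moment the tags do.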

The Solution

Choose knowledge systems that use Markdown.

  • The data will gain longevity.
  • It’s easy: the syntax is so simple you can barely call it “syntax.” If you can use an emoticon, you can write Markdown.
  • It’s fast: the simple formatting saves a significant amount of time over hand-crafted HTML tags, and is often faster than using a word processor or WYSIWYG editor. It speeds up the workflows of writers of all ilk, from bloggers to novelists.
  • It’s clean: Markdown translates quickly to perfectly-formed HTML. No missing closing tags, no improperly nested tags, no blocks left without containers. You also get 100% less cruft than exporting HTML from Microsoft Word. There’s no styling inline, nothing that will otherwise break a site’s design or mess with the XSLT formatting for PDF output. In short, it’s foolproof.
  • It’s portable: your documents are cross-platform by nature. You can edit them in any text-capable application on any operating system. Transporting files requires no zipping or archiving, and the filesize is as small as it can possibly get.
  • It’s flexible: output your documents to a wide array of formats. Convert to HTML for posting on the web, rich text for sending emails or importing into a layout program for final arrangement or any number of other proprietary formats.
  • It fits any workflow: You can make Markdown work with any workflow. It can speed up just about any writing-related process with very little setup. It can also be scripted all to hell, if you want, because plain text is the most flexible of any format known to computer-kind.

Tools

Examples

Alternative Parsers

  • ___XYZ___ (three underscores work in some parsers to create bold+italic).
  • Underline? ____XYZ____ (four underscores work in Markua).
  • ~~XYZ~~ (two tildes are strikethrough in some parsers).

Pygments